control(speedglm)
control() belongs to the R package: speedglm
Miscellanea of functions
----------Description----------
Utility functions for least squares estimation in large data sets.
----------Usage----------
control(B, symmetric = TRUE, tol.values = 1e-7, tol.vectors = 1e-7, out.B = TRUE, method = "eigen")
cp(X, w = NULL, row.chunk = NULL, sparse = FALSE)
is.sparse(X, sparselim = .9, camp = .05)
----------Arguments----------
Argument: B
a square matrix.
Argument: symmetric
logical: is B symmetric?
Argument: tol.values
tolerance below which eigenvalues are considered equal to zero.
Argument: tol.vectors
tolerance below which eigenvectors are considered equal to zero.
Argument: out.B
logical: should the matrix B be returned?
Argument: method
the method used to check for singularity. The default is "eigen", which performs an eigendecomposition of X'X. The "Cholesky" method is faster than "eigen" and does not use tolerances, but "eigen" seems to be more stable for suitable tolerance values.
Argument: X
the model matrix.
Argument: w
a vector of weights.
Argument: sparse
logical: is X sparse?
Argument: sparselim
a real number in the interval [0, 1]. It indicates the minimum proportion of zeroes in the data matrix X required to consider X sparse.
Argument: row.chunk
an integer giving the number of rows in each of the first g-1 blocks. If row.chunk is not a divisor of nrow(X), the g-th block is formed by the remaining rows.
Argument: camp
the proportion of elements of X that are sampled for the survey.
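To make the role of a tolerance in the singularity check more concrete, here is a minimal base R sketch. The helper rank_by_eigen is hypothetical and is not part of speedglm; in particular, scaling the tolerance by the largest eigenvalue is an assumption of this sketch, and control() may apply tol.values differently.

```r
# Hypothetical helper (not speedglm code): estimate the rank of a symmetric
# matrix B by counting the eigenvalues that exceed a tolerance.
rank_by_eigen <- function(B, tol = 1e-7) {
  ev <- eigen(B, symmetric = TRUE, only.values = TRUE)$values
  # Assumption: the tolerance is taken relative to the largest eigenvalue.
  sum(ev > tol * max(abs(ev)))
}

set.seed(1)
x <- matrix(rnorm(200 * 4), 200, 4)
x[, 2] <- x[, 1] + 2 * x[, 3]   # column 2 is a linear combination of 1 and 3
B <- crossprod(x)               # 4 x 4 matrix X'X, numerically singular

r_eigen <- rank_by_eigen(B)     # rank estimated from the eigenvalues
r_qr <- qr(x)$rank              # rank from a pivoted QR, for comparison
```

Both estimates should agree here because the dependent column produces one eigenvalue that is zero up to rounding error.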
----------Details----------
The function control performs an eigendecomposition of B according to the specified tolerance values. The function cp computes the cross-product X'X by partitioning X into row blocks. When an optimized BLAS, such as ATLAS, is not installed, cp is an attempt to speed up the calculation and avoid overflows with medium-to-large data sets loaded in R memory. The results depend on the processor type. Good results are obtained, for example, on an AMD Athlon dual core with 1.5 GB of RAM by setting row.chunk to some value less than 1000. Try the example below, changing the matrix size and the value of row.chunk. If the matrix X is sparse, it will have class "dgCMatrix" (the package Matrix is required) and the cross-product will be computed without partitioning. However, good performance is usually obtained with a very high proportion of zeroes. The function is.sparse carries out a quick sample survey of the proportion of zeroes in X.
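The row-block idea behind cp can be illustrated in base R. The sketch below uses a hypothetical helper, blockwise_crossprod, which is not the speedglm implementation; it only shows that summing the cross-products of consecutive row blocks reproduces X'X, including when the last block is shorter than row.chunk.

```r
# Hypothetical helper (not speedglm code): accumulate X'X over row blocks.
blockwise_crossprod <- function(X, row.chunk) {
  n <- nrow(X)
  XTX <- matrix(0, ncol(X), ncol(X))
  for (s in seq(1, n, by = row.chunk)) {
    rows <- s:min(s + row.chunk - 1, n)   # the last block may be shorter
    XTX <- XTX + crossprod(X[rows, , drop = FALSE])
  }
  XTX
}

set.seed(1)
X <- matrix(rnorm(1050 * 5), 1050, 5)
a1 <- crossprod(X)
a2 <- blockwise_crossprod(X, row.chunk = 100)  # 10 blocks of 100 rows, 1 of 50
same <- isTRUE(all.equal(a1, a2))
```

Whether such blocking is faster than crossprod() depends on the BLAS and the processor, as noted above, so the sketch makes no timing claim.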
----------Value----------
For the function control, a list with the following elements:
XTX: the matrix product B without singularities (if there are any).
rank: the rank of B.
pivot: an ordered set of column indices of B in which, if B is singular, the last rank+1,...,p columns indicate possible linear combinations.
For the function cp:
new.B: the matrix product X'X (weighted, if w is given).
For the function is.sparse:
sparse: a logical value indicating whether the sample proportion of zeroes is greater than sparselim, with the sample proportion as an attribute.
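The return value described for is.sparse (a logical carrying the sample proportion as an attribute) can be mimicked in base R. The helper sample_sparse_check and the attribute name "proportion" are hypothetical, chosen for illustration only; the sampling scheme is likewise an assumption and may differ from what is.sparse actually does.

```r
# Hypothetical helper (not speedglm code): estimate the proportion of zeroes
# from a random sample of the elements of X and compare it with sparselim.
sample_sparse_check <- function(X, sparselim = 0.9, camp = 0.05) {
  n_el <- length(X)
  idx <- sample.int(n_el, size = max(1, ceiling(camp * n_el)))
  prop0 <- mean(X[idx] == 0)            # sample proportion of zeroes
  structure(prop0 > sparselim, proportion = prop0)
}

set.seed(1)
res_sparse <- sample_sparse_check(diag(100))           # 99% of entries are zero
res_dense <- sample_sparse_check(matrix(1, 100, 100))  # no zero entries
```

Because only a fraction camp of the elements is inspected, the result for a matrix whose true proportion of zeroes is close to sparselim can vary from one call to the next.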
----------Author(s)----------
Marco ENEA
----------See Also----------
eigen, chol, qr, crossprod
----------Examples----------
#### example 1.
library(speedglm)  # provides control(), cp() and is.sparse()
n <- 100000
k <- 100
x <- round(matrix(rnorm(n*k),n,k),digits=4)
y <- rnorm(n)
# if an optimized BLAS is not installed, depending on processor type, cp() may be
# faster than crossprod() for large matrices.
system.time(a1 <- crossprod(x))
system.time(a2 <- cp(x, row.chunk = 500))
all.equal(a1, a2)
#### example 2.1.
n <- 100000
k <- 10
x <- matrix(rnorm(n*k),n,k)
x[,2] <- x[,1] + 2*x[,3] # x has rank 9
y <- rnorm(n)
# estimation by least squares
A <- function(){
  A1 <- control(crossprod(x))
  ok <- A1$pivot[1:A1$rank]   # columns kept after the singularity check
  as.vector(solve(A1$XTX,crossprod(x[,ok],y)))
}
# estimation by QR decomposition
B <- function(){
  B1 <- qr(x)
  qr.solve(x[,B1$pivot[1:B1$rank]],y)
}
system.time(a <- A())
system.time(b <- B())
all.equal(a,b)
### example 2.2
x <- matrix(c(1:5, (1:5)^2), 5, 2)
x <- cbind(x, x[, 1] + 3*x[, 2])
m <- crossprod(x)
qr(m)$rank # is 2, as it should be
control(m,method="eigen")$rank # is 2, as it should be
control(m,method="Cholesky")$rank # is wrong
### example 3.
n <- 10000
fat1 <- gl(20,500)
y <- rnorm(n)
da <- data.frame(y,fat1)
m <- model.matrix(y ~ factor(fat1),data = da)
is.sparse(m)