R impute package: impute.knn() function help documentation

Posted on 2012-2-25 22:17:47
impute.knn(impute)
R package containing impute.knn(): impute

A function to impute missing expression data

Translator: 生物统计家园网 robot LoveR

----------Description----------

A function to impute missing expression data, using nearest neighbor averaging.


----------Usage----------


impute.knn(data, k = 10, rowmax = 0.5, colmax = 0.8, maxp = 1500, rng.seed = 362436069)



----------Arguments----------

Argument: data
An expression matrix with genes in the rows and samples in the columns.


Argument: k
Number of neighbors to be used in the imputation (default = 10).


Argument: rowmax
The maximum percent missing data allowed in any row (default 50%). Any rows with more than rowmax% missing are imputed using the overall mean per sample.


Argument: colmax
The maximum percent missing data allowed in any column (default 80%). If any column has more than colmax% missing data, the program halts and reports an error.


Argument: maxp
The largest block of genes imputed using the knn algorithm inside impute.knn (default 1500); larger blocks are divided by two-means clustering (recursively) prior to imputation. If maxp = p, where p is the number of genes (rows), only knn imputation is done.


Argument: rng.seed
The seed used for the random number generator (default 362436069) for reproducibility.
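
The rowmax and colmax thresholds can be checked by hand before calling the function. The following is a minimal base-R sketch on a made-up toy matrix (the values and the variable names row_missing, col_missing, rows_over and cols_over are illustrative assumptions, not part of the impute package); it only reports which rows and columns exceed the thresholds described above.

```r
# Toy expression matrix: genes in rows, samples in columns (made-up values).
x <- matrix(c(1.0, NA,  3.0, 4.0,
              NA,  NA,  NA,  2.0,
              0.5, 1.5, 2.5, 3.5),
            nrow = 3, byrow = TRUE)

rowmax <- 0.5  # same defaults as in the usage line
colmax <- 0.8

# Fraction of missing values per gene (row) and per sample (column).
row_missing <- rowMeans(is.na(x))
col_missing <- colMeans(is.na(x))

# Rows above rowmax: per the documentation, these are filled from the
# overall per-sample means instead of by nearest neighbors.
rows_over <- which(row_missing > rowmax)

# Columns above colmax: per the documentation, these make the call stop
# with an error.
cols_over <- which(col_missing > colmax)
```

Here only the second gene has more than half of its values missing, and no sample exceeds the column threshold.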


----------Details----------

impute.knn uses k-nearest neighbors in the space of genes to impute missing expression values.

For each gene with missing values, we find the k nearest neighbors using a Euclidean metric, confined to the columns for which that gene is NOT missing. Each candidate neighbor might be missing some of the coordinates used to calculate the distance. In this case we average the distance from the non-missing coordinates. Having found the k nearest neighbors for a gene, we impute the missing elements by averaging those (non-missing) elements of its neighbors. This can fail if ALL the neighbors are missing in a particular element. In this case we use the overall column mean for that block of genes.
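
The procedure described above can be sketched in base R for a single gene. This is an illustration under stated assumptions, not the package's implementation: knn_impute_row is a hypothetical helper, it ranks neighbors by the mean squared difference over shared non-missing coordinates, and it treats the whole matrix as one block.

```r
# knn_impute_row is a hypothetical helper, not part of the impute package.
# It fills the missing entries of row i of x from its k nearest rows.
knn_impute_row <- function(x, i, k) {
  obs <- !is.na(x[i, ])                    # columns where gene i is observed
  others <- setdiff(seq_len(nrow(x)), i)   # candidate neighbor rows
  # Averaged squared distance over coordinates present in both rows.
  d <- vapply(others, function(j) {
    mean((x[j, obs] - x[i, obs])^2, na.rm = TRUE)
  }, numeric(1))
  nn <- others[order(d)][seq_len(min(k, length(others)))]
  filled <- x[i, ]
  for (col in which(!obs)) {
    vals <- x[nn, col]
    filled[col] <- if (all(is.na(vals))) {
      mean(x[, col], na.rm = TRUE)         # fallback: column mean of the block
    } else {
      mean(vals, na.rm = TRUE)             # average of the neighbors' values
    }
  }
  filled
}

# Made-up data: gene 1 is close to genes 2 and 3, far from gene 4.
x <- rbind(c(1.0, 2.0, NA),
           c(1.1, 2.1, 3),
           c(0.9, 1.9, 5),
           c(10,  10,  100))
imputed <- knn_impute_row(x, i = 1, k = 2)
```

With k = 2 the two neighbors are genes 2 and 3, so the missing value is filled with the average of 3 and 5.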

Since nearest neighbor imputation costs O(p*log(p)) operations per gene, where p is the number of rows, the computational time can be excessive for large p and a large number of rows with missing values. Our strategy is to break blocks with more than maxp genes into two smaller blocks using two-means clustering. This is done recursively until no block has more than maxp genes. For each block, k-nearest neighbor imputation is done separately. We have set the default value of maxp to 1500. Depending on the speed of the machine and the number of samples, this number might be increased. Making it too small is counter-productive, because the number of two-means clustering runs will increase.
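
The recursive splitting can be sketched as follows. This is only an approximation of the idea: split_blocks is a hypothetical helper, it uses stats::kmeans with two centers, and it assumes a complete matrix (kmeans does not accept missing values, so the package necessarily handles them in its own way).

```r
# split_blocks is a hypothetical helper, not part of the impute package.
# It returns a list of row-index vectors, each with at most maxp rows,
# obtained by recursive two-means clustering of the rows of x.
split_blocks <- function(x, rows = seq_len(nrow(x)), maxp = 1500) {
  if (length(rows) <= maxp) return(list(rows))   # block is small enough
  cl <- kmeans(x[rows, , drop = FALSE], centers = 2)$cluster
  c(split_blocks(x, rows[cl == 1], maxp),
    split_blocks(x, rows[cl == 2], maxp))
}

set.seed(1)
x <- matrix(rnorm(200 * 5), nrow = 200)   # made-up complete data
blocks <- split_blocks(x, maxp = 40)
block_sizes <- lengths(blocks)
```

Each element of blocks would then be imputed separately; a smaller maxp produces more blocks and therefore more clustering runs, which is the trade-off noted above.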

For reproducibility, this function reseeds the random number generator using the seed provided or the default seed (362436069).


----------Value----------


Component: data
the new imputed data matrix


Component: rng.seed
the rng.seed that can be used to reproduce the imputation. This should be saved by any prudent user if different from the default.


Component: rng.state
the state of the random number generator, if available, prior to the call to set.seed. Otherwise, it is NULL. If necessary, this can be used in the calling code to undo the side-effect of changing the random number generator sequence.
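
Undoing the reseeding side-effect can be wrapped in a small helper. The sketch below uses only base R; with_preserved_rng is a hypothetical name, and the block inside it merely stands in for any call that reseeds the generator, such as impute.knn.

```r
# with_preserved_rng is a hypothetical helper, not part of impute or base R.
# It evaluates expr and then puts the global RNG state back as it was.
with_preserved_rng <- function(expr) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) old <- get(".Random.seed", envir = globalenv())
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(list = ".Random.seed", envir = globalenv())
    }
  })
  expr  # the argument is evaluated lazily, i.e. forced here
}

# Reference stream without any interference.
set.seed(12345)
expected <- runif(3)

# Same seed, but a reseeding call happens in between.
set.seed(12345)
draws <- with_preserved_rng({
  set.seed(362436069)   # stands in for the reseeding done inside impute.knn
  rnorm(10)
})
observed <- runif(3)
```

Because the state is restored on exit, observed matches expected even though the generator was reseeded inside the wrapped block.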


----------Note----------

A bug in the function knnimp.split was fixed in version 1.18.0. This means that results from earlier versions may not be exactly reproducible.


----------Author(s)----------


Trevor Hastie, Robert Tibshirani, Balasubramanian Narasimhan, and Gilbert Chu



----------References----------

Botstein, D., Imputing Missing Data for Gene Expression Arrays, Stanford University Statistics Department Technical report (1999), http://www-stat.stanford.edu/~hastie/Papers/missing.pdf
Trevor Hastie, Robert Tibshirani, David Botstein and Russ B. Altman, Missing value estimation methods for DNA microarrays BIOINFORMATICS Vol. 17

----------See Also----------

set.seed, save


----------Examples----------


library(impute)
data(khanmiss)
khan.expr <- khanmiss[-1, -(1:2)]
##
## First example
##
if(exists(".Random.seed")) rm(.Random.seed)
khan.imputed <- impute.knn(as.matrix(khan.expr))
##
## khan.imputed$data should now contain the imputed data matrix
## khan.imputed$rng.seed should contain the random number seed used
## in imputation. In the above invocation, it is the default seed.
##
khan.imputed$rng.seed # should be 362436069
khan.imputed$rng.state # should be NULL
##
## Second example
##
set.seed(12345)
saved.state <- .Random.seed
khan.imputed <- impute.knn(as.matrix(khan.expr))
# Assuming all goes well with no guarantees in case of error...
.Random.seed <- khan.imputed$rng.state
sum(saved.state - khan.imputed$rng.state) # should be zero!
save(khan.imputed, file="khanimputation.Rda")

When reposting, please credit 生物统计家园网 (http://www.biostatistic.net).


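Notes:
Note 1: For the convenience of learners, this document was translated by the 生物统计家园网 robot LoveR. It is intended for personal reference when learning R only; 生物统计家园 retains the copyright.
Note 2: Because the translation was produced automatically, inaccuracies are inevitable. Comparing the translated text carefully against the English original helps when learning R.
Note 3: If you find an inaccuracy, please reply to this post and it will be revised over time.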