varSelRFBoot(varSelRF)
varSelRFBoot() (R package: varSelRF)
Bootstrap the variable selection procedure in varSelRF
Translator: biostatistic.net robot LoveR
Description
Use the bootstrap to estimate the prediction error rate (with the .632+ rule) and the stability of the variable selection procedure implemented in varSelRF.
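For reference, a common formulation of the .632+ rule (Efron & Tibshirani, 1997) combines the apparent (resubstitution) error with the leave-one-out bootstrap error, weighting the latter more heavily as the estimated amount of overfitting grows. The function below is a minimal base-R sketch of that formula only; `err632plus` and its arguments are hypothetical names, and varSelRFBoot's internal computation may differ in details such as how the apparent error is obtained.

```r
# Hypothetical helper: .632+ bootstrap error estimate (Efron & Tibshirani, 1997).
# err_train: apparent (resubstitution) error rate
# err_loo  : leave-one-out bootstrap error rate
# y, yhat  : observed and predicted classes (factors), used for the no-information rate
err632plus <- function(err_train, err_loo, y, yhat) {
  lev <- levels(y)
  p <- as.numeric(table(factor(y, levels = lev))) / length(y)        # observed class proportions
  q <- as.numeric(table(factor(yhat, levels = lev))) / length(yhat)  # predicted class proportions
  gamma <- sum(p * (1 - q))      # no-information error rate
  err1 <- min(err_loo, gamma)    # cap the LOO bootstrap error at gamma
  R <- if (err_loo > err_train && gamma > err_train) {
    (err1 - err_train) / (gamma - err_train)  # relative overfitting rate
  } else 0
  R <- min(max(R, 0), 1)
  w <- 0.632 / (1 - 0.368 * R)   # weight on the LOO bootstrap error
  (1 - w) * err_train + w * err1
}
```

With no overfitting (R = 0) this reduces to the plain .632 estimate; with maximal overfitting (R = 1) it returns the capped leave-one-out error.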
Usage
varSelRFBoot(xdata, Class, c.sd = 1,
mtryFactor = 1, ntree = 5000, ntreeIterat = 2000,
vars.drop.frac = 0.2, bootnumber = 200,
whole.range = TRUE,
recompute.var.imp = FALSE,
usingCluster = TRUE,
TheCluster = NULL, srf = NULL, verbose = TRUE, ...)
Arguments
xdata
A data frame or matrix, with subjects/cases in rows and variables in columns. NAs are not allowed.
Class
The dependent variable; must be a factor.
c.sd
The factor that multiplies the sd, used to decide when to stop the iterations or how to choose the final solution. See the reference for details.
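One reading of how c.sd enters the choice of the final solution (for whole.range = TRUE) is a "c.sd standard errors" rule: among all forests, keep the one with the fewest variables whose OOB error is within c.sd standard errors of the minimum OOB error. The sketch below assumes that rule and assumes the standard errors are supplied; it is not varSelRF's own code, and `select_solution` is a hypothetical name.

```r
# Hypothetical sketch of a "within c.sd standard errors of the minimum" rule.
# n_vars : number of variables in each forest of the elimination sequence
# oob_err: OOB error of each forest; oob_se: its standard error (assumed given)
# Returns the index of the selected forest.
select_solution <- function(n_vars, oob_err, oob_se, c.sd = 1) {
  best <- which.min(oob_err)
  threshold <- oob_err[best] + c.sd * oob_se[best]
  ok <- which(oob_err <= threshold)  # forests within c.sd SEs of the minimum
  ok[which.min(n_vars[ok])]          # the smallest of those forests
}
```

Larger values of c.sd trade a little OOB error for a smaller set of variables; c.sd = 0 simply picks the forest with the minimum OOB error.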
mtryFactor
The multiplication factor of √(number.of.variables) for the number of variables to use for the mtry argument of randomForest.
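As a worked example of the scaling, the sketch below assumes mtry is computed as floor(sqrt(p) * mtryFactor), mirroring randomForest's default floor(sqrt(p)) for classification; the exact rounding inside varSelRF is an assumption, and `mtry_for` is a hypothetical helper.

```r
# Hypothetical: mtry implied by mtryFactor for a forest with n_vars variables,
# assuming floor(sqrt(n_vars) * mtryFactor), with at least one variable per split.
mtry_for <- function(n_vars, mtryFactor = 1) {
  max(1, floor(sqrt(n_vars) * mtryFactor))
}
mtry_for(30)     # floor(5.477...) = 5
mtry_for(30, 2)  # floor(10.95...) = 10
```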
ntree
The number of trees to use for the first forest; same as ntree for randomForest.
ntreeIterat
The number of trees to use (ntree of randomForest) for all additional forests.
vars.drop.frac
The fraction of variables, from those in the previous forest, to exclude at each iteration.
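To see what vars.drop.frac means in practice, the sketch below lists the sizes of the successive forests when that fraction is dropped at each step, down to two variables. Rounding to the nearest integer (and always dropping at least one variable) is an assumption for illustration; varSelRF's exact rounding may differ. `forest_sizes` is a hypothetical name.

```r
# Hypothetical: number of variables in each successive forest when a fraction
# vars.drop.frac of the current variables is excluded at every iteration.
forest_sizes <- function(n_vars, vars.drop.frac = 0.2) {
  sizes <- n_vars
  while (n_vars > 2) {
    keep <- round(n_vars * (1 - vars.drop.frac))
    n_vars <- max(2, min(n_vars - 1, keep))  # always drop at least one variable
    sizes <- c(sizes, n_vars)
  }
  sizes
}
forest_sizes(30)  # 30 24 19 15 12 10 8 6 5 4 3 2
```

Because the drop is proportional, large forests shrink quickly while the last few iterations remove only one variable at a time.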
whole.range
If TRUE, continue dropping variables until a forest with only two variables is built, and choose the best model from the complete series of models. If FALSE, stop the iterations if the current OOB error becomes larger than the initial OOB error (plus c.sd*OOB standard error) or if the current OOB error becomes larger than the previous OOB error (plus c.sd*OOB standard error).
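The whole.range = FALSE stopping condition can be sketched as below. Which standard error is added in each comparison (here, that of the reference forest: the initial one or the previous one) is an assumption, and `stop_iteration` is a hypothetical helper, not part of varSelRF.

```r
# Hypothetical sketch of the whole.range = FALSE stopping rule.
# oob_err, oob_se: OOB error and its standard error for each successive forest.
# Returns the index of the first forest that triggers the stop, or NA if none does.
stop_iteration <- function(oob_err, oob_se, c.sd = 1) {
  for (i in seq_along(oob_err)[-1]) {
    worse_than_initial  <- oob_err[i] > oob_err[1] + c.sd * oob_se[1]
    worse_than_previous <- oob_err[i] > oob_err[i - 1] + c.sd * oob_se[i - 1]
    if (worse_than_initial || worse_than_previous) return(i)
  }
  NA_integer_
}
```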
recompute.var.imp
If TRUE, recompute variable importances at each new iteration.
bootnumber
The number of bootstrap samples to draw.
usingCluster
If TRUE, use a cluster to parallelize the calculations.
TheCluster
The name of the cluster, if one is used.
srf
An object of class varSelRF. If used, the ntree and mtryFactor parameters are taken from this object, not from the arguments to this function. It also makes it possible to skip the first iteration, i.e., building the random forest on the complete, original data set.
verbose
Give more information about what is being done.
...
Not used.
Details
If a cluster is used for the calculations, it will be used for the embarrassingly parallel task of building as many random forests as there are bootstrap samples.
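Because each bootstrap replicate is independent, the work maps directly onto a parallel apply. The sketch below shows the general pattern with the base `parallel` package; `run_bootstrap` and the `fit_and_predict` callback (standing in for one varSelRF run plus out-of-bag prediction) are hypothetical, and varSelRFBoot's own cluster handling via TheCluster may differ.

```r
library(parallel)

# Hypothetical sketch of an embarrassingly parallel bootstrap loop.
# fit_and_predict(in_bag, out_of_bag) is a placeholder for one replicate's work.
run_bootstrap <- function(n, bootnumber, fit_and_predict, cl = NULL) {
  # Draw all bootstrap indices up front on the master, so results are
  # reproducible with set.seed() whether or not a cluster is used.
  idx <- lapply(seq_len(bootnumber), function(b) sample.int(n, n, replace = TRUE))
  one_rep <- function(in_bag) {
    out_of_bag <- setdiff(seq_len(n), in_bag)  # cases never drawn in this replicate
    fit_and_predict(in_bag, out_of_bag)
  }
  if (is.null(cl)) lapply(idx, one_rep) else parLapply(cl, idx, one_rep)
}
```

To run it in parallel, create a cluster with `cl <- makeCluster(2)`, pass `cl = cl`, and call `stopCluster(cl)` when done.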
Value
An object of class varSelRFBoot, which is a list with the following components:
<table summary="R valueblock">
<tr valign="top"><td>number.of.bootsamples</td> <td> The number of bootstrap replicates.</td></tr>
<tr valign="top"><td>bootstrap.pred.error</td> <td> The .632+ estimate of the prediction error.</td></tr>
<tr valign="top"><td>leave.one.out.bootstrap</td> <td> The leave-one-out bootstrap estimate of the error rate (used when computing the .632+ estimate).</td></tr>
<tr valign="top"><td>all.data.randomForest</td> <td> A random forest built from all the data, but after the variable selection. Beware: its OOB error rate is therefore severely biased downward.</td></tr>
<tr valign="top"><td>all.data.vars</td> <td> The variables selected in the run with all the data.</td></tr>
<tr valign="top"><td>all.data.run</td> <td> An object of class varSelRF: the one obtained from a run of varSelRF on the original, complete data set. See varSelRF.</td></tr>
<tr valign="top"><td>class.predictions</td> <td> The out-of-bag predictions from the bootstrap, of type "response". See predict.randomForest. This is an array with dimensions number of cases by number of bootstrap replicates.</td></tr>
<tr valign="top"><td>prob.predictions</td> <td> The out-of-bag predictions from the bootstrap, of type "prob" (class probabilities). See predict.randomForest. This is a 3-way array whose last dimension is the bootstrap replicate; for each bootstrap replicate, the 2-D slice has dimensions number of cases by number of classes, and each value is the probability of belonging to that class.</td></tr>
<tr valign="top"><td>number.of.vars</td> <td> A vector with the number of variables selected for each bootstrap sample.</td></tr>
<tr valign="top"><td>overlap</td> <td> The "overlap" between the variables selected in the run on the original sample and the variables returned from a bootstrap sample. The overlap between the sets of variables A and B is defined as $\frac{|A \cap B|}{\sqrt{|A|\,|B|}}$, i.e., the size (cardinality) of the intersection of the two sets divided by the square root of the product of their sizes.</td></tr>
<tr valign="top"><td>all.vars.in.solutions</td> <td> A vector with all the variables (genes) selected in the runs on all the bootstrap samples. If the same variable is selected in several bootstrap runs, it appears multiple times in this vector.</td></tr>
<tr valign="top"><td>all.solutions</td> <td> Each solution is a character string with all the variables in that solution concatenated by a "+". Thus, all.solutions is a vector, of length number.of.bootsamples, holding the solution from each bootstrap run.</td></tr>
<tr valign="top"><td>Class</td> <td> The original Class argument.</td></tr>
<tr valign="top"><td>allBootRuns</td> <td> A list of length number.of.bootsamples. Each component of this list is an object of class varSelRF and stores the results of the run on one bootstrap sample.</td></tr>
</table>
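The stability summaries above can be reproduced or extended with a few lines of base R. The helpers below are a hypothetical sketch: `set_overlap` implements the overlap definition given for the overlap component, `selection_freq` tabulates how often each variable appears in a vector shaped like all.vars.in.solutions, and `split_solution` undoes the "+" concatenation used in all.solutions.

```r
# Overlap between two sets of selected variables: |A intersect B| / sqrt(|A| * |B|).
set_overlap <- function(a, b) {
  length(intersect(a, b)) / sqrt(length(unique(a)) * length(unique(b)))
}

# Fraction of bootstrap runs in which each variable was selected, from a vector
# with one entry per selected variable per run (like all.vars.in.solutions).
selection_freq <- function(all_vars, bootnumber) {
  sort(table(all_vars) / bootnumber, decreasing = TRUE)
}

# Split one "+"-concatenated solution string back into variable names.
split_solution <- function(s) strsplit(s, "+", fixed = TRUE)[[1]]
```

An overlap of 1 means a bootstrap run selected exactly the same variables as the run on the original data; values near 0 indicate an unstable selection.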
Note
The out-of-bag predictions stored in class.predictions and prob.predictions are NOT the OOB votes from the random forest itself for a given run. They are predictions for the out-of-bag samples of each bootstrap replicate, so these samples were not used at all in any step of the variable selection for that replicate.
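Predictions of this kind are what the leave-one-out bootstrap error is built from: for each case, average the misclassification over the replicates in which it was out-of-bag, then average over cases. The sketch below assumes a cases-by-replicates character matrix in which entries are NA wherever the case was in the bootstrap sample; that storage convention, and the name `loo_boot_error`, are assumptions rather than a description of how varSelRFBoot stores class.predictions.

```r
# Hypothetical: leave-one-out bootstrap error from out-of-bag class predictions.
# class_pred: cases x replicates matrix of predicted labels, NA where in-bag.
# y: observed classes.
loo_boot_error <- function(class_pred, y) {
  y <- as.character(y)
  per_case <- vapply(seq_along(y), function(i) {
    p <- class_pred[i, ]
    p <- p[!is.na(p)]                             # replicates where case i was out-of-bag
    if (length(p) == 0) NA_real_ else mean(p != y[i])
  }, numeric(1))
  mean(per_case, na.rm = TRUE)                    # cases never out-of-bag are skipped
}
```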
Author(s)
Ramon Diaz-Uriarte <a href="mailto:rdiaz02@gmail.com">rdiaz02@gmail.com</a>
References
Breiman, L. (2001) Random forests. Machine Learning, 45, 5–32.
Diaz-Uriarte, R. & Alvarez de Andres, S. (2005) Variable selection from random forests: application to gene expression data. Tech. report. http://ligarto.org/rdiaz/Papers/rfVS/randomForestVarSel.html
Efron, B. & Tibshirani, R. J. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. American Statistical Association, 92, 548–560.
Svetnik, V., Liaw, A., Tong, C. & Wang, T. (2004) Application of Breiman's random forest to modeling structure-activity relationships of pharmaceutical molecules. Pp. 334–343 in F. Roli, J. Kittler, and T. Windeatt (eds.). Multiple Classifier Systems, Fifth International Workshop, MCS 2004, Proceedings, 9–11 June 2004, Cagliari, Italy. Lecture Notes in Computer Science, vol. 3077. Berlin: Springer.
See Also
randomForest, varSelRF, summary.varSelRFBoot, plot.varSelRFBoot
Examples
## Not run:
library(varSelRF)
## This is a small example, but it can take some time.
x <- matrix(rnorm(25 * 30), ncol = 30)
x[1:10, 1:2] <- x[1:10, 1:2] + 2
cl <- factor(c(rep("A", 10), rep("B", 15)))
rf.vs1 <- varSelRF(x, cl, ntree = 200, ntreeIterat = 100,
vars.drop.frac = 0.2)
rf.vsb <- varSelRFBoot(x, cl,
bootnumber = 10,
usingCluster = FALSE,
srf = rf.vs1)
rf.vsb
summary(rf.vsb)
plot(rf.vsb)
## End(Not run)
Source: biostatistic.net (http://www.biostatistic.net); please credit the source when reposting.
Note: This document was produced by the biostatistic.net robot LoveR for personal R learning reference only; biostatistic.net reserves the copyright.