R CMA package: help documentation for the GeneSelection() function

loveR · posted 2012-2-25 15:21:22

GeneSelection(CMA)
GeneSelection() belongs to the R package: CMA

 General method for variable selection with various methods

 Translator: LoveR, the robot of 生物统计家园网 (biostatistic.net)

----------Description----------

For the different learning data sets defined by the argument learningsets, this method either ranks the genes from the most relevant to the least relevant using one of several 'filter' criteria, or provides a sparse collection of variables (Lasso, ElasticNet, Boosting). The results are typically used for variable selection in the classification procedure that follows. For S4 class information, see GeneSelection-methods.

----------Usage----------

GeneSelection(X, y, f, learningsets,
              method = c("t.test", "welch.test", "wilcox.test", "f.test",
                         "kruskal.test", "limma", "rfe", "rf", "lasso",
                         "elasticnet", "boosting", "golub", "shrinkcat"),
              scheme, trace = TRUE, ...)

----------Arguments----------

Argument: X
Gene expression data. Can be one of the following:

A matrix. Rows correspond to observations, columns to variables.

A data.frame, when f is not missing (see below).

An object of class ExpressionSet.

Argument: y
Class labels. Can be one of the following:

A numeric vector.

A factor.

A character, if X is an ExpressionSet.

missing, if X is a data.frame and a proper formula f is provided.

Argument: f
A two-sided formula, if X is a data.frame. The left-hand side corresponds to the class labels, the right-hand side to the variables.

Argument: learningsets
An object of class learningsets. May be missing, in which case the complete dataset is used as the learning set.

Argument: method
A character specifying the method to be used:

t.test: Two-sample t-test (equal variances assumed for both classes).

welch.test: Welch modification of the t-test (unequal variances for the two classes).

wilcox.test: Wilcoxon rank sum test.

f.test: F test of the linear hypothesis that the mean is the same for all classes. Usually used for the multiclass scheme; equivalent to method = t.test in the two-class case.

kruskal.test: Multi-class generalization of the Wilcoxon rank sum test, and the nonparametric counterpart of the F test.

limma: 'Moderated t' statistic for the two-class case and 'moderated F' statistic for the multiclass case, described in Smyth (2003). Requires the package limma.

rfe: One-step Recursive Feature Elimination, based on the Support Vector Machine. The method is described in Guyon et al. (2002). Requires the package e1071. Take care that appropriate hyperparameters are passed via the ... argument.

rf: Random Forest Variable Importance Measure. Requires the package randomForest.

lasso: L1 penalized logistic regression, which leads to sparsity with respect to the variables used. Calls the function LassoCMA, which requires the package glmpath. Warning: take care that appropriate hyperparameters are passed via the ... argument.

elasticnet: Penalized logistic regression with both L1 and L2 penalty, claimed by Zou and Hastie (2005) to select 'variable groups'. Calls the function ElasticNetCMA, which requires the package glmpath. Warning: take care that appropriate hyperparameters are passed via the ... argument.

boosting: Componentwise boosting (Buehlmann and Yu, 2003) has been shown to mimic the LASSO (Efron et al., 2004; Buehlmann and Yu, 2006). Calls the function compBoostCMA. Take care that appropriate hyperparameters are passed via the ... argument.

golub: The (theoretically unfounded) variable selection criterion used by Golub et al. (1999), see golub.

shrinkcat: The correlation-adjusted t-score from Zuber and Strimmer (2009).
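
To make the idea of a 'filter' criterion concrete, the following is a minimal base R sketch on simulated data. It is not the CMA implementation: the data, the variable names (X, y, tstat, fstat) and the shift applied to the first three columns are all assumptions made for illustration. It ranks variables by the absolute equal-variance t statistic and checks the equivalence with the F test in the two-class case mentioned under f.test.

```r
# Illustrative sketch in base R; NOT the code used inside CMA.
set.seed(1)
n <- 20                                   # number of observations
p <- 50                                   # number of variables ("genes")
y <- factor(rep(c(0, 1), each = n / 2))   # simulated two-class labels
X <- matrix(rnorm(n * p), nrow = n)       # rows = observations, columns = variables
X[y == 1, 1:3] <- X[y == 1, 1:3] + 5      # assumption: first three variables are informative

# Equal-variance two-sample t statistic, one per column
tstat <- apply(X, 2, function(g) t.test(g ~ y, var.equal = TRUE)$statistic)

# One-way ANOVA F statistic, one per column
fstat <- apply(X, 2, function(g) anova(lm(g ~ y))[1, "F value"])

# Rank variables from most relevant to least relevant
rank_t <- order(abs(tstat), decreasing = TRUE)
rank_f <- order(fstat, decreasing = TRUE)

head(rank_t, 5)                           # indices of the five top-ranked variables
all.equal(unname(tstat)^2, fstat)         # with two classes, F equals t squared
```

Because F is the square of t when there are only two classes, both criteria order the variables in the same way, which is why the two methods are described as equivalent in that case.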

Argument: scheme
The scheme to be used in the case of a non-binary response. Must be one of "pairwise", "one-vs-all" or "multiclass". The last option only makes sense if method is one of f.test, limma, rf, boosting, which can be applied directly to the multiclass case.
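
The difference between the "pairwise" and "one-vs-all" schemes can be sketched in base R as follows. This only illustrates how a multiclass response is broken into binary problems; it does not reproduce CMA internals, and the label vector and object names (one_vs_all, pairwise) are hypothetical.

```r
# Illustrative sketch in base R; not CMA internals.
y <- factor(c("A", "B", "C", "A", "B", "C", "A", "C"))  # hypothetical 3-class labels

# one-vs-all: one binary problem per class (class k against all the others)
one_vs_all <- lapply(levels(y), function(k) as.integer(y == k))
names(one_vs_all) <- levels(y)

# pairwise: one binary problem per pair of classes,
# using only the observations that belong to that pair
pairs <- combn(levels(y), 2, simplify = FALSE)
pairwise <- lapply(pairs, function(pr) which(y %in% pr))
names(pairwise) <- vapply(pairs, paste, character(1), collapse = "-vs-")

one_vs_all$A            # binary indicator: 1 for class A, 0 otherwise
pairwise[["A-vs-B"]]    # observation indices used for the A versus B comparison
length(pairs)           # number of pairwise problems for three classes
```

With K classes, one-vs-all yields K binary problems on all observations, whereas pairwise yields K(K-1)/2 problems, each on a subset of the observations.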

Argument: trace
Should progress be traced? Default is TRUE.

Argument: ...
Further arguments passed to the function performing variable selection, see method.

----------Value----------

An object of class genesel.

----------Note----------

Most of the methods described above are only suited to the binary classification case. The only ones that can be used without restriction in the multiclass case are

f.test

kruskal.test

rf

boosting

For the rest, pairwise or one-vs-all schemes are used.
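
As an illustration of a criterion that handles more than two classes directly, the sketch below ranks variables by the Kruskal-Wallis statistic using base R only. It is a hedged example on simulated data, not the CMA code; the three-class labels, the matrix X and the shift applied to the first column are assumptions.

```r
# Illustrative sketch in base R; NOT the code used inside CMA.
set.seed(2)
y <- factor(rep(c("A", "B", "C"), each = 8))            # simulated three-class labels
X <- matrix(rnorm(length(y) * 30), nrow = length(y))    # rows = observations
X[, 1] <- X[, 1] + 8 * (as.integer(y) - 2)              # assumption: variable 1 separates all classes

# Kruskal-Wallis statistic, one per column, computed on all classes at once
kw <- apply(X, 2, function(g) kruskal.test(g, y)$statistic)

# Rank variables from most relevant to least relevant
ranking <- order(kw, decreasing = TRUE)
head(ranking, 5)
```

No recoding into binary problems is needed here, which is the sense in which such a method can be used "without restriction" in the multiclass case.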

----------Author(s)----------

Martin Slawski ms@cs.uni-sb.de

Anne-Laure Boulesteix boulesteix@ibe.med.uni-muenchen.de

Christoph Bernau bernau@ibe.med.uni-muenchen.de

----------References----------

Smyth, G. K., Yang, Y.-H., Speed, T. P. (2003). Statistical issues in microarray data analysis. Methods in Molecular Biology 224, 111-136.
Guyon, I., Weston, J., Barnhill, S., Vapnik, V. (2002). Gene Selection for Cancer Classification using support vector machines. Machine Learning, 46, 389-422.
Zou, H., Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67(2), 301-320.
Buehlmann, P., Yu, B. (2003). Boosting with the L2 loss: Regression and Classification. Journal of the American Statistical Association, 98, 324-339.
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R. (2004). Least Angle Regression. Annals of Statistics, 32, 407-499.
Buehlmann, P., Yu, B. (2006). Sparse Boosting. Journal of Machine Learning Research, 7, 1001-1024.
Slawski, M., Daumer, M., Boulesteix, A.-L. (2008). CMA - A comprehensive Bioconductor package for supervised classification with high dimensional data. BMC Bioinformatics 9: 439.

----------See Also----------

filter, GenerateLearningsets, tune

----------Examples----------

# load the CMA package, which provides GeneSelection() and the golub data
library(CMA)
# load Golub AML/ALL data
data(golub)
### extract class labels (first column)
golubY <- golub[,1]
### extract gene expression (all columns except the class labels)
golubX <- as.matrix(golub[,-1])
### Generate five different learningsets (stratified 5-fold cross-validation)
set.seed(111)
five <- GenerateLearningsets(y=golubY, method = "CV", fold = 5, strat = TRUE)
### simple t-test:
selttest <- GeneSelection(golubX, golubY, learningsets = five, method = "t.test")
### show result:
show(selttest)
### top 10 genes for the first learningset
toplist(selttest, k = 10, iter = 1)
plot(selttest, iter = 1)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

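Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the robot of 生物统计家园网. It is intended only as a personal reference for learning R; 生物统计家园 retains the copyright.
Note 2: Because the translation was produced automatically by a robot, inaccuracies are unavoidable. Carefully comparing the Chinese and English text while reading can help with learning R.
Note 3: If you find an inaccuracy, please reply below this post; we will revise the document over time.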
