tune(CMA)
tune() is from R package: CMA
Hyperparameter tuning for classifiers
----------Description----------
Most classifiers implemented in this package depend on one or even several hyperparameters (s. details) that should be optimized to obtain good (and comparable!) results. As tuning scheme, we propose three-fold cross-validation on each learningset (for fixed selected variables). Note that learningsets usually do not contain the complete dataset, so tuning involves a second level of splitting the dataset. Increasing the number of folds leads to larger datasets (and possibly to higher accuracy), but also to higher computing times.
----------Usage----------
tune(X, y, f, learningsets, genesel, genesellist = list(), nbgene, classifier, fold = 3, strat = FALSE, grids = list(), trace = TRUE, ...)
----------Arguments----------
X
Gene expression data. Can be one of the following:
A matrix. Rows correspond to observations, columns to variables.
A data.frame, when f is not missing (s. below).
An object of class ExpressionSet.
y
Class labels. Can be one of the following:
A numeric vector.
A factor.
A character if X is an ExpressionSet that specifies the phenotype variable.
missing, if X is a data.frame and a proper formula f is provided.
f
A two-sided formula, if X is a data.frame. The left part corresponds to the class labels, the right part to the variables.
learningsets
An object of class learningsets. May be missing; in that case, the complete dataset is used as learning set.
genesel
Optional (but usually recommended) object of class genesel containing variable importance information for the argument learningsets.
genesellist
In the case that the argument genesel is missing, this is an argument list passed to GeneSelection. If both genesel and genesellist are missing, no variable selection is performed.
nbgene
Number of best genes to be kept for classification, based on either genesel or the call to GeneSelection using genesellist. In the case that both are missing, this argument is not necessary. Note:
If the gene selection method has been one of "lasso", "elasticnet", "boosting", nbgene will be reset to min(s, nbgene), where s is the number of nonzero coefficients.
If the gene selection scheme has been "one-vs-all" or "pairwise" in the multiclass case, there exist several rankings. The top nbgene genes of each of them will be kept, so the number of effectively used genes will sometimes be much larger.
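As a small illustration of the capping rule above (a sketch with hypothetical numbers, not output from the package):

```r
## Sketch: how nbgene is capped for penalized selection methods.
## The values below are hypothetical, chosen only for illustration.
s <- 37          # suppose "lasso" selection yields 37 nonzero coefficients
nbgene <- 100    # requested number of genes
min(s, nbgene)   # effectively used number of genes: 37
```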
classifier
Name of a function ending with CMA indicating the classifier to be used.
fold
The number of cross-validation folds used within each learningset. Default is 3. Increasing fold will lead to higher computing times.
strat
Should stratified cross-validation according to the class proportions in the complete dataset be used? Default is FALSE.
grids
A named list. The names correspond to the arguments to be tuned, e.g. k (the number of nearest neighbours) for knnCMA, or cost for svmCMA. Each element is a numeric vector defining the grid of candidate values. Of course, several hyperparameters can be tuned simultaneously (though requiring much time). By default, grids is an empty list. In that case, a pre-defined list will be used, s. details.
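For illustration, a grids argument tuning two svmCMA hyperparameters at once could be built as follows (a sketch; the candidate values here are illustrative choices, not the package defaults):

```r
## Candidate grids for two hyperparameters of svmCMA with a radial kernel.
## Names must match the classifier's argument names; values are examples.
grids <- list(cost  = c(0.1, 1, 10, 100),
              gamma = 2^(-2:2))
## Every combination on the grid (4 x 5 = 20 here) is then evaluated
## by the inner cross-validation within each learningset.
```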
trace
Should progress be traced? Default is TRUE.
...
Further arguments to be passed to classifier; of course, none of the arguments to be tuned (!).
----------Details----------
The following default settings are used if the argument grids is an empty list:
gbmCMA n.trees = c(50, 100, 200, 500, 1000)
compBoostCMA mstop = c(50, 100, 200, 500, 1000)
LassoCMA norm.fraction = seq(from=0.1, to=0.9, length=9)
ElasticNetCMA norm.fraction = seq(from=0.1, to=0.9, length=5), lambda2 = 2^{-(5:1)}
plrCMA lambda = 2^{-4:4}
pls_ldaCMA comp = 1:10
pls_lrCMA comp = 1:10
pls_rfCMA comp = 1:10
rfCMA mtry = ceiling(c(0.1, 0.25, 0.5, 1, 2)*sqrt(ncol(X))), nodesize = c(1,2,3)
knnCMA k = 1:10
pknnCMA k = 1:10
scdaCMA delta = c(0.1, 0.25, 0.5, 1, 2, 5)
pnnCMA sigma = c(2^{-2:2})
nnetCMA size = 1:5, decay = c(0, 2^{-(4:1)})
svmCMA, kernel = "linear" cost = c(0.1, 1, 5, 10, 50, 100, 500)
svmCMA, kernel = "radial" cost = c(0.1, 1, 5, 10, 50, 100, 500), gamma = 2^{-2:2}
svmCMA, kernel = "polynomial" cost = c(0.1, 1, 5, 10, 50, 100, 500), degree = 2:4
----------Value----------
An object of class tuningresult.
----------Note----------
The computation time can be enormously high. Note that for each different learningset, the classifier must be trained fold times the number of possible hyperparameter combinations. E.g., if the number of learningsets is fifty, fold = 3, and two hyperparameters (each with 5 candidate values) are tuned, this amounts to 50 x 3 x 25 = 3750 classifier trainings.
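The count above can be sketched directly (illustrative arithmetic only, using the numbers from the example in the note):

```r
## Rough count of classifier trainings during tuning.
n_learningsets <- 50   # number of learning sets
fold <- 3              # inner cross-validation folds
grid_size <- 5 * 5     # all combinations of two 5-value grids
n_learningsets * fold * grid_size   # 3750 trainings
```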
----------Author(s)----------
Martin Slawski <ms@cs.uni-sb.de>
Anne-Laure Boulesteix <boulesteix@ibe.med.uni-muenchen.de>
Christoph Bernau <bernau@ibe.med.uni-muenchen.de>
----------References----------
Slawski, M., Daumer, M., Boulesteix, A.-L. (2008). CMA - A comprehensive Bioconductor package for supervised classification with high dimensional data. BMC Bioinformatics 9: 439.
----------See Also----------
tuningresult, GeneSelection, classification
----------Examples----------
## Not run:
### simple example for a one-dimensional grid, using compBoostCMA.
### dataset
data(golub)
golubY <- golub[,1]
golubX <- as.matrix(golub[,-1])
### learningsets
set.seed(111)
lset <- GenerateLearningsets(y = golubY, method = "CV", fold = 5, strat = TRUE)
### tuning after gene selection with the t.test
tuneres <- tune(X = golubX, y = golubY, learningsets = lset,
                genesellist = list(method = "t.test"),
                classifier = compBoostCMA, nbgene = 100,
                grids = list(mstop = c(50, 100, 250, 500, 1000)))
### inspect results
show(tuneres)
best(tuneres)
plot(tuneres, iter = 3)
## End(Not run)