R stepwiseCM package: help documentation for the Curve.generator() function

Posted on 2012-2-26 15:13:10
Curve.generator(stepwiseCM)
Curve.generator() belongs to the R package: stepwiseCM

A function to generate accuracies at different cut points.

Translator: 生物统计家园网 robot LoveR

----------Description----------

For given clinical and molecular data, this function first calculates the predicted class labels of the training set and the proximity matrices, using the clinical and the molecular data separately. In the second step, based on the classification performance of the two data types and the distribution of the samples in the two data spaces, it calculates the reclassification score (RS) for the test set. It then produces a vector of accuracies by passing different percentages of the samples to the molecular data. The resulting curve can be used as a reference for choosing the RS threshold for incoming test samples.
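The last step of this description can be sketched in base R. This is an illustration only, not the stepwiseCM implementation: the helper `accuracy_curve()`, the toy vectors, the 0% to 100% grid in steps of 10, and the assumption that samples with the highest RS are passed to molecular data first are all hypothetical.

```r
# Illustrative sketch only; NOT the stepwiseCM implementation.
# Assumption: samples with the highest RS are passed to molecular data first.
accuracy_curve <- function(pred_cli, pred_gen, rs, truth,
                           percents = seq(0, 100, by = 10)) {
  stopifnot(length(pred_cli) == length(pred_gen),
            length(pred_cli) == length(rs),
            length(pred_cli) == length(truth))
  n <- length(truth)
  ord <- order(rs, decreasing = TRUE)       # highest RS first
  sapply(percents, function(p) {
    k <- round(n * p / 100)                 # number of samples passed on
    final <- pred_cli                       # start from the clinical predictions
    if (k > 0) {
      idx <- ord[seq_len(k)]
      final[idx] <- pred_gen[idx]           # replace with molecular predictions
    }
    mean(final == truth)                    # accuracy at this percentage
  })
}

# Hypothetical toy data for ten test samples
truth    <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
pred_cli <- c(0, 0, 1, 1, 0, 1, 1, 0, 0, 1)
pred_gen <- c(0, 0, 0, 0, 0, 1, 1, 1, 0, 1)
rs       <- c(0.1, 0.2, 0.9, 0.8, 0.3, 0.2, 0.1, 0.7, 0.6, 0.4)

acc <- accuracy_curve(pred_cli, pred_gen, rs, truth)
names(acc) <- paste0(seq(0, 100, by = 10), "%")
print(acc)
```

In this toy case the accuracy rises while the samples that the clinical classifier gets wrong are handed over, then flattens, which is the kind of shape one would read a threshold from.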


----------Usage----------


Curve.generator(train.cli, train.gen, train.label, test.cli, test.gen,
    test.label, type = c("TSP", "GLM", "GLM_L1", "GLM_L2", "PAM", "SVM",
    "plsrf_x", "plsrf_x_pv", "RF"), RStype = c("rank", "proximity", "both"),
    Parallel = FALSE, CVtype = c("loocv", "k-fold"), outerkfold = 5,
    innerkfold = 5, ncpus = 2, N = 50, featurenames = NULL, plot.it = TRUE)



----------Arguments----------

Argument: train.cli
A data frame or matrix containing the clinical predictors of the training set, where columns correspond to samples and rows to features.


Argument: train.gen
A data frame or matrix containing the molecular predictors of the training set, where columns correspond to samples and rows to features.
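Clinical tables are often stored with one row per sample, which is the opposite of the orientation described here. The sketch below shows the transposition on made-up data; the object names and values are hypothetical, and the check that both data types share the same samples in the same order is an assumption rather than something this page states.

```r
# Hypothetical clinical table stored with one row per sample.
set.seed(1)
cli_by_sample <- data.frame(age = rnorm(6, 60, 10),
                            stage = sample(1:3, 6, replace = TRUE))
rownames(cli_by_sample) <- paste0("S", 1:6)

# The documented orientation is features in rows, samples in columns: transpose.
cli_features_by_samples <- t(as.matrix(cli_by_sample))

# Hypothetical molecular matrix already in features x samples orientation.
gen_features_by_samples <- matrix(rnorm(20 * 6), nrow = 20,
                                  dimnames = list(paste0("gene", 1:20),
                                                  paste0("S", 1:6)))

# Assumption: both data types describe the same samples in the same order.
stopifnot(ncol(cli_features_by_samples) == ncol(gen_features_by_samples),
          identical(colnames(cli_features_by_samples),
                    colnames(gen_features_by_samples)))
dim(cli_features_by_samples)
```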


Argument: train.label
A vector of the class labels (0 or 1) of the training set. NOTE: response values must be numeric, not a factor.
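The note about numeric labels matters because calling as.numeric() on a factor returns the internal level codes, not the original values. A short base R illustration with made-up labels:

```r
# A factor-coded outcome (hypothetical example values).
outcome <- factor(c("1", "0", "0", "1", "1"))

# as.numeric() on a factor returns the level codes (1, 2, ...), not 0/1.
wrong <- as.numeric(outcome)

# Converting through character recovers the numeric 0/1 labels.
labels <- as.numeric(as.character(outcome))

stopifnot(is.numeric(labels), all(labels %in% c(0, 1)))
print(wrong)
print(labels)
```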


Argument: test.cli
A data frame or matrix containing the clinical predictors of the test set, where columns correspond to samples and rows to features.


Argument: test.gen
A data frame or matrix containing the molecular (genomic) predictors of the test set, where columns correspond to samples and rows to features.


Argument: test.label
A vector of the class labels (0 or 1) of the test set (optional). NOTE: response values must be numeric, not a factor.


Argument: type
Type of classification algorithm. Nine algorithms are currently available: top scoring pair (TSP), logistic regression (GLM), GLM with L1 (lasso) penalty (GLM_L1), GLM with L2 (ridge) penalty (GLM_L2), prediction analysis for microarrays (PAM), support vector machine (SVM), random forest after partial least squares dimension reduction (plsrf_x), random forest after partial least squares dimension reduction plus prevalidation (plsrf_x_pv), and random forest (RF). NOTE: the TSP, PAM, plsrf_x and plsrf_x_pv algorithms do not work with clinical data.


Argument: RStype
Which values are used to calculate the reclassification score (RS). Three options are available: "proximity", "rank" and "both". If set to "proximity", RS is calculated directly from the proximity values. If set to "rank", RS is calculated from the ranks of the proximity values (more robust). If set to "both", both are calculated. Default is "rank".
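Why ranks are described as more robust can be seen in a small base R example. It does not show how RS itself is computed, which this page does not specify; it only shows that ranks ignore the size of an extreme proximity value. The numbers are made up.

```r
# Hypothetical proximity values of one test sample to five training samples.
prox <- c(0.10, 0.12, 0.15, 0.11, 0.95)   # the last value is extreme

# Raw values: the extreme entry dominates a summary such as the mean.
mean(prox)

# Ranks: only the ordering matters, so shrinking the extreme value changes nothing.
r1 <- rank(prox)
prox[5] <- 0.20
r2 <- rank(prox)
identical(r1, r2)
```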


Argument: Parallel
Whether class prediction and proximity calculation should use parallel processing. Default is FALSE.


Argument: CVtype
Cross-validation type: "loocv" (leave-one-out) or "k-fold".


Argument: outerkfold
Number of cross-validation folds used in the training phase.


Argument: innerkfold
Number of cross-validation folds used to estimate the model parameters.
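One way to picture the two fold counts is as nested cross-validation: outer folds produce the training-set predictions, inner folds tune the model within each outer round. The sketch below only generates fold indices in base R; `make_folds()` is a hypothetical helper, and the nesting shown is an assumption about how the two arguments relate, not the package's internal code.

```r
# Minimal sketch of k-fold index assignment (not the package's internal code).
make_folds <- function(n, k) {
  stopifnot(k >= 2, k <= n)
  sample(rep(seq_len(k), length.out = n))   # shuffled fold label per sample
}

set.seed(42)
n_train <- 40
outer_fold <- make_folds(n_train, k = 5)    # plays the role of outerkfold = 5

for (fold in 1:5) {
  held_out <- which(outer_fold == fold)     # samples predicted in this round
  fit_on   <- which(outer_fold != fold)     # samples used to build the classifier
  inner_fold <- make_folds(length(fit_on), k = 5)  # role of innerkfold = 5
  # ... tune parameters on the inner folds, then predict held_out ...
}
table(outer_fold)
```

With leave-one-out cross-validation the same idea applies with as many folds as samples.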


Argument: ncpus
Number of CPUs assigned to the parallel computation.


Argument: N
Number of repetitions used for calculating the proximity matrix; the final proximity matrix is the average over these repetitions. We recommend setting this number high so that a more stable proximity matrix is produced. Default is 50.
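The effect of averaging over N repetitions can be illustrated in base R. `one_proximity()` is a hypothetical stand-in that returns a random symmetric matrix; it is not how stepwiseCM computes proximities, and it only serves to show that the element-wise average is less noisy than a single run.

```r
# Stand-in for one noisy proximity estimate (hypothetical helper,
# not a stepwiseCM function).
one_proximity <- function(n) {
  m <- matrix(runif(n * n), n, n)
  m <- (m + t(m)) / 2        # make it symmetric
  diag(m) <- 1               # a sample is maximally close to itself
  m
}

set.seed(123)
n_samples <- 5
N <- 50
reps <- replicate(N, one_proximity(n_samples), simplify = FALSE)

# Element-wise average over the N repetitions.
avg_prox <- Reduce(`+`, reps) / N

# Every true off-diagonal value is 0.5 here, so the spread across entries
# is pure noise; it is smaller after averaging.
off <- upper.tri(avg_prox)
c(single = sd(reps[[1]][off]), averaged = sd(avg_prox[off]))
```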


Argument: featurenames
Feature names in the molecular data (e.g. gene or probe names). If given, the function also returns the names of the features selected during the training and test phases. Feature selection only works with the "TSP", "GLM_L1" and "GLM_L2" algorithms. "RF" provides feature importance.


Argument: plot.it
If set to TRUE, the function produces a plot in which the Y axis denotes the accuracy and the X axis denotes the percentage of samples passed to the second stage. To make this plot, the class labels and the molecular data of the test set must be given. Default is TRUE.


----------Details----------

This function requires the molecular information for a group of samples (called the test set here) that are not used in the training phase. Based on their RS, an accuracy curve is obtained by setting different thresholds. If the option type has length 2 (two classification algorithm types are given), the first one is used for the prediction with clinical data and the second one for the prediction with molecular data. If only one algorithm type is given, the same algorithm is used for both data types. Note that the TSP, PAM, plsrf_x and plsrf_x_pv algorithms do not work with clinical data.
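The rule for the type option can be written down as a small check. `resolve_types()` is a hypothetical helper, not part of stepwiseCM, and the package may validate its input differently; the sketch only restates the rule above in code.

```r
# Hypothetical helper illustrating the rule described above; not part of stepwiseCM.
resolve_types <- function(type) {
  all_types <- c("TSP", "GLM", "GLM_L1", "GLM_L2", "PAM", "SVM",
                 "plsrf_x", "plsrf_x_pv", "RF")
  not_for_clinical <- c("TSP", "PAM", "plsrf_x", "plsrf_x_pv")
  stopifnot(length(type) %in% 1:2, all(type %in% all_types))
  type <- rep(type, length.out = 2)     # a single type is reused for both data sets
  if (type[1] %in% not_for_clinical)
    stop(type[1], " does not work with clinical data")
  list(clinical = type[1], molecular = type[2])
}

resolve_types(c("GLM_L1", "GLM_L2"))   # first for clinical, second for molecular
resolve_types("RF")                    # same algorithm for both
msg <- tryCatch(resolve_types("PAM"), error = function(e) conditionMessage(e))
print(msg)
```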


----------Value----------

A list object with the following components:


Component: Predicted.cli
A list containing the predicted class labels of the training set and, if a test set is given, of the test set. This is the classification result with clinical data.


Component: Predicted.gen
A list containing the predicted class labels of the training set and, if a test set is given, of the test set. This is the classification result with molecular data.


Component: Proximity
A list of proximity matrices: the proximity matrix of the test samples in the clinical data space (an ncol(test.cli) by ncol(train.cli) matrix) and the proximity matrix of the training samples in the genomic data space (an ncol(train.gen) by ncol(train.gen) matrix).


Component: RS
If RStype is set to "rank" (or "proximity"), a vector of reclassification scores calculated with the rank (or proximity) approach; otherwise a matrix with two columns and as many rows as there are test samples, calculated with both approaches.


Component: Accuracy
If test.gen is given, the accuracies obtained when different percentages of the samples are classified with molecular data. If RStype is set to "rank" or "proximity", Accuracy is a vector. If RStype is set to "both", Accuracy is a matrix with two columns and eleven rows: the first column holds the accuracies when RS is calculated from the ranks of the proximity values, the second column the accuracies when RS is calculated from the proximity values directly.
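A possible way to read a threshold from such an accuracy vector is sketched below in base R. The accuracy values are made up, the mapping of the eleven entries to 0%, 10%, ..., 100% of samples is an assumption, and the selection rule (smallest percentage within a tolerance of the best accuracy) is one hypothetical choice among many.

```r
# Hypothetical accuracy vector with eleven entries. Assumption: the entries
# correspond to 0%, 10%, ..., 100% of samples passed to molecular data.
percent  <- seq(0, 100, by = 10)
accuracy <- c(0.60, 0.66, 0.71, 0.75, 0.78, 0.79, 0.80, 0.80, 0.79, 0.80, 0.80)

# Smallest percentage whose accuracy is within `tol` of the best value.
tol <- 0.025
best <- max(accuracy)
chosen <- percent[which(accuracy >= best - tol)[1]]
chosen
```

A rule of this kind trades a small loss in accuracy for fewer samples needing molecular profiling.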


Component: Param
A list containing the parameter values specified by the user.


Component: Matrices
A list containing the input data matrices.


----------Author(s)----------



Askar Obulkasim

Maintainer: Askar Obulkasim <askar.wubulikasimu@vumc.nl>




----------See Also----------

Classifier, Classifier.par, Proximity, RS.generator


----------Examples----------


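library(stepwiseCM)

# Load the CNS example data
data(CNS)
# Clinical data: transpose so that rows are features and columns are samples
tr.cli <- t(CNS$cli[1:40, ])
te.cli <- t(CNS$cli[41:60, ])
# Molecular data: samples are selected by column
tr.gen <- CNS$mrna[, 1:40]
te.gen <- CNS$mrna[, 41:60]
# Class labels of the training and test samples
tr.label <- CNS$class[1:40]
te.label <- CNS$class[41:60]
# GLM_L1 is used for the clinical data, GLM_L2 for the molecular data
result <- Curve.generator(train.cli=tr.cli, train.gen=tr.gen, train.label=tr.label, test.cli=te.cli,
                         test.gen=te.gen, test.label=te.label, type = c("GLM_L1", "GLM_L2"),
                         RStype = "rank", Parallel = FALSE, CVtype = "k-fold", outerkfold = 2,
                         innerkfold = 2, N = 2)
# List the components of the returned object
names(result)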

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

