R language rocc package: help documentation for the o.rocc() function

loveR · posted 2012-9-27 22:40:26

o.rocc(rocc)
R package containing o.rocc(): rocc

LOOCV using the ROC based classifier

Translator: LoveR, the 生物统计家园网 robot

----------Description----------

The function performs classification by leave-one-out cross-validation (LOOCV) using the ROC based classifier: features are combined into a metagene by their mean expression, and samples are ranked according to the metagene expression. The metagene threshold that yields optimal accuracy in the training samples is then used to classify new samples according to their metagene expression values.

----------Usage----------

o.rocc(g, out, xgenes = 200)

----------Arguments----------

Argument: g
the input data in the form of a matrix with genes as rows and samples as columns. rownames(g) and colnames(g) must be specified.

Argument: out
describes the phenotype of the samples. A factor vector with levels 0 and 1 (in this order) and with as many values as there are samples.

Argument: xgenes
number of genes in the classifier. A numeric vector of length at least 1.
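
The requirements above can be checked before calling o.rocc(). The following is a minimal sketch in base R; `check_rocc_inputs` is a hypothetical helper written for illustration, not a function of the rocc package, and it only tests the conditions stated in the argument descriptions.

```r
# Hypothetical helper (not part of rocc): verifies the documented input
# requirements for g, out and xgenes.
check_rocc_inputs <- function(g, out, xgenes) {
  stopifnot(
    is.matrix(g), is.numeric(g),
    !is.null(rownames(g)), !is.null(colnames(g)),  # both dimnames must be set
    is.factor(out),
    identical(levels(out), c("0", "1")),           # levels 0 and 1, in this order
    length(out) == ncol(g),                        # one phenotype value per sample
    is.numeric(xgenes), length(xgenes) >= 1
  )
  invisible(TRUE)
}

set.seed(1)
g <- matrix(rnorm(20 * 6), ncol = 6)
rownames(g) <- paste("Gene", 1:20, sep = "_")
colnames(g) <- paste("Sample", 1:6, sep = "_")
# Giving levels explicitly fixes the order as 0, 1 even if one class is absent
out <- factor(c(0, 1, 1, 0, 1, 0), levels = c(0, 1))

check_rocc_inputs(g, out, xgenes = c(5, 10))
```

Building the factor with an explicit `levels = c(0, 1)` is the safer choice when the phenotype is sampled at random, because as.factor() would drop a level that happens not to occur.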

----------Details----------

For feature selection, the genes are ranked by AUC value and the top-ranking genes, as many as specified by xgenes, are picked (all AUCs below 0.5 are mirrored). To obtain AUC values for a given gene signature, an arithmetic mean is computed by summing up the expression values after multiplying by -1 the expression values of genes negatively associated with the feature (AUC below 0.5). The resulting expression values of the metagenes thus formed are then used in ROC analysis.

The optimal split of positive (i.e., 1) and negative (i.e., 0) samples is determined as the metagene expression threshold that produces the highest accuracy (correct class assignments with respect to the real class) in the training set. The split yielding optimal accuracy in the ROC curve is determined using the package ROCR. The threshold is computed as the mean metagene expression value of the two samples at the border of the split.

For a new sample to be classified, the metagene expression value is determined with the same genes multiplied by -1. The new sample is classified according to the side of the threshold on which it falls: a sample with higher metagene expression is classified as positive (i.e., 1) and a sample with lower expression as negative (i.e., 0).

Performance of the classifier is estimated in the dataset by leave-one-out cross-validation, with feature selection and classifier specification repeated in each loop to ensure that the classifier has not seen the left-out sample.
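
The procedure described above can be illustrated with a small base R sketch. It is not the rocc implementation: the package determines the split with ROCR, whereas this sketch computes AUCs with the rank (Mann-Whitney) formula and searches the midpoints between neighbouring metagene values directly. The names `auc`, `train_metagene` and `predict_metagene` are hypothetical, and the simulated data are constructed so that genes 1 to 10 carry a strong signal.

```r
# AUC of one gene for separating class 1 from class 0 (rank-based formula)
auc <- function(x, y) {
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(rank(x)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Select genes, build the metagene and pick the most accurate threshold
train_metagene <- function(g, out, xgenes) {
  y <- as.integer(as.character(out))
  a <- apply(g, 1, auc, y = y)
  # Mirroring AUCs below 0.5 is equivalent to ranking by distance from 0.5
  top <- order(abs(a - 0.5), decreasing = TRUE)[seq_len(xgenes)]
  sgn <- ifelse(a[top] < 0.5, -1, 1)       # flip negatively associated genes
  meta <- colMeans(g[top, , drop = FALSE] * sgn)
  s <- sort(meta)
  cand <- (head(s, -1) + tail(s, -1)) / 2  # midpoints between neighbouring samples
  acc <- sapply(cand, function(t) mean((meta > t) == (y == 1)))
  list(genes = rownames(g)[top], sign = sgn,
       threshold = unname(cand[which.max(acc)]))
}

# Classify one new sample (named numeric vector of expression values)
predict_metagene <- function(fit, x) {
  as.integer(mean(x[fit$genes] * fit$sign) > fit$threshold)
}

# Simulated data: genes 1-5 are higher and genes 6-10 lower in class 1
set.seed(42)
g <- matrix(rnorm(50 * 20), ncol = 20)
rownames(g) <- paste("Gene", 1:50, sep = "_")
colnames(g) <- paste("Sample", 1:20, sep = "_")
out <- factor(rep(c(0, 1), each = 10), levels = c(0, 1))
cls1 <- which(out == "1")
g[1:5, cls1] <- g[1:5, cls1] + 6
g[6:10, cls1] <- g[6:10, cls1] - 6

fit <- train_metagene(g, out, xgenes = 10)

# LOOCV: gene selection and threshold are recomputed without sample i
loo <- sapply(seq_len(ncol(g)), function(i) {
  fit_i <- train_metagene(g[, -i], out[-i], xgenes = 10)
  predict_metagene(fit_i, g[, i])
})
mean(loo == as.integer(as.character(out)))  # LOOCV accuracy
```

Repeating the gene selection inside the loop is the important design point: selecting genes once on all samples and only re-fitting the threshold would let the left-out sample influence the classifier.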

----------Value----------

a list (orocc object) with components:

confusion: a matrix containing the classifier performance in leave-one-out cross-validation for each gene signature size given in xgenes. The following measures are returned: accuracy; lower and upper bounds of the 95 percent confidence interval; the largest prior class (accuracy null); the p value from the binomial test that the accuracy is different from accuracy null; the accuracy obtained by random assignment (using prior distributions); the p value from the binomial test that the accuracy is different from random assignment; sensitivity; specificity; positive predictive value; negative predictive value; prevalence; contingency table values of predicted versus true class; and the balanced accuracy, calculated as (sensitivity + specificity)/2.

concordance: a matrix that contains the predicted classification obtained by leave-one-out cross-validation. Additionally, the true classes (out) are shown.

method: the classification method used: ROC.based.predictor.
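
To make the measures in the confusion component concrete, here is a short base R sketch that derives several of them from a vector of predictions and true classes. The data are made up for illustration, and the use of binom.test() and of the sum of squared class priors for the random-assignment accuracy are assumptions about how such values can be obtained, not a description of rocc's internal code.

```r
# Hypothetical LOOCV predictions and true classes for 10 samples
pred  <- factor(c(1, 1, 0, 0, 1, 0, 1, 0, 0, 1), levels = c(0, 1))
truth <- factor(c(1, 0, 0, 0, 1, 0, 1, 1, 0, 1), levels = c(0, 1))

tab <- table(predicted = pred, true = truth)
tp <- as.numeric(tab["1", "1"])
tn <- as.numeric(tab["0", "0"])
fp <- as.numeric(tab["1", "0"])
fn <- as.numeric(tab["0", "1"])
n  <- sum(tab)

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
ppv         <- tp / (tp + fp)   # positive predictive value
npv         <- tn / (tn + fn)   # negative predictive value
prevalence  <- (tp + fn) / n
balanced    <- (sensitivity + specificity) / 2

# Accuracy null: proportion of the largest class
prior    <- as.numeric(table(truth)) / n
acc_null <- max(prior)
# Assumption: random assignment draws labels according to the class priors
acc_random <- sum(prior^2)

# Exact binomial test of the observed accuracy against accuracy null;
# conf.int holds the 95 percent confidence interval of the accuracy
bt <- binom.test(tp + tn, n, p = acc_null)
c(accuracy = accuracy, lower = bt$conf.int[1], upper = bt$conf.int[2],
  p_vs_null = bt$p.value, balanced_accuracy = balanced)
```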

----------Note----------

Depends on the package ROCR.

----------Author(s)----------

Martin Lauss

----------References----------

Lauss M, Frigyesi A, Ryden T, Hoglund M. Robust assignment of cancer subtypes from expression data using a uni-variate gene expression average as classifier. BMC Cancer 2010 (in print)

----------See Also----------

tr.rocc(), p.rocc()

----------Examples----------

## Random dataset and phenotype (small dataset for demonstration)
## Dataset should be a matrix
set.seed(100)
g <- matrix(rnorm(1000 * 25), ncol = 25)
rownames(g) <- paste("Gene", 1:1000, sep = "_")
colnames(g) <- paste("Sample", 1:25, sep = "_")
## Phenotype should be a factor with levels 0 and 1:
out <- as.factor(sample(c(0:1), size = 25, replace = TRUE))
## Set the size of the gene signature
xgenes <- c(50, 500)

####### o.rocc

results <- o.rocc(g, out, xgenes)
results

####### performance of a given gene set by LOOCV in independent data

## load given genes (or take $genes from tr.rocc output)
genes <- paste("Gene", 1:50, sep = "_")
## load validation data
set.seed(101)
f <- matrix(rnorm(1000 * 25), ncol = 25)
rownames(f) <- paste("Gene", 1:1000, sep = "_")
colnames(f) <- paste("Sample", 1:25, sep = "_")
outf <- as.factor(sample(c(0:1), size = 25, replace = TRUE))
## reduce validation set to gene signature genes
f <- f[genes, ]
## use all genes of reduced dataset for LOOCV
xgenes <- length(genes)

resultval <- o.rocc(f, outf, xgenes)
resultval

######### o.rocc results can be reproduced as a LOOCV with the tr.rocc and p.rocc functions

results$concordance[, "50"]

## now with a LOOCV loop of tr.rocc and p.rocc
## pr holds one prediction per sample, initialised to NA
pr <- as.numeric(rep(NA, ncol(g)))
pr <- factor(pr, levels = c(0, 1))

for (i in seq_len(ncol(g))) {
  e <- g[, -i]                          # training data without sample i
  oute <- out[-i]
  train <- tr.rocc(e, oute, xgenes = 50)
  procc <- p.rocc(train, g[, i])        ## ignore warnings, they don't apply here
  pr[i] <- procc
}

all.equal(results$concordance[, "50"], pr)
# TRUE

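When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R; 生物统计家园 retains the copyright.
Note 2: Because the translation is automatic, inaccuracies are unavoidable. Carefully comparing the Chinese and English text while reading can help with learning R.
Note 3: If you find an inaccuracy, please reply below this post and we will gradually revise it.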
