R language, pamr package: help documentation for the pamr.train() function

Posted on 2012-9-24 00:45:20
pamr.train(pamr)
pamr.train() belongs to the R package: pamr

                                         A function to train a nearest shrunken centroid classifier

                                         Translator: 生物统计家园网 robot LoveR

----------Description----------

A function that computes a nearest shrunken centroid classifier for gene expression (microarray) data.


----------Usage----------


pamr.train(data, gene.subset = NULL, sample.subset = NULL,
           threshold = NULL, n.threshold = 30,
           scale.sd = TRUE, threshold.scale = NULL, se.scale = NULL,
           offset.percent = 50, hetero = NULL, prior = NULL,
           remove.zeros = TRUE, sign.contrast = "both",
           ngroup.survival = 2)



----------Arguments----------

Argument: data
The input data. A list with components: x, an expression matrix (genes in the rows, samples in the columns), and y, a vector of the class labels for each sample. Optional components: genenames, a vector of gene names, and geneid, a vector of gene identifiers.
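
A minimal sketch of the input structure described above, built in base R. The toy dimensions, gene names and identifiers are made-up placeholders for illustration only.

```r
# Hypothetical toy data: 6 genes (rows) by 4 samples (columns)
set.seed(1)
x <- matrix(rnorm(6 * 4), nrow = 6, ncol = 4)
y <- factor(c("A", "A", "B", "B"))      # one class label per sample (column)

mydata <- list(
  x = x,
  y = y,
  genenames = paste("gene", 1:6),       # optional component, placeholder names
  geneid = as.character(1001:1006)      # optional component, placeholder ids
)

# Sanity checks: one label per column, one name per row
stopifnot(ncol(mydata$x) == length(mydata$y))
stopifnot(nrow(mydata$x) == length(mydata$genenames))
```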


Argument: gene.subset
Subset of genes to be used. Can be either a logical vector whose length equals the total number of genes, or a vector of integers giving the row numbers of the genes to be used.


Argument: sample.subset
Subset of samples to be used. Can be either a logical vector whose length equals the total number of samples, or a vector of integers giving the column numbers of the samples to be used.
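
The two accepted forms of a subset select the same columns. The sketch below shows this in base R on a small made-up matrix; it does not call pamr itself.

```r
# Hypothetical class labels for 6 samples
y <- factor(c(1, 2, 4, 3, 4, 1))

keep_logical <- y != 4          # logical vector, one entry per sample
keep_integer <- which(y != 4)   # the same selection as column numbers

# Toy expression matrix: 3 genes by 6 samples
x <- matrix(seq_len(3 * 6), nrow = 3)

# Both forms pick out the same samples
identical(x[, keep_logical], x[, keep_integer])
```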


Argument: threshold
A vector of threshold values for the centroid shrinkage. Default is a set of 30 values chosen by the software.


Argument: n.threshold
Number of threshold values desired (default 30).


Argument: scale.sd
Scale each threshold by the within-class standard deviations? Default: TRUE


Argument: threshold.scale
Additional scaling factors to be applied to the thresholds. Vector of length equal to the number of classes. Default: a vector of ones.


Argument: se.scale
Vector of scaling factors for the within-class standard errors. Default is sqrt(1/n.class-1/n), where n is the overall sample size and n.class is the vector of sample sizes in each class. This default adjusts for different class sizes.
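
The default formula above can be evaluated directly in base R. This is only an illustration of sqrt(1/n.class-1/n) on made-up class labels, not a call into pamr.

```r
# Hypothetical labels: class sizes 3, 5 and 2
y <- factor(c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3))

n <- length(y)          # overall sample size
n.class <- table(y)     # sample size in each class

se.scale <- sqrt(1 / n.class - 1 / n)
print(round(se.scale, 4))
```

Smaller classes receive larger scaling factors, which is how the default adjusts for unequal class sizes.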


Argument: offset.percent
Fudge factor added to the denominator of each t-statistic, expressed as a percentile of the gene standard deviation values. This is a small positive quantity to penalize genes with expression values near zero, which can result in very large ratios. This factor is especially important for Affy data. Default is 50, i.e. the median of the per-gene standard deviations.
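
A rough sketch of the idea, in base R. The per-gene standard deviations and the centroid difference below are made up, and computing the percentile with quantile() is an assumption about the mechanics rather than a statement of pamr's internals.

```r
set.seed(2)
gene_sd <- abs(rnorm(100, mean = 1, sd = 0.3))   # hypothetical per-gene SDs

offset.percent <- 50
s0 <- quantile(gene_sd, offset.percent / 100)    # percentile of the SD values

# A gene with a tiny SD no longer yields a huge ratio once s0 is added
d <- 0.2            # hypothetical centroid difference
tiny_sd <- 0.001
c(without_offset = d / tiny_sd,
  with_offset    = d / (tiny_sd + unname(s0)))
```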


Argument: hetero
Should a heterogeneity transformation be done? If yes, hetero must be set to one of the class labels (see Details below). Default is no (hetero=NULL).


Argument: prior
Vector whose length is the number of classes, representing prior probabilities for each of the classes. The prior is used in Bayes' rule for making class predictions. Default is NULL, and prior is then taken to be n.class/n, where n is the overall sample size and n.class is the vector of sample sizes in each class.
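
The default prior described above is simply the class proportions in the training data. A base R illustration on made-up labels:

```r
# Hypothetical labels: 3 samples of A, 2 of B, 1 of C
y <- factor(c("A", "A", "A", "B", "B", "C"))

n.class <- table(y)
prior <- n.class / length(y)    # n.class / n
print(prior)
```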


Argument: remove.zeros
Remove threshold values yielding zero genes? Default: TRUE


Argument: sign.contrast
Directions of allowed deviations of class-wise average gene expression from the overall average gene expression. Default is "both" (positive or negative). Can also be set to "positive" or "negative".


Argument: ngroup.survival
Number of groups formed for survival data. Default: 2


----------Details----------

pamr.train fits a nearest shrunken centroid classifier to gene expression data. Details may be found in the PNAS paper referenced below. One feature not described there is "heterogeneity analysis". Suppose there are two classes labelled "A" and "B". Class "A" is considered a normal class, and "B" an abnormal class. Setting hetero="A" transforms expression values x[i,j] to |x[i,j] - mean(x[i,j])|, where the mean is taken only over samples in class "A". The transformed feature values are then used in PAM. This is useful when the abnormal class "B" is heterogeneous, i.e. a given gene might have higher expression than normal for some class "B" samples, and lower for others. With more than 2 classes, each class is centered on the class specified by hetero.
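
The heterogeneity transformation can be sketched in base R as follows. This is an illustration of the formula |x[i,j] - mean(x[i,j])| on a made-up 2-gene matrix, not pamr's own code; the variable names are hypothetical.

```r
# Toy matrix: 2 genes (rows) by 4 samples (columns)
x <- matrix(c(1, 2, 3, 10,
              5, 5, 5,  1), nrow = 2, byrow = TRUE)
y <- factor(c("A", "A", "B", "B"))
hetero <- "A"                       # the "normal" class

# Per-gene mean taken only over samples in the normal class
norm_cent <- rowMeans(x[, y == hetero, drop = FALSE])

# Absolute deviation from the normal-class mean; norm_cent is recycled
# down each column, so every gene is centered on its own mean
x_het <- abs(x - norm_cent)
print(x_het)
```

Gene 1 is high in one class "B" sample and gene 2 is low in another; after the transformation both show up as large positive values.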


----------Value----------

A list with components:


Component: y
The outcome classes.


Component: yhat
A matrix of predicted classes, each column representing the results from one threshold.



Component: prob
An array of predicted class probabilities, of dimension n by nclass by n.threshold, where n is the number of samples, nclass is the number of classes, and n.threshold is the number of thresholds tried.


Component: centroids
A matrix of (unshrunken) class centroids, with one row per gene and one column per class.


Component: hetero
Value of hetero used in the call to pamr.train.


Component: norm.cent
Centroid of the "normal" group, if hetero was specified.


Component: centroid.overall
A vector containing the (unshrunken) overall centroid (all classes together).


Component: sd
A vector of the standard deviations for each gene.


Component: threshold
A vector of the thresholds tried in the shrinkage.


Component: nonzero
A vector of the number of genes that survived the thresholding, for each threshold value tried.


Component: threshold.scale
A vector of the threshold scale factors that were used.


Component: se.scale
A vector of the standard error scale factors that were used.


Component: call
The calling sequence used.


Component: prior
The prior probabilities used.


Component: errors
The number of training errors for each threshold value.
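
To illustrate how the errors component relates to y and yhat as described above, the sketch below counts mismatches per threshold column in base R. The predictions are made up, and the column names t1 to t3 are hypothetical; this is not pamr's own code.

```r
# True classes for 5 hypothetical samples
y <- factor(c("A", "A", "B", "B", "B"))

# Made-up predictions at three thresholds, one column per threshold
yhat <- data.frame(
  t1 = factor(c("A", "A", "B", "B", "B"), levels = levels(y)),
  t2 = factor(c("A", "B", "B", "B", "B"), levels = levels(y)),
  t3 = factor(c("B", "B", "B", "B", "B"), levels = levels(y))
)

# Number of training errors for each threshold
errors <- sapply(yhat, function(pred) sum(pred != y))
print(errors)
```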


----------Author(s)----------


Trevor Hastie, Robert Tibshirani, Balasubramanian Narasimhan, and Gilbert Chu



----------References----------

Diagnosis of multiple cancer types by shrunken centroids of gene expression

----------Examples----------


library(pamr)

# generate some data
set.seed(120)
x <- matrix(rnorm(1000 * 20), ncol = 20)
y <- sample(1:4, size = 20, replace = TRUE)
mydata <- list(x = x, y = factor(y))

# train classifier
results <- pamr.train(mydata)

# train classifier on all data except class 4
results2 <- pamr.train(mydata, sample.subset = (mydata$y != 4))

# train classifier on only the first 500 genes
results3 <- pamr.train(mydata, gene.subset = 1:500)
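
As a possible follow-up to the examples above, the threshold, nonzero and errors components listed under Value can be tabulated side by side to see how many genes survive and how many training errors occur at each threshold. This sketch assumes the pamr package is installed (it is skipped otherwise) and that these three components have equal length; fit and summary_tab are names chosen for illustration.

```r
# Runs only if pamr is available
if (requireNamespace("pamr", quietly = TRUE)) {
  set.seed(120)
  x <- matrix(rnorm(1000 * 20), ncol = 20)
  y <- sample(1:4, size = 20, replace = TRUE)
  mydata <- list(x = x, y = factor(y))

  fit <- pamr::pamr.train(mydata)

  # One row per threshold tried
  summary_tab <- data.frame(
    threshold = fit$threshold,
    nonzero   = fit$nonzero,
    errors    = fit$errors
  )
  print(head(summary_tab))
}
```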


When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

