R dualKS package: help documentation for the dksTrain() function

loveR · posted 2012-2-25 16:48:56

dksTrain(dualKS)
Package containing dksTrain(): dualKS

                                       Perform Dual KS Discriminant Analysis

                                       Translator: LoveR, the 生物统计家园网 robot

----------Description----------

This function performs dual KS discriminant analysis on a training set of gene expression data (in the form of an ExpressionSet) and a vector of classes indicating which of two or more classes each column of data belongs to. Genes are ranked by the degree to which they are upregulated or downregulated in each class, or both. Discriminant gene signatures are then extracted using dksSelectGenes and applied to new samples with dksClassify.

----------Usage----------

dksTrain(eset, class, type = "up", verbose=FALSE, weights=FALSE, logweights=TRUE, method='kort')

----------Arguments----------

Argument: eset
Gene expression data in the form of an ExpressionSet or matrix.

Argument: class
A factor with two or more levels indicating which class each sample in the expression set belongs to, OR an integer indicating which column of pData(eset) contains this information.

Argument: type
One of "up", "down", or "both", indicating whether to analyze and classify based on upregulated genes, downregulated genes, or both. Note that classification of samples based on downregulated genes from single color experiments should not be expected to work well, due to the noise at low expression levels. Therefore, 'down' or 'both' should only be used for two color experiments, or for one color data that has been converted to ratios based on some reference sample(s).
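One possible reading of "converted to ratios based on some reference sample(s)" is a per-gene log ratio against the mean of designated reference columns. The base R sketch below illustrates that reading only; the matrix expr, the ref_cols index, and the helper name to_log_ratios are assumptions made for illustration and are not part of dualKS.

```r
# Hypothetical helper (not part of dualKS): express each sample as a
# log2 ratio against the per-gene mean of the reference samples.
to_log_ratios <- function(expr, ref_cols) {
  ref_mean <- rowMeans(expr[, ref_cols, drop = FALSE])
  # ref_mean has one value per gene; it recycles down each column,
  # so every entry is divided by its own gene's reference mean.
  log2(expr / ref_mean)
}

# Toy one-color intensities: 2 genes x 4 samples, s1 and s2 are references.
expr <- matrix(c(100, 200, 400, 800,
                  50,  50,  25, 100),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB"), paste0("s", 1:4)))

ratios <- to_log_ratios(expr, ref_cols = 1:2)
print(ratios)
```

Since the eset argument is documented as accepting a matrix, a ratio matrix of this kind could presumably be supplied together with a class factor when type is 'down' or 'both'.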

Argument: verbose
Set to TRUE to print progress messages while data is being processed. Set to FALSE if you would rather spend CPU cycles on the analysis than on printing messages.

Argument: weights
Determines whether and how genes are weighted when building the signatures. See Details.

Argument: logweights
Should the weights be log10 transformed before they are applied?

Argument: method
Two methods are supported. The 'kort' method returns the maximum of the running sum. The 'yang' method returns the sum of the maximum and the minimum of the running sum, thereby penalizing genes that are highly enriched in one subset of samples of a given class but highly downregulated in another subset of that same class.
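The difference between the two methods can be seen on a toy running sum. The sketch below is a simplified illustration in base R, not the package's internal code: the helper name running_sum_score and the step sizes (+1/n for a class member, -1/m for a non-member) are assumptions chosen so that the running sum ends at zero.

```r
# Hypothetical helper (not dualKS internals): running-sum score for one gene.
running_sum_score <- function(expr, in_class, method = c("kort", "yang")) {
  method <- match.arg(method)
  ord <- order(expr, decreasing = TRUE)      # highest expression first
  hits <- in_class[ord]
  n_in <- sum(hits)
  n_out <- length(hits) - n_in
  # Assumed step sizes: up for a class member, down for a non-member.
  steps <- ifelse(hits, 1 / n_in, -1 / n_out)
  rs <- cumsum(steps)
  if (method == "kort") max(rs) else max(rs) + min(rs)
}

in_class <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
expr_a <- c(9, 8, 7, 3, 2, 1)   # gene A: every class member is high
expr_b <- c(9, 8, 1, 5, 4, 3)   # gene B: two members high, one member lowest

kort_a <- running_sum_score(expr_a, in_class, "kort")
yang_a <- running_sum_score(expr_a, in_class, "yang")
kort_b <- running_sum_score(expr_b, in_class, "kort")
yang_b <- running_sum_score(expr_b, in_class, "yang")
print(c(kort_a = kort_a, yang_a = yang_a, kort_b = kort_b, yang_b = yang_b))
```

In this toy case both methods give gene A essentially the same score, while gene B scores lower under 'yang' than under 'kort' because its running sum dips below zero when the low-expressing class member is reached last.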

----------Details----------

This function calculates the Kolmogorov-Smirnov rank sum statistic for each gene and each level of 'class'. The highest scoring genes can then be extracted for use in classification.

If weights=FALSE, signatures are defined based on the ranks of the members of each class when the samples are sorted on each gene. Genes for which a given class ranks highest when samples are sorted by that gene are included in the classifier, without regard to the absolute expression level of those genes. This is the classic KS statistic.

Highly discriminant genes identified in this way may or may not be the most highly expressed genes. As a result, signatures identified in this way have arbitrary "baseline" values. This may lead to misclassification when comparing two signatures (using, for example, dksClassify). Therefore, one may wish to weight genes based on absolute expression level, or on some other metric.

Setting weights = TRUE causes the genes to be weighted according to the log (base 10) of the relative rank of the mean expression of each gene in each class. Alternatively, you may provide your own weight matrix as the argument to weights. This matrix must have one column for each possible value of class, and one row for each gene in eset. Note that for type='down', or for the down component of type='both', the weight matrix will be inverted as 1-matrix, so the range of weights should be 0 to 1 for each class. NAs are handled "gracefully" by discarding any gene for which any column of the corresponding row of weights is NA. Our experience has been that weights that are a linear function of some feature of the gene expression (like the mean) can be too subtle. The effect of the weights can be increased by setting logweights=TRUE (which is the default).
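The shape requirements for a user-supplied weight matrix can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's own weighting code: the toy matrix expr, the factor cls, and the particular scheme (relative rank of the class mean, scaled into the 0 to 1 range) are all made up for the example.

```r
# Toy data: 4 genes x 6 samples in two classes (all names are illustrative).
expr <- matrix(c(9, 8, 7, 1, 2, 3,
                 5, 6, 5, 5, 6, 5,
                 1, 2, 1, 8, 9, 9,
                 3, 3, 4, 4, 3, 3),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("gene", 1:4), paste0("s", 1:6)))
cls <- factor(c("A", "A", "A", "B", "B", "B"))

# Mean expression of each gene within each class:
# one row per gene, one column per class level.
class_means <- sapply(levels(cls),
                      function(k) rowMeans(expr[, cls == k, drop = FALSE]))

# Assumed scheme: relative rank of the class mean, so weights lie in (0, 1].
w <- apply(class_means, 2, function(m) rank(m) / length(m))

# For type = "down" the text says the matrix is inverted as 1 - matrix,
# which is why the weights need to stay within 0 and 1.
w_down <- 1 - w

# A gene with NA in any class column would be discarded.
w_na <- w
w_na["gene2", "A"] <- NA
kept <- rownames(w_na)[complete.cases(w_na)]
print(w)
print(kept)
```

A matrix built this way has the documented layout (one row per gene, one column per class) and could be passed as the weights argument, provided its rows correspond to the genes in eset.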

----------Value----------

An object of class DKSGeneScores.

----------Author(s)----------

Eric J. Kort, Yarong Yang

----------See Also----------

dksTrain, dksSelectGenes, dksClassify, DKSGeneScores, DKSPredicted

----------Examples----------

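library(dualKS)                      # the example assumes the package is attached
data("dks")                          # example data; provides the 'eset' used below
tr <- dksTrain(eset, 1, "up")        # class labels are taken from column 1 of pData(eset)
cl <- dksSelectGenes(tr, 100)        # extract top-scoring genes (n = 100) as signatures
pr <- dksClassify(eset, cl)          # classify the samples using those signatures
summary(pr, pData(eset)[,1])         # compare predictions with the actual classes
show(pr)
plot(pr, actual=pData(eset)[,1])     # plot the results against the actual classes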

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

