dksTrain(dualKS)
dksTrain() belongs to R package: dualKS
Perform Dual KS Discriminant Analysis
Translator: 生物统计家园网, LoveR bot
----------Description----------
This function performs dual KS discriminant analysis on a training set of gene expression data (in the form of an ExpressionSet) and a vector of classes describing which of two or more classes each column of data corresponds to. Genes are ranked by the degree to which they are upregulated or downregulated in each class, or both. Discriminant gene signatures are then extracted using dksSelectGenes and applied to new samples with dksClassify.
----------Usage----------
dksTrain(eset, class, type = "up", verbose=FALSE, weights=FALSE, logweights=TRUE, method='kort')
----------Arguments----------
Argument: eset
Gene expression data in the form of an ExpressionSet or matrix
Argument: class
A factor with two or more levels indicating which class each sample in the expression set belongs to, or an integer indicating which column of pData(eset) contains this information.
Argument: type
One of "up", "down", or "both", indicating whether you want to analyze and classify based on up regulated genes, down regulated genes, or both. Note that classification of samples based on down regulated genes from single color experiments should not be expected to work well, due to the noise at low expression levels. Therefore, 'down' or 'both' should only be used for two color experiments, or for one color data that has been converted to ratios based on some reference sample(s).
Argument: verbose
Set to TRUE if you want more evidence of progress while data is being processed. Set to FALSE if you want your CPU cycles to be used on analysis and not printing messages.
Argument: weights
Value determines whether and how genes are weighted when building the signatures. See details.
Argument: logweights
Should the weights be log10 transformed prior to applying?
Argument: method
Two methods are supported. The 'kort' method returns the maximum of the running sum. The 'yang' method returns the sum of the maximum and the minimum of the running sum, thereby penalizing genes that are highly enriched in a subset of samples of a given class, but highly down regulated in another subset of that same class.
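As a rough illustration of the difference between the two methods, the base R sketch below computes a running sum for a single gene and scores it both ways. The helper names (ks_running_sum, ks_score) and the step sizes (+1/n for samples in the class, -1/m for all others) are assumptions made for this example only; the package's internal computation may differ.

```r
# Hypothetical helpers, not part of dualKS.
# Running sum over samples sorted by decreasing expression of one gene.
ks_running_sum <- function(expr, in_class) {
  hits <- in_class[order(expr, decreasing = TRUE)]
  # Assumed step sizes: +1/n for class members, -1/m for everyone else.
  steps <- ifelse(hits, 1 / sum(hits), -1 / sum(!hits))
  cumsum(steps)
}

ks_score <- function(expr, in_class, method = c("kort", "yang")) {
  method <- match.arg(method)
  rs <- ks_running_sum(expr, in_class)
  if (method == "kort") max(rs) else max(rs) + min(rs)
}

in_class <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

# High in half of the class, low in the other half.
split_gene <- c(10, 9, 1, 2, 5, 6, 4, 7)
ks_score(split_gene, in_class, "kort")  # 0.5
ks_score(split_gene, in_class, "yang")  # 0 (the low subset cancels the high one)

# High in every sample of the class.
up_gene <- c(10, 9, 8, 7, 1, 2, 3, 4)
ks_score(up_gene, in_class, "kort")     # 1
ks_score(up_gene, in_class, "yang")     # 1
```

In this toy case the 'yang' style score drops to zero for the gene that is split within the class, while a gene that is uniformly high in the class scores the same under both methods.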
----------Details----------
This function calculates the Kolmogorov-Smirnov rank sum statistic for each gene and each level of 'class'. The highest scoring genes can then be extracted for use in classification.
If weights=FALSE, signatures are defined based on the ranks of members of each class when sorted on each gene. Those genes for which a given class has the highest rank when sorting samples by those genes will be included in the classifier, with no regard to the absolute expression level of those genes. This is the classic KS statistic.
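The point that an unweighted, rank-based score ignores absolute expression can be checked with a small base R sketch. The function rank_score is a hypothetical stand-in (a maximum of a simple running sum), not the package's code; it only shows that rescaling a gene's values leaves such a score unchanged as long as the sample ordering is preserved.

```r
# Hypothetical rank-based score: depends only on the ordering of samples.
rank_score <- function(expr, in_class) {
  hits <- in_class[order(expr, decreasing = TRUE)]
  max(cumsum(ifelse(hits, 1 / sum(hits), -1 / sum(!hits))))
}

in_class <- c(TRUE, TRUE, FALSE, FALSE)

low_gene  <- c(0.5, 0.4, 0.1, 0.2)    # weakly expressed gene
high_gene <- low_gene * 1000 + 5000   # same ordering, much higher level

rank_score(low_gene, in_class)   # 1
rank_score(high_gene, in_class)  # 1
```

Both genes separate the class equally well by rank, so they receive the same score even though their expression levels differ by orders of magnitude.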
Very discriminant genes identified in this way may or may not be the highest expressed genes. The result is that signatures identified in this way have arbitrary "baseline" values. This may lead to misclassification when comparing two signatures (using, for example, dksClassify). Therefore, one may wish to weight genes based on absolute expression level, or some other metric.
Setting weights = TRUE causes the genes to be weighted according to the log (base 10) of the relative rank of the mean expression of each gene in each class. Alternatively, you may provide your own weight matrix as the argument to weights. This matrix must have one column for each possible value of class, and one row for each gene in eset. Note that for type='down' or the down component of type='both', the weight matrix will be inverted as 1-matrix, so the range of weights should be 0 - 1 for each class. NAs are handled "gracefully" by discarding any genes for which any column of the corresponding row of weights is NA. Our experience has been that weights that are a linear function of some feature of the gene expression (like mean) can be too subtle. The effect of the weights can be increased by setting logweights=TRUE (which is the default).
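A custom weight matrix with the shape described above (one row per gene, one column per class, values between 0 and 1) might be built as in the following base R sketch. The toy data, the object names, and the choice of weight (the relative rank of each gene's mean expression within a class) are assumptions for illustration; this is not necessarily how the package derives its own weights when weights=TRUE.

```r
set.seed(1)

# Toy expression matrix: 5 genes (rows) by 6 samples (columns).
expr <- matrix(rexp(5 * 6), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:6)))
cls <- factor(c("A", "A", "A", "B", "B", "B"))

# Mean expression of each gene within each class (genes x classes).
class_means <- sapply(levels(cls), function(k) {
  rowMeans(expr[, cls == k, drop = FALSE])
})

# Relative rank of each gene's class mean, scaled into (0, 1].
w <- apply(class_means, 2, function(m) rank(m) / length(m))

# The inversion described for type = 'down': 1 - matrix.
w_down <- 1 - w

dim(w)       # 5 genes by 2 classes
colnames(w)  # one column per level of cls
```

A matrix like w, with rows in the same order as the genes in eset, is the kind of object the text describes passing as the weights argument.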
----------Value----------
An object of class DKSGeneScores.
----------Author(s)----------
Eric J. Kort, Yarong Yang
----------See Also----------
dksTrain, dksSelectGenes, dksClassify, DKSGeneScores, DKSPredicted
----------Examples----------
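library(dualKS)
data("dks")                        # example data; provides the ExpressionSet 'eset'
tr <- dksTrain(eset, 1, "up")      # class labels come from column 1 of pData(eset)
cl <- dksSelectGenes(tr, 100)      # extract signatures from the highest-scoring genes
pr <- dksClassify(eset, cl)        # apply the signatures to the samples
summary(pr, pData(eset)[,1])       # compare predictions with the known classes
show(pr)
plot(pr, actual=pData(eset)[,1])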
Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).