R package anota: help documentation for the anotaPerformQc() function

loveR · posted 2012-2-25 12:01:26

anotaPerformQc(anota)
anotaPerformQc() belongs to the R package: anota

Perform quality control to ensure that the supplied data set is suitable for Analysis of Partial Variance (APV) within anota.

Translator: LoveR, the 生物统计家园网 robot

----------Description----------

Generates a distribution of interaction p-values, which is compared to the expected NULL distribution. Also assesses the frequency of highly influential data points using dfbetas for the regression slope and compares these dfbetas to those from randomly generated simulation data. Calculates omnibus class effects.
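The idea of comparing interaction p-values to their NULL distribution can be illustrated with base R alone. The sketch below is not anota's internal code: it simulates genes that share one slope across two hypothetical sample classes (`ctrl`, `treat`), tests each gene for a class-by-slope interaction by comparing two nested `lm` fits, and tabulates the resulting p-values, which should be roughly uniform when no interaction exists. All sizes and effect values are assumptions chosen for illustration.

```r
# Sketch only: per-gene interaction test on simulated null data (not anota internals).
set.seed(1)
n_genes <- 200                                # hypothetical number of genes
pheno   <- rep(c("ctrl", "treat"), each = 4)  # hypothetical sample classes
cls     <- factor(pheno)

interaction_p <- vapply(seq_len(n_genes), function(i) {
  total     <- rnorm(length(pheno))                        # cytosolic mRNA level
  polysomal <- 0.7 * total + rnorm(length(pheno), sd = 0.5) # translational activity, common slope
  reduced <- lm(polysomal ~ total + cls)  # common slope, class-specific intercepts
  full    <- lm(polysomal ~ total * cls)  # class-specific slopes (interaction)
  anova(reduced, full)[2, "Pr(>F)"]       # p-value for the interaction term
}, numeric(1))

# Under the null, each quarter of [0, 1] should hold about a quarter of the genes.
print(table(cut(interaction_p, breaks = seq(0, 1, by = 0.25), include.lowest = TRUE)))
```

A clear excess of small p-values in such a table would suggest that the common-slope assumption does not hold for the data set.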

----------Usage----------

anotaPerformQc(dataT=NULL, dataP=NULL, phenoVec=NULL,
generatePlot=FALSE, file="ANOTA_Total_vs_Polysomal_regressions.pdf",
nReg=200, correctionMethod="BH", useDfb=TRUE, useDfbSim=TRUE,
nDfbSimData=2000, useRVM=TRUE, onlyGroup=FALSE, useProgBar=TRUE)

----------Arguments----------

Argument: dataT
A matrix with cytosolic mRNA data. Non-numerical rownames are needed.

Argument: dataP
A matrix with translational activity data. Non-numerical rownames are needed.

Argument: phenoVec
A vector describing the sample classes (each class should have a unique identifier). Note that dataT, dataP and phenoVec must have the same sample order, so that column 1 in dataP is the translational activity data for a sample, column 1 in dataT is the cytosolic mRNA data for that same sample, and position 1 in phenoVec describes its sample class.
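The ordering requirement for dataT, dataP and phenoVec is easy to get wrong, so a small pre-flight check can help. The sketch below builds toy inputs and validates their shape; `check_anota_inputs` is a hypothetical helper, not part of anota, and the gene names, sample names and class labels are made up. It cannot verify that the columns really refer to the same samples, which remains the user's responsibility.

```r
# Toy inputs with hypothetical gene/sample names; values are random.
set.seed(42)
genes    <- paste0("gene", 1:5)   # non-numerical row names
samples  <- paste0("s", 1:6)
phenoVec <- c("ctrl", "ctrl", "ctrl", "treat", "treat", "treat")

dataT <- matrix(rnorm(30), nrow = 5, dimnames = list(genes, samples))
dataP <- matrix(rnorm(30), nrow = 5, dimnames = list(genes, samples))

# Hypothetical helper (not an anota function): basic shape checks only.
check_anota_inputs <- function(dataT, dataP, phenoVec) {
  stopifnot(
    is.matrix(dataT), is.matrix(dataP),
    identical(dim(dataT), dim(dataP)),           # same genes x samples layout
    !is.null(rownames(dataT)),
    identical(rownames(dataT), rownames(dataP)), # same genes in the same order
    length(phenoVec) == ncol(dataP),             # one class label per sample
    # row names must not be interpretable as numbers
    all(is.na(suppressWarnings(as.numeric(rownames(dataT)))))
  )
  invisible(TRUE)
}

check_anota_inputs(dataT, dataP, phenoVec)

# With the anota package installed, the call would then look like this (not run here):
# library(anota)
# qcOut <- anotaPerformQc(dataT = dataT, dataP = dataP, phenoVec = phenoVec)
```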

Argument: generatePlot
anota can plot the regression for each gene. However, as there are many genes, this output is normally not informative. Default is FALSE (no individual plotting).

Argument: file
If generatePlot is set to TRUE, use file to set the desired file name (the plots are written to the current directory as a pdf). Default is "ANOTA_Total_vs_Polysomal_regressions.pdf".

Argument: nReg
If generatePlot is set to TRUE, nReg can be used to limit the number of output plots. Default is 200.

Argument: correctionMethod
anota adjusts the omnibus interaction and sample class p-values for multiple testing. The correction method can be "Bonferroni", "Holm", "Hochberg", "SidakSS", "SidakSD", "BH", "BY", "ABH" or "TSBH" as implemented in the multtest package, or "qvalue" as implemented in the qvalue package. Default is "BH".
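To see what such an adjustment does to a set of p-values, the sketch below uses base R's `p.adjust` as a stand-in. This is a substitution for illustration: the text states that anota relies on the multtest and qvalue packages, and `p.adjust` only covers some of the listed methods (its method names are also spelled differently, for example "bonferroni"). The raw p-values are made up.

```r
# Made-up raw p-values for four genes.
p_raw <- c(0.001, 0.010, 0.040, 0.500)

# Base R stand-in for the multtest-based adjustment described in the text.
p_bh   <- p.adjust(p_raw, method = "BH")          # Benjamini-Hochberg (FDR)
p_bonf <- p.adjust(p_raw, method = "bonferroni")  # Bonferroni (FWER)

print(cbind(raw = p_raw, BH = p_bh, Bonferroni = p_bonf))
```

Bonferroni multiplies every p-value by the number of tests, whereas BH scales each p-value by the number of tests divided by its rank, which is why it is less conservative for the larger p-values.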

Argument: useDfb
Should anota assess the occurrence of highly influential data points? Default is TRUE.

Argument: useDfbSim
The random occurrence of dfbetas can be simulated. Default is TRUE. FALSE suppresses the simulation, which reduces computation time but makes interpretation of the dfbetas difficult.

Argument: nDfbSimData
If useDfbSim is TRUE, the user can select the number of samplings that will be performed per step (10 steps with different correlations between the translational activity and the cytosolic mRNA level). Default is 2000.

Argument: useRVM
The Random Variance Model (RVM) can be used for the omnibus sample class comparison. In this case the effect of RVM on the distribution of the interaction significances needs to be tested as well. The default (TRUE) leads to calculation of RVM p-values for both omnibus interactions and omnibus sample class effects.

Argument: onlyGroup
It is possible to suppress the omnibus interaction analysis and only perform the omnibus sample class effect analysis. Default is FALSE (analyse both interactions and sample class effects).

Argument: useProgBar
Should the progress bar be shown? Default is TRUE (show progress bar).

----------Details----------

anotaPerformQc performs the basic quality control of the data set. Two levels of quality control are assessed, both of which need to show good performance for valid application of anota.
First, anota assumes that there are no interactions (i.e., that the regression slope is shared across sample classes). The output for this analysis is both a density plot and a histogram of the raw p-values and of the p-values adjusted by the selected multiple testing correction method (if RVM was used, the second page shows the same presentation using RVM p-values). anota requires a uniform distribution of the raw interaction p-values for valid analysis of differential translation.
Second, anota assesses whether there are more data points with high influence on the regression analyses than would be expected by chance. anota identifies influential data points as data points that influence the slope of the regression, using standardized dfbeta (dfbetas). In the literature there are multiple suggestions of what should be regarded as an outlier dfbetas (dfbetas>1, dfbetas>2, dfbetas>3, dfbetas>(2/sqrt(N)), dfbetas>(3/sqrt(N)), dfbetas>(3.5*IQR)). Independent of which threshold is preferred, what is of interest is the comparison to the underlying distribution. As this distribution is unknown, random data sets are simulated assuming that the cytosolic mRNA level and the translationally active mRNA level are normally distributed and that there is a correlation between the two. Following such simulation, the frequencies of outlier dfbetas in the data (using all thresholds) are compared to the frequencies found in the simulated data set.
The function also performs an omnibus sample class effect test if there are more than 2 sample classes. It is possible to use RVM for the omnibus sample class statistics. If RVM is used, it is necessary to verify that the interaction RVM p-values also follow the expected NULL distribution.
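The dfbetas simulation described above can be sketched with base R's `lm` and `dfbetas`. This is an illustration under stated assumptions, not anota's implementation: the grid of 10 correlations, the sample size and the number of simulations per step are all chosen here for speed, and only three of the listed thresholds are evaluated.

```r
# Sketch only: frequency of outlier dfbetas for the slope in simulated data.
set.seed(123)
n_samples    <- 8                           # hypothetical number of samples per regression
n_sim        <- 200                         # hypothetical; the documented default per step is 2000
correlations <- seq(0.05, 0.95, by = 0.1)   # assumed grid of 10 correlation steps

# Absolute standardized influence of each sample on the regression slope.
slope_dfbetas <- function(total, polysomal) {
  fit <- lm(polysomal ~ total)
  abs(dfbetas(fit)[, "total"])
}

sim <- unlist(lapply(correlations, function(r) {
  replicate(n_sim, {
    total     <- rnorm(n_samples)                                # cytosolic mRNA
    polysomal <- r * total + sqrt(1 - r^2) * rnorm(n_samples)    # correlated with total
    slope_dfbetas(total, polysomal)
  })
}))

thresholds <- c(one = 1,
                two_over_sqrtN   = 2 / sqrt(n_samples),
                three_over_sqrtN = 3 / sqrt(n_samples))

# Fraction of simulated data points exceeding each threshold.
sim_freq <- vapply(thresholds, function(th) mean(sim > th), numeric(1))
print(round(sim_freq, 3))
```

Frequencies computed the same way from real regressions can then be set against `sim_freq`; a markedly higher observed frequency would point to more influential data points than chance alone explains.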
A rare error can occur when the data within dataT or dataP for any gene and any sample class has no variance. This is reported as "ANOVA F-TEST on essentially perfect fit...". In this case, those genes that show no variance for a sample class within either dataT or dataP need to be removed before analysis. Trying a different normalization method may also fix the problem.
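Removing the genes that trigger this error can be done before calling the function. The sketch below flags every gene whose values are constant within at least one sample class, in either matrix; `has_zero_class_variance` is a hypothetical helper written for this illustration, and the toy matrices are made up.

```r
# Hypothetical helper (not an anota function): TRUE for genes that are
# constant within at least one sample class.
has_zero_class_variance <- function(mat, phenoVec) {
  flagged <- rep(FALSE, nrow(mat))
  for (cl in unique(phenoVec)) {
    sub <- mat[, phenoVec == cl, drop = FALSE]
    flagged <- flagged | apply(sub, 1, function(x) diff(range(x)) == 0)
  }
  flagged
}

# Toy data: g2 is constant in class "a" of dataT, g3 is constant in dataP.
phenoVec <- c("a", "a", "a", "b", "b", "b")
dataT <- rbind(g1 = c(1, 2, 3, 4, 5, 6),
               g2 = c(5, 5, 5, 1, 2, 3),
               g3 = c(2, 4, 6, 8, 9, 7))
dataP <- rbind(g1 = c(2, 3, 1, 6, 5, 4),
               g2 = c(1, 2, 3, 4, 5, 6),
               g3 = c(3, 3, 3, 3, 3, 3))

remove_gene <- has_zero_class_variance(dataT, phenoVec) |
               has_zero_class_variance(dataP, phenoVec)

# Filter both matrices with the same mask so their rows stay aligned.
dataT_f <- dataT[!remove_gene, , drop = FALSE]
dataP_f <- dataP[!remove_gene, , drop = FALSE]
print(rownames(dataT_f))
```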

----------Value----------

anotaPerformQc generates several graphical outputs. One output ("ANOTA_interaction_p_distribution.pdf") shows the distribution of p-values and adjusted p-values for the omnibus interaction (both using densities and histograms). The second page of the pdf displays the same plots but for the RVM statistics, if RVM is used. One output ("ANOTA_simulated_vs_obtained_dfbs.pdf") shows bar graphs of the frequencies of outlier dfbetas using different dfbetas thresholds. If the simulation was enabled (recommended), these are compared to the frequencies from the random data set. One optional graphical output shows the gene-by-gene regressions with the sample classes indicated. In the case where RVM is used, a Q-Q plot and a comparison of the CDF of the variances to the theoretical CDF of the F-distribution are generated (output as "ANOTA_rvm_fit_for_....jpg") for both the omnibus sample class and the omnibus interaction test. The function also outputs a list object containing the following data:

Component: omniIntStats
A matrix with a summary of the statistics from the omnibus interaction analysis containing the following columns: "intMS" (the mean square for the interaction); "intDf" (the degrees of freedom for the interaction); "residMS" (the residual error mean square); "residDf" (the degrees of freedom for the residual error); "residMSRvm" (the mean square for the residual error after applying RVM); "residDfRvm" (the degrees of freedom for the residual error after applying RVM); "intRvmFval" (the F-value for the RVM statistics); "intP" (the p-value for the interaction); "intRvmP" (the p-value for the interaction using RVM statistics); "intPAdj" (the adjusted [for multiple testing using the selected multiple testing correction method] p-value of the interaction); "intRvmPAdj" (the adjusted [for multiple testing using the selected multiple testing correction method] p-value of the interaction using RVM statistics).

Component: omniGroupStats
A matrix with a summary of the statistics from the omnibus sample class analysis containing the following columns: "groupSlope" (the common slope used in APV); "groupSlopeP" (if the slope is <0 or >1, a p-value for the slope being <0 or >1 is calculated; if the slope is >=0 & <=1, this value is set to 1); "groupMS" (the mean square for sample classes); "groupDf" (the degrees of freedom for the sample classes); "groupResidMS" (the residual error mean square); "groupResidDf" (the degrees of freedom for the residual error); "residMSRvm" (the mean square for the residual error after applying RVM); "groupResidDfRvm" (the degrees of freedom for the residual error after applying RVM); "groupRvmFval" (the F-value for the RVM statistics); "groupP" (the p-value for the sample class effect); "groupRvmP" (the p-value for the sample class effect using RVM statistics); "groupPAdj" (the adjusted [for multiple testing using the selected multiple testing correction method] p-value of the sample class effect); "groupRvmPAdj" (the adjusted [for multiple testing using the selected multiple testing correction method] p-value of the sample class effect using RVM statistics).

Component: correctionMethod
The multiple testing correction method used to adjust the nominal p-values.

Component: dsfSummary
A vector with the obtained frequencies of outlier dfbetas without the interaction term in the model.

Component: dfbetas
A matrix with the dfbetas from the model without the interaction term.

Component: residuals
The residuals from the regressions without the interaction term in the model.

Component: fittedValues
A matrix with the fitted values from the regressions without the interaction term in the model.

Component: phenoClasses
The sample classes used in the analysis. The sample class order can be used to create the contrast matrix when identifying differential translation using anotaGetSigGenes.

Component: sampleNames
A vector with the sample names (taken from the translationally active samples).

Component: abParametersInt
The ab parameters for the inverse gamma fit for the interactions within RVM.

Component: abParametersGroup
The ab parameters for the inverse gamma fit for sample classes within RVM.
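Once the list object is available, the interaction columns of omniIntStats can be inspected directly. The sketch below does not call anota: it builds a mock list that mimics only a few of the documented components, filled with random values, and summarises it with a hypothetical helper, `summarise_interactions`. The Kolmogorov-Smirnov test against a uniform distribution is an assumption of this sketch, offered as one numeric complement to the density and histogram plots.

```r
# Mock object with random values; only some documented components are mimicked.
set.seed(7)
n_genes <- 100
intP    <- runif(n_genes)
mockQc <- list(
  omniIntStats     = cbind(intP = intP, intPAdj = p.adjust(intP, method = "BH")),
  correctionMethod = "BH",
  phenoClasses     = c("ctrl", "treat")   # hypothetical class labels
)
rownames(mockQc$omniIntStats) <- paste0("gene", seq_len(n_genes))

# Hypothetical helper (not an anota function).
summarise_interactions <- function(qc, alpha = 0.05) {
  stats <- qc$omniIntStats
  list(
    n_genes = nrow(stats),
    n_sig   = sum(stats[, "intPAdj"] < alpha),               # genes with a significant interaction
    ks_p    = ks.test(stats[, "intP"], "punif")$p.value      # departure from uniformity
  )
}

res <- summarise_interactions(mockQc)
print(res)
```

With a real result, `mockQc` would be replaced by the object returned from anotaPerformQc; a small `ks_p` or a large `n_sig` would call the common-slope assumption into question.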

----------Author(s)----------

Ola Larsson <ola.larsson@ki.se>, Nahum Sonenberg <nahum.sonenberg@mcgill.ca>, Robert Nadon <robert.nadon@mcgill.ca>

----------See Also----------

anotaResidOutlierTest, anotaGetSigGenes, anotaPlotSigGenes

----------Examples----------

## See the example for anotaPlotSigGenes

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

