R cycle package: help documentation for the backgroundData() function

loveR · posted 2012-2-25 16:12:50

backgroundData(cycle)
R package containing backgroundData(): cycle

                                    Generation of background expression set

                                       Translator: LoveR, the 生物统计家园网 robot

----------Description----------

The function generates background expression sets using one of several methods (permutation within rows, Gaussian distribution, auto-regressive models).

----------Usage----------

backgroundData(eset,model=c("rr", "gauss", "ar1"))

----------Arguments----------

Argument: eset
object of the class “ExpressionSet”

Argument: model
model for generation of background data: “rr” - random permutation, “gauss” - Gaussian and “ar1” - AR(1) models

----------Details----------

Microarray data comprise measurements of transcript levels for many thousands of genes. Because of the large number of genes, some can be expected to show periodicity simply by chance. To assess the significance of periodic signals, it is therefore necessary first to define what distribution of signals can be expected if the studied process exhibits no true periodicity. In statistical terms, this is equivalent to defining a null hypothesis of non-periodic expression.

The simplest model for non-periodic expression is based on randomization of the observed time series. A background distribution can then be constructed by (repeated) random permutation of the sequentially ordered measurements in the experiment. This background model is used if model="rr" is chosen.
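
The permutation idea can be sketched in base R on a plain numeric matrix (rows are genes, columns are time points). This is only an illustration under that assumption; permute_rows is a hypothetical helper and is not claimed to be how the cycle package implements model="rr".

```r
# Sketch only: shuffle the time points of every row (gene) independently.
# `permute_rows` is a hypothetical helper, not part of the cycle package.
permute_rows <- function(x) {
  stopifnot(is.matrix(x))
  out <- x
  for (i in seq_len(nrow(x))) {
    # sample.int() draws a random ordering of the column indices
    out[i, ] <- x[i, sample.int(ncol(x))]
  }
  out
}

set.seed(1)
# Illustrative data: 5 genes measured at 8 time points
x <- matrix(rnorm(5 * 8), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("t", 1:8)))
bg <- permute_rows(x)
```

Each row of bg holds the same values as the matching row of x, only in a different order, so any temporal structure in the row is destroyed while its value distribution is kept.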

Alternatively, non-periodic expression can be derived from a statistical model. A conventional approach assumes that the data are normally distributed and draws the background from the normal distribution. This background model is chosen if model="gauss".
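
A Gaussian background can be sketched in the same way. The sketch below assumes that each row is replaced by independent draws from a normal distribution with that row's sample mean and standard deviation. The help page does not state how the package parameterizes the distribution, so this choice and the helper name gaussian_rows are assumptions.

```r
# Sketch only: replace every row by independent normal draws.
# Assumption: the row's own sample mean and standard deviation are used;
# the cycle package may parameterize this differently.
# Rows with missing values are not handled here.
gaussian_rows <- function(x) {
  stopifnot(is.matrix(x), ncol(x) >= 2)
  out <- x
  for (i in seq_len(nrow(x))) {
    out[i, ] <- rnorm(ncol(x), mean = mean(x[i, ]), sd = sd(x[i, ]))
  }
  out
}

set.seed(2)
# Illustrative data: 5 genes measured at 8 time points
x <- matrix(rnorm(5 * 8, mean = 3), nrow = 5)
bg <- gaussian_rows(x)
```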

However, these two approaches neglect the fact that time series data generally exhibit considerable autocorrelation, i.e. correlation between successive measurements. Therefore, neither the assumption of data normality nor the assumption underlying randomization may hold. As demonstrated for yeast cell cycle data (Bioinformatics 2008), this failure can substantially interfere with significance testing, and neglecting autocorrelation can lead to a considerable overestimation of the number of periodically expressed genes.
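
The effect of permutation on autocorrelation can be seen in a small base R experiment. The series below is simulated and purely illustrative, and lag1_acf is a hypothetical helper written for this sketch.

```r
# Lag-1 autocorrelation: correlation between each value and its successor.
lag1_acf <- function(v) {
  v <- v - mean(v)
  sum(v[-1] * v[-length(v)]) / sum(v^2)
}

set.seed(123)
# Simulated autocorrelated series (AR(1) with coefficient 0.8); illustrative only.
series <- as.numeric(arima.sim(model = list(ar = 0.8), n = 500))
shuffled <- sample(series)   # random permutation of the time points

r_original <- lag1_acf(series)    # expected to be clearly positive
r_shuffled <- lag1_acf(shuffled)  # expected to be close to zero
```

The shuffled series has the same values but almost no lag-1 autocorrelation, so a background built this way is smoother-free in a way real time series usually are not.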

A more suitable model is based on autoregressive processes of order one (AR(1)), for which the value of the time-dependent variable X depends on its previous value up to a normally distributed random variable Z. This model is used when model="ar1" is set. The autocorrelation of X and the variance of Z are estimated separately for each feature of the ExpressionSet object. Mathematical details can be found in the given reference.
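
An AR(1) background can be sketched as follows, again on a plain matrix. The sketch assumes the model X[k] = phi * X[k-1] + Z[k] on mean-centred rows, estimates phi as the lag-1 autocorrelation, and sets var(Z) = var(X) * (1 - phi^2). These estimators and the helper names are assumptions made for illustration; the package may estimate the parameters differently.

```r
# Lag-1 autocorrelation of a numeric vector (hypothetical helper).
lag1_acf <- function(v) {
  v <- v - mean(v)
  sum(v[-1] * v[-length(v)]) / sum(v^2)
}

# Sketch only: simulate one AR(1) series per row, with parameters
# estimated from that row. Rows must not be constant.
ar1_rows <- function(x) {
  stopifnot(is.matrix(x), ncol(x) >= 3)
  n <- ncol(x)
  out <- x
  for (i in seq_len(nrow(x))) {
    mu      <- mean(x[i, ])
    phi     <- lag1_acf(x[i, ])                    # autocorrelation of X
    sigma_z <- sqrt(var(x[i, ]) * (1 - phi^2))     # standard deviation of Z
    sim <- numeric(n)
    sim[1] <- rnorm(1, sd = sd(x[i, ]))            # start from the stationary spread
    for (k in 2:n) {
      sim[k] <- phi * sim[k - 1] + rnorm(1, sd = sigma_z)
    }
    out[i, ] <- mu + sim
  }
  out
}

set.seed(3)
# Illustrative data: 3 autocorrelated "genes" with 50 time points each
x <- t(replicate(3, as.numeric(arima.sim(model = list(ar = 0.6), n = 50))))
bg <- ar1_rows(x)
```

Calling ar1_rows repeatedly yields a collection of background series whose rows share the estimated autocorrelation of the original rows.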

It is important to note in this context that AR(1) processes cannot capture periodic patterns, except for alternations with period two. Since Z is a random variable, we can readily generate a collection of time series with the same autocorrelation as in the original data set. Therefore, although AR(1) processes are random processes, they allow us to construct a background distribution that captures the autocorrelation structure of the original gene expression time series without fitting any periodic pattern it may contain.

----------Value----------

ExpressionSet object with expression data generated by the chosen background model

----------Note----------

Note that this function evaluates solely the exprs matrix; no information is used from the phenoData. In particular, the ordering of samples (arrays) is the same as the ordering of the columns in the exprs matrix. Also, replicated arrays in the exprs matrix are treated as independent, i.e. they should be averaged prior to analysis or placed into different
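
Averaging replicated arrays before the analysis could look like the following base R sketch. It assumes the expression values are available as a plain matrix and that a vector giving the time point of every column is known. Both average_replicates and timepoint are hypothetical names introduced here.

```r
# Sketch only: average all columns that belong to the same time point.
# `timepoint` has one entry per column of `x`.
average_replicates <- function(x, timepoint) {
  stopifnot(is.matrix(x), length(timepoint) == ncol(x))
  groups <- unique(timepoint)   # keeps the original order of time points
  out <- matrix(NA_real_, nrow = nrow(x), ncol = length(groups),
                dimnames = list(rownames(x), as.character(groups)))
  for (j in seq_along(groups)) {
    out[, j] <- rowMeans(x[, timepoint == groups[j], drop = FALSE])
  }
  out
}

# Two genes, two replicated arrays at each of two time points
x <- matrix(1:8, nrow = 2)
avg <- average_replicates(x, timepoint = c(0, 0, 10, 10))
```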

----------Author(s)----------

Matthias E. Futschik (http://www.cbme.ualg.pt/mfutschik_cbme.html)

----------References----------

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

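Notes:
Note 1: To make learning easier, this document was translated by LoveR, the 生物统计家园网 robot. It is for personal reference when learning R only; 生物统计家园 retains the copyright.
Note 2: Because the translation was produced automatically by a robot, inaccuracies are unavoidable. Carefully comparing the Chinese and English text when using it can help with learning R.
Note 3: If you find an inaccuracy, please reply at the end of this post and we will gradually revise it.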
