R MEDIPS package: MEDIPS.saturationAnalysis() function help documentation

loveR · posted on 2012-2-26 00:38:51

MEDIPS.saturationAnalysis(MEDIPS)
MEDIPS.saturationAnalysis() belongs to the R package: MEDIPS

Calculates the saturation/reproducibility of the provided MeDIP-Seq data.

Translator: LoveR, the Biostatistic.net (生物统计家园网) robot

----------Description----------

The saturation analysis addresses the question of whether the number of input regions is sufficient to generate a saturated and reproducible methylation profile of the reference genome. The main idea is that an insufficient number of short reads will not result in a saturated methylation profile. Only if there is a sufficient number of short reads will the resulting genome-wide methylation profile be reproducible by another independent set of a similar number of short reads.
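The idea described above can be illustrated with a small base R simulation. This is only a sketch under stated assumptions, not the MEDIPS implementation: the reads are simulated, and all names (`chr_length`, `bin_weights`, `set_a`, `set_b`, `saturation`) are hypothetical. The reads are split into two independent sets, and per-bin counts from growing subsets of each set are compared by Pearson correlation.

```r
# Toy saturation analysis in base R (illustrative only, not the MEDIPS code).
set.seed(1)

chr_length <- 100000L   # hypothetical chromosome length in bp
bin_size   <- 500L      # hypothetical bin size in bp
n_reads    <- 20000L    # hypothetical number of reads

# Simulate reads from a non-uniform profile: each bin gets its own
# sampling weight, so there is a signal for the correlation to recover.
n_bins      <- chr_length %/% bin_size
bin_weights <- rgamma(n_bins, shape = 0.5)
read_bins   <- sample.int(n_bins, n_reads, replace = TRUE, prob = bin_weights)

# Split the reads at random into two independent sets A and B.
shuffled <- sample(read_bins)
half     <- n_reads %/% 2L
set_a    <- shuffled[seq_len(half)]
set_b    <- shuffled[half + seq_len(half)]

# For growing subsets of A and B, build per-bin count vectors and correlate them.
no_iterations <- 10L
subset_sizes  <- round(seq(half / no_iterations, half, length.out = no_iterations))
saturation <- t(sapply(subset_sizes, function(n) {
  counts_a <- tabulate(set_a[seq_len(n)], nbins = n_bins)
  counts_b <- tabulate(set_b[seq_len(n)], nbins = n_bins)
  c(n_regions = n, pearson = cor(counts_a, counts_b))
}))

print(saturation)
```

Each row of `saturation` pairs a subset size with the correlation between the two independent count vectors. A curve that flattens out near 1 as the subset size grows suggests that adding more reads would change the profile very little.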

----------Usage----------

MEDIPS.saturationAnalysis(data = NULL, no_iterations = 10, no_random_iterations = 1, empty_bins = TRUE, rank = FALSE, extend = 400, bin_size = NULL)

----------Arguments----------

Argument: data
Must be a MEDIPS SET object.

Argument: no_iterations
Defines the number of subsets created from the full set of available regions (default = 10).

Argument: no_random_iterations
Approaches that randomly select data entries may be run several times in order to obtain more stable results. By specifying the no_random_iterations parameter (default = 1), it is possible to run the saturation analysis several times. The final results returned in the saturation results object are the averaged results of all random iteration steps.

Argument: empty_bins
Either TRUE or FALSE (default TRUE). This parameter affects how correlations between the resulting genome vectors are calculated. A genome vector consists of the concatenated vectors for each included chromosome. The size of the vectors is defined by the bin_size parameter. Genomic bins that contain no overlapping regions, neither from the subsets of A nor from the subsets of B, are ignored when this parameter is set to FALSE.
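A minimal base R sketch of what ignoring empty bins means for the correlation. The count vectors are made up for illustration, and the filtering rule (drop bins that are zero in both vectors) is an assumption based on the description above rather than a copy of the MEDIPS code.

```r
# Hypothetical per-bin counts from a subset of A and a subset of B.
counts_a <- c(0, 3, 0, 5, 0, 0, 2, 0)
counts_b <- c(0, 4, 0, 6, 1, 0, 1, 0)

# Assumed empty_bins = TRUE behaviour: correlate over all bins.
cor_all <- cor(counts_a, counts_b)

# Assumed empty_bins = FALSE behaviour: drop bins that are empty in BOTH vectors.
covered     <- counts_a > 0 | counts_b > 0
cor_covered <- cor(counts_a[covered], counts_b[covered])

print(c(all_bins = cor_all, covered_bins = cor_covered))
```

In this toy example the bins that are empty in both vectors agree trivially, so keeping them gives a higher correlation than restricting the comparison to covered bins.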

Argument: rank
Either TRUE or FALSE (default FALSE). This parameter also affects how correlations between the resulting genome vectors are calculated. If rank is set to TRUE, the correlation is calculated on the ranks of the bins instead of on the counts. Setting this parameter to TRUE is a more robust approach that reduces the effect of possible outliers (bins with a very high number of overlapping regions) on the correlation.
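The effect of an outlier bin on count-based versus rank-based correlation can be seen in a few lines of base R. The count vectors below are invented for illustration. Correlating the ranks is the same as computing a Spearman correlation, which is what this sketch assumes rank = TRUE amounts to.

```r
# Hypothetical per-bin counts: seven ordinary bins that disagree between the
# two sets, plus one outlier bin with a very high count in both.
counts_a <- c(1, 2, 3, 4, 5, 6, 7, 500)
counts_b <- c(7, 6, 5, 4, 3, 2, 1, 450)

# Correlation on raw counts: dominated by the single outlier bin.
cor_counts <- cor(counts_a, counts_b)

# Correlation on ranks: every bin contributes equally.
cor_ranks <- cor(rank(counts_a), rank(counts_b))

print(c(counts = cor_counts, ranks = cor_ranks))
```

Here the count-based correlation is close to 1 even though the ordinary bins are ordered in opposite directions, while the rank-based value is negative and reflects the disagreement in the bulk of the bins.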

Argument: extend
Defines the number of bases by which each region is extended before the genome vector is calculated. Regions are extended along the plus or the minus strand, as defined by their provided strand information.
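Strand-aware extension can be sketched in base R as follows. The data frame layout and column names (`start`, `stop`, `strand`) are hypothetical, and the rule used here (plus-strand regions grow at their stop coordinate, minus-strand regions grow at their start coordinate) is an assumption about what "extended along the strand" means, not a copy of the MEDIPS code.

```r
# Hypothetical regions with strand information.
regions <- data.frame(
  start  = c(1000L, 5000L),
  stop   = c(1035L, 5035L),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)
extend <- 400L

extended <- regions
plus <- regions$strand == "+"

# Plus strand: keep the start, push the stop towards higher coordinates.
extended$stop[plus] <- regions$stop[plus] + extend

# Minus strand: keep the stop, push the start towards lower coordinates,
# without running past the first base of the chromosome.
extended$start[!plus] <- pmax(1L, regions$start[!plus] - extend)

print(extended)
```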

Argument: bin_size
Defines the size of the genome-wide bins and therefore the size of the genome vector. Read coverage is calculated for bins separated by bin_size base pairs.
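How bin_size determines the length of the genome vector can be sketched in base R. The chromosome length and regions are invented, and counting every region that overlaps a bin is a simplifying assumption about how coverage is assigned.

```r
chr_length <- 1000L   # hypothetical chromosome length in bp
bin_size   <- 50L

# The number of bins (the genome vector length) follows from bin_size.
n_bins <- ceiling(chr_length / bin_size)

# Hypothetical regions given by 1-based start and stop coordinates.
regions <- data.frame(start = c(10L, 40L, 120L), stop = c(45L, 75L, 160L))

# Count, for every bin, how many regions overlap it.
counts <- integer(n_bins)
for (i in seq_len(nrow(regions))) {
  first <- (regions$start[i] - 1L) %/% bin_size + 1L
  last  <- (regions$stop[i]  - 1L) %/% bin_size + 1L
  counts[first:last] <- counts[first:last] + 1L
}

print(counts)
```

Halving bin_size doubles the length of the vector, which gives finer resolution but fewer reads per bin, so correlations between subsets tend to be noisier.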

----------Value----------

Value: distinctSets
Contains the results of each iteration step (row-wise) of the saturation analysis. The first column is the number of considered regions in each set; the second column is the resulting Pearson correlation coefficient when comparing the two independent genome vectors.

Value: estimation
Contains the results of each iteration step (row-wise) of the estimated saturation analysis. The first column is the number of considered regions in each set; the second column is the resulting Pearson correlation coefficient when comparing the two independent genome vectors.

Value: numberReads
The total number of available regions.

Value: maxEstCor
Contains the best Pearson correlation (second column) obtained by considering the artificially doubled set of reads (first column).

Value: maxTruCor
Contains the best Pearson correlation (second column) obtained by considering the total set of reads (first column).

----------Author(s)----------

Lukas Chavez

----------Examples----------

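# Load the hg19 reference genome package
library(BSgenome.Hsapiens.UCSC.hg19)
# Path to the example chr22 MeDIP-Seq regions shipped with MEDIPS
file=system.file("extdata", "MeDIP_hESCs_chr22.txt", package="MEDIPS")
# Read the aligned regions into a MEDIPS SET object
CONTROL.SET = MEDIPS.readAlignedSequences(BSgenome="BSgenome.Hsapiens.UCSC.hg19", file=file)

# Run the saturation analysis on the MEDIPS SET
sr.control = MEDIPS.saturationAnalysis(data = CONTROL.SET, bin_size = 50, extend = 400, no_iterations = 10, no_random_iterations = 1)

# Print the saturation results
sr.control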

When reposting, please credit Biostatistic.net (http://www.biostatistic.net).

