enrichedRegions(htSeqTools)
enrichedRegions() belongs to R package: htSeqTools
Find significantly enriched regions in sequencing experiments.
Translator: 生物统计家园网 robot LoveR
----------Description----------
Find regions with a significant accumulation of reads in a sequencing experiment.
----------Usage----------
enrichedRegions(sample1, sample2, regions, minReads=10, mappedreads,
pvalFilter=0.05, exact=FALSE, p.adjust.method='none', twoTailed=FALSE,
mc.cores=1)
----------Arguments----------
Argument: sample1
Either the start and end of the sequences in sample 1 (an IRangesList, RangedData or IRanges object), or a RangedDataList with the sequences for all samples (in which case sample2 must be left missing).
Argument: sample2
Same as sample1, but for sample 2. Can be left missing.
Argument: regions
If specified, the analysis is restricted to the regions indicated in regions. If not specified, the regions are defined automatically using the argument minReads.
Argument: minReads
This argument is only used when regions is not specified. The regions to be tested for enrichment are those with coverage greater than or equal to minReads. If sample1 is a RangedDataList, the overall coverage obtained by adding all samples is used. Otherwise, if twoTailed is FALSE, only the reads in sample 1 are counted. If twoTailed is TRUE, the sum of the reads in samples 1 and 2 is used.
Argument: mappedreads
Number of mapped reads for the sample. Must be of class integer. It is used to compute RPKM.
Argument: pvalFilter
Only regions with a P-value below pvalFilter are reported as being enriched.
Argument: exact
If set to TRUE, an exact test is used whenever some expected cell counts are 5 or less, i.e. when the asymptotic chi-square/likelihood-ratio test calculations break down. The exact test is a permutation-based chi-square test if sample1 is a RangedDataList object, and Fisher's exact test otherwise. Ignored if sample2 is missing, as in this case the calculations are always exact.
Argument: p.adjust.method
P-value adjustment method, passed on to p.adjust.
Argument: twoTailed
If set to FALSE, only regions with a higher concentration of reads in sample 1 than in sample 2 are reported. If set to TRUE, regions with a higher concentration of sample 2 reads are also reported. Ignored if sample2 is missing.
Argument: mc.cores
If mc.cores is greater than 1, computations are performed in parallel for each element in the IRangesList objects. Whenever possible the mclapply function is used, so exactly mc.cores cores are used. For some signatures mclapply cannot be used, in which case the parallel function from package multicore is used. Note: the latter option launches as many parallel processes as there are elements in x, which can place strong demands on the processor and memory.
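To illustrate how p.adjust.method and pvalFilter relate, here is a minimal base R sketch. The P-values are made up, and it assumes (this is not stated above) that the pvalFilter threshold is applied to the adjusted P-values.

```r
# Hypothetical raw P-values for five candidate regions (made-up numbers).
pvals <- c(0.001, 0.012, 0.025, 0.049, 0.200)

# p.adjust.method = 'none' (the default in the usage above) leaves them unchanged.
p_none <- p.adjust(pvals, method = "none")

# A multiple-testing correction such as Benjamini-Hochberg can only increase them.
p_bh <- p.adjust(pvals, method = "BH")

# Assumption: the threshold is compared against the adjusted P-values.
pvalFilter <- 0.05
keep_none <- p_none < pvalFilter
keep_bh <- p_bh < pvalFilter

print(data.frame(pvals, p_none, p_bh, keep_none, keep_bh))
```

With a correction method other than 'none', fewer regions are expected to pass the same pvalFilter.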
----------Details----------
The calculations depend on whether sample2 is missing.
When sample2 is not missing: first, regions with coverage above minReads are selected. Second, the number of reads falling in the selected regions is computed for sample 1 and sample 2. Third, the counts are compared via a chi-square test (with Yates continuity correction), which takes into account the total number of sequences in each sample. Finally, statistically significant regions are selected and returned in RangedData or RangedDataList objects.
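The two-sample comparison can be sketched for a single region with base R. This is an illustration under stated assumptions, not the package's internal code: the counts are hypothetical, the 2 x 2 layout (reads inside versus outside the region, per sample) is one plausible way to take the total number of sequences into account, and the one-sided reporting controlled by twoTailed is not shown.

```r
# Hypothetical counts (assumptions for illustration only).
total1 <- 10000   # total reads in sample 1
total2 <- 12000   # total reads in sample 2
in1 <- 60         # sample 1 reads falling in the candidate region
in2 <- 20         # sample 2 reads falling in the candidate region

# 2 x 2 table: reads inside vs. outside the region, for each sample.
tab <- matrix(c(in1, total1 - in1, in2, total2 - in2), nrow = 2,
              dimnames = list(c("inside", "outside"),
                              c("sample1", "sample2")))

# Chi-square test with Yates continuity correction.
chi <- chisq.test(tab, correct = TRUE)

# With exact = TRUE the text describes switching to an exact test when
# some expected cell count is 5 or less; Fisher's test is used here.
use_exact <- any(chi$expected <= 5)
pval <- if (use_exact) fisher.test(tab)$p.value else chi$p.value

print(chi$expected)
print(pval)
```

In this made-up example all expected counts are well above 5, so the chi-square P-value is the one retained.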
When sample2 is missing: first, regions with coverage above minReads are selected. Second, the number of reads in sample 1 falling in the selected regions is computed. Third, the proportion of reads in each region is tested for enrichment via a one-tailed exact binomial test.
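A one-tailed exact binomial test of this kind can be sketched with base R. The read counts are hypothetical, and the null proportion used here (each selected region receives an equal share of the reads) is an assumption made for illustration; the text above does not state which null proportion the function uses.

```r
# Hypothetical read counts of sample 1 in four selected regions.
counts <- c(regionA = 40, regionB = 12, regionC = 9, regionD = 11)
n <- sum(counts)

# Assumption: under the null, reads are spread evenly over the regions.
p0 <- 1 / length(counts)

# One-tailed test: is the proportion of reads in a region larger than p0?
pvals <- sapply(counts, function(k) {
  binom.test(k, n, p = p0, alternative = "greater")$p.value
})

print(pvals)
```

Only regionA, which holds far more than a quarter of the reads, gets a small P-value in this example.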
----------Value----------
Object of class RangedData indicating the significantly enriched regions, the number of reads in each sample for those regions, the fold changes (adjusted for the overall number of sequences in each sample) and the chi-square test P-values.
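As a rough illustration of a fold change adjusted for the overall number of sequences, and of the standard RPKM definition mentioned under mappedreads, consider the base R sketch below. All numbers are hypothetical, and the exact formula used by the package (for example, any handling of zero counts) is not given above, so this is only the simplest depth-normalised ratio.

```r
# Hypothetical per-region read counts and library sizes.
reads1 <- c(60, 15)          # reads per region in sample 1
reads2 <- c(20, 30)          # reads per region in sample 2
total1 <- 10000L             # mapped reads in sample 1
total2 <- 12000L             # mapped reads in sample 2
width_bp <- c(500, 2000)     # region widths in base pairs

# Fold change after scaling each count by its sample's total reads.
fold_change <- (reads1 / total1) / (reads2 / total2)

# RPKM: reads per kilobase of region per million mapped reads.
rpkm1 <- reads1 / ((width_bp / 1e3) * (total1 / 1e6))

print(fold_change)
print(rpkm1)
```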
----------Methods----------
"missing", regions = "RangedData"): ranges(regions) indicates the chromosome, start and end of the genomic regions, while values(regions) should indicate the observed number of reads for each group in each region. enrichedRegions tests the null hypothesis that the proportion of reads in the region is equal across all groups via a likelihood-ratio test (or permutation-based chi-square for regions
"missing", regions = "missing"): Each element in sample1 contains the read start/end of an individual sample. enrichedRegions identifies regions with a high concentration of reads (across all samples) and then compares the counts across groups using a likelihood-ratio test (or permutation-based chi-square for regions
regions = "missing"): space(sample1) indicates the chromosome, and start(sample1) and end(sample1) the start/end position of the reads. Similarly for sample2. enrichedRegions identifies regions with a high concentration of reads (across all samples) and then compares the counts across groups using a likelihood-ratio test (or permutation-based chi-square for regions where the expected counts are
regions = "missing"): space(sample1) indicates the chromosome, and start(sample1) and end(sample1) the start/end position of the reads. enrichedRegions tests the null hypothesis that an unusually high proportion of reads has been
----------Examples----------
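library(htSeqTools)  # provides enrichedRegions()
set.seed(1)
# Sample 1: 1000 reads of 39 bp on chr1, with starts centred around position 500
st <- round(rnorm(1000, 500, 100))
strand <- rep(c('+', '-'), each = 500)
space <- rep('chr1', length(st))
sample1 <- RangedData(IRanges(st, st + 38), strand = strand, space = space)
# Sample 2: same layout, with starts centred around position 1000
st <- round(rnorm(1000, 1000, 100))
sample2 <- RangedData(IRanges(st, st + 38), strand = strand, space = space)
# Report regions enriched in either sample
enrichedRegions(sample1, sample2, twoTailed = TRUE)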
Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).