R htSeqTools package: help documentation for the enrichedChrRegions() function (Chinese-English parallel text)

loveR · posted on 2012-2-25 22:01:58

enrichedChrRegions(htSeqTools)
R package containing enrichedChrRegions(): htSeqTools

Find chromosomal regions with a high concentration of hits.

Translator: LoveR, the 生物统计家园网 robot

----------Description----------

This function looks for chromosomal regions with a large accumulation of hits, e.g. significant peaks in a ChIP-seq experiment, or differentially expressed genes in an RNA-seq or microarray experiment. Regions are found by computing the number of hits in a moving window and selecting regions based on an FDR cutoff.

----------Usage----------

enrichedChrRegions(hits1, hits2, chrLength, windowSize=10^4-1, fdr=0.05, nSims=10, mc.cores=1)

----------Arguments----------

Argument: hits1
Object containing hits (start, end and chromosome). Currently only RangedData objects are accepted.

Argument: hits2
Optionally, another object containing hits. If specified, regions are defined by comparing hits1 against hits2.

Argument: chrLength
Named vector giving the length of each chromosome in base pairs.

Argument: windowSize
Size of the window used to smooth the hit count (see Details).

Argument: fdr
Desired FDR level (see Details).

Argument: nSims
Number of simulations used to estimate the FDR.

Argument: mc.cores
Number of processors to use in parallel computations (passed on to mclapply).
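
As a rough illustration of how independent simulations can be spread across processors with mclapply, here is a minimal sketch in base R. It is not the package's internal code: `simulate_hits`, `chr_len`, `n_hits` and `n_sims` are hypothetical names chosen for this example.

```r
library(parallel)

chr_len <- 10000   # assumed chromosome length in base pairs
n_hits  <- 100     # assumed number of hits per simulation
n_sims  <- 4       # assumed number of simulations

# One simulation: scatter n_hits single-base hits uniformly along the chromosome.
# The argument i is only the simulation index and is not used.
simulate_hits <- function(i) sample(chr_len, n_hits, replace = TRUE)

# mc.cores = 1 runs serially; larger values fork worker processes
# (forking is not available on Windows).
sims <- mclapply(seq_len(n_sims), simulate_hits, mc.cores = 1)
```

Each element of `sims` holds the positions of one simulated set of hits.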

----------Details----------

A smoothed number of hits is computed by counting the number of hits in a moving window of size windowSize. Note that only the mid-point of each hit in hits1 (and in hits2, if specified) is used. That is, hits are not treated as intervals but as being located at a single base pair.
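
The smoothing step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the window is assumed to be centred on each base and to have odd size, and all variable names and the toy hit coordinates are made up for the example.

```r
# Hypothetical hits on a single chromosome (coordinates in base pairs)
starts      <- c(100, 120, 150, 900, 5000)
ends        <- starts + 38
chr_len     <- 10000
window_size <- 99        # assumed odd so the window can be centred

# Reduce each hit to its mid-point
mid <- round((starts + ends) / 2)

# Number of hit mid-points falling on each base
per_base <- tabulate(mid, nbins = chr_len)

# Moving-window sum via cumulative sums; windows are truncated at the chromosome ends
half     <- (window_size - 1) / 2
cs       <- c(0, cumsum(per_base))
pos      <- seq_len(chr_len)
hi       <- pmin(pos + half, chr_len)
lo       <- pmax(pos - half, 1)
smoothed <- cs[hi + 1] - cs[lo]
```

Here `smoothed[p]` is the number of hit mid-points within 49 bases of position `p`.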

If hits2 is missing, regions with a large smoothed number of hits are selected. To assess statistical significance, hits (also 1 base pair long) are generated at random positions along the genome, and the smoothed number of hits is computed for them. The number of simulated hits is set equal to nrow(hits1). The process is repeated nSims times, giving several independent simulations. To estimate the FDR, several thresholds for defining enriched chromosomal regions are considered. For each threshold, the number of regions above the threshold is counted in the observed data and in the simulations. For each threshold t, the FDR is estimated as the average number of regions with score >= t in the simulations divided by the number of regions with score >= t in the observed data.
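
The simulation-based FDR estimate described above can be sketched in base R. This is a simplified illustration, not the package's code: the helpers `smooth_count` and `count_regions`, the toy data, and the choice of integer thresholds are all assumptions made for this example, and a "region" is taken to be a maximal run of consecutive bases with score at or above the threshold.

```r
set.seed(1)
chr_len <- 10000; window_size <- 99; n_sims <- 10

# Smoothed hit count: hit mid-points within a centred window around each base
smooth_count <- function(mid, chr_len, window_size) {
  half <- (window_size - 1) / 2
  cs   <- c(0, cumsum(tabulate(mid, nbins = chr_len)))
  pos  <- seq_len(chr_len)
  cs[pmin(pos + half, chr_len) + 1] - cs[pmax(pos - half, 1)]
}

# Number of maximal runs of consecutive bases with score >= thr
count_regions <- function(score, thr) {
  r <- rle(score >= thr)
  sum(r$values)
}

# Toy observed data: a cluster of hits near base 500 plus scattered background
observed_mid <- c(round(rnorm(80, 500, 100)), sample(chr_len, 20))
obs <- smooth_count(observed_mid, chr_len, window_size)

# nSims simulations with the same number of hits placed uniformly at random
sims <- replicate(
  n_sims,
  smooth_count(sample(chr_len, length(observed_mid), replace = TRUE),
               chr_len, window_size),
  simplify = FALSE
)

# FDR estimate for each candidate threshold:
# average number of simulated regions / number of observed regions
thresholds <- seq_len(max(obs))
fdr_hat <- sapply(thresholds, function(thr) {
  n_obs <- count_regions(obs, thr)
  n_sim <- mean(sapply(sims, count_regions, thr = thr))
  n_sim / n_obs
})
```

Because the thresholds stop at `max(obs)`, the observed data always contain at least one region at every threshold, so the ratio is well defined.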

If hits2 is not missing, the difference between the two groups in the smoothed proportion of hits (i.e. the number of hits in the window divided by the overall number of hits) is used as the test statistic. To assess statistical significance, hits are randomly scrambled between sample 1 and sample 2 (keeping the original number of hits in each sample), and the test statistic is re-computed. The FDR for a given threshold t is estimated as the number of bases in the simulated data with test statistic > t divided by the number of bases in the observed data with test statistic > t.
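
The two-sample comparison can be sketched along the same lines. Again this is only an illustration under assumptions: a single scramble is shown, and `smooth_prop`, `diff_stat`, `fdr_at` and the toy mid-points are hypothetical names and data, not part of htSeqTools.

```r
set.seed(1)
chr_len <- 10000; window_size <- 99

# Smoothed proportion of hits: windowed count divided by the total number of hits
smooth_prop <- function(mid) {
  half <- (window_size - 1) / 2
  cs   <- c(0, cumsum(tabulate(mid, nbins = chr_len)))
  pos  <- seq_len(chr_len)
  (cs[pmin(pos + half, chr_len) + 1] - cs[pmax(pos - half, 1)]) / length(mid)
}

# Test statistic: per-base difference in smoothed proportions
diff_stat <- function(m1, m2) smooth_prop(m1) - smooth_prop(m2)

# Toy mid-points: sample 1 clusters near base 500, sample 2 is uniform
mid1 <- round(rnorm(100, 500, 100))
mid2 <- sample(chr_len, 100, replace = TRUE)
obs_stat <- diff_stat(mid1, mid2)

# Scramble the sample labels while keeping the original sample sizes
pooled   <- c(mid1, mid2)
idx      <- sample(length(pooled), length(mid1))
sim_stat <- diff_stat(pooled[idx], pooled[-idx])

# FDR estimate at threshold thr: simulated bases above thr / observed bases above thr
fdr_at <- function(thr) sum(sim_stat > thr) / sum(obs_stat > thr)
fdr_half_max <- fdr_at(max(obs_stat) / 2)
```

The threshold `max(obs_stat) / 2` is arbitrary; it is used only because it guarantees a non-zero denominator in this toy example.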

The lowest t with an estimated FDR below fdr is used to define the enriched chromosomal regions.
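
Selecting the cutoff and turning it into regions might look like the sketch below. The FDR estimates and per-base scores are invented numbers for illustration, and reading "lowest t" literally as the first threshold whose estimate falls below fdr is an assumption about how ties and non-monotone estimates are handled.

```r
# Hypothetical thresholds and their estimated FDRs
thresholds <- 1:6
fdr_hat    <- c(0.90, 0.40, 0.12, 0.04, 0.06, 0.01)
fdr        <- 0.05

# Lowest threshold whose estimated FDR is below the requested level
below  <- which(fdr_hat < fdr)
cutoff <- if (length(below) > 0) thresholds[min(below)] else NA

# Hypothetical smoothed scores for ten consecutive bases
score <- c(0, 1, 4, 5, 4, 1, 0, 6, 6, 2)

# Maximal runs of bases with score >= cutoff become the enriched regions
r       <- rle(score >= cutoff)
ends    <- cumsum(r$lengths)
starts  <- ends - r$lengths + 1
regions <- data.frame(start = starts[r$values], end = ends[r$values])
```

With these numbers the cutoff is 4 and two regions are reported, covering bases 3 to 5 and 8 to 9.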

----------Value----------

Object of class RangedData containing the regions whose smoothed hit count exceeds the threshold corresponding to the specified FDR level.

----------Methods----------

Look for chromosomal zones with a large number of hits reported in hits1.

Look for chromosomal zones with a different density of hits in hits1 vs hits2.

----------Examples----------

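library(htSeqTools)
set.seed(1)
# Simulate 100 hit start positions clustered around base 500
st <- round(rnorm(100,500,100))
st[st>10000] <- 10000   # cap start positions at 10000, the chromosome length
strand <- rep(c('+','-'),each=50)
space <- rep('chr1',length(st))
# Each hit spans 39 bp (st to st+38) on chr1
hits1 <- RangedData(IRanges(st,st+38),strand=strand,space=space)
chrLength <- c(chr1=10000)
# One-sample analysis: 99 bp window, a single simulation to estimate the FDR
enrichedChrRegions(hits1,chrLength=chrLength, windowSize=99, nSims=1)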

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

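Notes:
Note 1: To make learning easier, this document was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R; 生物统计家园 reserves the copyright.
Note 2: Because the translation was produced automatically by a robot, inaccuracies are unavoidable. Carefully comparing the Chinese and English text while reading can help with learning R.
Note 3: If you find an inaccuracy, please reply below this post and we will revise the document over time.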
