bayespeak(BayesPeak)
bayespeak() belongs to R package: BayesPeak
BayesPeak - Bayesian analysis of ChIP-seq data
----------Description----------
BayesPeak - Bayesian analysis of ChIP-seq data. This function divides the genome into jobs and runs the BayesPeak algorithm on each one using a C backend. The jobs can be run in parallel using the multicore package. Results are returned in R.
----------Usage----------
bayespeak(treatment, control, chr = NULL, start,
end, bin.size = 100L, iterations = 10000L,
repeat.offset = TRUE, into.jobs = TRUE, job.size = 6E6L,
job.overlap = 20L, use.multicore = FALSE,
mc.cores = getOption("cores"), snow.cluster,
prior = c(5, 5, 10, 5, 25, 4, 0.5, 5),
report.p.samples = TRUE)
----------Arguments----------
Argument: treatment, control
These arguments should contain the treated ChIP-seq data and the control data, respectively. Each of these arguments can be:
a path to a .bed file (this file will be read in as per read.bed).
OR a data.frame, which should have columns "chr", "start", "end", "strand".
OR a RangedData object. This object is expected to be split into spaces by chromosome, and should have a data track labelled "strand".
The control argument is entirely optional. (Mathematically, leaving this argument out is equivalent to setting gamma = 1 in the model.) Strand information is expected to be given as "+" or "-".
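As an illustration of the data.frame form described above, here is a minimal sketch of a reads table with the four expected columns. The coordinates and chromosome names are made up, and the sanity checks are only a suggestion, not something bayespeak() is documented to require.

```r
# Hypothetical reads; coordinates and chromosome names are invented for illustration.
reads <- data.frame(
  chr    = c("chr16", "chr16", "chr16", "chr17"),
  start  = c(92000100L, 92000150L, 92000420L, 1000L),
  end    = c(92000135L, 92000185L, 92000455L, 1035L),
  strand = c("+", "-", "+", "-"),
  stringsAsFactors = FALSE
)

# Check the layout described above before handing the object to bayespeak().
required <- c("chr", "start", "end", "strand")
stopifnot(all(required %in% names(reads)))
stopifnot(all(reads$strand %in% c("+", "-")))
stopifnot(all(reads$start <= reads$end))
```

An object like `reads` could then be passed as the treatment or control argument in place of a .bed path.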
Argument: chr
Character vector, specifying the chromosomes to which analysis should be restricted. Chromosome names must be specified exactly as they appear in the treatment and control arguments. If left as the default value chr = NULL, then BayesPeak will find all chromosomes present in the treatment file.
Argument: start, end
Numeric. Locations on the chromosome at which to start and end, respectively. If unspecified, the algorithm will start and end at the minimum and maximum read positions found in the data, respectively.
Argument: bin.size
Numeric. Reads are collected into bins. This parameter controls the width of each bin. The bin size is related to the mean fragment length in the library being sequenced, and thus a smaller mean fragment length may merit a smaller bin size. Please see Spyrou et al. (2009) for more information.
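To make the idea of collecting reads into bins concrete, here is a minimal base-R sketch. The read positions and region bounds are made up, and how BayesPeak itself assigns a read to a bin (for example by strand or by which end of the read) is not described here, so treat the assignment rule as an assumption.

```r
# Hypothetical read start positions within one region (made-up values).
read.starts  <- c(105L, 130L, 199L, 200L, 260L, 455L, 460L, 470L)
region.start <- 100L
region.end   <- 500L
bin.size     <- 100L

# Bin edges every bin.size base pairs; bins are half-open intervals [left, right).
breaks <- seq(region.start, region.end, by = bin.size)

# Assign each read to a bin, then count reads per bin.
bin.index  <- findInterval(read.starts, breaks, rightmost.closed = TRUE)
bin.counts <- tabulate(bin.index, nbins = length(breaks) - 1L)
bin.counts
```

A smaller bin.size gives more, narrower bins over the same region, which is why it should be chosen with the fragment length in mind.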
Argument: iterations
Numeric. Number of iterations for which to run the Monte Carlo analysis.
Argument: repeat.offset
Logical. If TRUE, the algorithm is run a second time, this time with the bins offset by floor(bin.size/2).
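The following sketch suggests why a second, offset pass can help: a cluster of reads that straddles a bin boundary is split across two bins in the normal pass but falls into a single bin once the bins are shifted by half a bin width. The read positions are made up, and the binning rule is an illustrative assumption rather than BayesPeak's internal code.

```r
# Hypothetical reads clustered around the bin boundary at position 200.
read.starts  <- c(185L, 195L, 205L, 215L)
region.start <- 100L
region.end   <- 500L
bin.size     <- 100L

# Helper (hypothetical): count reads in consecutive bins defined by `breaks`.
count.bins <- function(x, breaks) {
  tabulate(findInterval(x, breaks), nbins = length(breaks) - 1L)
}

normal.breaks <- seq(region.start, region.end, by = bin.size)
offset.breaks <- normal.breaks + floor(bin.size / 2)   # shifted by half a bin

normal.counts <- count.bins(read.starts, normal.breaks)  # cluster split over two bins
offset.counts <- count.bins(read.starts, offset.breaks)  # cluster lands in one bin
```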
Argument: into.jobs
Logical. By default, BayesPeak will divide a large region into smaller jobs and analyse each one separately. To prevent this behaviour, set into.jobs = FALSE. This may put BayesPeak at increased risk of overflow and underflow issues, and will additionally prevent usage of the parallel processing options.
Argument: job.size
Numeric. The size of the jobs in base pairs, as described above.
Argument: job.overlap
Numeric. Jobs are expanded to overlap each other. This is to prevent peaks on the boundary between two jobs from being missed. job.overlap corresponds to the number of bins by which each job is expanded.
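The interaction of job.size, job.overlap and bin.size can be sketched as follows. `make_jobs` is a hypothetical helper written for this illustration, not a BayesPeak function, and it assumes each job is widened by job.overlap bins on both sides and clipped to the analysed region; the package may handle boundaries differently.

```r
# Hypothetical helper: split [start, end] into jobs of job.size base pairs,
# then widen each job by job.overlap bins (job.overlap * bin.size base pairs).
make_jobs <- function(start, end, job.size = 6e6, job.overlap = 20L, bin.size = 100L) {
  job.starts <- seq(start, end - 1, by = job.size)
  job.ends   <- pmin(job.starts + job.size, end)
  pad        <- job.overlap * bin.size
  data.frame(
    start = pmax(job.starts - pad, start),  # clip to the region start
    end   = pmin(job.ends + pad, end)       # clip to the region end
  )
}

jobs <- make_jobs(start = 0, end = 15e6)
jobs
```

With the default values, a 15 Mb region yields three jobs, and neighbouring jobs share 4000 base pairs (20 bins from each side), so a peak sitting on a job boundary is fully contained in at least one job.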
Argument: use.multicore
Logical. If use.multicore = TRUE, then the individual jobs will be processed in parallel, using the multicore package. The multicore package must be installed for this feature to be enabled. At time of writing, it can be downloaded from http://www.rforge.net/multicore/index.html.
Argument: mc.cores
Numeric. The number of cores to be used for parallel processing. This argument is passed directly to the mclapply function.
Argument: snow.cluster
Cluster object. A cluster to be used for parallel processing, as per the snow package. A cluster can be created via the makeCluster function.
Argument: prior
Numeric. A vector specifying the prior on the hyperparameters, as follows. We have lambda_0 ~ gamma(alpha_0, beta_0) and lambda_1 ~ gamma(alpha_1, beta_1). Additionally, alpha_0, alpha_1, beta_0 and beta_1 all have gamma priors. This argument should be c(alpha_0 shape, alpha_0 scale, beta_0 shape, beta_0 scale, alpha_1 shape, alpha_1 scale, beta_1 shape, beta_1 scale).
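To keep the eight entries of the prior vector straight, it can help to name them. The sketch below labels the default vector, assuming the fourth entry is the beta_0 scale, and computes the implied prior mean of each hyperparameter using the standard fact that a gamma distribution with a given shape and scale has mean shape * scale. The element names are chosen for this illustration only; bayespeak() is not documented to use them.

```r
# Default prior vector from the usage section.
prior <- c(5, 5, 10, 5, 25, 4, 0.5, 5)

# Illustrative names following the documented ordering (assumed, not required).
names(prior) <- c("alpha_0.shape", "alpha_0.scale",
                  "beta_0.shape",  "beta_0.scale",
                  "alpha_1.shape", "alpha_1.scale",
                  "beta_1.shape",  "beta_1.scale")

# Mean of a gamma distribution in the shape/scale parameterisation: shape * scale.
shapes <- prior[c(1, 3, 5, 7)]
scales <- prior[c(2, 4, 6, 8)]
prior.means <- setNames(unname(shapes * scales),
                        c("alpha_0", "beta_0", "alpha_1", "beta_1"))
prior.means
```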
Argument: report.p.samples
Logical. If FALSE, do not collect information required for the parameter samples reported in the output. Thus, output$p.samples will be an empty list. If this information is not required, setting this parameter to FALSE will reduce memory usage.
----------Details----------
BayesPeak uses a fully Bayesian hidden Markov model to detect enriched locations in the genome. The structure accommodates the natural features of the Solexa/Illumina sequencing data and allows for overdispersion in the abundance of reads in different regions. Markov chain Monte Carlo algorithms are applied to estimate the posterior distributions of the model parameters, and posterior probabilities are provided for the sites of interest.
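The overdispersion mentioned above can be illustrated with a small simulation. This is only a sketch of the general Poisson-gamma idea (a Poisson count whose rate is itself gamma distributed has variance larger than its mean); it is not the BayesPeak model, and the shape and scale values are arbitrary choices for the illustration.

```r
set.seed(1)
n <- 10000

# Poisson counts with a fixed rate: variance is close to the mean.
fixed.counts <- rpois(n, lambda = 10)

# Poisson counts whose rate is gamma distributed (mean rate 5 * 2 = 10).
rates        <- rgamma(n, shape = 5, scale = 2)
mixed.counts <- rpois(n, lambda = rates)

# Variance-to-mean ratio: about 1 for plain Poisson, clearly above 1 for the mixture.
dispersion <- c(fixed = var(fixed.counts) / mean(fixed.counts),
                mixed = var(mixed.counts) / mean(mixed.counts))
dispersion
```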
----------Value----------
A list of 4 objects:
peaks: A data.frame corresponding to the bins that BayesPeak has identified as potentially being enriched. chr, start, end give the genomic co-ordinates of the bin. PP refers to the posterior probability of the bin being enriched. job is the number of the job within which the bin was called, which corresponds to a row in the QC data.frame (see below).
QC: details of each individual job, listed in columns as follows:
calls is the number of potentially enriched bins identified in a job (i.e. bins with PP > 0.01).
score is simply the proportion of potentially enriched bins with a PP value above 0.5. Intuitively, a larger score is "better", as it indicates that more of the PP values have tended to 0 or 1.
chr, start, end are the genomic co-ordinates of the job.
p, theta, lambda0, lambda1, gamma are the average values, across iterations of the algorithm, of the important parameters, and loglhood is the average log likelihood.
var is the variance of the bin counts.
autocorr is an estimate of the first order autocorrelation of bin counts.
status indicates whether the job was normal, or offset by half a bin width.
call: the line of code used to run BayesPeak.
p.samples: A list of matrix objects, each containing parameter samples from the MCMC runs. p.samples[[i]] corresponds to the samples taken in job i. Samples are taken every 10 iterations, with the first half of the run being discarded; to avoid using too much memory, not all parameters are given. This output can be used to assess convergence, e.g. using the CRAN packages coda or boa (see the vignette: vignette("BayesPeak")).
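The QC columns calls, score, var and autocorr described above can be approximated from first principles as in the sketch below. The posterior probabilities and bin counts are made-up values, and the lag-1 correlation is one common estimator of first order autocorrelation; the exact estimator used inside BayesPeak is not stated here, so this is an assumption.

```r
# Hypothetical posterior probabilities for the bins of one job.
PP <- c(0.002, 0.02, 0.4, 0.6, 0.95, 0.99, 0.005, 0.8)

called <- PP > 0.01                 # potentially enriched bins
calls  <- sum(called)               # number of such bins
score  <- mean(PP[called] > 0.5)    # proportion of them with PP above 0.5

# Hypothetical bin counts for the same job.
bin.counts <- c(3, 2, 0, 3, 8, 9, 1, 0, 2, 7)

count.var <- var(bin.counts)
# Lag-1 autocorrelation as the correlation between each count and its successor.
lag1 <- cor(bin.counts[-1], bin.counts[-length(bin.counts)])

c(calls = calls, score = score, var = count.var, autocorr = lag1)
```

A job whose PP values pile up near 0 or 1 gives a score close to 1, whereas many intermediate PP values pull the score down.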
Note that the raw output of this function is not intended to be used directly as results. The output should be summarized using the summarize.peaks function before it is used in later analysis.
----------Author(s)----------
Christiana Spyrou and Jonathan Cairns
----------References----------
Spyrou C, Stark R, Lynch AG, Tavaré S (2009). BayesPeak: Bayesian analysis of ChIP-seq data. BMC Bioinformatics, 10:299. doi:10.1186/1471-2105-10-299
----------See Also----------
read.bed, summarize.peaks.
----------Examples----------
dir <- system.file("extdata", package="BayesPeak")
treatment <- file.path(dir, "H3K4me3reduced.bed")
input <- file.path(dir, "Inputreduced.bed")
## look at the specific region 92-95Mb on chromosome 16
## (we've used half the default number of iterations here to reduce the time this example takes)
raw.output <- bayespeak(treatment, input, chr = "chr16", start = 9.2E7, end = 9.5E7, iterations = 5000L, use.multicore = TRUE)
output <- summarize.peaks(raw.output)
output
## Not run:
## analyse all data in the file
raw.output.wg <- bayespeak(treatment, input, use.multicore = TRUE)
output <- summarize.peaks(raw.output.wg)
## End(Not run)