R language: exomeCopy package, exomeCopy() function help documentation

loveR · posted 2012-2-25 17:18:37

exomeCopy(exomeCopy)
Package containing exomeCopy(): exomeCopy

Fit the exomeCopy or exomeCopyVar model to the observed counts.

Translator: LoveR, the biostatistic.net robot

----------Description----------

Fits a hidden Markov model (HMM) to observed read counts using positional covariates. It returns an object containing the fitted parameters and the Viterbi path, i.e. the most likely sequence of hidden states, which gives the predicted copy count in each window. exomeCopy is designed to run on read counts from a single chromosome. See the package vignette for an example of how to prepare input data for exomeCopy and how to loop the function over multiple chromosomes and samples.
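The Viterbi path is the result of a standard dynamic-programming recursion over the hidden states. The base-R sketch below, with the hypothetical name viterbi_sketch(), decodes the most likely state sequence from log emission, log transition and log initial probabilities. It only illustrates the technique and is not the package's own viterbiPath().

```r
# Log-space Viterbi decoder (illustrative sketch, not exomeCopy's viterbiPath()).
# log_emit:  T x K matrix of log emission probabilities (rows = ranges)
# log_trans: K x K matrix of log transition probabilities (rows = "from" state)
# log_init:  length-K vector of log initial-state probabilities
viterbi_sketch <- function(log_emit, log_trans, log_init) {
  n <- nrow(log_emit)
  k <- ncol(log_emit)
  score <- matrix(-Inf, n, k)   # best log-probability of any path ending in state j at t
  back <- matrix(0L, n, k)      # best predecessor state, used for the traceback
  score[1, ] <- log_init + log_emit[1, ]
  if (n > 1) {
    for (t in 2:n) {
      for (j in 1:k) {
        cand <- score[t - 1, ] + log_trans[, j]
        back[t, j] <- which.max(cand)
        score[t, j] <- cand[back[t, j]] + log_emit[t, j]
      }
    }
  }
  # trace the best path backwards from the best final state
  path <- integer(n)
  path[n] <- which.max(score[n, ])
  if (n > 1) {
    for (t in (n - 1):1) path[t] <- back[t + 1, path[t + 1]]
  }
  path
}
```

For example, with two "sticky" states and emissions that favour state 1 for three ranges and then state 2 for three ranges, the decoded path switches once, at the fourth range.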

exomeCopy requires as input a RangedData object containing read counts in genomic ranges together with the covariates. Two convenience functions are provided for preparing the input for exomeCopy:

subdivideGRanges, to subdivide a GRanges object containing the genomic ranges of the targeted region into genomic ranges of nearly equal width, and

countBamInGRanges, to count the number of read starts from a BAM read-mapping file that fall in each range of a GRanges object.
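To make the two preparation steps concrete, here is a base-R sketch that works on plain data frames instead of GRanges. The names subdivide_range() and count_read_starts() are hypothetical stand-ins for the idea behind subdivideGRanges() and countBamInGRanges(), not their actual implementation; the read start positions are assumed to be already extracted from the BAM file.

```r
# Hypothetical base-R illustration; the real functions are subdivideGRanges()
# and countBamInGRanges() in exomeCopy, which operate on GRanges and BAM files.

# Split [start, end] into pieces of roughly target_width bases whose widths
# differ by at most 1.
subdivide_range <- function(start, end, target_width = 100) {
  len <- end - start + 1
  n <- max(1, round(len / target_width))   # number of pieces
  cuts <- start + floor((0:n) * len / n)   # evenly spaced cut points
  data.frame(start = cuts[-(n + 1)], end = cuts[-1] - 1)
}

# Count how many read start positions fall inside each [start, end] range.
count_read_starts <- function(read_starts, ranges) {
  vapply(seq_len(nrow(ranges)), function(i)
    sum(read_starts >= ranges$start[i] & read_starts <= ranges$end[i]),
    integer(1))
}
```

A 250 bp target with a 100 bp target width is split into two 125 bp pieces rather than pieces of 100, 100 and 50 bp, which is what "nearly equal width" means here.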

The GC content (the fraction of G and C bases among all bases) of the input ranges can be obtained using scanFa from the Rsamtools package, which returns a DNAStringSet object, followed by letterFrequency from the Biostrings package. See the vignette for an example.
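The GC fraction itself is a simple per-sequence ratio. The sketch below computes it in base R on plain character strings so that it runs without Bioconductor. With Biostrings, a comparable call on the DNAStringSet from scanFa() would be letterFrequency(dna, "GC", as.prob = TRUE); that call appears only as a comment here, and gc_fraction() is a hypothetical helper.

```r
# Base-R sketch: fraction of G or C bases in each sequence (all characters,
# including N, count towards the denominator).
# Bioconductor equivalent (not run here): letterFrequency(dna, "GC", as.prob = TRUE)
gc_fraction <- function(seqs) {
  vapply(strsplit(toupper(seqs), ""),
         function(bases) mean(bases %in% c("G", "C")),
         numeric(1))
}
```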

----------Usage----------

  exomeCopy(rdata, sample.name, X.names, Y.names, fit.var = FALSE,
            reltol = 0.0001, S = 0:4, d = 2, goto.cnv = 1e-4,
            goto.normal = 1/20, init.phi = "norm")

----------Arguments----------

Argument: rdata
A RangedData object containing the sample counts and positional covariates over the genomic ranges.

Argument: sample.name
The name of the value column of rdata that holds the sample read counts.

Argument: X.names
The names of the value columns of rdata that hold the covariates used to estimate mu.

Argument: Y.names
(optional) The names of the value columns of rdata that hold the covariates used to estimate phi; required only if fit.var = TRUE.

Argument: fit.var
A logical indicating whether the model should fit the overdispersion parameter phi as a linear combination of covariates (exomeCopyVar) or as a scalar (exomeCopy). Defaults to FALSE (exomeCopy).

Argument: reltol
The relative convergence tolerance passed to the optim function when optimizing the parameters. In testing, the default value was sufficient for fitting the parameters, but lower relative tolerances can be used.

Argument: S
A vector of the possible copy numbers, one for each hidden state.

Argument: d
The expected copy number of the normal state. Set this to 2 for autosomes and to 1 for haploid data.

Argument: goto.cnv
The initial value of the probability of transitioning to a CNV state.

Argument: goto.normal
The initial value of the probability of transitioning to the normal state.

Argument: init.phi
Either "norm" or "counts": initialize phi with a moment estimate computed either from the residuals of a linear model of read counts on the covariates ("norm") or from the raw counts ("counts").
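The idea behind a moment estimate from linear-model residuals can be sketched as follows. This assumes the negative binomial parameterization Var(O) = mu + phi * mu^2 (so size = 1/phi, as in the Examples simulation) and averages the per-range solutions of that equation; init_phi_moments() is a hypothetical helper, and exomeCopy's own initialization may compute the estimate differently.

```r
# Method-of-moments start value for phi (illustrative; not the package code).
# ASSUMPTION: Var(O) = mu + phi * mu^2, i.e. size = 1 / phi.
init_phi_moments <- function(counts, X) {
  fit <- lm(counts ~ X)              # linear model of read counts on covariates
  mu <- pmax(fitted(fit), 1e-8)      # fitted means, kept positive
  r2 <- residuals(fit)^2             # squared residuals approximate the variance
  # solve r2 = mu + phi * mu^2 for phi at each range, average, keep positive
  max(mean((r2 - mu) / mu^2), 1e-8)
}
```

On simulated negative binomial counts with phi = 0.1 the estimate lands close to 0.1, which is all a starting value for the optimizer needs.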

----------Details----------

exomeCopy fits the transition and emission parameters of an HMM to best explain the observed counts of a sample from exome or targeted sequencing. The set of underlying copy number states in the sample, S, must be provided before running the algorithm.

The emission probabilities are given by a negative binomial distribution whose parameters depend on positional covariates, such as background read depth, quadratic terms for GC content, and range width, which are stored in a matrix X. Optionally, to fit the variance of the distribution, the standard deviation and/or variance of the background set can be included in a matrix Y. All covariates are normalized within exomeCopy to improve optimization.

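For the observed count at range t, O_t, the emission probability of state i is given by a negative binomial distribution with mean mu_ti and overdispersion phi:

$$ P(O_t \mid \text{state } i) = \mathrm{NB}(O_t;\ \mu_{ti},\ \phi) $$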

The mean parameter mu_ti is given by:

$$ \mu_{ti} = \frac{S_i}{d}\, x_{t*}\, \beta $$

Here S_i is the i-th possible copy number state, d is the expected background copy number (d = 2 for diploid sequence), beta is the vector of coefficients fitted by the model, and x_t* is the t-th row of the matrix X.

mu must be positive, so any value less than zero is replaced with a small positive number.
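The emission model above can be evaluated for every range and state in a few lines of base R. In this sketch, emission_logprob() is a hypothetical helper. It assumes that beta holds an intercept followed by one slope per normalized covariate, as in the Examples simulation, and that the negative binomial uses size = 1/phi. It returns a ranges-by-states matrix of log emission probabilities, flooring mu at 1e-8 so that the S_i = 0 state stays valid.

```r
# Log emission probabilities for each range (rows) and copy-number state (columns).
# Illustrative sketch; ASSUMPTIONS: beta = (intercept, slopes...), size = 1 / phi.
emission_logprob <- function(O, X, beta, S = 0:4, d = 2, phi = 0.1) {
  Xs <- cbind(1, scale(X))          # normalize covariates and add an intercept column
  base_mu <- drop(Xs %*% beta)      # x_t* beta, the mean for the normal state
  sapply(S, function(s) {
    mu <- pmax((s / d) * base_mu, 1e-8)   # mu_ti = (S_i / d) x_t* beta, kept positive
    dnbinom(O, mu = mu, size = 1 / phi, log = TRUE)
  })
}
```

The column for S_i = d reproduces the background mean x_t* beta, while S_i = 3 scales it by 3/2, the same factor the Examples use to simulate a heterozygous duplication.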

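For exomeCopyVar, which also fits the variance, the emission probability is given by:

$$ P(O_t \mid \text{state } i) = \mathrm{NB}(O_t;\ \mu_{ti},\ \phi_t) $$

where

$$ \phi_t = y_{t*}\, \gamma $$

or a small positive number if this is less than zero. Here y_t* is the t-th row of the matrix Y and gamma is a vector of coefficients fitted by the model.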
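The range-specific overdispersion of exomeCopyVar can be sketched the same way. phi_by_range() is a hypothetical helper that assumes gamma holds an intercept followed by one coefficient per normalized column of Y, mirroring how the Examples simulate phi. Negative values are floored at a small positive number, as described above.

```r
# phi_t = y_t* gamma, floored at a small positive value (illustrative sketch).
# ASSUMPTION: gamma = (intercept, one coefficient per column of Y).
phi_by_range <- function(Y, gamma, floor_value = 1e-8) {
  Ys <- cbind(1, scale(Y))               # normalized covariates plus intercept
  pmax(drop(Ys %*% gamma), floor_value)  # keep every phi_t positive
}
```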

Two transition probabilities are fitted in the model: the probability of transitioning to the normal state and the probability of transitioning to a CNV state.
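One way to turn the two fitted probabilities into a full transition matrix is sketched below. The structure is an assumption made for illustration and may differ from exomeCopy's own parameterization: from the normal state each CNV state is entered with probability goto.cnv, and from a CNV state the chain returns to normal with probability goto.normal and otherwise stays where it is. transition_matrix() is a hypothetical helper; its defaults copy the documented defaults of exomeCopy().

```r
# Hypothetical transition matrix from the two probabilities (rows = "from" state).
# ASSUMPTION: no direct CNV-to-CNV jumps; exomeCopy's exact structure may differ.
transition_matrix <- function(S = 0:4, d = 2, goto.cnv = 1e-4, goto.normal = 1/20) {
  k <- length(S)
  normal <- which(S == d)
  P <- diag(1 - goto.normal, k)                # a CNV state mostly persists...
  P[, normal] <- goto.normal                   # ...or returns to the normal state
  P[normal, ] <- goto.cnv                      # the normal state may enter any CNV state
  P[normal, normal] <- 1 - goto.cnv * (k - 1)  # so that every row sums to 1
  dimnames(P) <- list(from = S, to = S)
  P
}
```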

exomeCopy calls negLogLike to evaluate the likelihood of the HMM. The parameters are fitted by Nelder-Mead optimization of the negative log likelihood using the optim function. The Viterbi path is then computed by calling viterbiPath.
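The fitting loop described above, an HMM negative log likelihood minimized by optim with method = "Nelder-Mead", can be sketched in base R. hmm_nll() computes the negative log likelihood with the scaled forward algorithm. fit_sketch() is a deliberately small, hypothetical example that fits a baseline mean and a single switching probability for a two-state chain with states S = c(2, 3). It uses Poisson emissions instead of the negative binomial to stay short. Neither function is the package's negLogLike() or its parameterization.

```r
# Negative log likelihood of an HMM via the scaled forward algorithm (sketch).
# log_emit: T x K log emission matrix; trans: K x K (rows = "from"); init: length K
hmm_nll <- function(log_emit, trans, init) {
  m <- max(log_emit[1, ])
  alpha <- init * exp(log_emit[1, ] - m)     # rescaled to avoid underflow
  ll <- m + log(sum(alpha))
  alpha <- alpha / sum(alpha)
  for (t in seq_len(nrow(log_emit))[-1]) {
    m <- max(log_emit[t, ])
    alpha <- drop(alpha %*% trans) * exp(log_emit[t, ] - m)
    ll <- ll + m + log(sum(alpha))           # add log P(O_t | O_1..O_{t-1})
    alpha <- alpha / sum(alpha)
  }
  -ll
}

# Hypothetical two-state fit; parameters on unconstrained scales for optim.
fit_sketch <- function(O, S = c(2, 3), d = 2, reltol = 1e-4) {
  obj <- function(par) {
    lambda <- exp(par[1])                    # baseline mean for the normal state
    p <- plogis(par[2])                      # probability of switching state
    log_emit <- sapply(S, function(s) dpois(O, s / d * lambda, log = TRUE))
    trans <- matrix(c(1 - p, p, p, 1 - p), 2)
    hmm_nll(log_emit, trans, init = rep(1 / 2, 2))
  }
  optim(c(log(mean(O)), qlogis(0.01)), obj, method = "Nelder-Mead",
        control = list(reltol = reltol))
}
```

As with the reltol argument of exomeCopy(), the control list passes the relative tolerance to optim, and the returned convergence code is 0 when the optimizer reports convergence.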

----------Value----------

Returns an ExomeCopy object with the following slots:

type: the type of model used, either "exomeCopy" or

path: the index of the predicted state for each genomic range

ranges: the IRangesList of the ranges

O: the input vector of counts

O.norm: the input vector of counts divided by the

mu: the estimated mean vector, matrix multiplication

phi: a scalar estimate of phi (or matrix

fx.par: a list of the settings S, d, cnv.states,

init.par: a list of the initial parameters goto.cnv,

final.par: a list of the final parameters goto.cnv,

counts: the number of evaluations of the log likelihood performed by optim

convergence: the integer convergence code returned by optim (0 indicates convergence)

nll: the final value of the negative log likelihood

----------Author(s)----------

Michael Love


----------See Also----------

subdivideGRanges, countBamInGRanges, copyCountSegments, plot.ExomeCopy, negLogLike, IRanges, RangedData

----------Examples----------

library(exomeCopy)

## The following is an example of running exomeCopy on simulated
## read counts using the model parameters defined below. For an example
## using real exome sequencing read counts (with simulated CNV), please
## see the vignette.

## create RangedData for storing genomic ranges and covariate data
## (background, background stdev, GC-content)
m <- 5000
rdata <- RangedData(IRanges(start=(0:(m-1))*100+1, width=100),
                    space=rep("chr1",m), universe="hg19",
                    bg=rexp(m,1), bg.sd=rexp(m,1), gc=rnorm(m,50,10))

## create read depth distributional parameters mu and phi
rdata$gc.sq <- rdata$gc^2
X <- cbind(bg=rdata$bg, gc=rdata$gc, gc.sq=rdata$gc.sq)
Y <- cbind(bg.sd=rdata$bg.sd)
beta <- c(20,10,2,-.01)
gamma <- c(.1,.05)
rdata$mu <- beta[1] + scale(X) %*% beta[2:4]
rdata$mu[rdata$mu<1e-8] <- 1e-8
rdata$phi <- gamma[1] + scale(Y) %*% gamma[2]
rdata$phi[rdata$phi<1e-8] <- 1e-8

## create observed counts with a simulated heterozygous duplication
## (copy number 3 instead of 2, i.e. roughly 1.5 times the counts)
cnv.nranges <- 200
bounds <- (round(m/2)+1):(round(m/2)+cnv.nranges)
O <- rnbinom(nrow(rdata), mu=rdata$mu, size=1/rdata$phi)
O[bounds] <- O[bounds] + rbinom(cnv.nranges, prob=0.5, size=O[bounds])
rdata[["sample1"]] <- O

## run exomeCopy() and list the predicted segments
fit <- exomeCopy(rdata, "sample1", X.names=c("bg","gc","gc.sq"))
copyCountSegments(fit)

## see the man page for copyCountSegments() for a summary of
## the predicted segments of constant copy count, and the one
## for plot.ExomeCopy() for plotting fitted objects

When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).

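Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the biostatistic.net robot. It is provided only as a reference for personal R learning; biostatistic.net reserves the copyright.
Note 2: Because this is an automatic machine translation, inaccuracies are unavoidable. When using it, compare the Chinese and English text carefully; doing so can help with learning R.
Note 3: If you find an inaccuracy, please reply to this post and we will gradually revise it.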
