R DNAcopy package: segment() function help documentation

loveR · posted 2012-2-25 16:42:31

segment(DNAcopy)
R package containing segment(): DNAcopy

                                    Genome Segmentation Program

                                       Translator: LoveR, the bot of biostatistic.net

----------Description----------

This program segments DNA copy number data into regions of estimated equal copy number using circular binary segmentation (CBS).

----------Usage----------

  segment(x, weights = NULL, alpha = 0.01, nperm = 10000, p.method =
                  c("hybrid", "perm"), min.width=2, kmax=25, nmin=200,
                  eta=0.05, sbdry=NULL, trim = 0.025, undo.splits =
                  c("none", "prune", "sdundo"), undo.prune=0.05,
                  undo.SD=3, verbose=1)

----------Arguments----------

Argument: x
an object of class CNA

Argument: weights
a vector of weights for the probes. The weights should be inversely proportional to the probe variances. Currently all weights must be positive, i.e. remove probes with zero weight before segmentation.
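As a minimal sketch, assuming per-probe variance estimates are already available (the vectors probe_var and genomdat below are hypothetical illustration data), weights inversely proportional to the variances can be formed and zero-weight probes dropped before the CNA object is built:

```r
# Hypothetical per-probe variance estimates; Inf marks a probe with no usable data
probe_var <- c(0.04, 0.01, 0.09, Inf, 0.04)
genomdat  <- c(0.10, -0.05, 0.20, 0.00, 0.15)

# Weights inversely proportional to the variances (1/Inf gives weight 0)
wts <- 1 / probe_var

# segment() needs strictly positive weights, so drop zero-weight probes
# together with their data (subset chrom and maploc the same way)
keep     <- is.finite(wts) & wts > 0
wts      <- wts[keep]
genomdat <- genomdat[keep]
wts
```

The filtered wts vector is what would then be passed as the weights argument, alongside a CNA object built from the equally filtered data.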

Argument: alpha
significance level for the test to accept change-points.

Argument: nperm
number of permutations used for p-value computation.

Argument: p.method
method used for p-value computation. For the "perm" method the p-value is based on full permutation. For the "hybrid" method the maximum over the entire region is split into the maximum over small segments and the maximum over the rest; permutation is used for the small-segment maximum and an approximation for the larger-segment maximum. Default is "hybrid".
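To illustrate how nperm enters a full-permutation p-value, here is a simplified sketch that tests a single, non-circular split of one segment: the statistic is the maximum two-sample t-statistic over all split points, and its reference distribution comes from permuting the data. The helper names max_split_t and perm_pvalue are hypothetical; this is not DNAcopy's implementation, which uses the circular statistic described under Details.

```r
# Hypothetical helper: maximum absolute two-sample t-statistic over all
# single split points, using the overall SD of x as a simplified noise scale
max_split_t <- function(x) {
  n <- length(x)
  s <- sd(x)
  t_stats <- vapply(seq_len(n - 1), function(i) {
    abs(mean(x[1:i]) - mean(x[(i + 1):n])) / (s * sqrt(1 / i + 1 / (n - i)))
  }, numeric(1))
  max(t_stats)
}

# Hypothetical helper: full-permutation p-value, i.e. the share of
# permuted statistics that are >= the observed one
perm_pvalue <- function(x, nperm = 1000) {
  obs  <- max_split_t(x)
  perm <- replicate(nperm, max_split_t(sample(x)))
  mean(perm >= obs)
}

set.seed(1)
x <- c(rnorm(40, 0, 0.1), rnorm(40, 0.5, 0.1))  # one clear step change
p <- perm_pvalue(x, nperm = 200)
p
```

Larger nperm gives finer p-value resolution (here at best 1/200) at proportionally higher cost, which is what the "hybrid" method and the sequential boundary are meant to reduce.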

Argument: min.width
the minimum number of markers for a changed segment. The default is 2, but it can be made larger. The maximum possible value is set at 5, since arbitrarily large widths can have the undesirable effect of producing incorrect change-points when a true signal of narrow width exists.

Argument: kmax
the maximum width of the smaller segment for permutation in the hybrid method.

Argument: nmin
the minimum length of data for which the approximation of the maximum statistic is used under the hybrid method. It should be larger than 4*kmax.

Argument: eta
the probability to declare a change conditioned on the permuted statistic exceeding the observed statistic exactly j (= 1, ..., nperm*alpha) times.

Argument: sbdry
the sequential boundary used to stop and declare a change. This boundary is a function of nperm, alpha and eta. It can be obtained with the function getbdry and passed in, instead of having segment compute it every time it is called.
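The purpose of a sequential boundary is to stop permuting early once the outcome is clear. The sketch below uses only the simplest such rule and is not the eta-based boundary that getbdry computes: once more than nperm*alpha permuted statistics have reached the observed one, the permutation p-value can no longer be at most alpha, so the loop stops and no change is declared. The function names seq_perm_test and half_diff are hypothetical.

```r
# Hypothetical helper: permutation test that stops as soon as a significant
# result is impossible. stat_fun maps a numeric vector to a scalar statistic.
seq_perm_test <- function(x, stat_fun, nperm = 10000, alpha = 0.01) {
  obs <- stat_fun(x)
  max_exceed <- floor(nperm * alpha)  # more exceedances than this => p > alpha
  exceed <- 0
  for (k in seq_len(nperm)) {
    if (stat_fun(sample(x)) >= obs) exceed <- exceed + 1
    if (exceed > max_exceed) {
      return(list(change = FALSE, perms_used = k))  # stop early
    }
  }
  list(change = TRUE, perms_used = nperm)  # p-value is at most alpha
}

# Hypothetical statistic: absolute difference between the two halves' means
half_diff <- function(v) {
  h <- length(v) %/% 2
  abs(mean(v[1:h]) - mean(v[(h + 1):length(v)]))
}

set.seed(2)
flat    <- rnorm(100)                         # no change
shifted <- c(rnorm(50), rnorm(50, mean = 3))  # clear change
res_flat    <- seq_perm_test(flat, half_diff, nperm = 1000, alpha = 0.01)
res_shifted <- seq_perm_test(shifted, half_diff, nperm = 1000, alpha = 0.01)
res_flat$perms_used
res_shifted$change  # TRUE
```

On data without a change the loop usually ends well before nperm permutations; a clear change still needs all of them under this crude rule, which is one reason a boundary that can also declare a change early (as sbdry does) saves time.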

Argument: trim
proportion of data to be trimmed in the variance calculation used for smoothing outliers and for undoing splits based on SD.
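A trimmed variance discards the most extreme values before measuring spread, so a few outliers do not inflate the estimate. The sketch below uses a generic symmetric trim of the sorted data (the helper trimmed_var is hypothetical); DNAcopy's own estimator may differ in detail, for example in what it trims and whether it applies a correction factor, so treat this only as an illustration of the idea.

```r
# Hypothetical helper: variance after removing a proportion `trim`
# of the observations from each tail of the sorted data.
# Note: without a correction factor this underestimates the variance
# of clean (outlier-free) data.
trimmed_var <- function(x, trim = 0.025) {
  x <- sort(x)
  n <- length(x)
  k <- floor(n * trim)  # number of points dropped from each end
  var(x[(k + 1):(n - k)])
}

set.seed(3)
x <- c(rnorm(200, sd = 0.1), 5, -5)  # two gross outliers
v_full <- var(x)
v_trim <- trimmed_var(x, trim = 0.025)
c(full = v_full, trimmed = v_trim)
```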

Argument: undo.splits
A character string specifying how change-points are to be undone, if at all. Default is "none". The other choices are "prune", which uses a sum-of-squares criterion, and "sdundo", which undoes splits whose adjacent segment means are not at least undo.SD SDs apart.

Argument: undo.prune
the proportional increase in the sum of squares allowed when eliminating splits, if undo.splits="prune".
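The prune criterion compares residual sums of squares. As a simplified, greedy sketch (DNAcopy's own search may differ), change-points are removed one at a time, each time choosing the removal that raises the sum of squares least, for as long as the total stays within (1 + undo.prune) times the sum of squares of the full segmentation. Change-points are given as the index of the last marker of each segment except the final one; seg_rss and prune_cpts are hypothetical names.

```r
# Hypothetical helper: residual sum of squares of x around its segment means.
# cpts: sorted indices of the last marker of each segment except the last.
seg_rss <- function(x, cpts) {
  ends   <- c(cpts, length(x))
  starts <- c(1, cpts + 1)
  sum(mapply(function(s, e) sum((x[s:e] - mean(x[s:e]))^2), starts, ends))
}

# Hypothetical helper: greedily drop change-points while the RSS stays
# within (1 + undo.prune) times the RSS of the full segmentation
prune_cpts <- function(x, cpts, undo.prune = 0.05) {
  limit <- (1 + undo.prune) * seg_rss(x, cpts)
  while (length(cpts) > 0) {
    rss_without <- vapply(seq_along(cpts),
                          function(k) seg_rss(x, cpts[-k]), numeric(1))
    best <- which.min(rss_without)
    if (rss_without[best] > limit) break
    cpts <- cpts[-best]
  }
  cpts
}

set.seed(4)
x <- c(rnorm(200, 0, 0.1), rnorm(400, 1, 0.1))  # only one real change, at 200
pruned <- prune_cpts(x, cpts = c(200, 400), undo.prune = 0.05)
pruned  # 200
```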

Argument: undo.SD
the number of SDs between means required to keep a split, if undo.splits="sdundo".
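A simplified sketch of the "sdundo" idea: while any two adjacent segment means differ by less than undo.SD SDs, the change-point between the closest pair is removed and the means are recomputed. The noise SD here is estimated robustly from first differences with mad(), which is an assumption of this sketch and not necessarily DNAcopy's trimmed estimator; sdundo_cpts is a hypothetical name.

```r
# Hypothetical helper: undo splits whose adjacent segment means are
# fewer than undo.SD noise SDs apart.
# cpts: sorted indices of the last marker of each segment except the last.
sdundo_cpts <- function(x, cpts, undo.SD = 3) {
  # Robust noise SD from first differences (assumption of this sketch)
  sigma <- mad(diff(x)) / sqrt(2)
  while (length(cpts) > 0) {
    ends      <- c(cpts, length(x))
    starts    <- c(1, cpts + 1)
    seg_means <- mapply(function(s, e) mean(x[s:e]), starts, ends)
    gaps <- abs(diff(seg_means))         # one gap per change-point
    if (min(gaps) >= undo.SD * sigma) break
    cpts <- cpts[-which.min(gaps)]       # drop the least-supported split
  }
  cpts
}

set.seed(5)
x <- c(rnorm(100, 0, 0.1), rnorm(100, 0.05, 0.1), rnorm(100, 1, 0.1))
kept <- sdundo_cpts(x, cpts = c(100, 200), undo.SD = 3)
kept  # 200
```

The split at 100 separates means only about 0.05 apart (half an SD), so it is undone; the split at 200 separates means about 1 apart and survives.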

Argument: verbose
level of verbosity for monitoring the program's progress: 0 produces no printout, 1 prints the current sample, 2 the current chromosome and 3 the current segment. The default level is 1.

----------Details----------

This function implements the circular binary segmentation (CBS) algorithm of Olshen and Venkatraman (2004). Given a set of genomic data, either continuous or binary, the algorithm recursively splits chromosomes into either two or three subsegments based on a maximum t-statistic. A reference distribution, used to decide whether or not to split, is estimated by permutation. Options are given to eliminate splits when the means of adjacent segments are not sufficiently far apart. Note that after the first split the α-levels of the tests for splitting are not unconditional.
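To make the statistic concrete, the brute-force sketch below computes a CBS-type maximum statistic for a single segment. For a segment x of length n with partial sums S_k, the arc x[(i+1):j] is compared with the remaining points of the circularly joined segment: the difference of the two means is divided by sigma * sqrt(1/(j-i) + 1/(n-j+i)), and the maximum absolute value over all arcs is returned. The noise SD default, mad() of first differences, is an assumption of this sketch, as is the name cbs_max_stat; DNAcopy's implementation is far more efficient and adds the permutation-based thresholds described above.

```r
# Hypothetical helper: brute-force CBS-type statistic for one segment.
# Partial sums make each arc comparison O(1), so the search is O(n^2).
cbs_max_stat <- function(x, sigma = mad(diff(x)) / sqrt(2)) {
  n <- length(x)
  S <- c(0, cumsum(x))  # S[k + 1] = sum of x[1:k]
  best <- list(stat = -Inf, i = NA, j = NA)
  for (i in 0:(n - 1)) {
    for (j in (i + 1):n) {
      m <- j - i                  # arc length
      if (m == n) next            # complement would be empty
      inside  <- (S[j + 1] - S[i + 1]) / m
      outside <- (S[n + 1] - S[j + 1] + S[i + 1]) / (n - m)
      z <- abs(inside - outside) / (sigma * sqrt(1 / m + 1 / (n - m)))
      if (z > best$stat) best <- list(stat = z, i = i, j = j)
    }
  }
  best
}

set.seed(6)
x <- c(rnorm(60, 0, 0.1), rnorm(20, 1, 0.1), rnorm(60, 0, 0.1))
res <- cbs_max_stat(x)
c(res$i, res$j)  # arc (i+1)..j, i.e. the elevated block 61..80
```

The circular view is what lets one test find a short elevated block in the middle of a segment (a three-way split) rather than only a single step.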

We recommend using one of the undoing options to remove change-points detected due to local trends (see the manuscript below for examples of local trends).

Since the segmentation procedure uses a permutation reference distribution, R commands for setting and saving seeds (such as set.seed()) should be used if you wish to reproduce the results.
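In base R this means calling set.seed() before the permutation-based computation, or saving the random number generator state held in .Random.seed and restoring it later. In the sketch below, sample() stands in for any permutation-based step such as a call to segment():

```r
# Fix the seed so permutation-based results can be reproduced
set.seed(2012)
run1 <- sample(100, 5)
set.seed(2012)
run2 <- sample(100, 5)
identical(run1, run2)  # TRUE

# Alternatively, save the current RNG state and restore it later
saved_state <- get(".Random.seed", envir = globalenv())
run3 <- sample(100, 5)
assign(".Random.seed", saved_state, envir = globalenv())
run4 <- sample(100, 5)
identical(run3, run4)  # TRUE
```

Assigning into globalenv() explicitly matters: a plain `.Random.seed <- ...` inside a function would only create a local copy that the generator ignores.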

Values that are NA, Inf or NaN are removed per sample when they occur in "genomdat", and from all samples when they occur in "chrom" or "maploc".
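The rule can be mimicked in base R. In this sketch (split_finite and its list-of-data-frames output are hypothetical, not DNAcopy internals), rows with a non-finite chrom or maploc are dropped for every sample, while a non-finite data value is dropped only from the sample it occurs in. It assumes numeric chromosome codes, since is.finite() returns FALSE for character values.

```r
# Hypothetical helper.
# genomdat: matrix with one column per sample; chrom, maploc: numeric per-row vectors
split_finite <- function(genomdat, chrom, maploc) {
  ok_pos   <- is.finite(chrom) & is.finite(maploc)  # applies to all samples
  genomdat <- genomdat[ok_pos, , drop = FALSE]
  chrom    <- chrom[ok_pos]
  maploc   <- maploc[ok_pos]
  # Per sample: keep only the rows whose value in that sample is finite
  lapply(seq_len(ncol(genomdat)), function(s) {
    ok <- is.finite(genomdat[, s])
    data.frame(chrom = chrom[ok], maploc = maploc[ok], value = genomdat[ok, s])
  })
}

genomdat <- cbind(c(0.1, NA, 0.3, 0.2), c(0.5, 0.4, Inf, 0.1))
chrom    <- c(1, 1, 1, 1)
maploc   <- c(10, 20, 30, NA)
res <- split_finite(genomdat, chrom, maploc)
sapply(res, nrow)  # 2 2
```

Row 4 is dropped for both samples because its maploc is NA; row 2 is dropped only for the first sample and row 3 only for the second.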

----------Value----------

An object of class DNAcopy. It is a list with four elements:

Element: data
The original CNA object that was the input for segment.

Element: out
a data frame with six columns. Each row describes one segment with six variables: the sample id, the chromosome number, the map position of the start of the segment, the map position of the end of the segment, the number of markers in the segment, and the average value in the segment.
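To make this layout concrete, the sketch below builds such a table for one sample from a vector of fitted segment labels. The column names follow what we understand DNAcopy to use (ID, chrom, loc.start, loc.end, num.mark, seg.mean), but that naming is an assumption here, and the helper segment_table is hypothetical:

```r
# Hypothetical helper: one row per run of consecutive markers that share
# a segment label and a chromosome
segment_table <- function(id, chrom, maploc, value, seg_label) {
  key    <- paste(chrom, seg_label)  # a new segment starts when either changes
  runs   <- rle(key)
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  data.frame(
    ID        = id,
    chrom     = chrom[starts],
    loc.start = maploc[starts],
    loc.end   = maploc[ends],
    num.mark  = runs$lengths,
    seg.mean  = mapply(function(s, e) mean(value[s:e]), starts, ends)
  )
}

chrom  <- c(1, 1, 1, 1, 2, 2)
maploc <- c(100, 200, 300, 400, 50, 150)
value  <- c(0.1, 0.1, 0.9, 1.1, -0.4, -0.6)
label  <- c(1, 1, 2, 2, 3, 3)
tab <- segment_table("sample1", chrom, maploc, value, label)
tab
```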

Element: segRows
a data frame with the start and end rows of each segment in the data matrix. The print method shows it when called with showSegRows=TRUE.
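Because segments are contiguous, their start and end rows can be recovered from the marker counts. A sketch, assuming a single sample with no removed values and segments listed in data-matrix order; the names startRow and endRow reflect our understanding of DNAcopy's columns and are an assumption here. The counts are the first four segment lengths used in the examples below, which make up chromosome 1:

```r
# Marker counts per segment, in the order the segments appear in the data
num.mark <- c(137, 87, 17, 49)

endRow   <- cumsum(num.mark)           # last row of each segment
startRow <- endRow - num.mark + 1      # first row of each segment
data.frame(startRow, endRow)
```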

Element: call
the call that produced the output object.

----------Author(s)----------

Venkatraman E. Seshan <seshanv@mskcc.org> and Adam Olshen <olshena@biostat.ucsf.edu>

----------References----------

Circular binary segmentation for the analysis of array-based DNA copy number data.  Biostatistics 5: 557-572.
segmentation algorithm for the analysis of array CGH data. Bioinformatics 23: 657-63.

----------Examples----------

library(DNAcopy)

# test code on an easy data set
set.seed(25)
genomdat <- rnorm(500, sd=0.1) +
rep(c(-0.2,0.1,1,-0.5,0.2,-0.5,0.1,-0.2),c(137,87,17,49,29,52,87,42))
plot(genomdat)
chrom <- rep(1:2,c(290,210))
maploc <- c(1:290,1:210)
test1 <- segment(CNA(genomdat, chrom, maploc))

# test code on a noisier and hence more difficult data set
set.seed(51)
genomdat <- rnorm(500, sd=0.2) +
rep(c(-0.2,0.1,1,-0.5,0.2,-0.5,0.1,-0.2),c(137,87,17,49,29,52,87,42))
plot(genomdat)
chrom <- rep(1:2,c(290,210))
maploc <- c(1:290,1:210)
test2 <- segment(CNA(genomdat, chrom, maploc))

# test code for weighted CBS
set.seed(97)
wts <- sample(1:3, 500, replace=TRUE)
genomdat <- rnorm(500, sd=0.3)/sqrt(wts) +
rep(c(-0.2,0.1,1,-0.5,0.2,-0.5,0.1,-0.2),c(137,87,17,49,29,52,87,42))
plot(genomdat)
chrom <- rep(1:2,c(290,210))
maploc <- c(1:290,1:210)
test3 <- segment(CNA(genomdat, chrom, maploc), weights=wts)

# A real analysis

data(coriell)

# Combine into one CNA object to prepare for analysis on chromosomes 1-23

CNA.object <- CNA(cbind(coriell$Coriell.05296,coriell$Coriell.13330),
               coriell$Chromosome,coriell$Position,
               data.type="logratio",sampleid=c("c05296","c13330"))

# We generally recommend smoothing single-point outliers before analysis
# Make sure to check that the smoothing is proper

smoothed.CNA.object <- smooth.CNA(CNA.object)

# Segmentation at default parameters

segment.smoothed.CNA.object <- segment(smoothed.CNA.object, verbose=1)

Please credit biostatistic.net (http://www.biostatistic.net) when reposting.

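Notes:
Note 1: To make learning easier, this document was translated by LoveR, the bot of biostatistic.net. It is intended only for personal reference while learning R; biostatistic.net reserves the copyright.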
