R package Clonality: help documentation for the clonality.analysis() function

loveR · posted on 2012-2-25 15:10:40

clonality.analysis(Clonality)
clonality.analysis() belongs to R package: Clonality

                                       Clonality testing using copy number data

                                       Translator: LoveR, the bot of biostatistic.net

----------Description----------

Function to test the clonality of two tumors from the same patient based on their genome-wide copy number profiles. This function calculates likelihood ratios and the reference distribution under the hypothesis of independence.

----------Usage----------

clonality.analysis(data, ptlist, pfreq = NULL, refdata = NULL, nmad = 1.25, reference = TRUE, allpairs = TRUE, segmethod = "oneseg", segpar = NULL)

----------Arguments----------

Argument: data
A Copy Number Array object (the output of function CNA() from package DNAcopy). The first column contains chromosomes and the second column contains genomic locations; each remaining column contains log-ratios from a particular tumor or sample. Chromosomes X and Y should be removed prior to analysis, and chromosomes should be split into p and q arms to improve power (use function splitChromosomes()).

Argument: ptlist
Vector of patient IDs, in the order in which the samples appear in data. For example, if the first three tumors (columns 3, 4, 5 of data) belong to patient A and the following two (columns 6, 7 of data) belong to patient B, then ptlist=c('ptA', 'ptA', 'ptA', 'ptB', 'ptB'). Note that while sample names in data should be unique, ptlist should have repeated labels.
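A minimal sketch of how ptlist lines up with the sample columns of data. The sample names are hypothetical, and the patient ID is assumed to be a fixed-width prefix of each name (the Examples section uses a similar substr() trick on the real sample names). It also checks the two properties stated above: unique sample names, repeated patient labels.

```r
# Hypothetical sample names, in the order of the log-ratio columns of data
# (columns 3, 4, 5 belong to patient A; columns 6, 7 to patient B).
samples <- c("ptA_primary", "ptA_met1", "ptA_met2", "ptB_primary", "ptB_recur")

# One patient ID per sample; here the ID is assumed to be the first 3 characters.
ptlist <- substr(samples, 1, 3)
print(ptlist)

# Sample names must be unique, while ptlist is expected to repeat labels.
stopifnot(anyDuplicated(samples) == 0, any(duplicated(ptlist)))

# Tumors per patient; only patients with 2 or more tumors give within-patient pairs.
print(table(ptlist))
```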

Argument: pfreq
Marginal frequencies of gains, losses and normals for all the chromosomes. If they are not known, pfreq should be set to NULL and the frequencies will be estimated from all the samples in the dataset. If they are known, pfreq should be a data frame with 4 columns: 1) chromosome arm in the format 'chr01p', 2) probability of gain, 3) probability of loss and 4) probability of normal.
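The sketch below shows one plausible way marginal frequencies could be tabulated from per-arm calls (rows are arms, columns are samples, as in ChromClass) and laid out as a four-column pfreq data frame. The call matrix, the column names and the simple empirical estimator are assumptions for illustration, not the package's internal estimation.

```r
# Hypothetical per-arm calls: rows = chromosome arms, columns = samples.
calls <- matrix(
  c("Gain",   "Gain",   "Normal", "Loss",
    "Normal", "Loss",   "Loss",   "Normal",
    "Normal", "Normal", "Normal", "Gain"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("chr01p", "chr01q", "chr02p"), paste0("s", 1:4))
)

# Proportion of samples with a given call, per arm.
freq_of <- function(call) rowMeans(calls == call)

pfreq <- data.frame(
  arm    = rownames(calls),   # 1) chromosome arm, 'chr01p' format
  gain   = freq_of("Gain"),   # 2) probability of gain
  loss   = freq_of("Loss"),   # 3) probability of loss
  normal = freq_of("Normal"), # 4) probability of normal
  row.names = NULL
)
print(pfreq)

# Each row should be a probability distribution over the three states.
stopifnot(all(abs(rowSums(pfreq[, 2:4]) - 1) < 1e-12))
```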

Argument: refdata
If available, an additional cohort of patients with the same disease that should be used to estimate the marginal gain/loss frequencies. If NULL, the original set of tumors is used; otherwise, refdata should be a CNA object. It will be segmented with one-step CBS and each chromosome will be classified as gain/loss as described in the manuscript, yielding the frequency estimates. No averaging or chromosome splitting is done for this dataset, so users should make sure that refdata has chromosomes in the format 'chr01p' and that its resolution is similar to that of the original data.
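Because no chromosome splitting is done for refdata, its chromosome labels must already be in the 'chr01p' format. A minimal sketch of building such labels from numeric chromosomes and an arm indicator; the probe annotation is hypothetical and the p/q assignment is assumed to be known already (for the main data, splitChromosomes() does the splitting).

```r
# Hypothetical probe annotation for a reference cohort.
chrom <- c(1, 1, 2, 10, 22)
# Arm of each probe; assumed to be known already (e.g. from centromere positions).
arm <- c("p", "q", "p", "q", "q")

# Zero-pad the chromosome number to two digits and append the arm: 'chr01p'.
arm_label <- sprintf("chr%02d%s", as.integer(chrom), arm)
print(arm_label)

# Sanity check: every label matches the expected pattern.
stopifnot(all(grepl("^chr[0-9]{2}[pq]$", arm_label)))
```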

Argument: nmad
Number of MADs (median absolute deviations) used for gain/loss calls. For each array, the MAD of its residuals (that is, data minus segment means) is calculated; the residuals represent the array's noise level. Any segment of this array whose mean is at least nmad MADs above or below the array's median is called a gain or a loss, respectively. We use a value of 1.25, while values in the range 0.5 to 2 can also be admissible depending on the resolution and the presence of artifacts.
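The calling rule above can be sketched in base R. This is an illustration under stated assumptions, not the package code: it uses stats::mad(), which by default multiplies by 1.4826 (whether Clonality applies that constant is not stated here), takes "the array's median" to be the median of the array's log-ratios, and uses simulated data with known segments.

```r
# Sketch of nmad-based Gain/Loss/Normal calls for one array (not Clonality's code).
# logratio: probe-level log-ratios; seg_id: segment label of each probe.
call_segments <- function(logratio, seg_id, nmad = 1.25) {
  seg_mean <- sapply(split(logratio, seg_id), mean)   # mean of each segment
  resid    <- logratio - ave(logratio, seg_id)        # data minus segment means
  noise    <- mad(resid)                              # MAD of residuals (noise level)
  center   <- median(logratio)                        # the array's median
  ifelse(seg_mean >= center + nmad * noise, "Gain",
         ifelse(seg_mean <= center - nmad * noise, "Loss", "Normal"))
}

set.seed(1)
# Simulated array with three segments: baseline, a raised block, then baseline again.
logratio <- c(rnorm(40, 0, 0.1), rnorm(20, 0.6, 0.1), rnorm(40, 0, 0.1))
seg_id   <- rep(1:3, times = c(40, 20, 40))
calls <- call_segments(logratio, seg_id)
print(calls)
```

Lowering nmad makes the thresholds tighter around the median, so more segments are called gains or losses; raising it makes calls more conservative on noisy arrays.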

Argument: reference
If TRUE, the reference distribution of likelihood ratios is created under the hypothesis of independence by pairing (independent) tumors from different patients.

Argument: allpairs
If TRUE, all possible pairs of tumors from different patients will be used for the reference distribution. If the two tumors in a pair are not exchangeable, for example primary tumor vs. recurrence, or precancerous lesion vs. tumor, then allpairs should be set to FALSE and, for every patient, the 'first' tumor should come before the 'second' tumor in data. The 'first' tumors of patients will then be paired only with the 'second' tumors of other patients for the reference distribution.
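A minimal sketch of how the between-patient reference pairs could be enumerated under the two settings. Samples are identified by their position in ptlist, and for allpairs = FALSE a patient's 'first' tumor is assumed to be its earliest entry, with every later entry treated as 'second'. The helper reference_pairs() is hypothetical; it illustrates the pairing rule, not the package's own code.

```r
# Sketch: enumerate between-patient sample pairs for a reference distribution.
reference_pairs <- function(ptlist, allpairs = TRUE) {
  idx <- combn(seq_along(ptlist), 2)                                # all pairs i < j
  idx <- idx[, ptlist[idx[1, ]] != ptlist[idx[2, ]], drop = FALSE]  # different patients only
  pairs <- t(idx)
  if (allpairs) return(pairs)
  # Assumption: a patient's 'first' tumor is its earliest entry in ptlist.
  is_first <- !duplicated(ptlist)
  keep <- xor(is_first[pairs[, 1]], is_first[pairs[, 2]])           # one 'first', one 'second'
  pairs <- pairs[keep, , drop = FALSE]
  swap <- !is_first[pairs[, 1]]                                     # put the 'first' tumor in column 1
  pairs[swap, ] <- pairs[swap, 2:1]
  pairs
}

pt <- c("A", "A", "B", "B", "C", "C")
nrow(reference_pairs(pt, allpairs = TRUE))   # 15 pairs minus 3 within-patient pairs
reference_pairs(pt, allpairs = FALSE)
```

With allpairs = FALSE the reference set shrinks to 'first' x 'second' combinations, which keeps the null distribution comparable to ordered pairs such as primary vs. recurrence.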

Argument: segmethod
The segmentation algorithm to be used. The default is "oneseg", which uses the built-in function of the same name based on the CBS algorithm. An alternative segmentation algorithm can be used: create a function and pass its name as described in the vignette.

Argument: segpar
A list of the parameters needed by the segmentation algorithm. For "oneseg" you can specify alpha (default = 0.01) and nperm (default = 2000), which are used by the CBS algorithm.

----------Details----------

The function implements a statistical procedure designed to distinguish whether two tumors from the same patient are clonal (have the same progenitor cancer cell) or independent (developed independently from normal cells). First, the data are segmented with one-step CBS (Olshen, A. B., Venkatraman, E. S., Lucito, R., Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5: 557-572), which picks at most one copy number change per chromosome arm. Then each chromosome arm is classified as Gain/Loss/Normal based on the middle segment if there are 3 segments, or on the most outstanding segment if there are 2 segments. The multinomial likelihood ratio comparing these classifications is computed (LR1). For each concordant partial-arm gain or loss, we also calculate the likelihood ratio that this change is exactly the same in both tumors. These likelihood ratios are multiplied by LR1 to obtain the final statistic, LR2. An LR2 much greater than 1 indicates clonality; an LR2 much smaller than 1 indicates independence. The reference distribution of LR2 under the hypothesis of independence is obtained by pairing up tumors from different patients, which are assumed to be independent.
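The arm-classification step can be sketched as follows. The helper classify_arm() is hypothetical; it assumes "most outstanding" means the segment whose mean deviates most from the array median, reuses the nmad threshold rule from the nmad argument, and treats an unsplit arm (a single segment) by its only segment.

```r
# Sketch: classify one chromosome arm from its one-step CBS segment means (1 to 3 segments).
classify_arm <- function(seg_means, center, noise, nmad = 1.25) {
  n <- length(seg_means)
  stopifnot(n >= 1, n <= 3)
  chosen <- if (n == 3) {
    seg_means[2]                                    # 3 segments: the middle one
  } else {
    seg_means[which.max(abs(seg_means - center))]   # 1 or 2 segments: most outstanding
  }
  if (chosen >= center + nmad * noise) {
    "Gain"
  } else if (chosen <= center - nmad * noise) {
    "Loss"
  } else {
    "Normal"
  }
}

classify_arm(c(0.02, 0.55, 0.01), center = 0, noise = 0.1)  # middle segment raised
classify_arm(c(0.50, 0.00, 0.50), center = 0, noise = 0.1)  # middle rule ignores the ends
classify_arm(c(0.01, -0.40), center = 0, noise = 0.1)       # second segment stands out
classify_arm(0.03, center = 0, noise = 0.1)                 # arm without a change
```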

Since only one gain/loss is admissible per chromosome arm, it is highly recommended to apply this methodology to arrays with at most 10,000-15,000 markers. For arrays with higher resolution, we suggest averaging blocks of consecutive probes; see function ave.adj.probes.
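The exact behavior of ave.adj.probes is not described here, so the following is only a generic sketch of the idea: averaging non-overlapping blocks of k consecutive probes within each chromosome for one sample. The helper average_blocks(), the choice to average positions as well as log-ratios, and the assumption that probes are sorted by chromosome and position are all illustrative.

```r
# Generic sketch (not ave.adj.probes): average non-overlapping blocks of k consecutive
# probes within each chromosome, for one sample. Probes are assumed to be sorted.
average_blocks <- function(chrom, maploc, logratio, k) {
  # Block counter restarts in every chromosome, so no block spans two chromosomes.
  block <- ave(seq_along(chrom), chrom, FUN = function(i) (seq_along(i) - 1) %/% k)
  key_chr <- paste(chrom, block, sep = "_")
  key <- factor(key_chr, levels = unique(key_chr))  # keep the original probe order
  groups <- split(seq_along(chrom), key)
  data.frame(
    chrom    = chrom[vapply(groups, function(i) i[1], integer(1))],
    maploc   = vapply(groups, function(i) mean(maploc[i]), numeric(1)),
    logratio = vapply(groups, function(i) mean(logratio[i], na.rm = TRUE), numeric(1)),
    row.names = NULL
  )
}

chrom    <- c(1, 1, 1, 1, 1, 2, 2, 2)
maploc   <- c(10, 20, 30, 40, 50, 5, 15, 25)
logratio <- c(0, 0.2, 0.4, 0.6, 1, 0, 0, 0.3)
average_blocks(chrom, maploc, logratio, k = 2)
```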

----------Value----------

If reference is TRUE, the function returns a list with 4 elements: LR, OneStepSeg, ChromClass and refLR.

LR - matrix with the within-patient comparisons. Each row corresponds to a pair of samples being compared. The columns are:
  Sample1 - name of sample 1;
  Sample2 - name of sample 2;
  LR1 - likelihood ratio without comparisons of specific concordant gains/losses;
  LR2 - final likelihood ratio with individual comparisons;
  GGorLL - number of chromosome arms classified as Gain in both tumors or Loss in both tumors;
  NN - number of chromosome arms classified as Normal in both tumors;
  GL - number of chromosome arms classified as Gain in one tumor and Loss in the other;
  GNorLN - number of chromosome arms classified as Gain (Loss) in one tumor and Normal in the other;
  IndividualComparisons - list of chromosome arms that had comparisons of specific concordant gains/losses in both tumors, with the corresponding likelihood ratios that the changes are exactly the same;
  p-value - quantile of the reference distribution under the null hypothesis (refLR$LR2) that matches the value of LR2.
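The p-value column compares a within-patient LR2 with the between-patient reference values in refLR. A minimal sketch, assuming the p-value is the upper-tail proportion of reference LR2 values at least as large as the observed one (large LR2 points to clonality); the helper empirical_p() and all LR2 values are made up for illustration, and the package's exact quantile convention may differ.

```r
# Empirical upper-tail p-value of observed LR2 values against a reference distribution.
empirical_p <- function(lr2_obs, lr2_ref) {
  vapply(lr2_obs, function(x) mean(lr2_ref >= x), numeric(1))
}

# Hypothetical between-patient LR2 values (independent pairs).
ref_lr2 <- c(0.001, 0.01, 0.05, 0.2, 0.5, 0.9, 1.5, 3, 8, 40)
# Hypothetical within-patient LR2 values.
obs_lr2 <- c(pair1 = 1e6, pair2 = 0.02)

p <- empirical_p(obs_lr2, ref_lr2)
print(p)
# log10(LR2) above 0 favours clonality, below 0 favours independence.
print(log10(obs_lr2))
```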

OneStepSeg - the output of the one-step segmentation of the data. It has the same structure as the output of 'segment' from DNAcopy, but only the single most prominent change per arm is allowed.

ChromClass - the matrix of chromosome arm classifications based on the one-step segmentation. Rows correspond to chromosome arms, columns to samples. Chromosome arms are classified by the middle segment if there are 3 segments, and by the most outstanding segment if there are 2 segments.

refLR - matrix with the between-patient comparisons (the reference distribution under the hypothesis of independence). It has the same structure as LR, but the pairs of tumors are selected from different patients.

Note that calculating the reference distribution might take a long time.

If reference is FALSE, LR has no p-value column and there is no refLR output.

----------Author(s)----------

Irina Ostrovnaya <ostrovni@mskcc.org>

----------References----------

cancer? Evaluating the clonal origin of tumors using array copy number data. Statistics in Medicine, 29: 1608-1621

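Bioinformatics, 23: 657-663.

Olshen, A. B., Venkatraman, E. S., Lucito, R., Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5: 557-572.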

----------Examples----------

# Analysis of paired breast samples from the study
# Hwang ES, Nyante SF, Chen YY, Moore D, DeVries S, Korkola JE, Esserman LJ, and Waldman FM.
# Clonality of lobular carcinoma in situ and synchronous invasive lobular cancer. Cancer 100(12):2562-72, 2004.
library(gdata) # needed to read .xls files
library(DNAcopy)
library(Clonality)
arrayinfo<-read.xls("http://waldman.ucsf.edu/Colon/nakao.data.xls") # needed to extract genomic locations
data<-read.xls("http://waldman.ucsf.edu/Breast/Hwang.data.xls")
data<-data[!is.na(data[,2]),]
data<-data[apply(is.na(data),1,sum)<=50,] # keep probes (rows) with at most 50 missing values
data<-data[,apply(is.na(data),2,sum)<=1000] # keep samples (columns) with at most 1000 missing values
data$Position<-arrayinfo$Mb[match(toupper(as.character(data[,1])),toupper(as.character(arrayinfo[,1])))]
data<-data[!is.na(data$Position),]
dim(data)
length(unique(paste(data$Chromosome, data$Position))) # there are repeated genomic locations
data<-data[c(TRUE,data$Position[-1]!=data$Position[-nrow(data)]),] # discard probes with repeated genomic locations
data<-data[data$Chromosome<=22,] # getting rid of X and Y chromosomes
dataCNA<-CNA(as.matrix(data[,c(4:6,28:30)]),maploc=data$Position,chrom=data$Chromosome,sampleid=names(data)[c(4:6,28:30)]) # taking the first 3 patients only to shorten the computation time; use c(4:51) for the full dataset

dataCNA$maploc<-dataCNA$maploc*1000 # transforming maploc to Kb scale
dataCNA$chrom<-splitChromosomes(dataCNA$chrom,dataCNA$maploc) # splits the chromosomes into arms

ptlist<-substr(names(dataCNA)[-c(1,2)],1,4)
samnms<-names(dataCNA)[-c(1,2)]

results<-clonality.analysis(dataCNA, ptlist,  pfreq = NULL, refdata = NULL, nmad = 1.25,
reference = TRUE, allpairs = FALSE)

# genome-wide plots of pairs of tumors from the same patient
pdf("genomewideplots.pdf",height=7,width=11)
for (i in unique(ptlist))
{
  w<-which(ptlist==i)
  ns<-length(w)
  if (ns>1)
  {
    for (p1 in 1:(ns-1))
      for (p2 in (p1+1):ns)
        genomewidePlots(results$OneStepSeg, results$ChromClass, ptlist, ptpair=samnms[c(w[p1],w[p2])], results$LR, plot.as.in.analysis = TRUE)
  }
}
dev.off()

pdf("hist.pdf",height=7,width=11)
histogramPlot(results$LR[,4], results$refLR[,4])
dev.off()

for (i in unique(ptlist))
{
pdf(paste("pt",i,".pdf",sep=""),height=7,width=11)
chromosomePlots(results$OneStepSeg, ptlist,ptname=i,nmad=1.25)
dev.off()
}

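Please credit the source when reposting: biostatistic.net (http://www.biostatistic.net).

Note: To make learning easier, this document was translated by LoveR, the bot of biostatistic.net. It is provided for personal R learning reference only, and biostatistic.net retains the copyright.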
