R genoCN package: help documentation for the genoCNA() function (translated from the Chinese-English bilingual edition)

loveR · posted 2012-2-25 19:09:01

genoCNA(genoCN)
R package containing genoCNA(): genoCN

                                       Copy Number Aberration

                                       Translator: LoveR, the translation robot of 生物统计家园网

----------Description----------

Extract genotype and copy number calls for copy number aberrations (CNAs), which are often observed in tumor tissues.

----------Usage----------

genoCNA(snpNames, chr, pos, LRR, BAF, pBs, sampleID,
  Para=NULL, fixPara=FALSE, cnv.only=NULL, estimate.pi.r=TRUE,
  estimate.pi.b=TRUE, estimate.trans.m=TRUE, outputSeg = TRUE,
  outputSNP=3, outputTag=sampleID, outputViterbi=FALSE,
  Ds=c(1e10, 1e10, rep(1e8, 7)), pBs.alpha=0.001, contamination=TRUE,
  normalGtp=NULL, geno.error=0.01, min.tp=1e-4, max.diff=0.1,
  distThreshold=1e6, transB=c(0.5,.05,.05,0.1,0.1,.05,.05,.05,.05),
  epsilon=0.005, K=5, maxIt=200, seg.nSNP=3, traceIt=5)

----------Arguments----------

Argument: snpNames
A vector of SNP names. SNPs must be ordered by chromosome location.
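
The ordering requirement can be met before calling genoCNA() by sorting the annotation table once and reusing the same index for every per-SNP vector. The sketch below uses base R only; the toy table `info` and its values are hypothetical, with column names borrowed from the `snpInfo` example data, and it assumes numeric chromosome labels.

```r
# Hypothetical toy annotation table (values invented for illustration).
info <- data.frame(
  Name     = c("rs3", "rs1", "rs4", "rs2"),
  Chr      = c(22, 21, 22, 21),
  Position = c(500, 300, 100, 200),
  stringsAsFactors = FALSE
)

# Sort by chromosome first, then by position within each chromosome.
ord <- order(info$Chr, info$Position)
info_sorted <- info[ord, ]

snpNames <- info_sorted$Name
chr      <- info_sorted$Chr
pos      <- info_sorted$Position
print(snpNames)
```

The same `ord` index would also need to be applied to LRR, BAF and pBs so that all vectors stay aligned with snpNames.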

Argument: chr
Chromosomes of all the SNPs specified in snpNames.

Argument: pos
Positions of all the SNPs specified in snpNames.

Argument: LRR
Log R Ratio of all the SNPs specified in snpNames.

Argument: BAF
B Allele Frequency of all the SNPs specified in snpNames.

Argument: pBs
Population frequency of the B allele for all the SNPs specified in snpNames.

Argument: sampleID
Symbol/name of the studied sample. Only one sample is studied at a time.

Argument: Para
A list of initial parameters for the HMM. If Para is NULL, the default initial parameters, init.Para.CNA, are used.

Argument: fixPara
If fixPara is TRUE, the parameters in Para are fixed and are used directly to calculate posterior probabilities. Setting fixPara to TRUE is not recommended for CNA studies.

Argument: cnv.only
A vector indicating the CNV-only probes, for which only the Log R Ratio is considered. If it is NULL, there are no CNV-only probes.

Argument: estimate.pi.r
Whether to estimate pi.r (the proportion of the uniform component for LRR). In the Usage above the default is estimate.pi.r=TRUE; if it is set to FALSE, the initial value of pi.r is used to estimate the other parameters.

Argument: estimate.pi.b
Whether to estimate pi.b (the proportion of the uniform component for BAF). In the Usage above the default is estimate.pi.b=TRUE; if it is set to FALSE, the initial value of pi.b is used to estimate the other parameters.

Argument: estimate.trans.m
Whether to estimate the transition probability matrix. In the Usage above the default is estimate.trans.m=TRUE; if it is set to FALSE, the initial value of the transition probability matrix is used to estimate the other parameters.

Argument: outputSeg
Whether to output the information on copy number altered segments.

Argument: outputSNP
If outputSNP is 0, do not output SNP-specific information; if outputSNP is 1, output the most likely copy number and genotype state of the SNPs that are within copy number altered regions; if outputSNP is 2, output the most likely copy number and genotype state of all the SNPs (whether they are within CNV regions or not); if outputSNP is 3, output the posterior probabilities of all the copy number and genotype states for the SNPs.

Argument: outputTag
The prefix of the output files. Output of copy number altered segments is written to the file outputTag_segment.txt, and output of SNP information is written to the file outputTag_SNP.txt.
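
As a small illustration of the naming rule, the output file names can be built from the tag with base R. The tag "simu1" matches the example at the end of this page; the layout of the files themselves (header line, tab separator) is an assumption here and should be checked against the files genoCNA() actually writes.

```r
outputTag <- "simu1"

# File names implied by the outputTag prefix.
seg_file <- paste0(outputTag, "_segment.txt")
snp_file <- paste0(outputTag, "_SNP.txt")
print(c(seg_file, snp_file))

# Read the segment file only if a previous genoCNA() run produced it.
# header = TRUE and sep = "\t" are assumptions about the file layout.
if (file.exists(seg_file)) {
  seg <- read.table(seg_file, header = TRUE, sep = "\t")
  print(head(seg))
}
```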

Argument: outputViterbi
Whether to output the copy number altered regions identified by the Viterbi algorithm. See the Note section.

Argument: Ds
Parameter for the transition probability of the HMM. A vector of length N, where N is the number of states in the HMM.

Argument: pBs.alpha
pBs.alpha is the lower limit of the population B allele frequency, and the upper limit is 1 - pBs.alpha.
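
One way to picture these limits is as a truncation of pBs to the interval [pBs.alpha, 1 - pBs.alpha]. The sketch below shows that truncation in base R with invented frequencies; whether genoCNA() applies the limits in exactly this way internally is an assumption.

```r
# Invented population B allele frequencies, including boundary values.
pBs <- c(0, 0.0005, 0.3, 0.9999, 1)
pBs.alpha <- 0.001

# Truncate to [pBs.alpha, 1 - pBs.alpha].
pBs_limited <- pmin(pmax(pBs, pBs.alpha), 1 - pBs.alpha)
print(pBs_limited)
```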

Argument: contamination
Whether tissue contamination is considered.

Argument: normalGtp
normalGtp is specified only if a paired tumor-normal SNP array is available. It is the normal tissue genotype for all the SNPs specified in snpNames and can take only four values: -1, 0, 1, and 2. Values 0, 1, and 2 correspond to the number of B alleles, and -1 indicates that the normal genotype is missing. By default it is NULL, in which case all the normal genotypes are set to missing (-1).
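
If the normal-tissue genotypes are available as text calls, they can be recoded to the -1/0/1/2 scheme described above. This is a minimal base R sketch; the input vector `calls` and its labels ("AA", "AB", "BB", "NC" for a no-call) are hypothetical and depend on the genotyping software used.

```r
# Hypothetical genotype calls from the paired normal sample.
calls <- c("AA", "AB", "BB", NA, "NC", "AB")

# Number of B alleles for each recognised genotype.
b_count <- c(AA = 0L, AB = 1L, BB = 2L)

# Look up each call; unrecognised or missing calls become NA, then -1.
normalGtp <- unname(b_count[calls])
normalGtp[is.na(normalGtp)] <- -1L
print(normalGtp)
```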

Argument: geno.error
Probability of genotyping error in the normal tissue genotypes.

Argument: min.tp
The minimum transition probability.

Argument: max.diff
Due to the normalization procedure, the BAF may not be symmetric. Take the state (AAA, AAB, ABB, BBB) as an example. Ideally, the mean values of the normal components AAB and ABB, denoted by mu1 and mu2 respectively, should satisfy mu1 = 1 - mu2 if the BAF is symmetric. However, this may not hold because of the normalization procedure. The parameter max.diff restricts the difference between mu1 and (1 - mu2).
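
The quantity being restricted can be computed directly. In the sketch below the values of mu1 and mu2 are invented, and the check only shows what "difference between mu1 and (1 - mu2)" means; how genoCNA() enforces the bound during estimation is not described here.

```r
# Invented component means for the AAB and ABB genotypes.
mu1 <- 0.36
mu2 <- 0.68
max.diff <- 0.1

# Asymmetry of the two means around 0.5.
asym <- abs(mu1 - (1 - mu2))
within_limit <- asym <= max.diff
print(c(asymmetry = asym, within_limit = within_limit))
```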

Argument: distThreshold
If the distance between adjacent probes is larger than distThreshold, restart the transition probability with the default values in transB.
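
To see which probes would trigger such a restart in a given data set, the gaps between adjacent positions can be compared with the threshold. This base R sketch uses invented positions on a single chromosome and only locates the gaps; it does not reproduce the HMM itself.

```r
# Invented probe positions on one chromosome, already sorted.
pos <- c(1.0e6, 1.2e6, 1.5e6, 4.0e6, 4.1e6)
distThreshold <- 1e6

# Distance from each probe to the previous one.
gap <- diff(pos)

# TRUE for probes that follow a gap larger than distThreshold.
restart <- c(FALSE, gap > distThreshold)
print(which(restart))
```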

Argument: transB
The default transition probability.
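
A quick sanity check on the defaults shown in Usage: both transB and Ds have nine entries, and transB sums to 1. Reading the nine entries as one value per HMM state is an inference from the description of Ds, not something this page states for transB.

```r
# Default values copied from the Usage section.
transB <- c(0.5, .05, .05, 0.1, 0.1, .05, .05, .05, .05)
Ds     <- c(1e10, 1e10, rep(1e8, 7))

# Same length, and transB behaves like a probability vector.
same_length <- length(transB) == length(Ds)
sums_to_one <- isTRUE(all.equal(sum(transB), 1))
print(c(n = length(transB), same_length = same_length, sums_to_one = sums_to_one))
```

A user-supplied transB would presumably need to keep both properties.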

Argument: epsilon
See the explanation of K.

Argument: K
epsilon and K specify the convergence criterion. The parameter estimates are considered converged if, for K consecutive updates, the maximum change in the parameter estimates between adjacent steps is smaller than epsilon.
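
The criterion can be written as a small helper that looks at the last K updates. This is a sketch of the rule as stated above, not the package's internal code; the function name `has_converged` and the input `max_changes` (the maximum absolute parameter change recorded at each update) are hypothetical.

```r
# TRUE when each of the last K updates changed every parameter by less than epsilon.
has_converged <- function(max_changes, epsilon = 0.005, K = 5) {
  if (length(max_changes) < K) {
    return(FALSE)
  }
  all(tail(max_changes, K) < epsilon)
}

# Invented change histories.
print(has_converged(c(0.1, 0.004, 0.003, 0.002, 0.001, 0.001)))
print(has_converged(c(0.004, 0.004, 0.01, 0.004, 0.004, 0.004, 0.004)))
```

Requiring K consecutive small updates, rather than a single one, guards against stopping on an update that happens to be small while the estimates are still moving.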

Argument: maxIt
The maximum number of iterations of the EM algorithm used to estimate the parameters.

Argument: seg.nSNP
The minimum number of SNPs per segment.

Argument: traceIt
If traceIt is an integer n, the running time is printed every n iterations of the EM algorithm. If traceIt is 0 or negative, no tracing information is printed.

----------Value----------

Results are written to output files.

----------Note----------

By default, copy number altered regions are identified from the SNP-level copy number calls. A CNA region boundary is declared simply where adjacent SNPs have different copy numbers. An alternative approach is to use the Viterbi algorithm to output the "best path". Most of the time, the results based on the SNP-level copy number calls are the same as the results from the Viterbi algorithm. For follow-up association studies, the SNP-level information is more relevant if the association is examined SNP by SNP.
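
The default boundary rule can be sketched with run-length encoding: consecutive SNPs with the same copy number form one segment, and a boundary falls wherever the call changes. The copy number calls and positions below are invented, and both the normal copy number of 2 and the way short segments are dropped (using seg.nSNP) are assumptions about how the rule would be applied, not a description of the package internals.

```r
# Invented SNP-level copy number calls and positions.
cn  <- c(2, 2, 2, 1, 1, 1, 1, 2, 2, 3, 2, 2)
pos <- seq(100, by = 100, length.out = length(cn))
normal_cn <- 2
seg.nSNP  <- 3

# Runs of identical calls define the segments.
runs      <- rle(cn)
end_idx   <- cumsum(runs$lengths)
start_idx <- end_idx - runs$lengths + 1

segs <- data.frame(
  start = pos[start_idx],
  end   = pos[end_idx],
  cn    = runs$values,
  nSNP  = runs$lengths
)

# Keep altered segments that contain at least seg.nSNP SNPs.
altered <- segs[segs$cn != normal_cn & segs$nSNP >= seg.nSNP, ]
print(altered)
```

In this toy input only the four-SNP deletion is kept; the single SNP called at copy number 3 is too short to form a segment.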

----------Author(s)----------

Wei Sun and Zhengzheng Tang

----------Examples----------

data(snpData)
data(snpInfo)

dim(snpData)
dim(snpInfo)

snpData[1:2,]
snpInfo[1:2,]

snpInfo[c(1001,1100,10001,10200),]

plotCN(pos=snpInfo$Position, LRR=snpData$LRR, BAF=snpData$BAF,
main = "simulated data on Chr22")

snpNames = snpInfo$Name
chr = snpInfo$Chr
pos = snpInfo$Position
LRR = snpData$LRR
BAF = snpData$BAF
pBs = snpInfo$PFB
cnv.only=(snpInfo$PFB>1)
sampleID="simu1"

# Note: this simulated data is closer to CNV than to CNA.
# For example, there is no tissue contamination.
# We use it only to illustrate the usage of genoCNA.

Theta = genoCNA(snpNames, chr, pos, LRR, BAF, pBs, sampleID,
  contamination=TRUE, normalGtp=NULL, cnv.only=cnv.only,
  outputSeg = TRUE, outputSNP = 1, outputTag = "simu1")

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

