R package crlmm: help documentation for the genotype() function

loveR · posted 2012-2-25 16:03:22

genotype(crlmm)
Package containing genotype(): crlmm

                                       Preprocessing and genotyping of Affymetrix arrays.

                                       Translator: LoveR, the biostatistic.net robot

----------Description----------

Preprocessing and genotyping of Affymetrix arrays.

----------Usage----------

genotype(filenames, cdfName, batch, mixtureSampleSize = 10^5, eps = 0.1,
         verbose = TRUE, seed = 1, sns, probs = rep(1/3, 3),
         DF = 6, SNRMin = 5, recallMin = 10, recallRegMin = 1000,
         gender = NULL, returnParams = TRUE, badSNP = 0.7)
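The per-sample arguments (sns, batch, gender) must line up one-to-one with filenames. A minimal base-R sketch of preparing them, assuming hypothetical CEL paths and using the run directory as a hypothetical batch label:

```r
# Hypothetical CEL file paths (in practice, e.g. list.celfiles(path, full.names = TRUE)).
filenames <- c("/data/run1/sample1.CEL",
               "/data/run1/sample2.CEL",
               "/data/run2/sample3.CEL")

# Default sample identifiers used when 'sns' is missing: the base file names.
sns <- basename(filenames)

# Hypothetical batch labels: one per CEL file, taken from the run directory.
batch <- factor(basename(dirname(filenames)))
stopifnot(length(batch) == length(filenames))

# Known gender recoded to the integer convention male = 1, female = 2.
gender_labels <- c("female", "male", "female")
gender <- ifelse(gender_labels == "male", 1L, 2L)
stopifnot(length(gender) == length(filenames))

# These objects could then be passed as, for example:
# genotype(filenames, cdfName = "genomewidesnp6", batch = batch,
#          sns = sns, gender = gender)
```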

----------Arguments----------

Argument: filenames
Complete paths to the CEL files.

Argument: cdfName
Annotation package (see also validCdfNames).

Argument: batch
Batch variable. See Details.

Argument: mixtureSampleSize
Sample size to be used when fitting the mixture model.

Argument: eps
Stopping criterion.

Argument: verbose
Logical. Whether to print descriptive messages during processing.

Argument: seed
Seed to be used when sampling. Useful for reproducibility.

Argument: sns
The sample identifiers. If missing, the default sample names are basename(filenames).

Argument: probs
'numeric' vector with the prior probabilities of the genotypes AA, AB and BB.

Argument: DF
'integer' giving the number of degrees of freedom of the t-distribution.

Argument: SNRMin
'numeric' scalar defining the minimum signal-to-noise ratio (SNR) used to filter out samples.

Argument: recallMin
Minimum number of samples for recalibration.

Argument: recallRegMin
Minimum number of SNPs for regression.

Argument: gender
Integer vector (male = 1, female = 2) or missing, with the same length as filenames. If missing, the gender is predicted.

Argument: returnParams
'logical'. Whether to return the recalibrated parameters from crlmm.

Argument: badSNP
'numeric'. Threshold for flagging a SNP as bad (affects batchQC).
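The probs and DF arguments suggest that genotypes are assigned by combining a prior over AA, AB and BB with t-distributed likelihoods. The following is a generic sketch of that Bayes computation for a single summary value, not crlmm's internal model; the function genotype_posterior and the genotype centers and scales are hypothetical.

```r
# Posterior probability of each genotype for an observed value x, assuming
# (hypothetically) a location-scale t likelihood for each genotype.
genotype_posterior <- function(x, centers, scales,
                               probs = rep(1/3, 3), DF = 6) {
  stopifnot(length(centers) == 3, length(scales) == 3, length(probs) == 3)
  # Location-scale t density: dt((x - mu) / s, df) / s
  lik <- dt((x - centers) / scales, df = DF) / scales
  post <- probs * lik                 # prior times likelihood
  names(post) <- c("AA", "AB", "BB")
  post / sum(post)                    # normalize to sum to 1
}

centers <- c(-2, 0, 2)                # hypothetical genotype centers
scales  <- c(0.5, 0.5, 0.5)           # hypothetical genotype scales

p <- genotype_posterior(1.8, centers, scales)
names(which.max(p))                   # "BB", the nearest center
```

Lowering DF gives heavier tails, so a value far from every center is split less decisively among genotypes than it would be under a normal likelihood.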

----------Details----------

For large datasets, it is important to use the large-data support by installing and loading the ff package before calling the genotype function. In previous versions of the crlmm package, we used different functions for genotyping depending on whether the ff package was loaded, namely genotype and genotype2. The genotype function now handles both cases.
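Before starting a large job it can help to confirm that ff is installed and attached. A sketch using base R only; the helper ff_status is hypothetical, and treating "loaded" as "attached to the search path" is an assumption about how genotype chooses its code path.

```r
# Hypothetical helper: is 'ff' installed, and is it attached to the search path?
# system.file() returns "" for packages that are not installed, and it does not
# load the package, so the check has no side effects.
ff_status <- function() {
  c(installed = nzchar(system.file(package = "ff")),
    attached  = "package:ff" %in% search())
}

st <- ff_status()
if (st[["installed"]] && !st[["attached"]]) {
  library(ff)   # attach ff before calling genotype() on large datasets
}
```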

genotype is essentially a wrapper around the crlmm function for genotyping. It differs in two ways: (1) the copy number probes (if present) are also quantile-normalized, and (2) the object returned by this function is of class CNSet, which is needed for subsequent copy number estimation. Note that the batch variable, which must be passed to this function, has no effect on the normalization or genotyping steps. Rather, batch is required in order to initialize a CNSet container with the appropriate dimensions.
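Quantile normalization, which genotype also applies to the copy number probes, gives every array the same intensity distribution. A generic base-R sketch of the textbook algorithm on a tiny hypothetical matrix; crlmm's own implementation may differ in details such as the target distribution and the handling of ties.

```r
# Quantile-normalize a features-by-samples matrix: each column is replaced by
# the mean of the sorted columns, placed according to that column's ranks.
quantile_normalize <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x), nrow(x) > 1)
  sorted <- apply(x, 2, sort)                        # each column sorted
  reference <- rowMeans(sorted)                      # common target distribution
  ranks <- apply(x, 2, rank, ties.method = "first")  # ties broken by position
  matrix(reference[ranks], nrow = nrow(x), dimnames = dimnames(x))
}

# Hypothetical intensities: 4 probes on 3 arrays.
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8),
            nrow = 4,
            dimnames = list(paste0("probe", 1:4), c("s1", "s2", "s3")))
qn <- quantile_normalize(x)
```

After normalization every column holds the same set of values (2, 3, 14/3, 17/3), only in a different order.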

----------Value----------

A SnpSuperSet instance.

----------Note----------

For large datasets, load the 'ff' package before genotyping; this greatly reduces the RAM required for big jobs. See
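The memory pressure that ff relieves follows from simple arithmetic: a dense matrix needs roughly features x samples x bytes per value. A sketch with a hypothetical helper matrix_gb and hypothetical job sizes:

```r
# Approximate in-memory size, in gigabytes, of one dense features-by-samples
# matrix; 'bytes_per_value' is 8 for double and 4 for integer storage.
matrix_gb <- function(n_features, n_samples, bytes_per_value = 8) {
  n_features * n_samples * bytes_per_value / 1024^3
}

# Hypothetical job: 1.8 million probesets, 1,000 arrays, integer storage.
matrix_gb(1.8e6, 1000, bytes_per_value = 4)   # roughly 6.7 GB for one matrix
```

Several such matrices can be involved (for example the A and B intensities, the calls and the call probabilities that the example below deletes at the end), so the total without ff is a multiple of this figure.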

----------Author(s)----------

R. Scharpf

----------See Also----------

snprma, crlmm, ocSamples, ldOpts, batch, crlmmCopynumber

----------Examples----------

if (require(ff) && require(genomewidesnp6Crlmm) && require(hapmapsnp6)) {

  path <- system.file("celFiles", package="hapmapsnp6")
  ## the filenames with full path...
  ## very useful when genotyping samples not in the working directory
  cels <- list.celfiles(path, full.names=TRUE)
  ## Note: one would need at least 10 CEL files for copy number estimation
  ## To use less RAM, specify a smaller argument to ocProbesets
  ocProbesets(50e3)
  batch <- as.factor(rep("A", length(cels)))
  (cnSet <- genotype(cels, cdfName="genomewidesnp6", batch=batch))

  ## when gender is not specified (as in the above example), crlmm tries
  ## to predict the gender from SNPs on chromosome X
  cnSet$gender

  ## If gender is known, one should check that the assigned gender is
  ## correct. Alternatively, one can pass gender as an argument to the
  ## genotype function (coded as male = 1, female = 2).
  gender <- c("female", "female", "male")
  gender <- ifelse(gender == "female", 2L, 1L)
  dim(cnSet)
  table(isSnp(cnSet))
  ## Not run:
  ##---------------------------------------------------------------------------
  ##
  ## Genotype Imputation
  ##
  ##---------------------------------------------------------------------------
  if (require("ff")) {
    path <- system.file("celFiles", package="hapmapsnp6")
    cels <- list.celfiles(path, full.names=TRUE)
    gtSet <- genotype(cels, cdfName="genomewidesnp6", batch=rep(1L, length(cels)))

    ## genotype() wraps crlmm(), so the SNP calls from both should agree
    crlmmOutput <- crlmm(cels, cdfName="genomewidesnp6")
    gt1 <- calls(crlmmOutput)
    gt2 <- calls(gtSet)[isSnp(gtSet), ]
    ## reorder the rows of gt2 to follow the row order of gt1
    gt2 <- gt2[match(rownames(gt1), rownames(gt2)), ]
    stopifnot(isTRUE(all.equal(gt1, gt2)))

    ## predict gender from the median log2 intensities on X (23) and Y (24)
    XIndex <- which(chromosome(gtSet) == 23 & isSnp(gtSet))
    YIndex <- which(chromosome(gtSet) == 24 & isSnp(gtSet))
    A.X <- log2(A(gtSet)[XIndex, , drop=FALSE])
    B.X <- log2(B(gtSet)[XIndex, , drop=FALSE])
    meds.X <- apply(A.X + B.X, 2, median)/2
    A.Y <- log2(A(gtSet)[YIndex, , drop=FALSE])
    B.Y <- log2(B(gtSet)[YIndex, , drop=FALSE])
    meds.Y <- apply(A.Y + B.Y, 2, median)/2
    R <- meds.X - meds.Y
    SNR <- gtSet$SNR[]
    SNRMin <- 5
    gender <- kmeans(R, c(min(R[SNR > SNRMin]), max(R[SNR > SNRMin])))[["cluster"]]
    plot(seq_along(R), R, pch=21, bg=c("royalblue", "red")[gender],
         xlim=c(0, 4), xaxt='n', xlab="sample index")
    legend("topright", fill=c("royalblue", "red"), legend=c("male", "female"))

    ## remove the ff files that back the assay data
    delete(A(gtSet))
    delete(B(gtSet))
    delete(calls(gtSet))
    delete(assayDataElement(gtSet, "snpCallProbability"))
    ff.filenames <- list.files(".", pattern="\\.ff$")
    unlink(ff.filenames)
  } ## end if(require("ff"))

  ## End(Not run)
}
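The gender-prediction step in the example above can be tried without any CEL files. A sketch on simulated per-sample medians (all values hypothetical), using the same kmeans seeding at the extremes of R among samples whose SNR exceeds SNRMin:

```r
set.seed(1)
n <- 20
is_male <- rep(c(TRUE, FALSE), each = n / 2)
# Hypothetical per-sample medians of log2 intensity on chromosomes X and Y
meds.X <- ifelse(is_male, 9, 10) + rnorm(n, sd = 0.1)
meds.Y <- ifelse(is_male, 9, 7) + rnorm(n, sd = 0.1)
R <- meds.X - meds.Y                     # about 0 for males, about 3 for females
SNR <- rep(c(8, 4.5), length.out = n)    # every other sample falls below SNRMin
SNRMin <- 5
# Cluster 1 starts at the smallest R (male), cluster 2 at the largest (female),
# using only samples with acceptable SNR to choose the starting centers.
centers <- c(min(R[SNR > SNRMin]), max(R[SNR > SNRMin]))
predicted <- kmeans(R, centers)[["cluster"]]
table(predicted = c("male", "female")[predicted],
      truth = ifelse(is_male, "male", "female"))
```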

Please credit the source when reposting: biostatistic.net (http://www.biostatistic.net).

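Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the biostatistic.net robot. It is intended only as a personal reference for learning R; biostatistic.net retains the copyright.
Note 2: Because the translation was produced automatically by a robot, it inevitably contains inaccuracies; compare the Chinese and English text carefully when using it, which can help with learning R.
Note 3: If you find an inaccuracy, please reply to this post and we will gradually revise it.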
