crlmm(crlmm)
crlmm() belongs to R package: crlmm
Genotype oligonucleotide arrays with CRLMM
Translator: 生物统计家园网, robot LoveR
----------Description----------
This is a faster and more efficient implementation of the CRLMM algorithm, designed especially for Affymetrix SNP 5 and 6 arrays (soon to be extended to other platforms).
----------Usage----------
crlmm(filenames, row.names=TRUE, col.names=TRUE,
probs=c(1/3, 1/3, 1/3), DF=6, SNRMin=5,
gender=NULL, save.it=FALSE, load.it=FALSE,
intensityFile, mixtureSampleSize=10^5,
eps=0.1, verbose=TRUE, cdfName, sns, recallMin=10,
recallRegMin=1000, returnParams=FALSE, badSNP=0.7)
crlmm2(filenames, row.names=TRUE, col.names=TRUE,
probs=c(1/3, 1/3, 1/3), DF=6, SNRMin=5,
gender=NULL, save.it=FALSE, load.it=FALSE,
intensityFile, mixtureSampleSize=10^5,
eps=0.1, verbose=TRUE, cdfName, sns, recallMin=10,
recallRegMin=1000, returnParams=FALSE, badSNP=0.7)
----------Arguments----------
Argument: filenames
'character' vector of CEL file names to be genotyped.
Argument: row.names
'logical'. Use SNP names as row names?
Argument: col.names
'logical'. Use sample names as column names?
Argument: probs
'numeric' vector of prior probabilities for the AA, AB and BB genotypes.
Argument: DF
'integer'. Number of degrees of freedom to use with the t-distribution.
Argument: SNRMin
'numeric' scalar defining the minimum signal-to-noise ratio (SNR) used to filter out samples.
Argument: gender
'integer' vector, with the same length as 'filenames', defining sex (1 - male; 2 - female).
Argument: save.it
'logical'. Save preprocessed data?
Argument: load.it
'logical'. Load preprocessed data to speed up analysis?
Argument: intensityFile
'character'. Name of the file to which preprocessed data are saved, or from which they are loaded.
Argument: mixtureSampleSize
Number of SNPs to be used with the mixture model.
Argument: eps
Minimum change for the mixture model.
Argument: verbose
'logical'.
Argument: cdfName
'character' defining the CDF name to use ('GenomeWideSnp5', 'GenomeWideSnp6').
Argument: sns
'character' vector with sample names to be used.
Argument: recallMin
Minimum number of samples for recalibration.
Argument: recallRegMin
Minimum number of SNPs for regression.
Argument: returnParams
'logical'. Return recalibrated parameters?
Argument: badSNP
'numeric'. Threshold for flagging a SNP as bad (affects batchQC).
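To make the role of a cutoff such as SNRMin concrete, the following base-R sketch keeps only the samples whose SNR reaches the threshold. The SNR values, the sample names and the helper keep_by_snr are hypothetical and not part of crlmm; whether crlmm compares with a strict or non-strict inequality is not stated in this page, so the use of '>=' is an assumption.

```r
# Hypothetical per-sample signal-to-noise ratios (illustrative values only)
snr <- c(sampleA = 7.2, sampleB = 4.1, sampleC = 5.0)

# Hypothetical helper: names of the samples whose SNR reaches the cutoff.
# The default mirrors SNRMin=5 from the usage section; '>=' is an assumption.
keep_by_snr <- function(snr, SNRMin = 5) {
  names(snr)[snr >= SNRMin]
}

kept <- keep_by_snr(snr)
print(kept)
```

Raising the cutoff, for example keep_by_snr(snr, SNRMin = 6), leaves fewer samples.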
----------Details----------
'crlmm2' allows one to genotype very large datasets (via the ff package) and also permits the use of clusters or multiple cores (via the snow package) to speed up genotyping.
The call probabilities are stored as integers to reduce file size, using the transformation 'round(-1000*log2(1-p))', where p is the probability. The function i2p can be used to convert the integers back to the probability scale.
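The integer representation can be tried out in base R without loading crlmm. The sketch below implements the quoted transformation and its algebraic inverse; the names p2i_sketch and i2p_sketch are hypothetical stand-ins chosen for illustration, and the package's own i2p should be preferred in real analyses.

```r
# Forward transformation quoted in the text: probability -> integer score
p2i_sketch <- function(p) round(-1000 * log2(1 - p))

# Algebraic inverse of the formula: integer score -> probability
i2p_sketch <- function(i) 1 - 2^(-i / 1000)

p <- c(0.5, 0.9, 0.99, 0.999)
scores <- p2i_sketch(p)          # larger score = higher confidence
recovered <- i2p_sketch(scores)  # close to p, up to rounding

print(scores)
print(round(recovered, 4))
```

Because of the rounding step the round trip is only approximate, and p = 1 maps to an infinite score, so probabilities of exactly 1 would need special handling.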
----------Value----------
A SnpSet object.
Component: calls
Genotype calls (1 - AA, 2 - AB, 3 - BB)
Component: confs
Confidence scores, stored as 'round(-1000*log2(1-p))'
Component: SNPQC
SNP Quality Scores
Component: batchQC
Batch Quality Score
Component: params
Recalibrated parameters
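As an illustration of how the calls and confs codings might be used together, the base-R sketch below labels integer calls and blanks out those with low confidence. The vectors are made-up values, not crlmm output, and the 0.95 cutoff is an arbitrary choice for the example.

```r
# Hypothetical calls and integer confidence scores for four SNPs in one sample
calls <- c(1L, 2L, 3L, 2L)
confs <- c(9966L, 1000L, 6644L, 152L)

# Map the integer codes to genotype labels (1 - AA, 2 - AB, 3 - BB)
genotype <- factor(calls, levels = 1:3, labels = c("AA", "AB", "BB"))

# Convert scores back to probabilities (inverse of round(-1000*log2(1-p)))
prob <- 1 - 2^(-confs / 1000)

# Set calls below the illustrative cutoff to NA
min_prob <- 0.95
filtered <- ifelse(prob >= min_prob, calls, NA_integer_)

print(as.character(genotype))
print(filtered)
```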
----------References----------
Carvalho B, Bengtsson H, Speed TP, Irizarry RA. Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics. 2007 Apr;8(2):485-99. Epub 2006 Dec 22. PMID: 17189563.
Carvalho BS, Louis TA, Irizarry RA. Quantifying uncertainty in genotype calls. Bioinformatics. 2010 Jan 15;26(2):242-9.
----------See Also----------
i2p, snpCall, snpCallProbability
----------Examples----------
## this can be slow
library(crlmm)
if (require(genomewidesnp6Crlmm) && require(hapmapsnp6)){
  path <- system.file("celFiles", package="hapmapsnp6")
  ## the filenames with full path...
  ## very useful when genotyping samples not in the working directory
  cels <- list.celfiles(path, full.names=TRUE)
  (crlmmOutput <- crlmm(cels))
  ## check the predicted gender (1 - male; 2 - female)
  invisible(open(crlmmOutput$gender))
  stopifnot(all(crlmmOutput$gender[] == c(2,2,1)))
  invisible(close(crlmmOutput$gender))
  ## If gender is known, one should check that the assigned gender is
  ## correct, or pass the integer coding of gender as an argument to the
  ## crlmm function as done below
  gender <- c("female", "female", "male")
  ## convert the labels to the integer coding (1 - male; 2 - female)
  gender <- ifelse(gender == "female", 2L, 1L)
  crlmmOutput2 <- crlmm(cels, gender=gender)
}
## Not run: 
## HPC Example
library(ff)
library(snow)
library(crlmm)
## genotype 50K SNPs at a time
ocProbesets(50000)
## setup cluster - 8 cores on the machine
setCluster(8, "SOCK")
path <- system.file("celFiles", package="hapmapsnp6")
cels <- list.celfiles(path, full.names=TRUE)
crlmmOutput <- crlmm2(cels)
## End(Not run)
Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).