R GWASTools package: help documentation for the anomIdentifyLowQuality() function

loveR · posted 2012-2-25 21:21:06

anomIdentifyLowQuality(GWASTools)
anomIdentifyLowQuality() belongs to the R package: GWASTools

Identify low quality samples

Translator: LoveR, the 生物统计家园网 robot

----------Description----------

Identify low quality samples for which the false positive rate for anomaly detection is likely to be high. Measures of noise (high variance) and high segmentation are used.

----------Usage----------

anomIdentifyLowQuality(snp.annot, med.sd, seg.info,
  sd.thresh, sng.seg.thresh, auto.seg.thresh)

----------Arguments----------

Argument: snp.annot
SnpAnnotationDataFrame with column "eligible", where "eligible" is a logical vector indicating whether a SNP is eligible for consideration in anomaly detection (usually FALSE for HLA and XTR regions, failed SNPs, and intensity-only SNPs). See HLA and pseudoautosomal.

Argument: med.sd
data.frame of median standard deviation of BAlleleFrequency (BAF) or LogRRatio (LRR) values across autosomes for each scan, with columns "scanID" and "med.sd". Usually the result of medianSdOverAutosomes. Usually only eligible SNPs are used in these computations. In addition, for BAF, homozygous SNPs are excluded.

Argument: seg.info
data.frame with segmentation information from anomDetectBAF or anomDetectLOH. Columns must include "scanID", "chromosome", and "num.segs". (For anomDetectBAF, segmentation information is found in $seg.info of the output. For anomDetectLOH, segmentation information is found in $base.info of the output.)

Argument: sd.thresh
Threshold for med.sd above which a scan is identified as low quality. Suggested values are 0.1 for BAF and 0.25 for LOH.

Argument: sng.seg.thresh
Threshold for the segmentation factor of a given chromosome, above which the chromosome is said to be highly segmented. See Details. Suggested values are 0.0008 for BAF and 0.0048 for LOH.

Argument: auto.seg.thresh
Threshold for the segmentation factor across autosomes, above which the scan is said to be highly segmented. See Details. Suggested values are 0.0001 for BAF and 0.0006 for LOH.

----------Details----------

Low quality samples are determined separately with regard to each of the two methods of segmentation, anomDetectBAF and anomDetectLOH. BAF anomalies (respectively LOH anomalies) found for samples identified as low quality for BAF (respectively LOH) tend to have a high false positive rate.

A scan is identified as low quality due to high variance (noise) if its med.sd is above the threshold sd.thresh.
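The variance criterion can be illustrated with a minimal sketch in base R. The data frame below is made up for illustration and only mimics the two columns the Arguments section names ("scanID" and "med.sd"); it is not output from medianSdOverAutosomes.

```r
# Hypothetical med.sd table with the columns named in the Arguments section.
# The values are invented for illustration only.
med.sd <- data.frame(scanID = c(101L, 102L, 103L),
                     med.sd = c(0.04, 0.12, 0.09))

# Suggested threshold for BAF, as given under sd.thresh.
sd.thresh <- 0.1

# Scans whose median SD exceeds the threshold count as noisy.
noisy.ids <- med.sd$scanID[med.sd$med.sd > sd.thresh]
print(noisy.ids)
```

Only the comparison med.sd > sd.thresh comes from the text above; how the function combines it with the segmentation measures is described under Value.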

High segmentation is often an indication of artifactual patterns in the B Allele Frequency (BAF) or Log R Ratio (LRR) values that are not always captured by high variance. Here segmentation information is determined by anomDetectBAF or anomDetectLOH, which use circular binary segmentation as implemented in the R package DNAcopy. The measure for high segmentation is a "segmentation factor" = (number of segments)/(number of eligible SNPs). A single-chromosome segmentation factor uses information for one chromosome. A segmentation factor across autosomes uses the total number of segments and the total number of eligible SNPs across all autosomes. See med.sd, sd.thresh, sng.seg.thresh, and auto.seg.thresh.
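As a rough illustration of the segmentation factor defined above, the following base R sketch computes it per chromosome and across autosomes. All numbers are invented, the n.eligible vector is a hypothetical stand-in for counts derived from the "eligible" column of snp.annot, and the coding of autosomes as 1 to 22 and X as 23 is an assumption. This is not the package's own code.

```r
# Toy seg.info with the three required columns (values are made up).
seg.info <- data.frame(
  scanID     = c(1L, 1L, 1L, 2L, 2L, 2L),
  chromosome = c(1L, 2L, 23L, 1L, 2L, 23L),  # assumption: 23 codes chromosome X
  num.segs   = c(1L, 3L, 1L, 5L, 1L, 2L)
)

# Hypothetical number of eligible SNPs per chromosome.
n.eligible <- c("1" = 10000, "2" = 8000, "23" = 4000)

# Single-chromosome factor: segments / eligible SNPs on that chromosome.
seg.info$fac <- seg.info$num.segs /
  unname(n.eligible[as.character(seg.info$chromosome)])

# Factor across autosomes: total segments / total eligible SNPs,
# using autosomes only (assumed to be coded 1 to 22).
auto <- seg.info[seg.info$chromosome %in% 1:22, ]
total.segs <- tapply(auto$num.segs, auto$scanID, sum)
fac.all.auto <- total.segs / sum(n.eligible[c("1", "2")])

print(seg.info)
print(fac.all.auto)
```

With only two toy autosomes the factors are far larger than the suggested thresholds would suit; real data have many more eligible SNPs per chromosome.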

----------Value----------

A data.frame with the following columns:

Column: scanID
integer id for the scan

Column: chrX.num.segs
number of segments for chromosome X

Column: chrX.fac
segmentation factor for chromosome X

Column: max.autosome
autosome with the highest single-chromosome segmentation factor

Column: max.auto.fac
segmentation factor for chromosome = max.autosome

Column: max.auto.num.segs
number of segments for chromosome = max.autosome

Column: num.ch.segd
number of chromosomes segmented, i.e. for which change points were found

Column: fac.all.auto
segmentation factor across all autosomes

Column: med.sd
median standard deviation of BAF (or LRR) values across autosomes. See med.sd in the Arguments section.

Column: type
one of the following, indicating the reason for identification as low quality:

auto.seg: segmentation factor fac.all.auto is above auto.seg.thresh but med.sd is acceptable

sd: standard deviation factor med.sd is above sd.thresh but fac.all.auto is acceptable

both.sd.seg: both the high variance and high segmentation factors, med.sd and fac.all.auto, are above their respective thresholds

sng.seg: segmentation factor max.auto.fac is above sng.seg.thresh but other measures are acceptable

sng.seg.X: segmentation factor chrX.fac is above sng.seg.thresh but other measures are acceptable
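One way to read the five type values is as a decision rule over the columns listed above. The sketch below is only an interpretation of that list: classify_scan is a hypothetical helper, not a GWASTools function, and the order in which sng.seg and sng.seg.X are checked when both apply is an assumption.

```r
# Hypothetical helper that mirrors the "type" descriptions above.
# It is NOT the package's implementation.
classify_scan <- function(med.sd, fac.all.auto, max.auto.fac, chrX.fac,
                          sd.thresh, sng.seg.thresh, auto.seg.thresh) {
  high.sd   <- med.sd > sd.thresh
  high.auto <- fac.all.auto > auto.seg.thresh
  if (high.sd && high.auto) return("both.sd.seg")
  if (high.sd)              return("sd")
  if (high.auto)            return("auto.seg")
  # Assumption: a highly segmented autosome is reported before chromosome X.
  if (max.auto.fac > sng.seg.thresh) return("sng.seg")
  if (chrX.fac > sng.seg.thresh)     return("sng.seg.X")
  NA_character_  # scan is not flagged as low quality
}

# Example with the suggested BAF thresholds and made-up measures.
classify_scan(med.sd = 0.12, fac.all.auto = 0.00005,
              max.auto.fac = 0.0002, chrX.fac = 0.0001,
              sd.thresh = 0.1, sng.seg.thresh = 0.0008,
              auto.seg.thresh = 0.0001)
```

The real function returns only the scans it flags, so the NA branch here simply stands for scans that would be absent from its output.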

----------Author(s)----------

Cecelia Laurie

----------See Also----------

findBAFvariance, anomDetectBAF, anomDetectLOH

----------Examples----------

library(GWASdata)
data(illumina_scan_annot)
scanAnnot <- ScanAnnotationDataFrame(illumina_scan_annot)
data(illumina_snp_annot)
snpAnnot <- SnpAnnotationDataFrame(illumina_snp_annot)

blfile <- system.file("extdata", "illumina_bl.nc", package="GWASdata")
blnc <- NcdfIntensityReader(blfile)
blData <-  IntensityData(blnc, scanAnnot=scanAnnot, snpAnnot=snpAnnot)

genofile <- system.file("extdata", "illumina_geno.nc", package="GWASdata")
genonc <- NcdfGenotypeReader(genofile)
genoData <-  GenotypeData(genonc, scanAnnot=scanAnnot, snpAnnot=snpAnnot)

# initial scan for low quality with median SD
baf.sd <- sdByScanChromWindow(blData, genoData)
med.baf.sd <- medianSdOverAutosomes(baf.sd)
low.qual.ids <- med.baf.sd$scanID[med.baf.sd$med.sd > 0.05]

# segment and filter BAF
scan.ids <- scanAnnot$scanID[1:2]
chrom.ids <- unique(snpAnnot$chromosome)
snp.ids <- snpAnnot$snpID[snpAnnot$missing.n1 < 1]
data(centromeres.hg18)
anom <- anomDetectBAF(blData, genoData, scan.ids=scan.ids, chrom.ids=chrom.ids,
  snp.ids=snp.ids, centromere=centromeres.hg18, low.qual.ids=low.qual.ids)

# further screen for low quality scans
snpAnnot$eligible <- snpAnnot$missing.n1 < 1
low.qual <- anomIdentifyLowQuality(snpAnnot, med.baf.sd, anom$seg.info,
  sd.thresh=0.1, sng.seg.thresh=0.0008, auto.seg.thresh=0.0001)

close(blData)
close(genoData)

Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).

