anomDetectBAF(GWASTools)
anomDetectBAF() belongs to the R package GWASTools
BAF Method for Chromosome Anomaly Detection
----------Description----------
For each sample and chromosome, anomSegmentBAF breaks the chromosome into segments delimited by change points of a metric based on B Allele Frequency (BAF) values.
anomFilterBAF selects segments which are likely to be anomalous.
anomDetectBAF is a wrapper to run anomSegmentBAF and anomFilterBAF in one step.
----------Usage----------
anomSegmentBAF(intenData, genoData, scan.ids, chrom.ids, snp.ids,
  smooth = 50, min.width = 5, nperm = 10000, alpha = 0.001,
  verbose = TRUE)
anomFilterBAF(intenData, genoData, segments, snp.ids, centromere,
  low.qual.ids = NULL, num.mark.thresh = 15, long.num.mark.thresh = 200,
  sd.reg = 2, sd.long = 1, low.frac.used = 0.1, run.size = 10,
  inter.size = 2, low.frac.used.num.mark = 30, very.low.frac.used = 0.01,
  low.qual.frac.num.mark = 150, lrr.cut = -2, ct.thresh = 10,
  frac.thresh = 0.1, verbose = TRUE)
anomDetectBAF(intenData, genoData, scan.ids, chrom.ids, snp.ids,
  centromere, low.qual.ids = NULL, ...)
----------Arguments----------
Argument: intenData
An IntensityData object containing the B Allele Frequency. The rows of intenData and the SNP annotation are expected to be ordered by chromosome and then by position within chromosome. The scan annotation should contain sex, coded as "M" for male and "F" for female.
Argument: genoData
A GenotypeData object. The rows of genoData and the SNP annotation are expected to be ordered by chromosome and then by position within chromosome.
Argument: scan.ids
vector of scan ids (sample numbers) to process
Argument: chrom.ids
vector of (unique) chromosomes to process. Recommended to include all autosomes.
Argument: snp.ids
vector of eligible SNP ids. Usually excludes failed and intensity-only SNPs. It is also recommended to exclude the HLA region on chromosome 6 and the XTR region on chromosome 23 (X). See HLA and pseudoautosomal.
Argument: smooth
number of markers for smoothing region. See smooth.CNA in the DNAcopy package.
Argument: min.width
minimum number of markers for a segment. See segment in the DNAcopy package.
Argument: nperm
number of permutations for deciding significance in segmentation. See segment in the DNAcopy package.
Argument: alpha
significance level. See segment in the DNAcopy package.
Argument: verbose
logical indicator whether to print information about the scan id currently being processed. anomSegmentBAF prints each scan id; anomFilterBAF prints a message after every 10 samples: "processing ith scan id out of n", where "ith" will be 10, 20, etc. and "n" is the total number of samples
Argument: segments
data.frame of segments from anomSegmentBAF. Names must include "scanID", "chromosome", "num.mark", "left.index", "right.index", "seg.mean". Here "left.index" and "right.index" are row indices of intenData. Left and right refer to the start and end of the anomaly, respectively, in position order.
Argument: centromere
data.frame with centromere position information. Names must include "chrom", "left.base", "right.base". Valid values for "chrom" are 1:22, "X", "Y", "XY". Here "left.base" and "right.base" are the base positions of the start and end of the centromere location, in position order.
Argument: low.qual.ids
scan ids determined to be low quality, for which some segments are filtered using more stringent criteria. Default is NULL. The usual choice is scan ids for which median BAF across autosomes > 0.05. See sdByScanChromWindow and medianSdOverAutosomes.
Argument: num.mark.thresh
minimum number of SNP markers in a segment for it to be considered as an anomaly
Argument: long.num.mark.thresh
minimum number of markers for a segment to count as "long"; for long segments the significance threshold criterion is allowed to be less stringent
Argument: sd.reg
number of baseline standard deviations by which a segment mean must differ from the "normal" baseline mean for the segment to be declared anomalous. The deviation is computed as abs(mean of segment - baseline mean)/(baseline standard deviation)
Argument: sd.long
same meaning as sd.reg but applied to "long" segments
Argument: low.frac.used
if the fraction of heterozygous or missing SNP markers relative to the number of eligible SNP markers in a segment is below this value, more stringent criteria are applied before declaring the segment anomalous.
Argument: run.size
minimum length of a run of missing or heterozygous SNP markers for possible determination of homozygous deletions
Argument: inter.size
number of homozygotes allowed to "interrupt" a run for possible determination of homozygous deletions
Argument: low.frac.used.num.mark
threshold on the number of markers for low.frac.used segments (which are not declared homozygous deletions)
Argument: very.low.frac.used
any segments with (num.mark)/(number of markers in interval) less than this are filtered out, since they tend to be false positives
Argument: low.qual.frac.num.mark
minimum num.mark threshold for low quality scans (low.qual.ids) for segments that are also below the low.frac.used threshold
Argument: lrr.cut
look for runs of LRR values below lrr.cut to adjust homozygous deletion endpoints
Argument: ct.thresh
minimum number of LRR values below lrr.cut needed in order to adjust homozygous deletion endpoints
Argument: frac.thresh
investigate an interval for homozygous deletion only if the lrr.cut and ct.thresh thresholds are met and (# LRR values below lrr.cut)/(# eligible SNPs in segment) > frac.thresh
Argument: ...
arguments to pass to anomFilterBAF
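The column requirements stated for the segments and centromere arguments can be checked before calling anomFilterBAF. The following is a minimal sketch in base R: missing_cols is a hypothetical helper (not a GWASTools function), and the base positions are placeholders rather than real centromere coordinates.

```r
# Column names required by the argument descriptions on this page.
required_segment_cols    <- c("scanID", "chromosome", "num.mark",
                              "left.index", "right.index", "seg.mean")
required_centromere_cols <- c("chrom", "left.base", "right.base")

# Hypothetical helper (not part of GWASTools): names of missing columns.
missing_cols <- function(df, required) setdiff(required, names(df))

# Toy centromere table; positions are placeholders, not real coordinates.
centromere <- data.frame(chrom = c("1", "2", "X"),
                         left.base = c(100, 200, 300),
                         right.base = c(150, 250, 350),
                         stringsAsFactors = FALSE)

# Toy segments table that deliberately lacks "seg.mean".
segments <- data.frame(scanID = 1L, chromosome = 1L, num.mark = 20L,
                       left.index = 5L, right.index = 60L)

cen_missing <- missing_cols(centromere, required_centromere_cols)  # character(0)
seg_missing <- missing_cols(segments, required_segment_cols)       # "seg.mean"
```

A check of this kind only verifies names; it does not confirm that the rows are ordered by chromosome and position as the functions expect.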
----------Details----------
anomSegmentBAF uses the function segment from the DNAcopy package to perform circular binary segmentation on a metric based on BAF values. The metric for a given sample/chromosome is sqrt(min(BAF, 1-BAF, abs(BAF-median(BAF)))), where the median is taken across BAF values on the chromosome. Only BAF values for heterozygous or missing SNPs are used.
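The metric described above can be sketched in base R as follows. This is an illustration only: the BAF values are made up, baf_metric is a hypothetical name, and the restriction to heterozygous or missing SNPs is assumed to have been applied already.

```r
# Toy BAF values for one sample on one chromosome (made-up numbers).
baf <- c(0.48, 0.52, 0.50, 0.33, 0.67, 0.35, 0.49, NA, 0.51)

# sqrt(min(BAF, 1-BAF, abs(BAF - median(BAF)))), evaluated per SNP with
# pmin(); the median is taken over the BAF values on the chromosome.
baf_metric <- function(baf) {
  med <- median(baf, na.rm = TRUE)
  sqrt(pmin(baf, 1 - baf, abs(baf - med)))
}

metric <- baf_metric(baf)
round(metric, 3)
```

SNPs whose BAF is close to the chromosome median give values near zero, while SNPs in a region of allelic imbalance give larger values, which is what makes change points in the metric informative.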
anomFilterBAF determines anomalous segments based on a combination of thresholds for the number of SNP markers in the segment and for deviation from a "normal" baseline. (See num.mark.thresh, long.num.mark.thresh, sd.reg, and sd.long.) The "normal" baseline metric mean and standard deviation are found across all autosomes not segmented by anomSegmentBAF. This is why it is recommended to include all autosomes in the argument chrom.ids, to ensure a more accurate baseline.
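The interplay of the marker-count and deviation thresholds can be illustrated with a simplified sketch. It assumes a segment is flagged when it passes either the regular pair (num.mark.thresh, sd.reg) or the "long" pair (long.num.mark.thresh, sd.long); the exact comparisons and the additional criteria used by anomFilterBAF are not reproduced here, and flag_segments, the segment values and the baseline values are hypothetical.

```r
# Hypothetical segments for one scan (illustrative values only).
segs <- data.frame(
  num.mark = c(10, 40, 250, 300),
  seg.mean = c(0.60, 0.55, 0.38, 0.31)
)
base.mean <- 0.30  # assumed baseline mean of the BAF metric
base.sd   <- 0.05  # assumed baseline standard deviation

# Deviation from baseline in units of baseline standard deviations.
segs$sd.fac <- abs(segs$seg.mean - base.mean) / base.sd

# Assumed rule: regular threshold pair OR long-segment threshold pair.
flag_segments <- function(segs, num.mark.thresh = 15,
                          long.num.mark.thresh = 200,
                          sd.reg = 2, sd.long = 1) {
  regular <- segs$num.mark >= num.mark.thresh & segs$sd.fac >= sd.reg
  long    <- segs$num.mark >= long.num.mark.thresh & segs$sd.fac >= sd.long
  regular | long
}

segs$anomalous <- flag_segments(segs)
segs
```

In this toy data the first segment has too few markers, the second passes the regular thresholds, the third passes only because it is long, and the fourth is too close to baseline.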
Some initial filtering is done, including possible merging of consecutive segments that meet the sd.reg threshold along with other criteria (such as not spanning the centromere), and adjustment of break points for possible homozygous deletions (see lrr.cut, ct.thresh, frac.thresh, run.size, and inter.size). Male samples are not processed for chromosome 23 (X).
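The LRR-based condition for investigating a possible homozygous deletion, as described for lrr.cut, ct.thresh and frac.thresh, can be sketched as below. This is a hedged illustration: check_homodel_candidate is a hypothetical helper, the LRR values are invented, and treating ct.thresh as an inclusive minimum is an assumption.

```r
# Hypothetical LRR values for the eligible SNPs in one segment.
lrr <- c(-0.1, -2.5, -3.0, -2.2, 0.05, -2.8, -2.4, -0.2, -2.6, -2.9,
         -3.1, -2.3, -2.7, 0.1, -2.1)

# Hypothetical helper: count LRR values below lrr.cut and test both the
# count threshold and the fraction threshold.
check_homodel_candidate <- function(lrr, lrr.cut = -2, ct.thresh = 10,
                                    frac.thresh = 0.1) {
  n.low    <- sum(lrr < lrr.cut, na.rm = TRUE)
  frac.low <- n.low / length(lrr)
  list(n.low = n.low, frac.low = frac.low,
       investigate = n.low >= ct.thresh && frac.low > frac.thresh)
}

res        <- check_homodel_candidate(lrr)
res_strict <- check_homodel_candidate(lrr, ct.thresh = 12)
```

With the defaults, 11 of the 15 values fall below lrr.cut, so the interval would be investigated; raising ct.thresh to 12 makes the same interval fail the count requirement.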
More stringent criteria are applied to some segments (see low.frac.used, low.frac.used.num.mark, very.low.frac.used, low.qual.ids, and low.qual.frac.num.mark).
anomDetectBAF runs anomSegmentBAF with default values and then runs anomFilterBAF. Additional parameters for anomFilterBAF may be passed as arguments.
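The wrapper behaviour described above follows the usual R pattern of forwarding "..." to one inner function. The sketch below uses toy stand-ins (toy_segment, toy_filter and toy_detect are hypothetical and unrelated to the GWASTools implementation) to show that extra arguments reach only the filter step while segmentation keeps its defaults.

```r
# Toy stand-in for the segmentation step; min.width has a default.
toy_segment <- function(x, min.width = 5) {
  list(x = x, min.width = min.width)
}

# Toy stand-in for the filter step with two tunable thresholds.
toy_filter <- function(seg, num.mark.thresh = 15, sd.reg = 2) {
  list(min.width = seg$min.width,
       num.mark.thresh = num.mark.thresh,
       sd.reg = sd.reg)
}

# Toy wrapper: segmentation uses defaults, "..." goes to the filter only.
toy_detect <- function(x, ...) {
  seg <- toy_segment(x)
  toy_filter(seg, ...)
}

res <- toy_detect(1:10, num.mark.thresh = 30)
```

By the same pattern, a call such as anomDetectBAF(..., num.mark.thresh = 30) changes a filter threshold, whereas segmentation settings such as smooth or alpha can only be changed by calling anomSegmentBAF and anomFilterBAF separately.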
----------Value----------
anomSegmentBAF returns a data.frame with the following elements. Left and right refer to the start and end of the anomaly, respectively, in position order.
Element: scanID
integer id of scan
Element: chromosome
chromosome as integer, where 23 refers to the X chromosome
Element: left.index
row index of intenData indicating left endpoint of segment
Element: right.index
row index of intenData indicating right endpoint of segment
Element: num.mark
number of heterozygous or missing SNPs in the segment
Element: seg.mean
mean of the BAF metric over the segment
anomFilterBAF and anomDetectBAF return a list with the following elements:
Element: raw
data.frame of raw segmentation data, with the same columns as the output of anomSegmentBAF as well as:
left.base: base position of left endpoint of segment
right.base: base position of right endpoint of segment
sex: sex of scan.id coded as "M" or "F"
sd.fac: measure of deviation from baseline equal to abs(mean of segment - baseline mean)/(baseline standard deviation); used in determining anomalous segments
Element: filtered
data.frame of the segments identified as anomalies, with the same columns as raw as well as:
merge: TRUE if segment was a result of merging. Consecutive segments from the output of anomSegmentBAF that meet certain criteria are merged.
homodel.adjust: TRUE if original segment was adjusted to narrow in on a homozygous deletion
frac.used: fraction of (eligible) heterozygous or missing SNP markers compared with total number of eligible SNP markers in segment
Element: base.info
data frame with columns:
scanID: integer id of scan
base.mean: mean of non-anomalous baseline. This is the mean of the BAF metric for heterozygous and missing SNPs over all unsegmented autosomes that were considered.
base.sd: standard deviation of non-anomalous baseline
chr.ct: number of unsegmented chromosomes used in determining the non-anomalous baseline
Element: seg.info
data frame with columns:
scanID: integer id of scan
chromosome: chromosome as integer
num.segs: number of segments produced by anomSegmentBAF
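The relationship between seg.info and base.info can be illustrated with a small base R sketch. It assumes that an "unsegmented" chromosome is one for which a single segment was produced (num.segs == 1); that reading, the metric values and the table contents are assumptions made for illustration.

```r
# Hypothetical per-SNP metric values for one scan, with chromosome labels.
metric <- data.frame(
  chromosome = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  value      = c(0.30, 0.32, 0.28, 0.60, 0.62, 0.31, 0.29, 0.31, 0.30)
)

# seg.info-style table: number of segments found per chromosome.
seg.info <- data.frame(chromosome = c(1, 2, 3), num.segs = c(1, 3, 1))

# Assumption: unsegmented means exactly one segment on the chromosome.
unseg     <- seg.info$chromosome[seg.info$num.segs == 1]
base.vals <- metric$value[metric$chromosome %in% unseg]

# Baseline summary in the shape of one base.info row.
base.info <- data.frame(base.mean = mean(base.vals),
                        base.sd   = sd(base.vals),
                        chr.ct    = length(unseg))
base.info
```

Chromosome 2 is excluded because it was split into several segments, which shows why supplying more autosomes leaves more unsegmented data for estimating the baseline.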
----------Note----------
It is recommended to include all autosomes as input. This ensures a more accurate determination of baseline information.
----------Author(s)----------
Cecelia Laurie
----------References----------
The BAF metric used is modified from Itsara, A., et al. (2009) Population Analysis of Large Copy Number Variants and Hotspots of Human Genetic Disease. American Journal of Human Genetics, 84, 148–161.
----------See Also----------
segment and smooth.CNA in the package DNAcopy, also findBAFvariance, anomDetectLOH
----------Examples----------
library(GWASdata)
data(illumina_scan_annot)
scanAnnot <- ScanAnnotationDataFrame(illumina_scan_annot)
data(illumina_snp_annot)
snpAnnot <- SnpAnnotationDataFrame(illumina_snp_annot)
blfile <- system.file("extdata", "illumina_bl.nc", package="GWASdata")
blnc <- NcdfIntensityReader(blfile)
blData <- IntensityData(blnc, scanAnnot=scanAnnot, snpAnnot=snpAnnot)
genofile <- system.file("extdata", "illumina_geno.nc", package="GWASdata")
genonc <- NcdfGenotypeReader(genofile)
genoData <- GenotypeData(genonc, scanAnnot=scanAnnot, snpAnnot=snpAnnot)
# segment BAF
scan.ids <- scanAnnot$scanID[1:2]
chrom.ids <- unique(snpAnnot$chromosome)
snp.ids <- snpAnnot$snpID[snpAnnot$missing.n1 < 1]
seg <- anomSegmentBAF(blData, genoData, scan.ids=scan.ids,
                      chrom.ids=chrom.ids, snp.ids=snp.ids)
# filter segments to detect anomalies
data(centromeres.hg18)
filt <- anomFilterBAF(blData, genoData, segments=seg, snp.ids=snp.ids,
                      centromere=centromeres.hg18)
# alternatively, run both steps at once
anom <- anomDetectBAF(blData, genoData, scan.ids=scan.ids, chrom.ids=chrom.ids,
                      snp.ids=snp.ids, centromere=centromeres.hg18)