Threshold possible binding sites by Score or FDR
Translator: 生物统计家园网, robot LoveR
Threshold the possible binding sites based on score, or False Discovery Rate (FDR). To threshold on FDR, you must have computed an FDR/Score map using calc.fdr, and chosen an FDR threshold, for which makeFdrPlot() is helpful.
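As a rough illustration of what a score/FDR map expresses, the sketch below estimates an empirical FDR at a few score cutoffs by comparing hits in real data with hits in simulated (background) data. This is a generic base R illustration, not necessarily the procedure calc.fdr follows: the score vectors are made up, the cutoffs are arbitrary, and it assumes the real and simulated data sets have the same total length, so no length normalisation is applied.

```r
# Made-up scores standing in for hits in real and simulated sequences.
realScores <- c(2.1, 3.0, 4.5, 6.2, 7.8, 9.1)
simScores  <- c(1.9, 2.5, 3.3, 4.8)
cutoffs    <- c(2, 4, 6)

# At each cutoff, every simulated hit counts as a false positive, so the
# ratio of simulated to real hits gives a crude FDR estimate (capped at 1).
fdr <- sapply(cutoffs, function(s) {
  nReal <- sum(realScores >= s)
  nSim  <- sum(simScores >= s)
  if (nReal == 0) NA_real_ else min(1, nSim / nReal)
})

fdrMap <- data.frame(score = cutoffs, fdr = fdr)
print(fdrMap)
```

In this toy data the estimated FDR falls as the cutoff rises, which is why an upper bound on FDR translates into a lower bound on score.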
output.sites(seqsScores, scoreThreshold = NULL,
fdrScoreMap = NULL, fdrThreshold = NULL)
seqsScores: score.ms output representing scores for candidate binding sites.
scoreThreshold: A numeric value giving the lower score bound. Candidate sites with scores higher than this threshold will be selected. (Not required if thresholding by FDR.)
fdrScoreMap: calc.fdr output giving the mapping between score and FDR. (Only required if thresholding by FDR.)
fdrThreshold: A numeric value between 0 and 1 giving the upper FDR bound; any site with a lower FDR will be output. (Only required if thresholding by FDR.)
A Features object containing the thresholded transcription factor binding sites with their locations, scores, strand, etc. If thresholding by score, this is equivalent to seqsScores[seqsScores$score > scoreThreshold,].
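The score-based equivalence can be tried out without the package. The sketch below uses a plain data frame as a hypothetical stand-in for a score.ms result; the column names other than `score` and all the values are invented for illustration, and a real Features object carries more fields.

```r
# Hypothetical stand-in for a score.ms() result.
seqsScores <- data.frame(
  seqname = c("seq1", "seq1", "seq2", "seq3"),
  start   = c(12, 85, 40, 7),
  end     = c(32, 105, 60, 27),
  score   = c(3.1, 7.4, 5.2, 9.0)
)
scoreThreshold <- 5.2

# Keep only rows scoring strictly above the threshold; a site scoring
# exactly 5.2 is dropped because the comparison is ">" and not ">=".
kept <- seqsScores[seqsScores$score > scoreThreshold, ]
print(kept)
```

Note that the comparison is strict, so a site whose score equals the threshold is not selected.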
exampleArchive <- system.file("extdata", "NRSF.zip", package="rtfbs")
seqFile <- "input.fas"
unzip(exampleArchive, seqFile)
# Read the FASTA file "input.fas" from the examples into an
# MS (multiple sequences) object
ms <- read.ms(seqFile)
pwmFile <- "pwm.meme"
unzip(exampleArchive, pwmFile)
# Read the Position Weight Matrix (PWM) from the MEME file in
# the examples into a Matrix object
pwm <- read.pwm(pwmFile)
# Build a 3rd order Markov Model to represent the sequences
# in the MS object "ms". The Model will be a list of
# matrices corresponding in size to the order of the
# Markov Model
mm <- build.mm(ms, 3)
# Match the PWM against the sequences provided to find
# possible transcription factor binding sites. A
# Features object is returned, containing the location
# of each possible binding site and an associated score.
# Sites with a negative score are not returned unless
# we set threshold=-Inf as a parameter.
cs <- score.ms(ms, pwm, mm)
# Generate a sequence 10000 bases long using the supplied
# Markov Model and random numbers
v <- simulate.ms(mm, 10000)
# Match the PWM against the simulated sequences to find
# possible transcription factor binding sites. A
# Features object is returned, containing the location
# of each possible binding site and an associated score.
# Sites with a negative score are not returned unless
# we set threshold=-Inf as a parameter. Any binding sites
# identified in simulated data are false positives
# and are used to calculate the False Discovery Rate
xs <- score.ms(v, pwm, mm)
# Calculate the False Discovery Rate for each possible
# binding site in the Features object "cs". Return
# a mapping between each binding site score and the
# associated FDR.
fdr <- calc.fdr(ms, cs, v, xs)
# Output identified transcription factor binding sites
# below a 0.5 FDR threshold
output.sites(cs, NULL, fdr, 0.5)
# OR
# Output identified transcription factor binding sites
# above a 5.2 score threshold
output.sites(cs, 5.2)
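To make the FDR-based call more concrete, here is a minimal base R sketch of how a score/FDR map could be used to pick sites. It is not the package's implementation: the data frames, the column names `score` and `fdr`, and the rule of converting the FDR bound into a score cutoff are all assumptions made for illustration, and the rule only makes sense when FDR decreases as score increases.

```r
# Hypothetical score/FDR map; column names are assumptions.
fdrScoreMap <- data.frame(
  score = c(2, 4, 6, 8, 10),
  fdr   = c(0.90, 0.60, 0.30, 0.10, 0.02)
)
# Hypothetical candidate sites with made-up scores.
sites <- data.frame(
  seqname = c("seq1", "seq2", "seq3", "seq4"),
  score   = c(3.5, 6.5, 8.2, 11.0)
)
fdrThreshold <- 0.5

# Lowest mapped score whose FDR falls below the threshold;
# Inf (select nothing) if no mapped score qualifies.
passing <- fdrScoreMap$score[fdrScoreMap$fdr < fdrThreshold]
scoreCutoff <- if (length(passing) == 0) Inf else min(passing)

# Keep the sites scoring at or above that cutoff.
selected <- sites[sites$score >= scoreCutoff, ]
print(scoreCutoff)
print(selected)
```

With these made-up numbers the FDR bound of 0.5 corresponds to a score cutoff of 6, so three of the four hypothetical sites are kept.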