featureScores(Repitools)
featureScores() belongs to the R package: Repitools
Get scores at regular sample points around genomic features.
Translator: 生物统计家园网 robot LoveR
----------Description----------
Given a GRanges or GRangesList object, or BAM file paths, of reads for each experimental condition, or an AffymetrixCelSet or a numeric matrix of array data (where the rows are probes and the columns are the different samples), and an annotation of features of interest, scores at regularly spaced positions around the features are calculated. For sequencing data, the score is the smoothed coverage of reads divided by the library size. For array data, it is the array intensity.
----------Usage----------
<p>The ANY,data.frame method: <br>
<code>featureScores(x, anno, ...)</code> <br>
The ANY,GRanges method: <br>
<code>featureScores(x, anno, up = NULL, down = NULL, ...)</code>
</p>
----------Arguments----------
----------Details----------
If x is a vector of paths or a GRangesList object, then names(x) should contain the types of the experiments.
If anno is a data.frame, it must contain the columns chr, start, and end. Optional columns are strand and name. If anno is a GRanges object, then the names can be given as a column called name in the element metadata of the GRanges object. If names are given, then the coverage matrices will use them as their row names.
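As an illustration of the data.frame layout described above, a minimal sketch in base R follows. The coordinates and feature names are made up for the example and are not taken from any real annotation.

```r
# Hypothetical annotation; all coordinates and names are invented for illustration.
anno <- data.frame(
  chr    = c("chr21", "chr21", "chr21"),
  start  = c(9900000, 10500000, 14300000),
  end    = c(9950000, 10520000, 14340000),
  strand = c("+", "-", "+"),              # optional column
  name   = c("geneA", "geneB", "geneC"),  # optional column, used for row names
  stringsAsFactors = FALSE
)

# Check that the required columns are present before passing anno on.
required <- c("chr", "start", "end")
has_required <- all(required %in% colnames(anno))
```

A data.frame without strand would describe unstranded features, such as CpG islands.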
An approximation to running mean smoothing of the coverage is used. Reads are extended to the smoothing width, rather than to their fragment size, and the resulting coverage is used directly. This method is faster than taking a running mean of the calculated coverage, and the result is qualitatively almost identical.
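The idea behind this approximation can be sketched with a toy example in base R. The sketch assumes each read is reduced to a single position and is extended symmetrically around it, which is a simplification and not necessarily how Repitools resizes reads. All variable names are hypothetical.

```r
set.seed(1)
chrom_len <- 2000
reads <- sort(sample(chrom_len, 300, replace = TRUE))  # simulated read positions
s_width <- 101                  # smoothing width; odd so the window is symmetric
half <- (s_width - 1) / 2
positions <- seq(200, 1800, by = 100)                  # sampling positions

# Approach 1: extend every read to the smoothing width and count how many
# extended reads cover each sampling position.
extended_cov <- sapply(positions, function(p) sum(abs(reads - p) <= half))

# Approach 2: compute per-base read counts, then take a running mean over a
# window of the same width centred on each sampling position.
per_base <- tabulate(reads, nbins = chrom_len)
running_mean <- sapply(positions, function(p) mean(per_base[(p - half):(p + half)]))

# In this toy setting the two differ only by the constant factor s_width.
same_shape <- isTRUE(all.equal(as.numeric(extended_cov), running_mean * s_width))
```

In this simplified setting the two approaches agree up to a constant factor. With real reads extended to their fragment size first, the agreement is only approximate, as the text above says.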
If a matrix of array intensity values is provided, its column names are used as the names of the samples.
The annotation can be stranded or not. If the annotation is stranded, then the reference point is the start coordinate for features on the + strand, and the end coordinate for features on the - strand. If the annotation is unstranded (e.g. an annotation of CpG islands), then the midpoint of the feature is used as the reference point.
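The reference point rule can be written as a small helper in base R. The function name is hypothetical, and both the rounding of the midpoint and the treatment of any strand other than "+" or "-" as unstranded are assumptions of this sketch.

```r
# Hypothetical helper: reference point of each feature.
# strand = NULL means the whole annotation is unstranded.
reference_point <- function(start, end, strand = NULL) {
  midpoint <- round((start + end) / 2)
  if (is.null(strand)) return(midpoint)
  ifelse(strand == "+", start,
         ifelse(strand == "-", end, midpoint))
}

ref_plus  <- reference_point(100, 200, "+")  # start of a + strand feature
ref_minus <- reference_point(100, 200, "-")  # end of a - strand feature
ref_none  <- reference_point(100, 200)       # midpoint of an unstranded feature
```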
The up and down values give how far from the reference point scores are sampled. Their meaning depends on whether the annotation is stranded. If it is stranded, they give how far upstream and downstream to sample. If it is unstranded, up gives how far to go towards the start of the chromosome, and down gives how far to go towards the end of the chromosome.
If sequencing data is being analysed and dist is "percent", then up and down give the sampling boundaries as a percentage of each feature's width away from the reference point. If dist is "base", then the sampling region has a fixed width for every feature, and the units of up and down are bases. up and down must be identical if the features are unstranded. The units of freq are percent when dist is "percent", and bases when dist is "base".
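How up, down, freq and dist could translate into genomic sampling positions is sketched below in base R. The helper name and its arguments ref, width and strand are hypothetical, and rounding the percent offsets to whole bases is an assumption. The package may compute positions differently.

```r
# Hypothetical helper: sampling positions around one feature.
# ref is the reference point and width is the feature width in bases.
sampling_positions <- function(ref, width, strand, up, down, freq,
                               dist = c("base", "percent")) {
  dist <- match.arg(dist)
  offsets <- seq(-up, down, by = freq)   # negative offsets are upstream
  if (dist == "percent") {
    offsets <- round(offsets / 100 * width)  # convert percent of width to bases
  }
  # On the - strand, upstream lies towards higher coordinates.
  # Any other strand value is treated like +, i.e. up points to the chromosome start.
  if (strand == "-") ref - offsets else ref + offsets
}

pos_plus  <- sampling_positions(10000, 4000, "+", up = 2000, down = 1000, freq = 500)
pos_minus <- sampling_positions(10000, 4000, "-", up = 2000, down = 1000, freq = 500)
pos_pct   <- sampling_positions(10000, 4000, "+", up = 50, down = 100, freq = 25,
                                dist = "percent")
```

With dist = "base" every feature gets the same number of positions at the same spacing. With dist = "percent" the spacing in bases scales with the feature width.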
In the case of array data, the sequence of positions described by up, down, and freq actually describes the boundaries of windows, and the probe that is closest to the midpoint of each window is chosen to give the representative score of that window. When analysing sequencing data, on the other hand, the sequence of positions refers to the positions at which coverage is taken.
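The window rule for array data can be illustrated with a few made-up probes in base R. The probe positions, intensities and variable names are hypothetical. Whether a probe lying outside a window may still be selected for it is not stated in the text, so this sketch simply takes the nearest probe.

```r
# Hypothetical probes on one chromosome and their intensities.
probe_pos <- c(7900, 8300, 8800, 9400, 9900, 10600)
intensity <- c(1.2, 0.8, 2.5, 3.1, 2.9, 0.4)

# Window boundaries, as would be described by up, down and freq.
boundaries <- seq(8000, 11000, by = 1000)
midpoints  <- head(boundaries, -1) + diff(boundaries) / 2

# For each window, pick the probe closest to the window midpoint.
nearest <- sapply(midpoints, function(m) which.min(abs(probe_pos - m)))
window_scores <- intensity[nearest]
```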
Providing a mappability object for sequencing data is recommended. Otherwise, it is not possible to know whether a score of 0 arises because the window around the sampling position is unmappable, or because there really were no reads mapping there in the experiment. Coverage is normalised by dividing the raw coverage by the total number of reads in a sample. The coverage at a sampling position is then multiplied by 1 / mappability. Any position whose mappability is below the mappability cutoff will have its score set to NA.
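The normalisation described above amounts to the following arithmetic, shown here in base R with invented numbers. The variable names, including cutoff, are hypothetical and do not correspond to argument names of the function.

```r
raw_cov      <- c(12, 0, 30, 8)         # raw coverage at four sampling positions
library_size <- 1e6                     # total number of reads in the sample
mappability  <- c(1.0, 0.2, 0.75, 0.5)  # mappable fraction around each position
cutoff       <- 0.5                     # hypothetical mappability cutoff

# Divide by library size, then multiply by 1 / mappability.
scores <- raw_cov / library_size * (1 / mappability)

# Positions with mappability below the cutoff are not trusted.
scores[mappability < cutoff] <- NA
```

The second position ends up as NA rather than 0, which makes the difference between "no reads" and "unmappable" explicit.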
----------Value----------
A ScoresList object, which holds a list of score matrices, one for each experiment type, and the parameters that were used to create the score matrices.
----------Author(s)----------
Dario Strbenac, with contributions from Matthew Young at WEHI.
----------See Also----------
mergeReplicates for merging sequencing data replicates of an
----------Examples----------
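library(Repitools)
data(chr21genes)
data(samplesList)  # Loads 'samples.list.subset'.
# Score the first two samples from 2000 upstream to 1000 downstream of each
# gene, sampling every 500 with a smoothing width of 500.
fs <- featureScores(samples.list.subset[1:2], chr21genes, up = 2000, down = 1000,
                    freq = 500, s.width = 500)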