filterFFT(nucleR)
filterFFT() is part of the R package nucleR
Noise cleaning and smoothing of genomic data using Fourier analysis
----------Description----------
Removes noise from genomic data by smoothing and cleaning the observed signal. This function alters the shape and values of the signal much less than the traditional sliding-window average does, providing a high correlation between the original and filtered data (>0.99).
----------Usage----------
## S4 method for signature 'SimpleRleList'
filterFFT(data, pcKeepComp="auto", showPowerSpec=FALSE, useOptim=TRUE, mc.cores=1, ...)
## S4 method for signature 'list'
filterFFT(data, pcKeepComp="auto", showPowerSpec=FALSE, useOptim=TRUE, mc.cores=1, ...)
## S4 method for signature 'Rle'
filterFFT(data, pcKeepComp="auto", showPowerSpec=FALSE, useOptim=TRUE, ...)
## S4 method for signature 'numeric'
filterFFT(data, pcKeepComp="auto", showPowerSpec=FALSE, useOptim=TRUE, ...)
----------Arguments----------
Argument: data
Coverage or intensity values representing the results of an NGS or TA (tiling array) experiment. This argument can be an individual vector representing a chromosome (Rle or numeric object) or a list of them.
Argument: pcKeepComp
Number of components to select, as a fraction of the total length of the sample. Allowed values are numeric (in the range 0 to 1) for manual setting, or "auto" for automatic detection. See Details.
Argument: showPowerSpec
Plot the power spectrum of the Fast Fourier Transform to visually identify the selected components (see Details).
Argument: useOptim
This function implements tweaks to a standard fft call that dramatically improve performance on large genomic data. These optimizations can be bypassed by setting this parameter to FALSE.
Argument: mc.cores
If multiple cores are available, the maximum number of them to use for parallel processing of data elements (only useful if data is a list of elements).
Argument: ...
Other parameters to be passed to the pcKeepCompDetect function.
----------Details----------
Selecting principal components by Fourier analysis is widely used in signal processing theory for unbiased cleaning of a signal over time.
Other procedures, such as the traditional sliding-window average, can change the shape of the results too much depending on the window size; moreover, they only smooth the noise rather than removing it.
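For comparison, the sliding-window average mentioned above can be sketched in base R with stats::filter. The helper name moving_average and the synthetic Gaussian-shaped peak are illustrative assumptions, not part of nucleR; the sketch only shows how the result depends on the window size, with a wider window flattening a narrow peak.

```r
# Hypothetical helper (not part of nucleR): centered sliding-window average
moving_average <- function(x, window) {
  # stats::filter with equal weights computes a running mean;
  # positions within half a window of either end are returned as NA
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 2))
}

# Synthetic narrow peak centered at position 51
pos  <- 1:101
peak <- exp(-((pos - 51) / 3)^2)

smooth_5  <- moving_average(peak, 5)
smooth_21 <- moving_average(peak, 21)

# The peak height shrinks as the window grows
c(original  = max(peak),
  window_5  = max(smooth_5, na.rm = TRUE),
  window_21 = max(smooth_21, na.rm = TRUE))
```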
With a Fourier transform, the input signal is decomposed into different sinusoidal components and described as a combination of them. Long frequencies can be explained as a function of two or more shorter periodic frequencies. This is why long, aperiodic sequences are usually identified as noise, and why it is desirable to remove them from the signal to be processed.
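The decomposition described above can be checked directly with fft from base R's stats package: each Fourier coefficient multiplies one complex sinusoid, and summing all of them, divided by the signal length because R's inverse transform is unnormalized, rebuilds the original signal exactly. The toy vector is an arbitrary assumption chosen for illustration.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)   # arbitrary toy signal
n <- length(x)
X <- fft(x)                       # one complex coefficient per frequency 0..n-1
freqs <- 0:(n - 1)

# Rebuild each sample t as the sum of all components X[k] * exp(2*pi*i*k*t/n)
recon <- sapply(0:(n - 1), function(t) {
  Re(sum(X * exp(2i * pi * freqs * t / n))) / n
})

all.equal(recon, x)   # TRUE
```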
Here, this procedure is applied to genomic data, providing a novel method to obtain clean values that allow efficient detection of peaks, which can be used for direct recognition of nucleosome positions.
This function selects a certain number of components in the original power spectrum (the result of the Fast Fourier Transform, which can be seen with showPowerSpec=TRUE) and sets the rest of them to 0 (component knock-out).
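A minimal base-R sketch of component knock-out follows. It assumes that "the first components" means the zero-frequency term plus the lowest frequencies, and that each kept frequency's mirrored (complex-conjugate) partner is kept as well so the output stays real. The helper fft_knockout and its keep_frac argument are hypothetical stand-ins for the idea behind pcKeepComp, not nucleR's actual code, which also applies the performance optimizations discussed in this section.

```r
# Hypothetical helper (not nucleR's implementation): low-pass filter by
# knocking out high-frequency Fourier components.
fft_knockout <- function(x, keep_frac) {
  n <- length(x)
  k <- max(1L, ceiling(n * keep_frac))  # highest frequency index to keep
  idx <- 0:(n - 1)
  dist <- pmin(idx, n - idx)            # frequency of each bin; bins j and n-j are mirror pairs
  spec <- fft(x)
  spec[dist > k] <- 0                   # component knock-out
  Re(fft(spec, inverse = TRUE)) / n     # R's inverse FFT is unnormalized
}

# Usage: a slow wave plus a fast "noise" wave
n <- 200
pos <- 0:(n - 1)
slow  <- sin(2 * pi * 2 * pos / n)
noisy <- slow + 0.5 * sin(2 * pi * 40 * pos / n)

# Keep only the lowest frequencies (5% of the length); the fast wave is removed
filtered <- fft_knockout(noisy, keep_frac = 0.05)
all.equal(filtered, slow)
```

Because the slow wave sits entirely in low-frequency bins and the fast wave entirely in high-frequency bins, the knock-out separates them cleanly here; real genomic signals mix both, which is why the choice of fraction matters.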
The number of components to keep (given as a fraction of the input length) can be set with pcKeepComp. This selects the first components of the signal and knocks out the rest. The closer this value is to 1, the more components are selected and the more noise is allowed in the output. For effective filtering that removes the noise while keeping almost all relevant peaks, a value between 0.01 and 0.05 is usually sufficient. Lower values can cause adjacent minor peaks to merge.
This library also allows automatic detection of a fitted value for pcKeepComp. By default, it uses the pcKeepCompDetect function, which searches for the minimum fraction of components that can reproduce the original signal with a correlation of 0.99 between the filtered and the original signal. See the help page of pcKeepCompDetect for further details and a reference of the available parameters.
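The search described above can be sketched as a brute-force scan in base R. This is NOT the algorithm of nucleR's pcKeepCompDetect; the helpers fft_knockout and detect_keep_frac, the candidate grid and the synthetic test signal are all assumptions made for illustration. The scan relies on the fact that, for this kind of knock-out, keeping more components can only bring the filtered signal closer to the original, so the first candidate that reaches the target correlation is the smallest one on the grid.

```r
# Hypothetical knock-out filter (same idea as pcKeepComp, not nucleR's code)
fft_knockout <- function(x, keep_frac) {
  n <- length(x)
  k <- max(1L, ceiling(n * keep_frac))
  idx <- 0:(n - 1)
  spec <- fft(x)
  spec[pmin(idx, n - idx) > k] <- 0
  Re(fft(spec, inverse = TRUE)) / n
}

# Hypothetical stand-in for automatic detection: scan fractions from small to
# large and return the first whose filtered signal reaches target_cor.
detect_keep_frac <- function(x, target_cor = 0.99,
                             candidates = seq(0.001, 0.5, by = 0.001)) {
  if (sd(x) == 0) return(NA_real_)   # correlation is undefined for a constant signal
  for (p in candidates) {
    # isTRUE() guards against NA if the filtered signal happens to be constant
    if (isTRUE(cor(x, fft_knockout(x, p)) >= target_cor)) return(p)
  }
  NA_real_
}

# Usage on a synthetic signal: one slow wave plus small Gaussian noise
set.seed(1)
n <- 2000
pos <- 0:(n - 1)
x <- sin(2 * pi * 5 * pos / n) + rnorm(n, sd = 0.05)
p <- detect_keep_frac(x)
p
cor(x, fft_knockout(x, p))
```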
One of the most powerful features of nucleR is its efficient implementation of the FFT for genomic data. This is achieved through a few tweaks that allow optimal performance of the Fourier transform. These include by-range filtering, automatic detection of uncovered regions, windowed execution of the filter, and padding of the data to the nearest power of 2 (this ensures an optimal case for the FFT, since a power-of-2 length factorizes completely into small factors). Internal testing showed that, on specific datasets, these optimizations lead to a dramatic improvement of many orders of magnitude (from 3 days to a few seconds) while keeping the correlation between the native fft call and our filterFFT higher than 0.99. Therefore, using these optimizations is highly recommended.
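The padding step can be sketched in base R as follows. pad_pow2 is a hypothetical helper; nucleR's internal padding, windowing and range handling are not reproduced here. The round trip only shows that padding, transforming and then dropping the padding preserves the original samples; when components are knocked out, the zero padding can affect the filtered values near the end of the signal.

```r
# Hypothetical helper (not nucleR's code): zero-pad a vector to the next power of 2
pad_pow2 <- function(x) {
  n <- length(x)
  target <- 2^ceiling(log2(n))
  c(x, rep(0, target - n))
}

set.seed(42)
x <- runif(1000)
padded <- pad_pow2(x)
length(padded)                  # 1024
nextn(length(x), factors = 2)   # stats::nextn gives the same target length

# Transform at the padded length, then drop the padding after inverting
spec <- fft(padded)
back <- Re(fft(spec, inverse = TRUE)) / length(padded)
all.equal(back[seq_along(x)], x)
```

stats::nextn with its default factors = c(2, 3, 5) would instead return the smallest length that factorizes into 2, 3 and 5, which R's mixed-radix fft also handles efficiently.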
If for some reason you want to apply the function without any optimizations, you can specify useOptim=FALSE to bypass them and get the pure knock-out inverse from the native FFT call. All other parameters can still be applied in this case.
----------Value----------
Numeric vector with cleaned/smoothed values
----------Author(s)----------
Oscar Flores <oflores@mmb.pcb.ub.es>, David Rossell <david.rossell@irbbarcelona.org>
----------Examples----------
# Load nucleR and the example data: raw hybridization values from a tiling array
library(nucleR)
raw_data = get(data(nucleosome_tiling))
# Filter the data, keeping 1% of the Fourier components
fft_data = filterFFT(raw_data, pcKeepComp=0.01)
# Plot both profiles
par(mfrow=c(2,1), mar=c(3, 4, 1, 1))
plot(raw_data, type="l", xlab="position", ylab="Raw intensities")
plot(fft_data, type="l", xlab="position", ylab="Filtered intensities")
# The power spectrum gives a visual representation of the selected components
fft_data = filterFFT(raw_data, pcKeepComp=0.01, showPowerSpec=TRUE)
Source: biostatistic.net (http://www.biostatistic.net). Please credit this source when reposting.