sagelibrary.simulate(sagenhaft)
sagelibrary.simulate() belongs to the R package sagenhaft
Simulate SAGE libraries
----------Description----------
Function to simulate SAGE libraries with sequencing errors.
----------Usage----------
sagelibrary.simulate(taglength = 4, lambda = 1000, mean.error = 0.01,
error.sd = 1, withintagerror.sd = 0.2,
ngenes = min(4^taglength, 1e+05), base.lib = NULL,
libseed = -1, ...)
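The default for ngenes in the usage above caps the number of genes at 100,000 once the tag space becomes large. A small base-R check of that expression is sketched below; the helper name default.ngenes is hypothetical and is not part of sagenhaft.

```r
# Hypothetical helper that mirrors the default expression for ngenes.
default.ngenes <- function(taglength) min(4^taglength, 1e+05)

default.ngenes(4)    # 256: all 4^4 possible tags are used
default.ngenes(10)   # 1e+05: 4^10 = 1048576 exceeds the cap
```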
----------Arguments----------
Argument: taglength
Tag length for the library.
Argument: lambda
Approximate size of the library.
Argument: mean.error
Mean amount of sequencing error.
Argument: error.sd
Standard deviation for sequencing errors.
Argument: withintagerror.sd
Standard deviation for sequencing errors within tags.
Argument: ngenes
Number of genes to generate tags from.
Argument: base.lib
Simulate the library based on the tags in another library and create variations.
Argument: libseed
Seed for the random number generator.
Argument: ...
Arguments passed to em.estimate.
----------Details----------
We set the number of possible transcripts and assign each of them a random SAGE tag out of all 4^taglength possible SAGE tags. For each SAGE tag, a random proportion p within the library is generated from a log-normal distribution, and the proportions are then rescaled to sum to 1. The true count of each tag is simulated by sampling from a Poisson distribution with parameter p * lambda, where p is the proportion of the tag in the library and lambda sets the approximate size of the library.
The sequencing errors are simulated on each individual occurrence of a tag sequence. For each tag sequence, a mean sequencing quality value is generated from a log-normal distribution. The individual quality values for each base are then generated from log-normal distributions with means equal to the simulated sequencing quality value of the tag sequence. We have noticed that in experimentally generated data the within-tag variation of sequencing quality values is usually about 1/5 of the between-tag variation. From each true tag sequence, one observed tag sequence is generated using the simulated quality values of the true sequence as the multinomial probabilities, i.e. each base is replaced by one of the 3 other bases with the probability specified by the sequencing quality value of that base. The counts of these generated tags are then summed to give the observed tag counts.
When generating several simulated libraries for comparison, we use the same gene proportions for all libraries, replacing up to 1/3 of the proportions by proportions with a known differential factor.
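The generation of true tag counts described in the Details can be sketched in base R as follows. This is an illustration under stated assumptions, not the sagenhaft implementation: the log-normal parameters (meanlog = 0, sdlog = 1), the seed, and the choice to draw tags without replacement are all assumed here, and the variable names are made up for the example.

```r
# Illustrative sketch only; not the code used inside sagenhaft.
set.seed(1)                        # arbitrary seed, for reproducibility
taglength <- 4                     # tag length, as in the usage defaults
lambda    <- 1000                  # approximate library size
ngenes    <- min(4^taglength, 1e+05)

# Assign each gene a tag index out of the 4^taglength possible tags
# (assumption: tags are drawn without replacement).
tag.index <- sample(4^taglength, ngenes)

# Log-normal proportions, rescaled to sum to 1
# (meanlog and sdlog are assumed values).
p <- rlnorm(ngenes, meanlog = 0, sdlog = 1)
p <- p / sum(p)

# True counts: one Poisson draw per tag with mean p * lambda.
true.counts <- rpois(ngenes, p * lambda)

sum(true.counts)                   # close to lambda, but generally not equal
```

Because each count is an independent Poisson draw, the total library size varies around lambda rather than matching it exactly, which is consistent with lambda being described as an approximate size.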
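The sequencing-error step described in the Details, where each base of a true tag is replaced by one of the 3 other bases with a base-specific probability, might look roughly like the base-R sketch below. The helper observe.tag is hypothetical, and centring the log-normal draws with meanlog = log(...) (which fixes the median rather than the mean) is an assumption made for simplicity; the package may parameterise this differently.

```r
# Illustrative sketch only; not the code used inside sagenhaft.
set.seed(2)                        # arbitrary seed, for reproducibility
bases <- c("A", "C", "G", "T")
mean.error        <- 0.01          # values taken from the usage defaults
error.sd          <- 1
withintagerror.sd <- 0.2

# Hypothetical helper: simulate one observed tag from one true tag occurrence.
observe.tag <- function(true.tag) {
  n <- length(true.tag)
  # Tag-level error probability, log-normal around mean.error (assumption).
  tag.error <- rlnorm(1, meanlog = log(mean.error), sdlog = error.sd)
  # Per-base error probabilities scattered around the tag-level value,
  # capped at 1 so they remain valid probabilities.
  base.error <- pmin(rlnorm(n, meanlog = log(tag.error),
                            sdlog = withintagerror.sd), 1)
  # Decide which bases are misread, then substitute one of the 3 other bases.
  misread <- runif(n) < base.error
  observed <- true.tag
  for (i in which(misread)) {
    observed[i] <- sample(setdiff(bases, true.tag[i]), 1)
  }
  observed
}

true.tag <- sample(bases, 10, replace = TRUE)
paste(true.tag, collapse = "")
paste(observe.tag(true.tag), collapse = "")
```

With withintagerror.sd = 0.2 and error.sd = 1, the within-tag spread is one fifth of the between-tag spread, matching the ratio mentioned in the Details.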
----------Author(s)----------
Tim Beissbarth
----------See Also----------
sage.library
----------Examples----------
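library(sagenhaft)
# Simulate a library of about 10000 tags with mean.error = 0.01.
testlib1 <- sagelibrary.simulate(taglength=10, lambda=10000,
                                 mean.error=0.01)
# Simulate a second, larger library based on the tags in testlib1.
testlib2 <- sagelibrary.simulate(taglength=10, lambda=20000,
                                 mean.error=0.02, base.lib=testlib1)
# Reuse the seed stored in testlib1 for the random number generator.
testlib3 <- sagelibrary.simulate(taglength=10, lambda=10000,
                                 mean.error=0.01, libseed=testlib1$seed)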