cosmo(cosmo)
R package: cosmo
Constrained motif detection main function
----------Description----------
cosmo searches a set of unaligned DNA sequences for a shared motif that may, for example, represent a common transcription factor binding site. The algorithm is similar to MEME, but also allows the user to specify a set of constraints that the position weight matrix of the unknown motif must satisfy. Such constraints may include, for example, bounds on the information content across certain regions of the unknown motif, and can often be formulated on the basis of prior knowledge about the structure of the transcription factor in question.
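To make the idea of a bound on information content concrete, here is a minimal base-R sketch. It assumes a hypothetical 4 x W position weight matrix, a uniform background, and information content measured in bits (2 + sum of p * log2(p) per position); this page does not state which definition cosmo uses internally, so treat the formula as an illustration rather than the package's implementation.

```r
# Hypothetical position weight matrix: rows are A, C, G, T; columns are motif positions.
pwm <- matrix(c(0.97, 0.01, 0.01, 0.01,   # strongly conserved position
                0.25, 0.25, 0.25, 0.25,   # uninformative position
                0.50, 0.50, 0.00, 0.00),  # two equally likely bases
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Information content per position in bits, assuming a uniform background.
# Terms with p = 0 contribute 0 by convention.
infoContent <- function(pwm) {
  terms <- ifelse(pwm > 0, pwm * log2(pwm), 0)
  2 + colSums(terms)
}

ic <- infoContent(pwm)

# Check each position against an illustrative bound of 1 to 2 bits.
withinBound <- ic >= 1.0 & ic <= 2.0
print(round(ic, 3))
print(withinBound)
```

A bound constraint of this kind would accept the first and third positions and reject the second, which carries no information.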
----------Usage----------
cosmo(seqs = "browse", constraints = "None", minW = 6, maxW = 15,
      models = "ZOOPS", revComp = TRUE, minSites = NULL, maxSites = NULL,
      starts = 5, approx = "over", cutFac = 5, wCrit = "bic",
      wFold = 5, wTrunc = 100, modCrit = "lik", modFold = 5, modTrunc = 100,
      conCrit = "likCV", conFold = 5, conTrunc = 90, intCrit = "lik",
      intFold = 5, intTrunc = 100, maxIntensity = FALSE, lstarts = FALSE,
      backSeqs = NULL, backFold = 5, bfile = NULL, transMat = NULL,
      order = NULL, maxOrder = 6, silent = FALSE)
----------Arguments----------
Argument: seqs
This argument specifies the sequences to be analyzed. If seqs == "browse", a browser appears that allows the user to select a file that contains the sequences in FASTA format. If seqs is any other character string, it is taken to be the path to a FASTA file containing the sequences of interest. Lastly, seqs may be a list in which each element represents one sequence, with the sequence itself given as a single string such as "ACGTAGCTAG" ("seq" entry) together with a description ("desc" entry).
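As an illustration of the list form described above, the following sketch builds a small input by hand. The sequences and descriptions are made up, and the two sanity checks are only a suggestion; the call to cosmo() is left commented out because it requires the cosmo package.

```r
# Hypothetical input: each element has a "seq" entry and a "desc" entry.
seqList <- list(
  list(seq = "ACGTAGCTAG", desc = "sequence 1"),
  list(seq = "TTGACGTCAA", desc = "sequence 2")
)

# Optional sanity checks before running the motif search.
hasFields  <- vapply(seqList,
                     function(s) all(c("seq", "desc") %in% names(s)),
                     logical(1))
okAlphabet <- vapply(seqList,
                     function(s) grepl("^[ACGT]+$", s$seq),
                     logical(1))

# res <- cosmo(seqs = seqList)   # not run here: requires the cosmo package
```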
Argument: constraints
These are the constraints to be imposed on the unknown motif. If constraints == "None", cosmo() is run without constraints. If constraints == "GUI" and the cosmoGUI package has been installed, a GUI pops up that allows the user to interactively create a set of constraints, either from scratch or on the basis of several templates of interest. If constraints is any other character string, it is taken to be the path to a file that contains the constraint definitions in the standard text format (see http://cosmoweb.berkeley.edu/constraints.html). Lastly, constraints may be an object of class constraintSet, or a list of such objects, that defines the constraints of interest.
Argument: minW
numeric: the minimum motif width to consider
Argument: maxW
numeric: the maximum motif width to consider
Argument: models
character: a vector containing the different models to be considered for the distribution of motif occurrences ("OOPS", "ZOOPS", and "TCM"). The One-Occurrence-Per-Sequence (OOPS) model assumes that each sequence contains exactly one occurrence of the motif. The Zero-or-One-Occurrences-Per-Sequence (ZOOPS) model allows zero or one occurrence of the motif in a given sequence. The Two-Component-Mixture (TCM) model allows an arbitrary number of motif occurrences in a given sequence.
Argument: revComp
logical: whether motifs are allowed to occur in the reverse complement orientation.
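For readers unfamiliar with the term, the reverse complement of a DNA string can be computed in base R as sketched below. The helper name reverseComplement is hypothetical and is not part of cosmo; it only shows what orientation revComp = TRUE additionally considers.

```r
# Hypothetical helper: reverse complement of one or more DNA strings (A, C, G, T only).
reverseComplement <- function(x) {
  comp <- chartr("ACGT", "TGCA", x)           # complement each base
  vapply(strsplit(comp, ""),
         function(ch) paste(rev(ch), collapse = ""),  # reverse the order
         character(1))
}

rc  <- reverseComplement("ACGTAGCTAG")
pal <- reverseComplement("GAATTC")   # a site that equals its own reverse complement
print(rc)
print(pal)
```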
Argument: minSites
numeric: the minimum number of motif occurrences in the input sequences (default: 2)
Argument: maxSites
numeric: the maximum number of motif occurrences in the input sequences (default: min(5 * number of sequences, 50))
Argument: starts
numeric: the number of starting values to use for each optimization
Argument: approx
character: the approximation to use for the TCM likelihood; one of "over", "cut", or "exact"
Argument: cutFac
numeric: if the TCM model is approximated by the "over" or "cut" models, subsequences are of length cutFac * motif width
Argument: wCrit
Criterion for choosing the motif width. This can be one of "lik" for the likelihood, "aic" for Akaike's Information Criterion, "bic" for the Bayesian Information Criterion, "eval" for the E-value of the alignment of the predicted motif sites, or "likCV" for likelihood-based cross-validation.
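The difference between the "aic" and "bic" criteria can be sketched with the textbook definitions. Everything numeric below is hypothetical: the log-likelihoods are invented, the parameter count of 3 free parameters per motif position is an assumption, and n is an illustrative number of observations. How cosmo itself counts parameters and observations is not stated on this page.

```r
# Textbook definitions; smaller values are better.
aic <- function(logLik, k)    -2 * logLik + 2 * k
bic <- function(logLik, k, n) -2 * logLik + k * log(n)

widths <- 6:8
logLik <- c(-1250, -1235, -1231)  # hypothetical maximized log-likelihoods
k      <- 3 * widths              # assumed: 3 free probabilities per motif position
n      <- 500                     # assumed number of observations

aicVals <- aic(logLik, k)
bicVals <- bic(logLik, k, n)

bestAic <- widths[which.min(aicVals)]
bestBic <- widths[which.min(bicVals)]
print(c(aic = bestAic, bic = bestBic))
```

With these made-up numbers the BIC prefers the narrower width because its penalty per parameter, log(n), is larger than the AIC's penalty of 2.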
Argument: wFold
numeric: the cross-validation fold for selecting the motif width
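As a reminder of what a cross-validation fold means here, the sketch below assigns a hypothetical set of 23 sequences to 5 folds of nearly equal size. This is a generic V-fold split; the way cosmo partitions the sequences internally is not documented on this page.

```r
set.seed(1)          # for a reproducible illustration only
nSeqs <- 23          # hypothetical number of input sequences
wFold <- 5           # number of folds, as in the wFold argument

# Randomly assign each sequence index to one of wFold folds.
foldId <- sample(rep(seq_len(wFold), length.out = nSeqs))
folds  <- split(seq_len(nSeqs), foldId)

# Each fold serves once as the validation set; the rest form the training set.
trainIdx <- lapply(folds, function(v) setdiff(seq_len(nSeqs), v))
print(lengths(folds))
```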
Argument: wTrunc
numeric: truncate the loss function for selecting the motif width at this percentile (1-100)
Argument: modCrit
Criterion for choosing the model type. This can be one of "lik" for the likelihood, "aic" for Akaike's Information Criterion, "bic" for the Bayesian Information Criterion, "eval" for the E-value of the alignment of the predicted motif sites, or "likCV" for likelihood-based cross-validation.
Argument: modFold
numeric: the cross-validation fold for selecting the model type
Argument: modTrunc
numeric: truncate the loss function for selecting the model type at this percentile (1-100)
Argument: conCrit
Criterion for choosing the constraint set. This can be one of "lik" for the likelihood, "eval" for the E-value of the alignment of the predicted motif sites, "likCV" for likelihood-based cross-validation, or "pwmCV" for cross-validation based on the Euclidean norm between two position weight matrices.
Argument: conFold
numeric: the cross-validation fold for selecting the constraint set (likelihood cross-validation only).
Argument: conTrunc
numeric: truncate the loss function for selecting the constraint set at this percentile (1-100)
Argument: intCrit
Criterion for estimating the intensity parameter in the ZOOPS or TCM model. This can be one of "lik" for the likelihood, "aic" for Akaike's Information Criterion, "bic" for the Bayesian Information Criterion, or "eval" for the E-value of the alignment of the predicted motif sites.
Argument: intFold
numeric: the cross-validation fold for selecting the intensity parameter
Argument: intTrunc
numeric: truncate the loss function for selecting the intensity parameter at this percentile (1-100)
Argument: maxIntensity
logical: should the likelihood function be maximized with respect to the intensity parameter (in the ZOOPS or TCM model) instead of using the profiling approach?
Argument: lstarts
logical: should likelihood-based starting values be used rather than E-value-based starting values?
Argument: backSeqs
This argument specifies the sequences to be used to estimate the background Markov model. If backSeqs == NULL, the background model is estimated from the sequences supplied in the seqs argument. If backSeqs == "browse", a browser appears that allows the user to select a file that contains the sequences in FASTA format. If backSeqs is any other character string, it is taken to be the path to a FASTA file containing the background sequences. Lastly, backSeqs may be a list in which each element represents one sequence, with the sequence itself given as a single string such as "ACGTAGCTAG" ("seq" entry) together with a description ("desc" entry).
Argument: backFold
numeric: the cross-validation fold for selecting the order of the background Markov model.
Argument: bfile
character: the name of a MEME-style background file specifying the background Markov model. Such a file lists the frequencies of all possible tuples of length up to order + 1. See the help file for the function bfile2tmat() for an example.
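To show what "frequencies of all possible tuples" refers to, the sketch below counts tuples of a given length in a set of sequences using base R. The helper tupleFreqs is hypothetical, it looks at the given strand only, and it does not reproduce the exact layout of a MEME background file; consult the bfile2tmat() help file for the real format.

```r
# Hypothetical helper: relative frequencies of all 4^len DNA tuples of length len.
tupleFreqs <- function(seqs, len) {
  # Collect every overlapping tuple of the requested length.
  tuples <- unlist(lapply(seqs, function(s) {
    n <- nchar(s)
    if (n < len) return(character(0))
    starts <- seq_len(n - len + 1)
    substring(s, starts, starts + len - 1)
  }))
  # Enumerate all possible tuples in lexicographic order (AA, AC, AG, ...).
  grid <- expand.grid(rep(list(c("A", "C", "G", "T")), len),
                      stringsAsFactors = FALSE)
  allTuples <- do.call(paste0, unname(as.list(rev(grid))))
  counts <- table(factor(tuples, levels = allTuples))
  setNames(as.numeric(counts) / sum(counts), allTuples)
}

seqs <- c("ACGT", "AACC")        # made-up sequences
f1 <- tupleFreqs(seqs, 1)        # single nucleotides (order 0)
f2 <- tupleFreqs(seqs, 2)        # dinucleotides (order 1)
print(f1)
```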
Argument: transMat
The transition matrix to use for the background Markov model. This is a list of matrices, with the first matrix giving the transition probabilities for the 0th order Markov model, the second matrix giving the transition probabilities for a 1st order Markov model, and so on. The entry in cell (i,j) of a k-th order transition matrix gives the probability of observing the nucleotide in column j given that the previous k nucleotides are equal to those in row i. Type 'data(transMats)' to look at an example. The function bgModel can be used to obtain, from a set of sequences, a transition matrix suitable for this argument. The function bfile2tmat can be used to obtain a transition matrix from a MEME-style background file.
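The row and column convention described above can be illustrated for the 1st order case with a short base-R sketch. The function firstOrderTransMat and its pseudocount argument are hypothetical and are not the package's bgModel; in practice bgModel or bfile2tmat should be used to build the list expected by transMat.

```r
# Hypothetical helper: 1st order transition matrix estimated from sequences.
# Rows index the previous nucleotide, columns the nucleotide being observed.
firstOrderTransMat <- function(seqs, pseudo = 1) {
  nuc <- c("A", "C", "G", "T")
  counts <- matrix(pseudo, 4, 4, dimnames = list(nuc, nuc))
  for (s in seqs) {
    ch <- strsplit(s, "")[[1]]
    if (length(ch) < 2) next
    prev <- ch[-length(ch)]
    nxt  <- ch[-1]
    for (i in seq_along(prev)) {
      counts[prev[i], nxt[i]] <- counts[prev[i], nxt[i]] + 1
    }
  }
  counts / rowSums(counts)   # normalize each row to sum to 1
}

tm <- firstOrderTransMat("ACGTAGCTAG")               # with an assumed pseudocount of 1
tmRaw <- firstOrderTransMat("ACGTAGCTAG", pseudo = 0) # raw relative frequencies
print(round(tm, 3))
```

Each row sums to 1, so tm["A", "G"] reads as the estimated probability of observing G given that the previous nucleotide was A.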
Argument: order
numeric: the order of the Markov background model
Argument: maxOrder
numeric: the maximum order to consider for the Markov background model
Argument: silent
logical: should output be suppressed?
----------Value----------
An object of class cosmo containing all the results of the motif detection analysis.
----------Author(s)----------
Oliver Bembom <bembom@berkeley.edu>, Fabian Gallusser <fgallusser@berkeley.edu>
----------References----------
"Supervised Detection of Conserved Motifs in DNA Sequences with cosmo" (2007). Statistical Applications in Genetics and Molecular Biology:
----------See Also----------
bgModel, bfile2tmat
----------Examples----------
## initialize constraint set
## consisting of three intervals
## 1st and 3rd intervals are 3bp long
## middle interval is of variable length
conSet <- makeConSet(numInt=3, type=c("B","V","B"), length=c(3,NA,3))
## construct two bound constraints
boundCon1 <- makeBoundCon(lower=1.0, upper=2.0)
boundCon2 <- makeBoundCon(lower=0.0, upper=1.0)
## construct palindromic constraint
## require intervals 1 and 3 to be palindromes
## to within 0.05 tolerance
palCon1 <- makePalCon(int1=1, int2=3, errBnd=0.05)
## add constraints to initial constraint set
constraint <- list(boundCon1, boundCon2, palCon1)
int <- list(1, 2, NA)
conSet <- addCon(conSet=conSet, constraint=constraint, int=int)
## path to example sequence file in FASTA format
seqFile <- system.file("Exfiles", "seq.fasta", package="cosmo")
## search for motifs of width 8
## assume zero or one occurrence of the motif per sequence (ZOOPS)
res <- cosmo(seqs=seqFile, constraints=conSet, minW=8, maxW=8, models="ZOOPS")
plot(res)