R language seqinr package: oriloc() function help documentation

loveR · posted on 2012-9-30 01:21:13

oriloc(seqinr)
oriloc() belongs to the R package: seqinr

 Prediction of origin and terminus of replication in bacteria.

 Translator: LoveR, the translation robot of 生物统计家园网

Description

This program finds the putative origin and terminus of replication in prokaryotic genomes. The program discriminates between codon positions.

Usage

oriloc(seq.fasta = system.file("sequences/ct.fasta", package = "seqinr"),
       g2.coord = system.file("sequences/ct.predict", package = "seqinr"),
       glimmer.version = 3,
       oldoriloc = FALSE, gbk = NULL, clean.tmp.files = TRUE, rot = 0)

Arguments

Argument: seq.fasta
Character: the name of a file which contains the DNA sequence of a bacterial chromosome in fasta format. The default value, system.file("sequences/ct.fasta", package = "seqinr"), is to use the fasta file ct.fasta which is distributed in the sequences folder of the seqinR package. This is the file for the complete genome sequence of Chlamydia trachomatis that was used in Frank and Lobry (2000). You can replace this by something like seq.fasta = "myseq.fasta" to work with your own data if the file myseq.fasta is present in the current working directory (see getwd), or give the full path to the sequence file (see file.choose).

Argument: g2.coord
Character: the name of a file which contains the output of the glimmer program (*.predict in glimmer version 3).

Argument: glimmer.version
Numeric: the glimmer version used, either 2 or 3.

Argument: oldoriloc
Logical: set to TRUE to reproduce the (deprecated) outputs of the previous version (publication date: 2000) of the oriloc program.

Argument: gbk
Character: the URL of a file in GenBank format. When provided, oriloc uses a single GenBank file as input instead of seq.fasta and g2.coord. A local temporary copy of the GenBank file is made with download.file if gbk starts with http://, ftp:// or file://, and with file.copy otherwise. The local copy is then used as input for gb2fasta and gbk2g2 to produce a fasta file and a glimmer-like (version 2) file, respectively, to be used by oriloc instead of seq.fasta and g2.coord.

Argument: clean.tmp.files
Logical: if TRUE, temporary files generated when working with a GenBank file are removed.

Argument: rot
Integer, with a default value of zero, used to circularly permute the genome.
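
The gbk argument above describes a choice between download.file and file.copy depending on how the path starts. A minimal base R sketch of that choice could look as follows. The helper name fetch_gbk_copy and the toy input file are hypothetical, and this is not the code used inside oriloc.

```r
# Hypothetical helper: make a local temporary copy of a GenBank file.
# URLs (http://, ftp://, file://) are downloaded; anything else is copied.
fetch_gbk_copy <- function(gbk, dest = tempfile(fileext = ".gbk")) {
  is_url <- grepl("^(http|ftp|file)://", gbk)
  if (is_url) {
    status <- download.file(gbk, destfile = dest, quiet = TRUE)
    if (status != 0) stop("download failed for ", gbk)
  } else {
    ok <- file.copy(gbk, dest, overwrite = TRUE)
    if (!ok) stop("could not copy ", gbk)
  }
  dest
}

# Toy usage with a local file standing in for a real GenBank record.
src <- tempfile(fileext = ".gbk")
writeLines("LOCUS       TOY", src)
local_copy <- fetch_gbk_copy(src)
```

The returned path is what would then be handed to the conversion step; deleting it afterwards corresponds to clean.tmp.files = TRUE.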

Details

The method builds on the fact that there are compositional asymmetries between the leading and the lagging strand for replication. The program works only with third codon positions so as to increase the signal/noise ratio. To discriminate between codon positions, the program uses as input either an annotated GenBank file, or a fasta file together with a glimmer2.0 (or glimmer3.0) output file.
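
To make the restriction to third codon positions concrete, here is a small base R sketch that extracts every third base of a toy coding sequence and computes the T - A and C - G excesses. The sequence is made up, and a CDS on the reverse strand would first need to be reverse-complemented, which this sketch omits.

```r
# Toy CDS on the direct strand (made-up sequence, length a multiple of 3).
cds <- strsplit("atgaaccgtttaggctaa", "")[[1]]

# Keep only third codon positions: bases 3, 6, 9, ...
third <- cds[seq(3, length(cds), by = 3)]

# Count the four nucleotides, keeping zero counts.
counts <- table(factor(third, levels = c("a", "c", "g", "t")))

# Per-gene excesses used as one step of the walk.
ta_excess <- counts[["t"]] - counts[["a"]]
cg_excess <- counts[["c"]] - counts[["g"]]
```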

Value

A data.frame with seven columns: g2num for the CDS number in the g2.coord file; start.kb for the start position of the CDS expressed in Kb (this is the position of the first occurrence of a nucleotide in a CDS regardless of its orientation); end.kb for the last position of a CDS; CDS.excess for the DNA walk for gene orientation (+1 for a CDS in the direct strand, -1 for a CDS in the reverse strand) cumulated over genes; skew for the cumulated composite skew in third codon positions; x for the cumulated T - A skew in third codon positions; y for the cumulated C - G skew in third codon positions.
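
The CDS.excess column can be read as a one-dimensional walk over gene orientations. The sketch below rebuilds such a walk from a made-up strand vector; the variable names are illustrative only.

```r
# Made-up orientations for seven consecutive CDS.
strand <- c("+", "+", "-", "+", "-", "-", "-")

# +1 for the direct strand, -1 for the reverse strand, cumulated over genes.
cds_excess <- cumsum(ifelse(strand == "+", 1, -1))
```

A change in the overall direction of this walk suggests a switch in the preferred coding strand.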

Note

The method works only for genomes having a single origin of replication from which replication is bidirectional. To detect the composition changes, a DNA walk is performed. In a 2-dimensional DNA walk, a C in the sequence corresponds to a movement in the positive y-direction and a G to a movement in the negative y-direction. T and A are mapped by analogous steps along the x-axis. When there is a strand asymmetry, this forms a trajectory that turns at the origin and terminus of replication. Each step is the sum of nucleotides in a gene in third codon positions. Orthogonal regression is then used to fit a line through this trajectory. Each point in the trajectory has a corresponding point on the line, and the coordinates of each are calculated. The distances from each of these points to the origin of the plane are then calculated. These distances represent a form of cumulative skew. This allows us to make a plot with the gene position (gene number, start or end position) on the x-axis and the cumulative skew (distance) on the y-axis. Depending on where the sequence starts, such a plot will display one or two peaks. A positive peak indicates the origin, and a negative peak the terminus. In the case of only one peak, the sequence starts at the origin or terminus.
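
The walk and projection steps in this note can be sketched in base R. The per-gene excesses are made up, orthogonal regression is approximated here by the first principal axis from prcomp, and the distance is left unsigned, so this is a simplified illustration under those assumptions rather than the oriloc implementation.

```r
# Made-up per-gene excesses in third codon positions.
dx <- c(rep(2, 5), rep(-2, 5))   # T - A excess per gene (x-axis steps)
dy <- c(rep(1, 5), rep(-1, 5))   # C - G excess per gene (y-axis steps)

# Cumulate the steps to obtain the 2-dimensional walk.
walk <- cbind(x = cumsum(dx), y = cumsum(dy))

# Orthogonal regression line: first principal axis through the centroid.
pc <- prcomp(walk)
direction <- pc$rotation[, 1]

# Project each point of the walk onto the fitted line.
proj <- sweep(outer(pc$x[, 1], direction), 2, pc$center, "+")

# Distance from each projected point to the origin of the plane.
dist_to_origin <- sqrt(rowSums(proj^2))

# Gene index at which the trajectory turns.
turning_gene <- which.max(dist_to_origin)
```

Plotting dist_to_origin against gene position gives the single-peak shape described above for a sequence that starts at the origin or terminus.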

Author(s)

J.R. Lobry and A.C. Frank

References

are available there: http://pbil.univ-lyon1.fr/software/Oriloc/howto.html. 
http://pbil.univ-lyon1.fr/software/Oriloc/index.html. 
Frank, A.C., Lobry, J.R. (2000) Oriloc: prediction of replication boundaries in unannotated bacterial chromosomes. Bioinformatics, 16:560-561. http://bioinformatics.oupjournals.org/cgi/reprint/16/6/560 
Lobry, J.R. (1999) Genomic landscapes. Microbiology Today, 26:164-165. http://www.socgenmicrobiol.org.uk/QUA/049906.pdf 
Lobry, J.R. (1996) A simple vectorial representation of DNA sequences for the detection of replication origins in bacteria. Biochimie, 78:323-326. 
in prokaryotic genome, is described at http://www.cbcb.umd.edu/software/glimmer/. For a description of Glimmer 1.0 and 2.0 see: 
Improved microbial gene identification with GLIMMER, Nucleic Acids Research, 27:4636-4641. 
Microbial gene identification using interpolated Markov models, Nucleic Acids Research, 26:544-548. 

See Also

draw.oriloc, rearranged.oriloc

Examples

## Not run:
#
# A little bit too long for routine checks because oriloc() is already
# called in the draw.oriloc.Rd documentation file. Try example(draw.oriloc)
# instead, or copy/paste the following code:
#
library(seqinr)
out <- oriloc()
# Use full column names rather than relying on partial matching with $.
plot(out$start.kb, out$skew, type = "l", xlab = "Map position in Kb",
     ylab = "Cumulated composite skew",
     main = expression(italic(Chlamydia~~trachomatis)~~complete~~genome))
#
# Example with a single GenBank file:
#
out2 <- oriloc(gbk = system.file("sequences/ct.gbk", package = "seqinr"))
draw.oriloc(out2)
#
# (some warnings are generated because of join in features and a gene that
# wraps around the genome)
#

## End(Not run)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

