R PAnnBuilder package: help documentation for the crossBuilder_DB() function

loveR · posted on 2012-2-26 10:18:09

crossBuilder_DB(PAnnBuilder)
crossBuilder_DB() belongs to the R package: PAnnBuilder

                                    Build Data Packages for Protein ID Mapping

                                       Translator: LoveR, the biostatistic.net robot

Description

This function creates a data package in which the protein ID mapping is stored as R environment objects in the data directory.

Usage

crossBuilder_DB(src = c("sp", "ipi", "gi"), organism,
                blast, match,
                prefix, pkgPath, version, author)
fasta2list(type, srcUrl, organism = "")
idBlast(query, subject, blast, match)

Arguments

Argument: src
a character vector that can contain "sp", "trembl", "ipi" or "gi", indicating which protein sequence databases will be used.

Argument: organism
a character string giving the name of the organism of interest (e.g. "Homo sapiens").

Argument: blast
a named character vector defining the parameters of blastall.

Argument: match
a named character vector defining the parameters used to match two sequences.

Argument: prefix
the prefix of the name of the data package to be built (e.g. "hsaSP"). The name of the built package is prefix + ".db".

Argument: pkgPath
a character string giving the full path of an existing directory where the built package will be stored.

Argument: version
a character string giving the version number.

Argument: author
a list with the named elements "authors", containing a character vector of author names, and "maintainer", containing the complete character string for the maintainer field, for example "Jane Doe <jdoe@doe.com>".

Argument: type
a character string giving the type of sequence data file; can be "sp", "trembl", "ipi" or "gi".

Argument: srcUrl
a character string giving the URL where the FASTA-format sequence data file is located.

Argument: query
a named vector of query sequences.

Argument: subject
a named vector of subject sequences.
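
To make "a named vector of sequences" concrete, here is a minimal base-R sketch that turns FASTA-formatted lines into such a vector. The helper fasta_to_named_vector() and the two toy sequences are hypothetical illustrations and are not part of PAnnBuilder. The package's own fasta2list() may behave differently. The sketch assumes the first non-empty line is a header.

```r
# Hypothetical helper (not part of PAnnBuilder): parse FASTA text lines
# into a named character vector with one element per sequence.
fasta_to_named_vector <- function(lines) {
  lines <- lines[nzchar(lines)]          # drop empty lines
  is_header <- startsWith(lines, ">")    # header lines start with ">"
  group <- cumsum(is_header)             # each line belongs to the latest header
  # Use the first whitespace-delimited token after ">" as the sequence ID.
  ids <- sub("^>(\\S+).*$", "\\1", lines[is_header])
  # Concatenate the sequence lines that follow each header.
  seqs <- vapply(
    split(lines[!is_header], group[!is_header]),
    paste, character(1), collapse = ""
  )
  names(seqs) <- ids[as.integer(names(seqs))]
  seqs
}

# Toy input, invented for illustration only.
fasta_lines <- c(
  ">P1 example protein one",
  "MKTAYIAKQR",
  "QISFVKSHFS",
  ">P2 example protein two",
  "MADEEKLPPG"
)
query <- fasta_to_named_vector(fasta_lines)
print(query)
```

A vector built this way has the sequence IDs as names and the sequences as values, which is the shape the query and subject arguments are described as expecting.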

Details

Builds annotation data packages for protein ID mapping. formatdb and blastall must be installed.

Parameter "blast" is a named character vector defining the parameters of blastall. Possible names and their meanings are:
p: Program name [String].
e: Expectation value (E) [Real].
M: Matrix [String].
W: Word size, default if zero (blastn 11, megablast 28, all others 3) [Integer]; default = 0.
G: Cost to open a gap (-1 invokes default behavior) [Integer].
E: Cost to extend a gap (-1 invokes default behavior) [Integer].
U: Use lower case filtering of FASTA sequence [T/F]; optional.
F: Filter query sequence (DUST with blastn, SEG with others) [String].
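
Because the names in "blast" correspond to blastall's single-letter options, the vector can be read as a set of command-line flags. The sketch below only assembles a command string and does not run anything. The helper blast_flags() and the file names are hypothetical, and how PAnnBuilder actually invokes blastall internally is not confirmed here.

```r
# Named character vector in the form described above.
blast <- c(p = "blastp", e = "10.0", M = "BLOSUM62", W = "0",
           G = "-1", E = "-1", U = "T", F = "F")

# Hypothetical helper (not a PAnnBuilder function): prefix each name
# with "-" and pair it with its value.
blast_flags <- function(params) {
  paste(paste0("-", names(params)), params, collapse = " ")
}

# Hypothetical file names; -d, -i and -o are blastall's database,
# query-file and output-file options.
cmd <- paste("blastall", blast_flags(blast),
             "-d subject.fasta -i query.fasta -o hits.txt")
cat(cmd, "\n")
```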

Parameter "match" is a named character vector defining the parameters used to match two sequences. Possible names and their meanings are:
e: Expectation value for matching two sequences [Real].
c: Coverage of the longest High-scoring Segment Pair (HSP) relative to the whole protein sequence (range: 0~1).
i: Identity of the longest High-scoring Segment Pair (HSP) (range: 0~1).
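
One way to read these three thresholds is as a filter on the longest HSP between two proteins. The following is a minimal sketch under stated assumptions: coverage is taken as HSP length divided by protein length, identity as identical positions divided by HSP length, and the comparisons are inclusive. The helper passes_match() is hypothetical and is not the package's idBlast() implementation.

```r
match <- c(e = 0.00001, c = 0.95, i = 0.95)

# Hypothetical helper (not a PAnnBuilder function).
#   evalue         E-value of the longest HSP
#   hsp_length     aligned length of the longest HSP
#   n_identical    number of identical positions in that HSP
#   protein_length length of the whole protein sequence
passes_match <- function(evalue, hsp_length, n_identical, protein_length, match) {
  coverage <- hsp_length / protein_length   # assumed definition of "c"
  identity <- n_identical / hsp_length      # assumed definition of "i"
  unname(evalue <= match["e"] &
           coverage >= match["c"] &
           identity >= match["i"])
}

# An HSP covering 290 of 300 residues with 285 identical positions.
passes_match(1e-50, 290, 285, 300, match)
# A perfect-identity HSP that covers only 200 of 300 residues.
passes_match(1e-50, 200, 200, 300, match)
```

With the thresholds above, the first pair passes all three criteria, while the second fails on coverage even though its identity is perfect.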

Data files for the database are downloaded automatically to the tmp directory, so enough free space is needed for them. After downloading, the files are parsed with Perl, so Perl must be installed. Parsing the database and building the R package may take a long time. Alternatively, we have produced a range of R packages with PAnnBuilder, and you can download an appropriate package from http://www.biosino.org/PAnnBuilder.
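
Since the build depends on external programs, it can help to check that they are on the PATH before starting a long run. This is a small sketch using base R's Sys.which(). The executable names are taken from the text above, and the check is a suggestion rather than part of PAnnBuilder.

```r
# External programs named in the Details section.
required <- c("formatdb", "blastall", "perl")

# Sys.which() returns the full path of each program, or "" if it is not found.
found <- Sys.which(required)
missing_tools <- names(found)[!nzchar(found)]

if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("All required external programs were found.")
}
```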

Author(s)

Hong Li

Examples

# Set path, version and author for the package.
pkgPath <- tempdir()
version <- "1.0.0"
author <- list()
author[["authors"]] <- "Hong Li"
author[["maintainer"]] <- "Hong Li <sysptm@gmail.com>"

# Set parameters for sequence similarity.
blast <- c("blastp", "10.0", "BLOSUM62", "0", "-1", "-1", "T", "F")
names(blast) <- c("p","e","M","W","G","E","U","F")
match <- c(0.00001, 0.95, 0.95)
names(match) <- c("e","c","i")

## Parsing the database and building the R package may take a long time.
# Build the annotation data package (prefix "org.Hs.cross") for ID mapping
# across three major protein sequence databases.
if (interactive()) {
  crossBuilder_DB(src = c("sp", "ipi", "gi"), organism = "Homo sapiens",
                  blast, match,
                  prefix = "org.Hs.cross", pkgPath, version, author)
}

Please credit the source when reposting: from biostatistic.net (http://www.biostatistic.net).

