R BioSeqClass package: featurePSSM() function help documentation

loveR · posted on 2012-2-25 13:42:28

featurePSSM(BioSeqClass)
featurePSSM() belongs to the R package: BioSeqClass

                                    Feature Coding

                                       Translator: LoveR, the 生物统计家园网 (biostatistic.net) robot

----------Description----------

A set of functions for extracting features from biological sequences and coding those features as numeric vectors.

----------Usage----------

  featurePSSM(seq, start.pos, stop.pos, psiblast.path, database.path)

----------Arguments----------

Argument: seq
a character vector of protein, DNA, or RNA sequences.

Argument: start.pos
an integer vector giving the start position of the fragment window for each sequence.

Argument: stop.pos
an integer vector giving the stop position of the fragment window for each sequence.

Argument: psiblast.path
a string giving the path to the blastpgp program. blastpgp is used to run PSI-BLAST and obtain the Position-Specific Scoring Matrix.

Argument: database.path
a string giving the path to a formatted reference database. The database can be formatted with the "formatdb" program.
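
To make the roles of start.pos and stop.pos concrete, the sketch below cuts one fragment window out of each sequence with base R's substr(). The toy sequences and the helper name extractWindow are made up for illustration; this is not BioSeqClass code, only a picture of what a per-sequence window means.

```r
## Hypothetical helper (not part of BioSeqClass): cut one window per sequence.
extractWindow <- function(seq, start.pos, stop.pos) {
  stopifnot(length(seq) == length(start.pos),
            length(seq) == length(stop.pos),
            all(start.pos >= 1), all(stop.pos >= start.pos),
            all(stop.pos <= nchar(seq)))
  ## substr() is vectorised over the sequences and both position vectors.
  substr(seq, start.pos, stop.pos)
}

## Made-up toy protein sequences.
toySeq <- c(a = "MKVLAAGIVKQWERTY", b = "GHSTKLMNPQRAVDEC")
frag <- extractWindow(toySeq, start.pos = rep(1, 2), stop.pos = rep(10, 2))
print(frag)
print(nchar(frag))   # window length = stop.pos - start.pos + 1
```

The position vectors have one entry per sequence, which is why the package example passes rep(1,2) and rep(10,2) for two sequences.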

----------Details----------

featurePSSM returns a matrix with 20*N+N columns, where N is presumably the length of the fragment window (stop.pos - start.pos + 1). Each row represents the features of one sequence, coded as a (20*N+N)-dimensional numeric vector generated from PSI-BLAST output. It contains two kinds of features: the normalized position-specific scores of the PSSM (Position-Specific Scoring Matrix), and the Shannon entropy of the WOP (weighted observed percentages) at each position. The PSI-BLAST program and a formatted NCBI non-redundant protein database are required.
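
The 20*N+N layout can be pictured with a small sketch. Everything here is an assumption made for illustration: the two matrices are random stand-ins for PSI-BLAST output, the logistic normalisation is one common choice and not necessarily what featurePSSM uses, and the column ordering is arbitrary. Only the arithmetic (20 scores plus one entropy per position) comes from the description above.

```r
## Toy stand-ins for PSI-BLAST output over a window of N positions.
## Both matrices are N x 20 (positions x amino acids); values are made up.
set.seed(1)
N <- 10
aa <- strsplit("ARNDCQEGHILKMFPSTWYV", "")[[1]]
pssm <- matrix(sample(-5:8, N * 20, replace = TRUE), nrow = N,
               dimnames = list(NULL, aa))
wop <- matrix(runif(N * 20), nrow = N, dimnames = list(NULL, aa))
wop <- 100 * wop / rowSums(wop)          # each row sums to 100 percent

## Assumed normalisation: logistic squashing of raw scores into (0, 1).
normScore <- 1 / (1 + exp(-pssm))

## Shannon entropy (in bits) of the observed percentages at one position.
shannonEntropy <- function(percent) {
  if (sum(percent) == 0) return(0)       # no observations at this position
  p <- percent / sum(percent)
  p <- p[p > 0]                          # treat 0 * log(0) as 0
  -sum(p * log2(p))
}
entropy <- apply(wop, 1, shannonEntropy)

## One sequence -> one row: 20*N scores followed by N entropies.
featureRow <- c(as.vector(t(normScore)), entropy)
print(length(featureRow))                # 20*N + N
```

With N = 10, as in the package example (positions 1 to 10), this gives 210 values per sequence.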

----------Author(s)----------

Hong Li

----------Examples----------

if(interactive()){
  library(BioSeqClass)
  ## Locate the example FASTA file shipped with the package
  ## (system.file() is used here in place of the older .path.package()).
  file = system.file("example", "acetylation_K.fasta", package="BioSeqClass")
  tmp = readFASTA(file)
  ## Extract the sequences and name them by their FASTA description lines.
  proteinSeq = sapply(tmp,function(x){x[["seq"]]})
  names(proteinSeq) = sapply(tmp,function(x){x[["desc"]]})

  ## Requires the "blastpgp" program and a formatted database. The database can be formatted with the "formatdb" program.
  PSSM1 = featurePSSM(proteinSeq[1:2], start.pos=rep(1,2), stop.pos=rep(10,2), psiblast.path="blastpgp", database.path="./result1.fasta")
}
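
The example needs a database that has already been formatted. A minimal sketch of that preparation step, driven from R, might look like the following. It assumes the legacy NCBI BLAST tools are installed, uses formatdb's -i (input file) and -p T (protein) options, and reuses the placeholder path from the example; the helper name formatdbArgs is hypothetical. Nothing is executed unless both the program and the file are found.

```r
## Hypothetical helper: build the formatdb arguments for a protein FASTA file.
formatdbArgs <- function(fasta) {
  c("-i", fasta,   # input FASTA file
    "-p", "T")     # T = protein database
}

dbFasta <- "./result1.fasta"   # same placeholder path as in the example
fdArgs <- formatdbArgs(dbFasta)

## Only run the external tool when it is installed and the file exists.
if (nzchar(Sys.which("formatdb")) && file.exists(dbFasta)) {
  system2("formatdb", fdArgs)
} else {
  message("formatdb or ", dbFasta, " not found; would have run: formatdb ",
          paste(fdArgs, collapse = " "))
}

## psiblast.path="blastpgp" presumably relies on blastpgp being on the PATH;
## this reports whether it can be found.
print(nzchar(Sys.which("blastpgp")))
```

If blastpgp is not on the PATH, passing its full path as psiblast.path is the alternative the argument description suggests.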

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

