FindChimeras(DECIPHER)
FindChimeras() is part of the R package DECIPHER
Find Chimeras In A Sequence Database
Description
Finds chimeras present in a database of sequences. Makes use of a reference database of (presumed to be) good quality sequences.
Usage
FindChimeras(dbFile,
             tblName = "DNA",
             dbFileReference,
             batchSize = 100,
             minNumFragments = 20000,
             tb.width = 5,
             multiplier = 20,
             minLength = 70,
             minCoverage = 0.6,
             overlap = 200,
             minSuspectFragments = 6,
             showPercentCoverage = FALSE,
             add2tbl = FALSE,
             verbose = TRUE)
Arguments
Argument: dbFile
A SQLite connection object or a character string specifying the path to the database file to be checked for chimeric sequences.
Argument: tblName
Character string specifying the table in which to check for chimeras.
Argument: dbFileReference
A SQLite connection object or a character string specifying the path to the reference database file of (presumed to be) good quality sequences. A 16S reference database is available from DECIPHER.cee.wisc.edu.
Argument: batchSize
Number of sequences to tile with fragments at a time.
Argument: minNumFragments
Number of suspect fragments to accumulate before searching through other groups.
Argument: tb.width
A single integer [1..14] giving the number of nucleotides at the start of each fragment that are part of the trusted band.
Argument: multiplier
A single integer specifying how many times more fragments must be found out-of-group than in-group for a sequence to be considered a chimera.
Argument: minLength
Minimum length of a chimeric region in order to be considered as a chimera.
Argument: minCoverage
Minimum fraction of coverage necessary in a chimeric region.
Argument: overlap
Number of nucleotides at the end of the sequence that the chimeric region must overlap in order to be considered a chimera.
Argument: minSuspectFragments
Minimum number of suspect fragments belonging to another group required to consider a sequence a chimera.
Argument: showPercentCoverage
Logical indicating whether to list the percent coverage of suspect fragments in each chimeric region in the output.
Argument: add2tbl
Logical or a character string specifying the table name in which to add the result.
Argument: verbose
Logical indicating whether to display progress.
Details
The algorithm works by finding suspect fragments that are uncommon in the group where the sequence belongs, but very common in another group where the sequence does not belong. Each sequence in the dbFile is tiled into short sequence segments called fragments. If the fragments are infrequent in their respective group in the dbFileReference then they are considered suspect. If enough suspect fragments from a sequence meet the specified constraints then the sequence is flagged as a chimera.
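The idea in the paragraph above can be pictured with a toy sketch in base R. This is an illustration only, not the DECIPHER implementation: the sequences are made up, the fragment width of 5 is arbitrary, and `tile_fragments`, `count_hits` and `min_suspect` are hypothetical names standing in for the tiling step, the group lookup and the minSuspectFragments threshold.

```r
# Toy illustration only: NOT how DECIPHER implements FindChimeras.

# Tile a sequence into overlapping fragments of a fixed width.
tile_fragments <- function(x, width = 5L) {
  starts <- seq_len(nchar(x) - width + 1L)
  substring(x, starts, starts + width - 1L)
}

# For each fragment, count the reference sequences of a group that contain it.
count_hits <- function(fragments, ref_seqs) {
  vapply(fragments,
         function(f) sum(grepl(f, ref_seqs, fixed = TRUE)),
         integer(1), USE.NAMES = FALSE)
}

# Hypothetical reference groups (made-up sequences).
reference <- list(
  groupA = c("AAAAACCCCCAAAAACCCCC", "AAAAACCCCCAAAAACCCCA"),
  groupB = c("GGGGGTTTTTGGGGGTTTTT", "GGGGGTTTTTGGGGGTTTTG")
)

# A query assigned to groupA whose second half resembles groupB.
query <- "AAAAACCCCCGGGGGTTTTT"

frags     <- tile_fragments(query, width = 5L)
in_group  <- count_hits(frags, reference$groupA)
out_group <- count_hits(frags, reference$groupB)

# Suspect fragments: absent from the query's own group, present in another.
suspect <- in_group == 0L & out_group > 0L

min_suspect <- 3L  # toy stand-in for minSuspectFragments
is_chimera  <- sum(suspect) >= min_suspect
```

In this toy the query is flagged because enough of its fragments occur only in the other group. The real function additionally applies the multiplier, minLength, minCoverage and overlap constraints described under Arguments.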
The default parameters are optimized for full-length 16S sequences (> 1,000 nucleotides). Shorter 16S sequences call for settings that differ from the defaults: minLength = 40 and minSuspectFragments = 2.
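A minimal sketch of passing the short-sequence settings, assuming DECIPHER is installed. It reuses the example database shipped with the package as both input and reference, as the Examples section does; `short_read_args` is just an illustrative variable name.

```r
# Settings suggested above for shorter 16S sequences.
short_read_args <- list(minLength = 40, minSuspectFragments = 2)

# Only run the search when DECIPHER is available.
if (requireNamespace("DECIPHER", quietly = TRUE)) {
  db <- system.file("extdata", "Bacteria_175seqs.sqlite", package = "DECIPHER")
  # The example database doubles as the reference here for illustration;
  # a real analysis would point dbFileReference at a trusted reference database.
  chimeras <- do.call(DECIPHER::FindChimeras,
                      c(list(db, dbFileReference = db), short_read_args))
}
```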
Groups are determined by the identifier present in each database. For this reason, every group in the dbFile should also exist among the groups of the dbFileReference. The reference database is assumed to contain many sequences, all of good quality.
If no reference database is available, one can be created by using the input database as its own reference. Removing the detected chimeras from this reference database and then iteratively repeating the process can result in a clean reference database.
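The iterative clean-up can be sketched as a simple loop. Everything below is hypothetical scaffolding: `clean_reference` is not a DECIPHER function, and `detect` stands for whatever step runs the chimera search on the current reference set and returns the flagged ids (in practice, a FindChimeras run followed by removing the flagged rows from the database).

```r
# Repeat detection and removal until a pass flags nothing (or max_iter is hit).
# `detect` is a user-supplied function: ids in, flagged ids out.
clean_reference <- function(ids, detect, max_iter = 10L) {
  for (i in seq_len(max_iter)) {
    flagged <- intersect(detect(ids), ids)
    if (length(flagged) == 0L) break  # nothing flagged: reference is stable
    ids <- setdiff(ids, flagged)      # drop flagged sequences, then repeat
  }
  ids
}

# Toy detector standing in for a real chimera search:
# it flags at most one made-up "chim" id per pass.
toy_detect <- function(ids) head(ids[grepl("^chim", ids)], 1L)

cleaned <- clean_reference(c("seq1", "chim1", "seq2", "chim2"), toy_detect)
```

The cap on iterations is a design choice for the sketch, so the loop terminates even if the detector keeps flagging sequences.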
For non-16S sequences it may be necessary to optimize the parameters for the particular sequences. The simplest way to perform an optimization is to experiment with different input parameters on artificial chimeras, such as those created using CreateChimeras. Adjusting the input parameters until the maximum number of artificial chimeras is identified is the easiest way to determine new defaults.
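One way to organize such an experiment is a small grid search. The sketch below is an assumption-laden outline: the candidate values are illustrative, and `count_detected` is a hypothetical scorer whose made-up formula only exists so the code runs on its own. In a real optimization it would run FindChimeras on a database of artificial chimeras with the given settings and return how many were flagged.

```r
# Candidate settings to try (illustrative values, not recommendations).
grid <- expand.grid(minLength = c(40, 70, 100),
                    minSuspectFragments = c(2, 4, 6))

# Hypothetical scorer with a made-up formula. Replace the body with a
# FindChimeras run on artificial chimeras that returns the number detected.
count_detected <- function(minLength, minSuspectFragments) {
  100 - abs(minLength - 70) / 2 - 5 * abs(minSuspectFragments - 4)
}

# Score every combination and keep the best-scoring row.
grid$detected <- mapply(count_detected, grid$minLength, grid$minSuspectFragments)
best <- grid[which.max(grid$detected), ]
```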
Value
A data.frame containing only the sequences that meet the specifications for being chimeric. The chimera column contains information on the chimeric region and to which group it belongs. The row.names of the data.frame correspond to those of the sequences in dbFile.
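Because the row names of the result match those of the sequences in dbFile, they can be used to separate flagged from unflagged sequences. The sketch below uses a mock data.frame with the documented shape. The row names and the placeholder text in the chimera column are invented, and the real column content is not reproduced here.

```r
# Mock result: one row per flagged sequence, row names as in dbFile.
chimeras <- data.frame(
  chimera = c("<region and group info>", "<region and group info>"),
  row.names = c("12", "57")
)

# Hypothetical row names of all sequences in the input database.
all_rows <- as.character(1:100)

# Sequences that were not flagged as chimeric.
keep <- setdiff(all_rows, rownames(chimeras))
```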
Author(s)
Erik Wright <DECIPHER@cae.wisc.edu>
See Also
CreateChimeras, Add2DB
Examples
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
# It is necessary to set dbFileReference to the file path of the
# 16S reference database available from DECIPHER.cee.wisc.edu
chimeras <- FindChimeras(db, dbFileReference=db)