R Language: ChemmineR Package, cmp.duplicated() Function Help Documentation

loveR · posted on 2012-2-25 14:42:10

cmp.duplicated(ChemmineR)
cmp.duplicated() belongs to the R package: ChemmineR

Quickly detect compound duplication in a descriptor database

Translator: LoveR, the robot of Biostatistics Home (biostatistic.net)

----------Description----------

'cmp.duplicated' detects duplicated compounds in a descriptor database generated by 'cmp.parse'. Two compounds are said to duplicate each other when their descriptors are identical.

----------Usage----------

cmp.duplicated(db, sort = FALSE, type=1)

----------Arguments----------

Argument: db
The descriptor database, in the format returned by 'cmp.parse'.

Argument: sort
Whether to sort each compound's descriptors before they are compared. See Details.

Argument: type
Returns the results as a logical vector (type=1) or as a data frame (type=2).
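This help page does not say which columns the type=2 data frame contains. As a purely hypothetical illustration of why a tabular result can be handier than a logical vector, the base R sketch below builds one row per compound with a group id shared by duplicates. The column names `compound`, `group` and `duplicated`, and the example strings, are assumptions and not ChemmineR's actual output format.

```r
# Hypothetical identification strings, one per compound
keys <- c("3 7 12", "2 7 9", "3 7 12", "5 8")

# Hypothetical tabular view (not ChemmineR's type=2 layout):
# duplicates share the same group id
tab <- data.frame(
  compound   = seq_along(keys),
  group      = match(keys, unique(keys)),
  duplicated = duplicated(keys)
)
tab
```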

----------Details----------

'cmp.duplicated' takes the descriptors in the descriptor database, concatenates all descriptors of the same compound into a string, and uses this string as the compound's identification. If two compounds share the same identification string, they are said to duplicate each other.
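The identification-string idea can be sketched in a few lines of base R. This is a minimal illustration, not ChemmineR's implementation: it assumes each compound's descriptors are a plain numeric vector in a list (the hypothetical `descs` below), joins them with `paste()`, and flags repeats with `duplicated()`.

```r
# Hypothetical descriptor list: one numeric vector per compound
descs <- list(
  c(101, 205, 333),
  c(101, 207, 340),
  c(101, 205, 333),  # same descriptors as compound 1
  c(150, 260)
)

# Concatenate each compound's descriptors into one identification string
keys <- vapply(descs, paste, character(1), collapse = " ")

# A compound duplicates an earlier one when its string was already seen
dup <- duplicated(keys)
keys
dup
```

The separator matters in this sketch: without it, the descriptor vectors c(1, 23) and c(12, 3) would both collapse to "123" and be reported as duplicates by mistake.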

'cmp.duplicated' assumes that the database passed in as an argument follows the format generated by 'cmp.parse'. That is, 'db' is a list, 'db$descdb' is a list, and each entry of 'db$descdb' is a numeric array holding the descriptors of one compound.
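A minimal mock of the layout just described can help when preparing a database with other tools. It is a sketch under assumptions: `db`, the helper `is_descdb()` and the descriptor values are hypothetical, and a real 'cmp.parse' result may carry fields besides `descdb`.

```r
# Hypothetical mock of the assumed layout:
# db is a list, db$descdb is a list, and each entry is a numeric vector
db <- list(descdb = list(
  c(3, 7, 12),   # compound 1
  c(2, 7, 9),    # compound 2
  c(3, 7, 12)    # compound 3, same descriptors as compound 1
))

# Hypothetical helper: TRUE when 'db' matches the layout above
is_descdb <- function(db) {
  is.list(db) && is.list(db$descdb) &&
    all(vapply(db$descdb, is.numeric, logical(1)))
}

is_descdb(db)       # TRUE
length(db$descdb)   # number of compounds: 3
```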

By default, 'cmp.duplicated' assumes that the descriptors of each compound are already sorted, that is, each entry in 'db$descdb' is a sorted array. This holds for databases generated by 'cmp.parse'. If you generate the database with some other tool, you may want to enable sorting.
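The base R sketch below shows why sorting matters for a string-based comparison: the same descriptors listed in a different order produce different strings unless they are sorted first. The helper `key()` and its `sort_first` flag are hypothetical stand-ins for the `sort` argument, not ChemmineR code.

```r
# Two hypothetical compounds: same descriptors, different order
a <- c(12, 3, 7)
b <- c(3, 7, 12)

# Hypothetical helper: build an identification string,
# optionally sorting the descriptors first
key <- function(x, sort_first = FALSE) {
  if (sort_first) x <- sort(x)
  paste(x, collapse = " ")
}

key(a) == key(b)                                        # FALSE: order differs
key(a, sort_first = TRUE) == key(b, sort_first = TRUE)  # TRUE: order normalized
```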

----------Value----------

Returns a logical array telling, for each compound in the database, whether it duplicates a compound that appears before it. For example, if the i-th element of the array is TRUE, the i-th compound in the database duplicates a compound listed earlier in the database.
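The semantics described here, where only later occurrences are flagged and the first occurrence stays FALSE, match base R's `duplicated()`. The sketch below uses hypothetical single-letter strings in place of real identification strings, and uses `match()` to recover which earlier compound each flagged entry repeats.

```r
# Hypothetical identification strings for six compounds
keys <- c("A", "B", "A", "C", "B", "A")

# TRUE marks a compound that repeats an earlier one
dup <- duplicated(keys)
dup  # FALSE FALSE TRUE FALSE TRUE TRUE

# Index of the first compound carrying each string
first_seen <- match(keys, keys)
data.frame(index = which(dup), duplicates = first_seen[dup])
```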

The returned array can be used to remove duplicates: simply use its negation to index the descriptor database.
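Removing duplicates by indexing can be sketched on a mock database in the layout described under Details. The mock `db` and its descriptor values are hypothetical, and the identification strings are built with `paste()` for illustration rather than by ChemmineR itself.

```r
# Hypothetical mock database in the layout described under Details
db <- list(descdb = list(c(3, 7, 12), c(2, 7, 9), c(3, 7, 12), c(2, 7, 9)))

# Identification strings and duplicate flags
keys <- vapply(db$descdb, paste, character(1), collapse = " ")
dup <- duplicated(keys)

# Keep only the first occurrence of each compound
db$descdb <- db$descdb[!dup]
length(db$descdb)  # 2
```

With the real package, the `db[!dup]` line in the Examples section plays the same role.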

If you want to find out which compound a flagged entry duplicates, you can search the database (see 'cmp.search') with the cutoff set to 1.
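Why a cutoff of 1 finds exact duplicates can be illustrated with a similarity score that reaches 1 only for identical descriptor sets. The sketch assumes a set-based Tanimoto coefficient (intersection size over union size) computed in base R. It is not 'cmp.search', whose actual similarity measure for atom pair descriptors may also account for repeated descriptors.

```r
# Set-based Tanimoto coefficient: size of intersection / size of union
tanimoto <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical descriptor sets; compound 3 repeats compound 1
descs <- list(c(101, 205, 333), c(101, 207, 340), c(101, 205, 333))
query <- descs[[1]]

# Score every compound against the query
scores <- vapply(descs, tanimoto, numeric(1), b = query)
which(scores >= 1)  # compounds identical to the query: 1 3
```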

----------Author(s)----------

Y. Eddie Cao

----------See Also----------

cmp.parse, cmp.search

----------Examples----------

## Load the ChemmineR package
library(ChemmineR)

## Load sample SD file
# data(sdfsample); sdfset <- sdfsample

## Generate atom pair descriptor database for searching
# apset <- sdf2ap(sdfset)

## Load the same atom pair sample data set provided by the library
data(apset)
db <- apset

## Manually create a duplication (here compounds 1 and 10)
db[10] <- db[1]

## Find duplications
dup <- cmp.duplicated(db)
dup
cid(db[dup])

## Remove all duplications
db <- db[!dup]

When reposting, please credit the source: Biostatistics Home (http://www.biostatistic.net).

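Notes:
Note 1: For readers' convenience, this document was translated by LoveR, the robot of Biostatistics Home, and is intended only as a personal reference for learning R. Biostatistics Home reserves the copyright.
Note 2: Because the translation was produced automatically by a robot, some inaccuracies are unavoidable. When using it, compare the Chinese and English content carefully; doing so can help with learning R.
Note 3: If you find an inaccuracy, please reply at the end of this post and we will revise it over time.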
