
R language, TRAMPR package: help documentation for the group.knowns() function

Posted 2012-10-1 11:43:18

Knowns Clustering

Translated by the 生物统计家园网 bot LoveR

----------Description----------


Group a TRAMPknowns object so that knowns with similar TRFLP patterns and knowns that share the same species name “group” together. In general, this function will be called automatically whenever appropriate (e.g. when loading a data set or adding new knowns).  Please see Details to understand why this function is necessary, and how it works.

The main reason for manually calling group.knowns is to change the default values of the arguments; if you call group.knowns on a TRAMPknowns object, then any subsequent automatic call to group.knowns will use any arguments you passed in the manual group.knowns call (e.g. after doing group.knowns(x, cut.height=20), all future groupings will use cut.height=20).

----------Usage----------

group.knowns(x, ...)
## S3 method for class 'TRAMPknowns'
group.knowns(x, dist.method, hclust.method, cut.height, ...)
## S3 method for class 'TRAMP'
group.knowns(x, ...)


----------Arguments----------

x: A TRAMPknowns or TRAMP object, containing identified TRFLP patterns.

dist.method: Distance method used in calculating similarity between different knowns (see dist).  Valid options include "maximum", "euclidian" and "manhattan".

hclust.method: Clustering method used in generating clusters from the similarity matrix (see hclust).

cut.height: Passed to cutree; controls how similar members of each group should be (the larger cut.height, the more inclusive knowns groups will be).

...: Arguments passed to further methods.



----------Details----------

group.knowns groups together knowns in a TRAMPknowns object based on two criteria: (1) TRFLP profiles that are very similar across shared enzyme/primer combinations (based on clustering) and (2) TRFLP profiles that belong to the same species (i.e. share a common species column in the info data.frame of x; see TRAMPknowns for more information).  This addresses three issues in TRFLP analysis:

1. The TRFLP profile of a single species can have variation in peak sizes due to DNA sequence variation.  By including multiple collections of each species, variation in TRFLP profiles can be accounted for.  If a TRAMPknowns object contains multiple collections of a species, these will be aggregated by group.knowns.  This aggregation is essential for community analysis, as leaving individual collections will artificially inflate the number of “present species” when running TRAMP.

   Some authors have taken an alternative approach by using a larger tolerance in matching peaks between samples and knowns (effectively increasing accept.error in TRAMP) to account for within-species variation.  This is not recommended, as it dramatically increases the risk of incorrect matches.

2. Distinctly different TRFLP profiles may occur within a species (or in some cases within an individual); see Avis et al. (2006).  group.knowns looks at the species column of the info data.frame of x and joins any knowns with identical species values as a group.  This can also be used where multiple profiles are present in an individual.

3. Different species may share a similar TRFLP profile and therefore be indistinguishable using TRFLP.  If these patterns are not grouped, two species will be recorded as present wherever either is present.  group.knowns prevents this by joining knowns with “very similar” TRFLP patterns as a group.  Ideally, these problematic groups can be resolved by increasing the number of enzyme/primer pairs in the data.

Group names are generated by concatenating all unique (sorted) species names together, separated by commas.
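
The naming rule above can be illustrated with base R.  This is only a minimal sketch, not TRAMPR's internal code: the helper make.group.name and the species names are hypothetical, and the exact separator (here a comma followed by a space) is an assumption.

```r
## Hypothetical helper (not part of TRAMPR): build a group name from
## the species names of all knowns that ended up in one group.
make.group.name <- function(species)
  paste(sort(unique(species)), collapse=", ")

## Three knowns in one group, two of them from the same species:
species <- c("Russula sp.", "Amanita muscaria", "Russula sp.")
group.name <- make.group.name(species)
print(group.name)
```

Duplicate species names contribute only once, so a group made up of several collections of a single species simply carries that species' name.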

To determine if knowns are “similar enough” to form a group, we use R's clustering tools: dist, hclust and cutree.  First, we generate a distance matrix of the knowns' profiles using dist with method dist.method (see the Examples; this is very similar to what TRAMP does, and dist.method should be specified accordingly).  We then generate clusters using hclust with method hclust.method, and “cut” the tree at cut.height using cutree.
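
The three clustering steps described above can be tried on a toy matrix using only base R.  This is a sketch under stated assumptions: the profile values are made up, and the choices method="maximum", method="complete" and a cut height of 2.5 are illustrative rather than a statement of what group.knowns uses internally.

```r
## Toy profiles: one row per known, one column per enzyme/primer
## combination; values are peak sizes in base pairs (made-up numbers).
profiles <- rbind(k1 = c(100.0, 250.0, 310.0),
                  k2 = c(101.0, 251.5, 309.0),
                  k3 = c(180.0, 400.0, 520.0),
                  k4 = c(181.0, 402.0, 521.0))

d <- dist(profiles, method="maximum")  # largest per-column difference
h <- hclust(d, method="complete")      # hierarchical clustering
clusters <- cutree(h, h=2.5)           # cut the tree at height 2.5
print(clusters)

## A much larger cut height merges all four knowns into one cluster:
clusters.wide <- cutree(h, h=500)
print(clusters.wide)
```

With the small cut height, k1 and k2 fall in one cluster and k3 and k4 in another, because each pair differs by at most 2 bp in any column while the two pairs are hundreds of bp apart.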

Knowns are grouped iteratively, so that all knowns sharing a common cluster end up in the same group, as do all knowns sharing a common species name.  In certain cases this may chain together seemingly unrelated groups.
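
The chaining effect can be reproduced with a small base R sketch.  The helper chain.groups below is hypothetical and is not TRAMPR's implementation; it only assumes the rule stated above (knowns sharing a cluster or a species name belong together) and repeats the merge until nothing changes.

```r
## Hypothetical helper (not TRAMPR's implementation): merge knowns that
## share a cluster or a species name until the grouping is stable.
chain.groups <- function(cluster, species) {
  group <- seq_along(cluster)          # start with every known alone
  repeat {
    old <- group
    ## Within each cluster, then within each species, give every known
    ## the smallest group label found among its companions.
    group <- ave(group, cluster, FUN=min)
    group <- ave(group, species, FUN=min)
    if ( all(group == old) ) break
  }
  match(group, unique(group))          # relabel as 1, 2, 3, ...
}

## Knowns 1-2 share a cluster, 2-3 share a species, 3-4 share a cluster:
cluster <- c(1, 1, 2, 2, 3)
species <- c("sp. A", "sp. B", "sp. B", "sp. C", "sp. D")
groups <- chain.groups(cluster, species)
print(groups)
```

Knowns 1 and 4 share neither a cluster nor a species name, yet they land in the same group because they are linked through knowns 2 and 3.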

Because group.knowns is generic, it can be run on either a TRAMPknowns or a TRAMP object.  When run on a TRAMP object, it updates the TRAMPknowns object (stored as x$knowns), so that subsequent calls to plot.TRAMPknowns or summary.TRAMPknowns (for example) will use the new grouping parameters.

Parameters set by group.knowns are retained as part of the object, so that when adding additional knowns (add.known and combine), or when subsetting a knowns database (see [.TRAMPknowns,  aka TRAMPindexing), the same grouping parameters will be used.


----------Value----------

For group.knowns.TRAMPknowns, a new TRAMPknowns object.  The cluster.pars element will have been updated with new parameters, if any were specified.

For group.knowns.TRAMP, a new TRAMP object, with an updated knowns element.  Note that the original TRAMPknowns object (i.e. the one from which the TRAMP object was constructed) will not be modified.


----------Note----------

Warning about missing data: where knowns have NA values for certain enzyme/primer combinations, NAs may appear in the final distance matrix, in which case hclust cannot generate the clusters.  In general, NA values are fine; they just can't be everywhere.
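
The missing-data problem can be seen with base R alone.  In this sketch (made-up numbers, and method="maximum" chosen only for illustration), two knowns have no column in which both are observed, so dist cannot compute their distance and hclust then fails.

```r
## Toy profiles with missing data (made-up numbers).  k1 and k3 have no
## column where both are observed.
profiles <- rbind(k1 = c(100, 250,  NA,  NA),
                  k2 = c(101, 251, 310, 420),
                  k3 = c( NA,  NA, 311, 421))

d <- dist(profiles, method="maximum")
print(d)
has.na <- any(is.na(d))   # is any pairwise distance missing?

## hclust() stops with an error when the distances contain NA:
result <- tryCatch(hclust(d), error=function(e) "hclust failed")
print(result)
```

The k1-k2 and k2-k3 distances are still computed from the columns the pairs have in common, which is why scattered NA values are usually harmless.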


----------References----------

Avis PG, Dickie IA, Mueller GM. 2006. A “dirty” business: testing the limitations of terminal restriction fragment length polymorphism (TRFLP) analysis of soil fungi. Molecular Ecology 15: 873-882.

----------See Also----------

TRAMPknowns, which describes the TRAMPknowns object.

build.knowns, which attempts to generate a knowns database from a TRAMPsamples data set.

plot.TRAMPknowns, which graphically displays the relationships between knowns.



----------Examples----------

library(TRAMPR)
data(demo.knowns)
data(demo.samples)

demo.knowns <- group.knowns(demo.knowns, cut.height=2.5)

## Increasing cut.height makes groups more inclusive:
plot(group.knowns(demo.knowns, cut.height=100))

res <- TRAMP(demo.samples, demo.knowns)
m1.ungrouped <- summary(res)
m1.grouped <- summary(res, group=TRUE)
ncol(m1.grouped) # 94 groups

res2 <- group.knowns(res, cut.height=100)
m2.ungrouped <- summary(res2)
m2.grouped <- summary(res2, group=TRUE)
ncol(m2.grouped) # Now only 38 groups

## group.knowns results in the same distance matrix as produced by
## TRAMP, therefore using the same method (e.g. method="maximum") is
## important.  The example below shows how the matrix produced by
## dist(summary(x)) (as calculated by group.knowns) is the same as that
## produced by TRAMP:
f <- function(x, method="maximum") {
  ## Create a pseudo-samples object from our knowns
  y <- x
  y$data$height <- 1
  names(y$info)[names(y$info) == "knowns.pk"] <- "sample.pk"
  names(y$data)[names(y$data) == "knowns.fk"] <- "sample.fk"
  class(y) <- "TRAMPsamples"

  ## Run TRAMP, clean up and return
  ## (If method != "maximum", rescale the error to match that
  ## generated by dist()).
  z <- TRAMP(y, x, method=method)
  if ( method != "maximum" ) z$error <- z$error * z$n
  names(dimnames(z$error)) <- NULL
  z
}

g <- function(x, method="maximum")
  as.matrix(dist(summary(x), method=method))

all.equal(f(demo.knowns, "maximum")$error,   g(demo.knowns, "maximum"))
all.equal(f(demo.knowns, "euclidian")$error, g(demo.knowns, "euclidian"))
all.equal(f(demo.knowns, "manhattan")$error, g(demo.knowns, "manhattan"))

## However, TRAMP is over 100 times slower in this special case.

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

