R TWIX package: help documentation for the bootTWIX() function

loveR · posted 2012-10-1 13:12:37

bootTWIX(TWIX)
R package containing bootTWIX(): TWIX

 Bootstrap of the TWIX trees

 Translator: LoveR, the robot of biostatistic.net

----------Description----------

Fits greedy TWIX trees or p-value adjusted TWIX trees on bootstrap samples of the data.

----------Usage----------

bootTWIX(formula, data = NULL, nbagg = 25, topN = 1, subset = NULL,
 comb = NULL, method = "deviance", topn.method = "complete",
 replace = TRUE, ns = 1, minsplit = 2, minbucket = round(minsplit/3),
 splitf = "deviance", Devmin = 0.01, level = 30, tol = 0.01,
 cluster = NULL, seed.cluster = NULL)

----------Arguments----------

Argument: formula
a formula of the form y ~ x1 + x2 + ..., where y must be a factor and x1, x2, ... are numeric.

Argument: data
an optional data frame containing the variables in the model (training data).

Argument: nbagg
an integer giving the number of bootstrap replications.

Argument: comb
a list containing an additional model and its prediction function for model combination; see the examples below.

Argument: splitf
the kind of splitting function to be used. It can be one of "deviance" (default) or "p-adj". If splitf is set to "p-adj", p-value adjusted classification trees are fitted.

Argument: replace
logical. Should sampling be with replacement?

Argument: ns
the size ns <= nrow(data) of the data set obtained by sampling without replacement.
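
The two sampling schemes named by replace and ns can be sketched in base R as follows. This is only an illustration with made-up sizes: draw_indices(), n, boot_idx and sub_idx are hypothetical names, and it is an assumption (not confirmed by this help page) that bootTWIX draws its samples in exactly this way.

```r
# Illustrative sketch only: draw_indices() is a hypothetical helper,
# not part of the TWIX package.
draw_indices <- function(n, replace = TRUE, ns = n) {
  if (replace) {
    # classical bootstrap: n row indices drawn with replacement
    sample.int(n, size = n, replace = TRUE)
  } else {
    # subsample: ns <= n distinct row indices drawn without replacement
    stopifnot(ns <= n)
    sample.int(n, size = ns, replace = FALSE)
  }
}

set.seed(1)
n <- 10      # number of rows in a toy data set
nbagg <- 3   # number of replications

# one index vector per replication
boot_idx <- replicate(nbagg, draw_indices(n), simplify = FALSE)
sub_idx  <- replicate(nbagg, draw_indices(n, replace = FALSE, ns = 6),
                      simplify = FALSE)
```

Each element of boot_idx or sub_idx could then be used to select the rows on which one tree is grown, for example data[boot_idx[[1]], ].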

Argument: Devmin
the minimum improvement in entropy required for a split; for p-value adjusted classification trees, the significance level alpha.
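
To make the idea of a minimum improvement in entropy concrete, the sketch below computes the entropy decrease of a single candidate split on toy data. The functions entropy() and entropy_gain() are hypothetical helpers written for this illustration; TWIX may scale or define its deviance differently, so the comparison with 0.01 only mirrors the default value of Devmin and is not the package's actual test.

```r
# Shannon entropy (natural log) of the class distribution of a factor
entropy <- function(y) {
  if (length(y) == 0) return(0)   # an empty node contributes nothing
  p <- table(y) / length(y)
  p <- p[p > 0]                   # drop empty classes so 0 * log(0) never occurs
  -sum(p * log(p))
}

# Decrease in entropy when the node is split into x <= cut and x > cut
entropy_gain <- function(x, y, cut) {
  left <- x <= cut
  n <- length(y)
  entropy(y) -
    (sum(left) / n) * entropy(y[left]) -
    (sum(!left) / n) * entropy(y[!left])
}

x <- c(1, 2, 3, 4, 5, 6)
y <- factor(c("a", "a", "a", "b", "b", "b"))

gain <- entropy_gain(x, y, cut = 3)  # this cut separates the two classes perfectly
gain > 0.01                          # compare with a Devmin-style threshold
```

Here the parent node has entropy log(2) and both child nodes are pure, so the gain equals log(2).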

Argument: topN
integer vector. How many splits will be selected, and at which level? If length 1, the same number of splits will be selected at each level. If length > 1, for example topN=c(3,2), 3 splits will be chosen at the first level, 2 splits at the second level, and 1 split at all subsequent levels.
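
The rule for topN can be written out as a small function. splits_at_level() is a hypothetical helper that only restates the description above; it is not part of TWIX.

```r
# Hypothetical helper, not part of TWIX: number of splits kept at a given
# tree level for a topN vector, following the rule described above.
splits_at_level <- function(topN, level) {
  if (length(topN) == 1) return(topN)            # same number at every level
  if (level <= length(topN)) topN[level] else 1  # past the end of topN: 1 split
}

# number of splits kept at levels 1 to 4
sapply(1:4, splits_at_level, topN = c(3, 2))  # 3 2 1 1
sapply(1:4, splits_at_level, topN = 2)        # 2 2 2 2
```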

Argument: subset
an optional vector specifying a subset of observations to be used.

Argument: method
Which split points will be used? This can be "deviance" (default), "grid" or "local". With "local" the program uses the local maxima of the split function (entropy), with "deviance" all values of the entropy, and with "grid" grid points.
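
As an illustration of what "local maxima of the split function" means, the sketch below picks the strict interior local maxima from a made-up vector of criterion values. local_maxima() and crit are hypothetical; how TWIX treats ties or the end points of the curve is not stated on this page, so those choices are assumptions.

```r
# Hypothetical helper: positions of strict interior local maxima of a vector
local_maxima <- function(v) {
  i <- seq_along(v)[-c(1, length(v))]       # interior positions only
  i[v[i] > v[i - 1] & v[i] > v[i + 1]]      # larger than both neighbours
}

# made-up values of a split criterion at six candidate split points
crit <- c(0.10, 0.30, 0.20, 0.25, 0.40, 0.15)

local_maxima(crit)   # 2 5
```

Under this reading, "local" would keep only candidates 2 and 5, whereas "deviance" would consider all six values.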

Argument: topn.method
one of "complete" (default) or "single". Specifies how split points are considered. If set to "complete", split points from all variables are used; otherwise split points are used per variable.

Argument: minsplit
the minimum number of observations that must exist in a node.

Argument: minbucket
the minimum number of observations in any terminal (leaf) node.
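
The usage line sets the default minbucket = round(minsplit/3). The short check below evaluates that expression for a few illustrative minsplit values; the vector of values is made up for this example.

```r
# default minbucket implied by the usage line, for several minsplit values
minsplit <- c(2, 10, 20)
default_minbucket <- round(minsplit / 3)
default_minbucket   # 1 3 7
```

With the default minsplit = 2, terminal nodes may therefore contain a single observation.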

Argument: level
maximum depth of the trees. If level is set to 1, each tree consists of the root node only.

Argument: tol
a parameter that is used if topn.method is set to "single".

Argument: cluster
the name of the cluster, if parallel computing is used.

Argument: seed.cluster
an integer to be supplied to set.seed, or NULL not to set reproducible seeds.

----------Value----------

a list with the following components:

Component: call
the call generating the object.

Component: trees
a list of all constructed trees, which includes ID, Dev, Fit, Splitvar, ... for each tree.

----------See Also----------

TWIX, get.tree, predict.bootTWIX, deviance.TWIX, bagg.TWIX

----------Examples----------

library(TWIX)
library(ElemStatLearn)
data(SAheart)

### response variable must be a factor
SAheart$chd <- factor(SAheart$chd)

### test and train data
set.seed(1234)
icv <- sample(nrow(SAheart), nrow(SAheart) * 0.3)
itr <- setdiff(seq_len(nrow(SAheart)), icv)
train <- SAheart[itr, ]
test <- SAheart[icv, ]

### Bagging with greedy decision trees as base classifier
M1 <- bootTWIX(chd ~ ., data = train, nbagg = 50)

### Bagging with the p-value adjusted classification trees (alpha is 0.01)
### as base classifier
M2 <- bootTWIX(chd ~ ., data = train, nbagg = 50, splitf = "p-adj", Devmin = 0.01)

library(MASS)

### Double-Bagging: combine LDA and classification trees
comb.lda <- list(model = lda, predict = function(obj, newdata)
 predict(obj, newdata)$x)

M3 <- bootTWIX(chd ~ ., data = train, nbagg = 50, comb = comb.lda)

### Double-Bagging: combine LDA and p-value adjusted classification trees
comb.lda <- list(model = lda, predict = function(obj, newdata)
 predict(obj, newdata)$x)

M4 <- bootTWIX(chd ~ ., data = train, nbagg = 50, comb = comb.lda,
 splitf = "p-adj", Devmin = 0.01)

### Double-Bagging: combine GLM and classification trees
comb.glm <- list(model = function(x, ...) {glm(x, family = binomial, ...)},
 predict = function(obj, newdata) predict(obj, newdata))

M5 <- bootTWIX(chd ~ ., data = train, nbagg = 50, comb = comb.glm)

### Double-Bagging: combine GLM and p-value adjusted classification trees
comb.glm <- list(model = function(x, ...) {glm(x, family = binomial, ...)},
 predict = function(obj, newdata) predict(obj, newdata))

M6 <- bootTWIX(chd ~ ., data = train, nbagg = 50, comb = comb.glm,
 splitf = "p-adj", Devmin = 0.01)

pred1 <- predict(M1,test)
pred2 <- predict(M2,test)
pred3 <- predict(M3,test)
pred4 <- predict(M4,test)
pred5 <- predict(M5,test)
pred6 <- predict(M6,test)

### Correct classification rates (CCR)

sum(pred1 == test$chd)/nrow(test)
sum(pred2 == test$chd)/nrow(test)
sum(pred3 == test$chd)/nrow(test)
sum(pred4 == test$chd)/nrow(test)
sum(pred5 == test$chd)/nrow(test)
sum(pred6 == test$chd)/nrow(test)
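
The correct classification rate used above can be wrapped in a small helper that also returns a confusion table. This is a sketch on made-up labels, not on the SAheart results; ccr(), truth and pred are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper: correct classification rate plus confusion table
ccr <- function(pred, truth) {
  stopifnot(length(pred) == length(truth))
  list(
    # share of predictions that match the observed class
    rate = mean(as.character(pred) == as.character(truth)),
    # cross-tabulation of predicted against observed classes
    confusion = table(predicted = pred, observed = truth)
  )
}

# toy labels, invented for this example
truth <- factor(c("0", "1", "0", "1", "1"))
pred  <- factor(c("0", "1", "1", "1", "0"), levels = levels(truth))

res <- ccr(pred, truth)
res$rate        # 0.6
res$confusion
```

Applied to the objects from the example, the call would look like ccr(pred1, test$chd), assuming predict() returns one class label per row of test as the comparisons above imply.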

When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).

