bootTWIX(TWIX)
bootTWIX() belongs to R package: TWIX
Bootstrap of the TWIX trees
Translator: LoveR, the translation robot of biostatistic.net
----------Description----------
Grows greedy TWIX trees or p-value adjusted TWIX trees on bootstrap samples of the training data (bagging).
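This page does not spell out how the bootstrap trees are grown and combined. The base-R sketch below shows the general bagging idea behind the nbagg argument: grow one tree per bootstrap sample, then combine the trees by majority vote. It uses rpart classification trees as a stand-in for TWIX trees, bag_trees() and predict_bagged() are hypothetical helpers, and majority voting is an assumption about how predict.bootTWIX aggregates, not something stated here.

```r
# Generic bagging sketch: rpart trees stand in for TWIX trees.
# bag_trees() and predict_bagged() are hypothetical helpers, not TWIX functions.
library(rpart)

bag_trees <- function(formula, data, nbagg = 25) {
  n <- nrow(data)
  lapply(seq_len(nbagg), function(b) {
    idx <- sample(n, n, replace = TRUE)   # bootstrap sample of the rows
    rpart(formula, data = data[idx, ], method = "class")
  })
}

predict_bagged <- function(trees, newdata, lev) {
  # one column of predicted class labels per tree
  votes <- do.call(cbind, lapply(trees, function(tr)
    as.character(predict(tr, newdata, type = "class"))))
  # majority vote per observation (ties go to the alphabetically first class)
  winners <- apply(votes, 1, function(v) names(which.max(table(v))))
  factor(winners, levels = lev)
}

set.seed(1234)
trees <- bag_trees(Species ~ ., data = iris, nbagg = 25)
pred <- predict_bagged(trees, iris, levels(iris$Species))
mean(pred == iris$Species)   # correct classification rate on the training data
```

An honest error estimate would use a held-out test set, as the Examples section does with SAheart.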
----------Usage----------
bootTWIX(formula, data = NULL, nbagg = 25, topN = 1, subset = NULL,
comb = NULL, method = "deviance", topn.method = "complete",
replace = TRUE, ns=1, minsplit = 2, minbucket = round(minsplit/3),
splitf = "deviance", Devmin = 0.01, level = 30, tol = 0.01,
cluster = NULL, seed.cluster = NULL)
----------Arguments----------
Argument: formula
formula of the form y ~ x1 + x2 + ..., where y must be a factor and x1, x2, ... must be numeric.
Argument: data
an optional data frame containing the variables in the model (training data).
Argument: nbagg
an integer giving the number of bootstrap replications.
Argument: comb
a list containing an additional model and its prediction function, used for model combination (double-bagging); see the examples below.
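To see what a comb list has to provide, the sketch below calls the two elements of the same comb.lda definition used in the Examples section directly on the built-in iris data: model fits the additional model from a formula and data, and predict returns that model's outputs (here the LDA discriminant scores) for new data. How bootTWIX uses those outputs internally is not documented on this page; the last line assumes the usual double-bagging convention of appending them as extra predictors for the trees.

```r
library(MASS)

# Same structure as comb.lda in the Examples: a fitting function plus a
# prediction function that returns the LDA discriminant scores.
comb.lda <- list(model = lda,
                 predict = function(obj, newdata) predict(obj, newdata)$x)

fit <- comb.lda$model(Species ~ ., data = iris)
z <- comb.lda$predict(fit, iris)
dim(z)   # 150 2: three classes give two discriminants (LD1, LD2)

# Assumption: double-bagging style augmentation of the predictor set
iris_aug <- cbind(iris, z)
```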
Argument: splitf
the kind of splitting function to be used: either "deviance" (default) or "p-adj". If splitf is set to "p-adj", p-value adjusted classification trees are grown.
Argument: replace
logical; should sampling be done with replacement?
Argument: ns
size ns <= nrow(data) of the data set obtained by sampling without replacement.
Argument: Devmin
the minimum improvement in entropy required for a split or, for p-value adjusted classification trees (splitf = "p-adj"), the significance level alpha.
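For splitf = "deviance", Devmin is a threshold on the improvement of the split criterion. The sketch below computes one common form of that improvement: the drop in multinomial deviance (-2 times the log-likelihood of the class proportions) from a parent node to its two children for a numeric cut point. node_deviance() and split_improvement() are hypothetical helpers; whether TWIX rescales the criterion before comparing it with Devmin is not stated here, and the "p-adj" case, where Devmin acts as alpha, is not covered.

```r
# Multinomial deviance of a node: -2 * sum_k n_k * log(p_k); empty classes are dropped
node_deviance <- function(y) {
  n_k <- table(y)
  n_k <- n_k[n_k > 0]
  p_k <- n_k / sum(n_k)
  -2 * sum(n_k * log(p_k))
}

# Improvement from splitting at x <= cut: parent deviance minus both children
split_improvement <- function(y, x, cut) {
  left <- x <= cut
  node_deviance(y) - node_deviance(y[left]) - node_deviance(y[!left])
}

y <- factor(c("a", "a", "b", "b"))
x <- c(1, 2, 3, 4)
imp <- split_improvement(y, x, 2.5)   # perfect split: 8 * log(2), about 5.545
split_improvement(y, x, 1.5)          # impure right child: smaller gain, about 1.73
```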
Argument: topN
integer vector giving how many splits are selected at each level. If it has length 1, the same number of splits is selected at every level. If it has length > 1, for example topN = c(3,2), 3 splits are chosen at the first level, 2 splits at the second level, and 1 split at every further level.
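The rule for expanding topN into a per-level number of splits can be written as a small function. splits_at_level() is a hypothetical helper that only restates the topN description (level 1 is the root); it is not part of TWIX.

```r
# Hypothetical helper: number of splits kept at a given tree level (1 = root)
splits_at_level <- function(topN, level) {
  if (length(topN) == 1) return(topN)             # same number at every level
  if (level <= length(topN)) topN[level] else 1   # past the end of topN: 1 split
}

sapply(1:4, function(l) splits_at_level(c(3, 2), l))   # 3 2 1 1
sapply(1:3, function(l) splits_at_level(2, l))         # 2 2 2
```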
Argument: subset
an optional vector specifying a subset of observations to be used.
Argument: method
which split points are used: "deviance" (default), "grid" or "local". If method is set to
"local" - the program uses the local maxima of the split function (entropy),
"deviance" - all values of the entropy,
"grid" - grid points.
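The three method choices differ in which candidate cut points are passed on. The sketch below assumes candidates are midpoints between sorted distinct values of a numeric predictor and uses made-up split-function values. local_maxima() is a hypothetical helper that keeps strict interior local maxima (plateaus and end points are ignored, which TWIX may handle differently), and the 4-point grid is an arbitrary size chosen for illustration.

```r
# Hypothetical helper: indices of strict interior local maxima
local_maxima <- function(values) {
  n <- length(values)
  if (n < 3) return(integer(0))
  i <- 2:(n - 1)
  i[values[i] > values[i - 1] & values[i] > values[i + 1]]
}

x <- c(1, 2, 3, 4, 5, 6)
cuts <- (head(x, -1) + tail(x, -1)) / 2    # all midpoints: 1.5 2.5 3.5 4.5 5.5
gain <- c(0.10, 0.40, 0.30, 0.60, 0.20)    # made-up split-function values per cut

cuts                                       # "deviance": every candidate
cuts[local_maxima(gain)]                   # "local": 2.5 4.5
seq(min(x), max(x), length.out = 4)        # "grid": hypothetical 4-point grid
```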
Argument: topn.method
one of "complete" (default) or "single"; specifies how candidate split points are considered. If set to "complete", split points from all variables are considered together; otherwise split points are considered per variable.
Argument: minsplit
the minimum number of observations that must exist in a node.
Argument: minbucket
the minimum number of observations in any terminal (leaf) node.
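The default minbucket = round(minsplit/3) and the two size limits can be checked as follows. split_allowed() is a hypothetical helper that assumes rpart-like semantics (minsplit applies to the node being split, minbucket to each child); TWIX may apply the limits differently.

```r
# Default taken from the bootTWIX signature
default_minbucket <- function(minsplit) round(minsplit / 3)

# Hypothetical check: the parent must hold at least minsplit observations
# and each child (leaf) at least minbucket observations.
split_allowed <- function(n_left, n_right, minsplit,
                          minbucket = default_minbucket(minsplit)) {
  (n_left + n_right) >= minsplit && min(n_left, n_right) >= minbucket
}

default_minbucket(2)                               # 1
default_minbucket(20)                              # 7
split_allowed(1, 1, minsplit = 2)                  # TRUE
split_allowed(9, 1, minsplit = 6, minbucket = 2)   # FALSE: right child too small
```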
Argument: level
maximum depth of the trees. If level is set to 1, the trees consist of the root node only.
Argument: tol
a parameter that is used only if topn.method is set to "single".
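One reading of the topn.method description is sketched below: with "complete", candidate split points from all variables are ranked together before the top ones are kept; with "single", candidates are ranked within each variable. The candidate table and both helpers are hypothetical, and the sketch ignores tol, whose exact role in the "single" case this page does not describe.

```r
# Hypothetical candidates: one row per (variable, cut point) with a split score
cand <- data.frame(
  var   = c("x1", "x1", "x1", "x2", "x2"),
  cut   = c(1.5, 2.5, 3.5, 10, 20),
  score = c(0.9, 0.8, 0.1, 0.5, 0.4)
)

# "complete"-style: rank all candidates together and keep the overall top n
top_complete <- function(cand, n) head(cand[order(-cand$score), ], n)

# "single"-style: rank within each variable and keep the top n per variable
top_single <- function(cand, n) {
  do.call(rbind, lapply(split(cand, cand$var),
                        function(d) head(d[order(-d$score), ], n)))
}

top_complete(cand, 2)$cut   # 1.5 2.5 (both from x1)
top_single(cand, 1)$cut     # 1.5 10 (best cut of each variable)
```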
Argument: cluster
the name of the cluster, if parallel computing is used.
Argument: seed.cluster
an integer to be supplied to set.seed, or NULL if reproducible seeds should not be set.
----------Value----------
a list with the following components:
Component: call
the call that generated the object.
Component: trees
a list of all constructed trees, including ID, Dev, Fit, Splitvar, ... for each tree.
----------See Also----------
TWIX, get.tree, predict.bootTWIX, deviance.TWIX, bagg.TWIX
----------Examples----------
library(ElemStatLearn)
data(SAheart)
### response variable must be a factor
SAheart$chd <- factor(SAheart$chd)
### test and train data
set.seed(1234)
icv <- sample(nrow(SAheart),nrow(SAheart)*0.3)
itr <- setdiff(1:nrow(SAheart),icv)
train <- SAheart[itr,]
test <- SAheart[icv,]
### Bagging with greedy decision trees as base classifier
M1 <- bootTWIX(chd~.,data=train,nbagg=50)
### Bagging with the p-value adjusted classification trees (alpha is 0.01)
### as base classifier
M2 <- bootTWIX(chd~.,data=train,nbagg=50,splitf="p-adj",Devmin=0.01)
library(MASS)
### Double-Bagging: combine LDA and classification trees
comb.lda <- list(model=lda, predict=function(obj, newdata)
predict(obj, newdata)$x)
M3 <- bootTWIX(chd~.,data=train,nbagg=50,comb=comb.lda)
### Double-Bagging: combine LDA and p-value adjusted classification trees
comb.lda <- list(model=lda, predict=function(obj, newdata)
predict(obj, newdata)$x)
M4 <- bootTWIX(chd~.,data=train,nbagg=50,comb=comb.lda,
splitf="p-adj",Devmin=0.01)
### Double-Bagging: combine GLM and classification trees
comb.glm <- list(model=function(x,...){glm(x,family=binomial,...)},
predict=function(obj, newdata) predict(obj, newdata))
M5 <- bootTWIX(chd~.,data=train,nbagg=50,comb=comb.glm)
### Double-Bagging: combine GLM and p-value adjusted classification trees
comb.glm <- list(model=function(x,...){glm(x,family=binomial,...)},
predict=function(obj, newdata) predict(obj, newdata))
M6 <- bootTWIX(chd~.,data=train,nbagg=50,comb=comb.glm,
splitf="p-adj",Devmin=0.01)
pred1 <- predict(M1,test)
pred2 <- predict(M2,test)
pred3 <- predict(M3,test)
pred4 <- predict(M4,test)
pred5 <- predict(M5,test)
pred6 <- predict(M6,test)
### Correct classification rates (CCR) on the test data
mean(pred1 == test$chd)
mean(pred2 == test$chd)
mean(pred3 == test$chd)
mean(pred4 == test$chd)
mean(pred5 == test$chd)
mean(pred6 == test$chd)
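When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).
Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the translation robot of biostatistic.net. It is provided for personal R learning and reference only; biostatistic.net retains the copyright.
Note 2: Because the translation was done automatically, some inaccuracies are unavoidable; compare the Chinese and English text carefully when using it, which also helps in learning R.
Note 3: If you find an inaccuracy, please reply to this post and we will gradually revise it.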