R language, MLInterfaces package: help documentation for the MLearn() function

loveR · posted 2012-2-26 01:04:20

MLearn(MLInterfaces)
MLearn() belongs to the R package MLInterfaces

Revised MLearn interface for machine learning

Translator: LoveR, the robot of 生物统计家园网 (Biostatistics Home)

----------Description----------

Revised MLearn interface for machine learning, emphasizing a schematic description of external learning functions like knn, lda, nnet, etc.

----------Usage----------

MLearn( formula, data, .method, trainInd, ... )
makeLearnerSchema(packname, mlfunname, converter, predicter)

----------Arguments----------

Argument: formula
standard model formula

Argument: data
data.frame or ExpressionSet instance

Argument: .method
instance of learnerSchema

Argument: trainInd
obligatory numeric vector of indices of data to be used for training (all other data are used for testing), or an instance of the xvalSpec class

Argument: ...
additional named arguments passed to the external learning function

Argument: packname
character – name of the package harboring a learner function

Argument: mlfunname
character – name of the function to use

Argument: converter
function – with parameters (obj, data, trainInd) that tells how to convert the material in obj (produced by packname::mlfunname) into a classifierOutput instance.

Argument: predicter
function – with parameters (obj, newdata, ...) that tells how to use the material in obj to predict newdata.
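As a rough illustration of the trainInd argument described above, the following base-R sketch shows how a numeric index vector partitions the rows of a data.frame into a training part and a test part. The names toy, train_part and test_part are made up for this illustration, and the code only mimics the split; it does not call MLearn or reproduce its internals.

```r
# Sketch only: how a numeric trainInd splits rows into training and test sets.
# `toy`, `train_part` and `test_part` are illustrative names, not MLInterfaces objects.
set.seed(1234)
toy <- data.frame(y = factor(rep(c("a", "b"), each = 10)),
                  x = rnorm(20))

# Indices of the rows to be used for training
trainInd <- sample(seq_len(nrow(toy)), size = 12)

train_part <- toy[trainInd, ]   # rows named in trainInd
test_part  <- toy[-trainInd, ]  # all other rows are used for testing

nrow(train_part)
nrow(test_part)
```

Passing an xvalSpec instance instead of a numeric vector requests cross-validation, as in the examples further down.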

----------Details----------

The purpose of the MLearn methods is to provide a uniform calling sequence to diverse machine learning algorithms.  In R packages, machine learning functions can have parameters (x, y, ...) or (formula, data, ...) or some other sequence, and these functions can return lists or vectors or other sorts of things.  With MLearn, we always have the calling sequence MLearn(formula, data, .method, trainInd, ...), and data can be a data.frame or ExpressionSet.  MLearn will always return an S4 instance of classifierOutput or clusteringOutput.

At this time (1.13.x), NA values in predictors trigger an error.

To obtain documentation on the older (pre bioc 2.1) version of the MLearn method, please use help("MLearn-OLD").
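The differing calling conventions mentioned at the start of this section can be seen by calling two learners natively. The sketch below assumes the recommended packages MASS and class are installed; it does not use MLInterfaces at all, and the variable names are illustrative.

```r
# Sketch only: two native calling conventions that MLearn is meant to unify.
# Assumes the recommended packages MASS and class are available.
library(MASS)   # lda() and the crabs data
library(class)  # knn()

data(crabs)
set.seed(1234)
kp <- sample(seq_len(nrow(crabs)), size = 120)     # training rows
test_rows <- setdiff(seq_len(nrow(crabs)), kp)     # remaining rows

# Convention 1: (formula, data, ...) returning a fitted model object;
# predictions need a separate predict() call
fit_lda  <- lda(sp ~ CW + RW, data = crabs[kp, ])
pred_lda <- predict(fit_lda, newdata = crabs[test_rows, ])$class

# Convention 2: (train, test, cl, ...) returning the predicted factor directly
pred_knn <- knn(train = crabs[kp, c("CW", "RW")],
                test  = crabs[test_rows, c("CW", "RW")],
                cl    = crabs$sp[kp], k = 3)

# Both are factors over the same held-out rows, reached by different routes
table(lda = pred_lda, knn = pred_knn)
```

With MLearn, the same two fits are requested through one calling sequence, differing only in the learner schema (ldaI versus knnI(k=3)).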

randomForestI: randomForest.  Note that to obtain the default performance of randomForestB, you need to set the mtry and sampsize parameters to sqrt(number of features) and table([training set response factor]) respectively, as these were not taken to be the function's defaults. Note you can use xvalSpec("NOTEST") as trainInd, to use all the samples; the RObject() result

knnI(k=1,l=0): knn; special support bridge required, defined in MLint

knn.cvI(k=1,l=0): knn.cv; special support bridge required, defined in MLint.  This option uses the embedded leave-one-out cross-validation of knn.cv, and thereby achieves high performance.  You can have more general cross-validation using knnI with an xvalSpec, but it will be slower. When using this learner schema, you should use the numerical trainInd setting with 1:N where

dldaI: diagDA; special support bridge required, defined in MLint

nnetI: nnet

rpartI: rpart

ldaI: lda

svmI: svm

qdaI: qda

logisticI(threshold): glm – with binomial family, expecting a dichotomous factor as response variable, not bulletproofed against other responses yet.  If response

adaI: ada

BgbmI: gbm, forcing the Bernoulli loss function.

blackboostI: blackboost – you MUST supply

lvqI: lvqtest after building codebook with lvqinit and updating with olvq1.  You will need to write your own detailed schema if you want to tweak tuning

naiveBayesI: naiveBayes

baggingI: bagging

sldaI: slda

rdaI: rda – you must supply the alpha and delta parameters to

rdacvI: rda.cv.  This interface is complicated.  The typical use includes cross-validation internal to the rda.cv function.  That process searches a tuning parameter space and delivers an ordering on parameters. The interface selects the parameters by looking at all parameter configurations achieving the smallest min+1SE cv.error estimate, and taking the one among them that employed the -most- features (agnosticism). A final run of rda is then conducted with the tuning parameters set at that 'optimal' choice.  The bridge code can be modified to facilitate alternative choices of the parameters in use.  plotXvalRDA is an interface to the plot method for objects of class rdacv defined in package rda.  You can use xvalSpec("NOTEST") with this procedure to

ksvmI: ksvm

hclustI(distMethod, agglomMethod): hclust –

kmeansI(centers, algorithm): kmeans –
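For the randomForestI entry above, the two values it mentions, sqrt(number of features) and the table of the training-set response, can be computed in base R as sketched here. Rounding the square root down is an assumption made for this illustration, and the commented-out MLearn call at the end is a hypothetical use of the "..." argument rather than a tested recipe.

```r
# Sketch only: computing the mtry and sampsize values named in the randomForestI note.
# Assumes the recommended package MASS (for the crabs data) is installed.
library(MASS)
data(crabs)
set.seed(1234)
kp <- sample(1:200, size = 120)          # training indices, as in the examples

n_features <- 2                          # CW and RW in sp ~ CW + RW
mtry_val <- floor(sqrt(n_features))      # rounding down is an assumption
sampsize_val <- table(crabs$sp[kp])      # class counts in the training set

mtry_val
sampsize_val

# Hypothetical: these could then be passed through MLearn's "..." argument, e.g.
# MLearn(sp ~ CW + RW, data = crabs, randomForestI, kp,
#        mtry = mtry_val, sampsize = sampsize_val)
```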

If the multicore package is attached, cross-validation will be distributed to cores using mclapply.
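The idea of distributing cross-validation folds with mclapply can be sketched outside MLInterfaces as follows. This is not the package's internal code: it uses mclapply from the parallel package (which absorbed multicore in later R releases), MASS::lda as a stand-in learner, and illustrative names such as fold_id and fold_error.

```r
# Sketch only: 5-fold cross-validation with one fold per mclapply task.
# Assumes the parallel and MASS packages, both of which ship with R.
library(parallel)
library(MASS)

data(crabs)
set.seed(1234)
n <- nrow(crabs)
fold_id <- sample(rep(1:5, length.out = n))   # random assignment of rows to 5 folds

# Fit on all folds but f, then report the misclassification rate on fold f
fold_error <- function(f) {
  test_rows <- which(fold_id == f)
  fit  <- lda(sp ~ CW + RW, data = crabs[-test_rows, ])
  pred <- predict(fit, newdata = crabs[test_rows, ])$class
  mean(pred != crabs$sp[test_rows])
}

# mclapply relies on forking, so fall back to a single core on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
errs <- unlist(mclapply(1:5, fold_error, mc.cores = n_cores))

mean(errs)   # cross-validated error estimate
```

Because each fold is fitted independently, the folds can run on separate cores without sharing state.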

----------Value----------

Instances of classifierOutput or clusteringOutput

----------Author(s)----------

Vince Carey <stvjc@channing.harvard.edu>

----------Examples----------

library(MLInterfaces)
library(MASS)  # provides the crabs data
data(crabs)
set.seed(1234)
kp = sample(1:200, size=120)
rf1 = MLearn(sp~CW+RW, data=crabs, randomForestI, kp, ntree=600 )
rf1
nn1 = MLearn(sp~CW+RW, data=crabs, nnetI, kp, size=3, decay=.01 )
nn1
RObject(nn1)
knn1 = MLearn(sp~CW+RW, data=crabs, knnI(k=3,l=2), kp)
knn1
names(RObject(knn1))
dlda1 = MLearn(sp~CW+RW, data=crabs, dldaI, kp )
dlda1
names(RObject(dlda1))
lda1 = MLearn(sp~CW+RW, data=crabs, ldaI, kp )
lda1
names(RObject(lda1))
slda1 = MLearn(sp~CW+RW, data=crabs, sldaI, kp )
slda1
names(RObject(slda1))
svm1 = MLearn(sp~CW+RW, data=crabs, svmI, kp )
svm1
names(RObject(svm1))
ldapp1 = MLearn(sp~CW+RW, data=crabs, ldaI.predParms(method="debiased"), kp )
ldapp1
names(RObject(ldapp1))
qda1 = MLearn(sp~CW+RW, data=crabs, qdaI, kp )
qda1
names(RObject(qda1))
logi = MLearn(sp~CW+RW, data=crabs, glmI.logistic(threshold=0.5), kp, family=binomial ) # need family
logi
names(RObject(logi))
rp2 = MLearn(sp~CW+RW, data=crabs, rpartI, kp)
rp2
## recode data for RAB
#nsp = ifelse(crabs$sp=="O", -1, 1)
#nsp = factor(nsp)
#ncrabs = cbind(nsp,crabs)
#rab1 = MLearn(nsp~CW+RW, data=ncrabs, RABI, kp, maxiter=10)
#rab1
#
# new approach to adaboost
#
ada1 = MLearn(sp ~ CW+RW, data = crabs, .method = adaI,
    trainInd = kp, type = "discrete", iter = 200)
ada1
confuMat(ada1)
#
lvq.1 = MLearn(sp~CW+RW, data=crabs, lvqI, kp )
lvq.1
nb.1 = MLearn(sp~CW+RW, data=crabs, naiveBayesI, kp )
confuMat(nb.1)
bb.1 = MLearn(sp~CW+RW, data=crabs, baggingI, kp )
confuMat(bb.1)
#
# new mboost interface -- you MUST supply family for nonGaussian response
#
require(party)  # trafo ... killing cmd check
blb.1 = MLearn(sp~CW+RW+FL, data=crabs, blackboostI, kp, family=mboost::Binomial() )
confuMat(blb.1)
#
# ExpressionSet illustration
#
#  12/20/2012 -- increased training set size to avoid new randomForest
#  error when empty classes emerge
data(sample.ExpressionSet)
X = MLearn(type~., sample.ExpressionSet[100:250,], randomForestI, 1:19, importance=TRUE )
library(randomForest)
library(hgu95av2.db)
opar = par(no.readonly=TRUE)
par(las=2)
plot(getVarImp(X), n=10, plat="hgu95av2", toktype="SYMBOL")
par(opar)
#
# demonstrate cross validation
#
nn1cv = MLearn(sp~CW+RW, data=crabs[c(1:20,101:120),], nnetI, xvalSpec("LOO"), size=3, decay=.01 )
confuMat(nn1cv)
nn2cv = MLearn(sp~CW+RW, data=crabs[c(1:20,101:120),], nnetI,
    xvalSpec("LOG",5, balKfold.xvspec(5)), size=3, decay=.01 )
confuMat(nn2cv)
nn3cv = MLearn(sp~CW+RW+CL+BD+FL, data=crabs[c(1:20,101:120),], nnetI,
    xvalSpec("LOG",5, balKfold.xvspec(5), fsFun=fs.absT(2)), size=3, decay=.01 )
confuMat(nn3cv)
nn4cv = MLearn(sp~.-index-sex, data=crabs[c(1:20,101:120),], nnetI,
    xvalSpec("LOG",5, balKfold.xvspec(5), fsFun=fs.absT(2)), size=3, decay=.01 )
confuMat(nn4cv)
#
# try with expression data
#
library(golubEsets)
data(Golub_Train)
litg = Golub_Train[ 100:150, ]
g1 = MLearn(ALL.AML~. , litg, nnetI, xvalSpec("LOG",5, balKfold.xvspec(5), fsFun=fs.probT(.75)), size=3, decay=.01 )
confuMat(g1)
#
# illustrate rda.cv interface from package rda (requiring local bridge)
#
library(ALL)
data(ALL)
#
# restrict to BCR/ABL or NEG
#
bio <- which( ALL$mol.biol %in% c("BCR/ABL", "NEG"))
#
# restrict to B-cell
#
isb <- grep("^B", as.character(ALL$BT))
kp <- intersect(bio,isb)
all2 <- ALL[,kp]
mads = apply(exprs(all2),1,mad)
kp = which(mads>1)  # get around 250 genes
vall2 = all2[kp, ]
vall2$mol.biol = factor(vall2$mol.biol) # drop unused levels

r1 = MLearn(mol.biol~., vall2, rdacvI, 1:40)
confuMat(r1)
RObject(r1)
plotXvalRDA(r1)  # special interface to plots of parameter space

# illustrate clustering support

cl1 = MLearn(~CW+RW+CL+FL+BD, data=crabs, hclustI(distFun=dist, cutParm=list(k=4)))
plot(cl1)

cl1a = MLearn(~CW+RW+CL+FL+BD, data=crabs, hclustI(distFun=dist, cutParm=list(k=4)),
    method="complete")
plot(cl1a)

cl2 = MLearn(~CW+RW+CL+FL+BD, data=crabs, kmeansI, centers=5, algorithm="Hartigan-Wong")
plot(cl2, crabs[,-c(1:3)])

c3 = MLearn(~CL+CW+RW, crabs, pamI(dist), k=5)
c3
plot(c3, data=crabs[,c("CL", "CW", "RW")])

#  new interfaces to PLS thanks to Laurent Gatto

set.seed(1234)
kp = sample(1:200, size=120)

plsda.1 = MLearn(sp~CW+RW, data=crabs, plsdaI, kp, probMethod="Bayes")
plsda.1
confuMat(plsda.1)
confuMat(plsda.1,t=.65) ## requires at least 0.65 post error prob to assign species

plsda.2 = MLearn(type~., data=sample.ExpressionSet[100:250,], plsdaI, 1:16)
plsda.2
confuMat(plsda.2)
confuMat(plsda.2,t=.65) ## requires at least 0.65 post error prob to assign outcome

## examples for predict
clout <- MLearn(type~., sample.ExpressionSet[100:250,], svmI , 1:16)
predict(clout, sample.ExpressionSet[100:250,17:26])
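
The confuMat() calls in the examples tabulate given labels against predicted labels. A rough base-R analogue is sketched below using class::knn.cv, the function that the knn.cvI schema wraps, so that the leave-one-out predictions come from the learner itself. The object names loo_pred and conf are illustrative, and the table is built with table() rather than with any MLInterfaces code.

```r
# Sketch only: leave-one-out predictions from class::knn.cv, tabulated as a
# confusion matrix in base R. Assumes the recommended packages MASS and class.
library(MASS)
library(class)

data(crabs)
set.seed(1234)   # knn.cv breaks ties at random

# Each row is classified from all the other rows (embedded leave-one-out)
loo_pred <- knn.cv(train = crabs[, c("CW", "RW")], cl = crabs$sp, k = 1, l = 0)

# Rows: given species; columns: predicted species
conf <- table(given = crabs$sp, predicted = loo_pred)
conf

# Proportion of crabs classified correctly
sum(diag(conf)) / sum(conf)
```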

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

