Importance(rminer)
Importance() belongs to the R package rminer.
Measure input importance given a supervised data mining model.
----------Description----------
Measure input importance given a supervised data mining model.
----------Usage----------
Importance(M, data, RealL = 6, method = "sens", measure = "gradient",
           sampling = "regular", baseline = "mean", responses = TRUE,
           outindex = NULL, task = "default", PRED = NULL,
           interactions = NULL)
----------Arguments----------
Argument: M
Fitted model, typically the object returned by fit. It can also be any fitted model (i.e. not from rminer), provided that the prediction function PRED is defined (see the examples for details).
Argument: data
Training data (the same data.frame that was used to fit the model; currently only used to add a data histogram to the VEC curve).
Argument: RealL
For numeric inputs, the number of sensitivity analysis levels (e.g. 6). Note: RealL must be at least 2 (RealL>=2).
Argument: method
Input importance method. Options are:
sens -- sensitivity analysis.
sensv -- equal to sens but sets measure="variance".
sensg -- equal to sens but sets measure="gradient".
sensr -- equal to sens but sets measure="range".
randomforest -- uses the method of Leo Breiman (type=1); only makes sense when M is a randomForest model.
Argument: measure
Sensitivity analysis measure (used to measure input importance). Options are:
gradient -- average absolute gradient (y_{i+1} - y_i) of the responses.
variance -- variance of the responses.
range -- maximum minus minimum of the responses.
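The three measures can be illustrated on a single vector of responses. This is a minimal base-R sketch and not rminer's internal code: the helper names sa_gradient, sa_variance and sa_range and the response values in y are hypothetical, and the variance is assumed to be the sample variance returned by var().

```r
# Hypothetical helpers mirroring the three measure options (not rminer internals).
sa_gradient <- function(y) mean(abs(diff(y)))  # average absolute gradient |y_{i+1} - y_i|
sa_variance <- function(y) var(y)              # sample variance of the responses (assumption)
sa_range    <- function(y) max(y) - min(y)     # maximum minus minimum of the responses

# Made-up responses of a model at RealL = 6 levels of one input.
y <- c(0.10, 0.25, 0.45, 0.60, 0.68, 0.70)

measures <- c(gradient = sa_gradient(y),
              variance = sa_variance(y),
              range    = sa_range(y))
print(measures)
```

For a monotonic response such as this one, the gradient measure equals the range divided by the number of steps (RealL - 1); the two differ once the response oscillates.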
Argument: sampling
For numeric inputs, the sampling scan function. Options are:
regular -- regular sequence (uniform distribution).
quantile -- samples values from the input that are closer to the variable's distribution in data.
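The difference between the two sampling options can be sketched in base R. This is one plausible reading of the descriptions above, not rminer's own implementation; the vector x and the variable names are made up for illustration.

```r
# Skewed example input (made-up data).
x <- c(1, 2, 2, 3, 3, 3, 4, 5, 8, 20)
L <- 6  # number of levels, as in RealL = 6

# "regular": equally spaced levels between the minimum and the maximum.
regular_levels <- seq(min(x), max(x), length.out = L)

# "quantile" (assumed reading): levels taken at equally spaced probabilities
# of the empirical distribution, so they cluster where the data are dense.
quantile_levels <- unname(quantile(x, probs = seq(0, 1, length.out = L)))

print(regular_levels)
print(quantile_levels)
```

Both sequences share the same end points, but the quantile levels concentrate in the region where most observations lie.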
Argument: baseline
Baseline vector used during the sensitivity analysis. Options are:
mean -- uses a vector with the mean values of each attribute from data.
median -- uses a vector with the median values of each attribute from data.
a data.frame with the baseline example (should have the same attribute names as data).
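A baseline of the kind described above can be built by hand in base R. The sketch below uses only the numeric columns of iris; the names inputs, baseline_mean and baseline_median are illustrative, and how rminer treats factor attributes or the output column in a custom baseline is not covered here.

```r
data(iris)
inputs <- iris[, 1:4]  # numeric attributes only

# baseline = "mean": one-row data.frame with the mean of each attribute.
baseline_mean <- as.data.frame(lapply(inputs, mean))

# baseline = "median": one-row data.frame with the median of each attribute.
baseline_median <- as.data.frame(lapply(inputs, median))

print(baseline_mean)
print(baseline_median)
```

A one-row data.frame of this shape, with names matching those of data, is the kind of object the third option refers to.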
Argument: responses
If TRUE, all sensitivity analysis responses are stored and returned.
Argument: outindex
The output index (column) of data, if M is not a model object (returned by fit).
Argument: task
The task as defined in fit, if M is not a model object (returned by fit).
Argument: PRED
The prediction function of M, if M is not a model object (returned by fit). Note: this function should behave like the rminer predict-methods, i.e. return a numeric vector in case of regression; in case of classification, return a matrix of examples (rows) vs probabilities (columns) (task="prob") or a factor (task="class").
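The three return shapes expected from PRED can be sketched with base-R models. The wrapper names pred_lm, pred_prob and pred_class are hypothetical, and the lm and glm models are stand-ins chosen only because they need no extra packages; the code checks the shapes and does not call Importance itself.

```r
# Regression: PRED returns a plain numeric vector.
pred_lm <- function(M, data) as.numeric(predict(M, data))
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
p_reg <- pred_lm(fit_lm, mtcars)

# Binary classification stand-in: logistic regression on a two-level factor.
iris2 <- iris
iris2$Species <- factor(iris2$Species == "virginica")  # levels "FALSE", "TRUE"
fit_glm <- glm(Species ~ Sepal.Length + Sepal.Width, data = iris2, family = binomial)

# task = "prob": matrix with one row per example and one column per class.
pred_prob <- function(M, data) {
  p <- as.numeric(predict(M, data, type = "response"))  # P(Species == "TRUE")
  cbind("FALSE" = 1 - p, "TRUE" = p)
}

# task = "class": factor with the predicted label of each example.
pred_class <- function(M, data) {
  p <- as.numeric(predict(M, data, type = "response"))
  factor(p > 0.5, levels = c(FALSE, TRUE))
}

P  <- pred_prob(fit_glm, iris2)
cl <- pred_class(fit_glm, iris2)
str(p_reg); str(P); str(cl)
```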
Argument: interactions
Numeric vector with the attributes (columns) used by the Ith-D sensitivity analysis (2-D or higher):
if NULL, then only a 1-D sensitivity analysis is performed.
if length(interactions)==1, then a "special" 2-D sensitivity analysis is performed, using the index given in interactions versus all remaining inputs. Note: $sresponses[[interactions]] will be empty (in vecplot do not use xval=interactions).
if length(interactions)>1, then a full Ith-D sensitivity analysis is performed, where I=length(interactions). Note: the computational effort grows quickly with I, i.e. O(RealL^I). Also, you need to preprocess the returned list (e.g. using avg_imp) before using the vecplot function (see the examples).
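The O(RealL^I) growth can be made concrete with a small base-R sketch. It assumes a full factorial grid over the levels of the I chosen inputs, which is what the stated cost suggests; the input names x1, x2 and x3 and the [0, 1] level range are placeholders.

```r
RealL <- 6

# Number of level combinations for I = 1..4 interacting inputs.
for (I in 1:4) cat("I =", I, "->", RealL^I, "combinations\n")

# Explicit full grid for I = 3 inputs, each scanned at RealL levels.
levels_per_input <- replicate(3, seq(0, 1, length.out = RealL), simplify = FALSE)
names(levels_per_input) <- c("x1", "x2", "x3")
grid <- expand.grid(levels_per_input)
print(nrow(grid))  # 6^3 = 216 rows, one model evaluation each
```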
----------Details----------
This function provides several algorithms for measuring the input importance of supervised data mining models. Particular emphasis is placed on sensitivity analysis (SA), a simple method that measures the effect on the output of a given model when the inputs are varied through their range of values. Check the reference for more details.
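The idea of a 1-D sensitivity analysis can be sketched from scratch in base R. This is an illustrative approximation, not rminer's implementation: the synthetic data d, the stand-in model (the theoretical sin1reg function from the Examples section) and the normalization of the measure into a relative importance that sums to one are all assumptions made for the sketch.

```r
set.seed(1)
n <- 200
# Synthetic inputs (assumed range 0..1000); x3 has no effect on the output.
d <- data.frame(x1 = runif(n, 0, 1000),
                x2 = runif(n, 0, 1000),
                x3 = runif(n, 0, 1000))

# Stand-in "model": theoretical function with weights 0.7 and 0.3.
model <- function(data) 0.7 * sin(pi * data$x1 / 2000) + 0.3 * sin(pi * data$x2 / 2000)

L <- 6                                      # number of levels (RealL)
baseline <- as.data.frame(lapply(d, mean))  # baseline = "mean"

# For each input: hold the others at the baseline, vary it over L regular
# levels, query the model and summarize the responses with the gradient measure.
value <- sapply(names(d), function(a) {
  scan <- baseline[rep(1, L), , drop = FALSE]
  scan[[a]] <- seq(min(d[[a]]), max(d[[a]]), length.out = L)
  y <- model(scan)
  mean(abs(diff(y)))
})

imp <- value / sum(value)  # relative importance (assumed normalization)
print(round(imp, 3))
```

In this sketch x1 receives the largest share, x2 a smaller one and x3 none, matching the ordering expected from the weights of the theoretical function.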
----------Value----------
A list with the components:
$value -- numeric vector with the computed sensitivity analysis measure for each attribute.
$imp -- numeric vector with the relative importance for each attribute.
$sresponses -- vector list as described in the Value documentation of mining.
----------Note----------
See also http://www3.dsi.uminho.pt/pcortez/rminer.html
----------Author(s)----------
Paulo Cortez http://www3.dsi.uminho.pt/pcortez
----------References----------
To cite the Importance function or sensitivity analysis method, please use:
P. Cortez and M. Embrechts. Opening Black Box Data Mining Models Using Sensitivity Analysis. In Proceedings of the 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), pp. 341-348, Paris, France, April 2011.
http://www3.dsi.uminho.pt/pcortez/
----------See Also----------
vecplot, fit, mining, mgraph, mmetric, savemining.
----------Examples----------
### Typical use under rminer:
# 1st example, regression, 1-D sensitivity analysis
data(sin1reg) # x1 should account for 70%, x2 for 30% and x3 for 0%.
M=fit(y~.,sin1reg,model="svm")
I=Importance(M,sin1reg,method="sens",measure="gradient") # 1-D SA
print(I)
L=list(runs=1,sen=t(I$imp),sresponses=I$sresponses)
mgraph(L,graph="IMP",leg=names(sin1reg),col="gray",Grid=10)
mgraph(L,graph="VEC",xval=1,Grid=10,data=sin1reg) # or:
vecplot(I,xval=1,Grid=10,data=sin1reg,datacol="gray") # the same graph
vecplot(I,xval=c(1,2,3),pch=c(1,2,3),Grid=10,
        leg=list(pos="bottomright",leg=c("x1","x2","x3"))) # all x1, x2 and x3 VEC curves
# 2nd example, regression, 2-D sensitivity analysis with
# the most relevant input (x1, index 1):
I2=Importance(M,sin1reg,method="sensg",interactions=which.max(I$imp))
print(I2)
# influence of x1 and x2 over y
vecplot(I2,graph="VEC3",xval=2) # VEC surface
vecplot(I2,graph="VECC",xval=2) # VEC contour
# influence of x1 and x3 over y (influence of x3 is small random noise, 0%):
vecplot(I2,graph="VEC3",xval=3)
vecplot(I2,graph="VECC",xval=3)
# 3rd example, regression, full 3-D sensitivity analysis
I3=Importance(M,sin1reg,method="sensg",interactions=c(1,2,3))
print(I3)
I3_1d=avg_imp(I3,c(1)) # 1-D averaging under x1
vecplot(I3_1d,graph="VEC",xval=1,Grid=10)
I3_2d=avg_imp(I3,c(1,2)) # 2-D averaging under the pair x1,x2
vecplot(I3_2d,graph="VEC3")
### If you want to use Importance over your own model:
# 1st example, regression, uses the theoretical sin1reg function
mypred=function(M,data)
{ return (M[1]*sin(pi*data[,1]/M[3])+M[2]*sin(pi*data[,2]/M[3])) }
M=c(0.7,0.3,2000)
# 4 is the column index of y
I=Importance(M,sin1reg,method="sens",measure="gradient",PRED=mypred,outindex=4)
print(I$imp) # x1=72.3% and x2=27.7%
L=list(runs=1,sen=t(I$imp),sresponses=I$sresponses)
mgraph(L,graph="IMP",leg=names(sin1reg),col="gray",Grid=10)
mgraph(L,graph="VEC",xval=1,Grid=10) # equal to:
vecplot(I,graph="VEC",xval=1,Grid=10)
# 2nd example, 3-class classification for iris and lda model:
data(iris)
library(MASS)
predlda=function(M,data) # the PRED function
{ return (predict(M,data)$posterior) }
LDA=lda(Species ~ .,iris, prior = c(1,1,1)/3)
# 5 is the column index of Species
I=Importance(LDA,iris,method="sensg",PRED=predlda,outindex=5)
vecplot(I,graph="VEC",xval=1,Grid=10,TC=1,
        main="1-D VEC for Sepal.Length (x-axis) influence in setosa (prob.)")
# 3rd example, binary classification for setosa iris and lda model:
iris2=iris;iris2$Species=factor(iris$Species=="setosa")
predlda2=function(M,data) # the PRED function
{ return (predict(M,data)$class) }
LDA2=lda(Species ~ .,iris2)
# 5 is the column index of Species
I=Importance(LDA2,iris2,method="sensg",PRED=predlda2,outindex=5)
vecplot(I,graph="VEC",xval=1,
        main="1-D VEC for Sepal.Length (x-axis) influence in setosa (class)",Grid=10)