
R language, ROCR package: performance() function help documentation

Posted on 2012-9-27 22:46:41
performance(ROCR)
R package providing performance(): ROCR

                                        Function to create performance objects

                                         Translator: 生物统计家园网 robot LoveR

----------Description----------

All kinds of predictor evaluations are performed using this function.


----------Usage----------


performance(prediction.obj, measure, x.measure="cutoff", ...)



----------Arguments----------

Argument: prediction.obj
An object of class prediction.


Argument: measure
Performance measure to use for the evaluation. A complete list of the performance measures that are available for measure and x.measure is given in the 'Details' section.


Argument: x.measure
A second performance measure. If it differs from the default, a two-dimensional curve is created, with x.measure as the unit along the x axis and measure as the unit along the y axis. This curve is parametrized by the cutoff.


Argument: ...
Optional arguments (specific to individual performance measures).
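
As a rough illustration of how these arguments combine, the sketch below calls performance() once with only measure and once with both measure and x.measure, using the ROCR.simple data set that ships with ROCR. The slot names used to read the axis labels (x.name, y.name, alpha.name) are taken from ROCR's performance class rather than from this page, so treat them as an assumption.

```r
library(ROCR)
data(ROCR.simple)

# Build the prediction object that performance() expects.
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)

# measure only: accuracy against the default x.measure, the cutoff.
acc <- performance(pred, measure = "acc")

# measure and x.measure: tpr against fpr, parametrized by the cutoff.
roc <- performance(pred, measure = "tpr", x.measure = "fpr")

# Axis labels stored on the objects (slot names assumed from ROCR's
# performance class).
print(c(x = acc@x.name, y = acc@y.name))
print(c(x = roc@x.name, y = roc@y.name, parameter = roc@alpha.name))
```

With only measure given, the cutoff itself is the x axis; with both given, the cutoff becomes the curve parameter.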


----------Details----------

Here is the list of available performance measures. Let Y and Yhat be random variables representing the class and the prediction for a randomly drawn sample, respectively. We denote by + and - the positive and negative class, respectively. Further, we use the following abbreviations for empirical quantities: P (# positive samples), N (# negative samples), TP (# true positives), TN (# true negatives), FP (# false positives), FN (# false negatives).
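
To make the notation concrete, here is a small base-R sketch that counts P, N, TP, TN, FP and FN at a single cutoff and then forms the (TP+TN)/(P+N), FP/N and TP/P estimates that appear in the list. The scores and labels are made up for illustration, and the rule "predict + when the score is at least the cutoff" is an assumption of this sketch, not something stated on this page.

```r
# Toy data (hypothetical): classifier scores and true classes, 1 = "+", 0 = "-".
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
labels <- c(1,   1,   0,   1,   0,   1,   0,   0)

# Assumed decision rule for this sketch: predict "+" when score >= cutoff.
cutoff <- 0.5
yhat <- as.integer(scores >= cutoff)

P  <- sum(labels == 1)              # positive samples
N  <- sum(labels == 0)              # negative samples
TP <- sum(yhat == 1 & labels == 1)  # true positives
TN <- sum(yhat == 0 & labels == 0)  # true negatives
FP <- sum(yhat == 1 & labels == 0)  # false positives
FN <- sum(yhat == 0 & labels == 1)  # false negatives

counts <- c(P = P, N = N, TP = TP, TN = TN, FP = FP, FN = FN)
print(counts)

# A few of the estimates built from these counts.
acc <- (TP + TN) / (P + N)  # accuracy
fpr <- FP / N               # false positive rate
tpr <- TP / P               # true positive rate
print(c(acc = acc, fpr = fpr, tpr = tpr))
```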

  


acc: Accuracy. P(Yhat = Y). Estimated as: (TP+TN)/(P+N).

err: Error rate. P(Yhat != Y). Estimated as: (FP+FN)/(P+N).

fpr: False positive rate. P(Yhat = + | Y = -). Estimated as: FP/N.

fall: Fallout. Same as fpr.

tpr: True positive rate. P(Yhat = + | Y = +). Estimated as: TP/P.

rec: Recall. Same as tpr.

sens: Sensitivity. Same as tpr.

fnr: False negative rate. P(Yhat = - | Y = +). Estimated as: FN/P.

miss: Miss. Same as fnr.

tnr: True negative rate. P(Yhat = - | Y = -). Estimated as: TN/N.

spec: Specificity. Same as tnr.

ppv: Positive predictive value. P(Y = + | Yhat = +). Estimated as: TP/(TP+FP).

prec: Precision. Same as ppv.

npv: Negative predictive value. P(Y = - | Yhat = -). Estimated as: TN/(TN+FN).

pcfall: Prediction-conditioned fallout. P(Y = - | Yhat = +). Estimated as: FP/(TP+FP).

pcmiss: Prediction-conditioned miss. P(Y = + | Yhat = -). Estimated as: FN/(TN+FN).

rpp: Rate of positive predictions. P(Yhat = +). Estimated as: (TP+FP)/(TP+FP+TN+FN).

rnp: Rate of negative predictions. P(Yhat = -). Estimated as: (TN+FN)/(TP+FP+TN+FN).

phi: Phi correlation coefficient. (TP*TN - FP*FN)/(sqrt((TP+FN)*(TN+FP)*(TP+FP)*(TN+FN))). Yields a number between -1 and 1, with 1 indicating a perfect prediction and 0 indicating a random prediction. Values below 0 indicate a worse-than-random prediction.

mat: Matthews correlation coefficient. Same as phi.

mi: Mutual information. I(Yhat, Y) := H(Y) - H(Y | Yhat), where H is the (conditional) entropy. Entropies are estimated naively (no bias correction).

chisq: Chi square test statistic. See ?chisq.test for details. Note that R might raise a warning if the sample size is too small.

odds: Odds ratio. (TP*TN)/(FN*FP). Note that the odds ratio produces Inf or NA values for all cutoffs corresponding to FN=0 or FP=0. This can substantially decrease the plotted cutoff region.

lift: Lift value. P(Yhat = + | Y = +)/P(Yhat = +).

f: Precision-recall F measure (van Rijsbergen, 1979). Weighted harmonic mean of precision (P) and recall (R). F = 1/(alpha*1/P + (1-alpha)*1/R). If alpha=1/2, the mean is balanced. A frequent equivalent formulation is F = (beta^2+1) * P * R / (R + beta^2 * P). In this formulation, the mean is balanced if beta=1. Currently, ROCR only accepts the alpha version as input (e.g. alpha=0.5). If no value for alpha is given, the mean will be balanced by default.

rch: ROC convex hull. A ROC (= tpr vs fpr) curve with concavities (which represent suboptimal choices of cutoff) removed (Fawcett 2001). Since the result is already a parametric performance curve, it cannot be used in combination with other measures.

auc: Area under the ROC curve. This is equal to the value of the Wilcoxon-Mann-Whitney test statistic and also to the probability that the classifier will score a randomly drawn positive sample higher than a randomly drawn negative sample. Since the output of auc is cutoff-independent, this measure cannot be combined with other measures into a parametric curve. The partial area under the ROC curve up to a given false positive rate can be calculated by passing the optional parameter fpr.stop=0.5 (or any other value between 0 and 1) to performance.

prbe: Precision-recall break-even point. The cutoff(s) where precision and recall are equal. At this point, positive and negative predictions are made at the same rate as their prevalence in the data. Since the output of prbe is just a cutoff-independent scalar, this measure cannot be combined with other measures into a parametric curve.

cal: Calibration error. The calibration error is the absolute difference between predicted confidence and actual reliability. This error is estimated at all cutoffs by sliding a window across the range of possible cutoffs. The default window size of 100 can be adjusted by passing the optional parameter window.size=200 to performance. E.g., if for several positive samples the output of the classifier is around 0.75, you might expect from a well-calibrated classifier that the fraction of them which is correctly predicted as positive is also around 0.75. In a well-calibrated classifier, the probabilistic confidence estimates are realistic. Only for use with probabilistic output (i.e. scores between 0 and 1).

mxe: Mean cross-entropy. Only for use with probabilistic output. MXE := -1/(P+N) * ( \sum_{y_i=+} ln(yhat_i) + \sum_{y_i=-} ln(1-yhat_i) ). Since the output of mxe is just a cutoff-independent scalar, this measure cannot be combined with other measures into a parametric curve.

rmse: Root-mean-squared error. Only for use with numerical class labels. RMSE := sqrt(1/(P+N) \sum_i (y_i - yhat_i)^2). Since the output of rmse is just a cutoff-independent scalar, this measure cannot be combined with other measures into a parametric curve.

sar: Score combining performance measures of different characteristics, in an attempt to create a more "robust" measure (cf. Caruana R., ROCAI2004): SAR = 1/3 * ( Accuracy + Area under the ROC curve + Root mean-squared error ).

ecost: Expected cost. For details on cost curves, cf. Drummond & Holte 2000, 2004. ecost has an obligatory x axis, the so-called 'probability-cost function'; thus it cannot be combined with other measures. While with ecost one is interested in the lower envelope of a set of lines, it might be instructive to plot the whole set of lines in addition to the lower envelope. An example is given in demo(ROCR).

cost: Cost of a classifier when class-conditional misclassification costs are explicitly given. Accepts the optional parameters cost.fp and cost.fn, by which the costs for false positives and negatives can be adjusted, respectively. By default, both are set to 1.
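
Several measures in the list take optional parameters or return a single cutoff-independent number. The sketch below, run on the ROCR.simple data set bundled with ROCR, shows one way to pass fpr.stop, alpha, cost.fp and cost.fn. Reading results from the x.values and y.values slots relies on ROCR's performance class, which this page does not describe, and the 1:5 cost ratio is an arbitrary illustrative choice.

```r
library(ROCR)
data(ROCR.simple)
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)

# Cutoff-independent scalars are stored as the single y value.
auc  <- performance(pred, "auc")@y.values[[1]]
pauc <- performance(pred, "auc", fpr.stop = 0.5)@y.values[[1]]

# Balanced F measure (alpha = 0.5) at every cutoff.
f <- performance(pred, "f", alpha = 0.5)

# Cost per cutoff when a false negative costs five times a false positive
# (illustrative ratio only).
cost <- performance(pred, "cost", cost.fp = 1, cost.fn = 5)
cheapest <- which.min(cost@y.values[[1]])
best_cutoff <- cost@x.values[[1]][cheapest]

print(auc)
print(pauc)
print(best_cutoff)
```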


----------Value----------

An S4 object of class performance.
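
Because the result is an S4 object, its contents are read with the @ operator. The following sketch assumes the x.values and y.values slots of ROCR's performance class (documented under performance-class, not on this page) and uses them to find the cutoff with the highest accuracy on ROCR.simple.

```r
library(ROCR)
data(ROCR.simple)
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
acc <- performance(pred, "acc")

# Each slot is a list with one element per run; ROCR.simple has a single run.
curve <- data.frame(cutoff   = acc@x.values[[1]],
                    accuracy = acc@y.values[[1]])

# Cutoff with the highest accuracy.
best <- curve[which.max(curve$accuracy), ]
print(best)
```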


----------Note----------

Here is how to call 'performance' to create some standard evaluation plots:

  


ROC curves: measure="tpr", x.measure="fpr".

Precision/recall graphs: measure="prec", x.measure="rec".

Sensitivity/specificity plots: measure="sens", x.measure="spec".

Lift charts: measure="lift", x.measure="rpp".
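
As an illustration of the last combination, a lift chart might be produced as follows. This is only a sketch on the ROCR.simple data set bundled with ROCR; the other three plot types follow the same pattern with the measure pairs listed above.

```r
library(ROCR)
data(ROCR.simple)
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)

# Lift chart: lift on the y axis against the rate of positive predictions.
lift <- performance(pred, measure = "lift", x.measure = "rpp")
plot(lift)
```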


----------Author(s)----------


Tobias Sing tobias.sing@mpi-sb.mpg.de,
Oliver Sander osander@mpi-sb.mpg.de



----------References----------



----------See Also----------

prediction, prediction-class


----------Examples----------


## computing a simple ROC curve (x-axis: fpr, y-axis: tpr)
library(ROCR)
data(ROCR.simple)
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred, "tpr", "fpr")
plot(perf)

## precision/recall curve (x-axis: recall, y-axis: precision)
perf1 <- performance(pred, "prec", "rec")
plot(perf1)

## sensitivity/specificity curve (x-axis: specificity,
## y-axis: sensitivity)
perf1 <- performance(pred, "sens", "spec")
plot(perf1)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


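Notes:
Note 1: To make study easier, the original post was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R, and 生物统计家园 retains the copyright.
Note 2: The Chinese text of the original post was machine-translated, so inaccuracies are unavoidable; compare it carefully against the English text when in doubt.
Note 3: If you find an inaccuracy, please reply to the original post so that it can be revised over time.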