cvRlars(robustHD)
cvRlars() belongs to the R package robustHD
Cross-validation along a robust least angle regression sequence
----------Description----------
Estimate the prediction error of submodels along a robust least angle regression sequence via (repeated) K-fold cross-validation.
----------Usage----------
cvRlars(x, ...)
## S3 method for class 'formula'
cvRlars(formula, data, ...)
## Default S3 method:
cvRlars(x, y, cost = rtmspe, K = 5,
        R = 1,
        foldType = c("random", "consecutive", "interleaved"),
        folds = NULL, selectBest = c("min", "hastie"),
        seFactor = 1, active, s = NULL, regFun = lmrob,
        regArgs = list(), seed = NULL, ...)
----------Arguments----------
Argument: formula
a formula describing the full model.
Argument: data
an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which cvRlars is called.
Argument: x
a matrix or data frame containing the candidate predictors.
Argument: y
a numeric vector containing the response.
Argument: cost
a robust cost function measuring prediction loss. It should accept vectors as its first two arguments, the first containing the observed values of the response and the second the predicted values, and must return a non-negative scalar value. The default is to use the root trimmed mean squared prediction error (see cost).
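The contract for cost described above can be illustrated with a minimal sketch in base R. The function name trimmedRmspe and its trim argument are hypothetical and chosen for illustration only; this is not the package's own rtmspe implementation.

```r
# Hypothetical cost function with the shape cvRlars() expects:
# observed values first, predicted values second, non-negative scalar out.
trimmedRmspe <- function(y, yHat, trim = 0.25) {
  sqErr <- sort((y - yHat)^2)                        # squared errors, ascending
  h <- length(sqErr) - floor(length(sqErr) * trim)   # number of errors kept
  sqrt(mean(sqErr[seq_len(h)]))                      # root of the trimmed mean
}

y    <- c(1, 2, 3, 4)
yHat <- c(1, 2, 3, 14)                    # one grossly wrong prediction
robustLoss <- trimmedRmspe(y, yHat)            # largest squared error is trimmed
plainLoss  <- trimmedRmspe(y, yHat, trim = 0)  # no trimming: plain RMSPE
```

Trimming the largest squared errors keeps a few badly predicted outliers from dominating the loss. Under the interface documented here, such a function would be supplied as cost = trimmedRmspe, with extra arguments such as trim passed through the ... argument.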
Argument: K
an integer giving the number of groups into which the data should be split (the default is five). Keep in mind that this should be chosen such that all groups are of approximately equal size. Setting K equal to n yields leave-one-out cross-validation.
Argument: R
an integer giving the number of replications for repeated K-fold cross-validation. This is ignored for leave-one-out cross-validation and other non-random splits of the data.
Argument: foldType
a character string specifying the type of folds to be generated. Possible values are "random" (the default), "consecutive" or "interleaved".
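The three fold types can be sketched in base R as follows. The helper makeFolds is hypothetical and returns one fold label per observation, which is a simplification and not necessarily how cvFolds stores its result.

```r
# Hypothetical helper: assign each of n observations to one of K folds.
makeFolds <- function(n, K, type = c("random", "consecutive", "interleaved")) {
  type <- match.arg(type)
  labels <- rep(seq_len(K), length.out = n)  # 1, 2, ..., K, 1, 2, ...
  switch(type,
         random      = sample(labels),  # random assignment, same fold sizes
         consecutive = sort(labels),    # contiguous blocks of observations
         interleaved = labels)          # every K-th observation shares a fold
}

interleaved <- makeFolds(7, 3, "interleaved")
consecutive <- makeFolds(7, 3, "consecutive")
set.seed(1)                             # random folds depend on the seed
randomFolds <- makeFolds(7, 3, "random")
```

In this sketch only the "random" type changes between calls, which is consistent with replications (R) being ignored for non-random splits.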
Argument: folds
an object of class "cvFolds" giving the folds of the data for cross-validation (as returned by cvFolds). If supplied, this takes precedence over K and R.
Argument: selectBest
a character string specifying a criterion for selecting the best model. Possible values are "min" (the default) or "hastie". The former selects the model with the smallest prediction error. The latter selects the most parsimonious model whose prediction error is no larger than seFactor standard errors above the prediction error of the best overall model.
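The "hastie" criterion described above can be sketched in base R. The function selectHastie is hypothetical, and it assumes the submodels are ordered from most to least parsimonious, as they would be along a sequence of steps.

```r
# Hypothetical sketch of the "hastie" rule: pick the smallest model whose
# prediction error is within seFactor standard errors of the best model.
selectHastie <- function(cv, se, seFactor = 1) {
  best <- which.min(cv)                        # model with smallest error
  threshold <- cv[best] + seFactor * se[best]  # largest tolerated error
  which(cv[seq_len(best)] <= threshold)[1]     # first model within tolerance
}

cvErr <- c(2.0, 1.2, 1.05, 1.0, 1.1)   # made-up prediction errors per step
seErr <- c(0.3, 0.2, 0.10, 0.1, 0.1)   # made-up standard errors per step
minChoice    <- which.min(cvErr)            # the "min" criterion
hastieChoice <- selectHastie(cvErr, seErr)  # the "hastie" criterion
```

With these made-up numbers, "min" picks step 4 while "hastie" settles for the smaller model at step 3, whose error of 1.05 lies within one standard error of the minimum.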
Argument: seFactor
a numeric value giving a multiplication factor of the standard error for the selection of the best model. This is ignored if selectBest is "min".
Argument: active
an integer vector containing the sequence of predictor groups (as returned by rlars).
Argument: s
an integer vector giving the steps of the submodels for which to estimate the prediction errors (the default is to use all steps along the sequence as long as there are twice as many observations as predictors).
Argument: regFun
a function to compute robust linear regressions (defaults to lmrob).
Argument: regArgs
a list of arguments to be passed to regFun.
Argument: seed
optional initial seed for the random number generator (see .Random.seed).
Argument: ...
additional arguments to be passed to the prediction loss function cost.
----------Value----------
An object of class "cvSeqModel" (which inherits from class "cvSelect") with the following components:
Component: n
an integer giving the number of observations.
Component: K
an integer giving the number of folds used in cross-validation.
Component: R
an integer giving the number of replications used in cross-validation.
Component: best
an integer giving the index of the submodel with the best prediction performance.
Component: cv
a data frame containing the estimated prediction errors for the submodels. For repeated cross-validation, those are average values over all replications.
Component: se
a data frame containing the estimated standard errors of the prediction loss for the submodels.
Component: selectBest
a character string specifying the criterion used for selecting the best model.
Component: seFactor
a numeric value giving the multiplication factor of the standard error used for the selection of the best model.
Component: reps
a data frame containing the estimated prediction errors for the submodels from all replications. This is only returned for repeated cross-validation.
Component: call
the matched function call.
----------Author(s)----------
Andreas Alfons
----------See Also----------
repCV.rlars, rlars, predict.rlars, cvFolds, cost
----------Examples----------
library("robustHD")

## generate data
# example is not high-dimensional to keep computation time low
set.seed(1234)  # for reproducibility
n <- 100        # number of observations
p <- 25         # number of variables
beta <- rep.int(c(1, 0), c(5, p-5))  # coefficients
sigma <- 0.5    # controls signal-to-noise ratio
epsilon <- 0.1  # contamination level
x <- replicate(p, rnorm(n))     # predictor matrix
e <- rnorm(n)                   # error terms
i <- 1:ceiling(epsilon*n)       # observations to be contaminated
e[i] <- e[i] + 5                # vertical outliers
y <- c(x %*% beta + sigma * e)  # response
x[i,] <- x[i,] + 5              # bad leverage points

## obtain robust LARS sequence
active <- rlars(x, y, fit = FALSE)

## evaluate models along sequence
# includeSE is passed on to the cost function via ...
cv <- cvRlars(x, y, active = active, selectBest = "hastie",
              includeSE = TRUE)
cv
dotplot(cv)