cvSparseLTS(robustHD)
cvSparseLTS() belongs to the R package robustHD
Cross-validation for sparse LTS regression models
Translator: 生物统计家园网, robot LoveR
----------Description----------
Estimate the prediction error of sparse least trimmed squares regression models for various values of the penalty parameter via (repeated) K-fold cross-validation.
----------Usage----------
cvSparseLTS(x, ...)
## S3 method for class 'formula'
cvSparseLTS(formula, data, ...)
## Default S3 method:
cvSparseLTS(x, y, cost = rtmspe,
            K = 5, R = 1,
            foldType = c("random", "consecutive", "interleaved"),
            folds = NULL, fit = c("reweighted", "raw", "both"),
            selectBest = c("min", "hastie"), seFactor = 1, lambda,
            mode = c("lambda", "fraction"), alpha = 0.75,
            intercept = TRUE, nsamp = c(500, 10),
            initial = c("sparse", "hyperplane", "random"),
            ncstep = 2, use.correction = TRUE,
            tol = .Machine$double.eps^0.5,
            eps = .Machine$double.eps, use.Gram = TRUE,
            centerFun = median, scaleFun = mad, const = 2,
            prob = 0.95, fallback = FALSE, seed = NULL, ...)
----------Arguments----------
Argument: formula
a formula describing the model.
Argument: data
an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which cvSparseLTS is called.
Argument: x
a numeric matrix containing the predictor variables.
Argument: y
a numeric vector containing the response variable.
Argument: cost
a robust cost function measuring prediction loss. It should expect vectors to be passed as its first two arguments, the first corresponding to the observed values of the response and the second to the predicted values, and must return a non-negative scalar value. The default is to use the root trimmed mean squared prediction error (see cost).
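As a rough sketch of what such a cost function can look like, the base R function below computes a root trimmed mean squared prediction error. The name my_rtmspe and the trim argument with its default of 0.25 are assumptions made for illustration; the default rtmspe function may differ in its details.

```r
# Hypothetical cost function: root trimmed mean squared prediction error.
# y: observed responses, yHat: predicted values,
# trim: assumed fraction of the largest squared errors to discard.
my_rtmspe <- function(y, yHat, trim = 0.25) {
  r2 <- (y - yHat)^2                 # squared prediction errors
  n <- length(r2)
  h <- n - floor(n * trim)           # number of squared errors kept
  sqrt(mean(sort(r2)[seq_len(h)]))   # root mean of the h smallest
}

y    <- c(1, 2, 3, 4, 100)           # last observation is an outlier
yHat <- c(1.5, 2.5, 2.5, 3.5, 4)
err  <- my_rtmspe(y, yHat)           # the outlier's error is trimmed away
```

A function with this signature could be supplied as cost = my_rtmspe, with extra arguments such as trim passed through the ... argument.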
Argument: K
an integer giving the number of groups into which the data should be split (the default is five). Keep in mind that this should be chosen such that all groups are of approximately equal size. Setting K equal to n yields leave-one-out cross-validation.
Argument: R
an integer giving the number of replications for repeated K-fold cross-validation. This is ignored for leave-one-out cross-validation and other non-random splits of the data.
Argument: foldType
a character string specifying the type of folds to be generated. Possible values are "random" (the default), "consecutive" or "interleaved".
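To make the three fold types concrete, the following base R sketch builds fold labels for a small example. It only illustrates the general idea under the assumption that all three types use the same fold sizes; it does not reproduce how cvFolds constructs its folds internally.

```r
# Illustrative fold assignments for n observations and K folds (base R only).
n <- 10
K <- 3

# "interleaved": observations are assigned to folds 1, 2, ..., K in turn
interleaved <- rep(seq_len(K), length.out = n)

# "consecutive": each fold is a block of neighbouring observations,
# with the same fold sizes as in the interleaved case
consecutive <- rep.int(seq_len(K), tabulate(interleaved, nbins = K))

# "random": the same fold sizes, but assigned in random order
set.seed(1)
random <- sample(interleaved)

# one row per fold type, one column per observation
rbind(interleaved, consecutive, random)
```

Only the "random" type changes from one replication to the next, which is why R is ignored for the other two.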
Argument: folds
an object of class "cvFolds" giving the folds of the data for cross-validation (as returned by cvFolds). If supplied, this is preferred over K and R.
Argument: fit
a character string specifying for which fit to estimate the prediction error. Possible values are "reweighted" (the default) for the prediction error of the reweighted fit, "raw" for the prediction error of the raw fit, or "both" for the prediction error of both fits.
Argument: selectBest
a character string specifying a criterion for selecting the best model. Possible values are "min" (the default) or "hastie". The former selects the model with the smallest prediction error. The latter selects the model with the largest penalty parameter whose prediction error is no larger than seFactor standard errors above the prediction error of the best overall model.
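The two selection rules can be sketched in base R as follows. The numbers are made up purely for illustration and are not results from any fitted model. The sketch also assumes that the standard error used in the threshold is the one of the model with the smallest prediction error.

```r
# Made-up numbers for illustration only, not results from any model.
lambda   <- c(0.25, 0.20, 0.15, 0.10, 0.05)   # penalty parameters
cv_err   <- c(0.90, 0.70, 0.62, 0.60, 0.65)   # estimated prediction errors
se       <- c(0.05, 0.04, 0.04, 0.03, 0.04)   # their standard errors
seFactor <- 1

# "min": the model with the smallest prediction error
best_min <- which.min(cv_err)

# "hastie": among the models whose error is within seFactor standard errors
# of the minimum, take the one with the largest penalty parameter
# (assumption: the standard error of the minimizing model is used)
threshold   <- cv_err[best_min] + seFactor * se[best_min]
within      <- which(cv_err <= threshold)
best_hastie <- within[which.max(lambda[within])]

lambda[c(best_min, best_hastie)]
```

With a larger seFactor, the "hastie" rule tolerates more prediction error in exchange for a larger penalty and hence a sparser model.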
Argument: seFactor
a numeric value giving a multiplication factor of the standard error for the selection of the best model. This is ignored if selectBest is "min".
Argument: lambda
a numeric vector of non-negative values giving the penalty parameters for which to estimate the prediction error.
Argument: mode
a character string specifying the type of penalty parameter. If "lambda", lambda gives the penalty parameter directly. If "fraction", the smallest value of the penalty parameter that sets all coefficients to 0 is first estimated based on bivariate winsorization, then lambda gives the fraction of that estimate to be used (hence lambda should be in the interval [0,1] in that case).
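The relation between the two modes amounts to a simple rescaling, sketched below. The value of lambda0_hat is a hypothetical placeholder for the data-based estimate described above, not a value produced by the package.

```r
# Hypothetical estimate of the smallest penalty that sets all
# coefficients to 0; in practice this is estimated from the data.
lambda0_hat <- 2.4

# With mode = "fraction", lambda holds fractions in [0, 1]
frac <- seq(0.25, 0.05, by = -0.05)
stopifnot(all(frac >= 0 & frac <= 1))

# Penalty parameters that would correspond to mode = "lambda"
lambda_abs <- frac * lambda0_hat
lambda_abs
```

Working with fractions keeps the grid meaningful across data sets, since the scale of a useful penalty depends on the data.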
Argument: alpha
a numeric value giving the percentage of the residuals for which the L1 penalized sum of squares should be minimized (the default is 0.75).
Argument: intercept
a logical indicating whether a constant term should be included in the model (the default is TRUE).
Argument: nsamp
a numeric vector giving the number of subsamples to be used in the two phases of the algorithm. The first element gives the number of initial subsamples to be used. The second element gives the number of subsamples to keep after the first phase of ncstep C-steps. For those remaining subsets, additional C-steps are performed until convergence. The default is to first perform ncstep C-steps on 500 initial subsamples, and then to keep the 10 subsamples with the lowest value of the objective function for additional C-steps until convergence.
Argument: initial
a character string specifying the type of initial subsamples to be used. If "sparse", the lasso fit given by three randomly selected data points is first computed. The corresponding initial subsample is then formed by the fraction alpha of data points with the smallest squared residuals. Note that this is optimal from a robustness point of view, as the probability of including an outlier in the initial lasso fit is minimized. If "hyperplane", a hyperplane through p randomly selected data points is first computed, where p denotes the number of variables. The corresponding initial subsample is then again formed by the fraction alpha of data points with the smallest squared residuals. Note that this cannot be applied if p is larger than the number of observations. Nevertheless, the probability of including an outlier increases with increasing dimension p. If "random", the initial subsamples are given by a fraction alpha of randomly selected data points. Note that this leads to the largest probability of including an outlier.
Argument: ncstep
a positive integer giving the number of C-steps to perform on all subsamples in the first phase of the algorithm (the default is to perform two C-steps).
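The idea of a C-step can be sketched in base R as follows. To stay self-contained, the sketch uses ordinary least squares via lm.fit in place of the lasso fit, so it illustrates plain least trimmed squares rather than sparse LTS. The subset size h = floor(alpha * n) and the helper name fit_subset are assumptions for illustration.

```r
# Sketch of C-steps with ordinary least squares standing in for the lasso.
# The subset size h = floor(alpha * n) is an assumption for illustration.
set.seed(42)
n <- 50
x <- cbind(1, rnorm(n))                  # intercept and one predictor
y <- drop(x %*% c(1, 2)) + rnorm(n, sd = 0.5)
y[1:5] <- y[1:5] + 10                    # a few vertical outliers

alpha <- 0.75
h <- floor(alpha * n)

# Fit on a subset; return the sum of squared residuals on that subset
# together with the squared residuals of all observations.
fit_subset <- function(subset) {
  beta <- lm.fit(x[subset, , drop = FALSE], y[subset])$coefficients
  r2 <- drop(y - x %*% beta)^2
  list(objective = sum(r2[subset]), r2 = r2)
}

subset <- sample(n, h)                   # a "random" initial subsample
fit <- fit_subset(subset)
objective <- fit$objective
for (step in 1:2) {                      # ncstep = 2 C-steps
  subset <- order(fit$r2)[seq_len(h)]    # keep the h smallest squared residuals
  fit <- fit_subset(subset)
  objective <- c(objective, fit$objective)
}
objective                                # never increases from step to step
```

Each C-step refits on the h observations that the current fit explains best, so the objective cannot increase. Running only a few steps on many subsamples is a cheap way to screen them before iterating the most promising ones to convergence.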
Argument: use.correction
currently ignored. The consistency factor for the residual scale estimate is always applied.
Argument: tol
a small positive numeric value giving the tolerance for convergence.
Argument: eps
a small positive numeric value used to determine whether the variability within a variable is too small (an effective zero).
Argument: use.Gram
a logical indicating whether the Gram matrix of the explanatory variables should be precomputed in the lasso fits (the default is TRUE). If the number of variables is large (e.g., larger than the number of observations), computation may be faster when this is set to FALSE.
Argument: centerFun
a function to compute a robust estimate for the center to be used for robust standardization (defaults to median). Ignored if standardized is TRUE.
Argument: scaleFun
a function to compute a robust estimate for the scale to be used for robust standardization (defaults to mad). Ignored if standardized is TRUE.
Argument: const
numeric; tuning constant to be used in univariate winsorization (defaults to 2).
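A minimal sketch of univariate winsorization shows the role of the tuning constant. It assumes standardization with median and mad and shrinks standardized values beyond +/- const to the boundary. The helper name winsorize_sketch is hypothetical, and the package's own implementation may differ.

```r
# Sketch of univariate winsorization with median/MAD standardization.
winsorize_sketch <- function(v, const = 2) {
  center <- median(v)
  scale <- mad(v)
  z <- (v - center) / scale            # robustly standardized values
  z <- pmin(pmax(z, -const), const)    # shrink values beyond +/- const
  z * scale + center                   # transform back to the original scale
}

v <- c(-1, 0, 1, 2, 3, 50)             # 50 is an outlier
w <- winsorize_sketch(v)
w                                      # only the outlier is pulled inward
```

A smaller const shrinks more observations, which is more robust but discards more information from clean data.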
Argument: prob
numeric; probability for the quantile of the chi-squared distribution to be used in multivariate winsorization (defaults to 0.95).
Argument: fallback
a logical indicating whether standardization with mean and sd should be performed as a fallback mode for variables whose robust scale estimate is too small. This is useful, e.g., for data containing dummy variables.
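The following sketch shows why a fallback is useful for dummy variables. The helper standardize_sketch and its use of eps as the threshold for "too small" are assumptions for illustration, not the package's internal code.

```r
# Sketch: robust standardization with a fallback to mean/sd.
# The helper name and the use of eps as threshold are assumptions.
standardize_sketch <- function(v, centerFun = median, scaleFun = mad,
                               fallback = FALSE,
                               eps = .Machine$double.eps) {
  center <- centerFun(v)
  scale <- scaleFun(v)
  if (scale <= eps && fallback) {      # robust scale is effectively zero
    center <- mean(v)
    scale <- sd(v)
  }
  (v - center) / scale
}

dummy <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 0)   # dummy variable, mostly zeros
mad(dummy)                                  # robust scale collapses to 0
z <- standardize_sketch(dummy, fallback = TRUE)
z
```

Without the fallback, dividing by a zero scale would produce non-finite values for such a variable.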
Argument: seed
optional initial seed for the random number generator (see .Random.seed).
Argument: ...
additional arguments to be passed to the prediction loss function cost.
----------Value----------
An object of class "cvSparseLTS" (which inherits from classes "cvTuning" and "cvSelect") with the following components:
Component: n
an integer giving the number of observations.
Component: K
an integer giving the number of folds.
Component: R
an integer giving the number of replications.
Component: tuning
a data frame containing the values of the penalty parameter for which the prediction error was estimated.
Component: best
an integer vector giving the indices of the optimal penalty parameter for the requested model fits.
Component: cv
a data frame containing the estimated prediction errors of the requested model fits for all values of the penalty parameter.
Component: se
a data frame containing the estimated standard errors of the prediction loss for the requested model fits for all values of the penalty parameter.
Component: selectBest
a character string specifying the criterion used for selecting the best model.
Component: seFactor
a numeric value giving the multiplication factor of the standard error used for the selection of the best model.
Component: reps
a data frame containing the estimated prediction errors of the requested model fits from all replications and for all values of the penalty parameter. This is only returned for repeated cross-validation.
Component: seed
the seed of the random number generator before cross-validation was performed.
Component: call
the matched function call.
----------Author(s)----------
Andreas Alfons
----------See Also----------
repCV.sparseLTS, sparseLTS, predict.sparseLTS, cvFolds, cost
----------Examples----------
## generate data
# example is not high-dimensional to keep computation time low
library("robustHD")  # provides cvSparseLTS()
library("mvtnorm")   # provides rmvnorm()
set.seed(1234)  # for reproducibility
n <- 100  # number of observations
p <- 25   # number of variables
beta <- rep.int(c(1, 0), c(5, p-5))  # coefficients
sigma <- 0.5    # controls signal-to-noise ratio
epsilon <- 0.1  # contamination level
# correlation matrix with entries 0.5^|i-j|
Sigma <- 0.5^t(sapply(1:p, function(i, j) abs(i-j), 1:p))
x <- rmvnorm(n, sigma=Sigma)  # predictor matrix
e <- rnorm(n)                 # error terms
i <- 1:ceiling(epsilon*n)     # observations to be contaminated
e[i] <- e[i] + 5              # vertical outliers
y <- c(x %*% beta + sigma * e)  # response
x[i,] <- x[i,] + 5            # bad leverage points

## evaluate sparse LTS models over a grid of values for lambda
frac <- seq(0.25, 0.05, by = -0.05)
# includeSE is passed on to the cost function via ...
cv <- cvSparseLTS(x, y, lambda = frac, mode = "fraction",
                  selectBest = "hastie", includeSE = TRUE)
cv
plot(cv)