R package robustHD: help documentation for the sparseLTS() function

loveR · posted 2012-9-27 22:25:20

sparseLTS(robustHD)
sparseLTS() belongs to the R package: robustHD

                                    Sparse least trimmed squares regression

                                    Translator: LoveR, the robot of 生物统计家园网

----------Description----------

Compute least trimmed squares regression with an L1 penalty on the regression coefficients, which allows for sparse model estimates.

----------Usage----------

sparseLTS(x, ...)

## S3 method for class 'formula'
sparseLTS(formula, data, ...)

## Default S3 method:
sparseLTS(x, y, lambda,
          mode = c("lambda", "fraction"), alpha = 0.75,
          intercept = TRUE, nsamp = c(500, 10),
          initial = c("sparse", "hyperplane", "random"),
          ncstep = 2, use.correction = TRUE,
          tol = .Machine$double.eps^0.5,
          eps = .Machine$double.eps, use.Gram = TRUE,
          seed = NULL, model = TRUE, ...)

----------Arguments----------

Argument: formula
a formula describing the model.

Argument: data
an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which sparseLTS is called.

Argument: x
a numeric matrix containing the predictor variables.

Argument: y
a numeric vector containing the response variable.

Argument: lambda
a non-negative numeric value giving the penalty parameter.

Argument: mode
a character string specifying the type of penalty parameter. If "lambda", lambda gives the penalty parameter directly. If "fraction", the smallest value of the penalty parameter that sets all coefficients to 0 is first estimated based on bivariate winsorization, and lambda then gives the fraction of that estimate to be used (hence lambda should lie in the interval [0,1] in that case).

Argument: alpha
a numeric value giving the proportion of the residuals for which the L1 penalized sum of squares should be minimized (the default is 0.75).
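
As a rough illustration of how alpha relates to the subset size h (reported as the component quan in the return value), the sketch below assumes the simple rounding rule h = ceiling(alpha * n). Both that rule and the helper name subset_size are assumptions made for illustration; the package may round differently.

```r
# Hypothetical helper: number of observations kept for a given alpha.
# Assumption: h = ceiling(alpha * n); the package's exact rule may differ.
subset_size <- function(n, alpha = 0.75) {
  stopifnot(alpha > 0, alpha <= 1)
  ceiling(alpha * n)
}

h <- subset_size(100, alpha = 0.75)  # 75 of 100 observations are kept
```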

Argument: intercept
a logical indicating whether a constant term should be included in the model (the default is TRUE).

Argument: nsamp
a numeric vector giving the number of subsamples to be used in the two phases of the algorithm. The first element gives the number of initial subsamples to be used. The second element gives the number of subsamples to keep after the first phase of ncstep C-steps. For those remaining subsets, additional C-steps are performed until convergence. The default is to first perform ncstep C-steps on 500 initial subsamples, and then to keep the 10 subsamples with the lowest value of the objective function for additional C-steps until convergence.

Argument: initial
a character string specifying the type of initial subsamples to be used. If "sparse", the lasso fit given by three randomly selected data points is first computed. The corresponding initial subsample is then formed by the fraction alpha of data points with the smallest squared residuals. Note that this is optimal from a robustness point of view, as the probability of including an outlier in the initial lasso fit is minimized. If "hyperplane", a hyperplane through p randomly selected data points is first computed, where p denotes the number of variables. The corresponding initial subsample is then again formed by the fraction alpha of data points with the smallest squared residuals. Note that this cannot be applied if p is larger than the number of observations. Even when it can be applied, the probability of including an outlier grows with the dimension p. If "random", the initial subsamples are given by a fraction alpha of randomly selected data points. Note that this leads to the largest probability of including an outlier.
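
The two ways of forming an initial subsample described above can be sketched in base R. This is only an illustration: the helper smallest_residual_subset and the toy residual vector are hypothetical, the subset size assumes h = ceiling(alpha * n), and the residuals would in practice come from the initial lasso or hyperplane fit rather than from rnorm().

```r
set.seed(1)
n <- 20
alpha <- 0.75
h <- ceiling(alpha * n)  # assumed rounding rule, for illustration only

# initial = "random": a fraction alpha of randomly selected observations
random_subset <- sample.int(n, h)

# initial = "sparse" or "hyperplane": keep the h observations with the
# smallest squared residuals from a fit on a few random points
smallest_residual_subset <- function(residuals, h) {
  order(residuals^2)[seq_len(h)]
}

# toy residuals: 18 well-fitted points plus two gross outliers (19 and 20)
r <- c(rnorm(n - 2), 10, -12)
residual_subset <- smallest_residual_subset(r, h)
```

Because the residual-based subset ranks observations by squared residual, the two gross outliers in the toy data are not selected, whereas the purely random subset offers no such protection.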

Argument: ncstep
a positive integer giving the number of C-steps to perform on all subsamples in the first phase of the algorithm (the default is to perform two C-steps).
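
A C-step refits the model on the current subset and then moves to the h observations with the smallest squared residuals under that fit. The sketch below shows the mechanics only and makes a deliberate substitution: it uses ordinary least squares (lm.fit) in place of the lasso, so it is the classical LTS C-step, not the sparse one performed by sparseLTS(). The function name c_step and the simulated data are hypothetical.

```r
# One C-step with an OLS stand-in for the lasso fit (illustration only)
c_step <- function(x, y, subset) {
  h <- length(subset)
  # fit on the current subset, with an intercept column
  fit <- lm.fit(cbind(1, x[subset, , drop = FALSE]), y[subset])
  # residuals of ALL observations under that fit
  r <- y - drop(cbind(1, x) %*% fit$coefficients)
  # new subset: the h observations with the smallest squared residuals
  sort(order(r^2)[seq_len(h)])
}

set.seed(1)
n <- 50
x <- matrix(rnorm(n * 2), n, 2)
y <- drop(x %*% c(1, -1)) + rnorm(n, sd = 0.1)
y[1:5] <- y[1:5] + 20              # vertical outliers

subset <- sort(sample.int(n, 38))  # random initial subsample
for (k in 1:2) {                   # two C-steps, mirroring ncstep = 2
  subset <- c_step(x, y, subset)
}
```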

Argument: use.correction
currently ignored. Small sample correction factors may be added in the future.

Argument: tol
a small positive numeric value giving the tolerance for convergence.

Argument: eps
a small positive numeric value used to determine whether the variability within a variable is too small (an effective zero).

Argument: use.Gram
a logical indicating whether the Gram matrix of the explanatory variables should be precomputed in the lasso fits (the default is TRUE). If the number of variables is large (e.g., larger than the number of observations), computation may be faster when this is set to FALSE.

Argument: seed
optional initial seed for the random number generator (see .Random.seed).

Argument: model
a logical indicating whether the data x and y should be added to the return object. If intercept is TRUE, a column of ones is added to x to account for the intercept.

Argument: ...
additional arguments to be passed down.

----------Value----------

An object of class "sparseLTS" with the following components:

Component: best
an integer vector giving the best subset of h observations found and used for computing the raw estimates.

Component: objective
numeric; the value of the sparse LTS objective function, i.e., the L1 penalized sum of the h smallest squared residuals from the raw fit.
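
The objective can be written out as a small base R function. This is a sketch under stated assumptions, not the package's internal code: the name sparse_lts_objective is hypothetical, and the penalty is assumed to be scaled by h (that is, h * lambda * sum(abs(beta))), which this page does not state explicitly.

```r
# L1 penalized sum of the h smallest squared residuals (illustration only).
# Assumption: the penalty term is h * lambda * sum(|beta|).
sparse_lts_objective <- function(x, y, beta, lambda, h, intercept = 0) {
  r2 <- sort((y - intercept - drop(x %*% beta))^2)
  sum(r2[seq_len(h)]) + h * lambda * sum(abs(beta))
}

x <- diag(4)
y <- c(1, 2, 3, 100)       # the last observation is an outlier
beta <- c(1, 2, 0, 0)      # sparse coefficient vector
obj <- sparse_lts_objective(x, y, beta, lambda = 0.5, h = 3)
# residuals are 0, 0, 3, 100; the 3 smallest squares sum to 9,
# and the penalty is 3 * 0.5 * 3 = 4.5, so obj is 13.5
```

Trimming to the h smallest squared residuals is what keeps the outlying fourth observation from dominating the criterion.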

Component: coefficients
a numeric vector of coefficient estimates of the reweighted fit (including the intercept if intercept is TRUE).

Component: fitted.values
a numeric vector containing the fitted values of the response of the reweighted fit.

Component: residuals
a numeric vector containing the residuals of the reweighted fit.

Component: center
a numeric value giving the robust center estimate of the reweighted residuals.

Component: scale
a numeric value giving the robust scale estimate of the reweighted residuals.

Component: lambda
a numeric value giving the penalty parameter.

Component: intercept
a logical indicating whether the model includes a constant term.

Component: alpha
a numeric value giving the proportion of the residuals for which the L1 penalized sum of squares was minimized.

Component: quan
the number h of observations used to compute the raw estimates.

Component: cnp2
a numeric value giving the consistency factor applied to the scale estimate of the reweighted residuals.

Component: weights
an integer vector containing binary weights that indicate outliers, i.e., the weights are 1 for observations with reasonably small reweighted residuals and 0 for observations with large reweighted residuals.
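
How binary outlier weights of this kind can be derived from residuals is sketched below. The function flag_outliers is hypothetical, the cutoff qnorm(0.9875) is an assumed value, and median() and mad() merely stand in for the robust center and scale estimates that the fitted object reports; none of this is taken from the package source.

```r
# Weight 1 for small standardized residuals, 0 for large ones.
# Assumptions: cutoff qnorm(0.9875); median/MAD as stand-in estimates.
flag_outliers <- function(residuals, center, scale,
                          cutoff = qnorm(0.9875)) {
  as.integer(abs((residuals - center) / scale) <= cutoff)
}

r <- c(-0.3, 0.1, 0.4, -0.2, 6)   # the last residual is clearly large
w <- flag_outliers(r, center = median(r), scale = mad(r))
outliers <- which(w == 0)         # indices of flagged observations
```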

Component: df
an integer giving the degrees of freedom of the obtained reweighted model fit, i.e., the number of nonzero coefficient estimates.

Component: raw.coefficients
a numeric vector of coefficient estimates of the raw fit (including the intercept if intercept is TRUE).

Component: raw.residuals
a numeric vector containing the residuals of the raw fit.

Component: raw.center
a numeric value giving the robust center estimate of the raw residuals.

Component: raw.scale
a numeric value giving the robust scale estimate of the raw residuals.

Component: raw.cnp2
a numeric value giving the consistency factor applied to the scale estimate of the raw residuals.

Component: raw.weights
an integer vector containing binary weights that indicate outliers of the raw fit, i.e., the weights used for the reweighted fit.

Component: x
the predictor matrix (if model is TRUE).

Component: y
the response variable (if model is TRUE).

Component: call
the matched function call.

----------Note----------

Package robustHD has a built-in back end for sparse least trimmed squares using the C++ library Armadillo. Another back end is available through package sparseLTSEigen, which uses the C++ library Eigen. The latter is faster, but not available on all platforms. For instance, sparseLTSEigen currently does not work on 32-bit R for Windows. In addition, there is currently no binary package for OS X available on CRAN due to problems with the PowerPC architecture. Nevertheless, OS X users with Intel machines can install RcppEigen and sparseLTSEigen from source if the standard R developer tools are installed.

----------Author(s)----------

Andreas Alfons

----------See Also----------

sparseLTSGrid, coef.sparseLTS, fitted.sparseLTS, plot.sparseLTS, predict.sparseLTS, residuals.sparseLTS, weights.sparseLTS, ltsReg

----------Examples----------

## generate data
# example is not high-dimensional to keep computation time low
library("mvtnorm")
library("robustHD")  # provides sparseLTS()
set.seed(1234)  # for reproducibility
n <- 100  # number of observations
p <- 25   # number of variables
beta <- rep.int(c(1, 0), c(5, p-5))  # coefficients
sigma <- 0.5    # controls signal-to-noise ratio
epsilon <- 0.1  # contamination level
Sigma <- 0.5^t(sapply(1:p, function(i, j) abs(i-j), 1:p))
x <- rmvnorm(n, sigma=Sigma)    # predictor matrix
e <- rnorm(n)                   # error terms
i <- 1:ceiling(epsilon*n)       # observations to be contaminated
e[i] <- e[i] + 5                # vertical outliers
y <- c(x %*% beta + sigma * e)  # response
x[i,] <- x[i,] + 5              # bad leverage points

## fit sparse LTS model
fit <- sparseLTS(x, y, lambda = 0.05, mode = "fraction")
coef(fit, zeros = FALSE)
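
The covariance matrix in the example has entries 0.5^|i - j|, which the sapply() construction somewhat obscures. The base R check below builds the same matrix with outer() and compares the two; the names Sigma_sapply and Sigma_outer are introduced here for illustration only.

```r
p <- 25

# construction used in the example above
Sigma_sapply <- 0.5^t(sapply(1:p, function(i, j) abs(i-j), 1:p))

# equivalent construction: entry (i, j) is 0.5^|i - j|
Sigma_outer <- 0.5^abs(outer(1:p, 1:p, "-"))

same <- isTRUE(all.equal(Sigma_sapply, Sigma_outer))
```

Correlation between two predictors therefore decays geometrically with the distance between their column indices.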

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).
