R package SeleMix: ml.est() function help documentation (Chinese-English bilingual)

loveR · posted on 2012-9-30 00:29:42

ml.est(SeleMix)
ml.est() belongs to the R package: SeleMix

                                       Fitting Contamination Model

                                       Translator: LoveR, the translation robot of the Biostatistics Home website (biostatistic.net)

----------Description----------

Provides maximum likelihood (ML) estimates of the parameters of a Gaussian contamination model.

----------Usage----------

ml.est (y, x=NULL, model = "LN", lambda=3,  w=0.05,
         lambda.fix=FALSE, w.fix=FALSE, eps=1e-7,
         max.iter=500, t.outl=0.5, graph=FALSE)

----------Arguments----------

Argument: y
matrix or data frame containing the response variables

Argument: x
optional matrix or data frame containing the error-free covariates

Argument: model
data distribution: "LN" = lognormal (default), "N" = normal

Argument: lambda
starting value for the variance inflation factor (default = 3)

Argument: w
starting value for the proportion of contaminated data (default = 0.05)

Argument: lambda.fix
logical; TRUE if lambda is known, in which case the supplied value is held fixed rather than estimated

Argument: w.fix
logical; TRUE if w is known, in which case the supplied value is held fixed rather than estimated

Argument: eps
epsilon: tolerance parameter for log-likelihood convergence (default = 1e-7)

Argument: max.iter
maximum number of EM iterations (default = 500)

Argument: t.outl
threshold on the posterior probability of being contaminated; units above it are flagged as outliers (default = 0.5)

Argument: graph
logical; TRUE to display graphics (default = FALSE)
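The arguments lambda, w, eps and max.iter control an EM fit. Below is a minimal sketch of such an iteration for the simplest case: a single y variable, no covariates and model = "N". The function name em_contam_1d is hypothetical; it uses conditional-maximization (ECM) updates, keeps lambda >= 1 and stops when the absolute change in log-likelihood falls below eps. These are assumptions for illustration, not SeleMix's actual implementation.

```r
# Hypothetical sketch, not SeleMix code: ECM fit of
#   y ~ (1 - w) N(mu, sigma2) + w N(mu, lambda * sigma2)
em_contam_1d <- function(y, lambda = 3, w = 0.05, eps = 1e-7, max.iter = 500) {
  n <- length(y)
  mu <- mean(y)
  sigma2 <- var(y)
  # Observed-data log-likelihood of the two-component mixture
  loglik <- function(mu, sigma2, lambda, w) {
    sum(log((1 - w) * dnorm(y, mu, sqrt(sigma2)) +
              w * dnorm(y, mu, sqrt(lambda * sigma2))))
  }
  ll_old <- loglik(mu, sigma2, lambda, w)
  is.conv <- FALSE
  for (iter in seq_len(max.iter)) {
    # E-step: posterior probability that each unit is contaminated
    f_ok <- (1 - w) * dnorm(y, mu, sqrt(sigma2))
    f_err <- w * dnorm(y, mu, sqrt(lambda * sigma2))
    tau <- f_err / (f_ok + f_err)
    # CM-steps: update each parameter with the others held fixed
    w <- mean(tau)
    a <- (1 - tau) + tau / lambda   # precision weights relative to sigma2
    mu <- sum(a * y) / sum(a)
    sigma2 <- sum(a * (y - mu)^2) / n
    # Assumed constraint: the inflation factor is at least 1
    lambda <- max(1, sum(tau * (y - mu)^2) / (sigma2 * sum(tau)))
    # Stop when the log-likelihood changes by less than eps
    ll_new <- loglik(mu, sigma2, lambda, w)
    if (abs(ll_new - ll_old) < eps) {
      is.conv <- TRUE
      break
    }
    ll_old <- ll_new
  }
  list(mu = mu, sigma2 = sigma2, lambda = lambda, w = w, tau = tau,
       is.conv = is.conv, n.iter = iter)
}
```

For example, after fit <- em_contam_1d(y, max.iter = 5000), fit$tau, fit$is.conv and fit$n.iter play roles analogous to the tau, is.conv and n.iter components returned by ml.est.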

----------Details----------

This function provides the parameter estimates of a contamination model in which a set of y variables is assumed to depend on a (possibly empty) set of covariates (x variables) through a mixture of two linear regressions with Gaussian residuals. The covariance matrices of the two mixture components are assumed to be proportional, with proportionality constant lambda. When there are no x variables, a mixture of two Gaussian distributions is estimated. BIC and AIC scores (bic.aic) are returned for both the standard Gaussian model and the contamination model, to help the user avoid possible over-parametrisation.
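In formula form, the model described above can be sketched as follows, in our own notation: phi(.; m, S) is the Gaussian density with mean m and covariance S, and both components are assumed to share the regression coefficients B, as the single returned B suggests. With model = "LN" the model presumably applies to log y. The information criteria use the standard definitions, with maximized log-likelihood \hat\ell, k free parameters and n units; the exact convention behind bic.aic may differ.

```latex
f(y_i \mid x_i) = (1-w)\,\phi\!\left(y_i;\ B^\top x_i,\ \Sigma\right)
                + w\,\phi\!\left(y_i;\ B^\top x_i,\ \lambda\Sigma\right)

\mathrm{BIC} = -2\hat{\ell} + k \log n, \qquad \mathrm{AIC} = -2\hat{\ell} + 2k
```

Under these assumptions the contamination model adds at most two free parameters (lambda and w, fewer if lambda.fix or w.fix is TRUE) to the plain Gaussian model, and that extra complexity is what the criteria penalize.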

Based on the estimated model parameters, a matrix of predictions of the "true" y values (ypred) is computed. Each unit in the dataset is assigned a flag (outlier) that takes the value 1 if its posterior probability of being erroneous (tau) is greater than the user-specified threshold (t.outl), and 0 otherwise.
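To show how tau and the outlier flag relate, here is a sketch for one y variable without covariates under the Gaussian model, with parameters fixed by hand rather than estimated. The helper posterior_tau is hypothetical and not part of SeleMix; it simply applies Bayes' rule to the two mixture components.

```r
# Hypothetical helper (not part of SeleMix): posterior probability of contamination
# for one y variable without covariates under the Gaussian ("N") model
posterior_tau <- function(y, mu, sigma2, lambda, w) {
  f_ok  <- (1 - w) * dnorm(y, mean = mu, sd = sqrt(sigma2))     # error-free component
  f_err <- w * dnorm(y, mean = mu, sd = sqrt(lambda * sigma2))  # inflated-variance component
  f_err / (f_ok + f_err)
}

y <- c(9.8, 10.1, 10.4, 14.0, 3.5)
tau <- posterior_tau(y, mu = 10, sigma2 = 1, lambda = 20, w = 0.05)
outlier <- as.integer(tau > 0.5)   # the threshold plays the role of t.outl = 0.5
outlier
# [1] 0 0 0 1 1
```

Values close to the centre get tau near 0, while the two far-off values are attributed to the inflated-variance component and flagged.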

The model is estimated using complete observations. Missing values in the x variables are not allowed; however, the y variables may be partly observed. Robust predictions of the y variables are provided even when they are not observed. A vector of missing-data patterns (pattern) indicates which items are observed and which are missing.
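The following base-R sketch derives a 0/1 indicator (1 = observed, 0 = missing) for each unit and y variable and picks out the complete observations. It is an illustration only; the per-unit string code is an assumption and may not match how ml.est actually encodes pattern.

```r
# Hypothetical illustration, not SeleMix's internal code
y <- rbind(c(1.2, 3.4),
           c(NA,  2.1),
           c(0.7, NA),
           c(1.9, 4.0))
colnames(y) <- c("Y1", "Y2")

# 1 = observed, 0 = missing, for each unit (row) and y variable (column)
obs <- ifelse(is.na(y), 0L, 1L)

# One code per unit; this string encoding is an assumption
pattern <- apply(obs, 1, paste, collapse = "")

# Units with all y variables observed, i.e. the complete observations
complete <- rowSums(obs) == ncol(y)
```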

When model = "LN" is specified, each zero value is replaced by 1e-7 and a warning is issued.
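A sketch of the zero handling described above, as a hypothetical helper named prepare_lognormal. It mimics the documented behaviour (zeros replaced by 1e-7, with a warning); taking logs afterwards is an assumption about how the lognormal model is fitted, not taken from the SeleMix source.

```r
# Hypothetical helper mimicking the documented behaviour for model = "LN"
prepare_lognormal <- function(y, zero_value = 1e-7) {
  y <- as.matrix(y)
  n_zero <- sum(y == 0, na.rm = TRUE)
  if (n_zero > 0) {
    warning(n_zero, " zero value(s) replaced by ", zero_value)
    y[!is.na(y) & y == 0] <- zero_value   # leave missing values untouched
  }
  log(y)  # assumption: the lognormal model is fitted on the log scale
}

ly <- suppressWarnings(prepare_lognormal(c(5, 0, 12)))
```

Replacing zeros with a tiny positive value keeps log(y) finite, but such units end up far out in the lower tail and are likely to receive a high tau.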

To monitor the EM algorithm graphically (graph = TRUE), a scatter plot is shown in which outliers are marked as soon as they are identified. The trajectory of the lambda parameter is also shown until convergence.

----------Value----------

ml.est returns a list containing the following components:

ypred: matrix of predicted values for the y variables
B: matrix of estimated regression coefficients
sigma: estimated covariance matrix
lambda: estimated variance inflation factor
w: estimated proportion of erroneous data
tau: vector of posterior probabilities of being contaminated
outlier: 1 if the observation is classified as an outlier, 0 otherwise
pattern: vector of non-response patterns for the y variables: 0 = missing, 1 = present value
is.conv: logical value; TRUE if the EM algorithm has converged
n.iter: number of iterations of the EM algorithm
bic.aic: Bayesian Information Criterion and Akaike Information Criterion for the contaminated and non-contaminated Gaussian models

----------Author(s)----------

M. Teresa Buglielli <bugliell@istat.it>, Ugo Guarnera <guarnera@istat.it>

----------References----------

Buglielli, M.T., Di Zio, M., Guarnera, U. (2010) "Use of Contamination Models for Selective Editing",  European Conference on Quality in Survey Statistics Q2010, Helsinki, 4-6 May 2010

----------Examples----------

      library(SeleMix)
# Parameter estimation with one contaminated variable and one covariate
      data(ex1.data)
      ml.par <- ml.est(y = ex1.data[, "Y1"], x = ex1.data[, "X1"], graph = TRUE)
      str(ml.par)
      sum(ml.par$outlier)  # number of outliers
# Parameter estimation with two contaminated variables and no covariates
      data(ex2.data)
      par.joint <- ml.est(y = ex2.data, x = NULL, graph = TRUE)
      sum(par.joint$outlier)  # number of outliers

When reposting, please credit the source: Biostatistics Home (http://www.biostatistic.net).

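Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the translation robot of the Biostatistics Home website. It is intended only for personal reference when learning R; Biostatistics Home retains the copyright.
Note 2: Because the translation was produced automatically by a robot, some inaccuracies are unavoidable. Compare the Chinese and English content carefully when using it; doing so can also help with learning R.
Note 3: If you find an inaccuracy, please reply at the end of this post, and we will revise it over time.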
