
R language, VGAM package: negbinomial() function help documentation

Posted on 2012-10-1 15:45:43
negbinomial(VGAM)
negbinomial() belongs to R package: VGAM

                                         Negative Binomial Distribution Family Function

                                         Translator: 生物统计家园网, robot LoveR

----------Description----------

Maximum likelihood estimation of the two parameters of a negative binomial distribution.


----------Usage----------


negbinomial(lmu = "loge", lsize = "loge", emu = list(), esize = list(),
            imu = NULL, isize = NULL, quantile.probs = 0.75,
            nsimEIM = 100, cutoff = 0.995,
            Maxiter = 5000, deviance.arg = FALSE, imethod = 1,
            parallel = FALSE, shrinkage.init = 0.95, zero = -2)
polya(lprob = "logit", lsize = "loge", eprob = list(), esize = list(),
      iprob = NULL, isize = NULL, quantile.probs = 0.75, nsimEIM = 100,
      deviance.arg = FALSE, imethod = 1, shrinkage.init = 0.95, zero = -2)



----------Arguments----------

Argument: lmu, lsize, lprob
Link functions applied to the mu, k and p parameters. See Links for more choices. Note that the mu, k and p parameters are the mu, size and prob arguments of rnbinom, respectively. Common alternatives for lsize are nloge and reciprocal.
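
As a rough illustration of what these link choices mean for the index parameter k, the base R sketch below applies each transformation by hand. It assumes that nloge is the negative log link and reciprocal is 1/k, so both can be read in terms of alpha = 1/k; the variable names are made up for this example and are not VGAM objects.

```r
# Hand-rolled versions of three links for k (assumed behaviour, not VGAM code).
k <- c(0.5, 1, 2, 10)

eta_loge       <- log(k)    # loge:       eta = log(k)
eta_nloge      <- -log(k)   # nloge:      eta = -log(k) = log(alpha), alpha = 1/k
eta_reciprocal <- 1 / k     # reciprocal: eta = 1/k = alpha

# Inverse transformations recover k from each linear predictor.
k_from_loge       <- exp(eta_loge)
k_from_nloge      <- exp(-eta_nloge)
k_from_reciprocal <- 1 / eta_reciprocal

print(cbind(k, eta_loge, eta_nloge, eta_reciprocal))
```

The reciprocal link keeps the linear predictor on the alpha scale but does not force it to stay positive, which is one reason a log-type link is the usual default.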


Argument: emu, esize, eprob
List. Extra argument for each of the links. See earg in Links for general information.


Argument: imu, isize, iprob
Optional initial values for the mean and k and p. For k, if convergence fails, try different values (and/or use imethod). For an S-column response, isize can be of length S. A value of NULL means an initial value for each response is computed internally using a range of values. The last argument is ignored if used within cqo; see the iKvector argument of qrrvglm.control instead.


Argument: quantile.probs
Passed into the probs argument of quantile when imethod = 3 to obtain an initial value for the mean.


Argument: nsimEIM
This argument is used for computing the diagonal element of the expected information matrix (EIM) corresponding to k. See CommonVGAMffArguments for more information and the note below.


Argument: cutoff
Used in the finite series approximation. A numeric value that is close to 1 but never exactly 1. It specifies how many terms of the infinite series for the second diagonal element of the EIM are actually used. The probabilities are summed until the total reaches this value or more (but no more than Maxiter terms are allowed). It is like specifying p in an imaginary function qnegbin(p).


Argument: Maxiter
Used in the finite series approximation. Integer. The maximum number of terms allowed when computing the second diagonal element of the EIM. In theory, the value involves an infinite series. If this argument is too small then the value may be inaccurate.
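
To make the roles of cutoff and Maxiter concrete, here is a small base R sketch of how a truncation point could be chosen. It is only an illustration under assumed values of mu and k, using qnbinom as the "qnegbin(p)" analogue; it is not the code VGAM runs internally.

```r
# Assumed illustrative parameter values (not taken from a fitted model).
mu <- 50
k  <- 2
cutoff  <- 0.995
Maxiter <- 5000

# Smallest count whose cumulative probability reaches cutoff.
y_upper <- qnbinom(cutoff, size = k, mu = mu)

# Terms y = 0, 1, ..., y_upper are summed, capped at Maxiter terms.
n_terms <- min(y_upper + 1, Maxiter)

# Probability mass actually covered by the truncated series.
covered <- pnbinom(n_terms - 1, size = k, mu = mu)
print(c(y_upper = y_upper, n_terms = n_terms, covered = covered))
```

With a large mean or a small k the upper count grows quickly, which is when the Maxiter cap starts to matter.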


Argument: deviance.arg
Logical. If TRUE, the deviance function is attached to the object. Under ordinary circumstances, it should be left alone because it really assumes the index parameter is at the maximum likelihood estimate. Consequently, one cannot use that criterion to minimize within the IRLS algorithm. It should be set to TRUE only when used with cqo under the fast algorithm.


Argument: imethod
An integer with value 1, 2 or 3 which specifies the initialization method for the mu parameter. If convergence fails, try another value, and/or specify a value for shrinkage.init, and/or specify a value for isize.


Argument: parallel
See CommonVGAMffArguments for more information. Setting parallel = TRUE is useful in order to get something similar to quasipoissonff or what is known as NB-1. The parallelism constraint does not apply to any intercept term. You should set zero = NULL too if parallel = TRUE to avoid a conflict.


Argument: shrinkage.init
How much shrinkage is used when initializing mu. The value must be between 0 and 1 inclusive; a value of 0 means the individual response values are used, and a value of 1 means the median or mean is used. This argument is used in conjunction with imethod. If convergence fails, try setting this argument to 1.
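
One way to picture the shrinkage is as a weighted average between each observation and a central value. The sketch below assumes exactly that convex combination; init_mu() is a hypothetical helper written for this illustration, and the real initialization inside negbinomial() may differ in detail.

```r
# Hypothetical helper (not a VGAM function): shrink each response toward a
# central value. shrinkage.init = 0 keeps y, shrinkage.init = 1 gives the centre.
init_mu <- function(y, shrinkage.init = 0.95, center = mean) {
  stopifnot(shrinkage.init >= 0, shrinkage.init <= 1)
  shrinkage.init * center(y) + (1 - shrinkage.init) * y
}

y <- c(0, 1, 1, 3, 8, 40)
print(init_mu(y, shrinkage.init = 0))      # the individual responses
print(init_mu(y, shrinkage.init = 1))      # every value equals mean(y)
print(init_mu(y, 0.95, center = median))   # mostly the median, slightly perturbed
```

Heavy shrinkage keeps a single outlier such as 40 from producing a wild starting value, which is consistent with the advice to try 1 when convergence fails.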


Argument: zero
Integer valued vector, usually assigned -2 or 2 if used at all. Specifies which of the two linear/additive predictors are modelled as an intercept only. By default, the k parameter (after lsize is applied) is modelled as a single unknown number that is estimated. It can be modelled as a function of the explanatory variables by setting zero = NULL; this has been called an NB-H model by Hilbe (2011). A negative value means that the value is recycled, so setting -2 means all k are intercept-only. See CommonVGAMffArguments for more information.


----------Details----------

The negative binomial distribution can be motivated in several ways, e.g., as a Poisson distribution with a mean that is gamma distributed. There are several common parametrizations of the negative binomial distribution. The one used by negbinomial() uses the mean mu and an index parameter k, both of which are positive. Specifically, the density of a random variable Y is

$$f(y; \mu, k) = \binom{y + k - 1}{y} \left(\frac{\mu}{\mu + k}\right)^{y} \left(\frac{k}{k + \mu}\right)^{k}$$

where y=0,1,2,…, and mu > 0 and k > 0. Note that the dispersion parameter is 1/k, so that as k approaches infinity the negative binomial distribution approaches a Poisson distribution. The response has variance Var(Y)=mu*(1+mu/k). When fitted, the fitted.values slot of the object contains the estimated value of the mu parameter, i.e., of the mean E(Y). Some authors use alpha=1/k as the ancillary or heterogeneity parameter, so common alternatives for lsize are nloge and reciprocal.
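
The mean-and-index parametrization can be checked numerically in base R. The sketch below writes the density in its usual (mu, k) form, which is assumed to be the one intended here, compares it with dnbinom(), and confirms the variance formula Var(Y)=mu*(1+mu/k). The function name nb_density and the values of mu and k are chosen only for illustration.

```r
# Negative binomial density in terms of the mean mu and index k
# (hypothetical helper for illustration).
nb_density <- function(y, mu, k) {
  choose(y + k - 1, y) * (mu / (mu + k))^y * (k / (k + mu))^k
}

mu <- 4
k  <- 1.5

# Agreement with base R's dnbinom(size = k, mu = mu).
p_formula <- nb_density(0:20, mu, k)
p_base    <- dnbinom(0:20, size = k, mu = mu)
print(max(abs(p_formula - p_base)))

# Mean and variance from a long (effectively complete) sum over the support.
y <- 0:2000
w <- dnbinom(y, size = k, mu = mu)
m <- sum(y * w)
v <- sum((y - m)^2 * w)
print(c(mean = m, variance = v, formula = mu * (1 + mu / k)))
```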

For polya the density is

$$f(y; p, k) = \binom{y + k - 1}{y} (1 - p)^{y} p^{k}$$

where y=0,1,2,…, and 0 < p < 1 and k > 0.
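
The two parametrizations describe the same distribution once p is tied to the mean through p = k/(k + mu). The base R sketch below checks this, assuming the (p, k) density has the same form as dnbinom(size = k, prob = p); polya_density is a name made up for the example.

```r
# Density in the (p, k) parametrization (hypothetical helper for illustration).
polya_density <- function(y, p, k) {
  choose(y + k - 1, y) * (1 - p)^y * p^k
}

mu <- 4
k  <- 1.5
p  <- k / (k + mu)   # maps the mean parametrization to the prob parametrization

y <- 0:20
d_polya <- polya_density(y, p, k)
d_prob  <- dnbinom(y, size = k, prob = p)
d_mu    <- dnbinom(y, size = k, mu = mu)
print(max(abs(d_polya - d_prob)))
print(max(abs(d_prob - d_mu)))
```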

The negative binomial distribution can be coerced into the classical GLM framework with one of the parameters being of interest and the other treated as a nuisance/scale parameter (this is implemented in the MASS library). The VGAM family function negbinomial treats both parameters on the same footing, and estimates them both by full maximum likelihood estimation. Simulated Fisher scoring is employed as the default (see the nsimEIM argument).

The parameters mu and k are independent (diagonal EIM), and the confidence region for k is extremely skewed, so its standard error is often of no practical use. The parameter 1/k has been used as a measure of aggregation.

These VGAM family functions handle multivariate responses, so that a matrix can be used as the response. The number of columns is the number of species, say, and setting zero = -2 means that all species have a k equalling a (different) intercept only.


----------Value----------

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.


----------Warning----------

The Poisson model corresponds to k equalling infinity. If the data are Poisson or close to Poisson, numerical problems will occur. Choosing a log-log link may help in such cases; otherwise use poissonff or quasipoissonff.
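
The Poisson limit is easy to see numerically. The base R sketch below, with an arbitrarily chosen mean, measures how far the negative binomial probabilities are from the Poisson ones as k grows.

```r
mu <- 3        # assumed illustrative mean
y  <- 0:15
k_values <- c(1, 10, 100, 1e4, 1e6)

# Largest absolute difference between the NB and Poisson probabilities.
gap <- sapply(k_values, function(k) {
  max(abs(dnbinom(y, size = k, mu = mu) - dpois(y, lambda = mu)))
})
print(data.frame(k = k_values, max_abs_diff = gap))
```

Because the gap only vanishes as k increases without bound, near-Poisson data push the estimate of k (and hence log(k)) toward infinity, which is one plausible reading of why numerical problems arise.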

These functions are fragile; the maximum likelihood estimate of the index parameter is fraught (see Lawless, 1987). In general, quasipoissonff is more robust. Other alternatives to negbinomial are to fit an NB-1 or RR-NB (aka NB-P) model; see Yee (2012). Also available are the NB-C, NB-H and NB-G. Assigning values to the isize argument may lead to a local solution, and smaller values are preferred over large values when using this argument.

Yet to do: write a family function which uses the method of moments estimator for k.


----------Note----------

These two functions implement two common parameterizations of the negative binomial (NB). Some people call the NB with integer k the Pascal distribution, whereas if k is real then this is the Polya distribution. I don't. The one matching the details of rnbinom in terms of p and k is polya().

For polya() the code may fail when p is close to 0 or 1. It is not yet compatible with cqo or cao.

Suppose the response is called ymat. For negbinomial() the diagonal element of the expected information matrix (EIM) for parameter k involves an infinite series; consequently simulated Fisher scoring (see nsimEIM) is the default. This algorithm should definitely be used if max(ymat) is large, e.g., max(ymat) > 300, or if there are any outliers in ymat. A second algorithm involving a finite series approximation can be invoked by setting nsimEIM = NULL. Then the arguments Maxiter and cutoff are pertinent.
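
The two ways of getting the EIM element for k can be mimicked in a few lines of base R. The sketch below is an assumption-laden illustration, not VGAM's implementation: it takes the negative second derivative of the log-likelihood with respect to k (derived from the mean-and-index density) and averages it either over simulated counts or over a truncated series. The helper names neg_d2_k, eim_simulated and eim_series are hypothetical.

```r
# Negative second derivative of the NB log-likelihood with respect to k.
neg_d2_k <- function(y, mu, k) {
  trigamma(k) - trigamma(y + k) - 1 / k + 1 / (mu + k) + (mu - y) / (mu + k)^2
}

# Simulation-based approximation: average over nsim random NB counts.
eim_simulated <- function(mu, k, nsim = 100) {
  ysim <- rnbinom(nsim, size = k, mu = mu)
  mean(neg_d2_k(ysim, mu, k))
}

# Finite-series approximation: sum over y = 0, ..., y_upper, where y_upper is
# set by cutoff and capped so that at most Maxiter terms are used.
eim_series <- function(mu, k, cutoff = 0.995, Maxiter = 5000) {
  y_upper <- min(qnbinom(cutoff, size = k, mu = mu), Maxiter - 1)
  y <- 0:y_upper
  sum(dnbinom(y, size = k, mu = mu) * neg_d2_k(y, mu, k))
}

set.seed(1)
mu <- 5
k  <- 2
sim <- eim_simulated(mu, k, nsim = 1e5)
ser <- eim_series(mu, k, cutoff = 0.999999)
print(c(simulated = sim, series = ser))
```

With only nsim = 100 draws the simulated value is noisy but cheap, while the series needs more terms as the mean grows, which fits the advice to prefer simulation for large counts.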

Regardless of the algorithm used, convergence problems may occur, especially when the response has large outliers or is large in magnitude. If convergence fails, try using the arguments (in recommended decreasing order) nsimEIM, shrinkage.init, imethod, Maxiter, cutoff, isize, zero.

The function negbinomial can be used by the fast algorithm in cqo; however, setting EqualTolerances = TRUE and ITolerances = FALSE is recommended.

In the first example below (Bliss and Fisher, 1953), from each of 6 McIntosh apple trees in an orchard that had been sprayed, 25 leaves were randomly selected. On each of the leaves, the number of adult female European red mites was counted.

There are two special uses of negbinomial for handling count data. Firstly, when used by rrvglm this results in a continuum of models in between and inclusive of quasi-Poisson and negative binomial regression. This is known as a reduced-rank negative binomial model (RR-NB). It fits a negative binomial log-linear regression with variance function Var(Y) = mu + delta1 * mu^delta2 where delta1 and delta2 are parameters to be estimated by MLE. Confidence intervals are available for delta2, so one can decide whether the data are quasi-Poisson or negative binomial (if either).
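
The RR-NB variance function can be related back to the index parameter by equating it with Var(Y) = mu*(1+mu/k), which gives k = mu^(2 - delta2) / delta1. The base R sketch below works through the two endpoints; rrnb_size and nb_var are hypothetical helper names and the parameter values are arbitrary.

```r
# Index k implied by Var(Y) = mu + delta1 * mu^delta2 (hypothetical helper).
rrnb_size <- function(mu, delta1, delta2) mu^(2 - delta2) / delta1

# Negative binomial variance for mean mu and index k.
nb_var <- function(mu, k) mu * (1 + mu / k)

mu     <- c(1, 5, 20)
delta1 <- 0.5

# delta2 = 1: variance proportional to the mean (quasi-Poisson-like).
v1 <- nb_var(mu, rrnb_size(mu, delta1, delta2 = 1))
# delta2 = 2: constant k = 1/delta1, the ordinary negative binomial.
v2 <- nb_var(mu, rrnb_size(mu, delta1, delta2 = 2))

print(rbind(mu = mu, var_delta2_1 = v1, var_delta2_2 = v2))
```

A confidence interval for delta2 that covers 1 but not 2 would therefore point toward the quasi-Poisson end of the continuum, and vice versa.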

Secondly, the use of negbinomial with parallel = TRUE inside vglm can result in a model similar to quasipoissonff. This is named the NB-1 model. The dispersion parameter is estimated by MLE whereas glm uses the method of moments. In particular, it fits a negative binomial log-linear regression with variance function Var(Y) = phi0 * mu where phi0 is a parameter to be estimated by MLE. Confidence intervals are available for phi0.
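
The NB-1 variance function follows from letting the index grow with the mean: setting k = delta0 * mu with delta0 = 1 / (phi0 - 1) turns Var(Y) = mu*(1+mu/k) into phi0 * mu. The base R simulation below checks this at a single, arbitrarily chosen mean.

```r
set.seed(123)
phi0   <- 10                  # target dispersion; should exceed 1
delta0 <- 1 / (phi0 - 1)
mu     <- 20                  # assumed illustrative mean

# Index proportional to the mean gives variance phi0 * mu.
y <- rnbinom(1e5, mu = mu, size = delta0 * mu)
ratio <- var(y) / mean(y)
print(c(mean = mean(y), variance = var(y), ratio = ratio, phi0 = phi0))
```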


----------Author(s)----------


Thomas W. Yee



----------References----------

Lawless, J. F. (1987). Negative binomial and mixed Poisson regression. The Canadian Journal of Statistics 15, 209–225.
Hilbe, J. M. (2011). Negative Binomial Regression, 2nd Edition. Cambridge: Cambridge University Press.
Bliss, C. and Fisher, R. A. (1953). Fitting the negative binomial distribution to biological data. Biometrics 9, 174–200.
Yee, T. W. (2012). Two-parameter reduced-rank vector generalized linear models. In preparation.

----------See Also----------

quasipoissonff, poissonff, zinegbinomial, negbinomial.size (e.g., NB-G), nbcanlink (NB-C), posnegbinomial, invbinomial, rnbinom, nbolf, rrvglm, cao, cqo, CommonVGAMffArguments.


----------Examples----------


# Example 1: apple tree data
appletree <- data.frame(y = 0:7, w = c(70, 38, 17, 10, 9, 3, 2, 1))
fit <- vglm(y ~ 1, negbinomial, appletree, weights = w)
summary(fit)
coef(fit, matrix = TRUE)
Coef(fit)

# Example 2: simulated data with multivariate response
ndata <- data.frame(x2 = runif(nn <- 500))
ndata <- transform(ndata, y1 = rnbinom(nn, mu = exp(3+x2), size = exp(1)),
                          y2 = rnbinom(nn, mu = exp(2-x2), size = exp(0)))
fit1 <- vglm(cbind(y1, y2) ~ x2, negbinomial, ndata, trace = TRUE)
coef(fit1, matrix = TRUE)

# Example 3: large counts so definitely use the nsimEIM argument
ndata <- transform(ndata, y3 = rnbinom(nn, mu = exp(12+x2), size = exp(1)))
with(ndata, range(y3))  # Large counts
fit2 <- vglm(y3 ~ x2, negbinomial(nsimEIM = 100), ndata, trace = TRUE)
coef(fit2, matrix = TRUE)

# Example 4: an NB-1 to estimate a negative binomial with Var(Y) = phi0 * mu
nn <- 1000        # Number of observations
phi0 <- 10        # Specify this; should be greater than unity
delta0 <- 1 / (phi0 - 1)
mydata <- data.frame(x2 = runif(nn), x3 = runif(nn))
mydata <- transform(mydata, mu = exp(2 + 3 * x2 + 0 * x3))
mydata <- transform(mydata, y3 = rnbinom(nn, mu = mu, size = delta0 * mu))
## Not run:
plot(y3 ~ x2, data = mydata, pch = "+", col = 'blue',
     main = paste("Var(Y) = ", phi0, " * mu", sep = ""), las = 1)
## End(Not run)
nb1 <- vglm(y3 ~ x2 + x3, negbinomial(parallel = TRUE, zero = NULL),
            mydata, trace = TRUE)
# Extracting some quantities:
cnb1 <- coef(nb1, matrix = TRUE)
mydiff <- (cnb1["(Intercept)", "log(size)"] - cnb1["(Intercept)", "log(mu)"])
delta0.hat <- exp(mydiff)
(phi.hat <- 1 + 1 / delta0.hat)  # MLE of phi
summary(nb1)
# Obtain a 95 percent confidence interval for phi0:
myvec <- rbind(-1, 1, 0, 0)
(se.mydiff <- sqrt(t(myvec) %*%  vcov(nb1) %*%  myvec))
ci.mydiff <- mydiff + c(-1.96, 1.96) * se.mydiff
ci.delta0 <- ci.exp.mydiff <- exp(ci.mydiff)
(ci.phi0 <- 1 + 1 / rev(ci.delta0))  # The 95 percent conf. interval for phi0

confint_nb1(nb1)  # Quick way to get it

summary(glm(y3 ~ x2 + x3, quasipoisson, mydata))$dispersion  # cf. moment estimator

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

