R language: help documentation for the density() function

loveR · posted 2012-2-17 10:23:15

density(stats)
density() belongs to the R package: stats

                                    Kernel Density Estimation

                                       Translator: LoveR, the robot of 生物统计家园网 (biostatistic.net)

----------Description----------

The (S3) generic function density computes kernel density estimates. Its default method does so with the given kernel and bandwidth for univariate observations.

----------Usage----------

density(x, ...)
## Default S3 method:
density(x, bw = "nrd0", adjust = 1,
        kernel = c("gaussian", "epanechnikov", "rectangular",
                   "triangular", "biweight",
                   "cosine", "optcosine"),
        weights = NULL, window = kernel, width,
        give.Rkern = FALSE,
        n = 512, from, to, cut = 3, na.rm = FALSE, ...)

----------Arguments----------

Argument: x
the data from which the estimate is to be computed.

Argument: bw
the smoothing bandwidth to be used. The kernels are scaled such that this is the standard deviation of the smoothing kernel. (Note that this differs from the reference books cited below, and from S-PLUS.) bw can also be a character string giving a rule to choose the bandwidth. See bw.nrd.
The default, "nrd0", has remained the default for historical and compatibility reasons rather than as a general recommendation; "SJ", for example, would be a better general choice (see also Venables and Ripley (2002)). The specified (or computed) value of bw is multiplied by adjust.
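As a rough illustration of the bandwidth rules described above, the sketch below compares the value stored in the fitted object with the result of calling the rule functions bw.nrd0 and bw.SJ directly. The built-in faithful data set is an assumption chosen only for illustration.

```r
# Assumption: the built-in 'faithful' data set is used purely as sample data.
x <- faithful$eruptions

bw_default <- bw.nrd0(x)   # rule behind bw = "nrd0"
bw_sj      <- bw.SJ(x)     # Sheather-Jones rule behind bw = "SJ"

d_default <- density(x)              # bw = "nrd0" is the default
d_sj      <- density(x, bw = "SJ")

# The bandwidth stored in each result is the one its rule produced
# (times adjust, which is 1 here).
c(nrd0 = d_default$bw, SJ = d_sj$bw)
```

Plotting both fits on the same axes is a quick way to see how much the choice of rule changes the amount of smoothing.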

Argument: adjust
the bandwidth used is actually adjust*bw. This makes it easy to specify values like "half the default" bandwidth.
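A minimal sketch of the "half the default" case, again assuming the built-in faithful data as sample input:

```r
# Assumption: 'faithful' is used only as convenient sample data.
x <- faithful$eruptions

d_full <- density(x)                 # default bandwidth
d_half <- density(x, adjust = 0.5)   # "half the default" bandwidth

# The bw component already includes the adjust factor.
c(full = d_full$bw, half = d_half$bw)
```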

Argument: kernel, window
a character string giving the smoothing kernel to be used. This must be one of "gaussian", "rectangular", "triangular", "epanechnikov", "biweight", "cosine" or "optcosine", with default "gaussian", and may be abbreviated to a unique prefix (single letter). "cosine" is smoother than "optcosine", which is the usual "cosine" kernel in the literature and almost MSE-efficient. However, "cosine" is the version used by S.

Argument: weights
numeric vector of non-negative observation weights, hence of the same length as x. The default NULL is equivalent to weights = rep(1/nx, nx), where nx is the length of (the finite entries of) x[].
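The equivalence between weights = NULL and equal weights can be checked with a short sketch like the one below. The sample data and the choice to pass a fixed numeric bandwidth (so that both calls smooth identically) are assumptions made for the illustration.

```r
# Assumption: 'faithful' is used only as convenient sample data.
x  <- faithful$eruptions
nx <- length(x)
bw <- bw.nrd0(x)   # fixed numeric bandwidth shared by both calls

d_null  <- density(x, bw = bw)                            # weights = NULL
d_equal <- density(x, bw = bw, weights = rep(1/nx, nx))   # explicit equal weights

# Compare the two sets of estimated density values.
all.equal(d_null$y, d_equal$y)
```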

Argument: width
this exists for compatibility with S; if width is given and bw is not, bw is set to width if width is a character string, or to a kernel-dependent multiple of width if it is numeric.

Argument: give.Rkern
logical; if true, no density is estimated, and the "canonical bandwidth" of the chosen kernel is returned instead.

Argument: n
the number of equally spaced points at which the density is to be estimated. When n > 512, it is rounded up to a power of 2 during the calculations (as fft is used) and the final result is interpolated by approx. So it almost always makes sense to specify n as a power of two.
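The following sketch shows that the returned grid always has the requested number of points, whether or not n is a power of two; only the internal computation differs. The sample data are an assumption.

```r
# Assumption: 'faithful' is used only as convenient sample data.
x <- faithful$eruptions

d_512  <- density(x)             # default n = 512
d_600  <- density(x, n = 600)    # not a power of 2: result is interpolated back to 600 points
d_1024 <- density(x, n = 1024)   # a power of 2

# Number of grid points in each result.
sapply(list(d_512, d_600, d_1024), function(d) length(d$x))
```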

Argument: from,to
the left-most and right-most points of the grid at which the density is to be estimated; the defaults are cut * bw outside of range(x).

Argument: cut
by default, the values of from and to are cut bandwidths beyond the extremes of the data. This allows the estimated density to drop to approximately zero at the extremes.
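To make the default grid limits concrete, the sketch below rebuilds them from range(x), cut and the bandwidth, and then shows an explicit from/to that clips the grid to the data range. The sample data are an assumption.

```r
# Assumption: 'faithful' is used only as convenient sample data.
x <- faithful$eruptions

d <- density(x)   # cut = 3 by default
expected <- range(x) + c(-1, 1) * 3 * d$bw

# Default grid limits next to the limits rebuilt by hand.
rbind(grid = range(d$x), expected = expected)

# Explicit limits: evaluate only over the observed data range.
d_clip <- density(x, from = min(x), to = max(x))
range(d_clip$x)
```

Clipping with from and to does not renormalise the estimate; it only restricts where the density is evaluated.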

Argument: na.rm
logical; if TRUE, missing values are removed from x. If FALSE, any missing values cause an error.
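A small sketch of both behaviours, using sample data with two NA values appended as an assumption:

```r
# Assumption: 'faithful' plus two appended NAs, purely for illustration.
x <- c(faithful$eruptions, NA, NA)

# With the default na.rm = FALSE the call fails; capture the error message.
res <- tryCatch(density(x), error = function(e) conditionMessage(e))
res

# With na.rm = TRUE the missing values are dropped before estimation.
d <- density(x, na.rm = TRUE)
d$n   # sample size after the NAs are removed
```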

Argument: ...
further arguments for (non-default) methods.

----------Details----------

The algorithm used in density.default disperses the mass of the empirical distribution function over a regular grid of at least 512 points, uses the fast Fourier transform to convolve this approximation with a discretized version of the kernel, and then uses linear approximation to evaluate the density at the specified points.
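One way to get a feel for this approximation is to compare it with a direct evaluation of the Gaussian kernel estimate at the same grid points. The helper kde_direct below is hypothetical (it is not part of R) and deliberately naive: it skips the binning and FFT steps entirely, so it is a cross-check rather than a description of how density.default is implemented.

```r
# Assumption: 'faithful' is used only as convenient sample data.
x  <- faithful$eruptions
d  <- density(x)   # binned + FFT approximation, Gaussian kernel
bw <- d$bw

# Hypothetical helper (not part of R): exact Gaussian kernel density estimate,
# computed as the average of normal densities centred on the observations.
kde_direct <- function(grid, data, bw) {
  sapply(grid, function(g) mean(dnorm(g, mean = data, sd = bw)))
}

y_direct <- kde_direct(d$x, x, bw)

# Largest absolute difference between the approximation and the direct sum.
max(abs(d$y - y_direct))
```

The direct sum costs time proportional to the number of grid points times the number of observations, which is the cost the grid-and-FFT approach avoids.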

The statistical properties of a kernel are determined by sig^2 (K) = int(t^2 K(t) dt), which is always 1 for our kernels (and hence the bandwidth bw is the standard deviation of the kernel), and R(K) = int(K^2(t) dt).
MSE-equivalent bandwidths (for different kernels) are proportional to sig(K) R(K), which is scale invariant and for our kernels equal to R(K). This value is returned when give.Rkern = TRUE. See the examples for using exact equivalent bandwidths.
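For the Gaussian kernel these two quantities can be checked numerically. The sketch below assumes that the Gaussian kernel with bw = 1 is the standard normal density, which follows from the scaling described above, and uses integrate for the two integrals.

```r
# Gaussian kernel with bw = 1: the standard normal density.
K <- function(t) dnorm(t)

sig2 <- integrate(function(t) t^2 * K(t), -Inf, Inf)$value   # sig^2(K)
RK   <- integrate(function(t) K(t)^2,     -Inf, Inf)$value   # R(K)

# Value reported by density() itself for the same kernel.
RK_reported <- density(kernel = "gaussian", give.Rkern = TRUE)

c(sig2 = sig2, RK = RK, give.Rkern = RK_reported)
```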

Infinite values in x are assumed to correspond to a point mass at +/-Inf, and the density estimate is of the sub-density on (-Inf, +Inf).

----------Value----------

If give.Rkern is true, the number R(K); otherwise an object with class "density" whose underlying structure is a list containing the following components.

Component: x
the n coordinates of the points where the density is estimated.

Component: y
the estimated density values. These will be non-negative, but can be zero.

Component: bw
the bandwidth used.

Component: n
the sample size after elimination of missing values.

Component: call
the call which produced the result.

Component: data.name
the deparsed name of the x argument.

Component: has.na
logical, for compatibility (always FALSE).

The print method reports summary values on the x and y components.
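A brief sketch of inspecting the returned object, with the built-in faithful data as an assumed input. Newer versions of R may include components beyond those listed above, so the sketch only looks up the documented ones.

```r
# Assumption: 'faithful' is used only as convenient sample data.
d <- density(faithful$eruptions)

class(d)       # the object's class
names(d)       # components of the underlying list

d$bw           # bandwidth used
d$n            # sample size after removing missing values
d$data.name    # deparsed name of the x argument
d$has.na       # kept for compatibility

d              # print method: summary of the x and y components
```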

----------References----------

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole (for S version).
Scott, D. W. (1992) Multivariate Density Estimation. Theory, Practice and Visualization. New York: Wiley.
Sheather, S. J. and Jones, M. C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. B, 53, 683–690.
Silverman, B. W. (1986) Density Estimation. London: Chapman and Hall.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. New York: Springer.

----------See Also----------

bw.nrd, plot.density, hist.

----------Examples----------

require(graphics)

plot(density(c(-20, rep(0, 98), 20)), xlim = c(-4, 4))  # IQR = 0

# The Old Faithful geyser data
d <- density(faithful$eruptions, bw = "sj")
d
plot(d)

plot(d, type = "n")
polygon(d, col = "wheat")

## Missing values:
x <- xx <- faithful$eruptions
x[i.out <- sample(length(x), 10)] <- NA
doR <- density(x, bw = 0.15, na.rm = TRUE)
lines(doR, col = "blue")
points(xx[i.out], rep(0.01, 10))

## Weighted observations:
fe <- sort(faithful$eruptions) # has quite a few non-unique values
## use 'counts / n' as weights:
dw <- density(unique(fe), weights = table(fe)/length(fe), bw = d$bw)
utils::str(dw) ## smaller n: only 126, but identical estimate:
stopifnot(all.equal(d[1:3], dw[1:3]))

## simulation from a density() fit:
# a kernel density fit is an equally-weighted mixture.
fit <- density(xx)
N <- 1e6
x.new <- rnorm(N, sample(xx, size = N, replace = TRUE), fit$bw)
plot(fit)
lines(density(x.new), col="blue")

(kernels <- eval(formals(density.default)$kernel))

## show the kernels in the R parametrization
plot(density(0, bw = 1), xlab = "",
     main = "R's density() kernels with bw = 1")
for(i in 2:length(kernels))
   lines(density(0, bw = 1, kernel = kernels[i]), col = i)
legend(1.5, .4, legend = kernels, col = seq(kernels),
       lty = 1, cex = .8, y.intersp = 1)

## show the kernels in the S parametrization
plot(density(0, from = -1.2, to = 1.2, width = 2, kernel = "gaussian"), type = "l",
     ylim = c(0, 1), xlab = "", main = "R's density() kernels with width = 1")
for(i in 2:length(kernels))
   lines(density(0, width = 2, kernel = kernels[i]), col = i)
legend(0.6, 1.0, legend = kernels, col = seq(kernels), lty = 1)

##-------- Semi-advanced theoretic from here on -------------

(RKs <- cbind(sapply(kernels,
                     function(k) density(kernel = k, give.Rkern = TRUE))))
100*round(RKs["epanechnikov",]/RKs, 4) ## Efficiencies

bw <- bw.SJ(precip) ## sensible automatic choice
plot(density(precip, bw = bw),
     main = "same sd bandwidths, 7 different kernels")
for(i in 2:length(kernels))
   lines(density(precip, bw = bw, kernel = kernels[i]), col = i)

## Bandwidth Adjustment for "Exactly Equivalent Kernels"
h.f <- sapply(kernels, function(k) density(kernel = k, give.Rkern = TRUE))
(h.f <- (h.f["gaussian"] / h.f)^ .2)
## -> 1, 1.01, .995, 1.007,... close to 1 => adjustment barely visible..

plot(density(precip, bw = bw),
     main = "equivalent bandwidths, 7 different kernels")
for(i in 2:length(kernels))
   lines(density(precip, bw = bw, adjust = h.f[i], kernel = kernels[i]),
         col = i)
legend(55, 0.035, legend = kernels, col = seq(kernels), lty = 1)

Please credit the source when reposting: from 生物统计家园网 (http://www.biostatistic.net).

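Notes:
Note 1: For the convenience of learners, the Chinese text of this document was translated by LoveR, the robot of 生物统计家园网. It is intended only as a personal reference for learning R; 生物统计家园 reserves the copyright.
Note 2: Because the translation was produced automatically by a robot, inaccuracies are unavoidable. Carefully comparing the Chinese and English text while reading can help with learning R.
Note 3: If you find anything inaccurate, please reply to this post and we will revise it over time.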
