
R rockchalk package: summarize() function help documentation

Posted 2012-9-27 22:45:07

Sorts numeric from factor variables and returns separate summaries for each type.

Translator: 生物统计家园网 robot LoveR


The work is done by the functions summarizeNumerics and summarizeFactors. Please see the help pages for those functions for complete details.
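
The division of labor described above can be pictured with a small base-R sketch. It is not rockchalk's implementation: `splitSummary` is a hypothetical helper name, and the statistics it reports (mean and standard deviation) are assumptions chosen only for illustration.

```r
## Hypothetical helper: illustrates the numeric/factor split only.
## It is NOT how rockchalk implements summarize().
splitSummary <- function(dat) {
    isNum <- vapply(dat, is.numeric, logical(1))
    isFac <- vapply(dat, is.factor, logical(1))
    ## One column per numeric variable, one row per statistic
    numerics <- vapply(dat[isNum],
                       function(v) c(mean = mean(v, na.rm = TRUE),
                                     sd = sd(v, na.rm = TRUE)),
                       numeric(2))
    ## One frequency table per factor variable
    factors <- lapply(dat[isFac], table)
    list(numerics = numerics, factors = factors)
}

toy <- data.frame(height = c(1.6, 1.7, 1.8),
                  group = factor(c("a", "b", "a")))
res <- splitSummary(toy)
res$numerics       # matrix: statistics in rows, variables in columns
res$factors$group  # frequency table for the factor column
```

The result mirrors the shape the help page describes: a matrix for the numeric columns and a list of per-factor summaries.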


  summarize(dat, ...)


dat: A data frame.

...: Optional arguments that are passed to summarizeNumerics and summarizeFactors. These may be used:
  maxLevels: The maximum number of levels that will be reported.
  alphaSort: If TRUE (default), the columns are reorganized in alphabetical order. If FALSE, they are presented in the original order.
  digits: Integer, used for number formatting in the output.
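
To make the three options concrete, the following base-R sketch mimics what each one controls on a toy data frame. These lines are an analogy, not calls into rockchalk. In particular, keeping the most frequent levels when truncating is an assumption, and the toy data are made up.

```r
toy <- data.frame(zeta = c(2.123456, 3.5, 1.25),
                  alpha = c(10, 20, 30),
                  grp = factor(c("x", "y", "x")))

## alphaSort = TRUE analogue: reorder columns alphabetically by name
sorted <- toy[, sort(colnames(toy))]
colnames(sorted)

## digits analogue: round a summary number for display
rounded <- round(mean(toy$zeta), digits = 2)
rounded

## maxLevels analogue: report at most maxLevels levels
## (assumption: the most frequent levels are the ones kept)
maxLevels <- 1
kept <- head(sort(table(toy$grp), decreasing = TRUE), maxLevels)
kept
```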


A list with 2 objects, numerics and factors. numerics is a matrix of summary information, while factors is a list of factor summaries.


Paul E. Johnson <pauljohn@ku.edu>



library(rockchalk)  ## provides summarize(), summarizeFactors(), magRange()

N <- 100
x1 <- gl(12, 2, labels = LETTERS[1:12])
## gl(8, 3) creates 8 levels, so supply 8 labels
x2 <- gl(8, 3, labels = LETTERS[12:19])
x1 <- sample(x = x1, size=N, replace = TRUE)
x2 <- sample(x = x2, size=N, replace = TRUE)
z1 <- rnorm(N)
a1 <- rnorm(N, mean = 1.2, sd = 1.7)
a2 <- rpois(N, lambda = 10 + a1)
a3 <- rgamma(N, 0.5, 4)
b1 <- rnorm(N, mean = 1.3, sd = 1.4)
dat <- data.frame(z1, a1, x2, a2, x1, a3, b1)


summarizeFactors(dat, maxLevels = 5)

summarize(dat, alphaSort = FALSE)

summarize(dat, digits = 6, alphaSort = FALSE)

summarize(dat, digits = 22, alphaSort = FALSE)

summarize(dat, maxLevels = 2)

datsumm <- summarize(dat)

datsumm[[1]]  ## same as datsumm$numerics: gets the numerics


## Use numerics output to make plots. First, transposing
## gives a varnames x summary-stat matrix
datsummNT <- t(datsumm$numerics)
datsummNT <- as.data.frame(datsummNT)
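
Why the transpose helps can be seen on a mock matrix laid out the way the comment above describes, with statistics in rows and variables in columns. The numbers below are made up for illustration and are not output from summarize().

```r
## Mock summary matrix: statistics in rows, variables in columns.
## Values are invented; they are not real summarize() output.
numerics <- matrix(c(0.1, 1.2, 2.0, 2.9),
                   nrow = 2,
                   dimnames = list(c("mean", "var"), c("z1", "a1")))

## After t(), each variable is a row and each statistic is a column,
## so statistics can be pulled out with $ once it is a data frame.
numericsT <- as.data.frame(t(numerics))
numericsT$mean
rownames(numericsT)
```

With variables as rows, the row names are ready to use as plot labels.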

plot(datsummNT$mean, datsummNT$var, xlab = "The Means",
    ylab = "The Variances")

plot(datsummNT$mean, datsummNT$var, xlab = "The Means",
    ylab = "The Variances", type = "n")
text(datsummNT$mean, datsummNT$var, labels = rownames(datsummNT))

## Here's a little plot wrinkle. Note that the variable names run
## 'out to the edge' of the plot. If the names are longer, they
## don't stay inside the figure. See?

## Make the variable names longer

rownames(datsummNT) <- c("boring var", "var with long name",
    "tedious name var", "stupid varname", "buffoon not baboon")
plot(datsummNT$mean, datsummNT$var, xlab = "The Means",
    ylab = "The Variances", type = "n")
text(datsummNT$mean, datsummNT$var, labels = rownames(datsummNT),
    cex = 0.8)
## That's no good. The names run across the edges.

## We could brute-force the names outside the edges like this
par(xpd = TRUE)
text(datsummNT$mean, datsummNT$var, labels = rownames(datsummNT),
    cex = 0.8)
## but that is not much better
par(xpd = FALSE)

## Here is one fix. Make the unused space inside the plot larger by
## making xlim and ylim bigger. I use the magRange function from
## rockchalk to easily expand the range to 1.2 times its current size.
## Otherwise, long variable names do not fit inside the plot. magRange
## could be asymmetric if we want, but this use is symmetric.

rownames(datsummNT) <- c("boring var", "var with long name",
    "tedious name var", "stupid varname", "buffoon not baboon")
plot(datsummNT$mean, datsummNT$var, xlab = "The Means",
    ylab = "The Variances", type = "n", xlim = magRange(datsummNT$mean,
        1.2), ylim = magRange(datsummNT$var, 1.2))
text(datsummNT$mean, datsummNT$var, labels = rownames(datsummNT),
    cex = 0.8)
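
The range expansion used above can be sketched in base R. `expandRange` is a hypothetical stand-in written for illustration, assuming a symmetric expansion about the midpoint; the real magRange may differ in its details and also supports asymmetric expansion.

```r
## Hypothetical stand-in for a symmetric range expansion.
## Not rockchalk's magRange(); written only to show the arithmetic.
expandRange <- function(x, mult = 1.2) {
    r <- range(x, na.rm = TRUE)
    ## Split the extra width evenly between the two ends
    pad <- (mult - 1) * diff(r) / 2
    c(r[1] - pad, r[2] + pad)
}

wide <- expandRange(c(0, 10), mult = 1.2)
wide        # approximately -1 and 11
diff(wide)  # approximately 12, i.e. 1.2 times the original width of 10
```

Passing such a vector as xlim or ylim leaves a margin inside the axes, which is what keeps long labels from being clipped.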

## Here's another little plot wrinkle. If we don't do that to keep
## the names in bounds, we need some fancy footwork. Note that when a
## point is near the edge, I make sure the text prints toward the
## center of the graph.
plot(datsummNT$mean, datsummNT$var, xlab = "The Means",
    ylab = "The Variances")
## Calculate label positions. This is not as fancy as it could be. If
## there were lots of variables, we'd have to get smarter about
## positioning labels above, below, left, or right.
labelPos <- ifelse(datsummNT$mean - mean(datsummNT$mean,
    na.rm = TRUE) > 0, 2, 4)
text(datsummNT$mean, datsummNT$var, labels = rownames(datsummNT),
    cex = 0.8, pos = labelPos)
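
One way to "get smarter", as the comment above suggests, is to choose among all four label positions. The helper below is a hypothetical sketch, not part of rockchalk. It assumes both coordinates have a nonzero range and uses the `pos` codes of base graphics `text()`: 1 below, 2 left, 3 above, 4 right.

```r
## Hypothetical helper: choose a text() 'pos' value so that each label
## is drawn toward the center of the point cloud.
pickPos <- function(x, y) {
    ## Offsets from the center, scaled so x and y are comparable
    sx <- (x - mean(x, na.rm = TRUE)) / diff(range(x, na.rm = TRUE))
    sy <- (y - mean(y, na.rm = TRUE)) / diff(range(y, na.rm = TRUE))
    ifelse(abs(sx) >= abs(sy),
           ifelse(sx > 0, 2, 4),   # right of center: label to the left
           ifelse(sy > 0, 1, 3))   # above center: label below
}

px <- c(0, 10, 5, 5)
py <- c(5, 5, 0, 10)
positions <- pickPos(px, py)
positions
```

The result could be passed as `pos = pickPos(x, y)` in a `text()` call in place of the left/right rule used above.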

x <- data.frame(x = rnorm(N), y = gl(50, 2), z = rep(1:4,
    25), ab = gl(2, 50))

summarize(x, maxLevels = 15)

sumry <- summarize(x)
sumry[[1]]  ## another way to get the numerics output
sumry[[2]]  ## another way to get the factors output

dat <- data.frame(x = rnorm(N), y = gl(50, 2), z = factor(rep(1:4,
    25), labels = c("A", "B", "C", "D")), animal = factor(ifelse(runif(N) <
    0.2, "cow", ifelse(runif(N) < 0.5, "pig", "duck"))))


## Run this if you have internet access

## dat <- read.table(url("http://pj.freefaculty.org/guides/stat/DataSets/USNewsCollege/USNewsCollege.csv"),
##                   sep = ",")

## colnames(dat) <- c("fice", "name", "state", "private", "avemath",
##                    "aveverb", "avecomb", "aveact", "fstmath",
##                    "trdmath", "fstverb", "trdverb", "fstact",
##                    "trdact", "numapps", "numacc", "numenr",
##                    "pctten", "pctquart", "numfull", "numpart",
##                    "instate", "outstate", "rmbrdcst", "roomcst",
##                    "brdcst", "addfees", "bookcst", "prsnl",
##                    "pctphd", "pctterm", "stdtofac", "pctdonat",
##                    "instcst", "gradrate")

## dat$private <- factor(dat$private, labels = c("public",
##                                    "private"))
## sumry <- summarize(dat, digits = 2)
## sumry

## sumry[[1]]
## sumry[[2]]

## summarize(dat[, c("fice", "name", "private", "fstverb",
##                   "avemath")], digits = 4)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

