summarize(rockchalk)
summarize() is part of the R package rockchalk
Sorts numeric from factor variables and returns separate summaries of each
----------Description----------
The work is done by the functions summarizeNumerics and summarizeFactors. Please see the help pages for those functions for complete details.
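The split that summarize() performs can be pictured with a few lines of base R. This is a minimal sketch, not rockchalk's implementation: the helper name summarizeSketch is hypothetical, it treats every non-numeric column as a factor, and it reports mean, sd, min and max only as stand-ins for whatever statistics summarizeNumerics actually computes.

```r
## Hypothetical sketch of the numeric/factor split; not rockchalk code.
summarizeSketch <- function(dat) {
  ## Classify columns: numeric (including integer) versus everything else
  isNum <- vapply(dat, is.numeric, logical(1))
  ## Numeric part: one column of statistics per numeric variable
  numerics <- sapply(dat[isNum], function(v)
    c(mean = mean(v), sd = sd(v), min = min(v), max = max(v)))
  ## Non-numeric part: one frequency table per variable
  factors <- lapply(dat[!isNum], table)
  list(numerics = numerics, factors = factors)
}

dat <- data.frame(a = c(1.5, 2, 3.5), g = factor(c("x", "y", "x")), b = 1:3)
res <- summarizeSketch(dat)
res$numerics
res$factors$g
```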
----------Usage----------
summarize(dat, ...)
----------Arguments----------
Argument: dat
A data frame
Argument: ...
Optional arguments passed on to summarizeNumerics and summarizeFactors. These include:
  maxLevels: the maximum number of factor levels to report.
  alphaSort: if TRUE (the default), the columns are reordered alphabetically; if FALSE, they keep their original order.
  digits: integer, used to format the numeric output.
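The effect of alphaSort and maxLevels can be approximated in base R. This is a sketch under stated assumptions: the helper names sortColumns and topLevels are hypothetical, and topLevels assumes that maxLevels keeps the most frequent levels, which this page does not specify (the summarizeFactors help page is the authority on that).

```r
## Hypothetical helpers illustrating alphaSort and maxLevels; not rockchalk code.
sortColumns <- function(dat, alphaSort = TRUE) {
  ## alphaSort = TRUE: columns in alphabetical order; FALSE: original order
  if (alphaSort) dat[, sort(names(dat)), drop = FALSE] else dat
}

topLevels <- function(f, maxLevels) {
  ## Frequency table of f, most frequent levels first, truncated to
  ## maxLevels entries (an assumption about which levels are kept)
  counts <- sort(table(f), decreasing = TRUE)
  counts[seq_len(min(maxLevels, length(counts)))]
}

dat <- data.frame(z1 = 1:3, a1 = 4:6, x2 = factor(c("q", "r", "q")))
names(sortColumns(dat))
names(sortColumns(dat, alphaSort = FALSE))
topLevels(factor(c("a", "b", "b", "c", "c", "c")), maxLevels = 2)
```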
----------Value----------
A list with two components, numerics and factors. numerics is a matrix of summary statistics (statistics in rows, variables in columns), and factors is a list of factor summaries.
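To make the shape of the return value concrete, the sketch below builds a list with the same two components by hand. The numbers, statistic names (mean, sd, var) and variable names are placeholders, not actual summarizeNumerics output; the point is the layout and the two equivalent ways of extracting a component, plus the transpose step that the Examples use before plotting.

```r
## Hand-built object with the same layout as summarize()'s return value
## (placeholder numbers; not output from rockchalk)
numerics <- rbind(mean = c(a1 = 1.1, z1 = 0.1),
                  sd   = c(a1 = 1.7, z1 = 1.0),
                  var  = c(a1 = 2.89, z1 = 1.0))
factors <- list(x1 = table(factor(c("A", "B", "A"))))
datsumm <- list(numerics = numerics, factors = factors)

## By name or by position: the same component either way
identical(datsumm$numerics, datsumm[[1]])
## Transpose so each row is a variable and each column a statistic
datsummNT <- as.data.frame(t(datsumm$numerics))
datsummNT$mean
```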
----------Author(s)----------
Paul E. Johnson <pauljohn@ku.edu>
----------Examples----------
library(rockchalk)
set.seed(23452345)
N <- 100
x1 <- gl(12, 2, labels = LETTERS[1:12])
x2 <- gl(8, 3, labels = LETTERS[12:24])
x1 <- sample(x = x1, size=N, replace = TRUE)
x2 <- sample(x = x2, size=N, replace = TRUE)
z1 <- rnorm(N)
a1 <- rnorm(N, mean = 1.2, sd = 1.7)
a2 <- rpois(N, lambda = 10 + a1)
a3 <- rgamma(N, 0.5, 4)
b1 <- rnorm(N, mean = 1.3, sd = 1.4)
dat <- data.frame(z1, a1, x2, a2, x1, a3, b1)
summary(dat)
summarize(dat)
summarizeNumerics(dat)
summarizeFactors(dat, maxLevels = 5)
summarize(dat, alphaSort = FALSE)
summarize(dat, digits = 6, alphaSort = FALSE)
summarize(dat, digits = 22, alphaSort = FALSE)
summarize(dat, maxLevels = 2)
datsumm <- summarize(dat)
datsumm$numerics
datsumm[[1]] ## same: gets numerics
datsumm$factors
datsumm[[2]]
## Use the numerics output to make plots. First, transpose it so
## that rows are variables and columns are summary statistics.
datsummNT <- t(datsumm$numerics)
datsummNT <- as.data.frame(datsummNT)
plot(datsummNT$mean, datsummNT$var, xlab = "The Means",
     ylab = "The Variances")
plot(datsummNT$mean, datsummNT$var, xlab = "The Means",
     ylab = "The Variances", type = "n")
text(datsummNT$mean, datsummNT$var, labels = rownames(datsummNT))
## Here's a little plot wrinkle. Note that the variable names run out
## to the edge of the plot. If the names are longer, they don't stay
## inside the figure. See?
## Make the variable names longer
rownames(datsummNT)
rownames(datsummNT) <- c("boring var", "var with long name",
                         "tedious name var", "stupid varname",
                         "buffoon not baboon")
plot(datsummNT$mean, datsummNT$var, xlab = "The Means",
     ylab = "The Variances", type = "n")
text(datsummNT$mean, datsummNT$var, labels = rownames(datsummNT),
     cex = 0.8)
## That's no good: the names run across the edges.
## We could brute-force the names outside the edges like this,
par(xpd = TRUE)
text(datsummNT$mean, datsummNT$var, labels = rownames(datsummNT),
     cex = 0.8)
## but that is not much better.
par(xpd = FALSE)
## Here is one fix: make the unused space inside the plot larger by
## making xlim and ylim bigger. I use the magRange function from
## rockchalk to expand each range to 1.2 times its current size;
## otherwise, long variable names do not fit inside the plot.
## magRange could be asymmetric if we want, but this use is symmetric.
rownames(datsummNT)
rownames(datsummNT) <- c("boring var", "var with long name",
                         "tedious name var", "stupid varname",
                         "buffoon not baboon")
plot(datsummNT$mean, datsummNT$var, xlab = "The Means",
     ylab = "The Variances", type = "n",
     xlim = magRange(datsummNT$mean, 1.2),
     ylim = magRange(datsummNT$var, 1.2))
text(datsummNT$mean, datsummNT$var, labels = rownames(datsummNT),
     cex = 0.8)
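## A minimal sketch of the range magnification described above, assuming
## a symmetric expansion about the midpoint of the data range.
## expandRange is a hypothetical helper for illustration only; it is not
## rockchalk's magRange, whose exact rule may differ.
expandRange <- function(x, mult = 1.2) {
  r <- range(x, na.rm = TRUE)
  mid <- mean(r)
  halfWidth <- diff(r) / 2
  ## Widen the range to mult times its current width, keeping the center
  c(mid - mult * halfWidth, mid + mult * halfWidth)
}
expandRange(c(0, 10), 1.2)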
## Here's another little plot wrinkle. If we don't enlarge the plot
## region to keep the names in bounds, we need some fancy footwork.
## Note that when a point is near the edge, I make sure the text
## prints toward the center of the graph.
plot(datsummNT$mean, datsummNT$var, xlab = "The Means",
     ylab = "The Variances")
## Calculate label positions. This is not as fancy as it could be. If
## there were lots of variables, we'd have to get smarter about
## positioning labels above, below, left, or right.
## pos = 2 puts a label to the left of its point and pos = 4 to the
## right, so points right of the mean get labels pointing back inward.
labelPos <- ifelse(datsummNT$mean - mean(datsummNT$mean, na.rm = TRUE) > 0,
                   2, 4)
text(datsummNT$mean, datsummNT$var, labels = rownames(datsummNT),
     cex = 0.8, pos = labelPos)
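## A sketch of the "smarter" positioning mentioned above: choose among
## all four text() positions (1 = below, 2 = left, 3 = above, 4 = right)
## so each label points back toward the center along whichever axis the
## point is farther out on (in standard-deviation units).
## quadrantPos is a hypothetical helper, not part of rockchalk.
quadrantPos <- function(x, y) {
  ## Standardize distances from the center so x and y are comparable
  dx <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  dy <- (y - mean(y, na.rm = TRUE)) / sd(y, na.rm = TRUE)
  ## Horizontal offset dominates: label left (2) or right (4);
  ## vertical offset dominates: label below (1) or above (3)
  ifelse(abs(dx) >= abs(dy), ifelse(dx > 0, 2, 4), ifelse(dy > 0, 1, 3))
}
quadrantPos(c(-2, 0, 2, 0), c(0, 3, 0, -3))
## In the plot above, it could replace labelPos:
## text(datsummNT$mean, datsummNT$var, labels = rownames(datsummNT),
##      cex = 0.8, pos = quadrantPos(datsummNT$mean, datsummNT$var))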
x <- data.frame(x = rnorm(N), y = gl(50, 2), z = rep(1:4, 25),
                ab = gl(2, 50))
summarize(x)
summarize(x, maxLevels = 15)
sumry <- summarize(x)
sumry[[1]] ## another way to get the numerics output
sumry[[2]] ## another way to get the factors output
dat <- data.frame(x = rnorm(N), y = gl(50, 2),
                  z = factor(rep(1:4, 25), labels = c("A", "B", "C", "D")),
                  animal = factor(ifelse(runif(N) < 0.2, "cow",
                                         ifelse(runif(N) < 0.5, "pig", "duck"))))
summarize(dat)
## Run this if you have internet access
## dat <- read.table(url("http://pj.freefaculty.org/guides/stat/DataSets/USNewsCollege/USNewsCollege.csv"),
##                   sep = ",")
## colnames(dat) <- c("fice", "name", "state", "private", "avemath",
##                    "aveverb", "avecomb", "aveact", "fstmath",
##                    "trdmath", "fstverb", "trdverb", "fstact",
##                    "trdact", "numapps", "numacc", "numenr",
##                    "pctten", "pctquart", "numfull", "numpart",
##                    "instate", "outstate", "rmbrdcst", "roomcst",
##                    "brdcst", "addfees", "bookcst", "prsnl",
##                    "pctphd", "pctterm", "stdtofac", "pctdonat",
##                    "instcst", "gradrate")
## dat$private <- factor(dat$private, labels = c("public",
##                       "private"))
## sumry <- summarize(dat, digits = 2)
## sumry
## sumry[[1]]
## sumry[[2]]
## summarize(dat[, c("fice", "name", "private", "fstverb",
##                   "avemath")], digits = 4)