crimtab(datasets)
R package containing crimtab(): datasets
Student's 3000 Criminals Data
Description
Data of 3000 male criminals over 20 years old undergoing their sentences in the chief prisons of England and Wales.
Usage
Format
A table object of integer counts, of dimension 42 × 22, with a total count, sum(crimtab), of 3000.
The 42 row names (rownames: "9.4", "9.5", ...) are the midpoints of the finger-length intervals, and the 22 column names (colnames: "142.24", "144.78", ...) are the (body) heights of the 3000 criminals, both in cm; see Details.
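The structure described above can be checked directly in an R session. The following is a minimal sketch that relies only on the crimtab object from the datasets package; it confirms the dimensions and total count and shows the two marginal distributions.

```r
# Inspect the layout of the table: rows are finger-length classes,
# columns are height classes, and cells are integer counts.
dim(crimtab)
sum(crimtab)

# First few class midpoints (both stored as character labels, in cm).
head(rownames(crimtab))
head(colnames(crimtab))

# Marginal counts: finger lengths (rows) and heights (columns).
finger_counts <- rowSums(crimtab)
height_counts <- colSums(crimtab)
```

Both marginal vectors sum to the same total as the table itself, which is a quick sanity check before reconstructing raw data from the counts.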
Details
"Student" is the pseudonym of William Sealy Gosset. In his 1908 paper he wrote (on page 13), at the beginning of Section VI, entitled "Practical Test of the foregoing Equations":
“Before I had succeeded in solving my problem analytically, I had endeavoured to do so empirically. The material used was a correlation table containing the height and left middle finger measurements of 3000 criminals, from a paper by W. R. MacDonell (Biometrika, Vol. I., p. 219). The measurements were written out on 3000 pieces of cardboard, which were then very thoroughly shuffled and drawn at random. As each card was drawn its numbers were written down in a book, which thus contains the measurements of 3000 criminals in a random order. Finally, each consecutive set of 4 was taken as a sample—750 in all—and the mean, standard deviation, and correlation of each sample determined. The difference between the mean of each sample and the mean of the population was then divided by the standard deviation of the sample, giving us the z of Section III.”
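Student's cards each carried a pair of measurements (finger length and height). As a rough illustration, and not a reproduction of his procedure, the sketch below expands the count table back into 3000 (finger, height) pairs and computes their correlation. The data frame and column names (ct, raw, finger, height, count) are chosen here for illustration only.

```r
# Turn the 42 x 22 table into one row per cell: two labels plus a count.
ct <- as.data.frame(crimtab, stringsAsFactors = FALSE)
names(ct) <- c("finger", "height", "count")   # illustrative column names

# The class midpoints are stored as text; convert them to numbers (cm).
ct$finger <- as.numeric(ct$finger)
ct$height <- as.numeric(ct$height)

# Repeat each cell's row as many times as its count to get one row per person.
raw <- ct[rep(seq_len(nrow(ct)), times = ct$count), c("finger", "height")]
rownames(raw) <- NULL

nrow(raw)                       # one row per criminal
cor(raw$finger, raw$height)     # correlation based on class midpoints
```

Because every individual is assigned the midpoint of his class, the result is a grouped-data approximation rather than the correlation of the original measurements.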
The table is in fact on page 216, not page 219, of MacDonell (1902). In MacDonell's table, the middle finger lengths were given in mm and the heights in feet/inches intervals; both are converted to cm here. The midpoints of the intervals were used: e.g., where MacDonell has 4' 7''9/16 -- 8''9/16, we have 142.24, which is 2.54*56 = 2.54*(4' 8'').
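The unit conversion can be verified from the column labels themselves. This is a small sketch using only base R and crimtab; the variable names (hei_cm, hei_in, mid_in, fin_mm) are illustrative.

```r
# Column labels are height midpoints in cm.
hei_cm <- as.numeric(colnames(crimtab))

# MacDonell's first interval has midpoint 4 ft 8 in = 56 in.
mid_in <- 4 * 12 + 8
c(from_table = hei_cm[1], from_inches = 2.54 * mid_in)

# Converting back to inches should give whole numbers, one class per inch.
hei_in <- hei_cm / 2.54
all.equal(hei_in, round(hei_in))
range(hei_in)

# Row labels are finger-length midpoints in cm; times 10 gives mm as in MacDonell.
fin_mm <- as.numeric(rownames(crimtab)) * 10
range(fin_mm)
```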
MacDonell credited the source of the data (page 178) as follows: "The data on which the memoir is based were obtained, through the kindness of Dr Garson, from the Central Metric Office, New Scotland Yard..." He pointed out on page 179 that "The forms were drawn at random from the mass on the office shelves; we are therefore dealing with a random sampling."
Source
http://pbil.univ-lyon1.fr/R/donnees/criminals1902.txt thanks to Jean R. Lobry and Anne-Béatrice Dufour.
References
Garson, J.G. (1900). The metric system of identification of criminals, as used in Great Britain and Ireland. The Journal of the Anthropological Institute of Great Britain and Ireland 30, 161–198.
MacDonell, W.R. (1902). On criminal anthropometry and the identification of criminals. Biometrika 1, 2, 177–227.
Student (1908). The probable error of a mean. Biometrika 6, 1–25.
Examples
require(stats)
dim(crimtab)
utils::str(crimtab)
## For nicer printing:
local({cT <- crimtab
       colnames(cT) <- substring(colnames(cT), 2, 3)
       print(cT, zero.print = " ")
})
## Repeat Student's experiment:
# 1) Reconstitute the 3000 raw heights in inches, rounded to the
#    nearest integer as in Student's paper:
(heIn <- round(as.numeric(colnames(crimtab)) / 2.54))
d.hei <- data.frame(height = rep(heIn, colSums(crimtab)))
# 2) Shuffle the data:
set.seed(1)
d.hei <- d.hei[sample(1:3000), , drop = FALSE]
# 3) Make 750 samples, each of size 4:
d.hei$sample <- as.factor(rep(1:750, each = 4))
# 4) Compute the means and standard deviations (divisor n) for the 750 samples:
h.mean <- with(d.hei, tapply(height, sample, FUN = mean))
h.sd <- with(d.hei, tapply(height, sample, FUN = sd)) * sqrt(3/4)
# 5) Compute the difference between the mean of each sample and
#    the mean of the population, then divide by the
#    standard deviation of the sample:
zobs <- (h.mean - mean(d.hei[,"height"]))/h.sd
# 6) Replace infinite values by +/- 6 as in Student's paper:
zobs[infZ <- is.infinite(zobs)] # 3 of them
zobs[infZ] <- 6 * sign(zobs[infZ])
# 7) Plot the distribution:
require(grDevices); require(graphics)
hist(x = zobs, probability = TRUE, xlab = "Student's z",
col = grey(0.8), border = grey(0.5),
main = "Distribution of Student's z score for 'crimtab' data")
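As a possible follow-up to the histogram, the empirical z values can be compared with the density implied by Student's t distribution. Student's z uses the standard deviation with divisor n, so t = z * sqrt(n - 1), and for samples of size 4 the density of z is sqrt(3) * dt(sqrt(3) * z, df = 3). The sketch below is self-contained and is not part of the original example; the helper dz, the seed, and the choice to drop (rather than recode) infinite values are assumptions made for illustration.

```r
# Rebuild 750 samples of size 4 from the height margin (heights in whole inches).
n <- 4
heights <- rep(round(as.numeric(colnames(crimtab)) / 2.54), colSums(crimtab))
set.seed(1)                                   # arbitrary seed, for reproducibility
shuffled <- sample(heights)
grp <- rep(seq_len(length(shuffled) / n), each = n)

m <- tapply(shuffled, grp, mean)
s <- tapply(shuffled, grp, sd) * sqrt((n - 1) / n)   # SD with divisor n
z <- (m - mean(heights)) / s
z <- z[is.finite(z)]                          # drop samples whose SD is zero

# Hypothetical helper: density of z derived from the t density on n - 1 d.f.
dz <- function(z, n) sqrt(n - 1) * dt(sqrt(n - 1) * z, df = n - 1)

hist(z, probability = TRUE, breaks = 40, xlab = "Student's z",
     main = "Empirical z versus density derived from t (3 d.f.)")
curve(dz(x, n), add = TRUE, lwd = 2)
```

Dropping the infinite values instead of setting them to +/- 6 keeps the histogram on the same scale as the theoretical curve, at the cost of discarding a few samples.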