R arules package: Chinese help documentation for Adult() (Chinese-English side by side)

Posted on 2012-9-12 11:04:07
Adult(arules)
R package containing Adult(): arules

                                        Adult Data Set

                                         Translator: 生物统计家园网, robot LoveR

----------Description----------

The AdultUCI data set contains the questionnaire data of the “Adult” database (originally called the “Census Income” Database) formatted as a data.frame. The Adult data set contains the data already prepared and coerced to transactions for use with arules.


----------Usage----------


data("Adult")
data("AdultUCI")
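As a minimal sketch of what the two calls above provide, assuming the arules package is installed, the following loads both objects and checks their classes and sizes. The checks themselves are illustrative and not part of the original help page.

```r
# Load the package that ships both data sets
library(arules)

data("Adult")      # prepared transactions object
data("AdultUCI")   # raw questionnaire data as a data.frame

class(Adult)       # S4 class "transactions"
class(AdultUCI)    # "data.frame"

dim(AdultUCI)      # rows = observations, columns = variables
length(Adult)      # number of transactions, one per observation
summary(Adult)     # item frequencies and transaction size distribution
```

Both objects describe the same respondents, so `length(Adult)` and `nrow(AdultUCI)` should agree.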



----------Format----------

The AdultUCI data set contains a data frame with 48842 observations on the following 15 variables.

age: a numeric vector.

workclass: a factor with levels Federal-gov, Local-gov, Never-worked, Private, Self-emp-inc, Self-emp-not-inc, State-gov,

education: an ordered factor with levels Preschool < 1st-4th < 5th-6th < 7th-8th < 9th < 10th < 11th < 12th < HS-grad < Prof-school < Assoc-acdm < Assoc-voc < Some-college < Bachelors < Masters <

education-num: a numeric vector.

marital-status: a factor with levels Divorced, Married-AF-spouse, Married-civ-spouse, Married-spouse-absent, Never-married,

occupation: a factor with levels Adm-clerical, Armed-Forces, Craft-repair, Exec-managerial, Farming-fishing, Handlers-cleaners, Machine-op-inspct, Other-service, Priv-house-serv, Prof-specialty, Protective-serv, Sales, Tech-support, and

relationship: a factor with levels Husband, Not-in-family, Other-relative, Own-child,

race: a factor with levels Amer-Indian-Eskimo, Asian-Pac-Islander, Black, Other, and

sex: a factor with levels Female and Male.

capital-gain: a numeric vector.

capital-loss: a numeric vector.

fnlwgt: a numeric vector.

hours-per-week: a numeric vector.

native-country: a factor with levels Cambodia, Canada, China, Columbia, Cuba, Dominican-Republic, Ecuador, El-Salvador, England, France, Germany, Greece, Guatemala, Haiti, Holand-Netherlands, Honduras, Hong, Hungary, India, Iran, Ireland, Italy, Jamaica, Japan, Laos, Mexico, Nicaragua, Outlying-US(Guam-USVI-etc), Peru, Philippines, Poland, Portugal, Puerto-Rico, Scotland, South, Taiwan, Thailand, Trinadad&Tobago, United-States,

income: an ordered factor with levels small < large.
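The complete level sets of the factors listed above can be read directly from the data. This is a small sketch, assuming arules is installed; the choice of columns to inspect is only illustrative. Column names containing hyphens are not syntactic R names, so they are accessed with `[[ ]]` rather than `$`.

```r
library(arules)
data("AdultUCI")

# Overview of all 15 variables and their types
str(AdultUCI)

# Full level set of an unordered factor
levels(AdultUCI[["workclass"]])

# Ordered factors report their ordering
is.ordered(AdultUCI[["education"]])
levels(AdultUCI[["education"]])
levels(AdultUCI[["income"]])

# Hyphenated names need [[ ]] (or backticks with $)
summary(AdultUCI[["hours-per-week"]])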


----------Details----------

The “Adult” database was extracted from the census bureau database found at http://www.census.gov/ftp/pub/DES/www/welcome.html in 1994 by Ronny Kohavi and Barry Becker, Data Mining and Visualization, Silicon Graphics. It was originally used to predict whether income exceeds USD 50K/yr based on census data. We added the attribute income with levels small and large (>50K).

We prepared the data set for association mining as shown in the Examples section. We removed the continuous attribute fnlwgt (final weight). We also eliminated education-num because it is just a numeric representation of the attribute education. We mapped the other 4 continuous attributes to ordinal attributes as follows:

age: cut into levels Young (0-25), Middle-aged (26-45), Senior (46-65) and Old (over 65)

hours-per-week: cut into levels Part-time (0-25), Full-time (25-40), Over-time (40-60) and Workaholic (60-168)

capital-gain and capital-loss: each cut into levels None (0), Low (0 < median of the values greater than zero < max) and High (above that median)
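How the bin boundaries behave can be checked on toy vectors with base R alone. The sketch below is illustrative only: the vectors `hours` and `gain` are made-up values, not taken from the data set. It relies on the default `right = TRUE` of `cut()`, which makes each interval closed on the right, so a value equal to a break falls into the lower bin.

```r
# Hypothetical weekly hours, chosen to sit on and around the breaks
hours <- c(10, 25, 26, 40, 41, 60, 61)
hours_binned <- cut(hours, c(0, 25, 40, 60, 168),
                    labels = c("Part-time", "Full-time", "Over-time", "Workaholic"),
                    ordered_result = TRUE)
data.frame(hours, hours_binned)   # 25 is Part-time, 40 is Full-time, 60 is Over-time

# A value equal to the lowest break is outside (0, 25] and becomes NA
cut(0, c(0, 25, 40, 60, 168))

# Hypothetical capital gains: zeros form "None", the median of the
# positive values splits the remainder into "Low" and "High"
gain <- c(0, 0, 0, 100, 200, 300, 5000)
med <- median(gain[gain > 0])     # median of 100, 200, 300, 5000 is 250
gain_binned <- cut(gain, c(-Inf, 0, med, Inf),
                   labels = c("None", "Low", "High"),
                   ordered_result = TRUE)
data.frame(gain, gain_binned)
```

Using `-Inf` and `Inf` as outer breaks, as the capital-gain mapping does, avoids the NA at the lower edge.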


----------Source----------

http://www.ics.uci.edu/~mlearn/MLRepository.html



----------References----------

UCI Repository of Machine Learning Databases. Irvine, CA: University of California, Department of Information and Computer Science.
of Naive-Bayes Classifiers: a Decision-Tree Hybrid.  Proceedings of the Second International Conference on Knowledge Discovery and Data Mining.

----------Examples----------


library("arules")
data("AdultUCI")
dim(AdultUCI)
AdultUCI[1:2,]

## remove attributes
AdultUCI[["fnlwgt"]] <- NULL
AdultUCI[["education-num"]] <- NULL

## map metric attributes to ordered factors
AdultUCI[["age"]] <- cut(AdultUCI[["age"]], c(15,25,45,65,100),
    labels = c("Young", "Middle-aged", "Senior", "Old"), ordered_result = TRUE)

AdultUCI[["hours-per-week"]] <- cut(AdultUCI[["hours-per-week"]],
    c(0,25,40,60,168), labels = c("Part-time", "Full-time",
        "Over-time", "Workaholic"), ordered_result = TRUE)

AdultUCI[["capital-gain"]] <- cut(AdultUCI[["capital-gain"]], c(-Inf,0,
    median(AdultUCI[["capital-gain"]][AdultUCI[["capital-gain"]]>0]),Inf),
        labels = c("None", "Low", "High"), ordered_result = TRUE)

AdultUCI[["capital-loss"]] <- cut(AdultUCI[["capital-loss"]], c(-Inf,0,
    median(AdultUCI[["capital-loss"]][AdultUCI[["capital-loss"]]>0]),Inf),
        labels = c("None", "Low", "High"), ordered_result = TRUE)

## create transactions
Adult <- as(AdultUCI, "transactions")
Adult
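Since the transactions are prepared for association mining, a possible next step is to mine rules from them. The following is a sketch, not part of the original help page: it uses the packaged `Adult` object, and the support and confidence thresholds are assumptions chosen only for illustration.

```r
library(arules)
data("Adult")

# Mine association rules; thresholds are illustrative assumptions
rules <- apriori(Adult,
                 parameter = list(supp = 0.5, conf = 0.9, target = "rules"))

# Summarise the rule set and show the rules with the highest lift
summary(rules)
inspect(head(sort(rules, by = "lift"), 3))
```

Lower support thresholds return many more rules, so it may help to start high and relax the value gradually.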


When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


Note: this document was machine-translated by the 生物统计家园网 robot LoveR and is intended only as a personal reference for learning R; 生物统计家园 reserves the copyright.