read.SAScii(SAScii)
read.SAScii() belongs to the R package: SAScii
Create an R data frame by reading in an ASCII file and SAS import instructions
----------Description----------
Using importation code designed for SAS users to read ASCII files into sas7bdat files, the read.SAScii function parses through the INPUT block of a (.sas) syntax file to design the parameters needed for a read.fwf function call, and then runs that command. This allows the user to specify the location of the ASCII (often a .dat) file and the location of the .sas syntax file, and then load the data frame directly into R in just one step.
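As a rough illustration of the step described above, the sketch below hand-writes the widths and column names that a small INPUT block would imply and passes them to base R's read.fwf(). The data lines, widths, and names are hypothetical, and this is not the package's internal code; it only shows the kind of read.fwf() call that has to be assembled.

```r
# hypothetical fixed-width data: a 4-digit number, a 5-letter word, a 2-digit number
ascii.lines <- c( "0154hello23" , "2034puppy00" )
tf <- tempfile()
writeLines( ascii.lines , con = tf )

# widths and names that an INPUT block such as
#   @1 NUMBERS1 4.   @5 WORDS1 $ 5.   @10 NUMBERS2 2.
# would imply (written by hand here, not parsed from a .sas file)
fwf.widths <- c( 4 , 5 , 2 )
fwf.names <- c( "NUMBERS1" , "WORDS1" , "NUMBERS2" )

# read.fwf() forwards col.names and colClasses to read.table()
x <- read.fwf(
    tf ,
    widths = fwf.widths ,
    col.names = fwf.names ,
    colClasses = c( "numeric" , "character" , "numeric" )
)
print( x )
```

read.SAScii is meant to spare the user from writing fwf.widths and fwf.names by hand, which becomes impractical for survey files with hundreds of columns.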
----------Usage----------
read.SAScii( fn, sas_ri, beginline = 1, buffersize = 50, zipped = F , n = -1 , intervals.to.print = 1000 , lrecl = NULL , skip.decimal.division = NULL )
----------Arguments----------
Argument: fn
Character string containing location of ASCII filename (or if zipped = T, a filename ending in .zip).
Argument: sas_ri
Character string containing location of SAS import instructions.
Argument: beginline
Line number in SAS import instructions where the INPUT statement begins. If the word INPUT appears before the actual INPUT block, the function will return an error.
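One way to choose a beginline value is to list the lines of the SAS script that mention INPUT and pick the one that opens the real block. The sketch below does this with base R on a hypothetical script held in a character vector; for a real file, readLines() on the .sas path would supply the lines. The regular expression is an assumption about what a typical INPUT statement looks like, not a rule taken from the package.

```r
# hypothetical SAS import script in which the word INPUT first appears in a comment
sas.lines <- c(
    "* INPUT statement for the example file ;" ,
    "DATA example ;" ,
    "INFILE 'example.dat' ;" ,
    "INPUT" ,
    "  @1 NUMBERS1 4." ,
    ";"
)

# every line that mentions INPUT anywhere, ignoring case
input.lines <- grep( "INPUT" , sas.lines , ignore.case = TRUE )

# lines where INPUT is the first word: likely candidates for beginline
block.start <- grep( "^[[:space:]]*INPUT\\b" , sas.lines , ignore.case = TRUE , perl = TRUE )

print( input.lines )
print( block.start )
```

If input.lines contains an entry smaller than block.start, passing block.start as beginline skips the earlier mention.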
Argument: buffersize
Maximum number of lines to read at one time, passed to read.fwf().
Argument: zipped
Flag noting if ASCII file should be unzipped / decompressed before loading. Useful when downloading larger data sets directly from a website.
Argument: n
The maximum number of records (lines) to be passed to read.fwf(), defaulting to no limit.
Argument: intervals.to.print
The number of records to wait before printing current progress to the screen.
Argument: lrecl
LRECL option from SAS code. Only necessary if the width of the ASCII file is longer than the actual columns containing data (if the file contains empty space on the right side).
Argument: skip.decimal.division
Whether numeric columns should be divided based on how many decimal places are specified by the SAS import instructions. Recommended: ignore this parameter (or set it to NULL) to let the function attempt to determine whether numeric columns have already been divided to hit the appropriate number of decimal places. TRUE tells read.SAScii not to perform any decimal-related division of numeric columns. FALSE tells read.SAScii to perform decimal-related division according to the SAS import instructions, ignoring the presence of numeric fields that already contain decimals.
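The decimal-related division described for skip.decimal.division can be pictured with plain arithmetic. The sketch below assumes a field declared with two implied decimal places (for example an informat such as 4.2) and shows the division, plus a simple check for fields that already contain a decimal point. The values and the check are hypothetical illustrations, not the package's own detection logic.

```r
# raw text of a numeric field as it sits in the ASCII file
raw.values <- c( "0154" , "2034" , "9900" )

# number of decimal places implied by the SAS informat (assumed to be 2 here)
implied.decimals <- 2

# dividing by 10^decimals places the implied decimal point
divided <- as.numeric( raw.values ) / 10^implied.decimals
print( divided )

# fields that already carry an explicit decimal point should not be divided again
already.decimal <- c( "1.54" , "20.34" )
has.point <- grepl( "." , already.decimal , fixed = TRUE )
print( has.point )
```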
----------Details----------
This function cannot handle overlapping columns. For example, in the 2009 National Ambulatory Medical Care Survey (NAMCS) SAS import instructions, columns DIAG1 and DIAG13D will create an error because both start at position 55. See ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/dataset_documentation/namcs/sas/nam09inp.txt.
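Before running a long import, a column layout can be checked for overlaps by hand. The sketch below is a minimal base R check, assuming the start positions and widths have already been collected into a data frame. Only the shared start position of 55 for DIAG1 and DIAG13D comes from the text above; the widths and the AGE and SEX rows are made up for illustration.

```r
# hypothetical column layout: start position and width of each field
layout <- data.frame(
    varname = c( "AGE" , "DIAG1" , "DIAG13D" , "SEX" ) ,
    start = c( 1 , 55 , 55 , 60 ) ,
    width = c( 3 , 5 , 3 , 1 ) ,
    stringsAsFactors = FALSE
)

# sort by start position, widest first among ties
layout <- layout[ order( layout$start , -layout$width ) , ]
layout$end <- layout$start + layout$width - 1

# a column overlaps when it starts at or before the furthest end seen so far
prev.end <- c( 0 , head( cummax( layout$end ) , -1 ) )
layout$overlaps <- layout$start <= prev.end

overlapping <- layout$varname[ layout$overlaps ]
print( overlapping )
```

Any variable reported here would have to be removed from, or reconciled in, the SAS import instructions before they are passed to read.SAScii.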
----------Value----------
A data.frame as produced by read.fwf() which is called internally.
----------Note----------
Some of the commands below take days to run, depending on your machine. If you need the Survey of Income and Program Participation, start the program before you quit working for the weekend.
----------Author(s)----------
Anthony Joseph Damico
----------Examples----------
###########
#Some Data#
###########
#write an example ASCII data set
some.data <- "0154hello2304coolgreatZZ\n2034puppy0023nicesweetok\n9900buddy4495    swell!!"
#create temporary ASCII file
some.data.tf <- tempfile()
#write the example data above to that temporary file
writeLines ( some.data , con = some.data.tf )
#write an example SAS import script using the at method
sas.import.with.at.signs <-
"INPUT
@1 NUMBERS1 4.2
@5 WORDS1 $ 5.
@10 NUMBERS2 2.0
@12 NUMBERS3 2.0
@14 WORDS2 $4.
@18 WORDS3 $5
@23 WORDS4 $ 1
@24 WORDS5 $ 1
;"
#create a temporary file
sas.import.with.at.signs.tf <- tempfile()
#write the sas code above to that temporary file
writeLines ( sas.import.with.at.signs , con = sas.import.with.at.signs.tf )
parse.SAScii( sas.import.with.at.signs.tf )
#using at signs sas script
read.SAScii( some.data.tf , sas.import.with.at.signs.tf )
#write an example SAS import script using the dash method
sas.import.with.lengths <-
"INPUT
NUMBERS1 1 - 4 .2
WORDS1 $ 5-9
NUMBERS2 10 -11
NUMBERS3 12- 13 .0
WORDS2 $14-17
WORDS3$ 18-22
WORDS4 $ 23-23
WORDS5 $24
;"
#create a temporary file
sas.import.with.lengths.tf <- tempfile()
#write the sas code above to that temporary file
writeLines ( sas.import.with.lengths , con = sas.import.with.lengths.tf )
parse.SAScii( sas.import.with.lengths.tf )
#using dash method sas script
read.SAScii( some.data.tf , sas.import.with.lengths.tf )
## Not run:
#########################################################################################
#Load the 2009 Medical Expenditure Panel Survey Emergency Room Visits file as an R data frame
#Location of the ASCII 2009 Medical Expenditure Panel Survey Emergency Room Visits File
MEPS.09.ER.visit.file.location <-
    "http://meps.ahrq.gov/mepsweb/data_files/pufs/h126edat.exe"
#Location of the SAS import instructions for the
#2009 Medical Expenditure Panel Survey Emergency Room Visits File
MEPS.09.ER.visit.SAS.read.in.instructions <-
    "http://meps.ahrq.gov/mepsweb/data_stats/download_data/pufs/h126e/h126esu.txt"
#Load the 2009 Medical Expenditure Panel Survey Emergency Room Visits File
#NOTE: The SAS INPUT command occurs at line 273.
MEPS.09.ER.visit.df <-
    read.SAScii (
        MEPS.09.ER.visit.file.location ,
        MEPS.09.ER.visit.SAS.read.in.instructions ,
        zipped = TRUE ,
        beginline = 273 )
#save the data frame now for instantaneous loading later
save( MEPS.09.ER.visit.df , file = "MEPS.09.ER.visit.data.rda" )
#########################################################################################
#Load the 2011 National Health Interview Survey Persons file as an R data frame
NHIS.11.personsx.SAS.read.in.instructions <-
    "ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Program_Code/NHIS/2011/personsx.sas"
NHIS.11.personsx.file.location <-
    "ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHIS/2011/personsx.zip"
#store the NHIS file as an R data frame!
NHIS.11.personsx.df <-
    read.SAScii (
        NHIS.11.personsx.file.location ,
        NHIS.11.personsx.SAS.read.in.instructions ,
        zipped = TRUE )
#or store the NHIS SAS import instructions for use in a
#read.fwf function call outside of the read.SAScii function
NHIS.11.personsx.sas <- parse.SAScii( NHIS.11.personsx.SAS.read.in.instructions )
#save the data frame now for instantaneous loading later
save( NHIS.11.personsx.df , file = "NHIS.11.personsx.data.rda" )
#########################################################################################
#Load the 2011 National Health Interview Survey Sample Adult file as an R data frame
NHIS.11.samadult.SAS.read.in.instructions <-
    "ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Program_Code/NHIS/2011/SAMADULT.sas"
NHIS.11.samadult.file.location <-
    "ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHIS/2011/samadult.zip"
#store the NHIS file as an R data frame!
NHIS.11.samadult.df <-
    read.SAScii (
        NHIS.11.samadult.file.location ,
        NHIS.11.samadult.SAS.read.in.instructions ,
        zipped = TRUE )
#or store the NHIS SAS import instructions for use in a
#read.fwf function call outside of the read.SAScii function
NHIS.11.samadult.sas <- parse.SAScii( NHIS.11.samadult.SAS.read.in.instructions )
#save the data frame now for instantaneous loading later
save( NHIS.11.samadult.df , file = "NHIS.11.samadult.data.rda" )
#########################################################################################
#Load an IPUMS - American Community Survey Extract into R
#DOES NOT RUN without downloading ACS ASCII files to
#your local drive from http://www.ipums.org/
#MINNESOTA POPULATION CENTER - IPUMS ASCII EXTRACTS & SAS import instructions
IPUMS.file.location <- "./IPUMS/usa_00001.dat"
IPUMS.SAS.read.in.instructions <- "./IPUMS/usa_00001.sas"
#store the IPUMS extract as an R data frame!
IPUMS.df <-
    read.SAScii (
        IPUMS.file.location ,
        IPUMS.SAS.read.in.instructions ,
        zipped = FALSE )
#or store the IPUMS extract SAS import instructions for use in a
#read.fwf function call outside of the read.SAScii function
IPUMS.sas <- parse.SAScii( IPUMS.SAS.read.in.instructions )
#########################################################################################
#Load the Current Population Survey -
#Annual Social and Economic Supplement - March 2011 as an R data frame
#census.gov website containing the current population survey's main file
CPS.ASEC.mar11.file.location <-
    "http://smpbff2.dsd.census.gov/pub/cps/march/asec2011_pubuse.zip"
CPS.ASEC.mar11.SAS.read.in.instructions <-
    "http://www.nber.org/data/progs/cps/cpsmar11.sas"
#create a temporary file and a temporary directory..
tf <- tempfile() ; td <- tempdir()
#download the CPS ASEC zipped file
download.file( CPS.ASEC.mar11.file.location , tf , mode = "wb" )
#unzip the file's contents and store the file name within the temporary directory
fn <- unzip( tf , exdir = td , overwrite = TRUE )
#the CPS March Supplement ASCII/FWF contains household-, family-, and person-level records.
#throw out records that are not person-level.
#according to the SAS import instructions, person-level record lines begin with a "3"
#create a second temporary file
tf.sub <- tempfile()
input <- fn
output <- tf.sub
incon <- file(input, "r")
outcon <- file(output, "w")
#cycle through every line in the downloaded CPS file..
while(length(line <- readLines(incon, 1))>0){
    #and if the first character is a 3, add it to the new person-only CPS file.
    if ( substr( line , 1 , 1 ) == "3" ){
        writeLines(line,outcon)
    }
}
close(outcon)
close(incon)
#the SAS file produced by the National Bureau of Economic Research (NBER)
#begins the person-level INPUT after line 1209,
#so skip SAS import instruction lines before that.
#NOTE that the beginline of 1209 will change for different years.
#store the CPS ASEC March 2011 file as an R data frame!
cps.asec.mar11.df <-
    read.SAScii (
        tf.sub ,
        CPS.ASEC.mar11.SAS.read.in.instructions ,
        beginline = 1209 ,
        zipped = FALSE )
#or store the CPS ASEC March 2011 SAS import instructions for use in a
#read.fwf function call outside of the read.SAScii function
cps.asec.mar11.sas <-
    parse.SAScii( CPS.ASEC.mar11.SAS.read.in.instructions , beginline = 1209 )
#########################################################################################
#Load the Replicate Weights file of the Current Population Survey
#March 2011 as an R data frame
#census.gov website containing the current population survey's replicate weights file
CPS.replicate.weight.file.location <-
    "http://smpbff2.dsd.census.gov/pub/cps/march/CPS_ASEC_ASCII_REPWGT_2011.zip"
CPS.replicate.weight.SAS.read.in.instructions <-
    "http://smpbff2.dsd.census.gov/pub/cps/march/CPS_ASEC_ASCII_REPWGT_2011.SAS"
#store the CPS repwgt file as an R data frame!
cps.repwgt.df <-
    read.SAScii (
        CPS.replicate.weight.file.location ,
        CPS.replicate.weight.SAS.read.in.instructions ,
        zipped = TRUE )
#or store the CPS repwgt SAS import instructions for use in a
#read.fwf function call outside of the read.SAScii function
cps.repwgt.sas <- parse.SAScii( CPS.replicate.weight.SAS.read.in.instructions )
#########################################################################################
#Load the 2008 Survey of Income and Program Participation Wave 1 as an R data frame
SIPP.08w1.SAS.read.in.instructions <-
    "http://smpbff2.dsd.census.gov/pub/sipp/2008/l08puw1.sas"
SIPP.08w1.file.location <-
    "http://smpbff2.dsd.census.gov/pub/sipp/2008/l08puw1.zip"
#store the SIPP file as an R data frame
#note the text "INPUT" appears before the actual INPUT block of the SAS code
#so the parsing of the SAS instructions will fail without a beginline parameter specifying
#where the appropriate INPUT block occurs
SIPP.08w1.df <-
    read.SAScii (
        SIPP.08w1.file.location ,
        SIPP.08w1.SAS.read.in.instructions ,
        beginline = 5 ,
        buffersize = 10 ,
        zipped = TRUE )
#or store the SIPP SAS import instructions for use in a
#read.fwf function call outside of the read.SAScii function
SIPP.08w1.sas <- parse.SAScii( SIPP.08w1.SAS.read.in.instructions , beginline = 5 )
#########################################################################################
#Load the Replicate Weights file of the
#2008 Survey of Income and Program Participation Wave 1 as an R data frame
SIPP.repwgt.08w1.SAS.read.in.instructions <-
    "http://smpbff2.dsd.census.gov/pub/sipp/2008/rw08wx.sas"
SIPP.repwgt.08w1.file.location <-
    "http://smpbff2.dsd.census.gov/pub/sipp/2008/rw08w1.zip"
#store the SIPP file as an R data frame
#note the text "INPUT" appears before the actual INPUT block of the SAS code
#so the parsing of the SAS instructions will fail without a beginline parameter specifying
#where the appropriate INPUT block occurs
SIPP.repwgt.08w1.df <-
    read.SAScii (
        SIPP.repwgt.08w1.file.location ,
        SIPP.repwgt.08w1.SAS.read.in.instructions ,
        beginline = 5 ,
        zipped = TRUE )
#store the SIPP SAS import instructions for use in a
#read.fwf function call outside of the read.SAScii function
SIPP.repwgt.08w1.sas <-
    parse.SAScii(
        SIPP.repwgt.08w1.SAS.read.in.instructions ,
        beginline = 5 )
#########################################################################################
#Load all twelve waves of the 2004 Survey of Income and Program Participation as R data frames
SIPP.04w1.SAS.read.in.instructions <-
    "http://smpbff2.dsd.census.gov/pub/sipp/2004/l04puw1.sas"
#store the SIPP SAS import instructions for use in a
#read.fwf function call outside of the read.SAScii function
SIPP.04w1.sas <- parse.SAScii( SIPP.04w1.SAS.read.in.instructions , beginline = 5 )
#note the text "INPUT" appears before the actual INPUT block of the SAS code
#so the parsing of the SAS instructions will fail without a beginline parameter specifying
#where the appropriate INPUT block occurs
#loop through all 12 waves of SIPP 2004
for ( i in 1:12 ){
    SIPP.04wX.file.location <-
        paste( "http://smpbff2.dsd.census.gov/pub/sipp/2004/l04puw" , i , ".zip" , sep = "" )
    #name the data frame based on the current wave
    df.name <- paste( "SIPP.04w" , i , ".df" , sep = "" )
    #store the SIPP file as an R data frame!
    assign(
        df.name ,
        read.SAScii (
            SIPP.04wX.file.location ,
            SIPP.04w1.SAS.read.in.instructions ,
            beginline = 5 ,
            buffersize = 5 ,
            zipped = TRUE )
    )
}
## End(Not run)
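The record-filtering step in the CPS example above reads and tests one line at a time. A possible variation, sketched below on a small hypothetical file, reads the input in chunks and filters each chunk with a vectorized comparison. The chunk size of 1000 and the sample lines are assumptions chosen for illustration; the rule of keeping lines that begin with "3" is the one stated in the example.

```r
# hypothetical mixed-record file: the first character marks the record type
mixed.lines <- c(
    "1household-record" ,
    "2family-record" ,
    "3person-record-a" ,
    "3person-record-b"
)
tf.in <- tempfile()
tf.out <- tempfile()
writeLines( mixed.lines , con = tf.in )

incon <- file( tf.in , "r" )
outcon <- file( tf.out , "w" )

# read up to 1000 lines per pass and keep only the person-level records
while ( length( chunk <- readLines( incon , n = 1000 ) ) > 0 ) {
    writeLines( chunk[ substr( chunk , 1 , 1 ) == "3" ] , outcon )
}

close( incon )
close( outcon )

person.lines <- readLines( tf.out )
print( person.lines )
```

The filtered file (tf.out here) would then take the place of tf.sub in the read.SAScii call.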