read.table(utils)
read.table() belongs to R package: utils
Data Input
Translated by the 生物统计家园网 robot LoveR
----------Description----------
Reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file.
----------Usage----------
read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", row.names, col.names,
           as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text)

read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".",
         fill = TRUE, comment.char = "", ...)

read.csv2(file, header = TRUE, sep = ";", quote = "\"", dec = ",",
          fill = TRUE, comment.char = "", ...)

read.delim(file, header = TRUE, sep = "\t", quote = "\"", dec = ".",
           fill = TRUE, comment.char = "", ...)

read.delim2(file, header = TRUE, sep = "\t", quote = "\"", dec = ",",
            fill = TRUE, comment.char = "", ...)
----------Arguments----------
Argument: file
the name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an absolute path, the file name is relative to the current working directory, getwd(). Tilde-expansion is performed where supported. As from R 2.10.0 this can be a compressed file (see file). Alternatively, file can be a readable text-mode connection (which will be opened for reading if necessary, and if so closed (and hence destroyed) at the end of the function call). (If stdin() is used, the prompts for lines may be somewhat confusing. Terminate input with a blank line or an EOF signal, Ctrl-D on Unix and Ctrl-Z on Windows. Any pushback on stdin() will be cleared before return.) file can also be a complete URL. (For the supported URL schemes, see the "URLs" section of the help for url.)
Argument: header
a logical value indicating whether the file contains the names of the variables as its first line. If missing, the value is determined from the file format: header is set to TRUE if and only if the first row contains one fewer field than the number of columns.
Argument: sep
the field separator character. Values on each line of the file are separated by this character. If sep = "" (the default for read.table) the separator is "white space", that is one or more spaces, tabs, newlines or carriage returns.
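A minimal sketch of the difference between the default white-space separator and an explicit one. The inline strings are made-up sample data, passed through the text argument so that no file is needed:

```r
# Default sep = "": any run of spaces or tabs counts as a single separator.
ws <- read.table(text = "1   2\t3\n4 5    6")

# Explicit sep = ",": every comma delimits a field, so an empty field
# between two commas is kept and read as a missing value.
cs <- read.table(text = "1,,3\n4,5,6", sep = ",")

print(dim(ws))
print(cs)
```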
Argument: quote
the set of quoting characters. To disable quoting altogether, use quote = "". See scan for the behaviour on quotes embedded in quotes. Quoting is only considered for columns read as character, which is all of them unless colClasses is specified.
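As an illustration of why quoting sometimes has to be disabled, the sketch below reads made-up names that contain apostrophes. With the default quote set the apostrophes would be treated as quote characters, so quote = "" is used here:

```r
# Hypothetical sample data: names with embedded apostrophes.
txt <- "name score\nO'Brien 7\nD'Arcy 9"

# quote = "" turns quote handling off, so each apostrophe is ordinary text.
df <- read.table(text = txt, header = TRUE, quote = "",
                 stringsAsFactors = FALSE)
print(df)
```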
Argument: dec
the character used in the file for decimal points.
Argument: row.names
a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or character string giving the name of the table column containing the row names. If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise if row.names is missing, the rows are numbered. Using row.names = NULL forces row numbering. Missing or NULL row.names generate row names that are considered to be "automatic" (and not preserved by as.matrix).
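The forms of row.names described above can be sketched as follows. The column name id and the values are invented for the example:

```r
txt <- "id,height,weight\np1,170,65\np2,182,80"

# Row names taken from a column, given by name or by position.
by_name <- read.csv(text = txt, row.names = "id")
by_pos  <- read.csv(text = txt, row.names = 1)

# row.names = NULL forces automatic numbering and keeps id as a data column.
numbered <- read.csv(text = txt, row.names = NULL)

print(rownames(by_name))
print(names(numbered))
```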
Argument: col.names
a vector of optional names for the variables. The default is to use "V" followed by the column number.
Argument: as.is
the default behavior of read.table is to convert character variables (which are not converted to logical, numeric or complex) to factors. The variable as.is controls the conversion of columns not otherwise specified by colClasses. Its value is either a vector of logicals (values are recycled if necessary), or a vector of numeric or character indices which specify which columns should not be converted to factors. Note: to suppress all conversions including those of numeric columns, set colClasses = "character". Note that as.is is specified per column (not per variable) and so includes the column of row names (if any) and any columns to be skipped.
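A small sketch of per-column control with as.is, using invented data. Note that stringsAsFactors is written out explicitly here because its default is not the same in every R version; that choice is an assumption of this example, not part of the original text:

```r
txt <- "id,grp,note\n1,a,foo\n2,b,bar"

# as.is names the columns that must NOT be converted to factors;
# the remaining character column (grp) is still converted.
df <- read.csv(text = txt, stringsAsFactors = TRUE, as.is = "note")
print(sapply(df, class))
```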
Argument: na.strings
a character vector of strings which are to be interpreted as NA values. Blank fields are also considered to be missing values in logical, integer, numeric and complex fields.
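For example, a file that marks missing values with "." or a sentinel such as -999 could be read along these lines (the data and the choice of markers are hypothetical):

```r
txt <- "a,b\n1,-999\n.,5\nNA,6"

# Every string listed in na.strings is turned into NA before type conversion,
# so both columns still end up numeric.
df <- read.csv(text = txt, na.strings = c("NA", ".", "-999"))
print(df)
```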
Argument: colClasses
character. A vector of classes to be assumed for the columns. Recycled as necessary, or if the character vector is named, unspecified values are taken to be NA. Possible values are NA (the default, when type.convert is used), "NULL" (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or "factor", "Date" or "POSIXct". Otherwise there needs to be an as method (from package methods) for conversion from "character" to the specified formal class. Note that colClasses is specified per column (not per variable) and so includes the column of row names (if any).
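A sketch of colClasses in use, with made-up column names: it keeps leading zeros by reading an identifier as character, converts a date column, and drops an unwanted column with "NULL":

```r
txt <- "id,when,junk,value\n007,2020-01-31,x,1.5\n042,2021-12-01,y,2"

# One class per column in file order; "NULL" skips that column entirely.
df <- read.csv(text = txt,
               colClasses = c("character", "Date", "NULL", "numeric"))
print(sapply(df, class))
```

Without colClasses the id column would be converted to integer and the leading zeros would be lost.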
Argument: nrows
integer: the maximum number of rows to read in. Negative and other invalid values are ignored.
Argument: skip
integer: the number of lines of the data file to skip before beginning to read data.
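The sketch below combines skip with nrows on invented input that starts with two preamble lines: the preamble is skipped, the next line is used as the header, and only the first two data rows are read:

```r
txt <- "exported by some tool\nsecond preamble line\nx y\n1 2\n3 4\n5 6"

# skip counts physical lines before the header; nrows caps the data rows.
df <- read.table(text = txt, skip = 2, header = TRUE, nrows = 2)
print(df)
```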
Argument: check.names
logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by make.names) so that they are, and also to ensure that there are no duplicates.
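To see the adjustment made by make.names, compare the column names obtained with and without checking. The header below is deliberately awkward sample data (a space, a leading digit, and a duplicate):

```r
txt <- "my var,2nd,my var\n1,2,3"

checked <- read.csv(text = txt)                       # check.names = TRUE
raw     <- read.csv(text = txt, check.names = FALSE)  # names kept verbatim

print(names(checked))
print(names(raw))
```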
Argument: fill
logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added. See "Details".
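A minimal sketch of fill on ragged input (made-up data). Without fill the short rows are expected to raise an error, which is captured here rather than allowed to stop the script:

```r
txt <- "a b c\n1 2 3\n4 5\n6"

# fill = FALSE (the read.table default): short rows are an error.
err <- tryCatch(read.table(text = txt, header = TRUE),
                error = function(e) conditionMessage(e))

# fill = TRUE: short rows are padded with NA on the right.
df <- read.table(text = txt, header = TRUE, fill = TRUE)
print(df)
```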
Argument: strip.white
logical. Used only when sep has been specified, and allows the stripping of leading and trailing white space from unquoted character fields (numeric fields are always stripped). See scan for further details (including the exact meaning of "white space"), remembering that the columns may include the row names.
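The effect on character fields can be sketched with comma-separated sample data that has padding around the names. stringsAsFactors = FALSE is passed only so that the example gives plain character columns in any R version:

```r
txt <- "name,n\n alice ,1\n bob   ,2"

kept     <- read.csv(text = txt, stringsAsFactors = FALSE)
stripped <- read.csv(text = txt, strip.white = TRUE,
                     stringsAsFactors = FALSE)

# The numeric column is unaffected; only the character column differs.
print(nchar(kept$name))
print(nchar(stripped$name))
```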
Argument: blank.lines.skip
logical: if TRUE blank lines in the input are ignored.
Argument: comment.char
character: a character vector of length one containing a single character or an empty string. Use "" to turn off the interpretation of comments altogether.
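A sketch of both settings on invented input: with the default "#" a leading comment line and a trailing comment are dropped, while comment.char = "" lets fields that contain "#" through unchanged:

```r
# Default comment.char = "#": comments are discarded.
txt <- "# produced by a hypothetical exporter\nid tag\n1 a # trailing note\n2 b"
df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# comment.char = "": "#" is ordinary data.
hash <- read.table(text = "id tag\n1 #a\n2 #b", header = TRUE,
                   comment.char = "", stringsAsFactors = FALSE)

print(df)
print(hash)
```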
Argument: allowEscapes
logical. Should C-style escapes such as \n be processed or read verbatim (the default)? Note that if not within quotes these could be interpreted as a delimiter (but not as a comment character). For more details see scan.
Argument: flush
logical: if TRUE, scan will flush to the end of the line after reading the last of the fields requested. This allows putting comments after the last field.
Argument: stringsAsFactors
logical: should character vectors be converted to factors? Note that this is overridden by as.is and colClasses, both of which allow finer control.
Argument: fileEncoding
character string: if non-empty declares the encoding used on a file (not a connection) so the character data can be re-encoded. See the "Encoding" section of the help for file, the "R Data Import/Export Manual" and "Note".
Argument: encoding
encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8 (see Encoding): it is not used to re-encode the input, but allows R to handle encoded strings in their native encoding (if one of those two). See "Value".
Argument: text
character string: if file is not supplied and this is, then data are read from the value of text via a text connection. Notice that a literal string can be used to include (small) data sets within R code.
Argument: ...
Further arguments to be passed to read.table.
----------Details----------
This function is the principal means of reading tabular data into R.
Unless colClasses is specified, all columns are read as character columns and then converted using type.convert to logical, integer, numeric, complex or (depending on as.is) factor as appropriate. Quotes are (by default) interpreted in all fields, so a column of values like "42" will result in an integer column.
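The conversion step can be observed on a small made-up table. Here as.is = TRUE is an assumption of the example, chosen so that the last column stays character instead of becoming a factor:

```r
txt <- 'n f flag s\n"42" 1.5 TRUE a\n"7" 2 FALSE b'

# Quoted numbers are still converted, because quotes are interpreted first.
df <- read.table(text = txt, header = TRUE, as.is = TRUE)
print(sapply(df, class))

# colClasses = "character" suppresses every conversion.
chr <- read.table(text = txt, header = TRUE, colClasses = "character")
print(sapply(chr, class))
```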
A field or line is "blank" if it contains nothing (except whitespace if no separator is specified) before a comment character or the end of the field or line.
If row.names is not specified and the header line has one less entry than the number of columns, the first column is taken to be the row names. This allows data frames to be read in from the format in which they are printed. If row.names is specified and does not refer to the first column, that column is discarded from such files.
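As a sketch of reading a data frame back from its printed form: the printed header has one entry fewer than the data lines, so the header and the row names are both inferred. The data frame orig is invented for the example:

```r
orig <- data.frame(x = c(1.5, 2.5), y = c(10L, 20L),
                   row.names = c("r1", "r2"))

# Capture what print() would show, one element per line.
printed <- capture.output(print(orig))

# Neither header nor row.names is given; both are deduced from the layout.
back <- read.table(text = printed)
print(back)
```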
The number of data columns is determined by looking at the first five lines of input (or the whole file if it has less than five lines), or from the length of col.names if it is specified and is longer. This could conceivably be wrong if fill or blank.lines.skip are true, so specify col.names if necessary (as in the "Examples").
read.csv and read.csv2 are identical to read.table except for the defaults. They are intended for reading "comma separated value" files (".csv") or (read.csv2) the variant used in countries that use a comma as decimal point and a semicolon as field separator. Similarly, read.delim and read.delim2 are for reading delimited files, defaulting to the TAB character for the delimiter. Notice that header = TRUE and fill = TRUE in these variants, and that the comment character is disabled.
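To make the "identical except for the defaults" point concrete, the sketch below reads a made-up semicolon-separated table with read.csv2 and again with read.table using the same settings spelled out:

```r
txt <- "item;price\napple;1,25\npear;0,80"

eu <- read.csv2(text = txt)

# The same call written out with read.table and the read.csv2 defaults.
same <- read.table(text = txt, header = TRUE, sep = ";", quote = "\"",
                   dec = ",", fill = TRUE, comment.char = "")

print(eu$price)
print(isTRUE(all.equal(eu, same)))
```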
The rest of the line after a comment character is skipped; quotes are not processed in comments. Complete comment lines are allowed provided blank.lines.skip = TRUE; however, comment lines prior to the header must have the comment character in the first non-blank column.
Quoted fields with embedded newlines are supported except after a comment character.
----------Value----------
A data frame (data.frame) containing a representation of the data in the file.
Empty input is an error unless col.names is specified, when a 0-row data frame is returned: similarly giving just a header line if header = TRUE results in a 0-row data frame. Note that in either case the columns will be logical unless colClasses was supplied.
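Both zero-row cases can be sketched as follows, assuming an empty temporary file for the first case and a header-only string for the second:

```r
# Case 1: a completely empty file, rescued by supplying col.names.
tf <- tempfile()
file.create(tf)
empty <- read.table(tf, col.names = c("a", "b"))
unlink(tf)

# Case 2: input that consists of a header line only.
header_only <- read.csv(text = "a,b")

print(dim(empty))
print(sapply(header_only, class))
```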
Character strings in the result (including factor levels) will have a declared encoding if encoding is "latin1" or "UTF-8".
----------Memory usage----------
These functions can use a surprising amount of memory when reading large files. There is extensive discussion in the "R Data Import/Export" manual, supplementing the notes here.
Less memory will be used if colClasses is specified as one of the six atomic vector classes. This can be particularly so when reading a column that takes many distinct numeric values, as storing each distinct value as a character string can take up to 14 times as much memory as storing it as an integer.
Using nrows, even as a mild over-estimate, will help memory usage.
Using comment.char = "" will be appreciably faster than the read.table default.
read.table is not the right tool for reading large matrices, especially those with many columns: it is designed to read data frames which may have columns of very different classes. Use scan instead for matrices.
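A minimal sketch of the scan alternative for a purely numeric matrix. The file contents are made up, and the number of columns (3) is assumed to be known in advance:

```r
tf <- tempfile()
writeLines(c("1 2 3", "4 5 6"), tf)

# scan() returns one long numeric vector; matrix() restores the row layout.
m <- matrix(scan(tf, quiet = TRUE), ncol = 3, byrow = TRUE)
unlink(tf)

print(m)
```

When the column count is not known, count.fields on the first line is one way to obtain it.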
----------Note----------
The columns referred to in as.is and colClasses include the column of row names (if any).
Because this function uses pushBack it can only handle character strings which can be represented in the current locale. So although fileEncoding can be used to specify the encoding of the input file (or a connection can be specified which re-encodes), the implied re-encoding must be possible. This is not a problem in UTF-8 locales, but it can be on Windows — readLines or scan can be used to avoid this limitation since they have special provisions to convert input to UTF-8.
----------References----------
Chambers, J. M. (1992) Data for models. Chapter 3 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
----------See Also----------
The "R Data Import/Export" manual.
scan, type.convert, read.fwf for reading fixed width formatted input; write.table; data.frame.
count.fields can be useful to determine problems with reading files which result in reports of incorrect record lengths (see the "Examples" below).
http://tools.ietf.org/html/rfc4180 for the IANA definition of CSV files (which requires comma as separator and CRLF line endings).
----------Examples----------
## using count.fields to handle unknown maximum number of fields
## when fill = TRUE
test1 <- c(1:5, "6,7", "8,9,10")
tf <- tempfile()
writeLines(test1, tf)
read.csv(tf, fill = TRUE) # 1 column
ncol <- max(count.fields(tf, sep = ","))
read.csv(tf, fill = TRUE, header = FALSE,
         col.names = paste("V", seq_len(ncol), sep = ""))
unlink(tf)

## "Inline" data set, using text=
## Notice that leading and trailing empty lines are auto-trimmed
read.table(header=TRUE, text="
a b
1 2
3 4
")