R: help documentation for the strsplit() function (from a Chinese-English bilingual edition)

loveR · posted on 2012-2-16 18:05:54

strsplit(base)
strsplit() belongs to the R package: base

                                    Split the Elements of a Character Vector

                                       Translator: LoveR, the robot of 生物统计家园网

----------Description----------

Split each element of a character vector x into substrings at the matches of split found within it.

----------Usage----------

strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)

----------Arguments----------

Argument: x
Character vector, each element of which is to be split. Other inputs, including a factor, will give an error.
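
As a small illustration of the restriction on x, the sketch below passes a factor and then an explicitly converted character vector. The variable names are made up for the example, and tryCatch() is used only so that the error does not stop the script.

```r
f <- factor(c("a b", "c d"))

# A factor is not a character vector, so strsplit() signals an error.
# The handler simply records that an error happened.
res <- tryCatch(strsplit(f, " "), error = function(e) "error")

# Converting explicitly with as.character() makes the call valid.
parts <- strsplit(as.character(f), " ")
parts[[1]]  # "a" "b"
```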

Argument: split
Character vector (or an object which can be coerced to one) containing the regular expression(s) (unless fixed = TRUE) to use for splitting. If empty matches occur, in particular if split has length 0, x is split into single characters. If split has length greater than 1, it is recycled along x.
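
Recycling of split along x is easier to see with a concrete call. This is a minimal sketch with invented input strings: the first call supplies one separator per string, the second lets two separators be reused across four strings.

```r
# One separator per element of x.
x <- c("2012-02-16", "18:05:54", "a-b:c")
parts <- strsplit(x, c("-", ":", "-"))
parts[[3]]  # "a" "b:c"  (only "-" is used for the third string)

# Two separators recycled along four strings: "-", ":", "-", ":".
recycled <- strsplit(c("a-b", "c:d", "e-f", "g:h"), c("-", ":"))
```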

Argument: fixed
Logical. If TRUE, match split exactly; otherwise use regular expressions. Takes priority over perl.

Argument: perl
Logical. Should Perl-compatible regular expressions be used?
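
The difference between the default regular-expression matching, fixed = TRUE, and perl = TRUE can be sketched as follows. The input strings are arbitrary; the point is only which characters need escaping in each mode.

```r
s <- "a.b|c"

# As a regular expression "." matches any character, so it must be escaped ...
by_regex <- strsplit(s, "\\.")[[1]]             # "a" "b|c"

# ... or matched literally with fixed = TRUE.
by_fixed <- strsplit(s, ".", fixed = TRUE)[[1]]  # "a" "b|c"

# fixed = TRUE also removes the need to escape "|".
by_bar <- strsplit(s, "|", fixed = TRUE)[[1]]    # "a.b" "c"

# Perl-compatible syntax, here the \d shorthand for a digit.
by_perl <- strsplit("a1b22c", "\\d+", perl = TRUE)[[1]]  # "a" "b" "c"
```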

Argument: useBytes
Logical. If TRUE, the matching is done byte-by-byte rather than character-by-character, and inputs with marked encodings are not converted. This is forced (with a warning) if any input is found which is marked as "bytes".
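
A rough way to see the effect of useBytes is to split a non-ASCII string into single units with and without it. The sketch assumes the string is stored as UTF-8, which the \u escapes used here arrange; the counts are compared against nchar() rather than hard-coded.

```r
x <- "\u00e9t\u00e9"  # three characters, stored as UTF-8

# Default: one piece per character.
chars <- strsplit(x, "")[[1]]

# useBytes = TRUE: one piece per byte of the encoded string.
bytes <- strsplit(x, "", useBytes = TRUE)[[1]]

length(chars) == nchar(x, type = "chars")
length(bytes) == nchar(x, type = "bytes")
```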

----------Details----------

Argument split will be coerced to character, so you will see uses with split = NULL to mean split = character(0), including in the examples below.

Note that splitting into single characters can be done via split = character(0) or split = ""; the two are equivalent. The definition of "character" here depends on the locale: in a single-byte locale it is a byte, and in a multi-byte locale it is the unit represented by a "wide character" (almost always a Unicode code point).
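
The equivalence of the three spellings can be checked directly. A minimal sketch with an arbitrary ASCII string:

```r
s <- "split me"

a <- strsplit(s, NULL)          # NULL is coerced to character(0)
b <- strsplit(s, character(0))
d <- strsplit(s, "")

# All three give one piece per character.
identical(a, b) && identical(b, d)
length(a[[1]]) == nchar(s)
```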

A missing value of split does not split the corresponding element(s) of x at all.
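
For example, with a two-element split in which the second entry is NA, only the first string is split. The inputs are invented for illustration.

```r
x <- c("a b", "c d")

# The NA in 'split' lines up with "c d", which is returned unsplit.
parts <- strsplit(x, c(" ", NA))
parts[[1]]  # "a" "b"
parts[[2]]  # "c d"
```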

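The algorithm applied to each input string is a loop: if the string is empty, stop; if there is a match, add the string to the left of the match to the output and remove the match and everything to the left of it; otherwise add the remaining string to the output and stop. This means that if there is a match at the beginning of a (non-empty) string, the first element of the output is "", but if there is a match at the end of the string, the output is the same as with the match removed.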
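
The loop described in R's help page for strsplit (take the text to the left of each match, discard the match, stop when the string is empty or no match remains) can be sketched in plain R. split_one() is a hypothetical helper written for this illustration, not how strsplit() is actually implemented; it handles one string and one regular expression, and it assumes the pattern never matches the empty string.

```r
# Hypothetical illustration of the documented loop for a single string.
split_one <- function(s, pattern) {
  out <- character(0)
  repeat {
    if (nchar(s) == 0) break            # nothing left: stop
    m <- regexpr(pattern, s)            # position of the first match, or -1
    if (m == -1) {                      # no match: keep the rest and stop
      out <- c(out, s)
      break
    }
    len <- attr(m, "match.length")
    if (len == 0) stop("empty matches are not handled in this sketch")
    out <- c(out, substr(s, 1, m - 1))  # text to the left of the match
    s <- substr(s, m + len, nchar(s))   # drop the match and all before it
  }
  out
}

split_one("#a#", "#")  # "" "a"
split_one("", " ")     # character(0)
```

The two calls reproduce the behaviour shown in the examples at the end of this page: a leading match yields an empty first element, while a trailing match yields nothing extra.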

----------Value----------

A list of the same length as x, the i-th element of which contains the vector of splits of x[i].
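
The shape of the return value is worth seeing once, since a single string still comes back wrapped in a list. A minimal sketch with made-up names on x:

```r
x <- c(first = "a b c", second = "d")
res <- strsplit(x, " ")

length(res) == length(x)  # one list element per input string
res[[1]]                  # "a" "b" "c"
names(res)                # names of x are carried over: "first" "second"
lengths(res)              # number of pieces per input string
```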

If any element of x or split is declared to be in UTF-8 (see Encoding), all non-ASCII character strings in the result will be in UTF-8 and have their encoding declared as UTF-8. As from R 2.10.0, for perl = TRUE, useBytes = FALSE all non-ASCII strings in a multibyte locale are translated to UTF-8.
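
The encoding rule can be inspected with Encoding(). The sketch assumes the input is declared as UTF-8, which the \u escape arranges; pure-ASCII pieces carry no declared encoding and are reported as "unknown".

```r
x <- "caf\u00e9 au lait"  # declared as UTF-8 because of the \u escape
parts <- strsplit(x, " ")[[1]]

# Only the non-ASCII piece is marked as UTF-8.
Encoding(parts)
```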

----------Note----------

Prior to R 2.11.0 there was an argument extended which could be used to select "basic" regular expressions: this was often used when fixed = TRUE would be preferable. In the actual implementation (as distinct from the POSIX standard) the only difference was that ?, +, {, |, (, and ) were not interpreted as metacharacters.

----------See Also----------

paste for the reverse, grep and sub for string search and manipulation; also nchar, substr.

"regular expression" for the details of the pattern specification.

----------Examples----------

noquote(strsplit("A text I want to display with spaces", NULL)[[1]])

x <- c(as = "asfef", qu = "qwerty", "yuiop[", "b", "stuff.blah.yech")
# split x on the letter e
strsplit(x, "e")

unlist(strsplit("a.b.c", "."))
## [1] "" "" "" "" ""
## Note that 'split' is a regexp!
## If you really want to split on '.', use
unlist(strsplit("a.b.c", "\\."))
## [1] "a" "b" "c"
## or
unlist(strsplit("a.b.c", ".", fixed = TRUE))

## a useful function: rev() for strings
## (split into single characters, reverse them, paste back together)
strReverse <- function(x)
      sapply(lapply(strsplit(x, NULL), rev), paste, collapse="")
strReverse(c("abc", "Statistics"))

## get the first names of the members of R-core
## (skip the first 8 lines of the AUTHORS file, then drop the last 3)
a <- readLines(file.path(R.home("doc"),"AUTHORS"))[-(1:8)]
a <- a[(0:2)-length(a)]
## keep only the text before the first space
(a <- sub(" .*","", a))
# and reverse them
strReverse(a)

## Note that final empty strings are not produced:
strsplit(paste(c("", "a", ""), collapse="#"), split="#")[[1]]
# [1] ""  "a"
## and also an empty string is only produced before a definite match:
strsplit("", " ")[[1]] # character(0)
strsplit(" ", " ")[[1]] # [1] ""

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

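Notes:
Note 1: To make learning easier, this document was translated by LoveR, the robot of 生物统计家园网. It is intended only as a personal reference for learning R; 生物统计家园 retains the copyright.
Note 2: Because the translation is produced automatically by a robot, some inaccuracies are unavoidable. Comparing the Chinese and English text carefully and repeatedly can help with learning R.
Note 3: If you find an inaccuracy, please reply at the end of this thread and we will gradually revise it.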
