
R: help documentation for the strsplit() function

Posted on 2012-2-16 18:05:54
strsplit(base)
R package containing strsplit(): base

Split the Elements of a Character Vector

Translator: 生物统计家园网, robot LoveR

----------Description----------

Split the elements of a character vector x into substrings according to the matches to substring split within them.


----------Usage----------


strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)



----------Arguments----------

Argument: x
character vector, each element of which is to be split. Other inputs, including a factor, will give an error.
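
As a small illustration of the input rule above, the sketch below shows a factor being rejected and then accepted once it is converted with as.character(). The variable names are made up for the example.

```r
f <- factor(c("a b", "c d"))   # hypothetical input stored as a factor

# strsplit() refuses non-character input, so capture the error instead of stopping
res <- tryCatch(strsplit(f, " "), error = function(e) "error")

# Converting explicitly to character makes the call valid
parts <- strsplit(as.character(f), " ")
```

Converting up front is usually the simplest fix when the data come from a data frame column that was read in as a factor.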


Argument: split
character vector (or object which can be coerced to such) containing regular expression(s) (unless fixed = TRUE) to use for splitting. If empty matches occur, in particular if split has length 0, x is split into single characters. If split has length greater than 1, it is recycled along x.
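
To make the recycling rule concrete, here is a minimal sketch in which split has two elements and x has four, so the separators are reused in order. The vectors x and seps are invented for the illustration.

```r
x    <- c("a-b-c", "d_e", "f-g", "h_i")   # hypothetical input strings
seps <- c("-", "_")                       # recycled along x: "-", "_", "-", "_"

# Each element of x is split with the separator at the matching (recycled) position
parts <- strsplit(x, seps)
```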


Argument: fixed
logical. If TRUE, match split exactly; otherwise use regular expressions. Takes priority over perl.


Argument: perl
logical. Should Perl-compatible regular expressions be used?
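
The difference between the matching modes can be seen in a short sketch. The input strings are arbitrary examples; only the fixed and perl arguments described above are used.

```r
# fixed = TRUE: "+" is an ordinary character, not a regex quantifier
r_fixed <- strsplit("1+1=2", "+", fixed = TRUE)[[1]]

# Default regular-expression mode: the metacharacter has to be escaped instead
r_escaped <- strsplit("1+1=2", "\\+")[[1]]

# perl = TRUE: the pattern is interpreted as a Perl-compatible regular expression
r_perl <- strsplit("a1b22c", "\\d+", perl = TRUE)[[1]]
```

When the separator is a literal string, fixed = TRUE avoids the need for escaping altogether.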


Argument: useBytes
logical. If TRUE, the matching is done byte-by-byte rather than character-by-character, and inputs with marked encodings are not converted. This is forced (with a warning) if any input is found which is marked as "bytes".


----------Details----------

Argument split will be coerced to character, so you will see uses with split = NULL to mean split = character(0), including in the examples below.

Note that splitting into single characters can be done via split = character(0) or split = ""; the two are equivalent. The definition of "character" here depends on the locale: in a single-byte locale it is a byte, and in a multi-byte locale it is the unit represented by a "wide character" (almost always a Unicode code point).
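
A quick check of the equivalence described above, using an ASCII string so that the result does not depend on the locale:

```r
# Three spellings of "split into single characters"
chars_null  <- strsplit("abc", NULL)[[1]]          # NULL is coerced to character(0)
chars_zero  <- strsplit("abc", character(0))[[1]]
chars_empty <- strsplit("abc", "")[[1]]
```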

A missing value of split does not split the corresponding element(s) of x at all.
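
For instance, in the sketch below the second separator is NA, so the second string should come back unsplit. The input vector is made up for the illustration.

```r
x <- c("a b", "c d")              # hypothetical input strings

# The first element is split on a space; the NA separator leaves the second untouched
parts <- strsplit(x, c(" ", NA))
```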

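The algorithm applied to each input string is: repeat until the string is empty; if there is a match, add the part of the string to the left of the match to the output and remove the match and everything to its left; otherwise add the remaining string to the output and stop. This means that a match at the beginning of a non-empty string makes the first element of the output "", whereas a match at the end of the string gives the same output as with that match removed.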
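
The loop can be sketched in R itself. This is only an illustration of the behaviour shown in the examples on this page (no trailing empty string, an empty string before a leading match), not the actual implementation: split_one is a hypothetical helper, it handles one string and one pattern, and it assumes the pattern never matches the empty string.

```r
# Hypothetical helper: simplified splitting of a single string on a single regex
split_one <- function(s, pattern) {
  out <- character(0)
  while (nchar(s) > 0) {                      # stop once the string is used up
    m <- regexpr(pattern, s)
    start <- as.integer(m)
    len <- attr(m, "match.length")
    if (start == -1) {                        # no match: keep the remainder and stop
      out <- c(out, s)
      break
    }
    if (len == 0) stop("empty matches are not handled in this sketch")
    out <- c(out, substr(s, 1, start - 1))    # text to the left of the match
    s <- substr(s, start + len, nchar(s))     # drop the match and everything before it
  }
  out
}

lead_trail <- split_one("#a#", "#")   # leading match gives "", trailing match gives nothing
no_input   <- split_one("", " ")      # empty input gives character(0)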


----------Value----------

A list of the same length as x, the i-th element of which contains the vector of splits of x[i].
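
Because the result is a list, per-element summaries are usually taken with lengths() or vapply(). A minimal sketch, with an invented named vector x:

```r
x <- c(first = "one two", second = "three")   # hypothetical named input

parts <- strsplit(x, " ")                      # one list element per element of x

n_pieces     <- lengths(parts)                                   # pieces per string
first_tokens <- vapply(parts, function(p) p[1], character(1))    # first piece of each
```

Use unlist(parts) only when the grouping by input element no longer matters.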

If any element of x or split is declared to be in UTF-8 (see Encoding), all non-ASCII character strings in the result will be in UTF-8 and have their encoding declared as UTF-8. As of R 2.10.0, for perl = TRUE, useBytes = FALSE all non-ASCII strings in a multibyte locale are translated to UTF-8.


----------Note----------

Prior to R 2.11.0 there was an argument extended which could be used to select "basic" regular expressions: this was often used when fixed = TRUE would have been preferable. In the actual implementation (as distinct from the POSIX standard) the only difference was that ?, +, {, |, (, and ) were not interpreted as metacharacters.


----------See Also----------

paste for the reverse, grep and sub for string search and manipulation; also nchar, substr.

"regular expression" (see ?regex) for the details of the pattern specification.
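
As a brief sketch of how paste acts as the reverse of strsplit, the pieces of a string can be joined back with the same separator. The date string is just an example value.

```r
pieces   <- strsplit("2012-02-16", "-", fixed = TRUE)[[1]]   # split on a literal "-"
rejoined <- paste(pieces, collapse = "-")                    # join the pieces again
n_chars  <- nchar(pieces)                                    # length of each piece
```

The round trip is exact here because the string has no trailing separator; a final empty piece would not be produced by strsplit and so would not be restored by paste.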


----------Examples----------


noquote(strsplit("A text I want to display with spaces", NULL)[[1]])

x <- c(as = "asfef", qu = "qwerty", "yuiop[", "b", "stuff.blah.yech")
# split x on the letter e
strsplit(x, "e")

unlist(strsplit("a.b.c", "."))
## [1] "" "" "" "" ""
## Note that 'split' is a regexp!
## If you really want to split on '.', use
unlist(strsplit("a.b.c", "\\."))
## [1] "a" "b" "c"
## or
unlist(strsplit("a.b.c", ".", fixed = TRUE))

## a useful function: rev() for strings
strReverse <- function(x)
        sapply(lapply(strsplit(x, NULL), rev), paste, collapse="")
strReverse(c("abc", "Statistics"))

## get the first names of the members of R-core
a <- readLines(file.path(R.home("doc"),"AUTHORS"))[-(1:8)]
a <- a[(0:2)-length(a)]
(a <- sub(" .*","", a))
# and reverse them
strReverse(a)

## Note that final empty strings are not produced:
strsplit(paste(c("", "a", ""), collapse="#"), split="#")[[1]]
# [1] ""  "a"
## and also an empty string is only produced before a definite match:
strsplit("", " ")[[1]]    # character(0)
strsplit(" ", " ")[[1]]   # [1] ""

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


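Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R; 生物统计家园 retains the copyright.
Note 2: Because the translation was produced automatically by a robot, inaccuracies are unavoidable; carefully comparing the Chinese and English text while reading can help with learning R.
Note 3: If you find inaccuracies, please reply to this post and we will revise them over time.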