R Unicode package: help documentation for u_char_basics()

loveR · posted 2012-10-1 13:20:28

u_char_basics(Unicode)
u_char_basics() belongs to R package: Unicode

                                    Unicode Character Objects

                                       Translator: LoveR, the translation robot of 生物统计家园网

----------Description----------

Data structures and basic methods for Unicode character data.

----------Usage----------

as.u_char(x)
as.u_char_range(x)
as.u_char_seq(x, sep = NA_character_)

----------Arguments----------

Argument: x
R objects coercible to the respective Unicode character data types; see Details.

Argument: sep
a character string.

----------Details----------

Package Unicode provides three basic classes for representing Unicode characters: u_char for vectors of Unicode characters, u_char_range for vectors of Unicode character ranges, and u_char_seq for vectors of Unicode character sequences. Objects of these classes are created via the respective coercion functions as.u_char, as.u_char_range and as.u_char_seq.

as.u_char can coerce integers or hex strings (with or without a leading 0x or the U+ prefix typically used for Unicode characters) to the corresponding code points. It can also handle Unicode character ranges, flattening them out into the corresponding vector of Unicode characters. To "coerce" a UTF-8 encoded R character string to the corresponding Unicode character object, first obtain its integer code points via utf8ToInt and then apply as.u_char to the result.
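The accepted spellings of a code point can be illustrated with a small sketch. The base R part only mimics the parsing described above (it is not the package's own implementation), and the variable names hex, code_points and cp are illustrative. The calls to the Unicode package run only if the package is installed.

```r
# Three spellings of the same code point, as described above.
hex <- c("U+00E4", "0x00E4", "00E4")

# Strip an optional "U+" or "0x" prefix, then parse the rest as base 16.
code_points <- strtoi(sub("^(U\\+|0x)", "", hex), base = 16L)
print(code_points)

# A UTF-8 string goes through utf8ToInt() first.
cp <- utf8ToInt("R\u00e4")
print(cp)

# With the Unicode package available, the same inputs are coerced directly.
if (requireNamespace("Unicode", quietly = TRUE)) {
  print(Unicode::as.u_char(hex))
  print(Unicode::as.u_char(cp))
}
```

Going through utf8ToInt keeps the string-to-code-point step explicit, so a character string is never mistaken for a hex code.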

as.u_char_range can coerce character strings that give either a single Unicode character or a Unicode range expression, in which the hex codes of two Unicode characters are joined by .. (currently hard-wired). It can also handle u_char objects, coercing them to ranges of single code points.
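What "flattening" a range expression means can be sketched in base R. This is an illustration under the assumption that a range covers every code point between its two ends, inclusive; the names range_expr, ends and flat are hypothetical. The package calls run only if Unicode is installed.

```r
range_expr <- "00AA..00AC"

# Split on the literal ".." separator and parse both ends as hex.
ends <- strtoi(strsplit(range_expr, "..", fixed = TRUE)[[1L]], base = 16L)

# Flattening enumerates every code point from the first end to the second.
flat <- seq.int(ends[1L], ends[2L])
print(sprintf("U+%04X", flat))

# With the Unicode package available, build the range and flatten it.
if (requireNamespace("Unicode", quietly = TRUE)) {
  r <- Unicode::as.u_char_range(range_expr)
  print(r)
  print(Unicode::as.u_char(r))
}
```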

as.u_char_seq can coerce character strings containing the hex codes of Unicode characters joined by a non-empty sep. The default corresponds to using "," if the strings use surrounding angles, and " " (a single space) otherwise. If sep is empty or has length zero, the character strings are used as is, re-encoded in UTF-8 if necessary, and mapped to the corresponding Unicode character sequences using utf8ToInt. as.u_char_seq can also handle Unicode character ranges (giving the corresponding flattened out Unicode character sequences), or lists of objects coercible to Unicode characters via as.u_char.
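The two readings of a string, as a list of hex codes or as literal text, can be contrasted in a short sketch. The base R lines only imitate the behaviour described above, and the names s, hex_seq and chr_seq are illustrative. The package calls run only if Unicode is installed.

```r
# Non-empty separator: the string is a list of hex codes.
s <- "0041,0300"
hex_seq <- strtoi(strsplit(s, ",", fixed = TRUE)[[1L]], base = 16L)
print(hex_seq)

# Empty separator: the string's own characters form the sequence.
chr_seq <- utf8ToInt("Austria")
print(chr_seq)

# With the Unicode package available, the same two cases.
if (requireNamespace("Unicode", quietly = TRUE)) {
  print(Unicode::as.u_char_seq(s, sep = ","))
  print(Unicode::as.u_char_seq("Austria", sep = ""))
}
```

Passing sep explicitly avoids relying on the default, which depends on whether the input uses surrounding angles.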

All classes currently have as.character, as.data.frame, c, format, print, rep, unique and [ subscript methods. More methods will be added eventually.

----------Value----------

For as.u_char, a u_char object giving a vector of Unicode characters.

For as.u_char_range, a u_char_range object giving a vector of Unicode character ranges.

For as.u_char_seq, a u_char_seq object giving a vector of Unicode character sequences.

----------References----------

Unicode Character Database (http://www.unicode.org/ucd/)
http://en.wikipedia.org/wiki/Unicode

----------Examples----------

## Load the package that provides the functions used below.
library(Unicode)

x <- as.u_char_range(c("00AA..00AC", "01CC"))
x
## Corresponding Unicode character sequence object:
as.u_char_seq(x)
## Corresponding Unicode character object with all code points:
as.u_char(x)
## Inspect all Unicode characters in the range:
u_char_inspect(x)

## Turning R character strings into the respective Unicode character
## sequences:
as.u_char_seq(c("Austria", "Trantor"), "")
## which can then be subscripted "as usual", e.g.:
x <- as.u_char_seq(c("Austria", "Trantor"), "")[[1L]][c(3L, 5L)]
x
## To reassemble the character strings:
intToUtf8(x)

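When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

Notes:
Note 1: This document was translated by LoveR, the translation robot of 生物统计家园网, for personal study of the R language only; 生物统计家园 retains the copyright.
Note 2: Because the translation was produced automatically, inaccuracies are unavoidable; checking it against the English text is recommended.
Note 3: Inaccuracies can be reported by replying to this post, and they will be corrected over time.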
