u_char_basics(Unicode)
u_char_basics() belongs to R package: Unicode
Unicode Character Objects
Description
Data structures and basic methods for Unicode character data.
Usage
as.u_char(x)
as.u_char_range(x)
as.u_char_seq(x, sep = NA_character_)
Arguments
Argument: x
R objects coercible to the respective Unicode character data types; see Details.
Argument: sep
a character string giving the separator between the hex codes in the input strings; see Details.
Details
Package Unicode provides three basic classes for representing Unicode characters: u_char for vectors of Unicode characters, u_char_range for vectors of Unicode character ranges, and u_char_seq for vectors of Unicode character sequences. Objects of these classes are created via the respective coercion functions.
as.u_char coerces integers or hex strings (with or without a leading 0x or U+, the prefix typically used for Unicode characters) to the corresponding code points. It also handles Unicode character ranges, flattening them into the corresponding vector of Unicode characters. To convert a UTF-8 encoded R character string to the corresponding Unicode character object, first obtain its integer code points via utf8ToInt and then coerce the result.
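The coercions described above can be sketched as follows. This is a minimal illustration that assumes the Unicode package is installed; the particular code points and the string "Wien" are arbitrary choices, not taken from the package documentation.

```r
library(Unicode)

# Integers and hex strings (bare, with "0x", or with "U+") all name code points.
as.u_char(c(65L, 228L))
as.u_char(c("0041", "0x0041", "U+0041"))

# A UTF-8 string is converted by way of its integer code points.
cp <- utf8ToInt("Wien")
w <- as.u_char(cp)
w
```

Because utf8ToInt is a base R function, the intermediate vector cp can be inspected or filtered before it is coerced.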
as.u_char_range coerces character strings that give either a single Unicode character or a Unicode range expression, that is, the hex codes of two Unicode characters joined by .. (currently hard-wired). It also handles u_char objects, coercing them to ranges of single code points.
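As a short sketch of these range coercions (assuming the Unicode package is installed; the code points are arbitrary examples):

```r
library(Unicode)

# One single code point and one "first..last" range expression.
r <- as.u_char_range(c("01CC", "00AA..00AC"))
r

# u_char objects become ranges of single code points.
as.u_char_range(as.u_char(c("0041", "0042")))

# Flatten the ranges back into individual characters.
flat <- as.u_char(r)
flat
```

Flattening is the step that makes a range usable wherever a plain vector of code points is expected.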
as.u_char_seq coerces character strings that give the hex codes of Unicode characters joined by a non-empty sep. The default corresponds to using a comma if the strings are surrounded by angle brackets, and a space otherwise. If sep is empty or has length zero, the character strings are used as is, re-encoded in UTF-8 if necessary, and mapped to the corresponding Unicode character sequences using utf8ToInt. as.u_char_seq can also handle Unicode character ranges (giving the corresponding flattened Unicode character sequences), or lists of objects coercible to Unicode characters via as.u_char.
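The three ways of reading input strings might look like this in practice. This is a sketch only: it assumes the Unicode package is installed, and the hex codes are arbitrary examples chosen to follow the rules stated above.

```r
library(Unicode)

# Hex codes joined by an explicit, non-empty separator.
as.u_char_seq("0041 030A", sep = " ")

# With the default sep, strings in angle brackets are split on commas.
as.u_char_seq("<0041,030A>")

# An empty sep treats the strings as text and maps them through utf8ToInt.
s <- as.u_char_seq(c("Austria", "Trantor"), "")
s
```

Passing sep explicitly avoids relying on the default detection when the input format is known in advance.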
All classes currently have as.character, as.data.frame, c, format, print, rep, unique and [ subscript methods. More methods may be added in the future.
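A few of the listed methods, applied to a small u_char vector, are sketched below. The example assumes the Unicode package is installed and uses arbitrary code points; the other two classes offer the same set of methods.

```r
library(Unicode)

x <- as.u_char(c("0041", "0042", "0041"))

x[2L]                      # subscripting returns a u_char object
u <- unique(x)             # drops the repeated code point
u
c(x, as.u_char("0043"))    # combine with another u_char object
rep(x[1L], 3L)             # repeat a single character
f <- format(x)             # character representation used for printing
f
as.data.frame(x)
```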
Value
For as.u_char, a u_char object giving a vector of Unicode characters.
For as.u_char_range, a u_char_range object giving a vector of Unicode character ranges.
For as.u_char_seq, a u_char_seq object giving a vector of Unicode character sequences.
References
Unicode Character Database (http://www.unicode.org/ucd/)
http://en.wikipedia.org/wiki/Unicode
Examples
library(Unicode)
x <- as.u_char_range(c("00AA..00AC", "01CC"))
x
## Corresponding Unicode character sequence object:
as.u_char_seq(x)
## Corresponding Unicode character object with all code points:
as.u_char(x)
## Inspect all Unicode characters in the range:
u_char_inspect(x)
## Turning R character strings into the respective Unicode character
## sequences:
as.u_char_seq(c("Austria", "Trantor"), "")
## which can then be subscripted "as usual", e.g.:
x <- as.u_char_seq(c("Austria", "Trantor"), "")[[1L]][c(3L, 5L)]
x
## To reassemble the character strings:
intToUtf8(x)