R Unicode package: help documentation for the u_char_basics() functions

Posted on 2012-10-1 13:20:28
u_char_basics(Unicode)
u_char_basics() belongs to the R package: Unicode

                                        Unicode Character Objects

                                         Translator: 生物统计家园网, bot LoveR

----------Description----------

Data structures and basic methods for Unicode character data.


----------Usage----------


as.u_char(x)
as.u_char_range(x)
as.u_char_seq(x, sep = NA_character_)



----------Arguments----------

Argument: x
R objects coercible to the respective Unicode character data types; see Details.


Argument: sep
a character string.


----------Details----------

Package Unicode provides three basic classes for representing Unicode characters: u_char for vectors of Unicode characters, u_char_range for vectors of Unicode character ranges, and u_char_seq for vectors of Unicode character sequences.  Objects from these classes are created via the respective coercion functions.

as.u_char coerces integers or hex strings (with or without a leading 0x or the U+ prefix typically used for Unicode characters) to the corresponding code points.  It can also handle Unicode character ranges, flattening them into the corresponding vector of Unicode characters.  To “coerce” a UTF-8 encoded R character string to the corresponding Unicode character object, first obtain its integer code points via utf8ToInt and then coerce the result.
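
The accepted input forms can be illustrated with a minimal base R sketch. The helper parse_code_points below is hypothetical and is not part of the Unicode package; it only mimics the documented behaviour (integers, or hex strings with an optional 0x or U+ prefix) and makes no claim about how as.u_char is actually implemented.

```r
# Hypothetical helper (not part of the Unicode package): parse code points
# given as numbers or as hex strings with an optional "0x" or "U+" prefix.
parse_code_points <- function(x) {
  if (is.numeric(x)) {
    return(as.integer(x))
  }
  # Drop a leading "0x"/"0X" or "U+" if present, then read the rest as hex.
  hex <- sub("^(0[xX]|U\\+)", "", x)
  strtoi(hex, base = 16L)
}

cp <- parse_code_points(c("U+00AA", "0x00AB", "00AC"))
cp                      # integer code points 170, 171, 172

# A UTF-8 string is first turned into integer code points with utf8ToInt();
# those integers are what as.u_char() would then be applied to.
austria <- utf8ToInt("Austria")
intToUtf8(austria)      # round-trips back to "Austria"
```

With the package loaded, the same integers could be passed on to as.u_char; the sketch stops at the integer code points so that it runs with base R alone.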

as.u_char_range coerces character strings that give either a single Unicode character or a Unicode range expression, i.e. the hex codes of two Unicode characters joined by .. (this separator is currently hard-wired).  It can also handle u_char objects, coercing them to ranges of single code points.
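
To make the range syntax concrete, here is a small base R sketch of what flattening a range expression amounts to. The helper expand_range is hypothetical (it is not a function of the Unicode package) and assumes well-formed input of the form "XXXX" or "XXXX..YYYY".

```r
# Hypothetical helper: expand "00AA..00AC" or "01CC" to integer code points.
expand_range <- function(x) {
  ends <- strtoi(strsplit(x, "..", fixed = TRUE)[[1L]], base = 16L)
  # A single code point is a range whose start and end coincide.
  seq.int(ends[1L], ends[length(ends)])
}

ranges <- c("00AA..00AC", "01CC")
flat <- unlist(lapply(ranges, expand_range))
flat                         # 170 171 172 460
sprintf("U+%04X", flat)      # "U+00AA" "U+00AB" "U+00AC" "U+01CC"
```

Treating a lone code point as a degenerate range keeps the two input forms on one code path, which mirrors the statement that u_char objects are coerced to ranges of single code points.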

as.u_char_seq coerces character strings containing the hex codes of Unicode characters joined by a non-empty sep.  The default corresponds to using "," if the strings are surrounded by angle brackets, and " " otherwise.  If sep is empty or has length zero, the character strings are used as is, re-encoded in UTF-8 if necessary, and mapped to the corresponding Unicode character sequences using utf8ToInt.  as.u_char_seq can also handle Unicode character ranges (giving the corresponding flattened-out Unicode character sequences), or lists of objects coercible to Unicode characters via as.u_char.
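
The two modes of sep (non-empty versus empty) can be sketched in base R as follows. The helper seq_code_points is hypothetical and only approximates the description above; in particular, its default sep = " " is an assumption made for this sketch and differs from the package default of NA_character_.

```r
# Hypothetical helper: turn one string into a vector of integer code points.
seq_code_points <- function(x, sep = " ") {
  if (length(sep) == 0L || !nzchar(sep)) {
    # Empty sep: use the string itself, re-encoded as UTF-8.
    return(utf8ToInt(enc2utf8(x)))
  }
  # Non-empty sep: strip surrounding angle brackets, split, read hex codes.
  parts <- strsplit(gsub("^<|>$", "", x), sep, fixed = TRUE)[[1L]]
  strtoi(parts, base = 16L)
}

seq_code_points("0041 030A")          # 65 778
seq_code_points("<0041,030A>", ",")   # 65 778
seq_code_points("Austria", "")        # code points of A, u, s, t, r, i, a
```

The empty-sep branch is the one used when literal R strings such as "Austria" are converted into sequences.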

All classes currently have as.character, as.data.frame, c, format, print, rep, unique and [ subscript methods.  More methods will be added eventually.


----------Value----------

For as.u_char, a u_char object giving a vector of Unicode characters.

For as.u_char_range, a u_char_range object giving a vector of Unicode character ranges.

For as.u_char_seq, a u_char_seq object giving a vector of Unicode character sequences.


----------References----------

Unicode Character Database (http://www.unicode.org/ucd/)
http://en.wikipedia.org/wiki/Unicode

----------Examples----------


library("Unicode")

x <- as.u_char_range(c("00AA..00AC", "01CC"))
x
## Corresponding Unicode character sequence object:
as.u_char_seq(x)
## Corresponding Unicode character object with all code points:
as.u_char(x)
## Inspect all Unicode characters in the range:
u_char_inspect(x)

## Turning R character strings into the respective Unicode character
## sequences:
as.u_char_seq(c("Austria", "Trantor"), "")
## which can then be subscripted "as usual", e.g.:
x <- as.u_char_seq(c("Austria", "Trantor"), "")[[1L]][c(3L, 5L)]
x
## To reassemble the character strings:
intToUtf8(x)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).
