R language: Encoding() function help documentation

Posted on 2012-2-16 17:25:46
Encoding(base)
Encoding() belongs to R package: base

                                        Read or Set the Declared Encodings for a Character Vector

                                         Translator: 生物统计家园网, robot LoveR

----------Description----------

Read or set the declared encodings for a character vector.


----------Usage----------


Encoding(x)

Encoding(x) <- value

enc2native(x)
enc2utf8(x)



----------Arguments----------

Argument: x
A character vector.


Argument: value
A character vector of positive length.


----------Details----------

Character strings in R can be declared to be in "latin1", "UTF-8" or "bytes".  These declarations can be read by Encoding, which returns a character vector of values "latin1", "UTF-8", "bytes" or "unknown".  They can also be set, in which case value is recycled as needed and other values are silently treated as "unknown".  ASCII strings will never be marked with a declared encoding, since their representation is the same in all supported encodings.  Strings marked as "bytes" are intended to be non-ASCII strings which should be manipulated as bytes, and never converted to a character encoding.
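
The three rules in this paragraph (ASCII is never marked, value is recycled, unrecognised values fall back to "unknown") can be checked directly. This is a minimal sketch; the variable names and the string "no-such-encoding" are only illustrative.

```r
# ASCII strings never carry a declared encoding, even when one is requested.
a <- "abc"
Encoding(a) <- "latin1"
ascii_enc <- Encoding(a)           # "unknown"

# Non-ASCII strings can be marked; a length-1 value is recycled over x.
x <- c("fa\xE7ile", "na\xEFve")
Encoding(x) <- "latin1"
latin1_enc <- Encoding(x)          # "latin1" "latin1"

# Any value other than "latin1", "UTF-8" or "bytes" is treated as "unknown".
Encoding(x) <- "no-such-encoding"
other_enc <- Encoding(x)           # "unknown" "unknown"
```

Setting the declaration changes only the mark, not the underlying bytes of the string.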

enc2native and enc2utf8 convert elements of character vectors to the native encoding or UTF-8 respectively, taking any marked encoding into account.  They are primitive functions, designed to do minimal copying.
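
As a small illustration of the conversion, the sketch below marks a string as latin1 and passes it through enc2utf8. The bytes are inspected with charToRaw, which is not mentioned on this page but is a base R function. The result of enc2native depends on the session's locale, so no output is shown for it.

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"

u <- enc2utf8(x)
utf8_enc <- Encoding(u)        # "UTF-8"
raw_latin1 <- charToRaw(x)     # 66 61 e7 69 6c 65
raw_utf8 <- charToRaw(u)       # 66 61 c3 a7 69 6c 65

# ASCII input needs no conversion and comes back unchanged.
ascii_same <- identical(enc2utf8("abc"), "abc")

# Locale-dependent: converts to whatever the native encoding is.
n <- enc2native(u)
```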

There are other ways for character strings to acquire a declared encoding apart from explicitly setting it (and these have changed as R has evolved).  Functions scan, read.table, readLines, and parse have an encoding argument that is used to declare encodings, iconv declares encodings from its to argument, and console input in suitable locales is also declared.  intToUtf8 declares its output as "UTF-8", and output text connections (see textConnection) are marked if running in a suitable locale.  Under some circumstances (see its help page) source(encoding=) will mark encodings of character strings it outputs.
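
A few of these implicit routes can be sketched as follows. The temporary file and its byte content are made up for the illustration; the example assumes a build of R whose iconv supports latin1 to UTF-8 conversion, which is the usual case.

```r
# iconv marks its result when converting to "UTF-8" (mark = TRUE is the default).
x <- "fa\xE7ile"
xx <- iconv(x, from = "latin1", to = "UTF-8")
iconv_enc <- Encoding(xx)          # "UTF-8"

# intToUtf8 declares non-ASCII output as UTF-8 (231 is U+00E7).
ch <- intToUtf8(231)
int_enc <- Encoding(ch)            # "UTF-8"

# readLines(encoding = ) only marks the strings; it does not re-encode them.
tmp <- tempfile()
writeBin(as.raw(c(0x66, 0x61, 0xe7, 0x69, 0x6c, 0x65, 0x0a)), tmp)
line <- readLines(tmp, encoding = "latin1")
file_enc <- Encoding(line)         # "latin1"
unlink(tmp)
```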

Most character manipulation functions will set the encoding on output strings if it was declared on the corresponding input.  These include chartr, strsplit(useBytes = FALSE), tolower and toupper as well as sub(useBytes = FALSE) and gsub(useBytes = FALSE).  Note that such functions do not preserve the encoding, but if they know the input encoding and that the string has been successfully re-encoded (to the current encoding or UTF-8), they mark the output.

substr does preserve the encoding, and chartr, tolower and toupper preserve UTF-8 encoding on systems with Unicode wide characters.  With their fixed and perl options, strsplit, sub and gsub will give a marked UTF-8 result if any of the inputs are UTF-8.
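
The difference between preserving and re-marking can be seen with a short sketch. It assumes a platform with Unicode wide characters, as the text requires for toupper. Whether the non-ASCII letter is actually upper-cased can depend on the locale, so only the declared encodings are noted.

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"

# substr keeps the declared encoding of its input.
sub_enc <- Encoding(substr(x, 1, 3))     # "latin1"

# A purely ASCII substring is never marked.
ascii_enc <- Encoding(substr(x, 1, 2))   # "unknown"

# toupper keeps the UTF-8 mark on UTF-8 input.
u <- enc2utf8(x)
upper_enc <- Encoding(toupper(u))        # "UTF-8"
```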

paste and sprintf return elements marked as bytes if any of the corresponding inputs is marked as bytes, and otherwise marked as UTF-8 if any of the inputs is marked as UTF-8.
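
The precedence described here (bytes first, then UTF-8) can be sketched with paste. The byte string "\xff\xfe" is an arbitrary value chosen for the illustration.

```r
u <- intToUtf8(c(102, 97, 231))    # "fa" followed by U+00E7, marked UTF-8
b <- "\xff\xfe"
Encoding(b) <- "bytes"

# Any UTF-8 input gives a UTF-8 result.
utf8_result <- Encoding(paste(u, "ile", sep = ""))   # "UTF-8"

# A bytes input takes precedence over UTF-8.
bytes_result <- Encoding(paste(u, b))                # "bytes"

# Pure ASCII inputs give an unmarked result.
ascii_result <- Encoding(paste("a", "b"))            # "unknown"
```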

match, pmatch, charmatch, duplicated and unique all match in UTF-8 if any of the elements are marked as UTF-8.
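
One consequence is that the same text stored in two different declared encodings is treated as equal by these functions. A minimal sketch, with illustrative variable names:

```r
l <- "fa\xE7ile"
Encoding(l) <- "latin1"
u <- enc2utf8(l)                   # same text, different bytes, marked UTF-8

pos <- match(l, u)                 # 1
dup <- duplicated(c(l, u))         # FALSE TRUE
n_unique <- length(unique(c(l, u)))  # 1
```

The comparison is made on the characters after translation to UTF-8, not on the raw bytes, which differ between l and u.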


----------Value----------

A character vector.


----------Examples----------


## x is intended to be in latin1
x <- "fa\xE7ile"
Encoding(x)
Encoding(x) <- "latin1"
x
xx <- iconv(x, "latin1", "UTF-8")
Encoding(c(x, xx))
c(x, xx)
Encoding(xx) <- "bytes"
xx # will be encoded in hex
cat("xx = ", xx, "\n", sep = "")

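When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


Notes:
Note 1: To make learning easier, this document was translated by LoveR, the robot of 生物统计家园网. It is for personal reference when learning R only; 生物统计家园 retains the copyright.
Note 2: Because the translation was produced automatically by a robot, inaccuracies are unavoidable. Carefully comparing the Chinese and English text while reading can help with learning R.
Note 3: If you find an inaccuracy, please reply below this post; we will revise the text over time.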