
R: iconv() function help page

Posted 2012-2-16 20:51:33
iconv(base)
Package containing iconv(): base

                                        Convert Character Vector between Encodings

                                         Translator: LoveR, the 生物统计家园网 (biostatistic.net) robot

----------Description----------

This uses system facilities to convert a character vector between encodings: the "i" stands for "internationalization".


----------Usage----------


iconv(x, from = "", to = "", sub = NA, mark = TRUE, toRaw = FALSE)

iconvlist()



----------Arguments----------

Argument: x
A character vector, or an object to be converted to a character vector by as.character, or a list with NULL and raw elements as returned by iconv(toRaw = TRUE).


Argument: from
A character string describing the current encoding.


Argument: to
A character string describing the target encoding.


Argument: sub
A character string. If not NA, it is used to replace any non-convertible bytes in the input. (This would normally be a single character, but can be more.) If "byte", each non-convertible byte is indicated as "<xx>", where xx is the hex code of the byte.
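A short sketch of sub. It rebuilds the latin1 string from the Examples section so the block runs on its own. The first call shows a multi-character replacement. The second shows that sub works per byte, not per character: in UTF-8 the c-cedilla is two bytes, so it produces two "<xx>" markers. The per-byte result assumes an iconv that reports the unconvertible character as an error, as glibc and GNU libiconv do. Other implementations may behave differently.

```r
# Rebuild the latin1 string used in the Examples section
x <- "fa\xE7ile"
Encoding(x) <- "latin1"

# A multi-character replacement string is allowed
iconv(x, "latin1", "ASCII", sub = "[?]")   # "fa[?]ile"

# sub replaces bytes: in UTF-8 the c-cedilla is the two bytes c3 a7,
# so each byte gets its own "<xx>" marker
y <- iconv(x, "latin1", "UTF-8")
iconv(y, "UTF-8", "ASCII", sub = "byte")   # "fa<c3><a7>ile"
```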


Argument: mark
Logical, for expert use. Should encodings be marked?


Argument: toRaw
Logical. Should a list of raw vectors be returned rather than a character vector?



----------Details----------

The names of encodings, and which ones are available, are platform-dependent. All R platforms support "" (for the encoding of the current locale), "latin1" and "UTF-8". Case is generally ignored when specifying an encoding.
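A quick check of the name handling described here. It assumes an iconv build, such as glibc or GNU libiconv, that ignores case and also accepts the "UTF8" spelling that these Details mention. The block compares raw bytes instead of strings so that encoding marks cannot affect the comparison.

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"

# Reference bytes using the canonical names
ref <- charToRaw(iconv(x, "latin1", "UTF-8"))
ref   # 66 61 c3 a7 69 6c 65

# Case in the names should not matter, and "UTF8" is documented
# as an accepted spelling of "UTF-8"
identical(charToRaw(iconv(x, "LATIN1", "utf-8")), ref)   # TRUE
identical(charToRaw(iconv(x, "latin1", "UTF8")), ref)    # TRUE
```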

On many platforms, including Windows, iconvlist provides an alphabetical list of the supported encodings. On others, the information is on the man page for iconv(5) or elsewhere in the man pages (but beware that the system command iconv may not support the same set of encodings as the C functions R calls). Unfortunately, the names are rarely valid across all platforms.
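Encoding names are not portable, so it can be more reliable to test a conversion directly than to search iconvlist(). The helper can_convert() below is hypothetical (not part of base R or utils). It relies on iconv() signalling an error for an unsupported conversion. The iconvlist() call is also wrapped, because it is not available everywhere.

```r
# Hypothetical helper: TRUE if this platform's iconv can convert
# from 'from' to 'to'. An unsupported conversion makes iconv() signal
# an error, which tryCatch() turns into FALSE.
can_convert <- function(from, to) {
  tryCatch({
    iconv("a", from, to)
    TRUE
  }, error = function(e) FALSE)
}

# iconvlist() is not available on every system, so guard it as well
encs <- tryCatch(utils::iconvlist(), error = function(e) character())
length(encs)

can_convert("latin1", "UTF-8")             # TRUE on all R platforms
can_convert("latin1", "no-such-encoding")  # FALSE
```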

Elements of x which cannot be converted (perhaps because they are invalid or because they cannot be represented in the target encoding) will be returned as NA unless sub is specified.
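A minimal illustration of the per-element NA rule. One element converts and one cannot be represented in ASCII. A separate input is not valid in its declared source encoding at all. which(is.na()) is one simple way to locate the elements that failed.

```r
# The second element contains the latin1 byte \xE7 (c-cedilla),
# which has no ASCII equivalent
x <- c("abc", "fa\xE7ile")
res <- iconv(x, "latin1", "ASCII")
res                 # "abc" and NA
which(is.na(res))   # 2: index of the element that failed

# Invalid input also gives NA: the byte \xff never occurs in UTF-8
iconv("ab\xff", "UTF-8", "latin1")   # NA
```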

Most versions of iconv allow transliteration by appending //TRANSLIT to the to encoding (for example, to = "ASCII//TRANSLIT"): see the examples.
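Transliteration results vary between implementations, and some (such as Citrus) do not support //TRANSLIT at all. The hypothetical helper to_ascii() below is a sketch of one defensive pattern. It tries //TRANSLIT first, then falls back to sub = "?" for any element that is still NA or if the suffix is rejected. The exact spellings it returns depend on the platform and locale.

```r
# Hypothetical helper: best-effort conversion to ASCII
to_ascii <- function(x, from = "") {
  # Try transliteration; if the suffix is rejected, start from all NA
  out <- tryCatch(iconv(x, from, "ASCII//TRANSLIT"),
                  error = function(e) rep(NA_character_, length(x)))
  # Replace anything still unconverted byte-by-byte with "?"
  bad <- is.na(out)
  out[bad] <- iconv(x[bad], from, "ASCII", sub = "?")
  out
}

x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
to_ascii(x, "latin1")   # exact spellings depend on the iconv implementation
```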

The encoding "ASCII" is also accepted, and on most systems "C" and "POSIX" are synonyms for ASCII.

Any encoding bits (see Encoding) on elements of x are ignored: elements are always translated as if they were in the encoding given by from, even if they are declared otherwise.

"UTF8" is accepted as meaning the (more correct) "UTF-8".

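The rule that declared encodings (Encoding bits) are ignored can be seen by declaring a string as UTF-8 and then telling iconv() it is latin1. The bytes are built with rawToChar() so the example does not depend on the encoding of the source file. enc2utf8() is a different base R function, shown only for contrast because it does respect the declared encoding.

```r
# "é" stored as the UTF-8 bytes c3 a9 and declared as UTF-8
x <- rawToChar(as.raw(c(0xc3, 0xa9)))
Encoding(x) <- "UTF-8"

# iconv() ignores the declaration and reads each byte as latin1,
# so the two bytes become two characters ("Ã©")
wrong <- iconv(x, "latin1", "UTF-8")
charToRaw(wrong)          # c3 83 c2 a9

# enc2utf8() honours the declared encoding, so the bytes are unchanged
charToRaw(enc2utf8(x))    # c3 a9
```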

----------Value----------

If toRaw = FALSE (the default), the value is a character vector with the same length and the same attributes as x (after conversion to a character vector).
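A small check of the length-and-attributes rule, using names as the attribute. A factor is included to show the as.character() step mentioned for x. The class and levels are dropped by that coercion, so only the labels come through.

```r
x <- c(first = "abc", second = "fa\xE7ile")
res <- iconv(x, "latin1", "UTF-8")
length(res)   # 2
names(res)    # "first" "second": the attributes of x are kept

# A non-character input is first converted with as.character()
iconv(factor(c("a", "b")), "latin1", "UTF-8")   # "a" "b"
```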

If mark = TRUE (the default), the elements of the result have a declared encoding if to is "latin1" or "UTF-8", or if to = "" and the current locale's encoding is detected as Latin-1 or UTF-8.
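A minimal sketch of mark, converting a latin1 string to UTF-8 so the result's declaration can be inspected with Encoding(). With mark = FALSE the result is left as "unknown" (native). Strings made only of ASCII characters are never marked, whatever mark says.

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"

Encoding(iconv(x, "latin1", "UTF-8"))                 # "UTF-8"
Encoding(iconv(x, "latin1", "UTF-8", mark = FALSE))   # "unknown"

# Pure-ASCII results carry no declared encoding
Encoding(iconv("abc", "latin1", "UTF-8"))             # "unknown"
```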

If toRaw = TRUE, the value is a list with the same length and the same attributes as x, whose elements are either NULL (if conversion fails) or a raw vector.
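A sketch of the toRaw = TRUE form. The first call produces one raw vector per element, using UTF-16LE as in the Examples. The second shows a failed element becoming NULL. The third passes the list straight back to iconv(), which the description of x allows.

```r
x <- c("abc", "fa\xE7ile")   # the second element is latin1

# One raw vector of UTF-16LE bytes per element
r <- iconv(x, "latin1", "UTF-16LE", toRaw = TRUE)
r[[1]]                        # 61 00 62 00 63 00

# A failed element becomes NULL rather than NA
bad <- iconv(x, "latin1", "ASCII", toRaw = TRUE)
is.null(bad[[2]])             # TRUE

# Such a list can be fed back into iconv()
back <- iconv(r, "UTF-16LE", "latin1")
identical(charToRaw(back[2]), charToRaw(x[2]))   # TRUE
```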

For iconvlist(), a character vector (typically of a few hundred elements).


----------Implementation Details----------

There are three main implementations of iconv in use. glibc (as used on Linux) contains one. Several platforms supply GNU libiconv, including Mac OS X, FreeBSD and Cygwin. On Windows we use a version of Yukihiro Nakadaira's win_iconv, which is based on Windows' codepages. All three have iconvlist, ignore case in encoding names and support //TRANSLIT (but with different results; win_iconv currently uses a "best fit" strategy except for to = "ASCII").
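Because results differ between these implementations, it can help to know which one a given R build uses. The sketch below uses extSoftVersion(), which was added in R 3.2.0 and so is newer than this help page. The exact wording of the string it returns is platform-specific.

```r
# extSoftVersion() reports versions of external software R was built
# with; its "iconv" element names the iconv implementation in use
impl <- extSoftVersion()[["iconv"]]
impl   # e.g. a string mentioning glibc, GNU libiconv or win_iconv
```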

Most commercial Unixes contain an implementation of iconv, but none we have encountered has supported the encoding names we need: the "R Installation and Administration Manual" recommends installing GNU libiconv on Solaris and AIX, for example.

There are other implementations, e.g. NetBSD uses one from the Citrus project (which does not support //TRANSLIT), and there is an older FreeBSD port (libiconv is usually used there): it has not been reported whether or not these work with R.


----------See Also----------

localeToCharset, file.


----------Examples----------


## In principle, not all systems have iconvlist
try(utils::head(iconvlist(), n = 50))

## Not run:
## convert from Latin-2 to UTF-8: two of the glibc iconv variants.
iconv(x, "ISO_8859-2", "UTF-8")
iconv(x, "LATIN2", "UTF-8")

## End(Not run)

## Both x below are in latin1 and will only display correctly in a
## locale that can represent and display latin1.
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
x
charToRaw(xx <- iconv(x, "latin1", "UTF-8"))
xx

iconv(x, "latin1", "ASCII")          #   NA
iconv(x, "latin1", "ASCII", "?")     # "fa?ile"
iconv(x, "latin1", "ASCII", "")      # "faile"
iconv(x, "latin1", "ASCII", "byte")  # "fa<e7>ile"

## Extracts from old R help files (they are nowadays in UTF-8)
x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x) <- "latin1"
x
try(iconv(x, "latin1", "ASCII//TRANSLIT"))  # platform-dependent
iconv(x, "latin1", "ASCII", sub="byte")
## and for Windows' 'Unicode'
str(xx <- iconv(x, "latin1", "UTF-16LE", toRaw = TRUE))
iconv(xx, "UTF-16LE", "UTF-8")

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


Notes:
Note 1: For learning convenience, this document was translated by LoveR, the 生物统计家园网 robot. It is provided only as a personal reference for learning R; 生物统计家园 retains the copyright.