R tm package: help documentation for the readXML() function

loveR · posted on 2012-10-1 10:52:26

readXML(tm)
readXML() belongs to R package: tm

                                    Read In an XML Document

                                       Translator: LoveR, the 生物统计家园网 robot

----------Description----------

Return a function which reads in an XML document. The structure of the XML document can be described with a so-called specification.

----------Usage----------

readXML(spec, doc, ...)

----------Arguments----------

Argument: spec
A named list of lists, each containing two elements. The constructed reader maps each list entry to the attribute or metadata item with the same name as the entry. Valid names include Content to access the document's content, any valid attribute name, and arbitrary character strings, which are mapped to LocalMetaData entries. Each list entry must consist of two elements: the first names the type of the second, and the second is the specification entry itself. Valid combinations are:

type = "node", spec = "XPathExpression": The XPath expression spec extracts information from an XML node.

type = "attribute", spec = "XPathExpression": The XPath expression spec extracts information from an attribute of an XML node.

type = "function", spec = function(tree) ...: The function spec is called with a tree representation of the XML document that was read in (as delivered by xmlInternalTreeParse from package XML) as its first argument.

type = "unevaluated", spec = "String": The character vector spec is returned without modification.
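
The four combinations above can be sketched as a small spec list. This is only an illustration in base R: the XML layout (an article element with a title child and an id attribute), the entry values, and the check_spec helper are all hypothetical and not part of tm.

```r
# Hypothetical spec for a made-up XML layout such as
# <article id="42"><title>...</title></article>.
# Each entry is list(type, spec), using one of the four documented types.
example_spec <- list(
  Heading     = list("node", "/article/title"),
  ID          = list("attribute", "/article/@id"),
  Description = list("unevaluated", ""),
  Origin      = list("function", function(tree) "example feed")
)

# Hypothetical helper (not a tm function): report, per entry, whether it
# has exactly two elements and a documented type as its first element.
valid_types <- c("node", "attribute", "function", "unevaluated")
check_spec <- function(spec) {
  vapply(spec, function(entry) {
    length(entry) == 2L &&
      is.character(entry[[1]]) &&
      length(entry[[1]]) == 1L &&
      entry[[1]] %in% valid_types
  }, logical(1))
}

check_spec(example_spec)
```

A check like this is optional; it simply catches a misspelled type name before the spec is handed to readXML().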

Argument: doc
An (empty) document of some subclass of TextDocument.

Argument: ...
Arguments for the generator function.

----------Details----------

Formally, this function is a function generator: it returns a function (which reads in a text document) with a well-defined signature, and the returned function can still access the arguments passed to the generator (e.g., the specification) via lexical scoping.
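
The generator pattern described above can be sketched in base R. This toy is not tm's implementation: make_reader and all values are hypothetical, the document is a plain list, and the "function" entry receives the raw content string instead of a parsed XML tree so that no XML parser is needed.

```r
# Toy generator (hypothetical, not tm's code). It returns a reader with the
# signature (elem, language, id). The reader is never passed `spec` or `doc`;
# it finds them in the environment where it was created (lexical scoping).
make_reader <- function(spec, doc) {
  function(elem, language, id) {
    result <- doc
    result$content  <- elem$content
    result$language <- language
    result$id       <- id
    # Only the "unevaluated" and "function" types are handled in this sketch.
    for (name in names(spec)) {
      type  <- spec[[name]][[1]]
      value <- spec[[name]][[2]]
      result[[name]] <- switch(type,
        unevaluated = value,
        "function"  = value(elem$content),
        stop("type not handled in this sketch: ", type))
    }
    result
  }
}

reader <- make_reader(
  spec = list(Origin = list("unevaluated", "toy source"),
              Length = list("function", function(x) nchar(x))),
  doc  = list())

parsed <- reader(list(content = "<a>hello</a>"), language = "en", id = "doc-1")
names(formals(reader))  # "elem" "language" "id"
parsed$Origin           # "toy source"
```

Because spec and doc are captured when make_reader() is called, every reader it produces keeps the same three-argument signature regardless of how the specification is configured.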

----------Value----------

A function with the signature elem, language, id:

Argument: elem
A list with a named element content, which must hold the document to be read in.

Argument: language
A character vector giving the text's language.

Argument: id
A character vector representing a unique identification string for the returned text document.

The function returns doc, augmented with the information parsed from the XML file as described by spec.

----------Author(s)----------

Ingo Feinerer

----------See Also----------

Vignette 'Extensions: How to Handle Custom File Formats'.

getReaders to list available reader functions.

----------Examples----------

## Not run:
# Build a reader for Reuters-21578 XML files.
readReut21578XML <- readXML(
  spec = list(Author = list("node", "/REUTERS/TEXT/AUTHOR"),
              DateTimeStamp = list("function", function(node)
                # Parse the timestamp found in the DATE node.
                strptime(sapply(XML::getNodeSet(node, "/REUTERS/DATE"),
                                XML::xmlValue),
                         # The format string is cut off in the source. The
                         # value below is an assumption based on Reuters-21578
                         # dates such as "26-FEB-1987 15:01:01.79".
                         format = "%d-%B-%Y %H:%M:%S",
                         tz = "GMT")),
              Description = list("unevaluated", ""),
              Heading = list("node", "/REUTERS/TEXT/TITLE"),
              ID = list("attribute", "/REUTERS/@NEWID"),
              Origin = list("unevaluated", "Reuters-21578 XML"),
              Topics = list("node", "/REUTERS/TOPICS/D")),
  doc = Reuters21578Document())
## End(Not run)

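When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R; 生物统计家园 reserves the copyright.
Note 2: Because the translation was produced automatically by a robot, inaccuracies are unavoidable. Comparing the Chinese and English text carefully and repeatedly can help with learning R.
Note 3: If you find inaccuracies, please reply at the end of this thread; we will revise the document over time.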
