xmlEventParse(XML)
xmlEventParse() is part of the R package XML
XML Event/Callback element-wise Parser
Description
This is the event-driven or SAX (Simple API for XML) style parser, which processes XML without building the tree; instead, it identifies tokens in the stream of characters and passes them to handlers that can make sense of them in context. It reads and processes the contents of an XML file or string by invoking user-level functions associated with different components of the XML tree. These components include the beginning and end of XML elements, e.g. <myTag x="1"> and </myTag> respectively, comments, CDATA (escaped character data), entities, processing instructions, etc. This allows the caller to create the appropriate data structure from the XML document contents rather than the default tree (see xmlTreeParse) and so avoids having the entire document in memory. This is important for large documents, where we would otherwise end up with essentially 2 copies of the data in memory at once, i.e. the tree and the R data structure containing the information taken from the tree. When dealing with classes of XML documents whose instances could be large, this approach is desirable but a little more cumbersome to program than the standard DOM (Document Object Model) approach provided by xmlTreeParse.
Note that xmlTreeParse does allow a hybrid style of processing in which handlers are applied to nodes in the tree as they are being converted to R objects. This is a style of event-driven or asynchronous calling.
In addition to the generic token event handlers such as "begin an XML element" (the startElement handler), one can also provide handler functions for specific tags/elements such as <myTag> with handler elements with the same name as the XML element of interest, i.e. "myTag" = function(x, attrs).
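As a minimal sketch of mixing a tag-specific handler with the generic one, the following parses a small in-memory document. The document string and the variable names (doc, seen) are assumptions made up for this illustration; only xmlEventParse and its asText and handlers arguments come from this page.

```r
library(XML)

# Hypothetical document used only for illustration.
doc <- '<doc><myTag x="1"/><other/><myTag x="2"/></doc>'

seen <- character()   # values of the x attribute found on <myTag> elements

invisible(xmlEventParse(doc, asText = TRUE,
  handlers = list(
    # Tag-specific handler: looked up by element name, so it is used
    # for <myTag> start events instead of the generic handler.
    myTag = function(x, attrs, ...) {
      seen <<- c(seen, attrs[["x"]])
    },
    # Generic handler: used for start events of all other elements.
    startElement = function(name, attrs, ...) {
      cat("generic start:", name, "\n")
    })))

seen
```

The handlers accept ... so that they keep working if extra arguments (such as contextual information) are passed to them.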
When the event parser is reading text nodes, it may call the text handler function with different sub-strings of the text within the node. Essentially, the parser collects up n characters into a buffer and passes this as a single string to the text handler and then continues collecting more text until the buffer is full or there is no more text. It passes each sub-string to the text handler. If trim is TRUE, it removes leading and trailing white space from the sub-string before calling the text handler. If the resulting text is empty and ignoreBlanks is TRUE, then we don't bother calling the text handler function.
So the key thing to remember about dealing with text is that the entire text of a node may come in multiple separate calls to the text handler. A common idiom is to have the text handler concatenate the values it is passed in separate calls and to have the end element handler process the entire text and reset the text variable to be empty.
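The concatenate-then-reset idiom described above can be sketched as follows. The document string, the generator function collectText and its accessor values() are hypothetical names introduced for this illustration.

```r
library(XML)

# Hypothetical document used only for illustration.
doc <- "<items><item>first value</item><item>second value</item></items>"

collectText <- function() {
  buffer <- character()   # pieces of text seen since the last start tag
  values <- character()   # completed text, one entry per <item>
  list(
    # A new element begins: start with an empty buffer.
    startElement = function(name, attrs, ...) buffer <<- character(),
    # The text of one node may arrive in several calls: just append.
    text = function(content, ...) buffer <<- c(buffer, content),
    # The element is complete: process the whole text, then reset.
    endElement = function(name, ...) {
      if (name == "item")
        values <<- c(values, paste(buffer, collapse = ""))
      buffer <<- character()
    },
    values = function() values)
}

h <- collectText()
# trim = FALSE so that white space at the edges of each chunk is kept
# when the chunks are pasted back together.
invisible(xmlEventParse(doc, handlers = h, asText = TRUE, trim = FALSE))
h$values()
```

Keeping the buffer inside a closure means the handlers share it without using global variables.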
Usage
xmlEventParse(file, handlers = xmlEventHandler(),
ignoreBlanks = FALSE, addContext=TRUE,
useTagName = TRUE, asText = FALSE, trim=TRUE,
useExpat=FALSE, isURL = FALSE,
state = NULL, replaceEntities = TRUE, validate = FALSE,
saxVersion = 1, branches = NULL,
useDotNames = length(grep("^\\.", names(handlers))) > 0,
error = xmlErrorCumulator(), addFinalizer = NA)
Arguments
Argument: file
the source of the XML content. This can be a string giving the name of a file or remote URL, the XML itself, a connection object, or a function. If this is a string and asText is TRUE, the value is the XML content. This allows one to read the content separately from parsing without having to write it to a file. If asText is FALSE and a string is passed for file, this is taken as the name of a file or remote URI. If one is using the libxml parser (i.e. not expat), this can be a URI accessed via HTTP or FTP or a compressed local file. If it is the name of a local file, it can include ~, environment variables, etc., which will be expanded by R. (Note this is not the case in S-Plus, as far as I know.) If a connection is given, the parser incrementally reads one line at a time by calling the function readLines with the connection as the first argument (and 1 as the number of lines to read). The parser calls this function each time it needs more input. If invoking the readLines function to get each line is excessively slow or is inappropriate, one can provide a function as the value of file. Again, when the XML parser needs more content to process, it invokes this function to get a string. This function is called with a single argument, the maximum size of the string that can be returned. The function is responsible for accessing the correct connection(s), etc., which is typically done via lexical scoping/environments. This mechanism allows the user to control how the XML content is retrieved in very general ways. For example, one might read from a set of files, starting one when the contents of the previous file have been consumed. This allows for the use of hybrid connection objects. Support for connections and functions in this form is only provided if one is using libxml2 and not libxml version 1.
Argument: handlers
a closure object that contains functions which will be invoked as the XML components in the document are encountered by the parser. The standard function or handler names are startElement(), endElement(), comment(), getEntity(), entityDeclaration(), processingInstruction(), text(), cdata(), startDocument(), and endDocument(), or alternatively and preferably, these names prefixed with a '.', i.e. .startElement, .comment, ... The call signature for the entityDeclaration function was changed in version 1.7-0. Note that in earlier versions, the C routine did not invoke any R function and so no code will actually break. Also, we have renamed externalEntity to getEntity. These were based on the expat parser. The new signature is c(name = "character", type = "integer", content = "", system = "character", public = "character" ). name gives the name of the entity being defined. The type identifies the type of the entity using the value of a C-level enumerated constant used in libxml2, but also gives the human-readable form as the name of the single element in the integer vector. The possible values are "Internal_General", "External_General_Parsed", "External_General_Unparsed", "Internal_Parameter", "External_Parameter", "Internal_Predefined". If we are dealing with an internal entity, the content will be the string containing the value of the entity. If we are dealing with an external entity, then content will be a character vector of length 0, i.e. empty. Instead, either or both of the system and public arguments will be non-empty and identify the location of the external content. system will be a string containing a URI, if non-empty, and public corresponds to the PUBLIC identifier used to identify content using an SGML-like approach. The use of PUBLIC identifiers is less common.
Argument: ignoreBlanks
a logical value indicating whether text elements made up entirely of white space should be ignored, i.e. not passed to the text handler.
Argument: addContext
logical value indicating whether the callback functions in "handlers" should be invoked with contextual information about the parser and the position in the tree, such as node depth, path indices for the node relative to the root, etc. If this is TRUE, each callback function should support ... in its argument list.
Argument: useTagName
a logical value. If this is TRUE, when the SAX parser signals an event for the start of an XML element, it will first look for an element in the list of handler functions whose name matches (exactly) the name of the XML element. If such an element is found, that function is invoked. Otherwise, the generic startElement handler function is invoked. The benefit of this is that the author of the handler functions can write node-specific handlers for the different element names in a document and not have to establish a mechanism to invoke these functions within the startElement function. This is done by the XML package directly. If the value is FALSE, then the startElement handler function will be called without any effort to find a node-specific handler. If there are no node-specific handlers, specifying FALSE for this parameter will make the computations very slightly faster.
Argument: asText
logical value indicating that the first argument, "file", should be treated as the XML text to parse, not the name of a file. This allows the contents of documents to be retrieved from different sources (e.g. HTTP servers, XML-RPC, etc.) and still use this parser.
Argument: trim
whether to strip white space from the beginning and end of text strings.
Argument: useExpat
a logical value indicating whether to use the expat SAX parser, or to default to libxml. If this is TRUE, the library must have been compiled with support for expat. See supportsExpat.
Argument: isURL
indicates whether the file argument refers to a URL (accessible via ftp or http) or a regular file on the system. If asText is TRUE, this should not be specified.
Argument: state
an optional S object that is passed to the callbacks and can be modified to communicate state between the callbacks. If this is given, the callbacks should accept an argument named .state and should return an object that will be used as the updated value of this state object. The new value can be any S object and will be passed to the next callback, where again it will be updated by that function's return value, and so on. If this is not specified in the call to xmlEventParse, no .state argument is passed to the callbacks. This makes the interface compatible with previous releases.
Argument: replaceEntities
logical value indicating whether to substitute entity references with their text directly. This should be left as FALSE. The text still appears as the value of the node, but there is more information about its source, allowing the parse to be reversed with full reference information.
Argument: saxVersion
an integer value which should be either 1 or 2. This specifies which SAX interface to use in the C code. The essential difference is the number of arguments passed to the startElement handler function(s). Under SAX 2, in addition to the name of the element and the named-attributes vector, two additional arguments are provided. The first identifies the namespace of the element. This is a named character vector of length 1, with the value being the URI of the namespace and the name being the prefix that identifies that namespace within the document. For example, xmlns:r="http://www.r-project.org" would be passed as c(r = "http://www.r-project.org"). If there is no prefix because the namespace is being used as the default, the result of calling names on the string is "". The second additional argument (the fourth in total) gives the collection of all the namespaces defined within this element. Again, this is a named character vector.
Argument: validate
a logical value indicating whether to use a validating parser or not, or in other words whether to check the contents against the DTD specification. If this is TRUE, warning messages will be displayed about errors in the DTD and/or document, but the parsing will proceed except for the presence of terminal errors. Currently, this has no effect as the libxml2 parser uses a document structure to do validation.
Argument: branches
a named list of functions. Each element identifies an XML element name. If an XML element of that name is encountered in the SAX stream, the stream is processed until the end of that element and an internal node (see xmlTreeParse and its useInternalNodes parameter) is created. The function in our branches list corresponding to this XML element is then invoked with the (internal) node as the only argument. This allows one to use the DOM model on a sub-tree of the entire document and thus use both SAX and DOM together to get the efficiency of SAX and the simpler programming model of DOM. Note that the branches mechanism works top-down and does not work for nested tags. If one specifies an element name in the branches argument, e.g. myNode, and there is a nested myNode instance within a branch, the branches handler will not be called for that nested instance. If there is an instance where this is problematic, please contact the maintainer of this package. One can cause the parser to collect a branch without identifying the node within the branches list. Specifically, within a regular start-element handler, one can return a function whose class is SAXBranchFunction. The SAX parser recognizes this, collects up the branch starting at the current node being processed and, when it is complete, invokes this function. This allows us to dynamically determine which nodes to treat as branches rather than just matching names. This is necessary when a node name has different meanings in different parts of the XML hierarchy, e.g. dict in an iTunes song list. See the file itunesSax2.R in the examples for an example of this. This is a two-step process. In the future, we might make it so that the R function handling the start-element event could directly collect the branch and continue its operations without having to call another function asynchronously.
Argument: useDotNames
a logical value indicating whether to use the newer format for identifying general element function handlers with the '.' prefix, e.g. .text, .comment, .startElement. If this is FALSE, then the older format text, comment, startElement, ... is used. The older format causes problems when there are indeed nodes named text, comment or startElement, as a node-specific handler is then confused with the corresponding general handler of the same name. Using TRUE means that your list of handlers should have names that use the '.' prefix for these general element handlers. This is the preferred way to write new code.
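To illustrate the name clash that the '.' prefix avoids, here is a small sketch with a document that contains an element literally named text. The document and the variables starts and chunks are assumptions for this example, and it relies on the behaviour described above rather than on anything verified beyond this page.

```r
library(XML)

# Hypothetical document containing an element that is named <text>.
doc <- "<doc><text>hello</text></doc>"

starts <- character()   # record of start-element events
chunks <- character()   # character data passed to the general text handler

invisible(xmlEventParse(doc, asText = TRUE, useDotNames = TRUE,
  handlers = list(
    # General handlers carry the '.' prefix.
    .startElement = function(name, attrs, ...) starts <<- c(starts, name),
    .text = function(content, ...) chunks <<- c(chunks, content),
    # Without a dot, this can only mean "handler for <text> elements".
    text = function(name, attrs, ...) starts <<- c(starts, "text-specific")
  )))

starts
paste(chunks, collapse = "")
```

With the older, undotted names, the single entry text would have had to serve both as the general character-data handler and as the handler for the <text> element.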
Argument: error
a function that is called when an XML error is encountered. This is called with 6 arguments and is described in xmlTreeParse.
Argument: addFinalizer
a logical value or identifier for a C routine that controls whether we register finalizers on the internal node.
Details
This is now implemented using the libxml parser. Originally, this was implemented via the Expat XML parser by Jim Clark (http://www.jclark.com).
Value
The return value is the "handlers" argument. It is assumed that this is a closure and that the callback functions have manipulated variables local to it and that the caller knows how to extract this.
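A minimal sketch of this closure pattern: a generator function creates the handlers together with the variables they update, and an accessor function exposes the result after parsing. The names tagCounter and counts and the document string are hypothetical and chosen only for illustration.

```r
library(XML)

tagCounter <- function() {
  counts <- integer()   # named vector: element name -> number of occurrences
  startElement <- function(name, attrs, ...) {
    n <- if (name %in% names(counts)) counts[[name]] else 0L
    counts[name] <<- n + 1L
  }
  # The accessor shares the environment of the handler above.
  list(startElement = startElement, counts = function() counts)
}

# Hypothetical document used only for illustration.
doc <- "<a><b/><b/><c/></a>"

h <- tagCounter()
invisible(xmlEventParse(doc, handlers = h, asText = TRUE))
h$counts()
```

Because the value of xmlEventParse is described as the handlers argument, the same accessor should also be reachable from the value of the call, which is convenient when the handlers are created inline.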
Note
The libxml parser can read URLs via http or ftp. It does not require the support of wget as used in other parts of R, but uses its own facilities to connect to remote servers.
The idea for the hybrid SAX/DOM mode where we consume tokens in the stream to create an entire node for a sub-tree of the document was first suggested to me by Seth Falcon at the Fred Hutchinson Cancer Research Center. It is similar to the XML::Twig module in Perl by Michel Rodriguez.
Author(s)
Duncan Temple Lang
See Also
xmlTreeParse, xmlStopParser, XMLParserContextFunction
Examples
fileName <- system.file("exampleData", "mtcars.xml", package="XML")
# Print the name of each XML tag encountered at the beginning of each
# tag.
# Uses the libxml SAX parser.
xmlEventParse(fileName,
list(startElement=function(name, attrs){
cat(name,"\n")
}),
useTagName=FALSE, addContext = FALSE)
## Not run:
# Parse the text rather than a file or URL by reading the URL's contents
# and making it a single string. Then call xmlEventParse
xmlURL <- "http://www.omegahat.org/Scripts/Data/mtcars.xml"
xmlText <- paste(scan(xmlURL, what="",sep="\n"),"\n",collapse="\n")
xmlEventParse(xmlText, asText=TRUE)
## End(Not run)
# Using a state object to share mutable data across callbacks
f <- system.file("exampleData", "gnumeric.xml", package = "XML")
zz <- xmlEventParse(f,
handlers = list(startElement=function(name, atts, .state) {
.state = .state + 1
print(.state)
.state
}), state = 0)
print(zz)
# Illustrate the startDocument and endDocument handlers.
xmlEventParse(fileName,
handlers = list(startDocument = function() {
cat("Starting document\n")
},
endDocument = function() {
cat("ending document\n")
}),
saxVersion = 2)
if(libxmlVersion()$major >= 2) {
startElement = function(x, ...) cat(x, "\n")
xmlEventParse(file(f), handlers = list(startElement = startElement))
# Parse with a function providing the input as needed.
xmlConnection =
function(con) {
if(is.character(con))
con = file(con, "r")
# Open the connection only if it is not already open for reading.
if(!isOpen(con, "r"))
open(con, "r")
function(len) {
if(len < 0) {
close(con)
return(character(0))
}
x = character(0)
tmp = ""
while(length(tmp) > 0 && nchar(tmp) == 0) {
tmp = readLines(con, 1)
if(length(tmp) == 0)
break
if(nchar(tmp) == 0)
x = append(x, "\n")
else
x = tmp
}
if(length(tmp) == 0)
return(tmp)
x = paste(x, collapse="")
x
}
}
ff = xmlConnection(f)
xmlEventParse(ff, handlers = list(startElement = startElement))
# Parse from a connection. Each time the parser needs more input, it
# calls readLines(<con>, 1)
xmlEventParse(file(f), handlers = list(startElement = startElement))
# using SAX 2
h = list(startElement = function(name, attrs, namespace, allNamespaces){
cat("Starting", name,"\n")
if(length(attrs))
print(attrs)
print(namespace)
print(allNamespaces)
},
endElement = function(name, uri) {
cat("Finishing", name, "\n")
})
xmlEventParse(system.file("exampleData", "namespaces.xml", package="XML"), handlers = h, saxVersion = 2)
# This example is not very realistic but illustrates how to use the
# branches argument. It forces the creation of complete nodes for
# elements named <b> and extracts the id attribute.
# This could be done directly on the startElement, but this just
# illustrates the mechanism.
filename = system.file("exampleData", "branch.xml", package="XML")
b.counter = function() {
nodes <- character()
f = function(node) { nodes <<- c(nodes, xmlGetAttr(node, "id"))}
list(b = f, nodes = function() nodes)
}
b = b.counter()
invisible(xmlEventParse(filename, branches = b["b"]))
b$nodes()
filename = system.file("exampleData", "branch.xml", package="XML")
invisible(xmlEventParse(filename, branches = list(b = function(node) {print(names(node))})))
invisible(xmlEventParse(filename, branches = list(b = function(node) {print(xmlName(xmlChildren(node)[[1]]))})))
}
############################################
# Stopping the parser mid-way and an example of using XMLParserContextFunction.
startElement =
function(ctxt, name, attrs, ...) {
print(ctxt)
print(name)
if(name == "rewriteURI") {
cat("Terminating parser\n")
xmlStopParser(ctxt)
}
}
class(startElement) = "XMLParserContextFunction"
endElement =
function(name, ...)
cat("ending", name, "\n")
fileName = system.file("exampleData", "catalog.xml", package = "XML")
xmlEventParse(fileName, handlers = list(startElement = startElement, endElement = endElement))