xmlTreeParse(XML)
xmlTreeParse() belongs to R package: XML
XML Parser
Translator: 生物统计家园网 robot LoveR
----------Description----------
Parses an XML or HTML file, or a string containing XML/HTML content, and generates an R structure representing the XML/HTML tree. Use htmlTreeParse when the content is known to be (potentially malformed) HTML. This function has numerous parameters and behaves quite differently depending on their values. It can create the tree as R objects or as internal C-level nodes; each is useful in different contexts. It can also convert the nodes into R objects using caller-specified handler functions. This maps the XML document directly into R data structures, bypassing the intermediate R-level tree, which would otherwise have to be processed recursively or in several passes to extract the information of interest.
xmlParse and htmlParse are equivalent to xmlTreeParse and htmlTreeParse respectively, except that their useInternalNodes parameter defaults to TRUE, i.e. they work with and return internal (C-level) nodes. These can then be searched with XPath expressions via xpathApply and getNodeSet.
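A minimal sketch of that difference, assuming the XML package is installed. The inline document and the variable names are illustrative and do not come from the package.

```r
library(XML)

# Illustrative inline document (not part of the package's example data).
txt <- "<a><b>1</b><b>2</b></a>"

# R-level tree: useInternalNodes defaults to FALSE here.
r_doc <- xmlTreeParse(txt, asText = TRUE)

# C-level tree: useInternalNodes defaults to TRUE here.
c_doc <- xmlParse(txt, asText = TRUE)

is_r_tree <- inherits(r_doc, "XMLDocument")
is_c_tree <- inherits(c_doc, "XMLInternalDocument")

# XPath queries operate on the internal document.
b_values <- xpathSApply(c_doc, "//b", xmlValue)
```

The R-level tree can be stored and manipulated like any other R object, while the internal document is the form that XPath functions expect.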
xmlSchemaParse is a convenience function for parsing an XML schema.
----------Usage----------
xmlTreeParse(file, ignoreBlanks=TRUE, handlers=NULL, replaceEntities=FALSE,
             asText=FALSE, trim=TRUE, validate=FALSE, getDTD=TRUE,
             isURL=FALSE, asTree = FALSE, addAttributeNamespaces = FALSE,
             useInternalNodes = FALSE, isSchema = FALSE,
             fullNamespaceInfo = FALSE, encoding = character(),
             useDotNames = length(grep("^\\.", names(handlers))) > 0,
             xinclude = TRUE, addFinalizer = TRUE, error = xmlErrorCumulator())

xmlInternalTreeParse(file, ignoreBlanks=TRUE, handlers=NULL, replaceEntities=FALSE,
             asText=FALSE, trim=TRUE, validate=FALSE, getDTD=TRUE,
             isURL=FALSE, asTree = FALSE, addAttributeNamespaces = FALSE,
             useInternalNodes = TRUE, isSchema = FALSE,
             fullNamespaceInfo = FALSE, encoding = character(),
             useDotNames = length(grep("^\\.", names(handlers))) > 0,
             xinclude = TRUE, addFinalizer = TRUE, error = xmlErrorCumulator())

xmlNativeTreeParse(file, ignoreBlanks=TRUE, handlers=NULL, replaceEntities=FALSE,
             asText=FALSE, trim=TRUE, validate=FALSE, getDTD=TRUE,
             isURL=FALSE, asTree = FALSE, addAttributeNamespaces = FALSE,
             useInternalNodes = TRUE, isSchema = FALSE,
             fullNamespaceInfo = FALSE, encoding = character(),
             useDotNames = length(grep("^\\.", names(handlers))) > 0,
             xinclude = TRUE, addFinalizer = TRUE, error = xmlErrorCumulator())

htmlTreeParse(file, ignoreBlanks = TRUE, handlers = NULL,
             replaceEntities = FALSE, asText = FALSE, trim = TRUE,
             isURL = FALSE, asTree = FALSE,
             useInternalNodes = FALSE, encoding = character(),
             useDotNames = length(grep("^\\.", names(handlers))) > 0,
             xinclude = FALSE, addFinalizer = TRUE,
             error = function(...){})

xmlSchemaParse(file, asText = FALSE, xinclude = TRUE, error = xmlErrorCumulator())
----------Arguments----------
Argument: file
The name of the file containing the XML contents. This can contain ~, which is expanded to the user's home directory. It can also be a URL; see isURL. Additionally, the file can be compressed (gzip) and is read directly without the user having to decompress (gunzip) it.
Argument: ignoreBlanks
Logical value indicating whether text elements made up entirely of white space should be left out of the resulting tree (TRUE, the default) or included in it (FALSE).
Argument: handlers
Optional collection of functions used to map the different XML nodes to R objects. Typically, this is a named list of functions, and a closure can be used to provide local data. This provides a way of filtering the tree as it is being created in R, adding or removing nodes, and generally processing them as they are constructed in the C code. Since version 0.99-8 of the package, if this is specified as a single function object, that function is called for each node (of any type) in the underlying DOM tree. It is invoked with the new node and its parent node. This applies to regular nodes and also to comments, processing instructions, CDATA nodes, etc., so the function must be general enough to handle them all.
Argument: replaceEntities
Logical value indicating whether to substitute entity references with their text directly. This should be left as FALSE. The text still appears as the value of the node, but there is more information about its source, allowing the parse to be reversed with full reference information.
Argument: asText
Logical value indicating that the first argument, file, should be treated as the XML text to parse, not as the name of a file. This allows the contents of documents to be retrieved from different sources (e.g. HTTP servers, XML-RPC, etc.) and still be handled by this parser.
Argument: trim
Logical value indicating whether to strip white space from the beginning and end of text strings.
Argument: validate
Logical value indicating whether to use a validating parser, in other words whether to check the contents against the DTD specification. If this is TRUE, warning messages are displayed about errors in the DTD and/or document, but parsing proceeds unless a terminal error is encountered.
Argument: getDTD
Logical flag indicating whether the DTD (both internal and external) should be returned along with the document nodes. This changes the return type.
Argument: isURL
Indicates whether the file argument refers to a URL (accessible via ftp or http) or a regular file on the system. If asText is TRUE, this should not be specified. The function attempts to determine whether the data source is a URL by using grep to look for http or ftp at the start of the string. The libxml parser handles the connection to servers, not the R facilities (e.g. scan).
Argument: asTree
This only applies when one passes a value for the handlers argument. It then determines whether the DOM tree or the handlers object is returned.
Argument: addAttributeNamespaces
A logical value indicating whether to include the namespace prefix in the names of the attributes within a node or to omit it. If this is TRUE, an attribute such as xsi:type="xsd:string" is reported with the name xsi:type. If it is FALSE, the name of the attribute is type.
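As a rough illustration, the sketch below parses the same text with both settings and collects the attribute names of the root node. The inline document and the variable names are assumptions made for this example only.

```r
library(XML)

# Illustrative document with a namespaced attribute (hypothetical content).
txt <- paste0(
  '<doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ',
  'xmlns:xsd="http://www.w3.org/2001/XMLSchema" ',
  'xsi:type="xsd:string">x</doc>'
)

with_ns    <- xmlTreeParse(txt, asText = TRUE, addAttributeNamespaces = TRUE)
without_ns <- xmlTreeParse(txt, asText = TRUE, addAttributeNamespaces = FALSE)

# Attribute names as reported on the root node under each setting.
names_with    <- names(xmlAttrs(xmlRoot(with_ns)))
names_without <- names(xmlAttrs(xmlRoot(without_ns)))
```

Comparing names_with and names_without shows whether the prefix is kept in the attribute name.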
Argument: useInternalNodes
A logical value indicating whether to call the converter functions with objects of class XMLInternalNode rather than XMLNode. This should make things faster, as the contents of the internal nodes are not converted to explicit R objects. It also allows one to access the parent and ancestor nodes. However, since the objects refer to volatile C-level objects, one cannot store these nodes for use in further computations within R. They “disappear” after processing of the XML document is completed. If this argument is TRUE and no handlers are provided, the return value is a reference to the internal C-level document pointer. This can be used for post-processing via XPath expressions using getNodeSet.
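A short sketch of the no-handlers case, assuming an illustrative inline document: the returned internal document is queried with getNodeSet, and xmlParent is used to reach a node's parent.

```r
library(XML)

# Illustrative inline document (hypothetical content).
txt <- "<cars><car name='a'/><car name='b'/></cars>"

# With useInternalNodes = TRUE and no handlers, a C-level document is returned.
doc <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)

# Post-process with an XPath expression.
nodes <- getNodeSet(doc, "//car")
car_names <- sapply(nodes, xmlGetAttr, "name")

# Internal nodes give access to their parent.
parent_name <- xmlName(xmlParent(nodes[[1]]))
```

The values are extracted into ordinary R vectors while the document is still alive, which avoids holding on to the internal nodes themselves.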
Argument: isSchema
A logical value indicating whether the document is an XML schema (TRUE) and should be parsed as such using the built-in schema parser in libxml.
Argument: fullNamespaceInfo
A logical value indicating whether to provide the namespace URI and prefix on each node or just the prefix. The latter (FALSE) is currently the default, as that was the original behavior of the package. However, using TRUE is more informative and is intended to become the default in the future.
Argument: encoding
A character string (scalar) giving the encoding for the document. This is optional, as the document should contain its own encoding information. If it does not, the caller can specify the encoding for the parser. If the XML/HTML document does specify its own encoding, that value is used regardless of any value specified by the caller. This argument is therefore a safety net for the case where the document has no encoding declaration and the caller happens to know the actual encoding.
Argument: useDotNames
A logical value indicating whether to use the newer format for identifying general element function handlers with the '.' prefix, e.g. .text, .comment, .startElement. If this is FALSE, then the older format text, comment, startElement, ... is used. The older format causes problems when there are indeed nodes named text, comment or startElement, because a node-specific handler is confused with the general handler of the same name. Using TRUE means that your list of handlers should use the '.' prefix for these general element handlers. This is the preferred way to write new code.
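The following sketch shows a general element handler written with the '.' prefix. The helper make_counter and the inline document are hypothetical names introduced for illustration; useDotNames is left at its default, which detects the dot-prefixed name in the handler list.

```r
library(XML)

# Hypothetical helper: a closure that counts elements by tag name.
make_counter <- function() {
  counts <- integer(0)
  list(
    # General element handler, identified by the '.' prefix.
    .startElement = function(node, ...) {
      name <- xmlName(node)
      counts[name] <<- if (name %in% names(counts)) counts[name] + 1L else 1L
      node  # return the node so that it stays in the tree
    },
    # Accessor for the accumulated counts.
    counts = function() counts
  )
}

h <- make_counter()
invisible(xmlTreeParse("<a><b/><b/><c/></a>", asText = TRUE, handlers = h))
tag_counts <- h$counts()
```

Because asTree is left as FALSE, the call returns the handler list, and the state is read back through the counts accessor.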
Argument: xinclude
A logical value indicating whether to process nodes of the form <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"> to insert content from other parts of (potentially different) documents. TRUE means resolve the external references; FALSE means leave the node as is. Of course, one can process these nodes oneself after the document has been parsed, using handler functions or working on the DOM. Please note that the syntax for inclusion using XPointer is not the same as XPath, and the results can be a little unexpected and confusing. See the libxml2 documentation for more details.
Argument: addFinalizer
A logical value indicating whether the default finalizer routine should be registered to free the internal xmlDoc when R no longer has a reference to this external pointer object. This is only relevant when useInternalNodes is TRUE.
Argument: error
A function that is invoked when the XML parser reports an error. When an error is encountered, this is called with 7 arguments. See xmlStructuredStop for information about these. If parsing completes and no document is generated, this function is called again with only one argument, which is a character vector of length 0. This gives the function an opportunity to report all the errors and raise an exception, rather than doing so when it sees the first one. The function can do what it likes with the information. It can raise an R error or let the parser continue and potentially find further errors. The default value of this argument supplies a function that cumulates the errors. If this is NULL, the default error handler function in the package, xmlStructuredStop, is invoked and this raises an error in R at that time.
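A minimal sketch of catching the cumulated errors produced by the default handler, using the XMLParserErrorList condition class that appears in the Examples section. The malformed text and variable names are illustrative.

```r
library(XML)

# Illustrative malformed document: <a> is never closed.
bad <- "<doc><a></doc>"

result <- tryCatch(
  xmlTreeParse(bad, asText = TRUE),
  # The default error function raises a condition of this class
  # once parsing has finished without producing a document.
  XMLParserErrorList = function(e) {
    paste("parse failed:", conditionMessage(e))
  }
)

failed <- is.character(result)
```

Passing error = NULL instead would stop at the first problem the parser reports.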
----------Details----------
The handlers argument is used similarly to those specified in xmlEventParse. When an XML tag (element) is processed, we look for a function in this collection with the same name as the tag's name. If this is not found, we look for one named startElement. If this is not found, we use the default built-in converter. The same applies to comments, entity references, cdata, processing instructions, etc. The default entries should be named comment, startElement, externalEntity, processingInstruction, text, cdata and namespace. All but the last should take the XMLNode as their first argument. In the future, other information may be passed via ..., for example, the depth in the tree. Specifically, the second argument will be the parent node into which they are being added, but this is not currently implemented, so it should have a default value (NULL).
The namespace function is called with a single argument which is an object of class XMLNameSpace. This contains
id: the namespace identifier as used to
uri: the value of the namespace identifier, i.e. the URI
local: a logical value indicating whether the definition
One should note that the namespace handler is called before the node in which the namespace definition occurs and its children are processed. This is different from the other handlers, which are called after the child nodes have been processed.
Each of these functions can return arbitrary values that are then entered into the tree in place of the default node passed to the function as the first argument. This allows the caller to generate the nodes of the resulting document tree exactly as they wish. If the function returns NULL, the node is dropped from the resulting tree. This is a convenient way to discard nodes after having processed their contents.
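The sketch below illustrates dropping nodes by returning NULL from a node-specific handler. The element names and the inline document are invented for the example.

```r
library(XML)

# Illustrative inline document (hypothetical element names).
txt <- "<order><item>pen</item><secret>1234</secret><item>ink</item></order>"

# The handler named 'secret' is called for <secret> elements only;
# returning NULL drops the node from the resulting tree.
doc <- xmlTreeParse(txt, asText = TRUE, asTree = TRUE,
                    handlers = list(secret = function(node, ...) NULL))

# Names of the children that remain under the root.
remaining <- names(xmlChildren(xmlRoot(doc)))
```

asTree = TRUE is needed here so that the converted tree, rather than the handler list, is returned.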
----------Value----------
By default (when useInternalNodes is FALSE, getDTD is TRUE, and no handler functions are provided), the return value is an object of (S3) class XMLDocument. This has two fields named doc and dtd, which are of class XMLDocumentContent and DTDList respectively.
If getDTD is FALSE, only the doc object is returned.
The doc object has three fields of its own: file, version and children.
Field: file
The (expanded) name of the file containing the XML.
Field: version
A string identifying the version of XML used by the document.
Field: children
A list of the XML nodes at the top of the document. Each of these is of class XMLNode. These are made up of 4 fields.
name: The name of the element.
attributes: For regular elements, a named list of XML attributes converted from the <tag x="1" y="abc">
children: List of sub-nodes.
value: Used only for text entries.
Some specializations of XMLNode, such as XMLComment, XMLProcessingInstruction and XMLEntityRef, are used.
If the value of the argument getDTD is TRUE and the document refers to a DTD via a top-level DOCTYPE element, the DTD and its information will be available in the dtd field. The second element is a list containing the external and internal DTDs. Each of these contains 2 lists, one for element definitions and another for entities. See parseDTD.
If a list of functions is given via handlers, this list is returned. Typically, these handler functions share state via a closure, and the resulting updated data structures, which contain the extracted and processed values from the XML document, can be retrieved via a function in this handler list.
If asTree is TRUE, then the converted tree is returned. What form this takes depends on what the handler functions have done to process the XML tree.
If useInternalNodes is TRUE and no handlers are specified, an object of S3 class XMLInternalDocument is returned. This can be used in much the same ways as an XMLDocument, e.g. with xmlRoot, docName and so on to traverse the tree. It can also be used with XPath queries via getNodeSet, xpathApply and doc["xpath-expression"].
If internal nodes are used and the internal tree is returned directly, all the nodes are returned as-is and no attempt is made to trim white space, remove “empty” nodes (i.e. those containing only white space), etc. This is potentially quite expensive and so is not done generally, but should be done during the processing of the nodes. When using XPath queries, such nodes are easily identified and/or ignored and so do not cause any difficulties. They do become an issue when dealing with a node's children directly, so one can use simple filtering techniques such as xmlChildren(node)[ ! xmlSApply(node, inherits, "XMLInternalTextNode")] and even check the xmlValue to determine whether it contains only white space: xmlChildren(node)[ ! xmlSApply(node, function(x) inherits(x, "XMLInternalTextNode") && trimws(xmlValue(x)) == "")]
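The filtering idea can be sketched as follows. The inline document and variable names are illustrative, and base R's trimws is used here to test for white-space-only text.

```r
library(XML)

# Illustrative document with white space between the child elements.
txt <- "<a>\n  <b>1</b>\n  <c>2</c>\n</a>"

doc  <- xmlInternalTreeParse(txt, asText = TRUE, trim = FALSE)
top  <- xmlRoot(doc)
kids <- xmlChildren(top)

# TRUE for text nodes whose value is only white space.
blank <- sapply(kids, function(x) {
  inherits(x, "XMLInternalTextNode") && trimws(xmlValue(x)) == ""
})

# Keep everything else, i.e. the element children.
elems <- kids[!blank]
elem_names <- unname(sapply(elems, xmlName))
```

The same filter is harmless when the parser has already dropped the blank nodes, since it then simply selects every child.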
----------Note----------
Make sure that the necessary 3rd party libraries are available.
----------Author(s)----------
Duncan Temple Lang <duncan@wald.ucdavis.edu>
----------See Also----------
xmlEventParse; free for releasing the memory when an XMLInternalDocument object is returned.
----------Examples----------
fileName <- system.file("exampleData", "test.xml", package="XML")
# Parse the document and return it in its standard format.
xmlTreeParse(fileName)
# Parse the document, discarding comments.
xmlTreeParse(fileName, handlers=list("comment"=function(x,...){NULL}), asTree = TRUE)
# Print the entities.
invisible(xmlTreeParse(fileName,
            handlers=list(entity=function(x) {
                            cat("In entity",x$name, x$value,"\n")
                            x}
                         ), asTree = TRUE
          )
)
# Parse some XML text.
# Read the text from the file.
xmlText <- paste(readLines(fileName), "\n", collapse="")
print(xmlText)
xmlTreeParse(xmlText, asText=TRUE)
# With version 1.4.2 we can pass the contents of an XML
# stream without pasting them.
xmlTreeParse(readLines(fileName), asText=TRUE)
# Read a MathML document and convert each node
# so that the primary class is
#   <name of tag>MathML
# so that we can use method dispatching when processing
# it rather than conditional statements on the tag name.
# See plotMathML() in examples/.
fileName <- system.file("exampleData", "mathml.xml",package="XML")
m <- xmlTreeParse(fileName,
         handlers=list(
           startElement = function(node){
             cname <- paste(xmlName(node),"MathML", sep="",collapse="")
             class(node) <- c(cname, class(node));
             node
           }))
# In this example, we extract _just_ the names of the
# variables in the mtcars.xml file.
# The names are the contents of the <variable>
# tags. We discard all other tags by returning NULL
# from the startElement handler.
#
# We cumulate the names of variables in a character
# vector named `vars'.
# We define this within a closure and define the
# variable function within that closure so that it
# will be invoked when the parser encounters a <variable>
# tag.
# This is called with 2 arguments: the XMLNode object (containing
# its children) and the list of attributes.
# We get the variable name via a call to xmlValue().
# Note that we define the closure function in the call and then
# create an instance of it by calling it directly as
#   (function() {...})()
# Note that we can also get the names by parsing the entire
# document in the usual manner and then executing
#   xmlSApply(xmlRoot(doc)[[1]], function(x) xmlValue(x[[1]]))
# which is simpler but is more costly in terms of memory.
fileName <- system.file("exampleData", "mtcars.xml", package="XML")
doc <- xmlTreeParse(fileName, handlers = (function() {
         vars <- character(0) ;
         list(variable=function(x, attrs) {
                vars <<- c(vars, xmlValue(x[[1]]));
                NULL},
              startElement=function(x,attr){
                NULL
              },
              names = function() {
                vars
              }
             )
       })()
     )
# Here we just print the variable names to the console
# with a special handler.
doc <- xmlTreeParse(fileName, handlers = list(
         variable=function(x, attrs) {
           print(xmlValue(x[[1]])); TRUE
         }), asTree=TRUE)
# This should raise an error.
try(xmlTreeParse(
      system.file("exampleData", "TestInvalid.xml", package="XML"),
      validate=TRUE))
## Not run:
# Parse an XML document directly from a URL.
# Requires Internet access.
# (asText is not set: the argument is a URL, not XML text.)
xmlTreeParse("http://www.omegahat.org/Scripts/Data/mtcars.xml")
## End(Not run)
counter = function() {
  counts = integer(0)
  list(startElement = function(node) {
                        name = xmlName(node)
                        if(name %in% names(counts))
                          counts[name] <<- counts[name] + 1
                        else
                          counts[name] <<- 1
                      },
       counts = function() counts)
}
h = counter()
xmlParse(system.file("exampleData", "mtcars.xml", package="XML"), handlers = h)
h$counts()
f = system.file("examples", "index.html", package = "XML")
htmlTreeParse(readLines(f), asText = TRUE)
htmlTreeParse(readLines(f))
# Same as
htmlTreeParse(paste(readLines(f), collapse = "\n"), asText = TRUE)
getLinks = function() {
  links = character()
  list(a = function(node, ...) {
             links <<- c(links, xmlGetAttr(node, "href"))
             node
           },
       links = function()links)
}
h1 = getLinks()
htmlTreeParse(system.file("examples", "index.html", package = "XML"), handlers = h1)
h1$links()
h2 = getLinks()
htmlTreeParse(system.file("examples", "index.html", package = "XML"), handlers = h2, useInternalNodes = TRUE)
all(h1$links() == h2$links())
# Using flat trees
tt = xmlHashTree()
f = system.file("exampleData", "mtcars.xml", package="XML")
xmlTreeParse(f, handlers = list(.startElement = tt[[".addNode"]]))
xmlRoot(tt)
doc = xmlTreeParse(f, useInternalNodes = TRUE)
sapply(getNodeSet(doc, "//variable"), xmlValue)
#free(doc)
# Character set encoding for HTML
f = system.file("exampleData", "9003.html", package = "XML")
# We specify the encoding.
d = htmlTreeParse(f, encoding = "UTF-8")
# We get a different result if we do not specify any encoding.
d.no = htmlTreeParse(f)
# Document with its encoding in the HEAD of the document.
d.self = htmlTreeParse(system.file("exampleData", "9003-en.html",package = "XML"))
# XXX want to do a test here to see the similarities between d and
# d.self and differences between d.no
# include
f = system.file("exampleData", "nodes1.xml", package = "XML")
xmlRoot(xmlTreeParse(f, xinclude = FALSE))
xmlRoot(xmlTreeParse(f, xinclude = TRUE))
f = system.file("exampleData", "nodes2.xml", package = "XML")
xmlRoot(xmlTreeParse(f, xinclude = TRUE))
# Errors
try(xmlTreeParse("<doc><a> & < <?pi > </doc>"))
# Catch the error by type.
tryCatch(xmlTreeParse("<doc><a> & < <?pi > </doc>"),
         "XMLParserErrorList" = function(e) {
           cat("Errors in XML document\n", e$message, "\n")
         })
# Terminate on first error.
try(xmlTreeParse("<doc><a> & < <?pi > </doc>", error = NULL))
# See xmlErrorCumulator in the XML package.
f = system.file("exampleData", "book.xml", package = "XML")
doc.trim = xmlInternalTreeParse(f, trim = TRUE)
doc = xmlInternalTreeParse(f, trim = FALSE)
xmlSApply(xmlRoot(doc.trim), class)
# Note the additional XMLInternalTextNode objects.
xmlSApply(xmlRoot(doc), class)
top = xmlRoot(doc)
textNodes = xmlSApply(top, inherits, "XMLInternalTextNode")
sapply(xmlChildren(top)[textNodes], xmlValue)
# Storing nodes
f = system.file("exampleData", "book.xml", package = "XML")
titles = list()
xmlTreeParse(f, handlers = list(title = function(x)
titles[[length(titles) + 1]] <<- x))
sapply(titles, xmlValue)
rm(titles)