XML::Grove is a tree-based object model for accessing the information set of parsed or stored XML, HTML,
or SGML instances. XML::Grove objects are Perl hashes and arrays, so you access their properties
using normal Perl syntax:
$text = $characters->{Data};
How To Create a Grove
There are several ways for groves to come into being: they can be read from a file or string using a
parser and a grove builder, they can be created by your Perl code using the "new()" methods of
XML::Grove::Objects, or databases or other sources can act as groves.
The most common way to build groves is using a parser and a grove builder. The parser is the package
that reads the characters of an XML file, recognizes the XML syntax, and produces "events" reporting
when elements (tags), text (characters), processing instructions, and other sequences occur. A grove
builder receives ("consumes" or "handles") these events and builds XML::Grove objects. The last
thing the parser does is return the XML::Grove::Document object that the grove builder created, with all
of its elements and character data.
The most common parser and grove builder are XML::Parser::PerlSAX (in libxml-perl) and
XML::Grove::Builder. To build a grove, create the grove builder first:
$grove_builder = XML::Grove::Builder->new;
Then create the parser, passing it the grove builder as its handler:
$parser = XML::Parser::PerlSAX->new ( Handler => $grove_builder );
This associates the grove builder with the parser so that every time you parse a document with this
parser it will return an XML::Grove::Document object. To parse a file, pass the "parse()" method a
"Source" parameter containing a "SystemId" parameter (the URL or path of the file you want to parse):
$document = $parser->parse ( Source => { SystemId => 'kjv.xml' } );
To parse a string held in a Perl variable, use the "Source" parameter containing a "String"
parameter:
$document = $parser->parse ( Source => { String => $xml_text } );
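Putting the steps above together, a minimal sketch might look like the following. It assumes that libxml-perl and XML::Grove are installed. The helper name parse_to_grove and the eval guard around the module loading are illustrative only and are not part of either distribution.

```perl
use strict;
use warnings;

# Hypothetical helper: parse an XML string and return the grove,
# or nothing if the required modules are not installed.
sub parse_to_grove {
    my ($xml_text) = @_;

    # Load the modules at run time so the sketch degrades gracefully.
    my $loaded = eval {
        require XML::Parser::PerlSAX;
        require XML::Grove::Builder;
        1;
    };
    return unless $loaded;

    # Builder first, then the parser that feeds it events.
    my $grove_builder = XML::Grove::Builder->new;
    my $parser = XML::Parser::PerlSAX->new( Handler => $grove_builder );

    # parse() hands back the document object the grove builder created.
    return $parser->parse( Source => { String => $xml_text } );
}

my $document = parse_to_grove('<greeting>Hello, grove</greeting>');
if ($document) {
    # With no leading comments or PIs, the root element comes first.
    my $root = $document->{Contents}[0];
    print "root element: $root->{Name}\n";
}
else {
    print "XML::Grove or libxml-perl not installed; skipping\n";
}
```

To parse a file instead, the same sketch would pass SystemId rather than String inside Source.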
The following are all parsers that work with XML::Grove::Builder:
    XML::Parser::PerlSAX (in libxml-perl, uses XML::Parser)
    XML::ESISParser (in libxml-perl, uses James Clark's "nsgmls")
    XML::SAX2Perl (in libxml-perl, translates SAX 1.0 to PerlSAX)
Most parsers supply more properties than the standard information set below, and XML::Grove makes
available all the properties given by the parser. Refer to the parser documentation to find out what
additional properties it may provide.
Although none are available yet (as of August 1999), PerlSAX filters can be used to process the output
of a parser before it is passed to XML::Grove::Builder. XML::Grove::PerlSAX can be used to provide input
to PerlSAX filters or other PerlSAX handlers.
Using Groves
The properties provided by parsers are available directly using Perl's normal syntax for accessing hashes
and arrays. For example, to get the name of an element:
$element_name = $element->{Name};
By convention, all properties provided by parsers are in mixed case. "Parent" properties are available
using the "Data::Grove::Parent" module.
The following is the minimal set of objects and their properties that you are likely to get from all
parsers:
XML::Grove::Document
The Document object is the parent of the root element of the parsed XML document.
    Contents    An array containing the root element.
A document's "Contents" may also contain processing instructions, comments, and whitespace.
Some parsers provide information about the document type, the XML declaration, or notations and entities.
Check the parser documentation for property names.
XML::Grove::Element
The Element object represents elements from the XML source.
    Parent      The parent object of this element.
    Name        A string, the element type name of this element.
    Attributes  A hash of strings or arrays.
    Contents    An array of elements, characters, processing instructions, etc.
In a purely minimal grove, the attributes of an element will be plain text (Perl scalars). Some parsers
provide access to notations and entities in attributes, in which case the attribute may contain an array.
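As an illustration of the Element properties above, the following sketch walks a grove and lists each element with its attributes. The grove here is built by hand from blessed hashes as a stand-in for what a parser and grove builder would return. The helper names node and outline are hypothetical, and the code assumes the purely minimal case where every attribute value is a plain string.

```perl
use strict;
use warnings;

# Hypothetical helper: bless a plain hash into one of the grove classes.
sub node {
    my ($class, %props) = @_;
    return bless {%props}, "XML::Grove::$class";
}

# Hand-built stand-in for a small parsed document.
my $document = node( Document =>
    Contents => [
        node( Element =>
            Name       => 'book',
            Attributes => { lang => 'en' },
            Contents   => [
                node( Element => Name => 'title', Attributes => {},
                      Contents => [ node( Characters => Data => 'Genesis' ) ] ),
                node( Element => Name => 'chapter', Attributes => { n => '1' },
                      Contents => [] ),
            ],
        ),
    ],
);

# Return one line per element, indented by depth, with its attributes.
sub outline {
    my ($node, $depth) = @_;
    my @lines;
    if ( ref($node) eq 'XML::Grove::Element' ) {
        # Assumes plain-string attribute values (the minimal grove).
        my $attrs = join ' ',
            map { "$_=\"$node->{Attributes}{$_}\"" }
            sort keys %{ $node->{Attributes} };
        push @lines,
            ( '  ' x $depth ) . $node->{Name} . ( length $attrs ? " [$attrs]" : '' );
        $depth++;
    }
    # Documents and elements carry a Contents array; other objects do not.
    push @lines, map { outline( $_, $depth ) } @{ $node->{Contents} || [] };
    return @lines;
}

print "$_\n" for outline( $document, 0 );
```

With a parser that supplies notations or entities in attributes, the attribute formatting would need to handle array values as well.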
XML::Grove::Characters
The Characters object represents text from the XML source.
    Parent      The parent object of this characters object.
    Data        A string, the characters.
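Because text lives in Characters objects scattered through the Contents arrays, collecting the text of an element means visiting its descendants. The sketch below shows one way to do that. The grove is a hand-built stand-in for parser output, and text_of is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hand-built stand-in for <p>Hello, <em>grove</em>!</p>
my $p = bless {
    Name       => 'p',
    Attributes => {},
    Contents   => [
        bless( { Data => 'Hello, ' }, 'XML::Grove::Characters' ),
        bless( {
                Name       => 'em',
                Attributes => {},
                Contents   => [ bless( { Data => 'grove' }, 'XML::Grove::Characters' ) ],
            },
            'XML::Grove::Element' ),
        bless( { Data => '!' }, 'XML::Grove::Characters' ),
    ],
}, 'XML::Grove::Element';

# Concatenate the Data of every Characters object at or below $node.
sub text_of {
    my ($node) = @_;
    return $node->{Data} if ref($node) eq 'XML::Grove::Characters';
    # Anything that is not an element (PIs, comments, ...) contributes no text.
    return '' unless ref($node) eq 'XML::Grove::Element';
    return join '', map { text_of($_) } @{ $node->{Contents} };
}

print text_of($p), "\n";    # prints "Hello, grove!"
```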
XML::Grove::PI
The PI object represents processing instructions from the XML source.
    Parent      The parent object of this PI object.
    Target      A string, the processing instruction target.
    Data        A string, the processing instruction data, or undef if none was supplied.
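The sketch below picks the processing instructions out of a document's Contents and shows why the undef case for Data needs handling. The document is a hand-built stand-in for parser output, and the targets and data strings are made up for illustration.

```perl
use strict;
use warnings;

# Hand-built stand-in for a document with two PIs before the root element.
my $document = bless {
    Contents => [
        bless( { Target => 'xml-stylesheet',
                 Data   => 'href="style.css" type="text/css"' },
               'XML::Grove::PI' ),
        bless( { Target => 'page-break', Data => undef }, 'XML::Grove::PI' ),
        bless( { Name => 'doc', Attributes => {}, Contents => [] },
               'XML::Grove::Element' ),
    ],
}, 'XML::Grove::Document';

# Describe each top-level PI, guarding against a missing Data property.
my @pis =
    map  { $_->{Target} . ': ' . ( defined $_->{Data} ? $_->{Data} : '(no data)' ) }
    grep { ref($_) eq 'XML::Grove::PI' } @{ $document->{Contents} };

print "$_\n" for @pis;
```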
In addition to the minimal set of objects above, XML::Grove knows about the following objects, which
parsers may provide. Refer to the parser documentation for descriptions of the properties of these
objects.
XML::Grove::
    ::Entity::External   External entity reference
    ::Entity::SubDoc     External SubDoc reference (SGML)
    ::Entity::SGML       External SGML reference (SGML)
    ::Entity             Entity reference
    ::Notation           Notation declaration
    ::Comment            <!-- A Comment -->
    ::SubDoc             A parsed subdocument (SGML)
    ::CData              A CDATA marked section
    ::ElementDecl        An element declaration from the DTD
    ::AttListDecl        An element's attribute declaration, from the DTD