This handler builds a lightweight tree structure representing the XML document. This structure is, at
least in this author's opinion, easier to work with than the "standard" style of tree. It is the same
type of structure as built by XML::Parser when using XML::Parser::EasyTree, or by the get_simple_tree
method in XML::Records.
The tree is returned as a reference to an array of tree nodes, each of which is a hash reference. All
nodes have a 'type' key whose value is the type of the node: 'e' for element nodes, 't' for text nodes,
and 'p' for processing instruction nodes. All nodes also have a 'content' key whose value is a reference
to an array holding the element's child nodes for element nodes, the string value for text nodes, and the
data value for processing instruction nodes. Element nodes also have an 'attrib' key whose value is a
reference to a hash of attribute names and values and a 'name' key whose value is the element's name.
Processing instructions also have a 'target' key whose value is the PI's target.
EasyTree nodes are ordinary Perl hashes and are not objects. Contiguous runs of text are always returned
in a single node.
The reason the parser returns an array reference rather than the root element's node is that an XML
document can legally contain processing instructions outside the root element (the xml-stylesheet PI is
commonly used this way).
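As a rough illustration of the layout described above, the following sketch writes out by hand the tree one would expect for a small document consisting of an xml-stylesheet PI followed by <doc id="1">Hello <b>world</b></doc>, and then walks it. The literal is an assumption based on the description, not output captured from the handler, and text_of is a hypothetical helper.

```perl
use strict;
use warnings;

# Hand-written tree (an assumption about what the handler would build):
# one top-level PI node followed by the root element node.
my $tree = [
    { type => 'p', target => 'xml-stylesheet', content => 'href="s.css"' },
    {
        type    => 'e',
        name    => 'doc',
        attrib  => { id => '1' },
        content => [
            { type => 't', content => 'Hello ' },
            {
                type    => 'e',
                name    => 'b',
                attrib  => {},
                content => [ { type => 't', content => 'world' } ],
            },
        ],
    },
];

# Concatenate all text found beneath a list of nodes (hypothetical helper).
sub text_of {
    my ($nodes) = @_;
    my $text = '';
    for my $node (@$nodes) {
        if ( $node->{type} eq 't' ) {
            $text .= $node->{content};
        }
        elsif ( $node->{type} eq 'e' ) {
            $text .= text_of( $node->{content} );
        }
        # 'p' nodes carry PI data rather than document text, so skip them.
    }
    return $text;
}

# The root element is the first top-level node of type 'e'.
my ($root) = grep { $_->{type} eq 'e' } @$tree;
print "root: $root->{name}, id=$root->{attrib}{id}\n";
print "text: ", text_of($tree), "\n";
```

Because the top level is a list, code that wants the root element has to pick it out explicitly, as the grep above does.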
If namespace information is available (only possible with PerlSAX 2), element and attribute names will be
prefixed with their (possibly empty) namespace URI enclosed in curly brackets, and namespace prefixes
will be stripped from names.
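Names in this form can be taken apart again with a regular expression. The sketch below assumes only the "{uri}localname" convention described above; split_ns_name is a hypothetical helper and is not part of the module.

```perl
use strict;
use warnings;

# Split a name of the form "{uri}local" into (uri, local).
# Names without a leading "{...}" come back with an undefined URI.
sub split_ns_name {
    my ($name) = @_;
    if ( $name =~ /^\{([^}]*)\}(.*)$/s ) {
        return ( $1, $2 );
    }
    return ( undef, $name );
}

for my $name ( '{http://www.w3.org/1999/xhtml}p', '{}item', 'plain' ) {
    my ( $uri, $local ) = split_ns_name($name);
    printf "%-35s uri=%s local=%s\n", $name,
        ( defined $uri ? "'$uri'" : 'undef' ), $local;
}
```

An empty pair of brackets ("{}item") yields an empty string for the URI, which stays distinguishable from a name that carries no namespace information at all.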
METHODS
$handler = XML::Handler::EasyTree->new([options])
Creates a handler object. Options can be provided hash-style:
Noempty
If this is set to a true value, text nodes consisting entirely of whitespace will not be stored
in the tree. The default is false.
Latin
If this is set to a true value, characters with Unicode values in the Latin-1 range (160-255)
will be stored in the tree as Latin-1 rather than UTF-8. The default is false.
Searchable
If this is set to a true value, the parser will return a tree of
XML::Handler::EasyTree::Searchable objects rather than bare array references, providing access to
the navigation methods listed below. The top-level node returned will be a dummy element node
with a name of "__TOPLEVEL__". The default is false. Setting this option automatically
enables the Noempty option.
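The effect of Noempty can be pictured with plain data. The sketch below is not the handler's implementation: it filters a hand-written tree after the fact, whereas the handler would presumably omit such nodes while building the tree. strip_empty and the sample tree are hypothetical.

```perl
use strict;
use warnings;

# Return a copy of a node list without whitespace-only text nodes
# (hypothetical helper illustrating the effect of Noempty).
sub strip_empty {
    my ($nodes) = @_;
    my @kept;
    for my $node (@$nodes) {
        next if $node->{type} eq 't' && $node->{content} !~ /\S/;
        my $copy = $node;
        if ( $node->{type} eq 'e' ) {
            # Shallow-copy the element so the original tree is left untouched.
            $copy = { %$node, content => strip_empty( $node->{content} ) };
        }
        push @kept, $copy;
    }
    return \@kept;
}

# Sample tree for <list>, one <item>, and the whitespace around it.
my $tree = [
    {
        type    => 'e',
        name    => 'list',
        attrib  => {},
        content => [
            { type => 't', content => "\n  " },
            {
                type    => 'e',
                name    => 'item',
                attrib  => {},
                content => [ { type => 't', content => 'one' } ],
            },
            { type => 't', content => "\n" },
        ],
    },
];

my $stripped = strip_empty($tree);
printf "children before: %d, after: %d\n",
    scalar @{ $tree->[0]{content} },
    scalar @{ $stripped->[0]{content} };
```

With the whitespace nodes gone, every remaining child of <list> is an element, which makes positional access to children predictable.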
XML::Handler::EasyTree::Searchable METHODS
If the Searchable option is set, all nodes in the tree will be XML::Handler::EasyTree::Searchable
objects, which have the same structure as EasyTree nodes but also implement the following methods similar
to those in XML::SimpleObject.
$name = $node->name()
Returns the name of the node. Ideally, it should return a "fully qualified" name, but it doesn't.
$val = $node->value()
Returns the text value associated with a node object. Returns undef if the node has no text children
or its first child is not a text node.
$newobj = $obj->child( $name );
Returns the child of the object named $name. Only element nodes are considered.
When more than one child matches $name, the result depends on the calling context (these semantics
haven't been completely worked out):
- in list context, all matching children are returned.
- in scalar context, the first child matching $name is returned.
In scalar context, the XML::Parser::SimpleObj class instead returns an object containing all the
children matching $name, unless there is only one such child, in which case it returns that child (see
commented code). I find that behavior confusing.
@children = $obj->children( $name );
Returns a list of all children (elements only) of the $obj that match $name, in the order in which
they appeared in the original XML text.
@children_names = $obj->children_names();
Returns a list of the names of all the object's children (elements only), in the order in which they
appeared in the original text.
$attrib = $obj->attribute( $att_name );
Returns the value of the object's attribute named $att_name, or an empty string if the object has no
such attribute.
@attribute_list = $obj->attribute_list();
Returns a list (in no particular order) of the attribute names associated with the object.
$text = $obj->dump_tree();
Returns a textual representation (in XML form) of the object's hierarchy. Only elements are
processed. The result will be in whatever character encoding the SAX driver delivered, which may not
be the same encoding as the original source.
$text = $obj->pretty_dump_tree();
Identical to dump_tree(), except that newlines and indentation are added for readability.
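The behavior documented for child, children, value and attribute can be approximated on plain EasyTree hashes. The sketch below is one reading of the descriptions above, not the module's code: the functions are hypothetical stand-ins for the methods, treating an omitted name in children as "match every element" is a choice made for this sketch, and the sample values 'B' and 'X100' are invented.

```perl
use strict;
use warnings;

# Element children of $node, optionally restricted to those named $name.
sub children {
    my ( $node, $name ) = @_;
    return grep { $_->{type} eq 'e' && ( !defined $name || $_->{name} eq $name ) }
        @{ $node->{content} };
}

# All matches in list context, the first match in scalar context.
sub child {
    my ( $node, $name ) = @_;
    my @found = children( $node, $name );
    return wantarray ? @found : $found[0];
}

# Text of the first child, or undef if there is none or it is not text.
sub value {
    my ($node) = @_;
    my $first = $node->{content}[0];
    return undef unless $first && $first->{type} eq 't';
    return $first->{content};
}

# Attribute value, or an empty string if the attribute is absent.
sub attribute {
    my ( $node, $att ) = @_;
    my $val = $node->{attrib}{$att};
    return defined $val ? $val : '';
}

# Invented sample data using element names from the example script.
my $vme = {
    type    => 'e',
    name    => 'vmesystem',
    attrib  => { configuration_name => 'B' },
    content => [
        {
            type    => 'e',
            name    => 'gps',
            attrib  => {},
            content => [
                {
                    type    => 'e',
                    name    => 'model',
                    attrib  => {},
                    content => [ { type => 't', content => 'X100' } ],
                },
            ],
        },
        { type => 'e', name => 'cpu', attrib => {}, content => [] },
    ],
};

my $gps   = child( $vme, 'gps' );      # scalar context: first match
my $model = child( $gps, 'model' );
print "config: ", attribute( $vme, 'configuration_name' ), "\n";
print "children: ", join( ', ', map { $_->{name} } children($vme) ), "\n";
print "gps model: ", value($model), "\n";
```

Note that value only looks at the first child, so an element whose text follows a child element yields undef under this reading.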
EXAMPLE
#! /usr/bin/perl -w
use XML::Handler::Trees;
use XML::Parser::PerlSAX;
use strict;
my $p=XML::Parser::PerlSAX->new();
my $h=XML::Handler::EasyTree->new( Searchable=>1 );
my $easytree=$p->parse( Handler => $h, Source => { SystemId => 'systemB.xml' } );
my $vme = $easytree->child( "vmesystem" );
print "\n";
print "vmesystem config: ", $vme->attribute( "configuration_name" ), "\n";
print "\n";
print "vmesystem children: ", join( ', ', $vme->children_names() ), "\n";
print "\n";
print "gps model is ", $vme->child( "gps" )->child( "model" )->value(), "\n";
my $gps = $vme->child( "gps" );
print "gps slot is ", $gps->child( "slot" )->value(), "\n";
print "\n";
print "reconstructed XML: \n";
print $easytree->dump_tree(), "\n";
# print "\n";
# print "reconstructed XML (pretty): \n";
# print $easytree->pretty_dump_tree(), "\n";
print "\n";
exit;