"new({options})"
Create a new XML::Descent. Options are supplied as a hash reference. The only option recognised
directly by XML::Descent is "Input", which should be a reference to the object that provides the XML source.
Any value that can be passed as the first argument to "XML::TokeParser->new" is allowed.
The remaining options are passed directly to "XML::TokeParser". Consult that module's documentation for
more details.
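A minimal sketch of constructing a parser from an in-memory string. The XML content and variable names are illustrative assumptions; the only option used is "Input", given a scalar reference as in the examples elsewhere in this document.

```perl
use strict;
use warnings;
use XML::Descent;

# Illustrative XML held in a scalar (an assumption for this sketch).
my $xml = '<config><name>demo</name></config>';

# Input receives a reference to the scalar that holds the XML source.
my $p = XML::Descent->new( { Input => \$xml } );

# Collect the text of every <name> element.
my @names;
$p->on( name => sub { push @names, $p->text; } );
$p->walk;

print "Names: @names\n";
```

Any further keys placed in the hash would be handed on to "XML::TokeParser", so they are best checked against that module's documentation before use.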
"walk"
Parse part of the XML document tree, triggering any handlers that correspond to the elements it contains.
When called recursively within a handler, "walk" visits all the elements below the element that triggered
the handler and then returns.
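The difference between the top-level call and a recursive call can be sketched by logging the order in which handlers run. The document structure and the @log array are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::Descent;

my $xml = '<doc><section><item/><item/></section><item/></doc>';
my $p   = XML::Descent->new( { Input => \$xml } );

my @log;
$p->on(
    section => sub {
        push @log, 'enter section';
        # Recursive call: visits only the elements inside this
        # <section> and then returns to the handler.
        $p->walk;
        push @log, 'leave section';
    },
    item => sub { push @log, 'item' },
);

# Top-level call: walks the whole document.
$p->walk;

print "$_\n" for @log;
```

The third <item> sits outside <section>, so it should only be logged after the recursive "walk" has returned.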
"on([elementnames],handler)"
Register a handler to be called when the named element is encountered. Multiple element names may be
supplied as an array reference. Multiple handlers may be registered with one call to "on" by supplying a
number of element, handler pairs.
Calling "on" within a handler defines a nested local handler whose scope is limited to the containing
element. Handlers are called with three arguments: the name of the element that triggered the handler, a
reference to a hash of the element's attributes and a user-defined context value - see "context" for more about that.
For example:
    my $p = XML::Descent->new( { Input => \$some_xml } );
    # Global handler - trigger anywhere an <options> tag is found
    $p->on(
        options => sub {
            my ( $elem, $attr, $ctx ) = @_;
            # Define a nested handler for <name> elements that only
            # applies within the <options> handler.
            $p->on(
                name => sub {
                    my ( $elem, $attr, $ctx ) = @_;
                    # Get the inner text of the name element
                    my $name = $p->text;
                    print "Name: $name\n";
                }
            );
            # Recursively walk elements inside <options> triggering
            # any handlers
            $p->walk;
        }
    );
    # Start parsing
    $p->walk;
A handler may call one of the parsing methods ("walk", "text", "xml" or "get_token") to consume any
nested XML before returning. If none of the parsing methods is called, any nested XML is automatically
discarded so that the parser can properly move past the current element.
Nested handlers temporarily override another handler with the same name. A handler named '*' will
trigger for all elements for which there is no explicit handler. A nested '*' handler hides all handlers
defined in containing scopes.
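The masking rules above can be sketched as follows. The element names and the @log array are hypothetical; the behaviour shown is what the preceding paragraph describes rather than something verified against a particular release.

```perl
use strict;
use warnings;
use XML::Descent;

my $xml = '<root><b/><special><b/><c/></special></root>';
my $p   = XML::Descent->new( { Input => \$xml } );

my @log;

# Outer handler for <b>.
$p->on( b => sub { push @log, 'outer b' } );

$p->on(
    special => sub {
        # Local catch-all: inside <special> it should hide the
        # outer <b> handler as well as catching <c>.
        $p->on(
            '*' => sub {
                my ($elem) = @_;
                push @log, "wildcard $elem";
            }
        );
        $p->walk;
    }
);

$p->walk;

print "$_\n" for @log;
```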
As a shorthand you may specify a path to a nested element:
    $p->on( 'a/b/c' => sub {
        print "Woo!\n";
    })->walk;
That's equivalent to:
    $p->on( a => sub {
        $p->on( b => sub {
            $p->on( c => sub {
                print "Woo!\n";
            })->walk;
        })->walk;
    })->walk;
Note that this shorthand only applies to "on" - not to other methods that accept element names.
"inherit([elementnames])"
Inherit handlers from the containing scope. Typically used to import handlers that would otherwise be
masked by a catch-all '*' handler.
    $p->on(
        'a' => sub {
            my ( $elem, $attr, $ctx ) = @_;
            my $link = $attr->{href} || '';
            my $text = $p->text;
            print "Link: $text ($link)\n";
        }
    );
    $p->on(
        'special' => sub {
            my ( $elem, $attr, $ctx ) = @_;
            # Within <special> we want to handle all
            # tags apart from <a> by printing them out
            $p->on(
                '*' => sub {
                    my ( $elem, $attr, $ctx ) = @_;
                    print "Found: $elem\n";
                }
            );
            # Get the handler for <a> from our containing
            # scope.
            $p->inherit( 'a' );
            $p->walk;
        }
    );
The inherited handler is the handler that would have applied in the containing scope for an element with
the given name. For example:
    $p->on( '*' => sub { print "Whatever\n"; $p->walk; } );
    $p->on(
        'interesting' => sub {
            # Inherits the default 'Whatever' handler because that's the
            # handler that would have been called for <frob> in the
            # containing scope
            $p->inherit( 'frob' );
            # Handle everything else ourselves
            $p->on( '*' => sub { $p->walk; } );
            # Walk the children so the handlers above can trigger
            $p->walk;
        }
    );
"before"
Register a handler to be called before the existing handler for an element. As with "on" multiple
elements may be targeted by providing an array ref.
"after"
Register a handler to be called after the existing handler for an element. As with "on" multiple elements
may be targeted by providing an array ref.
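A hedged sketch of wrapping an existing handler with "before" and "after". It assumes both methods accept an element name followed by a handler, in the same way as "on"; that calling convention is not spelled out above, so treat it as an assumption. The @log array and the XML are illustrative.

```perl
use strict;
use warnings;
use XML::Descent;

my $xml = '<list><item>one</item></list>';
my $p   = XML::Descent->new( { Input => \$xml } );

my @log;

# The existing handler that the other two are attached to.
$p->on( item => sub { push @log, 'on: ' . $p->text } );

# Assumption: before() and after() take "name => handler" like on().
$p->before( item => sub { push @log, 'before' } );
$p->after( item => sub { push @log, 'after' } );

$p->walk;

print "$_\n" for @log;
```

Neither wrapper reads from the parser here, which leaves the element's content for the main handler to consume.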
"context"
Every time a handler is called a new scope is created for it. This allows nested handlers to be defined.
The current scope contains a user context variable which can be used, for example, to keep track of an
object that is being filled with values parsed from the XML. The context value is inherited from the
parent scope but may be overridden locally.
For example:
    my $root = {};
    # Set the outermost context
    $p->context( $root );
    # Handle HTML <a href...> links /anywhere/
    $p->on(
        'a' => sub {
            my ( $elem, $attr, $ctx ) = @_;
            my $link = {
                href => $attr->{href},
                text => $p->text
            };
            push @{ $ctx->{links} }, $link;
        }
    );
    # Links in the body are stored in a nested
    # object.
    $p->on(
        'body' => sub {
            my ( $elem, $attr, $ctx ) = @_;
            my $body = {};
            # Set the context
            $p->context( $body );
            $p->walk;
            $ctx->{body} = $body;
        }
    );
    $p->walk;
Note that the handler for <a href...> tags stores its results in the current context object - whatever
that happens to be. That means that outside of any <body> tag links will be stored in $root but within a
<body> they will be stored in a nested object ("$root->{body}"). The <a> handler itself need know nothing
of this.
With no parameter "context" returns the current context. The current context is also passed as the third
argument to handlers.
"text"
Return any text contained within the current element. XML markup is discarded.
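A small sketch of "text" applied to an element with nested markup. The XML and the @paras array are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::Descent;

my $xml = '<doc><p>Hello <b>world</b>!</p></doc>';
my $p   = XML::Descent->new( { Input => \$xml } );

my @paras;
$p->on(
    p => sub {
        # Only the character data is returned; the <b> tags
        # inside the paragraph are dropped.
        push @paras, $p->text;
    }
);
$p->walk;

print "Text: $_\n" for @paras;
```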
"xml"
Return the unparsed inner XML of the current element. For example:
    $p->on(
        'item' => sub {
            my ( $elem, $attr, $ctx ) = @_;
            my $item_source = $p->xml;
            print "Item: $item_source\n";
        }
    );
If <item> contains XHTML (for example) the above handler would correctly capture it without recursively
parsing any elements it contains. Parsing
    <feed>
        <item>This is the <i>first story</i>.</item>
        <item>This is <b>another story</b>.</item>
    </feed>
would print
    Item: This is the <i>first story</i>.
    Item: This is <b>another story</b>.
"get_path"
When called within a handler, returns the path that leads to the current element. For example:
    $p->on(
        'here' => sub {
            my ( $elem, $attr, $ctx ) = @_;
            print "I am here: ", $p->get_path, "\n";
            $p->walk;
        }
    );
would, if applied to this XML
    <outer>
        <inner>
            <here />
        </inner>
        <here />
    </outer>
print
    I am here: /outer/inner/here
    I am here: /outer/here
"get_token"
XML::Descent is built on "XML::TokeParser" which splits an XML document into a stream of tokens
representing start tags, end tags, literal text, comments and processing instructions. Within an element
"get_token" returns the same stream of tokens that "XML::TokeParser" would produce. Returns "undef" once
all the tokens contained within the current element have been read (i.e. it's impossible to read past the
end of the enclosed XML).
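A sketch of draining the tokens of a single element with "get_token". It assumes each token is an array reference whose first element is a type code such as 'S' (start tag), 'E' (end tag) or 'T' (text), which is how "XML::TokeParser" describes its tokens; check that module's documentation before relying on the layout. The XML and the %count hash are illustrative.

```perl
use strict;
use warnings;
use XML::Descent;

my $xml = '<root><data>one<x/>two</data><other/></root>';
my $p   = XML::Descent->new( { Input => \$xml } );

my %count;
$p->on(
    data => sub {
        # Read tokens until the end of <data>, where get_token
        # returns undef instead of reading into <other/>.
        while ( defined( my $tok = $p->get_token ) ) {
            # Assumption: the type code is the first array element.
            $count{ $tok->[0] }++;
        }
    }
);
$p->walk;

print "$_: $count{$_}\n" for sort keys %count;
```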
"scope_handlers"
Get a list of all handlers that are registered locally to the current scope. The returned list won't
include '*' if a wildcard handler has been registered.
"all_handlers"
Get a list of all registered handlers in all scopes. The returned list won't include the '*' wildcard
handler.
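A hedged sketch of inspecting the registered handlers from inside a handler. It assumes that "scope_handlers" and "all_handlers" both return plain lists of element names; the descriptions above do not state the exact return format, so treat that as an assumption. The element names are hypothetical.

```perl
use strict;
use warnings;
use XML::Descent;

my $xml = '<root><outer><inner/></outer></root>';
my $p   = XML::Descent->new( { Input => \$xml } );

my ( @local, @all );
$p->on(
    outer => sub {
        # Registered inside the handler, so local to this scope.
        $p->on( inner => sub { } );

        # Assumption: both methods return lists of element names.
        @local = $p->scope_handlers;
        @all   = $p->all_handlers;

        $p->walk;
    }
);
$p->walk;

print "Local: @{[ sort @local ]}\n";
print "All:   @{[ sort @all ]}\n";
```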