XML::NodeFilter - Generic XML::NodeFilter Class
use XML::NodeFilter;
my $filter = XML::NodeFilter->new();
$your_iterator->set_filter( $filter );
"Filters are objects that know how to 'filter out' nodes. If a NodeIterator or a TreeWalker is given a NodeFilter, it applies the filter before it returns the next node. If the filter says to accept the node, the traversal logic returns it; otherwise, traversal looks for the next node and pretends that the node that was rejected was not there."
This definition is given by the DOM Traversal and Range Specification. It explains pretty well what this class is for: an XML::NodeFilter receives a node from a traversal object, such as an XML::LibXML::Iterator, and tells whether the given node should be returned to the caller or not.
Although I refer only to XML::LibXML here, XML::NodeFilter is implemented in a more open way, so it can be used with other DOM implementations as well.
The DOM Traversal and Range Specification defines only the constants and accept_node() for a node filter. The XML::NodeFilter implementation also adds the what_to_show() function to the class definition, since I think that it is a filter's job to decide which node types should be shown and which not.
Also, XML::NodeFilter adds two constants which are not part of the specification. The first one is FILTER_DECLINED. It tells the traversal logic that it should apply another filter in order to decide whether the node should be visible or not. While the spec only defines the traversal logic to have either one or no filter applied, it turned out that allowing several filters to be used in conjunction leads to cleaner code. If the traversal logic has only a single filter and that filter returns FILTER_DECLINED, the result should be handled as a synonym of FILTER_ACCEPT. While FILTER_ACCEPT is final and causes all remaining filters not to be executed, FILTER_DECLINED gives more flexibility.
The second extension of the specification is the SHOW_NONE symbol. It was added for operational completeness, so one can explicitly switch the node type filter off (meaning all node types are rejected). As a result, the following two calls of what_to_show() have different results:
$filter->what_to_show( undef );     # will set SHOW_ALL
$filter->what_to_show( SHOW_NONE ); # will not set SHOW_ALL
In fact, SHOW_NONE is a NULL flag, which means it can be added to any list of flags without altering it.
$filter->what_to_show( SHOW_ELEMENT | SHOW_TEXT | SHOW_NONE );
is therefore identical to
$filter->what_to_show( SHOW_ELEMENT | SHOW_TEXT );
SHOW_NONE is especially useful to avoid numeric literals or, even uglier, uninitialized values while building such flag lists dynamically.
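As a rough illustration of building a flag list dynamically, the following sketch starts from SHOW_NONE and adds flags one by one. It does not load XML::NodeFilter; the constants are local stand-ins (SHOW_ELEMENT, SHOW_TEXT and SHOW_COMMENT use the values from the DOM specification, and SHOW_NONE is assumed to be 0, which is what a NULL flag implies). The helper build_mask() and its option names are hypothetical.

```perl
use strict;
use warnings;

# Stand-in constants so the sketch runs without XML::NodeFilter installed.
# SHOW_ELEMENT, SHOW_TEXT and SHOW_COMMENT follow the DOM Traversal
# specification; SHOW_NONE as 0 is an assumption.
use constant {
    SHOW_NONE    => 0x00,
    SHOW_ELEMENT => 0x01,
    SHOW_TEXT    => 0x04,
    SHOW_COMMENT => 0x80,
};

# Hypothetical helper: turn a list of option names into a flag mask.
sub build_mask {
    my %want = map { ( $_ => 1 ) } @_;
    my $mask = SHOW_NONE;    # defined, neutral start value (never undef)
    $mask |= SHOW_ELEMENT if $want{element};
    $mask |= SHOW_TEXT    if $want{text};
    $mask |= SHOW_COMMENT if $want{comment};
    return $mask;
}

printf "0x%02X\n", build_mask(qw(element text));
printf "0x%02X\n", build_mask();
```

With the real module, the resulting mask would be passed to what_to_show(); an empty option list then yields SHOW_NONE instead of an undefined value.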
Actually writing a node filter becomes very simple with XML::NodeFilter: simply inherit your specialized node filter from XML::NodeFilter and implement the function accept_node(). This name is more perlish than the name given by the specification. If your implementation needs to stay very close to the specification, you can alternatively implement acceptNode(). Implementing both functions makes no sense, since accept_node() should be preferred by the traversal logic. Because of this, acceptNode() will only be called if no accept_node() implementation was given.
Example:
package My::NodeFilter;

use XML::NodeFilter qw(:results);
use vars qw(@ISA);
@ISA = qw(XML::NodeFilter);

use XML::LibXML::Common;

sub accept_node {
    my $filter = shift;
    my $node   = shift;

    unless ( $node->getNodeType == ELEMENT_NODE
             and defined $node->getNamespaceURI ) {
        # ignore node without a defined namespace
        return FILTER_REJECT;
    }
    return FILTER_DECLINED;
}

1;
This example shows a simple node filter that rejects any element without a namespace defined. Note that FILTER_DECLINED is returned if the node was not rejected. This allows the traversal logic to apply another filter to the nodes with a namespace defined. If your application needs to use different filters on the namespaced elements depending on the state in which you traverse your DOM, but always needs to ignore elements without a namespace, such a filter ensures that you neither have to add redundant code to your filters nor have to choose a base class.
If you need to write some traversal code yourself, you should call the node filter's accept_node() function to test whether the logic should return the current node. A node is not returned if any filter returns FILTER_SKIP or FILTER_REJECT. In this case you need to reinvoke your traversal code.
The following code snippet shows how you can make use of XML::NodeFilter in your traversal logic:
use XML::NodeFilter qw(:results);

#...
sub traversal_logic {
    my $refnode = shift;
    my @filters = @_;
    my $node    = undef;

    TRAVERSE: while (1) {
        my $state = FILTER_DECLINED;

        # your traversal logic
        # ...
        last TRAVERSE unless defined $node;

        FILTER: foreach my $filter ( @filters ) {
            $state = $filter->accept_node($node);

            # FILTER_ACCEPT is final: return the node right away
            last TRAVERSE if $state == FILTER_ACCEPT;

            # FILTER_SKIP or FILTER_REJECT: stop asking, try the next node
            last FILTER unless $state == FILTER_DECLINED;
        }

        # all filters declined (or none were given): treat as accepted
        last TRAVERSE if $state == FILTER_DECLINED;
    }

    return $node;
}
As you see, the traversal code calls only accept_node() on each filter. This still works fine with filters that implement acceptNode(): XML::NodeFilter calls acceptNode() when the original accept_node() function is called. This ensures that filters using function names that conform to the specification work as well.
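The delegation from accept_node() to acceptNode() could look roughly like the following sketch. It is not the source of XML::NodeFilter: the packages Sketch::NodeFilter and Sketch::SpecStyleFilter are hypothetical stand-ins, the use of can() is an assumption about how such a fallback might be written, and plain strings stand in for DOM nodes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the base class; NOT the real XML::NodeFilter.
package Sketch::NodeFilter;

use constant FILTER_ACCEPT => 1;    # value from the DOM specification
use constant FILTER_REJECT => 2;    # value from the DOM specification

sub new { my $class = shift; return bless {}, $class }

# Base implementation: delegate to a spec-style acceptNode() if the
# subclass provides one, otherwise fall back to the default result.
sub accept_node {
    my ( $self, $node ) = @_;
    return $self->acceptNode($node) if $self->can('acceptNode');
    return FILTER_ACCEPT;
}

# A filter written with the specification's method name only.
package Sketch::SpecStyleFilter;
our @ISA = ('Sketch::NodeFilter');

sub acceptNode {
    my ( $self, $node ) = @_;

    # Strings stand in for nodes here: reject names starting with '#'.
    return $node =~ /^#/
        ? Sketch::NodeFilter::FILTER_REJECT()
        : Sketch::NodeFilter::FILTER_ACCEPT();
}

package main;

my $filter = Sketch::SpecStyleFilter->new;
print $filter->accept_node('#comment'), "\n";
print $filter->accept_node('para'),     "\n";
```

Traversal code written this way only ever needs to call accept_node(), whichever of the two names a filter author chose.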
Note that XML::NodeFilter uses FILTER_ACCEPT rather than FILTER_DECLINED as the default return value of accept_node(). This is done so you can write traversal and filter logic that conforms 100% to the specification.
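To make the interplay of several filters concrete, here is a small self-contained sketch of the decision for a single node. It runs without XML::NodeFilter: the result codes are local stand-ins (FILTER_ACCEPT, FILTER_REJECT and FILTER_SKIP use the DOM specification values, while the numeric value chosen for FILTER_DECLINED is an assumption), Sketch::CodeFilter and node_is_visible() are hypothetical, and strings stand in for nodes.

```perl
use strict;
use warnings;

# Stand-in result codes so the sketch runs on its own. The value of
# FILTER_DECLINED is an assumption made for this sketch.
use constant {
    FILTER_DECLINED => 0,
    FILTER_ACCEPT   => 1,
    FILTER_REJECT   => 2,
    FILTER_SKIP     => 3,
};

# Hypothetical filter class that wraps a code reference.
package Sketch::CodeFilter;
sub new         { my ( $class, $code ) = @_; return bless { code => $code }, $class }
sub accept_node { my ( $self, $node ) = @_; return $self->{code}->($node) }

package main;

# Ask each filter in turn; the first result other than FILTER_DECLINED wins.
sub node_is_visible {
    my ( $node, @filters ) = @_;
    my $state = FILTER_DECLINED;
    foreach my $filter (@filters) {
        $state = $filter->accept_node($node);
        last unless $state == FILTER_DECLINED;
    }

    # A chain in which every filter declined is treated like FILTER_ACCEPT.
    return ( $state == FILTER_ACCEPT || $state == FILTER_DECLINED ) ? 1 : 0;
}

my $no_hidden = Sketch::CodeFilter->new(
    sub { $_[0] =~ /^_/ ? FILTER_REJECT : FILTER_DECLINED } );
my $no_drafts = Sketch::CodeFilter->new(
    sub { $_[0] =~ /draft/ ? FILTER_SKIP : FILTER_DECLINED } );

my @visible = grep { node_is_visible( $_, $no_hidden, $no_drafts ) }
    qw(title _meta draft-note body);
print "@visible\n";
```

Because both filters return FILTER_DECLINED for nodes they have no opinion about, either one can be reused in a different chain without change.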
new()
If -show is omitted, SHOW_ALL is assumed as the default.
what_to_show()
what_to_show() takes either a bitmask or a hash that holds the information about which nodes should be filtered.
If what_to_show() is called without any parameter, it simply returns the bitmask in scalar context; if called in list context, it returns a hash containing the corresponding information: if a bit is set in the bitmask, the corresponding key has the value 1; otherwise it has the value 0.
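The context-dependent return value can be sketched with Perl's wantarray. The function below is a hypothetical stand-in, not the module's method: it takes the mask as an argument, knows only three flags (with the values from the DOM specification), and the hash key names are an assumption.

```perl
use strict;
use warnings;

# Flag values as in the DOM Traversal specification; only three are listed.
my %FLAG = ( SHOW_ELEMENT => 0x01, SHOW_ATTRIBUTE => 0x02, SHOW_TEXT => 0x04 );

# Hypothetical getter modelled on the behaviour described above.
sub what_to_show_sketch {
    my ($mask) = @_;

    # scalar context: return the bitmask itself
    return $mask unless wantarray;

    # list context: one key per known flag, value 1 if the bit is set, else 0
    return map { ( $_ => ( ( $mask & $FLAG{$_} ) ? 1 : 0 ) ) } sort keys %FLAG;
}

my $mask    = $FLAG{SHOW_ELEMENT} | $FLAG{SHOW_TEXT};
my $as_bits = what_to_show_sketch($mask);    # scalar context
my %as_hash = what_to_show_sketch($mask);    # list context

print "mask: $as_bits\n";
print "$_: $as_hash{$_}\n" for sort keys %as_hash;
```

The hash form is convenient when individual node types are tested by name, while the bitmask form is what specification-style code expects.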
accept_node()
FILTER_DECLINED to indicate that the filter itself would accept the node if no other (less significant) filter rejects or skips it. NOTE: FILTER_DECLINED is not defined by the spec.
FILTER_ACCEPT to indicate that a node is accepted regardless of what other filters may indicate.
FILTER_SKIP to indicate that a node is skipped, but its descendants should still be available.
FILTER_REJECT to indicate that a node and all its descendants should be skipped by the traversal logic.
By default accept_node() returns FILTER_ACCEPT.
acceptNode()
Implement either accept_node() or acceptNode(). Implementing both makes no sense at all!
NOTE: @FLAGNAMES and %FLAGMAPPING are not exported. To make use of them you have to use the fully qualified namespace as follows:
# gives the value of SHOW_ELEMENT
my $flag = $XML::NodeFilter::FLAGMAPPING{SHOW_ELEMENT};
XML::NodeFilter does not export any symbols by default. Instead it provides two tags: ':results' and ':flags'.
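':results' exports the FILTER_* constants. These are the return values of accept_node() and are required to indicate the state.
':flags' exports the SHOW_* constants. These are needed if what_to_show() should be used in conformance with the specification rather than with named parameters.
Alternatively, you might import ':all' to get all symbols exported by both of the tags just described.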
Christian Glahn, <christian.glahn@uibk.ac.at>
the perl manpage, the XML::LibXML::Iterator manpage, the XML::LibXML::NodeFilter manpage
W3C DOM Level 2 Traversal and Range Specification
(c) 2002, Christian Glahn. All rights reserved.
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.