HTML::Element::traverse man page on Debian

Man page or keyword search:  
man Server   8174 pages
apropos Keyword Search (all sections)
Output format
Debian logo
[printable version]

HTML::Element::traversUsermContributed Perl DocumeHTML::Element::traverse(3pm)

NAME
       HTML::Element::traverse - discussion of HTML::Element's traverse method

SYNOPSIS
	 # $element->traverse is unnecessary and obscure.
	 #   Don't use it in new code.

DESCRIPTION
       "HTML::Element" provides a method "traverse" that traverses the tree
       and calls user-specified callbacks for each node, in pre- or post-
       order.  However, use of the method is quite superfluous: if you want to
       recursively visit every node in the tree, it's almost always simpler to
       write a subroutine does just that, than it is to bundle up the pre-
       and/or post-order code in callbacks for the "traverse" method.

EXAMPLES
       Suppose you want to traverse at/under a node $tree and give elements an
       'id' attribute unless they already have one.

       You can use the "traverse" method:

	 {
	   my $counter = 'x0000';
	   $start_node->traverse(
	     [ # Callbacks;
	       # pre-order callback:
	       sub {
		 my $x = $_[0];
		 $x->attr('id', $counter++) unless defined $x->attr('id');
		 return HTML::Element::OK; # keep traversing
	       },
	       # post-order callback:
	       undef
	     ],
	     1, # don't call the callbacks for text nodes
	   );
	 }

       or you can just be simple and clear (and not have to understand the
       calling format for "traverse") by writing a sub that traverses the tree
       by just calling itself:

	 {
	   my $counter = 'x0000';
	   sub give_id {
	     my $x = $_[0];
	     $x->attr('id', $counter++) unless defined $x->attr('id');
	     foreach my $c ($x->content_list) {
	       give_id($c) if ref $c; # ignore text nodes
	     }
	   };
	   give_id($start_node);
	 }

       See, isn't that nice and clear?

       But, if you really need to know:

THE TRAVERSE METHOD
       The "traverse()" method is a general object-method for traversing a
       tree or subtree and calling user-specified callbacks.  It accepts the
       following syntaxes:

       $h->traverse(\&callback)
       or $h->traverse(\&callback, $ignore_text)
       or $h->traverse( [\&pre_callback,\&post_callback] , $ignore_text)

       These all mean to traverse the element and all of its children.	That
       is, this method starts at node $h, "pre-order visits" $h, traverses its
       children, and then will "post-order visit" $h.  "Visiting" means that
       the callback routine is called, with these arguments:

	   $_[0] : the node (element or text segment),
	   $_[1] : a startflag, and
	   $_[2] : the depth

       If the $ignore_text parameter is given and true, then the pre-order
       call will not be happen for text content.

       The startflag is 1 when we enter a node (i.e., in pre-order calls) and
       0 when we leave the node (in post-order calls).

       Note, however, that post-order calls don't happen for nodes that are
       text segments or are elements that are prototypically empty (like "br",
       "hr", etc.).

       If we visit text nodes (i.e., unless $ignore_text is given and true),
       then when text nodes are visited, we will also pass two extra arguments
       to the callback:

	   $_[3] : the element that's the parent
		    of this text node
	   $_[4] : the index of this text node
		    in its parent's content list

       Note that you can specify that the pre-order routine can be a different
       routine from the post-order one:

	   $h->traverse( [\&pre_callback,\&post_callback], ...);

       You can also specify that no post-order calls are to be made, by
       providing a false value as the post-order routine:

	   $h->traverse([ \&pre_callback,0 ], ...);

       And similarly for suppressing pre-order callbacks:

	   $h->traverse([ 0,\&post_callback ], ...);

       Note that these two syntaxes specify the same operation:

	   $h->traverse([\&foo,\&foo], ...);
	   $h->traverse( \&foo	     , ...);

       The return values from calls to your pre- or post-order routines are
       significant, and are used to control recursion into the tree.

       These are the values you can return, listed in descending order of my
       estimation of their usefulness:

       HTML::Element::OK, 1, or any other true value
	   ...to keep on traversing.

	   Note that "HTML::Element::OK" et al are constants.  So if you're
	   running under "use strict" (as I hope you are), and you say:
	   "return HTML::Element::PRUEN" the compiler will flag this as an
	   error (an unallowable bareword, specifically), whereas if you spell
	   PRUNE correctly, the compiler will not complain.

       undef, 0, '0', '', or HTML::Element::PRUNE
	   ...to block traversing under the current element's content.	(This
	   is ignored if received from a post-order callback, since by then
	   the recursion has already happened.)	 If this is returned by a pre-
	   order callback, no post-order callback for the current node will
	   happen.  (Recall that if your callback exits with just "return;",
	   it is returning undef -- at least in scalar context, and "traverse"
	   always calls your callbacks in scalar context.)

       HTML::Element::ABORT
	   ...to abort the whole traversal immediately.	 This is often useful
	   when you're looking for just the first node in the tree that meets
	   some criterion of yours.

       HTML::Element::PRUNE_UP
	   ...to abort continued traversal into this node and its parent node.
	   No post-order callback for the current or parent node will happen.

       HTML::Element::PRUNE_SOFTLY
	   Like PRUNE, except that the post-order call for the current node is
	   not blocked.

       Almost every task to do with extracting information from a tree can be
       expressed in terms of traverse operations (usually in only one pass,
       and usually paying attention to only pre-order, or to only post-order),
       or operations based on traversing. (In fact, many of the other methods
       in this class are basically calls to traverse() with particular
       arguments.)

       The source code for HTML::Element and HTML::TreeBuilder contain several
       examples of the use of the "traverse" method to gather information
       about the content of trees and subtrees.

       (Note: you should not change the structure of a tree while you are
       traversing it.)

       [End of documentation for the "traverse()" method]

   Traversing with Recursive Anonymous Routines
       Now, if you've been reading Structure and Interpretation of Computer
       Programs too much, maybe you even want a recursive lambda.  Go ahead:

	 {
	   my $counter = 'x0000';
	   my $give_id;
	   $give_id = sub {
	     my $x = $_[0];
	     $x->attr('id', $counter++) unless defined $x->attr('id');
	     foreach my $c ($x->content_list) {
	       $give_id->($c) if ref $c; # ignore text nodes
	     }
	   };
	   $give_id->($start_node);
	   undef $give_id;
	 }

       It's a bit nutty, and it's still more concise than a call to the
       "traverse" method!

       It is left as an exercise to the reader to figure out how to do the
       same thing without using a $give_id symbol at all.

       It is also left as an exercise to the reader to figure out why I
       undefine $give_id, above; and why I could achieved the same effect with
       any of:

	   $give_id = 'I like pie!';
	  # or...
	   $give_id = [];
	  # or even;
	   $give_id = sub { print "Mmmm pie!\n" };

       But not:

	   $give_id = sub { print "I'm $give_id and I like pie!\n" };
	  # nor...
	   $give_id = \$give_id;
	  # nor...
	   $give_id = { 'pie' => \$give_id, 'mode' => 'a la' };

   Doing Recursive Things Iteratively
       Note that you may at times see an iterative implementation of pre-order
       traversal, like so:

	  {
	    my @to_do = ($tree); # start-node
	    while(@to_do) {
	      my $this = shift @to_do;

	      # "Visit" the node:
	      $this->attr('id', $counter++)
	       unless defined $this->attr('id');

	      unshift @to_do, grep ref $_, $this->content_list;
	       # Put children on the stack -- they'll be visited next
	    }
	  }

       This can under certain circumstances be more efficient than just a
       normal recursive routine, but at the cost of being rather obscure.  It
       gains efficiency by avoiding the overhead of function-calling, but
       since there are several method dispatches however you do it (to "attr"
       and "content_list"), the overhead for a simple function call is
       insignificant.

   Pruning and Whatnot
       The "traverse" method does have the fairly neat features of the
       "ABORT", "PRUNE_UP" and "PRUNE_SOFTLY" signals.	None of these can be
       implemented totally straightforwardly with recursive routines, but it
       is quite possible.  "ABORT"-like behavior can be implemented either
       with using non-local returning with "eval"/"die":

	 my $died_on; # if you need to know where...
	 sub thing {
	   ... visits $_[0]...
	   ... maybe set $died_on to $_[0] and die "ABORT_TRAV" ...
	   ... else call thing($child) for each child...
	   ...any post-order visiting $_[0]...
	 }
	 eval { thing($node) };
	 if($@) {
	   if($@ =~ m<^ABORT_TRAV>) {
	     ...it died (aborted) on $died_on...
	   } else {
	     die $@; # some REAL error happened
	   }
	 }

       or you can just do it with flags:

	 my($abort_flag, $died_on);
	 sub thing {
	   ... visits $_[0]...
	   ... maybe set $abort_flag = 1; $died_on = $_[0]; return;
	   foreach my $c ($_[0]->content_list) {
	     thing($c);
	     return if $abort_flag;
	   }
	   ...any post-order visiting $_[0]...
	   return;
	 }

	 $abort_flag = $died_on = undef;
	 thing($node);
	 ...if defined $abort_flag, it died on $died_on

SEE ALSO
       HTML::Element

COPYRIGHT
       Copyright 2000,2001 Sean M. Burke

AUTHOR
       Sean M. Burke, <sburke@cpan.org>

perl v5.10.1			  2006-08-04	  HTML::Element::traverse(3pm)
[top]

List of man pages available for Debian

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net