lib::HTML::Element(3)  User Contributed Perl Documentation  lib::HTML::Element(3)
NAME
HTML::Element - Class for objects that represent HTML elements
SYNOPSIS
require HTML::Element;
$a = new HTML::Element 'a', href => 'http://www.oslonett.no/';
$a->push_content("Oslonett AS");
$tag = $a->tag;
$tag = $a->starttag;
$tag = $a->endtag;
$ref = $a->attr('href');
$links = $a->extract_links();
print $a->as_HTML;
DESCRIPTION
Objects of the HTML::Element class can be used to
represent elements of HTML. These objects have attributes
and content. The content is an array of text segments and
other HTML::Element objects. Thus a tree of HTML::Element
objects as nodes can represent the syntax tree for an HTML
document.
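As a minimal sketch of the structure described above, the following
builds a small tree by hand and walks the content array of its root.
The helper name describe_children is made up for this illustration and
is not part of the module.

```perl
use strict;
use warnings;
use HTML::Element;

# Hypothetical helper: describe each direct child of an element.
# Text segments are plain scalars; child elements are object references.
sub describe_children {
    my ($elem) = @_;
    my @lines;
    for my $child (@{ $elem->content }) {
        if (ref $child) {
            push @lines, "element: " . $child->tag;
        } else {
            push @lines, "text: $child";
        }
    }
    return @lines;
}

# A paragraph holding text, a link element, and more text.
my $link = HTML::Element->new('a', href => 'http://www.oslonett.no/');
$link->push_content('Oslonett AS');

my $para = HTML::Element->new('p');
$para->push_content('Visit ', $link, ' today.');

print "$_\n" for describe_children($para);

$para->delete;    # break the parent/child cycles before dropping the tree
```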
The following methods are available:
$h = HTML::Element->new('tag', 'attrname' => 'value',...)
The object constructor. Takes a tag name as argument.
Optionally, allows you to specify initial attributes at
object creation time.
$h->tag()
Returns (optionally sets) the tag name for the element.
The tag is always converted to lower case.
$h->starttag()
Returns the complete start tag for the element, including
the leading "<", the trailing ">" and the attributes.
$h->endtag()
Returns the complete end tag, including the leading "</"
and the trailing ">".
$h->parent([$newparent])
Returns (optionally sets) the parent for this element.
24/Aug/1997 perl 5.005, patch 03 1
$h->implicit([$bool])
Returns (optionally sets) the implicit attribute. This
attribute is used to indicate that the element was not
originally present in the source, but was inserted in
order to conform to HTML structure.
$h->is_inside('tag',...)
Returns true if this element is contained inside an
element with one of the specified tag names.
$h->pos()
Returns (and optionally sets) the current position. The
position is a reference to an HTML::Element object that is
part of the tree that has the current object as root.
This restriction is not enforced when setting pos(), but
the behavior is unpredictable if it is violated.
$h->attr('attr', [$value])
Returns (and optionally sets) the value of some attribute.
$h->content()
Returns the content of this element. The content is
represented as a reference to an array of text segments
and references to other HTML::Element objects.
$h->is_empty()
Returns true if there is no content.
$h->insert_element($element, $implicit)
Inserts a new element at current position and updates
pos() to point to the inserted element. Returns $element.
$h->push_content($element_or_text,...)
Adds to the content of the element. Each argument should
be a text segment (scalar) or a reference to an
HTML::Element object.
$h->delete_content()
Clears the content.
$h->delete()
Frees memory associated with the element and all children.
This is needed because perl's reference counting cannot
reclaim the tree by itself: parents and children hold
circular references to each other.
$h->traverse(\&callback, [$ignoretext])
Traverse the element and all of its children. For each
node visited, the callback routine is called with the
node, a startflag and the depth as arguments. If the
$ignoretext parameter is true, then the callback will not
be called for text content. The flag is 1 when we enter a
node and 0 when we leave the node.
If the callback returns a false value, the children of
that node are not traversed.
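A callback for traverse() might look like the sketch below, which
collects one indented line per element when the node is entered. The
helper name outline is an assumption made for this example; the
callback always returns a true value so that traversal continues into
the children.

```perl
use strict;
use warnings;
use HTML::Element;

# Hypothetical helper: return one indented line per element in the tree.
sub outline {
    my ($root) = @_;
    my @lines;
    $root->traverse(
        sub {
            my ($node, $startflag, $depth) = @_;
            # Record the tag only on entry (flag is 1), not on exit.
            push @lines, ('  ' x $depth) . $node->tag if $startflag;
            return 1;    # true: keep descending into the children
        },
        1,               # $ignoretext: do not call back for text segments
    );
    return @lines;
}

my $list = HTML::Element->new('ul');
for my $label ('one', 'two') {
    my $item = HTML::Element->new('li');
    $item->push_content($label);
    $list->push_content($item);
}

print "$_\n" for outline($list);
$list->delete;
```

Returning 0 from the callback for a given node would instead skip that
node's children.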
$h->extract_links([@wantedTypes])
Returns links found by traversing the element and all of
its children. The return value is a reference to an
array. Each element of that array is a reference to an
array with two values: the link value and a reference to
the corresponding element.
You may restrict the extraction to certain types of links.
For instance, if you only want to extract <a href="...">
and <img src="..."> links you might code it like this:
    for (@{ $e->extract_links(qw(a img)) }) {
        my($link, $linkelem) = @$_;
        ...
    }
$h->dump()
Prints the element and all its children to STDOUT. Mainly
useful for debugging. The structure of the document is
shown by indentation (no end tags).
$h->as_HTML()
Returns a string (the HTML document) that represents the
element and its children.
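As a rough sketch of how the accessor and serialization methods fit
together, the following builds one link element and prints its pieces.
The helper name make_link is made up for this illustration. The exact
letter case and whitespace of the generated markup may differ between
releases of the module, so no output is shown.

```perl
use strict;
use warnings;
use HTML::Element;

# Hypothetical helper: build <a href="$url">$text</a> as an element.
sub make_link {
    my ($url, $text) = @_;
    my $elem = HTML::Element->new('a', href => $url);
    $elem->push_content($text);
    return $elem;
}

my $link = make_link('http://www.oslonett.no/', 'Oslonett AS');

print "tag:      ", $link->tag,          "\n";   # tag name, lower case
print "href:     ", $link->attr('href'), "\n";   # attribute value
print "starttag: ", $link->starttag,     "\n";
print "endtag:   ", $link->endtag,       "\n";
print "as_HTML:  ", $link->as_HTML,      "\n";   # element plus content

$link->delete;
```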
BUGS
If you want to free the memory associated with a tree
built of HTML::Element nodes then you will have to delete
it explicitly. The reason for this is that perl currently
has no proper garbage collector, but depends on reference
counts in the objects. This scheme fails because the
parse tree contains circular references (parents have
references to their children and children have a reference
to their parent).
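The effect can be reproduced without the module. The sketch below uses
a made-up Node class (not HTML::Element itself) with the same
parent/child layout and counts live objects. It only illustrates the
reference-counting behaviour, under the assumption that an explicit
delete clears the references in each node.

```perl
use strict;
use warnings;

package Node;    # hypothetical stand-in for a tree node

my $live = 0;    # number of Node objects not yet destroyed

sub new {
    my ($class) = @_;
    $live++;
    return bless { parent => undef, children => [] }, $class;
}

sub add {
    my ($self, $child) = @_;
    $child->{parent} = $self;              # child points at parent ...
    push @{ $self->{children} }, $child;   # ... and parent at child: a cycle
    return $self;
}

sub delete {
    my ($self) = @_;
    $_->delete for @{ $self->{children} };
    %$self = ();                           # drop every reference we hold
    return;
}

sub DESTROY { $live-- }
sub live    { return $live }

package main;

{
    my $root = Node->new;
    $root->add(Node->new);
}                              # scope ends, but the cycle keeps both alive
print Node::live(), "\n";      # prints 2

{
    my $root = Node->new;
    $root->add(Node->new);
    $root->delete;             # cycle broken explicitly
}                              # both new nodes are reclaimed here
print Node::live(), "\n";      # still 2: only the first tree leaked
```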
SEE ALSO
the HTML::AsSubs manpage
COPYRIGHT
Copyright 1995,1996 Gisle Aas. All rights reserved.
This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.
AUTHOR
Gisle Aas <aas@sn.no>