MIME::Head(3) User Contributed Perl Documentation MIME::Head(3)
NAME
MIME::Head - MIME message header
WARNING: This code is in an evaluation phase until 1
August 1996. Depending on any comments/complaints
received before this cutoff date, the interface may change
in a non-backwards-compatible manner.
DESCRIPTION
A class for parsing in and manipulating RFC-822 message
headers, with some methods geared towards standard (and
not so standard) MIME fields as specified in RFC-1521,
Multipurpose Internet Mail Extensions.
SYNOPSIS
Start off by requiring or using this package:
require MIME::Head;
You can create a MIME::Head object in a number of ways:
# Create a new, empty header, and populate it manually:
$head = MIME::Head->new;
$head->set('content-type', 'text/plain; charset=US-ASCII');
$head->set('content-length', $len);
# Create a new header by parsing in the STDIN stream:
$head = MIME::Head->read(\*STDIN);
# Create a new header by parsing in a file:
$head = MIME::Head->from_file("/tmp/test.hdr");
# Create a new header by running a program:
$head = MIME::Head->from_file("cat a.hdr b.hdr |");
To get rid of all internal newlines in all fields:
# Get rid of all internal newlines:
$head->unfold();
To test whether a given field exists:
# Was a "Subject:" given?
if ($head->exists('subject')) {
# yes, it does!
}
To get the contents of that field as a string:
# Is this a reply?
$reply = 1 if ($head->get('Subject') =~ /^Re: /);
To set the contents of a field to a given string:
28/Aug/1996 perl 5.005, patch 03 1
# Set the MIME type:
$head->set('Content-type', 'text/html');
To extract parameters from certain structured fields, as a
hash reference:
# What's the MIME type?
$params = $head->params('content-type');
$mime_type = $$params{_};
$char_set = $$params{'charset'};
$file_name = $$params{'name'};
To get certain commonly-used MIME information:
# The content type (e.g., "text/html"):
$mime_type = $head->mime_type;
# The content transfer encoding (e.g., "quoted-printable"):
$mime_encoding = $head->mime_encoding;
# The recommended filename (e.g., "choosy-moms-choose.gif"):
$file_name = $head->recommended_filename;
# The boundary text, for multipart messages:
$boundary = $head->multipart_boundary;
PUBLIC INTERFACE
Creation, input, and output
new Class method. Creates a new header object, with no
fields.
from_file EXPR
Class or instance method. For convenience, you can
use this to parse a header object in from EXPR, which
may actually be any expression that can be sent to
open() so as to return a readable filehandle. The
"file" will be opened, read, and then closed:
# Create a new header by parsing in a file:
my $head = MIME::Head->from_file("/tmp/test.hdr");
Since this method can function as either a class
constructor or an instance initializer, the above is
exactly equivalent to:
# Create a new header by parsing in a file:
my $head = MIME::Head->new->from_file("/tmp/test.hdr");
On success, the object will be returned; on failure,
the undefined value.
This is really just a convenience front-end onto
read().
print FILEHANDLE
Output to the given FILEHANDLE, or to the currently-
selected filehandle if none was given:
# Output to STDOUT:
$head->print(\*STDOUT);
WARNING: this method does not output the blank line
that terminates the header in a legal message (since
you may not always want it).
read FILEHANDLE
Class or instance method. This constructs a header
object by reading it in from a FILEHANDLE, until
either a blank line or an end-of-stream is
encountered. A syntax error will also halt
processing.
Supply this routine with a reference to a filehandle
glob; e.g., \*STDIN:
# Create a new header by parsing in STDIN:
my $head = MIME::Head->read(\*STDIN);
Since this method can function as either a class
constructor or an instance initializer, the above is
exactly equivalent to:
# Create a new header by parsing in STDIN:
my $head = MIME::Head->new->read(\*STDIN);
That said, you should probably use the first form.
On success, the object will be returned; on failure,
the undefined value.
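The stopping rules described for read() (blank line, end-of-stream, or syntax error) can be illustrated with a small standalone loop. This is only a sketch under assumptions: read_header_sketch is a hypothetical helper, not part of MIME::Head, and it stores each field as a list of occurrences in a plain hash rather than in whatever structure the class uses internally.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME::Head): read header lines from a
# filehandle until a blank line or end-of-stream, attaching folded
# (continuation) lines to the field they continue.
sub read_header_sketch {
    my ($fh) = @_;
    my (%fields, $last);
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^\r?\n?\z/;               # blank line ends the header
        if ($line =~ /^[ \t]/ && defined $last) {   # continuation line
            $fields{$last}[-1] .= $line;
        }
        elsif ($line =~ /^([^\s:]+):[ \t]*(.*)\z/s) {
            $last = lc $1;                          # field names are lowercased
            push @{ $fields{$last} }, $2;
        }
        else {
            return;                                 # syntax error: give up
        }
    }
    return \%fields;
}

# Parse a header held in a string, via an in-memory filehandle:
my $text = "Subject: Hello, nurse!\n"
         . "Content-type: text/html;\n"
         . " name=\"nurse.html\"\n"
         . "\n"
         . "body text\n";
open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";
my $fields = read_header_sketch($fh);
close $fh;
print "subject: $fields->{subject}[0]";
```

The body line after the blank line is never consumed, which mirrors the idea that the header parser stops at the separator and leaves the rest of the stream alone.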
Getting/setting fields
NOTE: this interface is not as extensive as that of
Mail::Internet; however, I have provided a set of methods
that I can guarantee are supportable across any changes to
the internal implementation of this class.
add FIELD,TEXT,[WHERE]
Add a new occurrence of the FIELD, given by TEXT:
# Add the trace information:
$head->add('Received', 'from eryq.pr.mcs.net by gonzo.net with smtp');
The FIELD is automatically coerced to lowercase.
Returns the TEXT.
Normally, the new occurrence will be appended to the
existing occurrences. However, if the optional WHERE
argument is the string "BEFORE", then the new
occurrence will be prepended. NOTE: if you want to be
explicit about appending, use the string "AFTER" for
this argument.
WARNING: this method always adds new occurrences; it
doesn't overwrite any existing occurrences... so if you
just want to change the value of a field (creating it
if necessary), then you probably don't want to use
this method: consider using set() instead.
add_text FIELD,TEXT
Add some more text to the [last occurrence of the]
field:
# Force an explicit character set:
if ($head->get('Content-type') !~ /\bcharset=/) {
$head->add_text('Content-type', '; charset="us-ascii"');
}
The FIELD is automatically coerced to lowercase.
WARNING: be careful if adding text that contains a
newline! A newline in a field value must be followed
by a single space or tab to be a valid continuation
line!
I had considered building this routine so that it
"fixed" bare newlines for you, but then I decided
against it, since the behind-the-scenes trickery would
probably create more problems through confusion. So,
instead, you've just been warned... proceed with
caution.
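If you do need to append multi-line text, one option is to repair bare newlines yourself before calling add_text(). The following is a minimal sketch, assuming a hypothetical helper named fold_bare_newlines that is not part of MIME::Head; it inserts a single space after any internal newline that is not already followed by a space or tab.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME::Head): make every internal
# newline a valid continuation by ensuring it is followed by whitespace.
# A trailing newline at the very end of the text is left alone.
sub fold_bare_newlines {
    my ($text) = @_;
    $text =~ s/\n(?![ \t])(?!\z)/\n /g;
    return $text;
}

my $fixed = fold_bare_newlines("; charset=\"us-ascii\";\nname=\"nurse.html\"");
print $fixed, "\n";
```

Text that is already correctly folded passes through unchanged, so the helper is safe to apply unconditionally.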
delete FIELD
Delete all occurrences of the given field.
# Remove all the MIME information:
$head->delete('MIME-Version');
$head->delete('Content-type');
$head->delete('Content-transfer-encoding');
$head->delete('Content-disposition');
Currently returns 1 always.
exists FIELD
Returns whether a given field exists:
# Was a "Subject:" given?
if ($head->exists('subject')) {
# yes, it does!
}
The FIELD is automatically coerced to lowercase. This
method returns the undefined value if the field
doesn't exist, and some true value if it does.
fields
Return a list of all fields (in no particular order):
foreach $field (sort $head->fields) {
print "$field: ", $head->get($field), "\n";
}
get FIELD,[OCCUR]
Returns the text of the [first occurrence of the]
field, or the empty string if the field is not present
(nice for avoiding those "undefined value" warnings):
# Is this a reply?
$is_reply = 1 if ($head->get('Subject') =~ /^Re: /);
NOTE: this returns the first occurrence of the field,
so as to be consistent with Mail::Internet::get().
However, if the optional OCCUR argument is defined, it
specifies the index of the occurrence you want: zero
for the first, and -1 for the last.
# Print the first 'Received:' entry:
print "Most recent: ", $head->get('received'), "\n";
# Print the first 'Received:' entry, explicitly:
print "Most recent: ", $head->get('received', 0), "\n";
# Print the last 'Received:' entry:
print "Least recent: ", $head->get('received', -1), "\n";
get_all FIELD
Returns the list of all occurrences of the field, or
the empty list if the field is not present:
# How did it get here?
@history = $head->get_all('Received');
NOTE: I had originally experimented with having get()
return all occurrences when invoked in an array
context... but that causes a lot of accidents when you
get careless and do stuff like this:
print "\u$field: ", $head->get($field), "\n";
It also made the intuitive behaviour unclear if the
OCCUR argument was given in an array context. So I
opted for an explicit approach to asking for all
occurrences.
original_text
Recover the original text that was read() in to create
this object:
print "PARSED FROM:\n", $head->original_text;
set FIELD,TEXT
Set the field to [the single occurrence given by] the
TEXT:
# Set the MIME type:
$head->set('content-type', 'text/html');
The FIELD is automatically coerced to lowercase.
This method returns the text.
unfold [FIELD]
Unfold the text of all occurrences of the given FIELD.
If the FIELD is omitted, all fields are unfolded.
"Unfolding" is the act of removing all newlines.
$head->unfold;
Currently, returns 1 always.
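To make the definition of unfolding concrete, here is a minimal sketch that applies it to a plain string rather than to a MIME::Head object. It assumes only what is stated above (newlines are removed); the variable names are illustrative.

```perl
use strict;
use warnings;

# A folded field value, as it might appear across two header lines:
my $folded = "text/html; charset=US-ASCII;\n    name=\"nurse.html\"";

# "Unfolding" here means removing the newlines; the whitespace that
# followed each newline is kept as-is.
(my $unfolded = $folded) =~ s/\r?\n//g;

print "$unfolded\n";
```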
MIME-specific methods
All of the following methods extract information from the
following structured fields:
Content-type
Content-transfer-encoding
Content-disposition
Be aware that they do not just return the raw contents of
those fields, and in some cases they will fill in sensible
(I hope) default values. Use get() if you need to grab
and process the raw field text.
params FIELD
Extract parameter info from a structured field, and
return it as a hash reference. For example, here is a
field with parameters:
Content-Type: Message/Partial;
number=2; total=3;
id="oc=jpbe0M2Yt4s@thumper.bellcore.com"
Here is how you'd extract them:
$params = $head->params('content-type');
if ($$params{_} eq 'message/partial') {
$number = $$params{'number'};
$total = $$params{'total'};
$id = $$params{'id'};
}
Like field names, parameter names are coerced to
lowercase. The special '_' parameter means the
default parameter for the field.
WARNING: the syntax is a little different for each
field (content-type, content-disposition, etc.). I've
attempted to come up with a nice, simple catch-all
solution: it simply stops when it can't match anything
else.
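For readers curious what a "stop when nothing else matches" parser might look like, here is a simplified sketch. It is an assumption for illustration, not the code MIME::Head actually uses: params_sketch is a hypothetical function, and it handles only bare tokens and simple double-quoted values.

```perl
use strict;
use warnings;

# Hypothetical, simplified parameter parser (not MIME::Head's own code).
# Returns a hash reference; the default parameter is stored under '_'.
sub params_sketch {
    my ($raw) = @_;
    my %params;
    $raw =~ s/\r?\n//g;                              # unfold first
    # The default parameter is everything up to the first semicolon:
    $raw =~ s/^\s*([^\s;]+)\s*// or return \%params;
    $params{_} = lc $1;
    # Then consume name=value pairs until nothing else matches:
    while ($raw =~ /\G;\s*([^\s=;]+)\s*=\s*(?:"([^"]*)"|([^\s;]*))\s*/gc) {
        $params{lc $1} = defined $2 ? $2 : $3;
    }
    return \%params;
}

my $raw = "Message/Partial;\n"
        . "    number=2; total=3;\n"
        . "    id=\"oc=jpbe0M2Yt4s\@thumper.bellcore.com\"";
my $params = params_sketch($raw);
print "$_ => $$params{$_}\n" for sort keys %$params;
```

Lowercasing the default parameter in this sketch follows the example above, which compares it against 'message/partial'.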
mime_encoding
Try real hard to determine the content transfer
encoding. If one is found, it is returned as a
non-empty, all-lowercase string.
If no encoding could be found, the empty string is
returned.
mime_type
Try real hard to determine the content type (e.g.,
"text/plain", "image/gif", "x-weird-type"), which is
returned in all-lowercase.
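A happy thing: the following code will work just as
you would want, even if there's no subtype (as in
"x-weird-type")... in such a case, the $subtype would
simply be undefined, which behaves like the empty
string in string context (though it may trigger an
"uninitialized value" warning under -w):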
($type, $subtype) = split('/', $head->mime_type);
If the content-type information is missing, it
defaults to "text/plain", as per RFC-1521:
Default RFC-822 messages are typed by this protocol as plain text in
the US-ASCII character set, which can be explicitly specified as
"Content-type: text/plain; charset=us-ascii". If no Content-Type is
specified, this default is assumed.
If just the subtype is missing (a syntax error unless
the type begins with "x-", but we'll tolerate it,
since some brain-dead mailers actually do this), then
it simply is not reported; e.g., "Content-type: TEXT"
is returned simply as "text".
WARNING: prior to version 1.17, a missing subtype was
reported as "x-subtype-unknown". I said at the time
that this might be a really horrible idea, and that I
might change it in the future. Well, it was, so I
did.
If the content type is present but can't be parsed at
all (yow!), the empty string is returned.
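When there is no slash to split on, Perl's split() returns a one-element list, so $subtype comes back undefined rather than as a literal empty string. If you want a guaranteed string, a small wrapper helps. This is a sketch only; split_mime_type is a hypothetical helper, not a MIME::Head method.

```perl
use strict;
use warnings;

# Hypothetical helper: split a MIME type string into (type, subtype),
# substituting '' for any missing part so callers never see undef.
sub split_mime_type {
    my ($mime_type) = @_;
    my ($type, $subtype) = split m{/}, $mime_type, 2;
    $type    = '' unless defined $type;
    $subtype = '' unless defined $subtype;
    return ($type, $subtype);
}

for my $mime_type ('text/html', 'x-weird-type', '') {
    my ($type, $subtype) = split_mime_type($mime_type);
    print "type=[$type] subtype=[$subtype]\n";
}
```

The empty-string case corresponds to a content type that is present but could not be parsed.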
multipart_boundary
If this is a header for a multipart message, return
the "encapsulation boundary" used to separate the
parts. The boundary is returned exactly as given in
the Content-type: field; that is, the leading double-
hyphen (--) is not prepended.
(Well, almost exactly... from RFC-1521:
(If a boundary appears to end with white space, the white space
must be presumed to have been added by a gateway, and must be deleted.)
so we oblige and remove any trailing spaces.)
Returns undef (not the empty string) if the
message is not multipart, if there is no specified
boundary, or if the boundary is illegal (e.g., if it
is empty after all trailing whitespace has been
removed).
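Because the double-hyphen is not included in the returned boundary, code that scans a message body has to add it. The sketch below shows one way to build the delimiter lines defined by RFC-1521 ("--" plus the boundary between parts, and the same with a trailing "--" after the last part). delimiters_for is a hypothetical helper, and its whitespace trimming only imitates the behaviour described above.

```perl
use strict;
use warnings;

# Hypothetical helper: given a boundary string, return the delimiter
# that separates parts and the delimiter that closes the last part.
# Returns the empty list if the boundary is missing or empty.
sub delimiters_for {
    my ($boundary) = @_;
    return unless defined $boundary;
    $boundary =~ s/[ \t]+\z//;          # drop trailing white space
    return if $boundary eq '';
    return ("--$boundary", "--$boundary--");
}

my ($between, $final) = delimiters_for("simple boundary  ");
print "between parts: $between\n";
print "after last:    $final\n";
```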
recommended_filename
Return the recommended external filename. This is
used when extracting the data from the MIME stream.
Returns undef if no filename could be suggested.
Compatibility tweaks
tweak_FROM_parsing CHOICE
Class method. The parser may be tweaked so that any
line in the header stream that begins with "From "
will be either ignored, flagged as an error, or
coerced into the special field "Mail-from:" (the
default; this approach was inspired by Emacs's "Babyl"
format). Though not valid for a MIME header, this
will provide compatibility with some Unix mail
messages. Just do this:
MIME::Head->tweak_FROM_parsing($choice)
Where $choice is one of 'IGNORE', 'ERROR', or
'COERCE'.
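The three choices can be pictured as three ways of treating a single input line. The following sketch is an illustration under assumptions, not the parser's real code: handle_from_line is a hypothetical function, and the exact text the parser produces when coercing may differ.

```perl
use strict;
use warnings;

# Hypothetical illustration of the three choices for a "From " line.
# Lines that do not begin with "From " are passed through untouched.
sub handle_from_line {
    my ($line, $choice) = @_;
    return $line unless $line =~ /^From /;
    return ''                        if $choice eq 'IGNORE';   # drop it
    die "unexpected 'From ' line\n"  if $choice eq 'ERROR';    # flag it
    $line =~ s/^From /Mail-from: /;                            # 'COERCE'
    return $line;
}

my $line = "From yakko Thu Dec 21 16:34:00 1995\n";
print handle_from_line($line, 'COERCE');
```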
DESIGN ISSUES
Why have separate objects for the head and the entity?
See the documentation under MIME::Entity for the rationale
behind this decision.
Why assume that MIME headers are email headers?
I quote from Achim Bohnet, who gave feedback on v.1.9 (I
think he's using the word header where I would use field;
e.g., to refer to "Subject:", "Content-type:", etc.):
There is also IMHO no requirement [for] MIME::Heads to look
like [email] headers; so to speak, the MIME::Head [simply stores]
the attributes of a complex object, e.g.:
new MIME::Head type => "text/plain",
charset => ...,
disposition => ..., ... ;
See the next question for an answer to this one.
Why is MIME::Head so complex, and yet lacking in
composition methods?
Sigh.
I have often wished that the original RFC-822 designers
had taken a different approach, and not given every other
field its own special grammar: read RFC-822 to see what I
mean. As I understand it, in Heaven, all mail message
headers have a very simple syntax that encodes
arbitrarily-nested objects; a consistent, generic
representation for exchanging OO data structures.
But we live in an imperfect world, where there's nonsense
like this to put up with:
From: Yakko Warner <yakko@tower.wb.com>
Subject: Hello, nurse!
Received: from gsfc.nasa.gov by eryq.pr.mcs.net with smtp
(Linux Smail3.1.28.1 #5) id m0tStZ7-0007X4C; Thu, 21 Dec 95 16:34 CST
Received: from rhine.gsfc.nasa.gov by gsfc.nasa.gov (5.65/Ultrix3.0-C)
id AA13596; Thu, 21 Dec 95 17:20:38 -0500
Content-type: text/html; charset=US-ASCII;
name="nurse.html"
I quote again from Achim Bohnet's feedback on v.1.9 (as
before, I think he's using the word header where I would
use field; e.g., to refer to "Subject:", "Content-type:", etc.):
MIME::Head is too big. A better approach IMHO would be to
have a general header class that knows about allowed characters,
line length, and some (formatting) output routines. There
should be other classes that handle special headers and that
are aware of the semantics/syntax of [those] headers...
From, to, reply-to, message-id, in-reply-to, x-face ...
MIME::Head should only handle MIME specific headers.
As he describes, each kind of field really merits its own
small class (e.g., Mail::Field::Subject,
Mail::Field::MessageId, Mail::Field::XFace, etc.), each of
which provides a from_field() method for parsing field
data into a class object, and a to_field() method for
generating that field from a class object.
I kind of like the elegance of this approach. We could
then have a generic Mail::Head class, instances of which
would consist simply of one or more instances of
subclasses of a generic Mail::Field class. Unrecognized
fields would be represented as instances of Mail::Field by
default.
There would be a MIME::Field class, with subclasses like
MIME::Field::ContentType that would allow us to get fields
like this:
$type = $head->field('content-type')->type;
$subtype = $head->field('content-type')->subtype;
$charset = $head->field('content-type')->charset;
And set fields like this:
$head->field('content-type')->type('text');
$head->field('content-type')->subtype('html');
$head->field('content-type')->charset('us-ascii');
And, with that same MIME::Head object, get at other
fields, like:
$subject = $head->field('subject')->text; # just the flat text
$sender_name = $head->field('from')->name; # e.g., Yakko Warner
$sender_addr = $head->field('from')->addr; # e.g., yakko@tower.wb.com
So why a special MIME::Head subclass of Mail::Head? Why,
to enable us to add MIME-specific wrappers, like this:
package MIME::Head;
@ISA = qw(Mail::Head);
sub recommended_filename {
my $self = shift;
my $try;
# First, try to get it from the content-disposition:
($try = $self->field('content-disposition')->filename) and return $try;
# Next, try to get it from the content-type:
($try = $self->field('content-type')->name) and return $try;
# Give up:
undef;
}
Why all this "occurrence" jazz? Isn't every field
unique?
Aaaaaaaaaahh....no.
Looking at a typical mail message header, it is sooooooo
tempting to just store the fields as a hash of strings,
one string per hash entry. Unfortunately, there's the
little matter of the Received: field, which (unlike From:,
To:, etc.) will often have multiple occurrences; e.g.:
Received: from gsfc.nasa.gov by eryq.pr.mcs.net with smtp
(Linux Smail3.1.28.1 #5) id m0tStZ7-0007X4C; Thu, 21 Dec 95 16:34 CST
Received: from rhine.gsfc.nasa.gov by gsfc.nasa.gov (5.65/Ultrix3.0-C)
id AA13596; Thu, 21 Dec 95 17:20:38 -0500
Received: (from eryq@localhost) by rhine.gsfc.nasa.gov (8.6.12/8.6.12)
id RAA28069; Thu, 21 Dec 1995 17:27:54 -0500
Date: Thu, 21 Dec 1995 17:27:54 -0500
From: Eryq <eryq@rhine.gsfc.nasa.gov>
Message-Id: <199512212227.RAA28069@rhine.gsfc.nasa.gov>
To: eryq@eryq.pr.mcs.net
Subject: Stuff and things
The Received: field is used for tracing message routes,
and although it's not generally used for anything other
than human debugging, I didn't want to inconvenience
anyone who actually wanted to get at that information.
I also didn't want to make this a special case; after all,
who knows what other fields could have multiple occurrences
in the future? So, clearly, multiple entries had to
somehow be stored multiple times... and the different
occurrences had to be retrievable.
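One simple way to satisfy that requirement is a hash whose values are lists. The sketch below is an assumption about a possible representation, not a description of the class's actual internals; add_occurrence and get_occurrence are hypothetical names that only mimic the documented behaviour of add() and get().

```perl
use strict;
use warnings;

my %fields;   # lowercased field name => array of occurrences

# Append a new occurrence, or prepend it if $where is 'BEFORE'.
sub add_occurrence {
    my ($field, $text, $where) = @_;
    my $list = $fields{lc $field} ||= [];
    if (defined $where && $where eq 'BEFORE') { unshift @$list, $text }
    else                                      { push    @$list, $text }
    return $text;
}

# Fetch one occurrence by index (0 = first, -1 = last); '' if absent.
sub get_occurrence {
    my ($field, $occur) = @_;
    my $list = $fields{lc $field} or return '';
    my $text = $list->[defined $occur ? $occur : 0];
    return defined $text ? $text : '';
}

add_occurrence('Received', 'from rhine.gsfc.nasa.gov by gsfc.nasa.gov');
add_occurrence('Received', 'from gsfc.nasa.gov by eryq.pr.mcs.net', 'BEFORE');

print "Most recent:  ", get_occurrence('received'),     "\n";
print "Least recent: ", get_occurrence('received', -1), "\n";
```

With this layout no field is a special case: a field that appears once is just a one-element list.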
SEE ALSO
MIME::Decoder, MIME::Entity, MIME::Head, MIME::Parser.
AUTHOR
Copyright (c) 1996 by Eryq / eryq@rhine.gsfc.nasa.gov
All rights reserved. This program is free software; you
can redistribute it and/or modify it under the same terms
as Perl itself.
The more-comprehensive filename extraction is courtesy of
Lee E. Brotzman, Advanced Data Solutions.
VERSION
$Revision: 1.20 $ $Date: 1996/07/23 19:02:43 $