MIME::Parser man page on BSDi


MIME::Parser(3)      User Contributed Perl Documentation      MIME::Parser(3)

NAME
       MIME::Parser - split MIME mail into decoded components

       WARNING: This code is in an evaluation phase until 1
       August 1996.  Depending on any comments/complaints
       received before this cutoff date, the interface may change
       in a non-backwards-compatible manner.

DESCRIPTION
       Where it all begins.  This is how you'll parse MIME
       streams to obtain MIME::Entity objects.

SYNOPSIS
	   use MIME::Parser;

	   # Create a new parser object:
           my $parser = MIME::Parser->new;

           # Optional: set up parameters that will affect how it extracts
           #   documents from the input stream:
           $parser->output_dir("$ENV{HOME}/mimemail");

           # Parse an input stream:
           my $entity = $parser->read(\*STDIN) or die "couldn't parse MIME stream";

	   # Congratulations: you now have a (possibly multipart) MIME entity!
	   $entity->dump_skeleton;	    # for debugging

WARNINGS
       The organization of the output_path() code changed in
       version 1.11 of this module.  If you are upgrading from a
       previous version, and you use inheritance to override the
       output_path() method, please take a moment to familiarize
       yourself with the new code.  Everything should still work,
       but ya never know...

       New, untested binmode() calls were added in module version
       1.11...	if binmode() is not a NOOP on your system, please
       pay careful attention to your output, and report any
       anomalies.  It is possible that "make test" will fail on
       such systems, since some of the tests involve checking the
       sizes of the output files.  That doesn't necessarily
       indicate a problem.

PUBLIC INTERFACE
       new Create a new parser object.  You can then set up
           various parameters before doing the actual parsing:

               my $parser = MIME::Parser->new;
               $parser->output_dir("/tmp");
               $parser->output_prefix("msg1");
               my $entity = $parser->read(\*STDIN);

28/Aug/1996	       perl 5.005, patch 03			1

       output_dir [DIRECTORY]
	   Get/set the output directory for the parsing
	   operation.  This is the directory where the extracted
	   and decoded body parts will go.  The default is ".".

	   If DIRECTORY is not given, the current output
	   directory is returned.  If DIRECTORY is given, the
	   output directory is set to the new value, and the
	   previous value is returned.
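           The get/set convention described above (with no
           argument, return the current value; with an argument,
           store it and return the previous value) can be
           sketched as a plain Perl accessor.  The class name
           Demo::Settings is hypothetical; this illustrates the
           convention and is not MIME::Parser's own source.

```perl
use strict;
use warnings;

# Hypothetical class illustrating the get/set accessor convention.
package Demo::Settings;

sub new {
    my ($class) = @_;
    return bless { output_dir => '.' }, $class;    # default is "."
}

# No argument: return the current value.
# One argument: store it, and return the previous value.
sub output_dir {
    my $self = shift;
    my $old  = $self->{output_dir};
    $self->{output_dir} = shift if @_;
    return $old;
}

package main;

my $s    = Demo::Settings->new;
my $prev = $s->output_dir('/tmp');    # previous value: "."
my $now  = $s->output_dir;            # current value: "/tmp"
```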

       output_path HEAD
	   Utility method.  Given a MIME head for a file to be
	   extracted, come up with a good output pathname for the
	   extracted file.

       o	You'll probably never need to invoke this method
		directly.  As of version 1.11, this method is
		provided so that your output_path_hook() function
		(or your MIME::Parser subclass) can have clean
		access to the original algorithm.  This method no
		longer attempts to run the user hook function.

		Normally, the "directory" portion of the returned
		path will be the output_dir(), and the "filename"
		portion will be the recommended filename
		extracted from the MIME header (or some simple
		temporary file name, starting with the
		output_prefix(), if the header does not specify a
		filename).

		If there is a recommended filename, but it is
		judged to be evil (if it is empty, or if it
		contains "/"s or ".."s or non-ASCII characters),
		then a warning is issued and the temporary file
		name is used in its place.  This may be overly
		restrictive, so...

		NOTE: If you don't like the behavior of this
		function, you can override it with your own
		routine.  See output_path_hook() for details.
		If you want to be OOish about it, you could
		instead define your own subclass of MIME::Parser
		and override it there:

                     package MIME::MyParser;

                     require 5.002;                # for SUPER
                     use strict;
                     use MIME::Parser;

                     @MIME::MyParser::ISA = ('MIME::Parser');

                     sub output_path {
                         my ($self, $head) = @_;

                         # Your code here; FOR EXAMPLE (i_have_a_preference()
                         # and my_custom_path() are hypothetical methods
                         # that you would write yourself):
                         if ($self->i_have_a_preference($head)) {
                             return $self->my_custom_path($head);
                         }
                         else {                      # return the default path:
                             return $self->SUPER::output_path($head);
                         }
                     }
                     1;

		Thanks to Laurent Amon for pointing out problems
		with the original implementation, and for making
		some good suggestions.	Thanks also to Achim
		Bohnet for pointing out that there should be a
		hookless, OO way of overriding the output_path.
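           The filename-vetting rule described for output_path()
           might look roughly like the following sketch.  The
           helper names is_evil_filename() and
           choose_output_path(), and the "PREFIX-N.dat" fallback
           name, are hypothetical: this page does not say how the
           real temporary names are formed.

```perl
use strict;
use warnings;
use File::Spec;

# True if a recommended filename is "evil" under the rules described
# above: empty, or containing "/", "..", or non-ASCII characters.
sub is_evil_filename {
    my ($name) = @_;
    return 1 if $name eq '';
    return 1 if $name =~ m{/};
    return 1 if index($name, '..') >= 0;
    return 1 if $name =~ /[^\x00-\x7F]/;
    return 0;
}

# Hypothetical helper: build an output path from a directory, a prefix,
# and the header's recommended filename (undef if the header gave none).
# The "PREFIX-N.dat" fallback naming is an assumption.
my $temp_counter = 0;
sub choose_output_path {
    my ($dir, $prefix, $recommended) = @_;
    my $name = $recommended;
    if (defined($name) && is_evil_filename($name)) {
        warn "ignoring suspicious recommended filename '$name'\n";
        $name = undef;
    }
    $name = sprintf('%s-%d.dat', $prefix, ++$temp_counter)
        unless defined $name;
    return File::Spec->catfile($dir, $name);
}
```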

       output_path_hook SUBREF
	   Install a different function to generate the output
	   filename for extracted message data.	 Declare it like
	   this:

               sub my_output_path_hook {
                   my $parser = shift;   # this MIME::Parser
                   my $head = shift;     # the MIME::Head for the current message

                   # Your code here: it must return a path that can be
                   # open()ed for writing.  Remember that you can ask the
                   # $parser about the output_dir, and you can ask the
                   # $head about the recommended_filename!
                   #
                   # For example, fall back to the built-in algorithm
                   # (output_path() no longer calls the hook, so this
                   # does not recurse):
                   return $parser->output_path($head);
               }

	   And install it immediately before parsing the input
	   stream, like this:

               # Create a new parser object, and install my own output_path hook:
               my $parser = MIME::Parser->new;
               $parser->output_path_hook(\&my_output_path_hook);

               # NOW we can parse an input stream:
               my $entity = $parser->read(\*STDIN);

           This method is intended for people who are squeamish
           about creating subclasses.  See the output_path()
           documentation for a cleaner, OOish way to do this.

       output_prefix [PREFIX]
	   Get/set the output prefix for the parsing operation.
	   This is a short string that all filenames for
	   extracted and decoded body parts will begin with.  The
	   default is "msg".

           If PREFIX is not given, the current output prefix is
           returned.  If PREFIX is given, the output prefix is
           set to the new value, and the previous value is
           returned.

       parse_two HEADFILE BODYFILE
	   Convenience front-end onto read(), intended for
	   programs running under mail-handlers like deliver,
	   which splits the incoming mail message into a header
	   file and a body file.

	   Simply give this method the paths to the respective
	   files.  These must be pathnames: Perl "open-able"
	   expressions won't work, since the pathnames are shell-
	   quoted for safety.

	   WARNING: it is assumed that, once the files are cat'ed
	   together, there will be a blank line separating the
	   head part and the body part.
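           Because the pathnames are shell-quoted, the two files
           are presumably cat'ed together by the shell.  A
           pure-Perl alternative is sketched below, assuming Perl
           5.8 or later for in-memory filehandles.  The helper
           names are hypothetical, and unlike the behavior
           described above, this sketch adds the separating blank
           line when the header file lacks one.  Whether read()
           accepts an in-memory handle depends on your
           MIME::Parser version; this page only promises glob
           refs such as \*STDIN.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file as raw bytes.
sub slurp_file {
    my ($path) = @_;
    open(my $in, '<', $path) or die "can't open $path: $!";
    binmode($in);
    local $/;                      # slurp mode: read everything at once
    my $data = <$in>;
    close($in);
    return defined $data ? $data : '';
}

# Hypothetical pure-Perl stand-in for cat'ing HEADFILE and BODYFILE.
# Returns one readable in-memory filehandle over the combined text.
sub join_head_and_body {
    my ($headfile, $bodyfile) = @_;
    my $head = slurp_file($headfile);

    # Make sure the head ends in a blank line, so head and body stay
    # separated even if the header file lacks the trailing empty line.
    $head .= "\n" unless $head eq '' || $head =~ /\n\z/;
    $head .= "\n" unless $head =~ /(?:\A|\n)\r?\n\z/;

    my $text = $head . slurp_file($bodyfile);
    open(my $fh, '<', \$text) or die "can't open in-memory handle: $!";
    return $fh;
}
```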

       read FILEHANDLE
           Takes a MIME stream and splits it into its component
           entities, each of which is decoded and placed in a
           separate file in the parser's output_dir().

	   The stream should be given as a glob ref to a readable
	   FILEHANDLE; e.g., \*STDIN.

	   Returns a MIME::Entity, which may be a single entity,
	   or an arbitrarily-nested multipart entity.  Returns
	   undef on failure.

UNDER THE HOOD
       RFC-1521 gives us the following BNF grammar for the body
       of a multipart MIME message:

	     multipart-body  := preamble 1*encapsulation close-delimiter epilogue

	     encapsulation   := delimiter body-part CRLF

	     delimiter	     := "--" boundary CRLF
					  ; taken from Content-Type field.
					  ; There must be no space between "--"
					  ; and boundary.

	     close-delimiter := "--" boundary "--" CRLF
					  ; Again, no space by "--"

	     preamble	     := discard-text
					  ; to be ignored upon receipt.

	     epilogue	     := discard-text
					  ; to be ignored upon receipt.

	     discard-text    := *(*text CRLF)

	     body-part	     := <"message" as defined in RFC 822, with all
				 header fields optional, and with the specified
				 delimiter not occurring anywhere in the message
				 body, either on a line by itself or as a substring
				 anywhere.  Note that the semantics of a part
				 differ from the semantics of a message, as
				 described in the text.>
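       In code, recognizing the two delimiter forms in this
       grammar amounts to comparing each line, minus its line
       terminator, against "--" boundary and "--" boundary "--".
       The sketch below uses a hypothetical classify_line()
       helper and accepts bare "\n" endings as well as CRLF (see
       "Fuzzing of CRLF and newline on input").  It does not
       allow any space between "--" and the boundary, matching
       the grammar.

```perl
use strict;
use warnings;

# Classify one line against a multipart boundary, per the grammar:
#   "--" boundary        => 'DELIM' (delimiter)
#   "--" boundary "--"   => 'CLOSE' (close-delimiter)
#   anything else        => 'DATA'
# Both CRLF and bare LF line endings are accepted.
sub classify_line {
    my ($line, $boundary) = @_;
    (my $bare = $line) =~ s/\r?\n\z//;    # drop the line terminator
    return 'CLOSE' if $bare eq "--$boundary--";
    return 'DELIM' if $bare eq "--$boundary";
    return 'DATA';
}
```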

       From this we glean the following algorithm for parsing a
       MIME stream:

	   PROCEDURE parse
	   INPUT
	       A FILEHANDLE for the stream.
	       An optional end-of-stream OUTER_BOUND (for a nested multipart message).

	   RETURNS
	       The (possibly-multipart) ENTITY that was parsed.
               A STATE indicating how we left things: "EOF", "DELIM", "CLOSE", or "ERROR".

	   BEGIN
	       LET OUTER_DELIM = "--OUTER_BOUND".
	       LET OUTER_CLOSE = "--OUTER_BOUND--".

	       LET ENTITY = a new MIME entity object.
	       LET STATE  = "OK".

	       Parse the (possibly empty) header, up to and including the
	       blank line that terminates it.	Store it in the ENTITY.

	       IF the MIME type is "multipart":
		   LET INNER_BOUND = get multipart "boundary" from header.
		   LET INNER_DELIM = "--INNER_BOUND".
		   LET INNER_CLOSE = "--INNER_BOUND--".

		   Parse preamble:
		       REPEAT:
			   Read (and discard) next line
		       UNTIL (line is INNER_DELIM) OR we hit EOF (error).

		   Parse parts:
		       REPEAT:
			   LET (PART, STATE) = parse(FILEHANDLE, INNER_BOUND).
			   Add PART to ENTITY.
		       UNTIL (STATE != "DELIM").

		   Parse epilogue:
		       REPEAT (to parse epilogue):
			   Read (and discard) next line
		       UNTIL (line is OUTER_DELIM or OUTER_CLOSE) OR we hit EOF
		       LET STATE = "EOF", "DELIM", or "CLOSE" accordingly.

	       ELSE (if the MIME type is not "multipart"):
		   Open output destination (e.g., a file)

		   DO:
		       Read, decode, and output data from FILEHANDLE
		   UNTIL (line is OUTER_DELIM or OUTER_CLOSE) OR we hit EOF.
		   LET STATE = "EOF", "DELIM", or "CLOSE" accordingly.

	       ENDIF

	       RETURN (ENTITY, STATE).
	   END
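       The following Perl sketch renders PROCEDURE parse under
       simplifying assumptions; it is not the module's
       implementation.  Headers are assumed unfolded, bodies are
       kept undecoded in memory instead of being written to
       files, the boundary comes from a simple regular
       expression on Content-Type, and the line break before
       each boundary stays in the preceding body.  Preamble and
       epilogue text are kept rather than thrown away.  The
       names parse() and classify() and the hash layout of an
       entity are hypothetical.

```perl
use strict;
use warnings;

# Classify a line against a boundary: 'DELIM', 'CLOSE', or 'DATA'.
# With no boundary (top level), every line is data.
sub classify {
    my ($line, $bound) = @_;
    return 'DATA' unless defined $bound;
    (my $bare = $line) =~ s/\r?\n\z//;
    return 'CLOSE' if $bare eq "--$bound--";
    return 'DELIM' if $bare eq "--$bound";
    return 'DATA';
}

# Returns (ENTITY, STATE); STATE is 'EOF', 'DELIM', 'CLOSE', or 'ERROR'.
sub parse {
    my ($fh, $outer_bound) = @_;
    my $entity = { head => {}, parts => [], body => '',
                   preamble => '', epilogue => '' };

    # Header: up to and including the terminating blank line.
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';
        $entity->{head}{ lc $1 } = $2 if $line =~ /^([^:\s]+):\s*(.*)$/;
    }

    my $type = $entity->{head}{'content-type'} || 'text/plain';
    if ($type =~ m{^multipart/}i) {
        my ($inner) = $type =~ /boundary="?([^";]+)"?/i;
        return ($entity, 'ERROR') unless defined $inner;

        # Preamble: everything before the first inner delimiter.
        my $found = 0;
        while (defined(my $line = <$fh>)) {
            if (classify($line, $inner) eq 'DELIM') { $found = 1; last }
            $entity->{preamble} .= $line;
        }
        return ($entity, 'ERROR') unless $found;    # hit EOF

        # Parts: recurse while each part ends at an inner delimiter.
        my $state;
        do {
            my ($part, $part_state) = parse($fh, $inner);
            push @{ $entity->{parts} }, $part;
            $state = $part_state;
        } while ($state eq 'DELIM');
        # ERROR, or EOF before the close-delimiter, ends this entity early.
        return ($entity, $state) unless $state eq 'CLOSE';

        # Epilogue: everything up to the outer delimiter/close, or EOF.
        while (defined(my $line = <$fh>)) {
            my $kind = classify($line, $outer_bound);
            return ($entity, $kind) unless $kind eq 'DATA';
            $entity->{epilogue} .= $line;
        }
        return ($entity, 'EOF');
    }

    # Single part: body runs up to the outer delimiter/close, or EOF.
    while (defined(my $line = <$fh>)) {
        my $kind = classify($line, $outer_bound);
        return ($entity, $kind) unless $kind eq 'DATA';
        $entity->{body} .= $line;
    }
    return ($entity, 'EOF');
}
```

       At the top level, call parse($fh, undef); with no outer
       boundary the stream can only end in "EOF" or "ERROR".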

       For reasons discussed in MIME::Entity, we can't just
       discard the "discard text": some mailers actually put data
       in the preamble.

QUESTIONABLE PRACTICES
       Multipart messages are always read line-by-line
	   Multipart document parts are read line-by-line, so
	   that the encapsulation boundaries may easily be
	   detected.  However, bad MIME composition agents (for
	   example, naive CGI scripts) might return multipart
	   documents where the parts are, say, unencoded bitmap
	   files... and, consequently, where such "lines" might
	   be veeeeeeeeery long indeed.

	   A better solution for this case would be to set up
	   some form of state machine for input processing.  This
	   will be left for future versions.

       Multipart parts read into temp files before decoding
	   In my original implementation, the MIME::Decoder
	   classes had to be aware of encapsulation boundaries in
	   multipart MIME documents.  While this decode-while-
	   parsing approach obviated the need for temporary
	   files, it resulted in inflexible and complex decoder
	   implementations.

	   The revised implementation uses temporary files (a la
	   tmpfile()) to hold the encoded portions of MIME
	   documents.  Such files are deleted automatically after
	   decoding is done, and no more than one such file is
	   opened at a time, so you should never need to worry
	   about them.
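           A minimal sketch of the spool-then-decode approach,
           using Perl's IO::File->new_tmpfile (an anonymous
           temporary file where the system allows it, in the
           spirit of tmpfile()) and MIME::Base64 as an example
           encoding.  The function spool_and_decode() is
           hypothetical and not part of MIME::Decoder's
           interface.

```perl
use strict;
use warnings;
use IO::File;
use MIME::Base64 qw(decode_base64);

# Spool one part's encoded lines to a temporary file, then rewind it
# and decode the whole thing into $out.  Base64 is only an example;
# the point is that the decoder sees just this part's bytes and never
# has to look for multipart boundaries.
sub spool_and_decode {
    my ($encoded_lines, $out) = @_;    # array ref of lines, output handle
    my $tmp = IO::File->new_tmpfile or die "new_tmpfile failed: $!";
    binmode($tmp);
    print {$tmp} @$encoded_lines;
    seek($tmp, 0, 0) or die "seek failed: $!";

    my $encoded = do { local $/; <$tmp> };    # read the spooled data back
    print {$out} decode_base64(defined $encoded ? $encoded : '');
    close($tmp);    # an anonymous temp file is gone once closed
}
```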

       Fuzzing of CRLF and newline on input
	   RFC-1521 dictates that MIME streams have lines
	   terminated by CRLF ("\r\n").	 However, it is extremely
	   likely that folks will want to parse MIME streams
	   where each line ends in the local newline character
	   "\n" instead.

	   An attempt has been made to allow the parser to handle
	   both CRLF and newline-terminated input.

       Fuzzing of CRLF and newline on output
	   The "7bit" and "8bit" decoders will decode both a "\n"
	   and a "\r\n" end-of-line sequence into a "\n".

	   The "binary" decoder (default if no encoding
	   specified) still outputs stuff verbatim... so a MIME
	   message with CRLFs and no explicit encoding will be
	   output as a text file that, on many systems, will have
	   an annoying ^M at the end of each line... but this is
	   as it should be.
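           The output-side policy described above can be
           sketched as a single hypothetical function.  It is not
           the real MIME::Decoder code, only an illustration of
           which encodings normalize line endings and which pass
           bytes through untouched.

```perl
use strict;
use warnings;

# "7bit" and "8bit": turn a trailing "\r\n" into "\n" ("\n" stays "\n").
# Anything else (e.g. "binary"): return the line verbatim.
sub decode_line {
    my ($encoding, $line) = @_;
    if ($encoding =~ /^(?:7bit|8bit)$/i) {
        $line =~ s/\r\n\z/\n/;
    }
    return $line;
}
```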

CALL FOR TESTERS
       If anyone wants to test out this package's handling of
       both binary and textual email on a system where binmode()
       is not a NOOP, I would be most grateful.	 If stuff breaks,
       send me the pieces (including the original email that
       broke it, and at the very least a description of how the
       output was screwed up).

SEE ALSO
       MIME::Decoder, MIME::Entity, MIME::Head, MIME::Parser.

AUTHOR
       Copyright (c) 1996 by Eryq / eryq@rhine.gsfc.nasa.gov

       All rights reserved.  This program is free software; you
       can redistribute it and/or modify it under the same terms
       as Perl itself.

VERSION
       $Revision: 1.14 $ $Date: 1996/07/06 05:28:29 $
