btparse man page on DragonFly


btparse(3)			    btparse			    btparse(3)

NAME
       btparse - C library for parsing and processing BibTeX data files

SYNOPSIS
	  #include <btparse.h>

	  /* Basic library initialization / cleanup */
	  void bt_initialize (void);
	  void bt_free_ast (AST *ast);
	  void bt_cleanup (void);

	  /* Input / interface to parser */
	  void	 bt_set_stringopts (bt_metatype_t metatype, ushort options);
	  AST * bt_parse_entry_s (char *    entry_text,
				  char *    filename,
				  int	    line,
				  ushort    options,
				  boolean * status);
	  AST * bt_parse_entry	 (FILE *    infile,
				  char *    filename,
				  ushort    options,
				  boolean * status);
	  AST * bt_parse_file	 (char *    filename,
				  ushort    options,
				  boolean * overall_status);

	  /* AST traversal/query */
	  AST * bt_next_entry (AST * entry_list,
			       AST * prev_entry);
	  AST * bt_next_field (AST *entry, AST *prev, char **name);
	  AST * bt_next_value (AST *head,
			       AST *prev,
			       bt_nodetype_t *nodetype,
			       char **text);

	  bt_metatype_t bt_entry_metatype (AST *entry);
	  char *bt_entry_type (AST *entry);
	  char *bt_entry_key (AST *entry);
	  char *bt_get_text (AST *node);

	  /* Splitting names and lists of names */
	  bt_stringlist * bt_split_list (char *	  string,
					 char *	  delim,
					 char *	  filename,
					 int	  line,
					 char *	  description);
	  void bt_free_list (bt_stringlist *list);
	  bt_name * bt_split_name (char *  name,
				   char *  filename,
				   int	   line,
				   int	   name_num);
	  void bt_free_name (bt_name * name);

	  /* Formatting names */
	  bt_name_format * bt_create_name_format (char * parts, boolean abbrev_first);
	  void bt_free_name_format (bt_name_format * format);
	  void bt_set_format_text (bt_name_format * format,
				   bt_namepart part,
				   char * pre_part,
				   char * post_part,
				   char * pre_token,
				   char * post_token);
	  void bt_set_format_options (bt_name_format * format,
				      bt_namepart part,
				      boolean abbrev,
				      bt_joinmethod join_tokens,
				      bt_joinmethod join_part);
	  char * bt_format_name (bt_name * name, bt_name_format * format);

	  /* Construct tree from TeX groups */
	  bt_tex_tree * bt_build_tex_tree (char * string);
	  void		bt_free_tex_tree (bt_tex_tree **top);
	  void		bt_dump_tex_tree (bt_tex_tree *node, int depth, FILE *stream);
	  char *	bt_flatten_tex_tree (bt_tex_tree *top);

	  /* Miscellaneous string utilities */
	  void bt_purify_string (char * string, ushort options);
	  void bt_change_case (char transform, char * string, ushort options);

DESCRIPTION
       btparse is a C library for parsing and processing BibTeX files.	It
       provides a lexical scanner and LL(k) parser (constructed by PCCTS), both
       of which are efficient and offer good error detection and recovery; a
       set of functions for traversing the AST (abstract syntax tree) gener‐
       ated by the parser; and utility functions for manipulating strings
       according to BibTeX conventions.	 (Note that nothing in the library
       assumes that you're using BibTeX files for their original purpose of
       bibliographic data for scholarly publications; you could use the file
       format for any conceivable purpose that fits it.	 However, there is
       some code in the library that is really only appropriate for use with
       strings meant to be processed in the same way that BibTeX itself does.
       This is all entirely optional, though.)

       Note that the interface provided by btparse, while complete, is fairly
       low-level.  If you have more sophisticated needs, you might be
       interested in my "Text::BibTeX" module for Perl 5 (available on CPAN).

CONCEPTS AND TERMINOLOGY
       To understand this document and use btparse, you should already be
       familiar with the BibTeX language---more specifically, the BibTeX data
       description language.  (BibTeX being the complex beast that it is, one
       can conceive of the term applying to the program, the data language,
       the particular database structure described in the original BibTeX doc‐
       umentation, the ".bst" formatting language, and the set of conventions
       embodied in the standard styles included with the BibTeX distribution.
       In this document, I'll stick to the first two meanings---the data lan‐
       guage because that's what btparse deals with, and the program because
       it's occasionally necessary to explain differences between my parser
       and BibTeX's.)

       In particular, you should have a good idea what's going on in the fol‐
       lowing:

	  @string{and = { and },
		  joe = "Blow, Joe",
		  john = "John Smith"}

	  @book(ourbook,
		author = joe # and # john,
		title = {Our Little Book})

       If this looks like something you want to parse, but don't want to have
       to write your own parser for, you've come to the right place.
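
       To make the example concrete, here is a minimal sketch of what
       expanding the macros and pasting the pieces together does to the
       "author" value above.  The macro table and the function names are
       hypothetical stand-ins written for this illustration; they are not
       part of btparse, which keeps its own macro table.  Whitespace is
       left exactly as written in the macro definitions.

```c
#include <string.h>

/* Hypothetical stand-in for a macro table (not btparse's own). */
struct macro { const char *name; const char *text; };

static const struct macro macros[] = {
    { "and",  " and " },
    { "joe",  "Blow, Joe" },
    { "john", "John Smith" },
};

/* Return the text of a macro, or NULL if the name is undefined. */
static const char *expand_macro(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof macros / sizeof macros[0]; i++)
        if (strcmp(macros[i].name, name) == 0)
            return macros[i].text;
    return NULL;
}

/* Expand each macro name in turn and append its text to out.
   Returns 0 on success, -1 on an undefined macro or a full buffer. */
static int expand_and_paste(const char *const *names, size_t n,
                            char *out, size_t outsize)
{
    size_t i, used = 0;

    if (outsize == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        const char *text = expand_macro(names[i]);
        size_t len;

        if (text == NULL)
            return -1;
        len = strlen(text);
        if (used + len + 1 > outsize)
            return -1;
        memcpy(out + used, text, len + 1);   /* copies the NUL too */
        used += len;
    }
    return 0;
}
```

       Called with the names "joe", "and", "john", in that order, the
       sketch produces the single string "Blow, Joe and John Smith".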

       Before going much further, though, you're going to have to learn some
       of the terminology I use for describing BibTeX data.  Most of it's the
       same as you'll find in any BibTeX documentation, but it's important to
       be sure that we're talking about the same things here.  So, some defi‐
       nitions:

       top-level
	   All text in a BibTeX file from the start of the file to the start
	   of the first entry, and between entries thereafter.

       name
	   A string of letters, digits, and the following characters:

	      ! $ & * + - . / : ; < > ? [ ] ^ _ ` |

	   A "name" is a catch-all used for entry types, entry keys, and field
	   and macro names.  For BibTeX compatibility, there are slightly dif‐
	   ferent rules for these four entities; currently, the only such rule
	   actually implemented is that field and macro names may not begin
	   with a digit.  Some names in the above example: "string", "and".

       entry
	   A chunk of text starting with an "at" sign ("@") at top-level, fol‐
	   lowed by a name (the entry type), an entry delimiter ("{" or "("),
	   and proceeding to the matching closing delimiter.  Also, the data
	   structure that results from parsing this chunk of text.  There are
	   two entries in the above example.

       entry type
	   The name that comes right after an "@" at top-level.	 Examples from
	   above: "string", "book".

       entry metatype
	   A classification of entry types that allows us to group one or more
	   entry types under the same heading.	With the standard BibTeX data‐
	   base structure, "article", "book", "inbook", etc. all fall under
	   the "regular entry" metatype.  Other metatypes are "macro
	   definition" (for "string" entries), "preamble" (for "preamble"
	   entries), and "comment" (for "comment" entries).  In fact, any
	   entry whose type
	   is not one of "string", "preamble", or "comment" is called a "regu‐
	   lar" entry.

       entry delimiters
	   "{" and "}", or "(" and ")": the pair of characters that (almost)
	   mark the boundaries of an entry.  "Almost" because the start of an
	   entry is marked by an "@", not by the "entry open" delimiter.

       entry key
	   (Or just key when it's clear what we're speaking of.)  The name
	   immediately following the entry open delimiter in a regular entry,
	   which uniquely identifies the entry.	 Example from above: "our‐
	   book".  Only regular entries have keys.

       field
	   A name to the left of an equals sign in a regular or macro-defini‐
	   tion entry.	In the latter context, might also be called a macro
	   name.  Examples from above: "joe", "author".

       field list
	   In a regular entry, everything between the entry delimiters except
	   for the entry key.  In a macro definition entry, everything between
	   the entry delimiters (possibly also called a macro list).

       compound value
	   (Usually just "value".)  The text that follows an equals sign ("=")
	   in a regular or macro definition entry, up to a comma or the entry
	   close delimiter; a list of one or more simple values joined by hash
	   signs ("#").

       simple value
	   A string, macro, or number.

       string
	   (Or, sometimes, "quoted string.")  A chunk of text between quotes
	   (""") or braces ("{" and "}").  Braces must balance: "{this is a
	   {string}" is not a BibTeX string, but "{this is a {string}}" is.
	   ("this is a {string" is also illegal, mainly to avoid the possibil‐
	   ity of generating bogus TeX code--which BibTeX will do in certain
	   cases.)

       macro
	   A name that appears on the right-hand side of an equals sign (i.e.
	   as one simple value in a compound value).  Implies that this name
	   was defined as a macro in an earlier macro definition entry, but
	   this is only checked if btparse is being asked to expand macros to
	   their full definitions.

       number
	   An unquoted string of digits.
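
       Two of the rules above, the character set of a name and the
       brace-balancing requirement for strings, can be sketched as
       follows.  The helper names are hypothetical and are not btparse
       functions; the sketch also assumes that every brace counts toward
       the balance, with no escaping.

```c
#include <ctype.h>
#include <string.h>

/* Punctuation allowed in a name, per the list in the "name" definition. */
static const char name_punct[] = "!$&*+-./:;<>?[]^_`|";

/* Nonzero if s is a non-empty name.  When no_leading_digit is nonzero,
   the extra rule for field and macro names is applied as well. */
static int is_name(const char *s, int no_leading_digit)
{
    size_t i;

    if (s[0] == '\0')
        return 0;
    if (no_leading_digit && isdigit((unsigned char) s[0]))
        return 0;
    for (i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char) s[i];
        if (!isalnum(c) && strchr(name_punct, c) == NULL)
            return 0;
    }
    return 1;
}

/* Nonzero if the braces in s balance and none closes before it opens. */
static int braces_balance(const char *s)
{
    int depth = 0;

    for (; *s != '\0'; s++) {
        if (*s == '{')
            depth++;
        else if (*s == '}' && --depth < 0)
            return 0;
    }
    return depth == 0;
}
```

       With these, "{this is a {string}" fails braces_balance() while
       "{this is a {string}}" passes, and "1998" is accepted as a name
       only when the leading-digit rule is switched off.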

       Working with btparse generally consists of passing the library some
       BibTeX data (or a source for some BibTeX data, such as a filename or a
       file pointer), which it then lexically scans and parses to construct
       an abstract syntax tree (AST).  It returns this AST to you, and you
       call other btparse functions to traverse and query the tree.

       The contents of AST nodes are the private domain of the library, and
       you shouldn't go poking into them.  This being C, though, there's noth‐
       ing to prevent you from doing so except good manners and the possibil‐
       ity that I might change the AST structure in future releases, breaking
       any badly-behaved code.	Also, it's not necessary to know the struc‐
       tural relationships between nodes in the AST---that's taken care of by
       the query/traversal functions.

       However, it's useful to know some of the things that btparse deposits
       in the AST and returns to you through those query/traversal functions.
       First off, each node has a "node type," which records the syntactic
       element corresponding to each node.  For instance, the entry

	  @book{mybook, author = "Joe Blow", title = "My Little Book"}

       is rooted by an "entry" node; under this would be found a "key" node
       (for the entry key), two "field" nodes (for the "author" and "title"
       fields); and associated with each field node would be a "string" node.
       The only time this concerns you is when you ask the library for a sim‐
       ple value; just looking at the text is not enough to distinguish quoted
       strings, numbers, and macro names, so btparse returns the nodetype as
       well.

       In addition to the nodetype, btparse records the metatype of each
       "entry" node.  This allows you (and the library) to distinguish, say,
       regular entries from comment entries.  Not only do these have very
       different structures, which the library must traverse differently,
       but certain traversal functions make no sense on certain entry
       metatypes, so you need to be able to make the distinction as well.

       That said, everything you need to know to work with the AST is
       explained in bt_traversal.

DATA TYPES AND MACROS
       btparse defines several types required for the external interface.
       First, it trivially defines a "boolean" type (along with "TRUE" and
       "FALSE" macros).	 This might affect you when including the btparse.h
       header in your own code---since it's not possible for the code to
       detect if there is already a "boolean" type defined, you might have to
       define the "HAVE_BOOLEAN" pre-processor token to deactivate btparse.h's
       "typedef" of "boolean".

       Next, two enumeration types are defined: "bt_metatype" and "bt_node‐
       type".  Both of these are used extensively in the library itself, and
       are made available to users of the library because they can be found in
       nodes of the "btparse" AST (abstract syntax tree).  (I.e., querying the
       AST can give you "bt_metatype" and "bt_nodetype" values, so the "type‐
       def"s must be available to your code.)

       Entry metatype enum

       "bt_metatype_t" has the following values:

       ·   "BTE_UNKNOWN"

       ·   "BTE_REGULAR"

       ·   "BTE_COMMENT"

       ·   "BTE_PREAMBLE"

       ·   "BTE_MACRODEF"

       which are determined by the "entry type" token.	(@string entries have
       the "BTE_MACRODEF" metatype; @comment and @preamble correspond to
       "BTE_COMMENT" and "BTE_PREAMBLE"; and any other entry type has the
       "BTE_REGULAR" metatype.)
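
       The mapping from entry type to metatype can be sketched as below.
       The enum is a local stand-in that mirrors the names listed above
       (real code would use the one from btparse.h), the function names
       are hypothetical, and matching the entry type without regard to
       case is an assumption of this sketch.  The two predicates restate
       the definitions given earlier: only regular entries have keys, and
       fields occur in regular and macro definition entries.

```c
#include <ctype.h>

/* Local stand-in for the metatype enum (assumption, not btparse.h). */
enum metatype { M_UNKNOWN, M_REGULAR, M_COMMENT, M_PREAMBLE, M_MACRODEF };

/* Compare two ASCII strings, ignoring case (an assumption, see above). */
static int same_word(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++)
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
    return *a == *b;            /* true only if both strings ended */
}

/* Classify an entry by the name that follows the "@". */
static enum metatype classify_entry_type(const char *type)
{
    if (same_word(type, "string"))
        return M_MACRODEF;
    if (same_word(type, "comment"))
        return M_COMMENT;
    if (same_word(type, "preamble"))
        return M_PREAMBLE;
    return M_REGULAR;           /* any other entry type */
}

/* Only regular entries have keys. */
static int metatype_has_key(enum metatype m)
{
    return m == M_REGULAR;
}

/* Fields occur in regular and macro definition entries. */
static int metatype_has_fields(enum metatype m)
{
    return m == M_REGULAR || m == M_MACRODEF;
}
```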

       AST nodetype enum

       "bt_nodetype" has the following values:

       ·   "BTAST_UNKNOWN"

       ·   "BTAST_ENTRY"

       ·   "BTAST_KEY"

       ·   "BTAST_FIELD"

       ·   "BTAST_STRING"

       ·   "BTAST_NUMBER"

       ·   "BTAST_MACRO"

       Of these, you'll only ever deal with the last three.  They are returned
       when you query the AST for a simple value---just seeing the text isn't
       enough to distinguish between a quoted string, a number, and a macro,
       so the AST nodetype is supplied along with the text.
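
       One place where the nodetype matters is when writing a simple
       value back out as BibTeX source: a string must be delimited again,
       while a number or a macro name is written bare.  A minimal sketch,
       using a local stand-in enum and a hypothetical function name
       (neither comes from btparse.h):

```c
#include <stdio.h>

/* Local stand-in for the three simple-value node types. */
enum nodetype { N_STRING, N_NUMBER, N_MACRO };

/* Write a simple value in BibTeX source form into out.  Returns what
   snprintf() returns: the length needed, not counting the final NUL. */
static int render_simple_value(enum nodetype type, const char *text,
                               char *out, size_t outsize)
{
    if (type == N_STRING)
        return snprintf(out, outsize, "{%s}", text);
    return snprintf(out, outsize, "%s", text);   /* number or macro */
}
```

       The same text "1997" therefore comes out as {1997} when it is a
       string and as 1997 when it is a number.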

       String processing option macros

       Since BibTeX is essentially a system for glueing strings together in a
       wide variety of ways, the processing done to its strings is fairly
       important.  Most of the string transformations are done outside of the
       lexer/parser; this reduces their complexity, and makes it easier to
       switch different transformations on and off.  This switching is done
       with an "options" bitmap which can be specified on a per-entry-metatype
       basis.  (That is, you can have one set of transformations done to the
       strings in all regular entries, another set done to the strings in all
       macro definition entries, and so on.)  If you need finer control than
       that, it's currently unavailable outside of the library (but it's just
       a matter of making a couple functions available and documenting
       them---so bug me if you need this feature).

       There are four basic macros for constructing this bitmap:

       "BTO_CONVERT"
	   Convert "number" values to strings.	(The conversion is trivial,
	   involving changing the type of the AST node representing the number
	   from "BTAST_NUMBER" to "BTAST_STRING".  "Number" values are stored
	   as strings of digits, just as they are in the input data.)

       "BTO_EXPAND"
	   Expand macro invocations to the full macro text.

       "BTO_PASTE"
	   Paste simple values together.

       "BTO_COLLAPSE"
	   Collapse whitespace according to the BibTeX rules.

       For instance, supplying "BTO_CONVERT | BTO_EXPAND" as the string
       options bitmap for the "BTE_REGULAR" metatype means that all simple
       values in "regular" entries will be converted to strings: numbers will
       simply have their "nodetype" changed, and macros will be expanded.
       Nothing else will be done to the simple values, though---they will not
       be concatenated, nor will whitespace be collapsed.  See the
       "bt_set_stringopts()" and "bt_parse_*()" functions in bt_input for more
       information on the various options for parsing; see bt_postprocess for
       details on the post-processing.
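
       The idea of one options bitmap per entry metatype can be sketched
       as a small table indexed by metatype.  Everything below is a
       stand-in for illustration: the DEMO_ flag values, the enum, and
       the function names are assumptions of this sketch, and real code
       would use the macros from btparse.h together with
       "bt_set_stringopts()".

```c
/* Stand-in flags, one bit each (the real values live in btparse.h). */
#define DEMO_CONVERT   1u
#define DEMO_EXPAND    2u
#define DEMO_PASTE     4u
#define DEMO_COLLAPSE  8u

/* Local stand-in for the metatype enum, plus a count for sizing. */
enum metatype {
    M_UNKNOWN, M_REGULAR, M_COMMENT, M_PREAMBLE, M_MACRODEF,
    NUM_METATYPES
};

/* One bitmap per metatype; all options start switched off. */
static unsigned short string_opts[NUM_METATYPES];

/* Replace the whole bitmap for one metatype. */
static void set_opts(enum metatype m, unsigned short options)
{
    string_opts[m] = options;
}

/* Nonzero if every bit in options is set for metatype m. */
static int opts_enabled(enum metatype m, unsigned short options)
{
    return (string_opts[m] & options) == options;
}
```

       After set_opts(M_REGULAR, DEMO_CONVERT | DEMO_EXPAND), the
       conversion and expansion bits are on for regular entries while
       pasting and collapsing stay off, and the other metatypes are
       unaffected.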

USING THE LIBRARY
       The following code is a skeletal example of using the btparse library:

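	   #include <stdlib.h>		/* for exit() */
	   #include <btparse.h>

	   int main (void)
	   {
	      bt_initialize ();		/* must precede any other btparse call */

	      /* process some data */

	      bt_cleanup ();		/* frees what bt_initialize() allocated */
	      exit (0);
	   }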

       Please note the call to "bt_initialize()"; this is very important!
       Without it, the library may crash or fail mysteriously.	You must call
       "bt_initialize()" before calling any other btparse functions.
       "bt_cleanup()" just frees the memory allocated by "bt_initialize()"; if
       you are careful to call it before exiting, and "bt_free_ast()" on any
       abstract syntax trees generated by btparse when you are done with them,
       then your program shouldn't have any memory leaks.  (Unless they're due
       to your own code, of course!)

BUGS AND LIMITATIONS
       btparse has several inherent limitations that are due to the lexical
       scanner and parser generated by PCCTS 1.x.  In short, the scanner and
       parser are both heavily dependent on global variables, meaning that
       thread safety -- or even the ability to have two files open and being
       parsed at the same time -- is well-nigh impossible.  This will not
       change until I get with the times and adopt ANTLR 2.0, the successor to
       PCCTS -- presuming of course that it can generate more modular C scan‐
       ners and parsers.

       Another limitation that is due to PCCTS: entries with a large number of
       fields (more than about 90, if each field value is just a single
       string) will cause the parser to crash.	This is unavoidable due to the
       parser using statically-allocated stacks for attributes and abstract-
       syntax tree nodes.  I could increase the static allocation, but that
       would just decrease the likelihood of encountering the problem, not
       make it go away.	 Again, the chances of this changing as long as I'm
       using PCCTS 1.x are nil.

       Apart from those inherent limitations, there are no known bugs in
       btparse.	 Any segmentation faults or bus errors from the library should
       be considered bugs.  They probably result from using the library
       incorrectly (e.g. attempting to interleave the parsing of two files), but I
       do make an attempt to catch all such mistakes, and if I've missed any
       I'd like to know about it.

       Any memory leaks from the library are also a concern; as long as you
       are conscientious about calling the cleanup functions ("bt_free_ast()"
       and "bt_cleanup()"), then the library shouldn't leak.

SEE ALSO
       To read and parse BibTeX data files, see bt_input.

       To traverse the syntax tree that results, see bt_traversal.

       To learn what is done to values in parsed entries, and how to customize
       that munging, see bt_postprocess.

       To learn how btparse deals with strings, see bt_strings (oops, I
       haven't written this one yet!).

       To manipulate and access the btparse macro table, see bt_macros.

       For splitting author names and lists "the BibTeX way" using btparse,
       see bt_split_names.

       To put author names back together again, see bt_format_names.

       Miscellaneous functions for processing strings "the BibTeX way":
       bt_misc.

       A semi-formal language definition is in bt_language.

AUTHOR
       Greg Ward <gward@python.net>

COPYRIGHT
       Copyright (c) 1996-97 by Gregory P. Ward.

       This library is free software; you can redistribute it and/or modify it
       under the terms of the GNU Library General Public License as published
       by the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.

       This library is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of MER‐
       CHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Library
       General Public License for more details.

       You should have received a copy of the GNU Library General Public
       License along with this library; if not, write to the Free Software
       Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

AVAILABILITY
       The btOOL home page, where you can get up-to-date information about
       btparse (and download the latest version) is

	  http://starship.python.net/~gward/btOOL/

       You will also find the latest version of Text::BibTeX, the Perl library
       that provides a high-level front-end to btparse, there.	btparse is
       needed to build "Text::BibTeX", and must be downloaded separately.

       Both libraries are also available on CTAN (the Comprehensive TeX Ar‐
       chive Network, "http://www.ctan.org/tex-archive/") and CPAN (the Com‐
       prehensive Perl Archive Network, "http://www.cpan.org/").  Look in bib‐
       lio/bibtex/utils/btOOL/ on CTAN, and authors/Greg_Ward/ on CPAN.	 For
       example,

	  http://www.ctan.org/tex-archive/biblio/bibtex/utils/btOOL/
	  http://www.cpan.org/authors/Greg_Ward

       will both get you to the latest version of "Text::BibTeX" and btparse
       -- but of course, you should always access busy sites like CTAN and
       CPAN through a mirror.

btparse, version 0.34		  2003-10-25			    btparse(3)