csv man page on DragonFly

Man page or keyword search:  
man Server   44335 pages
apropos Keyword Search (all sections)
Output format
DragonFly logo
[printable version]

CSV(3)									CSV(3)

NAME
       csv - CSV parser and writer library

SYNOPSIS
       #include <libcsv/csv.h>

       int csv_init(struct csv_parser *p, unsigned char options);
       size_t csv_parse(struct csv_parser *p,
	       const void *s,
	       size_t len,
	       void (*cb1)(void *, size_t, void *),
	       void (*cb2)(int, void *),
	       void *data);
       int csv_fini(struct csv_parser *p,
	       void (*cb1)(void *, size_t, void *),
	       void (*cb2)(int, void *),
	       void *data);
       void csv_free(struct csv_parser *p);

       unsigned char csv_get_delim(struct csv_parser *p);
       unsigned char csv_get_quote(struct csv_parser *p);
       void csv_set_space_func(struct csv_parser *p, int (*f)(unsigned char));
       void csv_set_term_func(struct csv_parser *p, int (*f)(unsigned char));

       int csv_get_opts(struct csv_parser *p);
       int csv_set_opts(struct csv_parser *p, unsigned char options);
       int csv_error(struct csv_parser *p);
       char * csv_strerror(int error);

       size_t csv_write(void *dest, size_t dest_size, const void *src,
	       size_t src_size);
       int csv_fwrite(FILE *fp, const void *src, size_t src_size);

       size_t csv_write2(void *dest, size_t dest_size, const void *src,
	       size_t src_size, unsigned char quote);
       int csv_fwrite2(FILE *fp, const void *src, size_t src_size, unsigned char quote);

       void csv_set_realloc_func(struct csv_parser *p, void *(*func)(void *, size_t));
       void csv_set_free_func(struct csv_parser *p, void (*func)(void *));
       void csv_set_blk_size(struct csv_parser *p, size_t size);
       size_t csv_get_blk_size(struct csv_parser *p);
       size_t csv_get_buffer_size(struct csv_parser *p);

DESCRIPTION
       The  CSV	 library  provides a flexible, intuitive interface for parsing
       and writing csv data.

OVERVIEW
       The idea behind parsing with libcsv is straight-forward: you initialize
       a parser object with csv_init() and feed data to the parser over one or
       more calls to csv_parse() providing callback functions that handle end-
       of-field	 and  end-of-row events.  csv_parse() parses the data provided
       calling the user-defined callback functions  as	it  reads  fields  and
       rows.   When  complete,	csv_fini()  is called to finish processing the
       current field and make a final call to the callback functions  if  nec‐
       cessary.	  csv_free()  is  then	called	to  free  the  parser  object.
       csv_error() and csv_strerror() provide information about errors encoun‐
       tered  by the functions.	 csv_write() and csv_fwrite() provide a simple
       interface for converting raw data into CSV data and storing the	result
       into a buffer or file respectively.

       CSV  is	a binary format allowing the storage of arbitrary binary data,
       files opened for reading or writing CSV data should be opened in binary
       mode.

       libcsv provides a default mode in which the parser will happily process
       any data as CSV without complaint, this is  useful  for	parsing	 files
       which  don't adhere to all the traditional rules. A strict mode is also
       supported which will cause any violation of the imposed rules to	 cause
       a parsing failure.

ROUTINES
   PARSING DATA
       csv_init()  initializes	a  pointer  to	a  csv_parser structure.  This
       structure contains housekeeping information such as the	current	 state
       of  the	parser,	 the  buffer,  current	size  and  position, etc.  The
       csv_init() function returns 0 on success	 and  a	 non-zero  value  upon
       failure.	  csv_init()  will  fail if the pointer passed to it is a null
       pointer.	 The options argument specifies the parser options, these  may
       be changed later with the csv_set_opts() function.

       OPTIONS

	      CSV_STRICT
		     Enables strict mode.

	      CSV_REPALL_NL
		     Causes  each  instance  of	 a carriage return or linefeed
		     outside of a record to be reported.

	      CSV_STRICT_FINI
		     Causes  unterminated   quoted   fields   encountered   in
		     csv_fini() to cause a parsing error (see below).

	      CSV_APPEND_NULL
		     Will  cause all fields to be nul-terminated when provided
		     to cb1, introduced in 3.0.0.

	      CSV_EMPTY_IS_NULL
		     Will cause NULL to be passed as the first argument to cb1
		     for empty, unquoted, fields.  Empty means consisting only
		     of either spaces and tabs or the values defined by the  a
		     custom   function	registered  via	 csv_set_space_func().
		     Added in 3.0.3.

       Multiple options can be specified by OR-ing them together.

       csv_parse() is the function that does the actual parsing,  it  takes  6
       arguments:

	      p is a pointer to an initialized struct csv_parser.

	      s	 is  a	pointer	 to the data to read in, such as a dynamically
	      allocated region of memory containing data read in from  a  call
	      to fread().

	      len is the number of bytes of data to process.

	      cb1  is  a  pointer to the callback function that will be called
	      from csv_parse() after an entire field has been read.  cb1  will
	      be  called  with a pointer to the parsed data (which is NOT nul-
	      terminated unless the CSV_APPEND_NULL option is set), the number
	      of  bytes	 in  the  data,	 and  the  pointer  that was passed to
	      csv_parse().

	      cb2 is a pointer to the callback function that  will  be	called
	      when  the end of a record is encountered, it will be called with
	      the character that caused the record to end, cast to an unsigned
	      char,  or	 -1  if called from csv_fini, and the pointer that was
	      passed to csv_init().

	      data is a pointer to user-defined data that will	be  passed  to
	      the callback functions when invoked.

	      cb1  and/or  cb2	may  be NULL in which case no function will be
	      called for the associated actions.  data may also	 be  NULL  but
	      the  callback  functions	must be prepared to handle receiving a
	      null pointer.

       By default cb2 is not called when rows that do not contain  any	fields
       are encountered.	 This behavior is meant to accomodate files using only
       either a linefeed or a carriage return as  a  record  seperator	to  be
       parsed  properly	 while at the same time being able to parse files with
       rows terminated by multiple characters from  resulting  in  blank  rows
       after  each actual row of data (for example, processing a text CSV file
       created that was created on a Windows machine on a Unix machine).   The
       CSV_REPALL_NL  option  will  cause cb2 to be called once for every car‐
       raige return or linefeed encountered outside of a field.	 cb2 is called
       with the character that prompted the call to the function, , cast to an
       unsigned char, either CSV_CR for carriage return, CSV_LF for  linefeed,
       or  -1 for record termination from a call to csv_fini() (see below).  A
       carriage return or linefeed within a non-quoted field always marks both
       the  end of the field and the row.  Other characters can be used as row
       terminators  and	 thus  be  provided  as	 an  argument  to  cb2	 using
       csv_set_space_func().

       Note: The first parameter of the cb1 function is void *, not const void
       *; the pointer passed to the callback function is actually a pointer to
       the  entry buffer inside the csv_parser struct, this data may safely be
       modified from the callback function (or any function that the  callback
       function	 calls) but you must not attempt to access more than len bytes
       and you should not access the data after the callback function  returns
       as  the	buffer	is dynamically allocated and its location and size may
       change during calls to csv_parse().

       Note: Different callback functions may safely be specified during  each
       call to csv_parse() but keep in mind that the callback functions may be
       called many times during a single call to csv_parse() depending on  the
       amount of data being processed in a given call.

       csv_parse() returns the number of bytes processed, on a successful call
       this will be len, if it is less than len	 an  error  has	 occured.   An
       error  can occur, for example, if there is insufficient memory to store
       the contents of the current field in the entry buffer.	An  error  can
       also  occur  if	malformed  data is encountered while running in strict
       mode.

       The csv_error() function can be used to determine what the error is and
       the  csv_strerror()  function can be used to provide a textual descrip‐
       tion of the error. csv_error() takes a single argument, a pointer to  a
       struct  csv_parser,  and returns one of the following values defined in
       csv.h:

	      CSV_EPARSE   A parse error has occured while in strict mode

	      CSV_ENOMEM   There was not enough	 memory	 while	attempting  to
	      increase the entry buffer for the current field

	      CSV_ETOOBIG  Continuing  to  process  the	 current  field	 would
	      require a buffer of more than SIZE_MAX bytes

       The  value  passed  to  csv_strerror()  should  be  one	returned  from
       csv_error().   The  return  value  of  csv_strerror() is a pointer to a
       static string. The pointer may be used for the entire lifetime  of  the
       program	and the contents will not change during execution but you must
       not attempt to modify the string it points to.

       When you have finished submitting data to csv_parse(), you need to call
       the csv_fini() function.	 This function will call the cb1 function with
       any remaining data in the entry buffer (if there is any) and  call  the
       cb2  function  unless we are already at the end of a row (the last byte
       processed was a newline character for example).	It  is	neccessary  to
       call  this function because the file being processed might not end with
       a carriage return or newline but the data that has been read in to this
       point  still needs to be submitted to the callback routines.  If cb2 is
       called from within csv_fini() it will be because the row was not termi‐
       nated  with a newline sequence, in this case cb2 will be called with an
       argument of -1.

       Note: A call to csv_fini implicitly ends the field  current  field  and
       row.   If the last field processed is a quoted field that ends before a
       closing quote is encountered, no error will  be	reported  by  default,
       even  if	 CSV_STRICT  is	 specified.   To cause csv_fini() to report an
       error in such a case, set the CSV_STRICT_FINI option  (new  in  version
       1.0.1) in addition to the CSV_STRICT option.

       csv_fini()  also	 reinitializes the parser state so that it is ready to
       be used on the next file or set of data.	 csv_fini() does not alter the
       current buffer size. If the last set of data that was being parsed con‐
       tained a very large field that increased the size of  the  buffer,  and
       you  need  to  free  that  memory  before  continuing,  you  must  call
       csv_free(), you do not need to call csv_init() again after  csv_free().
       Like  csv_parse,	 the  callback functions provided to csv_fini() may be
       NULL.  csv_fini() returns 0 on success and a non-zero value if you pass
       it a null pointer.

       After  calling  csv_fini()  you	may  continue  to  use the same struct
       csv_parser pointer without reinitializing it (in fact you must not call
       csv_init()  with	 an  initialized csv_parser object or the memory allo‐
       cated for the original structure will be lost).

       When you are finished using the csv_parser  object  you	can  free  any
       dynamically  allocated memory associated with it by calling csv_free().
       You may call csv_free() at any time, it need not be preceded by a  call
       to  csv_fini().	 You  must only call csv_free() on a csv_parser object
       that has been initialized with a successful call to csv_init().

   WRITING DATA
       libcsv provides two functions to transform raw data into CSV  formatted
       data:  the  csv_write()	function which writes the result to a provided
       buffer, and the csv_fwrite() function which  writes  the	 result	 to  a
       file.   The  functionality  of both functions is straight-forward, they
       write out a single field including the opening and closing  quotes  and
       escape each encountered quote with another quote.

       The  csv_write()	 function takes a pointer to a source buffer (src) and
       processes at most src_size characters from src.	csv_write() will write
       at  most dest_size characters to dest and returns the number of charac‐
       ters that would have been written if dest was large enough.   This  can
       be  used	 to  determine if all the characters were written and, if not,
       how large dest needs to be to write out all of the  data.   csv_write()
       may  be	called with a null pointer for the dest argument in which case
       no data is written but the size required to write out the data will  be
       returned.   The	space  needed to write out the data is the size of the
       data + number of quotes appearing in data (each one will be escaped)  +
       2  (the	leading and terminating quotes).  csv_write() and csv_fwrite()
       always surround the output data with quotes.  If src_size is very large
       (SIZE_MAX/2  or greater) it is possible that the number of bytes needed
       to represent the data, after inserting escaping quotes, will be greater
       than  SIZE_MAX.	 In  such a case, csv_write will return SIZE_MAX which
       should be interpreted as meaning the data is too large to  write	 to  a
       single field.  The csv_fwrite() function is not similiarly limited.

       csv_fwrite()  takes  a  FILE  pointer (which should have been opened in
       binary mode) and converts and writes the data pointed to by src of size
       src_size.   It returns 0 on success and EOF if there was an error writ‐
       ing to the file.	 csv_fwrite() doesn't provide the number of characters
       processed  or  written.	 If  this  functionality  is required, use the
       csv_write() function combined with fwrite().

       csv_write2() and csv_fwrite2() work similiarly but take	an  additional
       argument, the quote character to use when composing the field.

   CUSTOMIZING THE PARSER
       The  csv_set_delim()  and  csv_set_quote() functions provide a means to
       change the characters that the parser will consider the	delimiter  and
       quote  characters  respetively, cast to unsigned char.  csv_get_delim()
       and csv_get_delim() return the current delimiter and  quote  characters
       respectively.   When  csv_init()	 is  called  the  delimiter  is set to
       CSV_COMMA and the quote to CSV_QUOTE.  Note that the rest  of  the  CSV
       conventions  still  apply  when	these functions are used to change the
       delimiter and/or quote characters,  fields  containing  the  new	 quote
       character  or  delimiter	 must  be  quoted and quote characters must be
       escaped with an immediately preceeding instance of the same  character.
       Additionally,  the csv_set_space_func() and csv_set_term_func() allow a
       user-defined function to be provided which will be used determine  what
       constitutes  a space character and what constitutes a record terminator
       character.  The space characters determine which characters are removed
       from  the  beginning  and  end  of non-quoted fields and the terminator
       characters govern when a record ends.  When csv_init() is  called,  the
       effect  is  as if these functions were each called with a NULL argument
       in which case no function is called and CSV_SPACE and CSV_TAB are  used
       for  space  characters,	and  CSV_CR and CSV_LF are used for terminator
       characters.

       csv_set_realloc_func() can be used to set the function that  is	called
       when the internal buffer needs to be resized, only realloc, not malloc,
       is used internally; the default is to use the  standard	realloc	 func‐
       tion.  Likewise, csv_set_free_func() is used to set the function called
       to free the internal buffer, the default is the standard free function.

       csv_get_blk_size() and csv_set_blk_size() can be used to	 get  and  set
       the  block  size	 of  the  parser  respectively.	 The block size if the
       amount of extra memory allocated every time the internal	 buffer	 needs
       to be increased, the default is 128.  csv_get_buffer_size() will return
       the current number of bytes allocated for the internal buffer.

THE CSV FORMAT
       Although quite prevelant there is  no  standard	for  the  CSV  format.
       There are however, a set of traditional conventions used by many appli‐
       cations.	 libcsv follows the conventions described  at  http://www.cre‐
       ativyst.com/Doc/Articles/CSV/CSV01.htm  which  seem to reflect the most
       common usage of the format, namely:

	      Fields are seperated with commas.

	      Rows are delimited by newline sequences (see below).

	      Fields may be surrounded with quotes.

	      Fields that contain comma, quote, or newline characters MUST  be
	      quoted.

	      Each instance of a quote character must be escaped with an imme‐
	      diately preceding quote character.

	      Leading and trailing spaces and tabs are removed from non-quoted
	      fields.

	      The final line need not contain a newline sequence.

       In  strict  mode, any detectable violation of these rules results in an
       error.

       RFC 4180 is an informational memo which attempts to  document  the  CSV
       format, especially with regards to its use as a MIME type.  There are a
       several parts of the description documented in this memo	 which	either
       do not accurately reflect widely used conventions or artificially limit
       the usefulness of the format.  The  differences	between	 the  RFC  and
       libcsv are:

	      "Each  line  should contain the same number of fields throughout
	      the file"
		     libcsv doesn't care if every record contains a  different
		     number  of	 fields,  such	a  restriction could easily be
		     enforced by the application itself if desired.

	      "Spaces are considered  part  of	a  field  and  should  not  be
	      ignored"
		     Leading  and  trailing spaces that are part of non-quoted
		     fields are ignored as this is  by	far  the  most	common
		     behavior and expected by many applications.

		     abc ,  def

		     is considered equivalent to:

		     "abc", "def"

	      "The last field in the record must not be followed by a comma"
		     The  meaning  of  this  statement is not clear but if the
		     last character of a record is a comma, libcsv will inter‐
		     pret that as a final empty field, i.e.:

		     "abc", "def",

		     will be interpreted as 3 fields, equivalent to:

		     "abc", "def", ""

	      RFC  4180 limits the allowable characters in a CSV field, libcsv
	      allows any character to  be  present  in	a  field  provided  it
	      adheres  to the conventions mentioned above.  This makes it pos‐
	      sible to store binary data in CSV format, an attribute that many
	      application rely on.

	      RFC 4180 states that a Carriage Return plus Linefeed combination
	      is used to delimit records, libcsv  allows  any  combination  of
	      Carriage	Returns	 and Linefeeds to signify the end of a record.
	      This is to increase portability among systems that use different
	      combinations to denote a newline sequence.

PARSING MALFORMED DATA
       libcsv  should  correctly parse any CSV data that conforms to the rules
       discussed above.	 By default, however,  libcsv  will  also  attempt  to
       parse  malformed	 CSV  data such as data containing unescaped quotes or
       quotes within non-quoted fields.	 For example:

       a"c, "d"f"

       would be parsed equivalently to the correct form:

       "a""c", "d""f"

       This is often desirable as there are  some  applications	 that  do  not
       adhere  to the specifications previously discussed.  However, there are
       instances where malformed CSV data is ambigious, namely when a comma or
       newline is the next non-space character following a quote such as:

       "Sally said "Hello", Wally said "Goodbye""

       This could either be parsed as a single field containing the data:

       Sally said "Hello", Wally said "Goodbye"

       or as 2 seperate fields:

       Sally said "Hello and Wally said "Goodbye""

       Since  the  data	 is  malformed,	 there	is no way to know if the quote
       before the comma is meant to be a literal quote or if it signifies  the
       end  of	the field.  This is of course not an issue for properly formed
       data as all quotes must be escaped.  libcsv will parse this example  as
       2 seperate fields.

       libcsv  provides a strict mode that will return with a parse error if a
       quote is seen inside a non-quoted field or if a	non-escaped  quote  is
       seen whose next non-space character isn't a comma or newline sequence.

PARSER DETAILS
       A field is considered quoted if the first non-space character for a new
       field is a quote.

       If a quote is encountered in a quoted  field  and  the  next  non-space
       character  is a comma, the field ends at the closed quote and the field
       data is submitted when the comma is encountered.	 If the next non-space
       character  after	 a quote is a newline character, the row has ended and
       the field data is submitted and the end of row is  signalled  (via  the
       appropriate  callback  function).   If two quotes are immediately adja‐
       cent, the first one is interpreted as escaping the second one  and  one
       quote  is written to the field buffer.  If the next non-space character
       following a quote is anything else, the quote is interpreted as a  non-
       escaped	literal quote and it and what follows are written to the field
       buffer, this would cause a parse error in strict mode.

       Example 1
       "abc"""
       Parses as: abc"
       The first quote marks the field as quoted, the second quote escapes the
       following  quote	 and  the last quote ends the field.  This is valid in
       both strict and non-strict modes.

       Example 2
       "ab"c
       Parses as: ab"c
       The first qute marks the field as quoted, the second quote is taken  as
       a  literal  quote since the next non-space character is not a comma, or
       newline and the quote is not escaped.  The last quote  ends  the	 field
       (assuming there is a newline character following).  A parse error would
       result upon seeing the character c in strict mode.

       Example 3
       "abc" "
       Parses as: abc"
       In this case, since the next non-space character following  the	second
       quote  is not a comma or newline character, a literal quote is written,
       the space character after is part of the field, and the last quote ter‐
       minated	the field.  This demonstrates the fact that a quote must imme‐
       diately precede another quote to escape it.  This would	be  a  strict-
       mode violation as all quotes are required to be escaped.

       If the field is not quoted, any quote character is taken as part of the
       field data, any comma terminated the field, and any  newline  character
       terminated the field and the record.

       Example 4
       ab""c
       Parses as: ab""c
       Quotes  are not considered special in non-quoted fields.	 This would be
       a strict mode violation since quotes may not exist in non-quoted fields
       in strict mode.

EXAMPLES
       The  following  example prints the number of fields and rows in a file.
       This is a simplified version of the csvinfo  program  provided  in  the
       examples	 directory.   Error  checking  not  related to libcsv has been
       removed for clarity, the csvinfo program also provides  an  option  for
       enabling strict mode and handles multiple files.

	      #include <stdio.h>
	      #include <string.h>
	      #include <errno.h>
	      #include <stdlib.h>
	      #include "libcsv/csv.h"

	      struct counts {
		long unsigned fields;
		long unsigned rows;
	      };

	      void cb1 (void *s, size_t len, void *data) {
		((struct counts *)data)->fields++; }
	      void cb2 (int c, void *data) {
		((struct counts *)data)->rows++; }

	      int main (int argc, char *argv[]) {
		FILE *fp;
		struct csv_parser p;
		char buf[1024];
		size_t bytes_read;
		struct counts c = {0, 0};

		if (csv_init(&p, 0) != 0) exit(EXIT_FAILURE);
		fp = fopen(argv[1], "rb");
		if (!fp) exit(EXIT_FAILURE);

		while ((bytes_read=fread(buf, 1, 1024, fp)) > 0)
		  if (csv_parse(&p, buf, bytes_read, cb1, cb2, &c) != bytes_read) {
		    fprintf(stderr, "Error while parsing file: %s\n",
		    csv_strerror(csv_error(&p)) );
		    exit(EXIT_FAILURE);
		  }

		csv_fini(&p, cb1, cb2, &c);

		fclose(fp);
		printf("%lu fields, %lu rows\n", c.fields, c.rows);

		csv_free(&p);
		exit(EXIT_SUCCESS);
	      }

       See the examples directory for several complete example programs.

AUTHOR
       Written by Robert Gamble.

BUGS
       Please send questions, comments, bugs, etc. to:

	       rgamble@users.sourceforge.net

				9 January 2013				CSV(3)
[top]

List of man pages available for DragonFly

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net