dirfile-encoding man page on OpenSuSE

dirfile-encoding(5)		 DATA FORMATS		   dirfile-encoding(5)

NAME
       dirfile-encoding — dirfile database encoding schemes

DESCRIPTION
       The Dirfile Standards indicate that RAW fields defined in the database
       are accompanied by binary files containing the field data in the
       specified simple data type.  In certain situations, it may be
       advantageous to convert the binary files in the database into a more
       convenient form.  This is accomplished by encoding the binary file
       into the alternate form.  A common use case for encoding a binary file
       is to compress it to save disk space.  Only data is modified by an
       encoding scheme.  Database metadata is never encoded.

       Support for encoding schemes is optional.  An implementation need not
       support any particular encoding scheme, or may only support certain
       operations with it, but should expect to encounter unknown encoding
       schemes and fail gracefully in such situations.

       Additionally, how a particular encoding is implemented is not specified
       by the Dirfile Standards, but, for purposes of interoperability, all
       dirfile implementations are encouraged to support the encoding
       implementation used by the GetData dirfile reference implementation,
       elaborated below.

       An encoding scheme is local to the particular format specification
       fragment in which it is indicated.  This allows a single dirfile to
       have binary files stored using multiple encodings, by defining those
       files in multiple fragments.

       The rest of this manual page discusses specifics of the encoding
       framework implemented in the GetData library, and does not constitute
       part of the Dirfile Standards.

THE GETDATA ENCODING FRAMEWORK
       The GetData library provides an encoding framework which abstracts
       binary file I/O, allowing for generic support for a wide variety of
       encoding schemes.  Functions which may make use of the encoding
       framework are:

              gd_add(3), gd_add_raw(3), gd_add_spec(3), gd_alter_encoding(3),
              gd_alter_endianness(3), gd_alter_frameoffset(3),
              gd_alter_entry(3), gd_alter_raw(3), gd_alter_spec(3),
              gd_getdata(3), gd_move(3), gd_nframes(3), gd_putdata(3), and
              gd_rename(3).

       Most of the encodings supported by GetData are implemented through
       external libraries which handle the actual file I/O and data
       translation.  All such libraries are optional; a build of GetData
       which omits an external library will lack support for the associated
       encoding scheme.  In this case, GetData will still properly identify
       the encoding scheme, but attempts to use GetData for file I/O via the
       encoding will fail with the GD_E_UNSUPPORTED error code.

       GetData discovers the encoding scheme of a particular RAW field by
       noting the filename extension of files associated with the field.
       Binary files which form an unencoded dirfile have no file extension.
       The file extensions used by the other encodings are noted below.
       Encoding discovery proceeds by searching for files with the known list
       of file extensions (in an unspecified order) and stopping when the
       first successful match is made.  Because of this, when a field has
       multiple data files with different, supported file extensions which
       could legitimately be associated with it, the encoding scheme
       discovered by GetData is not well defined.
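
       The discovery procedure can be sketched in C.  This is a minimal
       illustration under stated assumptions, not GetData's actual code: the
       function detect_encoding(), the enum encoding, and the extension table
       are hypothetical; the ZIP-based encodings are left out because their
       data live inside a shared archive rather than in per-field files; and
       the table order is arbitrary, mirroring the unspecified search order.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical encoding identifiers (not GetData's own constants). */
enum encoding {
    ENC_UNKNOWN = -1, ENC_NONE, ENC_TEXT, ENC_BZIP2, ENC_GZIP,
    ENC_LZMA, ENC_SIE, ENC_SLIM
};

/* Hypothetical extension table; "" means an unencoded file.  The order
 * is arbitrary, just as GetData's search order is unspecified. */
static const struct {
    enum encoding enc;
    const char *ext;
} known_exts[] = {
    { ENC_NONE, ""     }, { ENC_TEXT, ".txt"  }, { ENC_BZIP2, ".bz2" },
    { ENC_GZIP, ".gz"  }, { ENC_LZMA, ".xz"   }, { ENC_LZMA, ".lzma" },
    { ENC_SIE,  ".sie" }, { ENC_SLIM, ".slm"  },
};

/* Return the encoding of the first "<base><ext>" that exists as a
 * regular file, or ENC_UNKNOWN if no candidate is found. */
enum encoding detect_encoding(const char *base)
{
    char path[4096];
    struct stat sb;

    for (size_t i = 0; i < sizeof known_exts / sizeof known_exts[0]; ++i) {
        int n = snprintf(path, sizeof path, "%s%s", base, known_exts[i].ext);
        if (n < 0 || (size_t)n >= sizeof path)
            continue; /* path would be truncated: skip this candidate */
        if (stat(path, &sb) == 0 && S_ISREG(sb.st_mode))
            return known_exts[i].enc; /* first match wins */
    }
    return ENC_UNKNOWN;
}
```

       Because the loop returns on the first hit, a directory containing both
       field and field.gz would be reported as unencoded by this sketch; a
       different table order would give a different answer, which is the
       ambiguity described above.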

       In addition to raw (unencoded) data, GetData supports eight other
       encoding schemes: text encoding, bzip2 encoding, gzip encoding, lzma
       encoding, sie (sample-index encoding), slim encoding, zzip encoding,
       and zzslim encoding, all discussed below.

       The text encoding and the sample-index encoding are implemented by
       GetData natively and need no external library.  As a result, they are
       always present in the library.

   BZip2 Encoding
       The BZip2 Encoding reads compressed raw binary files using the
       Burrows-Wheeler block sorting text compression algorithm and Huffman
       coding, as implemented in the bzip2 format.  GetData's BZip2 Encoding
       scheme is implemented through the bzip2 compression library written by
       Julian Seward.  GetData's BZip2 Encoding framework currently lacks
       write capabilities; as a result, the BZip2 Encoding does not support
       functions which modify binary data.

       GetData caches an uncompressed megabyte of data at a time to speed up
       access.  A call to gd_nframes(3) requires decompression of the entire
       binary file to determine its uncompressed size, and may take some time
       to complete.  The file extension of the BZip2 Encoding is .bz2.

   GZip Encoding
       The GZip Encoding compresses raw binary files using Lempel-Ziv coding
       (LZ77) as implemented in the gzip format.  GetData's GZip Encoding
       scheme is implemented through the zlib compression library written by
       Jean-loup Gailly and Mark Adler.  All operations are supported by the
       GZip Encoding.  Writes to GZip Encoded data occur out-of-place; that
       is, writing GZip Encoded data requires making a copy of the whole
       binary data file.  A side effect of this is that concurrently reading
       a GZip Encoded Dirfile which is being written to usually does not
       work.

       To speed the operation of gd_nframes(3), the GZip Encoding takes the
       uncompressed size of the file from the gzip footer, which contains the
       file's uncompressed size in bytes, modulo 2**32.  As a result, using a
       field with an (uncompressed) binary file size of 4 GiB or more as the
       reference field will result in the wrong number of frames being
       reported.  The file extension of the GZip Encoding is .gz.
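
       The footer lookup can be sketched as follows.  The gzip format (RFC
       1952) ends each member with a CRC-32 followed by ISIZE, the
       uncompressed length modulo 2**32, stored as a 4-byte little-endian
       integer.  The helpers gzip_isize() and frames_from_bytes() are
       hypothetical; the sketch assumes a single-member gzip file, and the
       frame calculation assumes a reference field with a fixed number of
       samples per frame while ignoring /FRAMEOFFSET.

```c
#include <stdint.h>
#include <stdio.h>

/* Read ISIZE, the last 4 bytes of a gzip file (little-endian): the
 * uncompressed length modulo 2^32.  Returns -1 on I/O error. */
long long gzip_isize(FILE *fp)
{
    unsigned char b[4];

    if (fseek(fp, -4L, SEEK_END) != 0 || fread(b, 1, 4, fp) != 4)
        return -1;
    return (long long)((uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24));
}

/* Frames implied by an uncompressed byte count, assuming spf samples
 * per frame of `size` bytes each (hypothetical; FRAMEOFFSET ignored). */
long long frames_from_bytes(long long nbytes, unsigned spf, unsigned size)
{
    return nbytes / ((long long)spf * size);
}
```

       With 10 samples per frame of 8 bytes each, an 800-byte file yields 10
       frames; an uncompressed file of 4 GiB plus 800 bytes stores the same
       ISIZE and so would also be reported as 10 frames.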

   LZMA Encoding
       The LZMA Encoding reads compressed raw binary files using the
       Lempel-Ziv-Markov chain algorithm (LZMA) as implemented in the xz
       container format.  GetData's LZMA Encoding scheme is implemented
       through the lzma library, part of the XZ Utils suite written by Lasse
       Collin, Ville Koskinen, and Igor Pavlov.  GetData's LZMA Encoding
       framework currently lacks write capabilities; as a result, the LZMA
       Encoding does not support functions which modify binary data.

       As with the BZip2 Encoding, GetData caches an uncompressed megabyte of
       data at a time to speed up access.  A call to gd_nframes(3) requires
       decompression of the entire binary file to determine its uncompressed
       size, and may take some time to complete.  The file extension of the
       LZMA Encoding is .xz or .lzma.

   Sample-Index Encoding
       The Sample-Index Encoding (SIE) compresses raw binary data by replacing
       runs of repeated data, similar to run-length encoding.  SIE files
       contain binary records consisting of a 64-bit sample number followed by
       a datum (the size and format of which is determined by the RAW field's
       data type in the format metadata).  The sample number indicates the
       last sample of the field which has the specified value.  The first
       sample with the value is the sample immediately following the last
       sample covered by the previous record, or sample number zero, for the
       first record.  Sample numbers are relative to any /FRAMEOFFSET
       specified in the Dirfile metadata.  All operations are supported by the
       Sample-Index Encoding.  The file extension of the Sample-Index Encoding
       is .sie.
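
       Expanding SIE records back into samples can be sketched as below.
       This assumes the records have already been read from disk and
       converted to native byte order, that they are sorted by sample number,
       and that the field's data type is float64; struct sie_record and
       sie_expand() are hypothetical names, and /FRAMEOFFSET handling is
       omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded SIE record.  The datum type really follows the RAW
 * field's data type; float64 is assumed here for illustration. */
struct sie_record {
    uint64_t last;  /* last sample number holding this value */
    double   value; /* the repeated datum */
};

/* Expand samples [first, first + n) into out.  Record i covers samples
 * rec[i-1].last + 1 through rec[i].last (record 0 starts at sample 0).
 * Returns the number of samples written, which is less than n if the
 * records end early. */
size_t sie_expand(const struct sie_record *rec, size_t nrec,
                  uint64_t first, size_t n, double *out)
{
    size_t written = 0, i = 0;
    uint64_t s = first;

    while (written < n && i < nrec) {
        if (s > rec[i].last) { /* sample lies past this run: next record */
            ++i;
            continue;
        }
        out[written++] = rec[i].value;
        ++s;
    }
    return written;
}
```

       For example, the records (2, 1.0), (3, 5.0), (6, -1.0) describe the
       seven samples 1, 1, 1, 5, -1, -1, -1.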

   Slim Encoding
       The Slim Encoding reads compressed raw binary files using the slimlib
       compression library written by Joseph Fowler.  The slimlib library was
       developed at Princeton University to compress dirfile-like data.
       GetData's Slim Encoding framework currently lacks write capabilities;
       as a result, the Slim Encoding does not support functions which modify
       binary data.  The file extension of the Slim Encoding is .slm.

       Using the Slim Encoding with GetData may result in unexpected, but
       manageable, memory usage.  See the gd_getdata(3) manual page for
       details.

   Text Encoding
       The Text Encoding replaces the binary data files with 7-bit ASCII files
       containing a decimal text encoding of the data, one sample per line.
       All  operations are supported by the Text Encoding.  The file extension
       of the Text Encoding is .txt.
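
       A round trip through a decimal, one-sample-per-line file can be
       sketched as follows.  This is an illustration only; GetData's exact
       output formatting is not described here.  The helpers text_write()
       and text_read() are hypothetical, the sketch handles float64 samples
       only, and it assumes the C locale so that the output stays 7-bit ASCII
       with a '.' decimal separator.  Printing with %.17g keeps enough
       significant digits for every double to survive the round trip
       unchanged.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write n samples to fp, one decimal value per line (C locale assumed).
 * Returns 0 on success, -1 on a write error. */
int text_write(FILE *fp, const double *v, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (fprintf(fp, "%.17g\n", v[i]) < 0)
            return -1;
    return 0;
}

/* Read up to n samples, one per line; stops at EOF or at the first line
 * that does not start with a number.  Returns the count read. */
size_t text_read(FILE *fp, double *v, size_t n)
{
    char line[128];
    size_t i = 0;

    while (i < n && fgets(line, sizeof line, fp) != NULL) {
        char *end;
        double d = strtod(line, &end);
        if (end == line)
            break; /* not a number */
        v[i++] = d;
    }
    return i;
}
```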

   ZZip Encoding
       The ZZip Encoding reads compressed raw binary files using the DEFLATE
       algorithm as implemented in the PKWARE ZIP archive container format.
       GetData's ZZip Encoding scheme is implemented through the zzip library
       written by Tomi Ollila and Guido Draheim.  The ZZip Encoding framework
       currently lacks write capabilities; as a result, the ZZip Encoding does
       not support functions which modify binary data.

       Unlike most encoding schemes, the ZZip Encoding merges all binary data
       files defined in a given fragment into a single ZIP archive.  The name
       of this archive is raw.zip by default, but a different name may be
       specified using the second parameter to the /ENCODING directive.  For
       example,

              /ENCODING zzip archive

       indicates that the ZIP archive is called archive.zip.  The file
       extension of the ZZip Encoding is .zip.
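
       The archive naming rule can be sketched as a small parser for a single
       /ENCODING line.  The function zip_archive_name() is hypothetical, as is
       the assumption that the ZZSlim Encoding is selected with the keyword
       zzslim; only the default name raw.zip and the .zip suffix come from
       the description above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: from one "/ENCODING <scheme> [<name>]" line, write
 * the ZIP archive filename into buf.  Returns 0 on success, -1 if the line
 * is not a zzip/zzslim directive or buf is too small. */
int zip_archive_name(const char *line, char *buf, size_t len)
{
    char scheme[32], stem[256];
    int got = sscanf(line, "/ENCODING %31s %255s", scheme, stem);

    /* "zzslim" as the ZZSlim keyword is an assumption of this sketch. */
    if (got < 1 || (strcmp(scheme, "zzip") != 0 &&
                    strcmp(scheme, "zzslim") != 0))
        return -1;

    /* The optional second parameter names the archive; default "raw". */
    int n = snprintf(buf, len, "%s.zip", got == 2 ? stem : "raw");
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```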

   ZZSlim Encoding
       The ZZSlim Encoding is a combination of the Slim Encoding and the ZZip
       Encoding.  To create ZZSlim Encoded files, first the raw data are
       compressed using the slim library, and then these slim-compressed files
       are archived (and compressed again) into a ZIP archive.  As with the
       ZZip Encoding, the ZIP archive is raw.zip by default, but a different
       name may be specified with the /ENCODING directive.

       Notably, since the archives have the same name as ZZip Encoded data,
       automatic encoding detection on ZZSlim Encoded data always fails: the
       archives are incorrectly identified as simply ZZip Encoded.  As a
       result, an /ENCODING directive in the format file or else a
       GD_ZZSLIM_ENCODED flag passed to gd_open(3) is required to read ZZSlim
       Encoded data.  The file extension of the ZZSlim Encoding is .zip.

       Using the ZZSlim Encoding with GetData may result in unexpected, but
       manageable, memory usage.  See the gd_getdata(3) manual page for
       details.

AUTHOR
       This manual page was written by D. V. Wiebe <dvw@ketiltrout.net>.

SEE ALSO
       dirfile(5), dirfile-format(5), bzip2(1), gzip(1), xz(1), zlib(3).

Standards Version 9		26 January 2013		   dirfile-encoding(5)