gd_cbopen(3) GETDATA gd_cbopen(3)
NAME
gd_cbopen, gd_open — open or create a dirfile
SYNOPSIS
#include <getdata.h>
DIRFILE* gd_cbopen(const char *dirfilename, unsigned long flags,
gd_parser_callback_t sehandler, void *extra);
DIRFILE* gd_open(const char *dirfilename, unsigned long flags);
DESCRIPTION
The gd_cbopen() function opens or creates the dirfile specified by
dirfilename, returning a DIRFILE object associated with it. Opening a
dirfile will cause the library to read and parse the dirfile's format
specification (see dirfile-format(5)).
If not NULL, sehandler should be a pointer to a function which will be
called whenever a syntax error is encountered while parsing the format
specification. Specify NULL for this parameter if no callback function
is to be used. The caller may use this function to correct the error
or modify the error handling of the format specification parser. See
The Callback Function section below for details on this function. The
extra argument allows the caller to pass data to the callback function.
The pointer will be passed to the callback function verbatim.
The gd_open() function is equivalent to gd_cbopen(), with sehandler and
extra set to NULL.
The flags argument should include one of the access modes: GD_RDONLY
(read-only) or GD_RDWR (read-write), and may also contain zero or more
of the following flags, bitwise-or'd together:
GD_ARM_ENDIAN
GD_NOT_ARM_ENDIAN
Specifies that double precision floating point raw data on disk
are, or are not, stored in the middle-endian format used by old‐
er ARM processors.
These flags only set the default endianness, and will be overridden
when an /ENDIAN directive specifies the byte sex of RAW
fields, unless GD_FORCE_ENDIAN is also specified.
On every platform, one of these flags (GD_NOT_ARM_ENDIAN on all
but middle-ended ARM systems) indicates the native behaviour of
the platform. That symbol will equal zero, and may be omitted.
GD_BIG_ENDIAN
GD_LITTLE_ENDIAN
Specifies the default byte sex of raw data stored on disk to be
either big-endian (most significant byte first) or little-endian
(least significant byte first). Omitting both flags indicates
the default should be the native endianness of the platform.
Unlike the ARM endianness flags above, neither of these symbols
is ever zero. Specifying both these flags together will cause
the library to assume that the endianness of the data is oppo‐
site to that of the native architecture, whatever that might be.
These flags only set the default endianness, and will be overridden
when an /ENDIAN directive specifies the byte sex of RAW
fields, unless GD_FORCE_ENDIAN is also specified.
GD_CREAT
An empty dirfile will be created, if one does not already exist.
This will create both the dirfile directory and an empty format
specification file called format. The directory will have
mode S_IRWXU | S_IRWXG | S_IRWXO (0777), modified by the call‐
er's umask value (see umask(2)). The format file will have mode
S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH
(0666), also modified by the caller's umask.
The owner of the dirfile directory and format file will be the
effective user ID of the caller. Group ownership follows the
rules outlined in mkdir(2).
GD_EXCL
Ensure that this call creates a dirfile: when specified along
with GD_CREAT, the call will fail if the dirfile specified by
dirfilename already exists. If GD_CREAT is not specified, this
flag is ignored. This flag suffers from all the limitations of
the O_EXCL flag as indicated in open(2).
GD_FORCE_ENCODING
Specifies that /ENCODING directives (see dirfile-format(5))
found in the dirfile format specification should be ignored.
The encoding scheme specified in flags will be used instead (see
below).
GD_FORCE_ENDIAN
Specifies that /ENDIAN directives (see dirfile-format(5)) found
in the dirfile format specification should be ignored. All raw
data will be assumed to have the byte sex indicated through the
presence or absence of the GD_ARM_ENDIAN, GD_BIG_ENDIAN, GD_LIT‐
TLE_ENDIAN, and GD_NOT_ARM_ENDIAN flags.
GD_IGNORE_DUPS
If the dirfile format metadata specifies more than one field
with the same name, all but one of them will be ignored by the
parser. Without this flag, parsing would fail with the
GD_E_FORMAT error, possibly resulting in invocation of the reg‐
istered callback function. Which of the duplicate fields is
kept is not specified. As a result, this flag is typically only
useful in the case where identical copies of a field specifica‐
tion line are present.
No indication is given of whether a duplicate field
has been discarded. If finer-grained control is required, the
caller should handle GD_E_FORMAT_DUPLICATE suberrors itself with
an appropriate callback function.
GD_PEDANTIC
Reject dirfiles which don't conform to the Dirfile Standards.
See the Standards Compliance section below for full details.
GD_PERMISSIVE
Allow non-compliant format specification syntax, even when given
along with a conflicting /VERSION directive. See the Standards
Compliance section below for full details.
GD_PRETTY_PRINT
When dirfile metadata are flushed to disk (either explicitly via
gd_metaflush(3), gd_rewrite_fragment(3), or gd_flush(3) or im‐
plicitly by closing the dirfile), an attempt will be made to
create a nicer looking format specification (from a human-read‐
able standpoint). What this explicitly means is not part of the
API, and any particular behaviour should not be relied on. If
the dirfile is opened read-only, this flag is ignored.
GD_TRUNC
If dirfilename specifies an already existing dirfile, it will be
truncated before opening. Since gd_cbopen() decides whether
dirfilename specifies an existing dirfile before attempting to
parse the dirfile, dirfilename is considered to specify an ex‐
isting dirfile if it refers to a directory containing a regular
file called format, regardless of the content or form of that
file.
Truncation occurs by deleting every regular file and symlink in
the specified directory, whether the files were referred to by
the dirfile before truncation or not. Accordingly, this flag
should be used with caution. Unless GD_TRUNCSUB is also speci‐
fied, subdirectories are left untouched. Notably, this opera‐
tion does not consider directories used in /INCLUDE directives.
If the dirfile does not exist, this flag is ignored.
GD_TRUNCSUB
If specified along with GD_TRUNC, truncation will descend into
subdirectories, deleting all regular files and symlinks recur‐
sively. It does not descend into directories pointed to by sym‐
bolic links: in these cases, just the symlink itself is deleted.
If specified without an accompanying GD_TRUNC, this flag is ig‐
nored.
GD_VERBOSE
Specifies that whenever an error is triggered by the library
when working on this dirfile, the corresponding error string,
which can be retrieved by calling gd_error_string(3), should be
written on the caller's standard error stream (stderr(3)) by
GetData. The error string may be prefixed by a string specified
by the caller; see gd_verbose_prefix(3). Without this flag,
GetData writes nothing to standard error. (GetData never writes
to standard output.)
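The effect of the caller's umask on the modes requested by GD_CREAT can be sketched as below. This is a minimal illustration using only POSIX calls, not GetData code; the macro and function names are hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Modes requested by GD_CREAT, as listed above (macro names are hypothetical). */
#define DIRFILE_DIR_MODE    (S_IRWXU | S_IRWXG | S_IRWXO)   /* 0777 */
#define DIRFILE_FORMAT_MODE (S_IRUSR | S_IWUSR | S_IRGRP | \
                             S_IWGRP | S_IROTH | S_IWOTH)   /* 0666 */

/* Permission bits left after the umask clears the bits it names. */
mode_t mode_after_umask(mode_t requested, mode_t mask)
{
    return (mode_t)(requested & ~mask);
}

/* Read the process umask; umask(2) can only be queried by setting it,
 * so the old value is restored immediately. */
mode_t current_umask(void)
{
    mode_t mask = umask(0);
    umask(mask);
    return mask;
}
```

With a umask of 022, for instance, the directory would end up with mode 0755 and the format file with mode 0644.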
Those flags which affect the operation of the library beyond this call
itself may be modified later using the gd_flags(3) function.
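GD_EXCL and GD_TRUNC both depend on whether dirfilename already names a dirfile. The criterion given under GD_TRUNC (a directory containing a regular file called format) can be approximated with stat(2) as in the sketch below. The function name is hypothetical, and the library's own test may differ in detail, for example in how it treats symbolic links.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if `dirname` is a directory containing a regular file
 * called "format", and 0 otherwise.  The content of the file is
 * deliberately not examined. */
int looks_like_dirfile(const char *dirname)
{
    char path[4096];
    struct stat st;
    int n;

    if (stat(dirname, &st) != 0 || !S_ISDIR(st.st_mode))
        return 0;

    n = snprintf(path, sizeof path, "%s/format", dirname);
    if (n < 0 || (size_t)n >= sizeof path)
        return 0;               /* path too long for this sketch */

    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}
```

A caller might run such a check before passing GD_TRUNC, since truncation removes every regular file and symlink in a directory that passes it.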
The flags argument may also be bitwise or'd with one of the following
symbols indicating the default encoding scheme of the dirfile. Like
the endianness flags, the choice of encoding here is ignored if the
encoding is specified in the dirfile itself, unless GD_FORCE_ENCODING is
also specified. If none of these symbols is present, GD_AUTO_ENCODED
is assumed, unless the gd_cbopen() call results in creation or trunca‐
tion of the dirfile. In that case, GD_UNENCODED is assumed. See
dirfile-encoding(5) for details on dirfile encoding schemes.
GD_AUTO_ENCODED
Specifies that the encoding type is not known in advance, but
should be detected by the GetData library. Detection is accom‐
plished by searching for raw data files with extensions appro‐
priate to the encoding scheme. This method will notably fail if
the library is called via putdata(3) to create a previously
non-existent raw field unless a read is first successfully per‐
formed on the dirfile. Once the library has determined the en‐
coding scheme for the first time, it remembers it for subsequent
calls.
GD_BZIP2_ENCODED
Specifies that raw data files are compressed using the Burrows-
Wheeler block sorting text compression algorithm and Huffman
coding, as implemented in the bzip2 format.
GD_GZIP_ENCODED
Specifies that raw data files are compressed using Lempel-Ziv
coding (LZ77) as implemented in the gzip format.
GD_LZMA_ENCODED
Specifies that raw data files are compressed using the Lempel-
Ziv Markov Chain Algorithm (LZMA) as implemented in the xz con‐
tainer format.
GD_SLIM_ENCODED
Specifies that raw data files are compressed using the slimlib
library.
GD_SIE_ENCODED
Specifies that raw data files are sample-index encoded, similar
to run-length encoding, suitable for data that change rarely.
GD_TEXT_ENCODED
Specifies that raw data files are encoded as text files contain‐
ing one data sample per line.
GD_UNENCODED
Specifies that raw data files are not encoded, but written as
simple binary data to disk.
GD_ZZIP_ENCODED
Specifies that raw data files are compressed using the DEFLATE
algorithm. All raw data files for a given fragment are collect‐
ed together and stored in a PKZIP archive called raw.zip.
GD_ZZSLIM_ENCODED
Specifies that raw data files are compressed using a combination
of compression schemes: first files are slim-compressed,
as with the GD_SLIM_ENCODED scheme, and then they are collected
together and compressed (again) into a PKZIP archive called
raw.zip, as in the GD_ZZIP_ENCODED scheme.
Standards Compliance
The latest Dirfile Standards Version which this release of GetData un‐
derstands is provided in the preprocessor macro GD_DIRFILE_STAN‐
DARDS_VERSION defined in getdata.h. GetData is able to open and parse
any dirfile which conforms to this Standards Version, or to any earlier
Version. The dirfile-format(5) manual page lists the changes between
Standards Versions.
The GetData parser can operate in two modes: a permissive mode, in
which much non-Standards-compliant syntax is allowed, and a pedantic
mode, in which the parser adheres strictly to the Standards. The mode
may change during the parsing of a dirfile. If GD_PEDANTIC is passed
to gd_cbopen(), the parser will start parsing the format specification
in pedantic mode, otherwise it will start in permissive mode.
Permissive mode is provided primarily to allow GetData to be used on
dirfiles which conform to no single Standard, but which were accepted
by the GetData parser in previous versions. It is notably lax regard‐
ing reserved field names, and field name characters, the mixing of old
and new data type specifiers, and generally ignores the presence of
/VERSION directives. In read-write mode, permissive mode should be
used with caution, as it can cause unintentional corruption of dirfile
metadata on write, if the heuristics in the parser incorrectly guessed
the intention of non-compliant syntax. In permissive mode, actual syn‐
tax errors are still reported as such.
In pedantic mode, the parser conforms to one specific Standards Ver‐
sion. This target version may change any number of times in the course
of scanning a single format specification. If invoked using the
GD_PEDANTIC flag, the parser will start in pedantic mode with a target
version equal to GD_DIRFILE_STANDARDS_VERSION. Whenever a /VERSION di‐
rective is encountered in the format specification, the target version
is changed to the Standards Version specified. When encountering a
/VERSION directive in permissive mode, the parser will switch to pedan‐
tic mode, unless the GD_PERMISSIVE flag was passed to gd_cbopen(), in
which case no mode switch will take place.
Independent of the mode of the parser when parsing the format specifi‐
cation, GetData will calculate a list of Standards Versions to which
the parsed metadata conform. The gd_dirfile_standards(3) function
can provide this information, and also specify the desired Standards
Version for writing format metadata back to disk.
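The mode switching described in this section can be modelled as a small state machine. The sketch below is only a model of the documented behaviour, not GetData's parser; every name in it is hypothetical, and the case where GD_PEDANTIC and GD_PERMISSIVE are passed together is not covered because the text above does not define it.

```c
#include <stdbool.h>

enum parser_mode { MODE_PERMISSIVE, MODE_PEDANTIC };

struct parser_state {
    enum parser_mode mode;
    int  target_version;   /* only meaningful in pedantic mode */
    bool permissive_flag;  /* models GD_PERMISSIVE */
};

/* Initial state: pedantic (targeting the latest known Standards
 * Version) if GD_PEDANTIC was given, permissive otherwise. */
struct parser_state parser_start(bool pedantic_flag, bool permissive_flag,
                                 int latest_version)
{
    struct parser_state s;
    s.mode = pedantic_flag ? MODE_PEDANTIC : MODE_PERMISSIVE;
    s.target_version = latest_version;
    s.permissive_flag = permissive_flag;
    return s;
}

/* Effect of meeting a /VERSION directive: the target version follows
 * the directive, and a permissive parser turns pedantic unless
 * GD_PERMISSIVE was given. */
void parser_on_version(struct parser_state *s, int version)
{
    s->target_version = version;
    if (s->mode == MODE_PERMISSIVE && !s->permissive_flag)
        s->mode = MODE_PEDANTIC;
}
```

In this model a dirfile opened without either flag is parsed permissively up to its first /VERSION directive and pedantically from then on.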
The Callback Function
The caller-supplied sehandler function is called whenever the format
specification parser encounters a syntax error (i.e. whenever it would
return the GD_E_FORMAT error). This callback may be used to correct
the error, or to tell the parser how to recover from it.
This function should take two pointers as arguments, and return an int:
int sehandler(gd_parser_data_t *pdata, void *extra);
The extra parameter is the pointer supplied to gd_cbopen(), passed ver‐
batim to this function. It can be used to pass caller data to the
callback. GetData does not inspect this pointer, not even to check its
validity. If the caller needs to pass no data to the callback, it may
be NULL.
The gd_parser_data_t type is a structure with at least the following
members:
typedef struct {
const DIRFILE* dirfile;
int suberror;
int linenum;
const char* filename;
char* line;
size_t buflen;
...
} gd_parser_data_t;
The pdata->dirfile member will be a pointer to a DIRFILE object suit‐
able only for passing to gd_error_string(). Notably, the caller should
not assume this pointer will be the same as the pointer eventually re‐
turned by gd_cbopen(), nor that it will be valid after the callback
function returns.
The pdata->suberror parameter will be one of the following symbols in‐
dicating the type of syntax error encountered:
GD_E_FORMAT_ALIAS
The parent specified for a meta field was an alias.
GD_E_FORMAT_BAD_LINE
The line was indecipherable. Typically this means that the line
contained neither a reserved word, nor a field type.
GD_E_FORMAT_BAD_NAME
The specified field name was invalid.
GD_E_FORMAT_BAD_SPF
The samples-per-frame of a RAW field was out-of-range.
GD_E_FORMAT_BAD_TYPE
The data type of a RAW field was unrecognised.
GD_E_FORMAT_BITNUM
The first bit of a BIT field was out-of-range.
GD_E_FORMAT_BITSIZE
The last bit of a BIT field was out-of-range.
GD_E_FORMAT_CHARACTER
An invalid character was found in the line, or a character es‐
cape sequence was malformed.
GD_E_FORMAT_DUPLICATE
The specified field name already exists.
GD_E_FORMAT_ENDIAN
The byte sex specified by an /ENDIAN directive was unrecognised.
GD_E_FORMAT_LITERAL
An unexpected character was encountered in a complex literal.
GD_E_FORMAT_LOCATION
The parent of a metafield was defined in another fragment.
GD_E_FORMAT_META_META
An attempt was made to use a metafield as the parent to a new
metafield.
GD_E_FORMAT_METARAW
An attempt was made to add a RAW metafield.
GD_E_FORMAT_MPLEXVAL
An MPLEX specification has a negative period.
GD_E_FORMAT_N_FIELDS
The number of fields of a LINCOM field was out-of-range.
GD_E_FORMAT_N_TOK
An insufficient number of tokens was found on the line.
GD_E_FORMAT_NO_FIELD
The parent of a metafield was not found.
GD_E_FORMAT_NUMBITS
The number of bits of a BIT field was out-of-range.
GD_E_FORMAT_PROTECT
The protection level specified by a /PROTECT directive was un‐
recognised.
GD_E_FORMAT_RES_NAME
A field was specified with the reserved name INDEX (or with the
reserved name FILEFRAM in a dirfile conforming to Standards Ver‐
sion 5 or earlier).
GD_E_FORMAT_UNTERM
The last token of the line was unterminated.
GD_E_FORMAT_WINDOP
The operation in a WINDOW field was not recognised.
The pdata->filename and pdata->linenum members contain the pathname of the
fragment and line number where the syntax error was encountered. The
first line in a fragment is line one.
The pdata->line member contains a copy of the line containing the syn‐
tax error. This line may be freely modified by the callback function.
It will then be reparsed if the callback function returns the symbol
GD_SYNTAX_RESCAN (see below). The size of the memory buffer (which may
be greater than the length of the actual string) is provided in pda‐
ta->buflen, and space is available for at least GD_MAX_LINE_LENGTH
bytes. A larger buffer may be used if desired, by assigning a pointer
to the new buffer of the desired length to pdata->line. The new buffer
should be allocated with malloc(3). It will be freed by the parser.
Do not call free(3) or realloc(3) on the original pointer passed to the
callback as pdata->line: it, too, will be freed by the parser.
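The buffer rules above suggest a helper along the following lines for use inside a callback. It is a sketch under the stated rules only; the function name is hypothetical, and it is written against a plain char ** so that it does not depend on getdata.h. A callback would call it as install_corrected_line(&pdata->line, pdata->buflen, fixed) before returning GD_SYNTAX_RESCAN.

```c
#include <stdlib.h>
#include <string.h>

/* Put the corrected text `fixed` where the parser will rescan it.
 * If it fits in the existing buffer of `buflen` bytes it is copied in
 * place; otherwise a new buffer is obtained with malloc() and stored
 * in *line.  The original buffer is never freed or reallocated here,
 * since the parser remains responsible for it.
 * Returns 0 on success, -1 if memory could not be allocated. */
int install_corrected_line(char **line, size_t buflen, const char *fixed)
{
    size_t need = strlen(fixed) + 1;    /* include the terminating NUL */
    char *bigger;

    if (need <= buflen) {
        memmove(*line, fixed, need);    /* memmove: fixed may overlap *line */
        return 0;
    }

    bigger = malloc(need);
    if (bigger == NULL)
        return -1;
    memcpy(bigger, fixed, need);
    *line = bigger;                     /* old pointer deliberately left alone */
    return 0;
}
```

On allocation failure the callback would probably do best to return GD_SYNTAX_ABORT rather than ask for a rescan of the uncorrected line.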
The callback function should return one of the following symbols, which
tells the parser how to subsequently handle the error:
GD_SYNTAX_ABORT
The parser should immediately abort parsing the format specifi‐
cation and fail with the error GD_E_FORMAT. This is the default
behaviour, if no callback function is provided (or if the parser
is invoked by calling gd_open()).
GD_SYNTAX_CONTINUE
The parser should continue parsing the format specification.
However, once parsing has finished, the parser will fail with
the error GD_E_FORMAT, even if no further syntax errors are en‐
countered. This behaviour may be used by the caller to identify
all lines containing syntax errors in the format specification,
instead of just the first one.
GD_SYNTAX_IGNORE
The parser should ignore the line containing the syntax error
completely, and carry on parsing the format specification. If
no further errors are encountered, the dirfile will be success‐
fully opened.
GD_SYNTAX_RESCAN
The parser should rescan the line argument, which replaces the
line which originally contained the syntax error. The line is
assumed to have been corrected by the callback function. If the
line still contains a syntax error, the callback function will
be called again.
Note: the line is not corrected on disk; however, the caller may
subsequently correct the fragment on disk by calling gd_re‐
write_fragment(3).
The callback function handles only syntax errors. The parser may still
abort early, if a different kind of library error is encountered. Fur‐
thermore, although a line may contain more than one syntax error, the
parser will only ever report one syntax error per line, even if the
callback function returns GD_SYNTAX_CONTINUE.
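As an illustration of the return codes, the following sketch shows a callback that drops duplicate field definitions and records every other syntax error while letting the parser continue. The structure syntax_tally and the function name are hypothetical. The #ifndef branch exists only so that the sketch compiles where getdata.h is not installed; its numeric values are placeholders, not the library's.

```c
#include <stddef.h>
#include <stdio.h>

#if defined(__has_include)
#  if __has_include(<getdata.h>)
#    include <getdata.h>
#    define SKETCH_HAVE_GETDATA 1
#  endif
#endif

#ifndef SKETCH_HAVE_GETDATA
/* Stand-ins for compiling without GetData; values are placeholders. */
typedef struct {
    int suberror;
    int linenum;
    const char *filename;
    char *line;
    size_t buflen;
} gd_parser_data_t;
enum { GD_SYNTAX_ABORT, GD_SYNTAX_RESCAN, GD_SYNTAX_IGNORE,
       GD_SYNTAX_CONTINUE, GD_E_FORMAT_DUPLICATE = 100 };
#endif

/* Caller data handed to the callback through the extra pointer. */
struct syntax_tally {
    int duplicates_dropped;
    int other_errors;
};

int tally_syntax_errors(gd_parser_data_t *pdata, void *extra)
{
    struct syntax_tally *tally = extra;

    if (pdata->suberror == GD_E_FORMAT_DUPLICATE) {
        /* Skip the duplicate line; the open may still succeed. */
        tally->duplicates_dropped++;
        return GD_SYNTAX_IGNORE;
    }

    /* Report the error and keep scanning so that later errors are
     * seen too; the open will then fail with GD_E_FORMAT. */
    tally->other_errors++;
    fprintf(stderr, "%s:%d: syntax error\n",
            pdata->filename, pdata->linenum);
    return GD_SYNTAX_CONTINUE;
}
```

Such a callback could be registered with gd_cbopen(dirfilename, GD_RDONLY, tally_syntax_errors, &tally), where tally is a struct syntax_tally initialised to zero. Unlike GD_IGNORE_DUPS, this lets the caller learn how many duplicates were discarded.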
RETURN VALUE
A call to gd_cbopen() or gd_open() always returns a pointer to a newly
allocated DIRFILE object, except in instances when it is unable to al‐
locate memory for the DIRFILE object itself, in which case it will re‐
turn NULL. The DIRFILE object is an opaque structure containing the
parsed dirfile metadata. If an error occurred, the dirfile error will
be set to a non-zero error value. The DIRFILE object will also be in‐
ternally flagged as invalid. Possible error values are:
GD_E_ACCMODE
The library was asked to create or truncate a dirfile opened
read-only (i.e. GD_CREAT or GD_TRUNC was specified in flags
along with GD_RDONLY).
GD_E_ALLOC
The library was unable to allocate memory.
GD_E_BAD_REFERENCE
The reference field specified by a /REFERENCE directive in the
format specification (see dirfile-format(5)) was not found, or
was not a RAW field.
GD_E_CALLBACK
The registered callback function, sehandler, returned an un‐
recognised response.
GD_E_CREAT
The library was unable to create the dirfile.
GD_E_EXISTS
The dirfile already exists and both GD_CREAT and GD_EXCL were
specified.
GD_E_FORMAT
A syntax error occurred in the format specification. See also
The Callback Function section above.
GD_E_LINE_TOO_LONG
The parser encountered a line in the format specification
longer than it was able to deal with. Lines are limited by the
storage size of ssize_t. On 32-bit systems, this limits format
specification lines to 2**31 bytes. The limit is larger on
64-bit systems.
GD_E_OPEN
The dirfile format specification could not be opened, or
dirfilename does not specify a valid dirfile.
GD_E_OPEN_FRAGMENT
A file specified in an /INCLUDE directive could not be opened.
GD_E_TRUNC
The library was unable to truncate the dirfile.
The dirfile error may be retrieved by calling gd_error(3). A descrip‐
tive error string for the last error encountered can be obtained from a
call to gd_error_string(3). When finished with it, a caller should de-
allocate the DIRFILE object by calling gd_close(3), or gd_discard(3),
even if the open failed.
BUGS
When working with dirfiles conforming to Standards Versions 4 and ear‐
lier (before the introduction of the /ENDIAN directive), GetData as‐
sumes the dirfile has native byte sex, even though, officially, these
early Standards stipulated data to be little-endian. This is necessary
since, in the absence of an explicit /VERSION directive, it is often
impossible to determine the intended Standards Version of a dirfile,
and the current behaviour is to assume native byte sex for modern
dirfiles lacking /ENDIAN. To read an old, little-ended dirfile on a
big-ended platform, an /ENDIAN directive should be added to the format
specification, or else GD_LITTLE_ENDIAN should be specified by the
caller.
GetData artificially limits the size of a CARRAY field to GD_MAX_CAR‐
RAY_LENGTH elements, to be certain it is always able to write the CAR‐
RAY back to disk without overrunning its maximum line length. On
32-bit systems, GD_MAX_CARRAY_LENGTH is 2**24. It is larger on 64-bit
systems. Excess elements are silently truncated on dirfile open.
GetData's parser assumes it is running on an ASCII-compatible platform.
Format specification parsing will fail gloriously on an EBCDIC plat‐
form.
SEE ALSO
dirfile(5), dirfile-encoding(5), dirfile-format(5), gd_close(3),
gd_dirfile_standards(3), gd_discard(3), gd_error(3), gd_error_string(3),
gd_flags(3), gd_getdata(3), gd_include(3), gd_parser_callback(3),
gd_verbose_prefix(3)
Version 0.8.4 3 April 2013 gd_cbopen(3)