dirfile-format(5) DATA FORMATS dirfile-format(5)
NAME
dirfile-format - the dirfile database format specification file
DESCRIPTION
The dirfile format specification fully specifies the raw and derived
time streams and auxiliary information for a dirfile(5) database.
The format specification is contained in one or more case-sensitive
text files located in the dirfile tree. Each file is known as a frag‐
ment. The primary fragment is the file called format located in the
base dirfile directory. This file may contain only part of the format
specification, and may reference other fragments (using the /INCLUDE
directive) containing further format specification. This inclusion
mechanism may be nested arbitrarily deep.
The explicit text encoding of these files is not specified by these
Standards, but it must be 7-bit ASCII compatible. Examples of accept‐
able character encodings include all the ISO 8859 character sets (i.e.
Latin-1 through Latin-10, among others), as well as the UTF-8 encoding
of Unicode and UCS.
This document primarily describes the latest version of the Standards
(Version 9); differences with previous versions are noted where rele‐
vant. A complete list of changes between versions is given in the HIS‐
TORY section below.
SYNTAX
The format specification is composed of field specification lines and
directive lines, optionally separated by blank lines or lines contain‐
ing only whitespace. Lines are separated by the line-feed character
(0x0A). Unless escaped (see below), the hash mark (#) is the comment
delimiter; the comment delimiter, and any text following it to the end
of the line, is ignored.
Tokens
Both field specification lines and directive lines consist of several
tokens separated by whitespace. Whitespace consists of one or more
whitespace characters. These are: space (0x20), horizontal tab (0x09),
vertical tab (0x0B), form-feed (0x0C), and carriage return (0x0D). The
first token of a directive line is always a reserved word. The first
token of a field specification line is never a reserved word. Any
amount of whitespace may precede the first token on a line.
Since tokens are separated by whitespace, to include a whitespace
character in a token, it must either be escaped by preceding it with a backslash
character (\), or be replaced by a character escape sequence (see
below), or else the token must be enclosed in quotation marks ("). The
quotation marks themselves are stripped from the token. The null-token
(that is, the token consisting of zero characters) may be specified by
a pair of quotation marks with nothing between them (""). To include a
literal quotation mark in a token, it must be escaped (\"). Similarly,
a hash mark may be included in a token by including it in a quoted
token or else by escaping it (\#), otherwise the hash mark is under‐
stood as the comment delimiter.
It is a syntax error to have a line which contains unmatched quotation
marks, or in which the last character is the backslash character.
Several characters when escaped by a preceding backslash character are
interpreted as special characters in tokens. The character escape
sequences are:
\a an alert (bell) character (ASCII 0x07 / U+0007)
\b a backspace character (ASCII 0x08 / U+0008)
\e an escape character (ASCII 0x1B / U+001B)
\f a form-feed character (ASCII 0x0C / U+000C)
\n a line-feed character (ASCII 0x0A / U+000A)
\r a carriage return character (ASCII 0x0D / U+000D)
\t a horizontal tab character (ASCII 0x09 / U+0009)
\v a vertical tab character (ASCII 0x0B / U+000B)
\\ a backslash character (ASCII 0x5C / U+005C)
\ooo the single byte given by the octal number ooo (1 to 3
octal digits).
\xhh the single byte given by the hexadecimal number hh (1 or
2 hexadecimal digits).
\uhhhhhhh
the UTF-8 byte sequence encoding the Unicode code point
given by the hexadecimal number hhhhhhh (1 to 7 hexadeci‐
mal digits).
Any other character which is escaped is interpreted as the character
itself. (i.e. \c is interpreted as c; also, as pointed out above, \"
and \# are interpreted as simply " and #, without their special mean‐
ings).
No token may contain the NULL character (ASCII 0x00 / U+0000). Fur‐
thermore, although support is present to create UTF-8 byte sequences,
tokens are not required to be valid UTF-8 sequences. Any byte sequence
not containing the NULL character forms a valid token. However, there
may be further restrictions on allowed characters for a token in a par‐
ticular situation, (for example, when used as a field name).
Standards Version 5 and earlier do not recognise the character escape
sequences, nor allow quoting of tokens. As a result, they prohibit both
whitespace and the comment delimiter from being used in tokens.
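The tokenisation rules above can be sketched in Python as follows. This is an illustration only, not the reference parser: the name tokenize and its helper are hypothetical, the input is assumed to be a single line with the line-feed already removed, octal values above 0377 are simply truncated to one byte, and the prohibition on NULL characters in tokens is not checked.

```python
SIMPLE = {"a": 0x07, "b": 0x08, "e": 0x1B, "f": 0x0C, "n": 0x0A,
          "r": 0x0D, "t": 0x09, "v": 0x0B, "\\": 0x5C}
WHITESPACE = " \t\v\f\r"
OCT = "01234567"
HEX = "0123456789abcdefABCDEF"


def _digits(line, i, allowed, max_len):
    """Return the run of up to max_len characters from `allowed` at line[i:]."""
    j = i
    while j < len(line) and j - i < max_len and line[j] in allowed:
        j += 1
    return line[i:j]


def tokenize(line):
    """Split one format specification line into a list of bytes tokens."""
    tokens, cur, in_token, quoted = [], bytearray(), False, False
    i = 0
    while i < len(line):
        c = line[i]
        if c == "\\":
            if i + 1 >= len(line):
                raise ValueError("line ends with a backslash")
            e = line[i + 1]
            i += 2                      # i now points just past the escaped character
            in_token = True
            if e in SIMPLE:
                cur.append(SIMPLE[e])
            elif e in OCT:              # \ooo: 1 to 3 octal digits
                d = _digits(line, i - 1, OCT, 3)
                cur.append(int(d, 8) & 0xFF)
                i += len(d) - 1
            elif e in "xu":             # \xhh or \uhhhhhhh
                d = _digits(line, i, HEX, 2 if e == "x" else 7)
                if not d:               # no digits: the character itself
                    cur += e.encode("utf-8")
                elif e == "x":
                    cur.append(int(d, 16))
                else:                   # raises ValueError if not encodable
                    cur += chr(int(d, 16)).encode("utf-8")
                i += len(d)
            else:                       # any other escaped character is itself
                cur += e.encode("utf-8")
        elif c == '"':                  # quotation marks are stripped
            quoted = not quoted
            in_token = True
            i += 1
        elif not quoted and c == "#":   # comment runs to end of line
            break
        elif not quoted and c in WHITESPACE:
            if in_token:
                tokens.append(bytes(cur))
                cur, in_token = bytearray(), False
            i += 1
        else:
            cur += c.encode("utf-8")
            in_token = True
            i += 1
    if quoted:
        raise ValueError("unmatched quotation mark")
    if in_token:
        tokens.append(bytes(cur))
    return tokens
```

Tokens are returned as bytes because, as noted above, a token need not be a valid UTF-8 sequence.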
DIRECTIVES
There are ten directives, each specified by a different reserved word,
which cannot be used as field names in the dirfile. As of Standards
Version 8, all reserved words start with an initial forward slash (/),
to distinguish them from field names. Standards Versions 5, 6, and 7
permitted the omission of the initial forward slash, while in Standards
Version 4 and earlier, reserved words may not have an initial forward
slash. Like the rest of the format specification, directives are case
sensitive.
A number of the directives have fragment scope. A directive with frag‐
ment scope only applies to the fragment in which it is present, plus
any sub-fragments indicated by the /INCLUDE directive, but only if
those sub-fragments don't have their own corresponding directive.
Directives which have fragment scope are: /ENCODING, /ENDIAN, /FRAME‐
OFFSET, and /PROTECT. Because of these scoping rules, different por‐
tions of the dirfile may have different encodings, endiannesses, frame
offsets, or protection levels.
If a directive with fragment scope appears more than once in a frag‐
ment, only the last such directive is honoured, with the exception that
the effect of a directive is not propagated to sub-fragments if the
directive line appears after the sub-fragment is included. The scoping
rules of the remaining directives are discussed below.
/ALIAS The /ALIAS directive defines an alternate name for a field
defined elsewhere in the format specification (called the "tar‐
get"). Aliases may not be used as the parent field in a /META
directive, but are in most other ways indistinguishable from the
target's original, canonical name. Aliases may be chained (that
is, the target name appearing in an /ALIAS directive may itself
be an alias). In this case, the new alias is another name for
the target's own target. Just as there is no requirement that
the input fields of a derived field exist, it is not an error
for the target of an alias to not exist. Syntax is:
/ALIAS <name> <target>
A metafield alias may be defined using the <parent-field>/<alias-name>
syntax for name in the /ALIAS directive. No restriction
is placed on target; specifically, a metafield alias may target
a top-level field, or a metafield with a different parent;
conversely, a top-level alias may target a metafield.
A metafield alias may never appear as the parent part of a
metafield field code, even if it refers to a top-level field.
That is, given the valid format:
field1 RAW UINT8 1
field1/meta CONST FLOAT64 0.0
field2 RAW UINT8 1
/ALIAS field2/alias field1
the metafield field1/meta may not be referred to as
field2/alias/meta, even though field2/alias is a valid field
code referring to field1.
The /ALIAS directive has no scope: it is processed immediately.
It appeared in Standards Version 9.
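As a minimal sketch of the alias rules above, the following Python functions resolve a chain of aliases and reject metafield codes whose parent part is an alias. The function names and the dictionary mapping alias names to targets are hypothetical. The loop check is an assumption of this sketch; this document does not say how circular aliases are treated.

```python
def resolve_alias(name, aliases):
    """Follow alias -> target links until a name that is not an alias is reached.

    A target that is not defined anywhere is returned unchanged, since a
    dangling alias target is not an error.
    """
    seen = set()
    while name in aliases:
        if name in seen:
            raise ValueError("alias loop involving " + name)
        seen.add(name)
        name = aliases[name]
    return name


def parent_is_allowed(field_code, aliases):
    """Return False if the parent part of a metafield code is an alias."""
    parent, slash, _meta = field_code.rpartition("/")
    return not (slash and parent in aliases)
```

With aliases = {"field2/alias": "field1"}, the code field2/alias resolves to field1, while field2/alias/meta is rejected because its parent part is an alias.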
/ENCODING
The /ENCODING directive specifies the encoding scheme used to
encode binary files in the dirfile. The encoding scheme may be
one of the predefined names listed below, which are described in
more detail in dirfile-encoding(5), or any other site-specific
encoding scheme. The predefined scheme names are:
none The dirfile is unencoded.
bzip2 The dirfile is compressed using the bzip2 compression
scheme.
gzip The dirfile is compressed using the gzip compression
scheme.
lzma The dirfile is compressed using the LZMA compression
scheme.
slim The dirfile is compressed using the slim compression
scheme.
sie The dirfile is sample-index encoded (a variant of run-
length encoding).
text The dirfile is text encoded.
zzip The dirfile is compressed and encapsulated using the zzip
compression scheme.
zzslim The dirfile is compressed and encapsulated using a combi‐
nation of the zzip and slim compression schemes.
Implementations should fail gracefully when encountering an
unknown encoding scheme. If no encoding scheme is specified,
behaviour is implementation dependent. Syntax is:
/ENCODING <scheme> [<enc-datum>]
The enc-datum token provides additional data for certain encod‐
ing schemes; see dirfile-encoding(5) for details. The form of
enc-datum is not specified.
The /ENCODING directive has fragment scope. It appeared in
Standards Version 6. The predefined schemes sie, zzip, and
zzslim, and the optional enc-datum token, appeared in Standards
Version 9; the predefined scheme lzma appeared in Standards Ver‐
sion 7; all other predefined schemes appeared in Standards Ver‐
sion 6.
/ENDIAN
The /ENDIAN directive specifies the endianness of the raw data
in the database. The assumed endianness of raw data in dirfiles
which omit this directive is implementation dependent. Syntax
is:
/ENDIAN ( big | little ) [ arm ]
where the "arm" token should be included if double precision
floating point data are stored in the ARM middle-endian format.
The /ENDIAN directive has fragment scope. It appeared in Stan‐
dards Version 5. The optional arm token appeared in Standards
Version 8.
/FRAMEOFFSET
The /FRAMEOFFSET directive specifies the frame number of the
first frame for which data exists in binary files associated
with RAW fields. Syntax is:
/FRAMEOFFSET <integer>
The /FRAMEOFFSET directive has fragment scope. It appeared in
Standards Version 1.
/HIDDEN
The /HIDDEN directive indicates that the specified field name is
hidden. The difference (if any) between a field name which is
hidden and one that is not is implementation dependent. Hidden‐
ness is not inherited by metafields of the specified field.
Hiddenness applies to the name, not the field itself; it does
not hide all aliases of the field-name, and if field-name is an
alias, the alias is hidden, not its target. Syntax is:
/HIDDEN <field-name>
A /HIDDEN directive must appear after the specification of
field-name, (which occurs either in a field specification line,
or an /ALIAS directive, or a /META directive) in the same frag‐
ment.
The /HIDDEN directive has no scope: it is processed immediately.
It appeared in Standards Version 9.
/INCLUDE
The /INCLUDE directive specifies another file (called a frag‐
ment) to parse for additional format specification for the
dirfile. The inclusion is processed immediately, before the
fragment containing the /INCLUDE directive (the parent fragment)
is parsed further. RAW fields specified in the included frag‐
ment are located in the directory containing the fragment file,
and not in the directory containing the parent fragment, and the
binary file encoding may be different for each fragment. The
fragment may be specified either with an absolute path, or else
a path relative to the directory containing the parent fragment.
The /INCLUDE directive may optionally specify a prefix and/or
suffix to apply to field names defined in the included fragment.
If present, affixes are applied to all field-names (including
aliases) defined in the included fragment and any fragments it
further includes. Affixes nest, with the affixes of the deepest
inclusion innermost. Affixes are not applied to the names of
binary files associated with RAW fields. Syntax is:
/INCLUDE <file> [<prefix> [<suffix>]]
To specify only a suffix, use the null-token ("") as prefix.
The /INCLUDE directive has no scope: it is processed immediate‐
ly. It appeared in Standards Version 3. The optional prefix
and suffix appeared in Standards Version 9.
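The nesting of affixes can be illustrated with a short Python sketch. The function name apply_affixes and the list-of-pairs representation are hypothetical, and only top-level field names are considered; how affixes interact with the parent part of a metafield code is not covered here.

```python
def apply_affixes(name, affixes):
    """Apply nested /INCLUDE affixes to a field name.

    `affixes` is a list of (prefix, suffix) pairs, one per /INCLUDE on the
    path from the primary fragment down to the fragment defining `name`,
    outermost inclusion first. The deepest inclusion is applied first so
    that its affixes end up innermost.
    """
    for prefix, suffix in reversed(affixes):
        name = prefix + name + suffix
    return name
```

For example, a field temp defined in a fragment included with prefix B_, itself included by the primary fragment with prefix A_ and suffix _x, would be known as A_B_temp_x.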
/META The /META directive specifies a metafield attached to a particu‐
lar parent field. The field metadata may be of any allowed type
except RAW. Metafields are retrieved in exactly the same way as
regular field data, but the field code specified consists of the
parent and metafield names joined with a forward slash:
<parent-field>/<meta-field>
META fields may not be specified before their parent field has
been. Syntax is:
/META <parent-field> {field specification line}
The <parent-field> code may not be an alias. As an illustration
of this concept,
/META pfield meta CONST FLOAT64 3.291882
provides a scalar metadatum called meta with value 3.291882 at‐
tached to the field pfield. This particular metafield may be
referred to by the field code "pfield/meta". Note that differ‐
ent parent fields may have metafields with the same name, since
all references to metafields must include the parent field name.
Metafields may not themselves have further sub-metafields.
As an alternative to the /META directive, starting with Stan‐
dards Version 7, a metafield may be specified by a standard
field specification line, using
<parent-field>/<meta-field>
as the field name. That is, the above example metafield could
have also been specified as:
pfield/meta CONST FLOAT64 3.291882
The /META directive has no scope: it is processed immediately.
It appeared in Standards Version 6.
/PROTECT
The /PROTECT directive specifies the advisory protection level
of the current fragment and of the RAW fields defined therein.
The protection level indicates whether writing to the fragment,
or the binary data on disk is permitted. Syntax is:
/PROTECT <level>
Four advisory protection levels are defined:
none No protection at all: data and metadata may be freely
changed. This is the default, if no /PROTECT directive
is present.
format The dirfile metadata is protected from change, but RAW
data on disk may be modified.
data The RAW data on disk is protected from change, but meta‐
data may be modified.
all Both metadata and data on disk are protected from change.
The /PROTECT directive has fragment scope. It appeared in Stan‐
dards Version 6.
/REFERENCE
The /REFERENCE directive specifies the name of the field to use
as the dirfile's reference field (see dirfile(5)). If no /REF‐
ERENCE directive is specified, the first RAW field encountered
is used as the reference field. The /REFERENCE directive must
specify a RAW field. Syntax is:
/REFERENCE <field-code>
The /REFERENCE directive has global scope: if multiple /REFER‐
ENCE directives appear in the dirfile metadata, only the last
such is honoured. It appeared in Standards Version 6.
/VERSION
The /VERSION directive specifies the particular version of the
Dirfile Standards to which the dirfile format specification con‐
forms. This directive should occur before any version dependent
syntax is encountered. As of Standards Version 6, no such syn‐
tax exists, and this directive is provided primarily to ease
forward compatibility. Syntax is:
/VERSION <integer>
The /VERSION directive has immediate scope: its effect is imme‐
diate, and it applies only to metadata below it, including and
propagating downwards to sub-fragments after the directive.
In Standards Version 8 and earlier, its effect also propagates
upwards back to the parent fragment, and affects subsequent
metadata. Starting with Standards Version 9, this no longer
happens. As a result, a /VERSION directive which indicates a
version of 9 or later never propagates upwards; additionally,
/VERSION directives found in subfragments included in a Version
9 or later fragment aren't propagated upwards into that frag‐
ment, regardless of the Version of the subfragments. The /VER‐
SION directive appeared in Standards Version 5.
FIELD SPECIFICATION LINES
Any line which does not start with a reserved word is assumed to be a
field specification line. A field specification line consists of at
least two tokens. The first token is the field name. The second token
is the field type. Subsequent tokens are field parameters. The meaning
and number of these parameters depend on the field type specified.
Field Names
The first token in a field specification line is the field name. The
field name consists of one or more characters, excluding both ASCII
control characters (the bytes 0x01 through 0x1F), and the characters
& / ; < > | .
which are reserved (but see below for the use of / to specify
metafields). The full stop (.) is allowed in Standards Version 5 and
earlier. The ampersand, semicolon, less than, greater than, and vertical
line (& ; < > |) are allowed in Standards Version 4 and earlier.
Furthermore, due to the lack of an escape or quoting mechanism (see To‐
kens above), Standards Version 5 and earlier also prohibit whitespace
and the comment delimiter (#) in field names.
The field name may not be INDEX, which is a special, implicit field
which contains the integer frame index. Standards Version 5 and earli‐
er also prohibit FILEFRAM, which was an alias for INDEX. Field names
are case sensitive. Standards Version 3 and 4 restrict field names to
50 characters. Standards Version 2 and earlier restrict field names to
16 characters. Additionally, the filesystem may put restrictions on the
length and acceptable characters of a RAW field name, regardless of
Standards Version.
Starting in Standards Version 7, if the field name beginning a field
specification line contains exactly one / character, the line is as‐
sumed to specify a metafield. See the /META directive above for fur‐
ther details. A field name may not contain more than one /.
Field Types
There are fifteen field types. Of these, twelve are of vector type
(BIT, DIVIDE, LINCOM, LINTERP, MPLEX, MULTIPLY, PHASE, POLYNOM, RAW,
RECIP, SBIT, and WINDOW) and three are of scalar type (CONST, CARRAY,
and STRING). The eleven vector field types other than RAW fields are
also called derived fields, since they derive their value from one or
more input fields.
Five of these derived fields (DIVIDE, LINCOM, MPLEX, MULTIPLY, and WIN‐
DOW) may have more than one input field. In situations where these in‐
put fields have differing sample rates, the sample rate of the derived
field is the same as the sample rate of the first (left-most) input
field specified. Furthermore, the input fields are synchronised by
aligning them on frame boundaries, assuming equally-spaced sampling
throughout a frame, and using the last sample of each input field which
did not occur after the sample of the derived field being computed.
That is, if the first and second input fields have sample rates s1 and
s2, the derived field also has sample rate s1 and, for every sample of
the derived field, n, the n'th sample of the first field is used (since
they have the same sample rate by definition), and the sample number
used of the second field, m, is computed as:
m = floor((n * s2) / s1).
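The index computation above can be tried out with a one-line Python function (the name aligned_index is hypothetical). Integer floor division is used so that no floating point rounding is involved.

```python
def aligned_index(n, s1, s2):
    """Sample number of the second input used for sample n of the derived field.

    s1 and s2 are the samples-per-frame of the first and second input fields.
    """
    return (n * s2) // s1
```

With s1 = 4 and s2 = 1, samples 0 through 7 of the derived field use samples 0, 0, 0, 0, 1, 1, 1, 1 of the second input; with s1 = 2 and s2 = 5, samples 0 through 3 use samples 0, 2, 5, 7.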
Starting in Standards Version 6, certain scalar field parameters in the
field specifications may be specified using CONST or CARRAY fields, in‐
stead of literal values. A list of parameters for which this is al‐
lowed is given below in the Field Parameters section.
The possible field types are:
BIT The BIT vector field type extracts one or more bits out of an
input vector field as an unsigned number. Syntax is:
<fieldname> BIT <input> <first-bit> [<num-bits>]
which specifies fieldname to be the value of bits first-bit
through first-bit+num-bits-1 of the input vector field input,
when input is converted from its native type to an (endianness
corrected) unsigned 64-bit integer. If num-bits is omitted, it
is assumed to be 1. The SBIT field type is a signed version of
this field type. The optional num-bits parameter appeared in
Standards Version 1.
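A sketch of the BIT extraction for a single sample, in Python. The name bit_field is hypothetical, and the sketch assumes that bit zero is the least significant bit, which this document does not state explicitly.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def bit_field(value, first_bit, num_bits=1):
    """Extract num_bits bits starting at first_bit as an unsigned number.

    Assumes bit 0 is the least significant bit of the 64-bit value.
    """
    u64 = value & MASK64              # reinterpret as an unsigned 64-bit integer
    return (u64 >> first_bit) & ((1 << num_bits) - 1)
```

For the input value 0b10110100, first-bit 2 and num-bits 3 select the bits 101, giving 5.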
CARRAY The CARRAY scalar field type is a list of constants fully speci‐
fied in the format specification metadata. Syntax is:
<fieldname> CARRAY <type> <value0> <value1> <value2> ...
where type may be any supported native data type (see the de‐
scription of the RAW field type below), and value0, value1, &c.
are the values of successive elements in the scalar list inter‐
preted as indicated by type. No limit is placed on the number
of elements in a CARRAY. (Note: despite being multivalued, this
is not considered a vector field since the elements of the CAR‐
RAY are not indexed by frames.) It appeared in Standards Ver‐
sion 8.
CONST The CONST scalar field type is a constant fully specified in the
format specification metadata. Syntax is:
<fieldname> CONST <type> <value>
where type may be any supported native data type (see the de‐
scription of the RAW field type below), and value is the numeri‐
cal value of the constant interpreted as indicated by type. It
appeared in Standards Version 6.
DIVIDE The DIVIDE vector field type is the quotient of two vector
fields. Syntax is:
<fieldname> DIVIDE <field1> <field2>
The derived field is computed as:
fieldname = field1 / field2.
It was introduced in Standards Version 8.
LINCOM The LINCOM vector field type is the linear combination of one,
two or three input vector fields. Syntax is:
<fieldname> LINCOM [<n>] <field1> <a1> <b1> [<field2>
<a2> <b2> [<field3> <a3> <b3>]]
where n, if present, indicates the number of input vector fields
(1, 2, or 3). The derived field is computed as:
fieldname = (a1 * field1 + b1) + (a2 * field2 + b2) + (a3
* field3 + b3)
with the field2 and field3 terms included only if specified.
If n is not specified, the number of fields is determined by
looking at the supplied parameters. Since it is possible to
create a field code which is identical to a literal number, the
third token on the line is assumed to be n if the entire token
can be parsed as a literal number using the rules outlined
in strtod(3). That is, if the field code specifying field1
could be mistaken for a literal number, n must be specified to
prevent ambiguity. In Standards Version 6 and earlier, n is
mandatory.
LINTERP
The LINTERP vector field type specifies a table look up based on
another vector field. Syntax is:
<fieldname> LINTERP <input> <table>
where input is the input vector field for the table lookup, and
table is the path to the lookup table file for the field. If
this path is relative, it is assumed to be relative to the di‐
rectory containing the fragment defining this field. The lookup
table file is an ASCII text file with two whitespace separated
columns of x and y values. Values are linearly interpolated be‐
tween the points specified in the lookup table.
MPLEX The MPLEX vector field type permits the multiplexing of several
low sample rate fields into a single data field of higher sample
rate. Syntax is:
<fieldname> MPLEX <input> <index> <count> [<period>]
where input is the input vector containing the multiplexed
fields, index is the vector containing the multiplex index,
count is the value of the multiplex index when the computed
field is stored in input, and period, if present and non-zero,
is the number of samples between successive occurrences of the
value count in the index vector. A period of zero (or, equivalently,
its omission) indicates that either the value count is
not equally spaced in the index vector, or else that the spacing
is unknown. Both count and period are integers, and period may
not be negative.
At every sample n, the derived field is computed as:
fieldname[n] = (index[n] == count) ? input[n] : fieldname[n - 1]
The index vector is converted to an integer type for comparison.
The value of the derived field before the first sample where in‐
dex equals count is implementation dependent.
The values of count and period place no restrictions on values
contained in index. Specifically, particular values of index
(including count) need not be equally spaced (neither by period
nor any other spacing); index need not ever take on the value
count (in which case the value of the entirety of the derived
field is implementation dependent). Different MPLEX field defi‐
nitions which use the same index vector may specify different
periods. MPLEX appeared in Standards Version 9.
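The MPLEX computation can be sketched in Python as below. The function name mplex and the initial argument are hypothetical; initial stands in for the implementation-dependent value before the first match. The sketch assumes input and index have the same sample rate and ignores period, which does not affect the values computed.

```python
def mplex(input_, index, count, initial=None):
    """Demultiplex input_: keep samples where index equals count.

    Between matches the previous output value is repeated; before the
    first match, `initial` is used (implementation dependent in practice).
    """
    out, last = [], initial
    for value, idx in zip(input_, index):
        if int(idx) == count:         # index is compared as an integer
            last = value
        out.append(last)
    return out
```

With input 10, 11, 12, 13, 14, 15, index 0, 1, 2, 0, 1, 2 and count 1, the result is the initial value followed by 11, 11, 11, 14, 14.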
MULTIPLY
The MULTIPLY vector field type is the product of two vector
fields. Syntax is:
<fieldname> MULTIPLY <field1> <field2>
The derived field is computed as:
fieldname = field1 * field2.
It appeared in Standards Version 2.
PHASE The PHASE vector field type shifts an input vector field by the
specified number of samples. Syntax is:
<fieldname> PHASE <input> <shift>
which specifies fieldname to be the input vector field, input,
shifted by shift samples. A positive shift indicates a forward
shift, towards the end-of-field. Results of shifting past the
beginning- or end-of-field are implementation dependent. PHASE
appeared in Standards Version 4.
POLYNOM
The POLYNOM vector field type specifies a polynomial function of
a single input vector field. Syntax is:
<fieldname> POLYNOM <input> <a0> <a1> [<a2> [<a3> [<a4> [<a5>]]]]
where <input> is the input field code, and the order of the com‐
puted polynomial is determined by how many co-efficients are
present in the specification. The derived field is computed as:
fieldname = a0 + a1 * input + a2 * input**2 + a3 * input**3
+ a4 * input**4 + a5 * input**5
where ** is the element-wise exponentiation operator, and the
higher order terms are computed only if the corresponding co-ef‐
ficients ai are specified. POLYNOM appeared in Standards Ver‐
sion 7.
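The polynomial above can be evaluated sample by sample as in the following Python sketch. The name polynom is hypothetical, and Horner's method is used here merely as a convenient way to evaluate the same expression; it is not implied to be how any particular implementation does it.

```python
def polynom(samples, coeffs):
    """Evaluate a0 + a1*x + ... for each sample x, with coeffs = [a0, a1, ...]."""
    if not 2 <= len(coeffs) <= 6:
        raise ValueError("POLYNOM takes between two and six co-efficients")
    out = []
    for x in samples:
        acc = 0
        for a in reversed(coeffs):    # Horner's method, highest order first
            acc = acc * x + a
        out.append(acc)
    return out
```

For co-efficients 1, 2, 3 (that is, 1 + 2 * input + 3 * input**2), the inputs 0, 1, 2 give 1, 6, 17.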
RAW The RAW vector field type specifies raw time streams on disk.
In this case, the field name should correspond to the name of
the file containing the time stream. Syntax is:
<fieldname> RAW <type> <sample-rate>
where sample-rate is the number of samples per dirfile frame for
the time stream and type is a token specifying the native data
format type:
UINT8 unsigned 8-bit integer
INT8 signed (two's complement) 8-bit integer
UINT16 unsigned 16-bit integer
INT16 signed (two's complement) 16-bit integer
UINT32 unsigned 32-bit integer
INT32 signed (two's complement) 32-bit integer
UINT64 unsigned 64-bit integer
INT64 signed (two's complement) 64-bit integer
FLOAT32
IEEE-754 standard 32-bit single precision floating
point number
FLOAT64
IEEE-754 standard 64-bit double precision floating
point number
COMPLEX64
a 64-bit complex number consisting of two IEEE-754
standard 32-bit single precision floating point
numbers representing the real and imaginary parts
of the complex number (Standards Version 7 and
later)
COMPLEX128
a 128-bit complex number consisting of two
IEEE-754 standard 64-bit double precision floating
point numbers representing the real and imaginary
parts of the complex number (Standards Version 7
and later).
For more information on the storage of complex valued data, see
dirfile(5). Two additional type names exist: FLOAT is equiva‐
lent to FLOAT32, and DOUBLE is equivalent to FLOAT64. Standards
Version 9 deprecates these two aliases, but still allows them.
All these type names (except those for complex data, which came
later) were introduced in Standards Version 5. Earlier Stan‐
dards Versions specified data types with single character type
aliases:
c UINT8
u UINT16
s INT16
U UINT32
i, S INT32
f FLOAT32
d FLOAT64
Types INT8, UINT64, INT64, COMPLEX64, and COMPLEX128 are not
supported before Standards Version 5, so no single character
type aliases exist for these types. These single character type
aliases were deprecated in Standards Version 5 and removed in
Standards Version 8.
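The sample sizes implied by the type names above can be tabulated as in this Python sketch. The names SAMPLE_SIZE, bytes_per_frame and frame_byte_offset are hypothetical. The offset calculation assumes an unencoded binary file in which samples are stored back to back starting at the frame given by /FRAMEOFFSET; that layout is an assumption of this sketch, see dirfile(5) and dirfile-encoding(5) for the authoritative description.

```python
# Size in bytes of one sample of each native data type.
SAMPLE_SIZE = {
    "UINT8": 1, "INT8": 1, "UINT16": 2, "INT16": 2,
    "UINT32": 4, "INT32": 4, "UINT64": 8, "INT64": 8,
    "FLOAT32": 4, "FLOAT64": 8, "COMPLEX64": 8, "COMPLEX128": 16,
    "FLOAT": 4, "DOUBLE": 8,          # deprecated aliases
}


def bytes_per_frame(type_name, sample_rate):
    """Bytes occupied by one frame of a RAW field."""
    return SAMPLE_SIZE[type_name] * sample_rate


def frame_byte_offset(frame, type_name, sample_rate, frame_offset=0):
    """Byte offset of `frame` in an unencoded binary file (assumed layout)."""
    if frame < frame_offset:
        raise ValueError("no data exists before the frame offset")
    return (frame - frame_offset) * bytes_per_frame(type_name, sample_rate)
```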
RECIP The RECIP vector field type computes the reciprocal of a single
input vector field. Syntax is:
<fieldname> RECIP <input> <dividend>
where <input> is the input field code and <dividend> is a scalar
quantity. The derived field is computed as:
fieldname = dividend / input.
RECIP appeared in Standards Version 8.
SBIT The SBIT vector field type extracts one or more bits out of an
input vector field as a signed number. Syntax is:
<fieldname> SBIT <input> <first-bit> [<bits>]
which specifies fieldname to be the value of bits first-bit
through first-bit+bits-1 of the input vector field input, when
input is converted from its native type to a (endianness cor‐
rected) signed 64-bit integer. If bits is omitted, it is as‐
sumed to be 1. The BIT field type is an unsigned version of
this field type. SBIT appeared in Standards Version 7.
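A sketch of the SBIT extraction for a single sample, in Python. The name sbit_field is hypothetical. The sketch assumes that bit zero is the least significant bit and that the extracted bits are interpreted as a two's complement number of width bits; neither detail is spelled out in this document.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sbit_field(value, first_bit, bits=1):
    """Extract `bits` bits starting at first_bit as a signed number.

    Assumes bit 0 is least significant and two's complement sign extension.
    """
    raw = ((value & MASK64) >> first_bit) & ((1 << bits) - 1)
    sign = 1 << (bits - 1)            # top bit of the extracted range
    return raw - (1 << bits) if raw & sign else raw
```

Under these assumptions, bits 2 through 4 of 0b10110100 are 101, which is 5 as a BIT field but -3 as an SBIT field.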
STRING The STRING scalar field type is a character string fully speci‐
fied in the format file metadata. Syntax is:
<fieldname> STRING <value>
where value is the string value of the field. Note that value
is a single token. To include whitespace in the string, enclose
value in quotation marks ("), or else escape the whitespace with
the backslash character (\). STRING appeared in Standards Ver‐
sion 6.
WINDOW The WINDOW vector field type isolates a portion of an input vec‐
tor based on a comparison. Syntax is:
<fieldname> WINDOW <input> <check> <op> <threshold>
where input is the vector containing the data to extract, check
is the vector on which to test the comparison, threshold is the
value against which check is compared, and op is one of the fol‐
lowing tokens indicating the particular comparison performed:
EQ data are extracted where check, converted to a
64-bit signed integer, equals threshold,
GE data are extracted where check, converted to a
64-bit floating-point number, is greater than or
equal to threshold,
GT data are extracted where check, converted to a
64-bit floating-point number, is strictly greater
than threshold,
LE data are extracted where check, converted to a
64-bit floating-point number, is less than or
equal to threshold,
LT data are extracted where check, converted to a
64-bit floating-point number, is strictly less
than threshold,
NE data are extracted where check, converted to a
64-bit signed integer, is not equal to threshold,
SET data are extracted where at least one bit set in
threshold is also set in check, when converted to
a 64-bit unsigned integer,
CLR data are extracted where at least one bit set in
threshold is not set in check, when converted to a
64-bit unsigned integer.
The storage type of threshold depends on the operator, and fol‐
lows the interpretation of check. It may never be complex val‐
ued.
Outside the region extracted, the value of the derived field is
implementation dependent.
Note: with the EQ operator, this derived field type is very sim‐
ilar to the MPLEX field type above. The primary difference is
that MPLEX mandates the value of the derived field outside the
extracted region, while WINDOW does not. WINDOW appeared in
Standards Version 9.
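The eight WINDOW operators can be summarised in a Python sketch. The function name window and the fill argument are hypothetical; fill stands in for the implementation-dependent value outside the extracted region. The sketch assumes input and check have the same sample rate and that threshold is non-negative for SET and CLR.

```python
def window(input_, check, op, threshold, fill=None):
    """Keep samples of input_ where `check op threshold` holds."""
    tests = {
        "EQ": lambda c: int(c) == threshold,
        "NE": lambda c: int(c) != threshold,
        "GE": lambda c: float(c) >= threshold,
        "GT": lambda c: float(c) > threshold,
        "LE": lambda c: float(c) <= threshold,
        "LT": lambda c: float(c) < threshold,
        # at least one bit set in threshold is also set in check
        "SET": lambda c: (int(c) & threshold) != 0,
        # at least one bit set in threshold is not set in check
        "CLR": lambda c: (~int(c) & threshold) != 0,
    }
    keep = tests[op]
    return [x if keep(c) else fill for x, c in zip(input_, check)]
```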
Field Parameters
All input vector field parameters should be field codes (see below).
Additionally, the scalar field parameters listed may be either literal
numbers or else the field code of a CONST field containing the value,
or the field code of a CARRAY followed by a left angle bracket (<),
then a non-negative integer used as the CARRAY element index, then a
right angle bracket (>), that is:
fieldcode<n>
If the angle brackets and element index are omitted from a CARRAY field
code used as a parameter, the first element in the field (index zero)
is assumed.
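Splitting such a parameter into a field code and an element index might look like the following Python sketch. The function name split_scalar_code is hypothetical. Since angle brackets are reserved in field names, a trailing <n> can be recognised without ambiguity.

```python
import re

_ELEMENT = re.compile(r"(.+)<([0-9]+)>")


def split_scalar_code(code):
    """Split 'fieldcode<n>' into (fieldcode, n); n defaults to 0 if omitted."""
    m = _ELEMENT.fullmatch(code)
    if m:
        return m.group(1), int(m.group(2))
    return code, 0
```

Here gains<3> yields the field code gains with element index 3, while a bare gains yields index 0.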
Field parameters which may be specified using a scalar field code are:
BIT, SBIT
first-bit, num-bits (called bits in the SBIT syntax)
LINCOM any of the ai or bi
MPLEX count, period
PHASE shift
POLYNOM
any of the ai
RAW sample-rate
RECIP dividend
WINDOW threshold
Since it is possible to create a field code which is identical to a
literal number, a parameter is assumed to be the field code of a scalar
field only if the entire token cannot be parsed as a literal number us‐
ing the rules outlined in strtod(3). For example, a CONST field whose
field code consists solely of digits can never be used as a parameter
in a field specification line.
Starting in Standards Version 7, a literal complex number is specified as
two real (floating point) numbers separated by a semicolon (;) with no
intervening whitespace. So, for example, the tokens
1;0 0;1 4;0 0;5 9.313e2;74.1
represent, respectively, the real unit, the imaginary unit, the real
number four, the imaginary number 5i, and the complex number 931.3 +
74.1i. Because the semicolon character cannot be used in field names,
a complex valued literal can never be mistaken for a field code. This
allows, among other things, the composition of complex valued fields
from purely real input fields. For example, a complex valued field, z,
may be created from a real valued field re, representing the real part
of the complex number, and the real valued field im, representing the
imaginary part of the complex number, with the following LINCOM speci‐
fication:
z LINCOM re 1 0 im 0;1 0
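As a rough illustration, the complex literal syntax and the LINCOM example above can be reproduced in Python. The helper parse_complex_literal is hypothetical, it handles decimal components only, and the sample values for re and im are invented for the demonstration. The last step assumes the usual LINCOM form in which each input is multiplied by its m parameter and offset by its b parameter before the terms are summed.

```python
def parse_complex_literal(token):
    """Parse 'real;imag' (or a bare real number) into a Python complex."""
    real, sep, imag = token.partition(";")
    if not sep:
        # No semicolon: a purely real literal.
        return complex(float(real), 0.0)
    return complex(float(real), float(imag))


tokens = "1;0 0;1 4;0 0;5 9.313e2;74.1".split()
values = [parse_complex_literal(t) for t in tokens]

# z LINCOM re 1 0 im 0;1 0, evaluated for one invented sample.
re_sample, im_sample = 3.0, 4.0
m1, b1 = parse_complex_literal("1"), parse_complex_literal("0")
m2, b2 = parse_complex_literal("0;1"), parse_complex_literal("0")
z = re_sample * m1 + b1 + im_sample * m2 + b2
```

Here z holds the complex number whose real part is re_sample and whose imaginary part is im_sample.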
Starting in Standards Version 9, in addition to decimal notation,
literal integer parameters may be specified as hexadecimal numbers, by
prefixing the number (after an optional '+' or '-' sign) with 0x or 0X,
or as octal numbers, by prefixing the number with 0, as described in
strtol(3). Similarly, floating point literal numbers (both purely real
ones and components of complex literals) may be specified in hexadeci‐
mal by prefixing them with 0x or 0X, and using p or P as the binary ex‐
ponent prefix, as described in the C99 standard. Both uppercase and
lowercase hexadecimal digits may be used. In cases where a literal
floating point number may appear, the tokens INF or INFINITY, optionally
preceded by a '+' or '-' sign, and NAN, optionally immediately followed
by '(', then a sequence of characters, then ')', and all disregarding
case, will be interpreted as the special floating point values ex‐
plained in strtod(3).
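A minimal Python sketch of these literal forms follows. Both function names are hypothetical, and the code only approximates strtol(3) and strtod(3); in particular, accepting a sign before NAN and ignoring the characters inside its parentheses are assumptions of this sketch.

```python
import math
import re


def parse_integer_literal(token):
    """Parse a decimal, octal (leading 0) or hexadecimal (0x/0X) integer."""
    sign = -1 if token.startswith("-") else 1
    digits = token[1:] if token[:1] in "+-" else token
    if digits[:2].lower() == "0x":
        base, body = 16, digits[2:]
    elif len(digits) > 1 and digits[0] == "0":
        base, body = 8, digits[1:]
    else:
        base, body = 10, digits
    # Reject signs, underscores and whitespace that int() would tolerate.
    if not re.fullmatch(r"[0-9A-Fa-f]+", body):
        raise ValueError("not an integer literal: %r" % token)
    return sign * int(body, base)


def parse_float_literal(token):
    """Parse a decimal or hexadecimal float, or an INF/NAN token."""
    if re.fullmatch(r"[+-]?nan(\([^)]*\))?", token, re.IGNORECASE):
        return math.nan
    if token.lstrip("+-")[:2].lower() == "0x":
        # float.fromhex understands the p/P binary exponent.
        return float.fromhex(token)
    # float() accepts INF and INFINITY in any case, with optional sign.
    return float(token)
```

For example, 0x1F and 037 both denote the integer 31, and 0x1.8p1 denotes the floating point number 3.0.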
Field Codes
When specifying the input to a field, either as a scalar parameter, or
as an input vector field to a non-RAW vector field, field codes are
used. A field code is one of:
· a simple field name, possibly an alias, indicating a vector or
scalar field
· a parent field name, followed by a forward slash, followed by a
metafield name, indicating a metafield. See the description of the
/META directive above for further details.
· either of the above, followed by a period, followed by a represen‐
tation suffix, but only if the field or metafield specified is not
a STRING type field.
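The three forms of field code listed above could be taken apart as in the following Python sketch. The function split_field_code is hypothetical. It assumes the four single-letter representation suffixes described later in this section (a, i, m and r), and relies on the period being prohibited in field names, as it has been since Standards Version 6.

```python
REPRESENTATIONS = {"a", "i", "m", "r"}


def split_field_code(code):
    """Return (field_name, metafield_name, suffix); absent parts are None."""
    name, dot, suffix = code.rpartition(".")
    if not dot or suffix not in REPRESENTATIONS:
        # No representation suffix present.
        name, suffix = code, None
    parent, slash, meta = name.partition("/")
    if not slash:
        return name, None, suffix
    return parent, meta, suffix
```

With this sketch, the hypothetical field code parent/meta.m splits into the parent field parent, the metafield meta, and the suffix m.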
A representation suffix may be used to extract a real number from
a complex value. The available suffixes and their meanings are:
.a This representation indicates the angle (in radians) between the
positive real axis and the value (i.e. the complex argument).
The argument is in the range [-pi, pi], and a branch cut exists
along the negative real axis. At the branch cut, -pi is re‐
turned if the imaginary part is -0, and pi is returned if the
imaginary part is +0. If z=0, zero is returned.
.i This representation indicates the projection of the value onto
the imaginary axis (i.e. the imaginary part of the number).
.m This representation indicates the modulus of the value (i.e. its
absolute value).
.r This representation indicates the projection of the value onto
the real axis (i.e. the real part of the number).
If the specified field is purely real, the representations are calculated
as if the imaginary part were equal to +0. For example, given a
complex valued vector, z, a vector containing the real part of z, re_z,
could be produced with:
re_z PHASE z.r 0
and similarly for the complex field's imaginary part, argument, and
absolute value. (Such a simple example is not strictly necessary,
since z.r could be used wherever re_z would be.)
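The four representations can be mimicked in Python with the standard cmath module, whose phase() function uses the same branch cut and signed-zero convention as described for .a above. The function name represent is hypothetical; this is a sketch of the arithmetic, not of any library interface.

```python
import cmath
import math


def represent(value, suffix):
    """Apply a representation suffix (a, i, m or r) to a single value."""
    # A purely real input is promoted with an imaginary part of +0.
    z = complex(value)
    if suffix == "a":
        return cmath.phase(z)  # argument, in [-pi, pi]
    if suffix == "i":
        return z.imag
    if suffix == "m":
        return abs(z)  # modulus
    if suffix == "r":
        return z.real
    raise ValueError("unknown representation suffix: %r" % suffix)


on_cut_positive = represent(complex(-1.0, 0.0), "a")   # imaginary part +0
on_cut_negative = represent(complex(-1.0, -0.0), "a")  # imaginary part -0
```

The two values computed at the end differ only in the sign of the zero imaginary part, and therefore land on opposite sides of the branch cut.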
HISTORY
This document describes Versions 9 and earlier of the Dirfile Stan‐
dards.
Version 9 of the Standards (April 2012) added the MPLEX and WINDOW
field types, the /ALIAS and /HIDDEN directives, the affixes to /IN‐
CLUDE, the sie, zzip, and zzslim encoding schemes, along with the op‐
tional enc_datum token to /ENCODING. It permitted specification of in‐
teger literals in octal and hexadecimal. Finally, it deprecated the
type aliases FLOAT and DOUBLE.
Version 8 of the Standards (November 2010) added the DIVIDE, RECIP, and
CARRAY field types, made the forward slash on reserved words mandatory,
and prohibited using the single character data type aliases in the
specification of RAW fields. It also introduced the optional second
(arm) token to the /ENDIAN directive.
Version 7 of the Standards (October 2009) added the SBIT and POLYNOM
field types, and the directive-less method of specifying metafields.
It also introduced the data types COMPLEX128 and COMPLEX64, along with
the notion of representations, and the lzma encoding scheme. Finally,
it made the number of fields parameter for LINCOM optional.
Version 6 of the Standards (October 2008) added the /ENCODING, /META,
/PROTECT, and /REFERENCE directives, and the CONST and STRING field
types. It permitted whitespace in tokens and introduced the character
escape sequences. It allowed CONST fields to be used as parameters in
field specification lines. It also removed FILEFRAM as an alias for
INDEX, and prohibited . but allowed # and \ in field names.
Version 5 of the Standards (August 2008) added VERSION and ENDIAN,
slash demarcation of reserved words, and removed the restriction on
field name length. It introduced the data types INT8, INT64, and
UINT64, the new-style type specifiers, and increased the range of the
BIT field type from 32 to 64 bits. It also prohibited the characters
&;<>\| in field names.
Version 4 of the Standards (October 2006) added the PHASE field type.
Version 3 of the Standards (January 2006) added INCLUDE and increased
the allowed length of a field name from 16 to 50 characters.
Version 2 of the Standards (September 2005) added the MULTIPLY field
type.
Version 1 of the Standards (November 2004) added FRAMEOFFSET and the
optional fourth argument to the BIT field type.
Version 0 of the Standards (before March 2003) refers to the dirfile
standards supported by the getdata(3) library originally introduced in‐
to the kst(1) sources, which contained support for all other features
covered by this document.
AUTHORS
The dirfile specification was developed by C. B. Netterfield
<netterfield@astro.utoronto.ca>.
Since Standards Version 3, the dirfile specification has been main‐
tained by D. V. Wiebe <getdata@ketiltrout.net>.
SEE ALSO
dirfile(5), dirfile-encoding(5)
Standards Version 9 3 April 2013 dirfile-format(5)