DBZ(3)
NAME
dbzinit, dbzfresh, dbzagain, dbzclose, dbzexists, dbzfetch,
dbzstore, dbzsync, dbzsize, dbzgetoptions, dbzsetoptions,
dbzdebug - database routines
SYNOPSIS
#include <dbz.h>
BOOL dbzinit(const char *base);
BOOL dbzclose(void);
BOOL dbzfresh(const char *base, const long size);
BOOL dbzagain(const char *base, const char *oldbase);
BOOL dbzexists(const HASH key);
OFFSET_T dbzfetch(const HASH key);
BOOL dbzfetch(const HASH key, void *ivalue);
BOOL dbzstore(const HASH key, const OFFSET_T offset);
BOOL dbzstore(const HASH key, void *ivalue);
BOOL dbzsync(void);
long dbzsize(const long nentries);
void dbzgetoptions(dbzoptions *opt);
void dbzsetoptions(const dbzoptions opt);
BOOL dbzdebug(const BOOL newvalue);
DESCRIPTION
These functions provide an indexing system for rapid ran-
dom access to a text file (the base file).
Dbz stores offsets into the base text file for rapid
retrieval. All retrievals are keyed on a hash value that
is generated by the HashMessageID() function.
Dbzinit opens a database, an index into the base file
base, consisting of files base.dir, base.index, and
base.hash, which must already exist. (If the database is
new, they should be zero-length files.) Subsequent
accesses go to that database until dbzclose is called to
close the database.
Dbzfetch searches the database for the specified key. If
<DBZ_TAGGED_HASH in config.data> is ``DO'', it returns the
corresponding value, if any. If <DBZ_TAGGED_HASH in
config.data> is ``DONT'', it returns TRUE and sets the
content of ivalue. Dbzstore stores the key-value pair in
the database if <DBZ_TAGGED_HASH in config.data> is
``DO''; if it is ``DONT'', it stores the content of ivalue.
Dbzstore will fail unless the database files are writable.
Dbzexists checks whether the given message-id exists in
the database. Dbz is optimized for this operation, which
may be significantly faster than dbzfetch.
Dbzfresh is a variant of dbzinit for creating a new
database with more control over details.
Dbzfresh's size parameter specifies the size of the first
hash table within the database, in key-value pairs. Per-
formance will be best if the number of key-value pairs
stored in the database does not exceed about 2/3 of size.
(The dbzsize function, given the expected number of key-
value pairs, will suggest a database size that meets these
criteria.) Assuming that an fseek offset is 4 bytes, the
.index file will be 4*size bytes. The .hash file will be
DBZ_INTERNAL_HASH_SIZE*size bytes (the .dir file is tiny
and roughly constant in size) until the number of key-
value pairs exceeds about 80% of size. (Nothing awful
will happen if the database grows beyond 100% of size, but
accesses will slow down quite a bit and the .index and
.hash files will grow somewhat.)
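The sizing arithmetic above can be sketched as follows. This is only
an illustration of the 2/3 rule and the file-size formulas, not the
algorithm dbzsize actually uses. The helper names are hypothetical,
and the value given for EXAMPLE_HASH_BYTES is an assumption standing
in for DBZ_INTERNAL_HASH_SIZE, whose real value comes from dbz.h.

```c
#include <assert.h>

/* Assumed value, for illustration only; the real constant is
 * DBZ_INTERNAL_HASH_SIZE from dbz.h. */
#define EXAMPLE_HASH_BYTES   6L
/* The text assumes an fseek offset of 4 bytes. */
#define EXAMPLE_OFFSET_BYTES 4L

/* Hypothetical helper: smallest size such that nentries is at most
 * 2/3 of size, i.e. ceil(3 * nentries / 2). */
static long example_min_size(long nentries)
{
    assert(nentries >= 0);
    return (3 * nentries + 1) / 2;
}

/* Expected .index file size in bytes for a table of the given size. */
static long example_index_bytes(long size)
{
    return EXAMPLE_OFFSET_BYTES * size;
}

/* Expected .hash file size in bytes for a table of the given size. */
static long example_hash_bytes(long size)
{
    return EXAMPLE_HASH_BYTES * size;
}
```

With these definitions, 200 expected key-value pairs call for a size
of at least 300, giving a 1200-byte .index file and, under the assumed
hash width, an 1800-byte .hash file.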
Dbz stores up to DBZ_INTERNAL_HASH_SIZE bytes of the mes-
sage-id's hash in the .hash file to confirm a hit. This
eliminates the need to read the base file to handle colli-
sions. This replaces the tagmask feature in previous dbz
releases.
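The idea of confirming a hit from stored hash bytes can be sketched
with a small in-memory table. Everything here is hypothetical: the
slot layout, the linear probing, and the 6-byte hash width are
assumptions chosen for brevity and do not describe the actual dbz file
format. The point is only that comparing the stored bytes settles a
collision without consulting the base file.

```c
#include <assert.h>
#include <string.h>

#define EXAMPLE_HASH_BYTES 6   /* assumed width, for illustration */
#define EXAMPLE_SLOTS      8   /* tiny table, for illustration */

struct example_slot {
    int used;                                /* nonzero if occupied */
    unsigned char hash[EXAMPLE_HASH_BYTES];  /* stored hash bytes */
    long offset;                             /* offset into base file */
};

static struct example_slot example_table[EXAMPLE_SLOTS];

/* Pick a starting slot from the first hash byte. */
static unsigned example_slot_for(const unsigned char *hash)
{
    return hash[0] % EXAMPLE_SLOTS;
}

/* Store an offset under a hash; returns 1 on success, 0 if full. */
static int example_store(const unsigned char *hash, long offset)
{
    unsigned start, i;

    assert(hash != NULL);
    start = example_slot_for(hash);
    for (i = 0; i < EXAMPLE_SLOTS; i++) {
        struct example_slot *s = &example_table[(start + i) % EXAMPLE_SLOTS];
        if (!s->used) {
            s->used = 1;
            memcpy(s->hash, hash, EXAMPLE_HASH_BYTES);
            s->offset = offset;
            return 1;
        }
    }
    return 0;
}

/* Look up a hash; returns the stored offset, or -1 if absent.
 * The memcmp on the stored bytes is what confirms the hit. */
static long example_fetch(const unsigned char *hash)
{
    unsigned start, i;

    assert(hash != NULL);
    start = example_slot_for(hash);
    for (i = 0; i < EXAMPLE_SLOTS; i++) {
        const struct example_slot *s =
            &example_table[(start + i) % EXAMPLE_SLOTS];
        if (!s->used)
            return -1;
        if (memcmp(s->hash, hash, EXAMPLE_HASH_BYTES) == 0)
            return s->offset;
    }
    return -1;
}
```

Two hashes that map to the same starting slot are told apart purely
by the stored bytes, so a lookup never needs to open the text file to
resolve the collision.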
A size of 0 given to dbzfresh is synonymous with the local
default; the normal default is suitable for tables of
5,000,000 key-value pairs. Calling dbzinit(name) with the
database files empty is equivalent to calling
dbzfresh(name,0).
When databases are regenerated periodically, as in news,
it is simplest to pick the parameters for a new database
based on the old one. This also permits some memory of
past sizes of the old database, so that a new database
size can be chosen to cover expected fluctuations. Dbza-
gain is a variant of dbzinit for creating a new database
as a new generation of an old database. The database
files for oldbase must exist. Dbzagain is equivalent to
calling dbzfresh with a size equal to the result of apply-
ing dbzsize to the largest number of entries in the old-
base database and its previous 10 generations.
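The rule dbzagain is described as following can be sketched as: take
the largest entry count seen across the recorded generations, then
size the new table for that count. The function names below are
hypothetical, and example_size_for is a stand-in based on the 2/3
guideline rather than the real dbzsize.

```c
#include <assert.h>
#include <stddef.h>

/* Largest entry count among the recorded generations. */
static long example_max_entries(const long *used, size_t ngen)
{
    long max = 0;
    size_t i;

    for (i = 0; i < ngen; i++)
        if (used[i] > max)
            max = used[i];
    return max;
}

/* Stand-in for dbzsize (an assumption, not its real algorithm):
 * smallest size keeping nentries at or below 2/3 of size. */
static long example_size_for(long nentries)
{
    assert(nentries >= 0);
    return (3 * nentries + 1) / 2;
}

/* Size for the next generation, given the entry counts of the old
 * database and its earlier generations. */
static long example_next_generation_size(const long *used, size_t ngen)
{
    return example_size_for(example_max_entries(used, ngen));
}
```

Sizing from the historical maximum rather than the latest count means
a temporary dip in volume does not shrink the next table below what a
recent peak required.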
When many accesses are being done by the same program, dbz
is massively faster if its first hash table is in memory.
If the incore flag is set to INCORE_MEM, an attempt is
made to read the table in when the database is opened, and
dbzclose writes it out to disk again (if it was read suc-
cessfully and has been modified). Dbzsetoptions can be
used to set the idx_incore and exists_incore flags (each of
which should be INCORE_NO, INCORE_MEM, or INCORE_MMAP) for
the .index and .hash files respectively; this does not
affect the status of a database that has already been
opened. The default is FALSE for the .index file and TRUE
for the .hash file. The attempt to read the table in may
fail due to memory shortage; in this case dbz fails with an
error. Stores to an in-memory database are not (in general)
written out to the file until dbzclose or dbzsync, so if
robustness in the presence of crashes or concurrent
accesses is crucial, in-memory databases should probably be
avoided or the writethrough option should be set to TRUE.
If the nonblock option is turned on, writes to the .index
and .hash files will be done using non-blocking I/O. This
can be significantly faster if your platform supports
non-blocking I/O with files.
Dbzsync causes all buffers etc. to be flushed out to the
files. It is typically used as a precaution against
crashes or concurrent accesses when a dbz-using process
will be running for a long time. It is a somewhat expen-
sive operation, especially for an in-memory database.
If dbz has been compiled with debugging facilities avail-
able (which makes it bigger and a bit slower), dbzdebug
alters the value (and returns the previous value) of an
internal flag which (when 1; default is 0) causes verbose
and cryptic debugging output on standard output.
Concurrent reading of databases is fairly safe, but there
is no (inter)locking, so concurrent updating is not.
An open database occupies three stdio streams and two file
descriptors. Apart from stdio buffers, memory consumption
is negligible except for in-memory databases.
SEE ALSO
dbm(3), history(5)
DIAGNOSTICS
Functions returning BOOL values return TRUE for success,
FALSE for failure. Functions returning OFFSET_T values
return -1 for failure. Dbzinit attempts to have errno set
plausibly on return, but otherwise this is not guaranteed.
An errno of EDOM from dbzinit indicates that the database
did not appear to be in dbz format.
If DBZTEST is defined at compile-time then a main() function
will be included. This will run performance tests and
integrity tests.
HISTORY
The original dbz was written by Jon Zeeff (zeeff@b-
tech.ann-arbor.mi.us). Later contributions by David But-
ler and Mark Moraes. Extensive reworking, including this
documentation, by Henry Spencer (henry@zoo.toronto.edu) as
part of the C News project. MD5 code borrowed from RSA.
Extensive reworking to remove backwards compatibility and
to add hashes into dbz files by Clayton O'Neill
(coneill@oneill.net).
BUGS
Unlike dbm, dbz will refuse to dbzstore with a key already
in the database. The user is responsible for avoiding
this.
The RFC822 case mapper implements only a first approxima-
tion to the hideously-complex RFC822 case rules.
Dbz no longer tries to be call-compatible with dbm in any
way.
6 Sep 1997