AWKA-ELM(5) AWKA EXTENDED LIBRARY METHODS AWKA-ELM(5)
NAME
awka-elm - Awka Extended Library Methods
DESCRIPTION
Awka is a translator of AWK programs to ANSI-C code, and a library
(libawka.a) against which the code is linked to create executables.
Awka is described in the awka manpage.
The Extended Library Methods (ELM) provide a way of adding new func‐
tions to the AWK language, so that they appear in your AWK code as if
they were builtin functions such as substr() or index().
ELM code interfaces with the internal Awka variable structures and
functions, and is suitable for anyone with some experience and profi‐
ciency in C programming.
This document is a step-by-step introduction to how the ELM works, so
by the end of it you can write your own libraries to extend the AWK
programming language using Awka. For example, you could write an
interface to allow AWK programs to communicate with ODBC databases, or
solve the travelling salesman problem given input of town locations -
whatever you require AWK to do should now be possible.
AN OVERVIEW OF HOW IT WORKS
The C code produced by awka from AWK programs is heavily populated with
calls to functions in the awka library (libawka). Hence after it is
compiled, this code must be linked to the library to produce a working
executable.
When parsing an AWK program, awka checks to see if each function call
in the program is (a) a core builtin function, (b) a call to a user-
defined AWK function in the program, or (c) a call to one of the
extended builtin functions. The above order of priority is applied, so
a user-defined function (b) overrides (c), and (a) overrides (b) to
avoid conflicts.
If none of these proves to be true, the function call is written in the
code in the format of a user-defined function, even though awka knows
of no such function. Awka assumes that by link time you will provide
another object file or library that contains the missing function and
resolves the call.
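The lookup order just described can be sketched in a few lines of C. This is only an illustration of the priority rule, not awka's actual parser: demo_classify(), demo_in() and the name tables passed to them are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for the four cases described above. */
enum demo_kind { DEMO_CORE, DEMO_USER, DEMO_EXTENDED, DEMO_UNRESOLVED };

/* Return 1 if name appears in a NULL-terminated list of names. */
static int demo_in(const char *const *names, const char *name)
{
    for (; *names != NULL; names++) {
        if (strcmp(*names, name) == 0)
            return 1;
    }
    return 0;
}

/* Apply the priority order: core builtin, then user-defined function,
   then extended builtin. Anything else is left for the linker. */
enum demo_kind demo_classify(const char *name,
                             const char *const *core,
                             const char *const *user,
                             const char *const *extended)
{
    assert(name != NULL);
    if (demo_in(core, name))
        return DEMO_CORE;
    if (demo_in(user, name))
        return DEMO_USER;
    if (demo_in(extended, name))
        return DEMO_EXTENDED;
    return DEMO_UNRESOLVED;
}
```

A name that lands in the last case, such as mymath, is the kind of call an ELM function is meant to satisfy.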
So if I pass awka the following code:
BEGIN { print mymath(3,4) }
The call it generates will look like this...
mymath_fn(awka_arg2(a_TEMP, _litd0_awka, _litd1_awka))
So all we need to do is write the mymath_fn() function, and link it
with the awka-generated code, and bingo! AWK has been extended by you,
to do what you want. And the only restrictions on what a function like
mymath_fn() might do are those imposed by the C language!
So, you write the function, compile it into a library, use it in your
AWK program, translate it, link it in, and you're away - it's that
simple (fingers crossed).
FUNCTIONS AND DATA STRUCTURES
Ok, the first thing to notice is that the function name in the AWK
code, mymath, has had _fn appended in the C code. This happens
with all unresolved AWK function calls (also with user-defined function
names, but that doesn't matter here). It's done to avoid unintentional
conflicts with functions in other libraries.
The prototype of any such function is this:
a_VAR * funcname_fn( a_VARARG * )
Ugh! What's this a_VARARG thingy? Yes, learned reader, the time has
come to get acquainted with the dreaded Awka data structures. Well
they're pretty simple actually. The two you need to know about are
a_VAR and a_VARARG, and as the latter contains arrays of the former,
I'll deal with a_VAR first.
The a_VAR Structure
typedef struct {
double dval; /* the variable's numeric value */
char * ptr; /* pointer to string, array or RE structure */
unsigned int slen; /* length of string ptr as per strlen */
unsigned int allc; /* space mallocated for string ptr */
char type; /* records current cast of variable */
char type2; /* special flag for dual-type variables */
char temp; /* TRUE if a temporary variable */
} a_VAR;
These are used prolifically throughout the Awka library, and are at the
heart of how it manipulates data. Remember, AWK variables are
essentially typeless, as they can be cast to number, string or regular
expression at your whim throughout a program. The only casts you can't
make are to and from arrays, as a variable is always either an array or
a scalar (the other types).
Recall our mymath example earlier. In the AWK code, we had
"mymath(3,4)", but the C code was "mymath_fn(awka_arg2(a_TEMP,
_litd0_awka, _litd1_awka))".
The numeric value of 3 has been changed to _litd0_awka, and 4 to
_litd1_awka. If you run awka with this example program & examine the
output, you'll see that both _litd0_awka and _litd1_awka are pointers
to a_VAR structures, and each has been set to the appropriate numeric
values. Hence, all data passed to our functions will be embodied
inside a_VAR's.
Confused? Yes? No? Take heart, it doesn't get much worse, and with a
few more examples I hope things should be clearer. Looking at the call
to mymath_fn above, you'll notice a call to awka_arg2(). Remember that
mymath_fn only takes a pointer to an a_VARARG, so awka_arg2() obviously
returns one of these.
What an a_VARARG contains is an array of a_VARs, and an integer showing
how many there are in the array - that's all! Don't believe me? Then
here's the structure in all its glory:
The a_VARARG Structure
typedef struct {
a_VAR *var[256];
int used;
} a_VARARG;
The a_VARARG structure gives us an easy means of passing around
flexible numbers of a_VARs to functions, much as you'd use a variable
argument list (stdarg) in a C program. If you don't know what that
does and have some time, check the stdarg manpage.
So, to conclude, awka_arg2() takes two a_VARs and packages them nicely
into an a_VARARG to make life easy for our function. Another thing to
note - the a_VARARG structure allows up to 256 arguments. No
parameters, only arguments, and they always win them! Sorry, on with
the serious stuff...
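To make the packaging step concrete, here is a minimal sketch of what a helper in the spirit of awka_arg2() has to do. It uses stand-in copies of the two structures (demo_VAR and demo_VARARG) and a hypothetical demo_arg2(), so it compiles without libawka. The real awka_arg2() also takes a leading flag (a_TEMP in the generated call) and manages its own storage, and neither of those is modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins mirroring the layouts shown above; not the real libawka types. */
typedef struct {
    double dval;
    char *ptr;
    unsigned int slen;
    unsigned int allc;
    char type;
    char type2;
    char temp;
} demo_VAR;

typedef struct {
    demo_VAR *var[256];
    int used;
} demo_VARARG;

/* Hypothetical helper: store two variable pointers and record the count. */
void demo_arg2(demo_VARARG *va, demo_VAR *v1, demo_VAR *v2)
{
    assert(va != NULL);
    va->var[0] = v1;
    va->var[1] = v2;
    va->used = 2;
}

/* Pack the literals 3 and 4, as in the mymath(3,4) call, and report
   whether the receiving side would see both values. Returns 1 if so. */
int demo_pack_literals(void)
{
    demo_VAR three = {0}, four = {0};
    demo_VARARG va = {0};

    three.dval = 3.0;
    four.dval = 4.0;
    demo_arg2(&va, &three, &four);
    return va.used == 2 && va.var[0]->dval == 3.0 && va.var[1]->dval == 4.0;
}
```

Note that only pointers are stored, so the function on the receiving end works on the caller's variables rather than on copies.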
THE MYMATH FUNCTION IMPLEMENTED
So when we come to write mymath_fn, what type of thing should it
contain? Ok, let's assume we want mymath to add the two numbers it
receives as arguments, then add on the two numbers multiplied, and
return the result, i.e. (n1+n2)+n1*n2.
Well, here goes...
#include <libawka.h>

a_VAR *
mymath_fn( a_VARARG *va )
{
    a_VAR *ret = NULL;

    /* we need at least two arguments to work with */
    if (va->used < 2)
        awka_error("function mymath expecting 2 arguments, only got %d.\n", va->used);

    /* temporary numeric variable; FALSE leaves the freeing to libawka */
    ret = awka_getdoublevar(FALSE);
    ret->dval = (awka_getd(va->var[0]) + awka_getd(va->var[1])) +
                va->var[0]->dval * va->var[1]->dval;
    return ret;
}
Ok, there's not a lot to it, so let's start at the top. You need to
include libawka.h, as it defines the data structures plus the whole
Awka API that you'll be calling.
The definition of mymath_fn is as described earlier. It will need to
return a numeric value, but as we're in AWK (conceptually), this will
need to be enclosed in an a_VAR, hence the existence of ret.
The incoming a_VARARG can contain any number of a_VAR's - we only care
about the first two, so we check to see whether these exist, and if not
spit an error through the awka_error function (or you could use your
own error handler). When writing your own functions, you'll need to
remember that any number of arguments could be passed in, and they
could be of any type, so you'll need to check them.
So far, ret is NULL, so we need to create a structure to point it to.
Better than that, we call awka_getdoublevar(), which gets us a tempo‐
rary variable, already initialised to contain a numeric value. You
guessed it, there's an awka_getstringvar() that we could use if our
function was to return a string. The value of FALSE passed to
awka_getdoublevar() means that we don't want to be responsible for
freeing this structure, but prefer to leave it to libawka's internal
garbage collection. I can't see any reason why you'd choose TRUE, but
it's there just in case.
The next 2 lines do the core stuff. Ok, ret->dval is set, that makes
sense. The expression refers to the contents of the a_VARARG->a_VAR
array, again this is expected. At first, though, it calls awka_getd()
for each of the arguments, but on the next line it references the dval
value directly. Why the calls to awka_getd?
Because we can't be sure that the incoming variables are already cast
to numbers, these functions (actually macros) do the casting for us
and return the value of dval after the cast is done. Subsequently, we
can look at dval directly, as we know it's been set to the current
numerical value of the variable.
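The cast-then-read idea can be sketched as follows. This is not how the awka_getd() macro is actually written; demo_VAR, demo_getd() and the DEMO_DBL and DEMO_STR type tags are all stand-ins invented for the illustration, and the sketch ignores the dual-type flag (type2) entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a_VAR, mirroring the layout shown earlier. */
typedef struct {
    double dval;
    char *ptr;
    unsigned int slen;
    unsigned int allc;
    char type;
    char type2;
    char temp;
} demo_VAR;

/* Hypothetical type tags; the real values live in libawka.h. */
enum { DEMO_DBL = 1, DEMO_STR = 2 };

/* Return the numeric value of v, converting from its string form first
   if it is not already a number. After the call v->dval is current, so
   later code may read it directly. */
double demo_getd(demo_VAR *v)
{
    assert(v != NULL);
    if (v->type != DEMO_DBL) {
        v->dval = (v->ptr != NULL) ? strtod(v->ptr, NULL) : 0.0;
        v->type = DEMO_DBL;   /* simplification: the string cast is forgotten */
    }
    return v->dval;
}
```

Reading dval without going through the accessor first would return whatever number happened to be left there from an earlier cast.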
Lastly, we return ret.
COMPILING AND LINKING
Alright, let's get this working. Follow these steps:
1. Create mymath.c with mymath_fn(), exactly as it's written above.
2. Create mymath.h containing: a_VAR * mymath_fn( a_VARARG *va );
3. gcc -c mymath.c (or use whatever C compiler you have).
4. awka -i mymath.h 'BEGIN { print mymath(3,4) }' >test.c
5. gcc -I. test.c mymath.o -lawka -lm -o mytest
6. ./mytest
The output from running mytest should be 19. Magic!
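If you'd like to check the arithmetic before installing anything, the sketch below repeats the (n1+n2)+n1*n2 logic on stand-in structures. demo_VAR, demo_VARARG and demo_mymath() are hypothetical names, and where the real function calls awka_error() this version simply returns -1.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins mirroring the a_VAR and a_VARARG layouts; not the real types. */
typedef struct {
    double dval;
    char *ptr;
    unsigned int slen;
    unsigned int allc;
    char type;
    char type2;
    char temp;
} demo_VAR;

typedef struct {
    demo_VAR *var[256];
    int used;
} demo_VARARG;

/* Same expression as mymath_fn(). Stores the answer in *result and
   returns 0, or returns -1 if fewer than two arguments were passed. */
int demo_mymath(const demo_VARARG *va, double *result)
{
    double n1, n2;

    assert(va != NULL && result != NULL);
    if (va->used < 2)
        return -1;

    n1 = va->var[0]->dval;
    n2 = va->var[1]->dval;
    *result = (n1 + n2) + n1 * n2;
    return 0;
}
```

With 3 and 4 as the arguments this gives (3+4)+3*4 = 7+12 = 19, matching the output of mytest.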
A more comprehensive example is the awkatk library available from the
awka website. Hopefully you'll find it helpful, and who knows, you may
even use it to write GUI interfaces from AWK!
HOW & WHEN WOULD YOU USE IT?
Obviously, this is intended to extend the limits of the AWK universe,
as you could introduce any functionality written in C as a new builtin
function within AWK.
There may be complex functions you've written in AWK and use all the
time that are just plain inefficient, even using Awka. They're stable,
you have the skill to implement them in C, so now you can, and your AWK
programs become shorter in the process. It's no longer a choice of C
or AWK, now you can migrate sections to C as & when you like.
There are many functions in standard C libraries that AWK doesn't have.
Things like strcasecmp(), fread(), cbrt(), and so on. Now you can
implement them.
Lastly, I'd love to see Awka have functions to read & write proprietary
formats like MS Excel, to communicate with ODBC databases, to perform
complex mathematical or scientific operations, to implement true multi-
dimensional arrays, to provide Fast Fourier Transform functions - I
know it's possible. If you do develop something neat like this, it'd be
very cool if you were to make it available for everyone to share. Just
send an email to andrewsumner@yahoo.com, and I'd be happy to host it
on, or link it from the Awka website.
NOTE: KEEP YOUR API FLAT
So you've created quite a few Awka-ELM functions that you've put
together into a library. Let's say they calculate the time needed to
build the Sydney Harbour Bridge given a volume of manpower and the
number of supervisors. Internally, there are quite a few algorithms that
take into account strikes by unions, material shortages, and casualties
as workers fall off the bridge.
Because of this complexity, functions within your library will need to
call other functions. This is fine. What you need to avoid is having
one API function call another API function. Instead, keep any functions
they call hidden within the library, and ensure these internal
functions do not use the awka_getdoublevar(), awka_getstringvar()
or awka_tmpvar() calls.
Apart from keeping your library structure nice and hierarchical and
your API simple, it avoids overloading awka's internal pool of tempo‐
rary variables. If this pool is overloaded, random chaos will ensue,
so please avoid it.
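A flat layout might look something like the sketch below. Everything in it is hypothetical: the demo_ names, the stand-in structures, and the made-up effort formula. So that it compiles without libawka, the caller supplies the result variable, where a real API function would obtain it from awka_getdoublevar(FALSE).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins mirroring the a_VAR and a_VARARG layouts; not the real types. */
typedef struct {
    double dval;
    char *ptr;
    unsigned int slen;
    unsigned int allc;
    char type;
    char type2;
    char temp;
} demo_VAR;

typedef struct {
    demo_VAR *var[256];
    int used;
} demo_VARARG;

/* Internal helper: hidden with static, works on plain C doubles and
   never asks for a temporary variable. The formula is made up. */
static double demo_effort(double workers, double supervisors)
{
    return workers * 8.0 + supervisors * 2.0;
}

/* API function 1: effort for one day. Returns 0, or -1 on bad arguments. */
int demo_daily_fn(const demo_VARARG *va, demo_VAR *ret)
{
    assert(va != NULL && ret != NULL);
    if (va->used < 2)
        return -1;
    ret->dval = demo_effort(va->var[0]->dval, va->var[1]->dval);
    return 0;
}

/* API function 2: effort for a week. It calls the helper directly
   rather than going through demo_daily_fn(). */
int demo_weekly_fn(const demo_VARARG *va, demo_VAR *ret)
{
    assert(va != NULL && ret != NULL);
    if (va->used < 2)
        return -1;
    ret->dval = 7.0 * demo_effort(va->var[0]->dval, va->var[1]->dval);
    return 0;
}
```

Each API function touches one result variable per call, however deep the internal call chain goes.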
NOTE: REFERENCING GLOBAL VARIABLES
All global variables in your AWK program are accessible by your library
functions. Herein lies the potential for great danger, so be careful!
Global variables are, of course, pointers to a_VAR structures, and
their name is the same as in the AWK script, with _awk appended. So
the variable 'myvar' in the script would be myvar_awk in the translated
C code. If you know what the variable name is, you can put an extern
declaration of it in your library code then work with it directly, but
this may be very restrictive, as it would mean that every script that
uses your library would need that variable name reserved. There are
other methods.
One of the easiest is with arrays. You can pass them in as arguments
to your functions, as their address is passed over rather than a copy
of their contents. Scalars are not as easy. Suppose our function is
to work with a global variable, but expects a string argument
containing the variable name in order to identify which variable to
work with - this would make it pretty flexible.
You have available to you the _gvar variable, a list of gvar_struct
entries (both described in awka-elmref(5)). This contains the name of
every global variable in the script, and it's a simple matter to search
down the list to find a pointer to the a_VAR structure of the variable
you want to use.
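The search itself might be sketched like this. The layout of struct demo_gvar (a name plus an a_VAR pointer, with a NULL name marking the end of the list) is an assumption made for the illustration, as are demo_VAR and demo_find_global(); check awka-elmref(5) for the real gvar_struct definition and for how names are stored before relying on any of it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a_VAR, mirroring the layout shown earlier. */
typedef struct {
    double dval;
    char *ptr;
    unsigned int slen;
    unsigned int allc;
    char type;
    char type2;
    char temp;
} demo_VAR;

/* Assumed shape of one list entry; not taken from libawka.h. */
struct demo_gvar {
    const char *name;   /* variable name; NULL ends the list (assumption) */
    demo_VAR *var;      /* the variable's structure */
};

/* Walk the list and return the variable called name, or NULL if the
   script has no such global. */
demo_VAR *demo_find_global(const struct demo_gvar *list, const char *name)
{
    assert(list != NULL && name != NULL);
    for (; list->name != NULL; list++) {
        if (strcmp(list->name, name) == 0)
            return list->var;
    }
    return NULL;
}
```

A function taking the variable name as a string argument would call something like this once, then work on the returned pointer, remembering to handle the NULL case.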
NOTE: CUSTOM DATA STRUCTURES
Looking again at the a_VAR structure, you may note that it contains a
char * pointer that can reference strings, arrays and regular expres‐
sions. There is no reason why you couldn't introduce your own custom
data structure and attach it to a global variable within one of your
functions, as long as you adhere to the following rules:
1. Don't set the variable to anything in AWK after you set it to your
customised value, as libawka will try (and fail) to free the value up,
causing all sorts of flow-on problems.
2. Don't use the AWK language to copy or compare this variable to
others, even with two variables of the same custom type (i.e. custvar1 =
custvar2), as libawka will have no idea how the copy should be done, and
it will stuff it up. Instead, provide your own copy and comparison
functions.
3. If your structures are memory intensive, you may consider providing
a method of freeing the structures when they are no longer needed.
4. Document what your data structures and methods do, and how they
should be used in the AWK script. Please, please do this, as it could
save you a lot of grief later. If your library becomes publicly
available this is especially necessary.
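As a rough sketch of rules 2 and 3, here is a hypothetical custom type (a two-dimensional point) with its own set, copy and free functions. demo_VAR is a stand-in for a_VAR, and none of the demo_ names exist in libawka. How the type fields should be set so that libawka leaves the pointer alone is not covered here.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a_VAR, mirroring the layout shown earlier. */
typedef struct {
    double dval;
    char *ptr;
    unsigned int slen;
    unsigned int allc;
    char type;
    char type2;
    char temp;
} demo_VAR;

/* Hypothetical custom payload. */
typedef struct {
    double x;
    double y;
} demo_point;

/* Attach a newly allocated point to v. Any previous value of v->ptr is
   overwritten, not freed. Returns 0, or -1 if out of memory. */
int demo_point_set(demo_VAR *v, double x, double y)
{
    demo_point *p;

    assert(v != NULL);
    p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->x = x;
    p->y = y;
    v->ptr = (char *) p;
    return 0;
}

/* Rule 2: our own copy, giving dst a separate point with equal contents. */
int demo_point_copy(demo_VAR *dst, const demo_VAR *src)
{
    const demo_point *p;

    assert(dst != NULL && src != NULL && src->ptr != NULL);
    p = (const demo_point *) src->ptr;
    return demo_point_set(dst, p->x, p->y);
}

/* Rule 3: release the payload when it is no longer needed. */
void demo_point_free(demo_VAR *v)
{
    assert(v != NULL);
    free(v->ptr);
    v->ptr = NULL;
}
```

Because the copy allocates its own point, freeing one variable leaves the other intact.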
This has been a very brief introduction indeed, but hopefully enough to
get you started. I recommend you refer to the awka-elmref(5) manpage
for a listing of key libawka API functions and data definitions that
are available for you to use (but hopefully not abuse). If you have
any questions at all, don't be afraid to contact me
(andrewsumner@yahoo.com). Put the word "awka" at the front of your
message title so I know it's not spam.
SEE ALSO
awka(1), awka-elmref(5), gcc(1)
BUGS
Bound to be plenty. Let me know if you find a bug with the libawka
interface, or get stuck with a problem. I am not, though, in any way
responsible for bugs that are introduced by your code, nor am I liable
for any damages or expenses incurred as a result. Nor am I liable for
anything you do using Awka.
I'll help where I can, and I'll usually help debug someone's library if
I have a personal interest in it. If you're not sure, try me anyway,
the worst I can do is say no, and I might be able to help. I really
like folk who send fixes along with bug reports, though. And I love
the folk who send cash inducements (at last count, um, zero folk). Oh
well, enough rambling, time to finish.
AUTHOR
Andrew Sumner, August 2000 (andrewsumner@yahoo.com).
Version 0.7.x Aug 8 2000 AWKA-ELM(5)