                   +-------------------------------------+
                   | WASD HTTP SERVER - "NUTS AND BOLTS" |
                   +-------------------------------------+

                            WASD VMS Web Services
                   Copyright (C) 1996-2013 Mark G. Daniel.
                      Revision: v10.3.0 (September 2013)

This package is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 3 of the License, or any later version.  This package is
distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.  See the GNU General Public License for more details.

              WASD_ROOT:[000000]GNU_GENERAL_PUBLIC_LICENSE.TXT
                     http://www.gnu.org/licenses/gpl.txt

                                      
                        Thanks to BETA Testing Sites
                        ----------------------------

WASD has become a large package and requests for additional functionality are
frequent enough that the relatively long development-testing-refinement-release
cycle of the first few years is no longer possible.  It is no longer feasible
for the one author to code and exhaustively test all the functionality of each
new release.  The role of BETA testing has become indispensable.  Thanks to all
who participate in this part of the cycle, in particular (and in no significant
order):

  Jeremy Begg               VSM Software Services, Adelaide, South Australia
  Alex Daniels              themail, United Kingdom
  Victoriano Giralt         University of Malaga, Spain
  Dr Klaus-Werner Gurgle    Hamburg University, Germany
  Alexander Ivanov          S&B Joint Stock Company, St Petersburg, Russia
  Ruslan Laishev            Delta Telecom, St Petersburg, Russia
  Tom Linden                Kednos Enterprises, California, USA
  Jean-Pierre Petit         ESME-Sudria, Paris, France
  Jean-François Piéronne    European WASD Advocate :^), Paris, France
  Geoff Roberts             St Mark's College, Pt Pirie, South Australia

There may be others but these are the ones who pester me with email.  As the
BETA test sites are also often the ones requesting additional functionality
they should also be thanked for pushing along WASD's capability set.


                      Brief Introduction to HTTPd Code
                      --------------------------------

This document is designed to be only a broad overview of the basic workings of
the HTTP server.  It is derived from an original laid-up document, "Nuts and
Bolts", which was abandoned in favour of something requiring a little less
up-keep (therefore increasing its chances of being up-to-date :^) and that
could be kept with the server code itself.  This description only covers the
server image itself, not the full suite of WASD VMS Hypertext Services
software.


                               General Design
                               --------------

(Almost) without apology the server is a large, monolithic, shall we be kind
and say old-fashioned, piece of software.  Lots of things would be done
differently if it were being started over.  Other things wouldn't be.  The code
is always attempting to do things faster or more efficiently (especially
because it's VMS) and so it's a bit clunky in places.  Other clunkiness is due
entirely to the author ;^)


                              Server Behaviour
                              ----------------

The HTTPd executes permanently on the server host, listening for client
connection requests on TCP/IP port 80 (by default).  It provides concurrent
services for a (technically) unlimited number of clients (constrained only by
the server resources).  When a client connects, the server performs the
following tasks:

  1. creates a thread for this request (this term does not denote the use of
     DECthreads or other specific thread library, just a thread of execution)

  2. reads and analyzes the HTTP request sent ...

    o  initiates transfer of the requested file, either from the
       file system or from the file cache

    o  initiates processing of an SSI file

    o  initiates directory listing

    o  initiates processing of a clickable-image mapping file

    o  initiates file/directory create/update

    o  initiates server administration

    o  initiates web file-system update

    o  creates a (detached) process to execute a CGI script with:

      -  SYS$COMMAND and SYS$OUTPUT assigned to intermediate mailboxes
         (essentially pipes)

      -  SYS$INPUT logical name providing a mailbox allowing the script to read
         the raw request body (if any)

      -  CGIPLUSIN logical name providing a mailbox allowing a CGIplus
         (persistent) script to read CGI variables

      -  (for non-CGIplus) DCL symbols containing CGI-compliant variables

      -  for the life of the script process HTTPd:

        *  controls the essential behaviour via its SYS$COMMAND 

        *  receives data written to its SYS$OUTPUT, writing this to the client

        *  supplies the body of a (POSTed) request via SYS$INPUT

    o  creates a (detached) process to execute a WebSocket script with:

      -  SYS$COMMAND assigned to intermediate mailbox (controls process)

      -  SYS$OUTPUT assigned to mailbox until WebSocket established

      -  CGIPLUSIN logical name providing a mailbox allowing the WebSocket
         (CGIplus, persistent) script to read CGI variables

      -  WEBSOCKET_INPUT logical name providing a mailbox to read client
         data for the life of the request

      -  WEBSOCKET_OUTPUT logical name providing a mailbox to write data
         to the client for the life of the request

      -  for the life of the script process HTTPd:

        *  controls the essential behaviour via its SYS$COMMAND 

        *  disconnects the SYS$OUTPUT and CGIPLUSIN mailboxes once WebSocket
           is established and allows another WebSocket request to be accepted

        *  manages multiple WEBSOCKET_INPUT and WEBSOCKET_OUTPUT mailboxes
           (i.e. multiple clients) connected to the process

    o  creates a network process to execute a CGI or OSU script

        *  controls the essential behaviour of a CGI script, providing the
           CGI-compliant variables and receiving CGI-compliant output

        *  provides an emulation of the native OSU scripting environment

  3. closes the connection to the client and disposes of the thread

For I/O intensive activities like file transfer and directory listing, the
AST-driven code provides an efficient, multi-threaded environment for the
concurrent serving of multiple clients.


                               Multi-Threaded
                               --------------

The WASD HTTPd is written to exploit VMS operating system characteristics
allowing the straight-forward implementation of event-driven, multi-threaded
code.  Asynchronous System Traps (ASTs), or software interrupts, at the
conclusion of an I/O (or other) event allow functions to be activated to post-
process the event.  The event traps are automatically queued on a FIFO basis,
allowing a series of events to be sequentially processed.  When not responding
to an event the process is quiescent, or otherwise occupied, effectively
interleaving I/O and processing, and allowing sophisticated multi-threading of
client requests.

Multi-threaded code is inherently more complex than single-threaded code, and
there are issues involved in the synchronization of some activities in such
an environment.  Fortunately VMS handles many of these issues internally.
After connection acceptance, all of the processing done within the server is at
USER mode AST delivery level, and for all intents and purposes the processing
done therein is atomic, implicitly handling its own synchronization issues.

The HTTPd is written to make longer duration activities, such as the transfer
of a file's contents, event-driven.  Other, shorter duration activities, such as
accepting a client connection request, are handled synchronously.

It is worth noting that with asynchronous, and AST-driven output, the data
being written must be guaranteed to exist without modification for the duration
of the write (indicated by completion AST delivery).  This means data written
must be static or in buffers that persist with the thread.  Function-local
(automatic) storage cannot be used.  The server allocates dynamic storage for
general (e.g. output buffering) or specific (e.g. response headers) uses.


                                AST Behaviour
                                -------------

With server functions having AST capability, in particular $QIO, the server is
designed to rely on the AST routine to report any error, including both those
that occur during the IO operation and any that occur when initiating the IO
(which would normally prevent it being queued) even if that requires directly
setting the IO status block with the offending status and explicitly declaring
the AST.  This eliminates any ambiguity about under what conditions ASTs are
delivered ... ASTs are always delivered.

If a call to a server function with AST capability does not supply an AST
routine then it must check the return status to determine whether it can
continue processing.  If it supplies an AST routine address then it must not
act on any error status returned, it must allow the AST routine to process
according to the IO status block status.
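As an illustration of the "ASTs are always delivered" rule, the following is a
portable C sketch, not WASD code.  It simulates the process AST queue with a
small FIFO (on VMS an explicit declaration would be made with $DCLAST), and
write_async(), declare_ast(), deliver_asts() and the status values are
hypothetical stand-ins.  Whether the write fails at initiation or completes,
the caller learns the outcome only through the status block its AST examines.

```c
#include <stddef.h>

/* Stand-ins for VMS condition values (odd = success). */
#define SS_NORMAL   1
#define SS_BADPARAM 20

typedef struct { unsigned short status; unsigned short count; } IoStatusBlock;
typedef void (*AstRoutine)(void *param);

/* Small FIFO simulating the process AST queue (no overflow handling). */
#define AST_QUEUE_MAX 16
static struct { AstRoutine routine; void *param; } ast_queue[AST_QUEUE_MAX];
static unsigned ast_head = 0, ast_tail = 0;

static void declare_ast(AstRoutine routine, void *param)
{
    ast_queue[ast_tail % AST_QUEUE_MAX].routine = routine;
    ast_queue[ast_tail % AST_QUEUE_MAX].param = param;
    ast_tail++;
}

/* Deliver queued ASTs in FIFO order, returning how many were delivered. */
static int deliver_asts(void)
{
    int delivered = 0;
    while (ast_head != ast_tail) {
        unsigned i = ast_head++ % AST_QUEUE_MAX;
        ast_queue[i].routine(ast_queue[i].param);
        delivered++;
    }
    return delivered;
}

/* Per-request write state; the data must persist until the AST runs. */
typedef struct {
    IoStatusBlock iosb;
    const char *data;
    size_t length;
} WriteRequest;

/* Hypothetical asynchronous write.  A failure to initiate is reported the
   same way as a failed I/O: status placed in the IOSB, AST declared. */
static int write_async(WriteRequest *rq, AstRoutine ast)
{
    if (rq->data == NULL || rq->length == 0) {
        rq->iosb.status = SS_BADPARAM;      /* could not be queued */
        rq->iosb.count = 0;
    } else {
        rq->iosb.status = SS_NORMAL;        /* pretend the I/O completed */
        rq->iosb.count = (unsigned short)rq->length;
    }
    if (ast != NULL) declare_ast(ast, rq);  /* ASTs are always delivered */
    return rq->iosb.status;                 /* only meaningful without an AST */
}

/* Example completion AST: acts solely on the IOSB status. */
static unsigned short last_ast_status;
static void write_ast(void *param)
{
    last_ast_status = ((WriteRequest *)param)->iosb.status;
}
```

A caller passing write_ast ignores the value returned by write_async() and
leaves all decisions to the AST routine; a caller passing no AST must test the
returned status itself.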


                               Server "Tasks"
                               --------------

Each request can have one or more tasks executed sequentially to fulfil the
request.  This occurs most obviously with Server-Side Includes (SSI, the HTML
pre-processor) but also, to a more limited extent, with directory listing and
its read-me file inclusion.  A task is more-or-less defined as one of:

  o  transfer file from file-system or cache
  o  directory listing
  o  SSI interpretation
  o  DCL execution or script processing
  o  DECnet script processing
  o  POST/PUT processing
  o  WebDAV processing
  o  WebSocket processing
  o  proxy processing
  o  update facility processing

Each one of the associated modules executes relatively independently.  Before
commencing a task, a next-task pointer can be set to the function required to
execute at the conclusion of that task.  At that conclusion, the next-task
functionality checks for a specified task to start or continue.  If one has been
specified, control is passed to that next-task function via an AST.

Some tasks can only be called once per request, for example image mapping,
file transfer using the cache, file upload and menu interpretation.

Other tasks have the possibility of being called within other tasks or multiple
times serially during a request.  An example is the transfer file task
(non-cache), which can be used within directory listings to insert read-me
files, and when "<!--#include"ing multiple files within an SSI document.

Two tasks, the directory listing and SSI interpretation tasks, can be called
multiple times and can also have concurrent instances running.  For example, an
SSI file can <!--#include another SSI file, nesting the SSI execution.  The same
SSI document can have an embedded directory listing that contains an SSI
read-me file with another directory listing.  It can get quite convoluted!  The
tasks are implemented using a linked-list FILO stack allowing this nesting.
SSI documents have a maximum depth for nesting, preventing recursive document
inclusion.
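The next-task pointer and the linked-list FILO task stack might be sketched as
follows, under stated assumptions: the Task and Request structures,
task_push(), task_conclude() and the nesting limit SSI_MAX_DEPTH are
hypothetical, and the next-task function is called directly here where the
server passes control via an AST.

```c
#include <stdlib.h>

#define SSI_MAX_DEPTH 5   /* hypothetical nesting limit */

typedef struct Request Request;
typedef void (*TaskFunction)(Request *rq);

typedef struct Task {
    struct Task *outer;       /* task this one is nested within */
    const char *name;         /* e.g. "ssi", "dirlist" */
    TaskFunction next_task;   /* run when this task concludes (may be NULL) */
} Task;

struct Request {
    Task *task_stack;         /* head of list = innermost, current task */
    int depth;
    int concluded;            /* demo counter */
};

/* Push a (possibly nested) task; refuse once the nesting limit is reached,
   which stops a document recursively including itself. */
static int task_push(Request *rq, const char *name, TaskFunction next)
{
    Task *t;
    if (rq->depth >= SSI_MAX_DEPTH) return 0;
    if ((t = malloc(sizeof *t)) == NULL) return 0;
    t->outer = rq->task_stack;
    t->name = name;
    t->next_task = next;
    rq->task_stack = t;
    rq->depth++;
    return 1;
}

/* Conclude the innermost task: pop it (last in, first out) and pass control
   to its next-task function, if one was specified. */
static void task_conclude(Request *rq)
{
    Task *t = rq->task_stack;
    TaskFunction next;
    if (t == NULL) return;
    rq->task_stack = t->outer;
    rq->depth--;
    next = t->next_task;
    free(t);
    if (next != NULL) next(rq);
}

/* Example next-task function. */
static void note_conclusion(Request *rq) { rq->concluded++; }
```

Keeping the stack as a singly linked list means nesting costs one allocation
per level, and the depth check is what turns a self-including SSI document into
an error rather than unbounded recursion.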


                             Memory Management
                             -----------------

Memory management is (almost) exclusively done using VMS system library virtual
memory routines.  Using these rather than generic C library routines is a
deliberate design decision, and done with the following considerations.

  o  The library routines allow a more precise integrity checking and error
     reporting for both the allocation and freeing of dynamic memory chunks.

  o  Separate zones provide some measure of isolation between threads of usage
     and in this way assist in isolating any errors in memory usage.

  o  Separate zones may be created with characteristics tailored to specific
     memory request profiles, reducing overhead and improving performance.

  o  A separate zone may be used for each request thread improving
     deallocation performance at request disposal.

  o  Memory behaviour for the various aspects of server usage is more easily
     monitored where separate zones represent distinct usages.

Per-request memory is managed in three distinct portions.

  1. A fixed-size structure of dynamic memory is used to contain the core
     request thread data. This is released at thread disposal.  This is
     allocated from a specific virtual memory zone tailored for fixed-size
     management.

  2. A heap of dynamically allocated memory is maintained during the life of a
     thread structure.  When a dynamic structure is required during request
     processing it is allocated from a request-thread-specific zone of virtual
     memory.  This list is released in one operation at thread disposal, making
     it quite efficient.  Maintaining a thread-specific heap of virtual memory
     also makes it easier to avoid memory leakage.
    
  3. Per-task data structures are allocated using the above heap.  These
     structures are used to store task-specific data.  If a task is used
     multiple times within the one request (see above) the previously allocated
     and now finished-with (but not deallocated) task structures can be
     reused, reducing overhead.
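The per-request heap and task-structure reuse can be pictured with the sketch
below.  It is a portable stand-in, not WASD's code: the server uses VMS virtual
memory zones, whereas this version chains malloc() blocks so that
heap_release() can free everything at thread disposal in one call.
RequestHeap, heap_alloc(), heap_release(), TaskData and task_data_get() are
hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct HeapChunk {
    struct HeapChunk *next;
    max_align_t payload[];    /* keeps returned memory suitably aligned */
} HeapChunk;

typedef struct { HeapChunk *chunks; size_t count; } RequestHeap;

/* Allocate zeroed memory that lives until the request is disposed of. */
static void *heap_alloc(RequestHeap *heap, size_t size)
{
    HeapChunk *c = malloc(sizeof *c + size);
    if (c == NULL) return NULL;
    c->next = heap->chunks;
    heap->chunks = c;
    heap->count++;
    memset(c->payload, 0, size);
    return c->payload;
}

/* Release the whole heap in one operation at thread disposal. */
static void heap_release(RequestHeap *heap)
{
    HeapChunk *c = heap->chunks;
    while (c != NULL) {
        HeapChunk *next = c->next;
        free(c);
        c = next;
    }
    heap->chunks = NULL;
    heap->count = 0;
}

/* Per-task structure, allocated from the request heap and reused. */
typedef struct TaskData {
    struct TaskData *next;
    int in_use;
    char scratch[256];
} TaskData;

/* Reuse a finished (but not deallocated) task structure if one exists,
   otherwise allocate a new one from the request heap. */
static TaskData *task_data_get(RequestHeap *heap, TaskData **list)
{
    TaskData *t;
    for (t = *list; t != NULL; t = t->next)
        if (!t->in_use) { t->in_use = 1; return t; }
    if ((t = heap_alloc(heap, sizeof *t)) == NULL) return NULL;
    t->in_use = 1;
    t->next = *list;
    *list = t;
    return t;
}
```

Because nothing allocated from the heap is freed individually, a forgotten
pointer cannot leak beyond the life of the request.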


                             Output Formatting
                             -----------------

The increasing complexity of the formatting of output (particularly with the
introduction of extended file specifications with ODS-5) prompted the
development of a $FAO-like set of functions for writing formatted output.  This
can write directly into the request dynamic network buffers or into static
character storage.  The directives run parallel to those supported by $FAO,
although it is not a complete implementation and contains a number of variants
and extensions to that service's behaviour.
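For readers unfamiliar with $FAO, the following much-reduced sketch shows the
idea; it is not WASD's formatter.  The hypothetical mini_fao() handles only the
!AZ (null-terminated string), !UL (unsigned decimal) and !! (literal
exclamation mark) directives, writing into caller-supplied storage with an
overflow check.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Much-reduced $FAO-like formatter.  Supports only !AZ (null-terminated
   string), !UL (unsigned long, decimal) and !! (a literal '!').  Returns
   the number of characters written, or -1 on overflow or an unknown
   directive.  size must be at least 1. */
static int mini_fao(char *buf, size_t size, const char *ctrl, ...)
{
    va_list ap;
    size_t o = 0;
    va_start(ap, ctrl);
    for (const char *c = ctrl; *c != '\0'; ) {
        char tmp[24];
        const char *src = tmp;
        size_t n;
        if (c[0] != '!') {                      /* ordinary character */
            tmp[0] = *c++;
            n = 1;
        } else if (c[1] == '!') {               /* !! */
            tmp[0] = '!';
            n = 1;
            c += 2;
        } else if (strncmp(c, "!AZ", 3) == 0) {
            src = va_arg(ap, char *);
            n = strlen(src);
            c += 3;
        } else if (strncmp(c, "!UL", 3) == 0) {
            n = (size_t)snprintf(tmp, sizeof tmp, "%lu",
                                 va_arg(ap, unsigned long));
            c += 3;
        } else {                                /* unsupported directive */
            va_end(ap);
            return -1;
        }
        if (o + n >= size) {                    /* keep room for the NUL */
            va_end(ap);
            return -1;
        }
        memcpy(buf + o, src, n);
        o += n;
    }
    va_end(ap);
    buf[o] = '\0';
    return (int)o;
}
```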


                              Output Buffering
                              ----------------

To reduce the number of individual network writes, and thus provide significant
improvements in efficiency, generated output can be buffered into larger
packets before sending to the client. Not all modules use this (e.g. File.c)
and not all modules use it all of the time, but all modules work to implement a
seamless integration of output via this mechanism (best seen in the SSI.c
module).

The output buffer functionality underwent a complete redesign for v5.0. It is
now based on a list of one or more buffers that can be used in two modes.

  1. When both an AST address and data to be buffered are supplied the
     buffering function operates to fill one entire buffer, overflowing into a
     second linked into the list. When that overflow occurs the first is
     written to the network asynchronously (calling the supplied AST when
     complete) and the second moved to the head of the list, effectively to
     the front of the buffer, and so on.
     
  2. When no AST address is supplied with the data to be buffered, it keeps on
     filling buffers and adding others to the tail of the list as required,
     creating a virtual buffer with no fixed length.
     
The first mode is used for general buffering (e.g. SSI and directory listings),
streaming data to the client in a sequence of larger aggregates.  The second
mode is useful for functions that must block (e.g. those reporting on data
structures such as the file cache), write a lot of output for a report, and not
want to block general server activity for a long-ish period due to network
throughput (e.g. again the caching reports).  In these cases the entire report
can be written to buffer, then simply asynchronously output, unblocking any
resource it may have held.
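The two buffering modes might be sketched like this.  The code is illustrative
only: buffer_output(), flush_output(), Collector and the tiny OUTBUF_SIZE are
hypothetical, and the network write is simulated by a synchronous callback
where the server would queue an asynchronous write and continue from its
completion AST.

```c
#include <stdlib.h>
#include <string.h>

#define OUTBUF_SIZE 8   /* deliberately tiny; a server would use kilobytes */

typedef struct OutBuf {
    struct OutBuf *next;
    size_t used;
    char data[OUTBUF_SIZE];
} OutBuf;

typedef struct { OutBuf *head, *tail; } OutList;

/* Stand-in for "write to the network, then deliver the AST". */
typedef void (*NetWrite)(const char *data, size_t len, void *ctx);

/* With a write function (mode 1) a full buffer is written as soon as data
   overflows into a second one, which then becomes the head of the list.
   Without one (mode 2) buffers keep accumulating: a virtual buffer. */
static int buffer_output(OutList *list, const char *data, size_t len,
                         NetWrite net_write, void *ctx)
{
    if (list->head == NULL) {
        if ((list->head = list->tail = calloc(1, sizeof(OutBuf))) == NULL)
            return 0;
    }
    while (len > 0) {
        OutBuf *b = list->tail;
        size_t room = OUTBUF_SIZE - b->used;
        size_t n = len < room ? len : room;
        memcpy(b->data + b->used, data, n);
        b->used += n;
        data += n;
        len -= n;
        if (len == 0) break;
        /* overflow: link a fresh buffer at the tail */
        if ((b->next = calloc(1, sizeof(OutBuf))) == NULL) return 0;
        list->tail = b->next;
        if (net_write != NULL) {
            OutBuf *full = list->head;
            net_write(full->data, full->used, ctx);
            list->head = full->next;
            free(full);
        }
    }
    return 1;
}

/* Write out and free whatever remains buffered. */
static void flush_output(OutList *list, NetWrite net_write, void *ctx)
{
    while (list->head != NULL) {
        OutBuf *b = list->head;
        if (b->used > 0) net_write(b->data, b->used, ctx);
        list->head = b->next;
        free(b);
    }
    list->tail = NULL;
}

/* Example "network": collects what would have been sent. */
typedef struct { char text[256]; size_t len; int writes; } Collector;
static void collect(const char *data, size_t len, void *ctx)
{
    Collector *c = ctx;
    if (c->len + len > sizeof c->text) return;
    memcpy(c->text + c->len, data, len);
    c->len += len;
    c->writes++;
}
```

In mode 1 sixteen bytes produce one write immediately and one at flush time;
in mode 2 nothing is written until the whole report is flushed.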


                               String Matching
                               ---------------

Matching of strings is a pervasive and important function within the server.
Two types are supported: wildcard and regular expression.  Wildcard matching is
generally much less expensive (in CPU cycles and time) than regular expression
matching and so should always be used unless the match explicitly requires it.
The StringMatchAndRgex() function attempts to improve the efficiency of both by
performing a preliminary pass to eliminate obvious mismatches using a
light-weight match.  This pass either matches, fails to match, or encounters a
pattern-matching meta-character, in which case it aborts and drops through to
the full pattern match.

Wildcard matching uses '*' to match zero or more characters and '%' to match
exactly one character.  The '*' wildcard can either be greedy or non-greedy
depending on the context (and for horrible historical reasons).  It can also be
forced to be greedy by using two consecutive asterisks ('**').  By default it is not
greedy when matching request paths for mapping or authentication, and is greedy
at other times (matching strings within conditional testing, etc.)
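The wildcard rules can be sketched as a small recursive matcher.  This is an
illustration, not the StringMatchAndRgex() code: string_match() and MatchReg
are hypothetical, matching is assumed to be case-insensitive, and each '*'
records its matched span in a register (register 0 being the whole string) so
the effect of greedy and non-greedy matching is visible.

```c
#include <ctype.h>
#include <string.h>

#define WILD_MAX_REGS 10   /* [0] full match, [1..9] one per '*' */

typedef struct { int start, end; } MatchReg;

static int wild_match(const char *pat, const char *str, const char *base,
                      int greedy, MatchReg *regs, int reg)
{
    for (;;) {
        if (*pat == '*') {
            int forced = (pat[1] == '*');           /* '**' is always greedy */
            int g = greedy || forced;
            const char *rest = pat + (forced ? 2 : 1);
            size_t avail = strlen(str);
            for (size_t i = 0; i <= avail; i++) {
                /* greedy tries the longest span first, non-greedy the shortest */
                size_t take = g ? avail - i : i;
                if (reg < WILD_MAX_REGS) {
                    regs[reg].start = (int)(str - base);
                    regs[reg].end = (int)(str - base + take);
                }
                if (wild_match(rest, str + take, base, greedy, regs, reg + 1))
                    return 1;
            }
            return 0;
        }
        if (*str == '\0') return *pat == '\0';
        if (*pat == '\0') return 0;
        if (*pat != '%' &&
            tolower((unsigned char)*pat) != tolower((unsigned char)*str))
            return 0;
        pat++;
        str++;
    }
}

/* Returns 1 on a match and fills the registers (unused ones are -1). */
static int string_match(const char *pat, const char *str, int greedy,
                        MatchReg regs[WILD_MAX_REGS])
{
    for (int i = 0; i < WILD_MAX_REGS; i++) regs[i].start = regs[i].end = -1;
    if (!wild_match(pat, str, str, greedy, regs, 1)) return 0;
    regs[0].start = 0;
    regs[0].end = (int)strlen(str);
    return 1;
}
```

Non-greedy, "/web/*/*" splits "/web/a/b/c" into "a" and "b/c"; greedy, into
"a/b" and "c".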

Regular expression matching uses the essentials of the GNU RX 1.5 package. 
Matching is case insensitive (in line with other WASD behaviour) and uses the
Posix EGREP pattern syntax and capabilities.  Regular expressions are
differentiated from wildcard patterns by a leading '^' (non-significant)
character.  Regular expression matching offers significant but fairly expensive
functionality.  One of those expenses is expression compilation.  WASD attempts
to eliminate this by pre-compiling expressions wherever feasible.

Both wildcard (implemented by the WASD matching function) and regular
expressions (integral) use the POSIX-style registers for noting the offsets of
the matched portions of the strings.  These are then used for wildcard and
specific wildcard (i.e. "*'1") substitution where result strings provide this
(e.g. mapping 'pass' and 'redirect' rules).  A maximum of nine such wildcard
substitutions are supported (one other, the zeroeth, is the full match).
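Register-based substitution can be sketched with the POSIX <regex.h> interface
standing in for the GNU RX code WASD actually uses.  The pattern is compiled
once with REG_EXTENDED | REG_ICASE (extended syntax, case-insensitive) and then
reused; substitute() is a hypothetical helper that expands a plain '*' to the
next register in sequence and "*'N" to register N.  In this sketch the leading
'^' is simply passed to regcomp() as an ordinary anchor, which is an
assumption.

```c
#include <regex.h>
#include <string.h>

#define RX_MAX_REGS 10   /* register 0 = full match, 1..9 = substitutions */

/* Compile once (for example when the mapping rules are loaded). */
static int compile_pattern(regex_t *re, const char *pattern)
{
    return regcomp(re, pattern, REG_EXTENDED | REG_ICASE) == 0;
}

static int regex_match(const regex_t *re, const char *subject,
                       regmatch_t regs[RX_MAX_REGS])
{
    return regexec(re, subject, RX_MAX_REGS, regs, 0) == 0;
}

/* Expand a result template: '*' takes the next register in sequence,
   "*'N" (N = 1..9) takes register N.  Unset registers expand to nothing. */
static int substitute(const char *tmpl, const char *subject,
                      const regmatch_t regs[RX_MAX_REGS],
                      char *out, size_t outsz)
{
    size_t o = 0;
    int next = 1;
    for (const char *t = tmpl; *t != '\0'; t++) {
        int reg = -1;
        if (*t == '*') {
            if (t[1] == '\'' && t[2] >= '1' && t[2] <= '9') {
                reg = t[2] - '0';
                t += 2;
            } else {
                reg = next++;
            }
        }
        if (reg < 0) {                       /* ordinary character */
            if (o + 1 >= outsz) return 0;
            out[o++] = *t;
            continue;
        }
        if (reg >= RX_MAX_REGS || regs[reg].rm_so < 0) continue;
        size_t len = (size_t)(regs[reg].rm_eo - regs[reg].rm_so);
        if (o + len >= outsz) return 0;
        memcpy(out + o, subject + regs[reg].rm_so, len);
        o += len;
    }
    out[o] = '\0';
    return 1;
}
```

For example, with the pattern "^/old/([a-z]+)/([0-9]+)$" the request path
"/OLD/docs/42" and result "/new/*'2/*'1" produce "/new/42/docs".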


                               Auto-Scripting
                               --------------

The WASD VMS HTTP server has the facility to automatically invoke a script to
process a non-HTML document (file).  This facility is based on detecting the
MIME content data type (via the file's extension) and causing a transparent,
local redirection, invoking the script as if it was specified in the original
request.
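A minimal sketch of the idea, under assumptions: the suffix-to-type and
type-to-script tables, content_type_for() and auto_script_path() are
hypothetical, and the internal redirection is modelled as building a new path
consisting of the script followed by the original path (as path-info).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical configuration: file suffix to MIME type, and the script
   (if any) that handles a given type. */
static const struct { const char *suffix, *type; } content_types[] = {
    { ".html", "text/html" },
    { ".pdf",  "application/pdf" },
    { ".xyz",  "application/x-xyz" },
};
static const struct { const char *type, *script; } auto_scripts[] = {
    { "application/x-xyz", "/cgi-bin/xyz2html" },
};

static int equal_nocase(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
    return *a == *b;   /* both strings must end together */
}

/* Content type from the file's extension, or NULL if unknown. */
static const char *content_type_for(const char *path)
{
    const char *dot = strrchr(path, '.');
    if (dot == NULL || strchr(dot, '/') != NULL) return NULL;
    for (size_t i = 0; i < sizeof content_types / sizeof content_types[0]; i++)
        if (equal_nocase(dot, content_types[i].suffix))
            return content_types[i].type;
    return NULL;
}

/* If the type has an auto-script, build the internally redirected path:
   the script followed by the original path (as path-info). */
static int auto_script_path(const char *path, char *out, size_t outsz)
{
    const char *type = content_type_for(path);
    if (type == NULL) return 0;
    for (size_t i = 0; i < sizeof auto_scripts / sizeof auto_scripts[0]; i++) {
        if (strcmp(type, auto_scripts[i].type) == 0) {
            int n = snprintf(out, outsz, "%s%s", auto_scripts[i].script, path);
            return n > 0 && (size_t)n < outsz;
        }
    }
    return 0;
}
```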


                     Internal Directives and "Scripts"
                     ---------------------------------

The HTTPd server detects certain paths and query strings as directives about
its behaviour. Certain paths are interpreted as pseudo, or internal scripts,
handled internal to the server.  Other directives are passed in the query string
component of the request and, as reserved sequences, cannot occur in normal
requests (an unlikely combination of characters has been selected).


                       Server Security and Privileges
                       ------------------------------

As a major security design criterion the WASD environment has specified the use
of a non-privileged, non-SYSTEM, non-system-group server account. In this way
it begins with a fairly restricted and safe base, resources limited to those
world-accessible or explicitly allowed to the server account.  For access to
selected, essential resources (such as privileged IP ports, for example 80)
selected privileges are enabled only on an as-required basis, then disabled as
soon as the need for that privilege has passed.  Hence, the executable is
installed with the minimum required extended privileges, which are enabled and
used only as required during the course of processing.  The server program is
almost always executing with only NETMBX and TMPMBX enabled . . . in other
words as a completely average VMS user!

Extended privileges are required for the purposes listed below:

  o  ALTPRI - allows the server account to raise its priority above 4 if
     enabled by the /PRIORITY= qualifier.

  o  CMKRNL - required for single use of $GRANTID system service in DCL.C
     module.  This rights identifier is used to mark detached WASD script
     processes, so that DclCleanupScriptProcesses() can identify them.
     It is also used to allow the PERSONA_MACRO kludge to do its thing.

  o  DETACH - allows use of the VMS V6.2 and later $PERSONA services for
     non-server-account scripting.

  o  PRMGBL - used when creating a permanent system-wide global section.
     Shared memory in a permanent global section is used to store accounting
     data (in between server incarnations) and the directive and response
     buffer used for command-line and server administration menu control
     purposes (amongst other things).

  o  PRMMBX - used by the subprocess scripting module to create permanent
     mailboxes (much more efficient than creating a new set with each script
     subprocess).

  o  PSWAPM - allows the server process to prevent itself from being
     swapped out if specified by the /[NO]SWAP qualifier.

  o  SECURITY - required by ACME for some SYSUAF authentication.

  o  SHMEM - used when creating the permanent system global section (VAX
     only, see PRMGBL).

  o  SYSGBL - used when creating the permanent system global section
     (see PRMGBL).

  o  SYSLCK - the VMS Distributed Lock Manager is explicitly used to
     coordinate some activities on systems and clusters where multiple servers
     are executing. This privilege is required to enqueue system-wide locks.

  o  SYSNAM - is actually not required with version 8.n and later.

  o  SYSPRV - used for various purposes, including creating sockets within
     the privileged port range (1-1023) which includes port 80 of course,
     accessing configuration files (which can be protected from world access),
     and ensuring the server can stream-LF convert a file.  It is also extensively
     used to enable AUTHORIZED write access to the file system. If the
     authorization configuration is set up to allow write access to selected
     portions of the Web-space (by default it's not and up to the local site
     to configure) SYSPRV is enabled just before a file is $CREATEed and then
     immediately disabled. If SYSUAF authentication is enabled (by default it
     is not) then SYSPRV is enabled just before $GETUAI is used to check a
     user's password then immediately disabled.

  o  WORLD - when control functions are used from the command line (e.g.
     HTTPD/DO=RESTART) allows the server to retrieve process details (name,
     user, etc.) of the issuer of that command for inclusion in server logs.

Not that the author doesn't have at least some confidence in his code ;^) but
has also placed a sanity checker which when the server becomes quiescent
establishes that only the NETMBX and TMPMBX privileges are enabled.  The server
will exit with an error message if any extended privileges are enabled at the
time of the check.  (During development in 1997 this check discovered an
instance where an EnableSysPrv() call had inadvertently been coded instead of
a DisableSysPrv() call :^( so it does work in real-life :^)
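The enable-use-disable discipline and the quiescent sanity check can be
modelled with a privilege mask.  This is a simulation, not the server's code:
on VMS privileges are switched with the $SETPRV system service, whereas the
PRV_* bits, enable_privs(), disable_privs(), with_sysprv(),
privilege_sanity_check() and fake_create_file() here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical privilege bits (VMS defines its own PRV$M_ masks). */
enum {
    PRV_NETMBX = 1u << 0,
    PRV_TMPMBX = 1u << 1,
    PRV_SYSPRV = 1u << 2,
    PRV_SYSLCK = 1u << 3
};

static const uint32_t baseline_privs = PRV_NETMBX | PRV_TMPMBX;
static uint32_t current_privs = PRV_NETMBX | PRV_TMPMBX;

static void enable_privs(uint32_t mask)  { current_privs |= mask; }
static void disable_privs(uint32_t mask) { current_privs &= ~mask; }

/* Bracket one privileged operation as tightly as possible. */
static int with_sysprv(int (*operation)(void))
{
    int status;
    enable_privs(PRV_SYSPRV);
    status = operation();
    disable_privs(PRV_SYSPRV);   /* immediately, whatever the outcome */
    return status;
}

/* Run when the server becomes quiescent: returns 0 if any extended
   privilege is still enabled (the real server exits with a message). */
static int privilege_sanity_check(void)
{
    return (current_privs & ~baseline_privs) == 0;
}

/* Example privileged operation: reports whether SYSPRV was on. */
static int fake_create_file(void)
{
    return (current_privs & PRV_SYSPRV) != 0;
}
```

Calling enable_privs(PRV_SYSPRV) where disable_privs() was intended, the 1997
slip described above, leaves privilege_sanity_check() returning 0.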

The capacity for the server to write into the file system is a major concern,
and a lot of care has been taken to make it as secure as possible. Of course
there is always the chance of a problem :^(  The main defence against a system
design or programming problem allowing write access to the file system is
having the server account as a separate user and group (and definitely
non-SYSTEM).  In this way a part of the file system must explicitly have write
access granted to the server account for it to be able to write into the file
system (or for it to have world write access ... but then what is the problem
with server access if the world has access?)  This is recommended to be done
using an ACE (see the Technical Overview). 


                          Server Process Instances
                          ------------------------

The term 'instance' is used to describe an autonomous server process.
WASD will support multiple servers running on a single system, alone or in
combination with multiple servers running across a cluster.  When multiple
instances are configured on a single system they cooperate to distribute the
request load between themselves and share certain essential resources such as
accounting and authorization information.

This sharing introduces several concurrency issues for multiple, per-node
instances (processes).  Data is shared between such processes using global
sections and shared memory.  Access to these is mediated by resource locking.

The VMS Distributed Lock Manager (DLM) is used extensively in three roles:

  1. coordinating access to shared resources
  2. storing and distributing data related to resources (e.g. socket use)
  3. initiating actions on a per-node or per-cluster basis (e.g. /DO=)

Mutexes are used to control access to per-system shared memory (e.g. global
accounting data, activity statistics, authentication and SSL session caches).


                                IPv4 and IPv6
                                -------------

The server supports both IPv4 and IPv6 addressing.  Two of the design
objectives were a source that could be compiled on systems that did not support
or have the header files for IPv6, and an image that could be executed
regardless of whether IPv6 was supported by the underlying TCP/IP kernel.  The
TCPIP.H module header file contains all the required IPv4 and IPv6 definitions
and structures to remove dependency on system build environment.  The server
runtime uses a WASD address structure that contains both IPv4 and IPv6 IP
address structures and the server adopts the appropriate behaviour for the
underlying address type being processed.

Server configuration handles the standard dotted-decimal addresses of IPv4, as
well as 'normal' and 'compressed' forms of standard IPv6 literal addresses, and
a (somewhat) standard variation of these that substitutes hyphens for the
colons in these addresses to allow the colon-delimited port component of a
'URL' to be resolved.  The TCPIP.C module describes these.
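The hyphen-for-colon variant can be illustrated with a short sketch built on
the standard inet_pton() routine.  It is not the TCPIP.C code: WasdAddress,
parse_address() and split_port() are hypothetical.  The sketch tries IPv4
first, then IPv6 after mapping hyphens back to colons, so that in
"fe80--1:8080" the only colon is the port delimiter.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical address holder with room for either family. */
typedef struct {
    int family;                /* AF_INET or AF_INET6 */
    struct in_addr ip4;
    struct in6_addr ip6;
} WasdAddress;

/* Accept dotted-decimal IPv4, or IPv6 in normal or compressed form with
   either colons or hyphens as the separator. */
static int parse_address(const char *text, WasdAddress *addr)
{
    char buf[INET6_ADDRSTRLEN];
    size_t len = strlen(text);
    memset(addr, 0, sizeof *addr);
    if (inet_pton(AF_INET, text, &addr->ip4) == 1) {
        addr->family = AF_INET;
        return 1;
    }
    if (len >= sizeof buf) return 0;
    for (size_t i = 0; i <= len; i++)           /* copies the NUL too */
        buf[i] = (text[i] == '-') ? ':' : text[i];
    if (inet_pton(AF_INET6, buf, &addr->ip6) == 1) {
        addr->family = AF_INET6;
        return 1;
    }
    return 0;
}

/* Split "host:port"; with hyphenated IPv6 the last colon is the port
   delimiter.  Returns the port, 0 if none, or -1 if host won't fit. */
static int split_port(const char *text, char *host, size_t hostsz)
{
    const char *colon = strrchr(text, ':');
    size_t hlen = colon ? (size_t)(colon - text) : strlen(text);
    if (hlen >= hostsz) return -1;
    memcpy(host, text, hlen);
    host[hlen] = '\0';
    return colon ? atoi(colon + 1) : 0;
}
```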

The TCP/IP Services implementation (at least) has no asynchronous DNS host name
resolution interface for IPv6 comparable to the $QIO ACPCONTROL interface it
provides for IPv4 (it does have a POSIX threads compliant C-RTL call, but WASD
does not use POSIX threads).  Hence AAAA record lookup under IPv6 is blocking
(see TCPIP6.C module).


                      HTTP/1.0 and HTTP/1.1 Compliance
                      --------------------------------

HTTP/1.0 behaviour is based on descriptions in RFC1945 (May, 1996), along with
a swag of de facto HTTP/1.0 extensions (e.g. keep-alives) and HTTP/1.1 style
functionality that crept into HTTP/1.0 behaviour.

HTTP/1.1 behaviour is based on descriptions in RFC2616 (June, 1999).
WASD supports a subset of the extensive capabilities described in the above RFC
but provides an acceptable and functional compliance.  The 'core' subset is
largely based on RFC2616 section "Compatibility with Previous Versions".

HTTP/1.1 'core compliance' offered by WASD:

  o  persistent connections

     Where a response includes a content-length and the client has not
     requested the connection closed it is maintained.
     Where a content-length is not available or the connection will be
     closed for some other reason the (final) response has a
     "Connection: close" field provided.
     Where a script provides a CGI response header it is checked for
     content-length and the appropriate header fields generated.
     The only responses for which WASD does not enforce this field are those
     from NPH scripts.

  o  absolute URIs in a request

     WASD interprets an absolute URI as a proxy request.  It ignores the
     "Host:" field.  This is also the HTTP/1.0 behaviour.

  o  requests containing a chunked body

     The body reading module accepts 'chunked' requests, including
     post-processing of any trailing header fields.

  o  "Host:" request header

     WASD returns a 400 (bad request) response if an HTTP/1.1 request
     does not contain this header field.  The field is supplied as a CGI
     variable.

  o  "If-Modified-Since:" or "If-Unmodified-Since:" requests

     WASD has historically provided HTTP/1.0 "If-Modified-Since:" support
     (including the de facto 'length=nnn') when sending files.  For HTTP/1.1
     the 'length=nnn' is unsupported.  WASD supports "If-Unmodified-Since:"
     for file access (static and SSI, etc).  Both request fields are provided
     as CGI variables.

  o  "Expect: 100-continue" request

     WASD will return a 417 (expectation failed) if it is not a
     '100-continue' expectation.  This field is supplied as a CGI variable.

  o  "100 Continue" response

     If the request is an HTTP/1.1 POST or PUT containing an
     "Expect: 100-continue" field, then WASD responds with a "100 Continue"
     response header after evaluating such factors as authorization and
     before beginning to read/process the body.

  o  "Date:" header in each response

     The only responses for which WASD does not enforce this field are those
     from NPH scripts.
     This sort of script needs to be modified for compliance.
     The WASD 'REQUEST_TIME_GMT' CGI variable contains the current date/time.

  o  support of HTTP/1.0

     Continued support for requests using the HTTP/1.0 protocol.
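Decoding a chunked request body can be sketched as follows.  This is not the
WASD body-reading module, which works on network reads as they arrive; the
hypothetical dechunk() decodes a complete body held in memory, skips chunk
extensions, and counts (rather than processes) any trailing header fields.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Decode a complete chunked body.  Returns the decoded length, or -1 if the
   input is malformed or the output buffer is too small.  The number of
   trailer header lines is returned in *trailers. */
static long dechunk(const char *in, size_t inlen, char *out, size_t outsz,
                    int *trailers)
{
    size_t i = 0, o = 0;
    *trailers = 0;
    for (;;) {
        size_t size = 0;
        int digits = 0;
        while (i < inlen && isxdigit((unsigned char)in[i])) {   /* hex size */
            int c = tolower((unsigned char)in[i++]);
            size = size * 16 + (size_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
            if (++digits > 8) return -1;                         /* absurd size */
        }
        if (digits == 0) return -1;
        while (i < inlen && in[i] != '\r') i++;   /* skip ";extension" */
        if (i + 1 >= inlen || in[i + 1] != '\n') return -1;
        i += 2;
        if (size == 0) break;                     /* last-chunk */
        if (size > inlen - i || size > outsz - o) return -1;
        memcpy(out + o, in + i, size);
        o += size;
        i += size;
        if (i + 1 >= inlen || in[i] != '\r' || in[i + 1] != '\n') return -1;
        i += 2;
    }
    for (;;) {                                    /* trailer fields */
        size_t start = i;
        while (i < inlen && in[i] != '\r') i++;
        if (i + 1 >= inlen || in[i + 1] != '\n') return -1;
        i += 2;
        if (i - 2 == start) break;                /* empty line ends body */
        (*trailers)++;
    }
    return (long)o;
}
```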
  

Other HTTP/1.1 functionality implemented:

  o  request pipelining

     WASD supports request pipelining.  With body-less requests any
     octets in excess of the request header are considered possible
     pipelining.  Requests with bodies (if pipelined at all) are
     implicitly handled as such because the body network reads are
     precisely sized and so do not encroach on data belonging to any
     following request.

  o  "Accept-Ranges: bytes" response

     This advises clients of the byte-range support.

  o  "Range: ..." requests

     WASD provides byte-range responses for non-variable record format
     files from disk and all files stored in cache.  The "Range:" field
     is supplied as a CGI variable.

  o  "If-Range:" request header

     This changes a "Range:" request from a 206 into a full file 200 if
     the file has been modified since the date/time provided in the header.

  o  "Cache-Control:" request header

     The cache-control parameters "no-cache", "no-store" and "max-age=0"
     all disable WASD's various caching mechanisms, particularly the file
     cache (where at the very least the file is revalidated).

  o  "ETag:" response header

     For responses generated from static files (also cached) an entity
     tag is generated using the file ID (6 bytes) and last modification
     time (8 bytes) as a 28 character hexadecimal string.  This provides
     a unique (enough) identifier in file-system space and time.

  o  "ETag:" request header

     WASD uses this field in conjunction with "If-Match:" and/or
     "If-None-Match:" during file response processing.  This field is
     supplied as a CGI variable.

  o  "If-Match:" request header

     Used in conjunction with the "ETag:" field during file response
     processing.  This field is supplied as a CGI variable.

  o  "If-None-Match:" request header

     Used in conjunction with the "ETag:" field during file response
     processing.  This field is supplied as a CGI variable.

  o  TRACE method

     Returns a 200 response with the entire request, header and body,
     as content-type "message/http".  It honours the "Max-Forwards:"
     request header field, attempting to proxy any non-zero values.

  o  OPTIONS method

     Returns a standard 200 response header (only).  It honours the
     "Max-Forwards:" request header field, attempting to proxy any
     non-zero values.

  o  extension-methods

     These are HTTP methods not defined in the RFC (i.e. not GET, HEAD,
     POST, etc.) and not supported natively by the server.  WASD allows
     requests using such methods to be mapped to an external agent (RTE,
     script, etc.) for handling.

        if (request-method:EXAMPLE) \
           exec /* (CGI_EXE:EXAMPLE.EXE)/cgi-bin/* map=method 

     The above example would allow any request using the EXAMPLE method
     to be mapped to the EXAMPLE.EXE RTE for processing.
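
As an illustration of what a "Range:" request asks of the server, the
following minimal sketch resolves a single byte-range specification against a
known file size, using the "first-last", "first-" and "-suffix" forms of RFC
7233.  It is not WASD code: range_resolve() is a hypothetical name, and
multi-range requests (which need a multipart/byteranges response) are simply
rejected.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not WASD code): resolve "bytes=0-499", "bytes=500-"
   or "bytes=-500" against a file of 'size' bytes.  Returns 1 and sets the
   inclusive *first/*last if satisfiable, else 0 (malformed, unsatisfiable
   or multi-range). */
int range_resolve (const char *spec, unsigned long size,
                   unsigned long *first, unsigned long *last)
{
   const char *cptr;
   char *eptr;
   unsigned long a, b;

   if (strncmp (spec, "bytes=", 6)) return 0;
   cptr = spec + 6;
   if (strchr (cptr, ',')) return 0;        /* multiple ranges not handled */
   if (size == 0) return 0;

   if (*cptr == '-')
   {
      /* suffix range: the last N bytes of the file */
      if (!isdigit ((unsigned char)cptr[1])) return 0;
      b = strtoul (cptr + 1, &eptr, 10);
      if (*eptr || b == 0) return 0;
      if (b > size) b = size;
      *first = size - b;
      *last = size - 1;
      return 1;
   }

   if (!isdigit ((unsigned char)*cptr)) return 0;
   a = strtoul (cptr, &eptr, 10);
   if (*eptr != '-') return 0;
   cptr = eptr + 1;
   if (*cptr == '\0')
      b = size - 1;                         /* open-ended: to end of file */
   else
   {
      if (!isdigit ((unsigned char)*cptr)) return 0;
      b = strtoul (cptr, &eptr, 10);
      if (*eptr || b < a) return 0;
      if (b > size - 1) b = size - 1;       /* clamp to the last byte */
   }
   if (a >= size) return 0;                 /* starts beyond end of file */
   *first = a;
   *last = b;
   return 1;
}
```

A result of 0 would correspond to ignoring the field and sending the full
file (or a 416 response for an unsatisfiable range), a choice this sketch
leaves to the caller.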

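The entity tag arithmetic in the "ETag:" item above (6 + 8 bytes, two
hexadecimal digits per byte, 28 characters) and its use with "If-Match:" and
"If-None-Match:" can be sketched as follows.  The struct layout, digit
ordering and function names are assumptions for illustration, not WASD's
actual code; the evaluation order (If-Match first, then If-None-Match giving
304 for GET/HEAD) follows RFC 7232.  In a response header the tag would be
carried as a quoted string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical layout (not WASD's): a VMS file ID as three 16-bit words
   (6 bytes) and the revision time as a 64-bit quadword (8 bytes). */
struct file_ident {
   unsigned short fid[3];
   unsigned long long rdt;
};

/* 14 bytes, two hex digits each: a 28 character tag plus terminating null */
void etag_make (const struct file_ident *fi, char tag[29])
{
   snprintf (tag, 29, "%04x%04x%04x%016llx",
             (unsigned)fi->fid[0], (unsigned)fi->fid[1],
             (unsigned)fi->fid[2], fi->rdt);
}

/* Is 'tag' in a field value such as "\"abc\", W/\"def\"" (or is it "*")?
   With 'strong' set, weak (W/) entries never match, as If-Match requires. */
int etag_in_list (const char *list, const char *tag, int strong)
{
   size_t len = strlen (tag);
   int weak;

   while (*list)
   {
      while (*list == ' ' || *list == ',') list++;
      if (*list == '*') return 1;
      weak = 0;
      if (!strncmp (list, "W/", 2)) { weak = 1; list += 2; }
      if (*list == '"' && !(strong && weak) &&
          !strncmp (list + 1, tag, len) && list[1 + len] == '"') return 1;
      while (*list && *list != ',') list++;        /* on to the next entry */
   }
   return 0;
}

/* For GET/HEAD: If-Match is evaluated first (412 on failure), then
   If-None-Match (304 on a match).  Either field may be NULL. */
int etag_condition (const char *if_match, const char *if_none_match,
                    const char *tag)
{
   if (if_match && !etag_in_list (if_match, tag, 1)) return 412;
   if (if_none_match && etag_in_list (if_none_match, tag, 0)) return 304;
   return 200;
}
```

Rendering the fields in a fixed order as zero-padded hex keeps the tag the
same length for every file, and any change to either the file ID or the
revision time yields a different tag.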

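The "Max-Forwards:" handling described for TRACE and OPTIONS above amounts to
a small decision: answer locally when the count is zero, otherwise proxy with
the count reduced by one (RFC 7231, section 5.1.2).  The following is a
minimal sketch of that decision only, assuming the field value has already
been extracted from the request header; max_forwards() is a hypothetical
name, not a WASD function.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper (not WASD code).  'value' is the "Max-Forwards:"
   field value, or NULL when the field is absent.  Returns 0 if the request
   should be answered locally, 1 if it should be proxied, in which case
   *next holds the value to forward (-1: field absent, none to add). */
int max_forwards (const char *value, long *next)
{
   long count;

   if (value == NULL)
   {
      *next = -1;
      return 1;
   }
   while (*value == ' ' || *value == '\t') value++;
   if (!isdigit ((unsigned char)*value))
   {
      /* malformed value: this sketch chooses to answer locally */
      *next = 0;
      return 0;
   }
   count = strtol (value, NULL, 10);
   if (count == 0)
   {
      *next = 0;
      return 0;            /* zero: this server is the final recipient */
   }
   *next = count - 1;      /* forward with the count reduced by one */
   return 1;
}
```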
                         Secure Sockets Layer (SSL)
                         --------------------------

The basic WASD package supports only HTTP.  SSL support can be provided by
linking against and using the Compaq SSL for OpenVMS Alpha product, a separate
WASD-specific OpenSSL Toolkit based package, or an existing OpenSSL
installation.  The build procedure allows the conditional compilation of
non-SSL or SSL object code.  The SSL image is built by linking the SSL object
modules with the OpenSSL toolkit.  This approach was adopted to allow WASD to
be exported from countries that prohibit the export of cryptographic software.
The basic package does, however, contain the "hooks" for such functionality,
which in themselves may prohibit export.


                                   WebDAV
                                   ------

WebDAV (compliance classes 1 and 2) required a significant rework of many
aspects of data handling in the server.  Going into general release my feeling
is that WebDAV was not worth the effort.  I guess time will tell.  The DAVWEB.C
module contains comprehensive commentary on WASD WebDAV functionality.

WebDAV makes extensive use of XML in its request bodies, response bodies and
meta-data maintained against resources (files and directories).  EXPAT by Clark
Cooper was chosen.  It has a reputation for being fast and efficient, and its
event-driven parsing model suits the relatively simple structure of the XML
managed by WASD.

There are several WebDAV modules providing WebDAV-specific processing.  Some
other methods have WebDAV-specific behaviours.  PUT is an obvious one.  OPTIONS
also reports WebDAV methods when WebDAV is enabled.


                                  WebSocket
                                  ---------

WebSocket is an HTML 5 capability providing an asynchronous, bidirectional
full-duplex connection over which messages can be sent between agents. 
WebSocket applications (scripts) run in CGIplus (persistent) processes set up
and controlled by the DCL module, very similar to other CGIplus scripting.
However, WebSocket scripts can handle multiple, concurrent clients!

Managing persistent, long-running, asynchronous connections between clients and
scripting processes is generally counter to the HTTP request-response,
client-server paradigm, and is implemented using multiple additional
WebSocket-aware hooks throughout WASD request, DCL and response processing
code.


                        Request Processing Flowchart
                        ----------------------------

One of the more difficult aspects of event-driven (AST) programming is the
constant lack of a clear code path.  This section attempts to provide an
overview of the major functional points in processing a request.  Needless to
say there are lots of detours and some dead-ends when processing but what
follows is the essence of a standard request.  Processing is from top-to-bottom
except where the crude arrows flow otherwise.  Of course
NetWrite()/SesolaNetWrite() and their relatives are employed all the time to
provide output to the client.


   NetAcceptAst()                          !received socket connection
   |
   +- NetAccept()                          !queue the next socket accept
   |
   RequestBegin()            <--+          !initialize request processing
   |                            |
   +- SesolaNetRequestBegin() --+          !if SSL then SSL-accept connection
   |
   RequestGet()                <--+        !get request header
   |                              |
   +- NetRead()/SesolaNetRead() --+        !read request header
   |
   RequestParseAndExecute()                !parse request line
   |
   +- RequestFields()                      !parse request header fields
   |
   RequestExecute()                        !begin to process request content
   | 
   MapUrl_Map()                            !apply mapping rules
   |
   +- RequestEnd()                         !if redirect from mapping
   |     |
   |     RequestRedirect() -->             !redirect (possibly restart request)
   |     |
   |     RequestEnd()                      !301/302 response redirect
   |
   +- ProxyRequestBegin()                  !proxy request (HTTP)
   |        :
   |     ProxyRequestEnd()                 !after proxy processing
   |        |
   |        RequestEnd()                   !end of request
   |
   Authorize() --+                         !authorize access to request path
   |             |
   RequestExecutePostAuth1()               !either direct or after AST
   |  
   +- AdminBegin()                         !server administration 
   |     :
   |     RequestEnd()                      !once complete
   |
   +- HtAdminBegin()                       !HTA database administration
   |     :
   |     RequestEnd()                      !once complete
   |
   +- other-internal                       !other internally handled
   |     :
   |     RequestEnd()                      !once complete
   |
   +- Authorize() ------+                  !authorize access to script path
   |     |              |
   |     RequestEnd()   |                  !not authorized
   |                    |
   RequestExecutePostAuth2()               !either direct or after AST
   |  |
   |  RequestEnd()                         !not authorized
   |
   +- ThrottleBegin() --+                  !request subject to queuing
   |                    |
   RequestExecutePostThrottle()            !none, or post queuing after AST
   |
   +- RequestScript()                      !script to process
   |     |
   |     +- internal                       !internally handled look-alikes
   |     |     :
   |     |     RequestEnd()                !once complete
   |     |
   |     +- DclBegin()                     !sub/detached process CGI script
   |     |     :
   |     |     RequestEnd()                !once complete
   |     |
   |     +- DECnetBegin()                  !network CGI or OSU script
   |           :
   |           RequestEnd()                !once complete
   |
   +- FileBegin()                          !search cache, continue if not found
   |     |
   |     CacheSearch() ---------+          !if not found in cache
   |        |                   |
   |        CacheBegin()        |          !cache entry requires revalidation?
   |           |                |
   |        CacheAcpInfoAst() --+          !if cache entry is stale
   |           |                |
   |        CacheNext()         |          !transfer from cache
   |           :                |
   |        RequestEnd()        |          !once cache transfer complete
   |                            |
   RequestExecutePostCache() <--+          !cannot be supplied from cache
   |
   DavWebRequest()                         !if WebDAV-specific method
   |     :                                 !(some methods, e.g. PUT, can also
   |     DavWebEnd()                       ! have WebDAV-specific behaviour)
   |
   PutBegin()                              !if a PUT, POST, or DELETE method
   |     :
   |     RequestEnd()                      !once PUT or POSTed
   |
   +- RequestHomePage() <--+               !search for a home page
   |     |                 |
   |     + ----------------+               !possible multiple names
   |     |
   |     RequestFile()                     !home page found, transfer
   |     |  :
   |     |  RequestEnd()                   !once transferred
   |     |
   |     DirBegin() -->                    !no home page, directory listing
   |        :
   |        RequestEnd()                   !once listed
   |
   +- RequestEnd()                         !auto-script
   |     |
   |     RequestRedirect() -->             !redirect (possibly restart request)
   |     |
   |     RequestEnd()                      !301/302 response redirect
   |
   +- IsMapBegin()                         !image mapping
   |     :
   |     RequestEnd()                      !after mapping
   |
   +- RequestEnd()                         !search keyword 
   |     |
   |     RequestRedirect() -->             !redirect (possibly restart request)
   |     |
   |     RequestEnd()                      !301/302 response redirect
   |
   RequestFile()                           !transfer file
      :
      RequestEnd()                         !once transferred
         |
         RequestRedirect() -->             !redirect (possibly restart request)
         |
         RequestEnd()                      !301/302 response redirect
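
To make the callback-chaining nature of the flowchart concrete, the following
toy model (portable C, not VMS) has each step queue the next as a completion
routine rather than call it, with a simple FIFO dispatcher standing in for AST
delivery.  The step names only loosely echo RequestBegin(), RequestGet(), etc.;
ast_declare() and ast_dispatch() are hypothetical, and real ASTs are delivered
by VMS asynchronously (for example on I/O completion) rather than from a queue
the program drains.

```c
#include <stdio.h>
#include <string.h>

struct request;
typedef void (*ast_routine) (struct request*);

struct request { char id; };              /* toy request: just an id */

#define AST_QUEUE_MAX 16
static struct { ast_routine func; struct request *rq; } ast_queue[AST_QUEUE_MAX];
static int ast_head, ast_tail;
static char ast_log[256];                 /* records the order steps ran */

/* queue a routine for later delivery; returns -1 if the queue is full */
static int ast_declare (ast_routine func, struct request *rq)
{
   if (ast_tail - ast_head >= AST_QUEUE_MAX) return -1;
   ast_queue[ast_tail % AST_QUEUE_MAX].func = func;
   ast_queue[ast_tail % AST_QUEUE_MAX].rq = rq;
   ast_tail++;
   return 0;
}

/* deliver queued routines, oldest first, until none remain */
static void ast_dispatch (void)
{
   while (ast_head != ast_tail)
   {
      int idx = ast_head++ % AST_QUEUE_MAX;
      ast_queue[idx].func (ast_queue[idx].rq);
   }
}

static void log_step (struct request *rq, const char *name)
{
   size_t used = strlen (ast_log);
   snprintf (ast_log + used, sizeof(ast_log) - used, "%s%c:%s",
             used ? " " : "", rq->id, name);
}

/* each step does its "work" then queues its successor */
static void request_end (struct request *rq) { log_step (rq, "End"); }
static void file_begin (struct request *rq)
   { log_step (rq, "File"); ast_declare (request_end, rq); }
static void request_execute (struct request *rq)
   { log_step (rq, "Execute"); ast_declare (file_begin, rq); }
static void request_get (struct request *rq)
   { log_step (rq, "Get"); ast_declare (request_execute, rq); }
static void request_begin (struct request *rq)
   { log_step (rq, "Begin"); ast_declare (request_get, rq); }
```

With two requests declared before dispatching, the log reads "A:Begin B:Begin
A:Get B:Get A:Execute B:Execute A:File B:File A:End B:End": the steps of each
request are interleaved with the other's, which is why no single request's
path is visible as a straight sequence of calls in the source.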


                               HTTPd Modules
                               -------------

The HTTPd server comprises several main modules, implementing the obvious
functionality of the server, and other, smaller, support modules.  Modules
contain descriptive prologues.  As these are usually up-to-date (usually
more so than discrete documentation such as this text anyway), it is strongly
recommended that the source code modules be consulted for specific information
on how each operates.  This section is provided only to give a quick overview.

All files are located in WASD_ROOT:[SRC.HTTPD].


                                   WASD.H

This C header file contains many of the general data structures and macros
required for the HTTPd.

                                  ENAMEL.H

This header is used to work around the issue of older versions of VMS and DECC
not having a long NAM structure or definitions used to support ODS-5. For the
same reason an extended FIB structure definition must be provided. For such
environments this header file provides the necessary infrastructure, allowing
extended file specification compliant code to be compiled and linked on
pre-v7.2 systems.

                                   ADMIN.C

Server administration is based in this module, although other modules provide
their own functions for implementing menus, reports and actions.


                                   AUTH.C

This module provides path-based authorization and authentication.  It uses the
HTTPD$AUTH file as the source of configuration information, and the other
AUTHxxxxxx.C modules to provide particular required functionality.


                                 AUTHACME.C

Allows authentication and SYSUAF password changes using the ACME server via the
$ACM system service.  Provides both SYSUAF authentication and authentication
from other sources using other, including third-party, ACME agents.


                                 AUTHAGENT.C

Provides authentication and authorization information from an external,
CGIplus-based script.


                                 AUTHCACHE.C

Provides the caching of authentication/authorization information so that the
sources do not need to be accessed for every request.


                                AUTHCONFIG.C

Loads authorization configuration information from the HTTPD$AUTH file into
a data structure that the HTTPd then uses during authorization.


                                  AUTHHTA.C

Accesses information stored in binary authentication databases (files).


                                  AUTHHTL.C

Accesses information stored in plain-text authentication files.


                                 AUTHIDENT.C

Provides authentication and authorization from an RFC1413 source ("ident"
protocol).


                                  AUTHTOKEN.C

Accesses authentication/authorization in one context and reflects that
authorization to another using a short-lived token delivered as a cookie.


                                  AUTHVMS.C

Accesses authentication and authorization information from the system's SYSUAF
and RIGHTSLIST databases.


                                  BASE64.C

Base-64 encoding/decoding (well, golllee!)


                                   BASIC.C

The "BASIC" HTTP authentication scheme is implemented in this module.
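
With the "BASIC" scheme the credentials arrive as "Authorization: Basic
<base64>", where the decoded text is username:password.  The sketch below is
a minimal RFC 4648 Base64 decoder applied to the well-known RFC 2617 example;
basic_decode() is a hypothetical name and this is not the BASE64.C code.
Splitting the result at its first colon yields the username and password.

```c
#include <string.h>

/* value of one Base64 character, or -1 if not in the alphabet */
static int b64_value (char ch)
{
   static const char alpha[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
   const char *cptr;

   if (ch == '\0') return -1;
   cptr = strchr (alpha, ch);
   return cptr ? (int)(cptr - alpha) : -1;
}

/* Decode 'in' (stopping at '=' padding) into null-terminated 'out'.
   Returns the decoded length, or -1 on an invalid character or if
   'outsize' is too small. */
int basic_decode (const char *in, char *out, int outsize)
{
   int bits = 0, nbits = 0, len = 0, val;

   if (outsize < 1) return -1;
   for (; *in && *in != '='; in++)
   {
      if ((val = b64_value (*in)) < 0) return -1;
      bits = (bits << 6) | val;            /* append 6 more bits */
      nbits += 6;
      if (nbits >= 8)
      {
         nbits -= 8;
         if (len >= outsize - 1) return -1;
         out[len++] = (char)((bits >> nbits) & 0xff);
         bits &= (1 << nbits) - 1;         /* keep only unused bits */
      }
   }
   out[len] = '\0';
   return len;
}
```

Decoding "QWxhZGRpbjpvcGVuIHNlc2FtZQ==" this way gives the 19 characters
"Aladdin:open sesame".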


                                   BODY.C

Controls the transfer of the request body (for POST and PUT methods) from the
client to the server and then on to a script, proxied server or internal HTTP
module for processing.  A script or a proxied HTTP server receives the "raw"
data provided by the client.  Internal modules or a proxied FTP server get data
that has been processed by one of three functions that "filter" the data from
a request body.  The first just puts the entire body into a single buffer and
is used internally.  The others extract data from
"application/x-www-form-urlencoded" or "multipart/form-data" content-types.


                                   CACHE.C

This module implements a file data and revision time cache, designed to
provide an efficient static document request (file transfer) mechanism.


                                    CGI.C

The CGI module provides the CGI scripting support functions used by the DCL.C
and DECNET.C modules.


                                    CLI.C

Module to process (interpret) the HTTPD command-line.


                                  CONTROL.C

This module implements the HTTPd command-line control functionality.  At the
command-line an administrator may enter HTTPD/DO=command[/ALL].  This is written
to the HTTPd via shared memory in a global section, interpreted, and the
requested action taken or an error message sent back to the administrator.  The
Distributed Lock Manager (DLM) is used to alert a detached server process that
a command is available for action, and to distribute a command to all servers
cluster-wide when using the /ALL qualifier.


                                  CONFIG.C

This module provides basic server and service configuration. Uses the
HTTPD$CONFIG file as the source of configuration information.

                                  DAVCOPY.C

WebDAV copy a file or directory tree.


                                 DAVDELETE.C

WebDAV delete a file or directory tree.


                                  DAVLOCK.C

Manage a WebDAV lock on a resource (file or tree).  Although WASD WebDAV uses
VMS locks and the DLM for managing access within its processing, this module
deals exclusively with WebDAV locking.  WASD WebDAV locks are stored in
meta-data files managed by DAVMETA.C.


                                  DAVMETA.C

WebDAV meta-data uses XML for its representation.  WASD WebDAV meta-data is
stored in a separate file along with the file it is the meta-data of (or
directory file if a directory).  This module manages both the WASD-specific
data stored there (WebDAV locks) and independent client-supplied entities.


                                  DAVMOVE.C

WebDAV move a file or directory tree.  For efficiency it renames the file or
directory file if on the same volume, or if on another volume copies and
deletes (much more expensive).


                                  DAVPROP.C

Generates WebDAV "live" properties for files and directories.  These are XML
representations of such data as creation and modification time, size, etc.


                                  DAVWEB.C

This is the primary WebDAV processing module.  All WebDAV-specific processing
begins and ends with this.  It also provides some common WebDAV processing
functionality and the WebDAV report code.  The prologue contains significant
WASD WebDAV comments.

                                      
                                    DCL.C

The DCL execution functionality must interface and coordinate with an external
subprocess. Supports CGI and CGIplus scripting and SSI DCL-processed
directives.


                                  DECNET.C

The DECnet module provides scripting based on process management using DECnet.
Both standard WASD CGI scripting and an emulation of OSU (DECthreads)
scripting are supported.


                                   DESCR.C

The Descr.c module generates a file description by searching HTML files for
the first occurrence of <TITLE>...</TITLE> or <Hn>...</Hn> tags, using the
description provided therein.  It is primarily used by the directory listing
module, but can also be used by the menu module.


                                  DIGEST.C

This module provides authentication functionality for the DIGEST scheme.  I do
not know of any browser or user group that employs the mechanism.


                                    DIR.C

This module implements the HTTPd directory listing (Index-of) functionality.


                                   ERROR.C

Error generating and reporting functions.


                                    FAO.C

Provides formatted write capability to storage and network buffer space.
Functionality based on the VMS system service $FAO but contains all sorts of
extensions to facilitate WASD objectives.  Its versatility makes it an
indispensable code module.


                                   FILE.C

This module implements the static file transfer functionality.


                                   GRAPH.C

This module generates the server activity graph.


                                   GZIP.C

Provides ZLIB-enabled GZIP compression (gzip, deflate) for WASD.  Dynamically
maps required functions from a ZLIB shareable image.  If the image is found and
all required functions are present, ZLIB compression/decompression of network
streams is enabled for suitable requests/responses.  If not it is just left
disabled.  Based on ZLIB port by Jean-François Piéronne.  Requires this package
to be installed and started on the runtime system for dynamic activation.


                                  HTADMIN.C

This module allows on-line administration of the HTTPd-specific
authentication (.HTA) databases.


                                   HTTPD.C

This is the main() module of the server.  It performs server startup and
shutdown, along with other miscellaneous functionality.


                                 INSTANCE.C

Contains functions used to set up, maintain and coordinate action between,
multiple servers running on a single system, alone or in combination with
multiple servers running across a cluster.  An "instance" in this context
refers to the (almost) completely autonomous server process.


                                   ISMAP.C

The clickable-image support module provides this functionality as an
integrated part of the server. It supports image configuration file directives
in either of NCSA or CERN formats.


                                  LOGGING.C

The logging module provides an access log (server logs, including error
messages, are generated by the detached HTTPd process).  The access log format
can be that of the Web-standard "common" format, "common+server" format or
"combined" format, along with user-definable formats, allowing processing by
most log-analysis tools.


                                  MAPCON.C

Used by the MAPURL.C module to process (the somewhat obsolescent) mapping rule
conditionals.


                                  MAPODS.C

Used by the MAPURL.C module to support mapping of URL-style specifications to
VMS file system specifications and back.  Can process ODS-2, ODS-5 (EFS), SRI
(MultiNet NFS), PATHWORKS (v4/5) and Advanced Server (PATHWORKS V6) / Samba
encodings.


                                  MAPURL.C

Main module supporting mapping of URLs to VMS file specifications and VMS
specifications to URLs, setting specified characteristics against paths,
etc.  Uses the HTTPD$MAP file as the source of configuration information.


                                  MAPUSER.C

Used by the MAPURL.C module to support mapping /~ style requests (user home
directory) using SYSUAF account data to map the username's home file
specification.


                                    MD5.C

Module providing MD5 digest code from RFC1321.  This is used to generate
"digests" (or unique fingerprints of sequences of bytes and strings) for use
in authorization and the generation of file names in proxy caching.


                                   MENU.C

This module implements the WASD menu interpretation functionality.


                                    MSG.C

The message database for the server is maintained by this module. Uses the
HTTPD$MSG file as the source of configuration information.


                                    NET.C

This module handles all non-SSL TCP/IP network activities, from creating the
server socket and listening on the port, to reading and writing network I/O.
It manages request initiation and rundown, and controls connection
persistence.  It is developed using, and based upon the behaviour of, the
Digital TCP/IP Services (UCX) BG driver.  Any other package that supports this
through emulation will support WASD.


                                    ODS.C

This module supports file system access for both ODS-2 and where appropriate
(Alpha VMS v7.2ff) ODS-5.

It does this by abstracting the NAM block so that either the standard or long
formats may be used without other modules needing to be aware of the
underlying structure. The VAX version does not need to know about long NAMs,
or extended file specifications in general (excluded for code compactness and
minor efficiency reasons using the ODS_EXTENDED macro). Alpha versions prior
to 7.2 do not know about NAMLs and so for compilation in these environments a
compatible NAML is provided (ENAMEL.H header file, see ENAMEL.H), allowing
extended file specification compliant code to be compiled and linked on
pre-v7.2 systems.

Runtime decision structures based on the VMS version, device ACP type, etc.,
must be used if pre-v7.2 systems are to avoid using the NAML structure with
RMS (which would of course result in runtime errors). In this way a single set
of base-level Alpha object modules, built in either environment, will support
both pre and post 7.2 environments.


                                  PERSONA.C

For VMS V6.2 and later provides access to the $PERSONA services allowing the
server to create scripting processes executing under user accounts other than
itself (see DCL.C). For VMS versions not supporting the $PERSONA services
(earlier than V6.2) it creates stubs allowing the module to be linked but just
returning a status indicating it is not available.


                                   PROXY.C

All data structures and general macros, etc., for proxy processing are located
in the associated header file.  This module provides the request and network
functionality for proxy HTTP and HTTPS serving.


                                PROXYCACHE.C

Implements a basic HTTP proxy disk caching service. The design always trades
simplicity off against efficiency and elegance. Cache management is performed
by the PROXYMAINT.C module. It works intimately with this module to provide
routine and reactive purging (deletion) of cache files to maintain device free
space and currency of cache content.


                                 PROXYFTP.C

Provides FTP proxying for both GET (RETR) and POST/PUT (STOR) methods. 
Interprets directory listings (LIST) for DOS, Unix and VMS FTP servers (makes a
feeble attempt for those it doesn't recognise).


                                PROXYMAINT.C

See PROXYCACHE.C above.


                                 PROXYNET.C

This module provides the essential networking functionality for proxy
processing.  It also maintains the pool of persistent proxy->origin server
connections.


                                PROXYTUNNEL.C

WASD supports the CONNECT method which effectively allows tunnelling of RAW
octets through the proxy server.  This facility is most commonly used to allow
secure SSL connections to be established with hosts on the 'other side' of the
proxy server.  This basic mechanism is also used by WASD to provide an extended
range of tunnelling services.


                                PROXYVERIFY.C

Implements a relatively simple, pragmatic mechanism that allows a proxy server
to authorize a request locally, then convey that authorized username to a
reverse-proxied server using a standard HTTP "Authorization: basic ..." request
field, while keeping the original password private.  It then provides an
HTTP-based mechanism for the proxied-to server to verify that the request is
indeed originating from the proxy server.


                                    PUT.C

The PUT module allows files to be uploaded to, and stored by, the server. It
also allows the deletion of files, and the creation and deletion of
directories. This same module handles PUT, POST and DELETE methods
appropriately. It requires authorization to be enabled on the server.


                                   REGEX.C

The GNU RX regular expression package REGEX.C and REGEX.H files, unmodified
except for a small WASD macro introduced at the beginning of the former.


                                  REQUEST.C

This module reads the request header from the client, parses this, and then
calls the appropriate task function to execute the request (i.e. send a file,
SSI an HTML file, generate a directory listing, execute a script, etc.)


                                 RESPONSE.C

This module provides support for generating appropriate HTTP responses and
response-related processing.  It also deals specifically with National
Character Set (NCS) configuration and conversions.


                                  SESOLA.C

The SESOLA*.C modules provide support for SEcure SOcket LAyer processing.  This
module provides the optional Secure Sockets Layer (SSL) encrypted communication
link functionality for WASD and is named "SeSoLa" to avoid any confusion and/or
conflict with OpenSSL package library routines.  These modules are
conditionally compiled into two sets of object modules, one for non-SSL
servers, the other for those built against the optional OpenSSL toolkit.


                                SESOLACACHE.C

Implements the instance-shared SSL session cache.  This allows multiple
per-node instances to share session contexts.


                                 SESOLACGI.C

Generates the SSL-specific CGI variables.  Supports Purveyor and Apache
environment sets of variables.


                               SESOLACLIENT.C

Handles processing specifically for the negotiation for and verification of
peer (client) certificates.  It also provides support for X.509 certificate
authentication and authorization.


                                 SESOLANET.C

This module provides the SSL network processing.  It has functions supporting
the encryption of plain data streams and the subsequent transmission to the
remote SSL service, as well as the decryption back to original data of
received SSL streams.  It also supports the session acceptance (for SSL
requests) and session establishment (connections, for SSL gatewaying).


                                   SHA1.C

Provides SHA-1 digest hashing used during the WebSocket request handshake.


                                    SSI.C

The Server Side Includes (HTML pre-processor) module provides this
functionality as an integrated part of the server.


                                   STMLF.C

The stmLF.c module converts VARIABLE format records to STREAM-LF.  It is only
called from the File.c module and only then when STREAM-LF conversion is
enabled within the server configuration.
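
The conversion itself is straightforward once the record layout is known.
The sketch below assumes the on-disk VARIABLE layout of a 2-byte little-endian
byte count preceding each record, with odd-length records padded to a word
boundary, and emits each record followed by a line-feed.  var_to_stmlf() is a
hypothetical name; this is not the STMLF.C code.

```c
#include <stddef.h>
#include <string.h>

/* Convert a buffer of VARIABLE-format records to STREAM-LF in 'out'.
   Returns the output length, or -1 if a record runs past the input or
   'out' is too small. */
long var_to_stmlf (const unsigned char *in, size_t inlen,
                   char *out, size_t outsize)
{
   size_t ipos = 0, opos = 0, rlen;

   while (ipos + 2 <= inlen)
   {
      rlen = (size_t)in[ipos] | ((size_t)in[ipos+1] << 8);  /* count word */
      ipos += 2;
      if (ipos + rlen > inlen) return -1;
      if (opos + rlen + 1 > outsize) return -1;
      memcpy (out + opos, in + ipos, rlen);
      opos += rlen;
      out[opos++] = '\n';            /* record boundary becomes a LF */
      ipos += rlen + (rlen & 1);     /* skip the pad byte if length odd */
   }
   return (long)opos;
}
```

Three records "Hello", "" and "ok" laid out this way occupy 14 bytes and
convert to the 10 bytes "Hello\n\nok\n"; the empty record survives as an
empty line.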


                                  STRDSC.C

Particularly with the introduction of WebDAV and the associated XML processing,
a more versatile method of string handling was required.  This module uses a
data structure allowing 32 bit sized strings (i.e. > 65535 bytes),
linked-lists of buffers of these, and automatically uses dynamic memory
associated with requests or server as appropriate.  FAO.C and NET.C can use
these structures.  Data contained in them does not necessarily have to be
ASCII/null-terminated of course.
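
The idea can be illustrated with a minimal sketch: a counted (not
null-terminated) descriptor whose length and size are 32-bit, chained so that
content can grow without reallocating and copying what is already stored.
The struct and function names are hypothetical, not STRDSC.C's, and plain
malloc() stands in for WASD's request/server memory pools.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct str_dsc {
   uint32_t length;            /* bytes in use */
   uint32_t size;              /* bytes allocated */
   char *buf;                  /* counted data, may contain nulls */
   struct str_dsc *next;       /* following buffer in the chain */
};

/* Append 'len' bytes, adding a buffer of at least 'chunk' bytes to the end
   of the chain whenever the last one is full.  Returns 0 or -1. */
int str_dsc_append (struct str_dsc **head, const char *data,
                    uint32_t len, uint32_t chunk)
{
   struct str_dsc **link = head, *sdptr = NULL;
   uint32_t room;

   while (*link) { sdptr = *link; link = &(*link)->next; }
   while (len)
   {
      if (!sdptr || sdptr->length == sdptr->size)
      {
         uint32_t size = len > chunk ? len : chunk;
         if (!(sdptr = calloc (1, sizeof(struct str_dsc)))) return -1;
         if (!(sdptr->buf = malloc (size))) { free (sdptr); return -1; }
         sdptr->size = size;
         *link = sdptr;
         link = &sdptr->next;
      }
      room = sdptr->size - sdptr->length;
      if (room > len) room = len;
      memcpy (sdptr->buf + sdptr->length, data, room);
      sdptr->length += room;
      data += room;
      len -= room;
   }
   return 0;
}

/* total content length across the chain (may exceed 32 bits) */
uint64_t str_dsc_total (const struct str_dsc *sdptr)
{
   uint64_t total = 0;
   for (; sdptr; sdptr = sdptr->next) total += sdptr->length;
   return total;
}

void str_dsc_free (struct str_dsc *sdptr)
{
   struct str_dsc *next;
   for (; sdptr; sdptr = next)
   {
      next = sdptr->next;
      free (sdptr->buf);
      free (sdptr);
   }
}
```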


                                   STRNG.C

This code module contains functions related to string processing, in
particular string wildcard and regular expression matching (see REGEX.C above).


                                  SUPPORT.C

The support module provides a number of miscellaneous support functions for
the HTTPd (well go-o-o-lee!).


                                   TCPIP.C

With the introduction of IPv6 support some generic IP and TCP/IP functionality
was moved into, or placed in, this module.  Its header contains some structure
definitions and macros used extensively in the support of QIO-based network I/O
in WASD.  The code module contains several support functions that make IPv4
and IPv6 supportable without recompilation.

                                   TCPIP6.C

IPv6 address resolution functions only available with VMS V7.0 and later.
IPv6 name/address resolution is a little problematic for WASD because there is
(currently) no native asynchronous interface to it in the same way as there is
with $QIO ACPCONTROL for IPv4.  Fortunately IPv6 is somewhat of a niche
environment!


                                 THROTTLE.C

Provides request throttling functions. This controls the number of concurrent
processing requests against a specified path. Requests in excess of specified
limits are FIFO queued.
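
The concurrency limit with FIFO queuing can be sketched as a counter plus a
ring buffer of waiting requests: a finishing request hands its processing slot
directly to the oldest waiter.  This is a minimal illustration, not
THROTTLE.C; the struct, function names and fixed queue size are hypothetical,
and any further limits a real implementation may need (queue length policy,
timeouts) are not modelled.

```c
#define THROTTLE_QUEUE_MAX 64

struct throttle {
   int limit;                        /* maximum concurrently processing */
   int processing;                   /* currently processing */
   int queue[THROTTLE_QUEUE_MAX];    /* waiting request ids (FIFO ring) */
   int head, count;
};

/* returns 1 if the request may proceed now, 0 if queued, -1 if queue full */
int throttle_begin (struct throttle *tp, int request_id)
{
   if (tp->processing < tp->limit) { tp->processing++; return 1; }
   if (tp->count == THROTTLE_QUEUE_MAX) return -1;
   tp->queue[(tp->head + tp->count++) % THROTTLE_QUEUE_MAX] = request_id;
   return 0;
}

/* a processing request has finished; returns the id of the queued request
   that now proceeds (inheriting the slot), or -1 if none was waiting */
int throttle_end (struct throttle *tp)
{
   int request_id;

   if (tp->count == 0) { tp->processing--; return -1; }
   request_id = tp->queue[tp->head];
   tp->head = (tp->head + 1) % THROTTLE_QUEUE_MAX;
   tp->count--;
   return request_id;                /* 'processing' is unchanged */
}
```

With a limit of 2, requests 1 and 2 proceed, 3 and 4 queue, and as the first
two finish the queued requests are released in arrival order, 3 then 4.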


                                   TRACK.C

Provides session tracking functions.


                                    UPD.C

Implements the on-line web directory administration and file editing and
upload facility. It requires authorization to be enabled on the server. File
and directory modification are still performed by the Put.c module.  The Upd.c
module is an overly large body of code generating all of the dialogues and
editing pages.  It also provides additional functionality for the server
administration, adding admin-specific portions of dialogues as required.


                                  VERSION.C

The VERSION module merely provides a small, convenient place to generate some
build information.


                                    VM.C

The virtual memory management module provides dynamic memory allocation and
deallocation functions. These functions use the VMS system library virtual
memory routines. Also see general comments in section "Memory Management".


                                   WATCH.C

The WATCH facility provides an online, real-time, in-browser-window view of
request processing in the running server.  It supports functionality to allow a
site administrator to interrogate the processing server for resolving
configuration and other run-time issues (the WATCH_CAT, or category
functionality).  It also contains extensive in-detail request processing
functionality intended for server development and BETA debugging purposes (the
WATCH_MOD, or module functionality).

                                  WEBSOCK.C

Essentially this module handles the setup and tear-down of mailbox "pipes" to
persistent (CGIplus) processes managed by the DCL.C module.  Each of these
processes can handle multiple such WebSocket clients.  Once established the
mailbox "pipes" just transfer the raw network data to and from the client and
scripting process.  The WebSocket protocol must be implemented by the script
(commonly using a library).


                                   +-----+
                                   + END +
                                   +-----+