+-------------------------------------+
| WASD HTTP SERVER - "NUTS AND BOLTS" |
+-------------------------------------+

WASD VMS Web Services

Copyright (C) 1996-2013 Mark G. Daniel.

Revision: v10.3.0 (September 2013)

This package is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 3 of the License, or any later version.

This package is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

  WASD_ROOT:[000000]GNU_GENERAL_PUBLIC_LICENSE.TXT
  http://www.gnu.org/licenses/gpl.txt


Thanks to BETA Testing Sites
----------------------------

WASD has become a large package, and requests for additional functionality are
frequent enough that the relatively long development-testing-refinement-release
cycle of the first few years is no longer possible. It is no longer feasible
for the one author to code and exhaustively test all the functionality of each
new release. The role of BETA testing has become indispensable. Thanks to all
who participate in this part of the cycle, in particular (and in no significant
order):

  Jeremy Begg             VSM Software Services, Adelaide, South Australia
  Alex Daniels            themail, United Kingdom
  Victoriano Giralt       University of Malaga, Spain
  Dr Klaus-Werner Gurgle  Hamburg University, Germany
  Alexander Ivanov        S&B Joint Stock Company, St Petersburg, Russia
  Ruslan Laishev          Delta Telecom, St Petersburg, Russia
  Tom Linden              Kednos Enterprises, California, USA
  Jean-Pierre Petit       ESME-Sudria, Paris, France
  Jean-François Piéronne  European WASD Advocate :^), Paris, France
  Geoff Roberts           St Mark's College, Pt Pirie, South Australia

There may be others but these are the ones who pester me with email.

As the BETA test sites are also often the ones requesting additional
functionality, they should also be thanked for pushing along WASD's capability
set.


Brief Introduction to HTTPd Code
--------------------------------

This document is designed to be only a broad overview of the basic workings of
the HTTP server. It is derived from an original laid-up document, "Nuts and
Bolts", which was abandoned in favour of something requiring a little less
up-keep (therefore increasing its chances of being up-to-date :^) and that
could be kept with the server code itself.

This description covers only the server image itself, not the full suite of
WASD VMS Hypertext Services software.


General Design
--------------

(Almost) without apology, the server is a large, monolithic (shall we be kind
and say old-fashioned) piece of software. Lots of things would be done
differently if it were being started over. Other things wouldn't be. The code
is always attempting to do things faster or more efficiently (especially
because it's VMS), and so it's a bit clunky in places. Other clunkiness is due
entirely to the author ;^)


Server Behaviour
----------------

The HTTPd executes permanently on the server host, listening for client
connection requests on TCP/IP port 80 (by default). It provides concurrent
services for a (technically) unlimited number of clients, constrained only by
the server resources. When a client connects, the server performs the
following tasks:

  1. creates a thread for this request (this term does not denote the use of
     DECthreads or any other specific thread library, just a thread of
     execution)

  2. reads and analyzes the HTTP request sent ...
     o  initiates transfer of the requested file, either from the file system
        or from the file cache
     o  initiates processing of an SSI file
     o  initiates directory listing
     o  initiates processing of a clickable-image mapping file
     o  initiates file/directory create/update
     o  initiates server administration
     o  initiates web file-system update
     o  creates a (detached) process to execute a CGI script with:
          - SYS$COMMAND and SYS$OUTPUT assigned to intermediate mailboxes
            (essentially pipes)
          - SYS$INPUT logical name providing a mailbox allowing the script to
            read the raw request body (if any)
          - CGIPLUSIN logical name providing a mailbox allowing a CGIplus
            (persistent) script to read CGI variables
          - (for non-CGIplus) DCL symbols containing CGI-compliant variables
          - for the life of the script process the HTTPd:
              * controls the essential behaviour via its SYS$COMMAND
              * receives data written to its SYS$OUTPUT, writing this to the
                client
              * supplies the body of a (POSTed) request via SYS$INPUT
     o  creates a (detached) process to execute a WebSocket script with:
          - SYS$COMMAND assigned to an intermediate mailbox (controls the
            process)
          - SYS$OUTPUT assigned to a mailbox until the WebSocket is
            established
          - CGIPLUSIN logical name providing a mailbox allowing the WebSocket
            (CGIplus, persistent) script to read CGI variables
          - WEBSOCKET_INPUT logical name providing a mailbox to read client
            data for the life of the request
          - WEBSOCKET_OUTPUT logical name providing a mailbox to write data to
            the client for the life of the request
          - for the life of the script process the HTTPd:
              * controls the essential behaviour via its SYS$COMMAND
              * disconnects the SYS$OUTPUT and CGIPLUSIN mailboxes once the
                WebSocket is established, and allows another WebSocket
                request to be accepted
              * manages multiple WEBSOCKET_INPUT and WEBSOCKET_OUTPUT
                mailboxes (i.e. multiple clients) connected to the process
     o  creates a network process to execute a CGI or OSU script
          * controls the essential behaviour of a CGI script, providing the
            CGI-compliant variables and receiving CGI-compliant output
          * provides an emulation of the native OSU scripting environment

  3. closes the connection to the client and disposes of the thread

For I/O-intensive activities like file transfer and directory listing, the
AST-driven code provides an efficient, multi-threaded environment for the
concurrent serving of multiple clients.


Multi-Threaded
--------------

The WASD HTTPd is written to exploit VMS operating system characteristics
that allow the straightforward implementation of event-driven, multi-threaded
code.

Asynchronous System Traps (ASTs), or software interrupts, delivered at the
conclusion of an I/O (or other) event allow functions to be activated to
post-process the event. The event traps are automatically queued on a FIFO
basis, allowing a series of events to be processed sequentially. When not
responding to an event the process is quiescent, or otherwise occupied,
effectively interleaving I/O and processing and allowing sophisticated
multi-threading of clients.

Multi-threaded code is inherently more complex than single-threaded code, and
there are issues involved in synchronizing some activities in such an
environment. Fortunately VMS handles many of these issues internally. After
connection acceptance, all of the processing done within the server is at
USER-mode AST delivery level, and for all intents and purposes the processing
done therein is atomic, implicitly handling its own synchronization issues.

The HTTPd is written to make longer-duration activities, such as transferring
a file's contents, event-driven. Other, shorter-duration activities, such as
accepting a client connection request, are handled synchronously.
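
The FIFO queuing of completion routines described above can be imitated in
portable C. This is a minimal single-process simulation, not VMS code:
ast_queue(), ast_deliver_all(), read_ast() and the request_t type are
hypothetical names invented for illustration, standing in for I/O completion
and AST delivery (on VMS, system services such as $QIO and $DCLAST are
involved). Each queued routine runs to completion before the next one is
delivered, which is what lets two simulated requests interleave without locks.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical completion routine type (not the VMS AST interface):
   called with the parameter supplied when the "I/O" was queued. */
typedef void (*ast_routine_t)(void *param);

typedef struct ast_entry {
    ast_routine_t routine;
    void *param;
    struct ast_entry *next;
} ast_entry_t;

static ast_entry_t *ast_head = NULL;  /* oldest pending AST */
static ast_entry_t *ast_tail = NULL;  /* newest pending AST */

/* Queue a completion routine at the tail (FIFO), as an I/O completion
   would. Returns 0 on success, -1 if memory is exhausted. */
int ast_queue(ast_routine_t routine, void *param)
{
    ast_entry_t *entry = malloc(sizeof *entry);
    if (entry == NULL)
        return -1;
    entry->routine = routine;
    entry->param = param;
    entry->next = NULL;
    if (ast_tail != NULL)
        ast_tail->next = entry;
    else
        ast_head = entry;
    ast_tail = entry;
    return 0;
}

/* Deliver pending routines oldest-first until none remain. Each runs to
   completion before the next starts, so one simulated AST never
   interrupts another. Returns the number delivered. */
int ast_deliver_all(void)
{
    int delivered = 0;
    while (ast_head != NULL) {
        ast_entry_t *entry = ast_head;
        ast_head = entry->next;
        if (ast_head == NULL)
            ast_tail = NULL;
        entry->routine(entry->param);  /* may queue further ASTs */
        free(entry);
        delivered++;
    }
    return delivered;
}

/* Illustrative per-request state. It must outlive every AST queued for
   it, so it is never a local variable of the routine that queues it. */
typedef struct {
    char id;          /* request identifier, e.g. 'A' */
    int chunks_left;  /* simulated file chunks still to transfer */
} request_t;

static char trace[64];  /* order in which chunks were "written" */

static void trace_add(char c)
{
    size_t n = strlen(trace);
    if (n + 1 < sizeof trace) {
        trace[n] = c;
        trace[n + 1] = '\0';
    }
}

/* Completion routine for one simulated chunk: record it, then queue the
   next chunk for this request if any remain. */
static void read_ast(void *param)
{
    request_t *rq = param;
    trace_add(rq->id);
    if (--rq->chunks_left > 0)
        ast_queue(read_ast, rq);
}
```

Queuing request 'A' with three chunks and request 'B' with two, then calling
ast_deliver_all(), delivers five routines and leaves trace holding "ABABA":
the two requests are serviced alternately, chunk by chunk, by a single thread
of execution.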

It is worth noting that with asynchronous, AST-driven output, the data being
written must be guaranteed to exist without modification for the duration of
the write (indicated by delivery of the completion AST). This means data
written must be static or in buffers that persist with the thread;
function-local (automatic) storage cannot be used. The server allocates
dynamic storage for general (e.g. output buffering) or specific (e.g. response
headers) uses.


AST Behaviour
-------------

For server functions with AST capability, in particular those issuing $QIO,
the server is designed to rely on the AST routine to report any error. This
includes both errors that occur during the I/O operation and any that occur
when initiating the I/O (which would normally prevent it being queued), even
if that requires directly setting the I/O status block with the offending
status and explicitly declaring the AST. This eliminates any ambiguity about
the conditions under which ASTs are delivered ... ASTs are always delivered.

If a call to a server function with AST capability does not supply an AST
routine, the caller must check the return status to determine whether it can
continue processing. If the caller supplies an AST routine address, it must
not act on any error status returned; it must allow the AST routine to
process according to the I/O status block status.


Server "Tasks"
--------------

Each request can have one or more tasks executed sequentially to fulfil the
request. This occurs most obviously with Server-Side Includes (SSI, the HTML
pre-processor) but also, to a more limited extent, with directory listing and
its read-me file inclusion. A task is more-or-less defined as one of:

  o  transfer file from file-system or cache
  o  directory listing
  o  SSI interpretation
  o  DCL execution or script processing
  o  DECnet script processing
  o  POST/PUT processing
  o  WebDAV processing
  o  WebSocket processing
  o  proxy processing
  o  update facility processing

Each one of the associated modules executes relatively independently.
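
Running tasks one after another within a single request, as with SSI above,
can be pictured as each request carrying a pointer to the function to run when
its current task concludes. The following C is a hypothetical sketch, not
WASD's code: request_t, task_end(), ssi_begin(), ssi_resume(),
file_transfer() and request_end() are invented names, and the next task is
called directly rather than being delivered via an AST. It shows an SSI task
handing off to a nested file transfer and then resuming, and the same
file-transfer task serving a request on its own.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical request structure, not WASD's own. */
typedef struct request request_t;
typedef void (*task_fn)(request_t *rq);

struct request {
    task_fn next_task;  /* run when the current task concludes, or NULL */
    char log[128];      /* illustration only: tasks in the order they ran */
};

static void task_log(request_t *rq, const char *name)
{
    size_t used = strlen(rq->log);
    snprintf(rq->log + used, sizeof rq->log - used, "%s;", name);
}

static void request_end(request_t *rq)
{
    task_log(rq, "end");
}

/* Called at the conclusion of every task: start the next task if one was
   specified, otherwise end the request. (The server passes control via an
   AST; this sketch simply calls the function.) */
static void task_end(request_t *rq)
{
    task_fn next = rq->next_task;
    if (next != NULL) {
        rq->next_task = NULL;  /* consume it so it runs only once */
        next(rq);
    } else {
        request_end(rq);
    }
}

/* File transfer: usable on its own or nested within another task. */
static void file_transfer(request_t *rq)
{
    task_log(rq, "file");
    task_end(rq);
}

/* Continue SSI processing after an included file has been sent. */
static void ssi_resume(request_t *rq)
{
    task_log(rq, "ssi-resume");
    task_end(rq);
}

/* Begin SSI processing; an include part-way through the document sets
   the next task and starts a nested file transfer. */
static void ssi_begin(request_t *rq)
{
    task_log(rq, "ssi-start");
    rq->next_task = ssi_resume;
    file_transfer(rq);
}
```

Calling ssi_begin() on a zeroed request leaves its log as
"ssi-start;file;ssi-resume;end;", while calling file_transfer() alone gives
"file;end;". With only one pointer, a nested task must not set its own next
task while an outer one is pending; the sketch does not handle deeper nesting.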

Before commencing a task, a next-task pointer can be set to the function
required to execute at the conclusion of that task. At that conclusion, the
next-task functionality checks for a specified task to start or continue. If
one has been specified, control is passed to that next-task function via an
AST.

Some tasks can only be called once per request: for example, image mapping,
file transfer using the cache, file upload and menu interpretation. Other
tasks may be called within other tasks, or multiple times serially during a
request. An example is the (non-cache) transfer file task, which can be used
within directory listings to insert read-me files, and when "

                                    !redirect (possibly restart request)
 |   |
 |  RequestEnd()                    !301/302 response redirect
 |
 +- ProxyRequestBegin()             !proxy request (HTTP)
 |   :
 |  ProxyRequestEnd()               !after proxy processing
 |   |
 |  RequestEnd()                    !end of request
 |
Authorize() --+                     !authorize access to request path
 |            |
RequestExecutePostAuth1()           !either direct or after AST
 |
 +- AdminBegin()                    !server administration
 |   :
 |  RequestEnd()                    !once complete
 |
 +- HtAdminBegin()                  !HTA database administration
 |   :
 |  RequestEnd()                    !once complete
 |
 +- other-internal                  !other internally handled
 |   :
 |  RequestEnd()                    !once complete
 |
 +- Authorize() ------+             !authorize access to script path
 |   |                |
 |  RequestEnd()      |             !not authorized
 |                    |
RequestExecutePostAuth2()           !either direct or after AST
 |   |
 |  RequestEnd()                    !not authorized
 |
 +- ThrottleBegin() --+             !request subject to queuing
 |                    |
RequestExecutePostThrottle()        !none, or post queuing after AST
 |
 +- RequestScript()                 !script to process
 |   |
 |   +- internal                    !internally handled look-alikes
 |   |   :
 |   |  RequestEnd()                !once complete
 |   |
 |   +- DclBegin()                  !sub/detached process CGI script
 |   |   :
 |   |  RequestEnd()                !once complete
 |   |
 |   +- DECnetBegin()               !network CGI or OSU script
 |       :
 |      RequestEnd()                !once complete
 |
 +- FileBegin()                     !search cache, continue if not found
 |   |
 |  CacheSearch() -----------+      !if not found in cache
 |   |                        |
 |  CacheBegin()              |      !cache entry requires revalidation?
 |   |                        |
 |  CacheAcpInfoAst() -------+      !if cache entry is stale
 |   |                        |
 |  CacheNext()               |      !transfer from cache
 |   :                        |
 |  RequestEnd()              |      !once cache transfer complete
 |                            |
RequestExecutePostCache() <--+      !cannot be supplied from cache
 |  DavWebRequest()                 !if WebDAV-specific method
 |   :                              !(some methods, e.g. PUT, can also
 |  DavWebEnd()                     ! have WebDAV-specific behaviour)
 |  PutBegin()                      !if a PUT, POST, or DELETE method
 |   :
 |  RequestEnd()                    !once PUT or POSTed
 |
 +- RequestHomePage() <--+          !search for a home page
 |   |                   |
 |   +-------------------+          !possible multiple names
 |   |
 |  RequestFile()                   !home page found, transfer
 |   |   :
 |   |  RequestEnd()                !once transferred
 |   |
 |  DirBegin() -->                  !no home page, directory listing
 |       :
 |      RequestEnd()                !once listed
 |
 +- RequestEnd()                    !auto-script
 |   |
 |  RequestRedirect() -->           !redirect (possibly restart request)
 |   |
 |  RequestEnd()                    !301/302 response redirect
 |
 +- IsMapBegin()                    !image mapping
 |   :
 |  RequestEnd()                    !after mapping
 |
 +- RequestEnd()                    !search keyword
 |   |
 |  RequestRedirect() -->           !redirect (possibly restart request)
 |   |
 |  RequestEnd()                    !301/302 response redirect
 |
RequestFile()                       !transfer file
 :
RequestEnd()                        !once transferred
 |
RequestRedirect() -->               !redirect (possibly restart request)
 |
RequestEnd()                        !301/302 response redirect


HTTPd Modules
-------------

The HTTPd server comprises several main modules, implementing the obvious
functionality of the server, and other, smaller, support modules. Modules
contain descriptive prologues. As these are usually up-to-date (usually more
so than discrete documentation such as this text, anyway), it is strongly
recommended that the source code modules be consulted for specific
information on how each operates. This section is provided only to give a
quick overview.

All files are located in HT_ROOT:[SRC.HTTPD]

WASD.H
  This C header file contains many of the general data structures and macros
  required for the HTTPd.

ENAMEL.H
  This header is used to work around the issue of older versions of VMS and
  DECC not having a long NAM structure or the definitions used to support
  ODS-5. For the same reason an extended FIB structure definition must be
  provided. For such environments this header file provides the necessary
  infrastructure, allowing extended-file-specification-compliant code to be
  compiled and linked on pre-v7.2 systems.

ADMIN.C
  Server administration is based in this module, although other modules
  provide their own functions for implementing menus, reports and actions.

AUTH.C
  This module provides path-based authorization and authentication, using
  the HTTPD$AUTH file as the source of configuration information. It uses the
  other AUTHxxxxxx.C modules to provide the particular functionality
  required.

AUTHACME.C
  Allows authentication and SYSUAF password changes using the ACME server via
  the $ACM system service. Provides both SYSUAF authentication and
  authentication from other sources using other and third-party ACME agents.

AUTHAGENT.C
  Provides authentication and authorization information from an external,
  CGIplus-based script.

AUTHCACHE.C
  Provides caching of authentication/authorization information so that the
  sources do not need to be accessed for every request.

AUTHCONFIG.C
  Loads authorization configuration information from the HTTPD$CONFIG file
  into a data structure that the HTTPd then uses during authorization.

AUTHHTA.C
  Accesses information stored in binary authentication databases (files).

AUTHHTL.C
  Accesses information stored in plain-text authentication files.

AUTHIDENT.C
  Provides authentication and authorization from an RFC 1413 source
  ("ident" protocol).

AUTHTOKEN.C
  Accesses authentication/authorization in one context and reflects that
  authorization to another using a short-lived token delivered as a cookie.

AUTHVMS.C
  Accesses authentication and authorization information from the system's
  SYSUAF and RIGHTSLIST databases.

BASE64.C
  Base-64 encoding/decoding (well, golllee!)

BASIC.C
  The "BASIC" HTTP authentication scheme is implemented in this module.

BODY.C
  Controls the transfer of the request body (for POST and PUT methods) from
  the client to the server and then on to a script, proxied server or
  internal HTTP module for processing. A script or a proxied HTTP server
  receives the "raw" data provided by the client. Internal modules or a
  proxied FTP server get data that has been processed by one of three
  functions that "filter" the data from a request body. The first just puts
  the entire body into a single buffer and is used internally. The others
  extract data from "application/x-www-form-urlencoded" or
  "multipart/form-data" content-types.

CACHE.C
  This module implements a file data and revision time cache, designed to
  provide an efficient static document request (file transfer) mechanism.

CGI.C
  The CGI module provides the CGI scripting support functions used by the
  DCL.C and DECNET.C modules.

CLI.C
  Module to process (interpret) the HTTPD command-line.

CONTROL.C
  This module implements the HTTPd command-line control functionality. At
  the command-line an administrator may enter HTTPD/DO=command[/ALL]. This is
  written to the HTTPd via shared memory in a global section, interpreted,
  and the requested action taken or an error message sent back to the
  administrator. The Distributed Lock Manager (DLM) is used to alert a
  detached server process that a command is available for action, and to
  distribute a command to all servers cluster-wide when using the /ALL
  qualifier.

CONFIG.C
  This module provides basic server and service configuration, using the
  HTTPD$CONFIG file as the source of configuration information.

DAVCOPY.C
  Performs a WebDAV copy of a file or directory tree.

DAVDELETE.C
  Performs a WebDAV delete of a file or directory tree.

DAVLOCK.C
  Manages a WebDAV lock on a resource (file or tree). Although WASD WebDAV
  uses VMS locks and the DLM for managing access within its processing, this
  module deals exclusively with WebDAV locking.
  WASD WebDAV locks are stored in meta-data files managed by DAVMETA.C.

DAVMETA.C
  WebDAV meta-data uses XML for its representation. WASD WebDAV meta-data is
  stored in a separate file alongside the file it describes (or the directory
  file if a directory). This module manages the WASD-specific data stored in
  that file (WebDAV locks) and independent client-supplied entities.

DAVMOVE.C
  Performs a WebDAV move of a file or directory tree. For efficiency it
  renames the file or directory file if on the same volume; if on another
  volume it copies and deletes (much more expensive).

DAVPROP.C
  Generates WebDAV "live" properties for files and directories. These are
  XML representations of such data as creation and modification times,
  size, etc.

DAVWEB.C
  This is the primary WebDAV processing module. All WebDAV-specific
  processing begins and ends with this. It also provides some common WebDAV
  processing functionality and the WebDAV report code. The prologue contains
  significant WASD WebDAV comments.

DCL.C
  The DCL execution functionality must interface and coordinate with an
  external subprocess. Supports CGI and CGIplus scripting and SSI
  DCL-processed directives.

DECNET.C
  The DECnet module provides scripting based on process management using
  DECnet. Both standard WASD CGI scripting and an emulation of OSU
  (DECthreads) scripting are supported.

DESCR.C
  The Descr.c module generates a file description by searching HTML files
  for the first occurrence of