WASD Web Services - Environment Overview

2 - Document Access and Specification

2.1 - Document Content Type
2.2 - Explicitly Specifying Content-Type
2.3 - Document Specification
    2.3.1 - Absolute File Path
    2.3.2 - Partial (or Relative) File Path
2.4 - Extended File Specifications (ODS-5)
    2.4.1 - Characters In Request Paths
    2.4.2 - Characters In Server-Generated Paths
    2.4.3 - Document Cache

Arbitrary documents may not be accessed.

The server can only access files whose path is permitted by the set of rules configured for the web environment.

Documents must be read-accessible.

The server can only access files that are world readable, or that have an ACL specifically controlling access for "HTTP$SERVER", the server account.

2.1 - Document Content Type

Document (file) retrieval is initiated by providing the server with the file specification as a URL path. Server configuration determines the format in which the file is returned to the client. It may contain text or images immediately displayable by the browser, or the browser may spawn an external viewer. The server may automatically activate a script to provide a gateway to non-native information (see description of [AddType] configuration directive in the Technical Overview). The file type (extension) determines the content type with which the server returns (and/or interprets) the file.

The following table lists, as examples, some of the current file types and their associated MIME-style content types. HTML documents are laid out according to the full HTML capabilities of the browser. Plain-text documents are presented in a fixed-width font. Other types require an external viewer to be activated.

.BKB        Bookreader document (BNU)   text/html, gateway script activated
.BKS        Bookreader shelf (BNU)      text/html, gateway script activated
.C          C source                    text/plain
.COM        DCL procedure               text/plain
.CONF       configuration file          text/plain
.CPP        C++ source                  text/plain
.DECW$BOOK  Bookreader document         text/html, gateway script activated
.FOR        Fortran source              text/plain
.GIF        GIF image                   image/gif
.H          C header                    text/plain
.HLB        VMS Help library            text/html, gateway script activated
.HTML       HyperText Markup Language   text/html
.HTM        HyperText Markup Language   text/html
.JPG        JPEG image                  image/jpeg
.LIS        Listing                     text/plain
.MAR        Macro source                text/plain
.PAS        Pascal source               text/plain
.PRO        IDL source                  text/plain
.PS         PostScript                  application/PostScript
.TEXT       Text                        text/plain
.TLB        VMS text library            text/html, gateway script activated
.TXT        Text                        text/plain
.SHTML      HyperText Markup Language   pre-processed text/html
.ZIP        zipped file                 application/binary

If other file types need to be defined, contact the Web administrator.
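The extension-to-content-type lookup described above can be sketched as follows. This is only an illustration in Python, not the server's implementation: the dictionary holds a handful of rows copied from the table, and the names CONTENT_TYPES and content_type_for are hypothetical.

```python
import posixpath

# A few rows copied from the table above. On a real server the mapping
# comes from configuration, not from a hard-coded dictionary.
CONTENT_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".txt": "text/plain",
    ".com": "text/plain",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".zip": "application/binary",
}


def content_type_for(path, default=None):
    """Return the content type implied by the file type (extension) of a URL path."""
    _, extension = posixpath.splitext(path)
    # VMS specifications are not case-sensitive, so compare in lower case.
    return CONTENT_TYPES.get(extension.lower(), default)
```

A path such as "/web/home.html" yields "text/html", while an unconfigured type such as "file.unknown" yields the default (None here), which is the situation the next section addresses.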

2.2 - Explicitly Specifying Content-Type

When accessing files it is possible to explicitly specify the identifying content-type to be returned to the browser in the HTTP response header. Of course this does not change the actual content of the file, just the header content-type! This is primarily provided to allow access to plain-text documents that have obscure, non-"standard" or non-configured file extensions.

It could also be used for other purposes, "forcing" the browser to accept a particular file as a particular content-type. This can be useful if the extension is not configured (as mentioned above) or in the case where the file contains data of a known content-type but with an extension conflicting with an already configured extension specifying data of a different content-type.

Enter the file path into the browser's URL specification field ("Location:", "Address:"). Then, for plain-text, append the following query string:

?httpd=content&type=text/plain

For another content-type substitute it appropriately. For example, to retrieve a text file in binary (why I can't imagine :^) use

?httpd=content&type=application/octet-stream

For example, the same file requested without and then with the query string:

  file.unknown
  file.unknown?httpd=content&type=text/plain

It is also possible to "force" the content-type for all files in a particular directory. See 3.3.13 - Specifying Content-Type.
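When links of this kind are generated by a program rather than typed by hand, the query string can be appended with a small helper. The following Python sketch assumes a hypothetical function name, with_content_type, and simply refuses paths that already carry a query string. The content-type is appended literally, matching the form shown above.

```python
def with_content_type(path, content_type="text/plain"):
    """Append the explicit content-type query string to a URL path."""
    if "?" in path:
        # Combining with an existing query string is outside this sketch.
        raise ValueError("path already has a query string")
    return f"{path}?httpd=content&type={content_type}"


print(with_content_type("file.unknown"))
# prints: file.unknown?httpd=content&type=text/plain
```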

Ignored Content-Type

Even then, some browsers, operating systems, or version combinations of the two ignore the content-type specified in the response header and instead second-guess it (often incorrectly) from the file name extension. A common example is the content of DCL procedures on Windows with older (up until fairly recent) versions of Internet Explorer.

Faux Extension

Nevertheless, if a '$' and then a second extension are appended to the URI, this is often sufficient to coerce the browser into displaying the content associated with the bogus extension. In the case of DCL procedure access on a Windows platform try using "$.txt", or for other purposes whatever extension fits the requirement, as in the following examples.

  /wasd_root/src/build_all.com$.txt
  /wasd_root/src/build_all.com$.anythingatall

WASD specially handles a URI in this format: when the requested resource is not found, it internally strips the "$." extension and attempts to access the resulting file name again. This technique works for file-based resources but not for scripts, etc.
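The strip-and-retry behaviour can be pictured with the following Python sketch. It is an illustration of the idea only, not WASD's code: strip_faux_extension and resolve are hypothetical names, and the exists argument stands in for whatever check the server makes for a file.

```python
def strip_faux_extension(path):
    """Remove a trailing "$.<anything>" from the last element of a URL path."""
    head, slash, name = path.rpartition("/")
    base, marker, _faux = name.rpartition("$.")
    if not marker or not base:
        return path  # nothing to strip
    return head + slash + base


def resolve(path, exists):
    """Try the path as requested, then once more without the faux extension."""
    if exists(path):
        return path
    stripped = strip_faux_extension(path)
    if stripped != path and exists(stripped):
        return stripped
    return None  # not found either way
```

With a set of known files standing in for the file system, resolve("/wasd_root/src/build_all.com$.txt", files.__contains__) returns "/wasd_root/src/build_all.com" when only that file exists.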

2.3 - Document Specification

For the "http:" protocol, file and directory locations are specified using URL path syntax where slash-separated ("/") elements delineate a hierarchy leading to a data item. Anyone familiar with the syntax of the Unix file system, or the MS-DOS file system (where back-slashes are hierarchy delimiters), will feel at home with URL syntax. Specifications under VMS are not case-sensitive.

A VMS directory specification

WEB:[TECHNICAL.HTML-PRIMER]

would be represented in URL syntax as

/web/technical/html-primer/

and a VMS file specification

WEB:[TECHNICAL.HTML-PRIMER]HTML-PRIMER.HTML

represented as

/web/technical/html-primer/html-primer.html

NOTE

It is not required (although not forbidden) to supply a VMS master file directory component ("[000000]", "[000000.", etc.) in a URL specification. Hence the file specification

WEB:[000000]HOME.HTML

should be represented as

/web/home.html
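The correspondence between the two syntaxes can be expressed as a short conversion routine. The Python sketch below handles only simple specifications of the form DEVICE:[DIR.SUBDIR]NAME.TYPE, drops a "000000" directory component, and lower-cases the result purely for presentation. The function name vms_to_url_path is hypothetical, and extended (ODS-5) escapes and file versions are not handled.

```python
import re

# DEVICE:[DIR.SUBDIR...]FILE, where FILE may be empty for a directory.
_VMS_SPEC = re.compile(r"^(?P<device>[^:\[\]]+):\[(?P<dirs>[^\]]*)\](?P<file>.*)$")


def vms_to_url_path(spec):
    """Convert a simple VMS file specification to URL path syntax."""
    match = _VMS_SPEC.match(spec)
    if match is None:
        raise ValueError(f"unsupported specification: {spec!r}")
    # The master file directory need not appear in the URL, so drop it.
    directories = [d for d in match.group("dirs").split(".") if d and d != "000000"]
    elements = [match.group("device")] + directories
    return "/" + "/".join(elements).lower() + "/" + match.group("file").lower()


print(vms_to_url_path("WEB:[TECHNICAL.HTML-PRIMER]HTML-PRIMER.HTML"))
# prints: /web/technical/html-primer/html-primer.html
```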

2.3.1 - Absolute File Path

A file may be specified using an absolute, or full path. This must specify the location of the file exactly. Absolute paths always begin with a forward-slash ("/"). For example:

/web/committee/minutes/1994/1994-09-27.txt
/web/committee/constitution.txt
/web/committee/membership/fred-bloggs.txt

2.3.2 - Partial (or Relative) File Path

A file may be specified relative to its current location. That is, a current document (or menu) may specify another document file relative to itself. This may be at the current level, in a subdirectory, or in another part of the directory tree related to the current one. Relative paths never begin with a forward-slash ("/").

(Strictly speaking, it is a function of the client to construct a full URL from such a relative URL before sending the request to the server.)

For example, documents at the same level as the current document may be specified without any hierarchy being indicated:

1994-07-22.txt
1994-08-24.txt
1994-09-27.txt

Documents at an inferior point in the hierarchy may be specified as in the following example:

1993/1993-02-17.txt
1993/reports/membership.txt
other/etc.txt

Documents in a related part of the hierarchy may be referenced using the "../" construct. As with MS-DOS and Unix this syntax indicates the immediately superior directory.

../other_committee/1993/1993-02-17.txt
../other_committee/1993/reports/balance-sheet.txt
../../other_section/committee/constitution.txt
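Because the client performs this resolution, it can be reproduced with any standard URL library. The following Python sketch uses urllib.parse.urljoin from the standard library. The host name and the current document ("index.html" in /web/committee/) are placeholders chosen for illustration.

```python
from urllib.parse import urljoin

# Hypothetical current document; the host name is a placeholder.
CURRENT = "http://www.example.com/web/committee/index.html"


def resolve(relative, current=CURRENT):
    """Build the full URL a client would request for a relative path."""
    return urljoin(current, relative)


print(resolve("constitution.txt"))
# prints: http://www.example.com/web/committee/constitution.txt
print(resolve("../other_committee/1993/1993-02-17.txt"))
# prints: http://www.example.com/web/other_committee/1993/1993-02-17.txt
```

An absolute path such as "/web/committee/constitution.txt" replaces the whole path of the current document, which is why it resolves to the same URL as the first example.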

2.4 - Extended File Specifications (ODS-5)

OpenVMS Alpha V7.2 introduced a new on-disk file system structure, ODS-5. This brings to VMS in general, and WASD and other Web servers in particular, a number of issues regarding the handling of characters previously not encountered during (ODS-2) file system activities.

2.4.1 - Characters In Request Paths

There is a standard for characters used in HTTP request paths and query strings (URLs). This includes conventions for the handling of reserved characters that have specific meanings in a request (for example "?", "+", "&", "="), characters that are completely forbidden (for example white-space and control characters, 0x00 to 0x1f), and others that have usages by convention (for example the "~", commonly used to indicate a username mapping). The request can otherwise contain these characters provided they are URL-encoded (i.e. a percent sign followed by two hexadecimal digits representing the character value).

There is also an RMS standard for handling characters in extended file specifications, some of which are forbidden in the ODS-2 file naming conventions, and others which have a reserved meaning to either the command-line interpreter (e.g. the space) or the file system structure (e.g. the ":", "[", "]" and "."). Generally the allowed but reserved characters can be used in ODS-5 file names if escaped using the "^" character. For example, the ODS-2 file name "THIS_AND_THAT.TXT" could be named "This^_^&^_That.txt" on an ODS-5 volume. More complex rules control the use of character combinations with significance to RMS, for instance multiple periods. The following file name is allowed on an ODS-5 volume, "A-GNU-zipped-TAR-archive^.tar.gz", where the non-significant period has been escaped making it acceptable to RMS.

The WASD server will accept request paths for file specifications in both formats, URL-encoded and RMS-escaped. Of course characters absolutely forbidden in request paths must still be URL-encoded; the most obvious example is the space. RMS will accept the file name "This^ and^ that.txt" (i.e. containing escaped spaces) but the request path would need to be specified as "This%20and%20that.txt", or possibly "This^%20and^%20that.txt", although the RMS escape character is basically redundant.

Unlike ODS-2 volumes, ODS-5 volumes have no "invalid" characters, so the server performs no processing to ensure RMS compliance.

2.4.2 - Characters In Server-Generated Paths

When the server generates a path to be returned to the browser, either in a viewable page such as a directory listing or error message, or as a part of the HTTP transaction such as a redirection, the path will contain the URL-encoded equivalent of the canonical form of an extended file specification escaped character. For example, the file name "This^_and^_that.txt" will be represented by "This%20and%20that.txt".
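The relationship between an RMS-escaped file name and its URL-encoded form can be illustrated with the Python sketch below. It is not the server's code and covers only the escapes that appear in this section: "^_" (and "^" followed by a space) for a space, and "^" followed by a single character for that character. Other escape forms that RMS may define are not handled, and the function names are hypothetical.

```python
from urllib.parse import quote


def rms_unescape(name):
    """Remove simple "^" escapes from an extended file name."""
    result = []
    i = 0
    while i < len(name):
        if name[i] == "^" and i + 1 < len(name):
            escaped = name[i + 1]
            # "^_" is an escaped space; any other character stands for itself.
            result.append(" " if escaped == "_" else escaped)
            i += 2
        else:
            result.append(name[i])
            i += 1
    return "".join(result)


def url_encoded(name):
    """Return the URL-encoded equivalent of an RMS-escaped file name."""
    return quote(rms_unescape(name))


print(url_encoded("This^_and^_that.txt"))
# prints: This%20and%20that.txt
```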

When presenting a file name in a viewable page the general rule is to also provide this URL-equivalent of the unescaped file name, with a small number of exceptions. The first is a directory listing where VMS format has been requested by including a version component in the request file specification. The second is similar, but uses the tree facility to display a directory tree. The third is in the navigation page of the UPDate menu. In all of these instances the canonical form of the extended file specification is presented (although any actual reference to the file is URL-encoded as described above).

2.4.3 - Document Cache

The Web server is most commonly set up to cache static documents (files). A cache is higher-speed, in-memory storage in the server itself. Cached documents are checked periodically for changes when they are requested. A change to a file is detected by comparing its modification date/time and file length. A common check period is one minute, though it can be set longer or checking can be disabled altogether. If a document has changed, the old one is discarded from the cache (called invalidation) and the new one is loaded into the cache while being transferred to the client.

After making changes to a document it is possible the server will continue to serve the old one for a short period. This can be overridden by using the browser's Reload facility. This directs the server to go and check the on-disk file regardless, invalidating it if necessary.
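The checking and invalidation cycle described above can be sketched in Python as follows. This is a simplified model, not the server's cache: the class name, the reload and now parameters, and the use of os.stat are all assumptions made for illustration.

```python
import os
import time


class DocumentCache:
    """Toy in-memory file cache, revalidated by modification time and length."""

    def __init__(self, check_period=60.0):
        self.check_period = check_period  # seconds between on-disk checks
        self._entries = {}

    def get(self, path, reload=False, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(path)
        if entry is not None:
            check_due = reload or now - entry["checked"] >= self.check_period
            if not check_due:
                return entry["data"]  # may be stale until the next check
            status = os.stat(path)
            if (status.st_mtime_ns, status.st_size) == entry["stamp"]:
                entry["checked"] = now
                return entry["data"]
            del self._entries[path]  # changed on disk: invalidate
        # Load (or reload) the file and cache it while returning it.
        status = os.stat(path)
        with open(path, "rb") as handle:
            data = handle.read()
        self._entries[path] = {
            "data": data,
            "stamp": (status.st_mtime_ns, status.st_size),
            "checked": now,
        }
        return data
```

In this model a request made within the check period returns the cached copy even if the file has changed, while a request with reload=True forces the on-disk check, mirroring the effect of the browser's Reload facility.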

