/*****************************************************************************/
/*
                                  Query.c

A CGI-compliant script to search plain-text and HTML files on ODS-2 and ODS-5
volumes.  Extended file specifications may be expressed using either
RMS-escaped ("^_") or URL-escaped ("%nn") forbidden characters.  If a version
delimiter (';', with or without version number) is present in the path
specification then this script displays and anchors RMS-escaped and VMS syntax
file names.  If none is present it supplies URL-encoded file names.

Query works in concert with EXTRACT.C.

Accepts query-based searches (e.g. "/web/*.*?find+this+phrase") or form-based
fields briefly discussed below.  By default searches the URL-supplied path, or
will use a path supplied with the "path=" query form field (via request
redirection to get the server to map the supplied path).  A VMS directory
ellipsis (i.e. "...") may be supplied in the path to result in a directory
tree search.

Hits in both plain-text files and HTML files provide a link to the extract
script.  With plain-text files the extract script extracts a section of the
file and presents it with some buttons to allow retrieving other sections or
the whole document.  When extracting HTML files it returns the entire document
but with each occurrence of the hit enclosed by an anchor that allows the
specific hit to be jumped to with relative document syntax.  The following
tags do not have any content included: , , , , .

Using a script-name prefixed path such as "/query/web/.../*.*" returns a simple
form for generating a search.

"Text" file extensions are predefined in the DEFAULT_TEXT_TYPES and
DEFAULT_HTML_TYPES macros.  To specify a custom list use /TEXT= and /HTML=, or
to add other extensions to be considered text or HTML use /ADDTEXT= and
/ADDHTML= (note this is a comma-separated list with no extension period).
File extensions may contain the asterisk wildcard character, representing zero
or more matching characters (e.g. "REPORT_*").

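The wildcarded file-type lists described above could be matched along the
following lines.  This is only a minimal sketch and not the script's own
SameFileType() code: the function names wild_match() and type_in_list(), the
64-character entry buffer and the case-insensitive comparison are all
assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: return 1 if 'name' matches 'pattern', where '*' in
// the pattern stands for zero or more characters (case-insensitive).
static int wild_match(const char *pattern, const char *name)
{
    const char *star = NULL;    // most recent '*' seen in the pattern
    const char *retry = NULL;   // where in 'name' to resume after backtracking

    while (*name) {
        if (*pattern == '*') {
            star = pattern++;
            retry = name;
        } else if (toupper((unsigned char)*pattern) ==
                   toupper((unsigned char)*name)) {
            pattern++;
            name++;
        } else if (star) {
            // let the last '*' absorb one more character and try again
            pattern = star + 1;
            name = ++retry;
        } else {
            return 0;
        }
    }
    while (*pattern == '*') pattern++;
    return *pattern == '\0';
}

// Hypothetical helper: return 1 if 'type' matches any entry of the
// comma-separated 'list' (entries carry no leading period).
static int type_in_list(const char *list, const char *type)
{
    char entry[64];             // assumed upper bound on one list entry

    while (*list) {
        size_t len = strcspn(list, ",");
        if (len > 0 && len < sizeof entry) {
            memcpy(entry, list, len);
            entry[len] = '\0';
            if (wild_match(entry, type)) return 1;
        }
        list += len;
        if (*list == ',') list++;
    }
    return 0;
}
```

With these helpers a list such as "TXT,REPORT_*,LIS" would accept a file type
of "REPORT_2005" but not "HTML".
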

PAGE LAYOUT
-----------
Page layout and colouration may be specified via the appropriate command-line
qualifiers (or corresponding logical/symbol name).  Defaults apply for any not
specified.  See the "Qualifiers" section below, and also the description of
the logical name or symbol "QUERY$PARAM".

An example of changing the page colour to white and the banner to red!

  /PBGCOLOR="#ffffff" /PHBGCOLOR="#ff0000"

Don't like explicitly setting a browser's colours?  A colour may be disabled by
setting it to empty.  The following example disables all colours.

  /PBGCOLOR/PBBGCOLOR/PHBGCOLOR/PHTEXT/PLINK/PTEXT/PVLINK

The script can format a page in either of two layouts.

  1. Tables are used to create a coloured header and button bar (DEFAULT).
     Default colours are white page with grey heading and button outlines.
  2. Textual header, horizontal rules and a textual button bar.
     No default colours.

Select other than the default using the following:

  /PLAYOUT=2

Local information may be included in the header.  For layout 1 this should be
integrated with the formatted header and to the right of the header
information.  Text, an image logo, just about anything could be included.
This is an example of providing a textual form of a local logo:

  /PHLOCAL=""

This is an example of providing a local graphical logo:

  /PHLOCAL=""

Such local information with layout 2 is included immediately before the header
information at the top of the page.

Button labels are customizable (potentially to a non-English language).  They
comprise a label, equate symbol and URL-style path suitable for creating a
link.  Multiple buttons are separated using the semicolon.  Note that any such
button customization must provide escaped HTML-forbidden characters in the
button label and URI-forbidden characters in the path!  The backslash
character, "\", escapes characters, including the button-delimiting "=" and
";".  There are defaults, see DEFAULT_BUTTONS.

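The "label=path;" button syntax with backslash escaping described above could
be split apart roughly as follows.  This is a sketch under stated assumptions,
not the script's actual button handling: the struct button type, the
copy_field() and parse_buttons() functions and the MAX_FIELD limit are all
hypothetical names introduced here.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FIELD 128           // assumed limit on one label or path

struct button {                 // hypothetical container for one button
    char label[MAX_FIELD];
    char path[MAX_FIELD];
};

// Copy one field from 'src' into 'dst', stopping at end of string or at an
// unescaped character found in 'stops'.  A backslash makes the following
// character literal.  Returns a pointer to where copying stopped.
static const char *copy_field(const char *src, const char *stops,
                              char *dst, size_t size)
{
    size_t n = 0;

    while (*src && !strchr(stops, *src)) {
        if (*src == '\\' && src[1]) src++;  // drop backslash, keep next char
        if (n + 1 < size) dst[n++] = *src;
        src++;
    }
    dst[n] = '\0';
    return src;
}

// Split 'spec' into at most 'max' buttons; returns the number stored.
static int parse_buttons(const char *spec, struct button *out, int max)
{
    int count = 0;

    while (*spec && count < max) {
        spec = copy_field(spec, "=;", out[count].label,
                          sizeof out[count].label);
        out[count].path[0] = '\0';
        if (*spec == '=')
            spec = copy_field(spec + 1, ";", out[count].path,
                              sizeof out[count].path);
        if (*spec == ';') spec++;
        if (out[count].label[0]) count++;   // ignore empty elements
    }
    return count;
}
```

An element without an "=" yields a button with an empty path; how the real
script treats such an element is not stated in this description.
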
Here is an example of changing the button labels:

  /BUTTON="About=/query/-/aboutquery.html"

Additional buttons may be created by adding "label=path;" elements to the
button string.  In this way an additional information page could be referenced
as follows:

  /BUTTON="About=/query/-/aboutquery.html;Other Information=/info/"

DIFFICULTY FITTING ALL THESE QUALIFIERS INTO A COMMAND LINE OR LOGICAL?
Use an existing, or create a new, DCL wrapper procedure for the script (same
name and directory) and build up a DCL symbol to provide the information.  Up
to 1024 characters can be included in this way.  For example:

  $ QUERY$PARAM = "/BUTTON=""About=/query/-/aboutquery.html"""
  $ QUERY$PARAM = QUERY$PARAM + "/PBGCOLOR/PLINK/PVLINK"
  $ QUERY$PARAM = QUERY$PARAM + "/PHLOCAL="""""
  $ RUN HT_EXE:QUERY


NOTE ON EXTENDED FILE SPECS
---------------------------
This script is ODS-5 volume compliant and will process extended file naming
(excluding names using Unicode characters).  Its design allows building on
VAX VMS (which excludes ODS-5 support) and on all versions of Alpha VMS.  When
building on VAX all extended ODS code is "#ifdef"ed out for efficiency reasons.
If built under an Alpha version that does not support extended naming (a build
allowed by the bogus NAML created by the ENAMEL.H header file) then it can only
process ODS-2 file names.  However, if built with VMS 7.2ff it will process
both, and the object module can still be linked and used on an earlier version
(although without ODS-5 capabilities of course).  This Alpha backward/forward
compatibility is provided at runtime by checking the version of VMS it is
currently executing under.  If the system is extended file specification
compliant then NAML structures are always used, otherwise NAMs.  Hence, a VMS
version that doesn't know about extended file specifications (and doesn't have
any ODS-5 volumes of course) never gets passed a NAML!


LOGICAL NAMES
-------------
QUERY$DBUG     turns on all "if (Debug)" statements
QUERY$PARAM    equivalent to (overrides) the command line
               parameters/qualifiers (define as a system-wide logical)


HTML FORM ELEMENTS
------------------
case=          case sensitive search (Y or N)
exact=         exact number of records (for extract utility, Y or N)
extract=       number of lines to pass to extract utility
hits=          show all hits or document only (D or A)
html=          comma-separated list of HTML file extensions
               (overrides the /HTML and /ADDHTML qualifiers)
plain=         if "yes" then treat HTML files as if plain-text
               (i.e. search markup tags, everything!)
path=          form supplied path (otherwise URL path)
search=        text to search for
text=          comma-separated list of text file extensions
               (overrides the /TEXT and /ADDTEXT qualifiers)
what=          title on search results page

These could be used as in the following example (note that as this is in a
C-language comment "@" has been substituted for "*", they're just wildcards!):
Case sensitive? N Y
Extract this number of lines around a "hit":


QUALIFIERS
----------
/ABOUT=        synonym for /HELP
/ADDHTML=      additional list of comma-separated HTML file types
/ADDTEXT=      additional list of comma-separated TEXT file types
/BUTTONS=      string containing button labels/paths
/CHARSET=      "Content-Type: text/html; charset=...", empty suppresses charset
/DBUG          turns on all "if (Debug)" statements
/EXTRACT=      path to extract script
/HELP=         URL for help on searching
/HTML=         complete list of comma-separated HTML file types
/[NO]ODS5      control extended file specification (basically for testing)
/TEXT=         complete list of comma-separated TEXT file types
/TIMEFMT=      strftime() format string for last-modified time
/PBACKGROUND=  background image path
/PBGCOLOR=     background colour
/PBBGCOLOR=    button background colour
/PBBORDER=     width of button border
/PHBGCOLOR=    heading background colour
/PHBORDER=     width of heading and button-bar border
/PHLOCAL=      local information to be included in header
/PHTEXT=       heading text colour
/PLAYOUT=      1 is coloured header & buttons, 2 is text & horizontal rules
/PLINK=        link colour
/PTEXT=        text colour
/PVLINK=       visited link colour


BUILD DETAILS
-------------
See BUILD_QUERY.COM procedure.


COPYRIGHT
---------
Copyright (C) 1996-2005 Mark G.Daniel
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it under the
conditions of the GNU GENERAL PUBLIC LICENSE, version 2.


VERSION HISTORY (update SOFTWAREVN as well!)
---------------
10-MAY-2005  MGD  v3.2.9, SWS 2.0 ignore query string components supplied as
                          command-line parameters differently to CSWS 1.2/3
23-DEC-2003  MGD  v3.2.8, minor conditional mods to support IA64
15-AUG-2003  MGD  v3.2.7, bugfix; move fragment *after* query string
23-JUN-2003  MGD  v3.2.6, record size increased to maximum (32767 bytes),
                          ignore RMS$_WLK from sys$open() in line with RMS$_PRV
12-APR-2003  MGD  v3.2.5, link colour changed to 0000cc
15-AUG-2002  MGD  v3.2.4, GetParameters() mod for direct CSWS 1.2 support
01-JUL-2001  MGD  v3.2.3, add 'SkipParameters' for direct OSU support
19-MAY-2001  MGD  v3.2.2, remove limitation in SameFileType() preventing
                          searching of multiple file versions
25-JAN-2001  MGD  v3.2.1, use to terminate processing
28-OCT-2000  MGD  v3.2.0, use CGILIB object module
02-MAR-2000  MGD  v3.1.2, bugfix;ed again:^( rework SameFileType()
28-FEB-2000  MGD  v3.1.1, bugfix; SameFileType() wildcard processing
15-FEB-2000  MGD  v3.1.0, allow wildcarded file types
18-JAN-2000  MGD  v3.0.0, support extended file specifications (ODS-5)
07-AUG-1999  MGD  v2.9.0, use more of the CGILIB functionality,
                          plain-text files described using file name
24-APR-1999  MGD  v2.8.0, use CGILIB.C,
                          standard CGI environment (e.g. Netscape FastTrack)
18-FEB-1999  MGD  v2.7.2, Search[Text/Html]File() handling of SS$_PRV
                          (files with protection violations now just ignored)
20-NOV-1998  MGD  v2.7.1, exclude certain content (e.g.

"- "WASD HyperText Services

...