/*****************************************************************************/
/*
                                 Extract.c

CGI-compliant script (working in concert with QUERY.C) to:

  1. extract a specified range of lines from a plain text file
  2. anchor hits of the keyword in an HTML file

Extended file specifications may be expressed using either RMS-escaped ("^_")
or URL-escaped ("%nn") forbidden characters.  If a version delimiter (';', with
or without version number) is present in the path specification then this
script displays and anchors RMS-escaped and VMS syntax file names.  If none is
present it supplies URL-encoded file names.

When extracting HTML files it returns the entire document but with each
occurrence of the hit enclosed by a '' anchor that allows the specific hit to
be jumped to with relative document syntax.  If an HTML file is "extracted"
without either CGI variable 'form_anchor' or 'form_plain' being non-empty a 302
redirection is generated direct to the document itself (on the assumption that
it is a self-relative link within an "extract"-anchored document, with the
consequent loss of any partial-document reference (i.e. #blah)).  The following
tags do not have any content included: , , , , .

"Text" file extensions are predefined in the DEFAULT_TEXT_TYPES and
DEFAULT_HTML_TYPES macros.  To specify a custom list use /TEXT= and /HTML= or
to add other extensions to be considered text or HTML use /ADDTEXT= and
/ADDHTML= (note this is a comma-separated list with no extension period).
File extensions may contain the asterisk wildcard character, representing zero
or more matching characters (e.g. "REPORT_*").


PAGE LAYOUT
-----------
Page layout and colouration may be specified via the appropriate command-line
qualifiers (or corresponding logical/symbol name).  Defaults apply for any not
specified.  See "Qualifiers" section below, and also about the logical name or
symbol "QUERY$PARAM".

An example of changing the page colour to white and the banner to red!

  /PBGCOLOR="#ffffff" /PHBGCOLOR="#ff0000"

Don't like explicitly setting a browser's colours?  A colour may be disabled by
setting it to empty.  The following example disables all colours.

  /PBGCOLOR/PBBGCOLOR/PHBGCOLOR/PHTEXT/PLINK/PTEXT/PVLINK

The script can format a page in either of two layouts.

  1. Tables are used to create a coloured header and button bar (DEFAULT).
     Default colours are white page with grey heading and button outlines.
  2. Textual header, horizontal rules and a textual button bar.
     No default colours.

Select other than the default using the following:

  /PLAYOUT=2

Local information may be included in the header.  For layout 1 this should be
integrated with the formatted header and to the right of the header
information.  Text, an image logo, just about anything could be included.
This is an example of providing a textual form of a local logo:

  /PHLOCAL=""

This is an example of providing a local graphical logo:

  /PHLOCAL=""

Such local information with layout 2 is included immediately before the header
information at the top of the page.

Button labels are customizable (potentially to non-English language).  They
comprise a label, equate symbol and URL-style path suitable for creating a
link.  Multiple buttons are separated using the semicolon.  Note that any such
button customization must provide escaped HTML-forbidden characters in the
button label and URI-forbidden characters in the path!  The backslash
character, "\", escapes characters, including the button-delimiting "=" and
";".  There are defaults, see DEFAULT_BUTTONS.

Here is an example of changing the button labels:

  /BUTTON="About=/extract/-/aboutextract.html"

Additional buttons may be created by adding "label=path;" elements to the
button string.  In this way an additional information page could be referenced
as follows:

  /BUTTON="About=/extract/-/aboutextract.html;Other Information=/info/"

DIFFICULTY FITTING ALL THESE QUALIFIERS INTO A COMMAND LINE OR LOGICAL?
Use an existing, or create a new, DCL wrapper procedure for the script (same
name and directory) and build up a DCL symbol to provide the information.  Up
to 1024 characters can be included in this way.  For example:

  $ EXTRACT$PARAM = "/BUTTON=""About=/extract/-/aboutextract.html"""
  $ EXTRACT$PARAM = EXTRACT$PARAM + "/PBGCOLOR/PLINK/PVLINK"
  $ EXTRACT$PARAM = EXTRACT$PARAM + "/PHLOCAL="""""
  $ RUN HT_EXE:EXTRACT


CGI FORM ELEMENTS
-----------------
form_anchor=     introduce "hit" anchors into an HTML file
form_case=       was a case sensitive search (Y or N)
form_end=        record (line) number to end extract
form_exact=      exact number of records (Y or N)
form_extract=    number of lines to pass to extract utility
form_highlight=  the string to be highlighted
form_html=       comma-separated list of HTML file extensions
                 (overrides the /HTML and /ADDHTML qualifiers)
form_plain=      treat an HTML file as plain text
form_start=      record (line) number to begin extract
form_text=       comma-separated list of text file extensions
                 (overrides the /TEXT and /ADDTEXT qualifiers)

Generally these form elements will be generated by QUERY.C but there is no
reason why they shouldn't come from somewhere else.


LOGICAL NAMES
-------------
EXTRACT$DBUG     turns on all "if (Debug)" statements
EXTRACT$PARAM    equivalent to (overrides) the command line
                 parameters/qualifiers (define as a system-wide logical)


QUALIFIERS
----------
/ADDHTML=       additional list of comma separated HTML file types
/ADDTEXT=       additional list of comma separated TEXT file types
/BUTTONS=       string containing button labels/paths
/CHARSET=       "Content-Type: text/html; charset=...", empty suppresses charset
/DBUG           turns on all "if (Debug)" statements
/[NO]ODS5       control extended file specification (basically for testing)
/PBACKGROUND=   background image path
/PBGCOLOR=      background colour
/PBBGCOLOR=     button background color
/PBBORDER=      width of button border
/PHBGCOLOR=     heading background color
/PHBORDER=      width of heading and button-bar border
/PHLOCAL=       local information to be included in header
/PHTEXT=        heading text colour
/PLAYOUT=       1 is coloured header & buttons, 2 is text & horizontal rules
/PLINK=         link colour
/PTEXT=         text colour
/PVLINK=        visited link colour
/TEXT=          complete list of comma separated TEXT file types


OSU ENVIRONMENT
---------------
Script responses are returned in OSU "raw" mode; the script taking care of the
full response header and correctly carriage-controlled data stream, text or
binary!!  Uses the CGILIB.C to engage in the dialog phase generating, storing
and then making available the equivalent of CGI variables.


"VANILLA" CGI ENVIRONMENT
-------------------------
Primarily for the likes of Netscape FastTrack.  This environment can
accommodate CGI variables that are not prefixed with "WWW_" and do not supply
"KEY_xxxxx" or "FORM_xxxxx" (which must be derived from "QUERY_STRING").  Full
HTTP stream (non-parsed header) is assumed as not supported so all output
occurs with a CGI-compliant header line (e.g. "Status: 200 Success") and
record-oriented output.


BUILD DETAILS
-------------
See BUILD_EXTRACT.COM procedure.


COPYRIGHT
---------
Copyright (C) 1996-2011 Mark G.Daniel
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under the conditions of the GNU GENERAL PUBLIC LICENSE, version 2.


VERSION HISTORY (update SOFTWAREVN as well!)
---------------
13-NOV-2011  MGD  v3.3.3, bugfix; tag counting
10-MAY-2005  MGD  v3.3.2, SWS 2.0 ignore query string components supplied as
                  command-line parameters differently to CSWS 1.2/3
23-DEC-2003  MGD  v3.3.1, minor conditional mods to support IA64
23-JUN-2003  MGD  v3.3.0, record size increased to maximum (32767 bytes)
12-APR-2003  MGD  v3.2.4, link colour changed to 0000cc
15-AUG-2002  MGD  v3.2.3, GetParameters() mod for direct CSWS 1.2 support
01-JUL-2001  MGD  v3.2.2, add 'SkipParameters' for direct OSU support
25-JAN-2001  MGD  v3.2.1, use to terminate processing
28-OCT-2000  MGD  v3.2.0, use CGILIB object module
02-MAR-2000  MGD  v3.1.2, bugfix;ed again:^( rework SameFileType()
28-FEB-2000  MGD  v3.1.1, bugfix; SameFileType() wildcard processing
15-FEB-2000  MGD  v3.1.0, allow wildcarded file types
18-JAN-2000  MGD  v3.0.0, support extended file specifications (ODS-5)
07-AUG-1999  MGD  v2.7.0, use more of the CGILIB functionality
24-APR-1999  MGD  v2.6.0, use CGILIB.C,
                  standard CGI environment (Netscape FastTrack)
20-NOV-1998  MGD  v2.5.1, exclude certain content (e.g.
", 9)) { InsideScript = false; memset (tptr, MASK_TAG_CHAR, 9); tptr += 9; rptr += 9; continue; } } if (InsideServer) { if (strsame (rptr, "", 9)) { InsideServer = false; memset (tptr, MASK_TAG_CHAR, 9); tptr += 9; rptr += 9; continue; } } if (InsideStyle) { if (strsame (rptr, "", 7)) { InsideStyle = false; memset (tptr, MASK_TAG_CHAR, 7); tptr += 7; rptr += 7; continue; } } if (InsideTitle) { if (strsame (rptr, "", 7)) { InsideTitle = false; memset (tptr, MASK_TAG_CHAR, 7); tptr += 7; rptr += 7; continue; } } } if (*rptr == '<') { if (*((ULONGPTR)rptr) == '\n", SoftwareID); return (status); } /*****************************************************************************/ /* This function accepts a comma-separated list of (possibly wildcarded) file types (extensions, e.g. "TXT,TEXT,COM,C,PAS,FOR,RPT*") and a VMS file type (e.g. ".TXT;", ".TXT", "TXT"). It returns true if the file type is in the list, false if not. */ boolean SameFileType ( char *TypeList, char *FileType ) { char ch; char *cptr, *sptr; /*********/ /* begin */ /*********/ if (Debug) fprintf (stdout, "SameFileType() |%s|%s|\n", FileType, TypeList); cptr = TypeList; while (*cptr) { for (sptr = cptr; *sptr && *sptr != ','; sptr++); ch = *sptr; *sptr = '\0'; if (Debug) fprintf (stdout, "|%s|%s|\n", FileType, cptr); if ((SearchTextString (FileType, cptr, false, false, NULL)) != NULL) { *sptr = ch; return (true); } if (*sptr = ch) sptr++; cptr = sptr; } return (false); } /*****************************************************************************/ /* String search allowing wildcard "*" (matching any multiple characters) and "%" (matching any single character). Returns NULL if not found or a pointer to start of matched string. Setting 'ImpliedWildcards' means the 'SearchFor' string is processed as if enclosed by '*' wildcard characters. 
*/

char* SearchTextString
(
char *SearchIn,
char *SearchFor,
boolean CaseSensitive,
boolean ImpliedWildcards,
int *MatchedLengthPtr
)
{
   char  *cptr, *sptr, *inptr,
         *RestartCptr,
         *RestartInptr,
         *MatchPtr;

   /*********/
   /* begin */
   /*********/

   if (Debug)
      fprintf (stdout, "SearchTextString() |%s|%s|\n", SearchIn, SearchFor);

   if (MatchedLengthPtr != NULL) *MatchedLengthPtr = 0;
   if (!*(cptr = SearchFor)) return (NULL);
   inptr = MatchPtr = SearchIn;

   if (ImpliedWildcards)
   {
      /* skip leading text up to first matching character (if any!) */
      if (*cptr != '*' && *cptr != '%')
      {
         if (CaseSensitive)
            while (*inptr && *inptr != *cptr) inptr++;
         else
            while (*inptr && toupper(*inptr) != toupper(*cptr)) inptr++;
         if (Debug && !*inptr) fprintf (stdout, "1. NOT matched!\n");
         if (!*inptr) return (NULL);
         cptr++;
         MatchPtr = inptr++;
      }
   }

   for (;;)
   {
      if (CaseSensitive)
      {
         while (*cptr && *inptr && *cptr == *inptr)
         {
            cptr++;
            inptr++;
         }
      }
      else
      {
         while (*cptr && *inptr && toupper(*cptr) == toupper(*inptr))
         {
            cptr++;
            inptr++;
         }
      }

      if (ImpliedWildcards)
      {
         if (!*cptr)
         {
            if (Debug) fprintf (stdout, "1. matched!\n");
            if (MatchedLengthPtr != NULL)
               *MatchedLengthPtr = inptr - MatchPtr;
            return (MatchPtr);
         }
      }
      else
      {
         if (!*cptr && !*inptr)
         {
            if (Debug) fprintf (stdout, "2. matched!\n");
            if (MatchedLengthPtr != NULL)
               *MatchedLengthPtr = inptr - MatchPtr;
            return (MatchPtr);
         }
         if (*cptr != '*' && *cptr != '%')
         {
            if (Debug && !*inptr) fprintf (stdout, "3. NOT matched!\n");
            return (NULL);
         }
      }

      if (*cptr != '*' && *cptr != '%')
      {
         if (!*inptr)
         {
            if (Debug) fprintf (stdout, "4. NOT matched!\n");
            return (NULL);
         }
         cptr = SearchFor;
         MatchPtr = ++inptr;
         continue;
      }

      if (*cptr == '%')
      {
         /* single char wildcard processing */
         /* nothing left to match the wildcard against, so no match */
         if (!*inptr) return (NULL);
         cptr++;
         inptr++;
         continue;
      }

      /* asterisk wildcard matching */
      while (*cptr == '*') cptr++;

      /* an asterisk wildcard at end matches all following */
      if (!*cptr)
      {
         if (Debug) fprintf (stdout, "5. matched!\n");
         while (*inptr) inptr++;
         if (MatchedLengthPtr != NULL)
            *MatchedLengthPtr = inptr - MatchPtr;
         return (MatchPtr);
      }

      /* note the current position in the string (first after the wildcard) */
      RestartCptr = cptr;
      for (;;)
      {
         /* find first char in SearchIn matching char after wildcard */
         if (CaseSensitive)
            while (*inptr && *cptr != *inptr) inptr++;
         else
            while (*inptr && toupper(*cptr) != toupper(*inptr)) inptr++;
         /* if did not find matching char in SearchIn being searched */
         if (Debug && !*inptr) fprintf (stdout, "6. NOT matched!\n");
         if (!*inptr) return (NULL);
         /* note the current position in SearchIn being searched */
         RestartInptr = inptr;
         /* try to match the remainder of the string and SearchIn */
         if (CaseSensitive)
         {
            while (*cptr && *inptr && *cptr == *inptr)
            {
               cptr++;
               inptr++;
            }
         }
         else
         {
            while (*cptr && *inptr && toupper(*cptr) == toupper(*inptr))
            {
               cptr++;
               inptr++;
            }
         }
         /* if reached the end of both string and SearchIn - match! */
         if (ImpliedWildcards)
         {
            if (!*cptr)
            {
               if (Debug) fprintf (stdout, "7. matched!\n");
               if (MatchedLengthPtr != NULL)
                  *MatchedLengthPtr = inptr - MatchPtr;
               return (MatchPtr);
            }
         }
         else
         {
            if (!*cptr && !*inptr)
            {
               if (Debug) fprintf (stdout, "8. matched!\n");
               if (MatchedLengthPtr != NULL)
                  *MatchedLengthPtr = inptr - MatchPtr;
               return (MatchPtr);
            }
         }
         /* break to the external loop if we encounter another wildcard */
         if (*cptr == '*' || *cptr == '%') break;
         /* lets have another go */
         cptr = RestartCptr;
         /* starting the character following the previous attempt */
         inptr = MatchPtr = RestartInptr + 1;
      }
   }
}

/****************************************************************************/
/*
Return an integer reflecting the major and minor version of VMS (e.g. 60, 61,
62, 70, 71, 72, etc.)
*/

#ifdef ODS_EXTENDED

int GetVmsVersion ()

{
   static char  SyiVersion [16];

   static struct {
      short int  buf_len;
      short int  item;
      void  *buf_addr;
      unsigned short  *ret_len;
   }
   SyiItems [] =
   {
      { 8, SYI$_VERSION, &SyiVersion, 0 },
      { 0,0,0,0 }
   };

   int  status,
        version;

   /*********/
   /* begin */
   /*********/

   if (Debug) fprintf (stdout, "GetVmsVersion()\n");

   if (VMSnok (status = sys$getsyiw (0, 0, 0, &SyiItems, 0, 0, 0)))
      exit (status);
   SyiVersion[8] = '\0';
   version = ((SyiVersion[1]-48) * 10) + (SyiVersion[3]-48);
   if (Debug) fprintf (stdout, "|%s| %d\n", SyiVersion, version);
   return (version);
}

#endif /* ODS_EXTENDED */

/****************************************************************************/
/*
Does a case-insensitive, character-by-character string compare and returns true
if two strings are the same, or false if not.  If a maximum number of
characters are specified only those will be compared, if the entire strings
should be compared then specify the number of characters as 0.
*/

boolean strsame
(
char *sptr1,
char *sptr2,
int count
)
{
   while (*sptr1 && *sptr2)
   {
      if (toupper (*sptr1++) != toupper (*sptr2++)) return (false);
      if (count)
         if (!--count) return (true);
   }
   if (*sptr1 || *sptr2)
      return (false);
   else
      return (true);
}

/****************************************************************************/
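/*
The comparison contract described for strsame() above (case-insensitive, with
an optional character limit where 0 means "compare the entire strings") can be
sketched in isolation as follows.  This is a minimal illustration only, not
part of the script: strsame_sketch() and begins_with_tag() are hypothetical
names, the tag strings used when calling them are illustrative, and the casts
to unsigned char are an added assumption to keep toupper() well-defined for
8-bit characters.
*/

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in following the contract documented for strsame(). */
static bool strsame_sketch (const char *s1, const char *s2, int count)
{
   while (*s1 && *s2)
   {
      /* cast so toupper() never sees a negative value */
      if (toupper ((unsigned char)*s1++) != toupper ((unsigned char)*s2++))
         return false;
      /* a non-zero count limits the comparison to that many characters */
      if (count && !--count) return true;
   }
   /* one string ran out first: the same only if both ended together */
   return !*s1 && !*s2;
}

/* Hypothetical helper: does 'text' begin with 'tag', ignoring case? */
static bool begins_with_tag (const char *text, const char *tag)
{
   size_t  length = strlen (tag);

   /* a count of 0 would mean "whole string", so handle the empty tag here */
   if (length == 0) return true;
   return strsame_sketch (text, tag, (int)length);
}
```

/*
For instance, begins_with_tag (ptr, "</STYLE>") would be true when 'ptr'
points at "</style>" in any mix of case, which is the kind of test the
tag-masking code makes when it looks for the end of excluded content.
*/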
"- "WASD HyperText Services...