WASD VMS Web Services - Features and Facilities

10 - Server Performance

10.1 - Simple File Request Turn-Around
10.2 - Scripting
10.3 - SSL
10.4 - Suggestions

The server has a single-process, multi-threaded, asynchronous I/O design. On a single-processor system this is the most efficient approach. On a multi-processor system it is limited by the single process context (with scripts executing within their own context). For I/O constrained processing (the most common in general Web environments) the AST-driven approach is quite efficient.

The v10 test-bench system was an HP rx2600 (Itanium 1.40GHz/1.5MB) with 2 CPUs and 8191MB, running VMS V8.3-1H1 and HP TCP/IP Services Version V5.6 - ECO 2.

Many thanks to Kednos (http://www.kednos.com) for the use of the system.

The performance data was collected using the "WASDbench" utility (12.13 - WASDbench :^)). Previous performance measurements had been made using the ApacheBench utility (12.5 - ApacheBench), but when experimenting with both it was observed that WASDbench (perhaps because of its asynchronous I/O) provided generally greater throughput and less variation on this higher performance Itanium platform. DCL procedures with sets of WASDbench calls are used to benchmark requests. These procedures and the generated output from benchmark runs (collected via $@procedure/OUTPUT=filename) are available in the WASD_ROOT:[EXERCISE] directory.

These results are indicative only!

On a multi-user system too many things vary slightly all the time.

Every endeavour has been made to ensure the comparison is as equitable as possible. Each server executes at the same process priority, with access logging and host name lookup disabled, and runs on the same machine in the same relatively quiescent environment. Test runs were interleaved between the servers to distribute any environment variations. Those runs that are very high throughput use a larger number of requests to improve sample period validity. All servers were configured pretty much "out-of-the-box", with minimal changes (generally just enough to get the test environment going). The server and test-bench utility were located on the same system, eliminating actual data on the wire. Multiple data collections have yielded essentially the same relative results.

For the test-bench WASD v10.0 is present on port 7080.

OSU Comparison

The OSU comparison used the v3-11 release suitable for Itanium. OSU is executing in kernel-threads mode ("M"). OSU is present on port 7777.

Apache Comparison

The Apache comparison used the latest CSWS (CSWS-V0201, based on v2.0.52), Perl (CSWS_PERL-V0201) and PHP (CSWS_PHP-V0210) kits, and any required updates/ECOs, available at the time of collection. Apache is present on port 8888.

10.1 - Simple File Request Turn-Around

A series of tests was run using batches of accesses. The first test returned an empty file, measuring response and file access time without any actual transfer. The second requested a file of 64K characters, testing performance with a more realistic load. All were done using one and ten concurrent requests. Note that the Apache measurement is "out-of-the-box" - the author could find no hint of a file cache, let alone how to enable/disable one. Each request required a complete TCP connection and disposal.
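The measurement described above can be illustrated with a minimal Python sketch. This is not WASDbench and does not reproduce its options; the function names are hypothetical and the approach (a thread pool issuing requests, each on a fresh connection) is only an assumption about how such a requests/second figure might be obtained.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_once(url):
    """Fetch a URL over a fresh connection and return the body length."""
    # urllib opens a new connection per call; the explicit header asks the
    # server to close it afterwards, so each request pays for a complete
    # TCP connection and disposal.
    request = urllib.request.Request(url, headers={"Connection": "close"})
    with urllib.request.urlopen(request) as response:
        return len(response.read())


def requests_per_second(fetch, total_requests, concurrency):
    """Call fetch() total_requests times using `concurrency` worker threads."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() forces every request to complete before the clock stops.
        results = list(pool.map(lambda _: fetch(), range(total_requests)))
    elapsed = time.perf_counter() - started
    return len(results) / elapsed
```

As a usage example, requests_per_second(lambda: fetch_once(url), 1000, 10) would correspond to a concurrency 10 run, where url is whatever test file the server under test is serving (for the test-bench, something on port 7080, 7777 or 8888).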

Cache Disabled

Concurrency 1 - Requests/Second
Response   WASD    OSU   Apache
0K          137     90       32
64K         126     63       33

Concurrency 10 - Requests/Second
Response   WASD    OSU   Apache
0K          112     96       28
64K          98     75       29

Cache Enabled

Concurrency 1 - Requests/Second
Response   WASD    OSU   Apache
0K         1606   1252       27
64K         862    562       30

Concurrency 10 - Requests/Second
Response   WASD    OSU   Apache
0K         1730   1580       29
64K         982    672       28

Result files:

WASD_ROOT:[EXERCISE]PERF_FILES_NOCACHE_WB_V10.TXT
WASD_ROOT:[EXERCISE]PERF_FILES_WB_V10.TXT

The difference between cached and non-cached results with the zero file size (no actual data transfer involved) gives some indication of the raw difference in response latency, some 5x improvement. It also indicates the relative efficiencies of file-system access. This is a fairly basic analysis, but does give an appreciation of the utility and efficiencies of having an in-server cache.

File Transfer Rate

Requests for a large binary file indicate a potential transfer rate of many tens of Mbytes per second. On the test-bench this data does not get onto the wire of course but it does serve to demonstrate that server architecture should not be the limiting factor in file throughput.

Transfer Rate - MBytes/Second
Response              Concurrent   WASD   OSU   Apache
13MB (26134 blocks)            1    113    71       85
13MB (26134 blocks)           10    115    58       93

Result file:

WASD_ROOT:[EXERCISE]PERF_XFER_WB_V10.TXT
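As a worked example of how a transfer rate figure relates to the file size, the following Python sketch converts the 26134 block file into bytes and computes an aggregate rate. It assumes 512-byte VMS disk blocks and 1 MByte = 1,000,000 bytes (the benchmark's own definition is not stated here), and the request count and elapsed time in the last line are hypothetical, not taken from the result file.

```python
VMS_BLOCK_BYTES = 512  # size of one VMS disk block


def file_bytes(blocks):
    """Upper bound on file size in bytes for the given number of blocks."""
    return blocks * VMS_BLOCK_BYTES


def transfer_rate_mbytes(blocks, requests, elapsed_seconds):
    """Aggregate rate in MBytes/second, assuming 1 MByte = 1,000,000 bytes."""
    return file_bytes(blocks) * requests / elapsed_seconds / 1_000_000


print(file_bytes(26134))  # 13380608 bytes, i.e. the "13MB" test file
# Hypothetical run: 100 requests completed in 12 seconds.
print(transfer_rate_mbytes(26134, 100, 12.0))
```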

File Record Format

The WASD server can handle STREAM, STREAM_LF, STREAM_CR, FIXED and UNDEFINED record formats very much more efficiently than VARIABLE or VFC files. With STREAM, FIXED and UNDEFINED files the assumption is that HTTP carriage-control is within the file itself (i.e. at least the newline (LF), which is all that is required by browsers), and does not require additional processing. With VARIABLE record files the carriage-control is implied and therefore each record requires additional processing by the server to supply it. Although the HTTPd improves efficiency for variable record files by buffering multiple records before writing them collectively to the network, stream and binary files are read by Virtual Block and written to the network immediately, making the transfer of these very efficient indeed!
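The difference between the two cases can be sketched in Python. This is an illustration of the idea only, not the server's implementation; the function names and the default buffer and block sizes are assumptions. Variable-length records need a newline supplied for each record and are accumulated before a write, whereas stream data is passed through block by block untouched.

```python
def variable_records_to_stream(records, buffer_size=4096):
    """Yield buffered chunks from records that carry no line terminator.

    Each record has an explicit LF appended (the implied carriage-control)
    and records are accumulated so that several go out in a single write.
    """
    buffer = bytearray()
    for record in records:
        buffer += record + b"\n"  # per-record processing
        if len(buffer) >= buffer_size:
            yield bytes(buffer)
            buffer.clear()
    if buffer:
        yield bytes(buffer)  # flush whatever remains


def stream_blocks(data, block_size=512):
    """Yield raw blocks unchanged; carriage-control is already in the data."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]
```

The first function touches every record, the second never inspects the content at all, which is the source of the efficiency difference described above.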

CPU Consumed

Just one other indicative metric: the CPU time consumed during the file request runs. The value for Apache was not measured as it would be distributed over an indeterminate number of child processes.

CPU Time Consumed (Seconds)
Cache      WASD     OSU
Disabled   4.36   12.75
Enabled    1.51    3.32

Result files (towards end of each):

WASD_ROOT:[EXERCISE]PERF_FILES_NOCACHE_WB_V10.TXT
WASD_ROOT:[EXERCISE]PERF_FILES_WB_V10.TXT

10.2 - Scripting

A simple performance evaluation shows the relative merits of the four WASD scripting environments available, plus a comparison with OSU and Apache. The tests used WASD_ROOT:[SRC.CGIPLUS]CGIPLUSTEST.C, which executes in both standard CGI and CGIplus environments, and an ISAPI example DLL, WASD_ROOT:[SRC.CGIPLUS]ISAPIEXAMPLE.C, which provides equivalent output. A series of accesses were made. The first test returned only the HTTP header, evaluating raw request turn-around time. The second test requested a body of 64K characters, again testing performance with a more realistic load.

The CGIPLUSTEST.C with the v10 package has been reworked (primarily by using CGILIB) to provide a more equitable comparison by using CGI with WASD and Apache, and the native dialog phase (i.e. non-CGI) with OSU.

DECnet-based WASD scripting was tested using essentially the same environment as detached process based CGI, assessing the performance of the same script being executed using DECnet to manage the processes. The OSU-emulation provided by WASD was also (somewhat obviously) provided using DECnet.

Concurrency 1 - Requests/Second
Response   CGI   CGIplus   ISAPI   DECnet   WASD-OSU   OSU   Apache   Apache-OSU
0KB         47       714     557       32         46    48      3.9          4.0
64KB        58       418     263       31         39    38      3.0          1.7

Concurrency 10 - Requests/Second
Response   CGI   CGIplus   ISAPI   DECnet   WASD-OSU   OSU   Apache   Apache-OSU
0KB         62       979     785       47         60    55      5.5          4.2
64KB        60       377     228       42         61    53      4.4          4.7

Result file:

WASD_ROOT:[EXERCISE]PERF_SCRIPTS_WB_V10.TXT

Although these results are indicative only, they do show the persistent environment of CGIplus and ISAPI to have a potential for improvement over standard CGI with factors of 5x to 10x - a not inconsiderable improvement. Of course this test generates the output stream very simply and efficiently and so excludes any actual processing time that may be required by a "real" application. If the script or application has a large activation time the reduction in response latency could be even more significant (e.g. with scripting engines such as Perl, PHP and Python, and RDMS access languages); see Persistent Scripting Observations immediately below.
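The effect of activation time on the benefit of persistence can be shown with a simple cost model, sketched here in Python. The model and its names are assumptions made for illustration: each standard CGI request pays an activation cost plus a processing cost, while a persistent script pays only the processing cost once it is running. The timings in the usage note are hypothetical, not measurements.

```python
def model_requests_per_second(activation_seconds, processing_seconds, persistent):
    """Single-stream throughput for one script under the simple cost model."""
    per_request = processing_seconds
    if not persistent:
        per_request += activation_seconds  # paid again on every CGI request
    return 1.0 / per_request


def speedup(activation_seconds, processing_seconds):
    """Ratio of persistent throughput to standard CGI throughput."""
    cgi = model_requests_per_second(activation_seconds, processing_seconds, False)
    kept = model_requests_per_second(activation_seconds, processing_seconds, True)
    return kept / cgi
```

Under this model the speedup is (activation + processing) / processing, so a hypothetical script with 18 ms activation and 2 ms processing gains about 10x, while one with 180 ms activation (an interpreter start-up, say) and the same processing time gains about 91x.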

Persistent Scripting Observations

CGI scripting is notoriously slow (as illustrated above), hence the effort expended by designers in creating persistent scripting environments - those where the scripting engine (and perhaps other state) is maintained between requests. Both WASD and Apache implement these as integrated modules, the former as CGIplus/RTE and the latter as loadable modules.

The following comparison uses two of the most common scripting environments and engines shared between WASD and Apache, Perl and PHP. The engines used in both server environments were identical. No comparison is made with OSU (in part due to the lack of obvious integration of such environments with OSU).

A simple script for each engine is used as a common test-bench for the two servers.

<!-- face2face.php -->
<?php
echo "<B>Hello!</B>"
?>

# face2face.pl
print "Content-Type:  text/html\n\n
<B>Hello!</B>
";

These are designed to measure the script environment and its activation latencies, rather than the time required to process script content (which should be consistent considering they are the same engines). In addition, the standard php_info.php is used to demonstrate the behaviour of a script that actually performs some processing.

Persistent Scripting - Requests/Second
Script          Concurrent   WASD   Apache
face2face.pl             1    169      6.9
face2face.pl            10    229      9.5
face2face.php            1     82       16
face2face.php           10    203       21
php_info.php             1     33       18
php_info.php            10    135       18

Result file:

WASD_ROOT:[EXERCISE]PERF_PERSIST_WB_V10.TXT

These results demonstrate the efficiency and scalability of the WASD CGIplus/RTE technology used to implement its persistent scripting environments. Most site-specific scripts can also be built using the libraries, code fragments, and example scripts provided with the WASD package, and obtain similar efficiencies and low latencies. See the "WASD Web Services - Scripting" document.

10.3 - SSL

At this time there are no definitive measurements of SSL performance (4 - Secure Sockets Layer). One might expect that the CPU-intensive cryptography employed in SSL requests would significantly lower performance, particularly where concurrent requests are in progress. In practice SSL seems to provide more-than-acceptable responsiveness.

10.4 - Suggestions

Here are some suggestions for improving the performance of the server, listed in approximate order of significance. Many are defaults. Note that these will have proportionally less impact on an otherwise heavily loaded system.

  1. Disable host name resolution (configuration parameter [DNSLookup]). DNS latency can slow request processing significantly! Most log analysis tools can convert literal addresses so DNS resolution is often an unnecessary burden.
  2. Later versions of TCP/IP Services for OpenVMS seem to have large default values for socket send and receive buffers. MultiNet and TCPware are reported to improve transfer of large responses by increasing low default values for send buffer size. The WASD global configuration directives [SocketSizeRcvBuf] and [SocketSizeSndBuf] allow default values to be adjusted. WATCH can be used to report network connection buffer values.
  3. Enable file caching (configuration parameter [Cache]).
  4. Ensure served files are not VARIABLE record format (see above). Enable STREAM-LF conversion using a value such as 250 (configuration parameter [StreamLF], and SET against required paths using mapping rules).
  5. Use persistent DCL/scripting processes (configuration parameter [ZombieLifeTime]).
  6. Ensure script processes are given every possible chance to persist (configuration parameter [DclBitBucketTimeout]).
  7. Use the persistent scripting capabilities of CGIplus or ISAPI whenever possible.
  8. Ensure the server account's WSQUO and WSEXTENT quotas are adequate. A constantly paging server is a slow server!
  9. If the server is intended to provide significant numbers of larger files (e.g. multimedia) then setting [BufferSizeNetFile] can improve data rates. Experiments should determine the maximum value (30000 - 60000) that provides the best server data rate.
  10. Tune the network and DCL output buffer size to the Maximum Transfer Unit (MTU) of the server's network interface. With Digital TCP/IP Services (a.k.a. UCX) the MTU can be displayed as follows.
    TCPIP> SHOW INTERFACE
                                                               Packets
    Interface   IP_Addr         Network mask          Receive          Send     MTU
    
     SE0        203.127.158.3   255.255.255.0          376960        704345    1500
     LO0        127.0.0.1       255.0.0.0                 306           306       0
    

    In this example the MTU of the ethernet interface is 1500 (bytes). Set the [BufferSizeNetWrite] configuration directive to be some multiple of this; for an MTU of 1500, say 3000, 4500 or 6000. Also set the [BufferSizeDclOutput] to the same value. Rationale: always use completely filled network packets when transmitting data.

    The [BufferSizeNetMTU] directive when set to the MTU will automatically optimise buffer sizes using this approach.

  11. Disable logging (configuration parameter [Logging]).
  12. Set the HTTP server process priority higher, say to 6 (use startup qualifier /PRIORITY=). Do this after due consideration. It will only improve response time if the system is also used for other, lower priority purposes. It will not help if Web-serving is the sole activity of the system.
  13. Use a pre-defined log format (e.g. "common", configuration parameter [LogFormat]). User-specified formats require more processing for each entry.
  14. Disable request history (configuration parameter [RequestHistory]).
  15. Disable activity statistics (configuration parameter [ActivityDays]).
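The buffer sizing arithmetic in suggestion 10 can be sketched in Python. The function name and the range limits are hypothetical; the sketch simply lists the whole multiples of an MTU that fall within a chosen range, from which a value for [BufferSizeNetWrite] and [BufferSizeDclOutput] might be picked.

```python
def buffer_size_candidates(mtu, minimum, maximum):
    """Return every whole multiple of the MTU within [minimum, maximum]."""
    first = -(-minimum // mtu)  # ceiling division
    last = maximum // mtu
    return [mtu * n for n in range(max(first, 1), last + 1)]


print(buffer_size_candidates(1500, 3000, 6000))  # [3000, 4500, 6000]
```

For the interface shown in the example (MTU 1500) this reproduces the values suggested in the text.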

