This module provides an HTTP/1.1 caching proxy server.
Status: Extension
Source File: mod_proxy.c
Module Identifier: proxy_module
Compatibility: Available in Apache 1.1 and later.
The module implements proxying capability for FTP, CONNECT (for SSL), HTTP/0.9, HTTP/1.0, and (as of Apache 1.3.23) HTTP/1.1. The module can be configured to connect to other proxy modules for these and other protocols.
This module was experimental in Apache 1.1.x. As of Apache 1.2, mod_proxy stability is greatly improved.
Warning: Do not enable proxying with ProxyRequests until you have secured your server. Open proxy servers are dangerous both to your network and to the Internet at large.
Apache can be configured in both a forward and reverse proxy mode.
An ordinary forward proxy is an intermediate server that sits between the client and the origin server. In order to get content from the origin server, the client sends a request to the proxy naming the origin server as the target and the proxy then requests the content from the origin server and returns it to the client. The client must be specially configured to use the forward proxy to access other sites.
A typical usage of a forward proxy is to provide Internet access to internal clients that are otherwise restricted by a firewall. The forward proxy can also use caching to reduce network usage.
The forward proxy is activated using the ProxyRequests directive. Because forward proxies allow clients to access arbitrary sites through your server and to hide their true origin, it is essential that you secure your server so that only authorized clients can access the proxy before activating a forward proxy.
A reverse proxy, by contrast, appears to the client just like an ordinary web server. No special configuration on the client is necessary. The client makes ordinary requests for content in the name-space of the reverse proxy. The reverse proxy then decides where to send those requests, and returns the content as if it was itself the origin.
A typical usage of a reverse proxy is to provide Internet users access to a server that is behind a firewall. Reverse proxies can also be used to balance load among several back-end servers, or to provide caching for a slower back-end server. In addition, reverse proxies can be used simply to bring several servers into the same URL space.
A reverse proxy is activated using the ProxyPass directive or the [P] flag to the RewriteRule directive. It is not necessary to turn ProxyRequests on in order to configure a reverse proxy.
The examples below are only a very basic idea to help you get started. Please read the documentation on the individual directives.
ProxyRequests On
ProxyVia On
<Directory proxy:*>
Order deny,allow
Deny from all
Allow from internal.example.com
</Directory>
CacheRoot "/usr/local/apache/proxy"
CacheSize 5
CacheGcInterval 4
CacheMaxExpire 24
CacheLastModifiedFactor 0.1
CacheDefaultExpire 1
NoCache a-domain.com another-domain.edu joes.garage-sale.com
ProxyRequests Off
ProxyPass /foo http://foo.example.com/bar
ProxyPassReverse /foo http://foo.example.com/bar
<Directory proxy:*>
Order Deny,Allow
Deny from all
Allow from yournetwork.example.com
</Directory>
A <Files> block will also work, and is the only method known to work for all possible URLs in Apache versions earlier than 1.2b10.
For more information, see mod_access.
Strictly limiting access is essential if you are using a forward proxy (using the ProxyRequests directive). Otherwise, your server can be used by any client to access arbitrary hosts while hiding his or her true identity. This is dangerous both for your network and for the Internet at large. When using a reverse proxy (using the ProxyPass directive with ProxyRequests Off), access control is less critical because clients can only contact the hosts that you have specifically configured.
application/octet-stream bin dms lha lzh exe class tgz taz
An FTP URI is interpreted relative to the home directory of the user who is logging in. Alas, to reach higher directory levels you cannot use /../, as the dots are interpreted by the browser and not actually sent to the FTP server. To address this problem, the so-called "Squid %2f hack" was implemented in the Apache FTP proxy; it is a solution which is also used by other popular proxy servers like the Squid Proxy Cache. By prepending /%2f to the path of your request, you can make such a proxy change the FTP starting directory to / (instead of the home directory).
Example: To retrieve the file /etc/motd, you would use the URL
ftp://user@host/%2f/etc/motd
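The URL form above can be produced mechanically. The following Python sketch is only an illustration of the "%2f hack"; the function name ftp_url_from_root and its parameters are hypothetical and are not part of Apache.

```python
from urllib.parse import quote


def ftp_url_from_root(host, absolute_path, user=None):
    """Build an FTP URL whose path starts at / instead of the login directory.

    Hypothetical helper for illustration only.
    """
    if not absolute_path.startswith("/"):
        raise ValueError("absolute_path must start with '/'")
    authority = f"{user}@{host}" if user else host
    # "/%2f" asks the proxy to change to the root directory first; the rest
    # of the path is appended after percent-encoding unsafe characters.
    return f"ftp://{authority}/%2f{quote(absolute_path)}"


print(ftp_url_from_root("host", "/etc/motd", user="user"))
```

For the file /etc/motd and the user "user" this yields the same URL as in the example above.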
To log in to an FTP server by username and password, Apache uses different strategies. In the absence of a user name and password in the URL altogether, Apache sends an anonymous login to the FTP server, i.e.,
user: anonymous
password: apache_proxy@
This works for all popular FTP servers which are configured for anonymous access.
For a personal login with a specific user name, you can add the user name to the URL, as in ftp://username@host/myfile. If the FTP server asks for a password when given this user name (which it should), then Apache will reply with a 401 (Authorization Required) response, which causes the browser to pop up the username/password dialog. Upon entering the password, the connection attempt is retried, and if successful, the requested resource is presented. The advantage of this procedure is that your browser does not display the password in cleartext (which it would if you had used ftp://username:password@host/myfile in the first place).
If you use the ProxyBlock or NoCache directives, hostnames' IP addresses are looked up and cached during startup for later match tests. This may take a few seconds (or more) depending on the speed with which the hostname lookups occur.
Enable the SOCKS4=yes rule in your Configuration file, and follow the instructions there. SOCKS5 capability can be added in a similar way (there's no SOCKS5 rule yet), so use the EXTRA_LDFLAGS definition, or build Apache normally and run it with the runsocks wrapper provided with SOCKS5, if your OS supports dynamically linked libraries.
Some users have reported problems when using SOCKS version 4.2 on Solaris. The problem was solved by upgrading to SOCKS 4.3.
Remember that you'll also have to grant access to your Apache proxy machine by permitting connections on the appropriate ports in your SOCKS daemon's configuration.
An Apache proxy server situated in an intranet needs to forward external requests through the company's firewall (for this, configure the ProxyRemote directive to forward the respective scheme to the firewall proxy). However, when it has to access resources within the intranet, it can bypass the firewall when accessing hosts. The NoProxy directive is useful for specifying which hosts belong to the intranet and should be accessed directly.
Users within an intranet tend to omit the local domain name from their WWW requests, thus requesting "http://somehost/" instead of "http://somehost.my.dom.ain/". Some commercial proxy servers let them get away with this and simply serve the request, implying a configured local domain. When the ProxyDomain directive is used and the server is configured for proxy service, Apache can return a redirect response and send the client to the correct, fully qualified, server address. This is the preferred method since the user's bookmark files will then contain fully qualified hosts.
Default: ProxyRequests Off
This allows or prevents Apache from functioning as a forward proxy server. Setting ProxyRequests to 'off' does not disable use of the ProxyPass directive.
Warning: Do not enable proxying until you have secured your server. Open proxy servers are dangerous both to your network and to the Internet at large.
This defines remote proxies to this proxy. match is either the name of a URL-scheme that the remote server supports, or a partial URL for which the remote server should be used, or '*' to indicate the server should be contacted for all requests. remote-server is a partial URL for the remote server. Syntax:
remote-server = protocol://hostname[:port]
protocol is the protocol that should be used to communicate with the remote server; only "http" is supported by this module.
Example:
ProxyRemote http://goodguys.com/ http://mirrorguys.com:8000
ProxyRemote * http://cleversite.com
ProxyRemote ftp http://ftpproxy.mydomain.com:8080
In the last example, the proxy will forward FTP requests, encapsulated as yet another HTTP proxy request, to another proxy which can handle them.
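The three kinds of match argument (a scheme name, a partial URL, or '*') can be illustrated with a small Python sketch. It is not Apache's implementation: the function select_remote_proxy is hypothetical, and the rule that the first matching entry in configuration order wins is an assumption of this sketch, which is why the catch-all '*' entry is listed last here.

```python
def select_remote_proxy(url, rules):
    """Return the remote-server of the first rule matching url, else None.

    Hypothetical helper; rules is a list of (match, remote_server) pairs.
    Assumption: entries are tried in order and the first match is used.
    """
    scheme = url.split(":", 1)[0].lower()
    for match, remote in rules:
        if match == "*":
            # '*' applies to every request.
            return remote
        if ":" not in match:
            # A bare word is treated as a URL scheme such as "ftp".
            if match.lower() == scheme:
                return remote
        elif url.lower().startswith(match.lower()):
            # Anything else is treated as a partial URL (prefix).
            return remote
    return None


RULES = [
    ("http://goodguys.com/", "http://mirrorguys.com:8000"),
    ("ftp", "http://ftpproxy.mydomain.com:8080"),
    ("*", "http://cleversite.com"),
]

print(select_remote_proxy("ftp://ftp.example.org/file", RULES))
```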
This directive allows remote servers to be mapped into the space of the local server; the local server does not act as a proxy in the conventional sense, but appears to be a mirror of the remote server. path is the name of a local virtual path; url is a partial URL for the remote server.
Suppose the local server has address http://wibble.org/; then
ProxyPass /mirror/foo/ http://foo.com/
will cause a local request for <http://wibble.org/mirror/foo/bar> to be internally converted into a proxy request to <http://foo.com/bar>.
The ! directive is useful when you don't want to reverse-proxy a subdirectory, e.g.
ProxyPass /mirror/foo/bar !
ProxyPass /mirror/foo/ http://foo.com/
will proxy all requests to /mirror/foo to foo.com except requests made to /mirror/foo/bar.
Note: Order is important. Exclusions must come before the general ProxyPass directive.
Warning: The ProxyRequests directive should usually be set off when using ProxyPass.
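The mapping and the effect of ordering can be sketched in Python. This is a simplified model, not Apache's code: map_proxy_pass is a hypothetical name, and plain string-prefix comparison stands in for the server's own path matching.

```python
def map_proxy_pass(request_path, rules):
    """Map a local path to a remote URL using ordered (path, url) rules.

    Hypothetical helper. Returns None when the request is not proxied,
    either because no rule matches or because a "!" exclusion matched first.
    """
    for prefix, target in rules:
        if request_path.startswith(prefix):
            if target == "!":
                return None  # explicitly excluded from proxying
            # Replace the local prefix with the remote partial URL.
            return target + request_path[len(prefix):]
    return None


RULES = [
    ("/mirror/foo/bar", "!"),
    ("/mirror/foo/", "http://foo.com/"),
]

print(map_proxy_pass("/mirror/foo/baz", RULES))
print(map_proxy_pass("/mirror/foo/bar", RULES))
```

With the two rules reversed, the general rule would match /mirror/foo/bar first and the exclusion would never be reached, which is why exclusions must come first.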
This directive lets Apache adjust the URL in the Location header on HTTP redirect responses. This is essential, for instance, when Apache is used as a reverse proxy: without it, HTTP redirects issued by the backend servers, which stay behind the reverse proxy, would lead clients to bypass the proxy.
path is the name of a local virtual path. url is a partial URL for the remote server. Both are used the same way as for the ProxyPass directive.
Example:
Suppose the local server has address http://wibble.org/; then
ProxyPass /mirror/foo/ http://foo.com/
ProxyPassReverse /mirror/foo/ http://foo.com/
will not only cause a local request for <http://wibble.org/mirror/foo/bar> to be internally converted into a proxy request to <http://foo.com/bar> (the functionality ProxyPass provides here). It also takes care of redirects that the server foo.com sends: when http://foo.com/bar is redirected by it to http://foo.com/quux, Apache adjusts this to http://wibble.org/mirror/foo/quux before forwarding the HTTP redirect response to the client.
Note that this ProxyPassReverse directive can also be used in conjunction with the proxy pass-through feature ("RewriteRule ... [P]") from mod_rewrite because it doesn't depend on a corresponding ProxyPass directive.
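The Location adjustment described above can be sketched as a prefix substitution. The Python below is an illustration only; rewrite_location and its parameters are hypothetical names, and the real module works on the response headers inside the server.

```python
def rewrite_location(location, local_base, rules):
    """Rewrite a backend Location value into the proxy's own URL space.

    Hypothetical helper. rules is a list of (path, url) pairs as given to
    ProxyPassReverse; local_base is the address of the local server.
    """
    for path, remote in rules:
        if location.startswith(remote):
            # Swap the remote prefix for the local server plus virtual path.
            return local_base.rstrip("/") + path + location[len(remote):]
    # Redirects to hosts that are not configured are passed through as-is.
    return location


RULES = [("/mirror/foo/", "http://foo.com/")]

print(rewrite_location("http://foo.com/quux", "http://wibble.org/", RULES))
```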
When enabled, this option will pass the Host: line from the incoming request to the proxied host, instead of the hostname specified in the ProxyPass line.
This option should normally be turned Off. It is mostly useful in special configurations like proxied mass name-based virtual hosting, where the original Host header needs to be evaluated by the backend server.
The AllowCONNECT directive specifies a list of port numbers to which the proxy CONNECT method may connect. Today's browsers use this method when an https connection is requested and proxy tunneling over http is in effect.
By default, only the default https port (443) and the default snews port (563) are enabled. Use the AllowCONNECT directive to override this default and allow connections to the listed ports only.
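The override semantics (a configured list replaces the defaults rather than extending them) can be sketched as follows. The function connect_allowed is a hypothetical name, and rejecting a target without an explicit port is a choice of this sketch, not a statement about Apache.

```python
DEFAULT_CONNECT_PORTS = frozenset({443, 563})  # https and snews


def connect_allowed(target, allowed_ports=None):
    """Check the "host:port" target of a CONNECT request against a port list.

    Hypothetical helper. allowed_ports=None means no AllowCONNECT directive
    was given, so the defaults apply; otherwise only the listed ports do.
    """
    if allowed_ports is None:
        ports = DEFAULT_CONNECT_PORTS
    else:
        ports = frozenset(allowed_ports)
    host, sep, port_text = target.rpartition(":")
    if not sep or not host or not port_text.isdecimal():
        return False  # sketch assumption: a port must be given explicitly
    return int(port_text) in ports


print(connect_allowed("secure.example.com:443"))
print(connect_allowed("secure.example.com:563", allowed_ports=[443, 8443]))
```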
The ProxyBlock directive specifies a list of words, hosts and/or domains, separated by spaces. HTTP, HTTPS, and FTP document requests to sites whose names contain matched words, hosts or domains are blocked by the proxy server. The proxy module will also attempt to determine IP addresses of list items which may be hostnames during startup, and cache them for match test as well. Example:
ProxyBlock joes-garage.com some-host.co.uk rocky.wotsamattau.edu
'rocky.wotsamattau.edu' would also be matched if referenced by IP address.
Note that 'wotsamattau' would also be sufficient to match 'wotsamattau.edu'.
Note also that
ProxyBlock *
blocks connections to all sites.
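The name-based part of this matching amounts to a substring test, sketched below in Python. The helper is_blocked is hypothetical, and the startup lookup of IP addresses is deliberately left out of this sketch.

```python
def is_blocked(hostname, block_list):
    """Return True if hostname contains any listed word, host or domain.

    Hypothetical helper; matching by resolved IP address is not modelled.
    """
    host = hostname.lower()
    for entry in block_list:
        # "*" blocks everything; other entries match as substrings.
        if entry == "*" or entry.lower() in host:
            return True
    return False


BLOCKED = ["joes-garage.com", "some-host.co.uk", "wotsamattau"]

print(is_blocked("rocky.wotsamattau.edu", BLOCKED))
print(is_blocked("www.example.org", BLOCKED))
```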
The ProxyReceiveBufferSize directive specifies an explicit network buffer size for outgoing HTTP and FTP connections, for increased throughput. It has to be greater than 512 or set to 0 to indicate that the system's default buffer size should be used.
Example:
ProxyReceiveBufferSize 2048
The ProxyIOBufferSize directive specifies the number of bytes that will be read from a remote HTTP or FTP server at one time. This directive is different from the ProxyReceiveBufferSize directive, which specifies the low level socket buffer size.
When a response is received which fits entirely within the IO buffer size, the remote HTTP or FTP server socket will be closed before an attempt is made to write the response to the client. This ensures that the remote server does not remain connected unnecessarily while the response is delivered to a slow client. A high value for the IO buffer decreases the load on remote HTTP and FTP servers, at the expense of greater RAM footprint on the proxy.
Example:
ProxyIOBufferSize 131072
This directive is only useful for Apache proxy servers within intranets. The NoProxy directive specifies a list of subnets, IP addresses, hosts and/or domains, separated by spaces. A request to a host which matches one or more of these is always served directly, without forwarding to the configured ProxyRemote proxy server(s).
Example:
ProxyRemote * http://firewall.mycompany.com:81
NoProxy .mycompany.com 192.168.112.0/21
Each argument to the NoProxy directive is one of the following types: a domain, a subnet, an IP address, or a hostname.
See Also: DNS Issues
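How a host might be tested against a NoProxy list can be sketched with Python's ipaddress module. This is an illustration under simplifying assumptions: served_directly is a hypothetical name, entries containing "/" are taken as subnets, entries with a leading dot as domains, and no DNS lookups are performed.

```python
import ipaddress


def served_directly(host, no_proxy):
    """Return True if host matches a NoProxy entry and bypasses ProxyRemote.

    Hypothetical helper; hostnames are not resolved to addresses here.
    """
    host = host.lower().rstrip(".")
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        address = None  # host is a name, not an IP address literal
    for entry in no_proxy:
        if "/" in entry:
            # Subnet entry such as 192.168.112.0/21.
            network = ipaddress.ip_network(entry, strict=False)
            if address is not None and address in network:
                return True
        elif entry.startswith("."):
            # Domain entry such as .mycompany.com.
            if address is None and host.endswith(entry.lower()):
                return True
        elif host == entry.lower():
            # Exact host name or IP address entry.
            return True
    return False


NO_PROXY = [".mycompany.com", "192.168.112.0/21"]

print(served_directly("www.mycompany.com", NO_PROXY))
print(served_directly("192.168.120.1", NO_PROXY))
```

The /21 subnet in the example covers 192.168.112.0 through 192.168.119.255, so 192.168.120.1 is forwarded to the remote proxy.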
This directive is only useful for Apache proxy servers within intranets. The ProxyDomain directive specifies the default domain which the Apache proxy server will belong to. If a request to a host without a domain name is encountered, a redirection response to the same host with the configured domain appended will be generated.
Example:
ProxyRemote * http://firewall.mycompany.com:81
NoProxy .mycompany.com 192.168.112.0/21
ProxyDomain .mycompany.com
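The redirect target for an unqualified host can be sketched like this. The function proxy_domain_redirect is a hypothetical name; treating "contains no dot" as "has no domain name" is an assumption of the sketch, and any user information in the URL is dropped.

```python
from urllib.parse import urlsplit, urlunsplit


def proxy_domain_redirect(url, domain):
    """Return the fully qualified URL to redirect to, or None if not needed.

    Hypothetical helper; domain is given with its leading dot, as in the
    ProxyDomain example.
    """
    parts = urlsplit(url)
    host = parts.hostname
    # Hosts that already have a domain (or are IPv6 literals) are left alone.
    if not host or "." in host or ":" in host:
        return None
    netloc = host + domain
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(proxy_domain_redirect("http://somehost/", ".mycompany.com"))
```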
This directive controls the use of the Via: HTTP header by the proxy. Its intended use is to control the flow of proxy requests along a chain of proxy servers. See RFC 2068 (HTTP/1.1) for an explanation of Via: header lines.
If an http transfer that is being cached is cancelled, the proxy module will complete the transfer to cache if more than the percentage specified has already been transferred.
This is a percentage, and must be a number between 1 and 100, or 0 to use the default. 100 will cause a document to be cached only if the transfer was allowed to complete. A number between 60 and 90 is recommended.
Sets the name of the directory to contain cache files; this must be writable by the httpd server (see the User directive). Setting CacheRoot enables proxy caching; without defining a CacheRoot, proxy functionality will be available if ProxyRequests is set to On, but no caching will be available.
Default: CacheSize 5
Sets the desired space usage of the cache, in KB (1024-byte units). Although usage may grow above this setting, the garbage collection will delete files until the usage is at or below this setting. Depending on the expected proxy traffic volume and CacheGcInterval, use a value which is at least 20 to 40% lower than the available space.
Check the cache after the specified number of hours, and delete files if the space usage is greater than that set by CacheSize. Note that hours accepts a float value; for example, you could use CacheGcInterval 1.5 to check the cache every 90 minutes. (If unset, no garbage collection will be performed, and the cache will grow indefinitely.) Note also that the larger the CacheGcInterval, the more extra space beyond the configured CacheSize will be needed for the cache between garbage collections.
Default: CacheMaxExpire 24
Specifies the maximum number of hours for which cacheable HTTP documents will be retained without checking the origin server. Thus, documents will be out of date by at most this number of hours. This restriction is enforced even if an expiry date was supplied with the document.
Default: CacheLastModifiedFactor 0.1
If the origin HTTP server did not supply an expiry date for the document, then estimate one using the formula
expiry-period = time-since-last-modification * factor
For example, if the document was last modified 10 hours ago, and factor is 0.1, then the expiry period will be set to 10*0.1 = 1 hour.
If the expiry-period would be longer than that set by CacheMaxExpire, then the latter takes precedence.
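The formula and the CacheMaxExpire cap can be combined in a few lines of Python. The function name expiry_hours is hypothetical; the default arguments mirror the directive defaults of 0.1 and 24 hours, and all times are expressed in hours.

```python
def expiry_hours(hours_since_modified, factor=0.1, max_expire=24.0):
    """Estimate an expiry period in hours for a document without an expiry date.

    Hypothetical helper: the estimate is time-since-last-modification times
    factor, capped at max_expire.
    """
    estimate = hours_since_modified * factor
    return min(estimate, max_expire)


print(expiry_hours(10))    # the worked example from the text: 10 * 0.1
print(expiry_hours(1000))  # an old document is capped by max_expire
```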
Default: CacheDirLevels 3
CacheDirLevels sets the number of levels of subdirectories in the cache. Cached data will be saved this many directory levels below CacheRoot.
Default: CacheDirLength 1
CacheDirLength sets the number of characters in proxy cache subdirectory names.
Default: CacheDefaultExpire 1
If the document is fetched via a protocol that does not support expiry times, then use the specified number of hours as the expiry time. CacheMaxExpire does not override this setting.
The NoCache directive specifies a list of words, hosts and/or domains, separated by spaces. HTTP and non-passworded FTP documents from matched words, hosts or domains are not cached by the proxy server. The proxy module will also attempt to determine IP addresses of list items which may be hostnames during startup, and cache them for match test as well. Example:
NoCache joes-garage.com some-host.co.uk bullwinkle.wotsamattau.edu
'bullwinkle.wotsamattau.edu' would also be matched if referenced by IP address.
Note that 'wotsamattau' would also be sufficient to match 'wotsamattau.edu'.
Note also that
NoCache *
disables caching completely.