xtables-addons(8) v2.5 (2014-04-18) xtables-addons(8)
Name
Xtables-addons — additional extensions for iptables, ip6tables, etc.
Targets
ACCOUNT
The ACCOUNT target is a high performance accounting system for large
local networks. It allows per-IP accounting in whole prefixes of IPv4
addresses with size of up to /8 without the need to add an individual
accounting rule for each IP address.
ACCOUNT is designed to be queried for data every second or at least
every ten seconds. It is written as a kernel module to handle high
bandwidths without packet loss.
The largest possible subnet has 24 host bits, for example the
10.0.0.0/8 network. ACCOUNT uses fixed internal data structures, which
speeds up the processing of each packet. Furthermore, accounting data
for one complete 192.168.1.X/24 network takes 4 KB of memory. Memory
for 16 or 24 bit networks is only allocated when needed.
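The figures above imply some simple arithmetic. The following shell
sketch only restates it: it assumes memory is allocated in 4 KB steps,
one per /24 block, and ignores any bookkeeping overhead, which the text
does not quantify. The function name is made up for this illustration.

```sh
# Worst-case accounting memory in KB for a prefix length from /8 to /24,
# assuming 4 KB per /24 block (figure from the text) and no overhead.
account_max_kb() {
    prefix_len=$1
    blocks=$(( 1 << (24 - prefix_len) ))   # number of /24 blocks in the subnet
    echo $(( blocks * 4 ))
}

account_max_kb 24   # a single /24 block
account_max_kb 16   # 256 blocks
account_max_kb 8    # 65536 blocks
```

Because memory is only allocated when needed, real usage on a sparsely
populated /16 or /8 should stay well below these upper bounds.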
To optimize the kernel<->userspace data transfer a bit more, the kernel
module only transfers information about IPs whose src/dst packet
counter is not 0. This saves precious kernel time.
There is no /proc interface as it would be too slow for continuous
access. The read-and-flush query operation is the fastest, as no
internal data snapshot needs to be created and copied for all data. Use
the "read" operation without flush only for debugging purposes!
Usage:
ACCOUNT takes two mandatory parameters:
--addr network/netmask
where network/netmask is the subnet to account for, in CIDR syn‐
tax
--tname NAME
where NAME is the name of the table where the accounting infor‐
mation should be stored
The subnet 0.0.0.0/0 is a special case: all data are then stored in the
src_bytes and src_packets structure of slot "0". This is useful if you
want to account the overall traffic to/from your internet provider.
The data can be queried using the userspace libxt_ACCOUNT_cl library,
or with the iptaccount(8) tool, the reference implementation that shows
how to use this library.
Here is an example of use:
iptables -A FORWARD -j ACCOUNT --addr 0.0.0.0/0 --tname all_outgoing;
iptables -A FORWARD -j ACCOUNT --addr 192.168.1.0/24 --tname sales;
This creates two tables called "all_outgoing" and "sales" which can be
queried using the userspace library/iptaccount tool.
Note that this target is non-terminating: the packet destined to it
will continue traversing the chain in which it has been used.
Also note that once a table has been defined for a specific CIDR
address/netmask block, it can be referenced multiple times using -j
ACCOUNT, provided that both the original table name and address/netmask
block are specified.
For more information go to
http://www.intra2net.com/en/developer/ipt_ACCOUNT/
CHAOS
Causes confusion on the other end by doing odd things with incoming
packets. CHAOS will randomly reply (or not) with one of its config‐
urable subtargets:
--delude
Use the REJECT and DELUDE targets as a base to do a sudden or
deferred connection reset, fooling some network scanners to
return non-deterministic (randomly open/closed) results, and in
case it is deemed open, it is actually closed/filtered.
--tarpit
Use the REJECT and TARPIT targets as a base to hold the connection
until it times out. This consumes conntrack entries when
connection tracking is loaded (which usually is on most
machines), and routers in between you and the Internet may fail
to do their connection tracking if they have to handle more
connections than they can.
The randomness factor of not replying vs. replying can be set during
load-time of the xt_CHAOS module or during runtime in
/sys/module/xt_CHAOS/parameters.
See http://inai.de/projects/chaostables/ for more information about
CHAOS, DELUDE and lscan.
DELUDE
The DELUDE target will reply to a SYN packet with SYN-ACK, and to all
other packets with an RST. This will terminate the connection much like
REJECT, but network scanners doing TCP half-open discovery can be
spoofed to make them believe the port is open rather than
closed/filtered.
DHCPMAC
In conjunction with ebtables, DHCPMAC can be used to completely change
all MAC addresses from and to a VMware-based virtual machine. This is
needed because VMware does not allow setting a non-VMware MAC address
before an operating system is booted (after which the MAC can be
changed with `ip link set eth0 address aa:bb..`).
--set-mac aa:bb:cc:dd:ee:ff[/mask]
Replace the client host MAC address field in the DHCP message
with the given MAC address. This option is mandatory. The mask
parameter specifies the prefix length of bits to change.
EXAMPLE, replacing all addresses from one of VMware's assigned vendor
IDs (00:50:56) with something else:
iptables -t mangle -A FORWARD -p udp --dport 67 -m physdev --physdev-in
vmnet1 -m dhcpmac --mac 00:50:56:00:00:00/24 -j DHCPMAC --set-mac
ab:cd:ef:00:00:00/24
iptables -t mangle -A FORWARD -p udp --dport 68 -m physdev
--physdev-out vmnet1 -m dhcpmac --mac ab:cd:ef:00:00:00/24 -j DHCPMAC
--set-mac 00:50:56:00:00:00/24
(This assumes there is a bridge interface that has vmnet1 as a port.
You will also need to add appropriate ebtables rules to change the MAC
address of the Ethernet headers.)
DNETMAP
The DNETMAP target allows dynamic two-way 1:1 mapping of IPv4 subnets.
A single rule can map a private subnet to a smaller public subnet,
creating and maintaining unambiguous private-public IP address bindings.
A second rule can be used to map new flows to a private subnet
according to maintained bindings. The target allows efficient public
IPv4 space usage and unambiguous NAT at the same time.
The target can be used only in the nat table in POSTROUTING or OUTPUT
chains for SNAT, and in PREROUTING for DNAT. Only flows directed to
bound addresses will be DNATed. The packet continues chain traversal if
there is no free postnat address to be assigned to the prenat address.
The default binding TTL is 10 minutes and can be changed using the
default_ttl module option. The default address hash size is 256 and can
be changed using the hash_size module option.
--prefix addr/mask
The network subnet to map to. If not specified, all existing
prefixes are used.
--reuse
Reuse the entry for a given prenat address from any prefix even
if the binding's TTL is < 0.
--persistent
Set the prefix to be persistent. It will not be removed after
deleting the last iptables rule. The option is effective only in
the first rule for a given prefix. If you need to change persis‐
tency for an existing prefix, please use the procfs interface
described below.
--static
Do not create dynamic mappings using this rule. Use static map‐
pings only. Note that you need to create static mappings via the
procfs interface for this rule for this option to have any
effect.
--ttl seconds
Reset the binding's TTL value to seconds. If a negative value is
specified, the binding's TTL is kept unchanged. If this option
is not specified, then the default TTL value (600s) is used.
* /proc interface
The module creates the following entries for each new specified subnet:
/proc/net/xt_DNETMAP/subnet_mask
Contains the binding table for the given subnet/mask. Each line
contains prenat address, postnat address, ttl (seconds until the
entry times out), lasthit (last hit to the entry in seconds rel‐
ative to system boot time). Please note that the ttl and lasthit
entries contain an
/proc/net/xt_DNETMAP/subnet_mask_stat
Contains statistics for a given subnet/mask. The line contains
four numerical values separated by spaces. The first one is the
number of currently used dynamic addresses (bindings with nega‐
tive TTL excluded), the second one is the number of static
assignments, the third one is the number of all usable addresses
in the subnet, and the fourth one is the mean TTL value for all
active entries. If the prefix has the persistent flag set, it
will be noted as fifth entry.
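As a small illustration of the _stat format described above, the
following shell sketch splits one such line into labelled fields. The
field order is taken from the description; the label names and the
function name are invented for this example.

```sh
# Split one line of a subnet_mask_stat file into labelled fields.
# Field order follows the description above; the labels are made up here.
dnetmap_stat_fields() {
    # $1: the line read from the _stat file
    set -- $1
    echo "dynamic_in_use=$1"
    echo "static=$2"
    echo "usable=$3"
    echo "mean_ttl=$4"
    # A fifth field is only present when the prefix is persistent.
    if [ "${5:-}" = "persistent" ]; then
        echo "persistent=yes"
    else
        echo "persistent=no"
    fi
}

dnetmap_stat_fields "0 0 64 0 persistent"
```

On a live system the argument would typically be the contents of the
_stat file for the prefix in question.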
The following write operations are supported via the procfs interface:
echo "+prenat-address:postnat-address" >/proc/net/xt_DNETMAP/sub‐
net_mask
Adds a static binding between the prenat and postnat address. If
postnat-address is already bound, any previous binding will be
timed out immediately. A static binding is never timed out.
echo "-address" >/proc/net/xt_DNETMAP/subnet_mask
Removes the binding with address as prenat or postnat address.
If the removed binding is currently static, it will make the
entry available for dynamic allocation.
echo "+persistent" >/proc/net/xt_DNETMAP/subnet_mask
Sets the persistent flag for the prefix. It is useful if you do
not want bindings to get flushed when the firewall is restarted.
You can check if the prefix is persistent by printing the con‐
tents of /proc/net/xt_DNETMAP/subnet_mask_stat.
echo "-persistent" >/proc/net/xt_DNETMAP/subnet_mask
Unsets the persistent flag for the prefix. In this mode, the
prefix will be deleted if the last iptables rule for that prefix
is removed.
echo "flush" >/proc/net/xt_DNETMAP/subnet_mask
Flushes all bindings for the specific prefix. All static entries
are also flushed and become available for dynamic bindings.
Note! Entries are removed if the last iptables rule for a specific pre‐
fix is deleted unless the persistent flag is set.
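The write operations above all target a file named after the prefix. A
minimal shell sketch of that naming, assuming the subnet_mask form is
simply the prefix with "/" replaced by "_" (as the 20.0.0.0_26 entries
elsewhere in this section suggest). Both helper names are hypothetical,
and the second one only prints the command instead of running it.

```sh
# Map a prefix in addr/mask notation to its procfs entry, following the
# subnet_mask naming used in this section (20.0.0.0/26 -> 20.0.0.0_26).
dnetmap_proc_file() {
    echo "/proc/net/xt_DNETMAP/$(echo "$1" | tr '/' '_')"
}

# Print (do not execute) the command that adds a static binding.
dnetmap_static_cmd() {
    # $1: prefix, $2: prenat address, $3: postnat address
    echo "echo \"+$2:$3\" >$(dnetmap_proc_file "$1")"
}

dnetmap_static_cmd 20.0.0.0/26 192.168.0.10 20.0.0.1
```

Printing the command first makes it easy to review a batch of bindings
before applying them as root.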
* Logging
The module logs binding add/timeout events to klog. This behaviour can
be disabled using the disable_log module parameter.
* Examples
1. Map subnet 192.168.0.0/24 to subnet 20.0.0.0/26. SNAT only:
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix
20.0.0.0/26
Active hosts from the 192.168.0.0/24 subnet are mapped to 20.0.0.0/26.
If a packet from a not yet bound prenat address hits the rule and
there are no free or timed-out (TTL<0) entries in prefix 20.0.0.0/26,
then a notice is logged to klog and chain traversal continues. If a
packet from an already-bound prenat address hits the rule, the
binding's TTL value is reset to default_ttl and SNAT is performed.
2. Use of --reuse and --ttl switches, multiple rule interaction:
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix
20.0.0.0/26 --reuse --ttl 200
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix
30.0.0.0/26
Active hosts from 192.168.0.0/24 subnet are mapped to 20.0.0.0/26 with
TTL = 200 seconds. If there are no free addresses in first prefix, the
next one (30.0.0.0/26) is used with the default TTL. It is important to
note that the first rule SNATs all flows whose source address is
already actively bound (TTL>0) to ANY prefix. The --reuse parameter
makes this functionality work even for inactive (TTL<0) entries.
If both subnets are exhausted, then chain traversal continues.
3. Map 192.168.0.0/24 to subnet 20.0.0.0/26 in a bidirectional way:
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix
20.0.0.0/26
iptables -t nat -A PREROUTING -j DNETMAP
If the host 192.168.0.10 generates some traffic, it gets bound to the
first free address in the subnet, 20.0.0.0. Now, any traffic directed to
20.0.0.0 gets DNATed to 192.168.0.10 as long as there is an active
(TTL>0) binding. There is no need to specify the --prefix parameter in a
PREROUTING rule, because this way, it DNATs traffic to all active
prefixes. You can specify a prefix if you want DNAT to work for that
prefix only.
4. Map 192.168.0.0/24 to subnet 20.0.0.0/26 with static assignments
only:
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix
20.0.0.0/26 --static
echo "+192.168.0.10:20.0.0.1" >/proc/net/xt_DNETMAP/20.0.0.0_26
echo "+192.168.0.11:20.0.0.2" >/proc/net/xt_DNETMAP/20.0.0.0_26
echo "+192.168.0.51:20.0.0.3" >/proc/net/xt_DNETMAP/20.0.0.0_26
This configuration will allow only preconfigured static bindings to
work due to the static rule option. Without this flag, dynamic bindings
would be created using non-static entries.
5. Persistent prefix:
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix
20.0.0.0/26 --persistent
or
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix
20.0.0.0/26
echo "+persistent" >/proc/net/xt_DNETMAP/20.0.0.0_26
Now, we can check the persistent flag of the prefix:
cat /proc/net/xt_DNETMAP/20.0.0.0_26_stat
0 0 64 0 persistent
Flush the iptables nat table and see that the prefix still exists:
iptables -F -t nat
ls -l /proc/net/xt_DNETMAP
-rw-r--r-- 1 root root 0 06-10 09:01 20.0.0.0_26
-rw-r--r-- 1 root root 0 06-10 09:01 20.0.0.0_26_stat
ECHO
The ECHO target will send back all packets it received. It serves as an
example for an Xtables target.
ECHO takes no options.
IPMARK
Allows you to mark a received packet based on its IP address. This can
replace many mangle/mark entries with only one, if you use a
firewall-based classifier.
This target is to be used inside the mangle table.
--addr {src|dst}
Select source or destination IP address as a basis for the mark.
--and-mask mask
Perform bitwise AND on the IP address and this bitmask.
--or-mask mask
Perform bitwise OR on the IP address and this bitmask.
--shift value
Shift addresses to the right by the given number of bits before
taking it as a mark. (This is done before ANDing or ORing it.)
This option is needed to select part of an IPv6 address, because
marks are only 32 bits in size.
The order of IP address bytes is reversed to meet "human order of
bytes": 192.168.0.1 is 0xc0a80001. At first the "AND" operation is per‐
formed, then "OR".
Examples:
We create a queue for each user; the queue number corresponds to the IP
address of the user, e.g.: all packets going to/from 192.168.5.2 are
directed to the 1:0502 queue, 192.168.5.12 -> 1:050c etc.
We have one classifier rule:
tc filter add dev eth3 parent 1:0 protocol ip fw
Earlier we had many rules just like below:
iptables -t mangle -A POSTROUTING -o eth3 -d 192.168.5.2 -j MARK
--set-mark 0x10502
iptables -t mangle -A POSTROUTING -o eth3 -d 192.168.5.3 -j MARK
--set-mark 0x10503
Using IPMARK target we can replace all the mangle/mark rules with only
one:
iptables -t mangle -A POSTROUTING -o eth3 -j IPMARK --addr dst
--and-mask 0xffff --or-mask 0x10000
On routers with hundreds of users, this should decrease the load
significantly (e.g. by a factor of two).
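To check which mark a given IPv4 address would produce, the arithmetic
described above (shift right, then AND, then OR) can be reproduced in
plain shell. This is only a sketch of the calculation, not of the kernel
module itself; the function name is hypothetical and all three values
are passed explicitly rather than relying on any defaults.

```sh
# Compute the mark that the IPMARK arithmetic yields for an IPv4 address:
# shift right first, then AND, then OR (order as described above).
ipmark_ipv4() {
    # $1: dotted-quad address, $2: shift, $3: and-mask, $4: or-mask
    old_ifs=$IFS
    IFS=.
    set -- $1 "$2" "$3" "$4"    # $1..$4 are now the octets
    IFS=$old_ifs
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    printf '0x%x\n' $(( ((addr >> $5) & $6) | $7 ))
}

ipmark_ipv4 192.168.5.2  0 0xffff 0x10000
ipmark_ipv4 192.168.5.12 0 0xffff 0x10000
```

The two calls print 0x10502 and 0x1050c, matching the --set-mark values
and queue numbers used in the example.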
(IPv6 example) If the source address is of the form
2001:db8:45:1d:20d:93ff:fe9b:e443 and the resulting mark should be
0x93ff, then a right-shift of 16 is needed first:
-t mangle -A PREROUTING -s 2001:db8::/32 -j IPMARK --addr src
--shift 16 --and-mask 0xFFFF
LOGMARK
The LOGMARK target will log packet and connection marks to syslog.
--log-level level
A logging level between 0 and 8 (inclusive).
--log-prefix string
Prefix log messages with the specified prefix; up to 29 bytes
long, and useful for distinguishing messages in the logs.
SYSRQ
The SYSRQ target allows remotely triggering sysrq on the local machine
over the network. This can be useful when vital parts of the machine
hang, for example an oops in a filesystem causing locks not to be
released and processes to get stuck as a result. (If still possible,
use /proc/sysrq-trigger.) Even when processes are stuck, interrupts are
likely to be still processed, and as such, sysrq can be triggered
through incoming network packets.
The xt_SYSRQ implementation uses a salted hash and a sequence number to
prevent network sniffers from either guessing the password or replaying
earlier requests. The initial sequence number comes from the time of
day so you will have a small window of vulnerability should time go
backwards at a reboot. However, the file /sys/module/xt_SYSRQ/seqno
can be used to both query and update the current sequence number. Also,
you should restrict who can issue commands using -s and/or -m mac, and
check that the destination is correct using -d (to protect against
potential broadcast packets). Note that this still does not protect
against MAC/IP spoofing:
-A INPUT -s 10.10.25.1 -m mac --mac-source aa:bb:cc:dd:ee:ff -d
10.10.25.7 -p udp --dport 9 -j SYSRQ
(with IPsec) -A INPUT -s 10.10.25.1 -d 10.10.25.7 -m policy
--dir in --pol ipsec --proto esp --tunnel-src 10.10.25.1
--tunnel-dst 10.10.25.7 -p udp --dport 9 -j SYSRQ
You should also limit the rate at which connections can be received to
limit the CPU time taken by illegal requests, for example:
-A INPUT -s 10.10.25.1 -m mac --mac-source aa:bb:cc:dd:ee:ff -d
10.10.25.7 -p udp --dport 9 -m limit --limit 5/minute -j SYSRQ
This extension does not take any options. The -p udp option is
required.
The SYSRQ password can be changed through
/sys/module/xt_SYSRQ/parameters/password, for example:
echo -n "password" >/sys/module/xt_SYSRQ/parameters/password
The module will not respond to sysrq requests until a password has been
set.
Alternatively, the password may be specified at modprobe time, but this
is insecure as people can possibly see it through ps(1). You can use an
option line in e.g. /etc/modprobe.d/xt_sysrq if it is properly guarded,
that is, only readable by root.
options xt_SYSRQ password=cookies
The hash algorithm can also be specified as a module option, for exam‐
ple, to use SHA-256 instead of the default SHA-1:
options xt_SYSRQ hash=sha256
The xt_SYSRQ module is normally silent unless a successful request is
received, but the debug module parameter can be used to find exactly
why a seemingly correct request is not being processed.
To trigger SYSRQ from a remote host, just use socat:
sysrq_key="s" # the SysRq key(s)
password="password"
seqno="$(date +%s)"
salt="$(dd bs=12 count=1 if=/dev/urandom 2>/dev/null |
openssl enc -base64)"
ipaddr="2001:0db8:0000:0000:0000:ff00:0042:8329"
req="$sysrq_key,$seqno,$salt"
req="$req,$(echo -n "$req,$ipaddr,$password" | sha1sum | cut -c1-40)"
echo "$req" | socat stdin udp-sendto:$ipaddr:9
See the Linux docs for possible sysrq keys. Important ones are:
re(b)oot, power(o)ff, (s)ync filesystems, (u)mount and remount read‐
only. More than one sysrq key can be used at once, but bear in mind
that, for example, a sync may not complete before a subsequent reboot
or poweroff.
An IPv4 address should have no leading zeros, an IPv6 address should be
in the full expanded form (as shown above). The debug option will cause
output to be emitted in the same form.
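For scripting, the request construction from the socat example can be
wrapped in a function. The sketch below reuses exactly that format with
the default SHA-1 hash; the function name is hypothetical, and printf is
used in place of echo -n only because its behaviour is more uniform
across shells.

```sh
# Build a request string in the format used by the socat example above:
#   key,seqno,salt,sha1("key,seqno,salt,ipaddr,password")
sysrq_request() {
    # $1: sysrq key(s), $2: sequence number, $3: salt,
    # $4: destination address, $5: password
    req="$1,$2,$3"
    digest=$(printf '%s' "$req,$4,$5" | sha1sum | cut -c1-40)
    echo "$req,$digest"
}
```

The result can then be piped to socat as in the example above, passing
the current time as sequence number and a fresh random salt for each
request.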
The hashing scheme should be enough to prevent mis-use of SYSRQ in many
environments, but it is not perfect: take reasonable precautions to
protect your machines.
TARPIT
Captures and holds incoming TCP connections using no local per-connec‐
tion resources.
TARPIT only works at the TCP level, and is totally application agnos‐
tic. This module will answer a TCP request and play along like a lis‐
tening server, but aside from sending an ACK or RST, no data is sent.
Incoming packets are ignored and dropped. The attacker will terminate
the session eventually. This module allows the initial packets of an
attack to be captured by other software for inspection. In most cases
this is sufficient to determine the nature of the attack.
This offers similar functionality to LaBrea
<http://www.hackbusters.net/LaBrea/> but does not require dedicated
hardware or IPs.
Any TCP port that you would normally DROP or REJECT can instead become
a tarpit.
--tarpit
This mode completes a connection with the attacker but limits
the window size to 0, thus keeping the attacker waiting long
periods of time. While he is maintaining state of the connection
and trying to continue every 60-240 seconds, we keep none, so it
is very lightweight. Attempts to close the connection are
ignored, forcing the remote side to time out the connection in
12-24 minutes. This mode is the default.
--honeypot
This mode completes a connection with the attacker, but signals
a normal window size, so that the remote side will attempt to
send data, often with some very nasty exploit attempts. We can
capture these packets for decoding and further analysis. The
module does not send any data, so if the remote expects an
application level response, the game is up.
--reset
This mode is handy because we can send an inline RST (reset). It
has no other function.
To tarpit connections to TCP port 80 destined for the current machine:
-A INPUT -p tcp -m tcp --dport 80 -j TARPIT
To significantly slow down Code Red/Nimda-style scans of unused address
space, forward unused IP addresses to a Linux box not acting as a
router (e.g. "ip route 10.0.0.0 255.0.0.0 ip.of.linux.box" on a Cisco),
enable IP forwarding on the Linux box, and add:
-A FORWARD -p tcp -j TARPIT
-A FORWARD -j DROP
NOTE: If you use the conntrack module while you are using TARPIT, you
should also disable connection tracking for the affected packets, or
the kernel will unnecessarily allocate resources for each TARPITted
connection. To TARPIT incoming connections to the standard IRC port
while using conntrack, you could:
-t raw -A PREROUTING -p tcp --dport 6667 -j CT --notrack
-A INPUT -p tcp --dport 6667 -j NFLOG
-A INPUT -p tcp --dport 6667 -j TARPIT
Matches
condition
This matches if a specific condition variable is (un)set.
[!] --condition name
Match on boolean value stored in /proc/net/nf_condition/name.
dhcpmac
--mac aa:bb:cc:dd:ee:ff[/mask]
Matches the DHCP "Client Host" address (a MAC address) in a DHCP
message. mask specifies the prefix length of the initial por‐
tion to match.
fuzzy
This module matches a rate limit based on a fuzzy logic controller
(FLC).
--lower-limit number
Specifies the lower limit, in packets per second.
--upper-limit number
Specifies the upper limit, also in packets per second.
geoip
Match a packet by its source or destination country.
[!] --src-cc, --source-country country[,country...]
Match packet coming from (one of) the specified country(ies)
[!] --dst-cc, --destination-country country[,country...]
Match packet going to (one of) the specified country(ies)
NOTE: The country is specified by its ISO 3166 code.
The extra files you will need are the binary database files. They are
generated from a country-subnet database with the geoip_build_db.pl
tool that is shipped with the source package, and which should be
available in compiled packages in /usr/lib(exec)/xtables-addons/. The
first command retrieves CSV files from MaxMind, while the other two
build packed bisectable range files:
mkdir -p /usr/share/xt_geoip; cd /tmp; $path/to/xt_geoip_dl;
$path/to/xt_geoip_build -D /usr/share/xt_geoip GeoIP*.csv;
The shared library is hardcoded to look in these paths, so use them.
gradm
This module matches packets based on grsecurity RBAC status.
[!] --enabled
Matches packets if grsecurity RBAC is enabled.
[!] --disabled
Matches packets if grsecurity RBAC is disabled.
iface
Allows you to check interface states. First, an interface needs to be
selected for comparison. Exactly one option of the following three must
be specified:
--iface name
Check the states on the given interface.
--dev-in
Check the states on the interface on which the packet came in.
If the input device is not set, because for example you are
using -m iface in the OUTPUT chain, this submatch returns false.
--dev-out
Check the states on the interface on which the packet will go
out. If the output device is not set, because for example you
are using -m iface in the INPUT chain, this submatch returns
false.
Following that, one can select the interface properties to check for:
[!] --up, [!] --down
Check the UP flag.
[!] --broadcast
Check the BROADCAST flag.
[!] --loopback
Check the LOOPBACK flag.
[!] --pointtopoint
Check the POINTTOPOINT flag.
[!] --running
Check the RUNNING flag. Do NOT rely on it!
[!] --noarp, [!] --arp
Check the NOARP flag.
[!] --promisc
Check the PROMISC flag.
[!] --multicast
Check the MULTICAST flag.
[!] --dynamic
Check the DYNAMIC flag.
[!] --lower-up
Check the LOWER_UP flag.
[!] --dormant
Check the DORMANT flag.
ipp2p
This module matches certain packets in P2P flows. It is not designed to
match all packets belonging to a P2P connection — use IPP2P together
with CONNMARK for this purpose.
Use it together with -p tcp or -p udp to search these protocols only or
without -p switch to search packets of both protocols.
IPP2P provides the following options, of which one or more may be spec‐
ified on the command line:
--edk Matches as many eDonkey/eMule packets as possible.
--kazaa
Matches as many KaZaA packets as possible.
--gnu Matches as many Gnutella packets as possible.
--dc Matches as many Direct Connect packets as possible.
--bit Matches BitTorrent packets.
--apple
Matches AppleJuice packets.
--soul Matches some SoulSeek packets. Considered beta, use with care!
--winmx
Matches some WinMX packets. Considered beta, use with care!
--ares Matches Ares and AresLite packets. Use together with -j DROP
only.
--debug
Prints some information about each hit into kernel logfile. May
produce huge logfiles so beware!
Note that ipp2p may not (and often, does not) identify all packets that
are exchanged as a result of running filesharing programs.
There is more information at http://ipp2p.org/ , but it has not been
updated since September 2006, and the syntax there is different from
the ipp2p.c provided in Xtables-addons; most importantly, the --ipp2p
flag was removed due to its ambiguity to match "all known" protocols.
ipv4options
The "ipv4options" module allows matching against a set of IPv4 header
options.
--flags [!]symbol[,[!]symbol...]
Specify the options that shall appear or not appear in the
header. Each symbol specification is delimited by a comma, and a
'!' can be prefixed to a symbol to negate its presence. Symbols
are either the name of an IPv4 option or its number. See exam‐
ples below.
--any By default, all of the flags specified must be present/absent,
that is, they form an AND condition. Use the --any flag instead
to use an OR condition where only at least one symbol spec must
be true.
Known symbol names (and their number):
1 — nop
2 — security — RFC 1108
3 — lsrr — Loose Source Routing, RFC 791
4 — timestamp — RFC 781, 791
7 — record-route — RFC 791
9 — ssrr — Strict Source Routing, RFC 791
11 — mtu-probe — RFC 1063
12 — mtu-reply — RFC 1063
18 — traceroute — RFC 1393
20 — router-alert — RFC 2113
Examples:
Match packets that have both Timestamp and NOP: -m ipv4options --flags
nop,timestamp
~ that have either of Timestamp or NOP, or both: --flags nop,timestamp
--any
~ that have Timestamp and no NOP: --flags '!nop,timestamp'
~ that have either no NOP or a timestamp (or both conditions): --flags
'!nop,timestamp' --any
length2
This module matches the length of a packet against a specific value or
range of values.
[!] --length length[:length]
Match exact length or length range.
--layer3
Match the layer3 frame size (e.g. IPv4/v6 header plus payload).
--layer4
Match the layer4 frame size (e.g. TCP/UDP header plus payload).
--layer5
Match the layer5 frame size (e.g. TCP/UDP payload, often called
layer7).
If no --layer* option is given, --layer3 is assumed by default. Note
that using --layer5 may not match a packet if it is not one of the rec‐
ognized types (currently TCP, UDP, UDPLite, ICMP, AH and ESP) or which
has no 5th layer.
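To make the three layers concrete, the sketch below computes the sizes
that --layer3, --layer4 and --layer5 would compare against for a single
TCP segment. It assumes an IPv4 header and a TCP header of 20 bytes each
(no options), which is only the common minimum; the function name is
hypothetical.

```sh
# Sizes seen by --layer3/--layer4/--layer5 for an IPv4 TCP segment.
# Assumes 20-byte IPv4 and 20-byte TCP headers (no options present).
length2_sizes() {
    payload=$1      # application payload in bytes
    ip_hdr=20
    tcp_hdr=20
    echo "layer3=$(( ip_hdr + tcp_hdr + payload ))"
    echo "layer4=$(( tcp_hdr + payload ))"
    echo "layer5=$payload"
}

length2_sizes 100
```

Under these assumptions a segment carrying 100 bytes of payload has a
layer3 size of 140, a layer4 size of 120 and a layer5 size of 100, so
the same --length value selects different packets depending on the
layer option given.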
lscan
Detects simple low-level scan attempts based upon the packet's con‐
tents. (This is different from other implementations, which also try
to match the rate of new connections.) Note that an attempt is only
discovered after it has been carried out, but this information can be
used in conjunction with other rules to block the remote host's future
connections. So this match module will match on the (probably) last
packet the remote side will send to your machine.
--stealth
Match if the packet did not belong to any known TCP connection
(Stealth/FIN/XMAS/NULL scan).
--synscan
Match if the connection was a TCP half-open discovery (SYN
scan), i.e. the connection was torn down after the 2nd packet in
the 3-way handshake.
--cnscan
Match if the connection was a TCP full open discovery (connect
scan), i.e. the connection was torn down after completion of the
3-way handshake.
--grscan
Match if data in the connection only flowed in the direction of
the remote side, e.g. if the connection was terminated after a
locally running daemon sent its identification. (E.g. openssh,
smtp, ftpd.) This may falsely trigger on warranted single-direc‐
tion data flows, usually bulk data transfers such as FTP DATA
connections or IRC DCC. Grab Scan Detection should only be used
on ports where a protocol runs that is guaranteed to do a bidi‐
rectional exchange of bytes.
NOTE: Some clients (Windows XP for example) may do what looks like a
SYN scan, so be advised to carefully use xt_lscan in conjunction with
blocking rules, as it may lock out your very own internal network.
psd
Attempt to detect TCP and UDP port scans. This match was derived from
Solar Designer's scanlogd.
--psd-weight-threshold threshold
Total weight of the latest TCP/UDP packets with different desti‐
nation ports coming from the same host to be treated as port
scan sequence.
--psd-delay-threshold delay
Delay (in hundredths of a second) for the packets with different
destination ports coming from the same host to be treated as
possible port scan subsequence.
--psd-lo-ports-weight weight
Weight of the packet with privileged (<=1024) destination port.
--psd-hi-ports-weight weight
Weight of the packet with non-privileged destination port.
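The interplay of the two port weights and the weight threshold can be
illustrated with a short shell sketch. It only sums weights over a list
of distinct destination ports; the weights 3 and 1 in the call are
arbitrary illustration values rather than documented defaults, the
function name is hypothetical, and the delay threshold is ignored.

```sh
# Sum the weights of a list of distinct destination ports.
# $1: weight for privileged ports, $2: weight for other ports, rest: ports.
psd_total_weight() {
    lo_weight=$1
    hi_weight=$2
    shift 2
    total=0
    for port in "$@"; do
        if [ "$port" -le 1024 ]; then    # "<=1024" as stated above
            total=$(( total + lo_weight ))
        else
            total=$(( total + hi_weight ))
        fi
    done
    echo "$total"
}

psd_total_weight 3 1 22 23 80 8080
```

With these example weights the four probes add up to 10; whether that
counts as a scan depends on the value given to --psd-weight-threshold.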
quota2
The "quota2" match implements a named counter which can be increased or
decreased on a per-match basis. Available modes are packet counting or
byte counting. The value of the counter can be read and reset through
procfs, thereby making this match a minimalist accounting tool.
When counting down from the initial quota, the counter will stop at 0
and the match will return false, just like the original "quota" match.
In growing (upcounting) mode, it will always return true.
--grow Count upwards instead of downwards.
--no-change
Makes it so the counter or quota amount is never changed by
packets matching this rule. This is only really useful in
"quota" mode, as it will allow you to use complex prerouting
rules in association with the quota system, without counting a
packet twice.
--name name
Assign the counter a specific name. This option must be present,
as an empty name is not allowed. Names starting with a dot or
names containing a slash are prohibited.
[!] --quota iq
Specify the initial quota for this counter. If the counter
already exists, it is not reset. An "!" may be used to invert
the result of the match. The negation has no effect when --grow
is used.
--packets
Count packets instead of bytes that passed the quota2 match.
Because counters in quota2 can be shared, you can combine them for var‐
ious purposes, for example, a bytebucket filter that only lets as much
traffic go out as has come in:
-A INPUT -p tcp --dport 6881 -m quota2 --name bt --grow; -A OUTPUT -p
tcp --sport 6881 -m quota2 --name bt;
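The counting behaviour described above can be modelled in a few lines of
shell. This is a simplified model, not the kernel code: it assumes that
a packet larger than the remaining quota sets the counter to 0 and does
not match, and it ignores negation and --no-change. The function name is
hypothetical.

```sh
# Model one packet hitting a quota2 counter. Prints "<new counter> <result>".
# Assumption: a packet larger than the remaining quota zeroes the counter.
quota2_step() {
    # $1: current counter, $2: packet size in bytes (or 1 with --packets),
    # $3: "grow" for --grow, anything else for counting down
    if [ "$3" = "grow" ]; then
        echo "$(( $1 + $2 )) match"      # upcounting always matches
    elif [ "$1" -ge "$2" ]; then
        echo "$(( $1 - $2 )) match"
    else
        echo "0 nomatch"                 # counter stops at 0
    fi
}

quota2_step 1500 1000 down
quota2_step 500 1000 down
quota2_step 500 1000 grow
```

In the bytebucket example, the INPUT rule corresponds to the "grow" case
and the OUTPUT rule to the counting-down case on the same named counter.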
pknock
Pknock match implements so-called "port knocking", a stealthy system
for network authentication: a client sends packets to selected ports in
a specific sequence (= simple mode, see example 1 below), or an HMAC
payload to a single port (= complex mode, see example 2 below), to a
target machine that has pknock rule(s) installed. The target machine
then decides whether to unblock or block (again) the pknock-protected
port(s). This can be used, for instance, to avoid brute force attacks
on ssh or ftp services.
Example prerequisites:
modprobe cn
modprobe xt_pknock
Example 1 (TCP mode, manual closing of opened port not possible):
iptables -P INPUT DROP
iptables -A INPUT -p tcp -m pknock --knockports 4002,4001,4004
--strict --name SSH --time 10 --autoclose 60 --dport 22 -j
ACCEPT
The rule will allow TCP port 22 for the attempting IP address after the
successful reception of TCP SYN packets to ports 4002, 4001 and 4004,
in this order (a.k.a. port-knocking). Port numbers in the connect
sequence must follow the exact specification, no other ports may be
"knocked" in between. The rule is named 'SSH'; a file of the same name
for tracking port knocking states will be created in
/proc/net/xt_pknock . Successive port knocks must occur with a delay of
at most 10 seconds. Access to port 22 (from the example) will be
automatically closed 60 minutes after it was allowed.
Example 2 (UDP mode — non-replayable and non-spoofable, manual closing
of opened port possible, secure, also called "SPA" = Secure Port Autho‐
rization):
iptables -A INPUT -p udp -m pknock --knockports 4000 --name FTP \
    --opensecret foo --closesecret bar --autoclose 240 -j DROP
iptables -A INPUT -p tcp -m pknock --checkip --name FTP \
    --dport 21 -j ACCEPT
The first rule will create an "ALLOWED" record in
/proc/net/xt_pknock/FTP after the successful reception of a UDP packet
to port 4000. The packet payload must be constructed as an HMAC256 using
"foo" as the key. The HMAC content is the particular client's IP address
as a 32-bit quantity in network byte order, plus the number of minutes
since the Unix epoch, also as a 32-bit value. (This is known as Simple
Packet Authorization, also called "SPA".) In that case, any subsequent
attempt to connect to port 21 from the client's IP address will cause
such packets to be accepted by the second rule.
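The payload construction described above can be sketched in Python. This
is an illustration under stated assumptions, not the reference client:
the function names spa_digest and send_knock are hypothetical, "HMAC256"
is taken to mean HMAC with SHA-256, the minutes value is packed in host
byte order, and the digest is sent as ASCII hex. None of these three
details is specified in the text above.

```python
import hashlib
import hmac
import socket
import struct
import time


def spa_digest(secret, client_ip, now=None):
    """Return the hex HMAC-SHA256 over client IP and epoch minutes."""
    if now is None:
        now = time.time()
    epoch_minutes = int(now // 60)
    # Client IPv4 address as 4 bytes in network byte order.
    msg = socket.inet_aton(client_ip)
    # Minutes since the Unix epoch as a 32-bit value. Host byte order
    # is an assumption; the description does not specify it.
    msg += struct.pack("=I", epoch_minutes)
    return hmac.new(secret.encode(), msg, hashlib.sha256).hexdigest()


def send_knock(server, port, secret, client_ip):
    # Sending the ASCII hex digest as the UDP payload is an assumption.
    payload = spa_digest(secret, client_ip).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))
```

Because the client address and the current minute are both part of the
message, the digest changes when either of them changes, so a captured
packet is of no use from another address or at a later time.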
Similarly, upon reception of a UDP packet constructed the same way,
but with the key "bar", the first rule will remove a previously
installed "ALLOWED" state record from /proc/net/xt_pknock/FTP, which
means that the second rule will stop matching for subsequent connection
attempts to port 21. If no close-secret packet is received within
4 hours (--autoclose 240), the first rule will remove the "ALLOWED"
record from /proc/net/xt_pknock/FTP itself.
Things worth noting:
General:
Specifying --autoclose 0 means that no automatic close will be per‐
formed at all.
xt_pknock is capable of sending information about successful matches
via a netlink socket to userspace, should you need to implement your
own way of receiving and handling portknock notifications. Be sure to
read the documentation in the doc/pknock/ directory, or visit the orig‐
inal site — http://portknocko.berlios.de/ .
TCP mode:
This mode is not immune to eavesdropping, spoofing and replaying
of the port knock sequence by someone else. Its use may still be
sufficient for scenarios where these factors matter less, such as
merely shielding the SSH port from brute-force attacks. If you need
protection against them, you should use UDP mode.
It is always wise to specify three or more ports that are not monotoni‐
cally increasing or decreasing with a small stepsize (e.g.
1024,1025,1026) to avoid accidentally triggering the rule by a
portscan.
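The advice above can be turned into a simple sanity check for a planned
knock sequence. The following Python sketch is illustrative only: the
function name looks_like_scan and the max_step threshold of 16 are
hypothetical choices, not values taken from xt_pknock.

```python
def looks_like_scan(ports, max_step=16):
    """Return True if a portscan could plausibly trigger this sequence."""
    if len(ports) < 3:
        return True  # fewer than three knock ports
    steps = [b - a for a, b in zip(ports, ports[1:])]
    # Strictly increasing or strictly decreasing with small steps.
    increasing = all(0 < s <= max_step for s in steps)
    decreasing = all(0 < -s <= max_step for s in steps)
    return increasing or decreasing
```

With this check, the sequence 1024,1025,1026 is flagged, while the
sequence 4002,4001,4004 from example 1 is not.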
Specifying the inter-knock timeout with --time is mandatory in TCP
mode. Without it, a DDoS on the first knock port could cause a permanent
denial of service by filling the peer knock-state tracking table that
xt_pknock keeps internally, once more hostile IP addresses knock than
the table has entries (the default is 16, which can be changed via the
"peer_hasht_ents" module parameter). For the same reason, it is wise to
use as short a value as possible (1 second) for --time. You may also
consider increasing the size of the peer knock-state tracking table.
Using --strict also helps, as it requires the knock sequence to be
exact: if a hostile client sends more than one knock to the same port,
xt_pknock marks the attempt as a failed knock sequence and forgets it
immediately. To completely thwart this kind of DDoS, the knock ports
would need additional rate-limit protection. Alternatively, consider
using UDP mode.
UDP mode:
This mode is immune to eavesdropping, replaying and spoofing
attacks. It is also immune to DDoS attacks on the knock port.
For this mode to work, the clock difference on the client and on the
server must be below 1 minute. Synchronizing time on both ends by means
of NTP or rdate is strongly suggested.
There is a rate limiter built into xt_pknock which blocks any subsequent
open attempt in UDP mode if the request arrives less than one minute
after the first successful open. This is intentional; it thwarts
possible spoofing attacks.
Because the payload value of a UDP knock packet depends on the
client's IP address, UDP mode cannot be used across NAT.
For sending UDP "SPA" packets, you may use either knock.sh or
knock-orig.sh. These may be found in doc/pknock/util.
See also
iptables(8), ip6tables(8), iptables-extensions(8), iptaccount(8)
For developers, the book "Writing Netfilter modules" at
http://inai.de/documents/Netfilter_Modules.pdf provides detailed infor‐
mation on how to write such modules/extensions.