rwmatch(1) SiLK Tool Suite rwmatch(1)
NAME
rwmatch - Match SiLK records from two streams into a common stream
SYNOPSIS
rwmatch --relate=FIELD_PAIR [--relate=FIELD_PAIR ...]
[--time-delta=DELTA] [--symmetric-delta]
[{ --absolute-delta | --relative-delta | --infinite-delta }]
[--unmatched={q|r|b}]
[--note-add=TEXT] [--note-file-add=FILE]
[--ipv6-policy={ignore,asv4,mix,force,only}]
[--compression-method=COMP_METHOD]
[--site-config-file=FILENAME]
QUERY_FILE RESPONSE_FILE OUTPUT_FILE
rwmatch --help
rwmatch --version
DESCRIPTION
rwmatch provides a facility for relating (or matching) SiLK Flow
records contained in two sorted input files, labeling those flow
records, and writing the records to an output file.
The two input files are called QUERY_FILE and RESPONSE_FILE,
respectively. The purpose of rwmatch is to find a record in QUERY_FILE
that represents some network stimulus that caused a reply which is
represented by a record in RESPONSE_FILE. When rwmatch discovers this
relationship, it assigns a numeric ID to the match, searches both input
files for additional records that are part of the same event, stores
the numeric ID in each matching record's next hop IP field, and writes
all records that are part of that event to OUTPUT_FILE.
When the --symmetric-delta switch is specified, rwmatch also checks for
a stimulus in RESPONSE_FILE that triggered a reply in QUERY_FILE. This
is useful when matching flows where either side may have initiated the
conversation.
The input files must be sorted as described in "Sorting the input"
below. To use the standard input in place of one of the input streams,
specify "stdin" or "-" in its place.
The criteria for defining a match are given by one or more uses of the
--relate switch and by the timestamps on the flow records:
· Each use of --relate on the command line takes two comma-separated
SiLK Flow record fields as its argument. These two fields form a
FIELD_PAIR in the form QUERY_FIELD,RESPONSE_FIELD. For a match to
exist, the value of QUERY_FIELD on a record read from QUERY_FILE
must be identical to the value of RESPONSE_FIELD on a record read
from RESPONSE_FILE, and that must be true for all FIELD_PAIRs.
· By default, the start-time of the record from the RESPONSE_FILE
must begin within a time window determined by the start- and end-
times of the record read from the QUERY_FILE. The end-time is
extended by specifying the DELTA number of seconds as the argument
to the --time-delta switch. Thus
query_rec.sTime <= response_rec.sTime <= query_rec.eTime + DELTA
When the --symmetric-delta switch is provided, records also match
if the start-time of the query record begins within the time window
determined by the start- and end-times of the response record, plus
any value specified by --time-delta. That is:
response_rec.sTime <= query_rec.sTime <= response_rec.eTime + DELTA
The --time-delta switch allows for a delay in the response.
Although responses usually occur within a second of the query,
delays of several seconds are not uncommon due to combinations of
host and network processing delays. The DELTA value can also
compensate for timing errors between multiple sensors.
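The two criteria above can be sketched in Python. This is an illustration only, not rwmatch's implementation: the dictionary-based records, the key names ("stime", "etime", and so on), and the function names are assumptions made for the example, and times are plain seconds.

```python
def fields_match(query, response, pairs):
    """True when every (QUERY_FIELD, RESPONSE_FIELD) pair holds equal values."""
    return all(query[qf] == response[rf] for qf, rf in pairs)


def starts_in_window(first, second, delta=0.0):
    """True when `second` starts no earlier than `first` starts and no
    later than DELTA seconds after `first` ends."""
    return first["stime"] <= second["stime"] <= first["etime"] + delta


def is_initial_match(query, response, pairs, delta=0.0, symmetric=False):
    """Combine the field test with the default time test and, when
    `symmetric` is set, the mirrored time test."""
    if not fields_match(query, response, pairs):
        return False
    if starts_in_window(query, response, delta):
        return True
    return symmetric and starts_in_window(response, query, delta)
```

With pairs set to [("sip", "dip"), ("dip", "sip")], a response that starts 1.5 seconds after the query ends is rejected when delta is 0 and accepted when delta is 2.0.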
Once rwmatch establishes a match between records in the two input
files, it searches for additional records from both input files to add
to the match.
To do this, rwmatch denotes one of the records that comprise the
initial match pair as a base record. When possible, the base record is
the record with the earlier start time. In the case of a tie between
TCP or UDP records, the base record is the one with the lower port,
provided one port is above 1024 and the other is below 1024. If that
also fails, the base record is the record read from QUERY_FILE. With
millisecond time resolution, ties should be rare.
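The base-record choice can be sketched as follows. The text does not say which port is compared, so this example assumes each record's source port. That assumption, the dictionary-based records, and the function name are illustrative and not taken from rwmatch itself.

```python
TCP, UDP = 6, 17


def pick_base(query, response):
    """Return "query" or "response" to name the base record."""
    # Rule 1: the record with the earlier start time is the base.
    if query["stime"] != response["stime"]:
        return "query" if query["stime"] < response["stime"] else "response"
    # Rule 2 (tie, TCP or UDP only): prefer the record with the lower port
    # when one port is below 1024 and the other is above 1024.
    # ASSUMPTION: the port compared is each record's source port.
    if query["proto"] == response["proto"] and query["proto"] in (TCP, UDP):
        q_port, r_port = query["sport"], response["sport"]
        if q_port < 1024 < r_port:
            return "query"
        if r_port < 1024 < q_port:
            return "response"
    # Rule 3: fall back to the record read from QUERY_FILE.
    return "query"
```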
To determine whether a match exists between the base record and a
candidate record, rwmatch uses the FIELD_PAIRs specified by --relate.
When the base record and the candidate record were read from the same
file, only one side of each FIELD_PAIR is used.
In addition to the records having identical values for each field in
FIELD_PAIRs, the candidate record must be within a time window
determined by the --time-delta switch and the --absolute-delta,
--relative-delta, and --infinite-delta switches.
· When --infinite-delta is specified, there is no time window and
only the values specified by the FIELD_PAIRs are checked.
· Specifying --absolute-delta requires each candidate record to start
within the time window set by the start- and end-times of the base
record (plus any DELTA), similar to the rule used to establish the
match.
· If --relative-delta is specified, the end of the time window is
initially set to DELTA seconds after the end-time of the base
record. As records from either input file are added to the match,
the end of the time window is set to DELTA seconds beyond the
maximum end-time seen on any record in the match.
· When none of the above are explicitly specified, rwmatch uses the
rules of --absolute-delta.
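The three window rules can be compared side by side in a short Python sketch. It assumes the caller tracks the greatest end-time seen so far in the match (match_max_etime) and that the window always opens at the base record's start time. The text only describes how the end of the window moves, so treat the lower bound as an assumption. All names are hypothetical.

```python
def candidate_in_window(base, candidate, match_max_etime,
                        delta=0.0, mode="absolute"):
    """Decide whether `candidate` falls inside the time window for a match.

    mode is "absolute", "relative", or "infinite", mirroring the
    --absolute-delta, --relative-delta, and --infinite-delta switches.
    """
    if mode == "infinite":
        return True  # only the FIELD_PAIR values are checked
    if mode == "relative":
        # The window end follows the greatest end-time seen in the match.
        window_end = match_max_etime + delta
    elif mode == "absolute":
        # The window end is fixed by the base record.
        window_end = base["etime"] + delta
    else:
        raise ValueError(f"unknown mode: {mode}")
    return base["stime"] <= candidate["stime"] <= window_end
```

For a base record ending at 5 seconds, a DELTA of 2, and a match whose latest end-time is 7, a candidate starting at 8 is rejected under the absolute rule and accepted under the relative rule.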
Because long-lived sessions are often broken into multiple flows,
rwmatch may discard records that are part of a long-lived session. The
--relative-delta switch may compensate for this if the gap between
flows is less than the time specified in the --time-delta switch. The
--infinite-delta switch compensates for arbitrarily long gaps, but it may
add records to a match that are not part of a true session. DNS flows
that use port 53/udp as both a service and reply port are an example.
When rwmatch establishes a match, it increments the match ID, with the
first match having a match ID of 1. To label the records that comprise
the match, rwmatch uses a 32-bit number where the lower 24 bits hold
the match ID and the upper 8 bits are set to 0 or 255 to indicate
whether the record was read from QUERY_FILE or RESPONSE_FILE,
respectively. rwmatch stores this 32-bit number in the next hop IP
field of the records. If the record is IPv6, rwmatch maps the number
into the ::ffff:0:0/96 netblock before setting the next hop
IP. Apart from the change to the next hop IP field, the query and
response records are not modified.
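The labeling scheme is simple bit arithmetic, which the following Python sketch reproduces with the standard ipaddress module. The function names are hypothetical, and raising an error for an ID that does not fit in 24 bits is a choice made for the sketch, not documented rwmatch behavior.

```python
import ipaddress

QUERY_MARK = 0       # upper 8 bits for records read from QUERY_FILE
RESPONSE_MARK = 255  # upper 8 bits for records read from RESPONSE_FILE


def match_label(match_id, from_response):
    """Pack the source marker and the 24-bit match ID into 32 bits."""
    if not 1 <= match_id < (1 << 24):
        raise ValueError("match ID does not fit in 24 bits")
    mark = RESPONSE_MARK if from_response else QUERY_MARK
    return (mark << 24) | match_id


def next_hop_for(label, ipv6=False):
    """Render the label as a next hop IP; IPv6 records get the label
    mapped into the ::ffff:0:0/96 netblock."""
    if ipv6:
        return ipaddress.IPv6Address((0xFFFF << 32) | label)
    return ipaddress.IPv4Address(label)
```

For match ID 1, the query record's next hop IP is 0.0.0.1 and the response record's is 255.0.0.1.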
By default, only matched records are written to the OUTPUT_FILE and any
record that could not be determined to be part of a match is discarded.
Specifying the --unmatched switch tells rwmatch to write unmatched
query and/or response records to OUTPUT_FILE. The required parameter
is one of "q", "r", or "b" to write the query records, the response
records, or both to OUTPUT_FILE. Unmatched query records have their
next hop IP set to 0.0.0.0, and unmatched response records have their
next hop IP set to 255.0.0.0.
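As a small illustration of the --unmatched rules, the Python sketch below returns the next hop IP written for an unmatched record, or None when the record is discarded. The function name and the "query"/"response" source labels are assumptions made for the example.

```python
def unmatched_next_hop(source, unmatched=None):
    """Return the next hop IP given to an unmatched record, or None when
    the record is discarded.

    source    -- "query" or "response" (which input file the record came from)
    unmatched -- the --unmatched parameter: "q", "r", "b", or None if absent
    """
    if source not in ("query", "response"):
        raise ValueError(f"unknown source: {source}")
    keep = {None: set(),
            "q": {"query"},
            "r": {"response"},
            "b": {"query", "response"}}
    if unmatched not in keep:
        raise ValueError(f"invalid --unmatched value: {unmatched}")
    if source not in keep[unmatched]:
        return None
    return "0.0.0.0" if source == "query" else "255.0.0.0"
```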
Sorting the input
As rwmatch reads QUERY_FILE and RESPONSE_FILE, it expects the SiLK Flow
records to appear in a particular order that is best achieved by using
rwsort(1). In particular:
· The records in QUERY_FILE must appear in ascending order where the
key is the first value in each of the --relate FIELD_PAIRs in the
order in which the --relate switches appear and by the start time
of the flow.
· Likewise for the records in RESPONSE_FILE, except the second value
in each FIELD_PAIR is used.
When rwmatch processes the following command
$ rwmatch --relate=1,2 --relate=2,1 --relate=5,5 Q.rw R.rw out.rw
it assumes the files Q.rw and R.rw were created by
$ rwsort --fields=1,2,5,stime --output=Q.rw input1.rw ....
$ rwsort --fields=2,1,5,stime --output=R.rw input2.rw ....
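The relationship between the --relate switches and the two sort keys is mechanical, so it can be derived programmatically. The helper below is a hypothetical Python sketch (it is not part of SiLK) that turns a list of FIELD_PAIRs into the --fields arguments for the two rwsort invocations.

```python
def sort_fields(relate_pairs):
    """Given FIELD_PAIRs in command-line order, return the rwsort --fields
    values for QUERY_FILE and RESPONSE_FILE as a pair of strings."""
    # QUERY_FILE is keyed on the first member of each pair, RESPONSE_FILE
    # on the second; both end with the flow start time.
    query_key = [str(q) for q, _ in relate_pairs] + ["stime"]
    response_key = [str(r) for _, r in relate_pairs] + ["stime"]
    return ",".join(query_key), ",".join(response_key)
```

Calling sort_fields([(1, 2), (2, 1), (5, 5)]) returns ("1,2,5,stime", "2,1,5,stime"), which agrees with the two rwsort commands above.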
If the files source_ips.s.rw and dest_ips.s.rw are created by the
following commands:
$ rwsort --field=1,9 source_ips.rw > source_ips.s.rw
$ rwsort --field=2,9 dest_ips.rw > dest_ips.s.rw
The following call to rwmatch works correctly:
$ rwmatch --relate=1,2 source_ips.s.rw dest_ips.s.rw matched.rw
Note that the following command produces very few matches since
source_ips.s.rw was sorted on field 1 and dest_ips.s.rw was sorted on
field 2.
$ rwmatch --relate=2,1 source_ips.s.rw dest_ips.s.rw stdout
The recommended sort ordering for TCP and UDP is shown below. This
correctly handles multiple flows occurring during the same time
interval which involve multiple ports:
$ rwsort --fields=1,4,2,3,5,stime incoming.rw > incoming-query.rw
$ rwsort --fields=2,3,1,4,5,stime outgoing.rw > outgoing-response.rw
The corresponding rwmatch command is:
$ rwmatch --relate=1,2 --relate=4,3 --relate=2,1 --relate=3,4 \
--relate=5,5 incoming-query.rw outgoing-response.rw matched.rw
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an
exact match for an option. A parameter to an option may be specified
as --arg=param or --arg param, though the first form is required for
options that take optional parameters.
--relate=FIELD_PAIR
Specify a pair of fields where the value of these fields in two
records must be identical for the records to be considered part of
a match. The first field is for records from QUERY_FILE and the
second for records from RESPONSE_FILE. At least one FIELD_PAIR
must be provided; up to 128 FIELD_PAIRs may be provided. The
FIELD_PAIR must contain two field names or field IDs separated by a
comma, such as --relate=dip,sip or --relate=proto,proto.
Each FIELD_PAIR is unidirectional; specifying --relate=sip,dip
matches records where the query record's source IP matches the
response record's destination IP, but does not imply any
relationship between the response's source IP and query's
destination IP. To match symmetric flow records between hosts,
specify:
--relate=sip,dip --relate=dip,sip
When using a port-based protocol (e.g., TCP or UDP), refine the
match further by specifying the ports:
--relate=2,1 --relate=1,2 --relate=3,4 --relate=4,3
Matching becomes more specific as more fields are added. Since
rwmatch discards unmatched records, a highly specific match (such
as the last one specified above) generates more matches (resulting
in higher match IDs), but may result in fewer total flows due to
certain records being unmatched.
The available fields are listed here. For a better description of
some of these fields, see the rwcut(1) manual page.
sIP,1
source IP address
dIP,2
destination IP address
sPort,3
source port for TCP and UDP, or equivalent
dPort,4
destination port for TCP and UDP, or equivalent
protocol,5
IP protocol
packets,pkts,6
packet count
bytes,7
byte count
flags,8
bit-wise OR of TCP flags over all packets
sensor,12
name or ID of sensor at the collection point
class,20
class of sensor at the collection point
type,21
type of sensor at the collection point
iType
the ICMP type value for ICMP or ICMPv6 flows and empty for non-
ICMP flows. This field was introduced in SiLK 3.8.1.
iCode
the ICMP code value for ICMP or ICMPv6 flows and empty for non-
ICMP flows. See note at "iType".
in,13
router SNMP input interface or vlanId if packing tools were
configured to capture it (see sensor.conf(5))
out,14
router SNMP output interface or postVlanId
initialFlags,26
TCP flags on first packet in the flow
sessionFlags,27
bit-wise OR of TCP flags over all packets except the first in
the flow
attributes,28
flow attributes set by the flow generator
application,29
guess as to the content of the flow
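The FIELD_PAIR rules for --relate described above (two comma-separated fields per pair, between 1 and 128 pairs, each pair unidirectional) can be checked with a short Python sketch. The function names are hypothetical. The sketch compares field names as written, ignoring case, and makes no attempt to equate a name with its numeric ID.

```python
MAX_PAIRS = 128


def parse_relate(args):
    """Split each --relate argument into a (QUERY_FIELD, RESPONSE_FIELD)
    tuple, enforcing the documented limits."""
    if not 1 <= len(args) <= MAX_PAIRS:
        raise ValueError("between 1 and 128 FIELD_PAIRs are required")
    pairs = []
    for arg in args:
        parts = [part.strip() for part in arg.split(",")]
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"FIELD_PAIR needs exactly two fields: {arg!r}")
        pairs.append((parts[0], parts[1]))
    return pairs


def is_bidirectional(pairs):
    """True when every pair (a, b) is accompanied by its mirror (b, a)."""
    normalized = {(q.lower(), r.lower()) for q, r in pairs}
    return all((r, q) in normalized for q, r in normalized)
```

For example, ["sip,dip"] alone is not bidirectional, while ["sip,dip", "dip,sip"] is.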
--time-delta=DELTA
Specify the number of seconds by which a response record may start
after a query record has ended. DELTA may contain fractional
seconds to millisecond precision; for example, 0.500 represents a
500 millisecond delay. Responses match queries if
query.sTime <= response.sTime <= query.eTime + DELTA
When --time-delta is not specified, DELTA defaults to 0 and the
response must begin before the query ends.
--symmetric-delta
Allow matching of flows where the RESPONSE_FILE contains the
initial flow. In this case, a query record matches a response
record when
response.sTime <= query.sTime <= response.eTime + DELTA
--absolute-delta
When adding additional records to an established match, only
include candidate flows that start less than DELTA seconds after
the end of the initial flow. This is the default behavior. This
switch is incompatible with --relative-delta and --infinite-delta.
--relative-delta
When adding additional records to an established match, include
candidate flows that start within DELTA seconds of the greatest end
time for all records in the current match. This switch is
incompatible with --absolute-delta and --infinite-delta.
--infinite-delta
When adding additional records to an established match, include
candidate records based on the FIELD_PAIRS alone, ignoring time.
This switch is incompatible with --absolute-delta and
--relative-delta.
--unmatched=q|r|b
Write unmatched query and/or response records to OUTPUT_FILE. The
parameter determines whether the query records, the response
records, or both are written to OUTPUT_FILE. Unmatched query
records have their next hop IPv4 address set to 0.0.0.0, and
unmatched response records have their next hop IPv4 address set to
255.0.0.0. When the b value is used, OUTPUT_FILE contains a
complete merge of QUERY_FILE and RESPONSE_FILE.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an
annotation. This switch may be repeated to add multiple
annotations to a file. To view the annotations, use the
rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of
the output file as an annotation. This switch may be repeated to
add multiple annotations. Currently the application makes no
effort to ensure that FILENAME contains text; be careful that you
do not attempt to add a SiLK data file as an annotation.
--ipv6-policy=POLICY
Determine how IPv4 and IPv6 flows are handled when SiLK has been
compiled with IPv6 support. When the switch is not provided, the
SILK_IPV6_POLICY environment variable is checked for a policy. If
it is also unset or contains an invalid policy, the POLICY is mix.
When SiLK has not been compiled with IPv6 support, IPv6 flows are
always ignored, regardless of the value passed to this switch or in
the SILK_IPV6_POLICY variable. The supported values for POLICY
are:
ignore
Ignore any flow record marked as IPv6, regardless of the IP
addresses it contains.
asv4
Convert IPv6 flow records that contain addresses in the
::ffff:0:0/96 prefix to IPv4 and ignore all other IPv6 flow
records.
mix Process the input as a mixture of IPv4 and IPv6 flow records.
Should rwmatch need to compare an IPv4 and IPv6 address, it
maps the IPv4 address into the ::ffff:0:0/96 prefix.
force
Convert IPv4 flow records to IPv6, mapping the IPv4 addresses
into the ::ffff:0:0/96 prefix.
only
Process only flow records that are marked as IPv6 and ignore
IPv4 flow records in the input.
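The order in which the policy is chosen can be sketched in Python as follows. The function name and parameters are hypothetical. Raising an error for an invalid switch value is an assumption, since the text only describes what happens when the environment variable is invalid.

```python
import os

VALID_POLICIES = ("ignore", "asv4", "mix", "force", "only")


def resolve_ipv6_policy(switch_value=None, environ=None, ipv6_support=True):
    """Pick the effective policy: the switch first, then SILK_IPV6_POLICY,
    then the default of "mix"."""
    if not ipv6_support:
        # Without IPv6 support, IPv6 flows are always ignored.
        return "ignore"
    if switch_value is not None:
        if switch_value not in VALID_POLICIES:
            # ASSUMPTION: an invalid switch value is treated as an error.
            raise ValueError(f"invalid policy: {switch_value}")
        return switch_value
    env = os.environ if environ is None else environ
    env_value = env.get("SILK_IPV6_POLICY")
    return env_value if env_value in VALID_POLICIES else "mix"
```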
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given,
output to the standard output or to named pipes is not compressed,
and output to files is compressed using the default chosen when
SiLK was compiled. The valid values for COMP_METHOD are determined
by which external libraries were found when SiLK was compiled. To
see the available compression methods and the default method, use
the --help or --version switch. SiLK can support the following
COMP_METHOD values when the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always
compress the output regardless of the destination. Using zlib
produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression
library for compression, and always compress the output
regardless of the destination. This compression provides good
compression with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the
output when writing to a file.
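The rules for choosing a compression method can be summarized in a short Python sketch. The function and its parameters are hypothetical. The fallback to "none" when best is requested but neither library is available is an assumption, as is rejecting a method whose library is missing.

```python
def effective_compression(method, to_file, available, default="none"):
    """Return the compression actually applied to the output.

    method    -- the --compression-method value, or None when not given
    to_file   -- True for a regular file, False for stdout or a named pipe
    available -- set of methods compiled into SiLK, e.g. {"zlib", "lzo1x"}
    default   -- the compile-time default method
    """
    if method is None:
        return default if to_file else "none"
    if method == "none":
        return "none"
    if method in ("zlib", "lzo1x"):
        if method not in available:
            raise ValueError(f"{method} is not available")
        return method  # always compress, regardless of the destination
    if method == "best":
        if not to_file:
            return "none"
        for choice in ("lzo1x", "zlib"):
            if choice in available:
                return choice
        return "none"  # ASSUMPTION: nothing available means no compression
    raise ValueError(f"unknown compression method: {method}")
```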
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME.
When this switch is not provided, rwmatch searches for the site
configuration file in the locations specified in the "FILES"
section.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was
configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ("$") represents the shell
prompt. The text after the dollar sign represents the command line.
Lines have been wrapped for improved readability, and the back slash
("\") is used to indicate a wrapped line.
Matching TCP Flows
rwmatch is a generalized matching tool; the most basic function
provided by rwmatch is the ability to match both sides of a TCP
connection. Given incoming and outgoing web traffic in two files
web_in.rw and web_out.rw, the following sequence of commands will
generate a file, web-sessions.rw, consisting of matched sessions for
every complete web session in web_in.rw and web_out.rw:
$ rwsort --field=1,2,3,4,stime web_in.rw > web_in-s.rw
$ rwsort --field=2,1,4,3,stime web_out.rw > web_out-s.rw
$ rwmatch --relate=1,2 --relate=2,1 --relate=3,4 --relate=4,3 \
web_in-s.rw web_out-s.rw web-sessions.rw
Finding Responses to a Scan
Because rwmatch can match fields arbitrarily, you can also match
records across different protocols. Suppose there are two SiLK Flow
files, indata.rw and outdata.rw, that contain the incoming and outgoing
data, respectively, for a particular time period.
To trace responses to a scan attempt, we start by identifying a
specific horizontal scan. In this example, we use an SMTP scan on TCP
port 25. Assume that we have an IPset file, smtp-scanners.set, that
contains the external IP addresses that scanned us on port 25.
(Perhaps this file was obtained by using rwscan(1) and rwscanquery(1).)
First, use rwfilter(1) to find the flow records matching these scan
attempts in the incoming data file. Sort the output of rwfilter by
source IP, source port, destination IP, destination port, and time, and
store the results in smtp-scans.rw:
$ rwfilter --proto=6 --sip-set=smtp-scanners.set --dport=25 \
--pass=- indata.rw \
| rwsort --field=sip,sport,dip,dport,stime > smtp-scans.rw
We can identify hosts that responded to the scan (we consider
accepting the TCP connection to be a response) by finding potential
replies in the outgoing data file, sorting them, and storing the
results in scan-response.rw. For this command on the outgoing data,
note that we must swap source and destination from the values used for
the incoming data:
$ rwfilter --proto=6 --dip-set=smtp-scanners.set --sport=25 \
--pass=- outdata.rw \
| rwsort --field=dip,dport,sip,sport,stime > scan-response.rw
We can now match the flow records to produce the file matched-scans.rw:
$ rwmatch --relate=1,2 --relate=3,4 --relate=2,1 --relate=4,3 \
smtp-scans.rw scan-response.rw matched-scans.rw
The results file, matched-scans.rw, will contain all the exchanges
between the scanning hosts and the responders on port 25. Examination
of these flows may show evidence of buffer overflows, data
exfiltration, or similar attacks.
Next, we want to identify responses to the scan that were produced by
our routers, such as ICMP destination unreachable messages.
Use rwfilter to find the ICMP messages going to the scanning hosts,
sort the flow records, and store the results in icmp.rw:
$ rwfilter --proto=1 --icmp-type=3 --pass=stdout outdata.rw \
| rwsort --field=dip,stime > icmp.rw
Run rwmatch and match exclusively on the IP address.
$ rwmatch --relate=2,1 icmp.rw smtp-scans.rw result.rw
The resulting file, result.rw, will consist of single packet flows (from
smtp-scans.rw) with an ICMP response (from icmp.rw).
Similar queries can be used to identify other multiple-protocol
phenomena, such as the results of a traceroute.
Displaying the Results
These examples assume matched.rw is an output file produced by rwmatch.
When using rwcut(1) to display the records in matched.rw, you may
specify the next hop IP field ("nhIP") to see the match identifier:
$ rwcut --num-rec=8 --fields=sip,sport,dip,dport,type,nhip matched.rw
sIP|sPort| dIP|dPort| type| nhIP|
10.4.52.235|29631|192.168.233.171| 80| inweb| 0.0.0.1|
192.168.233.171| 80| 10.4.52.235|29631| outweb| 255.0.0.1|
10.9.77.117|29906| 192.168.184.65| 80| inweb| 0.0.0.2|
192.168.184.65| 80| 10.9.77.117|29906| outweb| 255.0.0.2|
10.14.110.214|29989| 192.168.249.96| 80| inweb| 0.0.0.3|
192.168.249.96| 80| 10.14.110.214|29989| outweb| 255.0.0.3|
10.18.66.79|29660| 192.168.254.69| 80| inweb| 0.0.0.4|
192.168.254.69| 80| 10.18.66.79|29660| outweb| 255.0.0.4|
The first record is a query from the external host 10.4.52.235 to the
web server on the internal host 192.168.233.171, and the second record
is the web server's response. The third and fourth records represent
another query/response pair.
The cutmatch.so plug-in is an alternate way to display the match
parameter that rwmatch writes into the next hop IP field. The
cutmatch.so plug-in defines a "match" field that displays the direction
of the flow ("->" represents a query and "<-" a response) and the match
ID. To use the plug-in, you must explicitly load it into rwcut by
specifying the --plugin switch. You can then add match to the list of
--fields to print:
$ rwcut --plugin=cutmatch.so --num-rec=8 \
--fields=sip,sport,match,dip,dport,type matched.rw
sIP|sPort| <->Match#| dIP|dPort| type|
10.4.52.235|29631|-> 1|192.168.233.171| 80| inweb|
192.168.233.171| 80|<- 1| 10.4.52.235|29631| outweb|
10.9.77.117|29906|-> 2| 192.168.184.65| 80| inweb|
192.168.184.65| 80|<- 2| 10.9.77.117|29906| outweb|
10.14.110.214|29989|-> 3| 192.168.249.96| 80| inweb|
192.168.249.96| 80|<- 3| 10.14.110.214|29989| outweb|
10.18.66.79|29660|-> 4| 192.168.254.69| 80| inweb|
192.168.254.69| 80|<- 4| 10.18.66.79|29660| outweb|
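The relationship between the "nhIP" column and the "match" column can be illustrated with a few lines of Python. This is not the plug-in's code: the function name is hypothetical, column padding is ignored, and only IPv4 next hop values on matched records are handled.

```python
import ipaddress


def match_column(nhip):
    """Convert a next hop IP written by rwmatch into "<arrow> <match ID>"."""
    value = int(ipaddress.IPv4Address(nhip))
    mark = value >> 24          # upper 8 bits: 0 = query, 255 = response
    match_id = value & 0xFFFFFF  # lower 24 bits: the match ID
    if mark not in (0, 255):
        raise ValueError(f"{nhip} does not look like an rwmatch label")
    arrow = "<-" if mark == 255 else "->"
    return f"{arrow} {match_id}"
```

Here match_column("0.0.0.1") gives "-> 1" and match_column("255.0.0.4") gives "<- 4", in line with the two listings above.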
Using the "sIP" and "dIP" fields is confusing when the file you are
examining contains both incoming and outgoing flow records. To make
the output from rwmatch clearer, use the int-ext-fields(3) plug-in
as well. That plug-in allows you to display the external IPs in one
column and the internal IPs in another column. See its manual page
for additional information.
$ export INCOMING_FLOWTYPES=all/in,all/inweb
$ export OUTGOING_FLOWTYPES=all/out,all/outweb
$ rwcut --plugin=cutmatch.so --plugin=int-ext-fields.so --num-rec=8 \
--fields=ext-ip,ext-port,match,int-ip,int-port,type matched.rw
ext-ip|ext-p| <->Match#| int-ip|int-p| type|
10.4.52.235|29631|-> 1|192.168.233.171| 80| inweb|
10.4.52.235|29631|<- 1|192.168.233.171| 80| outweb|
10.9.77.117|29906|-> 2| 192.168.184.65| 80| inweb|
10.9.77.117|29906|<- 2| 192.168.184.65| 80| outweb|
10.14.110.214|29989|-> 3| 192.168.249.96| 80| inweb|
10.14.110.214|29989|<- 3| 192.168.249.96| 80| outweb|
10.18.66.79|29660|-> 4| 192.168.254.69| 80| inweb|
10.18.66.79|29660|<- 4| 192.168.254.69| 80| outweb|
ENVIRONMENT
SILK_IPV6_POLICY
This environment variable is used as the value for --ipv6-policy
when that switch is not provided.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files.
Setting SILK_CLOBBER to a non-empty value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the
--site-config-file switch when that switch is not provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data
repository. As described in the "FILES" section, rwmatch may use
this environment variable when searching for the SiLK site
configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When
searching for configuration files, rwmatch may use this environment
variable. See the "FILES" section for details.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are
checked when the --site-config-file switch is not provided.
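The list of locations can be expanded for a given environment with a small Python sketch. It assumes that the locations are tried in the order listed and that entries depending on an unset variable are skipped. Neither point is stated explicitly here, and the function name is hypothetical.

```python
import os


def site_config_candidates(environ=None):
    """Return the candidate silk.conf paths, in the order listed in FILES."""
    env = os.environ if environ is None else environ
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        candidates.append(f"{env['SILK_DATA_ROOTDIR']}/silk.conf")
    candidates.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        candidates.append(f"{env['SILK_PATH']}/share/silk/silk.conf")
        candidates.append(f"{env['SILK_PATH']}/share/silk.conf")
    candidates.append("/usr/local/share/silk/silk.conf")
    candidates.append("/usr/local/share/silk.conf")
    return candidates
```

The first existing path could then be found with something like next((p for p in site_config_candidates() if os.path.isfile(p)), None).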
SEE ALSO
rwfilter(1), rwsort(1), rwcut(1), rwfileinfo(1), rwscan(1),
rwscanquery(1), sensor.conf(5), silk(7), zlib(3)
NOTES
SiLK 3.9.0 expanded the set of fields accepted by the --relate switch
and added support for IPv6 flow records.
SiLK 3.11.0.1 2016-02-19 rwmatch(1)