Development of l7-filter has moved to the Clear Foundation. These pages are out of date, but will remain as a historical record.
Last updated 23 April 2008
It's fairly easy to add support for more protocols to l7-filter. All you need to do is add a new pattern file to /etc/l7-protocols. l7-filter searches this directory and its immediate subdirectories, but no deeper, for pattern files. (Thus, it will find /etc/l7-protocols/http.pat and /etc/l7-protocols/protocols/http.pat, but not /etc/l7-protocols/foo/bar/http.pat.) Please consider submitting any patterns you write for inclusion in the official distribution.
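As a rough illustration of the search rule above, here is a Python sketch that lists the pattern files l7-filter would consider. It is not l7-filter's own code: the function name `find_pattern_files` is hypothetical, and the rule for which copy wins when two directories hold the same file name is an assumption.

```python
import os

def find_pattern_files(root="/etc/l7-protocols"):
    """Map protocol name -> path for every *.pat file directly inside
    `root` or inside one of its immediate subdirectories (no deeper)."""
    subdirs = sorted(
        os.path.join(root, entry)
        for entry in os.listdir(root)
        if os.path.isdir(os.path.join(root, entry))
    )
    found = {}
    for directory in [root] + subdirs:
        for entry in sorted(os.listdir(directory)):
            path = os.path.join(directory, entry)
            if entry.endswith(".pat") and os.path.isfile(path):
                # Assumption: if two directories hold the same file name,
                # the first one seen (root first) wins.
                found.setdefault(entry[: -len(".pat")], path)
    return found
```

With the layout from the paragraph above, this would report `http` from the top level and from `protocols/`, but never look inside `foo/bar/`.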
The basic format is very simple: the first non-comment line is the name of the protocol, and the second is the regular expression. The name of the file must match the name of the protocol. (If the protocol is "ftp", the file must be "ftp.pat".) Lines starting with '#' and blank lines are ignored. Both the kernel and userspace versions of l7-filter will use the given regular expression. For example, vnc.pat could be:
vnc
^rfb 00[1-9]\.00[0-9]\x0a$
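The following Python sketch reads the two required lines the way this section describes them. `parse_basic_pattern` is a hypothetical helper, not part of l7-filter, and treating whitespace-only lines as blank is an assumption.

```python
def parse_basic_pattern(filename, text):
    """Return (protocol, regex) from a pattern file's text.

    Blank lines and lines starting with '#' are skipped; the first
    remaining line is the protocol name, the second is the regex.
    `filename` is the bare file name, e.g. "vnc.pat".
    """
    content = [line for line in text.splitlines()
               if line.strip() and not line.startswith("#")]
    if len(content) < 2:
        raise ValueError(f"{filename}: need a protocol line and a regex line")
    protocol, regex = content[0].strip(), content[1]
    # The file must be named after the protocol it describes.
    if filename != protocol + ".pat":
        raise ValueError(f"{filename}: expected file name {protocol}.pat")
    return protocol, regex

# The vnc.pat example; a raw string keeps the backslashes as written.
vnc_text = "vnc\n" + r"^rfb 00[1-9]\.00[0-9]\x0a$" + "\n"
protocol, regex = parse_basic_pattern("vnc.pat", vnc_text)
```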
Sometimes it is desirable to define separate regular expressions for the kernel and userspace versions, or to pass a custom set of flags to the userspace version's regcomp/regexec. (The section on regular expressions below explains why.) In this case, add either or both of these lines after the two above:
userspace pattern=<userspace pattern>
userspace flags=<regexec and/or regcomp flags, whitespace delimited>
For example, smtp.pat could be:
smtp
^220[\x09-\x0d -~]* (e?smtp|simple mail)
userspace pattern=^220[\x09-\x0d -~]* (E?SMTP|[Ss]imple [Mm]ail)
userspace flags=REG_NOSUB REG_EXTENDED
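The optional lines can be picked out with a similar sketch. The dictionary keys `pattern` and `flags` are made up for this example; only the `userspace pattern=` and `userspace flags=` prefixes come from the file format.

```python
def parse_userspace_options(text):
    """Collect the optional 'userspace pattern=' and 'userspace flags=' lines.

    Returns a dict that may contain 'pattern' (a string) and 'flags'
    (a list of flag names such as 'REG_NOSUB').
    """
    prefix_pattern = "userspace pattern="
    prefix_flags = "userspace flags="
    content = [line for line in text.splitlines()
               if line.strip() and not line.startswith("#")]
    options = {}
    # Lines 0 and 1 are the protocol name and the shared regex.
    for line in content[2:]:
        if line.startswith(prefix_pattern):
            options["pattern"] = line[len(prefix_pattern):]
        elif line.startswith(prefix_flags):
            # Flags are whitespace-delimited.
            options["flags"] = line[len(prefix_flags):].split()
    return options
```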
Pattern files that are part of the official distribution need some metadata at the top for display on the webpage and for the use of frontends. The top four lines should look like this:
# <Protocol name and some concise detail about the protocol>
# Pattern attributes: [attribute word]*
# Protocol groups: [group name]*
# Wiki: [link]*
"Pattern attributes" give information about how good the pattern is on various scales. Attribute words can be any of undermatch, overmatch, superset, subset, great, good, ok, marginal, poor, veryfast, fast, nosofast, or slow. Any number of these may be used. They are defined on the protocols page.
"Protocol groups" are supposed to give frontends a way to group similar protocols. Group names can be whatever you like, but should match existing names if possible. Any number may be used. More relevant groups should be listed first for sorting purposes. Group names in use as of 2007-01-14 are:
"Wiki" gives zero or more links to pages documenting the pattern and other methods of identifying the protocol on protocolinfo.org.
The kernel and userspace versions of l7-filter use different regular expression libraries. Their syntax is mostly the same, but there are some differences.
Because patterns frequently need to use non-printable characters, both versions of l7-filter add Perl-style hex matching on top of their stock libraries. This uses \xHH notation, so to match a tab, use "\x09". Note that regexp control characters are still control characters even when written in hex:
\x24 == $ \x28 == (
\x29 == ) \x2a == *
\x2b == + \x2e == .
\x3f == ? \x5b == [
\x5c == \ \x5d == ]
\x5e == ^ \x7b == { (only a control character for the userspace version)
\x7c == | \x7d == } (only a control character for the userspace version)
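To make the hex rule concrete, the Python sketch below expands \xHH escapes into raw characters before compiling, which is roughly what the behaviour described above amounts to. Python's `re` stands in for the kernel and GNU libraries here. On its own, `re` treats `\x2e` as a literal dot, so the expansion step is what reproduces l7-filter's treatment of hex-written control characters. `expand_hex` is a hypothetical name.

```python
import re

_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")

def expand_hex(pattern):
    r"""Replace each \xHH escape with the character it names, before the
    pattern is compiled. Simplification: a doubled backslash followed by
    'x' is not special-cased."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), pattern)

# "\x2e" becomes ".", which is then a wildcard, so "a\x2ec" matches "abc".
# To match a literal dot, write "\." instead (as vnc.pat does).
wildcard = expand_hex(r"a\x2ec")
literal = expand_hex(r"a\.c")
```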
Both versions of l7-filter strip out the nulls (\x00 bytes) from network data so that they can treat it as normal C strings. So (1) you can't match on nulls and (2) fields may appear shorter than expected. For example, if a protocol has a 4 byte field and any of those bytes can be null, it can appear to be any length from 0 to 4.
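Here is a small Python illustration of the null-stripping rule and its effect on a fixed-width field. The packet layout (a marker "ID" followed by a 4-byte field) is invented for the example, and Python's `re` stands in for l7-filter's libraries. The optional `.?` repetition is used instead of a bound such as `{0,4}` because the kernel version does not support bounds.

```python
import re

def strip_nulls(data: bytes) -> bytes:
    """Remove every \\x00 byte, as both versions of l7-filter do."""
    return data.replace(b"\x00", b"")

# Hypothetical packet: "ID" followed by a 4-byte field holding two nulls.
packet = b"ID" + b"\x00\x07\x00\x2a"
cleaned = strip_nulls(packet)            # the field shrinks to 2 bytes

exact = re.compile(rb"^ID....$")         # insists on 4 bytes: misses
flexible = re.compile(rb"^ID.?.?.?.?$")  # allows 0 to 4 bytes: matches
```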
The kernel version of l7-filter uses Henry Spencer's 1987 implementation of Bell Version 8 regular expressions ("V8 regexps"), with a few modifications, noted here. V8 regexps are likely more limited than the regexps you are used to. Notably, you cannot use bounds ("foo{3}"), character classes ("[[:punct:]]") or backreferences.
Because this library has no flag for case-insensitivity, the kernel version of l7-filter is always case-insensitive: upper case in patterns is identical to lower case. (This is true even if you write an uppercase letter in hex!)
The kernel version completely ignores any lines in the pattern file after the second non-comment line.
The userspace version of l7-filter uses the GNU regular expression library, so its behaviour should be more familiar. This library is documented in man 3 regcomp and man 7 regex.
If only one regular expression is specified in the pattern file (see the file format above), the userspace version compiles it with the flags REG_EXTENDED | REG_ICASE | REG_NOSUB and executes it with no flags.
If the userspace pattern and userspace flags lines are given, the userspace pattern will be used instead of the first one. It will be compiled and executed with the given flags. (l7-filter will sort out which flags go to regcomp and which to regexec.)
If only the userspace pattern line is given, the userspace pattern will be compiled with REG_EXTENDED | REG_ICASE | REG_NOSUB and executed with no flags. If only the userspace flags line is given, the single regular expression will be compiled and executed with the given flags.
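The four cases above can be summarized in a short Python sketch. `userspace_settings` is hypothetical, and the split of flag names between regcomp and regexec follows the POSIX assignments (REG_EXTENDED, REG_ICASE, REG_NOSUB and REG_NEWLINE for regcomp; REG_NOTBOL and REG_NOTEOL for regexec). l7-filter's own sorting code may accept a different set of names.

```python
DEFAULT_REGCOMP_FLAGS = ["REG_EXTENDED", "REG_ICASE", "REG_NOSUB"]
# POSIX assigns these flags to regcomp() and regexec() respectively.
REGCOMP_FLAGS = {"REG_EXTENDED", "REG_ICASE", "REG_NOSUB", "REG_NEWLINE"}
REGEXEC_FLAGS = {"REG_NOTBOL", "REG_NOTEOL"}

def userspace_settings(regex, userspace_pattern=None, userspace_flags=None):
    """Return (pattern, regcomp_flags, regexec_flags) for the userspace version."""
    # A 'userspace pattern=' line, if present, replaces the shared regex.
    pattern = regex if userspace_pattern is None else userspace_pattern
    if userspace_flags is None:
        # No 'userspace flags=' line: default compile flags, no exec flags.
        return pattern, list(DEFAULT_REGCOMP_FLAGS), []
    unknown = [f for f in userspace_flags
               if f not in REGCOMP_FLAGS and f not in REGEXEC_FLAGS]
    if unknown:
        raise ValueError(f"unrecognized flags: {unknown}")
    # Given flags replace the defaults entirely and are sorted by destination.
    regcomp = [f for f in userspace_flags if f in REGCOMP_FLAGS]
    regexec = [f for f in userspace_flags if f in REGEXEC_FLAGS]
    return pattern, regcomp, regexec
```

For smtp.pat, this yields the case-sensitive userspace pattern compiled with REG_NOSUB and REG_EXTENDED only, which is why that pattern spells out `[Ss]imple [Mm]ail`.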
If you have set up your iptables rules correctly (see the HOWTO), l7-filter sees the data going in both directions in the order that it passes through the computer. For instance, in FTP, the first thing it sees is "220 server ready", then "USER bob", then "331 send password", then "PASS frogbeard", and so on.
l7-filter can match across packets. For instance, with the above FTP example, the match is first attempted on "220 server ready", then on "220 server readyUSER bob", then on "220 server readyUSER bob331 send password",[1] so you could match it with "220.*user.*331". At each match attempt, the regexp special character ^ will match the beginning of the stream and $ will match the end of the last packet seen so far.
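The accumulate-and-retry behaviour can be modelled in a few lines of Python. This is a simulation, not l7-filter's code: `first_matching_packet` is a hypothetical helper, Python's `re` with IGNORECASE stands in for the kernel library, and the packets are the FTP exchange from above. One difference to keep in mind is that Python's `$` also matches just before a final newline.

```python
import re

def first_matching_packet(pattern, packets):
    """Return the 1-based number of the packet after which `pattern` first
    matches the concatenation of all packets seen so far, or None."""
    regex = re.compile(pattern, re.IGNORECASE)  # kernel-style: case-insensitive
    stream = ""
    for number, payload in enumerate(packets, start=1):
        stream += payload  # each attempt sees the whole stream so far
        # '^' anchors at the start of the stream; '$' at the end of the data so far.
        if regex.search(stream):
            return number
    return None

ftp = ["220 server ready", "USER bob", "331 send password", "PASS frogbeard"]
print(first_matching_packet("220.*user.*331", ftp))  # → 3
```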
Because the Linux kernel's ip_conntrack module tracks connectionless UDP and ICMP sessions as "connections", this works with them as well as with TCP.
Usually the identifying characteristics of a connection are found at its beginning. For this reason, and to save processing time, l7-filter only looks at the first 10 packets or 2kB of each connection, whichever limit is reached first. Any match made within this window is applied to the rest of the connection as well.
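A rough model of the 10-packet / 2kB window, again in Python. `ConnectionClassifier` is hypothetical, and reading 2kB as 2048 bytes and truncating the packet that crosses the limit are assumptions about details this page does not spell out.

```python
import re

MAX_PACKETS = 10
MAX_BYTES = 2048  # assumption: "2kB" read as 2048 bytes

class ConnectionClassifier:
    """Hypothetical model of one pattern watching one connection."""

    def __init__(self, pattern):
        self.regex = re.compile(pattern, re.IGNORECASE)
        self.buffer = ""
        self.packets_seen = 0
        self.matched = False
        self.done = False  # True once the window has closed

    def feed(self, payload):
        """Process one packet's data; return the verdict for this packet."""
        if not self.done:
            self.packets_seen += 1
            # Assumption: only the part of the packet that fits in the
            # window is examined.
            self.buffer += payload[: MAX_BYTES - len(self.buffer)]
            if self.regex.search(self.buffer):
                self.matched = True
                self.done = True
            elif self.packets_seen >= MAX_PACKETS or len(self.buffer) >= MAX_BYTES:
                self.done = True
        # After the window closes, the earlier verdict applies to the rest.
        return self.matched
```

A match found early sticks for the rest of the connection, while text that only appears after the window has closed is never examined.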
There are three general guidelines:
1) A pattern must be neither too specific nor too general.
Example 1: The pattern "bear" for Bearshare is not specific enough. This pattern could match a wide variety of non-Bearshare connections. For instance, an HTTP request for http://bear.com would be matched.
Example 2: "220 .*ftp.*(\[.*\]|\(.*\))" for FTP is too specific. Not all servers send ()s or []s after their 220. In fact, servers are not even required to send the string "ftp" at any time, but the vast majority do. Good judgement and testing are necessary for instances such as this.
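The two examples can be checked against some sample data. The streams below are made up for illustration (they are not real Bearshare or FTP captures), and `classify` is a hypothetical helper that runs a case-insensitive Python regex over each one.

```python
import re

def classify(pattern, samples):
    """Names of the sample streams that `pattern` matches (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name, data in samples.items() if regex.search(data)]

# Made-up sample streams, not real captures.
samples = {
    "http_to_bear_com": "GET / HTTP/1.1\r\nHost: bear.com\r\n\r\n",
    "ftp_plain": "220 ftp.example.org FTP server ready\r\n",
    "ftp_brackets": "220 ProFTPD Server (Example) [192.0.2.1]\r\n",
}

print(classify("bear", samples))                          # ['http_to_bear_com']
print(classify(r"220 .*ftp.*(\[.*\]|\(.*\))", samples))  # ['ftp_brackets']
print(classify(r"^220[\x09-\x0d -~]*ftp", samples))      # ['ftp_plain', 'ftp_brackets']
```

The first pattern overmatches an ordinary HTTP request; the second misses a plain FTP greeting that the looser third pattern catches.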
2) It should use a minimum of processing power. If it's possible to reduce the number of instances of *, + and | in your pattern, you should do so. Use the performance testing program included in the patterns package.
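The performance testing program is the right tool for measuring speed. For a quick first look, though, a sketch like the following simply counts unescaped `*`, `+` and `|` outside bracket expressions. It is a crude lint (the name `count_costly_operators` is hypothetical), and it does not count operators written in hex.

```python
def count_costly_operators(pattern):
    """Count unescaped '*', '+' and '|' outside bracket expressions.

    A rough lint only; it says nothing about actual matching speed.
    """
    counts = {"*": 0, "+": 0, "|": 0}
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
            j = i + 1
            if j < len(pattern) and pattern[j] == "^":
                j += 1
            if j < len(pattern) and pattern[j] == "]":
                j += 1  # a leading ']' is a literal class member
            i = j
            continue
        elif ch in counts:
            counts[ch] += 1
        i += 1
    return counts
```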
3) It should complete its match on the earliest packet possible. The FTP pattern could be "^220[\x09-\x0d -~]*\x0d\x0aUSER[\x09-\x0d -~]*\x0d\x0a331", but that won't match until the third data packet. Instead, we use "^220[\x09-\x0d -~]*ftp", which matches on the first data packet.
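The difference between the two FTP patterns shows up when each is run against a simulated session. The session text and the helper `earliest_match` are hypothetical, and Python's `re` with IGNORECASE again stands in for the kernel library.

```python
import re

def earliest_match(patterns, packets):
    """For each named pattern, return the 1-based packet number after which
    it first matches the accumulated stream (None if it never does)."""
    result = {}
    for name, pattern in patterns.items():
        regex = re.compile(pattern, re.IGNORECASE)
        stream = ""
        result[name] = None
        for number, payload in enumerate(packets, start=1):
            stream += payload
            if regex.search(stream):
                result[name] = number
                break
    return result

# Hypothetical FTP session with the CRLF line endings FTP uses.
session = ["220 ProFTPD server ready\r\n", "USER bob\r\n", "331 Password required\r\n"]
patterns = {
    "waits_for_331": r"^220[\x09-\x0d -~]*\x0d\x0aUSER[\x09-\x0d -~]*\x0d\x0a331",
    "greeting_only": r"^220[\x09-\x0d -~]*ftp",
}
print(earliest_match(patterns, session))  # → {'waits_for_331': 3, 'greeting_only': 1}
```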
[\x09-\x0d -~] == printable characters, including whitespace
[\x09-\x0d ] == any whitespace
[!-~] == non-whitespace printable characters
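These equivalences can be verified by checking which byte values each class accepts. This Python snippet is only a sanity check of the table, using `re` in place of l7-filter's libraries; `members` is a hypothetical helper.

```python
import re

PRINTABLE = re.compile(r"[\x09-\x0d -~]")  # printable, including whitespace
WHITESPACE = re.compile(r"[\x09-\x0d ]")   # tab, LF, VT, FF, CR and space
NONSPACE = re.compile(r"[!-~]")            # printable, excluding whitespace

def members(char_class):
    """Return the set of byte values 0-255 that the class matches."""
    return {b for b in range(256) if char_class.fullmatch(chr(b))}
```

The printable class is exactly the union of the other two, and it excludes DEL (\x7f) and the other control bytes.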
If you do not feel that you are able to do all of the above yourself, you may want to send some packets you have captured to the mailing list so that others can do the rest. In order for this to be useful, please follow these guidelines:
If you aren't sure how to follow these guidelines, try your best and send the result to us. If it's wrong, we'll be happy to tell you how to fix it.