
Parsing HTTP log entries

This example uses the FreeFormTextParser regex property to match several patterns in the log entry below.

log {start 1234567890.123456} {addr 123.456.789.012} {port 12345} {method POST} {url /abc/def.ghi} {agent {Mozilla/4.0 (compatible; MSIE 6.0; MS Web Services Client Protocol 1.1.12345.1234)}} {bytes 1234} {status 200} {end 1234567890.123456} {host 123.456.789.012}
...

In this case we use a positive lookbehind construct to match the start, addr, port, method, url, bytes, status, end, and host patterns, while excluding the log and agent patterns:

regex:'((?<=start ).[^}]+)|((?<=addr ).[^}]+)|((?<=port ).[^}]+)|((?<=method ).[^}]+)|((?<=url ).[^}]+)|((?<=bytes ).[^}]+)|(?<=\\(\\#)[^\\)]+|((?<=status ).[^}]+)|((?<=end ).[^}]+)|((?<=host ).[^}]+)'
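To see which values this alternation picks out of the sample entry, the following is a minimal sketch using Python's re module rather than the parser itself. It assumes that the doubled backslashes in the TQL string reduce to single regex escapes, and the names LOG_LINE, PATTERN, and extract_fields are hypothetical, introduced only for this illustration. The parser's own regex engine may differ in details.

```python
import re

# Sample entry from the text above, written as a single line.
LOG_LINE = (
    "log {start 1234567890.123456} {addr 123.456.789.012} {port 12345} "
    "{method POST} {url /abc/def.ghi} "
    "{agent {Mozilla/4.0 (compatible; MSIE 6.0; MS Web Services Client "
    "Protocol 1.1.12345.1234)}} "
    "{bytes 1234} {status 200} {end 1234567890.123456} {host 123.456.789.012}"
)

# Same alternation as above. Assumption: the TQL escaping (\\) becomes a
# single backslash once the string reaches the regex engine.
PATTERN = re.compile(
    r"((?<=start ).[^}]+)|((?<=addr ).[^}]+)|((?<=port ).[^}]+)"
    r"|((?<=method ).[^}]+)|((?<=url ).[^}]+)|((?<=bytes ).[^}]+)"
    r"|(?<=\(\#)[^\)]+"
    r"|((?<=status ).[^}]+)|((?<=end ).[^}]+)|((?<=host ).[^}]+)"
)


def extract_fields(line):
    """Return the text of every match, in the order it appears in the line."""
    return [m.group(0) for m in PATTERN.finditer(line)]


fields = extract_fields(LOG_LINE)
for value in fields:
    print(value)
```

In this sketch the agent value never appears in the result, because no alternative has a lookbehind for "agent ", and the log prefix is skipped for the same reason.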

Note that each capture group uses two sets of parentheses. For example:

((?<=start ).[^}]+)

The inner parentheses are used for the positive lookbehind syntax:

(?<=start )

The outer parentheses are used for the capture group. In this example, group[1]=1234567890.123456, which is used by the parser in its data array.
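The split between the lookbehind and the capture group can be checked in isolation. This is a small sketch in Python's re module, not the parser's own engine. The variable names are hypothetical, and the snippet is a shortened copy of the sample entry.

```python
import re

# Shortened copy of the sample entry, enough to exercise the start pattern.
snippet = "log {start 1234567890.123456} {addr 123.456.789.012}"

# Outer parentheses: capture group 1. Inner (?<=start ): positive lookbehind.
match = re.search(r"((?<=start ).[^}]+)", snippet)

# The captured text is the value only; the label is not part of it.
captured = match.group(1)

# The lookbehind is zero-width: the match begins right after "start ".
begins_after_label = snippet[:match.start()].endswith("start ")

print(captured)
print(begins_after_label)
```

Because the lookbehind consumes no characters, the label start is required for the match but never appears in the captured value.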

Here is the TQL PARSE statement that uses this regular expression in a FreeFormTextParser:

PARSE USING FreeFormTextParser (
  RecordBegin:'^log ',
  RecordEnd:'\n',
  regex:'((?<=start ).[^}]+)|((?<=addr ).[^}]+)|((?<=port ).[^}]+)|((?<=method ).[^}]+)|((?<=url ).[^}]+)|((?<=bytes ).[^}]+)|(?<=\\(\\#)[^\\)]+|((?<=status ).[^}]+)|((?<=end ).[^}]+)|((?<=host ).[^}]+)',
  separator:'~'
)