MonitorLogs: web server log data
The web server logs are in Apache NCSA extended/combined log format plus response time:
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" %D"
(See apache.org for more information.) Here are four sample log entries:
216.103.201.86 - EHernandez [10/Feb/2014:12:13:51.037 -0800] "GET http://cloud.saas.me/login&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1" 200 21560 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)" 1606
216.103.201.86 - EHernandez [10/Feb/2014:12:13:52.487 -0800] "GET http://cloud.saas.me/create?type=Partner&id=01e3928f-e05a-9be1-bdc5-14109fcf2383&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1" 200 63523 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)" 1113
216.103.201.86 - EHernandez [10/Feb/2014:12:13:52.543 -0800] "GET http://cloud.saas.me/query?type=ChatterMessage&id=01e3928f-e05a-9be2-bdc5-14109fcf2383&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1" 200 46556 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)" 1516
216.103.201.86 - EHernandez [10/Feb/2014:12:13:52.578 -0800] "GET http://cloud.saas.me/retrieve?type=ContractHistory&id=01e3928f-e05a-9be3-bdc5-14109fcf2383&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1" 200 44556 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)" 39
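Each directive in the format string corresponds to one field of an entry. The following minimal Python sketch, which is not part of MultiLogApp, illustrates that mapping with a regular expression that has one named group per directive. The pattern, the group names, and the parse_line helper are hypothetical. The sketch assumes the exact layout shown above, with a single space between fields.

```python
import re

# Hypothetical pattern: one named group per directive in
# "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" %D"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) '          # %h   remote host (client IP)
    r'(?P<ident>\S+) '         # %l   remote logname, "-" if unavailable
    r'(?P<user>\S+) '          # %u   authenticated user
    r'\[(?P<time>[^\]]+)\] '   # %t   request time, in square brackets
    r'"(?P<request>[^"]*)" '   # %r   first line of the request
    r'(?P<status>\d{3}) '      # %>s  final status code
    r'(?P<bytes>\S+) '         # %b   response size in bytes, "-" if none
    r'"(?P<referer>[^"]*)" '   # %{Referer}i
    r'"(?P<agent>[^"]*)" '     # %{User-agent}i
    r'(?P<duration>\d+)'       # %D   time taken to serve the request
)


def parse_line(line):
    """Return a dict of raw string fields, or raise if the line does not match."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return match.groupdict()


sample = ('216.103.201.86 - EHernandez [10/Feb/2014:12:13:51.037 -0800] '
          '"GET http://cloud.saas.me/login&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1" '
          '200 21560 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)" 1606')
fields = parse_line(sample)
print(fields["user"], fields["status"], fields["duration"])  # EHernandez 200 1606
```

The pattern requires exactly one space between fields, so it is stricter than a delimiter-and-quote parser.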

In MultiLogApp, these logs are read by AccessLogSource:
CREATE SOURCE AccessLogSource USING FileReader ( directory:'Samples/MultiLogApp/appData', wildcard:'access_log', blocksize: 10240, positionByEOF:false ) PARSE USING DSVParser ( columndelimiter:' ', ignoreemptycolumn:'Yes', quoteset:'[]~"', separator:'~' ) OUTPUT TO RawAccessStream;
The log format is space-delimited, so the columndelimiter value is a single space. The quoteset value '[]~"' contains two quote specifications separated by the separator character (~): the bracket pair [ ] and the double quote. As a result, both square brackets and double quotes are recognized as delimiting strings that may contain spaces. With these settings, the first log entry above is output as a WAEvent data array with the following values:
"216.103.201.86", "-", "EHernandez", "10/Feb/2014:12:13:51.037 -0800", "GET http://cloud.saas.me/login&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1", "200", "21560", "-", "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)", "1606"
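The following Python sketch approximates the effect of these DSVParser settings. It splits on single spaces and treats text between [ and ] or between double quotes as one field. It drops the quote characters themselves and skips empty columns produced by repeated spaces. This is an assumption-based emulation for illustration, not Striim's parser. The function name split_dsv and the QUOTE_PAIRS table are hypothetical.

```python
# Hypothetical emulation of the AccessLogSource DSVParser settings:
#   columndelimiter ' '  -> fields are separated by single spaces
#   quoteset '[]~"'      -> split on separator '~': "[" ... "]" and '"' ... '"'
#   ignoreemptycolumn    -> empty columns from repeated spaces are skipped
QUOTE_PAIRS = {"[": "]", '"': '"'}  # opening quote -> closing quote


def split_dsv(line, delimiter=" "):
    fields, current = [], []
    closing = None   # closing quote we are waiting for, or None outside quotes
    quoted = False   # True if the current field contained a quoted section
    for ch in line.rstrip("\n"):
        if closing is not None:
            if ch == closing:
                closing = None          # quote characters are not kept
            else:
                current.append(ch)      # spaces inside quotes are literal
        elif ch in QUOTE_PAIRS:
            closing = QUOTE_PAIRS[ch]
            quoted = True
        elif ch == delimiter:
            if current or quoted:       # skip empty, unquoted columns
                fields.append("".join(current))
            current, quoted = [], False
        else:
            current.append(ch)
    if current or quoted:
        fields.append("".join(current))
    return fields


line = ('216.103.201.86 - EHernandez [10/Feb/2014:12:13:51.037 -0800] '
        '"GET http://cloud.saas.me/login&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1" '
        '200 21560 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)" 1606')
for value in split_dsv(line):
    print(repr(value))
```

Applied to the first sample entry, the sketch yields the same ten values listed above. A quoted empty string ("") is kept as an empty field, so later columns stay aligned.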
This in turn is processed by the ParseAccessLog CQ:
CREATE CQ ParseAccessLog INSERT INTO AccessStream SELECT data[0], data[2], MATCH(data[4], ".*jsessionId=(.*) "), TO_DATE(data[3], "dd/MMM/yyyy:HH:mm:ss.SSS Z"), data[4], TO_INT(data[5]), TO_INT(data[6]), data[7], data[8], TO_INT(data[9]) FROM RawAccessStream;
After the AccessLogEntry type is applied, the event looks like this:
srcIp: "216.103.201.86"
userId: "EHernandez"
sessionId: "01e3928f-e059-6361-bdc5-14109fcf2383"
accessTime: 1392063231037
request: "GET http://cloud.saas.me/login&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1"
code: 200
size: 21560
referrer: "-"
userAgent: "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)"
responseTime: 1606
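The following rough Python sketch mirrors the conversions that ParseAccessLog performs. The AccessLogEntry dataclass and the parse_access_log function are hypothetical. The sessionId regex is the same pattern the CQ passes to MATCH, on the assumption that MATCH returns the first capture group. The Java-style date pattern dd/MMM/yyyy:HH:mm:ss.SSS Z is rewritten with strptime directives. The timestamp is converted to epoch milliseconds, which is how accessTime is displayed above.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Same regular expression the CQ passes to MATCH(data[4], ...)
SESSION_RE = re.compile(r".*jsessionId=(.*) ")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass
class AccessLogEntry:
    """Hypothetical Python mirror of the AccessLogEntry type."""
    srcIp: str
    userId: str
    sessionId: Optional[str]
    accessTime: int  # milliseconds since the Unix epoch
    request: str
    code: int
    size: int
    referrer: str
    userAgent: str
    responseTime: int


def parse_access_log(data):
    """Apply the ParseAccessLog field mapping to one 10-element data array."""
    match = SESSION_RE.match(data[4])
    # dd/MMM/yyyy:HH:mm:ss.SSS Z expressed as strptime directives
    ts = datetime.strptime(data[3], "%d/%b/%Y:%H:%M:%S.%f %z")
    return AccessLogEntry(
        srcIp=data[0],
        userId=data[2],  # data[1] (the "-" logname field) is not selected
        sessionId=match.group(1) if match else None,
        accessTime=(ts - EPOCH) // timedelta(milliseconds=1),
        request=data[4],
        code=int(data[5]),
        size=int(data[6]),
        referrer=data[7],
        userAgent=data[8],
        responseTime=int(data[9]),
    )


data = ["216.103.201.86", "-", "EHernandez", "10/Feb/2014:12:13:51.037 -0800",
        "GET http://cloud.saas.me/login&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1",
        "200", "21560", "-",
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)", "1606"]
entry = parse_access_log(data)
print(entry.sessionId, entry.accessTime)
# 01e3928f-e059-6361-bdc5-14109fcf2383 1392063231037
```

Computing the offset with timedelta floor division keeps the arithmetic in integers. Multiplying a float timestamp by 1000 could be off by one millisecond.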
The web server log data is now in a format that Striim can analyze. AccessStream is used by the HackerCheck, LargeRTCheck, ProxyCheck, and ZeroContentCheck flows.