Using regular expressions (regex)
Striim supports the use of regular expressions (regex) in your TQL applications. The Striim implementation of regex is Java-based (see java.util.regex.Pattern), so keep the following in mind as you develop your regular expressions:
The backslash character ( \ ) is recognized as an escape character in Java strings, so if you want to define something like \w in regex, use \\w in such cases.
In regex, \\ matches a single backslash literal. Therefore if you want to use the backslash character as a literal in the Striim Java implementation of regex, you must actually use \\\\.
The java.lang.String class provides you with these methods supporting regex: matches(), split(), replaceFirst(), replaceAll(). Note that the String.replace() methods do not support regex.
TQL supports the regex syntax and constructs from java.util.regex. Note that this has some differences from POSIX regex.
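The escaping and String-method points above can be illustrated in plain Java. This is a minimal sketch, not Striim code: the class name RegexEscapingDemo, its method names, and the sample strings are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

public class RegexEscapingDemo {
    // The Java literal "\\w+" is the regex \w+ (one or more word characters).
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    // The Java literal "\\\\" is the regex \\, which matches one literal backslash.
    static String[] splitOnBackslash(String path) {
        return path.split("\\\\");
    }

    // replaceAll() interprets its first argument as a regex.
    static String maskDigitsRegex(String s) {
        return s.replaceAll("\\d", "#");
    }

    // replace() interprets its first argument as literal text, so "\\d" here
    // only matches the two characters backslash and d.
    static String maskDigitsLiteral(String s) {
        return s.replace("\\d", "#");
    }

    public static void main(String[] args) {
        System.out.println(isWord("Striim_1"));                                    // true
        System.out.println(Arrays.toString(splitOnBackslash("C:\\logs\\app.log"))); // [C:, logs, app.log]
        System.out.println(maskDigitsRegex("id=45"));                              // id=##
        System.out.println(maskDigitsLiteral("id=45"));                            // id=45
        System.out.println("a1b22".replaceFirst("\\d+", "-"));                     // a-b22
    }
}
```

The last two helper methods differ only in which String method they call, which makes the regex versus literal behavior easy to compare side by side.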
If you are new to using regular expressions, refer to the following resources to get started:
You may use regex in LIKE and NOT LIKE expressions. For example:
WHERE ProcessName NOT LIKE '%.tmp%' : filter out data from temp files
WHERE instance_applications LIKE '%Apache%' : select only applications with Apache in their names
WHERE MerchantID LIKE '45%' : select only merchants with IDs that start with 45
The following entry from the MultiLogApp sample Apache access log data includes information about a REST API call in line 4:
0: 206.130.134.68
1: -
2: AWashington
3: 25/Oct/2013:11:28:36.960 -0700
4: GET http://cloud.saas.me/query?type=ChatterMessage&id=01e33d9a-34ee-ccd0-84b9-14109fcf2383&jsessionId=01e33d9a-34c9-1c68-84b9-14109fcf2383 HTTP/1.1
5: 200
6: 0
7: -
8: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/28.0.1500.71 Chrome/28.0.1500.71 Safari/537.36
9: 1506
Regex is also used by the MATCH function. The MATCH function in the ParseAccessLog CQ parses the information in line 4 to extract the session ID:
MATCH(data[4], ".*jsessionId=(.*) ")
The parsed output is:
sessionId: "01e33d9a-34c9-1c68-84b9-14109fcf2383"
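To see why this pattern yields only the session ID, the same regex can be tried directly against java.util.regex. This is a hedged sketch in plain Java and does not show how MATCH is implemented inside Striim; the class name SessionIdDemo and the helper extractSessionId are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SessionIdDemo {
    // Same pattern text as in the MATCH call above.
    private static final Pattern SESSION = Pattern.compile(".*jsessionId=(.*) ");

    // Returns capture group 1, or null when the pattern is not found.
    static String extractSessionId(String requestLine) {
        Matcher m = SESSION.matcher(requestLine);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String field4 = "GET http://cloud.saas.me/query?type=ChatterMessage"
            + "&id=01e33d9a-34ee-ccd0-84b9-14109fcf2383"
            + "&jsessionId=01e33d9a-34c9-1c68-84b9-14109fcf2383 HTTP/1.1";
        System.out.println(extractSessionId(field4));
        // prints 01e33d9a-34c9-1c68-84b9-14109fcf2383
    }
}
```

The capture group is greedy, so it extends to the last space that follows jsessionId=. In this entry that is the space before HTTP/1.1, which is why the group stops at the end of the session ID. If further spaces followed, a narrower group such as ([^ ]*) would be one way to keep the match from running on.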
The following, also from MultiLogApp, is an example of the data[2] element of a RawXMLStream WAEvent data array:
"Problem in API call [api=login] [session=01e3928f-e975-ffd4-bdc5-14109fcf2383] [user=HGonzalez] [sobject=User]","com.me.saas.SaasMultiApplication$SaasException: Problem in API call [api=login] [session=01e3928f-e975-ffd4-bdc5-14109fcf2383] [user=HGonzalez] [sobject=User]\n\tat com.me.saas.SaasMultiApplication.login (SaasMultiApplication.java:1253)\n\tat sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) \n\tat java.lang.reflect.Method.invoke(Method.java:606)\n\tat com.me.saas.SaasMultiApplication$UserApiCall.invoke(SaasMultiApplication.java:360)\n\tat com.me.saas.SaasMultiApplication$Session.login(SaasMultiApplication.java:1447)\n\tat com.me.saas.SaasMultiApplication.main(SaasMultiApplication.java:1587)"
This is parsed by the ParseLog4J CQ as follows:
MATCH(data[2], '\\\\[api=([a-zA-Z0-9]*)\\\\]'),
MATCH(data[2], '\\\\[session=([a-zA-Z0-9\\-]*)\\\\]'),
MATCH(data[2], '\\\\[user=([a-zA-Z0-9\\-]*)\\\\]'),
MATCH(data[2], '\\\\[sobject=([a-zA-Z0-9]*)\\\\]')
The parsed output is:
api: "login"
sessionId: "01e3928f-e975-ffd4-bdc5-14109fcf2383"
userId: "HGonzalez"
sobject: "User"
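The bracketed-field patterns can be checked the same way in plain Java. This is a sketch under assumptions, not the ParseLog4J implementation: the class name BracketFieldDemo and the helper firstGroup are hypothetical. Note that the Java string literals below need only two backslashes before each bracket, one level of escaping fewer than the TQL strings shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketFieldDemo {
    // Returns capture group 1 of the first match, or null if there is no match.
    static String firstGroup(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // First part of the data[2] sample message.
        String msg = "Problem in API call [api=login] "
            + "[session=01e3928f-e975-ffd4-bdc5-14109fcf2383] "
            + "[user=HGonzalez] [sobject=User]";

        // "\\[" is the regex \[, a literal opening bracket.
        System.out.println(firstGroup(msg, "\\[api=([a-zA-Z0-9]*)\\]"));        // login
        System.out.println(firstGroup(msg, "\\[session=([a-zA-Z0-9\\-]*)\\]")); // 01e3928f-e975-ffd4-bdc5-14109fcf2383
        System.out.println(firstGroup(msg, "\\[user=([a-zA-Z0-9\\-]*)\\]"));    // HGonzalez
        System.out.println(firstGroup(msg, "\\[sobject=([a-zA-Z0-9]*)\\]"));    // User
    }
}
```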
See Parsing sources with regular expressions, FreeFormTextParser, and MultiFileReader for additional examples.