Skip to main content

Filtering data in a source

The following examples give an idea of various possibilities for filtering source data using OUTPUT TO and SELECT in sources. This syntax is supported both by the server and the Forwarding Agent (see Using the Striim Forwarding Agent).

In sources, SELECT statements must use DATA[#] function functions (see also Parsing the data field of WAEvent. To select using DATA(x) and DATAORDERED(x) functions or META() function functions, you must create a CQ on the output of the source (see also Using the DATA(), DATAORDERED(), BEFORE(), and BEFOREORDERED() functions).

Select only events where the fifth column value is greater than 10,000:

CREATE SOURCE ...
OUTPUT TO CSVStream (a String, b String, c String, d String,e String)
  SELECT * WHERE TO_INT(data[4]) > 10000;

Filter out second and third columns:

... OUTPUT TO CSVStream (a String, b String, c String)
  SELECT data[0], data[3],data[4] ;

Cast the first and third columns as integers:

... OUTPUT TO CSVStream (a Integer, b String, c Integer, d String, e String)
  SELECT TO_INT(data[0]), data[1], TO_INT(data[2]), data[3], data[4];

Cast the first and third columns as integers and select only events where the fifth column value is greater than 10,000:

... OUTPUT TO CSVStream (a Integer, b String, c Integer, d String, e String)
SELECT TO_INT(data[0]), data[1], TO_INT(data[2]), data[3], data[4]
  where TO_INT(data[4]) > 10000)

Add the first and third columns and output as a single field a:

... OUTPUT TO CSVStream (a Integer, b String, c String, d String) 
  SELECT TO_INT(data[0])+TO_INT(data[2]), data[1], data[3],data[4] ;

You can also use OUTPUT TO to split events among multiple streams based on their field values.

When the fifth column value is over 10,000, output the event to to HighOrderStream, when the value is 10,000 or less, output it to LowOrderStream:

... OUTPUT to HighOrderStream (a Integer, b String, c Integer, d String, e String)
  SELECT TO_INT(data[0]), data[1], TO_INT(data[2]), data[3], data[4]
  WHERE TO_INT(data[4]) > 10000),
OUTPUT to LowOrderStream (a Integer, b String, c Integer, d String, e String)
  SELECT TO_INT(data[0]), data[1], TO_INT(data[2]), data[3], data[4]
  WHERE TO_INT(data[4]) <= 10000);

Output all events to FullStream and only events where the fifth column value is 10,000 or less to LowOrderStream:

... OUTPUT to FullStream,
OUTPUT to LowOrderStream (a Integer, b String, c Integer, d String, e String)
SELECT TO_INT(data[0]), data[1], TO_INT(data[2]), data[3], data[4]
  WHERE TO_INT(data[4]) <= 10000);