CREATE CACHE

Note

A cache is loaded into memory when it is deployed, so deployment of an application or flow with a large cache may take some time. If your cache is too large to fit in memory, see CREATE EXTERNAL CACHE.

CREATE CACHE <name>
USING <reader name> ( <properties> )
[ PARSE USING <parser name> ( <properties> ) ]
QUERY ( keytomap:'<key field for type name>'
  [, refreshinterval:'<microseconds>' ]
  [, refreshstarttime:'<hh:mm:ss>' ]
  [, replicas:{<integer>|all} ]
  [, skipinvalid:'true' ] )
OF <type name>;

The required and optional properties vary according to the reader selected (typically FileReader or DatabaseReader). See the Sources topic for the reader (and parser, if any) you are using. If an optional property is not specified, the cache will use its default value. Required properties without default values must be specified.

  • The keytomap value (in the example below, zip) is the name of the field that will be used to index the cache and, in multi-server environments, to distribute it. For best performance, make the keytomap field the one used to join the cache data with stream data. Joins on other fields will be much slower.

  • The refreshinterval and refreshstarttime values specify when the cache is updated. See the discussion of these properties after the sample code below.

  • When skipinvalid has its default value of false, if the data in a cache does not match the defined format (for example, if it has fewer fields than are in the type, or the column delimiter specified in the PARSE USING clause is a comma but the data file is tab-delimited), deployment will fail with an error similar to:

    Deploy failed! Error invoking method CreateDeployFlowStatement: 
    java.lang.RuntimeException: com.webaction.exception.Warning: 
    java.util.concurrent.ExecutionException: 
    com.webaction.errorhandling.StriimRuntimeException: 
    Error in: Cache , error is: STRM-CACHE-1011 : 
    The size of this record is invalid for 
    {"class":"com.webaction.runtime.QueryValidator","method":"CreateDeployFlowStatement",
    "params":["01e6d1e9-3e67-bd31-adda-685b3587069e","APPLICATION",
    {"strategy":"any","flow":"dev9003","group":"default"},[]],"callbackIndex":5}

    To skip invalid records, set skipinvalid to true.

  • The OF type (in the example below, ZipCacheType) must correctly describe the data source.
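
As a minimal sketch of the skipinvalid option, assuming the ZipCache definition from the example below and the quoting shown in the syntax summary, only the QUERY clause would change:

... QUERY (keytomap:'zip', skipinvalid:'true') OF ZipCacheType;

With this setting, records that do not match ZipCacheType should be skipped rather than causing deployment to fail. This fragment is illustrative and has not been run against a Striim server.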

The following illustrates typical usage in an application. In this example, the key field for ZipCache is zip, which is used in the join with FilteredDataStream:

CREATE TYPE ZipCacheType(
  zip String KEY,
  city String,
  state String,
  latVal double,
  longVal double
);
 
CREATE CACHE ZipCache
USING FileReader (
  directory: 'Samples',
  wildcard: 'zipdata.txt')
PARSE USING DSVParser (
  header: Yes,
  columndelimiter: '\t',
  trimquote:false
) QUERY (keytomap:'zip') OF ZipCacheType;
 
CREATE TYPE JoinedDataType(
  merchantId String KEY,
  zip String,
  city String,
  state String,
  latVal double,
  longVal double
);
CREATE STREAM JoinedDataStream OF JoinedDataType;
 
CREATE CQ JoinDataCQ
INSERT INTO JoinedDataStream
SELECT  f.merchantId,
  f.zip,
  z.city,
  z.state,
  z.latVal,
  z.longVal
FROM FilteredDataStream f, ZipCache z
WHERE f.zip = z.zip;

Using the sample code above, the cache data will be updated only when the cache is started or restarted. To refresh the cache at a set interval, add the refreshinterval option:

... QUERY (keytomap:'zip', refreshinterval:'3600000000') OF ZipCacheType;

With the above setting (3,600,000,000 microseconds), the cache will be refreshed hourly. To refresh the cache at a specific time, add the refreshstarttime option:

... QUERY (keytomap:'zip', refreshstarttime:'13:00:00') OF ZipCacheType;

With the above setting, the cache will be refreshed daily at 1:00 pm. You may combine the two options:

... QUERY (keytomap:'zip', refreshinterval:'3600000000', refreshstarttime:'13:00:00') OF ZipCacheType;

With the above setting, the cache will be refreshed daily at 1:00 pm, and then hourly after that. This will ensure that the cache is refreshed at a specific time rather than relative to when it was started.
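
Because refreshinterval is expressed in microseconds, it can help to work the value out explicitly as minutes × 60 × 1,000,000. The following sketch uses a 15-minute interval, which is an illustrative choice rather than a value from the examples above: 15 × 60 × 1,000,000 = 900,000,000. The comment line assumes TQL's SQL-style -- comment syntax.

-- 15 minutes = 15 * 60 * 1,000,000 microseconds (illustrative value)
... QUERY (keytomap:'zip', refreshinterval:'900000000') OF ZipCacheType;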

To see when a cache was last updated, use the console command MON <namespace>.<cache name>.

See Database Reader for examples of caches populated by querying databases.