Striim 3.9.7 documentation

Masking functions

These functions are primarily used to anonymize personally identifiable information, for example as required by the European Union's General Data Protection Regulation (GDPR).

The String value argument is the name of the field containing the values to be masked.

The String functionType argument is ANONYMIZE_COMPLETELY, ANONYMIZE_PARTIALLY, or a custom mask:

  • ANONYMIZE_COMPLETELY will replace all characters in the field with x.

  • ANONYMIZE_PARTIALLY will use a default mask specific to each function, as detailed below.

  • A custom mask lets you define which characters to pass and which to mask. A custom mask may include any characters you wish. For example, with maskPhoneNumber, the mask ###-abc-defg would mask 123-456-7890 as 123-abc-defg: each # passes the corresponding input character through unchanged, and the other mask characters replace the input. For examples, see "Changing and masking field values using MODIFY" and "Modifying and masking values in the WAEvent data array".
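As a minimal sketch of how the three functionType options look in a CQ, the following applies each one to the same field. The CQ name, the stream names (TypedPhoneStream, MaskedPhoneStream), and the output aliases are hypothetical. phoneNumber is assumed to be a String field holding values such as 123-456-7890, and the -- comment syntax is assumed to be accepted by your TQL version.

```sql
-- Hypothetical names throughout; only maskPhoneNumber and the
-- functionType values come from this page.
CREATE CQ MaskPhoneThreeWays
INSERT INTO MaskedPhoneStream
SELECT
  maskPhoneNumber(phoneNumber, "ANONYMIZE_COMPLETELY") as fullyMasked,
  maskPhoneNumber(phoneNumber, "ANONYMIZE_PARTIALLY") as partiallyMasked,
  maskPhoneNumber(phoneNumber, "###-abc-defg") as customMasked
FROM TypedPhoneStream;
```

For the input 123-456-7890, the values given on this page are xxx-xxx-xxxx for fullyMasked, xxx-xxx-7890 for partiallyMasked, and 123-abc-defg for customMasked.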

function

notes

maskCreditCardNumber(String value, String functionType)

Input must be of the format ####-####-####-#### or ################. For the value 1234-5678-9012-3456, partially anonymized output would be xxxx-xxxx-xxxx-3456 and fully anonymized would be xxxx-xxxx-xxxx-xxxx. For the value 1234567890123456, partially anonymized output would be xxxxxxxxxxxx3456 and fully anonymized would be xxxxxxxxxxxxxxxx. 

maskEmailAddress(String value, String functionType)

Input must be a valid email address. For the value msmith@example.com, partially anonymized output would be mxxxxx@example.com and fully anonymized would be xxxxxxxxxxxxxxxxxx.

maskGeneric(String value, String functionType)

Input may be of any length. Partially anonymized output masks all but the last four characters; fully anonymized output masks all characters.

maskPhoneNumber(String value, String functionType)

Input must be a ten-digit telephone number in one of the following formats: ###-###-####, (###)-###-####, ##########, +1-###-###-####, +1(###)###-####, +1##########, or +1(###)#######.

For the value 123-456-7890 or +1-123-456-7890, partially anonymized output would be xxx-xxx-7890 and fully anonymized would be xxx-xxx-xxxx.

If you use a custom mask and the input field values are of varying lengths, use ELSE functions to handle each length. See Changing and masking field values using MODIFY for an example.

maskPhoneNumber(String value, String regex, Integer group)

The String regex parameter is a regular expression that matches the phone number and divides it into regex groups. The Integer group parameter specifies the group within that expression to be exposed; all other groups are masked. The example below exposes two groups by passing both group numbers (1, 2). See the example below and this tutorial for more information.

maskSSN(String value, String functionType)

The input field format must be ###-##-#### (US Social Security number format).

For the value 123-45-6789, partially anonymized output would be xxx-xx-6789 and fully anonymized would be xxx-xx-xxxx.
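The functions in the table above can be combined in a single CQ. The following is a sketch only: the CQ name, the stream names, and the field names (cardNumber, email, ssn, accountId) are hypothetical, and the input stream is assumed to have String fields in the formats each function requires. The -- comment syntax is assumed to be accepted by your TQL version.

```sql
-- Hypothetical stream and field names; adjust to your application.
-- Results for the sample values used on this page:
--   maskCreditCardNumber: 1234-5678-9012-3456 -> xxxx-xxxx-xxxx-3456
--   maskEmailAddress:     msmith@example.com  -> mxxxxx@example.com
--   maskSSN (completely): 123-45-6789         -> xxx-xx-xxxx
--   maskGeneric:          all but the last four characters are masked
CREATE CQ MaskCustomerFields
INSERT INTO MaskedCustomerStream
SELECT
  maskCreditCardNumber(cardNumber, "ANONYMIZE_PARTIALLY") as cardNumber,
  maskEmailAddress(email, "ANONYMIZE_PARTIALLY") as email,
  maskSSN(ssn, "ANONYMIZE_COMPLETELY") as ssn,
  maskGeneric(accountId, "ANONYMIZE_PARTIALLY") as accountId
FROM CustomerStream;
```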

The following example shows how to mask telephone numbers from various countries that have different lengths:

CREATE SOURCE PhoneNumbers USING FileReader  ( 
  positionbyeof: false,
  directory: 'Samples',
  wildcard: 'EUPhoneNumbers.csv'
 ) 
 PARSE USING DSVParser  ( 
  header: true,
  trimquote: false
 ) 
OUTPUT TO phoneNumberStream ;

CREATE CQ FilterNameAndPhone 
INSERT INTO TypedStream
SELECT TO_STRING(data[0]) as country,
 TO_STRING(data[1]) as phoneNumber
FROM phoneNumberStream p;

CREATE CQ MaskPhoneNumberBasedOnPattern 
INSERT INTO MaskedPhoneNumber
SELECT country,
  maskPhoneNumber(phoneNumber, "(\\\\d{0,4}\\\\s)(\\\\d{0,4}\\\\s)([0-9 ]+)", 1, 2) 
FROM TypedStream;

CREATE TARGET MaskedPhoneNumberOut USING FileWriter  ( 
  filename: 'MaskedData'
) 
FORMAT USING DSVFormatter() 
INPUT FROM MaskedPhoneNumber;

Within the regular expression, groups 1 and 2 (exposed) are each (\\\\d{0,4}\\\\s), which matches zero to four digits followed by a whitespace character, and group 3 (masked) is ([0-9 ]+), which matches one or more characters that are each a digit (0 through 9) or a space.

If Striim/Samples/EUPhoneNumbers.csv contains the following:

country,phoneNumber
AT,43 5 1766 1001
UK,44 844 493 0787
UK,44 20 7730 1234
DE,49 69 86 799 799
DE,49 211 42168340
IE,353 818 365000

the output file will contain:

AT,435xxxxxxxx
UK,44844xxxxxxx
UK,4420xxxxxxxx
DE,4969xxxxxxxx
DE,49211xxxxxxxx
IE,353818xxxxxx
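The three-argument signature listed in the table above takes a single group number. As an untested sketch, the following variant of the CQ in this example would expose only group 1 (the leading digits, which in the sample data are the country code) and mask the remaining groups. The CQ and output stream names are hypothetical; the regular expression is copied unchanged from the example.

```sql
-- Variant of MaskPhoneNumberBasedOnPattern that exposes only group 1.
-- MaskAllButCountryCode and CountryCodeOnlyPhoneNumber are hypothetical names.
CREATE CQ MaskAllButCountryCode
INSERT INTO CountryCodeOnlyPhoneNumber
SELECT country,
  maskPhoneNumber(phoneNumber, "(\\\\d{0,4}\\\\s)(\\\\d{0,4}\\\\s)([0-9 ]+)", 1)
FROM TypedStream;
```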

Creating a masking CQ in the web UI

The Flow Designer includes a special Masking component to create masking CQs:

masking_palette.png

Click Masking, drag it into the workspace, and drop it. Then:

  1. Name the CQ.

  2. Select the input stream.

  3. Click ADD COLUMN and select a column to include in the output. 

     Masking_UI.png

  4. To pass the field unmasked, do not select a masking function. To mask it, select the appropriate masking function.

  5. Optionally, change the alias.

  6. Repeat steps 3-5 for each field to be included in the output.

  7. Select or specify the output, then click Save.

With the masking CQ above, using FileWriter with JSONFormatter, if the input was:

"Stuart, Mary",1234-5678-9012-3456,mary.stuart@example.com,800-555-1212,123-45-6789

the masked output would be:

 {
  "name":"Stuart, Mary",
  "cc":"xxxxxxxxxxxxxxx3456",
  "email":"mxxxxxxxxxx@example.com",
  "phone":"xxx-xxx-1212",
  "SSN":"xxx-xx-6789"
 }

If you wish to edit the SELECT statement, click Convert to CQ. When you click Save, the component will be converted to a regular CQ, and if you edit it again the masking UI will no longer be available.