MongoDB Cosmos DB Writer

Writes to Cosmos DB using the Azure Cosmos DB API for MongoDB version 3.6 or 4.0, allowing you to write to a Cosmos DB target as if it were a MongoDB target. For general information, see Azure Cosmos DB API for MongoDB and Connect a MongoDB application to Azure Cosmos DB.

Azure Cosmos DB API for MongoDB 3.2 is not supported.

Note

If the writer exceeds the number of Request Units per second provisioned for your Cosmos DB instance (see Request Units in Azure Cosmos DB), the application may halt. The Azure Cosmos DB Capacity Calculator can give you an estimate of the appropriate number of RUs to provision:

[Figure: CosmosDBRUs.png (Azure Cosmos DB Capacity Calculator)]

You may need more RUs during initial load than for continuing replication.

See Optimize your Azure Cosmos DB application using rate limiting and Prevent rate-limiting errors for Azure Cosmos DB API for MongoDB operations for more information.

Using the adapter

This adapter may be used in four ways:

  • With an input stream of a user-defined type, MongoDB Cosmos DB Writer writes events as documents to a single Cosmos DB collection.

    Target document field names are taken from the input stream's event type.

    The value of the key field of the input event is used as the document key (_id field value). If the input stream's type has no key, the target document's key is generated by concatenating the values of all fields, separated by the Key Separator string. Alternatively, you may specify a subset of fields to be concatenated using the syntax <database name>.<collection name> keycolumns(<field1 name>, <field2 name>, ...) in the Collections property.

  • With an input stream of type JSONNodeEvent that is the output stream of a source using JSONParser, MongoDB Cosmos DB Writer writes events as documents to a single Cosmos DB collection.

    Target document field names are taken from the input events' JSON field names.

    When the JSON event contains an _id field, its value is used as the document key. Otherwise, Cosmos DB will generate an ObjectId for the document key.

  • With an input stream of type JSONNodeEvent that is the output stream of a MongoDB Reader source, MongoDB Cosmos DB Writer writes each MongoDB collection to a separate Cosmos DB collection.

    MongoDB collections may be replicated in a Cosmos DB instance by using wildcards in the Collections property. Alternatively, you may manually map source collections to target collections as discussed in the notes for the Collections property.

    The source document's primary key and field names are used as the target document's key and field names.

  • With an input stream of type WAEvent that is the output stream of a SQL CDC reader or Database Reader source, MongoDB Cosmos DB Writer writes data from each source table to a separate collection. The target collections may be in different databases. To process updates and deletes, compression must be disabled in the source adapter (that is, WAEvents for update and delete operations must contain all values, not just primary keys and, for updates, the modified values).

    Each row in a source table is written to a document in the target collection mapped to the table. Target document field names are taken from the source event's metadata map and their values from its data array (see WAEvent contents for change data).

    Source table data may be replicated to Cosmos DB collections of the same names by using wildcards in the Collections property. Note that data will be read only from tables that exist when the source starts. Additional tables added later will be ignored until the source is restarted. Alternatively, you may manually map source tables to Cosmos DB collections as discussed in the notes for the Collections property. When the source is a CDC reader, updates and deletes in source tables are replicated in the corresponding Cosmos DB target collections.

    Each source row's primary key value (which may be a composite) is used as the key (_id field value) for the corresponding Cosmos DB document. If the table has no primary key, the target document's key is generated by concatenating the values of all fields in the row, separated by the Key Separator string. Alternatively, you may select a subset of fields to be concatenated using the KeyColumns option as discussed in the notes for the Collections property.

    Cosmos DB limits the number of characters allowed in document IDs (see Per-item limits in Microsoft's documentation). When using wildcards or keycolumns, be sure that the generated document IDs will not exceed that limit.
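The key-generation rules above can be sketched in a few lines of Python. This is an illustration only, not the adapter's implementation: the function names, the sample row, and the max_length parameter are hypothetical, and the real ID limit should be taken from Microsoft's per-item limits documentation rather than from this example.

```python
from typing import Mapping, Optional, Sequence


def build_document_key(
    row: Mapping[str, object],
    key_columns: Optional[Sequence[str]] = None,
    separator: str = ":",
) -> str:
    """Concatenate field values into a single _id string.

    Uses every field (in row order) unless key_columns names a subset,
    mirroring the keycolumns(...) option described above.
    """
    names = list(key_columns) if key_columns else list(row.keys())
    return separator.join(str(row[name]) for name in names)


def key_fits(key: str, max_length: int) -> bool:
    """Return True if the key has at most max_length characters.

    max_length is supplied by the caller (assumption: look up the
    actual limit in Microsoft's documentation).
    """
    return len(key) <= max_length


# Hypothetical source row with no primary key.
row = {"order_id": 1001, "region": "EMEA", "sku": "A-17"}
all_fields_key = build_document_key(row)
subset_key = build_document_key(row, key_columns=["order_id", "sku"])
```

Concatenating every field of a wide row can produce long keys, which is one reason to prefer a keycolumns subset when the table has no primary key.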

MongoDB Cosmos DB Writer properties

property

type

default value

notes

Batch Policy

String

EventCount:1000, Interval:30

The batch policy includes eventCount and interval (see Setting output names and rollover / upload policies for syntax). Events are buffered locally on the Striim server and sent as a batch to the target every time either of the specified values is exceeded. When the app is stopped, any remaining data in the buffer is discarded. To disable batching, set to EventCount:1,Interval:0.

With the default setting, data will be written every 30 seconds or sooner if the buffer accumulates 1,000 events.
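As a rough mental model of the batch policy, the following Python sketch flushes a buffer when either the event count or the interval is reached. The class and parameter names are hypothetical and the logic is an assumption about the general technique, not a description of Striim's internals.

```python
import time
from typing import Callable, List


class BatchBuffer:
    """Buffer events and flush on event count or elapsed interval."""

    def __init__(
        self,
        event_count: int,
        interval_seconds: float,
        flush: Callable[[List[object]], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._event_count = event_count
        self._interval = interval_seconds
        self._flush = flush
        self._clock = clock
        self._events: List[object] = []
        self._last_flush = clock()

    def add(self, event: object) -> None:
        """Buffer one event, then flush if a threshold has been reached."""
        self._events.append(event)
        self._maybe_flush()

    def tick(self) -> None:
        """Call periodically so the interval can trigger a flush."""
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        if not self._events:
            return
        now = self._clock()
        count_hit = len(self._events) >= self._event_count
        # An interval of 0 is treated here as "no timer" (assumption).
        time_hit = self._interval > 0 and now - self._last_flush >= self._interval
        if count_hit or time_hit:
            self._flush(list(self._events))
            self._events.clear()
            self._last_flush = now


# Usage with a fake clock so the behavior is deterministic.
sent_batches: List[List[object]] = []
fake_now = [0.0]
buffer = BatchBuffer(3, 30, sent_batches.append, clock=lambda: fake_now[0])
for event in ("a", "b", "c"):
    buffer.add(event)       # third event reaches the count threshold
buffer.add("d")
fake_now[0] = 31.0
buffer.tick()               # interval elapsed, so "d" is flushed alone
```

With an event count of 1, every add flushes immediately, which corresponds to the "disable batching" setting described above.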

Collections

String

The fully-qualified name(s) of the Cosmos DB collection(s) to write to, for example, mydb.mycollection. Separate multiple collections by commas.

You may use the % wildcard, for example, mydb.%. Note that data will be written only to collections that exist when the Striim application starts. Additional collections added later will be ignored until the application is restarted.

When the input stream of the target is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source, the writer can write to multiple collections. In this case, specify the names of both the source tables and target collections (schema.table,database.collection). You may use the % wildcard only for tables and collections, not for schemas or databases (schema.%,collection.%). If the reader uses three-part names, you must use them here as well.

Note that Oracle CDB/PDB source table names must be specified in two parts when the source is Database Reader or Incremental Batch Reader (schema.%,collection.%) but in three parts when the source is Oracle Reader or OJet (database.schema.%,collection.%).

Note that SQL Server source table names must be specified in three parts when the source is Database Reader or Incremental Batch Reader (database.schema.%,collection.%) but in two parts when the source is MS SQL Reader or MS Jet (schema.%,collection.%).
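To make the source-to-target mapping concrete, here is a small Python sketch that resolves a source table name against a single mapping of the form described above. It is a simplification under stated assumptions: resolve_target is a hypothetical helper, only a whole-name % wildcard is handled, matching is case-sensitive, and the table and database names are invented for illustration.

```python
from typing import Optional


def resolve_target(source_table: str, mapping: str) -> Optional[str]:
    """Return the target collection for source_table, or None if unmapped.

    mapping is one "source,target" pair, for example "HR.%,mydb.%".
    The part after the last dot may be the % wildcard; everything
    before it (schema, or database.schema) must match exactly.
    """
    source_pattern, target_pattern = (part.strip() for part in mapping.split(","))
    pattern_prefix, pattern_name = source_pattern.rsplit(".", 1)
    table_prefix, table_name = source_table.rsplit(".", 1)

    if pattern_prefix != table_prefix:
        return None
    if pattern_name != "%" and pattern_name != table_name:
        return None

    target_prefix, target_name = target_pattern.rsplit(".", 1)
    # A wildcard target reuses the source table's name as the collection name.
    resolved_name = table_name if target_name == "%" else target_name
    return f"{target_prefix}.{resolved_name}"


wildcard_result = resolve_target("HR.EMP", "HR.%,mydb.%")
explicit_result = resolve_target("HR.EMP", "HR.EMP,mydb.employees")
three_part_result = resolve_target("orcl.HR.EMP", "orcl.HR.%,mydb.%")
unmapped_result = resolve_target("SALES.ORDERS", "HR.%,mydb.%")
```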

Connection Retry

String

retryInterval=60, maxRetries=3

With the default setting, if a connection attempt is unsuccessful, the adapter will try again in 60 seconds (retryInterval). If the second attempt is unsuccessful, in 60 seconds it will try a third time (maxRetries). If that is unsuccessful, the adapter will fail and log an exception. Negative values are not supported.
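The retry behavior described above can be sketched as follows. This is a generic illustration of the pattern, not the adapter's code: parse_retry_policy and connect_with_retry are hypothetical names, and the sketch assumes maxRetries counts total connection attempts, which is how the description above reads.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def parse_retry_policy(policy: str) -> Dict[str, int]:
    """Parse a string such as "retryInterval=60, maxRetries=3"."""
    settings: Dict[str, int] = {}
    for part in policy.split(","):
        name, value = part.split("=")
        number = int(value.strip())
        if number < 0:
            raise ValueError("Negative values are not supported")
        settings[name.strip()] = number
    return settings


def connect_with_retry(
    connect: Callable[[], T],
    policy: str,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or the attempts are exhausted."""
    settings = parse_retry_policy(policy)
    attempts = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            attempts += 1
            if attempts >= settings["maxRetries"]:
                raise  # give up; the caller logs the exception
            sleep(settings["retryInterval"])


# Usage: a connection that fails twice and then succeeds.
attempt_log = []
sleep_log = []


def flaky_connect() -> str:
    attempt_log.append("try")
    if len(attempt_log) < 3:
        raise ConnectionError("target unavailable")
    return "connected"


result = connect_with_retry(flaky_connect, "retryInterval=60, maxRetries=3", sleep=sleep_log.append)
```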

Connection URL

String

Specify <host>:<port>, for example, mymongcos.mongo.cosmos.azure.com:10255. Copy the host and port values from the Connection String page under Settings for your Azure Cosmos DB API for MongoDB account.

Excluded Collections

String

Any collections to be excluded from the set specified in the Collections property. Specify as for the Collections property.

Ignorable Exception Code

String

By default, if the target returns an error, the application will terminate. Specify DUPLICATE_KEY, KEY_NOT_FOUND, or NO_OP_UPDATE to ignore such errors and continue. To specify more than one, separate them with commas.

Ignored exceptions will be written to the application's exception store (see CREATE EXCEPTIONSTORE).

Key Separator

String

:

Inserted between values when generating document keys by concatenating column or field values. If the values might contain a colon, change this to something that will not occur in those values.
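A short Python example shows why the separator matters. The values are invented for illustration: two different rows produce the same key when a value contains the separator, and a different separator removes the collision.

```python
from typing import Sequence


def join_key(values: Sequence[object], separator: str = ":") -> str:
    """Build a document key by joining values with the separator."""
    return separator.join(str(value) for value in values)


# Two distinct rows whose values contain colons.
row_a = ["10:30", "east"]
row_b = ["10", "30:east"]

colon_a = join_key(row_a)        # "10:30:east"
colon_b = join_key(row_b)        # "10:30:east" (collides with row_a)

pipe_a = join_key(row_a, "|")    # "10:30|east"
pipe_b = join_key(row_b, "|")    # "10|30:east"
```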

Ordered Writes

Boolean

True

If you do not care that documents may be written out of order (typically the case during initial load), set to False to improve performance.

Overload Retry Policy

String

retryInterval=1, maxRetries=10

With the default setting, if Cosmos DB rejects a write because it exceeds the throughput limit, the adapter will try again in one second (retryInterval). If the second attempt is unsuccessful, in one second it will try a third time, and so on through ten attempts (maxRetries). If the tenth retry is unsuccessful, the adapter will halt and log an exception.

Negative values are not supported.

See Prevent rate-limiting errors for Azure Cosmos DB API for MongoDB operations.

Parallel Threads

Integer

See Creating multiple writer instances.

Password

com.webaction.security.Password

The password for the specified Username.

Retriable Error Codes

String

{"ThrottlingErrorCodes" : [16500,50]}

Specify any error codes for which you want to trigger a connection retry or overload retry rather than a halt or termination.

The default value {"ThrottlingErrorCodes" : [16500,50]} specifies that error codes 16500 and 50 will result in an overload retry. The value {"ConnectRetryCodes" : ["301"], "ThrottlingErrorCodes" : [16500, 50]} would also specify that error code 301 will result in a connection retry.

For information about these and other MongoDB error codes, see Common errors and solutions.
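The following Python sketch shows one way to read such a setting and decide what an error code should trigger. It is illustrative only: classify_error is a hypothetical helper, and normalizing codes to strings is an assumption made because the example values above mix quoted and unquoted codes.

```python
import json


def classify_error(code: object, retriable_error_codes: str) -> str:
    """Return "connection retry", "overload retry", or "halt" for a code."""
    config = json.loads(retriable_error_codes)
    # Compare as strings so "301" and 301 are treated alike.
    connect_codes = {str(c) for c in config.get("ConnectRetryCodes", [])}
    throttling_codes = {str(c) for c in config.get("ThrottlingErrorCodes", [])}

    if str(code) in connect_codes:
        return "connection retry"
    if str(code) in throttling_codes:
        return "overload retry"
    return "halt"


default_setting = '{"ThrottlingErrorCodes" : [16500,50]}'
extended_setting = '{"ConnectRetryCodes" : ["301"], "ThrottlingErrorCodes" : [16500, 50]}'

throttled = classify_error(16500, default_setting)
reconnect = classify_error(301, extended_setting)
unlisted = classify_error(301, default_setting)
```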

Upsert Mode

Boolean

False

Set to True to process inserts and updates as upserts. This is required if the input stream of this writer is a Cosmos DB Reader JSONNodeEvent stream.
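The difference between the two modes can be modeled with an in-memory dictionary standing in for a collection. This is a conceptual sketch, not the writer's implementation: write_event, the exception classes, and the operation names are hypothetical.

```python
from typing import Dict


class DuplicateKeyError(Exception):
    """Raised when inserting a document whose _id already exists."""


class KeyNotFoundError(Exception):
    """Raised when updating a document whose _id does not exist."""


def write_event(
    collection: Dict[object, dict],
    operation: str,
    document: dict,
    upsert_mode: bool,
) -> None:
    """Apply an INSERT or UPDATE to the collection, keyed by _id."""
    if operation not in ("INSERT", "UPDATE"):
        raise ValueError(f"unsupported operation: {operation}")

    key = document["_id"]
    if upsert_mode:
        # Insert if missing, replace if present; never fails on key state.
        collection[key] = dict(document)
        return

    if operation == "INSERT" and key in collection:
        raise DuplicateKeyError(key)
    if operation == "UPDATE" and key not in collection:
        raise KeyNotFoundError(key)
    collection[key] = dict(document)


# With upsert mode on, replaying the same insert is harmless.
target: Dict[object, dict] = {}
write_event(target, "INSERT", {"_id": 1, "name": "first"}, upsert_mode=True)
write_event(target, "INSERT", {"_id": 1, "name": "second"}, upsert_mode=True)
write_event(target, "UPDATE", {"_id": 2, "name": "new"}, upsert_mode=True)
```

In this model, the same sequence with upsert_mode=False would fail on the second insert with a duplicate-key error.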

Username

String

A MongoDB user with the readWrite role on the target collection(s).