MongoDB operational considerations

  • If MongoDB Reader receives no events for longer than the oplog retention period, its saved change stream position is no longer available in the oplog. On restart, the application halts with the error message "Command failed with error 286 (ChangeStreamHistoryLost)".
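One way to gauge how close a deployment is to this condition is to compare the current oplog window with the longest period a reader is expected to sit idle or stopped. The following is a minimal sketch, assuming `mongosh` is installed and the URI points at a replica set member. The function names and the default safety margin of 0.5 are hypothetical choices for illustration, not Striim settings.

```shell
#!/bin/bash

# Print the current oplog window in hours (the time span between the
# oldest and newest oplog entries). Assumes mongosh is on PATH.
oplog_window_hours() {
    local uri=$1
    mongosh "$uri" --quiet --eval 'print(db.getReplicationInfo().timeDiffHours)'
}

# Succeed (exit status 0) when the expected idle time exceeds a fraction
# of the oplog window. The default margin of 0.5 is an arbitrary assumption.
resume_at_risk() {
    local idle_hours=$1 window_hours=$2 margin=${3:-0.5}
    awk -v i="$idle_hours" -v w="$window_hours" -v m="$margin" \
        'BEGIN { exit !(i > w * m) }'
}

# Usage (hypothetical values):
#   window=$(oplog_window_hours "mongodb://localhost:27017")
#   if resume_at_risk 12 "$window"; then
#       echo "Expected idle time is too close to the oplog window"
#   fi
```

Keeping the comparison in a separate function makes it possible to check the threshold logic without a running MongoDB server.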

Recommendations for Change Streams use in production environments

  1. Consolidate multiple MongoDB Readers: Running a single change stream capture process per MongoDB deployment, rather than several, significantly reduces CPU and other resource load on the database server and lowers the risk of connection issues caused by resource contention. It can also help the change stream position advance continuously, because a reader that covers more active collections is more likely to keep receiving new events from the MongoDB server.

    This can be achieved without compromising downstream isolation by using a router to distribute events to multiple writers. Further source and target isolation can be achieved by connecting the capture and apply apps using persistent streams.

    For more information, see Change Stream Performance Considerations and Change Streams Production Recommendations > Indexes and Performance.

  2. Set a socket timeout in the adapter's connection URL, as described in Connection String Options > Timeout Options, to prevent the adapter from hanging on socket reads.

  3. Include a "heartbeat" collection in the list of collections read by adapters or applications that handle low or no traffic, so that the checkpoint advances even when there are no operations on the collections of actual interest. This is a collection that a cron job updates at regular intervals. The interval should be shorter than MongoDB's socket timeout interval. For example:

    #!/bin/bash
    
    # Heartbeat update script for MongoDB collection
    # Usage: ./heartbeat_update.sh [connection_string] [username] [password] [database] [collection]
    
    # Default values
    CONNECTION_STRING=${1:-"mongodb://localhost:27017"}
    USERNAME=${2:-""}
    PASSWORD=${3:-""}
    DATABASE=${4:-"myapp"}
    COLLECTION=${5:-"heartbeat"}
    
    # Build authentication arguments if credentials are provided.
    # An array keeps a username or password containing spaces intact.
    AUTH_ARGS=()
    if [[ -n "$USERNAME" && -n "$PASSWORD" ]]; then
        AUTH_ARGS=(--username "$USERNAME" --password "$PASSWORD")
    fi
    
    # Upsert the heartbeat document. lastUpdate changes on every run,
    # so each run produces a new change event.
    mongosh "$CONNECTION_STRING" "${AUTH_ARGS[@]}" --eval "
    db.getSiblingDB('$DATABASE').getCollection('$COLLECTION').updateOne(
      { _id: 'heartbeat' },
      {
        \$set: {
          lastUpdate: new Date(),
          status: 'alive'
        }
      },
      { upsert: true }
    )
    "
    
    # Cron entry example. Cron's finest granularity is one minute, so the
    # second entry sleeps 30 seconds to give one run every 30 seconds:
    # * * * * * /path/to/heartbeat_update.sh mongodb://localhost:27017 myuser mypass myapp heartbeat
    # * * * * * sleep 30; /path/to/heartbeat_update.sh mongodb://localhost:27017 myuser mypass myapp heartbeat

  4. If necessary, increase the oplog retention period on the MongoDB server when apps must be able to resume or restart after being stopped for longer than the current retention period.
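Recommendations 2 and 4 can be illustrated with two small shell helpers. This is a sketch under stated assumptions rather than a definitive procedure: the function names are hypothetical, `socketTimeoutMS` is the standard MongoDB connection string option for socket timeouts, and `minRetentionHours` on `replSetResizeOplog` is assumed to be available (MongoDB 4.4 or later). The timeout and retention values shown are examples only.

```shell
#!/bin/bash

# Append socketTimeoutMS to a MongoDB connection URL.
# Handles URLs with or without an existing query string.
with_socket_timeout() {
    local url=$1 timeout_ms=$2
    case "$url" in
        *\?*)
            # A query string already exists: add another option.
            printf '%s&socketTimeoutMS=%s\n' "$url" "$timeout_ms"
            ;;
        *)
            # Options must follow a "/" after the host list.
            local rest=${url#*://}
            case "$rest" in
                */*) printf '%s?socketTimeoutMS=%s\n' "$url" "$timeout_ms" ;;
                *)   printf '%s/?socketTimeoutMS=%s\n' "$url" "$timeout_ms" ;;
            esac
            ;;
    esac
}

# Print the mongosh expression that sets the minimum oplog retention.
# Printing instead of executing lets an operator review it first.
retention_command() {
    local hours=$1
    printf 'db.adminCommand({ replSetResizeOplog: 1, minRetentionHours: %s })\n' "$hours"
}

# Usage (hypothetical values):
#   url=$(with_socket_timeout "mongodb://localhost:27017/myapp" 60000)
#   mongosh "mongodb://localhost:27017" --eval "$(retention_command 48)"
```

The retention setting is applied per server, so in a replica set it would typically need to be run against each member. Managed services may expose the same setting through their own configuration instead.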

MongoDB Reader monitoring metrics

Metric: Average number of messages from collection per second

Measure: average number of messages per second for each collection

Scope: since the adapter was first started

Updated: every 5 seconds

Description: calculation: total number of messages read / number of seconds the adapter has been running

  • Calculation values are retained when the adapter is stopped, but only the time the adapter was running is counted. For example, if the adapter was started three hours ago but was stopped for an hour, only two hours are used in the calculation.

  • Values are reset to zero when the adapter is undeployed.

returned as JSON, for example:

{
  "src.emp": 15.657016267402211,
  "src.hbase": 5709.086194391149,
  "src.emp1": 8209.751005331136
}

Metric: Average time spent for IO

Measure: average duration of IO calls, in milliseconds

Scope: since the adapter was started

Updated: every 5 seconds

Description: calculation: total milliseconds adapter was actively reading from MongoDB (not including time the adapter was idle) / total number of IO calls
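The two calculations above can be reproduced by hand when validating reported values. The following shell sketch is illustrative only: the function names and input numbers are hypothetical, and it is not the adapter's actual implementation.

```shell
#!/bin/bash

# Average messages per second:
# total messages read / seconds the adapter was actually running.
avg_messages_per_second() {
    local total_messages=$1 running_seconds=$2
    awk -v n="$total_messages" -v s="$running_seconds" \
        'BEGIN { if (s > 0) printf "%.3f\n", n / s; else print "0.000" }'
}

# Average IO time in milliseconds:
# milliseconds spent actively reading / number of IO calls.
avg_io_ms() {
    local active_ms=$1 io_calls=$2
    awk -v t="$active_ms" -v c="$io_calls" \
        'BEGIN { if (c > 0) printf "%.3f\n", t / c; else print "0.000" }'
}

# Example: the adapter was started 3 hours ago but stopped for 1 hour,
# so only 2 hours (7200 seconds) count toward the average.
#   avg_messages_per_second 144000 7200   # prints 20.000
#   avg_io_ms 4500 900                    # prints 5.000
```

Both helpers return 0.000 when the divisor is zero, which mirrors the reset-to-zero state after the adapter is undeployed.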