Oracle Database troubleshooting
Handling long-running open transactions in Oracle
Typically, database transactions are completed quickly, within a few seconds. Occasionally, however, a database transaction may be open for a long time, minutes, hours, or even days, before it completes. Potential causes of long-running open transactions include:
A user started a transaction but forgot to commit it or roll it back.
A SQL client application or utility opens a transaction automatically when connecting to the database and is keeps it open until disconnected. Such transactions frequently contain no operations.
A SQL application with poor transaction handling logic does not properly close a transaction.
Network problems or disconnections can leave a transaction open if they occur after the transaction begins but before it is committed.
High levels of concurrent access and locking issues can delay completion of transactions.
Long-running open transactions can have several impacts on Striim:
When long-running open transactions contain many operations (typically the case with batch jobs), CPU, memory, and disk usage may increase, resulting in increased lag in writing to targets. In the worst case, Striim may run out of memory or disk space and halt.
Recovery (see Recovering applications) from a crash or planned downtime may take longer as Striim will restart from the point where the oldest open transaction was started. In the worst case, the redo / archive log files may no longer be available and recovery will fail.
Quiesce may fail because the transaction cannot be flushed within the 30-second timeout.
Therefore, you must monitor your applications for signs of long-running open transactions and take prompt steps to address them to minimize impact to your Striim environment. Where possible, Striim strongly recommends that you modify your upstream Oracle applications and configurations to avoid long-running open transactions.
Monitoring long-running open transactions
Your applications may be impacted by long-running open transactions if you see signs such as low disk space or memory for the Striim server, the recovery checkpoint not advancing, recovery taking a long time, or inability to quiesce an application.
To determine whether you indeed have long-running open transactions, use the MON <Oracle Reader name> command. The output will include an entry similar to the following:
Oldest Open Transactions │
[{"5.21.21991":{"# of
Ops":1,"CommitSCN":"null","Sequence #":"1","StartSCN":"60601569",
"Rba block #":"3847","Thread #":"1","TimeStamp":"2023-03-02T00:47:10.000+05:30"}}] This tells us that the transaction with StartSCN 60601569 has been running since March 3, 2023. "# of Ops":1 suggests that this transaction is open because a user didn't commit it or roll it back or because a SQL client application is behaving inappropriately. An open transaction that might cause out-of-memory or disk errors would typically have a large number of operations.
Alternatively, use the SHOW command (see Viewing open transactions).
Best practices for managing long-running open transactions
Ideally you should avoid long-running open transactions by not running queries that create them. This may involve breaking large batch jobs up into multiple smaller jobs that can complete in less time. You should also run batch jobs, particularly those that may last for hours, during maintenance windows or at times when the database is not at its busiest.
You should also educate users with permission to connect to Oracle not to leave transactions open without committing them or rolling them back, and not to use client applications in such a way that they leave transactions open.
Handling long-running open transactions in Striim applications that use Oracle Reader
When there are long-running open transactions with many operations, Oracle Reader's transaction buffer helps avoid out-of-memory errors by buffering transactions to disk on the Striim server. By default, Oracle Reader is configured so that when a transaction's memory usage exceeds 100 MB (Transaction Buffer Spillover Size), it is buffered to disk (Transaction Buffer Type) on the Striim server, and processing is resumed when memory is available. When recovery is enabled, after the application halts, terminates, or is stopped, the buffer will be reset. During recovery, any previously buffered transactions will restart from the beginning.
If you believe that long-running transactions that do not contain any DML operations relevant to your Striim applications may be causing problems in your environment, you may use Striim's DISCARD TRANSACTION functionality to discard them. This clears a specified open transaction from the transaction buffer, sends any buffered operations to downstream components such as CQs and writers, and makes the recovery checkpoint progress. Any further operations that are part of the discarded transaction will be ignored. This command should be used only with guidance from Striim support (see Contact Striim support).
Handling long-running open transactions in Striim applications that use OJet
When there are long-running open transactions with many operations, OJet's transaction buffer helps avoid out-of-memory errors by buffering transactions to disk on the Oracle database server. By default, OJet is configured so that when any one of the statements in a transaction has been in memory for 1000 seconds (Transaction Age Spillover Limit) or the total number of operations exceeds 10,000 (Transaction Buffer Spillover Count), the transaction is buffered to disk on the Oracle server, and processing is resumed when memory is available. When recovery is enabled, after the application halts, terminates, or is stopped, recovery may begin from the oldest open transaction. If you need assistance in dealing with long-running open transactions, Contact Striim support .