Glossary

Term

Definition

Validation

A Validation represents a structured comparison between a Source dataset and a Target dataset. The datasets may be static or actively changing while the comparison is performed; Validata always evaluates their state at the time each run begins.

The rules that govern how the comparison is executed—including scope, method, and the set of Validation Pairs—are defined in the Validation Configuration. A Validation can be run on demand or according to a recurring schedule.

Each execution of the comparison is a Validation Run, which always uses the most recent configuration in effect at the time the run starts.

Dataset

A dataset represents a structured collection of records organized in a defined format. Examples include database tables, data-warehouse tables, Kafka topics, or structured data files.

In this edition, the term dataset refers specifically to tabular structures—such as database or data-warehouse tables—that Validata can read and compare during a Validation.

Source

The Source is the dataset designated as the trusted reference, or system of record, for the validation process. Generally, this is the upstream dataset in a replication pipeline, or the dataset you treat as the reference for any comparison.

Target

The Target is the dataset compared against the trusted Source dataset to identify discrepancies. Typically, this is the downstream dataset in a replication pipeline that is being validated for accuracy against the Source.

Table

A table (or database table) in a relational database is a collection of data organized in rows and columns. It is the smallest logical data unit for which Validata generates a comparison report. Validata anchors reporting at the table level to match how you are likely to think about your data. For example, you may want to know whether each table in the Source is in sync or out of sync with its corresponding table in the Target.

Validation Configuration

The Validation Configuration defines parameters of a comparison, including:

  • Source and Target Datasets: Definitions of the datasets, including the connection profiles needed to access the Source and Target.

  • Validation Type and Scope: The techniques and rules used for the comparison across the datasets.

  • Validation Schedule: The frequency or cadence for comparing the Source and Target datasets.

  • Validation Pairs: The table and column mappings between the Source and Target datasets that the validation process will use for comparison.

  • Personalized Settings: Attributes reflecting your validation configuration, such as favorite validation pairs.

After a Validation Configuration is created, certain attributes are immutable: the Validation name, the Source and Target definitions, and the Validation Type and Scope. Other attributes can be modified, and any updates take effect in the next Validation Run. A Validation Configuration can be duplicated or deleted.

Connection Profile

A Connection Profile is a reusable configuration object that securely stores the authentication and connection attributes required to access an external system from within Validata. It may include credentials and tokens such as usernames and passwords, OAuth tokens, API keys, access key/secret key pairs, or Entra ID (formerly Azure AD) tokens.

All sensitive fields—such as passwords and tokens—are encrypted using AES-256 within Validata Historian and are never displayed in clear text, even to privileged users. A single Connection Profile can be referenced by multiple Validations. Any updates made to a Connection Profile are automatically reflected across all associated Validations.

Validation Pair

A Validation Pair is a pair of tables that a Validation compares according to the Validation Configuration. It represents the fundamental mapping between a complete Source table and its corresponding Target table, or between a subset of the Source table and the corresponding Target table. It is configured as part of the Validation Configuration and includes a direct or derived mapping of individual columns from the Source table to the corresponding columns in the Target table.

Each Validation Pair must comply with the rules of the selected Validation Type; pairs that do not conform cannot be included in the validation process. This ensures that comparisons are accurate, consistent, and aligned with the chosen validation approach.

Column Pair

A Column Pair is a set of two columns, one from the Source table and one from the corresponding Target table in the Validation Pair, that have been explicitly mapped for data comparison during validation. Validata maps a column in the Source table to a column in the Target table so that it can evaluate and report whether the data in the Target column matches that in the Source column.

Comparison Key or Key

A Comparison Key is a set of mapped columns defined within a Validation Pair to identify and match records between a Source table and a Target table.

A Comparison Key is created by pairing column(s) from the Source table with corresponding column(s) from the Target table. These column pairs must represent the same logical data attribute so that Validata can correctly align records across both tables for comparison (e.g. Source.CustomerID mapped to Target.CustomerID).

A Comparison Key may consist of:

  • A single column pair, ideally (but not necessarily) the primary key of the tables, provided the column uniquely and reliably identifies records, or

  • Multiple column pairs, functioning as a composite key whose combined values uniquely identify records.

Validata uses Comparison Keys to determine which records should be compared. A Source record is compared to a Target record only when their key values are identical. All Validation Types require a Comparison Key, except for Custom Validation, where it is optional but strongly recommended.
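
The matching rule above can be pictured with a minimal Python sketch. It is not Validata's implementation: the function names, the in-memory lists of dictionaries, and the column names (`cust_id`, `region`) are all assumptions made for illustration. It pairs a Source record with a Target record only when the values of a composite key are identical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def key_of(record: Record, key_columns: List[str]) -> Tuple[object, ...]:
    """Build the comparison key value of a record as a tuple."""
    return tuple(record[col] for col in key_columns)


def match_by_key(
    source: List[Record],
    target: List[Record],
    source_key: List[str],
    target_key: List[str],
) -> List[Tuple[Record, Record]]:
    """Pair Source and Target records whose key values are identical."""
    # Index Target records by key (assumes keys are unique in the Target).
    target_index = {key_of(r, target_key): r for r in target}
    pairs = []
    for s in source:
        t = target_index.get(key_of(s, source_key))
        if t is not None:
            pairs.append((s, t))
    return pairs


# Hypothetical data: a composite key of two mapped column pairs.
source = [
    {"CustomerID": 1, "Region": "EU", "Name": "Ada"},
    {"CustomerID": 2, "Region": "US", "Name": "Bob"},
]
target = [
    {"cust_id": 1, "region": "EU", "name": "Ada"},
    {"cust_id": 3, "region": "US", "name": "Cy"},
]
matched = match_by_key(source, target, ["CustomerID", "Region"], ["cust_id", "region"])
```

Here only the record with key `(1, "EU")` is paired; the other two records have no counterpart with an identical key.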

Validation Set

A Validation Set is a collection of one or more Validation Pairs that are validated together under a single Validation Configuration. Validation Sets are useful when you want to validate multiple Validation Pairs, ensuring consistency and efficiency across the validation process. Validata automatically generates the Validation Pairs, mapping Source tables to their corresponding Target tables and then mapping the columns within each Validation Pair.

Validation Type

Validata supports the following validation methods:

  • Vector Validation: A fast, full-dataset validation method that uses vector signatures to detect differences between the Source and Target tables. These signatures are computed within the external data systems, which can increase the compute load on those systems. This is the default validation method.

  • Fast Record Validation: A full-dataset approach that compares the comparison keys and a hash of the remaining columns to quickly detect differences between the Source and Target tables. It shifts the bulk of the comparison work to the Validata server, limiting the amount of processing required on the external data systems.

  • Full Record Validation: A full-dataset validation in which every record and every selected column in every record from the Source and Target tables is compared directly on the Validata server. This offers the most comprehensive comparison but can require significant compute resources and time to complete.

  • Interval Validation: Validata evaluates only the records in the Source and Target tables that were updated within a specified time window, such as records updated in the past two hours. The time interval can be defined using absolute timestamps or a relative duration. This method requires a timestamp or datetime column in both the Source and Target datasets.

  • Key Validation: Validata checks whether each record in the Source table has a matching record in the mapped Target table based on the defined Comparison Key. It verifies the presence or absence of corresponding records but does not compare non-key values.

  • Custom Validation: Available only when the Validation contains a single Validation Pair. Validata compares the Source and Target tables using the results of a user-specified SQL query, making this option suitable for advanced or specialized validation scenarios beyond the built-in methods.
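
To make the idea behind Fast Record Validation more concrete, the following Python sketch compares each comparison key together with a hash of the remaining columns. It is only an illustration under stated assumptions: the choice of SHA-256, the `repr`-based serialization, and all function names are hypothetical, and the glossary does not describe how Validata actually computes its hashes.

```python
import hashlib


def row_hash(record, non_key_columns):
    """Hash the non-key column values in a fixed column order (assumed scheme)."""
    payload = "\x1f".join(repr(record[c]) for c in non_key_columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def fingerprint(rows, key_columns, non_key_columns):
    """Map each comparison key to the hash of the remaining columns."""
    return {
        tuple(r[c] for c in key_columns): row_hash(r, non_key_columns)
        for r in rows
    }


def differing_keys(source, target, key_columns, non_key_columns):
    """Return keys that are missing on one side or whose hashes differ."""
    src = fingerprint(source, key_columns, non_key_columns)
    tgt = fingerprint(target, key_columns, non_key_columns)
    return sorted(k for k in src.keys() | tgt.keys() if src.get(k) != tgt.get(k))


# Hypothetical tables sharing the column names "id" and "amount".
source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
target = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 40}]
diff = differing_keys(source, target, ["id"], ["amount"])
```

Only one short hash per record has to be compared, rather than every column value, which is the general reason a key-plus-hash comparison is cheaper than a column-by-column one.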

Validation Run

A Validation Run is a single execution of a Validation. During each run, Validata compares the Source and Target datasets specified in the Validation Configuration. For every Validation Pair in the Configuration, Validata compares the data in the mapped Source and Target tables, identifies mismatches, generates a Validation Run Report, and—when applicable—creates reconciliation scripts to help resolve data discrepancies.

A Validation may include multiple runs. For example, a Validation scheduled every six hours produces four runs per day, each with its own Validation Run Report. A one-time Validation can also have multiple runs if you initiate them manually.

Validata executes only one active run per Validation at any given time. You cannot start a new run while another run of the same Validation is in progress. If a Validation is scheduled to run on a recurring cadence, Validata automatically skips a scheduled run when it detects that a previous run of the same Validation is still underway.
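
The one-active-run rule can be pictured with a small in-memory sketch. The class and method names below are hypothetical and do not correspond to any Validata API; the sketch only shows the behavior described above, where a start request is refused (or a scheduled run is skipped) while a run of the same Validation is still in progress.

```python
class RunTracker:
    """Tracks which Validations currently have a run in progress (illustrative only)."""

    def __init__(self):
        self._active = set()

    def try_start(self, validation_name: str) -> bool:
        """Start a run unless one is already active for this Validation."""
        if validation_name in self._active:
            return False  # manual start is rejected; scheduled run is skipped
        self._active.add(validation_name)
        return True

    def finish(self, validation_name: str) -> None:
        """Mark the active run of this Validation as completed."""
        self._active.discard(validation_name)


tracker = RunTracker()
first = tracker.try_start("orders")        # starts
second = tracker.try_start("orders")       # skipped: previous run still underway
other = tracker.try_start("customers")     # a different Validation is unaffected
tracker.finish("orders")
third = tracker.try_start("orders")        # starts again after the first run ends
```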

Validation Run Report

The Validation Run Report is the output generated at the end of each Validation Run. Validata produces a unique report for every run, regardless of the result. The report indicates which Validation Pairs are in sync and which are out of sync.

Validation Pair Report

During a Validation Run, Validata compares the mapped Source and Target tables in every Validation Pair defined in the Validation Configuration, and generates a Validation Pair Report for every pair that was evaluated. Each report lists the records that are out of sync between the Source and Target tables and includes an optional reconciliation script that you can run directly on the data systems to address the discrepancies.

Validation Pair comparison: Revalidation

An optional two-phase validation flow designed for continuously replicated datasets where recent Source updates may not yet appear in the Target. In the first phase, Validata performs the standard comparison and identifies Out-of-Sync records. In the second phase—Revalidation—Validata waits for a user-defined Wait-time to Revalidate, typically set slightly above the expected replication latency, and then rechecks only the records previously marked as Out-of-Sync to determine which discrepancies have resolved through replication lag and which represent true, persistent data mismatches.

All built-in validation methods (Vector, Fast Record, Full Record, Key, and Interval) support Revalidation; however, Custom Validation does not.
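
The two-phase flow can be sketched in Python as follows. This is a simplified model, not Validata's implementation: each table is represented as a dictionary from comparison key to non-key values, and the reader callbacks, the `wait_seconds` parameter, and the injectable `sleep` function are assumptions introduced for the example.

```python
import time


def out_of_sync_keys(source, target, only=None):
    """Return keys whose values differ or that exist on one side only."""
    keys = (source.keys() | target.keys()) if only is None else only
    return {k for k in keys if source.get(k) != target.get(k)}


def validate_with_revalidation(read_source, read_target, wait_seconds, sleep=time.sleep):
    """Phase 1: full comparison. Phase 2: recheck only the flagged keys after a wait."""
    first_pass = out_of_sync_keys(read_source(), read_target())
    if not first_pass:
        return first_pass
    sleep(wait_seconds)  # give replication time to catch up
    return out_of_sync_keys(read_source(), read_target(), only=first_pass)


# Hypothetical data: key 3 arrives in the Target during the wait, key 2 never matches.
source_rows = {1: ("a",), 2: ("b",), 3: ("c",)}
target_snapshots = iter([
    {1: ("a",), 2: ("old",)},
    {1: ("a",), 2: ("old",), 3: ("c",)},
])
persistent = validate_with_revalidation(
    lambda: source_rows,
    lambda: next(target_snapshots),
    wait_seconds=0,
    sleep=lambda seconds: None,  # skip the real wait in this example
)
```

In this example the first phase flags keys 2 and 3; after the wait, key 3 has replicated, so only key 2 remains as a persistent mismatch.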

Validation Pair comparison: Halting due to excessive errors

An optional guardrail that automatically stops the comparison of a Validation Pair when the ongoing percentage of Out-of-Sync records exceeds a user-defined threshold. This prevents unnecessary processing when high Out-of-Sync rates suggest a deeper issue—such as incorrect Validation Pair mappings or significant Target-side divergence possibly caused by delayed, stalled, or failed replication. The guardrail activates only after Validata has processed the user-specified minimum number of records for a Validation Pair.
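
A minimal sketch of such a guardrail is shown below. The function and parameter names are hypothetical, and the per-record evaluation is an assumption; the glossary does not state how often Validata checks the running percentage.

```python
def should_halt(processed, out_of_sync, threshold_percent, min_records):
    """Halt only once the minimum record count is reached and the rate exceeds the threshold."""
    if processed == 0 or processed < min_records:
        return False
    return (out_of_sync / processed) * 100 > threshold_percent


def compare_with_guardrail(match_results, threshold_percent, min_records):
    """Consume per-record match results; return (processed, out_of_sync, halted)."""
    processed = 0
    out_of_sync = 0
    for is_match in match_results:
        processed += 1
        if not is_match:
            out_of_sync += 1
        if should_halt(processed, out_of_sync, threshold_percent, min_records):
            return processed, out_of_sync, True
    return processed, out_of_sync, False


# Hypothetical run: 5 matching records followed by 95 mismatches,
# with a 50% threshold that applies after at least 10 records.
outcome = compare_with_guardrail([True] * 5 + [False] * 95, threshold_percent=50, min_records=10)
```

With these inputs the comparison stops at the eleventh record: at ten records the rate is exactly 50 percent, which does not exceed the threshold, and at eleven it is 6 of 11.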

Validation Pair comparison: Null records

A Null Record is defined within the context of a Validation Pair. It is any record in the Source or Target table of a Validation Pair that contains a null value in any of the user-defined comparison key columns. Validata automatically filters out these records during data ingestion, meaning they are not processed, compared, or included in any validation metrics or reconciliation scripts.
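
As an illustration, the filter below drops any record that has a null in at least one comparison key column. The function name and the use of Python's `None` to represent a database null are assumptions for this sketch.

```python
def drop_null_records(rows, key_columns):
    """Keep only records whose comparison key columns are all non-null."""
    return [r for r in rows if all(r.get(c) is not None for c in key_columns)]


# Hypothetical rows with a composite key of (order_id, line_no).
rows = [
    {"order_id": 1, "line_no": 1, "note": None},   # kept: null is in a non-key column
    {"order_id": 2, "line_no": None, "note": "x"}, # dropped: null in a key column
    {"order_id": None, "line_no": 3, "note": "y"}, # dropped: null in a key column
]
kept = drop_null_records(rows, ["order_id", "line_no"])
```

A null in a non-key column does not make a record a Null Record; only the comparison key columns are checked.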

Validation Pair comparison: Duplicate records

A Duplicate Record is defined within the context of a Validation Pair. It is any record in the Source or Target table whose comparison key values correspond to a Duplicate Key.

A Duplicate Key is a comparison key value (or combination of values) that occurs in two or more records within either the Source or the Target table of the Validation Pair after Validata has filtered out null records.

Validata excludes all Duplicate Records from its in-sync and out-of-sync comparisons between the Source and Target tables.
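
The sketch below shows one possible reading of these definitions: a key that occurs two or more times in either table is treated as a Duplicate Key, and every record carrying that key is excluded from both tables. Whether Validata also drops the single counterpart record on the other side is not stated here, so that behavior, like the function names and the shared column name `id`, is an assumption.

```python
from collections import Counter


def key_counts(rows, key_columns):
    """Count how many records carry each comparison key value."""
    return Counter(tuple(r[c] for c in key_columns) for r in rows)


def exclude_duplicates(source, target, key_columns):
    """Return (source_kept, target_kept, duplicate_keys); nulls are assumed already filtered."""
    duplicate_keys = {
        key
        for counts in (key_counts(source, key_columns), key_counts(target, key_columns))
        for key, n in counts.items()
        if n >= 2
    }

    def keep(rows):
        return [r for r in rows if tuple(r[c] for c in key_columns) not in duplicate_keys]

    return keep(source), keep(target), duplicate_keys


# Hypothetical data: key 2 is duplicated in the Source, key 3 in the Target.
source = [{"id": 1}, {"id": 2}, {"id": 2}]
target = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 3}]
source_kept, target_kept, dup_keys = exclude_duplicates(source, target, ["id"])
```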

Validation Pair comparison: In-Sync records

An In-Sync Record is a record in a Validation Pair that appears in both the Source and Target tables after Validata has filtered out Null Records and excluded Duplicate Records. A record is considered In-Sync when its comparison key values match and all corresponding column values are identical in both the Source and Target tables.

Validation Pair comparison: Out-of-Sync records

An Out-of-Sync Record is any record in a Validation Pair that is not In-Sync after Validata has filtered out Null Records and excluded Duplicate Records. A record is classified as Out-of-Sync if it meets any of the following criteria:

  • It appears in both the Source and Target tables with the same comparison key but differs in one or more non-key columns, or

  • It appears only in the Source table, or

  • It appears only in the Target table.
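
The In-Sync and Out-of-Sync criteria can be summarized in a short Python sketch. It assumes Null Records and Duplicate Records have already been removed, and it models each table as a dictionary from comparison key to a tuple of non-key values; the function name and status labels are hypothetical.

```python
def classify(source, target):
    """Label every comparison key as in sync or as one of the three out-of-sync cases."""
    result = {}
    for key in source.keys() | target.keys():
        if key not in target:
            result[key] = "out_of_sync: source only"
        elif key not in source:
            result[key] = "out_of_sync: target only"
        elif source[key] == target[key]:
            result[key] = "in_sync"
        else:
            result[key] = "out_of_sync: values differ"
    return result


# Hypothetical tables keyed by a single-column comparison key.
source = {1: ("Ada", "EU"), 2: ("Bob", "US"), 3: ("Cy", "US")}
target = {1: ("Ada", "EU"), 2: ("Bob", "UK"), 4: ("Dee", "EU")}
status = classify(source, target)
```

For this data, key 1 is in sync, key 2 differs in a non-key column, key 3 appears only in the Source, and key 4 appears only in the Target.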