Validation Types
Validata offers a suite of validation methods that let you balance thoroughness, performance, resource usage, and time to completion for your specific use cases.
The Validation Type you select applies to every Validation Pair within that Validation. To apply different methods, you can create multiple Validations—each using a different Validation Type—and schedule them independently.
Validata’s built-in methods support both full-dataset and partial-dataset validation by performing a direct, as-is comparison: they evaluate values in corresponding columns exactly as stored. Validata can accommodate certain minor differences, such as variations in letter casing or predictable padding in fixed-length strings, but other discrepancies result in mismatches.
Significant discrepancies include inconsistent padding in variable-length strings, data transformed by logic applied during replication, and Target values stored with reduced precision or a different data type. In these cases, Validata flags the records as mismatched even when the underlying information is equivalent.
For scenarios requiring transformation logic or complex comparison rules, Validata offers Custom Validation, allowing you to use SQL queries to extend beyond these built-in capabilities.
Full-dataset validation
Validata provides three full-dataset methods—Vector Validation, Fast Record Validation, and Full Record Validation. All three evaluate every selected column in every record, with the main difference being where the bulk of the processing occurs.
These methods can be used for relational databases (e.g., Oracle, PostgreSQL) and cloud data warehouses (e.g., Snowflake, BigQuery) when the data is written in MERGE mode.
Vector Validation
Vector Validation is the default full-dataset method.
Mechanism: Validata computes vector signatures of the Source and Target tables in their respective data systems, then compares the signatures on the Validata server to identify out-of-sync records.
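The exact signature computation Validata performs is not documented here, but the idea can be sketched with an order-independent aggregate hash computed inside the data system. The following Snowflake-flavored query is a conceptual illustration only; the table, columns, and bucket count are all hypothetical.

```sql
-- Conceptual sketch only; Validata's actual signature algorithm may differ.
-- Hashing rows into buckets produces compact per-bucket signatures, so the
-- Validata server can compare signatures and narrow mismatches down to
-- buckets instead of transferring full rows.
SELECT
    MOD(ABS(HASH(order_id)), 1024) AS bucket,            -- hypothetical bucketing
    HASH_AGG(order_id, customer_id, amount, updated_at) AS bucket_signature
FROM orders
GROUP BY 1;
```

Because only the compact (bucket, signature) pairs cross the network, the transfer stays small regardless of table size.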
Resource usage: Because only compact signatures are transferred, this method significantly reduces network traffic and potentially lowers bandwidth costs—making it especially well-suited for very large datasets. However, the vector computations can increase the compute load on the external data system.
Use case: Vector Validation is fast and efficient across datasets of all sizes, but it is especially well-suited for massive datasets.
Constraints: Vector Validation does not identify and flag “perfect” duplicate records in a table, i.e., records that are identical in the comparison key columns and all other columns. If you have defined constraints such as a Primary Key or a Unique Constraint, your data system prevents “perfect” duplicates, and you can use the Vector Validation method.
Fast Record Validation
Mechanism: In Fast Record Validation, Validata evaluates the comparison key columns and a record hash of the non-key columns for every record from the Source and Target tables to determine dataset differences. The hashes are computed in the respective data systems and then transferred, along with the comparison key columns, to the Validata server.
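To illustrate the kind of per-record digest involved (the hash function Validata actually uses is not specified here), a PostgreSQL-flavored query along these lines reduces each row to its comparison key plus a hash of the non-key columns; all names are hypothetical.

```sql
-- Illustrative sketch only; Validata's actual hash computation may differ.
-- Only the comparison key and a fixed-size hash per record leave the
-- data system.
SELECT
    order_id,                                    -- comparison key column
    MD5(CONCAT_WS('|', customer_id::VARCHAR,
                       amount::VARCHAR,
                       updated_at::VARCHAR)) AS row_hash
FROM orders;
```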
Resource usage: This method is optimized for reduced impact on the Source and Target data systems because it offloads the full set-difference computation to the Validata server. However, both the volume of data transferred to the Validata server and the resource usage on that server are greater than with Vector Validation.
Use case: You can use Fast Record Validation to evaluate datasets of all sizes, but it is particularly useful when comparing medium-to-large datasets.
Full Record Validation
Mechanism: Full Record Validation fetches the dataset from both Source and Target systems and performs a full column-by-column comparison on the Validata server.
Resource usage: Because all computation happens within the Validata server, there is minimal impact on the Source and Target systems. This approach provides the most complete comparison but is resource-intensive and results in higher data-transfer volumes.
Use case: Full Record Validation is recommended for small-to-medium datasets. While you can use it to compare large datasets, expect increased transfer times and processing overhead.
Partial-dataset validation
Validata offers two methods to evaluate a subset of your data: Key Validation and Interval Validation. Both fetch data from the Source and Target systems and compare it on the Validata server.
You can use these methods with relational databases such as MySQL and SQL Server and with data warehouses such as Snowflake and Databricks when data is written in MERGE mode.
Key Validation
Mechanism: Key Validation provides an easy way to confirm record presence based solely on the comparison key. Validata compares only key values on the Validata server and classifies each unique key as In-Sync, Out-of-Sync: Extra at Source, or Out-of-Sync: Extra at Target.
Resource usage: This is a lightweight validation method.
Use case: You can use Key Validation when you need to verify record existence or unique-record counts, rather than full record content.
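Although the comparison itself runs on the Validata server rather than in your databases, the classification is conceptually equivalent to a full outer join on the comparison key. A sketch with hypothetical table and column names:

```sql
-- Conceptual equivalent of the key classification performed on the
-- Validata server; this join is not executed in your data systems.
SELECT
    COALESCE(s.order_id, t.order_id) AS order_id,
    CASE
        WHEN t.order_id IS NULL THEN 'Out-of-Sync: Extra at Source'
        WHEN s.order_id IS NULL THEN 'Out-of-Sync: Extra at Target'
        ELSE 'In-Sync'
    END AS status
FROM source_keys s
FULL OUTER JOIN target_keys t ON s.order_id = t.order_id;
```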
Interval Validation
Mechanism: Interval Validation compares only those records updated within a specified time window. You can define the window using:
Relative time (e.g., “between 2 and 4 hours ago”)
Absolute time (e.g., “between 1:00 PM and 5:00 PM”)
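Conceptually, the window becomes a timestamp filter on the records fetched from each system. A PostgreSQL-flavored sketch of the relative window above, with a hypothetical table and column:

```sql
-- Illustrative filter for a relative window of "between 2 and 4 hours ago".
SELECT *
FROM orders
WHERE updated_at >= NOW() - INTERVAL '4 hours'
  AND updated_at <  NOW() - INTERVAL '2 hours';
```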
Resource usage: Because all computation happens within the Validata server, there is minimal impact on the Source and Target systems. The data transferred to and the resources used on the Validata server depend on the size of the time window.
Use case: You can choose Interval Validation to align your validations with your replication cadence or to perform incremental checks that catch discrepancies sooner, without waiting for full maintenance-window checks. For example, run a validation every 2 hours to compare data updated in the previous 2 hours.
Constraints: This method requires a timestamp or datetime column in both the Source and Target tables.
Custom validation
Mechanism: Custom Validation lets you build specialized validations by extracting data from the Source and Target tables using SQL queries; Validata then compares the results of these queries. Custom Validation is available only for comparing a single Validation Pair.
Constraints: Custom Validation does not support Revalidation or the generation of reconciliation scripts.
Use case: Append-mode replication
Background: When replicating data to a data warehouse (e.g., Snowflake, BigQuery, or Databricks) using APPEND mode, updated records are written as new rows in the Target table, rather than modifying existing ones. This creates duplicate rows that share the same keys.
Validata’s built-in validation methods automatically classify and exclude these duplicates, which means they compare only original, unchanged data and do not validate updated records.
Solution: To validate APPEND-mode datasets, you can use Custom Validation with SQL queries that produce deduplicated views of both the Source and Target tables. Each query should return only the latest version of each record and output columns in the same order.
Validata runs each query against its respective system and compares the results, enabling accurate validation of updated records.
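As a sketch, a window function such as ROW_NUMBER can produce the deduplicated view; the table, key, and ordering column below are hypothetical.

```sql
-- Hypothetical deduplication query for an APPEND-mode Target table: keep
-- only the latest version of each key, ordered by an update timestamp.
SELECT order_id, customer_id, amount, updated_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY order_id
                              ORDER BY updated_at DESC) AS rn
    FROM orders_append
) AS latest
WHERE rn = 1;
```

The paired Source query would select the same columns in the same order from the Source table.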
Custom Validation supports validation between a database table and a warehouse table populated in APPEND mode, as well as between two warehouse tables when one or both use APPEND-mode replication.
Use case: Compare data transformed using an SQL function
Background: When replicating data, the Target system may store certain values in a different format or structure than the Source, such as rewritten date fields, truncated strings, or numbers with reduced precision. These differences may arise from transformation logic applied during replication or simply from the Target using a different data type or representation for the same underlying value.
Because Validata’s built-in methods compare values exactly as stored, they will report these differences as Content Mismatch errors even when the information is effectively equivalent.
Solution: When the transformation can be expressed using an SQL function or query—such as reformatting dates, converting timestamps, trimming or padding text, or adjusting numeric precision—you can use Custom Validation to align the Source and Target before comparison.
By defining SQL queries that apply the required transformation (or its inverse), you can generate standardized result sets that reflect equivalent values, allowing Validata to compare the datasets accurately.
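For example, suppose the Source stores a TIMESTAMP that the Target replicates as a 'YYYY-MM-DD' string; the scenario and all names below are hypothetical. A pair of queries like the following aligns the two representations before comparison:

```sql
-- Hypothetical example: format the Source TIMESTAMP to match the Target's
-- string representation so that equivalent values compare as equal.
-- Source query:
SELECT order_id, TO_CHAR(created_at, 'YYYY-MM-DD') AS created_date
FROM orders;

-- Target query (value already stored as a 'YYYY-MM-DD' string):
SELECT order_id, created_date
FROM orders;
```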
Selecting a Validation Type
Choose the method that best suits your data systems and your verification workflow. We recommend that you evaluate a validation method in a non-production environment before using it in production.
If your data warehouse stores data in APPEND mode, use Custom Validation.
If you transform your Source data before writing it to the Target and you can express the transformation using a SQL query, start with Custom Validation.
For relational databases or data warehouses using MERGE mode, start with Vector Validation or Fast Record Validation.
See Validation use cases and recommendations for additional use cases.