
Product architecture

Validata is a data validation solution that compares datasets across source and target systems to identify discrepancies and ensure data integrity. Validata works with any replication engine, including Striim, third-party tools, and custom-built pipelines, allowing you to validate data regardless of how it moves through your infrastructure.

Figure: Validata architecture (validata-architecture.png)

Deployment

Validata is deployed as a self-managed application on-premises or in a cloud VM. It operates independently of your replication infrastructure and requires no changes to your existing data systems or replication tools. Validata connects to source and target systems using standard database connectivity, reading data without modifying it.

See Deploy and manage Validata to learn more.

Validation coverage

Validata validates data across major databases (MySQL, Oracle, PostgreSQL, SQL Server) and cloud data warehouses (BigQuery, Databricks, Snowflake). Any combination of these systems can serve as the source or target in a validation, supporting heterogeneous environments where data moves between different platforms.

Intelligent automapping reduces configuration time by automatically mapping source tables and columns to their target counterparts. You can review and adjust these mappings as needed, and include or exclude specific tables or columns based on your requirements.
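To make the idea of automapping concrete, the following is a minimal sketch of name-based matching. It is not Validata's actual algorithm: the `normalize` and `automap` functions and the matching rule (case-insensitive, underscores ignored) are assumptions chosen for illustration.

```python
def normalize(name: str) -> str:
    """Lowercase and drop underscores so ORDER_ID and orderId compare equal."""
    return name.replace("_", "").lower()


def automap(source_cols: list[str], target_cols: list[str]) -> tuple[dict[str, str], list[str]]:
    """Return (mapping, unmapped): source -> target pairs, plus leftovers."""
    by_normalized = {normalize(t): t for t in target_cols}
    mapping: dict[str, str] = {}
    unmapped: list[str] = []
    for col in source_cols:
        target = by_normalized.get(normalize(col))
        if target is None:
            unmapped.append(col)  # no counterpart found; left for manual review
        else:
            mapping[col] = target
    return mapping, unmapped


# Hypothetical column lists, not taken from a real schema.
mapping, unmapped = automap(
    ["ORDER_ID", "CUST_NAME", "LEGACY_FLAG"],
    ["order_id", "custName", "created_at"],
)
print(mapping)   # pairs found automatically
print(unmapped)  # columns to map or exclude by hand
```

Columns that end up in `unmapped` correspond to the cases you would adjust or exclude manually.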

See Supported data systems to learn more.

Validation methods

Validata provides six validation methods to balance thoroughness and performance:

  • Full-dataset methods (Vector, Fast Record, Full Record) compare entire tables with varying levels of detail and resource requirements.

  • Partial-dataset methods (Interval, Key) enable lightweight, frequent checks on recent changes or record presence.

  • Custom validation supports complex scenarios using SQL queries, including joins and filtered comparisons.

This range of methods supports a layered validation strategy. For example, you can run comprehensive full-dataset validations during maintenance windows, and use lightweight partial-dataset validations for continuous monitoring throughout the day. This approach provides confidence in data accuracy without placing undue load on production systems.
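The trade-off between a lightweight presence check and a full content comparison can be sketched as follows. This is an illustration only, loosely in the spirit of the Key and Full Record methods; the function names, the in-memory row format, and the use of SHA-256 row hashes are assumptions, not a description of how Validata implements these methods.

```python
import hashlib

Row = dict[str, object]


def row_hash(row: Row, columns: list[str]) -> str:
    """Hash the selected column values of one row."""
    payload = "\x1f".join(repr(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def key_check(source: dict[int, Row], target: dict[int, Row]) -> dict[str, list[int]]:
    """Cheap check: compare only which primary keys are present."""
    return {
        "missing_in_target": sorted(source.keys() - target.keys()),
        "extra_in_target": sorted(target.keys() - source.keys()),
    }


def full_record_check(source: dict[int, Row], target: dict[int, Row],
                      columns: list[str]) -> dict[str, list[int]]:
    """Thorough check: key presence plus content comparison for shared keys."""
    result = key_check(source, target)
    shared = source.keys() & target.keys()
    result["content_mismatch"] = sorted(
        k for k in shared
        if row_hash(source[k], columns) != row_hash(target[k], columns)
    )
    return result


# Hypothetical rows keyed by primary key.
source = {1: {"name": "a", "qty": 1}, 2: {"name": "b", "qty": 2}, 3: {"name": "c", "qty": 3}}
target = {1: {"name": "a", "qty": 1}, 2: {"name": "b", "qty": 20}, 4: {"name": "d", "qty": 4}}

light = key_check(source, target)
full = full_record_check(source, target, ["name", "qty"])
```

In this sketch the key check reads only keys and cannot see that record 2 differs in content, while the full comparison reads every selected column. That difference in cost and coverage is what motivates a layered strategy.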

See Validation Types to learn more.

Reporting and compliance

The Historian stores all validation reports, providing a complete audit trail for compliance and governance requirements. Reports include details about in-sync and out-of-sync records, duplicate keys, and content mismatches. You can view reports in the Validata UI or download them as JSON.

Trend analysis across multiple validation runs helps detect silent drift—gradual data divergence that might otherwise go unnoticed. Historical data enables you to track data quality over time and demonstrate compliance to auditors and stakeholders.
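As a rough illustration of what detecting silent drift can mean, the sketch below flags a steadily rising out-of-sync rate across consecutive runs. The `RunSummary` fields and the "strictly increasing over a window" rule are assumptions for this example; they do not reflect the Historian's schema or Validata's trend logic.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    run_id: str          # hypothetical identifier for one validation run
    total_records: int
    out_of_sync: int

    @property
    def out_of_sync_rate(self) -> float:
        return self.out_of_sync / self.total_records if self.total_records else 0.0


def is_drifting(runs: list[RunSummary], window: int = 4) -> bool:
    """True when the out-of-sync rate strictly increases across the last `window` runs."""
    if window < 2 or len(runs) < window:
        return False
    rates = [r.out_of_sync_rate for r in runs[-window:]]
    return all(later > earlier for earlier, later in zip(rates, rates[1:]))


# Hypothetical histories, oldest run first.
rising = [RunSummary(f"r{i}", 100_000, n) for i, n in enumerate([10, 12, 19, 31], start=1)]
stable = [RunSummary(f"r{i}", 100_000, n) for i, n in enumerate([10, 12, 9, 11], start=1)]

print(is_drifting(rising))
print(is_drifting(stable))
```

Each run in the rising history is small enough to pass unnoticed on its own; only the comparison across runs exposes the pattern.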

Reconciliation scripts are automatically generated for out-of-sync records. These SQL-based scripts can be reviewed and executed on the target system to resolve identified discrepancies.
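The sketch below shows the general shape such a script might take. It is a hypothetical illustration, not the format Validata generates: the function names, the INSERT/UPDATE/DELETE choices, and the naive literal quoting are all assumptions, and the quoting is not dialect-aware (for example, boolean literals differ between databases).

```python
def sql_literal(value: object) -> str:
    """Render a Python value as a simple SQL literal (illustrative, not dialect-aware)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def reconciliation_statements(table: str, key_column: str,
                              missing_rows: list[dict[str, object]],
                              mismatched_rows: list[dict[str, object]],
                              extra_keys: list[object]) -> list[str]:
    """Build statements that would bring the target in line with the source."""
    statements: list[str] = []
    for row in missing_rows:  # rows present in source, absent in target
        cols = ", ".join(row)
        vals = ", ".join(sql_literal(v) for v in row.values())
        statements.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")
    for row in mismatched_rows:  # rows whose content differs; values come from source
        assignments = ", ".join(
            f"{c} = {sql_literal(v)}" for c, v in row.items() if c != key_column
        )
        statements.append(
            f"UPDATE {table} SET {assignments} WHERE {key_column} = {sql_literal(row[key_column])};"
        )
    for key in extra_keys:  # rows present only in target
        statements.append(f"DELETE FROM {table} WHERE {key_column} = {sql_literal(key)};")
    return statements


# Hypothetical table and rows.
script = reconciliation_statements(
    "orders", "id",
    missing_rows=[{"id": 3, "name": "O'Brien"}],
    mismatched_rows=[{"id": 2, "name": "b"}],
    extra_keys=[4],
)
print("\n".join(script))
```

Whatever the exact format, the point of producing plain SQL is that a DBA can read each statement before anything runs against the target.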

The Historian is hosted on a database that ships with Validata by default. For organizations that prefer to use existing infrastructure or have data residency requirements, the Historian can alternatively be hosted on your own Oracle or PostgreSQL instance.

Enterprise security

Validata is designed for enterprise environments with robust security requirements:

  • Access control: Role-based access control (RBAC) ensures that only authorized users can configure and run validations. Validata supports native authentication and single sign-on (SSO) integration with Microsoft Entra or Okta to meet your infosec requirements.

  • Credential management: Validata supports secure credential storage through Vault integration, keeping sensitive values out of configuration files. Use the native Validata vault or integrate with an external secret manager (Azure Key Vault, CyberArk Vault, Google Secret Manager, or HashiCorp Vault KV v2) to manage and securely reference credentials in Validata.

  • Alerting: Configurable alerts notify teams via email, Slack, or Microsoft Teams when validations start, complete, fail, or encounter issues such as exceeding the out-of-sync threshold.
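As an illustration of the out-of-sync threshold mentioned under Alerting, the sketch below decides which alert events a finished run would raise. The event names, the status strings, and the choice to express the threshold as a percentage of total records are assumptions for this example, not Validata configuration.

```python
def alert_events(status: str, total_records: int, out_of_sync: int,
                 threshold_pct: float) -> list[str]:
    """Return hypothetical alert event names for one validation run."""
    events: list[str] = []
    if status == "failed":
        events.append("validation_failed")
    elif status == "completed":
        events.append("validation_completed")
    # Threshold is treated as a percentage of all compared records (assumption).
    if total_records and 100.0 * out_of_sync / total_records > threshold_pct:
        events.append("out_of_sync_threshold_exceeded")
    return events


over = alert_events("completed", 10_000, 150, threshold_pct=1.0)   # 1.5% out of sync
under = alert_events("completed", 10_000, 50, threshold_pct=1.0)   # 0.5% out of sync
print(over)
print(under)
```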

Monitoring

Validata provides built-in monitoring to track system health and validation activity:

  • Resource monitoring: Track CPU and memory consumption on the Validata server to ensure adequate capacity and identify potential bottlenecks.

  • Activity tracking: View validation activity over configurable time windows (for example, the last hour or the last 24 hours) to understand system utilization and workload patterns.

  • Log integration: Logs can be configured to stream to external monitoring tools such as Grafana, enabling integration with your existing observability infrastructure.

Validata AI

Validata AI provides a conversational interface for analyzing validation results and generating insights. Rather than manually reviewing reports, you can ask questions about validation history, trends, and specific discrepancies in natural language.

Validata AI supports flexible deployment to meet your security and compliance requirements:

  • Cloud LLMs: OpenAI, Google Gemini, and Anthropic Claude are supported. You provide your own API keys—credentials never pass through Striim infrastructure.

  • Local LLMs: Ollama enables fully isolated deployment with no external network traffic. All AI processing remains within your VPC, addressing data privacy and security concerns.

Validata AI is optional. All validation functionality is available without it, and you can enable it at any time as your needs evolve.