dataDiff ​
Command:
dataDiff
Category: Analysis Tools
Status: Production Ready
Description ​
Performs detailed row-level comparison between two datasets, identifying specific columns that differ. This is useful for data validation, change tracking, and identifying discrepancies at the cell level.
What is Data Diff? ​
Data diff is a row-by-row comparison of two datasets (tables, schemas, or even different databases) that shows:
- Which rows match: Identical rows in both datasets
- Which rows are different: Rows that exist in both but have different values
- Column-level differences: Exactly which columns differ and how
- Missing rows: Rows in source but not in target, or vice versa
- Summary statistics: Counts of matches, differences, and missing records
Think of it as a detailed "diff" tool like you use in code comparison—but for data.
Why Would You Want Data Diff? ​
Data diff solves critical problems across your organization:
Data Migration & Integration:
- Migration Validation: Verify that data migrated correctly to a new system (row counts match, values are identical)
- Integration Testing: Compare data before/after integration to ensure no loss or corruption
- System Cutover: Validate that old and new systems have identical data before switching over
- Database Replication: Verify that replica databases are perfectly in sync
- ETL Validation: Confirm that data imported/transformed matches expected results
Data Reconciliation:
- Cross-system Reconciliation: Find discrepancies between operational system and data warehouse
- Financial Reconciliation: Ensure accounting records match between systems (GL, subledger, reports)
- Inventory Accuracy: Compare physical inventory against system records
- Account Balancing: Verify account balances match between source and reporting systems
- Transaction Matching: Match orders between sales system and fulfillment system
Quality Assurance & Testing:
- Production vs. Test: Compare production data with test environment (anonymized) to verify test data completeness
- Regression Testing: Verify that code changes haven't affected data correctness
- Release Validation: Confirm that data-related changes in a release work correctly
- Snapshot Validation: Compare current state against known-good snapshots
- Bug Verification: Confirm that bugs are fixed by comparing before/after data
Troubleshooting & Debugging:
- Data Discrepancy Investigation: Find exactly what's different when reports show wrong numbers
- Timeline Tracking: Compare data snapshots at different points in time to find when data changed
- Source of Truth: Determine which system has correct data when sources disagree
- Audit Issues: Track down exactly which records differ from audit trail
- Performance Impact: Find if schema changes or migration affected data correctness
Compliance & Audit:
- Regulatory Required Reconciliations: Document data matching between systems for compliance
- Audit Trail: Prove that data is accurately synchronized between systems
- Change Documentation: Record exactly what changed in a data update
- Change Audit: Verify that only authorized changes were made
- Regulatory Reporting: Ensure source data matches regulatory reports
What Can You Do With Data Diff? ​
1. Validate Database Migration ​
# Compare old and new database schemas after migration
hana-cli dataDiff \
--table1 CUSTOMERS \
--table2 CUSTOMERS \
--schema1 LEGACY_DB \
--schema2 NEW_DB \
--keyColumns CUSTOMER_ID \
--format json \
--output migration-validation.jsonVerify that all customer records migrated correctly and no data was lost or corrupted.
2. ETL Process Validation ​
# Verify data imported correctly into warehouse
hana-cli dataDiff \
--table1 SALES \
--table2 SALES_FACT \
--schema1 SOURCE_SYSTEM \
--schema2 DATA_WAREHOUSE \
--keyColumns SALE_ID \
--compareColumns AMOUNT,CUSTOMER_ID,PRODUCT_ID \
--showValuesConfirm that source data matches transformed data in warehouse.
3. Financial Reconciliation ​
# Reconcile GL accounts between operational and reporting systems
hana-cli dataDiff \
--table1 GENERAL_LEDGER \
--table2 GL_REPORTING \
--keyColumns ACCOUNT_ID,POSTING_DATE \
--compareColumns DEBIT_AMOUNT,CREDIT_AMOUNT,BALANCE \
--format csv \
--output daily-reconciliation.csvFind accounting discrepancies that need investigation before closing books.
4. Replication Verification ​
# Verify database replication is working correctly
hana-cli dataDiff \
--table1 CRITICAL_TABLE \
--table2 CRITICAL_TABLE \
--schema1 PRIMARY_DB \
--schema2 REPLICA_DB \
--limit 100000 \
--timeout 1800Confirm replica database is perfectly synchronized with primary.
5. System Integration Testing ​
# Compare order data between order system and fulfillment system
hana-cli dataDiff \
--table1 ORDERS \
--table2 FULFILLMENT_ORDERS \
--keyColumns ORDER_ID \
--compareColumns ORDER_DATE,CUSTOMER_ID,TOTAL_AMOUNT,STATUS \
--showValues \
--format jsonEnsure both systems have matching order information.
6. Data Quality Checkpoint ​
# Compare current state with known-good snapshot
hana-cli dataDiff \
--table1 PRODUCTS \
--table2 PRODUCTS_SNAPSHOT_2026_02_00 \
--keyColumns PRODUCT_ID \
--limit 50000Verify that recent changes didn't corrupt data.
7. Audit Trail Comparison ​
# Track exactly what changed in a production update
hana-cli dataDiff \
--table1 CUSTOMER_BACKUP_BEFORE_IMPORT \
--table2 CUSTOMERS \
--keyColumns CUSTOMER_ID \
--format csv \
--output import-changes.csvDocument exactly which customer records changed and how.
Comparison Modes ​
Full Comparison ​
- Compares all columns between tables
- Shows every difference
- Slowest but most thorough
Selective Comparison ​
- Compare only specific columns
- Reduces output and execution time
- Use when you only care about certain fields
Key-based Comparison ​
- Groups rows by key columns
- Shows which rows exist in both/either dataset
- Find missing records quickly
Benefits by Role ​
Data Engineers: Validate data pipelines and transformations
Database Administrators: Verify replication, backups, and migrations
QA Teams: Confirm that schema changes don't affect data correctness
Finance Teams: Reconcile accounts and journal entries
Compliance Officers: Document data synchronization for audit
Business Analysts: Investigate data discrepancies in systems
Syntax ​
hana-cli dataDiff [options]Aliases ​
ddiffdiffDatadataCompare
Command Diagram ​
Parameters ​
Positional Arguments ​
This command has no positional arguments.
Required Parameters ​
| Parameter | Alias | Type | Description |
|---|---|---|---|
--table1 | -t1 | string | First table to compare |
--table2 | -t2 | string | Second table to compare |
--keyColumns | -k | string | Comma-separated columns for row matching (must uniquely identify rows) |
Schema Parameters ​
| Option | Alias | Type | Default | Description |
|---|---|---|---|---|
--schema1 | -s1 | string | **CURRENT_SCHEMA** | Schema for first table |
--schema2 | -s2 | string | **CURRENT_SCHEMA** | Schema for second table |
Comparison Options ​
| Option | Alias | Type | Default | Description |
|---|---|---|---|---|
--compareColumns | -c | string | - | Specific columns to compare (comma-separated) |
--showValues | --sv | boolean | false | Include actual values in report |
--limit | -l | number | 10000 | Maximum rows to compare |
--timeout | --to | number | 3600 | Operation timeout in seconds |
Output Options ​
| Option | Alias | Type | Default | Description |
|---|---|---|---|---|
--output | -o | string | - | File path for diff report |
--format | -f | string | summary | Report output format. Choices: json, csv, summary |
--dryRun | --dr, --preview | boolean | false | Dry run mode (show what would happen) |
Additional Options ​
| Option | Alias | Type | Default | Description |
|---|---|---|---|---|
--profile | -p | string | - | CDS profile for connections |
Connection Parameters ​
| Option | Alias | Type | Default | Description |
|---|---|---|---|---|
--admin | -a | boolean | false | Connect via admin (default-env-admin.json) |
--conn | - | string | - | Connection filename to override default-env.json |
Troubleshooting ​
| Option | Alias | Type | Default | Description |
|---|---|---|---|---|
--disableVerbose | --quiet | boolean | false | Disable verbose output - removes all extra output that is only helpful to human readable interface |
--debug | -d | boolean | false | Debug hana-cli itself by adding output of LOTS of intermediate details |
Special Default Values ​
| Token | Resolves To | Description |
|---|---|---|
**CURRENT_SCHEMA** | Current user's schema | Used as default for schema1 and schema2 parameters |
Usage ​
When --schema1 or --schema2 is not specified, the command automatically uses the current user's default schema. This allows for simplified commands:
# Compare tables in current schema
hana-cli dataDiff -t1 CUSTOMERS -t2 CUSTOMERS_BACKUP -k CUSTOMER_ID
# Explicit schema specification
hana-cli dataDiff -t1 CUSTOMERS -s1 PRODUCTION -t2 CUSTOMERS -s2 STAGING -k CUSTOMER_IDOutput Formats ​
Summary Format (Default) ​
Console-friendly overview:
Identical: 950
Different: 45
OnlyInTable1: 3
OnlyInTable2: 2JSON Format ​
Structured format with detailed differences:
{
"identical": 950,
"different": 45,
"onlyInTable1": [...],
"onlyInTable2": [...],
"differences": [
{
"key": "12345",
"changes": [
{"column": "PRICE", "table1Value": 99.99, "table2Value": 89.99}
]
}
]
}CSV Format ​
Tabular format for analysis:
type,key,column,table1Value,table2Value
difference,12345,PRICE,99.99,89.99
onlyInTable1,67890,,
onlyInTable2,11111,,Examples ​
1. Basic Data Diff ​
Find differences between two tables:
hana-cli dataDiff -t1 EMPLOYEES -t2 EMPLOYEES_UPDATED -k EMPLOYEE_ID2. Diff across Schemas ​
Compare same table in different environments:
hana-cli dataDiff \
-t1 CUSTOMERS -s1 PRODUCTION \
-t2 CUSTOMERS -s2 STAGING \
-k CUSTOMER_ID3. Detailed Value Comparison ​
Show actual values that differ:
hana-cli dataDiff \
-t1 PRODUCTS -t2 PRODUCTS_UPDATED \
-k PRODUCT_ID \
--showValues true4. Compare Specific Columns ​
Focus on particular columns:
hana-cli dataDiff \
-t1 ORDERS -t2 ORDERS_COPY \
-k ORDER_ID \
-c TOTAL_AMOUNT,STATUS,SHIPPING_DATE5. JSON Report Export ​
Export detailed diff in JSON format:
hana-cli dataDiff \
-t1 DATA -t2 DATA_BACKUP \
-k ID \
-f json \
-o ./reports/data_differences.json6. CSV Report for Spreadsheet Analysis ​
Export as CSV for Excel analysis:
hana-cli dataDiff \
-t1 SALES -t2 SALES_ARCHIVE \
-k TRANSACTION_ID \
-f csv \
-o ./reports/sales_diff.csv7. Multi-Column Key Matching ​
Use composite key for row matching:
hana-cli dataDiff \
-t1 TRANSACTIONS -t2 TRANSACTIONS_REFERENCE \
-k ORDER_ID,LINE_ITEM,STORE_IDUse Cases ​
Quality Assurance Testing ​
Verify test data matches expected state:
hana-cli dataDiff \
-t1 TEST_DATA_INPUT \
-t2 TEST_DATA_EXPECTED \
-k TEST_ID \
--showValues trueMigration Validation ​
Ensure all data migrated correctly:
hana-cli dataDiff \
-t1 SOURCE_TABLE -s1 LEGACY_SYSTEM \
-t2 SOURCE_TABLE -s2 NEW_SYSTEM \
-k RECORD_ID \
-f json \
-o ./migration_validation.jsonData Integrity Monitoring ​
Check for unexpected changes:
hana-cli dataDiff \
-t1 MASTER_DATA \
-t2 MASTER_DATA_SNAPSHOT \
-k ID \
-o ./integrity_check.jsonSync Verification ​
Verify two systems are in sync:
hana-cli dataDiff \
-t1 PRODUCTS -s1 SYSTEM_A \
-t2 PRODUCTS -s2 SYSTEM_B \
-k SKUPerformance Tips ​
- Limit rows: Use
--limitto sample large tables first - Selective columns: Compare only relevant columns
- Composite keys: Use multi-column keys when available for performance
- Filter with WHERE: Apply database-level filtering if possible
Tips and Best Practices ​
- Test with limit first: Use
--limit 100before full comparison - Choose correct keys: Primary or unique keys work best
- One issue at a time: Focus on specific column groups
- Archive reports: Keep historical diff reports for tracking changes
- Share reports: Review diff results with teammates to discuss discrepancies
Related Commands ​
See the Commands Reference for other commands in this category.
Primary Related Commands ​
- compareData - High-level data comparison by matching rows and identifying differences
- dataValidator - Validate data quality and integrity with rule-based validation
Additional Related Commands ​
- export - Export data for external analysis
- compareSchema - Compare schema structures between databases