Introduction to Data Reconciliation
Data Reconciliation (DR) is a verification step performed during or after a data migration to cross-check that the target and the source hold the same data.
Data reconciliation is performed in almost every industry and is an important aspect of an organization's software journey.
Reconciliation can happen from several legacy systems into a single target system. Data reconciliation is a form of audit or testing that should be done during or after a database migration.
History of Data Reconciliation
Work on Data Validation and Reconciliation (DVR) started in the 1960s with the aim of reconciling measured raw variables; later, unmeasured variables were also taken into account.
In 1977, Stanley and Mah introduced quasi-steady-state dynamics to support filtering and simultaneous parameter estimation.
In 1992, Liebman introduced dynamic DVR based on non-linear optimization models.
Reasons for Data Migration and Data Reconciliation
- Today, the database of almost any organization is growing at a heavy speed, and these organizations need additional storage space.
- Industries are moving towards high-end servers.
- To reduce cost and complexity, migration to a stable, consumable, and affordable system is necessary.
- Data should be transportable from physical to virtual devices.
- Migration enables clean, accurate, precise data for consumption.
Data Migration strategies
Data migration is a process that consists of extract, transform, and load (ETL) steps (a minimal sketch follows the checklist below). A data migration strategy should cover challenges such as:
- Properly identifying the source and target databases.
- Meeting data quality requirements.
- Creating proper project methodologies.
- Closing gaps in general migration expertise.
- Ensuring accuracy and completeness of the data migration.
- Covering the challenges posed by the source data.
- Implementing an agile or other iterative model to migrate data in phases.
- Implementing checkpoints in each phase to ensure the completeness of the migration.
- Implementing the right tools and techniques to carry out the migration.
- Handling large volumes of data during the migration process.
- Properly profiling the data in the legacy system.
- Preparing transformation rules.
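To make the extract-transform-load flow concrete, here is a minimal sketch in Python. It is illustrative only: the connection targets, the `customers` table, its columns, and the upper-casing rule are all assumptions, not part of any specific migration tool.

```python
import sqlite3

# Hypothetical source and target connections (sqlite3 is used only to keep the
# sketch self-contained; a real migration would use the appropriate drivers).
source = sqlite3.connect("legacy.db")
target = sqlite3.connect("new_system.db")

def extract(conn):
    # Extract: read raw rows from the assumed legacy table.
    return conn.execute("SELECT id, full_name, signup_date FROM customers").fetchall()

def transform(rows):
    # Transform: apply a simple illustrative rule (trim and upper-case names).
    return [(rid, name.strip().upper(), signup) for rid, name, signup in rows]

def load(conn, rows):
    # Load: write the transformed rows into the target table.
    conn.executemany(
        "INSERT INTO customers (id, full_name, signup_date) VALUES (?, ?, ?)", rows
    )
    conn.commit()

load(target, transform(extract(source)))
```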
Why do we need Data Reconciliation?
When we migrate data from a source to a destination, the following errors may happen during the transfer:
- Incorrect mapping
- Incorrect transformation logic implementation
- Runtime error
- Network failure
- Packet drops due to heavy load.
Prime reasons for data migration errors
- Faulty hardware
- Network issue
- Format mismatch
- Database definition mismatch
What are the Side effects of faulty data transfer?
A faulty data transfer makes the software unstable, with issues such as:
- Missing records
- Missing column values
- Incorrect values or data volumes
- Duplicate entries in the target database (a detection sketch follows this list)
- Wrongly formatted values
- Broken relationships between table entities
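Some of these side effects are easy to surface with simple queries. For example, duplicate entries in the target can be detected with a grouping query; the sketch below assumes a hypothetical `customers` table whose business key is `id`.

```python
import sqlite3

target = sqlite3.connect("new_system.db")  # assumed target database

# Find business keys that appear more than once in the target table.
duplicates = target.execute(
    "SELECT id, COUNT(*) AS copies FROM customers GROUP BY id HAVING COUNT(*) > 1"
).fetchall()

for key, copies in duplicates:
    print(f"duplicate key {key}: {copies} copies in the target")
```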
Pre-migration (reconciliation process)
- Test the scope of the source system and data (what is included and excluded).
- Test the high-level mapping for each category of data content, and check the data types (type testing).
- Test the target system's data requirements (field types, valid values, mandatory fields, etc.), and check that mandatory fields are not null (see the sketch after this list).
- Test that source and target records share a unique linking key.
- Test the connections from source to target.
- Test the migration tool configuration so that it can transfer data field by field.
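As a minimal sketch of the mandatory-field check, the snippet below inspects the target schema and reports required columns that are missing or not declared NOT NULL. The `MANDATORY` mapping and the use of SQLite's `PRAGMA table_info` are assumptions for illustration; other engines would query their own catalog views.

```python
import sqlite3

target = sqlite3.connect("new_system.db")  # assumed target database

# Columns we expect the target schema to enforce as mandatory (assumed names).
MANDATORY = {"customers": ["id", "full_name"]}

for table, required in MANDATORY.items():
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
    schema = {row[1]: row for row in target.execute(f"PRAGMA table_info({table})")}
    for col in required:
        if col not in schema:
            print(f"{table}.{col}: missing from the target schema")
        elif schema[col][3] == 0 and schema[col][5] == 0:
            print(f"{table}.{col}: not declared NOT NULL in the target")
```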
Data Reconciliation Process
In this phase, we test the correctness of the records in both databases.
In the first step, we check the record counts in both databases. This catches cases where the target is missing records.
A deeper dive that tests the data column by column consumes a huge amount of resources, and if we commit to doing it manually, other reconciliation issues may go unnoticed.
This is why we need to incorporate automation tools to verify the entries (see the count-check sketch below).
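A minimal sketch of the automated count check, assuming both databases expose the same table names (sqlite3 is used only so the example stays self-contained):

```python
import sqlite3

source = sqlite3.connect("legacy.db")      # assumed source database
target = sqlite3.connect("new_system.db")  # assumed target database

TABLES = ["customers", "accounts", "transactions"]  # assumed table names

for table in TABLES:
    src = source.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    tgt = target.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    status = "OK" if src == tgt else "MISMATCH"
    print(f"{table}: source={src} target={tgt} -> {status}")
```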
Important Terminology of Data Reconciliation
Redundancy – Indicates that the target system may contain redundant (duplicate) data.
Observability – An analysis that determines whether a variable can be derived from the set of constraints and the set of measurements.
Checksum Process and Error – A checksum methodology is used to detect network issues. During the reconciliation process, checking the parity bits and checksums over the transferred data is called the checksum process (a small sketch follows this list of terms).
Gross Error – Captures bias errors, instrument failures, and other noise introduced during data migration.
Variance – A measurement that depicts the variability of the data.
Unit of Migration (UoM) – The unit in which data is migrated, chosen to balance the business and technical challenges.
Survey Verification – A verification of the number of records transferred.
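As a rough illustration of the checksum idea above, the sketch below hashes a data export from each side and compares the digests; the export file names are assumptions.

```python
import hashlib

def file_checksum(path: str) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical exports produced from the source and the target systems.
if file_checksum("source_export.csv") == file_checksum("target_export.csv"):
    print("Checksums match: no corruption detected in the transfer")
else:
    print("Checksum mismatch: data was altered or corrupted in transit")
```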
Data Reconciliation Models
The reconciliation process is carried out step by step. The steps are:
- Set up a new environment.
- Release an additional unit.
- Change the source and target schema in the existing system.
- Generate reports to verify the reconciliation process.
The models of Data Reconciliation
Master Data Reconciliation – In the master data reconciliation model, only the master data between the legacy source and the target system is reconciled. Because master data is mostly unchanged or changes very slowly, it is not a very difficult reconciliation process.
The process of reconciling master data covers the following checks (a sketch of these checks follows the list):
- Check the total row count (total data rows in the legacy versus the target system).
- Check the total number of columns in the legacy versus the target system.
- Check the active user count in both systems.
- Check the inactive user count in both systems.
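A minimal sketch of these counts, assuming a master table named `users` with a `status` column in both databases (the names are illustrative, not prescribed by any tool):

```python
import sqlite3

source = sqlite3.connect("legacy.db")      # assumed source database
target = sqlite3.connect("new_system.db")  # assumed target database

def master_counts(conn):
    # Row count, column count, and active/inactive user counts for the assumed
    # master table "users" with a "status" column.
    cur = conn.execute("SELECT * FROM users LIMIT 1")
    columns = len(cur.description)
    rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    active = conn.execute("SELECT COUNT(*) FROM users WHERE status = 'ACTIVE'").fetchone()[0]
    inactive = conn.execute("SELECT COUNT(*) FROM users WHERE status = 'INACTIVE'").fetchone()[0]
    return rows, columns, active, inactive

labels = ["total rows", "columns", "active users", "inactive users"]
for label, s, t in zip(labels, master_counts(source), master_counts(target)):
    print(f"{label}: source={s} target={t} -> {'OK' if s == t else 'MISMATCH'}")
```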
As part of the reconciliation process, we also need to check the following:
- Transactional data are correct and valid.
- Transactions are properly authorized.
Transactional data Reconciliation
Transactional data is very important for an organization (especially financial institutions). It is an integral part of BI reports, so missing records may directly harm the organization's reputation.
The main technique used for transactional data reconciliation is the "total method", i.e., comparing control totals such as record counts and amount sums between source and target.
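A rough sketch of the total method: the snippet below computes the same control totals on both sides and compares them. The `transactions` table and its `amount` column are assumptions for illustration.

```python
import sqlite3

source = sqlite3.connect("legacy.db")      # assumed source database
target = sqlite3.connect("new_system.db")  # assumed target database

# Control totals over an assumed "transactions" table with an "amount" column.
QUERY = "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM transactions"

src_count, src_total = source.execute(QUERY).fetchone()
tgt_count, tgt_total = target.execute(QUERY).fetchone()

print(f"record count: source={src_count} target={tgt_count}")
print(f"amount total: source={src_total} target={tgt_total}")
if (src_count, src_total) == (tgt_count, tgt_total):
    print("Control totals match")
else:
    print("Control totals differ: investigate missing or altered transactions")
```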
While the two methods above are mostly performed manually, with recent advances in technology automated reconciliation is also becoming common.
Automatic Data Reconciliation
The main objective of this process is to bring the gross error down to zero.
Tools for Data Reconciliation
Data reconciliation is a step-by-step approach that starts with an analysis of the legacy data and continues through copying, consolidating, loading, and reconciling it into the target application.
The industry today sees a lot of data growth, and hence organizations need data migration. As the migration of data is a very complex process, it requires thorough testing in various phases. Tools that help with this include:
- Aperture Data Studio
- OpenRefine (works well on messy data)
- TIBCO Clarity (works as SaaS)
- WinPure
- Oracle FLEXCUBE
- APP-RECON (missing data, data corruption)
- Data Excellence
- Chains
Various Data Reconciliation Reports
The main reconciliation reports are:
- Migration Reconciliation Report.
- Adhoc Reconciliation Report.
- Parallel Run Reconciliation report.
Migration Reconciliation Report – This report is generated after the migration from the legacy to the target system is completed. It provides a complete reconciliation summary and covers almost all elements that have to be reconciled.
Adhoc Reconciliation Report – An ad-hoc reconciliation report is used to cross-verify the migration for a certain set of data.
Parallel Run Reconciliation Report – This report is generated by running the same batch job against both the legacy database and the new database and checking specific entities for discrepancies.
Testing of Data Reconciliation
Testing of the Data Reconciliation is an important aspect while migrating data from legacy or older systems to a new or upgraded system.
Migrating thousands, millions, or even billions of records (transactions) is very difficult to verify manually.
This is why, in the early days, a random sample of the data was tested and, if it matched, the auditor/tester would certify that the reconciliation process had passed.
Note: a record count alone is not valid testing; it is only a preliminary check.
The testing strategies are as follows:
Schema validation –
During the reconciliation process, it is important to validate the schema (stored procedures, views, user preferences, dynamic data, etc.). Manually verifying so many objects is a waste of resources.
It is always better to write scripts and execute them via automation. Such automation can detect any problems between source and destination very easily.
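A minimal sketch of such a script, comparing column names and declared types between source and target. SQLite's `PRAGMA table_info` and the table names are assumptions; on other engines the same idea would query `information_schema` or the equivalent catalog.

```python
import sqlite3

source = sqlite3.connect("legacy.db")      # assumed source database
target = sqlite3.connect("new_system.db")  # assumed target database

def table_schema(conn, table):
    # Map each column name to its declared type (SQLite catalog, for illustration).
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}

for table in ["customers", "accounts"]:  # assumed table names
    src, tgt = table_schema(source, table), table_schema(target, table)
    for col in sorted(set(src) | set(tgt)):
        if col not in tgt:
            print(f"{table}.{col}: present in source, missing in target")
        elif col not in src:
            print(f"{table}.{col}: new column in target")
        elif src[col] != tgt[col]:
            print(f"{table}.{col}: type changed {src[col]} -> {tgt[col]}")
```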
Column by column comparison –
This testing gives complete coverage but is very time-consuming. Automation, for example with a tool such as QuerySurge, is a much better option than manual verification.
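A minimal sketch of a column-by-column comparison keyed on the primary key; the `customers` table, its columns, and the in-memory approach are assumptions, and a real run over large tables would stream and batch the rows instead.

```python
import sqlite3

source = sqlite3.connect("legacy.db")      # assumed source database
target = sqlite3.connect("new_system.db")  # assumed target database

COLUMNS = ["id", "full_name", "signup_date"]  # assumed columns; id is the key
SELECT = f"SELECT {', '.join(COLUMNS)} FROM customers"

def rows_by_key(conn):
    # Index each row by its primary key for easy lookup.
    return {row[0]: row for row in conn.execute(SELECT)}

src_rows, tgt_rows = rows_by_key(source), rows_by_key(target)

for key, src_row in src_rows.items():
    tgt_row = tgt_rows.get(key)
    if tgt_row is None:
        print(f"id={key}: missing in target")
        continue
    for col, s, t in zip(COLUMNS, src_row, tgt_row):
        if s != t:
            print(f"id={key}: column {col} differs ({s!r} vs {t!r})")
```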
Checking of Reconciliation –
Validating the reconciliation checks from source to target for all columns ensures that the data is not corrupted. This can be automated with a Unix shell script that compares exported text files.
Null/ Blank Space Validation –
This check can also be automated to detect whether any null or blank values were inserted by the migration tool.
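A minimal sketch of an automated null/blank check on the target; the table and column names in `CHECKS` are assumptions.

```python
import sqlite3

target = sqlite3.connect("new_system.db")  # assumed target database

# Columns that must never be NULL or blank after migration (assumed names).
CHECKS = {"customers": ["full_name", "signup_date"]}

for table, columns in CHECKS.items():
    for col in columns:
        bad = target.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL OR TRIM({col}) = ''"
        ).fetchone()[0]
        if bad:
            print(f"{table}.{col}: {bad} NULL/blank values found in the target")
```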
Monkey testing –
Monkey/ad-hoc manual testing should be performed on random datasets to check the reconciliation process.
Apart from this functional testing, we must also check the non-functional aspects of the data reconciliation process.
Security Validation – During security testing, we check whether all user roles have been migrated to the new system. This check ensures that no unauthorized access can happen in the new system. It should cover roles, tables, views, stored procedures, and dynamic value calculations.
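A minimal sketch of the role check, comparing the user-to-role assignments on both sides as sets; the `user_roles` table and its columns are assumptions.

```python
import sqlite3

source = sqlite3.connect("legacy.db")      # assumed source database
target = sqlite3.connect("new_system.db")  # assumed target database

def role_assignments(conn):
    # Assumed "user_roles" table mapping user names to role names.
    return set(conn.execute("SELECT username, role FROM user_roles").fetchall())

missing = role_assignments(source) - role_assignments(target)
unexpected = role_assignments(target) - role_assignments(source)

for username, role in sorted(missing):
    print(f"MISSING in target: {username} -> {role}")
for username, role in sorted(unexpected):
    print(f"UNEXPECTED in target: {username} -> {role}")
```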
Performance Testing – Performance testing should be carried out on the new system to validate the benchmarks set by the company.
Test the throughput of the migration process; this tells us whether the migration will finish in the planned time.
UAT – Reconciliation should also check whether the migrated data content needs any special conditioning.
UAT also plays an important role in the reconciliation process. During UAT, we check the critical data, null values, random samples, roles, and security.
Along with these, we also need to check all downstream channels that depend on this new system, to make sure every downstream consumer receives the data it requests.
Any wider ecosystem that consumes this data also needs to be checked.
Conclusion
Data reconciliation is especially important in financial domains such as stock and cash records. Data should be tested thoroughly while data migration and reconciliation take place.
Core system migration gives organizations a competitive advantage and many business benefits.
When the full population of data is not tested, potential issues remain.
If issues are detected early, the company can roll back and re-plan, which saves time and money.
If data problems remain in the system due to poor controls and a lack of reconciliation, the company loses reputation, business (financial loss), and trust.