DATA MIGRATION
Data Migration is a critical issue in all implementation projects. By starting early this will not become a bottleneck in the project. This is why we review technical requirements and data volumes at the beginning of the project. This is also why we provide you with the tools and skilled consultants to speed up data conversion.
Data integrity is never an issue, thanks to our full-integration concept, which has taken more than a decade to achieve to where it is today. An investment to ensure data integrity. Stand-alone software, which requires cumbersome interfaces with other third-party software, is easy to develop. However, full integration is a science that takes years of R&D effort to perfect.
Conversion activities are customized for specific applications and environment. Generally, the conversion process should encompass the following, which breakdown into various activities in below flow diagram:
1.Refining Data Conversion Strategy
2.Identify Data Conversion Specification
Data conversion strategy will be put into work from this stage onwards. Legacy and target database and their structure should be determined by studying all information required to be converted.
It is believed that the process cannot be successful without user participation. Thus, the application of this method requires the users to describe the data structures of the linearized input and output files and to specify the mappings between the source and target data.
Input data for the new application will be taken from old application. Conversion mapping is a convenient way to list the data conversion requirements, which will then become input to the conversion plan and to the development of conversion programs. The conversion mapping will list the new (or target) data that is required when the new application begins operation. For each piece of new data, a corresponding source must be identified. Finally, because it is important that only valid data is accepted in the new application, the conversion mapping will have to list the validation rules for accepting or rejecting each new piece of data.
Activities to be carried out under this stage are:
Document Key Considerations
Identify possible document error and exception handling.
Document the controls and checkpoint approach.
Identify the risks and dependencies, and document the mitigation strategy.
Identify Sources of Data
Identify the different sources, systems, and designs.
Map the data from the design of these sources to the target database.
Identify Integrity Constraints
Identify and document the integrity constraints of the data. Take into account the design of the sources and of the target database. These constraints will add to the Conversion Mapping deliverable.
Identify Coexistence and Interface Requirements
Identify the issues related to coexistence of data
Identify requirements for the subsystem based on these issues, additions to the requirements resulting from the data conversion approach and procedures, and on the Conversion Mapping deliverable.
3.Developing Conversion Programs
A data conversion subsystem design is the design of the subsystem that will execute the conversion of the data from old/legacy applications to new applications. This includes extraction, cleanup, and downloading data.
Activities to be carried out under this stage are:
Document Overall Data Conversion Flow
Define the flow of the data conversion subsystem, containing extraction, cleansing, summarizing and loading. Include time constraints, interaction with personnel executing the data conversion, etc.
Design Extraction Programs
Design the programs that extract data from the different sources. Make sure that all data is in the same format and can be used for cleansing. Then define the modules that make up these programs.
Design Cleansing Programs
Design programs responsible for resolving any inconsistencies in the data related to syntax, definitions, and formats.
The issues that might arise during this step can be divided into three basic categories: syntactic, structural, and semantic.
Syntactic issues are surface differences, usually the result of typing or data entry. Typical problems in this area include spelling errors (e.g., Abdullah vs. Abdulah), reference inconsistencies (e.g., 1st vs. First), uncleansable data (e.g., Roland Garros vs. French Open).
Structural issues refer to the case where internal representation of data is inconsistent between sources. Typical problems include definition inconsistencies (e.g., "m" and "f" vs. "0" and "1" for male and female), different character sets (e.g., Swedish vs. American character sets), data disagreement (e.g., clarifying rows with non-unique keys, by reusing a customer ID), data type inconsistencies (e.g., four-byte floating point vs. eight-byte floating point for unit price), and significant figures (e.g. sales figures with two decimal places vs. sales figures with three decimal places).
Semantic problems occur when the program user's interpretation of the data may differ, or when the disagreement among source records requires a human to resolve, such as inconsistent definitions (e.g., sales figures including returns vs. sales figures excluding returns), old information (e.g., one customer has two different addresses, which one is most current?), optional fields (e.g., optional spouse name cannot be cleansed automatically, because it might not have any information), free-form text fields (e.g., free-form customer name containing full names, with and without initials, vs. only customer's last name).
Design Loading Programs
Design programs for loading the data from the current structures (and any intermediate structures) to the future structures of the databases.
Option to be used for loading the finished data into the database will be via relational database management system (RDBMS) which require creation of flat files with the cleansed data, and use the RDBMS's own load utility. The advantage of this approach is that many RDBMS vendors provide enhanced load utilities which employ fast loading or specialized loading techniques.
4.Data Conversion Process
Mapped information and data conversion program will be put into use during this period. Duration and timeframe of this process will depend on:
Amount of data to be migrated
Number of legacy system to be migrated
Resources limitation such as server performance
Error which were churned out by this process
The conversion error management approach aims to reject all records containing a serious error as soon as possible during the conversion approach. Correction facilities are provided during the conversion; where possible, these will use the existing amendment interface.
Errors can be classified as follows:
Fatal errors - which are so serious that they prevent the account from being loaded onto the database. These will include errors that cause a breach of database integrity; such as duplicate primary keys or invalid foreign key references. These errors will be the focus of data cleansing both before and during the conversion. Attempts to correct errors without user interaction are usually futile.
Non-fatal errors - which are less serious. Load the affected error onto the database, still containing the error, and the error will be communicated to the user via a work management item attached to the record. The error will then be corrected with information from user.
Auto-corrected errors - for which the offending data item is replaced by a previously agreed value by the conversion modules. This is done before the conversion process starts together with user to determine values which need to be updated.
One of the important tasks in the process of data conversion is data validation. Data validation in a broad sense includes the checking of the translation process per se or checking the information to see to what degree the conversion process is an information preserving mapping.
Some of the common verification methods used will be:
Financial verifications (verifying pre- to post-conversion totals for key financial values, verify subsidiary to general ledger totals) - to be conducted centrally in the presence of accounts, audit, compliance & risk management;
Mandatory exceptions verifications and rectifications (on those exceptions that must be resolved to avoid production problems) - to be reviewed centrally but branches to execute and confirm rectifications, again, in the presence of network management, audit, compliance & risk management;
Detailed verifications (where full details are printed and the users will need to do random detailed verifications with legacy system data) - to be conducted at branches with final confirmation sign-off by branch deployment and branch manager; and
Electronic files matching (matching field by field or record by record) using pre-defined files.
Here we have described a methodology and model for data conversion. The approach assumes that the legacy and target systems play an important role in the conversion process by transforming the source data into files to be used as inputs to the conversion system and transforming the conversion system output into desirable target formats.
Last but not the least, one of the key to success is to prepare for it very early on, and continue to monitor data migration throughout the life of the project. In addition, a devoted team with a clearly defined project plan from the inception of the project, armed with automated tools where applicable, is indeed the formula to success.