Information quality is a prerequisite for effective business performance, and data quality and data integrity are central to implementing a database system. Poor-quality data causes problems for both IT departments and the business, while a lack of data integrity can make the system inaccessible. Building quality data must start from the beginning of deploying the system, and checking and observing the system from time to time will support the data integrity process.
There is little point in storing data if that data does nothing to improve an organization's overall performance. According to Redman (2008), every organization needs to manage and improve its data. Data quality means the reliability and usefulness of data that fit the purpose of the business, or that can be used for its planning, operations, and decision making. Data quality does not primarily require tools, skills, or money; rather, it requires discipline, proper orientation, leadership, and the determination to improve. Ensuring the quality of data that enters databases and data warehouses is essential if users are to have confidence in their systems.
Managing this data calls for a data warehouse and data specialists. A data warehouse is a central repository for all, or significant parts of, the data that an enterprise's various business systems collect.
THE CHARACTERISTICS OF QUALITY DATA
To understand what lies beneath the meaning of quality data, a data professional needs to understand the characteristics of quality data.
Data should be sufficiently accurate and precise for its intended use. To be reliable, data must be consistent, credible, and accurate. Although data may have multiple uses, the system administrator should capture it only once; population figures and gender counts are examples. Accuracy matters because it determines whether conclusions drawn from the data are right or wrong, and inaccurate data can distort the historical record.
The user must be able to access the data, which means having both the means and the privilege to get at it. To be accessible, the data must be available and must exist in a form the user can reach.
Data requirements should be clear and precise, based on the information needs of the organization. It is important that the process of collecting data is in line with those requirements.
The captured data must be relevant, usable, and fit for the organizational purpose. This may require a series of reports, or a review of the requirements, to be reproduced, depending on what has changed. To be useful, the data must be significantly relevant to, and fit the requirements of, the decision being made.
The data must be believable and reliable to the user, to the extent that the user can use it as a decision input. Even if the collection process changes from time to time, the data should remain steady and consistent. It also means obtaining confirmed data, authenticated by its source.
Data should be captured as soon as possible after the event or activity it describes, and it must be available for its intended use within a reasonable period of time. Timeliness can be characterized by currency, the time at which the data was stored in the database. Timely data supports information needs and helps win influence and support from management.
The data must be useful, meaning it can serve as an input to the user's decision-making process. It is also important that the user can interpret the data, which requires understanding its syntax and semantics.
10 HABITS OF ENTERPRISES WITH THE BEST DATA
Enterprises with data at large scale focus on preventing errors at the moment data is created. They take an interest in which data is most important, identifying and eliminating errors there first.
To prevent errors at the source, Redman (2008) identifies ten habits of enterprises with the best data. The habits are:
- Customer focus
- Process management
- Supplier management
- Measurement
- Continuous improvement
- Control
- Targets for improvement
- Clear management accountabilities
- Managing soft issues
- Broad, senior group leadership
Customer focus is essential because quality is in the eye of the customer. Planning a good, user-friendly system usually requires the customers' or end users' requirements; without understanding them, it is impossible to satisfy them. Following this habit means sitting with customers and listening, learning how they make decisions, what data they need and what level of quality they expect, and documenting it properly in a "Customer Data Requirements" or "System Requirements Specification" document. This will improve data quality.
With the requirements in hand, it is recommended to work backwards to the business process that creates the data and to manage that workflow end to end. The data creators may have no idea who uses their data or how. Good process managers consult with them and share the requirements document; as a result, employees will contribute ideas to improve their work.
Sometimes data is created outside the enterprise, at the supplier base. The data quality leader needs to manage these data suppliers, and managing suppliers well helps to improve quality.
Good measurements are useful in identifying areas for improvement, and statistical data helps in this regard. Top companies usually publish their data quality statistics in order to win stakeholder confidence.
Continuous improvement means the work is never finished with a single project. An improvement project involves investigating the pattern of errors, selecting and isolating the problem, and changing the business process to eliminate the cause. Enterprises with the best data have no trouble starting and completing improvement projects.
Control is the managerial act of ensuring that data creators consistently follow the policies and guidelines they have been given. Controls operate at several levels: helping people enter data correctly, preventing errors from leaking downstream, ensuring that the system is working properly, and using statistical control and audit trails to counter errors.
Setting achievable targets improves both the data and the process. For example, committing to cut the error rate by a set percentage every year helps maintain data quality.
Enterprises with the best data recognize the importance of leadership, especially clear management accountabilities: everybody is accountable for the data they create and process. Once this becomes a habit, errors are identified immediately.
Managing soft issues is about organizational politics and the views of people in the organization. Data people might want 360-degree views of customers, but sales people may be reluctant to contribute because, to them, that information is money and chargeable.
The success of most data quality programmes rests on leadership. If a high-ranking leader, or top management, fully supports the project, it is easier to persuade everybody to follow the business process and to secure the financial budget, especially for maintenance of the project.
DATA QUALITY TEAM
The challenge for data professionals is not just managing the quality and integrity of data; they also need to understand the financial lingua franca: how money is spent, the return on investment (ROI), and how the work will help the business or organization run smoothly.
Data quality initiatives need business and technical analysts as well as testers and executives who recognize the importance and value of good, clean and reliable data. A typical data quality team can vary in size depending on the complexity of the task. Consequently, staff composition and levels will vary accordingly.
A typical data quality team will comprise the following:
- Client executive
- Data quality team leader
- Business analyst (one per source system or business unit)
- Data owners (to act as super-users and data validators)
- Data extractor/system analyst/technical analyst (one per source system)
- Data modeler/data architect
- Data-cleaning specialist(s)
- Data quality trainer
- External data providers
DATA QUALITY TECHNOLOGY
Data quality technology can play a major role in ensuring correct project preparation and scope. By deploying data quality technology at the beginning of the project, the managing team gains a much clearer view of the legacy data. Key activities where the technology can help reduce uncertainty and budgetary risk during scoping analysis include:
- Data profiling
- Data matching/merging
- Data de-duplication
- Data standardization/parsing
Data profiling helps to identify the scope of the data. It is also a useful technique for identifying and eliminating data objects that are empty. With this information, migration specialists can estimate how much effort is needed to map and migrate data to the target environment.
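The profiling idea can be illustrated with a minimal sketch: compute the fill rate of each field so that empty data objects fall out immediately. The sample records and field names below are hypothetical, not taken from any particular system.

```python
# Minimal data-profiling sketch: report the fill rate of each field so
# empty data objects can be identified and dropped from the migration scope.

def profile(records):
    """Return {field: fraction of records with a non-empty value}."""
    if not records:
        return {}
    fields = {f for rec in records for f in rec}
    rates = {}
    for f in fields:
        filled = sum(1 for rec in records if rec.get(f) not in (None, ""))
        rates[f] = filled / len(records)
    return rates

# Hypothetical legacy records: the 'fax' column turns out to be empty
# in every record, so it can be excluded from the migration scope.
records = [
    {"id": 1, "name": "Acme Ltd", "fax": ""},
    {"id": 2, "name": "Binary Co", "fax": ""},
    {"id": 3, "name": "", "fax": ""},
]
rates = profile(records)
```

A real profiling tool adds type inference, value distributions, and key analysis, but the fill-rate report above is the core signal used to size the mapping effort.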
Data matching/merging is useful within consolidation projects such as a commercial merger or a system rationalization. After profiling the data, disparate systems are linked together via key data objects. Data matching also uses additional functions, such as parsing, to improve the match/merge success rate; for example, linking a number of product databases together calls for an advanced matching algorithm, which in turn helps the migration process succeed.
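As a rough sketch of the linking step, assume two hypothetical product databases that share a product code but format it differently; normalizing the code gives a key on which the records can be merged.

```python
# Matching/merging sketch: link two hypothetical product databases on a
# normalized key (a lower-cased, whitespace-stripped product code).

def norm_key(code):
    return code.strip().lower()

def match_merge(system_a, system_b):
    """Merge records sharing a key; B's fields fill gaps in A's records."""
    merged = {norm_key(r["code"]): dict(r) for r in system_a}
    for r in system_b:
        key = norm_key(r["code"])
        if key in merged:
            for field, value in r.items():
                merged[key].setdefault(field, value)  # keep A's value on clash
        else:
            merged[key] = dict(r)
    return merged

a = [{"code": "AB-100 ", "name": "Widget"}]
b = [{"code": "ab-100", "price": 9.5}, {"code": "CD-200", "name": "Gadget"}]
result = match_merge(a, b)
```

Production matching engines use fuzzy comparison and scoring rather than exact keys, but the normalize-then-join pattern is the same.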
Data de-duplication is widely used during the scoping and resource-estimation phase to analyze the precise number of data objects in scope; it is closely related to matching/merging. For example, by linking disparate customer or inventory databases and completing a de-duplication exercise, the project team can determine a far more accurate count of the total business objects to be migrated than by simply counting the total number of records in each system.
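A minimal sketch of that counting exercise, under the simplifying assumption that a normalized customer name identifies a business object:

```python
# De-duplication sketch: estimate the true number of business objects by
# counting distinct normalized keys rather than raw records.

def normalize(name):
    # Collapse case and internal whitespace so near-duplicates collide.
    return " ".join(name.lower().split())

def distinct_objects(*sources):
    seen = set()
    for source in sources:
        for record in source:
            seen.add(normalize(record["customer"]))
    return len(seen)

# Two hypothetical systems hold 4 raw records between them, but only
# 3 distinct customers actually need to be migrated.
crm = [{"customer": "Jane  Smith"}, {"customer": "Bob Lee"}]
billing = [{"customer": "jane smith"}, {"customer": "Ann Wu"}]
count = distinct_objects(crm, billing)
```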
Data standardization/parsing is required during the migration process. After matching and de-duplicating data objects from multiple sources, data cleansing transforms the data so that the original data matches the target system's format; simply joining the two systems will not match successfully. Data parsing can be used to break a data element down into its constituent parts, allowing the data in both systems to be joined together far more successfully.
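A small sketch of the parsing step: one system stores an address as a single free-text field, so it is broken into constituent parts before joining. The "street, city, postcode" layout is a simplifying assumption for illustration.

```python
# Parsing sketch: break a single free-text data element into constituent
# parts so the two systems can be joined on those parts.

def parse_address(raw):
    """Split 'street, city, postcode' into labelled parts."""
    parts = [p.strip() for p in raw.split(",")]
    # Pad so that short or malformed inputs still yield three fields.
    street, city, postcode = (parts + ["", "", ""])[:3]
    return {"street": street, "city": city, "postcode": postcode}

parsed = parse_address("12 High St, Springfield, 49007")
```

Commercial parsers handle many more layouts and ambiguities, but the principle is the same: decompose, then match part by part.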
Data integration is the technology that provides reliable information, helping the organization accomplish its targets, maintain continuous improvement, and enhance IT and, especially, end-user productivity. Mid-size and large organizations are able to leverage their data resources efficiently when data integrity is in place. To be successful, organizations need the ability to analyze performance. Byun (2006) summarizes the essential requirements for integrity control systems as follows:
- Control of information flow prevents higher-integrity data from being contaminated (or influenced) by lower-integrity data
- Data verification ensures that only verified data are provided to certain transactions
- Prevention of fraud and error is necessary to ensure that only valid data are introduced to information systems
- Autonomous data validation maintains and/or enhances the integrity (or confidence) of data, independently of data access
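The second requirement, data verification, can be sketched as a gate in front of a transaction: only records passing a verification predicate reach it, and the rest are quarantined for review. The predicate and field names below are illustrative assumptions, not Byun's formal model.

```python
# Sketch of the data-verification requirement: a transaction only
# receives records that pass verification; the rest are quarantined.

def verify(record):
    # Illustrative rules: a payment needs a positive amount and an account.
    return record.get("amount", 0) > 0 and bool(record.get("account"))

def run_transaction(records):
    verified = [r for r in records if verify(r)]
    rejected = [r for r in records if not verify(r)]
    # A real system would now process `verified` and log `rejected`.
    return verified, rejected

batch = [{"account": "A1", "amount": 50},
         {"account": "", "amount": 20},
         {"account": "B2", "amount": -5}]
verified, rejected = run_transaction(batch)
```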
DATA INTEGRITY APPROACHES
An organization creates a great deal of data across different applications and systems. Data integration makes it possible to consolidate the current data in the operational or production systems and combine it with historical values.
There are two basic approaches:
- Develop an in-house solution
- Acquire a commercial offering
In House Development
Organizations that build their own solutions generally assign the project to the IT department, where a programmer or a team of programmers creates the programs necessary to integrate all the data. In-house development works well when the source systems are well documented or the information is easy to obtain; otherwise the programmers must start the system development life cycle from user requirement specifications and work downstream, identifying integrity problems, bureaucratic problems, in-house customer problems, and so on. As any experienced programmer knows, an implemented project needs support and maintenance, especially when a file structure or process changes. The turnover rate of programmers is another factor to consider, since training a new programmer is itself a burden. Moreover, if the in-house development is uncoordinated and unmanageable, more problems arise. The situation may differ if the organization is a software house.
Purchasing Commercial Data Integration Solutions
Data integration solutions have been on the market for many years and provide a wide range of capabilities for the organization. Moreover, employees are not burdened beyond their own tasks and need only maintain the system. Capabilities include:
- Support for a variety of data sources, formats, types, and targets
- Integration with other commercial applications
- Extensive libraries of code and functions
- Data quality functionality
- Metadata integration with other tools
- Documentation, data tracking, hit reports, and audit trails
- The ability to satisfy future needs
- A variety of packaging and pricing options
Vendors usually offer data integration capabilities that are flexible and do not constrain future choices of database and operating system. Many support a wide variety of data types, not just SQL-based ones.
Third-party packaged software applications are integrated into the commercial data integration solution, which also avoids the problems that occur when data is modified or created. Relying on enterprise application software is reasonable because it will continue to develop and expand in the future. The commercial solution should offer packages that facilitate the population of data warehouses and data marts.
The extensive libraries of code and data transformation functions in commercial integration products give them greater capability for data transformation and aggregation. This minimizes the need for coding and code maintenance, which the vendor handles instead, while data integration staff can monitor the debugging process interactively.
Data quality is the most essential element of data integration, and integration products help with data cleansing as part of their offering. Common features of these tools are data profiling, data auditing, and data conditioning.
Commercial products are designed to integrate and leverage metadata. This is accomplished by meeting the requirements of standards such as the Object Management Group's Common Warehouse Metamodel (OMG CWM), which allows the data integration software's metadata repository to exchange metadata with other third-party tools that are CWM-compliant.
Commercial vendors are able to document the conversion process, a necessary element for hit reports, audit trails, and statistical results. The audit trail lets the database administrator keep track of and monitor the database, while statistical data tracks the hit reports used by customers, demonstrating the system's return on investment.
The solution should support continuous improvement for future needs. Process change is one of the factors that drives commercial vendors to upgrade and improve their features, which is usually possible only if maintenance agreements are in place.
As the buyer or implementer, it is necessary to choose the appropriate features and packages on offer. One must look ahead, taking into consideration the organization's economic condition and whether the solution offers multi-user or single-user licence prices and packages. With a vendor, customers will face a higher initial start-up cost than with an in-house solution.
IMPLEMENTING DATA INTEGRITY AND DATA QUALITY
ENSURING DATA QUALITY
To ensure data quality, certain steps need to be taken.
First, perform a data quality health check. This is recommended before any major systems implementation, and the tests mirror the characteristics of quality data described earlier. The exercise examines whether the data associated with the implementation passes the following tests:
- Accuracy - error-free data entry, transformation, analytic operations, storage, distribution and application processes
- Completeness - data available in all relevant database records
- Consistency in both definition and treatment across the organization's information systems and databases
- Compliance with organization's business rules
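A minimal health-check sketch covering three of the four tests: completeness (required fields present), consistency (country codes drawn from one agreed set), and compliance with a business rule (no negative credit limits). All field names and rules are assumptions for illustration.

```python
# Data quality health-check sketch: run completeness, consistency, and
# business-rule (compliance) tests over a batch of records.

REQUIRED = ("id", "country", "credit_limit")

def health_check(records, valid_countries):
    """Return a list of (record id, issue description) pairs."""
    issues = []
    for rec in records:
        missing = [f for f in REQUIRED if not rec.get(f)]
        if missing:
            issues.append((rec.get("id"), "incomplete: " + ", ".join(missing)))
        if rec.get("country") and rec["country"] not in valid_countries:
            issues.append((rec.get("id"), "inconsistent country code"))
        if rec.get("credit_limit", 0) < 0:  # business rule: no negative limits
            issues.append((rec.get("id"), "rule violation: negative limit"))
    return issues

records = [{"id": "C1", "country": "MY", "credit_limit": 1000},
           {"id": "C2", "country": "Malaysia", "credit_limit": -50}]
issues = health_check(records, valid_countries={"MY", "SG"})
```

In practice the accuracy test also requires comparison against a trusted source, which is why the third-party validation described later is often brought in.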
Second, review the organization's processes for capturing, organizing, storing, and accessing data. An important activity that must not be overlooked is identifying who owns the data, who uses it, and how they use it. With this information, changes can be made to business processes, source systems, and business rules.
Third, to continuously improve and maintain data quality, there is a need to establish data quality management principles, particularly in the areas of data quality monitoring, and training and education programmes.
IMPLEMENTING THE SYSTEM
- A joint client-vendor team launched a series of programmes involving people, process, and technology to ensure that the new system could go live with clean data.
- Third-party involvement - The decision to involve third parties depends on the team and the business process, and needs careful consideration: data confidentiality, legal requirements, and customer sensitivity must be thought through. An interesting aspect of the project is the involvement of a third party to validate the accuracy of the data. Such organizations can determine the accuracy of information by a number of means:
- Their own collection of databases
- Data quality tools
- A data-cleaning strategy was defined, covering:
- Defining a data standard across the legacy data;
- Identifying what data needed to be migrated, in what time frame, and from which legacy data sources;
- Defining data-cleaning approach;
- Cleansing rules;
- Cleansing responsibilities; and
- Internal versus external validation.
- An integrated data model was designed to link its various legacy data sources together. The data was extracted, transformed and loaded into the new integrated data model. Data cleansing was performed by a combination of advanced data quality tools and processes. Common features of these tools are:
- Data profiling - automates source system profiling and analysis, and provides database recommendations. Helps reduce the time taken to analyze data source systems.
- Data auditing - validates data quality, ensuring that data complies with business rules. Analysis and trending capability to measure data quality over time.
- Data conditioning - identifies data inconsistencies and data needing standardization (lexical and syntactical). Corrects and validates data using pre-defined rules.
The database administrator must be able to detect errors in the data quality control process. When errors occur, the administrator needs to examine quality indicators such as the collection method or the source of the data itself. One of the main challenges is dealing with inconsistencies of data within the data integration system: violations of integrity constraints may occur when data stored in the local sources conflicts with what is supposed to hold at the global level. Applying data quality best practices to address data issues will help the process of implementing data quality and data integrity solutions. The biggest challenge is dealing with people, who tend to make the same mistakes again and again. Data management should tackle this by creating policy, making sure corrective action is taken, and, where needed, putting the human touch and soft skills into practice.
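The local-versus-global inconsistency described above can be sketched as a check run after the local sources are combined: here a global constraint says each customer ID must map to a single name, and the check flags IDs where the sources disagree. The source names and fields are hypothetical.

```python
# Sketch of detecting integrity-constraint violations across local
# sources: a global rule (one name per customer ID) is checked after
# the local systems are combined.

def find_violations(sources):
    """Return customer IDs that appear with conflicting names globally."""
    seen = {}
    conflicts = set()
    for source in sources.values():
        for rec in source:
            cid, name = rec["id"], rec["name"]
            if cid in seen and seen[cid] != name:
                conflicts.add(cid)  # local sources disagree at global level
            seen.setdefault(cid, name)
    return conflicts

sources = {
    "sales":   [{"id": "C1", "name": "Jane Smith"}],
    "billing": [{"id": "C1", "name": "J. Smyth"}, {"id": "C2", "name": "Ann Wu"}],
}
conflicts = find_violations(sources)
```

Each flagged ID then becomes a candidate for the corrective action and human follow-up the text recommends.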