In early ages, theory of databases centered around a single database serving every purpose known to the information processing from transaction to batch processing and from transaction to analytical processing. In most cases, the primary focus of the early database systems was operational. In recent years, a more sophisticated system in database has come in existence. One that serves operational needs and another that serves informational or analytical needs. To some extent, this more enlightened system of the database is due to the advancement 4GL technology, and the empowerment of the end user. This research paper is about the decision support system environment and the structuring of data in that environment. The focus of the paper is on how to create data warehouse. In this paper I shall discuss the integration of the data to build a data warehouse because there is no point to build a data warehouse without integrating data bases. Book store has number of databases which are unstructured and non integrated. There are a lot of problems associated with integration of databases which are not structured. I shall discuss ETL software in this paper which can be used to integrate such data to build a data warehouse from the operation environment of book shop. This paper focuses on the development life cycle, hardware utilization, data warehouse maintenance in book selling shop.
Worldwide Internet book selling organization is a large organization having several unstructured databases. To increase their market position they have decided to build a data warehouse for analytical purpose. This internet book selling organization currently uses a relational, object-relational and object oriented database management systems. In order to build its data warehouse worldwide internet book selling organization will need to integrate its sock levels, prices, suppliers and customers databases.
Data warehouse is oriented to the major subject areas of the organization. So typical subject areas include the following:
- Stock level
Data warehouse is not built at all at once. It is a slow timely process. In this scenario on first day existing system data will be operating and on second day first few tables of the first subject area of the data warehouse will be populated. On third day more data warehouse will be populated. Now its time when a decision support system analyst will be attracted towards the system. On fourth day some more data warehouse will be populated. Some data of operational system will become permanent in data warehouse and will be used for analytical purposes. On fifth day departmental databases which will be data marts will start to blossom. Departments will find its easy to get processing done by data warehouse. On sixth day data warehouse will start working fine but not perfectly. And after some days its architecture will be fully developed.
There is another factor involved in building data warehouse for book selling organization. And that is granularity. There are different levels of granularity. A single transaction that includes the sale of a book will be on lower granularity level while sales over month will be the higher level of granularity. Granularity of data has always been a major design issue. In early operational systems, granularity was taken for granted. When detailed data is being updated, it is almost a given that data be stored at the lowest level of granularity. In the data warehouse environment, though, granularity is not assumed.
Benefits of Granularity:
The granular data found in the data warehouse is the key to reusability, because it can be used by many people in different ways. For example, within a corporation, the same data might be used to satisfy the needs of marketing, sales, and accounting. All three departments look at the basic same data. Marketing may want to see sales on a monthly basis by geographic district, sales may want to see sales by salesperson by sales district on a weekly basis, and finance may want to see recognizable revenue on a quarterly basis by product line. All of these types of information are closely related, yet slightly different. With a data warehouse, the different organizations are able to look at the data as they wish to see it.
Looking at the data in different ways is only one advantage of having a solid foundation. A related benefit is the ability to reconcile data, if needed. Once there is a single foundation on which everyone relies, if there is a need to explain a discrepancy in analyses between two or more departments, then reconciliation is relatively simple.
Another related benefit is flexibility. Suppose that marketing wishes to alter how it looks at data. Having a foundation in place allows this to be accomplished easily. Another benefit of granular data is that it contains a history of activities and events across the corporation. And the level of granularity is detailed enough that the data can be reshaped across the corporation for many different needs. But perhaps the largest benefit of a data warehouse foundation is that future unknown requirements can be accommodated. Suppose there is a new requirement to look at data, or the state legislature passes a new law. There is a constant stream of new requirements for information because change is inevitable. With the data warehouse in place, the corporation can easily respond to change. When a new requirement arises and there is a need for information, the data warehouse is already available for analysis, and the organization is prepared to handle the new requirements.
Some Facts: (Cost)
There is no getting around the fact that data warehouses cost money. Data, processors, communications, software, tools, and in fact everything costs money. Actually the volumes of data that is aggregated and collected into the data warehouse will go well beyond everything the corporation had ever seen. The level of detail and the history of that detail all add up to a large amount of money. In almost every other aspect of information technology, the major investment for a system lies in creating, installing, and establishing the system. The ongoing maintenance costs for a system are miniscule compared to the initial costs. However, establishing the initial infrastructure of the data warehouse is not the most significant cost the ongoing maintenance costs far outweighs the initial infrastructure costs. There are several good reasons why the costs of a data warehouse are significantly different from the cost of a standard system
Problems in Existing System:
There could be some problems in existing system, which are as follows
- Lack of data credibility.
- Problems with productivity
- Inability to transform data to information
Lack of data credibility:
The lack of data credibility is something like two different departments want to deliver data report to management. One department claims different percentage of activity and other claims different. It shows the lack of data credibility in existing system. It means management receives the conflicting reports.
Problem with productivity:
There could be some problems with data productivity for analytical purposes. They would have to hire system analysts and programmers to assist them with data productivity. That's why it is necessary to build a new data warehouse.
Inability to transform data to information:
In book selling store case another problem with their current system might be the inability to transform data into information, because they are using unorganized way of storing their information into databases. And these databases are not properly integrated.
When bringing data from operational databases to data warehouse it is essential that data should be integrated properly. If data is unintegrated it will be inconsistent. It means data will give inconsistent analysis.
Benefits of Data warehouse:
There are lots of benefits concerned with the book selling shop if they convert their old database management system to the data warehouse.
First of all building the data warehouse will use the software development life cycle (SDLC). Following are the steps to SDLC of data warehouse
- Implement warehouse
- Integrate data
- Test for biasness
- Program against data
- Design decision support system
- Analyze results
- Understand requirements
Patterns of Hardware Utilization in data warehouse and existing system:
There will be a cost reduction because of hardware usage pattern. In book shop's current system hardware usage pattern is consistent e.g. if someone sends a query or sends a transaction to the system it will use all the resources and if the system is idle even then it will use the whole resources. But in data warehouse the scenario is different. Resource are utilized only when they are needed. It means it will help to reduce the cost.
There will be some more benefits related to data in data warehouse which are as follows
It means data in data warehouse will always be correct in all the way. E.g. there will be no missing or incorrect or different data types values in it. It will be restructured as well. It will be monitored every time by the data analysts. And as it's a warehouse so obviously it will be indexed for fast searching.
One of the most powerful uses of a monitor is to be able to compare today's results against an 'average' day. Once the current day is measured, it can then be compared to the average-day profile.
Typically this data warehouse will focus on the finance, marketing and sales criteria. It means that sometimes an organization can access to a piece of information in 10 with data warehouse and the other organization can gain access to the same information in 1000 without data warehouse.
Homogenous/heterogeneous data in data warehouse:
In first sight it seems that data in data warehouse is homogenous but the fact is something different, the fact is data in data warehouse is really very heterogeneous. In this store case it has many subject areas that are price, customers, books, time etc.
Data warehouse is a vital solution for the building decision support system. In my whole paper I have discussed all the benefits, problems, drawbacks and main points of building a data warehouse in a large scale world wide internet book selling store. My major points were with concerned to the book store. I have discussed the cost as a problem and as a benefit as well. I have discussed the potential future benefits of the data warehouse to be build in this store. They can use it for forecasting etc.
- Dodge, Gary, and Tim Gorman. Oracle8i Data Warehousing. New York: John Wiley & Sons. 2000.
- Dyche, Jill. The CRM Handbook. Reading, MA: Addison Wesley. 2001. E-data: Turning Data into information with Data Warehousing. Reading, MA: Addison Wesley. 2000.
- English, Larry. Improving Data Warehouse and Business Information Quality. New York: John Wiley & Sons. 1999.
- Hackathorn, Richard. Web Farming for the Data Warehouse, San Francisco: Morgan Kaufman. 1998.
- Inmon, W.H. and R.H. Terdeman. Exploration Warehousing. New York: John Wiley & Sons. 2000.
- Kachur, Richard. Data Warehouse Management Handbook. Englewood Cliffs, NJ: Prentice Hall. 2000.
- Kelly, Sean. Data Warehousing: The Key to Mass Customization. New York: John Wiley & Sons. 1996.
- Kimball, Ralph, and Richard Merz. The Data Webhouse Toolkit. New York: John Wiley & Sons. 2000.
- Love, Bruce. Enterprise Information Technologies. New York: John Wiley & Sons. 1993.
- Marco, David. Meta Data Repository. New York: John Wiley & Sons. 2000.Parsaye, Kamran and Marc Chignell. Intelligent Database Tools and Applications. New York: John Wiley & Sons. 1989.
- Silverston, Len. The Data Model Resource Book Volume I. New York: John Wiley & Sons. 2001.
- Sullivan, Dan. Document Warehousing and Text Mining. New York: John Wiley & Sons. 2001.
- Swift, Ron. Accelerating Customer Relationships. Englewood Cliffs, NJ: Prentice Hall. 2000.
- Tannenbaum, Adrienne. Metadata Solutions. Reading, MA: Addison Wesley. 2002.