In order to address the issue of how data warehousing can help to inform and shape business intelligence, it is first important to understand exactly what data warehousing means. Having established this definition, the essay will then examine the benefits that can be yielded from implementation (with a focus on the case study of Tesco), and also examine the potential obstacles and challenges that affect effective implementation and use.
Turning first to a definition of the term ‘data warehousing’, Inmon (1995) offers the following: a data warehouse is ‘an integrated, subject-orientated, time-variant, non-volatile collection of data that provides support for decision making’. It is worth briefly outlining the meaning of each element further. Integrated refers to the fact that a data warehouse is a centralised information source for the entire organisation, and therefore incorporates multiple sources and types of data; integration is the collection and standardisation of these different forms. Subject-orientated means that, ideally, a data warehouse is able to group the data within it according to the subject to which it refers. This allows queries to be made by diverse departments within a business, with the knowledge that the data yielded will relate to their subject (e.g. marketing, sales or distribution). Time-variant refers to the historical nature of the data, and can therefore be contrasted with operational data; rather than painting a picture of current performance, a data warehouse is able to present an image of data over a sustained period of time. In this way, it is useful for spotting trends and making future predictions. Finally, non-volatile refers to the permanent nature of the data within a data warehouse - once it is entered, it does not leave.
In defining a data warehouse, the benefits of such a structure are already apparent. These are perhaps best highlighted through the examination of the practical implementation of a data warehouse - in this case the Tesco ‘Crucible’ data warehouse. The Tesco Crucible database is arguably one of the most significant and successful data warehouses in the UK. Tesco implemented this warehouse in 1994 to incorporate the data yielded from its Clubcard operation. The Clubcard enabled Tesco to collect data on every purchase made by Clubcard members, as well as general demographic data. Customers are incentivised to use the Clubcard through the collection of points, which can be used for money off in store, or for vouchers for other experiences. Whilst many view the scheme as a loyalty initiative, the Clubcard and the resulting database are far more significant. Firstly, the database allows for marketing targeted at each individual customer, by matching vouchers and postal promotion offers with customer preferences. Vouchers can either be for existing preference purchases, thereby encouraging the customer to return to the store, or they can be for related purchases, thereby encouraging the customer to switch brands. The database also paints a more general picture of demographic trends, allowing for a wider alignment of marketing strategy. This can range from TV and press advertising to the arrangement of products in store.
As well as informing marketing, the database allows for the effective alignment of other operations. It can inform product choice and predictions regarding demand, and therefore delivery requirements. In this way, Tesco is able to streamline supply chain operations and ensure that sufficient (but not excessive) stock is kept at all times. The data also presents information regarding potential new markets that could be accessed, and even the location or expansion of stores. Furthermore, the database yields information that can be shared with suppliers, allowing for a relationship of open communication that benefits both parties (Fernie and Sparks, 2004). Humby and Hunt (2004) state that almost every action taken by Tesco is informed by the data held in the data warehouse, and that this has driven Tesco to become the market leader in the UK.
By examining the benefits yielded for Tesco, the advantages of a well-implemented data warehouse become clear. Such systems can deliver strategic advantages in terms of dynamic strategy (Nguyen, 2009), resources and procurement (Grant, 2003), competition and competitive strategy, reputation (Kimball and Caserta, 2004), collaboration, and analysis of the industry. Becker (2002) stresses that, in specific relation to data warehousing, these multiple advantages can be achieved because the data warehouse facilitates the gathering, standardisation and maintenance of data, and therefore makes the data more accessible, accurate and complete for the entire business.
However, Becker (2002) argues that data warehouses are often poorly implemented and fail to deliver the competitive advantage that they should. She states that one of the central issues comes about from the use of the term ‘warehouse’ as a metaphor for the way in which an IT department organises its historical data. The word ‘warehouse’ carries with it mental imagery of a ‘large, cavernous building’ that is crammed with materials. In such a structure, customers do not (indeed, cannot) browse the shelves with a shopping list looking for particular items. The metaphor of a warehouse is therefore ill-suited, because the ability of the user to search for data and find it arranged helpfully, logically and conveniently should be a primary concern in an effective data system. Becker submits that a better metaphor would be a ‘data department store’, with all the associations of retail convenience - easy access, logical organisation and support from friendly shop floor staff. More importantly, what users are likely to discover within a department store is more ‘finished’ informational data, rather than huge reams of raw data, which is what one might expect from a warehouse. Of course, the two names can refer to exactly the same system, but using a name that conceptually invokes more user-friendly service features may be important in guiding the way in which a ‘warehouse’ is implemented.
Becker (2002) goes on to indicate that it is perhaps this conceptual mismatch that has led to the misapplication of data warehousing in a number of situations. She states that ‘data warehousing has been applied without critical reflection on the cognitive mappings implied by the underlying metaphor, or the possible consequences of its use in practice’. She uses the example of a business that failed to consider the user when implementing the system; instead of focusing on the queries that users were likely to put forward, and what their expectations of the system and IT support would be, the IT department in charge of implementing the system focused on the constant expansion of the data in the warehouse. Of course, these substantial amounts of data are virtually worthless if they cannot be accessed by users in a way that will generate ‘information’ that can effectively inform business practices (Abramowicz et al, 2002).
In order to address these problems, she states that two central changes must occur. Firstly, the architects of data warehouses must change their underlying concept of the ‘data warehouse’ towards a user-orientated view of a data department store. Having engaged in this change of mind-set, they must then investigate in detail the way in which different organisational members (including operational users, business analysts and managers) will engage with the system, in order to assess the way in which the data will be used. This will then determine the type of data that is contained within the warehouse, the dimensions of data quality, and the way in which the data is accessed and utilised.
The conceptual framework that underpins the structure of a data warehouse is not the only issue that can limit the efficacy of a system in informing strategic business practice. Despite the success of the Tesco data warehouse outlined in the beginning of this essay, the original architects highlight a number of issues that present themselves when designing, implementing and maintaining such a system (Humby, Hunt and Phillips, 2004).
The first problem that they outline is the issue of format. When initially implementing the data warehouse, Tesco was not just creating a skeleton into which newly generated data could be slotted, but also directly inputting existing data. However, this data came from disparate sources, and came in different formats. A specific example was the need to integrate transactional records from 57 William Low stores, all of which used different point of sale methodologies. The question that then arises is how this data can be standardised into a single format. Rob et al (2008) state that in order to avoid a ‘data tangle’ it is necessary to choose a format that can be applied throughout the organisation. Changing existing data to this newly decided format can be a time-consuming exercise, but will yield significant benefits in terms of usability and speed in the long term. However, the format must be carefully considered, with particular concern for the wider implications of the adopted format throughout the business.
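The standardisation step described above can be illustrated with a minimal sketch. It assumes two hypothetical point of sale formats ("A" and "B"), with invented field names, date conventions and price units; none of these details come from the Tesco case, and a real warehouse would handle many more sources and edge cases.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical common format: every record becomes a dict with the keys
# store_id, sold_on, product_code and price_pence.

def from_format_a(row: dict) -> dict:
    """Format A (assumed): ISO dates, prices in pounds held as text."""
    return {
        "store_id": str(row["store"]),
        "sold_on": date.fromisoformat(row["date"]),
        "product_code": row["sku"].strip().upper(),
        "price_pence": int(Decimal(row["price"]) * 100),
    }

def from_format_b(row: dict) -> dict:
    """Format B (assumed): day/month/year dates, prices already in pence."""
    return {
        "store_id": str(row["branch_no"]),
        "sold_on": datetime.strptime(row["sold"], "%d/%m/%Y").date(),
        "product_code": row["item"].strip().upper(),
        "price_pence": int(row["pence"]),
    }

# One converter per source format; adding a new source means adding one entry.
CONVERTERS = {"A": from_format_a, "B": from_format_b}

def standardise(source_format: str, row: dict) -> dict:
    """Convert a record from a known source format into the common format."""
    return CONVERTERS[source_format](row)

record_a = standardise("A", {"store": 12, "date": "1994-11-05",
                             "sku": " ab123 ", "price": "1.99"})
record_b = standardise("B", {"branch_no": "12", "sold": "05/11/1994",
                             "item": "AB123", "pence": "199"})
print(record_a == record_b)
```

The point of the design is that the choice of common format is made once, centrally, and each legacy source is then mapped onto it, rather than every consumer of the data having to understand every source.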
The second issue for implementation outlined by Humby, Hunt and Phillips is the issue of time. Data input (and the standardisation of data) is a time-consuming process. Additionally, the older data becomes, the less relevant it is to informing operational strategy. As such, the time spent inputting old data must be carefully assessed, as it may prove to be an inefficient use of time. Humby, Hunt and Phillips state that ‘if it takes six months to organise data and reach conclusions, you are already out of date’. The authors point out that the time frame in which data becomes obsolete (or at least of less practical use) varies from industry to industry. Therefore, when implementing a data system, data architects must be careful to assess how much back data will be practicably useful, and how the input time will affect the relevancy of the data.
The issue of time, and of assessing the level of data required for ‘start-up’, ties in neatly with the third ‘challenge’ put forward by Humby et al. They argue that ‘big is not always beautiful’, and that as a result issues of scale need to be a significant consideration in order to make sure that the data warehouse is cost-effective, manageable and fit for purpose. However, they point out the key problem in attempting to assess the issue of scale: the relevancy of data can really only be assessed once it has been examined, and of course data can only be examined once it has been collected and stored. Therefore, the data system is likely to have to be unwieldy before it can be streamlined, and this necessarily entails the use of significant memory and processing power.
The quality of the data that is contained within the warehouse is perhaps one of the most significant issues that can challenge a data warehouse and affect its ability to contribute to effective business analysis. The way in which data is captured will have an effect on the quality of the data (Simsion and Witt, 2005); for instance, its relevancy and accuracy in answering a particular question. Often, there will be little that a data warehouse can do to improve the relevance and quality of this data; instead, attention must be given to the design of the data collection methods themselves. However, data entry is another point that can affect data quality. There are certain measures that can be used to check the validity of data - ensuring that the data is reasonable (for instance, ensuring that the input for a financial measurement is numeric). The integrity of data can also be protected through the careful elimination of bugs that might overwrite or change data entries (Alagic, 1986).
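A reasonableness check of the kind mentioned above might look like the following minimal sketch. The function names and the rule that a financial amount must be a finite, non-negative number are assumptions made for illustration, not checks described by the cited authors.

```python
from decimal import Decimal, InvalidOperation

def validate_amount(raw: str) -> Decimal:
    """Return a financial amount as a Decimal, or raise ValueError.

    Assumed rule: the value must parse as a finite, non-negative number.
    """
    try:
        amount = Decimal(raw.strip())
    except InvalidOperation:
        raise ValueError(f"not a number: {raw!r}") from None
    if not amount.is_finite() or amount < 0:
        raise ValueError(f"not a reasonable amount: {raw!r}")
    return amount

def split_valid(raw_values: list[str]) -> tuple[list[Decimal], list[str]]:
    """Separate entries that pass validation from those that should be reviewed."""
    accepted, rejected = [], []
    for raw in raw_values:
        try:
            accepted.append(validate_amount(raw))
        except ValueError:
            rejected.append(raw)
    return accepted, rejected

accepted, rejected = split_valid(["12.50", "twelve", "-3", " 7 "])
print(accepted, rejected)
```

Rejected entries are kept rather than silently discarded, so that they can be corrected at source instead of quietly reducing the completeness of the warehouse.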
The fifth challenge presented by Humby et al is the cost of maintaining a data warehouse. They state that the analyst Ovum calculates that for every pound spent implementing a data warehouse, another £4 will be spent on maintenance and input. This is an ongoing cost, and must be carefully considered by a business wishing to implement a data warehouse. Without consideration of this, a company risks investing significant amounts in construction, and then being unable to sustain the practical and financial advantage that could be yielded by the system through an inability to afford the input of data that is vital to continued success. Ovum further reported that over two-thirds of businesses that implemented data warehouses fell foul of this issue.
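The implication of the Ovum ratio can be made concrete with a short worked calculation. The 1:4 ratio is the one quoted above; the implementation figure of £250,000 is purely hypothetical and is used only to show the arithmetic.

```python
MAINTENANCE_PER_POUND = 4  # Ovum ratio quoted in the text: £4 per £1 of implementation

def lifetime_cost(implementation_cost: int) -> dict:
    """Split the overall cost into implementation and maintenance components."""
    maintenance = implementation_cost * MAINTENANCE_PER_POUND
    total = implementation_cost + maintenance
    return {
        "implementation": implementation_cost,
        "maintenance": maintenance,
        "total": total,
        "maintenance_share": maintenance / total,
    }

# Hypothetical build cost of £250,000.
print(lifetime_cost(250_000))
```

Under this ratio, four-fifths of the overall spend falls after the system has been built, which is why a budget that covers construction alone is likely to leave the warehouse starved of the data input it depends upon.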
The sixth and seventh issues presented by Humby et al relate to the culture in which the system will be implemented, and the corporate ego of those who have commissioned its implementation. With regard to culture, the authors point out that the data warehouse will be utilised by a wide cross section of the company. Firstly, is the culture of the company orientated in such a way that it embraces progressive technologies and understands the benefits that will be yielded from warehouse implementation (Bhansali, 2009)? The second company culture issue relates to the hierarchy of data warehouse use. Which department feels it has the central claim to use, and should have its needs prioritised? Does the system constitute an IT function or a marketing function? These issues must be resolved within the company if the system is to be used to full effect.
With regard to the issue of corporate ego, the authors outline the fact that those commissioning the system may not truly understand how it will benefit the business, or whether it constitutes an effective option. Instead, they are driven by a desire to ‘have the latest thing’ and be seen to be innovative. Furthermore, problems relating to the ongoing costs of data warehouses mean that a data warehouse project can often spiral out of control in relation to costs, yet corporate members will be unwilling to pull the plug on the operation. This is particularly true when a ‘bottom-up’ design approach is being utilised in order to implement progressive and innovative technologies, when existing technologies may have been sufficient (Han and Kamber, 2006).
It is clear, then, that data warehouses have the potential to deliver a significant competitive advantage to a business, but that there are many obstacles to the implementation of such a system, which can result in an expensive and time-consuming mistake that yields little advantage. When designing a data warehouse, multifaceted issues must be considered that relate to the relevancy and usability of the data.
Abramowicz, Kalczynski and Wecel (2002) Filtering the web to feed data warehouses, Springer
Alagic (1986) Relational Database Technology, Springer
Becker (2002) Data Warehousing and web engineering, Idea Group Inc
Bhansali (2009) Strategic Data Warehousing: Achieving Alignment with Business, CRC
Fernie and Sparks (2004) Logistics and Retail Management: Insights Into Current Practice and Trends from Leading Experts, Kogan Page
Grant (2003) ERP & Data Warehousing in organizations: issues and challenges, Idea Group
Han and Kamber (2006) Data mining: concepts and techniques, Morgan Kaufmann
Humby and Hunt (2004) Scoring Points: How Tesco continues to win customer loyalty, Kogan Page
Inmon (1995) What is a Data Warehouse, Prism, Volume 1, Number 1
Kimball and Caserta (2004) The data warehouse ETL toolkit: practical techniques for extracting, cleaning, conforming and delivering data
Nguyen (2009) Complex data warehousing and knowledge discovery for advanced retrieval development: Innovative Methods and Applications, Idea Group
Rob, Coronel and Crockett (2008) Database Systems, Cengage
Simsion and Witt (2005) Data modeling essentials, Morgan Kaufmann