A database is an electronic store of data. The basic terms used to describe the structure of a database are entity, data, attribute, entity set, and relationship. Another definition describes a database as a special kind of software application whose main purpose is to help people and programs store, retrieve, and organize information. A person, event, place, or item is called an entity. The facts that describe an entity are known as data. Each entity is described by its characteristics, which are known as attributes. An entity set is a collection of related entities grouped together, and each entity set is given a singular name. A database is a collection of entity sets. The entities in a database are likely to interact with other entities; these interactions between entity sets are called relationships. A relationship between entity sets can be one-to-one, one-to-many, or many-to-many.
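To make these terms concrete, here is a minimal Python sketch of a hypothetical database with two entity sets, Customer and Order. Each dataclass stands for an entity set, each field is an attribute, each instance is an entity, and the customer_id stored on every order expresses a one-to-many relationship (one customer, many orders). All names and values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Entity set "Customer"; each field is an attribute.
    customer_id: int
    name: str
    city: str


@dataclass
class Order:
    # Entity set "Order"; customer_id links each order to exactly one customer.
    order_id: int
    customer_id: int
    amount: float


# Each list is an entity set: a collection of related entities.
customers = [
    Customer(1, "Alice", "Kuala Lumpur"),
    Customer(2, "Ben", "Penang"),
]
orders = [
    Order(100, 1, 25.0),
    Order(101, 1, 40.0),
    Order(102, 2, 15.5),
]


def orders_for(customer: Customer, all_orders: list[Order]) -> list[Order]:
    """Follow the one-to-many relationship from one customer to its orders."""
    return [o for o in all_orders if o.customer_id == customer.customer_id]


for c in customers:
    print(c.name, [o.order_id for o in orders_for(c, orders)])
```

Running it prints `Alice [100, 101]` and `Ben [102]`: one customer entity is related to many order entities, but each order belongs to a single customer.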
DATABASE MANAGEMENT SYSTEMS (DBMS)
A DBMS can be summarized as consisting of the following elements:
- A DBMS software package, such as Microsoft Access, Oracle, SQL Server, Visual FoxPro, and so forth.
- One or more user-developed and implemented databases, which include a data dictionary and other database objects.
- Custom applications, such as data-entry forms, queries, blocks, and programs.
- Hardware, including personal computers, minicomputers, and mainframes in a network environment.
- Software, namely an operating system and a network system.
All of these elements of a DBMS are shown in Figure 1.
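As a small illustration of a user-developed database and its data dictionary, the sketch below uses SQLite through Python's built-in sqlite3 module. SQLite is an assumption here, not one of the packages listed above; it records every object it manages in its sqlite_master catalog table, which plays the role of the data dictionary. The table, index, and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# User-developed database objects (hypothetical library example).
cur.execute("CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE loan (loan_id INTEGER PRIMARY KEY, "
    "book_id INTEGER REFERENCES book(book_id), due_date TEXT)"
)
cur.execute("CREATE INDEX idx_loan_book ON loan(book_id)")
cur.execute("CREATE VIEW overdue_loans AS SELECT * FROM loan WHERE due_date < date('now')")

# The DBMS keeps a catalog (data dictionary) describing every object above.
objects = cur.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
for obj_type, name in objects:
    print(obj_type, name)

conn.close()
```

The catalog lists one index, two tables, and one view. Because the primary keys are declared as INTEGER PRIMARY KEY, SQLite creates no extra automatic index for them.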
What is Data Mining?
According to research on data mining and data warehousing by Mento, B. and Rapple, B. (2003), 40% of respondents defined data mining as a technology used by their institution. In the same study, respondents in the libraries surveyed believed that data mining could become a valuable tool for serving library users in the future. Research at other institutions concluded that large repositories of full-text and numeric data would offer data mining opportunities that could take advantage of the expertise found in libraries. The authors also cite the definition given at the First International Conference on Knowledge Discovery and Data Mining: "data mining is the process of selection, exploration, and modelling of large quantities of data to discover regularities or relations that are at first unknown with the aim of obtaining clear and useful results for the owner of the database".
According to another author, Kantardzic, M. (2003), the term data mining borrows from the verb "to mine", which describes operations that extract hidden resources from the Earth; from the point of view of scientific research, it denotes a relatively new discipline that has developed mainly from studies carried out in other disciplines. Statisticians have at times described data mining as "data fishing", "data dredging" or "data snooping". The aim of data mining is to examine databases for regularities that may lead to a better understanding of the domain described by the database, where a database is an organized and typically large collection of detailed facts concerning some domain in the world. Other authors define data mining as an iterative process in which progress is defined by discovery, through either automatic or manual methods. Data mining is most useful in an exploratory analysis scenario in which there are no predetermined notions about what will constitute an interesting outcome. It is the search for new, valuable, and nontrivial information in large volumes of data, and it is a cooperative effort of humans and computers: the best results are achieved by balancing the knowledge of human experts in describing problems and goals with the search capabilities of computers.
What is Data Warehouse?
A data warehouse is defined as a collection of integrated, subject-oriented databases designed to support decision-support functions (DSF), in which each unit of data is relevant to some moment in time. Data warehousing means different things to different people: some limit the term to data, while others include people, processes, software, tools, and data. One of its functions is to store the historical data of an organization in an integrated manner that reflects the various facets of the organization and its business. A data warehouse can be viewed as an organization's repository of data, set up to support strategic decision-making. The data in a data warehouse are not updated; they are used only to respond to queries from end users, who are generally decision-makers. Two aspects of a data warehouse are the specific types (classification) of data it stores and the set of transformations used to prepare the data in a final form that is useful for decision making.
Data mining can be viewed as a process that relies on the notion of matching a problem to a technique, rather than simply a collection of isolated tools waiting to be matched to a problem. Han, J. (2006) describes a general experimental procedure adapted to data-mining problems, which involves the following steps:
- State the problem and formulate a hypothesis: the modeller usually specifies a set of variables for the unknown dependency and, if possible, a general form of this dependency as an initial hypothesis. This first step requires combining expertise in the application domain with expertise in data mining models.
- Collect the data: this step concerns how the data are generated. In the designed-experiment approach, data generation is under the control of the modeller; in the observational approach it is not. Most data mining applications assume an observational setting, namely random data generation.
- Preprocess the data: preprocessing usually involves at least two common tasks: outlier detection, and scaling, encoding, and selecting features.
- Estimate the model: the main task is selecting and implementing an appropriate data mining technique. This process is not straightforward; usually several models are estimated, and selecting the best one is an additional task.
- Interpret the model and draw conclusions: models need to be interpretable in order to be useful, yet the goals of model accuracy and accuracy of its interpretation are somewhat contradictory. Simple models are more interpretable but also less accurate, whereas data mining methods are expected to yield highly accurate results using high-dimensional models.
A good understanding of the whole process is important for any successful application. The whole process is illustrated in the figure.
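The five steps above can be illustrated with a deliberately tiny Python sketch. It assumes a made-up observational dataset, a z-score rule for outlier detection, min-max scaling, and two candidate models (a constant model and a least-squares straight line) compared on held-out data; the data, helper names, and choice of models are all hypothetical and stand in for a real data mining toolkit.

```python
import random
import statistics

# Step 1 - state the problem and hypothesis (hypothetical): does y depend on x?
# Initial hypothesis: the dependency has the linear form y = a*x + b.

# Step 2 - collect data: a simulated observational sample (x is not controlled),
# with one gross recording error appended at the end.
random.seed(42)
xs = [random.uniform(0, 10) for _ in range(60)]
ys = [3.0 * x + 2.0 + random.gauss(0, 1) for x in xs]
xs.append(5.0)
ys.append(500.0)


# Step 3 - preprocess: outlier detection (z-score rule on y), then min-max scaling of x.
def remove_outliers(xs, ys, z_max=3.0):
    mu, sd = statistics.mean(ys), statistics.stdev(ys)
    kept = [(x, y) for x, y in zip(xs, ys) if abs(y - mu) / sd <= z_max]
    return [x for x, _ in kept], [y for _, y in kept]


def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


xs, ys = remove_outliers(xs, ys)
xs_scaled = min_max_scale(xs)


# Step 4 - estimate several candidate models and select the best on held-out data.
def fit_constant(xs, ys):
    c = statistics.mean(ys)
    return (lambda x: c), (c,)


def fit_linear(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return (lambda x: a * x + b), (a, b)


def mse(model, xs, ys):
    return statistics.mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


split = int(0.75 * len(xs_scaled))
train_x, train_y = xs_scaled[:split], ys[:split]
test_x, test_y = xs_scaled[split:], ys[split:]

scores, params = {}, {}
for name, fit in {"constant": fit_constant, "linear": fit_linear}.items():
    model, params[name] = fit(train_x, train_y)
    scores[name] = mse(model, test_x, test_y)
best = min(scores, key=scores.get)

# Step 5 - interpret: a two-parameter linear model is easy to read
# (the slope is expressed per unit of the scaled x, not the original x).
print("test MSE:", {k: round(v, 2) for k, v in scores.items()})
print("selected:", best, "parameters:", [round(p, 2) for p in params[best]])
```

The straight line wins by a wide margin on the held-out points, and its two parameters can be read directly, which reflects the trade-off in the last step: a richer model might fit slightly better but would be harder to interpret.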
A data warehouse is not a prerequisite for data mining, but data mining, especially for some large companies, is made easier by having access to a data warehouse. The primary goal of a data warehouse is to increase the "intelligence" of the decision-making process and the knowledge of the people involved in it. A data warehouse may hold billions of records. Two important aspects of its design should be understood: first, the specific types (classification) of data stored in a data warehouse, and second, the set of transformations used to prepare the data in their final form. The classification of data in a data warehouse is adapted to time-dependent data sources, and its categories are old detail data, current detail data, lightly summarized data, highly summarized data, and metadata.
There are four main categories of transformations, each with its own characteristics:
- Simple transformations: manipulation of data that focuses on one field at a time, without taking into account its values in related fields.
- Cleaning and scrubbing: ensuring consistent formatting of a field, such as proper formatting of address information, including checks for valid values in a particular field, usually by checking the range or choosing from an enumerated list.
- Integration: the process of taking operational data from one or more sources and mapping it, field by field, onto a new data structure in the data warehouse. A common difficulty arises when there are multiple system sources for the same entities and no clear way to identify those entities as the same.
- Aggregation and summarization: methods of condensing instances of data found in the operational environment into fewer instances in the warehouse environment. Summarization is a simple addition of values along one or more data dimensions, while aggregation refers to the addition of different business elements into a common total; aggregation is highly domain-dependent.
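The four categories of transformation can be sketched in Python on a few hypothetical source records. Every field name, state code, and value below is invented for illustration, and the aggregation example (adding product sales and consulting fees into one monthly total) simply follows the distinction drawn above rather than any particular warehousing tool.

```python
from collections import defaultdict

# Hypothetical operational records from two source systems that describe the
# same customers with different field names and formats.
billing = [
    {"cust_no": "C01", "state": "sel", "date": "2003-01-05", "product_sales": 120.0},
    {"cust_no": "C01", "state": "SEL", "date": "2003-01-20", "product_sales": 80.0},
    {"cust_no": "C02", "state": "png", "date": "2003-02-02", "product_sales": 50.0},
]
consulting = [
    {"customer": "c01", "region": " Sel ", "month": "2003-01", "fees": 300.0},
]

VALID_STATES = {"SEL", "PNG", "JHR"}  # hypothetical enumerated list of valid codes


# Simple transformation: works on one field, ignoring the values of related fields.
def to_month(date_str):
    return date_str[:7]  # "2003-01-05" -> "2003-01"


# Cleaning and scrubbing: consistent formatting plus a check against an enumerated list.
def clean_state(code):
    code = code.strip().upper()
    if code not in VALID_STATES:
        raise ValueError(f"invalid state code: {code!r}")
    return code


# Integration: map each source, field by field, onto one warehouse structure.
# Upper-casing the customer key is how this sketch decides that "c01" and
# "C01" are the same entity; real sources rarely make this so easy.
def integrate(billing, consulting):
    rows = []
    for r in billing:
        rows.append({"customer": r["cust_no"].upper(), "state": clean_state(r["state"]),
                     "month": to_month(r["date"]), "kind": "product",
                     "amount": r["product_sales"]})
    for r in consulting:
        rows.append({"customer": r["customer"].upper(), "state": clean_state(r["region"]),
                     "month": r["month"], "kind": "consulting", "amount": r["fees"]})
    return rows


# Summarization: simple addition of values along dimensions (customer, month, kind).
def summarize(rows):
    totals = defaultdict(float)
    for r in rows:
        totals[(r["customer"], r["month"], r["kind"])] += r["amount"]
    return dict(totals)


# Aggregation: combine different business elements (product sales and
# consulting fees) into one common monthly total.
def aggregate_monthly(rows):
    totals = defaultdict(float)
    for r in rows:
        totals[r["month"]] += r["amount"]
    return dict(totals)


rows = integrate(billing, consulting)
print(summarize(rows))
print(aggregate_monthly(rows))
```

Summarization reduces the two January product sales for C01 to a single 200.0 entry while keeping consulting separate, whereas aggregation merges both kinds of business element into monthly totals of 500.0 for 2003-01 and 50.0 for 2003-02.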
A data warehouse can be a point solution used to satisfy a specific need, or a common data resource serving a number of functional groups. Although a point solution looks easier to implement, with minimal data modeling effort, a data warehouse has to be faithful to the data meanings embedded in its sources. A data warehouse also consumes a substantial investment of time and funding.
The characteristics of a data warehouse can be summarized in a three-stage development process: modeling, building, and deploying. Modeling, in simple terms, means taking the time to understand business processes, the information requirements of these processes, and the decisions that are currently made within them. Building means establishing requirements for tools that suit the types of decision support necessary for the targeted business process. It also means creating a data model that helps further define information requirements and decompose the problem into data specifications and the actual data store, which in its final form will represent either a data mart or a comprehensive data warehouse. Deploying means implementing, early in the overall process, the nature of the data to be warehoused and the various business intelligence tools to be employed, beginning with training users.
Data in a data warehouse can be used for many different purposes, including sitting and waiting for future requirements that are unknown today. A data warehouse is oriented to the major subject areas of the corporation that have been defined in the high-level corporate data model, such as account, customer, product, transaction or activity, and policy.
TOOLS AND TECHNIQUES
BENEFITS OF DATA MINING AND DATA WAREHOUSE
The benefits of a data warehouse can be summarized as follows:
- Support strategic decision making: by providing summary and detail data that can be used for trend analysis, statistical analysis, performance measurement comparisons, correlation among disparate facts and other similar requirements.
- Support integrated business value chain: by providing a single source of authoritative, accurate, consistent, and timely data that cuts across traditional departmental applications, where opportunities exist to provide consistently defined data and reduce redundant efforts.
- Empower the workforce: access to data empowers business users and improves analysis capabilities. This enables users to be more self-sufficient and reduces their dependence on time-consuming specialized report development. It also enables organizational streamlining by simplifying data flows through better access to shared data.
- Speeds up response time to business queries: a data warehouse enables faster responses to business questions, and response time for data retrieval can be reduced from days to minutes.
- Data quality: a consolidated data store eliminates the reconciliation of inconsistent data, and data quality improvements can be made during the analysis and transformation of source data into the data warehouse.
- Documents organizational knowledge: a well-documented and centralized data store reduces the organizational vulnerability caused by concentrating analysis expertise and the understanding of data in a few staff members with institutional knowledge.
- Streamlines systems portfolio: helps streamline systems by removing decision support functions and historical data from operational systems and moving them into the data warehouse. It can help address legacy system deficiencies and support the transition to a new client/server platform.