understanding data warehousing and data mining

Upload: tms4u

Post on 05-Apr-2018


  • 7/31/2019 Understanding Data Warehousing and Data Mining


    Understanding Data Warehouses and Data Mining

    Student: Hiren L. Patel

    Professor: Dr. Jones

    Homework Unit 3

    Everest University, North Campus.

    Date: May 26, 2012


    Abstract

    Over the years, many large organizations have accumulated massive amounts of data about their

    customers, suppliers, products, and services. Even many new Web-based companies have

    amassed large databases about people and products as they have grown. The WWW is itself a

    large distributed data repository with untold potential. With the growing realization that these

    vast data resources can be tapped for significant commercial gain, interest in data mining and

    data warehousing has grown rapidly.

    Data Warehousing

    Data warehousing is a collection of decision support technologies aimed at enabling

    knowledge workers, such as executives, managers, and analysts, to make better and faster decisions.

    Data warehousing technologies have been successfully deployed in many industries, including

    manufacturing (for order shipment and customer support), retail (for user profiling and inventory

    management), financial services (for claims analysis, risk analysis, credit card analysis, and fraud

    detection), transportation (for fleet management), telecommunications (for call analysis and fraud

    detection), utilities (for power usage analysis), and healthcare (for outcomes analysis). This

    paper presents a roadmap of data warehousing technologies, focusing on the special

    requirements that data warehouses place on database management systems (DBMSs).

    1. Database and relation

    A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data

    that is used primarily in organizational decision making. Typically, the data warehouse is

    maintained separately from the organization's operational databases. There are many reasons for

    doing this. The data warehouse supports on-line analytical processing (OLAP), the functional

    and performance requirements of which are quite different from those of the on-line transaction

    processing (OLTP) applications traditionally supported by the operational databases.

    OLTP applications typically automate clerical data processing tasks such as order entry and

    banking transactions that are essential day-to-day operations of an organization. These tasks are

    structured and repetitive, and consist of short, atomic, isolated transactions.
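    The contrast between the two workloads can be sketched in a few lines. The example below is only an illustration: the orders table, the place_order and quantity_by_item helpers, and the sample rows are all hypothetical, and SQLite stands in for whatever DBMS an organization actually uses. The first helper performs a short, atomic OLTP-style transaction; the second runs an OLAP-style aggregate over every stored row.

```python
import sqlite3

# Hypothetical schema: a tiny "orders" table standing in for an operational database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, item TEXT, qty INTEGER)"
)


def place_order(conn, customer, item, qty):
    """OLTP-style work: one short, atomic transaction that inserts a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders (customer, item, qty) VALUES (?, ?, ?)",
            (customer, item, qty),
        )


def quantity_by_item(conn):
    """OLAP-style work: a read-only aggregate that scans all stored orders."""
    rows = conn.execute(
        "SELECT item, SUM(qty) FROM orders GROUP BY item ORDER BY item"
    )
    return dict(rows.fetchall())


place_order(conn, "alice", "widget", 2)
place_order(conn, "bob", "widget", 5)
place_order(conn, "alice", "gadget", 1)
totals = quantity_by_item(conn)
print(totals)
```

    In a real deployment the aggregate would typically run against the separately maintained warehouse rather than the operational database, so that long scans do not compete with the short transactions.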

    Decision support Back End Tools and Utilities

    Data warehousing systems use a variety of data extraction and cleaning tools, and load and

    refresh utilities for populating warehouses. Data extraction from foreign sources is usually

    implemented via gateways and standard interfaces (such as Information Builders EDA/SQL,

    ODBC, Oracle Open Connect, Sybase Enterprise Connect, Informix Enterprise Gateway).
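    As a rough sketch of the extract, clean, and load steps described above, the example below reads a small export, tidies it, and appends it to a warehouse table. Everything in it is an assumption made for illustration: the RAW_EXPORT text, the sales table, and the extract, clean, and load helpers are hypothetical, and an in-memory string and SQLite database stand in for a real gateway and warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system; a real gateway would read from
# another database or file instead of an in-memory string.
RAW_EXPORT = """customer,item,qty
 Alice ,Widget,2
bob,widget,
Carol,GADGET,3
"""


def extract(text):
    """Read the raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Trim whitespace, normalize case, and drop rows with a missing quantity."""
    cleaned = []
    for row in rows:
        qty = row["qty"].strip()
        if not qty:
            continue  # incomplete record: skip it rather than guess a value
        cleaned.append(
            (row["customer"].strip().lower(), row["item"].strip().lower(), int(qty))
        )
    return cleaned


def load(conn, rows):
    """Append the cleaned rows to the warehouse table in one transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO sales (customer, item, qty) VALUES (?, ?, ?)", rows
        )


warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, item TEXT, qty INTEGER)")
load(warehouse, clean(extract(RAW_EXPORT)))
loaded = warehouse.execute(
    "SELECT customer, item, qty FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

    Dropping incomplete rows is only one possible cleaning policy; a production pipeline might instead log them or route them to a review queue.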

    The Need for Data Warehousing

    The majority of databases are designed to hold the current data needed by an organization to

    perform its business activities. In a business organization, current data might include information

    concerning bills due, inventory levels, and product orders, and would most likely be contained in

    a billing/inventory/order database. In most cases, as soon as data become outdated, they are

    deleted from the database. For example, once a bill is paid, data about the bill are removed.

    Fortunately, many organizations have realized the value of being able to analyze historical data

    in order to discover patterns of behavior and predict future trends. For example, analyzing

    historical data can tell a retailer what items were ordered, in what quantities, and by which

    customers.

    One of the keys to understanding the value of databases is to understand how one database,

    whether it is current or historical, can be related to another. If you think about it, it makes good

    business sense to relate customer data to inventory data (because customers place orders that

    affect inventory), and inventory data to supplier data (because suppliers provide inventory

    items). We could name many more examples like this. The problem with most databases is they

    are not designed to be accessed simultaneously in this fashion.
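    When the related data do sit in one place, relating customers to inventory and inventory to suppliers becomes a single query. The sketch below assumes a hypothetical schema (suppliers, inventory, and orders tables with made-up rows) held in one SQLite database; it is meant only to show the kind of cross-referencing the paragraph describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: orders reference inventory items, which reference suppliers.
conn.executescript("""
CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE inventory (item_id INTEGER PRIMARY KEY, item TEXT, supplier_id INTEGER);
CREATE TABLE orders (customer TEXT, item_id INTEGER, qty INTEGER);
INSERT INTO suppliers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO inventory VALUES (10, 'widget', 1), (11, 'gadget', 2);
INSERT INTO orders VALUES ('alice', 10, 2), ('bob', 11, 1), ('alice', 11, 4);
""")

# Relate each customer order to the inventory item and to the item's supplier.
rows = conn.execute("""
SELECT o.customer, i.item, s.name, o.qty
FROM orders AS o
JOIN inventory AS i ON i.item_id = o.item_id
JOIN suppliers AS s ON s.supplier_id = i.supplier_id
ORDER BY o.customer, i.item
""").fetchall()

for customer, item, supplier, qty in rows:
    print(customer, item, supplier, qty)
```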

    Data Mining (DM)

    Data mining, also known as "knowledge discovery," refers to computer-assisted tools and

    techniques for sifting through and analyzing these vast data stores in order to find trends,

    patterns, and correlations that can guide decision making and increase understanding. Data

    mining covers a wide variety of uses, from analyzing customer purchases to discovering

    galaxies. In essence, data mining is the equivalent of finding gold nuggets in a mountain of data.

    The monumental task of finding hidden gold depends heavily upon the power of computers. Data

    Mining employs techniques from statistics, pattern recognition, and machine learning. Many of

    these methods are also frequently used in vision, speech recognition, image processing,

    handwriting recognition, and natural language understanding. However, the issues of scalability

    and automated business intelligence solutions drive much of data mining and differentiate it from

    the other applications of machine learning and statistical modeling. Data mining refers to using

    a variety of techniques to identify nuggets of information or decision-making knowledge in

    bodies of data, and extracting these in such a way that they can be put to use in areas such as

    decision support, prediction, forecasting, and estimation. The data is often voluminous but, as it

    stands, of low value, since no direct use can be made of it; it is the hidden information in the data that

    is useful. Data mining is concerned with the analysis of data and the use of software techniques

    for finding patterns and regularities in sets of data. It is the computer which is responsible for

    finding the patterns by identifying the underlying rules and features in the data. The idea is that it

    is possible to strike gold in unexpected places as the data mining software extracts patterns not

    previously discernible or so obvious that no one has noticed them before.
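    One very simple way software can surface a pattern is to count which items tend to appear together. The sketch below is a plain co-occurrence count, not a method named in this paper; the baskets data and the frequent_pairs helper are hypothetical and chosen only to make the idea concrete.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each set is one customer's purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "jam"},
]


def frequent_pairs(baskets, min_count):
    """Count how often each pair of items shares a basket; keep the frequent ones."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always recorded in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


pairs = frequent_pairs(baskets, min_count=2)
print(pairs)
```

    Counting every pair grows quickly with basket size, which is one reason scalability matters so much once the data reach warehouse scale.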

    Data Mining Tools

    Commercial packages now available for data mining and data warehousing include IBM DB2,

    INFORMIX-OnLine XPS, Oracle9i, Clementine, Intelligent Miner, 4Thought, and Sybase System 10.

    Applications of Data Mining

    Data mining includes a variety of interesting applications. A few examples are listed below:

    By recording the activity of shoppers in an online store, such as Amazon.com, over time,

    retailers can use knowledge of these patterns to improve the placement of items in the

    layout of a mail-order catalog page or Web page.

    Telephone companies mine customer billing data to identify customers who spend

    considerably more than average on their monthly phone bill. The company can then

    target these customers to sell additional services.

    Marketers can effectively target the wants and needs of specific consumer groups by

    analyzing data about customer preferences and buying patterns.

    Hospitals use data mining to identify groups of people whose healthcare costs are likely

    to increase in the near future so that preventative steps can be taken.
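    The telephone billing example above can be sketched with a simple threshold on the average bill. The figures, the customer IDs, the high_spenders helper, and the 1.5 cutoff factor are all assumptions made for illustration; a real analysis would choose the cutoff from the data and the business goal.

```python
from statistics import mean

# Hypothetical monthly bills in dollars, keyed by customer ID.
monthly_bills = {"c1": 40.0, "c2": 45.0, "c3": 50.0, "c4": 145.0}


def high_spenders(bills, factor=1.5):
    """Return customers whose bill exceeds `factor` times the average bill."""
    threshold = factor * mean(bills.values())
    return sorted(c for c, amount in bills.items() if amount > threshold)


targets = high_spenders(monthly_bills)
print(targets)
```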

    Data Mining Summarized

    In summary, the purpose of DM is to analyze and understand past trends and predict future

    trends. By predicting future trends, business organizations can better position their products and

    services for financial gain. Nonprofit organizations have also achieved significant benefits from

    data mining, such as in the area of scientific progress.

    The concept of data mining is simple yet powerful. The simplicity of the concept is deceiving,

    however. Traditional methods of analyzing data, involving query-and-report approaches, cannot

    handle tasks of such magnitude and complexity.

    Conclusion

    Data mining and data warehousing are designed to assist individuals and organizations in managing

    and extracting meaning from enormous amounts of data. Data mining is used to analyze data sets

    and predict future trends. Data warehouses and data marts are used to store and analyze historical

    data in order to make better decisions and predictions about the future. The purpose of many of

    these activities and approaches is to relate data sets to each other, group related data together,

    and ensure the ability of users to access the data they need. Data is a resource that, in many

    cases, can be tapped for greater understanding and insight.
