data mining and data visualization

Upload: dr-singh

Post on 06-Apr-2018


  • 8/3/2019 Data Mining and Data Visualization


    Data Mining and Data Visualization

    Prof. Rushen Chahal


    A Picture is Worth a Thousand Words

    Data mining is the set of activities used to find new, hidden, or unexpected patterns in data.

    These techniques are often called knowledge data discovery (KDD), and include statistical analysis, neural networks, fuzzy logic, intelligent agents, and data visualization.

    KDD techniques not only discover useful patterns in the data, but can also be used to develop predictive models.


    Verification Versus Discovery

    In the past, decision support activities were primarily based on the concept of verification.

    This required a great deal of prior knowledge on the decision-maker's part in order to verify a suspected relationship.

    With the advance of technology, the concept of verification began to turn into discovery.


    Data Mining's Growth in Popularity

    One reason is that we keep getting more and more data all the time and need tools to understand it.

    We are also aware that the human brain has trouble processing multidimensional data.

    A third reason is that machine learning techniques are becoming more affordable and more refined at the same time.


    Making Accurate Predictions with Data Mining

    Although the literature contains statements such as "data mining will allow us to predict who will buy a particular product," predicting individual behavior that precisely is against human nature.

    In situations where data mining is used to predict response to a marketing campaign, only about 5% of the people selected as likely respondents actually do respond.


    Making Accurate Predictions with Data Mining (cont.)

    Although the accuracy of predicting individual behavior is not so good, it is better than it seems, since direct marketing efforts often have hit rates of only about 1% without data mining.
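
    The comparison of the two hit rates can be expressed as a lift figure. The sketch below is only an illustration: the 5% and 1% rates are the approximate values quoted in the slides, while the mailing size of 10,000 and the function name `lift` are assumptions made up for the example.

```python
def lift(targeted_rate: float, baseline_rate: float) -> float:
    """Ratio of the response rate with targeting to the rate without it."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return targeted_rate / baseline_rate


MAILING_SIZE = 10_000   # hypothetical campaign size (assumption)
TARGETED_RATE = 0.05    # about 5% response with data mining (from the slides)
BASELINE_RATE = 0.01    # about 1% response without data mining (from the slides)

# Expected number of responses under each approach for the same mailing size.
expected_with_mining = round(MAILING_SIZE * TARGETED_RATE)
expected_without = round(MAILING_SIZE * BASELINE_RATE)
campaign_lift = lift(TARGETED_RATE, BASELINE_RATE)

print(f"Responses with data mining:    {expected_with_mining}")
print(f"Responses without data mining: {expected_without}")
print(f"Lift: {campaign_lift:.1f}x")
```

    Under these assumed numbers, a model that is wrong about 95% of the people it selects still yields several times as many responses as an untargeted mailing of the same size.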


    Online Analytical Processing (OLAP)

    Codd developed a set of 12 rules for the development of multidimensional databases:

    1. Multidimensional view

    2. Transparent to user

    3. Accessible

    4. Consistent reporting

    5. Client-server architecture

    6. Generic dimensionality

    7. Dynamic sparse matrix handling

    8. Multiuser support

    9. Cross-dimensional operations

    10. Intuitive manipulation

    11. Flexible reporting

    12. Unlimited dimension and aggregation


    OLAP as Implemented

    To date, it does not appear that any implementation exists that satisfies all 12 rules.

    Some people argue it might not even be possible to attain all of them.

    More recently, the term OLAP has come to represent the broad category of software technology that enables multidimensional analysis of enterprise data.


    Multidimensional OLAP (MOLAP)

    Data can be viewed across several dimensions. Here sales are arrayed by region and product.

    A fourth dimension could be added by using several graphs, perhaps at different time points.

    Most analyses have many more dimensions than this. MOLAP handles data as an n-dimensional hypercube.

    [Figure: three-dimensional chart of Sales by Region and Product]
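
    The hypercube idea can be sketched in a few lines. This is an illustration only, not how any particular MOLAP server stores data: the cube is held as a sparse dictionary keyed by coordinate tuples, and the region, product, and quarter names, the sales values, and the helper names `build_cube`, `roll_up`, and `slice_cube` are all made up for the example.

```python
from collections import defaultdict

# Hypothetical fact records: (region, product, quarter, sales). Values are made up.
FACTS = [
    ("East", "Widget", "Q1", 4.0),
    ("East", "Gadget", "Q1", 3.0),
    ("West", "Widget", "Q1", 1.0),
    ("West", "Gadget", "Q2", 2.0),
    ("East", "Widget", "Q2", 2.5),
]


def build_cube(facts):
    """Store each populated cell of the cube under its coordinate tuple."""
    cube = defaultdict(float)
    for region, product, quarter, sales in facts:
        cube[(region, product, quarter)] += sales
    return cube


def roll_up(cube, keep):
    """Aggregate the cube onto the dimensions whose indices are listed in `keep`."""
    result = defaultdict(float)
    for coords, value in cube.items():
        result[tuple(coords[i] for i in keep)] += value
    return dict(result)


def slice_cube(cube, axis, member):
    """Fix one dimension to a single member and return the matching cells."""
    return {coords: value for coords, value in cube.items() if coords[axis] == member}


cube = build_cube(FACTS)
sales_by_region_product = roll_up(cube, keep=(0, 1))  # collapse the time dimension
sales_by_region = roll_up(cube, keep=(0,))            # collapse product and time
q1_only = slice_cube(cube, axis=2, member="Q1")       # one "graph" per time point

print(sales_by_region_product)
print(sales_by_region)
print(q1_only)
```

    Only populated cells are stored, so empty combinations of region, product, and quarter cost nothing; adding a further dimension just means a longer coordinate tuple.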


    Relational OLAP (ROLAP)

    A large relational database server replaces the multidimensional one.

    The database contains both detailed and summarized data, allowing drill-down techniques to be applied.

    SQL interfaces allow vendors to build tools that are both portable and scalable.

    This does require databases with many relational tables, which may lead to substantial processor overhead on complex joins.
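
    A drill-down over relational tables might look like the following sketch. It uses an in-memory SQLite database from Python's standard library; the table names (`location`, `product`, `sales`), the columns, and every row are hypothetical and not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star-style layout: one fact table (sales) and two dimension tables.
conn.executescript("""
    CREATE TABLE location (location_id INTEGER PRIMARY KEY, region_name TEXT, city TEXT);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE sales    (location_id INTEGER REFERENCES location(location_id),
                           product_id  INTEGER REFERENCES product(product_id),
                           amount      REAL);
    INSERT INTO location VALUES (1, 'East', 'Boston'), (2, 'East', 'Newark'), (3, 'West', 'Reno');
    INSERT INTO product  VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO sales    VALUES (1, 1, 4.0), (2, 1, 2.0), (2, 2, 3.0), (3, 1, 1.0);
""")

# Summary level: total sales per region (one join).
summary = conn.execute("""
    SELECT l.region_name, SUM(s.amount)
    FROM sales AS s
    JOIN location AS l ON l.location_id = s.location_id
    GROUP BY l.region_name
    ORDER BY l.region_name
""").fetchall()

# Drill down: the same measure for one region, by city and product (two joins).
drill_down = conn.execute("""
    SELECT l.city, p.product_name, SUM(s.amount)
    FROM sales AS s
    JOIN location AS l ON l.location_id = s.location_id
    JOIN product  AS p ON p.product_id  = s.product_id
    WHERE l.region_name = ?
    GROUP BY l.city, p.product_name
    ORDER BY l.city, p.product_name
""", ("East",)).fetchall()

conn.close()

print(summary)
print(drill_down)
```

    Each additional level of detail adds another join against a dimension table, which is where the processor overhead mentioned above tends to come from as the schema grows.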


    A Typical Relational Schema


    Data Mining Technologies

    Statistics: the most mature of the data mining technologies, but often not applicable because they need clean data. In addition, many statistical procedures assume linear relationships, which limits their use.

    Neural networks, genetic algorithms, fuzzy logic: these technologies are able to work with complicated and imprecise data. Their broad applicability has made them popular in the field.
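
    The point about linear assumptions can be illustrated with a small experiment. The sketch below computes the Pearson correlation coefficient by hand for two made-up data sets, one linear and one curved; the data and the function name `pearson` are assumptions for the example only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, which measures linear association only."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # y depends on x through a straight line
curved = [x * x for x in xs]       # y depends on x just as strictly, but not linearly

r_linear = pearson(xs, linear)
r_curved = pearson(xs, curved)

print(f"linear relationship: r = {r_linear:.2f}")
print(f"curved relationship: r = {r_curved:.2f}")
```

    In the curved case y is completely determined by x, yet a purely linear measure reports essentially no relationship, which is the kind of limitation the slide refers to.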


    Data Mining Technologies (cont.)

    Decision trees: these technologies are conceptually simple and have gained in popularity as better tree-growing software was introduced. Because of the way they are used, they are perhaps better called classification trees.
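
    To show why such trees are conceptually simple, here is a minimal sketch of the smallest possible classification tree: a single split on one numeric attribute, chosen to minimize training errors. Real tree-growing software uses more refined split criteria and grows many levels; the training data and the names `fit_stump` and `classify` are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows):
    """Find the single threshold split with the fewest training errors.

    rows: list of (feature_value, label) pairs.
    Returns (threshold, label_if_at_or_below, label_if_above).
    """
    values = sorted({value for value, _ in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2  # candidate split between neighboring values
        left = [label for value, label in rows if value <= threshold]
        right = [label for value, label in rows if value > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(label != left_label for label in left)
                  + sum(label != right_label for label in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def classify(stump, value):
    """Apply the one-level tree to a new feature value."""
    threshold, left_label, right_label = stump
    return left_label if value <= threshold else right_label


# Hypothetical training data: (customer age, responded to an offer?)
TRAINING = [(22, "no"), (25, "no"), (31, "no"), (38, "yes"), (45, "yes"), (52, "yes")]

stump = fit_stump(TRAINING)
print(stump)
print(classify(stump, 30), classify(stump, 40))
```

    The fitted model reads as a plain rule (at or below the threshold predict one class, above it the other), which is why the name classification tree fits the way these models are used.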


    The Knowledge Discovery Search Process

    Define the business problem and obtain the data to study it.

    Use data mining software to model the problem.

    Mine the data to search for patterns of interest.


    The Knowledge Discovery Search Process (cont.)

    Review the mining results and refine them by respecifying the model.

    Once validated, make the model available to other users of the data warehouse (DW).


    New Applications for Data Mining

    As the technology matures, new applications emerge, especially in two new categories: text mining and web mining. Some text mining examples are:

        Distilling the meaning of a text

        Accurate summarization of a text

        Explication of the text theme structure

        Clustering of texts


    Web mining

    Web mining is a special case of text mining where the mining occurs over a website.

    It enhances the website with intelligent behavior, such as suggesting related links or recommending new products.

    It allows you to unobtrusively learn the interests of the visitors and modify their user profiles in real time.

    It also allows you to match resources to the interests of the visitor.
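
    One simple way to suggest related links is to count which pages are viewed together in the same visit. The sketch below assumes that approach; it is not a description of any specific web mining product, and the sessions, URLs, and function names are invented for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical click sessions: the set of pages each visitor viewed.
SESSIONS = [
    {"/tents", "/sleeping-bags", "/stoves"},
    {"/tents", "/sleeping-bags"},
    {"/tents", "/boots"},
    {"/boots", "/socks"},
]


def build_cooccurrence(sessions):
    """Count, for every page, how often each other page appears in the same session."""
    co = defaultdict(Counter)
    for pages in sessions:
        for a, b in permutations(pages, 2):
            co[a][b] += 1
    return co


def related_links(co, page, top_n=2):
    """Pages most often viewed together with `page`."""
    # Sort by count (descending), then by URL so that ties are deterministic.
    ranked = sorted(co[page].items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked[:top_n]]


co = build_cooccurrence(SESSIONS)
suggestions = related_links(co, "/tents")
print(suggestions)
```

    Because the counts can be updated as each new session arrives, the same structure could in principle be refreshed while visitors browse, in the spirit of the real-time profile updates described above.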


    Current Limitations and Challenges to Data Mining

    Despite the potential power and value, data mining is still a new field. Some things that thus far have limited advancement are:

        Identification of missing information: not all knowledge gets stored in a database

        Data noise and missing values: future systems need better ways to handle this

        Large databases and high dimensionality: future applications need ways to partition data into more manageable chunks
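
    As one very basic example of handling missing values, the sketch below fills gaps in a numeric column with the mean of the observed entries. This is a common baseline rather than a recommendation from the slides; the column of ages and the function name `impute_mean` are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


ages = [34, None, 41, 29, None, 36]  # hypothetical column with gaps
cleaned = impute_mean(ages)
print(cleaned)
```

    Mean imputation keeps every record usable but shrinks the variability of the column, which is one reason better ways of handling missing data remain an open need.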


    3-6: Data Visualization:

    Seeing the Data


    Visual Presentation

    For any kind of high-dimensional data set, displaying predictive relationships is a challenge.

    Shading is used to represent relative degrees of thunderstorm activity, with the darkest regions showing the heaviest activity.


    A Bit of History

    An early effort used sequences of two-dimensional graphs to add depth.

    Current virtual reality programs allow the user to step through a data set. Try going to a realtor's website and taking a tour of a house up for sale.


    Human Visual Perception and Data Visualization

    Data visualization is so powerful because the human visual cortex converts objects into information so quickly.

    The next three slides show (1) usage of global private networks, (2) flow through natural gas pipelines, and (3) a risk analysis report that permits the user to draw an interactive yield curve.

    All three use height or shading to add additional dimensions to the figure.


    Global Private Network Activity

    High Activity

    Low Activity


    Natural Gas Pipeline Analysis

    Note: Height shows total flow through compressor stations.


    An Enlivened Risk Analysis Report


    Geographical Information Systems

    A GIS is a special-purpose database that contains a spatial coordinate system. A comprehensive GIS requires:

    1. Data input from maps, aerial photos, etc.

    2. Data storage, retrieval and query

    3. Data transformation and modeling

    4. Data reporting (maps, reports and plans)


    The Special Capabilities of a GIS

    In general, a GIS contains two types of data:

    Spatial data: these elements correspond to a uniquely defined location on earth. They could be in point, line, or polygon form.

    Attribute data: these are the data that will be portrayed at the geographic references established by spatial data.

    Example: Data from an opinion poll is displayed for multiple regions in the United States. Clicking on an area allows the user to drill down to the results for smaller areas.
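
    The split between spatial and attribute data, and the click-to-drill-down behavior in the example, can be sketched as follows. Everything here is hypothetical: the rectangular region outlines, the poll percentages, and the function names are invented, and a real GIS would use geographic coordinates and a spatial index rather than a linear scan.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: cast a ray to the right of (x, y) and count edge crossings."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Spatial data: hypothetical region outlines as lists of (x, y) vertices.
SPATIAL = {
    "West": [(0, 0), (5, 0), (5, 5), (0, 5)],
    "East": [(5, 0), (10, 0), (10, 5), (5, 5)],
}

# Attribute data: made-up poll results keyed by the same region names.
ATTRIBUTES = {
    "West": {"approve_pct": 48, "sub_areas": {"Northwest": 51, "Southwest": 45}},
    "East": {"approve_pct": 53, "sub_areas": {"Northeast": 55, "Southeast": 50}},
}


def region_at(x, y):
    """Resolve a clicked coordinate to the region whose polygon contains it."""
    for name, polygon in SPATIAL.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None


def drill_down(x, y):
    """Return the clicked region and the poll results for its smaller areas."""
    name = region_at(x, y)
    if name is None:
        return None
    return name, ATTRIBUTES[name]["sub_areas"]


clicked = drill_down(2, 3)
print(clicked)
```

    The spatial table answers "where was the click?", and the attribute table answers "what do we know about that place?"; the region name is the key that joins the two.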


    Telephone Polling Results

    Note: On the live map, clicking on an area allows the user

    to drill down and see results for smaller areas.