ppt devika

Upload: khydevi

Post on 07-Apr-2018


TRANSCRIPT

  • 8/3/2019 Ppt Devika

    1/15

    PATTERN SEQUENCE

    MINING

    Presented By:

    DEVIKA MITTAL

    O915CS081019

  • 8/3/2019 Ppt Devika

    2/15

    CONTENTS

    Some terminology

    association rule

    sequential pattern

    sequence database

    support

    What is Sequential Pattern Mining?

    Challenges

    Algorithms

    Applications

  • 8/3/2019 Ppt Devika

    3/15

    association rule

    The rule can be: Buy A => Buy B.

    Association rule mining does not take the time stamp into account.

    NOTE:

    If we take the time stamp into account, we can get more accurate and useful rules, such as: Buy A implies Buy B within a week, or people usually Buy A every week.

    Such rules support more sound decisions.
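A rule such as "Buy A implies Buy B within a week" can be checked against time-stamped purchases. The sketch below is only an illustration: the purchase log, the function name `rule_confidence`, and the use of confidence (the fraction of A-buyers who also satisfy the rule) as the measure are assumptions, not part of the slides.

```python
from datetime import date, timedelta

# Hypothetical purchase log: customer id -> list of (purchase date, item).
purchases = {
    "c1": [(date(2018, 1, 1), "A"), (date(2018, 1, 4), "B")],
    "c2": [(date(2018, 1, 2), "A"), (date(2018, 1, 20), "B")],
    "c3": [(date(2018, 1, 3), "A")],
    "c4": [(date(2018, 1, 5), "B")],
}


def rule_confidence(log, antecedent, consequent, window):
    """Fraction of customers who bought `antecedent` and then bought
    `consequent` no later than `window` after one of those purchases."""
    bought_first = 0
    followed = 0
    for events in log.values():
        first_dates = [d for d, item in events if item == antecedent]
        if not first_dates:
            continue  # this customer never bought the antecedent
        bought_first += 1
        second_dates = [d for d, item in events if item == consequent]
        if any(timedelta(0) <= s - f <= window
               for f in first_dates for s in second_dates):
            followed += 1
    return followed / bought_first if bought_first else 0.0


within_week = rule_confidence(purchases, "A", "B", timedelta(days=7))
within_year = rule_confidence(purchases, "A", "B", timedelta(days=365))
print(within_week, within_year)
```

With this toy log, only one of the three customers who bought A bought B within a week, while two of them bought B at some later point, so the time-constrained rule is the stricter statement.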

  • 8/3/2019 Ppt Devika

    4/15

    Sequential pattern

    A sequential pattern is a sequence of itemsets that frequently occur in a specific order. All items in the same itemset are supposed to have the same transaction time value, or to fall within a time gap.

    All transactions of a customer are viewed together as a sequence.

  • 8/3/2019 Ppt Devika

    5/15

    Sequence Database

    A sequence database S is shown below, with minsupport = 2.

    The set of items in the database is {a, b, c, d, e, f, g}.

    The sequence <a(abc)(ac)d(cf)> has five elements: a, (abc), (ac), d, (cf).

    It is also a 9-sequence, since there are 9 instances of items in the sequence.

    Sequence Id    Sequence
    10             <a(abc)(ac)d(cf)>
    20             <(ad)c(bc)(ae)>
    30             <(ef)(ab)(df)cb>
    40             <eg(af)cbc>
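The counts on this slide can be reproduced with a small sketch. It assumes one possible representation (a sequence as a list of Python sets, one set per element) and reads the four table rows without the stray commas; the variable names are illustrative.

```python
# Each sequence is a list of elements; each element is an itemset.
database = {
    10: [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    20: [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    30: [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    40: [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
}

first = database[10]
num_elements = len(first)                          # elements in the sequence
length = sum(len(element) for element in first)    # item instances in the sequence

# Set of all items that occur anywhere in the database.
items = set().union(*(element for seq in database.values() for element in seq))

print(num_elements, length, sorted(items))
```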

  • 8/3/2019 Ppt Devika

    6/15

    Support

    A customer supports a sequence s if s is contained in the corresponding customer-sequence. The support of sequence s is defined as the fraction of customers who support this sequence.

    Support(s) = (number of supporting customers) / (total number of customers)
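The definition of support can be sketched as follows, assuming that "s is contained in a customer-sequence" means each element of s is a subset of a distinct element of the customer-sequence, in the same order. The helper names `seq`, `contains` and `support` are illustrative, and the data is the four-sequence database from the previous slide.

```python
def seq(*elements):
    """Build a sequence from strings, e.g. seq("a", "abc") -> [{a}, {a, b, c}]."""
    return [frozenset(e) for e in elements]


def contains(sequence, pattern):
    """True if every element of `pattern` is a subset of a distinct element
    of `sequence`, with the order of elements preserved."""
    pos = 0
    for element in pattern:
        # Greedily take the earliest remaining element that covers `element`.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(database, pattern):
    """Fraction of customer-sequences that contain `pattern`."""
    return sum(contains(s, pattern) for s in database) / len(database)


database = [
    seq("a", "abc", "ac", "d", "cf"),      # sequence id 10
    seq("ad", "c", "bc", "ae"),            # sequence id 20
    seq("ef", "ab", "df", "c", "b"),       # sequence id 30
    seq("e", "g", "af", "c", "b", "c"),    # sequence id 40
]

print(support(database, seq("ab", "c")))   # <(ab)c>
print(support(database, seq("a", "c")))    # <ac>
```

Here <(ab)c> is contained only in sequences 10 and 30, so its support is 2 out of 4 customers, while <ac> is contained in all four.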

  • 8/3/2019 Ppt Devika

    7/15

    What Is Sequential Pattern Mining?

    Given a set of sequences, find the complete set of frequent subsequences.

    Sequential pattern mining tries to find the relationships between occurrences of sequential events, that is, whether the occurrences follow any specific order.

    Sequential pattern mining is the process of extracting the sequential patterns whose support exceeds a predefined minimal support threshold.

  • 8/3/2019 Ppt Devika

    8/15

    Example..

    From a book store's transaction database history, we can find the frequent sequential purchasing patterns.

    For example, 80% of customers who bought the book Database Management typically bought the book Data Warehouse and then bought the book Web Information System within a certain time gap.

  • 8/3/2019 Ppt Devika

    9/15

    Types:

    String mining:

    used in biology to examine gene and protein sequences

    primarily concerned with sequences with a single member at each position

    Itemset mining:

    used more often in marketing

    concerned with multiple symbols at each position

    popular approach to text mining

  • 8/3/2019 Ppt Devika

    10/15

    Challenges on Sequential Pattern Mining

    A huge number of possible sequential patterns are hidden in databases.

    A mining algorithm should:

    find the complete set of patterns, when possible, satisfying the minimum support (frequency) threshold

    be highly efficient and scalable, involving only a small number of database scans

    be able to incorporate various kinds of user-specific constraints

  • 8/3/2019 Ppt Devika

    11/15

    Sequential Pattern Mining Algorithms

    Apriori-based Approaches

    GSP

    SPADE

    Sequential pattern mining methods that follow the methodology of Apriori encounter problems when a sequence database is large.

    Pattern-Growth-based Approaches

    FreeSpan

    PrefixSpan

    Substantially reduces the size of projected databases and leads to efficient processing.
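The pattern-growth idea can be illustrated with a simplified sketch in the spirit of PrefixSpan. It is not the full algorithm: it assumes every element holds a single item (the string mining case), it takes `min_support` as an absolute count of sequences, and the function name `prefixspan` and the toy data are illustrative. Each recursive call only looks at the suffixes that follow the current prefix, which is what keeps the projected databases small.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support count} for all frequent subsequences.

    Simplified: each sequence is a list of single items, not itemsets.
    """
    results = {}

    def mine(prefix, projected):
        # `projected` holds (sequence index, start position) pairs: the
        # suffix of each sequence that follows the first match of `prefix`.
        counts = defaultdict(int)
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] += 1  # count each sequence at most once per item

        for item, count in sorted(counts.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Project on the first occurrence of `item` in each suffix.
            new_projected = []
            for sid, start in projected:
                try:
                    pos = sequences[sid].index(item, start)
                except ValueError:
                    continue  # item does not occur in this suffix
                new_projected.append((sid, pos + 1))
            mine(pattern, new_projected)

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results


toy_db = [list("abc"), list("acb"), list("ab")]
patterns = prefixspan(toy_db, min_support=2)
for pattern, count in patterns.items():
    print("".join(pattern), count)
```

Handling elements with several items, as in the sequence database shown earlier, additionally requires extending the last element of the prefix, which this sketch leaves out.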

  • 8/3/2019 Ppt Devika

    12/15

    Applications

    Applications of sequential pattern mining

    Customer shopping sequences: first buy a computer, then a CD-ROM, and then a digital camera, within 3 months.

    Medical treatments, natural disasters (e.g., earthquakes), science and engineering processes, stocks and markets, etc.

    Telephone calling patterns, Weblog click streams

    DNA sequences and gene structures

  • 8/3/2019 Ppt Devika

    13/15

    CONCLUSION:

    Further improvements are still likely.

    Results need better balance and more clarity.

    More research is needed.

    In essence, the database needs a way to store more pages, combat data, and still provide (or attempt to provide) pertinent results.

  • 8/3/2019 Ppt Devika

    14/15

    THANK YOU

  • 8/3/2019 Ppt Devika

    15/15

    ANY QUERY..???