powerpoint presentation -eco

Upload: nuraav

Post on 01-Mar-2016

DESCRIPTION

economics

TRANSCRIPT

  • Indian Institute of Management (IIM),Rohtak

    Recap and Discussion For Mid Term Exam

    20 objective questions: 20 minutes

    Test: 120 minutes; total duration: 140 minutes

    Basic E-R notation

    Entity symbols, including a special entity that is also a relationship

    Relationship symbols: relationship degrees specify the number of entity types involved; relationship cardinalities specify how many of each entity type is allowed

    Attribute symbols

    A composite attribute: an attribute broken into component parts

    Entity with a multivalued attribute (Skill) and a derived attribute (Years_Employed)

    Multivalued: an employee can have more than one skill

    Derived: computed from the date employed and the current date
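A derived attribute such as Years_Employed is normally computed on demand rather than stored. A minimal Python sketch, assuming "years employed" means completed whole years (the slide does not state a rounding rule); the function name `years_employed` is hypothetical:

```python
from datetime import date


def years_employed(date_employed: date, today: date) -> int:
    """Derive Years_Employed (completed whole years) instead of storing it."""
    years = today.year - date_employed.year
    # This year's anniversary of the hire date not reached yet: one fewer full year.
    if (today.month, today.day) < (date_employed.month, date_employed.day):
        years -= 1
    return years


print(years_employed(date(2010, 3, 15), date(2016, 3, 1)))  # 5
```

Passing the current date as a parameter (rather than calling `date.today()` inside) keeps the derivation testable.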

    Both outpatients and resident patients are cared for by a responsible physician

    Only resident patients are assigned to a bed

    Total specialization rule

    A patient must be either an outpatient or a resident patient

    Partial specialization rule

    A vehicle could be a car, a truck, or neither

    Disjoint rule

    A patient can be either an outpatient or a resident patient, but not both

    Overlap rule

    A part may be both purchased and manufactured

    Example of supertype/subtype hierarchy

    Primary key; foreign key (implements the 1:N relationship between customer and order)

    Combined, these form a composite primary key (uniquely identifying the order line); individually, they are foreign keys (implementing the M:N relationship between order and product)
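The key structure described above can be sketched as SQL DDL, run here through Python's built-in `sqlite3` module. The table and column names (`customer`, `orders`, `product`, `order_line`, `quantity`) and the helper `is_rejected` are illustrative assumptions, not the course's schema. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def build_schema() -> sqlite3.Connection:
    """Create customer/orders/product/order_line tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces REFERENCES clauses only when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id   INTEGER PRIMARY KEY,
            customer_name TEXT
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            order_date  TEXT,
            -- foreign key: implements the 1:N customer-order relationship
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
        );
        CREATE TABLE product (
            product_id  INTEGER PRIMARY KEY,
            description TEXT
        );
        CREATE TABLE order_line (
            -- each column alone is a foreign key (M:N order-product link)
            order_id   INTEGER NOT NULL REFERENCES orders(order_id),
            product_id INTEGER NOT NULL REFERENCES product(product_id),
            quantity   INTEGER,
            -- together they form the composite primary key
            PRIMARY KEY (order_id, product_id)
        );
        """
    )
    return conn


def is_rejected(conn: sqlite3.Connection, sql: str) -> bool:
    """True if SQLite refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


conn = build_schema()
conn.execute("INSERT INTO customer VALUES (2, 'Value Furniture')")
conn.execute("INSERT INTO orders VALUES (1006, '10/24/2006', 2)")
conn.execute("INSERT INTO product VALUES (7, 'Dining Table')")
conn.execute("INSERT INTO order_line VALUES (1006, 7, 2)")
print(is_rejected(conn, "INSERT INTO order_line VALUES (1006, 7, 5)"))   # True: duplicate key
print(is_rejected(conn, "INSERT INTO order_line VALUES (1006, 99, 1)"))  # True: no product 99
```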

    One-to-many relationship between the original entity and the new relation

    A multivalued attribute becomes a separate relation with a foreign key
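Mapping a multivalued attribute to its own relation can be sketched in a few lines of Python. The employee rows, the `Employee_ID` key and the helper name `split_multivalued` are hypothetical; only the Skill attribute comes from the slides. Each row of the new relation pairs the owner's key (acting as the foreign key) with one value, so (Employee_ID, Skill) would serve as its composite primary key.

```python
def split_multivalued(rows, key, attr):
    """Move a multivalued attribute into its own relation.

    Returns (base, multi): base keeps every attribute except attr; multi holds
    one (key, value) row per value, relating 1:N back to base through key.
    """
    base, multi = [], []
    for row in rows:
        base.append({k: v for k, v in row.items() if k != attr})
        for value in row[attr]:
            multi.append({key: row[key], attr: value})
    return base, multi


# Hypothetical employees with a multivalued Skill attribute.
employees = [
    {"Employee_ID": 1, "Name": "A. Smith", "Skill": ["SQL", "Java"]},
    {"Employee_ID": 2, "Name": "B. Jones", "Skill": ["Excel"]},
]
employee, employee_skill = split_multivalued(employees, "Employee_ID", "Skill")
print(employee)
print(employee_skill)
```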

    Manufactured? Purchased?

    Identify Dependency and Relationship in Tabular Form

    Order_ID | Order_Date | Customer_ID | Customer_Name     | Customer_Address | Product_ID | Product_Description  | Product_Finish | Unit_Price | Ordered_Quantity
    1006     | 10/24/2006 | 2           | Value Furniture   | Plano, TX        | 7          | Dining Table         | Natural Ash    | 800        | 2
    1006     | 10/24/2006 | 2           | Value Furniture   | Plano, TX        | 5          | Writer's Desk        | Cherry         | 325        | 2
    1006     | 10/24/2006 | 2           | Value Furniture   | Plano, TX        | 4          | Entertainment Center | Natural Maple  | 650        | 1
    1007     | 10/25/2006 | 6           | Furniture Gallery | Boulder, CO      | 11         | 4-Dr Dresser         | Oak            | 500        | 4
    1007     | 10/25/2006 | 6           | Furniture Gallery | Boulder, CO      | 4          | Entertainment Center | Natural Maple  | 650        | 3

    Functional Dependency Diagram

    Customer_ID → Customer_Name, Customer_Address

    Product_ID → Product_Description, Product_Finish, Unit_Price

    Order_ID, Product_ID → Ordered_Quantity

    Order_ID → Order_Date, Customer_ID, Customer_Name, Customer_Address
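Applying these dependencies, the order table can be split into one relation per determinant. The Python sketch below assumes that decomposition (CUSTOMER, ORDER, PRODUCT, ORDER_LINE); the helper names `project` and `decompose` are hypothetical. Customer_Name and Customer_Address are left out of ORDER because they depend on Order_ID only through Customer_ID.

```python
COLUMNS = ("Order_ID", "Order_Date", "Customer_ID", "Customer_Name",
           "Customer_Address", "Product_ID", "Product_Description",
           "Product_Finish", "Unit_Price", "Ordered_Quantity")

ORDER_ROWS = [
    (1006, "10/24/2006", 2, "Value Furniture", "Plano, TX", 7, "Dining Table", "Natural Ash", 800, 2),
    (1006, "10/24/2006", 2, "Value Furniture", "Plano, TX", 5, "Writer's Desk", "Cherry", 325, 2),
    (1006, "10/24/2006", 2, "Value Furniture", "Plano, TX", 4, "Entertainment Center", "Natural Maple", 650, 1),
    (1007, "10/25/2006", 6, "Furniture Gallery", "Boulder, CO", 11, "4-Dr Dresser", "Oak", 500, 4),
    (1007, "10/25/2006", 6, "Furniture Gallery", "Boulder, CO", 4, "Entertainment Center", "Natural Maple", 650, 3),
]


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    idx = [COLUMNS.index(a) for a in attrs]
    return sorted({tuple(row[i] for i in idx) for row in rows})


def decompose(rows):
    """One relation per determinant in the functional dependencies."""
    return {
        "CUSTOMER": project(rows, ("Customer_ID", "Customer_Name", "Customer_Address")),
        "ORDER": project(rows, ("Order_ID", "Order_Date", "Customer_ID")),
        "PRODUCT": project(rows, ("Product_ID", "Product_Description", "Product_Finish", "Unit_Price")),
        "ORDER_LINE": project(rows, ("Order_ID", "Product_ID", "Ordered_Quantity")),
    }


relations = decompose(ORDER_ROWS)
for name, tuples in relations.items():
    print(name, tuples)
```

Repeated facts (the Value Furniture address, the Entertainment Center price) are now stored once, which removes the update anomalies of the flat table.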

    Identify Dependency and Relationship in Tabular Form

    St_ID | L_Name | F_Name | Phone_No | St_Lic | Lic_No  | Ticket# | Date     | Code | Fine
    38249 | Brown  | Thomas | 111-7804 | FL     | BRY 123 | 15634   | 10/17/08 | 2    | $25
    38249 | Brown  | Thomas | 111-7804 | FL     | BRY 123 | 16017   | 11/13/08 | 1    | $15
    82453 | Green  | Sally  | 391-1689 | AL     | TRE 141 | 14987   | 10/05/08 | 3    | $100
    82453 | Green  | Sally  | 391-1689 | AL     | TRE 141 | 16293   | 11/13/08 | 1    | $15
    82453 | Green  | Sally  | 391-1689 | AL     | TRE-141 | 17892   | 12/13/08 | 2    | $25
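A candidate dependency X → Y can be tested against sample rows by checking that equal X values never appear with different Y values. The Python sketch below does this for the ticket table (the helper name `fd_holds` is hypothetical; fines are stored as whole dollars). A dependency that holds on five rows is only consistent with the data, not proven: Date → Code happens to hold here, while St_ID → Lic_No fails only because the last row spells the license number "TRE-141".

```python
COLUMNS = ("St_ID", "L_Name", "F_Name", "Phone_No", "St_Lic",
           "Lic_No", "Ticket#", "Date", "Code", "Fine")

TICKETS = [
    (38249, "Brown", "Thomas", "111-7804", "FL", "BRY 123", 15634, "10/17/08", 2, 25),
    (38249, "Brown", "Thomas", "111-7804", "FL", "BRY 123", 16017, "11/13/08", 1, 15),
    (82453, "Green", "Sally", "391-1689", "AL", "TRE 141", 14987, "10/05/08", 3, 100),
    (82453, "Green", "Sally", "391-1689", "AL", "TRE 141", 16293, "11/13/08", 1, 15),
    (82453, "Green", "Sally", "391-1689", "AL", "TRE-141", 17892, "12/13/08", 2, 25),
]


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on all lhs columns but differ on some rhs column."""
    seen = {}
    for row in rows:
        x = tuple(row[COLUMNS.index(c)] for c in lhs)
        y = tuple(row[COLUMNS.index(c)] for c in rhs)
        # setdefault stores the first y seen for this x and returns it afterwards.
        if seen.setdefault(x, y) != y:
            return False
    return True


for lhs, rhs in [
    (("St_ID",), ("L_Name", "F_Name", "Phone_No", "St_Lic")),
    (("Ticket#",), ("St_ID", "Date", "Code")),
    (("Code",), ("Fine",)),
    (("St_ID",), ("Lic_No",)),
]:
    print(lhs, "->", rhs, fd_holds(TICKETS, lhs, rhs))
```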

    What is a data warehouse? It is defined in many different ways, but not rigorously:

    A decision-support database that is maintained separately from the organization's operational database

    Supports information processing by providing a solid platform of consolidated, historical data for analysis

    "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)

    Data warehousing: the process of constructing and using data warehouses

    Data Warehouse: A Multi-Tiered Architecture

    [Figure: Data Sources (operational DBs, other sources) feed an Extract / Transform / Load / Refresh tier with a Monitor & Integrator and Metadata; Data Storage (the data warehouse and data marts) serves an OLAP Server (OLAP engine), which feeds Front-End Tools (analysis, query, reports).]

    Conceptual Modeling of Data Warehouses

    Modeling data warehouses: dimensions & measures

    Star schema: a fact table in the middle connected to a set of dimension tables

    Snowflake schema: a refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake

    Fact constellation: multiple fact tables share dimension tables; viewed as a collection of stars, it is therefore also called a galaxy schema
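A star schema can be mimicked in memory to show how a query joins the fact table to its dimension tables and then aggregates a measure. Everything in this Python sketch is hypothetical: the dimension keys, the attribute names (quarter, item, country), the units_sold figures and the helper `total_by` are invented for illustration. The query mirrors the "total annual sales of TV in U.S.A." cell of the sample cube.

```python
from collections import defaultdict

# Dimension tables: surrogate key -> descriptive attributes (hypothetical).
TIME = {1: {"quarter": "1Qtr"}, 2: {"quarter": "2Qtr"}}
PRODUCT = {10: {"item": "TV"}, 11: {"item": "PC"}}
LOCATION = {100: {"country": "U.S.A."}, 101: {"country": "Canada"}}

# Fact table: one foreign key per dimension plus the measure units_sold.
SALES = [
    (1, 10, 100, 5),
    (2, 10, 100, 7),
    (1, 11, 100, 3),
    (1, 10, 101, 4),
]


def total_by(facts, group_attrs):
    """Join each fact row to its dimensions, then sum units_sold per group."""
    totals = defaultdict(int)
    for time_key, product_key, location_key, units_sold in facts:
        row = {**TIME[time_key], **PRODUCT[product_key], **LOCATION[location_key]}
        totals[tuple(row[a] for a in group_attrs)] += units_sold
    return dict(totals)


print(total_by(SALES, ("item", "country")))
```

In a snowflake schema the same query would need an extra lookup per normalized hierarchy level (e.g., item to product type), which is the usual trade-off against the star layout.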

    A Sample Data Cube

    [Figure: a 3-D cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), product (TV, VCR, PC, sum) and Country (U.S.A., Canada, Mexico, sum); one aggregate cell is labeled "Total annual sales of TV in U.S.A."]

    Typical OLAP Operations

    Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction

    Drill down (roll down): the reverse of roll-up; go from a higher-level summary to a lower-level summary or detailed data, or introduce new dimensions

    Slice and dice: project and select

    Pivot (rotate): reorient the cube for visualization, e.g., turn a 3-D cube into a series of 2-D planes

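    Cube Aggregation: Roll-up (example: computing sums)

    day 2       s1    s2    s3
        p1      44     4
        p2

    day 1       s1    s2    s3
        p1      12          50
        p2      11     8

    Roll up over days:
                s1    s2    s3
        p1      56     4    50
        p2      11     8

    Sum over products:
                s1    s2    s3
        sum     67    12    50

    Sum over stores:
                sum
        p1      110
        p2       19

    Grand total: 129

    Rolling up moves from the detailed day tables toward the grand total; drilling down moves in the opposite direction.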
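The roll-up above is summation over the dimensions being removed. A small Python sketch that stores only the non-empty cells of the two day tables as a sparse dict keyed by (day, product, store); the helper name `roll_up` is hypothetical:

```python
from collections import defaultdict

# Non-empty cells of the day tables: (day, product, store) -> sales.
CUBE = {
    ("day 1", "p1", "s1"): 12, ("day 1", "p1", "s3"): 50,
    ("day 1", "p2", "s1"): 11, ("day 1", "p2", "s2"): 8,
    ("day 2", "p1", "s1"): 44, ("day 2", "p1", "s2"): 4,
}
DIMS = ("day", "product", "store")


def roll_up(cube, keep):
    """Sum out every dimension that is not listed in keep."""
    positions = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for coords, value in cube.items():
        out[tuple(coords[i] for i in positions)] += value
    return dict(out)


print(roll_up(CUBE, ("product", "store")))  # p1/s1 becomes 12 + 44 = 56
print(roll_up(CUBE, ("store",)))
print(roll_up(CUBE, ("product",)))
print(roll_up(CUBE, ()))
```

Drilling down is the inverse direction and cannot be computed from the totals alone; it needs the finer-grained cells kept in `CUBE`.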

    Data Mining: A KDD Process

    Data mining is the core of the knowledge discovery process.

    [Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]

    Data Mining Functionalities

    Concept description: characterization and discrimination

    Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions

    Association (correlation and causality)

    Bread → Butter [support 0.5%, confidence 75%]

    Classification and prediction

    Construct models (functions) that describe and distinguish classes or concepts for future prediction

    E.g., classify countries based on climate, or classify cars based on gas mileage

    Presentation: decision tree, classification rule, neural network

    Predict some unknown or missing numerical values

    Data Mining Functionalities (2)

    Cluster analysis: the class label is unknown; group data to form new classes, e.g., cluster houses to find distribution patterns. The goal is to maximize intra-class similarity and minimize inter-class similarity

    Outlier analysis

    Outlier: a data object that does not comply with the general behavior of the data

    Noise or exception? Not just noise: outliers are useful in fraud detection and rare-event analysis

    Other pattern-directed or statistical analyses

    Association Rules: A Mining Concept

    A rule looks something like this: "If a basket contains Bread and Butter, then it also contains Milk." Any such rule has two associated measures:

    1. Confidence: when the "if" part is true, how often is the "then" part true? This is the same as accuracy.

    $$\mathrm{confidence}(A \rightarrow B) = P(B \mid A) = \frac{P(A \wedge B)}{P(A)}$$

    2. Coverage or support: how much of the database contains both the "if" and the "then" parts.

    $$\mathrm{support}(A \rightarrow B) = P(A \wedge B)$$

    Transaction ID | Items Bought
    1              | Trouser, Shirt, Jacket
    2              | Trouser, Jacket
    3              | Trouser, Jeans
    4              | Shirt, Sweatshirt

    If the minimum support is 50%, then {Trouser, Jacket} is the only 2-itemset that satisfies the minimum support.

    Frequent Itemset  | Support
    {Trouser}         | 75%
    {Shirt}           | 50%
    {Jacket}          | 50%
    {Trouser, Jacket} | 50%

    If the minimum confidence is 50%, then the only two rules generated from this 2-itemset that have confidence greater than 50% are:

    Trouser → Jacket: support = 50%, confidence = 66.7%

    Jacket → Trouser: support = 50%, confidence = 100%
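The two measures can be checked against the four-transaction example in Python (the helper names `support` and `confidence` are illustrative):

```python
TRANSACTIONS = [
    {"Trouser", "Shirt", "Jacket"},  # 1
    {"Trouser", "Jacket"},           # 2
    {"Trouser", "Jeans"},            # 3
    {"Shirt", "Sweatshirt"},         # 4
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """support(lhs together with rhs) / support(lhs), i.e. P(rhs | lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


print(support({"Trouser", "Jacket"}, TRANSACTIONS))                   # 0.5
print(round(confidence({"Trouser"}, {"Jacket"}, TRANSACTIONS), 3))    # 0.667
print(confidence({"Jacket"}, {"Trouser"}, TRANSACTIONS))              # 1.0
```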

    The Apriori Algorithm: Basics

    Computational complexity: given d unique items, the total number of possible association rules grows exponentially with d.
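For reference, the standard count is $R = 3^d - 2^{d+1} + 1$: each of the d items goes to the antecedent, the consequent, or neither, and the assignments that leave either side empty are subtracted. The Python check below enumerates those assignments directly for small d and compares them with the closed form (function names are illustrative):

```python
from itertools import product


def count_rules_brute_force(d):
    """Count rules X -> Y where X and Y are non-empty, disjoint subsets of d items."""
    count = 0
    # 0 = item in antecedent X, 1 = item in consequent Y, 2 = item unused
    for assignment in product((0, 1, 2), repeat=d):
        if 0 in assignment and 1 in assignment:
            count += 1
    return count


def count_rules_formula(d):
    return 3**d - 2**(d + 1) + 1


for d in range(1, 7):
    print(d, count_rules_brute_force(d), count_rules_formula(d))
```

With only 6 items there are already 602 candidate rules, which is why Apriori prunes by support before generating rules.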

    Step 1: Generating 1-itemset Frequent Pattern

    In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1.

    The set of frequent 1-itemsets, L1, consists of the candidate 1-itemsets satisfying minimum support.
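Step 1 amounts to one counting pass over the database. The transaction database D is not listed in this transcript, so this Python sketch assumes the nine-transaction database of the well-known Han and Kamber textbook example, which reproduces the support counts shown for {I1} through {I5}, and a minimum support count of 2 (also an assumption). The function name `frequent_1_itemsets` is illustrative.

```python
from collections import Counter

# Assumed database D (Han & Kamber example): one set of items per transaction.
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_SUP_COUNT = 2  # assumed minimum support count


def frequent_1_itemsets(transactions, min_sup_count):
    """Count every candidate in C1 in one scan; keep those reaching min_sup_count (L1)."""
    c1 = Counter(item for t in transactions for item in t)
    return {frozenset([item]): n for item, n in c1.items() if n >= min_sup_count}


L1 = frequent_1_itemsets(D, MIN_SUP_COUNT)
for itemset in sorted(L1, key=sorted):
    print(sorted(itemset), L1[itemset])
```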

    Step 2: Generating 2-itemset Frequent Pattern [Cont.]

    Step 2: Generating 2-itemset Frequent Pattern

    Itemset | Sup. count
    {I1}    | 6
    {I2}    | 7
    {I3}    | 6
    {I4}    | 2
    {I5}    | 2

    Step 3: Generating 3-itemset Frequent Pattern

    Step 3: Generating 3-itemset Frequent Pattern [Cont.]

    Based on the Apriori property that all subsets of a frequent itemset must also be frequent, we can determine that the latter four candidates cannot possibly be frequent. How?

    For example, take {I1, I2, I3}. Its 2-item subsets are {I1, I2}, {I1, I3}, and {I2, I3}. Since all of them are members of L2, we keep {I1, I2, I3} in C3.

    Now take {I2, I3, I5}, which shows how the pruning is performed. Its 2-item subsets are {I2, I3}, {I2, I5}, and {I3, I5}.

    But {I3, I5} is not a member of L2, so it is not frequent, which violates the Apriori property. Thus we must remove {I2, I3, I5} from C3.

    Therefore, after checking every member of the join result for pruning, C3 = {{I1, I2, I3}, {I1, I2, I5}}.

    Now the transactions in D are scanned to determine L3, which consists of those candidate 3-itemsets in C3 having minimum support.
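The join-and-prune step can be sketched in Python under two assumptions. First, L2 is taken from the same textbook example ({I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}), since the L2 table is not in this transcript. Second, the join here is a simpler variant (any two frequent (k-1)-itemsets whose union has k items) rather than the classic prefix-based self-join; it can produce extra candidates before pruning, but after pruning it yields the same C3. The name `apriori_gen` follows common usage and is not taken from the slides.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = {frozenset(s) for s in prev_frequent}
    # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
    joined = {a | b for a in prev for b in prev if len(a | b) == k}
    # Prune (Apriori property): every (k-1)-subset must itself be frequent.
    return {
        c for c in joined
        if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
    }


L2 = [{"I1", "I2"}, {"I1", "I3"}, {"I1", "I5"},
      {"I2", "I3"}, {"I2", "I4"}, {"I2", "I5"}]
C3 = apriori_gen(L2, 3)
print(sorted(sorted(c) for c in C3))  # [['I1', 'I2', 'I3'], ['I1', 'I2', 'I5']]
```

If both C3 candidates turn out to be frequent (L3 = C3), then with k = 4 the only join result {I1, I2, I3, I5} is pruned because {I1, I3, I5} is not frequent, so C4 is empty and the search stops.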

    Step 4: Generating 4-itemset Frequent Pattern

    Step 5: Generating Association Rules from Frequent Itemsets

    Step 5: Generating Association Rules from Frequent Itemsets [Cont.]

    Best Result Found in WEKA

    Best rules found:
    1. I5=Y 2 ==> I1=Y 2 conf:(1)
    2. I4=Y 2 ==> I2=Y 2 conf:(1)
    3. I5=Y 2 ==> I2=Y 2 conf:(1)
    4. I2=Y I5=Y 2 ==> I1=Y 2 conf:(1)
    5. I1=Y I5=Y 2 ==> I2=Y 2 conf:(1)
    6. I5=Y 2 ==> I1=Y I2=Y 2 conf:(1)
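Step 5 can be sketched as: for each frequent itemset, try every non-empty proper subset as the antecedent and keep the rule if support_count(itemset) / support_count(antecedent) reaches the minimum confidence. The support counts below for {I1, I2}, {I1, I5}, {I2, I5} and {I1, I2, I5} are assumed from the same textbook example (the single-item counts match the Step 2 table), and the 70% threshold is also an assumption. Under these assumptions, the three rules produced from {I1, I2, I5} correspond to rules 4 to 6 in the WEKA output above, each with confidence 1.

```python
from itertools import combinations

# Assumed support counts (Han & Kamber example); keys are frozensets.
SUPPORT_COUNT = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2, frozenset({"I1", "I2", "I5"}): 2,
}


def rules_from_itemset(itemset, support_count, min_conf):
    """Return (antecedent, consequent, confidence) for every strong rule."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            conf = support_count[itemset] / support_count[lhs]
            if conf >= min_conf:
                rules.append((lhs, itemset - lhs, conf))
    return rules


for lhs, rhs, conf in rules_from_itemset({"I1", "I2", "I5"}, SUPPORT_COUNT, 0.7):
    print(sorted(lhs), "=>", sorted(rhs), f"conf={conf:.2f}")
```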

    Criticism of Support and Confidence: Strong Rules Are Not Necessarily Interesting

    Total transactions: 10,000 (C: computers, V: videos)

    V: 7,500; C: 6,000; C and V: 4,000; min_support = 0.3; min_conf = 0.60

    Consider the rule: buys(X, computer) → buys(X, video)

    Support = 4000/10000 = 0.40; confidence = P(C and V)/P(C) = 4000/6000 ≈ 66.7%. Both exceed the thresholds, so the rule is strong. But the probability of buying a video is 0.75, so buying a computer actually reduces the probability of buying a video from 0.75 to 0.667: computers and videos are negatively correlated.

    Lift of A → B

    $$\mathrm{lift}(A, B) = \mathrm{corr}_{A,B} = \frac{P(A \wedge B)}{P(A)\,P(B)}$$

    This is the probability of buying A and B together divided by the probability of buying them together if they were bought independently. Equivalently, it is the conditional probability of buying B given that A is purchased, divided by the unconditional probability of buying B, so it takes both P(A) and P(B) into consideration:

    $$\mathrm{lift}(A, B) = \frac{P(B \mid A)}{P(B)}$$

    If A and B are independent events, then $P(A \wedge B) = P(A)\,P(B)$ and the lift equals 1.

    A and B are negatively correlated if the value is less than 1 and positively correlated if it is greater than 1.

             | C    | not C | Row sum
    V        | 4000 | 3500  | 7500
    not V    | 2000 | 500   | 2500
    Col sum  | 6000 | 4000  | 10000

    From the table, we can see that the probability of purchasing a computer game is P({game}) = 0.60, the probability of purchasing a video is P({video}) = 0.75, and the probability of purchasing both is P({game, video}) = 0.40. According to the lift formula, P({game, video})/(P({game}) × P({video})) = 0.40/(0.60 × 0.75) ≈ 0.89. Because this value is less than 1, there is a negative correlation between the occurrence of {game} and {video}.
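The same calculation from the raw counts in the contingency table, as a short Python function (the name `lift` and its count-based signature are illustrative):

```python
def lift(n_ab, n_a, n_b, n_total):
    """Lift of A -> B from transaction counts: P(A and B) / (P(A) * P(B))."""
    p_ab = n_ab / n_total
    p_a = n_a / n_total
    p_b = n_b / n_total
    return p_ab / (p_a * p_b)


# Computer (A) and video (B): 4000 joint, 6000 computer, 7500 video, 10000 total.
value = lift(4000, 6000, 7500, 10000)
print(round(value, 2))  # 0.89 -> below 1, so negatively correlated
```

Unlike confidence, lift is symmetric in A and B, so the same 0.89 describes video → computer.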

    Process (1): Model Construction

    Training Data

    NAME | RANK           | YEARS | TENURED
    Mike | Assistant Prof | 3     | no
    Mary | Assistant Prof | 7     | yes
    Bill | Professor      | 2     | yes
    Jim  | Associate Prof | 7     | yes
    Dave | Assistant Prof | 6     | no
    Anne | Associate Prof | 3     | no

    Classification Algorithms

    Classifier (Model): IF rank = professor OR years > 6 THEN tenured = yes

    Process (2): Using the Model in Prediction

    Classifier

    Testing Data

    NAME    | RANK           | YEARS | TENURED
    Tom     | Assistant Prof | 2     | no
    Merlisa | Associate Prof | 7     | no
    George  | Professor      | 5     | yes
    Joseph  | Assistant Prof | 7     | yes

    Unseen Data: (Jeff, Professor, 4). Tenured?
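Applying the learned rule is a one-line function. This Python sketch (function and variable names are illustrative) scores the rule on the training and testing tables and classifies the unseen tuple; on the testing data it misclassifies Merlisa, whose 7 years trigger the "years > 6" clause.

```python
TRAIN = [("Mike", "Assistant Prof", 3, "no"), ("Mary", "Assistant Prof", 7, "yes"),
         ("Bill", "Professor", 2, "yes"), ("Jim", "Associate Prof", 7, "yes"),
         ("Dave", "Assistant Prof", 6, "no"), ("Anne", "Associate Prof", 3, "no")]
TEST = [("Tom", "Assistant Prof", 2, "no"), ("Merlisa", "Associate Prof", 7, "no"),
        ("George", "Professor", 5, "yes"), ("Joseph", "Assistant Prof", 7, "yes")]


def tenured(rank, years):
    """The classifier: IF rank = professor OR years > 6 THEN tenured = yes."""
    return "yes" if rank == "Professor" or years > 6 else "no"


def accuracy(rows):
    """Fraction of rows whose predicted label equals the actual TENURED value."""
    hits = sum(tenured(rank, years) == label for _, rank, years, label in rows)
    return hits / len(rows)


print(accuracy(TRAIN), accuracy(TEST))   # 1.0 0.75
print("Jeff:", tenured("Professor", 4))  # Jeff: yes
```

The gap between training accuracy (1.0) and testing accuracy (0.75) is why the model is evaluated on a separate test set before being used on unseen data.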

    Decision Tree Induction: Training Dataset

    age    | income | student | credit_rating | buys_computer
    40     | low    | yes     | fair          | yes
    >40    | low    | yes     | excellent     | no
    31..40 | low    | yes     | excellent     | yes

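    For >40

    [Figure: partially built decision tree with root test age? (branch labels include 31..40 and 40), a leaf labeled yes, a student? test node with branches no and yes leading to leaves labeled yes and no, and nodes still marked ??]

    Unseen tuple to classify (age, income, student, credit_rating, buys_computer): 40, medium, no, fair, ?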

    Some calculators and programming languages do not provide a log2 function. Use the change-of-base rule instead: take the log of 2 in any base and divide by it. Example: log10(2) ≈ 0.301, so log2(3/5) = log10(3/5) / 0.301.
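These base-2 logarithms feed the entropy (expected information) measure used to choose splitting attributes in decision tree induction, $\mathrm{Info}(D) = -\sum_i p_i \log_2 p_i$. A minimal Python sketch: the class counts [3, 2] are illustrative, chosen only because they produce the 3/5 term used above, and the function names are hypothetical. The change-of-base version is checked against `math.log2`.

```python
import math


def log2_via_log10(x):
    """Change of base: log2(x) = log10(x) / log10(2)."""
    return math.log10(x) / math.log10(2)


def entropy(class_counts):
    """Info(D) = -sum(p_i * log2(p_i)) over classes with a non-zero count."""
    total = sum(class_counts)
    return -sum((n / total) * log2_via_log10(n / total)
                for n in class_counts if n > 0)


print(log2_via_log10(3 / 5), math.log2(3 / 5))  # both about -0.737
print(round(entropy([3, 2]), 3))                # 0.971
```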

    Bayes Classifier: Training Dataset

    age    | income | student | credit_rating | buys_computer
    40     | low    | yes     | fair          | yes
    >40    | low    | yes     | excellent     | no
    31..40 | low    | yes     | excellent     | yes

    Bayes Classifier: An Example X = (age

    So, we move to the second point, (2, 5), and calculate its distance to each of the three means using the Manhattan distance function, with the point as (x1, y1) and the mean as (x2, y2):

    $$d(a, b) = |x_2 - x_1| + |y_2 - y_1|$$

    Point (2, 5) and mean1 = (2, 10):

    $$d(\text{point}, \text{mean}_1) = |2 - 2| + |10 - 5| = 0 + 5 = 5$$

    Point (2, 5) and mean2 = (5, 8):

    $$d(\text{point}, \text{mean}_2) = |5 - 2| + |8 - 5| = 3 + 3 = 6$$

    Point (2, 5) and mean3 = (1, 2):

    $$d(\text{point}, \text{mean}_3) = |1 - 2| + |2 - 5| = 1 + 3 = 4$$

    Initial means: mean1 = (2, 10), mean2 = (5, 8), mean3 = (1, 2). We fill in the rest of the table in the same way and place each point in the cluster of its nearest mean:

    Point      | Dist Mean 1 | Dist Mean 2 | Dist Mean 3 | Cluster
    A1 (2, 10) | 0           | 5           | 9           | 1
    A2 (2, 5)  | 5           | 6           | 4           | 3
    A3 (8, 4)  | 12          | 7           | 9           | 2
    A4 (5, 8)  | 5           | 0           | 10          | 2
    A5 (7, 5)  | 10          | 5           | 9           | 2
    A6 (6, 4)  | 10          | 5           | 7           | 2
    A7 (1, 2)  | 9           | 10          | 0           | 3
    A8 (4, 9)  | 3           | 2           | 10          | 2

    Cluster 1: (2, 10)
    Cluster 2: (8, 4), (5, 8), (7, 5), (6, 4), (4, 9)
    Cluster 3: (2, 5), (1, 2)

    Next, we re-compute the cluster centers (means) by taking the mean of all points in each cluster. Cluster 1 has only one point, A1 (2, 10), which was the old mean, so its center remains the same. For Cluster 2, we have ((8+5+7+6+4)/5, (4+8+5+4+9)/5) = (6, 6). For Cluster 3, we have ((2+1)/2, (5+2)/2) = (1.5, 3.5).

    In the figure, the initial cluster centers are shown as red dots and the new cluster centers as red x's.

    That was Iteration 1 (epoch 1). Next, we go to Iteration 2 (epoch 2), Iteration 3, and so on, until the means no longer change. In Iteration 2, we repeat the process from Iteration 1, this time using the new means we computed.
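The whole procedure can be sketched in Python using the eight points, the three initial means and the Manhattan distance from this example, stopping when the means no longer change. Function names are illustrative, and keeping an empty cluster's old mean is an assumption the slides do not address. Run to convergence on this data, the assignments stop changing after the third update, ending with clusters {A1, A4, A8}, {A3, A5, A6} and {A2, A7}.

```python
def manhattan(p, q):
    """d(a, b) = |x2 - x1| + |y2 - y1|, generalized to any dimension."""
    return sum(abs(a - b) for a, b in zip(p, q))


def assign(points, means):
    """Index of the nearest mean for each point (the first one wins a tie)."""
    labels = []
    for p in points:
        dists = [manhattan(p, m) for m in means]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, means):
    """New mean of each cluster; an empty cluster keeps its old mean."""
    new_means = []
    for k, old in enumerate(means):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_means.append(old)
    return new_means


def kmeans(points, means, max_iter=100):
    """Repeat assign/update until the means stop changing."""
    for _ in range(max_iter):
        labels = assign(points, means)
        new_means = update(points, labels, means)
        if new_means == means:
            break
        means = new_means
    return labels, means


POINTS = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]  # A1..A8
INITIAL = [(2, 10), (5, 8), (1, 2)]

first = assign(POINTS, INITIAL)
print([k + 1 for k in first])          # [1, 3, 2, 2, 2, 2, 3, 2]
print(update(POINTS, first, INITIAL))  # [(2.0, 10.0), (6.0, 6.0), (1.5, 3.5)]
labels, means = kmeans(POINTS, INITIAL)
print([k + 1 for k in labels], means)
```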
