
Post on 21-Jan-2016


Powerset Explorer: A Datamining Application

Jordan Lee


Background

PAST
  – Datamining accomplished with human intuition
PRESENT
  – Computer aided with AI and brute force CPU cycles
FUTURE
  – Enter PowersetViewer...


Dataset

Alphabet
  – Items that can be found in transactions
  – E.g. apples, bread, chips
Transaction
  – Sets of items (unordered)
  – E.g. Tx1 = { Apples, Chips }
  – E.g. Tx2 = { Bread }
Transaction database
  – Collection of transactions (unordered, possibly repetitive)
  – E.g. Walmart transaction logs
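The three definitions above map onto standard collection types. What follows is a minimal sketch in Java, assuming items are plain strings. The class name `DatasetSketch` and its helper methods are hypothetical and are not taken from Powerset Explorer itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the three dataset concepts; not the tool's actual classes.
class DatasetSketch {
    // Alphabet: the items that may appear in any transaction.
    static final Set<String> ALPHABET = new TreeSet<>(List.of("Apples", "Bread", "Chips"));

    // Transaction: an unordered set of items drawn from the alphabet.
    static Set<String> transaction(String... items) {
        Set<String> tx = new TreeSet<>(List.of(items));
        if (!ALPHABET.containsAll(tx)) {
            throw new IllegalArgumentException("item outside alphabet: " + tx);
        }
        return tx;
    }

    // Transaction database: a collection that may hold the same transaction more than once.
    static List<Set<String>> exampleDatabase() {
        List<Set<String>> db = new ArrayList<>();
        db.add(transaction("Apples", "Chips")); // Tx1
        db.add(transaction("Bread"));           // Tx2
        db.add(transaction("Chips", "Apples")); // same set as Tx1, written in another order
        return db;
    }

    public static void main(String[] args) {
        List<Set<String>> db = exampleDatabase();
        System.out.println(db.size());                   // repeats are counted
        System.out.println(db.get(0).equals(db.get(2))); // item order does not matter
    }
}
```

A list of sets is used here because the database may repeat a transaction, while a single transaction never repeats an item.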


Example Dataset

Student enrollment database
  – Alphabet = courses
      { CPSC124, CPSC126, PHIL120, ANTH100, ENGL112 }
  – Transaction = courses a student is enrolled in
      #29389002 -> { CPSC124, PHIL120, ENGL112 }
  – Transaction DB = list of student course schedules


Example Dataset (cont’d)

72423298 5 676 1701 3046 3900 1327
38578546 7 175 178 1182 1701 3038 680 39127660625 5 326 676 1701 3038 390843359163 3 1177 1699 4317
26495781 6 676 1177 1701 3038 3900 4275
48536452 4 1699 2339 1327 2826
64251972 6 676 1177 1701 3038 3900 2549
23212318 5 676 1701 3040 3813 3900
19820119 5 104 676 1699 3038 3900
65954629 4 480 676 3040 3908
54392012 5 676 1701 3038 3813 3899
85833501 5 676 1699 3040 3813 3900
65136197 5 676 1699 3038 3900 2580
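The listing reads as a stream of whitespace-separated records. The sketch below assumes each record is an identifier, then an item count, then that many numeric item codes. That layout is inferred from the numbers and is not stated on the slide. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser; the record layout (id, count, items...) is an assumption.
class TransactionParser {
    // One parsed record: an identifier plus its item codes.
    static final class Tx {
        final long id;
        final List<Integer> items;
        Tx(long id, List<Integer> items) { this.id = id; this.items = items; }
    }

    // Reads records back to back from a whitespace-separated token stream.
    static List<Tx> parse(String text) {
        List<Tx> out = new ArrayList<>();
        if (text.trim().isEmpty()) return out;
        String[] tok = text.trim().split("\\s+");
        int i = 0;
        while (i < tok.length) {
            if (i + 1 >= tok.length) throw new IllegalArgumentException("truncated header");
            long id = Long.parseLong(tok[i++]);
            int count = Integer.parseInt(tok[i++]);
            if (count < 0 || i + count > tok.length) {
                throw new IllegalArgumentException("truncated record " + id);
            }
            List<Integer> items = new ArrayList<>();
            for (int k = 0; k < count; k++) items.add(Integer.parseInt(tok[i++]));
            out.add(new Tx(id, items));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two records copied from the listing.
        String sample = "72423298 5 676 1701 3046 3900 1327 48536452 4 1699 2339 1327 2826";
        for (Tx tx : parse(sample)) {
            System.out.println(tx.id + " -> " + tx.items);
        }
    }
}
```

The count field is what lets records sit back to back on one line, so the parser checks it against the remaining tokens rather than trusting line breaks.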


Why?

Why is this interesting?
  – Consumer transaction logs -> trends in consumer buying
  – Student enrollment database -> trends in enrollment
      What electives do most undergrad computer science students take?
      Departments can determine which joint majors would fit the student population.
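A question such as the electives one above reduces to counting which courses co-occur with computer science courses. The sketch below is one hedged way to do that count, assuming course codes are strings and that a `CPSC` prefix marks a computer science course. The class name and the sample schedules are made up for illustration and do not come from the enrollment data.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper; the "CPSC" prefix rule is an assumption for this example.
class ElectiveCounter {
    // For each non-CPSC course, counts the schedules that contain it
    // together with at least one CPSC course.
    static Map<String, Integer> electiveCounts(List<Set<String>> schedules) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> schedule : schedules) {
            boolean takesCs = schedule.stream().anyMatch(c -> c.startsWith("CPSC"));
            if (!takesCs) continue; // not a computer science student's schedule
            for (String course : schedule) {
                if (!course.startsWith("CPSC")) counts.merge(course, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up schedules, for illustration only.
        List<Set<String>> schedules = List.of(
            Set.of("CPSC124", "PHIL120", "ENGL112"),
            Set.of("CPSC126", "PHIL120"),
            Set.of("ANTH100", "ENGL112"));
        System.out.println(electiveCounts(schedules));
    }
}
```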


Why? (cont’d)

Dataset sizes growing exponentially
  – Human intuition has reached its limits
  – Require computers and AI (expensive)
  – Information visualization can scale the power of human intuition


TreeJuxtaposer


Powerset Explorer

Code base from TreeJuxtaposer (Munzner)
  – AccordionDrawer package
Goals
  – Focus + context exploration using grids
  – Guaranteed visibility
  – Marking of groups


Milestones Status Update

#1 Completion of the basic visualization of a randomized database of small set size (~10)
#2 Addition of a single level of “marking”
#3 Addition of multiple levels of “marking” (6)
#4 Addition of background marking to demarcate areas of sets containing different numbers of items
#5 Implement multiple constraints
#6 Increase maximum possible dataset size to at least 100


Difficulties

Multiple constraints difficult to implement on current server-side dataminer
Cannot enumerate a powerset of alphabet size greater than 14 elements (integer = 32 bits)
  – Solution: use Java class BigInteger
High CPU and memory usage
  – Solution: upgrade computer! hack
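The BigInteger idea can be sketched as follows. This is a minimal illustration, assuming each subset is identified by an index whose bit i records whether alphabet item i is present. That encoding is an assumption and may differ from the one Powerset Explorer uses. `PowersetIndex` and its methods are hypothetical names; `shiftLeft`, `testBit` and `setBit` are standard `java.math.BigInteger` methods.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of subset indexing without a fixed-width integer limit.
class PowersetIndex {
    // Number of subsets of an alphabet with n items: 2^n, exact for any n.
    static BigInteger powersetSize(int n) {
        return BigInteger.ONE.shiftLeft(n);
    }

    // Decodes a subset index: bit i set means alphabet item i is in the subset.
    static List<String> subsetAt(BigInteger index, List<String> alphabet) {
        if (index.signum() < 0 || index.compareTo(powersetSize(alphabet.size())) >= 0) {
            throw new IllegalArgumentException("index out of range");
        }
        List<String> subset = new ArrayList<>();
        for (int i = 0; i < alphabet.size(); i++) {
            if (index.testBit(i)) subset.add(alphabet.get(i));
        }
        return subset;
    }

    // Encodes a set of item positions back into its index.
    static BigInteger indexOf(int... positions) {
        BigInteger index = BigInteger.ZERO;
        for (int p : positions) index = index.setBit(p);
        return index;
    }

    public static void main(String[] args) {
        List<String> alphabet = List.of("CPSC124", "CPSC126", "PHIL120", "ANTH100", "ENGL112");
        BigInteger size = powersetSize(alphabet.size());
        // Walk the whole powerset of a small alphabet.
        for (BigInteger i = BigInteger.ZERO; i.compareTo(size) < 0; i = i.add(BigInteger.ONE)) {
            System.out.println(i + " -> " + subsetAt(i, alphabet));
        }
        // An alphabet of 100 items needs 101 bits for its subset count.
        System.out.println(powersetSize(100).bitLength());
    }
}
```

Indexing this way removes the width limit on subset numbers, but walking every index still takes time exponential in the alphabet size, so it does not by itself address the CPU and memory difficulty.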


Current Status

Reduced database
8680433 3 0 7 5
2768129 2 6 4
6385608 5 1 9 10 9 11
147924 5 5 2 9 5 2
234140 3 11 4 8
4331093 4 4 6 0 0
3158394 5 12 1 12 5 4
5797538 6 11 4 3 13 12 4
6243191 1 5
5872060 4 3 8 9 6


Current Status

Property file
0 CPSC 325 75.0 3
1 PHIL 327 84.0 1
2 ANTH 329 45.0 2
3 MATH 327 0.0 3
4 PSYC 328 0.0 1
5 ENGL 329 0.0 2
6 APSC 540 0.0 1
7 MECH 541 0.0 1
8 STAT 543 0.0 1
9 SPAN 201 71.0 1
10 FREN 258 76.0 2
11 ECON 260 84.0 1
12 LING 295 42.0 1
13 EECE 302 73.0 1
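The item codes in the reduced database (0 to 13) appear to line up with the first column of the property file. The sketch below assumes the first three fields of each line are an item index, a subject code and a course number. The two trailing numeric fields are skipped because the slide does not say what they mean. `PropertyLookup` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lookup from item index to course name; the field meanings are assumptions.
class PropertyLookup {
    // Builds index -> "SUBJ NNN" from lines of the form: index subject number value value.
    static Map<Integer, String> parse(List<String> lines) {
        Map<Integer, String> names = new HashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 3) throw new IllegalArgumentException("short line: " + line);
            names.put(Integer.parseInt(f[0]), f[1] + " " + f[2]);
        }
        return names;
    }

    // Translates the item codes of one transaction into course names.
    static List<String> decode(List<Integer> items, Map<Integer, String> names) {
        List<String> out = new ArrayList<>();
        for (int item : items) {
            String name = names.get(item);
            if (name == null) throw new IllegalArgumentException("unknown item " + item);
            out.add(name);
        }
        return out;
    }

    public static void main(String[] args) {
        // Three lines copied from the property file.
        Map<Integer, String> names = parse(List.of(
            "0 CPSC 325 75.0 3",
            "5 ENGL 329 0.0 2",
            "7 MECH 541 0.0 1"));
        // Items of transaction 8680433 from the reduced database.
        System.out.println(decode(List.of(0, 7, 5), names));
    }
}
```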


Questions?