
D. Koop, CIS 602-02, Fall 2015

Scalable Data Analysis (CIS 602-02)

Data Visualization

Dr. David Koop

Data Integration
• It is rare to be able to analyze a single raw dataset without pulling in other information
• Need to have some way to tie datasets together
• Has often meant creating a common schema for all data
  - New database: warehousing
  - Mediation: virtual warehousing

Virtual Data Warehouses

[Figure: a query is posed against a mediated schema; semantic mappings connect the mediated schema to sources S1, S2, S3]

Independence of:
• source & location
• data model, syntax
• semantic variations
• …

Example source data shown in the figure:

SSN          Name     Category
123-45-6789  Charles  undergrad
234-56-7890  Dan      grad
…            …

SSN          CID
123-45-6789  CSE444
123-45-6789  CSE444
234-56-7890  CSE142
…

CID     Name               Quarter
CSE444  Databases          fall
CSE541  Operating systems  winter
…       …

<cd>
  <title> The best of … </title>
  <artist> Carreras </artist>
  <artist> Pavarotti </artist>
  <artist> Domingo </artist>
  <price> 19.95 </price>
</cd>

[A. Doan et al., 2012]
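A minimal Python sketch of the mediation idea, assuming the three relational tables on the slide live in separate sources. The list names, the assignment of tables to sources, and the function `students_in_course` are illustrative assumptions, not part of the slide. The query is answered by consulting each source at query time; no combined warehouse table is built.

```python
# Hypothetical in-memory stand-ins for three autonomous sources.
# Which table belongs to which source is an assumption.
STUDENTS = [
    {"ssn": "123-45-6789", "name": "Charles", "category": "undergrad"},
    {"ssn": "234-56-7890", "name": "Dan", "category": "grad"},
]
ENROLLMENT = [
    {"ssn": "123-45-6789", "cid": "CSE444"},
    {"ssn": "123-45-6789", "cid": "CSE444"},  # repeated row, as on the slide
    {"ssn": "234-56-7890", "cid": "CSE142"},
]
COURSES = [
    {"cid": "CSE444", "name": "Databases", "quarter": "fall"},
    {"cid": "CSE541", "name": "Operating systems", "quarter": "winter"},
]


def students_in_course(course_name):
    """Mediated query: names of students enrolled in the named course.

    Each step reads one source; the join happens in the mediator.
    """
    cids = {c["cid"] for c in COURSES if c["name"] == course_name}
    ssns = {e["ssn"] for e in ENROLLMENT if e["cid"] in cids}
    return sorted(s["name"] for s in STUDENTS if s["ssn"] in ssns)


print(students_in_course("Databases"))
```

The caller only sees the mediated question ("who takes Databases?"); the SSN and CID join keys stay internal to the mediator.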

Integrated Schema Example

Integrated schema:
  Movie(title, director, year, genre)
  Actors(title, actor)
  Plays(movie, location, startTime)
  Reviews(title, rating, description)

Sources:
  S1: Movies(name, actors, director, genre)
  S2: Cinemas(place, movie, start)
  S3: CinemasInNYC(cinema, title, startTime)
  S4: CinemasInSF(location, movie, startingTime)
  S5: Reviews(title, date, grade, review)

[A. Doan et al., 2012]
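One way to picture the semantic mappings for this example is as attribute renamings from each cinema source into the integrated Plays relation. The sketch below is only an illustration: the correspondences (for example, that `cinema` and `place` both play the role of `location`) are assumptions read off the attribute names, and the sample rows are made up.

```python
# Assumed attribute correspondences: source attribute -> Plays attribute.
PLAYS_MAPPINGS = {
    "Cinemas": {"movie": "movie", "place": "location", "start": "startTime"},
    "CinemasInNYC": {"title": "movie", "cinema": "location", "startTime": "startTime"},
    "CinemasInSF": {"movie": "movie", "location": "location", "startingTime": "startTime"},
}


def to_plays(source, row):
    """Rename one source row's attributes into Plays(movie, location, startTime)."""
    return {target: row[attr] for attr, target in PLAYS_MAPPINGS[source].items()}


def plays(sources):
    """Answer a scan of Plays by translating rows from every mapped source."""
    return [to_plays(name, row) for name, rows in sources.items() for row in rows]


# Made-up rows, for illustration only.
SOURCES = {
    "CinemasInNYC": [{"cinema": "Cinema A", "title": "Movie X", "startTime": "19:00"}],
    "CinemasInSF": [{"location": "Cinema B", "movie": "Movie Y", "startingTime": "20:30"}],
}

for row in plays(SOURCES):
    print(row)
```

Real mappings are usually richer than renamings (joins, selections, value conversions), which is part of why defining them is costly.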

Integration Costs
• Either integration approach requires a significant process of defining mappings between data sources and the general schema
• Fully integrated data is definitely more useful
• Is there a way to trade off between time spent integrating and the utility of the data?

[Figure: % functional (up to 100) plotted against time (or cost), comparing "Schema First" with "Dataspaces"]

[M. Franklin et al., 2005]

Dataspaces

Figure 2. An example dataspace and the components of a dataspace system.

[Figure from the paper; the surrounding page text recovered from the slide follows. DSSP is the paper's abbreviation for the system that provides these services over a dataspace.]

[…] systems, but provides a new set of services over the aggregate of these systems, while remaining sensitive to the autonomy needs of the systems. Furthermore, we may have several DSSPs serving the same dataspace – in a sense, a DSSP can be a personal view on a particular dataspace.

Catalog and Browse: The catalog contains information about all the participants in the dataspace and the relationships among them. The catalog must be able to accommodate a large variety of sources and support differing levels of information about their structure and capabilities. In particular, for each participant, the catalog should include the schema of the source, statistics, rates of change, accuracy, completeness, query answering capabilities, ownership, and access and privacy policies. Relationships may be stored as query transformations, dependency graphs, or sometimes even textual descriptions.

Wherever possible, the catalog should contain a basic inventory of the data elements at each participant: identifier, type, creation date and so forth. It can then support a basic browse capability over the combined inventory of all participants. While not a very scalable interface, it can at least be used to answer questions about the presence or absence of a data element, or determine which participants hold documents of a particular type. Simple scripts run over the participants can extend the capabilities of this interface. For example, computing and storing an MD5 hash of all data elements can help identify duplicated holdings between participants.

On top of the catalog, the DSSP should support a model-management environment that allows creating new relationships and manipulate existing ones (e.g., mapping composition and inversion, merging of schemas and creating unified views of multiple sources).

Search and Query: The component should offer the following capabilities:

(1) Query everything: Users should be able to query any data item regardless of its format or data model. Initially, the DSSP should support keyword queries on any participant. As we gain more information about a participant, we should be able to gradually support more sophisticated queries. The system should support graceful transition between keyword querying, browsing and structured querying. In particular, when answers are given to a keyword (or structured) query, additional query interfaces should be proposed that enable the user to refine the query.

(2) Structured query: Database-like queries should be supported on common interfaces (i.e., mediated schemas) that provide access to multiple sources, or can be posed on a specific data source (using its own schema) with the intention that answers will also be obtained from other sources (as in peer-data management systems). Queries can be posed in a variety of languages (and underlying data models) and should be reformulated into other data models and schemas as best possible, leveraging exact and approximate semantic mappings.

(3) Meta-data queries: The system should support a wide spectrum of meta-data queries. These include (a) including the source of an answer or how it was derived or computed, (b) providing timestamps on the data items that participated in the computation of an answer, (c) specifying which other data items in the dataspace may depend on a particular data item and being able to support hypothetical queries (i.e., What would change if I removed data item X?), and (d) querying the sources and degree of uncertainty about the answers.

A DSSP should also support queries locating data, where the answers are data sources rather than specific data items. For example, the system should be able to answer a query such as: Where can I find data about IBM?, or What sources have a salary attribute? Similarly, given an XML document, one should be able to query for XML documents with similar structures, and XML transformations that involve them. Finally, given a fragment of a schema or a web-service description, it should be possible to find similar ones in the dataspace.

(4) Monitoring: All of the above Search and Query services should also be supported in an incremental form that can be applied in real-time to streaming or modified data sources. Monitoring can be done either as a stateless process, in which data items are considered individually, or as a stateful process, where multiple data items are considered. For example, message filtering is a stateless process, whereas windowed aggregate computation is stateful. Complex event detection and alerting are additional functionalities that can be provided as part of an incremental monitoring service.

Local store and index: A DSSP will have a storage and indexing component for the following goals: (1) to create efficiently queryable associations between data objects in different participants, (2) to improve accesses to data sources that have limited access patterns, (3) to enable answering certain queries without accessing the actual data source, and (4) to support high availability and recovery.

The index needs to be highly adaptive to heterogeneous environments. It should take as input any token appearing in the dataspace and return the locations at which the token appears and the roles of each occurrence (e.g., a string in a text file, element in file path, a […]

[M. Franklin et al., 2005]
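The paper excerpt on this slide suggests that computing and storing an MD5 hash of all data elements can help identify duplicated holdings between participants. A minimal sketch of such a script follows, assuming each participant exposes its data elements as raw bytes keyed by an identifier; the `holdings` layout and the function name are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicates(holdings):
    """Group data elements by MD5 digest across participants.

    holdings: {participant: {element_id: bytes}}
    Returns {digest: [(participant, element_id), ...]} for every digest
    held by more than one distinct participant.
    """
    by_digest = defaultdict(list)
    for participant, elements in holdings.items():
        for element_id, content in elements.items():
            digest = hashlib.md5(content).hexdigest()
            by_digest[digest].append((participant, element_id))
    return {
        digest: locations
        for digest, locations in by_digest.items()
        if len({participant for participant, _ in locations}) > 1
    }


# Made-up holdings, for illustration only.
holdings = {
    "laptop": {"report.txt": b"quarterly numbers", "notes.txt": b"todo"},
    "file_server": {"archive/report_copy.txt": b"quarterly numbers"},
}
for digest, locations in find_duplicates(holdings).items():
    print(digest, locations)
```

A digest match only flags byte-identical content; near-duplicates (reformatted or lightly edited copies) would need a different technique.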

Dataspaces
• Entities: structured databases, files, code, Web services, sensors…
  - Different query capabilities, amount of structure, streaming
• Relationships:
  - Full schema mappings
  - A was manually created from B and C
  - A is a snapshot of B on a certain date
  - A and B reflect the same underlying physical entity (but are different)
  - A was sent to me at the same time as B.

[M. Franklin et al., 2005]

Dataspace Challenges
• Lineage
• Uncertainty
• Human effort

Pay-as-you-go Integration
• iTrails: Add integration hints incrementally
  - Provide general search over all data (e.g. graph)
  - Add integration semantics on top of the graph
  - Add more semantics as needed

iTrails Example

▪ Trail for Implicit Meaning: “When I query for global warming, you should also query for Temperature data above 10 degrees”
    global warming → //Temperatures/*[celsius > 10]

▪ Trail for an Entity: “When I query for zurich, you should also query for references of zurich as a region”
    zurich → //*[region = “ZH”]

Keyword query: global warming zurich

Temperatures
city    celsius  date    region
Bern    20       24-Sep  BE
Zurich  15       24-Sep  ZH
Uster   14       25-Sep  ZH
Zurich  9        26-Sep  ZH

[Salles et al., 2007]
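The effect of the two trails on the query "global warming zurich" can be sketched in a few lines of Python. This is not the iTrails implementation. It assumes that the query phrases are ANDed together, that each phrase is ORed with the right-hand side of its trail, and that the Temperatures rows are as best they can be read from the slide. The trail right-hand sides are modeled as plain Python predicates rather than path expressions.

```python
# Temperatures rows as read from the slide (an assumption; the extracted
# text of the table is garbled).
ROWS = [
    {"city": "Bern", "celsius": 20, "date": "24-Sep", "region": "BE"},
    {"city": "Zurich", "celsius": 15, "date": "24-Sep", "region": "ZH"},
    {"city": "Uster", "celsius": 14, "date": "25-Sep", "region": "ZH"},
    {"city": "Zurich", "celsius": 9, "date": "26-Sep", "region": "ZH"},
]

# Trails: phrase -> predicate standing in for the trail's right-hand side.
TRAILS = {
    "global warming": lambda row: row["celsius"] > 10,
    "zurich": lambda row: row["region"] == "ZH",
}


def keyword_match(phrase, row):
    """Plain keyword search: does the phrase occur in the row's text?"""
    text = " ".join(str(value) for value in row.values()).lower()
    return phrase.lower() in text


def search(phrases, use_trails=True):
    """Return (city, date) for rows matching every phrase.

    With trails enabled, a phrase also matches when its trail's predicate holds.
    """
    results = []
    for row in ROWS:
        matches_all = True
        for phrase in phrases:
            hit = keyword_match(phrase, row)
            trail = TRAILS.get(phrase)
            if use_trails and trail is not None:
                hit = hit or trail(row)
            matches_all = matches_all and hit
        if matches_all:
            results.append((row["city"], row["date"]))
    return results


print(search(["global warming", "zurich"], use_trails=False))
print(search(["global warming", "zurich"]))
```

Without trails the query finds nothing, because no row contains the text "global warming". With trails it returns the rows that are above 10 degrees and refer to Zurich either by name or by region.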

iTrails
• Can match and replace or add to the graph based on the trails
• Need to be aware of infinite recursion: Multiple Match Coloring Alg.
• Where do these trails come from?
  - Existing collections (e.g. Wikipedia, ontologies)
  - Can this be automated? (e.g. data mining)
• Can we always represent data in graphs?
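To see why recursion is a concern, consider trails whose right-hand sides match other trails' left-hand sides, possibly in a cycle. The sketch below is not the Multiple Match Coloring algorithm named on the slide; it is a plain visited-set guard, shown only to illustrate the problem. The trail contents are hypothetical.

```python
def expand(term, trails):
    """Return every term reachable from `term` by following trails.

    Each term is expanded at most once, so cyclic trails terminate.
    """
    seen = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for target in trails.get(current, ()):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen


# Hypothetical trails that refer to each other in a cycle.
trails = {
    "zurich": ["ZH"],
    "ZH": ["zurich", "canton of zurich"],
}
print(sorted(expand("zurich", trails)))
```

Naively re-applying trails to their own output would loop forever on this input; tracking what has already been expanded is the simplest way to stop.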

Reading Responses
• First two graded
• Comments

Reading Presentation Schedule

Date  Topic  Student (Pos.)  Student (Neg.)
9/29 Visualization Chaitanya Chandurkar Ramya Reddy Mara
10/1 Statistics Pragnya Srinivasan Shakti Bhattarai
10/6 Machine Learning Sumukhi Kappa Vishnu Vardhan Kumar Pallati
10/8 Clustering Akeim Findlay Richard de Groof
10/15 Databases Gursharanpreet Singh Kalesha Nagineni
10/20 Databases Priya Vishnudas Shanbhag Shree Lekha Kakkerla
10/22 Data Cubes Nilesh Bhadane Tanmay Thakar
11/3 Natural Language Processing Jayeshkumar Vijayaraghavalu Sanjana Bhardwaj
11/5 Cloud Computing Arsalan Aqeel Hafiz Zennia Sandhu
11/10 Map Reduce Arpit Parikh Rutvi Dave
11/12 General Cluster Computing Harshada Gorhe Mehmet Duman
11/17 Streaming Data Anurag Dhirendra Singh Rishu Vaid
11/19 Out of Core Algorithms Rameshta Reddy Kotha
11/24 Graph Algorithms Dhvani Patel Hari Bharti
12/1 Reproducibility Xiaochun Chen

If you need to switch, coordinate with another student and email me to approve

Reading Presentations
• No reading response is due if you are presenting
• Both students should present a summary of the main ideas of the paper
  - May be individual or coordinated with the other student
  - Remember background material and related (inc. future) work
• One student should present the positive view of the ideas presented in the paper
• One student should present the negative view of the ideas presented in the paper
• Be specific and concrete
• Focus on the ideas not necessarily the formatting of the paper
• If your point can be easily rebutted, it's probably not a good point

Projects
• Options:
  - Data analysis on some existing data: think about the questions you want to try to answer
  - Improve some technique for data analysis
• Data Sources:
  - Search the web for topics you're interested in
  - https://github.com/caesar0301/awesome-public-datasets
  - Local data
• If you are doing a research project in a particular area, let's try to work something out so that the course project relates

Page 22: Scalable Data Analysis (CIS 602-02) - Computer and ...dkoop/cis602-2015fa/lectures/lecture08.pdf · Scalable Data Analysis (CIS 602-02) Data Visualization Dr. David Koop. ... Movies

DYKERBEACHPARK

WESTCHESTER

THE BRONX

NASS

AU

QUE

ENS

NASSAU QUEENS

QUEENSBROOKLYN

J a m a i c aB a y

Ha

r l em

Ri v e r

Ea s t R i v e r

E a s t Ri v e r

Lo

ng

I

sl a

nd

So

u

nd

Hu

ds

on

Ri

ve

r

14

AirTrain stops/terminal numbers

7

58

2/3

Q33 (ends Sep 7)

Q72Q47

M60

M60

M60M60Q47Q48Q70 LtdQ72

Q70 Ltd(starts Sep 8)

Q48

Q10

Q10

B15Q10

AIRTRAIN JFK

AIRTRAIN JFK

Q3

Q3

Q70 Ltd (starts Sep 8)

LIRR

LIRR

LIRR

LIRR

LIRR

Met

ro-N

orth

Metro-North

Metro-North

Met

ro-N

orth

LIRR

PATH

PATH

Amtrak

Amtra

k

Amtrak

Amtrak

NJTransit • Amtrak

PATH

weekends

Bowling GreenBowling Green44••55

Broad St Broad St J J••ZZ

Rector StRector StRR

World TradeWorld TradeCenterCenter

E

DeKalb AvB•Q•R

Hoyt St2•3

Clark S

t2• 3

Union StR

Carroll S

tF• G

Bergen

St

F• G

Broad St J•Z

York St

F

CityHallR

Rector StR

Franklin St1

Canal St1

Prince StN•R

Houston St1

14 St A•C•E

50 St1

50 StC•E

59 StColumbus Circle

A•B•C•D•1

66 St Lincoln Center

1

72 St1•2•3

79 St1

86 St1

96 St1•2•3

103 St1

CathedralPkwy

(110 St )1

116 St ColumbiaUniversity

1

137 StCity

College1

145 St1

157 St1

175 StA

181 StA

190 StA

Dyckman St A

238 St1

Norwood205 StD

Mosholu Pkwy4

Bedford Pk BlvdLehman College

4

Kingsbridge Rd4

Fordham Rd4

Allerton Av2•5

183 St4

Burnside Av4

176 St4

Mt Eden Av4

170 St4

174 St2•5

Bronx ParkEast

2•5

Pelham Pkwy2•5

Freeman St2•5

Simpson St2•5

E 180 St2•5

West Farms Sq

E Tremont Av2 •5

167 St4

161 St

Yankee Stadium

B •D •4

Van Cortlandt Park242 St

1

Lexington

Av/63 St

F

14 St–U

nion Sq

L• N• Q

• R• 4• 5• 6

L

• N• Q• R

3 Av

L 1 Av

L

8 St-NYUN•R

Christopher St

Sheridan Sq1

Canal StJ•N•QR•Z•6

Canal StA•C•E

Spring St6

Spring StC•E

W 4 St Wash Sq A•B•C•D•E•F•M

8 Av

L

QueensPlazaE •M

•R

69 St7

52 St7 46 St

Bliss St 740 St

Lowery St

733 St-Rawson St

7

61 StWoodside

7 •Q70 Ltd LGA Airport

(starts Sep 8)

36 StM•R

90 St–Elmhurst Av7

Junction Blvd 7•Q72 LGA Airport

103 St–Corona Plaza7

111 St 7•Q48 LGA Airport

Elmhurst

Av

M• R

Grand Av

Newtown

M

• R Woodhaven Blvd

M

• R 63 Dr–R

ego Park

M

• R• Q72 LG

A Airport

Forest Hills

71 Av

E• F• M• R

75 Av

E• F Bria

rwood

Van W

yck Blvd

E• F

Sutphin BlvdF

Parsons Blvd F

169 StF

Jamaica

Van Wyc

kE

Kew Gardens

Union Tpke

E

• F67 Av

M

• R

21 StQueens-

bridgeF

39 AvN •Q

Steinway StM•R

46 St M

•R

Northern Blvd

M•R

65 St

M•R

74 St–Broadway

Q70 Ltd LGA Airport • 7

82 St–Jackson Hts7

36 AvN •Q

30 AvN•Q

Astoria BlvdN•Q

AstoriaDitmars Blvd

N•Q

Court Sq-23 StE•M

6 Av L 14 St1•2•3

18 St1

14 St F•M

23 StF•M

23 St1

23 StC•E

23 StN•R

33 St6

Hunters Point Av7•LIRR

Vernon BlvdJackson Av

7

21 StG

Queensboro Plaza

N•Q•7

Court SqG•7

68 StHunter College6

77 St6

86 St4•5•6

96 St6

103 St6

110 St6

Central ParkNorth (110 St) 2•3

116 St6

72 StB•C

81 St–Museum of Natural History B•C

86 St B•C

96 StB•C

103 StB•C

Cathedral Pkwy(110 St)B•C

116 StB•C

125 StA•B•C•D

125 St2•3 • M60 LaGuardia Airport

125 St4•5•6

135 StB•C

135 St2•3

116 St2•3

3 Av138 St6

Brook Av

6

Cypress Av

6

E 143 StSt Mary’s St

6145 StA•B•C•D

191 St1

Bedford Pk BlvdB•D

Kingsbridge RdB•D

Fordham RdB•D

182–183 StsB•D

Tremont Av B•D

174–175 StsB•D

170 StB•D

Morris Park5

Pelham Pkwy5

Burke Av2•5

Gun Hill Rd 2•5

219 St 2•5

225 St2•5

233 St2•5

Nereid Av2•5

Wakefield 241 St2

Gun Hill Rd5

Baychester Av5

EastchesterDyre Av5

167 StB•D

E 149 St6

Longwood Av6

Hunts Point Av6

Whitlock Av6

Elder Av6

Morrison Av Soundview 6

St Lawrence Av6

Castle Hill Av6

Zerega Av6

Middletown Rd6

Buhre Av6

Pelham Bay Park6

Parkchester6

181 St1

155 S

t

B

• D15

5 St

C

163 St–Amsterdam Av C

145 St3

149 S

t–Grand

Concourse

2• 4• 5

Harlem148 St3

57 StF

57 St-7 AvN•Q•R

49 StN•Q•R

7 AvB •D

•E

28 St1

28 St N•R

28 St6

23 St6

Astor Pl 6

BoweryJ• ZEast

BroadwayF

2 Av

F

Bleecker St

6 B’way–L

afayette

St

B• D• F• M

Essex

St

F

• J• M• Z

Delance

y St

Grand St B•D

Prospect AvR

25 StR

36 StD•N•R

45 StR

53 StR

59 StN•R

8 Av

N

Fort Hamilto

n

PkwyN

New Utrecht A

vN 18

AvN

20 Av

N

Bay Pkwy

N

KingsHwy

N

Avenue U N86 St

N

62 St

D

71 St

D

79 St

D

18 Av D

20 Av D

Bay Pkwy

D 25 AvD

Bay 50 StD

Coney IslandStillwell Av

D•F•N•Q

55 St

D

50 St

DFort Hamilto

n

PkwyD

9 Av

DDitm

as Av

F

18 Av

F

Avenue I

F

Bay

PkwyF

Bay Ridge AvR

77 StR

86 StR

Bay Ridge95 St

R

Jay St Jay St MetroTechMetroTechAA••CC••FF••RR

Jay St MetroTechA•C•F•R

Lafayette AvC

ParkPlS

Fulton StG

Smith

9 Sts F• G

4 Av–9

St

F• G• R 7 AvF• G

15 St

Prospect P

arkF• G

Fort HamiltonPkwy

F•G

Church AvF• G

Avenue N

F Avenue P

FKings H

wy

F

Avenue U

F

Avenue X

FNeptune Av

F

West 8 StNY AquariumF•Q

Ocean PkwyQ

Brighton BeachB•Q

Sheepshead Bay

B• Q

Neck Rd

Q

Avenue U

Q

Kings Hwy

B• Q

Avenue M

Q

Avenue J

Q

Avenue H

Q

Newkirk Plaza

B• Q

Cortelyo

u Rd

Q

Beverle

y Rd

Q

Church Av

B• QFlatbush

Av

Brooklyn Colle

ge

2• 5

Newkirk Av

2• 5

Beverly

Rd

2• 5

Church Av

2• 5

Winthrop St

2• 5

Sterling St 2•5

President St 2•5

CanarsieRockaway PkwyL

East 105 StL

Aqueduct North Conduit Av

A

Aqueduct RacetrackA

Van Siclen AvCLiberty

AvC

Ozone ParkLefferts BlvdA

111 StA

104 StA

Rockaway Blvd A

88 St A

80 StA

Grant AvA

Euclid AvA•C

Shepherd AvC Howard Beach

JFK Airport AAtlantic Av

L

Alabama AvJ

New Lots Av L B15 JFK Airport

Crescent StJ•Z

Norwood Av Z rush hrs, J other times

Cleveland St J

Bushwick Av

Aberdeen St

L

Wilson Av

L

DeKalb Av

LJe

fferso

n St

L

Flushing Av

J• MLorim

er St

J• MBroadway

G

Nassau Av

G

Greenpoint Av

G

Lorimer S

t

L Graham

Av

L Grand St

L Montrose

Av

L Morgan Av

L

Livonia Av L

Sutter

AvL

Nostrand Av

A •CFranklin Av

C •S

Kingston

Throop Avs

C

Utica AvA •C

Ralph Av

C

Chauncey St

Z rush hours,

J other times

MyrtleWyckoff AvsL•M

Halsey St

J Gates Av

Z rush hours,

J other times

Kosciuszko St

JMyrtle Av

J •M•Z

Central Av M

Seneca AvM

MyrtleWilloughby Avs G

Flushing Av

GMarcy Av

J• M• Z

Metropolitan Av G

Bedford Av

L

Fresh Pond RdM

Halsey

St

L

Rockaway

AvC

Broad

way

Junc

tion

A

• C• J• L

• Z

Parkside AvQ

Prospect Park B•Q•S

Botanic Garden S

Clinton

Washington AvsG

Classon AvG

Hewes

St J• M

Bedford

Nostrand AvsG

Clinton

Washington Avs

C

HoytSchermerhorn

A•C•G

Kingston Av3

Franklin Av

2• 3• 4• 5

BroadwayN•Q

Knickerbocker Av M

Middle VillageMetropolitan AvM

Forest AvM

High StA•C

Atlantic Av–Barclays Ctr B•Q•2•3•4•5•LIRR

Whitehall StSouth FerryR

Bowling Green4•5

Wall St4•5 Wall St

2•3

Fulton St

Chambers St1•2•3

Park Place 2•3

Chambers StJ•ZBrooklyn BridgeCity Hall 4•5•6

Chambers St A•C

Atlantic

Av–Barc

lays C

tr

D• N• R• LIRR

Bergen

St2• 3

7 Av

B• Q

Nevins St2•3•4•5

Borough Hall

2• 3• 4• 5

Court StR

Grand Arm

yPlaz

a2• 3

Easter

n Pkwy

Brooklyn M

useum

2• 3

34 StPenn

Station A•C•E•LIRR

42 StPort AuthorityBus Terminal

A•C•E Times Sq-42 St

N•Q•R•S•1•2•3•7Grand Central42 StS•4•5•6•7•Metro-North

47–50 StsRockefeller CtrB•D•F•M

34 StPenn

Station1•2•3•LIRR

34 StHerald Sq

B•D•FM•N•Q•R

42 StBryant PkB•D•F•M

5 Av 7

Lexington Av/53 St E•M

59 St 4•5•6

51 St 6

Lexington Av/59 StN•Q•R

5 Av/53 StE•M

5 Av/59 StN•Q•R

125 St1

168 St A•C•1 A•C

Dyckman St1

Inwood207 St

A

215 St1

3 Av–149 St2•5

Woodlawn4

Marble Hill225 St1

231 St1

75 St–Elderts Ln Z rush hours, J other times

Cypress Hills J

85 St–Forest Pkwy J

Woodhaven Blvd J•Z

104 St Z rush hours, J other times

111 StJ

121 St Z rush hours, J other times

Sutphin BlvdArcher AvJFK AirportE•J•Z•LIRR

Jamaica179 StF

Jamaica Center Parsons/ArcherE•J•Z

Jackson Hts

Roosevelt Av

E •F •M•R •Q70 Ltd LGA Airport (starts Sep 8)

Q47 LGA Airport (Marine Air Term only)

FlushingMain St

7

Nostrand Av3

Crown HtsUtica Av3•4

Saratoga Av 3

Rockaway Av 3

Junius St 3

Pennsylvania Av3

Van Siclen Av3

New Lots Av3

Sutter Av–Rutland Rd3

A•C•J•Z2•3•4•5

Westchester Sq East Tremont Av 6

Intervale Av 2•5

Prospect Av 2•5

Jackson Av 2•5

Mets–Willets Point7•Q48 LGA Airport

Van Siclen Av Z rush hrs, J other times

138 St–GrandConcourse4•5

M60 LaGuardia Airport

M60 LGAAirport

M60 LaGuardia Airport

M60 LGA Airport

Rector St1

Cortlandt St1

Cortlandt St R

South Ferry1

World TradeCenter

E

207 St 1

rushhours

rushhours

S

Rooseve

lt

Isl

and

F

Beach44 StA

Beach 36 StA

Beach 25 StA

Far RockawayMott Av

A

Broad Channel

A•S

Beach 67 StA

Beach 60 StA

Beach 90 StA•S

Beach 98 StA•S

Beach 105 StA•S

Rockaway ParkBeach 116 St

A•S

Stat

en Is

land

Fer

ry

summer only

QUEENSMIDTOWNTUNNEL

MARINE PARKWAY-GIL HODGESMEMORIALBRIDGE

CR

OS

S B

AY

VETE

RA

NS

ME

MO

RIA

L

BR

IDG

E

HENRY HUDSON

BRIDGE

HUGH L. CAREY TUNNEL

BRIDGE

VERRAZANO-NARROWS

BR

IDG

E

RO

BE

RT

F KE

NN

ED

Y

THROGS NECK BRIDGE

GEO. WASHINGTONBRIDGE

LINCOLN TUNNEL

HOLLAND TUNNEL

MANHATTAN BRIDGE

BROOKLYN BRIDGE

QUEENSBORO BRIDGE

BRONX-WHITESTONE

BRIDGE

MA

LC

OLM

X B

LVD

(LEN

OX

AV

)

NOSTRAND AV

BR

OA

DW

AY

BROADWAY BRIDGE

ST

NIC

HO

LAS

AV

BR

OA

DW

AY

BR

OA

DW

AY

BROADWAY

SE

VE

NT

H A

V

VAR

ICK

ST

L IVONIA

AV

WEST

CHES

TER

AV

E 138 ST

LEX

ING

TO

N A

V

PA

RK

AV

S

LAFA

YE

TT

E S

T

EASTERN PARKWAY

SO

UTH

ER

N B

LVD

WESTC

HE

ST

ER

AV

S

OU

TH

ER

N B

LVD

ES

PLA

NA

DE

WH

ITE PLAINS R

D

JER

OM

E A

V

MANHATTAN AV

UNION AV

LAFAYETTE AV

WEST END LINE

DELANCEY ST

BROADWAY

FULTON ST

JAMAIC

AAV

MYRTLE AV

VAN SINDEREN AV

WYCKOFF AV

BUSHWICK AV

N 7 ST

HOUSTON ST

R U TGERS ST JA

Y S

T S

MITH

ST

NINTH ST

MCDONALD AV

CULVER LINE

MCDONALD AV

FOU

RT

H A

V

86 ST

NEW

UTR

ECH

T AV

FOU

RT

H A

V

53 ST

HILLSID

E AV

41 AV 63 ST

SIX

TH

AV

FLATBUSH AV

E 15 ST

BRIGHTON LINE

E 16 ST

GR

AN

D C

ON

CO

UR

SE

QUEENS BLVD

QUEENS BLVD

ARCHER AV

LIBERTY A

V

PITKIN AV

FULTON ST

FULTON ST

CH

UR

CH

ST

SIX

TH

AV

GREENWICH AV

EIG

HT

H A

V

CE

NT

RA

L PA

RK

WE

ST

S

T N

ICH

OLA

S A

V

FOR

T W

AS

HIN

GTO

N AV

BROADW

AY

FOU

RTH

AV

61 ST SEA BEACH LINE 63 ST

WEST 8 ST

BROADWAY

31 ST

60 ST

BROADW

AY

BR

OA

DW

AY

BR

OA

DW

AY

QUEENS BLVD

ROOSEVELT AV

FLATBUSH AV

WILLIAMSBURG BRIDGE

14 ST

42 ST

PALIS

AD

E A

V

IND

EP

EN

DE

NC

E A

V

HE

NR

Y H

UD

SO

N P

KW

Y

BROADW

AY

231 ST

IRWIN AV

VAN CORTLANDT P

ARK SO

MO

SHO

LU PKW

Y

FORDHAM RD

PELHAM PKWY

CR

OT

ON

A A

V

PR

OS

PE

CT

AV

E 169 ST

180 ST

TREMONT AV

E TREMONT AV

WE

BS

TE

R A

V

BA

INBR

IDGE

TH

IRD

AV

225 ST

BRUCKNER EXPWY

BRUCKNER

EXPWY

ELDER AV ST LAW

RENCE AV

WHITE PLAINS RD

WHITE PLAINS RD

SOUNDVIEW AV

CASTLE HILL AV

ZEREGA AV HUTCHINSON PKW

Y

ALLERTON AV BURKE AV

222 ST

233 ST

MIDDLETOWN RD

BR

OA

DW

AY

AM

ST

ER

DA

M A

V

FT WA

SH

AV

RIV

ER

SID

ED

R

RIV

ER

SID

E D

R

145 ST

135 ST

ST NICHOLAS AV

AM

ST

ER

DA

M A

V FIFT

H A

V

5 AV

MA

DIS

ON

AV

M

AD

ISO

N A

V

PA

RK

AV

TH

IRD

AV

3 A

V

SE

CO

ND

AV

2 A

V

2 AV

1 AV

FIRS

T A

V

1 AV

ALLE

N S

T

YO

RK

AV

WE

ST

EN

D A

V

72 ST

CO

LUM

BU

S A

V

66 ST 66 ST

12 AV

WES

T ST

WEST ST

53 ST

E 8 ST

FDR

DR

GRAND ST

E BWAY

SOUTH ST

WATE

R ST

ASTORIA BLVD

NORTHERN BLVD

DITMARS BLVD

111 ST

112 ST

ST

EIN

WA

Y S

T

48 ST

LONG ISLAND EXPWY

HORACE HARDIN

G EXPWY

LONG ISLAND

EXPWY

36 ST

30 AV

GR

EEN

POIN

T AV

21 ST

JUNCTION BLVD

JEWEL A

V

UTOPIA PKWY

PARSONS BLVD

[Figure: New York City Subway map with bus and railroad connections, August 2013. The map depicts weekday service; to show service more clearly, geography on this map has been modified. © 2013 Metropolitan Transportation Authority]

MTA Fare Exploration Example


MTA Fare Data Exploration


[Figure: 2013 regular season schedule, calendar pages for August and September listing each day's opponent, start time, and broadcast channel. All games are Eastern Time.]

Definition

“Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.” — T. Munzner


Why Visualization?

       I                II               III               IV
   x      y         x      y         x      y         x      y
 10.0    8.04     10.0    9.14     10.0    7.46      8.0    6.58
  8.0    6.95      8.0    8.14      8.0    6.77      8.0    5.76
 13.0    7.58     13.0    8.74     13.0   12.74      8.0    7.71
  9.0    8.81      9.0    8.77      9.0    7.11      8.0    8.84
 11.0    8.33     11.0    9.26     11.0    7.81      8.0    8.47
 14.0    9.96     14.0    8.10     14.0    8.84      8.0    7.04
  6.0    7.24      6.0    6.13      6.0    6.08      8.0    5.25
  4.0    4.26      4.0    3.10      4.0    5.39     19.0   12.50
 12.0   10.84     12.0    9.13     12.0    8.15      8.0    5.56
  7.0    4.82      7.0    7.26      7.0    6.42      8.0    7.91
  5.0    5.68      5.0    4.74      5.0    5.73      8.0    6.89


• Mean of x: 9
• Variance of x: 11
• Mean of y: 7.50
• Variance of y: 4.122
• Correlation: 0.816

[F. J. Anscombe]
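The summary statistics quoted for the four datasets can be checked directly. The following is a minimal sketch using only the Python standard library; the names `quartet`, `pearson`, and `summarize` are illustrative and not part of the lecture. It assumes sample variance (dividing by n - 1).

```python
from math import sqrt
from statistics import mean, variance

# Datasets I-III share the same x values; dataset IV has its own.
x123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
x4 = [8.0] * 7 + [19.0] + [8.0] * 3

quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Return the five summary statistics shown on the slide."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),  # sample variance (n - 1 denominator)
        "mean_y": mean(ys),
        "var_y": variance(ys),
        "corr": pearson(xs, ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, s in summaries.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

The four rows printed should agree closely with each other and with the slide's values, even though the datasets look very different once plotted.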

Why Visualization?

[Figure: scatterplots of the four datasets, panels x1 vs. y1 through x4 vs. y4, each with the x axis running from 4 to 18 and the y axis from 4 to 12]

[F. J. Anscombe]

Visual Pop-out


[C. G. Healey, http://www.csc.ncsu.edu/faculty/healey/PP/]


Visual Perception Limitations


[C. G. Healey, http://www.csc.ncsu.edu/faculty/healey/PP/]


Visual Encoding
• How do we encode data visually?
  - Marks are the basic graphical elements in a visualization
  - Channels are ways to control the appearance of the marks
• Marks classified by dimensionality: points (0D), lines (1D), areas (2D)
• Also can have surfaces, volumes
• Think of marks as a mathematical definition, or if familiar with tools like Adobe Illustrator or Inkscape, the path & point definitions

Channels: Visual Appearance
• How should we encode this data?

Name            Region                   Population    Life Expectancy   Income
China           East Asia & Pacific      1335029250    73.28             7226.07
India           South Asia               1140340245    64.01             2731
United States   America                  306509345     79.43             41256.08
Indonesia       East Asia & Pacific      228721000     71.17             3818.08
Brazil          America                  193806549     72.68             9569.78
Pakistan        South Asia               176191165     66.84             2603
Bangladesh      South Asia               156645463     66.56             1492
Nigeria         Sub-Saharan Africa       141535316     48.17             2158.98
Japan           East Asia & Pacific      127383472     82.98             29680.68
Mexico          America                  111209909     76.47             11250.37
Philippines     East Asia & Pacific      94285619      72.1              3203.97
Vietnam         East Asia & Pacific      86970762      74.7              2679.34
Germany         Europe & Central Asia    82338100      80.08             31191.15
Ethiopia        Sub-Saharan Africa       79996293      55.69             812.16
Turkey          Europe & Central Asia    72626967      72.06             8040.78

Potential Solution


[Gapminder, Wealth & Health of Nations]
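One way to read a bubble chart of this kind is as a mapping from attributes to channels. The sketch below assumes the familiar mapping of income to horizontal position (log scale), life expectancy to vertical position, population to circle area, and region to color hue; the slide image itself is not reproduced here, so treat that mapping as an assumption. The plot size, palette, and function names are hypothetical, and only a few rows of the table are used.

```python
import math

# A few rows from the table: (name, region, population, life expectancy, income)
rows = [
    ("China", "East Asia & Pacific", 1335029250, 73.28, 7226.07),
    ("India", "South Asia", 1140340245, 64.01, 2731.0),
    ("United States", "America", 306509345, 79.43, 41256.08),
    ("Nigeria", "Sub-Saharan Africa", 141535316, 48.17, 2158.98),
    ("Germany", "Europe & Central Asia", 82338100, 80.08, 31191.15),
]

# Hypothetical plot geometry and palette (not from the lecture).
WIDTH, HEIGHT = 800.0, 500.0
MAX_RADIUS = 40.0
PALETTE = ["red", "blue", "green", "orange", "purple"]


def linear_scale(value, lo, hi, out_lo, out_hi):
    """Map value from the interval [lo, hi] to [out_lo, out_hi]."""
    t = (value - lo) / (hi - lo)
    return out_lo + t * (out_hi - out_lo)


def encode(rows):
    """Turn data rows into point marks with position, size, and color channels."""
    log_incomes = [math.log10(r[4]) for r in rows]
    life = [r[3] for r in rows]
    max_pop = max(r[2] for r in rows)
    regions = sorted({r[1] for r in rows})
    # Identity channel: one hue per region (hues repeat if regions outnumber colors).
    hue = {region: PALETTE[i % len(PALETTE)] for i, region in enumerate(regions)}

    marks = []
    for name, region, pop, life_exp, income in rows:
        marks.append({
            "name": name,
            # Horizontal position on a log scale of income.
            "x": linear_scale(math.log10(income), min(log_incomes), max(log_incomes), 0.0, WIDTH),
            # Vertical position; screen coordinates grow downward, so the range is flipped.
            "y": linear_scale(life_exp, min(life), max(life), HEIGHT, 0.0),
            # Area proportional to population, so radius scales with the square root.
            "radius": MAX_RADIUS * math.sqrt(pop / max_pop),
            "color": hue[region],
        })
    return marks


marks = encode(rows)
for m in marks:
    print(m["name"], round(m["x"], 1), round(m["y"], 1), round(m["radius"], 1), m["color"])
```

Scaling the radius by the square root of population keeps circle area, rather than diameter, proportional to the data value.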


Another Solution


[Gapminder, Wealth & Health of Nations]

Visual Channels
• Position: Horizontal, Vertical, Both
• Color
• Shape
• Tilt
• Size: Length, Area, Volume

[Munzner (ill. Maguire), 2014]

Expressiveness and Effectiveness
• Expressiveness Principle: all data from the dataset and nothing more should be shown
  - Do encode ordered data in an ordered fashion
  - Don’t encode categorical data in a way that implies an ordering
• Effectiveness Principle: the most important attributes should be the most salient
  - Saliency: how noticeable something is
  - How do the channels we have discussed measure up?
  - How was this determined?

Ranking Channels by Effectiveness

Channels: Expressiveness Types and Effectiveness Ranks (each list runs from most to least effective)

• Magnitude Channels: Ordered Attributes
  - Position on common scale
  - Position on unaligned scale
  - Length (1D size)
  - Tilt/angle
  - Area (2D size)
  - Depth (3D position)
  - Color luminance
  - Color saturation
  - Curvature
  - Volume (3D size)
• Identity Channels: Categorical Attributes
  - Spatial region
  - Color hue
  - Motion
  - Shape

[Munzner (ill. Maguire), 2014]
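The two principles can be combined into a simple greedy rule: walk through the attributes in order of importance and give each one the most effective unused channel of the matching type. The sketch below is an illustration under that assumption, not a method from the lecture. The function name `assign_channels` is hypothetical, and the code treats every entry in the ranking as a single-use slot, which ignores real design details such as using both horizontal and vertical position on common scales.

```python
# Channel rankings as listed on the slide, most effective first.
MAGNITUDE_CHANNELS = [
    "position on common scale",
    "position on unaligned scale",
    "length (1D size)",
    "tilt/angle",
    "area (2D size)",
    "depth (3D position)",
    "color luminance",
    "color saturation",
    "curvature",
    "volume (3D size)",
]
IDENTITY_CHANNELS = ["spatial region", "color hue", "motion", "shape"]


def assign_channels(attributes):
    """Greedily assign channels to attributes.

    attributes: list of (name, kind) pairs in decreasing order of importance,
    where kind is "ordered" or "categorical".
    """
    free = {
        "ordered": list(MAGNITUDE_CHANNELS),      # expressiveness: ordered data gets magnitude channels
        "categorical": list(IDENTITY_CHANNELS),   # categorical data gets identity channels
    }
    assignment = {}
    for name, kind in attributes:
        if kind not in free:
            raise ValueError(f"unknown attribute kind: {kind!r}")
        if not free[kind]:
            raise ValueError(f"no {kind} channels left for {name!r}")
        # Effectiveness: the most important remaining attribute takes the best free channel.
        assignment[name] = free[kind].pop(0)
    return assignment


assignment = assign_channels([
    ("Income", "ordered"),
    ("Life Expectancy", "ordered"),
    ("Population", "ordered"),
    ("Region", "categorical"),
])
for attribute, channel in assignment.items():
    print(f"{attribute}: {channel}")
```

Reordering the input list changes which attribute receives the strongest channel, which is the point of the effectiveness principle.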

Decisions, decisions
• Given a multi-attribute dataset (e.g. 15 attributes), how do we decide what to visualize?
• What question are we interested in?
• Do we know what the columns represent? (domain information)
• What visual encoding works best for the selected attributes?
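A rough count shows why the first question is hard. The sketch below counts only which attributes could be shown together in a 15-attribute dataset, ignoring the further choice of marks and channels for each subset; the variable names are illustrative.

```python
from math import comb

n_attributes = 15

# Number of ways to pick 1, 2, or 3 attributes to show in a single view.
counts = {k: comb(n_attributes, k) for k in (1, 2, 3)}
for k, n in counts.items():
    print(f"{k} attribute(s) per view: {n} possible selections")
```

Each selection still admits several encodings, so the space of candidate views grows quickly and motivates asking a specific question first.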



Voyager: Exploratory Analysis via Faceted Browsing of Visualization Recommendations

K. Wongsuphasawat, D. Moritz, A. Anand, J. Mackinlay, B. Howe, and J. Heer

Presented by: Chaitanya Chandurkar and Ramya Reddy Mara

Online Application

Discussion
• Quality of Recommendations
• Ranking of Recommendations
  - Interesting discussion of vertical bar charts versus horizontal based on labels
  - How are these mapped to scalar values?
• Data variation versus design variation
• Hybrid of PoleStar and Voyager?
• Evaluation: "Get a comprehensive sense of what the dataset contains and use the bookmark features to collect interesting patterns, trends, or other insights worth sharing with colleagues"
• Scanning for trends vs. answering questions

Next Class
• Quiz, no reading response
• Reading presentation on statistics
• Frequentist/Bayesian discussion
• Data science and statistics