
Page 1: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,

Stratos Idreos

Page 2:

DATA

INDEX

HOW TO STORE DATA

Page 3:

DATA

INDEX

data structure decisions define the algorithms that access data

ALGORITHMS

Page 6:

DATA

INDEX

[7,4,2,6,1,3,9,10,5,8] unordered

ALGORITHMS

[1,2,3,4,5,6,7,8,9,10] ordered
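A minimal sketch of how the layout decides the algorithm, using the two arrays from the slide. The function names and the "elements inspected" counter are illustrative assumptions, not part of the lecture.

```python
def find_unordered(values, target):
    """Linear scan over unordered data: returns (found, elements inspected)."""
    inspected = 0
    for v in values:
        inspected += 1
        if v == target:
            return True, inspected
    return False, inspected


def find_ordered(sorted_values, target):
    """Binary search over ordered data: returns (found, elements inspected)."""
    lo, hi = 0, len(sorted_values)
    inspected = 0
    while lo < hi:
        mid = (lo + hi) // 2
        inspected += 1
        if sorted_values[mid] == target:
            return True, inspected
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return False, inspected


unordered = [7, 4, 2, 6, 1, 3, 9, 10, 5, 8]
ordered = sorted(unordered)

print(find_unordered(unordered, 8))  # scans all 10 values before finding 8
print(find_ordered(ordered, 8))      # finds 8 after 3 probes
```

The same lookup touches far fewer elements once the data is ordered, but only because the ordering was paid for when the data was stored.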

Page 8:

DATA

INDEX

ALGORITHMS

DATA SYSTEMS

Page 9:

DATA STRUCTURES

DEFINE PERFORMANCE

[Figure: speed (axis); COMPUTE vs. DATA MOVEMENT; 2020]

Page 10:

register = this room

caches = this city

memory = nearby city

disk = Pluto

Jim Gray, Turing Award 1998

Page 14:

[Figure: Read / Update / Memory amplification; labels: differential, approximate, point, tree]

no perfect structure

Array

Linked-List

Skip-List

Trie

Hash-Table

Sorted Array

B-tree
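One way to see "no perfect structure" with two of the structures listed above: an unsorted array takes an insert for free but must be scanned to read, while a sorted array reads quickly but pays on every insert. The sketch below counts shifted elements as a stand-in for update cost; that cost model and the function names are assumptions made for illustration.

```python
import bisect


def insert_unsorted(values, x):
    """Append at the end: no existing element has to move."""
    values.append(x)
    return 0  # elements shifted


def insert_sorted(values, x):
    """Insert in order: every larger element shifts one slot to the right."""
    pos = bisect.bisect_left(values, x)
    shifted = len(values) - pos
    values.insert(pos, x)
    return shifted


unsorted_arr = [7, 4, 2, 6, 1, 3, 9, 10, 5, 8]
sorted_arr = sorted(unsorted_arr)

cheap = insert_unsorted(unsorted_arr, 0)   # 0 elements shifted
costly = insert_sorted(sorted_arr, 0)      # all 10 elements shifted
print(cheap, costly)
```

Inserting the smallest value is the worst case for the sorted array; the point is only that lowering read cost raised update cost.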

Page 19:

How do I make my data system run x times as fast? (SQL, NoSQL, big data, …)

How do I minimize my bill in the cloud?

How do I train my neural network x times faster?

How do I accelerate statistics computation for data science/ML?

How do I extend the lifetime of my hardware?

Page 23:

NEW APPLICATIONS

existing systems need to change too

WORKLOAD HARDWARE

ADAPT

IMPROVE WITHIN A BUDGET

WHAT WILL BREAK MY SYSTEM?

REASON

Page 24:

more data

new applications

new h/w

continuous need for new storage solutions

Page 28:

fundamentals of storage

learning outcome

software engineering, data-driven startup, research

data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics, Data Science

small set of principles across all fields

Page 31:

first 4 weeks: introduction to research problems/thinking through lectures

Reading research papers

Open ended projects/research

Page 34:

There is no such thing as a wrong question/answer!!!!

as of week 5: discussions/presentations

interaction: in and out of class

M/W/F OH/labs, Sat/Sun remote OH

Page 36:

Recent Research Papers

review and slides should focus on

what is the problem
why is it important

why is it hard
why existing solutions do not work

what is the core intuition for the solution
solution step by step

does the paper prove its claims
exact setup of analysis/experiments
are there any gaps in the logic/proof

possible next steps

* follow a few citations to gain more background

Each student: 2 reviews per week/1 presentation

learn to judge constructively

learn to present

learn to prepare slides

Page 40:

systems project
individual project
NoSQL, in C/C++

research project
groups of three
NoSQL, Neural Networks
Periodic Table of Data Structures

semester project: due at the end of the semester + a midway check-in (early March, 10%)

Page 42:

ACM Special Interest Group on Management of Data (SIGMOD) Undergrad Research Competition

first prize in 2016, 2017, 2018, 2019

Adaptive Denormalization, Evolving Trees, Splaying LSM-Trees, Adaptive NoSQL

Design continuums at CIDR 2019, two projects in SIGMOD 2020 finals

Page 46:

piazza forum

all announcements & discussions as of week 2

link on class website - check out usage guidelines

classes are recorded (links on class website)

NO LAPTOP/PHONE POLICY
class is based on participation!

Project: 40%
Midway Check-in: 10%
Discussion: 20%
Presentation: 15%
Reviews: 15%

Page 47:

Check out: syllabus, preparation readings, project 0, systems project, online sections

http://daslab.seas.harvard.edu/classes/cs265/

Get familiar with the very basics of traditional database architectures: Architecture of a Database System. By J. Hellerstein, M. Stonebraker and J. Hamilton. Foundations and Trends in Databases, 2007

Get familiar with the very basics of modern database architectures: The Design and Implementation of Modern Column-store Database Systems. By D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden. Foundations and Trends in Databases, 2013

Get familiar with the very basics of modern large scale systems: Massively Parallel Databases and MapReduce Systems. By Shivnath Babu and Herodotos Herodotou. Foundations and Trends in Databases, 2013

Page 48:

Teaching Fellows: Subarna, Brian, Siqiang

Off-class discussions are key! Questions on readings, ideas, help with code/analysis

Page 51:

Prerequisites: knowledge of algorithms, data structures, hardware, systems

Systems track allows taking the class without all prerequisites

Research track: open to CS165 students

(165/265 will not be offered in fall 2020/spring 2021)

Page 52:

questions on logistics?

Page 53:

Next few classes:

BASICS of storage

Intro to RESEARCH topics

Discussion phase/presentation as of week 5

Page 54:

periodic table of data structures

Page 55:

[Figure: memory hierarchy below the CPU: registers, on chip cache, on board cache, memory, disk; arrows labeled cheaper and faster; technologies SRAM, DRAM; latencies ~1ns, ~10ns, ~100ns]

cache miss: looking for something which is not in the cache

memory miss: looking for something which is not in memory

[Figure: memory wall; speed over time, cpu vs. mem]

Page 57:

Jim Gray: IBM, Tandem, DEC, Microsoft; ACM Turing award; ACM SIGMOD Edgar F. Codd Innovations award

registers: my head, ~0

on chip cache: 2x, this room, 1 min

on board cache: 10x, this building, 10 min

memory: 100x, New York, 1.5 hours

disk: 100Kx, Pluto, 2 years

Page 58:

need to only read x… but have to read all of page 1

page1 page2 page3

data value x

[Figure: data move between CPU, registers, on chip cache, on board cache, memory, disk]

Pages 59-69:

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 (size=120 bytes)

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

scan, page by page:

page 5 10 6 4 12: result 4 (40 bytes)

page 2 8 9 7 6: result 4 2 (80 bytes)

page 7 11 3 9 6: result 4 2 3 (120 bytes)
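The scan walkthrough can be reproduced in a few lines. This is a sketch under the slide's own numbers (8-byte values, 5 values per page); the names `scan`, `VALUE_BYTES` and `PAGE_VALUES` are illustrative assumptions.

```python
VALUE_BYTES = 8   # each value is 8 bytes
PAGE_VALUES = 5   # page size: 5 x 8 bytes


def scan(data, predicate):
    """Full scan: every page moves from level N to level N-1,
    whether or not it holds a qualifying value."""
    result, bytes_moved = [], 0
    for start in range(0, len(data), PAGE_VALUES):
        page = data[start:start + PAGE_VALUES]
        bytes_moved += len(page) * VALUE_BYTES
        result.extend(v for v in page if predicate(v))
    return result, bytes_moved


data = [5, 10, 6, 4, 12, 2, 8, 9, 7, 6, 7, 11, 3, 9, 6]
result, moved = scan(data, lambda v: v < 5)
print(result, moved)  # [4, 2, 3] 120
```

Three values qualify (24 bytes of useful data), yet all 120 bytes cross the hierarchy because data moves in whole pages.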

Pages 70-80:

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 (size=120 bytes)

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

an oracle gives us the positions

oracle, page by page:

page 5 10 6 4 12: result 4 (40 bytes)

page 2 8 9 7 6: result 4 2 (80 bytes)

page 7 11 3 9 6: result 4 2 3 (120 bytes)

Page 81:

when does it make sense to have an oracle

how can we minimize the cost

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

e.g., query x<5
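A rough way to explore the question: model the oracle as a precomputed list of qualifying positions and count only the pages that must move. The oracle stand-in, its zero cost, and the function names are assumptions for illustration; a real index has its own storage and lookup cost.

```python
VALUE_BYTES = 8   # each value is 8 bytes
PAGE_VALUES = 5   # page size: 5 x 8 bytes


def oracle_positions(data, predicate):
    """Hypothetical oracle: positions of qualifying values.
    Its own cost and storage are ignored in this sketch."""
    return [i for i, v in enumerate(data) if predicate(v)]


def fetch_with_oracle(data, positions):
    """Move only the pages that hold at least one qualifying position."""
    pages = sorted({p // PAGE_VALUES for p in positions})
    bytes_moved = 0
    for page_no in pages:
        start = page_no * PAGE_VALUES
        bytes_moved += len(data[start:start + PAGE_VALUES]) * VALUE_BYTES
    return [data[p] for p in positions], bytes_moved


data = [5, 10, 6, 4, 12, 2, 8, 9, 7, 6, 7, 11, 3, 9, 6]

wide = fetch_with_oracle(data, oracle_positions(data, lambda v: v < 5))
narrow = fetch_with_oracle(data, oracle_positions(data, lambda v: v < 3))
print(wide)    # ([4, 2, 3], 120): every page holds a match, nothing saved
print(narrow)  # ([2], 40): only one page has to move
```

For x<5 every page contains a qualifying value, so the oracle moves the same 120 bytes as a scan; it pays off only when the qualifying values sit in few pages.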

Page 82:

algorithm system design = not just computation

Page 85:

CPU

DATA MOVEMENT

MEMORY REQUIREMENT

SPACE REQUIREMENT

(ENERGY)

ROBUSTNESS

SQL, NoSQL, Graph, Neural Nets, Statistics, Vision

TIME, CLOUD COSTS
