cs 591: data systems architecturescs-people.bu.edu/.../slides/cas-cs591a1-class2.pdf · cs 591:...
TRANSCRIPT
![Page 1: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/1.jpg)
class 2
Data Systems 101
Prof. Manos Athanassoulis
http://manos.athanassoulis.net/classes/CS591
CS 591: Data Systems Architectures
![Page 2: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/2.jpg)
some reminders
no smartphones no laptop
![Page 3: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/3.jpg)
class effort summary
2 classes per week / OH 4 days per week
each student
1 presentation/discussion lead + 2 reviews per week
(5 long and the rest short, can skip 3)
systems or research project + mid-semester report
expect to work several hours every week
![Page 4: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/4.jpg)
Projectsystems project
implementation-heavy C/C++ project
group of 1-2
research project
group of 3-4
pick a subject (list will be available)
design & analysis
experimentation
![Page 5: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/5.jpg)
class timeline
Week 2 – register for presentations by 1/31first presentation on 2/5
now
Week 7 – submit your project report on 3/8
Week 4 (or earlier) – form your groups by 2/15
Week 15 – Project presentationssubmit all material by 4/30
discussionsinteraction in OH
questions
discussionsinteraction in OH
questions
![Page 6: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/6.jpg)
Piazza
all discussions & announcementshttp://piazza.com/bu/spring2019/cs591a1/
also available on class website
17 already registered!register so we can reach you easily
![Page 7: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/7.jpg)
sources (variety)
big data(it’s not only about size)
size (volume)
rate (velocity)
+ our ability to collect machine-generated data
social
scientific experiments sensors
Internet-of-Things
The 3 V’s
![Page 8: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/8.jpg)
a data system is a large software system
that stores data, and provides the interface to
update and access them efficiently
data system analysisknowledge
insightsdecisions
data
![Page 9: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/9.jpg)
a data system is a large software system
that stores data, and provides the interface to
update and access them efficiently
data system analysisknowledge
insightsdecisions
dataX
![Page 10: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/10.jpg)
a data system is a large software system
that stores data, and provides the interface to
update and access them efficiently
data system analysisknowledge
insightsdecisions
data
some analysis
![Page 11: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/11.jpg)
data system, what’s inside?
![Page 12: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/12.jpg)
application/SQLaccess patternscomplex queries
Indexing Data
op
opop
op
opalgorithms
andoperators
![Page 13: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/13.jpg)
Memory Hierarchy
application/SQLaccess patternscomplex queries
Query Parser
Query Compiler
Optimizer
Evaluation Engine
Memory/Storage Management
IndexingTransaction
Management Disk
Memory
Caches
CPU
modules
![Page 14: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/14.jpg)
growing environment
db
large systemscomplex
lots of tuninglegacy
noSQL
simple, clean“just enough”
newSQL
more complexapplications
need for scalability
>$200B by 2020, growing at 11.7% every year[The Forbes, 2016]
[noSQL]$3B by 2020, growing at 20% every year
[Forrester, 2016]
![Page 15: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/15.jpg)
growing need for tailored systems
new applications new hardware more data
![Page 16: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/16.jpg)
data system, what’s underneath?
![Page 17: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/17.jpg)
memory hierarchy
CPU
on-chip cache
on-board cache
main memory
flash storage
disks flash
smallerfaster
more expensive (GB/$)
![Page 18: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/18.jpg)
memory hierarchy (by Jim Gray)
Jim Gray, IBM, Tandem, Microsoft, DEC“The Fourth Paradigm” is based on his visionACM Turing Award 1998ACM SIGMOD Edgar F. Codd Innovations award 1993
registers/CPU
on chip cache
on board cache
memory
disk
tape
2x
10x
100x
106x
109x
my head ~0
this room 1min
this building 10min
Washington, DC5 hours
Pluto2 years
Andromeda2000 years
![Page 19: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/19.jpg)
memory hierarchy (by Jim Gray)
Jim Gray, IBM, Tandem, Microsoft, DEC“The Fourth Paradigm” is based on his visionACM Turing Award 1998ACM SIGMOD Edgar F. Codd Innovations award 1993
registers/CPU
on chip cache
on board cache
memory
disk
tape
2x
10x
100x
106x
109x
my head ~0
this room 1min
this building 10min
Washington, DC5 hours
Pluto2 years
Andromeda2 years
tape?sequential-only magnetic storage
still a multi-billion industry
![Page 20: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/20.jpg)
Jim Gray (a great scientist and engineer)
Jim Gray, IBM, Tandem, Microsoft, DEC“The Fourth Paradigm” is based on his visionACM Turing Award 1998ACM SIGMOD Edgar F. Codd Innovations award 1993
the first collection of technical visionary research on
a data-intensive scientific discovery
![Page 21: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/21.jpg)
memory wall
CPU
on-chip cache
on-board cache
main memory
flash storage
disks flash
fast
erch
eap
er/l
arge
r
![Page 22: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/22.jpg)
memory wall
CPU
on-chip cache
on-board cache
main memory
flash storage
disks flash
fast
erch
eap
er/l
arge
r
![Page 23: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/23.jpg)
cache/memory misses
CPU
on-chip cache
on-board cache
main memory
flash storage
disks flash
cache miss: looking for something that is not in the cache
memory miss: looking for something that
is not in memory
what happens if I miss?
![Page 24: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/24.jpg)
data movement
CPU
on-chip cache
on-board cache
main memory
flash storage
disks flash
data go through all necessary levels
also read unnecessary data pageX
photo from NBC
need to read only X read the whole page
![Page 25: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/25.jpg)
data movement
CPU
on-chip cache
on-board cache
main memory
flash storage
disks flash
data go through all necessary levels
also read unnecessary data pageX
remember!disk is millions (mem, hundreds) times slower than CPU
photo from NBC
need to read only X read the whole page
![Page 26: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/26.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
query x<7
![Page 27: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/27.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
1, 5, 12, 24, 23
scan output
1, 5
40 bytes
query x<7
![Page 28: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/28.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
query x<7
1, 5, 12, 24, 23
output
1, 5
40 bytes
2, 7, 13, 9, 8
scan
![Page 29: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/29.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
query x<7
1, 5, 12, 24, 23
output
1, 5, 2
40 bytes
2, 7, 13, 9, 8
scan
![Page 30: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/30.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
query x<7
1, 5, 12, 24, 23
output
1, 5, 2
80 bytes
2, 7, 13, 9, 8
scan
![Page 31: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/31.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
query x<7
10, 11, 6, 14, 15
scan output
1, 5, 2
80 bytes
2, 7, 13, 9, 8
![Page 32: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/32.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
query x<7
10, 11, 6, 14, 15
scan output
1, 5, 2, 6
80 bytes
2, 7, 13, 9, 8
![Page 33: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/33.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
query x<7
10, 11, 6, 14, 15
scan output
1, 5, 2, 6
120 bytes
2, 7, 13, 9, 8
![Page 34: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/34.jpg)
what if we had an oracle (perfect index)?
![Page 35: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/35.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
query x<7
![Page 36: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/36.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
1, 5, 12, 24, 23
oracle output
1, 5
40 bytes
query x<7
![Page 37: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/37.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
query x<7
1, 5, 12, 24, 23
output
1, 5
40 bytes
2, 7, 13, 9, 8
oracle
![Page 38: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/38.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
query x<7
1, 5, 12, 24, 23
output
1, 5, 2
40 bytes
2, 7, 13, 9, 8
oracle
![Page 39: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/39.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
query x<7
1, 5, 12, 24, 23
output
1, 5, 2
80 bytes
2, 7, 13, 9, 8
oracle
![Page 40: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/40.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
query x<7
10, 11, 6, 14, 15
output
1, 5, 2
80 bytes
2, 7, 13, 9, 8
oracle
![Page 41: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/41.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
query x<7
10, 11, 6, 14, 15
output
1, 5, 2, 6
80 bytes
2, 7, 13, 9, 8
oracle
![Page 42: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/42.jpg)
page-based access & random access
memory (memory level N)
disk (memory level N+1)
size=120 bytes
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
page size = 5*8 = 40 bytes
query x<7
10, 11, 6, 14, 15
output
1, 5, 2, 6
120 bytes
2, 7, 13, 9, 8
oracle
![Page 43: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/43.jpg)
when is the oracle helpful?
2, 7, 13, 9, 8 1, 5, 12, 24, 23 10, 11, 6, 14, 15
for which query would an oracle help us?
how to decide whether to use the oracle?
![Page 44: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/44.jpg)
how we store data
every byte counts
know the query
index design space
layouts, indexes (classes 5,6)
overheads and tradeoffs (classes 11,12)
access path selection (class 10)
![Page 45: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/45.jpg)
rules of thumb
sequential access
read one block; consume it completely; discard it; read next;
random access
read one block; consume it partially; discard it; (may re-use);
read random next;
hardware can predict and start prefetching
prefetching can exploit full memory/disk bandwidth
ideal random access?
the one that helps us avoid a large numberof accesses (random or sequential)
![Page 46: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/46.jpg)
the language of efficient systems: C/C++
why?
low-level control over hardware
make decisions about physical data placement and consumptions
fewer assumptions
![Page 47: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/47.jpg)
the language of efficient systems: C/C++
why?
low-level control over hardware
make decisions about physical data placement and consumptions
fewer assumptionswe want you in the project to make low-level decisions
![Page 48: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/48.jpg)
a “simple” database operator
select operator (scan)
main-memory optimized-systems
data
qualifying positions
query: value<xover an array of N slots
![Page 49: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/49.jpg)
data
qualifying positions
how to implement it?
result = new array[data.size];j=0;for (i=0; i<data.size; i++)
if (data[i]<x)result[j++]=i;
query: value<xover an array of N slots
what if only 0.1% qualifies?
memorydata
result
![Page 50: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/50.jpg)
data
qualifying positions
how to implement it?
result = new array[data.size];j=0;for (i=0; i<data.size; i++)
if (data[i]<x)result[j++]=i;
query: value<xover an array of N slots
what if only 0.1% qualifies?
memorydata
![Page 51: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/51.jpg)
data
qualifying positions
how to implement it?
result = new array[data.size];j=0;for (i=0; i<data.size; i++)
if (data[i]<x)result[j++]=i;
query: value<xover an array of N slots
what if 99% qualifies?
how can we know?
branches (if statements)are bad for the processors,
can we avoid them?result = new array[data.size];j=0;for (i=0; i<data.size; i++)
result[j+=(data[i]<x)]=i;
how to bring the values?(remember we have the positions)
![Page 52: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/52.jpg)
data
qualifying positions
result = new array[data.size];j=0;for (i=0; i<data.size; i++)
if (data[i]<x)result[j++]=i;
query: value<xover an array of N slots
what about multi-core?NUMA? SIMD? GPU?
data
core1 core2 core3 core4needs coordination!
what about result writing?
![Page 53: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/53.jpg)
data
result = new array[data.size];j=0;for (i=0; i<data.size; i++)
if (data[i]<x)result[j++]=i;
query1: value<x1query2: value<x2 …what about having multiple queries?
![Page 54: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/54.jpg)
data
query: value<xover an array of N slots
should I scan?
should I probe an index?
how to decide which one is best?
total data movement&
computation
![Page 55: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/55.jpg)
how can I prepare?
1) Read background research material• Architecture of a Database System. By J. Hellerstein, M. Stonebraker and J. Hamilton.
Foundations and Trends in Databases, 2007
• The Design and Implementation of Modern Column-store Database Systems. By D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden. Foundations and Trends in Databases, 2013
• Massively Parallel Databases and MapReduce Systems. By Shivnath Babu and HerodotosHerodotou. Foundations and Trends in Databases, 2013
2) Start going over the papers
![Page 56: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/56.jpg)
what to do now?
A) read the syllabus and the website
B) register to piazza
C) register to gradescope
D) register for the presentation (now!)
E) start submitting paper reviews (week 3)
F) go over the project (end of this week will be available)
G) start working on the mid-semester report (week 3)
![Page 57: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/57.jpg)
survival guide
class website: http://manos.athanassoulis.net/classes/CS591/
piazza website: http://piazza.com/bu/spring2019/cs591a1/
gradescope entry-code: MR7ZD4
office hours: Manos (Tu/Th, 2-3pm), Subhadeep (M/W 2-3pm)
material: papers available from BU network
![Page 58: CS 591: Data Systems Architecturescs-people.bu.edu/.../slides/CAS-CS591A1-Class2.pdf · CS 591: Data Systems Architectures. some reminders no smartphones no laptop. class effort summary](https://reader034.vdocument.in/reader034/viewer/2022050520/5fa42464047ebd42297b16fb/html5/thumbnails/58.jpg)
class 2
Data Systems 101
modern main-memory data systems
&
semester project
CS 591: Data Systems Architectures
next week: