01-intro
-
5/17/2018 01-intro
1/52
CS186: Introduction
to Database Systems
Joe Hellerstein
and Christopher Olston
Fall 2!
-
Queries for Today
• What?
• Why?
• Who?
• How?
• For instance?
-
What: Database Systems Then
-
What: Database Systems Today
-
What: Database Systems Today
-
What: Database Systems Today
-
What: Database Systems Today
-
So… What is a Database?
• We will be broad in our interpretation
• A Database:
  - A very large, integrated collection of data.
• Typically models a real-world enterprise
  - Entities (e.g., teams, games)
  - Relationships (e.g., The A's are playing in the World Series)
• Might surprise you how flexible this is
  - Web search:
    • Entities: words, documents
    • Relationships: word in document, document links to document.
-
What is a Database Management System?
• A Database Management System (DBMS) is:
  - A software system designed to store, manage, and facilitate access to databases.
• Typically this term is used narrowly
  - Relational databases with transactions
    • E.g. Oracle, DB2, SQL Server
  - Mostly because they predate other large repositories
    • Also because of technical richness
  - When we say DBMS in this class we will usually follow this convention
    • But keep an open mind about applying the ideas!
-
What: Is the WWW a DBMS?
• Fairly sophisticated search available
  - Crawler indexes pages on the web
  - Keyword-based search for pages
• But, currently
  - data is mostly unstructured and untyped
  - search only:
    • can't modify the data
    • can't get summaries, complex combinations of data
  - few guarantees provided for freshness of data, consistency across data items, fault tolerance, …
  - Web sites typically have a (relational) DBMS in the background to provide these functions.
• The picture is changing quickly
  - Information Extraction to get structure from unstructured
  - New standards, e.g., XML, Semantic Web, can help data modeling
-
What: Search vs. Query
• What if you wanted to find out which actors donated to John Kerry's presidential campaign?
• Try "actors donated to john kerry" in your favorite search engine.
• If it isn't published, it can't be searched!
-
What: A Database Query Approach
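A minimal sketch of what a database query approach could look like, using Python's built-in sqlite3 module. The table names, columns, and rows here are hypothetical placeholders, not the data sources used in the lecture demo. The point is only that a join can combine two separately published lists, which a keyword search cannot do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for a list of actors and a list of donations.
    CREATE TABLE actors(name TEXT);
    CREATE TABLE donations(donor TEXT, recipient TEXT, amount INTEGER);
    INSERT INTO actors VALUES ('Actor A'), ('Actor B');
    INSERT INTO donations VALUES
        ('Actor A', 'Candidate X', 500),
        ('Someone Else', 'Candidate X', 250),
        ('Actor B', 'Candidate Y', 100);
""")

# Join the two tables: which actors donated to Candidate X?
rows = conn.execute("""
    SELECT a.name, d.amount
    FROM actors AS a JOIN donations AS d ON a.name = d.donor
    WHERE d.recipient = ?
    ORDER BY a.name
""", ("Candidate X",)).fetchall()
print(rows)
```

Neither table alone answers the question; the answer exists only in the combination.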
-
Yahoo Actors JOIN FECInfo (Courtesy of the Telegraph research group @Berkeley)
Q: Did it ?
-
What: Is a File System a DBMS?
• Thought Experiment 1:
  - You and your project partner are editing the same file.
  - You both save it at the same time.
  - Whose changes survive?
  A) Yours B) Partner's C) Both D) Neither E) ???
• Thought Experiment 2:
  - You're updating a file.
  - The power goes out.
  - Which changes survive?
  A) All B) None C) All Since Last Save D) ???
-
What: Is a File System a DBMS?
Q: How do you write programs over a subsystem when it promises you only "???"?
A: Very, very carefully!!
-
OS Support for Data Management
• Data can be stored in RAM
  - this is what every programming language offers!
  - RAM is fast, and random access
  - Isn't this heaven?
• Every OS includes a File System
  - manages files on a magnetic disk
  - allows open, read, seek, close on a file
  - allows protections to be set on a file
  - drawbacks relative to RAM?
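For a sense of how low-level the file interface is, here is a small sketch of open, seek, read, and close from Python. The file contents and offsets are made up for illustration. The file system only knows about bytes at positions; it knows nothing about records, fields, or queries.

```python
import os
import tempfile

# Create a scratch file to work with (hypothetical contents).
fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, "wb")
f.write(b"0123456789")
f.close()

f = open(path, "rb")
f.seek(4)            # jump to byte offset 4
chunk = f.read(3)    # read 3 bytes starting there
f.close()

os.remove(path)
print(chunk)
```

Anything beyond "give me these bytes" (finding a record by value, combining two files, coordinating two writers) is left to the program.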
-
Database Management Systems
• What more could we want than a file system?
  - Simple, efficient ad hoc queries
  - concurrency control
  - recovery
  - benefits of good data modeling
• S.M.O.P.
-
Current Commercial Outlook
• A major part of the software industry:
  - Oracle, IBM, Microsoft
  - also Sybase, Informix (now IBM), Teradata
  - smaller players: java-based dbms, devices, OO, …
• Well-known benchmarks (esp. TPC)
• Open Source coming on strong
  - MySQL, …
-
What database systems will we cover?
• We will try to be broad and touch upon
  - Relational DBMS (e.g. Oracle, SQL Server, DB2, …)
  - … systems (e.g. XML repositories like Xindice)
• Starting point
  - We assume you have used web search engines
  - We assume you don't know relational databases
    • Yet they pioneered many of the key ideas
  - So focus will be on relational DBMSs
    • With frequent side-notes on search engines, XML issues
-
Why take this class?
A. Database systems are at the core of CS
B. They are incredibly important to society
C. The topic is intellectually rich
D. A capstone course for undergrad
E. It isn't that much work
F. Looks good on your resume
Let's spend a little time on each of these
-
• Shift from computation to information
  - True in corporate computing for years
  - Web, p2p made this clear for personal computing
  - Increasingly true of scientific computing
• Need for DB technology has exploded in the last years
  - Corporate: retail swipe/clickstreams, customer relationship mgmt, supply chain mgmt, data warehouses, etc.
  - Web: not just documents. Search engines, e-commerce, blogs, wikis, other web services.
  - Scientific: digital libraries, genomics, satellite imagery, physical sensors, simulation data
-
Why take this class?
B. DBs are incredibly important to society.
• "Knowledge is power." -- Sir Francis Bacon
• "With great power comes great responsibility." -- Spider-Man's Uncle Ben
Policy-makers should understand technological possibilities.
Informed technologists are needed in public discourse on usage.
-
Why take this class?
C. The topic is intellectually rich.
• representing information
  - data modeling
• languages and systems for querying data
  - complex queries & query semantics
  - over massive data sets
• concurrency control for data manipulation
  - controlling concurrent access
  - ensuring transactional semantics
• reliable data storage
  - maintain data semantics even if you pull the plug
semantics: the meaning or relationship of meanings of a sign or set of signs
-
Why take this class?
D. The course is a capstone.
• We will see
  - Algorithms and cost analyses
  - System architecture and implementation
  - Resource management and scheduling
  - Computer language design, semantics and optimization
  - Applications of AI topics including logic and planning
  - Statistical modeling of data
-
Why take this class?
E. It isn't that much work.
• Bad news: It is a lot of work.
• Good news: the course is front loaded
  - Most of the hard work is in the first half of the semester
  - Load balanced with most other classes
-
Why take this class?
F. Looks good on my resume.
• Yes, but why? This is not a course for:
  - Oracle administrators
  - IBM DB2 engine developers
    • Though it's useful for both!
• It is a course for well-educated computer scientists
  - Database system concepts and techniques increasingly used outside the box
    • Ask your friends at Microsoft, Yahoo!, Google, Apple, etc.
    • Actually, they may or may not realize it!
  - A rich understanding of these issues is a basic and (un?)fortunately unusual skill.
-
Who?
• Instructors
  - … Berkeley
  - Dr. Christopher Olston, Yahoo! Research
  - cs186profs@db.cs.berkeley.edu
• TAs
  - John Lo
  - Nathan Burkhart
  - Alex Rasmussen
-
How? … load
-
How? Administrivia
• http://inst.eecs.berkeley.edu/~cs186
• … (check web page)
  - Olston: 68 Soda Hall, Thursday 2… (check web page)
• Discussion Sections WILL NOT meet this week
-
How? Administrivia, cont.
• Textbook
  - Ramakrishnan and Gehrke, 3rd Edition
• Grading, hand-in policies, etc. will be on Web
-
Agenda for the rest of today
• A free tasting of central concepts in DB field:
  - queries (vs. search)
  - data independence
  - transactions
• Next Time
  - the Relational data model
• Today's lecture is from Chapter 1 in R&G
• Read Chapter 2 for next class.
-
Describing Data: Data Models
• A data model is a collection of concepts for describing data.
• A schema is a description of a particular collection of data, using a given data model.
• The relational model of data is the most widely used model today.
  - Main concept: relation, basically a table with rows and columns.
  - Every relation has a schema, which describes the columns, or fields.
-
Example: University Database
• Schema:
  - Students(sid: string, name: string, login: string, age: integer, gpa: real)
  - Courses(cid: string, cname: string, credits: integer)
  - Enrolled(sid: string, cid: string, grade: string)
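As a rough illustration, the university schema can be declared and queried with Python's built-in sqlite3 module. The mapping of string/integer/real to SQLite's TEXT/INTEGER/REAL and the sample rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students(sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Courses(cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled(sid TEXT, cid TEXT, grade TEXT);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO Students VALUES ('s1', 'Ann', 'ann@cs', 19, 3.5)")
conn.execute("INSERT INTO Courses VALUES ('c1', 'Databases', 4)")
conn.execute("INSERT INTO Enrolled VALUES ('s1', 'c1', 'A')")

# The schema is itself queryable: list the columns (fields) of Students.
student_columns = [row[1] for row in conn.execute("PRAGMA table_info(Students)")]

# Combine the three relations: who got which grade in which course?
result = conn.execute("""
    SELECT s.name, c.cname, e.grade
    FROM Students s
    JOIN Enrolled e ON s.sid = e.sid
    JOIN Courses c ON c.cid = e.cid
""").fetchall()
print(student_columns, result)
```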
-
Levels of Abstraction
• Views describe how users see the data.
• Conceptual schema defines logical structure
-
Example: University Database
• Conceptual schema:
  - Students(sid: string, name: string, login: string, age: integer, gpa: real)
  - Courses(cid: string, cname: string, credits: integer)
  - Enrolled(sid: string, cid: string, grade: string)
-
Data Independence
• Applications insulated from how data is structured and stored.
• Logical data independence:
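One way to picture logical data independence, sketched with sqlite3: the application only ever queries a view, so the tables underneath can be restructured without touching the application's query. The view name StudentGpa and the split into StudentInfo and StudentGrades are hypothetical choices for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students(sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    INSERT INTO Students VALUES ('s1', 'Ann', 'ann@cs', 19, 3.5);
    CREATE VIEW StudentGpa AS SELECT sid, gpa FROM Students;
""")

APP_QUERY = "SELECT sid, gpa FROM StudentGpa"   # the only thing the application knows
before = conn.execute(APP_QUERY).fetchall()

# Restructure the logical schema: split Students in two, then redefine the view.
conn.executescript("""
    DROP VIEW StudentGpa;
    CREATE TABLE StudentInfo AS SELECT sid, name, login, age FROM Students;
    CREATE TABLE StudentGrades AS SELECT sid, gpa FROM Students;
    DROP TABLE Students;
    CREATE VIEW StudentGpa AS
        SELECT i.sid, g.gpa
        FROM StudentInfo i JOIN StudentGrades g ON i.sid = g.sid;
""")
after = conn.execute(APP_QUERY).fetchall()
print(before, after)
```

The application query string never changes, yet it keeps returning the same answer after the reorganization.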
-
Agenda …
• A free tasting of central concepts in DB field:
  - queries (vs. search)
  - data independence
  - transactions
-
Concurrent execution of user programs
• Why?
  - Utilize C…
-
Concurrent execution
• Interleaving actions of different programs: trouble!
Example:
• Bill transfers $100 from savings to checking
    Savings -= 100; Checking += 100
• Meanwhile, Bill's wife requests account info.
Bad interleaving:
• Savings -= 100
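The trouble can be reproduced in a few lines. In this sketch the transfer is written as a generator so that another program can run between its two steps; the starting balances are hypothetical.

```python
accounts = {"savings": 1000, "checking": 200}   # hypothetical starting balances

def transfer_steps(db, amount):
    """Bill's transfer, pausing (yield) after each write so other work can interleave."""
    db["savings"] -= amount
    yield
    db["checking"] += amount
    yield

def report(db):
    """The account-info request: read both balances and total them."""
    return db["checking"] + db["savings"]

t1 = transfer_steps(accounts, 100)
next(t1)                               # Savings -= 100
seen_mid_transfer = report(accounts)   # account info requested between the two writes
next(t1)                               # Checking += 100
seen_after = report(accounts)
print(seen_mid_transfer, seen_after)
```

The report taken in the middle of the transfer is $100 short, even though no money ever left the two accounts.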
-
Concurrency Control
• DBMS ensures such problems don't arise
• Users can pretend they are using a single-user system. (called Isolation)
  - Thank goodness!
-
Key concept: Transaction
• an atomic sequence of database actions (reads/writes)
• takes DB from one consistent state to another
consistent state 1 → transaction → consistent state 2
-
Example
• Here, consistency is based on our knowledge of banking semantics
• In general, up to writer of transaction to ensure transaction preserves consistency
• DBMS provides (limited) automatic enforcement, via integrity constraints
  - e.g., balances must be >= 0
checking: $200, savings: $1000 → transaction → checking: $300, savings: $900
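A small sketch of an integrity constraint at work, using sqlite3. The accounts table, its CHECK clause, and the balances are assumptions for illustration. The transfer deliberately credits the destination first, so that a rejected debit shows the whole transaction being rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts(name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"   # integrity constraint: no negative balances
)
conn.execute("INSERT INTO accounts VALUES ('checking', 200), ('savings', 1000)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True if it committed."""
    try:
        with conn:   # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "savings", "checking", 100)         # allowed
rejected = transfer(conn, "savings", "checking", 5000)  # would make savings negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, rejected, balances)
```

The constraint only catches the negative balance; it is still up to the transaction's author to make sure a debit is paired with a credit.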
-
Concurrent transactions
• Goal: execute xacts {T1, T2, … Tn}, and ensure a consistent outcome
• One option: "serial" schedule (one after another)
• Better: allow interleaving of xact actions, as long as outcome is equivalent to some serial schedule
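To make "equivalent to some serial schedule" concrete, this sketch enumerates every interleaving of two tiny transactions and keeps those whose outcome (final balances plus the values read) matches one of the two serial orders. The transactions and balances are hypothetical, and comparing outcomes by brute force is a simplification chosen for the example, not how a real DBMS decides.

```python
from itertools import combinations

START = {"savings": 1000, "checking": 200}   # hypothetical balances

T1 = [("write", "savings", -100), ("write", "checking", +100)]   # a transfer
T2 = [("read", "checking"), ("read", "savings")]                 # an account report

def run(schedule):
    """Execute (txn, action) steps in order; return (final state, values read)."""
    db, reads = dict(START), []
    for _, action in schedule:
        if action[0] == "write":
            db[action[1]] += action[2]
        else:
            reads.append(db[action[1]])
    return tuple(sorted(db.items())), tuple(reads)

def interleavings(a, b):
    """All merges of a and b that preserve each transaction's internal order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [("T1", next(ia)) if i in slots else ("T2", next(ib)) for i in range(n)]

serial_outcomes = {
    run([("T1", s) for s in T1] + [("T2", s) for s in T2]),
    run([("T2", s) for s in T2] + [("T1", s) for s in T1]),
}
all_schedules = list(interleavings(T1, T2))
good = [s for s in all_schedules if run(s) in serial_outcomes]
print(len(all_schedules), len(good))
```

For this particular pair, only the two serial orders pass; every true interleaving lets the report see a half-finished transfer.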
-
Locking example
T1 (Bill): Savings -= 100; Checking += 100
T2 (Bill's wife): Print(Checking); Print(Savings)
  - T1 and T2 both lock Savings and Checking objects
  - If T1 locks Savings & Checking first, T2 must wait
-
A wrinkle …
T1 (Bill): Savings -= 100; Checking += 100
T2 (Bill's wife): Print(Checking); Print(Savings)
Suppose:
1. T1 locks Savings
2. T2 locks Checking
• Now neither transaction can proceed!
  - called deadlock
  - DBMS will abort and restart one of T1 and T2
  - Need undo mechanism that preserves consistency
  - Undo mechanism also necessary if system crashes between Savings -= 100 and Checking += 100 …
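A common way to detect this situation is to look for a cycle in a "waits-for" graph, where an edge from one transaction to another means the first is waiting for a lock the second holds. The sketch below is an assumption about how such a check could be written, not a description of any particular DBMS.

```python
def find_cycle(waits_for):
    """Return the transactions on a cycle in the waits-for graph, or None."""
    def visit(node, path):
        if node in path:
            return path[path.index(node):]
        for nxt in waits_for.get(node, ()):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in waits_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None

# State after steps 1 and 2: each transaction holds one lock and wants the other.
lock_holder = {"Savings": "T1", "Checking": "T2"}
lock_wanted = {"T1": "Checking", "T2": "Savings"}
waits_for = {txn: [lock_holder[obj]] for txn, obj in lock_wanted.items()}

cycle = find_cycle(waits_for)
print(cycle)
```

Once a cycle is found, aborting any one transaction on it breaks the deadlock; which one to pick is a policy decision.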
-
Ensuring Transaction Properties
• DBMS ensures:
  - atomicity even if xact aborted (due to deadlock, system crash, …)
  - durability of committed xacts, even if system crashes.
• Idea: Keep a log of all actions carried out by the DBMS:
  - Record all DB modifications in log, before they are executed
  - To abort a xact, undo logged actions in reverse order
  - If system crashes, must:
    1) undo partially executed xacts (ensures atomicity)
    2) redo committed xacts (ensures durability)
  - trickier than it sounds
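A toy version of the recovery idea, under strong simplifying assumptions: the log is an in-memory list, each update record carries both the old and the new value, and the record layout and transaction names are invented for this sketch. Real systems are considerably more involved.

```python
def recover(disk, log):
    """Bring the on-disk state back to consistency using the log after a crash."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo: reapply updates of committed xacts, in log order (durability).
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, key, old, new = rec
            disk[key] = new

    # Undo: roll back updates of uncommitted xacts, in reverse order (atomicity).
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, key, old, new = rec
            disk[key] = old
    return disk

# Log records are written before the corresponding change reaches disk.
log = [
    ("update", "T0", "checking", 100, 200),
    ("commit", "T0"),
    ("update", "T1", "savings", 1000, 900),
    # crash: T1 never logged its Checking update or a commit
]

# Hypothetical disk contents at the crash: T0's write was lost, T1's write made it.
disk_after_crash = {"savings": 900, "checking": 100}
recovered = recover(dict(disk_after_crash), log)
print(recovered)
```

After recovery the committed deposit is back and the half-finished transfer has vanished, as if it had never started.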
-
Architecture of a DBMS…
-
Typical DBMS architecture
Query Optimization and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
concurrency control, logging & recovery
-
Advantages of a DBMS
• Data independence
• Efficient data access
• Data integrity & security
• Data administration
• Concurrent access, crash recovery
• Reduced application development time
• So why not use them always?
  - Expensive/complicated to set up & maintain
  - This cost & complexity must be offset by need
  - General-purpose, not suited for special-purpose tasks (e.g. text search!)
-
Databases make these folks happy ...
• DBMS vendors, programmers
  - Oracle, IBM, MS …
• End users in many fields
  - Business, education, science, …
• DB application programmers
  - Build data entry & analysis tools on top of DBMSs
  - Build web services that run off DBMSs
• Database administrators (DBAs)
  - Design logical/physical schemas
  - Handle security and authorization
  - Data availability, crash recovery
  - Database tuning as needs evolve
…must understand how a DBMS works
-
Summary
• DBMS used to maintain, query large datasets.
  - can manipulate data and exploit semantics
• Other benefits include:
  - recovery from system crashes,
  - concurrent access,
  - quick application development,
  - data integrity and security.
• Levels of abstraction provide data independence.
• In this course we will explore:
  1) How to be a sophisticated user of DBMS technology
  2) What goes on inside the DBMS