01-intro

Upload: phani

Post on 01-Nov-2015


DESCRIPTION

dbms

TRANSCRIPT

  • 5/17/2018 01-intro

    1/52

    CS186: Introduction to Database Systems

    Joe Hellerstein and Christopher Olston

    Fall 2!

    Queries for Today
    • What?
    • Why?
    • Who?
    • How?
    • For instance?

    What: Database Systems Then

    What: Database Systems Today


    So... What is a Database?
    • We will be broad in our interpretation
    • A Database:
      - A very large, integrated collection of data.
    • Typically models a real-world enterprise
      - Entities (e.g., teams, games)
      - Relationships (e.g. The A's are playing in the World Series)
    • Might surprise you how flexible this is
      - Web search:
        • Entities: words, documents
        • Relationships: word in document, document links to document.

    What is a Database Management System?
    • A Database Management System (DBMS) is:
      - A software system designed to store, manage, and facilitate access to databases.
    • Typically this term is used narrowly
      - Relational databases with transactions
        • E.g. Oracle, DB2, SQL Server
      - Mostly because they predate other large repositories
        • Also because of technical richness
      - When we say DBMS in this class we will usually follow this convention
        • But keep an open mind about applying the ideas!

    What: Is the WWW a DBMS?
    • Fairly sophisticated search available
      - Crawler indexes pages on the web
      - Keyword-based search for pages
    • But, currently
      - data is mostly unstructured and untyped
      - search only:
        • can't modify the data
        • can't get summaries, complex combinations of data
      - few guarantees provided for freshness of data, consistency across data items, fault tolerance, ...
      - Web sites typically have a (relational) DBMS in the background to provide these functions.
    • The picture is changing quickly
      - Information Extraction to get structure from unstructured
      - New standards e.g., XML, Semantic Web can help data modeling

    What: Search vs. Query
    • What if you wanted to find out which actors donated to John Kerry's presidential campaign?
    • Try "actors donated to john kerry" in your favorite search engine.
    • If it isn't published, it can't be searched!

    What: A Database Query Approach
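
A database-query approach to the question above can be sketched as a join between a table of actors and a table of campaign donations. This is only an illustration: the table names, column names, and rows below are hypothetical placeholders, not real data or the schema of any real source.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE actors (name TEXT PRIMARY KEY);
        CREATE TABLE donations (donor TEXT, recipient TEXT, amount INTEGER);
    """)
    conn.executemany("INSERT INTO actors VALUES (?)",
                     [("Actor A",), ("Actor B",)])
    conn.executemany("INSERT INTO donations VALUES (?, ?, ?)", [
        ("Actor A", "Candidate X", 500),
        ("Actor A", "Candidate X", 250),
        ("Actor B", "Candidate Y", 100),
        ("Someone Else", "Candidate X", 75),  # a donor who is not an actor
    ])
    return conn


def actors_who_donated(conn, recipient):
    """Join the two tables: which actors gave to `recipient`, and how much?"""
    query = """
        SELECT a.name, SUM(d.amount) AS total
        FROM actors AS a
        JOIN donations AS d ON d.donor = a.name
        WHERE d.recipient = ?
        GROUP BY a.name
        ORDER BY a.name
    """
    return conn.execute(query, (recipient,)).fetchall()
```

Calling `actors_who_donated(build_demo_db(), "Candidate X")` combines the two sources and summarizes the result, which a keyword search over published pages cannot do.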

    Yahoo Actors JOIN FECInfo (Courtesy of the Telegraph research group @ Berkeley)
    Q: Did it Work?

    What: Is a File System a DBMS?
    • Thought Experiment 1:
      - You and your project partner are editing the same file.
      - You both save it at the same time.
      - Whose changes survive?
      A) Yours B) Partner's C) Both D) Neither E) ???
    • Thought Experiment 2:
      - You're updating a file.
      - The power goes out.
      - Which changes survive?
      A) All B) None C) All Since Last Save D) ???

    Q: How do you write programs over a subsystem when it promises you only "???" ?
    A: Very, very carefully!!
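
Thought Experiment 1 can be simulated in a few lines. The sketch below assumes the simplest editor behavior: each editor loads the whole file, edits its private copy, and rewrites the whole file on save. It also fixes the save order, which a real system would not promise. File contents are made up for illustration.

```python
import os
import tempfile


def simulate_simultaneous_save():
    """Two editors load the same file, edit their copies, then both save."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w") as f:
            f.write("shared draft\n")

        # Both editors read the same starting contents.
        with open(path) as f:
            your_copy = f.read()
        with open(path) as f:
            partner_copy = f.read()

        your_copy += "your paragraph\n"
        partner_copy += "partner's paragraph\n"

        # Each save rewrites the whole file, so the later write wins.
        with open(path, "w") as f:
            f.write(your_copy)
        with open(path, "w") as f:
            f.write(partner_copy)

        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)
```

Under these assumptions the first save is silently lost: the file system reports no error, and nothing tells either editor that a conflict occurred.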

    OS Support for Data Management
    • Data can be stored in RAM
      - this is what every programming language offers!
      - RAM is fast, and random access
      - Isn't this heaven?
    • Every OS includes a File System
      - manages files on a magnetic disk
      - allows open, read, seek, close on a file
      - allows protections to be set on a file
      - drawbacks relative to RAM?
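
The open, read, seek, close interface named above works on raw bytes. A minimal sketch, assuming a made-up fixed-width record layout (`RECORD_SIZE` and the record contents are arbitrary choices for illustration), shows how much structure the application has to impose itself:

```python
import os
import tempfile

RECORD_SIZE = 16  # bytes per record; an arbitrary choice for this sketch


def write_records(path, names):
    """Store each name as one space-padded, fixed-width record."""
    with open(path, "wb") as f:
        for name in names:
            f.write(name.encode("ascii").ljust(RECORD_SIZE, b" "))


def read_record(path, index):
    """Fetch record number `index` using open / seek / read / close."""
    with open(path, "rb") as f:          # open
        f.seek(index * RECORD_SIZE)      # seek to the record's byte offset
        raw = f.read(RECORD_SIZE)        # read exactly one record
    return raw.decode("ascii").rstrip()  # the file was closed by `with`


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, ["alice", "bob", "carol"])
        return read_record(path, 2)
    finally:
        os.remove(path)
```

The file system knows nothing about records, fields, or types here; the byte offsets are a convention that lives only in the program.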

    Database Management Systems
    • What more could we want than a file system?
      - Simple, efficient ad hoc queries
      - concurrency control
      - recovery
      - benefits of good data modeling
    • S.M.O.P.

    Current Commercial Outlook
    • A major part of the software industry:
      - Oracle, IBM, Microsoft
      - also Sybase, Informix (now IBM), Teradata
      - smaller players: java-based dbms, devices, OO, ...
    • Well-known benchmarks (esp. TPC)
    • Open Source coming on strong
      - MySQL,

    What database systems will we cover?
    • We will try to be broad and touch upon
      - Relational DBMS (e.g. Oracle, SQL Server, DB2,
        systems (e.g. XML repositories like Xindice)
    • Starting point
      - We assume you have used web search engines
      - We assume you don't know relational databases
        • Yet they pioneered many of the key ideas
      - So focus will be on relational DBMSs
        • With frequent side-notes on search engines, XML issues

    Why take this class?
    A. Database systems are at the core of CS
    B. They are incredibly important to society
    C. The topic is intellectually rich
    D. A capstone course for undergrad
    E. It isn't that much work
    F. Looks good on your resume
    Let's spend a little time on each of these

    • Shift from computation to information
      - True in corporate computing for years
      - Web, p2p made this clear for personal computing
      - Increasingly true of scientific computing
    • Need for DB technology has exploded in the last years
      - Corporate: retail swipe/clickstreams, customer relationship mgmt, supply chain mgmt, data warehouses, etc.
      - Web: not just documents. Search engines, e-commerce, blogs, wikis, other web services.
      - Scientific: digital libraries, genomics, satellite imagery, physical sensors, simulation data

    Why take this class?
    B. DBs are incredibly important to society.
    • "Knowledge is power." -- Sir Francis Bacon
    • "With great power comes great responsibility." -- Spiderman's Uncle Ben
    Policy-makers should understand technological possibilities.
    Informed Technologists needed in public discourse on usage.

    Why take this class?
    C. The topic is intellectually rich.
    • representing information
      - data modeling
    • languages and systems for querying data
      - complex queries & query semantics
      - over massive data sets
    • concurrency control for data manipulation
      - controlling concurrent access
      - ensuring transactional semantics
    • reliable data storage
      - maintain data semantics even if you pull the plug
    semantics: the meaning or relationship of meanings of a sign or set of signs

    Why take this class?
    D. The course is a capstone.
    • We will see
      - Algorithms and cost analyses
      - System architecture and implementation
      - Resource management and scheduling
      - Computer language design, semantics and optimization
      - Applications of AI topics including logic and planning
      - Statistical modeling of data

    Why take this class?
    E. It isn't that much work.
    • Bad news: It is a lot of work.
    • Good news: the course is front loaded
      - Most of the hard work is in the first half of the semester
      - Load balanced with most other classes

    Why take this class?
    F. Looks good on my resume.
    • Yes, but why? This is not a course for:
      - Oracle administrators
      - IBM DB2 engine developers
        • Though it's useful for both!
    • It is a course for well-educated computer scientists
      - Database system concepts and techniques increasingly used outside the box
        • Ask your friends at Microsoft, Yahoo!, Google, Apple, etc.
        • Actually, they may or may not realize it!
      - A rich understanding of these issues is a basic and (un?)fortunately unusual skill.

    Who?
    • Instructors
      - Joe Hellerstein, Berkeley
      - Dr. Christopher Olston, Yahoo! Research
      - cs186profs@db.cs.berkeley.edu
    • TAs
      - John Lo
      - Nathan Burkhart
      - Alex Rasmussen

    How? Workload

    How? Administrivia
    • http://inst.eecs.berkeley.edu/~cs186
    • A (check webpage)
      - Olston: 68 Soda Hall, Thursday 2A (check web page)
    • Discussion Sections WILL NOT meet this week

    How? Administrivia, cont.
    • Textbook
      - Ramakrishnan and Gehrke, 3rd Edition
    • Grading, hand-in policies, etc. will be on Web

    Agenda for the rest of today
    • A free tasting of central concepts in DB field:
      - queries (vs. search)
      - data independence
      - transactions
    • Next Time
      - the Relational data model
    • Today's lecture is from Chapter 1 in R&G
    • Read Chapter 2 for next class.

    Describing Data: Data Models
    • A data model is a collection of concepts for describing data.
    • A schema is a description of a particular collection of data, using a given data model.
    • The relational model of data is the most widely used model today.
      - Main concept: relation, basically a table with rows and columns.
      - Every relation has a schema, which describes the columns, or fields.

    Example: University Database
    • Schema:
      - Students(sid: string, name: string, login: string, age: integer, gpa: real)
      - Courses(cid: string, cname: string, credits: integer)
      - Enrolled(sid: string, cid: string, grade: string)
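
As a rough illustration, the schema above could be declared in SQL and loaded into SQLite from Python. The type mapping (string to TEXT, real to REAL) follows the slide; the primary and foreign keys are assumptions added for the sketch, since the slide does not state any.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Students (
    sid TEXT PRIMARY KEY,   -- key choice is an assumption
    name TEXT,
    login TEXT,
    age INTEGER,
    gpa REAL
);
CREATE TABLE Courses (
    cid TEXT PRIMARY KEY,   -- key choice is an assumption
    cname TEXT,
    credits INTEGER
);
CREATE TABLE Enrolled (
    sid TEXT REFERENCES Students(sid),
    cid TEXT REFERENCES Courses(cid),
    grade TEXT,
    PRIMARY KEY (sid, cid)
);
"""


def create_university_db():
    """Return an in-memory database holding the three empty relations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def column_names(conn, table):
    """List a relation's fields; `table` must be a trusted identifier."""
    # PRAGMA table_info yields one row per column; index 1 is its name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

`column_names(create_university_db(), "Students")` returns the fields in declaration order, which is the schema of that relation as the DBMS sees it.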

    Levels of Abstraction
    • Views describe how users see the data.
    • Conceptual schema defines logical structure

    Example: University Database
    • Conceptual schema:
      - Students(sid: string, name: string, login: string, age: integer, gpa: real)
      - Courses(cid: string, cname: string, credits: integer)
      - Enrolled(sid: string, cid: string, grade: string)

    Data Independence
    • Applications insulated from how data is structured and stored.
    • Logical data independence:
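
One way to picture this insulation is an application that only queries a view. In the sketch below the view name `CourseInfo`, the reorganized tables `Registration` and `Grades`, and all rows are hypothetical; the point is only that the application function is identical for both layouts.

```python
import sqlite3


def course_enrollment(conn):
    """Application query: it names only the view, never the base tables."""
    return conn.execute(
        "SELECT cid, enrollment FROM CourseInfo ORDER BY cid").fetchall()


def make_db_v1():
    """Original layout: one Enrolled table, with a view defined over it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
        INSERT INTO Enrolled VALUES
            ('s1', 'c1', 'A'), ('s2', 'c1', 'B'), ('s1', 'c2', 'A');
        CREATE VIEW CourseInfo AS
            SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
    """)
    return conn


def make_db_v2():
    """Hypothetical reorganization: grades split into their own table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Registration (sid TEXT, cid TEXT);
        CREATE TABLE Grades (sid TEXT, cid TEXT, grade TEXT);
        INSERT INTO Registration VALUES
            ('s1', 'c1'), ('s2', 'c1'), ('s1', 'c2');
        INSERT INTO Grades VALUES
            ('s1', 'c1', 'A'), ('s2', 'c1', 'B'), ('s1', 'c2', 'A');
        CREATE VIEW CourseInfo AS
            SELECT cid, COUNT(*) AS enrollment FROM Registration GROUP BY cid;
    """)
    return conn
```

Only the view definition changed between the two layouts; `course_enrollment` returns the same rows for both without being edited.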

    Agenda ...
    • A free tasting of central concepts in DB field:
      - queries (vs. search)
      - data independence
      - transactions

    Concurrent execution of user programs
    • Why?
      - Utilize C

    Concurrent execution
    • Interleaving actions of different programs: trouble!
    Example:
    • Bill transfers $100 from savings to checking
        Savings -= 100; Checking += 100
    • Meanwhile, Bill's wife requests account info.
    Bad interleaving:
    • Savings -= 100
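
The bad interleaving can be replayed deterministically. This sketch assumes starting balances of 1000 in savings and 200 in checking (illustrative values) and models the account report as a single read of both balances.

```python
def run_schedule(schedule):
    """Run the named steps in order; return the total the reader observed."""
    accounts = {"savings": 1000, "checking": 200}  # assumed starting balances
    seen = {}

    steps = {
        # T1 (Bill): the two halves of the transfer.
        "T1.debit": lambda: accounts.update(
            savings=accounts["savings"] - 100),
        "T1.credit": lambda: accounts.update(
            checking=accounts["checking"] + 100),
        # T2 (Bill's wife): read both balances.
        "T2.read": lambda: seen.update(
            total=accounts["savings"] + accounts["checking"]),
    }
    for name in schedule:
        steps[name]()
    return seen["total"]


SERIAL = ["T1.debit", "T1.credit", "T2.read"]
BAD = ["T1.debit", "T2.read", "T1.credit"]
```

With `SERIAL` the report shows the full 1200; with `BAD` the reader runs between the debit and the credit and sees only 1100, even though no money was ever lost.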

    Concurrency Control
    • DBMS ensures such problems don't arise
    • Users can pretend they are using a single-user system. (called "Isolation")
      - Thank goodness!

    Key concept: Transaction
    • an atomic sequence of database actions (reads/writes)
    • takes DB from one consistent state to another
    consistent state 1 --transaction--> consistent state 2

    Example
    • Here, consistency is based on our knowledge of banking semantics
    • In general, up to writer of transaction to ensure transaction preserves consistency
    • DBMS provides (limited) automatic enforcement, via integrity constraints
      - e.g., balances must be >= 0

    checking: $200, savings: $1000 --transaction--> checking: $300, savings: $900
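
A minimal sketch of both ideas, using SQLite from Python: a `CHECK` constraint enforces that balances stay non-negative, and the transfer runs as one transaction so a violated constraint undoes the whole thing. The table layout and starting balances are assumptions for illustration.

```python
import sqlite3


def make_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER CHECK (balance >= 0))")  # integrity constraint
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("savings", 1000), ("checking", 200)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

The credit is deliberately applied first: when an overdraft makes the debit fail, the rollback also removes the credit that had already been applied, so the database never ends up in a half-transferred state.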

    Concurrent transactions
    • Goal: execute xacts {T1, T2, ... Tn}, and ensure a consistent outcome
    • One option: "serial" schedule (one after another)
    • Better: allow interleaving of xact actions, as long as outcome is equivalent to some serial schedule
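
"Equivalent to some serial schedule" can be checked by brute force for tiny examples: run every serial order, then see whether the interleaving produces one of those outcomes. The sketch below compares final state plus what readers observed. It is a toy outcome comparison under assumed balances, not the conflict-based test a real DBMS would use.

```python
from itertools import permutations

START = {"savings": 1000, "checking": 200}  # assumed balances


def t1_debit(db, out):
    db["savings"] -= 100


def t1_credit(db, out):
    db["checking"] += 100


def t2_read(db, out):
    out["total"] = db["savings"] + db["checking"]


TRANSACTIONS = {"T1": [t1_debit, t1_credit], "T2": [t2_read]}


def run(steps):
    """Apply steps in order; return (final database, values readers saw)."""
    db, out = dict(START), {}
    for step in steps:
        step(db, out)
    return db, out


def serial_outcomes():
    """Outcome of every one-after-another ordering of the transactions."""
    results = []
    for order in permutations(TRANSACTIONS):
        steps = [s for name in order for s in TRANSACTIONS[name]]
        results.append(run(steps))
    return results


def is_equivalent_to_serial(interleaving):
    return run(interleaving) in serial_outcomes()
```

For example, `is_equivalent_to_serial([t1_debit, t2_read, t1_credit])` is false, because no serial order lets the reader see a total of 1100.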


    Locking example
    T1 (Bill): Savings -= 100; Checking += 100
    T2 (Bill's wife): Print(Checking); Print(Savings)
      - T1 and T2 both lock Savings and Checking objects
      - If T1 locks Savings & Checking first, T2 must wait
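
The locking idea can be sketched with ordinary thread locks. This assumes one lock per account and that both transactions acquire the locks in the same order (savings, then checking); the `Bank` class and the starting balances are hypothetical.

```python
import threading


class Bank:
    def __init__(self):
        self.balances = {"savings": 1000, "checking": 200}  # assumed values
        self.locks = {"savings": threading.Lock(),
                      "checking": threading.Lock()}

    def transfer(self, amount):
        # T1: hold both locks for the whole transfer.
        with self.locks["savings"], self.locks["checking"]:
            self.balances["savings"] -= amount
            self.balances["checking"] += amount

    def report(self):
        # T2: needs the same locks, so it waits if T1 got them first.
        with self.locks["savings"], self.locks["checking"]:
            return self.balances["checking"] + self.balances["savings"]


def run_concurrently(rounds=200):
    """Race T1 against T2 many times; collect the totals T2 observed."""
    totals = []
    for _ in range(rounds):
        bank = Bank()
        t1 = threading.Thread(target=bank.transfer, args=(100,))
        t2 = threading.Thread(target=lambda: totals.append(bank.report()))
        t1.start()
        t2.start()
        t1.join()
        t2.join()
    return totals
```

Whichever thread wins the race, the report runs entirely before or entirely after the transfer, so every observed total is 1200.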

    A wrinkle ...
    T1 (Bill): Savings -= 100; Checking += 100
    T2 (Bill's wife): Print(Checking); Print(Savings)
    Suppose:
    1. T1 locks Savings
    2. T2 locks Checking
    • Now neither transaction can proceed!
      - called "deadlock"
      - DBMS will abort and restart one of T1 and T2
      - Need undo mechanism that preserves consistency
      - Undo mechanism also necessary if system crashes between Savings -= 100 and Checking += 100 ...
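
A deadlock like this shows up as a cycle in the "who is waiting for whom" relation. The sketch below is one simple way to find such a cycle, assuming each blocked transaction waits for exactly one lock; the data layout (`holds`, `wants`) is invented for illustration, and choosing which transaction to abort is left out.

```python
def find_deadlock(holds, wants):
    """Return the transactions forming a wait-for cycle, or None.

    holds: lock name -> transaction currently holding it
    wants: blocked transaction -> lock name it is waiting for
    """
    for start in wants:
        path, txn = [], start
        # Follow "waits for the holder of the wanted lock" until we
        # reach a transaction that is not blocked or one seen before.
        while txn in wants and txn not in path:
            path.append(txn)
            txn = holds.get(wants[txn])
        if txn in path:
            return path[path.index(txn):]
    return None
```

With `holds = {"Savings": "T1", "Checking": "T2"}` and `wants = {"T1": "Checking", "T2": "Savings"}` the function reports the cycle between T1 and T2; aborting either one breaks it.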

    Ensuring Transaction Properties
    • DBMS ensures:
      - atomicity even if xact aborted (due to deadlock, system crash, ...)
      - durability of committed xacts, even if system crashes.
    • Idea: Keep a log of all actions carried out by the DBMS:
      - Record all DB modifications in log, before they are executed
      - To abort a xact, undo logged actions in reverse order
      - If system crashes, must:
        1) undo partially executed xacts (ensures atomicity)
        2) redo committed xacts (ensures durability)
      - trickier than it sounds
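
The undo/redo idea can be sketched over an in-memory log. Everything here is a simplification made for illustration: the record format is invented, each update logs both the old and the new value, and transactions are assumed not to overwrite each other's uncommitted data. Real recovery protocols handle many more cases.

```python
def recover(disk, log):
    """Rebuild a consistent state from `disk` (possibly partial) and `log`.

    Each log record is one of:
      ("update", xact, key, old_value, new_value)
      ("commit", xact)
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    db = dict(disk)

    # 1) Undo: roll back updates of xacts that never committed,
    #    newest first (atomicity).
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, key, old, _ = rec
            db[key] = old

    # 2) Redo: reapply updates of committed xacts in log order
    #    (durability).
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, key, _, new = rec
            db[key] = new
    return db
```

If the crash happens after the debit reached disk but before the commit, recovery restores the old savings balance; if it happens after the commit but before the credit reached disk, recovery reapplies the credit.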

    Architecture of a DBMS...

    Typical DBMS architecture
      Query Optimization and Execution
      Relational Operators
      Files and Access Methods
      Buffer Management
      Disk Space Management
      DB
    concurrency control, logging & recovery

    Advantages of a DBMS
    • Data independence
    • Efficient data access
    • Data integrity & security
    • Data administration
    • Concurrent access, crash recovery
    • Reduced application development time
    • So why not use them always?
      - Expensive/complicated to set up & maintain
      - This cost & complexity must be offset by need
      - General-purpose, not suited for special-purpose tasks (e.g. text search!)

    Databases make these folks happy ...
    • DBMS vendors, programmers
      - Oracle, IBM, MS ...
    • End users in many fields
      - Business, education, science, ...
    • DB application programmers
      - Build data entry & analysis tools on top of DBMSs
      - Build web services that run off DBMSs
    • Database administrators (DBAs)
      - Design logical/physical schemas
      - Handle security and authorization
      - Data availability, crash recovery
      - Database tuning as needs evolve
    ...must understand how a DBMS works

    Summary
    • DBMS used to maintain, query large datasets.
      - can manipulate data and exploit semantics
    • Other benefits include:
      - recovery from system crashes,
      - concurrent access,
      - quick application development,
      - data integrity and security.
    • Levels of abstraction provide data independence.
    • In this course we will explore:
      1) How to be a sophisticated user of DBMS technology
      2) What goes on inside the DBMS