distributed systems · – a big data processing framework – mapper nodes par@@on data, reducer...

44
Distributed Systems Rik Sarkar University of Edinburgh Spring 2018

Upload: others

Post on 31-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

DistributedSystems

RikSarkar

UniversityofEdinburghSpring2018

Page 2: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

CourseInforma@on•  Instructors:

–  RikSarkar (IF3.45,[email protected])–  HeSun (IF5.03,[email protected])

–  TA:AbhirupGhosh([email protected])

•  Website:hQp://www.inf.ed.ac.uk/teaching/courses/ds•  Lectures:

–  Tuesdays11:10-13:00,DHT,LectureTheatreB–  Withshortbreakinthemiddle.

DistributedSystems,Edinburgh,2014 2

Page 3: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

ExamsandAssignments•  Grading:–  Coursework:1assignment,25%

•  Designadistributedalgorithmforagivenproblem•  Implementtheessen@alideainasimula@on

–  FinalExam:75%

•  Coursework– Markedon:

•  Descrip@onandanalysisofsolu@ondesign(theore@cal)•  Implementa@on(Prac@cal,inJava)

–  Requirements:•  Java,generalunderstandingofworkinginUNIX/LINUXenvironment

•  Basicunderstandingofalgorithmdesign,datastructures,probability

DistributedSystems,Edinburgh,2014 3

Page 4: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Coursebackground&expecta@ons•  Understandingof–  Algorithmdesign–  Datastructures–  Graphtheory(DFS,BFS,MST,cycles,paths,shortestpaths)andrelevantalgorithms

–  Asympto@cnota@onsandcomplexity•  BigO,Ω,Θ

–  Basicprobability,expecta@on–  (Op@onally)Networksandcommunica@on

•  Thisyearcoursewillbemoretheore@calandanaly@cthanpreviousyears–  Emphasisoncorrectnessandrigor

DistributedSystems,Edinburgh,2014 4

Page 5: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Reading&Books•  Norequiredtextbook

•  Suggestedreferences:–  [CDK]Coulouris,Dollimore,Kindberg;DistributedSystems:ConceptsandDesign•  4thEdi@on:hQp://www.cdk4.net/wo•  5thEdi@on:hQp://www.cdk4.net/wo

–  [VG]VijayGarg;ElementsofDistributedCompu@ng–  [NL]NancyLynch;DistributedAlgorithms

•  Wewillusetheseabbrevia@onstosuggestsuitablebooksfortopics.

•  Mostthingscanfoundwikipedia/web

DistributedSystems,Edinburgh,2014 5

Page 6: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Whatisadistributedsystem?

DistributedSystems,Edinburgh,2014 6

Page 7: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Whatisadistributedsystem?

•  Mul@plecomputersworkingtogetherononetask

•  Computersareconnectedbyanetwork,andexchangeinforma@on

DistributedSystems,Edinburgh,2014 7

Page 8: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Whatisadistributedsystem?

•  Mul@plecomputersworkingtogetherononetask

•  Computersareconnectedbyanetwork,andexchangeinforma@on

DistributedSystems,Edinburgh,2014 8

Page 9: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

NetworksVsDistributedSystems

DistributedSystems,Edinburgh,2014 9

Page 10: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

NetworksVsDistributedSystems

DistributedSystems,Edinburgh,2014

datatransport

rou@ngmediumaccess

Networks:Howtosendmessagesfromonecomputerto

another

Computa@onUsingmanycomputersSendingmessagesto

Each-other

DistributedSystems:howtowriteprogramsthatusethe

networktomakeuseofmul@plecomputers

10

Page 11: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

DistributedSystems:Examples•  Webbrowsing:

client server

•  Inthiscase:–  Clientrequestswhatisneeded–  Servercomputesanddecideswhatistobeshown–  Clientshowsinforma@ontouser

DistributedSystems,Edinburgh,2014 11

Page 12: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

DistributedSystems:Examples

•  Mul:playerGames– Differentplayersaredoingdifferentthings–  Theirac@onsmustbeconsistent

•  Don’tallowonepersontobeatdifferentloca@onsinviewsofdifferentpeople

•  Don’tlettwopeoplestandatthesamespot•  IfXshootsY,theneveryonemustknowthatYisdead

– Madedifficultbythefactthatplayersareondifferentcomputers

–  Some@mesnetworkmaybeslow–  Some@mesmessagescanbelost

DistributedSystems,Edinburgh,2014 12

Page 13: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

DistributedSystems:Examples•  Stockmarkets:Mul:player

gameswithHighstakes!•  Everyonewantsinforma@on

quicklyandtobuy/sellwithoutdelay

•  Updatesmustbesenttomanyclientsfast

•  Transac@onsmustbeexecutedinrightorder

•  Specializednetworksworthmillionsareinstalledtoreducelatency

DistributedSystems,Edinburgh,2014 13

Page 14: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

DistributedSystems:Examples•  Hadoop– Abigdataprocessingframework– Mappernodespar@@ondata,reducernodesprocessdatabypar@@ons

– Userdecidespar@@oning,andprocessingofeachpar@@on

– Hadoophandlestasksofmovingdatafromnodetonode

– Hadoop/mapreduceisaspecificsetupfordistributedprocessingofdata

DistributedSystems,Edinburgh,2014 14

Page 15: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

DistributedSystems:Examples•  Networks:workdistributedly–  DNS:whatistheIPaddressofwww.google.com?

•  SearchlocalDNSserver(whichmaynotknoweverything)•  Itcontactshigherlevel(non-local)DNSservers•  IPaddressisreturnedtouser

–  Rou@ng:SendmessagetoIPaddressX•  SearchandfindapathtoX•  Noonenodeknowstheen@renetwork

– Mediumaccess:manynodesusingthesameaccesspointneedtocoordinatetheirtransmissions•  Whentwopeoplespeakatthesame@me,communica@ongetsgarbled

•  Onenodedoesnotknowtheinten@onsofothers•  Coordina@onisneededwithincompleteinforma@on

DistributedSystems,Edinburgh,2014 15

Page 16: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

DistributedSystems:Examples

•  Mainissueinnetworking:onenodedoesnothavecomplete(global)knowledgeoftherestofthenetwork– Needdistributedsolu@ons–networkprotocols– Nodesworkwithlocalinforma@on

DistributedSystems,Edinburgh,2014 16

Page 17: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

DistributedSystems:Examples•  MobileandSensorSystems

–  Mobilephonesandsmartsensorsarecomputers

–  Opportunitytoprocessdataatsensorsinsteadofservers

–  Distributednetworkedopera@on–  Inaddi@on,nodesarelowpowered,

baQeryoperated–  Nodesmaymove

•  Ubiquitouscompu:ng&Internetofthings–  Embeddedcomputersare

everywhereintheenvironment–  Wecanusethemtoprocessdata

availabletothemthroughsensors,ac@onsofusers,etc.

–  Networkinganddistributedcompu@ngeverywhereintheenvironment

DistributedSystems,Edinburgh,2014 17

Page 18: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

DistributedSystems:Examples•  Autonomousvehicles

–  Computeroperatedvehicles,willusesensorstomaptheenvironmentandnavigate

–  Sensorsinthecar,intheenvironment,othercars

–  Needtocommunicateandanalyzedatatomakequickdecisions

–  Manysensorsandlotsofdata

–  Strictconsistencyrules–twocarscannotbeatthesamespotatthesame@me!

–  Needveryfastinforma@onprocessing

–  NodesaremobileDistributedSystems,Edinburgh,2014 18

Page 19: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

ChallengesinDistributedCompu@ng

•  Fundamentalissue:Differentnodeshavedifferentknowledge.Onenodedoesknowthestatusofothernodesinthenetwork

•  Ifeachnodeknewexactlythestatusatallothernodesinthenetwork,compu@ngwouldbeeasy.

•  Butthisisimpossible,theore@callyandprac@cally

DistributedSystems,Edinburgh,2014 19

Page 20: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Theore@calissue:Knowledgecannotbeperfectlyuptodate

–  Informa@ontransmissionisboundedbyspeedoflight(plushardwareandsovwarelimita@onsofthenodes&network)

–  Newthingscanhappenwhileinforma@onistravelingfromnodeAtonodeB

–  BcanneverbeperfectlyuptodateaboutthestatusofA

DistributedSystems,Edinburgh,2014 20

A B

e1

Blearnsaboute1

Time

C

e2e3

Page 21: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Prac@calchallengesindistributedsystems

•  Communica@oniscostly:Itisnotprac@caltotransmiteverythingfromAtoBallthe@me

•  Therearemanynodes:Transmiwngupdatestoallnodesandreceivingupdatesfromallnodesareevenmoreimprac@cal

DistributedSystems,Edinburgh,2014 21

Page 22: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

•  Thecri@calques@onindistributedsystems:

•  Whatmessage/informa@ontosendtowhichnodes,andwhen?

DistributedSystems,Edinburgh,2014 22

Page 23: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Example1

•  Asimpledistributedcomputa@on:– Eachnodehasstoredanumericvalue– Computethetotalofthesenumbers

Server

Howmanymessagesdoesittake?

DistributedSystems,Edinburgh,2014 23

Page 24: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Example1

•  Asimpledistributedcomputa@on:– Eachnodehasstoredanumericvalue– Computethetotalofthesenumbers

Server4

DistributedSystems,Edinburgh,2014 24

Page 25: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Example2

•  Asimpledistributedcomputa@on:– Eachnodehasstoredanumericvalue– Computethetotalofthenumbers

Server

Howmanymessagesdoesittake?

DistributedSystems,Edinburgh,2014 25

Page 26: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Example2

•  Asimpledistributedcomputa@on:– Eachnodehasstoredanumericvalue– Computethetotalofthenumbers

Server

Total:101 2 3 4Numberofmessages:

DistributedSystems,Edinburgh,2014 26

Page 27: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

•  ComplexitymaydependontheNetwork

DistributedSystems,Edinburgh,2014 27

Page 28: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Example2

•  Asimpledistributedcomputa@on:– Eachnodehasstoredanumericvalue– Computethetotalofthenumbers

Server

CanyoufindabeQer,moreefficientway?

a b c d

DistributedSystems,Edinburgh,2014 28

Page 29: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Example2

•  Asimpledistributedcomputa@on:– Eachnodehasstoredanumericvalue– Computethetotalofthenumbers

Server a b c d

v(d)V(c)+v(d)

v(b)+v(c)+v(d)

v(a)+v(b)+v(c)+v(d)

Cost:4messages

DistributedSystems,Edinburgh,2014 29

Page 30: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Prac@calchallengesindistributedsystems

•  Timecannotbemeasuredperfectly– Clocksalwaysmoveslightlyfaster/slower;speedschange

– Hardtocomparebefore/averrela@onsbetweeneventsatdifferentnodes

– Makesitdifficulttokeepcausalrela@onscorrect– E.g.Inamul@-playergame,twoplayersfiredtheirguns.Whoshotfirst?

DistributedSystems,Edinburgh,2014 30

Page 31: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Prac@calchallengesindistributedsystems

•  Failures–  Somenodesmayfail–  Somecommunica@onlinksmayfail,messagesgetlost

– Weneedsystemsresilienttofailures–itshouldcon@nuetoworkevenifsomenodes/linksfail,oratleastrecoverfromfailures

–  E.g.Innetworkrou@ng,ifsomenodesfail,therou@ngprotocolsfindnewpathstothedes@na@on

DistributedSystems,Edinburgh,2014 31

Page 32: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Prac@calchallengesindistributedsystems

•  Mobility– Somenodesmaybemobile– Noteasytofindandcommunicatewithmovingnodes

– Communica@onproper@es,delays,messagelossratesetcchangewithchangingloca@ons

– Loca@onsofnodesareimportant,determinetheirneedsandpreferences

DistributedSystems,Edinburgh,2014 32

Page 33: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Prac@calchallengesindistributedsystems

•  Scalabilitywithsize(numberofnodes)–  Systemsmayneedtogrowinnumberofnodeswhenithastohandlemoredataorusers

–  Thedesignshouldeasilyadapttothisgrowthandnotgetstucktryingtohandlelargeamountsofdataormanynodes

–  E.g.Inamul@playergamewithmanyplayers,ifallac@onsofeachplayerineverysecondissenttoallotherplayers,thiswillgenerateO(n2)messageseverysecond.

–  Op@ons:•  MakeefficientsystemsthatcanhandleO(n^2)messagespersecond(moreandmoredifficultwithgrowingn)

•  Or,makecleverchoicesofwhichmessagestosendtowhichplayers,andkeepitmanageable

DistributedSystems,Edinburgh,2014 33

Page 34: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Prac@calchallengesindistributedsystems

•  Transparency– Usershouldnothavetoworryaboutdetails•  Howmanynodes•  Howtheyareconnected•  Loca@ons,addresses•  mobility•  Failures•  concurrency•  Networkprotocols

DistributedSystems,Edinburgh,2014 34

Page 35: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Prac@calchallengesindistributedsystems

•  Security– Confiden@ality–onlyauthorizeduserscanaccess–  Integrity–shouldnotgetaltered/corruptedorgetintoanundesirablestate

– Availability–shouldnotgetdisruptedbyenemies(e.g.byadenialofserviceaQack)

– Perfectsecurityisimpossible.Goodprac@calsecurityisusuallypossible,buttakessomecareandeffort.Encryp@onhelps.

DistributedSystems,Edinburgh,2014 35

Page 36: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Distributedcomputa@ons:Examples

•  Agreement– Getnodestoagreeonthevalueofsomething• Whenshouldwegotothemovie?• Whatshouldbethemul@playerstrategy?• Whenshouldweselltheshares?•  …

DistributedSystems,Edinburgh,2014 36

Page 37: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Distributedcomputa@ons:Examples

•  Leaderelec@on– Whichnodeisthecoordinatorinhadoop?– Whichnodeisthewhichreturnsthefinalresult?

DistributedSystems,Edinburgh,2014 37

Page 38: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Distributedcomputa@ons:Examples

•  DecidingmaQersof@me:– Whathappenedfirst?AorB?– Whatsequenceofeventsdefinitelyhappenedandwhatcannothavehappened?

DistributedSystems,Edinburgh,2014 38

Page 39: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Distributedcomputa@ons:Examples

•  Storeandretrievedata– Peertopeersystems– Sensornetworks

DistributedSystems,Edinburgh,2014 39

Page 40: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Distributedcomputa@ons:Examples

•  Aggrega@on:gewngdatafrommanynodes– Whatistheaveragetemperaturerecordedbythemobilephones?

– Howmanypeopleareinthebuilding?– Whatisthemaximumspeedofcarsonthehighway?

DistributedSystems,Edinburgh,2014 40

Page 41: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Summary:DistributedSystems•  Mul@plecomputersopera@ngbysendingmessagestoeachotheroveranetwork

•  Integraltomanyemergingtrendsincompu@ng•  Reasonsfordistributedsystems:–  Tasksgetdonefaster–  Canbemademoreresilient:Ifonecomputerfails,anothertakesover

–  Loadbalancingandresourcesharing–  Some@mes,systemsareinherentlydistributed.E.g.peoplefromdifferentloca@onscollabora@ngontasks,playinggames,etc.

–  Bringsoutmanynaturalques@onsabouthownaturalworld,ecosystems,economies,emergentbehaviorswork•  Eg.Birdsflocking,firefliesblinkinginsync,peoplewalkingwithoutcolliding,economicgametheoryandequilibria…

DistributedSystems,Edinburgh,2014 41

Page 42: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Summary:DistributedSystems

•  Examples:– Webbrowsing– Mul@playergames– Digital(Stock)markets–  Collabora@veedi@ng(Wikipedia,reddit,slashdot..)–  Bigdataprocessing(hadoopetc)– Networks– Mobileandsensorsystems– Ubiquitouscompu@ng– Autonomousvehicles

DistributedSystems,Edinburgh,2014 42Ref:CDK

Page 43: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

ChallengesinDistributedsystemdesign

•  Lackofglobalknowledge•  Noperfect(shared)clock•  Communica@oniscostlyinlargevolumes•  Failuresofnodes/links,lossofmessages•  Scalability•  Transparency•  Security•  Mobility

DistributedSystems,Edinburgh,2014 43

Page 44: Distributed Systems · – A big data processing framework – Mapper nodes par@@on data, reducer nodes process data by par@@ons – User decides par@@oning, and processing of each

Thecourse

•  Novideorecording•  Bringpaperandpen•  Takenotes–  Iden@fywhatisimportant

DistributedSystems,Edinburgh,2014 44