CS61C: Great Ideas in Computer Architecture (Machine Structures)
(cs61c/sp16/lec/30)
TRANSCRIPT
MapReduce, Spark, and HDFS
Instructors: Nicholas Weaver & Vladimir Stojanovic
http://inst.eecs.berkeley.edu/~cs61c/
New-School Machine Structures (It's a bit more complicated!)
• Parallel Requests: assigned to computer, e.g., search "cats"
• Parallel Threads: assigned to core, e.g., lookup, ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item @ one time, e.g., deep learning for image classification
• Hardware descriptions: all gates @ one time
• Programming Languages
[Slide diagram: hardware/software levels from the warehouse scale computer down through smartphone, computer, core, cache memory, instruction unit(s) & functional unit(s), to logic gates; "harness parallelism & achieve high performance"; SIMD lanes compute A0+B0 … A3+B3 in parallel.]
Data-Level Parallelism (DLP)
• SIMD – supports data-level parallelism in a single machine
  – Additional instructions & hardware
  – e.g., matrix multiplication in memory
• DLP on WSC
  – Supports data-level parallelism across multiple machines
  – MapReduce & scalable file systems
  – e.g., training CNNs with images across multiple disks
What is MapReduce?
• Simple data-parallel programming model and implementation for processing large datasets
• Users specify the computation in terms of
  – a map function, and
  – a reduce function
• Underlying runtime system
  – Automatically parallelizes the computation across large-scale clusters of machines
  – Handles machine failures
  – Schedules inter-machine communication to make efficient use of the network

Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," 6th USENIX Symposium on Operating Systems Design and Implementation, 2004. (Optional reading, linked on the course homepage – a digestible CS paper at the 61C level.)
What is MapReduce used for?
• At Google:
  – Index construction for Google Search
  – Article clustering for Google News
  – Statistical machine translation
  – Computing multi-layer street maps
• At Yahoo!:
  – "Web map" powering Yahoo! Search
  – Spam detection for Yahoo! Mail
• At Facebook:
  – Data mining
  – Ad optimization
  – Spam detection
Inspiration: Map & Reduce Functions, e.g., Python

Calculate: ∑_{n=1}^{4} n²

    from functools import reduce   # in Python 3, reduce lives in functools

    A = [1, 2, 3, 4]

    def square(x):
        return x * x

    def sum(x, y):
        return x + y

    reduce(sum, map(square, A))

    1   2   3   4      input A
    1   4   9  16      after map(square, A)
      5      25        pairwise sums
         30            final result

(As drawn on the slide, the reduction is a pairwise tree; Python's reduce actually folds left to right, giving intermediates 5, 14, 30, but the result is the same.)
MapReduce Programming Model
• Map: (in_key, in_value) → list(interm_key, interm_val)

    map(in_key, in_val):
        // DO WORK HERE
        emit(interm_key, interm_val)

  – Slice data into "shards" or "splits" and distribute to workers
  – Computes set of intermediate key/value pairs
• Reduce: (interm_key, list(interm_value)) → list(out_value)

    reduce(interm_key, list(interm_val)):
        // DO WORK HERE
        emit(out_key, out_val)

  – Combines all intermediate values for a particular key
  – Produces a set of merged output values (usually just one)
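The programming model above can be sketched as a single-process Python simulation. The names `run_mapreduce`, `mapper`, and `reducer` are hypothetical, chosen here for illustration; a real runtime distributes this work across a cluster.

```python
# A minimal single-process sketch of the MapReduce programming model.
# run_mapreduce / mapper / reducer are illustrative names, not an API.
from collections import defaultdict

def run_mapreduce(inputs, mapper, reducer):
    # Map phase: each (in_key, in_value) yields intermediate pairs.
    intermediate = defaultdict(list)
    for in_key, in_value in inputs:
        for k, v in mapper(in_key, in_value):
            intermediate[k].append(v)        # "shuffle": group values by key
    # Reduce phase: one call per unique intermediate key.
    return {k: reducer(k, values) for k, values in intermediate.items()}

# Toy use: for each first letter, the longest word starting with it.
def mapper(doc_name, text):
    for word in text.split():
        yield (word[0], len(word))

def reducer(letter, lengths):
    return max(lengths)

docs = [("d1", "apple ant bee"), ("d2", "axe bear")]
print(run_mapreduce(docs, mapper, reducer))  # {'a': 5, 'b': 4}
```

The shuffle step (grouping all values by intermediate key before any reduce call) is exactly what the runtime system does between the map and reduce phases.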
MapReduce WordCount Example

Task of counting the number of occurrences of each word in a large collection of documents.

User-written Map function reads the document data and parses out the words. For each word, it writes the (key, value) pair of (word, 1). That is, the word is treated as the intermediate key and the associated value of 1 means that we saw the word once.

Map phase: (doc name, doc contents) → list(word, count)
// "I do I learn" → [("I",1), ("do",1), ("I",1), ("learn",1)]

    map(key, value):
        for each word w in value:
            emit(w, 1)

The intermediate data is then sorted by MapReduce by keys and the user's Reduce function is called for each unique key. In this case, Reduce is called with a list of a "1" for each occurrence of the word that was parsed from the document. The function adds them up to generate a total word count for that word.

Reduce phase: (word, list(counts)) → (word, count_sum)
// ("I", [1,1]) → ("I", 2)

    reduce(key, values):
        result = 0
        for each v in values:
            result += v
        emit(key, result)
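The two WordCount phases above can be run on a single machine as a plain-Python sketch; `emit` is modeled by collecting pairs into a list, and the sort/group step by a dictionary. This mirrors the pseudocode only and is not the Hadoop API.

```python
# Single-machine sketch of the WordCount map and reduce phases.
from collections import defaultdict

def map_phase(doc_contents):
    pairs = []
    for word in doc_contents.split():
        pairs.append((word, 1))              # emit(w, 1)
    return pairs

def reduce_phase(pairs):
    grouped = defaultdict(list)              # sort/group by intermediate key
    for word, count in pairs:
        grouped[word].append(count)
    # sum the list of 1s for each unique word
    return {word: sum(counts) for word, counts in grouped.items()}

pairs = map_phase("I do I learn")
# [('I', 1), ('do', 1), ('I', 1), ('learn', 1)]
print(reduce_phase(pairs))
# {'I': 2, 'do': 1, 'learn': 1}
```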
MapReduce Implementation
[Execution diagram shown on the following slides, annotated step by step.]

MapReduce Execution
(1) Split inputs, start up programs on a cluster of machines
(2) Assign map & reduce tasks to idle workers
(3) Perform a map task, generate intermediate key/value pairs
(4) Write to the buffers
(5) Read intermediate key/value pairs, sort them by key
(6) Perform a reduce task for each intermediate key, write the result to the output files
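One detail of steps (3)–(6) worth making concrete: intermediate pairs are divided among the R reduce tasks, typically by hashing the key (as in the Dean & Ghemawat paper). The sketch below is illustrative only; `partition` is a hypothetical helper, not part of any real framework.

```python
# Sketch of how a map task's intermediate pairs are partitioned across
# R reduce tasks, typically via hash(key) mod R. Illustrative only.
from collections import defaultdict

R = 2  # number of reduce tasks

def partition(pairs, r):
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[hash(key) % r].append((key, value))
    return buckets

pairs = [("I", 1), ("do", 1), ("I", 1), ("learn", 1)]
buckets = partition(pairs, R)
# All pairs with the same key hash to the same bucket, so a single
# reduce task is guaranteed to see every count for a given word.
```

This is why the reduce function can safely assume it receives *all* intermediate values for its key: the partition function routes every occurrence of a key to the same reduce task.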
Big Data Frameworks: Hadoop & Spark
• Apache Hadoop
  – Open-source MapReduce framework
  – Hadoop Distributed File System (HDFS)
  – MapReduce Java APIs
• Apache Spark
  – Fast and general engine for large-scale data processing
  – Originally developed in the AMPLab at UC Berkeley
  – Runs on HDFS
  – Provides Java, Scala, and Python APIs for
    • Databases
    • Machine learning
    • Graph algorithms
WordCount in Hadoop's Java API
[Java source listing shown on slide.]
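The Java listing itself is on the slide; for a feel of the same computation outside Spark, here is a hedged sketch in the style of Hadoop Streaming (a real Hadoop facility that runs scripts as map and reduce tasks over stdin/stdout). This is not the Java API shown on the slide, and the function structure here is for illustration only.

```python
# WordCount as Hadoop-Streaming-style mapper/reducer logic. Streaming
# feeds the mapper raw lines and feeds the reducer lines sorted by key,
# so equal words arrive adjacent. Illustrative sketch, not the Java API.

def mapper(lines):
    # emit "word<TAB>1" for every word seen
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    # input is sorted by key, so we can total a run of equal words
    current, total = None, 0
    for line in sorted_lines:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

# In a real job these would be two stdin/stdout scripts wired up roughly as:
#   hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py ...
out = list(reducer(sorted(mapper(["I do I learn"]))))
print(out)  # ['I\t2', 'do\t1', 'learn\t1']
```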
WordCount in Spark's Python API

    # RDD: primary abstraction of a distributed collection of items
    file = sc.textFile("hdfs://…")
    # Two kinds of operations:
    #   Actions:         RDD → value
    #   Transformations: RDD → RDD (e.g. flatMap, map, reduceByKey)
    file.flatMap(lambda line: line.split()) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b)
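To see what each of the three transformations does to the data, here are plain-Python stand-ins run on a small local list. These are not the PySpark API; a real RDD is distributed across machines and lazily evaluated, and `flat_map`/`reduce_by_key` are hypothetical local helpers.

```python
# Plain-Python stand-ins for the Spark operations used above.
# NOT PySpark: local, eager, single-machine illustration only.

def flat_map(f, xs):
    # one input element may expand to many output elements
    return [y for x in xs for y in f(x)]

def reduce_by_key(f, pairs):
    # merge all values sharing a key with the combining function f
    out = {}
    for k, v in pairs:
        out[k] = f(out[k], v) if k in out else v
    return list(out.items())

lines  = ["I do", "I learn"]
words  = flat_map(lambda line: line.split(), lines)  # ['I','do','I','learn']
pairs  = [(w, 1) for w in words]                     # the map step
counts = reduce_by_key(lambda a, b: a + b, pairs)
print(counts)  # [('I', 2), ('do', 1), ('learn', 1)]
```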
The Hadoop Distributed File System
• The magic of Hadoop is not just in parallelizing the computation…
  – But in building a (mostly) robust, (mostly) distributed file system
• Model is to take a large group of machines and provide a robust, replicated file system
  – Disks and machines can arbitrarily crash, with some key exceptions
    • Done by replication: usually at least 3x
  – Can also localize replicas
    • e.g., one copy in each rack
HDFS Blocks
• Files are broken into fixed-size blocks
  – Commonly 128 MB!
• Small-element latency is awful…
  – It takes the same amount of time to read 1 B as it does 128 MB!
  – But that is a sensible decision:
    • A typical spinning disk already biases towards accessing large blocks: 200+ MB/s bandwidth, 5+ ms latency
    • HDFS needs to store "metadata" in memory
    • HDFS is designed for high-throughput operations
• Any block is replicated across multiple separate DataNodes
  – Usually at least 3x replication
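A back-of-the-envelope check of the latency point, using the slide's disk figures (~5 ms per access, ~200 MB/s sequential bandwidth); `time_to_read` is a hypothetical helper for illustration only.

```python
# Why large blocks make sense: model a disk read as a fixed access
# latency plus sequential transfer time, with the slide's numbers.
LATENCY_S = 5e-3      # ~5 ms per access (assumed, from the slide)
BANDWIDTH = 200e6     # ~200 MB/s sequential (assumed, from the slide)

def time_to_read(nbytes):
    return LATENCY_S + nbytes / BANDWIDTH

t_1b   = time_to_read(1)      # ~0.005 s: essentially pure latency
t_128m = time_to_read(128e6)  # ~0.645 s: latency is a rounding error

# Effective bandwidth: a 1 B read achieves ~200 B/s, while a 128 MB
# block read achieves nearly the full 200 MB/s.
print(1 / t_1b, 128e6 / t_128m)
```

The fixed access cost dominates tiny reads completely, which is exactly why HDFS amortizes it over 128 MB blocks and targets high-throughput rather than low-latency operations.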
HDFS NameNode
• The NameNode tracks the file system metadata
  – For each file, which blocks are on which DataNodes
• The NameNode is both a potential bottleneck and a point of failure
  – Needs lots of memory to keep all the file system metadata in RAM
    • Since that is latency-bound
  – Requests all go to the NameNode
    • Single point of contention
  – If the NameNode fails, the system goes down!
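To get a sense of why the NameNode "needs lots of memory", here is a rough sizing sketch. The ~150 bytes of NameNode RAM per file/block object is a commonly cited rule of thumb, not a figure from the slides, and `namenode_ram` is a hypothetical helper.

```python
# Rough NameNode RAM sizing: metadata per logical block, in memory.
# The 150 bytes/object figure is an assumed rule of thumb, not exact.
BYTES_PER_OBJECT = 150      # per block record (assumption)
BLOCK_SIZE = 128 * 2**20    # 128 MB blocks

def namenode_ram(total_data_bytes):
    # Replicas live on DataNodes; the NameNode tracks each logical
    # block once (plus its replica locations), so count logical blocks.
    n_blocks = total_data_bytes // BLOCK_SIZE
    return n_blocks * BYTES_PER_OBJECT

# 10 PB of data -> ~84 million blocks -> on the order of 12 GB of RAM
pb10 = 10 * 2**50
print(namenode_ram(pb10) / 2**30, "GiB")
```

Since every lookup must hit this in-RAM table at low latency, the cluster's total capacity is effectively bounded by one machine's memory, which is the bottleneck the slide describes.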
HDFS's Single Points of Failure…
• The NameNode itself
  – The backup gives fail-over, but that is not the same
  – The NameNode, unlike DataNodes, is often not trivially replaceable
    • Since the NameNode often requires more memory
• Often: the switch
  – Need multiple redundant networks in a rack if you need to survive switch failures
• Often: the power
  – Need systems with multiple power supplies and redundant power if you need to survive power failures
• But it's a tradeoff:
  – HDFS is not supposed to be "high availability", but "inexpensive and big and (mostly) reliable"
    • "5 9s of availability", aka operating 99.999% of the time, allows for just 5 minutes of downtime in a year!
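A quick arithmetic check of the "five 9s" figure quoted above:

```python
# "Five 9s" = 99.999% uptime. How much downtime does that allow per year?
availability = 0.99999
minutes_per_year = 365 * 24 * 60            # 525,600 minutes
downtime = (1 - availability) * minutes_per_year
print(round(downtime, 2), "minutes/year")   # ~5.26
```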
Summary
• Warehouse Scale Computers
  – New class of computers
  – Scalability, energy efficiency, high failure rate
• Request-level parallelism
  – e.g., web search
• Data-level parallelism on a large dataset
  – MapReduce
  – Hadoop, Spark