CS61C: Great Ideas in Computer Architecture (Machine Structures)
(cs61c/sp16/lec/30)
TRANSCRIPT
MapReduce, Spark, and HDFS
Instructors: Nicholas Weaver & Vladimir Stojanovic
http://inst.eecs.berkeley.edu/~cs61c/
New-School Machine Structures (It's a bit more complicated!)
• Parallel Requests: assigned to computer, e.g., search "cats"
• Parallel Threads: assigned to core, e.g., lookup, ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item @ one time, e.g., deep learning for image classification
• Hardware descriptions: all gates @ one time
• Programming Languages
[Slide diagram: hardware/software levels from the warehouse scale computer down through smartphone, computer, core, cache memory, instruction unit(s) & functional unit(s), to logic gates; "harness parallelism & achieve high performance"; SIMD lanes compute A0+B0 … A3+B3 in parallel.]
Data-Level Parallelism (DLP)
• SIMD – supports data-level parallelism in a single machine
  – Additional instructions & hardware
  – e.g., matrix multiplication in memory
• DLP on WSC
  – Supports data-level parallelism across multiple machines
  – MapReduce & scalable file systems
  – e.g., training CNNs with images across multiple disks
What is MapReduce?
• Simple data-parallel programming model and implementation for processing large datasets
• Users specify the computation in terms of
  – a map function, and
  – a reduce function
• Underlying runtime system
  – Automatically parallelizes the computation across large-scale clusters of machines
  – Handles machine failures
  – Schedules inter-machine communication to make efficient use of the network

Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," 6th USENIX Symposium on Operating Systems Design and Implementation, 2004. (Optional reading, linked on the course homepage – a digestible CS paper at the 61C level.)
What is MapReduce used for?
• At Google:
  – Index construction for Google Search
  – Article clustering for Google News
  – Statistical machine translation
  – Computing multi-layer street maps
• At Yahoo!:
  – "Web map" powering Yahoo! Search
  – Spam detection for Yahoo! Mail
• At Facebook:
  – Data mining
  – Ad optimization
  – Spam detection
Inspiration: Map & Reduce Functions, e.g., Python

Calculate: ∑_{n=1}^{4} n²

    from functools import reduce   # in Python 3, reduce lives in functools

    A = [1, 2, 3, 4]

    def square(x):
        return x * x

    def sum(x, y):
        return x + y

    reduce(sum, map(square, A))

    1   2   3   4      input A
    1   4   9  16      after map(square, A)
      5      25        pairwise sums
         30            final result

(As drawn on the slide, the reduction is a pairwise tree; Python's reduce actually folds left to right, giving intermediates 5, 14, 30, but the result is the same.)
MapReduce Programming Model
• Map: (in_key, in_value) → list(interm_key, interm_val)

    map(in_key, in_val):
        // DO WORK HERE
        emit(interm_key, interm_val)

  – Slice data into "shards" or "splits" and distribute to workers
  – Computes set of intermediate key/value pairs
• Reduce: (interm_key, list(interm_value)) → list(out_value)

    reduce(interm_key, list(interm_val)):
        // DO WORK HERE
        emit(out_key, out_val)

  – Combines all intermediate values for a particular key
  – Produces a set of merged output values (usually just one)
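The programming model above can be sketched as a single-process Python simulation. The names `run_mapreduce`, `mapper`, and `reducer` are hypothetical, chosen here for illustration; a real runtime distributes this work across a cluster.

```python
# A minimal single-process sketch of the MapReduce programming model.
# run_mapreduce / mapper / reducer are illustrative names, not an API.
from collections import defaultdict

def run_mapreduce(inputs, mapper, reducer):
    # Map phase: each (in_key, in_value) yields intermediate pairs.
    intermediate = defaultdict(list)
    for in_key, in_value in inputs:
        for k, v in mapper(in_key, in_value):
            intermediate[k].append(v)        # "shuffle": group values by key
    # Reduce phase: one call per unique intermediate key.
    return {k: reducer(k, values) for k, values in intermediate.items()}

# Toy use: for each first letter, the longest word starting with it.
def mapper(doc_name, text):
    for word in text.split():
        yield (word[0], len(word))

def reducer(letter, lengths):
    return max(lengths)

docs = [("d1", "apple ant bee"), ("d2", "axe bear")]
print(run_mapreduce(docs, mapper, reducer))  # {'a': 5, 'b': 4}
```

The shuffle step (grouping all values by intermediate key before any reduce call) is exactly what the runtime system does between the map and reduce phases.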
MapReduce WordCount Example

Task of counting the number of occurrences of each word in a large collection of documents.

User-written Map function reads the document data and parses out the words. For each word, it writes the (key, value) pair of (word, 1). That is, the word is treated as the intermediate key and the associated value of 1 means that we saw the word once.

Map phase: (doc name, doc contents) → list(word, count)
// "I do I learn" → [("I",1), ("do",1), ("I",1), ("learn",1)]

    map(key, value):
        for each word w in value:
            emit(w, 1)

The intermediate data is then sorted by MapReduce by keys and the user's Reduce function is called for each unique key. In this case, Reduce is called with a list of a "1" for each occurrence of the word that was parsed from the document. The function adds them up to generate a total word count for that word.

Reduce phase: (word, list(counts)) → (word, count_sum)
// ("I", [1,1]) → ("I", 2)

    reduce(key, values):
        result = 0
        for each v in values:
            result += v
        emit(key, result)
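The two WordCount phases above can be run on a single machine as a plain-Python sketch; `emit` is modeled by collecting pairs into a list, and the sort/group step by a dictionary. This mirrors the pseudocode only and is not the Hadoop API.

```python
# Single-machine sketch of the WordCount map and reduce phases.
from collections import defaultdict

def map_phase(doc_contents):
    pairs = []
    for word in doc_contents.split():
        pairs.append((word, 1))              # emit(w, 1)
    return pairs

def reduce_phase(pairs):
    grouped = defaultdict(list)              # sort/group by intermediate key
    for word, count in pairs:
        grouped[word].append(count)
    # sum the list of 1s for each unique word
    return {word: sum(counts) for word, counts in grouped.items()}

pairs = map_phase("I do I learn")
# [('I', 1), ('do', 1), ('I', 1), ('learn', 1)]
print(reduce_phase(pairs))
# {'I': 2, 'do': 1, 'learn': 1}
```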
MapReduce Implementation
[Execution diagram shown on the following slides, annotated step by step.]

MapReduce Execution
(1) Split inputs, start up programs on a cluster of machines
(2) Assign map & reduce tasks to idle workers
(3) Perform a map task, generate intermediate key/value pairs
(4) Write to the buffers
(5) Read intermediate key/value pairs, sort them by key
(6) Perform a reduce task for each intermediate key, write the result to the output files
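One detail of steps (3)–(6) worth making concrete: intermediate pairs are divided among the R reduce tasks, typically by hashing the key (as in the Dean & Ghemawat paper). The sketch below is illustrative only; `partition` is a hypothetical helper, not part of any real framework.

```python
# Sketch of how a map task's intermediate pairs are partitioned across
# R reduce tasks, typically via hash(key) mod R. Illustrative only.
from collections import defaultdict

R = 2  # number of reduce tasks

def partition(pairs, r):
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[hash(key) % r].append((key, value))
    return buckets

pairs = [("I", 1), ("do", 1), ("I", 1), ("learn", 1)]
buckets = partition(pairs, R)
# All pairs with the same key hash to the same bucket, so a single
# reduce task is guaranteed to see every count for a given word.
```

This is why the reduce function can safely assume it receives *all* intermediate values for its key: the partition function routes every occurrence of a key to the same reduce task.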
Big Data Frameworks: Hadoop & Spark
• Apache Hadoop
  – Open-source MapReduce framework
  – Hadoop Distributed File System (HDFS)
  – MapReduce Java APIs
• Apache Spark
  – Fast and general engine for large-scale data processing
  – Originally developed in the AMPLab at UC Berkeley
  – Runs on HDFS
  – Provides Java, Scala, and Python APIs for
    • Databases
    • Machine learning
    • Graph algorithms
WordCount in Hadoop's Java API
[Java source listing shown on slide.]
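The Java listing itself is on the slide; for a feel of the same computation outside Spark, here is a hedged sketch in the style of Hadoop Streaming (a real Hadoop facility that runs scripts as map and reduce tasks over stdin/stdout). This is not the Java API shown on the slide, and the function structure here is for illustration only.

```python
# WordCount as Hadoop-Streaming-style mapper/reducer logic. Streaming
# feeds the mapper raw lines and feeds the reducer lines sorted by key,
# so equal words arrive adjacent. Illustrative sketch, not the Java API.

def mapper(lines):
    # emit "word<TAB>1" for every word seen
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    # input is sorted by key, so we can total a run of equal words
    current, total = None, 0
    for line in sorted_lines:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

# In a real job these would be two stdin/stdout scripts wired up roughly as:
#   hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py ...
out = list(reducer(sorted(mapper(["I do I learn"]))))
print(out)  # ['I\t2', 'do\t1', 'learn\t1']
```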
WordCount in Spark's Python API

    # RDD: primary abstraction of a distributed collection of items
    file = sc.textFile("hdfs://…")
    # Two kinds of operations:
    #   Actions:         RDD → value
    #   Transformations: RDD → RDD (e.g. flatMap, map, reduceByKey)
    file.flatMap(lambda line: line.split()) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b)
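To see what each of the three transformations does to the data, here are plain-Python stand-ins run on a small local list. These are not the PySpark API; a real RDD is distributed across machines and lazily evaluated, and `flat_map`/`reduce_by_key` are hypothetical local helpers.

```python
# Plain-Python stand-ins for the Spark operations used above.
# NOT PySpark: local, eager, single-machine illustration only.

def flat_map(f, xs):
    # one input element may expand to many output elements
    return [y for x in xs for y in f(x)]

def reduce_by_key(f, pairs):
    # merge all values sharing a key with the combining function f
    out = {}
    for k, v in pairs:
        out[k] = f(out[k], v) if k in out else v
    return list(out.items())

lines  = ["I do", "I learn"]
words  = flat_map(lambda line: line.split(), lines)  # ['I','do','I','learn']
pairs  = [(w, 1) for w in words]                     # the map step
counts = reduce_by_key(lambda a, b: a + b, pairs)
print(counts)  # [('I', 2), ('do', 1), ('learn', 1)]
```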
The Hadoop Distributed File System
• The magic of Hadoop is not just in parallelizing the computation…
  – But in building a (mostly) robust, (mostly) distributed file system
• Model is to take a large group of machines and provide a robust, replicated file system
  – Disks and machines can arbitrarily crash, with some key exceptions
    • Done by replication: usually at least 3x
  – Can also localize replicas
    • e.g., one copy in each rack
HDFS Blocks
• Files are broken into fixed-size blocks
  – Commonly 128 MB!
• Small-element latency is awful…
  – It takes the same amount of time to read 1 B as it does 128 MB!
  – But that is a sensible decision:
    • A typical spinning disk already biases towards accessing large blocks: 200+ MB/s bandwidth, 5+ ms latency
    • HDFS needs to store "metadata" in memory
    • HDFS is designed for high-throughput operations
• Any block is replicated across multiple separate DataNodes
  – Usually at least 3x replication
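A back-of-the-envelope check of the latency point, using the slide's disk figures (~5 ms per access, ~200 MB/s sequential bandwidth); `time_to_read` is a hypothetical helper for illustration only.

```python
# Why large blocks make sense: model a disk read as a fixed access
# latency plus sequential transfer time, with the slide's numbers.
LATENCY_S = 5e-3      # ~5 ms per access (assumed, from the slide)
BANDWIDTH = 200e6     # ~200 MB/s sequential (assumed, from the slide)

def time_to_read(nbytes):
    return LATENCY_S + nbytes / BANDWIDTH

t_1b   = time_to_read(1)      # ~0.005 s: essentially pure latency
t_128m = time_to_read(128e6)  # ~0.645 s: latency is a rounding error

# Effective bandwidth: a 1 B read achieves ~200 B/s, while a 128 MB
# block read achieves nearly the full 200 MB/s.
print(1 / t_1b, 128e6 / t_128m)
```

The fixed access cost dominates tiny reads completely, which is exactly why HDFS amortizes it over 128 MB blocks and targets high-throughput rather than low-latency operations.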
HDFS NameNode
• The NameNode tracks the file system metadata
  – For each file, which blocks are on which DataNodes
• The NameNode is both a potential bottleneck and a point of failure
  – Needs lots of memory to keep all the file system metadata in RAM
    • Since that is latency-bound
  – Requests all go to the NameNode
    • Single point of contention
  – If the NameNode fails, the system goes down!
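To get a sense of why the NameNode "needs lots of memory", here is a rough sizing sketch. The ~150 bytes of NameNode RAM per file/block object is a commonly cited rule of thumb, not a figure from the slides, and `namenode_ram` is a hypothetical helper.

```python
# Rough NameNode RAM sizing: metadata per logical block, in memory.
# The 150 bytes/object figure is an assumed rule of thumb, not exact.
BYTES_PER_OBJECT = 150      # per block record (assumption)
BLOCK_SIZE = 128 * 2**20    # 128 MB blocks

def namenode_ram(total_data_bytes):
    # Replicas live on DataNodes; the NameNode tracks each logical
    # block once (plus its replica locations), so count logical blocks.
    n_blocks = total_data_bytes // BLOCK_SIZE
    return n_blocks * BYTES_PER_OBJECT

# 10 PB of data -> ~84 million blocks -> on the order of 12 GB of RAM
pb10 = 10 * 2**50
print(namenode_ram(pb10) / 2**30, "GiB")
```

Since every lookup must hit this in-RAM table at low latency, the cluster's total capacity is effectively bounded by one machine's memory, which is the bottleneck the slide describes.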
HDFS's Single Points of Failure…
• The NameNode itself
  – The backup gives fail-over, but that is not the same
  – The NameNode, unlike DataNodes, is often not trivially replaceable
    • Since the NameNode often requires more memory
• Often: the switch
  – Need multiple redundant networks in a rack if you need to survive switch failures
• Often: the power
  – Need systems with multiple power supplies and redundant power if you need to survive power failures
• But it's a tradeoff:
  – HDFS is not supposed to be "high availability", but "inexpensive and big and (mostly) reliable"
    • "5 9s of availability", aka operating 99.999% of the time, allows for just 5 minutes of downtime in a year!
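A quick arithmetic check of the "five 9s" figure quoted above:

```python
# "Five 9s" = 99.999% uptime. How much downtime does that allow per year?
availability = 0.99999
minutes_per_year = 365 * 24 * 60            # 525,600 minutes
downtime = (1 - availability) * minutes_per_year
print(round(downtime, 2), "minutes/year")   # ~5.26
```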
Summary
• Warehouse Scale Computers
  – New class of computers
  – Scalability, energy efficiency, high failure rate
• Request-level parallelism
  – e.g., web search
• Data-level parallelism on a large dataset
  – MapReduce
  – Hadoop, Spark