02LecSp11MapReduce



    CS 61C: Great Ideas in Computer

    Architecture (Machine Structures)

    Instructors: Randy H. Katz and David A. Patterson

    http://inst.eecs.Berkeley.edu/~cs61c/sp11


    Review CS61C: Learn 6 great ideas in computer architecture to enable high-performance programming via parallelism, not just learn C

    1. Layers of Representation/Interpretation

    2. Moore's Law

    3. Principle of Locality/Memory Hierarchy

    4. Parallelism

    5. Performance Measurement and Improvement

    6. Dependability via Redundancy

    Post-PC Era: parallel processing, from smart phones to WSC

    WSC SW must cope with failures, varying load, and varying HW latency and bandwidth

    WSC HW sensitive to cost, energy efficiency


    New-School Machine Structures (It's a bit more complicated!)

    Parallel Requests: assigned to computer
    e.g., Search "Katz"

    Parallel Threads: assigned to core
    e.g., Lookup, Ads

    Parallel Instructions: >1 instruction @ one time
    e.g., 5 pipelined instructions

    Parallel Data: >1 data item @ one time
    e.g., Add of 4 pairs of words

    Hardware descriptions: all gates @ one time

    [Diagram: hardware/software layers that harness parallelism to achieve high performance, from warehouse-scale computer and smartphone down through computer (cores, main memory, cache, input/output), core (instruction units and functional units computing A3+B3, A2+B2, A1+B1, A0+B0), to logic gates; a "Today's Lecture" label marks the portion covered today.]


    Agenda

    Request and Data Level Parallelism

    Administrivia + 61C in the News + Internship Workshop + The secret to getting good grades at Berkeley

    MapReduce

    MapReduce Examples

    Technology Break

    Costs in Warehouse Scale Computer (if time permits)


    Request-Level Parallelism (RLP)

    Hundreds or thousands of requests per second

    Not your laptop or cell-phone, but popular Internet

    services like Google search

    Such requests are largely independent

    Mostly involve read-only databases

    Little read-write (aka producer-consumer) sharing

    Rarely involve read-write data sharing or synchronization across requests

    Computation easily partitioned within a request

    and across different requests


    Google Query-Serving Architecture


    Anatomy of a Web Search

    Google "Randy H. Katz"

    Direct request to closest Google Warehouse Scale Computer

    Front-end load balancer directs request to one of many arrays (clusters of servers) within WSC

    Within array, select one of many Google Web Servers (GWS) to handle the request and compose the response pages

    GWS communicates with Index Servers to find documents that contain the search words, "Randy", "Katz"; uses location of search as well

    Return document list with associated relevance score


    Anatomy of a Web Search

    In parallel,

    Ad system: books by Katz at Amazon.com

    Images of Randy Katz

    Use docids (document IDs) to access indexed documents

    Compose the page

    Result document extracts (with keyword in context) ordered by relevance score

    Sponsored links (along the top) and advertisements (along the sides)


    Anatomy of a Web Search

    Implementation strategy

    Randomly distribute the entries

    Make many copies of data (aka replicas)

    Load balance requests across replicas

    Redundant copies of indices and documents

    Breaks up hot spots, e.g., Justin Bieber

    Increases opportunities for request-level parallelism

    Makes the system more tolerant of failures


    Data-Level Parallelism (DLP)

    2 kinds

    Lots of data in memory that can be operated on in parallel (e.g., adding together 2 arrays)

    Lots of data on many disks that can be operated on in parallel (e.g., searching for documents)

    March 1 lecture and 3rd project do Data-Level Parallelism (DLP) in memory

    Today's lecture and 1st project do DLP across 1000s of servers and disks using MapReduce
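    To make the first (in-memory) kind concrete, here is a minimal NumPy sketch of operating on many data items at once; the array values are made up for illustration:

      import numpy as np

      a = np.array([1, 2, 3, 4])
      b = np.array([10, 20, 30, 40])
      c = a + b              # one vectorized add: 4 pairs of words added at one time
      print(c)               # [11 22 33 44]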


    Problem Trying To Solve

    How to process large amounts of raw data (crawled documents, request logs, ...) every day to compute derived data (inverted indices, page popularity, ...), when the computation is conceptually simple but the input data is large and distributed across 100s to 1000s of servers, so that it finishes in a reasonable time?

    Challenge: parallelize the computation, distribute the data, and tolerate faults without obscuring the simple computation with complex code to deal with these issues

    Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, Jan 2008.


    MapReduce Solution

    Apply Map function to user-supplied record of key/value pairs

    Compute set of intermediate key/value pairs

    Apply Reduce operation to all values that share the same key in order to combine the derived data properly

    User supplies Map and Reduce operations in a functional model, so the library can parallelize them, using re-execution for fault tolerance


    Data-Parallel Divide and Conquer (MapReduce Processing)

    Map: slice data into shards or splits, distribute these to workers, compute sub-problem solutions
    map(in_key, in_value) -> list(out_key, intermediate_value)
    Processes input key/value pair
    Produces set of intermediate pairs

    Reduce: collect and combine sub-problem solutions
    reduce(out_key, list(intermediate_value)) -> list(out_value)
    Combines all intermediate values for a particular key
    Produces a set of merged output values (usually just one)

    Fun to use: focus on the problem, let the MapReduce library deal with messy details
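    As a concrete sketch of these signatures in Python (my own illustration, not course project code), a distributed-grep-style job: map emits a line whenever it contains a pattern, and reduce just passes the intermediate values through:

      from typing import Iterator, List, Tuple

      PATTERN = "error"   # hypothetical search pattern

      def map_fn(in_key: str, in_value: str) -> Iterator[Tuple[str, str]]:
          # in_key: file name, in_value: file contents
          for line in in_value.splitlines():
              if PATTERN in line:
                  yield (PATTERN, line)        # (out_key, intermediate_value)

      def reduce_fn(out_key: str, values: List[str]) -> Iterator[str]:
          # Identity reduce: emit every matching line for this key.
          for v in values:
              yield v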


    MapReduce Execution

    Fine-granularity tasks: many more map tasks than machines

    2,000 servers => 200,000 Map tasks, 5,000 Reduce tasks

  • 8/13/2019 02LecSp11MapReduce

    16/70

    MapReduce Popularity at Google

                                     Aug-04    Mar-06     Sep-07     Sep-09
    Number of MapReduce jobs         29,000   171,000  2,217,000  3,467,000
    Average completion time (s)         634       874        395        475
    Server years used                   217     2,002     11,081     25,562
    Input data read (TB)              3,288    52,254    403,152    544,130
    Intermediate data (TB)              758     6,743     34,774     90,120
    Output data written (TB)            193     2,970     14,018     57,520
    Average number of servers/job       157       268        394        488


    Agenda

    Request and Data Level Parallelism

    Administrivia + The secret to getting good

    grades at Berkeley

    MapReduce

    MapReduce Examples

    Technology Break

    Costs in Warehouse Scale Computer (if time permits)


    This Week

    Discussions and labs will be held this week

    Switching Sections: if you find another 61C student

    willing to swap discussion AND lab, talk to your TAs

    Partner (only project 3 and extra credit): OK if partners mix sections but have the same TA

    First homework assignment due this Sunday, January 23rd, by 11:59:59 PM

    There is a reading assignment as well on the course page


    Course Organization

    Grading:
    Participation and Altruism (5%)
    Homework (5%)
    Labs (20%)
    Projects (40%):
    1. Data Parallelism (Map-Reduce on Amazon EC2)
    2. Computer Instruction Set Simulator (C)
    3. Performance Tuning of a Parallel Application / Matrix Multiply using cache blocking, SIMD, MIMD (OpenMP, due with partner)
    4. Computer Processor Design (Logisim)
    Extra Credit: Matrix Multiply Competition, anything goes

    Midterm (10%): 6-9 PM Tuesday March 8

    Final (20%): 11:30-2:30 PM Monday May 9


    EECS Grading Policy

    http://www.eecs.berkeley.edu/Policies/ugrad.grading.shtml

    A typical GPA for courses in the lower division is 2.7. This GPA

    would result, for example, from 17% A's, 50% B's, 20% C's,

    10% D's, and 3% F's. A class whose GPA falls outside the range

    2.5 - 2.9 should be considered atypical.

    Fall 2010: GPA 2.81

    26% A's, 47% B's, 17% C's,

    3% D's, 6% F's

    Job/Intern Interviews: They grill you with technical questions, so it's what you say, not your GPA
    (New 61C gives good stuff to say)

            Fall   Spring
    2010    2.81   2.81
    2009    2.71   2.81
    2008    2.95   2.74
    2007    2.67   2.76


    Late Policy

    Assignments due Sundays at 11:59:59 PM

    Late homeworks not accepted (100% penalty)

    Late projects get a 20% penalty, accepted up to Tuesdays at 11:59:59 PM

    No credit if more than 48 hours late

    No slip days in 61C

    Slip days are used by Dan Garcia and a few faculty to cope with 100s of students who often procrastinate without having to hear the excuses, but they are not widespread in EECS courses

    More assignments come in late when everyone has no-cost options; better to learn now how to cope with real deadlines


    Policy on Assignments and Independent Work

    With the exception of laboratories and assignments that explicitly permit you to work in groups, all homeworks and projects are to be YOUR work and your work ALONE.

    You are encouraged to discuss your assignments with other students, and extra credit will be assigned to students who help others, particularly by answering questions on the Google Group, but we expect that what you hand in is yours.

    It is NOT acceptable to copy solutions from other students.

    It is NOT acceptable to copy (or start) your solutions from the Web.

    We have tools and methods, developed over many years, for detecting this. You WILL be caught, and the penalties WILL be severe.

    At the minimum a ZERO for the assignment, possibly an F in the course, and a letter to your university record documenting the incident of cheating.

    (We caught people last semester!)


    YOUR BRAIN ON COMPUTERS; Hooked on Gadgets, and Paying a Mental Price
    NY Times, June 7, 2010, by Matt Richtel

    SAN FRANCISCO -- When one of the most important e-mail messages of his life landed in his in-box a few years ago, Kord Campbell overlooked it.

    Not just for a day or two, but 12 days. He finally saw it while sifting through old messages: a big company wanted to buy his Internet start-up.

    ''I stood up from my desk and said, 'Oh my God, oh my God, oh my God,' '' Mr. Campbell said. ''It's kind of hard to miss an e-mail like that, but I did.''

    The message had slipped by him amid an electronic flood: two computer screens alive with e-mail, instant messages, online chats, a Web browser and the computer code he was writing.

    While he managed to salvage the $1.3 million deal after apologizing to his suitor, Mr. Campbell continues to struggle with the effects of the deluge of data. Even after he unplugs, he craves the stimulation he gets from his electronic gadgets. He forgets things like dinner plans, and he has trouble focusing on his family. His wife, Brenda, complains, ''It seems like he can no longer be fully in the moment.''

    This is your brain on computers.

    Scientists say juggling e-mail, phone calls and other incoming information can change how people think and behave. They say our ability to focus is being undermined by bursts of information.

    These play to a primitive impulse to respond to immediate opportunities and threats. The stimulation provokes excitement -- a dopamine squirt -- that researchers say can be addictive. In its absence, people feel bored.

    The resulting distractions can have deadly consequences, as when cellphone-wielding drivers and train engineers cause wrecks. And for millions of people like Mr. Campbell, these urges can inflict nicks and cuts on creativity and deep thought, interrupting work and family life.

    While many people say multitasking makes them more productive, research shows otherwise. Heavy multitaskers actually have more trouble focusing and shutting out irrelevant information, scientists say, and they experience more stress.

    And scientists are discovering that even after the multitasking ends, fractured thinking and lack of focus persist. In other words, this is also your brain off computers.


    The Rules

    (and we really mean it!)


    Architecture of a Lecture

    [Chart: attention level over lecture time in minutes (0, 20, 25, 50, 53, 78, 80). Attention is full during content and dips during administrivia (roughly minutes 20-25), the tech break (roughly minutes 50-53), and "And in conclusion" (roughly minutes 78-80).]


    61C in the News

    IEEE Spectrum Top 11 Innovations of the Decade


    The Secret to Getting Good Grades

    Grad student said he finally figured it out (Mike Dahlin, now Professor at UT Texas)

    My question: What is the secret?

    Do the assigned reading the night before, so that you get more value from the lecture

    Fall 61C comment on end-of-semester survey: "I wish I had followed Professor Patterson's advice and did the reading before each lecture."


    MapReduce Processing Example: Count Word Occurrences

    Pseudo code: simple case of assuming just 1 word per document

    map(String input_key, String input_value):
      // input_key: document name
      // input_value: document contents
      for each word w in input_value:
        EmitIntermediate(w, "1"); // Produce count of words

    reduce(String output_key, Iterator intermediate_values):
      // output_key: a word
      // output_values: a list of counts
      int result = 0;
      for each v in intermediate_values:
        result += ParseInt(v); // get integer from key-value
      Emit(AsString(result));
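    A runnable Python sketch of the same word count (an illustration, not the course's EC2 project code); the tiny driver stands in for the MapReduce library, and the two example documents are made up:

      from collections import defaultdict

      def word_count_map(input_key, input_value):
          # input_key: document name, input_value: document contents
          return [(w, "1") for w in input_value.split()]

      def word_count_reduce(output_key, intermediate_values):
          # output_key: a word; intermediate_values: a list of counts (as strings)
          return str(sum(int(v) for v in intermediate_values))

      docs = {"doc1": "that that is", "doc2": "is that it"}
      groups = defaultdict(list)                 # shuffle: group intermediate pairs by key
      for name, contents in docs.items():
          for k, v in word_count_map(name, contents):
              groups[k].append(v)
      print({k: word_count_reduce(k, vs) for k, vs in groups.items()})
      # {'that': '3', 'is': '2', 'it': '1'}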


    MapReduce Processing

    Shuffle phase


    MapReduce Processing

    1. MR first splits the input files into M splits, then starts many copies of the program on servers.


    MapReduce Processing

    2. One copy, the master, is special. The rest are workers. The master picks idle workers and assigns each one 1 of M map tasks or 1 of R reduce tasks.


    MapReduce Processing

    3. A map worker reads its input split. It parses key/value pairs out of the input data and passes each pair to the user-defined map function. (The intermediate key/value pairs produced by the map function are buffered in memory.)


    MapReduce Processing

    4. Periodically, the buffered pairs are written to local disk, partitioned into R regions by the partitioning function.


    MapReduce Processing

    5. When a reduce worker has read all intermediate data for its partition, it sorts the data by the intermediate keys so that all occurrences of the same key are grouped together. (The sorting is needed because typically many different keys map to the same reduce task.)


    MapReduce Processing

    7. When all map tasks and reduce tasks have been completed, the master wakes up the user program. The MapReduce call in the user program returns back to user code.

    Output of MR is in R output files (1 per reduce task, with file names specified by the user); often passed into another MR job.

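    Steps 1-5 and 7 can be condensed into a single-machine Python sketch (illustration only; the real system spreads the tasks across servers, and M, R, and the hash partitioner here are my assumptions):

      from collections import defaultdict

      def run_mapreduce(records, map_fn, reduce_fn, M=4, R=2):
          # 1. Split the input records into M splits.
          splits = [records[i::M] for i in range(M)]
          # 3./4. Each "map task" applies map_fn and partitions its output
          #       into R regions using a partitioning function (hash of the key).
          regions = [defaultdict(list) for _ in range(R)]
          for split in splits:
              for in_key, in_value in split:
                  for out_key, value in map_fn(in_key, in_value):
                      regions[hash(out_key) % R][out_key].append(value)
          # 5. Each "reduce task" sorts/groups by key, then applies reduce_fn.
          # 7. The R per-task outputs ("output files") are returned to user code.
          return [{k: reduce_fn(k, vs) for k, vs in sorted(regions[r].items())}
                  for r in range(R)]

    With the word-count map and reduce functions from the sketch earlier, run_mapreduce(list(docs.items()), word_count_map, word_count_reduce) returns R dictionaries whose union is the word counts.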

    MapReduce Processing Time Line

    Master assigns map + reduce tasks to worker servers

    As soon as a map task finishes, the worker server can be assigned a new map or reduce task

    Data shuffle begins as soon as a given Map finishes

    Reduce task begins as soon as all data shuffles finish

    To tolerate faults, reassign a task if its worker server dies


    Show MapReduce Job Running

    ~41 minutes total

    ~29 minutes for Map tasks & Shuffle tasks

    ~12 minutes for Reduce tasks

    1707 worker servers used

    Map (Green) tasks read 0.8 TB, write 0.5 TB

    Shuffle (Red) tasks read 0.5 TB, write 0.5 TB

    Reduce (Blue) tasks read 0.5 TB, write 0.5 TB


    Another Example: Word Index (How Often Does a Word Appear?)

    Input, distributed across 4 map tasks:  that that is is that that is not is not is that it it is

    Map 1: "that that is"      -> is 1, that 2
    Map 2: "is that that"      -> is 1, that 2
    Map 3: "is not is not"     -> is 2, not 2
    Map 4: "is that it it is"  -> is 2, it 2, that 1

    Shuffle (group intermediate values by key across 2 reduce tasks):
    Reduce 1: is 1,1,2,2; it 2
    Reduce 2: that 2,2,1; not 2

    Reduce:
    Reduce 1 -> is 6; it 2
    Reduce 2 -> not 2; that 5

    Collect: is 6; it 2; not 2; that 5
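    A quick Python check of this worked example (illustration only); the reduce-task split below copies the grouping shown above:

      from collections import Counter

      text = "that that is is that that is not is not is that it it is"
      counts = Counter(text.split())              # overall result of the reduce phase
      print(dict(counts))                         # {'that': 5, 'is': 6, 'not': 2, 'it': 2}

      reduce1 = {k: counts[k] for k in ("is", "it")}      # Reduce 1 handles 'is' and 'it'
      reduce2 = {k: counts[k] for k in ("that", "not")}   # Reduce 2 handles 'that' and 'not'
      print(reduce1, reduce2)                     # {'is': 6, 'it': 2} {'that': 5, 'not': 2}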


    MapReduce Failure Handling

    On worker failure:

    Detect failure via periodic heartbeats

    Re-execute completed and in-progress map tasks

    Re-execute in-progress reduce tasks

    Task completion committed through master

    Master failure:

    Could handle, but don't yet (master failure unlikely)

    Robust: lost 1,600 of 1,800 machines once, but finished fine


    MapReduce Redundant Execution

    Slow workers significantly lengthen completion time

    Other jobs consuming resources on the machine

    Bad disks with soft errors transfer data very slowly

    Weird things: processor caches disabled (!!)

    Solution: near end of phase, spawn backup copies of tasks

    Whichever one finishes first "wins"

    Effect: dramatically shortens job completion time

    3% more resources, large tasks 30% faster
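    A minimal Python sketch of the backup-task idea (purely illustrative; the simulated straggler and thread pool stand in for worker servers): run two copies of the same task and take whichever finishes first.

      import concurrent.futures as cf
      import random, time

      def task(data):
          # Simulate a task that is occasionally a straggler (slow disk, busy machine, ...).
          time.sleep(random.choice([0.1, 2.0]))
          return sum(data)

      def run_with_backup(data):
          pool = cf.ThreadPoolExecutor(max_workers=2)
          futures = [pool.submit(task, data), pool.submit(task, data)]  # primary + backup copy
          done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
          pool.shutdown(wait=False)               # don't wait for the straggler
          return next(iter(done)).result()        # whichever copy finishes first "wins"

      print(run_with_backup([1, 2, 3]))           # 6, returned as soon as the faster copy is done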


    Agenda

    Request and Data Level Parallelism

    Administrivia + The secret to getting good

    grades at Berkeley

    MapReduce

    MapReduce Examples

    Technology Break

    Costs in Warehouse Scale Computer (if time permits)


    Design Goals of a WSC

    Unique to warehouse scale:

    Ample parallelism:
    Batch apps: large number of independent data sets with independent processing. Also known as Data-Level Parallelism

    Scale and its opportunities/problems:
    Relatively small number of these makes design cost expensive and difficult to amortize
    But price breaks are possible from purchases of very large numbers of commodity servers

    Must also prepare for high component failures

    Operational Costs Count: Cost of equipment purchases


    WSC Case Study

    Server Provisioning

    WSC Power Capacity                          8.00 MW
    Power Usage Effectiveness (PUE)             1.45
    IT Equipment Power Share                    0.67  ->  5.36 MW
    Power/Cooling Infrastructure                0.33  ->  2.64 MW
    IT Equipment Measured Peak (W)              145.00
    Assume Average Power @ 0.8 of Peak (W)      116.00
    # of Servers                                46,207 (rounded to 46,000)
    # of Servers per Rack                       40
    # of Racks                                  1,150
    Top of Rack (TOR) Switches                  1,150
    # of TOR Switches per L2 Switch             16
    # of L2 Switches                            72
    # of L2 Switches per L3 Switch              24
    # of L3 Switches                            3

    [Diagram: Internet - L3 switches - L2 switches - top-of-rack switches - racks of servers]
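    A quick back-of-the-envelope check of the server count in Python, using the numbers above:

      wsc_power_w  = 8_000_000            # 8 MW facility
      it_share     = 0.67                 # fraction of power available to IT equipment
      avg_server_w = 145.0 * 0.8          # average draw at 0.8 of the 145 W measured peak

      it_power_w = wsc_power_w * it_share             # 5.36 MW for IT equipment
      print(round(it_power_w / avg_server_w))         # ~46,207 servers, rounded to 46,000
      print(46_000 // 40)                             # 1,150 racks at 40 servers per rack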


    Cost of WSC

    US accounting practice separates purchase price and operational costs

    Capital Expenditure (CAPEX) is the cost to buy equipment (e.g., buy servers)

    Operational Expenditure (OPEX) is the cost to run equipment (e.g., pay for electricity used)

    WSC Case Study


    WSC Case Study

    Capital Expenditure (Capex)

    Facility cost and total IT cost look about the same

    Facility Cost          $88,000,000
    Total Server Cost      $66,700,000
    Total Network Cost     $12,810,000
    Total Cost            $167,510,000

    However, replace servers every 3 years, networking gear every 4 years, and the facility every 10 years


    Cost of WSC

    US accounting practice allows converting Capital Expenditure (CAPEX) into Operational Expenditure (OPEX) by amortizing costs over a time period:

    Servers: 3 years
    Networking gear: 4 years
    Facility: 10 years

    WSC Case Study
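    A straight-line version of that amortization as a Python sketch; these are simple divisions of the CAPEX figures, and the Opex table that follows reports somewhat higher monthly numbers, presumably because it also folds in financing costs (my assumption):

      capex = {                       # purchase price and amortization period (years)
          "servers":  (66_700_000, 3),
          "network":  (12_810_000, 4),
          "facility": (88_000_000, 10),
      }
      for name, (cost, years) in capex.items():
          print(f"{name:8s} ${cost / (years * 12):>12,.0f} per month")
      # servers  $   1,852,778 per month   (table: ~$2.0M)
      # network  $     266,875 per month   (table: ~$295K)
      # facility $     733,333 per month   (table: $625K + $140K)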


    WSC Case Study

    Operational Expense (Opex)

    Monthly cost = amortized capital expense + operational expense:

                              Amortization  Capital       Monthly       % of
                              (years)       Expense       Cost          Monthly
    Server                    3             $66,700,000   $2,000,000     55%
    Network                   4             $12,530,000     $295,000      8%
    Facility                                $88,000,000
      Pwr & Cooling           10            $72,160,000     $625,000     17%
      Other                   10            $15,840,000     $140,000      4%
    Amortized Cost                                         $3,060,000
    Power (8 MW @ $0.07/kWh)                                 $475,000     13%
    People (3)                                                $85,000      2%
    Total Monthly                                          $3,620,000    100%

    Monthly power costs: $475K for electricity, plus $625K + $140K to amortize facility power distribution and cooling

    ~60% of the monthly power-related cost is amortized power distribution and cooling


    How much does a watt cost in a WSC?

    8 MW facility

    Amortized facility cost, including power distribution and cooling, is $625K + $140K = $765K per month

    Monthly power usage = $475K

    Cost per watt-year = ($765K + $475K) * 12 / 8M W = $1.86, or about $2 per year

    To save a watt, if you spend more than $2 a year, you lose money
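    The same arithmetic as a quick Python check, using the figures above:

      amortized_facility = 765_000     # $625K + $140K per month (power distribution & cooling)
      electricity        = 475_000     # monthly power bill
      facility_watts     = 8_000_000   # 8 MW

      print((amortized_facility + electricity) * 12 / facility_watts)   # 1.86 dollars per watt-year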


    Replace Rotating Disks with Flash?

    Flash is non-volatile semiconductor memory
    Costs about $20 / GB, capacity about 10 GB
    Power about 0.01 Watts

    Disk is non-volatile rotating storage
    Costs about $0.1 / GB, capacity about 1000 GB
    Power about 10 Watts

    Should we replace Disk with Flash to save money?

    A (red)    No: CapEx costs are 100:1 of OpEx savings!
    B (orange) Don't have enough information to answer the question
    C (green)  Yes: Return on investment in a single year!


    WSC Case Study

    Operational Expense (Opex)

    $3.8M / 46,000 servers = ~$80 per month per server in revenue to break even

    ~$80 / 720 hours per month = $0.11 per hour

    So how does Amazon EC2 make money???
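    The break-even arithmetic as a quick Python check, using the slide's rounded figures:

      monthly_cost    = 3_800_000      # ~$3.8M total monthly cost for the WSC
      servers         = 46_000
      hours_per_month = 720

      per_server_month = monthly_cost / servers            # ~$83/month (slide rounds to ~$80)
      per_server_hour  = per_server_month / hours_per_month
      print(f"${per_server_month:.0f} per server-month, ${per_server_hour:.2f} per server-hour")
      # $83 per server-month, $0.11 per server-hour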



    January 2011 AWS Instances & Prices

    Closest computer in the WSC example is the Standard Extra Large

    At ~$0.11/hr cost per server, Amazon EC2 can make money, even if used only 50% of the time

    Instance                           Per Hour  Ratio to  Compute  Virtual  Compute    Memory  Disk   Address
                                                 Small     Units    Cores    Unit/Core  (GB)    (GB)
    Standard Small                     $0.085     1.0       1.0     1        1.00        1.7     160   32-bit
    Standard Large                     $0.340     4.0       4.0     2        2.00        7.5     850   64-bit
    Standard Extra Large               $0.680     8.0       8.0     4        2.00       15.0    1690   64-bit
    High-Memory Extra Large            $0.500     5.9       6.5     2        3.25       17.1     420   64-bit
    High-Memory Double Extra Large     $1.000    11.8      13.0     4        3.25       34.2     850   64-bit
    High-Memory Quadruple Extra Large  $2.000    23.5      26.0     8        3.25       68.4    1690   64-bit
    High-CPU Medium                    $0.170     2.0       5.0     2        2.50        1.7     350   32-bit
    High-CPU Extra Large               $0.680     8.0      20.0     8        2.50        7.0    1690   64-bit
    Cluster Quadruple Extra Large      $1.600    18.8      33.5     8        4.20       23.0    1690   64-bit


    Summary

    Request-Level Parallelism
    High request volume, each request largely independent of the others
    Use replication for better request throughput and availability

    MapReduce Data Parallelism
    Divide large data set into pieces for independent parallel processing
    Combine and process intermediate results to obtain final result

    WSC CapEx vs. OpEx
    Servers dominate cost
    Spend more on power distribution and cooling infrastructure than on monthly electricity costs
    Economies of scale mean a WSC can sell computing as a utility