
Lessons Learned While Building Infrastructure Software at Google

Jeff Dean

Tuesday, September 10, 13


“Google” Circa 1997 (google.stanford.edu)


“Corkboards” (1999)


Google Data Center (2000)


Google (new data center 2001)


Google Data Center (3 days later)

Google’s Computational Environment Today

• Many datacenters around the world

Zooming In...


Lots of machines...


Cool...


Low-Level Systems Software Desires

• If you have lots of machines, you want to:
  • Store data persistently
    – with high availability
    – with high read and write bandwidth
  • Run large-scale computations reliably
    – without having to deal with machine failures
• GFS, MapReduce, BigTable, Spanner, ...


Google File System (GFS) Design

• Master manages metadata
• Data transfers are directly between clients and chunkservers
• Files broken into chunks (typically 64 MB)
• Chunks replicated across multiple machines (usually 3)

[Figure: clients, misc. servers, the GFS master with its replicas, and chunkservers 1, 2, ..., N, each holding a subset of the chunks C0, C1, C2, C3, C5]
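
As a rough illustration of the chunking and replication described on this slide, the sketch below splits a file into 64 MB chunks and assigns each chunk to 3 distinct chunkservers. The function names (`split_into_chunks`, `place_replicas`), the chunkserver names, and the round-robin placement policy are assumptions made for illustration; the slide does not say how GFS actually chooses replica locations.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the typical chunk size from the slide
REPLICAS = 3                   # the usual replication factor from the slide


def split_into_chunks(file_size, chunk_size=CHUNK_SIZE):
    """Return (chunk_index, offset, length) tuples covering the whole file."""
    chunks = []
    offset = 0
    index = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        chunks.append((index, offset, length))
        offset += length
        index += 1
    return chunks


def place_replicas(num_chunks, chunkservers, replicas=REPLICAS):
    """Hypothetical round-robin policy: chunk i is stored on `replicas`
    consecutive servers starting at position i, wrapping around."""
    if replicas > len(chunkservers):
        raise ValueError("not enough chunkservers for the replication factor")
    placement = {}
    for i in range(num_chunks):
        placement[i] = [chunkservers[(i + r) % len(chunkservers)]
                        for r in range(replicas)]
    return placement


# In this toy model the master would keep only this metadata; clients
# would then fetch chunk data directly from the listed chunkservers.
chunks = split_into_chunks(200 * 1024 * 1024)  # a 200 MB file
placement = place_replicas(len(chunks), ["cs1", "cs2", "cs3", "cs4"])
```

A 200 MB file yields three full chunks plus one short final chunk, and every chunk ends up on three different servers, so losing a single machine never removes the only copy.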


GFS Motivation and Lessons

• Indexing system clearly needed a large-scale distributed file system
  – wanted to treat whole cluster as single file system
• Developed by subset of same people working on indexing system
• Identified minimal set of features needed
  – e.g. not POSIX compliant
  – actual data was distributed, but kept metadata centralized
• Colossus: follow-on system developed many years later distributed the metadata
• Lesson: Don’t solve everything all at once


MapReduce History

• 2003: Sanjay Ghemawat and I were working on rewriting indexing system:
  – starts with raw page contents on disk
  – many phases:
    • (near) duplicate elimination, anchor text extraction, language identification, index shard generation, etc.
  – end result is data structures for index and doc serving
• Each phase was hand written parallel computation:
  – hand parallelized
  – hand-written checkpointing code for fault-tolerance


MapReduce

• A simple programming model that applies to many large-scale computing problems
  – allowed us to express all phases of our indexing system
  – since used across broad range of computer science areas, plus other scientific fields
  – Hadoop open-source implementation seeing significant usage
• Hide messy details in MapReduce runtime library:
  – automatic parallelization
  – load balancing
  – network and disk transfer optimizations
  – handling of machine failures
  – robustness
  – improvements to core library benefit all users of library!


Typical problem solved by MapReduce

• Read a lot of data
• Map: extract something you care about from each record
• Shuffle and Sort
• Reduce: aggregate, summarize, filter, or transform
• Write the results

Outline stays the same; user writes Map and Reduce functions to fit the problem
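
The outline above can be sketched with the classic word-count example. This is a single-process stand-in, not the MapReduce library itself: `run_mapreduce`, `map_fn`, and `reduce_fn` are names assumed for this illustration, and the real runtime would distribute each step across many machines.

```python
from collections import defaultdict


def map_fn(record):
    """Map: emit (word, 1) for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for the runtime: map, shuffle/sort, reduce."""
    groups = defaultdict(list)
    for record in records:                 # read the input
        for key, value in map_fn(record):  # map each record
            groups[key].append(value)      # shuffle: group values by key
    results = []
    for key in sorted(groups):             # sort the keys
        results.append(reduce_fn(key, groups[key]))  # reduce each group
    return results                         # "write" the results


counts = run_mapreduce(["the quick fox", "the lazy dog", "The end"],
                       map_fn, reduce_fn)
```

Only `map_fn` and `reduce_fn` are specific to word counting; swapping them out changes the problem being solved while `run_mapreduce` stays the same, which is the point the slide makes.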


MapReduce Motivation and Lessons

• Developed by two people that were also doing the indexing system rewrite
  – squinted at various phases with an eye towards coming up with common abstraction
• Initial version developed quickly
  – proved initial API utility with very simple implementation
  – rewrote much of implementation 6 months later to add lots of the performance wrinkles/tricks that appeared in original paper
• Lesson: Very close ties with initial users of system make things happen faster
  – in this case, we were both building MapReduce and using it simultaneously


BigTable: Motivation

• Lots of (semi-)structured data at Google
  – URLs: Contents, crawl metadata, links, anchors, pagerank, …
  – Per-user data: User preferences, recent queries, …
  – Geographic locations: Physical entities, roads, satellite image data, user annotations, …
• Scale is large
• Want to be able to grow and shrink resources devoted to system as needed


BigTable: Basic Data Model

• Distributed multi-dimensional sparse map: (row, column, timestamp) → cell contents
• Rows are ordered lexicographically
• Good match for most of our applications

[Figure: a table with Rows, Columns, and Timestamps axes; the cell at row “www.cnn.com”, column “contents:” holds versions at timestamps t3, t11, and t17 of the contents “<html>…”]
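
To make the (row, column, timestamp) → cell contents mapping concrete, here is a minimal in-memory sketch. The class `SparseTable` and its methods are hypothetical names for this illustration and are not the BigTable API; the cell values are placeholders standing in for the slide's “<html>…”.

```python
class SparseTable:
    """Toy in-memory model of (row, column, timestamp) -> cell contents."""

    def __init__(self):
        # Only cells that were actually written are stored, so the map is sparse.
        self._cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp`
        (or the newest overall when no timestamp is given)."""
        versions = self._cells.get((row, column), {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        if not eligible:
            return None
        return versions[max(eligible)]

    def scan_rows(self):
        """Return the distinct row keys in lexicographic order."""
        return sorted({row for row, _ in self._cells})


table = SparseTable()
# Three versions of one cell, mirroring timestamps t3, t11, t17 on the slide.
table.put("www.cnn.com", "contents:", 3, "<html>v3")
table.put("www.cnn.com", "contents:", 11, "<html>v11")
table.put("www.cnn.com", "contents:", 17, "<html>v17")
table.put("aaa.com", "contents:", 5, "<html>aaa")

latest = table.get("www.cnn.com", "contents:")
as_of_12 = table.get("www.cnn.com", "contents:", timestamp=12)
rows = table.scan_rows()
```

Reading without a timestamp returns the most recent version, while reading "as of" timestamp 12 returns the version written at 11.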


Tablets & Splitting

[Figure: rows sorted by key (“aaa.com”, “cnn.com”, “cnn.com/sports.html”, …, “website.com”, …, “yahoo.com/kids.html”, “yahoo.com/kids.html\0”, …, “zuppa.com/menu.html”) with columns “contents:” (“<html>…”) and “language:” (EN); the sorted row space is divided into Tablets]
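
One way to picture tablets and splitting is as contiguous ranges over the sorted row keys, where a range that grows too large is cut in two. The sketch below is a toy under stated assumptions: the class `TabletMap`, the split-at-midpoint policy, and the tiny `max_rows_per_tablet` threshold are all invented for illustration, and the slides do not specify how BigTable decides when or where to split.

```python
import bisect


class TabletMap:
    """Toy row-key -> tablet mapping; each tablet owns a contiguous,
    sorted range of rows. Assumes max_rows_per_tablet >= 1."""

    def __init__(self, max_rows_per_tablet=2):
        self.max_rows = max_rows_per_tablet
        self.tablets = [[]]  # list of sorted row-key lists, in key order

    def _find(self, row):
        # The first key of every tablet after the first acts as a split point.
        starts = [tablet[0] for tablet in self.tablets[1:]]
        return bisect.bisect_right(starts, row)

    def insert(self, row):
        i = self._find(row)
        tablet = self.tablets[i]
        if row not in tablet:
            bisect.insort(tablet, row)
        if len(tablet) > self.max_rows:
            mid = len(tablet) // 2
            # Split the oversized tablet into two adjacent row ranges.
            self.tablets[i:i + 1] = [tablet[:mid], tablet[mid:]]

    def tablet_for(self, row):
        """Return the index of the tablet whose range contains `row`."""
        return self._find(row)


tm = TabletMap(max_rows_per_tablet=2)
for key in ["cnn.com", "aaa.com", "website.com", "zuppa.com/menu.html",
            "cnn.com/sports.html", "yahoo.com/kids.html"]:
    tm.insert(key)
```

Because rows stay in lexicographic order, finding the tablet for a key is a binary search over the split points, and rows with a common prefix such as “cnn.com” and “cnn.com/sports.html” tend to land in the same tablet.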


BigTable System Structure

[Figure: a Bigtable cell and its client, built up over several slides]

• Bigtable master: performs metadata ops + load balancing
• Bigtable tablet servers: serve data
• Cluster scheduling system: schedules tasks onto machines
• Cluster file system: holds tablet data, logs
• Lock service: holds metadata, handles master-election
• Bigtable client, via the Bigtable client library: Open(), read/write, metadata ops


BigTable Status

• Production use for 100s of projects:
  – Crawling/indexing pipeline, Google Maps/Google Earth/Streetview, Search History, Google Print, Google+, Blogger, ...
• Currently 500+ BigTable clusters
• Largest cluster:
  – 100s PB data; sustained: 30M ops/sec; 100+ GB/s I/O
• Many asynchronous processes updating different pieces of information
  – no distributed transactions, no cross-row joins
  – initial design was just in a single cluster
  – follow-on work added eventual consistency across many geographically distributed BigTable instances


Spanner

• Storage & computation system that runs across many datacenters
  – single global namespace
    • names are independent of location(s) of data
    • fine-grained replication configurations
  – support mix of strong and weak consistency across datacenters
    • Strong consistency implemented with Paxos across tablet replicas
    • Full support for distributed transactions across directories/machines
  – much more automated operation
    • automatically changes replication based on constraints and usage patterns
    • automated allocation of resources across entire fleet of machines


Design Goals for Spanner

• Future scale: ~10^5 to 10^7 machines, ~10^13 directories, ~10^18 bytes of storage, spread at 100s to 1000s of locations around the world
  – zones of semi-autonomous control
  – consistency after disconnected operation
  – users specify high-level desires:
    “99%ile latency for accessing this data should be <50ms”
    “Store this data on at least 2 disks in EU, 2 in U.S. & 1 in Asia”
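
A high-level placement desire such as "at least 2 disks in EU, 2 in U.S. & 1 in Asia" can be thought of as a set of per-region minimums that the system checks replicas against. The sketch below is only one possible encoding: the dictionary format, the region labels, and the functions `satisfies_placement` and `missing_replicas` are assumptions for illustration, not Spanner's actual configuration language or API.

```python
from collections import Counter


def satisfies_placement(replica_regions, minimums):
    """Return True if the replica list meets every per-region minimum."""
    have = Counter(replica_regions)
    return all(have[region] >= need for region, need in minimums.items())


def missing_replicas(replica_regions, minimums):
    """Return how many more replicas each under-provisioned region needs."""
    have = Counter(replica_regions)
    return {region: need - have[region]
            for region, need in minimums.items()
            if have[region] < need}


# The slide's example: at least 2 disks in EU, 2 in U.S. and 1 in Asia.
policy = {"EU": 2, "US": 2, "Asia": 1}
ok = satisfies_placement(["EU", "EU", "US", "US", "Asia"], policy)
gaps = missing_replicas(["EU", "US", "US"], policy)
```

An automated system in this spirit would use something like `missing_replicas` to decide where to create new copies, rather than requiring an operator to place each replica by hand.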


Spanner Lessons

• Several variations of eventual client API
• Started to develop with many possible customers in mind, but no particular customer we were working closely with
• Eventually we worked closely with Google ads system as initial customer
  – first real customer was very demanding (real $$): good and bad
• Different API than BigTable
  – Harder to move users with existing heavy BigTable usage


Designing & Building Infrastructure

Identify common problems, and build software systems to address them in a general way

• Important to not try to be all things to all people
  – Clients might be demanding 8 different things
  – Doing 6 of them is easy
  – …handling 7 of them requires real thought
  – …dealing with all 8 usually results in a worse system
    • more complex, compromises other clients in trying to satisfy everyone


Designing & Building Infrastructure (cont)

Don't build infrastructure just for its own sake:
• Identify common needs and address them
• Don't imagine unlikely potential needs that aren't really there

Best approach: use your own infrastructure (especially at first!)
• (much more rapid feedback about what works, what doesn't)

If not possible, at least work very closely with initial client team
• ideally sit within 50 feet of each other
• keep other potential clients' needs in mind, but get system working via close collaboration with first client first


Thanks!

Further reading:

• Ghemawat, Gobioff, & Leung. Google File System, SOSP 2003.

• Barroso, Dean, & Hölzle. Web Search for a Planet: The Google Cluster Architecture, IEEE Micro, 2003.

• Dean & Ghemawat. MapReduce: Simplified Data Processing on Large Clusters, OSDI 2004.

• Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes, & Gruber. Bigtable: A Distributed Storage System for Structured Data, OSDI 2006.

• Corbett et al. Spanner: Google’s Globally Distributed Database, OSDI 2012.

• Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems. OSDI 2006.

• Pinheiro, Weber, & Barroso. Failure Trends in a Large Disk Drive Population. FAST 2007.

• Barroso & Hölzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Morgan & Claypool Synthesis Series on Computer Architecture, 2009.

• Malewicz et al. Pregel: A System for Large-Scale Graph Processing. PODC, 2009.

• Schroeder, Pinheiro, & Weber. DRAM Errors in the Wild: A Large-Scale Field Study. SIGMETRICS’09.

• Protocol Buffers. http://code.google.com/p/protobuf/

See: http://research.google.com/papers.html

http://research.google.com/people/jeff
