AstriCon - Realities of Global Infrastructure in the Cloud

Post on 01-Nov-2014


DESCRIPTION

Talk by Cory von Wallenstein from Dyn at Astricon in Atlanta, GA on the realities of global infrastructure in the cloud.

TRANSCRIPT

@cvonwallenstein from @DynInc

Global Infrastructure in the Cloud

Cory von Wallenstein, Chief Technology Officer, Dyn Inc.

@cvonwallenstein

http://www.flickr.com/photos/notaperfectpilot/8119088205/

“Wired people should know something about wires” - Neal Stephenson, quoted in Andrew Blum’s TED Talk “What is the Internet, Really?”

http://www.ted.com/talks/andrew_blum_what_is_the_internet_really.html



Going Global in the Cloud

• Never been easier
• Never been more affordable
• Why should or shouldn’t you?
• If so, how?


A Word on Costs and Value

• Unlikely to save you raw dollars
• Likely to spend the same or more
• But here’s what you gain:
– Flexibility (can’t really screw this up)
– Performance (many caveats here)
– Reliability (if you do it right)
– Efficiency (if your team embraces it)
• Are those worthwhile to you?


Why go from 1 to N?

Reason 1: Disaster Recovery

http://maps.google.com

Reason 1: Disaster Recovery

http://www.cogentco.com/files/images/network/network_map/networkmap_global_large.png

Speed of light: 299,792.458 km/second

Theoretical RTT: ~40 ms

Real RTT: ~90 ms
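The gap between the theoretical and real figures follows from simple arithmetic. The sketch below is a minimal Python illustration, not taken from the talk: the 4,000 km one-way path length and the 2/3 velocity factor for light in fiber are assumptions chosen for illustration, and the slide does not say which route its numbers refer to.

```python
# Lower bound on round-trip time (RTT) imposed by the speed of light.
C_VACUUM_KM_S = 299_792.458   # speed of light in vacuum, as quoted on the slide
FIBER_FACTOR = 2 / 3          # assumption: light in optical fiber travels at roughly 2/3 c


def min_rtt_ms(path_km: float, velocity_factor: float = 1.0) -> float:
    """Return the minimum RTT in milliseconds for a one-way path of path_km."""
    one_way_s = path_km / (C_VACUUM_KM_S * velocity_factor)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    path_km = 4_000  # hypothetical one-way path length, not from the talk
    print(f"vacuum: {min_rtt_ms(path_km):.1f} ms")                # vacuum: 26.7 ms
    print(f"fiber:  {min_rtt_ms(path_km, FIBER_FACTOR):.1f} ms")  # fiber:  40.0 ms
```

Real paths sit above this floor because fiber routes are not straight lines and every router and switch along the way adds delay.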

Reason 1: Disaster Recovery

• Things don’t work as well at 90ms RTT latency as they do at 9ms RTT latency

• Where can you go to get out of the way of a disaster but not create latency headaches?

http://www.globaldatavault.com/natural-disaster-threat-maps.htm

Reason 1: Disaster Recovery

http://www.datacenterknowledge.com/archives/2012/07/09/outages-surviving-electric-squirrels-ups-failures/

“A frying squirrel took out half of our Santa Clara data center two years back,” - Mike Christian, Yahoo

Reason 1: Disaster Recovery

http://blog.level3.com/level-3-network/the-10-most-bizarre-and-annoying-causes-of-fiber-cuts/

“Squirrel chews account for a whopping 17% of our damages so far this year! But let me add that it is down from 28% just last year and it continues to decrease since we added cable guards to our plant.” - Fred Lawler, Level(3)

Reason 2: Get closer to users

http://www.akamai.com/html/technology/dataviz1.html


Reason 3: “Sorry, we’re full”

http://www.theregister.co.uk/2010/10/12/capgemini_merlin_data_center/

How: Figure out who and where

• Figure out what your motivations are
– Disaster recovery
– Get closer to users
– Future scaling
• Take a latency inventory of your apps
– To end users
– To other dependencies
• Get out the maps! Fire up traceroute!
– EC2: US East (Northern Virginia), US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), South America (Sao Paulo), and GovCloud.
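A latency inventory can start as a small script that times connections from one vantage point to each dependency. The following is a minimal sketch under stated assumptions, not a tool from the talk: it uses TCP handshake time as a stand-in for RTT, and the function names and the hostnames in the usage comment are hypothetical.

```python
import socket
import statistics
import time


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time one TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000


def latency_inventory(targets, samples=5, probe=tcp_connect_ms):
    """Return {name: median latency in ms}, or None for unreachable targets.

    targets maps a label to a (host, port) pair. probe can be swapped out,
    for example to replay recorded measurements.
    """
    inventory = {}
    for name, (host, port) in targets.items():
        try:
            inventory[name] = statistics.median(
                probe(host, port) for _ in range(samples)
            )
        except OSError:  # covers timeouts, refused connections, DNS failures
            inventory[name] = None
    return inventory


# Hypothetical usage (hostnames are placeholders):
# latency_inventory({
#     "database": ("db.example.com", 5432),
#     "sip-trunk": ("sip.example.net", 5060),
# })
```

Running the same script from each candidate region gives a rough table of which dependencies would suffer from a move.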


How: Deploy and manage w/ sanity

• Software defined datacenters
– Fancy term for “I defined the architecture in code instead of Microsoft Visio”
• Configuration management
– Orchestrate the cloud APIs, and the config of systems
– Chef
– Puppet
– CFEngine, and more
• Huge loss if you don’t take advantage of this
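To make “architecture in code” concrete, here is a minimal sketch of the underlying idea: the desired layout lives in version-controlled data, and a function works out what has to change. It is an illustration only and does not use Chef, Puppet, CFEngine, or any cloud API; the region labels, role names, and the plan function are all hypothetical.

```python
from collections import Counter

# Desired architecture: (region, role) -> number of servers. Illustrative values.
DESIRED = {
    ("us-east", "sip-proxy"): 2,
    ("eu-west", "sip-proxy"): 2,
    ("eu-west", "media"): 1,
}


def plan(desired, running):
    """Compare desired counts with running servers and list the actions needed.

    running is an iterable of (region, role) tuples, one per live server.
    Returns a list of (action, (region, role), count) tuples.
    """
    actual = Counter(running)
    actions = []
    for key in sorted(set(desired) | set(actual)):
        delta = desired.get(key, 0) - actual.get(key, 0)
        if delta > 0:
            actions.append(("launch", key, delta))
        elif delta < 0:
            actions.append(("terminate", key, -delta))
    return actions
```

In a real setup the resulting actions would be handed to a configuration management tool or a cloud API client rather than returned as tuples.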


How: Coordinating global traffic

• What’s the app?
– Application agnostic, like DNS Global Server Load Balancing
  • Fancy term for “DNS servers monitor your servers and change DNS answers when events are detected”
– Application specific, like DUNDi
  • Decentralized coordination and fault tolerance
• Avoid SPOFs like the plague
– Keep it simple, keep it scalable
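The DNS Global Server Load Balancing idea described above can be sketched in a few lines: check which servers respond, then answer with healthy addresses, preferring the client's own region. This is a simplified illustration, not Dyn's implementation. The function names, the use of a TCP connect on port 5060 as the health check, and the documentation-range IP addresses in the usage comment are assumptions.

```python
import socket


def is_healthy(ip: str, port: int = 5060, timeout: float = 1.0) -> bool:
    """Treat a server as healthy if it accepts a TCP connection (assumed check)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def dns_answers(pools, client_region, check=is_healthy):
    """Choose the IPs to return for a client in client_region.

    pools maps region -> list of server IPs. The client's region is tried
    first, then the remaining regions in the order they are listed.
    """
    order = [client_region] + [r for r in pools if r != client_region]
    for region in order:
        healthy = [ip for ip in pools.get(region, []) if check(ip)]
        if healthy:
            return healthy
    return []  # nothing healthy anywhere


# Hypothetical usage:
# pools = {"us-east": ["192.0.2.10"], "eu-west": ["198.51.100.10"]}
# dns_answers(pools, "us-east")
```

A production service would run the health checks continuously in the background and serve answers from cached state with short TTLs, rather than probing on every query as this sketch does.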


What can you expect?

• Flexibility
– Deploy new servers in new locations in hours instead of weeks
• Performance
– If horizontally scalable on commodity hardware, you win. Else, be careful.
– If closer to users and site-to-site latency not an issue or data is distributed/eventually consistent, you win. Else, be careful.


What can you expect?

• Reliability
– If you understand “regions” and “availability zones”, you win. Else, be careful.

http://joyent.com/blog/if-i-was-your-cloud-provider-i-d-never-let-you-down
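One way to test that understanding is to ask whether a deployment still has enough capacity after losing any single availability zone, or any single region. The check below is a hedged sketch of that question; the function name, the placement format, and the region and zone labels are hypothetical, and it ignores everything except server counts.

```python
from collections import Counter


def survives_loss(placements, min_servers, scope):
    """Return True if at least min_servers remain after losing one failure domain.

    placements: list of (region, zone) tuples, one per server.
    scope: "zone" to lose a single availability zone,
           "region" to lose an entire region.
    """
    if scope == "zone":
        domains = Counter(placements)
    elif scope == "region":
        domains = Counter(region for region, _zone in placements)
    else:
        raise ValueError("scope must be 'zone' or 'region'")
    # Worst case: the failure hits the domain holding the most servers.
    worst_case_loss = max(domains.values(), default=0)
    return len(placements) - worst_case_loss >= min_servers


# Three servers spread over three zones of one region:
single_region = [("us-east", "a"), ("us-east", "b"), ("us-east", "c")]
print(survives_loss(single_region, 2, "zone"))    # True
print(survives_loss(single_region, 2, "region"))  # False
```

Spreading across zones protects against a zone failure, but only a second region protects against a regional one.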

What can you expect?

• Efficiency
– Automation
– More instrumentation -> reduced MTTD
– More scalable
– Most important: More focus on what delivers your business’s core competitive advantage.


Thank you (and we’re hiring!)

VP Technical Operations, Director of Engineering, Director of Security, Network Engineers, Software Engineers, System Engineers, System Administrators (and more!)

Reach out to me: dyn.com, cvw@dyn.com, @cvonwallenstein
