AWS Customer Presentation - Thomson Reuters - Delivering on the Promise of Digital Media

THOMSON REUTERS WEBCASTING IN THE CLOUD MULTIMEDIA SOLUTIONS

SIMON BALL, THOMSON REUTERS

ADRIAN ROE, ID3AS

14th JUNE 2012

Intro to Thomson Reuters

• Multimedia Solutions is part of Corporate Services, which is part of the Financial and Risk business segment within Thomson Reuters.

• Provides multimedia communications solutions which address the needs of professional communicators, including content creation, vertical and workflow specialization, distribution and reach, and actionable analytics.

• As the only truly global provider in the industry, we offer a unique single-vendor solution for multinational firms.

The Business

• “Fair Disclosure” legislation demands that:

– Companies distribute quarterly results in a timely manner

– Releases to financial markets are made available to the public at the same time

• Webcasting is a cost-effective way of doing this

• 25,000+ live events per year

– Very spiky - 4 high-volume periods, lots of quiet ones

– Average usage is less than 5% of peak

– Around 300 concurrent events on a busy day

• Audiences in thousands

http://edge.media-server.com/m/p/phcj8bzg/lan/en
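As a rough illustration of what "average usage is less than 5% of peak" implies for capacity planning, the sketch below compares server-hours for a platform sized permanently for peak against one that scales with demand. The one-server-per-event sizing and the function name are assumptions made for the example, not figures from the talk.

```python
HOURS_PER_YEAR = 24 * 365


def fixed_vs_elastic(peak_servers, average_fraction, hours=HOURS_PER_YEAR):
    """Return (fixed_hours, elastic_hours, idle_fraction).

    fixed_hours:   server-hours if capacity is always sized for peak
    elastic_hours: server-hours if capacity tracks average demand
    idle_fraction: share of the fixed capacity that sits unused
    """
    fixed_hours = peak_servers * hours
    elastic_hours = fixed_hours * average_fraction
    idle_fraction = 1 - elastic_hours / fixed_hours
    return fixed_hours, elastic_hours, idle_fraction


if __name__ == "__main__":
    # Assumption: one server per concurrent event at a 300-event peak.
    fixed, elastic, idle = fixed_vs_elastic(peak_servers=300, average_fraction=0.05)
    print(f"sized for peak: {fixed:,.0f} server-hours per year")
    print(f"elastic:        {elastic:,.0f} server-hours per year")
    print(f"idle share of a fixed platform: {idle:.0%}")
```

Under these assumptions a platform sized for peak would sit idle for most of the year, which is the case for elastic, pay-per-use capacity.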

How Did We Deliver to Customers Before?

• Service Vendors

– Conversion to web stream (encoder)

– Teleconference service

• Regional encoding centers

– Manual capture from telephone device

– Encoding Hardware

Motivation for change

• Previous Platform:

– Needed updated technology and improved quality

– Did not allow the business to scale

– Was expensive to run

• Motivators were (in order):

– Improve customer experience

– Global platform consolidation

– Allow business to scale

– Reduce cost

Evaluation Process

• Buy vs. Build

– No off-the-shelf product offered the required functionality without significant customisation

• Private data center vs. cloud-based

– Private data center lacked flexibility

– Significant up-front capex was not attractive

• Tested multiple cloud vendors

– Explicitly wanted a multi-vendor strategy

• Resilience

• Avoid lock-in

System Schematic

How Do We Do It Now?

What do we do?

• Webcast

Intro to id3as

• Elastic solutions for the broadcast, multimedia and finance sectors

• Specialist in:

– Custom solutions

• Creation of lean, innovative, high-density solutions

– Large-scale

• Going beyond “simple” website clusters of a few machines to systems with highly distributed compute or data requirements, where “traditional” tools are not necessarily appropriate

– Highly available

• No single points of failure

• Zero downtime maintenance

Technical Challenges

• Delivering quality SLA from a commodity platform

• Scalability

– On-demand management of ~1000 servers

• Resilience

– No webcast to have a single point of failure

• Support

– Support of ~1000 servers distributed around the world

– Need for (simple) tools (web UI, scripts etc.)

Architecture

• Lightweight Management Layer

– Distributed database, distributed application

– Across 2 or 3 servers

– Across multiple availability zones

• Encoders launched and destroyed on demand

– 2 encoders in different availability zones per webcast

– “crossed streams” for PSTN recovery

– System is self-healing

– Crashes detected almost instantly, and recovery initiated

– New encoders commissioned in < 70 seconds

– US-East Outage. We barely noticed.
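The placement and self-healing rules above can be sketched as a toy model. This is not the presenters' implementation (the talk only mentions a preference for Erlang); the class names, zone names and the `heal` method are hypothetical, and launching an encoder is reduced to creating an object rather than calling a cloud API.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical zone names; the talk does not say which zones are used.
ZONES = ["zone-a", "zone-b", "zone-c"]


@dataclass
class Encoder:
    encoder_id: int
    zone: str
    healthy: bool = True


@dataclass
class Webcast:
    name: str
    encoders: list = field(default_factory=list)


class Manager:
    """Toy management layer: two healthy encoders per webcast, in different zones."""

    def __init__(self, zones):
        self.zones = list(zones)
        self._ids = itertools.count(1)

    def _launch(self, zone):
        # Stand-in for the cloud API call that would boot an encoder instance.
        return Encoder(next(self._ids), zone)

    def start(self, name):
        """Start a webcast with one encoder in each of the first two zones."""
        webcast = Webcast(name)
        for zone in self.zones[:2]:
            webcast.encoders.append(self._launch(zone))
        return webcast

    def heal(self, webcast):
        """Replace crashed encoders, keeping the pair in distinct zones."""
        survivors = [e for e in webcast.encoders if e.healthy]
        used = {e.zone for e in survivors}
        failed = {e.zone for e in webcast.encoders if not e.healthy}
        # Prefer zones that did not just fail (stable sort keeps list order otherwise).
        candidates = sorted((z for z in self.zones if z not in used),
                            key=lambda z: z in failed)
        while len(survivors) < 2:
            survivors.append(self._launch(candidates.pop(0)))
        webcast.encoders = survivors
        return webcast


if __name__ == "__main__":
    manager = Manager(ZONES)
    webcast = manager.start("quarterly-results")
    webcast.encoders[0].healthy = False  # simulate a crash
    manager.heal(webcast)
    print([(e.encoder_id, e.zone) for e in webcast.encoders])
```

In a real system `heal` would be triggered by crash detection rather than called by hand, and the replacement would take as long as an instance needs to boot (the talk quotes under 70 seconds).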

Architecture (2)

• Communication with TR internal services through a simple ReST API and file transfers

– Reduces coupling between systems

– Makes future changes easy to implement

– Keep things simple!

Architecture (3)

• Choice of language important

– “Simple” websites - Java, C#, Ruby etc. are fine

– When resilience / distributed computing is important, these are less appropriate

• We are big fans of Erlang. Happy to talk about this later...

• Initial deployment on Windows due to audio toolchain

• Recent port to Linux platform

– Reduced costs

• Removal of “overweight” third-party tools allowed a smaller instance size

– Improved performance (particularly boot-time)

– ReST interface meant zero changes to other systems

Why was Amazon on the short-list?

• Multiple globally-distributed locations

• They were the number one provider

• Great API capability

• Supported Windows VMs with Admin access

– Not some higher-level PaaS model

– Nothing wrong with that, but we needed custom device driver support for the audio toolchain

• Cost was competitive

What we learnt about Cloud

• “Cloud” is an abused buzzword

• We’ve always considered Cloud to be about elasticity

• Some consider Cloud to be “just” virtualisation. We don’t.

• Turned out that most vendors are not as focused on elasticity

– And hence have significant issues if you use them in that way

– Which was a surprise, and cost quite a lot

What we learnt about Cloud (2)

• Cost model is not as simple as we first thought

– It’s not just compute hours

• Need to consider network traffic, EBS data and I/O charges, long-term S3 storage etc.

– And forgetting to turn off machines in the test stack gets expensive!

• Get Lean

– Keep software stack as small as possible

• Smaller server instances => lower CPU and EBS costs

– When running many thousands of hours, this really adds up

– Therefore use of large third-party products can have hidden costs
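A minimal sketch of the cost components listed above, to show that compute hours are only one line of the bill. Every rate in it is a made-up placeholder rather than a real AWS price, and the function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder unit prices; these are NOT real AWS prices."""
    instance_hour: float = 0.25
    transfer_out_gb: float = 0.125
    ebs_gb_month: float = 0.125
    ebs_million_io: float = 0.125
    s3_gb_month: float = 0.125


def monthly_bill(rates, instance_hours, transfer_out_gb, ebs_gb,
                 ebs_million_io, s3_gb):
    """Break one month's usage down into the cost lines named in the talk."""
    bill = {
        "compute": instance_hours * rates.instance_hour,
        "network": transfer_out_gb * rates.transfer_out_gb,
        "ebs_storage": ebs_gb * rates.ebs_gb_month,
        "ebs_io": ebs_million_io * rates.ebs_million_io,
        "s3_storage": s3_gb * rates.s3_gb_month,
    }
    bill["total"] = sum(bill.values())
    return bill


def forgotten_test_stack_cost(rates, machines, days):
    """Compute cost of test machines left running around the clock."""
    return machines * days * 24 * rates.instance_hour


if __name__ == "__main__":
    rates = Rates()
    bill = monthly_bill(rates, instance_hours=1000, transfer_out_gb=800,
                        ebs_gb=400, ebs_million_io=80, s3_gb=1600)
    for line, cost in bill.items():
        print(f"{line:12s} {cost:8.2f}")
    print("forgotten test stack:", forgotten_test_stack_cost(rates, machines=5, days=30))
```

With these placeholder numbers compute is less than half of the total, and a handful of forgotten test machines costs more than the month's production compute. The real proportions depend entirely on actual prices and usage.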

What we learnt about Amazon

• They understand their business

– No scope for negotiation; it’s a commodity product

• Handle elasticity vastly better than other vendors

• Support model has evolved

– Premium model for enterprise customers

• Well thought through API

– And we’ve never (yet) been hit by API maintenance windows

• Admin UI is good

– Some other vendors’ UIs are unusable for this scale

Elasticity Demo

Quick to Market

• Proof of concept – May 2010

• Funding approval August 2010

• “Full” project start October 2010

• Launch September 2011

Outcomes

• Day one:

– Improved audio quality

– Improved resiliency

– Cost reduction

– Single biggest cause of customer issues (PSTN drops) now resolved in ~20ms

• Ongoing:

– Ability to scale business has vastly improved

– Global flexibility, ability to control from anywhere in the world

What would we like to see from Amazon?

• Ability to share AMIs across availability zones

• Commercial grade SLAs

• Support for all instance types in at least two availability zones

• Improved usage reporting for invoice reconciliation

• More flexibility in reserved instances

• Not bothered about a common API

– Easy to adopt a new API (assuming it’s been thought through)

– Common API restricts innovation
