AWS Customer Presentation - Thomson Reuters - Delivering on the Promise of Digital Media
TRANSCRIPT
THOMSON REUTERS WEBCASTING IN THE CLOUD MULTIMEDIA SOLUTIONS
SIMON BALL, THOMSON REUTERS ADRIAN ROE, ID3AS
14th JUNE 2012
Intro to Thomson Reuters
• Multimedia Solutions is part of Corporate Services
which is part of the Financial and Risk business
segment within Thomson Reuters.
• Provides multimedia communications solutions
which address the needs of professional
communicators, including content creation, vertical
and workflow specialization, distribution and reach,
and actionable analytics.
• As the only truly global provider in the industry, we
offer a unique single vendor solution for multi-
national firms.
The Business
• “Fair Disclosure” legislation demands that:
– Companies distribute quarterly results in a timely manner
– Releases to financial markets are made available to the
public at the same time
• Webcasting is a cost-effective way of doing this
• 25,000+ live events per year
– Very spiky - 4 high-volume periods, lots of quiet ones
– Average usage is less than 5% of peak
– Around 300 concurrent events on a busy day
• Audiences in thousands
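The utilisation figure above is what makes fixed capacity a poor fit. A rough sketch of the arithmetic follows. It assumes, purely for illustration, one server per concurrent event (so 300 servers at peak) and that the "less than 5% of peak" average holds across a whole year; neither assumption is stated in the talk.

```python
# Illustrative arithmetic only. "One server per concurrent event" is an
# assumption made for this sketch, not a figure from the presentation.
HOURS_PER_YEAR = 24 * 365


def fixed_vs_elastic_hours(peak_servers, avg_utilisation):
    """Return (fixed, elastic) server-hours per year.

    fixed   -- capacity sized for the peak and left running all year
    elastic -- capacity that tracks demand, so only used hours are paid for
    """
    fixed = peak_servers * HOURS_PER_YEAR
    elastic = fixed * avg_utilisation
    return fixed, elastic


fixed, elastic = fixed_vs_elastic_hours(peak_servers=300, avg_utilisation=0.05)
saving = 1 - elastic / fixed
print(f"fixed: {fixed:,.0f} h, elastic: {elastic:,.0f} h, saving: {saving:.0%}")
```

Under these assumptions, capacity sized for the peak sits idle for roughly 95% of the server-hours it is paid for.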
How Did We Deliver to Customers Before?
• Service Vendors
– Conversion to web stream (encoder)
– Teleconference service
• Regional encoding centers
– Manual capture from telephone device
– Encoding Hardware
Motivation for change
• Previous Platform:
– Needed a technology update and better quality
– Did not allow the business to scale
– Was expensive to run
• Motivators were (in order):
– Improve customer experience
– Global platform consolidation
– Allow business to scale
– Reduce cost
Evaluation Process
• Buy vs. Build
– No off-the-shelf product that offered required functionality
without significant customisation
• Private data center vs. cloud-based
– Private data center lacked flexibility
– Significant up-front capex was not attractive
• Tested multiple cloud vendors
– Explicitly wanted a multi-vendor strategy
• Resilience
• Avoid lock-in
System Schematic
How Do We Do It Now?
What do we do?
• Webcast
Intro to id3as
• Elastic solutions for the broadcast, multimedia and
finance sectors
• Specialist in:
– Custom solutions
• Creation of lean, innovative, high-density solutions
– Large-scale
• Going beyond “simple” website clusters of a few machines to
systems needing highly distributed compute or data
requirements, where “traditional” tools are not necessarily
appropriate
– Highly available
• No single points of failure
• Zero downtime maintenance
Technical Challenges
• Delivering quality SLA from a commodity platform
• Scalability
– On-demand management of ~1000 servers
• Resilience
– No webcast to have a single point of failure
• Support
– Support of ~1000 servers distributed around the world
– Need for (simple) tools (web UI, scripts, etc.)
Architecture
• Lightweight Management Layer
– Distributed database, distributed application
– Across 2 or 3 servers
– Across multiple availability zones
• Encoders launched and destroyed on demand
– 2 encoders in different availability zones per webcast
– “crossed streams” for PSTN recovery
– System is self-healing
– Crashes detected almost instantly, and recovery initiated
– New encoders commissioned in < 70 seconds
– US-East Outage. We barely noticed.
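The encoder placement and self-healing behaviour described above can be sketched as a small simulation. This is not the id3as implementation (which the talk says is written in Erlang); it is a minimal Python model with no real cloud calls. The class names, zone names, and the rule of preferring a zone other than the one that just failed are all assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Encoder:
    encoder_id: int
    zone: str
    healthy: bool = True


@dataclass
class Webcast:
    name: str
    encoders: list = field(default_factory=list)


class EncoderPool:
    """Toy stand-in for the management layer (hypothetical names throughout)."""

    def __init__(self, zones):
        if len(zones) < 2:
            raise ValueError("need at least two availability zones")
        self.zones = list(zones)
        self._ids = itertools.count(1)

    def _launch(self, zone):
        # A real system would call the cloud provider's API here.
        return Encoder(next(self._ids), zone)

    def start_webcast(self, name):
        # Two encoders in two different zones, so that neither a single
        # machine failure nor a single zone failure stops the webcast.
        webcast = Webcast(name)
        for zone in self.zones[:2]:
            webcast.encoders.append(self._launch(zone))
        return webcast

    def heal(self, webcast):
        """Replace failed encoders, keeping the pair in different zones."""
        replaced = []
        for i, enc in enumerate(webcast.encoders):
            if enc.healthy:
                continue
            used = {e.zone for e in webcast.encoders if e.healthy}
            candidates = [z for z in self.zones if z not in used]
            # Assumption: avoid the zone that just failed if another is free.
            preferred = [z for z in candidates if z != enc.zone] or candidates
            new = self._launch(preferred[0])
            webcast.encoders[i] = new
            replaced.append(new)
        return replaced


pool = EncoderPool(["zone-a", "zone-b", "zone-c"])
webcast = pool.start_webcast("q2-results")
webcast.encoders[0].healthy = False  # simulate a crash in zone-a
replacements = pool.heal(webcast)
zones_in_use = sorted(e.zone for e in webcast.encoders)
print(zones_in_use)
```

In this model the surviving encoder keeps the webcast running while the replacement is launched, which is the property the "2 encoders in different availability zones" rule is there to provide.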
Architecture (2)
• Communication with TR internal services
through simple ReST API / file transfers
– Reduces coupling between systems
– Makes future changes easy to implement
– Keep things simple!
Architecture (3)
• Choice of language important
– “Simple” websites - Java, C#, Ruby etc. are fine
– When resilience / distributed computing is important, then
these are less appropriate
• We are big fans of Erlang. Happy to talk about this later...
• Initial deployment on Windows due to audio toolchain
• Recent port to Linux platform
– Reduced costs
• Removal of “overweight” 3rd party tools allowed smaller instance
size
– Improved performance (particularly boot-time)
– ReST interface meant zero changes to other systems
Why was Amazon on the short-list?
• Multiple globally-distributed locations
• They were the number one provider
• Great API capability
• Supported Windows VMs with Admin access
– Not some higher-level PaaS model
– Nothing wrong with that, but we needed custom device
driver support for the audio tool chain
• Cost was competitive
What we learnt about Cloud
• “Cloud” is an abused buzzword
• We’ve always considered Cloud to be about the
elasticity
• Some consider Cloud to be “just” virtualisation. We
don’t.
• Turned out that most vendors are not as focused on
elasticity
– And hence have significant issues if you use them in that
way
– Which was a surprise, and cost quite a lot
What we learnt about Cloud (2)
• Cost model is not as simple as we first thought
– It’s not just compute hours
• Need to consider network traffic, EBS data and I/O charges,
long-term S3 storage, etc.
– And forgetting to turn off machines in the test stack gets
expensive!
• Get Lean
– Keep software stack as small as possible
• Smaller server instances => lower CPU and EBS costs
– When running many thousands of hours, this really adds up
– Therefore use of large third-party products can have hidden
costs
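The cost components listed above can be put into a simple estimator. The sketch below is only a way of structuring the calculation: every rate and usage figure in it is a made-up placeholder, not an AWS price or a Thomson Reuters number, and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    # Placeholder prices for illustration only; not real AWS prices.
    instance_hour: float
    network_gb: float
    ebs_gb_month: float
    ebs_million_io: float
    s3_gb_month: float


@dataclass
class Usage:
    instance_hours: float
    network_gb: float
    ebs_gb_months: float
    ebs_million_ios: float
    s3_gb_months: float


def monthly_cost(usage, rates):
    """Split a monthly bill into the line items named on the slide."""
    return {
        "compute": usage.instance_hours * rates.instance_hour,
        "network": usage.network_gb * rates.network_gb,
        "ebs_storage": usage.ebs_gb_months * rates.ebs_gb_month,
        "ebs_io": usage.ebs_million_ios * rates.ebs_million_io,
        "s3_storage": usage.s3_gb_months * rates.s3_gb_month,
    }


# Made-up example figures, chosen only to show the shape of the result.
rates = Rates(instance_hour=0.5, network_gb=0.25, ebs_gb_month=0.125,
              ebs_million_io=0.5, s3_gb_month=0.125)
usage = Usage(instance_hours=2000, network_gb=4000, ebs_gb_months=800,
              ebs_million_ios=100, s3_gb_months=1600)

bill = monthly_cost(usage, rates)
total = sum(bill.values())
for item, cost in bill.items():
    print(f"{item:12s} {cost:8.2f}  ({cost / total:.0%} of total)")
print(f"{'total':12s} {total:8.2f}")
```

With these placeholder inputs, compute comes to less than half of the total, which is the kind of result that makes a compute-hours-only estimate misleading. Hours from forgotten test machines would simply be added to `instance_hours`.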
What we learnt about Amazon
• They understand their business
– No scope for negotiation; it’s a commodity product
• Handle elasticity vastly better than other vendors
• Support model has evolved
– Premium model for enterprise customers
• Well-thought-through API
– And we’ve never (yet) been hit by API maintenance
windows
• Admin UI is good
– Some other vendors’ UIs are unusable for this scale
Elasticity Demo
Quick to Market
• Proof of concept: May 2010
• Funding approval: August 2010
• “Full” project start: October 2010
• Launch: September 2011
Outcomes
• Day one:
– Improved audio quality
– Improved resiliency
– Cost reduction
– Single biggest cause of customer issues (PSTN drops) now
resolved in ~20ms
• Ongoing:
– Ability to scale business has vastly improved
– Global flexibility, ability to control from anywhere in the
world
What would we like to see from Amazon?
• Ability to share AMIs across availability zones
• Commercial grade SLAs
• Support for all instance types in at least two
availability zones
• Improved usage reporting for invoice reconciliation
• More flexibility in reserved instances
• Not bothered about a common API
– Easy to adopt a new API (assuming it’s been thought
through)
– Common API restricts innovation