Big Data on AWS - Toronto FSI Symposium - October 2016
TRANSCRIPT
Shawn Gandhi
Head of Solutions Architecture, AWS Canada
@shawnagram
Big Data on AWS
[Chart: data volume; generated data vs. data available for analysis]
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Abraham Wald (1902-1950)
Data is part of the fabric of the applications
Front-end and UX
Mobile
Back-end and operations
Data and analytics
What is AWS?
AWS Global Infrastructure
Application Services
Networking
Deployment & Administration
Database
Storage
Compute
ENTERPRISE APPS
DEVELOPMENT & OPERATIONS
MOBILE SERVICES
APP SERVICES
ANALYTICS
Data Warehousing
Hadoop/Spark
Streaming Data Collection
Machine Learning
Elastic Search
Virtual Desktops
Sharing & Collaboration
Corporate Email
Backup
Queuing & Notifications
Workflow
Search
Transcoding
One-click App Deployment
Identity
Sync
Single Integrated Console
Push Notifications
DevOps Resource Management
Application Lifecycle Management
Containers
Triggers
Resource Templates
TECHNICAL & BUSINESS SUPPORT
Account Management
Support
Professional Services
Training & Certification
Security & Pricing Reports
Partner Ecosystem
Solutions Architects
MARKETPLACE
Business Apps
Business Intelligence
Databases
DevOps Tools
Networking
Security
Storage
Regions
Availability Zones
Points of Presence
INFRASTRUCTURE
CORE SERVICES
Compute: VMs, Auto-scaling, & Load Balancing
Storage: Object, Blocks, Archival, Import/Export
Databases: Relational, NoSQL, Caching, Migration
Networking: VPC, DX, DNS
CDN
Access Control
Identity Management
Key Management & Storage
Monitoring & Logs
Assessment and reporting
Resource & Usage Auditing
SECURITY & COMPLIANCE
Configuration Compliance
Web application firewall
HYBRID ARCHITECTURE
Data Backups
Integrated App Deployments
Direct Connect
Identity Federation
Integrated Resource Management
Integrated Networking
API Gateway
IoT
Rules Engine
Device Shadows
Device SDKs
Registry
Device Gateway
Streaming Data Analysis
Business Intelligence
Mobile Analytics
Three types of data-driven development
Retrospective analysis and reporting: Amazon Redshift, Amazon RDS, Amazon S3, Amazon EMR
Here-and-now real-time processing and dashboards: Amazon Kinesis, Amazon EC2, AWS Lambda
Predictions to enable smart applications
Global Footprint
What is a Region?
[Diagram: a Region made up of multiple Availability Zones (AZs)]
• Metro-area DWDM links between AZs
• AZs <2ms apart & usually <1ms
• Each datacenter has a purpose-built network
Big Data Pipeline
Data → Collect → Process → Analyze → Answers
Store
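The pipeline stages above can be sketched in a few lines of Python. This is only an illustration of the shape of the flow, not anything shown in the talk: the record format ("symbol,quantity"), the function names, and the in-memory list standing in for a storage service are all assumptions made for the example.

```python
from collections import Counter


def collect(raw_lines):
    """Collect: parse raw 'symbol,quantity' lines into records, skipping blanks."""
    records = []
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue
        symbol, quantity = line.split(",")
        records.append({"symbol": symbol, "quantity": int(quantity)})
    return records


def store(records, storage):
    """Store: append records to a list that stands in for durable storage."""
    storage.extend(records)
    return storage


def process(storage):
    """Process: aggregate the total quantity per symbol."""
    totals = Counter()
    for record in storage:
        totals[record["symbol"]] += record["quantity"]
    return totals


def analyze(totals):
    """Analyze: answer one question, which symbol had the most volume."""
    symbol, quantity = totals.most_common(1)[0]
    return {"top_symbol": symbol, "quantity": quantity}


# Hypothetical input data, used only to exercise the four stages.
raw = ["ABC,100", "XYZ,50", "", "ABC,25"]
answer = analyze(process(store(collect(raw), [])))
print(answer)
```

Each stage only depends on the output of the previous one, which is the property that lets a different service be used per stage.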
Primitive Patterns
Collect Process Analyze
Store
Data Collection and Storage
Data Processing
Event Processing
Data Analysis
One tool to
rule them all
Primitive Patterns
Collect Process Analyze
Store
Data Collection and Storage: S3, Kinesis, DynamoDB, RDS (Aurora)
Event Processing: AWS Lambda, KCL Apps
Data Processing and Data Analysis: EMR, Redshift, Machine Learning
Collect Process Analyze
Store
Data Collection and Storage
Primitive Patterns
S3
Kinesis
DynamoDB
RDS (Aurora)
Data Collection and Storage
File
Stream
Transactional
Apps
Logging
Frameworks
AWS Services – Data Collection and Storage
S3: $0.030/GB-Mo
Redshift: starts at $0.25/hour
EC2: starts at $0.02/hour
Glacier: $0.010/GB-Mo
Kinesis: $0.015/shard-hour (1 MB/s in; 2 MB/s out); $0.028/million puts
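To make the price list above concrete, here is a rough back-of-the-envelope estimate in Python. It is a sketch only: the prices are copied from the slide (2016 figures), the per-shard Kinesis price is assumed to be hourly, a month is assumed to be 730 hours, and the workload sizes are made up for illustration.

```python
# Prices copied from the slide; they are 2016 list prices, so treat every
# number here as illustrative rather than current.
S3_PER_GB_MONTH = 0.030
GLACIER_PER_GB_MONTH = 0.010
KINESIS_PER_SHARD_HOUR = 0.015    # assumption: the per-shard price is hourly
KINESIS_PER_MILLION_PUTS = 0.028
HOURS_PER_MONTH = 730             # assumption: average hours in a month


def monthly_storage_cost(s3_gb, glacier_gb):
    """Monthly cost of keeping data in S3 and Glacier."""
    return s3_gb * S3_PER_GB_MONTH + glacier_gb * GLACIER_PER_GB_MONTH


def monthly_kinesis_cost(shards, puts_per_second):
    """Monthly cost of a stream: shard-hours plus put charges."""
    shard_cost = shards * KINESIS_PER_SHARD_HOUR * HOURS_PER_MONTH
    puts_per_month = puts_per_second * 3600 * HOURS_PER_MONTH
    put_cost = puts_per_month / 1_000_000 * KINESIS_PER_MILLION_PUTS
    return shard_cost + put_cost


# Hypothetical workload: 1 TB in S3, 5 TB in Glacier, 2 shards at 1,000 puts/s.
storage = monthly_storage_cost(s3_gb=1000, glacier_gb=5000)
streaming = monthly_kinesis_cost(shards=2, puts_per_second=1000)
print(f"storage: ${storage:.2f}/month, streaming: ${streaming:.2f}/month")
```

Under these assumptions the put charges, not the shard-hours, dominate the streaming bill, so batching records per put is the first thing to look at when estimating.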
Collect Process Analyze
Store
Event Processing
Primitive Patterns
AWS Lambda
KCL Apps
Event Processing – Enabling Capabilities
AWS Lambda
KCL Apps
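As an illustration of the event-processing pattern with AWS Lambda, the sketch below shows a Python handler that consumes a batch of Kinesis records. Lambda delivers each record's payload base64-encoded under Records[i]["kinesis"]["data"]; everything else here is an assumption for the example, including the JSON payload, its hypothetical "type" field, and the locally built sample event.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and count events per type."""
    counts = {}
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        # "type" is a hypothetical field chosen for this example.
        event_type = message.get("type", "unknown")
        counts[event_type] = counts.get(event_type, 0) + 1
    return counts


def fake_record(message):
    """Build a minimal record shaped like the ones Lambda passes in (test helper)."""
    data = base64.b64encode(json.dumps(message).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


# Local dry run with a hand-built event; no AWS account is needed.
sample_event = {
    "Records": [
        fake_record({"type": "order"}),
        fake_record({"type": "cancel"}),
        fake_record({"type": "order"}),
    ]
}
result = handler(sample_event, None)
print(result)
```

A KCL application would do the same per-record work inside its record processor; the difference is that it runs on instances you manage rather than being invoked per batch.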
Primitive Patterns
Collect Process Analyze
Store
Data Collection and Storage
Data Processing
Event Processing
Data Analysis
EMR, Redshift
Machine Learning
Big Data in Action
FINRA handles approximately 30 billion market events every day to build a holistic picture of trading in the U.S.
Deter misconduct by enforcing the rules
Detect and prevent wrongdoing in the U.S. markets
Discipline those who break the rules
FINRA – The Need for Big Data
Market volumes are volatile and steadily increasing
Exchanges and markets are evolving dynamically
New securities products are being introduced
New rules and regulations are being created
Market manipulators are innovating
AWS Offered the Right Services For FINRA’s Platform
Cloud Platform
APIs at the right layer
Automated infrastructure deployment
Open source commitment
Operations Security
FINRA – A Platform That Adapts to Market Dynamics
Data Integration: HBase, Hadoop, MapReduce
Flexible Interactive Queries: Hadoop, EMR, SQL/Hive
Fast Predefined Queries: HBase/NoSQL, Hadoop, Predefined Datamarts
Surveillance Analytics: EMR, Hive
Web Applications, Analysts, Regulators
Data Management Services: Data Movement, Data Registration, Notification, Version Management, Job Management, Cluster Management
S3
Firms
From One Instance
To Thousands
And Back Again
FINRA
FINRA is the largest independent regulator for all securities firms doing business in the US. FINRA oversees about 4,250 brokerage firms, about 162,155 branch offices, and approximately 629,525 registered securities representatives.
“At FINRA, we chose AWS because we wanted to be able to deliver innovation at a much larger scale and much more rapidly to our core business.”
- Saman Michael Far, SVP of Technology
What FINRA needed:
• Infrastructure for its market surveillance platform
• Analysis and storage of approximately 75 billion market records every day
• Interactive querying of multi-petabyte data sets
Why they chose AWS:
• Fulfillment of FINRA’s security requirements
• Ability to create a flexible platform using dynamic clusters (Hadoop, Hive, and HBase), Amazon EMR, and Amazon S3
Benefits realized:
• Increased agility, speed, and cost savings
• Estimated savings of $20m annually by using AWS
The National Bank of Canada
National Bank of Canada has more than CAD$219 billion in AUM. The bank’s Global Equity Derivatives Group (GED) is a leader in providing stock-trading solutions that manage exchange-traded securities such as stocks, funds, futures, and options.
“The speed and performance of AWS are impressive. Data manipulation processes that took days are now down to one minute.”
- Pascal Bergeron, Director of Algorithmic Trading
What the National Bank of Canada needed:
• Quickly collect a fast-growing volume of stock-market financial data
• Scale its data-analysis platform, which was outgrowing the on-prem resources
• Process and analyze structured and unstructured data, historic and real time
Why they chose AWS:
• The most big data services and solutions, such as Cloudera and TickSmith
• Reliability to easily process and analyze hundreds of terabytes of data
Benefits realized:
• Ability to easily access historic data, as far back as 10 years ago
• Acceleration of post-trade analysis time, from weeks to hours
• Improvement and optimization of trading operations, resulting in more revenue
The Benefits of Big Data on AWS
Agility: Respond quickly to market challenges
Speed: Reduce query times from hours to seconds
Cost Savings: Efficient scale; pay for what you use
Thank you
@shawnagram