Learn. Connect. Explore.
Architecting Open source solutions on Azure
Nicholas Dritsas
Senior Director, Microsoft Singapore
Agenda
• Developing OSS Apps on Azure
• Customer case with OSS Apps
• Hadoop on Azure
• Customer cases using Hadoop on Azure
Flexible
Open Source & Azure
• Android, iOS & Node.js back-end via Azure Mobile Services
• Java, Ruby SDKs via Linux VM, Engine Yard & Oracle
• Websites for PHP, Node.js, Python & App Gallery
• MySQL via ClearDB, MongoDB via MongoLab, Hadoop
• From Linux VMs via Image Gallery & VMDepot
Configuration
Key/value stores
• Example technologies: Redis, Microsoft Azure Tables and Cache
• What it provides: Fast access to large amounts of simply structured data
• Example use case: Online shopping cart
Column family stores
• Example technologies: Cassandra, HBase
• What it provides: Fast access to large amounts of more structured data
• Example use case: A table storing web pages
Document databases
• Example technologies: MongoDB, CouchDB
• What it provides: Scalable store for JSON documents
• Example use case: Persistent store for Node.js application
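To make the three categories concrete, here is a minimal sketch that mimics each data model with plain Python structures. It does not talk to Redis, Cassandra or MongoDB; the keys, row names and the `find` helper are hypothetical and only illustrate how data is shaped and looked up in each model.

```python
import json

# Key/value store: an opaque value looked up by a single key
# (here, a shopping cart keyed by a hypothetical session ID).
key_value_store = {}
key_value_store["cart:session-123"] = json.dumps({"items": ["SKU-1", "SKU-2"]})

# Column family store: each row key maps to column families, and each
# family holds a sparse set of columns (here, a table storing web pages).
column_family_store = {
    "com.example/index": {
        "contents": {"html": "<html>...</html>"},
        "anchors": {"com.example/about": "About us"},
    }
}

# Document database: self-describing JSON-like documents that can be
# queried by field (here, records persisted by an application).
document_store = [
    {"_id": 1, "name": "Ada", "roles": ["admin"]},
    {"_id": 2, "name": "Lin", "roles": []},
]


def find(docs, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


cart = json.loads(key_value_store["cart:session-123"])
anchor_text = column_family_store["com.example/index"]["anchors"]["com.example/about"]
matches = find(document_store, name="Ada")
```

The key/value lookup needs the exact key, the column family lookup walks row, family and column, and the document lookup filters on a field inside the document.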
Migrating an end to end airline online system to Azure
Background
• FlyAir has very aggressive growth plans. As such, they expect their growth rates to be very high and they need to plan for better systems.
• The current systems are based on OSS: CentOS/Ubuntu Linux running PHP and MySQL.
• FlyAir’s system consists of the following 4 main areas:
• B2C, where they host the main web page and consumer interaction for booking or managing flights directly.
• B2T, where they support the travel agencies and where the majority of the revenue is coming from
• B2M, mobile users support
• B2B, for corporate accounts
Migration process
• We moved all four systems from on premises to Azure in a few weeks.
• The system is hosted in Singapore Data Center and it consists of a number of Large/Extra Large Ubuntu/CentOS VMs that host PHP for the front end and MySQL for the backend.
• HA is achieved using Azure Load Balancer, VM Availability sets and MySQL replication.
• Site to site VPN was established using a Cisco device to support connectivity to on premises LOB systems plus ticketing interface to Amadeus (centralized ticketing system).
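The high-availability bullet above relies on a load balancer that only sends traffic to healthy front-end VMs. The following is a minimal sketch of that idea, not Azure Load Balancer's actual implementation; the class, its methods and the VM names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Rotate requests across backends, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        # Called when a health probe fails.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        # Called when a health probe succeeds again.
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # At most one full rotation is needed to find a healthy backend.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["php-vm-1", "php-vm-2", "php-vm-3"])
lb.mark_down("php-vm-2")  # simulate a failed health probe
picks = [lb.next_backend() for _ in range(4)]
```

With one VM marked down, requests alternate between the two remaining VMs; placing the VMs in an availability set reduces the chance that they fail at the same time.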
Infrastructure view of B2C
Current state and futures
• The system has been stable and performing well since November 2013.
• FlyAir plans to add DR site in Hong Kong data center and utilize Traffic Manager and Resource Groups to manage failover/failback process.
• SCOM and New Relic tools are used to monitor the sites and manage alerts and resource warnings.
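The planned DR setup can be pictured as priority-based failover: traffic goes to the primary site while it is healthy and to the DR site otherwise. This is a minimal sketch of that selection logic only, assuming a simple priority list; it is not the Traffic Manager API, and the endpoint names and `pick_endpoint` function are hypothetical.

```python
def pick_endpoint(endpoints, health):
    """Return the name of the healthy endpoint with the lowest priority number.

    endpoints: list of dicts with "name" and "priority" keys.
    health: dict mapping endpoint name to True (healthy) or False.
    Returns None when no endpoint is healthy.
    """
    candidates = [e for e in endpoints if health.get(e["name"], False)]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e["priority"])["name"]


endpoints = [
    {"name": "singapore", "priority": 1},  # primary site
    {"name": "hong-kong", "priority": 2},  # planned DR site
]

normal = pick_endpoint(endpoints, {"singapore": True, "hong-kong": True})
failover = pick_endpoint(endpoints, {"singapore": False, "hong-kong": True})
```

Failback is the same rule applied in reverse: once the primary reports healthy again, it wins on priority.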
Azure HDInsight
HDInsight Supports Hive
• SQL-like queries on Hadoop data in HDInsight
• HDInsight provides an easy-to-use graphical query interface for Hive
• HiveQL is a SQL-like language (subset of SQL)
• Hive structures include well-understood database concepts such as tables, rows, columns, partitions
• Hive queries are compiled into MapReduce jobs that are executed on Hadoop
• Dramatic performance gains with Stinger/Tez
• Stinger is a Microsoft, Hortonworks and OSS driven initiative to bring interactive queries to Hive
• Brings query execution engine technology from Microsoft SQL Server to Hive
• Performance gains up to 100x
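To illustrate what "compiled into MapReduce jobs" means, the sketch below hand-writes the map, shuffle and reduce steps for a query such as `SELECT origin, COUNT(*) FROM bookings GROUP BY origin`. The `bookings` table and its rows are hypothetical, and this is a simplified in-memory model rather than the plan Hive actually generates.

```python
from collections import defaultdict

# Hypothetical input rows standing in for a Hive table named "bookings".
rows = [
    {"origin": "SIN", "fare": 120},
    {"origin": "HKG", "fare": 200},
    {"origin": "SIN", "fare": 95},
]


def map_phase(row):
    # Emit (group key, 1) for every input row.
    yield row["origin"], 1


def shuffle(pairs):
    # Group all emitted values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # COUNT(*) is the sum of the 1s emitted for each key.
    return key, sum(values)


mapped = [pair for row in rows for pair in map_phase(row)]
result = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

The GROUP BY column becomes the map output key, and the aggregate becomes the reduce function.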
Hadoop 2.0
HDInsight Supports HBase
[Diagram: Name Node, Job Tracker, HMaster and Coordination, with four each of Data Node, Task Tracker and Region Server]
• NoSQL database on data in HDInsight
HDInsight Supports Mahout
• Machine learning library
HDInsight Supports Storm (Coming Q4, CY2014)
• Stream analytics for near-real-time processing
Connect Cloud Hadoop With On-premises
Scenarios For Deploying Hadoop As Hybrid
Hadoop customer cases
1. Data Broker Company
Company Profile
Who is the customer
• Customer is a Seattle-based cloud software company, focused exclusively on opening access to government data.
• SaaS government public set platform accessible via web, mobile, and RESTful interfaces
Product details
• Open Data Platform
• GovStat insights and analytics
• API Foundry
Business Problem
Project Milestones
• M1: migration of Open Data Platform to Azure with 4-6 design validation customers. Scale down and ramp up as needed. Support and escalation path defined for PFE.
• ~150 cores and 1.5 TB of data to be served for this phase
• M2: support up to 100 customers. DR, monitoring and alerting enhancements, compliance validation against FISMA/FedRAMP. OData integration, Windows 8 .NET application, Windows Phone .NET application, SQL IS integration for willing customers, Windows Azure Marketplace integration and localization.
• M3: IS integration completion post GA, OData enhancements, HDInsight integration, Office 365 integration and PaaS transition study. 10 months after M2.
Catalog
• Published Search API
• DCAT API
• Search over:
• Metadata
• Dataset contents
• Filters based on:
• View/Visualization type
• Category
• Tags
• Geography
• Sorting over catalog
• Dataset view on Catalog
Views
Four basic visualizations
• Tabular
• Maps
• Charts
• Calendars
Operations
• Export (CSV, JSON, XLSX, XML/RDF)
• Group By, Filter, Order By
• SoQL Requests
• Create Derived Views
Dataset Only Operations:
• Upsert, Append, Replace
• CSV upload
Can be embedded using the Data Player
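A SoQL request with Group By, Filter and Order By can be sketched as a URL builder. This assumes the common SODA convention of passing SoQL clauses as `$`-prefixed query parameters on a `/resource/<dataset id>.json` endpoint; the domain, dataset ID, column names and the `build_soql_url` helper are all hypothetical.

```python
from urllib.parse import urlencode


def build_soql_url(domain, dataset_id, select=None, where=None,
                   group=None, order=None, limit=None):
    """Build a SODA-style request URL from optional SoQL clauses."""
    clauses = {
        "$select": select,
        "$where": where,
        "$group": group,
        "$order": order,
        "$limit": limit,
    }
    # Keep only the clauses the caller actually supplied.
    params = {name: value for name, value in clauses.items() if value is not None}
    base = f"https://{domain}/resource/{dataset_id}.json"
    return f"{base}?{urlencode(params)}" if params else base


url = build_soql_url(
    "data.example.gov",   # hypothetical domain
    "abcd-1234",          # hypothetical dataset ID
    select="category, count(*) AS total",
    where="year = 2014",
    group="category",
    order="total DESC",
    limit=10,
)
```

`urlencode` percent-encodes the clause text, so spaces, commas and parentheses in the SoQL expressions are safe to pass as written.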
The Solution Architecture
Technology Landscape:
• ~120 cores of Ubuntu VMs in production. ~50 VMs each in staging and production environments.
• Standard 3-tier web application architecture
• Web tier is a Ruby on Rails (RoR) MVC application
• Application tier is Java deployed on Jetty, a servlet container
• REST API access to app layer. JAX-RS with Jersey
• SODA API
• Data tier is primarily PostgreSQL
• NoSQL options for monitoring, central service, rate limiting cache, aggregate cache
• Deploys Redis, Cassandra, MongoDB for NoSQL
• Lucene based Orester service for search
• Zookeeper and ActiveMQ for coordination service, messaging, inter-process synchronization, discovery of services
• Miscellaneous for GeoServer, Monitoring, Alerting
• Deployment via Chef with the knife-azure plugin
• Pure-FTPd for FTP uploads
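The "rate limiting cache" listed above can be illustrated with a fixed-window counter. This is a minimal in-memory sketch under stated assumptions, not the customer's implementation: the class name, limits and client ID are hypothetical, and a shared store such as Redis would typically hold the counters and expire them instead of the dictionary used here.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each time window."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock          # injectable so the example is deterministic
        self.counters = {}          # (client_id, window index) -> request count

    def allow(self, client_id):
        window_index = int(self.clock() // self.window)
        # Drop counters from earlier windows (a cache would expire them).
        self.counters = {
            key: count for key, count in self.counters.items()
            if key[1] == window_index
        }
        key = (client_id, window_index)
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key] <= self.limit


now = [0.0]  # fake clock, in seconds
limiter = FixedWindowRateLimiter(limit=2, window_seconds=60, clock=lambda: now[0])

first_window = [limiter.allow("api-key-1") for _ in range(3)]
now[0] = 61.0  # move into the next window
next_window = limiter.allow("api-key-1")
```

The third request in the first window is rejected, and the counter resets once the clock moves into the next window.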
High Level Component Architecture
High Level Role + Dataflow
Hadoop customer cases
2. Phone tracking and service company
Company 2 provides technology protection services for mobile phones, consumer electronics, and home appliance devices.
• Mobile telemetry scenario (uni-directional); data published from protected mobile devices
• Goal is to predict, detect and potentially mitigate failure conditions
• Business driver is improving customer claim experience; predicting customer escalation during claim (self-service to agent), etc.
• 6k events/second target (36M / day)
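As a quick check on the two throughput figures, the arithmetic below converts each into the other's unit. A sustained 6k events/second would be far more than 36M events/day, so one possible reading is that 6k/s is a peak target while 36M/day is the expected daily volume; the slide does not say which.

```python
SECONDS_PER_DAY = 24 * 60 * 60

target_rate = 6_000        # events/second, from the slide
daily_total = 36_000_000   # events/day, from the slide

# Daily volume if the target rate were sustained for a full day.
sustained_daily = target_rate * SECONDS_PER_DAY

# Average rate implied by the daily total.
average_rate = daily_total / SECONDS_PER_DAY

print(f"{sustained_daily:,} events/day at a sustained 6k events/second")
print(f"{average_rate:.0f} events/second averaged over 36M events/day")
```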
Project Overview
Business use cases
[Architecture diagram. Labels shown: Ingestion Svc, Web Role(s); Event Broker, Kafka; Blob Spooler; Azure Storage; Predictive Maint. Scoring; Customer Sat Scoring; Cloud ML; Model Publishing; Model (Re)Training (Cloud ML); Orchestration (MDP); Operational Dashboard; Troubleshooting; Alerting; Usage Reports & Analytics; Curated Data Sets for Self Service; Descriptive Analytics; Data Exploration; Insight; Backup & device telemetry; Call-Center and Support-Site logs; CRM Data; On-Premises; Anonymize & Synchronize]
Your Feedback is Important
Fill out evaluation of this session and help shape future events.
OPTION 3: Feedback stations outside the hall