Engineering Big Data Infra with OpenStack
DESCRIPTION
This deck discusses, at a high level, how one can optimize Big Data analytics applications on OpenStack.
TRANSCRIPT
Cisco Confidential © 2010 Cisco and/or its affiliates. All rights reserved.
Engineering Big Data Clouds
Debo Dutta – Principal Engineer, Cisco
© 2013 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Forward-looking Statements
This presentation contains projections and other forward-looking statements regarding future events or the future financial performance of Cisco, including future operating results. These projections and statements are only predictions. Actual events or results may differ materially from those in the projections or other forward-looking statements. Please see Cisco’s filings with the SEC, including its most recent filings on Form 10-K and 10-Q, for a discussion of important risk factors that could cause actual events or results to differ materially from those in the projections or other forward-looking statements.
Data Deluge Everywhere: Enterprises need Insights in a cost-effective manner
Volume
Variety
Velocity
Veracity
Mobile Data – Location, Presence, Device, Access, Customer
Video Growth – 65% of Mobile and 90% of Fixed traffic will be video by 2015 (Cisco VNI)
M2M – 225 Million connections by 2014 (ABI Research), from vending machines & ATMs to connected automobiles
Commerce – Mobile Payment platforms and local offers
Smart Converged Networks – B/W optimization, content placement, offload, SDN
Social Media – Consumer behavior, targeted advertisement, Social network platforms
Cloud (XaaS) & App stores – All data in the cloud
Adapted from PRIME deck
Big Data to Big Insights in the Cloud
IDC: A new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis
• Shift from technology for finding information to discovering insights
• Increases interest in real-time analytics of machine-generated data
• “Software defined” and converged technologies
• Open source software/platforms will play a pivotal role in big data Infrastructure – gain greater commercial adoption
• 2013 will be the year of “Big Data in the Cloud”
Rise of semi- and unstructured data
Forces Driving The Growth of Big Unstructured Data:
Web Products, Commerce, Services
• Click streams
• Email
• AVI files
• User data / search data
Cloud Computing
• Network log files
• Event processing
• Impact analysis
Defense, Intelligence and Security
• Call data
• Online activity
• Travel data
• GPS data
• Satellite Feeds
Financial Services
• Fraud detection / risk analysis
• Transaction data warehousing
• PCI compliance
• Surveillance
Other
• Meteorology
• Disease / epidemiology
• Genomics
• RFID data
• Sensor data
Difficult to capture, store, search, share, and analyze with traditional tools
1) Source: Cisco Visual Networking Index, June 2011
Thanks to Corp Dev
BDaaS on public clouds
• Focus on Platforms
• Focus on Integration
• Mostly ETL
• Leverage Public Clouds
• Very little focus on Insights
• Insights are obtained by in-house Data Scientists
• For Viz, UX is not there yet
Exceptions: Tableau
@Netflix
Big Data on OpenStack
(Private) Cloud Provider, Big-data Services
Batch and noSQL services, each with its own API: Hive, Oozie, Hadoop, Pig, Cassandra, Mongo
Real time services, each with its own API: MapR Drill, Yahoo S4, Truviso, Storm, Lucene
User and System Admin
Compute Service: Servers
Storage Service: Disks
Network Service: Networks
Hypervisor: KVM, Xen, ESX - Nexus 1000v + Open vSwitch
Network Virtualization: VLAN, OpenFlow, LISP, VXLAN
OpenStack Cloud Platform
• Bridges the virtual and physical layers
Resource Virtualization/hypervisor Layer
• Creates and manages virtualized compute, storage and networking resources
Physical Resource Layer
• Networking, Storage and Compute resources
Devops
Intelligent Scheduler
App Topology
Healthcare Big Data Application (shown four times in the diagram):
• Virtual VPN, Virtual WAAS, Virtual Firewall
• App VM (App on OS), DataBase VM (DataBase on OS), App VM (App on OS)
• Single Instance Services
Healthcare Big Data Application (fifth instance):
• Virtual VPN, Devops Server, Virtual Load Balancer
• Dashboard VM (Dashboard on OS), DataBase VM (DataBase on OS), Sensor VM (Sensor on OS)
• Single Instance Services
/tenant/industrial/tenant/healthcare
/tenant/industrial/tenant/finance
Scheduling Heterogeneous Resources is key
[Diagram: a mixed pool of VMs and bare-metal nodes]
Developer API
Compute Service (VMs, Memory, Local Disk): Servers
Storage Service (Block, Massive Key-value store): Disks
Network Service (Virtual Networks, Services): Networks
Map heavy workloads on bare metal with more resources, light workloads on virtualized resources
Network (topology) and storage aware scheduling
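The placement idea on this slide can be illustrated with a minimal Python sketch. It is not the deck's scheduler and does not use any OpenStack API. The `Host` and `Workload` types, the `HEAVY_CORES` threshold, and the use of a rack label as a stand-in for network topology and data locality are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold: workloads asking for at least this many cores
# are treated as "heavy" and preferred on bare metal.
HEAVY_CORES = 8


@dataclass
class Host:
    name: str
    kind: str        # "metal" or "vm"
    free_cores: int
    rack: str        # assumed stand-in for network topology


@dataclass
class Workload:
    name: str
    cores: int
    data_rack: Optional[str] = None  # rack holding the input data, if known


def preferred_kind(workload: Workload) -> str:
    """Heavy workloads prefer bare metal, light ones prefer VMs."""
    return "metal" if workload.cores >= HEAVY_CORES else "vm"


def place(workload: Workload, hosts: List[Host]) -> Optional[Host]:
    """Pick a host with enough free cores, or return None if none fits.

    Ranking, most important first:
      1. host kind matches the workload's preferred kind
      2. host sits in the same rack as the data (topology/storage awareness)
      3. host has the most free cores
    """
    candidates = [h for h in hosts if h.free_cores >= workload.cores]
    if not candidates:
        return None
    want = preferred_kind(workload)
    best = max(
        candidates,
        key=lambda h: (h.kind == want, h.rack == workload.data_rack, h.free_cores),
    )
    best.free_cores -= workload.cores  # reserve the capacity
    return best
```

With two bare-metal hosts and two small VMs spread over two racks, a 16-core job whose data lives in `rack-b` lands on the bare-metal host in that rack, while a 2-core job is placed on a VM. A light job that fits on no VM falls back to bare metal rather than being rejected, which is one possible design choice among several.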