girish juneja - intel big data & cloud summit 2013

APAC Big Data & Cloud Summit 2013. Girish Juneja, GM, Big Data Software, Software & Services Group

Upload: intelapac

Post on 15-Jan-2015


Page 1: Girish Juneja - Intel Big Data & Cloud Summit 2013

APAC Big Data & Cloud Summit 2013

Girish Juneja
GM, Big Data Software, Software & Services Group

Page 2: Girish Juneja - Intel Big Data & Cloud Summit 2013

Data Fab Transistor System Enablement Optimization Intelligence

Page 3: Girish Juneja - Intel Big Data & Cloud Summit 2013

Data

30 million networked sensors growing at 30% a year

Computing

1 trillion devices connected to the Internet by 2015

Experience

500 million smartphone users, increasing 20% a year

Social

Machine Generated

User Generated

Feedback loops driving exponential growth

Page 4: Girish Juneja - Intel Big Data & Cloud Summit 2013

Evolving towards end-to-end real-time analytics

Decade / Paradigm / Architecture / Platform

90s
• Reporting / Data Mining
• High Cost / Isolated use
• Batch, e.g. "sales reports"
• Sequential SQL queries
• RDBMS

2000s
• Model-based discovery
• High Cost / Dept Use
• Batch, e.g. correlated buying patterns
• No SQL, parallel analysis
• Shared disk/memory
• Proprietary MPP/DW Appliance

Today
• Unbounded MapReduce Query
• Low Cost / Enterprise Use
• Arrival of vast amounts of unstructured data
• Real-time, e.g. recommendation engine
• Process @ storage node
• Built-in data replication/reliability
• Shared nothing, in memory
• Open Source SW loosely coupled to commodity HW

Diagram labels: Multi-core Node; Scale (Node, Node); Unlimited Linear Scale; Distributed node addition (Node, Node, Node); No SQL RDBMS
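The "Today" entry above refers to MapReduce-style queries on shared-nothing nodes. As a rough illustration only, the single-process Python sketch below mimics the map, shuffle, and reduce phases on an in-memory list. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the purchase records are hypothetical and not taken from the slides; a real Hadoop job would run each phase in parallel across storage nodes.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (product, 1) for every product in one purchase record."""
    for product in record.split(","):
        yield product.strip(), 1

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle, and reduce in sequence on a list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Made-up sample data: one comma-separated basket per record.
purchases = ["milk, bread", "bread, eggs", "milk, bread, eggs"]
counts = run_job(purchases)
print(counts)
```

Because each map call sees only its own record and each reduce call sees only one key, the phases share no state, which is what lets a shared-nothing cluster scale by adding nodes.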

Page 5: Girish Juneja - Intel Big Data & Cloud Summit 2013

Make big data work for you

Amount of data your enterprise will need to ingest: 50X

Proportion of data that is useful to you: 10%

Projected increase in your IT budget: 10%
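A quick back-of-the-envelope check of these three figures, assuming "50X" is a multiplier on today's ingest volume and both 10% figures are relative to today's levels (the variable names are illustrative, not from the slides):

```python
data_growth = 50.0       # ingest grows 50X (from the slide)
useful_fraction = 0.10   # 10% of the data is useful (from the slide)
budget_growth = 1.10     # IT budget grows by 10% (from the slide)

# Budget available per unit of ingested data, relative to today (today = 1.0).
budget_per_unit = budget_growth / data_growth

# Useful data volume, measured in multiples of today's total ingest.
useful_volume = data_growth * useful_fraction

print(f"Budget per unit of data: {budget_per_unit:.1%} of today's level")
print(f"Useful data: {useful_volume:.0f}x today's total ingest")
```

Under these assumptions the budget per unit of data falls to roughly 2.2% of today's level, while the useful portion alone is about five times today's entire ingest.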

=> Business as usual is not an option

Page 6: Girish Juneja - Intel Big Data & Cloud Summit 2013

Software

Global Ecosystem

Security

Systems Architecture

Energy Efficient Performance

Manufacturing Leadership

Benefit from Intel’s long-standing investments

Page 7: Girish Juneja - Intel Big Data & Cloud Summit 2013

Using volume economics to drive innovation

Intel

Page 8: Girish Juneja - Intel Big Data & Cloud Summit 2013

Fabricating silicon for big data

22 nm: A Revolutionary Leap in Process Technology

37% Performance Gain at Low Voltage¹

>50% Active Power Reduction at Constant Performance¹

2007: 45 nm
2009: 32 nm
2011: 22 nm

High-k Metal Gate, Tri-Gate

Intel lead vs. industry: 3.5 years
Intel lead vs. industry: 4 years

Page 9: Girish Juneja - Intel Big Data & Cloud Summit 2013

Intel® Xeon® Processor E5-4600 Product Family

Highest reliability & scalability

Highest memory capacity

Highest enterprise & database performance

Density-optimized

Cost-optimized

Improved HPC performance

1 Source: Published results as of 8 May 2012. See http://www.intel.com/performance/server/xeonE7/summary.htm for full list of benchmarks and configuration details.

Pumping the heart of the open datacenter

Intel® Xeon® Processor E7-4800 Product Family

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Page 10: Girish Juneja - Intel Big Data & Cloud Summit 2013

Enabling open source solutions

Optimize software to take advantage of Intel® architecture

AES-NI
SSD, 10GbE
TXT
MCA
VT-*

3x performance in 3 years
Mission Critical deployments
Accelerates Crypto in JBoss
30x throughput
Trusted Compute Pools

Page 11: Girish Juneja - Intel Big Data & Cloud Summit 2013

Contributing to Apache Hadoop

• File based encryption for Hadoop jobs
• ACLs for HDFS and HBase at cell level

• Flash storage for MapReduce shuffle data
• Caching and non-volatile memory for increased throughput
• HDFS adaptive replication of hot files

• HBase distributed tables across data centers
• HDFS data replication across data centers
• Archival storage support for cold data on HDFS

• SSE Instructions
• JVM Enhancements
• InfiniBand RDMA Support
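The slide names "HDFS adaptive replication of hot files" without describing how it works. Purely as an illustration of the idea, here is one hypothetical policy that raises a file's replica count as read traffic grows. The function name, the threshold, and the one-replica-per-doubling rule are assumptions made for this sketch, not Intel's or Apache Hadoop's implementation; only the base value of 3 reflects the HDFS default replication factor.

```python
import math

def target_replication(reads_per_hour, base=3, max_replicas=10, hot_threshold=100):
    """Return a replica count that grows with read traffic (hypothetical policy)."""
    if reads_per_hour <= hot_threshold:
        # Cold or warm file: keep the default replication factor.
        return base
    # Add one replica for each doubling of traffic above the threshold.
    extra = math.floor(math.log2(reads_per_hour / hot_threshold))
    # Cap the result so a single very hot file cannot consume the cluster.
    return min(base + extra, max_replicas)

for reads in (50, 200, 800, 1_000_000):
    print(reads, target_replication(reads))
```

More replicas of a hot file let more nodes serve reads locally, at the cost of extra storage, which is why the sketch caps the count.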

Page 12: Girish Juneja - Intel Big Data & Cloud Summit 2013

Supporting Intel Distribution for Apache Hadoop

Data Mining

Graph Analytics

Full Text Search

Full SQL

Batch Analytics

Security

Page 13: Girish Juneja - Intel Big Data & Cloud Summit 2013

Intel® Distribution for Apache Hadoop* software

Granular access control in HBase

Up to 20X faster crypto with AES-NI*

30X faster Terasort on Intel® Xeon processors, Intel 10GbE, and SSD

Up to 8.5X faster queries in Hive*

Job profiling and configuration, automated by Intel® Active Tuner

*Based on internal testing

Rhino

Cloud

HPC

Common authentication, access control, auditing

Bringing MapReduce to data on Lustre FS

Enabling real-time 100% SQL on Hadoop

Optimizing Hadoop for virtualization & cloud

Page 14: Girish Juneja - Intel Big Data & Cloud Summit 2013

Backed by portfolio of datacenter products

Software

Network

Storage & Memory

Server

Cache Acceleration Software

Page 15: Girish Juneja - Intel Big Data & Cloud Summit 2013

With broad support from the ecosystem

* Other names and brands may be claimed as the property of others.

Page 16: Girish Juneja - Intel Big Data & Cloud Summit 2013

Proven in the enterprise

Using the Intel® Distribution to gain tremendous results


IT

Page 17: Girish Juneja - Intel Big Data & Cloud Summit 2013

Putting advanced capabilities at work…
• Expose new data
• Dashboard/historical reporting
• Real-time campaigns
• Vertical apps
• Predictive data services
• Graph visualization
• Log analysis

to solve real use cases
• Fraud & threat detection
• Life sciences research
• Behavioral analysis
• Warranty analysis
• Customer segmentation
• Infrastructure optimization

From Hype to High Performance

Page 18: Girish Juneja - Intel Big Data & Cloud Summit 2013

Data-Driven Business: Customer Service

Value

• Enable subscriber access to billing data

• 30X gain in performance; lower TCO

Analytics

• Provides real-time retrieval of 6 months data

• Supports new BI with 15 types of queries

• Enables targeted ad serving and promotions

Data Management

• 30 TB/month of billing data

• 300K reads/second; 800K inserts/second

• 133-node cluster / Intel Xeon E5 processors

CDR

Subscriber Self Service

Intel Distribution
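To put the data-management figures on this slide in perspective, the sketch below derives an average ingest rate and per-node request rates. It assumes decimal terabytes, a 30-day month, and load spread evenly over all 133 nodes; none of these assumptions is stated on the slide, and the quoted read and insert rates may well be peaks rather than averages.

```python
TB = 10**12                          # decimal terabyte (assumption)
SECONDS_PER_MONTH = 30 * 24 * 3600   # 30-day month (assumption)

monthly_bytes = 30 * TB              # 30 TB/month of billing data (from the slide)
nodes = 133                          # cluster size (from the slide)
reads_per_s = 300_000                # from the slide
inserts_per_s = 800_000              # from the slide

# Average ingest rate in MB/s if data arrived evenly over the month.
avg_ingest_mb_s = monthly_bytes / SECONDS_PER_MONTH / 10**6

# Per-node request rates if load were spread evenly across the cluster.
reads_per_node = reads_per_s / nodes
inserts_per_node = inserts_per_s / nodes

print(f"Average ingest: {avg_ingest_mb_s:.1f} MB/s")
print(f"Reads per node: {reads_per_node:.0f}/s")
print(f"Inserts per node: {inserts_per_node:.0f}/s")
```

Under these assumptions the average ingest is about 11.6 MB/s, and each node handles roughly 2,256 reads and 6,015 inserts per second.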

Page 19: Girish Juneja - Intel Big Data & Cloud Summit 2013

Value

Enable researchers to discover biomarkers and drug targets by correlating genomic data sets

90% gain in throughput; 6X data compression

Analytics

Provide curated data sets with pre-computed analysis (classification, correlation, biomarkers)

Provide APIs for applications to combine and analyze public and private data sets

Data Management

Use Hive and Hadoop for query and search

Dynamically partition and scale HBase

10-node cluster / Intel Xeon E5 processors / 10GbE

Data-Intensive Discovery: Genomics

Intel Distribution

Page 20: Girish Juneja - Intel Big Data & Cloud Summit 2013

Data-Rich Communities: Smart City

Value

• Enforce traffic laws and detect license fraud

• Monitor and predict traffic patterns

• In a city of 31 million people

Analytics

• Detect traffic law violations automatically

• Detect driver license fraud by data mining

• Forecast traffic with predictive analytics

Data Management

• 30,000 cameras

• 6Mb/s stream rate per camera

• 15 PB of images in use / 2B records in HBase

Detection Prevention

Regional

Local
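The camera figures above imply a substantial aggregate stream. The sketch below works it out, assuming "6Mb/s" means 6 megabits per second, that all 30,000 cameras stream continuously, and that units are decimal; these are assumptions for illustration, and the slide does not say how the 15 PB of images relates to the raw stream.

```python
cameras = 30_000                 # from the slide
stream_mbit_s = 6                # per camera, read as megabits per second (assumption)
SECONDS_PER_DAY = 24 * 3600

# Aggregate rate across all cameras, in gigabits per second.
total_gbit_s = cameras * stream_mbit_s / 1_000

# Same aggregate rate in bytes per second (8 bits per byte).
total_bytes_s = cameras * stream_mbit_s * 10**6 / 8

# Raw volume per day in decimal petabytes, if every stream were retained.
pb_per_day = total_bytes_s * SECONDS_PER_DAY / 10**15

# How many days of raw streaming would add up to 15 PB.
days_in_15_pb = 15 / pb_per_day

print(f"Aggregate stream: {total_gbit_s:.0f} Gb/s")
print(f"Raw volume: {pb_per_day:.3f} PB/day")
print(f"15 PB is about {days_in_15_pb:.1f} days of raw streams")
```

Under these assumptions the cameras produce about 180 Gb/s, or close to 2 PB per day, so 15 PB would correspond to roughly a week of unfiltered streams.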

Page 21: Girish Juneja - Intel Big Data & Cloud Summit 2013

Catalyzing the ecosystem

Foster the ecosystem and develop new markets for Intel and its partners

Page 22: Girish Juneja - Intel Big Data & Cloud Summit 2013

Resources

Content

Case Studies

Whitepapers

Demos

http://hadoop.intel.com

Contacts

Girish Juneja

RK Hiremane

Eddie Toh

[email protected]

Page 23: Girish Juneja - Intel Big Data & Cloud Summit 2013