big data - applications and technologies overview

Big Data and its applications

Uploaded by: sivashankar-ganapathy

Posted on 15-Jul-2015






Introduction

Big Data – use cases, applications, technologies and vendors overview

Aimed at providing a high-level overview of the tools and technologies related to Big Data

Topics covered

Introduction to Big Data

◦ Definition, need for Big Data, hype cycle

Applications of Big Data

◦ Industry-wise applications

Big Data Technologies Overview

◦ Hadoop, Pig, Hive, NoSQL, columnar databases

Big Data Vendors Overview

◦ Amazon, Cloudera, Hortonworks, MapR, etc.

Big Data - definition

A popular term used to describe the exponential growth and availability of data, both structured and unstructured.

A collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.

May refer both to the volume of data and to the tools and processes used to handle it.

3 Vs of Big Data

◦ Volume, velocity, variety

Need for Big Data

2.7 ZB of data in the digital universe today

Facebook stores and analyzes 30+ PB of data

Walmart's data exceeds 2.5 PB

Enables better decision-making and increased operational efficiency
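Putting the slide's figures on one scale makes the gap concrete. The sketch below is a back-of-the-envelope calculation, assuming decimal SI units (1 PB = 10^15 bytes, 1 ZB = 10^21 bytes) and treating the "30+ PB" and "exceeds 2.5 PB" figures as lower bounds; the helper name `share_percent` is hypothetical.

```python
# Assumption: decimal (SI) units, 1 PB = 10**15 bytes and 1 ZB = 10**21 bytes.
PB = 10**15
ZB = 10**21

digital_universe = 2.7 * ZB  # "2.7 ZB of data in the digital universe"
facebook = 30 * PB           # "30+ PB", so this is a lower bound
walmart = 2.5 * PB           # "exceeds 2.5 PB", also a lower bound


def share_percent(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


print(f"2.7 ZB = {digital_universe / PB:,.0f} PB")
print(f"Facebook: {share_percent(facebook, digital_universe):.5f}% of the total")
print(f"Walmart:  {share_percent(walmart, digital_universe):.6f}% of the total")
```

2.7 ZB works out to 2.7 million PB, so even Facebook's 30+ PB is only about a thousandth of a percent of the total.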

When to go for a Big Data solution

Analyze all types of data

Most or all of the data is to be analyzed

Analysis is iterative and exploratory

Business measures are not predetermined

Traditional warehouses require schema-compliant data and are not suited to unstructured data

Gartner’s Hype Cycle

Retail – Pricing Optimization

Analyze millions of items sold or for sale

Valuable insights about customers and markets in quicker timeframes

Aggregate data from multiple channels in multiple formats

Day-long jobs complete in minutes
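Aggregating data from several channels usually means normalizing each format into a common record before combining. A minimal sketch, assuming a hypothetical CSV export from point-of-sale terminals and a hypothetical JSON feed from an online store; the field names `sku`, `qty`, `item` and `units` and the sample rows are illustrative, not from any real system.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical sample feeds; field names are illustrative only.
POS_CSV = """sku,qty
A100,3
B200,1
A100,2
"""

ONLINE_JSON = '[{"item": "A100", "units": 4}, {"item": "C300", "units": 7}]'


def from_pos(text: str):
    """Yield (sku, quantity) pairs from the point-of-sale CSV feed."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row["sku"], int(row["qty"])


def from_online(text: str):
    """Yield (sku, quantity) pairs from the online-store JSON feed."""
    for order in json.loads(text):
        yield order["item"], int(order["units"])


def units_per_sku(*feeds):
    """Merge already-normalized feeds into one units-sold count per SKU."""
    totals = Counter()
    for feed in feeds:
        for sku, qty in feed:
            totals[sku] += qty
    return totals


totals = units_per_sku(from_pos(POS_CSV), from_online(ONLINE_JSON))
print(totals.most_common())
```

Each feed is reduced to (SKU, quantity) pairs before merging, so adding a new channel only needs one more small parser; with the sample data, A100 totals 9 units across both feeds.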

Retail – Smart shopping experience

Pricing data, POS transactions, social media, call center records, promotions

Better understanding of customer preferences and shopping patterns

Geolocation apps deliver a personalized marketing experience

Big Data in Finance

Customer segmentation

◦ Correlate purchase history, profile info, behaviour on social media

◦ Generate portfolio advice

Fraud Detection systems

Wealth Management

◦ Investment Research – try out new investment ideas, improve algorithmic trading

◦ Customer knowledge – unified view of the customer

Big Data in Finance

Regulatory Compliance

◦ Impact of the 2008 credit crisis – regulatory compliance

◦ Stringent monitoring and reporting of data

Risk Management

◦ Better analysis of investment positions and risk metrics

Big Data in Healthcare

EMR – Electronic Medical Records initiative in the US

Complete digitization of a patient’s medical info such as profile, disease treatment, pharmacy visits, etc.

Shared across networks

Slow adoption and challenges in aggregation

Big Data in Healthcare

Predict health issues

◦ Build a model that predicts a patient’s risk

◦ Hospitals follow up with high-risk patients to avoid hospitalization

Predicting outbreaks

◦ IBM Research project STEM

◦ Model correlates disease data with climate and temperature

◦ Can predict disease outbreaks in regions expecting climatic change

Big Data – Internet of Things

Machine-generated data, e.g. from RFID chips embedded in devices

3 phases

◦ Data ingestion – cost

◦ Data storage – cost

◦ Analytics – real value

Outsource phases 1 and 2 to DBaaS providers (Amazon Redshift, Hortonworks, Cloudera)

UPS – Case study

Aim

◦ Find the fastest and most fuel-efficient way to deliver packages to customers

ORION research project

◦ Captures driver behaviour and safety habits through GPS

◦ Sensor data on fuel emissions and consumption

◦ Monitors deliveries and customer service

◦ Runs advanced algorithms to optimize routes

UPS – Case Study

Early testing in 2011–2012 on 10,000 routes saved 1.5 million gallons of fuel

Full deployment on 55,000 routes throughout North America planned by 2017

Big Data – Technologies

MapReduce

◦ Programming paradigm that allows massive jobs to execute in parallel across thousands of servers

◦ Map task – converts the input dataset into a different set of key/value pairs

◦ Reduce task – combines the Map outputs that share a key into a smaller set of tuples
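The Map and Reduce steps can be illustrated with the classic word-count example. This is a single-process sketch with hypothetical function names, not Hadoop's API: `shuffle` stands in for the grouping by key that the framework performs between the two phases, and in a real cluster the map and reduce calls would be spread across many servers.

```python
from collections import defaultdict
from itertools import chain


def map_task(line: str):
    """Map: turn one input record into (key, value) pairs, here (word, 1)."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    # In Hadoop each map_task call could run on a different server;
    # here they simply run one after another in one process.
    mapped = chain.from_iterable(map_task(line) for line in lines)
    return dict(reduce_task(k, vs) for k, vs in shuffle(mapped).items())


counts = run_job(["big data big ideas", "data beats opinions"])
print(counts)
```

Because each `map_task` call sees only one record and each `reduce_task` call sees only one key, the calls are independent of each other, which is what allows them to be parallelized.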

Big Data - Technologies

Hadoop

◦ Most popular open-source implementation of MapReduce

◦ Can work with multiple forms of data

◦ Can run processor-intensive machine learning jobs

Hive

◦ Developed by Facebook and later made open source

◦ SQL-like query layer on top of Hadoop

◦ Queries data stored in a Hadoop cluster
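Hive's appeal is that a user writes a declarative SQL-style query instead of hand-coding map and reduce functions. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in SQL engine so it runs without a Hadoop cluster; the `page_views` table and its rows are hypothetical. The `SELECT ... GROUP BY` query itself is of a shape HiveQL also accepts, where Hive would translate it into jobs over files stored in the cluster.

```python
import sqlite3

# sqlite3 is only a stand-in for Hive here: the point is the declarative query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("u1", "/home"), ("u2", "/home"), ("u1", "/cart"), ("u3", "/home")],
)

# Count views per page; no map or reduce code is written by the user.
QUERY = """
SELECT page, COUNT(*) AS views
FROM page_views
GROUP BY page
ORDER BY views DESC
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
conn.close()
```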

Big Data - Technologies

Pig

◦ Scripting language (Pig Latin)

◦ Transforms data stored in a Hadoop cluster

◦ Developed by Yahoo and made open source
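Pig scripts are written as a sequence of dataflow steps, each producing a new relation. The Python sketch below mirrors a short, hypothetical Pig Latin script step by step; the Pig Latin lines appear only as comments and are illustrative, not executed here, and the input path 'logs', the field names and the sample records are all assumptions.

```python
from collections import defaultdict

# Pig Latin: logs = LOAD 'logs' AS (visitor:chararray, amount:int);
# Hypothetical in-memory records standing in for the loaded file.
logs = [("alice", 500), ("bob", 2400), ("alice", 3100), ("bob", 1200), ("carol", 80)]

# Pig Latin: large = FILTER logs BY amount > 1000;
large = [(visitor, amount) for visitor, amount in logs if amount > 1000]

# Pig Latin: by_visitor = GROUP large BY visitor;
by_visitor = defaultdict(list)
for visitor, amount in large:
    by_visitor[visitor].append(amount)

# Pig Latin: totals = FOREACH by_visitor GENERATE group, SUM(large.amount);
totals = {visitor: sum(amounts) for visitor, amounts in by_visitor.items()}
print(totals)
```

On a cluster, Pig compiles such a script into Hadoop jobs; the step-per-statement structure is what makes it read more like a transformation pipeline than like SQL.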

NoSQL

◦ Schema-less databases

◦ Storage and retrieval of huge amounts of unstructured data

◦ Scalable, flexible and cloud-friendly, but typically offer weaker consistency guarantees

◦ Examples: Cassandra, MongoDB, CouchDB
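"Schema-less" means records in the same collection need not share a fixed set of columns. The toy in-memory store below illustrates only that property; the `DocumentStore` class and its `put`/`get`/`find` methods are hypothetical and are not the API of Cassandra, MongoDB or CouchDB. It does not model distribution, replication or the consistency trade-off.

```python
import json


class DocumentStore:
    """Toy schema-less store: documents are dicts keyed by id, with no fixed columns."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id: str, doc: dict) -> None:
        # No schema check: each document may carry different fields.
        # The JSON round trip stores an independent copy of the document.
        self._docs[doc_id] = json.loads(json.dumps(doc))

    def get(self, doc_id: str):
        """Return the stored document, or None if the id is unknown."""
        return self._docs.get(doc_id)

    def find(self, field: str, value):
        """Return ids of documents whose `field` equals `value`; docs lacking it are skipped."""
        return [i for i, d in self._docs.items() if d.get(field) == value]


store = DocumentStore()
store.put("p1", {"name": "Asha", "city": "Chennai"})
store.put("p2", {"name": "Ben", "tweets": ["#bigdata"], "followers": 120})
store.put("p3", {"name": "Chen", "city": "Chennai", "device": {"os": "Android"}})
print(store.find("city", "Chennai"))
```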

Other Big Data Technologies

Search engines – Lucene, Solr, Elasticsearch, Amazon CloudSearch
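Lucene, and the Solr and Elasticsearch servers built on it, answer queries from an inverted index that maps each term to the documents containing it. A minimal sketch of that data structure, assuming a crude lowercase-and-split tokenizer and simple AND semantics; real engines add text analysis, relevance scoring and on-disk index segments, none of which is modeled here. The function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str):
    """Lowercase and split on non-word characters (a crude analyzer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_index(docs: dict):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query: str):
    """AND query: return ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Hadoop stores big data on commodity servers",
    2: "Solr and ElasticSearch are built on Lucene",
    3: "Lucene builds an inverted index over big data",
}
index = build_index(docs)
print(sorted(search(index, "big data")))
```

A lookup touches only the posting sets for the query terms rather than scanning every document, which is why this structure scales to large text collections.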

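Stream processing

◦ Apache Storm, Apache Spark (Spark Streaming), Yahoo’s S4

Low-latency query and execution engines

◦ Cloudera’s Impala (interactive SQL on Hadoop) and Apache Tez (DAG-based execution framework)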
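Stream processors compute results continuously over unbounded data, often in fixed time windows rather than over a complete dataset. The sketch below counts events per key in tumbling (non-overlapping) windows; it is plain Python under the assumption that events arrive in timestamp order, not the API of Storm, Spark or S4, and the function name and sample events are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, event key)


def tumbling_counts(events: Iterable[Event], window: int) -> Iterator[Tuple[int, Counter]]:
    """Emit (window_start, counts) each time a fixed-size window closes.

    Assumes events arrive in timestamp order; real engines also handle
    late and out-of-order data, which this sketch ignores.
    """
    current_start = None
    counts = Counter()
    for ts, key in events:
        start = ts - ts % window  # start of the window this event falls in
        if current_start is not None and start != current_start:
            yield current_start, counts  # previous window is complete
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last open window


stream = [(1, "click"), (3, "view"), (4, "click"), (11, "click"), (12, "view"), (25, "view")]
for start, counts in tumbling_counts(stream, window=10):
    print(start, dict(counts))
```

Results for a window are emitted as soon as a later event shows the window has closed, so memory use depends on the window size rather than on the length of the stream.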

Big Data – Vendors

Amazon

◦ Elastic MapReduce (EMR) – Amazon’s Hadoop distribution, run on AWS infrastructure

◦ “largest adoption of hadoop platforms in the market” – Forrester report

Cloudera

◦ Uses many aspects of open-source Hadoop

◦ Many features built on top of its Hadoop distribution, namely Cloudera Manager and Impala

◦ Strategy – stick to core Hadoop but innovate on top of it

Big Data - Vendors

Hortonworks

◦ Builds the open-source Hadoop ecosystem

◦ Also innovates, e.g. Ambari cluster management software

IBM

◦ InfoSphere BigInsights – analytics on data at rest

◦ InfoSphere Streams – analytics on data in motion

◦ Hadoop-based analytics

◦ Stream computing

◦ Data Warehousing

◦ Application development

Big Data - Vendors

Intel

◦ Develops a custom Hadoop distribution optimized for Xeon chips

◦ Closest affinity between hardware and software

MapR

◦ Fast-growing Hadoop distribution company

◦ Highest scores for distribution architecture and data processing capabilities

◦ Needs stronger branding

Big Data - Vendors

Microsoft

◦ Does not encourage open source, but promotes Hadoop

◦ HDInsight – Hadoop as a service running on Windows Azure, based on Hortonworks’ Hadoop distribution

◦ PolyBase – lets SQL Server query data stored in Hadoop

◦ Big presence in other markets enables it to deliver end-to-end Hadoop solutions

Big Data - Vendors

Teradata

◦ SQL and RDBMS specialization

◦ Partnered with Hortonworks

◦ Integrated Hadoop with existing SQL offerings

◦ Existing Teradata users can use the Hadoop platform to process warehouse data

Questions