big data

Why Big Data?

Upload: alfredo-favenza

Post on 13-Jan-2015


Page 1: Big data

Why Big Data?

Page 2: Big data
Page 3: Big data
Page 4: Big data
Page 5: Big data

Understanding Big Data

Page 6: Big data

KEY TRENDS

Cheap Storage: $100 buys 3 million times more storage than it did 30 years ago

Inexpensive Computing: 1980: 10 MIPS/$; 2005: 10M MIPS/$

Device Explosion: >5.5 billion (70+% of global population)

Social Networks: >2 billion users

Ubiquitous Connection: web traffic 2010: 130 exabytes (10^18 bytes); 2015: 1.6 zettabytes (10^21 bytes)

Sensor Networks: >10 billion

Page 7: Big data

What is Big Data?

Data sources, from traditional to big data:

ERP / CRM: Sales Pipeline, Payables, Payroll, Inventory, Contacts, Deal Tracking

WEB 2.0: Mobile, Advertising, Collaboration, eCommerce, Digital Marketing, Search Marketing, Web Logs, Recommendations

Internet of things: Audio / Video, Log Files, Text/Image, Social Sentiment, Data Market Feeds, eGov Feeds, Weather, Wikis / Blogs, Click Stream, Sensors / RFID / Devices, Spatial & GPS Coordinates

Axes: Volume versus Velocity - Variety - Variability

Volume scale: Gigabytes (10^9), Terabytes (10^12), Petabytes (10^15), Exabytes (10^18)

Storage/GB: 1980: $190,000; 1990: $9,000; 2000: $15; 2010: $0.07
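The storage prices on this slide can be checked against the "3 million times more storage" claim on the Key Trends slide. The following is a minimal arithmetic sketch using only the per-gigabyte prices listed above; the names `pricePerGB` and `gigabytesFor` are illustrative and not part of the deck.

```javascript
// Storage price per gigabyte in US dollars, as listed on the slide.
var pricePerGB = { 1980: 190000, 1990: 9000, 2000: 15, 2010: 0.07 };

// Gigabytes that a fixed budget buys at a given price per gigabyte.
function gigabytesFor(budget, price) {
    return budget / price;
}

var gb1980 = gigabytesFor(100, pricePerGB[1980]); // a fraction of a megabyte
var gb2010 = gigabytesFor(100, pricePerGB[2010]); // well over a terabyte
var factor = gb2010 / gb1980;                     // how much more $100 buys

console.log("1980: " + gb1980.toFixed(6) + " GB for $100");
console.log("2010: " + gb2010.toFixed(0) + " GB for $100");
console.log("Increase: about " + (factor / 1e6).toFixed(1) + " million times");
```

The ratio comes out at roughly 2.7 million, which is in line with the rounded "3 million times" figure.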

Page 8: Big data

Big Data, BIG OPPORTUNITY

Big Data is a top priority for institutions

49% of CEOs and CIOs are planning big data projects

Software Growth (billions of $)
2012: 1.8
2013: 2.5
2014: 3.4
2015: 4.6
34% compound annual growth rate [2]

Services Growth (billions of $)
2012: 2.7
2013: 3.9
2014: 5.1
2015: 6.5
39% compound annual growth rate [2]

1. McKinsey & Company, McKinsey Global Survey Results, Minding Your Digital Business, 2012
2. IDC Market Analysis, Worldwide Big Data Technology and Services 2012–2015 Forecast, 2012

Page 9: Big data

Big Data Scenarios

Page 10: Big data

Traditional E-Commerce Data Flow

OPERATIONAL DATA: NEW USER REGISTRY, NEW PURCHASE, NEW PRODUCT

Logs

ETL Some Data

Data Warehouse

Excess Data

Page 11: Big data

New E-Commerce Big Data Flow

OPERATIONAL DATA: NEW USER REGISTRY, NEW PURCHASE, NEW PRODUCT

Logs

Raw Data "Store it All" Cluster

Data Warehouse

How much do views for certain products increase when our TV ads run?
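A question like this one can only be answered if the raw view logs are retained rather than discarded before the warehouse. The sketch below is purely illustrative: the record shape (`product`, `minute`), the ad schedule, and the function names are all hypothetical, and the data is made up to show the calculation, not taken from the deck.

```javascript
// Hypothetical raw page-view events for one hour, as a "store it all"
// cluster might retain them: one record per product view.
function views(product, minutes) {
    return minutes.map(function (m) { return { product: product, minute: m }; });
}
var viewLog = []
    .concat(views("tv-stand", [0, 3, 7, 22, 25, 31, 38, 44, 50, 58])) // outside the ad
    .concat(views("tv-stand", [10, 11, 12, 13, 14, 15]))              // during the ad
    .concat(views("lamp", [5, 12, 40]));                              // another product

// Hypothetical ad schedule: the ad airs during minutes [start, end).
var adWindow = { product: "tv-stand", start: 10, end: 20 };

// Ratio of the per-minute view rate inside the ad window to the rate outside it.
function viewLift(log, ad, totalMinutes) {
    var during = 0, outside = 0;
    log.forEach(function (event) {
        if (event.product !== ad.product) { return; }
        if (event.minute >= ad.start && event.minute < ad.end) {
            during++;
        } else {
            outside++;
        }
    });
    var adMinutes = ad.end - ad.start;
    var otherMinutes = totalMinutes - adMinutes;
    // (during / adMinutes) / (outside / otherMinutes), rearranged to one division.
    return (during * otherMinutes) / (outside * adMinutes);
}

var lift = viewLift(viewLog, adWindow, 60);
console.log("View rate during the ad is " + lift + "x the rate outside it");
```

With a real cluster the same grouping and counting would run as a distributed job over the stored logs instead of an in-memory array.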

Page 12: Big data

Devices: Internet and Internet of things

Internet
User-centric, communication focus
Laptops / tablets / smartphones
Billions of networked devices
High-bandwidth access (cable: 10 Mbps+; fiber: 50-100 Mbps)
Global addressing
6+ billion people, 1.5 billion use the net
US: 4.3 devices per adult

Internet of things
Machine-centric, sensing focus
Invisible devices
Trillions of networked nodes: trillions of computer-enabled devices which are part of the IoT
Low-bandwidth last-mile connection (100 kbit/s)
Mostly addressed by local schemes

Page 14: Big data

Microsoft Hadoop Vision

Insights to all users by activating new types of data

Differentiation

Integrate with Microsoft Business Intelligence

Choice of deployment on Windows Server + Windows Azure

Integrate with Windows Components (AD, System Center)

Easy installation and configuration of Hadoop on Windows

Simplified programming with .NET & JavaScript integration

Integrate with SQL Server Data Warehousing

Page 15: Big data

Hadoop Distributed Architecture

Page 16: Big data

MapReduce: Move Code to the Data

FIRST, STORE THE DATA

[Diagram: Files stored across four Servers]

Page 17: Big data

SECOND, TAKE THE PROCESSING TO THE DATA

So How Does It Work?

// Map Reduce function in JavaScript

// map: split each input line into words and emit (word, 1) for every word.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

// reduce: sum the counts emitted for one word and write the total.
var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};
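To see what the two functions do without a cluster, they can be driven by a small in-memory harness. The sketch below is an assumption-laden stand-in, not the HDInsight runtime: `runLocally` is a hypothetical helper that imitates only the `context.write` call and the `hasNext()`/`next()` iterator shape used on the slide, and it runs everything in a single process.

```javascript
// Word-count map and reduce functions, as on the slide.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Hypothetical local stand-in for the Hadoop runtime (not part of HDInsight).
function runLocally(lines, mapFn, reduceFn) {
    // Map phase: group every emitted value under its key (the "shuffle").
    var groups = Object.create(null);
    var mapContext = {
        write: function (k, v) {
            if (!(k in groups)) { groups[k] = []; }
            groups[k].push(v);
        }
    };
    lines.forEach(function (line, offset) { mapFn(offset, line, mapContext); });

    // Reduce phase: call the reducer once per key, in sorted key order,
    // handing it an iterator with the hasNext()/next() shape used above.
    var output = {};
    var reduceContext = { write: function (k, v) { output[k] = v; } };
    Object.keys(groups).sort().forEach(function (k) {
        var vals = groups[k];
        var pos = 0;
        var iterator = {
            hasNext: function () { return pos < vals.length; },
            next: function () { return vals[pos++]; }
        };
        reduceFn(k, iterator, reduceContext);
    });
    return output;
}

var counts = runLocally(["Big data is big", "Data moves code to the data"], map, reduce);
console.log(counts);
```

On a real cluster the map calls run on the servers that hold each block of the input, and only the grouped intermediate pairs move across the network to the reducers.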

[Diagram: the RUNTIME sends the Code to four Servers]

Page 18: Big data

MapReduce – Workflow

Page 19: Big data

"Our weather model and resulting data sets should be accessible to universities and other institutions."
Aerospace Development Manager, U.S. Federal Government

"It takes more time to hand a project from the seismic guys to me to the engineers in production than it does to figure out the oil field plays."
Geologist, Major oil and gas company

MapReduce – Workflow

Page 20: Big data

Windows Azure HDInsight Service

Page 21: Big data

HDINSIGHT / HADOOP Eco-System

Distributed Storage (HDFS)
Distributed Processing (MapReduce)
Query (Hive)
Scripting (Pig)
NoSQL Database (HBase)
Metadata (HCatalog)
Data Integration (ODBC / SQOOP / REST)
Relational (SQL Server)
Machine Learning (Mahout)
Graph (Pegasus)
Stats processing (RHadoop)
Event Pipeline (Flume)
Pipeline / workflow (Oozie)
Active Directory (Security)
Monitoring & Deployment (System Center)
C#, F#, .NET
JavaScript
Azure Storage Vault (ASV)
PDW PolyBase
Business Intelligence (Excel, Power View, SSAS)
World's Data (Azure Data Marketplace)
Event Driven Processing

Legend:
Red = Core Hadoop
Blue = Data processing
Purple = Microsoft integration points and value adds
Orange = Data Movement
Green = Packages

Page 22: Big data
Page 23: Big data


HDFS on Azure: Tale of two File Systems

DFS (1 Data Node per Worker Role) and Compute Cluster
Name Node
Data Node, Data Node

HDFS API

Azure Storage (ASV)
Azure Blob Storage
Front end
Partition Layer
Stream Layer

Page 24: Big data

Azure Storage (ASV)
• Default file system for HDInsight Service
• Provides sharable, persistent, highly scalable storage with high availability (Azure Blob Store)
• Azure storage itself does not provide compute
• Fast access from compute nodes to data in same data center
• Several file systems, addressable via:
  asv[s]://<container>@<account>.blob.core.windows.net/<path>
• Requires storage key in core-site.xml:
  <property>
    <name>fs.azure.account.key.accountname</name>
    <value>enterthekeyvaluehere</value>
  </property>
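The addressing pattern and the key property above can be assembled programmatically. The sketch below is only a string-building illustration: `asvUri` and `accountKeyProperty` are hypothetical helper names, the container, account, and path values are made up, and it assumes the usual `scheme://` URI form with `asvs` as the secure variant.

```javascript
// Build an ASV address for a blob path (hypothetical helper).
// Assumes the form asv[s]://<container>@<account>.blob.core.windows.net/<path>.
function asvUri(container, account, path, secure) {
    var scheme = secure ? "asvs" : "asv";
    var cleanPath = String(path).replace(/^\/+/, ""); // drop leading slashes
    return scheme + "://" + container + "@" + account +
        ".blob.core.windows.net/" + cleanPath;
}

// Render the core-site.xml property that carries the storage key,
// following the property name pattern shown on the slide.
function accountKeyProperty(account, key) {
    return "<property>\n" +
        "  <name>fs.azure.account.key." + account + "</name>\n" +
        "  <value>" + key + "</value>\n" +
        "</property>";
}

var uri = asvUri("logs", "myaccount", "/2013/01/web.log", true);
var property = accountKeyProperty("myaccount", "enterthekeyvaluehere");
console.log(uri);
console.log(property);
```

Keeping the key in configuration rather than in the URI means job code can refer to data by address alone.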

Page 25: Big data
Page 26: Big data

Programming HDInsight

Existing Ecosystem: Hive, Pig, Mahout, Cascading, Scalding, Scoobi, Pegasus…

.NET: C#, F# Map/Reduce, LINQ to Hive, .NET management clients

JavaScript: JavaScript Map/Reduce, Browser hosted console, Node.js management clients

DevOps / IT Pros: PowerShell, Cross Platform CLI tools

Page 27: Big data

Building Developer Experiences

Authoring Jobs
App Integration

Core Hadoop
Consistent REST APIs
Breadth of Clients (Java, JS, .NET, etc.)
Authoring frameworks and languages
End User Tooling (IDEs, analyst tools, command lines)

Connectivity
Programmability
Security

Loosely coupled
Lightweight
Low cost to extend
Scenario oriented

Innovation flows upward
New compute models
Perf enhancements

Extend breadth & depth
Enable new scenarios
Integrate with current tool chains

Page 28: Big data

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
