Hadoop. An introduction for SQL Server DBAs.

Upload: andrewdenty

Post on 28-Nov-2014

DESCRIPTION

Hadoop - An introduction for SQL Server DBAs. Originally given to the Cambridge SQL Server User Group

TRANSCRIPT

Page 1: Hadoop - An introduction for SQL Server DBAs

Hadoop. An introduction for SQL Server DBAs.

Page 2: Hadoop - An introduction for SQL Server DBAs

Andrew Denty

Product Manager exploring Big Data

Red Gate Ventures

@andrewdenty

Page 3: Hadoop - An introduction for SQL Server DBAs

1. What is Hadoop?
2. Why you should care
3. How to get started

Page 4: Hadoop - An introduction for SQL Server DBAs

What we’re not going to talk about.

•  Replacing your existing servers with Hadoop
•  How Hadoop compares to other databases
•  How to write MapReduce or Java

Page 5: Hadoop - An introduction for SQL Server DBAs

Who has used Hadoop?

Page 6: Hadoop - An introduction for SQL Server DBAs

What is Hadoop?

•  Open source Apache project
•  Written in Java
•  Distributed system:
   – Shares large workloads
   – Commodity servers
   – Scales effectively

Page 7: Hadoop - An introduction for SQL Server DBAs

Compute:

•  MapReduce (Java-based distributed programming model)
•  YARN (Yet Another Resource Negotiator)

Storage:

•  HDFS (Hadoop Distributed File System)
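For a SQL Server DBA, the MapReduce model is easiest to picture as a GROUP BY spread over many machines. The sketch below is only a single-process illustration of the three steps (map, shuffle, reduce) using a word count; the function names are made up for this example, and real Hadoop jobs run the map and reduce steps in parallel across the cluster, typically in Java.

```python
from collections import defaultdict


def map_phase(line):
    # Mapper: emit a (key, value) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values by key, which the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reducer: collapse all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Roughly: SELECT word, COUNT(*) FROM words GROUP BY word
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "Big clusters"]))
```

Because each mapper only sees its own lines and each reducer only sees one key, the work can be split across as many servers as are available.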

Page 8: Hadoop - An introduction for SQL Server DBAs

•  JBOD
•  It’s just bytes
•  Scalable
•  Fault tolerant
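As a rough illustration of why HDFS is fault tolerant, the sketch below splits a file into fixed-size blocks and stores each block on several nodes. The 128 MB block size and replication factor of 3 match common HDFS defaults, but the round-robin placement, the node names and both function names are assumptions for this example; real HDFS uses a rack-aware placement policy.

```python
import math


def place_blocks(file_size_mb, nodes, block_size_mb=128, replication=3):
    """Return {block_index: [node, ...]} for a file of the given size."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    block_count = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(block_count):
        # Round-robin placement: a simplification, not HDFS's real policy.
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def readable_after_failure(placement, failed_node):
    # The file stays readable if every block keeps at least one live replica.
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


# Hypothetical 4-node cluster holding a 300 MB file.
layout = place_blocks(300, ["n1", "n2", "n3", "n4"])
print(layout)
print(readable_after_failure(layout, "n1"))
```

Adding nodes to the list gives more places to put blocks, which is the sense in which the storage scales out on commodity servers.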

Page 9: Hadoop - An introduction for SQL Server DBAs
Page 10: Hadoop - An introduction for SQL Server DBAs

Why should you care?

•  Never again throw away any data!
•  Once you’ve kept EVERYTHING you can then derive some insights from all of that data.

Page 11: Hadoop - An introduction for SQL Server DBAs

http://priceonomics.com/why-ups-trucks-dont-turn-left/

Page 12: Hadoop - An introduction for SQL Server DBAs
Page 13: Hadoop - An introduction for SQL Server DBAs

Salary

Page 14: Hadoop - An introduction for SQL Server DBAs

The things you can’t do with SQL Server

•  Distributed processing
•  Generating insight from vast quantities of structured and unstructured data.

Page 15: Hadoop - An introduction for SQL Server DBAs

The Hadoop Journey

1. Sandbox
2. 2-3 node cluster
3. Something in production

Page 16: Hadoop - An introduction for SQL Server DBAs

How to get started now:

•  Download and install a sandbox:
   – Hortonworks Sandbox - http://bit.ly/1gkkCte
   – Cloudera QuickStart VM - http://bit.ly/19eOwR3
   – MapR Sandbox - http://bit.ly/TWZynR
•  Fire it up, import some data with HDFS Explorer - http://bit.ly/1ivuSz5
•  Create a table
•  Run a query…
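The last two steps might look something like the sketch below, assuming the sandbox's Hive query editor is used (the slides do not say which tool). It builds a HiveQL CREATE EXTERNAL TABLE statement over files already imported into HDFS, plus a query that should look familiar to any SQL Server DBA. The table name, columns, HDFS path and the create_table_sql helper are all hypothetical, and the helper does no escaping of its inputs.

```python
def create_table_sql(table, columns, hdfs_dir, delimiter=","):
    """Build a HiveQL statement exposing delimited files in hdfs_dir as a table."""
    column_list = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {column_list}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}'"
    )


# Hypothetical table, columns and HDFS path, for illustration only.
ddl = create_table_sql(
    "web_logs",
    [("ip", "STRING"), ("ts", "STRING"), ("bytes_sent", "INT")],
    "/user/demo/web_logs",
)
query = "SELECT ip, SUM(bytes_sent) AS total_bytes FROM web_logs GROUP BY ip"

print(ddl)
print(query)
```

The printed statements can be pasted into the query editor. An external table leaves the imported files where they are in HDFS, so dropping the table does not delete the data.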

Page 17: Hadoop - An introduction for SQL Server DBAs

To sum up…

•  Hadoop is a distributed data storage and computation engine

•  Hadoop enables you to do things which were impossible with SQL Server… (and get paid more!)

•  Get started by downloading a Sandbox – it’s easy!

Page 18: Hadoop - An introduction for SQL Server DBAs

Andrew Denty

Product Manager exploring Big Data

Red Gate Ventures

@andrewdenty