A Cheat Sheet on Open Source, Hadoop and SAS for the Non-Geeks

Tamara Dull, Director of Emerging Technologies

#analyticsx @tamaradull

Copyright © 2016, SAS Institute Inc. All rights reserved.



Use the best tool for the job.

3 on Open Source

2 on Hadoop

1 on SAS


3 on Open Source


3 Topics on Open Source

The Myths

The Definitions

Proprietary Software Compared


Here are 5 myths that need to go away:

Myth: It’s free.
Reality: Licensing is free. That’s it.

Myth: It’s geekware.
Reality: Out of the gate, yes. Over time, no.

Myth: It’s not ready for the enterprise.
Reality: From 42% in 2010 to 78% in 2015. [1]

Myth: It’s hard to support.
Reality: Community support rules.

Myth: It’s not secure.
Reality: 55% believe OS is more secure. [1]

[1] Source: 2015 Future of Open Source Survey, North Bridge and Black Duck Software, April 2015


Let’s define “open [fill-in-the-blank]”:

open source software

open source

open source project

open source distribution

open data

ODPi


open source

Open source is something that can be modified because its design is publicly accessible.

~opensource.com

Source: https://www.apertus.org/opensource

While it originated in the context of computer software development, today the term open source designates a set of values – what we call "the open source way."

Open source projects, products, or initiatives are those that embrace and celebrate open exchange, collaborative participation, rapid prototyping, transparency, meritocracy, and community development.

~opensource.com


open source software

Free and open-source software (F/OSS) is computer software that can be classified as both free software and open-source software.

That is, anyone is freely licensed to use, copy, study, and change the software in any way, and the source code is openly shared so that people are encouraged to voluntarily improve the design of the software.

This is in contrast to proprietary software, where the software is under restrictive copyright and the source code is usually hidden from the users.

Open source software is software whose source code is available for modification or enhancement by anyone.

~opensource.com

The Fair Source License allows everyone to see the source code and makes the software free to use for a limited number of users in your organization. It offers some of the benefits of open source while preserving the ability to charge for the software. ~fair.io


open source project

An open source project is:

100% open source software

a collection of related functions

developed by volunteers

top-level project (TLP) vs. incubator

sub-projects

managed and distributed by an open source community, such as ASF


open source distribution

An open source distribution is:

a collection of related projects

may contain closed source projects/modules

typically managed and distributed by software providers

Example: Cloudera Distribution including Hadoop


open data

Whereas open source software is volunteer-developed software that is available for free, open data consists of public data sets that are available for free.

Tip: Don’t call it “open source data.”

Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share alike.

~Open Data Handbook


ODPi

The Open Data Platform Initiative (ODPi) brings industry leaders together to accelerate the adoption of Apache Hadoop and related Big Data technologies and make it easier to rapidly develop applications.


Proprietary Software Compared

WHO?
Proprietary: paid employees, contractors led by R&D, product management
Open source: volunteer group of peer developers collaborating

WHAT?
Proprietary: software is under restrictive copyright, source code is usually hidden from users
Open source: source code is available for modification or enhancement by anyone

WHERE?
Proprietary: companies, organizations
Open source: foundations, dev communities

WHEN?
Proprietary: customer demand, market conditions
Open source: developer(s) see/respond to a need

WHY?
Proprietary: to make money, part of IP
Open source: to give back

HOW MUCH?
Proprietary: license, subscription model – $0 to $$$; dev, support – $0 to $$$
Open source: software – $0; dev, support – $0 to $$$


Why do companies open source their software?

They want to cut costs and speed up development.

They want more eyeballs on the code – new/improved functionality, better security, etc.

It’s not part of their company’s IP or a way to make money.

It’s a way of giving back.


2 on Hadoop


2 Topics on Hadoop

The Hadoop Ecosystem

Hadoop Distributions Compared


What is Hadoop?

Is it a project…

…or an ecosystem?

Source: http://www.neevtech.com/blog/2013/03/18/hadoop-ecosystem-at-a-glance/


The yellow elephant remains strong after 10 years.

“This project [Hadoop] has sort of sparked a revolution in open source software.”

Doug Cutting, Co-creator of Hadoop and Chief Architect at Cloudera


Open source software remains strong after 20+ years…

…and SAS remains strong after 40 years.

[Figure: timeline of open source projects from 1990 to 2015, showing linux, drupal, R, python, android, firefox, wordpress, flink, cassandra, mySQL, spark and hadoop]


Apache Software Foundation (ASF) is a major player in OS big data technologies.

Source: https://projects.apache.org/projects.html?category


Hadoop distributions compared, by Gartner:

Source: http://blogs.gartner.com/merv-adrian/2015/12/24/supported-hadoop-stack-continues-expansion/


Hadoop distributions compared, by function:

[Figure: Hadoop components grouped by how many distributions support them: all 6 distributions, 5 of 6, and 4 of 6]


1 on SAS


1 Topic on SAS

How SAS Fits In


SAS 9.4 Supported Hadoop Distributions

Source: https://support.sas.com/resources/thirdpartysupport/v94/hadoop/hadoop-distributions.html


SAS technology that interacts with Hadoop:


SAS and Hadoop work together.

Coexistence is not a pipe dream; it’s here now.

SAS goes to the data in Hadoop.

It’s a two-way relationship: SAS makes calls to Hadoop/OSS and Hadoop/OSS calls back.

Hadoop is evolving (with rapidly revolving poster children) – and SAS is evolving with it.

Use the best tool for the job.


BONUS: How does SAS Viya fit in?


From opensource.com:

While it originated in the context of computer software development, today the term “open source” designates a set of values—what we call "the open source way."

Open source projects, products, or initiatives are those that embrace and celebrate open exchange, collaborative participation, rapid prototyping, transparency, meritocracy, and community development.
