#analyticsx @tamaradull
Copyright © 2016, SAS Institute Inc. All rights reserved.
A Cheat Sheet on Open Source, Hadoop and SAS for the Non-Geeks
Tamara Dull, Director of Emerging Technologies
Use the best tool for the job.
3 on Open Source
2 on Hadoop
1 on SAS
3 on Open Source
3 Topics on Open Source
The Myths
The Definitions
Proprietary Software Compared
Here are 5 myths that need to go away:
the open source myth… …and the reality
Myth: it’s free.
Reality: licensing is free. that’s it.
Myth: it’s geekware.
Reality: out of the gate, yes. over time, no.
Myth: it’s not ready for the enterprise.
Reality: from 42% in 2010 to 78% in 2015. [1]
Myth: it’s hard to support.
Reality: community support rules.
Myth: it’s not secure.
Reality: 55% believe OS is more secure. [1]
[1] Source: 2015 Future of Open Source Survey, North Bridge and Black Duck Software, April 2015
Let’s define “open [fill-in-the-blank]”:
open source software
open source
open source project
open source distribution
open data
ODPi
open
open closed
open source
Open source is something that can be modified because its design is publicly accessible.
~opensource.com
Source: https://www.apertus.org/opensource
While it originated in the context of computer software development, today the term open source designates a set of values, what we call "the open source way."
Open source projects, products, or initiatives are those that embrace and celebrate open exchange, collaborative participation, rapid prototyping, transparency, meritocracy, and community development.
~opensource.com
open source software
Free and open-source software (F/OSS) is computer software that can be classified as both free software and open-source software.
That is, anyone is freely licensed to use, copy, study, and change the software in any way, and the source code is openly shared so that people are encouraged to voluntarily improve the design of the software.
This is in contrast to proprietary software, where the software is under restrictive copyright and the source code is usually hidden from the users.
Open source software is software whose source code is available for modification or enhancement by anyone.
~opensource.com
The Fair Source License allows everyone to see the source code and makes the software free to use for a limited number of users in your organization. It offers some of the benefits of open source while preserving the ability to charge for the software. ~fair.io
open source project
An open source project is:
100% open source software
a collection of related functions
developed by volunteers
top-level project (TLP) vs. incubator
sub-projects
managed and distributed by an open source community, such as ASF
open source distribution
An open source distribution is:
a collection of related projects
may contain closed source projects/modules
typically managed and distributed by software providers
Example: Cloudera Distribution including Hadoop
open data
Whereas open source software is volunteer-developed software that is available for free, open data consists of public data sets that are available for free.
Tip: Don’t call it “open source data.”
Open data is data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute and share alike.
~Open Data Handbook
ODPi
The Open Data Platform Initiative (ODPi) brings industry leaders together to accelerate the adoption of Apache Hadoop and related Big Data technologies and make it easier to rapidly develop applications.
Proprietary Software Compared

WHO?
Proprietary: paid employees, contractors led by R&D, product management
Open source: volunteer group of peer developers collaborating

WHAT?
Proprietary: software is under restrictive copyright, source code is usually hidden from users
Open source: source code is available for modification or enhancement by anyone

WHERE?
Proprietary: companies, organizations
Open source: foundations, dev communities

WHEN?
Proprietary: customer demand, market conditions
Open source: developer(s) see/respond to a need

WHY?
Proprietary: to make money, part of IP
Open source: to give back

HOW MUCH?
Proprietary: license, subscription model: $0 to $$$; dev, support: $0 to $$$
Open source: software: $0; dev, support: $0 to $$$
Why do companies open source their software?
They want to cut costs and speed up development.
They want more eyeballs on the code: new/improved functionality, better security, etc.
It’s not part of their company’s IP or a way to make money.
It’s a way of giving back.
2 on Hadoop
2 Topics on Hadoop
The Hadoop Ecosystem
Hadoop Distributions Compared
What is Hadoop?
Is it a project…
…or an ecosystem?
Source: http://www.neevtech.com/blog/2013/03/18/hadoop-ecosystem-at-a-glance/
The yellow elephant remains strong after 10 years.
“This project [Hadoop] has sort of sparked a revolution in open source software.”
Doug Cutting, Co-creator of Hadoop and Chief Architect at Cloudera
Open source software remains strong after 20+ years…
…and SAS remains strong after 40 years.
[Timeline figure, 1990 to 2015, showing: Linux, Drupal, R, Python, Android, Firefox, WordPress, Flink, Cassandra, MySQL, Spark, Hadoop]
The Apache Software Foundation (ASF) is a major player in open source big data technologies.
Source: https://projects.apache.org/projects.html?category
Hadoop distributions compared, by Gartner:
Source: http://blogs.gartner.com/merv-adrian/2015/12/24/supported-hadoop-stack-continues-expansion/
Hadoop distributions compared, by function:
[Figure: functions grouped by how many of the 6 distributions support them: all 6, 5 of 6, 4 of 6]
1 on SAS
1 Topic on SAS
How SAS Fits In
SAS 9.4 Supported Hadoop Distributions
Source: https://support.sas.com/resources/thirdpartysupport/v94/hadoop/hadoop-distributions.html
SAS technology that interacts with Hadoop:
SAS and Hadoop work together.
Coexistence is not a pipe dream; it’s here now.
SAS goes to the data in Hadoop.
It’s a two-way relationship: SAS makes calls to Hadoop/OSS and Hadoop/OSS calls back.
Hadoop is evolving (with rapidly revolving poster children) – and SAS is evolving with it.
Use the best tool for the job.
BONUS: How does SAS Viya fit in?