Unite and Free your Data

Making Big Data Big Business East Coast Chapter Launch of Women in Big Data

Presented on: November 9, 2016

Contact Info:

Donna-M. Fernandez | Co-Founder & COO| [email protected] | 703.201.0605

| Page: 2

About me

Consultant

Sales & AE

Developer

Trainer

Project & Service Delivery Manager

Recruiter

Organizer

BS in CIS

Co-Founder & COO

Writer

| Page: 3

My inspiration

| Page: 4

Why Big Data? Because Big Data is Here to Stay

Source: IDG Enterprise Big Data Study, 2014

| Page: 5

Big Data Landscape 2016

Source: Matt Turck, Jim Hao, & FirstMark Capital

| Page: 6

Big Data = Big $

Source: Wikibon 2015

| Page: 7

“Be confident and be brave. Aim high and shoot for Bold Hairy Audacious Goals (BHAG). Make funny jokes.”

| Page: 8

“Be frugal and practical.”

| Page: 9

Overview

▪ Founded in April 2014
▪ A Big Data Integration and Advanced Analytics Solutions and Services company
▪ Innovators who leverage the best of what Open Source technology has to offer
▪ Certified Apache Spark Systems Integrator and Trainers
▪ Small women-owned & minority-owned business located in the DC area

| Page: 10

Community

VA-MD-DC Big Data Healthcare Meetup: http://www.meetup.com/VA-MD-DC-Big-Data-Healthcare-Meetup/
Washington DC Area Apache Spark Interactive: http://www.meetup.com/Washington-DC-Area-Spark-Interactive/
South Big Data Hub: http://www.southbdhub.org//
NVTC – Big Data and Analytics Committee: https://www.nvtc.org/community/bigdata.php

DataStart Awardee | 900+ members | 2,000+ members | 2016 Hottest Startup Nominee

| Page: 11

“Be bold. Ask and you shall receive.”

| Page: 12

Spark Overview

| Page: 13

So if Spark were the Justice League...

Source: Databricks Spark Survey Result 2016

LEARN MORE HERE: https://databricks.com/blog/2016/09/27/spark-survey-2016-released.html

Copyright: Justice League owned by DC Comics

SPARK CORE API (R, SQL, Python, Scala, Java)

SPARK SQL + DATAFRAMES

SPARK STREAMING

MACHINE LEARNING + ML PIPELINES

GRAPHX + GRAPHFRAMES

| Page: 14

Know the details. Be obsessive with learning the details.

| Page: 15

What do you need to know to master Spark?

First number indicates % of profiles with the given skill; second number indicates number of occurrences for that skill across all profiles.

Spark - 93% | 257
Java/J2EE - 80% | 183
Hadoop - 69% | 149
Python - 66% | 83
Scala - 58% | 96
Distributed Systems - 57% | 64
SQL - 49% | 91
C++ - 45% | 59
Linux - 41% | 55
Algorithms - 41% | 53
Machine Learning - 41% | 116
MapReduce - 36% | 57
Analytics - 34% | 51
Spark Streaming - 31% | 38
Cloud Computing - 29% | 39
Hive - 28% | 50
Cloudera - 24% | 34
Open Source - 24% | 35
HBase - 23% | 43
Git - 23% | 25
JavaScript - 22% | 27
Pig - 19% | 28
Kafka - 17% | 25
NoSQL - 15% | 23
AWS - 15% | 23
Storm - 13% | 21
UNIX - 13% | 17
HDFS - 13% | 18
Data Science - 12% | 14
Ant - 12% | 13
Hortonworks - 12% | 17
Cassandra - 11% | 20
ETL - 11% | 12
XML - 11% | 12

| Page: 16

Effective Spark learning techniques

▪ Under the gun! (Immediate Use)
▪ Classroom training with labs
▪ Get your hands dirty - build a small POC
– start with a fairly easy use case such as data cleanup or even word count
– finding data may be the 1st stumbling block, so if you don’t already have your inventory of open data, start with this list: https://analytics.club/free-big-data-sets-lists-and-links/
▪ Subscribe to the Spark Users List/Stack Overflow & regularly review posts; try responding to posts as you gain confidence
▪ Join a Spark Meetup!
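
The word-count starter use case mentioned above can be sketched in a few lines. This is plain Python rather than Spark, so it runs without a cluster; the comments note which Spark RDD operations (flatMap, map, reduceByKey) each step roughly corresponds to. The function name word_count and the sample lines are made up for illustration and are not from the slides.

```python
import re
from collections import Counter


def word_count(lines):
    """Count word occurrences across an iterable of text lines."""
    # Split each line into lowercase words (roughly Spark's flatMap step).
    words = (
        word
        for line in lines
        for word in re.findall(r"[a-z0-9']+", line.lower())
    )
    # Tally the words (roughly map to (word, 1) followed by reduceByKey).
    return Counter(words)


if __name__ == "__main__":
    # Hypothetical sample input; swap in lines from any open data set.
    sample = ["Unite and free your data", "Free your data with Spark"]
    for word, count in word_count(sample).most_common(3):
        print(word, count)
```

Once this logic is familiar, porting it to a real Spark job is mostly a matter of replacing the generator and Counter with the corresponding RDD or DataFrame operations.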

| Page: 17

Recommended learning resources

▪ Databricks YouTube Channel

https://www.youtube.com/channel/UC3q8O3Bh2Le8Rj1-Q-_UUbA

▪ Apache Spark YouTube Channel

https://www.youtube.com/channel/UCRzsq7k4-kT-h3TDUBQ82-w

▪ IBM Big Data University

https://bigdatauniversity.com/?s=spark

▪ Databricks Spark Reference Applications

https://www.gitbook.com/book/databricks/databricks-spark-reference-applications/details

▪ Databricks Blog

https://databricks.com/blog

▪ Spark Summit

http://spark-summit.org/

▪ Cloudera Blog

http://blog.cloudera.com/blog/category/spark/

▪ Scala Cheat Sheet

http://docs.scala-lang.org/cheatsheets/?_ga=1.181267810.438655960.1441909758

| Page: 18

Healthcare Analytics

| Page: 19

Don’t let your limitations hold you back. Find a way forward.

| Page: 21

The Problem…

▪ Data Formats: 80% of health data is unstructured and stored in hundreds of forms such as lab results, images, and medical transcripts (source: McKinsey Global Institute)
▪ Large Datasets: Sample data sets are often reduced or sacrificed because of their size and volume; genomics is a game changer
▪ Cycle Times: Analytic processing takes a long time, depending on the type of calculations and computations required
▪ Healthcare Policy: Meaningful Use (MU) and the Precision Medicine Initiative (PMI) are driving change and creating new needs
▪ Resources: Building and deploying analytic models requires specialized skills and can take a long time
▪ Privacy & Security: Personal health information is incredibly sensitive, so security and privacy are paramount

| Page: 22

“Sometimes leading with your heart instead of your brain opens up new doors.”

| Page: 23

The Marriage of Spark and FHIR = Smoking HOT analytics!

| Page: 24

What is FHIR?

▪ Fast Healthcare Interoperability Resources (FHIR) is the new standard for exchanging healthcare information
▪ “Best-of” standards and implementation resources from HL7 V2, HL7 V3, and HL7 CDA
– Uses basic building blocks called "resources" to model healthcare data at a granular level
– API driven and based on simple XML or JSON structures with an HTTP-based RESTful protocol
– Maintained by Health Level 7 (HL7) International
– Currently “Draft Standard for Trial Use 2”, which means it is in active development
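
To make the "resources over a RESTful API" idea concrete, here is a minimal sketch in Python. It builds the URL for a FHIR read interaction, which follows the pattern [base]/[type]/[id], and parses a hand-written, heavily trimmed Patient resource in JSON. The base URL https://example.org/fhir, the helper names read_url and summarize, and the sample resource are assumptions for illustration only; no real server or patient data is involved, and no network call is made.

```python
import json

# Hypothetical server base URL, used only for illustration.
BASE_URL = "https://example.org/fhir"


def read_url(resource_type, resource_id, base=BASE_URL):
    """Build the URL for a FHIR 'read' interaction: [base]/[type]/[id]."""
    return f"{base.rstrip('/')}/{resource_type}/{resource_id}"


# A hand-written, heavily trimmed Patient resource (not real data).
SAMPLE_PATIENT = """
{
  "resourceType": "Patient",
  "id": "example",
  "gender": "female",
  "birthDate": "1970-01-01"
}
"""


def summarize(resource_json):
    """Parse a resource and pull out a few top-level elements."""
    resource = json.loads(resource_json)
    return {
        "type": resource["resourceType"],
        "id": resource.get("id"),
        "gender": resource.get("gender"),
        "birthDate": resource.get("birthDate"),
    }


if __name__ == "__main__":
    print(read_url("Patient", "example"))
    print(summarize(SAMPLE_PATIENT))
```

Because each resource is a small, self-describing JSON document, a batch of them can be loaded into an analytics engine such as Spark with ordinary JSON tooling.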

| Page: 25

Why FHIR

▪ Efficient: A faster and more efficient way to exchange information, process analytics, and develop solutions
▪ Progressive: Based on progressive web-based API technology to process and manipulate healthcare data across various platforms, devices, and cloud technologies
▪ Flexible: Finer granularity, down to the data element level, for exchanging and processing information

| Page: 26

You are invited!

| Page: 27

“No excuses. Just get it done.”

| Page: 28

Thank you