
Unite and Free your Data

Making Big Data Big Business

East Coast Chapter Launch of Women in Big Data

Presented on: November 9, 2016

Contact Info:

Donna-M. Fernandez | Co-Founder & COO | donna@metistream.com | 703.201.0605

| Page: 2

About me

Consultant

Sales & AE

Developer

Trainer

Project & Service Delivery Manager

Recruiter

Organizer

BS in CIS

Co-Founder & COO

Writer

| Page: 3

My inspiration

| Page: 4

Why Big Data? Because Big Data is Here to Stay

Source: IDG Enterprise Big Data Study, 2014

| Page: 5

Big Data Landscape 2016

Source: Matt Turck, Jim Hao, & FirstMark Capital

| Page: 6

Big Data = Big $

Source: Wikibon, 2015

| Page: 7

“Be confident and be brave. Aim high and shoot for Bold Hairy Audacious Goals (BHAG). Make funny jokes.”

| Page: 8

“Be frugal and practical.”

| Page: 9

Overview

▪ Founded in April 2014

▪ A Big Data Integration and Advanced Analytics Solutions and Services company

▪ Innovators who leverage the best of what Open Source technology has to offer

▪ Certified Apache Spark Systems Integrator and Trainers

▪ Small women-owned and minority-owned business located in the DC area

| Page: 10

Community

VA-MD-DC Big Data Healthcare Meetup (900+ members)
http://www.meetup.com/VA-MD-DC-Big-Data-Healthcare-Meetup/

Washington DC Area Apache Spark Interactive (2,000+ members)
http://www.meetup.com/Washington-DC-Area-Spark-Interactive/

South Big Data Hub (DataStart Awardee)
http://www.southbdhub.org/

NVTC – Big Data and Analytics Committee (2016 Hottest Startup Nominee)
https://www.nvtc.org/community/bigdata.php

| Page: 11

“Be bold. Ask and you shall receive.”

| Page: 12

Spark Overview

| Page: 13

So if Spark were the Justice League...

Source: Databricks Spark Survey Result 2016

Learn more here: https://databricks.com/blog/2016/09/27/spark-survey-2016-released.html

Copyright: Justice League owned by DC Comics

SPARK CORE API (R, SQL, Python, Scala, Java)

SPARK SQL + DATAFRAMES

SPARK STREAMING

MACHINE LEARNING + ML PIPELINES

GRAPHX + GRAPHFRAMES

| Page: 14

Know the details. Be obsessive with learning the details.

| Page: 15

What do you need to know to master Spark?

Format: Skill - % of profiles with the given skill | number of occurrences of that skill across all profiles

Spark - 93% | 257
Java/J2EE - 80% | 183
Hadoop - 69% | 149
Python - 66% | 83
Scala - 58% | 96
Distributed Systems - 57% | 64
SQL - 49% | 91
C++ - 45% | 59
Machine Learning - 41% | 116
Linux - 41% | 55
Algorithms - 41% | 53
MapReduce - 36% | 57
Analytics - 34% | 51
Spark Streaming - 31% | 38
Cloud Computing - 29% | 39
Hive - 28% | 50
Open Source - 24% | 35
Cloudera - 24% | 34
HBase - 23% | 43
Git - 23% | 25
JavaScript - 22% | 27
Pig - 19% | 28
Kafka - 17% | 25
NoSQL - 15% | 23
AWS - 15% | 23
Storm - 13% | 21
HDFS - 13% | 18
UNIX - 13% | 17
Hortonworks - 12% | 17
Data Science - 12% | 14
Ant - 12% | 13
Cassandra - 11% | 20
ETL - 11% | 12
XML - 11% | 12

| Page: 16

Effective Spark learning techniques

▪ Under the gun! (Immediate Use)

▪ Classroom training with labs

▪ Get your hands dirty - build a small POC

– Start with a fairly easy use case such as data cleanup or even word count

– Finding data may be the first stumbling block, so if you don’t already have your inventory of open data, start with this list: https://analytics.club/free-big-data-sets-lists-and-links/

▪ Subscribe to the Spark users mailing list and Stack Overflow, and regularly review posts; try responding to posts as you gain confidence

▪ Join a Spark Meetup!
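As a starting point for the word-count POC suggested above, here is a minimal sketch in plain Python, so it runs without a cluster. It mirrors the shape a Spark job would typically take (split lines into words, emit a count of 1 per word, sum the counts per word). The sample lines and the function names `tokenize` and `word_count` are illustrative assumptions, not part of the original talk.

```python
import re
from collections import Counter


def tokenize(line):
    """Lowercase a line and split it into words (the 'flatMap' step)."""
    return re.findall(r"[a-z0-9']+", line.lower())


def word_count(lines):
    """Count word occurrences across all lines."""
    # Map step: emit one (word, 1) pair per word found.
    pairs = ((word, 1) for line in lines for word in tokenize(line))
    # Reduce-by-key step: sum the 1s for each distinct word.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts


# Hypothetical sample input; replace with lines read from an open data set.
sample_lines = [
    "Unite and free your data",
    "Big data is here to stay",
    "Free your data and get it done",
]

counts = word_count(sample_lines)
top_words = counts.most_common(3)
print(top_words)
```

Once this works locally, the same three steps map naturally onto Spark's `flatMap`, `map`, and `reduceByKey` operations on an RDD, which is one reasonable way to port the POC.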

| Page: 17

Recommended learning resources

▪ Databricks YouTube Channel

https://www.youtube.com/channel/UC3q8O3Bh2Le8Rj1-Q-_UUbA

▪ Apache Spark YouTube Channel

https://www.youtube.com/channel/UCRzsq7k4-kT-h3TDUBQ82-w

▪ IBM Big Data University

https://bigdatauniversity.com/?s=spark

▪ Databricks Spark Reference Applications

https://www.gitbook.com/book/databricks/databricks-spark-reference-applications/details

▪ Databricks Blog

https://databricks.com/blog

▪ Spark Summit

http://spark-summit.org/

▪ Cloudera Blog

http://blog.cloudera.com/blog/category/spark/

▪ Scala Cheat Sheet

http://docs.scala-lang.org/cheatsheets/?_ga=1.181267810.438655960.1441909758

| Page: 18

Healthcare Analytics

| Page: 19

Don’t let your limitations hold you back. Find a way forward.

| Page: 20

| Page: 21

The Problem…

Data Formats: 80% of health data is unstructured and stored in hundreds of forms such as lab results, images, and medical transcripts (source: McKinsey Global Institute)

Large Datasets: The number of sample data sets required is often compromised or sacrificed because of size and volume; genomics is a game changer

Cycle Times: Analytic processing takes a long time, depending on the type of calculations and computations required

Healthcare Policy: Meaningful Use (MU) and the Precision Medicine Initiative (PMI) are driving change and creating new needs

Resources: Building and deploying analytic models requires specialized skills and can take a long time

Privacy & Security: Personal health information is incredibly sensitive, so security and privacy are paramount

| Page: 22

“Sometimes leading with your heart instead of your brain opens up new doors.”

| Page: 23

The Marriage of Spark and FHIR = Smoking HOT analytics!

| Page: 24

What is FHIR?

▪ Fast Healthcare Interoperability Resources (FHIR) is the new standard for exchanging healthcare information

▪ “Best-of” standards and implementation resources from HL7 V2, HL7 V3, and HL7 CDA

– Uses basic building blocks called "resources" to model healthcare data at a granular level

– API driven and based on simple XML or JSON structures with an HTTP-based RESTful protocol

– Maintained by Health Level 7 (HL7) International

– Currently “Draft Standard for Trial Use 2”, which means it is in active development
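To make the "resources over a RESTful API" idea concrete, here is a minimal Python sketch under stated assumptions. It builds the URL for a FHIR read interaction (`[base]/[type]/[id]`) and flattens a Patient resource, supplied as JSON, into a simple row of the kind an analytics job might consume. The server base URL, the sample patient, and the function names `read_url` and `flatten_patient` are hypothetical; no network call is made.

```python
import json

# Hypothetical sample Patient resource; a real one would come from a FHIR server.
patient_json = """
{
  "resourceType": "Patient",
  "id": "example",
  "name": [{"family": ["Chalmers"], "given": ["Peter", "James"]}],
  "gender": "male",
  "birthDate": "1974-12-25"
}
"""


def read_url(base_url, resource_type, resource_id):
    """Build the URL for a FHIR 'read' interaction: [base]/[type]/[id]."""
    return f"{base_url.rstrip('/')}/{resource_type}/{resource_id}"


def flatten_patient(resource):
    """Flatten a Patient resource into a flat dict suitable for a table row."""
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a Patient resource")
    name = (resource.get("name") or [{}])[0]
    # 'family' may be a list of strings or a single string, depending on FHIR version.
    family = name.get("family", "")
    if isinstance(family, list):
        family = " ".join(family)
    return {
        "id": resource.get("id"),
        "family_name": family,
        "given_names": " ".join(name.get("given", [])),
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }


patient = json.loads(patient_json)
row = flatten_patient(patient)
# The base URL below is a placeholder, not a real endpoint.
url = read_url("https://fhir.example.org/base", patient["resourceType"], patient["id"])
print(url)
print(row)
```

Flattening each resource into a fixed set of columns is one possible design choice; it trades away nested detail for rows that are easy to load into a DataFrame.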

| Page: 25

Why FHIR

▪ Efficient: Faster and more efficient way to exchange information, process analytics, and develop solutions

▪ Progressive: Based on progressive web-based API technology to process and manipulate healthcare data across various platforms, devices, and cloud technologies

▪ Flexible: Lower level of granularity at the data element level to exchange and process information

| Page: 26

You are invited!

| Page: 27

“No excuses. Just get it done.”

| Page: 28

Thank you
