Unite and Free your Data
Making Big Data Big Business
East Coast Chapter Launch of Women in Big Data
Presented on: November 9, 2016
Contact Info:
Donna-M. Fernandez | Co-Founder & COO | [email protected] | 703.201.0605
| Page: 2
About me
Consultant
Sales & AE
Developer
Trainer
Project & Service Delivery Manager
Recruiter
Organizer
BS in CIS
Co-Founder & COO
Writer
| Page: 3
My inspiration
| Page: 4
Why Big Data? Because Big Data is Here to Stay
Source: IDG Enterprise Big Data Study, 2014
| Page: 5
Big Data Landscape 2016
Source: Matt Turck, Jim Hao, & FirstMark Capital
| Page: 6
Big Data = Big $
Source: Wikibon, 2015
| Page: 7
“Be confident and be brave. Aim high and shoot for Big Hairy Audacious Goals (BHAG). Make funny jokes.”
| Page: 8
“Be frugal and practical.”
| Page: 9
Overview
▪ Founded in April 2014
▪ A Big Data Integration and Advanced Analytics Solutions and Services company
▪ Innovators who leverage the best of what Open Source technology has to offer
▪ Certified Apache Spark Systems Integrator and Trainers
▪ Small women-owned & minority-owned business located in the DC area
| Page: 10
Community
VA-MD-DC Big Data Healthcare Meetup (900+ members)
http://www.meetup.com/VA-MD-DC-Big-Data-Healthcare-Meetup/
Washington DC Area Apache Spark Interactive (2,000+ members)
http://www.meetup.com/Washington-DC-Area-Spark-Interactive/
South Big Data Hub (DataStart Awardee)
http://www.southbdhub.org//
NVTC – Big Data and Analytics Committee (2016 Hottest Startup Nominee)
https://www.nvtc.org/community/bigdata.php
| Page: 11
“Be bold. Ask and you shall receive.”
| Page: 12
Spark Overview
| Page: 13
So if Spark were the Justice League...
Source: Databricks Spark Survey Result 2016
LEARN MORE HERE:
https://databricks.com/blog/2016/09/27/spark-survey-2016-released.html
Copyright: Justice League owned by DC Comics
SPARK CORE API (R, SQL, Python, Scala, Java)
SPARK SQL + DATAFRAMES
SPARK STREAMING
MACHINE LEARNING + ML PIPELINES
GRAPHX + GRAPHFRAMES
| Page: 14
Know the details. Be obsessive with learning the details.
| Page: 15
What do you need to know to master Spark?
First number indicates % of profiles with the given skill.
Second number indicates number of occurrences for that skill across all profiles.
Spark - 93% | 257
Java/J2EE - 80% | 183
Hadoop - 69% | 149
Python - 66% | 83
Scala - 58% | 96
Distributed Systems - 57% | 64
SQL - 49% | 91
C++ - 45% | 59
Linux - 41% | 55
Algorithms - 41% | 53
Machine Learning - 41% | 116
MapReduce - 36% | 57
Analytics - 34% | 51
Spark Streaming - 31% | 38
Cloud Computing - 29% | 39
Hive - 28% | 50
Cloudera - 24% | 34
Open Source - 24% | 35
HBase - 23% | 43
Git - 23% | 25
JavaScript - 22% | 27
Pig - 19% | 28
Kafka - 17% | 25
NoSQL - 15% | 23
AWS - 15% | 23
Storm - 13% | 21
UNIX - 13% | 17
HDFS - 13% | 18
Data Science - 12% | 14
Ant - 12% | 13
Hortonworks - 12% | 17
Cassandra - 11% | 20
ETL - 11% | 12
XML - 11% | 12
| Page: 16
Effective Spark learning techniques
▪ Under the gun! (Immediate Use)
▪ Classroom training with labs
▪ Get your hands dirty - build a small POC
– start with a fairly easy use case such as data cleanup or even word count
– finding data may be the first stumbling block, so if you don’t already have your own inventory of open data, start with this list:
https://analytics.club/free-big-data-sets-lists-and-links/
▪ Subscribe to the Spark Users List and Stack Overflow & regularly review posts; try responding to posts as you gain confidence
▪ Join a Spark Meetup!
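As a starting point for the word count POC suggested above, here is a minimal sketch in plain Python that mirrors the stages a Spark job would use. It does not use Spark itself; the function names (`flat_map_words`, `reduce_by_key`, `word_count`) and the sample lines are illustrative assumptions, and the comments note the PySpark RDD operations each stage roughly corresponds to.

```python
import re
from collections import defaultdict


def flat_map_words(lines):
    """Split every line into lowercase words.

    Roughly what rdd.flatMap(...) does in PySpark: one input line
    can produce many output words.
    """
    for line in lines:
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            yield word


def reduce_by_key(pairs):
    """Sum the values for each key.

    Roughly what rdd.reduceByKey(lambda a, b: a + b) does in PySpark,
    except that here everything runs in one process.
    """
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def word_count(lines):
    # Emit a (word, 1) pair per word, like rdd.map(lambda w: (w, 1)).
    pairs = ((word, 1) for word in flat_map_words(lines))
    return reduce_by_key(pairs)


# Hypothetical sample input; swap in lines read from an open data set.
sample_lines = [
    "Spark makes big data simple",
    "Big data is here to stay",
]

counts = word_count(sample_lines)
for word in sorted(counts):
    print(word, counts[word])
```

Once this logic is clear on a small sample, the same three stages can be ported to a real Spark job and run against a larger data set.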
| Page: 17
Recommended learning resources
▪ Databricks YouTube Channel
https://www.youtube.com/channel/UC3q8O3Bh2Le8Rj1-Q-_UUbA
▪ Apache Spark YouTube Channel
https://www.youtube.com/channel/UCRzsq7k4-kT-h3TDUBQ82-w
▪ IBM Big Data University
https://bigdatauniversity.com/?s=spark
▪ Databricks Spark Reference Applications
https://www.gitbook.com/book/databricks/databricks-spark-reference-applications/details
▪ Databricks Blog
https://databricks.com/blog
▪ Spark Summit
http://spark-summit.org/
▪ Cloudera Blog
http://blog.cloudera.com/blog/category/spark/
▪ Scala Cheat Sheet
http://docs.scala-lang.org/cheatsheets/?_ga=1.181267810.438655960.1441909758
| Page: 18
Healthcare Analytics
| Page: 19
Don’t let your limitations hold you back. Find a way forward.
| Page: 20
| Page: 21
The Problem…
▪ Data Formats: 80% of health data is unstructured and stored in hundreds of forms such as lab results, images, and medical transcripts (source: McKinsey Global Institute)
▪ Large Datasets: Required sample data sets are often cut back because of their size and volume; genomics is a game changer
▪ Cycle Times: Analytic processing takes a long time, depending on the type of calculations and computations required
▪ Healthcare Policy: Meaningful Use (MU) and the Precision Medicine Initiative (PMI) are driving change and creating new needs
▪ Resources: Building and deploying analytic models requires specialized skills and can take a long time
▪ Privacy & Security: Personal health information is incredibly sensitive, so security and privacy are paramount
| Page: 22
“Sometimes leading with your heart instead of your brain opens up new doors.”
| Page: 23
The Marriage of Spark and FHIR
= Smoking HOT analytics!
| Page: 24
What is FHIR?
▪ Fast Healthcare Interoperability Resources (FHIR) is the new standard for exchanging healthcare information
▪ Combines the “best of” the standards and implementation resources from HL7 V2, HL7 V3, and HL7 CDA
– Uses basic building blocks called "resources" to model healthcare data at a granular level
– API-driven and based on simple XML or JSON structures with an HTTP-based RESTful protocol
– Maintained by Health Level 7 (HL7) International
– Currently “Draft Standard for Trial Use 2” (DSTU2), which means it is in active development
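To make the "resources over a RESTful API" idea concrete, here is a minimal Python sketch that parses a Patient resource from JSON and builds the URL for a FHIR read interaction (`[base]/[type]/[id]`). The base URL, the sample patient, and the helper names (`read_url`, `summarize_patient`) are hypothetical and only for illustration; no network call is made. The sketch also assumes that `family` may arrive either as a list or as a single string, depending on the FHIR version of the server.

```python
import json

# Hypothetical FHIR server base URL, for illustration only.
FHIR_BASE_URL = "https://example.org/fhir"

# Made-up Patient resource; not real patient data.
SAMPLE_PATIENT_JSON = """
{
  "resourceType": "Patient",
  "id": "example-1",
  "name": [{"family": ["Doe"], "given": ["Jane"]}],
  "gender": "female",
  "birthDate": "1980-04-12"
}
"""


def read_url(base_url, resource_type, resource_id):
    """Build the URL for a FHIR read: GET [base]/[type]/[id]."""
    return f"{base_url.rstrip('/')}/{resource_type}/{resource_id}"


def summarize_patient(resource):
    """Pull a few granular data elements out of a Patient resource."""
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a Patient resource")
    names = resource.get("name") or [{}]
    first_name_entry = names[0]
    family = first_name_entry.get("family", [])
    # Assumption: accept both a list of strings and a single string.
    if isinstance(family, str):
        family = [family]
    given = first_name_entry.get("given", [])
    return {
        "id": resource.get("id"),
        "full_name": " ".join(given + family),
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }


patient = json.loads(SAMPLE_PATIENT_JSON)
summary = summarize_patient(patient)
url = read_url(FHIR_BASE_URL, patient["resourceType"], patient["id"])

print(url)
print(summary)
```

Because each resource is plain JSON with a predictable shape, records like this can be flattened into rows and handed to an analytics engine with little custom parsing.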
| Page: 25
Why FHIR
▪ Efficient: A faster and more efficient way to exchange information, process analytics, and develop solutions
▪ Progressive: Based on progressive web-based API technology to process and manipulate healthcare data across various platforms, devices, and cloud technologies
▪ Flexible: Lower level of granularity at the data element level to exchange and process information
| Page: 26
You are invited!
| Page: 27
“No excuses. Just get it done.”
| Page: 28
Thank you