Data Wrangling and the Art of Big Data Discovery


Upload: inside-analysis

Post on 14-Jul-2015

TRANSCRIPT

Page 1: Data Wrangling and the Art of Big Data Discovery

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Page 2: Data Wrangling and the Art of Big Data Discovery

The Briefing Room

Data Wrangling and the Art of Big Data Discovery

Page 3: Data Wrangling and the Art of Big Data Discovery

Twitter Tag: #briefr The Briefing Room

Welcome

Host: Eric Kavanagh

@eric_kavanagh

Page 4: Data Wrangling and the Art of Big Data Discovery

Mission

• Reveal the essential characteristics of enterprise software, good and bad

• Provide a forum for detailed analysis of today’s innovative technologies

• Give vendors a chance to explain their product to savvy analysts

• Allow audience members to pose serious questions... and get answers!

Page 5: Data Wrangling and the Art of Big Data Discovery

Topics

March: BI/ANALYTICS

April: BIG DATA

May: CLOUD

Page 6: Data Wrangling and the Art of Big Data Discovery

Should I Bring My Tools?

• Hammers aren’t good for plumbing!

• Big Data requires a new set of tools

• Preparing and Exploring are very different

• Don’t throw out your old tool box!

Page 7: Data Wrangling and the Art of Big Data Discovery

Analyst: Robin Bloor

Robin Bloor is Chief Analyst at The Bloor Group

@robinbloor

Page 8: Data Wrangling and the Art of Big Data Discovery

Trifacta and Zoomdata

• Trifacta offers a platform for data transformation and preparation

  - The interface is rich in visualization and provides previews and recommendations

  - The platform also includes a learning layer which employs machine learning algorithms to facilitate automation and self-learning

• Zoomdata is a Big Data exploration, visualization and analytics platform

  - The platform offers a wide range of analytics and BI tools, such as dashboards, stream processing and IoT analytics

  - Its pre-built connectors allow the Zoomdata server to connect directly to data sources

Page 9: Data Wrangling and the Art of Big Data Discovery

Guests:

Russ Cosentino is Vice President of Marketing & Business Development at Zoomdata. Throughout his career he has focused on developing solutions that leverage technology to solve business problems. His experience includes application development for mission-critical systems for the DoD, automated recruitment programs for the intelligence community, and the application of text analytics for commercial VOC programs.

Dr. Joe Hellerstein is Trifacta’s Chief Strategy Officer and a Professor of Computer Science at Berkeley. His career in research and industry has focused on data-centric systems and the way they drive computing. In 2010, Fortune Magazine included him in its list of the 50 smartest people in technology, and MIT Technology Review magazine included his Bloom language for cloud computing on its TR10 list of the 10 technologies “most likely to change our world.”

Page 10: Data Wrangling and the Art of Big Data Discovery

Data Wrangling and the Art of Big Data Discovery

Page 11: Data Wrangling and the Art of Big Data Discovery

DATA WRANGLING AND THE ART OF BIG DATA DISCOVERY

Dr. Joe Hellerstein, Professor, EECS Computer Science Division, UC Berkeley; Co-founder & Chief Strategy Officer, Trifacta

Russ Cosentino, Vice President, Marketing & Business Development, Zoomdata

Page 12: Data Wrangling and the Art of Big Data Discovery

Founded in 2012, from Berkeley/Stanford research roots

dp = data to the people: “facilitating interactions between people and data throughout the analytic lifecycle”

Stanford Visualization Group’s “Data Wrangler”

Elegant solutions for a messy world: The 80% problem of preparing data for exploratory analytics

Page 13: Data Wrangling and the Art of Big Data Discovery

TRADITIONAL APPROACH TO DATA MANAGEMENT

[Diagram labels: Data Sources (Structured); Ingest; ETL; ELT; Storage #1, 2, N; Store & Process; Enterprise Data Warehouse (EDW); Archive; Serve; Access Data; Analyze Data (Search, Statistical, Machine Learning, SQL); Implement; Optimize; Custom Application; Point Solution]

Page 14: Data Wrangling and the Art of Big Data Discovery

MANY PEOPLE INVOLVED IN THE PROCESS

DATA ARCHITECT

DATABASE ADMINISTRATOR

SYSTEM ADMINISTRATOR

BUSINESS ANALYST

BI ADMINISTRATOR

SYSTEM ADMINISTRATOR

Page 15: Data Wrangling and the Art of Big Data Discovery

IT COULD BE SIMPLER

DATABASE ADMINISTRATOR

BUSINESS ANALYST

Page 16: Data Wrangling and the Art of Big Data Discovery

MODERN DATA AND VISUALIZATION ENVIRONMENT

[Diagram labels: Data Sources (Structured, Unstructured); Ingest; Store & Process; Data Preparation; Serve; Visualization]

Page 17: Data Wrangling and the Art of Big Data Discovery

REAL BENEFITS OF A SELF-SERVICE APPROACH

Real-Time: +15% Cash Increase; +26% Pipeline Growth; -67% Cost Reduction

Page 18: Data Wrangling and the Art of Big Data Discovery

REAL BENEFITS OF A SELF-SERVICE APPROACH

Real-Time: +15% Cash Increase; +26% Pipeline Growth; -67% Cost Reduction

Big Data: +48% Speed of Delivery; +42% Self-Service Access; +40% Decision Quality

Page 19: Data Wrangling and the Art of Big Data Discovery

REAL BENEFITS OF A SELF-SERVICE APPROACH

Real-Time: +15% Cash Increase; +26% Pipeline Growth; -67% Cost Reduction

Interactive: +70% Collaboration; +64% Decision Speed; +61% User Adoption

Big Data: +48% Speed of Delivery; +42% Self-Service Access; +40% Decision Quality

Page 20: Data Wrangling and the Art of Big Data Discovery

DEMONSTRATION

Page 21: Data Wrangling and the Art of Big Data Discovery

MODERN DATA ARCHITECTURE FOR SELF-SERVICE INTELLIGENCE

Page 22: Data Wrangling and the Art of Big Data Discovery
Page 23: Data Wrangling and the Art of Big Data Discovery

Perceptions & Questions

Analyst: Robin Bloor

Page 24: Data Wrangling and the Art of Big Data Discovery

I am not a number!

To Round-Up & Wrangle

Robin Bloor, PhD

Page 25: Data Wrangling and the Art of Big Data Discovery

The Flow of Data

The movement of data from ACQUISITION through PREPARATION to ANALYSIS is not necessarily simple…

Page 26: Data Wrangling and the Art of Big Data Discovery

The General Picture

[Diagram labels: Data Sources; Data Streams; ETL; ROUND-UP; WRANGLING; Staging Area (Hadoop); Data Warehouse or other location; Analytics; Service Mgt; Life Cycle Mgt; Metadata Discovery; MDM; Metadata Mgt; Data Cleansing; Data Lineage]

Page 27: Data Wrangling and the Art of Big Data Discovery

Immediate Analytics & the Rest

IMMEDIATE ANALYTICS

• Metadata discovery

• Metadata management

• Data cleansing

• Data lineage

DOWNSTREAM

• MDM

• Service mgt

• Lifecycle mgt

• ETL

[Diagram labels: Data Sources; Data Streams; ETL; ROUND-UP; WRANGLING; Staging Area (Hadoop); Data Warehouse or other location; Analytics; Service Mgt; Life Cycle Mgt; Metadata Discovery; MDM; Metadata Mgt; Data Cleansing; Data Lineage]

Page 28: Data Wrangling and the Art of Big Data Discovery

The Analytics Business Process

• The main point to note is that it is iterative

• It has morphed, because of:

  - Data availability

  - Parallel technology

  - Scalable software

  - Open source tools

  - Machine learning

[Diagram labels: Data Access; Data Prep; Model; Analyze; Deploy; Execute]

Page 29: Data Wrangling and the Art of Big Data Discovery

Analytical Latencies

1.  Data access

2.  Data preparation

3.  Model development

4.  Execution

5.  Implementation

6.  Model audit & update

This is where the rubber meets the road: Speed = Value

Page 30: Data Wrangling and the Art of Big Data Discovery

The Impending Reality

Technology is speeding up analytics by TWO ORDERS OF MAGNITUDE

(on the IT side)

This is changing analytics

Page 31: Data Wrangling and the Art of Big Data Discovery

• Is your capability only relevant to analytics or does it have broader areas of application?

• Technically, what makes it fast?

• Please comment on analytical workloads:

  - What do you see as the natural IT bottlenecks?

  - What do you see as the natural business bottlenecks?

• Do we want business analysts to become ersatz data scientists?

Page 32: Data Wrangling and the Art of Big Data Discovery

• With respect to scale, what is your largest implementation by data volume, and what was the industry sector/problem space?

• Who do you partner with?

• What do you see as the largest barrier to adoption of Trifacta?

Page 33: Data Wrangling and the Art of Big Data Discovery

Page 34: Data Wrangling and the Art of Big Data Discovery

Upcoming Topics

www.insideanalysis.com

March: BI/ANALYTICS

April: BIG DATA

May: CLOUD

Page 35: Data Wrangling and the Art of Big Data Discovery

THANK YOU for your ATTENTION!

Some images provided courtesy of Wikimedia Commons and Wikipedia, including: "Multiple pliers" by Ed Stevenhagen from nl. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Multiple_pliers.jpg#mediaviewer/File:Multiple_pliers.jpg