Unlocking value in your (big) data
DESCRIPTION
The presentation is an introduction to Big Data and analytics: how to go about enabling big data and analytics in our company, what the main differences are between big data analytics and traditional analytics, and how to get started. This material was used at the SAS Big Data Analytics event held in Helsinki on 19th of April 2011. The slides are copyright of Accenture.
TRANSCRIPT
Unlocking Value in (Big) Data
Oscar Renalias
Copyright © 2012 Accenture All rights reserved.
About the presenter
Oscar is a Technology Architect and has been working at
Accenture in the Helsinki office for the last 5 years. He holds a
Bachelor’s Degree in Computer Science from the Universitat
Politècnica de Catalunya (UPC), in Barcelona.
Oscar currently belongs to the global organization within Accenture
responsible for pushing technology innovation, working with
selected new and emerging technologies together with clients to
generate business value. Hadoop/Big Data is one of those areas.
+358407725915
Oscar Renalias
Agenda
• Top 4 things about Big Data & Analytics
• What is Big Data?
• Big Data Analytics – what is it?
• What does it contain?
• How is it integrated?
• How do we manage it?
• What next?
Top 4 things about Big Data Analytics
• Resistance is futile, you will be assimilated
• Competitive advantage
• It’s different
• Data wants to be open
Data is growing
It’s growing. Quickly. And it’s everywhere.
Source: IDC’s Digital Universe Study (sponsored by EMC), June 2011
Chart: data stored in exabytes (10^18 bytes), by year: 2005: 130; 2010: 1227; 2015: 7910
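To put the chart in perspective, the three data points (130, 1227 and 7910 exabytes) can be turned into a yearly growth rate. The snippet below is a minimal sketch; it assumes the values belong to 2005, 2010 and 2015 in that order, and the names `data_stored_eb`, `cagr` and `growth_by_period` are made up for this illustration.

```python
# Values read from the IDC chart, in exabytes.
# Assumption: the three values map to the three years in axis order.
data_stored_eb = {2005: 130, 2010: 1227, 2015: 7910}


def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


def growth_by_period(series):
    """Growth rate for each pair of consecutive data points."""
    years = sorted(series)
    return {
        (start, end): cagr(series[start], series[end], end - start)
        for start, end in zip(years, years[1:])
    }


if __name__ == "__main__":
    for (start, end), rate in growth_by_period(data_stored_eb).items():
        print(f"{start}-{end}: {rate:.1%} per year")
```

A compound rate is used rather than a simple average because the growth builds on itself year after year, which is the point the slide is making.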
New kinds of data
Structured data vs. unstructured data growth
Chart labels: complex, unstructured data; relational data; our ability to analyze; analysis gap
Source: An IDC White Paper, sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.
Big Data Technologies
New technologies, new approaches
Source: Wordle for Credit Suisse, Does Size Matter Only?, September 2011
Where do analysts see Big Data?
Gartner’s Hype Cycle for Emerging Technologies 2011
MapReduce and Hadoop
MapReduce revolutionized how we handle large amounts of data; Hadoop made it simple and affordable
MapReduce
• Originally designed and first developed at Google as part of its efforts to index the web more efficiently
• Splits input data into smaller chunks that can be processed in parallel
• Scales linearly with the number of nodes
Hadoop
• Yahoo’s implementation of MapReduce
• Open source, top-level project in the Apache Foundation
• Designed to run on commodity software (Linux) and hardware (consumer-grade computers with directly attached storage)
• Large ecosystem of additional components (both open source and commercial)
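The MapReduce idea of splitting input into chunks that are processed in parallel can be illustrated with a small word count. This is a toy sketch in plain Python, not Hadoop code: the function names (`map_chunk`, `shuffle`, `reduce_counts`, `word_count`) are invented for the example, and a local thread pool stands in for the cluster nodes, so it shows the shape of the computation rather than real distributed speed-up.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by key (the word)."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, chunk_size=2, workers=4):
    """Split the input into chunks, map them in parallel, then reduce."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # The thread pool is only a stand-in for cluster nodes (assumption for
    # illustration); each chunk is mapped independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data", "big analytics", "data data"]))
```

Because each chunk is mapped without looking at any other chunk, adding more workers (or nodes) lets more chunks be processed at the same time, which is where the scaling property comes from.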
Big Data Analytics
What is it?
Big Data Analytics is a shift in how we think about analytics as an internal component of the organization
It focuses on productizing data in a way that rapidly drives meaningful insights, and on innovation that exploits missed opportunities in previously overlooked areas…
… providing a path to competitive advantage
Big Data Analytics vs. traditional analytics
Where do they differ?
Traditional Analytics
• Technology: assumes condensed, structured, and feature-rich datasets that can be modeled: relational databases, data warehouses, dashboards
• Skills: basic knowledge of reporting and analysis tools, few specialized resources
• Processes & Organization: “siloed” data organizations; only specific “views” of data visible across the enterprise
Big Data Analytics
• Technology: a stack of tools that enables an organization to build a framework for extracting useful features from a large dataset, to further understand how to model its data
• Skills: advanced analytical, mathematical and statistical knowledge required to develop new models – the data scientist
• Processes & Organization: data is productized and shared across the enterprise; dedicated data organizations with well-defined data management processes and ownership
Everything will be analyzed
The three Vs
Diagram: the three Vs. Variety (structured to unstructured), Velocity (batch to real-time), Volume. Source: IDC
Technology labels in the diagram: Hadoop, ETL; Relational, ETL; In-memory, NoSQL, event processing, EDW; Event processing, Hadoop + NoSQL
Big Data and Analytics in the Enterprise
Many technology choices in a rapidly changing environment. Which one is right for you?
• Analytics-Focused Massively Parallel Processing (MPP) Software Platforms
• Distributed In-memory
• Cloud
• Hardware Optimized MPP Data Warehouses
• Distributed Non-Relational Storage and Processing
• Big Data-Enabled Intelligence and Analysis
Technology
Augmenting existing analytics with Big Data technologies
Diagram labels: Emerging Data Technologies; Traditional Tools; Big Data Analytics
SAS-Hadoop integration
An example of how traditional analytics tools are evolving to interoperate with Hadoop
SAS/Access Interface to Hadoop
• Enables SAS users to analyze data stored in Hadoop
• Allows Hadoop data processing from SAS client software such as Data Integration Studio, Enterprise Guide and Enterprise Miner
• The Access Engine not only moves data into and out of Hadoop; it can also run data processing that is “pushed down” into Hadoop
SAS Data Integration Studio Transformation for Hadoop
• New sets of Hadoop transformations that enable DI Studio users to load and unload data from Hadoop faster than Sqoop (can connect to Oracle)
• Performs “ETL-like” processing with Hive and Pig
• Hadoop-specific scoring transform that enables models developed with Enterprise Miner to be deployed to Hadoop via DI Studio
The impact of Big Data Analytics on our landscapes
Hybrid landscapes, where old and new converge
Diagram labels:
• ERP, CRM, web logs, time series, files, social
• Relational DBs, Enterprise DW, real-time analytics
• HDFS, HBase, MapReduce, Hive, Pig
• ETL
• Data Services (REST, WS)
• Internal apps, customer-facing apps, mobile apps
• Analysis tools (SAS, SPSS, R, Tableau)
Data Science and the skill gap
Closing the loop – it’s not just about technology skills
Data science
“The sexy job in the next 10 years will be statisticians”
– Hal Varian, Chief Economist at Google
Data scientists are the next-generation analytics professionals, responsible for turning data into insight
Big Data Analytics Management
How does the Big Data Analytics management style differ?
In big data analytics, resources generally combine software engineering and advanced statistics skills. This mix of skill sets poses a challenge for project methodology.
Diagram: Software Methodologies and Analytics Methodologies
• Strategy, Release, Iteration, Daily, Continuous
• Requirements, Design, Implement, Verify, Maintain
Wrapping up
Big Data is challenging current patterns of thought
Cost-effective computing and storage
• Everything can be stored
• Cheap large-scale computing power readily available
Data “explosion”
• Data everywhere: structured, unstructured, other people’s data, geolocation data
Big Data and Analytics
• Resistance is futile
• Are the path to competitive advantage and create value
• Compared to traditional analytics, they’re different; adapt or become irrelevant
• Open your data
Wrapping up
How to get started
• Identify business processes that you could do more effectively with the help of big data and analytics
• Start with small but well-funded trials and proofs of concept, then evolve towards a solid roadmap
• Open up your data and transform towards a “data as a service” architecture
• Acquire or grow the needed technology and analytical skills
Accenture Technology Vision
http://bit.ly/accenturetechnologyvision2012
Strong advice on data for 2012