
DAMA NY CHAPTER PRESENTATION

“Big Data” & “The Cloud”

Extreme Performance Data Warehousing Inside Of The Cloud

Robert J. Abate, CBIP, CDMP
Solutions Principal, EIM & Analytics Practice
EMC Consulting

January 19th, 2012

© Copyright 2012 EMC Corporation. All rights reserved.

Big Data & The Cloud

AGENDA

• Background & Definitions

• The Challenge

• Architectural Solutions To Big Data

• It’s A Brave New World

• Example Case Studies

• Open Discussion…

Background & Definitions

“Big data will represent a hugely disruptive force during the next five years – enabling levels of insight – that are currently unachievable through any other means”

Gartner, May 2011

We Are Awash In Data

• In the information age, every organization is in the “data” business

• Data is growing exponentially, so are the challenges

• Complexity is causing insight to be lost

Source: IDC Digital Universe White Paper, Sponsored by EMC, May 2009


Pictorial Representation Of Information


Big Data: More Than Just About Volume

• Consider: Master Data, Fidelity, Complexity, Validity, Perishability, Linking Data

• Structured Transactional Data: POS transactions, call detail records, credit card transactions, shipping updates, purchase orders, payments, shipments, account transactions

• Unstructured Data: Web logs, newsfeeds, social media, geo-location, mobile, consumer comments, claims, doctor’s notes, clinical studies, images, video, audio

• Device-generated Data: RFID sensors, smart meters, smart grids, GPS spatial, micro-payments

[Figure: dimensions of big data (Velocity, Volume, Variety, Complexity) surrounded by data types: Video, Transactional Data, Industry-specific, Web traffic, Text, Social, Sensor/Smart Grid, Images, Audio, Documents, Location-based]

The Typical BI/DW Environment Today…


Big Data’s Potential For Actionable Insight

Today’s Situation
• Less than 10% of the enterprise’s data
• “Rear-view” mirror reporting, dashboards and analysis
– Weeks, months, or even quarters old
• Incomplete, inaccurate, and disjointed data
• Architectures and methods that take 6 to 18 months to exploit

Big Data Ramifications
• Vast majority of available sources and external data
• Forward looking or “Windshield-view” predictions with recommendations
– Real-time, near real-time
• Correlated, high confidence, governed data
• Vastly accelerated time to market

Time Really is Money!

“THE TIME VALUE CURVE” © 2007 - Dr. Richard Hackathorn, Bolder Technology, Inc., All Rights Reserved. Used with Permission.

[Figure: “The Time Value Curve”. Axes: Value (Value Lost) against Time (Data Lifecycle). Events along the curve: Business Event, Data Ready For Analysis, Information Delivered, Action Taken. Intervals: Capture Latency, Analysis Latency, Decision Latency, Action Time.]

Data Is Coming At Us Faster

In a recent TDWI survey of 450 CIOs

– 17% have a real time data warehouse

– 90% plan on having a real time warehouse

– 75% will replace to get to a real-time solution

“REAL TIME IS RAPIDLY BECOMING A NECESSARY FOUNDATION TO A DATA SOLUTION AND WITHOUT ARCHITECTURE THERE IS CHAOS!”

Data Is Coming From All Directions

Data is now commonly entering into the enterprise from external sources

– Government (Census, Revenues, …)

– Nielsen, NPD Group (Sales)

– Bloomberg, NYSE (Financial Position)

– Experian, TransUnion, Equifax (Credit Reporting)

– Google Maps, MapInfo (Geospatial, …)

– Radian 6, Biz360, … (Client Trend Data)

– Etc.

Need For Data Trust

Compliance with laws
– Revenue Canada, Sarbanes Oxley [SOX], Basel II, HIPAA, etc.

Lack of confidence in the data
– Reports utilizing same data do not report same totals or computations

Data not defined and readily available
– Multiple sources of data have to be rationalized at each project start-up thereby wasting valuable time & $ on every project

Data timeliness
– Manual process to collect, analyze and provide results

Data integrity
– Unknown filters, varying calculation/computations, fields used for data not indicative of field names, data passed along from one person to another to another to another…

Summation Of Challenges We Are Observing

• Business mandate to obtain more value out of the data (get answers)

• Variety of sources, amounts, types and granularity of data that customers want to integrate is growing exponentially

• Need to shrink the latency between the business event and the data availability for analysis and decision-making

• Advancing agility of information is key

• Need for Data trust and Compliance with regulations

The Challenge Of Big Data

“Old” Journey To Information Maturity [EIM]

Stages: Data Chaos, Defined Data, Master Data, Integrated Information, Data Analytics, Business Optimization

PROCESSES: Data Discovery, Data Governance, Data Integration, Data Mining

TOOLS: Data Discovery, Metadata, ETL Suite, BI / DW / OLAP

Data Chaos
• Same type of data means different things in different systems
• Ex: AT&T is the same as AT&T Inc

Defined Data
• Define common meanings.
• Ex: Determine the sources, types, and properties of grouped (e.g., customer) records

Master Data
• Publish and Subscribe to master data
• Ex: Single view of customer across all information systems

Integrated Information
• Bring metadata together with information for reporting (BI) and warehousing (drilling and hierarchies).

Data Analytics
• Analyzing the data.
• Looking for trends and correlations

Predictive Information
• Using the analyzed data to optimize operations
• Wiki Type Sharing Of Self-Provisioned Environments
• Atomic Data Analytics

The Information Issue Is…

Too many organizations are not using information to its full advantage:

– 1 in 3 business leaders frequently make critical decisions without the information they need

– 1 in 2 business leaders do not have access to the information across their organization needed to do their jobs.

– 3 in 4 business leaders say more predictive information would drive better decisions

Source: IBM Institute for Business Value, March 2009

Information Trust & Business Alignment

Harris Interactive recently polled 23,000 U.S. employees and found

– Only 37% said they have a clear understanding of what their organization is trying to achieve and why

– Only one in five was enthusiastic about their team and the organization’s / corporation’s goals

– Only one in five said they have a clear “line of sight” between their tasks and their team and organization’s goals

– Only 15% felt that their organization fully enables them to execute key goals

– Only 20% fully trusted the organization they work for

Viewed Using A Seasonal Analogy…

If a football team had these players on the field:

– Only 4 of the 11 players on the field would know which goal is theirs

– Only 6 of the 11 would care

– Only 3 of the 11 would know what position they play and what they are supposed to do

– 9 players out of 11 would, in some way, be competing against their own team rather than the opponent

Perceived Complicated Landscape

• BI/DW is perceived as not “enabling” the business
– Inhibitor to corporate progress: IT systems cannot be changed fast enough to meet market demands, seize opportunity or comply with a new requirement.
– Weak alignment between IT and business strategy: marked by an intractable language barrier.
– Business not always sure what Information or Dimensions they want or need: how can IT provide without requirements?
– BI/DW is not known as the source of innovations

• The complexity of systems has caused BI/DW to be reactive rather than proactive
– Silo’d solutions, db’s and applications with trapped business rules
– Multiple sources of information and no single “truth”
– No “Architectural Blueprints” to the enterprise…

The Business Intelligence Maturity Model


Advancing The Maturity Of Information…


The big data impacts to both business and IT are significant; early adopters will fundamentally change their industries

Business Expectations
• More agile, more real-time, more accurate decision-making
• Predict and spot changes in dynamic and volatile markets
• Deeper understanding of customer preferences and behavior
• Greater fidelity in risk assessment and compliance enforcement

IT Ramifications
• Enhanced user experience that delivers insights to any device
• Operationalization of data scientists and analytic insights
• Tools and processes for data quality, governance, and security
• Cloud for self-service, collaboration, agility, and cost reduction

“Big data poses a major opportunity for CIOs to drive added value for the business, by deriving insights and identifying patterns from the huge amounts of data available”

“Through 2015, organizations integrating high value, diverse new information sources and types into a coherent information management infrastructure will outperform industry peers financially by more than 20%”

Source: Gartner; “The New Value Integrator,” Insights from the Global Chief Financial Officer Study; July 2011

Architectural Solutions For Big Data

Big Data Requires Change…

Consider: 100 GB would store the entire US Census DB “basic” information set for every living human being on the planet:

– Age, Sex, Income, Ethnicity, Language, Religion, Housing Status, Location into a 128 bit set

– That equates to about 6.75 billion rows of about 10 columns

Consider the Large Hadron Collider at CERN

– Expected to produce 150,000 times as much raw data each year
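The census figures above can be checked with a little arithmetic. The sketch below uses only the numbers quoted on the slide (one 128-bit row per person, about 6.75 billion people); the assumption that the 150,000x factor for the Large Hadron Collider is measured against the roughly 100 GB census data set is ours, not stated on the slide.

```python
# Back-of-the-envelope check of the census example (figures taken from the slide).
WORLD_POPULATION = 6_750_000_000   # about 6.75 billion rows, one per person
BITS_PER_ROW = 128                 # all "basic" attributes packed into 128 bits

bytes_per_row = BITS_PER_ROW // 8                  # 16 bytes per person
census_bytes = WORLD_POPULATION * bytes_per_row    # total size in bytes
census_gb = census_bytes / 10**9                   # decimal gigabytes

# Assumption: the 150,000x factor is relative to the ~100 GB census data set.
LHC_FACTOR = 150_000
lhc_bytes_per_year = LHC_FACTOR * 100 * 10**9
lhc_pb_per_year = lhc_bytes_per_year / 10**15      # decimal petabytes

print(f"census data set: {census_gb:.0f} GB")
print(f"LHC raw data: {lhc_pb_per_year:.0f} PB per year")
```

Under these assumptions the census set comes to roughly 108 GB, consistent with the slide's "100 GB", and the collider figure to about 15 PB per year.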

The Big Change In Technologies

Consider that Relational technologies were invented to get data in and organized, not designed nor organized to get it out

– RDBMS’s were designed for efficient transaction processing on large data sets

▪ Adding, Updating

▪ Searching for & retrieving small amounts of data

[2] Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09

Data Warehouses Were An Answer

DW was classically designed as a “copy of transaction data specifically structured for query and analysis”

– General approach is bulk ETL into a DB designed for queries

Big data changes the answer

– “Traditional RDBMS-based dimensional modeling and cube-based OLAP turns out to be too slow or too limited to support asking the really interesting questions of warehoused data”[2]

“To achieve acceptable performance for highly order-dependent queries on truly large data, one must be willing to consider abandoning the purely relational database model”[2]

[2] Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09

Voluminous Data Sets…

What makes data sets large is repeated observations over time and/or space

– A web log has millions of visits over a handful of pages

– A retailer has 10K products and millions of customers, but billions of transactions

– High-resolution scientific data such as fMRI: 1K GB per view

– Large datasets have spatial or temporal dimensions

Cardinality (the number of distinct observations) is usually small relative to the total number of observations

Technology Solutions Appeared…


Let’s Talk Technical Solutions…

Sequential and/or Distributed File-Based Solutions
– Oracle Exadata, Hadoop, etc.

Columnar (compression) / Multi-Level Tables
– Solves challenge of retrieving entire row
– ParAccel, Vertica, Sybase, etc.

Distributed MPP
– Teradata, Greenplum, etc.

Polymorphic
– Combination of Columnar & MPP

Finding Answers Sequentially With OLTP

Random access is slower than sequential

The advantage gained by doing all data access in sequential order is often 4x – 10x

– Many orders of magnitude!

[2] Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09

Distributed File: Partitioning With OLTP

Partitioning can solve challenges of data growth, but true distributed processing utilizing MPP is best (author’s opinion)

Distributed File: Partitioning Viewed

Q: What was the total transactions (sales) amount for May 20 and May 21 2009?

-- Half-open date range: includes May 20 and May 21, excludes May 22
Select sum(sales_amount)
From SALES
Where sales_date >= to_date('05/20/2009','MM/DD/YYYY')
And sales_date < to_date('05/22/2009','MM/DD/YYYY');

[Figure: Sales Table partitioned by day (5/17, 5/18, 5/19, 5/20, 5/21, 5/22). Only the 2 relevant partitions are read.]

Source: Extreme Performance With Oracle Data Warehousing
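To make the partition pruning idea concrete, here is a minimal in-memory sketch. It is not how Oracle implements partitioning; the `sales_partitions` dictionary, the sample amounts, and the `total_sales` helper are all hypothetical and exist only to show that a date-range query needs to open just the partitions whose key falls in the range.

```python
from datetime import date

# Hypothetical SALES table stored as one partition per day:
# partition key (sales_date) -> list of sales_amount values.
sales_partitions = {
    date(2009, 5, 17): [100, 250],
    date(2009, 5, 18): [75],
    date(2009, 5, 19): [300, 20],
    date(2009, 5, 20): [40, 60],
    date(2009, 5, 21): [500],
    date(2009, 5, 22): [10, 10],
}

def total_sales(partitions, start, end_exclusive):
    """Sum sales for start <= sales_date < end_exclusive.

    Only partitions whose key falls in the range are opened; the others
    are skipped using the partition key alone (partition pruning).
    """
    partitions_read = []
    total = 0
    for day in sorted(partitions):
        if start <= day < end_exclusive:
            partitions_read.append(day)
            total += sum(partitions[day])
    return total, partitions_read

total, partitions_read = total_sales(
    sales_partitions, date(2009, 5, 20), date(2009, 5, 22)
)
print(total, [d.isoformat() for d in partitions_read])
```

With this sample data, only the 5/20 and 5/21 partitions are read; the other four days are never scanned.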

Distributed File: Open Source (Hadoop)

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license.

– It enables applications to work with thousands of nodes and petabytes of data

– Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.

Hadoop is a top-level Apache project being built and used by a global community of contributors using the Java programming language.

– Yahoo! has been the largest contributor to the project, and uses Hadoop extensively across its businesses.

Source: Wikipedia “Hadoop”
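The MapReduce model that inspired Hadoop can be illustrated in a few lines. The sketch below runs in a single process and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the word-count task are illustrative assumptions, chosen only to show the map, group-by-key, and reduce steps.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit one (word, 1) pair for each word in the input record."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def map_reduce(records):
    """Run map, shuffle and reduce over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

counts = map_reduce(["big data big cloud", "cloud data"])
print(counts)
```

In a real cluster the map and reduce calls would run on many nodes, with the framework handling the shuffle between them; here everything runs locally to keep the structure visible.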

Distributed File: Hash-Based Distribution

In a hash-based data distribution, the data is distributed across multiple platforms for parallelism of queries…
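A hash-based distribution can be sketched as follows. This is a simplified illustration, not any vendor's implementation: the node count, the choice of CRC32 as the hash, and the helper names (`node_for`, `distribute`, `parallel_sum`) are assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical number of platforms (nodes)

def node_for(key, num_nodes=NUM_NODES):
    """Pick a node by hashing the distribution key (CRC32 is stable across runs)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def distribute(rows, key_field, num_nodes=NUM_NODES):
    """Assign every row to exactly one node based on its distribution key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_field], num_nodes)].append(row)
    return nodes

def parallel_sum(nodes, field):
    """Each node computes a partial sum; the partial results are then combined."""
    partials = [sum(row[field] for row in rows) for rows in nodes.values()]
    return sum(partials)

rows = [{"customer_id": i, "amount": i * 10} for i in range(1, 9)]
nodes = distribute(rows, "customer_id")
print({node: len(node_rows) for node, node_rows in nodes.items()})
print(parallel_sum(nodes, "amount"))
```

Because the same key always hashes to the same node, rows for one customer stay together, and each node can scan its share of the data independently.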

Columnar: Storage

In a table with, say, 256 columns, a lookup will retrieve all the data in the row (disk bound)

Columnar storage reduces this I/O bandwidth by storing column data using compression

– State (50 combinations stored)

– Master (compressed) table has pointers to State

Source: Vertica Website
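The State example corresponds to what is commonly called dictionary encoding: each distinct value is stored once and the column itself holds small integer pointers. The sketch below is a generic illustration under that assumption, not Vertica's actual storage format; `dictionary_encode` and `decode` are hypothetical helper names.

```python
def dictionary_encode(column):
    """Store each distinct value once and replace the column with integer codes."""
    dictionary = []   # distinct values, in order of first appearance
    index = {}        # value -> position in the dictionary
    codes = []        # one small integer per row, pointing into the dictionary
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes

def decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]

state = ["NY", "NJ", "NY", "CA", "NY", "NJ"]
dictionary, codes = dictionary_encode(state)
print(dictionary, codes)
```

For a State column, the dictionary never grows beyond the 50 or so distinct values, no matter how many rows the table holds.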

Columnar: Multi-Level Table Partitioning

In multi-level table partitioning, data distribution occurs across multiple platforms in segmented tables for distribution of columnar queries

This reduces the amount of work performed by each platform

MPP Shared Nothing Architectures

Extreme scalability

Elastic Expansion & Self-Healing Fault-Tolerance

Unified Analytics


Source: “Greenplum Database 4.0: Critical Mass Innovation”, White Paper, August 2010


The “Ideal” – MPP Shared Nothing

Poly-Morphic Storage: Tabular, Columnar, NoSQL, etc.

It’s A Brave New World

From the Old Stack to a New Ecosystem: Drivers for Change

Many new data sources (organic growth, data services, M&A)
– Impractical to add new data sources because of tightly coupled pipeline

More unstructured data, including social media
– Lack of access to unstructured data; need analytics and classifiers that operate on it

Less up front data integration
– Can’t assume data is pre-integrated – have to be able to locate and to query federated sources of data and content

More need to track and leverage metadata
– Metadata is fragmented, jailed and inconsistent – need agile, community approach

Need for flexible, agile data structures
– Current structures are too rigid, and too close to the sources or the business reports

More emphasis on dynamic views for purpose
– Need dynamic planning, creation and structuring of views that support analytics

Information governance and management in a federated, regulated world
– Need flexible policy expression and enforcement, not just at point of access

An Information Platform with New DNA To Promote Agility, Business Value and Community

1. Coordinated ingestion of diverse information, changes, events

2. Metadata driven processing and management

3. Nuanced optimization – on demand, multi-source, matching information needs

4. Broader reach of query – contextual search, federation, materialization

5. Freedom from imposed information structure – roll your own structure!

6. Navigation through information – contextual, faceted, multi-dimensional

7. Visualization of information – heat, clouds, clusters, flows

8. New data paths engendered by patterned consumption of entities

9. Reasoning about data set location, derivation, freshness, and obligations

10. User empowerment – collaboration and talent development

Businesses Want Integrated, Timely Information for Purpose

Area: Revolution

Latency: “Microbatch is the new Batch”

Enrichment: “Tagging is the new Transformation”

Query: “Query is the new ETL”

Federation: “Query Director is the new Query Optimizer”

Source: “Purposeful View is the new Master”

Some Of The Newer Trends In Big Data

Powerful Analytics
– What if, What will happen next, …
– Self-service analytics?
▪ Build your own sandbox of data…

Data Cloud Surrounded Warehouse
– Data Virtualization
▪ Abstracting the data from the systems, it complements existing data warehouses
– Many times the size of structured warehouse
– Provides for rapid analytic iterations

When You Link Structured & Unstructured Information You Get…

Powerful Analytical Engines

What is the best price to sell my product?

How Do I Do This?

How Do I Do This #2?

How Do I Do This #3?

Visualize The Information…

Analytics: A Picture Is Worth A 1,000 Words

Data Virtualization Example

Data Virtualization In Practice

Enterprise Big Data Cloud

The Future Of Data Warehousing?

The “Ideal” Abate Enterprise Data Cloud

Truly Virtualized Data Environment

Extreme Scale, Elastic Expansion

Automated Metadata Discovery, Classification & Tagging

Linearly Scalable
– Add 1x and get 2x performance

Self-Service Provisioning

Single Point Of Management
– Resource utilization optimization

Secure, Unified Data Access
– Single Point of Entry
– Portal based sharing of data sandboxes (wiki-type)

Reduce TCO By Eliminating Excessive Licensing Fees
– Use of open source community to improve solution

Example Case Studies

Telecom Provider Learns A Lesson…

BIG DATA ANALYTICS USE CASE

Before investing millions of dollars in infrastructure, a provider learned which investments would pay off…

Challenge
– 100TB Traditional EDW, Single Source Of Truth
– Operational Reporting & Financial Consolidation
– Heavy Governance And Control
– Unable To Support Critical Business Initiatives
– Customer Loyalty And Churn The #1 Business Initiative From The CEO

Enterprise Data Cloud Architecture-Based Solution
– Extracted Data From EDW & Other Sources
– Generated Social Graph From Call Detail And Subscriber Data
– Within 2 Weeks Found “Connected” Subscribers 7X More Likely To Churn Than Average Users
– Now Deploying 1PB Production
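The slide does not say how the social graph was built, so the following is only a rough sketch of the general idea: treat each call detail record as an edge between two subscribers, then look at who is directly connected to subscribers that have already churned. The sample records, the `churned` set, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical call detail records: (caller, callee) pairs.
calls = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
# Hypothetical set of subscribers who have already churned.
churned = {"a"}

def build_graph(call_records):
    """Build an undirected graph: subscriber -> set of subscribers they talk to."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph

def connected_to_churners(graph, churned_subscribers):
    """Return active subscribers with a direct call link to any churned subscriber."""
    neighbours = set()
    for subscriber in churned_subscribers:
        neighbours |= graph.get(subscriber, set())
    return neighbours - set(churned_subscribers)

graph = build_graph(calls)
at_risk = connected_to_churners(graph, churned)
print(sorted(at_risk))
```

A production analysis would weight edges by call volume and compare churn rates between the connected and unconnected groups; this sketch stops at identifying the connected subscribers.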

Drive Multi-channel Campaign Optimization

BIG DATA ANALYTICS USE CASE

Retailer increases in-flight multi-channel effectiveness with customer and product insights

[Figure: Likelihood Of Conversion (LOW to HIGH), rising from Legacy System through Advanced Analytics to Big Data Analytics]

• Monitor cross-channel product sales effectiveness

• Integrate customer behavioral data with social media sentiment data to yield new market, product and campaign insights

Innovate With Big Data Analytics

BIG DATA ANALYTICS USE CASE

Big Data Analytics Accelerate Health Care 2.0 for Evidence-based Care Provider

[Figure: Quality of Care (LOW to HIGH), rising from Legacy System BI Reporting through Advanced Analytics to Big Data Analytics. TRADITIONAL DATA LEVERAGED: Treatment Pathways on Summary Data. BIG DATA LEVERAGED: Treatment Pathways on All the Data.]

• Delivering 10 Years Of Data In Seconds

• Associative Rule Mining and User Clustering Improves Pathways

• External Data Sources Enable Personalized Medicine

Open Exchange Of Ideas…

Speaker Contact Information:

Robert J. Abate, CBIP
[email protected]
(201) 745-7680

Credits To Quoted Authors

Adam Jacobs is senior software engineer at 1010data Inc., where, among other roles, he leads the continuing development of Tenbase, the company’s ultra-high-performance analytical database engine. He has more than 10 years of experience with distributed processing of big datasets, starting in his earlier career as a computational neuroscientist at Weill Medical College of Cornell University (where he holds the position of Visiting Fellow) and at UCLA. He holds a Ph.D. in neuroscience from UC Berkeley and a B.A. in linguistics from Columbia University. (QUOTED FROM: “The Pathologies of Big Data”, 7/6/09)

Bill Schmarzo has over two decades of experience in data warehousing, BI and analytic applications (Metaphor Computers, 1984). Bill authored the Business Benefits Analysis methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute faculty as the head of the analytic applications curriculum. Bill was VP of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Web Site analytics products, including the delivery of “actionable insights” through a holistic user experience. For Business Objects, Bill oversaw the Analytic Applications business unit including the development, marketing and sales of Business Objects’ industry-leading analytic applications.

Donald Sutton has over 20 years experience in Data Architecture, Analysis, Modeling, ETL, Implementation and Integration in the areas of Data Entry (OLTP) or ERP and 3rd Party COTS Applications, Operational Data Store (ODS), Master Data Store (MDS), Data Warehouse (DW) and Data Marts (DM) while providing Business Intelligence (BI) from multiple sources above. Passionate and motivated about sound design of data structures in all different data layers and the representation and transformation of data with the accounting and governance of data throughout all data layers while Providing Business Intelligence (BI) and analytics with Key Performance Indicators (KPI) along with business modeling in translating business requirements to data requirements. (QUOTED FROM: Current Warehousing Environment & Analytics Visualizations)