
Page 1: Mar-10 Improving Data Management through utilizing Big Data - Mapping a Technology to a Data Concept v1

Improving Data Management through Utilizing Big Data: Mapping a Technology to a Data Concept
March 10, 2015
Mike Jennings – Walgreens Boots Alliance

©2015 Walgreens Boots Alliance. All rights reserved. Confidential and proprietary information

Page 2

Defining Big Data


Describes any voluminous amount of structured, semi-structured, and unstructured data that has the potential to be analyzed for information.

From www.bizcubed.com.au

Page 3

Enterprise Data Management Framework: Starting EDM Definition


Page 4

Enterprise Data Management Framework: Context with the DMBOK Framework


Page 5

Enterprise Data Management Framework: Alternative EDM Framework

Framework components shown in the diagram: Metadata Management; Data Context; Data Model/Classification; Data Structure and Framework; Structured Data Management; Unstructured Data Management; Master Data & Reference Data Management; Business Intelligence & Data Warehousing; Data Quality Management; Data Security Management; Data Integration Management; Data Delivery Management; Data Governance (Policies, Processes, Standards, Organization, and Stewardship).



Page 9

DMBOK Functions & Big Data Projects: Data Storage & Operations


The technologies and processes organizations use to maximize or improve the performance of their data storage resources.

File system that provides the ability to store large volumes of structured and unstructured data

Operations, resource (node), and scheduling management for reads from and writes to the cluster

Workflow scheduling component for data transformations

Manages services, configurations, and their synchronization across the cluster
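The workflow scheduling idea above can be illustrated with a minimal Python sketch (standard library only; graphlib needs Python 3.9+): each data transformation runs only after the steps it depends on have finished. The step names and the run_workflow helper are hypothetical; a production workflow scheduler also handles retries, time-based triggers, and distributed execution, none of which is modeled here.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, dependencies):
    """Run each step after all of its upstream steps; return the execution order.

    steps: step name -> zero-argument callable (hypothetical transformation tasks)
    dependencies: step name -> set of step names it depends on
    """
    # Include every step, even those with no dependencies.
    graph = {name: dependencies.get(name, set()) for name in steps}
    order = []
    # static_order() yields dependencies before dependents and raises
    # graphlib.CycleError if the dependencies contain a cycle.
    for name in TopologicalSorter(graph).static_order():
        steps[name]()
        order.append(name)
    return order


# Hypothetical three-stage pipeline: ingest -> cleanse -> aggregate
log = []
steps = {
    "aggregate": lambda: log.append("aggregate"),
    "cleanse": lambda: log.append("cleanse"),
    "ingest": lambda: log.append("ingest"),
}
dependencies = {"cleanse": {"ingest"}, "aggregate": {"cleanse"}}
print(run_workflow(steps, dependencies))  # ['ingest', 'cleanse', 'aggregate']
```

Declaring dependencies rather than a fixed sequence lets the scheduler work out a valid order itself and reject circular definitions.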

Page 10

DMBOK Functions & Big Data Projects: Data Integration & Interoperability


The combination of technical and business processes used to combine data from disparate sources into a meaningful and unified view, according to business requirements and accepted practices.

Provides real‐time processing of data streams for monitoring and alerts.

Provides the ability to import data from an RDBMS into HDFS.

Provides the ability to collect, aggregate, and move large log files into HDFS from sources such as apps, GPS, social media, sensors, and others.

Provides high-volume, fault-tolerant publish & subscribe messaging for real-time analysis.
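A minimal in-memory sketch of publish & subscribe messaging follows. It assumes a broker design in which messages are appended to an ordered log and each subscriber tracks its own read offset, so several consumers can read the same stream independently. The Topic class and its methods are hypothetical and do not reflect any particular product's API; fault tolerance (replicating the log across nodes) is not modeled.

```python
from collections import defaultdict


class Topic:
    """Hypothetical in-memory publish & subscribe topic.

    Messages are appended to an ordered log; each subscriber keeps its own
    read offset, so consuming a message does not remove it for others.
    Replication, partitioning, and persistence are intentionally omitted.
    """

    def __init__(self):
        self._log = []
        self._offsets = defaultdict(int)  # subscriber name -> next offset to read

    def publish(self, message):
        self._log.append(message)
        return len(self._log) - 1  # offset assigned to the new message

    def poll(self, subscriber, max_messages=10):
        start = self._offsets[subscriber]
        batch = self._log[start:start + max_messages]
        self._offsets[subscriber] = start + len(batch)
        return batch


# Hypothetical sensor stream read by two independent subscribers
sensor_readings = Topic()
for reading in ["t=20.1", "t=20.4", "t=35.0"]:
    sensor_readings.publish(reading)

print(sensor_readings.poll("alerting"))                   # ['t=20.1', 't=20.4', 't=35.0']
print(sensor_readings.poll("dashboard", max_messages=2))  # ['t=20.1', 't=20.4']
print(sensor_readings.poll("alerting"))                   # []
```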

Page 11

DMBOK Functions & Big Data Projects: Data Quality


A measure of the degree to which data satisfies the information needs of its consumers, reflects the nature and state of the real-world concepts to which it relates, is coherent within itself, and provides value in the decision-making processes for which it is to be utilized.

Provides relational structure over HDFS data. File formats can be applied to data from HDFS or the local file system.

Provides the ability to import data from an RDBMS into HDFS. Imported data can be constrained through import control arguments and basic SQL execution.

Provides the ability to collect, aggregate, and move large log files into HDFS from sources such as apps, GPS, social media, sensors, and others. A Flume agent can be used with predefined data patterns (sinks) to ensure data format.
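Enforcing a predefined data pattern at ingestion can be sketched as follows: each incoming line is matched against a regular expression, matching lines are parsed into named fields, and the rest are set aside for review. The log-line format, LOG_PATTERN, and validate_records are hypothetical illustrations, not the configuration syntax of Flume or any other ingestion tool.

```python
import re

# Hypothetical predefined pattern for an application log line:
# "<date> <time> <LEVEL> <message>", e.g. "2015-03-10 09:15:02 INFO order placed"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>DEBUG|INFO|WARN|ERROR) (?P<message>.+)$"
)


def validate_records(lines, pattern=LOG_PATTERN):
    """Split lines into parsed records that match the pattern and rejected raw lines."""
    accepted, rejected = [], []
    for line in lines:
        match = pattern.match(line)
        if match:
            accepted.append(match.groupdict())  # field name -> extracted value
        else:
            rejected.append(line)
    return accepted, rejected


incoming = [
    "2015-03-10 09:15:02 INFO order placed",
    "garbled ### line",
    "2015-03-10 09:15:07 ERROR payment timeout",
]
good, bad = validate_records(incoming)
print(len(good), bad)  # 2 ['garbled ### line']
```

Keeping rejected lines instead of silently dropping them allows data quality issues to be measured and traced back to their source.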

Page 12

DMBOK Functions & Big Data Projects: Meta-data


All the physical data and knowledge about the business and technical processes used by an organization. Meta-data is knowledge about the organization's data.

Provides data lineage between data sources and the cluster, including integration with the metastore/catalog (e.g., Hive HCatalog).

Provides relational structure over HDFS data. File formats can be applied to data from HDFS or the local file system.
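Data lineage can be modeled as a graph from each dataset to the datasets it was derived from; answering "where did this data come from?" then becomes a graph traversal. The dataset names and the upstream_sources helper below are hypothetical, and the sketch does not represent the API of HCatalog or any other metastore or lineage tool.

```python
def upstream_sources(lineage, dataset):
    """Return every dataset that `dataset` is derived from, directly or indirectly.

    lineage maps a dataset name to the set of datasets it was built from.
    The `seen` set also guards against cycles in the lineage graph.
    """
    seen = set()
    stack = list(lineage.get(dataset, ()))
    while stack:
        source = stack.pop()
        if source not in seen:
            seen.add(source)
            stack.extend(lineage.get(source, ()))
    return seen


# Hypothetical lineage: an RDBMS table and an application log feed land in the
# cluster, are joined into a staging table, which feeds a reporting table.
lineage = {
    "hdfs:/raw/orders": {"rdbms:sales.orders"},
    "hdfs:/raw/weblogs": {"app:web_server_logs"},
    "hive:staging.order_clicks": {"hdfs:/raw/orders", "hdfs:/raw/weblogs"},
    "hive:report.daily_conversion": {"hive:staging.order_clicks"},
}
print(sorted(upstream_sources(lineage, "hive:report.daily_conversion")))
```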

Page 13

DMBOK Functions & Big Data Projects: Documents & Content


The management of documents and non-structured content found in audio, video, email, images, etc., and the meta-data associated with this material.

Provides the ability to collect, aggregate, and move large log files into HDFS from sources such as apps, GPS, social media, sensors, email, and others.

Provides the ability to search data in the cluster by indexing it to enable full-text search.
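Full-text search over indexed data is commonly built on an inverted index, which maps each term to the documents that contain it. The minimal Python sketch below shows the idea with hypothetical documents; real search platforms add stemming, relevance ranking, and distributed index shards, none of which is modeled here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: term -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical email and note content landed in the cluster
docs = {
    "email-001": "Store 42 reports a refrigeration failure",
    "email-002": "Weekly sales summary for store 42",
    "note-003": "Refrigeration maintenance scheduled",
}
index = build_index(docs)
print(sorted(search(index, "refrigeration store")))  # ['email-001']
```

Because the index is built once ahead of time, a query only touches the posting sets for its own terms instead of scanning every document.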

Page 14

DMBOK Functions & Big Data Projects: Data Warehousing & Business Intelligence


A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process. Business Intelligence is the collection of activities that allow an organization to analyze data and make decisions based on facts from historical and predictive data sets.

Provides fast access to very large tables of data, typically running on top of the cluster.

Provides a compute algorithm typically used to produce output data from a large volume of data in the cluster for consumption.

Provides a semantic layer for accessing data in the cluster.

Provides an enhanced compute approach typically used to produce output data from a large volume of data in the cluster for consumption.

Provides an in-memory compute method typically used to produce output data from a large volume of data in the cluster for consumption (e.g., machine learning algorithms).
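Assuming the compute algorithm described above follows the common map, shuffle, and reduce pattern, the single-process Python sketch below shows how aggregated output data is produced from many input records. The "store,amount" record format and the function names are hypothetical; on a cluster, the map and reduce calls would run in parallel across nodes rather than in one loop.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (store, amount) pairs from one hypothetical 'store,amount' sales record."""
    store, amount = record.split(",")
    yield store, float(amount)


def reduce_phase(key, values):
    """Combine all values for one key into a single output value (total sales)."""
    return key, sum(values)


def map_reduce(records, mapper, reducer):
    grouped = defaultdict(list)
    for record in records:                 # map: applied to each input record
        for key, value in mapper(record):
            grouped[key].append(value)     # shuffle: group intermediate values by key
    # reduce: one call per distinct key
    return dict(reducer(key, values) for key, values in grouped.items())


sales = ["s1,10.0", "s2,5.5", "s1,2.5"]
print(map_reduce(sales, map_phase, reduce_phase))  # {'s1': 12.5, 's2': 5.5}
```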

Page 15

DMBOK Functions & Big Data Projects: Data Security


Data security concerns the protection of data from accidental or intentional but unauthorized modification, destruction, or disclosure through the use of physical security, administrative controls, logical controls, and other safeguards to limit accessibility.

Provides security authorization (grant/revoke), policy administration, and audit for the cluster.

Provides service-level authorization for users and groups.

Provides a semantic layer (tables) for accessing data in the cluster that can be secured.
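Grant/revoke authorization with auditing can be sketched as a set of (principal, table, action) grants that is checked on every access, with each decision recorded in an audit trail. The PolicyStore class, principal names, and table names below are hypothetical; this is not the policy model or API of any particular cluster security product.

```python
from datetime import datetime, timezone


class PolicyStore:
    """Hypothetical table-level access policies with grant/revoke and an audit trail."""

    def __init__(self):
        self._grants = set()  # (principal, table, action) tuples
        self.audit_log = []   # (timestamp, user, table, action, allowed)

    def grant(self, principal, table, action):
        self._grants.add((principal, table, action))

    def revoke(self, principal, table, action):
        self._grants.discard((principal, table, action))

    def check(self, user, groups, table, action):
        """Allow if the user or any of the user's groups holds the grant; audit every decision."""
        allowed = any((p, table, action) in self._grants for p in [user, *groups])
        self.audit_log.append(
            (datetime.now(timezone.utc).isoformat(), user, table, action, allowed)
        )
        return allowed


policies = PolicyStore()
policies.grant("group:analysts", "sales.orders", "SELECT")
print(policies.check("user:jdoe", ["group:analysts"], "sales.orders", "SELECT"))  # True
policies.revoke("group:analysts", "sales.orders", "SELECT")
print(policies.check("user:jdoe", ["group:analysts"], "sales.orders", "SELECT"))  # False
print(len(policies.audit_log))  # 2
```

Granting to groups rather than individual users keeps policy administration manageable as the number of cluster users grows.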

Page 16

DMBOK Functions & Big Data Projects: Data Governance – Potential Opportunity Areas


Provides the organizational oversight, processes, and methods to effectively manage data as an asset across the organization.

Provides data lineage between data sources and the cluster, including integration with the metastore/catalog (e.g., Hive HCatalog).

Provides relational structure over HDFS data. File formats can be applied to data from HDFS or the local file system.

Provides the ability to search data in the cluster by indexing it to enable full-text search.

Provides security authorization (grant/revoke), policy administration, and audit for the cluster.

Page 17


DMBOK Functions & Big Data Projects: Reference & Master Data – Potential Opportunity Areas

Data about core business entities and concepts, independent of transactions, and data that defines the set of permissible values to be used by other data fields.

Provides the ability to import data from an RDBMS into HDFS.

Provides a semantic layer for accessing data in the cluster.
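Reference data, as defined for this slide, constrains other fields to a set of permissible values. A minimal Python sketch of checking records against such value sets follows; the field names, code values, and the reference_violations helper are hypothetical.

```python
# Hypothetical reference data: the permissible values for two coded fields
REFERENCE_DATA = {
    "country_code": {"US", "GB", "IE"},
    "order_status": {"OPEN", "SHIPPED", "CANCELLED"},
}


def reference_violations(record, reference=REFERENCE_DATA):
    """Return (field, value) pairs whose value is not in the field's permissible set.

    Fields without reference data, or absent from the record, are not checked.
    """
    return [
        (field, record[field])
        for field, allowed in reference.items()
        if field in record and record[field] not in allowed
    ]


print(reference_violations({"country_code": "US", "order_status": "LOST"}))
# [('order_status', 'LOST')]
```

Keeping the permissible values in one shared reference set, rather than hard-coding them in each load job, lets every dataset imported into the cluster be validated against the same definitions.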

Page 18

Bio

Michael Jennings
Senior Director, Enterprise Data Architecture
Walgreens Boots Alliance
1419 Lake Cook Road, MS: L497
Deerfield, IL 60015 USA
847 964 … /in/micahelfjennings

Michael Jennings is a recognized industry expert in enterprise architecture and information management with more than twenty-five years of experience in various industries. Mike speaks frequently on enterprise architecture and information management concepts and practices at major industry conferences.

He is a co-author of the book "Universal Meta Data Models" (2004) and a contributing author to the books "Building and Managing the Meta Data Repository" (2000) and "The DAMA Guide to the Data Management Body of Knowledge - DMBOK" (2009).

Mike was recognized with the 2013 DAMA International Professional Achievement Award and as one of Information Management Magazine's 25 Top Information Managers for 2012.

He currently serves as VP of Programs for the Wisconsin DAMA Chapter and as VP of Operations for DAMA International.
