rapid data integration and curation

| ©2013, Cognizant Rapid Data Integration and Curation Delivering Business Value in the First 24 Hours SPEAKER: Thomas Kelly, Practice Director Semantic Technology Center of Excellence Enterprise Information Management Cognizant Technology Solutions, Inc.

Upload: thomas-kelly-pmp

Post on 28-Nov-2014

DESCRIPTION

Organizations must onboard new data sources more frequently and quickly. In this presentation, you will learn about practices that rapidly deliver business value, while shrinking time to business value from months to days.

Business decisions are becoming increasingly dependent on analyzing an ever-greater volume of data coming from a growing number of sources. Mobile technology is providing immediate access to data whenever and wherever it is needed. Users, customers, and business partners are waiting for answers, and the organization must reduce the time required to collect, understand, and analyze the data needed to provide those answers. Modern enterprises need to increase the agility, flexibility, and speed with which they can analyze a growing volume, variety, and velocity of data.

This presentation discusses a method for rapid data integration and curation:

- Techniques for light data integration of new data with existing data assets
- Framework for data quality management
- Refining data integration through evolutionary modeling
- Managing curation processes
- Validating business value

Timely delivery of new data assets allows users to begin asking questions earlier and getting answers more quickly, allowing the organization to uncover the new insights that drive lasting business benefits.

TRANSCRIPT

Page 1: Rapid data integration and curation

Rapid Data Integration and Curation Delivering Business Value in the First 24 Hours

SPEAKER: Thomas Kelly, Practice Director Semantic Technology Center of Excellence Enterprise Information Management Cognizant Technology Solutions, Inc.

Page 2: Rapid data integration and curation

Agenda

1. DELIVERING BUSINESS VALUE

2. BARRIERS TO RAPID DATA INTEGRATION

3. RAPID DATA INTEGRATION AND CURATION METHOD

Page 3: Rapid data integration and curation

We are at an Inflection Point at which Value is Created or Destroyed

Source: The Motley Fool

Page 4: Rapid data integration and curation

Delivering Information Faster Produces Direct, Measurable Business Value

What Difference Does One Day Make?

A blockbuster drug generates $3M+ in revenue per day; a one-day delay in completing clinical trials can generate up to $500K in additional costs

Banking

A moderate-sized brokerage firm can generate up to $1M in financial services revenue per day

Page 5: Rapid data integration and curation

Barriers to Rapid Data Integration

Rework is expensive – must “get it right” from the start

Knowledge acquisition takes time; new insights come from experimentation

Fit with the existing data; avoid data silos

Reconciling differences (data formats, coding, identifiers, etc.)

Managing data quality (accuracy, precision, context)

Overcoming process inertia

Page 6: Rapid data integration and curation

Evolutionary Method to Data Integration and Curation

• As new information flows into the enterprise, people and processes are dynamic in nature

• Questions arising during this phase are “what to do” and “how to make the best sense of the new data source”. Rapid integration tools will aid in quick prototyping and building solutions of value

• As we progress, issues with the new data are identified and managed. The main focus is on establishing data quality and adhering to enterprise standards and frameworks while building optimal integration approaches

• The integration process is evolutionary as further discoveries are made for optimal design

• Data management evolves to a more-refined state. A feedback loop is built to enable proactive decisions around data organization and access.

• Data integration is efficient and stable. Verifiable compliance and security.

• Integrated with the enterprise information management framework

• The data is profiled and explored for value and quality issues.

• A rapid pruning exercise is undertaken by prototyping and integrating with in-house data to evaluate whether the data is fit for purpose. The outcome helps formulate an effective approach for the later phases.

• Progressive build based on the new data.

• Building awareness of the new platform and fine-tuning the capabilities around the data source are the primary activities.

• The services built around the new data sources are now managed.

• The focus is on evolution of business processes, based on managed models.

Time: First 1-5 Days | First 1-3 Months | After 3 Months

Data Approach: Responsive | Managed | Proactive

Integration and Curation Method: Rapid | Evolutionary | Predictable

Information Management Approach: Tactical | Progressive | Managed

Page 7: Rapid data integration and curation

Leverage Insights and Expertise, Rapidly and Sustainably

Ingest new data sources (light integration and curation)

Reuse Expertise: Identify and leverage existing, relevant data assets and expertise

Analyze

Extend: Create and extend data relationships, leveraging insights from previous study cycles

Refine: Capture insights from new data analysis cycles, refining relationships to support new analytics

Govern: Elevate proven data, relationships, and expertise to organization-wide definition

Realize Benefits: Monitor and measure use and benefits achieved; identify next set of priorities

Page 8: Rapid data integration and curation

Can You Help Me With Some Data?

Page 9: Rapid data integration and curation

Rapid Data Integration and Curation Method

1. Define Preliminary Objectives

2. Profile the New Data

3. Generate Initial Ontology for the New Data

4. Generate Initial Ontology for the Existing Data (if necessary)

5. Integrate Entities over Common URIs

6. Create URI Links

7. Add Initial Data Quality Filters

8. Analyze Data and Generate Feedback

Page 10: Rapid data integration and curation

1. Define Preliminary Objectives

1. Discuss Functional and Timing Objectives, and Priorities

2. Clarify Immediate, Short-Term, and Long-Term Business Value (SMART *)
   a. Cost Reduction/Avoidance
   b. Meet Critical Customer Need

3. Is This the Right Solution?

4. Set Expectations
   a. Evolutionary Process
   b. Initial Results Quickly
   c. Frequent, Active Participation
   d. Feedback Critical to Making Refinements

5. Brainstorm Deliverables that Produce Business Benefits; Define a Few Sample Queries

6. Ask for Commitment to Benefits Realization

7. Start the Clock!

* SMART -- Specific, Measurable, Attainable, Realistic, and Traceable

Page 11: Rapid data integration and curation

2. Profile the New Data

Light Profiling, focusing on Understanding Key Data Elements Needed to Meet the First Deliverable

Identify Initial Data Filtering Candidates

Capture Insights about Key Data Relationships
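
As an illustration of what light profiling might look like in practice, the sketch below counts missing and distinct values for a few key columns and flags columns with gaps as filtering candidates. The sample data, the column names, and the `profile_columns` helper are all hypothetical and are not taken from the presentation.

```python
import csv
import io
from collections import Counter


def profile_columns(rows, key_columns):
    """Return simple per-column statistics for the chosen key columns."""
    stats = {}
    for col in key_columns:
        values = [row.get(col, "") for row in rows]
        non_empty = [v.strip() for v in values if v and v.strip()]
        counts = Counter(non_empty)
        stats[col] = {
            "rows": len(values),
            "missing": len(values) - len(non_empty),
            "distinct": len(counts),
            "top": counts.most_common(3),
        }
    return stats


# Hypothetical sample of a new data source
SAMPLE = """customer_id,zip_code,segment
C1,10001,retail
C2,,retail
C3,10001,wholesale
C4,9021,retail
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
profile = profile_columns(rows, ["customer_id", "zip_code"])

# Columns with missing values are initial data filtering candidates
filter_candidates = [col for col, s in profile.items() if s["missing"] > 0]

# Columns whose values are all present and unique may serve as identifiers
key_candidates = [
    col for col, s in profile.items()
    if s["missing"] == 0 and s["distinct"] == s["rows"]
]
```

Only the columns needed for the first deliverable are profiled here, which keeps the pass quick; a fuller profile can follow in later iterations.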

Page 12: Rapid data integration and curation

3. Generate Initial Ontology for the New Data

Reverse-engineer Ontology from New Data

Load New Data into the RDF Store (or Create Link to the Data)

Create Business-relevant Synonyms for High-Importance Attributes

Refinements will be made in Future Iterations
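
One way to picture reverse-engineering an ontology from tabular data is sketched below: a class per table, a datatype property per column, and an `rdfs:label` for each business-relevant synonym, emitted as Turtle text. This is a minimal sketch under stated assumptions, not the tooling used in the presentation; the `http://example.org/` namespace, the `new:` prefix, the helper names, and the sample rows are all hypothetical.

```python
def infer_xsd_type(values):
    """Guess an XSD datatype from sample values: integer, decimal, else string."""
    non_empty = [v for v in values if v != ""]

    def all_parse(cast):
        try:
            for v in non_empty:
                cast(v)
            return bool(non_empty)
        except ValueError:
            return False

    if all_parse(int):
        return "xsd:integer"
    if all_parse(float):
        return "xsd:decimal"
    return "xsd:string"


def generate_ontology(table, rows, synonyms=None, prefix="new"):
    """Emit Turtle: one class for the table, one datatype property per column."""
    synonyms = synonyms or {}
    cls = table.title().replace("_", "")
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        f"@prefix {prefix}: <http://example.org/{prefix}#> .",  # hypothetical namespace
        "",
        f"{prefix}:{cls} a owl:Class .",
    ]
    for col in rows[0].keys():
        xsd = infer_xsd_type([r[col] for r in rows])
        lines.append(f"{prefix}:{col} a owl:DatatypeProperty ;")
        lines.append(f"    rdfs:domain {prefix}:{cls} ;")
        label = synonyms.get(col)
        if label:
            lines.append(f"    rdfs:range {xsd} ;")
            lines.append(f'    rdfs:label "{label}" .')
        else:
            lines.append(f"    rdfs:range {xsd} .")
    return "\n".join(lines)


rows = [
    {"customer_id": "C1", "zip_code": "10001"},
    {"customer_id": "C2", "zip_code": "90210"},
]
ontology_ttl = generate_ontology("customer", rows, {"zip_code": "Postal Code"})
```

Note that the type guess marks `zip_code` as an integer, which would lose leading zeros. That is the kind of first-pass result that later iterations are meant to refine.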

Page 13: Rapid data integration and curation

4. Generate Initial Ontology for the Existing Data (if necessary)

Map Selected Entities and Critical Attributes for Existing Data Source(s) to the Source-specific Ontology

Add Reference to the Source-specific Ontology to the New Data Ontology

Refinements will be made in Future Iterations

New Data Ontology manages integration with Existing Data until the ontology is sufficiently mature to be promoted into an enterprise ontology

Existing Data

New Data

Page 14: Rapid data integration and curation

5. Integrate Entities over Common URIs

Different URIs, Separately Maintained

Focus on Key Entities

Equivalence Functions Logically Integrate the Federated Data

Reduces Query Complexity and Can Improve Query Performance
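
The idea of logically integrating separately maintained URIs can be sketched as follows. The slides do not specify how the equivalence functions are implemented; this sketch assumes equivalences are given as URI pairs (in the spirit of `owl:sameAs`) and resolves them with a small union-find, so a query against either URI returns facts from both sources. All URIs, predicates, and function names are hypothetical.

```python
def build_canonical_map(equivalences):
    """Union-find over URI pairs; maps every URI to one representative."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b in equivalences:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller URI as the representative
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra
    return {u: find(u) for u in list(parent)}


def query_subject(triples, subject, canon):
    """Return (predicate, object) facts for a subject across equivalent URIs."""
    target = canon.get(subject, subject)
    return sorted(
        (p, o) for s, p, o in triples if canon.get(s, s) == target
    )


# Two sources describe the same customer under different URIs
triples = [
    ("crm:cust/42", "crm:name", "Acme Corp"),
    ("erp:customer/0042", "erp:creditLimit", "50000"),
    ("crm:cust/7", "crm:name", "Globex"),
]
equivalences = [("crm:cust/42", "erp:customer/0042")]

canon = build_canonical_map(equivalences)
facts = query_subject(triples, "erp:customer/0042", canon)
```

The source data is never rewritten; only the equivalence list changes, which fits the slide's point that the URIs remain separately maintained.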

Page 15: Rapid data integration and curation

6. Create URI Links

Links Reduce Query Complexity and Can Improve Query Performance

The Data has Common Values that can be used in Join Operations, but Doesn’t have Links

Focus on Key Queries, Identify Complex or Time-Sensitive Joins

Add Linking URI Attribute to Dependent Entity

Amend Selected Queries to Leverage the New Link

[Diagram: Before, Customer and Geography are related by a JOIN on cust:ZipCode and geo:ZipCode. After, Customer carries cust:ZipCodeURI, a LINK to Geography.]
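
The cust:ZipCode and geo:ZipCode example on this slide can be sketched in a few lines. The code below is a minimal illustration using in-memory dictionaries rather than an RDF store; the record layout, the `geo:zip/...` URIs, the `geo:City` attribute, and the helper names are assumptions made for the example.

```python
# Dependent entities (customers) share a ZIP value with geography records
customers = [
    {"uri": "cust:C1", "cust:ZipCode": "10001"},
    {"uri": "cust:C2", "cust:ZipCode": "90210"},
    {"uri": "cust:C3", "cust:ZipCode": "00000"},  # no matching geography
]
geographies = {
    "geo:zip/10001": {"geo:ZipCode": "10001", "geo:City": "New York"},
    "geo:zip/90210": {"geo:ZipCode": "90210", "geo:City": "Beverly Hills"},
}


def add_zip_links(customers, geographies):
    """Add cust:ZipCodeURI to each customer whose ZIP matches a geography."""
    by_zip = {g["geo:ZipCode"]: uri for uri, g in geographies.items()}
    linked = 0
    for c in customers:
        target = by_zip.get(c["cust:ZipCode"])
        if target is not None:
            c["cust:ZipCodeURI"] = target
            linked += 1
    return linked


def city_via_join(customer, geographies):
    """Value join: scan geographies for a matching ZIP code."""
    for g in geographies.values():
        if g["geo:ZipCode"] == customer["cust:ZipCode"]:
            return g["geo:City"]
    return None


def city_via_link(customer, geographies):
    """Link traversal: follow the stored URI directly."""
    uri = customer.get("cust:ZipCodeURI")
    return geographies[uri]["geo:City"] if uri else None


linked_count = add_zip_links(customers, geographies)
```

The join is computed once when the link is created, so amended queries follow the URI instead of repeating the value comparison. Customers with no matching geography simply get no link, which also surfaces a data quality candidate.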

Page 16: Rapid data integration and curation

7. Add Initial Data Quality Filters and Transformations

JIT Data Quality Management, Everywhere that it is Needed

Data Filtering and Transformation Rules are Encoded in the Ontology

Focus is on Critical Data Quality Rules

Rule Updates are Automatically in Effect, without Reloading All of the Data

[Diagram: In the Traditional Data Warehouse, Data Source A, Data Source B, and Data Source C feed the Data Warehouse through ETL, with the caption "Data Quality Happens Here". For Existing Data and New Data, the captions read "Data Quality Happens Here" and "And Here".]
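
To make the "rule updates take effect without reloading the data" point concrete, the sketch below keeps the raw rows untouched and applies filtering and transformation rules only when the data is read. In the presentation the rules are encoded in the ontology; here a plain Python dictionary stands in for that, and the rule set, column names, and `read_curated` helper are hypothetical.

```python
# Raw data is stored once and never modified
RAW_ROWS = [
    {"customer_id": "C1", "zip_code": " 10001 ", "segment": "Retail"},
    {"customer_id": "C2", "zip_code": "", "segment": "retail"},
    {"customer_id": "C3", "zip_code": "9021", "segment": "WHOLESALE"},
]

# Stand-in for rules encoded in the ontology (an assumption for this sketch)
rules = {
    "transforms": {
        "zip_code": str.strip,
        "segment": str.lower,
    },
    "filters": {
        "zip_code": lambda v: v != "",
    },
}


def read_curated(raw_rows, rules):
    """Yield rows with transforms applied, skipping rows that fail a filter."""
    for row in raw_rows:
        clean = dict(row)  # copy, so the raw data stays as loaded
        for col, fn in rules["transforms"].items():
            clean[col] = fn(clean[col])
        if all(check(clean[col]) for col, check in rules["filters"].items()):
            yield clean


first_pass = list(read_curated(RAW_ROWS, rules))

# Tighten one critical rule: a ZIP code must be exactly five digits.
# The next read picks it up; nothing is reloaded.
rules["filters"]["zip_code"] = lambda v: len(v) == 5 and v.isdigit()
second_pass = list(read_curated(RAW_ROWS, rules))
```

Starting with only the critical rules and tightening them between reads matches the evolutionary approach described earlier in the deck.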

Page 17: Rapid data integration and curation

8. Analyze Data and Generate Feedback

Demonstrate Visualization using Sample Queries

Walk Through Available Data Sets and Data Organization

Experiment with Data Access and New Visualizations

Provide Next Steps Recommendations to Refine the Data Integration and Curation

Page 18: Rapid data integration and curation

Architectural Foundation for Rapid Data Integration and Curation

Data Profiling

Automated Ontology Generation

Relational-to-RDF Mapping

Ontology Editor

RDF Store

SPARQL-based Visualization

RDF Store Data Import

Page 19: Rapid data integration and curation

Capabilities That We Have Introduced

Rapid Response to New Data Onboarding Needs

Process for Evolutionary Data Integration and Curation

Flexible Design that is Responsive to Business Changes

Foundation for Refinement and Expansion of Ontology Models from Fit-for-Purpose to Department, to Business Unit, to Enterprise

Page 20: Rapid data integration and curation

Questions?

Page 21: Rapid data integration and curation

Thank you!

Page 22: Rapid data integration and curation

Speaker

Thomas (Tom) Kelly Practice Director, Enterprise Information Management, Cognizant

Thomas Kelly is a Director in Cognizant’s Enterprise Information Management (EIM) Practice and heads its Semantic Technology Center of Excellence, a technology specialty of Cognizant Business Consulting (CBC). He has 20-plus years of technology consulting experience in leading data warehousing, business intelligence and big data projects, focused primarily on the life sciences and healthcare industries. Tom can be reached at [email protected].