www.techinmotioninc.com
Data Quality: Prerequisite for Data Sharing
Bonnie K. O’Neil
Sr. Principal Data Architect
PPC
Yazmin Rowe
Data Architect
Technology in Motion, Inc.
2
Agenda
►Case Study Background
►Data Quality Framework
►Proof of Concept
►Incremental Approach
►Take-aways
3
Case Study
►Federal Bureau
►Legal/Catching Bad Guys
►Data, data everywhere…
►President’s directives to share data
  ►Especially around terrorism
►Data quality an issue (where is it NOT an issue?!)
►Initiative: Implement Data Quality Program to enable Data Sharing
4
State of Affairs…
►No enterprise data model
►No enterprise data dictionary
►They DO have Data Management in place
  ►DM Reviews
  ►DM Handbook
►Data Sharing Challenges:
  ►No “gold copy” or source of record for data elements
  ►Different systems, supposedly same data element, different values
  ►How many systems is this data element in?
  ►Impact analysis
  ►Data sharing simply not possible!
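The "different systems, same data element, different values" challenge can be illustrated with a minimal sketch. It assumes each system's copy of one element can be extracted into a dictionary keyed by a shared record identifier; the function name, keys, and values are hypothetical and not taken from the case study.

```python
def compare_element(system_a, system_b):
    """Compare one data element across two systems, keyed by a shared identifier."""
    shared = system_a.keys() & system_b.keys()
    # Records present in both systems whose values disagree.
    mismatches = {
        key: (system_a[key], system_b[key])
        for key in sorted(shared)
        if system_a[key] != system_b[key]
    }
    return {
        "only_in_a": sorted(system_a.keys() - system_b.keys()),
        "only_in_b": sorted(system_b.keys() - system_a.keys()),
        "mismatches": mismatches,
    }


# Made-up birth dates held by two hypothetical systems.
system_a = {"P1": "1970-01-31", "P2": "1985-12-02", "P3": "1990-07-04"}
system_b = {"P1": "1970-01-31", "P2": "1985-02-12", "P4": "2001-09-09"}
report = compare_element(system_a, system_b)
print(report)
```

Without a gold copy, a mismatch only shows that the systems disagree; it does not say which value is right.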
5
Framework Fundamentals: Dictionary is the Heart of Data Quality
►How can you tell what “bad data” is if you don’t know what it is supposed to be in the first place?
  ►Dictionary tells you this
  ►Good definitions spell out the expectation for the data
►You know it is a good definition when you are able to tell if the data conforms to expectations or not
  ►Should be able to compare results of profiling with the definition for the field
  ►Does the data conform to the definition?
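One way to picture "does the data conform to the definition?" is the sketch below. It assumes a dictionary entry can be reduced to a nullability rule and a format pattern; `ElementDefinition`, `conformance`, and the sample values are hypothetical illustrations, not the Bureau's dictionary or tooling.

```python
import re
from dataclasses import dataclass


@dataclass
class ElementDefinition:
    """Hypothetical dictionary entry: the stated expectation for one data element."""
    name: str
    nullable: bool
    pattern: str  # regular expression every non-null value should match


def conformance(definition, values):
    """Return the fraction of values that meet the definition, plus the violations."""
    regex = re.compile(definition.pattern)
    violations = []
    for value in values:
        if value is None:
            if not definition.nullable:
                violations.append(value)
        elif not regex.fullmatch(value):
            violations.append(value)
    rate = 1.0 if not values else 1 - len(violations) / len(values)
    return rate, violations


# Made-up definition and sample data, for illustration only.
birth_date = ElementDefinition(name="BIRTH_DATE", nullable=False,
                               pattern=r"\d{4}-\d{2}-\d{2}")
sample = ["1970-01-31", "01/31/1970", None, "1985-12-02"]
rate, bad = conformance(birth_date, sample)
print(f"{birth_date.name}: {rate:.0%} conforming, violations: {bad}")
```

A definition too vague to be written down as a rule like this is a sign that it does not yet spell out the expectation for the data.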
6
Framework Fundamentals: Top Down and Bottom Up Data Management
►Profiling to see what’s really in the fields
►Examine data element definitions (where they exist) to determine what the business thinks the fields contain
►Enabled business person (Data Steward) to be key player
7
Study in 2007
►First Six Months 2007:
  ►Enterprise-Wide Data Quality Standard/Procedures
    ►Standards are part of the infrastructure necessary to share data externally
  ►Enterprise Data Quality Framework
    ►Strategic and tactical approach
  ►Successful Proof-of-Concept Project
    ►An enterprise data model
    ►A data dictionary
►In 2007:
  ►Software Selection for Data Quality Framework
  ►Kick-off First Business Unit Data Quality Project
8
The Proof of Concept
►All kinds of politics with getting new projects approved
  ►Especially with Production Data
►We performed a Proof of Concept (POC)
  ►Isolated environment; testing, using PCs
  ►Agreed to get rid of production data after we had profiled it
►Language is Critical
  ►Can’t call it “Data Profiling”
  ►Instead, called it “Data Demographic Analysis”
9
Proof of Concept Cont’d
►Found a Sponsor
  ►Good friends with the DM Team Lead
  ►Recent DQ “issue”
  ►Has motivation to look into this
►Formed an interdisciplinary project team (IPT)
  ►Involved many people from different areas of the business
10
Shoestring Principle
►Bonnie’s Law:
  ►“Use Whatever is Lying Around”
►You will be surprised at what you find when you look for “whatever is lying around”
  ►Already purchased software
  ►Software/hardware scrapped from a failed project
  ►Under-utilized systems
11
Using Bonnie’s Law
►Repository products were too expensive
►Had Oracle Warehouse Builder (OWB) lying around
►New OWB has a data profiling option
  ►Good News: Saved us from having to buy a separate profiling tool
  ►Bad News: The profiling feature was an OWB option (meaning money)
    ►Still cheaper than having to buy a separate profiling tool
►HAD TO HAVE Profiling!!
►Using it NOT for ETL!
12
Benefits
►Statistics on their data
  ►Profiling: Min/Max, % NULL, % Distinct, format/pattern, etc.
  ►Cannot manage what you cannot measure!
►Immediately pinpoint data quality issues
►Traceability to data concepts (EDM)
  ►Show multiple occurrences of same type of data
  ►Setting the infrastructure in place for a “super query”
►Provided a “straw man” data quality methodology (Framework)
  ►In draft
  ►Solicited comments from everybody
  ►Helps get buy-in BIG TIME instead of shoving it down their throats
  ►Users felt included instead of alienated
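The profiling statistics named above (Min/Max, % NULL, % Distinct, format/pattern) can be sketched in a few lines. This is an illustration only, not how OWB computes them: it assumes a column arrives as a list of comparable values with `None` for NULL, and that % Distinct means distinct non-null values as a share of all rows. `value_pattern`, `profile_column`, and the sample data are hypothetical.

```python
from collections import Counter


def value_pattern(value):
    """Reduce a value to its shape: digits become 9, letters become A, the rest is kept."""
    shape = []
    for ch in str(value):
        if ch.isdigit():
            shape.append("9")
        elif ch.isalpha():
            shape.append("A")
        else:
            shape.append(ch)
    return "".join(shape)


def profile_column(values):
    """Compute basic profiling statistics for one column (None stands for NULL)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    patterns = Counter(value_pattern(v) for v in non_null)
    return {
        "count": total,
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "pct_null": 100.0 * (total - len(non_null)) / total if total else 0.0,
        # Assumption: distinct non-null values as a percentage of all rows.
        "pct_distinct": 100.0 * len(set(non_null)) / total if total else 0.0,
        "patterns": patterns.most_common(),
    }


# Made-up identifier column with one NULL, one duplicate, and two formats.
id_column = ["123-45-6789", "987654321", None, "123-45-6789"]
stats = profile_column(id_column)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Two patterns in one column is the kind of finding that pinpoints a data quality issue immediately: the same field is being stored in more than one format.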
13
Scoping
►Divide & conquer
►Pick a subject area
  ►Less complex semantics
  ►PEOPLE
►Limit systems for the POC
  ►Three, but ended up with two
  ►Not overly complex but not simple either
►Kept refining the scope
14
Incremental: Growing the Project
►If the user likes it, this project can “graduate” to a “real” project
  ►More complex subject areas
  ►More systems
►This is actually what happened!
  ►Sell to other groups in the bureau
  ►EDM will grow incrementally
  ►Successfully established a Data Quality Program at the Bureau
15
Take-Aways
►You must ALWAYS do data profiling!! Essential!! For anything. Period.
►Try to use what you have instead of buying something new & expensive
  ►You’d be surprised what you find “lying around”
►Involve the users and other groups within the business
  ►Especially in creating a methodology
  ►Lets them feel a part of the creation
►Language is very important to people
  ►Sometimes I have seen the term “Data Warehouse” disliked
►Get your project funded by tagging it along with a business goal
  ►Funding the EDM by way of data quality
  ►Find business hot button and propose to solve it
16
Future Plans
►Data Governance
►Data Inventory
►Master Data Management (MDM)
►Formal integration of data quality measurements into SLC
►Linking the EDM to application data
  ►Suck in the data from the source systems
  ►Suck in the EDM from data modeling tool
  ►Map the two
  ►“Virtual” mapping
    ►No data movement taking place
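A “virtual” mapping of this kind can be pictured as a lookup from EDM concepts to the source columns that hold them, with nothing copied out of the source systems. The sketch below is one possible shape under that assumption; `SourceColumn`, `VirtualMapping`, and every system, table, and column name are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class SourceColumn(NamedTuple):
    """Location of a column in a source system (metadata only, no data values)."""
    system: str
    table: str
    column: str


class VirtualMapping:
    """Links EDM concepts to source columns without moving any data."""

    def __init__(self):
        self._by_concept = defaultdict(set)

    def map(self, concept, source):
        self._by_concept[concept].add(source)

    def occurrences(self, concept):
        """Every source column mapped to the concept."""
        return sorted(self._by_concept.get(concept, set()))

    def systems(self, concept):
        """Answers: how many systems is this data element in?"""
        return sorted({src.system for src in self.occurrences(concept)})

    def unmapped(self, inventory):
        """Inventoried columns that are not yet tied to any EDM concept."""
        mapped = set().union(*self._by_concept.values())
        return sorted(set(inventory) - mapped)


# Made-up inventory and mappings, for illustration only.
dob_a = SourceColumn("CaseSystem", "SUBJECT", "DOB")
dob_b = SourceColumn("BookingSystem", "PERSON", "BIRTH_DT")
notes = SourceColumn("CaseSystem", "SUBJECT", "NOTES")

mapping = VirtualMapping()
mapping.map("Person.BirthDate", dob_a)
mapping.map("Person.BirthDate", dob_b)

print(mapping.systems("Person.BirthDate"))
print(mapping.unmapped([dob_a, dob_b, notes]))
```

The same lookup supports impact analysis: a proposed change to a concept can be traced to every column that carries it.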
17
Conclusion
►In order to achieve data sharing, you must clean up the data first
►You can get data quality projects funded if you:
  ►Start small
  ►Solve an important business problem
  ►Establish a framework
  ►Get a sponsor who sees the business value in what you are providing
  ►Be politically savvy about word usage; don’t use their charged words
  ►Get business people involved and participating
  ►Limit expenditures at first, until you have proven the business benefit
  ►Do a POC to test drive your approach (“Try it, you’ll like it”)
    ►Isolate it from production applications
18
Thanks!
Bonnie O’Neil
Project Performance Corporation
24771 Westridge Rd.
Golden, CO 80403
Office: 303-642-3534 Cell: 303-725-1737
PPC is based in the Washington, DC area and performs both Government and Commercial work
IT Consulting, Project Management
Yazmin Rowe
Technology in Motion, Inc.
Office: 703-278-0792 Cell: 301-915-4471
19
Reference
►Newly released book:
  ►Business Metadata
  ►Authors:
    ►Bill Inmon
    ►Bonnie O’Neil
    ►Lowell Fryman
►Making metadata useful to the business
  ►Does metadata need to be translated into “business speak”?
  ►Where does business metadata live?
  ►What do you have to set in place to implement it?
  ►How do you do a Vulcan Mind Meld to get it out of people’s heads?