
Towards Better Pipeline Data Governance J. Tracy Thorleifson Eagle Information Mapping, Inc.


Post on 27-Jul-2018








• The limitations of data
  – “There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.” – Donald Rumsfeld

• Lessons from manufacturing process management
  – “If you can't describe what you are doing as a process, you don't know what you're doing.” – W. Edwards Deming

• Pipelines and Black Swans
  – “It’s tough to make predictions, especially about the future.” – Yogi Berra

The $64 Question

• “Unfortunately in the San Bruno accident, we found that the company’s underlying records were not accurate... My question is that if your many efforts to improve safety are predicated on identifying risk, and if your baseline understanding of your infrastructure is not accurate, how confident are you that your risks are being assessed

– Deborah Hersman, NTSB Chairman, at the National Pipeline Safety Forum, April 18, 2011

The $Billion Answer

• Is this a pipe?
• Is this a pipeline?
• “The map is not the territory.” – Alfred Korzybski, 1931

The Process of Data Abstraction

• Your pipeline database isn’t the real pipeline
  – The pipeline database is a representation of the


Common Pitfalls in Digital Pipeline Data Abstraction

• Source documents that summarize information
  – Alignment sheets summarize pipe data
    • Individual joints of pipe are typically not represented
• Insufficient detail in source documents
  – Records for older pipelines may simply not contain information we now require
• Source documents that do not accurately reflect change over time
  – Missing repair records
  – Missing assessment records
• Insufficient documentation of data provenance
  – Lack of metadata regarding the source of the data

The Goal

• “As PHMSA and NTSB recommended, operators relying on the review of design, construction, inspection, testing and other related data to calculate MAOP or MOP must assure that the records used are reliable. An operator must diligently search, review and scrutinize documents and records, including but not limited to, all as-built drawings, alignment sheets, and specifications, and all design, construction, inspection, testing, maintenance, manufacturer, and other related records. These records shall be traceable, verifiable, and complete.” – PHMSA Advisory Bulletin ADB-11-01

Information Manufacture

• The process of converting raw data to refined information is fundamentally a manufacturing process
  – Too often, we approach information creation like skilled artisans
    • Information is crafted, not manufactured
    • Process uniformity is lacking
    • Data validation, verification and cleanup are performed as custom, “one-off” events
    • Reproducibility is dependent on the skill of the practitioner (i.e. the Subject Matter Expert)
  – While results may be acceptable, it’s a grossly inefficient way to run a business

Tools for Success Borrowed from Manufacturing Process Management

• Six Sigma
  – Process improvement through defect reduction and process uniformity
• Lean Manufacturing
  – Process improvement through elimination of waste
• Theory of Constraints (TOC)
  – Process improvement through maximization of

• All concentrate on DEFECT PREVENTION

Lessons from Six Sigma

• 6σ - if there are six standard deviations between the process mean and the nearest specification limit, the process yield is 99.99966% (a figure that allows for the conventional 1.5σ long-term drift in the process mean)
  – 3.4 defects per million operations
• Define and document your processes
• Establish process metrics
  – Data cycle time
  – Data defect incidence
• Analyze results; improve the process
• Institute process controls to prevent defects
  – Fail safe data checks to prevent bad data from entering the system
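
The 3.4 defects per million figure can be reproduced with a short calculation. The sketch below assumes the usual Six Sigma convention of a 1.5σ long-term shift in the process mean, so the effective one-sided distance to the specification limit is 4.5σ; that convention is not stated on the slide. The function names and the sample counts for the defect-incidence metric are hypothetical.

```python
from statistics import NormalDist

def dpmo_for_sigma_level(sigma_level: float, shift: float = 1.5) -> float:
    """Defects per million for a one-sided spec limit `sigma_level` standard
    deviations from the mean, after an assumed long-term mean shift."""
    tail = 1.0 - NormalDist().cdf(sigma_level - shift)
    return tail * 1_000_000

def observed_dpmo(defects: int, records: int, checks_per_record: int) -> float:
    """Data defect incidence expressed as defects per million opportunities."""
    return defects / (records * checks_per_record) * 1_000_000

six_sigma_dpmo = dpmo_for_sigma_level(6.0)
print(f"6 sigma: {six_sigma_dpmo:.1f} DPMO, "
      f"yield {100 - six_sigma_dpmo / 10_000:.5f}%")

# Hypothetical audit: 42 defects found in 10,000 records, 12 checks each.
print(f"observed: {observed_dpmo(42, 10_000, 12):.0f} DPMO")
```

Tracking the observed figure over time gives a simple process metric for data defect incidence that can be compared against a target sigma level.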

Lessons from Lean Manufacturing

• Identify and relentlessly eliminate wastes
  – Long data cycle times
  – Bad data
• Incorporate “autonomation” (smart automation) in your fail safe checks
  – Computers are lousy at correcting problems, but great at identifying them
  – Utilize the power of GIS
    • Incorporate spatial context into your automated fail
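
A fail safe check with spatial context might look like the following minimal sketch. It only identifies problems and never corrects them. The `ValveRecord` type, the 5 m tolerance, and the planar coordinates are all assumptions made for illustration; a production check would normally run inside the GIS against the real centerline geometry.

```python
from dataclasses import dataclass
import math

@dataclass
class ValveRecord:              # hypothetical incoming record
    valve_id: str
    x: float                    # projected coordinates, metres
    y: float
    source_document: str

def distance_to_segment(px, py, ax, ay, bx, by) -> float:
    """Shortest planar distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the nearest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def check_record(record, centerline, tolerance_m=5.0) -> list[str]:
    """Return a list of problems; an empty list means the record may load."""
    problems = []
    if not record.source_document:
        problems.append("missing source document")
    nearest = min(
        distance_to_segment(record.x, record.y, *a, *b)
        for a, b in zip(centerline, centerline[1:])
    )
    if nearest > tolerance_m:
        problems.append(f"{nearest:.1f} m from centerline (limit {tolerance_m} m)")
    return problems

centerline = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)]
good = ValveRecord("V-1", 40.0, 2.0, "alignment sheet 12")
bad = ValveRecord("V-2", 40.0, 30.0, "")
for rec in (good, bad):
    issues = check_record(rec, centerline)
    print(rec.valve_id, "OK" if not issues else f"REJECTED: {issues}")
```

Rejected records go back to a person for resolution, which keeps the automated step limited to what computers do well.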


Lessons from Theory of Constraints

• Identify process constraints, address them in priority order
  – Complicated processes are like rate-limited chemical reactions
    • The overall reaction rate is constrained by the slowest reaction step
    • Speed up the slowest reaction step, and the overall reaction rate increases
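
The rate-limiting idea can be shown in a few lines. In this sketch the step names and the records-per-day rates are invented for illustration, and the workflow is assumed to be strictly serial.

```python
# Hypothetical throughput of each step in a data workflow, in records per day.
step_rates = {
    "scan source documents": 400,
    "extract attributes": 250,
    "SME review": 60,
    "load to database": 900,
}

def bottleneck(rates: dict[str, float]) -> tuple[str, float]:
    """The slowest step sets the throughput of a serial process."""
    name = min(rates, key=rates.get)
    return name, rates[name]

step, rate = bottleneck(step_rates)
print(f"constraint: {step} at {rate} records/day")

# Speeding up a step that is not the constraint changes nothing overall.
faster_load = {**step_rates, "load to database": 1800}
print(bottleneck(faster_load))

# Relieving the constraint raises throughput until the next-slowest
# step becomes the new constraint.
faster_review = {**step_rates, "SME review": 300}
print(bottleneck(faster_review))
```

This is why constraints are addressed in priority order: effort spent anywhere other than the current constraint does not improve the overall rate.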

Document Your Data!

• The most accurate information is worthless if you don’t know where it comes from
  – Make data provenance a priority
  – Treat data like courtroom evidence
    • Document the chain of custody
• Popular pipeline data models like PODS and the APDM facilitate only record-level history tracking
  – This is necessary, but insufficient
  – Data edits should be tracked at the attribute level
  – The outcome of every decision branch in the data manufacturing process should be recorded
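
Attribute-level tracking could be sketched as below. This is not the PODS or APDM schema; the `AttributeEdit` and `TrackedRecord` classes, their field names, and the sample values are hypothetical and only illustrate recording the source and the decision behind each individual attribute change.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class AttributeEdit:
    record_id: str
    attribute: str
    old_value: object
    new_value: object
    source: str        # document or survey the new value came from
    decision: str      # which branch of the process produced the value
    edited_by: str
    edited_at: datetime

@dataclass
class TrackedRecord:
    record_id: str
    values: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def set(self, attribute, new_value, *, source, decision, edited_by):
        """Change one attribute and append an entry to its chain of custody."""
        self.history.append(AttributeEdit(
            self.record_id, attribute, self.values.get(attribute), new_value,
            source, decision, edited_by, datetime.now(timezone.utc)))
        self.values[attribute] = new_value

    def provenance(self, attribute):
        """All edits ever made to a single attribute, oldest first."""
        return [e for e in self.history if e.attribute == attribute]

seg = TrackedRecord("SEG-001")
seg.set("wall_thickness_in", 0.250, source="alignment sheet",
        decision="no mill record found; used sheet value", edited_by="jdoe")
seg.set("wall_thickness_in", 0.281, source="mill test report",
        decision="mill record supersedes sheet value", edited_by="asmith")

for e in seg.provenance("wall_thickness_in"):
    print(e.old_value, "->", e.new_value, "|", e.source, "|", e.decision)
```

Because the history is append-only, the current value of any attribute can always be traced back through every source and decision that led to it.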

The Problem of Induction, Black Swans, and Thermodynamics

• The problem of induction (as explored by Scottish philosopher David Hume)
  – During much of the 17th century, an Englishman could seemingly state with confidence, “all swans we have seen are white; therefore all swans are white.”
  – Black swans were discovered in Australia in 1697
• A Black Swan is:
  – Any event, positive or negative, that is highly improbable, and results in nonlinear consequences
    • Black Swans do not conform to Gaussian distributions, but rather obey Pareto (power law) distributions
  – An outlier event; nothing in our past experience convincingly points to its possibility
• “It’s the Second Law of Thermodynamics: Sooner or later everything turns to $#!+.” – Woody Allen
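
To see why the choice between a Gaussian and a Pareto distribution matters, compare how fast their tails decay. The sketch below is illustrative only: the Pareto shape parameter `alpha = 2.0` and scale `x_min = 1.0` are assumptions, and the two scales (standard deviations versus multiples of the minimum) are not directly comparable. The point is the rate of decay, not the absolute numbers.

```python
import math

def gaussian_tail(k: float) -> float:
    """P(X > k) for a standard normal variable."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def pareto_tail(x: float, x_min: float = 1.0, alpha: float = 2.0) -> float:
    """P(X > x) for a Pareto distribution: (x_min / x) ** alpha for x >= x_min."""
    return (x_min / x) ** alpha if x >= x_min else 1.0

for k in (2, 4, 6, 10):
    print(f"{k:>2}: gaussian {gaussian_tail(k):.2e}   pareto {pareto_tail(k):.2e}")
```

The Gaussian tail collapses faster than exponentially, while the power-law tail shrinks only polynomially, so a model built on Gaussian assumptions will treat extreme events as effectively impossible when they are merely rare.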

Black Swans and Narrative Fallacy

• Human beings are incredibly adept at explaining things
  – This leads to an unwarranted confidence in our ability to predict outcomes resulting from complexly interacting phenomena
  – Explanation ≠ Prediction
• “Things always become obvious after the fact.” – Nassim Nicholas Taleb
• Question: How good are our risk models, really?

Black Swans and Diagnostic Testing

• Nuclear Cardiac Stress Testing is used to diagnose Coronary Artery Disease (CAD)
  – Sensitivity = 91%
    • Failure to detect disease = 9%
    • In other words, it’s about the same as playing Russian Roulette with a revolver that has ten cartridge chambers
  – Specificity = 72%
    • False positives = 28%
  – Utility as a predictor of Acute Coronary Syndrome
    • “The current myocardial perfusion imaging toolset has limited sensitivity for screening patients who are at risk for ACS.”
  – Question: Is hydrostatic testing a panacea for incomplete pipeline records?
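
What a test result actually implies depends on how common the condition is, which sensitivity and specificity alone do not tell you. The sketch below applies Bayes' rule to the 91% and 72% figures quoted above; the prevalence values are assumptions chosen for illustration and do not come from the slides.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Bayes' rule: what a positive or negative test result actually implies."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(disease | positive test)
    npv = true_neg / (true_neg + false_neg)   # P(no disease | negative test)
    return ppv, npv

for prevalence in (0.05, 0.20, 0.50):         # assumed values
    ppv, npv = predictive_values(0.91, 0.72, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

When the condition is rare, most positive results are false alarms, and a small share of real cases still passes the test. The same arithmetic applies to any pass/fail assessment used to stand in for missing records.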

Mitigation vs. Common Sense

• State-of-the-art shark bite risk mitigation:
  – The Neptunic shark suit
    • Designed to mitigate the effects of unsolicited social interactions with hungry sharks of the “bitey” variety
    • Chainmail-style protection provides the diver with full body coverage
• Common sense:
  – Avoid risk


Conclusion

• Data can never represent the physical world with complete fidelity
  – We don’t really know much of what we think we know
• Information creation is a manufacturing process
  1. We don't know what we don't know.
  2. If we can't express what we do know numerically, we don't really know much about it.
  3. If we don't know much about it, we can't control it.
  4. If we can't control it, we are at the mercy of chance.
  – Dr. Mikel J. Harry
• Black Swans are unpredictable and unavoidable
  – The best you can accomplish is Black Swan robustness
  – Hubris is fatal