DMAC Data Integration
What is it really?
Why does it seem frozen in place?
How do we get it moving?
Steve Hankin (NOAA/PMEL)
DMAC = Data Management and Communications subsystem of the US Integrated Ocean Observing System (IOOS)
June '07 OCO Annual Review
Part 1. A Short Digression (begging your indulgence …)
What’s new in the Observing System Monitoring Center (OSMC)
under the hood …
Metadata feeds from NOAAPort & GODAE
GODAE QC fields to be added next …
A feed from NCEP?
Goal:
– Compare QC strategies.
– Compare GTS filters and feeds.
Part 2. DMAC Data Integration
(DMAC = Data Management and Communications subsystem of IOOS)
Just what is DMAC “data integration”? (and what is it not?)
Start with a taxonomy through examples …
What is it really?
Why does it seem frozen in place?
How do we get it moving?
An analogy: the electric power grid
Energy goes in. Energy comes out.
Providers do not target specific consumers.
They just adhere to standards (60 Hz).
Consumers are not aware of specific providers.
Analogy appears simplistic until you refine your concept of data. Data must always be tightly bound to its metadata.
DMAC integration is a “data grid”
The concept of “integration” in DMAC
Analogy is simplistic?
The DMAC Plan (2004) is built around a “data grid” concept
(a.k.a. “data commons”)
Uniform services (standards)
– to interconnect existing systems
“Do no Harm”
Existing standards are inadequate
An implementation plan, not a specification
240 pages
How far have we progressed?
Honest answer: barely at all.
Why?
1. Formulation choices in the DMAC Plan
2. Political chaos
3. Community social structure
How do we overcome each of these obstacles?
How far has DMAC progressed since 2004?
DMAC Plan has detailed milestones
But they are not sufficiently tangible
– e.g. “publish a community standard for [xxx]”.
Solution: Reformulate the Plan as a sequence of tasks that each provide tangible benefits.
Obstacle 1: Formulation choices in the plan
Dumb, bad luck timing (post 9/11) & Interagency coordination failures
lead to
Negligible direct funding
(just enough for “volunteer” meetings)
(Note: millions have been made available that generated additional demand for DMAC guidance)
Solution: Better marketing. Map out a Plan that can be marketed to Gov’t managers
Obstacle 2: Political chaos
Obstacle 3: Community social structure
The diminutive nation of Science Data Management lies nestled among three neighbors:
1. IT Infrastructure
2. Computer Science
3. Science Research
Each is larger and more powerful and imposes its viewpoint on our small nation.
Science Research
Computer Science
IT Infrastructure
DataMgmt
Obstacle 3: Community social structure
3. Science/Research viewpoint:
“Reduce complexity by limiting the number of variables to be considered initially.”
But data management challenges are largely independent of data content.
Analogy: would it reduce complexity in designing an ocean glider if it only had to measure temperature?
Data management simplifies by reducing the number of data structures (a.k.a. “data models”).
Proposal: Build the DMAC integration framework as a collection of Virtual Data Assembly Centers (“V-DACs”) by data structure.
To be developed one-by-one:
1. Grids (models, satellites, climatologies)
2. Time series
3. Surface Tracks
4. Vertical Profiles and Sections
5. …, Scatters, Swaths, Radials, Polygons, …
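One way to picture "by data structure" is a small set of container types, each carrying its metadata with it. The sketch below is illustrative only: the class names, field names, variable names and sample numbers are assumptions made for this example and do not come from the DMAC Plan. It shows that code written against a structure (here, a time series) works unchanged whether the content is temperature or sea level.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Metadata:
    """Hypothetical metadata record that travels with every dataset."""
    variable: str            # e.g. "sea_water_temperature"
    units: str               # e.g. "degC"
    provider: str            # who supplied the data
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class TimeSeries:
    """Values at one fixed location, indexed by time."""
    metadata: Metadata
    lon: float
    lat: float
    times: List[str]         # ISO 8601 timestamps
    values: List[float]


@dataclass
class Profile:
    """Values at one location and time, indexed by depth."""
    metadata: Metadata
    lon: float
    lat: float
    time: str
    depths_m: List[float]
    values: List[float]


def summarize(ts: TimeSeries) -> str:
    """Depends only on the structure, not on which variable is stored."""
    m = ts.metadata
    return f"{m.variable} [{m.units}] from {m.provider}: {len(ts.values)} samples"


# Placeholder numbers for illustration; these are not real observations.
temperature = TimeSeries(
    Metadata("sea_water_temperature", "degC", "provider_a"),
    lon=-140.0, lat=0.0,
    times=["2007-06-01T00:00:00Z", "2007-06-02T00:00:00Z"],
    values=[27.1, 27.3],
)
sea_level = TimeSeries(
    Metadata("sea_surface_height", "m", "provider_b"),
    lon=-157.9, lat=21.3,
    times=["2007-06-01T00:00:00Z"],
    values=[0.42],
)

print(summarize(temperature))
print(summarize(sea_level))
```

The same `summarize` function serves both datasets, which is the sense in which the number of structures, rather than the number of variables, drives the data management effort.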
time series protocol
Time series V-DAC
Meta-data
TAO BATS
OceanSites
U. Hawaii Sea Level Center
NDBC
NODC
• bricks-and-mortar time series “curator” (funded)
• standard protocol(s) (“web services”)
• one access point
• multiple variables
Imagine the V-DAC for time series data
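The "one access point, multiple variables" idea on this slide can be sketched as a thin registry that sits in front of several providers. Everything in the sketch is hypothetical: `TimeSeriesVDAC`, `make_adapter`, the provider labels and the sample values are invented for illustration, and a real adapter would call each provider's own service rather than read an in-memory table.

```python
from typing import Callable, Dict, List, Tuple

# A record is (provider, variable, ISO 8601 timestamp, value).
Record = Tuple[str, str, str, float]
# A provider adapter takes a variable name and returns that provider's records.
Adapter = Callable[[str], List[Record]]


class TimeSeriesVDAC:
    """Hypothetical single access point over several time series providers."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Adapter] = {}

    def register(self, name: str, adapter: Adapter) -> None:
        self._adapters[name] = adapter

    def query(self, variable: str, start: str, end: str) -> List[Record]:
        """Return records for `variable` with start <= time <= end, from all providers."""
        results: List[Record] = []
        for adapter in self._adapters.values():
            for rec in adapter(variable):
                # Timestamps share one ISO 8601 format, so string order is time order.
                if start <= rec[2] <= end:
                    results.append(rec)
        return sorted(results, key=lambda r: r[2])


def make_adapter(provider: str, table: Dict[str, List[Tuple[str, float]]]) -> Adapter:
    """Wrap an in-memory table as a provider (a stand-in for a real service call)."""
    def adapter(variable: str) -> List[Record]:
        return [(provider, variable, t, v) for t, v in table.get(variable, [])]
    return adapter


# Placeholder values for illustration only; these are not real observations.
vdac = TimeSeriesVDAC()
vdac.register("provider_a", make_adapter("provider_a", {
    "sea_water_temperature": [("2007-06-01T00:00:00Z", 27.1),
                              ("2007-06-03T00:00:00Z", 27.4)],
}))
vdac.register("provider_b", make_adapter("provider_b", {
    "sea_water_temperature": [("2007-06-02T00:00:00Z", 12.8)],
    "wind_speed": [("2007-06-02T00:00:00Z", 6.5)],
}))

hits = vdac.query("sea_water_temperature",
                  "2007-06-01T00:00:00Z", "2007-06-02T23:59:59Z")
for provider, variable, time, value in hits:
    print(provider, variable, time, value)
```

The consumer calls `query` without knowing which provider holds which records, mirroring the power grid analogy in which consumers are not aware of specific providers.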
also fund a metadata development activity:
– Data discovery
– Controlled vocabularies
– Data lineage
– Geo-referencing
– Instrument characterizations
– Quality control
How do we build an ocean temperature V-DAC?
Time series V-DAC
Meta-data
Profiles V-DAC
Meta-data
Grids V-DAC
Meta-data
Temperature V-DAC
Meta-data
A single place to access all ocean temperature data
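Read this way, a temperature V-DAC needs no new data store: it can be a view that asks each structure V-DAC for datasets whose metadata names the variable. The sketch below assumes a shared controlled vocabulary so that variable names match across centers. The class names, the variable name and the provider labels are all hypothetical.

```python
from typing import List, NamedTuple


class Dataset(NamedTuple):
    structure: str   # "time_series", "profile", "grid", ...
    variable: str    # taken from the dataset's metadata
    provider: str


class StructureVDAC:
    """Hypothetical catalog for one data structure."""

    def __init__(self, structure: str) -> None:
        self.structure = structure
        self._datasets: List[Dataset] = []

    def add(self, variable: str, provider: str) -> None:
        self._datasets.append(Dataset(self.structure, variable, provider))

    def find(self, variable: str) -> List[Dataset]:
        return [d for d in self._datasets if d.variable == variable]


class VariableVDAC:
    """A view over several structure V-DACs, selected by variable name."""

    def __init__(self, variable: str, sources: List[StructureVDAC]) -> None:
        self.variable = variable
        self._sources = sources

    def datasets(self) -> List[Dataset]:
        found: List[Dataset] = []
        for source in self._sources:
            found.extend(source.find(self.variable))
        return found


# Illustrative catalog entries only; provider labels are placeholders.
time_series = StructureVDAC("time_series")
time_series.add("sea_water_temperature", "provider_a")
time_series.add("wind_speed", "provider_a")

profiles = StructureVDAC("profile")
profiles.add("sea_water_temperature", "provider_b")

grids = StructureVDAC("grid")
grids.add("sea_water_temperature", "provider_c")
grids.add("sea_surface_height", "provider_c")

temperature = VariableVDAC("sea_water_temperature", [time_series, profiles, grids])
for d in temperature.datasets():
    print(d.structure, d.provider)
```

The lookup succeeds only because every center uses the same variable name, which is why the metadata development activity (controlled vocabularies in particular) is funded alongside the structure V-DACs.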
The virtues of this approach:
Reductionism: One protocol at a time
A concrete deliverable at every step
Unites communities of interest (integration)
But can we market the idea to management?
(Who has the ability to carry the message to management?)
The science community has a strong voice.
(Much stronger than DM.)
Discussion
(Thank you)