Data Integration, Analysis, and Synthesis. Matthew B. Jones, National Center for Ecological Analysis and Synthesis, University of California Santa Barbara. Scalable Information Networks for the Environment. http://knb.ecoinformatics.org
![Page 1: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/1.jpg)
Data Integration, Analysis, and Synthesis
Matthew B. Jones
National Center for Ecological Analysis and Synthesis
University of California Santa Barbara
Scalable Information Networks for the Environment
http://knb.ecoinformatics.org
Funding: National Science Foundation (DEB99-80154, DBI99-04777)
![Page 2: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/2.jpg)
NCEAS’ Mission
Integrate existing data for broad ecological synthesis
Use synthesis to inform policy and management
![Page 3: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/3.jpg)
Synthesis at NCEAS
Research, Management, Policy
200+ synthesis projects
1900+ participating scientists
![Page 4: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/4.jpg)
Research projects
Hunsaker – Quantification of Uncertainty in Spatial Data for Ecological Applications
Ives & Frost – Intrinsic and Extrinsic Variability in Community Dynamics
Osenberg – Meta-Analysis, Interaction Strength and Effect Size; Application of Biological Models to the Synthesis of Experimental Data
Murdoch – Complex Population Dynamics
![Page 5: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/5.jpg)
Management projects
Andelman – Designing and Assessing the Viability of Nature Reserve Systems at Regional Scales: Integration of Optimization, Heuristic and Dynamic Models
Boersma & Kareiva – Prospectus for an Analysis of Recovery Plans and Delisting
Kareiva – Habitat Conservation Planning for Endangered Species
Lubchenco, Palumbi, & Gaines – Developing the Theory of Marine Reserves
![Page 6: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/6.jpg)
Policy projects
Costanza & Farber – The Value of the World's Ecosystem Services and Natural Capital: Toward a Dynamic, Integrated Approach
http://www.nceas.ucsb.edu/
![Page 7: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/7.jpg)
Synthesis projects
Use existing data...
Distributed sources
Varying protocols
Varying formats
Obtained via personal collaboration
![Page 8: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/8.jpg)
Functional breakdown for synthesis
Data discovery
Data access
Data storage
Data interpretation
Quality assessment
Data Conversion & Integration
Analysis & Modeling
Visualization
![Page 9: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/9.jpg)
Presentation Outline
Integration, Analysis, and Synthesis:
Challenges
![Page 10: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/10.jpg)
Data Heterogeneity
Population survey
Experimental
Taxonomic survey
Behavioral
Meteorological
Oceanographic
Hydrology
…
Economic
Social (urban ecology)
Paleoecological
Historical
Land use
Demographics
![Page 11: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/11.jpg)
Types of Heterogeneity
Intensional vs. Arbitrary Heterogeneity
Syntax (format): CSV, fixed ASCII, proprietary binary
Schema (organization): non-normalized models
Semantics (meaning/methods):
Protocol semantics (e.g., scale)
Parameter semantics (e.g., body size (g))
Conceptual framework (e.g., experimental treatments)
Taxonomy + nomenclature
![Page 12: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/12.jpg)
Data Dispersion
Data are distributed among:
Independent researcher holdings
Research station collections: LTER Network (24 sites); Org. of Biological Field Stations (168 sites); Univ. Cal Natural Reserve System (36 sites); MARINE (62 sites); PISCO
Agency databases
Museum databases
Access via personal networking: not scalable
![Page 13: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/13.jpg)
Lack of Metadata
Majority of ecological data are undocumented
Lack information on syntax, schema, and semantics of data
Impossible to understand data without contacting the original researchers
Documentation conventions vary widely
Requires a large time investment to understand each data set
![Page 14: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/14.jpg)
Scaling Data Integration
Because of: data heterogeneity; data dispersion; lack of documentation
Integration and synthesis are limited to a manual process
Thus, difficult to scale integration efforts up to large numbers of data sets
![Page 15: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/15.jpg)
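Data Integration

A

| Date | Site | Species | Area | Count |
|---|---|---|---|---|
| 10/1/1993 | N654 | PIRU | 2 | 26 |
| 10/3/1994 | N654 | PIRU | 2 | 29 |
| 10/1/1993 | N654 | BEPA | 1 | 3 |

B

| Date | Site | picrub | betpap |
|---|---|---|---|
| 31Oct1993 | 1 | 13.5 | 1.6 |
| 14Nov1994 | 1 | 8.4 | 1.8 |

C

| Date | Site | Species | Density |
|---|---|---|---|
| 10/1/1993 | N654 | Picea rubens | 13 |
| 10/3/1994 | N654 | Picea rubens | 14.5 |
| 10/1/1993 | N654 | Betula papyrifera | 3 |
| 10/31/1993 | 1 | Picea rubens | 13.5 |
| 10/31/1993 | 1 | Betula papyrifera | 1.6 |
| 11/14/1994 | 1 | Picea rubens | 8.4 |
| 11/14/1994 | 1 | Betula papyrifera | 1.8 |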
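The integration shown on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup dictionaries and function names are hypothetical, density in A is taken to be Count divided by Area, and the values in B are assumed to already be densities in the same units. In practice these decisions would come from the metadata and from the researcher.

```python
from datetime import datetime

# Hypothetical lookups; the slide shows only the codes and the resulting names.
SPECIES_BY_CODE = {"PIRU": "Picea rubens", "BEPA": "Betula papyrifera"}
SPECIES_BY_COLUMN = {"picrub": "Picea rubens", "betpap": "Betula papyrifera"}

table_a = [
    {"Date": "10/1/1993", "Site": "N654", "Species": "PIRU", "Area": 2, "Count": 26},
    {"Date": "10/3/1994", "Site": "N654", "Species": "PIRU", "Area": 2, "Count": 29},
    {"Date": "10/1/1993", "Site": "N654", "Species": "BEPA", "Area": 1, "Count": 3},
]
table_b = [
    {"Date": "31Oct1993", "Site": "1", "picrub": 13.5, "betpap": 1.6},
    {"Date": "14Nov1994", "Site": "1", "picrub": 8.4, "betpap": 1.8},
]

def integrate_a(rows):
    """Long format with counts: expand species codes, derive density."""
    out = []
    for r in rows:
        out.append({
            "Date": datetime.strptime(r["Date"], "%m/%d/%Y").date(),
            "Site": r["Site"],
            "Species": SPECIES_BY_CODE[r["Species"]],
            "Density": r["Count"] / r["Area"],  # assumption: density = count / area
        })
    return out

def integrate_b(rows):
    """Wide format: one column per species becomes one row per species."""
    out = []
    for r in rows:
        date = datetime.strptime(r["Date"], "%d%b%Y").date()
        for column, name in SPECIES_BY_COLUMN.items():
            out.append({"Date": date, "Site": r["Site"],
                        "Species": name, "Density": r[column]})
    return out

table_c = integrate_a(table_a) + integrate_b(table_b)
for row in table_c:
    print(row["Date"].isoformat(), row["Site"], row["Species"], row["Density"])
```

Each source needs its own syntactic fix (date format), schema fix (wide to long), and semantic fix (codes to names, counts to densities), which is why the process does not scale when done by hand.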
![Page 16: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/16.jpg)
Presentation Outline
Integration, Analysis, and Synthesis:
Challenges
Current work: Knowledge Network for Biocomplexity; Partnership for Biodiversity Informatics
![Page 17: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/17.jpg)
Knowledge Network for Biocomplexity (KNB)
National network for biocomplexity data: data discovery; data access; data interpretation
Enable advanced services: data integration; analysis framework; hypothesis modeling; visualization
![Page 18: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/18.jpg)
Central Role of Metadata
What metadata? Ownership, attribution, structure, contents, methods, quality, etc.
Critical for addressing data heterogeneity issues
Critical for developing extensible systems
Critical for long-term data preservation
Allows advanced services to be built
![Page 19: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/19.jpg)
KNB Components
Ecological Metadata Language (EML)
Morpho -- data management for ecologists (cross-platform Java application)
Metacat -- flexible metadata & data system
Analysis and Modeling engine
Data integration engine
Semantic Query Processor
Hypothesis Modeling Engine
![Page 20: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/20.jpg)
Ecological Metadata Language
XML syntax for representing metadata
Extensible – can add new metadata
Modular – can subset metadata for specific applications
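As a rough illustration of what "extensible" and "modular" mean for an XML metadata record, the sketch below adds a new module to a record and then extracts a subset of modules for one application. The element names, the sample title, and the `subset` helper are all hypothetical and are not taken from the EML schema.

```python
import xml.etree.ElementTree as ET

# Illustrative record only: element names and values are made up, not EML.
record = ET.fromstring("""
<metadata>
  <resource><title>Example tree density survey</title></resource>
  <coverage><temporal begin="1993-10-01" end="1994-11-14"/></coverage>
  <attribute name="Density" unit="number per square meter"/>
</metadata>
""")

def subset(doc, wanted):
    """Return a new document holding only the requested top-level modules."""
    out = ET.Element(doc.tag)
    for child in doc:
        if child.tag in wanted:
            out.append(child)
    return out

# Extensible: a new module is appended without touching the existing ones.
ET.SubElement(record, "project").set("name", "example project")

# Modular: a discovery application might need only resource and coverage info.
discovery_view = subset(record, {"resource", "coverage"})
print(ET.tostring(discovery_view, encoding="unicode"))
```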
![Page 21: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/21.jpg)
EML 2.0beta3 modules
eml-resource -- Basic resource info
eml-dataset -- Data set info
eml-literature -- Citation info
eml-software -- Software info
eml-party -- People and Organizations
eml-entity -- Data entity (table) info
eml-attribute -- Attribute (variable) info
eml-constraint -- Integrity constraints
eml-physical -- Physical format info
eml-access -- Access control
eml-distribution -- Distribution info
eml-project -- Research project info
eml-coverage -- Geographic, temporal and taxonomic coverage
eml-protocol -- Methods and QA/QC
![Page 22: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/22.jpg)
![Page 23: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/23.jpg)
Metacat metadata system
[Diagram. Labels: Metacat catalogs at LTER, NCEAS, SDSC, NRS, and SEV; site metadata systems at AND, SEV, CAP, and OBFS; XML wrapper; Morpho clients; Web clients.]
![Page 24: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/24.jpg)
Metacat architecture
[Diagram. Labels: HTTP Server (Apache); Java Servlet Engine (Tomcat); Metacat Servlet (Dispatcher); Metacat Server with Transformation, Storage, Query, Replication, and Validation subsystems; Authentication Interface with LDAP Adapter and LDAP; JDBC API with RDBMS (Oracle); Data Storage Interface with FS Adapter and File System.]
![Page 25: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/25.jpg)
Metacat web interface
![Page 26: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/26.jpg)
UC Natural Reserve System
OBFS Network
LTER Network
![Page 27: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/27.jpg)
Functional breakdown for synthesis
Data discovery
Data access
Data storage
Data interpretation
Quality assessment
Data Conversion & Integration
Analysis & Modeling
Visualization
![Page 28: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/28.jpg)
Quality Assessment system
Inputs: Data + Semantic Metadata + Researcher Decisions
Output: Quality Assessment Report
![Page 29: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/29.jpg)
Quality Assessment
Integrity constraint checking
Data type checking
Metadata completeness
Data entry errors
Outlier detection
Check assertions about data: e.g., trees don’t shrink; e.g., sea urchins do
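An assertion check of the "trees don't shrink" kind could look roughly like the sketch below. The data, the `MAY_SHRINK` table, and the function name are hypothetical; the point is that whether shrinkage counts as an error is itself metadata about the taxon, not something hard-coded in the check.

```python
from collections import defaultdict

# Hypothetical metadata: which taxa may legitimately get smaller over time.
MAY_SHRINK = {"tree": False, "sea urchin": True}

# Hypothetical repeated measurements: (taxon, individual id, year, size).
measurements = [
    ("tree", "t1", 1993, 30.1),
    ("tree", "t1", 1994, 29.5),        # suspicious: a tree got smaller
    ("sea urchin", "u1", 1993, 5.0),
    ("sea urchin", "u1", 1994, 4.2),   # allowed: urchins can shrink
]

def find_shrinkage(rows, may_shrink):
    """Return (id, earlier year, later year) where a no-shrink taxon got smaller."""
    series = defaultdict(list)
    for taxon, ident, year, size in rows:
        if not may_shrink[taxon]:
            series[ident].append((year, size))
    flagged = []
    for ident, points in series.items():
        points.sort()  # chronological order within each individual
        for (y0, s0), (y1, s1) in zip(points, points[1:]):
            if s1 < s0:
                flagged.append((ident, y0, y1))
    return flagged

flagged = find_shrinkage(measurements, MAY_SHRINK)
print(flagged)
```

A flagged record is a candidate for the quality assessment report, not an automatic deletion; the researcher still decides what to do with it.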
![Page 30: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/30.jpg)
Data Integration
Inputs: Data + Semantic Metadata + Researcher Decisions
Output: Integrated Data Set

| Date | Site | Species | Density |
|---|---|---|---|
| 10/1/1993 | N654 | Picea rubens | 13 |
| 10/3/1994 | N654 | Picea rubens | 14.5 |
| 10/1/1993 | N654 | Betula papyrifera | 3 |
| 10/31/1993 | 1 | Picea rubens | 13.5 |
| 10/31/1993 | 1 | Betula papyrifera | 1.6 |
| 11/14/1994 | 1 | Picea rubens | 8.4 |
| 11/14/1994 | 1 | Betula papyrifera | 1.8 |
![Page 31: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/31.jpg)
Data Integration

A

| Date | Site | Species | Area | Count |
|---|---|---|---|---|
| 10/1/1993 | N654 | PIRU | 2 | 26 |
| 10/3/1994 | N654 | PIRU | 2 | 29 |
| 10/1/1993 | N654 | BEPA | 1 | 3 |

B

| Date | Site | picrub | betpap |
|---|---|---|---|
| 31Oct1993 | 1 | 13.5 | 1.6 |
| 14Nov1994 | 1 | 8.4 | 1.8 |

C

| Date | Site | Species | Density |
|---|---|---|---|
| 10/1/1993 | N654 | Picea rubens | 13 |
| 10/3/1994 | N654 | Picea rubens | 14.5 |
| 10/1/1993 | N654 | Betula papyrifera | 3 |
| 10/31/1993 | 1 | Picea rubens | 13.5 |
| 10/31/1993 | 1 | Betula papyrifera | 1.6 |
| 11/14/1994 | 1 | Picea rubens | 8.4 |
| 11/14/1994 | 1 | Betula papyrifera | 1.8 |

[Figure: bar chart of Density (#/m2), y-axis 0 to 16, one bar per row of table C (Picea rubens and Betula papyrifera).]
![Page 32: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/32.jpg)
Scaling Analysis and Modeling
[Diagram. Labels: Data and Metadata Input (from Morpho/Metacat); Analysis + Model Metadata (Inputs, Outputs, Processing); Execution engine (plugins): SAS, R, Matlab, Simulation models, ...; Output.]
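One way to picture an execution engine with plugins is a registry that maps an engine name to something that can run an analysis, driven by a small piece of analysis metadata. The sketch below is an assumption-laden illustration, not the KNB design: the registry, the decorator, and the two toy plugins are hypothetical, and plain Python functions stand in for external engines such as SAS, R, or Matlab.

```python
from typing import Callable, Dict, List

# Hypothetical registry: engine name -> function that runs an analysis.
PLUGINS: Dict[str, Callable[[List[float]], dict]] = {}

def plugin(name):
    """Decorator that registers a function as an analysis plugin."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register

@plugin("mean")
def mean_plugin(values):
    return {"mean": sum(values) / len(values)}

@plugin("range")
def range_plugin(values):
    return {"min": min(values), "max": max(values)}

def execute(analysis, data):
    """Run the analysis described by the `analysis` metadata on `data`."""
    engine = PLUGINS[analysis["engine"]]                 # choose the plugin
    values = [row[analysis["input"]] for row in data]    # bind the input field
    return engine(values)

data = [{"Density": 13.0}, {"Density": 14.5}, {"Density": 3.0}]
result = execute({"engine": "mean", "input": "Density"}, data)
print(result)
```

Because the analysis is described by metadata rather than by code, the same description can be rerun against any data set whose metadata exposes a matching field.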
![Page 33: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/33.jpg)
Scaling Analysis and Modeling
Execution Engine
[Diagram. Labels:
Data and Metadata Input
Configuration for Analysis and Models
DDL Specification (Inputs and DDL Code)
Procedural Specification (Inputs and proc code)
Input Map Specification (test inputs mapped to metadata/data fields)
Input Map Parser
Test Specification Parser
Script with unresolved variables
Script with symbolically resolved variables
Script with some fully resolved variables
Script/Metadata/Data Validation and Conflict Resolution
User or ontological input for conflict resolution
Data/Metadata Input facilitator and Parser
Data Package (Metadata with data file)
Fully resolved final script
Script Executor
Analytical Engine Plugin
Output Stream from Analytical Engine
Output Renderer
Output Config File
Output (HTML, XML, Text, etc.)]
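The progression from a script with unresolved variables to a fully resolved final script can be illustrated with Python's `string.Template`. Everything here is a hypothetical stand-in: the script text, the variable names, and the two-step split are assumptions chosen only to mirror the labels in the diagram.

```python
from string import Template

# Hypothetical analysis script with unresolved variables.
script = Template("summary(aggregate($response ~ $group, data = $table, FUN = mean))")

# Hypothetical input map: script variable -> field name taken from the metadata.
input_map = {"response": "Density", "group": "Species"}

# Step 1: symbolic resolution against metadata; unknown variables stay in place.
partially_resolved = Template(script.safe_substitute(input_map))

# Step 2: bind the remaining variable to a concrete table from the data package.
final_script = partially_resolved.substitute(table="integrated")

print(partially_resolved.template)
print(final_script)
```

`safe_substitute` leaves `$table` untouched in the first step, while `substitute` in the second step would raise `KeyError` if any variable were still unbound, which is a simple form of the validation step named in the diagram.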
![Page 34: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/34.jpg)
![Page 35: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/35.jpg)
Semantic metadata
Describes the relationship between measurements and ecologically relevant concepts
Drawn from a controlled vocabulary
Ontology for ecological measurements
![Page 36: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/36.jpg)
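Ecological Ontologies
[Diagram. Concepts: Biodiversity, Species, Taxon, Organism, Species Evenness (J'), Shannon Diversity (H'), Species Count (S), Abundance (N), Abundance of Species i (N_i), Sampling Area (A), Proportional Abundance of Species i (p_i). Relations between concepts: isa, has.]

$$H' = -\sum_{i=1}^{S} p_i \ln p_i \qquad J' = \frac{H'}{\ln S} \qquad p_i = \frac{N_i}{N} \qquad N = \sum_{i=1}^{S} N_i$$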
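The quantities in this ontology are the standard Shannon diversity and evenness measures: p_i = N_i / N, H' = -Σ p_i ln p_i, and J' = H' / ln S. A minimal Python sketch follows; the function names and the sample counts are illustrative, and species with zero abundance are assumed not to count toward S.

```python
import math

def shannon_diversity(abundances):
    """H' = -sum(p_i * ln p_i), with p_i = N_i / N over species present."""
    total = sum(abundances)
    proportions = [n / total for n in abundances if n > 0]
    return -sum(p * math.log(p) for p in proportions)

def species_evenness(abundances):
    """J' = H' / ln S, where S is the number of species present."""
    s = sum(1 for n in abundances if n > 0)
    if s < 2:
        raise ValueError("evenness is undefined for fewer than two species")
    return shannon_diversity(abundances) / math.log(s)

# Hypothetical counts N_i for four equally abundant species.
counts = [10, 10, 10, 10]
h = shannon_diversity(counts)
j = species_evenness(counts)
print(h, j)
```

With equal abundances H' equals ln S and J' equals 1, which gives a quick sanity check for any implementation.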
![Page 37: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/37.jpg)
What drives synthesis
Science questions
Hypotheses
Analyses + Models
Integrated Data
Original Data
![Page 38: Data Integration, Analysis, and Synthesis](https://reader035.vdocument.in/reader035/viewer/2022081422/56816057550346895dcf8188/html5/thumbnails/38.jpg)
Conclusions
Barriers to integration can be addressed using structured metadata
Can accomplish a lot with ‘just’ mechanical transformations
Domain ontologies + semantic mediation are paths to scaling integration
Analysis drives all other phases of integration