![Page 1: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/1.jpg)
Data Integration, Analysis, and Synthesis
Matthew B. Jones
National Center for Ecological Analysis and Synthesis
University of California Santa Barbara
Scalable Information Networks for the Environment
http://knb.ecoinformatics.org
Funding: National Science Foundation (DEB99-80154, DBI99-04777)
![Page 2: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/2.jpg)
NCEAS’ Mission
Integrate existing data for broad ecological synthesis
Use synthesis to inform policy and management
![Page 3: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/3.jpg)
Synthesis at NCEAS
Research Management Policy
200+ synthesis projects
1900+ participating scientists
![Page 4: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/4.jpg)
Research projects
Hunsaker – Quantification of Uncertainty in Spatial Data for Ecological Applications
Ives & Frost – Intrinsic and Extrinsic Variability in Community Dynamics
Osenberg – Meta-Analysis, Interaction Strength and Effect Size; Application of Biological Models to the Synthesis of Experimental Data
Murdoch – Complex Population Dynamics
![Page 5: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/5.jpg)
Management projects
Andelman – Designing and Assessing the Viability of Nature Reserve Systems at Regional Scales: Integration of Optimization, Heuristic and Dynamic Models
Boersma & Kareiva – Prospectus For An Analysis of Recovery Plans and Delisting
Kareiva – Habitat Conservation Planning for Endangered Species
Lubchenco, Palumbi, & Gaines – Developing the Theory of Marine Reserves
![Page 6: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/6.jpg)
Policy projects
Costanza & Farber – The Value of the World's Ecosystem Services and Natural Capital: Toward a Dynamic, Integrated Approach
http://www.nceas.ucsb.edu/
![Page 7: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/7.jpg)
Synthesis projects
Use existing data...
  Distributed sources
  Varying protocols
  Varying formats
Obtained via personal collaboration
![Page 8: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/8.jpg)
Functional breakdown for synthesis
Data discovery
Data access
Data storage
Data interpretation
Quality assessment
Data Conversion & Integration
Analysis & Modeling
Visualization
![Page 9: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/9.jpg)
Presentation Outline
Integration, Analysis, and Synthesis:
  Challenges
![Page 10: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/10.jpg)
Data Heterogeneity
Population survey
Experimental
Taxonomic survey
Behavioral
Meteorological
Oceanographic
Hydrology
…
Economic
Social (urban ecology)
Paleoecological
Historical
Land use
Demographics
![Page 11: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/11.jpg)
Types of Heterogeneity
Intensional vs. Arbitrary Heterogeneity
Syntax (format)
  CSV, Fixed ASCII, proprietary binary
Schema (organization)
  Non-normalized models
Semantics (meaning/methods)
  Protocol semantics (e.g., scale)
  Parameter semantics (e.g., bodysize (g))
  Conceptual framework (e.g., experimental treatments)
  Taxonomy + nomenclature
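As a rough illustration of syntactic heterogeneity only, the sketch below reads the same record from a CSV file and from a fixed-width ASCII file into one common structure. The sample text, the column offsets, and the function names `read_csv` and `read_fixed` are assumptions made for this example; they are not taken from the presentation.

```python
import csv
import io


def read_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def read_fixed(text, columns):
    """Parse fixed-width text; columns is a list of (name, start, end) offsets."""
    return [
        {name: line[start:end].strip() for name, start, end in columns}
        for line in text.splitlines()
        if line.strip()
    ]


# Hypothetical sample inputs carrying the same observation in two syntaxes.
csv_text = "Date,Site,Species\n10/1/1993,N654,PIRU\n"
fixed_text = "10/1/1993 N654 PIRU\n"
# Hypothetical layout; in practice this would come from physical-format metadata.
fixed_columns = [("Date", 0, 10), ("Site", 10, 15), ("Species", 15, 19)]

csv_rows = read_csv(csv_text)
fixed_rows = read_fixed(fixed_text, fixed_columns)
```

Both readers return the same list of dictionaries here, which is the easy part: schema and semantic differences remain after the syntax has been unified.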
![Page 12: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/12.jpg)
Data Dispersion
Data are distributed among:
  Independent researcher holdings
  Research station collections
    LTER Network (24 sites)
    Org. of Biological Field Stations (168 sites)
    Univ. Cal Natural Reserve System (36 sites)
    MARINE (62 sites)
    PISCO
  Agency databases
  Museum databases
Access via personal networking
  Not scalable
![Page 13: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/13.jpg)
Lack of Metadata
Majority of ecological data undocumented
  Lack information on syntax, schema and semantics of data
  Impossible to understand data without contacting the original researchers
Documentation conventions vary widely
  Requires large time investment to understand each data set
![Page 14: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/14.jpg)
Scaling Data Integration
Because of:
  Data heterogeneity
  Data dispersion
  Lack of documentation
Integration and synthesis are limited to a manual process
  Thus, difficult to scale integration efforts up to large numbers of data sets
![Page 15: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/15.jpg)
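Data Integration

A

| Date | Site | Species | Area | Count |
|---|---|---|---|---|
| 10/1/1993 | N654 | PIRU | 2 | 26 |
| 10/3/1994 | N654 | PIRU | 2 | 29 |
| 10/1/1993 | N654 | BEPA | 1 | 3 |

B

| Date | Site | picrub | betpap |
|---|---|---|---|
| 31Oct1993 | 1 | 13.5 | 1.6 |
| 14Nov1994 | 1 | 8.4 | 1.8 |

C

| Date | Site | Species | Density |
|---|---|---|---|
| 10/1/1993 | N654 | Picea rubens | 13 |
| 10/3/1994 | N654 | Picea rubens | 14.5 |
| 10/1/1993 | N654 | Betula papyrifera | 3 |
| 10/31/1993 | 1 | Picea rubens | 13.5 |
| 10/31/1993 | 1 | Betula papyrifera | 1.6 |
| 11/14/1994 | 1 | Picea rubens | 8.4 |
| 11/14/1994 | 1 | Betula papyrifera | 1.8 |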
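The step from tables A and B to the integrated table C can be sketched in a few lines of Python. This is a minimal illustration, not the KNB integration engine: the species lookup table and the function names are hypothetical, and it assumes that Density in C is Count divided by Area for table A and that the `picrub` and `betpap` columns of table B are already densities in the same units.

```python
from datetime import datetime

# Hypothetical code-to-name lookup; in practice this would come from metadata.
SPECIES_NAMES = {
    "PIRU": "Picea rubens",
    "BEPA": "Betula papyrifera",
    "picrub": "Picea rubens",
    "betpap": "Betula papyrifera",
}

# Table A: long format, one row per species, counts per sampled area.
table_a = [
    {"Date": "10/1/1993", "Site": "N654", "Species": "PIRU", "Area": 2, "Count": 26},
    {"Date": "10/3/1994", "Site": "N654", "Species": "PIRU", "Area": 2, "Count": 29},
    {"Date": "10/1/1993", "Site": "N654", "Species": "BEPA", "Area": 1, "Count": 3},
]

# Table B: wide format, one density column per species.
table_b = [
    {"Date": "31Oct1993", "Site": "1", "picrub": 13.5, "betpap": 1.6},
    {"Date": "14Nov1994", "Site": "1", "picrub": 8.4, "betpap": 1.8},
]


def normalize_date(text, fmt):
    """Convert a date string in the given format to M/D/YYYY."""
    d = datetime.strptime(text, fmt)
    return f"{d.month}/{d.day}/{d.year}"


def integrate(rows_a, rows_b):
    """Merge both tables into the common Date/Site/Species/Density layout."""
    merged = []
    for row in rows_a:
        merged.append({
            "Date": normalize_date(row["Date"], "%m/%d/%Y"),
            "Site": row["Site"],
            "Species": SPECIES_NAMES[row["Species"]],
            "Density": row["Count"] / row["Area"],  # assumed: density = count / area
        })
    for row in rows_b:
        for column in ("picrub", "betpap"):  # unpivot wide columns into rows
            merged.append({
                "Date": normalize_date(row["Date"], "%d%b%Y"),
                "Site": row["Site"],
                "Species": SPECIES_NAMES[column],
                "Density": row[column],
            })
    return merged


table_c = integrate(table_a, table_b)
```

Every decision hard-coded here (date formats, species codes, which columns to unpivot, how to derive density) is information that would otherwise have to come from the original researchers or from structured metadata.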
![Page 16: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/16.jpg)
Presentation Outline
Integration, Analysis, and Synthesis:
  Challenges
  Current work
    Knowledge Network for Biocomplexity
    Partnership for Biodiversity Informatics
![Page 17: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/17.jpg)
Knowledge Network for Biocomplexity (KNB)
National network for biocomplexity data
  Data discovery
  Data access
  Data interpretation
Enable advanced services
  Data integration
  Analysis framework
  Hypothesis modeling
  Visualization
![Page 18: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/18.jpg)
Central Role of Metadata
What metadata?
Ownership, attribution, structure, contents, methods, quality, etc.
Critical for addressing data heterogeneity issues
Critical for developing extensible systems
Critical for long-term data preservation
Allows advanced services to be built
![Page 19: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/19.jpg)
KNB Components
Ecological Metadata Language (EML)
Morpho -- data management for ecologists
  Cross platform Java application
Metacat -- flexible metadata & data system
Analysis and Modeling engine
Data integration engine
Semantic Query Processor
Hypothesis Modeling Engine
![Page 20: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/20.jpg)
Ecological Metadata Language
XML syntax for representing metadata
Extensible – can add new metadata
Modular – can subset metadata for specific applications
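To make the "extensible" and "modular" properties concrete, here is a small Python sketch that builds an XML metadata document from independent modules and then extracts a subset of them. The element names (`metadata`, `dataset`, `coverage`, `protocol`) and the sample values are placeholders chosen for illustration; they are not the actual EML schema, and the helper functions are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def add_module(root, name, fields):
    """Extensible: attach a new metadata module as a child element."""
    module = ET.SubElement(root, name)
    for key, value in fields.items():
        ET.SubElement(module, key).text = value
    return module


def subset(root, module_names):
    """Modular: copy only the requested modules into a new document."""
    out = ET.Element(root.tag)
    for child in root:
        if child.tag in module_names:
            out.append(copy.deepcopy(child))
    return out


# Placeholder document; names and values are illustrative, not real EML.
doc = ET.Element("metadata")
add_module(doc, "dataset", {"title": "Example tree density survey"})
add_module(doc, "coverage", {"temporal": "1993 to 1994"})
add_module(doc, "protocol", {"method": "Example plot counts"})

# An application that only needs coverage information takes just that module.
coverage_only = subset(doc, {"coverage"})
coverage_xml = ET.tostring(coverage_only, encoding="unicode")
```

A discovery service might use only the coverage module, while an integration tool would need the entity, attribute, and physical-format information as well.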
![Page 21: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/21.jpg)
EML 2.0beta3 modules
eml-resource -- Basic resource info
eml-dataset -- Data set info
eml-literature -- Citation info
eml-software -- Software info
eml-party -- People and Organizations
eml-entity -- Data entity (table) info
eml-attribute -- Attribute (variable) info
eml-constraint -- Integrity constraints
eml-physical -- Physical format info
eml-access -- Access control
eml-distribution -- Distribution info
eml-project -- Research project info
eml-coverage -- Geographic, temporal and taxonomic coverage
eml-protocol -- Methods and QA/QC
![Page 22: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/22.jpg)
![Page 23: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/23.jpg)
Metacat metadata system
[Figure: network diagram. Labels: Metacat servers (LTER, NCEAS, SDSC, NRS, SEV); Metacat Catalog; site metadata systems (AND, SEV, CAP, OBFS); XML wrapper; Morpho clients; Web clients; Key]
![Page 24: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/24.jpg)
Metacat architecture
[Figure: Metacat Server architecture. Labels: HTTP Server (Apache); Java Servlet Engine (Tomcat); Metacat Servlet (Dispatcher); Authentication Interface; LDAP Adapter; LDAP; Storage Subsystem; Query Subsystem; Replication Subsystem; Validation Subsystem; Transformation Subsystem; JDBC API; RDBMS (Oracle); Data Storage Interface; FS Adapter; File System]
![Page 25: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/25.jpg)
Metacat web interface
![Page 26: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/26.jpg)
UC Natural Reserve System
OBFS Network
LTER Network
![Page 27: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/27.jpg)
Functional breakdown for synthesis
Data discovery
Data access
Data storage
Data interpretation
Quality assessment
Data Conversion & Integration
Analysis & Modeling
Visualization
![Page 28: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/28.jpg)
Quality Assessment system
[Figure: Semantic Metadata + Researcher Decisions + Data → Quality Assessment Report]
![Page 29: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/29.jpg)
Quality Assessment
Integrity constraint checking
Data type checking
Metadata completeness
Data entry errors
Outlier detection
Check assertions about data
  e.g., trees don’t shrink
  e.g., sea urchins do
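Two of the checks listed above can be sketched in plain Python: data type checking against a declared schema, and an assertion that repeated measurements of the same individual never decrease ("trees don't shrink"). The field names, the sample measurements, and both function names are hypothetical, and a real system would presumably read the schema and the assertions from metadata rather than hard-coding them.

```python
def check_types(rows, schema):
    """Return (row_index, field) pairs whose value is not of the declared type."""
    return [
        (index, field)
        for index, row in enumerate(rows)
        for field, expected in schema.items()
        if not isinstance(row.get(field), expected)
    ]


def check_non_decreasing(rows, individual, time, value):
    """Return rows where a value is lower than the previous one for the same individual."""
    violations = []
    previous = {}
    for row in sorted(rows, key=lambda r: (r[individual], r[time])):
        last = previous.get(row[individual])
        if last is not None and row[value] < last:
            violations.append(row)
        previous[row[individual]] = row[value]
    return violations


# Hypothetical tree diameter measurements (cm).
measurements = [
    {"tree": "T1", "year": 1993, "dbh_cm": 20.0},
    {"tree": "T1", "year": 1994, "dbh_cm": 19.2},  # suspicious: the tree shrank
    {"tree": "T2", "year": 1993, "dbh_cm": 31.5},
    {"tree": "T2", "year": 1994, "dbh_cm": 32.1},
]
schema = {"tree": str, "year": int, "dbh_cm": (int, float)}

type_errors = check_types(measurements, schema)
shrinking = check_non_decreasing(measurements, "tree", "year", "dbh_cm")
```

The second example on the slide is the reason such assertions have to be declared per data set rather than built in: for sea urchins a decreasing size is legitimate, so the same check would produce false alarms.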
![Page 30: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/30.jpg)
Data Integration
[Figure: Semantic Metadata + Researcher Decisions + Data → Integrated Data Set]

Integrated Data Set

| Date | Site | Species | Density |
|---|---|---|---|
| 10/1/1993 | N654 | Picea rubens | 13 |
| 10/3/1994 | N654 | Picea rubens | 14.5 |
| 10/1/1993 | N654 | Betula papyrifera | 3 |
| 10/31/1993 | 1 | Picea rubens | 13.5 |
| 10/31/1993 | 1 | Betula papyrifera | 1.6 |
| 11/14/1994 | 1 | Picea rubens | 8.4 |
| 11/14/1994 | 1 | Betula papyrifera | 1.8 |
![Page 31: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/31.jpg)
Data Integration

A

| Date | Site | Species | Area | Count |
|---|---|---|---|---|
| 10/1/1993 | N654 | PIRU | 2 | 26 |
| 10/3/1994 | N654 | PIRU | 2 | 29 |
| 10/1/1993 | N654 | BEPA | 1 | 3 |

B

| Date | Site | picrub | betpap |
|---|---|---|---|
| 31Oct1993 | 1 | 13.5 | 1.6 |
| 14Nov1994 | 1 | 8.4 | 1.8 |

C

| Date | Site | Species | Density |
|---|---|---|---|
| 10/1/1993 | N654 | Picea rubens | 13 |
| 10/3/1994 | N654 | Picea rubens | 14.5 |
| 10/1/1993 | N654 | Betula papyrifera | 3 |
| 10/31/1993 | 1 | Picea rubens | 13.5 |
| 10/31/1993 | 1 | Betula papyrifera | 1.6 |
| 11/14/1994 | 1 | Picea rubens | 8.4 |
| 11/14/1994 | 1 | Betula papyrifera | 1.8 |

[Figure: bar chart of Density (#/m²), y-axis 0 to 16, one bar per row of table C, labeled by species (Picea rubens, Betula papyrifera)]
![Page 32: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/32.jpg)
Scaling Analysis and Modeling
[Figure: Data and Metadata Input (from Morpho/Metacat); Analysis + Model Metadata (Inputs, Outputs, Processing); Execution engine (plugins): SAS, R, Matlab, Simulation models, ...; Output]
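One way to picture an execution engine with plugins is a registry that maps an engine name, taken from the analysis metadata, to a callable. The sketch below is only an assumption about the general shape of such a design: the registry, the `register` decorator, the `run` function, and the metadata layout are all hypothetical, and a pure-Python mean stands in for the external engines (SAS, R, Matlab) named in the figure.

```python
from statistics import mean

# Hypothetical plugin registry: engine name -> callable.
PLUGINS = {}


def register(name):
    """Decorator that adds an analysis plugin to the registry."""
    def decorator(func):
        PLUGINS[name] = func
        return func
    return decorator


@register("mean")
def mean_plugin(inputs):
    """Stand-in for an external analytical engine."""
    return {"mean": mean(inputs["values"])}


def run(step, data):
    """Bind data columns to plugin inputs as described by the step metadata."""
    plugin = PLUGINS[step["engine"]]
    inputs = {
        argument: [row[field] for row in data]
        for argument, field in step["inputs"].items()
    }
    return plugin(inputs)


# Hypothetical analysis metadata: which engine to use and which field feeds it.
step = {"engine": "mean", "inputs": {"values": "Density"}}
data = [{"Density": 13.0}, {"Density": 14.5}, {"Density": 3.0}]

result = run(step, data)
```

Because the step is described as data rather than code, the same description could be dispatched to a different plugin without changing how inputs are bound.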
![Page 33: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/33.jpg)
Scaling Analysis and Modeling
Execution Engine
[Figure: execution engine data flow. Labels: Data and Metadata Input; Configuration for Analysis and Models; DDL Specification (Inputs and DDL Code); Procedural Specification (Inputs and proc code); Input Map Specification (test inputs mapped to metadata/data fields); Input Map Parser; Test Specification Parser; Script with unresolved variables; Script with symbolically resolved variables; Script with some fully resolved variables; Script/Metadata/Data Validation and Conflict Resolution; User or ontological input for conflict resolution; Data/Metadata Input facilitator and Parser; Data Package (Metadata with data file); Fully resolved final script; Script Executor; Analytical Engine Plugin; Output Stream from Analytical Engine; Output Config File; Output Renderer; Output (HTML, XML, Text, etc.)]
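The progression in the figure from a script with unresolved variables to a fully resolved final script can be illustrated with Python's `string.Template`. This is a loose sketch under stated assumptions: the script text, the input map, and the `resolve` function are invented for the example, and the validation step is reduced to checking that every mapped field exists in the data package.

```python
from string import Template

# Hypothetical analysis script with unresolved variables.
script = Template("summary(${density}); plot(${date}, ${density})")

# Hypothetical input map: script inputs mapped to metadata/data fields.
input_map = {"density": "Density", "date": "Date"}

# Fields assumed to be available in the data package.
available_fields = {"Date", "Site", "Species", "Density"}


def resolve(template, mapping, fields):
    """Validate the mapping against the data fields, then resolve every variable."""
    missing = sorted(v for v in mapping.values() if v not in fields)
    if missing:
        raise KeyError(f"fields not found in data package: {missing}")
    return template.substitute(mapping)


# Partial resolution leaves unknown variables in place for a later stage.
partially_resolved = script.safe_substitute({"density": "Density"})

# Full resolution yields the final script handed to the executor.
final_script = resolve(script, input_map, available_fields)
```

When a mapped field is absent, `resolve` raises an error, which is roughly the point where the figure calls for user or ontological input to settle the conflict.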
![Page 34: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/34.jpg)
![Page 35: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/35.jpg)
Semantic metadata
Describes the relationship between measurements and ecologically relevant concepts
Drawn from a controlled vocabulary
  Ontology for ecological measurements
![Page 36: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/36.jpg)
Ecological Ontologies
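[Figure: ontology diagram linking concepts with "isa" and "has" relations. Concepts: Biodiversity; Species; Taxon; Organism; Species Evenness (J'); Shannon Diversity (H'); Species Count (S); Abundance (N); Abundance of Species i (N_i); Sampling Area (A); Proportional Abundance of Species i (p_i)]

Formulas shown in the diagram:

$$H' = -\sum_{i=1}^{S} p_i \ln p_i$$

$$J' = \frac{H'}{\ln S}$$

$$p_i = \frac{N_i}{N}$$

$$N = \sum_{i=1}^{S} N_i$$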
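The quantities in the ontology can be computed directly from a list of per-species abundances. The sketch below uses the standard definitions of Shannon diversity and evenness, with N as the sum of the N_i and p_i = N_i / N; the function names and the sample counts are illustrative only.

```python
import math


def proportional_abundance(counts):
    """p_i = N_i / N, where N is the sum of all N_i."""
    total = sum(counts)
    return [n / total for n in counts]


def shannon_diversity(counts):
    """H' = -sum(p_i * ln(p_i)), skipping species with zero abundance."""
    return -sum(p * math.log(p) for p in proportional_abundance(counts) if p > 0)


def species_evenness(counts):
    """J' = H' / ln(S), where S is the number of species present."""
    species_count = sum(1 for n in counts if n > 0)
    if species_count < 2:
        raise ValueError("evenness is undefined for fewer than two species")
    return shannon_diversity(counts) / math.log(species_count)


# Illustrative abundances for two equally common species.
counts = [10, 10]
h_prime = shannon_diversity(counts)
j_prime = species_evenness(counts)
```

With two equally abundant species, H' equals ln 2 and J' equals 1, the maximum evenness. Encoding these dependencies in an ontology is what would let a system work out that a data set with per-species abundances is sufficient to derive H' and J'.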
![Page 37: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/37.jpg)
What drives synthesis
Science questions
Hypotheses
Analyses + Models
Integrated Data
Original Data
![Page 38: Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e375503460f94b26bc5/html5/thumbnails/38.jpg)
Conclusions
Barriers to integration can be addressed using structured metadata
Can accomplish a lot with ‘just’ mechanical transformations
Domain ontologies + semantic mediation are paths to scaling integration
Analysis drives all other phases of integration