Vision for the 21st Century Information Environment in Ecology (Ecoinformatics)
Deana Pennington, University of New Mexico, LTER Network Office
Shawn Bowers, UCSD, San Diego Supercomputer Center
Data Types
• Field data: small, complex formats, heterogeneous
• GIS (if georeferenced): moderately large, complex formats
• Imagery: massive, simple formats, continuous spatial
• Ground sensors: massive, simple formats, continuous temporal
• NEON Observatories: question-driven data collection
• Ecological Metadata Language (EML)
• SEEK: large ITR project
• Spatial Data Workbench: small NPACI project
• Wireless Sensor Workshop
• Information Acquisition, Archival & Retrieval
• Data Preprocessing & Product Creation
• Integrated Data Analysis & Synthesis
• Inference From Pattern
Information Technologies / Analytical Domains:
• Hardware, networks
• Electronic notebooks
• Remote Sensing
• Wireless Sensors
• Metadata
• Databases & Query
• Web design
• Grid technologies
• Processing Pipelines
• High-throughput processing
• Expert systems
• Semantic mediation
• Data mining
• Exploratory spatial data analysis
• Pattern matching
• Visualization
• Computational Models
• Genetic algorithms
• Cellular automata
• Adaptive agents, et al.
SEEK Workflows
Spatial Data Workbench
EML
Wireless Sensors
Characteristics of Ecological Data
[Figure: dataset types plotted by data volume per dataset (low to high) against complexity/metadata requirements (low to high). Dataset types shown: satellite images, weather stations, wireless sensors, business data, gene sequences, GIS, population data, biodiversity surveys, primary productivity, soil cores. SEEK is marked on the figure. Modified from B. Michener.]
Field Data: Semantics

Source table 1:
Date       Site  picrub  betpap
31Oct1993  1     13.5    1.6
14Nov1994  1     8.4     1.8

Source table 2:
Date       Site  Species  Area  Count
10/1/1993  N654  PIRU     2     26
10/3/1994  N654  PIRU     2     29
10/1/1993  N654  BEPA     1     3

Integrated table:
Date        Site  Species            Density
10/1/1993   N654  Picea rubens       13
10/3/1994   N654  Picea rubens       14.5
10/1/1993   N654  Betula papyrifera  3
10/31/1993  1     Picea rubens       13.5
10/31/1993  1     Betula papyrifera  1.6
11/14/1994  1     Picea rubens       8.4
11/14/1994  1     Betula papyrifera  1.8

Modified from B. Michener, 2003
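The integration shown on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the species-code mapping is inferred from the integrated table, the values in the first table are taken to be densities already, and density in the second table is taken to be Count divided by Area. The function and variable names are hypothetical.

```python
from datetime import datetime

# Assumed mapping from the column/species codes to full species names.
SPECIES_NAMES = {
    "picrub": "Picea rubens",
    "betpap": "Betula papyrifera",
    "PIRU": "Picea rubens",
    "BEPA": "Betula papyrifera",
}

# Source table 1: one column per species, values assumed to be densities.
wide_rows = [
    {"Date": "31Oct1993", "Site": "1", "picrub": 13.5, "betpap": 1.6},
    {"Date": "14Nov1994", "Site": "1", "picrub": 8.4, "betpap": 1.8},
]

# Source table 2: one row per species, with sampled area and raw count.
count_rows = [
    {"Date": "10/1/1993", "Site": "N654", "Species": "PIRU", "Area": 2, "Count": 26},
    {"Date": "10/3/1994", "Site": "N654", "Species": "PIRU", "Area": 2, "Count": 29},
    {"Date": "10/1/1993", "Site": "N654", "Species": "BEPA", "Area": 1, "Count": 3},
]


def from_wide(rows):
    """Turn each species column into its own (date, site, species, density) row."""
    out = []
    for row in rows:
        date = datetime.strptime(row["Date"], "%d%b%Y").date()
        for code in ("picrub", "betpap"):
            out.append((date, row["Site"], SPECIES_NAMES[code], row[code]))
    return out


def from_counts(rows):
    """Convert count and area to density (assumed to be count / area)."""
    out = []
    for row in rows:
        date = datetime.strptime(row["Date"], "%m/%d/%Y").date()
        density = row["Count"] / row["Area"]
        out.append((date, row["Site"], SPECIES_NAMES[row["Species"]], density))
    return out


integrated = from_counts(count_rows) + from_wide(wide_rows)
for record in integrated:
    print(record)
```

The point of the sketch is that every step (date format, species code, unit conversion) depends on knowledge that is not in the data files themselves, which is what metadata and ontologies are meant to supply.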
Remotely Sensed & Ground Data
Satellite
• Landsat since 1972 (multispectral)
• Ikonos (hyperspatial)
• Hyperion (hyperspectral)
Airborne
• Air photos (historical reconnaissance)
• Radar
• Thermal
• ADAR (multispectral)
• AVIRIS (hyperspectral)
Ground data
• Field data
• Automated sensors
• Wireless sensors
[Figure: a target, remotely sensed at t = 1 and t = 2, with Event A marked along the time axis t.]
Remotely sensed images capture information continuously in space; images can then be compared through time to derive events.
Wireless sensors capture information continuously in time; readings can then be compared through space to derive spatial patterns.
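One way to picture the two views is as different slices of the same space-time cube. The sketch below is illustrative only: the grid values, the threshold, and the function names are made up for the example and do not come from the slides.

```python
# Hypothetical observations indexed as cube[t][row][col]; the values are invented.
cube = [
    [[1, 1], [1, 1]],  # t = 1
    [[1, 5], [1, 1]],  # t = 2
]


def image(t):
    """Remote-sensing view: every location at one time step."""
    return cube[t]


def series(row, col):
    """Sensor view: every time step at one location."""
    return [frame[row][col] for frame in cube]


def changed_cells(t0, t1, threshold):
    """Compare two images through time and return cells whose change exceeds the threshold."""
    return [
        (r, c)
        for r, row in enumerate(cube[t0])
        for c, value in enumerate(row)
        if abs(cube[t1][r][c] - value) > threshold
    ]


print(changed_cells(0, 1, threshold=2))  # cells where an "event" occurred between the two images
print(series(0, 1))                      # the time series a sensor at that cell would record
```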
History Repeats Itself…
“…use of remotely sensed data…lagged for many years. The reasons for this have little to do with the sophistication of remote sensing technology. Rather it has to do more with the ability to store, manage, access and use the massive data produced by satellites, radar facilities and other remote sensing instruments. Without advanced information processing, it would take decades to compile and analyze the incredible amounts of information that [are] produced by many of these instruments.”
-Dr. Rita Colwell, Director NSF, 1998
• Sensors
• Deployed Sensor Networks
• Metadata
• Security and Error Resiliency
• Cyberinfrastructure for Sensor Networks
• Analysis and Visualization
• Education
• Outreach
• Collaboration and Partnering
Environmental Cyberinfrastructure Needs for Distributed Sensor Networks: a Report from a NSF Sponsored Workshop (2003)
Incorporating IT Analytical Advances into Ecology
• Information Acquisition, Archival & Retrieval
• Data Preprocessing & Product Creation
• Integrated Data Analysis & Synthesis
• Inference From Pattern
Grid Technologies
Knowledge Representation, Semantics and Ontologies
The Semantic Web
Extend the current web with “knowledge” and “meaning” for:
• Better searching (that is, better answers to current searches)
• Automated software tools that process web information (comparison shopping, making appointments, and so on)
Proposes a new form of web content, which uses ontologies and knowledge representation techniques
The Semantic Web [Sci. Am., May ‘01, Berners-Lee]
Semantic-Web Agent
“Mom needs to see a specialist for a series of physical therapy sessions – can you take her?”
Find physical therapist for mom using my schedule:
• get openings
• get physician prescription
• get possible providers and availability
• get locations
Return provider available within 10 miles of location
Semantic Web Architecture (RDF)
The Resource Description Framework (RDF) is a language to:
• Define standard ontologies
• Annotate web pages with Semantic-Web content
Ultimately, tools … to exploit semantic markup:
• Web crawlers, search engines, personal agents
RDF / RDF Schema
An RDF Schema (or OWL) ontology:
• Serves as a common set of terms (a vocabulary) with relationships and constraints
• Can be published as Web content using RDF (for others to use)
[Figure: example ontology. Classes: InsuranceProvider, Physician, PhysicalTherapist, MedicalFacility, Location. Properties: covers, worksAt, locatedAt.]
RDF / RDF Schema
With RDF, this Web page can be annotated using the ontology
[Figure: the example ontology (classes InsuranceProvider, Physician, PhysicalTherapist, MedicalFacility, Location; properties covers, worksAt, locatedAt) annotating a Web page: BlueCross covers Dr. Hartman, who worksAt UniversityHospital, which is locatedAt 555 Univ. Drive …]
RDF / RDF Schema
Annotations provide access to the meaningful, or semantic, content of the Web page
[Figure: the annotated example: BlueCross covers Dr. Hartman, who worksAt UniversityHospital, which is locatedAt 555 Univ. Drive …]
Query: Which Physical Therapists workAt a Facility within Location X?
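The query on this slide can be illustrated with a toy triple store in plain Python. This is a sketch, not RDF tooling: the `type` triples are assumptions (the slide does not state which class Dr. Hartman belongs to), the identifiers are hypothetical, and exact matching on the location string stands in for a real spatial "within" test.

```python
# Facts as (subject, predicate, object) triples, following the slide's example.
TRIPLES = {
    ("BlueCross", "type", "InsuranceProvider"),
    ("DrHartman", "type", "PhysicalTherapist"),  # assumption: class not stated on the slide
    ("UniversityHospital", "type", "MedicalFacility"),
    ("BlueCross", "covers", "DrHartman"),
    ("DrHartman", "worksAt", "UniversityHospital"),
    ("UniversityHospital", "locatedAt", "555 Univ. Drive …"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known triple."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is a known triple."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def therapists_at(location):
    """Physical therapists who work at a facility located at the given location."""
    found = set()
    for person in subjects("type", "PhysicalTherapist"):
        for facility in objects(person, "worksAt"):
            if location in objects(facility, "locatedAt"):
                found.add(person)
    return found


print(therapists_at("555 Univ. Drive …"))
```

Because the annotations use shared property names (worksAt, locatedAt), the same query function works against any page annotated with the same ontology.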
SEEK and the Semantic Web
We want to build technology using Semantic-Web standards to explore the use of semantics to help scientists deal with heterogeneity:
• Define standard ecological ontologies
• Automate dataset and analytic-step discovery, exchange, and integration
• Help researchers construct and reuse scientific workflows, for example, for ecological modeling
SEEK EcoGrid
1. Question of interest
2. Query EcoGrid for workflows (ontologies)
3. Query EcoGrid for data (ontologies & semantic mediation)
4. SRB optimizes and runs analysis
5. Get results…archive to EcoGrid
Working Groups:
1. EcoGrid
2. Semantic mediation & KR
3. Analysis & Modeling
4. Taxon
5. BEAM
6. EOT
[Figure: EcoGrid diagram (Matt Jones, 2003). Labels shown: pipelines; 60 Gigabits/second; resources (data & computational) managed by Storage Resource Broker (SRB); Analytical Services; Data Services (includes analytical libraries).]
1. Node Registry
• Web service: XML standards, SOAP/WSDL protocols
• Data: REQUIRES standard metadata (EML and others)
• Workflows: standard workflow metadata?
Overview of architecture
SEEK Components
Benefits to Users
Scientists
• Access to high-end computing technologies
• Better integration of all relevant data
• Workflow standardization and analysis
• Time and resource efficiency
• Reusable analytical steps & workflows
Students
• Improved access to knowledge base
Environmental Managers
• Accessibility to current scientific approach
Policy makers
• Timely input to decision making

• Formal documentation of methods (output in report format)
• Reproducibility of methods
• Visual creation and communication of methods
• Versioning
• Automated data typing and transformation
SEEK: ENM workflows
[Figure: workflow diagram. Data sources: EcoGrid databases, reached through EcoGrid queries. Data items: species presence & absence points, environmental layers, integrated layers, training sample, test sample, GARP rule set, native range prediction map, model quality parameters, selected prediction maps. Steps: layer integration (physical transformation, scaling), sample data, data calculation, map generation, validation, user, generate metadata, archive to EcoGrid. Additional labels: +A1, +A2, +A3.]
Analytical Pipelines
Sloan Digital Sky Project: Mapping the Universe
“The raw data…are fed through data analysis software pipelines…to extract about 400 attributes for each celestial object…These pipelines embody much of mankind’s knowledge of astronomy.” Szalay et al., 2001
Species Distribution Pipeline
[Figure: the species distribution workflow, from EcoGrid query through layer integration, sample data, data calculation, map generation, validation, generate metadata, and archive to EcoGrid, shown alongside three other pipelines:]
• Acoustic signal processing pipeline
• Image processing pipeline: remotely sensed data (land cover class, etc.)
• Interpolation pipeline: ground sensor data (climate, etc.)
Analytical Pipelines: SDW
[Figure: Spatial Data Workbench pipeline diagram.]
• Storage: SRB/MCAT, HPSS @ SDSC
• Inputs: remotely sensed imagery, climate, ground truth, site field observations
• Processing steps: georegistration, radiometric corrections, data transformation, band selection, band indices, unsupervised classification, supervised classification, segmentation, land cover (patch) metrics
• Products: climate/land cover integrated graphics, maps
• Uses: exploratory analysis, vegetation patterns, vegetation dynamics, model parameterization
Biomedical Informatics Research Network
[Figure: brain image processing pipeline. Labels shown: grey value images, registration, statistical classification, template distance transforms, brain atlas, prototypes, segmented images.]
T. Kapur, et al., 1998; Tina Kapur, 1999.
Surgical Planning Laboratory, 2001
Kikinis et al., 2001
Society for Industrial and Applied Mathematics (SIAM) Conference on Imaging Science, 2004
CONFERENCE THEMES
• Image acquisition
• Image reconstruction and restoration
• Image storage, compression, and retrieval
• Image coding and transmission
• PDEs in image filtering and processing
• Image registration and warping
• Image modeling and analysis
• Statistical aspects of imaging
• Wavelets and multiscale analysis
• Multidimensional imaging sciences
• Inverse problems in imaging sciences
• Mathematics of visualization
• Biomedical imaging
• Applications
“By their very nature, these challenges cut across the disciplines of physics, engineering, mathematics, biology, medicine, and statistics.”
Why not ecology and environmental science?
Ontologies
• Generic Image/Signal Ontologies
• Astrophysics Ontology
• Digital Film Ontology
• Biomedical Ontology
• Ecology Ontology: Landscape Ecology, Land Managers, Soil science, etc.
• And many others…
Landscape Ecology Example
[Figure: layered ontologies, modified from Camara et al. (2001).]
• Ontology layers: Domain Ontologies, Method Ontologies, Structural Ontologies, Physical Ontologies, Generic Image Ontologies
• Example terms shown: Pixel calc, Classification, Segmentation, Atm Corr, Land cover class, Patch ID, Patch metrics, TM, EMR, 7 bands, HDF, Place/date, Calibrations
So far…
Grid Technology
• EcoGrid vs semantic web
Analytical pipelines/Workflows
• Sensors: generic vs domain specific
• Reuse of actors/workflows
• Workflow metadata and reporting
Ontologies/Semantic Mediation
• Query EcoGrid for workflows
• Query EcoGrid for data to fit the selected workflow(s)
• Integration of heterogeneous data types
Exploratory Data Analysis
• Data Mining: finding interesting patterns
• Visualization: showing interesting patterns
NDVI at Sevilleta
[Figure: timeline of TM, AVHRR, and MODIS imagery, 1989 to 2002.]
AVHRR: 1 x 1 km pixels, 14 years * 26 images/year * 1824 pixels = 663,936 data points
TM: 30 x 30 m pixels, 14 years * 2 images/year * 65,260 pixels = 1,827,280 data points
• if 20 images/year => 18,272,800 data points
• if 30 years => 39,156,000 data points
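The data-volume figures above follow from a single product, years times images per year times pixels per image. A minimal check, with a hypothetical helper name:

```python
def data_points(years, images_per_year, pixels):
    """Number of pixel values in an image time series."""
    return years * images_per_year * pixels


avhrr = data_points(14, 26, 1_824)        # AVHRR, 1 x 1 km pixels
tm = data_points(14, 2, 65_260)           # TM, 30 x 30 m pixels
tm_dense = data_points(14, 20, 65_260)    # TM at 20 images/year
tm_long = data_points(30, 20, 65_260)     # TM at 20 images/year over 30 years

for name, value in [("AVHRR", avhrr), ("TM", tm),
                    ("TM, 20 images/year", tm_dense),
                    ("TM, 20 images/year, 30 years", tm_long)]:
    print(f"{name}: {value:,} data points")
```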
Spatiotemporal Analysis & Vis: Drought Effects
[Figure: image panels for 1999, 2000, 2001, and 2002 across five periods: July 16-29, July 30-Aug 12, Aug 13-26, Aug 27-Sep 9, Sep 10-23.]
Spatiotemporal Analysis & Vis: Drought Effects
[Figure: panels A, B, and C covering 1989 to 2002. Axes and legends shown: year and period; percent of all cells; count (0 to 160) by group (North, South); S = Spring, F = Summer/Fall.]
Linking and Brushing
Visualization: investigating cancer incidence and risk factors. From GeoVista Studio, Penn State University.
Hyperspectral Imagery = 224 bands
[Figure: AVIRIS hyperspectral data cube, with true color and false color views.]
> 50 gigabytes of raw data per acquisition
Hyperspectral Example
[Figure: 300-pixel (6 km) scene with labeled land cover, plus maps of training and testing samples for the limited set and the full set, with label errors marked.]
300 pixels * 300 pixels * 224 bands = 20,160,000 data points
192 training pixels, 7 mislabeled, out of 90,000 total pixels
• low % training pixels
• errors in training set
Land cover classes: Clouds, River, Riparian, Arid Upland, Semi-arid Upland, Pavement, Agriculture, Barren
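The two concerns on this slide (few training pixels, errors in the training set) can be put in numbers from the counts given. A minimal sketch; the variable names are illustrative:

```python
pixels = 300 * 300            # pixels in the scene
bands = 224                   # spectral bands per pixel
data_points = pixels * bands  # values in the data cube

training_pixels = 192
mislabeled = 7

# Share of the scene that has a training label, and share of labels that are wrong.
training_fraction = training_pixels / pixels
label_error_rate = mislabeled / training_pixels

print(f"{data_points:,} data points")
print(f"training pixels: {training_fraction:.2%} of the scene")
print(f"mislabeled: {label_error_rate:.1%} of the training pixels")
```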
Supervised Classifiers
[Figure: pixels plotted in Band 1 vs Band 2 feature space, showing Class 1, Class 2, and x = pixel to be classified. Decision rules illustrated: Euclidean distance to class means, probability contours, support vector machine hyperplane.]
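Of the decision rules in the figure, minimum distance is the simplest to write down: assign the pixel to the class whose mean is nearest in band space. The sketch below uses invented two-band training values and hypothetical function names; it is not the classifier used to produce the results in this talk.

```python
import math


def class_means(training):
    """Mean band vector per class; training maps label -> list of band tuples."""
    means = {}
    for label, pixels in training.items():
        n = len(pixels)
        n_bands = len(pixels[0])
        means[label] = tuple(sum(p[i] for p in pixels) / n for i in range(n_bands))
    return means


def classify(pixel, means):
    """Minimum-distance rule: pick the class with the nearest mean (Euclidean)."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))


# Invented (Band 1, Band 2) training values for two classes.
training = {
    "Class 1": [(10, 40), (12, 44), (14, 42)],
    "Class 2": [(40, 10), (42, 14), (44, 12)],
}

means = class_means(training)
print(means)
print(classify((15, 35), means))
```

Maximum likelihood replaces the plain distance with a probability that also accounts for each class's spread, which is what the probability contours in the figure depict.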
Limited Sample Set
A) ML 89.4%
B) NBN 83.3%
C) SVM 77.2%
D) MD 69.4%
ML = Maximum Likelihood, NBN = Naïve Bayesian Network, SVM = Support Vector Machine, MD = Minimum Distance
Classes: Clouds, River, Riparian, Agriculture, Arid Upland, Barren, Pavement

Full Sample Set
A) ML 96.4%
B) NBN 90.9%
C) SVM 72.9%
D) MD 88.4%
ML = Maximum Likelihood, NBN = Naïve Bayesian Network, SVM = Support Vector Machine, MD = Minimum Distance
Classes: Clouds, River, Riparian, Agriculture, Arid Upland, Semi-arid Upland, Barren, Pavement
Data Mining Challenges
Biomedical Data
• Large sample sets
• Few correlates (dozens)
• Hard classes
Ecologic Data
• Paucity of accurate reference data
• Spatial autocorrelation
• Large number of potential correlates
• Fuzzy classes
• Uncertainty
Basic Research Need
• Spatiotemporal analysis & visualization techniques that explicitly deal with these challenges
• EcoGrid archive of ground truth data and the ontologies that will allow us to semantically mediate the classes
Where do we start?
• Field data
• Imagery
• Ground sensors
• SEEK: infrastructure
• Spatial Data Workbench: small NPACI project
• Wireless Sensor Workshop
Future Systems: Link with SEEK
[Figure: combined diagram linking the SEEK workflow with the sensor and image processing pipelines.]
• SEEK workflow steps: EcoGrid query, layer integration, sample data, data calculation, map generation, validation, user, generate metadata, archive to EcoGrid
• Models: competition, connectivity, climate, urban expansion, et al.
• Image pipeline: SRB/MCAT and HPSS @ SDSC storage; remotely sensed imagery, climate, ground truth, site field observations; georegistration, radiometric corrections, data transformation, band selection, band indices, unsupervised classification, supervised classification, segmentation, land cover (patch) metrics; climate/land cover integrated graphics, maps
• Unspecified ground sensor pipeline
• Semantic transformation to integrate field data
• Ontologies: image, algorithm, geographic, spatial & temporal, signal processing, domain
We start with you!
• Metadata
• Databases
• Data Sharing
• Computer savvy
End!
Incorporating sensor processing
1. Build a generic image and signal processing knowledge base
2. Develop actors for these functions
3. Build knowledge bases for domains of interest, and relate them to the generic knowledge base
• ENM pipelines
• NEON competition
• Hazards (fire, flood, drought, disease)
4. Develop processing pipelines
5. Identify sensor (image and signal) data and analytical resources, and convert them to web services
6. When EcoGrid is ready, register them as nodes
National Center?
• Multidisciplinary staff
• Working groups (4-6 weeks)
• Multidisciplinary postdocs
• Summer school in ecoinformatics