using e-infrastructures for biodiversity conservation - module 4

Using e- Infrastructures for Biodiversity Conservation Gianpaolo Coro ISTI-CNR, Pisa, Italy


Post on 15-Aug-2015


Page 1: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Using e-Infrastructures for Biodiversity Conservation

Gianpaolo Coro ISTI-CNR, Pisa, Italy

Page 2: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Module 4 - Outline

1. Data processing requirements by communities of practice

2. The D4Science Statistical Manager

3. Ecological modelling

Page 3: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

D4Science

D4Science is both a Data and a Computational e-Infrastructure

• Used by several Projects: i-Marine, EUBrazil OpenBio, ENVRI;

• Implements the notion of e-Infrastructure as-a-Service: it offers on demand access to data management services and computational facilities;

• Hosts several VREs for Fisheries Managers, Biologists, Statisticians…and Students.

Page 4: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

D4Science - Resources

A large set of connected biodiversity and taxonomic datasets

A network to distribute and access geospatial data

A distributed storage system to store datasets and documents

A social network to share opinions and useful news

Algorithms for biology-related experiments

Page 5: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Data Processing

Page 6: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

1. Data processing requirements by communities of practice

2. The D4Science Statistical Manager

3. Ecological modelling

Page 7: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Issues

Communities of practice in computational statistics are interested in:

1. Repeating and validating experiments

2. Exploiting algorithms in several contexts

3. Hiding the complexity of the calculations

4. Facilitating the management and publication of algorithms

Page 8: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Issues

…practically speaking, they search for:

1. Modular and pluggable solutions

2. Access by means of standard protocols

3. Hiding the complexity of parallel processing

4. Hiding the complexity of software management and provisioning

5. Active contribution with new algorithms and use cases

Page 9: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

1. Data processing requirements by communities of practice

2. The D4Science Statistical Manager

3. Ecological modelling

Page 10: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Statistical Manager – Users’ View

The Statistical Manager is a set of web services that aim to:

• Help scientists in computational statistics experiments

• Supply precooked state-of-the-art algorithms as-a-Service

• Perform calculations using Map-Reduce, transparently to the users

• Share inputs, results, parameters and comments with colleagues through Virtual Research Environments in the D4Science e-Infrastructure

[Diagram labels: Statistical Manager; D4Science Computational Facilities; Sharing; Setup and execution]

Page 11: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Open Platform Approach

[Diagram labels: External Computing Facility; OGC WPS Interface]

People can contribute with:

• R scripts

• Java programs

• Linux programs

• OGC-WPS services
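As an illustration of what access through an OGC WPS interface can look like, the following minimal sketch builds WPS 1.0.0 key-value-pair request URLs. The endpoint, the process identifier and the input names are hypothetical placeholders, not the actual D4Science ones.

```python
from urllib.parse import urlencode


def wps_url(endpoint, request, identifier=None, inputs=None, version="1.0.0"):
    """Build a WPS key-value-pair (KVP) GET request URL.

    `endpoint`, `identifier` and the input names are placeholders chosen
    for illustration; a real service publishes its own values.
    """
    params = {"service": "WPS", "request": request}
    if request != "GetCapabilities":
        # DescribeProcess and Execute carry an explicit protocol version.
        params["version"] = version
    if identifier is not None:
        params["identifier"] = identifier
    if inputs:
        # Execute passes literal inputs as semicolon-separated key=value pairs.
        params["DataInputs"] = ";".join(f"{k}={v}" for k, v in inputs.items())
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    endpoint = "https://example.org/wps"  # hypothetical endpoint
    print(wps_url(endpoint, "GetCapabilities"))
    print(wps_url(endpoint, "DescribeProcess", "org.example.DBSCAN"))
    print(wps_url(endpoint, "Execute", "org.example.DBSCAN",
                  {"epsilon": 10, "min_points": 2}))
```

The three requests correspond to discovering the available processes, inspecting one process and running it.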

Page 12: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Statistical Manager – Developers’ View

The Statistical Manager allows developers to:

• Develop distributed computations easily (Statistical Manager Framework)

• Parallelize R scripts, possibly without changing the code

• Automatically produce a User Interface to perform experiments

• Reuse models and best practices developed by the community

• Connect external computational facilities via the OGC WPS standard

Page 13: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Architecture

Page 14: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Internal Work

Page 15: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

The Context: Resources and Sharing

Page 16: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Statistical Manager - Interface

Page 17: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Experiment Execution

Page 18: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Computations Check

Summary of the Input, Output and Parameters of the experiment

Page 19: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Data Space - Sharing and Import

Page 20: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

100 Hosted Algorithms

Page 21: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Numbers

Users (chart legend): FishBase (US, CA, TW); Geomar; Naturhistoriska riksmuseet: Startsida; Agrocampus; Anonymous Individuals; INRA; King Abdullah University of Science and Technology; ISTI

                                                            2013              2014
Avg Users per month                                         200 20100
Number of Algorithms                                        50                100
Number of contributing Organizations providing algorithms   2 (CNR, Geomar)   7 (CNR, Geomar, FIN, FAO, T2, IRD, Agrocampus)
Publications                                                8                 13
Sum Impact Factor                                           2.66              12.17

Page 22: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Publications around the Statistical Manager

2012

1. L. Candela, G. Coro, P. Pagano, "Supporting Tabular Data Characterization in a Large Scale Data Infrastructure by Lexical Matching Techniques", In M. Agosti et al. (Eds.): IRCDL 2012, Communications in Computer and Information Science Volume 354, pp. 21–32. Springer, Heidelberg (2012).

2013

2. R. Froese, J. Thorson, R. B. Reyes Jr. A Bayesian approach for estimating length-weight relationships in fishes. Journal of Applied Ichthyology. Volume 30, Issue 1, pages 78–85, 2013.

3. G. Coro, P. Pagano, A. Ellenbroek, "Combining Simulated Expert Knowledge with Neural Networks to Produce Ecological Niche Models for Latimeria chalumnae", Ecological Modelling, DOI 10.1016/j.ecolmodel.2013.08.005, Ed. Elsevier.

4. G. Coro, L. Fortunati, P. Pagano. Deriving Fishing Monthly Effort and Caught Species from Vessel Trajectories. Oceans 2013, Proceedings of MTS/IEEE.

5. P. Pagano, G. Coro, D. Castelli, L. Candela, F. Sinibaldi, A. Manzi. Cloud Computing for Ecological Modeling in the D4Science Infrastructure. Proceedings of EGI Community Forum 2013.

6. D. Castelli, P. Pagano, G. Coro, F. Sinibaldi, "Modellazione della Nicchia Ecologica di Specie Marine (Marine Species Ecological Niche Modelling)". In "Le Tecnologie del CNR per il Mare" (CNR Marine Technologies) pp. 140, Ed. CNR (Roma, Italy).

7. D. Castelli, P. Pagano, G. Coro, "Variazioni Climatiche ed Effetto sulle Specie Marine (Climate Changes and Effect on Marine Species)". In "Le Tecnologie del CNR per il Mare" (CNR Marine Technologies) pp. 139, Ed. CNR (Roma, Italy).

8. D. Castelli, P. Pagano, G. Coro, "Elaborazione di Dati Trasmessi da Pescherecci (Processing of fishing vessel transmitted information)". In "Le Tecnologie del CNR per il Mare" (CNR Marine Technologies). pp. 133, Ed. CNR (Roma, Italy).

9. G. Coro, P. Pagano, A. Ellenbroek. Automatic Procedures to Assist in Manual Review of Marine Species Distribution Maps. To be published in M. Tomassini et al. (Eds.): International Conference on Adaptive and Natural Computing Algorithms (ICANNGA’13), Springer, Heidelberg (2013).

10. Candela L., Castelli D., Coro G., Pagano P., Sinibaldi F. Species distribution modeling in the cloud. In: Concurrency and Computation-Practice & Experience, Geoffrey C. Fox, David W. Walker (eds.). Wiley.

11. Appeltans W., Pissierssens P., Coro G., Italiano A., Pagano P., Ellenbroek A., Webb T. Trendylyzer: a long-term trend analysis on biogeographic data. In: Bollettino di Geofisica Teorica e Applicata: an International Journal of Earth Sciences, vol. 54 (Suppl.) pp. 203 - 205. Supplement: IMDIS 2013 - International Conference on Marine Data and Information Systems, 23-25 September, Lucca (Italy). OGS - Istituto Nazionale di Oceanografia e di Geofisica Sperimentale, 2013.

12. Coro G., Gioia A., Pagano P., Candela L. A service for statistical analysis of marine data in a distributed e-infrastructure. In: Bollettino di Geofisica Teorica e Applicata: an International Journal of Earth Sciences, vol. 54 (Suppl.) pp. 68 - 70. Supplement: IMDIS 2013 - International Conference on Marine Data and Information Systems, 23-25 September, Lucca (Italy). OGS - Istituto Nazionale di Oceanografia e di Geofisica Sperimentale, 2013.

13. Castelli D., Pagano P., Candela L., Coro G. The iMarine data bonanza: improving data discovery and management through a hybrid data infrastructure. In: Bollettino di Geofisica Teorica e Applicata: an International Journal of Earth Sciences, vol. 54 (Suppl.) pp. 105 - 107. Supplement: IMDIS 2013 - International Conference on Marine Data and Information Systems, 23-25 September, Lucca (Italy). OGS - Istituto Nazionale di Oceanografia e di Geofisica Sperimentale, 2013.

14. Coro G. A Lightweight Guide on Gibbs Sampling and JAGS. Technical report, 2013.

15. Vanden Berghe E., Bailly N., Aldemita C., Fiorellato F., Coro G., Ellenbroek A., Pagano P. BiOnym - a flexible workflow approach to taxon name matching. In: TDWG 2013 - Taxonomic Database Working Group 2013 (Firenze, 28-31 October 2013).

16. Coro G., Pagano P., Candela L. Providing Statistical Algorithms as-a-Service. In: TDWG 2013 - Taxonomic Database Working Group 2013 (Firenze, 28-31 October 2013).

2014

17. Candela L., Castelli D., Coro G., De Faveri F., Italiano A., Lelii L., Mangiacrapa F., Marioli V., Pagano P. Integrating Species Occurrence Databases to Facilitate Data Analysis. Approved for the Ecological Informatics Journal, Elsevier 2014.

18. Froese R., Coro G., Kleisner K., Demirel N. Revisiting Safe Biological Limits in Fisheries. Submitted to the Fish and Fisheries Journal, Wiley 2014.

19. Coro G., Candela L., Pagano P., Italiano A., Liccardo L. Parallelising the Execution of Native Data Mining Algorithms for Computational Biology. Submitted to Concurrency and Computation-Practice & Experience, Wiley 2014.

20. Coro G., Pagano P., Ellenbroek A. Comparing Heterogeneous Distribution Maps for Marine Species. Submitted to GIScience & Remote Sensing, Taylor & Francis 2014.

2015

21. G. Coro, C. Magliozzi, A. Ellenbroek, P. Pagano, Improving data quality to build a robust distribution model for Architeuthis dux, Ecological Modelling, Volume 305, 10 June 2015, Pages 29-39, ISSN 0304-3800.

22. G. Coro, C. Magliozzi, E. Vanden Berghe, N. Bailly, A. Ellenbroek, P. Pagano, Estimating absence locations of marine species from data of scientific surveys.

23. R. Froese, N. Demirel, G. Coro, K. Kleisner, H. Winker, Estimating Fisheries Reference Points from Catch and Resilience.

24. E. Vanden Berghe, N. Bailly, G. Coro, F. Fiorellato, C. Aldemita, A. Ellenbroek, P. Pagano. Retrieving taxa names from large biodiversity data collections using a flexible matching workflow.

25. G. Coro, C. Magliozzi, A. Ellenbroek, K. Kaschner, P. Pagano. Automatic classification of climate change effects on marine species distributions in 2050 using the AquaMaps model.

26. E. Trumpy, G. Coro, A. Manzella, P. Pagano, D. Castelli, P. Calcagno, A. Nador, T. Bragasson, S. Grellet. Building a European Geothermal Information Network using a

Page 23: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

1. Data processing requirements by communities of practice

2. The D4Science Statistical Manager

3. Ecological modelling

Page 24: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Niche Modelling

Scope:

• characterize the environmental conditions that are suitable for the species to subsist;

• identify where suitable environment is distributed in geographical space;

• estimate the actual and potential geographic distributions of a species.

Actual distribution: areas that are truly occupied by the species

Fundamental niche: the full range of abiotic conditions within which the species is viable

Potential distribution: areas with abiotic conditions that fall within the fundamental niche

Page 25: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Niche Modelling and Absence and Presence Points

Approaches:

Mechanistic models: incorporate physiological limits in a species' tolerance to environmental conditions;

Correlative models: automatically estimate the environmental conditions that are suitable for a species by relying on examples.

Presence points: occurrence records, i.e. places where the species has been observed in its habitat

Absence points: locations where the environment is considered unsuitable for the species. In many cases, absence points must be simulated (pseudo-absence points), because reliable data are rare.

Page 26: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Examples: Potential Distributions of the Coelacanth

Presence-only: MaxEnt Presence-only: GARP

Expert (semi-Mechanistic): AquaMaps

Presence\Absence: Artificial Neural Networks

Comparison between several approaches estimating the potential distribution of the Coelacanth.

The best depends on the quality of the data.Thus, cleaning operations are very important!

Page 27: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

C-square codes

C-squares (concise spatial query and representation system):

• A system of geocodes that provides a basis for simple spatial indexing of geographic features

• Devised by Tony Rees of CSIRO Marine and Atmospheric Research

• A compact encoding of Latitude and Longitude and Resolution

Example:

C-square code: 3414:227:3

Resolution: 0.5°

N,S,W,E limits: -42.5,-43.0,147.0,147.5

A useful converter: http://www.marine.csiro.au/marq/csq_builder.init
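The encoding can be sketched in a few lines. The function below is an illustrative implementation of the half-degree case, written from the general description of the C-squares notation (global quadrant and tens digits, then one quadrant digit plus the units digits, then one quadrant digit for the half-degree split). It has only been checked against the example on this slide and is not the official converter.

```python
def csquare_half_degree(lat, lon):
    """Return the 0.5-degree C-square code containing (lat, lon).

    Illustrative sketch only: the function name and the clamping of
    the poles and the antimeridian are assumptions of this example.
    """
    # Global quadrant: 1 = NE, 3 = SE, 5 = SW, 7 = NW.
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7
    else:
        quadrant = 3 if lon >= 0 else 5

    # Work on absolute values, keeping 90 and 180 inside the last cell.
    alat = min(abs(lat), 89.999999)
    alon = min(abs(lon), 179.999999)

    def intermediate(lat_digit, lon_digit):
        # 1: both digits low (0-4), 2: lon high, 3: lat high, 4: both high.
        return 1 + 2 * (lat_digit >= 5) + (lon_digit >= 5)

    # 10-degree square: quadrant, tens of latitude, tens of longitude.
    ten = f"{quadrant}{int(alat // 10)}{int(alon // 10):02d}"

    # 1-degree square: intermediate quadrant followed by the units digits.
    lat_u, lon_u = int(alat) % 10, int(alon) % 10
    one = f"{intermediate(lat_u, lon_u)}{lat_u}{lon_u}"

    # 0.5-degree square: intermediate quadrant of the tenths digits.
    lat_t, lon_t = int(alat * 10) % 10, int(alon * 10) % 10
    half = str(intermediate(lat_t, lon_t))

    return f"{ten}:{one}:{half}"


if __name__ == "__main__":
    # Centre of the cell with N,S,W,E limits -42.5,-43.0,147.0,147.5.
    print(csquare_half_degree(-42.75, 147.25))
```

Called on the centre of the example cell, the sketch yields the code shown on the slide.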

Page 28: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

HCAF (Half-degree Cells Authority File)

Contains information on:

a) cell codes

b) statistical cell properties (center, limits, and area);

c) membership in relevant areas (FAO areas, EEZs or LMEs);

d) physical attributes (depth, salinity or temperature);

e) biological properties (e.g. primary production).

Data gathered from:

• Sea Around Us Project

• CSIRO

• Kansas Geological Survey

Compiled by: Kristin Kaschner & Jonathan Ready

Page 29: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

HSPEN (Half-degree Species Environmental Envelope)

Contains information used for describing the environmental tolerance and preference of a species:

• distribution using FAO areas and bounding box

• range of values per environmental parameter (min., preferred min., preferred max., max.)

Page 30: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Online experiment: the i-Marine Filtering Facilities

https://i-marine.d4science.org/group/biodiversitylab/processing-tools

Page 31: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

A Niche model relying on expert knowledge

Page 32: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

HSPEC (Half-degree Species Assignment)

Contains the assignment of a species to a half-degree cell and the corresponding probability of occurrence of the species in that cell.

The assignment probability is the product of the probabilities associated with each environmental parameter (SST, salinity, primary production, sea ice concentration, distance to land).

Page 33: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

AquaMaps

Gadus morhua

A Presence-only species model that relies on expert knowledge about the species habitat

• AquaMaps Suitable: estimates the Potential Distribution

• AquaMaps Native: estimates the Actual Distribution

• Maps have 0.5 degrees resolution;

• Expert knowledge is used in modelling the habitat parameters;

• AquaMaps adopts mechanistic assumptions combined with an automatic estimation of parameter values.

Page 34: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

AquaMaps – Good Cells

• “good cells”: cells within the bounding box or known FAO areas

• a minimum of 10 “good cells” is needed for extracting parameters

Bounding box or FAO area limits serve as independent verification of the validity of occurrence records.

Taken from: http://www.aquamaps.org/main/presentations/Part%20II%20-%20AquaMaps%20behind%20the%20scene.pdf

Page 35: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Global grid of 259,200 half degree cells

Good cells are used to derive the range of environmental parameters within the species’ native range.

AquaMaps – Extracting Environmental Parameters

Taken from: http://www.aquamaps.org/main/presentations/AquaMaps_General0908.pdf

Page 36: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

AquaMaps – Environmental Envelopes

The environmental envelopes describe tolerances of a species with respect to each environmental parameter.

• Depth ranges: typically from literature; depth estimate based on habitat description

• Min = 25th percentile - 1.5 * interquartile range or absolute minimum in extracted data (whichever is greater)

• Max = 75th percentile + 1.5 * interquartile range or absolute maximum in extracted data (whichever is greater)

• PrefMin = 10th percentile of observed variation in an environmental parameter

• PrefMax = 90th percentile of observed variation in an environmental parameter

• Surface values for species with min depth ≤ 200m

• Bottom values for species with min depth > 200m

Taken from: http://www.aquamaps.org/main/presentations/AquaMaps_General0908.pdf
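The envelope rules above can be sketched as follows. The sketch applies the rules exactly as worded on the slide, including "whichever is greater" for both Min and Max. The percentile function (linear interpolation between ranks) and all names are assumptions of this example, since the slide does not say how percentiles are computed.

```python
def percentile(sorted_values, p):
    """Percentile with linear interpolation between ranks (an assumption)."""
    k = (len(sorted_values) - 1) * p / 100
    lower = int(k)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (k - lower)


def environmental_envelope(values):
    """Derive Min, PrefMin, PrefMax and Max for one environmental parameter.

    `values` are the parameter values extracted from the good cells.
    """
    data = sorted(values)
    q25, q75 = percentile(data, 25), percentile(data, 75)
    iqr = q75 - q25  # interquartile range
    return {
        # Rules as worded on the slide ("whichever is greater" in both cases).
        "min": max(q25 - 1.5 * iqr, data[0]),
        "max": max(q75 + 1.5 * iqr, data[-1]),
        "pref_min": percentile(data, 10),
        "pref_max": percentile(data, 90),
    }


if __name__ == "__main__":
    # Made-up temperatures from 11 good cells, for illustration only.
    print(environmental_envelope(range(1, 12)))
```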

Page 37: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

AquaMaps – Environmental Envelopes: Probability of Occurrence

[Figure: relative probability of occurrence (y axis, with maximum PMax) plotted against a predictor (x axis), with Min, Preferred min, Preferred max and Max marked]

$$P_c = P_{\text{bathymetry}_c} \times P_{\text{SST}_c} \times P_{\text{salinity}_c} \times P_{\text{chl a}_c} \times P_{\text{IceDist}_c} \times P_{\text{LandDist}_c}$$

Probabilities of species occurrence are generated by matching the species' environmental envelope against local environmental conditions to determine the relative suitability of a given area.

Taken from: http://www.aquamaps.org/main/presentations/AquaMaps_General0908.pdf
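A minimal sketch of this computation follows. It assumes a trapezoidal response per predictor (0 outside [Min, Max], PMax = 1 between the preferred limits, linear in between), which is read off the figure labels rather than stated in the text. The predictor names and values are made up for illustration.

```python
import math


def envelope_probability(value, vmin, pref_min, pref_max, vmax):
    """Relative probability for one predictor, assuming a trapezoidal shape."""
    if value < vmin or value > vmax:
        return 0.0
    if value < pref_min:
        # Rising edge between Min and Preferred min.
        return (value - vmin) / (pref_min - vmin)
    if value > pref_max:
        # Falling edge between Preferred max and Max.
        return (vmax - value) / (vmax - pref_max)
    return 1.0


def cell_probability(cell, envelopes):
    """Multiply the per-predictor probabilities of one half-degree cell."""
    return math.prod(
        envelope_probability(cell[name], *limits)
        for name, limits in envelopes.items()
    )


if __name__ == "__main__":
    # Hypothetical envelopes: (Min, PrefMin, PrefMax, Max) per predictor.
    envelopes = {"sst": (0, 4, 6, 10), "salinity": (30, 33, 36, 40)}
    print(cell_probability({"sst": 2, "salinity": 34}, envelopes))
```

Because the overall value is a product, a single predictor outside its envelope sets the probability of the cell to zero.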

Page 38: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

AquaMaps – Probability

The probability is calculated for each 0.5° cell in the oceans. A color is associated with each probability value.

$$P_c = P_{\text{bathymetry}_c} \times P_{\text{SST}_c} \times P_{\text{salinity}_c} \times P_{\text{chl a}_c} \times P_{\text{IceDist}_c} \times P_{\text{LandDist}_c}$$

Page 39: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Online experiment: AquaMaps

https://i-marine.d4science.org/group/biodiversitylab/processing-tools

Page 40: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

What if Expert Knowledge were missing?

Page 41: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Artificial Neural Network

Input: Presence/Absence Points examples

Output: Probability (1/0)

• Learns from positive (presence) and negative (absence) examples (training mode);

• Adapts the network weights to produce the correct outputs on the examples;

• Produces probability values for new input (test mode).
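The training and test modes can be illustrated with the smallest possible network: a single sigmoid neuron trained by gradient descent. This is a deliberately simplified stand-in, not the network used in the Statistical Manager, and the two scaled features and all example values are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, labels, epochs=500, learning_rate=0.5):
    """Training mode: adapt weights so outputs approach the labels (1/0)."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            output = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = output - label
            # Gradient step on the cross-entropy loss of a sigmoid neuron.
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict(model, features):
    """Test mode: probability of presence for a new input."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


if __name__ == "__main__":
    # Hypothetical (temperature, depth) features scaled to [0, 1].
    presence = [(0.8, 0.2), (0.9, 0.1), (0.7, 0.3)]
    absence = [(0.1, 0.9), (0.2, 0.8), (0.3, 0.7)]
    model = train(presence + absence, [1, 1, 1, 0, 0, 0])
    print(predict(model, (0.85, 0.15)), predict(model, (0.15, 0.85)))
```

A real model would use hidden layers and many more environmental features per point, but the two modes work the same way.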

Page 42: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Artificial Neural Networks Maps

Page 43: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Examples and Exercises: AquaMaps - Neural Networks

https://i-marine.d4science.org/group/biodiversitylab/processing-tools

Page 44: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Climate change analysis

Page 45: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

• HCAF Scenarios can be simulated by means of interpolation.

• Interpolation produces half-degree values between a start and an end date

• Once new HCAFs are available we can produce an HSPEC for each HCAF

Simulation of HCAF Scenarios
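A minimal sketch of such an interpolation follows, assuming it is linear in time. The slide does not state the interpolation function, and the cell code, feature names and values are placeholders.

```python
def interpolate_hcaf(start_cells, end_cells, start_year, end_year, year):
    """Estimate per-cell environmental values for an intermediate year.

    Both inputs map a cell code to a dict of feature values. Linear
    interpolation in time is an assumption of this sketch.
    """
    if not start_year <= year <= end_year:
        raise ValueError("year must lie between start_year and end_year")
    t = (year - start_year) / (end_year - start_year)
    return {
        code: {
            name: value + t * (end_cells[code][name] - value)
            for name, value in features.items()
        }
        for code, features in start_cells.items()
    }


if __name__ == "__main__":
    hcaf_start = {"3414:227:3": {"sst": 10.0, "salinity": 35.0}}
    hcaf_end = {"3414:227:3": {"sst": 14.0, "salinity": 35.0}}
    print(interpolate_hcaf(hcaf_start, hcaf_end, 2010, 2050, 2030))
```

Each interpolated table can then be fed to the niche model to obtain one HSPEC per scenario.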

Page 46: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Climate Changes Effects on Species

Estimated impact of climate changes over 20 years on 11549 species.

Bioclimate HSpec

Overall occupancy in time

Page 47: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Online experiment: BioClimate Analysis

https://i-marine.d4science.org/group/biodiversitylab/processing-tools

Page 48: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Grouping the occurrence points and the environmental features of different species

Page 49: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Occurrence Points Clustering

• Group points by spatial distance or density

• Detect outliers

Page 50: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Density Clustering

DBScan acts on the density of the points.

Parameters:

• Epsilon = 10

• Min Points = 2

Figure label: Outliers
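The following compact sketch of the textbook DBSCAN procedure shows how the two parameters act. It is not the implementation hosted in the Statistical Manager, and the sample points are made up. A point is an outlier when it has fewer than Min Points neighbours within Epsilon and is not reachable from any dense point.

```python
import math
from collections import deque


def dbscan(points, epsilon, min_points):
    """Label each point with a cluster index, or -1 for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Every point within epsilon, including the point itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= epsilon]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = -1  # provisional outlier, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a dense point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_points:
                queue.extend(reachable)  # j is dense: keep expanding
    return labels


if __name__ == "__main__":
    occurrences = [(0, 0), (3, 4), (6, 8), (100, 100), (104, 103), (50, -50)]
    print(dbscan(occurrences, epsilon=10, min_points=2))
```

With Epsilon = 10 and Min Points = 2, the first three points form one cluster, the next two a second one, and the isolated last point is reported as an outlier.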

Page 51: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Distance Clustering

XMeans

• K = [20,30]

• Min Points = 2

• MaxIter = 1000

No Outliers Detected!

KMeans

• K = 24

• Min Points = 2

• MaxIter = 1000

• MaxOptSteps = 1000

No Outliers Detected!

Page 52: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Online experiment: Clustering

https://i-marine.d4science.org/group/biodiversitylab/processing-tools

Page 53: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Discovering similarities among habitats

Page 54: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Habitat Representativeness Score

Similarity between habitats

Habitat Representativeness Score:

• Measures the degree to which sampled habitats are representative of a certain area of study;

• Has been used for assessing the minimum number of surveys on a study area that are needed to cover a good heterogeneity of species habitat variables.

Can be used to:

• Measure the similarity between the environmental features of two areas;

• Assess the quality of models and environmental features.

HRS=10.6

Page 55: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Habitat Representativeness Score

Absence and Presence (A+P): HRS 10.58

Presence (P): HRS 10.61

The HRS is too high -> all the maps can be unreliable and need expert validation

HRS is in [0;2] for each feature

The overall HRS is the sum of the HRSs of the environmental features

Page 56: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Habitat Representativeness Score for each Feature

Feature                         Presence, Absence   Presence only
mean depth in t.c.              1.90                1.92
max depth in t.c.               0.87                0.86
min depth in t.c.               0.04                0.04
mean annual sea surface temp    1.19                1.13
mean annual sea bottom temp     1.59                1.56
mean salinity in t.c.           1.23                1.29
mean bottom salinity in t.c.    0.44                0.34
mean primary production         0.61                0.64
annual ice concentration        0.71                0.78
distance from land              0.46                0.49
ocean area in t.c.              1.54                1.55
HRS                             10.58               10.61

Presence, Absence: the most representative feature is the minimum depth in a cell of 0.5 degrees

Presence only: even in this case the most representative feature is the minimum depth in a cell of 0.5 degrees
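The overall score can be checked by summing the per-feature values. The short sketch below does this for the presence and absence column and picks the feature with the lowest score, which the slide identifies as the most representative one. The variable names are illustrative.

```python
# Per-feature HRS values for the presence and absence case, from the slide.
hrs_presence_absence = {
    "mean depth in t.c.": 1.90,
    "max depth in t.c.": 0.87,
    "min depth in t.c.": 0.04,
    "mean annual sea surface temp": 1.19,
    "mean annual sea bottom temp": 1.59,
    "mean salinity in t.c.": 1.23,
    "mean bottom salinity in t.c.": 0.44,
    "mean primary production": 0.61,
    "annual ice concentration": 0.71,
    "distance from land": 0.46,
    "ocean area in t.c.": 1.54,
}

# The overall HRS is the sum of the per-feature scores.
overall_hrs = round(sum(hrs_presence_absence.values()), 2)

# The lowest per-feature score marks the most representative feature.
most_representative = min(hrs_presence_absence, key=hrs_presence_absence.get)

if __name__ == "__main__":
    print(overall_hrs, most_representative)
```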

Page 57: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Online experiment: Habitat Representativeness Score

https://i-marine.d4science.org/group/biodiversitylab/processing-tools

Page 58: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Retrieving taxonomic information for a set of species

Page 59: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

BiOnym

A workflow approach to taxon name matching.

Accounts for:

• Variations in the spelling and interpretation of taxonomic names

• Combination of data from different sources

• Harmonization and reconciliation of Taxa names

[Workflow diagram: Raw Input String (e.g. Gadus morua Lineus 1758) -> Preprocessing and Parsing -> Taxon Matcher 1 -> Taxon Matcher 2 -> … -> Taxon Matcher n -> PostProcessing -> Correct Transcriptions (e.g. Gadus morhua (Linnaeus, 1758))]

Reference sources: ASFIS, FISHBASE, WoRMS, other sources in DwC-A
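The idea of matching a misspelled input against a reference source can be sketched with approximate string matching from the Python standard library. This is a stand-in for illustration and not one of the BiOnym matchers. The tiny reference list, the parsing rule (the first two tokens are genus and species) and the cutoff are assumptions of this example.

```python
import difflib


def parse_scientific_name(raw):
    """Preprocessing: keep genus and species, drop authorship and year."""
    tokens = raw.split()
    return " ".join(tokens[:2])


def match_taxon(raw, reference_names, cutoff=0.8):
    """Return reference names similar to the parsed input, best match first."""
    name = parse_scientific_name(raw)
    return difflib.get_close_matches(name, reference_names, n=3, cutoff=cutoff)


if __name__ == "__main__":
    # Tiny illustrative reference list; real sources hold many thousands of names.
    reference = ["Gadus morhua", "Gadus macrocephalus", "Latimeria chalumnae"]
    print(match_taxon("Gadus morua Lineus 1758", reference))
```

A workflow approach chains several such matchers, from strict to tolerant, and passes to the next one only the names that remain unmatched.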

Page 60: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Matcher Example - GSAy

Workflow steps shown in the diagram:

• GSAy: Complete match

• GSAY: Parentheses issue

• GSrAy: Gender agreement issues

• GSrAY: Gender agreement and parentheses issues

• GSA: Year issues

Step    Rate
GSAy    950
GSAY    940
GSrAy   930
GSrAY   920
GSA     910
GSrA    900
GSY     890
GSrY    880
SAy     870
SAY     860
SrAy    850
SrAY    840
GAy     830
GAY     820
…

Page 61: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Matcher Example - GSAy

Workflow steps shown in the diagram:

• GSY: Author issues, misspelling or wrong

• GS: Homonyms

• SrAy: Other combinations

• Rest: Taxamatch

• GAY: Visual check

Step    Rate
GSY     950
GSAY    940
GSrAy   930
GSrAY   920
GSA     910
GSrA    900
GSY     890
GSrY    880
SAy     870
SAY     860
SrAy    850
SrAY    840
GAy     830
GAY     820
…

Page 62: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

BiOnym - Output

Page 63: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Online experiment: BiOnym

https://i-marine.d4science.org/group/biodiversitylab/processing-tools