Text Mining and Environmental Metadata Suggestion - Flash Talk - GSC17

GSC17 – 5th May 2015 – Walnut Creek, CA Evangelos Pafilis Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC) Hellenic Centre for Marine Research (HCMR), Heraklion Crete, Greece [email protected], http://epafilis.info Text Mining and Environmental Metadata Suggestion GSC17 Flash Talk

Upload: evangelos-pafilis

Post on 31-Jul-2015

GSC17 – 5th May 2015 – Walnut Creek, CA

Evangelos Pafilis

Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC)

Hellenic Centre for Marine Research (HCMR), Heraklion Crete, Greece

[email protected], http://epafilis.info

Text Mining and Environmental Metadata Suggestion

GSC17 Flash Talk

Microbes are key players in both healthy and degraded coral reefs. A combination of metagenomics, microscopy, culturing, and water chemistry were used to characterize microbial communities on four coral atolls in the Northern Line Islands, central Pacific.

Source: http://metagenomics.anl.gov/linkin.cgi?metagenome=4440039.3 (“Project Description”, Dinsdale et al, 2008)

Microbial mat samples were collected from the hydrothermal vent field located in the Kolumbo submarine volcanic crater, off the coast of the island of Santorini. The bacteria and archaea community composition was evaluated further via shotgun metagenomics analysis.

In-house HCMR document (Polymenakou, Oulas, et al.)

Figure 1. Sampling sites on a cross-slope transect. […] Oceanographically, the stations represent the abyssal plain (GeoB12815), the continental rise (GeoB12808, GeoB12811), the continental slope (GeoB12803, GeoB12802), the shelf break (GeoB12807) and the shelf (GeoB12806). Surface sediments were recovered by gravity and multi-coring. […]

Source: http://onlinelibrary.wiley.com/doi/10.1111/1758-2229.12264/full (Lagostina et al., 2015)

Scientific web pages

Literature (abstracts, full-text articles, legends)

In-house documents

http://species.hcmr.gr, http://species.jensenlab.org

http://environments.hcmr.gr, http://environments.jensenlab.org

http://www.environmentontology.org/ Buttigieg PL, et al. 2013, J Biomed Semant. 4:43.

http://www.ncbi.nlm.nih.gov/Taxonomy Benson DA, et al. 2009, NAR

•  Dictionary-based approaches

•  Flexible matching, e.g. hyphenation

•  Orthographic dictionary expansion, e.g. adjective and plural forms, shorthand taxon name forms

•  Manually Curated Stopword-list
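
The four points above can be sketched as a toy tagger. This is only an illustration under stated assumptions, not the code behind the ENVIRONMENTS or SPECIES taggers: the dictionary entries, the `EX:` identifiers (placeholders, not real ENVO or NCBI Taxonomy IDs), the stopword choices, and all function names are hypothetical. Expansion here covers only a naive plural and a shorthand binomial form; adjective forms are left out.

```python
import re

# Toy dictionary: name -> (identifier, is_taxon). The identifiers are
# placeholders for illustration, not real ENVO or NCBI Taxonomy IDs.
DICTIONARY = {
    "coral reef": ("EX:0001", False),
    "hydrothermal vent": ("EX:0002", False),
    "well": ("EX:0003", False),
    "Escherichia coli": ("EX:0004", True),
}

# Manually curated stopword list: forms judged too ambiguous to tag.
STOPWORDS = {"well", "wells"}


def expand(name, is_taxon):
    """Orthographic expansion: naive plural, or shorthand taxon name."""
    variants = {name}
    parts = name.split()
    if is_taxon and len(parts) == 2:
        variants.add(f"{parts[0][0]}. {parts[1]}")  # e.g. "E. coli"
    elif not is_taxon:
        variants.add(name + "s")  # naive plural only
    return variants


def compile_pattern(variant):
    """Flexible matching: spaces and hyphens are interchangeable."""
    tokens = re.split(r"[\s-]+", variant)
    body = r"[\s-]+".join(re.escape(token) for token in tokens)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def build_matchers(dictionary, stopwords):
    """Expand every name, drop stopworded forms, compile the rest."""
    matchers = []
    for name, (identifier, is_taxon) in dictionary.items():
        for variant in expand(name, is_taxon):
            if variant.lower() not in stopwords:
                matchers.append((compile_pattern(variant), identifier))
    return matchers


def tag(text, matchers):
    """Return (start, end, matched_text, identifier) tuples in text order.

    When two matches overlap, the longer one is kept.
    """
    hits = []
    for pattern, identifier in matchers:
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), match.group(0), identifier))
    hits.sort(key=lambda h: (-(h[1] - h[0]), h[0]))  # longest first
    chosen = []
    for hit in hits:
        if all(hit[1] <= c[0] or hit[0] >= c[1] for c in chosen):
            chosen.append(hit)
    return sorted(chosen)


if __name__ == "__main__":
    sample = "Degraded coral reefs, as well as hydrothermal-vent fields."
    for hit in tag(sample, build_matchers(DICTIONARY, STOPWORDS)):
        print(hit)
```

In this sketch the stopword list is applied to the expanded forms, so an ambiguous name such as "well" is never compiled into a matcher at all.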

Command line

•  Interactive

•  Lightweight

•  Term lookup assistant

•  Standards-compliant term suggestions

Prototype: http://environments.hcmr.gr/biocreative.html

https://gold.jgi-psf.org/studies?Study.Metagenomic+Study=Yes&Study.Is+Public=Yes

retroactive prospective

BioCreative V Track 5: Interactive Curation (IAT)

Dr. L. Hirschman, Dr. C. Arighi et al.

September 2015, Sevilla, Spain

Under Development

•  Entity highlighting

•  Suggested term: sorting, selection, addition, exporting

•  Integration with Metagenomics Resources
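
As a rough illustration of the suggested-term operations listed above (sorting, selection, addition, exporting), the sketch below ranks tagged terms by how often they are mentioned and writes the curator's selection as tab-separated text. Everything in it is an assumption made for illustration: the function names, the ranking rule, the TSV columns, and the `EX:` identifiers (placeholders, not real ontology IDs). It does not describe how the prototype, which was still under development, handles these steps.

```python
import csv
import io
from collections import Counter


def rank_suggestions(mentions):
    """Sort suggested terms by mention count (descending), then by name.

    `mentions` is a list of (identifier, name) pairs, one per tagged mention.
    """
    counts = Counter(mentions)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0][1]))


def export_tsv(selected):
    """Serialize ((identifier, name), count) entries as TSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["identifier", "name", "mentions"])
    for (identifier, name), count in selected:
        writer.writerow([identifier, name, count])
    return buffer.getvalue()


if __name__ == "__main__":
    mentions = [
        ("EX:0001", "coral reef"),
        ("EX:0002", "hydrothermal vent"),
        ("EX:0001", "coral reef"),
    ]
    ranked = rank_suggestions(mentions)            # sorting
    selected = [ranked[0]]                         # selection by the curator
    selected.append((("EX:0003", "microbial mat"), 0))  # manual addition
    print(export_tsv(selected))                    # exporting
```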

http://jensenlab.org/

http://tissues.jensenlab.org/ - Santos A et al. (under review), preprint: http://biorxiv.org/content/early/2014/11/10/010975

http://diseases.jensenlab.org/ - Pletscher-Frankild S, et al. (2014) DISEASES: Text mining and data integration of disease-gene associations. Methods, 74, 83–89.

This prototype is based on components from: ENVIRONMENTS, SPECIES/ORGANISMS, SEQenv (https://bitbucket.org/seqenv), and Reflect (http://reflect.ws)

It is developed by the LifeWatchGreece Research Infrastructure and the group of Dr. Lars Juhl Jensen (University of Copenhagen), with input from the groups of Genomes OnLine Database, Virome / Metagenomes Online, and Dr. Pier L. Buttigieg (AWI). BioCreative: Dr. L. Hirschman (MITRE), DoE Award No DE-SC0010838

Funding: DoE – BioCreative V, LifeWatch Greece, NNF-CPR, "SEQenv" Hackathons (COST ES1103)

Thank You!

Amvrakikos Lagoons, May 2011
