Text Mining and Environmental Metadata Suggestion - Flash Talk - GSC17
TRANSCRIPT
GSC17 – 5th May 2015 – Walnut Creek, CA
Evangelos Pafilis
Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC)
Hellenic Centre for Marine Research (HCMR), Heraklion Crete, Greece
[email protected], http://epafilis.info
Text Mining and Environmental Metadata Suggestion
GSC17 Flash Talk
Microbes are key players in both healthy and degraded coral reefs. A combination of metagenomics, microscopy, culturing, and water chemistry were used to characterize microbial communities on four coral atolls in the Northern Line Islands, central Pacific.
Source: http://metagenomics.anl.gov/linkin.cgi?metagenome=4440039.3 (“Project Description”, Dinsdale et al, 2008)
Microbial mat samples were collected from the hydrothermal vent field located in the Kolumbo submarine volcanic crater, off the coast of the island of Santorini. The bacteria and archaea community composition was evaluated further via shotgun metagenomics analysis.
In-house HCMR document (Polymenakou, Oulas, et al.)
Figure 1. Sampling sites on a cross-slope transect. […] Oceanographically, the stations represent the abyssal plain (GeoB12815), the continental rise (GeoB12808, GeoB12811), the continental slope (GeoB12803, GeoB12802), the shelf break (GeoB12807) and the shelf (GeoB12806). Surface sediments were recovered by gravity and multi-coring. […]
Source: http://onlinelibrary.wiley.com/doi/10.1111/1758-2229.12264/full (Lagostina et al., 2015)
Scientific web pages
Literature (abstracts, full-text articles, legends)
In-house documents
http://species.hcmr.gr, http://species.jensenlab.org
http://environments.hcmr.gr, http://environments.jensenlab.org
http://www.environmentontology.org/ Buttigieg PL, et al. 2013, J Biomed Semant. 4:43.
http://www.ncbi.nlm.nih.gov/Taxonomy Benson DA, et al. 2009, NAR
• Dictionary-based approaches
• Flexible matching, e.g. hyphenation
• Orthographic dictionary expansion, e.g. adjective and plural forms, shorthand taxon name forms
• Manually curated stopword list
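The points above can be illustrated with a minimal sketch. It is not the code behind ENVIRONMENTS or SPECIES; the dictionary entries, the `TERM:n` identifiers, and all function names are made up for illustration, and the plural rule is deliberately naive. Adjective forms and shorthand taxon names are left out.

```python
import re

# Toy dictionary: name -> placeholder identifier (NOT real ontology IDs).
DICTIONARY = {
    "coral reef": "TERM:1",
    "hydrothermal vent": "TERM:2",
    "continental slope": "TERM:3",
    "deep sea": "TERM:4",
    "field": "TERM:5",
}

# Manually curated stopword list: dictionary names judged too ambiguous to tag.
STOPWORDS = {"field"}


def expand(name):
    """Orthographic expansion: the name itself plus a naive plural form."""
    variants = {name}
    if not name.endswith("s"):
        variants.add(name + "s")
    return variants


def compile_pattern(name):
    """Flexible matching: spaces in a name also match hyphens, case is ignored."""
    tokens = [re.escape(token) for token in name.split()]
    return re.compile(r"\b" + r"[\s\-]+".join(tokens) + r"\b", re.IGNORECASE)


def build_index(dictionary, stopwords):
    """Compile one pattern per name variant, skipping stopworded names."""
    index = []
    for name, term_id in dictionary.items():
        if name.lower() in stopwords:
            continue
        for variant in expand(name):
            index.append((compile_pattern(variant), term_id))
    return index


def tag(text, index):
    """Return non-overlapping (start, end, matched text, term id) tuples."""
    hits = []
    for pattern, term_id in index:
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), match.group(0), term_id))
    # Leftmost match first; among matches at the same offset, the longest first.
    hits.sort(key=lambda hit: (hit[0], -(hit[1] - hit[0])))
    selected, last_end = [], -1
    for hit in hits:
        if hit[0] >= last_end:
            selected.append(hit)
            last_end = hit[1]
    return selected


index = build_index(DICTIONARY, STOPWORDS)
for hit in tag("Deep-sea hydrothermal vent field near coral reefs", index):
    print(hit)
```

Here "Deep-sea" is found through the hyphen-tolerant pattern, "coral reefs" through the plural variant, and "field" is ignored because it is on the stopword list.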
• Interactive
• Lightweight
• Term look up assistant
• Standards-compliant term suggestions
https://gold.jgi-psf.org/studies?Study.Metagenomic+Study=Yes&Study.Is+Public=Yes
Prototype: http://environments.hcmr.gr/biocreative.html
retroactive prospective
BioCreative V Track 5: Interactive Curation (IAT)
Dr. L. Hirschman, Dr. C. Arighi et al.
September 2015, Sevilla, Spain
Under Development
• Entity highlighting
• Suggested Term: sorting, selection, addition, exporting
• Integration with Metagenomics Resources
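As a rough idea of what sorting, selecting, and exporting suggested terms could involve, here is a small sketch. The prototype was still under development at the time of the talk, so everything here is an assumption: ranking by mention count, the TSV layout, the function names, and the `TERM:n` placeholder identifiers are all hypothetical.

```python
import csv
import io
from collections import Counter


def rank_suggestions(tagged_ids):
    """Sort term ids by number of mentions (descending), then by id."""
    counts = Counter(tagged_ids)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def export_tsv(ranked, selected):
    """Write the terms the curator selected as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["term_id", "mentions"])
    for term_id, mentions in ranked:
        if term_id in selected:
            writer.writerow([term_id, mentions])
    return buffer.getvalue()


# Placeholder ids standing in for terms tagged in one document.
ranked = rank_suggestions(["TERM:2", "TERM:1", "TERM:2"])
print(export_tsv(ranked, selected={"TERM:2"}))
```

Adding a term by hand would amount to appending another id to the list before ranking.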
http://jensenlab.org/
http://tissues.jensenlab.org/ - Santos A et al. (under review), preprint: http://biorxiv.org/content/early/2014/11/10/010975
http://diseases.jensenlab.org/ - Pletscher-Frankild, S., et al. (2014) DISEASES: Text mining and data integration of disease-gene associations. Methods, 74, 83–89.
This prototype is based on components from: ENVIRONMENTS, SPECIES/ORGANISMS, SEQenv (https://bitbucket.org/seqenv), and Reflect (http://reflect.ws)
It is developed by the LifeWatchGreece Research Infrastructure and the group of Dr. Lars Juhl Jensen (Uni Copenhagen), with input from the groups of Genomes OnLine Database, Virome / Metagenomes Online, and Dr. Pier L. Buttigieg (AWI). BioCreative: Dr. L. Hirschman (MITRE), DoE Award No DE-SC0010838
Funding: DoE – BioCreative V, LifeWatch Greece, NNF-CPR, “SEQenv” Hackathons (COST ES1103)
Thank You!
Amvrakikos Lagoons, May 2011