Topic Mapping Tools for Biomedical Corpora

Gully APC Burns, USC/ISI
Dave Newman, UC Irvine
Bruce Herr, IU

‘Snapshots of Neuroscience’

Society for Neuroscience Annual meeting (2000 New Orleans)
~30,000 attendees, ~12,000 posters per year

Basic Idea: Topic Modeling

Erythropoietin (Epo), a hematopoietic cytokine, has recently been demonstrated to provide neuroprotection on nigral dopaminergic neurons. However, there is no information available about whether Epo can protect dopaminergic neurons from the neurotoxicity of 6-hydroxydopamine (6-OHDA) that is most commonly used to create a rat model of Parkinson’s disease (PD). In the present study, we tested the hypothesis that recombinant human Epo (rhEpo) would protect dopaminergic neurons and improve neurobehavioral outcomes in a rat model of progressive PD. rhEpo (20 units in 2μl of vehicle) was stereotaxically injected into one side of the striatum. The 6-OHDA lesion was made into the same side one day after rhEpo treatment. Methamphetamine-induced rotation was measured 3 and 10 weeks after the lesion, and paw reaching was also tested at 10 weeks. After the last time of behavioral test, rats were then sacrificed, and the brains were perfusion-fixed for histology and immunocytochemistry. We observed that intrastriatal administration of rhEpo significantly reduced the degree of rotational asymmetries. The rhEpo-treated animals also showed a better improvement in skilled forelimb use when compared with the control rats. In accompanying with the recovery of neurobehavioral outcomes, tyrosine hydroxylase (TH)-immunoreactive neurons of the substantia nigra were protected from progressive degeneration in the rhEpo-treated rats. TH-immunoreactivity in the 6-OHDA lesioned striatum also significantly increased in the rhEpo-treated rats. To examine if systemic administration of rhEpo could exert the similar biological effects …

... plus all remaining ‘topic mass’ – provides a signature from which we can calculate document-document similarities (~12,000 x ~12,000 matrix)
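The signature idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say which similarity measure was used, so cosine similarity is an assumption here, and the names `topic_signature`, `cosine_similarity`, `similarity_matrix` and the parameter `top_k` are hypothetical. Each document's topic distribution is reduced to its strongest topics plus one bin holding the remaining topic mass.

```python
import math


def topic_signature(theta, top_k=3):
    """Keep the top_k topic weights and lump everything else into 'rest'."""
    ranked = sorted(enumerate(theta), key=lambda pair: pair[1], reverse=True)
    signature = {topic: weight for topic, weight in ranked[:top_k]}
    # The remaining 'topic mass' becomes a single extra dimension.
    signature["rest"] = sum(weight for _, weight in ranked[top_k:])
    return signature


def cosine_similarity(sig_a, sig_b):
    """Cosine similarity between two sparse signatures (assumed measure)."""
    dot = sum(w * sig_b.get(key, 0.0) for key, w in sig_a.items())
    norm_a = math.sqrt(sum(w * w for w in sig_a.values()))
    norm_b = math.sqrt(sum(w * w for w in sig_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(thetas, top_k=3):
    """Full document-document matrix; n x n, so ~12,000 x ~12,000 for SfN."""
    signatures = [topic_signature(theta, top_k) for theta in thetas]
    return [[cosine_similarity(a, b) for b in signatures] for a in signatures]
```

For a corpus of ~12,000 abstracts the dense matrix has ~144 million entries, so a real pipeline would probably keep only the strongest similarities per document before graph layout; that pruning step is likewise an assumption, not something stated in the slides.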

‘Topic Mapping’ Workflow

Example topic: ischemia cerebral ischemic stroke brain occlusion injury infarct mcao hour reperfusion artery volume model middle transient

1. Literature Corpus
2. Topic Modeling using Gibbs Sampling
3. Topic Model
4. Document-Document Similarity Map
5. Graph Layout Processing with VxOrd / DrL
6. Multi-level image rendering, Cluster analysis for label placement
7. Google Maps Application
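The "Topic Modeling using Gibbs Sampling" step of the workflow can be illustrated with a toy sampler. The slides do not specify the model or its settings, so this sketch assumes a standard LDA-style model with symmetric Dirichlet priors; the function name `gibbs_lda` and the values of `alpha`, `beta`, and `iterations` are hypothetical choices for illustration, not the authors' implementation. Documents are lists of integer word ids.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, iterations=200,
              alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for an LDA-style topic model."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]             # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                            # n(k)
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed per-document topic mixtures (each row sums to 1).
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word
```

The returned `theta` rows are the per-document topic mixtures that feed the document-document similarity step; the highest counts in each row of `topic_word` give word lists like the ischemia example.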

Implementation 1: SfN 2006 Maps @ SfN 2007

Analysis: Dave Newman, UCI
Visualization: Bruce Herr, IU

Lessons Learned

This demonstration had a high impact at SfN 2007 [shown to the Neuroinformatics Committee (NIC), PubMed Plus Panel, Program Committee, General Council].

Why?
1. System emphasizes elegant visualization
2. Application has natural, familiar, intuitive design
3. Criticisms centered on concerns about analysis validity (‘what do clusters actually mean?’) ...but the system focused on utility, not interpretations...

Next Steps

Gary Westbrook [NIC, ex-editor of J Neurosci, external committee of National Institute of Neurological Disorders and Stroke, NINDS]

Edmund Talley [Program Director NINDS, Channels Synapses and Circuits]

Requested a system to examine NINDS grants accessed from CRISP

CRISP: Computer Retrieval of Information on Scientific Projects

Lists all funded DHHS projects from 1972 [including data from NIH, CDC, FDA, HRSA and AHRQ]

Build topic map of NINDS 2006 grants in relation to 13 other NIH institutes involved with funding neuroscience research.
[Largest Institute: NCI ~ 9373 grants (2006)]
[Smallest Institute: NIAAA ~ 1198 grants (2006)]

Downloaded 10 years of abstracts from NINDS (to weight the distribution in favor of NINDS topics) and 1 year of abstracts from each of the other 13 institutes.

NINDS staff hand-annotated ~2500 grants with SfN categories (theme, sub-theme, topic) to compare with categories generated by the topic model.

Additional Features for this implementation

- Improved navigability
- Multiple maps
- Multiple labeling / coloring schemes
- Search: Google Map-based flags, etc.; full-text search within the HTML application

Implementation 2: NINDS + NIH Maps for 2006

What’s Next?

All 2007 abstracts from NIH (all institutes)

Diagnostic functions within browser
- ‘Heat maps’ of each individual topic
- ‘Cluster Expansion’

Trend analysis
- Which topics are emergent? Which are in decline?

Can we perform analysis across corpora?
- SfN abstracts from 2001-2008
- Medline (>8 million abstracts)
- CRISP (funded federal project abstracts)
- PubMed Central (~1 million full text papers)
- Other full-text resources

‘Cluster Expansion’

Data across many years allows trend analysis

[Figure: Medline data; trend curves for PD, HIV, p53]
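One simple way to ask which topics are emergent and which are in decline is to fit a straight line to each topic's yearly share of the corpus and look at the sign of the slope. The slides do not describe the trend method, so this least-squares sketch is an assumption; the names `linear_slope` and `classify_trends` and the `tolerance` threshold are hypothetical.

```python
def linear_slope(years, values):
    """Ordinary least-squares slope of values against years."""
    if len(years) != len(values):
        raise ValueError("years and values must have the same length")
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("need at least two distinct years")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


def classify_trends(years, topic_share_by_year, tolerance=1e-3):
    """Label each topic by the slope of its yearly share of the corpus.

    topic_share_by_year maps a topic label to one value per year, for
    example the mean topic weight over all abstracts from that year.
    """
    labels = {}
    for topic, shares in topic_share_by_year.items():
        slope = linear_slope(years, shares)
        if slope > tolerance:
            labels[topic] = "emergent"
        elif slope < -tolerance:
            labels[topic] = "declining"
        else:
            labels[topic] = "stable"
    return labels
```

A usage example with made-up shares (not data from the talk): `classify_trends([2001, 2002, 2003, 2004], {"topic_a": [0.01, 0.02, 0.03, 0.04], "topic_b": [0.04, 0.03, 0.02, 0.01]})` labels `topic_a` as emergent and `topic_b` as declining. A straight-line fit will miss topics that rise and then fall, so it is only a first-pass diagnostic.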

Full-text Biomedical Articles

Source                                Size (millions of articles)   Type
Medline                               15.8                          Citations
Elsevier’s ScienceDirect              6.75                          Articles
PubMed Central                        0.97                          Articles
Cambridge Journals                    0.18                          Articles
JSTOR                                 1.62                          Articles
SpringerLink (Biomedical / Medical)   1.32 (0.72 / 0.60)            Articles
Wiley Interscience                    1.50                          Articles

Acknowledgements

Funding
- Information Sciences Institute, seed funding
- NSF: IIS-0513650
- NINDS contracts (Ned Talley)

Collaborators
- Dave Newman (UCI)
- Bruce Herr (IU)

Developers
- Tommy Ingulfsen

Contributing Computer Scientists
- Padhraic Smyth (UCI)
- Katy Börner (IU)
- Patrick Pantel (ISI/Yahoo!)