INLS890 Evidence-Based Discovery, Spring 2009, Catherine Blake, Ph.D.




INLS890 Evidence-Based Discovery

Spring, 2009
Catherine Blake, Ph.D


Today

• Introductions
• Administration
• Course Structure
• Learning Objectives
• Assessment
• Motivation


Introductions

• Dr Catherine Blake
  – Email - [email protected]
  – Office - 214A Manning Hall

• Lecture Time
  – 214 Manning Hall
  – Thurs 5:00-7:30pm

• Office Time
  – Email: anytime
  – By appointment: Tues and Wed


Operational Details

• Web Page
  – http://www.ils.unc.edu/~cablake/INLS890-EBD
  – Username = ebd Password = spr2009

• Email
  – Fastest response time
  – Please email from your UNC account
  – Start the subject with INLS890

• University Honor Code is in effect


Course Objectives

• This course combines theoretical models from discovery science with a survey of informatics tools that support discovery.

• The seminar will showcase the discovery process through a lecture series featuring both discipline and policy champions, revealing the synergy between synthesis and discovery and the need for interdisciplinary collaboration.


Theory Theme

• Kuhn
  – Normal versus Revolutionary Science
  – Abnormalities

• Chalmers
  – Observation
  – Falsification

• Information Quality
  – Meta-analysis
  – Information quality


Informatics Theme

• Language tools
  – Information Extraction - Text Mining
  – Document Summarization - Entailment

• Social Networking
  – Bibliometrics - Visualizations

• Workflow
  – myExperiment

• Domain specific software
  – Crystallography - BLAST


Practice Theme

• Synthesis
  – Timothy S. Carey, MD, MPH, Sarah Graham Kenan Professor of Medicine; Director, Cecil G Sheps Center for Health Services Research
  – Ila Cote, PhD, DABT, Acting Division Director, US Environmental Protection Agency National Center for Environmental Assessment

• Discovery
  – Paul Jones, Clinical Associate Professor, School of Information and Library Science; Director of ibiblio.org
  – Michael T Crimmins, PhD, Mary Ann Smith Distinguished Professor of Chemistry, UNC, and Department Chair, Department of Chemistry
  – Rudy L Juliano, PhD, Boshamer Distinguished Professor of Pharmacology; Principal Investigator, Carolina Center of Cancer Nanotechnology Excellence


Practice Theme

• Discovery
  – Robert C Millikan, DVM, PhD, Barbara Sorenson Hulka Distinguished Professor, Department of Epidemiology, School of Public Health
  – Jan F. Prins, PhD, Professor of Computer Science and Chairman, Department of Computer Science
  – Alexander Tropsha, Ph.D., Professor and Chair; Director, Laboratory for Molecular Modeling
  – Suzanne West, PhD, Research Associate Professor, Department of Epidemiology; Acting Director, UNC-GSK Center of Excellence in Pharmacoepidemiology and Public Health

• To be confirmed
  – Humanities Scholar
  – Steven W. Matson, Ph.D., Professor and Chair, Department of Biology


Typical Class Structure

• Before class (All): Post expert questions

• First Hour
  – Presentation by domain expert
  – Anointed domain expert: engage the presenter!

• Second Hour
  – Anointed informatics expert: present technologies
  – Discuss the intersection between theory, practice and informatics

• Last 30 mins
  – Anointed domain expert: introduce next expert


Assignments

• Informatics Review
  – What domain-specific tools are used in your discipline?
  – What generic tools exist for your discipline?
    • Information extraction
    • Text mining tool kits
  – Post results to the wiki


Assignments

• Engage the presenter
  – Introduce the presenter the week before
  – Read their materials ahead of time
  – Find out what else they do
  – Give us any context you can about the person
  – What are the key journals in the field?


Assignment

• Gap analysis
  – What informatics tools work in your discipline?
  – What gaps exist between the academic work being done by these researchers and the informatics tools that are currently available?


Assignments

• Scientific practice in your domain
  – Conduct interviews
  – Transcribe the interviews
  – Summarize your findings

• Group activities
  • Create wiki
  • Review questions
  • Submit IRB
  • Keep track of references


Dissemination

• Dissemination
  – How are we going to get this to people in the field?
    • Health Science Library
    • Paper in their conference
    • Face to face visits
    • … what other mechanisms


Assignments

• Class participation
  – Read the assigned readings
  – Participate in class discussion
  – Contribute to the wiki


Assessment

• Class Participation 20%
  – Attendance and contributions to discussion

• Informatics Review 20%

• Introducing and Engaging your speaker 20%

• Gap Analysis
  – Data collection activities 10%
  – Final report 20%

• Class contributions 10%


• Questions, Issues, Comments?


Why are you here?


Motivation

• Massive increase in electronic text
  – MEDLINE
    • Abstracts from more than 5,000 journals
    • Current: more than 17 million citations
    • Growth: ~12,000 new citations every week
  – Chemistry: more than 110,000 articles in 2002 alone

• Consequences
  – Hundreds of thousands of relevant articles
  – Implicit connections between literature go unnoticed

Source: MEDLINE factsheet http://www.nlm.nih.gov/pubs/factsheets/medline.html
Source: Calculated from ISI’s 418 highest ranked chemistry articles

Shift focus from Retrieval to Synthesis


Information Overload

“One of the diseases of this age is the multiplicity of books; they doth so overcharge the world that it is not able to digest the abundance of idle matter that is every day hatched and brought forth into the world”

- Barnaby Rich, 1613


Existing Text Mining

SAS Text Miner (Association Rules)

IBM Intelligent Miner for Text (Clustering)

• Clustering
• Categorization
• Association Rules


Example Pattern : Decision Tree

∀ person P: P.degree = masters and P.income > 75,000 ⇒ P.credit = excellent
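Reading the pattern as an if-then rule, it can be checked against individual records. The following is a minimal sketch under stated assumptions: the `Person` record and the helper names are hypothetical, only the three field names (`degree`, `income`, `credit`) come from the slide, and the support and confidence figures are the usual way of scoring such a rule rather than anything the slide specifies.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Field names follow the slide; the record type itself is illustrative.
    degree: str
    income: float
    credit: str


def rule_applies(p: Person) -> bool:
    """Antecedent of the pattern: masters degree and income above 75,000."""
    return p.degree == "masters" and p.income > 75_000


def rule_holds(p: Person) -> bool:
    """The rule holds for p if the antecedent is false or credit is excellent."""
    return (not rule_applies(p)) or p.credit == "excellent"


def rule_support_confidence(people):
    """Return (number of people the rule covers, fraction of those it predicts correctly)."""
    covered = [p for p in people if rule_applies(p)]
    if not covered:
        return 0, None
    correct = sum(p.credit == "excellent" for p in covered)
    return len(covered), correct / len(covered)
```

A rule mined from data rarely holds for every record, which is why the sketch reports how many records the rule covers and how often its conclusion is right, instead of a single true or false.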


Kohonen Maps

• Articles represented as vectors

• Assign n random articles

• Assign remaining articles to closest cluster

Snowy peaks indicate highly funded research

NCI-funded research 1995-present

Blake, C. and Tengs, T. (2001) “The Nation’s Breast Cancer Research Portfolio: A view from 30,000 feet”, Avon Symposium, UC Irvine.
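The three bullets on this slide can be sketched as a seed-and-assign step. This is a rough illustration only, and it makes several assumptions that the slide does not state: the n random articles are treated as initial cluster seeds, closeness is measured by Euclidean distance, and the function names are hypothetical. It is not a full Kohonen self-organizing map, which would also repeatedly update node weights on a grid.

```python
import math
import random


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def seed_and_assign(vectors, n, seed=0):
    """Pick n random articles as cluster seeds, then attach every
    remaining article to the seed whose vector is closest.

    Returns a dict mapping each seed index to the list of article
    indices in its cluster (the seed itself comes first).
    """
    rng = random.Random(seed)
    seed_ids = rng.sample(range(len(vectors)), n)
    clusters = {s: [s] for s in seed_ids}
    for i, vec in enumerate(vectors):
        if i in clusters:  # seeds are already placed
            continue
        nearest = min(seed_ids, key=lambda s: euclidean(vec, vectors[s]))
        clusters[nearest].append(i)
    return clusters
```

In practice the article vectors would be term-weight vectors with thousands of dimensions, and cosine similarity is a common alternative to Euclidean distance for text.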


Knowledge Discovery in Literature

[Figure: literature-based discovery diagram. Target literature A: Magnesium. Source literature C: Migraine. Intermediate B terms linking the two: Calcium Channel Blockers, Platelet Activity, Serotonin, ...]

Swanson, DR (1988) “Migraine and magnesium: eleven neglected connections”, Perspect. Biol. Med., 31: 526-57.

Blake, C. & Pratt, W. (2002). A Semantic Approach to Identify Candidate Treatments from Existing Medical Literature. In AAAI Symposium on Knowledge-based Approaches, Stanford, CA.
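The idea on this slide, connecting a source term C (migraine) to a target term A (magnesium) through intermediate B terms, can be sketched with simple co-occurrence. This is a toy illustration under stated assumptions, not the method of either cited paper: documents are modeled as sets of terms, the function names are hypothetical, and the six-document corpus is invented for the example (only the term names come from the slide).

```python
from collections import defaultdict


def cooccurring(term, documents):
    """Terms that appear in at least one document together with `term`."""
    found = set()
    for doc in documents:
        if term in doc:
            found |= doc - {term}
    return found


def abc_candidates(c_term, documents):
    """Find A terms linked to c_term only through intermediate B terms.

    Returns a dict mapping each candidate A term to the set of B terms
    that connect it to c_term. Terms that co-occur directly with c_term
    count as B terms and are never reported as candidates.
    """
    b_terms = cooccurring(c_term, documents)
    candidates = defaultdict(set)
    for b in b_terms:
        for a in cooccurring(b, documents):
            if a != c_term and a not in b_terms:
                candidates[a].add(b)
    return dict(candidates)


# Invented toy corpus: each document is the set of terms it mentions.
toy_documents = [
    {"migraine", "serotonin"},
    {"migraine", "platelet activity"},
    {"migraine", "calcium channel blockers"},
    {"magnesium", "serotonin"},
    {"magnesium", "platelet activity"},
    {"magnesium", "calcium channel blockers"},
]

links = abc_candidates("migraine", toy_documents)
```

On this toy corpus, `links` maps "magnesium" to the three B terms, even though no single document mentions both migraine and magnesium. Real systems also have to rank and filter the very large number of B and A terms that plain co-occurrence produces.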