
Page 1: TIDES IFE-Bio KickOff Meeting

MITRE

TIDES IFE-Bio KickOff Meeting

David Anderson, Laurie Damianos, David Day, Lynette Hirschman, Robyn Kozierok, Scott Mardis, Tom McEntee, Chad McHenry, Michael Merideth, Keith Miller, Bev Nunan, Jay Ponte, George Wilson, Flo Reeder, Steve Wohlever

October 17, 2001

[Figure: Ebola tracking chart. X axis: TIME (10/13/2000 to 11/24/2000); Y axis: Number Cases; series: Cases, New_cases, Dead.]

Track_id  Date      Disease  Country   City_name  Cases  New_cases  Dead
Ebola     10/30/00  Ebola    Uganda    Gulu       224    19         73
Ebola     10/31/00  Ebola    Uganda    Gulu       239    15         75
Ebola     11/01/00  Ebola    Uganda    Gulu       251    12         80
Ebola     11/11/00  Ebola    Uganda    Gulu       269    4          87
Ebola     11/13/00  Ebola    Uganda    Gulu       321    1          104
Ebola     11/17/00  Ebola    Uganda    Gulu       329    4          107
Ebola     11/17/00  Ebola    Uganda    Masindi    4      0          4
Ebola     11/19/00  Ebola    Uganda    Mbarara    12     2          9
Ebola     11/20/00  Ebola    Tanzania  Mwanza     7      2          0
Ebola     11/21/00  Ebola    Kenya     Busia      3      3          0

Page 2

Agenda
• Current Status and Experiments (Laurie)
• User Feedback on MiTAP and Exercise (Eric)
• Lessons Learned (Laurie)
• Architecture Briefing (Jay & Scott)
• Geospatial Processing (George)
• Schedule (Jay)
• Issues and Discussion (All)

Page 3

Status of MiTAP
• Availability: excellent
  - Available ~100% to users inside and outside the firewall
  - 12 individual user accounts, 6 group accounts
  - 8 daily users on average, mostly repeat users
• Data capture: rich & dynamic
  - ~70 working sources; a new source can be added in 30 min
  - Average 5.8K msgs/day, 1 min latency
  - 250K msgs total in system
• Analysis tools: improving
  - Messages in 6 languages (with COTS translation)
  - Sorted into 173 newsgroups
  - Color-coded tagging (person/organization/location/disease)
  - Popup summarization
• Product: need to understand how the system is being used

Page 4

[Figure: MiTAP Activity: Messages and Users Over Time. # messages (left axis) and # users (right axis), 7/1 to 10/21; annotations mark the Aug Experiment and Attack on America.]

Page 5

Performance Summary: Sudan 1999 vs Attack on America 2001

Metric | Sudan Incident (July 1999) | Attack on America (September 2001) | Comments
Availability | NA | 95% | Security via IP filtering
Users | 5 | 10 |
Capture
  Msgs | 1,000 | 40,000 | 250,000 msgs total
  Sources | 20 | 70 | 29 new sources added; 30 min/source
  Throughput | NA | 8,000 msgs/day | Latency for feeds: < 1 min
  Languages | 1 | 6 | French, Spanish, Portuguese, Russian, Chinese, English
Analysis
  News groups | NA | 173 | 89 new groups
  Tagging | No | Yes | People, organizations, locations, dates, diseases
  Translation | No | Yes | 5 languages, variable quality
  Search | No (web only) | Yes | Boolean, sort by date/relevance

Page 6

Disease of the Month Experiments

August
• Who: MiTAP Team (control vs. test)
• What: dengue fever
• Why: debug experiment, underlying processes
• Findings: MiTAP report had more detail and was more up-to-date, but had poorer coverage
• Lessons learned: useless for report writing, search difficult, online capture configuration hard
• Outcomes: improved source integration (faster, easier)

September
• Who: UMass/NYU (no control)
• What: dengue fever
• Why: stimulate thinking regarding info extraction, IR
• Findings: (nothing evaluated)
• Lessons learned: search more difficult with more docs, search poorly integrated, need better viz tools
• Outcomes: (brainstorming session cancelled due to change in priorities)

October
• Who: MiTAP Team (control vs. test)
• What: bio threats
• Why: see what the system collected since the exercise
• Findings: MiTAP user wrote report with 1/5 searches, 1/2 docs; report more up-to-date
• Lessons learned: summaries useless, duplicates hard to distinguish
• Outcomes: improvements on search

Page 7

Feedback from Eric
• Report on Bio-Threats
• Deployment for N2
• MiTAP Status
  - Utility
  - Usability
  - Accessibility

Page 8

Lessons Learned

Availability
• User accounts for production system
• No training needed (instructions available on website)
• Stronger security (e.g., intrusion detection)
• Better back-up, monitoring of throughput
• More processing power

Capture
• Reduced latency on scheduled downloads and spidering; hourly capture of headlines
• Distributed capture processing
• Better capture of formatted sources
• Some sources badly filtered; excess volume causes backlog
• Poor zoning/formatting/decoding of some sources

Page 9

Lessons Learned (2)

Analysis
• Improved search (e.g., by date/relevance, popups, integrated with news server)
• Improved “normalization” of names, regions
• Too much data! Need better filtering, topic detection & clustering, summarization
• Better MT, support for Arabic
• Q&A
• Geospatial & temporal visualization
• Advanced search
• Better information extraction

Page 10

Lessons Learned (3)

Product
• No environment for preparing reports
• Workspace
  - Drag&drop repository
  - Editing capabilities
  - Multidoc summarization
  - Collaboration feature (chat & shared workspace)

Page 11

Catalyst Update: Recent work
• Usability for developers
  - Logger
  - Configuration file refinements
• Improvements for distributed systems
  - Redesign of I/O polling procedures
  - Explicit synchronization feature for Language Processor developers

Page 12

Logger

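[Diagram: Catalyst processes Tokenize, Sentence, Tagger, and Entity Extraction pass annotation streams (Documents, MetaData, Word.Text, Sentence, Word.POS, Entities) to one another; two catlogger processes are attached to the streams.]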
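The slide does not describe the catlogger interface, so the following is only a minimal sketch of the idea, under our own assumptions: pipeline stages publish annotation records onto named streams, and a logger attached to some of those streams keeps a copy of everything that flows past without altering it. The record layout, `Pipeline`, `StreamLogger`, and the toy `tokenize`/`tag` stages are all hypothetical stand-ins, not Catalyst's API.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical annotation record: (stream name, start offset, end offset, value)
Annotation = Tuple[str, int, int, str]

@dataclass
class StreamLogger:
    """catlogger-style tap (hypothetical): copies every annotation on the watched streams."""
    watched: Set[str]
    records: List[Annotation] = field(default_factory=list)

    def observe(self, ann: Annotation) -> None:
        if ann[0] in self.watched:
            self.records.append(ann)

class Pipeline:
    """Fans each published annotation out to all attached loggers, then returns it unchanged."""
    def __init__(self) -> None:
        self.taps: List[StreamLogger] = []

    def publish(self, anns: List[Annotation]) -> List[Annotation]:
        for ann in anns:
            for tap in self.taps:
                tap.observe(ann)
        return anns

def tokenize(text: str) -> List[Annotation]:
    """Toy Tokenize stage: whitespace tokens on the Word.Text stream."""
    anns, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        pos = start + len(tok)
        anns.append(("Word.Text", start, pos, tok))
    return anns

def tag(words: List[Annotation]) -> List[Annotation]:
    """Toy Tagger stage: capitalized words become NNP, everything else OTHER."""
    return [("Word.POS", s, e, "NNP" if w[:1].isupper() else "OTHER") for _, s, e, w in words]

pipeline = Pipeline()
pos_log = StreamLogger(watched={"Word.POS"})  # log only the tagger's output stream
pipeline.taps.append(pos_log)
words = pipeline.publish(tokenize("Ebola in Gulu"))
tags = pipeline.publish(tag(words))
```

Because the logger only observes, attaching or removing it does not change what downstream stages receive, which is what makes a tap like this convenient for debugging.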

Page 13

In progress
• Usability for developers
  - Monitor (system status capability)
  - Native XML I/O! (for ease of debugging & for lightweight Catalyst)
• Information retrieval
  - Integration between Catalyst and new IR engine
  - Pushing stream filters toward archived streams
• Documentation

Page 14

Monitor

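[Diagram: Catalyst processes Tokenize, Sentence, Tagger, and Entity Extraction pass annotation streams (Documents, MetaData, Word.Text, Sentence, Word.POS, Entities) to one another; two Monitor processes are attached to the pipeline.]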
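The slide only says the Monitor provides a "system status capability". As one possible reading, and purely an assumption on our part, status could mean how many items each stream has produced and consumed, with the difference being the backlog waiting for the next process. `StreamMonitor` and its methods are hypothetical.

```python
from collections import Counter
from typing import Dict

class StreamMonitor:
    """Hypothetical status monitor: per-stream counts of items produced and consumed."""

    def __init__(self) -> None:
        self.produced: Counter = Counter()
        self.consumed: Counter = Counter()

    def on_publish(self, stream: str, n: int = 1) -> None:
        # Called when a process writes n items to a stream.
        self.produced[stream] += n

    def on_consume(self, stream: str, n: int = 1) -> None:
        # Called when a downstream process reads n items from a stream.
        self.consumed[stream] += n

    def backlog(self) -> Dict[str, int]:
        """Items published but not yet read, per stream."""
        return {s: self.produced[s] - self.consumed[s] for s in self.produced}

    def report(self) -> str:
        """One status line per stream, sorted by stream name."""
        return "\n".join(
            f"{s:<12} produced={self.produced[s]:>4} backlog={b:>4}"
            for s, b in sorted(self.backlog().items())
        )

m = StreamMonitor()
m.on_publish("Word.Text", 120)
m.on_consume("Word.Text", 100)
m.on_publish("Sentence", 8)
m.on_consume("Sentence", 8)
```

A growing backlog on one stream would point at the process that reads it as the bottleneck.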

Page 15

XML I/O

[Diagram: Present: XML doc -> XML to Catalyst -> EventExtraction -> Catalyst to XML -> XML doc. With XML I/O feature: XML doc -> EventExtraction -> XML doc.]

With XML I/O feature: easier to debug!
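The slide does not give Catalyst's XML schema. The sketch below assumes a simple stand-off format of our own (element names `doc`, `text`, `annotations`, `ann` are hypothetical) to show why native XML I/O helps debugging: a process's input and output become plain text files that can be inspected, edited, and round-tripped without conversion steps.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Span = Tuple[str, int, int]  # (annotation type, start offset, end offset)

def to_xml(text: str, spans: List[Span]) -> str:
    """Serialize a document and its stand-off annotations to an XML string."""
    doc = ET.Element("doc")
    ET.SubElement(doc, "text").text = text
    anns = ET.SubElement(doc, "annotations")
    for kind, start, end in spans:
        ET.SubElement(anns, "ann", type=kind, start=str(start), end=str(end))
    return ET.tostring(doc, encoding="unicode")

def from_xml(xml: str) -> Tuple[str, List[Span]]:
    """Parse the XML string back into the text plus its annotation spans."""
    doc = ET.fromstring(xml)
    text = doc.findtext("text", default="")
    spans = [(a.get("type"), int(a.get("start")), int(a.get("end"))) for a in doc.iter("ann")]
    return text, spans

text = "Ebola cases reported in Gulu"
spans = [("disease", 0, 5), ("location", 24, 28)]
xml = to_xml(text, spans)
```

Offsets are kept outside the text (stand-off) so that several processes can annotate the same document without rewriting it.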

Page 16

XML I/O

[Diagram: Catalyst Processes exchange XML with a Wrapper around a Non-Catalyst Process, which in turn connects to further Catalyst Processes.]

With XML I/O feature: easier path to integrate existing language processing systems!
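A minimal sketch of such a wrapper, assuming a hypothetical stand-off XML format (`doc`, `text`, `annotations`, `ann`) and a legacy system that exposes a plain function returning `(start, end, label)` triples. `legacy_tagger` is a toy stand-in for an existing non-Catalyst system; only the wrapper needs to know about XML.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

Triple = Tuple[int, int, str]

def legacy_tagger(text: str) -> List[Triple]:
    """Stand-in for an existing, non-Catalyst system: finds disease mentions."""
    return [(m.start(), m.end(), "disease")
            for m in re.finditer(r"\b(?:Ebola|dengue fever)\b", text)]

def xml_wrapper(process: Callable[[str], List[Triple]], xml_in: str) -> str:
    """Read an XML doc, run the wrapped process on its text, and write its output back as XML."""
    doc = ET.fromstring(xml_in)
    text = doc.findtext("text", default="")
    anns = doc.find("annotations")
    if anns is None:  # explicit None test: an Element with no children is falsy
        anns = ET.SubElement(doc, "annotations")
    for start, end, label in process(text):
        ET.SubElement(anns, "ann", type=label, start=str(start), end=str(end))
    return ET.tostring(doc, encoding="unicode")

out = xml_wrapper(legacy_tagger, "<doc><text>Ebola spreads; dengue fever too</text></doc>")
```

The legacy code is untouched; swapping in a different system only means passing a different function to the wrapper.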

Page 17

Archived streams

[Diagram: a Question Answering Application over archived streams, with components Indices, Index Refinement, Candidate Selection, Coreference, and Answer Extraction (output: XML doc); filter criteria flow from their origination point back toward the Indices.]

Filter criteria must be pushed upstream from their origination point toward the indices, so that processing can be reduced to little more than what is absolutely necessary.
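To make the point concrete, here is a minimal sketch comparing a filter applied at the end of processing with the same filter pushed down to an inverted index. The documents, the single-term filter, and `ExpensiveStage` (standing in for costly steps such as coreference or answer extraction) are all hypothetical; the sketch only counts how many documents reach the expensive stage.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

DOCS: Dict[int, str] = {
    1: "Ebola outbreak in Gulu",
    2: "dengue fever cases rise",
    3: "new Ebola cases in Masindi",
    4: "weather report",
}

def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Inverted index: lowercased term -> ids of documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

class ExpensiveStage:
    """Stands in for costly downstream processing; counts documents it is given."""
    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text: str) -> str:
        self.calls += 1
        return text.upper()

def filter_late(docs: Dict[int, str], stage: Callable[[str], str], term: str) -> List[str]:
    # Process every document first, then apply the filter criterion at its origination point.
    processed = {i: stage(t) for i, t in docs.items()}
    return [processed[i] for i in sorted(docs) if term in docs[i].lower().split()]

def filter_pushed_down(docs: Dict[int, str], index: Dict[str, Set[int]],
                       stage: Callable[[str], str], term: str) -> List[str]:
    # Apply the filter criterion at the index, so only matching documents are processed.
    return [stage(docs[i]) for i in sorted(index.get(term.lower(), set()))]

late, early = ExpensiveStage(), ExpensiveStage()
r_late = filter_late(DOCS, late, "ebola")
r_early = filter_pushed_down(DOCS, build_index(DOCS), early, "ebola")
```

Both strategies return the same answers; only the amount of work differs (4 documents processed versus 2 here).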

Page 18

For the Midterm - 12/12/2001
• Monitor
• XML I/O support in the Catalyst library
• Lightweight Catalyst design
• Documentation

Page 19

Catalyst collaborations
• Qanda
  - Catalyst-based Qanda used for TREC
  - Catalyst-based Qanda deployed at AFIWC
• Information retrieval
  - Archived annotation streams (for creating IR indexes)
  - Seekable streams (for processing IR queries)
• Other projects
  - ACE/Alembic (Information Extraction)
  - Audio hot-spotting (Speech Retrieval)
  - Reading-comp (Question Answering)

Page 20

Document Management
• Process scheduling
• System linkage
• Inter-site cooperation support
• User features

Page 21

Process Scheduling
• Problem: MiTAP needs the ability to prioritize sources
  - ‘Catching up’ on a new source shouldn’t prevent timely processing of an important existing source
• Solution:
  - Preprocessing daemon will notify scheduler of incoming content
  - Scheduler assigns jobs to available resources based on priority
• Status:
  - Prototype scheduler delivered (Ponte)
  - Preprocessing daemon rewrite in mid-November (Wohlever)
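The slide does not describe the prototype scheduler's internals. A minimal sketch, assuming a priority queue of pending jobs: the preprocessing daemon calls `notify` as content arrives, and `dispatch` hands queued jobs to free workers, most urgent first and first-in-first-out within a priority. The class, method names, source names, and priority values are hypothetical.

```python
import heapq
import itertools
from typing import List, Tuple

class PriorityScheduler:
    """Hypothetical sketch: jobs from more important sources are dispatched first."""

    def __init__(self, workers: int) -> None:
        self.free_workers = workers
        self._queue: List[Tuple[int, int, str, str]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO order within one priority level

    def notify(self, source: str, item: str, priority: int) -> None:
        """Called by the preprocessing daemon for new content (lower number = more urgent)."""
        heapq.heappush(self._queue, (priority, next(self._seq), source, item))

    def dispatch(self) -> List[Tuple[str, str]]:
        """Assign queued jobs to available workers, most urgent first."""
        assigned = []
        while self._queue and self.free_workers > 0:
            _, _, source, item = heapq.heappop(self._queue)
            self.free_workers -= 1
            assigned.append((source, item))
        return assigned

    def release(self, n: int = 1) -> None:
        """Mark n workers as free again once their jobs finish."""
        self.free_workers += n

s = PriorityScheduler(workers=2)
for i in range(100):  # a new source 'catching up' on its archive
    s.notify("new_source", f"archive-{i}", priority=5)
s.notify("important_feed", "headline-1", priority=1)
first_round = s.dispatch()
```

Even though 100 archive items were queued first, the important feed's headline is dispatched immediately; the sequence counter keeps the backlog itself in arrival order.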

Page 22

System Linkage
• Problem: Ever notice how new features tend to apply only to new content?
  - MiTAP is not flexible; it is difficult to:
    - Reprocess and repost a message that has errors
    - Find the original source document
    - Etc.
  - Currently, retroactive changes require 11th-hour hacking (or sometimes 12th-hour hacking)
• Solution: Keep a database of linkage information to make the system more flexible
• Status:
  - Additional information currently being logged
  - Linkage database: March
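The slide does not specify the linkage database's schema. As a minimal sketch under our own assumptions, each posted message could be linked to its source URL, the stored raw capture, its newsgroup, and the version of the processing scripts that produced it; that is enough to find the original source document and to select messages for reprocessing after a fix. The table, column names, and example values are hypothetical; it uses Python's built-in `sqlite3` with an in-memory database.

```python
import sqlite3
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE linkage (
    message_id   TEXT PRIMARY KEY,  -- id of the message as posted to the news server
    source_url   TEXT NOT NULL,     -- where the original document was captured from
    raw_path     TEXT NOT NULL,     -- stored copy of the captured original
    newsgroup    TEXT NOT NULL,
    pipeline_ver INTEGER NOT NULL   -- version of the processing scripts that produced the post
)
"""

def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn

def record(conn: sqlite3.Connection, message_id: str, source_url: str,
           raw_path: str, newsgroup: str, pipeline_ver: int) -> None:
    conn.execute("INSERT INTO linkage VALUES (?, ?, ?, ?, ?)",
                 (message_id, source_url, raw_path, newsgroup, pipeline_ver))

def original_source(conn: sqlite3.Connection, message_id: str) -> Optional[Tuple[str, str]]:
    """Source URL and raw capture path for a posted message, or None if unknown."""
    return conn.execute("SELECT source_url, raw_path FROM linkage WHERE message_id = ?",
                        (message_id,)).fetchone()

def needs_reprocessing(conn: sqlite3.Connection, current_ver: int) -> List[str]:
    """Messages produced by an older pipeline version: candidates to reprocess and repost."""
    rows = conn.execute("SELECT message_id FROM linkage WHERE pipeline_ver < ? "
                        "ORDER BY message_id", (current_ver,)).fetchall()
    return [r[0] for r in rows]

conn = open_db()
record(conn, "msg-001", "http://example.org/a", "raw/a.html", "bio.ebola", 1)
record(conn, "msg-002", "http://example.org/b", "raw/b.html", "bio.dengue", 2)
```

With the links recorded at capture time, applying a new feature to old content becomes a query plus a rerun rather than a one-off hack.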

Page 23

Inter-site Cooperation Support
• Problem: Collaboration with other TIDES contractors who have large legacy systems
  - An issue of communication more than of scalability
• Solution:
  - Linkage database for annotations, similar to the one used for system maintenance
  - Web client-server communication
  - Path to a scalable solution with richer interactions
• Status:
  - Data management: January
  - Communications: investigation of relevant protocols and preliminary design completed
  - Native XML support for Catalyst: December

Page 24

User features
• Problem: MiTAP helps you find good information, then what?
• Solution:
  - Web-accessible support for user views and data organization to assist in reporting and analysis
  - Automated view construction/feedback incorporating additional TIDES technologies
• Status:
  - Schema for v.1 of workspace developed (Ponte, Anderson)
  - Supporting code in progress (Ponte)
  - Prototype: December

Page 25

Geo-Spatial Normalization - Goal

Goal:
• We have: text containing place names
• We want: points on maps

Process:
• Extract place names
• Look up places on a list
• Determine lat-long
• Display

Example: Seattle -> 47.6 N, 122.317 W

Problems:
• Place name not on list
• More than one place with the same name
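A minimal sketch of the first three steps (display is omitted), assuming a toy gazetteer as "the list". Extraction here is a naive capitalized-word match, not a real name tagger. Seattle's coordinates come from the slide; the other coordinates are approximate values included only for illustration. The output shows both problems the slide lists: a name not on the list, and a name with several entries.

```python
import re
from typing import Dict, List, Tuple, Union

# Toy gazetteer: name -> candidate entries (label, latitude, longitude); negative longitude = West.
GAZETTEER: Dict[str, List[Tuple[str, float, float]]] = {
    "Seattle": [("Seattle, WA, USA", 47.6, -122.317)],
    "Toulouse": [("Toulouse, France", 43.6, 1.44)],
    "Paris": [("Paris, France", 48.86, 2.35), ("Paris, Texas, USA", 33.66, -95.56)],
}

Result = Union[Tuple[float, float], str]

def extract_place_names(text: str) -> List[str]:
    """Naive extraction: capitalized words (a real system would use a name tagger)."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)

def normalize(text: str) -> Dict[str, Result]:
    """Map each extracted name to (lat, lon), or flag it as 'not on list' / 'ambiguous'."""
    results: Dict[str, Result] = {}
    for name in extract_place_names(text):
        candidates = GAZETTEER.get(name)
        if candidates is None:
            results[name] = "not on list"
        elif len(candidates) > 1:
            results[name] = "ambiguous"
        else:
            _, lat, lon = candidates[0]
            results[name] = (lat, lon)
    return results

r = normalize("cases reported in Seattle, Paris and Gulu")
```

Here Seattle resolves to a point, Paris is flagged as ambiguous, and Gulu is flagged because it is missing from the toy list.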

Page 26

Geo-Spatial Normalization - Solution

Solution, Part 1: A significant portion of the references can be resolved using easy methods.
• Unambiguous: Seattle, Toulouse
• Ambiguous: Paris, Washington
• Disambiguated: Paris, Texas; the State of Washington

Solution, Part 2: Use the “easily resolved” references as training data for a machine learning classifier that will distinguish the rest.
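The slide does not name the classifier. The sketch below illustrates only the bootstrapping idea: references resolved by an easy method (here, an explicit qualifier such as "Paris, Texas") become labeled examples of their surrounding words, and a simple context-word vote, a stand-in for a real learner such as naive Bayes, then labels the remaining ambiguous mentions. The example sentences, labels, and window size are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def context(words: List[str], i: int, window: int = 3) -> List[str]:
    """Lowercased words within `window` positions of index i, excluding the name itself."""
    lo, hi = max(0, i - window), min(len(words), i + window + 1)
    return [w.lower().strip(",.") for j, w in enumerate(words[lo:hi], start=lo) if j != i]

def train(resolved: List[Tuple[str, int, str]]) -> Dict[str, Counter]:
    """resolved: (sentence, index of the place name, label) taken from the easy cases."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for sentence, i, label in resolved:
        model[label].update(context(sentence.split(), i))
    return model

def classify(model: Dict[str, Counter], sentence: str, i: int, labels: List[str]) -> str:
    """Pick the candidate whose training contexts share the most words with this mention."""
    ctx = context(sentence.split(), i)
    return max(labels, key=lambda lab: sum(model[lab][w] for w in ctx))

# Easily resolved references: the qualifier after the comma settles which Paris is meant.
resolved = [
    ("cattle ranchers near Paris, Texas reported drought", 3, "Paris, Texas"),
    ("the French ministry in Paris, France issued a statement", 4, "Paris, France"),
]
model = train(resolved)
candidates = ["Paris, France", "Paris, Texas"]
```

With only two training sentences the vote is fragile; the point of Part 2 is that the easy cases can supply many such examples automatically.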

Page 27

Geo-Spatial Normalization - Plans

For MidTerm (Dec. 12, 2001)
• Detect a significant portion of the “easily resolvable” references
• Display with some map tool
  - Web delivery desirable

After MidTerm (May 2002)
• Try to find more “easily resolvable” references
• Do the machine learning part
• Integrate with other mapping tools

Page 28

IFE-Bio Schedule

What | Why | When

Availability
  Add user accounts | Widen access to system | by request
Data Capture
  Improve quality of online capture | Improve system utility as sources are added |
  Build new message processing daemon | Increase throughput, decrease posting latency | mid-November
  Replace tides2000 with more powerful machine | Increase throughput, decrease posting latency | November
  Simplify document processing scripts & improve logging and error detection | Simplify admin duties | December
Analysis
  Augment search page functionality | Simplify finding relevant data | ongoing
  Handle zoning & encoding issues better | Improve translations | ongoing
  Add MT for other languages | Support Arabic, others | as available
  Add question answering | Simplify finding relevant data | December
  Improve sorting, filtering, thumbnail "key entity" list | Provide better filtering (e.g., FBIS, Relief Web); provide better name tagging to be used for better sorting into newsgroups | soon
Product
  (see architecture schedule, following)
Evaluation
  Disease of the Month Experiments | Assess utility, evaluate usability, measure progress | monthly

Page 29

Architecture Schedule

What | Why | When

Data Capture
  Scheduler Prototype | Support of new message capture daemon | Delivered, support ongoing
  DB Tools | Prerequisite for system linkage and intersite cooperation | January
  System Linkage DB | Enable addition of new features; ease system administration | March
Analysis
  Architecture support for Q&A | | December
Product
  User Workspace Prototype | Support for report construction | December
Infrastructure
  Catalyst Monitor | Ease development and debugging | December
  Native XML Support | Support for legacy systems | December
  Documentation | Usability | Ongoing

Page 30

Issues and Discussion
• How is MiTAP currently being used?
  - Who are the users?
  - What are the users doing?
  - What do users want?
• Prioritization of issues
  - Integrated feasibility experiment versus operational prototype:
    - Possible deployment vs. integration of other TIDES technologies (Do we need to adjust our priorities?)
  - Along what dimensions should we optimize?
    - Availability, capture, analysis, presentation