Interactive Task of the TREC Legal Track: Theory meets Practice



Post on 11-Feb-2016


Page 1: Interactive Task of the TREC Legal Track: Theory meets Practice

Interactive Task of the TREC Legal Track: Theory meets Practice

Douglas W. Oard

College of Information Studies and Institute for Advanced Computer Studies

University of Maryland, College Park

Joint work with Jason Baron (NARA), Bruce Hedin (H5), Stephen Tomlinson (Open Text)

Making the world better for lawyers

Page 2: Interactive Task of the TREC Legal Track: Theory meets Practice

E-Discovery

National Archives

Clinton White House tobacco policy search request

[Diagram labels: 32 million emails; 200,000; 80,000; hired 25 persons; for 6 months …]

Page 3: Interactive Task of the TREC Legal Track: Theory meets Practice

Federal Rules of Civil Procedure

Rule 26(f): At the parties’ planning meeting, issues expected to be discussed include:

– “Any issues relating to disclosure or discovery of electronically stored information, including the form or forms in which it should be produced”

– “Any issues relating to preserving discoverable information”

Page 4: Interactive Task of the TREC Legal Track: Theory meets Practice

Judge Grimm, writing for the U.S. District Court for the District of Maryland

“all keyword searches are not created equal; and there is a growing body of literature that highlights the risks associated with conducting an unreliable or inadequate keyword search”

Victor Stanley, Inc. v. Creative Pipe, Inc., ---F.Supp.2d---, 2008 WL 2221841, * 3 & n.9 (D. Md. May 29, 2008)

Page 5: Interactive Task of the TREC Legal Track: Theory meets Practice

The Design Space

“Features”

“Method”

“Specification”

“Result”

Page 6: Interactive Task of the TREC Legal Track: Theory meets Practice

What Does “Better” Mean?

[Figure: a “Baseline” technique compared with a “Better” technique. Axes: increasing effort (time, resources expended, etc.) and increasing success (finding relevant documents). Labels on the plot: A, B, C, D, x, y.]

Page 7: Interactive Task of the TREC Legal Track: Theory meets Practice

Other Desiderata

• Two-party
  – Negotiated information needs

• Comprehensive
  – “Smoking gun detection” + completeness

• Justifiable
  – Quantifiable comparison to present practice

• Affordable
  – Minimize amount of human review

Page 8: Interactive Task of the TREC Legal Track: Theory meets Practice

Text Retrieval Conference (TREC)

• Goals
  – Foster development of research communities
  – Create “benchmark” evaluation resources
  – Establish baseline results

• History
  – Sponsored by NIST since 1992
  – “Legal Track” started in 2006; E-Discovery focus
  – Annual evaluation cycle

Page 9: Interactive Task of the TREC Legal Track: Theory meets Practice

Evaluation Design

Scanned Docs

Interactive Task

Page 10: Interactive Task of the TREC Legal Track: Theory meets Practice

2008 Interactive Task Participants

• Clearwell Systems
• H5
• University at Buffalo
• University of Pittsburgh

4 research teams submitted 7 runs

Each run: YES/NO for all 7 million documents for a single production request

Page 11: Interactive Task of the TREC Legal Track: Theory meets Practice

“Complaint” and “Production Request”

…

12. On January 1, 2002, Echinoderm announced record results for the prior year, primarily attributed to strong demand growth in overseas markets, particularly China, for its products. The announcement also touted the fact that Echinoderm was unique among U.S. tobacco companies in that it had seen no decline in domestic sales during the prior three years.

13. Unbeknownst to shareholders at the time of the January 1, 2002 announcement, defendants had failed to disclose the following facts which they knew at the time, or should have known:

a. The Company's success in overseas markets resulted in large part from bribes paid to foreign government officials to gain access to their respective markets;

b. The Company knew that this conduct was in violation of the Foreign Corrupt Practices Act and therefore was likely to result in enormous fines and penalties;

c. The Company intentionally misrepresented that its success in overseas markets was due to superior marketing.

d. Domestic demand for the Company's products was dependent on pervasive and ubiquitous advertising, including outdoor, transit, point of sale and counter top displays of the Company's products, in key markets. Such advertising violated the marketing and advertising restrictions to which the Company was subject as a party to the Attorneys General Master Settlement Agreement ("MSA").

e. The Company knew that it could be ordered at any time to cease and desist from advertising practices that were not in compliance with the MSA and that the inability to continue such practices would likely have a material impact on domestic demand for its products.

…

All documents which describe, refer to, report on, or mention any “in-store,” “on-counter,” “point of sale,” or other retail marketing campaigns for cigarettes.

Page 12: Interactive Task of the TREC Legal Track: Theory meets Practice

~7 Million Documents

Title: CIGNA WELL-BEING NEWSLETTER - FUTURE STRATEGY

Organization Authors: PMUSA, PHILIP MORRIS USA

Person Authors: HALLE, L

Document Date: 19970530

Document Type: MEMO, MEMORANDUM

Bates Number: 2078039376/9377

Page Count: 2

Collection: Philip Morris

Philip Moxx's. U.S.A. x.dr~am~c. cvrrespoaa.aaBenffrts Departmext Rieh>pwna, Yfe&iaTa: Dishlbutfon Data aday 90,1997.From: Lisa FisllaSabj.csr CIGNA WeWedng Newsbttsr -Yntsre StratsUDuring our last CIGNA Aatfoa Plan meadng, tlu iasuo of wLetSae to i0op per'Irw+ngartieles aod discontinue mndia6 CIGNA Well-Being aawslener to om employees was amsiter of disanision . I Imvm done somme reaearc>>, and wanted to pruedt you with mySadings and pcdiminary recwmmeadatioa for PM's atratezy Ieprding l4aas aewelattee* .I believe .vayone'a input is valusble, and would epproolate hoarlng fmaa aaeh of you onwhetlne you concur with my reeommendatioa…

Scanned OCR Metadata

Page 13: Interactive Task of the TREC Legal Track: Theory meets Practice

Relevance Assessment

• Volunteer assessors
  – Mostly from 13 law schools

• Web-based assessment system
  – Based on document images + metadata

Page 14: Interactive Task of the TREC Legal Track: Theory meets Practice

Estimating Retrieval Effectiveness

[Figure labels: “% relevant in this region”, shown for two regions with the values 67, 64 and 33, 31]
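The slide does not spell out how the per-region percentages are turned into effectiveness figures. As a rough illustration only, the sketch below shows one generic way a sample-based estimate could work: assess a random sample in each region, scale the sampled relevance rate up to the region's size, and combine the regions. The `Region` class, the function name, and every number here are hypothetical and are not taken from the track's actual estimator or results.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One region (stratum) of the collection. All fields are illustrative."""
    size: int                # documents in the region
    sampled: int             # documents drawn at random and assessed
    relevant_in_sample: int  # sampled documents judged relevant
    retrieved: bool          # True if the run said YES for this region

    def estimated_relevant(self) -> float:
        # Scale the sampled relevance rate up to the whole region.
        if self.sampled <= 0:
            raise ValueError("each region needs at least one sampled document")
        return self.size * self.relevant_in_sample / self.sampled


def estimate_precision_recall(regions):
    """Estimate precision and recall from per-region samples."""
    est_rel_ret = sum(r.estimated_relevant() for r in regions if r.retrieved)
    est_rel = sum(r.estimated_relevant() for r in regions)
    ret = sum(r.size for r in regions if r.retrieved)
    precision = est_rel_ret / ret if ret else 0.0
    recall = est_rel_ret / est_rel if est_rel else 0.0
    return precision, recall


# Made-up example: a small retrieved region and a large unretrieved one.
regions = [
    Region(size=1000, sampled=100, relevant_in_sample=65, retrieved=True),
    Region(size=9000, sampled=300, relevant_in_sample=30, retrieved=False),
]
precision, recall = estimate_precision_recall(regions)
print(f"estimated precision = {precision:.2f}, estimated recall = {recall:.2f}")
```

Sampling the unretrieved region is what makes a recall estimate possible at all: without it, the relevant documents a run missed would never be counted.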

Page 15: Interactive Task of the TREC Legal Track: Theory meets Practice

Everyone Gets High Precision

High OCR-accuracy documents only

Precision = RelRet / Ret

Recall = RelRet / Rel

Rel Ret
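The two ratios on this slide can be sketched in a few lines, assuming full knowledge of which documents are relevant (Rel) and which a run returned (Ret). The function name and document identifiers below are hypothetical and chosen only for illustration.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for two collections of document IDs."""
    rel = set(relevant)
    ret = set(retrieved)
    rel_ret = rel & ret  # relevant documents that were also retrieved
    precision = len(rel_ret) / len(ret) if ret else 0.0
    recall = len(rel_ret) / len(rel) if rel else 0.0
    return precision, recall


# Made-up example: 4 relevant documents, 3 retrieved, 2 in common.
rel_docs = {"d1", "d2", "d3", "d4"}
ret_docs = {"d2", "d3", "d5"}
p, r = precision_recall(rel_docs, ret_docs)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

A run can score high precision while still missing many relevant documents, which is why the two measures are reported separately.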

Page 16: Interactive Task of the TREC Legal Track: Theory meets Practice

Interaction Time Effect

All documents

Page 17: Interactive Task of the TREC Legal Track: Theory meets Practice

Takeaway Messages

• Leverage guided interactive refinement
  – Factor of two in comprehensiveness

• Vibrant research community
  – 22 research teams in 7 countries

• Unique test collection
  – Sampling for “recall-oriented” evaluation

Page 18: Interactive Task of the TREC Legal Track: Theory meets Practice

Some Useful References

• TREC Legal Track
  – http://trec-legal.umiacs.umd.edu
  – Papers at http://trec.nist.gov
  – Mailing list (contact [email protected])

• DESI-3 Workshop on “Global E-Discovery and E-Disclosure”
  – June 8, 2009 in Barcelona
  – http://www.law.pitt.edu/DESI3_Workshop