EDI 2009 Case Law Update
Source: Georgetown University Law Center, Office of Continuing Legal Education (posted 07-Dec-2014)
…The ever-increasing volume of ESI is a problem
in a world of limited tools and resources…
Searching the Haystack…
to find all the relevant needles…
ends up like searching in a maze…
Email is still the 800 lb. gorilla of ediscovery
To do “search” right, one has to know where relevant ESI may be found in the many branches of an organization
Is this still ‘best practice’ or have we moved beyond Boolean? Example of Boolean search string from U.S. v. Philip Morris, circa 2003
(((master settlement agreement OR msa) AND NOT (medical savings account OR metropolitan standard area)) OR s. 1415 OR (ets AND NOT educational testing service) OR (liggett AND NOT sharon a. liggett) OR atco OR lorillard OR (pmi AND NOT presidential management intern) OR pm usa OR rjr OR (b&w AND NOT photo*) OR phillip morris OR batco OR ftc test method OR star scientific OR vector group OR joe camel OR (marlboro AND NOT upper marlboro)) AND NOT (tobacco* OR cigarette* OR smoking OR tar OR nicotine OR smokeless OR synar amendment OR philip morris OR r.j. reynolds OR ("brown and williamson") OR ("brown & williamson") OR bat industries OR liggett group)
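Mechanically, a Boolean string like this is just set logic applied to document text. A minimal sketch, assuming substring matching over lowercased text; the clause, terms, and sample documents below are simplified illustrations, not the actual Philip Morris query or collection:

```python
# A minimal sketch of Boolean retrieval: a document matches if it contains
# at least one "include" phrase and none of the "exclude" phrases
# (i.e., X AND NOT Y). All terms and documents are illustrative.

def matches(doc_text, include_any, exclude_any):
    """True if the document satisfies (any include term) AND NOT (any exclude term)."""
    text = doc_text.lower()
    hit = any(term in text for term in include_any)
    blocked = any(term in text for term in exclude_any)
    return hit and not blocked

# Simplified fragment of the query above:
# ((master settlement agreement OR msa) AND NOT (medical savings account OR ...))
include_terms = ["master settlement agreement", "msa"]
exclude_terms = ["medical savings account", "metropolitan standard area"]

docs = [
    "Memo re: MSA payment schedule for participating states",
    "Your medical savings account (MSA) statement is enclosed",
    "Quarterly budget review",
]
hits = [d for d in docs if matches(d, include_terms, exclude_terms)]
```

Note how the exclusion clauses do the real work: without them, the acronym "msa" would sweep in the medical-savings-account document as a false positive.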
U.S. v. Philip Morris E-mail Winnowing Process & The Problem of Scale: (It’s Only Getting Worse Out There)
20 million records → 200,000 email hits based on keyword terms used (1%) → 100,000 relevant emails → 80,000 produced to opposing party → 20,000 placed on privilege logs
A PROBLEM: only a handful entered as exhibits at trial. A BIGGER PROBLEM: the 1% figure does not scale.
A Hypothetical
1 billion emails, 25% with attachments. Reviewed at 50 per hour by 100 people, 10 hrs per day, 7 days a week, 52 weeks a year…
54 YEARS TO COMPLETE. At $100/hr, $2 billion in cost.
Even 1% (10 million docs) … 28 weeks and $20 million in cost…
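The hypothetical's arithmetic can be checked directly; a minimal sketch reproducing the slide's figures:

```python
# Reproducing the slide's review-burden arithmetic (all figures are the
# slide's hypothetical, not real matter data).
corpus = 1_000_000_000                 # emails to review
rate = 50                              # documents reviewed per person-hour
reviewers, hrs_day, days_wk, wks_yr = 100, 10, 7, 52

person_hours = corpus / rate                              # 20,000,000 hours
hours_per_year = reviewers * hrs_day * days_wk * wks_yr   # 364,000 hours/yr
years = person_hours / hours_per_year                     # ~54 years
cost = person_hours * 100                                 # $2 billion at $100/hr

one_pct_hours = (corpus * 0.01) / rate                    # 10M docs -> 200,000 hours
weeks = one_pct_hours / (reviewers * hrs_day * days_wk)   # ~28 weeks
one_pct_cost = one_pct_hours * 100                        # $20 million
```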
Judge Grimm writing for the U.S. District Court for the District of Maryland
“[W]hile it is universally acknowledged that keyword searches are useful tools for search and retrieval of ESI, all keyword searches are not created equal; and there is a growing body of literature that highlights the risks associated with conducting an unreliable or inadequate keyword search or relying on such searches for privilege review.” Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008); see id., text accompanying nn. 9 & 10 (citing to Sedona Search Commentary & TREC Legal Track research project)
Judge Facciola writing for the U.S. District Court for the District of Columbia
“Whether search terms or ‘keywords’ will yield the information sought is a complicated question involving the interplay, at least, of the sciences of computer technology, statistics and linguistics. See George L. Paul & Jason R. Baron, Information Inflation: Can the Legal System Adapt?, 13 RICH. J.L. & TECH. 10 (2007) * * * Given this complexity, for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.” -- U.S. v. O'Keefe, 537 F. Supp. 2d 14, 24 (D.D.C. 2008).
Judge Peck writing for the U.S. District Court for the Southern District of New York
William A. Gross Construction Associates Inc. v.
American Manufacturers Mutual Ins. Co., 2009 WL 724954 (S.D.N.Y. March 19, 2009) (in multi-million dollar dispute, where issue involved production of 3rd party emails where none of the three parties could agree on keyword search terms, court fashioned a compromise; court stated at the outset that “this Opinion should serve as a wake-up call to the Bar … about the need for careful thought, quality control, testing and cooperation with opposing counsel in designing search terms or ‘keywords’ to be used.”)
Judge Maas writing for the US District Court for the Southern District of New York in Capitol Records, Inc. v. MP3Tunes, LLC,
2009 WL 2568431 (Aug. 13, 2009)
“. . .[R]ather than sitting down with the Plaintiffs’ counsel to agree on search parameters and terms, MP3tunes’ counsel directed its client to conduct a search of MP3tunes’ emails . . . using the word ‘design’ as the only search term. Remarkably, when I questioned the wisdom of that decision . . . MP3tunes’ attorney suggested that he actually considered this one-word search to be ‘overly broad.’ After I observed that MP3tunes’ unilateral decision regarding its search reflected a failure to heed Magistrate Judge Andrew Peck’s recent ‘wake-up-call’ regarding the need for cooperation concerning e-discovery . . . Counsel apologized for not having also used the word ‘development’ as a search term.”
Beyond Keywords: What Alternative Search Methods Are We Beginning to Encounter in Current Litigation?
Greater use made of Boolean strings
Fuzzy search models
Probabilistic models (Bayesian)
Statistical methods (clustering)
Machine learning approaches to semantic representation
Categorization tools: taxonomies and ontologies
Social network analysis
Hybrid and fusion approaches
Reference: Appendix to The Sedona Conference® Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery (2007), available at http://www.thesedonaconference.org (link to publications)
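As one illustration of the "probabilistic (Bayesian)" entry on the list, a toy Naive Bayes relevance scorer can be sketched in a few lines; the training snippets and labels below are invented for illustration and bear no relation to any real review set:

```python
# A minimal Naive Bayes relevance scorer: documents hand-labeled as
# relevant ("rel") or irrelevant ("irr") train word counts, and new
# documents are scored by log-odds of relevance with add-one smoothing.
from collections import Counter
import math

def train(labeled_docs):
    counts = {"rel": Counter(), "irr": Counter()}
    for text, label in labeled_docs:
        counts[label].update(text.lower().split())
    return counts

def score(counts, text):
    """Log-odds that the document is relevant; positive means 'more likely relevant'."""
    vocab = set(counts["rel"]) | set(counts["irr"])
    rel_total = sum(counts["rel"].values())
    irr_total = sum(counts["irr"].values())
    odds = 0.0
    for w in text.lower().split():
        p_rel = (counts["rel"][w] + 1) / (rel_total + len(vocab))
        p_irr = (counts["irr"][w] + 1) / (irr_total + len(vocab))
        odds += math.log(p_rel / p_irr)
    return odds

model = train([
    ("settlement payment tobacco states", "rel"),
    ("settlement escrow tobacco", "rel"),
    ("lunch menu parking garage", "irr"),
])
```

Unlike a keyword hit/miss, the scorer ranks documents by how strongly their vocabulary resembles the documents reviewers already marked relevant.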
[Word cloud of synonyms for SEARCH: inspect, forage, looking, look for, investigate, explore, lookup, seek, examine, hunt, hunting]
Thesaurus c/o Herb Roitblat
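The thesaurus idea translates directly into query expansion: a seed term is replaced by the set of its synonyms before matching. A minimal sketch, with an invented mini-thesaurus standing in for a real one:

```python
# Thesaurus-based query expansion: expand a seed term to its synonyms,
# then match a document if any synonym appears. The mini-thesaurus is
# illustrative, not a real lexical resource.
THESAURUS = {
    "search": ["search", "seek", "hunt", "look for", "examine",
               "investigate", "explore", "inspect", "lookup", "forage"],
}

def expand(term):
    """Return the synonym set for a term, or the term itself if unknown."""
    return THESAURUS.get(term.lower(), [term.lower()])

def match(doc, term):
    text = doc.lower()
    return any(syn in text for syn in expand(term))

docs = ["Please examine the attached files", "Budget review for Q3"]
found = [d for d in docs if match(d, "search")]
```

The first document contains no form of the word "search," yet expansion through "examine" retrieves it, which is exactly the gap a bare keyword query leaves open.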
[Droid Taxonomy example: droids divided into humanoid and nonhumanoid droids; the R-series droids include R2-D2 (astromech) and R5-D4 (astromech); other examples: IG-88 (assassin), C-3PO (protocol), 2-1B (surgical), T3-M4 (utility)]
C/o Herb Roitblat, http://starwars.wikia.com
What is TREC?
Conference series co-sponsored by the National Institute of Standards and Technology (NIST) and the Advanced Research and Development Activity (ARDA) of the Department of Defense
Designed to promote research into the science of information retrieval
First TREC conference was in 1992; 15th conference held November 15-17, 2006, in Gaithersburg, Maryland (NIST headquarters)
TREC Legal Track
The TREC Legal Track was designed to evaluate the effectiveness of search technologies in a real-world legal context
First-of-a-kind study using nonproprietary data since the Blair/Maron research in 1985
Hypothetical complaints and 100+ “requests to produce” drafted by members of The Sedona Conference®
“Boolean negotiations” conducted as a baseline for search efforts
New Interactive Task added in 2008 using Topic Authorities and a post-adjudication round
Documents to be searched were drawn from a publicly available 7 million document tobacco litigation Master Settlement Agreement database
In 2009, a second Enron data set was added as a separate task
Participating teams of information scientists from around the world contributed computer runs, joined in 2008 and 2009 by legal service providers (12 in 2009)
RequestNumber: 52
RequestText: Please produce any and all documents that discuss the use or introduction of high-phosphate fertilizers (HPF) for the specific purpose of boosting crop yield in commercial agriculture.
Proposal: "high-phosphate fertilizer!" AND (boost! w/5 "crop yield") AND (commercial w/5 agricultur!)
Rejoinder: (phosphat! OR hpf OR phosphorus OR fertiliz!) AND (yield! OR output OR produc! OR crop OR crops)
FinalQuery: (("high-phosphat! fertiliz!" OR hpf) OR ((phosphat! OR phosphorus) w/15 (fertiliz! OR soil))) AND (boost! OR increas! OR rais! OR augment! OR affect! OR effect! OR multipl! OR doubl! OR tripl! OR high! OR greater) AND (yield! OR output OR produc! OR crop OR crops)
B: 3078
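The query syntax above uses two operators worth unpacking: "!" truncation (fertiliz! matches fertilizer, fertilizing, and so on) and "w/N" proximity (two terms within N words of each other). A minimal sketch of both, under a deliberately simplified tokenizer rather than a full concordance-style query engine:

```python
# A sketch of "!" truncation and "w/N" proximity matching over word
# positions. Tokenization here is simplified (lowercase words, hyphens
# kept), so this is an illustration of the operators, not a query engine.
import re

def tokens(text):
    return re.findall(r"[a-z0-9\-]+", text.lower())

def term_positions(words, term):
    """Word positions matching a term; a trailing '!' makes it a prefix wildcard."""
    if term.endswith("!"):
        stem = term[:-1]
        return [i for i, w in enumerate(words) if w.startswith(stem)]
    return [i for i, w in enumerate(words) if w == term]

def within(words, term_a, term_b, n):
    """True if some occurrence of term_a is within n words of term_b (w/N)."""
    pa = term_positions(words, term_a)
    pb = term_positions(words, term_b)
    return any(abs(i - j) <= n for i in pa for j in pb)

doc = tokens("High-phosphate fertilizers can boost the crop yield substantially")
```

With this document, `within(doc, "boost!", "yield!", 5)` holds, so the Proposal's `boost! w/5 "crop yield"` clause would fire; tightening the window changes the answer, which is why parties fight over N.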
TREC Legal Track: Topics
Beyond Boolean: getting at the “dark matter”
(i.e., relevant documents not found by keyword searches alone)
[Chart: “Nobody Finds Everything” — known relevant documents (0-500) per topic, comparing Boolean, Expert Searcher, and TREC Systems Only. Source: TREC 2006 Legal Track]
[Chart: Recall at B (0%-100%) per topic. Source: TREC 2007 Legal Track]
78% of relevant documents were only found by some other technique
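Recall, the measure behind these TREC results, is the fraction of all relevant documents that a search actually retrieves; "recall at B" caps the retrieved set at the first B results. A minimal sketch, with invented document IDs:

```python
# "Recall at B": of all relevant documents, what fraction appears in the
# first B results? Document IDs below are invented for illustration.
def recall_at_b(ranked_ids, relevant_ids, b):
    retrieved = set(ranked_ids[:b])
    return len(retrieved & set(relevant_ids)) / len(relevant_ids)

ranked = ["d7", "d2", "d9", "d4", "d1", "d5"]   # system's ranked output
relevant = {"d2", "d4", "d8", "d1"}             # assessor-judged relevant set
r = recall_at_b(ranked, relevant, 4)            # d2 and d4 found among top 4
```

A Boolean query that retrieves only 22% of the relevant set has recall 0.22; the other 78% is the "dark matter" the slides describe.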
Boolean v. TREC Systems: Results of Legal Track Years 1 and 2
[Bar chart: share of total relevant documents found, Boolean vs. TREC systems, by year. Year 1: 53% / 47%; Year 2: 22% (Boolean) / 78%]
Improving Search Effectiveness Through Relevance Feedback and Multiple Meet and Confers
Source: F.C. Zhao, D.W. Oard, and J.R. Baron, “Improving Search Effectiveness in the Legal E-Discovery Process Using Relevance Feedback” (forthcoming 2009)
[Diagram: 1st Meet and Confer → Second Meet and Confer]
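Relevance feedback of the kind studied in the cited paper can be illustrated with the classic Rocchio update, which nudges a query vector toward documents a reviewer marked relevant and away from those marked nonrelevant. The vectors, vocabulary, and weights below are conventional toy values for illustration, not the paper's method or data:

```python
# Rocchio relevance feedback sketch: new_query = alpha*q + beta*(mean of
# relevant doc vectors) - gamma*(mean of nonrelevant doc vectors).
# Weights are conventional textbook defaults.
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    dims = len(query)
    def centroid(docs):
        return [sum(d[i] for d in docs) / len(docs) for i in range(dims)]
    r, n = centroid(relevant), centroid(nonrelevant)
    return [alpha * query[i] + beta * r[i] - gamma * n[i] for i in range(dims)]

# Toy term weights over a 3-word vocabulary: [settlement, tobacco, lunch]
q = [1.0, 0.0, 0.0]                      # initial query: just "settlement"
rel = [[1.0, 1.0, 0.0], [0.5, 1.0, 0.0]]  # reviewer-marked relevant docs
nonrel = [[0.0, 0.0, 1.0]]               # reviewer-marked nonrelevant doc
q2 = rocchio(q, rel, nonrel)
```

After one round the query has learned a positive weight for "tobacco" (which the reviewer's relevant documents share) and a negative weight for "lunch", mirroring how each meet-and-confer round refines the parties' search.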
Interdisciplinary Approaches--
Three Languages: Legal, Records Management (RM), and IT
Strategic challenges in our collective futures . . . .
Convincing lawyers and judges that automated searches are not just desirable but necessary in response to large e-discovery demands.
Challenges (cont.)
Designing an overall review process which maximizes the potential to find responsive documents in a large data collection (no matter which search tool is used), and using sampling and other analytic techniques to test hypotheses early on.
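The sampling technique mentioned above can be sketched as a simple richness estimate: draw a random sample from the collection, count the responsive documents, and attach a confidence interval. The collection size, sample size, and responsiveness rate below are invented, and the interval uses the standard normal approximation:

```python
# Estimating collection "richness" (share of responsive documents) from a
# random sample, with a normal-approximation 95% confidence interval.
# All figures are invented for illustration.
import math
import random

random.seed(7)
# Simulated collection: ~8% of 100,000 documents are responsive.
collection = ["responsive" if random.random() < 0.08 else "other"
              for _ in range(100_000)]

sample = random.sample(collection, 1_500)
p = sample.count("responsive") / len(sample)
margin = 1.96 * math.sqrt(p * (1 - p) / len(sample))   # 95% CI half-width
estimate = (p - margin, p + margin)
```

A 1,500-document sample pins the richness of a 100,000-document collection to within a couple of percentage points, which is what makes early hypothesis-testing affordable.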
Challenges (cont.)
Having all parties and adjudicators understand that the use of automated methods does not guarantee all responsive documents will be identified in a large data collection.
Challenges (cont.)
Being open to using new and evolving search and information retrieval methods and tools.
Overarching Smart E-Discovery Strategy
ESPECIALLY FOR eDISCOVERY LITIGATOR-WARRIORS…
EMBRACING COLLABORATION WITH ADVERSARIES (TRANSPARENCY)
See The Sedona Conference Cooperation Proclamation, www.thesedonaconference.org
The leading rule for the lawyer, as for the man, of every calling, is diligence.
-- Abraham Lincoln
Ongoing Research & Reference
TREC 2009 Legal Track / TREC 2010 Legal Track: http://trec-legal.umiacs.umd.edu/
ICAIL 2009 Barcelona DESI III Workshop -- June 8, 2009 http://www.law.pitt.edu/DESI3_Workshop/
Workshop-in-Planning -- October/November 2010, San Francisco
The Sedona Conference® Search & Retrieval Commentary (2007) &The Sedona Conference® Commentary on Achieving Quality in E-Discovery (2009) (both available at www.thesedonaconference.org)
Jason R. Baron, Director of Litigation, Office of General Counsel
National Archives and Records Administration
8601 Adelphi Road # 3110, College Park, MD 20740
(301) 837-1499
Email: [email protected]
Best Practices in Keyword Searching (Maura Grossman)
Start with the Complaint: who are the custodians? What is the applicable time frame? What terms-of-art are employed?
Translate the request into plain English
Involve multiple people to get differing interpretations of the requests and potential keywords from different vantage points
Best Practices in Keyword Searching (cont.)
Next: seek input from people who actually created, sent or received the documents
Look at responsive documents for unique words or phrases. In what context do those terms appear?
Incorporate common misspellings, errors, variants and synonyms (utilize tools on web for this task)
Determine irrelevant file types
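The misspelling-and-variant step above can be partly mechanized. A minimal sketch generating single-character deletions and adjacent transpositions of a keyword, as a cheap stand-in for the web-based tools the slide mentions:

```python
# Generate mechanical misspelling variants of a keyword: every
# single-character deletion and every adjacent-character transposition.
# A crude stand-in for dedicated misspelling/variant tools.
def variants(term):
    deletions = {term[:i] + term[i + 1:] for i in range(len(term))}
    swaps = {term[:i] + term[i + 1] + term[i] + term[i + 2:]
             for i in range(len(term) - 1)}
    return sorted((deletions | swaps) - {term})

v = variants("liggett")   # e.g. the party name from the query above
```

Feeding such variants back into the keyword list catches typists who dropped a letter ("ligett") or swapped two ("ilggett"), at the cost of some extra false hits that should then be sampled and tested.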
Best Practices in Keyword Searching (cont.)
Special search strategies for handwritten docs, drawings, facsimiles, password-protected or encrypted files
Learn capabilities and limitations of your search tool
Take a representative sample and test, test, test: from both the Hits and the Misses piles. Retest.
Best Practices in Keyword Searching (cont.)
Keep track of and document what you did, to explain the rationale for the process or method applied
Collaborate and communicate in good faith
(See The Sedona Cooperation Proclamation, available at www.thesedonaconference.org)