When to stop reviewing documents in eDiscovery cases
The Lit i View Quality Monitor and Endpoint Detector
ⒸUBIC, Inc. 2013 All Rights Reserved.
Jakob Halskov, Hideki Takeda UBIC Inc., Technology Dept.
MEDES/ACM 2013
Luxembourg, October 30th 2013
Tokyo | Osaka | Nagoya | Seoul | Taipei | Hong Kong | Silicon Valley | Washington DC | New York | London
Outline of talk
• Introduction: Redefining Big Data!
• The Discovery system
• UBIC’s Legal Cloud & Lit i View SaaS
• Outline of Predictive Coding technology
• Impact of Predictive Coding: case study
• Estimating sample size & HOT ratio
• Demo of UBIC’s Quality Monitor and Endpoint Detector
Behavior Informatics
We need new approaches for analyzing Human Thought and Behavior
UBIC redefines Big Data
Big Data is a Universe of Human Thought and Behavior
What is Behavior Informatics?
• Informatics
– Statistics, Mathematics
– Data Mining, Text Mining
– Speech Technology
• Behavioral Science
– Criminology
– Sociology
– Psychology
• Discover Risk and Discover Knowledge More Effectively and Efficiently
Applications of Behavior Informatics
• Discover Risk for Company
– Legal Intelligence (eDiscovery)
– Digital Forensics
• Discover Knowledge
– Business Intelligence
– Medicine
• Discover Risk for Community
– Intelligence (Security Support)
• M&A
The Discovery system
• Data protection and privacy laws in the US are lax
– Categorical document requests (virtually all types of ESI are discoverable)
• Being forced to give information to a competitor/government is RISKY
– Narrow down the amount of information released
– UBIC makes this process as painless as possible while ensuring defensibility
• At the “Meet and Confer” the opposing parties will agree to a Discovery Plan (aka “protocol”)
– Defining the scope of responsive (relevant) data
– Scope of Accessibility & cost shifting (who is paying?)
– Defining privileged data (exempt from production)
– Setting performance goals and deadlines for production
• Recall and defensibility are key under this system
• Famous cases
– ENRON (TREC Legal track)
– Global Aerospace (Dec 2012, Predictive Coding becomes mainstream in eDiscovery, judge sets minimum recall rate at 75%)
The nine phases of eDiscovery
Review typically accounts for 70% of the total cost of an eDiscovery case.
UBIC’s Lit i View software
• Lit i View (Cloud-based SaaS)
– Used in more than 275 cross-border litigation cases, including
• Plaintiff = private
– Intellectual Property (patent infringement)
– Product Liability, …
• Plaintiff = government
– Anti-trust regulations (cartels)
– Covers virtually all phases of the EDRM
• Custodian identification and management (“Central Linkage”)
• Legal hold management (“Easy hold”)
• Collection & preservation
• Processing (+CJK support, encoding/segmentation etc.)
• Analysis & Review (Predictive Coding)
UBIC Legal Cloud overview
Customer benefit: Most data can stay local (in Asia or the US) for the duration of the case
Outline of Predictive Coding
What:
Supervised machine learning algorithm assigning Relevance Scores to documents
Why:
– Improve quality/consistency of Review
– Save time
– Optimize sampling strategy (control costs)
Flexible & iterative design
– Random sample extraction
– Feature selection
• Morphological analysis (+CJK)
• Statistical analysis
– Feature (re)weighting
• Association measures
– Document (re)scoring
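The iterative design above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not UBIC's implementation: whitespace tokenization stands in for morphological analysis, a smoothed log-ratio of document frequencies stands in for the unspecified association measure, and the function names (`tokenize`, `draw_sample`, `feature_weights`, `relevance_score`) and toy documents are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace split as a stand-in for morphological analysis.
    return text.lower().split()


def draw_sample(doc_ids, sample_size, seed=0):
    # Random sample extraction: the documents a human reviewer will tag.
    return random.Random(seed).sample(doc_ids, sample_size)


def feature_weights(tagged_docs, min_count=2):
    """Weight terms by a smoothed log-ratio of HOT vs. non-HOT document frequency.

    tagged_docs: list of (text, is_hot) pairs from the reviewed sample.
    """
    hot_df, cold_df = Counter(), Counter()
    n_hot = n_cold = 0
    for text, is_hot in tagged_docs:
        terms = set(tokenize(text))
        if is_hot:
            hot_df.update(terms)
            n_hot += 1
        else:
            cold_df.update(terms)
            n_cold += 1
    weights = {}
    for term in set(hot_df) | set(cold_df):
        if hot_df[term] + cold_df[term] < min_count:
            continue  # feature selection: drop terms seen in too few documents
        p_hot = (hot_df[term] + 0.5) / (n_hot + 1.0)
        p_cold = (cold_df[term] + 0.5) / (n_cold + 1.0)
        weights[term] = math.log(p_hot / p_cold)  # feature weighting
    return weights


def relevance_score(text, weights):
    # Document scoring: mean weight of the selected terms found in the document.
    known = [weights[t] for t in set(tokenize(text)) if t in weights]
    return sum(known) / len(known) if known else 0.0


# Toy reviewed sample (hypothetical data).
tagged = [
    ("price fixing meeting with competitor", True),
    ("agree on price with competitor", True),
    ("lunch menu for friday", False),
    ("friday team lunch", False),
]
weights = feature_weights(tagged)
for doc in ("competitor price list", "lunch on friday"):
    print(doc, round(relevance_score(doc, weights), 3))
```

Re-running `feature_weights` after each new batch of tagged documents gives the (re)weighting and (re)scoring loop; a production system would likely use a richer feature set and a calibrated score.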
Impact of Predictive Coding: case study
• Japanese manufacturer, US law firm
• Over 1 million documents
• Predictive Coding (PC) carried out twice
• Review costs were reduced by 40% vs. conventional human review
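Estimating minimum sample size, $n_s$
The error level, $\Delta p$, for the predictor $p = N_{HOT}/N$ is given by:
$$\Delta p = \gamma \sqrt{\frac{N - n_s}{N - 1} \cdot \frac{p(1 - p)}{n_s}}$$
$$\Leftrightarrow \quad n_s = \frac{\gamma^2}{\Delta p^2} \cdot \frac{1}{\frac{N - 1}{N} \cdot \frac{1}{p(1 - p)} + \frac{\gamma^2}{N \Delta p^2}}$$
When $N$ is much greater than $n_s$: $\frac{N - n_s}{N - 1} \to 1$, and thus:
$$n_s \approx \frac{\gamma^2}{\Delta p^2} \, p(1 - p)$$
Unfortunately, we do not know $p$ (as $N_{HOT}$ is unknown)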
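As a rough sketch, both sample-size formulas can be evaluated directly. Because p is unknown, the code takes it as a parameter and loops over a few illustrative values; p(1 − p) peaks at p = 0.5, so that value gives the largest required sample. The function names are hypothetical, and the default γ = 1.96 corresponds to a 95% confidence level.

```python
def min_sample_size(N, delta_p, p, gamma=1.96):
    """Minimum sample size n_s, including the finite population correction."""
    variance = p * (1.0 - p)
    numerator = gamma ** 2 * variance * N
    denominator = delta_p ** 2 * (N - 1) + gamma ** 2 * variance
    return numerator / denominator


def min_sample_size_large_n(delta_p, p, gamma=1.96):
    """Approximation of n_s for N much greater than n_s."""
    return gamma ** 2 * p * (1.0 - p) / delta_p ** 2


# Illustrative values only: N = 10,000 documents, error level 0.01.
for p in (0.1, 0.3, 0.5):
    exact = min_sample_size(10_000, 0.01, p)
    approx = min_sample_size_large_n(0.01, p)
    print(f"p={p:.1f}  n_s={exact:.0f}  n_s (N >> n_s)={approx:.0f}")
```

The first function is the exact expression solved for n_s, rearranged so that numerator and denominator share the factor N Δp² p(1 − p); the second drops the population term.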
Estimating $N_{HOT}$ and minimum $n_s$
[Figure: plot of $f(p) = p(1 - p)$ for $p$ between 0 and 1]
With $p = 0.5$ as the worst case (giving the highest sample size), we get
$$n_s \approx \frac{1}{4} \cdot \frac{\gamma^2}{\Delta p^2}$$
Using a confidence interval of 95%, the confidence coefficient ($\gamma$) is 1.96, and we can now compute the minimum sample sizes for various $N$, for example setting the error level at 0.01.

| N | Conf. Level = 95%: ns | Conf. Level = 95%: ns (ns << N) | Conf. Level = 99%: ns | Conf. Level = 99%: ns (ns << N) |
|---|---|---|---|---|
| 10,000 | 4,899 | 9,604 | 6,247 | 16,641 |
| 100,000 | 8,057 | 9,604 | 14,267 | 16,641 |
| 1,000,000 | 9,513 | 9,604 | 16,369 | 16,641 |
| 5,000,000 | 9,586 | 9,604 | 16,586 | 16,641 |

$$N_{HOT}^{est} = N \left( p_{TAG} \pm \Delta p \right)$$
$$\Delta p = \gamma \sqrt{\frac{N - n_s}{N - 1} \cdot \frac{p_{TAG}(1 - p_{TAG})}{n_s}}$$
$$p_{TAG} = \frac{n_{TAG}}{n_s}$$
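The estimate of the number of HOT documents can be sketched as follows, assuming a random sample of n_s documents of which n_TAG were tagged HOT. The sample counts below are made up for illustration. The `recall_range` helper is a hypothetical addition: it shows one way the interval could feed a stopping check against a recall target, and is not a description of how the Endpoint Detector works.

```python
import math


def estimate_n_hot(N, n_s, n_tag, gamma=1.96):
    """Return (low, point, high) estimates of the number of HOT documents."""
    p_tag = n_tag / n_s
    fpc = (N - n_s) / (N - 1)  # finite population correction
    delta_p = gamma * math.sqrt(fpc * p_tag * (1.0 - p_tag) / n_s)
    low = max(0.0, N * (p_tag - delta_p))
    high = min(float(N), N * (p_tag + delta_p))
    return low, N * p_tag, high


def recall_range(hot_found, low, high):
    """Hypothetical helper: recall bounds implied by the N_HOT interval."""
    worst = min(1.0, hot_found / high) if high > 0 else 1.0
    best = min(1.0, hot_found / low) if low > 0 else 1.0
    return worst, best


# Illustrative numbers only: 1,000,000 documents, 9,513 sampled, 951 tagged HOT.
low, point, high = estimate_n_hot(1_000_000, 9_513, 951)
print(f"N_HOT is about {point:.0f} (between {low:.0f} and {high:.0f})")

# Assumption: 80,000 HOT documents have been found by the review so far.
worst, best = recall_range(80_000, low, high)
print(f"recall between {worst:.3f} and {best:.3f}")
```

Comparing the worst-case recall with an agreed minimum (for example a 75% target) is one conservative way to decide whether review can stop.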
Quality Monitor demo
Conclusion & new/future features
• UBIC’s Quality Monitor (QM) and Endpoint Detector (EPD) provide a user-friendly UI to secure a high-quality and defensible outcome of the review process
• Leveraging theory from Social Network Analysis, UBIC released “Central Linkage” on October 1st 2013
References
Diesner, Jana; Terrill L. Frantz; Kathleen M. Carley. 2005. Communication Networks from the Enron Email Corpus. “It’s Always About the People. Enron is no Different”. In Computational & Mathematical Organization Theory. 11 (3), 201-228. Kluwer, MA.
Halskov, Jakob. 2013. Augmenting Predictive Analytics for eDiscovery with Richer Linguistic Features. Poster presentation at Asian Summer School in Information Access, ASSIA (Research Center for Knowledge Communities, Tsukuba University, Japan, June 22-24). http://www.kc.tsukuba.ac.jp/assia2013/poster_presentation
Oard, Douglas W.; Jason R. Baron; Bruce Hedin; David D. Lewis; Stephen Tomlinson. 2010. Evaluation of information retrieval for E-discovery. In Artificial Intelligence and Law, 18 (4), 347-386. Springer, Amsterdam.
Takeda, Hideki. 2013. Trend on Digital Forensic Technologies and Business in Japan. Keynote Speech. In Proceedings of the 5th IEEE International Workshop on Computer Forensics in Software Engineering (Kyoto, Japan, July 22-26). IEEE Computer Society Press, CA.
Webber, William. 2011. Re-examining the Effectiveness of Manual Review. In Proceedings of SIGIR 2011 Information Retrieval for E-Discovery Workshop (Beijing, China, July 28). http://www.umiacs.umd.edu/~oard/sire11/