![Page 1: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/1.jpg)
Statistical Validation And Data Analytics In eDiscovery
Geoff Black
Director, High Tech Investigations
Prudential
The views expressed in this presentation are solely those of the presenter and do not necessarily reflect the views of the presenter’s employer.
![Page 2: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/2.jpg)
Recommended Reading
![Page 3: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/3.jpg)
Why do we need Statistics? (Ensuring Quality in eDiscovery)
Professional standards
Savvy judges already require sampling
Defensibility
![Page 4: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/4.jpg)
Types of Sampling
Judgmental
Statistical*
![Page 5: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/5.jpg)
A Recent Experience with Sampling
Setting the stage
![Page 6: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/6.jpg)
A Recent Experience with Sampling: The Challenge
Select appropriate filters for a large data set
Audit reviewers without double reviewing everything
Test our processing tools
Accomplish all of these with a high confidence level and a narrow confidence interval (low margin of error)
![Page 7: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/7.jpg)
Statistics for eDiscovery: Confidence Interval
The “confidence interval” or margin of error
How closely our results will reflect the general population
Lower is better
![Page 8: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/8.jpg)
Statistics for eDiscovery: Confidence Interval Example
We have 100 documents and our confidence interval is ± 2%.
Testing shows 10% responsiveness
General population should show between 8% and 12% responsiveness, or 8 to 12 documents.
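The arithmetic in this example can be sketched in a few lines of Python. The function name `responsive_range` and its parameters are hypothetical and chosen only for illustration; the numbers are the ones from the slide.

```python
def responsive_range(population, observed_rate, margin):
    """Return the (low, high) document counts implied by an observed
    responsiveness rate and a +/- margin of error.

    Hypothetical helper: the name and signature are assumptions.
    """
    # Clamp the rates so the range never leaves 0%..100%.
    low_rate = max(0.0, observed_rate - margin)
    high_rate = min(1.0, observed_rate + margin)
    return round(low_rate * population), round(high_rate * population)


# Slide example: 100 documents, 10% responsive in testing, +/- 2% margin.
low, high = responsive_range(population=100, observed_rate=0.10, margin=0.02)
print(f"Expect between {low} and {high} responsive documents")
```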
![Page 9: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/9.jpg)
Statistics for eDiscovery: Confidence Level
The “confidence level”
Does our sample accurately represent the results of the general population?
Higher is better
![Page 10: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/10.jpg)
Statistics for eDiscovery: Sample Sizes for Population of 1,000,000
[Chart: required sample size (vertical axis, 0 to 4,500) by margin of error (± 10%, ± 5%, ± 2%), with one series each for the 99%, 95%, and 90% confidence levels]
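A minimal sketch of how sample sizes like these might be computed. The slides do not say which formula produced the chart; this assumes the common approach of Cochran's formula with a finite population correction and the most conservative proportion (p = 0.5). The function name `sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence, margin, p=0.5):
    """Estimate the sample size needed for a given confidence level and
    margin of error (assumed method: Cochran's formula plus a finite
    population correction; p=0.5 is the worst-case proportion)."""
    # Two-sided z-score for the confidence level (e.g. about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it slightly to account for the finite population.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Same grid as the chart: three confidence levels, three margins of error.
for confidence in (0.90, 0.95, 0.99):
    sizes = [sample_size(1_000_000, confidence, m) for m in (0.10, 0.05, 0.02)]
    print(f"{confidence:.0%} confidence: {sizes}")
```

Tightening the margin of error drives the sample size far more than raising the confidence level, because the margin enters the formula squared.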
![Page 11: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/11.jpg)
[Scaling] Statistics for eDiscovery
[Chart: Sample Sizes at 99% Confidence ± 2%. Horizontal axis: population size (10,000; 100,000; 1,000,000; 10,000,000). Vertical axis: sample size (2,800 to 4,400)]
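Why the required sample barely grows with population size can be seen from the formula. The slides do not state the formula behind the chart; the following assumes the standard sample-size calculation with a finite population correction, where $z$ is the z-score for the confidence level, $p$ the assumed proportion (0.5 as the worst case), $e$ the margin of error, and $N$ the population size.

```latex
n_0 = \frac{z^2 \, p(1-p)}{e^2},
\qquad
n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}

% At 99% confidence (z \approx 2.576), p = 0.5, e = 0.02:
n_0 \approx \frac{(2.576)^2 \times 0.25}{(0.02)^2} \approx 4147

% As the population grows, the correction term vanishes:
\lim_{N \to \infty} n = n_0
```

Under these assumptions the sample size levels off near 4,147 no matter how large the population becomes, which is the plateau the chart shows.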
![Page 12: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/12.jpg)
A Recent Experience with Sampling: Filtering Selection
Finding a good search method is difficult
Who chooses search terms?
Requires iterative testing and validation
![Page 13: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/13.jpg)
A Recent Experience with Sampling: Validating Filters
Began with around 10,000,000 documents
A 99% confidence level with a ± 2% confidence interval dictated a sample size of 4,150 documents
Chose a random sample and searched
Reviewed all the results (positive and negative)
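The steps on this slide (draw a random sample, run the search, review both hits and non-hits) can be sketched as follows. This is an illustration only, not the tooling used in the matter: the function names, the simple substring matching, and the five toy documents are all assumptions.

```python
import random


def draw_sample(doc_ids, sample_size, seed=None):
    """Pick a simple random sample of document IDs without replacement."""
    return random.Random(seed).sample(doc_ids, sample_size)


def matches_filter(text, terms):
    """Hypothetical keyword filter: case-insensitive substring match."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def score_filter(reviewed, terms):
    """Compare filter hits against human review of the whole sample.

    reviewed: list of (text, is_responsive) pairs, where is_responsive
    is the reviewer's call on every sampled document, hit or not.
    """
    tp = fp = fn = tn = 0
    for text, is_responsive in reviewed:
        hit = matches_filter(text, terms)
        if hit and is_responsive:
            tp += 1  # search found a responsive document
        elif hit:
            fp += 1  # search returned a non-responsive document
        elif is_responsive:
            fn += 1  # search missed a responsive document
        else:
            tn += 1  # search correctly left it out
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Toy data for illustration only; not taken from the matter described.
reviewed = [
    ("Q3 pricing agreement draft attached", True),
    ("Lunch on Friday?", False),
    ("Revised pricing terms for the contract", True),
    ("Pricing for the office party catering", False),
    ("Please call me about the agreement", True),
]
result = score_filter(reviewed, ["pricing"])
print(result)
```

Reviewing the negatives matters because it is the only way to count missed responsive documents (`fn`), which a hits-only check never sees.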
![Page 14: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/14.jpg)
A Recent Experience with Sampling: Validating Filters
Results did not match expectations
Revised the list of search terms
Tested the filtering again, and…
A more accurate search with less responsive data!
![Page 15: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/15.jpg)
A Recent Experience with Sampling: Validating Filters
Wait a minute, I always test my keywords!
Not whether you test, but how much data…
![Page 16: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/16.jpg)
A Recent Experience with Sampling: Validating Review
After filtering, about 120,000 documents remained to review
Reviewers often disagree about relevance or simply don’t understand the material
Double and triple review kills budgets
Instead, sample a random set of 4,010 reviewed documents
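One way such an audit might be run is sketched below: draw the random set, have an expert re-review it, and estimate how often reviewer calls are overturned. The function names are hypothetical, and the margin uses the normal approximation for a proportion, which is an assumption rather than the method stated in the slides.

```python
import math
import random
from statistics import NormalDist


def audit_sample(reviewed_ids, sample_size, seed=None):
    """Randomly select already-reviewed documents for a second look."""
    return random.Random(seed).sample(reviewed_ids, sample_size)


def overturn_rate(decisions, confidence=0.99):
    """Estimate how often the expert disagrees with the reviewer.

    decisions: list of (reviewer_call, expert_call) booleans for the
    audited documents. Returns (rate, margin) using the normal
    approximation to the binomial (an assumed method).
    """
    n = len(decisions)
    overturned = sum(1 for reviewer, expert in decisions if reviewer != expert)
    rate = overturned / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(rate * (1 - rate) / n)
    return rate, margin


# Slide figures: about 120,000 reviewed documents, 4,010 sampled.
to_audit = audit_sample(list(range(120_000)), 4_010, seed=1)
print(f"Audit {len(to_audit)} documents instead of all 120,000")
```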
![Page 17: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/17.jpg)
A Recent Experience with Sampling: Validating Review
Subject matter expert noted a few anomalies
Re-reviewed items with the confusing term
One reviewer’s results could not be trusted
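A hedged sketch of how an outlier reviewer might surface from the audited sample: group the expert's re-review results by reviewer and compare disagreement rates. The slides do not describe how the problem reviewer was identified; the function name, record layout, and toy records below are assumptions for illustration.

```python
from collections import defaultdict


def disagreement_by_reviewer(records):
    """Return each reviewer's share of calls the expert overturned.

    records: iterable of (reviewer, reviewer_call, expert_call) tuples
    (a hypothetical layout for audited documents).
    """
    totals = defaultdict(int)
    overturned = defaultdict(int)
    for reviewer, reviewer_call, expert_call in records:
        totals[reviewer] += 1
        if reviewer_call != expert_call:
            overturned[reviewer] += 1
    return {reviewer: overturned[reviewer] / totals[reviewer]
            for reviewer in totals}


# Toy records for illustration only; not data from the matter described.
records = [
    ("reviewer_a", True, True),
    ("reviewer_a", False, False),
    ("reviewer_a", True, True),
    ("reviewer_a", False, False),
    ("reviewer_b", True, False),
    ("reviewer_b", True, False),
    ("reviewer_b", False, True),
    ("reviewer_b", False, False),
]
rates = disagreement_by_reviewer(records)
for reviewer, rate in sorted(rates.items()):
    print(f"{reviewer}: {rate:.0%} of calls overturned")
```

A reviewer whose rate stands well apart from the others is a candidate for targeted re-review rather than a full second pass over everyone's work.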
![Page 18: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/18.jpg)
A Recent Experience with Sampling: Keeping Your Vendors Honest
How do they test their tools?
How were automated tools used in your matter?
Do you know what they cannot do?
How did you use the results in your decisions?
![Page 19: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/19.jpg)
What’s Next?
Built-in iterative review with statistical sampling
Relying solely on “Concept Searching” is a black box and a dead end
Advanced search techniques must offer explanatory reasoning
![Page 20: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/20.jpg)
What does all this mean? (The Benefits of Using Statistics)
Small dataset for testing
Minimize false positives
More accurate search, reduced data volume
Defensibility of statistically validated testing
![Page 21: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/21.jpg)
One last thing…
Technologies will always differ and change rapidly,
but statistical validation is a timeless truth.
![Page 22: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/22.jpg)
References & Related Cases
The Sedona Conference Working Group Series, “Commentary on Achieving Quality in the E-Discovery Process,” May 2009.
Losey, Ralph. “The Multi-Modal ‘Where’s Waldo?’ Approach to Search…,” 2010. http://e-discoveryteam.com/2010/02/27/
William A. Gross Construction Associates, Inc. v. American Manufacturers Mutual Insurance Co., 256 F.R.D. 134, 134 (S.D.N.Y. 2009)
Victor Stanley v. Creative Pipe, 250 F.R.D. 251 (D. Md. 2008)
In re Seroquel Products Liability Litigation, 244 F.R.D. 650, 662 (M.D. Fla. 2007)
![Page 23: Statistical Validation And Data Analytics In e Discovery](https://reader035.vdocument.in/reader035/viewer/2022062305/568164a1550346895dd694e8/html5/thumbnails/23.jpg)
Statistical Validation And Data Analytics In eDiscovery
Geoff Black
www.geoffblack.com/ediscovery