Taming Big Data within the Corporate Litigation Lifecycle
Secure Document Repositories for Electronic Discovery and Other Complex Legal Matters
Jeremy Greshin, StoredIQ | Chris Toomey, Catalyst

Uploaded by catalyst-repository-systems, Dec. 3, 2014


DESCRIPTION

Presentation given Sept. 13, 2012, at the Thomson Reuters Managing Litigation Series program in Chicago. Presenters were Chris Toomey, director of Alliance and Channel Development at Catalyst Repository Systems, and Jeremy Greshin, managing director, Legal Solutions, at StoredIQ.

TRANSCRIPT

1. Taming Big Data within the Corporate Litigation Lifecycle
Jeremy Greshin, StoredIQ | Chris Toomey, Catalyst

2. True or False?
- Every two days we create as much information as we did from the dawn of civilization up until 2003.
- Unstructured data is the largest and fastest-growing segment of the digital universe, growing at more than 62% a year.
- On average, more than 50% of data is more than 3 years old.
- 91% of data is never accessed beyond 90 days after it was created.
- 69% of data stored has no current business value.
- Collecting and reviewing data for eDiscovery costs $17,500/gigabyte.
- Review makes up the largest percentage of e-discovery production costs.
ANSWERS: (sources: Eric Schmidt of Google, Gartner, IDC, CGOC, EDRM, Sedona, RAND study)

3. Info Management Maturity Curve
[Chart: a maturity curve from tactical "Big Data" to strategic "Big Business," plotting business value against data intelligence (low to high), through the stages Find & Analyze Data; Classification & Relevance; Business Process Optimization; Corporate Policy Enforcement; Data-driven Enterprise.]

4. Identify Your Data through a Data Topology Map

5. Classify Data for Relevance
- PROTECT: protect high-value data through identification and isolation.
- DESTROY: identify and remove data that is unauthorized or not related to general business (defensible deletion).
- EXPIRE: delete aged data that is not on permanent/legal hold.
- CLASSIFY: optional phase for planning an information governance project.
[Chart: a 100 TB data source reduced to a 50-65 TB retention platform across four phases (Protect, Destroy, Expire, Classify); roughly 5% of data sits on permanent/legal hold, while 30-50% is a candidate for defensible deletion.]

6. Intelligent Approach to ECA before Collection
- Early Case Assessment can occur prior to preservation and collection, without moving data from where it natively resides.
- Reduces the amount of downstream data.
- Qualitatively enriches the data for downstream review.
- Enables assessment of the matter sooner and more accurately.
- Lowers the cost of formal review and analysis.
[Diagram: the EDRM stages (Information Management, Identification, Preservation, Collection, Processing, Review, Analysis, Production, Presentation), with data volume falling as relevance rises.]
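Taken together, slide 2's cost-per-gigabyte figure and slide 5's volume reduction imply large savings from culling before collection. A minimal arithmetic sketch, using only the numbers quoted in the deck (the function name and the decimal TB-to-GB conversion are our own assumptions):

```python
# Rough e-discovery cost arithmetic using the figures quoted in the deck.
# The $17,500/GB collect-and-review cost and the 50-65 TB retention
# platform come from slides 2 and 5; the rest is unit conversion.

COST_PER_GB = 17_500          # slide 2: collect & review cost per gigabyte
SOURCE_TB = 100               # slide 5: example data source size
GB_PER_TB = 1_000             # decimal terabytes, assumed

def review_cost(tb: float) -> float:
    """Cost to collect and review `tb` terabytes at the quoted rate."""
    return tb * GB_PER_TB * COST_PER_GB

baseline = review_cost(SOURCE_TB)
# Slide 5: classification leaves a 50-65 TB retention platform.
after_low = review_cost(50)
after_high = review_cost(65)

print(f"baseline: ${baseline:,.0f}")
print(f"after defensible deletion: ${after_low:,.0f} - ${after_high:,.0f}")
```

On the deck's own numbers, defensible deletion before collection cuts the headline figure by roughly a third to a half.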
7. Transition into Litigation Management
[Diagram: the Electronic Discovery Reference Model (Records Management, Identification, Preservation, Collection, Processing, Review, Analysis, Production, Trial/Hearing), with actual data volume falling as relevance (% of total) rises.]
"Seventy percent of e-discovery costs are spent on processing, analysis, review, and production." - Forrester

8. E-Discovery Costs
[Chart: Collection 8%, Processing 19%, Review 73%.]

9. Keyword Searching: The Basics
Attorneys working with experienced paralegals were able to find only about 20% of the relevant documents, despite their belief that they had found more than 75% of them.
David C. Blair & M.E. Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System (1985).

10. Are Lawyers Qualified to Search?
"Whether search terms will yield the information sought is a complicated question involving the sciences of computer technology, statistics and linguistics." Magistrate Judge Facciola in U.S. v. O'Keefe (D.D.C. Feb. 2008)
"[A]ll keyword searches are not created equal. The only prudent way to test the reliability of the keyword search is to perform some appropriate sampling." Magistrate Judge Grimm in Victor Stanley v. Creative Pipe (D. Md. May 2008)

11. Are Key Words Enough? Federal Rule 502(b) and Advanced Analytics
"[A] party that uses advanced analytical software applications and linguistic tools in screening for privilege and work product may be found to have taken reasonable steps to prevent inadvertent disclosure." Advisory Committee Notes on the Amended Rule

12. Search Tools to Streamline Content
- NIST and system files
- Date ranges, custodians, file types
- Key words (stemming, proximity, Boolean)
- Concept searching
- Clustering
- Email and near duplicates
- Tracked search
- Language identification
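The keyword tools listed in slide 12 (stemming, proximity, Boolean) can be illustrated in a few lines. This is a hypothetical sketch, not any vendor's engine: the crude suffix-stripping stemmer stands in for a real Porter-style stemmer.

```python
import re

def stem(word: str) -> str:
    """Crude suffix-stripping stemmer (illustrative only; production
    engines use Porter-style stemmers with far more rules)."""
    for suffix in ("ing", "ed", "es", "e", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tokens(text: str) -> list[str]:
    """Lowercase, split on non-letters, and stem each token."""
    return [stem(w) for w in re.findall(r"[a-z']+", text.lower())]

def matches_all(doc: str, *terms: str) -> bool:
    """Boolean AND over stemmed query terms."""
    toks = set(tokens(doc))
    return all(stem(t.lower()) in toks for t in terms)

def within(doc: str, a: str, b: str, distance: int) -> bool:
    """Proximity search: stemmed terms a and b within `distance` tokens."""
    toks = tokens(doc)
    pa = [i for i, t in enumerate(toks) if t == stem(a.lower())]
    pb = [i for i, t in enumerate(toks) if t == stem(b.lower())]
    return any(abs(i - j) <= distance for i in pa for j in pb)

doc = "The contracts were terminated after the merger discussions failed."
print(matches_all(doc, "terminate", "contract"))   # stemming: True
print(within(doc, "merger", "failed", 2))          # proximity: True
```

Note how stemming lets "terminate" hit "terminated": this flexibility is exactly why, as slides 9 and 10 warn, untested keyword lists both over- and under-collect, and why sampling the results matters.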
13. Protecting Privilege Workflow
[Diagram: a Search → Categorize → Review → Quality Control workflow. Search draws on attorney names, law firm and vendor lists, and legal terms; Categorize sorts communications (attorney to attorney, attorney to client, attorney to client plus others); Review combines an automated workflow, email thread / near-duplicate analysis, and integrated privilege log creation; Quality Control applies coding validation and statistical sampling.]

14. Production Quality Control
Rule 502(b) requires reasonable precautions to prevent disclosures.
- Coding validation: a rule-based production module to safeguard against producing documents coded as privileged.
- Email thread / near duplicate: use email threads and near-duplicate analysis to verify that similar and related documents are tagged consistently.
- Statistical sampling: random, statistically valid stratified sampling to QC productions.

15. Is Manual Review the Gold Standard?
"The idea that exhaustive manual review is the most effective and therefore the most defensible approach to document review is strongly refuted. Technology-assisted review can (and does) yield more accurate results than exhaustive manual review, with much lower effort."
Maura Grossman and Gordon Cormack, "Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient than Exhaustive Manual Review," Richmond Journal of Law and Technology, Vol. XVII, Issue 3

16. Predictive Coding

17. How does it work?

18. Predictive Coding Protocol
- Use on document samples or seed sets.
- Transparency: disclose the searches and results.
- Use the samples to start training the computer.
- A matter expert reviews the first 500 selected documents.
- The computer refines the search to validate.
- Review continues through a number of rounds.
- Share the document sets.
- The larger document set is analyzed.
- Non-relevant documents are sampled and reviewed.
- A cut-off point is determined after sampling.
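The protocol in slide 18 (seed set, training, ranking, iterative rounds) can be sketched with a toy relevance model. This hypothetical example uses a simple word-weight scorer, not any real TAR product's classifier; the corpus and the helper names are invented for illustration.

```python
import re
from collections import Counter

def words(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def train(reviewed: list[tuple[str, bool]]) -> Counter:
    """Learn per-word weights from reviewed (text, is_relevant) pairs:
    +1 for each appearance in a relevant doc, -1 for a non-relevant one."""
    w = Counter()
    for text, relevant in reviewed:
        for word in set(words(text)):
            w[word] += 1 if relevant else -1
    return w

def score(weights: Counter, text: str) -> int:
    """Sum of learned weights over the document's distinct words."""
    return sum(weights[word] for word in set(words(text)))

# Toy seed set: the expert judgments stand in for slide 18's first
# review round (is_relevant is the expert's call, not ground truth).
seed = [
    ("merger agreement draft attached", True),
    ("quarterly merger negotiation notes", True),
    ("office party pizza order", False),
]
unreviewed = [
    "revised merger agreement for signature",
    "pizza toppings survey results",
    "negotiation strategy for merger terms",
]

weights = train(seed)
ranked = sorted(unreviewed, key=lambda d: score(weights, d), reverse=True)
for doc in ranked:
    print(score(weights, doc), doc)
```

In a real protocol, the top-ranked documents would be reviewed, appended to the seed set, and the model retrained over several rounds, with the below-cut-off remainder sampled as in slide 18's final steps.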
19. What are the Benefits of Predictive Coding?
- Cost savings can be significant.
- Decreased review time.
- Increased accuracy and quality.
- Works as an Early Case Assessment tool.
- Manages risk and exposure.
- Enables small review teams and eliminates costly outside review.
- Reduces collection and preservation volumes.

20. Potential Uses of Predictive Coding
A powerful tool with many use cases, each with its own degree of perceived risk:
- High perceived risk: automated review (typically by agreement of the parties).
- Medium: review prioritization, advanced culling.
- Low: early case assessment, search terms.
For clients not ready for automated review, it can still be used to achieve significant cost savings.

21. Predictive Ranking: Trends in the Industry
- 37% use predictive ranking now.
- 36% will start using it in the next 12 months.
- 88% of those who use it now will increase use in the next 12 months.
- 1% of those who use it now will decrease use.
Source: eDiscoveryJournal poll, 2012

22. Benefits of a Complete End-to-End Solution
- Cost: lowers eDiscovery costs by reducing the amount of downstream data while improving search strategy outcomes.
- Time: enables assessment of the matter and data quickly and accurately.
- Strategy: formulate strategy with accurate cost predictability, control the costs of review, target searching effectively, and leverage technology.
- Risk: a comprehensive approach reduces data touch points, reduces spoliation, targets cost reductions, and improves accuracy.

23. Jeremy Greshin, StoredIQ | Chris Toomey, Catalyst
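The "random, statistically valid" sampling invoked in slides 14 and 18 generally starts from the standard sample-size formula for estimating a proportion. A minimal sketch (the function and its defaults are our own assumptions; it uses the normal approximation and assumes a large population, with no finite-population correction):

```python
import math

def sample_size(z: float = 1.96, p: float = 0.5, margin: float = 0.05) -> int:
    """Sample size to estimate a proportion to +/-margin at the confidence
    level implied by z (1.96 -> 95%). p = 0.5 is the conservative worst
    case. Formula: n = z^2 * p * (1 - p) / margin^2."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

# QC a production run: how many documents to pull at random?
print(sample_size(margin=0.05))   # +/-5% at 95% confidence -> 385
print(sample_size(margin=0.10))   # +/-10% at 95% confidence -> 97
```

The sample grows quadratically as the margin tightens: a ±2% margin at 95% confidence needs roughly 2,400 documents, regardless of production size.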