why i love predictive coding - 2017 · i hope you get the opportunity to experience this for...


WHY I LOVE PREDICTIVE CODING

Making document review fun again with Mr. EDR and Predictive Coding 4.0

Ralph Losey*, e-Discovery Team

Many lawyers and technologists like predictive coding and recommend it to their colleagues. They have good reasons. It has worked for them. It has allowed them to do e-discovery reviews in an effective, cost-efficient manner, especially on big projects. That is true for me too, but that is not why I love predictive coding. My feelings come from the excitement, fun, and amazement that often arise from seeing it in action, first hand. I love watching the predictive coding features in my software find documents that I could never have found on my own. I love the way the AI in the software helps me to do the impossible. I really love how it makes me far smarter and more skilled than I really am.

I have been getting those kinds of positive feelings consistently by using the latest Predictive Coding 4.0 methodology (shown right) and KrolLDiscovery’s latest eDiscovery.com Review software (“EDR”). So too have my e-Discovery Team members who helped me to participate in TREC 2015 and 2016 (the great science experiment for the latest text search techniques sponsored by the National Institute of Standards and Technology). During our grueling forty-five days of experiments in 2015, and again for sixty days in 2016, we came to admire the intelligence of the new EDR software so much that we decided to personalize the AI as a robot. We named him Mr. EDR out of respect. He even has his own website now, MrEDR.com, where he explains how he helped my e-Discovery Team in the 2015 and 2016 TREC Total Recall Track experiments.

*This is an edited reprint of the author’s personal blog, e-discoveryteam.com, and contains his personal opinions and not those of his law firm or its clients. Copyright Ralph Losey 2015, 2017. Reference to any products should not be construed as a commercial endorsement.

The bottom line for us from this research was to prove and improve our methods. Our latest version 4.0 of Predictive Coding, the Hybrid Multimodal IST Method, is the result. We have even open-sourced this method, well, most of it, and teach it in a free seventeen-class online program: TARcourse.com. Aside from testing and improving our methods, another, perhaps even more important, result of TREC for us was our rediscovery that with good teamwork, and good software like Mr. EDR at your side, document review need never be boring again. The documents themselves may well be boring as hell, that's another matter, but the search for them need not be.

How and Why Predictive Coding is Fun

Steps Four, Five and Six of the standard eight-step workflow for Predictive Coding 4.0 are where we work with the active machine-learning features of Mr. EDR. These are its predictive coding features, a type of artificial intelligence. We train the computer on our conception of relevance by showing it relevant and irrelevant documents that we have found. The software is designed to then go out and find all other relevant documents in the total dataset. One of the skills we learn is when we have taught enough and can stop the training and complete the document review. At TREC we call that the Stop decision. It is important for keeping down the costs of document review.
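For readers who want to see the general shape of a train, rank, review, retrain cycle in code, here is a minimal sketch. It is not Mr. EDR's algorithm, which is proprietary. It assumes a tiny Naive Bayes text classifier as a stand-in for the real active machine learning engine, and the names `train`, `score`, `review_loop`, and the stop rule (`min_yield`: quit when a reviewed batch turns up too few relevant documents) are hypothetical illustrations, not the Stop decision criteria used at TREC.

```python
import math
from collections import Counter

# Hypothetical stand-in for a real active machine learning engine:
# a tiny multinomial Naive Bayes text classifier (NOT Mr. EDR's algorithm).

def tokenize(text):
    return text.lower().split()

def train(labeled):
    """Fit word counts per class; labeled maps doc_id -> (text, is_relevant)."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for text, relevant in labeled.values():
        counts[relevant].update(tokenize(text))
        n_docs[relevant] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab

def score(model, text):
    """Laplace-smoothed log-odds that text is relevant; higher ranks first."""
    counts, n_docs, vocab = model
    totals = {c: sum(counts[c].values()) for c in (True, False)}
    s = math.log((n_docs[True] + 1) / (n_docs[False] + 1))
    for tok in tokenize(text):
        if tok in vocab:  # words never seen in training carry no evidence
            p_rel = (counts[True][tok] + 1) / (totals[True] + len(vocab))
            p_irr = (counts[False][tok] + 1) / (totals[False] + len(vocab))
            s += math.log(p_rel / p_irr)
    return s

def review_loop(collection, seed_labels, reviewer, batch_size=2, min_yield=1):
    """Train, rank the unreviewed docs, have a human code the top batch, retrain.

    Hypothetical stop rule: quit when a batch yields fewer than min_yield
    relevant documents, or when nothing is left to review.
    """
    labeled = dict(seed_labels)
    while True:
        unreviewed = [d for d in collection if d not in labeled]
        if not unreviewed:
            break
        model = train(labeled)
        ranked = sorted(unreviewed, key=lambda d: score(model, collection[d]), reverse=True)
        found = 0
        for d in ranked[:batch_size]:
            relevant = reviewer(d)  # the human relevance call
            labeled[d] = (collection[d], relevant)
            found += relevant
        if found < min_yield:
            break
    return labeled

# Toy run: the "truth" dict plays the part of the human reviewer.
collection = {
    "d1": "merger price fixing email",
    "d2": "price fixing meeting notes",
    "d3": "lunch menu friday",
    "d4": "fixing the price with competitor",
    "d5": "holiday party friday",
    "d6": "parking garage notice",
    "d7": "quarterly parking notice",
    "d8": "friday lunch order",
}
truth = {"d1": True, "d2": True, "d4": True}
reviewed = []

def reviewer(doc_id):
    reviewed.append(doc_id)
    return truth.get(doc_id, False)

seed = {"d1": (collection["d1"], True), "d3": (collection["d3"], False)}
labeled = review_loop(collection, seed, reviewer)
print(reviewed)  # ['d2', 'd4', 'd6', 'd7']
```

In the toy run the first batch (d2, d4) is all relevant, the second batch (d6, d7) is all irrelevant, so the loop stops with d5 and d8 never reviewed. A real review would presumably use far larger batches and a more careful Stop decision than a single empty batch.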

We use a multimodal approach to find training documents, meaning we use all of the other search features of Mr. EDR to find relevant ESI, such as keyword, similarity and concept searches. We iterate the training with sample documents, both relevant and irrelevant, until the computer starts to understand the scope of relevance we have in mind. It is a training exercise to make our AI smart, to get it to understand the basic ideas of relevance for that case. It usually takes multiple rounds of training for Mr. EDR to understand what we have in mind. But he is a fast learner, and by using the latest hybrid multimodal IST ("intelligently spaced training") techniques, we can usually complete his training in a few days. At TREC, where we were moving fast after hours with the e-Discovery Team, we completed some of the training experiments in just a few hours.
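Multimodal seed finding can be pictured in code as pooling the results of several search modes into one set of candidate training documents. This is an assumption-laden sketch, not EDR's search features: it combines a plain keyword search with a word-overlap (Jaccard) similarity search against a known relevant example, using a hypothetical `min_sim` cutoff. Concept search is left out; a real tool would add it as a third source. All function names are illustrative.

```python
def words(text):
    return set(text.lower().split())

def keyword_hits(collection, terms):
    """Docs containing at least one keyword (case-insensitive, whole words)."""
    terms = {t.lower() for t in terms}
    return {d for d, text in collection.items() if terms & words(text)}

def jaccard(a, b):
    """Word-overlap similarity between two texts, from 0.0 to 1.0."""
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def similar_to(collection, example, min_sim=0.25):
    """Docs whose Jaccard similarity to a known relevant example meets min_sim."""
    return {d for d, text in collection.items() if jaccard(text, example) >= min_sim}

def multimodal_candidates(collection, terms, example, min_sim=0.25):
    """Pool keyword and similarity hits as candidate training documents
    for a human reviewer to code (concept search omitted)."""
    return keyword_hits(collection, terms) | similar_to(collection, example, min_sim)

collection = {
    "d1": "price fixing email",
    "d2": "agreement on price list",
    "d3": "lunch menu",
    "d4": "email about fixing prices",
}
pool = multimodal_candidates(collection, ["fixing"], "price fixing agreement")
print(sorted(pool))  # ['d1', 'd2', 'd4']
```

Here d2 never matches the keyword "fixing" but is pulled in by similarity, which is the point of using more than one search mode to find training documents.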

After a while Mr. EDR starts to “get it”; he starts to really understand what we are after, what we think is relevant in the case. That is when a happy shock-and-awe type moment can happen. That is when Mr. EDR’s intelligence and search abilities start to exceed our own. Yes. It happens. The pupil then starts to evolve beyond his teachers. The smart algorithms start to see patterns and find evidence invisible to us. At that point we sometimes even let him train himself by automatically accepting his top-ranked predicted relevant documents without even looking at them. Our main role then is to determine a good range for the automatic acceptance and do some spot-checking. We are, in effect, allowing Mr. EDR to take over the review. Oh what a feeling to then watch what happens, to see him keep finding new relevant documents and keep getting smarter and smarter by his own self-programming. That is the special AI-high that makes it so much fun to work with Predictive Coding 4.0 and Mr. EDR.
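The "good range for the automatic acceptance" can be pictured as a score cutoff. This sketch assumes the classifier produces one relevance score per document and treats the range as a single threshold, which is a simplification; `auto_accept`, `threshold`, and `spot_check_rate` are hypothetical names. Everything at or above the cutoff is coded relevant without review, and a small random sample of those documents is set aside for human spot-checking.

```python
import random

def auto_accept(scores, threshold, spot_check_rate=0.1, seed=None):
    """Accept every document scoring at or above threshold as relevant
    without review, and draw a random spot-check sample from those.

    scores: dict of doc_id -> model relevance score (hypothetical format).
    Returns (accepted, remaining, spot_check); accepted/remaining sorted by id.
    """
    accepted = sorted(d for d, s in scores.items() if s >= threshold)
    remaining = sorted(d for d, s in scores.items() if s < threshold)
    # At least one spot check whenever anything was auto-accepted.
    k = min(len(accepted), max(1, round(len(accepted) * spot_check_rate))) if accepted else 0
    spot_check = random.Random(seed).sample(accepted, k)
    return accepted, remaining, spot_check

scores = {"a": 0.97, "b": 0.91, "c": 0.42, "d": 0.88, "e": 0.10}
accepted, remaining, spot_check = auto_accept(scores, threshold=0.85, seed=7)
print(accepted)         # ['a', 'b', 'd']
print(remaining)        # ['c', 'e']
print(len(spot_check))  # 1
```

Choosing the threshold is the human judgment call the text describes; if spot checks keep turning up irrelevant documents, raising the cutoff is the obvious adjustment.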

It does not happen in every project, but with the new Predictive Coding 4.0 methods and the latest Mr. EDR, we are seeing this kind of transformation happen more and more often. It is a tipping point in the review when we see Mr. EDR go beyond us. He starts to unearth relevant documents that my team would never even have thought to look for. The relevant documents he finds are sometimes completely dissimilar to any others we found before. They do not have the same keywords, or even the same known concepts. Still, Mr. EDR sees patterns in these documents that we do not. He can find the hidden gems of relevance, even outliers and black swans, if they exist. When he starts to train himself, that is the point in the review when we think of Mr. EDR as going into superhero mode. At least, that is the way my young e-Discovery Team members like to talk about him.

By the end of many projects the algorithmic functions of Mr. EDR have attained a higher intelligence and skill level than our own (at least on the task of finding the relevant evidence in the document collection). He is always lightning fast and inexhaustible, even untrained, but by the end of his training, he becomes a search genius. Watching Mr. EDR in that kind of superhero mode is what makes Predictive Coding 4.0 a pleasure.

The Empowerment of AI Augmented Search

It is hard to describe the combination of pride and excitement you feel when Mr. EDR, your student, takes your training and then goes beyond you. More than that, the super-AI you created then empowers you to do things that would have been impossible before, absurd even. That feels pretty good too. You may not be Iron Man, or look like Robert Downey Jr., but you will be capable of remarkable feats of legal search strength.

For instance, using Mr. EDR as our Iron Man-like suits, my e-Discovery Team of three attorneys was able to do thirty different review projects and classify 17,014,085 documents in 45 days. See the 2015 TREC experiment summary at MrEDR.com. We did these projects mostly at night and on weekends, while holding down our regular jobs. What makes this seem crazy, even impossible, is that we were able to accomplish it by personally reviewing only 32,916 documents. That is less than 0.2% of the total collection. That means we relied on predictive coding to do 99.8% of our review work. Incredible, but true.
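The two percentages follow directly from the two counts quoted in the paragraph; a quick arithmetic check:

```python
# Figures from the 2015 TREC summary quoted above.
total_classified = 17_014_085
human_reviewed = 32_916

human_share = human_reviewed / total_classified
print(f"{human_share:.2%}")      # 0.19%  (under 0.2%)
print(f"{1 - human_share:.2%}")  # 99.81% (rounds to the 99.8% cited)
```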

Using traditional linear review methods it would have taken us 45 years to review that many documents! Instead, we did it in 45 days. Plus our recall and precision rates were insanely good. We even scored 100% precision and 100% recall in one TREC project in 2015 and two more in 2016. You read that right. Perfection. Many of our other projects attained scores in the high and mid nineties. We are not saying you will get results like that. Every project is different, and some are much more difficult than others. But we are saying that this kind of AI-enhanced review is not only fast and efficient, it is effective.
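Precision and recall, the two measures behind those scores, are simple ratios: precision asks how much of what was produced is relevant, and recall asks how much of what is relevant was found. A minimal set-based sketch; the function name is illustrative, and TREC's own scoring tools are not shown here.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0  # convention for empty sets
    return precision, recall

print(precision_recall({"a", "b", "c"}, {"a", "b", "c"}))       # (1.0, 1.0)
print(precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"}))  # (0.5, 0.6666666666666666)
```

The first call is the "perfection" case: every document produced is relevant and every relevant document is produced.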

Yes, it’s pretty cool when your little AI creation does all the work for you and makes you look good. Still, no robot could do this without your training and supervision. We are a team, which is why we call it hybrid multimodal, man and machine.

Having Fun with Scientific Research at TREC 2015 and 2016

During the 2015 TREC Total Recall Track experiments my team would sometimes get totally lost on a few of the really hard Topics. Unlike in our usual work, we were not given legal issues to search. They were arcane technical hacker issues, political issues, or local news stories. Not only were we in new fields, the scope of relevance of the thirty Topics was never really explained. (We were given one- to three-word explanations in 2015; in 2016 we got a whole sentence!) We had to figure out intended relevance during the project based on feedback from the automated TREC document adjudication system. We would have some limited understanding of relevance based on our suppositions from the initial keyword hints, and so we could begin to train Mr. EDR with that. But in several Topics, we never had any real understanding of exactly what TREC thought was relevant.

This was a very frustrating situation at first, but, and here is the cool thing, even though we did not know, Mr. EDR knew. That’s right. He saw the TREC patterns of relevance hidden from us mere mortals. In many of the thirty Topics we would just sit back and let him do all of the driving, like a Google car. We would often just cheer him on (and each other) as the TREC systems kept saying Mr. EDR was right: the documents he selected were relevant. The truth is, during much of the 45 days of TREC we were like kids in a candy store having a great time. That is when we decided to give Mr. EDR a cape and superhero status. He never let us down. It is a great feeling to create an AI with greater intelligence than your own and then see it augment and improve your legal work. It is truly a hybrid human-machine partnership at its best.

I hope you get the opportunity to experience this for yourself someday. The TREC experiments in 2015 and 2016 on recall in predictive coding are over, but the search for truth and justice goes on in lawsuits across the country. Try it on your next document review project.

Do What You Love and Love What You Do

Mr. EDR, and other good predictive coding software like it, can augment our own abilities and make us incredibly productive. This is why I love predictive coding and would not trade it for any other legal activity I have ever done (although I have had similar highs from oral arguments that went great, or the rush that comes from winning a big case).

The excitement of predictive coding comes through clearly when Mr. EDR is fully trained and able to carry on without you. It is a kind of Kurzweilian mini-singularity event. It usually happens near the end of the project, but can happen earlier when your computer catches on to what you want and starts to find the hidden gems you missed. I suggest you give Predictive Coding 4.0 and Mr. EDR a try. To make it easier I open-sourced our latest method and created an online course, TARcourse.com. It will teach anyone our method, if they have the right software. Learn the method, get the software, and then you too can have fun with evidence search. You too can love what you do. Document review need never be boring again.

Caution

One note of caution: most e-discovery vendors, including the largest, do not have active machine learning features built into their document review software. Even the few that have active machine learning do not necessarily follow the Hybrid Multimodal IST Predictive Coding 4.0 approach that we used to attain these results. They instead rely entirely on machine-selected documents for training or, even worse, rely entirely on randomly selected documents to train the software, or have elaborate, unnecessary secret control sets.

The algorithms used by some vendors who say they have "predictive coding" or "artificial intelligence" are not very good. Scientists tell me that some are only dressed-up concept search or unsupervised document clustering. Only bona fide active machine learning algorithms create the kind of AI experience that I am talking about. Software for document review that does not have any active machine learning features may be cheap, and may be popular, but it lacks the power that I love. Without active machine learning, which is fundamentally different from just "analytics," it is not possible to boost your intelligence with AI. So beware of software that just says it has advanced analytics. Ask whether it has "active machine learning."

It is impossible to do the things described in this essay unless the software you are using has active machine learning features. This is clearly the way of the future. It is what makes document review enjoyable and why I love to do big projects. It turns scary into fun.

So, if you tried "predictive coding" or "advanced analytics" before, and it did not work for you, it could well be the software’s fault, not yours. Or it could be the poor method you were following. The method that we developed in Da Silva Moore, where my firm represented the defense, was a version 1.0 method. Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182, 183 (S.D.N.Y. 2012). We have come a long way since then. We have eliminated unnecessary random control sets and gone to continuous training, instead of train-then-review. This is spelled out at TARcourse.com, which teaches our latest version 4.0 techniques.

The new 4.0 methods are not hard to follow. TARcourse.com puts our methods online and even teaches the theory and practice. And the 4.0 methods certainly work, as we proved at TREC, but only if you have good software. With just a little training, and some help at first from consultants (most vendors with bona fide active machine learning features will have good ones to help), you can have the kind of success and excitement that I am talking about.

Do not give up if it does not work for you the first time, especially in a complex project. Try another vendor instead, one that may have better software and better consultants. Also, be sure that your consultants are Predictive Coding 4.0 experts, and that you follow their advice. Finally, remember that the cheapest software is almost never the best and, in the long run, will cost you a small fortune in wasted time and frustration.

Conclusion

Love what you do. It is a great feeling and a surefire way to job satisfaction and success. With these new predictive coding technologies it is easier than ever to love e-discovery. Try them out. Treat yourself to the AI high that comes from using smart machine learning software and fast computers. There is nothing else like it. If you switch to the 4.0 methods and software, you too can know that thrill. You can watch an advanced intelligence, which you helped create, exceed your own abilities, exceed anyone’s abilities. You can sit back and watch Mr. EDR complete your search for you. You can watch him do so in record time and with record results. It is amazing to see good software find documents that you know you would never have found on your own.

Predictive coding AI in superhero mode can be exciting to watch. Why deprive yourself of that? Who says document review has to be slow and boring? Start making the practice of law fun again.

__________

The author can be reached at Ralph.Losey@gmail.com or at work at Ralph.Losey@JacksonLewis.com. Consultations by the author related to predictive coding, e-discovery or any other for-pay services are provided exclusively to current clients of the author’s law firm, Jackson Lewis P.C.