Transcript
Page 1: WHY I LOVE PREDICTIVE CODING - 2017 · I hope you get the opportunity to experience this for yourself someday. The TREC experiments in 2015 and 2016 on recall in predictive coding

WHY I LOVE PREDICTIVE CODING

Making document review fun again with Mr. EDR and Predictive Coding 4.0

Ralph Losey* e-Discovery Team

Many lawyers and technologists like predictive coding and recommend it to their colleagues. They have good reasons. It has worked for them. It has allowed them to do e-discovery reviews in an effective, cost-efficient manner, especially the big projects. That is true for me too, but that is not why I love predictive coding. My feelings come from the excitement, fun, and amazement that often arise from seeing it in action, firsthand. I love watching the predictive coding features in my software find documents that I could never have found on my own. I love the way the AI in the software helps me to do the impossible. I really love how it makes me far smarter and more skilled than I really am.

I have been getting those kinds of positive feelings consistently by using the latest Predictive Coding 4.0 methodology (shown right) and KrolLDiscovery’s latest eDiscovery.com Review software (“EDR”). So too have my e-Discovery Team members who helped me to participate in TREC 2015 and 2016 (the great science experiment for the latest text search techniques sponsored by the National Institute of Standards and Technology). During our grueling forty-five days of experiments in 2015, and again for sixty days in 2016, we came to admire the intelligence of the new EDR software so much that we decided to personalize the AI as a robot. We named him Mr. EDR out of respect. He even has his own website now, MrEDR.com, where he explains how he helped my e-Discovery Team in the 2015 and 2016 TREC Total Recall Track experiments.

*This is an edited reprint of the author’s personal blog, e-discoveryteam.com, and contains his personal opinions and not those of his law firm or its clients. Copyright Ralph Losey 2015, 2017. Reference to any products should not be construed as a commercial endorsement.

The bottom line for us from this research was to prove and improve our methods. Our latest version 4.0 of Predictive Coding, the Hybrid Multimodal IST Method, is the result. We have even open-sourced this method, well, most of it, and teach it in a free seventeen-class online program: TARcourse.com. Aside from testing and improving our methods, another, perhaps even more important result of TREC for us was our rediscovery that with good teamwork, and good software like Mr. EDR at your side, document review need never be boring again. The documents themselves may well be boring as hell, that's another matter, but the search for them need not be.

How and Why Predictive Coding is Fun

Steps Four, Five and Six of the standard eight-step workflow for Predictive Coding 4.0 are where we work with the active machine-learning features of Mr. EDR. These are its predictive coding features, a type of artificial intelligence. We train the computer on our conception of relevance by showing it relevant and irrelevant documents that we have found. The software is designed to then go out and find all other relevant documents in the total dataset. One of the skills we learn is when we have taught enough and can stop the training and complete the document review. At TREC we call that the Stop decision. Getting it right is important for keeping down the costs of document review.

We use a multimodal approach to find training documents, meaning we use all of the other search features of Mr. EDR to find relevant ESI, such as keyword, similarity and concept searches. We iterate the training with sample documents, both relevant and irrelevant, until the computer starts to understand the scope of relevance we have in mind. It is a training exercise to make our AI smart, to get it to understand the basic ideas of relevance for that case. It usually takes multiple rounds of training for Mr. EDR to understand what we have in mind. But he is a fast learner, and by using the latest hybrid multimodal IST ("intelligently spaced learning") techniques, we can usually complete his training in a few days. At TREC, where we were moving fast after hours with the A-Team, we completed some of the training experiments in just a few hours.
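For readers who like to see the idea in code, here is a minimal sketch of that kind of iterated training loop. It is only an illustration under stated assumptions: the word-weight scorer is a toy stand-in, not Mr. EDR's actual algorithm, and the names `train`, `score`, `review_loop` and the `reviewer` function (a placeholder for the attorney's judgment) are all hypothetical.

```python
import math
from collections import Counter

def train(labeled):
    """Build per-word log-odds weights from (text, is_relevant) pairs."""
    rel, irr = Counter(), Counter()
    for text, is_relevant in labeled:
        (rel if is_relevant else irr).update(set(text.lower().split()))
    # Add-one smoothing keeps words seen on only one side from giving infinite weights.
    return {w: math.log((rel[w] + 1) / (irr[w] + 1)) for w in set(rel) | set(irr)}

def score(weights, text):
    """Sum the weights of the distinct words in a document."""
    return sum(weights.get(w, 0.0) for w in set(text.lower().split()))

def review_loop(docs, seed_labels, reviewer, batch_size=2, rounds=3):
    """Each round: retrain, rank the unreviewed documents, have the reviewer code the top batch."""
    labels = dict(seed_labels)  # document index -> True (relevant) / False (irrelevant)
    for _ in range(rounds):
        weights = train([(docs[i], y) for i, y in labels.items()])
        unreviewed = [i for i in range(len(docs)) if i not in labels]
        if not unreviewed:
            break
        ranked = sorted(unreviewed, key=lambda i: score(weights, docs[i]), reverse=True)
        for i in ranked[:batch_size]:
            labels[i] = reviewer(docs[i])
    return labels

# Hypothetical six-document collection; documents 0 and 1 are the seed set,
# as if found by an ordinary keyword search before training began.
docs = [
    "merger pricing memo",
    "lunch menu friday",
    "pricing strategy for merger",
    "friday parking notice",
    "confidential merger terms",
    "holiday lunch schedule",
]
reviewer = lambda text: "merger" in text  # stand-in for a human relevance call
labels = review_loop(docs, {0: True, 1: False}, reviewer)
relevant_found = sorted(i for i, y in labels.items() if y)
```

In this toy run the first round of ranking already pushes the two unreviewed merger documents to the top of the list, which is the behavior the paragraph above describes: each round of coding sharpens the next round of ranking.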

After a while Mr. EDR starts to “get it,” he starts to really understand what we are after, what we think is relevant in the case. That is when a happy shock and awe type moment can happen. That is when Mr. EDR’s intelligence and search abilities start to exceed our own. Yes. It happens. The pupil then starts to evolve beyond his teachers. The smart algorithms start to see patterns and find evidence invisible to us. At that point we sometimes even let him train himself by automatically accepting his top-ranked predicted relevant documents without even looking at them. Our main role then is to determine a good range for the automatic acceptance and do some spot-checking. We are, in effect, allowing Mr. EDR to take over the review. Oh what a feeling to then watch what happens, to see him keep finding new relevant documents and keep getting smarter and smarter by his own self-programming. That is the special AI-high that makes it so much fun to work with Predictive Coding 4.0 and Mr. EDR.
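The automatic acceptance step can be pictured with a short sketch. This is an assumption-laden illustration, not how EDR implements it: the probability scores, the `cutoff` value, the document IDs and the function name `auto_accept` are all hypothetical.

```python
import random

def auto_accept(scores, cutoff, spot_check_size, seed=0):
    """Accept every document scored at or above `cutoff` and draw a random spot-check sample."""
    accepted = [doc_id for doc_id, p in scores.items() if p >= cutoff]
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    to_check = rng.sample(accepted, min(spot_check_size, len(accepted)))
    return accepted, to_check

# Hypothetical predicted-relevance scores from the trained software.
scores = {"DOC-1": 0.98, "DOC-2": 0.95, "DOC-3": 0.71, "DOC-4": 0.12}
accepted, to_check = auto_accept(scores, cutoff=0.90, spot_check_size=1)
```

The human work shrinks to two choices: where to set the cutoff, and how many of the accepted documents to look at as a check.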

It does not happen in every project, but with the new Predictive Coding 4.0 methods and the latest Mr. EDR, we are seeing this kind of transformation happen more and more often. It is a tipping point in the review when we see Mr. EDR go beyond us. He starts to unearth relevant documents that my team would never even have thought to look for. The relevant documents he finds are sometimes completely dissimilar to any others we found before. They do not have the same keywords, or even the same known concepts. Still, Mr. EDR sees patterns in these documents that we do not. He can find the hidden gems of relevance, even outliers and black swans, if they exist. When he starts to train himself, that is the point in the review when we think of Mr. EDR as going into superhero mode. At least, that is the way my young e-Discovery Team members like to talk about him.

By the end of many projects the algorithmic functions of Mr. EDR have attained a higher intelligence and skill level than our own (at least on the task of finding the relevant evidence in the document collection). He is always lightning fast and inexhaustible, even untrained, but by the end of his training, he becomes a search genius. Watching Mr. EDR in that kind of superhero mode is what makes Predictive Coding 4.0 a pleasure.

The Empowerment of AI Augmented Search

It is hard to describe the combination of pride and excitement you feel when Mr. EDR, your student, takes your training and then goes beyond you. More than that, the super-AI you created then empowers you to do things that would have been impossible before, absurd even. That feels pretty good too. You may not be Iron Man, or look like Robert Downey, but you will be capable of remarkable feats of legal search strength.

For instance, using Mr. EDR as our Iron Man-like suits, my e-discovery A-Team of three attorneys was able to do thirty different review projects and classify 17,014,085 documents in 45 days. See the 2015 TREC experiment summary at MrEDR.com. We did these projects mostly at night, and on weekends, while holding down our regular jobs. What makes this crazy impossible is that we were able to accomplish it by personally reviewing only 32,916 documents. That is less than 0.2% of the total collection. That means we relied on predictive coding to do more than 99.8% of our review work. Incredible, but true.
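The percentages are easy to check from the two document counts quoted above. A quick sketch of the arithmetic (the variable names are just for illustration):

```python
# Figures quoted in the essay for the 2015 TREC experiments.
total_docs = 17_014_085     # documents classified across the thirty projects
reviewed_by_hand = 32_916   # documents the attorneys personally reviewed

share_reviewed = reviewed_by_hand / total_docs
share_by_software = 1 - share_reviewed

print(f"Reviewed by hand: {share_reviewed:.4%}")
print(f"Left to predictive coding: {share_by_software:.4%}")
```

The hand-reviewed share works out to about 0.19% of the collection, consistent with the "less than 0.2%" figure.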

Using traditional linear review methods it would have taken us 45 years to review that many documents! Instead, we did it in 45 days. Plus our recall and precision rates were insanely good. We even scored 100% precision and 100% recall in one TREC project in 2015 and two more in 2016. You read that right. Perfection. Many of our other projects attained scores in the high and mid nineties. We are not saying you will get results like that. Every project is different, and some are much more difficult than others. But we are saying that this kind of AI-enhanced review is not only fast and efficient, it is effective.
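For anyone unfamiliar with the two measures, precision is the share of the documents you retrieved that are actually relevant, and recall is the share of all relevant documents that you managed to retrieve. A minimal sketch using the standard definitions follows; the function name and the sample document IDs are hypothetical, and this is not the TREC scoring code.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved IDs against the truly relevant IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# A perfect result: everything retrieved is relevant and nothing relevant is missed.
perfect = precision_recall({1, 2, 3}, {1, 2, 3})

# An imperfect result: two of four retrieved are relevant, and one relevant document is missed.
imperfect = precision_recall({1, 2, 3, 4}, {1, 2, 5})
```

A score of 100% on both measures means the review produced every relevant document and nothing else.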

Yes, it’s pretty cool when your little AI creation does all the work for you and makes you look good. Still, no robot could do this without your training and supervision. We are a team, which is why we call it hybrid multimodal, man and machine.

Having Fun with Scientific Research at TREC 2015 and 2016

During the 2015 TREC Total Recall Track experiments my team would sometimes get totally lost on a few of the really hard Topics. We were not given our usual legal issues to search. They were arcane technical hacker issues, political issues, or local news stories. Not only were we in new fields, the scope of relevance of the thirty Topics was never really explained. (We were given one-to-three-word explanations in 2015; in 2016 we got a whole sentence!) We had to figure out intended relevance during the project based on feedback from the automated TREC document adjudication system. We would have some limited understanding of relevance based on our suppositions from the initial keyword hints, and so we could begin to train Mr. EDR with that. But, in several Topics, we never had any real understanding of exactly what TREC thought was relevant.

This was a very frustrating situation at first, but, and here is the cool thing, even though we did not know, Mr. EDR knew. That’s right. He saw the TREC patterns of relevance hidden to us mere mortals. In many of the thirty Topics we would just sit back and let him do all of the driving, like a Google car. We would often just cheer him on (and each other) as the TREC systems kept saying Mr. EDR was right, the documents he selected were relevant. The truth is, during much of the 45 days of TREC we were like kids in a candy store having a great time. That is when we decided to give Mr. EDR a cape and superhero status. He never let us down. It is a great feeling to create an AI with greater intelligence than your own and then see it augment and improve your legal work. It is truly a hybrid human-machine partnership at its best.

I hope you get the opportunity to experience this for yourself someday. The TREC experiments in 2015 and 2016 on recall in predictive coding are over, but the search for truth and justice goes on in lawsuits across the country. Try it on your next document review project.

Do What You Love and Love What You Do

Mr. EDR, and other good predictive coding software like it, can augment our own abilities and make us incredibly productive. This is why I love predictive coding and would not trade it for any other legal activity I have ever done (although I have had similar highs from oral arguments that went great, or the rush that comes from winning a big case).

The excitement of predictive coding comes through clearly when Mr. EDR is fully trained and able to carry on without you. It is a kind of Kurzweilian mini-singularity event. It usually happens near the end of the project, but can happen earlier when your computer catches on to what you want and starts to find the hidden gems you missed. I suggest you give Predictive Coding 4.0 and Mr. EDR a try. To make it easier I open-sourced our latest method and created an online course, TARcourse.com. It will teach anyone our method, if they have the right software. Learn the method, get the software and then you too can have fun with evidence search. You too can love what you do. Document review need never be boring again.

Caution

One note of caution: most e-discovery vendors, including the largest, do not have active machine learning features built into their document review software. Even the few that have active machine learning do not necessarily follow the Hybrid Multimodal IST Predictive Coding 4.0 approach that we used to attain these results. They instead rely entirely on machine-selected documents for training, or even worse, rely entirely on randomly selected documents to train the software, or have elaborate, unnecessary secret control sets.

The algorithms used by some vendors who say they have "predictive coding" or "artificial intelligence" are not very good. Scientists tell me that some are only dressed-up concept search or unsupervised document clustering. Only bona fide active machine learning algorithms create the kind of AI experience that I am talking about. Software for document review that does not have any active machine learning features may be cheap, and may be popular, but it lacks the power that I love. Without active machine learning, which is fundamentally different from just "analytics," it is not possible to boost your intelligence with AI. So beware of software that just says it has advanced analytics. Ask if it has "active machine learning."

It is impossible to do the things described in this essay unless the software you are using has active machine learning features. This is clearly the way of the future. It is what makes document review enjoyable and why I love to do big projects. It turns scary into fun.

So, if you tried "predictive coding" or "advanced analytics" before, and it did not work for you, it could well be the software’s fault, not yours. Or it could be the poor method you were following. The method that we developed in Da Silva Moore, where my firm represented the defense, was a version 1.0 method. Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182, 183 (S.D.N.Y. 2012). We have come a long way since then. We have eliminated unnecessary random control sets and gone to continuous training, instead of train then review. This is spelled out in the TARcourse.com program, which teaches our latest version 4.0 techniques.

The new 4.0 methods are not hard to follow. The TARcourse.com puts our methods online and even teaches the theory and practice. And the 4.0 methods certainly will work. We have proven that at TREC, but only if you have good software. With just a little training, and some help at first from consultants (most vendors with bona fide active machine learning features will have good ones to help), you can have the kind of success and excitement that I am talking about.

Do not give up if it does not work for you the first time, especially in a complex project. Try another vendor instead, one that may have better software and better consultants. Also, be sure that your consultants are Predictive Coding 4.0 experts, and that you follow their advice. Finally, remember that the cheapest software is almost never the best, and, in the long run, will cost you a small fortune in wasted time and frustration.

Conclusion

Love what you do. It is a great feeling and a surefire way to job satisfaction and success. With these new predictive coding technologies it is easier than ever to love e-discovery. Try them out. Treat yourself to the AI high that comes from using smart machine learning software and fast computers. There is nothing else like it. If you switch to the 4.0 methods and software, you too can know that thrill. You can watch an advanced intelligence, which you helped create, exceed your own abilities, exceed anyone’s abilities. You can sit back and watch Mr. EDR complete your search for you. You can watch him do so in record time and with record results. It is amazing to see good software find documents that you know you would never have found on your own.

Predictive coding AI in superhero mode can be exciting to watch. Why deprive yourself of that? Who says document review has to be slow and boring? Start making the practice of law fun again.

__________

The author can be reached at Ralph.Losey@gmail.com or at work at Ralph.Losey@JacksonLewis.com. Consultations by the author related to predictive coding, e-discovery or any other for-pay services are provided exclusively to current clients of the author’s law firm, Jackson Lewis P.C.

