e-Discovery Team at TREC 2015 Total Recall Track

Ralph C. Losey∗
National e-Discovery Counsel, Jackson Lewis P.C.
e-DiscoveryTeam.com
[email protected]

Jim Sullivan and Tony Reichenberger
Sr. Discovery Services Consultants, Kroll Ontrack, Inc.
eDiscovery.com
[email protected] [email protected]

ABSTRACT

The 2015 TREC Total Recall Track provided instant relevance feedback in thirty prejudged topics searching three different datasets. The e-Discovery Team of three attorneys specializing in legal search participated in all thirty topics using Kroll Ontrack’s search and review software, eDiscovery.com Review (EDR). They employed a hybrid approach to continuous active learning that uses both manual and automatic searches. A variety of manual search methods were used to find training documents, including high probability ranked documents and keywords, an ad hoc process the Team calls multimodal. In the one topic (109) requiring legal analysis the Team’s approach was significantly more effective than all other participants, including the fully automated approaches that otherwise attained comparable scores. In all topics the Team’s hybrid multimodal method consistently attained the highest F1 values at the time of Reasonable Call, equivalent to a stop point. In all topics the Team’s multimodal human machine approach also found relevant documents more quickly and with greater precision than the fully automated or other methods.

Categories and Subject Descriptors: H.3.3 Information Search and Retrieval: Search process, relevance feedback, supervised learning, best practices.

Keywords: Hybrid Multimodal; AI-enhanced review; predictive coding; predictive coding 3.0; electronic discovery; e-discovery; legal search; active machine learning; continuous active learning; CAL; Computer-assisted review; CAR; Technology-assisted review; TAR; relevant irrelevant training ratios.

1. INTRODUCTION

The e-Discovery Team participated in all thirty Total Recall Track topics in the Athome group where both manual and automatic methods were permitted. The Team is composed of three practicing attorneys who specialize in legal search. They used Kroll Ontrack’s search and review software, eDiscovery.com Review (“EDR”), employing what they call a hybrid multimodal method.1 They attained high recall and precision in most of the thirty topics. The few exceptions appear derived from the fact that the attorneys are accustomed to self-defining the ground truth, and, in some topics, their opinions on relevance differed significantly from the TREC assessors. In later topics the attorney Team learned to turn off their own judgments and rely primarily on their software’s automated processes, which generally led to improved scores better matching the TREC relevance assessments. The Team’s manual efforts, as measured by time expended and number of documents manually reviewed, were very low by legal search standards.

∗ The views expressed herein are solely those of the author, Ralph Losey, and should not be attributed to his firm or its clients.



The fully automatic methods employed by the Sandbox group participants in the Total Recall Track attained comparable high recall and precision in most topics. The Team’s hybrid multimodal method did, however, consistently attain the highest F1 values at the time of Reasonable Call, equivalent to a training stop point, which is very important to legal search. One of the thirty topics, Topic 109 (Scarlet Letter Law), required a small amount of legal knowledge and analysis to understand relevance (most of the others required none). On this topic our legal team, as you would expect, attained significantly better results than the fully automated methods that contained no base legal knowledge.

The e-Discovery Team’s hybrid multimodal method is a type of continuous active learning text retrieval system that employs supervised machine learning and a variety of manual search methods.2,3 The Team attained very high recall and precision rates in most, but not all, of the thirty Total Recall topics. The Team’s F1 scores at the time of Reasonable Call ranged from a perfect score of 100% in one topic (3484), to 91% to 99% in eight topics, and 82%-87% in five others. Although, of course, not directly comparable, these scores are far higher than any previously recorded in the six years of TREC Legal Track (2006-2011) or any other study of legal search. One reason for this may be that the thirty topics in the 2015 Total Recall track presented relatively simple information needs by legal search standards, with one exception (Topic 109 – Scarlet Letter Law). Another may be improved software and the Team’s improved hybrid multimodal method that includes continuous active learning.
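For readers less familiar with the measures quoted above, the following minimal sketch shows how precision, recall and F1 are conventionally computed from a set of submitted documents and a set of documents judged relevant. The function name and the toy document IDs are hypothetical and are not taken from any TREC topic; this is not the Track's own scoring code.

```python
def precision_recall_f1(submitted, relevant):
    """Return (precision, recall, F1) for a collection of submitted document IDs."""
    submitted, relevant = set(submitted), set(relevant)
    true_positives = len(submitted & relevant)
    precision = true_positives / len(submitted) if submitted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: 100 relevant documents; 90 of them submitted
# together with 10 irrelevant documents at the stop point.
relevant = set(range(100))
submitted = set(range(90)) | set(range(1000, 1010))
precision, recall, f1 = precision_recall_f1(submitted, relevant)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, it is high only when precision and recall are both high at the stop point, which is why it is a natural measure for a Reasonable Call.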

The e-Discovery Team was able to find the target relevant documents in all thirty topics with relatively little human effort and almost no legal analysis. Only Topic 109 required legal knowledge and analysis, with four others (101, 105, 106, 107) requiring some small measure of analysis.

A total of 16,576,798 documents were classified in thirty topics. Of these documents 70,414 were predetermined by TREC assessors to be relevant. The e-Discovery Team found these relevant documents by manual review of only 32,916 documents. The other 37,498 relevant documents were found with no human review of these documents.

1.1 Total Recall Track Description – Athome and Sandbox.

The Total Recall track offered 30 different pre-judged topics for search in two different divisions, Athome and Sandbox. Our Team only participated in the Athome experiments. In the Athome experiments the data was loaded onto the participants’ own computers. There were no restrictions on the types of searches that could be performed. The setup allowed the e-Discovery Team to use a slightly modified version of our standard Hybrid Multimodal method, which, as mentioned, employs both ad hoc manual review and machine learning.

The Sandbox participants were only permitted to use fully automated systems and the data remained on TREC administrator computers. They searched the same three datasets as Athome, plus two more not included in the Athome division due to confidentiality restrictions. The Sandbox participants were prohibited from any manual review of documents or ad hoc search adjustments.4 Even after the submissions ended, the Sandbox participants reported at the Conference that they never looked at any documents, even the unrestricted Athome shared datasets. They never made any effort to determine where their software made errors in predicting relevance, or for any other reasons. To these participants, all of whom were academic institutions, the ground truth itself was of no relevance.

Three different datasets were searched in both the Athome and Sandbox events, with the same ten topics in each. Even though the data searched and topics overlapped in the two divisions, none of the participants in one division participated in the other division. This is unfortunate because it makes direct comparisons problematic, if not impossible, especially as to the software systems used. It is hoped that some participants will participate in both events in future Total Recall tracks.

The e-Discovery Team participated in all thirty of the Athome topics. We were the only manual participant to do so, with all others completing ten or fewer topics. The lack of participation by others in the Athome group also makes meaningful comparisons very difficult or impossible, but we note that the e-Discovery Team’s scores were consistently higher than those of any other Athome participants.

Athome participants were asked to track and report their manual efforts. The e-Discovery Team did this by recording the number of documents that were human reviewed and classified. Virtually all documents human reviewed were also classified, although not all documents classified were used for active training of the software classifier. Moreover, 53% of the relevant documents used for training were never human reviewed. We also tracked effort by number of attorney hours worked, as is traditional in legal services.

The Team used Kroll Ontrack’s software, known as eDiscovery.com Review, or EDR, which includes active machine learning features, a/k/a predictive coding in legal search. EDR employs a proprietary probabilistic type of logistic regression algorithm for document classification and ranking.

The Athome participants used their own computer systems and software for search, and then submitted documents to the TREC administrator that they considered relevant. TREC set up a “jig” whereby instant feedback was provided to a participant as to whether each document submitted as relevant was in fact previously judged to have been relevant by TREC assessors. When a participant determined that a reasonable effort had been made to find all relevant documents required, which is important in legal search and represents a stopping point for further machine training and document review, they would notify TREC of this supposition and “Call Reasonable.” Continued submissions were made after that point so that all documents were classified as either relevant or irrelevant. The goal as we understood it was to submit as many relevant documents as possible before the Reasonable Call, and thereafter to have all false negatives appear in submissions as soon after the Reasonable Call as possible.
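The submission protocol described above can be pictured with a small toy model. The class below is a hypothetical stand-in, written only for illustration; its name, methods and document IDs are assumptions and do not reflect the actual TREC jig interface.

```python
class FeedbackJig:
    """Toy stand-in for the TREC assessment "jig" (hypothetical interface)."""

    def __init__(self, relevant_ids):
        self._relevant = set(relevant_ids)   # prejudged ground truth
        self.submitted = []                  # every submission, in order
        self.call_index = None               # submissions made before the call

    def submit(self, doc_id):
        """Record a submission and return instant feedback (True = relevant)."""
        self.submitted.append(doc_id)
        return doc_id in self._relevant

    def call_reasonable(self):
        """Mark the participant's stop point; submissions may continue after it."""
        if self.call_index is None:
            self.call_index = len(self.submitted)

    def recall_at_call(self):
        """Fraction of relevant documents submitted before the Reasonable Call."""
        before_call = set(self.submitted[: self.call_index])
        return len(before_call & self._relevant) / len(self._relevant)


jig = FeedbackJig(relevant_ids={"d1", "d2", "d3", "d4"})
feedback = [jig.submit(doc) for doc in ["d1", "d9", "d2", "d3"]]
jig.call_reasonable()
jig.submit("d4")  # a false negative that surfaces after the call
print(feedback, jig.recall_at_call())
```

In this toy run three of the four relevant documents are submitted before the call, so recall at the Reasonable Call is 0.75, and the remaining relevant document appears in the first submission afterward.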
Most of the thirty topics presented only simple, single-issue information needs suitable for single-facet classification. Further, only a few of the topics required any legal analysis for relevance identification. These two factors, plus the omission of metadata, were, we think, a disadvantage to the e-Discovery Team of lawyers. Conversely, it appears that these same factors made it simpler for the academic Sandbox participants to perform well in most topics using fully automated methods. It should also be noted that although our lawyer Team was practiced and skilled in complex information needs requiring extensive legal analysis, and had long experience with projects using ground truths defined by a subject matter expert (SME), none had any prior experience using machine learning for the types of searches presented in the 2015 Recall Track.

The one exception that brought in legal analysis with beneficial SME analysis was Topic 109, Scarlet Letter Law. It required some legal knowledge, albeit very rudimentary, to begin locating relevant documents. The keywords alone, “Scarlet Letter Law,” would only find relevant documents with this word combination and similar text patterns. These words were just the nickname of the proposed and eventually enacted Florida Statute. Any attorney would know that to find relevant information they would not only have to search the name, but they would also have to search the various house and senate bill numbers for this law. These numbers would not often appear in the same document as the nickname, and since the machine did not know to search for these numbers, it did not realize the significance. Eventually the automated machine learning did see the connection, after many relevance feedback submissions. These submissions and instant feedback of relevant, or not, would, of course, not happen in real legal search.

1.2 Governor Bush Email

The first set of Athome Topics searched a corpus of 290,099 emails of Florida Governor Jeb Bush. Most of the metadata of these emails and associated attachments and images had been stripped and converted to pure text files. This increased the difficulty of the Team’s search, which normally includes a mixture of metadata specific searches.

A significant percentage of the Bush emails were form type lobbying emails from constituents, which repeated the same language with little or no variance. The unusually high prevalence of near-duplicate emails made search of many of the Bush topics easier than is typical in legal search.

The ten Bush email topics searched, and their names, which were the only guidance on relevance provided to either the Athome or Sandbox participants, are shown below.

Topic 100  School and Preschool Funding
Topic 101  Judicial Selection
Topic 102  Capital Punishment
Topic 103  Manatee Protection
Topic 104  New Medical Schools
Topic 105  Affirmative Action
Topic 106  Terri Schiavo
Topic 107  Tort Reform
Topic 108  Manatee County
Topic 109  Scarlet Letter Law

E-Discovery Team leader, Ralph Losey, a lifelong Florida native, personally searched each of these ten Topics. In about half of the topics his personal knowledge of the issues was helpful, but in several others it was detrimental. He had definite preconceptions of what emails he thought should be relevant and these sometimes differed significantly from the TREC assessors. In all of the Bush Topics Losey was at least somewhat assisted by a single “contract review attorney.”5 The contract attorneys in most of these ten Topics did a majority of the document review under Losey’s very close supervision, but had only limited involvement in initial keyword searches, and no involvement in predictive coding searches or related decisions.

All participants in the 2015 Recall Track were required to complete all ten of the Bush Email Topics. Completion of the other twenty Topics in the two other data collections was optional. Several participants started review of the Bush Topics, but did not finish, and thus were not permitted to submit a report or attend the TREC Conference. Only one other Athome participant, Catalyst, completed all ten Bush Topics. No other Athome participants even attempted the other twenty topics, and thus comparisons with the e-Discovery Team’s results are limited to the fully automatic participants.

1.3 Black Hat World Forums.

The second set of Athome Topics searched a corpus of 465,149 posts taken from Black Hat World Forums. Again, almost all metadata of these posts and associated images had been stripped and converted to pure text files. The ten topics searched, and their names, which again were the only guidance initially provided on relevance, are shown below.

Topic 2052  Paying for Amazon Book Reviews
Topic 2108  CAPTCHA Services
Topic 2129  Facebook Accounts
Topic 2130  Surely Bitcoins can be Used
Topic 2134  PayPal Accounts
Topic 2158  Using TOR for Anonymous Internet Browsing
Topic 2225  Rootkits
Topic 2322  Web Scraping
Topic 2333  Article Spinner Spinning
Topic 2461  Offshore Host Sites

The Team members again had expertise issues with some of these arcane topics that they happened to be familiar with. Their knowledge would sometimes prove detrimental. Again, as the review continued, the Team members learned to suspend their own knowledge and ground truth judgments and instead rely entirely on the automated ranking searches, much like the fully automated participants always necessarily did.

1.4 Local News Articles.

The third set of Athome Topics searched a corpus of 902,434 online Local News Articles, again in text only format. The ten topics searched, and their names, which again were the only guidance provided on relevance aside from the instant feedback, are shown below.

Topic 3089  Pickton Murders
Topic 3133  Pacific Gateway
Topic 3226  Traffic Enforcement Cameras
Topic 3290  Rooster Turkey Chicken Nuisance
Topic 3357  Occupy Vancouver
Topic 3378  Rob McKenna Gubernatorial Candidate
Topic 3423  Rob Ford Cut the Waist
Topic 3431  Kingston Mills Lock Murders
Topic 3481  Fracking
Topic 3484  Paul and Cathy Lee Martin

The Team found the News Articles less difficult to work with than our typical legal search of corporate ESI. Still, the same kind of ground truth validity and consistency issues were noted in some of the news topics, but to a lesser degree than the other two datasets.

1.5 E-Discovery Team’s Three Research Questions.

Our first and primary question was to determine: What Recall, Precision and Effort levels the e-Discovery Team would attain in TREC test conditions over all 30 Topics using the Team’s Predictive Coding 3.0 hybrid multimodal search methods and Kroll Ontrack’s software, eDiscovery.com Review (EDR).

Our secondary question was: How will the Team’s results using its semi-automated, supervised learning method compare with other Recall Track participants using semi-automated supervised or fully automated unsupervised learning methods.

Our last question was: What are the ideal ratios, if any, for relevant and irrelevant training examples to maximize effectiveness of active machine learning with EDR.

2. RELATED WORK

It is generally accepted in the legal search community that the use of predictive coding type search algorithms can improve the search and review of documents in legal proceedings.6 The use of predictive coding has also been approved, and even encouraged by various courts around the world, including numerous courts in the U.S.7

Although there is agreement on use of predictive coding, there is controversy and disagreement as to the most effective methods of use.8 There are, for instance, proponents for a variety of different methods to find training documents for predictive coding. Some advocate for the use of chance selection alone, others for the use of top ranked documents alone, others for a combination of top ranked and mid-level ranked documents where classification is unsure, and still others, including Losey, call for the use of a combination of all three of these selection processes and more.9 The latest respectful disagreement is between Losey’s e-Discovery Team, and the Administrators of the Total Recall Track, Grossman and Cormack, concerning the advisability of: 1) keeping attorney search experts in the loop, the hybrid approach, as opposed to the fully automated approach; and 2) using a variety of search methods, the multimodal approach, as opposed to reliance on high ranking documents alone for machine training.10

Some attorneys, predictive coding software vendors, and, apparently, Grossman and Cormack, advocate for the use of predictive coding search methods alone, and forego other search methods when they do so, such as keyword search, concept searches, similarity searches and linear review. E-Discovery Team members reject that approach and instead advocate for a hybrid multimodal approach that they call Predictive Coding 3.0, further described below. It uses all methods. As discussed in Endnote 2, we reject the notion of inherent lawyer bias that underlies some experts’ fully automated approaches, including, but to a lesser degree, Grossman and Cormack. We instead seek to augment and enhance attorney search experts, not automate and replace them. We do, however, favor certain safeguards against the propagation of errors, intentional or inadvertent, and advocate within the legal community for continuous active training of lawyers in search techniques and ethics.

Our participation in the 2015 TREC Total Recall Track, the research questions we posed, and the experiments we performed, were not in any manner designed or intended to attempt to resolve this current methodology dispute with the Administrators of this Track. In fact, it was only at the 2015 Conference that we fully understood the extent of these differences. Although Grossman and Cormack did individually participate in this Track, as well as administer it, and so too did other groups from Cormack’s university, they did not participate in the manual Athome division that we did. To our knowledge the Total Recall track was not designed to address this newly emerging disagreement in preferred methodologies, nor advance any one particular methodology. Still, we would concede that, subject to normal caveats, some indirect lessons can be derived on this issue from the Total Recall Track results.


3. HYBRID MULTIMODAL APPROACH

The e-Discovery Team approach includes all types of search methods, with primary reliance placed on predictive coding and the use of high-ranked documents for continuous active training. In that way it is similar to the approach used by Grossman and Cormack,11 but differs in that the Team uses a multimodal selection of search methods to locate suitable training documents, including high ranking documents, some mid-level ranked uncertain documents, and all other search methods, including keyword search, similarity search, concept search and even occasional use of linear review and random searches. The various types of searches usually included in the Team’s multimodal approach are shown in the search pyramid, below.
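The idea of drawing training candidates from several pools can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function name `select_training_batch`, its parameters, the pool sizes and the toy scores are hypothetical, and EDR's actual ranking and selection features are proprietary and not described in this paper.

```python
import random


def select_training_batch(scores, reviewed, keyword_hits=(),
                          n_top=2, n_uncertain=2, n_random=1, seed=0):
    """Pick unreviewed documents for human review from four pools.

    scores: dict mapping doc ID -> predicted probability of relevance (0..1).
    reviewed: set of doc IDs that have already been human reviewed.
    keyword_hits: doc IDs returned by a separate (hypothetical) keyword search.
    """
    pool = [d for d in scores if d not in reviewed]
    by_rank = sorted(pool, key=lambda d: scores[d], reverse=True)

    # 1) Highest-ranked documents: the primary source of training examples.
    top = by_rank[:n_top]
    picked = set(top)

    # 2) Mid-ranked documents, where the classifier is least certain (near 0.5).
    uncertain = sorted((d for d in by_rank if d not in picked),
                       key=lambda d: abs(scores[d] - 0.5))[:n_uncertain]
    picked.update(uncertain)

    # 3) Documents surfaced by another search method, here keyword hits.
    keyword = [d for d in keyword_hits
               if d in scores and d not in reviewed and d not in picked]
    picked.update(keyword)

    # 4) An occasional random draw from whatever remains.
    rest = [d for d in by_rank if d not in picked]
    rand = random.Random(seed).sample(rest, min(n_random, len(rest)))

    return {"top": top, "uncertain": uncertain, "keyword": keyword, "random": rand}


# Hypothetical scores for eight documents; "b" has already been reviewed.
scores = {"a": 0.97, "b": 0.91, "c": 0.88, "d": 0.55,
          "e": 0.48, "f": 0.30, "g": 0.05, "h": 0.02}
batch = select_training_batch(scores, reviewed={"b"}, keyword_hits=["g", "a"])
print(batch)
```

A selector that relied on high-ranking documents alone would stop after step 1; the other pools are what make this sketch multimodal. In a continuous active learning loop the reviewed batch would be fed back as training examples and the documents re-ranked before the next round.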

The standard eight-step workflow used by the Team in legal search projects is shown in the diagram below. A step-by-step description of the workflow can be found in e-Discovery Team writings.12 The application of this methodology can be seen in the Team’s description of their work in each of the thirty Topics that is included in the Appendix. Our usual steps One, Three and Seven had to be omitted or severely constrained to meet the TREC experiment format.


Standard steps Three and Seven of the workflow were omitted to meet the time requirements of completing every review project in 1.5 days. Skipping these steps allowed us to complete 30 review projects in 45 days in the Team’s spare time, but had a detrimental impact.

Our usual first step, ESI Discovery Communications, is where our information needs are established. This had to be omitted to fit the format of the Recall Track Athome experiments. The only communication under the TREC protocol was a very short, often just two-word description of relevance, plus instant feedback in the form of yes or no responses as to whether particular documents submitted were relevant. In the e-Discovery Team’s typical workflow discovery communications typically involve: 1) detailed requests for information contained in court documents such as subpoenas or a Request For Production; 2) input from a qualified SME, who is typically a legal expert with deep knowledge of the factual issues in the case and how the presiding judge in the legal proceeding will likely rule on borderline relevant issues; and, 3) dialogues with the client, witnesses, and with the party requesting the production of documents to clarify the search target.

The Team never receives a request for production with just two or three word descriptions as encountered in the TREC experiments. When the Team receives vague requests, which is common, the Team seeks clarification in discussions (Step One). In practice if there is disagreement as to relevance between the parties, which is also common, the presiding judge is asked to make relevance rulings. Again, none of this was possible in the TREC experiments.
All of our usual practices in Step One had to be adjusted to the submissions format of the 30 Athome Topics. The most profound impact of these adjustments was that the attorneys on the Team often lacked a clear understanding as to the intended scope of relevance and the rationale behind the automated TREC relevance rulings on particular documents. These protocol changes had the impact of minimizing the importance of the SME role on the active machine learning process. Instead, this role was often shifted almost entirely to the analytics of the EDR software. The software analytics could often see patterns, and correctly predict relevance, that the human attorney reviewers could not (often, but not always, because the human reviewers disagreed with the TREC assessors’ human judgment of ground truth in several topics, and otherwise could not follow or see any logic to the documents returned as relevant).

This minimization of the importance of the SME role is not common in legal search where attorney reviewers always have some sort of understanding of relevance. The role of the SME in the Team’s decades of experience in legal search has always been important to help ensure high quality, trustworthy results. Contrary to the unfortunate popular belief among laypersons going back to the time of Shakespeare,13 the vast majority of legal professionals maintain very high standards of ethics and trustworthiness. In spite of the alleged negative influences of the centuries old adversarial tradition of the common law, attorneys are dedicated to uncovering the truth, the whole truth, and nothing but the truth, regardless of the particular case impact. Any notion of inherent bias by attorneys is misplaced. It is, after all, attorneys who control the discovery process and define relevance, and attorneys, not robots or scientists, who make the production of relevant documents to the other side.14

Scientific research is better served when driven by reason and objective measurements, not prejudices and assumptions about an entire profession and our common law system of justice, based as it is on an adversarial truth seeking process. The e-Discovery Team will continue to look for ways to improve quality control, and guard against inadvertent errors, which always exist in any human endeavor, and identify intentional errors, which rarely exist in legal search, but, we concede, may sometimes take place. For that reason we will explore greater reliance on automated processes in our future research and other quality control techniques.15 We will not, however, abandon a hybrid approach where a human remains, if not in control, then at least as an active partner, out of any subjective prejudices against lawyers. We also refuse to accept the unproven assumption that our adversarial system is inherently suspect, encourages bias, and otherwise requires that humans be removed from e-discovery and replaced by robots. Conversely, we do not naively assume lawyers are automatically superior to machines. We have long advocated against the current legal standard of only using manual review of every document. The Team’s hybrid approach aims for a proportional balance.

4. EXPERIMENTS AND DISCUSSIONS

The e-Discovery Team sought to answer the three previously listed Research Questions in its experiments at the 2015 TREC Total Recall Track.

4.1 First and Primary Research Question.

What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all 30 Topics using the Team’s Predictive Coding 3.0 hybrid multimodal search methods and Kroll Ontrack’s software, eDiscovery.com Review (EDR).

We primarily measured effort by the number of documents that were actually human-reviewed and coded relevant or irrelevant. The Team human-reviewed only 32,916 documents to classify 16,576,798 documents. As an additional measure of effort, we estimated our total time spent on all Topics. The Team spent 45 days doing all of the work, with an estimated average of 8 hours per day total expended by the Team. (All Team members carried on their normal employment activities on only a somewhat reduced basis during the 45 days of the review, and TREC work was also reduced on most weekends.) The estimated total hours spent by Team members for both analysis and review is thus approximately 360 hours.

It is typical in legal search to try to measure the efficiency of a document review by the number of documents classified in an hour. For instance, a typical contract review attorney can classify an average of 50 documents per hour. Here using Predictive Coding 3.0 our Team classified 16,576,798 documents in 360 hours. That is an average speed of 46,047 files per hour.


In legal search it is also typical, indeed mandatory, to measure the costs of review and bill clients accordingly. If we here assume a high attorney hourly rate of $500 per hour, then the total cost of the review of all 30 Topics would be $180,000. That is a cost of about $0.01 per document. In a traditional legal review, where a lawyer reviews one document at a time, the cost would be far higher. Even if you assume a low attorney rate of $50 per hour, and review speed of 50 files per hour, the total cost to review would be $16,576,798. That is a cost of $1.00 per document, which is actually low by legal search standards.16
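The effort and cost figures above follow from a few lines of arithmetic, reproduced here as a check. The inputs are the numbers stated in the text (16,576,798 documents, 45 days at an estimated 8 hours per day, hourly rates of $500 and $50, and a linear review speed of 50 documents per hour); the variable names are illustrative only.

```python
DOCS_CLASSIFIED = 16_576_798      # documents classified across all 30 topics
TEAM_HOURS = 45 * 8               # 45 days at an estimated 8 hours per day
HIGH_RATE = 500                   # assumed high attorney rate, dollars per hour
LOW_RATE = 50                     # assumed low attorney rate, dollars per hour
LINEAR_DOCS_PER_HOUR = 50         # typical contract reviewer speed

# Predictive coding review, as performed by the Team
docs_per_hour = DOCS_CLASSIFIED / TEAM_HOURS
team_cost = TEAM_HOURS * HIGH_RATE
team_cost_per_doc = team_cost / DOCS_CLASSIFIED

# Hypothetical traditional linear review of every document
linear_hours = DOCS_CLASSIFIED / LINEAR_DOCS_PER_HOUR
linear_cost = linear_hours * LOW_RATE
linear_cost_per_doc = linear_cost / DOCS_CLASSIFIED

print(f"Team: {docs_per_hour:,.0f} docs/hour, ${team_cost:,} total, "
      f"${team_cost_per_doc:.4f} per document")
print(f"Linear: {linear_hours:,.0f} hours, ${linear_cost:,.0f} total, "
      f"${linear_cost_per_doc:.2f} per document")
```

On these inputs the per-document cost of the Team's review works out to roughly one cent, against one dollar per document for the linear review.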

Analysis of project duration is also very important in legal search. Instead of the 360 hours expended by our Team using Predictive Coding 3.0, traditional linear review would have taken 331,536 hours (16,576,798/50). In other words, what we did in 45 days, taking 360 hours, would have taken a team of two lawyers using traditional methods over 45 years.

Complete details and descriptions of the ad hoc methods employed in all thirty topics are included in the Appendix.

4.2 Research Question No. 2.

How will the Team’s results using its semi-automated, supervised learning method compare with other Recall Track participants using semi-automated supervised learning methods.

Unfortunately no other Athome participants completed all thirty topics and only one completed all ten Bush email topics. The lack of participation by others in the Athome group makes meaningful comparisons very difficult or impossible, but we note that the e-Discovery Team’s scores were consistently higher than any other Athome participants.

The Sandbox participants’ work included the same three datasets as Athome, but none of them also participated in the Athome division. This is unfortunate because it makes direct comparisons problematic, if not impossible, especially as to the software systems used. Still, with some caveats, a few limited comparisons are possible between the two divisions because the same topics and datasets were searched.

4.3 Research Question No. 3.

What are the ideal ratios, if any, for relevant and irrelevant training examples to maximize effectiveness of active machine learning with EDR.

The Team experimented with various positive and negative training ratios using the predictive coding training features of their software. Most of these experiments were post hoc, but some were carried out during the initial TREC submissions. In some of the thirty topics our review work would have been concluded earlier but for these side experiments.

5. RESULTS

5.1 Research Question No. 1.
The TREC measured results demonstrated high levels of Recall and Precision with relatively little human review efforts using the e-Discovery Team’s methods and EDR. The three-man attorney Team was able to review and classify 16,576,798 documents in 45 days under difficult TREC test conditions. They attained total Recall of all relevant documents in all 30 Topics by human review of only 32,916 documents. They did so with two-man attorney teams in the 10 Bush Email Topics, and one-attorney teams in the 20 other Topics. In Topic 3484, which searched a collection of 902,434 News Articles, the Team attained both 100% Recall and 100% Precision. On many other Topics the Team attained near perfection scores. In total, very high scores were recorded in 18 of the 30 topics with good results obtained in all, especially when considering the low human efforts involved in the supervised learning. Moreover, the Team’s F1 scores at the time of Reasonable Call ranged from a perfect score of 100% in Topic 3484, to 91% to 99% in eight topics, and 82%-87% in five others.


Considering the limited human effort put into the reviews, and the speed of the reviews, we consider the results in all Topics to be excellent. As shown by the comparisons with traditional review discussed above, these results are far superior to the typical linear legal document review done by law firm attorneys and contract review attorneys.

The efforts by number of documents human reviewed in all thirty topics are shown in the chart below, Figure 1. As you can see, the Team reviewed 32,916 documents to attain total recall of the 70,414 documents predetermined by TREC as relevant in all 30 Topics from out of a total of 16,576,798 documents. The average number of documents reviewed to attain total Recall in each topic was 1,097. The figure ranged from a low of 19 documents reviewed in Topic 2134 (PayPal), which had 252 relevant documents, to a high of 7,203 in Topic 103 (Manatee Protection), which had 5,725 relevant documents.
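The summary statistics quoted above (total, average, low and high) can be recomputed from the per-topic counts in the 100% Recall column of Figure 1. The short check below copies those thirty values; the variable names are illustrative only.

```python
# Documents human reviewed to reach 100% Recall, by topic (Figure 1, last column)
reviewed_at_full_recall = {
    100: 651, 101: 6896, 102: 1493, 103: 7203, 104: 1091,
    105: 674, 106: 2226, 107: 1164, 108: 696, 109: 753,
    2052: 2325, 2108: 2101, 2129: 94, 2130: 285, 2134: 19,
    2158: 1335, 2225: 225, 2322: 195, 2333: 228, 2461: 32,
    3089: 836, 3133: 49, 3226: 81, 3290: 310, 3357: 920,
    3378: 200, 3423: 92, 3431: 302, 3481: 367, 3484: 73,
}

total_reviewed = sum(reviewed_at_full_recall.values())
average_reviewed = total_reviewed / len(reviewed_at_full_recall)
lowest_topic = min(reviewed_at_full_recall, key=reviewed_at_full_recall.get)
highest_topic = max(reviewed_at_full_recall, key=reviewed_at_full_recall.get)

print(f"total={total_reviewed:,} average={average_reviewed:,.0f}")
print(f"low: Topic {lowest_topic} ({reviewed_at_full_recall[lowest_topic]:,})")
print(f"high: Topic {highest_topic} ({reviewed_at_full_recall[highest_topic]:,})")
```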

TheTeam’sattainmentofhighlevelsofRecallandPrecisioninmultipleprojectsconfirmsthehypothesisthatEDRsoftwareandtheTeam’sPredictiveCoding3.0hybridmultimodalmethodsareeffectiveinmostprojectsatattaininghighlevelsofRecallandPrecisionwithminimalhumanefforts. ThebelowchartssummarizeforeachofthethreedatasetsthePrecisionresultsobtainedineachtopicat70%orhigherRecalllevels.PrecisionisshownontheleftandRecalllevelsattainedbysubmissionsareshownonthebottom.AdifferentcoloredlineshowseachTopic.AlthoughPrecisionwasnotthefocusoftheeffortsintheTeam’sRecallTrackparticipation,insteadthe

Topic NeedTotal

DocumentsTotal

Relevant 70% 80% 90% 95% 97.5% 100%

Topic 100  School and Preschool Funding  290,099  4,542  651  651  651  651  651  651
Topic 101  Judicial Selection  290,099  5,834  6,841  6,895  6,895  6,895  6,895  6,896
Topic 102  Capital Punishment  290,099  1,624  1,493  1,493  1,493  1,493  1,493  1,493
Topic 103  Manatee Protection  290,099  5,725  7,203  7,203  7,203  7,203  7,203  7,203
Topic 104  New Medical Schools  290,099  227  1,091  1,091  1,091  1,091  1,091  1,091
Topic 105  Affirmative Action  290,099  3,635  582  582  582  674  674  674
Topic 106  Terri Schiavo  290,099  17,135  831  1,987  1,995  2,005  2,025  2,226
Topic 107  Tort Reform  290,099  2,369  877  1,142  1,164  1,164  1,164  1,164
Topic 108  Manatee County  290,099  2,375  696  696  696  696  696  696
Topic 109  Scarlet Letter Law  290,099  506  491  496  639  753  753  753
Topic 2052  Paying for Amazon Book Reviews  465,147  265  1,842  1,960  2,213  2,325  2,325  2,325
Topic 2108  CAPTCHA Services  465,147  656  2,101  2,101  2,101  2,101  2,101  2,101
Topic 2129  Facebook Accounts  465,147  589  94  94  94  94  94  94
Topic 2130  Surely Bitcoins can be Used  465,147  2,299  283  283  285  285  285  285
Topic 2134  Paypal Accounts  465,147  252  19  19  19  19  19  19
Topic 2158  Using TOR for Anonymous Internet Browsing  465,147  1,261  1,332  1,332  1,332  1,332  1,332  1,335
Topic 2225  Rootkits  465,147  182  183  186  205  214  219  225
Topic 2322  Web Scraping  465,147  10,145  194  195  195  195  195  195
Topic 2333  Article Spinner Spinning  465,147  4,805  190  228  228  228  228  228
Topic 2461  Offshore Host Sites  465,147  179  32  32  32  32  32  32
Topic 3089  Pickton Murders  902,434  255  472  516  779  834  834  836
Topic 3133  Pacific Gateway  902,434  113  49  49  49  49  49  49
Topic 3226  Traffic Enforcement Cameras  902,434  2,094  18  18  18  78  81  81
Topic 3290  Rooster Turkey Chicken Nuisance  902,434  26  137  191  306  306  310  310
Topic 3357  Occupy Vancouver  902,434  629  751  751  920  920  920  920
Topic 3378  Rob McKenna Gubernatorial Candidate  902,434  66  79  161  200  200  200  200
Topic 3423  Rob Ford Cut the Waist  902,434  76  92  92  92  92  92  92
Topic 3431  Kingston Mills Lock Murders  902,434  1,111  272  272  272  272  272  302
Topic 3481  Fracking  902,434  1,966  31  236  367  367  367  367
Topic 3484  Paul and Cathy Lee Martin  902,434  23  22  22  22  22  73  73
TOTALS  16,576,800  70,964  28,949  30,974  32,138  32,590  32,673  32,916

Figure 1: Effort (Docs reviewed) by Recall Scores
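The prevalence figures quoted for individual topics can be checked against the collection size and relevant-document counts in the table above. The following is a minimal sketch, not part of the Team's tooling; the function name `prevalence` and the `TOPICS` dictionary are illustrative assumptions, and only three rows of the table are copied in.

```python
# Collection size and total relevant documents, copied from the table above.
# The dictionary and function names are illustrative, not from the paper.
TOPICS = {
    "Topic 101 Judicial Selection": (290_099, 5_834),
    "Topic 103 Manatee Protection": (290_099, 5_725),
    "Topic 106 Terri Schiavo": (290_099, 17_135),
}


def prevalence(collection_size: int, relevant: int) -> float:
    """Return the fraction of the collection that is relevant."""
    if collection_size <= 0:
        raise ValueError("collection_size must be positive")
    return relevant / collection_size


if __name__ == "__main__":
    for name, (size, relevant) in TOPICS.items():
        print(f"{name}: {prevalence(size, relevant):.2%}")
```

Prevalence here is simply relevant documents divided by collection size, which matches the rounded percentages quoted for Topics 103 and 106.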


focus was on Recall and effort, still the measurements of Precision across the Recall levels provide valuable insights into the overall work. Figure 2 below shows the results of the 10 Topics in the Jeb Bush Email collection of 290,099 emails. Figure 3 shows the results of the 10 Topics in the Black Hat World Forum collection of 465,149 posts, and Figure 4 shows the results of the News Articles collection of 902,434 articles.

Figure 1

A quick exam of the results of the Bush Email Topics shows that four of the ten Topics had significantly less Precision in attaining 80% or higher Recall than the others. They are: Topic 104 New Medical Schools, shown in purple; Topic 100 School and Preschool Funding, shown in blue; Topic 102 Capital Punishment, shown in green; and Topic 108 Manatee County. Topic 108 was probably the most error-filled of all of the Topic standards, and this may explain part of the outlier results for that topic and others in this low performing group. Investigation of the outliers showed that the primary cause of these results was disagreement between the Team’s lead attorney for the Bush email, a life-long Florida resident who is used to serving as the SME defining ground truth, and the TREC assessors’ relevance determinations. Also, these ten Bush topics were carried out at the beginning of the project, before the Team adopted the counter strategy of greater reliance on machine ranking to mitigate the impact of the personal judgment disagreements.


Figure 2

Analysis of the results of the ten Topics in Black Hat World also indicated that the relevance disagreements accounted for most of the discrepancies. It appears that errors and inconsistencies in the TREC standard judging explain most of the Precision differences among the Topics, especially the Topics in the Black Hat World dataset. In several of these Topics the Team often had difficulty detecting any logical pattern to the relevance scope. They instead, as mentioned, had to rely almost entirely on the EDR relevance predictions. In some of these Topics only the Team’s software could detect any connectivity and pattern to the TREC relevance standards.

The results on the local News dataset of 902,434 articles (Figure 4 below) again show significant divergences in Precision, although less than the differences seen in the Bush Email or Black Hat World datasets. Analysis of the results of the ten News Articles Topics again shows considerable disagreement on relevance judgments in some topics. Inherent difficulty of the various issues in the Topics may also explain some of the differences. The size of the relevance pool also has a direct effect on the Precision.


Figure 3

The following results are highlights of the Team’s top 18 topics where at least seventy-five percent of the target documents (Recall 75%+) were found with a Precision rate of 80% or higher. The Top-18 Projects of the Team are ranked by us, somewhat arbitrarily, as follows, starting with a previously unheard of perfect score.

1. In Topic 3484 (Paul & Cathy Martin), the e-Discovery Team (Jim Sullivan) attained a perfect score of 100% Precision and 100% Recall. All 23 of the target documents were found in the first 23 documents submitted. Sullivan then called Reasonable after the 23rd relevant document was submitted and so played the perfect game. He predicted that the remaining 902,411 articles in the News collection would be irrelevant. Sullivan was right. The effort expended for perfection was his personal review of 73 news reports out of the total collection of 902,434. 100% Recall with 100% Precision in a large search project was previously thought impossible by most text retrieval experts.

2. In Topic 3431 (Kingston Mills Murders), 100% Recall was attained by the Team (Tony Reichenberger) with 82.3% Precision. He attained 97.5% Recall with a Precision of 98.9%, and 95% Recall with 99% Precision. The effort expended to reach 100% Recall was his personal review of 332 news reports out of the total collection of 902,434.

3. In Topic 106 (Terri Schiavo), which had the highest prevalence of any topic (5.9%), 98.47% Recall was attained by the Team (Ralph Losey) with 97.22% Precision. At that time, after submitting 2,025 documents, he called Reasonable. The F1 measure then attained was 97.84%. The effort, or number of documents reviewed and coded by Losey to attain this result, was 2,025 Bush emails, out of the total collection of 290,099, and total relevant of 17,135. A contract review attorney, whose standard billing rate is one-tenth that of Losey’s, assisted in the review effort. Losey also attained 99.7% Recall in this Topic with a Precision of 70%.

4. In Topic 2158 (Using TOR), the Team (Jim Sullivan) attained 97.5% Recall of the target while maintaining a Precision of 95%. He attained 95% Recall with a Precision of 98.4%, and 90% Recall with 99% Precision. The effort expended to reach 97.5% Recall was his personal review of 1,332 Black Hat Forum posts, out of the total collection of 465,149.

5. In Topic 103 (Manatee Protection), which had the third highest Prevalence of 1.97%, the Team (Ralph Losey) attained 97.5% Recall with a Precision of 90.6%, 95% Recall with a Precision of 98.8%, and 90% Recall with 99.3% Precision. The effort expended to reach 97.5% Recall was his personal review of 7,203 Bush emails, out of the total collection of 290,099. Again he was assisted by a contract review attorney. The high review count here is due to the fact that this is one of two projects where the Predictive Coding 3.0 second step of random sampling was included. This is also the first project undertaken.

6. In Topic 109 (Scarlet Letter Law), the Team (Ralph Losey) attained 97.5% Recall with 84.4% Precision, 95% Recall with 95.4% Precision, and 90% Recall with 96% Precision. The effort expended to reach 97.5% Recall was his personal review of 753 Bush emails, again out of the total collection of 290,099. One contract review attorney assisted.

7. In Topic 3378 (Rob McKenna), the Team (Tony Reichenberger) attained 100% Recall after the submission of only 192 documents and review of only 200 documents. This was a low prevalence Topic with only 66 relevant out of the total collection of 902,434. For these reasons the Precision was 34.31%, even though only 192 documents were submitted to attain 100% Recall.

The Team results exceeded expectations, where our Recall goal was 90%, in many additional Topics:

8. In Topic 3481 (Fracking), the Team (Jim Sullivan) attained 95% Recall with 95.2% Precision by reviewing only 367 news articles.

9. In Topic 105 (Affirmative Action), the Team (Ralph Losey) attained 90% Recall with 99.7% Precision by reviewing only 582 emails (one contract review attorney assisted).

10. In Topic 3089 (Pickton Murders), the Team (Joe White) attained 90% Recall with 97.9% Precision by reviewing only 779 articles. A 99.61% Recall level was attained with 54.98% Precision, again with review of only 799 articles.

11. In Topic 3226 (Traffic Cameras), the Team (Jim Sullivan) attained 90% Recall with 95.9% Precision by his personal review of only 18 news articles.

12. In Topic 101 (Judicial Selection), which had the second highest Prevalence rate of 2%, the Team (Ralph Losey) attained 90% Recall with 87.8% Precision by reviewing 6,895 emails (one contract review attorney assisted).

13. In Topic 3357 (Occupy Vancouver), the Team (Tony Reichenberger) attained 90% Recall with 82.4% Precision by reviewing only 920 news articles.

14. In Topic 107 (Tort Reform), the Team (Ralph Losey) attained 90% Recall with 80.9% Precision by reviewing only 1,164 emails (one contract review attorney assisted).

Four additional Topics also did quite well, and attained Recall levels over 75% with high Precision rates:

15. In Topic 2225 (Rootkits) the Team (Ralph Losey) attained 80% Recall with 88% Precision by reviewing only 186 forum posts.

16. In Topic 2333 (Article Spinner) the Team (Ralph Losey) attained 80% Recall with 79% Precision by reviewing only 228 forum posts.

17. In Topic 2052 (Paying for Book Reviews) the Team (Jim Sullivan) attained 80% Recall with 73.4% Precision by reviewing 1,960 forum posts.

18. In Topic 3133 (Pacific Gateway) the Team (Ralph Losey) attained 76.99% Recall with 89.69% Precision by reviewing only 49 News Articles.

Figure 5 below shows the recall and precision of these top 18 projects.
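The F1 values reported in this list are the harmonic mean of Precision and Recall. As a minimal sketch (the function name `f1_score` is our own and not part of EDR or the TREC tooling), the Topic 106 figure of 97.84% can be recomputed from the stated 97.22% Precision and 98.47% Recall:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision + recall == 0:
        return 0.0  # conventional value when nothing relevant was retrieved
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Topic 106 at the time of Reasonable Call, values as stated in item 3.
    print(f"Topic 106 F1: {f1_score(0.9722, 0.9847):.2%}")
    # Topic 3484: 100% Precision and 100% Recall give a perfect F1.
    print(f"Topic 3484 F1: {f1_score(1.0, 1.0):.2%}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, which is why a topic with high Recall but modest Precision scores well below its Recall figure.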


Figure 5

The Team’s lower performance in the other 12 projects was, according to our analysis, primarily caused by the fact that the attorney Team members are accustomed to self-defining the ground truth, and their opinions on relevance differed significantly from the TREC assessors. In later topics the attorney Team learned to turn off their own judgments and rely primarily on their software’s automated processes, at which point their scores improved. In all topics the machine learning of the Team’s EDR software was able to find documents that TREC would consider relevant, even where the human team members could see no connection. But in some topics the human searchers would be completely bewildered by the zigzag relevance scope shown by TREC’s response to submissions. The attorneys would not see any kind of logical connecting pattern to some of the documents that TREC determined to be relevant. Sometimes the attorneys only saw wrong answers and inconsistencies. Even though the attorneys could not see any pattern, they learned that their EDR software could often still find the patterns and correctly predict which documents TREC would label relevant. When this happened they would in effect turn all submission decisions over to EDR and only submit the highest-ranking documents. The cut-off point of ranking for submissions, be it top 5% or top 100 documents, or some other scheme, was still determined by the human in charge. That is part of the Team’s hybrid design.
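The cut-off step described above, where the software ranks the documents and a human chooses how far down the ranking to submit, can be illustrated with a short sketch. This is not EDR's interface, which is proprietary; the function `select_for_submission` and its parameters `top_n` and `top_fraction` are hypothetical names, and the scores stand in for whatever relevance probabilities a classifier produces.

```python
import math
from typing import List, Optional, Sequence


def select_for_submission(
    scores: Sequence[float],
    top_n: Optional[int] = None,
    top_fraction: Optional[float] = None,
) -> List[int]:
    """Return indices of the highest-scoring documents, best first.

    The human in charge supplies exactly one cut-off: an absolute count
    (e.g. top 100 documents) or a fraction of the ranking (e.g. top 5%).
    """
    if (top_n is None) == (top_fraction is None):
        raise ValueError("give exactly one of top_n or top_fraction")
    if top_fraction is not None:
        if not 0 < top_fraction <= 1:
            raise ValueError("top_fraction must be in (0, 1]")
        top_n = math.ceil(len(scores) * top_fraction)
    if top_n < 0:
        raise ValueError("top_n must be non-negative")
    # Rank document indices by descending score; ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    scores = [0.10, 0.90, 0.50, 0.70]
    print(select_for_submission(scores, top_n=2))
    print(select_for_submission(scores, top_fraction=0.5))
```

Keeping the cut-off as an explicit argument mirrors the hybrid design: the ranking is automated, while the decision about how much of it to submit stays with the reviewer.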
There are probably other explanations for the bottom twelve scoring topics aside from questionable TREC assessor adjudications, including: the data itself; the difficulty of the issues addressed in the Topic; relative performance of human reviewers; and the impact of the omission of Steps Three and Seven from the Team’s standard workflow to meet the 45 day time limitation, and the radical change to Step One. See: Concept Drift and Consistency: Two Keys to Document Review Quality, e-Discovery Team (Jan. 20, 2016). Not all of the Team’s inconsistencies were caused by differences of opinion on TREC relevance adjudications, only some. We appreciate the difficulty of creating interesting topics for such a diverse group of participants, most of whom used fully automated CAL approaches. We understand the inherent difficulties in setting a ground truth for prejudged relevance where the traditional TREC pooling methods could not be used. [17] In spite of our criticisms here, we overall have high praise and thanks for the TREC administrators’ tireless efforts and agree with the majority of the assessments they made under difficult, time constrained conditions.


Regardless of these issues and metric inconsistencies, the Team’s manual efforts, as measured by time expended and number of documents manually reviewed, were consistently very low in all topics. More than half of the relevant documents found were not manually reviewed. Instead, the Team was routinely able to delegate relevance coding to the EDR software, either by choice and convenience, or sometimes, as discussed, by necessity in the topics where the ground truth of relevance was unknown and incomprehensible to the attorneys. This result should shatter once and for all the already weakened legal search myth that all documents must be manually reviewed for relevance.

Although not directly comparable due to different test conditions, different searches, etc., the e-Discovery Team’s scores were far higher than any previously recorded in the six years of TREC Legal Track (2006-2011) [18] or any other study of legal search. [19] The results of Blair and Maron and TREC from 2007 to 2011 are summarized below in Figure 6 with F1 scores.

Figure 6

This is not a listing of the average score per year; such scores would be far, far lower. Rather this shows the very best effort attained by any participant in that year in any topic. These are the highest scores from each TREC year. Note how they compare with the Team’s high scores in 2015, Figure 7.

Figure 7


One reason for this significant jump in high scores may be that many of the thirty topics in the 2015 Total Recall Track presented relatively simple information needs by legal search standards, with one major exception, Topic 109 (Scarlet Letter Law). It required some legal knowledge and analysis. There were also four other minor exceptions, Topics 101, 105, 106 and 107, that required some measure of legal analysis. Another explanation may be improved software and the Team’s hybrid multimodal method that includes continuous active learning. The latter is strongly suggested because the results in Topic 109, as well as Topics 101, 105, 106 and 107, are close to typical legal search type projects and the Team’s results in these topics were all consistently high: Topic 109 (Scarlet Letter Law) - 95% F1 at Reasonable Call; Topic 101 (Judicial Selection) - 87% F1 at Reasonable Call; Topic 105 (Affirmative Action) - 95% F1 at Reasonable Call; Topic 106 (Terri Schiavo) - 98% F1 at Reasonable Call; Topic 107 (Tort Reform) - 84% F1 at Reasonable Call. This is shown in Figure 8 below.

Figure 8

5.2 Research Question No. 2.

The Team attained very high recall and precision rates in most, but not all, of the thirty Total Recall topics. The Team’s F1 scores at the time of Reasonable Call ranged from a perfect score of 100% in one topic (3484), to 91% to 99% in eight topics, and 82%-87% in five others.

Although, of course, not directly comparable, these scores are far higher than any previously recorded in the six years of TREC Legal Track (2006-2011) or any other study of legal search. One reason for this may be that the thirty topics in the 2015 Total Recall track presented relatively simple information needs by legal search standards, with one exception (Topic 109, Scarlet Letter Law). Another may be improved software and the Team’s hybrid multimodal method that includes continuous active learning.

Since most of the thirty topics presented only simple, single-issue information needs suitable for single-facet classification, they had somewhat limited value for purposes of legal search experimentation. Further, only a few of the topics required any legal analysis for relevance identification. This again limited the use of these experiments for purposes of legal search research. These two factors, plus the omission of metadata, were a disadvantage to the e-Discovery Team of lawyers who are practiced in more complex information needs requiring extensive legal analysis and SME defined ground truths. Further, their methods and EDR software are designed to utilize full metadata derived from native files. Conversely, it appears that these same factors made it simpler for the Sandbox participants to perform well in most topics.

The one exception was Topic 109, Scarlet Letter Law, which, as mentioned, was the only topic requiring legal analysis and some very rudimentary knowledge to begin locating relevant documents. The keywords alone, “Scarlet Letter Law,” would only find relevant documents with this word combination and similar text patterns. These words were just the nickname of the proposed and eventually enacted Florida Statute. Any attorney would know that to find relevant information they would not only have to search the name, they would have to search the various house and senate bill numbers for this law. These numbers would not often appear in the same document as the nickname, and since the machine did not know to search for these numbers, it did not realize the significance. Eventually the automated machine learning saw the connection, after many relevance feedback submissions. These submissions would, of course, not happen in real legal search, and even if they did, this imprecision would equate to substantial additional human reviews and thus expense.

Somewhat surprisingly to us, the fully automatic methods employed by the Sandbox participants attained recall and precision scores comparable to those of the e-Discovery Team in most of the topics. Moreover, there were few differences between the various fully automated approaches. Still, the highest F1 values at the time of Reasonable Call were attained by the e-Discovery Team in twenty of the thirty topics, and the second or third best F1 scores in four others. This is shown in Figure 9 below. The Team F1 rankings for each topic are shown in the third column.

Figure 9

F1  Topic 100 Rank  Topic 101 Rank  Topic 102 Rank  Topic 103 Rank  Topic 104 Rank  Topic 105 Rank  Topic 106 Rank  Topic 107 Rank  Topic 108 Rank  Topic 109 Rank
eDiscoveryTeam  68.96% 2  82.45% 4  69.88% 1  90.69% 1  73.53% 1  95.07% 1  97.38% 1  84.40% 1  47.03% 5  95.58% 1
NINJA  22.74% 8  79.17% 5  56.38% 5  83.79% 3  57.40% 4  77.24% 2  88.90% 5  50.89% 8  13.43% 11  48.79% 2
UvA.ILPS-baseline  73.55% 1  86.36% 1  56.38% 4  89.94% 2  10.27% 10  64.13% 5  95.87% 2  77.26% 4  64.47% 1  28.88% 3
UvA.ILPS-baseline2  45.56% 5  71.04% 7  42.42% 8  77.24% 6  2.42% 11  43.27% 7  84.67% 6  47.81% 9  35.13% 8  26.90% 6
WaterlooClarke-UWPAH1  11.95% 9  9.98% 11  32.16% 11  10.46% 11  68.51% 2  15.99% 10  3.61% 10  22.96% 11  21.61% 9  0.73% 8
WaterlooClarke-UWPAH2  10.37% 10  9.98% 10  32.16% 10  10.46% 10  65.93% 3  15.99% 11  3.54% 11  23.11% 10  21.54% 10  0.73% 9
WaterlooCormack-Knee100  45.02% 6  67.65% 9  42.32% 9  71.10% 9  28.49% 7  34.08% 8  77.03% 9  53.92% 7  42.65% 7  0.94% 7
WaterlooCormack-Knee1000  41.82% 7  67.67% 8  45.21% 7  71.11% 8  31.06% 5  33.90% 9  77.03% 8  57.79% 5  42.65% 6  27.17% 5
WaterlooCormack-stop2399  68.21% 3  72.02% 6  51.74% 6  75.55% 7  14.34% 9  58.92% 6  81.60% 7  57.77% 6  58.96% 2  27.17% 4
Webis-baseline  66.96% 4  83.87% 3  68.36% 2  82.42% 5  27.95% 8  64.91% 4  94.89% 4  79.24% 3  58.76% 3  0.00% 11
Webis-keyphrase  0.14% 11  85.21% 2  67.71% 3  83.15% 4  31.04% 6  65.13% 3  94.90% 3  79.24% 2  58.34% 4  0.33% 10

F1  Topic 2052 Rank  Topic 2108 Rank  Topic 2129 Rank  Topic 2130 Rank  Topic 2134 Rank  Topic 2158 Rank  Topic 2225 Rank  Topic 2322 Rank  Topic 2333 Rank  Topic 2461 Rank
eDiscoveryTeam  45.21% 1  53.99% 1  26.10% 6  64.31% 1  12.23% 6  95.61% 1  84.90% 1  72.60% 3  73.23% 1  16.68% 7
NINJA  58.13% 2  53.66% 2  49.22% 2  52.18% 2  39.70% 2  76.26% 2  39.43% 4  24.83% 9  62.65% 6  24.48% 5
UvA.ILPS-baseline  10.74% 3  22.74% 9  21.88% 7  41.12% 4  8.08% 7  42.02% 7  7.20% 9  73.20% 2  69.80% 2  7.33% 9
UvA.ILPS-baseline2  10.37% 4  22.45% 10  19.23% 8  30.88% 5  6.96% 8  22.47% 9  6.45% 10  48.11% 6  46.02% 9  6.53% 10
WaterlooClarke-UWPAH1  78.54% 5  52.20% 3  56.89% 1  13.42% 8  63.18% 1  40.08% 8  61.45% 2  5.85% 10  12.22% 10  49.90% 1
WaterlooCormack-Knee100  41.43% 6  33.89% 5  28.52% 5  19.49% 6  18.45% 3  16.15% 10  41.33% 3  47.39% 7  47.33% 7  43.87% 2
WaterlooCormack-Knee1000  38.10% 7  34.00% 4  30.91% 4  19.45% 7  18.45% 4  60.57% 5  27.02% 5  44.11% 8  47.30% 8  21.65% 6
WaterlooCormack-stop2399  16.94% 8  31.35% 7  31.01% 3  46.56% 3  15.51% 5  45.06% 6  11.84% 8  75.86% 1  68.87% 3  11.72% 8
Webis-baseline  13.24% 9  32.65% 6  7.73% 10  0.00% 10  2.21% 10  61.11% 4  18.36% 6  67.40% 5  68.07% 4  43.56% 3
Webis-keyphrase  10.53% 10  30.56% 8  8.29% 9  0.00% 9  2.21% 9  62.14% 3  12.97% 7  67.72% 4  68.04% 5  31.95% 4

F1  Topic 3089 Rank  Topic 3133 Rank  Topic 3226 Rank  Topic 3290 Rank  Topic 3357 Rank  Topic 3378 Rank  Topic 3423 Rank  Topic 3431 Rank  Topic 3481 Rank  Topic 3484 Rank
eDiscoveryTeam  93.28% 1  82.46% 1  55.39% 4  37.70% 2  86.70% 2  68.21% 1  58.12% 1  99.24% 1  95.48% 1  100.00% 1
NINJA  86.84% 2  67.97% 2  22.75% 9  38.98% 1  89.95% 1  67.88% 2  57.85% 2  74.67% 4  71.59% 2  100.00% 1
UvA.ILPS-baseline  5.47% 9  2.47% 9  37.25% 5  0.57% 9  12.75% 9  1.39% 9  1.26% 9  21.90% 7  35.00% 7  0.51% 9
UvA.ILPS-baseline2  5.35% 10  2.39% 10  34.75% 6  0.39% 10  11.82% 10  1.38% 10  0.74% 10  21.74% 8  29.19% 9  0.51% 10
WaterlooClarke-UWPAH1  76.14% 3  50.45% 3  24.73% 7  11.90% 5  62.65% 3  32.58% 4  18.65% 5  44.29% 6  26.87% 10  12.99% 6
WaterlooCormack-Knee100  57.66% 4  49.02% 4  64.61% 2  26.09% 3  55.57% 4  57.87% 3  30.70% 3  93.34% 3  53.62% 5  34.07% 4
WaterlooCormack-Knee1000  37.35% 5  18.38% 6  68.61% 1  4.59% 7  48.23% 5  11.26% 7  6.77% 7  93.77% 2  61.55% 4  4.07% 7
WaterlooCormack-stop2399  16.41% 7  8.43% 7  56.65% 3  2.01% 8  32.80% 6  5.01% 8  3.56% 8  44.78% 5  53.56% 6  1.78% 8
Webis-baseline  14.77% 8  47.06% 5  24.51% 8  19.31% 4  18.84% 7  27.37% 5  28.16% 4  19.71% 9  65.54% 3  34.59% 3
Webis-keyphrase  19.10% 6  6.40% 8  18.29% 10  10.22% 6  17.98% 8  18.23% 6  16.04% 6  19.19% 10  32.89% 8  30.08% 5

In Topic 109, Scarlet Letter Law, where some legal knowledge and analysis was required to understand relevance, the Team attained significantly better results (96% F1) at the time of Reasonable Call than did the automatic runs. In the Sandbox automatic runs the F1 values at the time of Reasonable Call ranged from 0% to 29%. Moreover, at the 1R point in Topic 109, the e-Discovery Team had attained over 95% recall, whereas all of the automated methods were still less than 1% recall. This is shown in the chart below, Figure 10.

Figure 10

The Team’s multimodal human machine approach also consistently found more relevant documents at the start of a search, and did so with greater precision than the fully automated approaches. Further, the hybrid man-machine approach was consistently more effective at determining a stop point, referred to by the Recall Track as a “Reasonable Call.” An example of this is shown in Figure 11 for Topic 109. The dark green line represents the Reasonable Call point, recall is shown on the vertical axis, and the horizontal axis is the number of documents submitted.

Figure 11


Another way to evaluate the performance of the multi-modal approach is to consider how precise the coding suggestions were during the course of review. High precision throughout indicates an efficient review, which is critical to cost savings in legal search. As to the Athome 109 topic, Figure 12 below contrasts precision percentage on the Y-axis with recall percentage on the X-axis. Precision does not begin to drop until approximately 95% Recall. Note that the green line representing percent of the database submitted barely moves off the baseline. Figure 13 shows the actual document counts reviewed and submitted in order to obtain the various precision thresholds.

Figure 12

Figure 13
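The relationship plotted in Figures 12 and 13, precision as a function of recall together with the number of documents needed to reach each recall level, can be derived from an ordered list of relevance judgments. The sketch below is an illustration under our own assumptions: `effort_by_recall` is a hypothetical helper, the judgments are toy data, and it counts documents submitted, whereas the Team's effort figures count documents manually reviewed, which can be lower.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple


def effort_by_recall(
    judgments: Iterable[bool],
    total_relevant: int,
    recall_levels: Sequence[float],
) -> Dict[float, Optional[Tuple[int, float]]]:
    """For each recall level, return (documents submitted, precision) at the
    first submission where that recall is reached, or None if never reached.

    `judgments` lists, in submission order, whether each document was relevant.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    pending = sorted(recall_levels)
    results: Dict[float, Optional[Tuple[int, float]]] = {}
    found = 0
    next_level = 0
    for submitted, is_relevant in enumerate(judgments, start=1):
        found += bool(is_relevant)
        # One submission can satisfy several recall levels at once.
        while next_level < len(pending) and found >= pending[next_level] * total_relevant:
            results[pending[next_level]] = (submitted, found / submitted)
            next_level += 1
    for level in pending[next_level:]:
        results[level] = None
    return results


if __name__ == "__main__":
    toy_judgments = [True, True, False, True, False, True]
    for level, outcome in effort_by_recall(toy_judgments, 4, [0.5, 0.75, 1.0]).items():
        print(level, outcome)
```

A run whose precision stays high until late in the recall range, as described for Topic 109, shows up here as a documents-submitted count that stays close to the number of relevant documents found.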


For further comparison, Figure 14 below (prepared by the Total Recall administrators) plots the average Athome3 precision by recall results. The e-Discovery Team results (barely visible on top) follow a curve very similar to the Athome 109 topic. The Team’s results outperformed the automated runs for most of the duration of the process, demonstrating a consistent efficiency in results. While various automated runs experienced comparable results in the Athome1 and Athome2 sets, the consistently high level of the multimodal approach corroborates a consistent efficient process across all datasets.

Figure 14

5.3 Research Question No. 3.

The Team’s experiments with different positive negative training ratios showed that training using a 50/50 ratio of relevant to irrelevant documents performed consistently better than any other ratios. This result is believed to be specific to the proprietary type of logistic regression algorithm used in Kroll Ontrack’s EDR. It may not have applications beyond this software, or even to other, more complex projects. Our work on this question continues.

6. CONCLUSIONS

The results in Topic 109 and other topics indicate that hybrid man-machine learning by skilled attorneys is, at the current time, significantly more effective at meeting complex legal search needs than fully automated approaches. This seems obvious, but more experiments on this issue are needed before this can be accurately quantified. The surprising success of the Sandbox participants using fully automated search, even though limited to non-legal topics and situations with only simple information needs, suggests that greater reliance on automated methods could be placed in legal search where the cases and needs are simple. The relatively low effort involved in automated learning, and thus low expense, is compelling, especially in view of the proportionality analysis required by law under the December 2015 Amendments to the Federal Rules of Civil Procedure. The Team has begun and will continue post hoc analysis and experiments using various hybrid methods that adjust the balance between man and machine.


We are experimenting with methods that place greater reliance on machine learning in all topics, including, but not limited to, topics with lesser complexity and information needs. We will also further investigate the use of both fully automated methods, and hybrid methods, in legal search quality control, fraud detection, and in the prediction of future wrongful conduct. [20]

The 2015 TREC Total Recall Track results also suggest that even when information needs are simple and require no complex analysis or background knowledge, as was true of most of the topics, a hybrid method outperforms fully automated methods in two ways: one, at finding relevant documents quickly and with high precision; and two, at making better stop decisions. These two considerations are very important in legal search where attorneys must find a proportional balance between recall and effort/expense. The results in all topics, even the simple ones, thus caution against over-reliance at this time on machine learning alone without proper expert supervision.

7. ACKNOWLEDGMENTS

The e-Discovery Team would like to thank Kroll Ontrack, Inc. and Jackson Lewis P.C. for their generous support of this project. We would also like to thank the many employees at Kroll Ontrack who pitched in behind the scenes, often late at night and on weekends, to help make this happen.

8. REFERENCES (Endnotes)

[1] Losey, R., Predictive Coding 3.0, part two (e-Discovery Team, 10/18/15); also see Predictive Coding Articles by Ralph Losey (collection of over 50 articles by Ralph Losey further describing the hybrid multimodal approach).

[2] The e-Discovery Team’s hybrid multimodal approach is similar to the method promoted by the Total Recall Track administrators, Maura Grossman and Gordon Cormack, in that they both use continuous active learning (CAL) in legal search as part of a technology-assisted review (TAR). It is, however, fundamentally different from Grossman and Cormack’s current methods in two ways.

First, our approach relies upon and encourages participation of skilled reviewers in the search process, the hybrid approach, whereas the Grossman and Cormack approach seeks to eliminate the role of the skilled user, namely trained attorneys. The rationale for their automation goal is the unsubstantiated claim that the adversarial context of legal search makes attorneys untrustworthy. They claim that inherent user bias means fully automated approaches are the only reliable methods of legal search. Grossman & Cormack, Autonomy and Reliability of Continuous Active Learning for Technology-Assisted Review, CoRR abs/1504.06868 at pg. 1 (2015) (“In eDiscovery, the review is typically conducted in an adversarial context, which may offer the reviewer limited incentive to conduct the best possible search.”) Obviously the Team disputes this assumption and conclusion. We do not endorse the view of the inherent bias and untrustworthiness of attorneys. In Ralph Losey’s experience as a practicing attorney since 1980 such bias is the rare exception, not the norm, and should not be the basis of a legal search strategy. The better solution to this minor issue of trustworthiness is educational, to train more attorneys in search and in professional ethics. Since our core assumptions on process and attorney honesty are fundamentally different, so too are our methods and goal. Our aim is augmentation of skilled attorneys to perform legal search, not automation, not replacement.

Second, our Team uses a variety of search methods, a multimodal approach, whereas the Grossman and Cormack approach relies solely upon the use of high-ranking documents to train a classifier. This is consistent with their aim to fully automate and eliminate attorneys from the legal search process, again based on the premise we dispute of attorney bias. In their words: “For the reasons stated above, it may be desirable to limit discretionary choices in the selection of search tools, tuning parameters, and search strategy.” Id. We disagree and seek to empower attorneys with a variety of search tools, including the one search method that they endorse of reliance on high-ranking documents. Also see the discussion and citations in Endnote 19.

[3] In these respects the e-Discovery Team follows the teachings of Gary Marchionini, Dean of the School of Information and Library Sciences of U.N.C. at Chapel Hill, who explained in Information Seeking in Electronic Environments (Cambridge 1995) that information seeking expertise is a critical skill for successful search. Professor Marchionini argues, and we agree, that: “One goal of human-computer interaction research is to apply computing power to amplify and augment these human abilities.” We also follow the teachings of UCLA Professor Marcia J. Bates who has advocated for a multimodal approach to search since 1989. Bates, Marcia J., The Design of Browsing and Berrypicking Techniques for the Online Search Interface, Online Review 13 (October 1989): 407-424. As Professor Bates explained in 2011 in Quora:

“An important thing we learned early on is that successful searching requires what I called “berrypicking.” … Berrypicking involves 1) searching many different places/sources, 2) using different search techniques in different places, and 3) changing your search goal as you go along and learn things along the way. This may seem fairly obvious when stated this way, but, in fact, many searchers erroneously think they will find everything they want in just one place, and second, many information systems have been designed to permit only one kind of searching, and inhibit the searcher from using the more effective berrypicking technique.”

Also see: White & Roth, Exploratory Search: Beyond the Query-Response Paradigm (Morgan & Claypool, 2009).

[4] The Total Recall Track fully automated method follows the Track Administrators’ preferred methodology of fully automated monomodal search (high ranking only) and their recently announced goal to eliminate attorney review in favor of full automation. Grossman & Cormack, Autonomy and Reliability of Continuous Active Learning for Technology-Assisted Review, supra at pg. 1 (2015):

“Our goal is to fully automate these choices, so that the only input required from the reviewer is, at the outset, a short query, topic description, or single relevant document, followed by an assessment of relevance for each document, as it is retrieved.”

They call the method “Autonomous TAR.” Id. at pg. 6. The protocols of the fully automated division of the Total Recall Track were apparently designed in part by Cormack and Grossman to test this premise, and the results they attained as participants in this division, along with all of the other fully automated participants from Universities around the world, are very impressive. Still, the e-Discovery Team, who did not participate in the 2015 automated division, notes that many of the protocols in this experiment are based on fictions and conditions not found in the real world of legal search, where the Team’s methods were developed. The differences include, but are not limited to: the existence of an omnipotent SME that instantly provides perfectly correct judgmental feedback as to relevance of all documents selected by the automated processes as probable relevant; simple, single-facet issues; relatively simple datasets stripped of most native metadata; and, perhaps most importantly, issues requiring little or no legal analysis or background legal knowledge. Note that in post hoc runs the e-Discovery Team ran a few fully automated runs on Kroll Ontrack systems and EDR. We used the same high ranking only Autonomous TAR training method and obtained the same results as all of the other fully automated division participants.

[5] “Contract review attorney,” or simply “contract attorney,” is a term now in common parlance in the legal profession to refer to licensed attorneys who do document review on a project-by-project basis. Their pay under a project contract is usually by the hour and is at a far lower rate than attorneys in a law firm, typically only $50 to $75 per hour. Their only responsibility is to review documents under the direct supervision of law firm attorneys who have much higher billing rates.

[6] Predictive Coding is defined by The Grossman-Cormack Glossary of Technology-Assisted Review, 2013 Fed. Cts. L. Rev. 7 (January 2013) (Grossman-Cormack Glossary) as: “An industry-specific term generally used to describe a Technology Assisted Review process involving the use of a Machine Learning Algorithm to distinguish Relevant from Non-Relevant Documents, based on Subject Matter Expert(s) Coding of a Training Set of Documents.” A Technology Assisted Review process is defined as: “A process for Prioritizing or Coding a Collection of electronic Documents using a computerized system that harnesses human judgments of one or more Subject Matter Expert(s) on a smaller set of Documents and then extrapolates those judgments to the remaining Document Collection. … TAR processes generally incorporate Statistical Models and/or Sampling techniques to guide the process and to measure overall system effectiveness.” Also see: Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Richmond Journal of Law and Technology, Vol. XVII, Issue 3, Article 11 (2011).

[7] Da Silva Moore v. Publicis Groupe, 868 F. Supp. 2d 137 (SDNY 2012) and numerous cases later citing to and following this landmark decision by Judge Andrew Peck, including Judge Peck’s own more recent Rio Tinto v. Vale, 2015 WL 872294 (March 2, 2015, SDNY).

[8] Grossman & Cormack, Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, SIGIR’14, July 6-11, 2014; Grossman & Cormack, Comments on “The Implications of Rule 26(g) on the Use of Technology-Assisted Review”, 7 Federal Courts Law Review 286 (2014); Herbert Roitblat, series of five OrcaTec blog posts (1, 2, 3, 4, 5), May-August 2014; Herbert Roitblat, Daubert, Rule 26(g) and the eDiscovery Turkey, OrcaTec blog, August 11th, 2014; Hickman & Schieneman, The Implications of Rule 26(g) on the Use of Technology-Assisted Review, 7 FED. CTS. L. REV. 239 (2013); Losey, R., Predictive Coding 3.0, part one (e-Discovery Team 10/11/15).

[9] Id.; Webber, Random vs active selection of training examples in e-discovery (Evaluating e-Discovery blog, 7/14/14).

[10] See Endnote [2]. This disagreement is within a general framework of agreement on the superiority of computer assisted methods over traditional linear review, joint criticism of random selection methods and control sets in legal review, and agreement on the use of continuous active learning, as opposed to one and done, identified by Losey as Predictive Coding Version 1.0. Predictive Coding 3.0, part one (e-Discovery Team 10/11/15).

[11] Grossman & Cormack, Autonomy and Reliability of Continuous Active Learning for Technology-Assisted Review, CoRR abs/1504.06868 (2015); Multi-Faceted Recall of Continuous Active Learning for Technology-Assisted Review, SIGIR’15, August 09-13, 2015, Santiago, Chile. (2015).

[12] Losey, R., Predictive Coding 3.0, part two (e-Discovery Team, 10/18/15).

[13] Shakespeare, W., Henry VI, Pt II, Act 4, Scene 2, 71-78 (“The first thing we do, let's kill all the lawyers.”). This famous anti-lawyer line was spoken by “Dick the butcher,” a traitor hoping to start a revolution and prop up his friend as an autocratic ruler.

[14] Losey, R., Predictive Coding 3.0, part one (2015 e-Discovery Team), see the subsection therein, Predictive Coding 1.0 and the First Patents, discussing common prejudice against lawyers by academics and IT that drove the ill-advised imposition of secret control sets in the first versions of predictive coding software. The new drive by Cormack and Grossman to fully automate legal search and eliminate SMEs and attorney search expertise from legal search seems based, at least in part, on the same false premises. Also see Losey, R., Mancia v. Mayflower Begins a Pilgrimage to the New World of Cooperation, 10 Sedona Conf. J. 377 (2009 Supp.); Losey, R., Lawyers Behaving Badly, 60 Mercer L. Rev. 983 (Spring 2009).

[15] See Zero Error Numerics for a partial list of quality control and quality assurance methods endorsed by the e-Discovery Team, found at ZeroErrorNumerics.com (ZEN Document Review). Also see: Concept Drift and Consistency: Two Keys to Document Review Quality, e-Discovery Team (Jan. 20, 2016).

[16] The cost of traditional linear document review is often far higher than $1.00 per file in practice. In 2007 the U.S. Department of Justice spent $9.09 per document for review in the Fannie Mae case, even though it used contract lawyers for the review work. In re Fannie Mae Securities Litig., 552 F.3d 814, 817 (D.C. Cir. 2009) ($6,000,000/660,000 emails). At about the same time Verizon paid $6.09 per document for a massive second review project that enjoyed large economies of scale and, again, utilized contract review lawyers. Roitblat, Kershaw, and Oot, Document categorization in legal electronic discovery: computer classification vs. manual review. Journal of the American Society for Information Science and Technology, 61(1):70–80, 2010 ($14,000,000 to review 2.3 million documents in four months).

[17] E. M. Voorhees, Variations in relevance judgments and the measurement of retrieval effectiveness, Information Processing & Management, 36(5):697–716, 2000 (on pooling); Oard, Baron, Hedin, Lewis, Tomlinson, Evaluation of Information Retrieval for E-Discovery, Journal Artificial Intelligence and Law, Vol. 18 Issue 4, December 2010 Pgs. 347-386.

[18] Autonomy and Reliability, supra at pgs. 2-3 (“This paper offers a historical review of research efforts to achieve high recall ...” The paper also estimates the Blair Maron precision score of 20% and lists the top scores (without attribution) in most TREC years); Hedin, Tomlinson, Baron, and Oard, Overview of the TREC 2009 Legal Track (TREC 2009); Cormack, Grossman, Hedin, and Oard, Overview of the TREC 2010 Legal Track (TREC 2010); Grossman, Cormack, Hedin, and Oard, Overview of the TREC 2011 Legal Track (TREC 2011); Evaluation of Information Retrieval for E-Discovery, supra at pgs. 24-27. The top TREC results cited for the six years of Legal track are in the 60% to 70% F1 range with a couple of results in the low 80% F1 range. The Recommind participation in the last TREC Legal Track 2011, and their subsequent prohibited marketing advertisements claiming to “win,” which led to their lifetime ban from TREC, only attained a Recall of 62.3% in one topic (403). Overview of the TREC 2011 Legal Track (TREC 2011) supra. Contrast all of the prior TREC results with the e-Discovery Team results in 18 topics in the 80% to 100% F1 range, with numerous topics in the mid to high 90% F1 range. Of course, these different TREC events had varying experiments and test conditions and so direct comparisons between TREC studies are never valid, but general comparisons are instructive and frequently made in the cited literature.

[19] See the report on the Electronic Discovery Institute (EDI) Oracle legal search experiments involving the largest number of legal search participants to date where a member of the e-Discovery Team attained high scores. Bay, M., EDI-Oracle Study: Humans Are Still Essential in E-Discovery: Phase I of the study shows that older lawyers still have e-discovery chops and you don’t want to turn EDD over to robots (11/20/13, LTN). Monica Bay, the Editor of Law Technology News, summarizes the conclusion of EDI from the study that: “Conclusion: Software is only as good as its operators. Human contribution is the most significant element.” Patrick Oot, co-founder of the Electronic Discovery Institute presented the findings of Phase II of the Oracle Predictive Coding Survey at ILTACON Day 3, as reported in The Relativity Blog, 9/2/15: “[W]hen it comes to what some vendors call Continuous Active Learning, Oot indicated the debate was somewhat of a red herring, adding, “Continuous Active Learning is just a buzzword.” Oot summed up his thoughts by stressing the human component of technology-assisted review. Noting that the best performing technology in the Oracle study was the one used by a senior attorney, Oot said, “A good artist with a good brush is best.” Unfortunately the final results of the EDI Oracle study have not yet been published and, as participants in that study, we are currently constrained from any detailed reporting.

[20] See PreSuit.com where the e-Discovery Team’s proposal is outlined to monitor the IT systems of large organizations with advanced analytics and other search methods to predict and avoid future illegal conduct. This man-machine hybrid type of early warning system includes safeguards to protect both individual privacy rights and confidential corporate information.


APPENDIX

E-Discovery Team 89-Page Narrative Report of all 30 Topics

This Appendix Narrative Report describes the search of all thirty Total Recall topics in TREC 2015 using the e-Discovery Team’s Hybrid Multimodal method. The report follows the chronological order in which the searches were conducted. The first project started on July 14, 2015. It was Topic 103 Manatee Protection. The last, Topic 3089 Pickton Murders, concluded on August 28, 2015. At the beginning of each Topic the results are reported for that Topic. Each has the same form and discloses metrics at the times when: (1) the Reasonable call was made; and, (2) the point where 97.5% Recall was attained. They are summarized along with a variation of a standard Confusion Matrix, a/k/a Contingency Table.¹ The Confusion Matrix itself is highlighted in blue. It is followed by a list of the key values attained: Recall, Precision, F1 Measure, Accuracy, Error, Elusion and Fallout.
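The seven measures listed above can all be derived from the four cells of the Confusion Matrix. The sketch below is illustrative only: the function name is hypothetical, and the formulas are the conventional definitions (Elusion as false negatives over all documents not produced, Fallout as false positives over all irrelevant documents), which is an assumption on our part, although they do reproduce the Topic 103 figures reported in this Appendix.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return the seven summary measures as fractions between 0.0 and 1.0.

    Assumes the conventional definitions; the report does not spell them out.
    """
    total = tp + tn + fp + fn
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / total,
        "error": (fp + fn) / total,
        "elusion": fn / (fn + tn),   # relevant share of the documents not produced
        "fallout": fp / (fp + tn),   # produced share of the irrelevant documents
    }


# Topic 103 counts at the Reasonable call, as reported in this Appendix.
topic_103 = confusion_metrics(tp=4780, tn=284348, fp=26, fn=945)
for name, value in topic_103.items():
    print(f"{name:<10}{value:8.2%}")
```

Run against the Topic 103 counts, this should agree with the reported values to two decimal places, which is a quick way to check any of the tables that follow.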

Work on multiple topics was conducted at the same time. Sullivan, who worked on eight topics, Reichenberger, who worked on four, and White, who did one, each worked on a single topic at a time. They did, however, work concurrently with Losey and each other. Losey, who worked on seventeen topics, and had the assistance of a contract review attorney on the ten Bush Email Topics, typically worked concurrently on multiple topics at the same time. All Topics were a Team effort, but the attorneys identified as running each Topic controlled the review work for that Topic. Consultation was common, especially at first.

Topic 103 Manatee Protection

Confusion Matrix - Topic 103
Total Documents: 290,099
Total Relevant: 5,725
Total Prevalence: 1.97%

                  @ Reas. Call    @ 97.5% Recall
True Positives           4,780             5,582
True Negatives         284,348           283,793
False Positives             26               581
False Negatives            945               143
Recall                  83.49%            97.50%
Precision               99.46%            90.57%
F1 Measure              90.78%            93.91%
Accuracy                99.67%            99.75%
Error                    0.33%             0.25%
Elusion                  0.33%             0.05%
Fallout                  0.01%             0.20%

¹ Grossman & Cormack Glossary, supra FN 1 at pg. 6. The Confusion Matrix is also referred to as a Contingency Table.


The e-Discovery Team’s TREC Total Recall project commenced on July 14, 2015 with work on Topic 103 Manatee Protection. This topic was run by Losey. He did not complete work until July 22, 2015. Although it may seem fast to see a review of 290,099 documents completed by one attorney in only eight days (with no breaks), there was more time spent on this topic than any of the others. But a significant amount of this time was spent on general set-up, procedures, contract reviewer training, project orientation, and communication protocols. Completion of this Topic was also delayed due to the availability of the contract review attorney, Anne Bottolene, who assisted Losey for the first part of the work on Topic 103, and due to some initial software configuration setup issues.

The Team found this Topic challenging for a variety of reasons, including the fact that the Bush collection of 290,099 emails had been stripped of its original metadata, images, and attachments. Further, we found some inconsistencies in judging this topic, although not many. Overall we found Topic 103 had one of the best gold-standards of the ten Bush Email Topics.

Ralph Losey is a native Floridian and Florida attorney for 35 years. He was somewhat knowledgeable about all of the Bush Email issues, certainly far more so than the average person, but he did not consider himself a bona fide subject matter expert (SME) on any of them. Losey’s knowledge and interest on Manatee Protection issues was, however, higher than the other Bush Topics. For that reason it was chosen as the first topic. Losey’s assistant, Bottolene, had lived in Florida for several years and also had some background with the Manatee Protection issue. They generally considered their familiarity with the issue to be an asset in the search of Topic 103. The same cannot be said of other Bush Email Topics.

The project commenced after initial orientation on July 14, 2015 with Losey beginning Step Two, Multimodal Search Reviews. Bottolene was assigned Step Three, Random Baseline. Due to various scheduling and implementation issues Bottolene did not complete her review of the sample until July 20, 2015, late afternoon. She reviewed and coded as either relevant or irrelevant a random sample of 1,534 Bush emails. This was one of only two Topics wherein Step Three was followed and a full random sample was taken. It proved very helpful.

Based on the sample prevalence we predicted a spot projection for prevalence in Topic 103 of 5,175 documents (95% confidence level, +/- 2.5% margin of error). In fact, the total relevant documents in Topic 103 proved to be 5,725, well within the 2.5% margin of error. Based on the length of time needed for random sample review, and our desire to complete all thirty topics in 45 days, we decided to skip this step for ensuing reviews. (Topic 101 Judicial Selection was started shortly after Topic 103, and also included Step Three Random Baseline.) As mentioned, we also skipped most of the procedures in Step 7 - “Zero Error Numerics” concerning quality control in this and all 30 Topics.

After Bottolene completed the random sample review on July 20th she assisted Losey on July 21st and 22nd in his work on Step Five Multimodal Search Review. At that time submission to TREC had already begun and the Team was evaluating the confirmed relevant and irrelevant documents from TREC. A total of 24 document submissions were made to TREC in this Topic: four document submissions on July 20th, one on July 21st, and the remaining nineteen submissions were made on July 22, 2015.

In between most of these submissions the Team conducted Steps Four, Five and Six of its standard workflow. These are the predictive coding steps that iterate. In Step Four the software, Mr. EDR, analyzes the documents designated for training in Step Two in the seed set, and in Step Five thereafter. Mr. EDR then ranks the whole dataset according to probable relevance and irrelevance. In Step Five the attorneys search for more documents to use to train Mr. EDR. It is essentially the same as Step Two, except now the attorneys can add probability and rank based searches to their multimodal toolkit. That is the Team’s full search pyramid, shown right. The methods are used ad hoc according to what the attorney reviewer considers a promising method to find additional relevant documents based in part on the latest EDR rankings and TREC submission returns.

Once new documents are found that are likely to be relevant, they are then designated in Step Six for Training. Not all documents are so designated. Again this is at the discretion of the attorneys as to what documents they think would best serve to train in the ongoing active learning process.

In Topic 103 the use of predictive coding ranked based searches was severely constrained. This was due to initial configuration setup errors, where input parameters for the learning engine were set incorrectly. These setup errors were detected and corrected by July 22, 2015, and thereafter Mr. EDR was of great assistance. Still, as a result of the delays and early errors, this Topic relied much more heavily than any other on keyword searches and human linear reviews. Similarity searches were also used extensively. Basically the predictive coding assistance in this Topic did not begin until the 14th submission. Losey called Reasonable after the 15th submission.

In the TREC experiments most, but not all, of the documents returned as relevant or irrelevant by TREC were included in training (Step Six). In that way their ranking impact was evaluated (Step Four) before the next submission. Training also included various irrelevant documents that were not TREC adjudicated, but were thought to be obviously irrelevant. Experiments were made as to the impact of varying the number of irrelevant documents in the hope that some
ideal range or ratio could be determined to maximize Mr. EDR efficiency. These experiments are still underway. Our conclusions as of late December 2015 are stated in the body of this report.

After a total of 15 submissions that presented 4,806 documents to TREC for adjudication, Losey called Reasonable and stopped work on July 22, 2015, a week after the Topic started. Thereafter an additional 9 submissions were made to TREC to submit the remaining 285,293 emails (98.34% of the 290,099 total). There was Training in between most of the remaining seven submissions based on the TREC adjudications, but no further human input. The first two post-call submissions were critical to the Team’s excellent performance on this Topic.

Losey called Reasonable at the point he thought that a reasonable human effort had been made to find relevant documents. Losey and his assistant Bottolene had personally reviewed and coded as relevant or irrelevant 7,203 documents. (Additional documents had been coded without review.) In fact, by the time Losey had submitted 2,309 documents to TREC for adjudication (the 14th submission) he had completed all individual document review (7,203 documents), and had completed all searches other than predictive coding ranking searches where document content is not reviewed. At that time (after the 14th submission) he essentially turned the process over to Mr. EDR, who had by then just recovered from an earlier technical illness and had not been functional before.

At the time Losey called Reasonable he had submitted a total of 4,806 documents. Of those, 4,780 had been adjudicated as relevant. This was an incredible Precision rate of 99.46%. This was the most Precise production that Losey thinks he has ever made. He also thought that he may have attained as high as a 90% Recall, but, in fact the later submissions showed that at the time Reasonable was called he had attained a Recall of 83.5%. This is still considered a high Recall level in legal search, and the combined F1 measure of 90.8% is, in legal search, like any other, a very outstanding effort.

The next submissions after Reasonable was called were always the documents that were highest ranked by Mr. EDR, which is why we call this an automated function. As we understand the game setup by TREC for the Recall Track, the actual scoring is not impacted by the Reasonable call. The scoring continues for all submissions until all documents have been returned. The Reasonable call is merely an indication of efforts. The same goes for the 70%, 80% recall calls, when and if they are made before the Reasonable effort call, except they are of even less interest. These calls were not supposed to have an impact on scoring.

In the first two submissions after the call in Topic 103, the 16th and 17th submissions, Mr. EDR identified and highly ranked 661 additional relevant documents, bringing the total relevant found to 5,467 out of the total 5,725. We were thereby able to attain in that submission a Recall of 90% with Precision of 99.33%, a Recall of 95% with Precision of 98.8%, and 97.5% Recall with a Precision of 90.57%! As far as Losey knows, these statistics represent his personal best efforts, especially considering that he did so with very little reliance on predictive ranking.

What makes this 97.5% Recall, 90.6% Precision all the more remarkable for legal search is that it was accomplished by only one expert attorney assisted by one contract review attorney. The measured effort to attain these high levels was remarkably low, especially considering that a significant amount of time in Topic 103 was spent reviewing the baseline sample (Step Three). Together the two attorneys only reviewed 7,203 documents out of the total corpus of 290,099 emails (2.5%). In legal search it is common for attorney review teams to consist of dozens or even hundreds of attorneys. Moreover, even when predictive coding is used, a far higher percent of the corpus is typically reviewed than 2.5%, and Recall levels of 97.5% are unheard of, much less precision in excess of 90%.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call. (Please note that the graph is not to scale, as the graph is based on individual submissions. We thought this a better depiction than by proportionally showing progress because in most cases a proportional graph would be a line virtually straight up from the start and flat going over.)
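The iterating Steps Four, Five and Six described above for Topic 103 follow the general shape of a continuous active learning loop: train on what has been adjudicated, rank what remains, submit the top of the ranking, and feed the returned adjudications back into training. The sketch below is a toy illustration only. EDR's learning engine is proprietary and is not described in this report, so the token-count scorer, the function names, and the six-document corpus are all assumptions. It also models only the automated, rank-based part of the loop (the mode used after Reasonable was called); the attorneys' keyword, similarity and other multimodal searches are not modeled.

```python
from collections import Counter


def train(labeled):
    """Toy 'Step Four': +1 per relevant doc containing a token, -1 per irrelevant doc."""
    weights = Counter()
    for tokens, is_relevant in labeled:
        for tok in set(tokens):
            weights[tok] += 1 if is_relevant else -1
    return weights


def rank(weights, unreviewed):
    """Order unreviewed doc ids by descending score; ties broken by doc id."""
    def score(doc_id):
        return sum(weights[tok] for tok in set(unreviewed[doc_id]))
    return sorted(unreviewed, key=lambda d: (-score(d), d))


def cal_loop(corpus, oracle, seed_ids, batch_size):
    """Submit top-ranked batches until every document has been adjudicated.

    `oracle` stands in for the instant relevance feedback returned by TREC.
    """
    labeled = {d: oracle[d] for d in seed_ids}
    order = list(seed_ids)
    while len(labeled) < len(corpus):
        weights = train([(corpus[d], labeled[d]) for d in labeled])
        unreviewed = {d: t for d, t in corpus.items() if d not in labeled}
        for d in rank(weights, unreviewed)[:batch_size]:
            labeled[d] = oracle[d]   # adjudication comes back, becomes training data
            order.append(d)
    return order


# Hypothetical six-document corpus; not TREC data.
corpus = {
    "d1": ["manatee", "speed", "zone"],
    "d2": ["manatee", "protection", "boat"],
    "d3": ["budget", "tax"],
    "d4": ["boat", "speed", "zone"],
    "d5": ["tax", "school"],
    "d6": ["school", "budget"],
}
oracle = {"d1": True, "d2": True, "d3": False, "d4": True, "d5": False, "d6": False}

order = cal_loop(corpus, oracle, seed_ids=["d1", "d3"], batch_size=1)
print(order)
```

In this toy run the two remaining relevant documents are submitted before any of the remaining irrelevant ones, which is the behavior the ranking step is meant to produce. A real engine would use a proper classifier and far larger batches.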

The next chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Manatee Protection topic, by the time 97.5% Recall had been attained only 2.12% of the corpus, 6,163 documents, had been submitted for adjudication. This is a triumph for the search pyramid foundation, especially keyword search, that supports AI training. The last portion of the graph thus represents the submission of the remaining 97.88% or 283,936 documents.
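The figures tracked across recall thresholds (documents submitted, Precision, and share of the corpus submitted at the point a given Recall is reached) can be derived from the adjudications taken in submission order. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the ten-label list is made up for illustration and is not TREC data.

```python
def cutoff_for_recall(labels, target_recall):
    """Return (docs_submitted, precision, share_of_corpus) at the first point
    where recall reaches target_recall.

    `labels` holds one 0/1 relevance adjudication per document, in the order
    the documents were submitted, and covers the whole corpus.
    """
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("labels contain no relevant documents")
    found = 0
    for n, is_relevant in enumerate(labels, start=1):
        found += is_relevant
        if found / total_relevant >= target_recall:
            return n, found / n, n / len(labels)
    raise ValueError("target_recall must be at most 1.0")


# Hypothetical adjudications in submission order (1 = relevant).
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
for target in (0.8, 1.0):
    n, precision, share = cutoff_for_recall(labels, target)
    print(f"recall>={target:.0%}: {n} docs, precision {precision:.1%}, {share:.0%} of corpus")
```

Applied to a full submission log, the same calculation would yield figures of the kind quoted above, such as the 6,163 documents (2.12% of the corpus) submitted by the time 97.5% Recall was reached on Topic 103.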


The chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.

______________________________________

Topic 2108 CAPTCHA Services

Confusion Matrix - Topic 2108
Total Documents: 465,147
Total Relevant: 656
Total Prevalence: 0.14%

                  @ Reas. Call    @ 97.5% Recall
True Positives             580               640
True Negatives         463,566           458,906
False Positives            925             5,585
False Negatives             76                16
Recall                  88.41%            97.56%
Precision               38.54%            10.28%
F1 Measure              53.68%            18.60%
Accuracy                99.78%            98.80%
Error                    0.22%             1.20%
Elusion                  0.02%             0.00%
Fallout                  0.20%             1.20%


Topic 2108 was run by Losey without any assistance of a review lawyer. The work to search the 465,147 Black Hat World Forum posts started on July 16, 2015, but did not conclude until August 1, 2015. The reason for the delay in completion is that the Team encountered difficulties in understanding the initial TREC adjudications to their first submissions. Neither Losey, nor the other attorney Team members consulted, could understand the relevance pattern behind TREC’s initial submission responses. Due to the initial EDR configuration error, predictive coding was not available to assist at first in ascertaining the relevance scope. After several days of struggling with this project, Losey put this Topic on hold until July 29th at which time Losey returned to the Topic to finish.

As a general comment the Team found all of the Black Hat World Forum posts challenging to search, more difficult than a typical search of corporate ESI. That is in part because almost all metadata of these posts, and all associated imagery, had been stripped by TREC and the ESI converted to text files. Also the language and issues (all non-legal) in the Black Hat World Forums were obscure. Even though our attorney searchers were all familiar with forums and had knowledge of most of the technologies and sometimes illegal, nearly always unethical, marketing practices discussed in Black Hat World, they still found the slang-filled posts difficult to review and analyze. The challenges were compounded by significant inconsistencies, and apparent illogic of the TREC judging in many of these topics.

Still, the Team was able to overcome these challenges and, after we learned not to try to understand any relevance rules, we overall did quite well in review of the ten Black Hat World Forum Topics. Based on the elusive (to humans) relevance standard, we found that these topics required greater reliance on Mr. EDR than the Bush Emails and News Articles. Even though we continued to use a multimodal approach in Forum topics, our emphasis was on the AI features of ranking and probability. The Team readily admits that its own human intelligence, without the considerable AI enhancements of Mr. EDR, was not up to the task of matching TREC relevance calls for the Forum Topics. But with the help of predictive coding (Mr. EDR) we overcame the difficulties and attained relatively high recall levels.

On July 31, 2015, after making 22 document submissions to TREC providing a total 1,505 documents, Losey had found a total of 580 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 2,101 documents. In fact, Losey had stopped document review after the 21st submission. His 22nd submission was entirely based on document rankings without review. After the 22nd TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 88.41% had been attained. There were seven additional submissions to TREC after the Reasonable call point. In the next, 23rd submission, 95% Recall was attained after submitting only 2,130 additional documents.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call. (Please note that this graph, and all others like it, are not to scale, as the graphs are based on individual submissions. We thought this a better depiction than by proportionally showing progress because in most cases a proportional graph would be a line virtually straight up from the start and flat going over.)


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the CAPTCHA Services topic, by the time 97.5% Recall had been attained only 1.34% of the corpus, 6,225 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 98.66% or 458,922 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.


______________________________________

Topic 101 Judicial Selection

Confusion Matrix - Topic 101
Total Documents: 290,099
Total Relevant: 5,834
Total Prevalence: 2.01%

                  @ Reas. Call    @ 97.5% Recall
True Positives           5,026             5,688
True Negatives         283,608           281,901
False Positives            657             2,364
False Negatives            808               146
Recall                  86.15%            97.50%
Precision               88.44%            70.64%
F1 Measure              87.28%            81.93%
Accuracy                99.49%            99.13%
Error                    0.51%             0.87%
Elusion                  0.28%             0.05%
Fallout                  0.23%             0.83%


Topic 101 was run by Losey with the assistance of a review attorney, David Jensen. The work to search the 290,099 Bush Emails started on July 16, 2015 and concluded on July 26, 2015. The project commenced with Losey beginning Step Two, Multimodal Search Reviews, and Jensen assigned Step Three, Random Baseline. Jensen finished the random sample review the next day and began assisting Losey in Step Two, and after submissions began, the echo Step Five, multimodal. Losey handled all of the AI related searches in Step Five, including the probability and ranking related searches. Jensen focused on keyword searches and also made suggestions of documents to submit. Final decisions on submissions were always made by Losey on all Topics. Due to the same mentioned initial configuration setup errors the AI features did not work until near the end of this Topic. Losey instead relied heavily on Keyword, linear, and a new type of Similarity search the Team invented out of necessity during TREC events. It is anticipated that the new similarity search feature will be included in future Mr. EDR releases.

Review of the random sample of 1,534 Bush emails found 30 that were relevant. That suggested a prevalence of 1.96% and a spot projection of 5,673 documents. The actual relevant count of 5,834 and prevalence of 2.01% was very close to the projection. Note this is the second and last Topic in which a full Step Three random sample was implemented.

On July 25, 2015, after making 15 document submissions to TREC providing a total 5,683 documents, Losey had found a total of 5,026 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 6,895 documents. In fact, Losey had stopped document review after the 14th submission, as his 15th submission was entirely based on document rankings without review. After the 15th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 86.15% had been attained with a Precision of 88.44%. There were an additional 8 submissions to TREC after the Reasonable call point. In the next, the 16th, there was a submission of 652 documents, 345 of which were relevant. 95% Recall with 82.7% Precision was attained after submitting only 6,705 documents (1,022 after Reasonable call). 97.5% Recall with 70.6% Precision was attained after submitting only 8,052 documents (2,369 after Reasonable call).

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.
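The spot projection from the Step Three random sample is simple arithmetic: the sample prevalence is scaled up to the size of the corpus. A minimal sketch follows; the function name is hypothetical, and the inputs are the Topic 101 sample figures reported above.

```python
def spot_projection(sample_relevant, sample_size, corpus_size):
    """Return (sample prevalence, projected number of relevant documents)."""
    prevalence = sample_relevant / sample_size
    return prevalence, round(prevalence * corpus_size)


# Topic 101: 30 relevant documents in a random sample of 1,534 Bush emails.
prevalence, projected = spot_projection(30, 1534, 290_099)
print(f"prevalence {prevalence:.2%}, projected relevant {projected:,}")
```

This reproduces the 1.96% prevalence and the projection of 5,673 documents, against the 5,834 relevant documents ultimately identified by TREC. The projection is a point estimate only and says nothing by itself about the width of the interval around it.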


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Judicial Selection topic, by the time 97.5% Recall had been attained only 2.78% of the corpus, 8,052 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 97.22% or 282,047 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.


______________________________________

Topic 108 Manatee County

Confusion Matrix - Topic 108
Total Documents: 290,099
Total TREC Relevant: 2,375
Total TREC Prevalence: 0.82%

Using TREC relevant calls

                  @ Reas. Call    @ 97.5% Recall
True Positives             734             2,316
True Negatives         287,712            26,197
False Positives             12           261,527
False Negatives          1,641                59
Recall                  30.91%            97.52%
Precision               98.39%             0.88%
F1 Measure              47.04%             1.74%
Accuracy                99.43%             9.83%
Error                    0.57%            90.17%
Elusion                  0.57%             0.22%
Fallout                  0.00%            90.90%

Topic 108 was run by Losey with the assistance of a review attorney, Bottolene. The work to search the 290,099 Bush Emails also started on July 16, 2015 and concluded on July 24, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was done by Losey with assistance at first of Bottolene. Losey handled all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made suggestions of documents to submit. All final submittal decisions were made by Losey.

Observations on the Errors of Relevance Judgments in This and Other Topics

This was the most frustrating of all of the TREC Recall Topics for the Team to work on because the judgments on relevance contained more obvious errors and inconsistencies than any other. This Topic was Manatee County, as opposed to Topic 103, which was Manatee Protection, which of course referred to the endangered mammal. Unfortunately, as a lifelong Florida attorney, Losey has substantial independent knowledge of Manatee County and manatees. Bottolene had also been a Florida resident for several years and an attorney. Their direct personal knowledge of Florida proved to be a significant disadvantage in this Topic (and, to a lesser extent, in other Topics, especially ones that contained obvious errors in relevance) because TREC adjudications were not tied to actual facts and reality (obviously no one at TREC was a Florida SME) and were otherwise surprising.

For instance in Topic 108, even though the subject was the County of Manatee, a political entity, sometimes, but not always, an email with mere mention of the mammal manatee would be considered relevant, even though there was no mention of location or the county. Also, many references to Manatee Park were considered relevant by TREC, even though that park is, as any Floridian would know, especially Losey who lives in Central Florida, not located in Manatee County and otherwise has no connection to the county. Also, almost all email addresses that had manatee in the name were called relevant by TREC, even if the email had nothing to do with the County of Manatee.

There may well be some pattern to the so-called gold standard used in this Topic, but if so, it was not logical and not known to Bottolene or Losey. It appeared to these Floridians, after the fact, to be lack of expertise on the part of TREC. Other team members who reviewed these adjudications later agreed. One example we were later able to figure out: a well-known Florida law firm (Holland & Knight) has a home office in Bradenton, Florida, and the attorneys there would often write to the governor. As part of post-hoc analysis we saw that almost all of these emails were considered relevant by TREC assessors to this topic simply because the office city was in their standard signature line address, even though the content of the emails has nothing to do with Manatee County.

Since Losey is used to directing legal search as an SME, or direct SME surrogate, his usual approach to legal search involves using his knowledge and understanding to differentiate relevant from irrelevant. As mentioned, in legal search understanding of relevance is critical; in fact, it is a legal duty and responsibility of the attorney searchers. Thus his position as an actual Florida SME served as a disadvantage in many of the Bush email Topics, including this one.

The Team later encountered other Topics with inconsistencies and mistakes like Topic 108. In such cases we eventually learned to step out of the process and stop trying to understand or look for a rational basis for the TREC relevance calls. We would put aside our traditional SME role, which is otherwise the firmly established norm in legal search. Instead, when we found ourselves in this situation (and this happened in a little less than half of the Topics), we would basically turn the search and submission decisions over to Mr. EDR. In those situations we did not even try to see any pattern or consistency to the adjudications. When we adopted this approach in later topics we did quite well, in spite of defects we saw in the TREC gold standards.

This suggests that TREC’s selection of relevant documents in some of the Topics suffered from over-delegation to computer selection without adequate SME based quality controls. It is unknown what software was used by TREC to create the relevant gold standard document set, but like any predictive coding software today, it obviously can be led astray without adequate human supervision and quality control safeguards. This is why the e-Discovery Team adopts a hybrid approach, computer and human, including SMEs, and why in normal circumstances Step Seven for quality control is so important under their Predictive Coding 3.0 method.

Topic 108 Description

On July 23, 2015, after making 10 document submissions to TREC providing a total 746 documents, Losey had found a total of 734 relevant documents (Precision of 98.4%). The effort, or number of documents reviewed and coded by Losey to attain this result, was 696 documents. After the 10th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 31% had been attained. The decision to call Reasonable proved to be a big mistake because the TREC adjudications were not limited to Manatee County relevance as the Team had assumed. As mentioned, the error was based upon the Team’s construction of relevance in a much narrower manner than TREC. The divergence was not known because the Team did not do enough exploration of irrational constructions and so did not detect the, to our mind, outlier nature of TREC’s approach to this Topic.

The Team should have been less precise (its submissions had a Precision of 98.4%), and should have presented more documents for submission, even though the Team did not personally consider them to be relevant. It should have better tested its relevance concept. But as mentioned, as an SME Losey was used to setting the scope of relevance, and as lawyers, the entire Team was used to rational adjudications of relevance along lines that make sense to them.


This was an early topic for us in the process and we had not yet learned to mistrust our own assessments.

There were 6 additional submissions to TREC after the Reasonable call point. In retrospect, this was also an error. The Team should have submitted multiple smaller submissions after they started to discover the outlier nature of the TREC adjudications, with training between each submission where Mr. EDR could take over in an automated fashion. This was another game-type lesson learned the hard way by this Topic, which proved to be the Team’s worst performance. Even in the worst case with multiple mistakes the Team still managed to attain 78% Recall with review of only 696 documents, and submission of only 60,817 of the total 290,099 documents.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the Reasonable recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Manatee County topic, by the time 97.5% Recall had been attained 90.95% of the corpus, 263,843 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 9.05% or 26,256 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.

Correction of the Gold Standard Relevance Set in Topic 108

Since the Team is considering use of the Bush email set in further testing, training and research, they wanted to try to correct the many deficiencies they saw in TREC’s determination of the gold standard for this Topic. They also wanted to better understand why the score on this Topic was so out of range from their other scores. With this in mind they re-reviewed the TREC adjudications and set up a three-attorney peer review of all errors spotted in the relevancy determinations. A conservative approach was taken and deference was given to the TREC adjudications where a rational, consistent basis could be found. Losey’s personal, narrow view of what should be relevant was not followed if there was a reason seen to follow TREC’s adjudications. (Note, the Team and others in the field of Legal Search have observed over many projects that SMEs typically take a more narrow view of relevance than non-SMEs who, by definition, do not understand the subject as well.) Losey accepted all adverse rulings against his own positions as part of this process. Also note that suggestions to revise TREC adjudications came from all three Team members, not just Losey, and were all subject to multiple reviews and objections.

After the re-review and re-adjudication process was completed, 1,264 documents adjudicated as relevant by TREC were changed to Irrelevant. Further, 3 documents adjudicated as irrelevant by TREC were changed to relevant. Below are the corrected metrics of the Team’s review under the improved adjudications.

Confusion Matrix (Adjusted) - Topic 108
Total Documents: 290,099
Total Adjusted Relevant: 1,114 (was 2,375) (1,264 changed to Irrelevant, 3 changed to Relevant)
Total Adjusted Prevalence: 0.38% (was 0.82%)

After the 10th TREC submission, when Losey decided to call Reasonable, Losey had found a total of 736 relevant documents (an increase of 2 documents) under the adjusted gold standard. This was a Recall of 66.07% and Precision of 98.66% under the adjusted standard. The F1 measure was 79.14%. Note that these metrics are much more in line with the other 29 projects, although the adjusted 66% Recall is still the Team’s second to lowest Recall score at the Reasonable call point. Under the corrected standard the Team attained 94.43% Recall with review of only 696 documents, and submission of only 60,817 of the total 290,099 documents. A graph mapping Recall attained against the number of documents submitted is shown below for both the original TREC standard (blue) and the Team adjusted standard (red).

Using adjusted relevant calls

                  @ Reas. Call   @ 97.5% Recall
True Positives    736            1,087
True Negatives    288,975        131,844
False Positives   10             157,141
False Negatives   378            27
Recall            66.07%         97.58%
Precision         98.66%         0.69%
F1 Measure        79.14%         1.36%
Accuracy          99.87%         45.82%
Error             0.13%          54.18%
Elusion           0.13%          0.02%
Fallout           0.00%          54.38%
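The derived rows of these confusion matrices follow from the four counts. The sketch below uses the standard definitions (Elusion as false negatives over all unsubmitted documents, Fallout as false positives over all irrelevant documents). The source does not spell out its formulas, so treating these as the Team's definitions is an assumption, although they do reproduce the Reasonable call column of the adjusted Topic 108 table. The class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # relevant documents that were submitted
    tn: int  # irrelevant documents that were not submitted
    fp: int  # irrelevant documents that were submitted
    fn: int  # relevant documents that were not submitted

    def metrics(self) -> dict:
        total = self.tp + self.tn + self.fp + self.fn
        recall = self.tp / (self.tp + self.fn)
        precision = self.tp / (self.tp + self.fp)
        return {
            "recall": recall,
            "precision": precision,
            "f1": 2 * precision * recall / (precision + recall),
            "accuracy": (self.tp + self.tn) / total,
            "error": (self.fp + self.fn) / total,
            "elusion": self.fn / (self.fn + self.tn),
            "fallout": self.fp / (self.fp + self.tn),
        }


# Topic 108 at the Reasonable call under the adjusted standard (counts from the table).
adjusted_108 = ConfusionMatrix(tp=736, tn=288_975, fp=10, fn=378)
for name, value in adjusted_108.metrics().items():
    print(f"{name:>9}: {value:.2%}")
```

The same function can be pointed at any other topic's four counts to check the percentages reported for it.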

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds under the adjusted standard.

______________________________________

Topic 2052 Paying for Amazon Book Reviews

Confusion Matrix - Topic 2052
Total Documents: 465,147   Total Relevant: 265   Total Prevalence: 0.06%

Topic 2052 was run by Sullivan, who started on July 20, 2015, and concluded July 22, 2015. This was Sullivan’s first Topic. For that reason he spent more time than in his later reviews in trying to understand the dataset and processes. Sullivan has a background in computers and programming. He has substantial experience in forums and understands the unique characteristics present in forum communications. While he considers himself far more knowledgeable than the average person, he has no experience with the unethical world of Blackhat Forums and does not consider himself to be a bona fide subject matter expert (SME) on any of them. All forum topics presented a unique challenge of identifying variations of terms and understanding use of slang. While this proved to be easy to overcome, it certainly played a vital role in the process in a way not necessary in the News topics, where spelling errors were largely non-existent.

On the first day, Sullivan started with Step Three, Random Baseline, and reviewed a random sample of 1,534 documents. This was used both as a method to estimate prevalence and a means of gaining better understanding of the dataset for this and future topics in AtHome2. This random sample yielded 1 relevant document. Based on the sample prevalence we predicted 303 relevant documents existed in the dataset (95% confidence level with 2.5% margin of error). We would later discover the dataset contained 265 relevant documents, which is well within the margin of error. Given the amount of time necessary to complete this random sample, and the little value gained, Step Three was omitted from all subsequent topics reviewed by Sullivan.
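The projection of 303 relevant documents is the sample prevalence scaled to the corpus. A minimal sketch of that arithmetic follows, together with the worst-case margin of error for a sample of 1,534. The z-value of 1.96 for a 95% confidence level and the normal approximation at p = 0.5 are assumptions; the text does not say how the sample size was chosen, and the approximation is crude at a prevalence this low. Function names are illustrative.

```python
import math


def project_relevant(sample_size: int, sample_relevant: int, corpus_size: int) -> float:
    """Scale the sample prevalence up to the whole corpus (point estimate)."""
    return sample_relevant / sample_size * corpus_size


def worst_case_margin(sample_size: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error at p = 0.5, the most conservative case."""
    return z * math.sqrt(0.25 / sample_size)


# Figures from the text: 1 relevant document in a random sample of 1,534,
# drawn from a corpus of 465,147 documents.
projected = project_relevant(1_534, 1, 465_147)
margin = worst_case_margin(1_534)
print(f"projected relevant documents: {projected:.0f}")
print(f"worst-case margin of error: {margin:.2%}")
```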

                  @ Reas. Call   @ 97.5% Recall
True Positives    257            259
True Negatives    464,364        464,165
False Positives   518            717
False Negatives   8              6
Recall            96.98%         97.74%
Precision         33.16%         26.54%
F1 Measure        49.42%         41.74%
Accuracy          99.89%         99.84%
Error             0.11%          0.16%
Elusion           0.00%          0.00%
Fallout           0.11%          0.15%

Day two was spent running keyword searches to find documents for seeding into the predictive coding algorithm and submitting documents to get a better understanding of the TREC standard for relevance. At the end of day two, 273 documents had been submitted, with 204 being returned as relevant. This provided an adequate seed set to begin relying more heavily on predictive coding.

On day three, Sullivan developed a strategy on which he relied heavily in future topics. Rather than relying on Mr. EDR alone and reviewing the documents that were given high scores by the machine, he used the multi-modal approach to prioritize documents for review. Starting with all variations of “Amazon” w/5 “Review,” he worked down reviewing and categorizing the highest scoring documents first. When he hit a point where few relevant documents were being found, he iteratively expanded the scope of his review universe. He moved to all variations of “Amazon” w/10 “Review,” then “Amazon” w/25 “Review,” and “Amazon” AND “Review.” He expanded into “Amazon” and (“Review” or “Book” or “Feedback” or “Purchase”) and eventually to any document containing a variation of “Amazon.”

As previously mentioned, the unique characteristics of the forums required more creative searches than necessary in other datasets. Using the Concept Searching tool as a guide, it was determined that almost all reasonable variations of “Amazon” could be found using the following search: (“amazon*” OR “@mazon” OR “@maz0n” OR “azmon*” OR “azmn*” OR “amzn*”). This method proved effective in eliminating issues of missed documents due to slang or misspelling.

Using this method, Sullivan was able to identify 257 of the 265 relevant documents at the time he called Reasonable effort. 2,325 total documents had been reviewed, including the 1,534 documents in the initial random sample. After calling Reasonable effort, Sullivan continued by submitting all documents that contained any variation of the term “Amazon” in order of priority score descending. 100% recall was obtained through this method. All remaining documents were then submitted in descending priority order, with no more relevant documents being returned.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, the slightly darker line signifying the 80% Recall call, and the dark green line the Reasonable Recall call.
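The wildcard search for “Amazon” variants quoted above can be approximated outside the review platform with a regular expression. This is only a sketch: EDR's actual query syntax is not described in the source, and reading a trailing `*` as "zero or more further word characters" is an assumption, as are the helper names.

```python
import re

# Search terms quoted from the text. A trailing * is interpreted here as
# "any further word characters", which is an assumption about the syntax.
TERMS = ["amazon*", "@mazon", "@maz0n", "azmon*", "azmn*", "amzn*"]


def term_to_regex(term: str) -> str:
    """Translate one wildcard term into a regular-expression fragment."""
    if term.endswith("*"):
        return re.escape(term[:-1]) + r"\w*"
    return re.escape(term)


# Lookarounds keep a match from starting or ending in the middle of a word.
PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(term_to_regex(t) for t in TERMS) + r")(?!\w)",
    re.IGNORECASE,
)


def mentions_amazon(text: str) -> bool:
    """True when the text contains any of the listed variants of 'Amazon'."""
    return PATTERN.search(text) is not None


for post in ["paid AMZN reviewers", "try @maz0n gift cards", "an amazing book"]:
    print(post, "->", mentions_amazon(post))
```

The first two sample posts match and the third does not, since "amazing" is not a variant covered by any of the six terms.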

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Paying for Amazon Book Reviews topic, by the time 97.5% Recall had been attained only 0.21% of the corpus, 976 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.79% or 464,171 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% recall using the multimodal hybrid model of training EDR.

______________________________________

Topic 2225 Rootkits

Confusion Matrix - Topic 2225
Total Documents: 465,147   Total Relevant: 182   Total Prevalence: 0.04%

Topic 2225 was run by Losey, who started the search of 465,147 Black Hat Forum posts on July 21, 2015 and concluded on August 18, 2015. Losey put aside work on this Topic several times while he gave priority to the Jeb Bush Email Topics. The project commenced as usual with Losey beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal search, began, including predictive coding features, with iterated training.

In August 2015, after making 12 submissions to TREC, and training after almost every submission, Losey had provided a total of 201 documents to TREC and confirmed a total of 163 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 205 documents. After the 12th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 89.56% had been attained with a Precision of 81%. There were 23 additional submissions to TREC after the Reasonable call point. A 90% Recall was attained after submitting only 212 documents. A 95% Recall was attained after submitting 891 documents, and 97.5% Recall attained after 3,188 documents. Total Recall was attained after submitting 12,109 documents out of the corpus total of 465,147.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.
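Recall milestones such as those above (documents submitted before 90%, 95% or 97.5% Recall is reached) can be read off the ordered sequence of relevance judgments returned for each submitted document. The sketch below shows one way to compute them. The function name and the ten-document toy sequence are illustrative and are not Topic 2225's actual submission order.

```python
def docs_needed_for_recall(labels, target_recall):
    """Return how many documents, in submission order, had to be submitted
    before recall first reached target_recall.

    labels: one bool per document in submission order (True = judged relevant),
            covering the whole corpus, so sum(labels) is the total relevant.
    Returns None if the target is never reached.
    """
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("no relevant documents in the label sequence")
    found = 0
    for submitted, is_relevant in enumerate(labels, start=1):
        found += is_relevant
        if found / total_relevant >= target_recall:
            return submitted
    return None


# Toy submission order: 4 relevant documents in a 10-document corpus.
toy = [True, True, False, True, False, False, True, False, False, False]
for target in (0.50, 0.75, 1.00):
    print(f"{target:.0%} recall after {docs_needed_for_recall(toy, target)} documents")
```

Dividing the returned count by the corpus size gives the "percent of corpus submitted" figure quoted for each topic.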

                  @ Reas. Call   @ 97.5% Recall
True Positives    163            178
True Negatives    464,927        461,955
False Positives   38             3,010
False Negatives   19             4
Recall            89.56%         97.80%
Precision         81.09%         5.58%
F1 Measure        85.11%         10.56%
Accuracy          99.99%         99.35%
Error             0.01%          0.65%
Elusion           0.00%          0.00%
Fallout           0.01%          0.65%

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Rootkits topic, by the time 97.5% Recall had been attained only 0.69% of the corpus, 3,188 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.31% or 461,959 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.

______________________________________

Topic 102 Capital Punishment

Confusion Matrix - Topic 102 Capital Punishment
Total Documents: 290,099   Total Relevant: 1,624   Total Prevalence: 0.56%

                  @ Reas. Call   @ 97.5% Recall
True Positives    941            1,583
True Negatives    288,345        17,048
False Positives   130            271,427
False Negatives   683            41
Recall            57.94%         97.50%
Precision         87.86%         0.58%
F1 Measure        69.83%         1.15%
Accuracy          99.72%         6.42%
Error             0.28%          93.58%
Elusion           0.24%          0.24%
Fallout           0.05%          94.09%

Topic 102 was run by Losey with the assistance of a review attorney, Jensen. The work to search the 290,099 Bush Emails started on July 26, 2015 and concluded on July 29, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was handled with the assistance, at first, of Jensen. Losey performed all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made suggestions of documents to submit. Again, all final decisions on submittal were made by Losey.

On July 28, 2015, after making 20 submissions to TREC, and training after almost every submission, Losey had provided a total of 1,071 documents to TREC and confirmed a total of 941 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 1,493 documents. After the 20th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 57.94% had been attained with a Precision of 87.86%, so his call proved to be early. There were only 3 additional submissions to TREC after the Reasonable call point, which we later learned was a mistake. We learned later that higher Recall and overall TREC scoring comes from multiple, smaller submissions, with training after each. This is another Topic in which we found many of the TREC judgments inconsistent and incomprehensible. Still, even with these problems and errors, a Recall of 70% was attained after a total of only 7,785 documents had been submitted out of 290,099, and only 1,493 documents had been reviewed.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Capital Punishment topic, by the time 97.5% Recall had been attained 94.11% of the corpus, 273,010 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 5.89% or 17,089 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.

______________________________________

Topic 106 Terri Schiavo

Confusion Matrix - Topic 106
Total Documents: 290,099   Total Relevant: 17,135   Total Prevalence: 5.91%

Topic 106 was run by Losey with the assistance of a review attorney, Bottolene. The work to search the 290,099 Bush Emails started on July 27, 2015 and concluded on August 2, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was handled with the assistance at first of Bottolene. Losey performed all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made suggestions of documents to submit. Again, all final decisions on submittal were made by Losey. This review process went longer than others because this proved to be the highest prevalence Topic (5.91%).

On August 2, 2015, after making 25 submissions, with training after most of these, Losey had submitted a total of 17,354 documents. A total of 16,872 of these submissions were confirmed relevant by TREC, for a Precision rate of 97.22%. The effort, or number of documents reviewed and coded by Losey to attain this result, was 2,025 documents. After the 25th TREC submission, Losey decided to call Reasonable. It was later determined that an incredible Recall of 98.47% had been attained. The F1 measure was 97.84%. That is the Team’s best result on any of the Bush Email Topics. Further, Losey believes this may be a personal best for Recall and F1 scores. There were 7 additional submissions to TREC after the Reasonable call point. In the 29th submission, 99.7% Recall was attained after submitting only 7,060 additional documents. The Precision was 70%.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

                  @ Reas. Call   @ 97.5% Recall
True Positives    16,872         16,707
True Negatives    272,482        272,551
False Positives   482            413
False Negatives   263            428
Recall            98.47%         97.50%
Precision         97.22%         97.59%
F1 Measure        97.84%         97.54%
Accuracy          99.74%         99.71%
Error             0.26%          0.29%
Elusion           0.10%          0.16%
Fallout           0.18%          0.15%

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Terri Schiavo topic, by the time 97.5% Recall had been attained only 5.90% of the corpus, 17,120 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 94.10% or 272,979 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.

______________________________________

Topic 105 Affirmative Action

Confusion Matrix - Topic 105
Total Documents: 290,099   Total Relevant: 3,635   Total Prevalence: 1.25%

                  @ Reas. Call   @ 97.5% Recall
True Positives    3,353          3,544
True Negatives    286,399        281,585
False Positives   65             4,879
False Negatives   282            91
Recall            92.24%         97.50%
Precision         98.10%         42.08%
F1 Measure        95.08%         58.78%
Accuracy          99.88%         98.29%
Error             0.12%          1.71%
Elusion           0.10%          0.03%
Fallout           0.02%          1.70%

Topic 105 was run by Losey with the assistance of a review attorney, Jensen. The work to search the 290,099 Bush Emails started on July 29, 2015 and concluded on July 31, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was performed with the assistance at first of Jensen. Losey handled all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made suggestions of documents to submit. Again, all final decisions on submittal were made by Losey.

On July 30, 2015, after making 23 document submissions to TREC providing a total of 3,418 documents, Losey had found a total of 3,353 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 674 documents. After the 23rd TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 92.24% had been attained, with Precision of 98.1%, and F1 of 95.08%. There were 7 additional submissions to TREC after the Reasonable call point. In the 27th submission, after submitting only 3,427 additional documents (total 6,845), 95% Recall was attained. This was attained after submission of only 2.36% of the total documents.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Affirmative Action topic, by the time 97.5% Recall had been attained only 2.90% of the corpus, 8,423 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 97.10% or 281,676 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.

______________________________________

Topic 3357 Occupy Vancouver

Confusion Matrix - Topic 3357
Total Documents: 902,434   Total Relevant: 629   Total Prevalence: 0.07%

Topic 3357 was run by Reichenberger. The work to search the 902,434 News Articles database started on July 29, 2015, and completed on July 30, 2015. The initial submissions on the first day were to test the outlines of the category. The initial search of “Occupy” AND “Vancouver” identified a series of protests in Vancouver about economic income inequality. Documents were selected based on a variety of content, including “Occupy” movements in other cities, riots/protests that took place in the same area (but not same time) as the Occupy Vancouver protests, and generic stories about “Occupy” protests that reference protests in Vancouver but do not specifically name them as “Occupy Vancouver.” Various sources were also tested, such as Letters to the Editor, stories sourced in other cities and so forth. Results helped formulate an anticipated rule on relevance. After training EDR and receiving priority scores, relevant documents on subsequent submissions were confirmed by these rules and their priority scores. In fact, of the five irrelevant documents found in the last 2 submissions on July 29th, three scored over 97% and contained substantial and direct references to Occupy Vancouver; these may be TREC coding errors.

A modified Step Three, Random Sample of 1,000 documents was taken after Step Two was complete. The first 500 contained 50 “training” documents to focus on, while the second 500 documents contained 250. All documents hitting on “Occupy” OR “Vancouver” OR “Ashlie Gough” (a student who died at the protests) OR “Robson Square” (location of the protests) were reviewed, while all others were mass trained as irrelevant. The last TREC submission on July 29th was from the 1,000 random documents. Of the 1,000 documents, 33 were identified as relevant, confirmed by submission.

                  @ Reas. Call   @ 97.5% Recall
True Positives    576            613
True Negatives    901,680        900,834
False Positives   125            971
False Negatives   53             16
Recall            91.57%         97.46%
Precision         82.17%         38.70%
F1 Measure        86.62%         55.40%
Accuracy          99.98%         99.89%
Error             0.02%          0.11%
Elusion           0.01%          0.00%
Fallout           0.01%          0.11%

On the second day, the 30th, documents containing search terms and escalated as relevant were reviewed and submitted in priority order. In the first submission of the day, 123 were submitted as relevant and 118 came back as confirmed relevant. Of the five irrelevant in that set, four were documents that had the exact same relevant text as documents TREC previously confirmed as relevant. This is another example of the kind of “gold standard” inconsistencies the Team encountered in most of the Topics. In the next set of submissions, documents escalated as relevant by Mr. EDR included stories sourced in the Vancouver paper on Occupy movements elsewhere, and sports stories with the word “occupy” in the article (e.g. “Another Vancouver player occupied the penalty box”). Once those documents were removed as irrelevant, all others were submitted and confirmed as relevant on submission. Some additional “gray area” documents were submitted (e.g. “Occupy Christmas” which was an offshoot of the protests, or campaign questions posed to candidates about the Occupy Vancouver protests). As the Mr. EDR ranking scores decreased, the precision dropped.

Prior to the final submissions, all documents with “Occupy” and “Vancouver” with relevance probability scores over 0.1% had either been submitted or reviewed, and all documents with scores over 75% without those terms had also been reviewed. After the final Reasonable call was made the remaining documents were submitted in the following groups in descending priority order: 1) all documents currently coded as irrelevant by the human reviewer not yet submitted (2,212 documents, of which 45 were found to be relevant); 2) anything remaining with “Occup!” AND “Vancouver” (493 documents, all these had scores below 0.1%, of which 8 were found to be relevant); and then 3) all else (no relevant documents found in this set).

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable call.
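The three-group submission order used after the Reasonable call can be expressed as a two-part sort key: group first, then ranking score descending within the group. The sketch below is illustrative only. The `Doc` fields, the document IDs and the scores are hypothetical, and nothing here reflects how EDR itself orders or exports documents.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    score: float            # ranking score from the classifier, 0.0 to 1.0
    coded_irrelevant: bool  # human reviewer coded it irrelevant, not yet submitted
    has_terms: bool         # hits on "Occup!" AND "Vancouver"


def submission_group(doc: Doc) -> int:
    """Group 1: human-coded irrelevant; group 2: remaining term hits; group 3: all else."""
    if doc.coded_irrelevant:
        return 1
    if doc.has_terms:
        return 2
    return 3


def final_submission_order(remaining):
    """Sort by group, then by descending score within each group."""
    return sorted(remaining, key=lambda d: (submission_group(d), -d.score))


remaining = [
    Doc("a", 0.0005, coded_irrelevant=False, has_terms=True),
    Doc("b", 0.40, coded_irrelevant=True, has_terms=False),
    Doc("c", 0.90, coded_irrelevant=False, has_terms=False),
    Doc("d", 0.70, coded_irrelevant=True, has_terms=True),
    Doc("e", 0.02, coded_irrelevant=False, has_terms=False),
]
print([d.doc_id for d in final_submission_order(remaining)])
```

Submitting the human-coded irrelevant documents first is what surfaced the 45 documents on which the reviewer and the TREC standard disagreed.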

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Occupy Vancouver topic, by the time 97.5% Recall had been attained only 0.18% of the corpus, 1,584 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.82% or 900,850 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.

______________________________________

Topic 2158 Using TOR for Anonymous Browsing on the Internet

Confusion Matrix - Topic 2158
Total Documents: 465,149   Total Relevant: 1,261   Total Prevalence: 0.27%

Topic 2158 was run by Sullivan, who also started on July 29, 2015. He finished his review of 465,149 forum posts in BlackHatWorld on July 31, 2015. Sullivan’s computer background proved to be helpful in another uncommon forum topic. He considers himself more knowledgeable on this topic than the average person, but does not consider himself to be a subject matter expert on TOR.

Day 1 of this topic started with concept searching to find other keywords relating to TOR and anonymous browsing. Many previously unknown terms came to light, such as vpn, torbrowser, proxy, and ip. This process of using concept searching at the beginning of every topic became standard process for all remaining reviews done by Sullivan. The results of this exercise were used in future keyword searches as well as database-wide keyword highlighting.

Next, Sullivan started manually reviewing some of the hits on terms he felt would be most likely to yield responsive documents. He started with 102 documents that hit on “TOR” and “anonym*” and moved on to hits on “TOR Browser,” then “TOR” and “Prox*.” It was not difficult to find a relatively high quantity of relevant documents. 108 relevant documents and 100 irrelevant documents were trained for predictive coding when the first learning session was run. After the first learning session completed, Sullivan manually reviewed the highest scoring documents that contained the term “TOR” and found almost all to be relevant. At the conclusion of the first day, 214 documents had been submitted to TREC, with all 214 being returned as relevant.

                  @ Reas. Call   @ 97.5% Recall
True Positives    1,243          1,230
True Negatives    463,793        463,824
False Positives   95             64
False Negatives   18             31
Recall            98.57%         97.54%
Precision         92.90%         95.05%
F1 Measure        95.65%         96.28%
Accuracy          99.98%         99.98%
Error             0.02%          0.02%
Elusion           0.00%          0.01%
Fallout           0.02%          0.01%

Day 2 consisted of many iterations of learning sessions and evaluating search results. Similar to how Sullivan reviewed Topic 2052, he started with a narrow list of keyword searches and broadened the terms iteratively. For each set, he reviewed the documents with the highest predictive coding scores. Starting the day with “TOR” and “prox*,” he moved to “Try TOR,” “Try using TOR,” and “Use TOR.” Eventually he moved to all documents that contained “TOR” or “T0R.” Every document he determined to be relevant was submitted to TREC.
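The narrow-to-broad review order described above can be sketched as a queue built from keyword tiers, with the highest scoring unreviewed documents first inside each tier. This is a simplified illustration under stated assumptions: the regular expressions only approximate the quoted searches, the sample documents and scores are invented, and the sketch leaves out the retraining between iterations and the reviewer's judgment about when to widen the search.

```python
import re


def has(pattern):
    """Build a predicate that is True when the pattern occurs in the text."""
    rx = re.compile(pattern, re.IGNORECASE)
    return lambda text: rx.search(text) is not None


def all_of(*preds):
    """Combine predicates with AND."""
    return lambda text: all(p(text) for p in preds)


def tiered_review_queue(docs, tiers):
    """docs: list of (doc_id, text, score); tiers: predicates, narrowest first.
    Returns doc_ids tier by tier, highest score first within each tier,
    never repeating a document already queued by a narrower tier."""
    seen, queue = set(), []
    for matches in tiers:
        hits = [d for d in docs if d[0] not in seen and matches(d[1])]
        hits.sort(key=lambda d: d[2], reverse=True)
        for doc_id, _, _ in hits:
            seen.add(doc_id)
            queue.append(doc_id)
    return queue


tiers = [
    all_of(has(r"\btor\b"), has(r"\bprox\w*")),  # "TOR" and "prox*"
    has(r"\b(?:try|try using|use) tor\b"),       # "Try TOR", "Try using TOR", "Use TOR"
    has(r"\bt[o0]r\b"),                          # "TOR" or "T0R"
]
docs = [
    ("d1", "Use TOR with a proxy chain", 0.61),
    ("d2", "try using tor for that", 0.88),
    ("d3", "t0r is slow today", 0.35),
    ("d4", "tor plus proxies works", 0.93),
    ("d5", "best hosting tutorial", 0.99),
]
print(tiered_review_queue(docs, tiers))
```

Note that the invented document `d5` is never queued despite its high score, because it matches none of the keyword tiers; in the Team's process such documents were only reached later, when everything remaining was submitted by score.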

At the end of the exercise, Sullivan had submitted 1,339 documents, with 1,244 being returned as relevant and 95 being returned as not relevant according to the TREC standard. At this point he called his shot at Reasonable Recall.

Day 3 started with the submission of all remaining documents that contained the term “TOR” as a method to catch any documents potentially missed. No additional relevant documents were returned.

All remaining documents in the database were submitted in order of descending predictive coding score. 14 more relevant documents were returned. Evaluation of these documents led to finding spectacular errors in the TREC standard. All 14 contained “*tor*” in some context, but none had any even marginal links to the current topic. A majority of the missed documents contained the term “hostigator.com.” Evaluation of these 14 documents resulted in a determination that all 14 were caused by an error in the TREC classification system.
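The difference between a substring match on “tor” and a whole-word match illustrates how a term like “hostigator.com” can be swept in. The sketch below is only an illustration of that difference. The source does not describe how the TREC judgments for these 14 documents were produced, so this is not a reconstruction of the actual error, and the sample strings are invented apart from the quoted term.

```python
import re


def substring_hit(text: str, term: str = "tor") -> bool:
    """Behaves like a *tor* wildcard: the term may appear anywhere, even inside a word."""
    return term in text.lower()


def whole_word_hit(text: str, term: str = "tor") -> bool:
    """Matches the term only when it stands alone as a word."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


samples = [
    "I route everything through TOR",
    "my site is on hostigator.com",
    "need a good tutorial",
]
for text in samples:
    print(f"{text!r}: substring={substring_hit(text)}, whole word={whole_word_hit(text)}")
```

Only the first sample passes the whole-word test, while all three pass the substring test.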

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Using TOR for Anonymous Internet Browsing topic, by the time 97.5% Recall had been attained only 0.28% of the corpus, 1,294 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.72% or 463,855 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.

______________________________________

Topic 104 New Medical Schools

Confusion Matrix - Topic 104
Total Documents: 290,099   Total Relevant: 227   Total Prevalence: 0.08%

Topic 104 was run by Losey with the assistance of a review attorney, Jensen. The work to search the 290,099 Bush Emails started on July 31, 2015 and concluded on August 4, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was performed with the assistance at first of Jensen. Losey handled all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made suggestions of documents to submit. Again, all final decisions on submittal were made by Losey.

On August 3, 2015, after making 8 document submissions to TREC providing a total of 199 documents, Losey had found a total of 157 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 1,091 documents. After the 8th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 69.16% had been attained, with Precision of 78.89%, and F1 of 73.71%. He made the call decision a little prematurely on this Topic. In the next submission of only 20 documents, Losey brought the Recall level up to 71.37% with Precision of 73.97%. In the next submission of 781 documents he brought the Recall level to 77.97%. There were a total of 7 additional submissions to TREC after the Reasonable call point. After submitting a total of 1,611 documents, which is only 0.56% of the total documents, and reviewing only 1,091 documents, an 80% Recall was attained.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

                  @ Reas. Call   @ 97.5% Recall
True Positives    157            222
True Negatives    289,830        51,763
False Positives   42             238,109
False Negatives   70             5
Recall            69.16%         97.80%
Precision         78.89%         0.09%
F1 Measure        73.71%         0.19%
Accuracy          99.96%         17.92%
Error             0.04%          82.08%
Elusion           0.02%          0.01%
Fallout           0.01%          82.14%

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the New Medical Schools topic, by the time 97.5% Recall had been attained 82.16% of the corpus, 238,331 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 17.84% or 51,768 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.

______________________________________

Topic 109 Scarlet Letter Law

Confusion Matrix - Topic 109 Scarlet Letter Law
Total Documents: 290,099   Total Relevant: 506   Total Prevalence: 0.17%

                  @ Reas. Call   @ 97.5% Recall
True Positives    485            494
True Negatives    289,568        289,502
False Positives   25             91
False Negatives   21             12
Recall            95.85%         97.63%
Precision         95.10%         84.44%
F1 Measure        95.47%         90.56%
Accuracy          99.98%         99.96%
Error             0.02%          0.04%
Elusion           0.01%          0.00%
Fallout           0.01%          0.03%

Topic 109 was run by Losey with the assistance of a review attorney, Bottolene. The work to search the 290,099 Bush Emails started on August 3, 2015 and concluded on August 11, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was performed with the assistance at first of Bottolene. Losey handled all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made suggestions of documents to submit. Again, all final decisions on submittal were made by Losey.

On August 11, 2015, after making 26 submissions to TREC providing a total of 510 documents, Losey had found a total of 485 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 953 documents. After the 26th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 95.85% had been attained, with Precision of 95.1%. There were 14 additional submissions to TREC after the Reasonable call point. In the next submission after the call, of only 121 documents, a Recall of 98.62% was attained. Recall of 100% was attained three submissions later after submitting only 1,074 documents, 0.37% of the total, and review of only 953 documents.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Scarlet Letter Law topic, by the time 97.5% Recall had been attained only 0.20% of the corpus, 585 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.80% or 289,514 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.

______________________________________

Topic 100 School and Preschool Funding

Confusion Matrix - Topic 100
Total Documents: 290,097   Total Relevant: 4,542   Total Prevalence: 1.57%

Topic 100 was run by Losey with the limited assistance of a review attorney, Jensen. The work to search the 290,099 Bush Emails started on August 4, 2015 and concluded on August 8, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was performed with some assistance at first of Jensen. Losey handled all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made a couple of suggestions of documents to submit. Again, all final decisions on submittal were made by Losey.

On August 6, 2015, after making 44 submissions to TREC providing a total of 2,537 documents, Losey had found a total of 2,441 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 651 documents. After the 44th TREC submission, Losey decided to call Reasonable. This proved to be a premature call. It was later determined that a Recall of 53.74% had been attained, with Precision of 96.22%, and F1 of 68.96%. There were 19 additional submissions to TREC after the Reasonable call point. After submitting a total of 7,541 documents, which is only 2.6% of the total documents, and reviewing only 651 documents, a 70% Recall level was attained. A Recall of 80% was attained after submitting 6.28% of the total documents, and Recall of 90% after submitting 7.92%.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

                  @ Reas. Call   @ 97.5% Recall
True Positives    2,441          4,429
True Negatives    285,459        199,460
False Positives   96             86,095
False Negatives   2,101          113
Recall            53.74%         97.51%
Precision         96.22%         4.89%
F1 Measure        68.96%         9.32%
Accuracy          99.24%         70.28%
Error             0.76%          29.72%
Elusion           0.73%          0.06%
Fallout           0.03%          30.15%

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the School and Preschool Funding topic, by the time 97.5% Recall had been attained only 31.20% of the corpus, 90,524 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 68.80% or 199,573 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.

______________________________________

Topic 107 Tort Reform

Confusion Matrix - Topic 107
Total Documents: 290,099   Total Relevant: 2,369   Total Prevalence: 0.82%

                  @ Reas. Call   @ 97.5% Recall
True Positives    1,950          2,310
True Negatives    287,421        284,197
False Positives   309            3,533
False Negatives   419            59
Recall            82.31%         97.51%
Precision         86.32%         39.53%
F1 Measure        84.27%         56.26%
Accuracy          99.75%         98.76%
Error             0.25%          1.24%
Elusion           0.15%          0.02%
Fallout           0.11%          1.23%


Topic 107 was run by Losey with the limited assistance of a review attorney, Jensen. The work to search the 290,099 Bush Emails started on August 5, 2015 and concluded on August 15, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was performed with some assistance at first of Jensen. Losey handled all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made a couple of suggestions of documents to submit. Again, all final decisions on submittal were made by Losey.

On August 14, 2015, after making 48 submissions to TREC providing a total 2,259 documents, Losey had found a total of 1,950 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 1,164 documents. After the 48th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 82.31% had been attained, with Precision of 86.32%, and F1 of 84.27%. There were 31 additional submissions to TREC after the Reasonable call point. After submitting a total of 2,648 documents, which is only 0.91% of the total documents, and reviewing only 1,164 documents, a 90% Recall level was attained with 80.55% Precision. Recall of 95% was attained after submitting 3,963 documents, 1.37% of total. Recall of 98% was attained after submitting 5,843 documents, 2.01% of total.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Tort Reform topic, by the time 97.5% Recall had been attained only 2.01% of the corpus, 5,843 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 97.99% or 284,256 documents.


The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.

______________________________________


Topic 3481 Fracking

Confusion Matrix - Topic 3481 Fracking
Total Documents: 902,434
Total Relevant: 1,966
Total Prevalence: 0.22%

                  @ Reas. Call    @ 97.5% Recall
True Positives           1,893             1,917
True Negatives         900,284           899,841
False Positives            184               627
False Negatives             73                49
Recall                  96.29%            97.51%
Precision               91.14%            75.35%
F1 Measure              93.64%            85.01%
Accuracy                99.97%            99.93%
Error                    0.03%             0.07%
Elusion                  0.01%             0.01%
Fallout                  0.02%             0.07%

Topic 3481 was run by Sullivan who started on August 4, 2015. He finished his review of 902,434 News Articles on Aug. 7, 2015 after 7 total hours of effort.

Sullivan had no background or knowledge of fracking prior to this exercise. While expert knowledge was not necessary, there were a few instances where some additional knowledge of the topic would have been helpful.

Sullivan had previously tackled topics in the forums dataset, but this was his first topic in the News dataset. He found the lack of spelling issues and overall consistency in the documents provided a much easier set of data to review. Much less manual review was necessary with the news topics.

On the first day, Sullivan used concept searching to identify similar topics, per his standard process. He created a list of most likely relevant keywords and used the list for searching and keyword highlighting. Both search and keyword highlighting lists were modified through the course of the review as new information was obtained.

Sullivan decided to go with a different approach to this topic. Rather than performing a manual review of documents to begin, he decided to submit as relevant any document that contained over 5 instances of the term “fracking” without review. 286 documents met this standard, and all were returned as relevant when submitted to TREC.

While the data used for this exercise did not contain any metadata, Sullivan determined any text that appeared in the first 2 lines of the document could be considered the document’s title. He found 61 documents that contained “fracking” in the title and an additional instance of fracking elsewhere in the document. Of these, 60 were returned as relevant and 1 as not relevant. Further evaluation determined the not relevant document was an error in the TREC standard. Next, 9 documents were found which contained “hydrofracking” in the title. All 9 were returned as relevant. He then continued with slight variations until submitting all documents that contain 2 or more hits on the term “fracking.” After 1 hour and manual review of 29 documents, 746 documents had been submitted with 745 being returned as relevant.

Sullivan continued manually reviewing the documents with a single hit on fracking to sort out the false positives. After reviewing a couple of sets of documents, he initiated his first predictive coding learning session for this topic.

On the start of Day 2, Sullivan believed he had found nearly all relevant documents for this topic. However, after reviewing documents with high predictive coding scores, he quickly realized that “fracturing” was another key term he hadn’t previously considered. The use of predictive coding helped him quickly find an additional 400 relevant documents that would have been lost if using keyword searching alone.

Reasonable Recall was called after submitting 2,077 documents, with 1,893 returned as relevant. The remaining documents were submitted in order of descending predictive coding scores, and 73 more relevant documents were returned. An evaluation of the returned documents revealed many errors in the TREC standard, as well as a fair number of relevant documents that were not properly captured due to Sullivan’s lack of knowledge of fracking and related mining terms.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.
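Sullivan's two no-review rules for this topic (more than five hits on the term, or a hit in the first two lines plus at least one more hit elsewhere) can be sketched as a simple filter. This is only an illustration under stated assumptions: the function names are hypothetical, whole-word case-insensitive matching is assumed, and nothing here reflects how EDR itself counts hits.

```python
import re


def term_count(text: str, term: str) -> int:
    """Count case-insensitive whole-word occurrences of `term` (assumed matching rule)."""
    return len(re.findall(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE))


def pseudo_title(text: str, n_lines: int = 2) -> str:
    """Treat the first `n_lines` lines as the title, since the data carried no metadata."""
    return "\n".join(text.splitlines()[:n_lines])


def submit_without_review(text: str, term: str = "fracking", min_hits: int = 6) -> bool:
    """True when the document meets either heuristic described in the narrative."""
    total = term_count(text, term)
    in_title = term_count(pseudo_title(text), term)
    over_five_hits = total >= min_hits
    title_plus_body_hit = in_title >= 1 and (total - in_title) >= 1
    return over_five_hits or title_plus_body_hit
```

Because whole-word matching is assumed, a term such as “hydrofracking” does not count as a hit on “fracking” and would need its own pass, as it did in the review.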

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Fracking topic, by the time 97.5% Recall had been attained only 0.27% of the corpus, 2,439 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.73% or 899,995 documents.


The last chart below represents the amount of effort in terms of documents reviewed to attain 100% recall using the multimodal hybrid model of training EDR.

______________________________________


Topic 3431 Kingston Mills Lock Murders

Confusion Matrix - Topic 3431
Total Documents: 902,434
Total Relevant: 1,111
Total Prevalence: 0.12%

                  @ Reas. Call    @ 97.5% Recall
True Positives           1,107             1,084
True Negatives         901,309           901,311
False Positives             14                12
False Negatives              4                27
Recall                  99.64%            97.57%
Precision               98.75%            98.91%
F1 Measure              99.19%            98.23%
Accuracy               100.00%           100.00%
Error                    0.00%             0.00%
Elusion                  0.00%             0.00%
Fallout                  0.00%             0.00%

Topic 3431 was run by Reichenberger. The work to search the 902,434 News Articles database started on August 4, 2015, and was completed on August 5, 2015. The initial submissions on the first day were to test the outlines of the category. The initial search of “Kingston” AND “murder” identified a sensationalized murder story about a man with the last name “Shafia” murdering his daughters in an “honor killing.” Documents containing the information in various forms (headline, text, “clickbait” link reference at end of article) were submitted. Results helped formulate an anticipated rule on relevance.

After training Mr. EDR and receiving relevance priority scores, the results of a search on the specific victim names or “Shafia” were sorted by prioritization order. Samples of 10 documents above 90%, 10 between 80-90%, 10 between 60-80%, 10 between 25-60% and 10 below 25% showed that documents above 60% were very likely relevant. In fact, documents scoring over 90% all had multiple name hits and were specifically on point; documents in the middle ranges were usually indirectly related (e.g. about “honor killing,” or domestic abuse, or more of a casual reference to the Kingston Mills murders); and those documents below 5% were almost always irrelevant. As a test, the second submission contained all documents with a score over 90%, along with samples of several documents at various scores greater than 50%, cutting the submission off at 200 documents even. With only 111 documents reviewed eyes on to this point, Reichenberger had a 98.5% precision on 205 documents submitted.

Of the 205 documents submitted to this point, the only 3 irrelevant documents all had the same trait: “Shafia” appeared in the header but there was no reference to it in the text. Similar documents were mass coded as irrelevant going forward. Likewise, people with names similar to the victims were found in the 40-60% probability range but were “false positive” documents. These included an AP photographer, the President of Gambia, and protesters in Yemen with first names the same as one of the victims. Searches were done on those specific names and the hits were mass-tagged as irrelevant. After a machine learning session, the scores adjusted, dropping those false positive names to the bottom.

At this point, a sampling of key term hits showed that everything scoring over 20% was relevant, and everything below 1% was irrelevant. Everything in between was a low quality reference to the murders, with some irrelevant documents mixed in. As such, the next submission was for everything with a key term and a relevance score over 25% (456 documents), of which 449 were found relevant. The 7 documents found irrelevant were misclicks by Reichenberger (human error). In one case a document was primarily about a different murder, but later in the article there was relevant discussion of the target murder. Mr. EDR picked this up, but it was apparently missed by TREC’s relevance scope adjudications. The 70% Recall call was then made having reviewed only 209 documents. It turned out that Recall was actually 58.6% with Precision at 98.5%.

The next submission consisted largely of documents containing a single line of “clickbait” link text found by TREC to be relevant. Other documents considered were documents with key terms that had scores raised above 20% following the machine learning session from the previous set, and documents with scores above 50% with no key terms. While documents with key terms were largely found to be relevant, most of the documents without the terms were found to be irrelevant. In fact, documents scoring above 70% were often tangential to the issues in the murder (domestic violence mostly) but not relevant, while those at 50-70% had no semblance of relevance at all, and were being escalated based on coincidental “clickbait” text advertisement lines at the end of the article. Another 459 documents were submitted, of which 456 were found relevant. The three irrelevant documents were all at the low end of the scores within the submission and were only passing references to the case.

At this point the 80% recall call was made. Recall was actually at 99.64% with a precision at 99.34%. Only 272 documents were reviewed eyes on to this point, and 1,120 relevant documents had been found. All documents with scores over 70% had been reviewed or submitted, and all those with key terms and scores over 20% had been reviewed or submitted. Following the subsequent machine learning session, 30 documents were escalated to consider. One borderline document was considered potentially relevant and submitted, and was returned as irrelevant, while the rest were all marked irrelevant. The Reasonable call was made.

After the Reasonable call was made, documents were submitted in the following groups in descending priority score order: 1) three potentially relevant documents found while results of the previous submission were pending (one was found to be relevant); 2) all documents reviewed eyes on and anticipated to be irrelevant, but not yet submitted (199 documents, of which two were relevant, and the only relevant text within these two documents was contained in a document previously submitted to TREC and returned as irrelevant); 3) anything mass-coded as irrelevant (this resulted in one relevant document, in which there does not appear to be any relevant material and which may be yet another TREC coding error); and 4) anything remaining (all irrelevant).

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the Reasonable call.
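The score-band spot check described above (ten documents from each of five probability bands) amounts to stratified sampling over the ranking. The following is a minimal sketch only; the band edges come from the narrative, while the function name, the handling of band boundaries, and the fixed seed are assumptions made for illustration.

```python
import random

# Probability bands (percent) used in the narrative: above 90, 80-90, 60-80, 25-60, below 25.
# Each band is treated as [low, high); that boundary handling is an assumption.
BANDS = [(90, 101), (80, 90), (60, 80), (25, 60), (0, 25)]


def sample_by_band(scores: dict[str, float], per_band: int = 10, seed: int = 0):
    """Draw up to `per_band` document ids at random from each score band."""
    rng = random.Random(seed)
    samples = {}
    for low, high in BANDS:
        ids = sorted(doc_id for doc_id, score in scores.items() if low <= score < high)
        samples[(low, high)] = rng.sample(ids, min(per_band, len(ids)))
    return samples
```

Reviewing each band's sample separately is what let the reviewer place a relevance cutoff (here, roughly 60%) before submitting whole score ranges without eyes-on review.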


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Kingston Mills Lock Murders topic, by the time 97.5% Recall had been attained only 0.12% of the corpus, 1,096 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.88% or 901,338 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the Multimodal Hybrid model of training Mr. EDR.


______________________________________

Topic 2130 Surely Bitcoins Can Be Used

Confusion Matrix - Topic 2130
Total Documents: 465,147
Total Relevant: 2,299
Total Prevalence: 0.49%

                  @ Reas. Call    @ 97.5% Recall
True Positives           1,961             2,242
True Negatives         461,007           448,083
False Positives          1,841            14,765
False Negatives            338                57
Recall                  85.30%            97.52%
Precision               51.58%            13.18%
F1 Measure              64.29%            23.23%
Accuracy                99.53%            96.81%
Error                    0.47%             3.19%
Elusion                  0.07%             0.01%
Fallout                  0.40%             3.19%


Topic 2130 was run by Reichenberger. The work to search the 465,147 documents in the BlackHatWorld Forums database started on August 7, 2015 and was completed August 13, 2015. The initial submissions were to test the outlines of the category. The first submission was nine documents with varying discussions about Bitcoin (e.g. bitcoin exchanges, whether bitcoin was accepted, bitcoin mining, etc.). All nine came back as irrelevant. A second submission of nine returned five relevant documents but no noticeable commonality among them except that “accept bitcoin” was relevant and “accept bitcoins” was not. The next 25 documents submitted also followed this trend, with singular “accept bitcoin” being relevant, and those in the plural being irrelevant. All documents with “accept w/3 bitcoin” were submitted in the following two submission sets; however, having that text was not indicative of relevance, as some still came back irrelevant. Likewise, a variation of bitcoin (“BTC”) was submitted (15 relevant, 5 irrelevant, no consistent thread).

After a machine learning session, the submitted documents were revisited and it appeared that using bitcoin for legal activity, or someone vouching for a forum user, tended to be relevant, while illegal or immoral activity was irrelevant. For the next submission, the 60 highest scoring documents were submitted and anticipated as relevant or irrelevant based on the purpose of the transaction. While not perfect, this largely correlated with the results (10 expected relevant, end result was 13). The next submission contained all documents with a 90% or higher probable relevant score and containing the term “vouch*”. Of the 122 documents, 94 were relevant.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable call.
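The “accept w/3 bitcoin” query above is a proximity search. As a rough sketch, and assuming that w/3 means the two terms occur within three words of each other in either direction (a common convention in e-discovery search syntaxes; EDR's exact semantics are not stated here), the test could look like this. The function name and tokenization rule are hypothetical.

```python
import re


def within(text: str, first: str, second: str, max_gap: int = 3) -> bool:
    """True when `first` and `second` occur as whole words at most `max_gap` words apart."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    return any(abs(a - b) <= max_gap for a in first_positions for b in second_positions)
```

Exact word matching is used on purpose: it treats “bitcoin” and “bitcoins” as different terms, which mirrors the singular versus plural split the reviewer observed in the TREC judgments.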

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Surely Bitcoins Can Be Used topic, by the time 97.5% Recall had been attained only 3.66% of the corpus, 17,007 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 96.34% or 448,140 documents.


The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the Multimodal Hybrid model of training Mr. EDR.

______________________________________


Topic 3089 Pickton Murders

Confusion Matrix - Topic 3089
Total Documents: 902,434
Total Relevant: 255
Total Prevalence: 0.03%

                  @ Reas. Call    @ 97.5% Recall
True Positives             236               249
True Negatives         902,164           901,971
False Positives             15               208
False Negatives             19                 6
Recall                  92.55%            97.65%
Precision               94.02%            54.49%
F1 Measure              93.28%            69.94%
Accuracy               100.00%            99.98%
Error                    0.00%             0.02%
Elusion                  0.00%             0.00%
Fallout                  0.00%             0.02%

Topic 3089 was run by Joe White. Work on this topic commenced on August 5, 2015 and concluded on August 28, 2015. Approximately 24 hours were spent on this topic, including a few hours up front researching the subject matter. This served as a proxy for the e-Discovery Team Hybrid Multimodal Model, Step 1, ESI Discovery Communications. Completion of this Topic was drawn out due to time conflicts including vacation.

The collection of 902,434 News Articles was generally easier to search than the Bush Emails or BlackHatWorld Forum posts, though the news articles contained many links, footers and subject matters that were shared with other news stories, creating the appearance of similarity. As would be expected with news articles, misspelled words and names seemed nonexistent, which was helpful. White did, however, find a few gold-standard inconsistencies in this topic.

White began Step Two, multimodal search, by creating several keyword lists based on his judgment and notes from the initial topic research. This research included events, names, locations, and other information related to the case. The keyword list goals were: (a) to create a seed set to begin finding the potentially relevant documents and to begin training Mr. EDR; (b) to guesstimate how large the relevant document set would be, a kind of rough substitute for the Step Three sample; and (c) to highlight relevant terms in the software to facilitate more effective review and training. (Note: all reviewers so highlighted certain keywords as a matter of course to speed up and improve review.)

When the initial keywords brought back only just over 220-some documents, while still cognizant of the limitations of keyword search, White believed this meant a relatively small potential dataset existed. This afforded him the ability to perform a linear review of all of the keyword hits, but also meant that precision would be easily harmed by false positives. For that reason White knew that care would be needed in ascertaining true relevance. A normal Step 3, initial Random Baseline sample, was omitted given the likely low prevalence and general time constraints for the work.

Based on the initial judgmental sample reviews in Step Two, White submitted initial sets of documents to TREC to establish relevance boundaries and begin whittling down the set of relevant candidate documents. A minor loss of precision was anticipated on certain documents in exchange for knowledge that would guide subsequent submissions. Each time documents were determined to be relevant, White updated the training and predictive ranking, to facilitate priority-driven review that augmented the judgmental sampling work (see steps Four, Five and Six: AI Predictive Ranking, Multimodal Search Review & Hybrid Active Training). He also utilized conceptual search (predominantly Find Similar, via LSI) to branch off particularly interesting or novel documents to learn more. Although White, like all of the reviewers, did use concept search and similarity search, he found that the predictive coding rankings (using a more robust technology) proved to be more effective overall. All reviewers had the same experience.

During the initial part of the submission process, White trained on all documents deemed relevant or irrelevant by TREC. This helped create additional separation in the model and rankings. In one instance he left one obvious TREC mistake trained as relevant (a duplicate of another document that had been adjudicated relevant) in order to ensure he would find any others like it.

During the predictive analysis and training, White found it was most helpful to review certain sets of documents from the bottom up, to analyze the least-likely candidates in cases where relevance seemed clear. In other sets of documents, where relevance seemed less certain, White reviewed from the top down. After additional analysis was completed and 99 documents had been submitted to TREC, White predicted there would be 200–250 relevant documents in total. (In the end, he would learn there were 255 total relevant documents in this topic, so the early prediction turned out to be quite close.)

White also used random sampling in one instance, to train a set of 100 documents that seemed clearly irrelevant. These documents assisted Mr. EDR in separating irrelevant docs from relevant ones at a point early in the process when only relevant documents had been trained. This was part of the Team’s experimentation with the ideal ratios of irrelevant to relevant documents in training models.

As is almost always the case with an iterative training process, as the training and learning commenced, additional relevant subject areas came to light. While almost all of these areas were somewhat apparent from the start, fascinating and subtle nuances emerged. News stories on the case took little turns and spawned entirely new areas of relevance unto themselves. White thought the biggest challenge with these documents wasn’t as much about whether they existed or how to locate them, but about whether TREC would see them as relevant or not. He found that it helped to track each pocket of relevance as a separate subject area, to utilize keywords for each subject area to create small seed sets, and to then utilize the predictive rankings within each subject area to dive deeper and ensure that each was adequately explored.

White made a total of 56 document submissions to TREC in this topic: 6 submissions between Aug. 6th and 12th, encompassing 184 documents, 22 submissions between Aug. 21st and 27th, encompassing 284 documents, and the remaining 28 submissions on Aug. 28th, encompassing 901,966 documents. In between most of these submissions he conducted iterative steps Four, Five and Six of the standard workflow, utilizing predictive ranking, search, and training.


After 218 documents had been submitted and additional priority-ranked documents and top keyword sets had been evaluated, White called 70%. There was still a fair quantity of suspected borderline documents in hand, but his intuition was that he had probably surpassed 70% by a fair margin and so needed to call the shot. Actual Recall at this point turned out to be 83.53%.

White then studied closely the suspected borderline documents before he decided to submit them. He was attempting to determine the scope of relevance for these subject areas. After locating what he believed to be the full extent of the subject, and having found 23 more relevant documents, he called the 80% shot. White believed he was even farther along than 80%, given the ranked results he was seeing. As it turned out the actual Recall at this point was 92.55%.

After submitting 8 more documents that he thought might be considered relevant, but were close questions and probably would not, White called Reasonable. This was with 251 total documents submitted, 236 of them relevant, and only 779 documents reviewed. Actual Recall at this point was still 92.55%.

Having called Reasonable and finding nothing new that looked relevant, White turned to his pool of remaining documents that looked irrelevant, to allow the predictive ranking to help him begin submitting them. Indeed, Mr. EDR helped see things he could not, and soon found 18 additional documents that contained an oblique reference to a subject related to the case. While these documents seemed just as oblique as others that were deemed irrelevant, the fact that the predictive rankings caught them quickly was reassuring. After an additional round of training and predictive ranking turned up no additional documents, the submissions continued. Finally, at the 2,000th document submitted, a “relevant” document was discovered that completed the 255-doc set. This document appeared to be a clear mistake, as it was only a reference to an unrelated London, UK murder. After that, all remaining documents submitted were confirmed as irrelevant.

On August 28, 2015, after making 19 submissions to TREC providing a total 251 documents, White had found a total of 236 relevant documents. The effort, or number of documents reviewed and coded by White to attain this result, was 834 documents. After the 18th TREC submission, White decided to call Reasonable. It was later determined that a Recall of 92.55% had been attained, with Precision of 94.02%. There were 37 additional submissions to TREC after the Reasonable call point. After submitting a total of 462 documents, which is only 0.05% of the total 902,434 documents, and reviewing only 834 documents, a 99.61% Recall level was attained with 54.98% Precision. 100% Recall with 12.75% Precision was attained after submission of 2,000 documents, which is 0.22% of the total.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Pickton Murders topic, by the time 97.5% Recall had been attained only 0.05% of the corpus, 457 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.95% or 901,977 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.


______________________________________

Topic 2461 Offshore Host Sites

Confusion Matrix - Topic 2461 Offshore Host Sites
Total Documents: 465,147
Total Relevant: 179
Total Prevalence: 0.04%

                  @ Reas. Call    @ 97.5% Recall
True Positives             175               175
True Negatives         463,225           463,408
False Positives          1,743             1,560
False Negatives              4                 4
Recall                  97.77%            97.77%
Precision                9.12%            10.09%
F1 Measure              16.68%            18.29%
Accuracy                99.62%            99.66%
Error                    0.38%             0.34%
Elusion                  0.00%             0.00%
Fallout                  0.37%             0.34%

Topic 2461 was run by Sullivan who started on August 14, 2015. He finished his review of 465,147 forum posts in BlackHatWorld on Aug. 15, 2015 after 5.0 total hours of effort.

Sullivan’s background and knowledge in host sites was expected to be helpful in this topic, but in reality it worked against him. While he does not consider himself to be a subject matter expert on this topic, he has a solid level of knowledge with host sites. This proved difficult, because he thought he knew what documents should be considered relevant, but the TREC gold standard disagreed with most of his determinations.

Per his standard process, Sullivan started with concept searching to identify popular keywords to use for highlighting and future searches. This generated a long list of terms relating to different hosting sites and VPNs.

Sullivan continued with the next step of finding some documents to seed for predictive coding and get an understanding of the TREC line for relevance. He found 8 documents that hit on “offshore host* site*” and contained clearly relevant content by his definition. TREC determined all 8 to be not relevant. He then found 5 documents that relate to specific offshore hosting sites, such as hostingpanama and anonhoster. TREC returned 1 relevant and 4 not relevant. He continued to try different variations of terms relating to hosting in specific countries and documents with different types of content and could not find any logic to the TREC relevance standard. Frustrated, he initiated a learning session and took a break.

Upon returning, he decided to try a test submission of 29 top scoring documents that contained the text “offshore” w/2 “host” without looking at any of the documents. To his surprise, 26 of the documents were returned by TREC as relevant. In a review of the documents, he saw no difference between the content of the TREC relevant documents and the documents he found and submitted that were returned as not relevant. The only general correlation he was able to identify was that the TREC standard appeared to favor smaller sized documents with a higher proportion of content dedicated to offshore host sites. A document with a single line discussing offshore host sites was more likely to be relevant than a document with 50 lines and 10 references.

Being unable to determine any reasonable connection between content and relevance, Sullivan had no choice but to continue riding Mr. EDR’s suggestions for documents to submit. This process consisted of many iterations of learning sessions and searching. Similar to how Sullivan reviewed Topics 2052 and 3481, he started with a narrow list of keyword searches and broadened the terms iteratively. For each set, he submitted the documents with the highest predictive coding scores. Starting with “offshore” w/2 “host*,” he moved to “offshore” and “host,” “offshore” and “web,” and “offshore” and “vpn.” Eventually he moved to all documents that contained “offshore” or “hosting.”

The difference between this process and what was used in prior reviews is that Sullivan did not actually look at any of the documents. As he found his judgment to be out of line with the TREC standard, documents were submitted without review. Results of a search would be taken and the top documents would be submitted. If most were determined to be relevant, lower sets of documents from the result would be submitted until a low number of relevant documents was returned. He would then move on to the next search and repeat. After exhausting all of the key terms, Sullivan submitted all remaining documents in descending priority order.
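The submit-without-review loop described above (take a search result in ranked order, submit from the top, and keep going down the list until few relevant documents come back) can be sketched as follows. This is an illustration under assumptions, not the Team's actual procedure: the batch size, the yield cutoff, and the `judge` callback standing in for TREC's instant relevance feedback are all hypothetical.

```python
from typing import Callable, Sequence


def submit_until_low_yield(
    ranked_ids: Sequence[str],
    judge: Callable[[Sequence[str]], int],
    batch_size: int = 25,
    min_yield: float = 0.5,
) -> list[str]:
    """Submit ranked documents batch by batch until a batch's relevant share drops below `min_yield`.

    `ranked_ids` is one search result sorted by descending predictive coding score.
    `judge` submits a batch and returns how many of its documents came back relevant.
    """
    submitted: list[str] = []
    for start in range(0, len(ranked_ids), batch_size):
        batch = ranked_ids[start:start + batch_size]
        relevant = judge(batch)
        submitted.extend(batch)
        if relevant / len(batch) < min_yield:
            break  # yield has dried up; move on to the next, broader search
    return submitted
```

Running this once per search, from the narrowest query to the broadest, reproduces the overall shape of the process: each search is mined only while the feedback stays favorable.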


A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Offshore Host Sites topic, by the time 97.5% Recall had been attained only 0.37% of the corpus, 1,735 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.63% or 463,412 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% recall using the multimodal hybrid model of training EDR.

______________________________________


Topic 3290 Rooster Turkey Chicken Nuisance

Confusion Matrix - Topic 3290
Total Documents: 902,434
Total Relevant: 26
Total Prevalence: 0.00%

                  @ Reas. Call    @ 97.5% Recall
True Positives              23                26
True Negatives         902,336           885,020
False Positives             72            17,388
False Negatives              3                 0
Recall                  88.46%           100.00%
Precision               24.21%             0.15%
F1 Measure              38.02%             0.30%
Accuracy                99.99%            98.07%
Error                    0.01%             1.93%
Elusion                  0.00%             0.00%
Fallout                  0.01%             1.93%

Topic 3290 was run by Losey alone, who started on August 15, 2015 and concluded on August 23, 2015. The project commenced as usual with Losey beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal search, began, including predictive coding features, with iterated training.

On August 22, 2015, after making 14 submissions to TREC, and training after almost every submission, Losey had provided a total of 95 documents to TREC and confirmed a total of 23 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 306 documents. After the 14th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 88.46% was attained by submission of only 95 documents, which is 0.01% of the total 902,434 documents. This was accomplished by review of only 0.03% of the total collection. There were 23 additional submissions to TREC after the Reasonable call point. In the next submission after the Reasonable call, the 15th, the Recall level rose to 96.15%. Recall of 100% was attained after submission of only 1.93% of the corpus. A 90% Recall was attained after submitting only 129 documents. A 95% Recall was attained after submitting 1,923 documents, and 97.5% Recall attained after 3,188 documents. Total Recall was attained after submitting 17,414 documents out of the corpus total of 902,434 (1.93%).

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Rooster Turkey Chicken Nuisance topic, by the time 97.5% Recall had been attained only 1.93% of the corpus, 17,414 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 98.07% or 885,020 documents.
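With only 26 relevant documents in this topic, recall can only move in steps of 1/26, so the 97.5% threshold is first met when every relevant document has been found. The small sketch below shows the arithmetic; the function name is illustrative, and it uses integer math to avoid floating-point rounding.

```python
def relevant_docs_needed(total_relevant: int, recall_per_mille: int = 975) -> int:
    """Smallest number of relevant documents that reaches the recall threshold.

    The threshold is given in thousandths (975 means 97.5%), and the result is
    the ceiling of total_relevant * threshold, computed with integers only.
    """
    return -(-total_relevant * recall_per_mille // 1000)
```

For 26 relevant documents this gives 26, which is consistent with the 97.5% Recall column for this topic reporting 100.00% Recall and zero false negatives. For larger topics the same calculation matches the True Positives reported at 97.5% Recall, for example 249 of 255 for Topic 3089 and 175 of 179 for Topic 2461.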

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.


______________________________________

Topic 2333 Article Spinner Spinning

Confusion Matrix - Topic 2333
Total Documents: 465,147
Total Relevant: 4,805
Total Prevalence: 1.03%

                  @ Reas. Call    @ 97.5% Recall
True Positives           4,201             4,685
True Negatives         457,877           450,329
False Positives          2,465            10,013
False Negatives            604               120
Recall                  87.43%            97.50%
Precision               63.02%            31.88%
F1 Measure              73.24%            48.04%
Accuracy                99.34%            97.82%
Error                    0.66%             2.18%
Elusion                  0.13%             0.03%
Fallout                  0.54%             2.18%


Topic 2333 was run by Losey who also started on August 19, 2015. He finished his review of 465,147 forum posts in BlackHatWorld on August 23, 2015. The project commenced as usual with Losey beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal search, began, including predictive coding features, with iterated training.

On August 21, 2015, after making 23 submissions to TREC, and training after almost every submission, Losey had provided a total of 6,666 documents to TREC and confirmed a total of 4,201 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 228 documents. After the 23rd TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 87.43% was attained by submission of only 6,666 documents, which is 1.43% of the total 465,147 documents. This was accomplished by personal review of only 228 documents, 0.05% of the total collection. There were 32 additional submissions to TREC after the Reasonable call point. Recall of 90% was attained after submitting 7,091 documents, and 95% Recall after 10,931. Recall of 98% was reached after submitting 14,698 documents, which was only 3.16% of the total collection of 465,147 BlackHatWorld Forum posts. Again, this was accomplished by personal review of only 228 documents, 0.05% of the total collection. In all topics we always stopped individual document review after the Reasonable call and relied on Mr. Robot’s automatic processes, wherein the documents were submitted in order of highest ranking.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Article Spinner Spinning topic, by the time 97.5% Recall had been attained only 3.16% of the corpus, 14,698 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 96.84% or 450,449 documents.
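The share of the corpus submitted at a given point, and the number of documents left over, are simple ratios of the reported counts. The sketch below shows the arithmetic; the helper name `corpus_share` is illustrative only.

```python
def corpus_share(submitted, total):
    """Return (percent of corpus submitted, documents not yet submitted)."""
    return 100.0 * submitted / total, total - submitted


TOTAL_DOCS = 465_147  # Total Documents reported for Topic 2333

# Documents submitted when 97.5% Recall was reached.
pct, remaining = corpus_share(14_698, TOTAL_DOCS)
print(f"{pct:.2f}% submitted, {remaining:,} remaining")

# Documents submitted at the Reasonable call.
pct_call, _ = corpus_share(6_666, TOTAL_DOCS)
print(f"{pct_call:.2f}% submitted at the Reasonable call")
```

The first line of output reproduces the 3.16% and 450,449 figures quoted above.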


The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.

______________________________________


Topic 2129 Facebook Accounts

Confusion Matrix - Topic 2129
Total Documents: 465,147
Total Relevant: 589
Total Prevalence: 0.13%

                   @ Reas. Call    @ 97.5% Recall
True Positives     580             575
True Negatives     461,284         462,644
False Positives    3,274           1,914
False Negatives    9               14
Recall             98.47%          97.62%
Precision          15.05%          23.10%
F1 Measure         26.11%          37.36%
Accuracy           99.29%          99.59%
Error              0.71%           0.41%
Elusion            0.00%           0.00%
Fallout            0.70%           0.41%

Topic 2129 was run by Sullivan, who started on August 21, 2015. He finished his review of 465,149 forum posts in BlackHatWorld on August 22, 2015.

While he counts himself among Facebook’s 1.5 billion active users, Sullivan does not consider himself more knowledgeable on this topic than the average person.

Day 1 on this topic started like all Sullivan topics with concept searching to find keywords relating to Facebook accounts for searching and highlighting. Specifically, variations of Facebook spelling and slang were investigated to ensure all common variants were identified. Many previously unexpected variations of Facebook were identified, such as fbook. All variations were added to the highlighting list and documented for future searches.

Sullivan spent 2.5 hours on Day 1 trying to define relevance according to the TREC standard. He started with 8 documents that contained clear references to Facebook accounts, and only 1 of the documents was returned as relevant according to the TREC standard. He continued by isolating documents that contained “Facebook account*” in the title as well as a number of common variants. At the end of the day, Sullivan was no closer to cracking the Facebook puzzle and was barely able to exceed 50% precision even though he was only submitting documents that were certain to be relevant by any objective standard.

Facing what appeared to be a dead-end, Sullivan started Day 2 by relying on the priority scores generated by Mr. EDR, and started to see much better results. While Sullivan was unable to identify which documents would be returned as responsive by TREC, Mr. EDR seemed to be able to find the pattern. As such, he stopped looking at the documents, and just started submitting all documents that had a high priority score and contained the term Facebook or any known variation, with learning sessions being run periodically to update the scores based on new learning. Once those documents were exhausted, all remaining documents were submitted in descending priority score order. He spent 2.75 hours submitting and evaluating the results, for a total of 5.25 hours spent on this topic.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Facebook Accounts topic, by the time 97.5% Recall had been attained only 0.54% of the corpus, 2,489 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.46% or 462,658 documents.


The last chart below represents the amount of effort in terms of documents reviewed to attain 100% recall using the multi-modal hybrid model of training EDR.
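The Day 2 submission order described for this topic (documents containing Facebook or a known variant first, then everything else, each group in descending priority score order) can be sketched as follows. This is a simplified illustration: the `Doc` record, the scores, and the two-entry variant pattern are hypothetical, and the periodic retraining between submissions is left out.

```python
import re
from operator import attrgetter
from typing import NamedTuple


class Doc(NamedTuple):
    doc_id: str
    text: str
    score: float  # hypothetical priority score; higher means ranked more likely relevant


# Hypothetical variant list; only "fbook" is named explicitly in the text above.
VARIANTS = re.compile(r"\b(facebook|fbook)\b", re.IGNORECASE)


def submission_order(docs):
    """Term-bearing documents first, then the rest, each by descending score."""
    with_term = [d for d in docs if VARIANTS.search(d.text)]
    without_term = [d for d in docs if not VARIANTS.search(d.text)]
    by_score = attrgetter("score")
    return (sorted(with_term, key=by_score, reverse=True)
            + sorted(without_term, key=by_score, reverse=True))


docs = [
    Doc("a", "selling aged fbook accounts", 0.40),
    Doc("b", "general seo tips", 0.90),
    Doc("c", "Facebook account got banned", 0.80),
    Doc("d", "twitter bots for sale", 0.10),
]
print([d.doc_id for d in submission_order(docs)])
```

Document "b" has the highest score in this toy set but is held back until the term-bearing documents are exhausted, which mirrors the ordering described above.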

______________________________________

Topic 3378 Rob McKenna Gubernatorial Candidate

Confusion Matrix - Topic 3378
Total Documents: 902,434
Total Relevant: 66
Total Prevalence: 0.01%

                   @ Reas. Call    @ 97.5% Recall
True Positives     59              65
True Negatives     902,321         902,264
False Positives    47              104
False Negatives    7               1
Recall             89.39%          98.48%
Precision          55.66%          38.46%
F1 Measure         68.60%          55.32%
Accuracy           99.99%          99.99%
Error              0.01%           0.01%
Elusion            0.00%           0.00%
Fallout            0.01%           0.01%


Topic 3378 was run by Reichenberger. The work to search the 902,434 News Articles database started on August 22, 2015, and was completed on August 23, 2015.

The initial submissions on the first day were to test the outlines of the relevance scope. It was ascertained in the first two submissions that documents relating to McKenna as a candidate were relevant, and those related to his job as Attorney General were irrelevant. Borderline documents were those associated with his Attorney General job that could be pretext to a political campaign (e.g. filing a suit related to Obamacare implementation). The third submission was made with the next 65 documents based on prioritization without looking at the content; the results largely confirmed the anticipated parameters (43 relevant, 22 irrelevant, with the borderline documents skewing to the irrelevant). The 70% call was made following the return of results.

After looking at what was being promoted by prioritization and containing “McKenna,” the next 13 documents were submitted. Most of these appeared to be borderline; only 4 were adjudicated relevant by TREC. The 80% recall call was made at that point. One more set of 14 documents was submitted and only 3 came back responsive. The decision was then made to call Reasonable, and thereafter the final submissions were made. The post-call submissions were made by the following groups in descending priority score order: 1) all documents reviewed that were currently anticipated to be irrelevant, but had now been submitted (129 documents, of which 7 were relevant); 2) anything remaining with “McKenna” (695 documents, all irrelevant); and then 3) all else (all irrelevant).

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying Recall thresholds. On the Rob McKenna Gubernatorial Candidate topic, by the time 97.5% Recall had been attained only 0.02% of the corpus, 169 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.98% or 902,265 documents.


The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the Multimodal Hybrid model of training Mr. EDR.

______________________________________


Topic 2322 Web Scraping

Confusion Matrix - Topic 2322 Web Scraping
Total Documents: 456,147
Total Relevant: 10,145
Total Prevalence: 2.22%

Topic 2322 was run by Losey, who also started on August 22, 2015. He finished his review of 465,149 forum posts in BlackHatWorld on August 25, 2015. The project commenced as usual with Losey beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal search began, including predictive coding features, with iterated training. On August 25, 2015, after making 24 submissions to TREC, and training after almost every submission, Losey had provided a total of 12,799 documents to TREC and confirmed a total of 8,060 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 195 documents. After the 24th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 79.45% was attained by submission of only 12,799 documents, which is 2.8% of the total documents. This was accomplished by review of only 0.04% of the total collection.

There were 21 additional submissions to TREC after the Reasonable call point. In the next submission after the Reasonable call, the 25th, 1,000 documents were submitted and they all came back relevant. Obviously an error in gamesmanship had been made and the call was made a little too early. After that 25th submission, the Recall level rose to 89.31% and the Precision increased to 65.66%. A 90% Recall was attained after submitting 14,477 documents. A 95% Recall was attained after submitting 16,983 documents, and 97.5% Recall was attained after 19,821 documents were submitted, which was only 4.35% of the total collection of 456,147 BlackHatWorld forum posts.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

                   @ Reas. Call    @ 97.5% Recall
True Positives     8,060           9,892
True Negatives     441,263         436,073
False Positives    4,739           9,929
False Negatives    2,085           253
Recall             79.45%          97.51%
Precision          62.97%          49.91%
F1 Measure         70.26%          66.02%
Accuracy           98.50%          97.77%
Error              1.50%           2.23%
Elusion            0.47%           0.06%
Fallout            1.06%           2.23%
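The effect of the 25th submission on this topic (1,000 documents, all returned relevant) can be checked from the counts at the Reasonable call. The sketch below shows the arithmetic; the helper name `after_batch` is illustrative only.

```python
def after_batch(tp, fp, total_relevant, batch_size, batch_relevant):
    """Update running counts with one submission and return (recall, precision)."""
    tp += batch_relevant
    fp += batch_size - batch_relevant
    recall = tp / total_relevant
    precision = tp / (tp + fp)
    return recall, precision


# Counts at the Reasonable call for Topic 2322, then the 25th submission.
recall, precision = after_batch(
    tp=8_060, fp=4_739, total_relevant=10_145,
    batch_size=1_000, batch_relevant=1_000,
)
print(f"{recall:.2%} recall, {precision:.2%} precision")
# prints "89.31% recall, 65.66% precision"
```

A single fully relevant batch of this size moves Recall by nearly ten points, which is why the call was judged to have come slightly early.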


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Web Scraping topic, by the time 97.5% Recall had been attained only 4.35% of the corpus, 19,821 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 95.65% or 436,326 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.


______________________________________

Topic 3484 Paul and Cathy Lee Martin

Confusion Matrix - Topic 3484
Total Documents: 902,434
Total Relevant: 23
Total Prevalence: 0.00%

                   @ Reas. Call    @ 97.5% Recall
True Positives     23              23
True Negatives     902,411         902,411
False Positives    0               0
False Negatives    0               0
Recall             100.00%         100.00%
Precision          100.00%         100.00%
F1 Measure         100.00%         100.00%
Accuracy           100.00%         100.00%
Error              0.00%           0.00%
Elusion            0.00%           0.00%
Fallout            0.00%           0.00%


This Topic was run by Sullivan, who started on August 24, 2015. He completed his review of 902,434 documents on August 25, 2015. The entire Team observed his final submissions and cheered on his perfect handling of this search project.

This topic was completely unknown to Sullivan prior to this exercise. His only knowledge came from a quick Google search on the topic. Sullivan started late on Day 1 and began with a simple search using the following keywords: ((martin w/3 paul) AND cathy) OR ((martin w/3 cathy) AND paul). This search returned 26 documents. A quick review of the documents yielded 22 clearly relevant documents and 1 marginally relevant. Sullivan submitted the 22 relevant documents, which were all returned as relevant by TREC, and quit for the night after 15 minutes of effort.

On Day 2, Sullivan went back to his standard process of using concept searching to find relevant keywords for highlighting and searches. As with all topics in dataset 3, spelling errors were non-existent, which removed the requirement of broad searching to account for slang or spelling issues. Broad searches were run using all relevant keywords and the results were sampled. Next, predictive coding scores were used to identify additional potentially relevant documents. A large number of false positives were encountered when it was discovered that a popular hockey player and a Prime Minister shared the same names as the parties. These were quickly identified and excluded from the potentially relevant set. After 90 minutes of work, Sullivan conceded that he was unable to find any additional relevant documents. In reviewing the single marginally relevant document found on Day 1, it was determined this document was very likely to be relevant, so it was submitted to TREC and was in fact returned relevant. At this point, Sullivan called reasonable recall and submitted all remaining documents in descending order of priority score.

After all documents were submitted, it was discovered that Sullivan in fact had attained 100% recall and 100% precision at the point the reasonable call was made. Additionally, 95.7% recall was attained, with 100% precision, after only 15 minutes. In all, he was able to achieve a perfect game with only 1.75 hours committed to this topic!

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Paul and Cathy Lee Martin topic, by the time 97.5% Recall had been attained only 0.00% of the corpus, 23 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 100.00% or 902,411 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% recall using the multi-modal hybrid model of training EDR.
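Sullivan's Day 1 keyword search on this topic combines a proximity operator with Boolean terms. The sketch below shows one way such a query could be evaluated, assuming that `w/3` means the two terms occur within three words of each other in either order. EDR's exact operator semantics are not described here, and the helper names are hypothetical.

```python
import re


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within(toks, a, b, n):
    """True if terms a and b occur within n token positions of each other."""
    pos_a = [i for i, t in enumerate(toks) if t == a]
    pos_b = [i for i, t in enumerate(toks) if t == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


def matches(text):
    """((martin w/3 paul) AND cathy) OR ((martin w/3 cathy) AND paul)."""
    toks = tokens(text)
    vocab = set(toks)
    return (
        (within(toks, "martin", "paul", 3) and "cathy" in vocab)
        or (within(toks, "martin", "cathy", 3) and "paul" in vocab)
    )


print(matches("Paul and Cathy Lee Martin appeared in court"))   # True
print(matches("Paul Martin addressed the House on Tuesday"))     # False
```

Requiring both first names alongside the surname is what filters out documents about other people who share only one of the names, although, as noted above, some false positives still had to be excluded by hand.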


______________________________________

Topic 2134 Paypal Accounts

Confusion Matrix - Topic 2134
Total Documents: 465,147
Total Relevant: 252
Total Prevalence: 0.05%

Topic 2134 was run by Sullivan, who started on August 26, 2015. He finished his review of 465,149 forum posts in BlackHatWorld on August 26, 2015.

                   @ Reas. Call    @ 97.5% Recall
True Positives     241             246
True Negatives     461,447         443,136
False Positives    3,448           21,759
False Negatives    11              6
Recall             95.63%          97.62%
Precision          6.53%           1.12%
F1 Measure         12.23%          2.21%
Accuracy           99.26%          95.32%
Error              0.74%           4.68%
Elusion            0.00%           0.00%
Fallout            0.74%           4.68%


As a regular PayPal user for about 10 years, Sullivan has a high level of knowledge regarding this topic. This advanced knowledge proved to be a burden on this topic because his understanding of what should be relevant did not match the TREC gold standard. He was able to overcome this burden by relying on a variety of advanced methods rather than using his own judgment in review of the documents.

Sullivan started this topic with his usual process of running concept searches to find similar and related keyword terms for highlighting and future searching. As with all forum topics, he spent some time identifying common variants based on misspelling or slang. All variations were added to the database for highlighting.

While using a number of methods to identify documents he felt were clearly relevant, Sullivan quickly realized he was unable to find any logic in the TREC relevance standard. Documents with similar or identical content were seemingly arbitrarily designated as relevant or not relevant. Rather than spend considerable time evaluating the documents himself, as was done in Topic 2129 Facebook Accounts, he went straight to Mr. EDR for help.

Similar to the method developed in Topic 2129, Sullivan relied heavily on the predictive coding and did very little review of any documents. He would iteratively submit the highest scoring documents to TREC for analysis, and train the documents with the relevancy determination returned. In addition to using a continuous active learning approach, he started using the “Find Similar” feature much more to find documents that contained similar characteristics to documents already determined to be relevant. He started with documents that contained a variation of PayPal in the subject line, then moved to documents that contained the term anywhere in the text. Using this multimodal method he was able to work his way through the entire dataset with almost no actual review of the documents. In all, Sullivan was able to complete the review for this topic in less than 4 hours.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Paypal Accounts topic, by the time 97.5% Recall had been attained only 4.73% of the corpus, 22,005 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 95.27% or 443,142 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% recall using the multi-modal hybrid model of training EDR.
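The iterative routine described for this topic (submit the highest scoring documents, receive the relevance determinations, retrain, repeat) is a form of continuous active learning. The toy loop below illustrates only the control flow. The additive term-weight update is a stand-in chosen for brevity and is not the model used by Mr. EDR, and `judge` is a hypothetical callback standing in for the TREC feedback service.

```python
def tokenize(text):
    """Return the set of lowercase whitespace-separated terms in a document."""
    return set(text.lower().split())


def toy_cal(docs, judge, seed_terms, batch_size=2):
    """Rank, submit the top batch, learn from the feedback, and repeat.

    docs maps doc_id -> text; judge(doc_id) returns True if relevant.
    Returns (doc_id, relevant) pairs in the order they were submitted.
    """
    weights = {term: 1.0 for term in seed_terms}  # start from keyword hits
    remaining = dict(docs)
    submitted = []
    while remaining:
        ranked = sorted(
            remaining,
            key=lambda d: sum(weights.get(t, 0.0) for t in tokenize(remaining[d])),
            reverse=True,
        )
        for doc_id in ranked[:batch_size]:
            relevant = judge(doc_id)
            step = 1.0 if relevant else -1.0
            # "Training": reward terms seen in relevant documents, penalize the rest.
            for term in tokenize(remaining.pop(doc_id)):
                weights[term] = weights.get(term, 0.0) + step
            submitted.append((doc_id, relevant))
    return submitted


docs = {
    "d1": "paypal account limited",
    "d2": "cheap seo backlinks",
    "d3": "verified paypal account for sale",
    "d4": "facebook likes",
    "d5": "paypal vcc verified",
}
truth = {"d1", "d3", "d5"}  # hypothetical relevance judgments
print(toy_cal(docs, judge=lambda d: d in truth, seed_terms=["paypal"]))
```

In this toy run the three relevant documents are submitted before either irrelevant one, because each round of feedback raises the weights of terms the relevant documents share.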

______________________________________


Topic 3423 Rob Ford Cut the Waist

Confusion Matrix - Topic 3423
Total Documents: 902,434
Total Relevant: 76
Total Prevalence: 0.01%

Topic 3423 was run by Losey, who also started on August 26, 2015. He finished his review of 902,434 News Articles on August 27, 2015. The project commenced as usual with Losey beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal search began, including predictive coding features, with iterated training. On August 26, 2015, after making 11 submissions to TREC, and training after almost every submission, Losey had provided a total of 40 documents to TREC and confirmed a total of 34 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 92 documents. After the 11th TREC submission, Losey decided to call Reasonable. This proved to be a premature call. It was later determined that a Recall of 44.74% was attained.

In the 17 automatic submissions that followed, Recall of 76.32% was attained with 84.06% Precision. The 76.32% Recall was attained after submitting only 106 documents, which is 0.01% of the total of 902,434. There were 17 submissions to TREC after the Reasonable call point. Total 100% Recall was attained after submitting only 35,193 documents, which is 3.9% of the total.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

                   @ Reas. Call    @ 97.5% Recall
True Positives     34              75
True Negatives     902,352         867,337
False Positives    6               35,021
False Negatives    42              1
Recall             44.74%          98.68%
Precision          85.00%          0.21%
F1 Measure         58.62%          0.43%
Accuracy           99.99%          96.12%
Error              0.01%           3.88%
Elusion            0.00%           0.00%
Fallout            0.00%           3.88%


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Rob Ford Cut the Waist topic, by the time 97.5% Recall had been attained only 3.89% of the corpus, 35,096 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 96.11% or 867,338 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.


______________________________________

Topic 3133 Pacific Gateway

Confusion Matrix - Topic 3133
Total Documents: 902,434
Total Relevant: 113
Total Prevalence: 0.01%

                   @ Reas. Call    @ 97.5% Recall
True Positives     87              111
True Negatives     902,311         799,986
False Positives    10              102,335
False Negatives    26              2
Recall             76.99%          98.23%
Precision          89.69%          0.11%
F1 Measure         82.86%          0.22%
Accuracy           100.00%         88.66%
Error              0.00%           11.34%
Elusion            0.00%           0.00%
Fallout            0.00%           11.34%

Topic 3133 was run by Losey, who also started on August 27, 2015. He finished his review of 902,434 News Articles on August 28, 2015. The project commenced as usual with Losey beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal search began, including predictive coding features, with iterated training. On August 28, 2015, after making 7 submissions to TREC, and training after almost every submission, Losey had provided a total of 97 documents to TREC and confirmed a total of 87 relevant documents. The effort, or number of documents individually reviewed and coded by Losey to attain this result, was 49 documents. After the 7th TREC submission, Losey decided to call Reasonable. That call proved to be a little premature. It was later determined that a Recall of 76.99% was attained with Precision of 89.69%.

In the 6th automatic submission after the call, a Recall of 94.69% was attained after submitting only 693 documents total, which is 0.07% of the total of 902,434. There were 24 submissions to TREC after the Reasonable call point. Total 100% Recall was attained after submitting 103,189 documents, which is 11.43% of the total.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Pacific Gateway topic, by the time 97.5% Recall had been attained only 11.35% of the corpus, 102,446 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 88.65% or 799,988 documents.


The last chart below represents the amount of effort in terms of documents reviewed to attain 100% recall using the multi-modal hybrid model of training EDR.

______________________________________


Topic 3226 Traffic Enforcement Cameras

Confusion Matrix - Topic 3226
Total Documents: 902,434
Total Relevant: 2,094
Total Prevalence: 0.23%

Topic 3226 was run by Sullivan, who also started on August 27, 2015. He finished his review of 902,434 News Articles on August 28, 2015.

Sullivan has some prior experience as a criminal defense attorney, with experience with traffic laws, but he has no prior experience with traffic enforcement cameras, which were not in use at the time he was practicing.

As usual, Sullivan started his investigation with his standard process of using keyword and concept searches to formulate a list of related keywords for highlighting and future searching. For this exercise, nothing extraordinary was discovered, but he was able to generate a good list of terms relating to traffic cameras, red light cameras, and traffic tickets.

Day 1 was a short day and started with submitting the results of the most popular keyword searches with minimal review. After 30 minutes of work, 76 documents were submitted with 50 being returned as relevant.

Using the documents identified on Day 1, Sullivan was able to start utilizing the predictive coding to supplement his searches on Day 2. He was able to progressively make his way through the review set using a combination of predictive coding scores and keyword hits. He used this multimodal approach to submit large sets of documents with minimal, if any, manual review. He believed he had found all relevant documents after submitting only 5,347 total documents with 2,061 relevant. After submitting all of the remaining documents in descending order by predictive coding priority score, it was discovered he only missed 33 of the relevant documents in the dataset after submitting 0.6% of the documents! Because he minimized the amount of manual review on this topic, he was able to complete this topic after 3.0 hours on Day 2, for a total of 3.5 hours on this topic.

                   @ Reas. Call    @ 97.5% Recall
True Positives     2,061           2,042
True Negatives     897,054         899,807
False Positives    3,286           533
False Negatives    33              52
Recall             98.42%          97.52%
Precision          38.54%          79.30%
F1 Measure         55.39%          87.47%
Accuracy           99.63%          99.94%
Error              0.37%           0.06%
Elusion            0.00%           0.01%
Fallout            0.36%           0.06%


A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Traffic Enforcement Cameras topic, by the time 97.5% Recall had been attained only 0.29% of the corpus, 2,575 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.71% or 899,859 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% recall using the multi-modal hybrid model of training EDR.
