Agenda
• Why analytics and What is data?
• Data scientist and the Fraud Data Scientist
• Predictive Analytics
• What predictive analytics means for Fraud
• Use cases
• Fraud analytics process model
Why Analytics?
If the current rate of change and complexity were to remain constant, we would have experienced all the major milestones of the twentieth century – in a single week in 2025!
1. The creation of the automobile;
2. The first and second world war AND the Vietnam war;
3. Decoding of the DNA structure;
4. Nuclear energy;
5. Space travel;
6. The internet; and
7. Human genome sequencing
The challenge for organizations is: How to navigate this, build strategies that identify trends of the future: Analytics is postulated to be the answer! = IDENTIFICATION OF TRENDS, PRESENT AND FUTURE TRENDS
Consider the following in a single day… online
1. Enough information is consumed to fill ±168 Million DVDs
2. ±294 Billion emails are sent
3. ±2 Million blog posts are written
4. ±4.7 Million minutes are spent on Facebook
5. ±864,000 hours of video are uploaded on YouTube
What many think data is…
Gobbledygook (noun, informal)
Language that is meaningless or is made unintelligible by excessive use of abstruse technical terms; nonsense.
Synonyms: gibberish, claptrap, nonsense, balderdash, blather, garbage
Terminology
• Unstructured Data: Data that has no identifiable structure – for example, the text of email messages
• Structured Data: Data that is organised by a predetermined structure.
The data problem?
Data exists but the problem is:
• Data Mining
• Data Analysis Skills
• Understanding what it means for my business
The question shifts from what do we think, to what do we know?
95% Resides internally
34% Recognised globally
BIG data
• Volume
• Value
• Velocity
• Variety
• Veracity
While “size” of data is traditionally the hallmark of big data, the term is poor, and may be better rooted in an understanding that Big Data is about capacity to SEARCH, AGGREGATE and CROSS-REFERENCE datasets
Business Value
• Maturity has been reached…
• Trend in:
  • From Infrastructure (Developers and Engineers) to Analytics (Data Scientists and Analysts)
  • From Analytics (Data Scientists and Analysts) to Application (Business users and consumers) - In our context Fraud Detection!
What does this mean?
In a world of near infinite data, professionals who can fish out insights from the ocean of data we’re drowning swimming in are incredibly attractive.
~ Scott Brinker – Chiefmartec ~
[Figure: Data Science Venn diagram. Labels: Computer Science, Machine Learning, Unicorn, Math & Statistics, Traditional Software, Traditional Research, Subject Matter Expertise]
Interesting data versus Actionable data
Interesting vs Actionable
Interesting:
• Nice to know
• Does NOT help you make informed decisions
• Does NOT provide insight: Why should we care?
Actionable:
• Insights > Action
• Design Programmes
• Develop strategies
• Achieve goals
• Predictive Analytics, like statistics, has been around for a long time…
• So what has changed?
1. Increase in volume and type of data
2. Greater interest in data for insights
3. Computing power, and “point and click”
4. Tougher economic conditions and need for competitive differentiation: Business efficiency; ROI…
Time for predictive analytics has come…
Why predictive analytics matters
• Descriptive
  • What are the characteristics of those who commit fraud? How do I turn my data into rules for better decisions?
Knowledge
• Predictive
  • How likely is a claim with someone or a business with those characteristics to be fraudulent?
Action
Uncertainty
Usable probability
Sample techniques of predictive analytics…
• Rule Induction
• Decision tree & classification
• Regression
• Clustering
• Affinity Analysis
• Nearest Neighbor
• Neural Networks
• Genetic algorithms
What does this mean for fraud detection and prevention
• Big Data and analytics provide powerful tools that may improve an organization's fraud detection systems
• COMPLEMENTARY to traditional expert-based fraud-detection approaches - DOES NOT REPLACE!!!
Social networks: That is, fraudulent companies are more connected to other fraudulent companies than to non-fraudulent companies.
Contextual information: Social Network Analysis
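The social network observation above can be illustrated with a minimal sketch. It assumes a hypothetical list of links between companies (for example, a shared address or director) and a set of companies already known to be fraudulent; the function name, the toy network, and the idea of using the share of fraudulent neighbours as a feature are illustrative assumptions, not a method described in the slides.

```python
from collections import defaultdict

def fraudulent_neighbour_share(edges, known_fraud):
    """For each company, return the share of its direct links that
    lead to a company already known to be fraudulent."""
    neighbours = defaultdict(set)
    for a, b in edges:
        # Links are undirected: record each company as the other's neighbour.
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {
        company: len(linked & known_fraud) / len(linked)
        for company, linked in neighbours.items()
    }

# Hypothetical toy network: an edge means two companies share a resource.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
known_fraud = {"A", "B"}

shares = fraudulent_neighbour_share(edges, known_fraud)
for company in sorted(shares):
    print(company, round(shares[company], 2))
```

A high share is only contextual information for an analyst or a downstream model; it does not by itself establish fraud.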
Use case of predictive analytics to detect fraud
• Context: Car insurance company in SA, operates globally
• Declining profits > increased premiums = fraudulent claims
• Historical claims data with known fraud outcomes to predict probability that new claims are fraudulent!
• Understanding what has happened
• Problem: Repair shops that inflate estimates
What we do using analytics… Geo-spatial data
• Our problem: Repair shops that inflate repair estimates
Use of Data:
• Claimants’ address (Geocoded)
• Location of repair shops
• Average claim estimate for a particular problem
Analyzing the data:
• Map areas where estimates are higher than the average
• Overlay claimants’ address
Algorithm:
• Predict based on distance claimant travels to get a repair done > WHY travelling outside a radius?
• Algorithm: Claimants travelling a distance to get a repair done correlates with the repair shop providing over-estimates (above average)
• > inflated estimate > potential fraud
• Outcome:
  • Reduce time required to refer questionable claims for investigation by as much as 95%.
  • Success rate in pursuing fraudulent claims from 50% to 88%!
• Healthcare in Kenya!
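The distance-based idea in the repair shop use case can be sketched as a simple rule. This is a minimal illustration under stated assumptions, not the insurer's model: the 30 km radius, the 20% markup over the average estimate, the field names, and the coordinates (approximate city centres of Johannesburg and Pretoria) are all hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def flag_claim(claim, shop, avg_estimate, max_km=30.0, markup=1.2):
    """Flag a claim for review when the claimant travelled outside the radius
    AND the estimate is well above the average for that repair.
    max_km and markup are hypothetical thresholds."""
    distance = haversine_km(claim["lat"], claim["lon"], shop["lat"], shop["lon"])
    travelled_far = distance > max_km
    above_average = claim["estimate"] > markup * avg_estimate
    return travelled_far and above_average

# Illustrative data: claimant geocoded in Johannesburg, repair shop in Pretoria.
shop = {"lat": -25.7479, "lon": 28.2293}
far_claim = {"lat": -26.2041, "lon": 28.0473, "estimate": 15000.0}
near_claim = {"lat": -25.7479, "lon": 28.2293, "estimate": 15000.0}

distance = haversine_km(far_claim["lat"], far_claim["lon"], shop["lat"], shop["lon"])
print(round(distance, 1), "km")
print(flag_claim(far_claim, shop, avg_estimate=10000.0))
print(flag_claim(near_claim, shop, avg_estimate=10000.0))
```

A flag here only refers the claim for investigation; in practice the radius and markup would be learned from historical claims with known fraud outcomes rather than fixed by hand.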
Use case of predictive analytics to manage & prevent fraud
• Context: Insurance (Turkey)
• Mismatch between public and private profiles of individuals (narratives for claims) > Public data to serve as a reference for internal database records
• Relationship between customer profile and fraudulent claims
• Use of social media as a listening tool
What we do using analytics… Social CRM
• Our problem: Characteristics (customer behavior and fraudulent claims)
Use of Data:
• Consumers’ internal “known” data corroborated with external social data (e.g. check-in at “home” is 50 km away from registered address)
• Using social analytics (text and images; check-ins; likes etc.)
Analyzing the data:
• Build behavioral profiles from social media data;
• Overlay behavioral data with known fraudulent claims
Algorithm:
• Predict based on behavioral data PROBABILITY of fraudulent claim (relationship between customer behavior and fraudulent claims)
• Send for investigation: 86% accuracy. Social analytics is only an indicator > Investigators confirm independently
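Overlaying behavioral data with known fraudulent claims can be sketched as a conditional frequency: how often past claims were fraudulent when a behavioral flag was present versus absent. The records, the `address_mismatch` flag, and the function name below are made up for illustration; a real model would combine many such signals.

```python
def fraud_rate_by_flag(history, flag_name):
    """Estimate P(fraud | flag value) from past claims with known outcomes."""
    counts = {}
    for record in history:
        value = record[flag_name]
        total, fraud = counts.get(value, (0, 0))
        counts[value] = (total + 1, fraud + int(record["fraud"]))
    return {value: fraud / total for value, (total, fraud) in counts.items()}

# Hypothetical history: address_mismatch is True when a social check-in at
# "home" is far from the registered address.
history = (
    [{"address_mismatch": True, "fraud": True}] * 3
    + [{"address_mismatch": True, "fraud": False}] * 1
    + [{"address_mismatch": False, "fraud": True}] * 1
    + [{"address_mismatch": False, "fraud": False}] * 5
)

rates = fraud_rate_by_flag(history, "address_mismatch")
print(rates[True], round(rates[False], 3))
```

As the slide stresses, such a probability is only an indicator for prioritising claims; investigators confirm independently.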
Use case of predictive analytics to understand credit card fraud… Early adopters
• Context: Financial institution (large impairments on CC fraud)
• “Classic” symptoms: Small purchase followed by a big one; large number of online purchases in a short period of time; spending as much as possible quickly; smaller amounts, spread across times
• Problem: “Normal” behaviour patterns of CC usage > outliers
What we do using analytics… Supervised and unsupervised learning
• Our problem: Identify characteristics of transactions that deviate from the normal behavior
Use of Data:
• 2 million+ CC holders
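Identifying transactions that deviate from a cardholder's normal behavior can be illustrated with a small unsupervised sketch. It uses a modified z-score based on the median and the median absolute deviation (MAD), which is one common outlier rule and not necessarily the technique used in this use case; the amounts and the 3.5 threshold are assumptions.

```python
import statistics

def flag_outliers(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds the threshold.
    The score is 0.6745 * (x - median) / MAD, which is robust to the very
    outliers it is trying to find."""
    median = statistics.median(amounts)
    mad = statistics.median([abs(x - median) for x in amounts])
    if mad == 0:
        # All typical amounts are identical; the score is undefined.
        return []
    return [
        i for i, x in enumerate(amounts)
        if abs(0.6745 * (x - median) / mad) > threshold
    ]

# Hypothetical purchase amounts for one cardholder.
amounts = [120, 95, 130, 110, 105, 4800, 115]
print(flag_outliers(amounts))
```

A supervised counterpart would instead train on transactions already labelled as fraudulent or legitimate, which is the other half of the approach named on the slide.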
Results
Business Results
• 350+ hours of pure analysis
• 3 Months understanding
• Near-real time detection of fraudulent purchases and CC use
• 76% accuracy… >85% once data issues fixed
[Figure: process flow, with a stage labelled “Preprocessing”]
1. Identify business problem
2. Identify data sources
3. Select the data
4. Clean the data
5. Transform the data
6. Analyze the data
7. Interpret, evaluate, deploy the model
Key characteristics of successful fraud analytics models
• Statistical Accuracy
• Interpretability
• Operational efficiency
• Economic cost
• Regulatory compliance
With the right data
• Garbage data in > Garbage data out
• Master Data Management
  • Policies
  • Governance
  • Processes
  • Standards and Tools
• Leads to increased accuracy of predictive models
At the heart of predictive analytics
ANALYTICS
Data Science is what Data Scientists do…
Bring in thinking and expertise from a variety of fields to solve “problems”
So why are we NOT leveraging predictive analytics…
1. Data-driven company culture
2. What is the value (cost vs benefit)
3. Innovation: Saying no before trying – losing first mover advantage
4. Leadership: More data does not lead to success – making sense of the data with clear goals does!
5. Talent management: As data becomes cheaper, the complements become expensive
   1. Data Scientists with a business understanding become central – Do we have the skills? What skills do we need? What is a data scientist?
   2. Problem solving skills: logic and reasoning – the ability to know how non-traditional and traditional data sources can assist business derive and drive value
Thank You
For more information, [email protected]; [email protected]; [email protected]; 0827845769