Analytics and Where it Fits - ACS DAMA SIG

ANALYTICS AND WHERE IT FITS

Upload: russell-tibballs

Posted on 12-Apr-2017





TRANSCRIPT



A LITTLE ABOUT YOURS TRULY

MY NAME IS RUSSELL TIBBALLS

I HAVE BEEN WORKING WITH DATA, PROGRAMMING, AND PERFORMING VARIOUS LEVELS OF ANALYSIS FOR OVER 40 YEARS. I JOINED THE CUSTOMS STATISTICS TEAM AFTER LEAVING HIGH SCHOOL IN NOVEMBER 1974.

I AM THE CANBERRA CHAIR OF IAPA, AND I ALSO CHAIR THE ADVISORY COMMITTEE TO THE SCIENCES AT USQ.

I HAVE A MASTER'S IN SOCIAL RESEARCH METHODS (ANU), FOCUSING ON SURVEY ANALYSIS, INTERNATIONAL MIGRATION, AND ANALYSIS OF WEB PRESENCE USING SOCIAL NETWORK ANALYSIS (SNA).

• ACS CERTIFIED PROFESSIONAL
• TDWI CERTIFIED BUSINESS PROFESSIONAL – DATA ANALYSIS
• SAS ADVANCED PROGRAMMER

ETC. ETC.

MY MAJOR INTERESTS ARE:
• MY FAMILY, AND WHAT IS HAPPENING ON MY ACREAGE AND ON THE MOLONGLO RIVER (WHICH ADJOINS IT)
• ANY APPLICATIONS OF ANALYTICS IN THE HARD AND SOFT SCIENCES. YES, I AM A TRAGIC WHO READS NATURE AND JSTOR PAPERS WHENEVER I CAN.
• INTERNATIONAL MIGRATION


THE IMPRESSION

• THE IMPRESSION IS THAT ANYONE GIVEN ACCESS TO THE RIGHT INFORMATION CAN ANALYZE IT AND COME UP WITH A SOLUTION TO ANY PROBLEM IN MOMENTS.
• THIS VIEW HAS BECOME INCREASINGLY PERVASIVE. SEE ‘ARE WE COOL YET?: A LONGITUDINAL CONTENT ANALYSIS OF NERD AND GEEK REPRESENTATIONS IN POPULAR TELEVISION’ (CARDIEL, C. L., 2012).
• HOLLYWOOD HAS MOVED FROM THE MAD SCIENTIST WHO CAN WHIP UP A WORLD-BEATING GADGET IN SECONDS (THINK DEXTER'S LAB), THROUGH THE NERDY HACKER FRIEND OF THE HERO (MARKY MARK IN DATE NIGHT), TO THE ANALYST WHO CAN LOG INTO THE INTERNET, FIND THE PETABYTES OF DATA YOU NEED TO ANALYZE AND STOP THE END OF THE WORLD IN MOMENTS, OCCASIONALLY SECONDS.


THE MAD SCIENTIST


THE HACKER


THE ANALYST


ANALYSTS DO ANALYTICS


A FEW DEFINITIONS

• FROM EVAN STUBBS, “THE VALUE OF BUSINESS ANALYTICS”:
• ANALYTICS – ANY DATA-DRIVEN PROCESS THAT PROVIDES INSIGHT
• COMMON FORMS ARE:
  • REPORTING – THE ORGANISATION OF HISTORICAL DATA
  • TREND ANALYSIS – THE IDENTIFICATION OF PATTERNS IN TIME SERIES DATA
  • SEGMENTATION – THE IDENTIFICATION OF SIMILARITIES WITHIN DATA
  • PREDICTIVE MODELING – THE PREDICTION OF FUTURE EVENTS USING HISTORICAL DATA


CONTINUING FROM EVAN STUBBS:

• ALL APPLICATIONS OF ANALYTICS SHARE A NUMBER OF COMMON CHARACTERISTICS:
  • THEY ARE BASED ON DATA
  • THEY APPLY VARIOUS MATHEMATICAL TECHNIQUES TO TRANSFORM AND SUMMARIZE THE RAW DATA
  • THEY ADD VALUE TO THE ORIGINAL DATA AND TRANSFORM IT INTO “KNOWLEDGE”

• ADVANCED ANALYTICS, HOWEVER, AIMS TO IDENTIFY:
  • WHY THINGS ARE HAPPENING
  • WHAT WILL HAPPEN NEXT
  • WHAT THE POSSIBLE COURSES OF ACTION ARE

• THE BUSINESS OUTCOME DRIVERS FOR THE USE OF “BUSINESS ANALYTICS” ARE:
  • BUSINESS RELEVANCY
  • ACTIONABLE INSIGHT
  • PERFORMANCE MEASUREMENT AND VALUE MEASUREMENT

• KNOWLEDGE IS A FAMILIARITY, AWARENESS OR UNDERSTANDING OF SOMEONE OR SOMETHING, SUCH AS FACTS, INFORMATION, DESCRIPTIONS, OR SKILLS, WHICH IS ACQUIRED THROUGH EXPERIENCE OR EDUCATION BY PERCEIVING, DISCOVERING, OR LEARNING.

• IN GOVERNMENT, AN AGENCY’S RELEVANCY IS MEASURED IN TERMS OF POLICY ALIGNMENT.


THE PRIVATE SECTOR PERSPECTIVE

• TO INCREASE THE EFFICIENCY OF DELIVERY AND THE VALUE TO THE CUSTOMER
• THE DRIVERS BEING INCREASED MARKET SHARE AND PROFITS
• AND, MOST IMPORTANTLY, TO DELIVER BENEFIT TO THE SHAREHOLDER

• ANY PRIVATE OR PUBLIC ENTERPRISE HAS TWO MAIN DRIVERS:
  • THE WISHES OF ITS OWNERS, SHAREHOLDERS, OR GOVERNMENT
  • THE ONGOING RELEVANCY OF THE ORGANIZATION


SOME EXAMPLES OF ANALYTICS. SOME ARE INTERESTING!


SOME NOT AS EXCITING BUT STILL INTERESTING


THE HEDGEHOG AND THE FOX

THE HEDGEHOG AND THE FOX IS AN ESSAY BY PHILOSOPHER ISAIAH BERLIN. IT WAS ONE OF BERLIN'S MOST POPULAR ESSAYS WITH THE GENERAL PUBLIC. ITS TITLE REFERS TO A FRAGMENT ATTRIBUTED TO THE ANCIENT GREEK POET ARCHILOCHUS: ‘THE FOX KNOWS MANY THINGS, BUT THE HEDGEHOG KNOWS ONE BIG THING.’

BERLIN EXPANDS UPON THIS IDEA TO DIVIDE WRITERS AND THINKERS INTO TWO CATEGORIES: HEDGEHOGS, WHO VIEW THE WORLD THROUGH THE LENS OF A SINGLE DEFINING IDEA, AND FOXES, WHO DRAW ON A WIDE VARIETY OF EXPERIENCES AND FOR WHOM THE WORLD CANNOT BE BOILED DOWN TO A SINGLE IDEA.

IN HIS 2012 NEW YORK TIMES BEST-SELLING BOOK THE SIGNAL AND THE NOISE, FORECASTER NATE SILVER URGES READERS TO BE "MORE FOXY" AFTER SUMMARIZING BERLIN'S DISTINCTION.


A BRIEF DETOUR ON THE VENDOR VIEW OF A DATA SCIENTIST/ANALYST (ADVANCED ANALYTICS)


THE DATA ANALYST


THE UNICORN


DATA ANALYTICS

• ANALYTICS – ANY DATA-DRIVEN PROCESS THAT PROVIDES INSIGHT
• COMMON FORMS ARE:
  • REPORTING – THE ORGANISATION OF HISTORICAL DATA
  • TREND ANALYSIS – THE IDENTIFICATION OF PATTERNS IN TIME SERIES DATA
  • SEGMENTATION – THE IDENTIFICATION OF SIMILARITIES WITHIN DATA
  • PREDICTIVE MODELING – THE PREDICTION OF FUTURE EVENTS USING HISTORICAL DATA


REPORTING

• VERB: MAKE A FORMAL STATEMENT OR COMPLAINT ABOUT (SOMEONE OR SOMETHING) TO THE NECESSARY AUTHORITY.
• NOUN: AN ACCOUNT GIVEN OF A PARTICULAR MATTER, ESPECIALLY IN THE FORM OF AN OFFICIAL DOCUMENT, AFTER THOROUGH INVESTIGATION OR CONSIDERATION BY AN APPOINTED PERSON OR BODY, E.G., "THE CHAIRMAN'S ANNUAL REPORT". ALSO, A SPOKEN OR WRITTEN DESCRIPTION OF AN EVENT OR SITUATION, ESPECIALLY ONE INTENDED FOR PUBLICATION OR BROADCASTING IN THE MEDIA.
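AS A SKETCH OF REPORTING IN THE ‘ORGANISATION OF HISTORICAL DATA’ SENSE, THE PYTHON BELOW GROUPS SOME HYPOTHETICAL (YEAR, REGION, AMOUNT) RECORDS AND TOTALS THEM, MUCH AS A PERIODIC REPORT OR PIVOT TABLE WOULD. THE RECORDS AND THE `summarise` FUNCTION ARE ILLUSTRATIVE ASSUMPTIONS, NOT PART OF THE ORIGINAL SLIDES.

```python
from collections import defaultdict

# Hypothetical historical records: (year, region, amount).
records = [
    (2015, "NSW", 120.0),
    (2015, "ACT", 45.5),
    (2016, "NSW", 130.0),
    (2016, "ACT", 50.0),
    (2016, "ACT", 4.5),
]


def summarise(rows):
    """Organise historical rows into a total per (year, region)."""
    totals = defaultdict(float)
    for year, region, amount in rows:
        totals[(year, region)] += amount
    # Sort the keys so the report always prints in the same order.
    return dict(sorted(totals.items()))


for (year, region), total in summarise(records).items():
    print(f"{year}  {region:<4}{total:8.2f}")
```

THE SAME GROUP-AND-TOTAL STEP IS WHAT A SQL `GROUP BY` OR AN EXCEL PIVOT TABLE DOES; THE CODE JUST MAKES IT EXPLICIT.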


TREND ANALYSIS

• TREND ANALYSIS IS THE PRACTICE OF COLLECTING INFORMATION AND ATTEMPTING TO SPOT A PATTERN, OR TREND, IN THE INFORMATION. IN SOME FIELDS OF STUDY, THE TERM "TREND ANALYSIS" HAS MORE FORMALLY DEFINED MEANINGS.[1][2][3]

• ALTHOUGH TREND ANALYSIS IS OFTEN USED TO PREDICT FUTURE EVENTS, IT CAN ALSO BE USED TO ESTIMATE UNCERTAIN EVENTS IN THE PAST, SUCH AS HOW MANY ANCIENT KINGS PROBABLY RULED BETWEEN TWO DATES, BASED ON DATA SUCH AS THE AVERAGE NUMBER OF YEARS THAT OTHER KNOWN KINGS REIGNED.

• IN STATISTICS, TREND ANALYSIS OFTEN REFERS TO TECHNIQUES FOR EXTRACTING AN UNDERLYING PATTERN OF BEHAVIOR IN A TIME SERIES THAT WOULD OTHERWISE BE PARTLY OR NEARLY COMPLETELY HIDDEN BY NOISE. A SIMPLE DESCRIPTION OF THESE TECHNIQUES IS TREND ESTIMATION, WHICH CAN BE UNDERTAKEN WITHIN A FORMAL REGRESSION ANALYSIS.
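TREND ESTIMATION BY REGRESSION CAN BE SKETCHED IN A FEW LINES: FIT A STRAIGHT LINE y = a + b·t BY ORDINARY LEAST SQUARES, SO THAT THE SLOPE b SUMMARISES THE UNDERLYING TREND AND THE RESIDUALS ARE TREATED AS NOISE. THE SERIES BELOW IS HYPOTHETICAL, AND A STRAIGHT LINE IS ONLY THE SIMPLEST OF MANY POSSIBLE TREND MODELS.

```python
def linear_trend(values):
    """Fit y = a + b*t by ordinary least squares, with t = 0, 1, 2, ...

    Returns (a, b); b is the estimated change per period.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2  # mean of 0, 1, ..., n-1
    y_mean = sum(values) / n
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    return intercept, slope


# Hypothetical series: a rise of roughly 2 per period, plus noise.
series = [10.3, 11.8, 14.1, 16.2, 17.7, 20.4, 21.9, 24.0]
a, b = linear_trend(series)
print(f"trend: y = {a:.2f} + {b:.2f} t")
```

ON PYTHON 3.10 OR LATER, `statistics.linear_regression(range(len(series)), series)` SHOULD GIVE THE SAME SLOPE AND INTERCEPT.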


SEGMENTATION

• MARKET SEGMENTATION IS A MARKETING STRATEGY WHICH INVOLVES DIVIDING A BROAD TARGET MARKET INTO SUBSETS OF CONSUMERS, BUSINESSES, OR COUNTRIES THAT HAVE, OR ARE PERCEIVED TO HAVE, COMMON NEEDS, INTERESTS, AND PRIORITIES, AND THEN DESIGNING AND IMPLEMENTING STRATEGIES TO TARGET THEM. MARKET SEGMENTATION STRATEGIES ARE GENERALLY USED TO IDENTIFY AND FURTHER DEFINE THE TARGET CUSTOMERS, AND PROVIDE SUPPORTING DATA FOR MARKETING PLAN ELEMENTS SUCH AS POSITIONING TO ACHIEVE CERTAIN MARKETING PLAN OBJECTIVES. BUSINESSES MAY DEVELOP PRODUCT DIFFERENTIATION STRATEGIES, OR AN UNDIFFERENTIATED APPROACH, INVOLVING SPECIFIC PRODUCTS OR PRODUCT LINES DEPENDING ON THE SPECIFIC DEMAND AND ATTRIBUTES OF THE TARGET SEGMENT.
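THE SLIDE DEFINES SEGMENTATION AS A MARKETING STRATEGY RATHER THAN A SPECIFIC ALGORITHM. ONE COMMON ANALYTIC WAY TO FIND CANDIDATE SEGMENTS IS K-MEANS CLUSTERING, SKETCHED BELOW IN PLAIN PYTHON AS AN ILLUSTRATION, NOT AS THE METHOD THE SLIDE PRESCRIBES. THE CUSTOMER FEATURES (ANNUAL SPEND, VISITS PER MONTH) AND THE STARTING CENTROIDS ARE HYPOTHETICAL; IN PRACTICE YOU WOULD SCALE THE FEATURES AND TRY SEVERAL STARTING POINTS.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def kmeans(points, centroids, max_iterations=20):
    """Plain k-means: alternate assigning points and moving each centroid
    to the mean of its members, until the centroids stop changing."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            # An empty cluster keeps its previous centroid.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            )
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical customers: (annual spend in $k, visits per month).
customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 8.0), (8.5, 7.5), (7.8, 8.3)]
labels, centres = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)
print(centres)
```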


PREDICTIVE MODELLING

• PREDICTIVE MODELING USES STATISTICS TO PREDICT OUTCOMES.[1] MOST OFTEN THE EVENT ONE WANTS TO PREDICT IS IN THE FUTURE, BUT PREDICTIVE MODELLING CAN BE APPLIED TO ANY TYPE OF UNKNOWN EVENT, REGARDLESS OF WHEN IT OCCURRED. FOR EXAMPLE, PREDICTIVE MODELS ARE OFTEN USED TO DETECT CRIMES AND IDENTIFY SUSPECTS AFTER THE CRIME HAS TAKEN PLACE.[2]

• IN MANY CASES THE MODEL IS CHOSEN ON THE BASIS OF DETECTION THEORY TO TRY TO GUESS THE PROBABILITY OF AN OUTCOME GIVEN A SET AMOUNT OF INPUT DATA, FOR EXAMPLE, GIVEN AN EMAIL, DETERMINING HOW LIKELY IT IS TO BE SPAM.

• MODELS CAN USE ONE OR MORE CLASSIFIERS IN TRYING TO DETERMINE THE PROBABILITY OF A SET OF DATA BELONGING TO ANOTHER SET, SAY SPAM OR 'HAM'.

• DEPENDING ON DEFINITIONAL BOUNDARIES, PREDICTIVE MODELLING IS SYNONYMOUS WITH, OR LARGELY OVERLAPPING WITH, THE FIELD OF MACHINE LEARNING, AS IT IS MORE COMMONLY REFERRED TO IN ACADEMIC OR RESEARCH AND DEVELOPMENT CONTEXTS. WHEN DEPLOYED COMMERCIALLY, PREDICTIVE MODELLING IS OFTEN REFERRED TO AS PREDICTIVE ANALYTICS.
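THE SPAM-OR-HAM EXAMPLE ABOVE CAN BE SKETCHED WITH A MULTINOMIAL NAIVE BAYES CLASSIFIER, ONE OF MANY POSSIBLE CLASSIFIERS, CHOSEN HERE FOR BREVITY. THE FOUR TRAINING MESSAGES ARE HYPOTHETICAL, WORDS ARE SPLIT ON WHITESPACE ONLY, AND ADD-ONE (LAPLACE) SMOOTHING KEEPS AN UNSEEN WORD FROM ZEROING OUT A CLASS.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (text, label). Count words per label and documents per label."""
    word_counts, doc_counts = {}, Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def spam_probability(text, model):
    """Multinomial naive Bayes estimate of P(spam | text)."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            # Add-one smoothing: unseen words get a small, non-zero probability.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        log_scores[label] = score
    # Convert log scores to normalised probabilities (subtract max for stability).
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["spam"] / sum(weights.values())


# Hypothetical training messages.
training = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(training)
print(round(spam_probability("win a cash prize", model), 3))
print(round(spam_probability("team meeting monday", model), 3))
```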


WHERE DOES ANALYTICS FIT

ANYWHERE YOU NEED TO MAKE A DECISION!

TODAY, STATISTICAL METHODS ARE APPLIED IN ALL FIELDS THAT INVOLVE DECISION MAKING, FOR MAKING ACCURATE INFERENCES FROM A COLLATED BODY OF DATA AND FOR MAKING DECISIONS IN THE FACE OF UNCERTAINTY BASED ON STATISTICAL METHODOLOGY.


ANALYTICS HAS MANY LEVELS OF COMPLEXITY

ESTIMATION

THE USE OF STATISTICAL METHODS DATES BACK AT LEAST TO THE 5TH CENTURY BCE. THE HISTORIAN THUCYDIDES, IN HIS HISTORY OF THE PELOPONNESIAN WAR [2], DESCRIBES HOW THE ATHENIANS CALCULATED THE HEIGHT OF THE WALL OF PLATAEA BY COUNTING THE NUMBER OF BRICKS IN AN UNPLASTERED SECTION OF THE WALL SUFFICIENTLY NEAR THEM TO BE ABLE TO COUNT THEM. THE COUNT WAS REPEATED SEVERAL TIMES BY A NUMBER OF SOLDIERS. THE MOST FREQUENT VALUE SO DETERMINED (IN MODERN TERMINOLOGY, THE MODE) WAS TAKEN TO BE THE MOST LIKELY VALUE OF THE NUMBER OF BRICKS. MULTIPLYING THIS VALUE BY THE HEIGHT OF THE BRICKS USED IN THE WALL ALLOWED THE ATHENIANS TO DETERMINE THE HEIGHT OF THE LADDERS NECESSARY TO SCALE THE WALLS.

HOW TO MEASURE ANYTHING: FINDING THE VALUE OF INTANGIBLES BY DOUGLAS W. HUBBARD
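THE ATHENIAN PROCEDURE IS ESSENTIALLY ‘TAKE THE MODE, THEN SCALE IT’. A MINIMAL SKETCH, WITH HYPOTHETICAL COUNTS FROM NINE SOLDIERS AND A HYPOTHETICAL BRICK HEIGHT (NEITHER FIGURE COMES FROM THUCYDIDES):

```python
from statistics import multimode

# Hypothetical repeated counts of bricks, one per soldier.
brick_counts = [102, 104, 103, 104, 104, 101, 103, 104, 105]
BRICK_HEIGHT_M = 0.09  # hypothetical height of one brick, in metres


def estimate_wall_height(counts, brick_height):
    """Take the most frequent count (the mode) and multiply by brick height."""
    modes = multimode(counts)
    if len(modes) != 1:
        raise ValueError(f"no single most frequent count: {modes}")
    return modes[0] * brick_height


print(f"{estimate_wall_height(brick_counts, BRICK_HEIGHT_M):.2f} m")
```

USING `multimode` RATHER THAN `mode` MAKES A TIE BETWEEN TWO EQUALLY FREQUENT COUNTS VISIBLE INSTEAD OF SILENTLY PICKING ONE.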


THE CENSUS

THE BIBLICAL STORY OF THE BIRTH OF JESUS IS SET IN THE CONTEXT OF A CENSUS. IN 6 CE PUBLIUS SULPICIUS QUIRINIUS (51 BCE-21 CE), A DISTINGUISHED SOLDIER AND FORMER CONSUL, WAS APPOINTED IMPERIAL LEGATE (GOVERNOR) OF THE PROVINCE OF ROMAN SYRIA. IN THE SAME YEAR JUDEA WAS DECLARED A ROMAN PROVINCE, AND QUIRINIUS WAS TASKED WITH CARRYING OUT A CENSUS OF THE NEW TERRITORY FOR TAX PURPOSES.

‘IN THOSE DAYS A DECREE WENT OUT FROM EMPEROR AUGUSTUS THAT ALL THE WORLD SHOULD BE REGISTERED. THIS WAS THE FIRST REGISTRATION AND WAS TAKEN WHILE QUIRINIUS WAS GOVERNOR OF SYRIA. ALL WENT TO THEIR OWN TOWNS TO BE REGISTERED. JOSEPH ALSO WENT FROM THE TOWN OF NAZARETH IN GALILEE TO JUDEA, TO THE CITY OF DAVID CALLED BETHLEHEM, BECAUSE HE WAS DESCENDED FROM THE HOUSE AND FAMILY OF DAVID. HE WENT TO BE REGISTERED WITH MARY, TO WHOM HE WAS ENGAGED AND WHO WAS EXPECTING A CHILD.’ (LUKE 2:1–7)


SAMPLING

THE TRIAL OF THE PYX IS A TEST OF THE PURITY OF THE COINAGE OF THE ROYAL MINT THAT HAS BEEN HELD ON A REGULAR BASIS SINCE THE 12TH CENTURY. THE TRIAL ITSELF IS BASED ON STATISTICAL SAMPLING METHODS. AFTER MINTING A SERIES OF COINS (ORIGINALLY FROM TEN POUNDS OF SILVER), A SINGLE COIN WAS PLACED IN THE PYX, A BOX IN WESTMINSTER ABBEY. AFTER A GIVEN PERIOD (NOW ONCE A YEAR), THE COINS ARE REMOVED AND WEIGHED. A SAMPLE OF THE COINS REMOVED FROM THE BOX IS THEN TESTED FOR PURITY.
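A SIMPLIFIED SKETCH OF THE SAMPLING IDEA BEHIND THE TRIAL: DRAW A SIMPLE RANDOM SAMPLE FROM THE COINS IN THE PYX AND CHECK WHETHER THE SAMPLE MEAN PURITY IS WITHIN A TOLERANCE OF THE STANDARD. THE PURITY FIGURES (PARTS PER THOUSAND), THE STANDARD, THE TOLERANCE AND THE SAMPLE SIZE ARE ALL HYPOTHETICAL, AND THE REAL TRIAL’S PROCEDURE IS MORE INVOLVED.

```python
import random
from statistics import mean


def trial_of_pyx(pyx_coins, sample_size, standard, tolerance, seed=0):
    """Draw a simple random sample of coins from the pyx and check whether
    the sample mean purity is within `tolerance` of `standard`."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    sample = rng.sample(pyx_coins, sample_size)
    sample_mean = mean(sample)
    return sample_mean, abs(sample_mean - standard) <= tolerance


# Hypothetical pyx contents: purity in parts per thousand, centred on 925.
maker = random.Random(42)
pyx = [round(maker.gauss(925, 1.5), 1) for _ in range(500)]

sample_mean, passed = trial_of_pyx(pyx, sample_size=20, standard=925, tolerance=2)
print(f"sample mean {sample_mean:.2f}, passed: {passed}")
```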


THE MEAN AND MEDIAN

• THE ARITHMETIC MEAN, ALTHOUGH A CONCEPT KNOWN TO THE GREEKS, WAS NOT GENERALIZED TO MORE THAN TWO VALUES UNTIL THE 16TH CENTURY. THE INVENTION OF THE DECIMAL SYSTEM BY SIMON STEVIN IN 1585 SEEMS LIKELY TO HAVE FACILITATED THESE CALCULATIONS. THIS METHOD WAS FIRST ADOPTED IN ASTRONOMY BY TYCHO BRAHE, WHO WAS ATTEMPTING TO REDUCE THE ERRORS IN HIS ESTIMATES OF THE LOCATIONS OF VARIOUS CELESTIAL BODIES.

• THE IDEA OF THE MEDIAN ORIGINATED IN EDWARD WRIGHT'S BOOK ON NAVIGATION (CERTAINE ERRORS IN NAVIGATION) IN 1599 IN A SECTION CONCERNING THE DETERMINATION OF LOCATION WITH A COMPASS. WRIGHT FELT THAT THIS VALUE WAS THE MOST LIKELY TO BE THE CORRECT VALUE IN A SERIES OF OBSERVATIONS.
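WRIGHT’S PREFERENCE FOR THE MEDIAN IS EASY TO SEE WHEN A SERIES CONTAINS ONE BAD READING. IN THE HYPOTHETICAL OBSERVATIONS BELOW, A SINGLE BLUNDER PULLS THE MEAN UP TO ABOUT 42.4, WHILE THE MEDIAN STAYS AT ABOUT 41.35, CLOSE TO THE OTHER FIVE READINGS.

```python
from statistics import mean, median

# Hypothetical repeated readings of one bearing (degrees); the last is a blunder.
readings = [41.2, 41.4, 41.3, 41.5, 41.3, 47.9]

print(f"mean:   {mean(readings):.2f}")    # dragged towards the blunder
print(f"median: {median(readings):.2f}")  # stays with the bulk of the readings
```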


DEMOGRAPHY: GAINING AN UNDERSTANDING OF COMPLEX SOCIAL PHENOMENA

THE BIRTH OF STATISTICS IS OFTEN DATED TO 1662, WHEN JOHN GRAUNT, ALONG WITH WILLIAM PETTY, DEVELOPED EARLY HUMAN STATISTICAL AND CENSUS METHODS THAT PROVIDED A FRAMEWORK FOR MODERN DEMOGRAPHY. HE PRODUCED THE FIRST LIFE TABLE, GIVING PROBABILITIES OF SURVIVAL TO EACH AGE. HIS BOOK NATURAL AND POLITICAL OBSERVATIONS MADE UPON THE BILLS OF MORTALITY USED ANALYSIS OF THE MORTALITY ROLLS TO MAKE THE FIRST STATISTICALLY BASED ESTIMATION OF THE POPULATION OF LONDON. HE KNEW THAT THERE WERE AROUND 13,000 FUNERALS PER YEAR IN LONDON AND THAT THREE PEOPLE DIED PER ELEVEN FAMILIES PER YEAR. HE ESTIMATED FROM THE PARISH RECORDS THAT THE AVERAGE FAMILY SIZE WAS 8 AND CALCULATED THAT THE POPULATION OF LONDON WAS ABOUT 384,000.

IN 1802 LAPLACE ESTIMATED THE POPULATION OF FRANCE TO BE 28,328,612.[11] HE CALCULATED THIS FIGURE USING THE NUMBER OF BIRTHS IN THE PREVIOUS YEAR AND CENSUS DATA FOR THREE COMMUNITIES. THE CENSUS DATA OF THESE COMMUNITIES SHOWED THAT THEY HAD 2,037,615 PERSONS AND THAT THE NUMBER OF BIRTHS WAS 71,866. ASSUMING THAT THESE SAMPLES WERE REPRESENTATIVE OF FRANCE, LAPLACE PRODUCED HIS ESTIMATE FOR THE ENTIRE POPULATION.
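BOTH ESTIMATES ARE SHORT CHAINS OF ARITHMETIC, SKETCHED BELOW WITH THE FIGURES QUOTED ABOVE; THE FUNCTION NAMES ARE ILLUSTRATIVE. WITH THESE ROUNDED INPUTS, GRAUNT’S CHAIN (FUNERALS TO FAMILIES TO PEOPLE) GIVES ROUGHLY 381,000, THE SAME ORDER AS THE QUOTED FIGURE OF ABOUT 384,000. FOR LAPLACE, ONLY THE RATIO OF PERSONS TO BIRTHS IN THE SAMPLED COMMUNITIES CAN BE COMPUTED HERE, BECAUSE THE NUMBER OF BIRTHS FOR ALL OF FRANCE IS NOT GIVEN ON THE SLIDE.

```python
def graunt_population(funerals_per_year, deaths, per_families, family_size):
    """Graunt-style chain: funerals -> number of families -> number of people.

    `deaths` people die each year per `per_families` families.
    """
    families = funerals_per_year * per_families / deaths
    return families * family_size


def persons_per_birth(sample_population, sample_births):
    """Laplace-style ratio of persons to annual births in the sampled communities."""
    return sample_population / sample_births


def ratio_estimate(total_births, ratio):
    """Scale a national birth count by the sampled persons-per-birth ratio."""
    return total_births * ratio


print(round(graunt_population(13_000, 3, 11, 8)))      # 381333
print(round(persons_per_birth(2_037_615, 71_866), 3))  # 28.353
```

THE FINAL LAPLACE STEP IS `ratio_estimate(births_in_france, ratio)`, WHERE THE NATIONAL BIRTH COUNT WOULD HAVE TO COME FROM THE REGISTERS HE USED.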


PREDICTING PLANETARY ORBITS

THE METHOD OF LEAST SQUARES, WHICH WAS USED TO MINIMIZE ERRORS IN DATA MEASUREMENT, WAS PUBLISHED INDEPENDENTLY BY ADRIEN-MARIE LEGENDRE (1805), ROBERT ADRAIN (1808) AND CARL FRIEDRICH GAUSS (1809). GAUSS HAD USED THE METHOD IN HIS FAMOUS 1801 PREDICTION OF THE LOCATION OF THE DWARF PLANET CERES. THE OBSERVATIONS THAT GAUSS BASED HIS CALCULATIONS ON WERE MADE BY THE ITALIAN MONK PIAZZI.

A DETAILED ACCOUNT OF THE METHOD USED CAN BE FOUND AT HTTP://SCIENCE.LAROUCHEPAC.COM/GAUSS/CERES/INTERIMII/ASTRONOMY/KEPLERPROBLEM.HTML
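FOR A STRAIGHT-LINE FIT (A MUCH SIMPLER PROBLEM THAN GAUSS’S ORBIT OF CERES), LEAST SQUARES CHOOSES a AND b TO MINIMISE THE SUM OF SQUARED RESIDUALS S. SETTING BOTH PARTIAL DERIVATIVES TO ZERO GIVES THE NORMAL EQUATIONS AND THE FAMILIAR CLOSED-FORM ESTIMATES:

```latex
S(a, b) = \sum_{i=1}^{n} \left( y_i - a - b\,x_i \right)^2

\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} \left( y_i - a - b\,x_i \right) = 0,
\qquad
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} x_i \left( y_i - a - b\,x_i \right) = 0

\hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{a} = \bar{y} - \hat{b}\,\bar{x}
```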


WISDOM OF CROWDS

FRANCIS GALTON IS CREDITED AS ONE OF THE PRINCIPAL FOUNDERS OF STATISTICAL THEORY. HIS CONTRIBUTIONS TO THE FIELD INCLUDED INTRODUCING THE CONCEPTS OF STANDARD DEVIATION, CORRELATION AND REGRESSION, AND THE APPLICATION OF THESE METHODS TO THE STUDY OF A VARIETY OF HUMAN CHARACTERISTICS (HEIGHT, WEIGHT AND EYELASH LENGTH, AMONG OTHERS). HE FOUND THAT MANY OF THESE COULD BE FITTED TO A NORMAL CURVE DISTRIBUTION.[19]

GALTON SUBMITTED A PAPER TO NATURE IN 1907 ON THE USEFULNESS OF THE MEDIAN.[20] HE EXAMINED THE ACCURACY OF 787 GUESSES OF THE WEIGHT OF AN OX AT A COUNTRY FAIR. THE ACTUAL WEIGHT WAS 1198 POUNDS; THE MEDIAN GUESS WAS 1207. THE GUESSES WERE MARKEDLY NON-NORMALLY DISTRIBUTED.


AGRICULTURE

THE SECOND WAVE OF MATHEMATICAL STATISTICS WAS PIONEERED BY RONALD FISHER, WHO WROTE TWO TEXTBOOKS, STATISTICAL METHODS FOR RESEARCH WORKERS (1925) AND THE DESIGN OF EXPERIMENTS (1935), THAT WERE TO DEFINE THE ACADEMIC DISCIPLINE IN UNIVERSITIES AROUND THE WORLD. HE ALSO SYSTEMATIZED PREVIOUS RESULTS, PUTTING THEM ON A FIRM MATHEMATICAL FOOTING. IN HIS SEMINAL 1918 PAPER, THE CORRELATION BETWEEN RELATIVES ON THE SUPPOSITION OF MENDELIAN INHERITANCE, HE WAS THE FIRST TO USE THE STATISTICAL TERM ‘VARIANCE’. IN 1919, AT ROTHAMSTED EXPERIMENTAL STATION, HE STARTED A MAJOR STUDY OF THE EXTENSIVE COLLECTIONS OF DATA RECORDED OVER MANY YEARS. THIS RESULTED IN A SERIES OF REPORTS UNDER THE GENERAL TITLE STUDIES IN CROP VARIATION. IN 1930 HE PUBLISHED THE GENETICAL THEORY OF NATURAL SELECTION, IN WHICH HE APPLIED STATISTICS TO EVOLUTION.
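FISHER’S ANALYSIS OF VARIANCE (ANOVA) COMPARES THE VARIATION BETWEEN TREATMENT GROUPS WITH THE VARIATION WITHIN THEM. THE SKETCH BELOW COMPUTES THE ONE-WAY ANOVA F STATISTIC FROM FIRST PRINCIPLES; THE PLOT YIELDS ARE HYPOTHETICAL, NOT ROTHAMSTED DATA, AND NO P-VALUE IS COMPUTED.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square divided by
    within-group mean square."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical plot yields (tonnes per hectare) under three treatments.
yields = {
    "no fertiliser": [20.1, 21.3, 19.8, 20.6],
    "fertiliser A": [23.4, 24.0, 22.8, 23.9],
    "fertiliser B": [21.0, 22.2, 21.5, 21.9],
}
print(f"F = {one_way_anova_f(list(yields.values())):.2f}")
```

IF SCIPY IS AVAILABLE, `scipy.stats.f_oneway` RETURNS THE SAME F STATISTIC TOGETHER WITH A P-VALUE.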


MEDICINE, RELIABILITY, AND JURISPRUDENCE

• THE TERM BAYESIAN REFERS TO THOMAS BAYES (1702–1761), WHO PROVED A SPECIAL CASE OF WHAT IS NOW CALLED BAYES' THEOREM. HOWEVER, IT WAS PIERRE-SIMON LAPLACE (1749–1827) WHO INTRODUCED A GENERAL VERSION OF THE THEOREM AND APPLIED IT TO CELESTIAL MECHANICS, MEDICAL STATISTICS, RELIABILITY AND JURISPRUDENCE.[52]

• AN INTERESTING READ: HTTP://BLOGS.SCIENTIFICAMERICAN.COM/CROSS-CHECK/ARE-BRAINS-BAYESIAN/
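A SMALL WORKED EXAMPLE OF BAYES’ THEOREM IN THE MEDICAL SETTING MENTIONED ABOVE. WITH A HYPOTHETICAL 1% PREVALENCE, 90% SENSITIVITY AND 5% FALSE-POSITIVE RATE, A POSITIVE TEST RAISES THE PROBABILITY OF THE CONDITION ONLY TO ABOUT 15%, BECAUSE FALSE POSITIVES FROM THE LARGE HEALTHY GROUP OUTNUMBER THE TRUE POSITIVES.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem: P(H | E) = P(E | H) P(H) / P(E),
    with P(E) expanded by the law of total probability."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e


# Hypothetical screening test: 1% prevalence, 90% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, p_e_given_h=0.90, p_e_given_not_h=0.05)
print(f"P(condition | positive test) = {p:.3f}")  # 0.154
```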


A QUICK HISTORY (TIME, CONTRIBUTOR, CONTRIBUTION)

Ancient Greece
• Philosophers: ideas, no quantitative analyses

17th Century
• Graunt, Petty: studied affairs of state, vital statistics of populations
• Pascal, Bernoulli: studied probability through games of chance, gambling

18th Century
• Laplace, Gauss: normal curve, regression through study of astronomy

19th Century
• Quetelet: astronomer who first applied statistical analyses to human biology
• Galton: studied genetic variation in humans (used regression and correlation)

20th Century (early)
• Pearson: studied natural selection using correlation, formed the first academic department of statistics, founded the Biometrika journal, helped develop the chi-square analysis
• Gosset ("Student"): studied the process of brewing, alerted the statistics community to problems with small sample sizes, developed Student's t-test
• Fisher: evolutionary biologist, developed ANOVA, stressed the importance of experimental design

20th Century (later)
• Wilcoxon: biochemist who studied pesticides, developed a non-parametric equivalent of the two-sample t-test
• Kruskal, Wallis: economists who developed the non-parametric equivalent of the ANOVA
• Spearman: psychologist who developed a non-parametric equivalent of the correlation coefficient
• Kendall: statistician who developed another non-parametric equivalent of the correlation coefficient
• Tukey: statistician who developed a multiple comparisons procedure
• Dunnett: biochemist who studied pesticides, developed a multiple comparisons procedure for control groups
• Keuls: agronomist who developed a multiple comparisons procedure
• Computer technology: provided many advantages over calculations by hand or by calculator, stimulated the growth of investigation into new techniques

http://www.anselm.edu/homepage/jpitocch/biostatstime.html


SO HOW DO YOU DECIDE WHICH ANALYTIC TOOLS TO USE?

WELL, ACTUALLY, THAT IS THE WRONG QUESTION.

PROCESS IS MUCH MORE IMPORTANT THAN THE TOOLS. THE TOOLS SHOULD SUPPORT THE PROCESS.


TO GAIN BUSINESS UNDERSTANDING / SCOPE AND PLANNING / PLAN

THERE IS NO TOOL FOR THIS.

YOU NEED TO DO YOUR RESEARCH:
• UNDERSTAND THE CONTEXT OF YOUR INVESTIGATION
• UNDERSTAND WHAT IS IMPORTANT TO THE BUSINESS/AGENCY/ORG.
• FIND OUT WHAT HAS GONE BEFORE
• CONSIDER WHAT MIGHT BE DONE DIFFERENTLY
• CHECK WHETHER THE INFORMATION YOU HAVE ACCESS TO IS VALID INPUT


DATA COLLECTION


DATA COLLECTION CONTINUED


DATA UNDERSTANDING/DISCOVERY

THERE ARE SEVERAL TOOLS THAT CAN HELP YOU HERE.

MOST SITES THESE DAYS HAVE REPORTS AND BUSINESS INTELLIGENCE DASHBOARDS THAT WILL GIVE YOU AN INSIGHT INTO HOW A BUSINESS/AGENCY/ORG SEES ITSELF. GAIN AS MUCH INSIGHT AS YOU CAN FROM THESE EXISTING PRODUCTS, BUT DON’T ACCEPT THAT THEY ARE THE FULL STORY; THEY ARE NOT.

EXCEL: USE PIVOT TABLES AND CHARTING TO GAIN A BASIC UNDERSTANDING.

OTHER COMMON TOOLS ARE:
• SAS/VA
• TABLEAU
• QLIK
• SPSS
• SQL
• STATISTICA
• MATLAB
• ETC.


MODELLING

USE THE APPROPRIATE TOOL FOR YOUR INVESTIGATION.
• USE THE APPROPRIATE DATA
• USE AN APPROPRIATE METHOD
• ITERATE, AND CHECK THAT YOUR RESULTS MAKE SENSE IN THE CONTEXT OF THE COLLECTION AND THE QUESTION YOU ARE LOOKING TO ANSWER
• AS THE SAYING GOES, IF ALL YOU HAVE IS A HAMMER, EVERYTHING LOOKS LIKE A NAIL.
• ALL ANALYSTS HAVE A BENT TOWARDS PARTICULAR TOOLS. MY BENT IS TOWARD THE MODELLING TECHNIQUES USED IN THE SOCIAL SCIENCES, BECAUSE THAT IS WHAT I STUDIED. BE AWARE OF THE LIMITS OF YOUR FAVOURITE TOOLS AND BE WILLING TO LEARN NEW TRICKS.


EVALUATING/CHECK/VALIDATION

HAVE A PROCESS AND STANDARDS FOR YOUR ENVIRONMENT THAT LAY OUT THE RULES FOR EVALUATING YOUR MODEL. THE STANDARD WILL DEPEND ON THE TOOLS THAT YOU USE. MOST TECHNIQUES, SUCH AS CORRELATION, ANOVA AND REGRESSION, HAVE WELL-UNDERSTOOD METHODS OF EVALUATION. HOWEVER, CHECK THE WHOLE PROCESS, AND IF YOU WANT TO USE THIS MODEL, HAVE YOUR TEAM REVIEW IT AS WELL. WE ALL LIKE TO THINK WE NEVER MAKE MISTAKES; UNFORTUNATELY, THAT IS NEVER TRUE.

MAKE SURE THAT THE PROCESS INCLUDES SOME SANITY-CHECK METHODS, E.G., CHECKING THAT THE NUMBER OF ROWS/OBSERVATIONS READ IS WHAT YOU EXPECTED.
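ONE WAY TO BUILD THE ROW-COUNT SANITY CHECK INTO A PIPELINE, SKETCHED WITH PYTHON’S STANDARD `csv` MODULE. THE EXPECTED COUNT WOULD NORMALLY COME FROM A CONTROL TOTAL SUPPLIED WITH THE EXTRACT; HERE IT IS A HYPOTHETICAL PARAMETER, AND THE IN-MEMORY DATA STANDS IN FOR A REAL FILE.

```python
import csv
import io


def read_with_row_check(handle, expected_rows):
    """Read CSV rows from an open text handle and fail loudly if the number
    of data rows (excluding the header) differs from what was expected."""
    rows = list(csv.DictReader(handle))
    if len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, read {len(rows)}")
    return rows


# In-memory stand-in for a real extract file; 3 is a hypothetical control total.
extract = io.StringIO("id,amount\n1,10.5\n2,7.25\n3,3.0\n")
rows = read_with_row_check(extract, expected_rows=3)
print(f"read {len(rows)} rows, as expected")
```

RAISING AN ERROR, RATHER THAN PRINTING A WARNING, STOPS A SHORT OR DUPLICATED LOAD FROM FLOWING SILENTLY INTO THE MODEL.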


DEPLOYMENT/ACT

AFTER A MODEL HAS BEEN EVALUATED AND VALIDATED, IT IS OFTEN ‘DEPLOYED’ TO OPERATIONAL SYSTEMS AND REPORTS.

SCORING: OFTEN THE OUTPUT OF THE MODEL WILL BE A SCORE THAT IS USED AS INPUT TO OPERATIONAL SYSTEMS, E.G., ESTIMATES OF FINANCIAL RISK, TRAVEL TIME, FUEL CONSUMPTION, RESOURCE REQUIREMENTS AND MEDICAL OUTCOMES.

PARAMETER TO REPORTING: INTEGRATION INTO BUSINESS INTELLIGENCE DASHBOARDS AND REGULAR MANAGEMENT INFORMATION SYSTEM REPORTS.

OPERATIONAL SYSTEMS: MODELS PROVIDE INPUT TO ALL MANNER OF OPERATIONAL SYSTEMS, INCLUDING PRODUCTION CONTROL PROCESSING, LOGISTICS, FRAUD DETECTION, SYSTEMS MANAGEMENT AND TRAFFIC CONTROL.
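A MINIMAL SKETCH OF SCORING: A PREVIOUSLY FITTED LOGISTIC MODEL IS APPLIED TO NEW RECORDS TO PRODUCE A RISK SCORE BETWEEN 0 AND 1 THAT AN OPERATIONAL SYSTEM COULD CONSUME. THE COEFFICIENTS, FEATURE NAMES AND APPLICANT RECORDS ARE ALL HYPOTHETICAL.

```python
import math

# Hypothetical coefficients from a previously fitted and validated logistic model.
COEFFICIENTS = {"intercept": -3.0, "late_payments": 0.8, "debt_ratio": 2.5}


def risk_score(record):
    """Apply the fitted model to one record and return a probability in (0, 1)."""
    z = COEFFICIENTS["intercept"]
    z += COEFFICIENTS["late_payments"] * record["late_payments"]
    z += COEFFICIENTS["debt_ratio"] * record["debt_ratio"]
    return 1 / (1 + math.exp(-z))  # logistic (sigmoid) link


applicants = [
    {"id": "A1", "late_payments": 0, "debt_ratio": 0.2},
    {"id": "A2", "late_payments": 4, "debt_ratio": 0.9},
]
for applicant in applicants:
    print(applicant["id"], round(risk_score(applicant), 3))
```

KEEPING THE COEFFICIENTS AS DATA, SEPARATE FROM THE SCORING FUNCTION, MAKES IT EASIER TO SWAP IN A RE-FITTED MODEL WITHOUT CHANGING THE OPERATIONAL CODE.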


BUSINESS UNDERSTANDING/REPORT

ALL ANALYTICS IS UNDERTAKEN WITHIN A GIVEN CONTEXT. IN A RESEARCH CONTEXT, THE OUTCOME WILL BE A PAPER WITH AN ABSTRACT, BACKGROUND, METHODS, RESULTS AND CONCLUSION. IN A COMMERCIAL SETTING, FINDINGS ARE (IN MY LIMITED EXPERIENCE) REPORTED IN A VERY SIMILAR MANNER.

REGARDLESS, THE OUTCOMES OF THE ANALYTICS PROCESS SHOULD BE DOCUMENTED AND ADDED TO THE COLLECTIVE STORE OF BUSINESS KNOWLEDGE AT YOUR SITE.


PS: DON’T OVERCOMPLICATE THINGS


MONITOR/REVIEW/REPEAT

KNOWLEDGE IS NOT STATIC. THERE ARE THE THINGS YOU KNOW ARE GOING TO HAPPEN, AND THERE ARE NEW FACTORS THAT YOU WILL NOT HAVE THOUGHT OF.

AS GEORGE BOX PUT IT: ‘FOR SUCH A MODEL THERE IS NO NEED TO ASK THE QUESTION "IS THE MODEL TRUE?". IF "TRUTH" IS TO BE THE "WHOLE TRUTH" THE ANSWER MUST BE "NO". THE ONLY QUESTION OF INTEREST IS "IS THE MODEL ILLUMINATING AND USEFUL?"’

IN SHORT: ‘ALL MODELS ARE WRONG, BUT SOME ARE USEFUL.’

MONITOR: COMPARE THE ACTUAL PERFORMANCE OF THE MODELS YOU PRODUCE AGAINST EXPECTED/PLANNED PERFORMANCE. BE PREPARED TO PROCEED WITH A PROCESS OF CONTINUAL IMPROVEMENT.
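MONITORING CAN START AS SIMPLY AS COMPARING LIVE ERROR WITH THE ERROR EXPECTED AT VALIDATION TIME. THE SKETCH BELOW USES MEAN ABSOLUTE ERROR AND A HYPOTHETICAL TOLERANCE FACTOR OF 1.25; THE TRAVEL-TIME FIGURES ARE ALSO HYPOTHETICAL. A FLAG IS A PROMPT FOR REVIEW, NOT AN AUTOMATIC VERDICT THAT THE MODEL IS WRONG.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def needs_review(actual, predicted, expected_mae, tolerance=1.25):
    """Return (live MAE, flag); the flag is True when live error exceeds the
    error expected at validation time by more than the tolerance factor."""
    live_mae = mean_absolute_error(actual, predicted)
    return live_mae, live_mae > expected_mae * tolerance


predicted = [30, 42, 25, 50]  # hypothetical predicted travel times (minutes)
print(needs_review([31, 40, 27, 49], predicted, expected_mae=2.0))  # on track
print(needs_review([38, 50, 33, 60], predicted, expected_mae=2.0))  # drifted
```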


CONCLUSION

WHAT IS ANALYTICS? ANY DATA-DRIVEN PROCESS THAT PROVIDES INSIGHT.

WHERE DOES IT (ANALYTICS) FIT? ANYWHERE YOU NEED TO MAKE A DECISION.


A FEW ANALYTICS TOOLS

• THE 40 DATA SCIENCE TECHNIQUES

1 LINEAR REGRESSION
2 LOGISTIC REGRESSION
3 JACKKNIFE REGRESSION *
4 DENSITY ESTIMATION
5 CONFIDENCE INTERVAL
6 TEST OF HYPOTHESES
7 PATTERN RECOGNITION
8 CLUSTERING (AKA UNSUPERVISED LEARNING)
9 SUPERVISED LEARNING
10 TIME SERIES
11 DECISION TREES
12 RANDOM NUMBERS
13 MONTE-CARLO SIMULATION
14 BAYESIAN STATISTICS
15 NAIVE BAYES
16 PRINCIPAL COMPONENT ANALYSIS (PCA)
17 ENSEMBLES
18 NEURAL NETWORKS
19 SUPPORT VECTOR MACHINE (SVM)
20 NEAREST NEIGHBORS (k-NN)
21 FEATURE SELECTION (AKA VARIABLE REDUCTION)
22 INDEXATION / CATALOGUING *
23 (GEO-) SPATIAL MODELLING
24 RECOMMENDATION ENGINE *
25 SEARCH ENGINE *
26 ATTRIBUTION MODELLING *
27 COLLABORATIVE FILTERING *
28 RULE SYSTEM
29 LINKAGE ANALYSIS
30 ASSOCIATION RULES
31 SCORING ENGINE
32 SEGMENTATION
33 PREDICTIVE MODELLING
34 GRAPHS
35 DEEP LEARNING
36 GAME THEORY
37 IMPUTATION
38 SURVIVAL ANALYSIS
39 ARBITRAGE
40 LIFT MODELLING
41 YIELD OPTIMIZATION
42 CROSS-VALIDATION
43 MODEL FITTING


SOME THINGS TO CHECK OUT

INFORMATIVE
HTTP://WWW.KDNUGGETS.COM

HTTP://WWW.PREDICTIVEANALYTICSTODAY.COM/DEPLOYMENT-PREDICTIVE-MODELS

BOOKS

HTTP://SHOP.OREILLY.COM/CATEGORY/EBOOKS.DO

COOL
HTTPS://RAPIDMINER.COM
XPATH CAPABILITIES FOR WEB SCRAPING USING GOOGLE DOCS
HTTP://NODEXL.CODEPLEX.COM
HTTPS://D3JS.ORG
HTTP://WWW.FACULTY.UCR.EDU/~HANNEMAN/NETTEXT/ (SOCIAL NETWORK ANALYSIS)

HTTPS://WWW.KAGGLE.COM

EDUCATION: CHEAP AND AT WHATEVER PACE YOU WANT TO TAKE
HTTPS://WWW.UDEMY.COM

ACADEMIC EDUCATION
IN CANBERRA, BOTH THE ANU AND CU HAVE GOOD COURSES, AND USQ HAS EXCELLENT COURSES AS WELL, AS DO A LOT OF OTHERS.

I WAS ASKED WHO TO FOLLOW ON TWITTER. TRY SEARCHING FOR DATA SCIENCE AND ANALYTICS, AND CHOOSE WHO TO FOLLOW. ALSO FOLLOW THE JOURNALS, SUCH AS NATURE, AND OTHERS.