analysis patterns
Post on 16-Apr-2017
ANALYSIS PATTERNS
FASTEST SCORERS
CRICKET
“I’ve always been curious… who among India’s prolific one-day run-getters had the best strike rate? Sachin? Sehwag? What about the rest of the world?”
LET’S TAKE ONE DAY CRICKET DATA
Country | Player | Runs | ScoreRate | MatchDate | Ground | Versus
Australia | Michael J Clarke | 99* | 93.39 | 30-06-2010 | The Oval | England
Australia | Dean M Jones | 99* | 128.57 | 28-01-1985 | Adelaide Oval | Sri Lanka
Australia | Bradley J Hodge | 99* | 115.11 | 04-02-2007 | Melbourne Cricket Ground | New Zealand
India | Virender Sehwag | 99* | 99 | 16-08-2010 | Rangiri Dambulla International Stad. | Sri Lanka
New Zealand | Bruce A Edgar | 99* | 72.79 | 14-02-1981 | Eden Park | India
Pakistan | Mohammad Yousuf | 99* | 95.19 | 15-11-2007 | Captain Roop Singh Stadium | India
West Indies | Richard B Richardson | 99* | 70.21 | 15-11-1985 | Sharjah CA Stadium | Pakistan
West Indies | Ramnaresh R Sarwan | 99* | 95.19 | 15-11-2002 | Sardar Patel Stadium | India
Zimbabwe | Andrew Flower | 99* | 89.18 | 24-10-1999 | Harare Sports Club | Australia
Zimbabwe | Alistair D R Campbell | 99* | 79.83 | 01-10-2000 | Queens Sports Club | New Zealand
Zimbabwe | Malcolm N Waller | 99* | 133.78 | 25-10-2011 | Queens Sports Club | New Zealand
Australia | David C Boon | 98* | 82.35 | 08-12-1994 | Bellerive Oval | Zimbabwe
Australia | Graeme M Wood | 98* | 63.22 | 11-01-1981 | Melbourne Cricket Ground | India
England | Ian J L Trott | 98* | 84.48 | 20-10-2011 | Punjab Cricket Association Stadium | India
India | Yuvraj Singh | 98* | 89.09 | 01-08-2001 | Sinhalese Sports Club Ground | Sri Lanka
Ireland | Kevin J O'Brien | 98* | 94.23 | 10-07-2010 | VRA Ground | Scotland
Kenya | Collins O Obuya | 98* | 75.96 | 13-03-2011 | M.Chinnaswamy Stadium | Australia
Netherlands | Ryan N ten Doeschate | 98* | 73.68 | 01-09-2009 | VRA Ground | Afghanistan
New Zealand | James E C Franklin | 98* | 142.02 | 07-12-2010 | M.Chinnaswamy Stadium | India
Pakistan | Ijaz Ahmed | 98* | 112.64 | 28-10-1994 | Iqbal Stadium | South Africa
South Africa | Jacques H Kallis | 98* | 74.24 | 06-02-2000 | St George's Park | Zimbabwe
Against which countries are higher averages scored?
Which countries’ players score more per match?
Which player scores the most per ball?
The player with the highest strike rate is an obscure South African whose name most of us have never heard of.
In fact, this list is filled with players we have never heard of.
ODI STRIKE RATES OF THE WORLD
We want to see the prioritised performance. That is, what is the strike rate of the established players?
Most analysis answers the question “Which are the top 10 X?”
Which are my top products?
Which are my top branches?
Who are my best sales people?
Which vendors have the highest cost per unit?
Which divisions are spending the most money?
In which hours does the under 12 segment watch TV most?
Which customer segment has the highest revenue per user?
THIS QUESTION CAN BE ANSWERED SYSTEMATICALLY
Take every column in the data
Find the top value by that column
Country | South Africa has the highest strike rate of 76%
Player | Johann Louw has the highest strike rate of 329%
Runs | 164 runs has the highest strike rate of 156%
MatchDate | 12-03-2006 has the highest strike rate of 136%
Ground | AC-VDCA Stadium has the highest strike rate of 98%
Versus | United States has the highest strike rate of 104%
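The two steps above (take every column, find the top value by that column) can be sketched roughly as follows. This is a minimal illustration, not the product's actual implementation: the function names and the `min_rows` filter (one possible way to restrict the answer to established players) are assumptions, and only a handful of rows from the sample table are used, so the output will not match the figures on the slide.

```python
from collections import defaultdict
from statistics import mean

# A few rows copied from the sample table; the full dataset is not reproduced here.
ROWS = [
    {"Country": "Australia", "Player": "Michael J Clarke", "Versus": "England", "ScoreRate": 93.39},
    {"Country": "Australia", "Player": "Dean M Jones", "Versus": "Sri Lanka", "ScoreRate": 128.57},
    {"Country": "India", "Player": "Virender Sehwag", "Versus": "Sri Lanka", "ScoreRate": 99.0},
    {"Country": "Zimbabwe", "Player": "Malcolm N Waller", "Versus": "New Zealand", "ScoreRate": 133.78},
    {"Country": "New Zealand", "Player": "James E C Franklin", "Versus": "India", "ScoreRate": 142.02},
    {"Country": "Pakistan", "Player": "Mohammad Yousuf", "Versus": "India", "ScoreRate": 95.19},
]

def top_by_column(rows, column, metric="ScoreRate", min_rows=1):
    """Return (value, average metric) for the best group in one column, or None."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[metric])
    # min_rows is a hypothetical filter: ignore groups with too few rows.
    eligible = {key: mean(vals) for key, vals in groups.items() if len(vals) >= min_rows}
    if not eligible:
        return None
    best = max(eligible, key=eligible.get)
    return best, round(eligible[best], 2)

def top_by_every_column(rows, metric="ScoreRate", min_rows=1):
    """Apply top_by_column to every column except the metric itself."""
    columns = [c for c in rows[0] if c != metric]
    return {c: top_by_column(rows, c, metric, min_rows) for c in columns}

for column, winner in top_by_every_column(ROWS).items():
    print(column, winner)
```

Raising `min_rows` is one simple way to keep a single fast innings from topping the list.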
AUTOLYSIS
A PRODUCT THAT ENCAPSULATES BUSINESS ANALYSIS PATTERNS
SPATIAL FREQUENCY ANALYSIS
100 YEARS OF INDIA’S WEATHER
[Figure: grid of years (1901 to 2001) by month (Jan to Dec)]
TEMPORAL FREQUENCY ANALYSIS
IMPACT OF THE BUDGET ON STOCK PRICES
RESTAURANT FOUND AN UNUSUAL DIP IN SALES
A restaurant chain had data for every single transaction made over a few years. Plotting this as a time series showed them nothing unusual.
However, the same data on a calendar map reveals a very different story.
Specifically, at the bottom-left point-of-sale terminal, sales dip every Wednesday. At the bottom-right point-of-sale terminal, sales rise every Wednesday (almost as if to compensate for the loss).
It turns out that the manager closes the bottom-left counter every Wednesday afternoon due to a shortage of staff, assuming that it results in no loss of sales. There is, however, a net loss every Wednesday.
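The weekday pattern that a calendar map exposes can also be checked numerically by averaging daily sales per terminal and weekday. The sketch below is only an illustration: the record layout, the terminal names and every number in it are made up, not taken from the restaurant's data.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_profile(transactions):
    """Average daily sales per (terminal, weekday) from (day, terminal, amount) records."""
    daily = defaultdict(float)
    for day, terminal, amount in transactions:
        daily[(terminal, day)] += amount  # total per terminal per calendar day
    by_weekday = defaultdict(list)
    for (terminal, day), total in daily.items():
        by_weekday[(terminal, WEEKDAYS[day.weekday()])].append(total)
    return {key: mean(totals) for key, totals in by_weekday.items()}

# Made-up data: two weeks of daily totals for two hypothetical terminals.
transactions = []
for offset in range(14):
    day = date(2017, 1, 2) + timedelta(days=offset)
    wednesday = day.weekday() == 2
    transactions.append((day, "bottom-left", 40.0 if wednesday else 100.0))
    transactions.append((day, "bottom-right", 130.0 if wednesday else 100.0))

profile = weekday_profile(transactions)
for terminal in ("bottom-left", "bottom-right"):
    print(terminal, [profile[(terminal, d)] for d in WEEKDAYS])
```

In this toy data the right-hand terminal gains 30 on Wednesdays while the left-hand one loses 60, which is the kind of net loss the slide describes.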
HOW BIRTHDAYS AFFECT MARKS
BANK FOUND ALL LOANS BEFORE 20TH POOR
The personal finance division of a bank, focusing on retail loans, drove its sales through a branch sales team.
A study of the non-performing assets of loans generated over the course of one year shows a strange pattern.
Every loan disbursed after the 20th of the month, i.e. from the 21st to the end of the month, shows consistently lower non-performing assets (i.e. better quality) than any loan disbursed up to the 20th.
The bank mapped this back to their incentive scheme. The sales team’s commission is based only on loans disbursed until the 20th. Hence new loans are squeezed into this period without regard for their quality.
This representation, known as a calendar map, can show some interesting patterns, particularly weekday-based patterns, as the restaurant example shows.
A similar visual helped a telecom company identify specific days on which their competitors’ market share rose significantly, enabling them to negate the strategy.
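The comparison behind the bank's finding can be sketched as a split on the day of the month. This is a rough illustration under stated assumptions: the function name, the record layout and the loans themselves are hypothetical, and the 20th is simply passed in as a parameter.

```python
from datetime import date

def npa_rate_by_period(loans, cutoff_day=20):
    """Share of non-performing loans disbursed up to vs after a day-of-month cutoff."""
    totals = {"up_to_cutoff": 0, "after_cutoff": 0}
    bad = {"up_to_cutoff": 0, "after_cutoff": 0}
    for disbursed_on, is_npa in loans:
        period = "up_to_cutoff" if disbursed_on.day <= cutoff_day else "after_cutoff"
        totals[period] += 1
        bad[period] += 1 if is_npa else 0
    # None marks a period with no loans at all.
    return {p: (bad[p] / totals[p] if totals[p] else None) for p in totals}

# Made-up loans: (disbursal date, became non-performing?)
loans = [
    (date(2016, 4, 5), True), (date(2016, 4, 18), True),
    (date(2016, 4, 20), False), (date(2016, 5, 12), True),
    (date(2016, 4, 22), False), (date(2016, 4, 29), False),
    (date(2016, 5, 25), True), (date(2016, 5, 30), False),
]
rates = npa_rate_by_period(loans)
print(rates)
```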
Communicating data visually is the most effective way to reach a shared understanding
A brief aside on this distribution...
Based on the results of the 20 lakh students taking the Class XII exams in Tamil Nadu over the last 3 years, it appears that the month you were born in can make a difference of as much as 120 marks out of 1,200.
June-borns score the lowest
The marks shoot up for Aug-borns
… and peak for Sep-borns
120 marks out of 1,200 explainable by month of birth
An identical pattern was observed in 2009 and 2010…
… and across districts, gender, subjects, and class X & XII.
“It’s simply that in Canada the eligibility cut-off for age-class hockey is January 1. A boy who turns ten on January 2, then, could be playing alongside someone who doesn’t turn ten until the end of the year—and at that age, in preadolescence, a twelve-month gap in age represents an enormous difference in physical maturity.”
-- Malcolm Gladwell, Outliers
PATTERN OF “BIRTHS” IN INDIA IS SKEWED
This is a birth date dataset that’s obtained from school admission data for over 10 million children. When we compare this with births in the US, we see none of the same patterns.
For example:
• Is there an aversion to the 13th or is there a local cultural nuance?
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?
Very few children are born in the month of August, and thereafter. Most births are concentrated in the first half of the year.
We see a large number of children born on the 5th, 10th, 15th, 20th and 25th of each month, that is, round-numbered dates.
Such round-numbered patterns are a typical indication of fraud. Here, birthdates are brought forward to aid early school admission.
[Figure legend: more births / fewer births … on average, for each day of the year (from 2007 to 2013)]
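A simple way to surface round-numbered dates is to compare each day-of-month count with a typical day. The sketch below assumes the median count as the baseline and a 1.5x threshold; both choices, the function names and the counts are illustrative assumptions, not the method used for the slide.

```python
from collections import Counter
from statistics import median

def day_of_month_excess(birth_days):
    """Ratio of each day-of-month count to the median count across observed days."""
    counts = Counter(birth_days)
    typical = median(counts.values())
    return {day: counts[day] / typical for day in sorted(counts)}

def suspicious_days(birth_days, threshold=1.5):
    """Days whose count is at least `threshold` times the typical day (assumed cut-off)."""
    return [day for day, ratio in day_of_month_excess(birth_days).items() if ratio >= threshold]

# Made-up counts: 100 births on ordinary days, 180 on the 5th, 10th, 15th, 20th, 25th.
birth_days = []
for day in range(1, 29):
    birth_days.extend([day] * (180 if day % 5 == 0 else 100))

print(suspicious_days(birth_days))
```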
THIS ADVERSELY IMPACTS CHILDREN’S MARKS
It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer.
Children “born” on the 1st, 5th, 10th, 15th etc. of the month tend to score lower marks on average.
[Figure legend: higher marks / lower marks … on average, for children born on a given day of the year (from 2007 to 2013)]
Children “born” on round-numbered days score lower marks on average, due to a higher proportion of younger children.
RANK SCALE DISTRIBUTIONS
AN ENERGY UTILITY DETECTED BILLING FRAUD
An energy utility (with over 50 million subscribers) had 10 years’ worth of customer billing data available.
Most fraud detection software failed to load the data, and sampled data revealed little or no insight.
Below is a simple histogram (or frequency distribution) of usage levels. Each bar represents the number of customers with a specific bill amount (in units, or kWh).
This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the slab boundaries.
Tariffs are based on the usage slab. Someone with 101 units is billed in full at a higher tariff than someone with 100 units. So people have a strong incentive to stay at or within a slab boundary.
This can happen in one of two ways.
First, people may be monitoring their usage very carefully, and turn off their lights and fans the instant their usage hits the slab boundary.
Or, more realistically, there’s probably some level of corruption involved, where customers pay a small sum to the meter reading staff to ensure that it stays exactly at the slab boundary, giving them the advantage of a lower price.
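The spike at a slab boundary can be quantified by comparing the count at the boundary with the counts just around it. This is a minimal sketch: the function name, the window size and the readings are assumptions for illustration; only the 100-unit boundary comes from the example above.

```python
from collections import Counter

def boundary_spike(readings, boundary, window=5):
    """Count at `boundary` divided by the average count of its neighbours."""
    counts = Counter(readings)
    neighbours = [counts[boundary + d] for d in range(-window, window + 1) if d != 0]
    baseline = sum(neighbours) / len(neighbours)
    return counts[boundary] / baseline if baseline else float("inf")

# Made-up meter readings: 10 customers at each of 95..99 units,
# 50 at exactly 100, and only 6 at each of 101..105.
readings = []
for units in range(95, 100):
    readings.extend([units] * 10)
readings.extend([100] * 50)
for units in range(101, 106):
    readings.extend([units] * 6)

print(boundary_spike(readings, 100))
```

A ratio far above 1 at a boundary, with a shortfall just above it, is the signature described on the slide.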
TN CLASS X: ENGLISH
[Figure: frequency distribution; horizontal axis 0 to 99, vertical axis 0 to 40,000]
TN CLASS X: SOCIAL SCIENCE
[Figure: frequency distribution; horizontal axis 0 to 99, vertical axis 0 to 40,000]
TN CLASS X: MATHEMATICS
[Figure: frequency distribution; horizontal axis 0 to 99, vertical axis 0 to 40,000]
CBSE 2013 CLASS XII: ENGLISH MARKS
CLUSTERED CORRELATIONS
68% correlation between AUD & EUR
Plot of 6 month daily AUD - EUR values
Block of correlated currencies
… clustered hierarchically
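One way to arrive at blocks of correlated series is to compute pairwise Pearson correlations and then merge any pair above a threshold, which is equivalent to cutting a single-linkage hierarchical clustering at that level. The sketch below assumes this particular linkage and a 0.6 threshold; the slides do not say which method was used. The series values are made up, and "CCY3" and "CCY4" are placeholder names.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlated_blocks(series, threshold=0.6):
    """Group series whose correlation chains together at or above the threshold."""
    names = sorted(series)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links up to the representative of the block.
        while parent[name] != name:
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(series[a], series[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two blocks

    blocks = {}
    for name in names:
        blocks.setdefault(find(name), []).append(name)
    return sorted(blocks.values())

# Made-up daily values for four series.
series = {
    "AUD": [1, 2, 3, 4, 5],
    "EUR": [1, 2, 3, 4, 6],
    "CCY3": [5, 4, 3, 2, 1],
    "CCY4": [10, 8, 6, 4, 2],
}
print(correlated_blocks(series))
```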
RESTAURANT: PRODUCT SALES CORRELATION
MAXIMAL TEXTUAL SEGMENTATION
WHAT TOPICS DID THE YOUNG & OLD FOCUS ON?
[Figure: topics raised, arranged on a scale from Young to Old]
Topics shown: P.W.D.; Health and family welfare; Revenue; Rural Development and Panchayat Raj; Social Welfare; Urban Development; Water Resources; Minor Irrigation; Fuel; Housing; Agriculture; Primary Education; Primary and Secondary Education; Woman & Child Development; Higher Education; Home; Cooperative; Forest; Administrative Reforms; Labour; Food & Civil Supplies; Tourism; Finance; Animal Husbandry; Transportation; Horticulture; Muzrai; Haz & Wakf; Transport; Medical Education; Medium and Large Industries; Excise; Major & Medium Industries; Kannada & Culture; Textile; Fisheries; Parliamentary Affairs and Human Rights; Adult Education; Rural Water Supply and Sanitation; Mines & Geology; Small Industries; Youth and Sports; Sugar; Planning and Statistics; Agricultural Marketing; Rural Water Supply; Fisheries & Inland water transport; Small Scale Industries; Youth Service & Sports; Sericulture; Law & Human Rights; Prison; Planning; Information & Technology; Public Library.
Based on assembly session questions, Karnataka, 2008-2012
THE LANGUAGE OF TWEETS
Based on 1 week of geo-coded tweets from India, this visual shows words sized by frequency. Words on the left (in red) are used by people with few followers, while those on the right (in green) are used by people with many followers.
People with many followers use significantly more hashtags and are perhaps more polite, with more ‘good morning’s and ‘thank you’s.
People with few followers tend to talk more about ‘know’, ‘traffic’, ‘high’ etc.
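Placing a word towards one side or the other comes down to comparing how often each group uses it. The sketch below is an assumption about how such a score could be computed, not the method behind the visual: it compares each word's share of all words in the two groups and returns a value between -1 (used only by the first group) and +1 (used only by the second). The example tweets are made up, and both groups are assumed to be non-empty.

```python
from collections import Counter

def word_skew(low_texts, high_texts):
    """Score each word from -1 (only in low_texts) to +1 (only in high_texts)."""
    low = Counter(word for text in low_texts for word in text.lower().split())
    high = Counter(word for text in high_texts for word in text.lower().split())
    total_low, total_high = sum(low.values()), sum(high.values())
    skew = {}
    for word in set(low) | set(high):
        share_low = low[word] / total_low      # share of all words in the low group
        share_high = high[word] / total_high   # share of all words in the high group
        skew[word] = (share_high - share_low) / (share_high + share_low)
    return skew

# Made-up tweets from accounts with few and many followers.
low_followers = ["traffic is high", "you know the traffic"]
high_followers = ["good morning", "thank you"]

skew = word_skew(low_followers, high_followers)
for word in sorted(skew, key=skew.get):
    print(f"{word:8s} {skew[word]:+.2f}")
```

The same comparison applies to any two groups of text, such as decisions made before and after a given year.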
PARLIAMENT DECISIONS
[Figure: word cloud of terms used in decisions]
Words shown: promotion, scheme, project, approved, development, agreement, amendment, central, act, section, limited, bill, laning, plan, government, new, ltd, phase, approval, sector, state, setting, investment, pradesh, policy, four, programme, amendments, indian, extension, institute, commission, nhdp, technology, proposal, iii, implementation, fund, establishment, equity, assistance, cooperation, transfer, infrastructure, corporation, international, mou, cabinet, company, public, year, revised, construction, services, continuation, approves, states, education, additional, financial, revision, sponsored, port, mission, centrally, basis, signing, protection, management, capital, bank, two, projects, research, upgradation, rural, special, land, delhi, employees, existing, committee, relief, convention, six, crore, payment, power, health, cost, package, institutions, acquisition, control, restructuring, air, grant, field, university, scheduled.
PRE-2009 | 2009 AND AFTER
Decisions related to intervention, assistance and relief were almost entirely concentrated in pre-2009.
The number of international agreements declined dramatically between pre-2009 and post-2009.
A significant rise in the number of decisions related to the States is seen post-2009, in contrast with the focus on “Central” pre-2009.
Decisions to increase the number of lanes on highways grew significantly post-2009, especially as part of the CCI (Cabinet Committee on Infrastructure) decisions.
WHAT DO FINANCIAL ANALYSTS ASK IBM VS MSFT?
BIPARTITE NETWORK CLUSTERING
VISUALISING THE MAHABHARATA
How does the Mahabharata, one of the largest epics with 1.8 million words, lend itself to text analytics?
Can this ‘unstructured data’ be processed to extract analytical insights?
What does sentiment analysis of this tome convey?
Is there a better way to explore relations between characters?
How can closeness of characters be analysed & visualised?
DIRECTORSHIPS AT THE TATAS
Every person who was a Director at the Tata Group is shown here as an orange circle. The size of the circle is based on the number of directorship positions held over their lifetime.
Every company in the Tata Group is shown here as a blue circle. The size of the circle is based on the number of directors the company has had over time.
Every directorship relation is shown by a line. If a person has held a directorship position at a company, the two are connected by a line.
The group appears to be divided into two clusters based on the network of directorship roles.
[Figure: network of directors and companies]
Companies labelled: Tata Teleservices, Tata Consultancy Services, Tata Business Support Services, Tata Global Beverages, Tata Infotech (merged), Tata Toyo Radiator, Honeywell Automation India, Tata Communications, A G C Networks, Tata Technologies, Tata Projects, Tata Power, Tata Finance, Idea Cellular, Tata Motors, Tata Sons, Tata Steel, Tayo Rolls, Tata Securities, Tata Coffee, Tata Investment Corp.
Directors labelled: A J Engineer, H H Malgham, H K Sethna, Keshub Mahindra, Ravi Kant, Russi Mody, Sujit Gupta, A S Bam, Amal Ganguli, D B Engineer, D N Ghosh, M N Bhagwat, N N Kampani, U M Rao, B Muthuraman, Ishaat Hussain, J J Irani, N A Palkhivala, N A Soonawala, R Gopalakrishnan, Ratan Tata, S Ramadorai, S Ramakrishnan.
Prominent leaders bridge the groups
First group of companies
Second group of companies
Some directors are mainly associated with the first group of companies
Some directors are mainly associated with the second group of companies
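The network described above is bipartite: directors on one side, companies on the other, with a line for each directorship. A minimal sketch of the bookkeeping follows. The function names are assumptions, the directors and companies are placeholders rather than real Tata data, and the two company groups are supplied by hand here rather than discovered by a clustering step.

```python
from collections import defaultdict

def build_network(directorships):
    """Index (person, company) pairs in both directions."""
    companies_of = defaultdict(set)
    directors_of = defaultdict(set)
    for person, company in directorships:
        companies_of[person].add(company)
        directors_of[company].add(person)
    return companies_of, directors_of

def bridging_directors(companies_of, group_a, group_b):
    """Directors holding positions in both groups of companies."""
    return sorted(
        person for person, companies in companies_of.items()
        if companies & group_a and companies & group_b
    )

# Placeholder directorships, not real data.
directorships = [
    ("Director A", "Company 1"), ("Director A", "Company 2"),
    ("Director B", "Company 3"),
    ("Director C", "Company 2"), ("Director C", "Company 3"),
]
companies_of, directors_of = build_network(directorships)

# Circle sizes: positions per director and directors per company.
print({person: len(cs) for person, cs in companies_of.items()})
print({company: len(ds) for company, ds in directors_of.items()})
print(bridging_directors(companies_of, {"Company 1", "Company 2"}, {"Company 3"}))
```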
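[Figure: tools positioned between manual exploration and automated insights, and between more problems and tougher problems, with deep insights at the far end. Tools shown: EXCEL, TABLEAU, QLIK, R, SAS, SPSS, SPOTFIRE, MICROSTRATEGY, COGNOS, TENSORFLOW, THEANO, CAFFE, TORCH, AUTOLYSIS]
AUTOLYSIS
This fills a gap in the pattern-based analysis space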
GRAMENER.COM
s.anand@gramener.com