deep learning vs. cheap learning
TRANSCRIPT
![Page 1: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/1.jpg)
© 2016 MapR Technologies 1© 2014 MapR Technologies
Deep Learning is great
Ted Dunning
![Page 2: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/2.jpg)
© 2016 MapR Technologies 2© 2014 MapR Technologies
Deep Learning is greatBut don’t forget cheap learning!
Ted Dunning
![Page 3: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/3.jpg)
© 2016 MapR Technologies 3
Me, Us• Ted Dunning, MapR Chief Application Architect, Apache Member
– Committer PMC member Zookeeper, Drill, others– Mentor for Flink, Beam (nee Dataflow), Drill, Storm, Zeppelin– VP Incubator– Bought the beer at the first HUG
• MapR– Produces first converged platform for big and fast data– Includes data platform (files, streams, tables) + open source– Adds major technology for performance, HA, industry standard API’s
• Contact@ted_dunning, [email protected], [email protected]
![Page 4: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/4.jpg)
© 2016 MapR Technologies 4
Agenda• Rationale• Why cheap isn't the same as simple-minded• Some techniques• Examples
![Page 5: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/5.jpg)
© 2016 MapR Technologies 5
Outline• We have a revolution on our hands• This leads to a green-field situation• That implies that many important problems are easy to solve• The limiting factor is fielding good enough solutions
– Quickly– With available workforce
• Examples
![Page 6: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/6.jpg)
© 2016 MapR Technologies 6
Is this really a revolutionary moment?
![Page 7: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/7.jpg)
© 2016 MapR Technologies 7
Big is the next big thing• Data scale is exploding
• Companies are being funded
• Books are being written
• Applications sprouting up everywhere
![Page 8: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/8.jpg)
© 2016 MapR Technologies 8
Why Now?• But Moore’s law has applied for a long time
• Why is data exploding now?
• Why not 10 years ago?
• Why not 20?
![Page 9: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/9.jpg)
© 2016 MapR Technologies 9
Size Matters, but …• If it were just availability of data then existing big companies
would adopt big data technology first
![Page 10: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/10.jpg)
© 2016 MapR Technologies 10
Size Matters, but …• If it were just availability of data then existing big companies
would adopt big data technology first
They didn’t
![Page 11: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/11.jpg)
© 2016 MapR Technologies 11
Or Maybe Cost• If it were just a net positive value then finance companies should
adopt first because they have higher opportunity value / byte
![Page 12: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/12.jpg)
© 2016 MapR Technologies 12
Or Maybe Cost• If it were just a net positive value then finance companies should
adopt first because they have higher opportunity value / byte
They didn’t
![Page 13: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/13.jpg)
© 2016 MapR Technologies 13
Backwards adoption• Under almost any threshold argument startups would not adopt
big data technology first
![Page 14: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/14.jpg)
© 2016 MapR Technologies 14
Backwards adoption• Under almost any threshold argument startups would not adopt
big data technology first
They did
![Page 15: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/15.jpg)
© 2016 MapR Technologies 15
Everywhere at Once?• Something very strange is happening
– Big data is being applied at many different scales– At many value scales– By large companies and small
![Page 16: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/16.jpg)
© 2016 MapR Technologies 16
Everywhere at Once?• Something very strange is happening
– Big data is being applied at many different scales– At many value scales– By large companies and small
Why?
![Page 17: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/17.jpg)
© 2016 MapR Technologies 17
Analytics Scaling Laws• Analytics scaling is all about the 80-20 rule
– Big gains for little initial effort– Rapidly diminishing returns
• The key to net value is how costs scale– Old school – exponential scaling– Big data – linear scaling, low constant
• Cost/performance has changed radically– IF you can use many commodity boxes
![Page 18: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/18.jpg)
© 2016 MapR Technologies 18
Most data isn’t worth much in isolation
First data is valuable
Later data is dregs
![Page 19: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/19.jpg)
© 2016 MapR Technologies 19
Suddenly worth processing
First data is valuable
Later data is dregs
But has high aggregate value
![Page 20: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/20.jpg)
© 2016 MapR Technologies 20
If we can handle the scale
It’s really big
![Page 21: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/21.jpg)
© 2016 MapR Technologies 21
So what makes that possible?
![Page 22: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/22.jpg)
© 2016 MapR Technologies 22
![Page 23: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/23.jpg)
© 2016 MapR Technologies 23
Net value optimum has a sharp peak well before maximum effort
![Page 24: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/24.jpg)
© 2016 MapR Technologies 24
But scaling laws are changing both slope and shape
![Page 25: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/25.jpg)
© 2016 MapR Technologies 25
More than just a little
![Page 26: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/26.jpg)
© 2016 MapR Technologies 26
They are changing a LOT!
![Page 27: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/27.jpg)
© 2016 MapR Technologies 27
![Page 28: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/28.jpg)
© 2016 MapR Technologies 28
![Page 29: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/29.jpg)
© 2016 MapR Technologies 29
![Page 30: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/30.jpg)
© 2016 MapR Technologies 30
![Page 31: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/31.jpg)
© 2016 MapR Technologies 31
Initially, linear cost scaling actually makes things worse
Then a tipping point is reached and things change radically …
![Page 32: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/32.jpg)
© 2016 MapR Technologies 32
Pre-requisites for Tipping• To reach the tipping point, • Algorithms must scale out horizontally
– On commodity hardware– That can and will fail
• Data practice must change– Denormalized is the new black– Flexible data dictionaries are the rule– Structured data becomes rare
![Page 33: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/33.jpg)
© 2016 MapR Technologies 33
With great scale comes great opportunity• Increasing scale by 1000x changes the game
• We essentially have green fields opening up all around
• Most of the opportunities don’t require advanced learning
![Page 34: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/34.jpg)
© 2016 MapR Technologies 34
OK.
We have a bona fide revolution
![Page 35: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/35.jpg)
© 2016 MapR Technologies 35
Greenfield Problem Landscape
![Page 36: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/36.jpg)
© 2016 MapR Technologies 36
Mature Problem Landscape
![Page 37: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/37.jpg)
© 2016 MapR Technologies 37
Why is cheap better than deep (sometimes)?When we have a greenfield, problems can be
– Easy (large number of these)– Impossible (large number of these)– Hard but possible (right on the boundary)
In a mature field, problems can be– Easy (these are already done)– Impossible (still a large number of these)– Hard but possible (now the majority of the effort)
![Page 38: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/38.jpg)
© 2016 MapR Technologies 38
Some examples
![Page 39: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/39.jpg)
© 2016 MapR Technologies 39
A simple example - security monitoring
• “Small” data– Capture IDS logs– Detect what you already know
• “Big” data– Capture switch, server, firewall logs as well– New patterns emerge immediately
![Page 40: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/40.jpg)
© 2016 MapR Technologies 40
Another example – fraud detection
• “Small” data– Maintain card profiles– Segment models– Evaluate all transactions
• “Big” Data– Maintain card profiles, full 90 day transaction history– Evaluate all transactions
![Page 41: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/41.jpg)
© 2016 MapR Technologies 41
Another example – indicator-based recommendation
• “Advanced” approach– Use matrix completion techniques (LDA, NNM, ALS)– Tune meta-parameters– Ensembles galore
• “Simple” approach– Count cooccurrences and cross-occurrences– Finding “interesting” pairs– Use standard search engine to recommend
![Page 42: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/42.jpg)
© 2016 MapR Technologies 42
Easy != Stupid• You still have to do things reasonably well
– Techniques that are not well founded are still problems
• Heuristic frequency ratios still fail – Coincidences still dominate the data– Accidental 100% correlations abound
• Related techniques still broken for coincidence– Pearson’s χ2
– Simple correlations
![Page 43: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/43.jpg)
© 2016 MapR Technologies 43
Scale does not cure wrong
It just makes easy more common
![Page 44: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/44.jpg)
© 2016 MapR Technologies 44
A core technique• Many of these easy problems reduce to finding interesting
coincidences
• This can be summarized as a 2 x 2 table
• Actually, many of these tables
A OtherB k11 k12
Other
k21 k22
![Page 45: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/45.jpg)
© 2016 MapR Technologies 45
How do you do that?• This is well handled using G2-test
– See wikipedia– See http://bit.ly/surprise-and-coincidence
• Original application in linguistics now cited > 2000 times
• Available in ElasticSearch, in Solr, in Mahout• Available in R, C, Java, Python
![Page 46: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/46.jpg)
© 2016 MapR Technologies 46
Which one is the anomalous co-occurrence?
A not AB 13 1000
not B 1000 100,000
A not AB 1 0
not B 0 10,000
A not AB 10 0
not B 0 100,000
A not AB 1 0
not B 0 2
![Page 47: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/47.jpg)
© 2016 MapR Technologies 47
Which one is the anomalous co-occurrence?
A not AB 13 1000
not B 1000 100,000
A not AB 1 0
not B 0 10,000
A not AB 10 0
not B 0 100,000
A not AB 1 0
not B 0 20.90 1.95
4.52 14.3
Dunning Ted, Accurate Methods for the Statistics of Surprise and Coincidence, Computational Linguistics vol 19 no. 1 (1993)
![Page 48: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/48.jpg)
© 2016 MapR Technologies 48
So we can find interesting coincidences.
That gets us exactly what?
![Page 49: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/49.jpg)
© 2016 MapR Technologies 49
Operation Ababil – Brobots on Parade• Dork attack to find unpatched default Joomla sites
– Especially web servers with high bandwidth connections– Basically just Google searches for default strings– Joomla compromised into attack Brobot
• C&C network checks in occasionally– Note C&C is incoming request and looks like normal web requests
• Later, on command, multiple Brobots direct 50-75 Gb/s of attack– Attacks come from white-listed sites
![Page 50: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/50.jpg)
© 2016 MapR Technologies 50
Attack Sequence
![Page 51: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/51.jpg)
© 2016 MapR Technologies 51
Attack Sequence
![Page 52: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/52.jpg)
© 2016 MapR Technologies 52
Attack Sequence
![Page 53: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/53.jpg)
© 2016 MapR Technologies 53
Attack Sequence
![Page 54: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/54.jpg)
© 2016 MapR Technologies 54
Outline of an Advanced Persistent Threat• Advanced
– Common use of zero-day for preliminary attacks– Often attributed to state-level actors– Modern privateers blur the line
• Persistent– Result of first attack is heavily muffled, no immediate exploit– Remote access toolset installed (RAT)
• Threat– On command, data is exfiltrated covertly or en masse– Or the compromised host is used for other nefarious purpose
![Page 55: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/55.jpg)
© 2016 MapR Technologies 55
APT in Summary• Attack, penetrate, pivot, exfiltrate or exploit• If you are a high-value target, attack is likely and stealthy
– High-value = telecom, banks, utilities, retail targets, web100– … and all their vendors– Conventional multi-factor auth is easily breached
• Penetration and pivot are critical counter-measure opportunities– In 2010, RAT would contact command and control (C&C)– In 2016, C&C looks like normal traffic
• Once exfiltration or exploit starts, you may no longer have a business
![Page 56: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/56.jpg)
© 2016 MapR Technologies 56
Example 1 - Ababil
Defense has to happen here
![Page 57: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/57.jpg)
© 2016 MapR Technologies 57
Spot the Important Difference?
Attacker request Real request
![Page 58: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/58.jpg)
© 2016 MapR Technologies 58
Spot the Important Difference?
Attacker request Real request
![Page 59: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/59.jpg)
© 2016 MapR Technologies 59
This could only be found at scale
![Page 60: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/60.jpg)
© 2016 MapR Technologies 60
This could only be found at scale
But at scale, it is stupidly simple to find
![Page 61: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/61.jpg)
© 2016 MapR Technologies 61
Overall Outline Again
Tradecraft error!
![Page 62: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/62.jpg)
© 2016 MapR Technologies 62
Large corpus analysis of source IP’s wins big
![Page 63: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/63.jpg)
© 2016 MapR Technologies 63
![Page 64: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/64.jpg)
© 2016 MapR Technologies 64
Example 2 - Common Point of Compromise• Scenario:
– Merchant 0 is compromised, leaks account data during compromise– Fraud committed elsewhere during exploit– High background level of fraud– Limited detection rate for exploits
• Goal:– Find merchant 0
• Meta-goal:– Screen algorithms for this task without leaking sensitive data
![Page 65: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/65.jpg)
© 2016 MapR Technologies 65
Example 2 - Common Point of Compromise
Card data is stolen from Merchant 0
That data is used in frauds at other merchants
![Page 66: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/66.jpg)
© 2016 MapR Technologies 66
Simulation Setup
![Page 67: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/67.jpg)
© 2016 MapR Technologies 67
Detection Strategy• Select histories that precede non-fraud• And histories that precede fraud detection
• Analyze 2x2 cooccurrence of merchant n versus fraud detection
![Page 68: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/68.jpg)
© 2016 MapR Technologies 68
![Page 69: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/69.jpg)
© 2016 MapR Technologies 69
What about the real world?
![Page 70: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/70.jpg)
© 2016 MapR Technologies 70
Really truly bad guys
![Page 71: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/71.jpg)
© 2016 MapR Technologies 71
Historical cooccurrence gives high S/N
![Page 72: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/72.jpg)
© 2016 MapR Technologies 72
Historical cooccurrence gives high S/N
(we win)
![Page 73: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/73.jpg)
© 2016 MapR Technologies 73
Cooccurrence Analysis
![Page 74: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/74.jpg)
© 2016 MapR Technologies 74
Real-life example• Query: “Paco de Lucia”• Conventional meta-data search results:
– “hombres de paco” times 400– not much else
• Recommendation based search:– Flamenco guitar and dancers– Spanish and classical guitar– Van Halen doing a classical/flamenco riff
![Page 75: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/75.jpg)
© 2016 MapR Technologies 75
Real-life example
![Page 76: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/76.jpg)
© 2016 MapR Technologies 76
So …• There are suddenly lots of these problems• Simple techniques have surprising power at scale
– Cooccurrence via G2 / LLR– Distributional anomaly detection via t-digest
• These simple techniques are largely unsuitable for academic research
• But they are highly applicable in resource constrained industrial settings
![Page 77: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/77.jpg)
© 2016 MapR Technologies 77
Summary• We live in a golden age of newly achieved scale
• That scale has lowered the tree– Hard problems are much easier– Lots of low-hanging fruit all around us
• Cheap learning has huge value
• Code available at http://github.com/tdunning
![Page 78: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/78.jpg)
© 2016 MapR Technologies 78
Me, Us• Ted Dunning, MapR Chief Application Architect, Apache Member
– Committer PMC member Zookeeper, Drill, others– Mentor for Flink, Beam (nee Dataflow), Drill, Storm, Zeppelin– VP Incubator– Bought the beer at the first HUG
• MapR– Produces a converged platform for big and fast data– Includes data platform (files, streams, tables) + open source– Adds major technology for performance, HA, industry standard API’s
• Contact@ted_dunning, [email protected], [email protected]
![Page 79: Deep Learning vs. Cheap Learning](https://reader038.vdocument.in/reader038/viewer/2022102811/58778b981a28abc85f8b7505/html5/thumbnails/79.jpg)
© 2016 MapR Technologies 79
Q & A