Cognitive Computing with Big Data: High Tech and Low Tech Approaches
Post on 15-Jan-2015
© 2014 MapR Technologies
Cognitive Computing on Hadoop
Low Tech and High Tech Approaches
Ted Dunning
Who I am
Ted Dunning, Chief Applications Architect, MapR Technologies
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
Apache Mahout https://mahout.apache.org/
Twitter @ApacheMahout
The outline
• The first open source, big data project
• Another big data project
• Conclusions
First:
An apology for going off-script
Now, the story
In 1866, the top finishers in the tea race reached London in 99 days, within 2 hours of each other
But in 1851, the record had been set at 89 days by the Flying Cloud
The difference was due (in part) to big data
These charts were free …
If you donated your data
But how does this apply today?
Key Points of Maury’s Work
• Give to get
  – Give the Abstract Log to captains, get data
• Data consortium wins
  – Merging data gives pictures nobody else can see
• Give back
  – Them that gives, also gets
• But this is just what every data driven web site does!
  – Just 150 years before everybody else
The Real News in Behavioral Analysis
• Everybody knows that:
• You need ensembles of many models to do recommendations
• You need to use factorization models
• You predict what you observe
• (You should predict ratings)
But …
none of this is really true
In fact,
• Fancy models are rarely useful expenditures of time
• Factorization can be good, but not much better (if at all)
• Ratings are disastrously bad data
• Cross-recommendation and multi-modal recommendations are much more interesting
  – Multiple kinds of input are far better than multiple models
• The UI has a far larger impact than the models
• The best algorithms combine simplicity with accuracy
  – So simple you can embed them in a search engine
Here’s how
Co-occurrence Analysis
How Often Do Items Co-occur?
Which Co-occurrences are Interesting?
Each row of indicators becomes a field in a search engine document
Recommendations
• Alice got an apple and a puppy
• Charles got a bicycle
• Bob got an apple
What else would Bob like? A puppy!
You get the idea of how recommenders can work…
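The Alice, Bob, and Charles example can be sketched in a few lines of Python. This is a minimal illustration of raw co-occurrence counting, not code from the talk; the `histories` dictionary and the `recommend` function are names assumed here for illustration.

```python
from collections import Counter

# Toy histories taken from the slides: who got what.
histories = {
    "Alice": {"apple", "puppy"},
    "Charles": {"bicycle"},
    "Bob": {"apple"},
}

def recommend(user, histories):
    """Rank items the user lacks by how often they co-occur with the user's items."""
    own = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        overlap = len(own & items)       # items shared with the other user
        if overlap == 0:
            continue                     # no shared behavior, no evidence
        for item in items - own:
            scores[item] += overlap      # count co-occurrences with owned items
    return [item for item, _ in scores.most_common()]

print(recommend("Bob", histories))  # ['puppy']
```

Bob shares an apple with Alice, so Alice's other item (the puppy) is suggested. Charles shares nothing with anyone, so this sketch has no recommendation for him.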
By the way, like me, Bob also wants a pony…
Recommendations
[Figure: users Alice, Bob, Charles, and Amelia]
What if everybody gets a pony?
What else would you recommend for new user Amelia?
If everybody gets a pony, it’s not a very good indicator of what else to predict...
Problems with Raw Co-occurrence
• Very popular items co-occur with everything, which is why it’s not very helpful to know that everybody wants a pony…
  – Examples: Welcome document; Elevator music
• Very widespread occurrence is not interesting for generating recommendation indicators
  – Unless you want to offer an item that is constantly desired, such as razor blades (or ponies)
• What we want is anomalous co-occurrence
  – This is the source of interesting indicators of preference on which to base recommendation
Overview: Get Useful Indicators from Behaviors
1. Use log files to build history matrix of users x items
  – Remember: this history of interactions will be sparse compared to all potential combinations
2. Transform to a co-occurrence matrix of items x items
3. Look for useful indicators by identifying anomalous co-occurrences to make an indicator matrix
  – Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference
  – ItemSimilarityJob in Apache Mahout uses LLR
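Steps 1 and 2 can be sketched in plain Python as follows. The log entries, the variable names, and the `contingency` helper are assumptions made for illustration; this is not Mahout's ItemSimilarityJob. The sketch stops at producing the 2x2 counts per item pair, which are the input that an anomaly test such as LLR would score in step 3.

```python
from itertools import combinations

# Hypothetical interaction log: (user, item) pairs, e.g. parsed from log files.
log = [
    ("alice", "apple"), ("alice", "puppy"), ("alice", "pony"),
    ("bob", "apple"), ("bob", "pony"),
    ("charles", "bicycle"), ("charles", "pony"),
]

# Step 1: sparse history "matrix" of users x items, stored as user -> set of items.
history = {}
for user, item in log:
    history.setdefault(user, set()).add(item)

# Step 2: item x item co-occurrence counts (off-diagonal of A^T A for binary A).
cooccurrence = {}
for items in history.values():
    for a, b in combinations(sorted(items), 2):
        cooccurrence[(a, b)] = cooccurrence.get((a, b), 0) + 1

# Item totals (the diagonal of A^T A) and the number of users.
item_count = {}
for items in history.values():
    for item in items:
        item_count[item] = item_count.get(item, 0) + 1
n_users = len(history)

def contingency(a, b):
    """2x2 counts for items a and b: (both, a only, b only, neither)."""
    key = (a, b) if a < b else (b, a)
    k11 = cooccurrence.get(key, 0)
    k12 = item_count[a] - k11
    k21 = item_count[b] - k11
    k22 = n_users - k11 - k12 - k21
    return k11, k12, k21, k22

print(cooccurrence[("apple", "pony")])   # 2
print(contingency("apple", "puppy"))     # (1, 1, 0, 1)
```

Storing the history as sets per user keeps it sparse: only observed interactions take memory, never the full users x items grid.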
Which one is the anomalous co-occurrence?
          A        not A
B         13       1000
not B     1000     100,000
Score: 0.90
          A        not A
B         1        0
not B     0        10,000
Score: 4.52
          A        not A
B         10       0
not B     0        100,000
Score: 14.3
          A        not A
B         1        0
not B     0        2
Score: 1.95
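The four scores on this slide (0.90, 1.95, 4.52, 14.3) are consistent with the square root of the log-likelihood ratio computed on each table. The sketch below is one way to check that, using the standard G² form of the statistic; the function names `llr` and `root_llr` are chosen here for illustration and are not taken from the talk or from Mahout.

```python
import math

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) statistic for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # a zero cell contributes nothing (k * ln k -> 0)
            total += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * total

def root_llr(k11, k12, k21, k22):
    """Square root of LLR; clamped at zero to absorb rounding error."""
    return math.sqrt(max(llr(k11, k12, k21, k22), 0.0))

# The four tables from the slide, as (A and B, B only, A only, neither).
tables = [
    (13, 1000, 1000, 100_000),
    (1, 0, 0, 10_000),
    (10, 0, 0, 100_000),
    (1, 0, 0, 2),
]
for t in tables:
    print(t, round(root_llr(*t), 2))
```

The first table has the largest raw co-occurrence count (13) but the lowest score, because 13 is close to what chance alone would produce given how common A and B are. The third table, with 10 co-occurrences and no occurrences apart, is the anomalous one.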
Collection of Documents: Insert Meta-Data
Search Technology
Item meta-data
Document for “puppy”
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
Ingest easily via NFS
From Indicator Matrix to New Indicator Field
Solr document for “puppy”
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
Note: data for the indicator field is added directly to the meta-data for a document in an Apache Solr or Elasticsearch index. You don’t need to create a separate index for the indicators.
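To make the role of the indicator field concrete, here is a small in-memory stand-in for the search step. It does not use the Solr or Elasticsearch API; the `documents` list, the `recommend` function, the assumption that t1 is the apple, and the extra document t7 are all hypothetical. The idea it illustrates is that a user's recent history becomes the query, and documents whose indicator field matches that history come back as recommendations.

```python
# Hypothetical in-memory stand-in for a search index; not the Solr or Elasticsearch API.
documents = [
    {"id": "t1", "title": "apple", "indicators": []},        # assumed: t1 is the apple
    {"id": "t4", "title": "puppy", "indicators": ["t1"]},    # from the slide
    {"id": "t7", "title": "bicycle", "indicators": []},      # made-up extra item
]

def recommend(history_ids, documents):
    """Rank documents by how many of their indicator ids appear in the user's history."""
    seen = set(history_ids)
    hits = []
    for doc in documents:
        if doc["id"] in seen:
            continue  # do not recommend what the user already has
        score = len(seen.intersection(doc["indicators"]))
        if score > 0:
            hits.append((score, doc["id"], doc["title"]))
    hits.sort(key=lambda h: (-h[0], h[1]))  # best score first, id as tie-breaker
    return [title for _, _, title in hits]

print(recommend(["t1"], documents))  # ['puppy']
```

A real search engine would replace the counting with its own relevance scoring over the indicator field, which is what makes the deployment so simple.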
Going Further: Multi-Modal Recommendation
For example
• Users enter queries (A)
  – (actor = user, item = query)
• Users view videos (B)
  – (actor = user, item = video)
• AᵀA gives query recommendation
  – “did you mean to ask for”
• BᵀB gives video recommendation
  – “you might like these videos”
The punch-line
• BᵀA recommends videos in response to a query
  – (isn’t that a search engine?)
  – (not quite, it doesn’t look at content or meta-data)
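The three products can be sketched with tiny binary matrices in plain Python. The query names, video names, and matrix contents below are invented for illustration, and the ranking uses raw counts only; in practice the counts would be filtered for anomalous co-occurrence before use.

```python
# Rows are users; A columns are queries, B columns are videos (all hypothetical).
queries = ["flamenco", "guitar lessons"]
videos = ["paco_live", "van_halen_riff", "cat_video"]

A = [  # users x queries: 1 if the user entered the query
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 0],
]
B = [  # users x videos: 1 if the user viewed the video
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]

ata = matmul(transpose(A), A)   # queries x queries: query recommendation
btb = matmul(transpose(B), B)   # videos x videos: video recommendation
bta = matmul(transpose(B), A)   # videos x queries: cross-occurrence

def videos_for_query(query):
    """Videos ranked by raw cross-occurrence with the query (no LLR filtering here)."""
    j = queries.index(query)
    ranked = sorted(((bta[i][j], videos[i]) for i in range(len(videos))), reverse=True)
    return [name for count, name in ranked if count > 0]

print(videos_for_query("flamenco"))  # ['paco_live', 'van_halen_riff']
```

Note that the videos are matched to the query purely through shared user behavior: nothing in the sketch looks at video titles or descriptions.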
Real-life example
• Query: “Paco de Lucia”
• Conventional meta-data search results:
  – “hombres de paco” times 400
  – not much else
• Recommendation based search:
  – Flamenco guitar and dancers
  – Spanish and classical guitar
  – Van Halen doing a classical/flamenco riff
Hypothetical Example
• Want a navigational ontology?
• Just put labels on a web page with traffic
  – This gives A = users x label clicks
• Remember viewing history
  – This gives B = users x items
• Cross recommend
  – B’A = label to item mapping
• After several users click, results are whatever users think they should be
More Details Available
available for free at
http://www.mapr.com/practical-machine-learning
Q & A
@mapr maprtech
jbates@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies