Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks
![Page 1: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/1.jpg)
October 13-16, 2016 • Austin, TX
![Page 2: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/2.jpg)
Event Processing and Data Analytics with Lucidworks Fusion
Kiran Chitturi
Software Engineer, Lucidworks
![Page 3: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/3.jpg)
Problem Statement
• How to capture/record user events?
• How to use events/signals for recommendations?
• How to produce reports/analytics from user events?
• What type of recommendations can be generated for different user types?
![Page 4: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/4.jpg)
Event collection - Snowplow JS tracker
• Library to collect user events from the client-side tier of websites and apps (https://github.com/snowplow/snowplow-javascript-tracker)
• Open source alternative to enterprise analytics products
• Sends events using a tracking pixel
• The Signals API acts as a collector for Snowplow events
• Tracks page views, page pings, links and any custom configured events
• https://github.com/snowplow/snowplow/wiki/javascript-tracker
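A tracking pixel is an ordinary HTTP GET whose query string carries the event. The sketch below builds such a URL for a page view. The field names (`e`, `url`, `page`, `p`, `uid`) follow the Snowplow tracker protocol as commonly documented; the collector address is a hypothetical placeholder, and the real JavaScript tracker sends many more fields than shown here.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical collector endpoint; substitute the address of your own collector.
COLLECTOR = "https://collector.example.com/i"


def page_view_pixel_url(page_url, page_title, user_id=None):
    """Build the GET URL a tracking pixel would request for a page view."""
    fields = {
        "e": "pv",           # event type: page view
        "url": page_url,     # URL of the page being viewed
        "page": page_title,  # title of the page
        "p": "web",          # platform
    }
    if user_id is not None:
        fields["uid"] = user_id  # application-level user id
    return COLLECTOR + "?" + urlencode(fields)


pixel = page_view_pixel_url(
    "http://www.ecommerce.com/abws-mcl008-080201",
    "Dark Gray Wool Suit",
    "12891291",
)
# Decode the query string again to check that the fields survive URL encoding.
query = parse_qs(urlparse(pixel).query)
print(pixel)
```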
![Page 5: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/5.jpg)
![Page 6: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/6.jpg)
Event collection - JSON payloads
• Examples:
  • page-view, query, search-click, add-to-cart, rating
• Signals Schema:
  • required fields: type
  • additional properties can be specified in the ‘params’ map
  • special treatment for fields ‘docId’, ‘userId’, ‘query’, ‘filterQueries’, ‘collection’, ‘weight’, ‘count’
• Processing logic in the ‘_signals_ingest’ pipeline
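As a minimal sketch of the schema described here, the helper below assembles a signal with the one required field (`type`), an optional `timestamp`, and everything else under `params`. The function name `make_signal` is made up for illustration; how the payload is sent to the Signals API is not shown because the endpoint is not given in these slides.

```python
import json
from datetime import datetime, timezone


def make_signal(signal_type, params=None, timestamp=None):
    """Build one signal dict; 'type' is the only required field."""
    if not signal_type:
        raise ValueError("a signal must have a non-empty 'type'")
    signal = {"type": signal_type, "params": dict(params or {})}
    if timestamp is not None:
        ts = timestamp.astimezone(timezone.utc)
        # ISO-8601 in UTC with millisecond precision, e.g. 2015-10-13T18:33:04.012Z
        signal["timestamp"] = (
            ts.strftime("%Y-%m-%dT%H:%M:%S") + f".{ts.microsecond // 1000:03d}Z"
        )
    return signal


click = make_signal(
    "click",
    {"query": "Madden 12", "docId": "2375201", "userId": "abc121"},
    datetime(2015, 10, 13, 18, 33, 4, 12000, tzinfo=timezone.utc),
)
# Several signals can be serialized together as a JSON array.
payload = json.dumps([click])
print(payload)
```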
![Page 7: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/7.jpg)
Signals - data flow
[Diagram: JSON payloads and Snowplow payloads are sent to the Signals Service, which stores them in Solr. Collections shown: test (primary collection), test_signals (raw signals collection), test_signals_aggr (aggregated signals collection).]
![Page 8: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/8.jpg)
Example: page-view signal
Input signal:
{ "timestamp": "2015-09-14T10:12:13.456Z", "type": "pv", "params": { "url": "http://www.ecommerce.com/abws-mcl008-080201" } }
Indexed signal document:
{ "type_s": "pv", "flag_s": "event", "params.url_s": "http://www.ecommerce.com/abws-mcl008-080201", "id": "62a26152-7971-406e-bf06-3df44974c220", "timestamp_tdt": "2015-09-14T10:12:13.45Z", "count_i": 1, "_version_": 1515057367743463400 }
![Page 9: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/9.jpg)
Example: page-view signal
Input signal:
{ "timestamp": "2015-09-14T10:12:13.456Z", "type": "pv", "params": { "page": "Dark Gray Wool Suit", "url": "http://www.ecommerce.com/abws-mcl008-080201", "userId": "12891291", "useragent_type_name_s": "Browser", "ipAddr": "64.134.151.1", "tz": "America/NewYork" } }
Indexed signal document:
{ "type_s": "pv", "params.tz_s": "America/NewYork", "user_id_s": "12891291", "params.page_s": "Dark Gray Wool Suit", "tz_timestamp_txt": [ "Mon 2015-09-14 10:12:13.456 UTC" ], "flag_s": "event", "params.ipAddr_s": "64.134.151.1", "params.url_s": "http://www.ecommerce.com/abws-mcl008-080201", "id": "4b993f85-67d3-4523-b2b3-cf4e3ff2f202", "timestamp_tdt": "2015-09-14T10:12:13.45Z", "count_i": 1, "_version_": 1515057643959353300 }
![Page 10: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/10.jpg)
Example: click signal
Input signal:
{ "type": "click", "params": { "query": "Madden 12", "docId": "2375201", "userId": "abc121", "position" : "4", "filterQueries": [ "cat00000", "abcat0700000", "abcat0703000", "abcat0703002", "abcat0703008" ] } }
Indexed signal document:
{ "filters_orig_ss":[ "abcat0700000", "abcat0703000", "abcat0703002", "abcat0703008", "cat00000" ], "user_id_s":"abc121", "query_s":"madden 12", "type_s":"click", "params.position_s" : "4", "query_t": "madden 12", "doc_id_s":"2375201", "tz_timestamp_txt":["Tue 2015-10-13 18:33:04.012 UTC"], "filters_s":"abcat0700000 $ abcat0703000 $ abcat0703002 $ abcat0703008 $ cat00000", "flag_s":"event", "query_orig_s":"Madden 12", "id":"69c609f6-a2c1-4f89-990e-88a63e68063d", "timestamp_tdt":"2015-10-13T18:33:04.01Z", "count_i":1, "_version_":1514941903557099520 }
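The input/indexed pairs above suggest a field mapping: special fields get dedicated names, the query is lowercased with the original kept alongside, filter queries are sorted and joined, and remaining params are flattened to `params.<name>_s`. The sketch below is inferred only from these examples and is not the actual `_signals_ingest` pipeline; it leaves out the generated `id`, timestamp fields and `_version_`.

```python
def index_signal(signal):
    """Approximate the indexed document for one input signal (inferred mapping)."""
    params = dict(signal.get("params", {}))
    doc = {"type_s": signal["type"], "flag_s": "event", "count_i": 1}

    # Fields with special treatment get their own names.
    if "userId" in params:
        doc["user_id_s"] = params.pop("userId")
    if "docId" in params:
        doc["doc_id_s"] = params.pop("docId")
    if "query" in params:
        original = params.pop("query")
        doc["query_orig_s"] = original        # query as typed
        doc["query_s"] = original.lower()     # normalized for grouping
        doc["query_t"] = original.lower()     # same text in a tokenized field
    if "filterQueries" in params:
        filters = sorted(params.pop("filterQueries"))
        doc["filters_orig_ss"] = filters
        doc["filters_s"] = " $ ".join(filters)

    # Everything else is flattened into dynamic string fields.
    for key, value in params.items():
        doc[f"params.{key}_s"] = str(value)
    return doc


click = {
    "type": "click",
    "params": {
        "query": "Madden 12",
        "docId": "2375201",
        "userId": "abc121",
        "position": "4",
        "filterQueries": [
            "cat00000", "abcat0700000", "abcat0703000",
            "abcat0703002", "abcat0703008",
        ],
    },
}
click_doc = index_signal(click)
print(click_doc)
```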
![Page 11: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/11.jpg)
Aggregations
• Batch processing using Apache Spark
• spark-solr library (https://github.com/LucidWorks/spark-solr)
• Types
  • Simple
  • Click
  • EventMiner
![Page 12: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/12.jpg)
Aggregations - data flow
[Diagram: an aggregation job runs through the Aggregator and the Spark Agent on Spark (Spark driver, cluster manager, workers). The job fetches raw signals for processing from the raw signals collection (test_signals) and stores aggregated results in the aggregated signals collection (test_signals_aggr). The primary collection is test.]
![Page 13: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/13.jpg)
Aggregation examples
• Simple aggregations
  • Top queries
  • Top clicked documents
  • Most popular categories
  • …
• Complex aggregations
  • Click stream aggregations with decaying weights
  • Generate a co-occurrence matrix for (user, docId, query) tuples
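The simple aggregations listed here are essentially group-and-count operations over indexed signal documents. As a rough illustration in plain Python (the deck runs these as Spark batch jobs; this sketch does not), `top_values` is a hypothetical helper that counts one field across a handful of sample click documents.

```python
from collections import Counter

# Sample indexed click signals (field names as in the indexed documents).
raw_signals = [
    {"type_s": "click", "query_s": "sharp", "doc_id_s": "2009324"},
    {"type_s": "click", "query_s": "sharp", "doc_id_s": "2009324"},
    {"type_s": "click", "query_s": "sharp tv", "doc_id_s": "1517163"},
    {"type_s": "click", "query_s": "rca", "doc_id_s": "2877125"},
]


def top_values(signals, field, k=10, signal_type=None):
    """Return the k most frequent values of `field`, optionally for one signal type."""
    counts = Counter(
        s[field]
        for s in signals
        if field in s and (signal_type is None or s.get("type_s") == signal_type)
    )
    return counts.most_common(k)


top_queries = top_values(raw_signals, "query_s", k=1)
top_clicked = top_values(raw_signals, "doc_id_s", k=1, signal_type="click")
print(top_queries, top_clicked)
```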
![Page 14: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/14.jpg)
Example: simple aggregation
Input signals:
[ { "type": "rating", "params": { "rating": "5.0", "source": "web" } }, { "type": "rating", "params": { "rating": "1.0", "source": "web" } }, { "type": "rating", "params": { "rating": "2.0", "source": "web" } }, { "type": "rating", "params": { "rating": "2.0", "source": "web" } }, { "type": "rating", "params": { "rating": "1.0", "source": "web" } } ]
[Diagram: the signals are sent through the API to the Signals Service, which stores them in Solr. Collections shown: test (primary collection), test_signals (raw signals collection), test_signals_aggr (aggregated signals collection).]
![Page 15: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/15.jpg)
Example: simple aggregation (continued)
[Diagram: the aggregation job is submitted to the Aggregation Service, manually or via the scheduler, and runs on Spark. It fetches raw signals for processing from test_signals and stores aggregated results in test_signals_aggr. Both collections live in Solr next to the primary collection, test.]
Aggregation definition:
{ "id" : "test_simple_aggr", "signalTypes" : [ "rating" ], "selectQuery" : "*:*", "aggregator" : "simple", "groupingFields" : "params.source_s", "aggregates" : [ { "type" : "stddev", "sourceFields" : [ "params.rating_s" ], "targetField" : "stddev_rating_d" }, { "type": "topk", "sourceFields": ["params.rating_s"], "targetField": "topk_rating_ss" }, { "type": "mean", "sourceFields": ["params.rating_s"], "targetField": "mean_position_d" } ] }
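To make the three aggregates in this definition concrete, the sketch below computes them by hand for the five example rating signals, all of which fall into the single group `web`. It uses Python's standard library rather than Spark and assumes `stddev` means the sample standard deviation (dividing by n - 1); the ordering of tied entries in the top-k list may differ from what the aggregator returns.

```python
import statistics
from collections import Counter

# The "rating" values of the five example signals with source "web".
ratings = ["5.0", "1.0", "2.0", "2.0", "1.0"]
values = [float(r) for r in ratings]

mean_rating = statistics.mean(values)     # arithmetic mean: 11 / 5
stddev_rating = statistics.stdev(values)  # sample standard deviation: sqrt(10.8 / 4)
topk_rating = Counter(ratings).most_common(3)  # (value, count) pairs by frequency

print(mean_rating, stddev_rating, topk_rating)
```

Under that assumption the mean is 2.2 and the standard deviation is roughly 1.6432, with "1.0" and "2.0" each occurring twice and "5.0" once.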
![Page 16: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/16.jpg)
Example: simple aggregation (continued)
• Aggregated document:
{ "aggr_job_id_s": "b91ffdebc44d4e128a8431c2f8a3deb7", "aggr_type_s": "simple@doc_id_s-query_s-filters_s", "flag_s": "aggr", "type_s": "rating", "id": "24494dba-93a6-4fc5-bb4d-5b546c3c0c5e", "aggr_id_s": "test_simple_aggr", "timestamp_tdt": "2015-10-15T02:26:17.337Z", "count_i": 5, "grouping_key_s": "web",
"stddev_rating_d": 1.6431676725154982,
"mean_position_d": 2.2,
"values.topk_rating_ss": ["2.0", "1.0", "5.0"], "counts.topk_rating_ss": ["2", "2", "1"], "errors.topk_rating_ss": ["0", "0", "0"] }
![Page 17: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/17.jpg)
Example: Click aggregation
Raw docs:
[ { "timestamp": "2014-09-01T23:44:52.533Z", "params": { "query": "Sharp", "docId": "2009324" }, "type": "click" }, { "timestamp": "2014-09-05T12:25:37.420Z", "params": { "query": "Sharp", "docId": "2009324" }, "type": "click" }, { "timestamp": "2014-08-24T12:56:58.910Z", "params": { "query": "Sharp TV", "docId": "1517163" }, "type": "click" }, { "timestamp": "2015-10-25T07:18:14.722Z", "params": { "query": "rca", "docId": "2877125" }, "type": "click" } ]
Signals indexed and aggregated
Aggregated docs:
{ "doc_id_s": "1517163", "query_s": "sharp tv", "weight_d": 0.000006602878329431405, "count_i": 1 }, { "doc_id_s": "2009324", "query_s": "sharp", "weight_d": 0.000016734602468204685, "count_i": 2 }, { "doc_id_s": "2877125", "query_s": "rca", "weight_d": 0.06324164569377899, "count_i": 1 }
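The aggregated weights above shrink with the age of the clicks, which is what a time-decay function does. A minimal sketch of that idea follows, using exponential decay with a half-life. The decay function, the 30-day half-life and the reference time `now` are all assumptions made for illustration, so the numbers it produces will not match the `weight_d` values shown; only the ordering (recent clicks outweigh old ones) carries over.

```python
from collections import defaultdict
from datetime import datetime, timezone

HALF_LIFE_DAYS = 30.0  # assumed value, not taken from the slides


def parse_ts(ts):
    """Parse timestamps such as 2014-09-01T23:44:52.533Z as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def aggregate_clicks(signals, now, half_life_days=HALF_LIFE_DAYS):
    """Group clicks by (docId, lowercased query); sum decayed weights and counts."""
    totals = defaultdict(lambda: {"weight_d": 0.0, "count_i": 0})
    for s in signals:
        key = (s["params"]["docId"], s["params"]["query"].lower())
        age_days = (now - parse_ts(s["timestamp"])).total_seconds() / 86400.0
        # Each click contributes 1.0 when fresh and halves every half_life_days.
        totals[key]["weight_d"] += 0.5 ** (age_days / half_life_days)
        totals[key]["count_i"] += 1
    return dict(totals)


clicks = [
    {"timestamp": "2014-09-01T23:44:52.533Z", "params": {"query": "Sharp", "docId": "2009324"}, "type": "click"},
    {"timestamp": "2014-09-05T12:25:37.420Z", "params": {"query": "Sharp", "docId": "2009324"}, "type": "click"},
    {"timestamp": "2014-08-24T12:56:58.910Z", "params": {"query": "Sharp TV", "docId": "1517163"}, "type": "click"},
    {"timestamp": "2015-10-25T07:18:14.722Z", "params": {"query": "rca", "docId": "2877125"}, "type": "click"},
]
# Hypothetical reference time, chosen to be later than every click.
now = datetime(2015, 10, 26, tzinfo=timezone.utc)
aggregated = aggregate_clicks(clicks, now)
for key, doc in aggregated.items():
    print(key, doc)
```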
![Page 18: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/18.jpg)
Driving search relevancy
• How to mix signals with search results?
• Recommendation API
• Generic query pipeline configuration using a 3-stage approach
  • Sub-query
  • Rollup-results
  • Advanced-boost
![Page 19: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/19.jpg)
Boosting search results using aggregated documents
[Diagram: the user app sends a search query to the query pipeline, whose stages include the Recommendation Stages, Set Params and Query Solr. The recommendation stages (1) query aggregated documents, (2) process the results and (3) add parameters to the request. The search response is returned to the user app. Collections shown: test (primary collection), test_signals (raw signals collection), test_signals_aggr (aggregated signals collection).]
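The three steps (query aggregated documents, process results, add parameters to the request) can be sketched in a few lines of Python. This is an illustration of the idea, not the Fusion stage implementation: the function names are invented, the aggregated documents are held in a list instead of being fetched from Solr, and the boost is expressed through Solr's standard `bq` (boost query) parameter on an assumed `id` field.

```python
def lookup_boosts(aggregated_docs, query, top_n=10):
    """Steps 1 and 2: find aggregated docs for the query and roll up weights per doc."""
    normalized = query.strip().lower()
    hits = [d for d in aggregated_docs if d["query_s"] == normalized]
    rolled = {}
    for d in hits:
        rolled[d["doc_id_s"]] = rolled.get(d["doc_id_s"], 0.0) + d["weight_d"]
    # Highest weight first, capped at top_n documents.
    return sorted(rolled.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def add_boost_params(request_params, boosts, id_field="id"):
    """Step 3: add a boost query so that previously clicked documents rank higher."""
    params = dict(request_params)
    if boosts:
        params["bq"] = " ".join(
            f'{id_field}:"{doc_id}"^{weight:.8f}' for doc_id, weight in boosts
        )
    return params


aggregated_docs = [
    {"doc_id_s": "1517163", "query_s": "sharp tv", "weight_d": 0.000006602878329431405},
    {"doc_id_s": "2009324", "query_s": "sharp", "weight_d": 0.000016734602468204685},
    {"doc_id_s": "2877125", "query_s": "rca", "weight_d": 0.06324164569377899},
]
boosts = lookup_boosts(aggregated_docs, "Sharp")
solr_params = add_boost_params({"q": "Sharp", "defType": "edismax"}, boosts)
print(solr_params)
```

Raw weights this small would barely move a ranking, so a real pipeline would likely rescale them before boosting; that step is omitted here.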
![Page 20: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/20.jpg)
![Page 21: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/21.jpg)
Event Miner aggregation
• Calculate a co-occurrence matrix for tuples based on sessions
  • Example: (userId, query, docId)
• Construct a DAG from the matrix data
• Recommendations are served from the graph at query time
• Increases diversity in recommendations
• See https://lucidworks.com/blog/2015/08/31/mining-events-recommendations/
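As a toy illustration of the co-occurrence idea, the sketch below counts how often users, queries and documents appear in the same session and then walks the resulting graph to find documents related to a starting node. It deliberately simplifies: the graph is an undirected weighted adjacency map rather than the DAG mentioned above, the `user:`/`query:`/`doc:` prefixes and all function names are invented, and the sessions are made-up sample data.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(sessions):
    """Count how many sessions each unordered pair of items shares."""
    counts = defaultdict(int)
    for items in sessions:
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts


def build_graph(counts):
    """Turn pair counts into a symmetric adjacency map: node -> {neighbour: weight}."""
    graph = defaultdict(dict)
    for (a, b), n in counts.items():
        graph[a][b] = n
        graph[b][a] = n
    return graph


def recommend(graph, node, prefix="doc:", k=3):
    """Return up to k neighbours of `node` with the given prefix, strongest first."""
    neighbours = [(other, n) for other, n in graph[node].items() if other.startswith(prefix)]
    ranked = sorted(neighbours, key=lambda t: (-t[1], t[0]))
    return [other for other, _ in ranked[:k]]


# Each session lists the user, the query issued and the document clicked.
sessions = [
    ["user:u1", "query:sharp", "doc:2009324"],
    ["user:u2", "query:sharp", "doc:2009324"],
    ["user:u2", "query:sharp tv", "doc:1517163"],
]
graph = build_graph(cooccurrence(sessions))
docs_for_query = recommend(graph, "query:sharp")
docs_for_user = recommend(graph, "user:u2")
print(docs_for_query, docs_for_user)
```

Starting from a user node instead of a query node returns documents reached through any of that user's sessions, which is one way a graph can widen the set of recommendations.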
![Page 22: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/22.jpg)
Graph Navigation - Example Query
![Page 23: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/23.jpg)
Graph Navigation - Example Query
![Page 24: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/24.jpg)
Graph Navigation - Example Query
![Page 25: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/25.jpg)
Graph Navigation - Example Query
![Page 26: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/26.jpg)
Graph Navigation - Example Query
![Page 27: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/27.jpg)
Demo
![Page 28: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/28.jpg)
Events & Signals
Using Signals = Modifying Your Behavior in Response to Your Environment
![Page 29: Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks](https://reader031.vdocument.in/reader031/viewer/2022021816/58a59f571a28abaf3e8b66d1/html5/thumbnails/29.jpg)