![Page 1: Solr & R to Deploy Custom Search Interface: Presented by Patrick Beaucamp, Bpm-Conseil](https://reader031.vdocument.in/reader031/viewer/2022021919/586fde401a28ab18428b6b27/html5/thumbnails/1.jpg)
October 13-16, 2016 • Austin, TX
Solr & R to deploy Custom Search Interfaces
Patrick Beaucamp
Chairman, Bpm-Conseil, France • patrick.beaucamp@bpm-conseil.com
Presentation Agenda
Solr & R Integration inside AklaBox
AklaBox Presentation
AklaBox & Solr + R & GoJS & OSM
Demo Platform: AklaBox
Going further: Vanilla Air, Spark & R & Solr
AklaBox Presentation
Certified on Cloudera & Hortonworks
Runs on Hadoop: SolrCloud, HDFS ...
Ready for OpenStack
AklaBox Presentation
User Interface
AklaBox Presentation
Upload your documents
Share your documents
Collaborate on documents
Search on documents
Synchronize your documents
Publish your documents
Document Viewer
AklaBox Presentation
Workflow
Synchro
Mobile
AklaBox Presentation
Standard Search Interface
Solr & R Integration inside AklaBox
• Why do I get this list when I search inside the document repository?
• What determines a document's value when I run a search: the weight of every word?
• If a word appears 100 times in a document, is the document more valuable for my search?
• Maybe the document I'm looking for does not use the exact spelling of the word?
• How do I take multi-language support into account?
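Whether 100 occurrences make a document 100 times more relevant is the question of term-frequency saturation. A minimal Python sketch of the classic BM25 term-frequency factor (BM25 is Solr's default similarity since version 6.0) shows that the weight grows with the count but levels off; k1 = 1.2 and b = 0.75 are the usual textbook defaults, and Lucene's actual implementation differs in some details across versions:

```python
def bm25_tf_weight(tf: int, doc_len: int, avg_doc_len: float,
                   k1: float = 1.2, b: float = 0.75) -> float:
    """Term-frequency factor of classic BM25; bounded above by k1 + 1."""
    # Length normalisation: documents longer than average are penalised.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + norm)


for tf in (1, 2, 10, 100):
    # Average-length document: 1 occurrence -> 1.0, 100 occurrences -> about 2.17.
    print(tf, round(bm25_tf_weight(tf, doc_len=500, avg_doc_len=500), 3))
```

With these defaults, 100 occurrences weigh only about 2.2 times a single occurrence, which is why repetition alone does not dominate the ranking.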
Solr & R Integration inside AklaBox
• We need to review our module and rethink how we can help users deploy their own search policy
• R was a natural choice to create a new search algorithm:
  • We use R for our data mining development
  • R contains packages to inspect documents
  • R offers almost unlimited ways to analyze and classify documents
  • We read a lot about R & search engines …
Solr & R Integration inside AklaBox
• When do we analyze documents with R?
  • Before Solr indexation
  • After Solr indexation
• Our choice: before Solr indexation
  • We add metadata to every document, such as top words, document class, …
  • We create classes for documents, and relations between classes
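A minimal Python sketch of the "top words" metadata step, assuming plain TF-IDF scoring over the uploaded documents. The talk does this in R with tm; the function names and the letters-only tokenizer here are illustrative assumptions, with no stemming or stop-word removal:

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters only (no stemming, no stop words)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def top_words(docs: dict[str, str], n: int = 3) -> dict[str, list[str]]:
    """Return the n highest TF-IDF words for each document id."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    # Document frequency: number of documents containing each word.
    df = Counter(w for words in tokenized.values() for w in set(words))
    n_docs = len(docs)
    result = {}
    for doc_id, words in tokenized.items():
        tf = Counter(words)
        total = len(words) or 1
        # Relative term frequency times log inverse document frequency.
        scores = {w: (c / total) * math.log(n_docs / df[w]) for w, c in tf.items()}
        # Highest score first; ties broken alphabetically for stable output.
        result[doc_id] = sorted(scores, key=lambda w: (-scores[w], w))[:n]
    return result


docs = {
    "a": "solr index solr search",
    "b": "r package cluster search",
    "c": "solr cluster",
}
print(top_words(docs, n=2))
# {'a': ['index', 'solr'], 'b': ['package', 'r'], 'c': ['cluster', 'solr']}
```

The cut-off `n` plays the role of the "few words per document" parameter: only the most distinctive words are kept as metadata.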
Solr & R Integration inside AklaBox
Keywords are added to the Solr index
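A sketch of how such keywords could reach Solr through the JSON update format: a JSON array of documents POSTed to `/solr/<collection>/update` with `Content-Type: application/json`. The field names `keywords` and `doc_class` are hypothetical and must be declared in your schema (for example as multi-valued string fields):

```python
import json


def solr_update_payload(doc_id: str, title: str, keywords: list[str],
                        doc_class: str) -> str:
    """Build a JSON body (a list of documents) for Solr's /update handler."""
    doc = {
        "id": doc_id,
        "title": title,
        # Hypothetical fields: they must exist in the Solr schema.
        "keywords": keywords,    # multi-valued: top words computed before indexing
        "doc_class": doc_class,  # class assigned by the R analysis
    }
    return json.dumps([doc])


payload = solr_update_payload("doc-42", "Clinical study", ["solr", "index"], "pharma")
print(payload)
```

Because the metadata is computed before indexing, Solr only sees ordinary fields, which is what keeps the approach free of any impact on Solr itself.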
Solr & R Integration inside AklaBox
R packages:
• tm: text mining functions (stemming, word frequency, word manipulation, etc.)
  • TF-IDF function (term frequency-inverse document frequency)
• Matrix: for complex matrix manipulation
• cluster (fanny) & stats (kmeans): functions to compute classes over various groups
• libsvm: svm, predict & tune functions, for automatic word classification
• sampling: to create & manipulate different data sets
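The talk builds document classes with R's fanny and kmeans. As an illustration only, here is plain (hard) k-means in Python over small document vectors such as TF-IDF weights. Unlike fanny, which returns fuzzy memberships, it puts each document in exactly one class, and the deterministic first-k initialisation is a simplifying assumption (R's kmeans starts from random centres):

```python
import math


def kmeans(points: list[list[float]], k: int, iterations: int = 20) -> list[int]:
    """Plain k-means; returns a cluster index for every point."""
    # Deterministic start: the first k points become the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Two "topics": documents weighted towards word 0 vs. towards word 1.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(kmeans(points, k=2))  # [0, 0, 1, 1]
```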
Solr & R Integration inside AklaBox
Pros (+)
• The R algorithm runs when the document is uploaded
• We keep only a small number of words per document (configurable parameter)
• We create classes for documents
• We can manage other concerns, such as internationalisation
• The R package can be switched (other algorithm, new deployment): easy & flexible to deploy and maintain
• No impact on Solr
Cons (-)
• The Solr index is a gold mine … and we don't run any analysis on it
AklaBox & Solr + R & GoJS & OSM
Mind map with word association
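One plausible way to feed a word-association mind map (an assumption; the talk does not detail it) is to link keywords that co-occur in the same documents. This Python sketch emits node and link arrays shaped like GoJS's `GraphLinksModel` data (`key` on nodes, `from`/`to` on links); `min_count` and the `weight` property are illustrative choices:

```python
import json
from collections import Counter
from itertools import combinations


def word_association_model(docs_keywords: list[list[str]], min_count: int = 2) -> dict:
    """Link keywords that appear together in at least min_count documents."""
    pair_counts = Counter()
    for keywords in docs_keywords:
        # Count each unordered pair once per document.
        for a, b in combinations(sorted(set(keywords)), 2):
            pair_counts[(a, b)] += 1
    links = [{"from": a, "to": b, "weight": count}
             for (a, b), count in sorted(pair_counts.items()) if count >= min_count]
    # Only words that take part in a kept link become nodes.
    words = sorted({w for link in links for w in (link["from"], link["to"])})
    return {"nodeDataArray": [{"key": w} for w in words], "linkDataArray": links}


model = word_association_model([["solr", "search", "index"],
                                ["solr", "search", "r"],
                                ["r", "cluster"]])
print(json.dumps(model))
```

In the browser, the two arrays could then be handed to `new go.GraphLinksModel(nodeDataArray, linkDataArray)` and laid out by a GoJS diagram.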
AklaBox & Solr + R & GoJS & OSM
Map Visualization
OSM Visualization
Demonstration
• Other business cases
  • Document management: pre-classification of documents (pharmaceutical industry)
  • Search engine: analysis of websites during the crawling process
• Opens the door to new developments
  • Phonetic search (to solve the word spelling problem)
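Phonetic search matches words by how they sound rather than how they are spelled. A compact Python sketch of American Soundex, one classic phonetic code, shows the idea; it is tuned for English names, and Solr itself also ships phonetic token filters (e.g. `solr.PhoneticFilterFactory`) that can index such codes alongside the original words:

```python
# Soundex digit for each coded consonant; vowels, y, h and w are not coded.
CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # h and w are transparent: equal codes on either side collapse.
        # Vowels reset prev, so equal codes separated by a vowel are both kept.
        if ch not in "hw":
            prev = code
    return result.ljust(4, "0")


for name in ("Robert", "Rupert", "Rubin", "Ashcraft", "Tymczak"):
    print(name, soundex(name))  # Robert and Rupert both map to R163
```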
Vanilla Air, Spark, Spark SQL for Solr
New technologies are emerging… well, they are already here!
Vanilla Air, Spark, Spark SQL for Solr
• Vanilla Air
  – Can process R packages
  – Can scale with a growing number of documents
www.vanillasmartdata.com
Vanilla Air, Spark, Spark SQL for Solr
Easy switch in the architecture -> scalability
Vanilla Air, Spark, Spark & R & Solr
Spark version 1.5 (September 2015): support for YARN cluster mode in R
Vanilla Air, Spark, Spark & R & Solr
We now have Spark & Solr tools: SolrRDD, tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ
https://github.com/LucidWorks/spark-solr
Vanilla Air, Spark, Spark & R & Solr
Admin side: running complex R programs on the Solr index, using Vanilla Air
Lucky One!