Searching does not mean finding Stuff - Apache Solr for TYPO3
Post on 22-Oct-2014
![Page 1: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/1.jpg)
http://www.dkd.de
Friday, 10 June 2011
![Page 2: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/2.jpg)
dkd development kommunikation design
![Page 3: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/3.jpg)
Welcome
Olivier Dobberkau
CEO
dkd Internet Service GmbH
Frankfurt am Main, Germany
![Page 4: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/4.jpg)
Agenda
What is search?
Search in TYPO3
Search expectations today
Apache Solr
Why and how?
Watch out!
![Page 5: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/5.jpg)
About me
![Page 6: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/6.jpg)
Olivier Dobberkau
Founder of dkd Internet Service GmbH
aka „the reverend never-end“
Met TYPO3 with Version 3.2 beta 3
Member of T3A BCC
43 years old
Twitter: @T3RevNeverEnd
![Page 7: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/7.jpg)
What is Search?
![Page 8: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/8.jpg)
Definition of Information Retrieval
Information retrieval (IR) is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching relational databases and the World Wide Web.
Wikipedia: http://en.wikipedia.org/wiki/Information_retrieval
![Page 9: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/9.jpg)
Factors in Information Retrieval
Recall
Precision
Fall-out
Scalability
Performance
![Page 10: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/10.jpg)
Factors in Information Retrieval
Recall
Precision
Fall-out
Scalability
Performance
Simplicity
Flexibility
![Page 11: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/11.jpg)
Recall
Percentage of relevant documents that are returned
400 relevant documents in the collection
100 of them returned
25% recall
![Page 12: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/12.jpg)
Precision
Percentage of returned documents that are relevant
500 returned, 100 relevant
20% precision
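
The two ratios from the Recall and Precision slides can be checked with a few lines of Python. This is a minimal sketch, not part of the original talk: the function names are made up for illustration, and it assumes the recall slide means 400 relevant documents of which 100 were returned.

```python
def recall(relevant_returned: int, relevant_total: int) -> float:
    """Share of all relevant documents that the search actually returned."""
    return relevant_returned / relevant_total


def precision(relevant_returned: int, returned_total: int) -> float:
    """Share of the returned documents that are actually relevant."""
    return relevant_returned / returned_total


# Numbers from the slides (interpretation of the recall slide is an assumption).
print(f"recall    = {recall(100, 400):.0%}")     # 25%
print(f"precision = {precision(100, 500):.0%}")  # 20%
```

Returning more documents tends to raise recall and lower precision, which is why reaching 100% on both at the same time is the ideal rather than the norm.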
![Page 13: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/13.jpg)
Best would be:
100% Recall with 100% Precision
![Page 14: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/14.jpg)
Index
The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query.
![Page 15: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/15.jpg)
Index
[Diagram: five documents feeding the index]
Document 1: My cat is cool
Document 2: Baseball is my Sport
Document 3: San Francisco Fort Mason
Document 4: TYPO3 T3CON rocks
Document 5: Extbase is a Ghetto
![Page 16: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/16.jpg)
Posting File
Word Document
My 1,2
cat 1
is 1,2,5
cool 1
Baseball 2
Sport 2
San 3
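
A posting file like the one above can be built with a small inverted index. The following is a rough sketch for illustration only: the document texts are reconstructed from the diagram on the previous slide (their exact word order is an assumption), and the function names are hypothetical, not part of Solr or TYPO3.

```python
from collections import defaultdict

# Document texts reconstructed from the slide diagram (word order assumed).
DOCUMENTS = {
    1: "My cat is cool",
    2: "Baseball is my Sport",
    3: "San Francisco Fort Mason",
    4: "TYPO3 T3CON rocks",
    5: "Extbase is a Ghetto",
}


def build_posting_file(documents: dict[int, str]) -> dict[str, list[int]]:
    """Map each lowercased word to the sorted list of document ids containing it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def search(postings: dict[str, list[int]], word: str) -> list[int]:
    """Look up a single word; an unknown word yields an empty list."""
    return postings.get(word.lower(), [])


postings = build_posting_file(DOCUMENTS)
print(search(postings, "is"))   # [1, 2, 5]
print(search(postings, "My"))   # [1, 2]
```

A lookup now costs one dictionary access instead of a scan over every document, which is the speed-up the index slide refers to.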
![Page 17: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/17.jpg)
Search in TYPO3
![Page 18: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/18.jpg)
Indexed Search
Indexed Search since TYPO3 Version 3.5
Pages are indexed when they are rendered in the Frontend
Searches in pages and in some file types
Works with languages and access rights
![Page 19: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/19.jpg)
Indexed Search
Index in Database
Problems with large websites
Slow
no sorting
no Templating
OK for small websites
![Page 20: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/20.jpg)
Search Expectations
![Page 21: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/21.jpg)
Expectation vs. Experience
Users expect „Google-Like“ interface and behaviour in search
No one navigates through an online shop
up to 30% of users use the search instead of going through text or navigation
Search is mediocre on a lot of websites
Slow and incomplete
Lots of improvement possible
![Page 22: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/22.jpg)
Apache Solr
Enterprise Search Server
![Page 23: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/23.jpg)
Apache Solr
Apache Software Foundation
Enterprise Search Server
Uses the Lucene Index
Lots of great Features
CNet, Netflix, Zappos.com and many more...
![Page 24: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/24.jpg)
Solr Key-Features
Synonyms
Stopwords
Boosting / Weighting
Faceting
Paid Content / Elevation
![Page 25: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/25.jpg)
Solr Key-Features
Synonyms
Stopwords
Boosting / Weighting
Faceting
Paid Content / Elevation
Spellchecking / Did you mean?
![Page 26: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/26.jpg)
Solr Key-Features
Synonyms
Stopwords
Boosting / Weighting
Faceting
Paid Content / Elevation
Spellchecking / Did you mean?
Speed
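
To make three of these features concrete, here is a toy sketch in plain Python of stopword removal, synonym expansion and facet counting. It only illustrates the ideas and is not how Solr implements them; the stopword list, the synonym map and the sample records are all invented for this example.

```python
from collections import Counter

STOPWORDS = {"a", "is", "the"}          # hypothetical stopword list
SYNONYMS = {"tv": {"television"}}       # hypothetical synonym map


def analyze(text: str) -> list[set[str]]:
    """Drop stopwords, then expand each remaining term with its synonyms."""
    terms = []
    for token in text.lower().split():
        if token in STOPWORDS:
            continue
        terms.append({token} | SYNONYMS.get(token, set()))
    return terms


def facet_counts(docs: list[dict], field: str) -> Counter:
    """Count how many documents carry each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)


print(analyze("The TV is cool"))
results = [{"type": "page"}, {"type": "news"}, {"type": "page"}]
print(facet_counts(results, "type"))
```

The facet counts are what a search result page would show next to each filter link, for example "page (2)" and "news (1)".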
![Page 27: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/27.jpg)
How does it work?
REST like Interface
Indexing with POST
Search with GET
Results in XML, JSON, PHP and many more
Libraries for many programming languages
SolrPhpClient
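
The "index with POST, search with GET" pattern can be sketched with the Python standard library. This is an illustration under assumptions, not the TYPO3 extension's code: the base URL and core name are hypothetical, the `/select` and `/update` paths follow current Solr conventions and may differ for the Solr version used in 2011, and the requests are only built here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SOLR_URL = "http://localhost:8983/solr/core_en"  # hypothetical host and core


def build_search_request(query: str, rows: int = 10) -> Request:
    """Search is a plain GET with the query in the URL; wt selects the response format."""
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return Request(f"{SOLR_URL}/select?{params}", method="GET")


def build_index_request(documents: list[dict]) -> Request:
    """Indexing is a POST whose body carries the documents to add."""
    body = json.dumps(documents).encode("utf-8")
    return Request(
        f"{SOLR_URL}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


search = build_search_request("typo3 solr")
print(search.get_method(), search.full_url)

index = build_index_request([{"id": "page-1", "title": "Hello Solr"}])
print(index.get_method(), index.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running Solr server. In a TYPO3 project a client library such as SolrPhpClient wraps these HTTP calls.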
![Page 28: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/28.jpg)
Why and how?
![Page 29: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/29.jpg)
Scratching our Itch
Why?
Indexed Search was too slow
Misses a lot of today's requirements
![Page 30: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/30.jpg)
History
Prototype in Summer 2008
Kick-off February 2009
„Acts like Indexed Search“
Early Access Program
T3CON September 2009 Version 1.0
![Page 31: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/31.jpg)
Components
Indexing
Search
Flexible Templating
Analysis and Statistics
Administration
![Page 32: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/32.jpg)
Challenges
Page Rendering in TYPO3
Access Rights
File Indexing
Easy Setup for Non-Java People
Integrating Solr in general
![Page 33: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/33.jpg)
Solutions
Record Monitor and Indexing Queue
Solr Query Parser Plugin
Integration of Apache Tika
Fully Automated bash Install Script
SolrPhpClient
![Page 34: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/34.jpg)
Features
Faceted Search
File Indexing
Multi-language Support
Did you mean
![Page 35: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/35.jpg)
Features
Search Word Highlighting
Autocomplete / Suggestions
Access Rights Support
More to come
![Page 36: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/36.jpg)
Watch out!
![Page 37: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/37.jpg)
„I do not have any solution. I admire the problem.“
Ashleigh Brilliant, Cartoonist and Author
![Page 38: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/38.jpg)
Common Problems
Relevancy Perception Trap
Assumption: Search should display a certain result, such as an employee name
Query: Mike Miller
Results: Mill 100% relevancy
Miller 75% relevancy
Possible Issue: Stemming applied to proper names
Solution: Don't stem fields that contain names
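
The effect can be reproduced with a toy stemmer. This sketch is only an illustration: `toy_stem` is a crude, made-up suffix stripper, not the stemmer Solr or Lucene ships, and the matching logic is far simpler than real relevancy scoring.

```python
def toy_stem(token: str) -> str:
    """Strip a few common English suffixes; a crude stand-in for a real stemmer."""
    token = token.lower()
    for suffix in ("ing", "er", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str, stem: bool) -> set[str]:
    """Lowercase and split the text, optionally stemming every token."""
    return {toy_stem(t) if stem else t.lower() for t in text.split()}


def matches(query: str, field_value: str, stem: bool) -> bool:
    """True if query and field share at least one term after analysis."""
    return bool(analyze(query, stem) & analyze(field_value, stem))


query = "Mike Miller"
for name in ("Mill", "Miller"):
    print(name, "stemmed:", matches(query, name, stem=True),
          "unstemmed:", matches(query, name, stem=False))
```

With stemming, "Miller" is reduced to "mill", so the employee "Mill" matches the query just as well as "Miller". Without stemming on the name field, only "Miller" matches.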
![Page 39: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/39.jpg)
Common Problems
Finding Corpses in your Corpus
While Searching you find „interesting“ Results
You have forgotten to hide content
You have not set the „no search“ Flag
You have made copies of records and forgotten them
![Page 40: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/40.jpg)
Common Problems
Data updates without using TCEmain
You wonder: Why do my new records of table XY not show up?
You have updated the tables directly, e.g. with phpMyAdmin
You might have forgotten to set the language ID in the records
![Page 41: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/41.jpg)
Common Problems
Can't access the Solr Server
You cannot access the Solr Server on another machine
Possible Solution
![Page 42: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/42.jpg)
Common Problems
Help, my Index gets deleted
Symptom: Your Index is empty
Possible Cause: Your Solr Server is not secured
![Page 43: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/43.jpg)
Common Problems
My news records are not being indexed
News records that you keep in a Sysfolder are not showing up in your results
The folder is not in the rootline of the website
Configure the PID of the Sysfolder correctly
![Page 44: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/44.jpg)
Questions?
![Page 45: Searching does not mean finding Stuff - Apache Solr for TYPO3](https://reader034.vdocument.in/reader034/viewer/2022051108/5447744cb1af9f13098b469b/html5/thumbnails/45.jpg)
dkd development kommunikation design
Thank you.