Search features and architecture in DNN 7.1
@DNNCon @ashishprasad
Don’t forget to include #DNNCon in your tweets!
7.1 Search and Lucene.Net
Ash Prasad
Agenda
• History and New Objectives
• Architecture
• Lucene / Lucene.Net
• Crawlers, Entities, Controllers
• Ranking, Synonyms, Ignore Words, Stemming
• Security Trimming
• Module Integration, New Crawler
History of Search
• Platform Edition
  • SQL Server
  • ISearchable
• Commercial Edition
  • Lucene 2.9.2
  • URL and Files
[Diagram labels: Lucene, Scheduler, SQL, Module, ISearchable]
Objectives of New Search
• Handle diverse Content
  • CMS, Social, Localized, 3rd Party Modules
• Consistent User Experience
• Simple for Module Developers
• Uniform Architecture
  • Feature based differentiation
Architecture
What’s Lucene
• Java-based indexing and search technology
• Managed by Apache
• NoSQL database
• Near real-time, Spellchecking, Highlighting, Ranking, Synonyms
• Many companies use Lucene directly or customize it
• Facebook’s Graph search uses a similar ‘Inverted Index’
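To make the 'Inverted Index' idea concrete, here is a minimal Python sketch. It is not Lucene code: the function names and sample documents are assumptions made for illustration, and it only shows the core idea of mapping each term to the documents that contain it.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Hypothetical sample content, not taken from the talk.
docs = {
    1: "DNN search uses Lucene",
    2: "Lucene is an indexing library",
    3: "DNN modules provide content",
}
index = build_inverted_index(docs)
print(search(index, "dnn lucene"))
```

A query is answered by intersecting per-term posting sets instead of scanning every document, which is what keeps lookups fast as the index grows.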
What’s Lucene.Net
• Line-by-line port from Java to C#
• Maintains high-performance requirements
• A bit behind Java releases
• Who Uses Lucene.Net
  • Products: RavenDB, Orchard, Umbraco, SubText
  • Commercial Sites: BBC UK Top Gear, AutoDesk, Koders.Com
Lucene – A Document Store
• Flexible Schema
• Consists of Documents
  • Which are collections of Fields
• Documents can have different sets of Fields
  • Field("ID", "xxx-yyy-999"), Field("Title", "My best doc")
  • Field("Owner", "Ash"), Field("Locale", "en-US")
Lucene – A Document Store (Contd.)
• Denormalized (No Referential Integrity)
• Deletion – Done through a flag
  • Compact reclaims deleted space
• Update is Delete + Insert
• Boost = Ranking
• Unicode compliant
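The document-store behaviour described on these two slides (flexible fields, flag-based deletion, update as delete plus insert, compaction) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Lucene stores segments: the `DocumentStore` class, its method names, and the second document's ID are all hypothetical.

```python
class DocumentStore:
    """Toy store: documents are dicts of fields; deletes only set a flag."""

    def __init__(self):
        self._slots = []  # each slot: {"fields": dict, "deleted": bool}

    def insert(self, fields):
        self._slots.append({"fields": dict(fields), "deleted": False})

    def delete(self, key, value):
        """Flag every live document whose field `key` equals `value`."""
        count = 0
        for slot in self._slots:
            if not slot["deleted"] and slot["fields"].get(key) == value:
                slot["deleted"] = True
                count += 1
        return count

    def update(self, key, value, fields):
        """Update is modelled as delete + insert."""
        self.delete(key, value)
        self.insert(fields)

    def compact(self):
        """Physically drop flagged slots; return how many were reclaimed."""
        before = len(self._slots)
        self._slots = [s for s in self._slots if not s["deleted"]]
        return before - len(self._slots)

    def live_documents(self):
        return [s["fields"] for s in self._slots if not s["deleted"]]

    def slot_count(self):
        return len(self._slots)


store = DocumentStore()
# Documents may carry different sets of fields (flexible schema).
store.insert({"ID": "xxx-yyy-999", "Title": "My best doc"})
store.insert({"ID": "aaa-bbb-111", "Owner": "Ash", "Locale": "en-US"})  # hypothetical ID
store.update("ID", "xxx-yyy-999", {"ID": "xxx-yyy-999", "Title": "My better doc"})
```

After the update, the old version still occupies a slot until `compact()` runs, which is why a store that only flags deletions grows until it is compacted.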
Book consulted for Search
• Book on version 3.0
• ~500 pages
• Very useful
Search Phases
Content Acquisition
• Crawling
• ISearchable
• ModuleSearchBase
• URL
• Doc / PDF
Content Indexing
• Text Analysis
• Ranking
• Synonyms
• Ignore Words
• Stemming
Content Search
• Querying
• Sorting
• Security Trimming
• Boolean Search
• Highlighting
Crawlers
• Platform
  • Site Crawler
    • Module and Tab Metadata
    • Module Content (ModuleSearchBase/ISearchable)
• Commercial Edition
  • File Crawler
    • Uses IFilter to extract text from PDF/Office files
  • URL Crawler
    • Internal and External URLs
Search Entities
• SearchType
  • Distinguishes Crawlers
• SearchDocument
  • Properties of a content item
  • Stored in the Index
• SearchQuery
  • Parameters to execute a Query
• SearchResult
  • Derived from SearchDocument
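The relationship between these entities can be sketched with Python dataclasses. Only the four entity names and the "SearchResult derives from SearchDocument" relationship come from the slide; every field name, the `run_query` helper, and the naive scoring are assumptions for illustration, not the DNN API.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class SearchDocument:
    """What gets stored in the index (field names are hypothetical)."""
    unique_key: str
    search_type_id: int  # identifies which crawler produced the document
    title: str = ""
    body: str = ""


@dataclass
class SearchQuery:
    """Parameters for executing a query (field names are hypothetical)."""
    key_words: str
    search_type_ids: list = field(default_factory=list)


@dataclass
class SearchResult(SearchDocument):
    """A document plus query-time information."""
    score: float = 0.0
    snippet: str = ""


def run_query(documents, query):
    """Naive matcher: count keyword occurrences in title and body."""
    results = []
    for doc in documents:
        if query.search_type_ids and doc.search_type_id not in query.search_type_ids:
            continue
        text = f"{doc.title} {doc.body}".lower()
        hits = text.count(query.key_words.lower())
        if hits:
            results.append(
                SearchResult(**asdict(doc), score=float(hits), snippet=doc.body[:40])
            )
    return sorted(results, key=lambda r: r.score, reverse=True)
```

Keeping the indexed entity and the result entity in one hierarchy means anything stored at indexing time is available unchanged when rendering results.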
Search Entities – Indexing vs. Querying
Controllers
• SearchController
  • For Querying
• InternalSearchController
  • For Adding / Updating / Deleting
• LuceneController
  • Interacts with Lucene
Ranking = Boosting
• Doc and/or Field can be boosted in Lucene
• DNN does Field boosts (Default - 10)
  • Title (50)
  • Tag (40)
  • Keyword (35)
  • Description (20)
  • Author (15)
• Configured manually by HostSettings
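A rough way to see what field boosts do to ranking is sketched below in Python. The boost values are the ones listed on the slide; the scoring formula (term occurrences times field boost) is a deliberate simplification and an assumption, since Lucene's real scoring also factors in term frequency, inverse document frequency, and length norms.

```python
# Boost values from the slide; anything not listed gets the default.
FIELD_BOOSTS = {"title": 50, "tag": 40, "keyword": 35, "description": 20, "author": 15}
DEFAULT_BOOST = 10


def score(document, term):
    """Sum (occurrences of term in a field) * (that field's boost)."""
    term = term.lower()
    total = 0
    for field_name, text in document.items():
        occurrences = text.lower().split().count(term)
        total += occurrences * FIELD_BOOSTS.get(field_name, DEFAULT_BOOST)
    return total


# Hypothetical documents for illustration.
title_hit = {"title": "Lucene search", "body": "about indexing"}
body_hit = {"title": "Indexing", "body": "lucene search"}
print(score(title_hit, "lucene"), score(body_hit, "lucene"))
```

With these weights a single match in the title outweighs several matches in an unboosted field, which is the practical effect of "Boost = Ranking".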
Synonyms and Ignore Words
• Synonyms are injected into Index
• Ignore Words are removed from Index
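As a minimal sketch of both steps at analysis time, the Python function below drops ignore words and emits synonyms next to the original token. The word lists and the `analyze` name are hypothetical; the real lists in DNN are configured by administrators.

```python
# Hypothetical word lists for illustration only.
IGNORE_WORDS = {"the", "a", "of"}
SYNONYMS = {"car": ["automobile"], "quick": ["fast"]}


def analyze(text):
    """Return the tokens that would be written to the index."""
    tokens = []
    for word in text.lower().split():
        if word in IGNORE_WORDS:
            continue  # ignore words never reach the index
        tokens.append(word)
        tokens.extend(SYNONYMS.get(word, []))  # synonyms are injected alongside
    return tokens


print(analyze("The quick car"))
```

Because synonyms are written into the index, a search for "automobile" can match a document that only ever said "car" without any query-time expansion.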
Stemming
• Converts words to their root
• PorterStemFilter is used
• Country and Countries = countri
• breathe, breathes, breathing, breathed = breath
• fishing, fished, fisher = fish
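The effect of stemming can be illustrated with a deliberately simplified suffix stripper in Python. This is not the Porter algorithm that PorterStemFilter implements; it is a toy with a handful of assumed rules, just enough to show how different surface forms collapse to one indexed term.

```python
def _has_vowel(text):
    return any(ch in "aeiou" for ch in text)


def simple_stem(word):
    """Toy stemmer: a few suffix rules, NOT the full Porter algorithm."""
    w = word.lower()
    # Step 1: strip one common inflectional suffix.
    if w.endswith("ies"):
        w = w[:-2]                      # countries -> countri
    elif w.endswith("ing") and _has_vowel(w[:-3]):
        w = w[:-3]                      # breathing -> breath
    elif w.endswith("ed") and _has_vowel(w[:-2]):
        w = w[:-2]                      # breathed -> breath
    elif w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]                      # breathes -> breathe
    # Step 2: normalise the ending.
    if w.endswith("y") and _has_vowel(w[:-1]):
        w = w[:-1] + "i"                # country -> countri
    elif w.endswith("e") and len(w) > 4:
        w = w[:-1]                      # breathe -> breath
    return w


for group in (["country", "countries"], ["breathe", "breathes", "breathing", "breathed"]):
    print({word: simple_stem(word) for word in group})
```

The stem does not need to be a real word ("countri"); it only needs to be the same for every form, at both indexing and query time.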
Security Trimming
• Done through Collectors (Callback)
• Each Doc found is sent to Collector
• Collector accepts or rejects per Permission
  • Site Crawler - Module / Tab Permission
  • File Crawler - Folder Permission
  • User Crawler - Profile Permission
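The collector callback pattern can be sketched as follows in Python. This is not the Lucene `Collector` API or DNN's implementation: the class name, the `view_roles` field, and the role names are assumptions chosen to show a per-document accept/reject decision.

```python
class PermissionCollector:
    """Receives each matching document and keeps only those the user may view."""

    def __init__(self, user_roles):
        self.user_roles = set(user_roles)
        self.accepted = []

    def collect(self, doc):
        """Callback invoked once per match; returns True if the doc is kept."""
        if self.user_roles & set(doc["view_roles"]):
            self.accepted.append(doc)
            return True
        return False


def search_with_trimming(matches, collector):
    """Feed every raw match through the collector and return what survives."""
    for doc in matches:
        collector.collect(doc)
    return collector.accepted


# Hypothetical raw matches, each carrying the roles allowed to view it.
matches = [
    {"id": 1, "view_roles": ["All Users"]},
    {"id": 2, "view_roles": ["Administrators"]},
    {"id": 3, "view_roles": ["Registered Users", "Administrators"]},
]
collector = PermissionCollector(["All Users", "Registered Users"])
visible = search_with_trimming(matches, collector)
print([doc["id"] for doc in visible])
```

Trimming inside the collection step means result counts and paging are computed only over documents the current user is allowed to see.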
Module Integration
• ModuleSearchBase
  • New abstract class with just one method
  • Defined in BusinessControllerClass
  • GetModifiedSearchDocuments
  • Returns New, Changed and Deleted content
  • Delta based
  • Granular Permission, Localization, etc.
• ISearchable continues to work (no delta)
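The delta-based contract can be approximated in Python as an abstract base class with a single method. This is an analogue for illustration, not the C# class: the parameter names, the dictionary shape, the `is_deleted` flag, and the `ArticleModuleController` example are all assumptions.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timezone


class ModuleSearchBase(ABC):
    """Python analogue of the one-method contract described on the slide."""

    @abstractmethod
    def get_modified_search_documents(self, module_id, begin_date_utc):
        """Return documents for content added, changed or deleted since begin_date_utc."""


class ArticleModuleController(ModuleSearchBase):
    """Hypothetical module that exposes its articles to search."""

    def __init__(self, articles):
        self.articles = articles

    def get_modified_search_documents(self, module_id, begin_date_utc):
        return [
            {
                "unique_key": a["id"],
                "title": a["title"],
                "is_deleted": a["deleted"],  # lets the indexer remove stale entries
                "modified_utc": a["modified_utc"],
            }
            for a in self.articles
            if a["module_id"] == module_id and a["modified_utc"] > begin_date_utc
        ]


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


articles = [
    {"id": "a1", "module_id": 7, "title": "Old", "deleted": False, "modified_utc": utc(2013, 6, 1)},
    {"id": "a2", "module_id": 7, "title": "Edited", "deleted": False, "modified_utc": utc(2013, 6, 20)},
    {"id": "a3", "module_id": 7, "title": "Removed", "deleted": True, "modified_utc": utc(2013, 6, 21)},
]
delta = ArticleModuleController(articles).get_modified_search_documents(7, utc(2013, 6, 15))
```

Only content touched since the last run is returned, so each indexing pass costs in proportion to what changed, not to the total amount of content.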
New Crawler (How to)
• Define a new SearchType
  • Optionally use IsPrivate to hide from site search
• Implement BaseResultController (2 methods)
  • HasViewPermission
  • GetDocUrl
• Create Scheduled Task
  • Call AddSearchDocuments to inject content
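The three steps can be sketched end to end in Python. The names mirror the ones on the slide, but this is an analogue with assumed signatures, not the DNN API: the in-memory `INDEX`, the "event" search type, the `members_only` field, and the URL pattern are all hypothetical.

```python
from abc import ABC, abstractmethod

INDEX = []  # stand-in for the search index


def add_search_documents(documents):
    """Stand-in for the call that injects crawler content into the index."""
    INDEX.extend(documents)


class BaseResultController(ABC):
    """The two methods a new crawler has to supply."""

    @abstractmethod
    def has_view_permission(self, result, user):
        """Decide whether `user` may see `result`."""

    @abstractmethod
    def get_doc_url(self, result):
        """Build the link shown for `result`."""


class EventResultController(BaseResultController):
    """Hypothetical controller for an 'event' search type."""

    def has_view_permission(self, result, user):
        return not result["members_only"] or user.get("is_member", False)

    def get_doc_url(self, result):
        return f"/events/{result['unique_key']}"


def run_scheduled_task(events):
    """What the scheduled task would do on each run: push content to the index."""
    add_search_documents(
        [
            {
                "search_type": "event",
                "unique_key": e["id"],
                "title": e["title"],
                "members_only": e["members_only"],
            }
            for e in events
        ]
    )


run_scheduled_task([
    {"id": "e1", "title": "Open day", "members_only": False},
    {"id": "e2", "title": "Members meetup", "members_only": True},
])
controller = EventResultController()
guest_view = [controller.get_doc_url(r) for r in INDEX if controller.has_view_permission(r, {})]
```

The crawler owns both ends of the process: the scheduled task decides what goes into the index, and the result controller decides who can see each hit and where it links.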
Demo
Recap
• New Search uses Lucene.Net
• Platform has Site Crawler
• Commercial has URL and File Crawlers
• Modules to implement ModuleSearchBase
• New Crawler implements BaseResultController
THANKS TO ALL OF OUR GENEROUS SPONSORS!