A New Content Processing Framework for Search Applications
Iain Fletcher
ifletcher@searchtechnologies.com

Agenda
• Briefly About Search Technologies
• Key Issues for Enterprise Search
• A New Content Processing Framework for Search Applications
• How do we use it?
• What does it look like?
• Use case example

Search Technologies overview
• The leading IT services company focused on search engines
  • Consulting
  • Implementation
  • Managed services
• Technology independent, working with most of the leading search engines
• 90 staff, 250+ customers

Search Technologies overview
San Diego, CA
San Jose, CR
Herndon, VA
Ascot, UK
Boston, MA
Cincinnati, OH

Executive team

| Executive | Enterprise Search Industry Experience |
|---|---|
| Kamran Khan, President & CEO | 18 years: International Sales, VP Sales, Executive |
| John Steinhauer, VP Technology | 16 years: Development Management, Project Management, Executive |
| Paul Nelson, Chief Architect | 22 years: Development, Innovation, Architecting, Dev. Management |
| Graham Charlesworth, VP Europe | 16 years: Business Development, VP Sales, Executive |
| Phil Lewis, Tech. Director, Europe | 19 years: Development, Innovation, Architecting, Project Management |
| Dennis Tran, VP & Founder | 21 years: International Sales, VP Sales |
| John Back, VP Sales | 15 years: Sales, Federal Sales Director |
| Iain Fletcher, VP Marketing | 16 years: International Sales, Product Management, VP Marketing |

Years = number of years in the search engine industry

Selected customers

Enterprise Search - An Indifferent Reputation
• Major surveys show that no progress has been made during the last 10 years
• Searchers are successful in finding what they seek 50% of the time or less
  • 2001, IDC, “Quantifying Enterprise Search”
• More than half cannot find the information they need using their enterprise search system
  • 2011, MindMetre/SmartLogic, “Mind the Enterprise Search Gap”

Search Fundamentals

Metadata Supports Relevance Ranking
Supported by great metadata!
• Title
• Meta description
• URL
• Inbound links
• Alt tag text
• Etc.
• Provided for free by millions of SEO practitioners

Key Issues
• Almost all modern search functions are driven by data structure
• The majority of serious problems in serious search systems are caused by data quality issues

Also...
• “Big Data” and BI from unstructured data will face the same challenges
• Can you trust an analysis if you are unsure of data provenance?

Data quality examples
• The subscription portal caught out by template information
• The Intranet search skewed by a new piece of hardware
• The Intranet search where great quality was the problem!

Key Issues
• Data structure and quality issues are addressed in the indexing pipelines of search engines
  • Cleaning, enriching, normalizing, granularizing...
• It is about process as much as technology
  • And data constantly evolves
• Sometimes the built-in indexing pipeline is not good enough (issues with scale, flexibility or transparency)
  • Some search engines don’t really have one
• We’ve written our own
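
To make the four pipeline activities named above concrete, here is a minimal sketch of an indexing pipeline as a chain of small functions. It is an illustration only, not the framework described in this talk: the function names, the dictionary-based document shape and the field names (`body`, `author`, `source_word_count`, `part`) are all assumptions chosen for the example.

```python
import re
from typing import Callable, Dict, List

Document = Dict[str, object]          # assumed document shape: a plain dict
Stage = Callable[[Document], Document]


def clean(doc: Document) -> Document:
    """Cleaning: strip markup remnants and collapse whitespace in the body."""
    text = re.sub(r"<[^>]+>", " ", str(doc.get("body", "")))
    return {**doc, "body": re.sub(r"\s+", " ", text).strip()}


def enrich(doc: Document) -> Document:
    """Enriching: add derived metadata (a fallback title and a word count)."""
    body = str(doc["body"])
    title = doc.get("title") or body[:40]
    return {**doc, "title": title, "source_word_count": len(body.split())}


def normalize(doc: Document) -> Document:
    """Normalizing: bring a free-form field into one canonical form."""
    return {**doc, "author": str(doc.get("author", "")).strip().title()}


def granularize(doc: Document) -> List[Document]:
    """Granularizing: split one document into one indexable unit per sentence."""
    parts = re.split(r"(?<=[.!?])\s+", str(doc["body"]))
    sentences = [s.strip() for s in parts if s.strip()]
    return [{**doc, "body": s, "part": i} for i, s in enumerate(sentences)]


def run_pipeline(doc: Document, stages: List[Stage]) -> List[Document]:
    """Apply each stage in order, then split the result into indexable units."""
    for stage in stages:
        doc = stage(doc)
    return granularize(doc)


raw = {"body": "<p>Search  matters.</p> <p>Data quality matters more!</p>",
       "author": "  iain FLETCHER "}
units = run_pipeline(raw, [clean, enrich, normalize])
```

Because every stage has the same signature, stages can be reordered, removed or replaced without touching the others, which is the property the later slides on component autonomy rely on.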

Document Processing Methodology for Search (DPMS)
• The Philosophy
• Understand the Document Model
• Understand the User Model
  • Includes business-level requirements
• Create the Search Engine Model
  • Search = the pivot point between User and Data
• Document everything

DPMS – The Methodology
[Diagram of three phases: 1 Assessment, 2 Detailed Analysis, 3 Execution. Labels in the diagram:]
• Assessment (Search Technologies Architect and Business Analyst)
• Assessment Report: expert assessment and recommendations
• DPMS Analysis (Knowledge Engineer, Business Analyst, etc.)
• DMDs
• Review (Architect, Domain Experts, Peers)
• Implementation (Developer)
• Validate DMDs
• Validation
• Aspire
• Search Engine

DPMS – The Implementation

Introducing “Aspire”
• Think of it as a stand-alone indexing pipeline with a framework + component architecture
• Framework built for scalability, performance and flexibility – designed to use cloud elasticity
• Components built to be autonomous and transparent

Technology Suite
• 100% Java
• OSGi™ (see www.osgi.org)
  • The Dynamic Module System for Java™
• Apache Felix
  • Open source implementation of OSGi
• Jetty
  • Embedded HTTP server
• Maven & Maven Repositories
  • For component deployment

Component Configuration
• Any number of document processing pipelines can be used in an application
• Disparate data sources will need different treatment
• Components can be shared where appropriate
• Configurations are easy to change
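
One way to picture several pipelines that share components is a registry of named components plus a per-source list of component names. The sketch below is hypothetical and is not Aspire's configuration format (Aspire itself is Java/OSGi based): the registry, the component names and the two source names are invented for illustration.

```python
from typing import Callable, Dict, List

Component = Callable[[dict], dict]

# Hypothetical component registry: name -> processing function.
COMPONENTS: Dict[str, Component] = {
    "strip_whitespace": lambda d: {**d, "text": d["text"].strip()},
    "lowercase_tags": lambda d: {**d, "tags": [t.lower() for t in d.get("tags", [])]},
    "mark_source_wiki": lambda d: {**d, "source": "wiki"},
    "mark_source_fileshare": lambda d: {**d, "source": "fileshare"},
}

# One pipeline per data source; "strip_whitespace" is shared by both.
PIPELINES: Dict[str, List[str]] = {
    "wiki": ["strip_whitespace", "lowercase_tags", "mark_source_wiki"],
    "fileshare": ["strip_whitespace", "mark_source_fileshare"],
}


def process(pipeline_name: str, doc: dict) -> dict:
    """Run a document through the components listed for the named pipeline."""
    for name in PIPELINES[pipeline_name]:
        doc = COMPONENTS[name](doc)
    return doc


wiki_doc = process("wiki", {"text": " Hello ", "tags": ["HR"]})
file_doc = process("fileshare", {"text": " Report "})
```

In this arrangement, changing how a source is treated means editing its list in `PIPELINES`, not the component code.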

Component autonomy
• Components communicate via XML
• Each component has a known and transparent input and output, and can be tested in isolation
• This simplifies problem diagnosis, promotes transparency and controls cost-of-ownership
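
The idea of XML in, XML out, testable in isolation can be sketched with Python's standard `xml.etree.ElementTree` module. The component below and its element names (`doc`, `title`, `titleLength`) are assumptions made up for this example; they are not Aspire's actual document schema.

```python
import xml.etree.ElementTree as ET


def add_title_length(doc_xml: str) -> str:
    """Hypothetical component: reads <title> and appends a <titleLength> element."""
    root = ET.fromstring(doc_xml)
    title = root.findtext("title", default="")
    ET.SubElement(root, "titleLength").text = str(len(title))
    return ET.tostring(root, encoding="unicode")


# The component can be exercised on its own, with no pipeline around it.
result = add_title_length("<doc><title>Aspire</title></doc>")
parsed = ET.fromstring(result)
assert parsed.findtext("title") == "Aspire"
assert parsed.findtext("titleLength") == "6"
```

Since the input and output are both plain XML strings, a failing document can be captured and replayed against a single component when diagnosing a problem.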

Data Quality Monitoring
• Components have built-in quarantine systems to monitor data quality
• Content is constantly evolving
• This provides transparency and enables content issues to be diagnosed and resolved faster
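
A quarantine can be as simple as setting aside the documents a component cannot process, together with the reason, and letting the rest continue. The following is a minimal sketch under that assumption; the class name, the `parse_year` check and the choice of exceptions are illustrative and do not describe how Aspire implements its quarantine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class QuarantiningComponent:
    """Wraps a processing function; failing documents are set aside, not dropped."""
    name: str
    func: Callable[[Dict], Dict]
    quarantine: List[Tuple[Dict, str]] = field(default_factory=list)

    def process(self, docs: List[Dict]) -> List[Dict]:
        passed = []
        for doc in docs:
            try:
                passed.append(self.func(doc))
            except (KeyError, ValueError) as exc:
                # Keep the offending document and the reason for later diagnosis.
                self.quarantine.append((doc, f"{type(exc).__name__}: {exc}"))
        return passed


def parse_year(doc: Dict) -> Dict:
    """Example check: the 'year' field must exist and be a plausible integer."""
    year = int(doc["year"])  # raises KeyError or ValueError on bad input
    if not 1900 <= year <= 2100:
        raise ValueError(f"year out of range: {year}")
    return {**doc, "year": year}


component = QuarantiningComponent("parse_year", parse_year)
good = component.process([{"id": 1, "year": "2011"}, {"id": 2, "year": "n/a"}, {"id": 3}])
```

After the run, `good` holds the documents that passed and `component.quarantine` holds the two that did not, each paired with the error that stopped it. Watching how that list grows over time is one way to notice when source content has changed.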

The Component Library
• Search Technologies maintains a library of components
• Currently there are more than 70
• Components can be as simple as 3 lines of Groovy script, or complex, 3rd party technologies
• Many applications can be addressed using existing components + configuration

Component Upgrading
• Components can be upgraded in-situ from a cloud-based service, without stopping/restarting the system
• Helpful in the maintenance of complex or mission-critical systems

Component control
• Every component has its own control / status page

A very simple example

Security expansion example

Patent Assignee Name Normalization

Complexity example
• CPA Global Discover
  • The world’s leading patent research portal
  • 80 million patents from 95 patent offices
  • More than a dozen navigators built
  • Numerous graphical search results display options
  • Whole document comparison features

In Summary
• Many applications today don’t need this level of diligence
  • But as data and data dynamism grow, more will
• A stand-alone unstructured content processing system can serve multiple applications, and makes sense for some companies
• Method. Diligence. Transparency – it’s not rocket science...
• Applying this approach to enterprise search is a key part of moving user satisfaction forward during the next few years