The Enterprise Search Market in a Nutshell

Iain Fletcher

ifletcher@searchtechnologies.com

October 19, 2015

ICIC 2015, Nice

Agenda

• About Search Technologies (30 seconds)

• The enterprise search market

• Likely future architectures for supporting important search applications

Search Technologies: Background

Offices: Washington (HQ), San Diego, San Francisco, Cincinnati, San Jose (CR), London (UK), Frankfurt (DE), Prague (CZ)

• Founded 2005

• 180 employees

• 600+ customers

• Independent consulting company

• Focus on enterprise search

• Working with all leading platforms

600+ Customers

The Enterprise Search Market

High-level Search Engine Classifications

1. Part of a portfolio, many are recently acquired technologies

– E.g. SharePoint/FAST, HP Autonomy, IBM/Vivisimo, Dassault/Exalead, Oracle/Endeca

2. Stand-alone specialists, often deployed to address specific apps or challenges

– E.g. GSA, Coveo, Attivio, Sinequa, Recommind

3. Open source, with or without support or proprietary add-ons

– Raw: Lucene, Solr, Elasticsearch

– With support/add-ons: LucidWorks, Cloudera Search, Elastic ELK

4. Cloud-based services, typically based on open source technology

– E.g. Amazon CloudSearch (Solr), Microsoft Azure Search (Elasticsearch)

The dominant market share is currently with SharePoint, open source, and the GSA

• SharePoint 2013 search is credible, and bundled

– Search teams are under pressure to use it, or to provide a compelling reason to do otherwise

• Solr and Elasticsearch are robust and reliable

– Thanks to very widespread deployment

• The Google brand sells, and a lot of GSAs have been shipped during the past few years

Market Observations

Functional Observations

• Core indexing / searching is generally fast and reliable

– Search is a maturing / converging technology

• Key differences remain in peripheral functionality, such as content processing prior to indexing, and query processing

– Coveo, Attivio, Sinequa etc. have well-developed indexing pipelines, UI tools, and a range of data connectors

– SharePoint and GSA are delivered with limited content processing functionality and limited connectivity

– Solr, Elasticsearch, AWS Cloudsearch and Azure search don’t provide a formal indexing pipeline, UI, or connectors

Further Observations

• The search engines with less focus on peripheral issues such as content processing and connectivity have dominant market share

• Connectivity is often challenging, especially when combined with continual data growth and document-level security requirements

• The movement of data sets to the cloud adds further complexity for enterprise search systems

– Hybrid indexing environments will be with us for some years

– Some content sets in the cloud, some behind the firewall

Great Search requires Attention to Detail

[Diagram labels: Index security, Directory, File Share]

E.g. in content processing prior to indexing:

• Normalization

– Names, dates, synonyms…

• Entity identification and resolution

• Categorization

• Document vector extraction

• Document splitting and concatenation

• Link & popularity analysis

• Dupe & near-dupe detection
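Two of the content processing steps named on this slide, normalization and near-dupe detection, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the synonym table, the accepted date formats, the word-shingle size and the 0.8 similarity threshold are hypothetical choices, not part of any product discussed in this talk.

```python
import re
from datetime import datetime

# Hypothetical synonym table; a real deployment would load a curated thesaurus.
SYNONYMS = {"ibm corp": "IBM", "international business machines": "IBM"}

# Date formats accepted by this sketch (an assumption, not an exhaustive list).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def normalize_date(text):
    """Return an ISO 8601 date string for a recognized format, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_name(name):
    """Collapse whitespace and case, then map known variants to one form."""
    key = re.sub(r"\s+", " ", name.strip().lower())
    return SYNONYMS.get(key, name.strip())


def shingles(text, k=3):
    """Return the set of k-word shingles of a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, threshold=0.8):
    """Flag two documents as near-duplicates by shingle overlap."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold
```

Pairwise shingle comparison is only practical for small collections. At enterprise scale a hashing scheme such as MinHash would normally replace it.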

Designed for Unstructured Content

[Diagram labels: Directory, File Share, Re-index]

• As data volumes grow, re-indexing becomes challenging

• The rate at which content can be acquired from repositories is usually the bottleneck

• A few documents per second?

• There are only 2.6 million seconds in a month
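The arithmetic behind this slide can be worked through in a few lines of Python. The acquisition rate of 3 documents per second and the corpus size of 10 million documents are illustrative assumptions, not figures from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_in_days(days):
    """Number of seconds in the given number of days."""
    return days * SECONDS_PER_DAY


def docs_acquired(docs_per_second, days=30):
    """Documents that can be fetched from a repository in the given period."""
    return docs_per_second * seconds_in_days(days)


def reindex_days(doc_count, docs_per_second):
    """Days needed to re-acquire a whole corpus at a fixed rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return doc_count / docs_per_second / SECONDS_PER_DAY


# A 30-day month has 2,592,000 seconds, i.e. roughly 2.6 million.
print(seconds_in_days(30))
# At an assumed 3 documents per second, one month yields 7,776,000 documents.
print(docs_acquired(3))
# An assumed 10-million-document corpus then takes over five weeks to re-index.
print(round(reindex_days(10_000_000, 3), 1))
```

At these assumed rates a full re-index directly from the source repositories runs for more than a month, which is why acquisition speed dominates the design.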

A Better Search Architecture

[Diagram labels: Content Sources, Connectors, Secure Cache, Content Processing, Index Pipeline, Search Engine, Search Index, Employee Directory, Re-index, Iterative Development]

• Re-indexing rates greatly improved

• “Touch-time” with repositories can be managed autonomously
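The idea of re-indexing from a cache rather than from the source repositories can be sketched as below. This is a hypothetical illustration only: the class name `StagedIndexer`, its fields, and the in-memory dictionaries standing in for the cache and the search index are all assumptions, and the access controls that make a real cache secure are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Tuple


@dataclass
class StagedIndexer:
    """Hypothetical sketch: connectors fill a cache, re-indexing reads the cache."""

    process: Callable[[str], dict]                       # content processing step
    cache: Dict[str, str] = field(default_factory=dict)  # stands in for the cache
    index: Dict[str, dict] = field(default_factory=dict) # stands in for the index
    repository_reads: int = 0                            # "touch-time" counter

    def crawl(self, repository: Iterable[Tuple[str, str]]) -> None:
        """Fetch raw documents from the (slow) repository into the cache."""
        for doc_id, raw in repository:
            self.repository_reads += 1
            self.cache[doc_id] = raw

    def reindex(self) -> None:
        """Rebuild the index from cached content, without touching repositories."""
        self.index = {
            doc_id: self.process(raw) for doc_id, raw in self.cache.items()
        }


repo = [("a", "Hello World"), ("b", "Foo")]
indexer = StagedIndexer(process=lambda raw: {"text": raw.lower()})
indexer.crawl(repo)
indexer.reindex()

# Iterative development: change the processing step, then re-index from the cache.
indexer.process = lambda raw: {"text": raw.lower(), "length": len(raw)}
indexer.reindex()
print(indexer.repository_reads)  # still 2: the repository was read only once
```

Because processing changes only require a pass over the cache, the re-index rate is no longer bound by how fast the repositories can serve content.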

The Future Architecture?

[Diagram labels: Hadoop, Connectors, Secure Cache, Content Processing, Index Pipeline, Search Index, Employee Directory, Re-index, Iterative Development]

• This environment will encourage ever more sophisticated text analytics

• We expect to see much innovation in text analytics during the next few years

• The deliverable is a better and richer search index

An Established Architecture

[Diagram labels: Hadoop, Connectors, Secure Cache, Content Processing, Index Pipeline, Search Index, Employee Directory, Re-index, Iterative Development]

• Google.com has worked something like this since 2004

An Integrated Search/Analytics Architecture

[Diagram labels: Hadoop, Content Sources, Connectors, File system, Rapid Indexing, Content Processing, Secure Cache, Iterative Development, Data Sources, Data Warehouse, Logfiles, Etc., Search App. (x2), Analysis App.]

• Encourages agile exploitation of data and content resources

Summary 1

• Search and Big Data applications are converging on the same architecture

• Autonomous connectivity and content processing simplify and de-risk projects, if you can get it right

• The foundation of great search is still a clean, rich and detailed index

• The “search index” itself is a mature technology, almost a commodity

• Much of the innovation during the next few years will be in text analytics, and other methods of preparing content prior to indexing

The compulsory analyst quote….

And finally….

“Enterprise Search Can Bring Big Data Within Reach”

• Multiple, purpose-built indexes that are derived from enriched content are necessary.

http://blogs.gartner.com/darin-stewart/2014/04/01/enterprise-search-can-bring-big-data-within-reach/

* Darin Stewart, Enterprise Search Can Bring Big Data Within Reach, April 2014 Blog

The Enterprise Search Market in a Nutshell

Iain Fletcher

ifletcher@Searchtechnologies.com

October 20, 2015

Questions?

Spare Slides

Reference Architecture

[Diagram labels: Content sources, Connectors, Staging Repository, Big Data Framework, Content Processing Pipelines, Semantics, Text Mining, Quality Metrics, Indexes, Query parsing, Search Engine, Web Browser]

Where is the Focus?

• The Business View

• The Implementation View

[Two diagrams, one per view, each with the labels: Application; Content Capture & Preparation; Data Store / Index]
