1

The Enterprise Search Market in a Nutshell

Iain Fletcher

ifletcher@searchtechnologies.com

October 19, 2015

ICIC 2015, Nice

2

Agenda

• About Search Technologies (30 seconds)

• The enterprise search market

• Likely future architectures for supporting important search applications

3

Search Technologies: Background

• Offices: San Diego; London, UK; San Jose, CR; Cincinnati; San Francisco; Washington (HQ); Frankfurt, DE; Prague, CZ

• Founded 2005

• 180 employees

• 600+ customers

• Independent consulting company

• Focus on enterprise search

• Working with all leading platforms

5

The Enterprise Search Market

6

High-level Search Engine Classifications

1. Part of a portfolio, many are recently acquired technologies

– E.g. SharePoint/FAST, HP Autonomy, IBM/Vivisimo, Dassault/Exalead, Oracle/Endeca

2. Stand-alone specialists, often deployed to address specific apps or challenges

– E.g. GSA, Coveo, Attivio, Sinequa, Recommind

3. Open source, with or without support or proprietary add-ons

– Raw: Lucene, Solr, Elasticsearch

– With support/add-ons: LucidWorks, Cloudera Search, Elastic ELK

4. Cloud-based services, typically based on open source technology

– E.g. Amazon CloudSearch (Solr), Microsoft Azure Search (Elasticsearch)

7

Market Observations

The dominant market share is currently with SharePoint, open source, and the GSA

• SharePoint 2013 search is credible, and bundled

– Search teams are under pressure to use it, or to provide a compelling reason to do otherwise

• Solr and Elasticsearch are robust and reliable

– Thanks to very widespread deployment

• The Google brand sells, and a lot of GSAs have been shipped during the past few years

8

Functional Observations

• Core indexing / searching is generally fast and reliable

– Search is a maturing / converging technology

• Key differences remain in peripheral functionality, such as content processing prior to indexing, and query processing

– Coveo, Attivio, Sinequa etc. have well-developed indexing pipelines, UI tools, and a range of data connectors

– SharePoint and GSA are delivered with limited content processing functionality and limited connectivity

– Solr, Elasticsearch, AWS Cloudsearch and Azure search don’t provide a formal indexing pipeline, UI, or connectors

9

Further Observations

• The search engines with less focus on peripheral issues such as content processing and connectivity have dominant market share

• Connectivity is often challenging, especially when combined with continual data growth, and document-level security requirements

• The movement of data sets to the cloud adds further complexity for enterprise search systems

– Hybrid indexing environments will be with us for some years

– Some content sets in the cloud, some behind the firewall
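To make the document-level security requirement concrete, here is a minimal sketch of query-time security trimming, assuming each indexed document carries the set of groups allowed to read it. The `IndexedDoc` class, the `search` function, and the group names are hypothetical illustrations, not any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDoc:
    """One index entry with the access-control groups captured at crawl time."""
    doc_id: str
    text: str
    allowed_groups: frozenset = field(default_factory=frozenset)


def search(index, query, user_groups):
    """Return ids of documents that match every query term AND that the
    user may see. Documents with no group in common are skipped."""
    terms = query.lower().split()
    user_groups = set(user_groups)
    hits = []
    for doc in index:
        # Security trimming: drop documents the user cannot read.
        if not (doc.allowed_groups & user_groups):
            continue
        if all(term in doc.text.lower() for term in terms):
            hits.append(doc.doc_id)
    return hits


index = [
    IndexedDoc("hr-1", "Salary review process", frozenset({"hr"})),
    IndexedDoc("pub-1", "Travel review process", frozenset({"hr", "staff"})),
]
print(search(index, "review process", {"staff"}))
```

The hard part in practice is keeping `allowed_groups` in step with each repository as permissions change, which is one reason connectivity and security are usually tackled together.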

10

Great Search requires Attention to Detail

E.g. in content processing prior to indexing:

• Normalization

– Names, dates, synonyms…

• Entity identification and resolution

• Categorization

• Document vector extraction

• Document splitting and concatenation

• Link & popularity analysis

• Dupe & near-dupe detection

[Diagram labels: Index; security; category; metadata]
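Two of the content processing steps listed above, normalization and near-dupe detection, can be sketched as follows. This is an illustration under stated assumptions: the synonym table, the date layouts, the three-word shingle size, and the 0.8 threshold are all hypothetical choices, and word shingling with Jaccard similarity is only one of several near-duplicate techniques.

```python
import re
from datetime import datetime

# Hypothetical synonym table; a real deployment would load a curated thesaurus.
SYNONYMS = {"ibm corp": "ibm", "international business machines": "ibm"}

# Date layouts tried in order; extend for the content at hand.
DATE_FORMATS = ("%Y-%m-%d", "%d-%b-%Y", "%B %d, %Y")


def normalize_date(text):
    """Return an ISO 8601 date string, or None if no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_name(text):
    """Lower-case, collapse whitespace, then map known synonyms to one form."""
    key = re.sub(r"\s+", " ", text.strip().lower())
    return SYNONYMS.get(key, key)


def shingles(text, size=3):
    """Set of overlapping word n-grams used as a cheap document fingerprint."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_near_duplicate(doc_a, doc_b, threshold=0.8):
    """True when the two texts share at least `threshold` of their shingles."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold
```

Each function is small on its own; the attention to detail lies in the tables and thresholds, which usually need tuning per content set.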

11

Future Directions for Search

So what will search architectures look like in the future?

Important influences:

• The business need for organizational and analytical agility

• The convergence of search and (“big data”) analytics

• Continual growth in data volumes, and evolution in repository / storage fashions

12

Converging Architectures

Let’s take a brief look at:

1. The “Big Data Architecture”, as evangelized by IBM, Cloudera, etc.

2. Recent Search Architectures

Background Info

13

The Big Data Architecture

Designed for Structured Data

14

The Traditional Search Architecture

[Diagram: Content Sources (Employee Directory, CMS, File Share, etc.) feeding an Integrated Search Engine (Connectors, Index Pipeline, Search Index, UI)]

Designed for Unstructured Content

15

The Traditional Search Architecture

[Diagram: the same traditional architecture as on the previous slide]

• As data volumes grow, re-indexing becomes challenging

• The rate at which content can be acquired from repositories is usually the bottleneck

Designed for Unstructured Content

16

The Traditional Search Architecture

[Diagram: the same traditional architecture, annotated "RE-INDEX"]

• A few documents per second?

• There are only 2.6 million seconds in a month
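The arithmetic behind this slide can be checked with a short calculation. The acquisition rates and the 50-million-document corpus below are hypothetical figures chosen for illustration; only the seconds-per-month number comes from the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000, i.e. about 2.6 million


def docs_per_month(docs_per_second):
    """Documents that can be acquired in a 30-day month at a steady rate."""
    return docs_per_second * SECONDS_PER_MONTH


def days_to_reindex(total_docs, docs_per_second):
    """Days needed to re-acquire the whole corpus from its repositories."""
    return total_docs / docs_per_second / SECONDS_PER_DAY


# Hypothetical rates and corpus size, for illustration only.
for rate in (1, 5, 50):
    print(f"{rate:>3} docs/s: {docs_per_month(rate):>12,} docs per month, "
          f"{days_to_reindex(50_000_000, rate):8.1f} days for 50M docs")
```

At 5 documents per second a month covers roughly 13 million documents, so re-acquiring a 50-million-document corpus from the repositories would take almost four months.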

17

A Better Search Architecture

• Re-indexing rates greatly improved

• “Touch-time” with repositories can be managed autonomously

[Diagram labels: Content Sources (Employee Directory, CMS, etc.); Connectors; Secure Cache; Content Processing; Iterative Development; RE-INDEX; Search Engine (Index Pipeline, Search Index)]
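The idea of this slide, a cache that sits between the connectors and the index so that re-indexing no longer touches the repositories, can be sketched as below. `SlowRepository` and `ContentCache` are hypothetical stand-ins written for this example; the sketch leaves out the access-control data and encryption that a secure cache would need.

```python
import time


class SlowRepository:
    """Stand-in for a CMS or file share; each fetch is counted and delayed."""

    def __init__(self, docs, delay=0.0):
        self._docs = dict(docs)
        self._delay = delay
        self.fetch_count = 0

    def ids(self):
        return list(self._docs)

    def fetch(self, doc_id):
        time.sleep(self._delay)  # simulates slow acquisition
        self.fetch_count += 1
        return self._docs[doc_id]


class ContentCache:
    """Holds raw content so that processing and indexing can be repeated."""

    def __init__(self):
        self._store = {}

    def refresh(self, repo):
        """Crawl the repository once, copying raw content into the cache."""
        for doc_id in repo.ids():
            self._store[doc_id] = repo.fetch(doc_id)

    def reindex(self, process):
        """Rebuild an index from cached content; the repository is not touched."""
        return {doc_id: process(raw) for doc_id, raw in self._store.items()}


repo = SlowRepository({"a": "Hello World", "b": "Enterprise Search"})
cache = ContentCache()
cache.refresh(repo)

# Two iterations of content processing, both served from the cache.
index_v1 = cache.reindex(str.lower)
index_v2 = cache.reindex(lambda text: text.lower().split())
print(repo.fetch_count)  # the repository was read once per document
```

Because each processing iteration reads only the cache, the pipeline can be changed and re-run as often as needed without adding load on the source systems.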

18

The Future Architecture?

[Diagram: the components of "A Better Search Architecture" (Content Sources, Connectors, Secure Cache, Content Processing, Iterative Development, RE-INDEX, Search Engine), with a Hadoop label added]

• This environment will encourage ever more sophisticated text analytics

• We expect to see much innovation in text analytics during the next few years

• The deliverable is a better, and richer search index

19

An Established Architecture

[Diagram: the same Hadoop-based architecture as on the previous slide]

• Google.com has worked something like this since 2004

20

An Integrated Search/Analytics Architecture

[Diagram labels: Content Sources (CMS, File system, etc.) via Connectors; Data Sources (Data Warehouse, Logfiles, etc.) via ETL; Hadoop (Secure Cache, Content Processing, Iterative Development); Rapid Indexing; Search Apps; Analysis Apps]

• Encourages agile exploitation of data and content resources
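As a toy illustration of one processed content set serving both a search application and an analysis application, the sketch below builds an inverted index and a category count from the same records. The records, field names, and functions are hypothetical; a real deployment would use a search engine and an analytics framework rather than in-memory dictionaries.

```python
from collections import Counter, defaultdict

# Hypothetical output of the content processing stage.
RECORDS = [
    {"id": "d1", "text": "solr cluster upgrade notes", "category": "ops"},
    {"id": "d2", "text": "quarterly sales report", "category": "finance"},
    {"id": "d3", "text": "elasticsearch cluster sizing", "category": "ops"},
]


def build_inverted_index(records):
    """Search view: map each term to the ids of the records containing it."""
    index = defaultdict(set)
    for rec in records:
        for term in rec["text"].lower().split():
            index[term].add(rec["id"])
    return index


def search(index, query):
    """Return ids of records containing every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


def category_counts(records):
    """Analysis view: number of records per category."""
    return Counter(rec["category"] for rec in records)


index = build_inverted_index(RECORDS)
print(sorted(search(index, "cluster")))
print(category_counts(RECORDS))
```

Both views are derived from the same prepared records, so improving the content processing once benefits every application built on top of it.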

21

Summary 1

• Search and Big Data applications are tending towards the same architecture

• Autonomous connectivity and content processing simplify and de-risk search projects, if you can get them right

• The foundation of great search is still a clean, rich and detailed index

• The “search index” itself is a mature technology, almost a commodity

• Much of the innovation during the next few years will be in text analytics, and other methods of preparing content prior to indexing

22

The compulsory analyst quote….

And finally….

“Enterprise Search Can Bring Big Data Within Reach”

• Multiple, purpose-built indexes that are derived from enriched content are necessary.

http://blogs.gartner.com/darin-stewart/2014/04/01/enterprise-search-can-bring-big-data-within-reach/

* Darin Stewart, Enterprise Search Can Bring Big Data Within Reach, April 2014 Blog

23

The Enterprise Search Market in a Nutshell

Iain Fletcher

ifletcher@Searchtechnologies.com

October 20, 2015

Questions?

24

Spare Slides

25

Reference Architecture

[Diagram labels: Content sources; Connectors; Staging Repository; Big Data Framework; Content Processing Pipelines; Semantics; Text Mining; Quality Metrics; Indexes; Search Engine; Query parsing; Web Browser]

26

Where is the Focus?

• The Business View

• The Implementation View

[Diagram, shown once for each view: Content Capture & Preparation; Data Store / Index; Application]
