SharePoint 2013 Search Topology and Optimization

Search Topology and Optimization April 12, 2013 Mike Maadarani SharePoint Architect

Upload: mike-maadarani

Post on 15-Jan-2015


DESCRIPTION

In this presentation, I explain all of the search components, how to properly configure the search topology, and the options for extending the search farm in a hybrid “cloud/on-premises” scenario. The presentation covers what you need to consider when designing search to meet your organization's needs. We will dive into scripting a high-availability search topology, keeping it healthy, and managing your day-to-day search operations. Learn how to optimize search for performance and relevancy to support reliable search applications.

TRANSCRIPT

Page 1: SharePoint 2013 Search Topology and Optimization

Search Topology and Optimization
April 12, 2013

Mike Maadarani
SharePoint Architect

Page 2: SharePoint 2013 Search Topology and Optimization

Bio..

Mike Maadarani
• App Dev and Architecture for over 18 years (15 years Microsoft, 3 years with the “Other Guys”)
• Business focused on Enterprise Content Management, Publishing Sites, & Search
• Technology focused on SharePoint, SQL Server and SharePoint Integration
• Architect, trainer, and presenter
• Blog: www.maadarani.com
• [email protected]; @mikemaadarani

Page 3: SharePoint 2013 Search Topology and Optimization

Agenda

• SharePoint 2013 Search Overview
• Architecture and Resource Utilization
• Configuring SSA and PS
• Topology Scenarios
• Hybrid… Say What?
• Relevancy, Query Builder, & Optimization
• Closing and Q&A

Page 4: SharePoint 2013 Search Topology and Optimization

Search in 2010

[Diagram: SharePoint 2010 Search Service Application. Labels: Crawl Component, Query Component, Crawl Indexing Engine, Query Engine, Search Admin, Property Store (SQL), Content, User, WFE.]

Page 5: SharePoint 2013 Search Topology and Optimization

FAST Search for SharePoint 2010

[Diagram labels: FAST Content SSA, FAST Query SSA, FAST back-end components (managed separately), Crawl Indexing Engine, Query Engine, Content Pipeline, Analysis Engine, Query Pipeline, Search Admin, Content, User, WFE.]

Extensibility:
• Sandbox
• Entity Extraction

Page 6: SharePoint 2013 Search Topology and Optimization

… In SharePoint 2013

[Diagram: SharePoint 2013 Search Service Application. Components: Crawl Component, Content Processing Component, Index Component, Query Processing Component, Analytics Processing Component, Admin Component. Other labels: Crawl Indexing Engine, Content Pipeline, Analysis Engine, Query Engine, Query Pipeline, Search Admin, Property Store (SQL), Content, User, WFE.]

Callouts:
• Entire index on local disk
• Link/query analysis & recommendations
• Separate crawl and indexing

Extensibility:
• Web callout
• Entity Extraction

Page 7: SharePoint 2013 Search Topology and Optimization

SharePoint 2013 Search Architecture

[Diagram: search topology components. Labels: Search Admin, Crawl, Content Processing, Index, Query Processing, Analytics Processing, WFE, API, Public API, FAST Search Index. Database labels: Crawl, Search Admin, Link, Analytics Reporting.]

Content side: HTTP, File shares, SharePoint, User profiles, Lotus Notes, Documentum, Exchange folders, Custom - BCS

Query side: SharePoint, SP Apps, Devices, Non-SP UX

Page 8: SharePoint 2013 Search Topology and Optimization

Why is Search so important?

I just uploaded a document.

Make it searchable, quick!

FAST

Page 9: SharePoint 2013 Search Topology and Optimization

Why is Search so important?

EASY

Page 10: SharePoint 2013 Search Topology and Optimization

Why is Search so important?

EASY

Page 11: SharePoint 2013 Search Topology and Optimization

Why is Search so important?

Search Driven Applications

Page 12: SharePoint 2013 Search Topology and Optimization

Why is Search so important?

Search Everything

I can find ALL of Rob Ford’s hidden videos!

Page 13: SharePoint 2013 Search Topology and Optimization

Where does Search live in the farm?

Windows services
• SharePoint Search Host Controller service (hostcontrollerservice.exe): runtime/lifecycle control of search components (except crawler)
• SharePoint Server Search service (mssearch.exe, mssdmn.exe): still there, but only hosts the Crawl Component

Processes
• noderunner.exe: runtime environment for search components (except crawler)

Admin entities
• Search Service Instance: provisioning of the search service on each box
• Search Service Application: SharePoint configuration entity

[Diagram: SharePoint App Server. Host Controller (hostcontrollerservice.exe); Crawl Component (mssearch.exe, mssdmn.exe); Search Runtime Environment with noderunner.exe processes for the Admin Component, Query Processing Component, Content Processing Component, Index Component, and Analytics Processing Component.]

Page 14: SharePoint 2013 Search Topology and Optimization

Where do I host my components?

Page 15: SharePoint 2013 Search Topology and Optimization

Query processing component (QPC)

CPU load driving factors:
• QPS
• Query transformations

Network load driving factors:
• Number of index partitions
• Size of queries and results
Example: 20 index partitions @ 20 QPS => 200/100 Mbit/s in/outbound

[Chart: relative load impact of item count, DPS, and QPS on CPU, network, and disk]

Reference: http://social.technet.microsoft.com/wiki/contents/articles/16002.sharepoint-2013-capacity-planning-sizing-and-high-availability-for-search-in-spc172.aspx

Page 16: SharePoint 2013 Search Topology and Optimization

Index component

CPU load driving factors:
• QPS and item count
Guidelines per index component @ 2 GHz CPU:
• 1M items: 5 QPS per CPU core
• 5M items: 2 QPS per CPU core
• 10M items: 1 QPS per CPU core

Disk load driving factors:
• QPS and item count
• New content invalidates caches
Disk size: 500 GB @ 10M items per index component

[Chart: relative load impact of item count, DPS, and QPS on CPU, network, and disk]
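The index component guidelines above can be turned into a rough sizing helper. This is only a sketch: the function names are hypothetical, the lookup rounds up to the next guideline tier instead of interpolating, and the disk estimate assumes that the 500 GB @ 10M items figure scales linearly, which the slide does not state.

```powershell
# Guideline tiers from the slide: items per index component (millions) -> QPS per 2 GHz CPU core.
$IndexQpsPerCore = @(
    @{ MaxItemsMillions = 1;  QpsPerCore = 5 },
    @{ MaxItemsMillions = 5;  QpsPerCore = 2 },
    @{ MaxItemsMillions = 10; QpsPerCore = 1 }
)

# Hypothetical helper: CPU cores needed by one index component for a target query rate.
function Get-IndexCoreEstimate {
    param(
        [Parameter(Mandatory=$true)][double]$ItemsMillions,
        [Parameter(Mandatory=$true)][double]$TargetQps
    )
    foreach ($tier in $IndexQpsPerCore) {
        # Use the first tier that covers the item count (conservative rounding up).
        if ($ItemsMillions -le $tier.MaxItemsMillions) {
            return [int][math]::Ceiling($TargetQps / $tier.QpsPerCore)
        }
    }
    throw "More than 10M items per index component is outside the slide's guidelines."
}

# Hypothetical helper: disk size in GB, ASSUMING linear scaling of 500 GB per 10M items.
function Get-IndexDiskEstimateGB {
    param([Parameter(Mandatory=$true)][double]$ItemsMillions)
    return $ItemsMillions * 50
}

# Example: 5M items at 10 QPS -> 2 QPS per core -> 5 cores.
Get-IndexCoreEstimate -ItemsMillions 5 -TargetQps 10
```

Treat the result as a starting point for load testing, not a substitute for it.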

Page 17: SharePoint 2013 Search Topology and Optimization

Crawl component

CPU load driving factors:
• Documents per second
• Link discovery
• Crawl management

Network load driving factors:
• Downloading items from content sources
• Passing items on to CPC

Disk load: all documents are temporarily stored in the data folder

[Chart: relative load impact of item count, DPS, and QPS on CPU, network, and disk]

Page 18: SharePoint 2013 Search Topology and Optimization

Content processing component (CPC)

CPU load driving factors:
• Documents per second
• Document size and complexity
• Feature extraction
Estimate: 5-10 DPS per CPU core

Network load driving factors:
• Documents per second
• Document size

[Chart: relative load impact of item count, DPS, and QPS on CPU, network, and disk]
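As a worked example of the 5-10 DPS per CPU core estimate, the sketch below computes how long a full crawl might take. The function name is hypothetical, and it assumes that content processing is the bottleneck and that throughput scales linearly with cores, neither of which the slide guarantees.

```powershell
# Hypothetical helper: rough full-crawl duration from the slide's 5-10 DPS per core estimate.
# ASSUMPTION: the CPC is the bottleneck and throughput scales linearly with core count.
function Get-FullCrawlHours {
    param(
        [Parameter(Mandatory=$true)][double]$Items,
        [Parameter(Mandatory=$true)][int]$CpcCores
    )
    $slowHours = $Items / ($CpcCores * 5)  / 3600   # 5 documents per second per core
    $fastHours = $Items / ($CpcCores * 10) / 3600   # 10 documents per second per core
    [pscustomobject]@{
        WorstCaseHours = [math]::Round($slowHours, 1)
        BestCaseHours  = [math]::Round($fastHours, 1)
    }
}

# Example: 10M items on 8 CPC cores -> roughly 34.7 to 69.4 hours.
Get-FullCrawlHours -Items 10000000 -CpcCores 8
```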

Page 19: SharePoint 2013 Search Topology and Optimization

Analytics processing component (APC)

CPU load driving factors:
• Number of items
• Site activity

Disk load:
• Local disk used for temporary storage
• Bulk load; the primary concern is load isolation

Network load:
• Same driving factors as for CPU load
• PLUS: network traffic increases when distributing APC across multiple machines

[Chart: relative load impact of item count, DPS, and QPS on CPU, network, and disk]

Page 20: SharePoint 2013 Search Topology and Optimization

Search administration component

• Low CPU and network load
• Load increases with more components in the search topology

[Chart: relative load impact of item count, DPS, and QPS on CPU, network, and disk]

Page 21: SharePoint 2013 Search Topology and Optimization

Create your SSA

$SSADB = "SharePoint_Demo_Search"
$SSAName = "Search Service Application SPS Boston"
$SVCAcct = "mcm\sp_search"   # must already be registered as a managed account
$SSI = Get-SPEnterpriseSearchServiceInstance -Local

#1. Start the search services for SSI
Start-SPEnterpriseSearchServiceInstance -Identity $SSI

#2. Create the Application Pool (quote the whole name so it is passed as one argument)
$AppPool = New-SPServiceApplicationPool -Name "${SSAName}-AppPool" -Account $SVCAcct

#3. Create the search application and set it to a variable
$SearchApp = New-SPEnterpriseSearchServiceApplication -Name $SSAName -ApplicationPool $AppPool -DatabaseServer SQL2012 -DatabaseName $SSADB

#4. Create search service application proxy
$SSAProxy = New-SPEnterpriseSearchServiceApplicationProxy -Name "${SSAName} Application Proxy" -Uri $SearchApp.Uri.AbsoluteURI

#5. Provision Search Admin Component
Set-SPEnterpriseSearchAdministrationComponent -SearchApplication $SearchApp -SearchServiceInstance $SSI

#6. Create the topology
$Topology = New-SPEnterpriseSearchTopology -SearchApplication $SearchApp

#7. Assign server(s) to the topology
$hostApp1 = Get-SPEnterpriseSearchServiceInstance -Identity "SPWFE"
New-SPEnterpriseSearchAdminComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchCrawlComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchAnalyticsProcessingComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchIndexComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1 -IndexPartition 0

#8. Activate the topology
$Topology | Set-SPEnterpriseSearchTopology
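Once the topology has been activated, it is worth checking that every component actually came up. The sketch below is one way to do that from the SharePoint Management Shell. It is not part of the original deck, and it assumes the farm has a single Search Service Application, so that `Get-SPEnterpriseSearchServiceApplication` returns exactly one object.

```powershell
# Load the SharePoint cmdlets if this is a plain PowerShell session.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# ASSUMPTION: exactly one Search Service Application exists in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Health state of each search component (Active, Degraded, Failed, ...).
Get-SPEnterpriseSearchStatus -SearchApplication $ssa -Text

# Which component runs on which server in the active topology.
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
Get-SPEnterpriseSearchComponent -SearchTopology $active | Select-Object Name, ServerName
```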

Page 22: SharePoint 2013 Search Topology and Optimization

Small Search Topology

Page 23: SharePoint 2013 Search Topology and Optimization

Fault tolerant small search topology

[Diagram: two hosts, each running two VMs. On each host, one VM holds Index and QPC, and the other VM holds Admin, Crawl, CPC, and APC.]

Page 24: SharePoint 2013 Search Topology and Optimization

Small search farm (up to 10M items)

[Diagram labels: Web front end, Other SharePoint applications, Admin, Crawl, CPC, APC, Index, QPC. Callouts: Sized independently; Separate disk for index.]

Resources @ 10M items:
• 8x CPU cores
• 24 GB RAM
• 800 GB disk

Page 25: SharePoint 2013 Search Topology and Optimization

Scaling from small to medium search topology

[Diagram: the topology drawn twice, each copy with Admin, Crawl, two CPC, APC, QPC, and four Index components.]

Page 26: SharePoint 2013 Search Topology and Optimization

Extend your SSA

#2. Extend the Search Topology:
$hostApp1 = Get-SPEnterpriseSearchServiceInstance -Identity "SPWFE"
$hostApp2 = Get-SPEnterpriseSearchServiceInstance -Identity "SPSearch"
Start-SPEnterpriseSearchServiceInstance -Identity $hostApp1
Start-SPEnterpriseSearchServiceInstance -Identity $hostApp2

#3. Keep running these commands until the Status is Online:
Get-SPEnterpriseSearchServiceInstance -Identity $hostApp1
Get-SPEnterpriseSearchServiceInstance -Identity $hostApp2

#4. Once the status is Online, you can proceed with the following commands:
$ssa = Get-SPEnterpriseSearchServiceApplication
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
$newTopology = New-SPEnterpriseSearchTopology -SearchApplication $ssa

# Components on the first server (index partition 0)
New-SPEnterpriseSearchAdminComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchCrawlComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchAnalyticsProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchIndexComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1 -IndexPartition 0

# Components on the second server (index partition 1)
New-SPEnterpriseSearchAdminComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2
New-SPEnterpriseSearchCrawlComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2
New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2
New-SPEnterpriseSearchAnalyticsProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2
New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2
New-SPEnterpriseSearchIndexComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2 -IndexPartition 1

#5. Activate the topology:
Set-SPEnterpriseSearchTopology -Identity $newTopology
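Step #3 above asks you to rerun a command by hand until the service instances report Online. A small polling helper can do the waiting instead. This is a sketch only: `Wait-ForCondition` is a hypothetical name, the timeout and poll interval are arbitrary defaults, and the SharePoint part only runs where the SharePoint cmdlets are loaded.

```powershell
# Hypothetical helper: poll a condition until it is true or the timeout expires.
function Wait-ForCondition {
    param(
        [Parameter(Mandatory=$true)][scriptblock]$Condition,
        [int]$TimeoutSeconds = 600,
        [int]$PollSeconds = 10
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while (-not (& $Condition)) {
        if ((Get-Date) -ge $deadline) { return $false }   # gave up waiting
        Start-Sleep -Seconds $PollSeconds
    }
    return $true
}

# Usage on a farm server (skipped when the SharePoint cmdlets are not available).
if (Get-Command Get-SPEnterpriseSearchServiceInstance -ErrorAction SilentlyContinue) {
    foreach ($server in "SPWFE", "SPSearch") {
        $online = Wait-ForCondition -Condition {
            (Get-SPEnterpriseSearchServiceInstance -Identity $server).Status -eq "Online"
        }
        if (-not $online) { throw "Search service instance on $server did not come online." }
    }
}
```

Separating the wait loop from the SharePoint call keeps the helper reusable, for example to wait for a newly activated topology to report healthy components.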

Page 27: SharePoint 2013 Search Topology and Optimization

Medium Search Topology

Page 28: SharePoint 2013 Search Topology and Optimization

Hybrid Search

Page 29: SharePoint 2013 Search Topology and Optimization

Why Hybrid Search?

Hybrid SharePoint environment

Pieces of content distributed across multiple environments

Complexity due to multiple locations

Many top level domains requiring knowledge of where to go to locate the most relevant content

No single Enterprise Search Center for finding content

Lost user productivity and added frustration while trying to locate relevant content

Page 30: SharePoint 2013 Search Topology and Optimization

Benefits

Provide integrated search results allowing for a single place to find content

• One Enterprise Search center to reduce User Interface complexity
• Query all of your SharePoint content at the same time
• Allow O365 and On-Premises solutions to coexist
• Provides a solution allowing customers to move to the cloud on their own terms
• Reduce operation cost
• Take advantage of newer SharePoint feature updates in O365
• Hybrid search solves many problems as data is moving from on-premises to O365

Page 31: SharePoint 2013 Search Topology and Optimization

One-way outbound topology

[Diagram: an Office365 tenant (SharePoint Online site collection, Microsoft data center) and an on-premises SharePoint Server 2013 farm (WFE), connected over the Internet, with outbound and inbound directions marked.]

• SharePoint Server can query SharePoint Online: hybrid search results on-premises
• SharePoint Online can NOT query SharePoint on-premises: local search results only in SharePoint Online

Page 32: SharePoint 2013 Search Topology and Optimization

One-way inbound topology

[Diagram: an Office365 tenant (SharePoint Online site collection, Microsoft data center) and an on-premises SharePoint Server 2013 farm (WFE), connected over the Internet through a reverse proxy in the DMZ, with outbound and inbound directions marked.]

• SharePoint Online can query SharePoint on-premises: hybrid search results in SharePoint Online
• SharePoint Server can NOT query SharePoint Online: local search results only on-premises

Page 33: SharePoint 2013 Search Topology and Optimization

Two-way (bidirectional) topology

[Diagram: an Office365 tenant (SharePoint Online site collection, Microsoft data center) and an on-premises SharePoint Server 2013 farm (WFE), connected over the Internet through a reverse proxy in the DMZ, with outbound and inbound directions marked.]

• SharePoint Online can query SharePoint on-premises
• SharePoint Server can query SharePoint Online
• Hybrid search results are available in both environments

Page 34: SharePoint 2013 Search Topology and Optimization

Tweaking Your results

Page 35: SharePoint 2013 Search Topology and Optimization

Challenges: Intent

Where is my talk Project Plan?

Are Documents held at the same place?

I wonder if there are references from previous projects?

Different people have different intents

Query Rules help you handle intents

There is rarely a single right answer

Infrastructure Project

Page 36: SharePoint 2013 Search Topology and Optimization

Authorities: SSA-level configuration

Sites that are important

Sites with low intrinsic relevance

Takes ~24hrs to propagate

Page 37: SharePoint 2013 Search Topology and Optimization

Authorities: Connected

Page 38: SharePoint 2013 Search Topology and Optimization

Authorities: Connected

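[Diagram: a graph of hyperlinked sites, each labeled with its distance from the authority (0 for the authority itself, then 1, 2, 3, 4).]

Setting an authority affects all sites connected through hyperlinks

Sites are weighted by distance to the authority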
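The idea of weighting sites by their distance to an authority can be illustrated with a breadth-first search over a link graph. This is a conceptual sketch, not SharePoint's ranking code: the function name, the link table, and the site names are all made up for the example.

```powershell
# Hypothetical helper: number of hyperlink hops from the nearest authority to each reachable site.
function Get-ClickDistance {
    param(
        [Parameter(Mandatory=$true)][hashtable]$Links,       # site -> array of sites it links to
        [Parameter(Mandatory=$true)][string[]]$Authorities
    )
    $distance = @{}
    $queue = New-Object System.Collections.Queue
    foreach ($authority in $Authorities) {
        $distance[$authority] = 0          # an authority is at distance 0 from itself
        $queue.Enqueue($authority)
    }
    while ($queue.Count -gt 0) {
        $site = $queue.Dequeue()
        foreach ($next in @($Links[$site])) {
            # The first visit is the shortest path, because BFS expands level by level.
            if ($next -and -not $distance.ContainsKey($next)) {
                $distance[$next] = $distance[$site] + 1
                $queue.Enqueue($next)
            }
        }
    }
    return $distance
}

# Made-up link graph: "hr" is the authority.
$links = @{
    "hr"       = @("policies", "benefits")
    "policies" = @("forms")
    "benefits" = @("forms")
    "forms"    = @()
}
$d = Get-ClickDistance -Links $links -Authorities "hr"
$d["forms"]   # 2 hops from the authority
```

Sites that are not reachable from any authority simply do not appear in the result, which mirrors the slide's point that only sites connected through hyperlinks are affected.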

Page 39: SharePoint 2013 Search Topology and Optimization

Query Rules

Tune Search Results

Created at the SSA, Tenant, Site Collection, or Site level

[Diagram: nested scopes SSA, Site Collection, Site]

Page 40: SharePoint 2013 Search Topology and Optimization

Query Rules

Condition: When do I apply the rule?

Action: What to do when the rule is matched?

Publishing: When should the rule be active?

Page 41: SharePoint 2013 Search Topology and Optimization

Query Rules

Conditions
• Exact match, beginning or end
• Ad-hoc or term store dictionary
• Match a regex (advanced)
• Is this query more likely aimed at the following source…?
• Do people mostly click on results of the following type…?

Actions
• Show a promoted result
• Show a block of results
• Replace the core results with a different query

Page 42: SharePoint 2013 Search Topology and Optimization

Query Builder

Dynamic ranking changes

Part of the query

Results Ranking

Page 43: SharePoint 2013 Search Topology and Optimization

Query Builder

Page 44: SharePoint 2013 Search Topology and Optimization

Configuration in the Conceptual Relevance Flow

For all queries:
• Authorities: Level 1: http://employment
• Ranking model: {incorporate user ratings}

Query: HR Employment quarterly report

[Diagram flow labels: Search Web Part, Query Processing Engine, Document Collection]

Thesaurus: HR, Human Resources
Best bets: HR Employment, /HR/employment

(WORDS HR, Human Resources) AND (WORDS employees, employed) AND (WORDS quarterly, quarterlies) AND (WORDS report, reports, reported)

Mixed Results for:
• HR Employment best bet
• HR Employment quarterly report
• HR Employment ContentType=reports

Dynamic Reordering Rules: Quarterly Report {prefer docs from http://reports}

Query Rule: {Terms} Quarterly Report {Terms} ContentType=“reports”

Page 45: SharePoint 2013 Search Topology and Optimization

Create a Query Rule – Hybrid

From Result Source drop-down list, select the specified result source

Under Query is performed on these sources, if you select “One of these sources”, make sure to select the result source you created

Page 46: SharePoint 2013 Search Topology and Optimization

Hybrid Results

Results from SharePoint Online

Results from SharePoint Server

Page 47: SharePoint 2013 Search Topology and Optimization

Session Objective and Takeaways

High Availability and Performance

Better Search Quality

Better management

Friendly results and tools

Page 48: SharePoint 2013 Search Topology and Optimization

Thank You!

www.maadarani.com, [email protected] , @mikemaadarani

www.slideshare.net/maadarani