Scaling search with SolrCloud

Post on 27-Jan-2015

DESCRIPTION

Enterprise search installations can grow big, really big, and they keep growing. Tens or even hundreds of servers may be involved, locally or in the cloud. Managing this has been complex and time-consuming - until now :) SolrCloud to the rescue. Using the world's most popular open source search engine, Apache Solr™, we will show you how the upcoming version 4.0 makes scaling search in the cloud simple and robust. A new feature called SolrCloud adds centralized configuration, distributed indexing and searching, automatic failover, recovery, and leader election. Scaling is now as simple as adding a new server to your cluster: it finds its role where it is most needed and starts serving searches.

TRANSCRIPT

Scaling search with SolrCloud

Jan Høydahl, Cominvent AS

Jan Høydahl

1995: Developer, telecom
1998: Java developer
2000: Search - FAST
2006: Lucene
2007: new Cominvent()
2009: Lucene/Solr
2011: Lucene committer
2012: Lucene PMC

> 100 projects

About Cominvent

Business critical search

Domain knowledge & best practices:

• Consulting
• Training
• Support

Tailored training & consulting

• Introduction to Solr (non-tech)
• Introduction to Solr (tech)
• Solr Developer
• Scaling and tuning
• Developing Solr Plugins

Next course in Oslo: [calendar: September 2012]

SolrTraining.com

CommunityZone talk: «Solr 101»

Thursday 14:20

Next MeetUp:

http://www.meetup.com/Oslo-Solr-Community

ApacheCon Europe 2012

• Sinsheim, Germany
• November 5-8
• Lucene/Solr track
• www.apachecon.eu

Agenda

• Intro to Solr
• Scaling search - before
• Introduction to SolrCloud
• Demo with Wikipedia data
• Plans for Solr going forward
• Q&A

Intro to Solr

Apache Solr

Search Server

Completely HTTP based
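Because everything goes over HTTP, a search is just a URL. The following is a minimal sketch of building one, assuming a Solr instance at the hypothetical address http://localhost:8983/solr and the standard /select handler; the host, port, field name and query text are illustrative and do not come from the slides.

```python
from urllib.parse import urlencode


def select_url(base_url, query, rows=10):
    """Build a Solr search URL; base_url is a hypothetical Solr address."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical host and query, for illustration only.
url = select_url("http://localhost:8983/solr", "title:solr")
print(url)
```

Any HTTP client (a browser, curl, or a library) can then fetch that URL and parse the response.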

Areas of use

Example: e-commerce

www.libris.no

Boosting by function

Boosting on review popularity and sales numbers:

log(sum(popularity,numsold))
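To see why the boost is wrapped in log(), the function can be reproduced outside Solr. This is a small sketch, not Solr code: it relies on Solr's log() function being base 10 and sum() adding its arguments, and the field values below are made up for illustration.

```python
import math


def function_boost(popularity, numsold):
    """Mirror log(sum(popularity,numsold)); Solr's log() is base 10."""
    return math.log10(popularity + numsold)


# Hypothetical field values: a 100x difference in raw counts
# shrinks to a small additive difference in the boost.
niche = function_boost(popularity=5, numsold=5)            # log10(10)
bestseller = function_boost(popularity=400, numsold=600)   # log10(1000)
print(niche, bestseller)
```

The logarithm keeps a runaway bestseller from drowning out text relevance while still ranking it above a niche title.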

Auto suggest & phonetic normalization

Example: classifieds/auctions

www.finn.no

Who uses Apache Lucene/Solr™?

...and many more: http://wiki.apache.org/solr/PublicServers

Versions

• Current stable = 3.6.1
• Latest release = 4.0-beta
• Next release = 4.0-FINAL --- «soon» :-)

Release timeline:

01/2007  v1.1
09/2008  v1.3
11/2009  v1.4
03/2011  v3.1
06/2011  v3.3
04/2012  v3.6
06/2012  v4.0-alpha
07/2012  v3.6.1
08/2012  v4.0-beta
??/2012  v4.0

Scaling search

Why scale?

• One single Solr server handles...
  – millions of documents (per shard)
  – hundreds of queries per second (per replica)
• We need to scale if...
  – data volume increases
  – query volume increases
  – we need high availability / fault tolerance
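The three reasons to scale map onto a simple back-of-the-envelope calculation: data volume drives the number of shards, query volume drives the number of copies of each shard, and high availability means at least two copies. The sketch below assumes every query touches one copy of every shard; all capacity figures are hypothetical placeholders, not measurements from the talk.

```python
import math


def cluster_size(total_docs, peak_qps, docs_per_shard, qps_per_replica,
                 high_availability=True):
    """Estimate shard count, copies per shard and total nodes."""
    shards = math.ceil(total_docs / docs_per_shard)   # data volume
    copies = math.ceil(peak_qps / qps_per_replica)    # query volume
    if high_availability:
        copies = max(copies, 2)                       # survive one node loss
    return shards, copies, shards * copies


# Hypothetical workload and per-node capacities.
shards, copies, nodes = cluster_size(
    total_docs=50_000_000, peak_qps=450,
    docs_per_shard=10_000_000, qps_per_replica=200)
print(shards, copies, nodes)
```

Real capacity per shard and per replica depends heavily on document size, query complexity and hardware, so the inputs have to come from benchmarking.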

Scaling search - before

Solr shard 1
- config, schema
- synonyms

Solr 1 replica
- config, schema
- synonyms

Solr shard 2
- config, schema
- synonyms

Solr 2 replica
- config, schema
- synonyms

- Add shard node
- Manually copy config
- Manually index to right shard
- Manually set the shards query parameter

- Add replica node
- Copy config
- Setup poll based replication
- No indexing failover
- Monitor every node
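The manual steps for sharding can be sketched in client code. This is an illustration under stated assumptions, not the speaker's setup: the host names are hypothetical, and hashing the id with crc32 modulo the shard count is just one simple way a client might pick "the right shard". The shards request parameter itself is Solr's standard way of listing the shards for a distributed query.

```python
import zlib
from urllib.parse import urlencode

SHARDS = ["solr1:8983/solr", "solr2:8983/solr"]  # hypothetical hosts


def shard_for(doc_id, shards=SHARDS):
    """Pick the shard to index a document to, by hashing its id."""
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def distributed_query_url(query, shards=SHARDS):
    """Query one node and list every shard in the shards parameter."""
    params = {"q": query, "shards": ",".join(shards)}
    return "http://" + shards[0] + "/select?" + urlencode(params)


print(shard_for("doc-1"))
print(distributed_query_url("solr"))
```

Every client has to carry the shard list, so adding a shard means touching all indexers and all query front ends.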

SolrCloud

What is SolrCloud?

• New in Solr 4.0
• Easier scaling
• Centralized config
• Fault tolerant indexing and querying
• Using Apache ZooKeeper as «registry»

ZooKeeper: «Because coordinating distributed systems is a Zoo»

What is SolrCloud

Logical collection

Transaction log

Soft commit

Scaling search - with SolrCloud

Solr master 1 (ZK aware)
Solr replica 1 (ZK aware)
Solr master 2 (ZK aware)
Solr replica 2 (ZK aware)
Apache ZooKeeper

- Add shard node, point it to ZK
- It assumes the role of shard 2
- Automatic document distribution
- Automatic querying across cluster
- Centralized config & monitoring

- Add replica node(s)
- Auto role assignment
- Push based replication
- Indexing failover
- Leader election through ZK
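Leader election through ZK can be illustrated with the general ZooKeeper recipe: each candidate registers an ephemeral, sequentially numbered entry, the lowest number leads, and when the leader's entry vanishes the next lowest takes over. The sketch below is an in-memory simulation of that idea only; it does not talk to ZooKeeper, it is not Solr's implementation, and the class and node names are hypothetical.

```python
import itertools


class ElectionSim:
    """In-memory stand-in for ZooKeeper ephemeral sequential nodes."""

    def __init__(self):
        self._seq = itertools.count()
        self._candidates = {}  # sequence number -> node name

    def join(self, node):
        """A node registers and receives the next sequence number."""
        self._candidates[next(self._seq)] = node

    def leave(self, node):
        """A node dies; its ephemeral entry disappears."""
        self._candidates = {s: n for s, n in self._candidates.items()
                            if n != node}

    def leader(self):
        """The live node with the lowest sequence number leads."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


shard2 = ElectionSim()
shard2.join("node-b")    # hypothetical node names
shard2.join("node-d")
first = shard2.leader()
shard2.leave("node-b")   # the current leader fails
second = shard2.leader()
print(first, second)
```

No operator action is needed for the handover, which is what makes indexing failover possible.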

Configuration

Solr master 1, Solr replica 1, Solr master 2, Solr replica 2 (all ZK aware), ZK

First node:
-DzkRun
-Dcollection.configName=jz
-DnumShards=2
-Dbootstrap_confdir=./solr/coll/conf

Other three nodes:
-DzkHost=localhost:xxxx
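The flags above are ordinary Java system properties, so starting a node comes down to assembling a command line. The helper below is a hypothetical sketch: the function name is invented, and launching with "java ... -jar start.jar" is an assumption based on how the Solr 4.0 example distribution is started, not something shown on the slide. The ZooKeeper port is left elided exactly as in the slide.

```python
def solr_start_command(zk_host=None, num_shards=None,
                       config_name=None, confdir=None):
    """Assemble a start command; start.jar is an assumption."""
    cmd = ["java"]
    if zk_host is None:
        # First node: run embedded ZooKeeper and upload the config.
        cmd += ["-DzkRun",
                "-Dcollection.configName=" + config_name,
                "-DnumShards=" + str(num_shards),
                "-Dbootstrap_confdir=" + confdir]
    else:
        # Every other node only needs to know where ZooKeeper is.
        cmd.append("-DzkHost=" + zk_host)
    return cmd + ["-jar", "start.jar"]


first = solr_start_command(num_shards=2, config_name="jz",
                           confdir="./solr/coll/conf")
other = solr_start_command(zk_host="localhost:xxxx")  # port elided on slide
print(" ".join(first))
print(" ".join(other))
```

Only the first node carries cluster-wide settings; later nodes fetch everything else from ZooKeeper.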

Demo: indexing & querying

Solr 4.0 and beyond

• Other news in v4.0 FINAL (expected later this autumn)
  – NRT
  – Real-time GET
  – Smaller index & memory footprint
  – New «modern» Admin GUI
  – Incremental updates
  – Pseudo-join
• Future plans
  – More shard distribution mechanisms
  – Re-balancing cluster (split shards)
  – ...
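As an illustration of one item in the list, here is what an incremental update request body could look like, assuming "incremental updates" refers to what the Solr documentation calls atomic updates, where a field value is a small object with a modifier such as "set" or "inc". The helper name, document id and field name are hypothetical; the sketch only builds the JSON and sends nothing.

```python
import json


def atomic_update(doc_id, set_fields=None, inc_fields=None):
    """Build a JSON body that changes only the named fields."""
    doc = {"id": doc_id}
    for field, value in (set_fields or {}).items():
        doc[field] = {"set": value}    # replace the field value
    for field, amount in (inc_fields or {}).items():
        doc[field] = {"inc": amount}   # add to a numeric field
    return json.dumps([doc])


# Hypothetical document and field: record one more sale.
payload = atomic_update("book-1", inc_fields={"numsold": 1})
print(payload)
```

The client sends only the changed fields instead of re-submitting the whole document.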

Recap

• Apache Solr: open source enterprise search
• Scaling Solr was hard
• Solr 4.0 with SolrCloud makes it easy :)
  – Centralized config
  – Effortless scaling of cluster
  – Fault tolerant indexing & querying
• Download the 4.0-beta today, 4.0-FINAL soon

Remember

Next Solr course in Oslo: [calendar: September 2012]

www.solrkurs.no

CommunityZone talk: «Solr 101»

Thursday 14:20

Jan Høydahl, Cominvent AS, @cominvent

?
