Scaling search with SolrCloud


Upload: cominvent-as

Post on 27-Jan-2015

Category: Technology


DESCRIPTION

Enterprise search installations can grow big, really big, and they keep growing. Tens or even hundreds of servers may be involved, on premises or in the cloud. Managing this has been complex and time-consuming, until now :) SolrCloud to the rescue! Using the world's most popular open source search engine, Apache Solr™, we will show you how the upcoming version 4.0 makes scaling search in the cloud simple and robust. A new feature called SolrCloud adds centralized configuration, distributed indexing and searching, automatic failover, recovery and leader election. Scaling is now as simple as adding a new server to your cluster; it finds the role where it is most needed and starts serving searches.

TRANSCRIPT

Page 1: Scaling search with Solr Cloud

Scaling search with SolrCloud

Jan Høydahl
Cominvent AS

Page 2: Scaling search with Solr Cloud

Jan Høydahl

1995: Developer telecom
1998: Java developer
2000: Search - FAST
2006: Lucene
2007: new Cominvent()
2009: Lucene/Solr
2011: Lucene committer
2012: Lucene PMC

> 100 projects

Page 3: Scaling search with Solr Cloud

3

Page 4: Scaling search with Solr Cloud

About Cominvent

Business critical search

Domain knowledge & best practices:

• Consulting
• Training
• Support

Page 5: Scaling search with Solr Cloud

Next course in Oslo:

[Calendar image: September 2012, from www.calendar-of-2012.com]

Tailored training & consulting

Introduction to Solr (non-tech)

Introduction to Solr (tech)

Solr Developer

Scaling and tuning

Developing Solr Plugins

SolrTraining.com

Page 6: Scaling search with Solr Cloud

http://www.meetup.com/Oslo-Solr-Community

6

CommunityZone talk: «Solr 101»

Thursday 14:20

Page 7: Scaling search with Solr Cloud

http://www.meetup.com/Oslo-Solr-Community

7

next MeetUp:

Page 8: Scaling search with Solr Cloud

ApacheCon Europe 2012

• Sinsheim, Germany
• November 5-8
• Lucene/Solr track
• www.apachecon.eu

Page 9: Scaling search with Solr Cloud

Agenda

• Intro to Solr
• Scaling search - before
• Introduction to SolrCloud
• Demo with Wikipedia data
• Plans for Solr going forward
• Q&A

Page 10: Scaling search with Solr Cloud

Intro to Solr

10

Page 11: Scaling search with Solr Cloud

Apache Solr

11

Search Server

Page 12: Scaling search with Solr Cloud

Completely HTTP based

12

Page 13: Scaling search with Solr Cloud

13

Page 14: Scaling search with Solr Cloud

Areas of use

14

Page 15: Scaling search with Solr Cloud

Example: e-commerce

15

www.libris.no

Page 16: Scaling search with Solr Cloud

Boosting by function

16

Boosting on review popularity and sales numbers:

log(sum(popularity,numsold))
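
As a rough illustration of how such a boost could be sent to Solr over HTTP, here is a minimal Python sketch. It only builds the request URL. The host, port, and handler path are assumptions for illustration, as is the choice of the edismax query parser with its additive `bf` (boost function) parameter; the field names `popularity` and `numsold` come from the slide.

```python
from urllib.parse import urlencode


def build_boosted_query(user_query, base_url="http://localhost:8983/solr/select"):
    """Return a Solr select URL that applies the slide's boost function.

    base_url is a hypothetical local Solr instance, not taken from the slides.
    """
    params = {
        "q": user_query,
        "defType": "edismax",                  # assumed query parser
        "bf": "log(sum(popularity,numsold))",  # boost function from the slide
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


url = build_boosted_query("harry potter")
print(url)
```

Because the function value is added to the relevance score, documents with many reviews and sales float upward without completely overriding the text match.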

Page 17: Scaling search with Solr Cloud

Auto suggest & phonetic normalization

Page 18: Scaling search with Solr Cloud

Example: classifieds/auctions

18

www.finn.no

Page 20: Scaling search with Solr Cloud

Who uses Apache Lucene/Solr™?

...and many more: http://wiki.apache.org/solr/PublicServers

Page 21: Scaling search with Solr Cloud

Versions

• Current stable = 3.6.1
• Latest release = 4.0-beta
• Next release = 4.0-FINAL --- «soon» :-)

Release timeline:

01/2007: v1.1
09/2008: v1.3
11/2009: v1.4
03/2011: v3.1
06/2011: v3.3
04/2012: v3.6
06/2012: v4.0-alpha
07/2012: v3.6.1
08/2012: v4.0-beta
??/2012: v4.0

Page 22: Scaling search with Solr Cloud

Scaling search

21

Page 23: Scaling search with Solr Cloud

Why scale?

22

• One single Solr server handles...
  – millions of documents (per shard)
  – hundreds of queries per second (per replica)

• We need to scale if...
  – data volume increases
  – query volume increases
  – we need high availability / fault tolerance

Page 24: Scaling search with Solr Cloud

Scaling search - before

23

Solr shard 1

- config, schema
- synonyms

Page 25: Scaling search with Solr Cloud

Scaling search - before

23

Solr shard 1
- config, schema
- synonyms

Solr shard 2
- config, schema
- synonyms

- Add shard node
- Manually copy config
- Manually index to right shard
- Manually set shards query parameter

Page 26: Scaling search with Solr Cloud

Scaling search - before

23

Solr shard 1
- config, schema
- synonyms

Solr 1 replica
- config, schema
- synonyms

Solr shard 2
- config, schema
- synonyms

Solr 2 replica
- config, schema
- synonyms

- Add shard node
- Manually copy config
- Manually index to right shard
- Manually set shards query parameter

- Add replica node
- Copy config
- Set up poll-based replication
- No indexing failover
- Monitor every node
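
To make the manual work on these "before" slides concrete, here is a small Python sketch of what a client had to do itself: pick the shard for every document, and list all shards in the `shards` request parameter on every distributed query. The host names and the MD5-modulo routing scheme are assumptions for illustration; any stable scheme would do, and none is prescribed by the slides.

```python
import hashlib

# Hypothetical shard addresses; every client had to know this list.
SHARDS = ["solr1.example.com:8983/solr", "solr2.example.com:8983/solr"]


def shard_for(doc_id, shards=SHARDS):
    """Choose the shard a document must be indexed to (client-side routing)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def distributed_query_params(query, shards=SHARDS):
    """Build query parameters with an explicit, comma-separated shards list."""
    return {"q": query, "shards": ",".join(shards)}


print(shard_for("doc-42"))
print(distributed_query_params("solr"))
```

Note that adding a third shard changes the modulo result for most document IDs, so the existing data would have to be reindexed. This is part of why scaling this way was painful.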

Page 27: Scaling search with Solr Cloud

Solr Cloud

24

Page 28: Scaling search with Solr Cloud

What is SolrCloud?

• New in Solr 4.0
• Easier scaling
• Centralized config
• Fault tolerant indexing and querying
• Using Apache ZooKeeper as «registry»

ZooKeeper: «Because coordinating distributed systems is a Zoo»

Page 29: Scaling search with Solr Cloud

What is SolrCloud

26

Page 30: Scaling search with Solr Cloud

What is SolrCloud

26

Page 31: Scaling search with Solr Cloud

What is SolrCloud

26

Page 32: Scaling search with Solr Cloud

What is SolrCloud

26

Logical collection

Page 33: Scaling search with Solr Cloud

What is SolrCloud

26

Logical collection

Transaction log

Soft commit

Page 34: Scaling search with Solr Cloud

Scaling search - with SolrCloud

27

Solr master 1

ZK aware

Apache ZooKeeper

Page 35: Scaling search with Solr Cloud

Scaling search - with SolrCloud

27

Solr master 1

ZK aware

Solr master 2

ZK aware

Apache ZooKeeper

- Add shard node, point it to ZK
- It assumes the role of shard 2
- Automatic document distribution
- Automatic querying across cluster
- Centralized config & monitoring
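
One way to picture "automatic document distribution" is hash-range routing: the hash space is split into one contiguous range per shard, and each document goes to the shard whose range contains the hash of its ID. The Python sketch below shows only that general idea. It is not Solr's implementation: CRC32 is used as a stand-in hash because it ships with the standard library, and the function names are hypothetical.

```python
import zlib

HASH_SPACE = 2 ** 32  # size of the (unsigned) 32-bit hash space


def shard_ranges(num_shards):
    """Split [0, 2**32) into num_shards contiguous half-open ranges."""
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE if i == num_shards - 1 else (i + 1) * step
        ranges.append((start, end))
    return ranges


def route(doc_id, ranges):
    """Return the 1-based shard number whose range contains the ID's hash."""
    h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF
    for index, (start, end) in enumerate(ranges):
        if start <= h < end:
            return index + 1
    raise ValueError("hash outside all ranges")


ranges = shard_ranges(2)  # matches -DnumShards=2 in the demo setup
for doc_id in ["doc-1", "doc-2", "doc-3"]:
    print(doc_id, "-> shard", route(doc_id, ranges))
```

Because every node can compute the same mapping, a document may be sent to any node and still end up on the right shard.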

Page 36: Scaling search with Solr Cloud

Scaling search - with SolrCloud

27

Solr master 1

ZK aware

Solr replica 1

ZK aware

Solr master 2

ZK aware

Solr replica 2

ZK aware

Apache ZooKeeper

- Add shard node, point it to ZK
- It assumes the role of shard 2
- Automatic document distribution
- Automatic querying across cluster
- Centralized config & monitoring

- Add replica node(s)
- Auto role assignment
- Push based replication
- Indexing failover
- Leader election through ZK
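
Leader election through ZooKeeper is commonly built on a simple recipe: every candidate registers a sequentially numbered entry that disappears when the candidate's session ends, and the candidate holding the lowest number is the leader. The Python sketch below is a toy in-memory simulation of that recipe, with no real ZooKeeper involved; the class and node names are hypothetical, and SolrCloud's actual election code is more involved.

```python
class ToyElection:
    """In-memory stand-in for one shard's election queue."""

    def __init__(self):
        self._next_seq = 0
        self._candidates = {}  # sequence number -> node name

    def join(self, node):
        """Register a candidate; returns its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._candidates[seq] = node
        return seq

    def leave(self, seq):
        """Simulate a node crashing or its session expiring."""
        self._candidates.pop(seq, None)

    def leader(self):
        """The candidate with the lowest sequence number leads."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


election = ToyElection()
first = election.join("solr-node-1")
second = election.join("solr-node-2")
print(election.leader())  # solr-node-1
election.leave(first)     # the leader goes down
print(election.leader())  # solr-node-2
```

The replica takes over as soon as the old leader's entry is gone, which is what makes indexing failover possible without manual intervention.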

Page 37: Scaling search with Solr Cloud

Scaling search - with SolrCloud

27

Solr master 1

ZK aware

Solr replica 1

ZK aware

Solr master 2

ZK aware

Solr replica 2

ZK aware

Apache ZooKeeper

Page 38: Scaling search with Solr Cloud

Scaling search - with SolrCloud

27

Solr master 1

ZK aware

Solr replica 1

ZK aware

Solr master 2

ZK aware

Solr replica 2

ZK aware

Apache ZooKeeper

Solr master 2

ZK aware

Page 39: Scaling search with Solr Cloud

Scaling search - with SolrCloud

27

Solr master 1

ZK aware

Solr replica 1

ZK aware

Solr master 2

ZK aware

Solr replica 2

ZK aware

Apache ZooKeeper

Solr master 2

ZK aware

Solr replica 2

ZK aware

Page 40: Scaling search with Solr Cloud

Configuration

28

Solr master 1

ZK aware

Solr replica 1

ZK aware

Solr master 2

ZK aware

Solr replica 2

ZK aware

ZK

Page 41: Scaling search with Solr Cloud

Configuration

28

Solr master 1

ZK aware

Solr replica 1

ZK aware

Solr master 2

ZK aware

Solr replica 2

ZK aware

ZK

-DzkRun
-Dcollection.configName=jz
-DnumShards=2
-Dbootstrap_confdir=./solr/coll/conf

Page 42: Scaling search with Solr Cloud

Configuration

28

Solr master 1

ZK aware

Solr replica 1

ZK aware

Solr master 2

ZK aware

Solr replica 2

ZK aware

ZK

First node (runs ZooKeeper and uploads the config):

-DzkRun
-Dcollection.configName=jz
-DnumShards=2
-Dbootstrap_confdir=./solr/coll/conf

Each of the other three nodes:

-DzkHost=localhost:xxxx
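
The system properties on this slide are passed on the Java command line when each node starts. The Python sketch below assembles such command lines as argument lists, without running anything. The `-jar start.jar` launcher and the `-Djetty.port` property are assumptions based on the Jetty example setup bundled with Solr 4 and do not appear on the slide; the ZooKeeper address is kept as the slide's `localhost:xxxx` placeholder.

```python
def first_node_command(num_shards=2, config_name="jz",
                       confdir="./solr/coll/conf"):
    """Command for the node that runs ZooKeeper and uploads the config."""
    return [
        "java",
        "-DzkRun",
        "-Dcollection.configName=" + config_name,
        "-DnumShards=" + str(num_shards),
        "-Dbootstrap_confdir=" + confdir,
        "-jar", "start.jar",  # assumed launcher, not shown on the slide
    ]


def additional_node_command(zk_host, jetty_port):
    """Command for any further node: it only needs to know where ZK is."""
    return [
        "java",
        "-DzkHost=" + zk_host,
        "-Djetty.port=" + str(jetty_port),  # assumed; avoids port clashes on one machine
        "-jar", "start.jar",
    ]


print(" ".join(first_node_command()))
# "localhost:xxxx" is the slide's placeholder; substitute the real ZooKeeper address.
print(" ".join(additional_node_command("localhost:xxxx", 7574)))
```

Only the first node carries configuration details. Every later node gets a single ZooKeeper address and learns the rest from there, which is the point of centralized config.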

Page 43: Scaling search with Solr Cloud

Demo

indexing & querying

29

Page 44: Scaling search with Solr Cloud

Solr 4.0 and beyond

30

• Other news in v4.0 FINAL (expected later this autumn)
  – NRT
  – Real-time GET
  – Smaller index & memory footprint
  – New «modern» Admin GUI
  – Incremental updates
  – Pseudo-join

• Future plans
  – More shard distribution mechanisms
  – Re-balancing cluster (split shards)
  – ...
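
Two of the v4.0 items above lend themselves to a short illustration. Assuming "Incremental updates" refers to Solr 4's atomic (partial-document) updates, a client sends only the changed fields, wrapped in modifiers such as `set` and `inc`, and Real-time GET can then fetch the latest version of the document by ID even before a commit. The Python sketch below only builds the JSON body and the URL; the base URL, the document ID, and the field names are hypothetical.

```python
import json
from urllib.parse import urlencode


def atomic_update(doc_id, set_fields=None, inc_fields=None):
    """Build a JSON update body that changes only the given fields."""
    doc = {"id": doc_id}
    for name, value in (set_fields or {}).items():
        doc[name] = {"set": value}  # replace the field's value
    for name, value in (inc_fields or {}).items():
        doc[name] = {"inc": value}  # increment a numeric field
    return json.dumps([doc])


def realtime_get_url(doc_id, base_url="http://localhost:8983/solr"):
    """URL for fetching the latest version of a document by its ID."""
    return base_url + "/get?" + urlencode({"id": doc_id})


print(atomic_update("book1", set_fields={"price": 99}, inc_fields={"numsold": 1}))
print(realtime_get_url("book1"))
```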

Page 45: Scaling search with Solr Cloud

Recap

• Apache Solr open source enterprise search
• Scaling Solr was hard
• Solr 4.0 with SolrCloud makes it easy :)
  – Centralized config
  – Effortless scaling of cluster
  – Fault tolerant indexing & querying
• Download the 4.0-beta today, 4.0-FINAL soon

Page 46: Scaling search with Solr Cloud

Remember

Next Solr course in Oslo:

[Calendar image: September 2012, from www.calendar-of-2012.com]

CommunityZone talk: «Solr 101»

Thursday 14:20

www.solrkurs.no

Page 47: Scaling search with Solr Cloud

33

Jan Høydahl
Cominvent AS
@cominvent

?