SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance

"Benchmarking Solr Performance" - Tim Potter, Lucidworks

Search | Discover | Analyze

Confidential and Proprietary © Copyright 2013

Benchmarking Solr Performance
June 18, 2014
Timothy Potter

My SolrCloud Experience

• At LucidWorks, mostly focused on hardening SolrCloud; Lucene/Solr committer

• Operated a 36-node cluster in AWS for Dachis Group (1.5 years ago; 18 shards, ~900M docs)

• Built a Fabric/boto framework for deploying and managing a cluster in EC2

• Co-author of Solr in Action

Agenda

• Indexing performance tests

• Solr Scale Toolkit

• Next steps

Cluster sizing

How many servers do I need to index X docs? How many shards? How many replicas?

I need N queries per second over M docs, how many servers do I need?

It depends?!?

Methodology

• Transparent, repeatable results
  – Ideally hoping for something owned by the community

• Synthetic docs, ~1K each on disk, with a mix of field types
  – Data set created using code borrowed from PigMix
  – English text fields generated using a Zipfian distribution

• Java 1.7u55, Amazon Linux, r3.2xlarge nodes
  – Enhanced networking enabled, placement group, same AZ

• Stock Solr (cloud) 4.8.1
  – Using Shawn Heisey’s GC tuning parameters

• Use Elastic MapReduce to generate load
  – As many nodes as I need to drive Solr!
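
The methodology calls for English text fields drawn from a Zipfian distribution. Below is a minimal stdlib sketch of that idea. It is not the PigMix-derived code used in the benchmark. The vocabulary, the exponent `s`, and the function names are hypothetical.

```python
import random
from collections import Counter

def zipf_weights(n, s=1.0):
    """Weight for rank r (1-based) is proportional to 1 / r**s."""
    return [1.0 / (r ** s) for r in range(1, n + 1)]

def zipf_text(vocabulary, num_words, s=1.0, rng=None):
    """Generate space-separated text whose word frequencies follow a Zipf-like law.

    vocabulary[0] is treated as the most common word (rank 1).
    """
    rng = rng or random.Random()
    weights = zipf_weights(len(vocabulary), s)
    return " ".join(rng.choices(vocabulary, weights=weights, k=num_words))

if __name__ == "__main__":
    # Hypothetical vocabulary; a real generator would use a large English word list.
    vocab = [f"word{i}" for i in range(1, 1001)]
    text = zipf_text(vocab, 10_000, rng=random.Random(42))
    print(Counter(text.split()).most_common(5))
```

A seeded `random.Random` makes the generated data set repeatable, which fits the "transparent repeatable results" goal.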

Indexing Results

Cluster size  # of shards  # of replicas  Reducers  Time (secs)  Docs / sec
10            10           1              48        1762         73,780
10            10           2              34        3727         34,881
10            20           1              48        1282         101,404
10            20           2              34        3207         40,536
10            30           1              72        1070         121,495
10            30           2              60        3159         41,152
15            15           1              60        1106         117,541
15            15           2              42        2465         52,738
15            30           1              60        827          157,195
15            30           2              42        2129         61,062
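
Multiplying Time by Docs / sec gives roughly 130 million docs for every row. That suggests each run indexed the same data set, though this is an inference from the numbers and the slide does not state it. The sketch below recomputes that total, the per-node throughput, and the drop in throughput when going from one replica to two. The reducer count also differs between the 1-replica and 2-replica rows, so that ratio mixes two variables.

```python
# Rows transcribed from the indexing results table:
# (nodes, shards, replicas, reducers, time_secs, docs_per_sec)
RESULTS = [
    (10, 10, 1, 48, 1762, 73_780),
    (10, 10, 2, 34, 3727, 34_881),
    (10, 20, 1, 48, 1282, 101_404),
    (10, 20, 2, 34, 3207, 40_536),
    (10, 30, 1, 72, 1070, 121_495),
    (10, 30, 2, 60, 3159, 41_152),
    (15, 15, 1, 60, 1106, 117_541),
    (15, 15, 2, 42, 2465, 52_738),
    (15, 30, 1, 60, 827, 157_195),
    (15, 30, 2, 42, 2129, 61_062),
]

def total_docs(row):
    """Docs indexed in a run = elapsed seconds * docs per second."""
    _, _, _, _, secs, rate = row
    return secs * rate

def per_node_rate(row):
    """Average docs/sec contributed by each node in the cluster."""
    nodes, *_, rate = row
    return rate / nodes

def replication_slowdown(results):
    """Map (nodes, shards) -> docs/sec with 1 replica divided by docs/sec with 2 replicas."""
    by_key = {(n, s, r): rate for n, s, r, _, _, rate in results}
    return {
        (n, s): by_key[(n, s, 1)] / by_key[(n, s, 2)]
        for (n, s, r) in by_key
        if r == 1 and (n, s, 2) in by_key
    }

if __name__ == "__main__":
    for row in RESULTS:
        print(row, f"total ~{total_docs(row):,}", f"per node {per_node_rate(row):,.0f}/s")
    for key, ratio in sorted(replication_slowdown(RESULTS).items()):
        print(key, f"1 replica is {ratio:.2f}x faster than 2 replicas")
```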

Direct Updates

[Diagram: Indexing Client 1 uses CloudSolrServer (SolrJ), which watches /clusterstate.json in ZooKeeper. The client computes the shard assignment for each <doc> in a batch and sends the docs directly to the leaders of Shard 1, Shard 2, and Shard 3.]
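
In the diagram, the client does the routing itself. The sketch below shows the idea under stated assumptions. It splits the 32-bit hash space into one contiguous range per shard, hashes each document ID, and groups a batch of docs by the leader that owns each hash. Solr's compositeId router hashes IDs with MurmurHash3. This sketch substitutes `zlib.crc32` only to avoid dependencies, so the shard a given ID lands on will not match Solr. The cluster-state dict and leader URLs are hypothetical.

```python
import zlib
from collections import defaultdict

INT_MIN, INT_MAX = -(2 ** 31), 2 ** 31 - 1

def partition_range(num_shards):
    """Split the signed 32-bit hash space into contiguous, roughly equal ranges."""
    step = (INT_MAX - INT_MIN + 1) // num_shards
    ranges, start = [], INT_MIN
    for i in range(num_shards):
        # The last range absorbs any remainder so the whole space is covered.
        end = INT_MAX if i == num_shards - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

def signed_hash(doc_id):
    """Stand-in hash (CRC32 as a signed 32-bit int); Solr itself uses MurmurHash3."""
    h = zlib.crc32(doc_id.encode("utf-8"))
    return h - 2 ** 32 if h >= 2 ** 31 else h

def route_batch(docs, cluster_state):
    """Group docs by the leader URL of the shard whose hash range contains the doc's ID."""
    batches = defaultdict(list)
    for doc in docs:
        h = signed_hash(doc["id"])
        for shard in cluster_state.values():
            lo, hi = shard["range"]
            if lo <= h <= hi:
                batches[shard["leader"]].append(doc)
                break
    return dict(batches)

if __name__ == "__main__":
    # Hypothetical cluster state, as a client might cache it after reading /clusterstate.json.
    state = {
        f"shard{i + 1}": {"range": r, "leader": f"http://solr{i + 1}:8983/solr/collection1"}
        for i, r in enumerate(partition_range(3))
    }
    docs = [{"id": f"doc-{n}"} for n in range(1000)]
    for leader, batch in route_batch(docs, state).items():
        print(leader, len(batch))
```

Because each batch goes straight to the owning leader, no Solr node has to forward the docs to another shard.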

Replication

[Diagram: CloudSolrServer (SolrJ) watches /clusterstate.json in ZooKeeper and sends each <doc> to the leader of its shard (Shards 1 to 3). Each leader forwards the doc to its replica (Shard 1, 2, or 3 replica) and blocks for a response from the replica(s).]
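
The leader does not return until its replicas respond. As a result, the client waits for roughly the leader's own work plus the slowest replica. This toy model is not Solr code, and the replica callables and delays are hypothetical. It forwards a doc to all replicas in parallel and waits for every acknowledgement, which is consistent with the 2-replica rows in the indexing table being slower.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def make_replica(name, delay_secs):
    """Hypothetical replica: 'indexes' a doc by sleeping, then acknowledges."""
    def index(doc):
        time.sleep(delay_secs)
        return (name, doc["id"])
    return index

def leader_update(doc, local_delay_secs, replicas):
    """Index locally, forward to every replica in parallel, and block until all acknowledge."""
    time.sleep(local_delay_secs)  # leader's own indexing work
    if not replicas:
        return []
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(replica, doc) for replica in replicas]
        return [f.result() for f in futures]  # waits for the slowest replica

if __name__ == "__main__":
    replicas = [make_replica("replica1", 0.05), make_replica("replica2", 0.20)]
    start = time.perf_counter()
    acks = leader_update({"id": "doc-1"}, 0.05, replicas)
    print(acks, f"{time.perf_counter() - start:.2f}s")
```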

Don’t swamp your servers!

Lessons Learned

• Know what throughput your client side is capable of generating
  – If using MapReduce, index from reducers with speculative execution disabled

• Don’t change Solr config without good reasons for doing so

• Overshard (but not too much)

• Near-linear scalability as I added nodes!
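
For the first lesson, you can measure what the client side can generate by timing the document generator on its own, with no Solr in the loop. If that rate is not well above the target indexing rate, the client is the bottleneck. Below is a minimal sketch. `make_doc` is a hypothetical stand-in for the real synthetic-doc generator.

```python
import time

def make_doc(n):
    """Hypothetical synthetic document; replace with the real generator."""
    return {"id": f"doc-{n}", "title": f"title {n}", "price": n % 100, "text": "lorem ipsum " * 40}

def measure_generation_rate(generate, num_docs):
    """Return (docs generated, docs/sec) for producing num_docs docs without sending them anywhere."""
    start = time.perf_counter()
    count = 0
    for n in range(num_docs):
        generate(n)
        count += 1
    elapsed = time.perf_counter() - start
    return count, (count / elapsed if elapsed > 0 else float("inf"))

if __name__ == "__main__":
    count, rate = measure_generation_rate(make_doc, 100_000)
    print(f"generated {count:,} docs at ~{rate:,.0f} docs/sec")
```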

Query Performance Tests

• All nodes in SolrCloud perform indexing and execute queries

• Using the TermsComponent to build queries based on the terms in each field.

• Harder to accurately simulate user queries over synthetic data
  – Need a mix of faceting, paging, sorting, grouping, boolean clauses, range queries, boosting, filters (some cached, some not), etc.

• Does the randomness in your test queries model (expected) user behavior?

• Start with one server (1 shard) to determine baseline query performance
  – Look for inefficiencies in your schema and other config settings
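
The TermsComponent returns the indexed terms of a field along with their document frequencies, and those terms can seed randomized test queries. The sketch below does two things. It builds a TermsComponent request URL, and it turns the flat `[term, count, term, count, ...]` list that Solr 4.x returns for `terms` in JSON into boolean queries. The host, collection, field names, and sample response are hypothetical, and the sketch assumes a `/terms` request handler is configured.

```python
import json
import random
from urllib.parse import urlencode

def terms_url(base_url, collection, field, limit=100):
    """URL for a TermsComponent request against an assumed /terms handler."""
    params = {"terms.fl": field, "terms.limit": limit, "terms.sort": "count", "wt": "json"}
    return f"{base_url}/{collection}/terms?{urlencode(params)}"

def parse_terms(response_text, field):
    """Convert the flat [term, count, term, count, ...] list into (term, count) pairs."""
    flat = json.loads(response_text)["terms"][field]
    return list(zip(flat[0::2], flat[1::2]))

def random_queries(field_terms, n, max_clauses=3, rng=None):
    """Build n random boolean queries, one clause per randomly chosen field.

    Terms are used verbatim; real terms containing query syntax would need escaping.
    """
    rng = rng or random.Random()
    fields = sorted(field_terms)
    queries = []
    for _ in range(n):
        k = rng.randint(1, min(max_clauses, len(fields)))
        clauses = [f"{f}:{rng.choice(field_terms[f])[0]}" for f in rng.sample(fields, k)]
        queries.append(rng.choice([" AND ", " OR "]).join(clauses))
    return queries

if __name__ == "__main__":
    print(terms_url("http://localhost:8983/solr", "collection1", "text_en"))
    sample = '{"terms": {"text_en": ["solr", 120, "lucene", 80, "shard", 15]}}'
    terms = {"text_en": parse_terms(sample, "text_en"), "category_s": [("books", 40), ("music", 25)]}
    for q in random_queries(terms, 5, rng=random.Random(7)):
        print(q)
```

Uniform random picks like these may not resemble real users. Weighting choices by term count is one option to consider when asking whether the randomness models expected user behavior.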

Solr Scale Toolkit

• Fabric / Python based toolset for deploying and managing SolrCloud clusters

• SolrJ-based client application useful for building tools that need access to cluster state information in ZooKeeper

• Code to support benchmarks for Solr

Python-based Tools

• boto – Python API for AWS (EC2, S3, etc.)
• Fabric – Python-based tool for automating system admin tasks over SSH
• pysolr – Python library for Solr (sending commits, queries, ...)
• kazoo – Python client tools for ZooKeeper

Supporting cast:
• JMeter – run tests, generate reports
• collectd – system monitoring
• Logstash4Solr – log aggregation
• JConsole/VisualVM – monitor JVM during indexing / queries

Solr Scale Toolkit: Demo

• Launch a meta node
  – Log aggregation / basic monitoring using SiLK

• Launch ZooKeeper ensemble
  – 3 nodes to establish quorum
  – Set up cron job to clean up snapshots

• Launch SolrCloud cluster
• Create new collection and index some docs
  – Attach JConsole while indexing

• Run a healthcheck on the collection
• Check out Banana dashboard
• Backup / Restore
  – Requires patch for SOLR-5956
  – Use fab patch_jars to update JARs and do a rolling restart
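
On "3 nodes to establish quorum": a ZooKeeper ensemble stays available only while a strict majority of its servers is up. The arithmetic is short enough to sketch directly.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)

if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} servers: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

A 4-node ensemble tolerates no more failures than a 3-node one (one in each case). This is why ensembles are usually sized with an odd number of servers.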

Provisioning machines

• Custom-built AMI?

• Block device mapping
  – dedicated disk per Solr node

• Launch and then poll status until instances are live
  – verify SSH connectivity

• Tag each instance with a cluster ID and username

fab new_ec2_instances:test1,n=3,instance_type=m3.xlarge
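
The "launch, then poll until live, verify SSH" step can be sketched without boto. The approach is to poll a readiness check with a timeout, using a TCP connect to port 22 as the connectivity test. This is a simplified sketch, and the toolkit's actual Fabric/boto code may differ.

```python
import socket
import time

def port_open(host, port=22, timeout=2.0):
    """True if a TCP connection to host:port succeeds (e.g. sshd is accepting connections)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def poll_until(check, timeout_secs=300.0, interval_secs=5.0):
    """Call check() until it returns True or the timeout expires; return the final result."""
    deadline = time.monotonic() + timeout_secs
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_secs)
```

For example, `poll_until(lambda: port_open(host), timeout_secs=300)` for each newly launched host (host names are whatever EC2 assigns) returns `False` if SSH never comes up within five minutes.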

ZooKeeper

• Two options:
  – provision 1 to N nodes when you launch the Solr cluster
  – use an existing named ensemble

• Fabric command simply creates the myid files and zoo.cfg file for the ensemble
  – and some cron scripts for managing snapshots

• Basic health checking of ZooKeeper status:
  – echo srvr | nc localhost 2181

fab new_zk_ensemble:zk1,n=3
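
The ZooKeeper slide says the Fabric command only writes the `myid` files and `zoo.cfg`. It also shows that `echo srvr | nc localhost 2181` is enough for a basic health check. Both pieces are sketched below in Python. `tickTime`, `initLimit`, `syncLimit`, `dataDir`, `clientPort`, and `server.N=host:2888:3888` are standard zoo.cfg keys. The host names, `dataDir` path, and timing values are hypothetical placeholders, not the toolkit's actual settings.

```python
import socket

def zoo_cfg(hosts, data_dir="/var/zookeeper/data", client_port=2181):
    """Render a zoo.cfg for an ensemble; server.N ids match each host's myid file."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    lines += [f"server.{i}={host}:2888:3888" for i, host in enumerate(hosts, start=1)]
    return "\n".join(lines) + "\n"

def myid_files(hosts, data_dir="/var/zookeeper/data"):
    """Map host -> (path, contents) of the myid file it needs (1-based, matching zoo.cfg)."""
    return {host: (f"{data_dir}/myid", f"{i}\n") for i, host in enumerate(hosts, start=1)}

def four_letter_word(host, port=2181, cmd=b"srvr", timeout=5.0):
    """Python equivalent of `echo srvr | nc host port`: send the command, read until EOF."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_mode(srvr_output):
    """Extract the 'Mode:' value (leader, follower, standalone) from srvr output, or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

In a healthy ensemble, `parse_mode(four_letter_word(host))` should report `leader` on exactly one server and `follower` on the others.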

SolrCloud

• Upload a BASH script that starts/stops Solr
• Set system props: jetty.port, host, zkHost, JVM opts
• One or more Solr nodes per machine
• JVM mem opts dependent on instance type and # of Solr nodes per instance
• Optionally configure log4j.properties to append messages to RabbitMQ for Logstash4Solr integration

fab new_solrcloud:test1,zk=zk1,nodesPerHost=2
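
The SolrCloud slide notes that heap settings depend on the instance type and on how many Solr nodes share a machine. It also notes that each node gets `jetty.port`, `host`, and `zkHost` system properties. Below is a sketch of that calculation. The RAM figures are for the two instance types mentioned in this talk. The 50% heap fraction, the port numbering, and the `java -jar start.jar` launch line (the Solr 4.x example Jetty layout) are assumptions for illustration, not the toolkit's actual values.

```python
# RAM in GiB for the instance types used in this talk (m3.xlarge and r3.2xlarge).
INSTANCE_RAM_GIB = {"m3.xlarge": 15, "r3.2xlarge": 61}

def heap_per_node_gib(instance_type, nodes_per_host, heap_fraction=0.5):
    """Split a fraction of RAM across the Solr JVMs, leaving the rest for the OS page cache."""
    total = INSTANCE_RAM_GIB[instance_type]
    return max(1, int(total * heap_fraction // nodes_per_host))

def solr_commands(host, zk_host, instance_type, nodes_per_host, base_port=8984):
    """One java command line per Solr node on this host (hypothetical port scheme)."""
    heap = heap_per_node_gib(instance_type, nodes_per_host)
    return [
        [
            "java", f"-Xms{heap}g", f"-Xmx{heap}g",
            f"-Djetty.port={base_port + i}", f"-Dhost={host}", f"-DzkHost={zk_host}",
            "-jar", "start.jar",
        ]
        for i in range(nodes_per_host)
    ]

if __name__ == "__main__":
    for cmd in solr_commands("10.0.0.5", "zk1:2181,zk2:2181,zk3:2181", "r3.2xlarge", 2):
        print(" ".join(cmd))
```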

solr-ctl.sh

• BASH script that implements:
  – start/stop Solr nodes on each EC2 instance
  – set JVM memory options and system properties (jetty.port), enable remote JMX, etc.
  – back up log files before restarting nodes
  – ensure the JVM is killed correctly before restarting

• Environment variables in: solr-ctl-env.sh
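
"Ensure the JVM is killed correctly before restarting" usually means: ask politely, wait, then force. The toolkit does this in BASH. To stay consistent with the other examples here, the following is a Python sketch of the same pattern using `subprocess`, with a hypothetical grace period.

```python
import subprocess
import sys

def stop_process(proc, grace_secs=30.0):
    """Send SIGTERM, wait up to grace_secs, then SIGKILL if the process is still alive.

    Returns the process exit code.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited
    proc.terminate()  # SIGTERM on POSIX: lets the JVM run its shutdown hooks
    try:
        return proc.wait(timeout=grace_secs)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        return proc.wait()

if __name__ == "__main__":
    # Stand-in for a Solr JVM: a child process that would otherwise sleep for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_process(child, grace_secs=5))
```

Only after `stop_process` returns is it safe to start a new JVM on the same `jetty.port`.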

Miscellaneous Utility Tasks

• Deploy a configuration directory to ZooKeeper
• Create a new collection
• Attach a local JConsole/VisualVM to a remote JVM
• Rolling restart (with Overseer awareness)
• Build Solr locally and patch remote nodes
  – Use a relay server to scp the JARs to the Amazon network once, then scp them to the other nodes from within the network
• Put/get files
• Grep over all log files (across the cluster)
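
One reading of "rolling restart (with Overseer awareness)" is the following. Restart nodes one at a time, wait for each to come back, and leave the node currently acting as Overseer until last, so that the Overseer role moves at most once. Below is a sketch of that ordering logic. `restart` and `wait_until_healthy` are hypothetical callables you would supply, for example via SSH and a cluster-state check.

```python
def rolling_restart_order(nodes, overseer):
    """Return nodes in restart order, with the current Overseer node last."""
    others = [n for n in nodes if n != overseer]
    return others + ([overseer] if overseer in nodes else [])

def rolling_restart(nodes, overseer, restart, wait_until_healthy):
    """Restart nodes one at a time; stop early if a node does not come back healthy."""
    done = []
    for node in rolling_restart_order(nodes, overseer):
        restart(node)
        if not wait_until_healthy(node):
            raise RuntimeError(f"{node} did not recover; aborting rolling restart")
        done.append(node)
    return done
```

Node names here follow the SolrCloud `host:port_solr` style, such as `10.0.0.1:8983_solr`. Aborting on the first unhealthy node keeps a bad patch from taking down the whole cluster.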

Other useful stuff ...

• fab mine: See clusters I’m running (or for other users too)

• fab kill_mine: Terminate all instances I’m running
  – Use termination protection in production

• fab ssh_to: Quick way to SSH to one of the nodes in a cluster

• fab stop/recover/kill: Basic commands for controlling specific Solr nodes in the cluster

• fab jmeter: Execute a JMeter test plan against your cluster
  – Example test plan and Java sampler are included with the source
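
Commands like `fab mine` depend on the provisioning step that tags each instance with a cluster ID and username. Grouping instances by those tags is straightforward. In this sketch the instance dict shape and the tag keys `cluster` and `username` are hypothetical stand-ins for what an EC2 describe call returns.

```python
from collections import defaultdict

def clusters_for_user(instances, username=None):
    """Group running instances by cluster tag, optionally keeping only one user's clusters."""
    clusters = defaultdict(list)
    for inst in instances:
        tags = inst.get("tags", {})
        if inst.get("state") != "running" or "cluster" not in tags:
            continue
        if username is not None and tags.get("username") != username:
            continue
        clusters[tags["cluster"]].append(inst["id"])
    return dict(clusters)
```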

SolrCloud Tools (SolrJ client app)

• Java-based command-line application that uses SolrJ’s CloudSolrServer to perform advanced cluster management operations:
  – healthcheck: collect metadata and health information from all replicas for a collection from ZooKeeper
  – backup: create a snapshot of each shard in a collection for backing up to remote storage (S3)

• Framework for building complex tools that benefit from having access to cluster state information in ZooKeeper

./tools.sh –tool healthcheck
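
The healthcheck tool is written in Java with SolrJ. Its core idea, comparing each replica's state in /clusterstate.json against the set of live nodes, fits in a short Python sketch. The sketch assumes the Solr 4.x single-file cluster state layout: collection, then shards, then replicas with `state`, `node_name`, `base_url`, `core`, and `leader` fields. The sample data is made up. The backup URL calls the replication handler's `command=backup` on each shard leader. That is one way to snapshot a shard, and not necessarily how the toolkit does it.

```python
import json

def healthcheck(cluster_state_json, live_nodes, collection):
    """List problems: replicas on non-live nodes, replicas not 'active', shards without exactly one leader."""
    state = json.loads(cluster_state_json)[collection]
    problems = []
    for shard_name, shard in sorted(state["shards"].items()):
        leaders = 0
        for replica_name, replica in sorted(shard["replicas"].items()):
            if replica.get("leader") == "true":
                leaders += 1
            if replica["node_name"] not in live_nodes:
                problems.append(f"{shard_name}/{replica_name}: node {replica['node_name']} is not live")
            elif replica["state"] != "active":
                problems.append(f"{shard_name}/{replica_name}: state is {replica['state']}")
        if leaders != 1:
            problems.append(f"{shard_name}: expected 1 leader, found {leaders}")
    return problems

def leader_backup_urls(cluster_state_json, collection):
    """One replication-handler backup URL per shard leader."""
    state = json.loads(cluster_state_json)[collection]
    return [
        f"{r['base_url']}/{r['core']}/replication?command=backup"
        for shard in state["shards"].values()
        for r in shard["replicas"].values()
        if r.get("leader") == "true"
    ]

# Hypothetical 2-shard collection; shard1 has a second replica that is still recovering.
SAMPLE_STATE = json.dumps({"collection1": {"shards": {
    "shard1": {"range": "80000000-ffffffff", "state": "active", "replicas": {
        "core_node1": {"state": "active", "leader": "true", "node_name": "10.0.0.1:8983_solr",
                       "base_url": "http://10.0.0.1:8983/solr", "core": "collection1_shard1_replica1"},
        "core_node3": {"state": "recovering", "node_name": "10.0.0.2:8983_solr",
                       "base_url": "http://10.0.0.2:8983/solr", "core": "collection1_shard1_replica2"}}},
    "shard2": {"range": "0-7fffffff", "state": "active", "replicas": {
        "core_node2": {"state": "active", "leader": "true", "node_name": "10.0.0.3:8983_solr",
                       "base_url": "http://10.0.0.3:8983/solr", "core": "collection1_shard2_replica1"}}}}}})
```

In practice the JSON would come from ZooKeeper's /clusterstate.json, and `live_nodes` from the children of /live_nodes.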

SiLK Integration

• SiLK: Solr integrated with Logstash and Kibana
  – Index time-series data, such as log data (collectd, Solr logs, ...)
  – Build cool dashboards with Banana (fork of Kibana)

• Easily aggregate all WARN and more severe log messages from all Solr servers into logstash4solr

• Send collectd metrics to logstash4solr
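
The SiLK slide mentions shipping only WARN and more severe messages. The filtering step itself is simple, and the sketch below keeps lines at or above a threshold level. It assumes each line starts with a log4j level name, such as `WARN  - 2014-06-18 ...`. That layout assumption and the sample lines are illustrative and do not come from the talk.

```python
import re

LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]  # log4j levels, least to most severe
LEVEL_AT_START = re.compile(r"^\s*(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\b")

def at_or_above(lines, threshold="WARN"):
    """Yield lines whose leading log level is at least as severe as threshold."""
    min_rank = LEVELS.index(threshold)
    for line in lines:
        m = LEVEL_AT_START.match(line)
        if m and LEVELS.index(m.group(1)) >= min_rank:
            yield line
```

This line-by-line filter drops stack-trace continuation lines because they carry no level. A real shipper such as Logstash would typically use a multiline rule to keep them attached to their ERROR line.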

What’s Next?

• Migrate to using Apache libcloud instead of using boto directly

• Benchmark mixed work-loads (queries and indexing)

• SiLK is improving rapidly!

• Chaos monkey tests
  – integrate Jepsen?

• Open source so please kick the tires!

Wrap-up

• Solr Scale Toolkit: https://github.com/LucidWorks/solr-scale-tk
• LucidWorks: http://www.lucidworks.com
• SiLK: http://www.lucidworks.com/lucidworks-silk/
• Solr in Action: http://www.manning.com/grainger/
• Connect: @thelabdude / [email protected]

Questions?