Webinar: Inside Apache Solr 5

Upload: lucidworks

Post on 16-Jul-2015

TRANSCRIPT

Page 1: Webinar: Inside Apache Solr 5
Page 2: Webinar: Inside Apache Solr 5

October 13-15, 2015 • Austin, TX • http://lucenerevolution.org

Page 3: Webinar: Inside Apache Solr 5

Inside Apache Solr 5

Page 4: Webinar: Inside Apache Solr 5

Apache Solr + Lucidworks

COMMUNITY • CUSTOMERS • PRODUCTS

Page 5: Webinar: Inside Apache Solr 5

Search is more than just a box.

Page 6: Webinar: Inside Apache Solr 5

Search makes data personal. contextual. actionable.

Page 7: Webinar: Inside Apache Solr 5

Search can be smarter.

location search history query security context

Personal, contextual, relevant results: consumer-like simplicity and power in the enterprise.

Page 8: Webinar: Inside Apache Solr 5

Product Offering

Solr Enterprise

• Support Level / Additional Support: Dev Support (4 Contacts), Operational Support, Regular Health Checks

• Availability: 24x7

• Response Time: SLA-Backed

• Number of Incidents: Unlimited

• Pricing Model: Per Node

• Features: Security, Log Analysis / SiLK Support, Dashboards & Reporting, Enhanced Admin UI

Fusion

• Support Level / Additional Support: Dev Support (4 Contacts), Operational Support, Regular Health Checks

• Availability: 24x7

• Response Time: SLA-Backed

• Number of Incidents: Unlimited

• Pricing Model: Per Node

• Features: Security, Crawlers & Connectors, Log Analysis / SiLK Support, Enhanced Admin UI, Data Enrichment, Machine Learning, Recommendations, Advanced Relevancy Tuning

Developer Support

• Support Level / Additional Support: How-To Support, Knowledge Base, Fusion Support

• Availability: 9x5

• Response Time: SLA-Backed

• Number of Incidents: Unlimited

• Pricing Model: Per Named Developer

Environment: Production, Development

Page 9: Webinar: Inside Apache Solr 5

Inside Apache Solr 5

• Get Started • Dig In • Go Big • Get Finished • Sneak Peek

Page 10: Webinar: Inside Apache Solr 5

Get Started

• Easy to start/stop

./bin/solr {start|stop}

• Create collections:

./bin/solr create -c <COLL_NAME>

• No more WAR! Web container (Jetty) is now an implementation detail

• Scripts to support installing and running Solr as a service on Linux.
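As a rough illustration of how the two commands on this slide might be driven from a deployment script, here is a minimal Python sketch. It only uses the `start` and `create -c` forms shown above; the helper names, the `films` collection name, and the assumption that the script runs from the Solr install directory are all hypothetical. By default it prints the command instead of executing it.

```python
import subprocess
from typing import List

# Assumption: the script is run from the root of a Solr 5 install.
SOLR_SCRIPT = "./bin/solr"


def start_command() -> List[str]:
    """argv for `./bin/solr start`, as shown on the slide."""
    return [SOLR_SCRIPT, "start"]


def create_collection_command(name: str) -> List[str]:
    """argv for `./bin/solr create -c <COLL_NAME>`."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("collection name must be non-empty and contain no whitespace")
    return [SOLR_SCRIPT, "create", "-c", name]


def run(cmd: List[str], dry_run: bool = True) -> int:
    """Print the command (dry run) or execute it and return its exit code."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    run(start_command())
    run(create_collection_command("films"))  # "films" is a made-up name
```

Passing `dry_run=False` would actually invoke the script, which only makes sense on a machine with Solr installed.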

Page 11: Webinar: Inside Apache Solr 5

Your Content, Your Way

JSON’s great:

• Solr 5 “does the right thing” for JSON out of the box

Except when it isn’t:

• Most data isn’t JSON

• Solr handles CSV, XML, Rich Content out of the box without having to install plugins
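To make the JSON and CSV points concrete, the sketch below builds (but does not send) HTTP requests for Solr's `/update` handler, selecting the parser through the `Content-Type` header. The base URL, the `films` collection, and the sample documents are assumptions for illustration only; `commit=true` is added so that indexed documents would become visible immediately.

```python
import json
from typing import Dict, List
from urllib.request import Request


def update_request(base_url: str, collection: str, body: bytes, content_type: str) -> Request:
    """Build a POST to the collection's /update handler without sending it."""
    url = f"{base_url}/{collection}/update?commit=true"
    return Request(url, data=body, headers={"Content-Type": content_type}, method="POST")


def json_update(base_url: str, collection: str, docs: List[Dict]) -> Request:
    """A JSON array of documents, sent as application/json."""
    body = json.dumps(docs).encode("utf-8")
    return update_request(base_url, collection, body, "application/json")


def csv_update(base_url: str, collection: str, csv_text: str) -> Request:
    """CSV text with a header row, sent as application/csv."""
    return update_request(base_url, collection, csv_text.encode("utf-8"), "application/csv")


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # assumed default local address
    req = json_update(base, "films", [{"id": "1", "title": "Example"}])
    print(req.get_method(), req.full_url)
    req = csv_update(base, "films", "id,title\n2,Another\n")
    print(req.get_method(), req.full_url)
    # To actually index, pass a request to urllib.request.urlopen(req)
    # against a running Solr node.
```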

Page 12: Webinar: Inside Apache Solr 5

Your Content, Your Way

• Solr 5 will ship Tika 1.7, adding:

• OCR support

• PST and Matlab

• Better Date Handling

• More flexibility with spatial units

Page 13: Webinar: Inside Apache Solr 5

Dig In

Page 14: Webinar: Inside Apache Solr 5

Pivots and Stats

• Stats and Pivot faceting now work together

• Focused on accuracy of results

• First few steps in unification of all facet types with stats and aggregations

• http://lucidworks.com/blog/you-got-stats-in-my-facets/
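One way to picture the combination is a query in which a tagged `stats.field` is referenced from `facet.pivot` through the `stats` local param, so that each pivot bucket carries its own statistics. The following sketch only assembles the query string; the field names (`cat`, `inStock`, `price`) and the tag name are placeholders, not fields from the webinar.

```python
from typing import List, Sequence, Tuple
from urllib.parse import urlencode


def pivot_stats_params(pivot_fields: Sequence[str], stats_field: str,
                       tag: str = "piv") -> List[Tuple[str, str]]:
    """Query params that attach stats on `stats_field` to each pivot bucket."""
    return [
        ("q", "*:*"),
        ("rows", "0"),  # only the facet and stats sections are of interest
        ("stats", "true"),
        ("stats.field", "{!tag=%s}%s" % (tag, stats_field)),
        ("facet", "true"),
        ("facet.pivot", "{!stats=%s}%s" % (tag, ",".join(pivot_fields))),
    ]


if __name__ == "__main__":
    # Hypothetical fields: pivot on category then stock status, stats on price.
    params = pivot_stats_params(["cat", "inStock"], "price")
    print("/select?" + urlencode(params))
```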

Page 15: Webinar: Inside Apache Solr 5

API Goodness

• Schema API: REST API for adding field types and dynamic fields

• Managing Request Handlers through API

• Implicit registration of replication, Real Time Get and Administration Handlers

• Improved APIs for managing collections
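For the Schema API bullet, a request might look roughly like the sketch below: a JSON body with `add-field-type` and `add-dynamic-field` commands posted to the collection's `/schema` endpoint. The type name `text_simple`, the `*_simple` pattern, the `films` collection, and the base URL are invented for illustration, and the sketch assumes the collection uses a managed (API-editable) schema. The request is built but not sent.

```python
import json
from typing import Dict
from urllib.request import Request


def schema_request(base_url: str, collection: str, commands: Dict) -> Request:
    """Build a POST carrying Schema API commands; nothing is sent here."""
    url = f"{base_url}/{collection}/schema"
    body = json.dumps(commands).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


# Hypothetical field type and dynamic field, for illustration only.
EXAMPLE_COMMANDS = {
    "add-field-type": {
        "name": "text_simple",
        "class": "solr.TextField",
        "analyzer": {"tokenizer": {"class": "solr.StandardTokenizerFactory"}},
    },
    "add-dynamic-field": {
        "name": "*_simple",
        "type": "text_simple",
        "stored": True,
    },
}

if __name__ == "__main__":
    req = schema_request("http://localhost:8983/solr", "films", EXAMPLE_COMMANDS)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```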

Page 16: Webinar: Inside Apache Solr 5

Lucene 5 Highlights

• Stronger index safety guarantees

• Reduced memory usage in a number of areas

• No more FieldCache (replaced w/ UninvertingReader)

• Multi-valued sorting and suggesters

• Better IO defaults when using SSDs

• More efficient handling of merging stored fields

Page 17: Webinar: Inside Apache Solr 5

Go Big

• Many scaling improvements focused on interactions with Zookeeper:

• Split cluster state management reduces chattiness in large multi-tenant implementations

• Improved performance for Overseer operations

• Better timeout defaults based on real-world testing

• See Shalin Mangar’s Revolution Keynote for more details: http://bit.ly/shalinRevKeynote

Page 18: Webinar: Inside Apache Solr 5

Distributed IDF

• IDF = Inverse Document Frequency = A measure of the relative importance of a word in a collection

• 4 implementations:

• LocalStatsCache: Local Stats

• ExactStatsCache: One time use aggregation

• ExactSharedStatsCache: Stats shared across requests

• LRUStatsCache: Stats shared in an LRU cache across requests
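To see why distributed IDF matters, the small sketch below compares per-shard IDF (the situation with local stats only) against IDF computed from statistics summed across shards (the idea behind the exact variants). It uses the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)), purely as an illustration; it is not Solr's implementation, and the shard names and counts are made up.

```python
import math
from typing import Dict, Tuple

# Hypothetical per-shard statistics for one term: (doc_freq, num_docs).
SHARDS: Dict[str, Tuple[int, int]] = {
    "shard1": (10, 1000),   # term is rare on this shard
    "shard2": (500, 1000),  # term is common on this shard
}


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene-style IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def local_idf(shards: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """IDF computed independently on each shard (local stats only)."""
    return {name: classic_idf(df, n) for name, (df, n) in shards.items()}


def global_idf(shards: Dict[str, Tuple[int, int]]) -> float:
    """IDF computed from doc_freq and num_docs summed over all shards."""
    total_df = sum(df for df, _ in shards.values())
    total_docs = sum(n for _, n in shards.values())
    return classic_idf(total_df, total_docs)


if __name__ == "__main__":
    for name, value in local_idf(SHARDS).items():
        print(f"{name}: local idf = {value:.3f}")
    print(f"global idf = {global_idf(SHARDS):.3f}")
```

With local stats, the same term is weighted very differently depending on which shard a document lives on, so scores from different shards are not directly comparable; aggregated statistics give every shard the same weight for the term.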

Page 19: Webinar: Inside Apache Solr 5

Get Finished

• Ease of getting started means nothing if you can’t stay running in production

• Jepsen tests simulate network partitions and data loss, i.e. “The Real World”

• https://github.com/LucidWorks/jepsen/tree/solr-jepsen

Page 20: Webinar: Inside Apache Solr 5

Stability Improvements

• Protection of ZK content

• ReplicationHandler now has an option to throttle the speed of replication

• More control over terminating long running queries

• Finite default timeouts for select and update requests
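The slide does not name the mechanism for terminating long-running queries. One plausible reading is Solr's `timeAllowed` query parameter (in milliseconds), which asks Solr to stop work once the budget is exhausted and to flag the response with `partialResults` in the response header. The sketch below assumes that reading; the base URL and collection name are placeholders.

```python
from typing import Dict
from urllib.parse import urlencode


def bounded_query_url(base_url: str, collection: str, query: str, time_allowed_ms: int) -> str:
    """Build a /select URL that asks Solr to give up after `time_allowed_ms`."""
    if time_allowed_ms <= 0:
        raise ValueError("time_allowed_ms must be positive")
    params = {"q": query, "wt": "json", "timeAllowed": str(time_allowed_ms)}
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def is_partial(response: Dict) -> bool:
    """True if a parsed JSON response is flagged as partial in its header."""
    return bool(response.get("responseHeader", {}).get("partialResults", False))


if __name__ == "__main__":
    print(bounded_query_url("http://localhost:8983/solr", "films", "*:*", 500))
    # Hand-written sample header, not real Solr output:
    print(is_partial({"responseHeader": {"status": 0, "partialResults": True}}))
```

A client would typically check `is_partial` before caching or paginating results, since a partial response may be missing matches.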

Page 21: Webinar: Inside Apache Solr 5

WELCOME TO THE FUTURE

Page 22: Webinar: Inside Apache Solr 5

Near Term Road Map

• Facets and Analytics:

• Mix and match all facet types and stats (SOLR-6352, SOLR-6353, SOLR-4212)

• Percentiles via t-digest (SOLR-6350)

• Replication performance (SOLR-6816)

• Finish off Config APIs (various)

• Data location aware ValueSource implementation for fast changing distributed data

• First class support for more languages OOTB

Page 23: Webinar: Inside Apache Solr 5

Resources

Release Notes: • Solr: http://wiki.apache.org/solr/ReleaseNote50 • Lucene: https://wiki.apache.org/lucene-java/ReleaseNote50

Lucidworks: http://www.lucidworks.com • Webinar recording will be available soon

Grant • [email protected] • Twitter: @gsingers

Page 24: Webinar: Inside Apache Solr 5