What’s new in Apache Solr 5.0
Who am I?
• Anshum Gupta, Apache Lucene/Solr committer, Lucidworks Employee.
• Search and related stuff for 9+ years.
• Apache Lucene since 2006 and Solr since 2010.
• Organizations I am or have been a part of:
Solr - Releases
Ease of Use: Because usability doesn’t end after the first five minutes!
Scripts - Richer, faster, easier!
• Solr Demo:
• bin/post script
• Auto config-set copying
• Create -> Post -> Browse -> Delete
• bin/solr start -e cloud -noprompt ; bin/post -c gettingstarted http://lucidworks -recursive 2; open http://localhost:8983/solr/gettingstarted/browse
Example is now Server
• No default collection1
• Configset options
• ant example -> ant server
• post.sh
Posting documents was never so easy!
• bin/post script wraps around the improved SimplePostTool
• Index JSON directly, out of the box (OTB)
• Developers: SolrServer is now SolrClient
Managing Solr
Managing Solr Configuration - Application
• Paramsets: Add/Edit
• initParams: Generic appends, invariants and defaults outside of the component
• Schema API: REST API for adding field types, and dynamic fields
• Managing requestHandlers through API
• Implicit registration of replication, get and admin Handlers.
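A minimal sketch of what a Schema API call for the "field types and dynamic fields" bullet might look like. It only builds the HTTP request and does not send it. The host, the collection name `gettingstarted`, and the names `text_ws_demo` and `*_demo` are assumptions for illustration, and the collection is assumed to use a managed (API-editable) schema.

```python
import json
from urllib.request import Request

# Assumptions: a local Solr 5.x node and a collection called "gettingstarted"
# that is configured with a managed schema.
SOLR_URL = "http://localhost:8983/solr"
COLLECTION = "gettingstarted"


def schema_request(commands):
    """Build (but do not send) a POST request carrying Schema API commands."""
    body = json.dumps(commands).encode("utf-8")
    return Request(
        f"{SOLR_URL}/{COLLECTION}/schema",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical field type and dynamic field, used only for illustration.
commands = {
    "add-field-type": {
        "name": "text_ws_demo",
        "class": "solr.TextField",
        "analyzer": {"tokenizer": {"class": "solr.WhitespaceTokenizerFactory"}},
    },
    "add-dynamic-field": {
        "name": "*_demo",
        "type": "text_ws_demo",
        "stored": True,
    },
}

req = schema_request(commands)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it to a running node.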
Managing the cluster - Systems
• Collection APIs
• BALANCESHARDUNIQUE: Even distribution of custom replica properties
• Improved APIs
• Option to not shuffle nodeSet specified during CREATE Collection
• Logging
• Transaction log replay status
• Slow request (optional)
• Support for editing common solrconfig.xml values
• Scripts to support installing and running Solr as a service on Linux.
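The two Collection API items above can be sketched as plain URLs. This is an illustration only: the host, the collection name `demo`, and the node names are assumptions, and nothing is sent to a server.

```python
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"  # assumption: a local SolrCloud node


def collections_api_url(action, params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action}
    query.update(params)
    return f"{SOLR_URL}/admin/collections?{urlencode(query)}"


# CREATE with an explicit node list, asking Solr not to shuffle it.
# The node names are hypothetical.
create_url = collections_api_url("CREATE", {
    "name": "demo",
    "numShards": 2,
    "replicationFactor": 1,
    "createNodeSet": "host1:8983_solr,host2:8983_solr",
    "createNodeSet.shuffle": "false",
})

# BALANCESHARDUNIQUE: spread a replica property evenly across nodes.
balance_url = collections_api_url("BALANCESHARDUNIQUE", {
    "collection": "demo",
    "property": "preferredLeader",
})

print(create_url)
print(balance_url)
```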
Keeping Solr Instance(s) Stable
• ReplicationHandler now has an option to throttle the speed of replication
• timeAllowed respected more widely - Query expansion, collection and LBHttpSolrClient retries
• Finite default timeouts for select and update requests
Scalability
• Splitting of ClusterState
• Every collection has its own cluster state
• No need to watch what everyone else is doing
• Might be the default in 5.0
• Improved Solr - Zk communication
• Speed up overseer operations avoiding cluster state reads from zookeeper at the start of each loop
• Better default timeouts to operate at a large scale
Solr scalability is unmatched.
Features
Distributed IDF
• Multiple contributors and almost 5 years.
• 4 implementations OTB:
• LocalStatsCache: Local Stats
• ExactStatsCache: One time use aggregation
• ExactSharedStatsCache: Stats shared across requests
• LRUStatsCache: Stats shared in an LRU cache across requests
• Flow:
• Conditionally Send GET_TERM_STATS request to participating nodes
• Compute global values, another request for SET_TERM_STATS + GET_TOP_IDS
• Conditional GET_FIELDS
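A small worked example of why global term statistics matter. It assumes the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)), and two hypothetical shards with invented counts. It is not the StatsCache implementation, only the arithmetic behind it.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic Lucene TF-IDF inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))


# Hypothetical per-shard statistics for one term: the term is rare on
# shard1 and common on shard2.
shards = {
    "shard1": {"doc_freq": 1, "num_docs": 1000},
    "shard2": {"doc_freq": 400, "num_docs": 1000},
}

# Local stats: each shard scores the term with its own IDF.
local_idf = {
    name: classic_idf(s["doc_freq"], s["num_docs"]) for name, s in shards.items()
}

# Global stats: aggregate docFreq and numDocs first, then compute one IDF
# that every shard uses, so scores are comparable across shards.
global_doc_freq = sum(s["doc_freq"] for s in shards.values())
global_num_docs = sum(s["num_docs"] for s in shards.values())
global_idf = classic_idf(global_doc_freq, global_num_docs)

for name, value in local_idf.items():
    print(f"{name}: local idf = {value:.3f}")
print(f"global idf = {global_idf:.3f}")
```

With local stats, the same term is weighted very differently depending on which shard a document lives on. With aggregated stats, every shard applies the same weight.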
Stats Component
• stats.field can now be used to generate stats over the numeric results of arbitrary functions, e.g.:
• stats.field={!func}product(price,popularity)
• Stats hang off pivots via tags
And there are more…
• DateRangeField for indexing date ranges, especially multi-valued ones.
• Spatial fields that used to require units=degrees now take distanceUnits=degrees/kilometers/miles instead.
• MoreLikeThis QueryParser: Works in SolrCloud mode too.
• API for managing blobs
and more…
• First class support in SolrJ for Collection API calls
• Upgrade Tika to 1.7: This adds support for parsing Outlook PST and Matlab (MAT) files.
Maturity
• Jepsen tests
• More unit tests and more success stories of Solr.
• Protection of ZK content
No more WAR!
• Solr is now an app: no more shipping a WAR, starting with Solr 5.0
• Upgrade to Jetty 9 coming soon
• Will allow for a lot of things (SPDY) that wouldn’t be possible if we had to support Tomcat/Netty/Jetty/everything else.
Between 4.10 and 5.0: The new Identity
Timeline*
• Release branch cut
• 2nd RC vote in progress.
• Vote - 3 days, 3 votes
• Artifacts propagation to ASF mirrors - 1 day
• Official release note - Right after!
* prospective and subject to how things go
Coming soon
• Collections API: REBALANCESHARDS
• Spatial 2D heat-map faceting
• Facet and analytics
• Replication performance
• More API goodness
Questions?
Connect @
http://www.twitter.com/anshumgupta
http://www.linkedin.com/in/anshumgupta/