performance tuning in ranker

Upload: eossoftware

Post on 13-Jul-2015

Page 1: Performance tuning in ranker

Performance Tuning In Ranker.com

Agenda

● Introduction to Ranker
● Performance Challenges In Ranker
● Performance Tuning Strategies
● Conclusion

Introduction To Ranker

Ranker is a social site and platform that is, in essence, an operating system for lists. Ranker makes it easy, fun, and social for users to rank things (anything at all) via a Netflix-style drag-and-drop interface and a huge backend database. Everything in the system is an object, so that we can aggregate individual lists and answer the “wisdom of crowds” question “what is the best ___”.

Ranker is fully distributable. For example, a travel blog can embed Ranker on its site so that its users can easily rank their own favorite golf destinations. This gives the blog a sticky interactive tool, as well as valuable content showcasing a continually updated ranking of its community’s consensus picks for golf spots.

The Ranker platform is flexible enough to be used for publishing, social networking, shopping, polling, even organization.

Performance Challenges In Ranker

The Ranker application has to deal with two main performance issues:

1. Data Volume - One of the biggest challenges in building Ranker, compared to a regular web application, is the volume of data that has to be managed. Ranker deals with close to 10 million topics, most of which have been obtained from Freebase. Freebase uses a custom RDF store to persist and retrieve its data; Ranker, however, needs to achieve the same performance levels using a relational database.

2. Traffic - Ranker deals with a lot of social and entertaining content. This results in traffic spikes, where it's not uncommon to get a huge number of visitors in a very short span of time. We have often seen around 40,000 visits to a single page within one hour, before traffic subsides to a more reasonable volume.

Performance Tuning Strategies

The performance challenges explained in the previous slide are handled using the following methods:

● Caching
● Database De-normalization
● Hardware
● Delayed Calculation/Aggregation
● Event Based Post-Processing
● Search Indexing

Caching

Ranker implements caching using the open source Ehcache framework. Caching is applied at various depths within the application. Some parts of the application use caching only to store backend objects that are time-consuming to load, while other parts cache the entire request by storing the generated HTML in the cache.

Caching in Ranker is also well integrated with the custom CMS that is used to configure various pages in the application. The CMS allows us to specify different cache expiration times for each block of each configured page in Ranker.
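
The per-block expiration described above can be sketched as a small time-to-live cache. This is only an illustration in Python, not Ranker's Ehcache configuration: the `TTLCache` class, the `BLOCK_TTL_SECONDS` table, and the block names are all assumptions made for the example.

```python
import time


class TTLCache:
    """Tiny in-memory cache where each entry carries its own expiry time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expires_at, value)

    def get_or_render(self, key, ttl_seconds, render):
        """Return the cached value for key, calling render() on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = render()
        self._entries[key] = (now + ttl_seconds, value)
        return value


# Hypothetical per-block expiry settings, standing in for values set in a CMS.
BLOCK_TTL_SECONDS = {"top_lists": 300, "recent_comments": 30}


def render_block(cache, page, block, render):
    """Cache the generated HTML of one block of one page."""
    return cache.get_or_render((page, block), BLOCK_TTL_SECONDS[block], render)
```

Keying the cache on the (page, block) pair lets a slow-changing block keep a long expiry while a fast-changing block on the same page is refreshed more often.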

Database De-normalization

Since the Ranker traffic patterns indicate that a huge percentage of activity on the site is for "reads" and a much smaller percentage for "writes", de-normalizing the database provides huge performance benefits for the application.

Database de-normalization often involves duplicating data across tables in order to avoid expensive joins in the SQL queries. This adds overhead when editing or deleting the de-normalized entities: a single user action might require the application to update multiple locations. It is also prone to causing bugs when programmers are not aware of all the places where the data is duplicated. For these reasons, de-normalization has been used very cautiously, and only when none of the other approaches is applicable.
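
The trade-off can be made concrete with a minimal sketch using Python's built-in `sqlite3` module. The schema is hypothetical (Ranker's actual tables are not described in these slides): the author's name is copied into the `lists` table so that the read path needs no join, and the write path then has to update both copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- author_name duplicates users.name so list pages need no join.
    CREATE TABLE lists (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES users(id),
        author_name TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO lists VALUES (10, 'Best Golf Destinations', 1, 'alice');
""")


def list_page_row(list_id):
    """Read path: a single-table lookup, no join against users."""
    return conn.execute(
        "SELECT title, author_name FROM lists WHERE id = ?", (list_id,)
    ).fetchone()


def rename_user(user_id, new_name):
    """Write path: every copy of the name must change in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
        conn.execute(
            "UPDATE lists SET author_name = ? WHERE author_id = ?",
            (new_name, user_id),
        )
```

Forgetting the second `UPDATE` in `rename_user` is exactly the kind of bug the slide warns about: reads would keep returning the stale name.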

Hardware

Using better hardware is often a much simpler and cheaper option than investing a lot of time in improving the performance of some parts of the application. We have made sure that we have the most suitable hardware for the systems that are being built, based on the amount of memory and processing power needed.

Ranker also uses hardware load balancers to distribute the load across multiple web servers. This makes a huge difference when there is a spike in traffic, as mentioned in the “Performance Challenges In Ranker” section.

Here is one situation where coding for better hardware made a huge difference to performance: one of the background processes in Ranker needed to make around 9 million queries to complete its job. Later we realized that by loading all the data into memory in one shot, we could reduce the number of queries to a few thousand. However, this would require us to store around 3GB of data in memory, so it made more sense to get systems with more memory. This change made the process about 20 times faster.
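
The difference between the two access patterns can be sketched with `sqlite3` on a toy table. The `topics` table and the vote-summing job are assumptions for illustration only; the slides do not say what the real background process computes.

```python
import sqlite3


def make_db(n_topics):
    """Build a small in-memory table to stand in for a large topics table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE topics (id INTEGER PRIMARY KEY, votes INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO topics VALUES (?, ?)", [(i, i % 7) for i in range(n_topics)]
    )
    return conn


def total_votes_per_query(conn, topic_ids):
    """One query per topic: cheap on memory, expensive in round trips."""
    total = 0
    for topic_id in topic_ids:
        row = conn.execute("SELECT votes FROM topics WHERE id = ?", (topic_id,)).fetchone()
        total += row[0]
    return total


def total_votes_bulk(conn, topic_ids):
    """One query up front: the whole table is held in a dict in memory."""
    votes_by_id = dict(conn.execute("SELECT id, votes FROM topics"))
    return sum(votes_by_id[topic_id] for topic_id in topic_ids)
```

Both functions return the same result. The bulk version trades memory for far fewer queries, which is why it only pays off on machines with enough RAM to hold the data.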

Delayed Calculation/Aggregation

Ranker uses a large number of small batch programs that perform calculations using complex algorithms at regular intervals. This allows us to pre-calculate scores for lists and items, and hence avoid performing the calculations every time data is retrieved or stored. The tricky part of this technique is choosing the right amount of pre-calculation. Too much pre-calculation results in a large number of results to store; too little results in a lot of calculation while loading the data.

For example, this technique is used to calculate the most interesting lists in each domain in Ranker. The algorithm that identifies the interesting lists uses many factors, such as the number of views and the number of votes, and is executed once a day. In this case, instead of determining the most interesting lists in each domain, we only determine a universal score for each list. This score can then be used to get the most interesting lists in any domain.
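
A minimal sketch of the universal-score idea follows. The scoring formula and the `vote_weight` parameter are made up for the example, since the slides do not give the real algorithm; the point is only that the batch step stores one score per list and the read step reuses it for any domain.

```python
from dataclasses import dataclass


@dataclass
class RankedList:
    list_id: int
    domain: str
    views: int
    votes: int
    score: float = 0.0   # filled in by the batch job, not at request time


def recompute_scores(lists, vote_weight=5.0):
    """Batch step, e.g. run once a day: store one domain-independent score per list."""
    for item in lists:
        # Hypothetical formula; the real algorithm's factors and weights are not given.
        item.score = item.views + vote_weight * item.votes


def most_interesting(lists, domain, limit=10):
    """Read step: filter by domain and sort on the stored score; no recalculation."""
    in_domain = [item for item in lists if item.domain == domain]
    return sorted(in_domain, key=lambda item: item.score, reverse=True)[:limit]
```

Storing one score per list, instead of one ranking per domain, keeps the amount of pre-calculated data proportional to the number of lists.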

Event Based Post-Processing

This is another form of Delayed Calculation. Some user actions in Ranker require the system to perform complex or time-consuming operations. For cases like these, Ranker uses an event based post-processing framework to perform the operations asynchronously. This allows us to give the user a quick response and still perform the time-consuming operations within a few seconds of the action. The only disadvantage of this approach is the difficulty it causes in reporting errors and failures to the user.

For example, when someone comments on a list, we need to notify the list author through an email. Even though the list author should be notified promptly, a delay of a few seconds is acceptable. Performing this asynchronously allows us to respond quickly to the commenter, who does not have to wait for the email to be sent.
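
The comment-notification example can be sketched with a queue and a background worker thread from Python's standard library. This is not Ranker's framework: `add_comment`, `send_email`, and the in-memory `sent_emails` and `failures` lists are hypothetical stand-ins.

```python
import queue
import threading

events = queue.Queue()
sent_emails = []   # stand-in for a real mail gateway
failures = []      # errors surface here, not in the user's response


def send_email(to, body):
    """Hypothetical slow operation; a real one would talk to a mail server."""
    if not to:
        raise ValueError("missing recipient")
    sent_emails.append((to, body))


def worker():
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        event = events.get()
        try:
            if event is None:
                return
            send_email(event["author_email"], "New comment: " + event["text"])
        except Exception as exc:
            failures.append((event, repr(exc)))
        finally:
            events.task_done()


def add_comment(author_email, text):
    """Request path: enqueue the notification and return immediately."""
    events.put({"author_email": author_email, "text": text})
    return "comment saved"


threading.Thread(target=worker, daemon=True).start()
```

Note that `add_comment` returns "comment saved" even when the email later fails. The failure is only recorded in `failures`, which illustrates why reporting errors back to the user is harder with this approach.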

Search Indexing

Ranker uses popular indexing tools like Lucene and Solr to index all searchable data. Using a search index provides huge performance benefits while performing text based search in the application.

Different strategies are used to add data to the index. Entities that are frequently created or changed in the system are added to the index through an automated SQL query, which runs every 5 minutes. Other entities, like the data obtained from Freebase, are updated through a program that is triggered manually.

Solr allows us to search across a number of fields and also do so using different weights for each type of field, without compromising on the speed of the search.
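
To show what searching across fields with different weights means, here is a toy inverted index in Python. It does not use Lucene or Solr and is far simpler than either; the `TinyIndex` class, the field names, and the weights in `FIELD_WEIGHTS` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical per-field weights; a match in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "description": 1.0}


class TinyIndex:
    """Toy inverted index: (field, term) -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, fields):
        """Index or re-index one document; fields maps field name -> text."""
        self.remove(doc_id)
        for field, text in fields.items():
            for term in text.lower().split():
                self._postings[(field, term)].add(doc_id)

    def remove(self, doc_id):
        """Drop a document from every posting set."""
        for ids in self._postings.values():
            ids.discard(doc_id)

    def search(self, query):
        """Return doc ids ordered by the summed weights of the fields that matched."""
        scores = defaultdict(float)
        for term in query.lower().split():
            for field, weight in FIELD_WEIGHTS.items():
                for doc_id in self._postings.get((field, term), ()):
                    scores[doc_id] += weight
        return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

A lookup touches only the posting sets for the query terms instead of scanning every row, which is where the speed of an index-based search comes from. Calling `add` again for an existing document corresponds to the periodic re-indexing of changed entities.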

Conclusion

Making any change to the application to improve its speed and performance always involves certain trade-offs. In Ranker, we make changes only once they have been analyzed well and we are ready to handle all the side effects of the change. Changes often involve additional effort in maintenance and environment setup. Some of them even require us to acquire and maintain new servers, as in the case of search indexing and background processes.

By choosing a variety of techniques to handle the different performance problems in the application, Ranker has been able to scale with its traffic as it becomes more popular.