Implementing Elasticsearch in a Web application using
Ruby on Rails
Sai Krishna Vadavalli
Problem Report Submitted
to the College of Engineering and Mineral Resources
at West Virginia University
in partial fulfillment of the requirements for the degree of
Master of Science in Computer Science
Dr. Vinod Kulathumani, Ph.D., Chair
Dr. Elaine M. Eschen, Ph.D.
Dr. Thirimachos Bourlai, Ph.D.
Department of Computer Science and Electrical Engineering
Morgantown, West Virginia
2014
Keywords: Ruby on Rails, Index based search, Elasticsearch, Tire, Searchkick
Copyright 2014 Sai Krishna Vadavalli
ABSTRACT
Implementing Elasticsearch in a web application using Ruby on Rails
Sai Krishna Vadavalli
Given the continually increasing amount of information available on the Web, information retrieval
systems are necessary to reduce information overload and allow users to find information quickly
and easily. Information retrieval techniques are employed in various applications such as web
search engines and enterprise search servers.
Web search engines search for information on the World Wide Web, whereas enterprise search
systems index and search documents from file systems, databases, email and similar sources
using full text search capabilities. Elasticsearch is a flexible, distributed, open source, real-time
enterprise search server with full-text search engine capabilities. It offers a robust set of APIs,
a query DSL and clients for popular languages to perform scalable and reliable search in
distributed environments. It is built around Lucene’s Java libraries, which implement the actual
algorithms for matching text and storing optimized indexes of searchable terms. This report
discusses the process of developing a web application using Ruby on Rails, an open source
full stack web application framework that uses the Model-View-Controller pattern to organize
application programming; the use of the JDBC River plugin to index an existing MySQL database
into the schema-less indexing model of Elasticsearch; and the implementation of full text search,
facet based searching and autocomplete features in the application using the Tire and Searchkick
gems (packages in the Ruby programming language).
ACKNOWLEDGEMENTS
I would like to thank Dr. Vinod Kulathumani for his constant support and invaluable guidance in
many aspects of my problem report.
I would like to extend my thanks to Dr. Thirimachos Bourlai & Dr. Elaine Eschen for being on
the committee and supporting the project with their suggestions and encouragement.
I would like to thank my beloved parents for their encouragement and support in every aspect of
my life.
TABLE OF CONTENTS
CHAPTER 1: INTRODUCTION
1.1 Background
1.2 Problem Statement
CHAPTER 2: LITERATURE REVIEW
CHAPTER 3: ELASTICSEARCH SERVER
3.1 Basic Concepts of Elasticsearch
3.2 Inverted Index
3.3 Relevance
3.4 Data Management
3.5 Shard Management
3.6 Downloading and Installing Elasticsearch
CHAPTER 4: IMPLEMENTATION
4.1 Indexing MySQL database to elasticsearch using JDBC river plugin
4.2 Implementing elasticsearch using Tire and Searchkick gems on Ruby on Rails
CHAPTER 5: USER INTERFACE
5.1 Starting Rails Server
5.2 Wheel Interface
5.3 Search
CHAPTER 6: FUTURE WORK
REFERENCES
TABLE OF FIGURES
Figure 1: Number of Websites on World Wide Web from 2000-2013
Figure 2: Elasticsearch Cluster Architecture
Figure 3: Shard Management
Figure 4: Starting Elasticsearch instance
Figure 5: Elasticsearch via Browser
Figure 6: Cluster Health
Figure 7: JDBC River to import data from MySQL
Figure 8: Starting Rails Web Server
Figure 9: Wheel Login/ Signup interface
Figure 10: Log in to wheel
Figure 11: Create an Account in Wheel
Figure 12: Create/ View Posts
Figure 13: Create/ Send Message
Figure 14: Inbox/ Sent Message
Figure 15: Facet based Search
Figure 16: Auto-Complete Search
CHAPTER 1: INTRODUCTION
1.1 Background:
The World Wide Web [1] is a collection of web pages that contain text, images, videos and other
multimedia, interlinked via hyperlinks and accessed over the internet. With technological
advances and the increase in the usage of the internet, the number of websites on the World Wide
Web is growing rapidly. According to the statistics available on the internet, the number of
websites is fast approaching 1 billion [2] as of the end of June 2014. The figure below shows the
total number of websites by year from 2000-2013.
Figure 1: Number of Websites on World Wide Web from 2000-2013
With the enormous amount of information available on the internet, it has become next to
impossible to remember everything that is available on the World Wide Web. That’s why tools
that perform information retrieval have gained popularity in recent years.
Information retrieval systems are used to reduce information overload and allow users to find
information quickly and easily without having to wade through numerous web pages. There
are many areas where information retrieval techniques are employed. However, web search
engines are the most visible information retrieval applications.
Web search engine [3] systems are designed to search for information on the World Wide Web.
These search engines work by storing information about many web pages, which are retrieved by
an automated Web Crawler or Spider that follows every link on a site. The retrieved web pages are
indexed by the search engine after analyzing the contents of each page. A search engine maintains
a database of such indexes for use in later queries. Google is the most widely used web search
engine of this kind; it stores all or part of each source page (a cache) as well as information
about the web pages.
Information retrieval systems can also be used to perform full text search on the websites of
universities, public libraries, online forums and e-commerce sites, which contain huge amounts
of information stored in their local databases. Including full text search in a website provides
fast responses to the queries submitted by its users.
In earlier days, full text search was implemented on relational databases like MySQL by using
SELECT statements with the LIKE operator. However, this process is complex and time
consuming because a LIKE pattern with a percent sign (%) before and after the search string
forces a table scan. Thus, if a large number of tables have to be related in order to respond to a
search query, the search takes a large amount of time to retrieve matching strings from the
database.
In order to increase performance and to provide flexible query operators and relevancy weighting,
many full text search tools that implement inverted indexing have been introduced in recent years,
such as Apache Solr, BaseX, DataparkSearch, Elasticsearch and Sphinx. These improved full text
search tools include features like wildcard search, phrase search, regular expressions, Boolean
queries etc.
The report focuses on implementing Elasticsearch on a database of universities in a web
application named “Wheel” developed using Ruby on Rails.
1.2 Problem Statement:
The report deals with building a web application using Ruby on Rails, an open source full stack
web application framework based on the Model-View-Controller pattern. The application allows the
user to create a profile, generate posts, send e-mail messages to other users in the system and
search for information about various universities.
Elasticsearch has been implemented in the web application using the Tire and Searchkick gems
for Ruby, which allow the user to perform full text search, faceted search and autocomplete on a
database of universities that is indexed into the Elasticsearch server from the MySQL database
using the JDBC river plugin.
CHAPTER 2: LITERATURE REVIEW
The activity of obtaining information relevant to a query from a large collection of documents is
called information retrieval. Information retrieval [4] systems can be distinguished by their scale
of operation:
- Web information retrieval/search, which performs search over billions of documents stored on
  millions of computers around the world.
- Personal information retrieval, where information retrieval functionality is integrated into
  consumer operating systems and email clients.
- Enterprise, institutional and domain-specific search, where information is retrieved from
  collections of internal documents which are typically stored on a centralized file system.
To reduce information overload and increase the performance of search, various information
retrieval techniques were introduced. Traditionally, linear search techniques were used to perform
text search, such as ‘grepping’ with the UNIX grep command, or SQL LIKE operators and join
queries that match query strings against the resources available in the database. Here, the
information is retrieved by performing a linear scan, also referred to as wildcard pattern matching.
However, these traditional ways of retrieving information have some disadvantages:
- The worst case complexity of a query is proportional to the number of elements in the list.
- It is impractical to query a database with grep for a string such as ‘search for information’
  when the term ‘search’ appears multiple times in the same sentence.
- Unlike index based search, it does not allow rank based searching, where the best matching
  results appear early in the list displayed to the end user.
Thus, to avoid linearly scanning texts for each query, modern automated information retrieval
tools use an inverted index.
An inverted index [5] is an index data structure that stores a mapping from terms or content to
their locations in a database file or a set of documents. Building an inverted index involves
collecting the documents to be indexed and assigning a Document ID to each document when it first
appears, converting each document into a list of tokens by performing tokenization, normalizing
the tokens to obtain the indexing terms, and pairing each term with the Document IDs of the
documents in which it occurs.
Information retrieval covers a wide range of data and information problems by providing the
ability to search structured, semi-structured and unstructured data. Unstructured data refers to
data that does not have a clear, semantically overt structure, in contrast to structured data such
as the databases that online retailers use to maintain product inventories and personnel records.
Several web search engines designed around an inverted index are used to search for information
on the World Wide Web, such as Google, Baidu, Yahoo and Bing. Google [6], considered to be one of
the most powerful web search engines, performs index based search on over 60 trillion individual
pages, which are managed in a large index of keywords of over 100 million gigabytes. It also uses
the PageRank algorithm, which contributes to the relevancy score assigned to each web page.
A web page’s rank is influenced by the following factors:
- Frequency and location of keywords in a web page: A lower score is assigned when a keyword
  appears only once or infrequently within the web page.
- Web page existence time: Google prioritizes web pages based on how long they have been
  established.
- Number of web pages that link to the page in question: The rank of a page increases when
  many other web pages link to it.
Also, in order to perform search on personal, enterprise, institutional and domain-specific
information, Lucene based automated information retrieval tools were introduced. Lucene [7] has
been widely recognized for its utility in the implementation of internet search engines and local
database searching. It is an open source, highly scalable text search engine library originally
written in Java by Doug Cutting and supported by the Apache Software Foundation. The process of
full text search using Lucene includes creating an index for the documents in the database and
parsing the user query to display results that match the prebuilt indexes. Several enterprise
search server projects such as Solr and Elasticsearch have been developed to extend Lucene’s
capabilities and perform search on a database of documents within a web application.
The flexibility of Lucene’s API allows indexing text from PDF, HTML, Word and
OpenDocument documents, as long as textual information can be extracted. Lucene has many
features, such as:
- Powerful, accurate and efficient search algorithms
- Simultaneous indexing and searching
- Support for powerful query types such as phrase queries, wildcard queries, Boolean queries etc.
- Scoring of each document that matches a given query, returning the most relevant documents
  based on the scores
Solr [8] is a stand-alone enterprise search server with a REST-like API. Documents are indexed via
XML, JSON, CSV or binary over HTTP, and results are obtained via HTTP GET requests in XML,
JSON, CSV or binary formats. Solr has the following features:
- Full text search capabilities
- Faceted search and filtering
- Real time indexing
- Flexible and adaptable XML configuration etc.
Compared to Elasticsearch, Apache Solr has some limitations in distributed replication, in shard
distribution, and in needing a manual fix for corrupted nodes.
In this project I implemented full text search with an autocomplete feature using Elasticsearch in
a Ruby on Rails web application. The autocomplete [9] feature gives users instant feedback as they
type a query. Because an autocomplete query, unlike a regular query submitted for index based
search, does not contain complete words, a better approach is to perform the search based on
n-grams. In this approach the normalized tokens obtained by the indexing process are adjusted so
that partial word queries match directly.
For example, instead of indexing only the complete term [lookup], the nGram tokenizer adjusts the
index to contain [l] [lo] [loo] [look] [looku] [lookup]. Here, an n-gram is a contiguous sequence
of n items from a given sequence of text or speech.
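The prefix expansion described above can be illustrated with a minimal Ruby sketch. It is not the Elasticsearch tokenizer itself; the method names edge_ngrams and build_prefix_index and the min_gram/max_gram parameters are chosen here for illustration only.

```ruby
# Generate the edge n-grams (prefixes) of a term, from min_gram characters
# up to max_gram characters. Method and parameter names are illustrative.
def edge_ngrams(term, min_gram: 1, max_gram: term.length)
  upper = [max_gram, term.length].min
  (min_gram..upper).map { |n| term[0, n] }
end

# Index-time view: every prefix of each term points back to the full term,
# so a partially typed query can be looked up directly.
def build_prefix_index(terms)
  index = Hash.new { |hash, key| hash[key] = [] }
  terms.each do |term|
    edge_ngrams(term.downcase).each { |gram| index[gram] << term }
  end
  index
end

index = build_prefix_index(%w[lookup look elasticsearch])
p edge_ngrams("lookup")  # => ["l", "lo", "loo", "look", "looku", "lookup"]
p index["loo"]           # => ["lookup", "look"]
```

Because the prefixes are produced at index time, a partial query such as "loo" becomes an ordinary exact-term lookup rather than a wildcard scan.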
CHAPTER 3: ELASTICSEARCH SERVER
Elasticsearch [10] is a flexible, distributed, open source, real-time search engine which offers a
robust set of APIs, a query DSL and clients for most popular languages to perform scalable and
reliable search in distributed environments. The Elasticsearch server project was started by Shay
Banon and first published in February 2010. Because of its distributed nature and real-time
abilities it is also used as a document database. Thus, Elasticsearch can be used both as a search
engine and as a data store.
Elasticsearch is Lucene:
Elasticsearch is a piece of infrastructure built around Lucene's Java libraries. Lucene implements
the actual algorithms for matching text and storing optimized indexes of searchable terms.
Problems solved by Elasticsearch:
- Searching a large database of product descriptions for the best match and returning the best
  results.
- Autocompleting a search based on previously issued searches, accounting for misspelt
  phrases in the search.
- Storing a large amount of semi-structured data in a distributed fashion.
Although Elasticsearch has the aforementioned problem solving capabilities, there are certain
relational database operations that cannot be performed on an Elasticsearch database, such as
enforcing unique records (for instance on an ID or phone number) and performing mathematical
operations such as sum, comparison etc. on the data available in the database.
Why Elasticsearch? [11]
Basic search operations such as single word search and basic Boolean queries can be performed on
traditional databases. However, search gets complex when the data is spread over multiple tables
and columns, making it very difficult to search content scattered throughout the database.
Unstructured textual data does not fit well into traditional databases, so the need for
unstructured full-text searching became apparent. Various open source information retrieval
tools, such as Lucene, Sphinx, Solr and Elasticsearch, were introduced to perform full text search
on unstructured textual data.
Apache Solr and Elasticsearch are considered to be the two major search tools currently in use.
As both of them are based on Lucene, there is not much difference in the features they support.
The fundamental difference which makes one better than the other is the way distributed
replication works. Lucene stores the search index in immutable segments; as the number of
documents increases, new index segments are generated by merging existing immutable segments
into a more efficient segment.
Although Elasticsearch and Solr both use Lucene to store their search index, replication to
different servers differs between the two tools. Solr copies new segment files to the different
servers, which requires a call to commit before the content shows up in other search shards. This
procedure works well when the application does not need to add new documents to the search index
too often. Elasticsearch solved this problem by sending each new document to be indexed by all
copies of a shard, thus making the information available on all the replica shards.
Elasticsearch offers a more robust single server deployment by having a write-ahead log which
allows automatic recovery if a node gets corrupted, whereas Solr needs a manual fix. Moreover,
Elasticsearch has high availability built in: if the primary shard goes down, one of the replicas
becomes the primary without much trouble, as all the shards maintain the same content.
3.1 Basic Concepts of Elasticsearch:
3.1.1 Index: A place where Elasticsearch stores data. You can think of an index as a table in a
relational database. In contrast to a relational database, the values stored in an index are
prepared for fast and efficient full text search operations.
3.1.2 Document: The main entity stored in Elasticsearch, analogous to a row in a relational
database table. Documents contain fields, and a field may appear multiple times in a document;
such fields are called multivalued. The field type is important in an Elasticsearch document as
it tells the search engine how to perform operations such as comparison, sorting etc.
3.1.3 Document type: Document type provides a way to differentiate between kinds of objects.
Although documents can have different structures, dividing them into types helps in data
manipulation. For example, blog applications store articles, books, magazines, comments etc.
3.1.4 Node and Cluster: Each Elasticsearch instance runs on a single standalone server, which is
called a node, and a collection of such cooperating servers forms a cluster. Data can be split
across nodes via index sharding. Replicas help to achieve better availability and performance.
An Elasticsearch cloud consists of several clusters, each with a group of nodes which are
instances of Elasticsearch.
Figure 2: Elasticsearch Cluster Architecture
3.1.5 Shard: Limitations such as hard disk capacity, computing power and RAM arise when a
large number of documents has to be stored on a single node. In such cases the data can be
divided into shards, where each shard is a separate Lucene index, and the shards are spread among
the nodes of the cluster.
3.1.6 Replica: Shard replicas are used to achieve high availability and throughput. A replica is
an exact copy of the primary shard, which is where the actual operations on the data are
performed. Every shard can have zero or more replicas. If the primary shard gets lost or
destroyed, a replica is promoted by the cluster to be the new primary shard.
3.2 Inverted Index [12]
Elasticsearch uses an inverted index structure, which allows very fast full text searches. An
inverted index consists of a list of all the unique words that appear in any document and, for each
word, a list of the documents in which it appears.
The content field of each document is split into separate words, which are called tokens or terms.
A list of all such terms and the documents in which they appear is then created. Consider two
documents with the following content:
“Elasticsearch performs full text search”
“Full text search uses inverted index”
The result looks something like this:
Term            Doc1   Doc2
Elasticsearch   X
Performs        X
Full            X      X
Text            X      X
Search          X      X
Uses                   X
Inverted               X
Index                  X
Now, if we want to perform a search for “performs search” we just need to find the documents in
which each term appears:
Term            Doc1   Doc2
Performs        X
Search          X      X
Both documents match, but Doc1 contains both terms while Doc2 contains only one. Under a naïve
similarity algorithm that simply counts matching terms, the document with the highest number of
matches, i.e. Doc1, is considered the better match. A search operation thus looks up the query
terms in the inverted index and displays the matching results ordered by relevance score.
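The two-document example can be reproduced with a small Ruby sketch. It is an in-memory illustration only, assuming a very simple lowercase word tokenizer; the method names are hypothetical and the match-count ranking is the naïve similarity described above, not the scoring Elasticsearch actually applies.

```ruby
require "set"

# Split text into lowercase word tokens (a deliberately simple analyzer).
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Build term => set of document ids from a hash of id => text.
def build_inverted_index(documents)
  index = Hash.new { |hash, term| hash[term] = Set.new }
  documents.each do |doc_id, text|
    tokenize(text).each { |term| index[term] << doc_id }
  end
  index
end

# Count, per document, how many distinct query terms it contains, and
# return [doc_id, match_count] pairs with the best match first.
def naive_search(index, query)
  counts = Hash.new(0)
  tokenize(query).uniq.each do |term|
    index.fetch(term, []).each { |doc_id| counts[doc_id] += 1 }
  end
  counts.sort_by { |doc_id, count| [-count, doc_id.to_s] }
end

documents = {
  doc1: "Elasticsearch performs full text search",
  doc2: "Full text search uses inverted index"
}
index = build_inverted_index(documents)
p index["full"].to_a                      # => [:doc1, :doc2]
p naive_search(index, "performs search")  # => [[:doc1, 2], [:doc2, 1]]
```

The query never touches the document texts: it only reads the postings of "performs" and "search", which is what makes the lookup fast compared with a linear scan.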
3.3 Relevance:
The relevance of every document is represented by a positive floating point number called
‘_score’, which is generated by the query clause. Relevance is the algorithm used to calculate how
similar the contents of a full text field are to the query string. Elasticsearch uses a standard
similarity algorithm which takes into account the term frequency, the inverse document frequency
and the length of the field to evaluate relevance.
Term Frequency: How often a term appears in the searched field. The more often a term appears,
the more relevant the field is.
Inverse Document Frequency: How often each term appears in the index. A term that appears in
many documents of an index has a lower weight than a term that is uncommon.
Field Norm: How long the field is. A term appearing in a short content field carries more weight
than the same term appearing in a long content field. Thus, the longer the content, the lower the
relevance of each word in it.
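How the three factors combine can be sketched in Ruby as follows. This is a simplified illustration, assuming the classic Lucene TF/IDF formulas (square root of the term count, 1 + ln(numDocs / (docFreq + 1)), and 1 / sqrt(field length)); it omits query normalization, coordination and boosts, so the numbers will not match the ‘_score’ values returned by a real Elasticsearch server. All method names are hypothetical.

```ruby
# Simplified TF/IDF scoring over an in-memory collection of token arrays.

# Term frequency: grows with the number of occurrences, but sub-linearly.
def term_frequency(term, tokens)
  Math.sqrt(tokens.count(term))
end

# Inverse document frequency: terms found in fewer documents weigh more.
def inverse_document_frequency(term, all_docs)
  doc_freq = all_docs.count { |tokens| tokens.include?(term) }
  1.0 + Math.log(all_docs.size.to_f / (doc_freq + 1))
end

# Field-length norm: the same term counts for more in a shorter field.
def field_norm(tokens)
  tokens.empty? ? 0.0 : 1.0 / Math.sqrt(tokens.size)
end

# Sum tf * idf * norm over the query terms for one document.
def score(query_terms, tokens, all_docs)
  query_terms.sum do |term|
    term_frequency(term, tokens) *
      inverse_document_frequency(term, all_docs) *
      field_norm(tokens)
  end
end

docs = [
  %w[elasticsearch performs full text search],
  %w[full text search uses inverted index],
  %w[search search search engines index the web for search queries]
]
docs.each_with_index do |tokens, i|
  puts format("doc %d score: %.4f", i + 1, score(%w[performs search], tokens, docs))
end
```

In this toy collection the rare term "performs" receives a higher inverse document frequency than "search", which occurs in every document, so the first document scores highest for the query.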
3.4 Data Management
In Elasticsearch every record must be stored as a JSON object. Thus, the concepts of data
management and JSON are essential in order to work with Elasticsearch data and services.
Elasticsearch uses the common approach of splitting an index into many shards so that they can be
spread over several nodes. Every shard is a Lucene index and can store up to roughly 2^31 (about
2.1 billion) records. Elasticsearch performance scales horizontally with the number of shards.
To avoid poor performance, the suggested practice is to keep each shard to a maximum size of
about 10 GB.
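As a rough illustration of how records end up on different shards, the following Ruby sketch hashes a routing value (by default the document ID) and takes it modulo the number of primary shards. Zlib.crc32 is only a stand-in for the hash function Elasticsearch uses internally, and the document IDs are made up for the example.

```ruby
require "zlib"

# Pick the shard for a document: hash the routing value (by default the
# document id) and take it modulo the number of primary shards.
# Zlib.crc32 is a stand-in hash for illustration only.
def shard_for(routing_value, number_of_primary_shards)
  Zlib.crc32(routing_value.to_s) % number_of_primary_shards
end

# Group a batch of document ids by the shard they would land on.
def distribute(doc_ids, number_of_primary_shards)
  doc_ids.group_by { |id| shard_for(id, number_of_primary_shards) }
end

placement = distribute((1..20).map { |n| "univ-#{n}" }, 5)
placement.sort.each do |shard, ids|
  puts "shard #{shard}: #{ids.size} documents"
end
```

Since the shard is derived from the number of primary shards, the same document always maps to the same shard as long as that number does not change.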
Comparison of Elasticsearch structure with SQL and MongoDB
Elasticsearch           SQL              MongoDB
Index                   Database         Database
Record (JSON object)    Record (Tuple)   Record (BSON object)
Field                   Field            Field
Mapping                 Table            Collection
Shard                   Shard            Shard
3.5 Shard Management
An index can have one or more replicas in order to survive node failures without data loss and to
provide higher availability and improved cluster performance. Shards are of two types:
- Primary Shards: Shards that are part of the master index
- Secondary Shards: Shards that are part of the replicas
Consistency of operations is maintained by following these rules:
- Execute the write first in the primary shard
- If it succeeds, propagate it to the secondary shards
Thus, if the primary shard is deleted or destroyed, a secondary shard becomes the primary shard and
the flow is re-executed.
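The write rules and the promotion of a secondary shard can be modelled with a toy in-memory sketch in Ruby. The class and method names (ShardCopy, ReplicatedShard, fail_primary!) are hypothetical and nothing here talks to a real cluster; the sketch only shows the order of operations.

```ruby
# One copy of a shard: a name plus the documents it holds.
ShardCopy = Struct.new(:name, :documents)

# A primary shard together with its replicas (a toy, in-memory model).
class ReplicatedShard
  attr_reader :primary, :replicas

  def initialize(replica_count)
    @primary  = ShardCopy.new("primary", {})
    @replicas = Array.new(replica_count) { |i| ShardCopy.new("replica-#{i}", {}) }
  end

  # Rule 1: execute the write on the primary first.
  # Rule 2: only after that succeeds, propagate it to every replica.
  def write(id, document)
    raise "no primary shard available" if @primary.nil?
    @primary.documents[id] = document
    @replicas.each { |replica| replica.documents[id] = document }
    true
  end

  # Simulate losing the primary: promote the first replica, if any.
  def fail_primary!
    @primary = @replicas.shift
  end
end

shard = ReplicatedShard.new(2)
shard.write(1, { "univ_name" => "West Virginia University" })
shard.fail_primary!
p shard.primary.name               # => "replica-0"
p shard.primary.documents.key?(1)  # => true
```

Because every write reached the replicas before the failure, the promoted copy already holds the document and can serve as the new primary.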
Figure 3: Shard Management
Cluster Indicator: Indicates the health of the cluster. The following are the different states of
the cluster:
- Green: All primary and replica shards are allocated and the cluster is in good condition
- Yellow: Some replica shards are missing. However, the cluster is in working condition
- Red: One or more primary shards are missing, with a threat or loss of data
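The relationship between shard allocation and the three colours can be sketched in Ruby. The ShardState structure, its field names and the index name "universities" are illustrative assumptions; a real cluster reports its status through the cluster health API rather than through code like this.

```ruby
# Each shard copy is described by whether it is a primary and whether it
# is currently assigned to a node. The field names are illustrative.
ShardState = Struct.new(:index, :primary, :assigned)

# Derive the cluster colour from the state of all shard copies.
def cluster_status(shards)
  primaries, replicas = shards.partition(&:primary)
  return :red    unless primaries.all?(&:assigned)
  return :yellow unless replicas.all?(&:assigned)
  :green
end

single_node = [
  ShardState.new("universities", true,  true),
  ShardState.new("universities", false, false)  # replica has no node to live on
]
p cluster_status(single_node)  # => :yellow
```

This also suggests why a single-node development setup with replicas configured typically reports yellow: the replica cannot be placed on the same node as its primary.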
3.6 Downloading and Installing Elasticsearch:
The following steps should be followed to download and install Elasticsearch:
Download the latest version of Elasticsearch from http://www.elasticsearch.org/download/ . Choose
the package that matches the operating system and extract the files into a directory.
3.6.1 Running Elasticsearch
Start an Elasticsearch server instance by opening the installation directory and running one of
the following in the command line:
bin/elasticsearch -f [Linux/ MacOS]
bin\elasticsearch.bat -f [Windows]
Figure 4: Starting Elasticsearch instance
The default port for the HTTP API is 9200. However, if the default port is not available the engine
binds to the next free port. We can check that Elasticsearch is working on localhost port 9200 via
a browser.
Figure 5: Elasticsearch via browser
It can also be checked by running a cURL command in the terminal, passing a query as
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
Figure 6: Cluster Health
The output is a structured JavaScript Object Notation (JSON) object.
3.6.2 Running Elasticsearch as a service: In order to automatically start the Elasticsearch
instance during system boot and stop it during system shutdown, the Elasticsearch deb archive or
the service wrapper can be used, both of which provide the necessary startup scripts.
CHAPTER 4: IMPLEMENTATION
4.1 Indexing MySQL database to Elasticsearch using JDBC River plugin
Elasticsearch provides a wide range of plugins to extend its functionality in a custom manner.
One such plugin is the JDBC river plugin [13] developed by Jörg Prante, which fetches data from a
MySQL database and indexes it into Elasticsearch. The plugin transforms the fetched data into
structured JSON objects for the schema-less indexing supported by Elasticsearch.
Figure 7: JDBC River to import data from MySQL
A JDBC river can be created by issuing a simple command after downloading the JDBC driver jar
from MySQL and placing the jar file into the plugin folder. The "sql" parameter holds the query
whose result rows are indexed; the table name univs in the command below is an assumption based
on the Univ model used in Section 4.2.
Command:
curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mysql://localhost:3306/wheel_development",
    "user" : "root",
    "password" : "",
    "sql" : "select * from univs"
  }
}'
4.2 Implementing elasticsearch using Tire and Searchkick gems on Ruby on Rails
A gem in Ruby is a package containing a library together with the information and files needed to
install it. Ruby has the Tire [14] and Searchkick [15] gems, which provide a rich and comfortable
Ruby API for the Elasticsearch search engine/database. These gems have to be added to the models
on which Elasticsearch has to be applied.
In order to install the Tire and Searchkick gems we have to include
gem 'tire'
gem 'searchkick'
in the application’s Gemfile and run the bundle install command.
To implement Tire on a specific model:
class Posts < ActiveRecord::Base
  # Adds the search API to the model
  include Tire::Model::Search
  # Keeps the index in sync when records are saved or destroyed
  include Tire::Model::Callbacks
end
To implement Searchkick on a specific model:
class Univ < ActiveRecord::Base
  # Indexes the model and enables autocomplete on the univ_name field
  searchkick autocomplete: ['univ_name']
end
CHAPTER 5: USER INTERFACE
‘Wheel’ is a web application developed using the Ruby on Rails framework [16] which allows
users to create a user profile, generate posts visible to other users of Wheel, send personal
messages and search for basic information about universities. In order to view the application and
perform the operations mentioned above, the Rails server and the Elasticsearch server have to be
started.
5.1 Starting Rails server:
Figure 8: Starting Rails Web Server
The ‘Thin’ web server is used as it is considered to be a secure, stable, fast and extensible
Ruby web server with a higher level of concurrency than other Ruby web servers like
WEBrick and Mongrel.
5.2 Wheel Interface:
Figure 9: Wheel Login/ Signup interface
5.2.1 Login: Validates the user supplied credentials and creates a unique session for the user if
the provided credentials match the details in the database. An error is displayed if the
credentials do not match or the user does not exist.
Figure 10: Log in to wheel
5.2.2 Signup:
Rails supports the ‘bcrypt’ gem, which implements bcrypt, a secure password hashing algorithm
designed for OpenBSD. The system performs validations for password mismatch and uniqueness of
email address, and displays errors if any validation fails based on the details provided by the
user.
Figure 11: Create an Account in Wheel
On successful registration the user is redirected to the home screen of Wheel, which provides
navigation to sending messages and searching for universities. The home screen allows the user to
read and generate posts visible to the public on Wheel and also to perform search on posts.
5.2.3 Creating Posts: The home screen allows the user to generate posts and to read posts
generated by other users.
Figure 12: Create/ View posts
5.2.4 Send Message: It allows users to send personal messages to the registered users of the
application.
Figure 13: Create/Send Message
5.2.5 View Sent/Received Messages:
Figure 14: Inbox/ Sent Message
5.3 Search:
The application allows performing a full text search on the information available about various
universities in the database, as well as a facet search based on different states. It also includes
an autocomplete feature which matches the query with related terms in the indexed database and
shows the appropriate matches as suggestions.
Figure 15: Facet based search
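The idea behind the state facet can be illustrated with a minimal in-memory Ruby sketch. The records, field names and method names below are hypothetical and do not come from the Wheel database; in the application the counting is done by Elasticsearch through the gems rather than by code like this.

```ruby
# Hypothetical records; the field names and values are made up for illustration.
UNIVERSITIES = [
  { name: "Example State University",   state: "WV" },
  { name: "Example Technical College",  state: "OH" },
  { name: "Example University of Arts", state: "WV" }
].freeze

# Full-text-like match: keep records whose name contains every query word.
def search(records, query)
  words = query.downcase.split
  records.select { |r| words.all? { |w| r[:name].downcase.include?(w) } }
end

# Terms facet: count the matching records per distinct value of a field.
def facet_counts(records, field)
  records.group_by { |r| r[field] }.transform_values(&:size)
end

# Narrow results to one facet value, as happens when a user clicks a facet.
def filter_by(records, field, value)
  records.select { |r| r[field] == value }
end

hits = search(UNIVERSITIES, "example")
facet_counts(hits, :state).each { |state, count| puts "#{state}: #{count}" }
p filter_by(hits, :state, "WV").map { |r| r[:name] }
```

The facet counts are computed over the current search results, so they tell the user how many matches remain in each state before a filter is applied.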
Figure 16: Auto-Complete Search
CHAPTER 6: FUTURE WORK
Wheel can be enhanced by introducing friendships and by letting users build a personal profile
with details about themselves and an avatar. Search can be extended to the users in the system.
Also, Elasticsearch supports range facets, which can be used to search universities based on rank,
acceptance rate, graduation percentage and fees. Highlighting, suggestions and ranking can also
be applied to the search.
Elasticsearch works with Kibana, a highly scalable open-source tool for real time analysis of
streaming data, to visualize data logs. It can be used to perform time based comparisons on a
range of data accessed by users around the world.
REFERENCES
[1] World Wide Web http://en.wikipedia.org/wiki/World_Wide_Web
[2] http://www.internetlivestats.com/total-number-of-websites/
[3] Web search engine http://en.wikipedia.org/wiki/Web_search_engine
[4] Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
[5] Inverted Index http://en.wikipedia.org/wiki/inverted_index
[6] http://www.google.com/insidesearch/howsearchworks/thestory/index.html
[7] Lucene http://en.wikipedia.org/wiki/Lucene
[8] Apache Solr Beginner’s Guide by Alfredo Serafini
[9] http://jontai.me/blog/2013/02/adding-autocomplete-to-an-elasticsearch-search-application/
[10] Elasticsearch server by Rafal Kuc, Marek Rogozinski
[11] Solr vs Elasticsearch by Rafal Kuc http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/
[12] Inverted Index http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/inverted-index.html
[13] JDBC plugin for elasticsearch https://github.com/jprante/elasticsearch-river-jdbc
[14] Tire library for elasticsearch https://github.com/karmi/retire
[15] Searchkick library for elasticsearch https://github.com/ankane/searchkick
[16] Beginning Ruby on Rails by Steven Holzner
[17] Elasticsearch Cookbook by Alberto Paro
[18] RailsSpace: Building a Social Networking Website with Ruby on Rails by Michael Hartl, Aurelius Prochazka
[19] Exploring elasticsearch http://exploringelasticsearch.com/
[20] Wikipedia http://www.wikipedia.org/