Implementing Elasticsearch in a Web application using Ruby on Rails Sai Krishna Vadavalli Problem Report Submitted to the College of Engineering and Mineral Resources at West Virginia University in partial fulfillment of the requirements for the degree of Master of Science in Computer Science Dr. Vinod Kulathumani, Ph.D., Chair Dr. Elaine M. Eschen, Ph.D. Dr. Thirimachos Bourlai, Ph.D. Department of Computer Science and Electrical Engineering Morgantown, West Virginia 2014 Keywords: Ruby on Rails, Index based search, Elasticsearch, Tire, Searchkick Copyright 2014 Sai Krishna Vadavalli


Page 1: Implementing Elasticsearch in a Web application using …wvuscholar.wvu.edu/reports/SaiKrishnaVadavalli.pdf · ABSTRACT Implementing Elasticsearch in a web application using Ruby

Implementing Elasticsearch in a Web application using

Ruby on Rails

Sai Krishna Vadavalli

Problem Report Submitted

to the College of Engineering and Mineral Resources

at West Virginia University

in partial fulfillment of the requirements for the degree of

Master of Science in Computer Science

Dr. Vinod Kulathumani, Ph.D., Chair

Dr. Elaine M. Eschen, Ph.D.

Dr. Thirimachos Bourlai, Ph.D.

Department of Computer Science and Electrical Engineering

Morgantown, West Virginia

2014

Keywords: Ruby on Rails, Index based search, Elasticsearch, Tire, Searchkick

Copyright 2014 Sai Krishna Vadavalli


ABSTRACT

Implementing Elasticsearch in a web application using Ruby on Rails

Sai Krishna Vadavalli

Given the continually increasing amount of information available on the Web, information retrieval systems are necessary to reduce information overload and to allow users to find information quickly and easily. Information retrieval techniques are employed in various applications, such as web search engines and enterprise search servers.

Web search engines search for information on the World Wide Web; enterprise search systems, in contrast, index and search documents from file systems, databases, email and similar sources using full-text search capabilities. Elasticsearch is a flexible, distributed, open source, real-time enterprise search server with full-text search capabilities. It provides a robust set of APIs, a query DSL and clients for popular languages to perform scalable and reliable search in distributed environments. It is built around Lucene’s Java libraries, which implement the actual algorithms for matching text and storing optimized indexes of searchable terms. This report discusses the process of developing a web application using Ruby on Rails, an open source full-stack web application framework that uses the Model-View-Controller pattern to organize application programming; the use of the JDBC River plugin to index an existing MySQL database into the schema-less indexing model of Elasticsearch; and the implementation of full-text search, facet-based search and autocomplete features in the application using the Tire and Searchkick gems (packages in the Ruby programming language).


ACKNOWLEDGEMENTS

I would like to thank Dr. Vinod Kulathumani for his constant support and invaluable guidance in

many aspects of my problem report.

I would like to extend my thanks to Dr. Thirimachos Bourlai & Dr. Elaine Eschen for being on

the committee and supporting the project with their suggestions and encouragement.

I would like to thank my beloved parents for their encouragement and support in every aspect of

my life.


TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION
    1.1 Background
    1.2 Problem Statement
CHAPTER 2: LITERATURE REVIEW
CHAPTER 3: ELASTICSEARCH SERVER
    3.1 Basic Concepts of Elasticsearch
    3.2 Inverted Index
    3.3 Relevance
    3.4 Data Management
    3.5 Shard Management
    3.6 Downloading and Installing Elasticsearch
CHAPTER 4: IMPLEMENTATION
    4.1 Indexing MySQL database to Elasticsearch using JDBC River plugin
    4.2 Implementing Elasticsearch using Tire and Searchkick gems on Ruby on Rails
CHAPTER 5: USER INTERFACE
    5.1 Starting Rails Server
    5.2 Wheel Interface
    5.3 Search
CHAPTER 6: FUTURE WORK
REFERENCES


TABLE OF FIGURES

Figure 1: Number of Websites on World Wide Web from 2000-2013
Figure 2: Elasticsearch Cluster Architecture
Figure 3: Shard Management
Figure 4: Starting Elasticsearch instance
Figure 5: Elasticsearch via Browser
Figure 6: Cluster Health
Figure 7: JDBC River to import data from MySQL
Figure 8: Starting Rails Web Server
Figure 9: Wheel Login/ Signup interface
Figure 10: Log in to wheel
Figure 11: Create an Account in Wheel
Figure 12: Create/ View Posts
Figure 13: Create/ Send Message
Figure 14: Inbox/ Sent Message
Figure 15: Facet based Search
Figure 16: Auto-Complete Search


CHAPTER 1: INTRODUCTION

1.1 Background:

The World Wide Web [1] is a collection of web pages containing text, images, videos and other multimedia, interlinked via hyperlinks and accessed over the Internet. With technological advances and the growing use of the Internet, the number of websites on the World Wide Web is increasing rapidly. According to statistics available on the Internet, we are fast approaching 1 billion websites [2] by the end of June 2014. The figure below shows the total number of websites by year from 2000 to 2013.

Figure 1: Number of Websites on World Wide Web from 2000-2013

With the enormous amount of information available on the internet, it has become next to

impossible to remember everything that is available on World Wide Web. That’s why the tools

which perform information retrieval have gained popularity in recent years.

Information retrieval systems reduce information overload and allow users to find information quickly and easily, without having to wade through numerous web pages. Information retrieval techniques are employed in many areas; web search engines, however, are the most visible information retrieval applications.

Web search engines [3] are designed to search for information on the World Wide Web. They work by storing information about many web pages, which are retrieved by an automated web crawler, or spider, that follows every link on a site. The retrieved pages are indexed by the search engine after the contents of each page are analyzed, and the search engine maintains a database of these indexes for use in later queries. Google, the most widely used web search engine, stores all or part of each source page (a cache) as well as information about the page.

Information retrieval systems can also be used to perform a full text search on websites of

universities, public libraries, online forums and e-commerce websites which contain huge amount

of information stored in their local databases. Inclusion of a full text search in a website provides

fast responsiveness and quicker solutions to queries submitted by the users.

Earlier, full-text search was implemented on relational databases such as MySQL using SELECT statements with the LIKE operator. This approach is complex and time consuming: when a LIKE pattern has a percent sign (%) both before and after the search string, the database cannot use an ordinary index and performs a table scan. Moreover, if many tables must be joined to answer a search query, retrieving the matching strings from the database takes a large amount of time.

To increase performance and to support flexible query operators and relevancy weighting, many full-text search tools that implement inverted indexing have been introduced in recent years, such as Apache Solr, BaseX, DataparkSearch, Elasticsearch and Sphinx. These tools include features such as wildcard search, phrase search, regular expressions and Boolean queries.

The report focuses on implementing Elasticsearch on a database of universities in a web

application named “Wheel” developed using Ruby on Rails.


1.2 Problem Statement:

This report deals with building a web application using Ruby on Rails, an open source full-stack web application framework based on the Model-View-Controller pattern. The application allows users to create a profile, generate posts, send e-mail messages to other users of the system, and search for information about various universities.

Elasticsearch has been integrated into the web application using the Tire and Searchkick gems for Ruby, which allow users to perform full-text search, faceted search and autocomplete on a database of universities. The data is indexed into the Elasticsearch server from the MySQL database using the JDBC River plugin.


2. LITERATURE REVIEW

The activity of obtaining information relevant to a query from a large collection of documents is called information retrieval. Information retrieval [4] systems can be distinguished by the scale at which they operate:

- Web search, which searches over billions of documents stored on millions of computers around the world.
- Personal information retrieval, in which information retrieval functionality is integrated into consumer operating systems and email clients.
- Enterprise, institutional and domain-specific search, in which information is retrieved from collections of internal documents that are typically stored on a centralized file system.

To reduce information overload and improve search performance, various information retrieval techniques have been introduced. Traditionally, text search was performed with linear search techniques, such as ‘grepping’ with the UNIX grep command, or SQL LIKE operators and join queries that match query strings against the contents of the database. Here the information is retrieved by a linear scan, also referred to as wildcard pattern matching. These traditional approaches have several disadvantages:

- The worst-case cost of a query is proportional to the number of elements being scanned.
- It is impractical to query a database with grep for a string such as ‘search for information’, in which a term such as ‘search’ may appear multiple times in the same sentence.
- Unlike index-based search, a linear scan does not support ranked retrieval, in which the best-matching results appear first in the list shown to the end user.

Thus, to avoid linearly scanning the text for each query, modern automated information retrieval tools build an inverted index in advance.
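The cost of the linear-scan approach can be illustrated with a minimal sketch. The sample documents and the method name linear_scan are hypothetical and serve only as an illustration; the point is that every document is inspected on every query and that the hits come back unranked.

```ruby
# Linear scan ("grep"-style) search over a small in-memory collection.
# The documents are hypothetical sample data, not taken from the report.
DOCUMENTS = [
  "Elasticsearch performs full text search",
  "Full text search uses inverted index",
  "Ruby on Rails is a web framework"
].freeze

# Returns the positions of all documents that contain the query string.
# Every document is inspected on every query, so the cost grows with the
# size of the collection, and hits come back in storage order, unranked.
def linear_scan(documents, query)
  needle = query.downcase
  documents.each_index.select { |i| documents[i].downcase.include?(needle) }
end

hits = linear_scan(DOCUMENTS, "full text")
puts hits.inspect
```

An index-based search replaces the per-query scan with a lookup in a structure that is built once, ahead of time.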

An inverted index [5] is an index data structure that maps terms, or content, to the documents or parts of a database file in which they occur. Building an inverted index involves collecting the documents to be indexed and assigning a Document ID to each document when it is first encountered, converting each document into a list of tokens (tokenization), normalizing the tokens to obtain the indexing terms, and pairing each term with the Document IDs of the documents in which it appears.

Information retrieval covers a wide range of data and information problems by providing the ability to search structured, semi-structured and unstructured data. Unstructured data is data without a clear, semantically overt structure, such as free text. This is in contrast to structured data, such as the relational databases that online retailers use to maintain product inventories and personnel records.

Several web search engines, such as Google, Baidu, Yahoo and Bing, have been designed around inverted indexes to search for information on the World Wide Web. Google [6], considered one of the most powerful web search engines, performs index-based search over approximately 60 trillion individual pages, which are managed in a keyword index of over 100 million gigabytes. It also uses the PageRank algorithm, which contributes to the relevancy score assigned to each web page. A web page’s ranking is determined by factors such as the following:

- Frequency and location of keywords in the web page: a lower score is assigned when a keyword appears only once or infrequently within the page.
- Age of the web page: Google prioritizes web pages based on how long they have been established.
- Number of web pages that link to the page in question: the rank of a page increases when many other web pages link to it.

To search personal, enterprise, institutional and domain-specific information, Lucene-based automated information retrieval tools were introduced. Lucene [7] has been widely recognized for its utility in implementing Internet search engines and local database search. It is an open source, highly scalable text search engine library, originally written in Java by Doug Cutting and supported by the Apache Software Foundation. Full-text search with Lucene involves creating an index of the documents in the database and parsing the user query to return results that match the prebuilt index. Several enterprise search server projects, such as Solr and Elasticsearch, have been developed to extend Lucene’s capabilities and to search a database of documents within a web application.

The flexibility of Lucene’s API allows text from PDF, HTML, Word and OpenDocument documents to be indexed, as long as the textual information can be extracted. Lucene has many features, such as:

- Powerful, accurate and efficient search algorithms
- Simultaneous indexing and searching
- Support for powerful query types such as phrase queries, wildcard queries and Boolean queries
- Scoring of each document that matches a given query, so that the most relevant documents are returned first

Solr [8] is a stand-alone enterprise search server with a REST-like API. Documents are indexed via XML, JSON, CSV or binary over HTTP, and results are obtained via HTTP GET requests in XML, JSON, CSV or binary formats. Solr has the following features:

- Full-text search capabilities
- Faceted search and filtering
- Real-time indexing
- Flexible and adaptable XML configuration

Compared to Elasticsearch, Apache Solr has some limitations in the areas of distributed replication, shard distribution, and recovery of corrupted nodes, which requires a manual fix.

In this project I implemented full-text search with an autocomplete feature using Elasticsearch in a Ruby on Rails web application. The autocomplete [9] feature gives users instant feedback as they type a query. Because an autocomplete query does not contain complete words, unlike a regular query submitted for index-based search, a better approach is to search based on n-grams. In this approach the normalized tokens produced by the indexing process are expanded so that partial-word queries match them directly.

For example, instead of indexing only the term [lookup], the nGram tokenizer expands it into [l] [lo] [loo] [look] [looku] [lookup]. Here an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The n-grams in this example are all anchored at the start of the word (edge n-grams), which is what matching a partially typed word requires.
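The expansion in the example above can be sketched in a few lines of Ruby. This is an illustration of the idea only, not the tokenizer that Elasticsearch itself runs; the method names edge_ngrams and build_prefix_index and the sample terms are assumptions made for the sketch.

```ruby
# Edge n-grams: every prefix of a term, from min_gram up to max_gram characters.
def edge_ngrams(term, min_gram: 1, max_gram: term.length)
  upper = [max_gram, term.length].min
  (min_gram..upper).map { |n| term[0, n] }
end

# Index-time expansion: map each prefix to the full terms it came from,
# so that a partially typed query can be answered with a single lookup.
def build_prefix_index(terms)
  index = Hash.new { |hash, key| hash[key] = [] }
  terms.each do |term|
    edge_ngrams(term.downcase).each { |gram| index[gram] << term }
  end
  index
end

puts edge_ngrams("lookup").inspect

# Hypothetical sample terms.
prefix_index = build_prefix_index(["lookup", "look", "logic"])
puts prefix_index["loo"].inspect
```

Because the prefixes are generated at index time, answering a partial query such as "loo" is a direct term lookup rather than a scan over all indexed words.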


3. ELASTICSEARCH SERVER

Elasticsearch [10] is a flexible, distributed, open source, real-time search engine that provides a robust set of APIs, a query DSL and clients for most popular languages to perform scalable and reliable search in distributed environments. The Elasticsearch server project was started by Shay Banon and first published in February 2010. Because of its distributed nature and real-time abilities it is also used as a document database. Thus, Elasticsearch can be used both as a search engine and as a data store.

Elasticsearch is Lucene:

Elasticsearch is a piece of infrastructure built around Lucene's Java libraries. Lucene implements the actual algorithms for matching text and storing optimized indexes of searchable terms.

Problems solved by Elasticsearch:

- Searching a large database of product descriptions for the best match and returning the best results.
- Autocompleting a search based on previously issued searches, while accounting for misspelt phrases in the search.
- Storing a large amount of semi-structured data in a distributed fashion.

Although Elasticsearch has the aforementioned capabilities, certain relational database operations cannot be performed on an Elasticsearch data store, such as enforcing unique records (for instance, on an ID or phone number) and performing mathematical operations such as sum and comparison on the stored data.

Why Elasticsearch? [11]

Basic search operations, such as single-word search and basic Boolean queries, can be performed on traditional databases. However, search becomes complex when the data is spread over multiple tables and columns, making it impractical to search content scattered throughout the database. Unstructured textual data did not fit well into traditional databases, and the need for full-text search over unstructured data became apparent. Various open source information retrieval tools, such as Lucene, Sphinx, Solr and Elasticsearch, were introduced to perform full-text search on unstructured textual data.


Apache Solr and Elasticsearch are considered the two major search tools currently in use. Because both are based on Lucene, there is little difference in the features they support. The fundamental difference between them is the way distributed replication works. Lucene stores the search index in immutable segments; as the number of documents grows, new segments are written and existing segments are merged into larger, more efficient ones.

Although Elasticsearch and Solr both use Lucene to store their search index, they replicate it to other servers differently. Solr copies new segment files to the other servers, and a commit is required before the content becomes visible on the other search shards. This works well when the application does not need to add new documents to the search index too often. Elasticsearch addresses this problem by sending each new document to be indexed by all replica shards, so that the information is available on every replica.

Elasticsearch also offers a more robust single-server deployment: it keeps a write-ahead log that is used to recover automatically if a node gets corrupted, whereas Solr needs a manual fix. Moreover, Elasticsearch has high availability built in: if the primary shard goes down, one of the replicas becomes the primary without much trouble, because all copies of a shard hold the same content.

3.1 Basic Concepts of Elasticsearch:

3.1.1 Index: An index is the place where Elasticsearch stores data. It can be thought of as a table in a relational database. In contrast to a relational database, the values stored in an index are prepared for fast and efficient full-text search operations.

3.1.2 Document: A document is the main entity stored in Elasticsearch, analogous to a row in a relational database table. Documents consist of fields, and a field may appear multiple times in a document, in which case it is called a multivalued field. The type of each field is important because it tells the search engine how to perform operations such as comparison and sorting.


3.1.3 Document type: A document type provides a way to differentiate between the kinds of objects stored in the same index. Although documents can have different structures, dividing them into types helps in data manipulation. For example, a blog application may store articles, books, magazines and comments as separate types.

3.1.4 Node and Cluster: Each Elasticsearch instance running on a single standalone server is called a node, and a collection of such cooperating servers forms a cluster. Data can be split across nodes via index sharding. Replicas help to achieve better availability and performance.

An Elasticsearch cloud consists of several clusters, each with a group of nodes that are instances of Elasticsearch.

Figure 2: Elasticsearch Cluster Architecture

3.1.5 Shard: Limitations such as hard disk capacity, computing power and RAM arise when a large number of documents must be stored on a single node. In such cases the data can be divided into shards, where each shard is a separate Lucene index, and the shards are spread among the nodes of the cluster.


3.1.6 Replica: Shard replicas are used to achieve high availability and throughput. A replica is an exact copy of a primary shard, and the primary shard is where operations that change the data are performed first. Every shard can have zero or more replicas. If the primary shard is lost or destroyed, the cluster promotes a replica to be the new primary shard.

3.2 Inverted Index [12]

Elasticsearch uses an inverted index structure, which allows very fast full-text searches. An inverted index consists of a list of the unique words that appear in any document and, for each word, the list of documents in which it appears. The content field of each document is split into separate words, called tokens or terms, and a list of these terms and the documents in which they appear is created. Consider two documents with the following content:

Doc1: “Elasticsearch performs full text search”
Doc2: “Full text search uses inverted index”

The resulting index looks something like this:

Term            Doc1   Doc2
Elasticsearch   X
Performs        X
Full            X      X
Text            X      X
Search          X      X
Uses                   X
Inverted               X
Index                  X

Now, to search for “performs search” we just need to find the documents in which each term appears:

Term            Doc1   Doc2
Performs        X
Search          X      X

Both documents match, but Doc1 contains both terms while Doc2 contains only one. Under a naïve similarity algorithm that simply counts matching terms, the document with the highest number of matches, here Doc1, is considered the better match. A search operation thus looks up the query terms in the inverted index and displays the matching results ordered by relevance score.
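The two-document example can be reproduced with a short Ruby sketch. It illustrates the data structure only and is not how Lucene stores its index; the method names and the simple lowercase word tokenizer are assumptions made for the sketch.

```ruby
require "set"

# Split text into lowercase word tokens (a deliberately simple analyzer).
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Build the inverted index: term => set of ids of documents containing it.
def build_inverted_index(documents)
  index = Hash.new { |hash, term| hash[term] = Set.new }
  documents.each do |doc_id, content|
    tokenize(content).each { |term| index[term] << doc_id }
  end
  index
end

# Naive similarity: count how many query terms each document contains
# and list the documents with the most matching terms first.
def naive_search(index, query)
  matches = Hash.new(0)
  tokenize(query).uniq.each do |term|
    index.fetch(term, []).each { |doc_id| matches[doc_id] += 1 }
  end
  matches.sort_by { |doc_id, count| [-count, doc_id.to_s] }
end

DOCS = {
  "Doc1" => "Elasticsearch performs full text search",
  "Doc2" => "Full text search uses inverted index"
}.freeze

index = build_inverted_index(DOCS)
puts naive_search(index, "performs search").inspect
```

Only the posting lists of the query terms are touched, so the work per query depends on how many documents contain those terms rather than on the size of the whole collection.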

3.3 Relevance:

The relevance of every document is represented by a positive floating-point number called ‘_score’, which is generated by the query clause. Relevance is computed by an algorithm that measures how similar the contents of a full-text field are to the query string. The standard similarity algorithm in Elasticsearch takes into account term frequency, inverse document frequency and the length of the field:

Term Frequency: How often the term appears in the field being searched. The more often a term appears, the more relevant the document is for that term.

Inverse Document Frequency: How many documents in the index contain the term. A term that appears in many documents carries lower weight than a rare, uncommon term.

Field Norm: How long the field is. A term appearing in a short content field carries more weight than the same term appearing in a long content field; the longer the content, the lower the relevance of each word in it.
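The three factors can be made concrete with a small sketch. The individual formulas below follow the classic Lucene TF/IDF definitions (square root of the term count, one plus the logarithm of the document ratio, and one over the square root of the field length); the way they are combined into a single score is a simplification assumed for illustration, so the numbers will not equal the ‘_score’ values that Elasticsearch returns. The corpus is hypothetical sample data.

```ruby
# Split text into lowercase word tokens (a deliberately simple analyzer).
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Term frequency: square root of the number of times the term occurs in the field.
def term_frequency(term, tokens)
  Math.sqrt(tokens.count(term))
end

# Inverse document frequency: terms contained in fewer documents weigh more.
def inverse_document_frequency(term, all_tokens)
  doc_freq = all_tokens.count { |tokens| tokens.include?(term) }
  1.0 + Math.log(all_tokens.size.to_f / (doc_freq + 1))
end

# Field norm: the shorter the field, the higher the weight of each term in it.
def field_norm(tokens)
  tokens.empty? ? 0.0 : 1.0 / Math.sqrt(tokens.size)
end

# Simplified score (an assumption of this sketch): sum over the query terms
# of term frequency * inverse document frequency * field norm.
def score(query, field, corpus)
  all_tokens = corpus.map { |text| tokenize(text) }
  tokens = tokenize(field)
  tokenize(query).uniq.sum do |term|
    term_frequency(term, tokens) *
      inverse_document_frequency(term, all_tokens) *
      field_norm(tokens)
  end
end

# Hypothetical sample corpus.
CORPUS = [
  "Elasticsearch performs full text search",
  "Full text search uses inverted index",
  "Search"
].freeze

CORPUS.each { |text| puts format("%.3f  %s", score("search", text, CORPUS), text) }
```

With this corpus the one-word field ranks highest for the query "search" because of its field norm, and a term such as "inverted", which occurs in a single document, receives a higher inverse document frequency than "search", which occurs in all three.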

3.4 Data Management

Every record in Elasticsearch must be stored as a JSON object. Thus, the concepts of data management and JSON are essential for working with Elasticsearch data and services.

Elasticsearch splits an index into many shards so that they can be spread over several nodes. Each shard is a Lucene index and can hold up to roughly 2^31 (about 2.1 billion) records, a limit inherited from Lucene. Elasticsearch performance scales horizontally with the number of shards. To avoid poor performance, the best practice is to keep each shard at a maximum size of about 10GB.
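How a record ends up on a particular shard can be sketched as follows. Elasticsearch derives the shard from a hash of the routing value (by default the document ID) modulo the number of primary shards. In this sketch Zlib.crc32 stands in for the hash function actually used by Elasticsearch, and the shard count and document IDs are hypothetical.

```ruby
require "zlib"

# shard = hash(routing value) % number_of_primary_shards
# Zlib.crc32 is a stand-in hash chosen for this sketch; it is not the
# hash function that Elasticsearch itself uses.
def shard_for(document_id, number_of_primary_shards)
  Zlib.crc32(document_id.to_s) % number_of_primary_shards
end

NUMBER_OF_SHARDS = 5 # hypothetical number of primary shards

# Distribute twenty hypothetical document ids over the shards.
placement = Hash.new { |hash, shard| hash[shard] = [] }
(1..20).each { |id| placement[shard_for(id, NUMBER_OF_SHARDS)] << id }
placement.sort.each { |shard, ids| puts "shard #{shard}: #{ids.inspect}" }
```

Because the shard is computed from the number of primary shards, that number has to stay fixed for an index once documents have been routed to it; otherwise the same ID would map to a different shard.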

Comparison of Elasticsearch structure with SQL and MongoDB

Elasticsearch            SQL               MongoDB
Index                    Database          Database
Record (JSON object)     Record (Tuple)    Record (BSON object)
Field                    Field             Field
Mapping                  Table             Collection
Shard                    Shard             Shard

3.5 Shard Management

Every index can have one or more replicas, which guard against node failures and data loss and give the cluster higher availability and improved performance. Shards are of two types:

Primary Shards: Shards that are part of the master index
Secondary Shards: Shards that are part of replicas

Consistency of operations is maintained by following these rules:

- Execute the write first on the primary shard
- If it succeeds, propagate it to the secondary shards

Thus, if the primary shard is deleted or destroyed, a secondary shard becomes the primary shard and the flow is re-executed.
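The write rules and the promotion of a secondary shard can be modelled with a toy in-memory sketch. The classes ShardCopy and ReplicationGroup are hypothetical names introduced for illustration; they are not part of Elasticsearch or of any gem, and real replication involves networking, versioning and failure detection that are omitted here.

```ruby
# One copy of a shard, holding documents keyed by id.
class ShardCopy
  attr_reader :name, :documents

  def initialize(name)
    @name = name
    @documents = {}
  end

  def write(id, document)
    @documents[id] = document
    true
  end
end

# A primary shard together with its secondary shards (replicas).
class ReplicationGroup
  attr_reader :primary, :replicas

  def initialize(primary, replicas)
    @primary = primary
    @replicas = replicas
  end

  # Rule 1: execute the write on the primary first.
  # Rule 2: only if that succeeds, propagate it to the secondary shards.
  def index(id, document)
    return false unless @primary.write(id, document)
    @replicas.each { |replica| replica.write(id, document) }
    true
  end

  # If the primary is lost, promote the first secondary shard to primary.
  def fail_primary!
    raise "no replica available to promote" if @replicas.empty?
    @primary = @replicas.shift
  end
end

group = ReplicationGroup.new(ShardCopy.new("P0"), [ShardCopy.new("R0a"), ShardCopy.new("R0b")])
group.index(1, { "univ_name" => "West Virginia University" })
group.fail_primary!
puts group.primary.name
puts group.primary.documents[1].inspect
```

Since every successful write has already been copied to the secondary shards, the promoted shard can serve the same documents as the lost primary.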


Figure 3: Shard Management

Cluster Indicator: Indicates the health of the cluster. The cluster can be in one of the following states:

Green – All primary and replica shards are allocated; the cluster is in good condition
Yellow – Some replica shards are missing. However, all primary shards are available and the cluster is in working condition
Red – One or more primary shards are missing, so data is threatened or lost
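The three states follow from which shard copies are assigned to a node. The sketch below expresses that rule; the method name cluster_status and the hash layout used to describe a shard copy are assumptions made for illustration, not an Elasticsearch API.

```ruby
# Each shard copy is described by a hash with two flags:
#   :primary  - true for a primary shard, false for a replica
#   :assigned - true if the copy is allocated to a node
def cluster_status(shard_copies)
  unassigned = shard_copies.reject { |copy| copy[:assigned] }
  return :green if unassigned.empty?
  return :red if unassigned.any? { |copy| copy[:primary] }
  :yellow
end

# Hypothetical single-node setup: the primary is allocated, but its replica
# has no second node to live on.
single_node = [
  { primary: true,  assigned: true },
  { primary: false, assigned: false }
]
puts cluster_status(single_node)
```

A single-node development setup with replicas configured typically behaves like this example: the data is fully available, but the replicas cannot be placed.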

3.6 Downloading and Installing Elasticsearch:

To download and install Elasticsearch, download the latest version from http://www.elasticsearch.org/download/ , choosing the package that matches the operating system, and extract the files into a directory.


3.6.1 Running Elasticsearch

Start an Elasticsearch server instance by opening the installation directory and running one of the following in the command line:

    bin/elasticsearch -f            [Linux/ MacOS]
    bin\elasticsearch.bat -f        [Windows]

Figure 4: Starting Elasticsearch instance

The default port for the HTTP API is 9200. If the default port is not available, the engine binds to the next free port. We can check that Elasticsearch is working by opening localhost port 9200 in a browser.

Figure 5: Elasticsearch via browser

It can also be checked by running a cURL command in the terminal, for example by querying the cluster health:

    curl -XGET 'http://localhost:9200/_cluster/health?pretty'

Figure 6: Cluster Health

The output is a structured JavaScript Object Notation (JSON) object.
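The same check can be made from Ruby using only the standard library. The sketch below is illustrative: the method names and the sample values are hypothetical, and fetch_cluster_health needs a running node, so the example at the bottom parses a sample response instead of calling it. The field names in the sample are those returned by the cluster health API.

```ruby
require "json"
require "net/http"
require "uri"

# Fetch the raw JSON from the cluster health endpoint.
# Requires a running Elasticsearch node; it is not called in this sketch.
def fetch_cluster_health(base_url = "http://localhost:9200")
  Net::HTTP.get(URI.join(base_url, "/_cluster/health"))
end

# Turn the JSON response into a one-line summary.
def summarize_health(json_text)
  health = JSON.parse(json_text)
  "#{health['cluster_name']}: #{health['status']} " \
    "(#{health['active_shards']} active, #{health['unassigned_shards']} unassigned shards)"
end

# Sample response with hypothetical values.
SAMPLE_RESPONSE = <<~RESPONSE
  {
    "cluster_name": "elasticsearch",
    "status": "yellow",
    "timed_out": false,
    "number_of_nodes": 1,
    "number_of_data_nodes": 1,
    "active_primary_shards": 5,
    "active_shards": 5,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 5
  }
RESPONSE

puts summarize_health(SAMPLE_RESPONSE)
```

With a node running, summarize_health(fetch_cluster_health) would produce the summary for the live cluster.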

3.6.2 Running Elasticsearch as a service:

To start the Elasticsearch instance automatically at system boot and stop it at system shutdown, the Elasticsearch deb archive or the service wrapper can be used; both provide the necessary startup scripts.


4. IMPLEMENTATION

4.1 Indexing MySQL database to Elasticsearch using JDBC River plugin

Elasticsearch provides a wide range of plugins that extend its functionality in a custom manner. One such plugin is the JDBC river plugin [13], developed by Jörg Prante, which fetches data from a MySQL database and indexes it into Elasticsearch. The plugin transforms the relational data into structured JSON objects for the schema-less indexing supported by Elasticsearch.

Figure 7: JDBC River to import data from MySQL

A JDBC river can be created by issuing a simple command after downloading the JDBC driver jar from MySQL and placing the jar file in the plugin folder. The sql field holds the SELECT statement whose result rows are indexed as documents; the statement shown here assumes that the universities are stored in a table named univs in the wheel_development database.

Command:

    curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
      "type" : "jdbc",
      "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/wheel_development",
        "user" : "root",
        "password" : "",
        "sql" : "select * from univs"
      }
    }'


4.2 Implementing Elasticsearch using Tire and Searchkick gems on Ruby on Rails

A gem in Ruby is a package that bundles a library's files together with the information needed to install it. The Tire [14] and Searchkick [15] gems provide a rich and comfortable Ruby API for the Elasticsearch search engine/ database. These gems have to be added to the models on which Elasticsearch is to be applied.

To install the Tire and Searchkick gems, include

    gem 'tire'
    gem 'searchkick'

in the application's Gemfile and run the bundle install command.

To implement Tire on a specific model:

    class Posts < ActiveRecord::Base
      include Tire::Model::Search
      include Tire::Model::Callbacks
    end

To implement Searchkick on a specific model:

    class Univ < ActiveRecord::Base
      searchkick autocomplete: ['univ_name']
    end
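Behind the model-level API, gems of this kind send JSON request bodies to the Elasticsearch search endpoint. As a rough illustration of what a full-text query with a facet looks like in the query DSL of the Elasticsearch versions current at the time of writing, the following sketch builds such a body as a plain Ruby hash. It is not the exact request generated by Tire or Searchkick; the method name university_search_body and the facet field state are hypothetical, and only univ_name comes from the model above.

```ruby
require "json"

# Build a search request body: a full-text match query on univ_name plus a
# terms facet that counts the matching documents per value of facet_field.
def university_search_body(text, facet_field:, size: 10)
  {
    "query"  => { "match" => { "univ_name" => text } },
    "facets" => { facet_field => { "terms" => { "field" => facet_field } } },
    "size"   => size
  }
end

# "state" is a hypothetical field name used only for this illustration.
body = university_search_body("west virginia", facet_field: "state")
puts JSON.pretty_generate(body)
```

The facet part of the response is what a faceted search interface uses to show, for each facet value, how many matching universities fall under it.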


5. USER INTERFACE

‘Wheel’ is a web application developed using robust Ruby on Rails framework [16]

which allows

users to create user profile, generate posts visible to other users of wheel, send personal messages

and search for basic information about universities. In order to view and perform earlier mentioned

operations on the application the rails server and elasticsearch server have to be started.

5.1 Starting Rails server:

Figure 8: Starting Rails Web Server

The ‘Thin’ web server is used because it is considered a secure, stable, fast and extensible Ruby web server that supports a higher level of concurrency than other Ruby web servers such as WEBrick and Mongrel.


5.2 Wheel Interface:

Figure 9: Wheel Login/ Signup interface

5.2.1 Login: Validates the user-supplied credentials and creates a unique session for the user if the provided credentials match the details in the database. An error is displayed if the credentials do not match or the user does not exist.

Figure 10: Log in to wheel


5.2.2 Signup:

Rails uses the ‘bcrypt’ gem to perform password hashing; bcrypt is a secure hash algorithm designed by the OpenBSD project. The system validates that the password and its confirmation match and that the email address is unique, and displays errors if any validation fails for the details provided by the user.
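The validations described above can be sketched in plain Ruby as follows. This is an illustration only, not the code used in Wheel: the method name signup_errors and the error messages are hypothetical, and the existing email addresses are passed in as a plain array rather than read from the database. Password hashing with bcrypt is left out so that the sketch needs no gems.

```ruby
# Hypothetical helper: returns a list of validation errors for a signup
# attempt. An empty list means the details are acceptable.
def signup_errors(email:, password:, password_confirmation:, existing_emails:)
  errors = []
  errors << 'Email cannot be blank' if email.to_s.strip.empty?
  errors << 'Password cannot be blank' if password.to_s.empty?
  errors << 'Password confirmation does not match' if password != password_confirmation

  # Uniqueness is checked case-insensitively (an assumption of this sketch).
  normalized = email.to_s.strip.downcase
  taken = existing_emails.map { |e| e.strip.downcase }
  errors << 'Email has already been taken' if !normalized.empty? && taken.include?(normalized)
  errors
end

existing = ['alice@example.com']
p signup_errors(email: 'Alice@example.com', password: 'secret',
                password_confirmation: 'secrte', existing_emails: existing)
p signup_errors(email: 'bob@example.com', password: 'secret',
                password_confirmation: 'secret', existing_emails: existing)
```

Collecting all errors in a list, rather than stopping at the first failure, lets the form report every problem to the user at once.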

Figure 11: Create an Account in Wheel

On successful registration the user is redirected to the home screen of Wheel, which provides navigation to sending messages and searching for universities. The home screen allows the user to read and create posts that are publicly visible on Wheel, and also to search the posts.


5.2.3 Creating Posts: The home screen allows the user to create posts and to read posts created by other users.

Figure 12: Create/ View posts

5.2.4 Send Message: It allows users to send personal messages to the registered users of the application.

Figure 13: Create/Send Message


5.2.5 View Sent/Received Messages:

Figure 14: Inbox/ Sent Message

5.3 Search:

The application supports full-text search over the information available about various universities in the database, as well as facet search based on state. It also includes an auto-complete feature, which matches the query against related terms in the indexed database and shows the appropriate matches as suggestions.
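To make these two features concrete, the sketch below imitates them over a small in-memory list in plain Ruby. It is not how elasticsearch implements faceting or auto-completion; the sample records and the method names state_facets and autocomplete are made up for illustration.

```ruby
# Made-up sample records standing in for the indexed university documents.
UNIVERSITIES = [
  { name: 'West Virginia University',    state: 'WV' },
  { name: 'Marshall University',         state: 'WV' },
  { name: 'Western Michigan University', state: 'MI' }
].freeze

# Facet: count how many records fall under each state.
def state_facets(records)
  records.each_with_object(Hash.new(0)) { |r, counts| counts[r[:state]] += 1 }
end

# Auto-complete: suggest names in which some word starts with the typed prefix.
def autocomplete(records, prefix)
  wanted = prefix.downcase
  records.map { |r| r[:name] }
         .select { |name| name.downcase.split.any? { |word| word.start_with?(wanted) } }
end

p state_facets(UNIVERSITIES)
p autocomplete(UNIVERSITIES, 'west')
```

In the application itself both operations are answered by elasticsearch from its index, so the cost does not grow with a linear scan of the records as it does in this sketch.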

Figure 15: Facet based search


Figure 16: Auto-Complete Search


6. FUTURE WORK

Wheel can be enhanced by introducing friendships and by letting users build a personal profile with additional details and avatars. Search can be extended to the users in the system. Also, elasticsearch supports range facets, which can be used to search the universities by rank, acceptance rate, graduation percentage and fee. Highlighting, suggestions and ranking can also be applied to the search.
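As a hedged illustration of the range facets mentioned above, the following Ruby snippet builds a request body in the facet syntax of the elasticsearch 1.x search API. The field name rank, the facet name rank_ranges and the bucket boundaries are assumptions chosen for the example; they are not taken from Wheel.

```ruby
require 'json'

# Builds a search body with a range facet. `field` and `boundaries` are
# illustrative; e.g. [50, 100] yields three buckets split at 50 and 100.
def range_facet_body(field, boundaries)
  ranges = [{ to: boundaries.first }]
  boundaries.each_cons(2) { |from, to| ranges << { from: from, to: to } }
  ranges << { from: boundaries.last }
  {
    query: { match_all: {} },
    facets: { "#{field}_ranges" => { range: { field: field, ranges: ranges } } }
  }
end

puts JSON.pretty_generate(range_facet_body('rank', [50, 100]))
```

The generated JSON could be sent as the body of a search request against the university index; the same helper would work for acceptance rate or fee by changing the field name and boundaries.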

Elasticsearch works with Kibana, an open-source tool for real-time analysis and visualization of streaming data such as logs. It can be used to perform time-based comparisons on a range of data accessed by users around the world.


REFERENCES

[1] World Wide Web http://en.wikipedia.org/wiki/World_Wide_Web

[2] http://www.internetlivestats.com/total-number-of-websites/

[3] Web search engine http://en.wikipedia.org/wiki/Web_search_engine

[4] Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan

[5] Inverted Index http://en.wikipedia.org/wiki/inverted_index

[6] http://www.google.com/insidesearch/howsearchworks/thestory/index.html

[7] Lucene http://en.wikipedia.org/wiki/Lucene

[8] Apache Solr Beginner’s Guide by Alfredo Serafini

[9] http://jontai.me/blog/2013/02/adding-autocomplete-to-an-elasticsearch-search-application/

[10] Elasticsearch server by Rafal Kuc, Marek Rogozinski

[11] Solr vs Elasticsearch by Rafal Kuc http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/

[12] Inverted Index http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/inverted-index.html

[13] JDBC plugin for elasticsearch https://github.com/jprante/elasticsearch-river-jdbc

[14] Tire library for elasticsearch https://github.com/karmi/retire

[15] Searchkick library for elasticsearch https://github.com/ankane/searchkick

[16] Beginning Ruby on Rails, Holzner, Steven

[17] Elasticsearch Cookbook by Paro, Alberto


[18] RailsSpace: Building a Social Networking Website with Ruby on Rails, Michael Hartl, Aurelius Prochazka

[19] Exploring elasticsearch http://exploringelasticsearch.com/

[20] Wikipedia http://www.wikipedia.org/