Copyright © IJIFR 2014

Reviewed Paper

International Journal of Informative & Futuristic Research ISSN (Online): 2347-1697

Volume 2 Issue 3 November 2014

Narayan Bhausaheb Vikhe1

Department of Computer Engineering,

Sir Visvesvaraya Institute of Technology,

Nashik - Maharashtra

Prof. Mrs. M.M.Naoghare2

Department of Computer Engineering,

Sir Visvesvaraya Institute of Technology,

Nashik - Maharashtra

Abstract

In this article, we discuss query languages, which are computer languages used to make queries into databases and information systems. The difference between the two is that a database query language attempts to give factual answers to factual questions, while an information retrieval query language attempts to find documents containing information that is relevant to an area of inquiry. Database management systems (DBMSs) are computer software applications that interact with the user, other applications, and the database itself to capture and analyze data. The data are typically organized to model aspects of reality in a way that supports processes requiring information. Database management systems are often classified according to the database model that they support.

QueRIE: System for Personalized Query Recommendation

Paper ID: IJIFR/V2/E3/007
Page No.: 502-507
Subject Area: Computer Engineering
Key Words: Data Mining, Metadata, Database, Sky Server, Ad-Hoc, Cryptography, QueRIE

1. Introduction

Database systems provide the critical infrastructure to access and analyze large volumes of data in a variety of applications. Prominent examples include large-scale data warehouses that support business-intelligence tools, systems for ad-hoc analytics over big data, and services for scientific-data exploration, such as the Genome Browser or SkyServer, which allow scientists to query large databases of scientific data over a web-enabled interface. Relational database users employ a query interface (typically, a web-based client) to issue a series of SQL queries that aim to analyze the data and mine it for interesting information. First-time users may not have the necessary knowledge to know where to start their exploration. Other times, users may simply overlook queries that retrieve important information. In this work we describe a framework to assist non-expert users by providing personalized query recommendations.

QueRIE is built on a simple premise that is inspired by Web recommender systems: if users A and B have posed similar queries, then the other queries of B may be of interest to user A, and vice versa. In other words, we can recommend the queries of user B in order to help user A in their exploration of the database.

A database is an organized collection of data. The data are typically organized to model aspects of reality, for example, modelling the availability of rooms in hotels in a way that supports finding a hotel with vacancies. A general-purpose DBMS is designed to allow the definition, creation, querying, update, and administration of databases. Well-known DBMSs include MySQL, Oracle, PostgreSQL and many others. A database is not generally portable across different DBMSs, but different DBMSs can interoperate by using standards such as SQL and ODBC or JDBC to allow a single application to work with more than one DBMS. DAS in addition had the purposes of immigration control, including issuing visas. Techniques such as fragment/splattered structures, random sampling, index hash tables and Advanced Cryptographic Techniques (ACT) are used to avoid information loss and corruption of the metadata.
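The premise that users who posed similar queries may benefit from each other's remaining queries can be illustrated with a minimal Python sketch. The query logs, the use of Jaccard similarity over sets of query strings, and the function names `jaccard` and `recommend` are assumptions made for illustration only; the text does not prescribe them.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets of queries (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(current, past_sessions, top_k=3):
    """Rank queries of past users that the current user has not posed yet.

    Each candidate query is scored by the summed similarity of the
    past sessions that contain it.
    """
    scores = defaultdict(float)
    for queries in past_sessions.values():
        sim = jaccard(current, queries)
        if sim == 0.0:
            continue  # this user shares no query with the current user
        for q in queries - current:
            scores[q] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [q for q, _ in ranked[:top_k]]


# Hypothetical query logs: user name -> set of SQL queries posed.
past_sessions = {
    "B": {"SELECT ra, dec FROM stars", "SELECT * FROM galaxies"},
    "C": {"SELECT name FROM hotels"},
}
user_a = {"SELECT ra, dec FROM stars"}

suggestions = recommend(user_a, past_sessions)
print(suggestions)
```

Here user A shares one query with user B and none with user C, so only B's remaining query is suggested. Comparing exact query strings is the crudest possible notion of similarity and is used here only to keep the sketch short.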

2. Justification of Problem

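QueRIE can summarize a user's session either over the tuples that the queries touch (the tuple-based approach) or over the syntactic fragments of the queries themselves (the fragment-based approach). The tuple-based approach constructs large (and relatively dense) summaries and, most importantly, requires real-time calculations of the similarities between the session summary $S_0$ of the current user and those of past users. The fragment-based approach clearly captures information at a coarser level of detail, and hence it is expected to miss interesting correlations between users.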
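To make the notion of a fragment-based session summary more concrete, the following Python sketch builds a summary from simple SELECT queries and compares two summaries. It is a sketch under stated assumptions: the regular-expression parsing is deliberately naive and handles only single-block `SELECT ... FROM ... WHERE` queries, the fragment labels (`attr:`, `table:`, `pred:`) are hypothetical, and cosine similarity is an assumed choice of measure.

```python
import math
import re
from collections import Counter


def fragments(sql):
    """Split one simple SELECT query into coarse fragments (naive, assumed parsing)."""
    pattern = r"select\s+(.*?)\s+from\s+(.*?)(?:\s+where\s+(.*))?$"
    m = re.match(pattern, sql.strip(), flags=re.IGNORECASE | re.DOTALL)
    if m is None:
        return []
    cols, tables, where = m.groups()
    frags = ["attr:" + c.strip().lower() for c in cols.split(",")]
    frags += ["table:" + t.strip().lower() for t in tables.split(",")]
    if where:
        preds = re.split(r"\s+and\s+", where, flags=re.IGNORECASE)
        frags += ["pred:" + p.strip().lower() for p in preds]
    return frags


def session_summary(queries):
    """Fragment-based summary: how often each fragment occurs in a session."""
    summary = Counter()
    for q in queries:
        summary.update(fragments(q))
    return summary


def cosine(s1, s2):
    """Cosine similarity between two sparse summaries (assumed measure)."""
    dot = sum(s1[f] * s2[f] for f in s1.keys() & s2.keys())
    norm1 = math.sqrt(sum(v * v for v in s1.values()))
    norm2 = math.sqrt(sum(v * v for v in s2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


# Hypothetical sessions: s0 is the current user, s1 a past user.
s0 = session_summary(["SELECT ra, dec FROM stars WHERE mag < 10"])
s1 = session_summary(["SELECT ra FROM stars WHERE mag < 10 AND dec > 0"])
print(sorted(s0))
print(round(cosine(s0, s1), 3))
```

Because a summary of this kind only has one entry per distinct fragment, it stays small compared with a summary that assigns a weight to every tuple, which is the trade-off between the two approaches described above.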

Figure 1: QueRIE Tactics/Overview

3. Review of Literature

• Hive - A petabyte scale data warehouse using Hadoop (A. Thusoo et al.) [1]: Hadoop is a popular open-source map-reduce implementation that is used at companies such as Yahoo and Facebook to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs that are hard to maintain and reuse. Hive is an open-source data warehousing solution built on top of Hadoop. It supports queries expressed in a SQL-like declarative language, HiveQL, which are compiled into map-reduce jobs that are executed using Hadoop.

• QueRIE: A recommender system supporting interactive database exploration (S. Mittal, J. S. V. Varman et al.) [3]: This demonstration presents QueRIE, a recommender system that supports interactive database exploration. The system aims at assisting non-expert users of scientific databases by generating personalized query recommendations. Drawing inspiration from Web recommender systems, QueRIE tracks the querying behavior of each user and identifies potentially “interesting” parts of the database related to the corresponding data analysis task by locating those database parts that were accessed by similar users in the past. It then generates and recommends the queries that cover those parts to the user.

• Amazon.com recommendations: Item-to-item collaborative filtering (G. Linden, B. Smith, and J. York) [7]: Recommendation algorithms can be used to personalize an online store for each customer. The store radically changes based on customer interests, showing programming titles to a software engineer and baby toys to a new mother. There are three common approaches to solving the recommendation problem: traditional collaborative filtering, cluster models, and search-based methods. The authors compare these methods with their own algorithm, which they call item-to-item collaborative filtering. Unlike traditional collaborative filtering, the online computation of this algorithm scales independently of the number of customers and the number of items in the product catalog. The algorithm produces recommendations in real time, scales to massive data sets, and generates high-quality recommendations.

• Personalized queries under a generalized preference model (G. Koutrika and Y. Ioannidis): The authors present a preference model that combines expressivity and concision. In addition, they provide efficient algorithms for the selection of preferences related to a query, and an algorithm for the progressive generation of personalized results, which are ranked based on user interest. Several classes of ranking functions are provided for this purpose. They report results of experiments, both synthetic and with real users, (a) demonstrating the efficiency of the algorithms, (b) showing the benefits of query personalization, and (c) providing insight as to the appropriateness of the proposed ranking functions.
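The item-to-item collaborative filtering idea summarized above for Linden et al. can be sketched in a few lines of Python. This is an illustrative sketch, not the published algorithm: the purchase histories are hypothetical, the cosine measure over sets of buyers is an assumed similarity, and the function names are invented for this example. The point it shows is the split between an offline step that builds an item-similarity table and an online step that only looks up the items in the current basket.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_similarities(purchases):
    """Offline step: cosine similarity between items bought by the same customers."""
    buyers = defaultdict(set)  # item -> customers who bought it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    sims = defaultdict(dict)
    for a, b in combinations(sorted(buyers), 2):
        common = len(buyers[a] & buyers[b])
        if common:
            sim = common / math.sqrt(len(buyers[a]) * len(buyers[b]))
            sims[a][b] = sim
            sims[b][a] = sim
    return sims


def recommend(basket, sims, top_k=2):
    """Online step: looks up only the items in the basket and their neighbours."""
    scores = defaultdict(float)
    for item in basket:
        for other, sim in sims.get(item, {}).items():
            if other not in basket:
                scores[other] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]


# Hypothetical purchase histories: customer -> set of items bought.
purchases = {
    "u1": {"python book", "keyboard"},
    "u2": {"python book", "keyboard", "monitor"},
    "u3": {"baby toy", "stroller"},
}
sims = item_similarities(purchases)
print(recommend({"python book"}, sims))
```

In the QueRIE setting the same pattern would apply with queries or query fragments in place of products, although the text does not state which variant the system uses.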

4. Methodology

The proposed methodology includes the following steps:

1. Kullback–Leibler divergence:

For discrete probability distributions P and Q, the KL divergence of Q from P is defined to be

$$D_{KL}(P \| Q) = \sum_i P(i) \ln \frac{P(i)}{Q(i)}$$

In words, it is the expectation of the logarithmic difference between the probabilities P and Q, where the expectation is taken using the probabilities P. The KL divergence is only defined if P and Q both sum to 1 and if $Q(i) = 0$ implies $P(i) = 0$ for all i (absolute continuity). If the quantity $0 \ln 0$ appears in the formula, it is interpreted as zero because $\lim_{x \to 0} x \ln x = 0$.

2. Agglomerative Clustering Algorithm:

Hierarchical clustering strategies generally fall into two types:

Agglomerative: This is a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

Divisive: This is a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.

Figure 2: Agglomerative & Divisive System
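The bottom-up (agglomerative) strategy can be sketched in plain Python as follows. The sketch assumes single linkage (the distance between two clusters is the distance between their closest members) and a caller-supplied distance function; both are illustrative choices, since the text does not state which linkage or distance is used, and the sample points are hypothetical.

```python
def agglomerative(points, num_clusters, distance):
    """Bottom-up clustering: start with singletons, merge the closest pair each step."""
    clusters = [[p] for p in points]

    def linkage(c1, c2):
        # Single linkage (assumed): distance between the closest members.
        return min(distance(a, b) for a in c1 for b in c2)

    while len(clusters) > max(num_clusters, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i and drop cluster j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical one-dimensional observations.
points = [1.0, 1.2, 5.0, 5.1, 9.0]
result = agglomerative(points, 3, lambda a, b: abs(a - b))
print(result)
```

This naive version recomputes all pairwise cluster distances at every merge, which is adequate for a handful of points but not for large query logs; a library implementation would normally be preferred in practice.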

The methodology is based on KL divergence, which is a special case of a broader class of divergences called f-divergences. It was originally introduced by Solomon Kullback and Richard Leibler in 1951 as the directed divergence between two distributions. It can also be derived from a Bregman divergence. The agglomerative clustering algorithm forms clusters in a bottom-up manner.
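As a worked illustration of the discrete KL divergence used in the methodology, the following Python sketch computes it directly from its standard definition, including the two conventions that a zero probability in P contributes nothing and that the divergence is undefined when Q assigns zero probability to an outcome that P allows. The function name `kl_divergence` and the example distributions are assumptions for illustration; how the measure is applied to session summaries is not specified in the text.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) * ln(P(i) / Q(i)) for discrete distributions."""
    if len(p) != len(q):
        raise ValueError("P and Q must be defined over the same outcomes")
    if not (math.isclose(sum(p), 1.0) and math.isclose(sum(q), 1.0)):
        raise ValueError("P and Q must each sum to 1")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * ln 0 is interpreted as zero
        if qi == 0:
            raise ValueError("undefined: Q(i) = 0 while P(i) > 0")
        total += pi * math.log(pi / qi)
    return total


# Hypothetical distributions over three outcomes.
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
print(kl_divergence(p, q))
print(kl_divergence(p, p))
```

Note that the measure is not symmetric: with these inputs `kl_divergence(p, q)` is defined, whereas `kl_divergence(q, p)` raises an error because `p` assigns zero probability to an outcome that `q` allows.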

5. Understanding Database System

A database is an organized collection of data. Formally, "database" refers to the data themselves and the supporting data structures. Databases are created to operate on large quantities of information by inputting, storing, retrieving and managing that information. Databases are set up so that one set of software programs provides all users with access to all the data.

Fig. Database system

Databases may be of different types and may be accessed from different places. To ensure that distributed databases are up to date and current, there are two processes: replication and duplication. Replication involves using specialized software that looks for changes in the distributed database. A distributed database may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers.

Figure 3: Distributed database

Figure 4: System Architecture

6. Conclusion & Future Work

There are many interesting directions we would like to explore in the future. We would like to measure the impact that the query relaxation process has on the quality of recommendations. Exploring a sequence-based approach is another interesting direction for future work, but it requires a careful reconsideration of several aspects of our framework. For instance, pure sequence information may not be sufficient to discover user similarities. Instead, we may have to consider the relative changes between queries in the sequence, e.g., that selection predicates become more selective as queries progress, in order to properly detect similarities. We also plan to focus on relational databases that have a form-based interface. While the fragment-based approach seems like a straightforward choice for such environments, new challenges arise related to the formulation of session similarity, the synthesis of recommendations, and their presentation. Finally, as we aim at developing a more generic and scalable system, we are currently working on integrating alternative techniques for generating recommendations, such as matrix factorization methods.

References

[1] A. Thusoo et al., “Hive - A petabyte scale data warehouse using hadoop,” in Proc. IEEE 26th ICDE, Long

Beach, CA, USA, Mar. 2010, pp. 996–1005.

[2] G. Chatzopoulou, M. Eirinaki, and N. Polyzotis, “Collaborative filtering for interactive database

exploration,” in Proc. 21st Int. Conf. SSDBM, New Orleans, LA, USA, 2009, pp. 3–18.

[3] S. Mittal, J. S. V. Varman, G. Chatzopoulou, M. Eirinaki, and N. Polyzotis, “QueRIE: A recommender

system supporting interactive database exploration,” in Proc. IEEE ICDM, Sydney, NSW, Australia, 2010.

[4] J. Akbarnejad et al., “SQL QueRIE recommendations,” PVLDB, vol. 3, no. 2, pp. 1597–1600, 2010.

[5] N. Alon, Y. Matias, and M. Szegedy, “The space complexity of approximating the frequency moments,” in

Proc. 28th STOC, New York, NY, USA, 1996.

[6] E. Cohen, “Size-estimation framework with applications to transitive closure and reachability,” J. Comput. Syst. Sci., vol. 55, no. 3, pp. 441–453, 1997.

[7] G. Linden, B. Smith, and J. York, “Amazon.com recommendations: Item-to-item collaborative filtering,”

IEEE Internet Comput., vol. 7, no. 1, pp. 76–80, Jan./Feb. 2003.

[8] N. Koudas, C. Li, A. K. H. Tung, and R. Vernica, “Relaxing join and selection queries,” in Proc. 32nd Int. Conf. VLDB, Seoul, Korea, 2006, pp. 199–210.

[9] S. Kullback (1959), Information Theory and Statistics, Dover Press. ISBN 0-486-69684-7.

[10] Burnham, K. P. and Anderson D. R. (2002) Model Selection and Multimodel Inference: A Practical

Information-Theoretic Approach, Second Edition (Springer Science, New York) ISBN 978-0-387-95364-9.

[11] Database Systems: The Complete Book. Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer D. Widom

[12] Jeffrey Ullman 1997: First course in database systems, Prentice–Hall Inc., Simon & Schuster, Page 1,

ISBN 0-13-861337-0.

[13] Tsichritzis, D. C. and F. H. Lochovsky (1982). Data Models. Englewood-Cliffs, Prentice–Hall.

[14] Beynon-Davies P. (2004). Database Systems 3rd Edition. Palgrave, Basingstoke, UK. ISBN 1-4039-1601-2

[15] Raul F. Chong, Michael Dang, Dwaine R. Snow, Xiaomei Wang (3 July 2008). "Introduction to DB2". Retrieved 17 March 2013. This article quotes a development time of 5 years involving 750 people for DB2 release 9 alone.

[16] C. W. Bachman (November 1973), The Programmer as Navigator, CACM (Turing Award Lecture 1973)

[17] "DB-Engines Ranking". January 2013. Retrieved 22 January 2013.

[18] Proctor, Seth (2013). "Exploring the Architecture of the NuoDB Database, Part 1". Retrieved 2013-07-12.

[19] "TeleCommunication Systems Signs up as a Reseller of TimesTen; Mobile Operators and Carriers Gain

Real-Time Platform for Location-Based Services". Business Wire. 2002-06-24.

[20] "database, n". OED Online. Oxford University Press. June 2013. Retrieved July 12, 2013.

[21] IBM Corporation. "IBM Information Management System (IMS) 13 Transaction and Database Servers

delivers high performance and low total cost of ownership". Retrieved Feb 20, 2014.

[22] Ken North, "Sets, Data Models and Data Independence", Dr. Dobb's, 10 March 2010

[23] www.google.com

[24] www.wikipedia.com
