
International Journal of Science, Engineering and Technology Research (IJSETR), Volume 1, Issue 1, May 2014

KD-Tree based Information Retrieval System for Immigration and National Registration Department

Myat Su Mon, Aung Myint Aye

Department of Information Technology

Mandalay Technological University

[email protected]

Abstract—A KD-Tree, or k-dimensional tree, is a data structure used in computer science for organizing points in a space with k dimensions. It is a binary search tree with additional constraints imposed on it. KD-Trees are very useful for range and nearest neighbor searches. Each level of a KD-Tree splits all children along a specific dimension, using a hyperplane that is perpendicular to the corresponding axis. At the root of the tree, all children are split on the first dimension (i.e., a point whose first coordinate is less than the root's goes into the left subtree, and a point whose first coordinate is greater goes into the right subtree). Each level down the tree divides on the next dimension, returning to the first dimension once all others have been exhausted. The KD-Tree is well suited to information retrieval systems. The proposed system uses a KD-Tree to search by image: the user supplies an image, and the system retrieves the related information. This approach can also reduce search time over a large body of information. An SQL database is used to store the information and images. The proposed system is implemented in the C#.NET programming language.

Index Terms—Information Retrieval System, Immigration, National registration department, Image, KD-Tree, Pixels

I. INTRODUCTION

Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text (or other content-based) indexing. Automated information retrieval systems are used to reduce what has been called "information overload". Many immigration and national registration departments use IR systems to provide access to personal information such as name, date of birth, occupation, NRC number, address, and personal history.

Immigration is the movement of people into another country or region to which they are not native in order to settle there, especially permanently. Immigration is a result of a number of factors, including economic and/or political reasons, family re-unification, natural disasters or the wish to change one's surroundings voluntarily.

A national registration department generates the national identification number (NRC), which is used by the governments of many countries as a means of tracking their citizens, permanent residents, and temporary residents for the purposes of work, taxation, government benefits, health care, and other governmentally-related functions. The number appears on an identity document issued by the country. The ways in which such a system is implemented depend on the country, but in most cases a citizen is issued an identification number at birth or on reaching a legal age (typically 18). Non-citizens may be issued such numbers when they enter the country, or when granted a temporary or permanent residence permit. In a national registration department, images are stored alongside people's information. The KD-Tree supports searching and retrieving that information by image.

An image (from Latin: imago) is an artifact that depicts or records visual perception, for example a two-dimensional picture that has a similar appearance to some subject, usually a physical object or a person, thus providing a depiction of it. Images may be two-dimensional, such as a photograph or a screen display, or three-dimensional, such as a statue or a hologram. They may be captured by optical devices such as cameras, mirrors, lenses, telescopes, and microscopes, and by natural objects and phenomena such as the human eye or water surfaces. The word image is also used in the broader sense of any two-dimensional figure such as a map, a graph, a pie chart, or an abstract painting. In this wider sense, images can also be rendered manually, such as by drawing, painting, or carving, rendered automatically by printing or computer graphics technology, or developed by a combination of methods, especially in a pseudo-photograph.

This paper is organized as follows. Section II presents related work. Section III introduces the KD-Tree. Section IV describes the design and implementation of the proposed system, and Section V draws conclusions.

II. RELATED WORKS

The KD-Tree is a binary tree [1] in which every node is a k-dimensional point. Every non-leaf node can be thought of as implicitly generating a splitting hyperplane that divides the space into two parts, known as half-spaces. Points to the left of this hyperplane are represented by the left subtree of that node, and points to the right of the hyperplane are represented by the right subtree [2]. The hyperplane direction is chosen in the following way: every node in the tree is associated with one of the k dimensions, with the hyperplane perpendicular to that dimension's axis. So, for example, if the "x" axis is chosen for a particular split, all points in the subtree with a smaller "x" value than the node appear in the left subtree, and all points with a larger "x" value appear in the right subtree.

All Rights Reserved © 2014 IJSETR


In such a case, the hyperplane would be set by the x-value of the point, and its normal would be the unit x-axis [3]. KD-Trees make it possible [4] to efficiently perform searches such as "all points at distance lower than R from X" or "k nearest neighbors of X". When processing such a query, we find the leaf that corresponds to X. We process the points stored in that leaf, and then start to scan nearby leaves. At some point we may notice that the distance from X to the next leaf is higher than the distance to the worst point found so far [5]. At that point the search can stop, because further leaves cannot improve the result. This algorithm works well in low-dimensional spaces. However, its efficiency decreases as dimensionality grows, and in high-dimensional spaces KD-Trees give no performance advantage over a naive O(N) linear search (although they continue to give correct results) [6].
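The "all points at distance lower than R from X" query described above can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions: the function names `build` and `within_radius` and the nested-tuple tree layout are hypothetical choices made for this example, and are not taken from ALGLIB or from the proposed system.

```python
import math

def build(points, depth=0):
    """Build a KD-tree as nested tuples (point, left, right); None is an empty tree."""
    if not points:
        return None
    axis = depth % len(points[0])            # cycle through the k dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                   # the median point becomes this node
    return (points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def within_radius(node, x, r, depth=0, found=None):
    """Collect every stored point whose Euclidean distance to x is below r."""
    if found is None:
        found = []
    if node is None:
        return found
    point, left, right = node
    axis = depth % len(x)
    if math.dist(point, x) < r:
        found.append(point)
    diff = x[axis] - point[axis]
    # Always visit the side that contains x; visit the other side only
    # when the splitting plane lies closer to x than r.
    near, far = (left, right) if diff < 0 else (right, left)
    within_radius(near, x, r, depth + 1, found)
    if abs(diff) < r:
        within_radius(far, x, r, depth + 1, found)
    return found

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
close = sorted(within_radius(tree, (6, 3), 2.5))
```

The pruning test `abs(diff) < r` is what lets the search skip whole subtrees, which is also why the benefit shrinks as the number of dimensions grows.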

To estimate the performance of the ALGLIB implementation of KD-Trees, a series of numerical experiments was conducted. The experiments were performed on an AMD Phenom II X6 3.2 GHz CPU with the Microsoft C++ compiler and maximum optimization settings [7]. N = 50,000 points were generated, uniformly and randomly distributed across the D-dimensional unit hypercube, and 50,000 queries for the K nearest neighbors were then performed.

III. BACKGROUND THEORY

This section briefly describes the concepts of KD-Tree.

A. KD-Tree

A KD-Tree, or k-dimensional tree, is a data structure used in computer science for organizing points in a space with k dimensions. It is a binary search tree with additional constraints imposed on it. KD-Trees are very useful for range and nearest neighbour searches. For our purposes we will generally only be dealing with point clouds in three dimensions, so all of our KD-Trees will be three-dimensional. Each level of a KD-Tree splits all children along a specific dimension, using a hyperplane that is perpendicular to the corresponding axis. At the root of the tree, all children are split on the first dimension (i.e., a point whose first coordinate is less than the root's goes into the left subtree, and a point whose first coordinate is greater goes into the right subtree). Each level down the tree divides on the next dimension, returning to the first dimension once all others have been exhausted. The most efficient way to build a KD-Tree is to use a partition method like the one Quick Sort uses to place the median point at the root, with every point that has a smaller value in the splitting dimension to the left and every point with a larger value to the right. This procedure is then repeated on both the left and right subtrees until the subtrees to be partitioned consist of only one element.
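The median-based construction described above can be sketched in Python as follows. This is a minimal illustration, not the system's implementation: the names `Node` and `build_kdtree` are hypothetical, and a full sort at each level stands in for the Quick Sort-style partition to keep the example short (a selection-based partition would find the median with less work). The sample points are the six points listed in the Figure 3 caption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    point: Point                       # the point stored at this node
    axis: int                          # dimension this node splits on
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Recursively place the median along the current axis at the root."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k                   # first dimension at the root, then the next, ...
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build_kdtree(ordered[:mid], depth + 1),      # smaller values
                build_kdtree(ordered[mid + 1:], depth + 1))  # larger values

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
```

With these six points the root holds (7, 2), its left child holds (5, 4), and its right child holds (9, 6).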

Figure 1. Example of a 2-dimensional KD-Tree.

Figure 2. Demonstration of how the Nearest-Neighbour search works.

B. Adding elements in KD-Tree

One adds a new point to a KD-Tree in the same way as one adds an element to any other search tree. First, traverse the tree, starting from the root and moving to either the left or the right child depending on whether the point to be inserted is on the "left" or "right" side of the splitting plane. Once you reach the node under which the new point should be located, add the point as either the left or right child of that leaf node, again depending on which side of the node's splitting plane contains the new point.

Adding points in this manner can cause the tree to become unbalanced, leading to decreased tree performance. The rate of tree performance degradation is dependent upon the spatial distribution of tree points being added, and the number of points added in relation to the tree size. If a tree becomes too unbalanced, it may need to be re-balanced to restore the performance of queries that rely on the tree balancing, such as nearest neighbor searching.
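The insertion procedure, and the loss of balance it can cause, can be sketched as below. The names `Node`, `insert`, and `height` are hypothetical, and sending ties to the right subtree is an assumption of this sketch rather than something the text specifies.

```python
class Node:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None

def insert(root, point, depth=0):
    """Insert point below root and return the root of the updated subtree."""
    if root is None:
        return Node(point)                    # reached the empty slot
    axis = depth % len(point)                 # dimension split at this level
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:                                     # ties go right (assumption)
        root.right = insert(root.right, point, depth + 1)
    return root

def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

# Points that arrive already sorted produce a degenerate, list-like tree.
root = None
for i in range(8):
    root = insert(root, (i, i))
```

Here every new point goes to the right of the previous one, so the eight points form a chain of height 8, whereas a balanced tree over the same points would have height 4. This is the kind of degradation that rebuilding the tree is meant to repair.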


Figure 3. KD-Tree decomposition for the point set (2,3), (5,4), (9,6), (4,7), (8,1), (7,2).

C. Removing elements in KD-Tree

To remove a point from an existing KD-Tree without breaking the invariant, the easiest way is to form the set of all nodes and leaves from the children of the target node, and recreate that part of the tree.

Another approach is to find a replacement for the point removed. First, find the node R that contains the point to be removed. For the base case where R is a leaf node, no replacement is required. For the general case, find a replacement point, say p, from the sub tree rooted at R. Replace the point stored at R with p. Then, recursively remove p. For finding a replacement point, if R discriminates on x (say) and R has a right child, find the point with the minimum x value from the sub tree rooted at the right child. Otherwise, find the point with the maximum x value from the sub tree rooted at the left child.

Figure 4. Resulting KD-Tree.

D. KD-Tree Nearest Neighbor Search

The KD-tree is one of the most widely used structures for nearest neighbor searching. A KD-tree is a kind of binary tree in which every node is a k-dimensional point. The KD-tree algorithm represents an image as a set of nodes and then searches for the nearest neighbor. The nearest neighbor search aims to find the point in the tree that is nearest to a given input point. This search can be done efficiently by using the tree properties to quickly eliminate large portions of the search space.

Searching for a nearest neighbor in a KD-tree proceeds as follows:

1. Starting with the root node, the algorithm moves down the tree recursively, in the same way that it would if the search point were being inserted (i.e. it goes left or right depending on whether the point is less than or greater than the current node in the split dimension).

2. Once the algorithm reaches a leaf node, it saves that node point as the "current best".

3. The algorithm unwinds the recursion of the tree, performing the following steps at each node:

   o If the current node is closer than the current best, then it becomes the current best.

   o The algorithm checks whether there could be any points on the other side of the splitting plane that are closer to the search point than the current best. In concept, this is done by intersecting the splitting hyperplane with a hypersphere around the search point that has a radius equal to the current nearest distance. Since the hyperplanes are all axis-aligned, this is implemented as a simple comparison to see whether the difference between the splitting coordinate of the search point and that of the current node is less than the distance (over all coordinates) from the search point to the current best.

   o If the hypersphere crosses the plane, there could be nearer points on the other side of the plane, so the algorithm must move down the other branch of the tree from the current node looking for closer points, following the same recursive process as the entire search.

   o If the hypersphere doesn't intersect the splitting plane, then the algorithm continues walking up the tree, and the entire branch on the other side of that node is eliminated.

4. When the algorithm finishes this process for the root node, the search is complete.
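The search procedure above can be sketched in Python as follows. This is a minimal illustration using Euclidean distance on coordinate tuples. The names `build` and `nearest` and the nested-tuple tree layout are hypothetical and are not taken from the proposed system.

```python
import math

def build(points, depth=0):
    """Build a KD-tree as nested tuples (point, left, right); None is an empty tree."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target, or None for an empty tree."""
    if node is None:
        return best
    point, left, right = node
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    # Descend first into the side where the target would be inserted.
    best = nearest(near, target, depth + 1, best)
    # While unwinding, the current node may beat the current best.
    if best is None or math.dist(point, target) < math.dist(best, target):
        best = point
    # Cross the splitting plane only if the hypersphere around the target,
    # with radius equal to the current best distance, reaches it.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
closest = nearest(tree, (9, 2))
```

For the query point (9, 2) the search descends to (8, 1), and on the way back up neither (9, 6) nor the root (7, 2) is closer, and neither splitting plane lies within the current best distance, so the left half of the tree is never visited.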

The following figures show an example of a KD-tree that consists of four leaf nodes labeled A, B, C and D.

Figure 5. Sample Node A and B.

Figure 6. Sample Node C and D.

The following algorithm describes the nearest-neighbor search over the KD-Tree.

p = Convert_pixels(i); // p is the pixel array of input image i
SearchNearestNeighbor(p)
{
    accuracy = 0;
    samepixel = 0;
    totalpixel = p.length;
    if (inputimage(distance(p)) == databaseimage(distance(p)))
    {
        if (inputimage(p) == databaseimage(p))
        {
            samepixel += 1;
            if (p has leftson)
                SearchNearestNeighbor(leftson);
            if (p has rightson)
                SearchNearestNeighbor(rightson);
        }
    }
    accuracy = (samepixel / totalpixel) x 100%;
}
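The accuracy figure at the end of the algorithm, the share of matching pixels expressed as a percentage, can be illustrated with a small Python sketch. It compares two pixel sequences position by position rather than walking a tree, so it shows only the final ratio and not the traversal itself. The function name `match_accuracy` and the RGB-tuple pixel format are assumptions made for this example.

```python
def match_accuracy(input_pixels, stored_pixels):
    """Percentage of positions at which two equal-length pixel sequences agree."""
    if len(input_pixels) != len(stored_pixels):
        raise ValueError("images must have the same number of pixels")
    if not input_pixels:
        return 0.0
    same = sum(1 for a, b in zip(input_pixels, stored_pixels) if a == b)
    return 100.0 * same / len(input_pixels)

# Hypothetical 4-pixel images given as RGB tuples; 3 of 4 pixels agree.
query = [(255, 255, 255), (0, 0, 0), (10, 20, 30), (0, 0, 0)]
stored = [(255, 255, 255), (0, 0, 0), (10, 20, 31), (0, 0, 0)]
accuracy = match_accuracy(query, stored)
```

An exact-equality test like this is strict: a single changed channel value counts the whole pixel as different, which is one reason the reported accuracy depends on how closely the stored image matches the input.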

In this paper, the KD-Tree method is used to search for an image in the database, where each node denotes a subset of pixels. When an image arrives as input, the KD-Tree checks the first pixel and then the next pixel. If the pixels are the same, the KD-Tree checks the children (x and y) of that pixel; if not, it checks the neighboring pixel.

Each image in the image collection is compared against the incoming image using the KD-Tree. The KD-Tree reduces computation time because it constructs the tree nodes before the pixels are checked.

IV. DESIGN AND IMPLEMENTATION OF THE KD-TREE BASED INFORMATION RETRIEVAL SYSTEM

A. Design of the Proposed System

In this system, each information record includes a person's name, date of birth, occupation, NRC number, address, and personal history. Users can search for relevant information by supplying an image. The KD-tree searches for the input image in the national registration department database and then shows the information related to that image.

The design of the KD-Tree information retrieval system is shown in Figure 7. The KD-Tree method searches for the nearest image and its information, starting by comparing image sizes. If the input image and the stored image differ in size, the input image is resized; if they are equal in size, no resizing is needed. The input image and the stored image are then compared using the KD-Tree. According to the architecture design, the system works through the following procedure.

First, the KD-tree constructs a tree structure over the input image pixels, which are treated as a line of values during computation.

Second, the KD-tree compares the x value of a pixel with the value at the root of the tree.

Third, it compares the y value of the pixel while descending into the left subtree of the node, in the same way as for the x values.

All pixels in the input image are processed in this way. If the same image is found, the system outputs the person's information and image.
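The resizing step in the design above is not specified further in the paper. As one possible reading, the sketch below brings an input image to the stored image's dimensions by nearest-neighbour sampling. The function name `resize_nearest`, the list-of-rows image layout, and the choice of nearest-neighbour sampling are all assumptions made for illustration.

```python
def resize_nearest(pixels, new_w, new_h):
    """Resize a 2-D grid of pixels (a list of rows) by nearest-neighbour sampling."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [[pixels[y * old_h // new_h][x * old_w // new_w]  # nearest source pixel
             for x in range(new_w)]
            for y in range(new_h)]

# A hypothetical 2x2 image enlarged to 4x4, then reduced back to 2x2.
small = [[1, 2],
         [3, 4]]
large = resize_nearest(small, 4, 4)
back = resize_nearest(large, 2, 2)
```

Once both images have the same dimensions, their pixels can be compared position by position.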

Figure 7. Architecture Design of System

This architecture is implemented for the Immigration and National Registration Department. It is intended to make searching for information in the database fast.

B. Implementation

The implementation of the proposed system is described in the following figures. The system applies the KD-Tree specifically to searching for people, and is implemented in the C# programming language.

The following figure shows the main form of the system. The main form includes buttons for input information, search by image, search by data, and view database.

Figure 8. Main form of the system

The following form is the input information form. In this form, the user can insert, update, and delete data for the Immigration and National Registration Department.


Figure 9. Data Entry form for People

The following form is the search-by-image form. The Immigration and National Registration Department stores many images and much information on computer. The user can search this large body of information by using an image.

Figure 10. Search by Image form

The following form is used to search by name and NRC. The Immigration and National Registration Department stores many images and much information on computer. The user can search for information by name, by NRC, or by both.

Figure 11. Search by Information form

This system uses Microsoft SQL Server to store images and information. The SQL Server view is shown in the following figure.

Figure 12. Database of Immigration and National Registration Department.

V. CONCLUSION

This research focuses on searching for people's information using a KD-Tree for the Immigration and National Registration Department. The KD-Tree gives faster processing times than other search algorithms. The system searches for information when the user inputs an image. The output accuracy of the system may depend on the images stored in the database. When an input image is given, the system checks for it in the database. If the image is found, the accuracy may reach approximately 90% to 100%. The results suggest that the overall retrieval performance, on what is believed to be a challenging dataset, holds up well for very large image collections.

ACKNOWLEDGMENT

First of all, the author is highly grateful to Dr. Myint Thein, the Pro-Rector of Mandalay Technological University, for his permission to complete this paper. The author would like to express her deepest gratitude and special thanks to her supervisor, Dr. Aung Myint Aye, Associate Professor and Head, Department of Information Technology, Mandalay Technological University, for his supervision, enthusiasm, and suggestions. The author would also like to thank all the teachers from the Department of Information Technology, Mandalay Technological University, who gave suggestions and advice for the submission of this paper.

REFERENCES

[1] H. M. Kakde, "Range searching using kd tree," 2005.

[2] S. Chandran, "Introduction to kd-trees," University of Maryland, Department of Computer Science.

[3] J. H. Friedman, J. L. Bentley, and R. A. Finkel, "An algorithm for finding best matches in logarithmic expected time," ACM Trans. Math. Softw., ISSN 0098-3500. Retrieved 29 March 2013.


[4] D. T. Lee and C. K. Wong, "Worst-case analysis for region and partial region searches in multidimensional binary search trees and balanced quad trees," 1977.

[5] J. E. Goodman, J. O'Rourke, and P. Indyk (eds.), "Chapter 39: Nearest neighbours in high-dimensional spaces," Handbook of Discrete and Computational Geometry, 2nd ed., CRC Press.

[6] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, MIT Press and McGraw-Hill.

[7] J. Rosenberg, "Geographical data structures compared: A study of data structures supporting region queries," IEEE Transactions on CAD of Integrated Circuits and Systems.
