

Sci.Int(Lahore),26(1),181-184,2014 ISSN 1013-5316; CODEN: SINTE 8

 


AN ESSENTIAL FRAMEWORK FOR CONCEPT BASED EVOLUTIONARY QURANIC SEARCH ENGINE (CEQSE)

Syed Ali Raza 1*, Muhammad Rehan 1, Amjad Farooq 2, S. M. Ahsan 2, M. Saleem Khan 1

1 Department of Computer Science, GC University Lahore
2 Department of Computer Science, UET Lahore
*Corresponding Author: [email protected]

ABSTRACT: The Holy Quran has shaped the lives of Muslims and is among the most widely read books. Although the Quran is recited all over the globe, little attention has been paid to Quranic search. Currently available models rely on keyword based searching, which is not only less efficient but also fails to retrieve Quranic concepts accurately. This paper addresses the deficiencies of keyword based searching and the issues related to semantic search in the Holy Quran, and proposes a model that is capable of performing semantic search.

INTRODUCTION: The Holy Quran is the most sacred scripture for Muslims and a source of diverse knowledge on a wide range of subjects. It discusses almost all fields of life and provides the basics for all areas of knowledge. Roughly one in every five people on earth today is Muslim [1]. Understanding the Holy Quran is therefore highly significant for every Muslim as well as for scholars interested in the study of man and society, since the Holy Quran has been influential not only in molding the destinies of Islamic societies but also in changing the destiny of mankind as a whole [2]. Understanding the concepts of the Quran is thus of paramount significance for anyone who wishes to study this book comprehensively.

The Holy Quran has its own style of describing concepts, which is unique in many ways. A concept is generally discussed in several chapters. For example, the concept of Hell is discussed in various chapters, and the oneness of the Almighty is discussed throughout the Holy Quran. A single verse may also contain more than one theme. For example, verse 40 of chapter 78 is a single short verse, yet it carries five different concepts: (1) We (Allah) have warned you (humans); (2) We (Allah) have warned of chastisement; (3) the chastisement is near at hand; (4) man shall see (on Qayamat) what his two hands have sent before; and (5) the unbeliever shall say (on Qayamat), "would that I were dust". The underlying point is that the words Allah and Qayamat are never used in this verse, yet the context reveals what is being said. Another distinctive feature of the Quran's style is that one term may appear under many different names depending on the context. For example, Muhammad is referred to as Ahmad, Mudhathir, Muzammil, Mubashir and Nazeer, and Heaven is referred to as the Garden and Paradise.
A term may also be used with different meanings; disambiguation depends on the context in which the term is used. Although the Quran is recited all over the globe, little attention has been paid to searching Quranic concepts digitally. Currently available models rely on keyword based searching. Statistical and keyword-based techniques have achieved some success in data mining and information retrieval systems [3], but such systems are less efficient and have several limitations with respect to the quality of search results and overall usability [3]. There is therefore a pressing need for an intelligent tool that helps readers of the Quran retrieve the most relevant results for a better understanding of the underlying concepts. A large amount of semantic knowledge is required for continued progress in textual-information management, so such a tool must be capable of a deep understanding of meanings.

RELATED WORK: Numerous tools based on keyword searching are currently available in digital form [4-8]. These Quranic software packages and databases support searching the Holy Quran in the form of audio, video or text files.

A chatbot for the Holy Quran was developed by Abu-Shawar and Atwell [4] in 2007. This chatbot answers questions from the Quran but has no capability to understand the input. All it can do is find the most significant words in the question and then perform a simple keyword based search to find relevant verses in the Quran. This is essentially an extension of keyword search: the user can type a question as a full sentence rather than just some keywords, but the system still in effect performs keyword searches.

The Search Truth Quran search tool [5] allows users to search the Quran using many translations at a time, such as Mohsin Khan, Yousaf Ali, Shakir and Pickthall. This tool also allows phonetic search. Its main drawback is that it does not search for an exact match of the word but instead checks whether the search word is part of any word in the Quran. For instance, a search for the word 'ship' retrieves all verses that contain words such as worship and friendship.

The Guided Ways Quran search tool [6] allows users to search the Quran using many translations at a time, such as Pickthall, Mohsin Khan, Yousaf Ali, Shakir, Ahmad Ali and Jhalandhary, as in [5]. The user can choose one or more Quranic translations for the search, and the search can be performed in different languages. It searches the Quran for an exact match of the input word.

The IslamiCity search tool [7] searches the Holy Quran using many translations at a time. When a user enters a word, the tool looks for identical matches or partial matches (part of a word). The USC Quran search tool [8] searches the Quran using three English translations (Yousaf Ali, Pickthall, Shakir). It matches exact words only.
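
The difference between the partial matching of [5] and [7] and the exact matching of [6] and [8] can be illustrated with a minimal Python sketch. The sample sentences, the function names and the use of a regular-expression word boundary are assumptions made for illustration only; they are not taken from any of the tools above and are not Quranic text.

```python
import re

# Hypothetical sample sentences for illustration only (not translation text).
passages = {
    1: "they worship in the morning",
    2: "the ship sails on the sea",
    3: "a lasting friendship",
}

def substring_search(term, passages):
    """Return ids of passages where `term` occurs anywhere, even inside a longer word."""
    term = term.lower()
    return [pid for pid, text in passages.items() if term in text.lower()]

def whole_word_search(term, passages):
    """Return ids of passages where `term` occurs as a complete word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [pid for pid, text in passages.items() if pattern.search(text)]

print(substring_search("ship", passages))   # [1, 2, 3]
print(whole_word_search("ship", passages))  # [2]
```

Neither variant addresses concepts: both only compare character strings, which is the limitation the rest of this paper is concerned with.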


These tools are useful for searching keywords across different translations but are not good enough for searching a concept. For instance, if the word paradise is searched, these tools return only those verses that contain the word paradise, although the Quran uses the words heaven and garden in the same sense. Some recent research has addressed this problem, but concept based searching techniques have not yet been fully explored.

A new approach to XML semantics, in the form of a specification language for semantic rules, is used in [9-10]. The aim of these papers is to apply the XML semantics approach to establish the reliability of holy books published in XML format. The work in [10] demonstrates the value of the XML semantics checker approach for examining the semantic consistency of the Holy Quran. This checker model successfully verifies that the number of verses in each chapter is written correctly in the Quran XML document. It also verifies that the XML document of the Holy Quran contains exactly the same number of chapters as the Holy Quran itself.

The work presented in [11-12] uses algorithmic keyword extraction and key phrase candidate selection to develop an ontology for Islamic literature. A skeletal methodology for building these ontologies is presented in these studies. A computational model for representing Arabic lexicons using ontologies is presented in [13]. It is based on the field theory of semantics from the linguistics domain, and the data that drives the design of the model is selected to demonstrate the superiority and perfection of the Arabic language. That paper presents the design and implementation of the proposed ontological model, together with some results of its application to the vocabulary of the Holy Quran. Another model, presented in [14], exploits WordNet relationships in a relational database model.
The implementation of this model was carried out using Surah Al-Baqrah, the largest chapter of the Holy Quran. The precision of the prototype implementation is claimed to be far better than that of simple keyword searching.

A good piece of semantic based work is reported in [15]. In this study, queries are enhanced through semantic query translation so that more relevant documents can be retrieved across language boundaries. The study investigates the effect of the semantic approach on queries, and the performance of query based search in terms of total and relevant documents retrieved. The experimental results suggest that the semantic approach is the most important process in cross language information retrieval, and that it contributes to better performance in retrieving more relevant and related Quran documents.

Another ontology based method for searching the Holy Quran is presented in [16]; it exploits NLP patterns that help reduce the effort required during knowledge acquisition. Some limitations of that work are also mentioned. First, not every competency question can be answered using the Quran alone, because the Quran is revealed in general terms and the details of many subjects, such as Salah, are described and elaborated in the Hadith. Second, some verses, especially the mutashabihat, need further elaboration or discussion by Quranic experts.

METHODOLOGY
This research proposes a theoretical framework architecture for a Concept based Evolutionary Quranic Search Engine (CEQSE) that takes user queries as input and searches for concepts in the Quran accordingly. The benefit of this framework is that search time and accuracy improve with use: initially the search engine may take longer and return some irrelevant verses, compared with its searches after it has gained experience. The framework consists of eight modules, as shown in Figure 1.

Quran Document: This module acts as an input interface and takes a Quran document as input from the user. It holds all verses of the Quran. Although this is a one-time task, it gives the user the opportunity to add as many books to the searchable text as desired. The module passes the text to the next module for further processing.

Ontology Extractor: The purpose of this module is to extract ontological knowledge from factual knowledge. The module takes the XML file and, following the way the human brain is thought to store semantic information, performs tagging at the sentence level. It divides each sentence into three tags: Subject, Object and Predicate. This ontological knowledge is stored in the Ontological Knowledgebase repository so that the other modules can use it as conceptual knowledge. The module is also responsible for providing the ontology to the Query Engine according to the user's query.

Query Engine: The Query Engine works as a controller: it receives the user's queries and passes them to the subsystems for processing. It takes the query from the user application and passes it to the POS Tagger. It is also responsible for retrieving ontological knowledge from the Ontology Extractor to answer the user's query. After the concepts have been validated by the concept validator, it sends them to the Ontology Extractor for refinement of the ontological knowledge.
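
The subject, predicate and object tagging performed by the Ontology Extractor and the lookup requested by the Query Engine can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the framework's implementation: the names `Triple` and `OntologicalKnowledgebase`, the in-memory index and the hand-written example triples (adapted from the concepts listed in the Introduction) are all hypothetical, and the tagging itself is assumed to have been done already.

```python
from collections import defaultdict
from typing import NamedTuple

class Triple(NamedTuple):
    """One tagged sentence: subject, predicate, object, plus a source reference."""
    subject: str
    predicate: str
    obj: str
    source: str  # hypothetical reference label, not a real verse id

class OntologicalKnowledgebase:
    """Hypothetical in-memory store that indexes triples by subject and object."""

    def __init__(self):
        self._by_term = defaultdict(list)

    def add(self, triple):
        # Index under both subject and object so either can be queried.
        for term in {triple.subject.lower(), triple.obj.lower()}:
            self._by_term[term].append(triple)

    def lookup(self, term):
        """Return every stored triple whose subject or object equals `term`."""
        return list(self._by_term.get(term.lower(), []))

kb = OntologicalKnowledgebase()
# Hand-tagged examples; the implied subject "Allah" is made explicit by the tagger.
kb.add(Triple("Allah", "warned", "human", "example-1"))
kb.add(Triple("Allah", "warned of", "chastisement", "example-1"))
kb.add(Triple("chastisement", "is", "near at hand", "example-1"))

for t in kb.lookup("chastisement"):
    print(t.subject, "|", t.predicate, "|", t.obj)
```

Storing the implied subject explicitly is what lets a query for "Allah" reach a sentence in which the word itself never appears, which is the behaviour keyword search cannot provide.
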
POS Tagger: The POS Tagger performs part-of-speech tagging and tokenization. It labels each word of a sentence with its part of speech, such as verb, adverb or noun.

XML Generator: This module takes the Quran document and converts it into an XML file for the CEQSE framework. XML is widely used for transmitting data in all types of applications because of its popularity for storing and describing information. The XML file is then passed to the Ontology Extractor module.

Morphological Analysis: The first task of this module is to filter out words that occur very frequently, since they have very low discriminating power for retrieving the relevant concept from the Ontological Knowledgebase. Because a document or query contains many morphological variants, the module then extracts the morphemes that make up each word. As a result, words are reduced to their stem or root form.
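
The two tasks of the Morphological Analysis module, filtering frequent words and reducing the rest to a stem, can be sketched in Python as follows. This is a deliberately crude illustration for English query text: the stop-word list, the suffix list and the function names are assumptions, and a real system (especially for Arabic) would need a proper morphological analyser.

```python
# Hypothetical list of frequent, low-information words.
STOP_WORDS = {"the", "of", "in", "is", "a", "an", "and",
              "what", "does", "say", "about"}

# Hypothetical suffix list, checked in order (longer suffixes first).
SUFFIXES = ("ing", "ers", "er", "es", "ed", "s")

def strip_suffix(word):
    """Remove the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalise_query(query):
    """Lower-case, drop punctuation and stop words, and reduce words to crude stems."""
    tokens = [t.strip(".,?!;:").lower() for t in query.split()]
    return [strip_suffix(t) for t in tokens if t and t not in STOP_WORDS]

print(normalise_query("What does the Quran say about believers and gardens?"))
# ['quran', 'believ', 'garden']
```

With this normalisation "believer" and "believers" map to the same stem, so both reach the same entry in the knowledgebase.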


Fig1: CEQSE Framework

The purpose of this practice is that morphologically related words generally share the same meaning, so normalising them improves the retrieval of conceptual knowledge. The Morphological Knowledgebase holds this morphological information.

Synonyms Identifier: To generate effective responses to the user's query, this module identifies synonyms of the query words. The purpose of this identification is to capture every possible concept in the document for the user's query.

Concept Validation Module: The Query Engine retrieves all concepts matching the user's query and displays them in the user application, where the concepts are validated. If the user selects a relevant verse, the weight linking that ontology to the searched query is updated; otherwise the same weights are kept in the ontological database. As a result of this validation by the user, the selected concepts are transferred to the Ontological Knowledgebase to refine it and to improve the effectiveness of the results for that query.

ALGORITHM

Word             Search Truth  Guided Ways  Islamicity  Corpus Quran  Al-Islam  Actual Result
Allah            0             1            11          0             1         42
Paradise         0             0            15          0             0         14
Garden           3             0            5           0             0         14
Hell             1             1            2           1             1         2
Sea              2             1            2           1             0         4
Punishment       0             0            0           0             0         0
Jinn             6             3            5           5             6         6
Man              6             6            4           5             0         36
Water            1             1            3           1             3         0
Earth            4             4            1           3             4         3
Believer         0             0            2           0             0         0
Sinner/Criminal  0             0            0           0             0         3

Table 1: Comparison of different Quranic Search Engines

1. Take the Quran document.
2. Convert the Quran document into XML format.
3. Pass the XML file to the Ontology Extractor.
   i. For each sentence of the document:
   ii. Tag each sentence in subject, object and predicate form.
   iii. Store the ontological knowledge in the Ontological Knowledgebase.
4. Take the user's query at runtime.
   i. Perform part-of-speech tagging in the POS Tagger.
   ii. Perform Morphological Analysis and find frequent items in the query.
   iii. Perform Synonyms Identification.
   iv. Retrieve ontological knowledge from the Ontology Extractor.
5. Validate concept knowledge at the application layer.
   i. Transfer validated knowledge to the Ontology Extractor for knowledge refinement.


   ii. Change retrieval policies according to the validated concepts.
6. Go to step 4.

DISCUSSION
The aim of the proposed model and algorithm is to provide concept based Quranic searching. To justify the proposed model and algorithm, three parameters have been selected: efficiency, accuracy and unbiased searching of the Quranic text. Many Quranic software packages and databases are currently available that perform good keyword searching. Yet these tools can only find the most significant words in a query and then retrieve from the database those verses that contain such keywords [4-8, 11-12], whether or not they are what the user needs. Critical analysis of these tools shows that keyword based searching cannot answer the user's query properly, because many verses in the Quran do not contain a given word explicitly yet carry many hidden concepts. For instance, "Sura e Rehman" contains many different concepts, and it is clear from Table 1 that different search tools return different results for any particular word. Apart from accuracy, another critical issue is that this cycle is repeated every time a concept is searched, which leads to slow and seemingly wrong results. In contrast, the proposed model extracts concepts from the verse and links each pronoun to the particular noun it represents.

There is software [16] that performs ontology based Quranic search using NLP for knowledge acquisition, but its limitation is that it consults the Hadith to answer many concepts that are not clearly revealed in the Quran. This is problematic because there are many Hadith books and each author provides his own interpretation of different Hadith, so the extracted concepts may confuse the user by presenting different interpretations for a single search.
There are also tools with predefined concepts [14, 15] that answer every search in a fixed manner. For example, if they hold the concept "ALLAH is one", then whenever the user searches for ALLAH they return the same verse, on the assumption that all concepts of the Quran have already been listed. However, the Quran is regarded by Muslims as a living book, and scholars are still carrying out a great deal of research to understand Quranic concepts, so there is a clear need to update the concepts and to bind new concepts to relevant verses. CEQSE provides a theoretical framework with the ability to refine its search by evolving and improving its concepts through user validation. It is therefore able to provide efficient searching with a better understanding of Quranic verses, delivering a deep understanding of meanings.

CONCLUSION
This paper proposes a theoretical framework that provides a comprehensive basis for implementing a semantic based concept extraction engine for Quranic search. Effective search results give the user a better understanding of the underlying concepts of Quranic verses. In future, an application for Quranic search will be developed using this framework, together with an extensive knowledgebase holding vast textual information.

REFERENCES
[1] A. Rippin, The Blackwell Companion to the Qur'an. p18 (2006).

[2] M. Mutahhari, Understanding the Uniqueness of the Quran. Vol I No. (1984).

[3] E. Hyvönen, S. Saarela, and K. Viljanen, Ontogator: Combining View- and Ontology-Based Search with Semantic Browsing. In: Proceedings of XML Finland 2003, Open Standards, XML, and the Public Sector, Kuopio, October, 2003.

[4] A. Shawar, Atwell, An Arabic Chatbot giving answers from the Qur'an in Bel Proceedings of TALN04: XI Conference. Vol. 2, (2004)

[5] Search Truth, an online QURAN and Hadith search web portal, retrieved on January 1, 2012, http://www.searchtruth.com/

[6] Guided Ways Technologies, an online QURAN and Hadith search web portal, retrieved on January 1, 2012, http://www.guidedways.com/index.php

[7] IslamiCity, an online QURAN search web portal with translation in different languages, retrieved on January 2, 2012, http://www.islamicity.com/mosque/quran/

[8] University of Southern California, Centre for Muslim-Jewish Engagement provides Quran search database on web portal, retrieved on January 2, 2012

[9] Y. Kotb, K. Gondow, and T. Katayama, “The SLXS Specification Language for Describing Consistency of XML Documents,” Proc. of the Fourth Workshop on Information and Computer Science (WICS’2002), IEEE Comp. Soc., El-Damam, Saudi Arabia, pp. 289-304, March 2002.

[10] Y. Kotb, K. Gondow, and T. Katayama, “The XML Semantics Checker Model,” Proc. of the Third International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT’02), Kanazawa, Japan, pp. 430-438, September 2002.

[11] S. Saad and N. Salim, “Build Islamic Ontology based on Ontology Learning,” Postgraduate Annual Research Seminar 2007, (3-4 July 2007).

[12] S. Saad, N. Salim and N. Omar, “Keyphrase Extraction for Islamic Knowledge Ontology,” IT symposium Vol. 2, pp. 1-6 (26-28 Aug 2008).

[13] M. Yahya, H. Khalifa, A. Bahanshal, I. Odah, N. Helwah, “An ontological model for representing semantic lexicons: an application on time nouns in the holy quran,” The Arabian Journal for Science and Engineering, Volume 35, Number 2C, December 2010

[14] M. Nadeem, H. Ullah, M. Imran, M. Sikandar, “Relational WordNet Model for Semantic Search in Holy Quran. Muhammad Shoaib”, International Conference on Emerging Technologies IEEE, 2009

[15] A. Yunus, R. Zainuddin and N. Abdullah, “Semantic Query for Quran Documents Results,” IEEE Conference on Open Systems (ICOS 2010), December 5-7, 2010, Kuala Lumpur, Malaysia

[16] S. Saad, N. Salim, S. Zainuddin, “An Early Stage of Knowledge Acquisition Based on Quranic Text,” International Conference on Semantic Technology and Information Retrieval, 28-29 June 2011, Putrajaya, Malaysia