
Upload: adina-toderas

Post on 26-May-2015



BrainSpa – A Web Application for Exploring Knowledge using SPARQL

Eugen Ignat1, Sabin Pochiscan1, Radu Simionescu1, Simona – Adina Toderas1

1 Artificial Intelligence and Computational Linguistics Department, Faculty of Computer Science - Iasi, Romania

[email protected]@[email protected]@info.uaic.ro

Abstract. The World Wide Web is a dynamic environment that everyone, from casual user to expert, is excited about. Now entering its third stage of life, it faces a new challenge: finding good ways to model knowledge about things by attaching meta-data to the data itself. This paper discusses how the WWW is becoming a less ambiguous space and how to take advantage of the new features that are becoming available at no cost other than one's interest. As a case study we present BrainSpa, our own web application for exploring various SPARQL endpoints and sharing queries with other members of the semantic web community.

Keywords: RDF, SPARQL, web application, PHP, concepts, modeling, RAP, OAuth, query, tag, endpoint, prefix, semantic, WWW.

1 Introduction

Some well-known web jokes flirt with the idea of Googling your car keys or mismatched socks. Fortunately for the on-line user, that era may not be far away, given the recent effort put into moving the WWW toward a semantics-driven space for modeling concepts and linking information.

In order to satisfy the need for organized, unambiguous information, certain organizations like W3C have taken the initiative to develop specifications, languages and technologies that are free to use and more than appealing for the task of annotating knowledge from any domain. Making use of such innovative technologies allows users to develop all kinds of interesting and creative applications that certainly prove useful in a wide range of domains (given today's demand for automating as many processes as possible).

BrainSpa belongs to this group of applications. It is a web application that lets its users explore knowledge available on the World Wide Web (in the form of RDF files) using SPARQL without having to know the query language explicitly. Querying is done by completing an on-line form, either anonymously or after logging in to one's account. Having an account provides more benefits than using BrainSpa anonymously: registered users can save or share their queries on the server side, and even store queries and results on their local computers. Registering is also easy and does not require memorizing another user-name / password pair; anyone can log in using an existing Twitter, Gmail, Yahoo! or YouTube account, thanks to OAuth [1], an open protocol that allows secure API authorization in a simple manner. This way, BrainSpa is not just a client-server solution for browsing the annotated data available online, but the foundation for a community-driven environment.

The remainder of this paper presents the technologies involved in the project, along with other information, as follows: section 2 surveys the current state of semantic resources and applications; section 3 lists and reviews everything involved in the actual development of BrainSpa; section 4 describes a few use-cases for the application, together with the relevant diagrams; section 5 concludes the paper and is followed by a list of references.

2 Overview

The World Wide Web is slowly moving into a semantic-driven zone, as shown by the new technologies and services freely available for modeling knowledge, annotating data and exploring concepts. Specifications for semantic markup / modeling, such as RDF or the lightweight micro-formats, are gaining popularity as more and more tools that try to improve the “internet surfing” experience make their way onto the market. For example, Firefox extensions (Tails, Operator) have been developed for exploring or operating with micro-formats. Just as it does for HTML with the HTML Validator, W3C offers validation and visualization services for RDF documents. There is also a large number of recently developed semantic frameworks available for many different programming languages:

• D2R Server, Joseki, Sesame and Mulgara for Java,
• RAP and ARC for PHP,
• 4store, OpenLink Virtuoso and Oracle Spatial 11g for C/C++,
• RDFStore for C and Perl,

and many more, all having specific methods for reading / parsing data from semantic formats, storing RDF triples in a database, creating queries or accessing endpoints. Just as traditional databases come with a query language, RDF knowledge comes with a querying solution: the SPARQL query language. SPARQL (a recursive acronym that stands for SPARQL Protocol and RDF Query Language) can be tried at the various endpoints that offer a user interface for this purpose. One of the biggest projects offering a query solution is DBpedia, a project aimed at extracting structured information from Wikipedia. W3C offers a list of SPARQL endpoints for exploring content from a wide range of domains.
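As a rough illustration of what "accessing an endpoint" involves, the sketch below builds the URL of a SPARQL Protocol GET request, which carries the query text in a `query` parameter. The endpoint address `http://dbpedia.org/sparql` and the function name `build_request_url` are assumptions made for this example; the code only constructs the URL and sends nothing over the network.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed endpoint address, used for illustration only.
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the SPARQL query in the 'query' parameter.

    Assumes the endpoint address has no query string of its own.
    """
    return endpoint + "?" + urlencode({"query": query})


query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"
url = build_request_url(DBPEDIA_ENDPOINT, query)

# Decoding the URL gives back the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
```

A client would then fetch `url` and parse the response in whatever result format the endpoint returns.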

These are just a few basic examples of the technological advancement achieved in the semantic web area. More tools are available, and more can be created by developers willing to contribute to the progress of the WWW; BrainSpa tries to be such a tool.

3 Architecture

BrainSpa was created by following the general guidelines of software engineering. After an in-depth analysis of the requirements came the architectural and detailed design of the desired software product. The coding process evolved in a modular style, followed by an incremental integration of the system. In the validation step, two questions were asked and successfully answered - “Are we building the right product?” and “Are we building the product right?” - in order to test if the initial requirements were fully respected and implemented in a functional manner. The last step, maintenance – having the longest lifespan – starts after the product is deployed and ends with the author's loss of interest in it.

The following sub-chapters provide more information regarding everything involved in the actual development of BrainSpa.

3.1 Technologies

This section provides an overview of all technologies involved in the creation of BrainSpa. They were chosen based on accessibility, (lack of) price, interoperability and position towards freeware / open-source.

3.1.1 Dropbox

In the planning stage, one of the first issues we had to deal with was how to share files (source code, scripts, images, documentation, etc.) between the developers in a versatile yet time-saving manner. Bearing in mind that BrainSpa is a relatively small project compared to the mammoths of the IT industry, we considered that adopting SVN-style version control would be more likely to separate the members of the team than to help them work together. We therefore chose what is, at least in our opinion, a better solution for a project of this size: Dropbox [2], a free service for sharing files among users.

Since we were already using Dropbox for personal projects, the existing accounts needed only some new shared folders to store files for specific tasks (images, actual projects with libraries or documentation). Because Dropbox keeps file history and versions and supports operations like restore, the risk of accidentally deleting or changing vital information in the shared data is low.

3.1.2 Creately

Another issue encountered in the planning stage was finding the best tool for tasks like visually modeling the database schema or creating use-case diagrams. We had experience with ArgoUML and Creately [3], and chose the latter because (unlike ArgoUML) it is available as a web application, it provides visual elements for creating a wide range of diagrams (which can be exported as images or PDFs) and it allows either files or entire projects to be shared among Creately users.

This service lets developers save time by focusing on how to model information visually, without having to worry about versioning, sharing or letting someone know that a diagram has changed or needs to be changed somewhere on the trunk of the project. Last but not least, Creately is appealing due to its modern, eye-catching design of layout and components (an advantage visible in the diagrams in the following sections of this paper).

3.1.3 280 Slides

Another useful tool for on-line, collaborative work is 280 Slides [4], a free web application for creating, saving, editing and sharing presentations.

280 Slides proved useful in creating a quality presentation for the BrainSpa project. Every member of the team could contribute to it easily, since it was always backed up and available online.

3.1.4 OpenOffice

A considerable rival to Microsoft Office, OpenOffice [5] is the “free and open productivity suite” available for just a download and the execution of a clean installer. Because it is free of charge and open-source, this office solution is not only very popular, but also available on the major operating systems and suitable for everyday office tasks.

Among the applications available in OpenOffice, Writer was used to create the present document, which conforms to LNCS standards.

3.1.5 CodeIgniter

Because the server side of BrainSpa, developed in PHP, must deal with complex tasks such as database operations, session / cookie management and keeping the views and the data separated through controllers, the project needs a powerful PHP framework for web applications.

We chose CodeIgniter [6] because it provides a simple yet powerful environment with minimal configuration and a large number of libraries. The framework not only performs well, but also provides the resources needed for tasks like database administration and session management. It is also open-source and based on the Model-View-Controller design pattern, which is very popular for building web applications.

3.1.6 Zend Framework

The Zend Framework [7] is another powerful solution for building PHP web applications. It also provides significant resources for common database or session management tasks in the OOP and MVC fashion, but it is most popular for its “use-at-will” architecture, in which components can be used independently of one another.

Because Zend Framework has a more than friendly attitude towards the modern, Web 2.0 applications and web services – it provides ways for consuming widely available APIs from leading vendors like Google, Amazon, Yahoo! or Flickr – our project makes use of it, together with OAuth, for enabling users to log in on BrainSpa using an existing account from Yahoo!, Google, Twitter or YouTube.

3.1.7 OAuth

OAuth is an open protocol that allows secure API authorization through a simple and standard method for desktop and web applications. It allows the user to grant one site (the Consumer) access to his private resources located on another site (the Service Provider) without sharing his credentials with the Consumer.

The work-flow of OAuth implementations is consistent for most service providers and follows these steps:

• the developer signs up with the service provider in order to get a consumer key and a shared secret;

• the provider gives the application a request token;

• the application redirects the user to the service provider web site in order to obtain user authorization;

• once the user has authorized access, the service provider redirects back to the application;

• upon receiving the request token and the OAuth verifier, the service provider grants an access token and a token secret that can be used until they expire.
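The token exchange listed above can be walked through with a small in-memory simulation. This is only a sketch under simplifying assumptions: `ServiceProvider` and its methods are hypothetical names invented for the example, no HTTP redirects take place, and the request signing that real OAuth implementations perform is omitted.

```python
import secrets


class ServiceProvider:
    """Toy stand-in for a service provider; not a real OAuth library."""

    def __init__(self):
        self.consumers = {}       # consumer key -> shared secret
        self.request_tokens = {}  # request token -> verifier (None until authorized)

    def register_consumer(self):
        """The developer signs up and receives a consumer key and shared secret."""
        key, secret = secrets.token_hex(8), secrets.token_hex(16)
        self.consumers[key] = secret
        return key, secret

    def issue_request_token(self, consumer_key):
        """Hand out a request token to a known consumer."""
        if consumer_key not in self.consumers:
            raise PermissionError("unknown consumer")
        token = secrets.token_hex(8)
        self.request_tokens[token] = None
        return token

    def authorize(self, request_token):
        """The user approves access on the provider's site; returns the verifier."""
        if request_token not in self.request_tokens:
            raise PermissionError("unknown request token")
        verifier = secrets.token_hex(4)
        self.request_tokens[request_token] = verifier
        return verifier

    def exchange(self, request_token, verifier):
        """Swap an authorized request token for an access token and token secret."""
        expected = self.request_tokens.get(request_token)
        if expected is None or expected != verifier:
            raise PermissionError("request token not authorized")
        del self.request_tokens[request_token]  # a request token is single-use
        return secrets.token_hex(8), secrets.token_hex(16)


provider = ServiceProvider()
consumer_key, shared_secret = provider.register_consumer()    # sign-up
request_token = provider.issue_request_token(consumer_key)    # request token
verifier = provider.authorize(request_token)                  # user authorization and redirect back
access_token, token_secret = provider.exchange(request_token, verifier)  # access token
```

The consumer never sees the user's password in this flow; it only handles tokens issued by the provider.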

3.1.8 RAP

Because BrainSpa is a semantic-oriented web application, its overall functionality involves some extra operations: sending a SPARQL query to an endpoint and receiving an RDF result that is then transformed into a visually appealing format. This is where RAP [8] plays its role as a powerful RDF API for PHP, with features such as:

• methods for manipulating RDF models as a set of RDF triples or resources or through vocabulary specific methods,

• integrated RDF/XML, N3, N-TRIPLE, TriX parsers and serializers,

• in-memory / database storage,

• SPARQL query engine and client library,

• integrated RDF server (similar to the Joseki RDF server),

these being just a few. We found RAP to be the most suitable software package for parsing, querying, manipulating, serializing and saving RDF models.
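To make "manipulating RDF models as a set of RDF triples" concrete, here is a minimal in-memory sketch. It is written in Python for illustration and does not reproduce RAP's PHP API; `TripleModel`, `add` and `find` are hypothetical names, and the sample comment text is made up.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class TripleModel:
    """Hypothetical in-memory RDF model holding a list of unique triples."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def add(self, subject: str, predicate: str, obj: str) -> bool:
        """Add a triple; return False if it was already present."""
        triple = (subject, predicate, obj)
        if triple in self.triples:
            return False
        self.triples.append(triple)
        return True

    def find(self, subject: Optional[str] = None,
             predicate: Optional[str] = None,
             obj: Optional[str] = None) -> List[Triple]:
        """Return the triples matching the pattern; None acts as a wildcard."""
        return [t for t in self.triples
                if (subject is None or t[0] == subject)
                and (predicate is None or t[1] == predicate)
                and (obj is None or t[2] == obj)]


model = TripleModel()
romania = "http://dbpedia.org/resource/Romania"
model.add(romania, RDFS_COMMENT, "A country in Europe.")  # made-up sample text
model.add(romania, RDFS_LABEL, "Romania")
duplicate_added = model.add(romania, RDFS_LABEL, "Romania")

comments = model.find(predicate=RDFS_COMMENT)
```

Pattern matching with wildcards, as in `find`, is the basic operation on which a SPARQL engine builds its triple patterns.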

3.2 Development

Regarding the development model, the team predominantly adopted XP (Extreme Programming) techniques. There was no hierarchy among the members, a collaborative working style was encouraged, and each of us was able to contribute to the project by putting personal skills to good use. The initial task was divided into a number of issues that we could work on alone or in pairs, and we met regularly (both on-line and in person) to discuss the progress so far and future directions.

3.2.1 Responsibilities

In order to adhere to IT standards and survive on the market, the project, initially called WebSpa, needed a strong identity. All marketing aspects (name change, logo, diagrams, documentation, presentation, speeches) together with some architectural responsibilities, database design, testing and research were handled by Adina Toderas.

The User Interface (developed using HTML + CSS, JavaScript and jQuery) and client-side aspects were Sabin Pochiscan's responsibilities.

Last but not least, server-side aspects (querying, saving, OAuth, RAP, etc.) were dealt with by the pair of the last two members in the team, Eugen Ignat and Radu Simionescu, with occasional help from Adina Toderas (for testing).

Each of the four authors had the opportunity to make use of their personal skills and work on what they enjoyed most / were good at. This is the biggest immaterial reward one can ask for when it comes to school or career.

3.2.2 Coding

In order to benefit from modularity, re-usability, polymorphism, inheritance and abstraction, the well-known object-oriented coding style was adopted. Also, because BrainSpa is a web application that relies on data persistence (saving user information, queries, tags and descriptions to the database on the server side) and has a rather complex user interface, it is implemented following the Model-View-Controller design pattern. MVC states that data and view should be separated within a software entity, and should communicate with each other only through a controller developed specifically for that data and that view. In other words, while the Model and the View are quite often reusable, the Controller is not.
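The separation described above can be sketched in a few lines. This is an illustrative Python example, not BrainSpa's PHP / CodeIgniter code: the class names are hypothetical and a plain list stands in for the database.

```python
class QueryModel:
    """Model: stores saved queries (a list here; a database in a real application)."""

    def __init__(self):
        self._queries = []

    def save(self, text, tags):
        self._queries.append({"text": text, "tags": list(tags)})

    def all(self):
        return list(self._queries)


class QueryListView:
    """View: turns query records into display text and knows nothing about storage."""

    def render(self, queries):
        return "\n".join(
            f"{i}. {q['text']} [{', '.join(q['tags'])}]"
            for i, q in enumerate(queries, 1)
        )


class QueryController:
    """Controller: the only component that talks to both the model and the view."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def save_query(self, text, tags):
        # Clean the user input, update the model, then re-render the view.
        self.model.save(text.strip(), tags)
        return self.view.render(self.model.all())


controller = QueryController(QueryModel(), QueryListView())
page = controller.save_query("  SELECT * WHERE { ?s ?p ?o }  ", ["demo"])
```

Swapping `QueryListView` for another view, or the list for a database-backed model, would leave the other two classes untouched, which is the reuse the pattern aims for.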

In our project, information regarding users and their queries is saved in a database storage system (the Model of MVC). Figure 1 depicts the schema for the mentioned database.

The View of the MVC is the user interface itself: a visual, interactive space through which the user communicates with BrainSpa and takes advantage of its features and capabilities. The UI (Fig. 2, 3 and 4) consists of a web page that handles tasks like logging in, registering, querying an endpoint or browsing through existing public queries. The project interface is developed as a RIA (Rich Internet Application): like a desktop application, it provides as much functionality as possible within the same window of interaction. The main module of BrainSpa, which handles the construction of SPARQL queries, is inspired by the “View” module in Drupal (Fig. 5), which has similar functionality: generating a MySQL query without requiring explicit knowledge of the MySQL query language.

Development for BrainSpa was done mostly using NetBeans, the Java integrated development environment that can also be used for coding in many other languages, such as JavaScript, PHP, Python, Ruby, C, C++, Scala or Clojure. Because the IDE works wherever a Java Virtual Machine is installed, it is a platform-independent working environment. A screen-shot of the project in NetBeans is shown in Figure 6.

Fig. 1. Diagram for the database schema (done using Creately service) of the BrainSpa project.

Fig. 2. BrainSpa user interface – query builder.

Fig. 3. BrainSpa user interface – query builder in action.

Fig. 4. BrainSpa user interface – results.

Fig. 5. Drupal “View” module for generating MySQL queries.

Fig. 6. BrainSpa source files as seen in NetBeans.

As mentioned in a previous section, sharing and versioning of the BrainSpa source files was handled by Dropbox, a free, lightweight service for on-line backup and file sync. To complement this, the team adopted a collaborative working style: frequent meetings (both on-line and in person), working in pairs, etc.

The last diagram of this sub-chapter, Figure 7, shows a detailed deconstruction of a regular SPARQL query; all elements involved in the query are shown in a tree-like structure in order to reflect and justify the layout of the user interface elements involved in generating a query.

Fig. 7. Detailed deconstruction of a SPARQL query.

4 Use-cases

No matter how efficient and functional a software package is, it must prove to be of meaningful use to its target audience; it must provide a practical answer to a problem that is of interest to a certain group of people. The presented project aims to offer a way of exploring knowledge modeled with the RDF specifications and made available at endpoints, which the query reaches and interrogates in order to obtain a result for the user. The target audience consists of users who have a small amount of technical knowledge in IT and are fond of web and semantic technologies, but it can be extended to a wider class of users with no IT background who want to find valid knowledge.

BrainSpa can be used either anonymously or with an account, the latter being preferred since it provides more features that are community oriented.

4.1 Anonymous use

Users can access the BrainSpa web application anonymously (without logging in with an existing account), but the functionality is limited, as the project aims to be community oriented. The only available option is filling in the query form in order to compose and send a SPARQL request to an endpoint and receive the results (displayed to the user as a table). A relevant use-case diagram is presented in Figure 8, showing how the actors involved in the scenario (the User, the System, i.e. BrainSpa, and a SPARQL query endpoint) interact with each other.

Fig. 8. Use-case diagram for an anonymous connection to BrainSpa web application.

4.2 Registered use

The BrainSpa services can be used fully with an account. One of the most interesting parts of the project is that a user does not need to actually register with BrainSpa and memorize another user-name / password pair. The project takes advantage of the OAuth protocol, which means that anyone with a Yahoo!, Google, YouTube or Twitter account can log in to the web application using that account. Once logged in, the user has more options available. A possible use-case scenario is the following: somebody wants to find accurate information regarding a certain subject (for example, comments about Romania) and, after obtaining the results, store them together with the query on the local computer. All one needs to do is complete the form available online in order to generate a query like the following:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?info
WHERE {
  <http://dbpedia.org/resource/Romania> rdfs:comment ?info .
}

After the endpoint processes the received query, it sends a result back to the web application, which displays the received information as a table. Both the results (as an RDF file) and the query can be saved to the local file system with only a few clicks.
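The step from form fields to query text in this scenario could look roughly like the sketch below. It is an illustration in Python rather than BrainSpa's actual PHP code; `build_select` and its parameters are hypothetical, and the values are inserted as given, without the validation a real form handler would need.

```python
def build_select(prefixes, variables, patterns):
    """Assemble a SPARQL SELECT query from form-like inputs.

    prefixes:  dict mapping a prefix name to its IRI
    variables: list of variable names without the leading '?'
    patterns:  list of (subject, predicate, object) strings in SPARQL syntax
    """
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    lines.append("SELECT " + " ".join("?" + v for v in variables))
    lines.append("WHERE {")
    lines.extend(f"  {s} {p} {o} ." for s, p, o in patterns)
    lines.append("}")
    return "\n".join(lines)


query = build_select(
    {
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "dc": "http://purl.org/dc/elements/1.1/",
    },
    ["info"],
    [("<http://dbpedia.org/resource/Romania>", "rdfs:comment", "?info")],
)
```

With these inputs the function reproduces the query shown above, one clause per line.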

Another use-case scenario is the following: a registered user wants to save his queries in order to use them in the future. (Since the information available on-line and modeled as RDF will continue to change, today's result for a query will likely look different than the result of the same query executed a month from now.) The user also wants to attach both tags and a description to his query in order to distinguish his saved information more easily. This is possible with just a few clicks in the user interface. Moreover, the user is given the option of saving his query as either private or public, which brings us to the next use-case scenario.

A user wants to browse through the existing shared queries. This is made possible by a search form that takes as input the tags and / or description key words one wants to filter by. Upon searching, the application looks through the public queries stored in the database and displays the matches in the user interface. After reviewing them, the user can choose to execute one and see the results.
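A filter of this kind might be sketched as follows. The record layout (`public`, `tags`, `description`) and the function `search_public` are assumptions made for the example, as is the choice to require every given tag and every given key word to match; the sample records are made up.

```python
def search_public(queries, tags=(), keywords=()):
    """Return the public queries that carry all given tags and whose
    description contains all given key words (case-insensitive)."""
    wanted_tags = {t.lower() for t in tags}
    words = [w.lower() for w in keywords]
    hits = []
    for q in queries:
        if not q["public"]:
            continue  # private queries are never listed
        has_tags = wanted_tags <= {t.lower() for t in q["tags"]}
        has_words = all(w in q["description"].lower() for w in words)
        if has_tags and has_words:
            hits.append(q)
    return hits


# Made-up sample records standing in for rows of the queries table.
saved = [
    {"public": True, "tags": ["dbpedia", "country"],
     "description": "Comments about Romania"},
    {"public": False, "tags": ["dbpedia"],
     "description": "Private notes on Romania"},
    {"public": True, "tags": ["music"],
     "description": "Albums by year"},
]

by_tag = search_public(saved, tags=["DBpedia"])
by_word = search_public(saved, keywords=["romania"])
```

In a database-backed application the same conditions would typically be expressed in the query sent to the database rather than in a loop.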

This is not all that BrainSpa has to offer. Another important feature is the possibility of marking specific query information, such as endpoints and prefixes, as favorites: each user is provided with lists of his favorite endpoints and prefixes, and can add entries to or remove entries from those lists.

Figure 9 presents the complete use-case scenarios diagram. Each major operation possible within BrainSpa is represented by an oval use-case element, while the arrows indicate the direction followed by each operation.

Fig. 9. Use-case diagram for a user logging in with an account to BrainSpa web application.

5 Conclusions

The idea of browsing the World Wide Web in an intelligent, concept-driven manner is extremely appealing but seems unlikely to become reality in the next few years. However, important progress has been made in domains closely related to the problem at hand, and we are not that far from using a search engine that knows how to distinguish between the Java programming language, the Java island and the Java coffee.

Certain organizations like W3C have taken the initiative to develop specifications, languages and technologies that are free to use and more than appealing for the task of annotating knowledge from any domain. Making use of such innovative technologies allows users to develop all kinds of interesting and creative applications that certainly prove useful in a wide range of domains (given today's demand for automating as many processes as possible).

BrainSpa belongs to this group of applications. It is a web application that lets its users explore knowledge available on the World Wide Web (in the form of RDF files) using SPARQL without having to know the query language explicitly. The authors meant to develop a tool that improves the on-line experience of a user fond of web and semantic technologies. The development process of the project was complex and helped every team member enrich their knowledge and technical experience. Research was done on a large number of technologies (besides the ones mentioned in the present paper), so that the most suitable of them could be chosen to obtain good functionality and performance within the software application.

As the use-cases demonstrated, BrainSpa represents a first step in building a community of users who are interested in innovation and technological advancement. Hopefully, with time, it will gain more features and a large number of members. Even at the current stage, the authors believe it to be an interesting and useful tool that can later be integrated into solutions for more complex problems encountered in the semantic web area of research.

References

1. OAuth, http://oauth.net/
2. Dropbox, https://www.dropbox.com
3. Creately, http://creately.com/
4. 280 Slides, http://280slides.com/
5. OpenOffice, http://www.openoffice.org/
6. CodeIgniter, http://codeigniter.com/
7. Zend Framework, http://zendframework.com/
8. RAP, an RDF API for PHP, http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/
9. Twitter API Wiki, http://apiwiki.twitter.com
10. Authentication and Authorization for Google APIs, http://code.google.com/apis/accounts/docs/OAuth.html
11. Programmer's Reference Guide to Zend Framework and OAuth, http://framework.zend.com/manual/en/zend.oauth.introduction.html
12. Yahoo! OAuth authorization model, http://developer.yahoo.com/oauth/
13. Developer's guide to Youtube Data API, http://code.google.com/apis/youtube/2.0/developers_guide_protocol.html
14. Code Recipes, http://code.activestate.com/recipes/
15. JSON in JavaScript, http://www.json.org/js.html