ghent university iminds multimedia labusers.datasciencelab.ugent.be/lasmeken/mp2015-2016/...research...

Ghent University – iMinds – Multimedia Lab

Master thesis subjects 2015 - 2016

Knowledge on Webscale

Assessing Open Source Code Trustworthiness through Version History

Promotor: Erik Mannens and Rik Van de Walle

Supervisors: Tom De Nies, Ruben Verborgh, Miel Vander Sande

Study Programme: Master Computer Science Engineering, Master Mathematical Informatics

Number of students: 1

Number of theses: 1

Contact: [email protected]

Keywords: Trust, Version Control Systems, Git, Semantic Web

Location: home, Zuiderpoort

Problem definition:

An increasing amount of code is being shared on the Web, thanks to various open source initiatives

such as npm, sourceforge, Google Code, etc. However, with all this code being pushed to the Web,

and the possibility for anyone to contribute to any project, the quality of the resulting software is not

always optimal. Furthermore, manually judging whether or not to trust a piece of code is a time-

consuming and non-exact process, while programmers should be focusing on writing contributions

of their own. Therefore, there is a need for an automatic method to help programmers decide

whether or not they can trust a certain piece of code.

Goals:

To achieve this, the student will exploit the information contained within a Version Control System

(VCS), such as Git. More specifically, the ‘provenance’ (also referred to as ‘data lineage’) of the

code will be exposed using a tool such as Git2PROV. Then, the student will create a method to

automatically reason of this provenance, to give the end-user an assessment of the trustworthiness

of each version of the code. An extensive literature review will need to be conducted, to find out

which criteria influence trustworthiness, and how they can be inferred. Finally, the thesis should

result in a lightweight, user-friendly demonstrator that can easily be used by programmers in their

daily workflow.

http://www.npmjs.com/

http://sourceforge.net/

http://code.google.com/

http://git2prov.org/

Towards a Trusted Web by Tracing the Origins of Composite Web Pages


Supervisors: Tom De Nies and Ruben Verborgh



Number of theses: 1


Keywords: Trust, Version Control Systems, Git, Semantic Web


Problem definition:

While openness is one of the core foundations of the Web, it has caused such an abundance of

heterogeneous content that it becomes unclear for humans if they can trust the content they see on

web pages. Furthermore, web pages are often littered with tracking mechanisms, such as hidden

pixels, cookies, etc. The first step in deciding whether or not to trust a web page is finding out where

its contents come from, who made/edited them and whether these sources are to be trusted. This is

what’s known as the page’s provenance.

Goals:

In this master’s thesis, the student will investigate a method to trace the provenance of Web pages

and the parts of which they composed. To achieve this, an extensive literature study will be

performed to identify existing approaches that might serve as a baseline for this purpose. The

student will then devise an improvement on these approaches, thereby exposing the full provenance

of a web page. This provenance can then be interpreted by a reasoner to suggest a trust

recommendation to the end user: a human being looking at the web page through a browser. By

working with standards from the W3C Open Web Platform, the solution devised in this thesis

potentially has a worldwide impact.

Bringing time-travel to data on the Web with efficient indexes


Supervisors: Miel Vander Sande, Laurens De Vocht


Number of students: 1 or 2

Number of theses: 1

Contact: Miel Vander Sande

Keywords: Data structures, Web APIs, Indexing, RDF, Semantic Web, Linked Data


Problem definition:

The Web of Data is an interconnected network of Linked Datasets residing on the Web. In

contrast to documents, concepts within the data are linked in a single global data space. These data

are modelled as a graph using the RDF framework, which describes data as triples (subject-

predicate-object). Although reading infrastructure has evolved significantly, writing this dataspace

is still an unsolved problem. How do you maintain such a dataspace where different users create,

read, update and delete statements? How do you take into account different views on the same

data? These problems pose new challenges for data storage, more specifically indexes, where

changes need to be remembered. Databases uses many different indexing algorithms like B-trees,

B+ trees, skip lists, hash tables and storage structures like Log-structured storage to enable fast and

concurrent read/write access. How do they perform for RDF? Can they be exploited by version

control systems?

Goals:

In this thesis, you initiate a building block for the Read/Write Web of Data. You dive into the

literature about general and versioned indexing strategies in RDF databases. Based on this

knowledge, you propose a technique that supports (a) fast triple pattern based retrieval, (b)

acceptable insertion/removal of triples and (c) track changes in order to retrieve past views. Finally,

the approach is implemented and evaluated using a use case, in order to verify the features

mentioned above.

Monitoring Science Related Conversations on the Web in Real-Time

Promotors: Erik Mannens and Rik Van de Walle

Supervisors: Laurens De Vocht, Anastasia Dimou



Number of theses: 1

Contact: Laurens De Vocht

Keywords: Web 2.0, Science 2.0, Researchers, Web of Data, Collaboration Tools, Social Media


Problem definition:

Digital libraries and online journals (such as IEEE, ACM) all have search engines to help scholars

find interesting resources. However, these approaches are often ineffective, mostly because

scholars: (i) only look-up resources based, at best, on their topics or keywords, not taking into

account the specific context and the scholar's profile; (ii) are restricted to resources from a single

origin. Of course, aggregators exist that index resources from multiple sources. The challenge is

therefore in matching research needs and contexts to opportunities from multiple, heterogeneous

sources. In other words, we should make the most of the wealth of resources for research through

relating and matching their scholar profile with the online available resources, publications and other

scholar's profiles.

Goals:

Combine streams of Web Collaboration Tools (e.g. Researchgate, Mendeley…) and Social Media

(e.g. Twitter, LinkedIn...) to track scientifically related communication and align it with the Web of

Data (such as COLINDA, DBLP, PubMed). This allows developing an efficient real-time monitor and

a useful environment for researchers. The monitor needs to allow interaction with the users, like a

dashboard. The user's personal research library and preferences could be matched with those of

others. This allows links to be made to social and research data beyond a single researcher's scope

and be a great source for inspiration (what is relevant to me?) and overview (what's hot right now

around me?). This should lead to more fine-grained details facilitating researchers to obtain a

sophisticated selection and linking of contributed resources based on previous assessments and

explored links.

Towards Intelligent Web Experiences: A Contextual User Journey


Supervisors: Laurens De Vocht



Number of theses: 1

Contact: Laurens De Vocht

Keywords: Storytelling, Information Retrieval, Big Data, Linked Data, Pathfinding, User Interaction


Problem definition:

The number of resources that are now available to users on the Web, is rapidly expanding. Although

this Big Data may be structured and even linked, the underlying structure still looks like a maze to

most users. Therefore, there is an abundance of apps and web pages trying to hide this for the user

and of course there are search engines helping the user travel to the right page at all times.

Rather than imposing a linear journey on users through the hierarchies that website navigation and

lists of links impose, a networking-based contextual experience is based on a user’s search and the

relationships that form around what a user searches for. In this way, users navigate the network in

an order of their choice—establishing their own personalized web experience. There is still some

order as resources in the network are related to each other through modeled relationships and

ontologies. But, these relationships can run in multiple directions rather than one direction. The

categories become less important and the focus is put around the user and the content that is most

relevant to them.

You will investigate an approach to influence the generation of explanations of how resources are

related to each other in real-time in a personalized user context. Each explanation can be seen as a

useful/relevant re-combination of multiple associations between resources, for example:

Trivia Finding / Personalized Story: DBpedia (Structured version of Wikipedia)

Research 2.0: Recreate Events (Conferences) based on data of Web Collaboration Tools,

Digital Libraries, Linked Open Data

Medicine: Drug discovery and genome data analysis.

Goals:

In this master’s thesis, the goal is to generate an experience which is both relevant to the user and

coherent. This includes an optimization on how the users interact with data, without violating any

rules of the context in which it is applied (e.g. chosen topics, required resources). Rather than living

in the past, you will investigate methods that make it possible to look toward the future, providing

inspiration as you discover things you didn’t know before. Relationships in the data can suggest new

songs you may want to listen to or people you may want to meet. The user's journey on the Web

evolves from being enforced linearly (passing through search engines over and over) to a network

of data represented in way they like.

User experience modeling in mobile applications using behavior stream visualization,

clustering and sentiment extraction


Supervisors: Dieter De Witte, Azarakhsh Jalalvand, Paul Davies (UXProbe)



Number of theses: 1

Contact: [email protected], [email protected]

Keywords: Big Data Visualizations, Clustering, Sentiment extraction, Machine Learning

Location: Home, Zuiderpoort, Boerentoren Antwerp

Problem definition:

Nowadays, being a successful mobile application developer is a challenging task. Apart from the

fierce competition, you need to convince users to install your product and to keep using it. In this

context, it is of critical importance to identify and resolve issues/nuisances as soon as possible.

Therefore developers are eager to track user behavior and user sentiment in their application and to

identify, analyze and even predict bad user experiences.

UXProbe (http://www.uxprobe.org/), a startup specialized in usability design, created a service

which developers can integrate in their mobile application. The service collects event streams of

user interactions, errors that occur etc. What makes their service unique is that these event streams

are enriched with user generated sentiment feedback in the form of ratings, emoticons and mini-

surveys resulting in a very rich dataset. Deriving and communicating insights from this data in a

scalable fashion for increasing user bases, within limited time constraints is a challenging task for

which UXProbe needs you!

Goals:

As a first step you will create a number of dynamic dashboard-like visualizations to allow the visual

exploration of the user behavior patterns and the emotions associated with them. This will enable

development teams to quickly spot unexpected interaction patterns and assess the effectiveness of

the solutions they provide to address these.

In a second phase the thesis will, depending on your interests focus on:

Clustering the events into categories of ‘error motifs’.

The design of a prioritization queue based on the results of user sentiment analysis.

The design of a learning algorithm which can predict upfront which problems a user is likely to face

in the near future.

Analyzing and exploring links in biomedical graphs for drug discovery


Supervisors: Dieter De Witte, Laurens Devocht & Filip Pattyn (Ontoforce)



Number of theses: 1

Contact: [email protected], [email protected]

Keywords: Biomedical Data Mining, Graph Algorithms, Big Data


Problem definition:

Ontoforce is a startup that has created a solution to transparently query many clinical data sources

simultaneously. Their semantic technology - disQover - can query data from molecular databases,

scientific publications, clinical trials etc., allowing pharmaceutical companies to shorten their drug

discovery pipeline significantly. While the focus is currently on semantic technologies, there are

many questions which are more easily mapped onto pathfinding algorithms than to queries, a simple

example being: is there a path between lung cancer and smoking via genetic mutations? Neo4J is a

graph database optimized for graph traversal queries. Rik Van Bruggen, regional director of Neo

Technology in Belgium will be available as technical advisor on how to implement certain use cases.

Goals:

Design a number of specialized path finding algorithms in Neo4J and assess the

limitations/complementarity of both semantic technologies and graph databases for the use cases

presented. Investigate whether the algorithms can be translated to different context, for example in

automatic storytelling (www.everythingisconnected.be), which under the hood also relies on path

finding algorithms.

mailto:[email protected]

http://www.everythingisconnected.be/

Turning Social Media Data to a Semantic Gold Mine


Supervisors: Anastasia Dimou, Laurens De Vocht



Number of theses: 1

Contact: Anastasia Dimou

Keywords: Social Media, Linked Data mapping, Linked Data publishing


Problem definition:

Incorporating data from social media, e.g. twitter, Facebook, LinkedIn, into the Linked Open Data

cloud has a lot of value for marketing analysis, business and research. In real-world situations, this

information cannot be efficiently searched as we cannot query it, e.g. we cannot search when a user

last mentioned a place.

Goals:

In this thesis, you research extract-map-load approaches where handling the mapping is driven by

the data in real time. You investigate how to perform an efficient execution plan that handles such

data streams and maps them to RDF. The focus is on mapping social media data to the RDF data

model so you can bridge information from different social media in a way that you can identify

interesting trends around people, topics or events while you filter out the noisy data.

Visualizing biological sequence motifs using high performance multidimensional scaling


Supervisors: Dieter De Witte



Number of theses: 1

Contact: [email protected],

Keywords: Bioinformatics, Big Data Visualizations, Clustering


Problem definition:

The DNA of an organism consists of approximately 1% coding sequences, i.e. genes, and 99%

noncoding sequences. Regulatory sequences are hidden in the vicinity of the genes and help

regulate transcription (the process of translating DNA into RNA). In a later step RNA is then

synthesized into proteins. The accurate discovery of regulatory motifs in sequence data is very

difficult since these motifs are generally short and allow multiple wildcards.

An exhaustive discovery algorithm has already been developed which generates a database of

motifs which are overrepresented in sequences of closely related organisms. The amount of data is

however still too large to derive clear insights yet.

Goals:

The motif database generated by the discovery algorithm generates conservation information about

millions of short motifs. When one motif is found to be significant many highly similar motifs will

generally also be significant which implies that they can be clustered and the cluster as a whole

might have a biological meaning.

Multidimensional scaling is a technique which allows the visualization of high dimensional data by

mapping it onto a 2D or 3D Euclidean space while only requiring a well-chosen string distance

between the motifs.

The thesis student will investigate whether this algorithm can be used for large datasets of short

motifs, investigate the scalability and develop an end-to-end solution in which a biologist can

explore the data in an interactive way. As an extension the results from different motif algorithms

can be visually compared.


Low latency querying of huge Linked Medical datasets using Big Data technologies


Supervisors: Dieter De Witte, Laurens Devocht



Number of theses: 1


Keywords: Big Data Architectures, Linked Data, Biomedical Data Mining


Problem definition:

The semantic web is a continuously expanding collection of linked datasets. Transparently querying

all these datasets together is a technical challenge for which multiple solutions have been

developed in the past years: from smart LDF clients to federated querying. If data dumps are

available another approach is possible: collect all dumps in a distributed file system (HDFS) and

query them using SQL-like tools developed for Big Data systems. Semantic data is generally

queried using the SPARQL language for which there is no real solution in the Big Data ecosystem

yet. Depending on the availability and the volume of the data this approach might be the preferred

or even the only solution.

Goals:

In a first step a literature study will be performed to get a clear overview of all attempts that have

been made to translate this problem into the Big Data space. In the next step a solution will be

developed which provides a SPARQL interface to a Big Data system of choice. Most recently

Apache Spark has risen to be the most popular big data technology and it would be interesting to

see if its superior data processing performance translates to linked data querying. As a benchmark

a set of queries on multiple medical datasets generated by the disQover platform of Ontoforce will

be used to compare the Big Data approach to the already available solutions based on federated

querying.


A journey planner for the galaxy


Supervisors: Pieter Colpaert, Ruben Verborgh



Number of theses: 1


Keywords: hypermedia, REST, Linked Data, route planning


Problem definition:

Today, it would be impossible to write an app which plans routes throughout the entire galaxy: all

the transit data of e.g., buses, space shuttles, trains, bikes or space elevators would have to be

collected on one machine. This machine would then have to calculate hard to cache route planning

advice for all the user agents in the galaxy.

Within our lab we are developing a different server-client mechanism: instead of opening route

planning APIs for wide use, we suggest publishing the raw arrivals and departures using a REST

API. This way, route planning user agents can follow links to get more data on the fly. We are

implementing this data publishing interface with all the transit data that is already available world-

wide and we have written a proof of concept user agent which now calculates the routes by

following links, and this within a reasonable query time.

Goals:

The goal of this thesis is to research the possibilities to make these user agents (which may be both

servers or mobile apps) more intelligent in various ways: e.g., the process could be sped up if we

can pre-fetch data on the client-side or the result can be more suitable for end-users if our user-

agent is able to discover other datasets (e.g., when I’m in Caracas, I only want these routes with the

least criminality reported, or when I’m in a wheelchair, I only want wheelchair accessible routes).

The user agent that is written needs to be generic: anywhere in the galaxy, it should be able to

automatically discover the right datasets published on the Web.


Building a Big Data streaming architecture for real-time Twitter annotation and querying


Supervisors: Dieter De Witte, Frederic Godin, Gerald Haesendonck

Study Programme: Master Computer Science Engineering, Master Mathematical Informatics,


Number of theses: 1


Keywords: stream querying, linked data, neural networks, semantic annotation

Location: home, Zuiderpoort, Technicum

Problem definition:

Twitter is an online social network service that allows users to send short messages, aka tweets.

Currently over 500 million tweets are being generated every day. Automatically extracting

information from these tweets is a challenging task since they are short, can contain multiple

languages, contain spelling errors etc. Extracting information about tweets which are semantically

related, i.e. deal with a certain topic is far from trivial if they do not contain the same terms.

Goals:

In this thesis the student will be involved in setting up a streaming architecture for enriching tweets

with semantic information. Neural networks will be trained and used to label the tweets and to spot

named entities. The enriched stream will then be converted into semantic RDF triples which can be

queried using a streaming variant of the SPARQL language, for example C-SPARQL. Spark is a Big

Data technology stack that contains tools for both stream processing and batch analysis and is the

recommended technology for tackling this kind of problem.

Distributed query answering on the open Web


Supervisors: Ruben Verborgh, Miel Vander Sande



Number of theses: 1

Contact: Ruben Verborgh

Keywords: Linked Data, Web, Semantic Web, querying, distributed systems


Problem definition:

Mail [email protected] to discuss this subject.

What do your friends think of movies directed by Martin Scorsese?

What Nvidia graphics cards have few bug reports on Linux?

Is it cheaper to buy certain flights and hotels separately or together?

None of the above questions can be answered by a single data source, yet today’s Web technology

still focuses on single-source answer systems. This is problematic because a) it’s not scalable,

since that single source will need to process a lot of queries, and b) that source doesn’t have all the

data it needs to answer questions such as the above. The idea behind Linked Data Fragments

(http://linkeddatafragments.org/) is that clients, instead of servers, should answer queries. Sources

should offer fragments of data in such a way that clients can combine them to answer questions that

span multiple datasets.

A client that works with a single data source already exists

(https://github.com/LinkedDataFragments/Client). Your task in this master’s thesis is to extend this

client – or build a new one – so that it can query different data sources for a single query.

Goals:

Developing a scalable method to answer queries using different data sources.

Describing to a client which data sources are relevant for a given query.

Evaluating your solution on aspects such as accuracy and performance.


http://linkeddatafragments.org/

https://github.com/LinkedDataFragments/Client

Querying multimedia data on the (social) Web


Supervisors: Ruben Verborgh, Miel Vander Sande



Number of theses: 1


Keywords: Linked Data, Web, Semantic Web, querying, multimedia, images, video


Problem definition:


How would you find YouTube movies about “New York” in which people mention the Twin

Towers?

How could you find images that depict two people shaking hands?

Even though there is a large amount of metadata available on the Web, finding images and video

can be quite difficult. The goal of this thesis is to build an intelligent client (for instance, as a browser

extension) that is able to find multimedia items on the Web. This saves users many search

operations on different datasets. For this, you will need to combine metadata from different sources.

A starting point for this is the Linked Data Fragments client (http://linkeddatafragments.org/), which

already allows to query the Web Of Data. Your task is to blur the border between textual and

multimedia search, making it easier to find those media items users are looking for.

Goals:

Developing a client to find multimedia data on the Web.

Finding methods to query existing multimedia platforms such as YouTube and Instagram.

Evaluating your solution on aspects such as recall, precision, and performance.



Real-time querying of transport data on the Web


Supervisors: Ruben Verborgh, Pieter Colpaert



Number of theses: 1


Keywords: Linked Data, Web, Semantic Web, querying, transport, train


Problem definition:


The Belgian rail website allows you to plan your journey, but only in very rigid ways. It does not take

into account your current location and plans. Suppose you need to be in a certain building in

Brussels for a meeting. That morning, you decide to take the train at 14:06. Unfortunately, that train

is severely delayed later on, but you won’t know that until you check the website again. In this thesis,

you develop a querying system over the Web that allows to retrieve real-time results that are

continuously updated. Based on data from different sources, your system automatically picks those

fragments that are necessary for users to plan their journey. You can build on existing work for Web

querying, such as Linked Data Fragments (http://linkeddatafragments.org/).

Goals:

Developing a real-time querying mechanism for transport data.

Planning a route using different fragments of data from the Web.

Evaluating your solution on aspects such as bandwidth and performance.



Predictive analytics for the Internet of Things


Supervisors: Dieter De Witte, Sven Beauprez (IoTBE.org)



Number of theses: 2


Keywords: Internet of things, linked data publishing, stream reasoning, big data analytics


Problem definition:

In the upcoming years the number of devices for the Internet of Things will grow exponentially. IoT

will therefore become the largest source of streaming data. In order to derive insights from this data

it should be converted into a format which can be queried and allows easy semantic enrichment.

Linked open data is the ideal candidate to fulfill this task. To prepare for this innovation a prototype

environment is required which will reveal the challenges for the upcoming data explosion. As a data

source we will make use of Tessel Microcontrollers (www.tessel.io) which can be easily configured

using only JavaScript and which are extendible with a wide range of sensors: audio, video,

temperature, Bluetooth etc. The built-in Wi-Fi allows for a straightforward data transfer to a Big Data

infrastructure.

Goals:

In this thesis the student(s) will be involved in setting up an architecture for analyzing and enriching

sensor streams with semantic information. The annotated streams can be queried using a streaming

variant of the SPARQL language, used for (static) linked data.

In a first phase the student will build an experimental setup with Tessel devices as the data source.

The data generated by the devices will be automatically ingested in a Big Data streaming

architecture. For the analytics platform the student will explore the possibilities of the Apache Spark

stack which contains tools for both stream processing and batch analysis.

Since multiple students can work on this topic the focus of the thesis can be aligned with the

interests of the student: if the data streams are captured he/she can focus on enrichment,

visualizations or on optimizing the performance of the data pipeline.

http://www.tessel.io/

Web-based framework for real-time data visualization and scene graph management

Promotor: Peter Lambert and Rik Van de Walle

Supervisors: Jelle Van Campen, Tom Pardidaens and Christophe Herreman (Barco)

Study Programme: Master in Computer Science Engineering, Master of Mathematical Informatics


Number of theses: 1

Contact: Christophe Herreman (Barco)

Keywords: html5, web, real-time, scene graph, data visualization

Location: home, Zuiderpoort, Barco

Problem definition:

Barco develops systems and solutions for professional visualization applications in many different

markets and increasingly requires web-based front-ends for these applications. The applications

typically have stringent requirements towards image quality, latency and user experience and deal

with large amounts of live video and data channels. At this moment, front-end applications are

developed by integrators or specifically for a single customer.

Goals:

The goal of this project is to develop prototypes, which demonstrate different technologies to render

highly dynamic scenes such as geographical maps or 3D city models, and augment the visualization

with real-time video and meta-data. After analyzing the requirements from various divisions, a study

of the state-of-the-art should be performed including evaluating available technology (commercial or

open source libraries). The suitability of the libraries is to be explored with regards to performance,

latency, scalability and feature completeness. Then, a framework and software architecture is to be

designed and implemented to enhance these capabilities with the in-house capabilities for real-time

video and meta-data visualization and also to enable handling different types of content in single

browser window, possibly combining the capabilities of several libraries.

This thesis is jointly supervised with Barco (www.barco.be).

http://www.barco.be/

Designing an Ontology-Driven Dialog System for Natural Language Understanding

Promotoren: Kris Demuynck en Erik Mannens

Supervisors: Gaëtan Martens([email protected]) en Azarakhsh Jalalvand

Study Programme: Master Computer Science Engineering, Master Electrical Engineering, Master

Mathematical Informatics


Number of theses: 1

Contact: Gaëtan Martens ([email protected]), Azarakhsh Jalalvand

Keywords: Automatic Speech Recognition, Natural Language Understanding, Semantic Web,

Dialog System


Dialog Management (DM) and Natural Language Understanding (NLU) are fields of computer

science, artificial intelligence, and linguistics concerned with the interactions between computers

and human (natural) languages. Nuance Communications is the market leader in those

technologies and delivers currently the most significant advancements in speech recognition

technology.

Problem definition:

An Automatic Speech Recognition (ASR) + NLU system transforms an input speech signal into a

semantically enriched output which consists of intents and interpretations. A dialog system (DS) is

an engine responsible for DM, which allows having a conversation with a machine based on a

predefined set of concepts. These concepts and their relationships are modeled using Semantic

Web technologies by means of an ontology. This ontology defines the behavior of the DS relying on

a semantic reasoner. For example, if the user says “I want to make a phone call” (intent=“Phone”)

then the DS should ask for additional information such as: “Do you want to call somebody from your

contact list or do you want to dial a number?” On the other hand if the user said “I want to call

Gaëtan Martens on his cell phone” the system should not ask for additional information since the

required interpretations (i.e., contactName=“Gaëtan Martens” and phoneType=“cell phone”) are

already available.

Goals:

In this Master thesis the students will build a DS which relies on an OWL ontology to define its

behavior. This ontology has to be created based on a set of uses cases provided by Nuance. The

ontology has to model the intents, the concepts (i.e., the corresponding NLU interpretations), and

the relationships between the concepts. The next step is then to build the DS around an existing

semantic reasoner. The final goal of this challenging thesis is to have a functional DS, built using

open source libraries, that is able to use the output from Nuance’s state-of-the-art ASR+NLU system

and is configured with the created ontology.

The students will have the opportunity to work on cutting-edge technology within the Nuance’s

automotive R&D team in an international context.



Building a Natural Language Dialog System with Semantic Web technology

Promotoren: Kris Demuynck en Erik Mannens

Supervisors: Gaëtan Martens([email protected]) en Azarakhsh Jalalvand

Study Programme: Master Computer Science Engineering, Master Electrical Engineering, Master

Mathematical Informatics


Number of theses: 1

Contact: Gaëtan Martens ([email protected]), Azarakhsh Jalalvand

Keywords: Automatic Speech Recognition, Natural Language Understanding, Semantic Web,

Dialog System

Dialog Management (DM) and Natural Language Understanding (NLU) are fields of computer

science, artificial intelligence, and linguistics concerned with the interactions between computers

and human (natural) languages. Nuance Communications is the market leader in those

technologies and delivers currently the most significant advancements in speech recognition

technology.

Problem definition:

An Automatic Speech Recognition (ASR) + NLU system transforms the input speech signal into a

semantically enriched textual output. This output consists of an n-best list of intents (e.g., “Call”) with

a set of possible interpretations (such as contactName=“Gaëtan Martens” and phoneType=“cell

phone”) and corresponding probabilities. The dialog system (DS) is the engine responsible for DM,

which allows having a conversation with a machine based on a predefined set of concepts. This set

of concepts (and their inter-relationships) is modeled using Semantic Web technologies by means of

an ontology. This ontology then defines the behavior of the DS. The DS is able, given its current

state, to recognize an intent or ask for more information when the input is ambiguous by applying

reasoning techniques.

Goals:

In this master thesis, the goal is to link an existing ASR + NLU system with a DS via ontologies. At

first, the supported intents and interpretations of the NLU system must be formally defined in the

ontology and missing intents (required by the DS) will have to be added. Further alignment tasks will

have to be investigated by testing and iterative enhancements such as improving the mechanism of

the DS to take into account the NLU probabilities. The final goal is to have a functional speech-

enabled ontology-based end-to-end DS, which is still a challenge given the current state of the art.

The student will have the opportunity to work on cutting-edge technology within the Nuance’s

automotive R&D team in an international context.



The Multi-sensory and Sentiment-aware eBook


Supervisors: Hajar Ghaem Sigarchian, Tom De Nies, Frank Salliau



Number of theses: 1

Contact:Hajar Ghaem Sigarchain

Keywords: e-Text-Book, Sentiment Analysis, Internet Of Things, Semantic Web


Problem definition:

People expect more and more when reading e-Books, in terms of entertainment and immersion.

The recent advances in smart environments theoretically allow people to experience an eBook not

only using their eyes but also using their other senses, such as hearing, touch (e.g., thermoception),

etc. Content of such enhanced e-Books can become augmented and reacted upon by smart objects,

using the Internet of Things (IoT). However, many challenges remain to make this a reality, such as

the automatic annotation, exchange of data, and timing (e.g., should the room react any time a

person flips a page of the eBook?).

Goals:

The goal of this master’s thesis is create digital books as an object in IoT. The student needs to

perform a literature review, in order to have better understanding of the concepts and current state

of the art. In addition he/she will propose an appropriate architecture, and data representation

format. The research domains will include smart rooms, sensors, Internet of Things, digital

publishing and Semantic Web. The next step is implementing a prototype as a proof-of-concept.

Eventually, he/she needs to test the prototype to evaluate and qualify the relevancy of the solution

to the end-user (the reader of the book).

Using Events to Connect Books Based on Their Actual Content


Supervisors: Pieter Heyvaert, Ben De Meester, Frank Salliau



Number of theses: 1

Contact: Pieter Heyvaert

Keywords: books, events, Semantic Web, media, metadata


Problem definition:

Many books tell a story about a certain event or a series of events. Different stories (in multiple

books) might even contain the same events. An example of such a series of events can be the

Battle of Waterloo, which includes separate events for the preparation, the first French attack, the

capture of La Haye Sainte, the attack of the Imperial Guard, the capture of Plancenoit, and so on.

Currently, the metadata extracted from books mostly denotes topics, known locations, people, dates,

and so on. However, the connection between people, their location and the date is not considered,

and these elements are what an event consists of. When having this information available, one is

able to determine the similarity between different books regarding a specific event. This is not only

limited to books: when event information is also available for movies, TV series, news articles, and

so on, one is also able to determine how the literature/movies interpret real events: what happens

when non-fiction (e.g., newspaper articles) takes a fiction approach (via e.g., movies) is one of the

questions that might be answered here.

Goals:

You will need (1) to determine how events can be dissected from the several types of media, with

the focus on books, (2) to decide on the granularity of the events extracted from the media, (3) to

determine how to incorporate existing information, if any, about the events and media, and (4) to

determine/visualize the events found in the different sources. As part of the thesis you will need to

create a prototype which takes media (such as books) as input and outputs the events. Next, you’ll

need to visualize the connection between the different media based on the found events. Also the

difference between events can be determined: two books might talk about the same big event,

however, the events making up the big event may be different, may be in a different order, and so

on.

Deriving Semantics from Styling


Supervisors: Ben De Meester



Number of theses: 1

Contact: Ben De Meester

Keywords: HTML5, Semantics, CSS


Problem definition:

HTML is the de facto publication format for the World Wide Web for humans. Using styling

properties (e.g., in a CSS stylesheet), it is possible to style the HTML document to make it pretty.

However, the HTML file itself is rarely pretty for machines. Coding such as ‘<p class=”title”>’ is a

terror, and contradictory to the semantic structure that HTML5 can provide. One thing that will

always be consistent however, is the visual looks of an HTML page versus its intended semantics.

E.g., ‘<p class=”title”>’ will look similar to ‘<h1>’ for a user, but the latter has a lot better semantic

meaning than the former. And better semantics means better processing of the HTML, which results

in a better Web.

Goals:

The goal of this thesis is to improve existing HTML pages, their structure and semantic meaning, by

using, among others, its visual characteristics. To that end, the student needs to define which

criteria could be important, e.g., not only font-size could be important, but also font-weight, position

on a page, name of the CSS class, etc. Next, the student should provide for a proof-of-concept that

can effectively recognize and improve bad HTML constructs. Many techniques may be used,

involving clustering, Natural Language Processing, and ad hoc processing instructions.

Trusting query results on the open Web


Supervisors: Ruben Verborgh, Tom De Nies



Number of theses: 1


Keywords: Linked Data, provenance, trust, Semantic Web, querying, Web, HTTP, client/server


Problem definition:


The Web way of answering a question is to find combined answers from different sources. But

how can we be sure that the final answer is based on sources we can trust? This is the question

you will answer in this thesis.

Because information is spread across different places, the old database paradigm of “query =>

answer” doesn’t really work anymore on the Web. Linked Data Fragments

(http://linkeddatafragments.org/) capture this idea: a client asks servers for different parts of

information and is able to combine it by itself. We built a Node.js application that can query the Web

in such a way (https://github.com/LinkedDataFragments/Client). What this client doesn’t tell you

(yet), is where the different parts of the answer come from; it doesn’t give you a guarantee that the

answer is correct/trustworthy. By combining Linked Data Fragments with provenance technology,

we can precisely assess the trustworthiness of data.

Goals:

- Developing a method to combine the trust in different knowledge sources.

- Describing the trust in an answer that encompasses different sources.

- Developing a client that queries the Web and gives an answer you can trust.



https://github.com/LinkedDataFragments/Client

Automatic Composition of Context-based Content in Digital Books


Supervisors: Hajar Ghaem Sigarchian, Tom De Nies, Wesley De Neve, Frank Salliau



Number of theses: 1

Contact: Hajar Ghaem Sigarchain

Keywords: e-Text-Book, Widgets, Ubiquitous environment, Semantic Web, Profile Manager


Problem definition:

Books, even digital books are a static medium, with fixed contents. There is currently no means of

personalizing and tailoring books to the person who reads them. However, the technology to provide

dynamic contents inside a digital book already exists. Depending on contextual criteria such as the

reader’s interests, location, language, culture, the contents can adapt itself to fit the reader’s needs.

As a general use case, we can refer to non-fiction books such as tourist guide books that e.g.

automatically update their contents based on the reader’s geolocation.

Educational textbooks also provide an excellent use case to prove this principle, with the textbook

adapting itself to the student’s progress and changing interests. Publishers will also benefit from this

approach: it facilitates reuse of existing content; it also can lead to a significant reduction of both

creation and distribution costs and even lead to new business models.

Goals:

The goal of this master’s thesis is to automatically change digital books based on their context with

external open content. The student needs to do a literature review, in order to have better

understanding of the concepts and current state of the art. In addition he/she will propose an

appropriate architecture, and data representation format. The research domains will include

mashups, widgets, cloud-based synchronization, versioning and the Semantic Web.

The next step is implementing a prototype as a proof-of-concept. Eventually, he/she needs to test

the prototype to evaluate and qualify the relevancy of the automatically provided contents.

Mining anonymized user data for predictive analysis

Promotor: Sofie Van Hoecke and Peter Lambert

Supervisors: Glenn Van Wallendael, Benoit Marion ([email protected]), Dirk Van Gheel

([email protected])

Study Programme: Master Computer Science Engineering, Master Industrial Engineering:

Elektronica – ICT, Master Industrial Engineering: Informatica


Number of theses: 1

Contact: Sofie Van Hoecke

Keywords: Data mining, recommendations, crash prevention

Location: home, Zuiderpoort, TPVision Zwijnaarde

Problem definition:

To continuously improve the quality of Philips televisions, TPVision incorporates anonymized user

logging on Android enabled Smart TVs. However, as current TVs are complex systems, a large

amount of user data is logged. To improve the evaluation of user data, advanced data mining

techniques are required in many areas of our operations.

Efficient data mining can result in an optimized recommendation engine to recommend more

interesting TV channels or Android apps, improving the user’s lean back experience. Additionally,

watch later lists combining YouTube, Vimeo, and Spotify can be optimized by providing a prioritized

list.

Furthermore, from the user data the market impact of certain features (such as the availability of

broadcast tuners, pre-installed apps, DLNA connectivity etc.) can be predicted. Identifying

relationships between user behavior and interaction with those features taking into account general

user clusters and regional differences can result in differential marketing strategies.

Finally, the performance of the TV can be improved using machine learning techniques for analysis

of logged data and intervene in a timely manner to prevent a potentially hazardous crash to happen.

Goals:

The goal of this master’s thesis is to apply data mining techniques on anonymous user data from

currently deployed Philips Android TVs within Europe.

Topics to improve user experience, technical enhancements or commercially interesting research

are considerable. Depending on the student’s interest, the actual problem statement can be fine-

tuned.

TP Vision is a dedicated TV player in the world of visual digital entertainment. TP Vision is part of

TPV Technology, the #1 monitor manufacturer in the world. At its Innovation Site Europe, Ghent we

design televisions of the future, for Philips and other brands. NetTV, Ambilight, Android TV and

Cinema 21:9 have all been developed in this innovative and driven environment. Recognition of our

activities is visible in numerous awards such as prestigious EISA-awards.

This master’s thesis is within the Advanced Software Development department which drives the

innovation chain from conceptualization to feasibility and prototyping, as well as leads technology

and standardization roadmaps.



A Proof checker for the Semantic Web


Supervisors: Dörthe Arndt, Ruben Verborgh



Number of theses: 1

Contact: Dörthe Arndt

Keywords: N3Logic, proof, reasoning, semantic web


Problem definition:

The Semantic Web enables computers to understand and reason about public data. There exists a

huge number of reasoners which are able to draw conclusions based on common knowledge (e.g.

Pellet, CWM or EYE). Some of them also provide proofs to clarify how they came to a certain result.

But what is a proof? How can we be sure that a proof is correct? When do we trust it? Could they

be lying?

An independent checker is needed!

Goals:

With this thesis you will help to find a solution for this problem: You will get a better understanding of

N3-logic, the logic used by reasoners such as EYE or CWM

((http://www.w3.org/2000/10/swap/doc/cwm.html, http://eulersharp.sourceforge.net/).

You’ll learn what the current proof format of these reasoners looks like and how it could be improved.

This knowledge will enable you to implement an independent proof checker for N3-proofs which can

handle proofs written in the N3-proof ontology (http://www.w3.org/2000/10/swap/reason.n3).

http://www.w3.org/2000/10/swap/doc/cwm.html

http://eulersharp.sourceforge.net/

http://www.w3.org/2000/10/swap/reason.n3

On-site tijdsregistratie en efficiëntie-analyse in het operatiekwartier

Promotor: Sofie Van Hoecke, Patrick Wouters

Supervisors: dr. Piet Wyffels (UZ Gent)

Study Programme: Master of Science in de industriële wetenschappen: elektronica-ICT - Campus

Kortrijk, Master of Science in Computer Science Engineering, Master of Science in de industriële

wetenschappen: elektronica-ICT - Campus Schoonmeersen, Master of Science in de industriële

wetenschappen: informatica


Number of theses: 1

Contact: Sofie Van Hoecke

Keywords:

Location: home, Zuiderpoort, Technicum or UZ Gent (K12)

Problem definition:

Het organiseren en uitvoeren van een operatieprogramma binnen de dagelijks toegemeten werktijd

blijkt één van de meest complexe taken binnen een ziekenhuis. Er bestaat heel wat literatuur waarin

werkmodellen en systemen worden voorgesteld maar deze zijn slechts beperkt toepasselijk, o.a.

door belangrijke internationale verschillen in de structuur van de gezondheidszorg. Een meer recent

en universeel probleem dat de werkbaarheid van gangbare organisatiemodellen ondermijnt is het

spanningsveld tussen opgelegde budgettaire beperkingen en de stijgende zorgvraag.

Het enige afdoende antwoord hierop is een efficiëntere inzet van bestaande middelen en personeel

en een optimalisatie van de workflow (planning en uitvoer) in deze multidisciplinaire en

hoogtechnologische omgeving. Dit is geen eenvoudige opdracht gezien de aanwezigheid van

onvoorspelbare factoren (urgenties, complicaties), de betrokkenheid en uiteenlopende belangen

van verschillende disciplines (chirurgen, anesthesisten, verpleegkundigen, technici) en de

afstemming op diensten die patiënten moeten aanleveren en ontvangen.

Goals:

Het doel van deze masterproef is het ontwerpen van een intelligent, flexibel en gebruiksvriendelijk

systeem (bv. onder de vorm van web-gebaseerde toepassing) dat een accurate tijdsregistratie op

de werkvloer mogelijk maakt met aandacht voor de verschillende fases en deelaspecten die aan

bod komen bij, en invloed hebben op het verloop van een chirurgische ingreep (opstart tijd, turnover

tijd, installatietijd...). Het ontwikkelen van een geschikt datamodel is hierbij heel belangrijk. De data

die aldus opgemeten wordt, moet eenvoudig en efficiënt kunnen bevraagd worden om

tijdsverliesposten in kaart te brengen en na root cause analysis systematische verbeterinitiatieven te

lanceren.

Tijdens deze masterproef zal de student kennis opdoen omtrent de werking van een

operatiekwartier en met de verschillende partijen afstemmen om de verschillende standpunten en

invalshoeken op te nemen in het ontwerp. Er bestaat momenteel een grote en zeer brede interesse

in (en markt voor) dergelijke toepassingen.

Deze masterproef loopt in samenwerking met de dienst Anesthesie van het UZ Gent.

Real-time data-enrichment and data-mining of multilingual databases in procurement

environment

Promotors: Erik Mannens & Rik Van de Walle

Supervisors: Ruben Verborgh & Dieter De Witte



Number of theses: 1

Contact: Manu Lindekens (GIS International)

Keywords: datamining, data-enrichment, data-cleaning

Location: GIS International – Stapelplein 70 – 9000 Gent

Problem definition:

In a B2B world we as a procurement service provider (www.gisinternational.net) do receive lots of

information from our customers on items we need to buy from different sort of vendors. This

information is in most cases very limited, in different languages, incomplete and polluted.

Based on that limited information we, in a first phase, categorize these articles into different

commodities. This categorization will allow us to define better which vendors to target. A second

step is to enrich the data to be more accurate in ordering the material. These are all steps that are

today done manually but are very labor-intensive. An off-the-shelf software package that does all

this is currently not available.

Secondly the data in our own ERP system needs to be cleaned and enriched in real-time. Currently

we have identified different parties to tackle this problem with intelligent software.

Goals:

The goal of this thesis is to do be the link between the different parties who will develop and

implement the software to solve this problem. Therefore a good understanding of the needs, the

processes and how the software will work is important. GIS International has the knowledge on the

parts, knows which information is needed and commercially where to get it from. The other parties

are top research and technology companies which are experts in their fields. Purpose is to build a

state-of-the-art add-on to the ERP software of GIS (MS Dynamics NAV) which will be the

differentiator in the market.

Do not hesitate to contact us when you are interested in this topic and have some questions!

http://www.gisinternational.net/

An extensible web-based intermodal route planner for Belgium


Supervisors: Pieter Colpaert, Ruben Verborgh

Study Programme: Industriële wetenschappen


Number of theses: 1


Keywords: hypermedia, REST, Linked Data, route planning


Problem definition:

Developers of route planning apps today have to be satisfied with web-services which do the hard

work for them: they offer a black-box algorithm with a finite set of functionalities (e.g.,

http://api.myapp/?from=...&to=...). When the developer would like to make the algorithm take into

account other modes of transport or take into account different edge weights (e.g., depending on

wheelchair accessibility, criminality statistics or probability of being too late), it needs to request

these features to the server-admin.

At MultiMedia Lab, we are researching an alternate server-client trade-off: instead of exposing route

planning algorithms over HTTP, we suggest publishing the raw arrivals and departures using a

REST API with paged fragments (http://linkeddatafragments.org). This way, route planning user

agents, which now execute the algorithm on their own, can follow hypermedia controls to get more

data on the fly.

When we publish an ordered list of arrival/departures (connections), route planning can be done by

relaxing each connection once (shortest path in a DAG). We publish this graph as Linked Open

Data resources for route planning purposes.

Goals:

The goal of this master thesis is to research different strategies to merge graphs from different

transport modes on the scale of Belgium. A visualization (cfr.

http://kevanahlquist.com/osm_pathfinding/) should be created to understand the different strategies

for merging different transport modes.

ghent university iminds multimedia labusers.datasciencelab.ugent.be/lasmeken/mp2015-2016/...research...

Documents