Primary Research Team & Capabilities


Upload: thora

Post on 19-Jan-2016


Page 1: Primary Research Team & Capabilities

11 November 2011 1

Primary Research Team & Capabilities

Dept. of Parallel and Distributed Computing

Research and Development Areas:
– Large-scale HPCN, Grid and MapReduce applications
– Intelligent and Knowledge oriented Technologies

Experience from IST:
– 3 projects in FP5: ANFAS, CrossGrid, Pellucid
– 6 projects in FP6: EGEE II, K-Wf Grid, DEGREE (coordinator), EGEE, int.eu.grid, MEDIGRID
– 4 projects in FP7: Commius, Admire, Secricom, EGEE III

Several National Projects (SPVV, VEGA, APVT)

IKT Group Focus:
– Information Processing (Large Scale)
– Graph Processing
– Information Extraction and Retrieval
– Semantic Web
– Knowledge oriented Technologies
– Parallel and Distributed Information Processing

Solutions:
– SGDB: Simple Graph Database
– gSemSearch: Graph based Semantic Search
– Ontea: Pattern-based Semantic Annotation
– ACoMA: KM tool in Email
– EMBET: Recommendation System
– Experts on MapReduce and IR (Nutch, Solr, Lucene)

Director & leader of PDC: Dr. Ladislav Hluchý

URL: http://ikt.ui.sav.sk

Page 2: Primary Research Team & Capabilities

Approach and Solutions

Page 3: Primary Research Team & Capabilities

Large scale Text and Graph data processing

Core Technology
• Web crawling
– Nutch + plugins
• Full text indexing and search
– Lucene, Solr
• Information Extraction
– Ontea, GATE
• All above large scale
– Hadoop, S4
• Graph processing and Querying
– Simple Graph Database (SGDB)
– gSemSearch
– Neo4j
– Blueprints

Developed by IISAS (underlined on the original slide): Ontea, SGDB, gSemSearch

Page 4: Primary Research Team & Capabilities

Ontea: Information Extraction Tool

• Regex patterns
• Gazetteers
• Results
– Key-value pairs
– Structured into trees, graphs
• Transformers, Configuration
• Automatic loading of extractors
• Visual Annotation Tool
• Integration with external tools
– GATE, Stemmers, Hadoop …
• Multilingual tests
– English, Slovak, Spanish, Italian

http://ontea.sf.net
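
The combination of regex patterns and gazetteers producing key-value pairs can be sketched in a few lines of Python. This is only an illustration of the general technique, not Ontea's actual API: the pattern names, the gazetteer entries and the `extract` function are all hypothetical.

```python
import re

# Hypothetical patterns: each key maps to a regex whose group 1 is the value.
PATTERNS = {
    "email": re.compile(r"\b([\w.+-]+@[\w-]+(?:\.[\w-]+)+)"),
    "phone": re.compile(r"(\+?\d[\d ()-]{7,}\d)"),
}

# Hypothetical gazetteer: known surface forms and the key they are filed under.
GAZETTEER = {
    "Bratislava": "city",
    "Enron": "organization",
}


def extract(text):
    """Return a list of (key, value) pairs found in text."""
    pairs = []
    # Pattern-based extraction: every regex match becomes one pair.
    for key, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            pairs.append((key, match.group(1)))
    # Gazetteer lookup: case-sensitive whole-word match of known names.
    for surface, key in GAZETTEER.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", text):
            pairs.append((key, surface))
    return pairs


sample = "Contact jane.doe@enron.com or call +421 2 5941 1111 in Bratislava."
result = extract(sample)
print(result)
```

The resulting pairs are flat; structuring them into trees or graphs, as the slide mentions, would be a separate step on top of this output.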

Page 5: Primary Research Team & Capabilities

Email Search Prototype

• Use of Social Network from email
• Includes extracted objects
• Full text of extracted objects
• Related objects discovered and ordered by spread of activation on the social network graph
• Faceted search, navigation
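
As a rough illustration of deriving a social network from email, each message can contribute an edge from its sender to every recipient. The sketch below uses Python's standard `email` module; the sample messages and the `build_social_graph` helper are made up for illustration and are not the prototype's code.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def build_social_graph(raw_messages):
    """Count sender -> recipient edges across raw RFC 822 messages."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
        recipients = [
            addr
            for _, addr in getaddresses(
                msg.get_all("To", []) + msg.get_all("Cc", [])
            )
        ]
        # One weighted edge per (sender, recipient) pair, addresses lower-cased.
        for sender in senders:
            for recipient in recipients:
                edges[(sender.lower(), recipient.lower())] += 1
    return edges


messages = [
    "From: Alice <alice@example.org>\nTo: bob@example.org\nSubject: hi\n\nHello",
    "From: alice@example.org\nTo: Bob <bob@example.org>, carol@example.org\n\nPlan",
]
graph = build_social_graph(messages)
print(graph)
```

Edge counts such as these could serve as weights when ordering related objects, though the slides do not say how the prototype weights its graph.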

Page 6: Primary Research Team & Capabilities

gSemSearch: Graph based Semantic Search

• Graph/Network of interacting (interconnected) entities
• Discovering relations in the graph (network) using the spread of activation algorithm
• Showing relations of a concrete type, e.g. telephone numbers related to a person
• Navigation over related entities
• Full-text search of the entities
• User interface for search
• User interaction with data (merging, deleting entities) with immediate impact on discovered relations
• Tested on the Enron Email Corpus
– Email Social Network Search
– http://ikt.ui.sav.sk/esns/
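
The slides do not describe which variant of spread of activation gSemSearch uses, so the following is only a generic sketch of the idea: activation starts at a seed entity, decays as it flows to neighbours, and stops once it falls below a threshold. The `decay`, `threshold` and `max_steps` parameters, the node naming scheme and the toy graph are all assumptions made for this example.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, threshold=0.05, max_steps=3):
    """Propagate activation from seed nodes over a graph.

    graph: dict mapping node -> list of neighbour nodes
    seeds: dict mapping node -> initial activation
    Returns a dict of total activation received per node.
    """
    activation = defaultdict(float)
    for node, value in seeds.items():
        activation[node] += value
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            # Each node splits its decayed activation among its neighbours.
            share = value * decay / len(neighbours)
            if share < threshold:
                continue  # too weak to propagate further
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        if not next_frontier:
            break
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return dict(activation)


graph = {
    "person:alice": ["phone:555-0100", "email:msg1"],
    "phone:555-0100": ["person:alice"],
    "email:msg1": ["person:alice", "person:bob"],
    "person:bob": ["email:msg1", "phone:555-0199"],
    "phone:555-0199": ["person:bob"],
}
scores = spread_activation(graph, {"person:alice": 1.0})
# Relations of a concrete type: phone numbers related to Alice, best first.
phones = sorted(
    (n for n in scores if n.startswith("phone:")), key=scores.get, reverse=True
)
print(phones)
```

Filtering the activated nodes by type, as in the last step, corresponds to the "telephone numbers related to a person" example on the slide.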

Page 7: Primary Research Team & Capabilities

SGDB: Simple Graph Database

• Storage for graphs
• Optimized for graph traversal and spread of activation
• Faster than Neo4j for graph traversal operations
• Supports Blueprints API
• https://simplegdb.svn.sourceforge.net/svnroot/simplegdb/Sgdb3
• Graph Database Benchmarks
– Graph Traversal Benchmark for Graph Databases
– http://ups.savba.sk/~marek/gbench.html
– Blueprints API: possibility to test compliant graph databases
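
To make concrete what a graph traversal measurement involves, here is a minimal sketch that times a depth-limited breadth-first traversal. It is not the methodology of the benchmark linked above: an in-memory Python dict stands in for the graph store, and the synthetic graph, the `traverse` and `time_traversal` helpers and the `repeats` parameter are assumptions made for illustration.

```python
import time
from collections import deque


def traverse(adjacency, start, max_depth):
    """Breadth-first traversal up to max_depth hops; returns visited nodes."""
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand nodes at the depth limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))
    return visited


def time_traversal(adjacency, start, max_depth, repeats=5):
    """Return (visited_count, best_seconds) over several repeated runs."""
    best = float("inf")
    count = 0
    for _ in range(repeats):
        t0 = time.perf_counter()
        count = len(traverse(adjacency, start, max_depth))
        best = min(best, time.perf_counter() - t0)
    return count, best


# Synthetic ring graph with shortcut edges, purely for illustration.
n = 1000
adjacency = {i: [(i + 1) % n, (i + 7) % n] for i in range(n)}
count, seconds = time_traversal(adjacency, start=0, max_depth=3)
print(count, seconds)
```

Taking the best of several runs reduces noise from warm-up effects; a comparison across databases would run the same traversal against each store through a common interface.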

Page 8: Primary Research Team & Capabilities

Future Direction: Relations Discovery in Large Graph Data

• Motivation
– Graph/Network data are everywhere: social networks, web, Linked Data, transactions, communication (email, phone).
– Text can also be converted to a graph.
– Interconnecting graph data and searching for relations is crucial.
• Approach
– Forming semantic trees and graphs from text, web, communication, databases and Linked Data
– User interaction with graph data in order to achieve integration and data cleansing
– Users will do this if their effort has an immediate impact on search results
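
The idea that a user's cleansing action should change search results immediately can be illustrated with entity merging. In the sketch below, two nodes that refer to the same person are merged and a relation that was previously invisible appears at once. The dict-of-sets graph, the node names and the `merge_entities` and `related` helpers are hypothetical and only illustrate the principle.

```python
def merge_entities(graph, keep, drop):
    """Merge node `drop` into node `keep` in an undirected graph (dict of sets)."""
    graph.setdefault(keep, set())
    for neighbour in graph.pop(drop, set()):
        if neighbour == drop:
            continue  # ignore a self-loop on the dropped node
        graph[neighbour].discard(drop)
        if neighbour != keep:
            # Re-attach the neighbour to the surviving node, in both directions.
            graph[neighbour].add(keep)
            graph[keep].add(neighbour)
    return graph


def related(graph, node, prefix):
    """Directly connected nodes whose identifier starts with prefix."""
    return sorted(n for n in graph.get(node, ()) if n.startswith(prefix))


graph = {
    "person:J. Smith": {"phone:555-0100"},
    "person:John Smith": {"email:john@example.org"},
    "phone:555-0100": {"person:J. Smith"},
    "email:john@example.org": {"person:John Smith"},
}
before = related(graph, "person:John Smith", "phone:")
merge_entities(graph, keep="person:John Smith", drop="person:J. Smith")
after = related(graph, "person:John Smith", "phone:")
print(before, after)
```

Before the merge, a search for John Smith's phone numbers finds nothing; after it, the number attached to the duplicate entity is returned.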