Event-Based Hybrid Consistency Framework (EBHCF) for Distributed Annotation Records
Ahmet Fatih Mustacoglu
Advisor: Prof. Geoffrey C. Fox



Outline
• Online collaboration; tools for annotation and sharing of publications
• Motivations and research issues
• Event-Based Hybrid Consistency Framework (EBHCF)
• Measurements and analysis
• Contributions and future work

Online Collaboration
• Rapid development of tools and services
• Aimed at fostering online collaboration and sharing between users and communities:
– Social bookmarking tools support sharing and annotation using keywords called tags
  • e.g. del.icio.us
– Tools for annotation and sharing of scholarly publications
  • Connotea
  • Citeulike
  • Bibsonomy
– Social networking tools (MySpace)
– Video sharing and annotation
  • e.g. YouTube

Tools for Annotation and Sharing Publications
• Used for:
– Collecting papers
– Annotating papers
– Sharing papers
• Limitations:
– The same entry must be entered into each tool separately
– Different and limited metadata storage
– No timing information for updated records
– Lack of ability to transfer data between tools
– Lack of services to extract and import data into a repository
– Lack of services to upload data from a repository

Metadata Storage in Major Annotation Tools

Motivations
• Numerous annotation tools and a huge amount of data generate multiple instances of metadata about the same document
• No timestamp information for updated records, which causes inconsistencies
• Different and limited metadata storage
• Lack of interoperability between annotation sites
• Applying a service-based architecture to annotation systems

Research Issues I
• Performance
– The benefits and costs of using an event-based mechanism and a hybrid consistency framework
• Scalability
– Throughput
– The number of integrated annotation tools
• Flexibility and extensibility
– Interoperable with other clients via SOAP messages
– Ease of integrating an annotation tool

Research Issues II
• Event-based model
– Major and minor events, and the dataset
– Update concept of entities
– Efficient and effective event processing
• Hybrid consistency framework for annotation tools
– Conditions and requirements
– The way of consistency maintenance
  • Pull + time based and push based
– Advantages and disadvantages

Event-Based Hybrid Consistency Framework (EBHCF)

Communication Manager
• Responsible for providing communication between annotation tools and the update manager via gateways
• Utilizes a gateway for each annotation tool, and a parser
– Retrieves updates in XML format
– Parses updates and passes them to the update manager
– Posts updates coming from the update manager to annotation tools
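
The retrieve-and-parse step can be illustrated with a minimal sketch. It uses Python's standard `xml.etree.ElementTree` purely for illustration, and the feed layout (`<updates>`, `<record>`, `url`, `title`, `tag`) is a hypothetical format, not the actual output of any annotation tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical update feed; element and attribute names are assumptions,
# not the actual format of any annotation tool.
SAMPLE_FEED = """\
<updates tool="connotea">
  <record url="http://example.org/paper1">
    <title>Paper One</title>
    <tag>grid</tag>
    <tag>metadata</tag>
  </record>
  <record url="http://example.org/paper2">
    <title>Paper Two</title>
  </record>
</updates>
"""


def parse_updates(xml_text):
    """Turn an XML update feed into plain dicts for the update manager."""
    root = ET.fromstring(xml_text)
    updates = []
    # ElementTree supports a limited XPath subset; "./record" selects
    # the direct <record> children of the root element.
    for record in root.findall("./record"):
        updates.append({
            "tool": root.get("tool"),
            "url": record.get("url"),
            "title": record.findtext("title", default=""),
            "tags": [t.text for t in record.findall("tag")],
        })
    return updates


parsed = parse_updates(SAMPLE_FEED)
```

Each parsed dict is what a gateway would hand to the update manager; a record without tags simply yields an empty tag list.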

[Figure: Communication Manager. Gateways: Connotea Web API, Delicious Web API, Citeulike RSS feed and HTML. Parsing: JDOM parser and XPath. Updates are pulled via HTTPClient and pushed via HTTPClient.]

Gateway
• Interface between the hybrid consistency framework and an annotation tool
• Provides extensibility
• A gateway must be deployed for each annotation tool that is to be integrated into the system
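
One way to picture the gateway as an extensibility point is a common interface with one implementation per tool. The sketch below is an assumption about how such an interface might look; the class and method names (`Gateway`, `pull_updates`, `push_update`, `GatewayRegistry`) are hypothetical, and the in-memory gateway stands in for real HTTP calls.

```python
from abc import ABC, abstractmethod


class Gateway(ABC):
    """Hypothetical per-tool adapter between the framework and one tool."""

    @abstractmethod
    def pull_updates(self):
        """Return a list of update dicts fetched from the tool."""

    @abstractmethod
    def push_update(self, update):
        """Send one update dict to the tool."""


class InMemoryGateway(Gateway):
    """Stand-in for a real tool; stores records in a dict instead of using HTTP."""

    def __init__(self):
        self.records = {}   # url -> update dict held by the "tool"
        self.pending = []   # updates made at the tool, not yet pulled

    def pull_updates(self):
        pulled, self.pending = self.pending, []
        return pulled

    def push_update(self, update):
        self.records[update["url"]] = update


class GatewayRegistry:
    """Integrating a new tool means registering one more gateway."""

    def __init__(self):
        self._gateways = {}

    def register(self, name, gateway):
        self._gateways[name] = gateway

    def names(self):
        return sorted(self._gateways)

    def get(self, name):
        return self._gateways[name]


registry = GatewayRegistry()
registry.register("connotea", InMemoryGateway())
registry.register("delicious", InMemoryGateway())
registry.get("connotea").push_update({"url": "http://example.org/p1", "title": "P1"})
```

With this shape, the rest of the framework depends only on the `Gateway` interface, so adding a tool does not require changes elsewhere.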

[Figure: EBHCF architecture. Labels: Annotation Tools, Gateways, EBHCF Modules.]

Annotation Tools Update Manager
• Responsible for:
– Retrieving the updates from annotation tools periodically
– Applying the updates to the primary copy of each record
– Propagating the updates back to each annotation tool
– Pre-retrieval of existing records
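
The retrieve, apply, and propagate responsibilities can be sketched as a single pass that a timer or per-tool thread would invoke periodically. Everything here is an illustrative assumption: `FakeTool` is an in-memory stand-in for an annotation tool, and the primary copy is a plain dict rather than a database.

```python
class FakeTool:
    """In-memory stand-in for an annotation tool (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.records = {}   # url -> metadata dict as stored by the tool
        self.outbox = []    # local edits waiting to be pulled

    def edit(self, url, **fields):
        self.records.setdefault(url, {}).update(fields)
        self.outbox.append((url, dict(fields)))

    def pull_updates(self):
        pulled, self.outbox = self.outbox, []
        return pulled

    def push_update(self, url, fields):
        # Pushed updates are not re-queued, so they do not echo back.
        self.records.setdefault(url, {}).update(fields)


def run_update_cycle(tools, primary):
    """One periodic pass: pull from every tool, update the primary copy,
    then propagate each update to the other tools."""
    for source in tools:
        for url, fields in source.pull_updates():
            primary.setdefault(url, {}).update(fields)
            for target in tools:
                if target is not source:
                    target.push_update(url, fields)


connotea, delicious = FakeTool("connotea"), FakeTool("delicious")
primary_copy = {}
connotea.edit("http://example.org/p1", title="P1", tags=["grid"])
run_update_cycle([connotea, delicious], primary_copy)
```

After the cycle, the primary copy and the other tool both hold the edit made at the first tool.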

[Figure: Annotation Tools Update Manager. One thread per annotation tool (Connotea, Delicious, Citeulike) checks for updates periodically through the Communication Manager, updates the primary copy in the database, and propagates updates to the other annotation tools.]

Digital Entity Manager
• Responsible for:
– Events and dataset creation
– Event processing
• Manages updates made on the primary copy of a digital entity
– Passes updates to the Communication Manager
• Handles periodic update management
• Deals with history and rollback management of a digital entity
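
History and rollback management can be illustrated with an event log from which any version of a digital entity is rebuilt. The slides do not define the event payloads, so this sketch assumes that a major event carries a full snapshot and a minor event carries only the changed fields; the class and method names are hypothetical.

```python
class DigitalEntity:
    """Hypothetical event-sourced record: state is derived from its events."""

    def __init__(self, initial):
        # Assumption: a major event carries a full snapshot of the metadata.
        self.events = [("major", dict(initial))]

    def apply_minor(self, **changes):
        # Assumption: a minor event carries only the fields that changed.
        self.events.append(("minor", dict(changes)))

    def state_at(self, version):
        """Rebuild the metadata as of event index `version` (0-based)."""
        state = {}
        for kind, payload in self.events[: version + 1]:
            if kind == "major":
                state = dict(payload)
            else:
                state.update(payload)
        return state

    def current(self):
        return self.state_at(len(self.events) - 1)

    def rollback(self, version):
        """Roll back by appending a new major event, so history is kept."""
        self.events.append(("major", self.state_at(version)))


entity = DigitalEntity({"title": "Draft", "year": 2007})
entity.apply_minor(title="Final")
entity.apply_minor(year=2008)
entity.rollback(1)
```

Rolling back by appending rather than truncating keeps every earlier version reachable, at the cost of a longer log.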

[Figure: Digital Entity Manager. Components: Digital Entity Update Management, History and Rollback Management, Events and Dataset Management, Periodic Update Management. Event-based Model Services (Events and Dataset Creation, Event Processing Engine) are exposed via WSDL and backed by the database.]

Summary of the Event-Based Hybrid Consistency Framework
• The event-based model
– Keeps all versions of a record
– Makes it easy to change a record back to any version
– Once the number of minor events becomes large, they can be converted into a major event (compaction)
• The hybrid consistency framework deploys
– A pull + time based consistency mechanism
– A push based consistency mechanism
• Pre-retrieval shortens the execution time of consistency maintenance by decreasing the access time.
• Updates can be prioritized based on the priority of the annotation tools.
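
Compaction of minor events into a major event might look like the following sketch. It assumes that a major event is a full snapshot and a minor event is a partial change; the threshold `max_minor` and both function names are hypothetical.

```python
def replay(events):
    """Fold a list of (kind, payload) events into the resulting metadata."""
    state = {}
    for kind, payload in events:
        if kind == "major":
            state = dict(payload)   # major event: full snapshot (assumed)
        else:
            state.update(payload)   # minor event: partial change (assumed)
    return state


def compact(events, max_minor=3):
    """If the trailing run of minor events exceeds `max_minor`, replace that
    run with one major event; earlier history is left untouched."""
    run_start = len(events)
    while run_start > 0 and events[run_start - 1][0] == "minor":
        run_start -= 1
    if len(events) - run_start <= max_minor:
        return list(events)
    return events[:run_start] + [("major", replay(events))]


log = [("major", {"title": "Draft", "tags": []})]
for i in range(5):
    log.append(("minor", {"tags": [f"t{j}" for j in range(i + 1)]}))
compacted = compact(log, max_minor=3)
```

This version discards the intermediate versions inside the compacted run. Whether the original system archives them elsewhere is not stated in the slides.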

Benchmarks
• System performance
– Investigation of the maximum number of annotation tools
– Improved results for the investigation of the maximum number of annotation tools
• Scalability
– System behavior as the number of minor events increases
– Number of users that can be supported simultaneously
• Consistency maintenance
– Cost of consistency maintenance for carrying out updates on the primary copy
– Cost of consistency maintenance for carrying out updates from annotation tools

Benchmark I - Performance I
• The goal is to investigate the maximum number of annotation tools that the proposed system can support without performance degradation.
• Every measurement is observed 100 times.
• We were able to create a maximum of 1400 threads to represent annotation tools.
• We used:
– Java 2 Standard Edition compiler, version 1.5.0_12, with the maximum heap size of the Java Virtual Machine (JVM) set to 1024 MB
– Apache Tomcat Server, version 5.0.28
– Apache Axis, version 1.2

Machine Configurations
Cluster node    gf8.ucs.indiana.edu
Processor       Intel® Xeon™ CPU (2.40 GHz)
RAM             2 GB total
OS              GNU/Linux (kernel release 2.4.22)

Benchmark I - Performance II

Benchmark I - Performance III

Benchmark II - Scalability I
• The goal is to investigate the scalability of the Event-Based Hybrid Consistency Framework implementation.
• Measure the round trip time of standard operations such as MoreInfo while the number of minor events increases.
• Measure the round trip time of standard operations such as MoreInfo while the message rate increases.
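
A round trip time measurement of this kind can be sketched with a small timing harness. The `make_more_info` function below is a hypothetical local stand-in for a MoreInfo request, not the actual web service call, and the event counts are arbitrary illustration values.

```python
import statistics
import time


def measure_round_trip(operation, repetitions=100):
    """Call `operation` repeatedly; return (mean, stdev) of elapsed seconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


def make_more_info(minor_events):
    """Hypothetical stand-in for a MoreInfo request: rebuilds a record by
    replaying its minor events, so the work grows with their number."""
    def more_info():
        state = {}
        for change in minor_events:
            state.update(change)
        return state
    return more_info


results = {}
for n in (10, 100, 1000):
    events = [{"field%d" % (i % 5): i} for i in range(n)]
    results[n] = measure_round_trip(make_more_info(events), repetitions=100)
```

Reporting the standard deviation alongside the mean shows how stable the measurement is across repeated observations.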

[Figure: Test-4. EBHCF MoreInfo request with an increasing number of minor events. A single-threaded client (1 user, 400 requests) calls the framework through its WSDL interface; the framework is backed by the database.]

[Figure: Test-5. EBHCF MoreInfo request with increasing message rates. Various numbers of single-threaded clients call the framework through its WSDL interface; the framework is backed by the database.]

Benchmark II - Scalability II

Benchmark II - Scalability III

Benchmark III - Consistency Maintenance I
• The goal is to investigate the cost of our hybrid consistency framework:
– The cost of consistency maintenance, measured as the time required to carry out updates at the primary-copy holder
– The cost of consistency maintenance, measured as the time required to carry out updates at the annotation tools

[Figure: Test setup. Labels: Event-based Hybrid Consistency Framework, Main Database, Annotation Tools.]

Benchmark III - Consistency Maintenance II

Benchmark III - Consistency Maintenance III

Contributions
• System research
– An Event-Based Hybrid Consistency Framework
– Efficient, scalable, and modular
– Separation of events into minor and major events
– Pull + time based and push based consistency maintenance
– Handling various types of metadata coming from several sources
– Identifying the circumstances where this architecture has advantages
– Event-based model
– Comprehensive benchmarks to evaluate the scalability, performance, and consistency maintenance of the prototype system
• System software
– A prototype: Internet Documentation and Integration of Metadata (IDIOM)
– Event-Based Hybrid Consistency Framework for SRG

Future Work
• Using EBHCF for the deployment of other online collaboration and sharing tools
• Improving the current event processing engine and the hybrid consistency framework
• Compacting a growing number of minor events into a major event
• Moving from a single metadata store to distributed storage