Event-Based Hybrid Consistency Framework (EBHCF) for Distributed Annotation Records
Ahmet Fatih [email protected]
Advisor: Prof. Geoffrey C. Fox
Outline
• Online collaboration; tools for annotation and sharing publications
• Motivations and research issues
• Event-Based Hybrid Consistency Framework (EBHCF)
• Measurements and analysis
• Contributions and future work
Online Collaboration
• Rapid development of tools and services
• Aimed at fostering online collaboration and sharing between users and communities:
  – Social bookmarking tools, which support annotation using keywords called tags, and sharing
    • e.g. del.icio.us
  – Tools for annotation and sharing of scholarly publications
    • Connotea
    • CiteULike
    • BibSonomy
  – Social networking tools (e.g. MySpace)
  – Video sharing and annotation
    • e.g. YouTube
Tools for Annotation and Sharing Publications
• Used for:
  – Collecting
  – Annotating
  – Sharing papers
• Limitations:
  – The same entry must be entered separately in each tool
  – Different and limited metadata storage
  – No timing information for updated records
  – Lack of ability to transfer data between tools
  – Lack of services to extract and import data into a repository
  – Lack of services to upload data from a repository
Motivations
• Numerous annotation tools and a huge amount of data
  – Generates multiple instances of metadata about the same document
• No timestamp information for updated records
  – Causes inconsistencies
• Different and limited metadata storage
• Lack of interoperability between annotation sites
• Applying a service-based architecture to annotation systems
Research Issues I
• Performance
  – The benefits and costs of using an event-based mechanism and a hybrid consistency framework
• Scalability
  – Throughput
  – The number of integrated annotation tools
• Flexibility and extensibility
  – Interoperable with other clients via SOAP messages
  – Ease of integrating an annotation tool
Research Issues II
• Event-based model
  – Major and minor events, and dataset
  – Update concept of entities
  – Efficient and effective event processing
• Hybrid consistency framework for annotation tools
  – Conditions and requirements
  – The way of consistency maintenance
    • Pull + time based and push based
  – Advantages and disadvantages
Communication Manager
• Responsible for providing communication between the annotation tools and the update manager via gateways
• Utilizes a gateway for each annotation tool, and a parser
  – Retrieves updates in XML format
  – Parses the updates and passes them to the update manager
  – Posts updates coming from the update manager to the annotation tools
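The parse step could be sketched roughly as follows. The slides name JDOM and XPath (Java); this sketch swaps in Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The XML layout (`<posts>`, `<post>`, and the `href`, `description`, `tag`, `time` attributes) and the function name `parse_updates` are assumptions made for illustration, not the actual format of any annotation tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML payload; the real feeds of the annotation tools may differ.
SAMPLE_XML = """
<posts user="alice">
  <post href="http://example.org/paper1" description="Paper One"
        tag="consistency events" time="2008-01-05T10:00:00Z"/>
  <post href="http://example.org/paper2" description="Paper Two"
        tag="annotation" time="2008-01-06T12:30:00Z"/>
</posts>
"""


def parse_updates(xml_text):
    """Turn an XML update document into a list of plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    updates = []
    # "./post" is an XPath-style expression selecting the direct <post> children.
    for post in root.findall("./post"):
        updates.append({
            "url": post.get("href"),
            "title": post.get("description"),
            "tags": post.get("tag", "").split(),
            "updated": post.get("time"),
        })
    return updates


updates = parse_updates(SAMPLE_XML)
```

Each dictionary in `updates` is the kind of tool-independent record that could then be handed to the update manager.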
[Figure: Communication Manager. Labels: Gateway; Connotea Web API; Delicious Web API; CiteULike RSS feed and HTML; JDOM Parser and XPath; Pull updates via HTTPClient; Push updates via HTTPClient.]
Gateway
• Interface between the hybrid consistency framework and an annotation tool
• Provides extensibility
• A gateway must be deployed for each annotation tool that is to be integrated into the system
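One way to picture the gateway idea is a small adapter interface with one implementation per tool. This is only a sketch: the class names `Gateway` and `InMemoryGateway`, the method names `pull_updates` and `push_update`, and the record fields are all hypothetical, and the in-memory class stands in for a real tool that would be reached over HTTP.

```python
from abc import ABC, abstractmethod


class Gateway(ABC):
    """Hypothetical per-tool adapter between the framework and one annotation tool."""

    @abstractmethod
    def pull_updates(self, since):
        """Return records changed in the tool after the timestamp `since`."""

    @abstractmethod
    def push_update(self, record):
        """Send one updated record to the tool."""


class InMemoryGateway(Gateway):
    """Stand-in for a real tool; stores records in a dictionary keyed by URL."""

    def __init__(self, name):
        self.name = name
        self.records = {}

    def pull_updates(self, since):
        return [r for r in self.records.values() if r["updated"] > since]

    def push_update(self, record):
        self.records[record["url"]] = dict(record)


# Integrating another tool means registering one more gateway.
gateways = {name: InMemoryGateway(name) for name in ("connotea", "delicious")}
gateways["citeulike"] = InMemoryGateway("citeulike")
gateways["citeulike"].push_update(
    {"url": "http://example.org/p1", "title": "P1", "updated": 5}
)
```

Because the rest of the system would only see the `Gateway` interface, adding a tool does not require changes elsewhere, which is the extensibility point made above.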
[Figure: EBHCF Modules. Labels: Gateways; EBHCF; Annotation Tools.]
Annotation Tools Update Manager
• Responsible for:
  – Retrieving the updates from annotation tools periodically
  – Applying the updates to the primary copy of each record
  – Propagating the updates back to each annotation tool
  – Pre-retrieval of existing records
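The retrieve, apply, and propagate responsibilities can be sketched as a single cycle that a timer would invoke periodically. This is a minimal sketch under stated assumptions: the function name `sync_cycle`, the dictionary-based stand-ins for the tools and the primary copy, the integer timestamps, and the "newest timestamp wins" rule are all illustrative choices; the slides do not specify how conflicting updates are resolved.

```python
def sync_cycle(tools, primary, last_sync):
    """One pull + time based pass: pull, apply to the primary copy, propagate.

    tools: dict mapping tool name -> {url: record}, standing in for remote tools.
    primary: {url: record}, the primary copy.
    last_sync: only records with a later "updated" value are pulled.
    Returns the list of URLs whose primary copy changed.
    """
    changed = []
    # 1. Pull: look at records modified since the previous cycle.
    for name, records in tools.items():
        for url, record in records.items():
            if record["updated"] <= last_sync:
                continue
            current = primary.get(url)
            # 2. Apply: assumed rule, the newest timestamp wins on the primary copy.
            if current is None or record["updated"] > current["updated"]:
                primary[url] = dict(record)
                if url not in changed:
                    changed.append(url)
    # 3. Propagate: write the updated primary copies back to every tool.
    for url in changed:
        for records in tools.values():
            records[url] = dict(primary[url])
    return changed


tools = {
    "connotea": {"u1": {"title": "Old title", "updated": 1}},
    "delicious": {"u1": {"title": "New title", "updated": 7}},
}
primary = {"u1": {"title": "Old title", "updated": 1}}
changed = sync_cycle(tools, primary, last_sync=3)
```

After the call, the record edited in one tool has reached the primary copy and the other tool.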
[Figure: Annotation Tools Update Manager. Labels: Communication Manager; Connotea Thread; Delicious Thread; CiteULike Thread; Database; Check updates periodically; Update primary copy and propagate updates to other annotation tools.]
Digital Entity Manager
• Responsible for:
  – Events and dataset creation
  – Event processing
• Manages updates made on the primary copy of a digital entity
  – Passes updates to the Communication Manager
• Handles periodic update management
• Deals with history and rollback management of a digital entity
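History and rollback can be illustrated with an append-only event list from which any version of a record is rebuilt by replay. This is a sketch, not the framework's actual data model: the class name `DigitalEntity`, its methods, and the convention that a `None` value marks a removed field are assumptions.

```python
class DigitalEntity:
    """Hypothetical record whose state is rebuilt from an append-only event list."""

    def __init__(self, initial):
        # Each event is a dict of the fields it changes; event 0 is the creation.
        self.events = [dict(initial)]

    def apply(self, changes):
        """Record an update as a new event and return its version number."""
        self.events.append(dict(changes))
        return len(self.events) - 1

    def state_at(self, version):
        """Replay events 0..version to reconstruct that version of the record."""
        state = {}
        for event in self.events[: version + 1]:
            for field, value in event.items():
                if value is None:
                    state.pop(field, None)  # None marks a removed field
                else:
                    state[field] = value
        return state

    def current(self):
        return self.state_at(len(self.events) - 1)

    def rollback(self, version):
        """Restore an earlier version by appending a new event, keeping the history."""
        target = self.state_at(version)
        changes = {f: None for f in self.current() if f not in target}
        changes.update(target)
        return self.apply(changes)


entity = DigitalEntity({"title": "A"})
entity.apply({"tags": ["x"]})
entity.apply({"title": "B"})
restored_version = entity.rollback(0)
```

Rolling back by appending, rather than truncating, keeps every earlier version reachable through `state_at`.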
[Figure: Digital Entity Manager. Labels: Database; Digital Entity Update Management; History and Rollback Management; Events and Dataset Management; Periodic Update Management; Event-based Model Services; WSDL; Events and Dataset Creation; Event Processing Engine.]
Summary of the Event-Based Hybrid Consistency Framework
• The event-based model:
  – Keeps all versions of records
  – Makes it easy to change back to any version
  – Once the number of minor events becomes large, they can be converted into a major event (compaction)
• The hybrid consistency framework deploys:
  – A pull + time based consistency mechanism
  – A push based consistency mechanism
• Pre-retrieval shortens the consistency maintenance execution time by decreasing the access time.
• The updates can be prioritized based on the priority of annotation tools.
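Compaction can be sketched as folding a long run of minor events into one major event. The slides do not define the two event kinds precisely, so this sketch assumes that a major event carries a full snapshot of the record and a minor event carries a change to some fields; the function names and the `max_minor` threshold are hypothetical.

```python
def current_state(log):
    """Replay a log of ("major" | "minor", fields) events into one record."""
    state = {}
    for kind, fields in log:
        if kind == "major":
            state = dict(fields)   # assumed: a major event is a full snapshot
        else:
            state.update(fields)   # assumed: a minor event changes some fields
    return state


def compact(log, max_minor=3):
    """Fold the log into one major event once it holds more than max_minor minor events."""
    minor_count = sum(1 for kind, _ in log if kind == "minor")
    if minor_count <= max_minor:
        return list(log)
    return [("major", current_state(log))]


log = [("major", {"title": "Draft", "tags": []})]
for n in range(4):
    log.append(("minor", {"tags": ["t%d" % n]}))
compacted = compact(log)
```

In this simplified form the compacted log yields the same current record with fewer events to replay, at the cost of no longer being able to reconstruct the intermediate versions that were folded away.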
Benchmarks
• System performance
  – Investigation of the maximum number of annotation tools
  – Improved results for the investigation of the maximum number of annotation tools
• Scalability
  – System behavior for an increasing number of minor events
  – Number of users that can be supported simultaneously
• Consistency maintenance
  – Cost of consistency maintenance for carrying out the updates in the primary copy
  – Cost of consistency maintenance for carrying out the updates from annotation tools
Benchmark I - Performance I
• The goal is to investigate the maximum number of annotation tools that the proposed system can support without performance degradation
• Every measurement is observed 100 times.
• We were able to create a maximum of 1400 threads to represent annotation tools
• We used:
  – Java 2 Standard Edition compiler, version 1.5.0_12, with the maximum heap size of the Java Virtual Machine (JVM) set to 1024 MB
  – Apache Tomcat Server, version 5.0.28
  – Apache Axis, version 1.2
Machine Configurations
Cluster node: gf8.ucs.indiana.edu
Processor: Intel® Xeon™ CPU (2.40 GHz)
RAM: 2 GB total
OS: GNU/Linux (kernel release 2.4.22)
Benchmark II - Scalability I
• The goal is to investigate the scalability of the Event-Based Hybrid Consistency Framework implementation
• Measure the round-trip time for standard operations such as MoreInfo as the number of minor events increases.
• Measure the round-trip time for standard operations such as MoreInfo as the message rate increases.
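A round-trip measurement of this kind can be sketched with a small timing harness. The helper name `measure_round_trip` and the placeholder `fake_more_info` are hypothetical; a real test would issue the MoreInfo request to the deployed service instead. The default of 100 repetitions mirrors the number of observations per measurement stated for the performance benchmark.

```python
import statistics
import time


def measure_round_trip(operation, repetitions=100):
    """Call `operation` repeatedly and return (mean, stdev) of elapsed seconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


def fake_more_info():
    """Placeholder for a MoreInfo request; a real test would call the service."""
    sum(range(1000))


mean_s, stdev_s = measure_round_trip(fake_more_info, repetitions=100)
print("mean %.6f s, stdev %.6f s" % (mean_s, stdev_s))
```

Repeating the run while varying the number of stored minor events, or the number of concurrent clients, would give the two scalability curves described above.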
[Figure: Test-4. Event-Based Hybrid Consistency Framework: MoreInfo request with an increasing number of minor events. Labels: Single Thread; 1 user and 400 requests; WSDL; Event-Based Hybrid Consistency Framework; Database.]
[Figure: Test-5. Event-Based Hybrid Consistency Framework: MoreInfo request with increasing message rates. Labels: Single Thread; Various # of Clients; WSDL; Event-Based Hybrid Consistency Framework; Database.]
Benchmark III - Consistency Maintenance I
• The goal is to investigate the cost of our hybrid consistency framework:
  – The cost of consistency maintenance in terms of the time required to carry out updates at the primary-copy holder
  – The cost of consistency maintenance in terms of the time required to carry out updates at the annotation tools
[Figure: labels: Event-Based Hybrid Consistency Framework; Main Database; Annotation Tools.]
Contributions
• System research
  – An Event-Based Hybrid Consistency Framework
    • Efficient, scalable, and modular
    • Separation of events into minor and major events
    • Pull + time based and push based consistency maintenance
    • Handling various types of metadata coming from several sources
  – Identifying the circumstances where this architecture has advantages
    • Event-based model
  – Comprehensive benchmarks to evaluate the scalability, performance, and consistency maintenance of the prototype system
• System software
  – A prototype: Internet Documentation and Integration of Metadata (IDIOM)
  – Event-Based Hybrid Consistency Framework for SRG
Future Work
• Using EBHCF for the deployment of other online collaboration and sharing tools
• Improvement of the current event processing engine and the hybrid consistency framework
• Compaction of a large number of minor events into a major event
• Moving from a single metadata store to distributed storage