A Comprehensive Model for Arbitrary Result Extraction
Neal Sample, Gio Wiederhold (Stanford University)
Dorothea Beringer (Hewlett-Packard)
SAC 2002
Shift in Programming Tasks
[Figure: programming effort shifting from Coding to Integration/Composition; time axis 1970, 1990, 2010]
Sample Composition Tasks
- Logistics: reservation and distribution systems, “find the best transportation route from A to B”
- Genomics: framework for composing various processing tools and repositories
- Modeling: weather prediction, complex chemical systems, basin modeling
Composition of processes (vs. components, data)
CLAM Composition Language
- Purely compositional: no primitives for arithmetic, no primitives for I/O, etc.
- Splitting up the CALL statement: parallelism by asynchrony in a sequential program; novel possibilities for optimizations; reduction in complexity of invocation statements
- Higher-level language: as HLLs raised the level above assembly, the compositional paradigm raises it above HLLs
- Intent: enable domain experts
CLAM Primitives
Pre-invocation:
- SETUP: set up the connection to a service
- SETPARAM, GETPARAM: set and get parameters in a service
- ESTIMATE: obtain estimates for optimization
Invocation and result gathering:
- INVOKE: begin execution
- EXAMINE: test the progress of an invoked method
- EXTRACT: extract results from an invoked method
Termination:
- TERMINATE: terminate a method invocation or the connection to a service
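To make the split CALL statement concrete, here is a minimal Python sketch of a service handle that exposes the primitives above as methods. The class `ToyService`, its lowercase method names, the thread-based execution, and the fixed `cost` returned by `estimate()` are all hypothetical illustrations, not CLAM's actual runtime or API.

```python
import threading


class ToyService:
    """Hypothetical service handle exposing CLAM-style primitives (illustration only)."""

    def __init__(self, work, cost=1.0):
        self._work = work        # callable that computes the service's result
        self._cost = cost        # placeholder figure returned by estimate()
        self._params = {}
        self._thread = None
        self._result = None
        self._connected = False

    def setup(self):
        """SETUP: open a (simulated) connection to the service."""
        self._connected = True

    def setparam(self, name, value):
        """SETPARAM: store an input parameter."""
        self._params[name] = value

    def getparam(self, name):
        """GETPARAM: read back an input parameter."""
        return self._params.get(name)

    def estimate(self):
        """ESTIMATE: return a cost figure an optimizer could use (placeholder)."""
        return self._cost

    def invoke(self):
        """INVOKE: start execution in the background and return immediately."""
        if not self._connected:
            raise RuntimeError("SETUP must precede INVOKE")

        def run():
            self._result = self._work(**self._params)

        self._thread = threading.Thread(target=run)
        self._thread.start()

    def examine(self):
        """EXAMINE: non-blocking status check (status only, no progress)."""
        if self._thread is None or self._thread.is_alive():
            return "NOT_DONE"
        return "DONE"

    def extract(self):
        """EXTRACT: wait for completion and return the result."""
        if self._thread is None:
            raise RuntimeError("INVOKE must precede EXTRACT")
        self._thread.join()
        return self._result

    def terminate(self):
        """TERMINATE: close the (simulated) connection."""
        self._connected = False


# The single call "r = add(x=2, y=3)" split into separate primitives:
svc = ToyService(lambda x, y: x + y)
svc.setup()
svc.setparam("x", 2)
svc.setparam("y", 3)
svc.invoke()
# ... other invocations could be started here and run in parallel ...
r = svc.extract()
svc.terminate()
print(r)  # 5
```

The point of the split is the gap between `invoke()` and `extract()`: a sequential program can start several invocations there, which is where the parallelism-by-asynchrony comes from.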
Data Dependencies & Scheduling
// begin program
A = service1();
B = service2();
C = service3(A, B);
D = service4(C);
E = service5(C);
// end of program
[Figure: data-dependency graph: START → service1, service2 → service3 → service4, service5 → END]
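Because the program only states data dependencies, a runtime can derive the schedule on its own. A minimal sketch, assuming a simple level-by-level (topological) schedule; `schedule_levels` is a hypothetical helper, not part of CLAM, and the dependency table transcribes the example program.

```python
def schedule_levels(deps):
    """Group services into levels; each service depends only on earlier levels."""
    remaining = dict(deps)
    done = set()
    levels = []
    while remaining:
        # Every service whose inputs have all been produced can start now.
        ready = sorted(s for s, inputs in remaining.items() if set(inputs) <= done)
        if not ready:
            raise ValueError("cyclic dependency")
        levels.append(ready)
        done.update(ready)
        for s in ready:
            del remaining[s]
    return levels


# Inputs of each service in the example program (A..E are their outputs).
deps = {
    "service1": [],
    "service2": [],
    "service3": ["service1", "service2"],  # C = service3(A, B)
    "service4": ["service3"],              # D = service4(C)
    "service5": ["service3"],              # E = service5(C)
}

print(schedule_levels(deps))
# [['service1', 'service2'], ['service3'], ['service4', 'service5']]
```

Services in the same level (service1 and service2; service4 and service5) have no data dependency on each other, so they may run concurrently, matching the graph above.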
Runtime: data extraction is hard
- Data extraction with native modules worked
- No language-level specifications in CLAM (e.g., polling, threading, exception handling…)
- Multiple middleware for transport: difficult mapping (CORBA-RMI, RMI-COM, COM-CPAM, etc.)
- Crisis of legacy services: generalize or restrict?
Refine the strategy…
Strategy: hide it & depend on it
- Have to respect service capabilities, or suffer the LCD (lowest common denominator; more in a bit)
- Simple and flexible programming: data extraction is a runtime issue, not central to the composition task; simplified integration
- Legacy ambivalence: simple bridging for middleware; increased audience for services
- Better scheduling: declarative language, data dependencies
Where are we?
- Declarative language for composition: data is used for synchronization; no primitives to support synchronization
- Apparent “mismatch” in data extraction methods and capabilities among various actors: What does the data look like? How can data be extracted?
Data View: Services
[Figure: a service whose RESULTS consist of Result A, Result B, and Result C]
Extraction Techniques
- Asynchrony: explicitly controlled (spin-locks, polling, interrupt handling, etc.); can be used with any DAG schedule
- Partial extraction: web browsing (HTML text as a schema), SQL cursors (thanks to the reviewer)
- Progressive extraction (exceptional): adaptive mesh refinements, JPEG interleaving
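Explicitly controlled asynchrony can be as simple as a polling loop. A minimal Python sketch using the standard library's `concurrent.futures`; `slow_service`, `call_with_polling`, and the poll interval are hypothetical stand-ins for a remote service call and its caller.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_service(n):
    """Stand-in for a long-running remote service (hypothetical)."""
    time.sleep(0.2)
    return n * n


def call_with_polling(fn, arg, interval=0.05):
    """Start fn(arg) asynchronously, poll until it finishes, then fetch the result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, arg)
        polls = 0
        while not future.done():   # explicit, caller-controlled polling
            polls += 1
            time.sleep(interval)   # the caller could do other work here instead
        return future.result(), polls


value, polls = call_with_polling(slow_service, 7)
print(value)  # 49
```

Here the polling logic lives in the caller's code; the strategy on the preceding slides is to move exactly this kind of code out of the composition language and into the runtime.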
Current Focus
Result gathering among the CLAM primitives: EXAMINE (test progress of an invoked method) and EXTRACT (extract results from an invoked method)
EXAMINE Primitive in CLAM
Returns “status” and “progress”
- Status: 2 bits of state, status ∈ {DONE, NOT_DONE, PARTIAL, ERROR}
- Progress: open descriptor; indicates progress in an application-specific way (could be variance, mean, amplitude, etc.); default assumption: integer 0-100 = % done
- Resolution of EXAMINE: can apply per service (black box) or per result (white box)
- Not complete for many legacy systems: only “status”, no “progress”
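The four status values fit exactly in two bits. A minimal sketch, assuming an `IntEnum` with codes 0-3 (the specific bit assignments are an illustration, not taken from CLAM) and narrowing the open progress descriptor to its default form, an integer 0-100. The names `ExamineResult` and `legacy_examine` are hypothetical; a status-only legacy service reports `None` for progress.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Status(IntEnum):
    # Hypothetical 2-bit encoding of the four EXAMINE states.
    NOT_DONE = 0b00
    PARTIAL = 0b01
    DONE = 0b10
    ERROR = 0b11


@dataclass
class ExamineResult:
    status: Status
    progress: Optional[int] = None  # default descriptor: % done (0-100); None = not reported

    def __post_init__(self):
        if self.progress is not None and not 0 <= self.progress <= 100:
            raise ValueError("progress must be between 0 and 100")


def legacy_examine(finished: bool) -> ExamineResult:
    """A status-only legacy service: no progress information is available."""
    return ExamineResult(Status.DONE if finished else Status.NOT_DONE)


assert max(Status).bit_length() <= 2  # all four states fit in 2 bits
print(ExamineResult(Status.PARTIAL, 40))
print(legacy_examine(False))
```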
EXAMINE: Service with results A, B, C

Snapshot 1:
Service.EXAMINE()  {NOT_DONE, 0}
Service.EXAMINE(A) {NOT_DONE, 0}
Service.EXAMINE(B) {NOT_DONE, 0}
Service.EXAMINE(C) {NOT_DONE, 0}

Snapshot 2:
Service.EXAMINE()  {PARTIAL, 40}
Service.EXAMINE(A) {DONE, 100}
Service.EXAMINE(B) {NOT_DONE, 0}
Service.EXAMINE(C) {PARTIAL, 20}

Snapshot 3:
Service.EXAMINE()  {DONE, 100}
Service.EXAMINE(A) {DONE, 100}
Service.EXAMINE(B) {DONE, 100}
Service.EXAMINE(C) {DONE, 100}
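In these snapshots the black-box answer is consistent with averaging the per-result progress values: (100 + 0 + 20) / 3 = 40. The sketch below assumes exactly that rule, plus the rule that the service is DONE only when every result is DONE and NOT_DONE only when none has started; both rules are inferred from the example, not a documented CLAM definition, and `examine_service` is a hypothetical helper.

```python
from enum import Enum


class Status(Enum):
    NOT_DONE = "NOT_DONE"
    PARTIAL = "PARTIAL"
    DONE = "DONE"
    ERROR = "ERROR"


def examine_service(per_result):
    """Derive a per-service (black-box) EXAMINE answer from per-result answers."""
    statuses = [s for s, _ in per_result.values()]
    # Assumed rule: service progress is the mean of the per-result progress values.
    progress = round(sum(p for _, p in per_result.values()) / len(per_result))
    if Status.ERROR in statuses:
        return Status.ERROR, progress
    if all(s is Status.DONE for s in statuses):
        return Status.DONE, progress
    if all(s is Status.NOT_DONE for s in statuses):
        return Status.NOT_DONE, progress
    return Status.PARTIAL, progress


snapshot2 = {
    "A": (Status.DONE, 100),
    "B": (Status.NOT_DONE, 0),
    "C": (Status.PARTIAL, 20),
}
print(examine_service(snapshot2))  # status PARTIAL, progress 40
```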
EXTRACT Primitive
Extracts data from a service
- Per service (black box): (var) = Service.EXTRACT();
- Per result (white box): (varA = A, varC = C) = Service.EXTRACT();
- Allows partial data extraction: saves volume (abandon uninteresting elements); saves time (terminate useless invocations)
- Allows progressive data extraction with the two-valued EXAMINE (status + progress): steering, time saving
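A minimal Python sketch of the two EXTRACT granularities, assuming a service's results are held in a dict keyed by result name; `extract` and its `names` parameter are hypothetical. Per-service extraction returns everything, while per-result extraction returns only the requested results, so in a distributed setting the uninteresting ones need not be shipped at all.

```python
def extract(results, names=None):
    """EXTRACT sketch: all results (black box) or only the named ones (white box)."""
    if names is None:                      # per service: (var) = Service.EXTRACT();
        return dict(results)
    missing = [n for n in names if n not in results]
    if missing:
        raise KeyError(f"results not available: {missing}")
    return {n: results[n] for n in names}  # per result: (varA = A, varC = C) = ...


service_results = {"A": [1, 2, 3], "B": "large payload", "C": 3.14}

everything = extract(service_results)
picked = extract(service_results, ["A", "C"])  # B is never extracted
var_a, var_c = picked["A"], picked["C"]
```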
Examine-Extract Relationship
(rows: EXAMINE capability; columns: EXTRACT granularity)

EXAMINE                        | EXTRACT per service                                | EXTRACT per result
per service, status only       | asynchronous procedure call, Java RMI              | limited partial extraction, (binary) thumbnails
per service, status+progress   | partitioned progressive extract (full result set)  | ?
per result, status only        | semantic partial extraction (full result set)      | partial extraction: browsing, SQL cursor (no progressive)
per result, status+progress    | progressive extraction (full result set)           | progressive and partial extraction: CLAM
Conclusions
Data extraction hiding is good!
- User is not responsible for data management
- Synchronizing extractions is not in the language: simplicity
- Enables effective service scheduling
- Simplified integration
- Blueprint for a proactive design pattern for future services