Lightweight Service Oriented Parallelism

Paul Roe
Queensland University of Technology (QUT)

QUT
• Queensland University of Technology (QUT)
• One of the largest universities in Australia: 40,000+ students (undergraduate, postgraduate, 10% international)
• Applied emphasis, strong links with industry
• Motto: “A university for the real world”
• Faculty of IT: 4,000 students, 20% international
• Location: Brisbane

My Background

• Academic at QUT for 10 years

• I am a computer scientist with a background in
– Programming languages
– Distributed computing

• Practical / applied emphasis

• I lead a small research group interested in grid computing and eScience

Two Parts

• Introduction to web services and service orientation

• Lightweight Service Oriented Parallelism

Web services

Web services (WS)

• Computer-to-computer messaging using XML
• Typically SOAP as the messaging protocol, with WSDL (Web Services Description Language)
– Standard and platform neutral
• Designed for eCommerce and enterprise application integration
• Similarities with MPI
– Message passing
– Support for different message exchange patterns
• Web service principles and technologies are evolving
– Originally SOAP was for lightweight RPC between objects
– SOAP and WSDL support RPC and messaging encodings and styles
– Now a strong move to XML-centric messaging

Why Not CORBA, DCOM, Java RMI etc.?

• Distributed object models try to scale the local OO model
– OK for a LAN
– Breaks for the Internet
• Too complex
– Assume an object model, virtual machine etc.
– Large investment for little return
• Poor interoperability
– WS designed for interoperability as the primary goal
• Designed for local area networks rather than the Internet
• Not standards based (except CORBA)
• Problems bootstrapping, ‘all or nothing’ approach
• Other attempts e.g. EDI
– Problem fixed, not extensible

XML Basics

• XML is the basis for web services

• XML is a platform-neutral data language
• XML is three things:
1. Family of specifications e.g. XSLT, XPath
2. Serialisation format (XML 1.0 with tags etc.)
3. Infoset: model for data

• XML can be described by XML schema

Infoset

• Infoset is a model of XML
– Essence of XML
• XML is no longer just a syntax
• This is important: it opens the way to other representations of XML

“XML is very inefficient; it’s verbose, there’s lots of angle brackets, everything’s a Unicode string, there’s no binary format; you’ve always got to parse it first, and that’s why web services are slow …”

Wrong!

SOAP

• Provides two key features for XML-based messaging
– Separation of message header vs payload data (envelope with header and body)
– Standard way to report faults
• No further evolution of SOAP necessary!
• Extensible header mechanism supports modular and composable advanced services e.g. security, transactions and reliability
– Vital feature

SOAP

<envelope>
  <header>: Message context
  <body>: Message payload, data
    <fault>: SOAP error (optional)

SOAP Extensible Headers

<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <t:Transaction xmlns:t="some-URI" soap:mustUnderstand="1">5</t:Transaction>
  </soap:Header>
  <soap:Body>
    <Add xmlns="http://www.qut.edu.au/">
      <a>1</a>
      <b>2</b>
    </Add>
  </soap:Body>
</soap:Envelope>

• <soap:Header>: extensible header info, can be optional or mandatory
• <soap:Body>: message payload
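
As an illustration of the optional/mandatory distinction, the following Python sketch shows how a receiver might check header blocks flagged with soap:mustUnderstand="1" before it touches the body. It is only a sketch under assumptions: the function names are hypothetical, and a real SOAP stack would reply with a fault rather than return a list.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# The message from the slide, minus the unused xsi/xsd declarations.
MESSAGE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <t:Transaction xmlns:t="some-URI" soap:mustUnderstand="1">5</t:Transaction>
  </soap:Header>
  <soap:Body>
    <Add xmlns="http://www.qut.edu.au/"><a>1</a><b>2</b></Add>
  </soap:Body>
</soap:Envelope>"""


def unsupported_mandatory_headers(envelope_xml, understood):
    """Return the qualified names of mandatory header blocks not in `understood`."""
    root = ET.fromstring(envelope_xml)
    header = root.find(f"{{{SOAP_NS}}}Header")
    if header is None:
        return []  # headers are optional altogether
    flag = f"{{{SOAP_NS}}}mustUnderstand"
    return [
        block.tag
        for block in header
        if block.get(flag) == "1" and block.tag not in understood
    ]


def body_payload(envelope_xml):
    """Return the first child element of the SOAP body (the message payload)."""
    root = ET.fromstring(envelope_xml)
    body = root.find(f"{{{SOAP_NS}}}Body")
    return next(iter(body))
```

A receiver that does not implement the Transaction header would see a non-empty list here and should refuse the message; one that does implement it can go on to process the Add payload.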

WSDL (1.1)

<definitions>: root element
Abstract (c.f. interface):
  <types>: What data types will be transmitted? (Typically XML Schema)
  <message>: What messages will be transmitted?
  <portType>: What operations (functions) will be supported?
Concrete:
  <binding>: How will messages be transmitted? SOAP specifics, encoding etc.
  <service>: Where is the service located?

WSDL is an XML document. Elements can be split across multiple files.

Web service invocation: the big picture

[Diagram: a WSDL document (which contains or references XML Schema) describes the web service. Developer tools, e.g. Visual Studio or Eclipse, generate a web service proxy for the client program and a web service stub for the server program. The sender (client) serialises the message into an XML document and sends it on the wire in SOAP format; the receiver (server) deserialises it.]

Web Services Landscape

[Diagram: layered view of the web services specifications]
• Composable service assurances: Security, Reliable Messaging, Transactions
• Description: WSDL, XML Schema, WS-Policy; Discovery: UDDI, WS-Discovery, MetadataExchange
• Messaging: XML, SOAP, WS-Addressing, MTOM
• Transport: HTTP, HTTPS, SMTP, TCP, …

Service Orientation

Service Orientation (SO)

• Architectural view of software and systems inspired by web services

• Much hype!
• “Service-oriented development focuses on systems that are built from a set of autonomous services.” (Don Box)
• No flat space containing a sea of objects
• There are four tenets:
– Boundaries are explicit
– Services are autonomous
– Services share schema and contract, not class
– Service compatibility is determined based on policy
• Key idea: services are loosely coupled and autonomous
– Web services are one possible implementation

SO vs Distributed Objects
• CORBA, DCOM, Java RMI etc. try to present a uniform view of the world
– Common object model
– Set of objects all living in the same space
– OK for a LAN: single admin domain, reliable, simple security, homogeneous
• Doesn’t work on the Internet
• Can’t do business by dictation: “you must use CORBA / RMI / DCOM etc.”
• Increasingly doesn’t work in a LAN either
– Move to more structure, local firewalls and tiered admin within organisations
• Déjà vu?
– C.f. TCP sockets (no shared implementation)
• Policy => metadata

Parallelism

Motivation and Ideas

• Use SOAP instead of MPI
– Interoperability
– Leverage higher-level WS specs e.g. security
• Service orientation decouples clients and servers, producers and consumers
• Simple producer-consumer models of parallelism can benefit from SO
– E.g. when producers are legacy applications and consumers are modern, e.g. WS-enabled apps or modern scripts

Two Simple Models of Parallelism

• (Both producer consumer)

• Futures (task-result)
– Lisp futures, Cilk etc.
• Linda
– Tuple space, JavaSpaces etc.

Futures

• Idea: spawn function calls asynchronously
– handle = Future(Add(1,2))
– Creates a task to perform Add(1,2)
– Can interrogate the handle to enquire about the result
• Web services can naturally express this form of communication

Web service interface between client and cluster:
  handle Add(int,int)
  int+ getAdd(handle)
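
The handle-and-poll pattern can be sketched in-process with Python's standard thread pool. This is a hedged stand-in, not the system from the talk: the pool plays the role of the cluster, and submit_add, get_add and wait_for are hypothetical names for the two web service operations and a client-side polling loop.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Optional

_pool = ThreadPoolExecutor(max_workers=4)  # stands in for the cluster
_tasks: Dict[str, Future] = {}             # handle -> running task


def add(a: int, b: int) -> int:
    return a + b


def submit_add(a: int, b: int) -> str:
    """Counterpart of `handle Add(int,int)`: start the task, return a handle."""
    handle = uuid.uuid4().hex
    _tasks[handle] = _pool.submit(add, a, b)
    return handle


def get_add(handle: str) -> Optional[int]:
    """Counterpart of `int+ getAdd(handle)`: None means 'not ready yet'."""
    task = _tasks[handle]
    return task.result() if task.done() else None


def wait_for(handle: str, interval: float = 0.01) -> int:
    """Client-side loop that polls until the result is available."""
    while (result := get_add(handle)) is None:
        time.sleep(interval)
    return result
```

The client never blocks inside the service: it holds only an opaque handle and decides for itself when and how often to poll.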

Add Request

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Add xmlns="http://www.qut.edu.au/">
      <a>1</a>
      <b>2</b>
    </Add>
  </soap:Body>
</soap:Envelope>

Add Response

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <AddResult xmlns="http://www.qut.edu.au/">437643786432</AddResult>
  </soap:Body>
</soap:Envelope>

getResultAdd Request

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getAdd xmlns="http://www.qut.edu.au/">
      <handle>437643786432</handle>
    </getAdd>
  </soap:Body>
</soap:Envelope>

getResultAdd Response

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getAddResult xmlns="http://www.qut.edu.au/">3</getAddResult>
  </soap:Body>
</soap:Envelope>

If the result is not ready, return null (an empty element).
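
To make the "null if not ready" convention concrete, here is a small Python sketch that builds a getAdd request and interprets the response. The element names and namespaces come from the messages above; the function names are hypothetical, and treating an empty or missing getAddResult as "not ready" is an assumption about how the null is serialised.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SERVICE_NS = "http://www.qut.edu.au/"


def build_get_add_request(handle: str) -> str:
    """Serialise a getAdd request for the given handle."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}getAdd")
    ET.SubElement(call, f"{{{SERVICE_NS}}}handle").text = handle
    return ET.tostring(envelope, encoding="unicode")


def parse_get_add_response(response_xml: str) -> Optional[int]:
    """Return the integer result, or None if the server has no result yet."""
    root = ET.fromstring(response_xml)
    body = root.find(f"{{{SOAP_NS}}}Body")
    if body is None:
        return None
    result = body.find(f"{{{SERVICE_NS}}}getAddResult")
    if result is None or result.text is None or not result.text.strip():
        return None  # empty element: result not ready
    return int(result.text)
```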

Caching

• Assume computation is ‘functional’
• Cache results on server
• Sessionless
• Poll server until the result is available
• Need to match arguments to see if a result already exists
• Can support both kinds of function in the web service interface

Web service interface between client and cluster:
  int+ Add(int,int)

Data Parallelism

• Problem: the asynchronous programming model is rather tricky
• Often want to invoke many functions en masse
• Can build data-parallel abstractions into the language to support data parallelism
– E.g. matrix add
• Also build into the web service framework: automatically lift point-wise operations

Web service interface between client and cluster:
  int+[] Add(int[],int[])
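
Lifting a point-wise operation to arrays can be sketched in a few lines of Python. The names lift, READY and add_if_ready are hypothetical, and the dictionary only stands in for results already held on the server; elements whose result is not yet available come back as None, mirroring int+[].

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

PointwiseOp = Callable[[int, int], Optional[int]]


def lift(op: PointwiseOp) -> Callable[[Sequence[int], Sequence[int]], List[Optional[int]]]:
    """Lift a point-wise binary operation to equal-length sequences."""
    def lifted(xs: Sequence[int], ys: Sequence[int]) -> List[Optional[int]]:
        if len(xs) != len(ys):
            raise ValueError("argument arrays must have the same length")
        return [op(x, y) for x, y in zip(xs, ys)]
    return lifted


# Stand-in for the server-side cache: only two results are ready so far.
READY: Dict[Tuple[int, int], int] = {(1, 10): 11, (2, 20): 22}


def add_if_ready(a: int, b: int) -> Optional[int]:
    """Point-wise Add that returns None while the result is not available."""
    return READY.get((a, b))


add_many = lift(add_if_ready)
```

One call such as add_many([1, 2, 3], [10, 20, 30]) then replaces three separate request/poll exchanges, with None marking the element that still has to be polled for.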

System Overview

[Diagram: the client and the grid/cluster each talk to the server through web services. The server is a web server backed by a job repository (function cache). Client and cluster are decoupled and autonomous.]

System Properties
• Job requestors poll for results and for creating tasks
• Job executors poll for jobs
• Decouple result requestors/consumers from result producers
• Result producers can be legacy code
• Result consumers can be different code
• Completely decoupled
• Can share results
• Also naturally fault tolerant if results are cached in a stable store
• (Service orientation: 1. Boundaries are explicit 2. Services are autonomous)
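
The polling model on both sides can be sketched as a small in-memory job repository. This is an assumption-laden illustration: the class and method names are hypothetical, Python dictionaries stand in for the stable store, and a result of None is indistinguishable from "not ready" in this simplified version.

```python
from collections import deque
from typing import Deque, Dict, Hashable, Optional, Set, Tuple

Job = Tuple[str, Tuple[Hashable, ...]]  # (function name, arguments)


class JobRepository:
    """Sessionless function cache shared by job requestors and job executors."""

    def __init__(self) -> None:
        self._results: Dict[Job, object] = {}
        self._pending: Deque[Job] = deque()
        self._known: Set[Job] = set()

    def request(self, name: str, *args: Hashable) -> Optional[object]:
        """Requestor side: return the cached result, or None and queue the job."""
        job = (name, args)
        if job in self._results:
            return self._results[job]
        if job not in self._known:  # the first request creates the task
            self._known.add(job)
            self._pending.append(job)
        return None

    def take_job(self) -> Optional[Job]:
        """Executor side: poll for a job; None if nothing is waiting."""
        return self._pending.popleft() if self._pending else None

    def put_result(self, job: Job, result: object) -> None:
        """Executor side: store the result so that any requestor can share it."""
        self._results[job] = result
```

Because requestors and executors only ever talk to the repository, neither side needs to know the other exists, and repeated requests with the same arguments are answered from the cache.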

Result cache

• Need a stable store
• Need to efficiently store results and compare argument XML
• Use an XML database e.g.
– Xindice, SQL Server 2005 etc.
• One table per job type e.g. a table for Add
• Use stored procedures to perform operations
• Need a facility to create tables
– Also a web service

Jobs, Schema and Web Services

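[Diagram: a server hosting web services around a job table. Labelled operations: Create table (from a schema, with WSDL), Create job, Get result, Put result, plus data-parallel variants. Job creators/consumers sit on one side of the server and job executors on the other.]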

Database

WSDL, Schema etc

• Typed jobs: when a job type is created, the schema must be provided for the inputs and outputs of the function
• The WSDL, table, and web services are created automatically
• (Service orientation: 3. Services share schema and contract, not class 4. Service compatibility is determined based on policy)

Details

• Using SQL Server 2005
• Supports XML indexing, but not testing XML for equality
• Therefore need an efficient mechanism to compare web service call inputs with what is already in the database
• Use the canonicalisation provided by XML security and generate a hash from it
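
The canonicalise-then-hash idea can be sketched with the Python standard library (3.8 or later). Note the assumptions: ET.canonicalize implements C14N 2.0, which stands in here for whatever XML security canonicalisation the original system used; SHA-256 and the name argument_key are illustrative choices; and stripping whitespace around text is an assumption about which differences should be ignored.

```python
import hashlib
import xml.etree.ElementTree as ET


def argument_key(arguments_xml: str) -> str:
    """Hash the canonical form of the call arguments for use as a cache key."""
    # Canonicalisation removes serialisation differences such as quote style
    # and, with strip_text=True, insignificant whitespace between elements.
    canonical = ET.canonicalize(arguments_xml, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two requests that differ only in layout or quoting then map to the same fixed-length key, so the equality test becomes a comparison on an ordinary indexed column.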

User Interface

Utilising Idle Machines

• (Old project: G2, g2.fit.qut.edu.au)
• System is amenable to cycle scavenging
• Extend the system to also support code caching and distribution for simple code
• Can be heterogeneous and support Java applets, .NET etc.
• Volunteer machines download jobs and code
• Extra table in database

Results

• BLAST application running on a ten-node test cluster
– Speedup of 9.96 times for 40 jobs of approx. 1m57s duration
• The bioinformatics SVM application in a 50-PC lab (cycle scavenging)
– Speedup of 46 times with 200 jobs of approx. 1m44s duration (input and output were negligible)
• Works well for coarse-grained parallelism
• To generate tasks, simply send an XML doc to the server via a tool or DIY
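
The speedup figures above can be related to parallel efficiency (speedup divided by the number of workers). The sketch below assumes one worker per node or PC, i.e. 10 and 50 workers, which the slides do not state explicitly; under that assumption the efficiencies are 99.6% for BLAST and 92% for the SVM application. The wall-time helper additionally assumes every job takes exactly the quoted approximate duration.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / workers


def estimated_wall_time(jobs: int, seconds_per_job: float, speedup: float) -> float:
    """Rough parallel wall time: total sequential work divided by the speedup."""
    return jobs * seconds_per_job / speedup


blast_efficiency = parallel_efficiency(9.96, 10)  # ten-node cluster (assumed 10 workers)
svm_efficiency = parallel_efficiency(46, 50)      # 50-PC lab (assumed 50 workers)
```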

REST

• Many end-user applications support binding to XML
– E.g. in Excel can simply import XML data
• REST: a different style of web services based on HTTP verbs
• Expose results as XML through a URL e.g.
– eresearch.fit.qut.edu.au/g2x/Add/1/2
– Results in an XML doc
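
The URL scheme in the example, function name followed by its arguments as path segments, can be sketched as follows. The http:// scheme, the helper names and the percent-encoding of segments are assumptions for illustration; only the host and the /g2x/Add/1/2 path shape come from the slide.

```python
from typing import List, Tuple
from urllib.parse import quote, unquote

BASE_URL = "http://eresearch.fit.qut.edu.au/g2x"  # scheme is an assumption


def job_url(function: str, *args: object) -> str:
    """Build the result URL for a function call, e.g. Add(1, 2)."""
    parts = [function, *map(str, args)]
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


def parse_job_path(path: str, prefix: str = "/g2x/") -> Tuple[str, List[str]]:
    """Server side: split a request path into function name and raw arguments."""
    if not path.startswith(prefix):
        raise ValueError(f"path does not start with {prefix!r}")
    function, *args = [unquote(part) for part in path[len(prefix):].split("/")]
    return function, args
```

Because the whole call is encoded in the URL, any tool that can fetch a URL with an HTTP GET and read XML, a spreadsheet included, can act as a result consumer.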

Linda

• (Work in progress)
• Alternative simple model of parallelism
• Linda has a tuple space and 4 operations:
– in, out, rd, eval
• Add and copy/remove tuples from the tuple space
• Remove and copy by associative matching on data
• Naturally asynchronous model

XML Databases and Linda

• Use XML instead of tuples
• XML databases store XML data and support querying data
• Build a Linda-like system
• SQL Server supports XQuery (Xindice supports XPath)
• Use XQuery to query for data
– XQuery is a SQL-like functional language for querying XML data
• Have a few simple web services to add and remove XML data
• (Related work on XSpaces etc.)

Operations

• Like the functional case, support creation of typed XML tables, but holding just a single XML value
• Operations (web services)

  URL CreateLindaTable(XML Schema)
  void Put(XMLDoc[])
  XMLDoc[] Take(XQuery-string)
  XMLDoc[] Copy(XQuery-string)

Linda

[Diagram: a table of XML documents (<foo></foo>, …) held on the cluster and accessed through web services]
• Producers: Put(<foo> … </foo>)
• Consumers: Take(“for $v in / where $v/@val < 2000 return $v”)
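
The Put/Take/Copy semantics can be sketched in memory with Python. The Python standard library has no XQuery engine, so a plain predicate function stands in for the XQuery string; that substitution, the class name XmlSpace and the val attribute on the sample documents are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

Predicate = Callable[[ET.Element], bool]


class XmlSpace:
    """In-memory stand-in for one Linda table of XML documents."""

    def __init__(self) -> None:
        self._docs: List[ET.Element] = []

    def put(self, docs: List[str]) -> None:
        """Add documents to the space (Linda 'out')."""
        self._docs.extend(ET.fromstring(doc) for doc in docs)

    def copy(self, matches: Predicate) -> List[str]:
        """Return matching documents and leave them in the space (Linda 'rd')."""
        return [ET.tostring(d, encoding="unicode") for d in self._docs if matches(d)]

    def take(self, matches: Predicate) -> List[str]:
        """Return matching documents and remove them from the space (Linda 'in')."""
        taken = [d for d in self._docs if matches(d)]
        self._docs = [d for d in self._docs if not matches(d)]
        return [ET.tostring(d, encoding="unicode") for d in taken]


def val_below_2000(doc: ET.Element) -> bool:
    """Predicate playing the role of: for $v in / where $v/@val < 2000 return $v"""
    return int(doc.get("val", "0")) < 2000
```

A producer would call put with new documents while consumers call take with a query; since take removes what it returns, each matching document is consumed once, as in Linda's in operation.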

Preliminary Results

• Preliminary results encouraging
• Sending around XQueries raises some security issues e.g. DoS attacks etc.
• Model well suited to certain algorithms e.g. genetic algorithms where there is a set of improving values
• Producers and consumers tend to be the same program
– But just need to generate and send XML docs to the server
• Can have multiple tables
– Locking?

Future Work
• Search on functional parallelism cache
• Notification interface
• WS Resource Framework
• Untyped jobs
• Security
• Connect to a proper job scheduler
• Server is a bottleneck: can we use database replication etc. to alleviate this?

Conclusions

• Web services and databases can support simple lightweight service oriented parallelism

• Service orientation very useful, particularly the decoupling

• Databases useful – highly tuned

• Need to support different paradigms