TRANSCRIPT
1
Parallel and distributed Monte Carlo simulations for finance in a grid environment
Françoise BAUDE, Mireille BOSSY, Viet Dung DOAN, Ian STOKES-REES
INRIA Sophia-Antipolis, France
2
Outline
• Objectives
• Background
• Architecture
• Layered Grid Process Model
• Grid5000
• Performance Results
• Future
3
High Level Objectives
• Framework for distributed computational finance algorithms
• Investigate grid component model
  • http://gridcomp.ercim.org/
• Implement open source versions of parallel algorithms for computational finance
• Utilise ProActive grid middleware
• Deploy and evaluate on various grid platforms
  • Grid5000 (France)
  • DAS3 (Netherlands)
  • EGEE (Europe)
4
Grid Emphasis
• The most recent work on PicsouGrid has focused on:
  • Multi-site (5+)
  • Large scale (500-2000 cores)
  • Long term (days to weeks)
  • Multi-grid (2+)
• Parallel distributed American option pricing
  • Ibanez-Zapatero
  • Longstaff-Schwartz
5
Outline
• Objectives
• Background
• Architecture
• Layered Grid Process Model
• Grid5000
• Performance Results
• Future
6
ProActive
http://www.objectweb.org/proactive
• Java library for distributed computing
  • Developed by INRIA Sophia Antipolis, France (Project OASIS)
  • 50-100 person-years of R&D work invested
• Provides transparent asynchronous distributed method calls
• Implemented on top of Java RMI
• Fully documented (600-page manual)
• Available under LGPL
• Used in commercial applications
• Graphical debugger
7
Background – PicsouGrid v1,2,3
• Original versions of PicsouGrid utilised:
  • Grid5000
  • ProActive
  • JavaSpaces
• Implemented:
  • European Simple, Basket, and Barrier pricing
  • Medium-size distributed system: 4 sites, 180 nodes
  • Short operational runs (5-10 minutes)
  • Fault tolerance mechanisms
• Achieved 90x speed-up with 140 systems
  • 65% efficiency
• Reported in e-Science 2006 (Amsterdam, Nov 2006)
  • "A Fault Tolerant and Multi-Paradigm Grid Architecture for Time Constrained Problems. Application to Option Pricing in Finance."
8
PicsouGrid v3 Performance
[Chart: PicsouGrid v3 multi-site performance — peak speed-up followed by performance degradation]
9
Outline
• Objectives
• Background
• Architecture
• Layered Grid Process Model
• Grid5000
• Performance Results
• Future
10
PicsouGrid Architecture
• Server / Control Node
  • Provides user interface
  • Instantiates network of Sub-Servers
  • Allows configuration of Simulator network
  • Creates "Request for Option Price" (with algorithm parameters)
  • Controls Sub-Servers and aggregates/reports results
  • Monitors Sub-Servers for failures and spawns new Sub-Servers if necessary
• Sub-Server
  • Acts as local site/cluster/system controller
  • Instantiates local Simulators
  • Delegates simulations in packets to Simulators
  • Collects results, aggregates, and returns to Server
  • Monitors Simulators for failures and spawns new Simulators if necessary
• Simulator
  • Computes Monte Carlo simulations for option pricing using packets
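The Sub-Server's packet-based delegation can be sketched as follows. This is a hypothetical local sketch, not PicsouGrid code: a `java.util.concurrent` thread pool stands in for ProActive's remote Simulators, and the per-sample payoff is stubbed out so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of Sub-Server-style packet delegation: split a Monte Carlo
// workload into fixed-size packets, farm them out to workers, and
// aggregate the partial sums into a single mean.
public class PacketDelegation {

    // One "packet" sums a batch of simulated payoffs. The payoff is
    // stubbed as the constant 1.0 to keep the sketch deterministic.
    static double runPacket(int iterations) {
        double sum = 0.0;
        for (int i = 0; i < iterations; i++) {
            sum += 1.0; // stand-in for one Monte Carlo payoff sample
        }
        return sum;
    }

    static double delegate(int totalIterations, int packets, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            final int perPacket = totalIterations / packets;
            List<Future<Double>> results = new ArrayList<>();
            for (int p = 0; p < packets; p++) {
                Callable<Double> task = () -> runPacket(perPacket);
                results.add(pool.submit(task));
            }
            double total = 0.0;
            for (Future<Double> f : results) {
                total += f.get(); // aggregate partial sums as they finish
            }
            return total / (perPacket * packets); // Monte Carlo mean
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(delegate(100_000, 100, 8)); // prints 1.0
    }
}
```

The failure monitoring described above would wrap `f.get()` with a timeout and re-submit lost packets to a fresh worker; that part is omitted here.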
11
PicsouGrid Deployment and Operation
[Diagram: Client sends an option pricing request to the Server; the Server reserves workers and delegates MC simulation packets via Sub-Servers to ProActive Workers, with heartbeat monitoring and MC results flowing back; a DB is attached to the Server; JavaSpace virtual shared memory (to v3)]
12
PicsouGrid v5 Design Objectives
• Multi-Grid
  • Grid5000
  • gLite/EGEE
  • INRIA Sophia desktop cluster
• Decoupled Workers
  • Autonomous
  • Independent deployment and operation
  • P2P discover and acquire
• Long Running, Multi-Algorithm
  • Create "standing" application
  • Augment (or reduce) P2P worker network based on demand
  • Computational tasks specify algorithm and parameters
13
Simulator Interface
public interface Simulator {
    public void init( OptionSet optionSet );
    public void init( Algorithm alg, Asset asset );
    public void simulate( );
    public IntWrapper simulate( Integer iterations );
    public BooleanWrapper autorun( Simulator sim );
    public IntWrapper offload( Simulator sim );
    public void merge( Simulator sim );
    public void seed( Simulator sim );
    public void reset( );
    public void restart( );
    public OptionSet getOptionSet( );
    public State getState( );
}
14
XML Pricing Descriptors
<OptionSet results="pricing-es1-res.xml">
  <Algorithm
    type="EuropeanSimple"
    timeIntervalsPerYear="50"
    iterations="1e5"
    packets="100"
  />
  <AssetPool>
    <Asset>
      <Params
        name="ABC Ltd."
        startPrice="36"
        strikePrice="40"
        timeToMaturity="1"
        volatility="0.20"
        riskFreeRate="0.06"
      />
    </Asset>
  </AssetPool>
</OptionSet>
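A descriptor of this shape can be read with the JDK's built-in DOM parser. This is an illustrative sketch, not PicsouGrid code; the element and attribute names simply follow the example above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch: pull pricing parameters out of an OptionSet descriptor using
// the JDK DOM parser (javax.xml.parsers). The descriptor is embedded
// as a string so the example is self-contained.
public class DescriptorReader {

    static final String XML =
        "<OptionSet results='pricing-es1-res.xml'>" +
        "  <Algorithm type='EuropeanSimple' timeIntervalsPerYear='50'" +
        "             iterations='1e5' packets='100'/>" +
        "  <AssetPool><Asset>" +
        "    <Params name='ABC Ltd.' startPrice='36' strikePrice='40'" +
        "            timeToMaturity='1' volatility='0.20' riskFreeRate='0.06'/>" +
        "  </Asset></AssetPool>" +
        "</OptionSet>";

    // Read one numeric attribute from the first <Params> element.
    static double paramAsDouble(String attr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            Element params = (Element) doc.getElementsByTagName("Params").item(0);
            return Double.parseDouble(params.getAttribute(attr));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(paramAsDouble("strikePrice")); // prints 40.0
    }
}
```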
15
Embedded Standards and Results
<Performance
  iterations="100000"
  steps="100"
  startTime="1180428748"
  stopTime="1180431022"
  wallTime="1274"
  startTimeH="Tue May 29 10:53:28 RDT 2007"
  stopTimeH="Tue May 29 10:53:31 RDT 2007"
/>
<Price
  callPrice="3.143"
  putPrice="0.807"
  callCI="0.012"
  putCI="0.012"
  completedSimulations="100000"
/>
16
Outline
• Objectives
• Background
• Architecture
• Layered Grid Process Model
• Grid5000
• Performance Results
• Future
17
Grid Performance Monitoring and State Machines
• Grid-ified distributed applications add at least three new layers of complexity compared to their serial counterparts:
  • Grid interaction and management
  • Local cluster interaction and management
  • Distributed application code
• Notoriously difficult to figure out what is going on where, and when it is happening:
  • Bottlenecks
  • Hot spots
  • Idle time
  • Limiting factor: CPU, storage, network?
  • What state is an application/task/process/system currently in?
• Solution: utilise a common state machine model for grid applications/processes
18
Layered Grid System
[Diagram: nested system layers — Site, Cluster, Host, Core, VM, Process]
19
“Proof” of layering
• What I execute on a Grid5000 Submit (UI) Node:
    mysub -l nodes=30 es-bench1e6
• What eventually runs on the Worker Node:
    /bin/sh -c /usr/lib/oar/oarexecuser.sh /tmp/OAR_59658 30 59658 istokes-rees \
        /bin/bash ~/proc/fgrillon1.nancy.grid5000.fr/submit N script-wrapper \
        ~/bin/script-wrapper fgrillon1.nancy.grid5000.fr \
        ~/es-bench1e6
• Granted, this is nothing more than good system design and separation of concerns
• We are just looking at the implicit API layers of "the grid"
  • Universal interface: command shell, environment variables, and file system
20
Abstract Recursive Process Model
• Question: is it possible to propose a recursive process model which can be applied at all layers?
• Create – process description
• Bind – process to the physical layer
• Prepare – prepare for execution (software, stage-in, config)
• Execute – initiate process execution (enter next lower layer)
• Complete – book-keeping, stage-out, clean-up
• Clear – wipe system, ready for next invocation
• Each stage can be in a particular state:
  • Ready
  • Active
  • Done
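The six stages and three per-stage states above can be captured in a few lines. This is an illustrative sketch of the model, not code from the slides; the simple linear transition logic (Ready → Active → Done, then advance to the next stage) is an assumption for illustration, and the Fail/Cancel/Suspend/Pause transitions are omitted.

```java
// Sketch of the recursive grid process model: six stages, each cycling
// through Ready -> Active -> Done before the next stage begins.
public class GridProcess {
    enum Stage { CREATE, BIND, PREPARE, EXECUTE, COMPLETE, CLEAR }
    enum State { READY, ACTIVE, DONE }

    Stage stage = Stage.CREATE;
    State state = State.READY;

    // Advance one step: Ready -> Active -> Done, then enter the next
    // stage in Ready state (the last stage simply stays Done).
    void step() {
        switch (state) {
            case READY:  state = State.ACTIVE; break;
            case ACTIVE: state = State.DONE;   break;
            case DONE:
                Stage[] stages = Stage.values();
                if (stage.ordinal() < stages.length - 1) {
                    stage = stages[stage.ordinal() + 1];
                    state = State.READY;
                }
                break;
        }
    }

    public static void main(String[] args) {
        GridProcess p = new GridProcess();
        for (int i = 0; i < 5; i++) p.step();
        System.out.println(p.stage + " " + p.state); // prints BIND DONE
    }
}
```

Timestamping each transition, as suggested for the CREAM mapping later, would amount to recording `Layer.Stage.State` with a clock read inside `step()`.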
21
Grid Process State Machine
[Diagram: Grid Process State Machine — each stage (Create, Bind, Prepare, Execute, Complete, Clear) passes through Ready → Active → Done; Fail/Suspend (system) and Cancel/Pause (user) transitions apply throughout]
• Create – create process description
• Bind – bind to a particular system
• Prepare – prepare system to execute process
• Execute – execute process (recurse to next lower level)
• Complete – tidy up system and accounting after completion of process
• Clear – clear process from system
22
CREAM Job States
• New LCG/EGEE Workload Management System
• Can be mapped to the Grid Process State Machine
• This only shows one level of mapping
  • In practice, would apply the state machine at Grid level, LRMS level, and task level
• Timestamps on state entry:
  • Layer.Stage.State
[Diagram: CREAM job states — Create, Bind, Prepare, Execute, Suspend — with terminal states Done, Failed, Cancelled]
23
Outline
• Objectives
• Background
• Architecture
• Layered Grid Process Model
• Grid5000
• Performance Results
• Future
24
Grid5000 Stats
• 9 sites across France
• 21 clusters
• 17 batch systems
• 3138 cores
  • Xeons
  • Opterons
  • Itaniums
  • G5
[Map: Grid5000 sites — Lille, Rennes, Paris-Orsay, Nancy, Bordeaux, Toulouse, Grenoble, Lyon, Sophia]
25
Characteristics of Grid5000
• Private network
  • Outbound Internet access possible via ssh tunnel
• Access based on ssh keys (passwordless)
• Shared NFS file space at each site
• Very limited data management facilities
• Myrinet and Infiniband prevalent on many clusters
• RENATER French research network, 2.5 to 10 Gb/s inter-site
• Focus on multi-node (and multi-site) grid computing
• "Kadeploy" provides a mechanism for a custom system image to be loaded before a job starts
26
Deployment and Execution on Grid5000
• Limited grid-wide (cross-site) job submission mechanisms
  • In practice, submit individually at each site
  • Coordinate between sites via multiple "reservation" job submissions with the same reservation window
• Limited data management/staging/configuration
  • Kadeploy (often too "heavy-weight")
  • rsync
  • Configuration wrapper scripts
• Node count reservations are "best effort"
  • Rule of thumb: don't expect more than 80% of requested nodes to be available when the reservation starts
  • Experience shows reservation start times could be delayed 30 seconds to 10 minutes
27
Outline
• Objectives
• Background
• Architecture
• Layered Grid Process Model
• Grid5000
• Performance Results
• Future
28
Experimental Setup
• European Simple call/put option price
  • 1e6 Monte Carlo iterations
• Single asset pricing reference:
  • t_reference = 67.3 seconds
  • AMD Opteron 2218 (64-bit), 2.6 GHz, 1 MB L2 cache, 667 MHz bus (best performing core available)
• Objective 1: maximize number of options priced in a fixed time window
• Objective 2: maximize speed-up efficiency:
    efficiency = (n_options × t_reference) / Σ_sites (n_cores_i × t_reservation_i)
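The benchmarked computation — a single-asset European call/put priced by Monte Carlo — can be sketched minimally. This is an illustrative sketch, not the PicsouGrid implementation: it uses the parameters from the XML descriptor shown earlier (S₀=36, K=40, T=1, σ=0.20, r=0.06) and samples the terminal asset value directly, ignoring the per-year time stepping implied by `timeIntervalsPerYear`.

```java
import java.util.Random;

// Minimal single-asset European option Monte Carlo pricer under
// geometric Brownian motion, sampling the terminal value directly:
//   S_T = S_0 * exp((r - sigma^2/2) T + sigma sqrt(T) Z),  Z ~ N(0,1)
// Payoffs are discounted at the risk-free rate.
public class EuropeanMC {

    // Returns {callPrice, putPrice} estimated from n simulated paths.
    static double[] price(double s0, double k, double t, double sigma,
                          double r, int n, long seed) {
        Random rng = new Random(seed);
        double drift = (r - 0.5 * sigma * sigma) * t;
        double vol = sigma * Math.sqrt(t);
        double callSum = 0.0, putSum = 0.0;
        for (int i = 0; i < n; i++) {
            double st = s0 * Math.exp(drift + vol * rng.nextGaussian());
            callSum += Math.max(st - k, 0.0);
            putSum  += Math.max(k - st, 0.0);
        }
        double disc = Math.exp(-r * t);
        return new double[] { disc * callSum / n, disc * putSum / n };
    }

    public static void main(String[] args) {
        // Descriptor parameters: S0=36, K=40, T=1, sigma=0.20, r=0.06
        double[] p = price(36, 40, 1, 0.20, 0.06, 1_000_000, 42L);
        System.out.printf("call=%.3f put=%.3f%n", p[0], p[1]);
    }
}
```

A useful sanity check is put-call parity, C − P = S₀ − K·e^(−rT), which holds path-by-path for this estimator up to Monte Carlo noise.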
29
“Run Now” Experiment
• Make immediate request for maximum number of nodes on all Grid5000 clusters
  • Price one option per acquired core
  • Not really fair: Grid5000 is not a production grid
• Submit to 15 clusters
  • 8 clusters at 6 sites completed tasks within 6 hours
  • Remainder either failed or hadn't started 24 hours later
• 1272 cores utilised
  • 85 core-hours occupied
  • This is the total amount of time the tasks "held" a particular core: idle time + execution time
• Objective 1 (alt): 1272 options priced in an "8 minute window"
• Objective 2: 1272 options × 67.3 s / 85 hr = 28% efficient
• Discovered various grid issues (e.g. NTP, rsync)
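The efficiency figures can be checked directly from the quoted numbers; only the values stated on the slides are used here.

```java
// Speed-up efficiency as defined earlier:
//   efficiency = (n_options * t_reference) / total core-hours occupied
// converting core-hours to seconds in the denominator.
public class Efficiency {
    static double efficiency(int nOptions, double tRefSeconds, double coreHours) {
        return (nOptions * tRefSeconds) / (coreHours * 3600.0);
    }

    public static void main(String[] args) {
        // "Run now": 1272 options, 67.3 s reference, 85 core-hours
        System.out.printf("%.1f%%%n", 100 * efficiency(1272, 67.3, 85.0));  // prints 28.0%
        // Reservation run: 894 options, 31.3 core-hours
        System.out.printf("%.1f%%%n", 100 * efficiency(894, 67.3, 31.3));   // prints 53.4%
    }
}
```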
30
[Chart: per-cluster task timelines showing queuing, execution, and result stage-out phases]
31
When everything is working
32
NTP Problems (Time Sync)
33
Unexplained slow-downs (homogeneous cluster)
34
Erratic node/core startup
35
Coordinated Start with Reservation
• Reservation made 12+ hours in advance
  • Confirmed no other reservations for time slot
  • Start time at "low utilisation" point of 6:05am
  • 5 minutes provided for system restarts and Kadeploy re-imaging after the end of reservations running to 6am
• Submitted to 12 clusters, at 8 sites
  • 9 clusters at 7 sites ran successfully
• 894 cores utilised
  • 31.3 core-hours occupied
• No cluster reservation started "on time"
  • Start time delays of 20 s to 5.5 minutes
  • Illustrates difficulty of cross-site coordinated parallel processing
• Objective 1: 894 options priced in a 9.5 minute window
• Objective 2: 894 options × 67.3 s / 31.3 hr = 53.4% efficient
• Still problems (heterogeneous clusters, NTP, rsync)
36
37
Intra-node timing variations
38
Core Timeline (detail)
• May seem like splitting hairs, but this is important for parallel algorithms with regular communication and synchronisation points
• Also, to know where latencies/inefficiencies are introduced
39
Heterogeneous clusters (hyper-threading on)
40
Mis-configured timezone
41
Overall cluster benchmarks
42
Outline
• Objectives
• Background
• Architecture
• Layered Grid Process Model
• Grid5000
• Performance Results
• Future
43
Parallelism
• American option pricing with a "floating" exercise date is much more difficult to calculate
• Two algorithms with good opportunities for parallelism are available:
  • Longstaff-Schwartz (2001)
  • Ibanez-Zapatero (2002)
• Interesting to see what speed-up can be achieved by a parallel implementation – in progress
• Interested in the possibility of cross-site parallel computation utilising ProActive
44
Longstaff Schwartz
45
Ibanez-Zapatero
46
Multi-Grids
• Very interested in experimenting with a Multi-Grid environment:
  • Grid5000
  • gLite/EGEE
  • DAS3
  • Local cluster / desktop-grid / P2P network
• ProActive deploys on LCG (gLite/EGEE)
  • Other ProActive applications deployed and run successfully
  • VO problems in Feb/March meant PicsouGrid could not be run on LCG – so no results for ISGC!
• Investigate use of HTTP-based task pools to bridge "grids"
47
Future for PicsouGrid
• Many more computational finance algorithms have already been developed and need to be similarly benchmarked:
  • Barrier, Basket
  • American (Longstaff-Schwartz and Ibanez-Zapatero)
• "Continuous" operation of option pricing, rather than "one-shot"
• Incorporate dynamic node availability
• Improve modularization/componentization of finance algorithms
48
Summary of Observations
• Deploying parallel applications in a grid environment continues to be a challenging problem
• Heterogeneity in a grid is pervasive and still hard to deal with
• Understanding performance issues, hot spots, bottlenecks, wasted idle time, and synchronisation points can be aided by a grid process model
• Middleware really is critical: gLite, LRMS, OAR, ProActive, etc. need to provide end users and application developers with a reliable, consistent, and easy-to-use interface to "the grid"