X10-based Massive Parallel Large-Scale Traffic Flow Simulation

Toyotaro Suzumura (1,2), Sei Kato (1), Takashi Imamichi (1), Mikio Takeuchi (1), Hiroki Kanezashi (2), Tsuyoshi Ide (1), and Tamiya Onodera (1)

(1) IBM Research – Tokyo, (2) Tokyo Institute of Technology

This research was partly supported by the Japan Science and Technology Agency (JST) Core Research of Evolutionary Science and Technology (CREST)

X10-based Ultra-Large Scale Agent Simulation on the 2 Petaflops Supercomputer

Goal: To build a scalable large-scale agent simulation platform based on X10 that runs on a supercomputer with tens of thousands of CPU cores and a dual-rail 40 Gbps InfiniBand network

Status: Completed the multi-node version and verified the scalable performance with the Hiroshima road network.

[Figure: Megaffic, X10, and TSUBAME (2 Petaflops Supercomputer)]

Simulation data: Hiroshima
# of trips: 10000 (1/100 of real trips)
# of simulation steps: 1000 (1/100 of real steps: 24 hours)

[Figure: Simulation time. x-axis: Places (1, 2, 4, 8, 16); y-axis: Time (s), 0 to 250; annotations: 12 cores, 196 cores]

Outline

§ Motivation

§ XAXIS Overview and Architecture

§ Design for Highly Scalable Platform

§ Performance Evaluation

§ Discussion

§ Related Work

§ Concluding Remarks and Future Work

§ Other Activities

Background: Large-scale Simulation is Everywhere

§ We have entered an era in which proactive response is needed

§ High-performance large-scale simulation is required for timely decisions.

http://mark.buchanan.pagesperso-orange.fr/nature_economic_modelling.pdf

How can we design and develop a highly distributed agent simulation platform?

§ How can we design and implement a platform that handles millions of agents and multiple simulations concurrently?

§ How can we handle large-scale graphs consisting of millions of vertices and tens of millions of edges, such as the whole Japanese road network?

X10-based Large Scale Agent Simulation on the 2 Petaflops Super Computer

§ Goal: To build a scalable large-scale agent simulation platform that runs on a supercomputer with thousands of cores and a dual-rail 40 Gbps InfiniBand network

§ Technical challenge towards high scalability: How can we concurrently process multiple agents in a scalable manner?
– How can we divide an extremely large graph into a set of subgraphs and allocate each subgraph to a compute node on a supercomputer, so as to find the allocation pattern that best balances communication and computational cost based on profiling data collected at runtime?

→ Prior work tackles similar problems, but a different underlying environment and application call for a different optimization scheme

XAXIS: X10-based Agents eXecutive Infrastructure for Simulation

§ X10-based Distributed Agent Simulation Platform
– X10 is a state-of-the-art PGAS (Partitioned Global Address Space) language that brings high productivity when implementing highly parallel and distributed applications on post-petascale or exascale machines
• X10 can integrate seamlessly with legacy applications written in Java or C++.

§ Programming Model
– The agent programming model of XAXIS is derived from our ZASE simulation platform [Yamamoto, AAMAS2007]
– XAXIS provides a ZASE-compatible API to developers.

Gaku Yamamoto, et al., "A Platform for Massive Agent-based Simulation and its Evaluation", AAMAS 2007

XAXIS Software Stack

§ The following diagram illustrates the software stack of XAXIS and its applications.

§ XAXIS in X10 can execute existing ZASE applications written in Java with slight modification

[Figure: XAXIS software stack. Components: Agent Simulation (Java) (e.g. Traffic, CO2 Emission, Auction, Marketing); ZASE API; ZASE Simulation Runtime (Java); ZASE-XAXIS-Bridge (Java); ZASE-XAXIS-Bridge (X10); Agent Simulation (X10); XAXIS: X10-Based Simulation Runtime; X10 (Java, C++)]

XAXIS Architecture: X10-Based Agent Simulator

[Figure: message delivery from agent A1 on Place P to agent A2 on Place Q. Each place hosts User Agent Code, an Agent Manager, an Agent Directory, and an Agent Object Repository. Place 0 hosts the XAXIS Server with Global Data and Simulation Cycle Management.]

Labels in the figure:
– A1: execute, sendMessage, Msg (agent id)
– Agent Directory: Identify a place id
– Send message with "async at": Msg (place id, agent id)
– A2: receiveMessage
– (3) Agent Object Repository: Retrieves an agent object with an agent id
– Invoke an onHandleMessage method of the A2 object
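
The message path in this architecture can be illustrated with a small single-process sketch. It is not XAXIS code: the class and method names (`Agent`, `AgentManager`, `send_message`, `receive_message`) are hypothetical, one shared Python dictionary stands in for the Agent Directory that each place holds, and a plain method call stands in for the X10 `async at` hop between places.

```python
class Agent:
    """Hypothetical user agent; records every message it is asked to handle."""

    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.inbox = []

    def on_handle_message(self, sender_id, payload):
        self.inbox.append((sender_id, payload))


class AgentManager:
    """One manager per place, owning that place's agent object repository."""

    def __init__(self, place_id, directory, managers):
        self.place_id = place_id
        self.directory = directory  # agent id -> place id (assumed shared here)
        self.managers = managers    # place id -> AgentManager
        self.repository = {}        # agent id -> Agent object on this place
        managers[place_id] = self

    def register(self, agent):
        self.repository[agent.agent_id] = agent
        self.directory[agent.agent_id] = self.place_id

    def send_message(self, sender_id, dest_id, payload):
        # Identify the place id of the destination agent.
        place_id = self.directory[dest_id]
        # In X10 this hop would be an `async at`; here it is a direct call.
        self.managers[place_id].receive_message(sender_id, dest_id, payload)

    def receive_message(self, sender_id, dest_id, payload):
        # Retrieve the agent object by its id, then invoke its handler.
        agent = self.repository[dest_id]
        agent.on_handle_message(sender_id, payload)


directory, managers = {}, {}
place_p = AgentManager("P", directory, managers)
place_q = AgentManager("Q", directory, managers)
a1, a2 = Agent("A1"), Agent("A2")
place_p.register(a1)
place_q.register(a2)
place_p.send_message("A1", "A2", "hello")
print(a2.inbox)
```

The sender only needs the destination agent id: the directory lookup resolves the place, and the receiving manager resolves the object. Keeping the two lookups separate is what lets an agent change place without its id changing.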

XAXIS-based Large Scale Traffic Simulator

[Figure: Place 0 hosts the Simulation Manager (Simulation Cycle Management). Places P and Q each hold a SubGraph of Roads, CrossPoints (X10 Activity), Origin and Destination nodes (X10 Activity), and Vehicle Proxy/Vehicle objects, and each runs the simulation execution at time T. A Graph Server (Java) holds the Graph.]
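
A time-stepped cycle of this kind can be sketched as follows, assuming that every place must finish step T before any place starts step T+1. The `Place` class, `execute_step`, and `run_simulation` are hypothetical names for illustration, and a Python thread pool stands in for X10 activities running on separate places.

```python
from concurrent.futures import ThreadPoolExecutor


class Place:
    """Hypothetical stand-in for an X10 place holding one sub-graph."""

    def __init__(self, place_id, cross_point_ids):
        self.place_id = place_id
        self.cross_point_ids = list(cross_point_ids)
        self.executed = []  # log of (time step, cross point id)

    def execute_step(self, t):
        # Every cross point owned by this place runs once per time step.
        for cp in self.cross_point_ids:
            self.executed.append((t, cp))
        return len(self.cross_point_ids)


def run_simulation(places, num_steps):
    """Advance the clock only after all places have finished the current step."""
    processed = 0
    with ThreadPoolExecutor(max_workers=len(places)) as pool:
        for t in range(num_steps):
            # Consuming all results of pool.map blocks until every place is
            # done with step t, which acts as the per-step synchronization.
            results = pool.map(Place.execute_step, places, [t] * len(places))
            processed += sum(results)
    return processed


places = [Place(0, [0, 1]), Place(1, [2, 3, 4])]
total = run_simulation(places, num_steps=3)
print(total)
```

The per-step wait is the simplest correct choice, and it is also the global synchronization point whose cost grows with the number of places.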

Mapping Megaffic Components to X10

Megaffic on XAXIS

[Figure: Megaffic components on the XAXIS Runtime, each shown with an accompanying class label]

GraphServer : Driver.StaticDriver
roadnetwork.Road : simulator.Place
roadnetwork.Area : simulator.Region
roadnetwork.CrossPoint : simulator.Driver
TrafficEnv Service : citizen.Service
TrafficSim Launcher : Simulator.Launcher/RegionLauncher
ShortestPath/Dijkstra
Vehicle : simulator.Citizen
Vehicle Proxy : simulator.CitizenProxy

Component Diagram

[Figure: an Area (Zase Region) contains Roads (Zase Place) joined by CrossPoints (ZaseDriver). Each Road carries VehicleProxy, Vehicle, and Driver objects, with in and out links. The whole forms the Graph (road network).]

Each Place (X10) manages a different set of CrossPoints (XAXIS)

[Figure: the Graph (road network), with nodes numbered 1 to 8, is divided into three SubGraphs (Road Network) labeled 2, 3, and 4. Each SubGraph contains its own CrossPoints (ZaseDriver), Roads, and VehicleProxy/Vehicle/Driver objects, with in and out links.]

Design for Vehicle Migration among Different X10 Places

[Figure: vehicle migration from a Cross Point (A1) on X10 Place (P) to a Cross Point (A2) on X10 Place (Q). Each place hosts a Cross Point Manager, a Cross Point Directory, a Migration Vehicle Repository, and its part of the road network: Road Network (P) and Road Network (Q). Place 0 hosts the ZASE-X Server with Global Data and Simulation Cycle Management.]

Labels in the figure:
– A1: Execute & send migrate vehicles, Migrate vehicle object, Vehicle (migration cross point id)
– Cross Point Directory: Identify a place id
– Send vehicle with "async at": Vehicle object message
– A2: Receive migrated vehicle object, deserialize migrated vehicle
– (3) Retrieves an agent object with an agent id
– Invoke an onHandleMessage method of the A2 object

Design for Vehicle Migration Among X10 Places

[Figure: cross points CP0, CP1, CP2, and CP3 and roads Road 0, Road 1, and Road 2, distributed over X10 Place 0, X10 Place 1, and X10 Place 2]

• A road object holds the identifiers of its origin and destination cross points. From a cross point identifier it is possible to tell which X10 place the cross point is located in, since the identifier is assigned by the X10 DistArray construct.

• A road object also exists at the place where its destination cross point is located. For instance, if a trip takes CP1 as origin and CP3 as destination, the graph server returns CP1, CP2, and CP3 as the shortest path. When a vehicle first enters a road, it checks whether the road exists at the same X10 place. If not, a manager migrates the vehicle to the next road, which exists at a different X10 place.

CP: Cross Point
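
The migration rule described above can be sketched in a few lines. Everything here is an illustrative assumption rather than the XAXIS implementation: `place_of` uses a simple contiguous block distribution of cross point ids (the distribution actually chosen through X10's DistArray may differ), each road is taken to live at the place of its destination cross point, and `count_migrations` is a hypothetical helper that walks a shortest path and counts place changes.

```python
def place_of(cp_id, num_cross_points, num_places):
    """Assumed block distribution: contiguous ranges of ids per place."""
    block = -(-num_cross_points // num_places)  # ceiling division
    return cp_id // block


def road_place(road, num_cross_points, num_places):
    """A road is looked up at the place of its destination cross point."""
    _origin, destination = road
    return place_of(destination, num_cross_points, num_places)


def count_migrations(path, num_cross_points, num_places):
    """Count how often a vehicle changes place while following `path`.

    `path` is a list of cross point ids, e.g. [1, 2, 3] for CP1 -> CP2 -> CP3.
    """
    current = place_of(path[0], num_cross_points, num_places)
    migrations = 0
    for origin, destination in zip(path, path[1:]):
        target = road_place((origin, destination), num_cross_points, num_places)
        if target != current:
            # The next road is on another place, so the vehicle is migrated.
            migrations += 1
            current = target
    return migrations


path = [1, 2, 3]  # CP1 -> CP2 -> CP3
for num_places in (1, 2, 4):
    print(num_places, count_migrations(path, 4, num_places))
```

With four cross points, the same three-hop path needs no migration on one place, one migration on two places, and two migrations on four places, which shows how finer partitioning raises migration counts for a fixed set of trips.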

TSUBAME 2.0 Supercomputer

TSUBAME 2.0 System Configuration

TSUBAME 2.0 Specification

CPU: Intel Westmere EP (Xeon X5670, L2 cache: 256 KB, L3: 12 MB) 2.93 GHz processors, 12 CPU cores (24 cores with Hyper-Threading) x 2 sockets per node (24 CPU cores)
RAM: 54 GB
OS: SUSE Linux Enterprise 11 (Linux kernel: 2.6.32)
# of Total Nodes: 1466 nodes (we only used up to 1366 nodes)
Network Topology: Full-Bisection Fat-Tree Topology
Network: Voltaire / Mellanox dual-rail QDR InfiniBand (40 Gbps x 2 = 80 Gbps)
GPGPU: Three NVIDIA Fermi M2050 GPUs (*not used for this work)
GCC and OpenMP: GCC 4.3.4 (-O3 option), OpenMP 3.0
OpenMPI: OpenMPI 1.5.3, MVAPICH 1.6.1
Java Virtual Machine: IBM Java 1.6.0 (GC policy: gencon)
X10: X10 2.1.1.1

Performance Evaluation – Single Node

[Figure: Performance Characteristics of XAXIS (# of trips: 115000 (1/10), road network: Hiroshima). x-axis: # of threads (1, 2, 4, 6, 8, 10, 12); y-axis: speedup (1 to 5.5); series: # of simulations 100, 1000, 10000, 100000]

Objective
• To see the speedup ratio with varying numbers of threads and overall simulation steps

Experimental Setting:
• Road network: Hiroshima
• # of trips: 115000 (1/10)
• # of simulations: 100, 1000, 10000, 100000

Findings:
- This result revealed that the real traffic data could not fully utilize the CPU.
- To evaluate the full capability of XAXIS+MegafficCUI, we need to create artificial data
- The expected time for 1 day will be 355 seconds * 10 = 3550 seconds, or about 1 hour

[Figure: Elapsed Time (sec), 0 to 1400, vs. # of threads (1, 2, 4, 6, 8, 10, 12). # of simulations: 10000 (real data is 86400)]

Environment: S72hs22-4.trl.ibm.com (Intel Xeon 6 cores x 2 sockets, Hyper-Threading: off, 8 GB RAM, RHEL5), IBM J9 VM 1.6.0 (GC policy: gencon, -Xms4096m, -Xmx6144m)

Performance Evaluation – Multiple Nodes (100 trips per step)

# of Places                1        2        4        8       16
# of Migrations            0     3214     8282    14836    13261
Execution Time (s)   221.172  132.977   81.267   72.229   49.619

# of trips: 100000, # of steps: 1000, RI=true, # of threads per node: 12, heap memory = 32 GB

The origin and destination of each trip are in the same place, but some trips move to other places depending on the shortest path.

[Figure: The number of migrations (0 to 16000) vs. Places (nodes, 12 CPU cores per node): 1, 2, 4, 8, 16]

[Figure: Simulation time, Time (s) (0 to 250), vs. Places (nodes, 12 CPU cores per node): 1, 2, 4, 8, 16]
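
To read the scaling in this experiment more directly, the execution times reported on this slide can be converted into speedup and parallel efficiency relative to one place. The snippet below uses only the numbers from the slide; the function name `speedup_and_efficiency` is a hypothetical helper, and efficiency is taken as speedup divided by the number of places.

```python
places = [1, 2, 4, 8, 16]
# Execution times in seconds, as reported for 100 trips per step.
times = [221.172, 132.977, 81.267, 72.229, 49.619]


def speedup_and_efficiency(places, times):
    """Return (places, speedup, efficiency) tuples relative to the first run."""
    base = times[0]
    return [(p, base / t, base / t / p) for p, t in zip(places, times)]


results = speedup_and_efficiency(places, times)
for p, speedup, efficiency in results:
    print(f"{p:2d} places: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

At 16 places the speedup is about 4.5, so efficiency falls well below 1 as places are added.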

Performance Evaluation – Multiple Nodes (100 trips per step, more migrations)

# of Places                1        2        4        8       16
# of Migrations            0    12523    26889    41326    63747
Simulation Time (s)  223.125  139.870   97.121   90.276   64.080

# of trips: 100000, # of steps: 1000, RI=true, # of threads per node: 12, heap memory = 32 GB

The origin and destination of each trip are located at different places.

[Figure: Simulation time, Time (s) (0 to 250), vs. Places (nodes, 12 CPU cores per node): 1, 2, 4, 8, 16]

[Figure: The number of migrations (0 to 70000) vs. Places: 1, 2, 4, 8, 16]

Discussion

§ CPU usage drops sharply as we employ more nodes

§ This is because the number of cross points per node becomes smaller

§ We need heavier computation or more trips per step for better scalability

[Figure: CPU usage (%), 0 to 45, vs. Places (1, 2, 4, 8, 16); series: CPU_us, CPU_sy]

Hiroshima with 16 Places

Rio de Janeiro

Singapore

Large-scale traffic simulation with the whole Japanese road network consisting of 1 million cross points and 10 million vehicles

TSUBAME: 2 Petaflops Supercomputer

Performance Analysis on TSUBAME

2011 IEEE International Symposium on Workload Characterization

Synchronization overhead greatly affects performance when hundreds of threads are involved and scattered across distributed systems. As shown in the following graph, when we loosen the synchronization we obtain nearly linear performance scalability.

Towards more scalability

§ Problem
– As shown in the previous chart, synchronization overhead greatly affects performance when hundreds of threads are involved and scattered across distributed systems

§ Possible Solutions
– Looser synchronization without losing simulation precision
– A better parallelization approach
– Hierarchical synchronization … etc.

Related Work

§ Yamamoto et al., A Platform for Massive Agent-based Simulation and its Evaluation, 2007

§ David et al., Distributed Platform for Large-Scale Agent-Based Simulations, 2009

§ Gorgious et al., Large Scale Distributed Simulation on the Grid, 2006

§ Nayer et al., Large-Scale Multi-Agent-Based Simulation using Exemplars

§ Dan Chen et al., Large scale agent-based simulation on the grid, Journal Future Generation Computer Systems

§ Yi Zhang et al., Grid-aware Large Scale Distributed Simulation of Agent-based Systems, 2005

§ A flexible, large-scale, distributed agent based epidemic model, WSC 2007

§ Comparison of agent-based modeling software, http://en.wikipedia.org/wiki/Comparison_of_agent-based_modeling_software

Demonstration

§ Rio de Janeiro

§ Beijing

Concluding Remarks and Future Work

§ Summary
– We designed and developed an X10-based agent simulation platform and verified its scalable performance on the TSUBAME 2.0 supercomputer

§ Future Work
– More Performance Optimization
• Agent migration overhead, graph partitioning, time decomposition, and more drastic ones …
• Other agent simulations
• Experiments with BlueGene/P and later models, or the RIKEN K supercomputer