![Page 1: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/1.jpg)
Alexandru V. Staicu¹, Jacek R. Radzikowski¹, Kris Gaj¹, Nikitas Alexandridis², Tarek El-Ghazawi²
¹ George Mason University, ² George Washington University
Effective Use of Networked Reconfigurable Resources
http://ece.gmu.edu/lucite
![Page 2: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/2.jpg)
Problem:
• Reconfigurable resources expensive and underutilized
• Many of these resources available over the network
• It is desirable to leverage networked reconfigurable resources to help other users within the same organization
![Page 3: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/3.jpg)
Approach: Adapt and use a Job Management System
[Diagram: Submission Host, Master Host, and Execution Hosts 1, 2, and 3; Tasks 1, 2, 3 are submitted together, and Task 1, Task 2, and Task 3 each appear at an execution host.]
![Page 4: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/4.jpg)
Approach:
• Select the most suitable existing Job Management System (JMS)
- identify and define functional requirements
- rank known systems according to these requirements
- identify which JMS is the easiest to extend
• Extend this JMS to recognize and utilize reconfigurable resources
- add new dynamic resources
- configure scheduling to be based on these new resources
![Page 5: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/5.jpg)
Networked Reconfigurable Resource Management System
[Diagram: Submission Host, Master Host, and Execution Hosts 1, 2, and 3 handling Tasks 1, 2, 3, with FPGA boards added.]
![Page 6: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/6.jpg)
Heterogeneous network with FPGA-based accelerators
[Diagram: Dell, HP, Sparc 10, and Sparc 20 hosts and a gateway, connected by a Myrinet SAN/LAN switch and 100 Mbps Ethernet intelligent hubs; WILDFORCE, WILDSTAR, and SLAAC boards are shown alongside the hosts, together with the SLAAC Research Reference Platform.]
![Page 7: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/7.jpg)
Functional units of a typical Job Management System
[Diagram: User Server, Job Scheduler, Resource Monitor, Job Dispatcher, and Resource Manager. Labelled flows: jobs & their requirements; available resources; resource requirements; scheduling policies; resource allocation and job execution.]
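The functional units named on this slide can be sketched in a few lines of Python. This is only an illustration of how the units relate, not the design of any of the systems surveyed: the class names mirror the slide labels, while the resource names (`cpus`, `fpga_boards`), the first-fit policy, and the `submit` helper are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    requirements: dict  # resource requirements, e.g. {"cpus": 1, "fpga_boards": 1}


@dataclass
class Host:
    name: str
    available: dict  # resources currently free on this host


class ResourceMonitor:
    """Tracks which resources are available on each execution host."""

    def __init__(self, hosts):
        self.hosts = list(hosts)

    def available_resources(self):
        return self.hosts


class JobScheduler:
    """Matches a job's resource requirements against available resources."""

    def __init__(self, monitor):
        self.monitor = monitor

    def select_host(self, job):
        # Scheduling policy assumed here: first host that meets every requirement.
        for host in self.monitor.available_resources():
            if all(host.available.get(r, 0) >= n for r, n in job.requirements.items()):
                return host
        return None


class JobDispatcher:
    """Allocates resources on the chosen host and hands the job over."""

    def dispatch(self, job, host):
        for resource, amount in job.requirements.items():
            host.available[resource] -= amount
        return f"{job.name} -> {host.name}"


def submit(jobs, scheduler, dispatcher):
    """Plays the user server role: accept jobs, then place or queue each one."""
    placed, queued = [], []
    for job in jobs:
        host = scheduler.select_host(job)
        if host is None:
            queued.append(job.name)
        else:
            placed.append(dispatcher.dispatch(job, host))
    return placed, queued
```

With two hosts, only one of which has a free FPGA board, a second FPGA job stays queued until the board is released:

```python
hosts = [Host("host1", {"cpus": 4, "fpga_boards": 0}),
         Host("host2", {"cpus": 2, "fpga_boards": 1})]
jobs = [Job("task1", {"cpus": 1}),
        Job("task2", {"cpus": 1, "fpga_boards": 1}),
        Job("task3", {"fpga_boards": 1})]
placed, queued = submit(jobs, JobScheduler(ResourceMonitor(hosts)), JobDispatcher())
```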
![Page 8: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/8.jpg)
Classification of Investigated Systems (1)
Centralized JMS
• LSF
• CODINE
• PBS
• Condor
• RES
Distributed JMS w/o a Central Scheduler
• Globus
• Legion
• NetSolve
Distributed Operating System
• MOSIX
![Page 9: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/9.jpg)
Classification of Investigated Systems (2)
Parameter Study Scheduler
• AppLES
Resource Monitor and Forecaster
• NWS
Distributed Computing Interface
• Compaq DCE
![Page 10: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/10.jpg)
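Operating system, flexibility, user interface
Systems compared: LSF, Codine, PBS, CONDOR, RES
Distribution: LSF com; Codine pub; PBS pub/com; CONDOR pub; RES gov
Source code: [per-system entries not preserved]
OS Support (Solaris, Linux, Tru64, NT): [per-system entries not preserved]
User Interface: GUI & CLI for four systems, CLI only for one [column assignment not preserved]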
![Page 11: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/11.jpg)
Scheduling and Resource Management
LSF Codine PBS CONDOR RES
Batch jobs
Interactive jobs
Parallel jobs
Accounting
![Page 12: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/12.jpg)
Efficiency and Utilization
LSF Codine PBS CONDOR RES
Stage-in and stage-out
Timesharing
Process migration
Dynamic load balancing
Scalability
![Page 13: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/13.jpg)
Fault Tolerance and Security
LSF Codine PBS CONDOR RES
Checkpointing
Daemon fault recovery
Authentication
Authorization
![Page 14: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/14.jpg)
Documentation and Technical Support
LSF Codine PBS CONDOR RES
Documentation
Technical support
![Page 15: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/15.jpg)
JMS features supporting extension to reconfigurable hardware
• capability to define new dynamic resources
• strong support for stage-in and stage-out
- configuration bitstreams
- executable code
- input/output data
• support for Windows NT and Linux
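As a rough illustration of how these features might combine in one job submission, the sketch below builds (but does not run) an LSF `bsub` command line. It relies on two existing `bsub` options: `-R` for a resource requirement string and `-f` for copying files to the execution host before the job (`>`) or back after it (`<`). The resource name `fpga_free` and all file names are hypothetical, and the slides do not say that the authors submitted jobs this way.

```python
def build_bsub_command(command, stage_in, stage_out, resource="fpga_free"):
    """Return an argv list for `bsub`; nothing is executed here.

    `resource` is a hypothetical site-defined dynamic resource name.
    """
    argv = ["bsub", "-R", f"select[{resource}>0]"]
    for path in stage_in:
        # Copy the local file to the execution host before the job starts.
        argv += ["-f", f"{path} > {path}"]
    for path in stage_out:
        # Copy the file back from the execution host after the job ends.
        argv += ["-f", f"{path} < {path}"]
    argv.append(command)
    return argv
```

For example, `build_bsub_command("./app", ["design.bit", "app", "input.dat"], ["output.dat"])` stages in a configuration bitstream, the executable, and the input data, and stages out the results.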
![Page 16: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/16.jpg)
Ranking of Centralized Job Management Systems (1)
Capability to define new dynamic resources:
Excellent: LSF, PBS, CODINE
More difficult: CONDOR, RES
Stage-in and stage-out:
Excellent: LSF, PBS
Limited: CONDOR
No: CODINE, RES
![Page 17: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/17.jpg)
Ranking of Centralized Job Management Systems (2)
Overall suitability to extend to reconfigurable hardware:
1. LSF
2. CODINE
3. PBS
(1 to 3: without changing the JMS source code)
4. CONDOR
5. RES
(4 and 5: requires changes to the JMS source code)
![Page 18: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/18.jpg)
Extension of LSF to reconfigurable hardware (1)
Operation of LSF
[Diagram: submission host with bsub app, Batch API, and LIM; master host with MLIM, MBD, and the job queue; execution host with SBD, Child SBD, LIM, RES, and the user job. Load information is exchanged with other hosts. Arrows are numbered 1 to 13.]
LIM – Load Information Manager
MLIM – Master LIM
MBD – Master Batch Daemon
SBD – Slave Batch Daemon
RES – Remote Execution Server
![Page 19: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/19.jpg)
Extension of LSF to reconfigurable hardware (2)
[Diagram: the LSF operation diagram (submission host, master host, execution host, arrows 1 to 13) extended with an ELIM, the ACS API, and the FPGA board; a new arrow 14 carries the status of the board.]
ELIM – External Load Information Manager
ACS API – Adaptive Computing Systems API
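A minimal sketch of the ELIM's role, assuming the usual LSF convention that an external LIM prints one line per report to standard output in the form `<number of indices> <name1> <value1> ...`. The index name `fpga_free` and the `read_board_status` stub are hypothetical; in the extended system the board status would come through the ACS API, whose calls are not shown in the slides and are not reproduced here.

```python
import sys
import time


def read_board_status():
    """Hypothetical stand-in for a board-status query made through the ACS API."""
    return {"fpga_free": 1}


def format_elim_report(indices):
    """Format one report line: '<count> <name1> <value1> <name2> <value2> ...'."""
    parts = [str(len(indices))]
    for name, value in indices.items():
        parts += [name, str(value)]
    return " ".join(parts)


def run_elim(read_status=read_board_status, cycles=1, interval_s=0.0, out=sys.stdout):
    """Write `cycles` reports; a real ELIM would keep reporting while LIM runs it."""
    for i in range(cycles):
        out.write(format_elim_report(read_status()) + "\n")
        out.flush()  # LIM reads the pipe, so each report is flushed immediately
        if i + 1 < cycles:
            time.sleep(interval_s)
```

The scheduler can then treat the reported index like any other dynamic resource when matching job requirements to hosts.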
![Page 20: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/20.jpg)
Conclusions (1)
• 12 systems evaluated against 25 functional requirements, plus their suitability for extension to support reconfigurable hardware
• LSF, CODINE, PBS, and Condor ranked the highest in the functional requirements
• LSF, CODINE, and PBSPro found easy to extend without changes to their source code
• LSF most suitable to support reconfigurable hardware
![Page 21: 1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi 2 1 George Mason University 2 George Washington University](https://reader035.vdocument.in/reader035/viewer/2022062517/56649f305503460f94c4b2dc/html5/thumbnails/21.jpg)
Conclusions (2)
• General software architecture of the extended system developed
• Experimental developments, verification and performance evaluation of the extended system in progress