Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research
Taeweon Suh §, Hsien-Hsin S. Lee §, Shih-Lien Lu †, John Shen †
§ Georgia Institute of Technology, † Intel Corporation
February 12, 2006
Georgia Tech, Intel - WARFP 2006
Hardware/Software Co-simulation
• Software simulation
  – Advantages: flexible, observable, easy to implement
  – Disadvantage: prohibitively long simulation time
• Hardware emulation
  – Advantages: significant speedup, concurrent execution
  – Disadvantages: much less flexible and observable; low-level design takes longer to implement and validate
• Hardware/software co-simulation
  – Tries to retain the advantages of both approaches
  – Basic idea:
      · Implement time-consuming software functions in the FPGA
      · The remaining simulator interacts with the FPGA
Experiment Equipment
[Photo of the experimental setup, labeled: Intel server system, Pentium-III, ACE FPGA board, logic analyzer, host PC, UART]
Communication Method
• Communication between the Pentium-III and the FPGA
  – Use the FSB as the communication medium
  – Allocate one page of memory for communication
  – Send data to the FPGA: write-through cache mode
  – Receive data from the FPGA: cache-to-cache transfer
[Figure: the Pentium-III (MESI), the memory controller with 2GB SDRAM, and the FPGA (Virtex-II) share the front-side bus (FSB). Sending data is a "write" bus transaction. Receiving data is a "read" bus transaction that is answered by a cache-to-cache transfer of a cache line, labeled "FLUSH".]
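The following toy Python model illustrates why the slide pairs write-through mode for sending with cache-to-cache transfer for receiving. It is a sketch under stated assumptions. There is a single CPU cache, and the FPGA is modeled as a snooper that owns one 4KB page. Every class and method name is hypothetical. Real P-III bus phases and the full MESI state machine are not modeled.

```python
from dataclasses import dataclass, field

LINE = 32          # P-III cache line size in bytes
PAGE_MASK = 0xFFF  # 4KB communication page


@dataclass
class FpgaSnooper:
    """Hypothetical FPGA model: snoops the FSB for addresses in its page."""
    page_base: int
    result_line: bytes
    received: list = field(default_factory=list)
    reads: int = 0

    def owns(self, addr: int) -> bool:
        return (addr & ~PAGE_MASK) == self.page_base

    def on_write(self, addr: int, value: int) -> None:
        # A bus write to the shared page delivers data to the FPGA.
        if self.owns(addr):
            self.received.append((addr, value))

    def on_read(self, line_addr: int):
        # A read miss to the shared page is claimed by the FPGA (HITM),
        # which then supplies the line cache-to-cache.
        if not self.owns(line_addr):
            return None
        self.reads += 1
        return self.result_line


@dataclass
class Cpu:
    """Toy single-CPU cache, modeling only the send and receive paths."""
    fpga: FpgaSnooper
    write_through: bool
    memory: dict = field(default_factory=dict)
    lines: dict = field(default_factory=dict)  # cached lines: line address -> bytes
    dirty: dict = field(default_factory=dict)  # write-back data held in the cache

    def store(self, addr: int, value: int) -> None:
        if self.write_through:
            self.fpga.on_write(addr, value)  # data go out on the FSB immediately
        else:
            self.dirty[addr] = value         # data stay in the cache; FPGA never sees them

    def load(self, addr: int) -> int:
        line_addr = addr - addr % LINE
        if line_addr in self.lines:          # cache hit: no bus transaction at all
            return self.lines[line_addr][addr % LINE]
        data = self.fpga.on_read(line_addr)  # read miss appears on the FSB
        if data is not None:
            self.memory[line_addr] = data    # P-III MESI: memory is updated as well
        else:
            data = self.memory.get(line_addr, bytes(LINE))
        self.lines[line_addr] = data
        return data[addr % LINE]
```

In the write-back configuration, a store never delivers its data to the FPGA. The model also shows a limitation of this scheme: a second load from the same line hits in the CPU cache and never reaches the FPGA. A real protocol therefore has to arrange for each receive to miss. The slides do not say how this was handled.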
Hardware/Software Implementation
• Hardware (FPGA) implementation
  – State machines
      · Monitoring bus transactions on the FSB
      · Checking bus transaction types, i.e., read or write
      · Managing cache-to-cache transfers
  – Implementation of software functions in the FPGA
  – Debugging logic and statistics counters
• Software implementation
  – Linux device driver
      · The FPGA needs to know when to respond to FSB transactions
      · A specific physical address is needed for communication
      · One page of memory is allocated for FPGA access via the Linux device driver
  – Simulator modification for accessing the FPGA
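The FPGA-side state machines listed above can be pictured with the Python sketch below. It is a behavioral model only. The states (IDLE, COMPUTING, RESULT_READY), the single 12-bit argument per call and the fixed compute latency are all hypothetical. Snoop stalls and real FSB phase timing are not modeled.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()          # waiting for a write to the shared page
    COMPUTING = auto()     # argument latched, offloaded function running
    RESULT_READY = auto()  # next read to the page is answered cache-to-cache


class FsbMonitor:
    """Hypothetical FPGA snoop FSM; a transaction is ("read" | "write", addr)."""

    def __init__(self, page_base: int, func, latency: int = 1):
        if latency < 1:
            raise ValueError("latency must be at least one cycle")
        self.page_base = page_base
        self.func = func
        self.latency = latency
        self.state = State.IDLE
        self.result = None
        self.countdown = 0
        # Statistics counters, like the ones the slide mentions for debugging.
        self.stats = {"read": 0, "write": 0, "hitm": 0, "ignored": 0}

    def clock(self, txn=None) -> bool:
        """Advance one bus cycle; return True if the FPGA asserts HITM."""
        if self.state is State.COMPUTING:
            self.countdown -= 1
            if self.countdown == 0:
                self.state = State.RESULT_READY
        if txn is None:
            return False
        kind, addr = txn
        if (addr & ~0xFFF) != self.page_base:  # not our page: let memory respond
            self.stats["ignored"] += 1
            return False
        self.stats[kind] += 1
        if kind == "write" and self.state is State.IDLE:
            # The argument is encoded in the low 12 address bits.
            self.result = self.func(addr & 0xFFF)
            self.countdown = self.latency
            self.state = State.COMPUTING
        elif kind == "read" and self.state is State.RESULT_READY:
            self.stats["hitm"] += 1
            self.state = State.IDLE
            return True
        # Other cases (e.g. a read while COMPUTING) would need a snoop stall
        # in real hardware; this sketch simply does not respond.
        return False
```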
Example: SimpleScalar Co-simulation
• Preliminary experiment to check correctness
  – Implement a simple function (mem_access_latency) in the FPGA
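To show how little work the offloaded function does, here is a Python transcription of mem_access_latency as we understand it from SimpleScalar's sim-outorder.c. The defaults (mem_lat = {18, 2}, mem_bus_width = 8) are sim-outorder's stock settings. They are assumptions, not values confirmed for this experiment.

```python
def mem_access_latency(blk_sz: int, mem_lat=(18, 2), mem_bus_width: int = 8) -> int:
    """Simulated cycles to fetch blk_sz bytes: first chunk plus remaining chunks."""
    # Number of bus-width chunks needed to move the block (ceiling division).
    chunks = (blk_sz + (mem_bus_width - 1)) // mem_bus_width
    assert chunks > 0
    return mem_lat[0] + mem_lat[1] * (chunks - 1)
```

The return value is a latency in simulated target cycles. On the host, computing it takes only a handful of integer operations. This is why the analysis slide finds the function too cheap to offload.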
Co-simulation results
Benchmark   Baseline (h:m:s)   Co-simulation (h:m:s)   Difference (h:m:s)
mcf         2:18:38            2:20:50                 +0:02:12
bzip2       3:03:58            3:06:50                 +0:02:52
crafty      2:56:38            2:59:28                 +0:02:50
eon-cook    2:43:52            2:45:45                 +0:01:53
gcc-166     3:45:30            3:48:56                 +0:03:26
parser      3:34:57            3:37:27                 +0:02:30
perl        2:42:30            2:45:50                 +0:03:20
twolf       2:43:30            2:45:28                 +0:01:58
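The relative slowdown can be recomputed from the co-simulation results table. This sketch converts each h:m:s entry to seconds. Benchmark names are paired with rows in the order the slide lists them, and that pairing is itself an assumption about the flattened original.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:m:s' string to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return 3600 * h + 60 * m + s


# (benchmark, baseline, co-simulation), as listed in the results table
RESULTS = [
    ("mcf", "2:18:38", "2:20:50"),
    ("bzip2", "3:03:58", "3:06:50"),
    ("crafty", "2:56:38", "2:59:28"),
    ("eon-cook", "2:43:52", "2:45:45"),
    ("gcc-166", "3:45:30", "3:48:56"),
    ("parser", "3:34:57", "3:37:27"),
    ("perl", "2:42:30", "2:45:50"),
    ("twolf", "2:43:30", "2:45:28"),
]


def slowdowns(rows):
    """Map benchmark -> (extra seconds, extra time as a fraction of baseline)."""
    out = {}
    for name, base, cosim in rows:
        b, c = to_seconds(base), to_seconds(cosim)
        out[name] = (c - b, (c - b) / b)
    return out


if __name__ == "__main__":
    for name, (delta, frac) in slowdowns(RESULTS).items():
        print(f"{name:10s} +{delta:4d} s  {100 * frac:.2f}%")
```

By these numbers, co-simulation is slower than the software-only baseline on every benchmark, by roughly 1.1% to 2.1%. That is consistent with the analysis that follows: the offloaded function is too cheap to repay the FSB round trip.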
Co-simulation Results Analysis
• FSB access is expensive
  – ~20 FSB cycles (≈ 160 CPU cycles) for each transfer
      · One cache line (32 bytes) must be transferred for a cache-to-cache transfer
      · The P-III's MESI protocol requires main memory to be updated on a cache-to-cache transfer
• The "mem_access_latency" function is too simple
  – Even in software simulation it takes at most a few dozen CPU cycles
• Device driver overhead
  – System overhead due to the device driver
  – The driver requires one TLB entry, which the simulation would otherwise use
• A hardware implementation pays off only for time-consuming software routines that are called at a reasonable FPGA access frequency
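The analysis above reduces to a break-even inequality. Assume each offloaded call costs one write transaction to send its argument and one read transaction to collect the result. Using the slide's figure of ~20 FSB cycles ≈ 160 CPU cycles per transfer (an 8:1 core-to-bus clock ratio), a call then costs roughly 2 × 160 = 320 CPU cycles before any FPGA compute time or driver overhead. The sketch below encodes that model. The two-transfers-per-call figure and the optional overhead parameters are assumptions.

```python
FSB_CYCLES_PER_TRANSFER = 20   # from the slide: ~20 FSB cycles per transfer
CPU_CYCLES_PER_FSB_CYCLE = 8   # from the slide: 20 FSB cycles ~ 160 CPU cycles


def offload_cost(transfers_per_call: int = 2,
                 fpga_cycles: int = 0,
                 driver_cycles: int = 0) -> int:
    """Estimated host CPU cycles per offloaded call (assumed cost model)."""
    bus = transfers_per_call * FSB_CYCLES_PER_TRANSFER * CPU_CYCLES_PER_FSB_CYCLE
    return bus + fpga_cycles + driver_cycles


def worth_offloading(sw_cycles: int, **cost_kwargs) -> bool:
    """True if running the function in software costs more than offloading it."""
    return sw_cycles > offload_cost(**cost_kwargs)


if __name__ == "__main__":
    # "A few dozen CPU cycles" (the slide's bound for mem_access_latency), e.g. 40:
    print(offload_cost(), worth_offloading(40))  # prints: 320 False
```

Any driver or TLB overhead raises the bar further, which is the slide's point: only routines that cost well over a few hundred cycles per call are worth moving into the FPGA.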
On-going Work
• SoftSDV co-simulation for multi-core research
  – Implement distributed lowest-level caches and an interconnection network, such as a ring or mesh, in the FPGA
[Figure: eight cores, CPU0 to CPU7, each with private L1 and L2 caches. Each core connects through a ring interface (Ring I/F) to a slice of the distributed L3 cache. The L3 slices and ring interfaces are implemented in the FPGA.]
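As a concrete illustration of the ring interconnect, here is a hop-count sketch for the eight ring stops in the figure. Everything beyond "eight stops on a ring" is an assumption. Stop i is taken to host CPUi. The ring may be unidirectional or bidirectional. Cache lines are interleaved across the L3 slices by line address, using a hypothetical 64-byte line.

```python
N_STOPS = 8  # CPU0..CPU7 in the figure, one ring stop each (assumed order)


def home_slice(addr: int, line_bytes: int = 64, n: int = N_STOPS) -> int:
    """Hypothetical mapping: interleave cache lines across L3 slices."""
    return (addr // line_bytes) % n


def ring_hops(src: int, dst: int, n: int = N_STOPS, bidirectional: bool = True) -> int:
    """Number of ring links a message crosses from stop src to stop dst."""
    clockwise = (dst - src) % n
    return min(clockwise, n - clockwise) if bidirectional else clockwise


if __name__ == "__main__":
    for bidir in (True, False):
        avg = sum(ring_hops(0, d, bidirectional=bidir) for d in range(N_STOPS)) / N_STOPS
        print("bidirectional" if bidir else "unidirectional", avg)
```

The average hop count is the kind of parameter an FPGA ring model makes cheap to explore. On the assumptions above, uniform traffic averages 2 hops per access on a bidirectional ring versus 3.5 on a unidirectional one.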
Conclusions
• Proposed a new co-simulation methodology
• Preliminary co-simulation using SimpleScalar demonstrates the correctness of the methodology
  – Hardware/software implementation
  – Communication between the P-III and the FPGA via the FSB
  – Linux driver
• Co-simulation results indicate that
  – Bus (FSB) access is expensive
  – Linux driver overhead also needs to be overcome
  – Time-consuming blocks need to be emulated
• Multi-core co-simulation would benefit from an FPGA
  – Implement distributed low-level caches and the interconnection network, which would be complex enough to benefit from hardware modeling
Questions, Comments?
Thanks for your attention!
Backup Slides
Communication Details
• All FSB signals are mapped to FPGA pins
• Software function arguments are encoded in the FSB address for the SimpleScalar example
  – For a 4KB page:
      · Set its attribute to write-through mode
      · The lower 12 bits of the FSB address are free to use
      · The upper 24 bits come from TLB translation (the physical page number)
[Figure: the Pentium-III (MESI) and the Xilinx Virtex-II FPGA on the front-side bus (FSB)]
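Here is a minimal sketch of the address encoding. It assumes a single 12-bit argument per call, for example the block size passed to mem_access_latency. The simulator would store to virtual address page_virt + arg. Paging leaves the low 12 bits unchanged, so the FPGA sees page_phys + arg on the bus. The function names are hypothetical.

One detail is worth checking against the Pentium-III bus documentation. The P6 bus drives address lines A[35:3] and conveys the lowest three address bits through byte enables. The FPGA would therefore need to recover bits 2:0 from the byte enables, or the encoding should use only bits 11:3.

```python
from typing import Tuple

PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1   # low 12 bits: offset within the 4KB page
PHYS_ADDR_BITS = 36                 # P-III physical address width (24 + 12 bits)


def encode(page_phys: int, arg: int) -> int:
    """Place a function argument in the page-offset bits of the FSB address."""
    if page_phys & PAGE_MASK:
        raise ValueError("page_phys must be 4KB-aligned")
    if page_phys >> PHYS_ADDR_BITS:
        raise ValueError("page_phys exceeds the 36-bit physical address space")
    if not 0 <= arg <= PAGE_MASK:
        raise ValueError("argument must fit in 12 bits")
    return page_phys | arg


def decode(fsb_addr: int) -> Tuple[int, int]:
    """FPGA side: split a snooped address into (page, argument)."""
    return fsb_addr & ~PAGE_MASK, fsb_addr & PAGE_MASK
```

With this scheme, each call needs only an address and no data payload to deliver its argument. One write-through store carries it onto the bus, and decode is just a bit mask in the FPGA.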