hardware/software co-design/co-verificationsoc.yonsei.ac.kr/class/material/cad/hardware... ·...
TRANSCRIPT
Hardware/Software Co-Design/Co-Verification
Sungho Kang
Yonsei University
2CS&RSOC YONSEI UNIVERSITY
Outline
IntroductionCo-design MethodologyPartitioningSchedulingCo-Simulation SystemsTimed Co-simulationMultimedia ExamplesConclusion
3CS&RSOC YONSEI UNIVERSITY
IntroductionMixed Implementation
MemoryBehavioral
specificationplus
constraints
Analoginterface
Software
Hardware
Constrains
Cost
PerformanceMixed
implementation
A mixedimplementation
Program
ASIC
Interface
µP
4CS&RSOC YONSEI UNIVERSITY
IntroductionCo-Design
Integrated design of systems implemented using both hardware and software componentsWhy
Advances in enabling technologiesSystem level specification / simulationHigh level synthesis and CAD frameworks
Advanced design methods required due to the increased diversity and complexityCost and performance of HW/SW systems should be optimized for market competitivenessProduce-to-market time is vitalExploiting concurrency among design threads and tools will result in significant gains
DifficultiesTarget moves (e.g. in size and complexity)Tolerance level for error decreases
5CS&RSOC YONSEI UNIVERSITY
IntroductionCo-Design
Integration of HW and SW design techniquesThe HW and SW components are interdependentHW and SW are typically described and design using different methodologies, languages and tools
AdvantagesAcceleration of the design processLengthy system integration and test phase can be avoidedDynamic HW/SW trade-offs in the design process
Co-design methodologies differ widelyWidely differing assumptionsInterface/communication techniquesDesign goals
Example typesA microprocessor and its associated glue logicA microprocessor and special-purpose computing engine
6CS&RSOC YONSEI UNIVERSITY
IntroductionCo-Design
applicationsoftware
operatingsystem
device driver
system bus
applicationspecific
hardware
bus interface
system bus
software hardware
send, receive, wait
register reads/writesinterrupts
bus trasactions
system designhardware-software co-design
hardware-softwareco-synthesis
hardware-softwareco-simulation
hardware-software
partioning
HW/SW interface abstractions
HW/SW system design activities
7CS&RSOC YONSEI UNIVERSITY
IntroductionCo-Simulation
Simulation of heterogeneous systems whose HW and SW components are interfacingRoles of co-simulation
Verification of system specification before system synthesisVerification of mixed system after system synthesis and integrationSystem performance estimation for system partitioning
Issues of HW-SW co-simulationTiming accuracyProcessor modelPerformanceInterface transparencyTransition to co-synthesisIntegrated user interface and internal representation
8CS&RSOC YONSEI UNIVERSITY
IntroductionReason for Co-simulation
Processor transition to embedded coreNo existing hardware
Increased software contentSoftware controls more functionsDevelopment and debug times increased
More complex hardware interfacesProcessor interactions with DSPProcessor interactions with increasingly complex hardware
9CS&RSOC YONSEI UNIVERSITY
IntroductionReason for Co-Simulation
Design problem foundSomething wrong due to logic errorProblem is visible, but not apparent in hardwareFound while running software
Brings SW and HW together
10CS&RSOC YONSEI UNIVERSITY
Design MethodologyCo-Design Problems
Instruction set processorsCodesign to design well-balanced long-lasting processorsInstruction set selectionCache design Pipeline control
HW mechanism : flush the pipelinesSW solution : reorder instructions or insert no operation
ASIPSW : retargetable code generation for ASIP data pathHW : library bindingHigh performanceDesirable programmabilityLow unit cost than ASIC
Embedded systems and controllersReal time systems with peripheral devices (sensors, actuators)
11CS&RSOC YONSEI UNIVERSITY
Design MethodologyCo-Design Methodologies
Automation of conventional codesign
clear early bindingeasy design decisions
Constraints/requirementsanalysis
System specification
HW/SW partitioning
HW synthesisSW generation
interface synthesis
HW/SW cosimulation
Integrated systemevaluation/verification
12CS&RSOC YONSEI UNIVERSITY
Design MethodologyCo-Design Methodologies
Model-based codesignlate partitioning/binding after refiningeasy to handle design changescomponent reusemodular hierarchical models
Constraints/requirements analysis
System specification
Systemmodeling
Validation/Simulation
Verified model(desired granularity)
Technology assignment(into HW/SW/interface components)
Model base
Refinement(decompose
into submodels)
13CS&RSOC YONSEI UNIVERSITY
Design MethodologyModels and LanguagesTerm-rewriting technique
(e.g.) f(g(1,y) reduces to f(f(1) by applying the rewrite rule g(x,y) → f(x)Bundgen and Kuchlin, Univ. Tubinger, Germany
Funmath(Functional mathematics)Boute, Univ. nijmegen, Netherlands
Interaction refinement calculusBroy, Tech. Univ. Munchen, Germany
Graphical model (data flow diagram)Evans and Morris, Univ. of Manchester, U.K.
Synchronized TransitionsModel a computation as a fine-grain parallel computation.Model interface with state variables.
Staunstrup. Tech. Univ. of DenmarkHierarchical Petri Nets
Dittrich, Univ. Dortmund, GermanySolar (FSMs and state tables)
Jerraya and O’Brien, TIMA/INPG, FranceC, C++, VHDL
14CS&RSOC YONSEI UNIVERSITY
Design MethodologyModeling for Co-Simulation
Simulation of a mixed systemSW : functional model, instruction set level model, detailed architectural modelHW : functional model, netlist level model
Processor modeling at different levels of abstraction, depending on the availability and the desired simulation accuracy
Detailed processor modelDiscrete-event model of their internal HW architecture
Bus modelSimulate the activity on the periphery of a processorUseful for verifying low-level interactionsDifficult to model the activity accurately
Instruction set architecture model : efficient (C program)Compiled simulation (binary-to-binary translation) : very fastHW models (physical hardware)
15CS&RSOC YONSEI UNIVERSITY
Co-design ExampleEmbedded System Design
Importance factorsDesign timeManufacturing cost : consumer appliancesModifiability : prototype communication systemReliability : anti-lock brake system
TasksPartitioning into smaller interacting componentsAllocating the partitions to microprocessors or other hardware unitsSchedulingMapping each functional description into an implementation
16CS&RSOC YONSEI UNIVERSITY
Co-design ExampleEmbedded Microprocessor System
SW on a dedicated microprocessorThe HW is modeled/designed at a much lower level of abstraction than the SW.
applicationcode
I/O drivers
micro-processor
interface hardwareand glue logic
software
hardware
An embeded microprocessor system
17CS&RSOC YONSEI UNIVERSITY
Co-design ExampleHeterogeneous Multi-Processing System
Distributed heterogeneous embedded processor systemChoose the number and type of PEs.Map SW tasks onto PEs.Meet performance objective while minimizing HW cost.
appl.code
software
hardware
appl.code
appl.code
µ proc.type 1
µ proc.type 1
µ proc.type 1
interconnection network
...
...
18CS&RSOC YONSEI UNIVERSITY
Co-design ExampleApplication Specific Instruction Set
Application-Specific :Optimize for the given application SW
HW-SW boundary can be moved by adding/deleting instructionsModifiability should be considered in finding the best partition
software hardware
nativecode
storage
hard-wiredcontroller
datapath
An application-specific instruction set processor
19CS&RSOC YONSEI UNIVERSITY
Co-design ExampleApplication Specific Co-Processors
An Instruction set processor + custom co-processorsCustom co-processor systems afford many degrees of freedom (DSP, SISD, SIMD, MIMD)Different HW-SW partitioning strategies
Move the performance-critical regions of code into HW.Reduce HW cost without decreasing performance.Minimize the communication between HW & SW components and maximize the concurrency.Trade-off performance and cost.
software
hardware
co-processor interface logic
microprocessor
ctrl ctrldatapath
datapath
nativecode
storage
A multi-threaded custom co-processor
...
20CS&RSOC YONSEI UNIVERSITY
PartitioningPartitioning Problems
Two-way vs multi-way partitioningPerformance-driven partitioning (system performance)
SpeedPower
Layout-driven partitioningPartitioning with replication (FPGA pins)Hardware-software partitioning (trade-off)
Co-synthesis rather than partitioningSome factors are not easily quantifiable
21CS&RSOC YONSEI UNIVERSITY
PartitioningPartitioning in Co-Design
PartitioningHW/SW levelArchitecture partitioningEarly binding : easy design decisionsLate binding : easy to handle change requirements
better solution Partitioning on a high level
Hardware sharing among exclusive operationsConcurrent scheduling to each partition
Parameterized modules from library for specific design needs
Interface : signal exchange, interrupt driven, center schedulerco-simulation : parallelism vs synchronization
22CS&RSOC YONSEI UNIVERSITY
PartitioningHW-SW Partitioning
Performance requirementsSome functions may need to be implemented in HWThe overhead of synchronization and data transfer should be considered
Implementation costsHW can be shared
ModifiabilitySW can be easily changed
Nature of computationSome function may have an affinity for either HW or SWDegree of data parallelism
23CS&RSOC YONSEI UNIVERSITY
PartitioningHW-SW Partitioning
Software-preferredTask which calls OS often
Hardware-preferredArithmetic operationHigh degree of data parallelismMultiple threads of controlCustomized memory architecture
24CS&RSOC YONSEI UNIVERSITY
PartitioningSoftware-Oriented Partitioning
As many operations as possible in softwareHigh memory density of standard microprocessorsOptimally adapted compilersCarefully verified standard coresSimpler software debuggingSmaller hardwareFlexibility in case of modification
External hardwareWhen timing constraints are violatedBasic and inexpensive I/O functions
25CS&RSOC YONSEI UNIVERSITY
PartitioningHardware-Oriented Partitioning
Move hardware functions to software checking timing constraints and synchronism
Min/max delay constraintsExecution rate constraints
ChallengesInterface synchronizationHierarchical memory schemesMultiple processorsPipelining
26CS&RSOC YONSEI UNIVERSITY
PartitioningPerformance Estimation
At a low abstraction level Easy and accurateLong iteration time
At a higher level of abstractionNecessary to reduce the exploring timeCurrently quick and dirtyGoal : quick and accurate
Performance and cost estimation is important for HW-SW partitioning and for HW/SW synthesis/optimization
27CS&RSOC YONSEI UNIVERSITY
SchedulingScheduling for Real-Time Systems
Intricate timing requirements at different levelsProcessors must generate a sequence of control signals and read/write with appropriate time intervalsHigher level timing constraints
System-level scheduleraccurate run-time estimation by scheduling after resource bindingUnder sequencing, rate, and response time constraints
Low-level schedulerEach atomic sequence at system level is scheduledHardware and software device interactions are considered
28CS&RSOC YONSEI UNIVERSITY
SchedulingParallel Scheduling
The task of schedulingAssign actors to processorsOrder execution of the actors on each processorDetermine when each actor fires such that all data precedence constraints are met
Non-preemptive scheduleAn actor can not be interrupted in the middlePreemption entails a significant implementation overhead
Static/dynamic scheduleStatic scheduling for hard real-time constraintsDynamic scheduling at run time may lead to a more flexible implementation But difficult to achieve real-time performance guarantees
Performance metricThe (average) iteration period TThe throughput 1/T
29CS&RSOC YONSEI UNIVERSITY
SchedulingParallel SchedulingOptimal multiprocessor scheduling of an acyclic graph is known to be NP-hard.List scheduling is popular.
Assign priorities to tasks.The ready task with the highest priority is scheduled as soon as a processor is available.
Overlapped schedules are superior to blocked schedules with unfolding and retiming.
B C
A D
B C
A D
C AD B
C AD B
C BD A
C BD A D
C AD B
C B CA D A
Proc 1Proc 2
Proc 1Proc 2
Proc 1Proc 2
Proc 1Proc 2
(a) HSDFG acyclic precedencegraph
(b) blocked schedule
(c) overlapped schedule
t
tT= 2 t.u.
T= 3 t.u.
Fully static schedule
30CS&RSOC YONSEI UNIVERSITY
OptimizationHardware Reusability
Key to cost effective designReusability
Possibility of several operators to a HW moduleExternal controllabilityDependent on the characteristics of the circuitThe ability of the CAD tool is also importantThe functionality of HW modules can be extended or adapted by means of automatic transformations (library, structure, behavior)
31CS&RSOC YONSEI UNIVERSITY
OptimizationDesign Space Exploration
Design parametersThe number of HW functional unitsThe max number of busesThe max number of variable references in a step.
Estimation of resources, time steps, buses, variable accesses
Scheduling can be computationally intensive.A lower bound can be heuristically estimated.
Complete scheduling gives an upper boundWindow concept by the threshold W
W=1 ⇒ complete scheduleW>1 ⇒ partial schedule
(degree of freedom ≤ W)Partition DAGs, until # t_steps(DAG) ≤ WDepth first search branch and bound, to reduce memory requirement.
32CS&RSOC YONSEI UNIVERSITY
OptimizationDesign Space Exploration
Optimization problemDominating design
Each criterion is no worse than those of other designs
Find the set of designs which are not dominated by other designs
When resource estimation and partial scheduling return a design point
Discard it if it is dominated by othersIncorporate it in the design space, otherwise
The number of time steps for each DAG is initially set to its critical length, and is relaxed if a constraint is violated
33CS&RSOC YONSEI UNIVERSITY
Co-SImulation SystemsBecker, Singh, and Tell(1992)
NIU Firmware inC++
Interfacefunctions
Co-simulationsupport library
Unix
NIU HW descriptionin Verilog
SPARCmodel
Portmodel
other devicemodel
IPC tasks Builtin tasks
Verilog-XL
Unix
NIU Monitorprogram in C++
Interfacefunctions
Co-simulationsupport library
Unix
Systemunder
Development
Co-simulation Environment
Network
Co-simulation of Network Interface Unit (NIU) using PLI of Verilog-XL simulator, C++, and Unix SocketSynchronized handshake + cycle accurate processor interface model
34CS&RSOC YONSEI UNIVERSITY
Co-SImulation SystemsThomas, Adams, and Schmit (1993)
Co-simulation using PLI of Verilog-XL simulator and Unix SocketSynchronized handshake with no processor model
softwareprocess 1
softwareprocess 2
Unix sockets
hardwareprocess 1
hardwareprocess 2
Verilog PLI
Bus interface module
Application-specifichardware module
Verilog simulator
35CS&RSOC YONSEI UNIVERSITY
Co-SImulation SystemsPoseidon
Gupta, Coelho, and Ed Micheli, 1992Pin-level(cycle accurate) model of processorsSoftware in assembly code of the target processor(DLX)
Ariadne
DLXSimulator MercuryPOSEIDON
System Graph Model
DLX Assembly Code Gate-Level Description
36CS&RSOC YONSEI UNIVERSITY
Timed Co-SimulationTimed Co-Simulation
Untimed co-simulationHardware and software time scales are not synchronizedGood for verifying functional completenessCorrect results for handshaking protocol
Timed co-simulationHardware and software time scale are synchronized at each data exchangeUseful for overall performance assessment Can be used for both handshaking and non-handshaking protocol
37CS&RSOC YONSEI UNIVERSITY
Timed Co-SimulationHow to implement
Nano-second accurate co-simulationNeed expensive processor modelVery slow
Cycle-based co-simulationNeed cycle-based processor modelSlow
Synchronization through software timing estimationNo need of processor simulation modelFast
38CS&RSOC YONSEI UNIVERSITY
Timed Co-SimulationSoftware Timing Estimation
simulationcompiler
host binary
target binary
simulationcompiler C compiler
Cprogram
target binary
host binary
Modify the intermediate C program for profiling
39CS&RSOC YONSEI UNIVERSITY
Timed Co-SimulationSynchronization
LockstepSynchronize at every stepLarge IPC overhead
OptimisticSimulate to optimistic next synchronization pointIn the event of inconsistency, rollback
40CS&RSOC YONSEI UNIVERSITY
Timed Co-SimulationNon-IPC-based implementation
SW instructions in the event list
HWevents & elements
SWinstruction
bend_loopstart, LT
lacl var
SW clockperiod(*n)
41CS&RSOC YONSEI UNIVERSITY
Timed Co-SimulationTimed Co-Simulation
Implementation with VHDL description
procedure sw_part (data : inout integer;next : out TIME;
)
attribute foreign of sw_part : procedure is “C”;
-- in the architecture bodySW_processor : processvariable next_time : TIME;begin
sw_part(data=>hw_data, next=>next_time);wait for next_time - now;
end process;
42CS&RSOC YONSEI UNIVERSITY
ConclusionConclusion
Suggestion for Co-design Design systems/prototypes by using co-design models
This facilitates postponing partitioning decisionThis constitutes reusable modules
Develop successful partitioning methodsDevelop validation techniques
Co-simulation (C+VHDL)HW/SW emulationFormal verification
Co-simulation Open IssuesWell-defined abstract models for HW-SW co-simulationSimulation-based performance estimationTime-accurate and fast co-simulationIntegration with other co-design tools