Simics and Friends – Modeling Tools for CMP Research
Zvika Guz, Isask’har (Zigi) Walter
The Technion – Israel Institute of Technology
Agenda
Official Agenda
n Review the most commonly used tools in CMP µarch research
¨ Simulators
¨ Benchmarks
Unofficial Agenda
n Convince you to use Simics
¨ Because more often than not it is the best option
¨ Because we need more (geographically adjacent) people
Not on Our Agenda
n Teaching the tools
Outline
n Choosing a Simulator
n Simics
n And friends
¨ GEMS, Garnet & Orion, FeS2, SimFlex
n OPNET - modeling CMP interconnect
n Benchmarks
n Summary
¨ Technion goodies
Choosing A Simulator
[Figure: simulator design space spanning Detail, Flexibility, Performance, and Ease Of Use]
n What should it model?
¨ Processor/Cache/Interconnect/etc.
n What would run on it?
¨ Benchmark types
CMP Research (very partial list)
n System-wide architecture
n Asymmetric CMP
n Memory hierarchy
¨ Caches
¨ Coherence protocols
n Interaction between HW and SW
¨ Hybrid transactional memory
n Interactions among the cores
¨ Task assignment/migration
n Interconnect
¨ Network-On-Chip
Choosing a Simulator for CMP Research
n What will it model?
¨ Multiple cores
¨ Memory hierarchy (caches, coherence)
¨ Interconnect (NoC)
n What will run on it?
¨ Multi-threaded benchmarks ← Need OS for that
¨ Commercial workloads ← Really need OS for that
⇒ Full-system simulator, capable of booting a (commercial) OS
Meet the Contenders
n SimpleScalar
¨ Uniprocessor
n PIN
¨ Not a simulator
n Several in-house tools
¨ Not relevant
n M5
n Simics
n ?
Why Simics? (the short answer)
n Because everyone is using it
¨ THE most widely used simulator in our field
¨ 1/3 of ISCA’07 papers used Simics
n Huge, active community
n Alive and kicking forum
n Because it is free
¨ For academia
n Up to Simics 4.2 ☹
n ...Oh... and because it is really, really good!
Simics in a Nutshell
n Virtual Hardware
¨ Event driven
¨ Cycle accurate*
n Complete production software
¨ The software can’t tell the difference
¨ Runs binaries from real target
[Figure: target software (user program, middleware, DB, Java VM, operating system, drivers, boot firmware, hardware abstraction layer) sits above the HW/SW interface and runs on simulated (virtual) hardware (CPU, RAM, FLASH, ROM, A/D, PCI, I2C, bus, network, disk, disk controller, user interface device)]
http://www.virtutech.com/
Simics Overview (1/3)
“Simics is a flexible, scalable, and high-performance full-system simulator”
n A software, event-driven simulator
n Full-system simulator
¨ Processor
¨ Memory hierarchy (DRAM, Disk)
¨ Network
¨ Devices (DMA, Interrupt controller, PCI, etc.)
n Runs unmodified binaries
¨ OS, drivers and applications
¨ Models the entire machine that the OS sees
¨ Application cannot tell the difference
Simics Overview (2/3)
n Fully supported ISAs:
¨ SPARC
¨ x86
¨ Alpha, Itanium, MIPS, ARM, ...
n Scalable:
¨ Single processor (uniprocessor/CMP) → MPs → Racks → Clusters → Distributed systems
Simics Overview (3/3)
n Flexible
¨ Different degrees of simulation (details)
n Functionality only
n Microarchitecture and timing
¨ Configurable
n Hook/unhook modules
n Control their timing
n Write your own (in C++)
“Demo”
[Screenshots: Simics consoles running Solaris/PowerPC, RedHat 7.2/Itanium, NT/x86, RedHat 6.2/x86, RedHat 7.2/Pentium III, XP/x86-64, and Solaris 8/UltraSparc II targets]
What Have We Seen?
[Figure: layered stack, top to bottom: user application code, middleware and libraries, target operating system(s), virtual target hardware, Simics, host operating system, host hardware]
Simics Provides:
n Checkpoints
¨ Save/restore state
n Breakpoints
¨ Temporal breakpoints
¨ Break on memory/register/IO
¨ Graphics breakpoint
n Magic instructions
¨ Signal Simics from within your application
n Access host files from the simulated machine
n So much more...
Simics Timing Models
n Default mode (10X-100X slowdown)
¨ Every instruction takes exactly 1 clock cycle
n Including access to disk, access to memory, etc.
n In-order mode (1000X-10000X slowdown)
¨ User defines a timing model function which will be called when a memory request occurs
¨ Function returns the number of cycles to stall
n Out-of-order mode (MAI mode) (10000X-1 million X slowdown)
¨ Detailed out-of-order µarch simulation
¨ User-defined processor model
n Full control over how instructions advance
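To get a feel for what these slowdown factors mean in practice, the arithmetic can be sketched as below. This is only an illustration: it assumes the three slowdown ranges on the slide belong to the default, in-order, and out-of-order modes in that order, and the function name `host_seconds` is made up for this sketch.

```python
# Slowdown ranges quoted on the slide, as (low, high) multipliers.
# Assumption: the ranges map to the three modes in the order listed.
SLOWDOWN = {
    "default": (10, 100),
    "in-order": (1_000, 10_000),
    "out-of-order": (10_000, 1_000_000),
}


def host_seconds(target_seconds, mode):
    """Return (best, worst) host wall-clock seconds for a simulated interval."""
    low, high = SLOWDOWN[mode]
    return target_seconds * low, target_seconds * high


if __name__ == "__main__":
    for mode in SLOWDOWN:
        best, worst = host_seconds(1.0, mode)
        print(f"{mode:>12}: {best:,.0f} s to {worst:,.0f} s per simulated second")
```

Under these assumptions, one simulated second in out-of-order mode can cost anywhere from a few hours to more than eleven days of host time, which is why the faster modes are used to reach the region of interest first.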
Simics Timing - default
n Emulation mode
n Used for fast-forwarding
¨ Boot OS
¨ Build workload
¨ Fast-forward to the relevant part of execution
n Basically, used for creating a checkpoint
Simics Timing – in order
n Timing model is a C program
n You can act on every memory access
n Usually used for modeling:
¨ Caches (and cache hierarchies)
¨ Coherency protocols (directory)
¨ Hardware/Hybrid transactional memory
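The idea of a timing model that is called on every memory access and answers with a stall count can be sketched as follows. This is a toy in Python rather than C, and it does not use the Simics API: the class `DirectMappedCache`, the method `on_memory_access`, and the 100-cycle miss penalty are all assumptions made for illustration.

```python
class DirectMappedCache:
    """Toy direct-mapped cache used as a stall-cycle timing model."""

    def __init__(self, num_lines=256, line_size=64, miss_penalty=100):
        self.num_lines = num_lines
        self.line_size = line_size
        self.miss_penalty = miss_penalty  # assumed memory latency, in cycles
        self.tags = [None] * num_lines    # one stored tag per cache line
        self.hits = 0
        self.misses = 0

    def on_memory_access(self, address):
        """Called once per memory request; returns the cycles to stall."""
        block = address // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return 0                      # hit: no extra stall
        self.tags[index] = tag            # miss: fill the line
        self.misses += 1
        return self.miss_penalty


if __name__ == "__main__":
    cache = DirectMappedCache(num_lines=4)
    for addr in (0, 8, 64, 0, 256):
        print(hex(addr), cache.on_memory_access(addr))
```

The simulator stays in charge of functional execution; the model only decides how long each access takes, which is what makes this mode a natural fit for cache and coherence studies.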
Simics Timing – Out Of Order Mode
n Gives full control over timing
¨ User decides when things happen
n Fetch/decode/execute/commit
¨ Simics handles how these things happen
n MAI supports:
¨ Out-of-order execution, multi-processor, multi-threading, branch prediction, value prediction
n Used for processor µarch research
¨ Models processor internals
n And whenever you need a better notion of time
¨ Interconnect study
Simple Example – Adding Cache (1/4)
n Nahalal – A new cache architecture for CMP
n Architectural differentiation of cache lines at runtime
¨ According to usage - Private vs. Shared
[Figure: two eight-core CMP layouts (CPU0 to CPU7)]
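Differentiating cache lines by usage at runtime can be pictured with a small bookkeeping sketch. This is not the mechanism from the Nahalal paper; it is a hypothetical tracker (`LineUsageTracker` is a made-up name) that simply labels a line private when one core has touched it and shared when more than one has.

```python
from collections import defaultdict


class LineUsageTracker:
    """Toy classifier: label cache lines by how many cores have used them."""

    def __init__(self):
        # line address -> set of core ids that accessed the line
        self.sharers = defaultdict(set)

    def record_access(self, core_id, line_address):
        self.sharers[line_address].add(core_id)

    def classify(self, line_address):
        # .get avoids creating an entry for lines that were never accessed
        cores = self.sharers.get(line_address, set())
        if not cores:
            return "untouched"
        return "private" if len(cores) == 1 else "shared"


if __name__ == "__main__":
    tracker = LineUsageTracker()
    tracker.record_access(core_id=0, line_address=0x40)
    print(tracker.classify(0x40))
    tracker.record_access(core_id=3, line_address=0x40)
    print(tracker.classify(0x40))
```

A placement policy could then use the label to decide where a line should live, for example closer to its single user or in a region reachable by all cores.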
Simple Example – Adding Cache (2/4)
1. Writing a cache timing model
¨ C program
Simple Example – Adding Cache (3/4)
2. Hooking the new cache into Simics
¨ Python script
Simple Example – Adding Cache (4/4)
3. Run Simics and collect statistics
Simics in Research
n “Virtual Hierarchies,” M. R. Marty and M. D. Hill, Micro's Top Picks 2008
n “Improving Multiple-CMP Systems Using Token Coherence,” M. R. Marty, J. D. Bingham, M. D. Hill, A. J. Hu, M. K. Martin and D. A. Wood, HPCA 2005
n “Nahalal: Cache Organization for Chip Multiprocessors,” Z. Guz, I. Keidar, A. Kolodny, U. C. Weiser, IEEE Computer Architecture Letters, May 2007
n “Memory Mapped ECC: Low-Cost Error Protection for Last Level Caches,” D. H. Yoon and M. Erez, ISCA 2009
n “TokenTM: Efficient Execution of Large Transactions with Hardware Transactional Memory,” J. Bobba, N. Goyal, M. D. Hill, M. M. Swift, and D. A. Wood, ISCA 2008
n “Predicting the Performance of Reconfigurable Optical Interconnects in Distributed Shared-Memory Systems,” W. Heirman, J. Dambre, I. Artundo, C. Debaes, H. Thienpont, D. Stroobandt, J. Van Campenhout, Photonic Network Communications ’08
n “Serializing Instructions in System-Intensive Workloads: Amdahl's Law Strikes Again,” P. M. Wells, G. S. Sohi, HPCA 2008
n “Predictor Virtualization,” I. Burcea, S. Somogyi, A. Moshovos and B. Falsafi, ASPLOS 2008
Add-ons for Simics
n Open-source add-ons extend Simics' capabilities
n Some are as popular as Simics itself
n GEMS
n Garnet & Orion
n SimFlex
n FeS2
Multifacet GEMS
“GEMS is a set of modules for Virtutech Simics that enables detailed simulation of multiprocessor systems, including CMP.”
n The most mature Simics add-on
¨ Most of ISCA’s Simics papers actually use GEMS
¨ Alive and active forum
n Two main components
¨ Ruby – Memory system timing simulator
¨ Opal – Timing model for OOO processor
n Flexible
¨ Can be configured/altered/hacked
¨ Add your own models
http://www.cs.wisc.edu/gems/
GEMS Ruby
n Cache hierarchy
¨ L1, L2 (private/shared), SNUCA/DNUCA, Simple DRAM
n Different coherence protocols
¨ Snoop, Directory, Token coherence
¨ Write your own
n HW transactional memory
¨ LogTM
¨ Sun’s Rock
n Interconnect
¨ Simple
¨ Garnet - detailed NoC interconnect
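As a rough picture of what a directory coherence protocol has to track, here is a toy MSI-style directory entry for a single cache line. It is a simplified sketch, not Ruby's protocol specification or any GEMS code: the class `DirectoryEntry` and its methods are hypothetical, and real protocols add transient states, acknowledgements, and data transfer.

```python
class DirectoryEntry:
    """Toy directory state for one cache line under an MSI-style protocol."""

    def __init__(self):
        self.state = "I"      # I (invalid), S (shared), M (modified)
        self.sharers = set()  # cores currently holding the line

    def read(self, core):
        """Handle a read; return the cores that must downgrade from M."""
        if self.state == "M" and core in self.sharers:
            return []         # the owner already holds the line
        downgraded = sorted(self.sharers) if self.state == "M" else []
        self.sharers.add(core)
        self.state = "S"
        return downgraded

    def write(self, core):
        """Handle a write; return the cores whose copies are invalidated."""
        invalidated = sorted(self.sharers - {core})
        self.sharers = {core}
        self.state = "M"
        return invalidated


if __name__ == "__main__":
    entry = DirectoryEntry()
    entry.read(0)
    entry.read(1)
    print(entry.write(2), entry.state, entry.sharers)
```

Each returned list corresponds to coherence messages that would travel over the interconnect, which is where the memory system model and the network model meet.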
CMP is More than CPUs and Memory…
[Figure: CMP with four CPUs, each with an L1$, and eight L2$ banks]
n We need to model the interconnect too
¨ Might have a paramount effect on performance and power
n Sometimes, this is all we need!
Simulate the Interconnect? Why Bother?
n Important part of the system!
n Static modeling can account for static attributes
¨ Topology, routing, link bandwidth, packet size, etc.
n Run-time effects are much harder to (statically) model
¨ Shared resource arbitration, finite buffer sizes, channel multiplexing, flow control, …
¨ Might be dominating factors
¨ Driving home during rush hours
Network vs. Full System Simulator
n NoC is a network!
¨ Use a network-oriented tool with built-in support for traffic modeling
¨ Eliminate the complex system simulator if not really needed
n Perfect tool for optimizing the interconnect
¨ Architecture, topology, protocols, parameter tuning, etc.
n Easy programming and debugging
n Fast!
¨ “Fastest discrete event simulation engine among leading industry solutions”
OPNET Modeler Features
n Object-oriented modeling
n Hierarchical modeling environment
n GUI-based debugging and analysis
n Event-driven simulation engine
n Coding in C/C++ & auxiliary functions
n Open interface for integrating external object files, libraries, and other simulators
n Asynchronous/synchronous modeling
OPNET in CMP Research
n “QNoC: QoS architecture and design process for Network on Chip,” E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, Special issue on Networks on Chip, The Journal of Systems Architecture, December 2003
n “Network Delays and Link Capacities in Application-Specific Wormhole NoCs,” Z. Guz, I. Walter, E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, VLSI Design, vol. 2007, Article ID 90941, May 2007
n “Routing Table Minimization for Irregular Mesh NoCs,” E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, DATE 2007
n “Access Regulation to Hot-Modules in Wormhole NoCs,” I. Walter, I. Cidon, R. Ginosar, A. Kolodny, NOCS 2007
n “The Power of Priority: NoC based Distributed Cache Coherency,” E. Bolotin, Z. Guz, I. Cidon, R. Ginosar, A. Kolodny, NOCS 2007
n “Best of Both Worlds: A Bus Enhanced NoC (BENoC),” R. Manevich, I. Walter, I. Cidon, and A. Kolodny, the ACM/IEEE Int. Symp. on Networks-on-Chip (NOCS), 2009
Bus-Enhanced Network on-Chip
n A new interconnect architecture, utilizing “the best of both worlds”
¨ Use NoC for data delivery
¨ Use bus for lightweight, latency-critical meta-data
n Coherency
n Evaluated using OPNET and Simics
[Figure: BENoC layout with modules connected through routers (R)]
Gluing OPNET to Simics
n Run OPNET as a trace-driven simulator
¨ L2 access logs generated by Simics
n Advantages
¨ Fast
¨ Simple
n Disadvantages
¨ Dependencies are lost
¨ Does not account for latency hiding techniques (e.g. OOO)
n But...
¨ OPNET can be glued to Simics using Ruby
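The trace-driven flow, and why dependencies get lost, can be sketched as follows. The log format (`cycle core address` per line) and the function names `parse_trace` and `replay` are hypothetical; they do not reflect the actual Simics log output or any OPNET interface.

```python
def parse_trace(lines):
    """Parse 'cycle core address' records (a hypothetical log format)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        cycle, core, address = line.split()
        records.append((int(cycle), int(core), int(address, 16)))
    return sorted(records)                # replay in timestamp order


def replay(records, network_latency):
    """Inject each request at its recorded cycle; return completion records."""
    return [(cycle + network_latency, core, address)
            for cycle, core, address in records]


if __name__ == "__main__":
    log = ["# cycle core address", "10 0 0x1000", "5 1 0x2040"]
    records = parse_trace(log)
    for latency in (20, 200):
        print(latency, replay(records, latency))
```

Note that the injection cycles come straight from the log: making the network ten times slower delays each completion but never delays the next request from the same core, which is exactly the lost-dependency limitation listed above.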
Meet the Contenders
n CPU2006, CPU2000
n OMP2001
n JBB2005, JBB2000
n SPLASH-2
n PARSEC
n Commercial workloads
¨ Apache
¨ Databases
¨ ?
n ?
Benchmark Comparison
Columns: CPU2006, OMP2001, SPLASH-2, PARSEC, Commercial
Programs: 29, 11, 14, 13, 1
Multi-Threaded: ☹☹☹ ☺ ☺ ☺ ☺
Diverse: ☹ ☹ ☺
Updated: ☹ ☺
Emerging apps: ☹ ☺
Installation ease: ☺ ☺ ☺ ☹☹☹
Simulation friendly: ☹☹☹ ☹ ☺☺☺ ☹☹☹
The PARSEC Benchmark Suite
n Over 1000 downloads since release
n This is what everyone will be using
http://parsec.cs.princeton.edu/
Technion Goodies
n http://www.ee.technion.ac.il/matrics/software.html
n Simics workload kits
¨ Ease installation of Simics workloads
n Wisconsin GEMS provides a few others too
¨ Constantly adding more workloads to the pool
n Can you help?
n OPNET models for NoC
¨ Our entire QNoC model for OPNET
n Cores, routers and links, SNUCA/DNUCA L2 caches
n Routing schemes, arbitration policies, resource contention
n Synthetic/trace-driven simulation
n Transactified version of Apache
Summary
n A swift overview of simulation tools for CMP
¨ Simics
¨ GEMS
¨ OPNET
¨ Benchmarks
n Technion’s two cents
Questions?