
Simics and Friends - Modeling Tools for CMP Research

Zvika Guz, Isask’har (Zigi) Walter

The Technion - Israel Institute of Technology

Agenda

Official Agenda
n Review the most commonly used tools in CMP µarch research
¨ Simulators
¨ Benchmarks

Unofficial Agenda
n Convince you to use Simics
¨ Because more often than not it is the best option
¨ Because we need more (geographically adjacent) people

Not on Our Agenda
n Teaching the tools

Outline

n Choosing a Simulator
n Simics
n And friends
¨ GEMS, Garnet & Orion, FeS2, SimFlex
n OPNET - modeling CMP interconnect
n Benchmarks
n Summary
¨ Technion goodies


Choosing A Simulator

[Figure: simulator trade-offs: Detail, Flexibility, Performance, Design Space, Ease Of Use]

n What should it model?
¨ Processor/Cache/Interconnect/etc.
n What would run on it?
¨ Benchmark types

CMP Research (very partial list)

n System-wide architecture
n Asymmetric CMP
n Memory hierarchy
¨ Caches
¨ Coherence protocols
n Reciprocation between HW and SW
¨ Hybrid Transactional memory
n Interactions among the cores
¨ Task assignment/migration
n Interconnect
¨ Network-On-Chip

Choosing a Simulator for CMP Research

n What will it model?
¨ Multiple cores
¨ Memory hierarchy (caches, coherence)
¨ Interconnect (NoC)
n What will run on it?
¨ Multi-threaded benchmarks ← Need OS for that
¨ Commercial workloads ← Really need OS for that

⇒ Full-system simulator, capable of booting a (commercial) OS


Meet the Contenders

n SimpleScalar
¨ Uniprocessor
n PIN
¨ Not a simulator
n Several in-house tools
¨ Not relevant
n M5
n Simics
n ?

Why Simics? (the short answer)

n Because everyone is using it
¨ THE most widely used simulator in our field
¨ 1/3 of ISCA’07 papers used Simics
n Huge, active community
n Alive and kicking forum
n Because it is free
¨ For academia
n Up to Simics 4.2 ☹
n ... Oh, and because it is really, really good!

Simics in a Nutshell

n Virtual Hardware
¨ Event driven
¨ Cycle accurate*

Runs binaries from real target
The software can’t tell the difference

[Figure: complete production software (user program, middleware, DB, Java VM, operating system, drivers, boot firmware, hardware abstraction layer) as target software, running across the HW/SW interface on simulated (virtual) hardware: CPUs, RAM, FLASH, ROM, A/D, user interface device, PCI, I2C, bus, network, disk, disk controller]

http://www.virtutech.com/

Simics Overview (1/3)

“Simics is a flexible, scalable, and high-performance full-system simulator”

n A software, event-driven simulator
n Full-system simulator
¨ Processor
¨ Memory hierarchy (DRAM, Disk)
¨ Network
¨ Devices (DMA, Interrupt controller, PCI, etc.)
n Runs unmodified binaries
¨ OS, drivers and applications
¨ Models the entire machine that the OS sees
¨ Application cannot tell the difference

http://www.virtutech.com/

Simics Overview (2/3)

“Simics is a flexible, scalable, and high-performance full-system simulator”

n Fully supported ISAs:
¨ SPARC
¨ X86
¨ Alpha, Itanium, MIPS, ARM, ..
n Scalable:
¨ Single processor (uniprocessor/CMP) → MPs → Racks → Clusters → Distributed systems

http://www.virtutech.com/

Simics Overview (3/3)

“Simics is a flexible, scalable, and high-performance full-system simulator”

n Flexible
¨ Different degrees of simulation (details)
n Functionality only
n Microarchitecture and timing
¨ Configurable
n Hook/unhook modules
n Control their timing
n Write your own (in C++)

http://www.virtutech.com/

“Demo”

[Screenshots: Simics consoles next to simulated targets: Solaris/PowerPC, RedHat 7.2/Itanium, NT/x86, RedHat 6.2/x86, RedHat 7.2/Pentium III, XP/x86-64, Solaris 8/UltraSparc II]

http://www.virtutech.com/

What Have We Seen?

[Figure: layered stack, top to bottom: user application code; middleware and libraries; target operating system(s); virtual target hardware; Simics; host operating system; host hardware]

http://www.virtutech.com/

Simics Provides:

n Checkpoints
¨ Save/restore state
n Breakpoints
¨ Temporal breakpoints
¨ Break on memory/Register/IO
¨ Graphics breakpoint
n Magic instructions
¨ Signal Simics from within your application
n Access host files from the simulated machine
n So much more..

Simics Timing Models

n Default mode (10X-100X slowdown)
¨ Every instruction takes exactly 1 clock cycle
n Including access to disk, access to memory, etc.
n In-order mode (1000X-10000X slowdown)
¨ User defines a timing model function which will be called when a memory request occurs
¨ Function returns the number of cycles to stall
n Out-of-order mode (MAI mode) (10000X-1 million X slowdown)
¨ Detailed out-of-order µarch simulation
¨ User-defined processor model
n Full control over how instructions advance
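To get a feel for what these slowdown factors mean in practice, the sketch below converts simulated (target) time into host wall-clock time. Pairing each slowdown range with a mode follows the ascending order of the three figures on the slide and is an assumption, as are the helper names and the 1-second workload.

```python
# Slowdown ranges (min, max) per Simics timing mode, as quoted on the slide.
# Assumption: ranges are paired with modes in ascending order of cost.
SLOWDOWN = {
    "default": (10, 100),
    "in-order": (1_000, 10_000),
    "out-of-order (MAI)": (10_000, 1_000_000),
}


def host_seconds(target_seconds, mode):
    """Return (best, worst) host wall-clock seconds for a given target time."""
    low, high = SLOWDOWN[mode]
    return target_seconds * low, target_seconds * high


def pretty(seconds):
    """Format a duration using the largest unit that fits."""
    for unit, size in (("days", 86_400), ("hours", 3_600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    # One second of target execution under each mode.
    for mode in SLOWDOWN:
        best, worst = host_seconds(1.0, mode)
        print(f"{mode:20s} {pretty(best)} to {pretty(worst)}")
```

Under these numbers, one target second in out-of-order mode can cost anywhere from a few hours to more than a week of host time, which is why the fast mode is used to reach the region of interest first.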

Simics Timing - default

n Emulation mode
n Used for fast-forwarding
¨ Boot OS
¨ Build workload
¨ Fast-forward to relevant execution part
n Basically, used for creating a checkpoint

Simics Timing - in order

n Timing model is a C program
n You can act on every memory access
n Usually used for modeling:
¨ Caches (and cache hierarchies)
¨ Coherency protocols (directory)
¨ Hardware/Hybrid transactional memory
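The idea of a timing model that is consulted on every memory access and answers with a stall count can be sketched outside Simics. The following is a standalone toy in Python, not the Simics timing-model interface (which, per the slide, is written in C); the class name, the `operate` method, and the penalty values are all assumptions for illustration.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: answers "how many cycles does this access stall?"."""

    def __init__(self, num_lines=256, line_size=64, hit_penalty=0, miss_penalty=100):
        self.num_lines = num_lines
        self.line_size = line_size
        self.hit_penalty = hit_penalty
        self.miss_penalty = miss_penalty
        self.tags = [None] * num_lines  # tag held by each line, None = invalid
        self.hits = 0
        self.misses = 0

    def operate(self, address):
        """Called once per memory access; returns the number of stall cycles."""
        block = address // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return self.hit_penalty
        self.tags[index] = tag  # fill the line on a miss, evicting the old tag
        self.misses += 1
        return self.miss_penalty


if __name__ == "__main__":
    cache = DirectMappedCache()
    # The fifth address maps to the same line as 0x1000 and evicts it.
    trace = [0x1000, 0x1008, 0x1040, 0x1000, 0x1000 + 256 * 64, 0x1000]
    stalls = [cache.operate(a) for a in trace]
    print(stalls, cache.hits, cache.misses)
```

A hierarchy can be built from the same shape by letting a miss call the `operate` method of the next level and adding the returned stall to its own.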

Simics Timing - Out Of Order Mode

n Gives full control over timing
¨ User decides when things happen
n Fetch/decode/execute/commit
¨ Simics handles how these things happen
n MAI supports:
¨ Out-of-order execution, multi-processor, multi-threading, branch prediction, value prediction
n Used for processor µarch research
¨ Models processor internals
n And whenever you need a better notion of time
¨ Interconnect study

Simple Example - Adding Cache (1/4)

n Nahalal - A new cache architecture for CMP
n Architectural differentiation of cache lines at runtime
¨ According to usage - Private vs. Shared

[Figure: two eight-core chip layouts, CPU0 to CPU7]
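The private-vs-shared distinction can be illustrated with a toy classifier that treats a cache line as private while only one CPU has touched it and as shared once a second CPU does. This is only a sketch of the usage-based idea; the class and method names are hypothetical, and it does not reproduce the placement policy of the Nahalal design itself.

```python
from collections import defaultdict


class LineClassifier:
    """Labels cache lines by how many distinct CPUs have accessed them."""

    def __init__(self, line_size=64):
        self.line_size = line_size
        self.sharers = defaultdict(set)  # line number -> set of CPU ids seen

    def access(self, cpu, address):
        """Record an access and return the line's current label."""
        line = address // self.line_size
        self.sharers[line].add(cpu)
        return self.classify(line)

    def classify(self, line):
        """A line touched by more than one CPU counts as shared."""
        return "shared" if len(self.sharers[line]) > 1 else "private"


if __name__ == "__main__":
    classifier = LineClassifier()
    print(classifier.access(0, 0x100))  # first toucher: private
    print(classifier.access(3, 0x13F))  # same 64-byte line, second CPU: shared
    print(classifier.access(3, 0x140))  # next line, one CPU so far: private
```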

Simple Example - Adding Cache (2/4)

1. Writing a cache timing model
¨ C program

Simple Example - Adding Cache (3/4)

2. Hooking the new cache into Simics
¨ Python script

Simple Example - Adding Cache (4/4)

3. Run Simics and collect statistics

Simics in Research

n "Virtual Hierarchies", M. R. Marty and M. D. Hill, Micro's Top Picks 2008
n "Improving Multiple-CMP Systems Using Token Coherence", M. R. Marty, J. D. Bingham, M. D. Hill, A. J. Hu, M. K. Martin and D. A. Wood, HPCA 2005
n "Nahalal: Cache Organization for Chip Multiprocessors", Z. Guz, I. Keidar, A. Kolodny, U. C. Weiser, IEEE Computer Architecture Letters, May 2007
n "Memory Mapped ECC: Low-Cost Error Protection for Last Level Caches", D. H. Yoon and M. Erez, ISCA 2009
n "TokenTM: Efficient Execution of Large Transactions with Hardware Transactional Memory", J. Bobba, N. Goyal, M. D. Hill, M. M. Swift, and D. A. Wood, ISCA 2008
n "Predicting the Performance of Reconfigurable Optical Interconnects in Distributed Shared-Memory Systems", W. Heirman, J. Dambre, I. Artundo, C. Debaes, H. Thienpont, D. Stroobandt, J. Van Campenhout, Photonic Network Communications ’08
n "Serializing Instructions in System-Intensive Workloads: Amdahl's Law Strikes Again", P. M. Wells, G. S. Sohi, HPCA 2008
n "Predictor Virtualization", I. Burcea, S. Somogyi, A. Moshovos and B. Falsafi, ASPLOS 2008


Add-ons for Simics

n Open-source add-ons extend Simics' capabilities
n Some are as popular as Simics itself
n GEMS
n Garnet & Orion
n SimFlex
n FeS2

Multifacet GEMS

“GEMS is a set of modules for Virtutech Simics that enables detailed simulation of multiprocessor systems, including CMP.”

n The most mature Simics add-on
¨ Most of ISCA’s Simics papers actually use GEMS
¨ Alive and active forum
n Two main components
¨ Ruby - Memory system timing simulator
¨ Opal - Timing model for OOO processor
n Flexible
¨ Can be configured/altered/hacked
¨ Add your own models

http://www.cs.wisc.edu/gems/

GEMS Ruby

n Cache hierarchy
¨ L1, L2 (private/shared), SNUCA/DNUCA, Simple DRAM
n Different coherence protocols
¨ Snoop, Directory, Token coherence
¨ Write your own
n HW transactional memory
¨ LogTM
¨ Sun’s Rock
n Interconnect
¨ Simple
¨ Garnet - detailed NoC interconnect

http://www.cs.wisc.edu/gems/


CMP is More than CPUs and Memory…

n We need to model the interconnect too
¨ Might have a paramount effect on performance and power
n Sometimes, this is all we need!

[Figure: CMP floorplan with four CPUs, each with an L1$, and rows of L2$ banks]

Simulate the Interconnect? Why Bother?

n Important part of the system!
n Static modeling can account for static attributes
¨ Topology, routing, link bandwidth, packet size, etc.
n Run-time effects are much harder to (statically) model
¨ Shared resource arbitration, finite buffer sizes, channel multiplexing, flow control, …
¨ Might be dominating factors
¨ Driving home during rush hours
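The gap between a static estimate and run-time behavior shows up even in a toy model of one link with a finite buffer. In the sketch below, two traffic patterns with the same average load (8 packets in 8 cycles) see different latencies, and a smaller buffer starts dropping packets. The function name and the one-packet-per-cycle link are assumptions chosen to keep the example small.

```python
from collections import deque


def simulate_link(arrivals, buffer_size):
    """Simulate one link that forwards one packet per cycle from a FIFO buffer.

    arrivals[t] is the number of packets injected at cycle t.
    Returns (average latency in cycles of delivered packets, dropped count).
    """
    queue = deque()  # arrival cycle of each buffered packet
    latencies = []
    dropped = 0
    cycle = 0
    while cycle < len(arrivals) or queue:
        injected = arrivals[cycle] if cycle < len(arrivals) else 0
        for _ in range(injected):
            if len(queue) < buffer_size:
                queue.append(cycle)
            else:
                dropped += 1  # finite buffer: the packet is rejected
        if queue:
            # The link delivers the oldest buffered packet this cycle.
            latencies.append(cycle - queue.popleft() + 1)
        cycle += 1
    average = sum(latencies) / len(latencies) if latencies else 0.0
    return average, dropped


if __name__ == "__main__":
    smooth = [1] * 8                   # one packet every cycle
    bursty = [4, 0, 0, 0, 4, 0, 0, 0]  # same load, arriving in bursts
    print(simulate_link(smooth, buffer_size=4))
    print(simulate_link(bursty, buffer_size=4))
    print(simulate_link(bursty, buffer_size=2))
```

A static model that only knows the link bandwidth would report the same latency for both patterns; the queueing delay and the drops appear only when the traffic is actually played through the buffer.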

Network vs. Full System Simulator

n NoC is a network!
¨ Use a network-oriented tool with built-in support for traffic modeling
¨ Eliminate the complex system simulator if it is not really needed
n Perfect tool for optimizing the interconnect
¨ Architecture, topology, protocols, parameter tuning, etc.
n Easy programming and debugging
n Fast!
¨ “Fastest discrete event simulation engine among leading industry solutions”

OPNET Modeler Features

n Object-oriented modeling
n Hierarchical modeling environment
n GUI-based debugging and analysis
n Event-driven simulation engine
n Coding C/C++ & auxiliary functions
n Open interface for integrating external object files, libraries, and other simulators
n Asynchronous/synchronous modeling
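For readers new to event-driven simulation, the core of such an engine is a time-ordered queue of pending events. The sketch below is a generic minimal loop in Python, not OPNET's engine or API; `EventSimulator`, `schedule`, and the 2.0 time-unit hop delay are all assumptions.

```python
import heapq
import itertools


class EventSimulator:
    """Minimal discrete-event loop: always run the earliest pending event next."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._counter = itertools.count()  # tie-breaker: equal times run in FIFO order

    def schedule(self, delay, action, *args):
        """Run action(*args) once, `delay` time units after the current time."""
        heapq.heappush(self._queue, (self.now + delay, next(self._counter), action, args))

    def run(self):
        """Process events in time order until none are left."""
        while self._queue:
            self.now, _, action, args = heapq.heappop(self._queue)
            action(*args)


if __name__ == "__main__":
    sim = EventSimulator()
    log = []

    def arrive(node, hops_left):
        """A packet reaches `node`; forward it if it has hops left."""
        log.append((sim.now, node))
        if hops_left:
            sim.schedule(2.0, arrive, node + 1, hops_left - 1)

    sim.schedule(0.0, arrive, 0, 2)
    sim.run()
    print(log)  # [(0.0, 0), (2.0, 1), (4.0, 2)]
```

Simulated time jumps directly from one event to the next, so idle periods cost nothing, which is one reason event-driven engines can be fast.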

OPNET in CMP Research

n "QNoC: QoS architecture and design process for Network on Chip", E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, Special issue on Networks on Chip, The Journal of Systems Architecture, December 2003
n "Network Delays and Link Capacities in Application-Specific Wormhole NoCs", Z. Guz, I. Walter, E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, VLSI Design, vol. 2007, Article ID 90941, May 2007
n "Routing Table Minimization for Irregular Mesh NoCs", E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, DATE 2007
n "Access Regulation to Hot-Modules in Wormhole NoCs", I. Walter, I. Cidon, R. Ginosar, A. Kolodny, NOCS 2007
n "The Power of Priority: NoC based Distributed Cache Coherency", E. Bolotin, Z. Guz, I. Cidon, R. Ginosar, A. Kolodny, NOCS 2007
n "Best of Both Worlds: A Bus Enhanced NoC (BENoC)", R. Manevich, I. Walter, I. Cidon, and A. Kolodny, the ACM/IEEE Int. Symp. on Networks-on-Chip (NOCS), 2009

Bus-Enhanced Network-on-Chip

n A new interconnect architecture, utilizing “the best of both worlds”
¨ Use NoC for data delivery
¨ Use bus for lightweight, latency-critical meta-data
n Coherency
n Evaluated using OPNET and Simics

[Figure: grid of modules, each attached to a router (R)]

Gluing OPNET to Simics

n Run OPNET as a trace-driven simulator
¨ L2 access logs generated by Simics
n Advantages
¨ Fast
¨ Simple
n Disadvantages
¨ Dependencies are lost
¨ Does not account for latency hiding techniques (e.g. OOO)
n But..
¨ OPNET can be glued to Simics using Ruby
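The "dependencies are lost" caveat can be made concrete with a toy trace replay. The log format used here (`cycle cpu address` per line) is hypothetical, as are the function names; the real Simics logs and OPNET models are not shown in the slides.

```python
from typing import NamedTuple


class Access(NamedTuple):
    cycle: int    # cycle at which the access was logged
    cpu: int      # issuing CPU id
    address: int  # accessed address


def parse_trace(text):
    """Parse lines of the hypothetical form '<cycle> <cpu> <hex address>'."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        cycle, cpu, address = line.split()
        records.append(Access(int(cycle), int(cpu), int(address, 16)))
    return records


def replay(trace, network_latency):
    """Trace-driven replay: returns (inject_cycle, complete_cycle) per access.

    Injection times come straight from the log, so a slower network delays
    completions but never delays later injections.
    """
    return [(a.cycle, a.cycle + network_latency) for a in trace]


if __name__ == "__main__":
    trace = parse_trace("# cycle cpu address\n100 0 0x1000\n110 0 0x1040\n")
    print(replay(trace, network_latency=5))
    print(replay(trace, network_latency=50))
```

With a latency of 50, the second access is still injected at cycle 110 although the first one completes only at cycle 150. If the second access actually depended on the first, a closed-loop simulation would have delayed it, and the trace-driven run cannot.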


Meet the Contenders

n CPU2006, CPU2000
n OMP2001
n JBB2005, JBB2000
n SPLASH-2
n PARSEC
n Commercial workloads
¨ Apache
¨ Databases
¨ ?
n ?

Benchmark Comparison

                  CPU2006  OMP2001  SPLASH-2  PARSEC  Commercial
Programs          29       11       14        13      1
Multi-Threaded    ☹☹☹      ☺        ☺         ☺       ☺

Remaining rows (ratings in slide order; empty cells were lost in extraction, so columns cannot be assigned):
Diverse: ☹ ☹ ☺
Updated: ☹ ☺
Emerging apps: ☹ ☺
Installation ease: ☺ ☺ ☺ ☹☹☹
Simulation friendly: ☹☹☹ ☹ ☺☺☺ ☹☹☹

The PARSEC Benchmark Suite

n Over 1000 downloads since release
n This is what everyone will be using

http://parsec.cs.princeton.edu/


Technion Goodies

n http://www.ee.technion.ac.il/matrics/software.html
n Simics workload kits
¨ Ease installation of Simics workloads
n Wisconsin GEMS provides a few others too
¨ Constantly adding more workloads to the pool
n Can you help?
n OPNET models for NoC
¨ Our entire QNoC model for OPNET
n Cores, routers and links, SNUCA/DNUCA L2 caches
n Routing schemes, arbitration policies, resource contention
n Synthetic/trace-driven simulation
n Transactified version of Apache

Summary

n A swift overview of simulation tools for CMP
¨ Simics
¨ GEMS
¨ OPNET
¨ Benchmarks
n Technion’s two cents

Questions?
