simos–ppc - university of texas at austin€¦ · austin research lab simos-ppc/1999-3-10 p20...

27
SimOS–PPC Full System Simulation of PowerPC Architecture Tom Keller Austin Research Lab IBM Research Division [email protected] This Talk:(Inside IBM) http://simos.austin.ibm.com (Outside IBM) http://www.ece.utexas.edu/projects/ece/lca/events

Upload: others

Post on 22-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

SimOS–PPC

Full System Simulationof

PowerPC Architecture

Tom KellerAustin Research Lab

IBM Research [email protected]

This Talk:(Inside IBM) http://simos.austin.ibm.com

(Outside IBM) http://www.ece.utexas.edu/projects/ece/lca/events

Page 2: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p1

The SimOS-PPC Team

Full time

Pat BohrerTom KellerAnn Marie MaynardRick Simpson

Contractor (ex-AIX development)Bob Willcox

Temporary assignment (now completed)

Brian Twichell

Page 3: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p2

Full System SimulationBackground

Processor and system design has been focused around trace-based analysis, using traces (mostly) of problem-state programs.

• Traces that include operating system code are difficult to collect.

• Traces of multi-processors are extremely difficult to collect and to correlate.

• Traces tend to be old, because they’re difficult to collect.

From the traces, we generate statistics such as cache and TLB miss rates, code frequency-of-use, memory bandwidth requirements, and the like.

Tracing problem state code tells us nothing about —

• Execution paths through supervisor state code

• Operating system services

• Device drivers

• Cache traffic due to supervisor state code

• Cache traffic due to interrupt handlers

• Memory traffic due to I/O operations

Page 4: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p3

Full System SimulationBackground

For multiprocessors, trace-based analysis is even more problematic

• Problem state traces miss operating system code, especially code that deals with MP synchronization.

• Execution of code on one processor affects execution (and hence trace) on other processors (lock spinning, contents of shared caches, ...).

• The usual trick, which is to ‘pretend’ that several copies of the same uniprocessor trace are running on the different processors, usually with different starting points, doesn’t model any of the MP interaction.

Page 5: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p4

Full System Simulation

A way around the difficulties of traces is to use execution based simulation on a full system simulator.

A (mostly) unmodified operating system and interesting applications are run on the simulator.

• The simulator models —

• Instruction set architecture

• Caches

• Memories

• Busses

• I/O devices

• Multiple processors

• with enough precision to answer interesting questions, and

• rapidly enough to give answers in finite time.

Page 6: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p5

Full System SimulationRecent History

SimOS (Stanford University)

• MIPS R4000, R10000 + Irix

• Compaq Alpha + Unix

• http://www.simos.stanford.edu

SimICS (virtutech, spinoff from Swedish Institute of Computer Science)

• Sparc V8 + SunOS 5.x

• http://www.simics.com

Page 7: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p6

SimOS–PowerPC

Our project is a port of SimOS to AIX on PowerPC, with the addition of PowerPC processor simulators.

We model —

• PowerPC ISA (64-bit ‘Raven’)

• Caches, memory

• Selected I/O devices, sufficient to run a server workload:

• Disk

• Ethernet

• Console

Each element can be modeled at varying levels of ‘faithfulness’, with an inverse relationship between accuracy and speed of model.

Example: Disk model

• Simple ‘instantaneous’, interrupt-free model used for AIX bring-up

• More compex model now in use models delays of actual disk access,interrupts at proper simulated time, models DMA transfers.

Page 8: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p7

SimOS-PPC

Disksimulator

Ethernetsimulator

Cachesimulator

Memorysimulator

Simple ISA simulator

Block ISA simulator

SimOS-PPC 64-bit PowerPC

AIX 4.3.1Disk

driverEthernet

driver

Applications 32- or 64-bit

AIX 4.x

PowerPC processor 32- or 64-bit

Page 9: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p8

Adapting AIX to SimOS

AIX kernel

• Is unmodified. We use a copy we’ve built with –g so that all the debugging information is present.

RAMFS

• Contains drivers for our disk and console.

• Contains an ODM database with special ‘config rules’ to configure our drivers.

‘savebase’ information

• Built by us to describe the simulated machine configuration to the early stages of the boot process.

• Everything non-essential is removed, so that AIX doesn’t spend time trying to discover what’s on the bus.

All this is bundled into a boot image by a modified version of the bosboot command.

Page 10: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p9

Adapting AIX to SimOS

Device drivers

• Interface with SimOS’ device simulators via a special PowerPC instruction

• Unassigned PowerPC opcode interpreted by SimOS as “simulator support” call(same trick as “diagnose” on VM/370)

• Interrupt-driven console, disk and ethernet drivers

• Console interface will remain simple, as it isn’t performance critical.

(Simple doesn’t mean lack of function, though: it runs well enough for vi, smitty, and emacs.)

Files can be copied between the simulated AIX and the host environment:

simos-source /simos/src/tmp/ros/emacs.tar | tar xvf –

Page 11: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p10

TCL interface to SimOS

TCL scripts are used to control the simulator

• Configuration (cache geometry, memory size, number of processors, ...)

• Statistics collection

• Run-time control

Most TCL is in the form of annotations

• Run arbitrary TCL scripts at specified ‘points of interest’

• Specify where/when to run an annotation by:

• Execution address (numeric, symbolic)

• Load or store to specified address

• Hardware event (device interrupt)

• User-defined events– Creating a new process– Entering the idle loop– Dispatching a particular thread– ...

Page 12: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p11

TCL Annotations

Annotations are the basis of SimOS’s data collection.

Annotations have access to all the symbols of the program(s) being executed:

symbol load kernel unix

Through special TCL variables, annotations have access to the entire machine state:

REGISTER(regname)MEMORY(virtual address)PHYSICAL(real address)CYCLES (current cycle count)INSTRUCTIONS (current instr count on current CPU)

Examples (from IRIX):

Get the name of the current process:

symbol read kernel::((struct user*)$uarea)->u_comm

Count number of TLB faults via an annotation on the entry to vfault():

annotation set pc kernel::vfault:START {incr vFaultCount

}

Page 13: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p12

Another IRIX example:

Tracking process fork, exec, and exit

annotation set osEvent procstart {log “PFORK $CYCLES cpu=$CPU $PID($CPU)-$PROCESS($CPU)\n”

}

annotation set pc kernel::exece {set argv [symbol read “kernel:exec.c:((struct execa*)$a0)->argp”]log “PEXEC $CYCLES cpu=$CPU “

# print out the whole command lineset arg 1while {$arg != 0} {

set arg [symbol read kernel::((int*)$argv)<0>]if {$arg != 0} {

set arg [symbol read kernel::((char**)$argv)<0>]log “ $arg”set argv [expr $argv + 4]

}}log “\n”

}

annotation set osEvent procexit {log “PEXIT $CYCLES cpu=$CPU $PID($CPU)-$PROCESS($CPU)\n”

}

Page 14: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p13

Copy-on-write disks

AIX

SimOSPPC

AIX

SimOSPPC

AIXdisk

image

disk∆

disk∆

ckpts ckpts

read-only read-writeread-write

The AIX disk image is copied from a real disk on a running AIX 4.3.1 system.

• The disk is not modified during SimOS execution.

• One disk image serves for multiple simulations, multiple users.

Page 15: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p14

How Fast? (1)

SimOS-PPC’s ‘simple’ simulator running a cpu-intensive speech benchmark:

Simulated throughput,KIPS

Simulated KIPSHost MHz

32-bit 160 Mhz 604e

64-bit 125 Mhz Raven

240

156 .97

1.9

64-bit 250 Mhz Blackbird

340

1.4

Page 16: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p15

How Fast? (2)

SimOS-PPC “Simple” simulator running on an IBM 260 Mhz 64-bit host:

• CPU-intensive benchmark runs native at 1.6 cycles/instruction or 162 MIPS

• SimOS-PPC running on host emulates host at .34 MIPS,or 466 to 1

• SimOS-PPC running on host, modeling host’s L1 and L2 caches, executes at .095 MIPS, or 1713 to 1

• Simple simulator has not been tuned for performance.

• Block translator should improve the 466 to 1 case by a factor of 5.

Page 17: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p16

Checkpointing

A checkpoint can be taken at any time

• After every n instructions

• When a specific point has been reached (via an annotation)

• By operator command at the simulated OS’ console

The checkpoint includes the entire state of the system:

• Registers and memory on all processors

• Cache contents

• Outstanding interrupt state

• Disk contents

Start-up from a checkpoint is immediate (about 2 seconds)

Repeatability for debugging and for ‘what if’ experiments with different configurations

Page 18: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p17

What’s currently running

Simple CPU simulator

• One-at-a-time fetch/decode/execute loop

• Implements semantics of 64-bit Raven running as a uniprocessor, 2 or 4-way SMP

• General L1/L2 SMP caches modelled

• Idle loop is recognized; clock advanced to the next interesting event

Ethernet

• Simulated machines are seen as “real” to site, each with unique IP address

• Full Telnet and FTP. X11 is supported.

Simple model of disk (fixed delays)

Simple console

TCL interface for configuring the simulator

Page 19: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p18

Lines of Code

SimOS Framework600 Files95,000 lines

SGI MIPS175 Files57,000 lines

GNU Environment Compaq Alpha160 Files55,000 lines

64-bit AIX gdb165 files changed 10 files added14,000 lines added or changed

IBM PowerPC180 Files47,000 lines

Page 20: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p19

What’s next

1Q-99 Block CPU simulator for UP

• Translates and caches basic blocks

• Drops back to Simple simulator for ‘hard’ instructions (e.g., mtmsr)

2Q-99

• User program debugging with gdb. Running on a real machine, user attaches to a simulated process running in SimOS-PPC and debugs it.

3Q-99

• AIX system and user program profiling

4Q-99 & Beyond

• SimOS-PPC Support and IBM Proprietary Studies

Page 21: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p20

Customers & Status

External to IBM Users Principal Requirement Additional Development

University of Texas General architectural analyses tool

Integrate cycle accurate models

Carnegie Mellon Front End for PowerPC MW simulator

Integrate Microprocessor Workbench

IBM Customers Principal Requirement Additional Development

Unix Performance Group Software Tuning and Perfor-mance Debugging

Profilers and Instruction flow tracing

Future IBM Systems High End SMP and Processor Design

Integrate cycle accurate models

OS Research Platform for debugging new OS boot

None

Page 22: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p21

The Block Simulator

Translates and caches sequences (blocks) of instructions

Almost all the simulated machine registers reside in actual registers

• r0, r2 – r15, r24 – r31, cr, xer, all FP regs

• Re-use of values in registers across separately-generated blocks

Block ended by

• Branch instruction

• Any instruction the block simulator can’t deal with

Generated code calls (via blrl) an assembly-language routine to handle

• Address translation for load/store

• Periodically checking for pending interrupts

• Complicated instructions (lmw/stmw, trap, ...)

• Resolution of branch addresses

Page 23: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Austin Research Lab SimOS-PPC/1999-3-10 p21

Translated block example

Original block: Translated block:

10008154 rlwinm r6, r6, 2, 0xFFFFFFFC

10008158 lwzx r0, r7, r6

1000815C cmpwi cr1, r0, 10

10008160 bge cr0, 0x10008180

decr ctr, branch non-zerocall support (time expired)

rlwinm r6, r6, 2, 0xFFFFFFFC

add r17, r7, r6call support (translate for load)lwzx r0, r0, r17

cmpwi cr1, r0, 10

add 4 to instruction count

bge cr0, _______

call support (resolve branch fallthru)

call support (resolve branch target)

Page 24: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Java Spec 1% MTRT

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

1 25 49 73 97 121

145

169

193

217

241

265

289

313

337

361

385

409

433

457

481

505

529

553

577

601

625

649

673

697

721

745

769

793

817

Interval is 1 Million Instructions

Per

cen

tag

e o

f In

stru

ctio

ns

KERNEL_INSTS USER_INSTS IDLE_INSTS

Page 25: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Java Spec 1% MTRT Caches (Stacked)

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

1 27 53 79 105

131

157

183

209

235

261

287

313

339

365

391

417

443

469

495

521

547

573

599

625

651

677

703

729

755

781

807

Each Interval is 1M Ins

Mis

ses

Per

100

0 In

stru

ctio

ns

IL1 Miss / KI DL1 Miss / KI L2 Miss / KI

Page 26: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Java Spec 1% MTRT Memory Reads/Writes (Stacked)

2325254

2131417

0.00E+00

2.00E+05

4.00E+05

6.00E+05

8.00E+05

1.00E+06

1.20E+06

1.40E+06

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

514

541

568

595

622

649

676

703

730

757

784

811

Each Interval is 1M Instructions

Co

un

t

MEM_READ MEM_WRITE

Page 27: SimOS–PPC - University of Texas at Austin€¦ · Austin Research Lab SimOS-PPC/1999-3-10 p20 Customers & Status External to IBM Users Principal Requirement Additional Development

Spec Java 1% MTRT Disk Activity (Stacked!)

248

0

20

40

60

80

100

120

140

1 27 53 79 105

131

157

183

209

235

261

287

313

339

365

391

417

443

469

495

521

547

573

599

625

651

677

703

729

755

781

807

Each Interval is 1M Instructions

Co

un

t o

f R

ead

s/W

rite

s

DISK_WRITE DISK_READ