soc 1.1 chapter 1 introduction to the systems approach computer system design system-on-chip by m....
TRANSCRIPT
soc 1.1
Chapter 1Introduction to
the Systems ApproachComputer System Design
System-on-Chipby M. Flynn & W. Luk
Pub. Wiley 2011 (copyright 2011)
soc 1.2
SOC architecture and design• system-on-chip (SOC)
– processors: become components in a system
• SOC covers many topics– processors, cache, memory, interconnect, design tools
• need to know– user view: variety of processors– basic information: technology and tools– processor internals: effect on performance– storage: cache, embedded and external memory– interconnect: buses, network-on-chip– evaluation: processor, cache, memory, interconnect– advanced: specialized processors, reconfiguration– design productivity: system modelling, design exploration
soc 1.3
System on a Chip: driven by semiconductor advances
soc 1.4
Basic system-on-chip model
soc 1.5
SOC vs processors on chip• with lots of transistors, designs move in 2 ways:
– complete system on a chip – multi-core processors with lots of cache
System on chip Processors on chip processor multiple, simple,
heterogeneousfew, complex, homogeneous
cache one level, small 2-3 levels, extensive
memory embedded, on chip very large, off chip
functionality special purpose general purpose
interconnect wide, high bandwidth often through cache
power, cost both low both high
operation largely stand-alone need other chips
soc 1.6
iPhone: has System-on-Chip
Source: UC Berkeley
soc 1.7
iPhone SOC
7
1 GHz ARM Cortex A8
I/O
I/O
I/O
Processor
Memory
Source: UC Berkeley
soc 1.8
AMD’s Barcelona Multicore Processor
Core 1 Core 2
Core 3 Core 4
Northbridge
512K
B L
2
512K
B L
2 51
2KB
L2
512K
B L
2
2MB
sh
ared
L3
Cac
he
4 out-of-order cores
1.9 GHz clock rate
65nm technology
3 levels of caches
integrated Northbridge
http://www.techwarelabs.com/reviews/processors/barcelona/
soc 1.9
SOC design: key ideas
• to design and evaluate an SOC, designers need to understand: – its components: processors, memory, interconnect– applications that it targets
• SOC economics heavily dependent on:– costs: initial design, marginal production– volume: applicability, lifetime
• reducing design complexity– Intellectual Property (IP)– reconfigurable technology
soc 1.10
SOC processors
• usually a mix of special and general purpose (GP)– can be proprietary design or purchased IP
• commonly GP processor is purchased IP– includes OS and compiler support
• GP processor optimized for an application– additional instructions– vector units
soc 1.11
soc 1.12
Some processors for SOCs
SOC Basic ISA Processor description
Freescale c600: signal processing
PowerPC Superscalar with vector extension
ClearSpeed CSX600: general
Proprietary Array processor with 96 processing elements
PlayStation 2: gaming
MIPS Pipelined with 2 vector coprocessors
ARM VFP11: general
ARM Configurable vector coprocessor
soc 1.13
Processor types: overview
Processor type Architecture / Implementation approach
SIMD Single instruction applied to multiple functional units
Vector Single instruction applied to multiple pipelined registers
VLIW Multiple instructions issued each cycle under compiler control
Superscalar Multiple instructions issued each cycle under hardware control
soc 1.14
Adding instructions
• additional instructions to support specialized resources– exception: superscalar, with hardware control
• instructions can be added to base processor for coprocessor control– VLIW: Very Large Instruction Word – Array– Vector
soc 1.15
Sequential and parallel machines
• basic single stream processors– pipelined: basic sequential– superscalar: transparently concurrent– VLIW: compiler generated concurrency
• multiple stream– array processors– vector processors
• multiprocessors
soc 1.16
Sequential processors• operation
– generally transparent to sequential programmer– appear as in order instruction execution
• pipeline processor– execution in order– limited to one instruction execution / cycle
• superscalar processor– multi instructions / cycle, managed by hardware
• VLIW– multi op execution / cycle, managed by compiler
soc 1.17
Pipelined processor
IF DFAGID WBEX
Instruction #1
IF DFAGID WBEX
Instruction #2
IF DFAGID WBEX
Instruction #3
IF DFAGID WBEX
Instruction #4
Time
• IF: Instruction Fetch• ID: Instruction Decode• AG: Address Generation• DF: Data Fetch• EX: Execution• WB: Write Back
soc 1.18
Superscalar and VLIW processors
IF DFAGID WBEX
Instruction #2
IF DFAGID WBEX
Instruction #3
IF DFAGID WBEX
Instruction #5
IF DFAGID WBEX
Instruction #6
Time
IF DFAGID WBEX
IF DFAGID WBEX
Instruction #4
Instruction #1
soc 1.19
Superscalar
VLIW
soc 1.20
Superscalar
VLIW
soc 1.21
Parallel processors
• execution managed by programmer• array processors
– single instruction stream, multiple data streams: SIMD
• vector processors– SIMD
• multiprocessors– multiple instruction streams, multiple data streams:
MIMD
soc 1.22
Array processors • perform op if condition = mask• operand can come from neighbour
mask op dest sr1 sr2
one instructionissued to all PEs
n PEs, each withmemory; neighbour communications
soc 1.23
Vector processors
• vector registers, eg 8 regs x 64 words x 64 bits• vector instructions: VR3 <- VR2 VOP VR1
soc 1.24
Array Processors
Vector Processors
soc 1.25
SOC multiprocessors
soc 1.26
Memory and addressing
• many SOC memory designs use simple embedded memory– a single level cache– real (rather than virtual) addressing
• as SOC become more complex– their designs are expected to use more complex
memory and addressing configurations
soc 1.27
Three levels of addressing
soc 1.28
User view of memory: addressing• a program: process address (offset + base + index)
– virtual address: process address + process id
• a process: assigned a segment base and bound– system address: segment base + process address
• pages: active localities in main/real memory – virtual address: translated by table lookup to real address– page miss: virtual pages not in page table
• TLB (translation look-aside buffer): recent translations– TLB entry: corresponding real and (virtual, id) address
• a few hashed virtual address bits address TLB entries – if virtual, id = TLB (virtual, id) then use translation
soc 1.29
The TLB andThe MMU
soc 1.30
SOC interconnect• interconnecting multiple active agents requires
– bandwidth: capacity to transmit information (bps)– protocol: logic for non-interfering message transmission
• bus – AMBA (adv. Microcontroller bus architecture) from ARM,
widely used for SOC– bus performance: can determine system performance
• network on chip– array of switches– statically switched: eg mesh– dynamically switched: eg crossbar
soc 1.31
Bus based SOC
soc 1.32
Network on a Chip
soc 1.33
SOC design approach• understand application (compiler, OS, memory
and real time constrains)• select initial die area, power, performance targets;
select initial processors, memory, interconnect• assume target processor and interconnect
performance, design and evaluate memory• evaluate and redesign processors with memory• design interconnect to support processors and
memory • repeat and iterate to optimize
soc 1.34
SOC design approach
soc 1.35
Processor optimization example• given embedded ARM processor
– in an SOC chip
• 1 IALU vs 2 IALU vs 3 IALU vs 4 IALU– instructions per cycle?
• 16k L1 instruction cache vs 32k L1 i-cache– how much improvement? less power?
• branch predictor: taken vs not-taken– misprediction rate?
• aim: explore this large design space
soc 1.36
Design cost: product economics
• increasingly product cost determined by – design costs, including verification– not marginal cost to produce
• manage complexity in die technology by – engineering effort– engineering cleverness
soc 1.37
Design complexity
soc 1.38
Cost: product program vs engineering
Product cost
Manufacturing costs
Engineering
Marketing, sales,
administration
Fixed costs
Variable costs
Chip design
CAD support
Software
Verify & test
Mask costs
Capital equipment
CAD programs
Labor costs
Fixed project costs
Engineering costs
soc 1.39
Two scenarios
• fixed costs Kf, support costs 0.1 x fct(n), and variable costs Kv*n, so
• design get more complex while production costs decrease– K1 increases while K2 decreases– implicitly requires higher volumes to break even
• when compared with 1995, in 2005– K1 increased by 10 times– K2 decreased by the same amount
nKnKKCosts vff **)*1.0( 3
soc 1.40
Two scenarios
soc 1.41
Product volume dictates design effort
Basic physical tradeoffs
Design time and effort
Balance point depends on n, number of units
soc 1.42
Reduce complexity: use IP
• IP: Intellectual Property
soc 1.43
Reduce complexity: reconfig. tech.• reconfigurable technology: no fabrication costs
– lower non-recurring engineering (NRE) costs
• reconfigurable design: faster and cheaper– improve time-to-market
• reconfigurability– in-system upgrade: improve time-in-market– run-time adaptation: respond to run-time conditions– compile-time reconfiguration: retarget accelerator
• overhead: performance, area, energy efficiency– less effective than ASIC (application-specific IC)
soc 1.44
Summary
• to design and evaluate an SOC, designers need to understand: – its components: processors, memory, interconnect– applications that it targets
• SOC economics heavily dependent on:– costs: initial design, marginal production– volume: applicability, lifetime
• reducing design complexity– Intellectual Property (IP)– reconfigurable technology