TRANSCRIPT
Chapter 1 Computer Abstractions and Technology
Adapted from the following professors’ slides
Mark Schmalz, University of Florida
Dave Patterson, UC Berkeley
Jeanine Cook, New Mexico State University
Mithuna Thottethodi, Purdue University
2
• Rapidly changing field:
– Performance doubled every 1.5 years (Moore’s law)
– Example: $500 today buys a computer better than $1,000,000 could buy in 1985.
1.1 Introduction
3
Growth in Clock Rate
4
Why Such Change in 30 years?
• Performance improvement due to
– Technology advances
• vacuum tube -> transistor -> IC -> VLSI
– Computer architecture advances
• RISC, pipelining, superscalar, …
• Price: lower costs due to
– Simpler development
• CMOS VLSI: smaller systems, fewer components
– Higher volumes
• CMOS VLSI: same device, cost at 10,000 vs. 10,000,000 units
5
User Perspective of Computers
• 50’s and 60’s
– big mainframes: time-share or sign up for time
• 70’s
– mini-computer: still time-shared, but often a more locally owned machine
• 80’s
– microprocessors are born, personal computing is now possible => desktop computers
• 90’s
– WWW, interconnectivity => servers, datacenters, supercomputers
• Now:
– High-performance consumer electronics: PDAs and cellular telephones => embedded computers
6
Three types of computers
• Personal computers
– Examples: PCs, workstations
– General purpose, variety of software
– Metrics: performance (latency), cost, time to market
• Servers
– Range from small servers to building sized
– Examples: web servers, transaction servers, … supercomputers
– Network based, High capacity
– Metrics: performance (throughput), reliability, scalability
• Embedded computers
– Hidden as components of systems
– Examples: PDAs, printer, cell phone, TV, video console
– Metrics: performance (real-time), cost, power consumption, complexity
7
Welcome to the Post-PC Era
• Personal Mobile Device (PMD)
– Replaces the PC
– PMDs are battery operated, with wireless connectivity to the Internet
• Cloud Computing
– Replaces traditional servers
– Relies upon giant datacenters known as Warehouse-Scale Computers (WSCs)
• 100,000 servers
• Provide service to PMDs
• Software as a Service (SaaS)
8
What You Will Learn
• Things you’ll be learning:
– how computers work
– how computer systems are designed
– how to analyze their performance
– issues affecting modern processors
• In detail:
– How programs are translated into the machine language
• And how the hardware executes them
– The hardware/software interface - the ISA
– What determines program performance
• And how it can be improved
– How hardware designers improve performance
– What is parallel processing?
9
Topics
• Basics, Assembly and Machine Language (Chap. 1, 2, 3, Appendix B)
– Instruction sets, data representation and arithmetic
– Performance evaluation (Chap. 1)
• Processor Design (Chapters 4, 7)
– Datapath (Chap. 4), pipelining (Chap. 4)
– Parallel processors (Chap. 6)
• Memory and I/O (Chapter 5)
– Caches and virtual memory (Chap. 5)
– Buses and I/O systems (?)
10
Understand Performance
• Both Hardware and Software affect performance:
– Algorithm determines number of source-level statements
– Language/Compiler/Architecture determine number of machine instructions executed per operation
– Processor/Memory determine how fast instructions are executed
– I/O system (including OS)
• Determines how fast I/O operations are executed
11
1.2 Eight Great Ideas in Computer Architecture
• Design for Moore’s Law
– IC resources double every 18-24 months
• Use Abstraction to Simplify Design
– Use abstractions to represent the design at different levels of representation
• Make the common case fast
– Making the common case fast will tend to enhance performance better than optimizing the rare case.
• Performance via Parallelism
– Get more performance by performing operations in parallel
12
1.2 Eight Great Ideas in Computer Architecture (continued)
• Performance via Pipelining
– A particular pattern of parallelism
• Performance via Prediction
– In some cases it can be faster on average to guess and start working rather than wait until you know for sure.
• Hierarchy of Memories
– Solving the processor-memory speed gap.
• Dependability via Redundancy
– Make systems dependable by including redundant components that can take over when a failure occurs and help detect failures.
13
1.3 Below Your Program
• Application software
– Written in high-level language
• System software
– Compiler: translates HLL code to machine code
– Operating System: service code
• Handling input/output
• Managing memory and storage
• Scheduling tasks & sharing resources
• Hardware
– Processor, memory, I/O controllers
14
Below the program
• A software application program may consist of millions of lines of code
• However, computer hardware can only execute extremely low-level instructions (0’s and 1’s)
• Translation from complex application to simple instructions involves several layers of software
15
Levels of Program Code

High-Level Language Program (e.g., C):
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
        | Compiler
        v
Assembly Language Program (e.g., MIPS):
    lw  $15, 0($2)
    lw  $16, 4($2)
    sw  $16, 0($2)
    sw  $15, 4($2)
        | Assembler
        v
Machine Language Program:
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
        | Machine Interpretation
        v
Control Signal Specification:
    ALUOP[0:3] <= InstReg[9:11] & MASK
CSC446
16
1.4 Under the Covers
• Same components for all kinds of computer
– Desktop, server, embedded
• Perform the same basic functions:
– inputting data, processing data, outputting data, and storing data
• Input/output includes
– User-interface devices
• Display, keyboard, mouse
– Storage devices
• Hard disk, CD/DVD, flash
– Network adapters
• For communicating with other computers
The BIG Picture
17
Computer Organization
• Three classic components:
– processor, memory, I/O
• Five classic components
– Datapath, control, memory, input, output
(Figure: the von Neumann machine — a Processor containing Control and Datapath, connected to Memory and I/O by address, data, and instruction paths)
18
Anatomy of a Computer
(Figure: a desktop computer with its output device, input devices, and network cable labeled)
19
Anatomy of a Mouse
• Optical mouse
– LED illuminates desktop
– Small low-res camera
– Basic image processor
• Looks for x, y movement
– Buttons & wheel
• Supersedes roller-ball mechanical mouse
20
Through the Looking Glass
• LCD screen: picture elements (pixels)
– Mirrors content of frame buffer memory
21
Opening the Box (Desktop)
Motherboard
CPU
Memory(DRAM)
Hard Drive
CD/DVD Drive
Floppy Drive
Power
PCI slot
22
Motherboard
23
Opening the Box (Laptop)
24
Inside the Processor (CPU)
• Datapath: performs operations on data
• Control: sequences datapath, memory, ...
• Cache memory
– Small, fast SRAM memory for immediate access to data
25
Inside the Processor
• AMD Barcelona: 4 processor cores
26
Copyright © 2014 Elsevier Inc. All rights reserved.
FIGURE 1.9 The processor integrated circuit inside the A5 package. The size of the chip is 12.1 by 10.1 mm, and it was manufactured originally in a 45-nm process (see Section 1.5). It has two identical ARM processors or cores in the middle left of the chip and a PowerVR graphical processor unit (GPU) with four datapaths in the upper left quadrant. To the left and bottom side of the ARM cores are interfaces to main memory (DRAM). (Courtesy Chipworks, www.chipworks.com)
27
A Safe Place for Data
• Volatile main memory
– Loses instructions and data when power is off
• Non-volatile secondary memory
– Magnetic disk
– Flash memory
– Optical disk (CDROM, DVD)
28
Networks
• Communication and resource sharing
• Local area network (LAN): Ethernet
– Within a building
• Wide area network (WAN): the Internet
• Wireless network: WiFi, Bluetooth
29
Abstractions
• Abstraction helps us deal with complexity
– Hide lower-level detail
• Instruction set architecture (ISA)
– The hardware/software interface
• Application binary interface
– The ISA plus system software interface
• Implementation
– The details underlying an interface
The BIG Picture
30
Levels of Abstraction
(Figure: abstraction stack, top to bottom)
• Application
• Libraries
• Operating System
• Programming Language / Assembler Language / Graphical Interface
• Firmware
• Instruction Set Architecture - “Machine Language”
• Microprogramming
• Processor / I/O System
• Datapath and Control / Logic Design
• Circuit Design / Circuits and devices
• Semiconductors / Fabrication
• Materials
(Side labels in the original figure group these layers into Application Programming, Computer Design, and Digital Design.)
31
Hardware Abstraction
32
1.5 Technology Trends
• Electronics technology continues to evolve
– Increased capacity and performance
– Reduced cost
Year Technology Relative performance/cost
1951 Vacuum tube 1
1965 Transistor 35
1975 Integrated circuit (IC) 900
1995 Very large scale IC (VLSI) 2,400,000
2005 Ultra large scale IC 6,200,000,000
(Figure: growth of DRAM capacity over time)
33
Integrated Circuits: Technology Trends
• Processor
– logic capacity: increases about 30% per year
– clock rate: increases about 20% per year
– performance: increases about 50% per year (not anymore)
• Memory
– DRAM capacity: increases about 60% per year (4x every 3 years)
– performance: increases about 3.4% per year
• Disk
– capacity: about 60% per year
– performance: increases about 3.4% per year
• Network Bandwidth
– bandwidth increasing more than 100% per year!
• What impact does this have on future computer systems?
• What impact does this have on design decisions?
34
Memory Wall: Speed Gap between Processor and DRAM
(Figure: log performance vs. year — processor performance improving about 40% per year, DRAM about 7% per year. Source: Junji Ogawa, Stanford)
The divergence between performance and cost drives the need for memory hierarchies, to be discussed in future lectures.
35
Manufacturing ICs
• Yield: proportion of working dies per wafer
36
AMD Opteron X2 Wafer
• X2: 300mm wafer, 117 chips, 90nm technology
• X4: 45nm technology
37
Intel Pentium Pro Processor
• 306 mm2
• 5.5 M transistors
38
Integrated Circuit Cost
• Nonlinear relation to area and defect rate
– Wafer cost and area are fixed
– Defect rate determined by manufacturing process
– Die area determined by architecture and circuit design
Cost per die = Cost per wafer / (Dies per wafer × Yield)

Dies per wafer ≈ Wafer area / Die area

Yield = 1 / (1 + (Defects per area × Die area / 2))²
39
1.7 Power Wall
• In CMOS IC technology:

Power = Capacitive load × Voltage² × Frequency

– Capacitive load depends on the number of transistors connected to an output
– Trend over 30 years: frequency up about ×1000, supply voltage down 5V → 1V, power up about ×30
40
Reducing Power
• Suppose a new CPU has
– 85% of capacitive load of old CPU
– 15% voltage and 15% frequency reduction
P_new = (0.85 × C_old) × (0.85 × V_old)² × (0.85 × F_old)

P_new / P_old = 0.85⁴ ≈ 0.52
• The power wall
– We can’t reduce voltage further
– We can’t remove more heat
• How else can we improve performance?
41
1.8 Switch from Uniprocessors to multiprocessors
Constrained by power, instruction-level parallelism, memory latency
• The power limit has forced a dramatic change in the design of microprocessors
42
FIGURE 1.17 Number of cores per chip, clock rate, and power for 2008 multicore microprocessors. Copyright © 2009 Elsevier, Inc. All rights reserved.
The Power of Multicore Processors
• Switch from:
– decreasing the response time of a single program running on a single processor
• To:
– microprocessors with multiple processors per chip, where the benefit is often more on throughput than on response time
43
Multiprocessors
• Multicore microprocessors
– More than one processor per chip
• Requires explicitly parallel programming
– Compare with instruction level parallelism
• Hardware executes multiple instructions at once
• Hidden from the programmer
– Hard to do
• Programming for performance
• Load balancing
• Optimizing communication and synchronization
44
Changes in computer architecture
• Old Conventional Wisdom (CW) : Power is free, Transistors expensive
• New Conventional Wisdom : “Power wall” Power expensive, Xtors free (Can put more on chip than can afford to turn on)
• Old CW: Sufficiently increasing Instruction Level Parallelism (ILP) via compilers, innovation (Out-of-order, speculation, VLIW, …)
• New CW: “ILP wall” law of diminishing returns on more HW for ILP
45
Changes in computer architecture
• Old CW: Multiplies are slow, Memory access is fast
• New CW: “Memory wall” Memory slow, multiplies fast (200 clock cycles to DRAM memory, 4 clocks for multiply)
• Old CW: Uniprocessor performance 2X / 1.5 yrs
• New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall
– Uniprocessor performance now 2X / 5(?) yrs
⇒ Sea change in chip design: multiple “cores” (2X processors per chip / ~2 years)
• Many simpler processors are more power efficient than one complex processor
46
Multiprocessor design
• Not as simple as creating a chip with 1000 CPUs
–Task scheduling/division
–Communication
–Memory issues
– Even programming => moving from 1 to 2 CPUs is extremely difficult
47
Conclusion
• Programmers and computer designers must understand a wide variety of issues in computers
• Computer systems are composed of a datapath, control unit, memory, input devices, and output devices
• Processor performance increases rapidly, but the speeds of memory and I/O have not kept pace
• Both hardware and software designers construct computer systems in hierarchical layers, with each layer hiding details from the level above
– Principle of abstraction: used to build systems as layers