TRANSCRIPT
Chapter 1 Computer Abstractions and Technology
Adapted from the following professors’ slides
Mark Schmalz, University of Florida
Dave Patterson, UC Berkeley
Jeanine Cook, New Mexico State University
Mithuna Thottethodi, Purdue University
2
• Rapidly changing field:
– Performance doubled every 1.5 years (Moore’s law)
– Example: $500 today buys a computer better than $1,000,000 could buy in 1985.
1.1 Introduction
3
Growth in Clock Rate
4
Why Such Change in 30 years?
• Performance improvement due to
– Technology advances
• vacuum tube -> transistor -> IC -> VLSI
– Computer architecture advances
• RISC, pipelining, superscalar, …
• Price: lower costs due to
– Simpler development
• CMOS VLSI: smaller systems, fewer components
– Higher volumes
• CMOS VLSI: same device, cost at 10,000 vs. 10,000,000 units
5
User Perspective of Computers
• 50’s and 60’s
– big mainframes: time-share or sign up for time
• 70’s
– mini-computer: still time-shared, but often a more locally owned machine
• 80’s
– microprocessors are born, personal computing is now possible => desktop computers
• 90’s
– WWW, interconnectivity => servers, datacenters, supercomputers
• Now:
– High-performance consumer electronics: PDAs and cellular telephones => embedded computers
6
Three types of computers
• Personal computers
– Examples: PCs, workstations
– General purpose, variety of software
– Metrics: performance (latency), cost, time to market
• Servers
– Range from small servers to building sized
– Examples: web servers, transaction servers, … supercomputers
– Network based, High capacity
– Metrics: performance (throughput), reliability, scalability
• Embedded computers
– Hidden as components of systems
– Examples: PDAs, printer, cell phone, TV, video console
– Metrics: performance (real-time), cost, power consumption, complexity
7
Welcome to the Post-PC Era
• Personal Mobile Device (PMD)
– Replaces the PC
– PMDs are battery operated, with wireless connectivity to the Internet
• Cloud Computing
– Replaces traditional servers
– Relies upon giant datacenters known as Warehouse-Scale Computers (WSCs)
• 100,000 servers
• Provide service to PMDs
• Software as a Service (SaaS)
8
What You Will Learn
• Things you’ll be learning:
– how computers work
– how computer systems are designed
– how to analyze their performance
– issues affecting modern processors
• In detail:
– How programs are translated into the machine language
• And how the hardware executes them
– The hardware/software interface - the ISA
– What determines program performance
• And how it can be improved
– How hardware designers improve performance
– What is parallel processing?
9
Topics
• Basics, Assembly and Machine Language (Chap. 1, 2, 3, Appendix B)
– Instruction sets, data representation and arithmetic
– Performance evaluation (Chap. 1)
• Processor Design (Chapters 4, 7)
– Datapath (Chap. 4), pipelining (Chap. 4)
– Parallel processors (Chap. 6)
• Memory and I/O (Chapter 5)
– Caches and virtual memory (Chap. 5)
– Buses and I/O systems (?)
10
Understand Performance
• Both Hardware and Software affect performance:
– Algorithm determines number of source-level statements
– Language/Compiler/Architecture determine number of machine instructions executed per operation
– Processor/Memory determine how fast instructions are executed
– I/O system (including OS)
• Determines how fast I/O operations are executed
11
1.2 Eight Great Ideas in Computer Architecture
• Design for Moore’s Law
– IC resources double every 18-24 months
• Use Abstraction to Simplify Design
– Use abstractions to represent the design at different levels of representation
• Make the common case fast
– Making the common case fast will tend to enhance performance better than optimizing the rare case.
• Performance via Parallelism
– Get more performance by performing operations in parallel
12
1.2 Eight Great Ideas in Computer Architecture (continued)
• Performance via Pipelining
– A particular pattern of parallelism
• Performance via Prediction
– In some cases it can be faster on average to guess and start working rather than wait until you know for sure.
• Hierarchy of Memories
– Solving the processor-memory speed gap.
• Dependability via Redundancy
– Make systems dependable by including redundant components that can take over when a failure occurs and help detect failures.
13
1.3 Below Your Program
• Application software
– Written in high-level language
• System software
– Compiler: translates HLL code to machine code
– Operating System: service code
• Handling input/output
• Managing memory and storage
• Scheduling tasks & sharing resources
• Hardware
– Processor, memory, I/O controllers
14
Below the program
• A software application program may consist of millions of lines of code
• However, computer hardware can only execute extremely low-level instructions (0’s and 1’s)
• Translation from complex application to simple instructions involves several layers of software
15
Levels of Program Code

High-Level Language Program (e.g., C):
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
        | Compiler
        v
Assembly Language Program (e.g., MIPS):
    lw  $15, 0($2)
    lw  $16, 4($2)
    sw  $16, 0($2)
    sw  $15, 4($2)
        | Assembler
        v
Machine Language Program:
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
        | Machine Interpretation
        v
Control Signal Specification:
    ALUOP[0:3] <= InstReg[9:11] & MASK
CSC446
16
1.4 Under the Covers
• Same components for all kinds of computer
– Desktop, server, embedded
• Perform the same basic functions:
– inputting data, processing data, outputting data, and storing data
• Input/output includes
– User-interface devices
• Display, keyboard, mouse
– Storage devices
• Hard disk, CD/DVD, flash
– Network adapters
• For communicating with other computers
The BIG Picture
17
Computer Organization
• Three classic components:
– processor, memory, I/O
• Five classic components
– Datapath, control, memory, input, output
(Figure: the von Neumann machine — a Processor containing Control and Datapath, connected to Memory and I/O by address, data, and instruction paths)
18
Anatomy of a Computer
(Figure: a desktop computer with its output device, input devices, and network cable labeled)
19
Anatomy of a Mouse
• Optical mouse
– LED illuminates desktop
– Small low-res camera
– Basic image processor
• Looks for x, y movement
– Buttons & wheel
• Supersedes roller-ball mechanical mouse
20
Through the Looking Glass
• LCD screen: picture elements (pixels)
– Mirrors content of frame buffer memory
21
Opening the Box (Desktop)
Motherboard
CPU
Memory(DRAM)
Hard Drive
CD/DVD Drive
Floppy Drive
Power
PCI slot
22
Motherboard
23
Opening the Box (Laptop)
24
Inside the Processor (CPU)
• Datapath: performs operations on data
• Control: sequences datapath, memory, ...
• Cache memory
– Small, fast SRAM memory for immediate access to data
25
Inside the Processor
• AMD Barcelona: 4 processor cores
26
Copyright © 2014 Elsevier Inc. All rights reserved.
FIGURE 1.9 The processor integrated circuit inside the A5 package. The size of the chip is 12.1 by 10.1 mm, and it was manufactured originally in a 45-nm process (see Section 1.5). It has two identical ARM processors or cores in the middle left of the chip and a PowerVR graphical processor unit (GPU) with four datapaths in the upper left quadrant. To the left and bottom side of the ARM cores are interfaces to main memory (DRAM). (Courtesy Chipworks, www.chipworks.com)
27
A Safe Place for Data
• Volatile main memory
– Loses instructions and data when power is off
• Non-volatile secondary memory
– Magnetic disk
– Flash memory
– Optical disk (CDROM, DVD)
28
Networks
• Communication and resource sharing
• Local area network (LAN): Ethernet
– Within a building
• Wide area network (WAN): the Internet
• Wireless network: WiFi, Bluetooth
29
Abstractions
• Abstraction helps us deal with complexity
– Hide lower-level detail
• Instruction set architecture (ISA)
– The hardware/software interface
• Application binary interface
– The ISA plus system software interface
• Implementation
– The details underlying an interface
The BIG Picture
30
Levels of Abstraction
(Figure: abstraction stack, top to bottom)
• Application
• Libraries
• Operating System
• Programming Language / Assembler Language / Graphical Interface
• Firmware
• Instruction Set Architecture - “Machine Language”
• Microprogramming
• Processor / I/O System
• Datapath and Control / Logic Design
• Circuit Design / Circuits and devices
• Semiconductors / Fabrication
• Materials
(Side labels in the original figure group these layers into Application Programming, Computer Design, and Digital Design.)
31
Hardware Abstraction
32
1.5 Technology Trends
• Electronics technology continues to evolve
– Increased capacity and performance
– Reduced cost
Year Technology Relative performance/cost
1951 Vacuum tube 1
1965 Transistor 35
1975 Integrated circuit (IC) 900
1995 Very large scale IC (VLSI) 2,400,000
2005 Ultra large scale IC 6,200,000,000
(Figure: growth of DRAM capacity over time)
33
Integrated Circuits: Technology Trends
• Processor
– logic capacity: increases about 30% per year
– clock rate: increases about 20% per year
– performance: increases about 50% per year (not anymore)
• Memory
– DRAM capacity: increases about 60% per year (4x every 3 years)
– performance: increases about 3.4% per year
• Disk
– capacity: about 60% per year
– performance: increases about 3.4% per year
• Network Bandwidth
– bandwidth increasing more than 100% per year!
• What impact does this have on future computer systems?
• What impact does this have on design decisions?
34
Memory Wall: Speed Gap between Processor and DRAM
(Figure: log performance vs. year — processor performance improving about 40% per year, DRAM about 7% per year. Source: Junji Ogawa, Stanford)
The divergence between performance and cost drives the need for memory hierarchies, to be discussed in future lectures.
35
Manufacturing ICs
• Yield: proportion of working dies per wafer
36
AMD Opteron X2 Wafer
• X2: 300mm wafer, 117 chips, 90nm technology
• X4: 45nm technology
37
Intel Pentium Pro Processor
• 306 mm2
• 5.5 M transistors
38
Integrated Circuit Cost
• Nonlinear relation to area and defect rate
– Wafer cost and area are fixed
– Defect rate determined by manufacturing process
– Die area determined by architecture and circuit design
Cost per die = Cost per wafer / (Dies per wafer × Yield)

Dies per wafer ≈ Wafer area / Die area

Yield = 1 / (1 + (Defects per area × Die area / 2))²
39
1.7 Power Wall
• In CMOS IC technology:

Power = Capacitive load × Voltage² × Frequency

– Capacitive load depends on the number of transistors connected to an output
– Trend over 30 years: frequency up about ×1000, supply voltage down 5V → 1V, power up about ×30
40
Reducing Power
• Suppose a new CPU has
– 85% of capacitive load of old CPU
– 15% voltage and 15% frequency reduction
P_new = (0.85 × C_old) × (0.85 × V_old)² × (0.85 × F_old)

P_new / P_old = 0.85⁴ ≈ 0.52
• The power wall
– We can’t reduce voltage further
– We can’t remove more heat
• How else can we improve performance?
41
1.8 Switch from Uniprocessors to multiprocessors
Constrained by power, instruction-level parallelism, memory latency
• The power limit has forced a dramatic change in the design of microprocessors
42
FIGURE 1.17 Number of cores per chip, clock rate, and power for 2008 multicore microprocessors. Copyright © 2009 Elsevier, Inc. All rights reserved.
The Power of Multicore Processors
• Switch from:
– decreasing the response time of a single program running on a single processor
• To:
– microprocessors with multiple processors per chip, where the benefit is often more on throughput than on response time
43
Multiprocessors
• Multicore microprocessors
– More than one processor per chip
• Requires explicitly parallel programming
– Compare with instruction level parallelism
• Hardware executes multiple instructions at once
• Hidden from the programmer
– Hard to do
• Programming for performance
• Load balancing
• Optimizing communication and synchronization
44
Changes in computer architecture
• Old Conventional Wisdom (CW) : Power is free, Transistors expensive
• New Conventional Wisdom : “Power wall” Power expensive, Xtors free (Can put more on chip than can afford to turn on)
• Old CW: Sufficiently increasing Instruction Level Parallelism (ILP) via compilers, innovation (Out-of-order, speculation, VLIW, …)
• New CW: “ILP wall” law of diminishing returns on more HW for ILP
45
Changes in computer architecture
• Old CW: Multiplies are slow, Memory access is fast
• New CW: “Memory wall” Memory slow, multiplies fast (200 clock cycles to DRAM memory, 4 clocks for multiply)
• Old CW: Uniprocessor performance 2X / 1.5 yrs
• New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall
– Uniprocessor performance now 2X / 5(?) yrs
⇒ Sea change in chip design: multiple “cores” (2X processors per chip / ~2 years)
• Many simpler processors are more power efficient than one complex processor
46
Multiprocessor design
• Not as simple as creating a chip with 1000 CPUs
–Task scheduling/division
–Communication
–Memory issues
– Even programming => moving from 1 to 2 CPUs is extremely difficult
47
Conclusion
• Programmers and computer designers must understand a wide variety of issues in computers
• Computer systems are composed of a datapath, control unit, memory, input devices, and output devices
• Processor performance increases rapidly, but the speeds of memory and I/O have not kept pace
• Both hardware and software designers construct computer systems in hierarchical layers, with each layer hiding details from the level above
– Principle of abstraction: used to build systems as layers