1EE 126 Mark Hempstead
EE 126
Computer Engineering
Fall 2017
Tufts University
Instructor: Prof. Mark Hempstead
Lecture Outline
• Administrative details
• Why take EE 126? What will you learn?
• What is Computer Architecture?
• Moore’s Law and Future Challenges for Computer
Architects
• Information sheet
Instructor
• Instructor: Mark Hempstead ([email protected] ), Halligan Hall 235A
• Office Hours:
– Mondays 3:30 pm – 4:30 pm
– Tuesdays 3:00 pm – 4:00 pm
• My Background
– Tufts undergrad in Computer Engineering
– PhD at Harvard June 2009
– Research Intern at Intel
– Recently at ARM R&D in Cambridge UK
– Assistant Professor, Drexel University 2010 - 2015
Instructor: My Research
• Power Aware Computing and Low Power VLSI Design
• Accelerator-centric computing
– Selecting accelerators using static characterization and ASTs
– Security of the thermal side-channel in many accelerator workloads
• Characterizing communication in workloads
• Memory systems
– Cache replacement policies and prefetching
– Non-volatile memory technologies
• SynchroTrace for fast simulation and design exploration
• Energy efficient structures for high performance processors
• Power-agile computing systems
• Power modeling of mobile devices (Android phones)
[Figure: chip layout with a 2 mm scale; labeled blocks: SRAM1, SRAM2, Microcontroller, Message Processor, Filter, Event Processor, Timer, Tester]
◼ Power-Agile Computing for Android Smartphones
◼ Power consumption and computational needs change rapidly
◼ Combines hardware and software systems to automatically stay under energy constraints
◼ Selecting Hardware Accelerators for Energy-Efficient Computing
◼ Future of computing is threatened by increasing power density
◼ Traditional microprocessors are not enough. New application specific hardware is required
◼ Using software compilers and high-level synthesis to discover accelerators before design begins
Prof. Mark Hempstead
Associate Professor
Electrical and Computer Engineering
“Energy-Efficient Computing from Hardware to Software”
Tufts Computer Architecture Lab
Improving the energy consumption of smartphones
Accelerating common applications with hardware
[Figure labels: Energy-Performance Tradeoff; Out-of-Core Accelerators]
Resources
• Text: "Computer Organization and
Design" by Patterson & Hennessy (5th
Ed 2013)
– Morgan Kaufmann
– Print Book ISBN : 9780124077263
– eBook ISBN : 9780124078864
• The material in the 4th revised Ed of
the textbook is the same as our edition
but the homework problems are
different.
Prerequisites
• ES 4 Digital Logic
– Binary Addition
– Logic Gates and Flip-Flops
– Design of combinational logic
– Design of state machines
– Implementing and debugging digital systems in multiple ways (schematic, truth table, state diagram, RTL)
• Assembly programming and basic machine organization; EE 14 (Proc lab) or COMP 40
– ISAs and instructions
– Assembly programming
– Interrupts and interrupt routines
– Basic Caches and interacting with memory (load-store)
• VHDL or Verilog and experience with large digital designs
– ES 4 with EE 26 (Digital lab) recommended
• C Programming, UNIX
• Compilers, OS, Circuits/VLSI background is a plus but not required
Course Expectations
• Homework Assignments
– Completed Individually.
– Submitted during class on paper.
• Quizzes (4 over the semester)
• Midterm + Final
– Midterm is scheduled when the calendar says it is
– Final will be comprehensive. During the exam period.
• Labs
– VHDL Implementation of a processor
– Handouts will be provided this week
– Sucks up all your time
– New this year: pipeline tracker ☺
Why a pipeline tracker?
• It’s your lightweight intro to verification
• 2016 industry survey
– 55% of engineers have the title “verification eng”
– 35% are design – but spend ½ of their time in verif!
– CAGR = 10% for verif. eng, 4% for design eng.
• Turn VHDL-lab lemons into lemonade
– Less work than before (if you use the tracker)
– Add a useful skill to your resume
– Probably do a little debug competition later
Grading
• Grade Formula
– Quizzes – 10%
– Midterm – 20%
– Final – 30%
– Labs + final project – 30%
– Homework – 10%
• Late days for HW/Lab assignments
– 5 late days per quarter per student
– After all late days are used, the grade will be reduced by (10% multiplied by the number of days late).
• Lab makeup policy
– Resubmit labs for up to ½ of the credit lost
– Must be submitted before turning in the next lab
Topics of Study & why we care
• Get through the basics of modern processor design
– single-threaded 5-stage pipeline; 1980s technology
• Learn about pipelined systems
– everything is pipelined
• Understand the interfaces between architecture
and system software (compilers, OS)
– Essential to understand OS/compilers/PL
– For everyone else, it can help you write better code!
• Implement your own processor in VHDL
– As previously discussed…
After this course…
• Computer architects strive to give maximum
performance with programmer abstraction
– Compilers, OS part of this abstraction
– e.g. pipelining, superscalar, speculative execution, branch
prediction, caching, virtual memory…
• Technology has brought us to an inflection point
– Multiple processors on a single chip -- Why?
• Design complexity, ILP/pipelining-limits, power dissipation, etc
– How to provide the abstraction?
– Some burden will shift back to programmers
Estimated Schedule
• Review of Assembly Programming and Machine Organization
– Instructions and ISAs
– The ALU and single cycle implementation
– Introduce the MIPS ISA
– 5-stage Pipelining, hazards, branches
• Memory Hierarchy and Caches
– Associative caches
– Cache coherence
• Security holes; superscalar processors; multi-cores
The class calendar is always the up-to-date schedule
Slide Credits
• Many of the slides and teaching materials have
been adapted from the work of others:
• Elsevier publishing company supporting material
for Patterson & Hennessy text.
• Mary Jane Irwin, PSU, CSE 431
• David Brooks, Harvard
Review: Some Basic Definitions
• Kilobyte – 2^10 or 1,024 bytes (KB or KiB)
• Megabyte – 2^20 or 1,048,576 bytes (MB or MiB)
– sometimes “rounded” to 10^6 or 1,000,000 bytes
• Gigabyte – 2^30 or 1,073,741,824 bytes
– sometimes rounded to 10^9 or 1,000,000,000 bytes
• Terabyte – 2^40 or 1,099,511,627,776 bytes
– sometimes rounded to 10^12 or 1,000,000,000,000 bytes
• Petabyte – 2^50 or 1024 terabytes
– sometimes rounded to 10^15 or 1,000,000,000,000,000 bytes
• Exabyte – 2^60 or 1024 petabytes
– sometimes rounded to 10^18 or 1,000,000,000,000,000,000 bytes
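The gap between the power-of-two and power-of-ten definitions grows with each prefix. A minimal sketch in C, assuming hypothetical helper names (binary_unit, decimal_unit, rounding_gap_percent) that are not part of the course material:

```c
#include <stdint.h>

/* 2^(10*k) bytes: k = 1 kilo, 2 mega, 3 giga, 4 tera, 5 peta, 6 exa.
   Valid for k in 0..6 (2^60 fits in 64 bits). */
uint64_t binary_unit(int k)
{
    return (uint64_t)1 << (10 * k);
}

/* 10^(3*k) bytes: the "rounded" decimal value for the same prefix. */
uint64_t decimal_unit(int k)
{
    uint64_t value = 1;
    for (int i = 0; i < 3 * k; i++)
        value *= 10;
    return value;
}

/* How much larger the binary unit is than the decimal one, in percent. */
double rounding_gap_percent(int k)
{
    double bin = (double)binary_unit(k);
    double dec = (double)decimal_unit(k);
    return 100.0 * (bin - dec) / dec;
}
```

Calling rounding_gap_percent(1) gives 2.4 for the kilobyte (1,024 versus 1,000), and the value increases for every larger prefix.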
Quick quiz
• 10^15 shops = ? (1 pet shop)
• One million aches = ? (1 MegaHertz)
• 10^12 bulls = ? (1 terrible)
• Reminders:
– Kilobyte – 2^10 or 1,024 bytes (KB or KiB)
– Megabyte – 2^20 or 10^6 bytes
– Gigabyte – 2^30 or 10^9 bytes
– Terabyte – 2^40 or 10^12 bytes
– Petabyte – 2^50 or 10^15 bytes
– Exabyte – 2^60 or 10^18 bytes
What is Computer Architecture?
[Figure: layered diagram spanning Software to Hardware; labeled layers: Applications (AI, DB, Graphics); Prog. Lang, Compilers; Operating System; Instruction Set Architecture; Microarchitecture; System Architecture; VLSI/Hardware Implementations. Application Trends and Technology Trends act on the stack from either end.]
Where does this course fit into the world of computing?
Below the Program
• System software
– Operating system – supervising program that interfaces the user’s program with the hardware (e.g., Linux, MacOS, Windows)
  • Handles basic input and output operations
  • Allocates storage and memory
  • Provides for protected sharing among multiple applications
– Compiler – translates programs written in a high-level language (e.g., C, Java) into instructions that the hardware can execute
• Which of these two software layers “should” care about computer architecture?
[Figure: layered diagram labeled Applications software, Systems SW, Hardware]
Below the Program, Cont’d
• High-level language program (in C)
    void swap (int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }
  ↓ C compiler (one-to-many)
• Assembly language program (for MIPS)
    swap: sll $2, $5, 2
          add $2, $4, $2
          lw  $15, 0($2)
          lw  $16, 4($2)
          sw  $16, 0($2)
          sw  $15, 4($2)
          jr  $31
  ↓ assembler (one-to-one)
• Machine (object, binary) code (for MIPS)
    000000 00000 00101 00010 00010 000000
    000000 00100 00010 00010 00000 100000
    . . .
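The one-to-one assembler step can be sketched for the first two instructions of swap, which both use the MIPS R-format (6-bit opcode of 0, then rs, rt, rd, shamt as 5-bit fields, then a 6-bit funct). The helper name encode_rtype is an assumption for illustration, not something from the slides:

```c
#include <stdint.h>

/* Pack the fields of a MIPS R-format instruction into a 32-bit word.
   The opcode field is 0 for R-format, so it contributes no bits.
   Each argument is masked to the width of its field. */
uint32_t encode_rtype(unsigned rs, unsigned rt, unsigned rd,
                      unsigned shamt, unsigned funct)
{
    return ((uint32_t)(rs    & 0x1F) << 21) |
           ((uint32_t)(rt    & 0x1F) << 16) |
           ((uint32_t)(rd    & 0x1F) << 11) |
           ((uint32_t)(shamt & 0x1F) << 6)  |
            (uint32_t)(funct & 0x3F);
}
```

With funct 0x00 for sll and 0x20 for add, encode_rtype(0, 5, 2, 2, 0x00) yields 0x00051080 for `sll $2, $5, 2`, and encode_rtype(4, 2, 2, 0, 0x20) yields 0x00821020 for `add $2, $4, $2`. These are the two binary words shown on the slide.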
Advantages of Higher-Level Languages?
• What are some advantages?
– Allow the programmer to think in a more natural language and for their intended use (Fortran for scientific computation, Cobol for business programming, Lisp for symbol manipulation, Java for web programming, …)
– Improve programmer productivity – more understandable code that is easier to debug and validate
– Improve program maintainability
– Allow programs to be independent of the computer on which they are developed (compilers and assemblers can translate high-level language programs to the binary instructions of any machine)
– Emergence of optimizing compilers that produce very efficient assembly code optimized for the target machine
• As a result, very little programming is done today at the assembler level
Instruction Set Architecture (ISA)
• ISA, or simply Architecture – the abstract interface
between the hardware and the lowest level software that
encompasses all the information necessary to write a
machine language program, including instructions,
registers, memory access, I/O, …
– Enables implementations of varying cost and performance to
run identical software
– A great business idea – but how well do you think it works?
• The combination of the basic instruction set (the ISA)
and the operating system interface is called the
application binary interface (ABI)
– ABI – The user portion of the instruction set plus the operating
system interfaces used by application programmers. Defines a
standard for binary portability across computers.
Under the Covers
• Five classic components of a computer – input,
output, memory, datapath, and control
❑ datapath + control = processor (CPU)
History of the proc world
• and why the future might be interesting…
Courtesy, Intel ®
Moore’s Law
Moore’s Law is
the tail wagging a
very big dog!
❑ In 1965, Gordon Moore (who later co-founded Intel) predicted that the number of transistors that can be integrated on a single chip would double about every year; in 1975 he revised the rate to about every two years
Technology Scaling Road Map (ITRS)
Year                  2004  2006  2008  2010  2012  2014  2017
Feature size (nm)       90    65    45    32    22    14    10
Intg. Capacity (BT)      2     4     8    16    33    83   162
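The road-map row for integration capacity can be compared against a strict two-year doubling. A minimal sketch, assuming a hypothetical helper project_capacity and counting only whole doubling periods:

```c
/* Projected capacity after repeated doubling.
   start:        capacity in start_year (e.g. billions of transistors)
   period_years: years per doubling (2 in the Moore's Law statement)
   Only whole periods elapsed by `year` are counted. */
unsigned long long project_capacity(unsigned long long start,
                                    int start_year, int year,
                                    int period_years)
{
    unsigned long long capacity = start;
    for (int y = start_year + period_years; y <= year; y += period_years)
        capacity *= 2;
    return capacity;
}
```

Starting from 2 BT in 2004, project_capacity(2, 2004, 2012, 2) returns 32, close to the 33 BT listed for 2012, while the 2014 projection of 64 falls below the 83 BT in the table.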
• Fun facts about 45nm transistors
– 30 million can fit on the head of a pin
– You could fit more than 2,000 across the width of a human hair
– If car prices had fallen at the same rate as the price of a single transistor has since 1968, a new car today would cost about 1 cent
Another Example of Moore’s Law Impact
DRAM capacity growth over 3 decades
What would you do with endless
transistors?
• Your ideas?
But What Happened to Clock Rates and Why?
❑ Clock rates hit a “power wall”
[Figure: plot of Clock Rate (MHz, log axis from 1 to 10000) and Power (Watts, axis from 0 to 120)]
[Taylor, DAC and DaSi 2012]
Power Density creating the “Dark Silicon Problem”
How have we used these transistors?
• More functionality on one chip
– Early 1980s – 32-bit microprocessors
– Late 1980s – On Chip Level 1 Caches
– Early/Mid 1990s – 64-bit microprocessors, superscalar (ILP)
– Late 1990s – On Chip Level 2 Caches
– Early 2000s – Chip Multiprocessors, On Chip Level 3 Caches
– Early 2010s – Many-Core, SoC integration, specialized hardware
• What is next?
– How much more cache can we put on a chip? (Itanium2)
– How many more cores can we put on a chip? (Niagara, etc)
– What else can we put on chips? (Accelerators)
Example: Intel Kaby Lake Quad Core
(Core i7/i5 7400-7700)
• Introduced August 2016
• Quad core out-of-order (14-19 stages of pipeline)
– Supports 8 threads
• 64-bit datapath
• 14nm technology
• Three levels of caches (L1, L2, L3) on chip
• Integrated memory controller
• Integrated graphics
• 3.6 GHz clock, turbo boost up to 4.2 GHz
https://en.wikichip.org/wiki/intel/microarchitectures/kaby_lake
Example Processor: Apple A10 Fusion
• Introduced 2016
– iPhone 7
• 3.3 Billion Transistors
• 16 nm technology
• Integrated GPU
• 4 cores
– 2 high power 2.34 GHz
ARMv8-A cores
– 2 Energy-efficient cores
Example Processor: Apple A11 Bionic
                        A10 Fusion              A11 Bionic
Phone                   iPhone 7                iPhone 8, 10
Technology              16nm                    10nm
Number of cores         4 (two slow, two fast)  6 (four slow, two fast)
Number of transistors   3.3B                    4.3B
Freq                    2.34 GHz                2.4 GHz
Has a TV ad             No                      Yes
• https://www.youtube.com/watch?v=QN1jHqIFEbQ
• Bionic: dedicated neural-net hardware accelerator, powers
FaceID & other tasks
Crossroads: Conventional Wisdom in Comp. Arch
• Old Conventional Wisdom: Power is free, transistors are expensive
• New Conventional Wisdom: “Power wall”: power is expensive, transistors are free (can put more on a chip than we can afford to turn on)
• Old CW: ILP can be increased sufficiently via compilers and innovation (out-of-order, speculation, VLIW, …)
• New CW: “ILP wall”: law of diminishing returns on more HW for ILP
• Old CW: Multiplies are slow, memory access is fast
• New CW: “Memory wall”: memory is slow, multiplies are fast (200 clock cycles to DRAM memory, 4 clocks for multiply)
• Old CW: Uniprocessor performance 2X / 1.5 yrs
• New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall
– Uniprocessor performance now 2X / 5(?) yrs
– Sea change in chip design: multiple “cores” (2X processors per chip / ~2 years)
• More, simpler processors are more power efficient
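The “memory wall” figures on this slide (200 clock cycles to DRAM, 4 clocks for a multiply) can be turned into a rough cycle count. This is only a sketch under strong assumptions: every counted access goes all the way to DRAM, nothing overlaps, and the names below are hypothetical:

```c
/* Latencies quoted on the slide, in clock cycles. */
enum { MULTIPLY_CYCLES = 4, DRAM_ACCESS_CYCLES = 200 };

/* Naive serial cost model: no caches, no overlap between operations. */
unsigned long estimate_cycles(unsigned long multiplies,
                              unsigned long dram_accesses)
{
    return multiplies * MULTIPLY_CYCLES
         + dram_accesses * DRAM_ACCESS_CYCLES;
}
```

Under this model one DRAM access costs as much as 50 multiplies: estimate_cycles(50, 0) and estimate_cycles(0, 1) both return 200.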
“For the P6, success criteria included performance
above a certain level and failure criteria included
power dissipation above some threshold.”
Bob Colwell, Pentium Chronicles
Summary
• Welcome to EE 126
• Architecture is the “glue” between system
software/applications and VLSI implementations
• Need to create abstractions to deal with
complexity
Questions?
Information Sheet
• Please fill this out
• Designed to provide an understanding of your
background and experience
• Be honest … this is not graded