
Page 1: CS211 Advanced Computer Architecture, ShanghaiTech

CS211 Advanced Computer Architecture

L02 Review

Chundong Wang, September 9th, 2020

CS211@ShanghaiTech 1

Page 2:


L01 Survey

Admin

Page 3:

What is covered by CA?

Page 4:

What is covered by CA?

• Instructions and micro-codes
• Instruction execution: pipeline, in-order or out-of-order, speculation, etc.
• Memory hierarchy: cache, main memory, disk, etc.
• Exceptions, interrupts, etc.
• I/O
• Single-threaded or multi-threaded

Page 5:

Pipeline: instruction-level parallelism

Inst. No.  1  2  3  4  5  6  7  8  9
i          F  D  X  M  W
i+1           F  D  X  M  W
i+2              F  D  X  M  W
i+3                 F  D  X  M  W
i+4                    F  D  X  M  W
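The staggering in the diagram above can be sketched as a tiny stage-schedule function; this is a minimal model of an ideal stall-free 5-stage pipeline, not the slide's own code:

```python
# Sketch: stage occupancy in an ideal 5-stage pipeline (F, D, X, M, W).
# With no stalls, instruction i (0-based) occupies stage s at cycle i + s + 1,
# so each instruction completes one cycle after its predecessor.
STAGES = ["F", "D", "X", "M", "W"]

def stage_at(inst: int, cycle: int):
    """Stage instruction `inst` occupies at 1-based `cycle`, or None."""
    s = cycle - 1 - inst              # stage index for this instruction now
    return STAGES[s] if 0 <= s < len(STAGES) else None

# Instruction i+1 is fetched while i decodes: 5 instructions finish in 9 cycles.
for inst in range(5):
    row = [stage_at(inst, c) or "." for c in range(1, 10)]
    print(f"i+{inst}", " ".join(row))
```

Printing the rows reproduces the staggered table above.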

Page 6:

From single-core to multi-core

• Multi-core is not multi-threading
  • A single core can support multi-threading; multi-threading is older than multi-core.
  • Intel introduced "hyper-threading" in 2002: virtually, one core becomes two.
• The era of multi-core
  • Intel with Core 2 Duo, AMD with Athlon 64 X2, in 2005/2006.
  • "From Single Core to Multi-Core: Preparing for a New Exponential", in ICCAD '06
• Multi-core
  • Replicate multiple cores on a single die.
  • The operating system perceives each core as a separate processor.
• Why multi-core?
  • Difficult to make single-core clock frequencies even higher, i.e., the wall
  • Multi-threaded applications demand more parallelism
• Problems that come with multi-core?
  • Cache coherence, scheduling, interconnect, etc.

Page 7:

Memory system

• Memory hierarchy
  • Register, L1/L2/L3 caches, main memory, flash memory, hard disk

A typical memory hierarchy for a desktop:

Level   Registers    L1 Cache  L2 Cache  L3 Cache  Memory    Storage
Size    ~2000 bytes  64KB      256KB     8-32MB    8-64GB    256GB-2TB
Speed   300ps        1ns       3-10ns    10-20ns   50-100ns  50-100us

Page 8:

Memory system

• Memory hierarchy
  • Register, L1/L2/L3 caches, main memory, flash memory, hard disk
• Locality
  • Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon.
  • Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
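A minimal sketch of why spatial locality matters to a cache, assuming a 64-byte block size (the block size, element width, and counts here are assumptions, not from the slide): it counts how many distinct blocks a sequential versus a strided access pattern touches.

```python
# Sketch: distinct 64-byte cache blocks touched by 1024 accesses to 8-byte
# elements. Sequential accesses share blocks; large strides do not.
BLOCK = 64          # assumed cache-block size in bytes
N = 1024            # number of 8-byte elements accessed

def blocks_touched(stride_elems: int) -> int:
    addrs = [i * stride_elems * 8 for i in range(N)]   # byte addresses
    return len({a // BLOCK for a in addrs})            # distinct blocks

print(blocks_touched(1))   # sequential: N*8/BLOCK = 128 blocks
print(blocks_touched(8))   # one full block per access: 1024 blocks
```

Eight times fewer blocks (and, roughly, compulsory misses) for the sequential pattern is spatial locality at work.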

Page 9:

Memory system

(Same content as the previous slide, now highlighting the spatial-locality example.)

Page 10:

The Philosophy behind CPU Cache

• Programmer-invisible hardware mechanism
  • Gives the illusion of the speed of the fastest memory with the size of the largest memory
  • Works fine even if the programmer has no idea what a cache is
  • However, performance-oriented programmers today sometimes "reverse engineer" the cache design to make their data structures match the cache
  • And modern programming languages try to provide storage abstractions that offer flexibility while still caching well
• Does have limits: when you overwhelm the cache, your performance may drop off a cliff ...

Page 11:

The involvement of cache

(Figure: the processor, comprising control and a datapath with PC, registers, and ALU, reads and writes bytes through the cache, which holds program and data and sits on the processor-memory interface in front of memory; input and output devices use separate I/O-memory interfaces, with address, read/write, enable, write-data, and read-data signals.)

Page 12:

CPU Cache

• A CPU cache consists of a small, fast memory (mostly SRAM) that acts as a buffer for the DRAM main memory
• Cache block: the unit of exchange between cache and main memory
  • Also known as a cache line in state-of-the-art research papers
  • Each block has an address in main memory
• Four questions related to CPU cache
  1. Where can a block be placed in a cache?
  2. How is a block found if it is in the cache?
  3. Which block should be replaced on a cache miss?
  4. What happens on a write?

Page 13:

Where to place a block in a cache

• Direct mapped: a block can go in exactly one place, (Block address) % (# of blocks in cache). E.g., block 9527 maps to 9527 % 8 = 7 in an 8-block cache.
• Fully associative: a block can be placed anywhere in the cache.
• Set associative: the cache is partitioned into multiple sets, and a block can be placed anywhere within one set. With four sets of two ways each, block 9527 maps to set 9527 % 4 = 3.

A direct mapped cache can be viewed as a set associative cache with N sets and one way. A fully associative cache can be viewed as a set associative cache with one set and N ways.
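The unifying view above can be sketched in a few lines, reusing the slide's block number 9527; the helper name is made up for illustration:

```python
# Sketch: all three placement policies are "pick the set, then any way in it".
# Direct mapped = N sets of 1 way; fully associative = 1 set of N ways.
def candidate_sets(block_addr: int, num_sets: int) -> set:
    return {block_addr % num_sets}   # a block maps to exactly one set

print(candidate_sets(9527, 8))   # direct mapped, 8 blocks -> {7}
print(candidate_sets(9527, 4))   # 4 sets, 2 ways each -> {3}
print(candidate_sets(9527, 1))   # fully associative -> {0}, i.e., anywhere
```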

Page 14:

How to find a block in a cache

• Block Offset: byte address within the block
• Set Index: selects the set the block is in
• Tag: identifies the block, by comparison with the tags of the blocks in the selected set
• Size of Index = log2(# of sets)
• Size of Tag = Address size − Size of Index − log2(# of bytes per block)

Address layout: | Tag | Set Index | Block Offset |
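The field sizes above can be exercised with a concrete split; the 32-bit address, 64 sets, and 64-byte blocks here are assumed parameters, not from the slide:

```python
# Sketch: splitting a 32-bit address into tag / set index / block offset
# for an assumed cache with 64 sets and 64-byte blocks.
NUM_SETS, BLOCK_BYTES, ADDR_BITS = 64, 64, 32
offset_bits = BLOCK_BYTES.bit_length() - 1            # log2(64) = 6
index_bits  = NUM_SETS.bit_length() - 1               # log2(64) = 6
tag_bits    = ADDR_BITS - index_bits - offset_bits    # 32 - 6 - 6 = 20

def split(addr: int):
    offset = addr & (BLOCK_BYTES - 1)                 # low 6 bits
    index  = (addr >> offset_bits) & (NUM_SETS - 1)   # next 6 bits
    tag    = addr >> (offset_bits + index_bits)       # remaining high bits
    return tag, index, offset

print(split(0x12345678))
```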

Page 15:

Which block should be the victim for replacement?

• From time to time, the cache is almost always full
  • When a cache miss happens, a victim must be chosen and replaced
• Victim selection for a direct-mapped cache is straightforward
  • Why?
• For set- and fully-associative caches
  • Random, e.g., ARM Cortex-A53 L1 cache
  • First in, first out (FIFO)
  • LRU (least recently used)
  • Pseudo-LRU, or approximated LRU, e.g., Intel Core i7 L1 cache (quite complicated in today's CPUs)
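LRU for one set can be sketched with an ordered map; this is a software model of the policy under an assumed 2-way set, not how hardware implements it:

```python
from collections import OrderedDict

# Sketch: LRU victim selection for one cache set, assumed 2 ways.
# OrderedDict keeps blocks in recency order; the front is least recently used.
class LRUSet:
    def __init__(self, ways: int):
        self.ways, self.blocks = ways, OrderedDict()

    def access(self, tag):
        """Return True on hit; on miss, insert, evicting the LRU block if full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)      # now most recently used
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)   # evict least recently used
        self.blocks[tag] = None
        return False

s = LRUSet(ways=2)
print([s.access(t) for t in ["A", "B", "A", "C", "B"]])
# A miss, B miss, A hit, C miss (evicts B), B miss (evicts A)
```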

Page 16:

Write policy

• Write through
  • Data written into both the cache and lower-level memory
  • Pros: easy to implement, easy to maintain data coherency
  • Cons: slow, and may cause write stalls
• Write back
  • Data written to the cache only; written to main memory only upon replacement
  • Pros: fast, and multiple writes are batched into one, saving power
  • Cons: more complex control; data coherency across multi-level caches and memory
• Write allocate
  • If writing to memory not in the cache, fetch it first
  • Intel Core i7 L2 cache is write allocate
• No write allocate
  • Just write to memory without a fetch
  • Intel Core i7 L1 cache is no write allocate
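The batching benefit of write back can be sketched as a toy count of writes that reach main memory; the single repeatedly-written block and the function are illustrative assumptions:

```python
# Sketch: memory-level writes caused by `num_stores` stores to ONE cached
# block, under the two policies. Write back pays once, on eviction.
def memory_writes(policy: str, num_stores: int) -> int:
    if policy == "write-through":
        return num_stores      # every store is propagated to memory
    if policy == "write-back":
        return 1               # the dirty block is written once, on eviction
    raise ValueError(policy)

print(memory_writes("write-through", 100))  # 100
print(memory_writes("write-back", 100))     # 1
```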

Page 17:

Cache performance

• Cache hit and miss
  • Cache is transparent to the system and applications.
  • All want high hit rates, right?
• Three types of cache miss
  • Compulsory miss: first access to a block (cold start)
  • Capacity miss: the cache cannot contain all the blocks needed to execute a program
  • Conflict miss: multiple blocks compete for the same set
• How to reduce miss rates?
  • Increase the cache size?
  • Increase the block size?
  • Increase the associativity (# of ways per set)?

Page 18:

Multi-level cache

(Figure: the processor's datapath, with PC, registers, and ALU, fetches the program through an instruction L1 cache and accesses data through a data L1 cache; both miss into a unified L2 cache in front of DRAM memory.)

Now L3 cache is very common. How to manage multi-level caches would be detailed later in this course.

Page 19:


TA’s time for lab

Admin

Page 20:

Lab 0

• Submission deadline
  • 23:59:59 (UTC+8), Wednesday, September 23rd, 2020

• Check Blackboard or course website to download it

• Submission format
  • A report about what you have done, what you observe, etc.
  • To be submitted to Blackboard
  • A subset of students will be asked to explain their solutions in person
• Check your email accounts after the submission deadline

Admin

Page 21:

Main Memory


Page 22:

Virtual memory

• The reasons for "virtual" memory
  • Helping programmers manage memory space
  • Protecting the system from applications, and applications from each other
  • Extending memory space with the introduction of disk
• Paging vs. segmentation
  • Pages have a uniform size, e.g., 4096 or 8192 bytes
  • Segments have variable sizes
  • Both have pros and cons

Page 23:

Paged memory systems

• A processor-generated address can be split into: | page number | offset |
• A page table contains the physical address of the base of each page
• Page tables make it possible to store the pages of a program non-contiguously

(Figure: the pages of User-1's address space, numbered 0-3, are scattered across physical memory; User-1's page table maps each virtual page number to its physical page.)
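The split-and-look-up above can be sketched directly; the 4KB page size matches later slides, while the mapping in the dict is made up for illustration:

```python
# Sketch of single-level paged translation with assumed 4KB pages and a
# hypothetical page table mapping virtual page number -> physical page number.
PAGE = 4096
page_table = {0: 3, 1: 0, 2: 7, 3: 2}   # VPN -> PPN (made-up mapping)

def translate(vaddr: int) -> int:
    vpn, offset = divmod(vaddr, PAGE)    # split into page number and offset
    return page_table[vpn] * PAGE + offset

print(hex(translate(0x1234)))   # VPN 1 -> PPN 0, offset 0x234
```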

Page 24:

Private address space per User

• Each user has a page table
• The page table contains an entry for each user page

(Figure: Users 1, 2, and 3 each map their own virtual address VA1 through their own page table into physical memory, which also holds OS pages and free pages.)

Page 25:

Linear page table

• A Page Table Entry (PTE) contains:
  • 1 bit to indicate whether the page exists
  • And either a PPN or a DPN:
    • PPN (physical page number) for a memory-resident page
    • DPN (disk page number) for a page on the disk
  • Status bits for protection and usage (read, write, exec)
• The OS sets the Page Table Base Register whenever the active user process changes

(Figure: the virtual address is split into VPN and offset; the VPN indexes the page table, whose entries hold PPNs or DPNs; the PPN plus the offset selects the data word within a data page.)

The size of a linear page table is a problem. Assume a 64-bit address, 4KB pages, and 8B PTEs: 2^64 / 2^12 × 8 B = 2^55 B for a page table.
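The arithmetic behind that size bound, worked out explicitly:

```python
# Worked arithmetic from the slide: size of a flat (linear) page table with
# 64-bit addresses, 4KB pages, and 8-byte PTEs.
num_ptes = 2**64 // 2**12      # one PTE per virtual page: 2^52 entries
table_bytes = num_ptes * 8     # 8 B per PTE -> 2^55 B
print(table_bytes == 2**55)    # 32 PiB per process: clearly impractical
```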

Page 26:

Hierarchical Page Table: exploits the sparsity of virtual address space use

• The 32-bit virtual address is split into a 10-bit L1 index p1 (bits 31-22), a 10-bit L2 index p2 (bits 21-12), and an offset (bits 11-0)
• A processor register holds the root of the current page table; p1 selects a level-2 page table, and p2 selects the data page
• A PTE of a nonexistent page means the corresponding level-2 table or data page need not be allocated

(Figure: the level-1 page table points to level-2 page tables, which point to data pages; some pages reside in primary memory and some in secondary memory.)
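The 10+10+12 split above can be sketched as a two-step walk; the nested dict standing in for the page-table tree and its contents are made up for illustration:

```python
# Sketch of two-level translation with the slide's 10/10/12-bit split,
# using a hypothetical nested dict as the page-table tree.
root = {1: {2: 5}}   # L1 index 1 -> a level-2 table; L2 index 2 -> PPN 5

def translate2(vaddr: int) -> int:
    p1 = (vaddr >> 22) & 0x3FF     # 10-bit level-1 index (bits 31-22)
    p2 = (vaddr >> 12) & 0x3FF     # 10-bit level-2 index (bits 21-12)
    offset = vaddr & 0xFFF         # 12-bit page offset (bits 11-0)
    return root[p1][p2] * 4096 + offset

vaddr = (1 << 22) | (2 << 12) | 0x34
print(hex(translate2(vaddr)))      # PPN 5 with offset 0x34 -> 0x5034
```

A missing key at either level models a PTE of a nonexistent page: the rest of that subtree is simply never allocated.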

Page 27:

Address Translation & Protection

• Every instruction and data access needs address translation and protection checks
• A good VM design needs to be fast (~ one cycle) and space efficient

(Figure: the virtual address, VPN plus offset, goes through address translation to the physical address, PPN plus offset; a protection check, using kernel/user mode and read/write permissions, may raise an exception.)

Page 28:

Translation Lookaside Buffer (TLB)

• Address translation is very expensive! With a two-level page table, each reference becomes several memory accesses
• Solution: cache some translations in the TLB
  • TLB hit => single-cycle translation
  • TLB miss => page-table walk to refill

(Figure: the VPN of the virtual address is matched against TLB entries, each holding V, R, W, and D bits, a tag, and a PPN; on a hit, the PPN and the offset form the physical address. VPN = virtual page number; PPN = physical page number.)

Page 29:

TLB Designs

• Typically 32-128 entries, usually fully associative
  • Each entry maps a large page, hence less spatial locality across pages => more likely that two entries conflict
  • Sometimes larger TLBs (256-512 entries) are 4-8 way set-associative
  • Larger systems sometimes have multi-level (L1 and L2) TLBs
• Random or FIFO replacement policy
• Upon context switch? New VM space! Flush the TLB ...
• "TLB reach": the size of the largest virtual address space that can be simultaneously mapped by the TLB
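"TLB reach" reduces to one multiplication; the 64-entry, 4KB-page configuration below is an assumed example, not taken from the slide:

```python
# Worked example of TLB reach: entries x page size, for an assumed
# 64-entry TLB with 4KB pages.
entries, page_size = 64, 4096
reach = entries * page_size
print(reach // 1024)   # 256 (KB) of virtual address space mapped at once
```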

Page 30:

VM-related events in pipeline

• Handling a TLB miss needs a hardware or software mechanism to refill TLB

• usually done in hardware now

• Handling a page fault (e.g., page is on disk) needs a precise trap so software handler can easily resume after retrieving page

• Handling protection violation may abort process

(Figure: in the pipeline, instruction fetch consults the instruction TLB and instruction cache, and the memory stage consults the data TLB and data cache; either access can raise a TLB miss, a page fault, or a protection violation.)

Page 31:

Disk and RAID


Page 32:

Magnetic Disk: a common I/O device

• A kind of computer memory
  • Information is stored by magnetizing ferrite material on the surface of a rotating disk
  • Similar to a tape recorder, except digital rather than analog data
• A type of non-volatile storage
  • Retains its value without applying power to the disk
• Magnetic disk: Hard Disk Drives (HDD), faster, denser, non-removable
• Purpose in computer systems (Hard Disk Drive):
  1. Working file system + long-term backup for files
  2. Secondary "backing store" for main memory: a large, inexpensive, slow level in the memory hierarchy (virtual memory)

Page 33:

Disk Device Terminology

• Several platters, with information recorded magnetically on both surfaces (usually)
• Bits are recorded in tracks, which in turn are divided into sectors (e.g., 512 bytes)
• The actuator moves the head (at the end of the arm) over a track ("seek"), waits for the sector to rotate under the head, then reads or writes

(Figure: a platter with outer and inner tracks divided into sectors; the actuator arm positions the head.)

Page 34:

RAID: Redundant Arrays of Independent (Inexpensive) Disks

• Files are "striped" across multiple disks
• Redundancy yields high data availability
  • Availability: service is still provided to the user, even if some components have failed
• Disks will still fail
  • Contents are reconstructed from data redundantly stored in the array
  => Capacity penalty to store redundant info
  => Bandwidth penalty to update redundant info

Page 35:

RAID 0: Striping

• RAID 0 provides no fault tolerance or redundancy
• Striping, or disk spanning
• High performance

Layout across four disks, striped:
A0  A1  A2  A3
A4  A5  A6  A7

Page 36:

RAID 1: Disk Mirroring/Shadowing

• Each disk is fully duplicated onto its "mirror(s)"
  • Very high availability can be achieved
• Bandwidth sacrifice on write:
  • Logical write = N physical writes
  • Reads may be optimized
• Most expensive solution: 100% capacity overhead
• RAID 10 (striped mirrors) and RAID 01 (mirrored stripes): combinations of RAID 0 and 1

Layout across four disks, mirrored:
A0  A0  A0  A0
A1  A1  A1  A1

Page 37:

RAID 3: Parity Disk

37

• P contains the sum of the other disks per stripe, mod 2 ("parity")
• If a disk fails, subtract P from the sum of the other disks to find the missing information

(Figure: a logical record is striped bit-wise across the data disks as physical records, with the parity disk P holding their mod-2 sum.)

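The "subtract P" recovery above is XOR in disguise, since mod-2 addition and subtraction are the same operation; the 8-bit stripe values below are made up for illustration:

```python
from functools import reduce

# Sketch of parity-based recovery with made-up 8-bit stripes.
d = [0b10010011, 0b11001101, 0b10100011]      # contents of the data disks
p = reduce(lambda a, b: a ^ b, d)             # parity disk: mod-2 sum

# Disk 1 fails; XOR of the parity and the surviving disks recovers it.
recovered = p ^ d[0] ^ d[2]
print(recovered == d[1])   # True
```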

Page 38:

RAID 4: High I/O Rate Parity

Insides of 5 disks, with parity on a dedicated disk:

D0   D1   D2   D3   P
D4   D5   D6   D7   P
D8   D9   D10  D11  P
D12  D13  D14  D15  P
D16  D17  D18  D19  P
D20  D21  D22  D23  P
...

Disk columns; logical disk addresses increase down the columns; each row is a stripe.

Example: small read of D0 & D5, large write of D12-D15.

Page 39:

Inspiration for RAID 5

• RAID 4 works well for small reads
• Small writes (write to one disk):
  • Option 1: read the other data disks, create the new sum, and write to the parity disk
  • Option 2: since P has the old sum, compare old data to new data, and add the difference to P
• Small writes are limited by the parity disk: writes to D0 and D5 must both also write to the P disk

D0  D1  D2  D3  P
D4  D5  D6  D7  P

Page 40:

RAID 5: High I/O Rate Interleaved Parity

Independent writes are possible because of interleaved parity:

D0   D1   D2   D3   P
D4   D5   D6   P    D7
D8   D9   P    D10  D11
D12  P    D13  D14  D15
P    D16  D17  D18  D19
D20  D21  D22  D23  P
...

Disk columns; logical disk addresses increase down the columns.

Example: writes to D0 and D5 use disks 0, 1, 3, 4.

Page 41:

Problems of Disk Arrays: Small Writes

RAID-5 small write algorithm: 1 logical write = 2 physical reads + 2 physical writes

1. Read the old data D0
2. Read the old parity P
3. Write the new data D0'
4. Write the new parity P' = D0 XOR D0' XOR P

(Figure: the stripe D0 D1 D2 D3 P becomes D0' D1 D2 D3 P' without touching D1-D3.)
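The point of the algorithm is that the new parity needs only the old data, new data, and old parity, never the other data disks; the 8-bit values below are made up to check that shortcut against a full recompute:

```python
# Sketch of the RAID-5 small-write parity update: two XORs replace
# reading the whole stripe. All disk contents here are made-up values.
old_data, new_data = 0b10010011, 0b01100110
others = [0b11001101, 0b10100011]                  # untouched data disks
old_parity = old_data ^ others[0] ^ others[1]

new_parity = old_parity ^ old_data ^ new_data      # the small-write shortcut
full_recompute = new_data ^ others[0] ^ others[1]  # what reading the stripe gives
print(new_parity == full_recompute)   # True
```

XOR-ing the old data into the old parity cancels it out, leaving room for the new data, which is exactly "add the difference to P" from the previous slide.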

Page 42:

Warehouse Scale Computer


Page 43:

Google’s WSCs

Ex: In Oregon

Page 44:

Containers in WSCs

(Photos: inside a WSC; inside a container.)

Page 45:

Warehouse-Scale Computers

• Datacenter
  • A collection of 10,000 to 100,000 servers
  • Networks connecting them together
• A single gigantic machine
• Very large applications (Internet services): search, email, video sharing, social networking
• Very high availability
• "...WSCs are no less worthy of the expertise of computer systems architects than any other class of machines" (Barroso and Hoelzle, 2009)

Page 46:

Anatomy of a Web Search


Page 47:

Anatomy of a Web Search (1/3)

• Google "chundong wang"
  • Direct the request to the "closest" Google WSC
  • A front-end load balancer directs the request to one of many clusters of servers within the WSC
  • Within the array, select one of many Google Web Servers (GWS) to handle the request and compose the response page
  • The GWS communicates with Index Servers to find documents that contain the search words "chundong" and "wang", using the location of the search as well as user information
  • Return a document list with associated relevance scores

Page 48:

Anatomy of a Web Search (2/3)

• In parallel,
  • Ad system: if someone has bothered to advertise for "chundong wang"
  • Use docids (document IDs) to access the indexed documents and get snippets
• Compose the page
  • Result document extracts (with keywords in context), ordered by relevance score
  • Sponsored links and advertisements where possible

Page 49:

Anatomy of a Web Search (3/3)

• Implementation strategy
  • Randomly distribute the entries
  • Make many copies of data (a.k.a. "replicas")
  • Load balance requests across replicas
• Redundant copies of indices and documents
  • Break up search hot spots, e.g., "Tenet"
  • Increase opportunities for request-level parallelism
  • Make the system more tolerant of failures

Page 50:

Conclusion

• We have reviewed important topics of CA
  • Many are not covered yet
  • But they will be covered in greater depth through this course
• Next lecture
  • Microcode, instructions, ISA, ROP

Page 51:

Acknowledgements

• These slides contain materials developed and copyrighted by:
  • Prof. Krste Asanovic (UC Berkeley)
  • Prof. Xuehai Zhou (USTC)
  • Prof. Mikko Lipasti (UW-Madison)
  • Prof. Sören Schwertfeger (ShanghaiTech)
  • Prof. Kenji Kise (Tokyo Tech)
  • Prof. Jernej Barbic (USC)