Intel® Processor Micro-architecture – Core® Intel® Software College

Upload: merryl-green

Post on 27-Dec-2015


TRANSCRIPT

Page 1: Intel® Processor Micro-architecture – Core® Intel® Software College

Intel® Processor Micro-architecture

– Core®

Intel® Software College

Page 2: Intel® Processor Micro-architecture – Core® Intel® Software College


Copyright © 2006, Intel Corporation. All rights reserved.

Intel® Processor Micro-architecture - Core® microarchitecture

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Intel® Software College

Objectives

After completing this module, you will be able to describe:

• Components of an IA processor

• Working flow of the instruction pipeline

• Notable features of the architecture

Page 3: Intel® Processor Micro-architecture – Core® Intel® Software College


Agenda

Introduction

Notable features

Micro-architecture drill-down

Advanced cache technology

Coding considerations

Page 4: Intel® Processor Micro-architecture – Core® Intel® Software College


Agenda

Introduction

Notable features

Micro-architecture drill-down

Advanced cache technology

Coding considerations

Page 5: Intel® Processor Micro-architecture – Core® Intel® Software College


Industrial Recognition

PC Format, May 2006: “Intel Strikes Back! Conroe is the name. Pistol-whipping Athlon 64s into burger meat is the game.”

Intel Regains Performance Crown, Anandtech: “… At 2.8 or 3.0GHz, a Conroe EE would offer even stronger performance than what we’ve seen here.”

Intel Reveals Conroe Architecture, Extremetech: “… And not only was the Intel system running at 2.66GHz, a slower clock rate than the top Pentium 4, it was outpacing an overclocked Athlon 64 FX-60. Wrap your brain around that idea for a bit…”

Conroe Benchmarks - Intel Showing Big Strength, Hot Hardware.com: “… Intel is poised to change the face of the desktop computing landscape…”

Intel Dishes the Knockout Punch to AMD with Conroe, GD Hardware.com: “…the results were far more than we could hope for and it'll be amusing to see AMD's response to this beat-down session”

Intel's Next Generation Microarchitecture Unveiled, Real World Tech: “Just as important as the technical innovations in Core MPUs, this microarchitecture will have a profound impact on the industry.”

Page 6: Intel® Processor Micro-architecture – Core® Intel® Software College


Performance Summary

Intel® Core™ Microarchitecture dramatically boosts Intel platform performance
• Conroe & Woodcrest drive clear Desktop/Server performance leadership
• Merom extends Intel Mobile performance leadership

Intel® Core™ Microarchitecture-based platforms set the bar in Performance and Energy Efficiency for the Multi-Core era
• Intel’s 3rd generation dual-core (while competition stuck on 1st generation)
• New Intel high-performance ‘engine’: Wider, Smarter, Faster, More Efficient

The “Core™ Effect”: Intel® Core™ Microarchitecture ramp fuels broad roadmap accelerations

Best Processor on the Planet: Energy-Efficient Performance [1]

20% (Merom), 40% (Conroe), 80% (Woodcrest) Performance Boosts [1]

[1] Based on SPECint*_rate_base2000

Page 7: Intel® Processor Micro-architecture – Core® Intel® Software College


Mobile Microarchitecture

Intel® NetBurst®

+ New Innovations

Intel® Core™ Microarchitecture

Page 8: Intel® Processor Micro-architecture – Core® Intel® Software College


Agenda

Introduction

Notable features

• Wide Dynamic Execution

• Smart Memory Access

• Advanced Smart Cache

• Advanced Digital Media Boost

• Intelligent Power Capability

Micro-architecture drill-down

Advanced cache technology

Coding considerations

Page 9: Intel® Processor Micro-architecture – Core® Intel® Software College


Intel® Core® Micro-architecture Notable Features
Intel® Wide Dynamic Execution

• 14-stage efficient pipeline
• Wider execution path
• Advanced branch prediction
• Macro-fusion
  • Roughly 15% of all instructions are conditional branches
  • Macro-fusion fuses a comparison and jump to reduce micro-ops running down the pipeline
• Micro-fusion
  • Merges the load and operation micro-ops into one micro-op
• Stack pointer tracker
  • ESP tracks the stack
  • This pointer allows push/pops to work returning the correct values
• 64-Bit Support
  • Merom, Conroe, and Woodcrest support EM64T

[Block diagram labels: Instruction Fetch and PreDecode; Instruction Queue; Decode; uCode ROM; Rename/Alloc; Retirement Unit (ReOrder Buffer); Schedulers; execution units ALU/Branch/MMX/SSE/FPmove, ALU/FAdd/MMX/SSE/FPmove, ALU/FMul/MMX/SSE/FPmove; Load; Store; L1 D-Cache and D-TLB; 2M/4M shared L2 Cache; FSB, up to 10.4 Gb/s]

Page 10: Intel® Processor Micro-architecture – Core® Intel® Software College


Intel® Core® Micro-architecture Notable Features (cont.)
Intel® Advanced Memory Access

• Improved prefetching
• Memory disambiguation
  • Advance load before a possible data dependency (pointer conflict)
  • Earlier loads hide memory latencies

Page 11: Intel® Processor Micro-architecture – Core® Intel® Software College


Intel® Core® Micro-architecture Notable Features (cont.)
Intel® Advanced Smart Cache

• Multi-core optimization
  • Shared between the two cores
  • Advanced Transfer Cache architecture
  • Reduced bus traffic
  • Both cores have full access to the entire cache
  • Dynamic Cache sizing
• Shared second level (L2) 2MB 8-way or 4MB 16-way instruction and data cache
• Higher bandwidth from the L2 cache to the core
  • ~14 clock latency and 2 clock throughput

Page 12: Intel® Processor Micro-architecture – Core® Intel® Software College


Intel® Core® Micro-architecture Notable Features (cont.)
Intel® Advanced Digital Media Boost

• Single Cycle SIMD Operation
  • 8 Single Precision Flops/cycle
  • 4 Double Precision Flops/cycle
• Wide Operations
  • 128-bit packed Add
  • 128-bit packed Multiply
  • 128-bit packed Load
  • 128-bit packed Store
• Support for Intel® EM64T instructions

[Figure: SIMD Operation (SSE/SSE2/SSE3/SSSE). A 128-bit SSE/2/3 OP (bits 127 to 0) takes SOURCE X4 X3 X2 X1 and DEST Y4 Y3 Y2 Y1 and produces X4opY4, X3opY3, X2opY2, X1opY1. "Previous" completes the results over CLOCK CYCLE 1 and CLOCK CYCLE 2; "Core™ arch" completes them in CLOCK CYCLE 1.]

Page 13: Intel® Processor Micro-architecture – Core® Intel® Software College


Intel® Core® Micro-architecture Notable Features (cont.)
Intelligent Power Capability

• Advanced power gating & Dynamic power coordination
• Multi-point demand-based switching
• Voltage-Frequency switching separation
• Supports transitions to deeper sleep modes
• Event blocking
• Clock partitioning and recovery
• Dynamic Bus Parking
• During periods of high performance execution, many parts of the chip core can be shut off

Page 14: Intel® Processor Micro-architecture – Core® Intel® Software College


Intel® Core® Micro-architecture Notable Features (cont.)
Intelligent Power Capability - Split Busses (core power feature)

Many buses are sized for worst case data
(x86 instruction of 15 bytes)
(ALU can write-back 128 bits)

Improved Energy Efficiency

Page 15: Intel® Processor Micro-architecture – Core® Intel® Software College


Intel® Core® Micro-architecture Notable Features (cont.)
Intelligent Power Capability - Split Busses (core power feature)

By splitting buses to deal with varying data widths, we can gain the performance benefit of bus width while keeping dynamic capacitance (C dynamic) closer to that of thinner buses

Improved Energy Efficiency

Page 16: Intel® Processor Micro-architecture – Core® Intel® Software College


Agenda

Introduction

Notable features

Micro-architecture drill-down

• Front End

• Out-Of-Order Execution Core

• Memory Sub-system

Advanced cache technology

Coding considerations

Page 17: Intel® Processor Micro-architecture – Core® Intel® Software College


Key Terms

CISC vs. RISC

Super-scalar

Out Of Order vs. In Order

Architecture vs. Micro-architecture

• Intel Architectures
  • IA32/X86
  • Intel® 64
  • IA64

• Historical Micro-architectures
  • P6 (Pentium Pro, Pentium II, Pentium III)
  • NetBurst (Pentium 4)
  • Mobile (Centrino platforms)

Page 18: Intel® Processor Micro-architecture – Core® Intel® Software College


Intel® Core® Micro-architecture Overview

[Block diagram, divided into Front End and Execution Core. Labels: System Bus; Bus Unit; 2nd Level Cache; 1st Level Cache (Data); Instruction Fetch Unit; Decode/IQ; Branch Prediction Unit; Renamer/Allocator; Scheduler; Execution Unit; Buffers (Retirement)]

Page 19: Intel® Processor Micro-architecture – Core® Intel® Software College


Intel® Core® Micro-architecture Drill-down

[Block diagram labels: icache; branch prediction unit; predecode; instruction queue; instruction decode; MS; register alias table; ALLOC; Re-Order Buffer; Reservation Station; integer, FP, SIMD (3x); load; store address; store data; memory order buffer; data cache unit; page miss handler]

Page 20: Intel® Processor Micro-architecture – Core® Intel® Software College


Agenda

Introduction

Notable features

Micro-architecture drill-down

• Front End

• Out-Of-Order Execution Core

• Memory Sub-system

Advanced cache technology

Coding considerations

Page 21: Intel® Processor Micro-architecture – Core® Intel® Software College


Core® Micro-architecture Front End

Prepares instructions before they are executed

• Instruction Fetch Unit

• Instruction Queue

• Instruction Decode Unit

• Branch Prediction Unit

Page 22: Intel® Processor Micro-architecture – Core® Intel® Software College


Core® Micro-architecture Front End

Instruction Fetch Unit

Instruction Queue

Instruction Decode Unit

Branch Prediction Unit

Page 23: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Fetch Unit

Prefetches instructions that are likely to be executed

Caches frequently-used instructions

Predecodes and Buffers instructions

Intel® Core™ Microarchitecture – Front End

[Block diagram, divided into Front End and Execution Core. Labels: 2nd Level Cache; 1st Level Cache (Data); Instruction Fetch Unit; IQ/Decode; BTBs/Branch Prediction; Renamer/Allocator; Scheduler; Execution Unit; Buffers (Retirement). Front-end detail: icache; predecode; instruction queue; instruction decode; MS; branch prediction unit]

Page 24: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Fetch Unit (cont.)

I-Cache (Instruction Cache)
• 32 KBytes / 8-way / 64-byte line
• 16 aligned bytes fetched per cycle

ITLB (Instruction Translation Lookaside Buffer)
• 128 4k pages, 8 2M pages

Instruction Prefetcher
• 16-byte aligned lookup through the ITLB into the instruction cache and instruction prefetch buffers

Instruction Pre-decoder
• Instruction Length Decode (predecode)
• Avoid Length Changing Prefix (LCP), for example
  Avoid in loop: MOV dx, 1234h
• The REX (EM64T) prefix (4xH) is not an LCP

Intel® Core™ Microarchitecture – Front End

[Figure: instruction format fields: Instruction Prefixes (66H/67H), Opcode, ModR/M, SIB, Displacement, Immediate]

Page 25: Intel® Processor Micro-architecture – Core® Intel® Software College


Core® Micro-architecture Front End

Instruction Fetch Unit

Instruction Queue

Instruction Decode Unit

Branch Prediction Unit

Page 26: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Queue

Buffer between instruction pre-decode unit and decoder
• up to six predecoded instructions written per cycle
• 18 Instructions contained in IQ
• up to 5 Instructions read from IQ

Potential Loop cache

Loop Stream Detector (LSD) support
• Re-use of decoded instruction
• Potential power saving

Intel® Core™ Microarchitecture – Front End


Page 27: Intel® Processor Micro-architecture – Core® Intel® Software College


Core® Micro-architecture Front End

Instruction Fetch Unit

Instruction Queue

Instruction Decode Unit

Branch Prediction Unit

Page 28: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Decode

Decodes instructions into micro-ops

Readies them for execution in the out-of-order (OOO) core

Intel® Core™ Microarchitecture – Front End


Page 29: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Decode

Decoders

Features

• Macro-fusion

• Micro-fusion

• Stack Pointer Tracking

Intel® Core™ Microarchitecture – Front End

Page 30: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Decode / Decoders

Instructions converted to micro-ops (uops)

• 1-uop includes load+op, stores, indirect jump, RET...

4 decoders: 1 “large” and 3 “small”

• All decoders handle “simple” 1-uop instructions

• One large decoder handles instructions up to 4 uops

All decoders work in parallel

• Four(+) instructions / cycle

Micro-Sequencer takes over for long flows (the large decoder handles instructions of 2 to 4 uops; the uCode ROM handles more complex ones)

Intel® Core™ Microarchitecture – Front End

Page 31: Intel® Processor Micro-architecture – Core® Intel® Software College


Code Sequence in Front End

Instructions in the IQ:

cmp EAX, [mem]
jne label
movps [EAX+240], xmm0
mulps xmm0, xmm0
addps xmm0, [EAX+16]

these instructions took more than one fetch as they are 22 bytes

IQ buffers them together

all instructions are decodable by all decoders

CMP and adjacent JCC are “fused” into a single uop

up to 5 instructions decoded per cycle

Decoder output:

Large (dec0): cmpjne EAX, [mem], label
small (dec1): sta_std [EAX+240], xmm0
small (dec2): mulps xmm0, xmm0, xmm0
small (dec3): load_add xmm0, xmm0, [EAX+16]

Intel® Core™ Microarchitecture – Front End

Page 32: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Decode

Decoders

Features

• Macro-fusion

• Micro-fusion

• Stack Pointer Tracking

Intel® Core™ Microarchitecture – Front End

Page 33: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Decode / Macro - Fusion

Roughly 15% of all instructions are conditional branches.

Macro-fusion merges two instructions into a single micro-op, as if the two instructions were a single long instruction.

Enhanced Arithmetic Logic Unit (ALU) for macro-fusion. Each macro-fused instruction executes with a single dispatch.

Not supported in EM64T long mode

Intel® Core™ Microarchitecture – Front End

[Figure: the fused uop cmpjae eax, [mem], label goes from the Scheduler to Execution (Branch Eval), with flags and target passed to Write back]

Page 34: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Decode / Macro-Fusion Absent

Read four instructions from Instruction Queue

Each instruction gets decoded into separate uops

Enabling Example

for (int i=0; i<100000; i++) {

}

[Figure: the Instruction Queue holds mov [mem1], ecx; mov edx, [mem1]; add ecx, 1; cmp eax, [mem2]; jge label. In Cycle 1, decoders dec0 to dec3 take four instructions, up to and including cmp eax, [mem2]. In Cycle 2, dec0 takes jge label.]

Intel® Core™ Microarchitecture – Front End

Page 35: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Decode / Macro-Fusion Present

Read five instructions from Instruction Queue

Send fusable pair to single decoder

Single uop represents two instructions

Enabling Example

for (unsigned int i=0; i<100000; i++) {

}

[Figure: the Instruction Queue holds mov [mem1], ecx; mov edx, [mem1]; add ecx, 1; cmp eax, [mem2]; jae label. In Cycle 1, decoders dec0 to dec3 take all five instructions, with cmp eax, [mem2] and jae label decoded together as the single uop cmpjae eax, [mem2], label.]

Intel® Core™ Microarchitecture – Front End

Page 36: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Decode / Macro – Fusion (cont.)

Benefits

• Reduces latency

• Increased renaming

• Increased retire bandwidth

• Increased virtual storage

• Power savings

Enabling Greater Performance & Efficiency

Intel® Core™ Microarchitecture – Front End

Page 37: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Decode

Decoders

Features

• Macro-fusion

• Micro-fusion

• Stack Pointer Tracking

Intel® Core™ Microarchitecture – Front End

Page 38: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Decode / Micro-Op Fusion

Frequent pairs of micro-operations derived from the same Macro Instruction can be fused into a single micro-operation

Micro-op fusion effectively widens the pipeline

Intel® Core™ Microarchitecture – Front End

Page 39: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Decode / Micro-Fusion (cont.)

u-ops of a Store “mov [mem1], edx”

• Without micro-fusion: sta mem1 (store address) and std edx, [mem1] (store data)
• With micro-fusion: st edx, [mem1]

Intel® Core™ Microarchitecture – Front End

Page 40: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Decode

Decoders

Features

• Macro-fusion

• Micro-fusion

• Stack Pointer Tracking

Intel® Core™ Microarchitecture – Front End

Page 41: Intel® Processor Micro-architecture – Core® Intel® Software College


Instruction Decode / Stack Pointer Tracker (Extended Stack Pointer folding)

ESP is calculated by dedicated logic

• No explicit Micro-Ops updating ESP

• Micro-Ops saving

• Power saving

[Figure: decoders 0 through N process PUSH EAX, PUSH EDX, POP EBX while the tracker maintains an ESP delta (values 0, 4 and 8 are shown) together with recovery information]

Intel® Core™ Microarchitecture – Front End
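The stack pointer tracking idea can be sketched as a small model. This is an illustration only, not the hardware algorithm: the function name, the textual u-op format and the sign convention of the delta are assumptions made for the example. It rewrites PUSH/POP into ESP-relative accesses using a running offset, so no separate u-op is needed to update ESP.

```python
def track_stack_pointer(instructions, word_size=4):
    """Model of ESP folding: return (uops, final_delta).

    `delta` is the offset of the tracked stack pointer from the
    architectural ESP value at the start of the sequence (hypothetical
    convention chosen for this sketch).
    """
    delta = 0
    uops = []
    for op, reg in instructions:
        if op == "PUSH":
            delta -= word_size  # stack grows downward
            uops.append(f"store [ESP{delta:+d}], {reg}")
        elif op == "POP":
            uops.append(f"load {reg}, [ESP{delta:+d}]")
            delta += word_size
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return uops, delta


uops, delta = track_stack_pointer(
    [("PUSH", "EAX"), ("PUSH", "EDX"), ("POP", "EBX")]
)
```

Each PUSH or POP becomes a single memory u-op; the accumulated delta stands in for the ESP updates that would otherwise be issued.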

Page 42: Intel® Processor Micro-architecture – Core® Intel® Software College


Core® Micro-architecture Front End

Instruction Fetch Unit

Instruction Queue

Instruction Decode Unit

Branch Prediction Unit

Page 43: Intel® Processor Micro-architecture – Core® Intel® Software College


Branch Prediction Unit

Allow executing instructions long before the branch outcome is decided

• Superset of Prescott / Pentium-M features

• One taken branch every other clock

• Branch predictions for 32 bytes at a time, twice the width of the fetch engine

Intel® Core™ Microarchitecture – Front End

[Figure: block diagram. Front End: Instruction Fetch Unit, BTBs/Branch Prediction, IQ/Decode. Execution Core: Renamer/Allocator, Scheduler, Execution Unit, Buffers (Retirement). Caches: 1st Level Cache (Data), 2nd Level Cache. Front End detail: branch prediction unit, icache, predecode, instruction queue, instruction decode, MS]

Page 44: Intel® Processor Micro-architecture – Core® Intel® Software College


Branch Prediction Unit (cont.)

16-entry Return Stack Buffer (RSB)

Front end queuing of BPU lookups

Types of predictions

• Direct Calls and Jumps

• Indirect Calls and Jumps

• Conditional branches

Intel® Core™ Microarchitecture – Front End
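A return stack buffer can be pictured as a small fixed-depth stack of return addresses. The sketch below assumes the 16-entry depth from the slide. The class name and the overflow policy (the oldest entry is dropped) are assumptions for illustration, not documented hardware behavior.

```python
from collections import deque


class ReturnStackBuffer:
    """Toy model of a return stack buffer (RSB)."""

    def __init__(self, entries=16):
        # A bounded deque silently drops the oldest entry on overflow.
        self._stack = deque(maxlen=entries)

    def on_call(self, return_address):
        """Record the address the matching RET should return to."""
        self._stack.append(return_address)

    def predict_return(self):
        """Predict the target of a RET, or None if the buffer is empty."""
        return self._stack.pop() if self._stack else None


rsb = ReturnStackBuffer()
rsb.on_call(0x100)
rsb.on_call(0x200)
first = rsb.predict_return()
second = rsb.predict_return()
third = rsb.predict_return()
```

In this model, call chains nested deeper than the buffer lose their oldest return addresses, which is one reason very deep recursion tends to mispredict on return.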

Page 45: Intel® Processor Micro-architecture – Core® Intel® Software College


Branch Prediction Improvements

Intel® Pentium® 4 Processor branch prediction PLUS the following two improvements:

• Indirect Branch Predictor

• Loop Detector

Branch mispredictions reduced by >20%

Intel® Core™ Microarchitecture – Front End

Page 46: Intel® Processor Micro-architecture – Core® Intel® Software College


Agenda

Introduction

Notable features

Micro-architecture drill-down

• Front End

• Out-Of-Order Execution Core

• Memory Sub-system

Advanced cache technology

Coding considerations

Page 47: Intel® Processor Micro-architecture – Core® Intel® Software College


Core® Micro-architecture Execution Core

Accepts decoded u-ops, assigns resources, executes and retires u-ops

• Renamer

• Reservation station (RS)

• Issue ports

• Execution Unit

[Figure: block diagram with the Execution Core highlighted: register alias table, ALLOC, Re-Order Buffer and Reservation Station feeding execution ports for integer, FP and SIMD (3x), load, store address and store data]

Page 48: Intel® Processor Micro-architecture – Core® Intel® Software College


Execution Core Building Blocks

Building blocks: Renamer, RS, ROB, Execution Unit, Memory Sub-system

Ports

• Ports 0, 1, 5: SIMD Integer

• SIMD/Integer MUL

• Ports 0, 1, 5: Integer

• Ports 0, 1, 5: Floating Point

• Port 2: Load

• Ports 3, 4: Store

Intel® Core™ Microarchitecture – Execution Core

Page 49: Intel® Processor Micro-architecture – Core® Intel® Software College


Rename and Resources

4 uops renamed / retired per clock

• one taken branch, any # of untaken

• one FXCH per cycle

Uops written to RS and ROB

• Decoded uops are renamed and allocated resources by the RAT, then sent to ROB read and the RS

• RS waits for sources to arrive, allowing OOO execution

• Registers not “in flight” are read from the ROB during RS write

Intel® Core™ Microarchitecture – Execution Core

[Figure: register alias table, ALLOC, Re-Order Buffer, Reservation Station]

Page 50: Intel® Processor Micro-architecture – Core® Intel® Software College


Issue Ports and Execution Units

6 dispatch ports from RS

• 3 execution ports (shared for integer / fp / simd)

• load

• store (address)

• store (data)

128-bit SSE implementation

• Port 0 has packed multiply (4 cycles SP, 5 cycles DP, pipelined)

• Port 1 has packed add (3 cycles all precisions)

FP data has one additional cycle bypass latency

• Do not mix SSE FP and SSE integer ops on same register

Intel® Core™ Microarchitecture – Execution Core

[Figure: dispatch ports for integer, FP and SIMD (3x), load, store address, store data]

Avoid:

    addps xmm0, xmm1
    pand  xmm0, xmm3
    addps xmm2, xmm0

Better:

    addps xmm0, xmm1
    addps xmm2, xmm0
    pand  xmm0, xmm3

Page 51: Intel® Processor Micro-architecture – Core® Intel® Software College


The Out Of Order

Each uop only takes a single RS entry

RS entries:

    cmpjne   EAX, #2000, TOP
    sta_std  [EAX+240], xmm0
    mulps    xmm0, xmm0, xmm0
    load_add xmm0, xmm0, [EAX+16]

• load + add dispatches twice (load, then add)

• mulps dispatches once, after load + add writes back

• sta + std dispatches twice

  • sta (address) can fire as early as possible

  • std must wait for mulps to write back

• cmpjne dispatches only once (functionality is truly fused)

  • no dependency, can fire as early as it wants

Intel® Core™ Microarchitecture – Execution Core

Page 52: Intel® Processor Micro-architecture – Core® Intel® Software College


Dispatching to OOO EXE

Intel® Core™ Microarchitecture – Execution Core

RS entries (four loop iterations):

    cmpjne   EAX, [mem], label
    sta_std  [EAX+240], xmm0
    mulps    xmm0, xmm0, xmm0
    load_add xmm0, xmm0, [EAX+16]

    cmpjne   EAX, [mem+4], label
    sta_std  [EAX+244], xmm0
    mulps    xmm0, xmm0, xmm0
    load_add xmm0, xmm0, [EAX+16]

    cmpjne   EAX, [mem+8], label
    sta_std  [EAX+248], xmm0
    mulps    xmm0, xmm0, xmm0
    load_add xmm0, xmm0, [EAX+16]

    cmpjne   EAX, [mem+C], label
    sta_std  [EAX+24C], xmm0
    mulps    xmm0, xmm0, xmm0
    load_add xmm0, xmm0, [EAX+16]

Dispatch ports:

• Port 0: GP (incl FP mul)

• Port 1: GP (incl FP add)

• Port 2: Load

• Port 3: STA

• Port 4: STD

• Port 5: GP (incl jmp)

Page 53: Intel® Processor Micro-architecture – Core® Intel® Software College


Retirement Unit

ReOrder Buffer (ROB)

• Holds micro-ops in various stages of completion

• Buffers completed micro-ops

• updates the architectural state in order

• manages ordering of exceptions

Intel® Core™ Microarchitecture – Execution Core

[Figure: register alias table, ALLOC, Re-Order Buffer, Reservation Station]

Page 54: Intel® Processor Micro-architecture – Core® Intel® Software College


Agenda

Introduction

Notable features

Micro-architecture drill-down

• Front End

• Out-Of-Order Execution Core

• Memory Sub-system

Advanced cache technology

Coding considerations

Page 55: Intel® Processor Micro-architecture – Core® Intel® Software College


Core® Micro-architecture Memory Sub-System

32k D-Cache (8-way, 64 byte line size)

Loads & Stores

• One 128-bit load and one 128-bit store per cycle to different memory locations

• Out of order Memory operations

Data Prefetching

Memory Disambiguation

Store Forwarding

Shared Cache

Page 56: Intel® Processor Micro-architecture – Core® Intel® Software College


Advanced Memory Access

L1D: 3 clock latency and 1 clock throughput; L2: 14 clock latency and 2 clock throughput

Miss Latencies

• L1 miss hits L2 ~ 10 cycles

• L2 miss, access to memory ~300 cycles (server/FBD)

• L2 miss, access to memory ~165 cycles (Desk/DDR2)

• C-step Broadwater is reported to have ~50ns latency

Cache Bandwidth

• Bandwidth to cache ~ 8.5 bytes/cycle

Memory Bandwidth

• Desktop ~ 6 GB/sec/socket (linux)

• Server ~3.5 GB/sec/socket

Intel® Core™ Microarchitecture – Memory Sub-system
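As a worked example of how these latencies combine, the sketch below computes an expected load latency from the slide's figures (3 clocks for L1D, 14 for L2, ~165 for desktop memory). The hit rates are made-up inputs, and treating each figure as the total load latency at that level is an assumption of this sketch.

```python
def expected_load_latency(l1_hit_rate, l2_hit_rate,
                          l1_cycles=3, l2_cycles=14, mem_cycles=165):
    """Expected cycles per load for given (hypothetical) hit rates.

    l2_hit_rate is the fraction of L1 misses that hit in L2.
    """
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    return (l1_hit_rate * l1_cycles
            + l1_miss * l2_hit_rate * l2_cycles
            + l1_miss * l2_miss * mem_cycles)


# 95% L1 hits, 90% of the remainder hitting L2 (illustrative values only)
average = expected_load_latency(0.95, 0.90)
```

With these inputs the result is about 4.3 cycles. The 0.5% of loads that reach memory contribute roughly 0.8 cycles of that, which is more than the L2 hits add.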

Page 57: Intel® Processor Micro-architecture – Core® Intel® Software College


Advanced Memory Access / Enhanced Data Pre-fetch Logic

Speculates the next needed data and loads it into cache by HW and/or SW

Main Parking Lot (External Memory)

Valet Parking Area (L2 Cache)

Intel® Core™ Microarchitecture – Memory Sub-system

Door (L1 Cache)

Page 58: Intel® Processor Micro-architecture – Core® Intel® Software College


Advanced Memory Access / Enhanced Data Pre-fetch Logic (cont.)

• L1D cache prefetching

  • Data Cache Unit Prefetcher

    • Known as the streaming prefetcher

    • Recognizes ascending access patterns in recently loaded data

    • Prefetches the next line into the processor's cache

  • Instruction Based Stride Prefetcher

    • Prefetches based upon a load having a regular stride

    • Can prefetch forward or backward 2 Kbytes (1/2 default page size)

• L2 cache prefetching: Data Prefetch Logic (DPL)

  • Prefetches data to the 2nd level cache before the DCU requests the data

  • Maintains 2 tables for tracking loads

    • Upstream – 16 entries

    • Downstream – 4 entries

  • Every load is either found in the DPL or generates a new entry

  • Upon recognition of the 2nd load of a “stream” the DPL will prefetch the next load

Intel® Core™ Microarchitecture – Memory Sub-system
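The instruction-based stride prefetcher can be sketched as a table keyed by the address of the load instruction. This is a simplified model, not the hardware design: the class name, the unbounded table and the rule of requiring two identical consecutive strides before prefetching are assumptions. The 2 Kbyte limit comes from the slide.

```python
class StridePrefetcher:
    """Toy instruction-based stride prefetcher."""

    MAX_DISTANCE = 2048  # slide: forward or backward 2 Kbytes

    def __init__(self):
        # load instruction address -> (last data address, last stride)
        self._table = {}

    def access(self, ip, addr):
        """Record a load; return an address to prefetch, or None."""
        prefetch = None
        if ip in self._table:
            last_addr, last_stride = self._table[ip]
            stride = addr - last_addr
            regular = stride != 0 and stride == last_stride
            if regular and abs(stride) <= self.MAX_DISTANCE:
                prefetch = addr + stride
            self._table[ip] = (addr, stride)
        else:
            self._table[ip] = (addr, None)
        return prefetch


forward = StridePrefetcher()
forward_hits = [forward.access(0x400, a) for a in (1000, 1064, 1128)]
```

The third access confirms a stride of 64 bytes, so the model asks for address 1192. A descending pattern works the same way with a negative stride.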

Page 59: Intel® Processor Micro-architecture – Core® Intel® Software College


Advanced Memory Access / Memory Disambiguation

Intel® Core™ Microarchitecture – Memory Sub-system

Memory Disambiguation predictor

• Loads that are predicted NOT to forward from a preceding store are allowed to schedule as early as possible

  • increasing the performance of OOO memory pipelines

Disambiguated loads checked at retirement

• Extension to existing coherency mechanism

• Invisible to software and system

Page 60: Intel® Processor Micro-architecture – Core® Intel® Software College


Advanced Memory Access / Memory Disambiguation Absent

Intel® Core™ Microarchitecture – Memory Sub-system

Load4 must WAIT until previous stores complete

[Figure: instruction sequence Store1 Y, Load2 Y, Store3 W, Load4 X accessing memory locations holding Data W, Data X, Data Y and Data Z]

Page 61: Intel® Processor Micro-architecture – Core® Intel® Software College


Advanced Memory Access / Memory Disambiguation Present

Intel® Core™ Microarchitecture – Memory Sub-system

Loads can decouple from stores

Load4 can get its data WITHOUT waiting for stores

[Figure: instruction sequence Store1 Y, Load2 Y, Store3 W, Load4 X accessing memory locations holding Data W, Data X, Data Y and Data Z]
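The Store1/Load2/Store3/Load4 example can be restated as a small check. Note the simplification: the hardware predicts whether a load depends on an earlier store and verifies the prediction later, whereas this sketch uses the known addresses as an oracle. The function name and tuple layout are assumptions for illustration.

```python
def loads_free_to_hoist(ops):
    """Return the loads that read no address written by an earlier store.

    ops: list of (kind, name, address) tuples in program order.
    """
    stored = set()
    free = []
    for kind, name, address in ops:
        if kind == "store":
            stored.add(address)
        elif kind == "load" and address not in stored:
            free.append(name)
    return free


program = [
    ("store", "Store1", "Y"),
    ("load", "Load2", "Y"),
    ("store", "Store3", "W"),
    ("load", "Load4", "X"),
]
hoistable = loads_free_to_hoist(program)
```

Load4 reads X, which neither earlier store writes, so it need not wait. Load2 reads the location Store1 writes and must take its data from that store.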

Page 62: Intel® Processor Micro-architecture – Core® Intel® Software College


Advanced Memory Access / Store Forwarding

If a load follows a store and reloads the data that the store writes to memory, the micro-architecture can forward the data directly from the store to the load

Intel® Core™ Microarchitecture – Memory Sub-system

[Figure: Store1 Y places Data Y in internal buffers on its way to memory; Load2 Y receives Data Y directly from the internal buffers]

Page 63: Intel® Processor Micro-architecture – Core® Intel® Software College


Advanced Memory Access / Store Forwarding: Aligned Store Cases

[Figure: load sizes and positions drawn within each aligned store]

Store size | Loads drawn within the store
store 128 bit | 1 × load 128 bit, 2 × load 64 bit, 4 × load 32 bit, 8 × load 16, 16 × ld 8
store 64 bit | 1 × load 64 bit, 2 × load 32 bit, 4 × load 16, 8 × ld 8
store 32 bit | 1 × load 32 bit, 2 × load 16, 4 × ld 8
store 16 | 1 × load 16, 2 × ld 8

Page 64: Intel® Processor Micro-architecture – Core® Intel® Software College


Advanced Memory Access / Store Forwarding: Unaligned Cases

Note that unaligned store forward does not occur when the load crosses a cache line boundary

[Figure: load sizes and positions drawn within each unaligned store; the legend distinguishes “Store forwarded to load” from “No forwarding”]

Store size | Loads drawn within the store
store 64 bit | 1 × load 64 bit, 2 × load 32 bit (first marked ‡), 4 × load 16 (first marked ‡), 8 × ld 8
store 32 bit | 1 × load 32 bit (marked ‡), 2 × load 16 (first marked ‡), 4 × ld 8
store 16 | 1 × load 16 (marked ‡), 2 × ld 8

‡: No forwarding if the load crosses a cache line boundary

Note: Unaligned 128-bit stores are issued as two 64-bit stores. This provides two alignments for store forwarding
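Two of the conditions mentioned on these slides can be written as a quick check: the load must reread data the store wrote, and it must not cross a cache line boundary. This sketch covers only those two necessary conditions and does not reproduce the per-alignment cases of the figures. The function name is hypothetical, and the 64-byte line size is taken from the D-cache slide.

```python
CACHE_LINE = 64  # bytes, per the L1 D-cache description in this deck


def may_forward(store_addr, store_size, load_addr, load_size):
    """Necessary (not sufficient) conditions for store-to-load forwarding."""
    contained = (store_addr <= load_addr
                 and load_addr + load_size <= store_addr + store_size)
    first_line = load_addr // CACHE_LINE
    last_line = (load_addr + load_size - 1) // CACHE_LINE
    return contained and first_line == last_line


inside = may_forward(0x100, 8, 0x104, 4)     # load within the stored bytes
overhang = may_forward(0x100, 4, 0x102, 4)   # load extends past the store
split = may_forward(0x3C, 8, 0x3E, 4)        # load straddles a line boundary
```

A load that passes this check may still fail to forward for alignment reasons, so the figures remain the reference for specific size and offset combinations.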

Page 65: Intel® Processor Micro-architecture – Core® Intel® Software College


Agenda

Introduction

Notable features

Micro-architecture drill-down

• Front End

• Out-Of-Order Execution Core

• Memory Sub-system

Advanced cache technology

Coding considerations

Page 66: Intel® Processor Micro-architecture – Core® Intel® Software College


Advanced Smart Cache® Technology: Advantages of Shared Cache

[Figure: CPU1 and CPU2 connected to Memory over the Front Side Bus (FSB), with a cache line shipped between the CPUs]

Shipping an L2 cache line costs about half of a memory access

Page 67: Intel® Processor Micro-architecture – Core® Intel® Software College


Advanced Smart Cache® Technology: Advantages of Shared Cache (cont.)

[Figure: CPU1 and CPU2 sharing one L2 cache, connected to Memory over the Front Side Bus (FSB); the cache line stays in the shared L2]

L2 is shared: no need to ship the cache line

Page 68: Intel® Processor Micro-architecture – Core® Intel® Software College


Advanced Smart Cache® Technology (cont.)

Load & Store Access order

1. L1 cache of immediate core

2. L1 cache of the other core

3. L2 cache

4. Memory

[Figure: Core 1 and Core 2 sharing a 2 MB L2 Cache, which connects to the Bus]

Page 69: Intel® Processor Micro-architecture – Core® Intel® Software College


Advanced Smart Cache® Technology (cont.)

Shared second level (L2) 2MB 8-way or 4MB 16-way instruction and data cache

Cache-to-cache transfer

• improves producer / consumer style MP

Wider interface to L2

• reduced interference

• processor line fill is 2 cycles

Higher bandwidth from the L2 cache to the core

• ~14 clock latency and 2 clock throughput

Page 70: Intel® Processor Micro-architecture – Core® Intel® Software College


Agenda

Introduction

Notable features

Micro-architecture drilldown

Advanced cache technology

Coding considerations

Page 71: Intel® Processor Micro-architecture – Core® Intel® Software College


Optimizing for Instruction Fetch and PreDecode

Avoid “Length Changing Prefixes” (LCPs)

• Affects instructions with immediate data or offset

• Operand Size Override (66H)

• Address Size Override (67H) [obsolete]

• LCPs change the length decoding algorithm – increasing the processing time from one cycle to six cycles (or eleven cycles when the instruction spans a 16-byte boundary)

• The REX (EM64T) prefix (4xH) is not an LCP

  • The REX prefix does lengthen the instruction by one byte, so use of the first eight general registers in EM64T is preferred
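The predecode cost quoted above can be tabulated with a small helper. The cycle counts (1, 6, 11) come from the slide; the function name and the idea of summing per-instruction costs are assumptions made for illustration.

```python
def predecode_cycles(has_lcp, spans_16_byte_boundary=False):
    """Length-decode cost of one instruction, using the slide's figures."""
    if not has_lcp:
        return 1
    return 11 if spans_16_byte_boundary else 6


# Hypothetical instruction mix: (has_lcp, spans_16_byte_boundary)
mix = [(False, False), (True, False), (False, False), (True, True)]
total = sum(predecode_cycles(lcp, spans) for lcp, spans in mix)
```

In this mix the two LCP instructions account for 17 of the 19 cycles, which is why removing a handful of 66H-prefixed immediates from a hot loop can be worthwhile.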

Page 72: Intel® Processor Micro-architecture – Core® Intel® Software College


Optimizing for Instruction Queue

Includes a “Loop Stream Detector” (LSD)

• Potentially very high bandwidth instruction streaming

• A number of requirements to make use of the LSD

  • Maximum of 18 instructions in up to four 16-byte packets

  • No RET instructions (hence, little practical use for CALLs)

  • Up to four taken branches allowed

  • Most effective at 70+ iterations

• LSD is after PreDecode so there is no added cost for LCPs

• Trade-off LSD with conventional loop unrolling
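The LSD requirements listed above can be collected into a rough eligibility check. This is a sketch under stated assumptions: the function names are hypothetical, the number of packets is derived from the loop's start address and byte length, and only the limits named on the slide are tested.

```python
FETCH_PACKET = 16  # bytes per packet, per the slide


def fetch_packets(start, length):
    """Number of 16-byte packets touched by a loop body of `length` bytes."""
    return (start + length - 1) // FETCH_PACKET - start // FETCH_PACKET + 1


def lsd_candidate(start, length, instructions, taken_branches, has_ret):
    """True if a loop meets the limits listed on the slide."""
    return (instructions <= 18
            and fetch_packets(start, length) <= 4
            and taken_branches <= 4
            and not has_ret)


aligned = lsd_candidate(0x1000, 60, 16, 1, False)
misaligned = lsd_candidate(0x1008, 60, 16, 1, False)
```

The same 60-byte loop fits in four packets when it starts on a 16-byte boundary but touches five when it starts 8 bytes later, so loop alignment can decide whether the limits are met.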

Page 73: Intel® Processor Micro-architecture – Core® Intel® Software College


Optimizing for Decode

Decoder issues up to 4 uOps for renaming/ allocation per clock

• This creates a trade off between more complex instruction uOps versus multiple simple instruction uOps

• For example, a single four uOp instruction is all that can be renamed/allocated in a single clock

• In some cases, multiple simple instructions may be a better choice than a single complex instruction

• Single uOp instructions allow more decoder flexibility

  • For example, 4-1-1-1 can be decoded in one clock

  • However, 2-2-2-1 takes three clocks to decode
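The 4-1-1-1 versus 2-2-2-1 example can be reproduced with a simple in-order model: one decoder that handles instructions of up to four uOps, followed by three decoders that handle single-uOp instructions only. The greedy grouping and the function name are assumptions of this sketch. Instructions of more than four uOps, which the deck's block diagram routes through the MS, are rejected here.

```python
def decode_clocks(uop_counts, simple_decoders=3, complex_limit=4):
    """Clocks needed to decode instructions, given each one's uOp count."""
    clocks = 0
    i = 0
    n = len(uop_counts)
    while i < n:
        if uop_counts[i] > complex_limit:
            raise ValueError("instruction needs the microcode sequencer")
        i += 1  # the first decoder takes the next instruction
        used = 0
        # The remaining decoders accept only single-uOp instructions, in order.
        while i < n and used < simple_decoders and uop_counts[i] == 1:
            i += 1
            used += 1
        clocks += 1
    return clocks


pattern_4111 = decode_clocks([4, 1, 1, 1])
pattern_2221 = decode_clocks([2, 2, 2, 1])
```

In the 2-2-2-1 case every two-uOp instruction has to wait for the first decoder, so the group takes three clocks, matching the slide.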

Page 74: Intel® Processor Micro-architecture – Core® Intel® Software College


Optimizing for Execution

Up to six uOps can be dispatched per clock

• “Store Data” and “Store Address” dispatch ports are combined on the block diagram

Up to four results can be written back per clock

Single clock latency operations are best

• Differing latency operations can create writeback conflicts

• Separate multiple-clock uOps with several single uOp instructions

  • Typical instructions here: ADC/SBB, RMW, CMOVcc

• In some cases, separating an RMW instruction into its pieces might be faster (decode and scheduling flexibility)

When equivalent, PS preferred to PD (LCP)

• For example, MOVAPS over MOVAPD, XORPS over XORPD

Page 75: Intel® Processor Micro-architecture – Core® Intel® Software College


Optimizing for Execution (cont.)

Bypass register “access” preferred to register reads

Partial register accesses often lead to stalls

• Register size access that ‘conflicts’ with recent previous register write

• Partial XMM updates subject to dependency delays

• Partial flag stalls can occur too, at much higher cost

  • Use a TEST instruction between the shift and the conditional to prevent them

• Common zeroing instructions (e.g., XOR reg,reg) don’t stall

Avoid bypass between execution domains

• For example: FP (ADDPS) and logical ops (PAND) on XMMn

Vectorization: careful packing/unpacking sequence

• Use MXCSR’s FZ and DAZ controls as appropriate

Page 76: Intel® Processor Micro-architecture – Core® Intel® Software College


Optimizing for Memory

Software prefetch instructions

• Can reach beyond a page boundary (including page walk)

• Prefetches only when it completes without an exception

General techniques to help these prefetchers

• Organize data in consecutive lines

• In general, increasing addresses are more easily prefetched

Page 77: Intel® Processor Micro-architecture – Core® Intel® Software College


Summary

What has been covered

• Notable features of Core® Micro-architecture

  • Wide Dynamic Execution

  • Advanced Memory Access

  • Advanced Smart Cache

  • Advanced Digital Media Boost

  • Power Efficient Support

• Core® Micro-architecture components

  • Front End

  • OOO execution core

  • Memory sub-system

• Advanced cache technology

Page 78: Intel® Processor Micro-architecture – Core® Intel® Software College


Page 79: Intel® Processor Micro-architecture – Core® Intel® Software College


Reasons Why Software Prefetches Can Negatively Impact Performance on Core2 Duo Architecture

Software prefetches are rarely ignored on Merom Architecture

• On P4 if you had a DTLB miss the prefetch could be ignored

• On Merom architecture they are not ignored and the prefetch can hurt performance since it cannot retire until after the page walk

Critical chunk is not utilized on a software prefetch

• A prefetch can hurt performance if it is too close to the load

• When the data comes in due to a prefetch you need the entire cache line instead of just the critical chunk before the data can be used by the actual load

Software prefetches can trigger hardware prefetching mechanisms

• Trigger the hardware prefetchers just like a regular load

• False patterns can be found if you prefetch on the wrong data

• Software prefetches can saturate the bus

Not Guaranteed to Behave Cross Architecture

Performance can vary between architectures
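The trade-offs above can be illustrated with a minimal sketch, assuming a GCC- or Clang-compatible compiler that provides `__builtin_prefetch`. The function names and the `PREFETCH_DISTANCE` value are hypothetical and do not come from the slides. The sketch issues roughly one prefetch per 64-byte cache line and keeps it well ahead of the demand load. For a simple sequential walk like this one, the hardware prefetchers may already cover the access pattern, so the software prefetch could add only overhead; measure both versions on the target processor.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead of the current load to
 * prefetch. Too small and the prefetch sits right next to the demand load,
 * which the slide warns against. The right value has to be measured. */
#define PREFETCH_DISTANCE 64

/* Sum an array, issuing about one software prefetch per 64-byte cache line
 * (16 floats) instead of one per element, to limit extra bus traffic. */
float sum_with_prefetch(const float *data, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if ((i % 16) == 0 && i + PREFETCH_DISTANCE < n) {
            /* GCC/Clang builtin: address, 0 = read, 3 = high temporal locality */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        sum += data[i];
    }
    return sum;
}

/* Baseline without software prefetch, for comparison when measuring. */
float sum_plain(const float *data, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}
```

Both functions return the same result; only the timing can differ, and it may differ in either direction depending on the architecture.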

Advanced Digital Media Boost

2x Compute Throughput / Clock

Let's scale a vector: B[i] := A[i] * C

[Animated diagram: arrays A and B streaming through an existing processor and through the Intel® Core™ microarchitecture with Advanced Digital Media Boost]

Assume both microarchitectures have a 128-bit path from L1 to the processor

Advanced Digital Media Boost handles all the memory data

• Existing processor: the multiplier can't keep up with the load bandwidth
• Intel® Core™ microarchitecture: the multiplier operates on all the data

Existing implementations eventually stall the load pipe waiting for the multiplier

• Existing processor: the load eventually stalls waiting for the multiplier
• Intel® Core™ microarchitecture: the load pipe is free to advance

Advanced Digital Media Boost keeps the pipeline free for computations and maintains 2x throughput compared to prior implementations

• 8 single-precision flops/cycle
• 4 double-precision flops/cycle

Leading compute density: 2x compute throughput / clock
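The vector-scaling example B[i] := A[i] * C can be written down as a minimal sketch in plain C. The function name `scale_vector` is hypothetical. The main loop handles four floats per iteration because four single-precision values fit in one 128-bit load. A vectorizing compiler may turn that loop body into one 128-bit load, one packed multiply, and one store, but whether it does so depends on the compiler and its flags. The 8 single-precision flops/cycle figure is consistent with one 4-wide packed multiply plus one 4-wide packed add issuing per cycle; this loop uses only the multiply.

```c
#include <stddef.h>

/* Scale a vector: b[i] = a[i] * c.
 * The main loop is written four elements at a time to mirror the four
 * single-precision values carried by one 128-bit load. */
void scale_vector(float *b, const float *a, float c, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        b[i + 0] = a[i + 0] * c;
        b[i + 1] = a[i + 1] * c;
        b[i + 2] = a[i + 2] * c;
        b[i + 3] = a[i + 3] * c;
    }
    /* Remainder loop for lengths that are not a multiple of four. */
    for (; i < n; i++) {
        b[i] = a[i] * c;
    }
}
```

On a processor whose multiplier handles only half of a 128-bit operand per cycle, the loads in this loop would eventually wait on the multiplier, which is the stall the animation describes.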

Intel® Core® Micro-architecture Notable Features (cont.): New Instructions

| Instruction name | Description |
| --- | --- |
| PSIGNB/W/D mm, mm/m64; PSIGNB/W/D xmm, xmm/m128 | Per element: if the source operand is negative, multiply the destination operand by -1; if the source operand is zero, zero the destination. |
| PABSB/W/D mm, mm/m64; PABSB/W/D xmm, xmm/m128 | Per element, overwrite destination with absolute value of source. |
| PHADDW/D/SW mm, mm/m64; PHADDW/D/SW xmm, xmm/m128 | Pairwise integer horizontal addition + pack (SW form saturates). |
| PHSUBW/D/SW mm, mm/m64; PHSUBW/D/SW xmm, xmm/m128 | Pairwise integer horizontal subtract + pack (SW form saturates). |
| PMADDUBSW mm, mm/m64; PMADDUBSW xmm, xmm/m128 | Multiply unsigned bytes (destination) by signed bytes (source); add adjacent products into signed words with saturation. (Multiply Accumulate) |
| PMULHRSW mm, mm/m64; PMULHRSW xmm, xmm/m128 | Signed 16-bit multiply with rounding and scaling; return the high bits. |
| PSHUFB mm, mm/m64; PSHUFB xmm, xmm/m128 | A complete byte-granularity permutation, including force-to-zero flag. |
| PALIGNR mm, mm/m64, imm8; PALIGNR xmm, xmm/m128, imm8 | Extract any contiguous 16 (8 in the 64-bit case) bytes from the pair [dst, src] and store them to the dst register. |
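The one-line descriptions in the table leave out some detail, so here is a scalar sketch in portable C of what a few of these instructions (Supplemental SSE3) compute. These are reference models for reading and testing, not the instructions themselves. All function names are hypothetical, and the per-lane helpers model a single element of the packed operation. The code assumes arithmetic right shift of negative values, which mainstream compilers provide.

```c
#include <stdint.h>
#include <string.h>

/* PSIGNW, one 16-bit lane: negate dst if src < 0, zero it if src == 0. */
int16_t psignw_lane(int16_t dst, int16_t src)
{
    if (src < 0) return (int16_t)-dst;
    if (src == 0) return 0;
    return dst;
}

/* PMULHRSW, one lane: signed multiply, keep 17 high bits, round, drop one. */
int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    int32_t t = ((int32_t)a * (int32_t)b) >> 14;
    return (int16_t)((t + 1) >> 1);
}

/* PMADDUBSW, one output word: unsigned bytes a0,a1 (destination) times
 * signed bytes b0,b1 (source), summed with signed saturation. */
int16_t pmaddubsw_lane(uint8_t a0, int8_t b0, uint8_t a1, int8_t b1)
{
    int32_t sum = (int32_t)a0 * b0 + (int32_t)a1 * b1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* PSHUFB, 128-bit form: each mask byte selects a source byte by its low
 * four bits, or forces the result byte to zero when bit 7 is set. */
void pshufb128(uint8_t data[16], const uint8_t mask[16])
{
    uint8_t old[16];
    memcpy(old, data, 16);
    for (int i = 0; i < 16; i++) {
        data[i] = (mask[i] & 0x80) ? 0 : old[mask[i] & 0x0F];
    }
}

/* PALIGNR, 128-bit form: treat dst:src as one 32-byte value (src is the
 * low half), shift right by imm8 bytes, keep the low 16 bytes in dst. */
void palignr128(uint8_t dst[16], const uint8_t src[16], unsigned imm8)
{
    uint8_t concat[32];
    memcpy(concat, src, 16);
    memcpy(concat + 16, dst, 16);
    for (unsigned i = 0; i < 16; i++) {
        dst[i] = (i + imm8 < 32) ? concat[i + imm8] : 0;
    }
}
```

For example, in Q15 fixed point `pmulhrsw_lane(16384, 16384)` multiplies 0.5 by 0.5 and yields 8192, which is 0.25.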

Using New Instructions with Intel Compiler

Architecture-tuning compiler switch: -QxT.

Power Status Indicator (Mobile)

Processor communicates power consumption to external platform components

• Optimization of voltage regulator efficiency

• Load line and power delivery efficiency

[Diagram: PSI-2 / VID signals from the processor to the voltage regulator (VR)]

DTS – Digital Thermal Sensor

Several thermal sensors are located within the Processor to cover all possible hot spots

Dedicated logic scans the thermal sensors and measures the maximum temperature on the die at any given time

Accurately reporting Processor temperature enables advanced thermal control schemes

Enabling Efficient Processor and Platform Thermal Control…

[Diagram: Core 1 and Core 2, each with DTS logic and an LPF, connected to DTS control and status]
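The slide does not describe how software sees the sensor, so the following is only a sketch under stated assumptions. It decodes a raw value laid out like the IA32_THERM_STATUS register in the Intel Software Developer's Manual: bit 31 is the reading-valid flag, and bits 22:16 are the digital readout in degrees C below the maximum junction temperature. The names `dts_decode` and `dts_hottest` and the caller-supplied `tjmax_c` are hypothetical. Reading the raw value is a privileged, OS-specific step and is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     valid;        /* bit 31: reading valid */
    unsigned below_tjmax;  /* bits 22:16: degrees C below TjMax */
} dts_reading;

/* Decode one raw status value (layout assumed as described above). */
dts_reading dts_decode(uint64_t therm_status)
{
    dts_reading r;
    r.valid       = ((therm_status >> 31) & 1u) != 0;
    r.below_tjmax = (unsigned)((therm_status >> 16) & 0x7Fu);
    return r;
}

/* Report the hottest valid reading among n raw values, in degrees C.
 * tjmax_c is an assumed, caller-supplied maximum junction temperature.
 * Returns false if no reading was valid; *out_c is then left unchanged. */
bool dts_hottest(const uint64_t *raw, size_t n, int tjmax_c, int *out_c)
{
    bool found = false;
    for (size_t i = 0; i < n; i++) {
        dts_reading r = dts_decode(raw[i]);
        if (!r.valid) continue;
        int temp_c = tjmax_c - (int)r.below_tjmax;
        if (!found || temp_c > *out_c) {
            *out_c = temp_c;
            found = true;
        }
    }
    return found;
}
```

Taking the maximum over the per-core readings mirrors, in software, what the slide says the dedicated logic does across the on-die sensors.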

Platform Environment Control Interface (PECI)

The processor provides its temperature reading over a multi-drop, single-wire bus, allowing efficient platform thermal control

[Diagram labels: PROC #1, PROC #2, PROC #3, PECI, Manager, Processor Fan, Auxiliary Fan, Chassis Fan 1, Chassis Fan 2]

Other Features – Platform Power Management (cont.)

Front side bus with the following low power improvements
• Lower voltage
• DPWR# and BPRI# signals
  • Must have FSB traffic to enable data and address bus input sense amplifiers and control signals (~120 pins)
• Eliminated higher address and dual processor capable pins

Intel® Core™ Microarchitecture

Other Features - Enhanced SpeedStep® Technology

Voltage-Frequency switching separation

Clock partitioning and recovery

Event blocking

Even during periods of high performance execution, many parts of the chip core can be shut off

Intel® Core® Micro-architecture Blocks

Disclaimer: This block diagram is for example purposes only. Significant hardware blocks have been arranged or omitted for clarity. Some resources (Bus Unit, L2 Cache, etc.) are shared between cores.

[Block diagram, labels recovered from the figure]
• Fetch / Decode: 32 KB Instruction Cache, Next IP, Branch Target Buffer, Instruction Decode (4 issue), Microcode Sequencer, Register Allocation Table (RAT)
• Execute: Reservation Stations (RS), 32 entry; Scheduler / Dispatch Ports; execution units FP Add, FP Div/Mul, Integer Shift/Rotate, SIMD Integer Arithmetic, Integer Arithmetic, Load, Store Addr, Store Data; Memory Order Buffer (MOB); 32 KB Data Cache
• Retire: Re-Order Buffer (ROB), 96 entry; IA Register Set
• Bus Unit, to L2 Cache

Intel® Core® Micro-architecture Notable Features

Enhanced Pipeline

In order:
• instruction fetch
• instruction decode
• micro-op rename
• micro-op allocate

[Pipeline diagram, labels recovered from the figure: Instruction Fetch and PreDecode; Instruction Queue; uCode ROM; Decode; Rename/Alloc; Retirement Unit (ReOrder Buffer); Schedulers; ALU, Branch, MMX/SSE, FPmove; ALU, FAdd, MMX/SSE, FPmove; ALU, FMul, MMX/SSE, FPmove; Load; Store; L1 D-Cache and D-TLB; 2M/4M shared L2 Cache; FSB, up to 10.4 Gb/s. Width annotations: 4, 4, 5.]

Intel® Core® Micro-architecture Notable Features

Enhanced Pipeline (cont.)

Out of order:
• micro-op schedule
• micro-op execute

Intel® Core® Micro-architecture Notable Features

Enhanced Pipeline (cont.)

Out of order:
• memory pipelines
• memory order unit maintains architectural ordering requirements

Intel® Core® Micro-architecture Notable Features

Enhanced Pipeline (cont.)

In order:
• micro-op retirement
• fault handling

The Retirement Unit maintains the illusion of in-order instruction retirement
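How a reorder buffer keeps retirement in order while execution completes out of order can be shown with a toy model. This is a sketch of the general technique, not a description of the actual hardware. The type and function names are hypothetical, retirement width per cycle is ignored, and only the 96-entry size is taken from the block diagram.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ROB_SIZE 96  /* entry count from the block diagram */

typedef struct {
    bool   done[ROB_SIZE];  /* has this micro-op finished executing? */
    size_t head;            /* oldest in-flight micro-op */
    size_t tail;            /* next free slot */
    size_t count;           /* micro-ops currently in flight */
} rob_t;

void rob_init(rob_t *r)
{
    memset(r, 0, sizeof *r);
}

/* Allocate a slot in program order. Returns the slot, or -1 if full. */
int rob_allocate(rob_t *r)
{
    if (r->count == ROB_SIZE) return -1;
    int slot = (int)r->tail;
    r->done[slot] = false;
    r->tail = (r->tail + 1) % ROB_SIZE;
    r->count++;
    return slot;
}

/* Execution units may report completion in any order. */
void rob_complete(rob_t *r, int slot)
{
    r->done[slot] = true;
}

/* Retire only from the head, and only while the oldest micro-op is done.
 * Returns how many micro-ops retired on this call. */
size_t rob_retire(rob_t *r)
{
    size_t retired = 0;
    while (r->count > 0 && r->done[r->head]) {
        r->done[r->head] = false;
        r->head = (r->head + 1) % ROB_SIZE;
        r->count--;
        retired++;
    }
    return retired;
}
```

If the youngest of three micro-ops completes first, `rob_retire` returns 0 until the oldest is also done, so architectural state is updated in program order and a fault can be taken at a precise point.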