Page 1: Adding PEP to Multicore - MIT Computer Science and ...projects.csail.mit.edu/angstrom/SLS/PEP_presentation_v0.1.pdf · Adding PEP to Multicore Eric Lau Jason Miller, Inseok Choi,

Adding PEP to Multicore

Eric Lau

Jason Miller, Inseok Choi, Omer Khan, Jim Holt, Anantha Chandrakasan, Donald Yeung, Saman Amarasinghe, Anant Agarwal

Feb 16, 2011

Outline
• Motivation
• PEP Concept
• PEP Core Architecture
• Graphite Simulation
• Applications
• Conclusion

3/16/2011 Angstrom Lunch Seminar 2

Motivation PEP Concept Architecture Graphite Applications Conclusion

Angstrom

Decrease programming effort

Increase performance

Increase resiliency

Increase energy efficiency

Positive Energy Partnerships
• Non-application resources that perform tasks for a net gain in energy.
• Hardware Partnerships
  • shared resources
  • hard-wired events
• Software Partnerships
  • expose to h/w ISA
  • algorithms
  • event servicing
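
The "net gain in energy" criterion can be made concrete with a little bookkeeping. The sketch below is an assumption about how one might account for it, not a formula from the talk: a partnership counts as positive when the energy the main core saves exceeds the energy the helper resource spends. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical energy accounting for one partnership, in picojoules. */
typedef struct {
    double main_energy_alone;    /* main core energy without the helper resource */
    double main_energy_with_pep; /* main core energy when the helper assists */
    double pep_energy;           /* energy consumed by the helper resource itself */
} pep_energy_budget;

/* Net energy saved by the partnership; a positive value is a net gain. */
double pep_net_gain(pep_energy_budget b)
{
    assert(b.main_energy_alone >= 0.0);
    assert(b.main_energy_with_pep >= 0.0);
    assert(b.pep_energy >= 0.0);
    return b.main_energy_alone - (b.main_energy_with_pep + b.pep_energy);
}

/* True when the helper saves more energy than it costs. */
bool pep_is_positive(pep_energy_budget b)
{
    return pep_net_gain(b) > 0.0;
}
```

For example, a helper that costs 50 pJ and cuts main-core energy from 1000 pJ to 800 pJ yields a net gain of 150 pJ; the same helper saving only 10 pJ would be a net loss.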

Positive Energy Partnerships
• PEP Cores

[Figure: a tile containing a Router, a Compute Core, a Cache, and a PEP core]

PEP Core Architecture

PEP Core Architecture
• Area Considerations
  • μController: 16-bit, RISC, in-order datapath, unified cache
  • RAW: 32-bit, RISC, 8-stage, in-order datapath, split cache
  • Core 2: 64-bit, x86, 14-stage, out-of-order datapath, split cache

Core          Node (nm)  SRAM   Size (mm²)  Scaled Size (mm²)
μController   65         128K   1.5         0.03
RAW           180        128K   16          0.25
Intel Core 2  65         2M/4M  80          9.0

PEP Core Architecture
• Power Considerations
  • μController: 16-bit, RISC, 1 MHz
  • TilePro64: 64-bit, VLIW, 700-866 MHz
  • Core 2: 64-bit, x86, 1-2.4 GHz

Core          Energy/Cycle
μController   27.2 pJ
TilePro64     286 pJ
Intel Core 2  101 000 pJ
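
To put the energy-per-cycle figures side by side, the sketch below computes how many times more energy each core spends per cycle than the μController. The constants are copied from the table above; the function names are illustrative. Energy per cycle is not energy per unit of work (the cores differ in word width, issue width, and clock rate), so the ratios are only a rough guide.

```c
#include <assert.h>

/* Energy per cycle in picojoules, as listed on the slide. */
static const double UCONTROLLER_PJ = 27.2;
static const double TILEPRO64_PJ   = 286.0;
static const double CORE2_PJ       = 101000.0;

/* How many times more energy per cycle core_pj costs than baseline_pj. */
double energy_ratio(double core_pj, double baseline_pj)
{
    assert(baseline_pj > 0.0);
    return core_pj / baseline_pj;
}

/* TilePro64 relative to the microcontroller: roughly 10.5x. */
double tilepro64_vs_ucontroller(void)
{
    return energy_ratio(TILEPRO64_PJ, UCONTROLLER_PJ);
}

/* Core 2 relative to the microcontroller: roughly 3700x. */
double core2_vs_ucontroller(void)
{
    return energy_ratio(CORE2_PJ, UCONTROLLER_PJ);
}
```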

Graphite

[Figure: Graphite simulator. Labels: Application; Application Threads; Target Architecture (3×3 grid of Target Cores); Graphite; Host Threads; three Host Processes]

Graphite

[Figure: Graphite simulator with PEP cores. Labels: Application; Application Threads; Target Architecture (3×3 grid of tiles, each labeled "C P", presumably a compute core paired with a PEP core); Graphite; Host Threads; three Host Processes]

Potential Applications
• Case Study
  – Memory Prefetching
• Other Applications
  – Security
  – Reliability
  – Event Probing

Case Study: Memory Prefetching
• Pre-Execution
• PEP cores are free to run with minimal energy:
  – Low clock frequency
  – Low power hardware
  – Tight coupling
• Helper thread extracted from main thread.
  – Static compiler
  – Dynamic extraction

Case Study: Helper Threads
• EM3D benchmark (Execution Time)

[Figure: Normalized Execution Time (y-axis, 0.0 to 2.5) versus PEP Frequency/Main Frequency (x-axis, 0 to 1.1). Data labels: 1.08, 1.17, 1.27, 1.38, 2.07, 2.36]

Case Study: Helper Threads
• EM3D benchmark (Energy Efficiency)

[Figure: Normalized Energy Efficiency (y-axis, 0.0 to 1.8) versus PEP Frequency/Main Frequency (x-axis, 0 to 1.1). Data labels: 1.0793, 1.1623, 1.2436, 1.2592, 1.5797, 1.1944]

Potential Applications
• Security
  – PEP core does not run application code
    • Immunity to application level attacks!
  – Dynamic Information Flow Tracking (DIFT)
    • Taint pointers using PEP core
    • Run DIFT instructions in PEP
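
As a rough illustration of the bookkeeping DIFT involves, the sketch below keeps one taint bit per register, propagates taint through two-operand instructions, and refuses tainted pointers. It is a generic software model written for this transcript, not the PEP hardware design; the register count, struct, and function names are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_REGS 32  /* assumed register-file size, for illustration only */

/* Hypothetical shadow state: one taint bit per main-core register. */
typedef struct {
    bool taint[NUM_REGS];
} dift_state;

/* Mark a register as holding untrusted data, e.g. a value read from input. */
void dift_taint_source(dift_state *s, unsigned rd)
{
    assert(rd < NUM_REGS);
    s->taint[rd] = true;
}

/* rd = rs1 op rs2: the result is tainted if either operand is tainted. */
void dift_propagate(dift_state *s, unsigned rd, unsigned rs1, unsigned rs2)
{
    assert(rd < NUM_REGS && rs1 < NUM_REGS && rs2 < NUM_REGS);
    s->taint[rd] = s->taint[rs1] || s->taint[rs2];
}

/* Policy check before a register is used as a pointer or jump target.
 * Returns true when the use is allowed (the register is untainted). */
bool dift_check_pointer(const dift_state *s, unsigned rs)
{
    assert(rs < NUM_REGS);
    return !s->taint[rs];
}
```

In this model the main core would only report which registers each instruction reads and writes; the propagation and checks are the part the slides propose running on the PEP core.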

Potential Applications
• Security
  – Helper thread runs ahead only on DIFT instructions
  – Tight coupling allows access to register values

[Figure: Main Pipeline running the main thread and PEP Pipeline running the helper thread, with a taint register]

Potential Applications
• Reliability
  – With 1000 cores on a single chip and ultra-low-voltage logic, transient faults become a serious problem.
  – Redundant Multithreading (RMT)
    • Create a leading thread and a trailing thread.
    • Perform the exact same execution on both threads.
    • Commit results only if both threads yield the same results.
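
The three RMT steps can be sketched in a few lines. This is a sequential software model under simplifying assumptions, not the hardware scheme: the "leading" and "trailing" executions are ordinary function calls, the work is a pure function of one integer, and a transient fault is simulated by a function that flips one result bit. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The protected computation: a pure function of its input. */
typedef int (*rmt_work_fn)(int input);

/* Run the work twice on the same (replicated) input and commit the result
 * only if both executions agree. Returns true on commit; on a mismatch
 * *committed is left untouched. */
bool rmt_execute(rmt_work_fn leading, rmt_work_fn trailing,
                 int input, int *committed)
{
    assert(leading != NULL && trailing != NULL && committed != NULL);
    int lead_result  = leading(input);    /* leading execution */
    int trail_result = trailing(input);   /* trailing execution, same input */
    if (lead_result != trail_result)
        return false;                     /* output comparison failed */
    *committed = lead_result;             /* outputs match: commit */
    return true;
}

/* Example work function. */
int square(int x) { return x * x; }

/* Same work with a simulated transient fault: one flipped result bit. */
int square_with_fault(int x) { return (x * x) ^ 1; }
```

Calling `rmt_execute(square, square, 7, &out)` commits 49, while pairing `square` with `square_with_fault` detects the mismatch and commits nothing.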

Potential Applications
• Reliability
  – Implementation
    • SMT: Low hardware overhead; incurs switching overhead.
    • CMP: Simple to implement; repeats mis-speculations.
    • PEP cores gain advantages of both!

[Figure: leading and trailing threads sharing Memory/RF/$, with Input Replication and Output Comparison]

Potential Applications
• Self-Aware Computing
  – Requires a way to monitor itself:
    • SMT
    • Extra thread
    • Extra core
  – PEP core possesses detached execution context.
    • Main core keeps running!

Potential Applications
• Event Probing

[Figure: a tile with Main Pipeline, Cache, and PEP Core, an On-chip Network Switch, and a PEP Network Switch; labels "Low" and "High"]

Conclusion
• PEP cores allow for several optimizations with a low energy/area cost (<10%).
• Next steps:
  – Evaluate more applications with more benchmarks.
  – Implement a comprehensive scheme for all these applications.
  – Determine proper communication protocol between PEP cores and main cores.

Case Study: Memory Prefetching

MAIN

    CarbonSpawnHelperThread(helper);
    ...
    CarbonCondBroadcast(&resume_cond);
    cur_node = node_vec;

    for (n = 0; n < N; ++n, ++cur_node) {
        /* Publish progress so the helper can track the main thread. */
        CarbonMutexLock(&counter_lock);
        shared_counter++;
        CarbonMutexUnlock(&counter_lock);

        cur_value = val(cur_node);
        values = cur_node->values;
        coeffs = cur_node->coeffs;

        for (i = 0; i < stop; i++)
            cur_value -= values[i] * coeffs[i];

        val(cur_node) = cur_value;
    }

HELPER

    void *helper() {
        while (1) {
            CarbonCondWait(&resume_cond);

            // Synchronization Code
            ...
            local_counter = shared_counter++;
            ...

            values = cur_node->values;
            coeffs = cur_node->coeffs;

            /* Touch the same data as the main loop to pull it into the cache. */
            for (i = 0; i < stop; i++) {
                coeff_dummy = coeffs[i];
                value_dummy = values[i];
            }
        }
    }

Case Study: Memory Prefetching

HELPER (synchronization code)

    while (l_counter < n_nodes) {
        while (1) {
            /* Read the main thread's progress. */
            CarbonMutexLock(&counter_lock);
            c_counter = shared_counter;
            CarbonMutexUnlock(&counter_lock);

            offset = l_counter - c_counter;

            if (offset >= MIN && offset < MAX) {
                break;                           /* inside the run-ahead window */
            } else if (offset <= MIN_OFFSET) {
                l_counter = c_counter + OFFSET;  /* fell behind: jump ahead of main */
                break;
            }
            /* otherwise too far ahead: spin until main catches up */
        }

        /* Advance the helper's node pointer to its new position. */
        while (prev_l_counter < l_counter) {
            cur_node++;
            prev_l_counter++;
        }
    }
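
The window logic in the helper's synchronization code can be isolated as a pure function, which makes the three cases easier to see. The sketch below is a reading of that logic under assumptions, not the authors' code: the slides give no values for the window bounds, and the slide's "fell behind" threshold is simplified here to the lower bound of the window. Type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical run-ahead window, measured in nodes. */
typedef struct {
    long min_ahead;   /* helper should be at least this far ahead of main */
    long max_ahead;   /* helper should stay fewer than this many nodes ahead */
    long jump_ahead;  /* lead to jump to after falling behind */
} runahead_window;

typedef enum { HELPER_PROCEED, HELPER_JUMP, HELPER_WAIT } helper_action;

/* Decide what the helper should do given the main thread's position.
 * On HELPER_JUMP, *helper_pos is moved to main_pos + jump_ahead. */
helper_action runahead_step(const runahead_window *w, long main_pos,
                            long *helper_pos)
{
    assert(w != NULL && helper_pos != NULL);
    assert(w->min_ahead <= w->jump_ahead && w->jump_ahead < w->max_ahead);

    long offset = *helper_pos - main_pos;
    if (offset >= w->min_ahead && offset < w->max_ahead)
        return HELPER_PROCEED;                   /* inside the window */
    if (offset < w->min_ahead) {
        *helper_pos = main_pos + w->jump_ahead;  /* fell behind: skip forward */
        return HELPER_JUMP;
    }
    return HELPER_WAIT;                          /* too far ahead of main */
}
```

With a window of `{2, 8, 4}`, a helper 5 nodes ahead proceeds, a helper 2 nodes behind jumps to 4 nodes ahead, and a helper 17 nodes ahead waits.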

Backup Slides

Case Study: Memory Prefetching
• Simulation Results on Graphite
  – Graphite instantiates two cores per tile.
  – Each core runs a separate context (stack, registers, etc.)
  – PEP and main cores contain private L1 and shared L2.
  – Synchronization through finite barrier across all threads.
• EM3D benchmark (Olden Suite)
