“Early Estimation of Cache Properties for Multicore Embedded Processors” ISERD ICETM 2015 Bangkok, Thailand May 16, 2015



Presenter: Dr. Abu Asaduzzaman, Assistant Professor
Prepared by: Mr. Kishore K. Chidella, PhD Student

Computer Architecture and Parallel Programming Laboratory (CAPPLab)
Department of Electrical Engineering and Computer Science (EECS)

Wichita State University (WSU), USA


Outline
■ Introduction
• Embedded systems with multicore processors
• Pros and cons due to cache
■ Background and Motivation
• Impact of cache on performance and power consumption
• Optimized cache improves the performance-to-power ratio
■ Proposed Cache Modeling Strategy
• Multicore architecture for embedded systems
• Work-flow diagram
■ Experimental Results
■ Discussion

QUESTIONS? Any time!


Authors
■ Kishore K. Chidella, PhD Student
• EECS Department, Wichita State University (WSU), USA
■ Muhammad F. Mridha, Assistant Professor
• CSE Department, University of Asia Pacific (UAP), Bangladesh
■ Abu Asaduzzaman, Assistant Professor
• EECS Department, Wichita State University (WSU), USA
• Director, Computer Arch & Parallel Prog Lab (CAPPLab)


Introduction
■ Multicore Embedded Systems
• Future embedded systems should have multicore processors.
• Currently available single-core based simulation techniques are not adequate to design multicore embedded systems [1-4].
• Software applications use more and more threads to take advantage of the available cores [5-8].
• Multicore processors are frequently deployed with multilevel cache memories [9].
• Parallel thread execution that achieves the best performance in such a multicore system is difficult because of cache sharing.
• Design methodologies for complex embedded systems need support from early estimation techniques.


Background and Motivation
■ Some Early Work
• The technical challenges associated with integrating homogeneous and heterogeneous multiple cores in embedded systems are elucidated in [1].
• However, [1] does not provide a viable way to make early estimates for future embedded systems designs.
• According to the experimental results published in [4], cache parameters and the application code size have an impact on total power consumption and mean delay per task.
• That approach is not focused on designing embedded systems and does not cover cache locking.


Background and Motivation (+)
■ Some Early Work
• Issues related to cache locking at level-1 and level-2 caches are discussed in [11, 12].
• In [14], various algorithms for selecting a set of instructions to be locked in cache are compared. Cache locking may improve performance.
• Locking the entire level-1 cache (100% of the cache size) is not efficient for some applications, especially when the data to be locked is small compared to the cache size.
• Worst-case performance with locked caches may degrade with large cache lines due to cache pollution [12].


Background and Motivation (+)
■ Some Early Work
• These techniques were developed for single-core systems and are not suitable for contemporary multicore embedded systems.
• These techniques are also not useful for estimating power consumption, a crucial design factor for embedded systems.
• Therefore, an early estimation technique to evaluate cache properties for multicore embedded systems is required.


Proposed Cache Modeling Strategy
■ Multicore Cache Organization
Level-1
• Private
• Split into I1 and D1
Level-2
• Private or Shared
• Unified
Level-3
• Optional (or Shared)


Proposed Cache Modeling Strategy (+)
■ Cache Locking
• Private first level cache?
• Shared last level cache?
• Entire locking or partial/way locking?


Proposed Cache Modeling Strategy (+)
■ Work-Flow
Master Core
• Select jobs
• Assign jobs
• Pre-load cache memory
• Mean delay; Total power
Core x
• Select cache size
• Lock? (Yes or No)
• Assign task
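The work-flow above can be sketched in Python as a rough illustration. This is not the authors' VisualSim model: the names `Job`, `CoreConfig`, `RunTask`, and `master_core` are hypothetical, round-robin job assignment is an assumption, and the cache pre-load step is assumed to happen inside the caller-supplied `run_task` function, which stands in for whatever simulator produces a delay and a power figure for one task.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CoreConfig:
    """Per-core choices from the work-flow (hypothetical structure)."""
    cl2_size_kb: int   # "Select cache size"
    lock: bool         # "Lock? (Yes or No)"


@dataclass
class Job:
    name: str
    code_size_kb: float


# Hypothetical simulator hook: runs one job on one core configuration
# and returns (delay, power) for that task.
RunTask = Callable[[Job, CoreConfig], Tuple[float, float]]


def master_core(jobs: List[Job], cores: List[CoreConfig],
                run_task: RunTask) -> Tuple[float, float]:
    """Assign jobs to cores and report (mean delay per task, total power)."""
    if not jobs or not cores:
        raise ValueError("need at least one job and one core")
    delays = []
    total_power = 0.0
    for index, job in enumerate(jobs):
        # Assumption: round-robin assignment; the slides do not say how
        # the master core distributes jobs.
        config = cores[index % len(cores)]
        delay, power = run_task(job, config)
        delays.append(delay)
        total_power += power
    return sum(delays) / len(delays), total_power
```

The two returned values mirror the outputs named on the slide (mean delay and total power); everything that determines their actual values is left to `run_task`.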


Simulation
■ Simulation Tool
• The VisualSim tool is used to develop the modeling platform
■ Applications to Run the Simulation Program
• FFT (Fast Fourier Transform)
• GIF (Graphics Interchange Format)
• JPEG (Joint Photographic Experts Group)
• MPEG-3 (Moving Picture Experts Group)
• MPEG-4
Here, FFT is the smallest application (code size 2.34 KB) and MPEG-4 is the largest (code size 91.83 KB).


Input / Output Parameters
■ Inputs
• Number of cores: 4 (fixed)
• I1 / D1 size (KB): 2 / 2 (fixed)
• Line size (Byte): 128 (fixed)
• Associativity level (n-way): 8 (fixed)
• CL2 cache size (KB): 32, 64, 128, 256, or 512
• Locked CL2 cache size (%): 0.0, 12.5, 25.0, 37.5, 50.0
■ Outputs
• Mean delay per task
• Total power consumption
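To make the input space concrete, the following Python sketch enumerates the CL2 configurations implied by the parameters above and derives the basic geometry of each one. It uses only standard set-associative cache arithmetic. The mapping from locked percentage to locked ways is an assumption: with 8-way associativity each 12.5% step equals one way, which would fit way locking, but the slides do not state how locking is implemented. The function name `cl2_geometry` is hypothetical.

```python
from itertools import product

# Fixed and swept inputs taken from the slide.
LINE_SIZE_BYTES = 128
ASSOCIATIVITY = 8
CL2_SIZES_KB = [32, 64, 128, 256, 512]
LOCKED_PERCENT = [0.0, 12.5, 25.0, 37.5, 50.0]


def cl2_geometry(size_kb, locked_percent):
    """Derive line/set counts and locked capacity for one CL2 configuration."""
    total_lines = size_kb * 1024 // LINE_SIZE_BYTES
    sets = total_lines // ASSOCIATIVITY
    return {
        "size_kb": size_kb,
        "locked_percent": locked_percent,
        "lines": total_lines,
        "sets": sets,
        "locked_kb": size_kb * locked_percent / 100.0,
        # Assumption: locking is applied per way, so 12.5% of an
        # 8-way cache corresponds to exactly one way.
        "locked_ways": ASSOCIATIVITY * locked_percent / 100.0,
    }


# Every (size, locked %) pair in the sweep: 5 sizes x 5 locking levels.
configs = [cl2_geometry(size, pct)
           for size, pct in product(CL2_SIZES_KB, LOCKED_PERCENT)]
```

For instance, a 32 KB CL2 with 128-byte lines holds 256 lines in 32 sets, and locking 25% of it reserves 8 KB.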


Experimental Results
■ Shared L2 Cache Size
• JPEG behaves almost like GIF, and MPEG-3 behaves almost like MPEG-4.
• For CL2 cache sizes from 32 KB to 128 KB, mean delay per task and total power consumption for MPEG-4 decrease significantly when the cache size is increased and/or locking is raised from none to 25%.
• The impact of shared CL2 on power consumption is more significant than its impact on delay.


Experimental Results (+)
■ Shared L2 Cache Size
• Only at a CL2 cache size of 32 KB do mean delay per task and total power consumption for GIF decrease when 25% locking is applied.
• However, CL2 cache size/locking has no positive impact on mean delay per task and total power consumption for FFT.
• Increasing CL2 size beyond 128 KB has no positive impact (it consumes more power without reducing the delay).
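One way to relate these observations to the workload sizes given earlier is to compare each application's code size with the CL2 sizes in the sweep. The Python sketch below does only that comparison. It is an illustrative reading, not an analysis from the slides: it ignores data footprint, sharing among the four cores, and locking, and the helper name `smallest_fitting_cl2` is hypothetical. Only the two code sizes stated in the deck (FFT and MPEG-4) are used.

```python
# Code sizes (KB) stated in the deck; the other workloads' sizes are not given.
CODE_SIZE_KB = {"FFT": 2.34, "MPEG-4": 91.83}
CL2_SIZES_KB = [32, 64, 128, 256, 512]


def smallest_fitting_cl2(code_size_kb, sizes_kb=CL2_SIZES_KB):
    """Return the smallest CL2 size (KB) that is at least the code size.

    Returns None if the code is larger than every size in the sweep.
    """
    for size in sorted(sizes_kb):
        if size >= code_size_kb:
            return size
    return None


fits = {name: smallest_fitting_cl2(size) for name, size in CODE_SIZE_KB.items()}
```

Under this simplified view, FFT already fits in the smallest CL2 (32 KB), while MPEG-4 first fits at 128 KB, the size beyond which the slides report no further benefit.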


Experimental Results (+)
■ Shared L2 Cache Locking
• Cache locking at shared CL2 has a more significant impact on mean delay per task and total power consumption for large applications (like MPEG-4) than for small applications (like FFT).
• According to the shared CL2 cache locking results, the optimal performance (delay)/power ratio is obtained at 25% cache locking for all the workloads.
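Selecting the locking level with the best performance/power ratio from a set of simulation outputs could look like the Python sketch below. The slides do not define the ratio, so the definition here is an assumption: performance is taken as the reciprocal of mean delay, which makes the ratio 1 / (delay × power). The function name `best_locking` and the input layout are hypothetical, and no measured values from the study are reproduced.

```python
from typing import Dict, Tuple


def best_locking(results: Dict[float, Tuple[float, float]]) -> float:
    """Pick the locked-cache percentage with the highest performance/power ratio.

    `results` maps locked % -> (mean delay per task, total power).
    Assumption: performance = 1 / delay, so ratio = 1 / (delay * power).
    """
    if not results:
        raise ValueError("no results to compare")
    for delay, power in results.values():
        if delay <= 0 or power <= 0:
            raise ValueError("delay and power must be positive")

    def ratio(item):
        _, (delay, power) = item
        return 1.0 / (delay * power)

    # max() over (locked %, (delay, power)) pairs, ranked by the ratio.
    return max(results.items(), key=ratio)[0]
```

A different definition of the ratio (for example, weighting power more heavily than delay) would only require changing the `ratio` helper.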


Conclusions
■ A simulation methodology is presented for early estimation of the effective cache properties (parameters and locked cache size) for multicore embedded systems.
■ A quad-core system with shared CL2 is simulated using FFT, GIF, JPEG, MPEG-3, and MPEG-4 workloads.
■ Although both mean delay per task and total power consumption decrease when shared CL2 cache size is increased and/or cache locking is applied, the impact of shared CL2 on power consumption is more significant than its impact on delay.


Thank You!

QUESTIONS?

Contact: Abu Asaduzzaman
E-mail: [email protected]
Phone: +1-316-978-5261
CAPPLab: http://www.cs.wichita.edu/~capplab/
