Memory Hierarchy Adaptivity: An Architectural Perspective
Alex Veidenbaum, AMRM Project, sponsored by DARPA/ITO


Page 1: Memory Hierarchy Adaptivity: An Architectural Perspective

Alex Veidenbaum

AMRM Project

sponsored by DARPA/ITO

Page 2: Opportunities for Adaptivity

• Cache organization

• Cache performance “assist” mechanisms

• Hierarchy organization

• Memory organization (DRAM, etc)

• Data layout and address mapping

• Virtual Memory

• Compiler assist

Page 3: Opportunities - Cont’d

• Cache organization: adapt what?
  – Size: NO
  – Associativity: NO
  – Line size: MAYBE
  – Write policy: YES (fetch, allocate, write-back/through)
  – Mapping function: MAYBE
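Of the knobs above, write policy is the clearest YES. As a hedged sketch only (the interval length, threshold, and reuse heuristic are assumptions for illustration, not the AMRM design), a controller could pick the write-miss policy per interval from observed read reuse of written lines:

```python
# Hypothetical interval controller: choose write-allocate vs. no-allocate
# by measuring how often lines written in an interval are read again.

def choose_write_policy(trace, window=1000, reuse_threshold=0.2):
    """trace: list of ('R'|'W', line_address) pairs.
    Returns the policy chosen for each window of the trace."""
    policies = []
    for start in range(0, len(trace), window):
        written, reused = set(), 0
        for op, line in trace[start:start + window]:
            if op == 'W':
                written.add(line)
            elif line in written:          # a read touched a recently written line
                reused += 1
        reuse_rate = reused / max(1, len(written))
        # Allocate written lines in the cache only if writes show read reuse.
        policies.append('write-allocate' if reuse_rate >= reuse_threshold
                        else 'no-allocate')
    return policies
```

A streaming write kernel would flip to no-allocate, while read-modify-write data would keep write-allocate.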

Page 4: Opportunities - Cont’d

• Cache “Assist”: prefetch, write buffer, victim cache, etc. between different levels.

• Adapt what?
  – Which mechanism(s) to use
  – Mechanism “parameters”
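To make the “assist” idea concrete, here is a toy model of one such mechanism: a small fully-associative victim cache behind a direct-mapped L1. The sizes and FIFO replacement are assumptions for the sketch, not parameters from the slides:

```python
from collections import OrderedDict

def simulate(trace, sets=4, victim_entries=0):
    """Direct-mapped L1 (one line per set) with an optional FIFO victim cache.
    trace: list of line addresses. Returns the miss count."""
    l1 = {}                      # set index -> resident line address
    victim = OrderedDict()       # line address -> None, in FIFO order
    misses = 0
    for line in trace:
        idx = line % sets
        if l1.get(idx) == line:
            continue                         # L1 hit
        if line in victim:                   # victim-cache hit: swap lines
            del victim[line]
            evicted = l1.get(idx)
            l1[idx] = line
            if evicted is not None and victim_entries:
                victim[evicted] = None
            continue
        misses += 1                          # true miss: fetch from next level
        evicted = l1.get(idx)
        l1[idx] = line
        if evicted is not None and victim_entries:
            victim[evicted] = None
            if len(victim) > victim_entries:
                victim.popitem(last=False)   # FIFO eviction from victim cache
    return misses
```

With two lines that conflict in the same set, the victim cache converts all but the two cold misses into swaps.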

Page 5: Opportunities - Cont’d

• Hierarchy organization:
  – Where are cache assist mechanisms applied?

• Between L1 and L2

• Between L1 and Memory

• Between L2 and Memory

– What are the data-paths like?
  • Is prefetch, victim cache, or write buffer data written into the cache?

• How much parallelism is possible in the hierarchy?

Page 6: Opportunities - Cont’d

• Memory organization
  – Cached DRAM?
  – Interleave change?
  – PIM (processing in memory)

Page 7: Opportunities - Cont’d

• Data layout and address mapping
  – In theory, something can be done, but…
  – The MP case is even worse
  – Adaptive address mapping or hashing based on ???

Page 8: Opportunities - Cont’d

• Compiler assist
  – Can select the initial configuration
  – Pass hints on to hardware
  – Generate code to collect run-time info and adjust execution
  – Adapt the configuration after being “called” at certain intervals during execution
  – Select/run-time-optimize code

Page 9: Opportunities - Cont’d

• Virtual memory can adapt
  – Page size?
  – Mapping?
  – Page prefetching/read-ahead
  – Write buffer (file cache)
  – The above under multiprogramming?

Page 10: Applying Adaptivity

• What drives adaptivity?
  – Performance impact, overall and/or relative
  – “Effectiveness”, e.g. miss rate
  – Processor stall introduced
  – Program characteristics

• When to perform adaptive action
  – Run time: use feedback from hardware
  – Compile time: insert code, set up hardware
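The run-time option can be pictured as a simple feedback loop. In this hedged sketch, per-mechanism benefit counters (e.g. from shadow tags) are assumed to exist, which the slides do not specify:

```python
# Hypothetical run-time feedback loop: each interval, enable for the NEXT
# interval whichever mechanism measured the largest benefit in this one.

def adapt(intervals, mechanisms):
    """intervals: list of dicts mapping mechanism name -> misses it would
    have removed in that interval (assumed measurable by shadow counters).
    Returns the mechanism actually enabled during each interval."""
    choice, chosen = mechanisms[0], []
    for stats in intervals:
        chosen.append(choice)                             # policy in force now
        choice = max(mechanisms, key=lambda m: stats[m])  # feedback for next dT
    return chosen
```

Note the one-interval lag: the controller always acts on last interval's observation, which is exactly the "predict future behavior from observed past" assumption.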

Page 11: Where to Implement

• In software: compiler and/or OS
  + (Static) knowledge of program behavior
  + Factored into optimization and scheduling
  – Extra code, overhead
  – Lack of dynamic run-time information
  – Rate of adaptivity: requires recompilation, OS changes

Page 12: Where to Implement - Cont’d

• Hardware
  + Dynamic information available
  + Fast decision mechanism possible
  + Transparent to software (thus safe)
  – Delay and clock rate limit algorithm complexity
  – Difficult to maintain long-term trends
  – Little knowledge of program behavior

Page 13: Where to Implement - Cont’d

• Hardware/software
  + Software can set coarse hardware parameters
  + Hardware can supply software with dynamic info
  + Perhaps more complex algorithms can be used
  – Software modification required
  – Communication mechanism required

Page 14: Current Investigation

• L1 cache assist
  – See wide variability in assist-mechanism effectiveness between
    • individual programs
    • phases within a program, as a function of time
  – Propose hardware mechanisms to select between assist types and allocate buffer space
  – Give the compiler an opportunity to set parameters

Page 15: Mechanisms Used

• Prefetching
  – Stream buffers
  – Stride-directed, based on address alone
  – Miss stride: prefetch the same address using the number of intervening misses

• Victim cache

• Write buffer

All of the above sit after L1.
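A minimal sketch of the stride-directed option, working from the miss-address stream alone (no PC); the two-miss confirmation rule and one-line prefetch depth are assumptions for illustration:

```python
# Toy stride-directed prefetcher: once two consecutive miss addresses repeat
# the same stride, predict the next address at that stride.

def stride_prefetch(miss_addresses):
    """Return the list of addresses the prefetcher would request."""
    predictions = []
    last, stride = None, None
    for a in miss_addresses:
        if last is not None:
            new_stride = a - last
            if new_stride == stride:            # stride confirmed twice in a row
                predictions.append(a + stride)  # prefetch one line ahead
            stride = new_stride
        last = a
    return predictions
```

On a unit-stride array walk this issues a prefetch for every miss after the second; on pointer-chasing code it stays silent, which is why no single mechanism wins everywhere.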

Page 16: Mechanisms Used - Cont’d

• A mechanism can be used by itself, or all are used at once

• Buffer space size and organization fixed

• No adaptivity involved

Page 17: Observed Behavior

• Programs see different effects from each mechanism, i.e. none is a consistent winner

• Within a program, the same holds in the time domain: the best mechanism varies from phase to phase

Page 18: Observed Behavior - Cont’d

• Both of the above facts indicate a likely improvement from adaptivity
  – Select the better of the available mechanisms

• Even more can be expected from adaptively re-allocating the combined buffer pool
  – To reduce stall time
  – To reduce the number of misses

Page 19: Proposed Adaptive Mechanism

• Hardware:
  – A common pool of 2-4-word buffers
  – A set of possible policies, a subset of:
    • Stride-directed prefetch
    • PC-based prefetch
    • History-based prefetch
    • Victim cache
    • Write buffer

Page 20: Adaptive Hardware - Cont’d

• Performance monitors for each type/buffer
  – Misses, stall time on hit, thresholds

• Dynamic buffer allocator among mechanisms

• Allocation and monitoring policy:
  – Predict future behavior from the observed past
  – Observe over a time interval dT, set for the next
  – Save performance trends in next-level tags (<8 bits)

Page 21: Further Opportunities to Adapt

• L2 cache organization
  – Variable-size line

• L2 non-sequential prefetch

• In-memory assists (DRAM)

Page 22: MP Opportunities

• Even longer latency

• Coherence, hardware or software

• Synchronization

• Prefetch under and beyond the above
  – Avoid coherence traffic if possible
  – Prefetch past synchronization

• Assist Adaptive Scheduling