memory hierarchy adaptivity an architectural perspective alex veidenbaum amrm project sponsored by...

Memory Hierarchy Adaptivity: An Architectural Perspective

Alex Veidenbaum

AMRM Project

sponsored by DARPA/ITO

Opportunities for Adaptivity

• Cache organization

• Cache performance “assist” mechanisms

• Hierarchy organization

• Memory organization (DRAM, etc.)

• Data layout and address mapping

• Virtual Memory

• Compiler assist

Opportunities - Cont’d

• Cache organization: adapt what?
– Size: NO
– Associativity: NO
– Line size: MAYBE
– Write policy: YES (fetch, allocate, write-back/write-through)
– Mapping function: MAYBE
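A rough illustration of why write policy is worth adapting: the best setting depends on the access pattern. The sketch below is a hypothetical model (direct-mapped, one word per line; the class name `WritePolicyCache`, the `run` helper, and their parameters are ours, not part of the AMRM design) that counts memory reads and writes under the write-back/write-through and allocate/no-allocate combinations.

```python
# Hypothetical teaching model, not the AMRM design: a direct-mapped cache
# with one word per line that counts memory traffic per write policy.


class WritePolicyCache:
    def __init__(self, num_lines, write_back=True, write_allocate=True):
        self.num_lines = num_lines
        self.write_back = write_back          # False means write-through
        self.write_allocate = write_allocate  # False means no-write-allocate
        self.tags = [None] * num_lines
        self.dirty = [False] * num_lines
        self.mem_reads = 0
        self.mem_writes = 0

    def _fill(self, addr):
        idx = addr % self.num_lines
        if self.dirty[idx]:
            self.mem_writes += 1  # a dirty victim must be written back
        self.tags[idx] = addr
        self.dirty[idx] = False
        self.mem_reads += 1       # fetch the missing line from memory

    def access(self, addr, is_write=False):
        idx = addr % self.num_lines
        if self.tags[idx] != addr:                      # miss
            if is_write and not self.write_allocate:
                self.mem_writes += 1                    # write around the cache
                return
            self._fill(addr)
        if is_write:
            if self.write_back:
                self.dirty[idx] = True                  # defer the memory write
            else:
                self.mem_writes += 1                    # write-through: every store


def run(cache, trace):
    """Replay (address, is_write) pairs; return (memory reads, memory writes).
    Dirty lines still resident at the end are not flushed."""
    for addr, is_write in trace:
        cache.access(addr, is_write)
    return cache.mem_reads, cache.mem_writes


same_word = [(0, True)] * 10                # repeated stores to one word
stream = [(a, True) for a in range(8)]      # stores that never revisit a word
print(run(WritePolicyCache(4), same_word))                                       # (1, 0)
print(run(WritePolicyCache(4, write_back=False, write_allocate=False), stream))  # (0, 8)
```

In this model, write-back with allocation suits the repeated stores, while write-through without allocation avoids useless line fetches for the streaming stores: a program-dependent trade-off of the kind that makes write policy a candidate for adaptation.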


• Cache “assist”: prefetch, write buffer, victim cache, etc., between different levels

• Adapt what?
– Which mechanism(s) to use
– Mechanism “parameters”


• Hierarchy organization:
– Where are cache assist mechanisms applied?
• Between L1 and L2
• Between L1 and memory
• Between L2 and memory
– What are the data paths like?
• Is prefetch, victim-cache, or write-buffer data written into the cache?
• How much parallelism is possible in the hierarchy?


• Memory organization
– Cached DRAM?
– Interleave change?
– PIM (processor-in-memory)


• Data layout and address mapping
– In theory, something can be done, but…
– The MP case is even worse
– Adaptive address mapping or hashing based on ???
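The slide leaves open what an adaptive mapping would be based on, and this does not answer that. Purely as an assumed example, one well-known alternative to modulo set indexing is XOR-folding higher block-address bits into the index. The sketch below (all function names hypothetical) counts how many sets each mapping touches for a sample of block addresses and picks the one that spreads them more widely. A real remapping would also have to flush or migrate resident lines, which is ignored here.

```python
# Hypothetical sketch: choose between modulo and XOR-folded set indexing.
# Not taken from the AMRM project; XOR folding is a swapped-in example.


def modulo_index(block, index_bits):
    """Conventional mapping: the low-order block-address bits select the set."""
    return block & ((1 << index_bits) - 1)


def xor_index(block, index_bits):
    """XOR-folded mapping: fold the next index_bits higher bits into the index."""
    mask = (1 << index_bits) - 1
    return (block ^ (block >> index_bits)) & mask


def sets_touched(blocks, index_fn, index_bits):
    """Number of distinct sets that a sample of block addresses maps to."""
    return len({index_fn(b, index_bits) for b in blocks})


def pick_mapping(blocks, index_bits):
    """Pick the mapping that spreads the sample over more sets.
    max() returns the first maximal candidate, so ties keep modulo."""
    candidates = [("modulo", modulo_index), ("xor", xor_index)]
    name, _ = max(candidates, key=lambda c: sets_touched(blocks, c[1], index_bits))
    return name


strided = [i * 64 for i in range(16)]      # stride equal to the number of sets
print(sets_touched(strided, modulo_index, 6))  # 1
print(sets_touched(strided, xor_index, 6))     # 16
print(pick_mapping(strided, 6))                # xor
print(pick_mapping(list(range(64)), 6))        # modulo
```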


• Compiler assist
– Can select the initial configuration
– Pass hints on to hardware
– Generate code to collect run-time info and adjust execution
– Adapt configuration after being “called” at certain intervals during execution
– Select/run-time optimize code


• Virtual memory can adapt:
– Page size?
– Mapping?
– Page prefetching/read-ahead
– Write buffer (file cache)
– All of the above under multiprogramming?

Applying Adaptivity

• What drives adaptivity? Performance impact, overall and/or relative:
– “Effectiveness”, e.g., miss rate
– Processor stall introduced
– Program characteristics

• When to perform adaptive action:
– Run time: use feedback from hardware
– Compile time: insert code, set up hardware

Where to Implement

• In software: compiler and/or OS
+ (Static) knowledge of program behavior
+ Factored into optimization and scheduling
– Extra code, overhead
– Lack of dynamic run-time information
– Limited rate of adaptivity
– Requires recompilation, OS changes

Where to Implement - Cont’d

• Hardware
+ Dynamic information available
+ Fast decision mechanism possible
+ Transparent to software (thus safe)
– Delay and clock rate limit algorithm complexity
– Difficult to maintain long-term trends
– Little knowledge about program behavior

Where to Implement - Cont’d

• Hardware/software
+ Software can set coarse hardware parameters
+ Hardware can supply software with dynamic info
+ Perhaps more complex algorithms can be used
– Software modification required
– Communication mechanism required

Current Investigation

• L1 cache assist
– See wide variability in assist-mechanism effectiveness:
• Between individual programs
• Within a program as a function of time
– Propose hardware mechanisms to select between assist types and allocate buffer space
– Give the compiler an opportunity to set parameters

Mechanisms Used

• Prefetching
– Stream buffers
– Stride-directed, based on address alone
– Miss stride: prefetch the same address using the number of intervening misses

• Victim cache

• Write buffer

• All mechanisms are placed after L1
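The two address-based prefetchers listed above (stride-directed and miss stride) can be sketched as predictors over the L1 miss stream. This is a minimal illustration under our own assumptions: the class names, the `degree` and `lead` parameters, and the unbounded tables are hypothetical, and stream buffers are not modeled. The miss-stride sketch replays a fixed miss trace, so it ignores that a successful prefetch removes the very miss it predicted; a real design would need to keep counting such prefetch hits as misses.

```python
# Hypothetical sketches of two address-only prefetchers; not the AMRM code.


class StridePrefetcher:
    """Stride-directed: uses miss addresses only (no PC)."""

    def __init__(self, degree=1):
        self.degree = degree        # how many strides ahead to prefetch
        self.last_addr = None
        self.last_stride = None

    def on_miss(self, addr):
        prefetches = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                # Same stride twice in a row: predict the next addresses.
                prefetches = [addr + stride * k for k in range(1, self.degree + 1)]
            self.last_stride = stride
        self.last_addr = addr
        return prefetches


class MissStridePrefetcher:
    """Miss stride: if an address re-misses after the same number of
    intervening misses as last time, prefetch it before it misses again."""

    def __init__(self, lead=1):
        self.lead = lead        # issue the prefetch this many misses early
        self.miss_count = 0
        self.history = {}       # addr -> (index of its last miss, last interval)
        self.scheduled = {}     # miss index -> addresses to prefetch then

    def on_miss(self, addr):
        self.miss_count += 1
        n = self.miss_count
        due = self.scheduled.pop(n, [])     # prefetches to issue at this miss
        if addr in self.history:
            last_n, last_interval = self.history[addr]
            interval = n - last_n
            if interval == last_interval:
                when = max(n + interval - self.lead, n + 1)
                self.scheduled.setdefault(when, []).append(addr)
            self.history[addr] = (n, interval)
        else:
            self.history[addr] = (n, None)
        return due


sp = StridePrefetcher()
print([sp.on_miss(a) for a in (100, 108, 116)])  # [[], [], [124]]

msp = MissStridePrefetcher(lead=1)
trace = [0x10, 0x20, 0x30] * 3                   # three addresses missing in a cycle
print([msp.on_miss(a) for a in trace])  # [[], [], [], [], [], [], [], [], [16]]
```

In the second trace, the prefetch for address 0x10 is issued at the ninth miss, one miss before 0x10 is predicted to miss again.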

Mechanisms Used - Cont’d

• Each mechanism can be used by itself, or all can be used at once

• Buffer space size and organization are fixed

• No adaptivity involved

Observed Behavior

• Each mechanism affects different programs differently, i.e., none is a consistent winner

• Within a program, the same holds over time: which mechanism is best changes during execution

Observed Behavior - Cont’d

• Both of the above facts indicate a likely improvement from adaptivity
– Select the better mechanism among those available

• Even more can be expected from adaptively reallocating the combined buffer pool
– To reduce stall time
– To reduce the number of misses

Proposed Adaptive Mechanism

• Hardware:
– A common pool of 2- to 4-word buffers
– A set of possible policies, a subset of:

• Stride-directed prefetch

• PC-based prefetch

• History-based prefetch

• Victim cache

• Write buffer
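For concreteness, the victim-cache policy in the list above might be modeled as a direct-mapped L1 backed by a small fully associative buffer of recently evicted blocks with LRU replacement. Treating each victim entry as one buffer drawn from the common pool is our assumption, as are the class name and counters; `victim_entries` is the kind of parameter a dynamic allocator would tune.

```python
# Hypothetical sketch of an L1 victim cache; not the AMRM implementation.
from collections import OrderedDict


class VictimCachedL1:
    def __init__(self, l1_lines, victim_entries):
        self.l1 = [None] * l1_lines           # direct-mapped, one block per line
        self.victims = OrderedDict()          # evicted blocks, LRU first
        self.victim_entries = victim_entries  # buffers granted from the pool
        self.hits = self.victim_hits = self.misses = 0

    def _insert_victim(self, block):
        if self.victim_entries == 0:
            return                            # no buffers allocated to this policy
        self.victims[block] = None
        self.victims.move_to_end(block)       # mark as most recently used
        if len(self.victims) > self.victim_entries:
            self.victims.popitem(last=False)  # drop the least recently used

    def access(self, block):
        idx = block % len(self.l1)
        if self.l1[idx] == block:
            self.hits += 1
            return "hit"
        evicted = self.l1[idx]
        if block in self.victims:             # swap back from the victim buffer
            del self.victims[block]
            self.victim_hits += 1
            result = "victim_hit"
        else:
            self.misses += 1
            result = "miss"
        self.l1[idx] = block
        if evicted is not None:
            self._insert_victim(evicted)
        return result


# Blocks 0 and 4 conflict in a 4-line L1; one victim entry absorbs the ping-pong.
c = VictimCachedL1(l1_lines=4, victim_entries=1)
print([c.access(b) for b in [0, 4] * 3])
# ['miss', 'miss', 'victim_hit', 'victim_hit', 'victim_hit', 'victim_hit']
```

With `victim_entries=0` the same trace misses on every access, which is the signal an allocator could use to decide whether this policy deserves buffers.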

Adaptive Hardware - Cont’d

• Performance monitors for each type/buffer
– Misses, stall time on hit, thresholds

• Dynamic buffer allocator among mechanisms

• Allocation and monitoring policy:
– Predict future behavior from the observed past
– Observe over a time interval dT, then set the allocation for the next interval
– Save performance trends in next-level tags (< 8 bits)
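One way to read the allocation policy on this slide, sketched under our own assumptions: over each interval of dT accesses, per-mechanism monitors count how many misses each mechanism turned into hits, and the pool is re-split in proportion to those counts for the next interval. The names `reallocate` and `IntervalMonitor`, measuring dT in accesses, the hit-count metric (stall time on hit and thresholds are not modeled), and the one-buffer minimum per mechanism are all hypothetical choices.

```python
# Hypothetical interval-based pool allocator; not the AMRM implementation.


def reallocate(pool_size, benefit, min_share=1):
    """Split pool_size buffers among mechanisms in proportion to the benefit
    each showed last interval; every mechanism keeps min_share buffers."""
    names = sorted(benefit)
    spare = pool_size - min_share * len(names)
    if spare < 0:
        raise ValueError("pool too small for the minimum share")
    total = sum(benefit.values())
    weights = benefit if total > 0 else dict.fromkeys(names, 1)  # no signal: even
    total = total if total > 0 else len(names)
    exact = {n: spare * weights[n] / total for n in names}
    alloc = {n: min_share + int(exact[n]) for n in names}
    # Largest-remainder rounding so the shares add up to pool_size exactly.
    leftover = pool_size - sum(alloc.values())
    by_remainder = sorted(names, key=lambda n: exact[n] - int(exact[n]), reverse=True)
    for n in by_remainder[:leftover]:
        alloc[n] += 1
    return alloc


class IntervalMonitor:
    """Counts per-mechanism benefit over dT accesses, then sets the
    allocation for the next interval (predict the future from the past)."""

    def __init__(self, pool_size, mechanisms, interval):
        self.pool_size = pool_size
        self.interval = interval                 # dT, counted in accesses here
        self.benefit = dict.fromkeys(mechanisms, 0)
        self.accesses = 0
        self.alloc = reallocate(pool_size, self.benefit)  # start with an even split

    def record(self, mechanism=None):
        """Call once per access; name the mechanism that turned a miss into a hit."""
        self.accesses += 1
        if mechanism is not None:
            self.benefit[mechanism] += 1
        if self.accesses == self.interval:
            self.alloc = reallocate(self.pool_size, self.benefit)
            self.benefit = dict.fromkeys(self.benefit, 0)
            self.accesses = 0
        return self.alloc


print(reallocate(16, {"prefetch": 30, "victim": 10, "write_buffer": 0}))
# {'prefetch': 11, 'victim': 4, 'write_buffer': 1}
```

Keeping a minimum share lets a currently unhelpful mechanism keep producing measurements, so it can win buffers back when program behavior changes. The trend bits saved in next-level tags are not modeled here.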

Further opportunities to adapt

• L2 cache organization
– Variable-size line

• L2 non-sequential prefetch

• In-memory assists (DRAM)

MP Opportunities

• Even longer latency

• Coherence, hardware or software

• Synchronization

• Prefetch under and beyond the above
– Avoid coherence if possible
– Prefetch past synchronization

• Assist Adaptive Scheduling
