Memory Hierarchy Adaptivity
An Architectural Perspective
Alex Veidenbaum
AMRM Project
sponsored by DARPA/ITO
Opportunities for Adaptivity
• Cache organization
• Cache performance “assist” mechanisms
• Hierarchy organization
• Memory organization (DRAM, etc.)
• Data layout and address mapping
• Virtual Memory
• Compiler assist
Opportunities - Cont’d
• Cache organization: adapt what?
– Size: NO
– Associativity: NO
– Line size: MAYBE
– Write policy: YES (fetch, allocate, write-back/write-through)
– Mapping function: MAYBE
Opportunities - Cont’d
• Cache “Assist”: prefetch, write buffer, victim cache, etc. between different levels.
• Adapt what?
– Which mechanism(s) to use
– Mechanism “parameters”
Opportunities - Cont’d
• Hierarchy Organization:
– Where are cache assist mechanisms applied?
• Between L1 and L2
• Between L1 and Memory
• Between L2 and Memory
– What are the data paths like?
• Is prefetch, victim cache, or write buffer data written into the cache?
• How much parallelism is possible in the hierarchy?
Opportunities - Cont’d
• Memory Organization
– Cached DRAM?
– Interleave change?
– PIM
Opportunities - Cont’d
• Data layout and address mapping
– In theory, something can be done but…
– MP case is even worse
– Adaptive address mapping or hashing based on ???
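To make the idea of a hashed address mapping concrete, the sketch below compares plain modulo set indexing with one well-known alternative, XOR-folding higher address bits into the index. The slides leave the basis for the mapping open ("???"), so this is only an illustration: the function names and the geometry (32-byte lines, 128 sets) are assumptions, not part of the AMRM design.

```python
def index_modulo(addr, line_bits=5, index_bits=7):
    """Conventional mapping: the set index is the low bits of the block address."""
    return (addr >> line_bits) & ((1 << index_bits) - 1)


def index_xor(addr, line_bits=5, index_bits=7):
    """Hashed mapping (assumed for illustration): XOR the next-higher
    block-address bits into the index so power-of-two strides spread out."""
    block = addr >> line_bits
    mask = (1 << index_bits) - 1
    return (block ^ (block >> index_bits)) & mask


# Eight addresses 4096 bytes apart all collide under modulo indexing
# but land in eight different sets under the XOR hash.
addrs = [k * 4096 for k in range(8)]
modulo_sets = {index_modulo(a) for a in addrs}
xor_sets = {index_xor(a) for a in addrs}
```

An adaptive scheme could, in principle, switch between such mappings when conflict misses dominate; how that decision would be made is not stated in the slides.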
Opportunities - Cont’d
• Compiler assist
– Can select initial configuration
– Pass hints on to hardware
– Generate code to collect run-time info and adjust execution
– Adapt configuration after being “called” at certain intervals during execution
– Select/run-time optimize code
Opportunities - Cont’d
• Virtual Memory can adapt
– Page size?
– Mapping?
– Page prefetching/read ahead
– Write buffer (file cache)
– The above under multiprogramming?
Applying Adaptivity
• What Drives Adaptivity? Performance impact, overall and/or relative
• “Effectiveness”, e.g. miss rate
• Processor Stall introduced
• Program characteristics
• When to perform adaptive action
– Run time: use feedback from hardware
– Compile time: insert code, set up hardware
Where to Implement
• In Software: compiler and/or OS
+ (Static) Knowledge of program behavior
+ Factored into optimization and scheduling
- Extra code, overhead
- Lack of dynamic run-time information
- Rate of adaptivity
- Requires recompilation, OS changes
Where to Implement - Cont’d
• Hardware
+ dynamic information available
+ fast decision mechanism possible
+ transparent to software (thus safe)
- delay and clock rate limit algorithm complexity
- difficult to maintain long-term trends
- little knowledge of program behavior
Where to Implement - Cont’d
• Hardware/software
+ Software can set coarse hardware parameters
+ Hardware can supply software dynamic info
+ Perhaps more complex algorithms can be used
- Software modification required
- Communication mechanism required
Current Investigation
• L1 cache assist
– See wide variability in assist mechanism effectiveness between
• Individual Programs
• Within a program as a function of time
– Propose hardware mechanisms to select between assist types and allocate buffer space
– Give compiler an opportunity to set parameters
Mechanisms Used
• Prefetching
– Stream Buffers
– Stride-directed, based on address alone
– Miss Stride: prefetch the same address using the number of intervening misses
• Victim Cache
• Write Buffer, all after L1
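As an illustration of the stride-directed, address-only prefetching listed above, here is a minimal sketch. The class name `StridePrefetcher`, the `on_miss` method, and the rule of confirming a stride twice in a row before prefetching are assumptions made for the example; the slides do not specify the detection logic.

```python
class StridePrefetcher:
    """Address-only stride detector (illustrative sketch, not the AMRM design).

    A prefetch is issued once the same non-zero stride has been seen
    on two consecutive misses.
    """

    def __init__(self):
        self.last_addr = None    # address of the previous miss
        self.last_stride = None  # stride between the two previous misses

    def on_miss(self, addr):
        """Record a miss address; return the address to prefetch, or None."""
        prefetch = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                prefetch = addr + stride  # stride confirmed: fetch one step ahead
            self.last_stride = stride
        self.last_addr = addr
        return prefetch


p = StridePrefetcher()
issued = [p.on_miss(a) for a in (100, 164, 228, 292)]
broken = p.on_miss(1000)  # stride changes, so no prefetch is issued
```

The first two misses only establish the stride; prefetches start on the third miss and stop as soon as the pattern breaks.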
Mechanisms Used - Cont’d
• A mechanism can be used by itself, or
• All are used at once
• Buffer space size and organization fixed
• No adaptivity involved
Observed Behavior
• Programs exhibit a different effect from each mechanism, i.e. none is a consistent winner
• Within a program, the same holds between mechanisms in the time domain.
Observed Behavior - Cont’d
• Both of the above facts indicate a likely improvement from adaptivity
– Select a better one among mechanisms
• Even more can be expected from adaptively re-allocating from the combined buffer pool
– To reduce stall time
– To reduce the number of misses
Proposed Adaptive Mechanism
• Hardware:
– a common pool of 2-4 word buffers
– a set of possible policies, a subset of:
• Stride-directed prefetch
• PC-based prefetch
• History-based prefetch
• Victim cache
• Write buffer
Adaptive Hardware - Cont’d
• Performance monitors for each type/buffer
– misses, stall time on hit, thresholds
• Dynamic buffer allocator among mechanisms
• Allocation and monitoring policy:
– Predict future behavior from observed past
– Observe over a time interval dT, set for next
– Save performance trends in next-level tags (<8 bits)
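The "observe over dT, set for next" policy can be sketched as a function that divides the common buffer pool for the next interval according to what each mechanism contributed in the last one. Everything specific here is an assumption for illustration: the name `reallocate`, the use of per-mechanism useful-hit counts as the monitored quantity, the proportional split, and the one-buffer floor that lets an idle mechanism recover. The slides do not state the actual allocation rule.

```python
def reallocate(pool_size, useful_hits, min_per_mechanism=1):
    """Split pool_size buffers among mechanisms for the next interval.

    useful_hits maps mechanism name -> hits it supplied during the last
    interval dT (a hypothetical monitor output).
    """
    names = list(useful_hits)
    remaining = pool_size - min_per_mechanism * len(names)
    if remaining < 0:
        raise ValueError("pool too small for the per-mechanism floor")

    # Every mechanism keeps a small floor so it can still be observed.
    alloc = {n: min_per_mechanism for n in names}

    total = sum(useful_hits.values())
    for n in names:
        if total == 0:
            alloc[n] += remaining // len(names)  # no information: split evenly
        else:
            alloc[n] += remaining * useful_hits[n] // total

    # Buffers lost to integer rounding go to the most useful mechanism.
    best = max(names, key=lambda n: useful_hits[n])
    alloc[best] += pool_size - sum(alloc.values())
    return alloc


next_alloc = reallocate(16, {"prefetch": 30, "victim": 10, "write": 0})
idle_alloc = reallocate(8, {"prefetch": 0, "victim": 0})
```

In hardware the same idea would more plausibly use counters and threshold comparisons than division; the proportional split is chosen here only because it is easy to read.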
Further opportunities to adapt
• L2 cache organization
– variable-size line
• L2 non-sequential prefetch
• In-memory assists (DRAM)
MP Opportunities
• Even longer latency
• Coherence, hardware or software
• Synchronization
• Prefetch under and beyond the above
– Avoid coherence if possible
– Prefetch past synchronization
• Assist Adaptive Scheduling