smart data structures
SAN FRANCISCO, CA, USA
Smart Data Structures
Jonathan Eastep, David Wingate, Anant Agarwal
06/3/2012
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
Multicores are Complex!
• The Problem
– System complexity is skyrocketing!
– Multicore architecture is a moving target
– The best algorithm and algorithm settings depend
– Application inputs and workloads can be dynamic
– Online tuning is necessary but typically absent
The Big Picture
• Developed a dynamic optimization framework to auto-tune software and minimize burden
• Framework is based on online machine learning technologies
• Demonstrated the framework by designing “Smart Data Structures” for parallel programs
• The framework is general; could apply to systems such as Clouds, OS, Runtimes
Smart Data Structures
• Smart Data Structures are parallel data structures that self-optimize to minimize programmer burden
• They use online machine learning to adapt to changing app or system needs and achieve the best performance
• A library of Smart Data Structures is open-sourced on GitHub (GPL): github.com/mit-carbon/Smart-Data-Structures
• Publications: [1], [2], [3], [4]
A Sketch of The Benefits of SDS
• Use a Smart Lock to optimize a master-worker program
– Measure rate of completed work items
– Emulate dynamic frequency scaling due to Intel Turbo Boost®
– Workload 1: Worker 0 @ 3GHz, others @ 2GHz
– Workload 2: Worker 3 @ 3GHz, others @ 2GHz
[Figure: Heart rate (beats per second / 1e6, i.e. items per second / 1e6) vs. Time (seconds), over the phases Workload #1, Workload #2, Workload #1. Series: Optimal, Smartlock, Priority lock: policy 1, Priority lock: policy 2, Spin-lock: Reactive Lock, Spin-lock: Test and Set. Annotations: gap, Ideal, Smartlock, Baseline.]
Outline
• Smart Data Structures
• Anatomy of a Smart Data Structure
• Implementation Example
• Research Challenges and Solutions
• Online Machine Learning Algorithm
• Empirical Benchmark Results
• Empirical Scalability Studies
• Future Directions
• Conclusions
What are Smart Data Structures?
• Self-aware computing applied to data structures
• Data Structures that self-optimize using online learning
• We can optimize knobs in other systems too
[Diagram: two data structures, each accessed by threads t1, t2, …, tn. Data Structure (e.g. Queue): Interface (add, remove, peek), Storage, Algorithm; knobs are hand-tuned, per system, per app, and static. Smart Data Structure (e.g. Smart Queue): Interface (add, remove, peek), Storage, Algorithm, Online Learning; knobs are self-tuned, automatically, at runtime.]
Smart Data Structure Library
• C++/C Library of Popular Parallel Data Structures
• Supported:
– Smart Lock
– Smart Queue
– Smart SkipList
– Smart PairHeap
– Smart Stack
• Future Work:
– Smart DHT
• ML Optimization Type:
– Lock Acquisition Scheduling
– Tuning Flat Combining
– Dynamic Load-Balancing
Smart Queue, SkipList, PairHeap, Stack
• Implementation should leverage best-performing prior work
• What are the best? Determine with experiments.
• Result: Flat Combining Data Structures from Hendler et al.
• This is contrary to conventional wisdom
• Reason: FC Algorithm minimizes synchronization overheads by combining data structure ops and applying multiple ops at once
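Flat Combining Primer
[Diagram: threads alternate between Working and posting requests (enq a, enq b, enq c, enq d). One thread holds the Lock and is Combining: it applies the posted requests to the Serial Data Structure while the Scancount counts down 3, 2, 1, 0.]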
Smart Queue, SkipList, PairHeap, Stack
• Here the application of learning is to auto-tune a performance-critical knob called the scancount
[Diagram: a Smart Queue accessed by threads t1, t2, …, tn. Interface: enqueue, dequeue, peek. Internals: Lock, Thread Request Records, Serial Queue. Knob: Scancount, the number of scans over request records, used to dynamically tune the time spent combining. The knob is set by Reinforcement Learning (of a discrete variable).]
Why Does the Scancount Matter?
• Scancount controls how long threads spend as the combiner
• Increasing scancount allows combiner to do more data structure ops within the same lock
• But, increasing scancount increases latency of the combiner’s op
• It’s good to increase scancount up to a point, but after that latency can hurt performance
• Smart Data Structures use online learning to find the ideal scancount at any given time
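To make the scancount trade-off concrete, here is a minimal Python sketch of a flat-combining queue. It is not the library's C++ code and not Hendler et al.'s implementation: the class name FlatCombiningQueue, the enqueue/dequeue signatures, and the per-thread slot index tid are all hypothetical. It only illustrates that whichever thread holds the lock scans the request records scancount times and applies every pending op it finds.

```python
import threading
import time
from collections import deque


class FlatCombiningQueue:
    """Toy flat-combining queue (hypothetical names, illustration only)."""

    def __init__(self, num_threads, scancount=2):
        self.scancount = max(1, scancount)   # knob: scans per combining pass
        self.lock = threading.Lock()         # one lock guards the serial queue
        self.items = deque()                 # the underlying serial queue
        self.records = [None] * num_threads  # one request slot per thread

    def _combine(self):
        # Runs with the lock held. A larger scancount picks up more late
        # requests per lock acquisition, but keeps the combiner busy longer.
        for _ in range(self.scancount):
            for req in self.records:
                if req is None or req["done"]:
                    continue
                if req["op"] == "enq":
                    self.items.append(req["arg"])
                else:  # "deq": returns None when the queue is empty
                    req["result"] = self.items.popleft() if self.items else None
                req["done"] = True

    def _submit(self, tid, op, arg=None):
        req = {"op": op, "arg": arg, "result": None, "done": False}
        self.records[tid] = req              # publish the request
        while not req["done"]:
            # Whoever wins the lock becomes the combiner for all threads.
            if self.lock.acquire(blocking=False):
                try:
                    self._combine()
                finally:
                    self.lock.release()
            else:
                time.sleep(0)                # yield while another thread combines
        self.records[tid] = None
        return req["result"]

    def enqueue(self, tid, value):
        self._submit(tid, "enq", value)

    def dequeue(self, tid):
        return self._submit(tid, "deq")
```

In this sketch a learner could retune the knob at runtime simply by assigning a new value to scancount between operations.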
SDS Implementation
• Goal: minimize application disruption
• Internal lightweight statistics or external application-specific reward signal
• Number of learning threads is one by default; it runs learning engines for all SDS
[Diagram: Application Threads t1, t2, …, tn use a Smart Data Structure (e.g. Smart Queue) with Interface (add, remove, peek), Storage, Algorithm, and Online Learning. The Learning Thread receives a Reward, either from an internal stat or from an External Perf. Monitor (e.g. Heartbeats), such as throughput (ops/s).]
SDS Implementation
– Machine learning co-optimization framework
– Supports joint optimization: multiple knobs
– Supports discrete, gaussian, boolean, permutation knobs
– Designed explicitly to support systems other than SDS
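As an illustration of what sampling the four knob types from a stochastic policy could look like, here is a small Python sketch. The slides do not say how the framework parameterizes each knob type, so every function name and parameterization below is an assumption: soft-max preferences for discrete knobs, a sigmoid for boolean knobs, a mean and standard deviation for gaussian knobs, and repeated soft-max picks without replacement for permutation knobs.

```python
import math
import random


def sample_discrete(rng, prefs):
    """Soft-max draw of a setting index from per-setting preferences."""
    m = max(prefs)                                   # subtract max for stability
    weights = [math.exp(p - m) for p in prefs]
    return rng.choices(range(len(prefs)), weights=weights)[0]


def sample_boolean(rng, pref):
    """True with probability sigmoid(pref)."""
    return rng.random() < 1.0 / (1.0 + math.exp(-pref))


def sample_gaussian(rng, mean, std):
    """Continuous knob drawn around a learned mean."""
    return rng.gauss(mean, std)


def sample_permutation(rng, prefs):
    """Ordering built by repeated soft-max picks without replacement."""
    remaining = list(range(len(prefs)))
    order = []
    while remaining:
        pick = sample_discrete(rng, [prefs[i] for i in remaining])
        order.append(remaining.pop(pick))
    return order
```

Joint optimization would then amount to drawing one value per knob on each trial and scoring the combination with a single reward.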
Major SDS Research Challenges
Quality Challenges
1. How do you find knob settings with best long-term effects?
2. How do you measure if a knob setting is helping?
Timeliness Challenges
3. How do you optimize quickly enough to not miss opportunities?
4. How do you manage a potentially intractable search space?
Addressing Other Quality Challenges
1. How do you find settings with best long-term effects?
– Leverage one of the machine learning technologies for planning
– Use online RL to adapt to workload or phase changes
2. How do you measure if a knob setting is helping?
– Extensible reward signal interface for performance monitoring
– Heartbeats Framework for application-specific perf. evaluations
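A reward signal of this kind can be pictured as a counter that the application bumps and the learner polls. The Python sketch below is only an illustration under that assumption: RewardMonitor, beat, and rate are hypothetical names, and it does not reproduce the Heartbeats framework's actual API. It is single-threaded for clarity; a real monitor would need an atomic counter.

```python
import time


class RewardMonitor:
    """Hypothetical reward source: counts work items, reports items/second."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock            # injectable so the rate can be tested
        self._count = 0
        self._last_count = 0
        self._last_time = clock()

    def beat(self, n=1):
        # Called by the application each time it completes n work items.
        self._count += n

    def rate(self):
        # Called by the learner: items per second since the previous call.
        now = self._clock()
        elapsed = now - self._last_time
        done = self._count - self._last_count
        self._last_time, self._last_count = now, self._count
        return done / elapsed if elapsed > 0 else 0.0
```

The learner would call rate() after trying a knob setting and treat the returned value as the reward for that setting.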
Addressing Timeliness Challenges
3. How to optimize fast enough not to miss opportunities?
– Choose a fast gradient-based machine learning algorithm
– Use learning helper thread to decouple learning from app threads
4. How to manage potentially intractable search space?
– Relax potentially exponential discrete action space into continuous one
– Use a stochastic soft-max policy which enables gradient-based learning
[Cartoon: “Sorry I’m late dear…have you been waiting long?”]
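The relaxation in item 4 can be written out in the standard soft-max form. The slides give no formulas, so the notation below is an assumption: K knobs, knob k has n_k discrete settings, and theta_{k,a} is a real-valued preference for setting a of knob k. Treating the knobs as independently sampled is also an assumption, motivated by the deck's phrase "a probability distribution over knob settings for each knob".

```latex
% Soft-max policy per knob, and the joint policy as a product over knobs
\pi_\theta(a_k = a) = \frac{\exp(\theta_{k,a})}{\sum_{b=1}^{n_k} \exp(\theta_{k,b})},
\qquad
\pi_\theta(a_1, \dots, a_K) = \prod_{k=1}^{K} \pi_\theta(a_k)

% Continuous parameters grow as a sum; joint discrete settings grow as a product
\#\text{parameters} = \sum_{k=1}^{K} n_k
\quad \text{vs.} \quad
\#\text{joint settings} = \prod_{k=1}^{K} n_k

% The log-probability is differentiable in theta, which is what gradient methods need
\frac{\partial}{\partial \theta_{k,b}} \log \pi_\theta(a_k = a)
  = \mathbf{1}[a = b] - \pi_\theta(a_k = b)
```

For example, three knobs with ten settings each have 1000 joint settings but only 30 continuous preferences to learn.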
Reinforcement Learning Algorithm
• Goal: optimize rate of reward (e.g. heart rate)
• Method: Policy Gradients Algorithm
– Online, model-free, handles exponential knob spaces
– Learn a stochastic policy which will give a probability distribution over knob settings for each knob
– Sample settings for each knob from the policy, try them empirically, and listen to performance feedback signal
– Improve the policy using a method analogous to gradient ascent
  • I.e. estimate gradient of the reward wrt policy and step policy in the gradient direction to get maximum reward
• Balance exploration vs. exploitation + make policy differentiable via stochastic soft-max policy
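The loop described on this slide (sample a setting, try it, observe the reward, step the policy) can be sketched for a single discrete knob as follows. This is a textbook REINFORCE-style update with a running-average baseline, not the authors' algorithm as implemented: the function tune_knob, its parameters, the baseline, and the step sizes are all assumptions chosen for illustration.

```python
import math
import random


def softmax(theta):
    """Turn preferences into a probability distribution over settings."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def tune_knob(reward_fn, num_settings, steps=2000, lr=0.1, seed=0):
    """Learn soft-max preferences for one knob from observed rewards."""
    rng = random.Random(seed)
    theta = [0.0] * num_settings      # start from a uniform policy
    baseline = 0.0                    # running average of observed reward
    for _ in range(steps):
        probs = softmax(theta)
        # Sampling from the policy is what provides exploration.
        a = rng.choices(range(num_settings), weights=probs)[0]
        r = reward_fn(a)              # try the setting, observe the reward
        advantage = r - baseline
        baseline += 0.05 * advantage
        # Gradient of log pi(a) w.r.t. theta[b] is 1[a == b] - probs[b].
        for b in range(num_settings):
            grad = (1.0 if b == a else 0.0) - probs[b]
            theta[b] += lr * advantage * grad
    return softmax(theta)
```

A usage example: tune_knob(lambda a: [0.2, 1.0, 0.5, 0.1][a], 4) returns a distribution over four settings, and with these made-up rewards the probability mass should shift toward setting 1.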
How Does SDS Perform?
• Full sweep over SDS, load: compare against Static Oracle
• Result: near-ideal performance in many cases
• Result: Quality Challenge is met
[Figure: three panels (Smart Queue, Smart Pair Heap, Smart Skip List) plotting Throughput (ops/ms) vs. Post Computation (ns), at 14 threads. Series: Static Oracle (Ideal Static), SDS Dynamic, Static Avg.]
What if Workload Changes Rapidly?
• Inject changes in the data structure “load” (i.e. post computation between ops)
• Sweep over SDS, random load schedules, frequencies
• Result: Good benefit even when load changes every 10μs
• Result: Quality and Timeliness Challenges are met
[Figure: three panels (Smart Queue: Sched. 1, Smart Pairing Heap: Sched. 1, Smart Skip List: Sched. 1) plotting Throughput (ops/ms) vs. Interval Frequency (1/µs), at 14 threads. Series: Dynamic Oracle, SDS Dynamic, Dynamic Average.]
Future Directions
• Extend this work to a common framework to coordinate tuning across all system layers
– E.g.: application -> runtime -> OS -> HW
– Scalable, decentralized optimization methods
Conclusions
• Developed a framework to dynamically tune systems and minimize programmer burden via online machine learning
• Demonstrated the framework through a case study of self-tuning “Smart Data Structures”
• Now looking at uses in systems beyond data structures
• Reinforcement Learning will play an increasingly important role in the development of future software and hardware
Contact: jonathan dot eastep at gmail
Presentation References
[1] J. Eastep, D. Wingate, M. D. Santambrogio, A. Agarwal, “Smartlocks: Lock Acquisition Scheduling for Self-Aware Synchronization,” 7th IEEE International Conference on Autonomic Computing (ICAC’10), 2010. Best Student Paper Award.
[2] J. Eastep, D. Wingate, M. D. Santambrogio, A. Agarwal, “Smartlocks: Self-Aware Synchronization through Lock Acquisition Scheduling,” MIT CSAIL Technical Report, MIT-CSAIL-TR-2009-055, November 2009.
[3] J. Eastep, D. Wingate, A. Agarwal, “Smart Data Structures: A Reinforcement Learning Approach to Multicore Data Structures,” 8th IEEE International Conference on Autonomic Computing (ICAC’11), 2011.
[4] J. Eastep, “Smart Data Structures: An Online Machine Learning Approach to Multicore Data Structures,” Doctoral Dissertation, MIT, May 2011.