linux kernel scalability: using the right tool for the job · l2 cache hit 12.9 atomic increment...
TRANSCRIPT
![Page 1: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/1.jpg)
1
Linux Kernel Scalability:Using the Right Tool for the
Job
Paul E. McKenneyIBM Beaverton
2005 linux.conf.au
Copyright © 2005 IBM Corporation
![Page 2: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/2.jpg)
2
Overview
● Moore's Law and SMP Software● Performance Fault Isolation● Synchronization Usage:
– Locking, Counting, NBS, and RCU– Putting it All Together
● The Road Ahead● Summary
![Page 3: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/3.jpg)
3
Moore's Law and SMP Software
![Page 4: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/4.jpg)
4
Instruction Speed Increased
![Page 5: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/5.jpg)
5
Synchronization Speed Decreased
![Page 6: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/6.jpg)
6
Critical-Section Efficiency
Lock Acquisition (Ta )
Critical Section (Tc )
Lock Release (Tr )
Efficiency =T
c
Tc+T
a+T
r
Assuming negligible contention and no caching effects in critical section
![Page 7: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/7.jpg)
7
Instruction/Pipeline Costs on a4-CPU 700MHz Pentium®-III
Operation NanosecondsInstruction 0.7Clock Cycle 1.4L2 Cache Hit 12.9Atomic Increment 58.2Cmpxchg Atomic Increment 107.3Atomic Incr. Cache Transfer 113.2Main Memory 162.4CPU-Local Lock 163.7Cmpxchg Blind Cache Transfer 170.4Cmpxchg Cache Transfer and Invalidate 360.9
![Page 8: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/8.jpg)
8
Visual Demonstration of Latency
cmpxchg transfer & invalidate: 360.9ns
Each pair of nanoseconds representsup to about three instructions
![Page 9: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/9.jpg)
9
What is Going On? (1/3)● Taller memory hierarchies
– Memory speeds have not kept up with CPU speeds● 1984: no caches needed, since instructions slower than
memory accesses● 2005: 3-4 level cache hierarchies, since instructions orders
of magnitude faster than memory accesses● Synchronization requires consistent view of data across CPUs, in
other words, CPU-to-CPU communication– Unlike normal instructions, synchronization operations tend
not to hit in top-level cache– Hence, they are orders of magnitude slower than normal
instructions because of memory latency
![Page 10: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/10.jpg)
10
What is Going On? (2/3)● Longer pipelines
– 1984: Many clocks per instruction– 2005: Many instructions per clock – 20-stage pipelines
● Modern super-scalar CPUs execute instructions out of order in order to keep their pipelines full– Can't reorder the critical section before the lock!!!
● Therefore, synchronization operations must stall the pipeline, decreasing performance
![Page 11: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/11.jpg)
11
What is Going On? (3/3)● 1984: The main issue was lock contention● 2005: Even if lock contention is eliminated, critical-
section efficiency must be addressed!!!– Even if the lock is always free when acquired,
performance is seriously degraded– Some hardware guys tell me that this will all soon be
better...● But I will believe it when I see it!!!
![Page 12: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/12.jpg)
12
Forces Acting on SMP Efficiency
SMP Efficiency
HardwareThreading
MulticoreDies
Memory-SystemPerformance
SystemSize
HistoricTrends
CPU ClockFrequency
![Page 13: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/13.jpg)
13
Performance Fault Isolation
![Page 14: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/14.jpg)
14
Finding Performance Problems
● System-level throughput/latency tests● Profiling● Differential profiling
– http://www.rdrop.com/users/paulmck/paper/profiling.2002.06.04.pdf
● Hardware-level tools
![Page 15: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/15.jpg)
15
Locking
![Page 16: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/16.jpg)
16
Locking Designs
SequentialProgram
CodeLocking
DataLocking
DataOwnership
Partition Fuse
Partition Fuse
Own Disown
Reader/WriterLocking
RCU
HierarchicalLocking
AllocatorCaches
ParallelFastpath
Critical-SectionFusing
Critical-SectionPartitioning
Inverse
![Page 17: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/17.jpg)
17
Sequential Program
● If a single CPU can do the job you need, why are you messing with SMP and locking???– Not enough challenge in your life???– You like slowing things down by including SMP
primitives?
![Page 18: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/18.jpg)
18
Code Locking
● AKA “global locking”:– Only one CPU at a time in given code path
● Very simple, but no scaling● Examples:
– 2.4 runqueue_lock– dcache_lock
● Guards all dcache in 2.4, dcache updates in 2.6– rcu_ctrlblk.mutex
![Page 19: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/19.jpg)
19
Data Locking
● But isn't it all data locking?– Yes, but... Data locking associates locks with
individual data items rather than code paths:● 2.4: “spin_lock_irq(&runqueue_lock);”● 2.6: “spin_lock_irq(&rq->lock)”
– Allows CPUs to process different data in parallel● Examples:
– 2.6 O(1) scheduler (per-runqueue locking)– 2.6 d_lock (per-dentry locking for path walking)– Manfred Spraul RCU_HUGE patch
![Page 20: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/20.jpg)
20
Data Locking Implications (1)
● How to handle common global structure?– Retain global lock for this purpose
● dcache_lock retained when per-dentry d_lock added● Need both locks on many code paths
– Restructure to eliminate common structure– Apply more aggressive locking model
● What if every CPU hits the same data item?– mm_lock is great – unless everyone is faulting on the
same shared-memory segment...
![Page 21: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/21.jpg)
21
Data Locking Implications (2)
● How to handle two data items concurrently?– Acquire locks in order: d_move() in dcache:
if (target < dentry) { spin_lock(&target- >d_lock); spin_lock(&dentry- >d_lock);} else { spin_lock(&dentry- >d_lock); spin_lock(&target- >d_lock);}
– Acquire multiple locks only if holding global lock● Careful!!! The use of a global lock can easily wipe out any
data-locking performance gains!– Figure out a way to handle one item at a time
● But first need to carefully state requirements...
![Page 22: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/22.jpg)
22
Data Locking: Requirements
● Move element between two linked lists– Delete from list A, insert into list B– Cannot copy, must move the element!!!
● Might be lots of references to element being moved● Each list has its own lock● Only hold one lock at a time: Avoid deadlock
– But OK to acquire and release locks in sequence● Acquire A, release A, acquire B, release B...
● Must be “atomic”:– If not found in old list, must be in new list– If found in new list, must not be in old list
● Is there a solution???
![Page 23: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/23.jpg)
23
Data Locking: One at a TimeA B C
D E
A B C
D E
Tombstone
A
B
C
D E
Tombstone
A
B
C
D E
1
2
3
4
![Page 24: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/24.jpg)
24
Data Ownership
● DEFINE_PER_CPU(type, name)– But it is possible to access others' variables via
per_cpu(var, cpu)– This is used during initialization– Also for reading out performance statistics
● IA64 pfm_proc_show()● PPC64 proc_eeh_show()
– And for coordinating CPUs● IA64 wrap_mmu_context()
![Page 25: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/25.jpg)
25
Data Ownership Implications
● Data completely private to owning CPU– Used pervasively throughout Linux® kernel
● Incomplete privacy:– Owning CPU updates, others read
● Statistics (next slide)– Owning CPU offline, so other CPUs may update
● Didn't see any, may have missed some...– Owning CPU reads, others update (via sysfs)
● store_smt_snooze_delay()
![Page 26: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/26.jpg)
26
Owning CPU Updates
● TCP stats gathered via IP_INC_STATS_BH● TCP stats readout:
static unsigned long__fold_field(void *mib[], int offt){
unsigned long res = 0;int i;for (i = 0; i < NR_CPUS; i++) {if (!cpu_possible(i))
continue;res += *((unsigned long *)(((void *)per_cpu_ptr(mib[0], i)) +
offt));res += *((unsigned long *)(((void *)per_cpu_ptr(mib[1], i)) +
offt));}
return res;}
![Page 27: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/27.jpg)
27
Owning CPU Reads
● PPC64 idle-loop control of hardware threads:unsigned long start_snooze;unsigned long *smt_snooze_delay = &__get_cpu_var(smt_snooze_delay);while (1) {
oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED);if (!oldval) {
set_thread_flag(TIF_POLLING_NRFLAG);start_snooze = __get_tb() +
*smt_snooze_delay * tb_ticks_per_usec;while (!need_resched()) {
if (*smt_snooze_delay == 0 || __get_tb() < start_snooze) {
HMT_low(); / * Low thread priority */continue;
}HMT_very_low(); / * Low power mode */
. . .
![Page 28: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/28.jpg)
28
Data Ownership: Function Shipping
● mm/slab.c: static void do_drain(void *arg){
kmem_cache_t *cachep = (kmem_cache_t*)arg;struct array_cache *ac;check_irq_off();ac = ac_data(cachep); / * Returns ptr to per- CPU element. */spin_lock(&cachep- >spinlock);free_block(cachep, &ac_entry(ac)[0], ac- >avail);spin_unlock(&cachep- >spinlock);ac- >avail = 0;
}static void drain_cpu_caches(kmem_cache_t *cachep){
smp_call_function_all_cpus(do_drain, cachep);. . .
}
![Page 29: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/29.jpg)
29
Parallel Fastpath
● Make the common case fast, the uncommon case as simple as possible– Reader-writer locking– RCU (more on this later...)– Hierarchical locking– Allocator caches
![Page 30: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/30.jpg)
30
Reader-Writer Locking
● Use for large read-side critical sections● get_task() is an example of good usage:
– Might have 1000s of processes– Releases lock before returning pointer...
read_lock(&tasklist_lock);for_each_process(task){
if(task- >pid == pid){ret = task;break;
}}read_unlock(&tasklist_lock);
![Page 31: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/31.jpg)
31
Do Not Use rwlock_t for Short Read-Side Critical Sections
CPU 0
CPU 1
Rea
d-A
cqu
ire
Read-SideCritical Section
Me
mo
ry B
arr
ier
Rea
d-A
cqu
ire
Mem
ory
Ba
rrie
r
Rea
d-A
cqu
ire
Mem
ory
Ba
rrie
r
Rea
d-A
cqu
ire
Mem
ory
Ba
rrie
r
Read-SideCritical Section
![Page 32: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/32.jpg)
32
Performance Comparison:What Benchmark to Use?
● Focus on operating-system kernels– Many read-mostly hash tables
● Hash-table mini-benchmark– Dense array of buckets– Doubly-linked hash chains– One element per hash chain
● You do tune your hash tables, don't you???
![Page 33: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/33.jpg)
33
How to Evaluate Performance?
● Mix of operations:– Search– Delete followed by reinsertion: maintain loading– Random run lengths for specified mix
● (See thesis)● Start with pure search workload (read only)● Run on 4-CPU 700MHz P-III system
– Single quad Sequent®/IBM® NUMA-Q® system
![Page 34: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/34.jpg)
34
Locking Performance
Extra CPUs not buying much!Note: workload fits in cache.
![Page 35: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/35.jpg)
35
Locking Designs
SequentialProgram
CodeLocking
DataLocking
DataOwnership
Partition Fuse
Partition Fuse
Own Disown
Reader/WriterLocking
RCU
HierarchicalLocking
AllocatorCaches
ParallelFastpath
Critical-SectionFusing
Critical-SectionPartitioning
Inverse
![Page 36: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/36.jpg)
36
Counting
![Page 37: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/37.jpg)
37
Counters: Workload Dependent● No blocking while holding or releasing count● Updates rare (just use a global counter!!!)● Updates common:
– References rare:● “Fuzzy” readout permissible● Exact readout required
– References frequent:● Just use seqlock_t!!!● Memory-barrier/atomic overhead too much and large value
– “Fuzzy” readout permissible– References are checks for rarely exceeded range
● Otherwise, innovation required
![Page 38: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/38.jpg)
38
Updates Common, References Rare (1)
● Statistical counters!!! Per-CPU counters...● Fuzzy readout: just need to manage value
– Reference released on same CPU as acquired (or monotonic counters)
● Simple per-CPU counters, sum them without lock● See previous data-ownership example
– CPUs can release other CPUs' references● Need to migrate counts in some cases
– For example, if it is important to detect zero crossings– Rusty has been working on a prototype, crude version here
![Page 39: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/39.jpg)
39
Updates Common, References Rare (2)
● Exact readout at arbitrary time and value?● Must stall readers... And add complexity...
– br_read_lock() to update counter, br_write_lock() to read counter (use per_cpu() spinlocks in 2.6)
● Moderate latency for readout● Moderate overhead for read
– RCU and flags, readers block if flag set● Untried, not clear this is a good approach
● Friendly advice... Tolerate uncertainty!!!
![Page 40: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/40.jpg)
40
brlock Counter● Implementing brlock counter in 2.4 kernel:
/ * Increment counter. */br_read_lock(BR_MY_LOCK);__get_cpu_var(my_count)++;br_read_unlock(BR_MY_LOCK);
/ * Read out counter. */br_write_lock(BR_MY_LOCK);for_each_cpu(i)
sum += per_cpu(my_count, i);br_write_unlock(BR_MY_LOCK);
● Yes, you do read-acquire lock to do write and vice versa!!!● We are really using (abusing!) the brlock as a local-global rather
than a reader-writer lock● Need very low read-out rate on a large Altix...
![Page 41: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/41.jpg)
41
2.6 Implementation of brlock Counter
● Implementing brlock counter in 2.6 kernel:/ * Increment counter. */spin_lock(__get_cpu_var(mylock));__get_cpu_var(mycount)++;spin_unlock(__get_cpu_var(mylock));
/ * Read out counter. */for_each_cpu(i) {
spin_lock(per_cpu(mylock, i));sum += per_cpu(mycount, i);
}for_each_cpu(i) {
spin_unlock(per_cpu(mylock, i));}
● A few more lines of on the read-out side, but two rather than three loops
● Inline functions helpful if frequently used
![Page 42: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/42.jpg)
42
“Big Reference Count”
● Maintain per-CPU counters● But also provide a global counter
– Value is sum of all counters– Ship counts between per-CPU and global count– Apply a large bias to the count
● Use the per-CPU counters in fastpath● When checking for zero, remove the bias
– Force use of only global counter
![Page 43: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/43.jpg)
43
Big Reference Count Data
● Per-CPU component:struct brefcnt_percpu { int brcp_count; / * Per- CPU ctr. Should interlace */}
● Global component:struct brefcnt { spinlock_t brc_mutex; / * Guards all but brc_percpu. */ long brc_global; / * Global portion of count. */ void (*brc_zero)(struct brefcnt *r, void *arg); / * Function to call zero count. */ void *brc_arg; / * 2nd argument for brc_zero. */ struct brefcnt_percpu *brc_percpu ____cacheline_aligned; int brc_local; / * 1=use local counts, 0=use gbl. */};
● Converging with krefcnt would be challenge!!!
![Page 44: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/44.jpg)
44
Big Reference Count Increment● Big reference count increment:
void brefcnt_inc(struct brefcnt *r){ int val; if (likely(r- >brc_local)) { val = r- >brc_percpu[smp_processor_id()].brcp_count++; if (unlikely(val > 2 * BREFCNT_PER_CPU_TARGET)) { r- >brc_percpu[smp_processor_id()].brcp_count - = BREFCNT_PER_CPU_TARGET; spin_lock(&r- >brc_mutex); r- >brc_global += BREFCNT_PER_CPU_TARGET; spin_unlock(&r- >brc_mutex); } return; } spin_lock(&r- >brc_mutex); r- >brc_global++; spin_unlock(&r- >brc_mutex);}
![Page 45: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/45.jpg)
45
Big Reference Count Decrement● Big reference count decrement:
void brefcnt_dec(struct brefcnt *r){
long val;int *pcp = &r- >brc_percpu[smp_processor_id()].brcp_count;if (likely(r- >brc_local)) {
if (*pcp > 1) {(*pcp)- - ;return;
}spin_lock(&r- >brc_mutex);r- >brc_global - = BREFCNT_PER_CPU_TARGET;spin_unlock(&r- >brc_mutex);*pcp += BREFCNT_PER_CPU_TARGET - 1;return;
}spin_lock(&r- >brc_mutex);val = - - r- >brc_global;spin_unlock(&r- >brc_mutex);if ((val == 0) && (r- >brc_zero != NULL)) {
r- >brc_zero(r, r- >brc_arg);}
}
![Page 46: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/46.jpg)
46
Big Refcount Remove Bias● Big refcount bias removal:
void brefcnt_remove_bias(struct brefcnt *r){ int i; long val;
spin_lock(&r- >brc_mutex); r- >brc_local = 0; spin_unlock(&r- >brc_mutex);
synchronize_kernel(); / * wait for racing incs/ decs. */
spin_lock(&r- >brc_mutex); for_each_cpu(i) { r- >brc_global += r- >brc_percpu[i].brcp_count; r- >brc_percpu[i].brcp_count = 0; } val = (r- >brc_global - = BREFCNT_BIAS); spin_unlock(&r- >brc_mutex); if ((val == 0) && (r- >brc_zero != NULL)) r- >brc_zero(r, r- >brc_arg);}
![Page 47: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/47.jpg)
47
Updates Rare, References Common
● Just use seqlock_t!● Unless you cannot afford the atomic-instruction
and memory-barrier overhead:– If you really believe you cannot afford the atomic-
instruction and memory-barrier overhead, do the measurements again, and carefully analyze the results!!!
– If you really cannot afford this, you can use big reference count in some special cases
![Page 48: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/48.jpg)
48
seqlock_t Timer Handling
● Timer update:write_seqlock(&xtime_lock);cur_timer- >mark_offset();do_timer_interrupt(irq, NULL, regs);write_sequnlock(&xtime_lock);
● Timer readout:do { seq = read_seqbegin_irqsave(&xtime_lock, flags); delta_cycles = rpcc() - state.last_time; sec = xtime.tv_sec; usec = (xtime.tv_nsec / 1000); partial_tick = state.partial_tick; lost = jiffies - wall_jiffies;} while (read_seqretry_irqrestore(&xtime_lock, seq, flags));
![Page 49: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/49.jpg)
49
Counter Decision Tree
Holders or releasersblock?
N RCU
Y
Updates rare?
N
YGlobalCounter
References rare? Y
N
![Page 50: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/50.jpg)
50
Counter Decision Tree (Rare Ref)
Exact Readout? YRCU+flag-or- brlock
Y
Acquire/release onsame CPU?
N
Arbitrary value? Y
Biased cacheper-CPU
NPer-CPUcounterY
Per-CPUcache
References Rare
![Page 51: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/51.jpg)
51
Counter Decision Tree (Many Ref)
Large Value? YPer-CPU
cache
Rarely exceededlarge range?
N
Fuzzy readout? Y
Life is hard!!!
References Common
NPer-CPU
cacheY
N
![Page 52: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/52.jpg)
52
Other Counter Complications
● 64-bit counters on 32-bit machine● Access from both irq and process context
– Preemption can have similar effects...● Need to update other CPUs' counters● Need agreement on sequence of values
– Parallel increments of 1, 5, and 7– 1, 6, 13? 5, 12, 13? 7, 8, 13?– Friendly advice: tolerate dissent!!!
![Page 53: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/53.jpg)
53
5
1
7
5
Tolerate Counting Dissent
135
1 6
7
8
12
1
7
57
7
1
15
0
![Page 54: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/54.jpg)
54
Non-Blocking Synchronization (NBS)
![Page 55: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/55.jpg)
55
What About Non-Blocking Synchronization?
● What is non-blocking synchronization (NBS)?– Roll back to resolve conflicting changes instead of
spinning or blocking– Uses atomic instructions to hide complex updates
behind a single commit point● Readers and writers use atomic instructions such as
compare-and-swap or LL/SC● Simple “NBS” algorithms in heavy use
– Atomic-instruction-based algorithms
![Page 56: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/56.jpg)
56
Why Not NBS All The Time?
Operation NanosecondsInstruction 0.7Clock Cycle 1.4L2 Cache Hit 12.9Atomic Increment 58.2Cmpxchg Atomic Increment 107.3Atomic Incr. Cache Transfer 113.2Main Memory 162.4CPU-Local Lock 163.7Cmpxchg Blind Cache Transfer 170.4Cmpxchg Cache Transfer and Invalidate 360.9
![Page 57: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/57.jpg)
57
When to Use NBS?
● Simple NBS algorithm is available– Counting (strictly speaking, only by 1)
● See example from previous section– Simple queue/stack management– Especially if NBS constraints may be relaxed!
● Workload is update-heavy– So that NBS's use of atomic instructions and memory
barriers is not causing gratuitous pain
![Page 58: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/58.jpg)
58
NBS Constraints
● Progress guarantees in face of task failure– Everyone makes progress: wait free– Someone makes progress: lock free– Someone makes progress in absence of contention:
obstruction free● “Linearizability”
– All CPUs agree on all intermediate states● Both constraints mostly irrelevant to Linux
![Page 59: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/59.jpg)
59
RCU
![Page 60: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/60.jpg)
60
What is Synchronization?
● Mechanism plus coding convention– Locking: must hold lock to reference or update– NBS: must use carefully crafted sequences of atomic
operations to do references and updates– RCU coding convention:
● Must define “quiescent states” (QS)– e.g., context switch in non-CONFIG_PREEMPT kernels
● QSes must not appear in read-side critical sections● CPU in QSes are guaranteed to have completed all
preceding read-side critical sections– RCU mechanism: “lazy barrier” that computes “grace
period” given QSes.
![Page 61: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/61.jpg)
61
RCU Fundamental Primitives
● rcu_read_lock(); rcu_read_lock_bh();● rcu_read_unlock(); rcu_read_unlock_bh();● call_rcu(p, f); call_rcu_bh(p, f);● v = rcu_dereference(p);● v = rcu_assign_pointer(p, v);
● synchronize_kernel() vs. synchronize_rcu() vs. synchronize_sched()
![Page 62: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/62.jpg)
62
RCU API Operation
CPU 0
CPU 1
RC
U R
ead
-Sid
eC
riti
c al S
ect i
on
CPU 2
CPU 3
call
_ rcu
()
Ca
llb
ack
sIn
vok
ed
![Page 63: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/63.jpg)
63
How Can RCU be Fast?
● Piggyback notification of reader completion on context-switch (and similar events)
● Kernels are usually constructed as event-driven systems, with short-duration run-to-completion event handlers– Greatly simplifies deferring destruction because
readers are short-lived– Permits tight bound on memory overhead
● Limited number of versions waiting to be collected
![Page 64: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/64.jpg)
64
RCU's Deferred Destruction
CPU 0
CPU 1
RC
U R
ead
-Sid
eC
riti
cal S
e cti
on
RC
U R
ead
-Sid
eC
riti
cal S
ect i
on
call
_ rcu
( )
Co
nte
x tSw
itch
Co
nte
x tSw
itc h
RC
U R
ead
-Sid
eC
riti
c al S
ecti
on
RC
U R
ead
-Sid
eC
riti
c al S
ect i
on
RC
U R
ead
-Sid
eC
riti
c al S
ect i
on
May hold referenceCan't hold reference to oldversion, but RCU can't tell
Can't hold referenceto old version
Can't hold referenceto old version
Co
nte
x tSw
itc h
![Page 65: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/65.jpg)
65
Grace Periods
CPU 0
CPU 1
RC
U R
ead
-Sid
eC
riti
cal S
e cti
on
RC
U R
ead
-Sid
eC
riti
cal S
ect i
on
Del
ete
Ele
men
t
Co
nte
x tSw
itch
Co
nte
x tSw
itc h
RC
U R
ead
-Sid
eC
riti
c al S
ecti
on
RC
U R
ead
-Sid
eC
riti
c al S
ect i
on
RC
U R
ead
-Sid
eC
riti
c al S
ect i
on
Co
nte
x tSw
itc h
Grace Period
Grace Period
Co
nte
x tSw
itch
Grace Period
![Page 66: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/66.jpg)
66
What is RCU? (1)
● Reader-writer synchronization mechanism– Best for read-mostly data structures
● Writers create new versions atomically– Normally create new and delete old elements
● Readers can access old versions independently of subsequent writers– Old versions garbage-collected by “poor man's” GC,
deferring destruction– Readers must signal “GC” when done
![Page 67: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/67.jpg)
67
What is RCU? (2)
● Readers incur little or no overhead● Writers incur substantial overhead
– Writers must synchronize with each other– Writers must defer destructive actions until readers
are done– The “poor man's” GC also incurs some overhead
![Page 68: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/68.jpg)
68
x86 Read-Only Results
![Page 69: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/69.jpg)
69
x86 Results for Mixed Workload
![Page 70: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/70.jpg)
70
x86 Read-Only Results (Large)
![Page 71: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/71.jpg)
71
x86 Mixed Results (Large)
![Page 72: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/72.jpg)
72
Two Types of Designs For RCU
● For situations well-suited to RCU:– Designs that make direct use of RCU
● For algorithms that do not tolerate RCU's stale- and inconsistent-data properties:– Design templates that transform algorithms so as to
tolerate stale data, inconsistent data, or both
![Page 73: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/73.jpg)
73
Designs for Direct RCU Use● Reader/Writer-Lock/RCU Analogy (5)
– Routing tables, Linux tasklist lock patch, ...● Pure RCU (4)
– Dynamic interrupt handlers...– Linux NMI handlers...
● RCU Existence Locks (7)– Ensure data structure persists as needed (K42)– Linux SysV IPC, dcache, IP route cache, ...
● RCU Readers With NBS Writers (1)– K42 hash tables
![Page 74: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/74.jpg)
74
Locking Design Patterns w/RCU
SequentialProgram
CodeLocking
DataLocking
DataOwnership
Partition Fuse
Partition Fuse
Own Disown
Reader /Writer
Locking
Hier.Locking
AllocatorCaches
ParallelFastpath
Critical-SectionFusing
Critical-SectionPartitioning
Inverse
RW/RCUAnalogy
RCUExistenceLocking
RCU ReadWith
NBS Write
RCUPureRCU
![Page 75: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/75.jpg)
75
Reader/Writer-Lock/RCU Analogy
● read_lock()● read_unlock()● write_lock()● write_unlock()● list_add()● list_del()● free(p)
● rcu_read_lock()● rcu_read_unlock()● spin_lock()● spin_unlock()● list_add_rcu()● list_del_rcu()● call_rcu(free, p)
![Page 76: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/76.jpg)
76
Reader-Writer Lock and RCU
int search(long key, int result){
struct el *p;read_lock(&rw);list_for_each_entry(h, p, lst)
if (p- >key == key) {*result = p- >data;read_unlock(&rw);return (1);
}read_unlock(&rw);return (0);
}
int search(long key, int result){
struct el *p;rcu_read_lock();list_for_each_entry_rcu(h, p, lst)
if (p- >key == key) {*result = p- >data;rcu_read_unlock();return (1);
}rcu_read_unlock();return (0);
}
● Search:
![Page 77: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/77.jpg)
77
Reader-Writer Lock and RCU
int delete(long key){ struct el *p; write_lock(&rw); list_for_each_entry(h, p, lst) if (p- >key == key) { list_del(&p- >lst); write_unlock(&rw); return (1); } write_unlock(&rw); return (0);}
int delete(long key){ struct el *p; spin_lock(&lck); list_for_each_entry(h, p, lst) if (p- >key == key) { list_del_rcu(&p- >lst); spin_unlock(&lck); return (1); } spin_unlock(&lck); return (0);}
● Delete:
![Page 78: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/78.jpg)
78
Reader-Writer Lock and RCU
void insert(struct el *p){ write_lock(&rw); list_add(p, h); write_unlock(&rw);}
void insert(struct el *p){ spin_lock(&lck); list_add_rcu(p, h); spin_unlock(&lck);}
● Insert:
![Page 79: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/79.jpg)
79
RCU/Reader-Writer-Lock Caveats
● Searches race with updates– Some algorithms tolerate such nonsense– Others need to be transformed – see later slides
● Updaters still can see significant contention– See earlier locking designs
● There is no way to block readers– Which is the whole point...– See later slides for ways to deal with this
![Page 80: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/80.jpg)
80
Pure RCU
● Delay execution of update until all existing readers are done– See prior “big reference counter” example– The dynamic NMI/SMI/IPMI handlers are another
example
![Page 81: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/81.jpg)
81
Pure RCU: Timeouts and Interrupts● RCU permits dynamic SMI handlers:
spin_lock_irqsave(&(to_clean- >si_lock), flags);spin_lock(&(to_clean- >msg_lock));to_clean- >stop_operation = 1;to_clean- >irq_cleanup(to_clean);spin_unlock(&(to_clean- >msg_lock));spin_unlock_irqrestore(&(to_clean- >si_lock), flags);
synchronize_kernel();while (!to_clean- >timer_stopped) { set_current_state(TASK_UNINTERRUPTIBLE); schedule_timeout(1);}rv = ipmi_unregister_smi(to_clean- >intf);if (rv) printk(KERN_ERR "Can't unregister device: errno=%d\n", rv);
to_clean- >handlers- >cleanup(to_clean- >si_sm);kfree(to_clean- >si_sm);to_clean- >io_cleanup(to_clean);
![Page 82: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/82.jpg)
82
RCU Existence Locks
● Normal existence-guarantee schemes use global locks or per-element reference counts– Subject to contention and cache thrashing– But reference counts are OK if you need to write to
the element anyway!● RCU provides existence guarantees:list_del_rcu(p);synchronize_kernel();kfree(p);
![Page 83: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/83.jpg)
83
Designs for Direct RCU Use● Reader/Writer-Lock/RCU Analogy (5)● Pure RCU (4)● RCU Existence Locks (7)● RCU Readers With WFS Writers (1)
– Only one use thus far, ask me again later!● But what about algorithms that don't like stale data???
![Page 84: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/84.jpg)
84
Stale and Inconsistent Data
● RCU allows concurrent readers and writers– RCU allows readers to access old versions
● Newly arriving readers will get most recent version● Existing readers will get old version
– RCU allows multiple simultaneous versions● A given reader can access different versions while
traversing an RCU-protected data structure● Concurrent readers can be accessing different versions
● Some algorithms tolerate this consistency model, but many do not
![Page 85: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/85.jpg)
85
RCU Transformational Templates
● Substitute Copy for Original● Impose Level of Indirection● Mark Obsolete Objects● Ordered Update With Ordered Read● Global Version Number● Stall Updates
![Page 86: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/86.jpg)
86
Substitute Copy For Original
● RCU uses atomic updates of single value:– Most CPUs support this
● If multiple updates must appear atomic:– Must hide updates behind a single atomic operation in
order to apply RCU● To provide atomicity:
– Make a copy, update the copy, then substitute the copy for the original
● Example in next section
![Page 87: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/87.jpg)
87
Impose Level of Indirection
● Problem: difficult to ensure consistent view of multiple independent data elements– Requires lots and lots of memory barriers
● Solution: place the independent data elements in one structure referenced by a pointer– Then can atomically switch the pointer
● And get rid of most of the memory barriers!!!– Example in next section
![Page 88: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/88.jpg)
88
Mark Obsolete Object
● RCU search structure w/data-locked items:rcu_read_lock();p = search(key);if (p != NULL) spin_lock(&p- >mutex);rcu_read_unlock();● Place a “deleted” flag in each element:rcu_read_lock();p = search(key);if (p != NULL) { spin_lock(&p- >mutex); if (p- >deleted) { spin_unlock(&p- >mutex); p = NULL; }}rcu_read_unlock();return (p);
![Page 89: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/89.jpg)
89
Ordered Update with Ordered Read
● Expanding array (obsolete):/ * update */new_array = kmalloc(new_size * sizeof(*newarray));copy_and_init(new_array, array);smp_wmb();array = new_array;smp_wmb();size = new_size;
/ * read */if (i >= size) return - ENOENT;smp_rmb();p = array;return rcu_dereference(p[i]);
● Usually better to impose level of indirection...
![Page 90: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/90.jpg)
90
Global Version Number
● In Linux, combine seqlock_t with RCU● For example, in dcache lookup:
do { seq = read_seqbegin(&rename_lock); dentry = __d_lookup(parent, name); if (dentry) break;} while (read_seqretry(&rename_lock, seq));
● RCU protects against cache prune and “rm”● seqlock_t protects against “mv” ● Could also place sequence number in dentry to
allow “mass invalidate” of dentries
![Page 91: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/91.jpg)
91
RCU Transformational Patterns
● Substitute Copy for Original (2)● Impose Level of Indirection (~1)● Mark Obsolete Objects (2)● Ordered Update With Ordered Read (3)● Global Version Number (2)● Stall Updates (~1)
![Page 92: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/92.jpg)
92
Putting It All Together
![Page 93: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/93.jpg)
93
entries
2.4 System V Semaphore Locking
0 1 2 3 4 5 6 7
Sem0 Sem4 Sem6
Global spinlock_t sem_ids.aryGlobal sema_t sem_ids.sem
size
![Page 94: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/94.jpg)
94
2.6 System V Semaphore Locking
0 1 2 3 4 5 6 7
Sem0 Sem4 Sem6
Global sema_t sem_ids.semRCU
lock
Each semaphore has a “deleted” flag to force search failure
lock lock
entries
size
![Page 95: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/95.jpg)
95
2.6 SysV Sema Animation (1)
0 1 2 3 4 5 6 7
Sem0 Sem4 Sem6
entries
8
![Page 96: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/96.jpg)
96
2.6 SysV Sema Animation (2)
0 1 2 3 4 5 6 7
Sem0 Sem4 Sem6
1 2 3 4 5 6 70 8 ...
Sem8
entries
8
16
![Page 97: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/97.jpg)
97
2.6 SysV Sema Animation (3)
0 1 2 3 4 5 6 7
Sem0 Sem4 Sem6
1 2 3 4 5 6 70 8 ...
Sem8
entries
8
16
![Page 98: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/98.jpg)
98
2.6 SysV Sema Animation (4)
Sem0 Sem6
1 2 3 4 5 6 70 8 ...
Sem8
entries
16
![Page 99: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/99.jpg)
99
Searching for Semaphore● Search function body (ipc_lock()):
rcu_read_lock();smp_rmb(); / * prevent indexing old array with new size */entries = rcu_dereference(ids- >entries);if(lid >= entries- >size) { rcu_read_unlock(); return NULL;}out = entries- >p[lid];if(out == NULL) { rcu_read_unlock(); return NULL;}spin_lock(&out- >lock);if (out- >deleted) { spin_unlock(&out- >lock); rcu_read_unlock(); return NULL;}return out;
![Page 100: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/100.jpg)
100
Expanding Semaphore Array● Expand-array function body (grow_ary()):
new- >size = newsize;memcpy(new- >p, ids- >entries- >p, sizeof(struct kern_ipc_perm *)*size + sizeof(struct ipc_id_ary));for (i=size;i<newsize;i++) { new- >p[i] = NULL;}old = ids- >entries;rcu_assign_pointer(ids- >entries, new);ipc_rcu_putref(old);return newsize;
![Page 101: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/101.jpg)
101
RCU Sem Micro-Benchmark
Kernel Run 1 Run 2 Avg2.5.42-mm2 515.1 515.4 515.32.5.42-mm2+ipc-rcu 46.7 46.7 46.7
Numbers are test duration, smaller is better.
![Page 102: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/102.jpg)
102
RCU Sem DBT1 Performance
Kernel Average2.5.42-mm2 85.0 7.52.5.42-mm2+ipc-rcu 89.8 1.0
Standard Deviation
Numbers are transaction rate, larger is better.
![Page 103: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/103.jpg)
103
The Road Ahead
![Page 104: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/104.jpg)
104
Uniprocessor Unbound
![Page 105: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/105.jpg)
105
Uniprocessor With Friends
![Page 106: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/106.jpg)
106
Multithreaded Mania
![Page 107: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/107.jpg)
107
More of the Same
![Page 108: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/108.jpg)
108
Crash Dummies Slamming into the Memory Wall
![Page 109: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/109.jpg)
109
Your Predictions?
Copyright © 2004 Melissa McKenney
![Page 110: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/110.jpg)
110
My Guess...
Somewhere between Multithreaded Mania and More of the Same, with both hardware threading
and multicore dies.
PPC SMT, x86 HT, Cell Processor, Niagara, ...
![Page 111: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/111.jpg)
111
Summary and Conclusions
![Page 112: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/112.jpg)
112
UseUsethe right toolthe right toolfor the job!!!for the job!!!
Copyright © 2004 Melissa McKenney
![Page 113: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/113.jpg)
113
Legal Statement● This work represents the view of the author, and does not
necessarily represent the view of IBM.● IBM, NUMA-Q, and Sequent are registered trademarks of
International Business Machines in the United States, other countries, or both.
● Pentium is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.
● Linux is a registered trademark of The Open Group in the United States and other countries.
● Other company, product, and service names may be trademarks or service marks of others.
![Page 114: Linux Kernel Scalability: Using the Right Tool for the Job · L2 Cache Hit 12.9 Atomic Increment 58.2 Cmpxchg Atomic Increment 107.3 Atomic Incr. Cache Transfer 113.2 Main Memory](https://reader033.vdocument.in/reader033/viewer/2022050603/5faa3a8e4355b0175644d598/html5/thumbnails/114.jpg)
114
BACKUP