Designing High-Performance Network Elements Using Multiprocessing Technology and Adaptive Partitioning
Peter van der Veen, QNX Software Systems
Typical Hardware Architecture
[Figure: a chassis holding line cards and a control card; labels: Network (one per line card), High-speed interconnect, Low-speed bus]
Typical Netcom System Software Constraints
[Figure: software components running on a kernel (RTOS): applications, TCP/IP stack, SS7 stack, filesystem, device drivers]
Many millions of lines of code
Tens to hundreds of software components
Hundreds to thousands of processes and threads
Strict availability requirements
Software Architecture
[Figure: threads A through E, belonging to a route manager, a file system, and an Ethernet driver, ordered by priority under the QNX Neutrino realtime scheduler (OS)]
[Figure: four CPUs, each with its own cache, sharing memory over a high-bandwidth CPU bus]
Multiple processors sharing common hardware
► Common memory bus and address space
► Access to all peripheral devices and interrupts
OS manages tasks running on processors: true concurrency
Transparent to application programs
► No incremental hardware
► No application software changes needed
SMP Memory Organization
[Figure: memory views of e600 core0 (Apps "A") and e600 core1 (Apps "B"), each mapped through an MMU onto physical memory that holds the OS, Apps "A", Apps "B", and shared memory]
The OS kernel resides at physical memory address 0, addressable by both cores
The MMU relocates applications and shared memory appropriately
Making the Most of SMP
Concurrency … divide and conquer
► Write software components using threads
► Remove serializations from dataflow
Caches … keep them hot
► Minimize writes to globally shared data
► Process data on the same processor where possible
Scheduling … get your ducks in a row
► Take advantage of the OS scheduler
► Use diagnostic tools to adjust runmasks and priorities
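As a rough illustration of "minimize writes to globally shared data", the sketch below has each worker thread tally into a private counter and publish its result once, instead of updating one shared counter per packet. The function name, the port names, and the batch layout are hypothetical and not from the slides. Python threads do not run in parallel under CPython's global lock, so this shows the data-ownership pattern only, not an SMP speedup.

```python
import threading
from collections import Counter


def count_packets_per_port(batches):
    """Tally packets per port with one private Counter per worker thread."""
    partials = [None] * len(batches)  # each worker writes only its own slot

    def worker(index, batch):
        local = Counter()             # thread-private: no lock, no shared writes
        for port in batch:
            local[port] += 1
        partials[index] = local       # one write to shared state per worker

    threads = [
        threading.Thread(target=worker, args=(i, batch))
        for i, batch in enumerate(batches)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = Counter()
    for local in partials:            # merge once, after all workers finish
        total += local
    return dict(total)


# Hypothetical input: each inner list is the work given to one thread.
batches = [["eth0", "eth1", "eth0"], ["eth1", "eth1"], ["eth2"]]
print(count_packets_per_port(batches))
```

On real SMP hardware the same shape keeps each worker's hot data in its own cache and confines cross-processor traffic to the final merge.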
SMP Optimizing Tools
System Profiler
► Provide a timeline view of activity in the system
► Identify resource contention and serialization
► Analyze SMP scheduling thrashing
► Visualize distributed message passing
CPU Performance Counters
► Count operations such as cache misses
► Statistically sample based on significant events
Introducing Adaptive Partitioning
What is Adaptive Partitioning?
► Adaptive partitioning is a new QNX product that extends the Neutrino RTOS
► Allows you to build secure compartments or “partitions” around a set of applications or threads
► Partitions enforce CPU guarantees for applications, controlled by easy-to-use budgets
Why is it Adaptive?
► Patent-pending design ensures all available CPU cycles are given to partitions that need processing time, so no CPU cycles are wasted
► Provides performance advantage by permitting full processor utilization to accommodate spikes in demand
Easy to get started
► No changes to how designers work today
► POSIX programming model for the same, familiar design, programming & debugging techniques
► No code changes are required to implement partitions
Understanding “Adaptive”
[Figure: stacked bar chart of processing load (0% to 100%) for four scenarios (System Restart, Steady State, Topology Change, Reconfiguration), split among Routing & Forwarding, Management Interfaces (CLI, SNMP), Maintenance, and Idle Time]
Defining Partitions
Given the processing scenarios, choose a partitioning approach and appropriate partition budgets
[Figure: Management Interface, Routing & Forwarding, and Maintenance partitions on the QNX Neutrino microkernel; budget labels: 5%, 75%, 20%]
Understanding “Adaptive” Partitioning
[Figure: two bar charts, Static and Adaptive, showing CPU use (0% to 100%) in the Restart, Steady State, Topology Change, and Reconfigure scenarios for the Management Interface, Routing & Forwarding, and Maintenance partitions on the QNX Neutrino microkernel; budget labels: 5%, 75%, 20%]
Static: CPU time wasted when partitions do not consume their budget. Applications cannot benefit from available time.
Adaptive: Budgets enforced when CPU is loaded
Adaptive: Applications can use free CPU time if available from other partitions
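The difference between static and adaptive budgets can be sketched with a little arithmetic. This is a simplified model, not the QNX scheduler: the assignment of the 5%, 75%, and 20% budgets to particular partitions, the demand figures, and the rule of handing spare time out in a fixed order are all assumptions made for illustration.

```python
def static_allocation(budgets, demand):
    """Static partitions: each partition is capped at its budget."""
    return {p: min(demand[p], budgets[p]) for p in budgets}


def adaptive_allocation(budgets, demand, order):
    """Adaptive partitions: guarantee the budget, then hand out spare CPU.

    `order` is an assumed stand-in for thread priority: partitions earlier
    in the list are offered spare CPU time first.
    """
    granted = static_allocation(budgets, demand)
    free = 100 - sum(granted.values())
    for p in order:
        extra = min(free, demand[p] - granted[p])
        granted[p] += extra
        free -= extra
    return granted


# Assumed mapping of the slide's budget labels to partitions.
budgets = {"management": 5, "routing": 75, "maintenance": 20}
# Hypothetical demand during a topology change (percent of CPU wanted).
demand = {"management": 5, "routing": 90, "maintenance": 5}

static = static_allocation(budgets, demand)
adaptive = adaptive_allocation(budgets, demand, ["routing", "management", "maintenance"])
print("static:  ", static, "idle:", 100 - sum(static.values()))
print("adaptive:", adaptive, "idle:", 100 - sum(adaptive.values()))
```

With these numbers the static scheme leaves 15% of the CPU idle while routing is held to 75%; the adaptive scheme lets routing use the 15% that maintenance did not need.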
Security Threats
Embedded systems are becoming network connected
► Untrusted interfaces and network threats
► Untrusted add-on software
If appropriate measures are not included by design, your product’s security and availability can be compromised
Rogue software can launch a denial-of-service (DoS) attack and starve core applications of CPU time
► Need to ensure untrusted, add-on software can be contained to guard against attacks
Distributed DoS attacks can busy your system with network processing
[Figure: file system, networking, device drivers, core applications, and add-ons on the QNX Neutrino microkernel; callouts: “Networking stack hogging CPU time”, “Rogue add-on stealing CPU time”]
Partitioning to Contain Threats
Create OS-enforced partitions to ensure critical system resources are protected
► Ensure CPU available for core functions
► Partition inheritance ensures applications get CPU time for OS services (such as drivers, file systems, networking)
Contain threats and protect core applications
► Limit impact of rogue applications
[Figure: file system, networking, device drivers, core applications, and add-ons on the QNX Neutrino microkernel, grouped into partitions; callouts: “Networking consuming CPU time”, “Rogue add-on thwarted”]
Partition Accounting
What does “30% CPU Budget” mean?
► CPU usage is calculated over a sliding window.
► Partition budget: guaranteed percentage of CPU time, balanced over the sliding window
► Partition usage: CPU time executed during the last sliding window, expressed as a percentage
Accuracy
► Counting ticks is not enough. “Micro-billing” is used to track actual CPU utilization even when threads don’t use their whole timeslice
► Microsecond and nanosecond resolution
► Threads are billed based on real usage, not statistics
“windowsize” is configurable as an argument to the kernel at boot
► Trade off maximum READY-state latency against accuracy of CPU budgeting
► 100 ms window -> 1% accuracy or better
► Internal arithmetic accurate to 0.5% or better
[Figure: sliding window from T = -100 ms to T = now; partitions User Interface, Route Calculation, Data Acquisition, and Diagnostics on the QNX Neutrino microkernel; budget labels: 30%, 40%, 30%]
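A minimal sketch of sliding-window accounting with micro-billing follows. It is an illustration under stated assumptions, not the kernel's implementation: the class name, the microsecond timestamps, and the requirement that intervals be billed in chronological order are all hypothetical. What it shows is that usage is the exact time run inside the last window, as a percentage of the window, rather than a count of whole ticks.

```python
from collections import deque


class PartitionAccount:
    """Tracks one partition's CPU usage over a sliding window (hypothetical model)."""

    def __init__(self, budget_percent, window_us=100_000):
        self.budget_percent = budget_percent
        self.window_us = window_us      # 100 ms window, as on the slide
        self.runs = deque()             # (start_us, end_us) intervals actually run

    def bill(self, start_us, end_us):
        """Micro-billing: record the exact interval executed, not a whole timeslice."""
        self.runs.append((start_us, end_us))

    def usage_percent(self, now_us):
        window_start = now_us - self.window_us
        # Forget intervals that ended before the window began.
        while self.runs and self.runs[0][1] <= window_start:
            self.runs.popleft()
        # Clip each remaining interval to the window before summing.
        used = sum(
            min(end, now_us) - max(start, window_start)
            for start, end in self.runs
            if start < now_us
        )
        return 100.0 * used / self.window_us

    def has_budget(self, now_us):
        return self.usage_percent(now_us) < self.budget_percent


acct = PartitionAccount(budget_percent=30)
acct.bill(0, 20_000)          # ran for 20 ms
acct.bill(50_000, 55_000)     # ran for 5 ms, less than a full timeslice
print(acct.usage_percent(100_000), acct.has_budget(100_000))
print(acct.usage_percent(110_000))  # the first 10 ms have slid out of the window
```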
Behavior During Normal Load
Hard real-time, priority-based scheduler under normal load
Running thread selected as highest priority READY thread
No delay on scheduling if adaptive partition has budget
[Figure: READY, running, and blocked threads, labeled with their priorities, in two partitions that are both marked “CPU Budget Available”]
Behavior During Overload
Partition budgets are enforced when the CPU is fully loaded
Highest priority READY thread in partition with budget runs
No delay on scheduling if partition has budget
[Figure: READY and blocked threads, labeled with their priorities, in two partitions, one marked “CPU Budget Available” and one “CPU Budget Exceeded”; callouts: “Runs before higher priority”, “Ready - No Budget”]
Behavior with Free CPU Time
If no partitions with remaining budget have READY threads, the highest priority READY thread is selected to run from other partitions
This allows “free” time to be given based upon priority
► “Free” time is still accounted and may have to be paid back (for example, if partition 3 becomes ready within 1 averaging window)
[Figure: READY, running, and blocked threads, labeled with their priorities, in three partitions, two marked “CPU Budget Exceeded” and one “CPU Budget Available”]
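The three behaviors (normal load, overload, and free CPU time) can be read as a single selection rule. The sketch below is one way to express that rule, assuming a flat list of READY threads and a per-partition "has budget" flag; the type and function names are hypothetical, and the real scheduler also handles critical threads and the accounting of free time.

```python
from dataclasses import dataclass


@dataclass
class ReadyThread:
    name: str
    priority: int
    partition: str


def pick_next(ready, has_budget):
    """Choose the next thread to run.

    ready:      list of ReadyThread in the READY state
    has_budget: dict mapping partition name -> True if budget remains
    """
    if not ready:
        return None
    in_budget = [t for t in ready if has_budget[t.partition]]
    # Normal load and overload: only partitions with budget compete.
    # Free time: no partition with budget has a READY thread, so all
    # READY threads compete on priority alone.
    candidates = in_budget or ready
    return max(candidates, key=lambda t: t.priority)


ready = [ReadyThread("a", 10, "p1"), ReadyThread("b", 6, "p2")]
print(pick_next(ready, {"p1": True, "p2": True}).name)    # normal load
print(pick_next(ready, {"p1": False, "p2": True}).name)   # overload
print(pick_next(ready, {"p1": False, "p2": False}).name)  # free CPU time
```

In the overload case the priority-6 thread runs ahead of the priority-10 thread because only its partition still has budget, which matches the slide's "runs before higher priority" callout.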
Partition Inheritance
When a server process does work requested by a client, the time is “billed” to the client
Prevents runaway client processes from monopolizing system services such as device drivers and server processes
Ensures fair CPU scheduling
Allows you to create servers and assign server budgets independent of the number of clients
Builds on the Neutrino microkernel and its client-server, message-passing architecture
[Figure: application threads and file system threads on the QNX Neutrino microkernel. Inheritance: a file system operation uses the application’s budget]
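The billing rule behind partition inheritance can be shown in a few lines. This is a toy model with hypothetical names and microsecond figures; the `inherit` flag exists only to contrast the two outcomes and does not correspond to any documented QNX setting.

```python
def bill_request(usage_us, client_partition, server_partition, service_time_us, inherit=True):
    """Charge the CPU time a server spent on one request to a partition.

    With inheritance, the time goes to the client's partition, so a busy
    client drains its own budget rather than the server's.
    """
    billed_to = client_partition if inherit else server_partition
    usage_us[billed_to] = usage_us.get(billed_to, 0) + service_time_us
    return billed_to


usage = {}
# A hypothetical add-on floods the file system with requests.
for _ in range(3):
    bill_request(usage, "add_on", "file_system", 400)
# A core application makes one request.
bill_request(usage, "core_app", "file_system", 400)
print(usage)
```

Here the file system's own partition is charged nothing, and the add-on's heavier use shows up against the add-on's budget.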
Borrowed Time: Critical Threads
Critical threads still run (based on priority) even if partition has no budget
Critical threads provide deterministic scheduling even in overload
Critical threads are given a critical budget and can go into short-term debt
► Critical time is accounted and has to be repaid
► Exceeding critical budget is considered an error and causes notification/action
[Figure: READY, running, and blocked threads, labeled with their priorities, in two partitions, one marked “CPU Budget Exceeded” and one “CPU Budget Available”; a critical thread runs in the partition whose budget is exceeded]
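One way to picture the critical budget is as a small overdraft on top of the normal budget. The sketch below assumes a very simple model: time a critical thread runs while its partition is out of normal budget is added to a debt, unused budget later pays the debt down, and debt above the critical budget sets an error flag. The class, its fields, and the repayment rule are hypothetical; the slides do not say how repayment is computed.

```python
class CriticalBudget:
    """Tracks short-term debt run up by critical threads (hypothetical model)."""

    def __init__(self, critical_budget_us):
        self.critical_budget_us = critical_budget_us
        self.debt_us = 0
        self.exceeded = False   # stands in for the notification/action

    def run_critical(self, duration_us, partition_has_budget):
        """Bill a critical thread's run; return True if the critical budget is exceeded."""
        if not partition_has_budget:
            self.debt_us += duration_us      # borrowed time
            if self.debt_us > self.critical_budget_us:
                self.exceeded = True
        return self.exceeded

    def repay(self, unused_budget_us):
        """Assumed repayment rule: unused normal budget reduces the debt."""
        self.debt_us = max(0, self.debt_us - unused_budget_us)


cb = CriticalBudget(critical_budget_us=2000)
print(cb.run_critical(1500, partition_has_budget=False), cb.debt_us)
cb.repay(500)
print(cb.debt_us)
print(cb.run_critical(1500, partition_has_budget=False), cb.debt_us)
```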
Adaptive Partition APIs and Utilities
Control of the Adaptive Partitioning Scheduler is done through a kernel API
► API is restricted to privileged processes (root)
► Must be called from within default (system) partition
► Partitions are created with budget (normal and possibly critical)
“aps” system utility provided
► “aps” utility is part of the adaptive partitioning package
► Can be used to create and modify partitions
► Also provides usage stats over time
► Use “on” to launch processes into partitions
Boot script syntax extended
► Define partitions within the build file
► Launch processes into specific partitions
Partition configuration completely dynamic
► Can create partitions, modify budgets at runtime
► Averaging window can also be changed at runtime
Getting Started with Adaptive Partitioning
Step 1: Install Adaptive Partitioning
Step 2: Define Partitions and Budgets
Step 3: Build Image
Step 4: Launch Applications in Partitions
No code changes required; POSIX programming allowed
Summary
SMP is a key enabler for enhancing scalability
SMP delivers measurable performance gains in real-world applications
QNX provides transparent support for SMP systems
Adaptive partitioning can be used to increase your system’s security and availability
Adaptive partitioning is easy to apply to existing designs and implementations
Adaptive partitioning helps you integrate complex systems to improve time to market