Understanding Concurrency, Performance
Optimizations, and Debugging for Multicore Platforms
Multicore Programming Practices
Markus Levy
President
Multicore Association
Rob Oshana
Dir. Global SW R&D
Networking and Multimedia
Freescale Semiconductor
David Stewart
CEO and Co-Founder
CriticalBlue
The Multicore Association
• Established in 2005
• Mission: Improve time to market through the use of industry standards
• Membership: board, working group, university
• Committee-based standards development
Multicore Association Board Members
Multicore Association University Members
Multicore Association Working Group Members
Multicore Association Accomplishments
• Multicore Communications API (MCAPI) 2.0
– Over 2000 downloads
– Semantics for communication and synchronization between processing cores in embedded systems
– Growing number of implementations: www.multicore-association.org/products/index.php
– Discussing possible options for MCAPI extensions to support accelerators
– Low-level layer for higher-level programming models
• Check out www.embedded.com - What the new OpenMP standard brings to embedded multicore software design
• Multicore Resource Management API (MRAPI)
– Over 1000 downloads
– Capabilities required by multicore applications to allow coordinated concurrent access to system resources (e.g. memory, mutexes)
• Multicore Programming Practices Guide (MPP)
– 120+ pages dedicated to various multicore programming techniques
Join These Active Working Groups
• Tools Infrastructure Working Group (TIWG)
– Tool interoperability for multiple IDEs
– CE Linux Forum collaboration
– Chaired by Brian Cruickshank (TI) and Aaron Spear (VMware)
• Multicore Virtualization Working Group (MVWG)
– Profiling different processor virtualization features
– Preliminary specification available for review
– Chaired by Rajan Goyal (Cavium) and Surender Reddy (NSN)
• Multicore Task Management API (MTAPI)
– Leveraging task parallelism on embedded devices (homogeneous or heterogeneous multicore processors)
– Dynamic scheduling and mapping of tasks to processor cores
– Chaired by Urs Gleim (Siemens)
General Programming Issues
• Fact: C/C++ will be the predominant programming language for at least 8 years
• Problem: While we wait for long-term research results, the multicore programmability gap is opening rapidly
What Does The Industry Need Right Now?
• Continue with long term research into languages, methodologies, etc
• Short-term direction on how embedded C/C++ code can be written to be “multicore ready” today
• Influence of a group of like-minded methodology experts to ensure completeness, usefulness and industry-wide compatibility
• The creation of a standard “best practices” guide through a recognized, neutral industry body, based on capturing current best practices
Action - Multicore Programming Practices Working Group
• Best practices for writing multicore-ready software using C/C++ without extensions
• Allow embedded software to be more easily compiled across a range of multicore processor platforms
• Framework of common pitfalls when transitioning from serial to parallel
• Consider solutions or avoidance tactics
• Minimize debugging efforts by reducing bugs
Multicore Association: Multicore Programming Practices WG
Multicore Programming Practices (MPP)
The creation of MPP, a best practices guide to the writing of C/C++ embedded software, such that it may be more easily compiled across a range of multicore processor platforms. MPP will be an open document, possibly a book or booklet, created by a working group operating under the Multicore Association standards body, and constructed in layers such that initial works may be delivered quickly, while being further refined. The document could also form the basis of future Association standards.
MPP - Getting Started
• Purpose: Provide an initial series of discussion points to kick-start the program and provide the benefit of a multi-year development project
CriticalBlue MPP Contribution
A framework of methodology considerations and examples of commonly observed programming issues together with their solutions, with performance analysis where appropriate
The Essence of MPP
• Introduction & Business Overview
• Overview of Available Technology
• Analysis and High Level Design
• Implementation and Low-Level Design
• Debug
• Performance Tuning
• Glossary
MPP - Summary
• Problem To Solve: While we are waiting for long-term research results, the multicore programmability gap is opening rapidly
• Action Taken: MPP Working Group
– Best practices for writing multicore-ready embedded software
• Objective Met: Release of MPP guide to meet immediate needs of multicore stakeholders in an open, efficient and effective manner
Using IP Forwarding as an MPP Case Study
• System is typically partitioned into control plane and data plane
• Control plane runs control protocols and provides management capabilities
• Data plane performs the real-time processing for data packets
Figure: IP forwarding data path (Ingress Pipe and Egress Pipe). Blocks shown:
• Receive packets
• L2 process: proto-check (Only IP protocol need)
• Classify Table (Hash Table): 128K table entries; key is 5 tuples, result is DSCP, group_id, queue_id, color
• FIB Lookup (LPM Table): 4K table entries; key is dest IP, result is next hop IP
• ARP Lookup (Hash Table): 4K table entries; key is next hop IP, result is next hop MAC
• Traffic metering
• En-queue (Tail drop, WRED)
• Scheduling among 128 Groups
• Modifying Layer 2 header
• Sending packets
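The FIB lookup block above maps a destination IP to a next hop IP through a longest-prefix-match (LPM) table. The slides do not show how the table was implemented, so the following is only a minimal sketch under stated assumptions: a linear scan stands in for whatever LPM structure was actually used, and the names `fib_entry`, `prefix_mask` and `fib_lookup` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FIB entry: destination prefix -> next hop IP.
 * All addresses are in host byte order. */
struct fib_entry {
    uint32_t prefix;     /* network address, e.g. 10.1.0.0 */
    uint8_t  prefix_len; /* number of significant bits, 0..32 */
    uint32_t next_hop;   /* next hop IP for this prefix */
};

/* Build the netmask for a prefix length. Length 0 is handled
 * separately because shifting a 32-bit value by 32 is undefined. */
uint32_t prefix_mask(uint8_t len)
{
    assert(len <= 32);
    return len == 0 ? 0u : 0xFFFFFFFFu << (32 - len);
}

/* Longest-prefix match by linear scan (illustrative only; a production
 * data plane would more likely use a trie or a hardware table).
 * Returns 1 and writes *next_hop when some prefix matches, else 0. */
int fib_lookup(const struct fib_entry *fib, size_t n,
               uint32_t dst_ip, uint32_t *next_hop)
{
    int best_len = -1;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(fib[i].prefix_len);

        if ((dst_ip & mask) == (fib[i].prefix & mask) &&
            (int)fib[i].prefix_len > best_len) {
            best_len = fib[i].prefix_len;
            *next_hop = fib[i].next_hop;
        }
    }
    return best_len >= 0;
}
```

With entries for 10.0.0.0/8 and 10.1.0.0/16, a packet to 10.1.2.3 matches both prefixes and the /16 entry wins because it is the longer match.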
MPP Case Study
• Re-partition the application and optimize it to meet performance goals
• Freescale applied the MPP guidelines called out in chapter 3
– Prepare
– Measure
– Tune
– Assess
1. Analyzed customer application
2. Partitioned application to multiple cores (parallel and/or pipeline operation)
3. Measured performance to see if it was as expected (iterate)
4. Collected debug info/statistics to locate bottlenecks
5. Fine-tuned partitioning design based on the collected data to eliminate bottlenecks
6. Made intelligent use of data path acceleration capabilities for further optimizations
7. After several iterations, we met our performance goal and concluded the exercise
What We Did
Before Applying MPP Guidelines: Multicore Partition – Ingress and Egress
Figure: initial partition. Blocks shown:
• Control core(s), control plane: configure, protocols, authenticate, management
• Ingress core group, ingress pipe: Rx, Classify, L2 process, FIB lookup, ARP lookup, enqueue
• soft-queue
• Egress core group, egress pipe: dequeue, shaping, L2 modify, Tx
• dTSEC, FMan/BMan/QMan
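The initial partition hands packets from the ingress core group to the egress core group through a soft-queue. The slides do not describe that queue's implementation, so the sketch below is only a generic illustration of the idea: a single-producer, single-consumer ring using C11 atomics. The names `softq`, `softq_enqueue` and `softq_dequeue` and the depth `SOFTQ_SIZE` are all hypothetical, and a real design with several cores per group would need more than one producer or consumer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SOFTQ_SIZE 256u /* hypothetical depth; must be a power of two */

static_assert((SOFTQ_SIZE & (SOFTQ_SIZE - 1u)) == 0,
              "SOFTQ_SIZE must be a power of two");

/* Single-producer, single-consumer ring of packet pointers. */
struct softq {
    void *slot[SOFTQ_SIZE];
    atomic_uint head; /* next slot to write; advanced only by the producer */
    atomic_uint tail; /* next slot to read; advanced only by the consumer */
};

void softq_init(struct softq *q)
{
    atomic_init(&q->head, 0u);
    atomic_init(&q->tail, 0u);
}

/* Producer (ingress) side. Returns false when the ring is full,
 * in which case the caller may drop the packet (tail drop). */
bool softq_enqueue(struct softq *q, void *pkt)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head - tail == SOFTQ_SIZE)
        return false;

    q->slot[head & (SOFTQ_SIZE - 1u)] = pkt;
    /* Release: the slot write becomes visible before the new head. */
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer (egress) side. Returns NULL when the ring is empty. */
void *softq_dequeue(struct softq *q)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (head == tail)
        return NULL;

    void *pkt = q->slot[tail & (SOFTQ_SIZE - 1u)];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
    return pkt;
}
```

The indices are free-running unsigned counters, so `head - tail` gives the fill level even after wraparound, provided the depth is a power of two.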
Performance Results – Initial Partition
Ingress-core:
Module                        Instructions   Cycles
Rx()                          130            107
L2i()                         43             66
Classify()                    322            470
Ip_route_lookup()             85             87
Arp()                         162            174
Queue_enque()                 180            660
Total                         922            1564
Egress-core:
Module                        Instructions   Cycles
3 ColorBlind_srTcm() total    163            326
Queue_deque_pq/drr()          318            654
Queue_deque()                 555            1216
Total                         555            1216
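Locating the bottleneck in the ingress-core figures above can be sketched as a small scan over the per-module counts. This is an illustrative helper, not the tooling used in the case study: `stage_stat`, `total_cycles`, `hottest_stage` and `cpi_x100` are hypothetical names, and only the numbers come from the table.

```c
#include <assert.h>
#include <stddef.h>

/* Per-module counts, as reported in the initial-partition table. */
struct stage_stat {
    const char *name;
    unsigned instructions;
    unsigned cycles;
};

/* Ingress-core rows from the initial partition. */
const struct stage_stat ingress_initial[] = {
    {"Rx",              130, 107},
    {"L2i",              43,  66},
    {"Classify",        322, 470},
    {"Ip_route_lookup",  85,  87},
    {"Arp",             162, 174},
    {"Queue_enque",     180, 660},
};
const size_t ingress_initial_count =
    sizeof ingress_initial / sizeof ingress_initial[0];

/* Sum of cycles over all stages. */
unsigned total_cycles(const struct stage_stat *s, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += s[i].cycles;
    return sum;
}

/* Index of the stage that consumes the most cycles. */
size_t hottest_stage(const struct stage_stat *s, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (s[i].cycles > s[best].cycles)
            best = i;
    return best;
}

/* Cycles per instruction, scaled by 100 to stay in integer math. */
unsigned cpi_x100(const struct stage_stat *s)
{
    assert(s->instructions > 0);
    return s->cycles * 100u / s->instructions;
}
```

On this data the scan picks Queue_enque(): 660 of the 1564 ingress cycles, at roughly 3.7 cycles per instruction, while Rx() runs below 1. A ratio that high often points to stalls rather than instruction count, though the slides do not state the cause.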
After Applying MPP Guidelines; Multicore Partition – Ingress and Egress
Figure: re-partitioned design. Blocks shown:
• Ingress core group, ingress pipe: Rx, Classify, L2 process, FIB lookup, ARP lookup, qman_enqueue
• QMan FQ
• Egress core group, egress pipe: qman_poll, enqueue, dequeue, shaping, L2 modify, Tx
• dTSEC, FMan/BMan/QMan
Performance Results – After Re-Partition
Ingress-core:
Module                        Instructions   Cycles
Rx()                          130            107
L2i()                         43             66
Classify()                    322            470
Ip_route_lookup()             85             87
Arp()                         162            174
Qman_enqueue()                120            118
Total                         862            1022
Egress-core:
Module                        Instructions   Cycles
Qman_poll()                   40             40
Queue_enque()                 143            151
3 ColorBlind_srTcm() total    163            192
Queue_deque_pq/drr()          318            340
Queue_deque()                 352            375
Total                         535            566
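The effect of the re-partition can be checked with simple arithmetic on the two sets of totals. The helper below is a hypothetical sketch (`reduction_permille` is not from the slides) that expresses a cycle reduction in tenths of a percent using integer math.

```c
#include <assert.h>

/* Cycle totals taken from the two performance tables. */
enum {
    INGRESS_BEFORE = 1564, INGRESS_AFTER = 1022,
    EGRESS_BEFORE  = 1216, EGRESS_AFTER  = 566,
    ENQUEUE_BEFORE = 660,  /* Queue_enque() on the ingress core */
    ENQUEUE_AFTER  = 118   /* Qman_enqueue() on the ingress core */
};

/* Reduction from 'before' to 'after' in tenths of a percent,
 * rounded down. Example: 346 means 34.6%. */
unsigned reduction_permille(unsigned before, unsigned after)
{
    assert(before > 0 && after <= before);
    return (before - after) * 1000u / before;
}
```

Applied to the totals, the ingress core drops from 1564 to 1022 cycles per packet (roughly 35%) and the egress core from 1216 to 566 (roughly 53%). The 542 cycles saved on the ingress core equal the difference between Queue_enque() at 660 cycles and Qman_enqueue() at 118 cycles, since the other ingress modules are unchanged.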
Case Study Conclusions
• Partitioning and optimizing multicore software requires a disciplined process
• Use an iterative approach to achieve faster time to market, or “time to performance goals”
• MCA Multicore Programming Practices can serve as a useful guide for developers involved in all phases of multicore software development