MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting


TRANSCRIPT

Page 1: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Arizona’s First University.

This work is supported by the Air Force Research Laboratory, under award #FA8750-08-1-0024, titled “MultiCore Hardware Experiments in Software Producibility”.

MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Jonathan Sprinkle (University of Arizona), and Brandon Eames (Utah State University)

Page 2: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Overview

• Motivations

• Goals, Assumptions, and Constraints

• Approach
– Candidate algorithms/systems
– Experiment Setup Example
– Hardware choices
– Metrics and measuring

• Plan
– Timeline
– Division of Labor

• Questions

Page 3: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Why are multi-core real-time systems different?

[Figure: two execution timelines for Processes 1–4, one with the threads interleaved on a single core and one with the threads spread across multiple cores. Annotations mark where Process 2 reads output from Process 3, where Process 4 reads output from Process 1, and where Process 2 reads output from Process 1.]

With interleaved threads, the processes behaved nicely on a single core; with multiple cores, the threads must be synchronized.

Processes designed for distributed processing will work fine, but there may be some real-time tasks that "just work" on single-core systems due to the prevalence of "weak testing" (if it works, don't try to fix it!).

Page 4: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Motivations

• Real-time systems are often subject to subtleties, even for single-core machines

– Cache size, HDD access times, interrupt timings, etc., can all affect stability of the system if depended upon unwisely

• A few of the possible problems (a minimal code sketch follows this list):
– Even though we utilize only a single thread, there may be a second core available but not utilized by this application; however, other applications are now free to create shared-resource conflicts
– Synchronized threads, whose timing is okay on one processor, but
  • when using multiple cores, the processes execute too fast
  • when accessing shared resources, conflicts occur
– Non-synchronized, multi-threaded processes, where interleaving of commands involving a third process slows down one process enough for stability, but without the third process the system is unstable
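
To make the synchronization concern concrete, the following is a minimal sketch, not drawn from the project software: an unsynchronized producer/consumer pair whose timing happens to work when the threads are interleaved on a single core, but which has no defined behavior once the threads truly run in parallel.

/* Minimal sketch (not project code): two unsynchronized threads that may
 * "just work" when interleaved on one core, but race on multiple cores. */
#include <pthread.h>
#include <stdio.h>

static int sensor_value = 0;   /* shared data, unprotected           */
static int ready = 0;          /* shared readiness flag, unprotected */

static void *producer(void *arg)
{
    (void)arg;
    sensor_value = 42;         /* write the data ...                 */
    ready = 1;                 /* ... then signal that it is ready   */
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (!ready)             /* busy-wait: no lock, no memory barrier */
        ;
    /* This often appears to work on a single core, but the compiler or CPU
     * may reorder the two stores, hoist the read of 'ready' out of the loop,
     * or deliver a stale 'sensor_value' to another core. */
    printf("consumer read %d\n", sensor_value);
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}

A correct version would guard sensor_value and ready with a pthread mutex and condition variable (or C11 atomics), the kind of change that legacy single-core code may need before it can be trusted on multiple cores.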

Page 5: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Goals

• Demonstrate potential performance gains of legacy software using multicore processors

– with high confidence of safe execution
– with the ability to know whether there are dangers to stability
– with side-by-side comparisons of executions using single-core processors

• Produce exemplar experiments, where
– measurements for the system are taken,
– the system is composed using off-the-shelf components, and
– documentation for how the experiment was performed is created, allowing someone else to duplicate the experiment

• Give specific examples for testing
– Data-in-the-loop (DIL)
– Simulator/Software-in-the-loop (SWIL)
– Hardware-in-the-loop (HWIL)

[Image: Hokuyo laser sensor]

Page 6: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Assumptions and Constraints

• New infrastructure will not be created for the project

• New research in metrics and measurements is not expected

• Only lightweight software will be written (configuration file, glue code, perhaps some variants of execution, but ~1000 lines, not ~100k lines).

• Existing off-the-shelf software and middleware can be used for an engineering (and preferably DoD-related) application
– Emphasis will be placed on open-source tools
– Metrics and other measurements are possible with such tools

• Existing simulators and data for the *-in-the-loop can be utilized

• Hardware developed in related research programs may be used for HWIL, especially for testing highly-parallel algorithms to simulate future multi-core (core>>2) processors

http://playerstage.sourceforge.net/
Real data from autonomous runs is available for use.

Page 7: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Approach: Candidate Algorithms/Systems

• Multi-component autonomous vehicle simulation
– Simulation of vehicle dynamics
– Simulation of environment (obstacles, other vehicles, etc.)
– Real-time path-planning with obstacle avoidance (key algorithm, with multi-core extensions)
– Currently encoded as a distributed system, but capable of simulation on a Core 2 Duo laptop in VMware (so not exorbitantly slow!)

• Advantages:
– Existing data: using actual vehicle and trajectory experiments
– Existing simulators: 3D simulators, featuring hardware acceleration, capable of being turned off to simulate older processors, or to 'hit' the cache
– Existing software: all components to run this demonstration already exist in open source, permitting free use by future persons wishing to run the experiments
– Familiarity: Sprinkle was Team Leader for the research group that put together the software
– Potential for multi-core acceleration (the real-time path planner is key)
– Potential for follow-on algorithm development (computer vision, etc.)

Page 8: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Autonomous Vehicle Simulation and Experiments: basicsim

For this simulation, all vehicle components use simulated information that comes from dedicated simulators. The components gridmap, faithlocaliser3d, dgclocalnav, and highlevelplanner each run independently of the data source. The components laser{3,2,1}, Car, and imu retrieve data from simulators.

In addition to gathering simulated data, these data-source components can replay data that has been logged.

Page 9: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Autonomous Vehicle HWIL: vehiclecheck

For this configuration, all vehicle components use information that comes from hardware devices. The components gridmap, lanedetector, dgclocalnav, and highlevelplanner each run independently of the data source. The components laser{3,2,1}, Car, and insgps retrieve data from hardware (and store it in a log repository). Note that the previous component faithlocaliser3d is not needed, since insgps provides localization information.

In addition to gathering live data, these data-source components can replay data that has been logged.

Page 10: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Why is this component-based design relevant?

[Diagram: data sources (Simulator 1, Data Log 1, Simulator 2) feeding the same algorithm implemented on single-core, multi-core, and dedicated hardware platforms.]

Simulators and data sources provide nondeterministic and deterministic data-source comparisons across varying implementation platforms for these high-level algorithms. Thus, we can gather performance requirements for each, with controlled data. Finally, this data is, in many cases, gathered from the actual vehicle!
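
To illustrate why the component boundary matters, here is a hypothetical sketch; the type, function, and file names are illustrative and not the project's actual interfaces. The planning step reads laser ranges through a small data-source interface, so the same algorithm code can be backed by a simulator, a hardware driver, or a logged replay.

/* Hypothetical sketch: a data-source interface that lets one algorithm
 * component run unchanged against simulated, hardware, or logged data.
 * All names here are illustrative, not the project's actual API. */
#include <stdio.h>

typedef struct {
    /* fill 'ranges' with n laser readings; return 0 on success */
    int (*read_scan)(void *ctx, double *ranges, int n);
    void *ctx;
} data_source_t;

/* one possible backing: replay range readings from a text log */
static int log_read_scan(void *ctx, double *ranges, int n)
{
    FILE *log = ctx;
    for (int i = 0; i < n; i++)
        if (fscanf(log, "%lf", &ranges[i]) != 1)
            return -1;                  /* end of logged data */
    return 0;
}

/* the algorithm sees only the interface, so the same code runs against
 * basicsim simulators, vehiclecheck hardware, or replayed logs */
static void plan_step(data_source_t *src)
{
    double ranges[181];
    if (src->read_scan(src->ctx, ranges, 181) == 0) {
        /* ... run obstacle avoidance / path planning on 'ranges' ... */
    }
}

int main(void)
{
    FILE *log = fopen("laser1.log", "r");   /* hypothetical log file */
    if (!log)
        return 1;
    data_source_t src = { log_read_scan, log };
    plan_step(&src);
    fclose(log);
    return 0;
}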

Page 11: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Approach: Hardware Choices

• Multi-core processors: integrated processing power, shared + distributed memory architectures

• Commodity processors widely available on the market

– Performance impact not well understood

• Commercial products:
– Intel Core 2 Duo, Quad series
– AMD Athlon X2, Phenom series

• Basic idea: multiple processors on a single die
– Major differences in memory subsystems
  • Intel: symmetric dual-core processors, 2 shared 2 MB L2 caches
  • AMD: 4 individual cores, 4 individual 512 KB L2 caches, 1 shared 2 MB L3 on-die, on-die DDR controller

Page 12: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Approach: Metrics and Measuring

• Goal: Understand the performance impacts of multicore processing through profiling and measurement

• Relevant metrics:
– Wall-clock execution time
– Memory hierarchy profiles (L2 cache misses, page faults)
– Fine-grained timing (execution time per function)

• Measurement approach
– Profile-based analysis using available profiling tools
– Standard hardware benchmark platforms
  • Single-core multi-processor platform
  • Dual-core uni-processor
  • Quad-core uni-processor
– Execution of selected software on each benchmark platform
  • Profiling tools to capture, per process:
    – Level-2 cache misses
    – Number of page faults
    – Histograms of function call frequency and execution time
  • OS calls to measure wall-clock time (a sketch follows this list)
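
The wall-clock and page-fault measurements above can be captured with ordinary OS calls. The following is a minimal sketch, not project code; run_workload is a stand-in for whatever software is under test.

/* Sketch (not project code): wall-clock time via clock_gettime and
 * per-process page-fault counts via getrusage, wrapped around a workload. */
#include <stdio.h>
#include <time.h>
#include <sys/resource.h>

static void run_workload(void)
{
    /* stand-in for the real application code under test */
    volatile double x = 0.0;
    for (long i = 0; i < 10000000L; i++)
        x += (double)i;
}

int main(void)
{
    struct timespec t0, t1;
    struct rusage ru;

    clock_gettime(CLOCK_MONOTONIC, &t0);   /* wall-clock start */
    run_workload();
    clock_gettime(CLOCK_MONOTONIC, &t1);   /* wall-clock end   */

    getrusage(RUSAGE_SELF, &ru);           /* per-process counters */

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("wall-clock time: %.3f s\n", secs);
    printf("page faults (minor/major): %ld / %ld\n",
           ru.ru_minflt, ru.ru_majflt);
    return 0;
}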

Page 13: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Measurement Tools

• Multiple profiling tools are either commercially or freely available
– Many target only single-threaded programs
– None implement all types of measurements simultaneously

• GNU profiling tools
– gcov, gprof
– Function call histogram and timing, via sampling
– Single-thread only (but workarounds exist for multithreaded profiling)
– Function coverage, branch execution frequencies
– http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
– http://www.gnu.org/software/binutils/manual/gprof-2.9.1/gprof.html

• Valgrind
– Memcheck: detects erroneous memory usage, memory leaks (a small example follows this list)
– Cachegrind: detailed simulation of cache behavior (on both L1 and L2 caches)
– Callgrind: call-graph analysis
– Freely available, Linux-based
– Automatic (heavy) instrumentation of code
– http://valgrind.org
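
For illustration, here is a minimal sketch, not project code, of the kinds of defects Memcheck reports; running it under valgrind --leak-check=full would flag both the out-of-bounds write and the unfreed allocation.

/* Sketch (not project code): two defects Valgrind's Memcheck reports,
 * an out-of-bounds heap write and a memory leak. */
#include <stdlib.h>

int main(void)
{
    int *buf = malloc(4 * sizeof(int));
    if (!buf)
        return 1;
    buf[4] = 7;       /* invalid write: one element past the allocation */
    return 0;         /* leak: buf is never freed */
}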

Page 14: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Measurement Tools

• VTune
– Intel-developed commercial performance measurement tool
– Support for threads and Intel multicore processors
– http://www.intel.com/cd/software/products/asmo-na/eng/239144.htm

• Tau
– Multiprocess profiling tool
– Requires manual instrumentation of software
– Freely available
– http://www.cs.uoregon.edu/research/tau/

Page 15: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

int UMHEXIntegerPelBlockMotionSearch (…)
…
  iXMinNow = best_x;
  iYMinNow = best_y;
  for (m = 0; m < 4; m++)
  {
    cand_x = iXMinNow + Diamond_x[m];
    cand_y = iYMinNow + Diamond_y[m];
    SEARCH_ONE_PIXEL
  }

function UMHEXIntegerPelBlockMotionSearch called 291600 returned 100% blocks executed 95%…

 201488:  352:/*EOF*/
 201488:  353:/*EOF*/
1007440:  354:/*EOF*/
 201488:  354-block 0
 805952:  354-block 1
 201488:  354-block 2

Example profile: gcov + gprof output for the H.264 video encoder's UMHex motion estimation algorithm.

Page 16: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Plan

Task/Breakdown and Schedule:
• Kickoff Meeting (via telecon): January, 2008
• Algorithm/Task Specification (legacy software choice, real-time emphasis): February 29, 2008
• Hardware specification (computational hardware, embedded hardware, processor families): April 30, 2008
• Interim Review (optional), Tucson, AZ: May, 2008
• Test integration (SWIL testing) and hardware integration (HWIL testing): August 31, 2008
• Final Review, Rome, NY: August, 2008

Page 17: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Division of Labor

• Sprinkle
– Experiment setup, including software choices
– Sample experiments, including component choices
– Write-ups for experiments, intended as a resource for future users

• Eames
– Profiling and measurements
– Hardware choices, including optimization choices
– Experiment performance and write-up, based on the "future user" write-ups by Sprinkle
– Quantitative comparisons of single-, dual-, and multi-core experiments

Page 18: MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting

Questions