Software Engineering in Robotics: Systems Evaluation and Benchmarking
Henrik I. Christensen – [email protected]
Outline
- Introduction
- Design of a system model
- Definition of performance / metrics
- Systems evaluation
- Tools to assist in testing
- Summary
Introduction
- It is well known that systems verification is hard. What are ways through which we can ease the task?
- Structured testing is essential and not a post-hoc task:
  - Unit testing
  - System testing
  - Systems integration tests
  - Systems delivery tests
  - Acceptance / reference-based benchmarking
- Testing should be considered from the start:
  - Can you do early verification in simulation?
  - How can each module be tested independently?
  - Is it possible to generate reference data for verification?
Unit testing
- Testing of the smallest possible unit to verify function. In procedural programming this could be a function.
- Define a "contract" for units and embed code to ensure performance.
- Verification of code after re-factoring.
- Visual Studio has special classes/namespaces for unit testing.
Small Example

interface Adder {
    int add(int a, int b);
}

class AdderImpl : Adder {
    int Adder.add(int a, int b) {
        return a + b;
    }
}

public class TestAdder {
    public void testSum() {
        Adder adder = new AdderImpl();
        Assert(adder.add(1, 1) == 2);
        Assert(adder.add(1, 2) == 3);
        Assert(adder.add(2, 2) == 4);
    }
}
nUnit
- There are a number of unit testing tools available. jUnit is used for Java and nUnit is used for the .NET framework.
- Example based on .NET/C#:

using NUnit.Framework;

[TestFixture]
public class ExampleTestOfNUnit {
    [Test]
    public void TestMultiplication() {
        Assert.AreEqual(4, 2*2, "Multiplication");
        // Equivalently, since version 2.4 NUnit offers a new and
        // more intuitive assertion syntax based on constraint objects
        // [http://www.nunit.org/index.php?p=constraintModel&r=2.4.7]:
        Assert.That(2*2, Is.EqualTo(4), "Multiplication constraint-based");
    }
}
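The same pattern carries over to other languages and frameworks. As a sketch only, the Adder test above might look like this using Python's built-in unittest module (the Adder class here is a stand-in for the C# AdderImpl, not code from the slides):

```python
import unittest

class Adder:
    """Minimal stand-in for the AdderImpl class from the C# example."""
    def add(self, a, b):
        return a + b

class TestAdder(unittest.TestCase):
    def test_sum(self):
        adder = Adder()
        self.assertEqual(adder.add(1, 1), 2)
        self.assertEqual(adder.add(1, 2), 3)
        self.assertEqual(adder.add(2, 2), 4)

# Run the suite programmatically (equivalent to `python -m unittest`).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAdder)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The structure (fixture class, assertion per contract clause) is the same as in jUnit and nUnit; only the annotation syntax differs.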
Integration Testing
- Integration testing is about evaluation of interfaces, to make sure the integrated system has the desired functionality.
- Several different models:
  - Big-bang testing: put it together and see what happens
  - Use-case modelling:
    - Verify the different use cases for module interaction
    - What interactions / interfaces generate what actions?
    - Consider full coverage of module/class interaction states
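A use-case-driven integration test exercises two modules through the interface that connects them and checks the interaction the use case promises. A minimal sketch, assuming two hypothetical modules (Planner and Drive) and an invented use case; none of these names come from the slides:

```python
class Drive:
    """Hypothetical drive module; records the velocity commands it receives."""
    def __init__(self):
        self.commands = []

    def set_velocity(self, v):
        self.commands.append(v)

class Planner:
    """Hypothetical planner; talks to the drive only through its interface."""
    def __init__(self, drive):
        self.drive = drive

    def go_to(self, distance):
        # Trivial illustrative policy: drive forward, then stop.
        self.drive.set_velocity(0.5)
        self.drive.set_velocity(0.0)

# Use case under test: "the planner commands the drive and always ends with a stop".
drive = Drive()
planner = Planner(drive)
planner.go_to(1.0)
assert drive.commands, "planner must issue at least one command"
assert drive.commands[-1] == 0.0, "interaction must end with a stop command"
```

The point is the assertion on the *interaction* (the sequence of interface calls), not on either module in isolation.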
System Testing
- This is a widely studied problem: ensuring that systems to be delivered to a "customer" satisfy a broad set of requirements.
- IEEE has a standard, IEEE 829, that exclusively focuses on software testing. Initial version 1998, with a revision in 2008.
Tests to consider:
- GUI software testing
- Usability testing
- Performance testing
- Compatibility testing
- Error handling testing
- Load testing
- Volume testing
- Stress testing
- Security testing
- Scalability testing
- Sanity testing
- Smoke testing
- Exploratory testing
- Ad hoc testing
- Regression testing
- Reliability testing
- Installation testing
- Maintenance testing
- Recovery testing and failover testing
- Accessibility testing
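As one concrete instance from the list above, a performance test can be automated like any unit test: run the operation repeatedly and assert a latency budget. The function under test and the 1 ms budget below are illustrative assumptions, not from the slides:

```python
import time

def plan_step(state):
    """Illustrative stand-in for the operation under test."""
    return [s * 0.5 for s in state]

# Time N repetitions and compute the mean latency per call.
N = 1000
start = time.perf_counter()
for _ in range(N):
    plan_step([1.0, 2.0, 3.0])
elapsed = time.perf_counter() - start
mean_ms = 1000.0 * elapsed / N

# Illustrative budget: each call should average under 1 ms.
assert mean_ms < 1.0, f"mean latency {mean_ms:.3f} ms exceeds budget"
```

Load, volume, and stress tests follow the same shape with larger inputs or concurrent callers.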
In testing, think about engagement: What are the critical components? What are the support components? Focus resources correspondingly.
[www.implementingscrum.com]
Simulation
- Given the ease of moving between simulation and real systems, consider use of simulation for early testing.
- Use abstract interfaces to ease use of models:
  - Differential drive robots
  - Web cams
  - Range scanners
  - Odometry
  - …
- The risk is minimal, and it allows early verification at minimum cost.
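One way to keep simulation and real hardware interchangeable is an abstract sensor interface that both backends implement. A minimal sketch, with invented names (RangeScanner, SimulatedScanner) that do not correspond to any particular framework:

```python
from abc import ABC, abstractmethod
import math

class RangeScanner(ABC):
    """Abstract interface shared by simulated and real scanners."""
    @abstractmethod
    def scan(self):
        """Return a list of range readings in metres."""

class SimulatedScanner(RangeScanner):
    """Simulation backend: ranges to a flat wall 2 m ahead, scanned -30..+30 deg."""
    def scan(self):
        return [2.0 / max(math.cos(math.radians(a)), 1e-6)
                for a in range(-30, 31, 10)]

def nearest_obstacle(scanner: RangeScanner):
    """Application code depends only on the interface,
    so it runs unchanged against real hardware later."""
    return min(scanner.scan())

d = nearest_obstacle(SimulatedScanner())
```

Swapping in a hardware-backed RangeScanner subclass later requires no change to nearest_obstacle, which is what makes early testing in simulation cheap.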
Simulation systems
- RDS Simulation Engine w. PhysX
- USARSim w. Unreal Tournament
- ROS Gazebo, derived from the Player/Stage/Gazebo project
- GraspIt for grasp simulation
- There are also a number of commercial systems:
  - KUKASim
  - V-REP – Virtual Reality Experimentation …
Think about how you can design a system
- What is a good underlying model for your system?
- Can you provide a performance model?
- What are good parametric tests to verify performance?
System / Project Objectives
- Hypothesis formulation
- Construction of a system
- Verification of work
- Reporting
Research hypothesis
- Definition of a well-formed objective that can be subjected to testing using standard scientific methods (optimality, existence, …)
- Examples:
  - "Integration of behaviours using multi-objective decision making is Pareto optimal"
  - "Integration of multiple cues improves robustness"
Research Approach
Inspired by Marr (1981):
1) Formulation of theory for problem
   - Definition of mathematical basis
2) Formulation of algorithm for theory
   - Design of algorithm & data structures using standard methods (space/time efficiency)
3) Implementation of algorithm
   - Transfer to computational platform (data types, …)
Verification
- Benchmarking of systems: use of standard data sets or definition of reference data/scenarios
- Use of standard methods for hypothesis testing, e.g. the χ² test or similar
- Empirical testing using real-world data
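A χ² goodness-of-fit test can be run with the standard library alone. A sketch with made-up counts; the critical value is the standard tabulated value for 5 degrees of freedom at the 5% level:

```python
# Observed vs. expected counts -- illustration data, not from any experiment.
observed = [18, 22, 21, 19, 20, 20]
expected = [20.0] * 6

# Chi-square statistic: sum of (O - E)^2 / E over all bins.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for 5 degrees of freedom at the 5% significance level.
CRITICAL_5DOF_95 = 11.07

# If the statistic is below the critical value, the data are
# consistent with the hypothesised distribution at this level.
consistent = chi2 < CRITICAL_5DOF_95
```

In practice one would use a statistics library to get the p-value directly; the structure of the test is the same.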
Example: Estimation of structure
- Estimation of the size of a junction of an object
- Hypothesis: "The size of a junction can be estimated without any calibration of the camera and through use of qualitative control"
Observation
- Line length is unstable (end-points uncertain)
- Orientation is a line property, not a point property
- Use of orientation is preferable
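The observation can be checked numerically: perturb a line's endpoints with pixel-level noise and measure how the recovered orientation behaves. A sketch using only the standard library, with made-up noise levels:

```python
import math
import random

random.seed(0)

def line_stats(p0, p1):
    """Length and orientation of the segment p0 -> p1."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

# Ground truth: a horizontal 100-pixel line.
true_len, true_theta = line_stats((0.0, 0.0), (100.0, 0.0))

len_errs, theta_errs = [], []
for _ in range(1000):
    # Endpoint localisation noise of ~2 pixels (illustrative value).
    p0 = (random.gauss(0.0, 2.0), random.gauss(0.0, 2.0))
    p1 = (100.0 + random.gauss(0.0, 2.0), random.gauss(0.0, 2.0))
    length, theta = line_stats(p0, p1)
    len_errs.append(abs(length - true_len) / true_len)
    theta_errs.append(abs(theta - true_theta))

mean_len_err = sum(len_errs) / len(len_errs)
mean_theta_err = sum(theta_errs) / len(theta_errs)
```

For a long line, the orientation stays within a couple of degrees despite a few pixels of endpoint noise, and the estimate improves with line length, which is what makes orientation the preferable property.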
Implementation
- Camera system to look at objects
- Regular experimental C code with simple image processing; not quality software by any measure
Evaluation
- Experiments carried out on >100 test objects
- About 40,000 images processed
- Accuracy of estimation ~1 deg
- Hypothesis verified using theory and empirical tests
Benchmarking: 10 Major Objections
1. Evaluation is task dependent
2. The module is part of a 'system'
3. Vision/robotics is too complex
4. The models/assumptions used are wrong
5. Metrics are not comparable
6. Theory is not available for many well-known methods
7. There are too many parameters
8. Ground truth is expensive to obtain
9. Simulations cannot replace real experiments
10. Benchmarking is not acknowledged
Cultural differences
- Cowboy research: it 'works', why bother with a theoretical analysis?
- Puritan research: the proof is in the theory!
- Scientific research: validation across laboratories/researchers
Benchmarking Example
- The virtual manufacturing challenge: www.cma-competition.com
  - AGV navigation
  - Mixed palletizing
- Datasets:
  - Use of reference factory layout for navigation and control
  - Use of reference order data from bottling plants and distribution centers
- The challenge: can you beat the industry standards?
- Executed each year at ICRA, with competition from across the world
Consider use of reference data sets
- There are a number of reference datasets out there for comparative performance
- Navigation:
  - Radish: Robot Data Repository – http://radish.sourceforge.net/
  - University of Freiburg Repository – http://kaspar.informatik.uni-freiburg.de/~slamEvaluation/datasets.php
  - Amsterdam navigation dataset (annotated with ground truth) – http://www2.science.uva.nl/sites/cogniron
- Computer Vision:
  - Caltech 101 – Object Recognition
  - LabelMe – A dataset from MIT CSAIL
  - Indoor Scene Recognition Database – CSAIL
  - Scene Categorization Dataset – http://categorizingplaces.com/dataset.html
Summary
- Testing should be pervasive, not an afterthought
- Think about testing from units to systems
- What are good models for your system?
  - Use cases for interaction between modules
  - Well-defined module interfaces / performance metrics
- Can you characterize performance quantitatively?
- Consider using SW tools, from test support to simulators
- Consider use of gold-standard datasets
- Testing requires a serious amount of resources