Testing Dynamic Behavior in Executable Software Models - Making Cyber-Physical Systems Testable
TRANSCRIPT
Testing Dynamic Behavior in Executable Software Models
Making Cyber-Physical Systems Testable

Lionel Briand
July 18, ISSTA 2016
Acknowledgements
• Shiva Nejati
• Reza Matinnejad
• Raja Ben Abdessalem
2
Cyber-Physical Systems
• Increasingly complex and critical systems
• Complex environment
• Complex requirements, e.g., temporal, timing, resource usage
• Dynamic behavior
• Uncertainty, e.g., about the environment
• Testing is expensive and difficult, e.g., HW in the loop
3
[Diagram: the cyber space (information, networks) is coupled to the real space (object domain) through sensing and actuation]
Dynamic Behavior
• Common when dealing with physical entities
• Inputs and outputs are variables evolving over time (signals)
• Properties to be verified consider change over time, for individual outputs or sets of outputs
4
[Plot: a time-continuous, magnitude-continuous signal, i.e., a value evolving over time]
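In practice, such a signal is discretized for simulation. A minimal sketch; the horizon, step size, and example input below are illustrative, not taken from the talk:

```python
import numpy as np

# A test input "signal" is a function of time; for simulation it is
# discretized over the horizon at a fixed step (values are illustrative).
T, step = 2.0, 0.001                   # simulation horizon (s) and time step
t = np.arange(0.0, T, step)            # time axis
u = 0.5 * (1 + np.sin(2 * np.pi * t))  # an example magnitude-continuous input
```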
MiL Testing
[Pipeline diagram:
• Model-in-the-Loop (MiL) stage: Simulink modeling yields a generic functional model, exercised by MiL testing. Both the system and its environment are modeled.
• Software-in-the-Loop (SiL) stage: code generation and integration yield software running on the ECU, exercised by SiL testing.
• Hardware-in-the-Loop (HiL) stage: the software release is exercised by HiL testing. HiL testing is highly expensive and time consuming.]
5
Simulink Models - Simulation
• Simulation models
  • are heterogeneous, e.g., a time-continuous Simulink model, a hardware model, a network model
  • have continuous behavior
  • are used for algorithm design testing and for comparing design options
6
MiL Test Cases
7
[Diagram: each test case assigns input signals S1(t), S2(t), S3(t); model simulation produces the output signal(s). Test Case 1 and Test Case 2 differ in their input signals.]
MiL Testing Challenges
• Space of test input signals is extremely large.
• Model execution, especially when involving physical modeling, is extremely expensive.
• Oracles are not simple Boolean properties: they involve analyzing changes in value over time (e.g., signal patterns) and assessing levels of risk.
8
MiL Testing Challenges (2)
• Simulable model of the (physical) environment is required for test automation, but not always available.
• Effectiveness of test coverage strategies is questionable, e.g., model coverage.
• No equivalence classes on input signal domains, no combinatorial approaches.
9
We need novel, automated, and cost-effective MiL
testing strategies for CPS
10
Industrial Examples
11
Advanced Driver Assistance Systems (ADAS)
Decisions are made over time based on sensor data
12
[Diagram: sensors feed data to the software]
Pedestrian Detection System (PeVi)
13
• The PeVi system is a camera-based assistance system providing improved vision
Challenges
• Simulation/testing is performed using physics-based simulation environments
• Challenge 1: a large number of simulation scenarios
  • more than 2000 configuration variables
• Challenge 2: simulations are computationally expensive
[Diagram: a simulation scenario configures weather, road, sensors, humans, and vehicles]
14
Approach
15
[Process diagram:
(1) Development of requirements and domain models: specification documents (simulation environment and PeVi system) are turned into a domain model and a requirements model.
(2) Generation of test case specifications: static [ranges/values/resolution] and dynamic [ranges/resolution].]
16
- intensity: RealSceneLight
DynamicObject
1- weatherType: Condition
Weather
- fog- rain- snow- normal
«enumeration»Condition
Output Trajectory
- field of view: Real
Camera Sensor
RoadSide Object
- roadType: RTRoad
1 - curved- straight- ramped
«enumeration»RT
- vc: RealVehicle
- x0: Real
- y0: Real
- θ: Real- vp: Real
Pedestrian
- x: Real- y: Real
Position
1
*
1
*
11
- state: BooleanCollision
Parked Cars
Trees- simulationTime: Real- timeStep: Real
Test Scenario
PeVi
- state: BooleanDetection
11
11
11
11
«positioned»
«uses»1 1
Requirements Model
17
[Requirements diagram, traced to the domain model: an AWACar/Motor/Truck/Bus has a sensor and an Acute Warning Area (AWA) delimited by posx1, posx2, posy1, posy2; a human appears with a trajectory composed of a speed profile and a path of slots/path segments; a warning relates the sensor to the human in the AWA.]
The PeVi system shall detect any person located in the Acute Warning Area (AWA) of a vehicle.
Test Generation Overview
18
[Diagram: a multi-objective meta-heuristic search generates scenarios for the simulator + PeVi. Environment settings (roads, weather, vehicle type, etc.) are fixed during the search; the human simulator (initial position, speed, orientation) and the car simulator (speed) are manipulated by the search. Each simulation reports whether the pedestrian was detected and whether a collision occurred.]
Multi-Objective Search
• The search algorithm needs objective (fitness) functions for guidance
• In our case, several independent functions are of interest (heuristics):
  • Distance between the car and the pedestrian
  • Distance between the pedestrian and the AWA
  • Time to collision
[Sketch: the AWA as a region delimited by posx1, posx2, posy1, posy2 ahead of the vehicle]
19
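These three heuristics can be computed from the simulation trace. A minimal sketch; the rectangular AWA model and all function names are illustrative assumptions, not the talk's actual implementation:

```python
import math

def dist_car_pedestrian(car_xy, ped_xy):
    """Euclidean distance between car and pedestrian (to be minimized)."""
    return math.dist(car_xy, ped_xy)

def dist_pedestrian_awa(ped_xy, awa):
    """Distance from the pedestrian to the AWA, modelled here as an
    axis-aligned rectangle (posx1, posx2, posy1, posy2); 0 if inside."""
    x, y = ped_xy
    posx1, posx2, posy1, posy2 = awa
    dx = max(posx1 - x, 0.0, x - posx2)
    dy = max(posy1 - y, 0.0, y - posy2)
    return math.hypot(dx, dy)

def time_to_collision(rel_distance, closing_speed):
    """TTC heuristic: time until the car reaches the pedestrian at the
    current closing speed; infinite when they are not approaching."""
    return rel_distance / closing_speed if closing_speed > 0 else math.inf
```

During the search, each function would be minimized over the time steps of a simulation, so that scenarios close to a missed detection or a collision score best.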
Pareto Front
20
• Individual A Pareto-dominates individual B if A is at least as good as B in every objective and better than B in at least one objective.
[Plot: objective space (O1, O2) showing a Pareto front and the region dominated by a point x]
• A multi-objective optimization algorithm must:
  • Guide the search towards the globally Pareto-optimal front.
  • Maintain solution diversity along the Pareto-optimal front.
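The dominance relation itself is a one-liner. A minimal sketch, assuming all objectives are to be minimized and solutions are equal-length vectors of objective values:

```python
def dominates(a, b):
    """True if `a` Pareto-dominates `b`: at least as good everywhere
    (<=, objectives minimized) and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))
```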
MO Search with NSGA-II
21
• Based on a genetic algorithm
• N: archive and population size
• Non-dominated sorting: solutions are ranked according to how far they are from the Pareto front; fitness is based on rank.
• Crowding distance: individuals in the archive are spread more evenly across the front (forcing diversity).
• Each generation, parents and offspring (size 2N) are non-dominated sorted, and selection based on rank and crowding distance retains N solutions.
• Runs simulations for close to N new solutions per generation.
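A sketch of one NSGA-II generation; `fast_nondominated_sort` and `crowding_distance` are assumed helpers (the `dominates` function above can underpin the sort), and `variation` stands for selection, crossover, and mutation:

```python
def nsga2_generation(population, variation, n):
    """One NSGA-II generation (sketch): parents + offspring (size 2N)
    are non-dominated sorted, then N survivors are kept by rank,
    splitting the last front by crowding distance."""
    union = population + variation(population)   # size 2N
    survivors = []
    for front in fast_nondominated_sort(union):  # rank 0 is best
        if len(survivors) + len(front) <= n:
            survivors.extend(front)              # whole front fits
        else:                                    # split the last front
            dist = {id(s): crowding_distance(s, front) for s in front}
            front.sort(key=lambda s: dist[id(s)], reverse=True)
            survivors.extend(front[: n - len(survivors)])
            break
    return survivors
```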
Pareto Front Results
22
[Plot: the resulting Pareto front in the time-to-collision (TTC) vs. distance(pedestrian, AWA) objective space]
23
Simulation Scenario Execution
• https://sites.google.com/site/testingpevi/
24
Improving Time Performance
• Individual simulations take on average more than 1 min
• It takes 10 hours to run our search-based test generation (≈ 500 simulations)
→ We use surrogate modeling to improve the search
• Goal: predict fitness based on the dynamic variables
• Neural networks
25
Multi-Objective Search with Surrogate Models
26
[Same selection scheme as before: non-dominated sorting, then selection based on rank and crowding distance, from size 2N down to size N]
• Original algorithm: runs simulations for all new solutions
• New algorithm: uses prediction values and prediction errors to run simulations only for the solutions that might be selected
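A sketch of that filtering step, assuming fitness is minimized and the surrogate (e.g., the neural network) returns a prediction together with an error estimate; `surrogate.predict`, `simulate`, and the threshold are illustrative names:

```python
def evaluate_with_surrogate(candidates, surrogate, simulate, threshold):
    """Simulate only solutions that might survive selection (sketch).
    `surrogate.predict(c)` is assumed to return a predicted fitness
    and an error bound for candidate c; fitness is minimized."""
    results = []
    for c in candidates:
        pred, err = surrogate.predict(c)
        if pred - err > threshold:            # even optimistically too poor:
            results.append((c, pred))         # keep the cheap prediction
        else:                                 # might be selected:
            results.append((c, simulate(c)))  # pay for a real simulation
    return results
```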
Results – Surrogate Modeling
27
[Plots of hypervolume (HV) vs. time in minutes, over 10-150 min: (a) mean HV of NSGAII vs. NSGAII-SM; (b) mean HV of random search (RS) vs. NSGAII-SM; (c) HV of the worst runs of NSGAII, NSGAII-SM, and RS]
Results – Worst Runs
28
[Same HV-vs-time figure as above; panel (c) compares the worst runs of NSGAII, NSGAII-SM, and RS]
Results – Random Search
29
[Same HV-vs-time figure as above; panel (b) compares random search (RS) against NSGAII-SM]
Conclusion
• A general testing approach for ADAS, and for many CP systems
• Formulated the generation of critical test cases as a multi-objective search problem using the NSGA-II algorithm
• Improved search performance with surrogate models based on neural networks
• Generated critical scenarios: no detection in the AWA, collision and no detection
• No clear-cut oracle: a failure to detect may be deemed an acceptable risk
30
Dynamic Continuous Controllers
31
• Supercharger bypass flap controller
  ✓ Flap position is bounded within [0..1]
  ✓ Implemented in MATLAB/Simulink
  ✓ 34 (sub-)blocks decomposed into 6 abstraction levels
  ✓ Simulation time T = 2 seconds
[Pictures: the supercharger bypass flap; flap position = 0 (open), flap position = 1 (closed)]
Simple Example
32
[Plots: the test input is a step in the desired value, from an initial desired value to a final desired value at T/2; the test output is the actual value tracking it over [0, T].]
[Block diagram: the controller (SUT) receives the error between the desired value and the actual value and drives the plant model; the system output feeds back as the actual value.]
MiL Testing of Controllers
33
Configurable Controllers at MiL
[Block diagram: a PID controller in closed loop with the plant model. The error e(t) = desired(t) - actual(t) feeds three configurable terms whose sum is the output:]

output(t) = K_P e(t) + K_I ∫ e(t) dt + K_D de(t)/dt

• Time-dependent variables: desired(t), actual(t), e(t), output(t)
• Configuration parameters: K_P, K_I, K_D
34
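A discretized step of this control law, as a minimal sketch (the gains, step size, and state dictionary are illustrative):

```python
def pid_step(desired, actual, state, kp, ki, kd, dt):
    """One discrete step of output = Kp*e + Ki*integral(e) + Kd*de/dt.
    `state` carries the running integral and the previous error."""
    error = desired - actual
    state["integral"] += error * dt
    derivative = (error - state["prev_error"]) / dt
    state["prev_error"] = error
    return kp * error + ki * state["integral"] + kd * derivative
```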
Requirements and Test Objectives
[Plot: the desired value (input) steps from Initial Desired (ID) to Final Desired (FD) at T/2; the actual value (output) must track it. The annotated test objectives are smoothness, responsiveness, and stability.]
35
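Each objective can be turned into a measurable function of the output signal. A minimal sketch over the post-step portion of the simulated output; the tolerance and the particular formulas are illustrative assumptions:

```python
import numpy as np

def smoothness(actual, desired_final):
    """Largest over/undershoot: maximum deviation of the post-step
    output from the final desired value (illustrative)."""
    return float(np.max(np.abs(actual - desired_final)))

def responsiveness(t, actual, desired_final, tol=0.02):
    """Response time: first instant after which the output stays
    within `tol` of the final desired value (illustrative)."""
    inside = np.abs(actual - desired_final) <= tol
    for i in range(len(t)):
        if inside[i:].all():
            return float(t[i])
    return float("inf")

def stability(actual):
    """Residual oscillation: total variation of the output over the
    tail of the simulation (illustrative)."""
    tail = actual[len(actual) // 2:]
    return float(np.sum(np.abs(np.diff(tail))))
```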
A Search-Based Test Approach
[Heatmap over the (Initial Desired, Final Desired) input space: where are the worst cases?]
• Continuous behavior
• The controller's behavior can be complex
• Meta-heuristic search in the (large) input space: finding worst-case inputs
• Possible because of an automated oracle (the feedback loop)
• Different worst cases for different requirements
• Worst cases may or may not violate requirements
36
Initial Solution
[Process diagram: (1) exploration of the input space produces a HeatMap diagram over Initial Desired x Final Desired (each axis in [0, 1]); a domain expert selects a list of critical regions; (2) a single-state search within those regions, guided by objective functions based on the requirements and driven by the controller-plant model, returns worst-case scenarios, shown as desired vs. actual value over time.]
37
Results
• We found much worse scenarios during MiL testing than our partner had found so far
• These scenarios are also run at the HiL level, where testing is much more expensive: MiL results -> test selection for HiL
• But further research was needed:
• Simulations are expensive
• Configuration parameters
38
Final Solution
[Process diagram: (1) exploration with dimensionality reduction produces a regression tree; a domain expert selects a list of critical partitions; (2) a search with surrogate modeling, guided by objective functions and the controller model (Simulink), returns worst-case scenarios.]
• Visualization of the 8-dimensional space using regression trees
• Dimensionality reduction to identify the significant variables (elementary effect analysis)
• Surrogate modeling to predict the objective function and speed up the search (machine learning)
39
Regression Tree
All points: count 1000, mean 0.007822, std dev 0.0049497
├─ FD < 0.43306: count 574, mean 0.0059513, std dev 0.0040003
│  ├─ ID < 0.64679: count 373, mean 0.0047594, std dev 0.0034346
│  └─ ID >= 0.64679: count 201, mean 0.0081631, std dev 0.0040422
│     ├─ Cal5 < 0.014827: count 131, mean 0.0068185, std dev 0.0023515
│     └─ Cal5 >= 0.014827: count 70, mean 0.0106795, std dev 0.0052045
└─ FD >= 0.43306: count 426, mean 0.0103425, std dev 0.0049919
   ├─ Cal5 < 0.020847: count 244, mean 0.0080206, std dev 0.0031751
   └─ Cal5 >= 0.020847: count 182, mean 0.0134555, std dev 0.0052883
40
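Such a tree can be fitted with any off-the-shelf regression-tree learner. A sketch with scikit-learn; the random data stands in for the sampled (input point, objective value) pairs, and the feature names mirror the slide:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.random((1000, 8))  # stand-in for sampled points of the 8-D input space
y = rng.random(1000)       # stand-in for the simulated objective values

features = ["ID", "FD", "Cal1", "Cal2", "Cal3", "Cal4", "Cal5", "Cal6"]
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=50).fit(X, y)
print(export_text(tree, feature_names=features))  # partitions ~ critical regions
```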
Surrogate Modeling
• Any supervised learning or statistical technique providing fitness predictions with confidence intervals
1. Higher fitness predicted with high confidence: move to the new position, no simulation
2. Lower fitness predicted with high confidence: do not move to the new position, no simulation
3. Low confidence in the prediction: simulate
[Plot: the surrogate model approximating the real fitness function over the input x]
41
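The three rules map directly onto one step of the single-state search. A minimal sketch, assuming fitness is maximized and that `surrogate.predict` returns a mean and a confidence-interval half-width (all names illustrative):

```python
def surrogate_move(current, current_fit, candidate, surrogate, simulate):
    """Decide whether to move to `candidate` using the three rules."""
    mean, ci = surrogate.predict(candidate)
    if mean - ci > current_fit:   # 1. confidently better: move, no simulation
        return candidate, mean
    if mean + ci < current_fit:   # 2. confidently worse: stay, no simulation
        return current, current_fit
    real = simulate(candidate)    # 3. uncertain: run the real simulation
    return (candidate, real) if real > current_fit else (current, current_fit)
```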
Results
✓ Our approach identifies critical violations of the controller requirements that were found neither by our earlier work nor by manual testing.

                   MiL testing,              MiL testing,           Manual
                   different configurations  fixed configurations   MiL testing
Stability          2.2% deviation            -                      -
Smoothness         24% over/undershoot       20% over/undershoot    5% over/undershoot
Responsiveness     170 ms response time      80 ms response time    50 ms response time
42
Open Loop Controllers
[Diagram: an on/off control signal, CtrlSig]
• Mixed discrete-continuous behavior: Simulink Stateflows
• No plant model: much quicker simulation time
• No feedback loop -> no automated oracle
• The main testing cost is the manual analysis of output signals
• Goal: minimize test suites
• Challenge: test selection
• An entirely different approach to testing
[Excerpt from the accompanying paper, shown on the slide:]
... respectively. In addition, we adapt the whitebox coverage and the blackbox output diversity selection criteria to Stateflows, and evaluate their fault revealing power for continuous behaviours. Coverage criteria are prevalent in software testing and have been considered in many studies of test suite effectiveness in different application domains [?]. In our work, we consider state and transition coverage criteria [?] for Stateflows. Our output diversity criterion is based on the recent output uniqueness criterion [?] that has been studied for web applications and has been shown to be a useful surrogate for whitebox selection techniques. We consider this criterion in our work because Stateflows have complex internal structures consisting of differential equations, making them less amenable to whitebox techniques, while they have rich time-continuous outputs.

In this paper, we make the following contributions:
• We focus on the problem of testing Stateflows with mixed discrete-continuous behaviours. We propose two new test case selection criteria, output stability and output continuity, with the goal of selecting test inputs that are likely to produce continuous outputs exhibiting instability and discontinuity failures, respectively.
• We adapt the whitebox coverage and the blackbox output diversity selection criteria to Stateflows, and evaluate their fault revealing power for continuous behaviours. The former is defined based on traditional state and transition coverage for state machines, and the latter is defined based on the recent output uniqueness criterion [?].
• We evaluate the effectiveness of our newly proposed and the adapted selection criteria by applying them to three Stateflow case study models: two industrial and one public domain. Our results show that RESULT.

Organization of the paper.

2. BACKGROUND AND MOTIVATION
Motivating example. We motivate our work using a simplified Stateflow from the automotive domain which controls a supercharger clutch and is referred to as the Supercharger Clutch Controller (SCC). Figure 1(a) represents the discrete behaviour of SCC, specifying that the supercharger clutch can be in two quiescent states [?]: engaged or disengaged. The clutch moves from the disengaged to the engaged state whenever both the engine speed engspd and the engine coolant temperature tmp fall inside the specified ranges [smin..smax] and [tmin..tmax], respectively. The clutch moves back from the engaged to the disengaged state whenever either the speed or the temperature falls outside its respective range. The variable ctrlSig in Figure 1(a) indicates the sign and magnitude of the voltage applied to the DC motor of the clutch to physically move the clutch between the engaged and disengaged positions. Assigning 1.0 to ctrlSig moves the clutch to the engaged position, and assigning -1.0 to ctrlSig moves it back to the disengaged position. To avoid clutter in our figures, we use engageReq to refer to the condition on the Disengaged -> Engaged transition, and disengageReq to refer to the condition on the Engaged -> Disengaged transition.

The discrete transition system in Figure 1(a) assumes that the clutch movement takes no time and, further, provides no insight into the quality of the movement of the clutch. Figure 1(b) extends it by adding a timer variable, time, to make the passage of time in the SCC behaviour explicit. The new transition system includes two transient states [?], engaging and disengaging, specifying that moving from the engaged to the disengaged state and vice versa takes six milliseconds. Since this model is simplified, it does not show the handling of alterations of the clutch state during the transient states. In addition, we note that the variable ctrlSig, which controls the physical movement of the clutch, cannot abruptly jump from 1.0 to -1.0, or vice versa. To ensure safe and smooth movement of the clutch, ctrlSig has to move gradually between 1.0 and -1.0 and be described as a function over time, i.e., a signal. To express the evolution of the ctrlSig signal over time, we decompose the transient states engaging and disengaging into sub-state machines. Figure 1(c) shows the sub-state machine for the engaging state; the one for the disengaging state is similar. At the beginning (in state OnMoving), ctrlSig follows a steep ramp (function f) to move the stationary clutch from the disengaged position and accelerate it to a certain speed in about two milliseconds. Afterwards (in state OnSlipping), ctrlSig reduces the speed of the clutch following the gradual function g until about four milliseconds, ensuring that the clutch slows down as it gets closer to the crank shaft of the car. Finally, in state OnCompleted, ctrlSig reaches the value 1.0 and remains constant, causing the clutch to become engaged in about one millisecond. When the car is stationary, i.e., vehspd is 0, the clutch moves following the steep ramp f for three milliseconds and skips the OnSlipping phase before reaching the crank shaft in state OnCompleted.

[Figure 1: Supercharger Clutch Controller (SCC) Stateflow.
(a) Discrete behaviour: Disengaged -> Engaged on [(engspd > smin ∧ engspd < smax) ∧ (tmp > tmin ∧ tmp < tmax)] / ctrlSig := 1; Engaged -> Disengaged on [¬(engspd > smin ∧ engspd < smax) ∨ ¬(tmp > tmin ∧ tmp < tmax)] / ctrlSig := -1.
(b) Timed behaviour: transient states Engaging and Disengaging, entered on [engageReq] / time := 0 and [disengageReq] / time := 0 respectively, each incrementing time and exiting on [time > 5].
(c) Engaging state, mixed discrete-continuous behaviour: OnMoving (time++; ctrlSig := f(time)) -> OnSlipping (time++; ctrlSig := g(time)) on [¬(vehspd = 0) ∧ time > 2]; OnMoving -> OnCompleted on [(vehspd = 0) ∧ time > 3]; OnSlipping -> OnCompleted (time++; ctrlSig := 1.0) on [time > 4].]

Input and output. The Stateflow inputs and outputs are signals (functions over time). Each input/output signal has a data type, e.g., boolean, enum, or float, specifying the range of the signal. For example, Figure 2 shows an example input (dashed line) and output (solid line) signal for SCC. The input signal is related to engageReq and is boolean, while the output signal is related to ...
43
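The engaging sub-machine's output can be read directly off Figure 1(c). A minimal sketch, with f and g standing for the (unspecified) steep and gradual ramp functions:

```python
def engaging_ctrl_sig(time_ms, vehspd, f, g):
    """ctrlSig in the Engaging state, per Figure 1(c) (sketch).
    Guards follow the figure; f and g are assumed ramp functions."""
    if vehspd == 0:                  # stationary car: f until time > 3,
        return f(time_ms) if time_ms <= 3 else 1.0   # then OnCompleted
    if time_ms <= 2:
        return f(time_ms)            # OnMoving: steep ramp
    if time_ms <= 4:
        return g(time_ms)            # OnSlipping: gradual ramp
    return 1.0                       # OnCompleted: hold 1.0
```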
Selection Strategies Based on Search
• Input signal diversity
• White-box structural coverage
  • State coverage
  • Transition coverage
• Output signal diversity
• Failure-based selection criteria
  • Domain-specific failure patterns
  • Output stability
  • Output continuity
44
Output Diversity - Vector-Based
45
[Plot: output signal 1 and output signal 2 over time; vector-based diversity measures the distance between the signals]
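A minimal sketch of the vector-based measure: signals sampled at the same time points are compared as vectors, and selection would then maximize pairwise distances (the normalization choice is an assumption):

```python
import numpy as np

def vector_diversity(sig_a, sig_b):
    """Normalized Euclidean distance between two equally sampled
    output signals; larger means more diverse outputs (sketch)."""
    a, b = np.asarray(sig_a, float), np.asarray(sig_b, float)
    return float(np.linalg.norm(a - b) / np.sqrt(len(a)))
```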
Output Diversity - Feature-Based
46
[Taxonomy of signal features:
• value features: instant-value (v), constant (n), constant-value (n, v), increasing (n), decreasing (n)
• derivative features: sign-derivative (s, n), extreme-derivatives
• second-derivative features: discontinuity, 1-sided discontinuity, discontinuity with strict local optimum, 1-sided continuity with strict local optimum
Example signals A, B, C illustrate increasing segments and discontinuities.]
Failure-based Test Generation
47
• Search: maximizing the likelihood of the presence of specific failure patterns in output signals
• Domain-specific failure patterns elicited from engineers
[Plots of the CtrlSig output over time (0 to 2 s): an instability pattern, oscillating rapidly between -1.0 and 1.0, and a discontinuity pattern, a sudden jump in an otherwise continuous signal]
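Both patterns can be scored with simple signal-level fitness functions that the search then maximizes. A minimal sketch; the exact formulas used in the work may differ:

```python
import numpy as np

def instability_fitness(output):
    """Total up-and-down movement of the output signal; high values
    indicate oscillation, i.e., candidate instability (sketch)."""
    return float(np.sum(np.abs(np.diff(output))))

def discontinuity_fitness(output, dt):
    """Steepest one-step change; high values indicate a jump rather
    than a continuous evolution of the output (sketch)."""
    return float(np.max(np.abs(np.diff(output))) / dt)
```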
Results
• The test cases resulting from state/transition coverage algorithms cover the faulty parts of the models
• However, they fail to generate output signals that are sufficiently distinct from expectations, hence yielding a low fault revealing rate
• Output-based algorithms are much more effective
• Existing commercial tools: not effective at finding faults, not applicable to entire Simulink models
48
Reflecting
49
Commonalities
• Large input spaces
  • Combinatorial approaches not applicable
  • Coverage?
• Expensive testing: test execution time, oracle analysis effort
• Complex oracles (dynamic behavior)
• Testing is driven by risk
• Search-based solutions: highest-risk scenarios
50
Differences
• Model execution time, e.g., with or without a plant model
• Fitness function: exact or heuristic
• Single or multiple objectives
• Automated oracle or not
• Other techniques involved to achieve scalability: regression trees, neural networks, sensitivity analysis, ...
51
Related Work
52
Constraint Solving
• Test data generation via constraint solving is not feasible in the presence of:
  • Continuous mathematical models, e.g., differential equations
  • Library functions in binary code
  • Complex operations
• Constraints capturing (discretized) dynamic properties tend not to be scalable
53
Search-Based Testing
• Largely focused on unit or function testing, where the goal is to maximize model coverage, check temporal properties (state transitions) …
• To address CPS, we need more work on system-level testing, targeting dynamic properties in complex input spaces capturing the behavior of physical entities.
54
Future: A more general methodological and automation framework, targeting more complex and heterogeneous CPS models
55
Future Work
• Shifting the bulk of testing from implemented systems to models of such systems and their environments requires:
  • Heterogeneous modeling and co-simulation
  • Modeling dynamic properties and risk
  • Uncertainty modeling enabling probabilistic test oracles
  • Executable models at a level of precision appropriate for testing
• Use the results to make the best of the available time and resources for testing the implemented system with hardware in the loop
• Focus on high-risk test scenarios within budget and time constraints
56
References
• R. Ben Abdessalem et al., "Testing Advanced Driver Assistance Systems Using Multi-Objective Search and Neural Networks", ASE 2016
• R. Matinnejad et al., "Automated Test Suite Generation for Time-continuous Simulink Models", ICSE 2016
• R. Matinnejad et al., "Effective Test Suites for Mixed Discrete-Continuous Stateflow Controllers", ESEC/FSE 2015 (Distinguished Paper Award)
• R. Matinnejad et al., "MiL Testing of Highly Configurable Continuous Controllers: Scalable Search Using Surrogate Models", ASE 2014 (Distinguished Paper Award)
• R. Matinnejad et al., "Search-Based Automated Testing of Continuous Controllers: Framework, Tool Support, and Case Studies", Information and Software Technology, Elsevier (2014)
57
Testing Dynamic Behavior in Executable Software Models
Lionel Briand
** WE HIRE! **
July 18, ISSTA 2016