keynote sbst 2014 - search-based testing
Search-Based Software Testing in Industry: Research Collaborations and Lessons Learned
Lionel Briand
Interdisciplinary Centre for ICT Security, Reliability, and Trust (SnT)
University of Luxembourg, Luxembourg
SBST, Hyderabad, 2014
SnT Software Verification and Validation Lab
• SnT centre, Est. 2009: Interdisciplinary, ICT security-reliability-trust
• 200 scientists and Ph.D. candidates, 20 industry partners
• SVV Lab: Established January 2012, www.svv.lu
• 25 scientists (Research scientists, associates, and PhD candidates)
• Industry-relevant research on system dependability: security, safety, reliability
• Six partners: Cetrel, CTIE, Delphi, SES, IEE, Hitec …
• And we are always hiring!
An Effective, Collaborative Model of Research and Innovation
[Figure: cycle linking Basic Research, Applied Research, and Innovation & Development.]
• Basic and applied research take place in a rich context
• Basic Research is also driven by problems raised by applied research, which is itself fed by innovation and development
• Publishable research results and focused practical solutions that serve an existing market.
Schneiderman, 2013
Collaboration in Practice
• Well-defined problems in context
• Realistic evaluation
• Long-term industrial collaborations
[Figure: collaboration cycle between industry partners and research groups, in eight numbered steps. Step labels: Problem Identification, Problem Formulation, State of the Art Review, Candidate Solution(s), Initial Validation, Training, Realistic Validation, Solution Release.]
Outline
• Four projects:
– Testing PID controllers in the automotive industry (Delphi)
– Robustness testing of a video conference system (Cisco)
– Environment-based testing of a seismic acquisition system (WesternGeco)
– Schedulability analysis and stress testing of safety-critical drivers in the oil & gas industry (Kongsberg)
• Lessons learned, patterns, discussions
• Meant to be an interactive talk – I am also here to learn
Acknowledgements
PhD students:
• Marwa Shousha
• Shaukat Ali
• Zohaib Iqbal
• Hadi Hemmati
• Reza Matinnejad
• Stefano Di Alesio
Research associates/scientists, former colleagues:
• Shiva Nejati
• Andrea Arcuri
• Arnaud Gotlieb
• Yvan Labiche
Testing PID Controllers (Delphi)
References:
• R. Matinnejad, S. Nejati, L. Briand, T. Bruckmann, “MiL Testing of Highly Configurable Continuous Controllers: Scalable Search Using Surrogate Models”, Submitted (2014)
• R. Matinnejad, S. Nejati, L. Briand, T. Bruckmann, C. Poull, “Search-Based Automated Testing of Continuous Controllers: Framework, Tool Support, and Case Studies”, forthcoming in Information and Software Technology (2014)
• R. Matinnejad, S. Nejati, L. Briand, T. Bruckmann, C. Poull, “Automated Model-in-the-Loop Testing of Continuous Controllers using Search”, in 5th Symposium on Search-Based Software Engineering (SSBSE 2013), Springer Lecture Notes in Computer Science (2013, August)
Dynamic continuous controllers are present in many embedded systems
Development Process
[Figure: development process. Stages: Model-in-the-Loop, Software-in-the-Loop, Hardware-in-the-Loop. Activities and artifacts shown: Simulink modeling, generic functional model, MiL testing, code generation and integration, software running on ECU, SiL testing, HiL testing, software release.]
Controllers at MIL
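[Figure: closed-loop block diagram. The error e(t) = desired(t) - actual(t) feeds the P, I, and D terms, whose sum is the controller output driving the plant model.]
$$\mathrm{output}(t) = K_P\, e(t) + K_I \int e(t)\, dt + K_D \frac{de(t)}{dt}$$
Inputs: Time-dependent variables
Configuration Parameters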
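To make the P, I, and D terms concrete, here is a minimal discrete-time sketch of a PID loop. It is only an illustration: the first-order plant (`plant_gain`, `plant_tau`), the gain values, and the step input are assumptions chosen for the example, not the Delphi controller or its Simulink plant model.

```python
def simulate_pid(kp, ki, kd, desired, plant_gain=1.0, plant_tau=0.5, dt=0.01):
    """Simulate a PID controller driving a hypothetical first-order plant.

    desired is a list of set-point values, one per time step of length dt.
    Returns the list of actual (plant output) values.
    """
    actual = 0.0
    integral = 0.0
    prev_error = desired[0] - actual  # avoids a derivative spike at t = 0
    trace = []
    for target in desired:
        error = target - actual                  # e(t) = desired(t) - actual(t)
        integral += error * dt                   # running integral of e(t)
        derivative = (error - prev_error) / dt   # finite-difference de(t)/dt
        output = kp * error + ki * integral + kd * derivative
        # Assumed plant: tau * d(actual)/dt = -actual + gain * output (Euler step)
        actual += dt * (-actual + plant_gain * output) / plant_tau
        prev_error = error
        trace.append(actual)
    return trace


# Step input: the desired value changes halfway through the simulation.
steps = 2000
desired = [0.2] * (steps // 2) + [0.8] * (steps // 2)
trace = simulate_pid(kp=2.0, ki=4.0, kd=0.05, desired=desired)
print(round(trace[steps // 2 - 1], 3), round(trace[-1], 3))
```

The gains `kp`, `ki`, `kd` play the role of configuration parameters, while `desired` is the time-dependent input.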
Inputs, Outputs, Test Objectives
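[Figure: desired value (input) and actual value (output) plotted over time. The desired value steps from Initial Desired (ID) to Final Desired (FD); the time axis is marked at T/2 and T. Test objectives annotated on the plot: Smoothness, Responsiveness, Stability.]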
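The slide names the three test objectives without defining them. The sketch below shows one plausible way to turn them into objective functions over a simulated output trace. The definitions here (time to reach a tolerance band, largest deviation after reaching it, spread of the tail) and the tolerance `tol` are assumptions made for illustration, not the definitions used in the Delphi study.

```python
import statistics


def objectives(trace, final_desired, dt, tol=0.05):
    """Compute illustrative objective values from the second half of a trace.

    trace: actual values, one per time step; the set point is assumed to
    change to final_desired at the midpoint of the trace.
    """
    after_step = trace[len(trace) // 2:]
    # Index of the first sample within tol of the final desired value.
    reached = next(
        (i for i, v in enumerate(after_step) if abs(v - final_desired) <= tol),
        None,
    )
    if reached is None:  # never gets close: worst possible values
        inf = float("inf")
        return {"responsiveness": inf, "smoothness": inf, "stability": inf}
    # Responsiveness: time needed to get close to the final desired value.
    responsiveness = reached * dt
    # Smoothness: largest deviation (over/undershoot) once close.
    smoothness = max(abs(v - final_desired) for v in after_step[reached:])
    # Stability: spread of the output over the last part of the simulation.
    tail = after_step[len(after_step) // 2:]
    stability = statistics.pstdev(tail)
    return {
        "responsiveness": responsiveness,
        "smoothness": smoothness,
        "stability": stability,
    }


# Hand-made trace: step at the midpoint, one overshoot to 0.9, then settled.
example = [0.2] * 10 + [0.5, 0.78, 0.9] + [0.8] * 7
result = objectives(example, final_desired=0.8, dt=0.1)
print(result)
```

A search would then look for inputs (ID, FD) that maximize one of these values, i.e., that make the controller look as bad as possible.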
Process and Technology
[Figure: Continuous Controller Tester. Inputs: controller-plant model and objective functions based on requirements. Step 1, Exploration: produces HeatMap diagrams, e.g., (a) Liveness, (b) Smoothness, from which the domain expert selects a list of critical regions. Step 2, Single-State Search: produces worst-case scenarios.]
Testing in the Configuration Space
• MIL testing for all feasible configurations
• The search space is much larger
• The search is much slower (Simulations of Simulink models are expensive)
• Not all configuration parameters matter for all objective functions
• Results are harder to visualize
Modified Process and Technology
[Figure: modified process. Inputs: controller model (Simulink) and objective functions. Step 1, Exploration with Dimensionality Reduction: produces a regression tree, from which the domain expert derives a list of critical partitions. Step 2, Search with Surrogate Modeling: produces worst-case scenarios.]
• Visualization of the 8-dimension space using regression trees
• Dimensionality reduction to identify the significant variables
• Surrogate modeling to predict the objective function and speed up the search
Dimensionality Reduction
• Sensitivity Analysis: Elementary Effect Analysis (EEA)
• Identify non-influential inputs in computationally costly mathematical models
• Requires fewer data points than other techniques
• Observations are simulations generated during the Exploration step
• Compute sample mean and standard deviation for each dimension of the distribution of elementary effects
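As a rough illustration of the last bullet, the sketch below computes elementary effects with a simple one-at-a-time design and reports their sample mean and standard deviation per input. The sampling scheme, the step `delta`, and `toy_objective` are assumptions for the example. In the study the observations come from Simulink simulations, not from a closed-form function.

```python
import random
import statistics


def elementary_effects(f, dim, n_base=20, delta=0.1, seed=0):
    """Return (sample mean, sample std dev) of elementary effects per input.

    For each random base point x in [0, 1 - delta]^dim and each input i,
    the elementary effect is (f(x + delta * e_i) - f(x)) / delta.
    """
    rng = random.Random(seed)
    effects = [[] for _ in range(dim)]
    for _ in range(n_base):
        x = [rng.uniform(0.0, 1.0 - delta) for _ in range(dim)]
        fx = f(x)
        for i in range(dim):
            x_step = list(x)
            x_step[i] += delta  # perturb one input at a time
            effects[i].append((f(x_step) - fx) / delta)
    return [(statistics.mean(e), statistics.stdev(e)) for e in effects]


def toy_objective(x):
    # Hypothetical stand-in for an expensive simulation:
    # x[0] has a linear effect, x[1] a non-linear one, x[2] no effect.
    return 2.0 * x[0] + x[1] ** 2 + 0.0 * x[2]


summary = elementary_effects(toy_objective, dim=3)
for i, (mean, std) in enumerate(summary):
    print(f"input {i}: mean={mean:.3f} std={std:.3f}")
```

Inputs whose mean and standard deviation are both close to zero (here `x[2]`) are candidates for removal from the search space. A large standard deviation suggests a non-linear effect or interactions with other inputs.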
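[Figure: scatter plot of elementary effects per input, with sample mean (×10^-2, from -0.6 to 0.2) on the horizontal axis and sample standard deviation (×10^-2, from 0.0 to 0.6) on the vertical axis. Points are labeled ID, FD, and Cal1 to Cal6.]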
Visualization in Inputs & Configuration Space
[Figure: regression tree. Each node reports Count, Mean, and Std Dev of the objective function. Split conditions shown: FD>=0.43306 / FD<0.43306; ID>=0.64679 / ID<0.64679; Cal5>=0.020847 / Cal5>0.020847 [sic]; Cal5>=0.014827 / Cal5<0.014827.]
Node statistics as listed in the figure (Count, Mean, Std Dev):
All Points: 1000, 0.007822, 0.0049497
574, 0.0059513, 0.0040003
426, 0.0103425, 0.0049919
373, 0.0047594, 0.0034346
201, 0.0081631, 0.0040422
182, 0.0134555, 0.0052883
244, 0.0080206, 0.0031751
70, 0.0106795, 0.0052045
131, 0.0068185, 0.0023515
Surrogate Modeling
• Any supervised learning or statistical technique providing fitness predictions with confidence intervals
1. Predict higher fitness with high confidence: Move to new position, no simulation
2. Predict lower fitness with high confidence: Do not move to new position, no simulation
3. Low confidence in prediction: Simulation
[Figure: surrogate model versus real function, plotted as fitness against x.]
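The three cases above can be sketched as a decision rule inside a simple hill climber. Everything below is a toy under stated assumptions: `simulate` stands in for an expensive simulation, `predict` is a hypothetical surrogate that returns a confidence interval of fixed width, and the search maximizes fitness over a single variable in [0, 1].

```python
import random


def decide(interval, current_fitness):
    """Map a predicted fitness interval (low, high) to one of three actions."""
    low, high = interval
    if low > current_fitness:
        return "move"      # case 1: confidently better, no simulation
    if high < current_fitness:
        return "stay"      # case 2: confidently worse, no simulation
    return "simulate"      # case 3: prediction too uncertain


def surrogate_assisted_climb(simulate, predict, start, steps=200, sigma=0.05, seed=0):
    """Hill climbing that only simulates when the surrogate is not confident."""
    rng = random.Random(seed)
    x, fx = start, simulate(start)
    n_sim = 1
    for _ in range(steps):
        cand = min(1.0, max(0.0, x + rng.gauss(0.0, sigma)))
        interval = predict(cand)
        action = decide(interval, fx)
        if action == "move":
            # Trust the surrogate: use the interval midpoint as the fitness.
            x, fx = cand, (interval[0] + interval[1]) / 2.0
        elif action == "simulate":
            f_cand = simulate(cand)
            n_sim += 1
            if f_cand > fx:
                x, fx = cand, f_cand
    return x, fx, n_sim


def simulate(x):
    # Hypothetical expensive fitness function with its maximum at x = 0.7.
    return -(x - 0.7) ** 2


def predict(x):
    # Hypothetical surrogate: exact prediction with a +/- 0.01 interval.
    return (simulate(x) - 0.01, simulate(x) + 0.01)


best_x, best_f, n_sim = surrogate_assisted_climb(simulate, predict, start=0.1)
print(best_x, best_f, n_sim)
```

The saving comes from `n_sim` staying below the number of candidate points visited. A real surrogate would be trained on earlier simulations and would have to be refitted as new simulation results arrive.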
Results
• Search yielded worst-case scenarios that were much worse than known and expected scenarios
• Surrogate modeling: Polynomial regression yielded the best fit and predictive power so far
• Dimensionality reduction helps generate better surrogate models
• Surrogate modeling can yield up to an eight-fold increase in search speed
• Surrogate modeling can help find more critical requirements violations
• By accounting for variations in configurations, we found more critical requirements violations than just with the HIL configuration
Robustness Testing of a Video Conference System (Cisco)
References:
• S. Ali, L. Briand, H. Hemmati, “Modeling Robustness Behavior Using Aspect-Oriented Modeling to Support Robustness Testing of Industrial Systems”, Journal of Software and Systems Modeling (Springer), 2011
• S. Ali, M. Z. Iqbal, A. Arcuri, L. Briand, “Generating Test Data from OCL Constraints with Search Techniques”, IEEE Transactions on Software Engineering, 2013
Video Conference System
Core Functionality
[Figure: endpoints EP1, EP2, and EP3 connected through a call. Each call has outgoing and incoming channels, with audio, video, and presentation channels.]
Robustness
• Robustness is the degree to which a software component functions correctly in the presence of exceptional inputs or stressful environmental conditions (IEEE Std 610.12-1990)
• Significant additional complexity lies with handling the robustness properties
– Network communication faults
– Media quality faults in media streams
– Faults in the endpoints
Cross-Cutting Concern
[Figure: state machine. Base model: states Idle [#calls=0], NotFull [0<#calls<max], and Full [#calls=max]; transitions dial(), dial() [#calls<max-1], dial() [#calls=max-1], disconnect(), disconnect() [#calls>1], disconnect() [#calls=1]. Cross-cutting concern: a Recovery […] state with transitions After(time) and DisconnectAll(), and the guards below.]
PL>0 or PacketDelay>0 or ReorderDelay>0 or corrupt>0 or Duplicate>0
PL=0 && PacketDelay=0 && ReorderDelay=0 && corrupt=0 && Duplicate=0
Model-Based Testing (MBT)
• Goals: Scalability, complete automation
• Model-based Testing (MBT) uses models of the system for test case and oracle generation
– The models typically describe some aspects of the system under test
– Increasingly used for complete test automation, e.g., aerospace, automotive, banking
• Often using well-established standards for modeling and their extensions: UML (profiles), OCL, etc.
• Requirements:
– Test-ready models
– Appropriate test strategies, e.g., path selection
– Test data generation
– Oracles
Model-Based Testing: Process and Technology
Test Data Generation for MBT
• Test data is needed to execute program paths as required by a coverage criterion during testing
• For MBT, test data is typically an instance of a class diagram
• Instances must fulfill invariants
• Paths in state machines carry constraints (guards) on conditions
• To generate test data for UML/OCL models, we need to solve OCL constraints written on the models
context Student inv ageConstraint: self.age > 15 and self.age < 80
Example OCL expression in VC Model
context Saturn inv synchronizationConstraint:
  self.systemUnit.NumberOfActiveCalls > 1 and
  self.systemUnit.NumberOfActiveCalls <= self.systemUnit.MaximumNumberOfActiveCalls
  and self.media.synchronizationMismatch.unit = TimeUnitKind::s
  and (self.media.synchronizationMismatch.value >= 0 and
       self.media.synchronizationMismatch.value <= self.media.synchronizationMismatchThreshold.value)
  and self.conference.PresentationMode = Mode::Off
  and self.conference.call->select(call |
        call.incomingPresentationChannel.Protocol <> VideoProtocol::Off)->size()=2
  and self.conference.call->select(call |
        call.outgoingPresentationChannel.Protocol <> VideoProtocol::Off)->size()=2
OCL Constraint Solvers
• A number of approaches exist for OCL constraint solving
• Not complete
– Support only a subset of OCL
• Lack of proper tool support
– A number of approaches are not automated
• Not scalable
– Often based on translation (e.g., to CSP)
– Combinatorial explosion
A Search Problem
• We took an alternative approach, applying search-based testing (SBT) concepts to solve OCL constraints
• The process of generating test data can be seen as a search process
– There is a huge number of possible instances that can be generated for a particular model
– We need to select instances that solve the constraint
• Fitness defined as a distance function d()
– d() returns 0 if the constraint is solved
– otherwise a value that heuristically estimates how far the constraint was from being evaluated as true
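As a small illustration of a distance function guiding a search, the sketch below solves the Student age invariant (age > 15 and age < 80) with a naive neighborhood search. The distance rules (difference plus a constant `K` for relational operators, sum for `and`) follow common search-based testing conventions and are assumptions here. They are not necessarily the exact rules of the OCL solver described in the talk.

```python
K = 1.0  # assumed constant added whenever a relational predicate is false


def d_greater(a, b):
    """Distance for 'a > b': 0 when true, otherwise how far a is from b."""
    return 0.0 if a > b else (b - a) + K


def d_less(a, b):
    """Distance for 'a < b'."""
    return 0.0 if a < b else (a - b) + K


def d_and(*parts):
    """Distance for a conjunction: all parts must reach 0 (assumed: sum)."""
    return sum(parts)


def distance(age):
    # context Student inv ageConstraint: self.age > 15 and self.age < 80
    return d_and(d_greater(age, 15), d_less(age, 80))


def solve(start, max_steps=10_000):
    """Move to the better of the two neighbors until the distance is 0."""
    age = start
    for _ in range(max_steps):
        if distance(age) == 0:
            return age
        age = min((age - 1, age + 1), key=distance)
    return None  # search budget exhausted


print(solve(200))  # → 79
print(solve(-5))   # → 16
```

The point of the distance is guidance: a pure true/false check would give the search no hint about which direction to move in.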
Challenges
• Primitive types, Boolean operators
• Operations on collections, iterators
• Fine-grained fitness functions for iterators using size, oclInState
• Consider a collection C = {1, 2, 3} and a constraint C→forAll(x | x = 0):
$$d(C\rightarrow\mathrm{forAll}(x \mid x = 0)) = \frac{\sum_{i} d(C.\mathrm{at}(i) = 0)}{C\rightarrow\mathrm{size}()} = \frac{d(1 = 0) + d(2 = 0) + d(3 = 0)}{3} = \frac{2 + 3 + 4}{3} = 3$$
• Many complex rules for the computation of fitness functions based on OCL expressions
• Fine-grained heuristics -> maximum guidance
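The forAll example can be reproduced in a few lines. The element distance used here, |a - b| + 1 when the equality is false, is inferred from the values 2, 3, 4 on the slide. The function names are made up for this sketch.

```python
K = 1  # constant inferred from the slide's example: d(1 = 0) = 2


def d_equals(a, b):
    """Distance for 'a = b': 0 when equal, otherwise |a - b| + K."""
    return 0 if a == b else abs(a - b) + K


def d_for_all(collection, element_distance):
    """Distance for forAll: average of the element distances.

    forAll over an empty collection is true, so the distance is 0.
    """
    if not collection:
        return 0
    return sum(element_distance(x) for x in collection) / len(collection)


result = d_for_all([1, 2, 3], lambda x: d_equals(x, 0))
print(result)  # → 3.0
```

Averaging over the elements means every element that moves closer to 0 lowers the distance, which gives the search finer guidance than counting how many elements violate the condition.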
VC Model and Results
• UML class diagram, state machines, OCL
• 20 subsystems, on average 5 states and 11 transitions (largest: 22 states, 63 transitions)
• OCL: 144 constraints as guards, 100 invariants, and 57 change events
• Results:
– All constraints were resolved
– Maximum time: ~2 minutes on a laptop
Environment-Based Testing of a Seismic Acquisition System (WesternGeco)
References:
• Z. Iqbal, A. Arcuri, L. Briand, “Empirical Investigation of Search Algorithms for Environment Model-Based Testing of Real-Time Embedded Software”, ACM ISSTA, 2012
• Z. Iqbal, A. Arcuri, L. Briand, “Environment Modeling and Simulation for Automated Testing of Soft Real-Time Embedded Software”, Software and System Modeling (Springer), 2014
Objectives
• Model-based system testing
– Black-box
– Environment models
[Figure: elements of the approach: Environment Models, Environment Simulator, Test cases, Test oracle.]
Environment: “Domain” Model
Environment: “Behavioral” Model
Test Case Generation
• Test objectives: Reach “error” states (critical environment states)
• Test Case: (1) Environment and (2) Simulation Configuration
– (1) Number of instances for each component in domain model, e.g., number of items on conveyor belt
– (2) Setting non-deterministic properties of the environment, e.g., speed of sorter’s left and right arms
• Oracle: Reaching an “error” state
• SBST: Heuristics
– Distance from error state
– Distance from satisfying OCL guards
– Time distance
– Time in “risky” states
– …
Schedulability Analysis and Stress Testing of Safety-Critical Drivers (Kongsberg Maritime)
References:
• L. Briand, Y. Labiche, and M. Shousha, “Using genetic algorithms for early schedulability analysis and stress testing in real-time systems”, Genetic Programming and Evolvable Machines, vol. 7 no. 2, pp. 145-170, 2006
• S. Nejati, S. Di Alesio, M. Sabetzadeh, and L. Briand, “Modeling and analysis of CPU usage in safety-critical embedded systems to support stress testing”, in Model Driven Engineering Languages and Systems. Springer, 2012, pp. 759–775.
• S. Di Alesio, S. Nejati, L. Briand, A. Gotlieb, “Stress Testing of Task Deadlines: A Constraint Programming Approach”, ISSRE 2013, San Jose, USA
• S. Di Alesio, S. Nejati, L. Briand, A. Gotlieb, “Worst-Case Scheduling of Software Tasks – A Constraint Optimization Model to Support Performance Testing”, Constraint Programming (CP), 2014
Fire/Gas Detection and Emergency Shutdown
[Figure: system architecture. Drivers (software-hardware interface) sit between control modules and alarm devices (hardware), on a real-time operating system and a multicore architecture.]
Monitor gas leaks and fire in oil extraction platforms
Performance Requirements are Hard to Verify
They constrain the entire system’s behavior and thus can’t be checked locally
They depend on the environment the software interacts with (hw devices)
They depend on the computing platform on which the software runs
Schedulability Analysis and Testing
• Real-time embedded systems (RTES) have concurrent, interdependent tasks that must finish before their deadlines
• Each task has a deadline (i.e., latest finishing time) w.r.t. its arrival time
• Some task properties depend on the environment, some are design choices
• Tasks can trigger other tasks, and can share computational resources with other tasks
• Schedulability analysis encompasses techniques that try to predict whether all (critical) tasks are schedulable, i.e., meet their deadlines
• Stress testing runs carefully selected test cases that have a high probability of leading to deadline misses
Arrival Times Determine Deadline Misses
$j_0, j_1, j_2$ arrive at $at_0, at_1, at_2$ and must finish before $dl_0, dl_1, dl_2$
$j_1$ can miss its deadline $dl_1$ depending on when $at_2$ occurs!
[Figure: two timelines from 0 to 9 over the time period $T$, each showing jobs $j_0$, $j_1$, $j_2$ with their arrival times and deadlines, for two different arrival times $at_2$.]
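The effect of arrival times can be reproduced with a small simulation. The sketch assumes a single core, discrete time units, and fixed-priority preemptive scheduling. The durations, deadlines, and priorities are invented for the example and are not taken from the Kongsberg system, which runs on a multicore platform.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    arrival: int    # arrival time (at)
    duration: int   # execution time needed
    deadline: int   # deadline relative to the arrival time (dl - at)
    priority: int   # larger value = higher priority


def simulate(jobs, horizon=50):
    """Return the finish time of each job under preemptive priority scheduling."""
    remaining = {j.name: j.duration for j in jobs}
    finish = {}
    for t in range(horizon):
        ready = [j for j in jobs if j.arrival <= t and remaining[j.name] > 0]
        if not ready:
            continue  # processor idle during [t, t + 1)
        running = max(ready, key=lambda j: j.priority)  # may preempt others
        remaining[running.name] -= 1
        if remaining[running.name] == 0:
            finish[running.name] = t + 1
    return finish


def deadline_misses(jobs, horizon=50):
    """Names of jobs that finish after arrival + relative deadline."""
    finish = simulate(jobs, horizon)
    return [
        j.name
        for j in jobs
        if finish.get(j.name, horizon + 1) > j.arrival + j.deadline
    ]


def scenario(at2):
    # Hypothetical job set; only the arrival time of j2 varies.
    return [
        Job("j0", arrival=1, duration=2, deadline=8, priority=1),
        Job("j1", arrival=3, duration=3, deadline=4, priority=2),
        Job("j2", arrival=at2, duration=2, deadline=3, priority=3),
    ]


print(deadline_misses(scenario(at2=4)))  # → ['j1']
print(deadline_misses(scenario(at2=8)))  # → []
```

With `at2=4`, the higher-priority `j2` preempts `j1` for two time units, so `j1` finishes at time 8 instead of 6 and misses its deadline at time 7.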
Search-Based Approaches
• This problem can be tackled as a search problem in the space of arrival times for aperiodic tasks
• Identify worst-case scenarios for testing
• No assumptions
• Genetic algorithms: Briand et al., 2003-2006
• Constraint Programming (e.g., OPL, ILOG CP Optimizer)
– Nejati et al., 2012
– Di Alesio et al., 2013-2014
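To show what searching the space of arrival times can look like, the sketch below uses a simple randomized local search. This is a deliberate simplification standing in for the genetic algorithms and constraint programming models cited above. The task set, the single-core preemptive scheduler, and the fitness (largest finish time minus deadline) are assumptions for illustration.

```python
import random

# Hypothetical tasks: (name, duration, relative deadline, priority).
TASKS = [("j0", 2, 8, 1), ("j1", 3, 4, 2), ("j2", 2, 3, 3)]
HORIZON = 10  # arrival times are searched in 0..9


def worst_lateness(arrivals):
    """Largest (finish time - absolute deadline) over all tasks.

    A positive value means at least one deadline is missed.
    """
    remaining = [duration for _, duration, _, _ in TASKS]
    finish = [0] * len(TASKS)
    t = 0
    while any(r > 0 for r in remaining):
        ready = [i for i in range(len(TASKS)) if arrivals[i] <= t and remaining[i] > 0]
        if ready:
            i = max(ready, key=lambda k: TASKS[k][3])  # highest priority runs
            remaining[i] -= 1
            if remaining[i] == 0:
                finish[i] = t + 1
        t += 1
    return max(finish[i] - (arrivals[i] + TASKS[i][2]) for i in range(len(TASKS)))


def search(iterations=500, seed=1):
    """Randomized local search: mutate one arrival time, keep it if not worse."""
    rng = random.Random(seed)
    best = [rng.randrange(HORIZON) for _ in TASKS]
    best_fit = worst_lateness(best)
    for _ in range(iterations):
        cand = list(best)
        cand[rng.randrange(len(cand))] = rng.randrange(HORIZON)
        fit = worst_lateness(cand)
        if fit >= best_fit:  # accepting ties lets the search cross plateaus
            best, best_fit = cand, fit
    return best, best_fit


arrivals, lateness = search()
print(arrivals, lateness)
```

The arrival times returned by `search` form a stress test case: they are the inputs most likely to expose a deadline miss when replayed on the real system.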
Constraint Optimization
[Figure: mapping of the scheduling problem onto a constraint optimization problem.]
• Static properties of tasks: constants
• Dynamic properties of tasks: variables
• Performance requirement: objective function
• OS scheduler behaviour: constraints
Process and Technologies
[Figure: process and technologies. INPUT: system design and system platform, captured through UML modeling as a design model (time and concurrency information). The design model is turned into an optimization problem (find arrival times that maximize the chance of deadline misses), solved by automated search with Genetic Algorithms (GA) or Constraint Programming (CP). OUTPUT: solutions (task arrival times likely to lead to deadline misses), e.g., $at_0 = 1$, $at_1 = 3$, $at_2 = 4$, used for deadline misses analysis and as stress test cases.]
Results and Current Work
• GA tends to be more efficient but less effective than CP
– More efficient: finds deadline misses quicker
– More effective: finds worse deadline misses
• CP is deterministic, evolutionary search is randomized
• For testing we want a diverse set of stress test cases
• Combining GA and CP (Di Alesio’s dissertation):
– Achieve an efficiency close to GA and an effectiveness close to CP
– Use GA first, then improve the worst solutions found by GA by performing a complete CP search in the neighborhood of those solutions
– Results on five case studies are very encouraging
SBST in Industry: Discussion
• Scalability
• Applicability
• Variety of heuristics as a function of test objectives, available information, assumptions, etc.
• Search as a piece of the solution: multidisciplinarity
• Combining search with other techniques: Likely candidates
Scalability
• Search spaces are huge in practice
• Fitness computation is often computationally intensive
• Test execution can be expensive
– Web applications or phone apps versus embedded systems with HIL
– Models, simulation to guide the search
• Simulation is always expensive
– Simulink models, e.g., 31s for a 2s simulation
– Surrogate modeling?
• In many situations, models of the system can help guide the search
Applicability
• Many academic solutions are not applicable in practice
• Context matters
• Scalability -> applicability
• But also inputs required for guiding the search
• Integration with the rest of the development process
– E.g., design models, WCET analysis, Simulink development
A Large Variety of Heuristics
• Test objectives differ a great deal depending on context
– Performance, robustness, critical environment states …
• Available information also differs, both for guiding test generation and oracles
– Purely black-box testing
– Design information, e.g., through models
• Working assumptions
– About process, technology, …
– E.g., availability of plant/environment models in Simulink
• In a given context, some degree of tailoring is usually required for applying SBST
Multidisciplinarity
• Typically, meta-heuristic search is only part of a solution to a testing problem
• Dedicated system or environment modeling, e.g., in Cisco and WesternGeco studies
• Machine learning, e.g., regression trees in Delphi study
• Statistical analysis, e.g., EEA and non-linear regression in Delphi study
• Constraint programming, e.g., in Kongsberg study
SVV lab: svv.lu
SnT: www.securityandtrust.lu