demystifying the black box: a test strategy for autonomy · demystifying the black box: a test...

61
Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses Operational Evaluation Division [email protected] 12 April 2019

Upload: others

Post on 18-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Demystifying the Black Box: A Test Strategy for Autonomy

Dr. Daniel PorterInstitute for Defense Analyses

Operational Evaluation [email protected]

12 April 2019

Page 2: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

1

Page 3: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Talk Takeaways – Order of Importance

2

1. Testing should aim to develop a model of system decision-making and confirm the underlying capabilities.

2. The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations.

Page 4: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Talk Takeaways

3

Testing should aim to develop a model of system decision-making and confirm the underlying capabilities.

The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations.

Page 5: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

B. Testing should aim to develop a model of system decision-making and confirm the underlying capabilities.

A. The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations.

Talk Takeaways – Order of Discussion

4

Page 6: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

The fundamental challenge of testing autonomy and AI is generalizing to

unobserved situations.

5

Page 7: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

An autonomous car shows up instead of a taxi

6

Page 8: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

If the AI certification process were just the same road test that humans take, would you trust it?

7

Page 9: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

We trust the human because we have a model

8

• Neither has encountered all the situations it will

• We trust the human but not the car

• I have a model of the human’s decision-making Not being dead is proof the model works

Page 10: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

What do I mean by model?

9

He tried to put the poison where he thinks I would take it.

He’s thinking about where I think he’s thinking about putting it!

Page 11: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

“Common Sense” is the collection of models we implicitly understand humans use to act in the world.

If we want humans to appropriately trust machine decisions, the humans must be able to model those decisions.

10

Page 12: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Trust of decision making has three basic inputs

11

Goals

Competence

Process

Page 13: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Goals may exist globally

12

• Don’t die• Don’t get arrested• Get paid

Competence

Process

Page 14: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

• Don’t die• Don’t get arrested• Get paid

Global goals can spawn task-specific sub-goals

13

Get there safely

Competence

Process

Page 15: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Get there safely

Process identifies current state and picks next action

14

Competence

Process

Page 16: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Human Competence is well-tested

15

Manufacturing Line

Quality Assurance

Page 17: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Human driver certification can be weak because I can assume life tested most of the underlying Competence

16

Page 18: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Machines don’t have common sense

17

• If decision engine is a black box, we don’t understand: Global GoalsMoment-to-moment decision Process Underlying Competence those processes depend on

• People fear discontinuities in decision-makingMachine Learning is just signal extraction

The world is full of correlated but incorrect signal

If we don’t have a model, we can’t be confident behavior will be continuous as we move across a dimension

Page 19: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Signal may not be universally useful

18

Page 20: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations

19

• Aerodynamics model allows inference Test edges, infer center

Page 21: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

# Non-target objectsCollateral Damage

The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations

20

• We don’t have models of system decision-making

Valu

e of

Targ

et

???High Probability Engage

Low Probability Engage

Page 22: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

The fundamental challenge of testing autonomy and AI is generalizing to

unobserved situations.

21

Page 23: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Testing should aim to develop a model of system decision-making and confirm

the underlying capabilities.

22

Page 24: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Testing is about assurance

23

• Does the system meet its requirements? Contractual Operational

• To what extent do different factors affect performance? Identify areas for improvement

Inform development of tactics/guidance

Hopefully aligned

Page 25: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Assurance can be built in different ways

24

• Brute ForceCover operational space sufficiently for acceptable level of riskBlack box forces this approach

• InterpolationObserve limited points and predict between observationsHaving an underlying model enables this approach

Page 26: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

1. The system’s Goals & Processes are reasonable. What is the system trying to accomplish? What information is the system using, and how does that

information change its decision?

2. The Competencies these require are functioning. Is it able to acquire that information and execute its actions?

25

Model-based assurance needs to find two things

Page 27: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Autonomy: Making decisions based on environmental input

26

VS

Page 28: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Autonomy exists at the level of task being considered

27

Mission:Clear Minefield

Task:Remove mine

Task:Locate Mine

Task:Pick Search Pattern

Page 29: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Some systems won’t need testing to find Goals or Process.

28

Page 30: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Some systems’ operationally relevant goals are decided by a human

29

• Procedural Autonomy System has autonomy in the moment-to-moment Process

decisions to achieve a GoalE.g., control loops

Page 31: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Systems with just Procedural Autonomy don’t need special test methods in most cases

30

• The Process is known in advance Physics model or explicitly coded logic

• Brute force is feasible Small operational space or low-risk consequences

• Correct moment-to-moment decisions just affect performance of a defined task If it is performing well, it is making the right decisions

Page 32: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

We need models for systems that make Goal decisions

31

• Executive Autonomy System can set a goal for itselfMaking “should” decisions about tasks

• “Should” decision correctness usually won’t be captured by typical objective performance metrics

• This is the type of autonomy that really worries people

Page 33: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

If you need a model… A Cheat Sheet

32

1. Decompose the mission into operationally relevant tasks.

2. Determine for which tasks the system has autonomy.

3. Write out information needs of the tasks for which the system has autonomy.

4. Build model by experimenting with system decision-making across those information dimensions.

5. Confirm system’s Competence to get accurate information and execute appropriate actions.

6. Test decision performance in realistic scenarios.

Page 34: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Testing needs to become a continuum

33

• Build a model Demonstrate that information affects decisions correctly

Traditional contractor testingGood candidate for M&S

• Demonstrate Competencies Show that system can accurately acquire this information in a

timely manner under realistic conditionsTraditional developmental testing

• Test the model Show that the system makes appropriate decisions under

realistic conditionsTraditional operational testing

Page 35: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Testing should provide assurance about the model

34

Goals

Competence

Process

Page 36: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Talk Takeaways – Order of Importance

35

1. Testing should aim to develop a model of system decision-making and confirm the underlying capabilities.

2. The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations.

Page 38: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Thank You

37

Page 39: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Backup

38

Page 40: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

In layman’s terms…

39

• Goals What does the world look like when I’m done? How do I value achieving a certain situation?

• Process Strategy for:

Identifying features of current situation Identifying available optionsEvaluating how options would change the situation Choosing an option that best meets goalsExecuting chosen action

• Competence How well can you perform each Process step

Page 41: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

A test of AI/autonomy should let us know…

40

• Goals What does the world look like when I’m done? How do I value achieving a certain situation?

• Process Strategy for:

Identifying features of current situation Identifying available optionsEvaluating how options would change the situation Choosing an option that best meets goalsExecuting chosen action

• Capabilities How well can you perform each Process step

These are reasonable

These are adequate

Page 42: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

The lifecycle of test must be a continuum, not discrete categories operating as independent fiefdoms.

41

Page 43: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Testing autonomy will require more data

42

• Autonomous systems will have larger operational spaces Still have to test physical performance Also have to test decision performance

Adds (many) factors to test design

• People likely less forgiving of machine decisions Acceptable level of risk will be smaller

Requires more evidence to achieve acceptable risk

• Need efficient methods to discover AI’s model

Page 44: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Designing a model is easier than figuring it out.

43

Page 45: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

The Black (Box) Plague should be avoided

44

AUTONOMY

Page 46: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Cognitive Architecture: Life’s easier if you plan!

45

VIDEO FEED

SHAPEBUILDER

EDGE DETECTOR

OBJECTASSEMBLER

SPATIALLOCALIZER

OBJECTIDENTIFER

Task: Identify what things are where

Page 47: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Design architectures from the start.

46

Page 48: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Data collection must be built into the system

47

• The system must record the data itself Impossible to record data in many situations Requires horde of observers when it is possible

• Data collection infrastructure must be a requirement

• This is not just for OT Developers & DT will need the infrastructure too

Need to diagnose decisions to fix them

Page 49: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

We must get more evidence without breaking budgets

48

• Challenge: Autonomy will require more evidence

• Solution: Build a “body of evidence” over time Targeted testing: cover the space in intelligent ways

Each point must provide more evidential valueFocus on what test points allow us to learn about system

Expand data sources that inform operational evaluationMore evidence without more test

Page 50: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Targeted testing must be informed by prior results

49

• Sequential testing guides targeted testing Pick next test points based on what we learned in past Test over time instead of one massive test Helps maximize value of each point

• Modeling & Simulation can inform targeted testing

Page 51: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Targeted testing must not delay fielding

50

• Challenge: Sequential testing can expand timelines Need to have previous test points to pick the next ones

Can’t do this in a live test, so have to test over longer period

• Solution: Push the start of testing left Begin collecting operational-esque data earlier

Earlier start means data must support both DT & OTDT/OT needs to become a continuumThis is probably desirable for autonomy in any event

AI needs realistic environment to see true behavior anyway OT needs to continue to enhance our understanding of system

Page 52: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

CT-DT-OT should be CDOT continuum

51

Develop a concept of what

the model is

Test performance under realistic

conditions

Confirm model, assess underlying

capabilities

Page 53: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Spiral development should probably be the goal

52

• Challenge: Testing systems with large operational spaces where failure risks human life

Too many opportunities for holes in coverage to lead to catastrophic consequences

• Solution: Limited or Incremental Capability Fielding Complex tasks can be broken down into smaller ones Choose a subtask with acceptable risk and test that

If it passes this test, approve it for fielding on that task Potentially limit to human supervision

Collect field data through built-in infrastructureOver time adjust risk of approved tasks

Page 54: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Risky, complex systems should not be fielded all at once

53

Perform OT on SC

Approve SC for human -supervised

fielding

Collect extensive field data

Choose a sub -capability

Collect extensive field data

Increase risk of approved

unsupervised tasks

Increase risk of approved human -supervised tasks

Page 55: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

54

Page 56: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Talk Takeaways – Order of Importance

55

1. Testing should aim to develop a model of system decision-making and confirm the underlying capabilities.

2. The lifecycle of test must be a continuum, not discrete categories operating as independent fiefdoms.

3. The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations.

Page 57: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Acknowledgements

56

• DARPA Explainable AI (XAI)Machine Common Sense (MCS)

• Acquisition Reform Shift Left Integrated Testing Incremental Capability Fielding / Spiral Development

Page 58: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

Test should still be scoped around acceptable risk

57

• What are the possible consequences? Size of operational space

How likely are we to miss a problem? Severity of consequence

What harm could a failure cause?

• How independent is the system? Time between human control

Is it off on its own for long periods of time? Decisions between human control

Does it act faster than a human can reasonable intervene?

Page 59: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

All vowels have an odd number on the back.What cards do you need to flip to test this fully?

58

A 3 B 8

Page 60: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

There are four people in a bar. For some you know what they’re drinking. For some you know their age. Who do you check to make sure the law isn’t being broken?

59

Beer 22 Soda 15

Page 61: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational

The goal of testing should be developing and confirming a generalizable model of system decision-making.

61

• Brute force testing is not feasible

• Interpolation required for evaluation and TTPs

• Models enable interpolation

• Solution: structure tests and pick test points based on what allows you to understand the decision process