Page 1: Reverse Engineering State Machines by Interactive Grammar Inference

Reverse Engineering State Machines by Interactive Grammar Inference

Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin

Page 3: Reverse Engineering State Machines by Interactive Grammar Inference

State Machines

• Used to model software behaviour
  – Documentation
  – Inspection / review
  – Model-based testing
  – Model checking
• Only useful if complete and up-to-date
• Usually not the case due to time constraints and software evolution

[Figure: example state machine with transitions load, edit, save as, ok, close, exit]

Page 4: Reverse Engineering State Machines by Interactive Grammar Inference

Reverse Engineering State Machines

• Static analysis – analysis of source code
  – Symbolic execution, flow analyses, ...
  – Inevitably considers executions that are infeasible in practice
• Dynamic analysis – infer model from sample executions
  – Favoured for accuracy
  – States considered equal if subsequent trace is similar
  – Variants of the k-tails algorithm [Biermann, Feldman 1972] are the most common reverse-engineering algorithms

Page 8: Reverse Engineering State Machines by Interactive Grammar Inference

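Traditional Approach

• For any point in a trace, its k-tail is the following sequence of k events or functions
  – Point x is considered equivalent to y if the k-tails are equal

Example trace: <load, edit, edit, edit, save_as, ok, edit, edit>, K=2

[Figure: the trace drawn as a chain of transitions: load, edit, edit, edit, save_as, ok, edit, edit]
[Figure: machine after merging points with equal k-tails, with transitions load, edit, save_as, edit, ok]
[Figure: machine after removing non-determinism, with transitions load, save_as, edit, ok]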
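
A minimal Python sketch of the k-tail grouping described above, applied to the slide's example trace with K=2. It is an illustration rather than the full Biermann-Feldman algorithm: the function names are hypothetical, only a single trace is handled, and tails near the end of the trace are allowed to be shorter than k, which is an assumption of this sketch.

```python
def k_tails(trace, k):
    """Return the k-tail at every point of the trace (points 0..len(trace)).

    Assumption: near the end of the trace the tail is whatever events
    remain, so it may be shorter than k.
    """
    return [tuple(trace[i:i + k]) for i in range(len(trace) + 1)]


def merge_equal_tails(trace, k):
    """Treat points with equal k-tails as one state and collect transitions."""
    tails = k_tails(trace, k)
    state_of = {}  # k-tail -> state id, numbered in order of first appearance
    for tail in tails:
        state_of.setdefault(tail, len(state_of))
    transitions = set()
    for i, event in enumerate(trace):
        transitions.add((state_of[tails[i]], event, state_of[tails[i + 1]]))
    return state_of, transitions


trace = ["load", "edit", "edit", "edit", "save_as", "ok", "edit", "edit"]
state_of, transitions = merge_equal_tails(trace, k=2)

# Targets reachable from each (state, event) pair; more than one target
# means the merged machine is non-deterministic at that point.
targets = {}
for source, event, target in transitions:
    targets.setdefault((source, event), set()).add(target)
nondeterministic = {key for key, value in targets.items() if len(value) > 1}
```

On this trace, points 1, 2 and 6 share the 2-tail (edit, edit), so they collapse into a single state with several outgoing edit transitions. That is the non-determinism the slide then removes.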

Page 9: Reverse Engineering State Machines by Interactive Grammar Inference

Problems

• Too expensive if result is to be correct and complete:
  – Need complete set of executions up to a certain length
  – Passive – all executions need to be presented at once
• If the provided traces are only partial (probable for a non-trivial system), the resulting model is untrustworthy
  – Difficult to tell how complete the model is – what’s missing?

[Figure: two state machines, one with transitions load, save_as, edit, ok and one with transitions load, exit, close, edit, save as, ok]

Page 10: Reverse Engineering State Machines by Interactive Grammar Inference

Regular Grammar Inference

• Given a set of valid and (optionally) invalid sentences from a language, infer its grammar.

• Regular grammars can be represented as deterministic finite state machines

• Problem of regular grammar inference equivalent to that of reverse engineering state machines

• Several sophisticated grammar inference techniques
  – Effectively address many problems that arise with current reverse-engineering approaches
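
To illustrate the link between a regular language and a deterministic finite state machine, here is a small Python sketch. The transition table is a hypothetical text-editor machine invented for this example (it is not copied from the slides' figure), and the state names are assumptions. Valid sentences are the event sequences the machine accepts.

```python
# Hypothetical DFA: state -> {event: next state}. Names are illustrative only.
EDITOR = {
    "closed": {"load": "loaded", "exit": "exited"},
    "loaded": {"edit": "edited", "close": "closed"},
    "edited": {"edit": "edited", "save_as": "saving", "close": "closed"},
    "saving": {"ok": "loaded"},
    "exited": {},
}
START = "closed"
ACCEPTING = {"exited"}


def accepts(sentence, dfa=EDITOR, start=START, accepting=ACCEPTING):
    """Return True if the event sequence is a valid sentence of the language."""
    state = start
    for event in sentence:
        if event not in dfa[state]:
            return False  # no transition for this event: invalid sentence
        state = dfa[state][event]
    return state in accepting


valid = ["load", "edit", "edit", "save_as", "ok", "close", "exit"]
invalid = ["load", "save_as"]  # nothing has been edited yet in this toy machine
```

Grammar inference works in the opposite direction: given sentences such as `valid` and `invalid`, it tries to recover a table like `EDITOR`.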

Page 11: Reverse Engineering State Machines by Interactive Grammar Inference

Benefits of Adapting Grammar Inference Techniques

• Active techniques
  – Do not require set of executions to be presented at once
  – Interact with an oracle to identify missing information
• More efficient
  – Can efficiently process large sample sets
• Reasonably accurate given sparse sets of executions
  – More sophisticated heuristics to accurately identify equivalent states

Page 12: Reverse Engineering State Machines by Interactive Grammar Inference

Query-Driven State Merging (QSM)

• Devised by Dupont et al.
• Combines benefits mentioned on previous slide
  – Active, efficient, reasonably accurate for sparse sets of sample executions
• Guaranteed to produce correct machine if set of sample executions is characteristic:
  – Must cover every transition in the target grammar
  – Enough positive and negative samples to differentiate between different states (to prevent false merges)
  – Questions aim to elicit characteristic sample from oracle

Page 13: Reverse Engineering State Machines by Interactive Grammar Inference

Query-Driven State Merging (QSM)

Sample traces:
<load, close, exit>
<load, edit, edit, save_as, ok, close, exit>
<load, edit, edit, edit, close, exit>

Generate “Prefix Tree Acceptor”

[Figure: prefix tree acceptor for the three traces, branching after load and after the second edit]
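
A minimal Python sketch of building a prefix tree acceptor from the three sample traces on this slide. The dictionary representation and the function name `build_pta` are assumptions made for illustration; the slides do not describe the data structure used.

```python
def build_pta(traces):
    """Build a prefix tree acceptor as state -> {event: next state}.

    State 0 is the root. Traces that share a prefix share the states
    along that prefix; a new state is created only where they diverge.
    """
    transitions = {0: {}}
    accepting = set()
    for trace in traces:
        state = 0
        for event in trace:
            if event not in transitions[state]:
                new_state = len(transitions)
                transitions[state][event] = new_state
                transitions[new_state] = {}
            state = transitions[state][event]
        accepting.add(state)  # the state reached at the end of a sample trace
    return transitions, accepting


traces = [
    ("load", "close", "exit"),
    ("load", "edit", "edit", "save_as", "ok", "close", "exit"),
    ("load", "edit", "edit", "edit", "close", "exit"),
]
pta, accepting = build_pta(traces)
```

For these traces the tree has 13 states. It branches after load (close or edit) and after the second edit (save_as or edit).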

Page 14: Reverse Engineering State Machines by Interactive Grammar Inference

Query-Driven State Merging (QSM)

[Figure: the prefix tree acceptor with a candidate pair of states merged]

• Attempt merge
• Produce questions (executions valid in this machine, but not in unmerged version):
  – <close, exit>?
  – <edit, edit, ...>?
  – <load, load, close, exit>?

Page 15: Reverse Engineering State Machines by Interactive Grammar Inference

Query-Driven State Merging (QSM)

• Attempt merge
• Produce questions (executions valid in this machine, but not in unmerged version)
• If all questions answered yes, merge nodes
• Else add negative questions to graph

[Figure: the prefix tree acceptor from the previous slides, with the label "close, edit" added]

• Active
• Efficient
• Accepts negative information about model
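
The merge-and-ask loop can be sketched in Python as follows. This is a simplification under stated assumptions, not the QSM algorithm of Dupont et al.: every path through the machine is treated as a valid execution, questions are simply all traces up to a length bound that the merged machine allows and the unmerged one does not, the start state is never the state being dropped, and the oracle is a hypothetical stand-in. All function names are invented for this sketch.

```python
# Prefix tree acceptor for the three sample traces, state -> {event: state}.
PTA = {
    0: {"load": 1},
    1: {"close": 2, "edit": 4},
    2: {"exit": 3}, 3: {},
    4: {"edit": 5},
    5: {"save_as": 6, "edit": 10},
    6: {"ok": 7}, 7: {"close": 8}, 8: {"exit": 9}, 9: {},
    10: {"close": 11}, 11: {"exit": 12}, 12: {},
}


def merge_states(machine, keep, drop):
    """Fold `drop` into `keep`, merging successors so the result is deterministic."""
    merged = {s: dict(out) for s, out in machine.items()}
    rep = {s: s for s in merged}  # representative of each original state

    def find(s):
        while rep[s] != s:
            s = rep[s]
        return s

    pending = [(keep, drop)]
    while pending:
        a, b = (find(s) for s in pending.pop())
        if a == b:
            continue
        rep[b] = a
        for event, target in merged.pop(b).items():
            if event in merged[a]:
                pending.append((merged[a][event], target))  # clash: merge targets too
            else:
                merged[a][event] = target
    return {s: {e: find(t) for e, t in out.items()} for s, out in merged.items()}


def is_valid(machine, trace, start=0):
    """Return True if the trace can be followed from the start state."""
    state = start
    for event in trace:
        if event not in machine[state]:
            return False
        state = machine[state][event]
    return True


def questions(original, merged, max_len, start=0):
    """Traces up to max_len that the merged machine allows but the original does not."""
    found, frontier = [], [((), start)]
    for _ in range(max_len):
        longer = []
        for trace, state in frontier:
            for event, target in sorted(merged[state].items()):
                longer.append((trace + (event,), target))
        found += [t for t, _ in longer if not is_valid(original, t, start)]
        frontier = longer
    return found


def try_merge(machine, keep, drop, oracle, max_len=4):
    """Merge only if the oracle answers yes to every question."""
    candidate = merge_states(machine, keep, drop)
    rejected = [q for q in questions(machine, candidate, max_len) if not oracle(q)]
    if rejected:
        return machine, rejected  # keep the unmerged machine, record the negatives
    return candidate, []


merged = merge_states(PTA, keep=0, drop=1)
asked = questions(PTA, merged, max_len=4)
# Hypothetical oracle: assume every valid execution has to start with "load".
machine, negatives = try_merge(PTA, 0, 1, oracle=lambda q: q[0] == "load")
```

Merging the root with the state reached after load yields questions such as <close, exit> and <load, load, close, exit>, matching the examples on the previous slide. With the hypothetical oracle, <close, ...> is answered no, so the merge is rejected and the rejected traces are kept as negative samples.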

Page 16: Reverse Engineering State Machines by Interactive Grammar Inference

Implementation

• Use Eclipse TPTP to record traces
  – Sequence of method calls → <load, edit, ...>
• Questions can either be answered manually
  – OR run as tests directly against the system
  – Can vary number of questions generated
• QSM component accepts simple text files of strings (prefixed with “+” and “-”)
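
A hedged Python sketch of reading such a sample file. The slides only say that strings are prefixed with "+" or "-"; the rest of the layout used here (one trace per line, events separated by whitespace) is an assumption, and `parse_samples` is a hypothetical helper rather than part of the tool.

```python
def parse_samples(text):
    """Split "+"/"-" prefixed lines into positive and negative traces.

    Assumed layout: one trace per line, the sign as the first character,
    events separated by whitespace. Blank lines are skipped.
    """
    positive, negative = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        sign, events = line[0], tuple(line[1:].split())
        if sign == "+":
            positive.append(events)
        elif sign == "-":
            negative.append(events)
        else:
            raise ValueError(f"line must start with '+' or '-': {line!r}")
    return positive, negative


sample_text = """
+ load close exit
- close exit
"""
positive, negative = parse_samples(sample_text)
```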

Page 17: Reverse Engineering State Machines by Interactive Grammar Inference
Page 18: Reverse Engineering State Machines by Interactive Grammar Inference

Evaluation

• Used traces to generate JHotDraw case study
  – Described in paper
• Generated random state machines
  – Subject to certain constraints – minimal, deterministic etc.
  – Three sets of 10 random machines (5, 25, 50 states)
  – Random paths over these machines = initial set of traces
  – Measured accuracy of final machine, and number of questions required
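
A rough Python sketch of this kind of setup: a random deterministic machine and random paths over it used as the initial traces. The generator below is an assumption for illustration only. It guarantees determinism but does not enforce minimality or any other constraint used in the actual evaluation, and the parameters (alphabet, out-degree, walk length) are invented.

```python
import random


def random_machine(n_states, alphabet, out_degree, rng):
    """Random deterministic machine as state -> {event: next state}.

    Each state gets `out_degree` distinct events, so no event leaves a
    state twice. Minimality and reachability are NOT checked here.
    """
    machine = {}
    for state in range(n_states):
        events = rng.sample(alphabet, out_degree)
        machine[state] = {event: rng.randrange(n_states) for event in events}
    return machine


def random_walk(machine, length, rng, start=0):
    """Follow randomly chosen transitions from the start state."""
    trace, state = [], start
    for _ in range(length):
        if not machine[state]:
            break  # dead end: no outgoing transitions
        event = rng.choice(sorted(machine[state]))
        trace.append(event)
        state = machine[state][event]
    return tuple(trace)


def is_valid(machine, trace, start=0):
    """Return True if the trace can be followed from the start state."""
    state = start
    for event in trace:
        if event not in machine[state]:
            return False
        state = machine[state][event]
    return True


rng = random.Random(0)  # fixed seed so the sketch is repeatable
machine = random_machine(5, ["a", "b", "c", "d"], out_degree=2, rng=rng)
initial_traces = [random_walk(machine, 6, rng) for _ in range(10)]
```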

Page 19: Reverse Engineering State Machines by Interactive Grammar Inference
Page 20: Reverse Engineering State Machines by Interactive Grammar Inference
Page 21: Reverse Engineering State Machines by Interactive Grammar Inference

Current and Future Work

• Identify data constraints associated with states
  – Can use tools such as Daikon
• Automatically answer queries
  – Static analysis – using call graph analysis to automatically propose negative / impossible executions
  – Automated test generation
• Heuristics – can certain questions be safely ignored?

Page 22: Reverse Engineering State Machines by Interactive Grammar Inference

Conclusions

• Preliminary results show technique is reasonably accurate and efficient
• Can potentially be almost entirely automated
  – Automatically generates tests (questions), many of which can be eliminated by static analysis anyway
• Grammar inference is a useful source of ideas for dynamic analysis and reverse engineering