An overview of my PhD research

Analyzing & visualizing spreadsheets Felienne Hermans (@felienne)

Post on 09-May-2015

Page 3: An overview of my PhD research

In this slide deck I present an overview of my PhD research. I recently defended my dissertation, titled ‘Analyzing and Visualizing Spreadsheets’.

This one!

Page 4: An overview of my PhD research

Bridging the gap

Funny story: I wasn’t hired to research spreadsheets at all. When I started my PhD project, I was supposed to research the gap between business users and programmers.

Users

Programmers

Page 5: An overview of my PhD research

To research this gap, I started by studying business in practice

Page 8: An overview of my PhD research

What surprised me is that this gap wasn’t that big; it was more like a small creek than a huge cliff. Some programmers were heavily involved in business, and even more interesting: some business guys were doing serious programming.

In Excel!

So I looked into some previous work on the impact of spreadsheets on business.

Programmers

Users

Page 9: An overview of my PhD research

95% of all U.S. firms use spreadsheets for financial reporting

Page 10: An overview of my PhD research

90% of all analysts in industry perform calculations in spreadsheets

Page 11: An overview of my PhD research

50% of spreadsheets form the basis for decisions

Page 12: An overview of my PhD research

Importance can grow over time

When studying the impact of spreadsheets, we found that they do not become important overnight. As processes change, spreadsheets can become key company assets over time.

Nobody sets out to create a mission-critical spreadsheet; they “just happen”.

Page 13: An overview of my PhD research

This is a simple spreadsheet for many users

Furthermore, spreadsheets can become surprisingly complex.

Page 14: An overview of my PhD research

And spreadsheets exist ‘under the radar’

Another interesting property of spreadsheets is that they often live ‘under the radar’: there is no list of spreadsheets, no one keeps track of which sheets are needed for which report, and some spreadsheets do not have a clear owner.

Page 15: An overview of my PhD research

Only 33% of spreadsheets have a manual

Finally, spreadsheets lack documentation. In only one third of the spreadsheets did we find ‘documentation’ (i.e. some sort of explanation of how to use the spreadsheet). Technical documentation, explaining why a spreadsheet was designed as it is, was hardly ever found.

Page 16: An overview of my PhD research

Complex spreadsheets without documentation can lead to serious errors

You can imagine that the combination of all the above facts:

• Spreadsheets are important
• They are complex
• They lack documentation

is a potential recipe for disaster. And indeed, those errors happen.

Page 17: An overview of my PhD research

The European Spreadsheet Risk Interest Group (Eusprig.org) collects horror stories

Page 27: An overview of my PhD research

Estimated loss: 10 billion dollars a year

Page 29: An overview of my PhD research

We interviewed spreadsheet professionals

Once I had studied related spreadsheet work and the horror stories from Eusprig, I wanted to gain a deeper understanding of spreadsheet problems in practice. So I interviewed 27 spreadsheet professionals at the Dutch Robeco bank.

I asked only two questions (a semi-structured interview) to obtain an overall view of spreadsheet problems:

Page 30: An overview of my PhD research

What annoys you?

Page 31: An overview of my PhD research

And what makes you happy?

Page 32: An overview of my PhD research

Financial professionals spend 2 days a week working with Excel

From the interviews, we learned the following facts:

Page 33: An overview of my PhD research

Spreadsheets can have a long life, 5 years on average

Page 34: An overview of my PhD research

Average sheet is used by 12 different people

Page 35: An overview of my PhD research

There is a gap! Between importance and treatment.

Then I concluded that there is an interesting gap that needs bridging: the gap between how important spreadsheets are and how well they are treated.

So how could this gap be bridged?

Page 36: An overview of my PhD research

It looks like software in the 70s!

Let’s summarize the problems around spreadsheets again:

• They lack documentation
• They contain errors
• They stay alive for several years and are used by several people
• They are complex

Does this remind you of something? It reminded me of the problems in the early days of software.

Page 37: An overview of my PhD research

Hence, we tried to bridge this gap with methods from software engineering.

Page 38: An overview of my PhD research

Spreadsheet users lack great tool support

If you compare the tooling of spreadsheet developers with that of software developers, the difference is clear.

Page 39: An overview of my PhD research

Modern IDEs (like Visual Studio) have all kinds of built-in tools to help you build software in a responsible way: debugging, testing, analyzing and visualizing are accessible at the click of a button.

Page 40: An overview of my PhD research

Compare this to a spreadsheet environment, like Excel. There is lots of support for creating a spreadsheet, with fonts and colors and borders, but none of the helpful tools for building a maintainable spreadsheet.

Page 41: An overview of my PhD research

We did not start coding immediately

However tempting, we did not start to build a spreadsheet IDE immediately. Instead, we looked at the results of the interviews to find the most pressing information need that spreadsheet users had.

Page 42: An overview of my PhD research

Most important problem: support for understanding spreadsheets was missing

Page 43: An overview of my PhD research

To address this information need specifically, we developed our tool Breviz. This tool visualizes the dependencies among worksheets. Worksheets are depicted as rectangles, with arrows drawn between them. The thicker the arrow, the more connections there are.

Example: worksheet ‘POA Project’ contains formulas that refer to cells in ‘ProjectTeam’.
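The idea of arrow thickness can be illustrated with a small sketch. This is not how Breviz itself is implemented; it only assumes that a workbook is available as a mapping from worksheet names to formula strings, and it counts how often formulas on one worksheet refer to cells on another. The function name `worksheet_dependencies`, the regular expression and the sample formulas are all hypothetical.

```python
import re
from collections import Counter

# Matches a sheet-qualified cell reference such as ProjectTeam!B2
# or 'POA Project'!C5 (simplified; ranges and names are not covered).
SHEET_REF = re.compile(
    r"(?:'([^']+)'|([A-Za-z_][A-Za-z0-9_.]*))!\$?[A-Z]{1,3}\$?\d+"
)

def worksheet_dependencies(workbook):
    """Count formula references between worksheets.

    `workbook` maps a worksheet name to a list of formula strings.
    Returns a Counter keyed by (referring_sheet, referenced_sheet).
    """
    edges = Counter()
    for sheet, formulas in workbook.items():
        for formula in formulas:
            for quoted, bare in SHEET_REF.findall(formula):
                target = quoted or bare
                if target != sheet:  # ignore references within one sheet
                    edges[(sheet, target)] += 1
    return edges

# Hypothetical workbook, loosely named after the example in the slide.
workbook = {
    "POA Project": [
        "=ProjectTeam!B2*2",
        "=SUM(ProjectTeam!C2, ProjectTeam!C3)",
        "=A1+1",
    ],
    "ProjectTeam": ["=10"],
}

edges = worksheet_dependencies(workbook)
for (source, target), weight in sorted(edges.items()):
    print(f"{source} refers to {target}: thickness {weight}")
```

In this sketch the count per pair of worksheets plays the role of the arrow thickness in the diagram.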

Page 44: An overview of my PhD research

We went back to practice

With our tool, we went back to practice, to see whether it really supported spreadsheet users.

Page 46: An overview of my PhD research

It turned out that it did. Some of the responses of users:

“This diagram reminds me of what I had in mind when building”

This remark is interesting: apparently, this spreadsheet user did do some modeling before building a spreadsheet.

Page 47: An overview of my PhD research

It turned out that it did. Some of the responses of users:

“This makes my job 10 times easier”

A clear sign that we were on the right track!

Page 48: An overview of my PhD research

This work was published at ICSE 2011

Page 49: An overview of my PhD research

However, unexpected things also happened. Not all spreadsheets looked as well structured as this one. Let’s look at some of them:

Page 52: An overview of my PhD research

Here, pink blocks represent worksheets outside of the spreadsheet. So this spreadsheet gathers information from over 20 other worksheets and combines this information.

Page 53: An overview of my PhD research

Users diagnosed with the diagrams

We found that, due to the diversity of the diagrams, users started to judge spreadsheets based on their dataflow diagrams. We therefore formalized this feeling users had into ‘smells’ at the design level.

These spreadsheet smells turned out to be very similar to code smells as defined by Fowler.

Page 55: An overview of my PhD research

Consider for instance the ‘feature envy’ smell. This occurs when a method from class B refers to many fields outside its own class, for example fields of class A. This method envies all the cool fields that A has, hence the name.

It is easy to see how this smell could be defined on spreadsheets, where a formula in worksheet B could be overly interested in cells on worksheet A.
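To make the spreadsheet version of this smell concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the detection used in Breviz: a formula is called ‘envious’ when it refers at least `threshold` times to cells on one other worksheet. The function name `feature_envy`, the threshold value and the sample formulas are hypothetical.

```python
import re
from collections import Counter

# Matches a sheet-qualified cell reference such as A!B2 or 'Sheet A'!C5.
SHEET_REF = re.compile(
    r"(?:'([^']+)'|([A-Za-z_][A-Za-z0-9_.]*))!\$?[A-Z]{1,3}\$?\d+"
)

def feature_envy(sheet, formula, threshold=3):
    """Return the worksheets that `formula` (located on `sheet`) refers to
    at least `threshold` times. The threshold is an arbitrary example value."""
    counts = Counter(
        quoted or bare for quoted, bare in SHEET_REF.findall(formula)
    )
    counts.pop(sheet, None)  # references to the formula's own sheet do not count
    return {target: n for target, n in counts.items() if n >= threshold}

# A formula on worksheet B that is overly interested in worksheet A.
envious = feature_envy("B", "=A!A1+A!A2+A!A3+A!A4+C1")
# A formula on worksheet B with a single reference to worksheet A.
modest = feature_envy("B", "=A!A1+C1")

print(envious)
print(modest)
```

A real detector would also have to decide how to count ranges and named ranges; this sketch counts each sheet-qualified cell reference once.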

Page 56: An overview of my PhD research

We added support in Breviz for detecting and visualizing these inter-worksheet code smells.

Page 57: An overview of my PhD research

We went back to practice

Next, of course, we went back to practice, to see how users felt about the detected smells.

Page 59: An overview of my PhD research

“That should be improved”

“This must be confusing for others”

Results showed that users understood why certain constructions were qualified as smelly.

Page 60: An overview of my PhD research

Published at ICSE 2012

Page 61: An overview of my PhD research

However, new problems were still to be discovered. We found that, once the structure of the spreadsheets had been understood and validated, complex formulas still got in the way of understanding spreadsheets.

Page 62: An overview of my PhD research

This led us to the idea of formula smells

Page 63: An overview of my PhD research

Again, we took our inspiration from the smells that Fowler defines in his canonical book on refactoring.

Page 64: An overview of my PhD research

Published at ICSM 2012

Page 65: An overview of my PhD research

In a recent extension of the paper, we also suggest refactorings corresponding to smells.

This formula, for instance, contains the same subformula twice. Extracting this subformula into a separate cell will improve readability.
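A duplicated subformula can be spotted with very little machinery. The sketch below is only an illustration, not the analysis from the paper: it collects every function call in a formula by matching parentheses and reports the calls that occur more than once. The function name `repeated_subformulas`, the `min_length` cut-off and the sample formula are hypothetical, and string literals that contain parentheses are not handled.

```python
from collections import Counter

def repeated_subformulas(formula, min_length=6):
    """Return function-call subformulas that occur more than once."""
    calls = []
    stack = []  # start index of every call or group that is still open
    for i, ch in enumerate(formula):
        if ch == "(":
            start = i
            # Walk back over the function name in front of the parenthesis.
            while start > 0 and (
                formula[start - 1].isalnum() or formula[start - 1] in "._"
            ):
                start -= 1
            stack.append(start)
        elif ch == ")" and stack:
            start = stack.pop()
            calls.append(formula[start:i + 1])
    counts = Counter(call for call in calls if len(call) >= min_length)
    return [sub for sub, n in counts.items() if n > 1]

# Hypothetical formula in which SUM(A1:A10) appears twice.
formula = "=IF(SUM(A1:A10)>100,SUM(A1:A10)*0.9,0)"
duplicates = repeated_subformulas(formula)
print(duplicates)
```

The refactoring would then move `SUM(A1:A10)` into a cell of its own and let the formula refer to that cell twice.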

Page 66: An overview of my PhD research

We went back to practice

And again... A look in practice

Page 67: An overview of my PhD research

We found that cloning (i.e. copy-pasting) in spreadsheets was a problem. If data is copy-pasted, updates will not be propagated to the copies, and that might lead to errors.

Based on existing work on clone detection in source code, we developed an algorithm to detect clones.
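The slides do not spell out the algorithm, so the following is only a simplified sketch of the underlying idea, under the assumption that a clone is a hard-coded value that equals the result of a formula elsewhere in the workbook. The function name `find_data_clones`, the cell dictionary layout and the `min_value_length` filter are hypothetical.

```python
def find_data_clones(cells, min_value_length=3):
    """Pair formula cells with constant cells that hold the same value.

    `cells` maps a cell address such as "Stock!D2" to a dict with the
    cell's `value` and its `formula` (None for a hard-coded constant).
    Returns a list of (formula_cell, constant_cell) pairs.
    """
    formula_results = {}
    for address, cell in cells.items():
        if cell["formula"] is not None:
            formula_results.setdefault(cell["value"], []).append(address)

    clones = []
    for address, cell in cells.items():
        if cell["formula"] is not None:
            continue
        # Very short values (0, 1, 10, ...) coincide by chance too often.
        if len(str(cell["value"])) < min_value_length:
            continue
        for origin in formula_results.get(cell["value"], []):
            clones.append((origin, address))
    return clones

# Hypothetical cells: Report!B7 was pasted as a value from Stock!D2.
cells = {
    "Stock!D2": {"value": 1482.5, "formula": "=SUM(B2:C2)"},
    "Stock!D3": {"value": 1, "formula": "=B3-C3"},
    "Report!B7": {"value": 1482.5, "formula": None},
    "Report!B8": {"value": 1, "formula": None},
}

clones = find_data_clones(cells)
print(clones)
```

If `Stock!D2` later changes, `Report!B7` keeps its old value, which is exactly the kind of silent error described above.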

Page 68: An overview of my PhD research

Clone visualization was added to our visualization, indicated with a dashed arrow. After all, when data is copy-pasted between worksheets, there is a dependency between those worksheets (albeit a different one than a formula link).

Page 69: An overview of my PhD research

To validate our algorithm, we performed a case study at the distribution centre of the South Dutch food bank. There, they process 100,000 kilos of food per month, and keep track of that with spreadsheets.

We were able to detect 61 near-miss clones, of which 25 were actual errors.

Because of our analysis, this distribution centre is now running error-free spreadsheets!

Page 70: An overview of my PhD research

To be published at ICSE 2013

Page 71: An overview of my PhD research

And this paper concluded my PhD thesis.

I will continue to work on spreadsheet analysis for at least five more years at Delft University of Technology, so in the remaining few slides, I’ll outline what I will be working on in the future.

Page 72: An overview of my PhD research

Remember that spreadsheets stay in business for 5 years and are used by 12 people during their life span? This makes it interesting to consider ‘spreadsheet evolution’ and study how spreadsheets are created.

Page 73: An overview of my PhD research

Visual Basic Analysis

In our current visualization and analysis technique, we only consider formulas. However, spreadsheets also allow for code to interact with data and formulas (VBA code in Excel). By analyzing this, we could make our analysis more complete and interesting.

Page 74: An overview of my PhD research

Spreadsheet testing

Finally, we want to research how spreadsheet users test. One might think that spreadsheet users do not test, but this is not true.

Page 75: An overview of my PhD research

In our previous studies, we often saw formulas like this one. Here, nothing is really calculated. Instead, some sort of validation is performed: if ‘find zone’!W3 is smaller than 0, we are not interested in the value.

If we could extract these types of formulas, we could use them to test the spreadsheet.
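As a rough idea of what such an extraction could look like, the sketch below recognizes formulas whose outermost function is an IF that compares a single cell with a number, and returns that comparison as a candidate test. This is future work in the slides, so everything here is an assumption: the function name `extract_check`, the pattern, and the sample formula, which is made up and is not the formula shown on the slide.

```python
import re

# An IF at the top level whose condition compares one cell with a number,
# e.g. =IF('find zone'!W3<0, ..., ...). Deliberately narrow.
CHECK = re.compile(
    r"""^=IF\(\s*
        (?P<cell>(?:'[^']+'!|[A-Za-z_]\w*!)?\$?[A-Z]{1,3}\$?\d+)\s*
        (?P<op><=|>=|<>|<|>|=)\s*
        (?P<bound>-?\d+(?:\.\d+)?)\s*,""",
    re.VERBOSE | re.IGNORECASE,
)

def extract_check(formula):
    """Return (cell, operator, bound) if `formula` is a validation-style IF,
    otherwise None."""
    match = CHECK.match(formula)
    if match is None:
        return None
    return match.group("cell"), match.group("op"), float(match.group("bound"))

# Hypothetical formula in the spirit of the slide's example.
check = extract_check("=IF('find zone'!W3<0,\"\",'find zone'!W3)")
no_check = extract_check("=SUM(A1:A3)")
print(check)
print(no_check)
```

Each extracted triple could then be evaluated against the current cell values, much like an assertion in a unit test.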

Page 76: An overview of my PhD research

Analyzing and visualizing spreadsheets Felienne Hermans

Thanks for reading about the research adventure I have been enjoying for the past 4 years!

If you want to know more, have a look at my blog: www.felienne.com

If you are interested in collaborating, please send me an email or a tweet @felienne