uk xl user conference 2006 spreadsheet design concepts simon murphy [email protected]...

39
UK XL User Conference 2006 Spreadsheet Design Concepts Simon Murphy [email protected] Developer – Codematic Ltd

Upload: dania-wells

Post on 14-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

UK XL User Conference

2006

Spreadsheet Design Concepts

Simon [email protected] – Codematic Ltd

UK XL User Conference

2006

Spreadsheet background• Up to 200 Mb size• Up to 1 Million formulas• 1-10,000 unique formulas• 5-10,000 lines of VBA• £Billions in values• Often linked to other technologies such as OLAP,

ADO, COM or .net etc.• Finance, Banking and Sales and Marketing areas• Development cost up to $1M• Active member of Eusprig – European Spreadsheet

Risk group – dedicated to raising awareness of dangers and error rates in commercial spreadsheets www.eusprig.org

UK XL User Conference

2006

Agenda• Fundamental Imperative• Security• Software Development Lifecycle• Design

– Technology choice– Inputs– Data– Logic/Formulas– Documentation

• Summary• Any Questions

UK XL User Conference

2006

Fundamental Imperative• Manage complexity (McConnell)

• Solution complexity grows at 4 x the rate of problem complexity. (Glass)

• [Things] should be as simple as they can be, but no simpler (Einstein)

• K.I.S.S.

• This principle should drive all other work.

• Easier to build, easier to test, easier to document, easier to use, etc… [No conflict]

UK XL User Conference

2006

Security• Is everybody’s concern• Spreadsheets can be used as a staging board for privilege

escalation (with your login details!)• Consider SD3 +C

– Secure by• Design• Default• Deployment• Communication

• Threat Modelling – Assets, Threats • Threat Types – STRIDE

– (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege)

• Threats – rate with DREAD– (Damage potential, Reproducibility, Exploitability, Affected Users,

Discoverability)• Spreadsheets (all flavours) are fairly insecure

– Compiled UDFs (.net, COM, XLL) and Database servers can help• Set macro security to high and use code signing certificates.• See Microsoft MOC 2840A – Implementing security for more info.

UK XL User Conference

2006

Software Development Lifecycle• Systems Development lifecycle

– Requirements,– Analysis, – Logical Design,– <Technology Choice>, – Physical Design,– Construction, – Test, – Release, – Maintain. – In some shape or form.

• Spreadsheet Development lifecycle – “Oh! I need a model” – clickety-click, reasonableness check, release, (Test & Maintain in live environment).

UK XL User Conference

2006

Big Design Issues• Project Scope• Business needs• Security• Performance• Maintainability• Extensibility• Availability• Scalability• Human factors• Integration• Methodologies

UK XL User Conference

2006

Design – Excel or not• Excel/VBA is often not technically the best

solution– Databases better for large volumes of data– Compiled languages better for security– Spreadsheets are a 2 dimensional tool where

most business problems are 5+ dimensions

• It is good when considering– Initial speed of development– Cost of initial development– Current skill sets– Simple initial deployment

Note: If you need to restrict Excel functionality you may be better outside Excel

UK XL User Conference

2006

Design – 2 Questions, 2 Approaches

• 2 Questions– What will happen when things go right?– What will happen when things go

wrong?

• 2 Approaches:– What will the system do?– What real world objects am I modelling?

UK XL User Conference

2006

Assuming Spreadsheet

• Design– N-Tier– Defensive Designs– Inputs– Data– Layout– Formulas– VBA– Extending Excel– Documentation

UK XL User Conference

2006

N-Tier

• Basic version is 3 tier– Data layer– Logic (or analysis) layer– Presentation layer

• Each of these may be further broken out (into N tiers)

• Easily implemented in worksheets• Suitable for most non trivial workbooks• Powerful and flexible but adds weight

UK XL User Conference

2006

Basic design (N-tier)

Inputs

Reference Data

Logic

Reports

Assumptions

WorkbookStructure

Accountability

Revision history

Model Support structure

Demo

UK XL User Conference

2006

Defensive Designs• How will I test this?• Clearly identify Input areas• Group items that need updating with similar frequency• Place formulas where they are safe from accidents• Don’t use sheet protection, its weak, annoying, counter

productive, and reduces peoples ability to check and understand your logic (it creates more problems than it solves),(Trust – but check) Use a compiled component if security is a genuine concern. Demo

• Use cross checks• Use a (simple) consistent convention to separate items that

should be treated differently (inputs/formulas or actual/forecast)• Use blanks cells around different blocks (enables ‘Current

Region’)• Consider using a status sheet with summary error reports on it• Consider support structure – Index, assumptions, accountability.• Consider personal and tool comfort zone• Place check controls where they will be noticed

UK XL User Conference

2006

Inputs• Never trust inputs• Always test for correctness as soon as possible• Limit inputs wherever possible (eg option buttons,

or drop downs)• Use data validation but be aware of its limits demo• Help users by clearly identifying what is needed• Give clear feedback on errors or problems• Minimise input – Let Excel do the work• Have validation formulas nearby• If test is ok say “ok” (or “Row/Column check ok”)

not “”• Prefer Text in cells rather than comments

UK XL User Conference

2006

Database (Relational)• Useful for flexible and efficient storage and updating• Identify Entities (nouns), Attributes (adjectives) and

relationships (verbs) in problem statement• 2 approaches

– Top down (identify entities then attributes)– Bottom up (group attributes to describe entities)

• Data normalisation– Basically keep related items together– Provides design flexibility– 3rd Normal Form (TNF/3NF)

• All items depend on the key (a unique identifier), the whole key and nothing but the key.

• More useful at the transaction processing end rather than analysis and reporting. (eg sales recording)

• Analysis performance can be poor• PL demo

UK XL User Conference

2006

Database (OLAP)• Useful for flexible/powerful reporting.• The most useful concept of Dimensionality – the number of

ways to describe something• In a spreadsheet it’s the row, column, worksheet, workbook,

and maybe directory.• Very useful for things that may ordinarily be missed• How to describe the numbers (dimensionality)

– Eg Inflation <Time?, Country?, Cost Type?>– Eg. P&L value <P&L line description, Time, Business unit, value

type (actual, budget etc), currency, Company etc>– Useful for layout – time in cols or time in sheets?

• Hierarchies are ways of adding up dimension elements– There may be more than one hierarchy for any given dimension– Eg time– day > week > quarter > financial year– day > calendar month > calendar year

• Reporting performance can be excellent as many calculations pre-aggregated

UK XL User Conference

2006

Spreadsheet eg. of Dimensionality

UK XL User Conference

2006

Spreadsheet eg. of Dimensionality 2

338 is

Period 8 2005

Actual Figure

For Employment Costs

For the North

For Large Co

UK XL User Conference

2006

Pivot Tables

• Probably the most important feature in spreadsheets

• A superb way to manage complexity• Most users can’t access them because their

data is already half pivoted.• Repeated blocks are a strong hint to use

pivots• Pivot source data should have 1 one column

with numbers in (slightly simplified)• Demo

UK XL User Conference

2006

Software Development Principles

• Modularisation• Cohesion• Coupling• Fan in / Fan out

UK XL User Conference

2006

Modularisation• Basic idea – to break down complexity into understandable chunks

(note Millers Theorem (7+/-2))• Advantages

– Simplifies and adds analysis layers– Adds flexibility– Improves robustness– Reduces dependencies– Improves testability

• Disadvantages/Limitations– All Cells can be read from everywhere– Can add redundancy– Not really applicable for very small models

• Use of a block of cells for one (single) task

Volume Calcs (eg Geographic)

Total Volumes

Volume Summary

Total Revenue

Sales Price

UK XL User Conference

2006

Worksheet modularisation

UK XL User Conference

2006

Worksheet modularisation 2

UK XL User Conference

2006

Worksheet modularisation 3 (bad)

UK XL User Conference

2006

Modular blocks

A ‘block’ is a area of cells surrounded by empty cells, that performs some analysis

UK XL User Conference

2006

Cohesion

• How interrelated a unit is• High cohesion means all elements are

highly interrelated – this is good, it aids understanding and reduces range of influence

• Low cohesion makes things harder to understand like random letters

• Cohesion is like well normalised data – but also considers what the unit does

• Cohesion test…

UK XL User Conference

2006

Cohesion 2

• A<kfj\s =]’#\kdjw487

• Give me all your chocolate

UK XL User Conference

2006

Cohesion 3

• Who can remember the first string?

• The second?

UK XL User Conference

2006

Cohesion 4

• Keep different shaped data blocks apart

• Either diagonally or on separate sheets

• Try to make all formulas in a block similar

UK XL User Conference

2006

Poor Cohesion

UK XL User Conference

2006

Poor Cohesion 2

UK XL User Conference

2006

Coupling

• How strongly 2 separate elements depend on each other

• Low coupling is better, especially through clearly defined interfaces

• High coupling often means hidden dependencies which generally leads to incorrect modifications (side effects)

• Example: hard coded cell addresses in VBA code

UK XL User Conference

2006

Fan in / fan out

• Low fan in – a ‘unit’ (cell/worksheet/VBA routine) depends on only a few other units– Good because it minimises dependencies and

reduces complexity• High fan out – a ‘unit’ (cell/worksheet/VBA

routine) is used by many others– Good because it minimises duplication

• Example: putting VAT (Sales tax) rates in their own cells, and referring there in calculations

• Example: calculating an offset once and using the result many times

UK XL User Conference

2006

Logic/Formulas• Sketch design on a whiteboard first• Don’t use IF(ISERROR(, be specific demo• Put expected part first in Ifs• Don’t start in A1 – Give yourself room to manoeuvre. Try

D10 – hide the unused.• Line sheets up on first data cell rather than headers• Don’t hide rows and columns for visual effect, use a

separate sheet if required• Use Goal seek and or VBA rather than circular references• Avoid more than 2 or 3 levels of formula nesting – break it

out across several cells.• Use tools – many pay for themselves on first use. But don’t

totally rely on them.• Place totals above and to the left of details (more flexible

and robust eg links, filtering)• Build for testability – how to test you did what you meant to

do

UK XL User Conference

2006

VBA Connection• VBA UDFs should get all range info via parameters• Where (non UDF) VBA uses worksheet ranges, these should almost

certainly be named ranges.– Sheet1.[Inflation].value = 0.02, rather than– Sheet1.[C5].value = 0.02, which may become invalid if rows or

columns are inserted or deleted– strInflationRange = “C5” is just as bad – it creates a hidden

dependency that must be manually updated when worksheet changes (poor coupling)

• Variable and routine names:– Use very meaningful names (8-30 chars length)– Use a simple naming convention (matched pairs)– Use scope prefixes (g, m), and minimise it– Don’t have to use data type variable prefixes (s, str, l, lng etc) (see .net

advice)– avoid abbreviations

• Don’t use code comments demo make code ultra clear instead• Option Explicit on • Avoid Application.Run (non VBA – breaks error management)• routines 1 screen long max• 3-4 levels of nesting max• Use source control

UK XL User Conference

2006

Extending Excel

• Excel is powerful not perfect• Leverage benefits whilst managing

weaknesses by using complimentary technologies

• Data: VBA, ADO, ODBC, OLAP, .net, COM, Info Bridge, XML, DDE, Web Queries, SOX Solution Accelerator

• Logic: VBA, COM, .net, xll, Pivots, Filters• Search the web for vast array of free or

cheap tips and tools.

UK XL User Conference

2006

Documentation• If you design for simplicity only very minor additional

documentation is needed• Design and build the user Interface with the Users

needs primary, the documentation will automatically be there

• Design and build the business logic parts with the maintainers needs primary, the main documentation will automatically be there, expand as required.

• Integral tests should explicitly clarify intent.• Reports should contain enough description to be

meaningful.• External documentation is almost always so out of date

its worse than useless.• Excessive documentation is too hard to plod through• Poorly targeted documentation is pointless• Working software is more useful than documentation• Sometimes documentation is important

UK XL User Conference

2006

Summary• Manage complexity• Consider security carefully• Excel/VBA not the best tool for everything• Be defensive, especially with inputs• Understand your data• Aim for cohesive models with low coupling• Use names to connect VBA to worksheets• Use complimentary technologies where

appropriate• Manage documentation

UK XL User Conference

2006

Questions?

[email protected]– Spreadsheet consulting, reviewing,

maintaining, rescuing, migrating, add-in development etc.

– Staff coaching, mentoring and training

• Websites– www.codematic.net– www.xlanalyst.co.uk