
Designing dashboards – visualizing software metrics for Continuous Delivery

Fanny Chan

KTH ROYAL INSTITUTE OF TECHNOLOGY

Electrical Engineering and Computer Science


Abstract

Feedback is an essential part of the software delivery process. Software metrics, as feedback, can provide knowledge about the essential parameters that affect the software development process. An improved understanding of the software development process can facilitate more effective software management. With new software development methodologies emerging, such as Continuous Delivery, new information needs arise. The new methodology requires a new way of thinking when designing and developing dashboards for software development. A dashboard is a communication tool that can provide up-to-date information through at-a-glance interaction. The purpose of this thesis was to investigate how different software metrics related to Continuous Delivery can be visualized in a dashboard system at the company Saab. This thesis used a user-centered approach to find the appropriate visualizations and user contexts to provide the user with feedback that supports software development. The thesis work included user observations in the form of interviews and contextual inquiry. Thereafter, prototyping and usability testing were conducted in two iterations to design the dashboard and gather feedback. The result of this thesis work was a final prototype that was implemented in the program Kibana using real-time data from a software project. Based on the findings of this study, this thesis presents a set of elements that should be included when designing a dashboard for software development.

Keywords: Continuous Delivery; Software metrics; Dashboard design; Information radiator


Sammanfattning (Swedish Abstract)

Feedback is one of the essential building blocks of a software delivery process. With software development metrics, the organization can gain greater knowledge of the essential factors that affect software development. An increased understanding can lead to more effective management of software development. With new methodologies in software development, e.g. Continuous Delivery, the need for feedback changes and new challenges arise. The new methodology requires new approaches when designing and developing information radiators, a type of dashboard for software development. A dashboard is a communication tool that can contribute up-to-date information about the current situation by providing an at-a-glance overview. The goal of this thesis was to investigate how software development metrics related to Continuous Delivery can be visualized on a dashboard at the company Saab. This thesis work used a user-centered method to investigate which visualizations were appropriate to use and in which contexts the users wanted feedback. The thesis work carried out user observations in the form of interviews and contextual inquiry. Prototyping and usability testing were then performed in two iterations to gather feedback and design the final prototype. That prototype was implemented in the program Kibana and used real-time data from a project. The result of this thesis is a proposal for which elements should be taken into account when designing a dashboard for software development.

Keywords: Continuous Delivery; Software metrics; Dashboard; Information radiator


Table of Contents

1 Introduction
   1.1 Background
   1.2 Problem
   1.3 Purpose
   1.4 Goal
   1.5 Commissioned work
      1.5.1 Division of work
      1.5.2 Benefits, Ethics, and Sustainability
   1.6 Methodology and methods
   1.7 Delimitations
   1.8 Outline

2 Continuous Delivery
   2.1 The concept of Continuous Delivery
      2.1.1 The deployment pipeline
      2.1.2 Maturity model
   2.2 Software metrics
   2.3 Saab's deployment pipeline

3 Dashboard design
   3.1 Dashboard
   3.2 Information visualization
      3.2.1 Raw data
      3.2.2 Data structures
      3.2.3 Visual structures
      3.2.4 Views
      3.2.5 Interaction
   3.3 User-centered design
   3.4 Related work

4 Method
   4.1 Overview
   4.2 Literature study
   4.3 User observation
      4.3.1 Interview
      4.3.2 Contextual inquiry
   4.4 Ideation session
   4.5 Lo-Fi prototype
   4.6 Hi-Fi prototype
   4.7 Usability testing
      4.7.1 Lo-Fi prototype
      4.7.2 Hi-Fi prototype
      4.7.3 Number of test users
   4.8 Implementation

5 Interview and usability test result
   5.1 User observation
      5.1.1 Interview
      5.1.2 Contextual inquiry
      5.1.3 Software metrics in the lo-fi prototype
   5.2 Lo-fi prototype
   5.3 Usability test result from lo-fi prototype
      5.3.1 Build status dashboard
      5.3.2 Team dashboard
      5.3.3 Feature dashboard
      5.3.4 General feedback
   5.4 Hi-fi prototype
   5.5 Usability test result from hi-fi prototype
      5.5.1 Team dashboard
      5.5.2 Feature dashboard
      5.5.3 Overview dashboard
      5.5.4 General feedback

6 Dashboard prototype and design guidelines
   6.1 Final prototype in Kibana
   6.2 Design guidelines for designing dashboards
      6.2.1 Importance of information quality
      6.2.2 Context and interaction
      6.2.3 Dashboard depending on contexts
      6.2.4 Visualizations based on thresholds
      6.2.5 Standardization and flexibility

7 Discussion
   7.1 Limitations of the study
      7.1.1 Dissonance between thesis projects
      7.1.2 Technical limitations
   7.2 Continuous delivery maturity model and dashboard
   7.3 Selection of visualization

8 Conclusion and future work
   8.1 Conclusions
   8.2 Future work

References

Appendix A

Appendix B

Appendix C


List of Figures

Figure 2.1. A basic illustration of the pipeline by Humble and Farley (2010)
Figure 2.2. Saab's current implementation of the deployment pipeline
Figure 2.3. Saab's deployment pipeline with used tools included
Figure 3.1. Degree of visual emphasis on a dashboard (Few, 2006)
Figure 3.2. The process of generating a graphical representation
Figure 3.3. Example of graphical elements and properties
Figure 3.4. The user-centered design process
Figure 3.5. Jenkins Build Monitor Plugin (Jenkins, 2017)
Figure 3.6. ElectricFlow with one of its dashboard views
Figure 5.1. Paper prototype with build status for whole project and metrics
Figure 5.2. The paper prototype for a team
Figure 5.3. The paper prototype for a feature
Figure 5.4. Hi-fi prototype for team dashboard
Figure 5.5. Hi-fi prototype for the second version of team dashboard
Figure 5.6. Hi-fi prototype of a feature branch dashboard
Figure 5.7. Hi-fi prototype for a feature dashboard
Figure 5.8. Hi-fi prototype for an overview dashboard
Figure 6.1. Prototype in Kibana
Figure B.1. The result of the ideation session
Figure C.1. Build status for all teams with additional metrics
Figure C.2. Paper prototype with build status in a network graph
Figure C.3. Paper prototype for a team dashboard


List of Tables

Table 2.1. Maturity model for Continuous Delivery at Saab
Table 3.1. Encoding different data formats with preattentive attributes
Table 5.1. List of software metrics with description (Johansson, 2018)
Table 5.2. Software metrics included in the hi-fi prototype


1 Introduction

IT organizations have been moving from traditional development methodologies, such as the waterfall model, to more flexible and iterative ways of working (Sturm et al. 2017). To meet fast market changes and create more reliable and stable releases, new methodologies have been developed to tackle these issues. One of them is Agile, a software methodology consisting of a set of practices that work towards iterative and incremental development cycles. The Agile Manifesto presents the principles behind it and states the most important objective: "Our highest priority is to satisfy the customer through early and continuous delivery of valuable software" (Agile Manifesto, 2001). Continuous Delivery is a software development methodology that builds on Agile practices (Wolff, 2017). Continuously delivering features in a complex software product that exceeds a million lines of code requires extensible, reliable, and well-designed software architectures that provide short feedback loops (Staron et al. 2014). Hence, new information needs regarding monitoring the product's quality emerge. Consequently, this requires new ways of thinking when designing and developing information radiators, dashboards that visualize software development processes (Staron et al. 2008).

1.1 Background

IT organizations face a number of challenges when working with software deployment. Software releases are often seen as a complicated process that is only performed a few times a year (Wolff, 2017). Consequently, big batches of newly developed features are accumulated and released at the same time. Releasing a big software change exposes the organization to a higher risk of finding major issues that can cause the software to fail. Another challenge is long deployment lead time. Lead time is the time taken to complete a deployment request, while processing time is the time the task is actively worked on (Kim et al. 2016). Having a long lead time and a short processing time means that a feature request mostly waits in queues to be completed. Large and complex organizations that work with tightly coupled, monolithic applications can have deployment lead times counted in months. The software development methodology called Continuous Delivery has been developed to deal with this set of challenges. The concept of Continuous Delivery is to shorten the software delivery process and produce software changes in short cycles while at the same time ensuring reliability. The goal is to reduce or eliminate unnecessary steps in the process. One part is to automate recurring tasks and create a reusable deployment process. In this way, the deployment process can reduce the cost, time, and risk of delivering software changes (Chandrasekara, 2017).

Humble and Farley (2010) state that feedback is at the heart of any software delivery process. In software development, software metrics are used to give insight into the development process and whether the software development teams are meeting the expected goals. With Continuous Delivery, traditional


software metrics might not be applicable anymore, and new software metrics are needed (Lethonen et al. 2015). Humble and Farley (2010) also emphasize the importance of improving feedback by making feedback cycles short and visible to everyone in a hard-to-avoid manner. One way to achieve shorter feedback cycles and make them visible is to broadcast the information using information radiators, dashboards used for software development (Cockburn, 2007). Dashboards are information tools that present information using graphical visualizations. The study of using graphical representations to display information is called information visualization. This thesis studies how feedback with software metrics should be visualized and how dashboards can be used to broadcast information in a software organization.

1.2 Problem

When deploying Continuous Delivery in the development process, the organization needs to ensure that the automated deployment pipeline is delivering the expected outcome. Always having code and infrastructure in a deployable state requires that the software is monitored and that feedback is given to the organization as soon as possible. Moreover, continuous delivery of complex software with a big code base requires reliable infrastructure that can give stable releases. To provide better insight into the development process, software metrics need to be visualized appropriately so that users can take further action. This thesis investigated how different software metrics can be visualized in a dashboard that gives the user quick, relevant, and easily understandable feedback from the Continuous Delivery deployment pipeline. From the above-mentioned problems, the following research question is formed: How can software metrics be visualized to provide quick and easily understandable feedback about the software development process?

1.3 Purpose

The purpose of this thesis was to investigate how different software metrics regarding Continuous Delivery can be visualized in a dashboard. This thesis presents how a user-centered design approach can be used to design this kind of dashboard and to find the appropriate visualization to provide feedback to the user.

1.4 Goal

The goal of this thesis was to design a dashboard system that visualizes software metrics relevant for the deployment pipeline at the company Saab. The dashboard should effectively provide quick and easily understandable feedback, so that the progress of the software development can be assessed.

The expected outcome of this thesis project is a prototype of a dashboard and a set of design guidelines for designing an information radiator. The dashboard with visualizations can give the user insight into the current state of progress. Ultimately, this thesis contributes to a better understanding of designing


dashboards that are used to monitor software development. Furthermore, this thesis also contributes to an understanding of how a user-centered approach can be used to create a dashboard for software development.

1.5 Commissioned work

The company Saab commissioned this thesis work. Saab works with products and solutions for military defense and civil security. This thesis was devoted to supporting the software development at the business area Surveillance. The products developed at this department include both software and hardware that are integrated to create a complete solution. The department was deploying Continuous Delivery in its development process. One goal of the Continuous Delivery transformation was to implement an information radiator that could inform the organization of how the software development process was performing. Four different development teams were included in this thesis work when designing the dashboard.

1.5.1 Division of work

This thesis work was conducted alongside the thesis "Continuous delivery: Improving feedback with a user-centered approach" by Johansson (2018). Johansson's thesis investigates which software metrics are appropriate to use with Continuous Delivery, with Saab as a business case. This thesis incorporated findings from Johansson's study and aimed to find the most appropriate way to visualize the software metrics. The collaborative parts of the theses were the interviews during user observations, the brainstorming session on software metrics, and the implementation of the final prototype. The authors designed the questions and conducted the interviews together. Thereafter, a brainstorming session was run to discuss which software metrics should be included in the first prototype. The implementation of the final prototype included retrieving the data and designing the dashboard according to the findings of this thesis work.

1.5.2 Benefits, Ethics, and Sustainability

This thesis contributes to Continuous Delivery by investigating how software metrics can be visualized in a dashboard to support the software delivery process. Regarding benefits, the dashboard can help the organization gain a better understanding of its processes by providing useful and easily understandable feedback. In the long term, improved tools for detecting errors in the software delivery process can help the organization eliminate errors faster, creating a more stable delivery process with fewer errors and catching fatal errors earlier in the process.

One risk within this thesis project is that a poor dashboard design can have the reverse effect and reduce effectiveness if the design has fatal flaws in usability and user experience. For instance, users may not understand the information or may misinterpret it and therefore take the wrong action. This could lead to a worsened working environment, and the dashboard would not be able to reinforce a stable and sustainable delivery.


One ethical aspect of this thesis is that it involves interviewing people and testing subjects in the usability tests. It is important that the interviewees be informed about their rights, e.g., that they can choose to stop the interview at any time. As an interviewer, it is important to document the answers from the interviews as exactly as possible, in order not to bias or misrepresent the users. In this thesis, all interviewees are kept anonymous, and recorded materials were deleted after transcription.

1.6 Methodology and methods

Two common research approaches are used when conducting research: the inductive and the deductive approach. An inductive research method starts with observations and data collection. Based on the data and findings from the observations, theories are formulated and proposed at the end of the research process. On the contrary, a deductive research method aims to test a theory or a hypothesis. The hypothesis is formulated at the beginning of the research and then tested through empirical observation (Lancaster, 2004).

There are two data collection methods: quantitative and qualitative data collection (Lancaster, 2004). Quantitative data collection gathers quantifiable data, while qualitative data is collected through detailed observations or interviews. Quantitative data is considered to be more objective compared to qualitative data, as it can be classified and is absolute. Although qualitative data is thought to be more subjective, Vine (2011) states that qualitative methods might lack reliability but have higher validity than quantitative methods.

This thesis project was conducted with an inductive research approach along with qualitative data collection. This research approach was considered the most suitable for this thesis project, as the goal was to find patterns from observations rather than confirming a defined hypothesis. The data collection was qualitative, gathered from interviews, observations, and usability tests. The thesis work started with a literature study to gain insight into the field. Then, user observations in the form of interviews and contextual inquiry were conducted. The interviews were semi-structured, because the goal was to gain an in-depth understanding of the users' needs while limiting the scope of the interviews to the specific topic. The contextual inquiry was used to understand the different contexts of the users and to gain insight into how they interacted with each other and with systems relevant to this thesis project. Prototypes were designed based on the gathered data and were used for usability testing to collect feedback. This process was iterative, and two sessions of usability testing were conducted in total.

1.7 Delimitations

One delimitation was that this thesis did not make an extensive study of software metrics. Instead, the selected software metrics were based on the cooperating study by Johansson (2018). This thesis aimed to understand how the software metrics are best visualized and used to provide feedback to the organization. A comprehensive evaluation of the final prototype was not conducted at the end


of the study, because the goal was to find an appropriate dashboard design and visualizations with a user-centered approach. An exhaustive evaluation was therefore considered outside the scope of this study. Usability testing was instead performed throughout the thesis on the prototypes that were created for the dashboard. One technical delimitation in this thesis project was the exclusion of multimodality. Multimodal elements, e.g., audio or light, were not considered for inclusion in the dashboard. The objective of this thesis work was to design a dashboard that would be appropriate to use in Saab's office landscape. Therefore, elements that could be considered disturbing in that specific environment were not part of this study. In this thesis study, the final prototype was implemented in the software Kibana. The choice of software was predefined by the company Saab, and therefore no other software or tools were investigated.

1.8 Outline

This thesis includes the following chapters:

Chapter 2: Continuous Delivery: This chapter presents the software development methodology Continuous Delivery and software metrics.

Chapter 3: Dashboard design: This chapter reviews the concepts of dashboard design and information visualization.

Chapter 4: Method: This chapter reviews the methods selected to conduct this thesis project.

Chapter 5: Interview and usability test results: This chapter presents the results from the user observations, the designed prototypes, and the feedback from the usability testing.

Chapter 6: Dashboard prototype and design guidelines: This chapter presents the final dashboard prototype and provides design guidelines for a dashboard for software development.

Chapter 7: Discussion: This chapter discusses the outcome of the study and factors that may have influenced the thesis work.

Chapter 8: Conclusion and future work: This chapter summarizes the thesis and concludes with potential future work.


2 Continuous Delivery

In this chapter, the concept of Continuous Delivery is described. First, an introduction to Continuous Delivery is presented and thereafter the deployment pipeline is described. Then, software metrics are described. Last, this chapter includes a description of the company Saab’s current deployment pipeline.

2.1 The concept of Continuous Delivery

Continuous Delivery evolved from Agile software development. The core of Continuous Delivery is to create a deployment pipeline which ensures that the code and infrastructure are always in a deployable state (Kim et al. 2016). The deployment pipeline automates the software rollout to a large extent and reinforces low-risk processes for deploying new releases. Furthermore, Continuous Delivery emphasizes continuous and frequent deployments, rather than the traditional practice of performing releases only a few times each year.

According to Wolff (2017), the most prominent effect of deploying Continuous Delivery in an organization is that it can bring features to the market faster and into production with substantially higher reliability than before. Humble and Farley (2010) state that the aim is threefold. First, it aids collaboration, as the process of building, deploying, testing, and releasing is visible to everyone. Second, it provides fast feedback and allows teams to identify and resolve problems as early as possible in the process. Third, the fully automated pipeline enables teams to deploy and release any version of their software to any environment at will.

2.1.1 The deployment pipeline

The deployment pipeline is an automated manifestation of the process of getting software from version control into the hands of the users. The objective is to eliminate, as early as possible, the risk of releasing changes that can cause errors in the system. Continuous Delivery is preceded by Continuous Integration, a software development practice which aims to reduce integration problems in development teams (Lethonen et al. 2015).

The deployment pipeline consists of five stages: commit, acceptance test, capacity test, user acceptance test (UAT), and release (Humble and Farley, 2010). An illustration of the five stages can be viewed in Figure 2.1. An instance of the pipeline starts when a developer commits a change to the version control system. This action triggers an event in the continuous integration management system. The change goes through the test suite included in the commit stage. A successful build and test result triggers the acceptance stage. If the acceptance test is also successful, the pipeline branches into independent deployments to different environments for capacity testing, UAT, and release. However, if the change generates a failing result at any of these stages,


the pipeline will stop, and the changes are passed back to the developer to resolve the problem.

Figure 2.1. A basic illustration of the pipeline by Humble and Farley (2010)

Commit

The deployment pipeline starts with the commit stage. This stage compiles, runs unit tests, performs code analysis, and creates binaries. The purpose of unit tests is to test the small modules of the application. Code analysis is performed to measure code quality and obtain useful diagnostic data (Wolff, 2017). The commit stage is seen as the primary defense against inadvertent changes to the system, as it gives the developers rapid feedback on the most common errors (Humble and Farley, 2010).

Acceptance testing

Acceptance testing includes automated tests that verify whether the features and functions fulfill the acceptance criteria based on the customer's requirements. One of the goals of Continuous Delivery is that a feature should be tested at every commit, which means the amount of acceptance testing will increase. Manual testing would result in considerable testing cost and entail problems such as unreproducible environments or bugs.

Capacity testing

Capacity testing ensures that the software provides the necessary performance. The capacity test should be performed after each commit, as this ensures that performance does not decline when a new feature is added (Wolff, 2017). If several features were tested at the same time, it would be harder to investigate which feature is causing performance to suffer.

Exploratory testing

Exploratory testing is mainly manual testing of new domain features and unanticipated behavior. Exploratory testing is preferable for non-functional requirements, including usability, design, and security requirements. Experts of the domain should test the application, as they have insight into the domain context and can provide more reliable feedback.

Production

The final stage of the pipeline is deploying the software into production. Throughout the whole pipeline, the tests are completed in reproducible environments that are highly similar to the production environment. Wolff (2017) argues that Continuous Delivery changes the view of deploying, as it becomes just another execution of the process usually performed when setting up the different test environments.
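To make the fail-fast behavior of the pipeline concrete, the sketch below (not from the thesis) models a pipeline instance as an ordered list of stages that stops at the first failure, so the change can be passed back to the developer. The stage names and check functions are illustrative placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str
    check: Callable[[str], bool]  # hypothetical stage check: True means the stage passed

def run_pipeline(commit_id: str, stages: List[Stage]) -> bool:
    """Run the stages in order and stop at the first failure,
    mirroring how a failing stage halts the deployment pipeline."""
    for stage in stages:
        if not stage.check(commit_id):
            print(f"{stage.name} failed for {commit_id}; change returned to developer")
            return False
        print(f"{stage.name} passed for {commit_id}")
    print(f"{commit_id} is releasable")
    return True

# Example run with placeholder checks that always pass.
stages = [Stage(n, lambda c: True)
          for n in ("commit", "acceptance test", "capacity test", "UAT", "release")]
run_pipeline("change-42", stages)
```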


2.1.2 Maturity model

A Continuous Delivery transformation should be completed gradually, and maturity models have been developed to assess an organization's maturity regarding its practices and to identify improvement areas. A maturity model can provide a clearer view of the objectives that need to be achieved. Several versions of maturity models have been developed; e.g., Humble and Farley (2010) designed a maturity model for configuration and release management. That maturity model includes areas such as build management and continuous integration, release management and compliance, and data management, and is based on the authors' experiences of consulting for different organizations. However, maturity models can be adjusted and specified for an organization. A maturity model has been developed at Saab and can be reviewed in Table 2.1. The levels go from base to expert and cover the categories build, architecture, test, culture, and visibility. In Table 2.1, creating radiators in the workspace that show real-time status is one of the goals in the category visibility. This thesis work contributes to accomplishing that goal.

2.2 Software metrics

The emergence of new development practices has meant that traditionally used software metrics may no longer apply. The outcome of traditional techniques for measuring software development may be dubious to the extent that it becomes irrelevant for its purpose (Misra and Omorodion, 2011; Lethonen et al. 2015). The goal of defining and measuring software metrics is to gain knowledge about the essential parameters that affect software development. Furthermore, improved use of software metrics can facilitate more effective software management (Mills, 1998). Humble and Farley (2010) emphasize that feedback is an essential part of the software delivery process. They also point out that feedback cycles should be short and visible. One way to achieve this is to continually measure and broadcast the result, e.g., with an information radiator. An information radiator is a display, electronic or in any other medium, that contains information of interest and can update the observer at a glance (Cockburn, 2007). The term information radiator was coined in connection with agile software development to improve communication within the team as well as with stakeholders. Ideally, an information radiator should be large and visible and contain continuously updated information (Agile Alliance, 2018a). Cockburn essentially defines an information radiator as a kind of dashboard. In this thesis, the term dashboard will be used, meaning a dashboard used in software development.


Table 2.1. Maturity model for Continuous Delivery at Saab.

Build
- Base: Nightly builds; artifacts are managed
- Beginner: Auto-triggered builds; correction of broken builds (green floor)
- Intermediate: Deploy within the hour
- Advanced: Auto-triggered notification to developers when a build fails
- Expert: Zero-touch continuous deployments

Architecture
- Base: The organization takes responsibility for the architecture and supports functional changes
- Beginner: Modular and loosely coupled architecture
- Intermediate: Developers must think about the complete system
- Advanced: Activity sequencing – scheduling of prototyping and pre-studies related to architecture and system design

Test
- Base: Pre-tested commits; fast feedback; test selection on event basis and schedulers
- Beginner: Static code analysis; code coverage; unit tests
- Intermediate: The release branch is pristine; automated release notes & software version description; one-click releases
- Advanced: Automated regression tests; automated test selection in real time
- Expert: Automatic acceptance and performance tests; risk-based manual testing; verify the expected business value

Culture
- Base: Prioritized backlog; defined and documented processes
- Beginner: Basic agile methods; frequent commits; all commits are automatically tied to a Jira ID
- Intermediate: Remove boundary between dev & ops; share the pain
- Advanced: Continuous improvements; stable teams
- Expert: Products instead of projects; agile method SAFe

Visibility
- Base: Build status is notified to committer
- Beginner: Latest build status is available to all stakeholders
- Intermediate: Build and trend reports; measure the process
- Advanced: Radiators in workspace show real-time status
- Expert: Traceability built into Jenkins pipelines

2.3 Saab’s deployment pipeline

Currently, not all stages of the Continuous Delivery pipeline are implemented at Saab. Figure 2.2 illustrates the current pipeline, where the deployed parts are marked in yellow and blue. The grey parts indicate that the step is conducted manually. The commit stage is colored yellow, and the release and production stages are marked blue. The feedback pipe illustrates the process whereby a commit of a new feature returns to the developers if any errors occur. After the developer has corrected the issue, the feature goes through all steps of the deployment pipeline again to ensure quality. Developers mostly operate the commit stage. Thereafter, they also perform some manual integration tests before they


deliver the release candidate to the System Integration Team, which mainly operates the acceptance stage and the load and performance stage. This team is responsible for testing the integration of software and hardware. The tools used to create the deployment pipeline are Bitbucket, Jira, Jenkins, and Artifactory. An illustration of how the tools are connected to each other can be seen in Figure 2.3. Bitbucket is a distributed version control system where the source code is stored (Bitbucket, 2018). Jira is a project and issue tracking software where the team plans and follows up on projects (Atlassian, 2018). Jenkins is an automation server that supports building, deploying, and automating projects, which is the basis for Continuous Delivery (Jenkins, 2018). When a developer commits a change to Bitbucket, it triggers a new instance of the deployment pipeline, and Jenkins builds the software and runs the automated unit tests. If the changes successfully go through the commit stage, the binaries are stored in Artifactory. Artifactory is a repository manager where binary artifacts can be stored and managed (JFrog, 2018). Thereafter, the binaries are packaged for acceptance tests and load and performance tests. When the changes finally pass all tests, they are released and eventually go into production.

Figure 2.2. Saab's current implementation of the deployment pipeline

Figure 2.3. Saab’s deployment pipeline with used tools included
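Since the final prototype was implemented in Kibana using real-time data (see Chapter 6), pipeline events ultimately need to reach an Elasticsearch index that Kibana can read. The sketch below is a hypothetical illustration of that step, not Saab's actual integration; it assumes the official elasticsearch Python client (8.x), and the cluster address, index name, and document fields are invented.

```python
from datetime import datetime, timezone
from elasticsearch import Elasticsearch  # official client, assumed version 8.x

es = Elasticsearch("http://localhost:9200")  # placeholder cluster address

# Hypothetical document shape for one pipeline event; field names are invented.
build_event = {
    "commit_id": "change-42",
    "team": "team-a",
    "stage": "commit",
    "status": "success",
    "duration_s": 312,
    "@timestamp": datetime.now(timezone.utc).isoformat(),
}

# Index the event so a Kibana dashboard can visualize it in near real time.
es.index(index="builds", document=build_event)
```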


3 Dashboard design

This chapter presents the concept of dashboard design and the theory of information visualization. First, general dashboard design is presented, followed by an introduction to information visualization and the process of visual representation. Then, a presentation of user-centered design is given. Last, work related to this thesis is described.

3.1 Dashboard

A dashboard is a communication tool for monitoring information at a glance that provides the user with up-to-date information about the current situation (Few, 2006; Kerzner, 2013). Few (2006) has defined the term dashboard as: "A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance." A dashboard should contain the information that can tell the progress and point out whether any events require the user's attention. However, it does not need to provide the details that might be necessary to act. The fundamental challenge of designing a dashboard is to fit a vast amount of data into a limited amount of space while still communicating information efficiently and in an immediately understandable way (Few, 2006). Another challenge is to utilize the design space as efficiently as possible. All information displayed on the dashboard should be important. However, when designing the dashboard, the importance of the information needs to be evaluated and ordered, because some data might be more important than other data. Few (2006) categorizes important data into two kinds, information that is always important and information that is only important at the moment, and argues that these need two different ways of being highlighted on a dashboard. The location of data is one aspect that can be used to highlight important information. Figure 3.1 illustrates the degree of visual emphasis of different regions of a dashboard. The top-left region and the center are the regions with the most emphasis. However, placing information in the center requires that it be separated from the rest of the information, e.g., with white space, to reach this degree of emphasis. The great emphasis on the upper-left corner is primarily due to Western language conventions, where reading starts from left to right (Few, 2006). In places with other language conventions, a dashboard could have a different distribution of emphasis across dashboard locations.


Figure 3.1. Degree of visual emphasis on a dashboard (Few, 2006).

3.2 Information visualization

Information visualization is a field concerned with the use of graphical representations to display information (Card et al. 1999; Fekete et al. 2008; Mazza, 2014). One of the most prominent advantages of information visualization is that it can facilitate a better understanding of data (Fekete et al. 2008). Spence (2016) defines visualization as follows: "Visualization is the activity of forming a mental model of something." He argues that the mental model created by visualizations can enhance the understanding of the information. Visualizations can enhance the detection of patterns (Fekete et al. 2008). Visuals can act as temporary storage for human cognitive processes, providing a larger working set for thinking and analyzing. If a large amount of data needs to be communicated, visual representations can provide an effective way of communicating the underlying concepts, ideas, comparisons, and relations in the data. The challenge is to define the visual representation that will communicate the information effectively. Tufte (2001) claimed: "excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency."


Figure 3.2. The process of generating a graphical representation.

The process of generating a graphical representation includes four components: raw data, data structures, visual structures, and views. The user can interact with the views and change the transformations of the components (Card et al. 1999; Mazza, 2009; Spence, 2016). A common reference model of the process of generating a graphical representation can be seen in Figure 3.2.

3.2.1 Raw data

Raw data is collected data that is not yet structured. It can be gathered from one or several sources, e.g., logs and spreadsheets. The data can be in various formats: nominal, numerical, and categorical (Spence, 2014). Examples of nominal and numerical data are the name and the age of a person, respectively. Generally, the collected data is not in an appropriate format for an automatic processing tool and requires pre-processing before being used for visualization (Mazza, 2009). For instance, raw data often contains errors or missing values that need to be addressed before being processed (Card et al. 1999).

3.2.2 Data structures

The raw data is then transformed into data structures. This data transformation is needed because raw data mostly comes in a format or structure that is not appropriate for processing. For software to process the data, it has to be transformed into a logical structure that the software can interpret (Card et al. 1999; Mazza, 2009). The data structure can be enriched by adding relevant information or by performing some preliminary processing, for instance filtering operations that eliminate unnecessary data. Adding attributes to the data (metadata) may be useful for organizing the data logically. After the raw data has been transformed into data structures, the data can be mapped to visual structures.
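As a minimal sketch of this transformation step (not from the thesis; the log records and field names are invented), the snippet below cleans raw build records and aggregates them into a structure ready for visual mapping, using pandas:

```python
import pandas as pd

# Invented raw data: build log records, one per pipeline run, with a missing value.
raw = [
    {"team": "team-a", "status": "success", "duration_s": 310},
    {"team": "team-a", "status": "failure", "duration_s": None},  # missing value
    {"team": "team-b", "status": "success", "duration_s": 250},
    {"team": "team-b", "status": "success", "duration_s": 270},
]

# Data transformation: drop records with missing values, then aggregate
# into a logical structure (mean duration and run count per team).
df = pd.DataFrame(raw).dropna(subset=["duration_s"])
structure = df.groupby("team").agg(
    mean_duration_s=("duration_s", "mean"),
    runs=("status", "size"),
)
print(structure)
```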

3.2.3 Visual structures

This process of mapping and transforming data into visual structures is called visual mapping (Card et al. 1999). When finding the corresponding visual structure, three elements must be defined: spatial substrate, graphical elements, and graphical properties.

Page 28: Designing dashboards visualizing software metrics for ...kth.diva-portal.org/smash/get/diva2:1330819/FULLTEXT01.pdf · för mjukvaruutveckling. En kontrollpanel är ett kommunikationsverktyg

20

Figure 3.3. Example of graphical elements and properties.

The spatial substrate defines the dimensions in physical space of the visual structure. The spatial substrate can be defined, for example, in terms of axes, e.g., x- and y-axes, and an axis can carry one of three data types: quantitative, ordinal, or nominal. Graphical elements are the visible elements shown in the visualization, and come in four types: points, lines, surfaces, and volumes. Graphical properties are the properties of the graphical elements; some common properties are size, orientation, color, texture, and shape (Mazza, 2009). Examples of the different graphical elements and properties can be reviewed in Figure 3.3. Nowell et al. (2002) made another categorization of graphical properties and use the term graphical device for describing the visual elements, such as color hue, shape, saturation, and alphanumeric identifiers. The graphical properties of the visual structures can differ in encoding effectiveness, and several works have classified and compared the effectiveness of the properties, e.g., MacEachren (1995), Few (2006), and Ware (2013). Preattentive processing is the early stage of visual perception that happens within the first 200 ms (Spence, 2014). It is the visual processing and detection of a specific set of visual attributes that are immediately perceived without the need for conscious attention (Few, 2006; Mazza, 2009).

The challenge of visual mapping for preattentive processing stems from the limited number of preattentive attributes and of visual distinctions within a single attribute. Mazza (2009) states that no universal ranking of preattentive attributes exists, due to the complexity and the many factors that impact the choice of encoding. Ware (2013) suggests that preattentive attributes of visual perception are organized in the categories color, form, spatial position, and motion. A further categorization can be made for the different data formats. Table 3.1 shows a categorization of preattentive attributes against the three data formats, as illustrated by Mazza (2009). It can be seen in Table 3.1 that only a few of the attributes are suitable for preattentive processing. A green check mark indicates that

Page 29: Designing dashboards visualizing software metrics for ...kth.diva-portal.org/smash/get/diva2:1330819/FULLTEXT01.pdf · för mjukvaruutveckling. En kontrollpanel är ett kommunikationsverktyg

21

the attribute is suitable, a blue dash indicates limited suitability for the data format, and a red cross indicates that it is not recommended.

Table 3.1. Encoding different data formats with preattentive attributes.
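To illustrate encoding with a preattentive attribute (a sketch, not from the thesis; the data is invented), the snippet below maps a nominal build status to color hue in a simple bar chart, so failures can be spotted without conscious attention:

```python
import matplotlib.pyplot as plt

# Invented data: latest build duration per team, and the build status.
teams = ["team-a", "team-b", "team-c", "team-d"]
durations = [310, 250, 420, 280]
statuses = ["success", "success", "failure", "success"]

# Visual mapping: position encodes the team (nominal), bar height encodes
# duration (quantitative), and color hue encodes status (nominal).
colors = ["green" if s == "success" else "red" for s in statuses]
plt.bar(teams, durations, color=colors)
plt.ylabel("Build duration (s)")
plt.title("Latest build per team (red = failed)")
plt.show()
```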

3.2.4 Views

The view is the result shown to the user (Mazza, 2009). The process of changing the view through interaction is called view transformation. View transformations specify graphical parameters such as position, scaling, and clipping (Card et al. 1999). With a large amount of data, all visual structures might not fit in the available space. In these situations, interaction techniques provide a possibility to see details and gain further insight. Three common view transformations are location probes, viewpoint controls, and distortion.

Location probes are view transformations that use location to show additional information, e.g., a pop-up window providing details-on-demand. Viewpoint controls use affine transformations to zoom, pan, and clip the viewpoint. This view transformation allows the user to magnify visual structures and change the point of view to get more detailed information. Distortion provides views with focus + context: the focus gives details about the data while the overview is kept to provide context for the details (Card et al. 1999).


3.2.5 Interaction

Through user interaction, the data can be explored and the underlying patterns found. The user can have one or several tasks that need to be completed to find the necessary information, and the interaction techniques can vary to support each specific task. User interactions can affect all three transformations and control the mappings in the process of generating visualizations. Shneiderman (1996) established a visual information-seeking mantra: “Overview first, zoom and filter, then details-on-demand.” It describes the tasks the user can perform at a high level of abstraction: overview, zoom, filter, details-on-demand, relate, history, and extract. These can be seen as different interactions that could be implemented in the visualization to support the user in finding answers in the data.
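To sketch how the mantra could map onto a dashboard backend, the hypothetical queries below use the elasticsearch Python client (8.x-style keyword arguments) against an assumed index of build documents; the index name and fields are invented for illustration and merely anticipate the kind of data used in the implementation.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Overview first: daily build counts across the whole index.
overview = es.search(index="jenkins-builds", size=0, aggs={
    "per_day": {"date_histogram": {"field": "@timestamp",
                                   "calendar_interval": "day"}},
})

# Zoom and filter: restrict to failed builds from the last week.
failures = es.search(index="jenkins-builds", query={
    "bool": {"filter": [
        {"term": {"status": "FAILURE"}},
        {"range": {"@timestamp": {"gte": "now-7d"}}},
    ]},
})

# Details-on-demand: the full document for one selected build.
first_hit = failures["hits"]["hits"][0]
detail = es.get(index="jenkins-builds", id=first_hit["_id"])
```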

3.3 User-centered design

User-centered design (UCD) is a design process where the user's needs and requirements are at the center of all design decisions (Interaction Design Foundation, 2018). The design process revolves around the user, creating systems shaped around the specific user group. Thus, the designer can create a usable and accessible system that provides user satisfaction and enhances the user experience. The design process is often iterative, starting with identifying the user's needs using investigative and generative methods to understand the user's context and requirements. A design phase is then followed by an evaluation phase, where the user can evaluate how well the design performs with respect to the context and requirements. Based on the evaluation, further iterations are done, starting again with specifying the context of use. The design process is illustrated in Figure 3.4.

UCD is important in information visualization, as its primary goal is to support the user in exploring patterns and finding answers. Early in the design process, it is necessary to establish the characteristics and context of the users. Depending on the tasks and goals of the users, the design of the information visualization can differ. For instance, for an ambulance driver the task is well formulated and known through training, so the visualization might need more emphasis on helping the user find the needed information fast. In another scenario, where the user's task is poorly formulated, the design might need to provide more explorative tools and visualizations to complete the task, and speed may not be a requirement (Spence 2014). Romero (2017) posits that the design process for creating information visualization should be user-centered and iterative, to continually improve the design. The following questions should be addressed when creating information visualization with a user-centered design process (Romero 2017):

1. Who is the user?
2. What are the tasks?
3. What is the data?
4. What are the visual structures?
5. How does the visualization support the tasks?
6. How can it be improved?


Figure 3.4. The user-centered design process.

3.4 Related work

A build monitor is a plugin for the automation server Jenkins that allows the user to create a dashboard visualizing the build status of selected parts of the system (Jenkins, 2018). An example view of the build monitor can be reviewed in Figure 3.5. The build monitor updates automatically and uses the traffic light metaphor, where colors indicate the build status, e.g., green for success and red for failure. The user can choose to display additional information on the monitor, e.g., the last time the components were built. This dashboard is currently used and is placed at one of the entrances to the office landscape. The dashboard is clickable: if the user wants more information, it redirects to a Jenkins webpage.

Figure 3.6 shows an example of a dashboard designed to support Continuous Integration and Continuous Delivery. The dashboard is part of the product ElectricFlow, created by the company Electric Cloud (Electric Cloud, 2018). The dashboard uses several visualizations to provide information to the users, e.g., a donut chart for total deployments and a bar chart for displaying applications. ElectricFlow consists of several dashboard views to support the organization.


Figure 3.5. Jenkins Build Monitor Plugin (Jenkins 2017).

Staron et al. (2008) conducted a study on developing dashboards for software engineering. The authors used the term measurement system for a system that collects, analyzes, and presents data about software-related activities. The study used the standardized process in the ISO/IEC 15939 standard for measurement activities such as identifying, creating, and evaluating measures. Staron et al. (2014) later published a study about dashboards for continuous monitoring of software products. In that study, three dashboards at three companies were examined; the authors extracted the elements of successful dashboards and developed a framework for dashboards monitoring software engineering. The most crucial element was to have the correct indicators and measures that the users wanted to see. However, five additional elements are required to make a dashboard widely used in organizations: standardization, early warning, focus on decisions and predictions, succinct presentation, and assured information quality.


Figure 3.6. ElectricFlow with one of its dashboard views.


4 Method

This chapter presents the methods selected for this thesis work. First, an overview of the process is given. Thereafter, the subsequent steps are described in more detail: literature study, user observation, lo-fi prototype, usability testing, hi-fi prototype, and implementation of the dashboard.

4.1 Overview

The thesis work started with a literature study within the fields relevant to the project. The literature study also reviewed related work to gain insight into the current state of the art. After the literature study, user observation was conducted through semi-structured interviews and contextual inquiry. Then, a low-fidelity prototype was designed for usability testing. With the feedback gathered from the usability testing, a high-fidelity prototype was created. Another round of usability testing was conducted with the hi-fi prototype in a setting similar to the first usability test. Last, based on all received feedback, a final prototype was implemented in Kibana, a tool used to visualize data collected in Elasticsearch.

This thesis project was conducted with a user-centered design process. A user-centered approach was selected as it has several benefits when designing interactive systems, even though it is often more expensive and time consuming (Benyon, 2014). One of the benefits is return on investment: involving users leads to less need for training material, increases throughput, and ensures acceptance among the users. Moreover, the users will be able to use the system more effectively and thus be more productive. Other benefits concern safety, ethics, and sustainability. Other methods, such as the ISO/IEC 15939 standard, do provide a standardized measurement process to identify, define, select, apply, and improve a project or organizational measurement structure (ISO, 2017). However, little emphasis is placed on designing a measurement system or dashboard that communicates the information in the best way. Therefore, even though this thesis aimed to create a dashboard with low interactivity, a user-centered approach was seen as more suitable. The goal of this thesis was focused on designing a dashboard that provides a good user experience in terms of relevant and easily understandable feedback, which in turn can support the organization's daily work; this goal is well supported by a user-centered design approach.

4.2 Literature study

The thesis project started with a literature study. Its goal was to build a fundamental understanding of the main concepts of the subjects. The literature study mainly focused on Continuous Delivery, dashboard design, and information visualization, and also explored previous work in the field. Internal documentation was reviewed to gain knowledge of the system development process and the Continuous Delivery pipeline at Saab.


In the early stage of the literature study, the books A Practical Guide to Continuous Delivery by Wolff (2017) and Introduction to Information Visualization by Mazza (2009) were read to get a fundamental knowledge of the subjects. Thereafter, another set of books was read to acquire more in-depth knowledge, including Farley and Humble's (2010) book about Continuous Delivery and Few's (2006) book Information Dashboard Design: The Effective Visual Communication of Data. Farley and Humble's book is one of the first books about Continuous Delivery and gives a broader perspective on Continuous Delivery, using software metrics as feedback, and maturity models. Few's book provided a fundamental understanding of dashboard design and how it relates to information visualization. Thereafter, relevant articles were read to deepen the knowledge further. The articles were found in several databases, including the Royal Institute of Technology library database, SpringerLink, and ScienceDirect.

4.3 User observation

The first step of the design process was to conduct user observations. User observations were conducted to understand the context and the needs of the users. In this study, the primary users of the dashboard were identified as the software developers and the development teams as a whole. Secondary users were identified as the DevOps developers and managers. User observations were performed in two ways: semi-structured interviews and contextual inquiry.

4.3.1 Interview

The selected interview method was semi-structured interviews. A semi-structured interview includes a set of predetermined questions; however, the interviewer can ask for clarification, add questions, or follow up on comments during the interview. This structure gives the interviewer the flexibility to depart from the script to gain additional insight, while still offering the focus of a structured interview (Lazar et al. 2010). Interviews were appropriate to use, as the goal of the initial exploration was to gain insight into how the software development teams were working. Moreover, the semi-structured format provides the interviewer with a structure for collecting comparable qualitative data, yet includes open-ended questions to develop a keen understanding of the topic. The interview questions mainly focused on two parts: Continuous Delivery and software metrics. If the interviewee already used a dashboard, a third subset of questions was asked as well. The interview questions can be seen in Appendix A. Some questions were based on previous studies in this field, e.g., the thesis by Jain and ram Aduri (2016). Other questions were formulated together with Johansson (2018). All interviews except one were conducted together with Johansson, and both interviewers documented separately. As both interviewers documented, validity was increased and answers could be compared; voice recording was therefore considered unnecessary at this phase.


All interviewees were personnel at Saab and were selected based on their job roles. To cover the current software development process, the interviewees were selected from different parts of the deployment pipeline. The software developers were selected from different teams to ensure an overview of the whole pipeline and diversity among the interviewees. In total, six people were interviewed, holding the following job roles: software engineer, software developer, scrum master, design responsible, system integration leader, and product owner. Six interviews were considered adequate to gain insight into the current software development process, as the interviews covered the whole deployment pipeline. The product owner and scrum master plan the sprints for the team. The scrum master, software developer, and software engineer develop and perform testing. The system integration leader conducts integration testing of software with hardware. Design responsible is a broader job role that includes specifying requirements, software development, and planning.

4.3.2 Contextual inquiry

Contextual inquiry is a design method for systematically studying people, tasks, procedures, and environments in specific contexts (Benyon, 2014; Privitera, 2015). This method allows designers to understand more deeply the social and physical environments that shape the tasks and user behavior. Contextual inquiry can highlight specific use patterns and behaviors with the system. Furthermore, it can give highly detailed information compared to other qualitative methods that produce more high-level information (Interaction Design Foundation, 2018).

Contextual inquiry was selected when it was discovered in one of the interviews that one development team used the current dashboard during its daily morning meeting (see Figure 3.5). It was used to gain insight into how the team used the dashboard and to find specific user patterns, and it was an appropriate method as it allowed the observer to gather data about the users with little interference. Contextual inquiry was conducted at two morning meetings with two development teams; observing two different teams could reveal whether there were differences between them. The observer was present in the meeting room while the team had its usual morning meeting. The observer's primary objective was to take notes on the structure of the meeting as well as comments and interactions with the dashboard. The observer intended not to disturb the meeting and did not participate in any discussion, standing in the corner of the room and documenting interactions within the team. When the meeting was over, the observer asked the scrum master questions to clarify any uncertainties or to get further comments. In both cases, the observer was only acquainted with the scrum master of the team. The presence of the observer in the meeting might therefore have had an impact on how the team behaved or interacted.


4.4 Ideation session

An ideation session was performed by the author before the lo-fi prototypes were made. An ideation session aims to generate as many ideas as possible, so that the best or most usable ones, or those meeting any other criteria the design requires, can be selected in a later phase (Dam and Siang, 2017). In this ideation session, the goal was to find as many ways of visualizing each metric as possible. Sketching was used to express ideas and potential solutions. Instead of documenting ideas in words, rough sketches can convey the basic concepts of a design with simple means. Another advantage of sketching is that visuals can trigger more ideas and provide a broader perspective (Dam and Siang, 2018).

Before the ideation session started, the software metrics in Table 5.1 were reviewed to get a sense of what types of data these metrics included. In the ideation session, all software metrics were first written on a whiteboard. Sketches of different visualizations were drawn on post-it notes and attached to the whiteboard. When an exhaustive number of sketches had been produced, an attempt was made to group the software metrics. The goal of grouping software metrics was to investigate whether any of them could be visualized together. For instance, all software metrics related to code were grouped. Similarly, two metrics related to each other were number of bug reports and defect age, where defect age could be shown together with the number of bug reports. The ideation session then restarted to find visualizations for the grouped metrics. Mazza's (2009) categorization of preattentive attributes of visual structures was used to aid the ideation session (see Table 3.1). The table contains several ways data can be mapped to different visual structures and visualized. At this stage, the data type did not matter; Mazza's (2009) categorization functioned more as a catalyst for finding new ideas. Furthermore, the related work researched during the literature study served as a source of inspiration. Figure B.1 shows the result of the ideation session.

4.5 Lo-Fi prototype

Paper prototypes were made based on the findings from the user observation and the sketches from the ideation session. A paper prototype is categorized as a low-fidelity (lo-fi) prototype: a representation of the real product that is produced quickly and, in general, very cheaply. The lo-fi prototype differs from the real product in interaction style, visual appearance, and level of detail (Walker et al. 2002). Paper prototypes were selected because they can be produced fast and at low cost yet convey the main design ideas. The objective of using lo-fi prototypes was to confirm different design ideas.

After the ideation session, the author selected a few sketches to turn into paper prototypes. The paper prototypes were made on paper of size 29.7 × 42.0 cm (A3). This size was roughly similar to the design space and ensured that the drawn visualizations had realistic dimensions. The paper prototypes can be reviewed in Section 5.2 and Appendix C.


4.6 Hi-Fi prototype

In contrast to a lo-fi prototype, a high-fidelity (hi-fi) prototype is functional and offers more realistic interaction and visual appearance (Walker et al. 2002). In this thesis work, the hi-fi prototypes were made in Axure, a prototyping program that provides tools for creating functional prototypes with conditional logic and dynamic content (Axure, 2018). From the usability testing with the lo-fi prototypes, feedback in the form of voice recordings, sketches, and comments was collected. The feedback was summarized and then used to improve the design and create the hi-fi prototypes. Additional metrics found in Johansson's (2018) thesis were added to the hi-fi prototype. The objective of the hi-fi prototypes was to show the users a more realistic view of the final dashboard and thus collect feedback about usability issues and design flaws to be improved. Five hi-fi prototypes were designed and can be reviewed in Section 5.4.

4.7 Usability testing

User-based testing is a type of usability testing where representative users are involved in the testing. The objectives of a usability test are to find design flaws, discover what is working, and improve quality. In general, usability testing can be conducted at any stage of the design process, from paper prototypes to implemented user interfaces (Lazar et al. 2010). User-based testing was selected over other usability testing methods, such as expert-based testing, as the target group was available. There are three types of user-based testing: formative, summative, and validation tests (Lazar et al. 2010). Formative testing generally takes place early in development and tends to be exploratory, testing early designs. Summative testing is done on a more formal prototype where high-level design choices have been made; the aim is to evaluate the effectiveness of the selected design choices. Validation testing takes place right before the application is released to the customer. The purpose of this test is to validate that the interface meets the requirements, for instance by benchmarking that 90% of the users can complete a task within a specified time limit.

4.7.1 Lo-Fi prototype

For the lo-fi prototype, formative testing was used. The usability test was conducted with six people. It used a think-aloud approach, where the test users were encouraged to express their thoughts and feelings aloud during the whole test session (Charters, 2013). Along with thinking aloud, post-it notes were provided as an additional tool for making notes or sketches that could be quickly attached to specific parts of the prototype. Both the test user and the observer could use this tool to clarify feedback and discuss new ideas. Since only one observer conducted the usability test, voice recording was used as an additional tool to ensure that the test users' answers were correctly documented. The voice recordings were transcribed after the usability tests. Video recording was not an option due to company policies, so interactions and specific behaviors with the prototype could not be captured; the post-it notes supported the observer on that aspect of the observation.


The usability test started with the observer briefing the test user about the usability test and its approach. Thereafter, the observer showed one prototype at a time and asked the test user to describe what they saw. Since the dashboard is mostly used through at-a-glance interaction, no scenario-based setting was used for the usability testing. Instead, more emphasis was put on investigating how the test users responded to the software metrics and the design, and on finding out what caught their attention in the prototype. The users could state where they would like to have interactions, e.g., clicking on a component for details-on-demand. Moreover, one of the goals was to understand in which contexts it was appropriate to use the dashboards. The observer's responsibility was to encourage the test user to keep talking and to clarify comments. The test session ended with a debriefing where the user could add further comments or thoughts about the prototypes. The paper prototypes can be viewed in Section 5.2 and Appendix C.

4.7.2 Hi-Fi prototype

The usability testing for the hi-fi prototype was conducted in a setting similar to that of the lo-fi prototype. Formative testing was used for the hi-fi prototype as well, since the dashboard design would not be finalized in this step; this thesis work would only deliver a final prototype, and formative testing was therefore considered appropriate. The goal of the hi-fi prototype was to show a more realistic view of the dashboard and gain insights about design flaws and areas for improvement. The usability testing started with the observer briefing the test user about the usability test and the think-aloud approach, informing the user that they were encouraged to speak their mind throughout the test session. The usability tests were conducted in a controlled environment in the form of a meeting room. The prototypes were shown one at a time on a TV display. This setting mimics, to some extent, the situation of a morning meeting, where one person moderates the dashboard while the team watches the screen. The observer and test user were sitting down during the whole usability test, which differs from the usual morning meetings, which are stand-up meetings; however, the users were informed that they could stand up and point at the TV if they wanted to. During the test session, the observer was responsible for moderating the view while the test user described it. The whole test session was voice recorded and then transcribed.

4.7.3 Number of test users

For the usability testing, 11 test users were recruited. Six of them participated in the lo-fi prototype usability test, while five participated in the second usability test. To ensure diversity, the recruited test users worked in four different development teams and had different job roles. The job roles included in the usability tests were scrum master, product owner, software engineer, DevOps engineer, and software developer.


In the first usability test, three of the test users had been interviewed during the user observations. In the second usability test, one test user had been part of the user observations before, while the remaining test users were new to this study. This division of test users was a way to confirm that the prototypes realized the expectations the test users had described during the interviews. Furthermore, including new test users also ensured that the dashboard was approved by the rest of the team and fulfilled the whole team's needs.

Nielsen (2012) suggests that 15 test users are sufficient for usability testing; however, he recommends using a smaller number of users and running several tests. In an iterative design process, Nielsen (2000) argues that five test users will find about 85% of all usability problems. Therefore, it is better to distribute the user testing over several sessions in an iterative manner rather than holding one big test session: after each usability test, the next step is to redesign and conduct another session. He claims that qualitative user research should aim at collecting insights to drive design, and increasing the number of test users is therefore not guaranteed to uncover more usability problems.

4.8 Implementation

The final hi-fi prototype of the dashboard was implemented in Kibana. Kibana belongs to the Elastic product suite and allows the user to create visualizations of collected data that is sent through Logstash and stored in Elasticsearch (Elastic.co, 2018b). The software will not be described in detail, as that is considered out of scope for this thesis. Before the implementation started, some research was done to gain more knowledge about how the software worked. The implementation required scripting to fetch and send data, configuring the tools, and preprocessing the collected data so that Elasticsearch could interpret it and Kibana could use it. Data were collected from different tools, such as Jenkins and Bitbucket, and then visualized using Kibana. The implementation was conducted together with Johansson (2018).
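To give a sense of what such scripting can look like, the sketch below is a minimal, hypothetical example: the endpoint URLs, job name, index name, and field layout are all invented, and the actual pipeline also routed data through Logstash, whereas this sketch indexes directly into Elasticsearch for simplicity. It fetches the latest build of a Jenkins job via the Jenkins JSON API and indexes a flattened document for Kibana to visualize.

```python
import requests
from elasticsearch import Elasticsearch

# Hypothetical hosts and names; the real ones are internal.
JENKINS_URL = "http://jenkins.example.com"
JOB_NAME = "example-pipeline"

es = Elasticsearch("http://localhost:9200")

# Jenkins exposes build data as JSON through its remote access API.
build = requests.get(
    f"{JENKINS_URL}/job/{JOB_NAME}/lastBuild/api/json", timeout=10
).json()

# Preprocess into a flat document that Kibana can aggregate on:
# status (nominal), duration (quantitative), and a timestamp.
doc = {
    "job": JOB_NAME,
    "build_number": build["number"],
    "status": build["result"],         # e.g. SUCCESS, FAILURE, UNSTABLE
    "duration_ms": build["duration"],
    "@timestamp": build["timestamp"],  # epoch milliseconds
}

es.index(index="jenkins-builds", document=doc)
```

Run periodically, for instance from a scheduled job or a post-build hook, such a script keeps the index, and thereby the Kibana dashboard, up to date.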


5 Interview and usability test result

This chapter presents the results of the initial user observations and the usability tests, and how the prototypes evolved based on them. First, the key findings from the initial user observations, consisting of interviews and contextual inquiry, are described. Then, the lo-fi prototype is described, followed by a summary of the results from its usability test. Last, the hi-fi prototype is presented together with the results from its usability test.

5.1 User observation

The user observations were conducted at the beginning of the thesis project to gain insights into the users' requirements and contexts. This information was gained through six interviews and contextual inquiry at two morning meetings. The findings from the initial exploration are presented in subsections 5.1.1 and 5.1.2.

5.1.1 Interview

From the interviews, it was discovered that most of the teams were not using any metrics at all, or only very few. One more widely used software metric was build status, which could have the states successful, failed, or unstable. Three of the development teams used this metric, shown on the Jenkins build monitor (see Figure 3.5). This dashboard showed the status of all components in the product and was placed at the entrance to the office landscape. One of the teams had a customized Jenkins dashboard with only the components the team was working with; this team-specific dashboard was used at the daily morning meeting. The same team mentioned that one kind of feedback they would like is clearer fault identification, as there was no clear flow of steps to reach the page with the error description.

Few of the interviewees could name particular software metrics that they wanted to use. Software developers were interested in getting more feedback from static code analysis, especially the number of warnings. Warnings can be seen in the compiler output and log files in Jenkins; however, they were never summarized and broadcast to the whole team, and the page with the error description could not be reached in an easily accessible way. Some mentioned that, due to the lack of automation in the process and of automated data collection, metrics could not be shown with up-to-date information without additional effort.

Another discovery was the importance of seeing trends, e.g., trends of different warnings. One of the development teams had implemented a dashboard showing the trend of successful and failing builds. The other development teams were not able to see trends and therefore could not assess whether the software development process was improving or deteriorating. The point of seeing trends was to detect significant changes and to identify when they happened.

The system integration leader belonged to the System Integration Team and worked with system integration, which includes testing, integrating, and verifying the systems. The work was mainly performed manually. The team had just started to identify software metrics, e.g., start-up time for delivery. However, no investigation had yet been done into what types of software metrics were available to collect through automated data collection.

5.1.2 Contextual inquiry

In the contextual inquiry at the two morning meetings, it was discovered that the two development teams had different settings for the morning meeting. What they had in common was that, in the meeting, the team would go through Jira: a person moderated Jira on a large display as each person described the progress of the tasks they were working on. Having daily meetings is a common practice within software development and is also known as a daily stand-up or daily scrum (Agile Alliance, 2018b).

In the first morning meeting, the team had a stand-up meeting, meaning that all team members stand up during the meeting. This type of meeting is believed to keep meetings shorter and more effective than sit-down meetings (Agile Alliance 2018b). The team used both Jira and a customized version of the dashboard (see Figure 3.5). Ambiguous color choices in the dashboard were noted: the team was unsure what exactly a yellow component meant, and the team members made several guesses. Moreover, the users had to click in several places in Jenkins, including going back to the first page after unsuccessful attempts, before finding the desired page with the error message. In the follow-up, the scrum master emphasized the importance of having both a team-specific view and the overall view. One team member also mentioned that not all software metrics are always of interest; instead, some metrics might be needed at different phases of the development cycle. For instance, software metrics that show the current status of the product need to be addressed immediately, while other metrics are more relevant during sprint planning or retrospectives. However, no clear division was provided.

At the second morning meeting, the development team only used Jira. In a follow-up discussion, the scrum master mentioned that the team wanted a customized dashboard, similar to the other team's in the first contextual inquiry, but had not yet been provided with one. The workflow for fixing errors shown on the current dashboard placed in the office would begin with checking the error on the desktop. Depending on the kind of error, the person with the most knowledge of the problem would take it on; if the problem could not be solved the same day, the issue would be brought up again at the next morning meeting. This team wanted to see static code analysis for their software components. The dashboard could show software metrics with significant variation, e.g., a sudden increase or decrease in warnings. A visualized trend would make such issues more visible than seeing them in Jenkins or locally on the developer's computer.

5.1.3 Software metrics in the lo-fi prototype

A preliminary set of software metrics was defined and used in the first prototype. These software metrics were the findings from Johansson's (2018) thesis; Table 5.1 lists them with a description of each metric. All software metrics included at this stage were primarily of the quantitative data type, meaning that a numeric value is used to present them. However, using a spatial substrate with x- and y-axes, the x-axis can vary between metrics. For instance, the number of warnings metric can be distributed over the ordinal data type build number, but also over nominal data showing the number of warnings per warning type.

Table 5.1. List of software metrics with descriptions (Johansson, 2018).

Code versus comments: Measures the ratio between code and comments. Well-documented code can shorten the learning curve for people who have not worked with it before.

Code complexity: Measures the complexity of a program. Low complexity ensures that the code is easy to debug.

Code coverage: Measures the degree to which a program is executed by a test suite. High code coverage indicates that errors are detected as early as possible with tests.

Code duplication: Measures the sequences of code that appear more than once. Detecting duplicates can help developers reuse code.

Test pass %: Measures the ratio between passed test cases and the total number of test cases to be run.

Build quality: Measures the ratio between successful builds and the total number of builds.

Test lead time: Measures how long a feature is in the test stage, including waiting time and processing time.

Defect age: Measures how long a defect exists in the program, starting from when it was discovered and ending when it is resolved.

Number of bug reports: Measures the number of bug reports in the queue.

Review time: Measures how long a feature is in the review stage.

Release cycle: Measures the time between two releases.

Build duration: Measures the time from the start of a build until all tests are completed.

Number of warnings: Measures the number of warnings in the program. Warnings need to be addressed to prevent increasing technical debt.
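Several of the metrics in Table 5.1 are simple ratios. As a minimal illustration (the counts are invented, not data from the study), the sketch below computes two of them:

```python
# Two of the ratio metrics from Table 5.1, as plain arithmetic.

def build_quality(successful: int, total: int) -> float:
    """Ratio between successful builds and the total number of builds."""
    return successful / total

def test_pass_rate(passed: int, total: int) -> float:
    """Ratio between passed test cases and the total number run."""
    return passed / total

print(f"Build quality: {build_quality(47, 50):.0%}")    # 94%
print(f"Test pass %:   {test_pass_rate(188, 200):.0%}")  # 94%
```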

5.2 Lo-fi prototype

Six lo-fi prototypes were designed for the usability test. The prototypes most preferred by the test users are presented in this section; they were also used as the foundation for the hi-fi prototype. The remaining paper prototypes can be reviewed in Appendix C.

Build status dashboard
Figure 5.1 shows the build status of the complete system with all software modules. The software modules were visualized as squared components using the traffic light metaphor to indicate build status. This design was a potential redesign to improve the current dashboard (see Figure 3.5). The purpose of Figure 5.1 was to test whether additional software metrics would be interesting to see in a dashboard placed in the office landscape. Also, all software modules were placed with an associated team. This placement would reduce the problem with the current dashboard, where the components changed position when new software modules were added. If a software module failed to build, a developer could quickly distinguish whether it belonged to their team or not. The design also tried to test the level of detail that was needed: information such as manifest version, build number, and last build time was removed to investigate which information was essential. By removing some of the information, the size of the software module components could be reduced, and additional metrics could be added.

Team dashboard
Figure 5.2 shows a potential team dashboard, where the team can see its software modules and processes. In the prototype, the team's associated software modules are shown to the left and have the same visualization as in Figure 5.1. Furthermore, the prototype focuses on the metric number of warnings: the warnings are shown as a trend with a line chart, as a number, and as a table with the most common warnings. The review time and Jira components provide information about the progress of the sprint.


Figure 5.1. Paper prototype with build status for whole project and metrics.

Figure 5.2. The paper prototype for a team.


Figure 5.3. The paper prototype for a feature.

Feature dashboard
Figure 5.3 was designed as a dashboard showing information about a feature a developer is currently working on. In this dashboard view, more focus is therefore placed on software metrics related to code quality. It provides information such as warnings, both as a trend and as a snapshot categorized by warning type. It also includes code quality metrics such as code coverage and code vs. comments, visualized as gauges, and code complexity and code duplication, shown as numbers. Colors were used to show whether the target for each software metric was met or not.

5.3 Usability test result from lo-fi prototype

The usability test result from the lo-fi prototype is presented in four parts: build status dashboard, team dashboard, feature dashboard and general feedback.

5.3.1 Build status dashboard

Several test users thought that the prototype in Figure 5.1 was similar to the current dashboard and could therefore easily understand the view. Four out of six test users thought it was an improvement over the current dashboard because the software modules were grouped by team. Logical ordering of the components could support faster recognition when passing by the dashboard, letting a developer quickly see which team needs to take action. It would also prevent the problem with the current dashboard, where modules relocate when new software modules are added. However, the structure raised two concerns. The biggest concern was about placing the software modules under different teams: certain teams worked with more modules than others, which would require more space for that team's component or smaller space for each component within it. Some modules are worked on by, or are part of, two teams' work, which creates uncertainty about how these software modules should be placed. Still, some logical structure in the ordering of the software modules was desired. The second concern was that showing several software metrics on the same dashboard as the build status would require reducing the detail of the build status information, possibly to the extent that it would no longer support the developers.

Most of the test users preferred the prototype in Figure 5.1, as the software modules could provide more details compared to the other prototypes. Only one test user thought that reducing the details was acceptable, given that the team had its own dashboard showing the build status of its software modules in more detail. Another appreciated feature was the release summary in Figure 5.1, since it could provide information briefly and concisely and give better insight into which new software features are included in the releases. Test user 4 pointed out the following during the usability test:

“As an overview, it can be too much [of components]. I am not sure how it will look in the end. However, there is a risk that it will be cluttered. The old dashboard gave us details at the level we needed it. Details should not be undervalued.”

In general, none of the test users were fond of having software metrics other than build status on the same dashboard if it was supposed to replace the current dashboard. It was mentioned that software metrics concerning process and quality were good to have and could provide the organization with insights into the software development process; three test users mentioned that they could be a dashboard view of their own.

5.3.2 Team dashboard

Figure 5.2 was highly appreciated by all test users. This dashboard limited the information such that all data shown was relevant to the team. Moreover, it showed both the build status of the relevant software modules and other software metrics, which was a good combination. What all test users commented on was that the information about the warnings was visualized in too many ways. The least favored was the table with the most common warnings, as the most common warnings might be the most unimportant ones. One test user suggested changing it to show the most prioritized warnings; this change was suggested to the other test users, but the responses were indifferent. What was missing in the dashboard was test results; information about the number of bug reports was also appreciated and wished for in the dashboard view.

5.3.3 Feature dashboard

Figure 5.3 includes software metrics related to code quality, such as code complexity, code duplication, and comments versus code. Software metrics for code quality were generally appreciated. Several test users mentioned that the code quality metrics were good metrics to include and visualize. At the same time, it might be hard to include them given the current working process: there is currently no outspoken requirement that developers write unit tests or reach any particular code coverage. Instead, the developers are in charge of deciding whether testing is needed for newly written code. One test user stated that failing to reach the code quality measures should not generate a failed build; instead, the dashboard should only provide the developer with information about the current status. Two test users were positive toward using code quality metrics with thresholds if the team could adjust the limits themselves. Giving the teams this control was important, as they would be able to set reasonable limits: a threshold too far from the current state might discourage the team, while setting their own thresholds would allow the team to make improvements according to their ambitions.

5.3.4 General feedback

Knowledge of the different metrics varied among the test users. Several metrics were unknown to the test users, and which metrics were new to them varied. The three least known metrics were code vs. comments, build quality, and sanitizer warnings. However, the test users mentioned that this was not a problem, as the metrics could be learned; it might also be easier with more descriptive names and labels. None of the test users had heard about sanitizers before, even though the company had proposed using the tool. Sanitizers are tools used to find potential bugs in the system and are part of the metric number of warnings. After the tool and its usage were explained, all test users agreed that it was good to include it.

All test users were positive about the idea of having several dashboards designed for different contexts. A dashboard view for the feature branch would allow them to receive feedback on new code quickly, while the team dashboard could give the team an overview of its progress. The build status dashboard was appreciated for its logical structuring, which could reduce the time it takes for a developer to identify whether action is needed. However, some of the teams shared software components; therefore, grouping based on teams was not appropriate.

5.4 Hi-fi prototype

After the usability testing with the paper prototypes, the feedback was summarized to create the hi-fi prototype. In total, three main views were designed, and two of them had two versions that were also tested. New software metrics were included in the hi-fi prototype (see Table 5.2). The new software metrics were placed in the dashboard views considered most appropriate and of interest to the user.

Overview dashboard
The dashboard view in Figure 5.1 was not included in the hi-fi prototype, since it was discovered that some teams had shared responsibility for software modules. Consequently, placing the software module components into team-specific spaces would not provide any help. Moreover, two test users thought that the view could remain unchanged provided that newly added components would not relocate all software modules in the dashboard. Also, the components could be sorted in alphabetical order to provide some logical structure.


Team dashboard
Figure 5.4 shows a hi-fi prototype of a team dashboard, derived from the paper prototype in Figure 5.2. The number of bug reports metric was added with an alphanumeric visualization: the number of bug reports was important to see, but the previous trend visualization was not needed in the daily work. The Jira component was unchanged apart from adding the sprint dates to make the scope clearer; as the response to the Jira component had been uncertain, it was left unchanged to gather more insights in the second usability test. Additionally, build history was added to the squared components, displayed with weather icons. The weather icons are used in Jenkins and were therefore assumed to be familiar to the users. A new component with the heading “Last commits and stories merged” was added. This component would provide textual information about events in the team, with functionality similar to the release summary component in Figure 5.1.

Figure 5.5 is a second version of the prototype in Figure 5.4 and was designed to test whether more emphasis should be placed on the build status information. Four circles were added to each software module component to create a faster understanding of where in the build chain an issue has appeared. The four circles represent the steps included in the commit stage in Continuous Delivery: compile, unit tests, analysis, and build. The traffic light metaphor is used here as well: a darker hue of green indicates that a step was successful, a darker hue of red indicates the specific step in which the error occurred, and a white circle indicates that the build chain never reached that step. The number of warnings metric was also relocated next to each software module to show the warnings connected to each module. This relocation was based on test user 6's feedback from the usability test.

Figure 5.4. Hi-fi prototype for team dashboard.


Figure 5.5. Hi-fi prototype for the second version of team dashboard.

Feature dashboard
Figure 5.6 is a dashboard that visualizes software metrics related to a feature branch. It includes test results from the unit tests, shown both as an alphanumeric mark and as a trend. The title and description of the Jira issue were included to make it clearer which issue the feature was related to. The review part indicates whether the feature has been reviewed or not; code review is part of the workflow before merging new code into the main branch. Another addition was three status components. Two of them are related to build status and visualize whether the software module and the main component can build with the new code in the feature branch. The third is hardware availability, which derives from Johansson's (2018) thesis.

Figure 5.6. Hi-fi prototype of a feature branch dashboard.


A second version of Figure 5.6 was designed for use with a feature on the main branch. The difference between a feature branch and a feature on the main branch is that the feature branch is short-lived and cloned from the feature on the main branch; when the developer is done developing, the feature branch is merged back into the main branch. A feature on the main branch is therefore long-lived and should always be in a deployable state, according to the theory of Continuous Delivery. In Figure 5.7, a similar dashboard is adjusted for use with a feature on the main branch. The change was that more emphasis was placed on build history and on the different steps a software change has to go through in the deployment pipeline. CI cycle time was included to show how often a new feature branch is merged into the main branch. The rest of the components are unchanged from the previous version.

Figure 5.7. Hi-fi prototype for a feature dashboard.

Overview dashboard
Figure 5.8 shows an overview dashboard, designed because the test users in the first usability test mentioned that additional software metrics should not be included in the build status dashboard. This dashboard view includes the software metrics previously used in the paper prototypes, e.g., build duration and build quality. It also includes two new software metrics, CI cycle time and open pull requests. Some metrics recur from previous dashboard views, such as number of bug reports, technical debt, and code quality. It had been mentioned that the history of bug reports was not of interest in the team dashboard; therefore, it was tested whether it gave more value when shown in the overview dashboard.

Figure 5.8. Hi-fi prototype for an overview dashboard.

Table 5.2. Software metrics included in the hi-fi prototype.

CI cycle time (quantitative): CI (Continuous Integration) cycle time measures the time from the first commit until it is integrated with the main code.

Hardware availability (nominal, binary): Displays the availability of the software-specific hardware.

Current stage of change (ordinal): Shows where a new change is in the deployment pipeline, which enables users to track changes.

Open pull request status (quantitative and nominal): Shows how long and why pull requests are open and waiting to be integrated.

Build status (nominal): Shows the build status of a job in Jenkins, which can be successful, failure, or unstable.

5.5 Usability test result from hi-fi prototype

The usability test result from the hi-fi prototype is presented in four parts: team dashboard, feature dashboard, overview dashboard, and general feedback.

5.5.1 Team dashboard

Build history was unknown to many of the test users. Some of them recognized the icons from Jenkins but did not use them or had not reflected on what they meant. The icons were to some extent self-explanatory, e.g., a sun represented a good build history and a cloud with thunder meant something bad; however, the other icons could mean anything in between these two. Moreover, the icons could not convey how far back the build history went, which was also mentioned as important.

Figure 5.5 is a second version of the prototype presented in Figure 5.4, where more focus was placed on the software modules. The different steps of the build chain were perceived as an improvement, as they could speed up the process of pointing out the error. Test user 7 stated that this is a good way to create a reaction within the team, as the developers can identify through the ID number whether a build was made by them and immediately know where the error occurred. Test user 11 stated that the build chain was good to see, as it could be reviewed during the morning meeting without checking an additional page; a similar view with more extensive details can be found in Jenkins, but this was enough for the morning meeting and for identifying which specific step had failed.

The division of the number of warnings in Figure 5.5 received a varied response. Test users 9 and 11 thought it was an improvement, as their teams usually had developers working closely with a specific software module; the division could more precisely point out where a problem was, whereas the accumulated version could not point out the problem at first sight. However, test user 10 stated that it was not necessary, as that could be shown in the next dashboard view, e.g., Figure 5.7, which could show the number of warnings per software module. Moreover, the added visualizations for the warnings made the view cluttered.

5.5.2 Feature dashboard

At the first usability test, several test users mentioned that test results were missing in the dashboard. Therefore, unit tests were added to Figure 5.6 and Figure 5.7 in the hi-fi prototype. The most common feedback was that a test result covering only unit tests was not enough; on its own, it would not support the development team at all, as most of the development teams worked with unit tests only to a lesser extent. Furthermore, there was variation in what kind of test results were wanted. One team that worked close to the hardware wanted to include hardware integration test results in the dashboard, while another team wanted to include a different set of test results. Consequently, code coverage would also be inadequate, as unit tests were only a small subset of all the tests that would be run. The test users mentioned that, if possible, the proper solution would be to calculate the code coverage from all test suites somehow. Furthermore, if the dashboard could visualize the percentage each type of test contributes to the overall code coverage, the solution would be closer to complete.

Figure 5.6 and Figure 5.7 were designed from the paper prototype in Figure 5.3. Several test users mentioned that the same dashboard design could be applicable at a higher level, e.g., for the whole project. Two of the test users thought that the usefulness of this dashboard view depended on how easily it could be created for each feature branch. If the process was automated, this dashboard could be a good tool for providing immediate and visual feedback.


On the other hand, if it required additional effort in the form of manual insertion of system information, the feature branch dashboard would not be used. In Continuous Delivery, the goal is to make incremental changes and integrate them in short development cycles; therefore, a feature branch should be short-lived. Test users mentioned that, generally, a feature branch should not exist for longer than a few days. Test user 7 provided the following insight during the usability testing: “This [feature dashboard] seems to be per feature branch if you look at the heading. I personally don’t think this is the right way to go. It tends to make features bigger, and it becomes more formal. I think it is much better to work with small changes and many branches. I think it becomes contradictory to Continuous Delivery, or rather to Continuous Integration. […] Then, a feature branch does not live up to two weeks. However, this seems to be applicable for a feature on the main branch. Eventually, if there is a feature branch that involves a whole team or several teams, which is very big and complex and has a big impact, then this dashboard would be required on demand for such a feature.”

5.5.3 Overview dashboard

The prototype for an overview dashboard was not perceived as the most important view for daily work; none of the developers thought it would be an essential part of their work. However, it could serve as proof that the whole project is functioning as expected. Test user 7 stated that this dashboard contained “background measurements”: general information that is not always of interest but is still important to check on. The dashboard should be placed in a shared space used by many people, e.g., the coffee room. Two test users mentioned that the information was interesting but might be of more interest to project leaders or managers higher up in the organization. Other feedback on the overview dashboard was that it contained very heterogeneous information, some of which was already displayed in other dashboard views. Therefore, some test users thought that metrics visible on other dashboards should not receive equal emphasis here; for instance, code quality metrics were not considered to need the same emphasis as metrics that were new.

5.5.4 General feedback

For code quality metrics derived from static code analysis, e.g., code duplication and cyclomatic complexity, mainly two types of visualization were used: gauges and metric numbers. Gauges were criticized again, as the organization did not have any standardized values or thresholds. Still, visualizations such as gauges were desired, as they would be better at showing how much was left until the team reached its goals or how well the team was performing. On the other hand, visualizing with numbers and colored arrows could create confusion. One test user mentioned that three different things could be read from such a visualization: for instance, in the visualization of the number of warnings in Figure 5.4, the number could mean that there are 74 warnings, the color could indicate that the team has surpassed the threshold, and the arrow could mean that the number of warnings has increased.


6 Dashboard prototype and design guidelines

In this chapter, the final prototype in Kibana is presented and, based on the results, a set of suggestions is given for designing and developing a dashboard visualizing software metrics.

6.1 Final prototype in Kibana

The final prototype was created in Kibana after the usability tests. The software Logstash and Elasticsearch handled data collection, processing, and storage; these programs belong to a product suite provided by Elastic. Kibana provides a set of visualizations that can be used to visualize metrics. Some of them can be seen in Figure 6.1, where line charts, pie charts, and alphanumeric displays are used.

Raw data
Raw data were collected from the various tools used to create the deployment pipeline described in Figure 2.3. In Figure 6.1, the software metrics used data from Jenkins and Bitbucket. Part of the work was to automate the monitoring; therefore, Jenkins was configured to send data automatically whenever an event occurred. The event triggers Jenkins to send all data to Logstash, and the data is then stored by Elasticsearch. Some preprocessing is performed in Logstash and in Kibana to derive additional information. For instance, Logstash adds metadata describing the data, and preprocessing can be done in Kibana, e.g., creating new data fields by calculating on existing data, such as finding the duration between two dates.

Data structure
The raw data is stored in indexes, where an index is a collection of documents. A document is a unit of information that can be indexed, searched, and visualized (Elastic.co, 2018a). The process of indexing is not explained in this thesis, as it exceeds the scope of the study.

Visual structure
The visual structures are predefined by the visualizations available in Kibana. In this dashboard, the data has been mapped to graphical elements such as points, lines, and surfaces. The graphical properties used are size, orientation, color, and alphanumerical markers. A majority of the metrics are unidimensional; however, some metrics are shown as trends with line charts. In the line charts, the spatial substrate is defined as quantitative for the y-axes and ordinal for the x-axes.

View
The view is composed of several visualizations. The user can relocate and resize the components to customize the dashboard to their own needs. Location probes in the form of pop-up windows are one view transformation Kibana can provide.
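As an illustration of the ingestion step described above, the following sketch enriches a hypothetical Jenkins build event with a derived duration field and ingest metadata before storing it in Elasticsearch. The index name, field names, and the use of a recent version of the official Elasticsearch Python client are assumptions for illustration; in the thesis setup, this enrichment was performed by Logstash rather than in application code.

```python
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical test cluster

def ingest_build_event(event: dict) -> None:
    """Enrich a Jenkins build event and store it in Elasticsearch.

    Assumes the event carries ISO-8601 'start_time' and 'end_time' fields,
    mirroring the kind of preprocessing the thesis performed in Logstash.
    """
    start = datetime.fromisoformat(event["start_time"])
    end = datetime.fromisoformat(event["end_time"])
    doc = {
        **event,
        "build_duration_s": (end - start).total_seconds(),    # derived field
        "@timestamp": datetime.now(timezone.utc).isoformat(),  # ingest metadata
        "source": "jenkins",                                   # data source label
    }
    es.index(index="jenkins-builds", document=doc)

ingest_build_event({
    "job_name": "module-a-build",       # hypothetical job
    "build_status": "SUCCESS",
    "start_time": "2018-05-04T10:00:00+02:00",
    "end_time": "2018-05-04T10:07:30+02:00",
})
```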


Figure 6.1. Prototype in Kibana.

Interactions
This dashboard provides interactions such as details-on-demand and filtering. Details-on-demand is provided to some extent by hovering over a visualization; for instance, when hovering over a specific event in the line chart for the number of commits, a pop-up window states the exact number of commits and the date. Filtering can be done by typing a filter query in the search bar at the top of the webpage. Another type of filter interaction is clicking on the visualizations: a click on the build health pie chart can filter the view to show only the values connected to the selected data set, e.g., successful builds. This action updates the whole dashboard dynamically.
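The click-to-filter interaction described above corresponds to a query against the underlying index. The sketch below expresses the same filter as a Lucene-style query string, of the kind typed into Kibana's search bar; the index and field names are the hypothetical ones from the earlier ingestion sketch.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical test cluster

# Same effect as clicking the "successful" slice of the build health
# pie chart: restrict the view to one value of the build_status field.
result = es.search(index="jenkins-builds", q="build_status:SUCCESS", size=10)
for hit in result["hits"]["hits"]:
    print(hit["_source"]["job_name"], hit["_source"]["build_duration_s"])
```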

6.2 Design guidelines for designing dashboards

This section presents guidelines for designing and developing dashboards that visualize software metrics. The guidelines are based on the findings from the usability tests conducted with four software development teams at the Surveillance department at Saab. The guidelines concern the importance of information quality; context and interaction; dashboards depending on context; visualizations based on thresholds; and standardization and flexibility.

6.2.1 Importance of information quality

One of the most important aspects of the dashboard was that it updated automatically and always showed correct values. In the interviews during the user observations, test users mentioned that automation was an essential part of collecting data and broadcasting the most up-to-date information with no additional effort. Moreover, it was desired that the dashboard update whenever a new event occurred, similar to how the deployment pipeline functions, where one event triggers another. That would ensure that the dashboard functions as a tool that enables shorter feedback cycles.


It was therefore concluded that a visualization must update as soon as an event has occurred. Furthermore, the dashboard should assure the user that the information is neither outdated nor wrong. It was mentioned several times in the usability tests that clear labeling showing the scope is required. Clear labeling includes text showing when the dashboard was last updated and where the data was collected from. For instance, the Jira component in Figure 5.4 should include the name of the board and the dates of the sprint; the labels would then reassure the team that the correct board was shown. Assuring information quality is one of the elements of a successful dashboard, according to Staron et al. (2014).

6.2.2 Context and interaction

When developing the dashboard system, the different contexts should be considered. The contexts should be the foundation for what the dashboard includes and for the level of interaction, and thus they affect how the dashboard will be used. The ultimate goal is to create dashboards that can provide feedback loops in the different contexts of software delivery. Three contexts were identified in this thesis. The first context is the office landscape or a shared room, e.g., the coffee room; a dashboard in this environment is also called an information radiator, following Cockburn's (2007) definition. The second context is a dashboard used at morning meetings, and the third context is a developer working on a feature. For these three contexts, two primary design spaces were identified. The first design space is a big screen, e.g., a TV display, which can broadcast information to a larger audience. The second design space is a computer desktop screen, i.e., the user's workstation. When a dashboard addresses a broader audience in a common room, the level of interaction should be low, as the user is not supposed to interact further with a mouse or any other controller. In the second context, the morning meeting, the team can use a dashboard to track progress and receive feedback from several data sources; a low level of interaction, such as clicking for redirection to another webpage or details-on-demand to find specific information, was desired. The third context is the developer working with the code. This context should provide fast feedback and inform the user about the health of the new code. Ultimately, the user can explore to some extent where in the code the problem occurred.

6.2.3 Dashboard depending on contexts

Throughout the thesis work, the distribution of different dashboard views across different contexts was appreciated by the test users. It offers a dashboard where all information is considered relevant and supports the user in different contexts. Therefore, when designing a dashboard for a software development organization, the dashboard should tackle the problem from several aspects, thus creating a complete solution. The following dashboards were identified for the company Saab: overview dashboard, build status dashboard, team dashboard, and feature dashboard.


Overview dashboard
The overview dashboard should contain information that involves the whole project and give an overview of it. The software metrics shown in this view should measure the whole project, including metrics such as build quality and CI cycle time. This is also the appropriate dashboard for visualizing issues identified within the organization. One example given during the usability test was release on time, which was a problem in the organization at that time; this information concerned the whole project and therefore belongs in the overview dashboard.

Build status dashboard
The build status dashboard contains only information about the build status of all software modules in the product. This dashboard should be placed in a location that is hard to avoid, informing everyone about the current situation. This information is interesting for more than just the developers, as it shows whether the software is in a deployable state, and it concerns other stakeholders such as the DevOps team, product owners, and managers.

Team dashboard
The team dashboard is specifically designed to support the team's software development processes and daily work. If there are any test results or processes that concern only this team, they should be shown in this view. Moreover, the team should, to some extent, be able to change the visualizations and metrics in the dashboard.

Feature dashboard
The feature dashboard should help the developers receive fast feedback. The feedback is important for the user in deciding whether the code changes meet the expectations established within the team or organization.

6.2.4 Visualizations based on thresholds

The use of visualizations varies depending on whether thresholds exist. Without thresholds, visualizations such as traffic light metaphors and arrows might be used to indicate trends; however, a trend alone does not reveal whether a value is reasonable. Some values are more self-explanatory, e.g., compiler warnings, which are not desired. For other metrics, such as code versus comments, it is not as apparent whether an upward or downward trend is good. During the usability testing, test user 7 defined the goal of implementing a dashboard at Saab: “To make it valuable to see a number, you have to have a relation to them. For instance, here it says 164 in technical debt. You have to know if it affects you or not. Otherwise, you will not care about the number. That is what we try to achieve in the DevOps team when we talk about radiators; we want to create a behavior within the organization that looks at this screen that they should act when it is indicating red. No mail should be sent out, and no manager should need to delegate.”

Therefore, if no thresholds are provided, the dashboard needs to provide some calculation of whether the number is good or not to aid the user. In this case, an upward trend shown with arrows has been identified as the indicator.


However, the use of more detailed visualizations such as gauges should be investigated further once thresholds are established. Gauges were desired, as they would be better at visualizing how much is left until the team reaches its goals or how well the team is performing.
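The following is a minimal sketch of the indicator logic discussed in this section: with an established threshold, a value can be mapped to a traffic-light color; without one, only the trend direction can be reported. The threshold and values are hypothetical, not figures used at Saab.

```python
from typing import Optional

def metric_status(value: float, previous: float,
                  threshold: Optional[float] = None) -> str:
    """Map a metric reading to a traffic-light color or a trend arrow.

    With a threshold, red/green tells the team whether to act; without
    one, only the direction of change can be reported.
    """
    if threshold is not None:
        return "red" if value > threshold else "green"
    if value > previous:
        return "up"      # e.g., more compiler warnings than last build
    if value < previous:
        return "down"
    return "steady"

print(metric_status(74, 60, threshold=50))  # 'red' -- act now
print(metric_status(74, 60))                # 'up'  -- trend only
```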

6.2.5 Standardization and flexibility

When designing a dashboard for several development teams, the designer must consider whether specific dashboards should be standardized or adaptable. For dashboards to be supportive, it is important that they visualize the information that is needed. This thesis suggests that dashboard views should be adapted to the development teams' and users' needs.

In this thesis project, an attempt was made to design a dashboard system that could be utilized by all development teams included in the thesis work. The user observations aimed to discover what the teams had in common and to visualize what seemed relevant and supportive. It was discovered that the development teams differed significantly in their information needs. The fact that the department works with both software and hardware has an impact: one of the development teams worked closer to the hardware, while another worked closer to the software. Consequently, the feedback the teams wanted depended on which specific part of the product they worked with.

Staron et al. (2014) mention that standardization based on international standards is important, as it increases the portability of measures. It functions as a communication tool to showcase measurements and progress to the customer. It can be argued that a similar argument applies to having standardized metrics within an organization: having the same measurements enables portability, and other teams or stakeholders can easily get insight into another team's work and make comparisons. Consequently, it creates a common language and reduces confusion and misinterpretation. At the same time, Staron et al. (2014) state that properly choosing stakeholders is crucial when adopting dashboards in a large organization. The stakeholders must have the authority to define the indicator and its threshold, and they also need the authority to act upon the status of the indicator. The development team can monitor the validity of the plan and assess whether the development plan is upheld using software metrics. However, it can be questioned whether the software development team can perform these tasks if the dashboard is not shaped to their needs.


7 Discussion

This chapter discusses the results of this thesis. First, the limitations of the study are discussed. Thereafter, the maturity of Continuous Delivery and how it might affect the dashboard design is considered. Lastly, the selected visualizations are discussed.

7.1 Limitations of the study

In this section, the two major limitations of the thesis work are presented. The first was the dependency on another thesis project conducted simultaneously; the second concerned technical constraints.

7.1.1 Dissonance between thesis projects

One limitation of this study was that the software metrics used in the design were based on the ongoing study by Johansson (2018), which had not yet concluded its results. Thus, the software metrics were updated continuously during the study, leading to some software metrics not being included in the lo-fi prototype but appearing in the hi-fi prototype, and vice versa. Moreover, the software metrics that were included in the second prototype did not go through the same generative ideation process as the preliminary set of metrics.

In addition, there was a dissonance between the two thesis projects. Hardware availability was one of the software metrics that most people voted to receive more feedback about in Johansson's questionnaire. However, when the metric was included in the hi-fi prototype (see Figure 5.6 and Figure 5.7), the response to it was rather reserved: it was mentioned as "good to know" but not necessarily in the dashboard view shown to the test user. This difference in responses can have several explanations. For instance, the test users that took part in the usability test may not have voted for hardware availability, or may not have participated in the questionnaire. Another reason could be that it was not visualized in exactly the way the users had expected. This leaves an uncertainty in the result that cannot be explained. Ideally, this thesis project would have been conducted after Johansson's thesis, or based on other completed studies related to software metrics; then, the metrics for Continuous Delivery, as well as for Saab as the specific context, would have been defined before this thesis.

7.1.2 Technical limitations

The technical limitations concerned the availability of data and of visualization options. Data collection from various sources is required to visualize software metrics. Moreover, the data needs to be processed to obtain the desired software metric; for instance, CI cycle time was calculated by finding the different timestamps in the data that build the metric. When this thesis project was conducted, not all tools and data sources were available yet, and no processing of data had been done. Many of the code quality metrics used in the prototypes require a static code analysis tool to provide the data.


Consequently, in the final prototype, the number of software metrics visualized was restricted by the data sources that were available. The second technical limitation was the limited variety of visualizations in Kibana. It has a set of predetermined visualizations that can be used, such as bar charts, region maps, and heat maps. The lo-fi and hi-fi prototypes were designed under the premise that all types of visualizations would be available. Therefore, when implementing the last prototype, not all software metrics could be visualized. For instance, data about the build status of a software component could be retrieved; however, no visual structure in Kibana could support it.

7.2 Continuous Delivery maturity model and dashboard

The Continuous Delivery deployment at the Surveillance department was at a relatively early stage. At this stage, certain feedback was wanted, and only a limited amount of feedback could be provided. As the organization's Continuous Delivery processes mature, new feedback might be desired, which might affect the dashboard design as well as the dashboard visualizations. This issue was mentioned in the usability test with test user 8, who pointed out that there had been delays in the software releases. A dashboard visualization could be provided to gain a better understanding of this problem; for instance, the visualization could show how long a release was delayed and the activities related to the delay. The dashboard visualization could then uncover underlying patterns related to this specific issue. However, once the problem is solved and the release process improves, the metric would no longer require as much focus. A change in the dashboard design might then be needed, e.g., the information could be placed in a secondary view or be minimized to a single color indicating whether the release was on time or not.

Working with organizational culture is also part of the Continuous Delivery maturity model developed at Saab (see Table 2.1). This thesis project designed a dashboard at a department that had not worked with dashboards before. However, as the deployment pipeline matures and the organization adapts to using a dashboard, new behaviors and user requirements might appear.

7.3 Selection of visualization

The visualizations in the dashboard primarily used alphanumerical marks, traffic light metaphors in several components, and line charts. The visualizations were selected based on the literature, especially Few's (2006) book about dashboard design and Staron et al.'s (2014) study. Few (2006) states that the most commonly used visualizations in dashboards fall into the graph category, owing to the predominance of quantitative data being visualized. Charts such as bullet graphs, sparklines, and box plots belong to this category. Moreover, icons such as up and down arrows are commonly used. Text in the dashboard is both necessary and desirable, especially when reporting a single measure alone without comparing it to anything; in such cases, using text to communicate numbers is more direct and efficient than a graph.


Staron et al. (2014) state that the most powerful visualization metaphors are traffic lights, and gauges and meters. Traffic lights are a powerful visualization as they can quickly communicate status. Gauges and meters can provide details such as numbers and still indicate the state by showing the graduation of the status. The traffic light metaphor is widely used in the dashboard for indicating different statuses. Gauges and meters were tested in the prototypes; however, since no thresholds are used in the organization, numbers combined with the traffic light metaphor were preferred for conveying the current status. Nowell et al. (2002) mentioned that for unidimensional displays, previous research has suggested that digits give the most accurate results and that alphanumeric marks as a graphical device are effective for this purpose.

The visualizations in the dashboard were selected based on the appropriate level of interaction and the suitability of the actual data. The dashboard was mostly designed to support at-a-glance interaction, which limits the number of visualizations that can be used. Furthermore, the variation of visualizations highly depends on the ideation process at the beginning of the study. In the ideation process, the goal was to generate as many ideas as possible and, at a later stage, summarize the sketches to create the prototypes. In other words, the ideation session is one of the essential steps that build the foundation for creating the dashboard. As the ideation session was performed by the author alone, the ideas that emerged are based solely on the author's previous experience and knowledge. A different setting for the ideation process, for instance a session in a multidisciplinary team or together with the users, could have led to different kinds of visualizations emerging. Similarly, the inspiration for designing the dashboard was taken from other dashboards used within the same field; dashboards from other industries could be reviewed to gain a broader perspective and an even more fruitful ideation session.

The usability tests were conducted in a controlled environment. The observer's responsibility was to remain neutral, meaning that questions should be asked in a non-leading way. Since the author was responsible for transcribing, coding, and finding themes in the answers, personal interpretations might have affected the results. Moreover, test users might have felt, consciously or subconsciously, that they needed to please the observer, and thus answered untruthfully. Most of the test users had not previously used dashboards in the workplace at Saab; in this specific test setting, users might choose familiar visualizations they already know over new kinds of visualizations. Allowing the test users to try the prototypes in a more natural setting might reveal design flaws and opinions even better, as they might feel freer to explore, with less pressure regarding time or giving pleasing feedback. Further studies could explore dashboards with more views, where more complex information visualizations might be appropriate, where the emphasis is placed on interaction, and where the user context gives the user more time to review the visualizations.


8 Conclusion and future work

This chapter concludes this thesis and presents suggestions for future work.

8.1 Conclusions

The company Saab was deploying Continuous Delivery, and the organization wanted a dashboard that could monitor the software delivery process and provide feedback. In this thesis project, a prototype was implemented in the program Kibana. Additionally, a set of design guidelines was constructed for designing dashboards that visualize software metrics. The expected effect of implementing dashboards following the design guidelines is that they will support the organization in gaining insight into the software development process. The visualized feedback can create a shorter feedback loop and help the organization identify areas of improvement and track progress to assess whether development is proceeding as planned. The earlier an issue is detected, the easier and less expensive the troubleshooting process will be. The dashboard aims to provide the fast and useful feedback that enables quick detection of issues in the software development process.

One of the significant drawbacks of this study was that it depended on another, uncompleted study. As a result, the software metrics used in the study had not yet been evaluated with respect to whether they were appropriate for the organization. Moreover, some technical limitations meant that the final prototype in Kibana could not be implemented exactly according to the earlier prototypes.

This thesis work used a user-centered approach to design the dashboard. Throughout the study, interviews and user-based usability testing were conducted to capture representative users' thoughts. To ensure the validity of this thesis, both interviewers were responsible for documenting answers in the semi-structured interviews, and voice recording was used during the usability tests. This reduced the risk of misinterpreting the users' answers and increased the validity of the collected data.

8.2 Future work

The dashboard design guidelines in this thesis are based on the conditions at the company Saab. To evaluate the usability of this thesis' results, dashboards need to be implemented following the guidelines, and thereafter an extensive evaluation should be conducted. On a larger scale, similar studies need to be conducted in other organizations to see whether the design guidelines established in this thesis are usable elsewhere. The design guidelines can then be modified to become more general and applicable for their purpose. Therefore, further studies using a user-centered approach to develop dashboards for monitoring software development processes in several organizations are needed to establish a universal framework and guidelines for this field.


A further study might be needed on software metrics, user context, and dashboard visualization at different stages of organizational maturity, to investigate the impact Continuous Delivery maturity has on dashboard designs for software monitoring. As the organization's transformation progresses, the dashboard might need to be updated to cohere with the organization's pipeline. Future work could therefore investigate whether the different phases of the transformation differ in their needs and how the dashboard design should be maintained accordingly.


References

Agile Alliance. (2018a). What is an Information Radiator? [Online] Available at: https://www.agilealliance.org/glossary/information-radiators/ [Accessed 5 Mar. 2018].

Agile Alliance. (2018b). Daily Meeting. [Online] Available at: https://www.agilealliance.org/glossary/daily-meeting/ [Accessed 9 May 2018].

Agile Manifesto. (2001). Principles behind the Agile Manifesto. [Online] Available at: http://agilemanifesto.org/principles.html [Accessed 22 May 2018].

Atlassian. (2018). Jira | Issue & Project Tracking Software | Atlassian. [Online] Available at: https://www.atlassian.com/software/jira [Accessed 5 May 2018].

Axure. (2018). Design the right solution. [Online] Available at: https://www.axure.com/ [Accessed 9 May 2018].

Benyon, D. (2014). Designing Interactive Systems: A Comprehensive Guide to HCI, UX and Interaction Design (3rd ed.). Italy: Pearson.

Bitbucket. (2018). Version control software for professional teams. [Online] Available at: https://bitbucket.org/product/version-control-software [Accessed 5 May 2018].

Card, S.K., Mackinlay, J.D., & Shneiderman, B. (1999). Readings in Information Visualization: Using Vision to Think.

Chandrasekara, C. (2017). Beginning Build and Release Management with TFS 2017 and VSTS. Berkeley, CA: Apress.

Charters, E. (2003). The use of think-aloud methods in qualitative research: An introduction to think-aloud methods. Brock Education Journal, vol. 12, no. 2, pp. 68-82.

Cockburn, A. (2007). Agile Software Development: The Cooperative Game. Indiana: Pearson Education Inc.

Dam, R. and Siang, T. (2017). What is Ideation – and How to Prepare for Ideation Sessions. [Online] The Interaction Design Foundation. Available at: https://www.interaction-design.org/literature/article/what-is-ideation-and-how-to-prepare-for-ideation-sessions [Accessed 9 May 2018].

Dam, R. and Siang, T. (2018). Introduction to the Essential Ideation Techniques which are the Heart of Design Thinking. [Online] The Interaction Design Foundation. Available at: https://www.interaction-design.org/literature/article/introduction-to-the-essential-ideation-techniques-which-are-the-heart-of-design-thinking [Accessed 29 May 2018].

Elastic.co. (2018a). Basic Concepts. [Online] Available at: https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html [Accessed 28 May 2018].

Elastic.co. (2018b). Kibana: Explore, Visualize, Discover Data | Elastic. [Online] Available at: https://www.elastic.co/products/kibana [Accessed 14 May 2018].

Electric Cloud. (2018). ElectricFlow. [Online] Available at: http://electric-cloud.com/products/electricflow/features/ [Accessed 12 Mar. 2018].

Few, S. (2006). Information Dashboard Design: The Effective Visual Communication of Data. Italy: O'Reilly.

Humble, J., Farley, D. (2010). Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. Pearson Education.

Interaction Design Foundation. (2018). Contextual Interviews and How to Handle Them. [Online] Available at: https://www.interaction-design.org/literature/article/contextual-interviews-and-how-to-handle-them [Accessed 17 Apr. 2018].

Interaction Design Foundation. (2018). What is User Centered Design? [Online] Available at: https://www.interaction-design.org/literature/topics/user-centered-design [Accessed 27 Feb. 2018].

ISO.org. (2017). ISO/IEC/IEEE 15939:2017(en). [Online] Available at: https://www.iso.org/obp/ui/#iso:std:iso-iec-ieee:15939:ed-1:v1:en [Accessed 24 June 2018].

Jain, A. and ram Aduri, R. (2016). Quality Metrics in Continuous Delivery: A Mixed Approach. [Online] Faculty of Computing, Blekinge Institute of Technology. Available at: http://bth.divaportal.org/smash/get/diva2:945682/FULLTEXT02.pdf [Accessed 21 Feb. 2018].

Jenkins. (2018). Jenkins. [Online] Available at: https://jenkins.io/ [Accessed 5 May 2018].

JFrog. (2018). Artifactory: Universal Artifact Repository Manager. [Online] Available at: https://jfrog.com/artifactory/ [Accessed 5 May 2018].

Johansson, S. (2018). Continuous Delivery: Improving Feedback with a User-centered Approach. MA thesis, University of Trento.

Kerzner, H. (2013). Project Management Metrics, KPIs, and Dashboards: A Guide to Measuring and Monitoring Project Performance. Hoboken, New Jersey: John Wiley & Sons.

Kim, G., Humble, J., Debois, P., Willis, J. (2016). The DevOps Handbook: How to Create World-Class Agility, Reliability, & Security in Technology Organizations. Portland: IT Revolution Press.

Lancaster, G. (2004). Research Methods in Management. Oxford: Elsevier Butterworth Heinemann.

Lazar, J., Feng, J.H., Hochheiser, H. (2010). Research Methods in Human-Computer Interaction. Glasgow: John Wiley & Sons.

Lehtonen, T., Suonsyrjä, S., Kilamo, T. & Mikkonen, T. (2015). Defining Metrics for Continuous Delivery and Deployment Pipeline. In Proceedings of the 14th Symposium on Programming Languages and Software Tools, pp. 16-30 (CEUR Workshop Proceedings, vol. 1525).

MacEachren, A.M. (1995). How Maps Work. New York: The Guilford Press.

Mazza, R. (2009). Introduction to Information Visualization. London: Springer.

Mills, E.E. (1988). Software Metrics. SEI Curriculum Module SEI-CM-12-1.1. Seattle University; Software Engineering Institute, Carnegie Mellon University.

Nielsen, J. (2010). Why You Only Need to Test with 5 Users. [Online] Available at: https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/ [Accessed 5 Apr. 2018].

Nielsen, J. (2012). How Many Test Users in a Usability Study? [Online] Available at: https://www.nngroup.com/articles/how-many-test-users/ [Accessed 5 Apr. 2018].

Nowell, L., Schulman, R., Hix, D. (2002). Graphical Encoding for Information Visualization: An Empirical Study. In IEEE Symposium on Information Visualization (INFOVIS 2002), pp. 43-50.

Privitera, M.B. (2015). Contextual Inquiry for Medical Device Design (1st ed.). Elsevier Academic Press. Available at: http://store.elsevier.com/ContextualInquiry-for-Medical-Device-Design/Mary-Beth-Privitera/isbn9780128018521/

Sharma, S. (2017). The DevOps Adoption Playbook: A Guide to Adopting DevOps in a Multi-Speed IT Enterprise.

Sourceforge. (2018). Panopticode. [Online] Available at: https://sourceforge.net/projects/panopticode/ [Accessed 7 Feb. 2018].

Spence, R. (2014). Information Visualization. Cham: Springer International Publishing.

Staron, M., Meding, W., & Nilsson, C. (2008). A framework for developing measurement systems and its industrial evaluation. Information and Software Technology, 51, 721-737.

Staron, M., Meding, W., Hansson, J., Höglund, C., Niesel, K., & Bergmann, V. (2014). Dashboards for continuous monitoring of quality for software product under development. In I. Mistrik, R. Bahsoon, P. Eeles, R. Roshandel, & M. Stal (Eds.), Relating System Quality and Software Architecture (pp. 209-229). Waltham, MA: Morgan Kaufmann.

Sturm, R., Pollard, C., & Craig, J. (2017). Chapter 10: DevOps and Continuous Delivery. In Application Performance Management (APM) in the Digital Enterprise, pp. 121-135.

Tufte, E. (2001). The Visual Display of Quantitative Information (2nd ed.). Cheshire, Conn.: Graphics Press.

Vine, E. (2011). Introducing qualitative design. In: Richardson, P., Goodwin, A., Vine, E. (Eds.), Research Methods and Design in Psychology. United Kingdom: Learning Matters Ltd, pp. 97-109.

Walker, M., Takayama, L., Landay, J.A. (2002). High-fidelity or low-fidelity, paper or computer? Choosing attributes when testing web prototypes. In Proceedings of the Human Factors and Ergonomics Society 46th Annual Meeting, Baltimore, USA, pp. 661-665.

Ware, C. (2013). Information Visualization: Perception for Design (3rd ed.). Waltham: Elsevier/Morgan Kaufmann.

Wolff, E. (2017). A Practical Guide to Continuous Delivery. Addison-Wesley.


Appendix A

This appendix presents the questions asked in the interviews during the initial user observations.

1. What is your job title?
2. What is your job description? What do you work with?
3. Which product/development team are you working with?
4. Which development methodology does your team use?
5. How does your team ensure the quality of the product being developed?
6. What are the limitations of the metrics in use to ensure quality?
7. If working with CD, which part of the pipeline do you work with?
8. Which type of feedback is important to you or would assist you in your daily work?
9. Do you use any metrics today?
10. Which tools does your team use?

Questions that were asked if applicable, e.g., if the interviewee already used a dashboard:

1. How are you using the current dashboard?
2. When are you using the dashboard?
3. What type of interactions do you have with the dashboard? E.g., desktop and exploration, or at-a-glance?


Appendix B

This appendix contains an image from the ideation session with sketches on post-it notes attached to a whiteboard.

Figure B.1. The result of the ideation session.


Appendix C

This appendix shows some of the paper prototypes that were used for the usability testing. In total, six paper prototypes were shown to the test users. Figure C.1 shows the build status for the whole project, with the software modules placed next to the associated teams; in addition, the software metrics lead time, releases per month, and build frequency are visualized. Figure C.2 is another version of the build status dashboard, showing the build status of the whole project with the software modules placed next to the associated teams, together with the same additional metrics. In Figure C.3, more focus is placed on metrics that relate to code quality. It shows test pass percentage and bug reports as trends, while other metrics, such as code complexity, code coverage, and build quality, are shown as numbers giving a snapshot of the current moment. The traffic light metaphor is used to indicate whether the expected goal of each software metric is met.

Figure C.1. Build status for all teams with additional metrics.


Figure C.2. Paper prototype with build status in a network graph.

Figure C.3. Paper prototype for a team dashboard.


TRITA-EECS-EX-2018:372