[ieee 2011 37th euromicro conference on software engineering and advanced applications (seaa) -...



Page 1: [IEEE 2011 37th EUROMICRO Conference on Software Engineering and Advanced Applications (SEAA) - Oulu, Finland (2011.08.30-2011.09.2)] 2011 37th EUROMICRO Conference on Software Engineering

Kanban Implementation in a Telecom Product Maintenance

Marko Seikola, Team Coach

Hanna-Mari Loisa, Team Coach

András Jagos, Team Coach

Abstract— Telecom Product Maintenance at two Ericsson R&D Centers implemented Kanban. The third level maintenance is performed by the Customer Support Request (CSR) teams and by the design maintenance teams, which are located in two sites. There are separate backlogs for the customer support and for the design maintenance in the primary site and a third backlog for the design maintenance in the secondary site.

Pull mindset, team working, team empowerment, and continuous improvement have become part of the everyday activities. Best practices from Scrum have been selected to complement the Kanban implementation. Also, the metrics have been reviewed. The major challenges have been related to the boundary rules, for instance, multi-site working, the platform dependency and the service level agreements but also to the roles and responsibilities. The Kanban boards and the chosen practices have been adjusted as the understanding of Lean has increased.

This industry paper presents the journey to Lean product maintenance including the identification of the key success factors that are to some extent generalizable. First, the overall implementation is discussed followed by a deeper description of the implementation both in the CSR handling and in the Fault Handling (design maintenance) including also the experiences from the secondary R&D site. The paper continues by discussing the identified challenges and positive effects. At the end of the paper, the implementation concepts and key success factors are stated.

Keywords- Kanban, Customer Support, Design Maintenance, Lean, Agile, Multi-site

I. INTRODUCTION

Lean Software Development focuses on improving the lead times and removing the waste from the workflow. In a telecom product maintenance, Kanban was selected as the method to implement the Lean mindset. To be precise, there are in fact three slightly different setups within the same organization. The best practices from the Scrum framework have been selected to complement the Kanban approach. During the transition, multiple challenges have been faced. This paper provides insights into performing a massive Lean transition in an organization that is constantly in customer contact.

The paper first briefly introduces the base setup, followed by the introduction of the three different Kanban implementations mainly focusing on the people, the backlog, the Kanban board, meetings, Work in Progress limits, and metrics. After the introduction of the current state, the implementation concepts are summarized, followed by an identification of positive experiences and challenges. As Kanban is highly adaptive in nature, these challenges are constantly in focus.

At the end of the paper, the key success factors are stated. Hence, this industry paper provides a case study that can be utilized as a base for an organization in the beginning of a Lean transition.

II. BACKGROUND

An Ericsson R&D Center in two locations develops a complex mobile network product. The product is mature. In addition to the development, the maintenance of the product, i.e., the third level support, is located in both sites. A support case is assigned in the form of a Customer Support Request (CSR).

In 2009, the primary R&D site launched a transformation to implement the Scrum framework in the new feature development. It was realized that the sprints required by the Scrum framework would not fit into the hectic product maintenance. Indeed, if a customer escalates a severe problem, the concepts of sprints and sprint commits are not applicable.

As usual in customer care, there are contracts that define the target lead times. In addition to the Service Level Agreement, an internal agreement has been written to define the target lead time for each level of the support organization.

Previously, the CSR handling was performed individually. All handlers were highly experienced and thus they were often able to solve the CSR by themselves. The work was assigned by a project manager (push mindset).

The design maintenance was separated from the CSR handling. If a software fault is found, a Trouble Report (TR) is issued towards the design maintenance. Earlier, the TRs were fixed by groups of individuals that were formed around different parts of the software. These groups were called product teams. The teams possessed a wide knowledge of the status of the specific product part since they handled all of the TRs affecting their responsible area.

III. OVERALL SOLUTION

According to the Kanban Primer [1], the first area utilizing Kanban in software development was maintenance. Thus, it was decided that the product maintenance would experiment with Kanban. Since the primary site already had the first experiences of Scrum, the best practices from Scrum were implemented to complement Kanban. For instance, all teams have the daily meetings and the retrospectives. In fact, the engineers even participated in a two-day Scrum course.

2011 37th EUROMICRO Conference on Software Engineering and Advanced Applications

978-0-7695-4488-5/11 $26.00 © 2011 IEEE

DOI 10.1109/SEAA.2011.56

321

The chosen approach follows the Kanban basics as defined by Kniberg and Skarin [2]:

• Visualize the workflow

• Limit Work In Progress (WIP)

• Measure the lead time and optimize

On top of the Kanban basics – or perhaps as a ground for Kanban – the Lean principles [3] have been widely discussed and adopted. For instance, there is a common target of removing waste as defined by Poppendieck [3].

To establish a solid ground for the teamwork, the offices have been renovated and walls have been removed. As a result, the CSR handling teams are now sitting in one open office and the Fault Handling teams in another. The open offices are located in the same corridor. The layout of the rooms is such that each team has their own desk. The maintenance management is placed in a third open office. In the secondary site, the design maintenance is also placed in one room. In contrast to the approach in the primary site, the team members have their worktables against the wall so that the center of the room remains empty. Hence, team members can roll around with their chairs and work together in ad hoc groups.

Separate backlogs have been formed for the CSR handling and for the Fault Handling. Both of the backlogs are placed on whiteboards in the primary site. However, the design maintenance in the secondary site decided to also utilize a software tool to visualize the backlog between the sites. The Product Owner role is shared by the ‘Proxy Product Owners’ (PPO).

As the product is based on a platform that is utilized also by other products, it was evident from the beginning that the interface towards the platform maintenance should remain untouched.

IV. KANBAN IMPLEMENTATION AT THE CSR HANDLING

The very first steps in the Kanban implementation were to renovate the office and to establish the teams. The teams were not formed as a management decision. Instead, all of the CSR handlers went to a meeting room and agreed on the team setups while management was not allowed to attend.

Once the teams and the office environment were in place, it was time to establish the Kanban board and the backlog. In fact, a group of university students contributed to the transition by conducting a study on the workflow at the CSR handling. We proceeded by utilizing the Kanban board setup the students suggested.

A. The People

The teams are the core of the CSR handling – all of the engineers are highly experienced professionals with wide knowledge of the product. In the beginning, a kick-off session was held for the teams. However, after minor adjustments within the organization, two teams were merged. The merge was triggered by the teams, not by the management. Also, the team setups have been adjusted since there have been visitors from other support organizations to widen their knowledge regarding the product by joining the teams.

A challenge for the teamwork is that once in a while, a CSR handler is required to travel to a live site. Often the trips come on short notice. Additionally, it is common that an incoming severe case receives the highest priority and thus forces the teams to switch tasks. As the team responsibility of the CSRs has been emphasized, the team members help each other in case of unplanned changes.

The primary tasks of a Team Coach are similar to the tasks of a Scrum Master, e.g., supporting the team in continuous improvement by challenging the team, facilitating meetings, removing impediments, and protecting the team from external interfaces pushing work onto the teams. In addition, in the CSR handling, the Coach is responsible for most of the statistics.

Because of the nature of the support activity, the PPOs are involved in the escalation and management discussions regarding the high-attention support requests. The maintenance management is also involved in these discussions. Both the managers and the PPOs actively visit the open office for discussions with teams and to see the Kanban board to interpret the high-level status.

B. The CSR Backlog

There are different severity options for CSRs. Also, a hot flag can be raised. Since there are guidelines regarding the prioritization, the backlog was straightforward to establish. The PPOs prioritize the incoming CSRs to the backlog, placing the highest priority support request as the topmost item. On the top of the backlog is a box called ‘Urgent’, which is often utilized for the CSRs that are flagged as hot. If the PPOs place an item there, the analysis needs to start as soon as possible. The priority is a function of, e.g., the severity, CSR type, and the support level the customer has bought.
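The prioritization rule can be illustrated with a small sketch. The paper does not state the actual function, so the weights, the CSR type names, and the support level names below are hypothetical and serve only to show how the severity, the CSR type, and the support level could be combined into one ordering. Treating every hot-flagged CSR as ‘Urgent’ is also a simplification, since in practice the PPOs decide what is placed in that box.

```python
from dataclasses import dataclass

# Hypothetical weights; the paper states only that the priority depends on
# the severity, the CSR type, and the support level the customer has bought.
SEVERITY_WEIGHT = {"high": 30, "medium": 20, "low": 10}
TYPE_WEIGHT = {"problem": 5, "question": 1}
SUPPORT_LEVEL_WEIGHT = {"premium": 3, "standard": 1}


@dataclass
class Csr:
    csr_id: str
    severity: str
    csr_type: str
    support_level: str
    hot: bool = False


def priority(csr: Csr) -> int:
    """Return a score; a higher score places the CSR higher in the backlog."""
    return (SEVERITY_WEIGHT[csr.severity]
            + TYPE_WEIGHT[csr.csr_type]
            + SUPPORT_LEVEL_WEIGHT[csr.support_level])


def order_backlog(csrs):
    """Split CSRs into the 'Urgent' box (hot flag) and the ordered backlog."""
    urgent = [c for c in csrs if c.hot]
    backlog = sorted((c for c in csrs if not c.hot), key=priority, reverse=True)
    return urgent, backlog
```

With such an ordering, the topmost backlog item is simply the first element of the returned list, which matches the rule that anyone can pull the topmost item.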

The stickers on the board come in different colors representing the severities. In addition, green stickers are utilized to represent the non-CSR work the teams perform. These items are prioritized in the same backlog.

Since the handlers, and thus the teams, have relatively equal competence level, anyone is able to select the topmost item from the backlog. In addition, the management and the PPOs have realized that in the Kanban setup their task is not to push the tasks to the teams. Hence, a genuine pull mindset has been established. Overall, the management support for the transition has been outstanding.

C. The Kanban Board

From the beginning, the decision was to utilize a physical Kanban board since all of the teams are located in the same open office. The backlog was placed to the left on the board and a row was drawn for each team. An additional row, called ‘External’, was drawn for the engineers that are not in the teams but handle CSRs on demand. Initially, there was only one column for ongoing activities called ‘Analysis’. On the other hand, there were six waiting states, and a column ‘CSR answered’ separated from the column ‘Done’ since the Definition of Done (DoD) was that in addition to the CSR answer, the solution database is updated.

Naturally, the first board was not perfect; there was only one state for the active work and multiple waiting states. The workflow in the teams was a black box. Also, there was a tendency to select new items from the backlog before finalizing the update of the solution database since the ‘CSR answered’ column was separated from the ‘Done’ column. The two levels of done increased the probability of partially done work. Hence, the ‘CSR answered’ state was removed and the column ‘Done’ was revised to cover both the answer and the update of the solution database.

After experimenting with the board for some months, the teams gathered to revise the board. The active state was split into four: ‘Analysis’, ‘Fault Reproduction’, ‘Request for more Information’ (the reply for the request is fast and thus the state is treated as active), and ‘Solution Ongoing’. On the other hand, the number of waiting states was decreased to three. Figure 1 visualizes the current Kanban board.

D. The Work In Progress Limits

The Work in Progress limits are set by the teams. As the teams are not equal in size, the limits vary between the teams. From the beginning, there have been Work in Progress limits for all of the ongoing states. As stated earlier, the first version of the board included only one ongoing state. After splitting the active states into four, the total WIP limit of each team increased. However, the number of unlimited waiting states decreased. The limits have been lowered since the launch of the revised board.

Some of the CSR teams are relatively small. Thus, they have had challenges in setting the limits low since it might be that a CSR does not flow through all states. The common situation is that most of the active support requests are on the state ‘Analysis’. To solve the problem of relatively high limits for each column in the small teams, a total WIP limit for the active states has been defined. Hence, they have a maximum limit for each column but also a maximum limit for the total number of items (denoted with ‘Y’ in Figure 1). This is a new concept taken into use in the previous retrospective before writing the article and thus the results of the experiment remain unknown for the time being. For the larger teams it has been easier to set the limits as they can have multiple items in each column simultaneously.

The nature of the waiting states is such that the CSR handlers have limited options to enhance the pace. Hence, there are no limits for the waiting states. The engineers that handle CSRs only once in a while and are not involved in a team are treated as externals. There is no WIP limit for their row. In addition, the limits are applicable only for the support requests since the non-CSR tasks, i.e., the green stickers, are different in nature and usually do not follow the state-division in the board. Also, often the non-CSR work involves major idle time, for instance, in case of a follow-up activity or a consultancy activity.
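The combination of per-column limits and a total limit can be sketched as follows. This is only an illustration of the rule described above, not a tool used by the teams (the board is physical). The class name, the method names, and the numeric limits in the usage note are assumptions. Waiting states and non-CSR items are left out because they carry no limits.

```python
ACTIVE_STATES = ["Analysis", "Fault Reproduction",
                 "Request for more Information", "Solution Ongoing"]


class TeamRow:
    """One team's row on the board, tracking CSRs in the active states only."""

    def __init__(self, column_limits, total_limit=None):
        self.column_limits = column_limits  # the 'X' value per active column
        self.total_limit = total_limit      # the 'Y' value, used by small teams
        self.items = {state: [] for state in ACTIVE_STATES}

    def active_count(self):
        """Number of CSRs currently in any active state."""
        return sum(len(ids) for ids in self.items.values())

    def can_enter(self, state):
        """Check whether one more CSR fits under both kinds of limits."""
        if len(self.items[state]) >= self.column_limits[state]:
            return False
        if self.total_limit is not None and self.active_count() >= self.total_limit:
            return False
        return True

    def pull(self, csr_id, state="Analysis"):
        """Pull a CSR into an active state; return False if a limit blocks it."""
        if not self.can_enter(state):
            return False
        self.items[state].append(csr_id)
        return True
```

For example, a small team with a hypothetical limit of two per column and a total limit of three could hold two CSRs in ‘Analysis’ and one in ‘Fault Reproduction’, after which any further pull is refused until an item leaves the active states.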

In the beginning, it was common that the teams exceeded their limits. The situation was discussed with the team to understand the reasons and to agree on the means to proceed.

The concept of limiting the Work in Progress has triggered plenty of discussion. Previously, incoming CSRs were assigned immediately by a project manager. Often a handler had multiple cases open simultaneously, which led to task switching and partially done work.

In addition to the previous push mindset, it is challenging to trust the positive effect of limiting WIP if simultaneously the backlog of support requests is increasing. Since the inflow rate is volatile, there is a tendency among the handlers to see the limits as a negative factor if the inflow peaks.

Figure 1. The CSR Kanban board visualized. The WIP limits per column are denoted with ‘X’ whereas the total WIP limit for a small team is denoted by ‘Y’. Note that the WIP limits are applicable only for the CSRs, i.e., not for the green stickers that model the non-CSR work items. The number and distribution of the stickers represent a fictive situation.


E. The Meetings

To enhance the teamwork, the concept of Scrum daily meetings was added to the Kanban approach. It was decided that the meetings take place three times a week since on Tuesday and Friday there is a CSR walkthrough meeting that is common for all teams. In this meeting, hot, new, and closed CSRs are discussed to spread the knowledge, to discuss challenges, and to keep the status up to date since there is a requirement to report to the customers and to the other levels of the support organization.

In addition, the concept of retrospectives was selected from Scrum. The retrospective takes place approximately every second week. These meetings are more philosophical in nature in Kanban than in Scrum since there are no sprints to enhance the focus. To compensate for that, the monthly statistics are reviewed and reflected on in every second retrospective.

On top of the internal meetings, there are also escalation phone meetings, which are not affected by Kanban. Still, the managers, the PPOs, and the responsible CSR handlers participate in the meetings.

F. The Metrics and The Radiator

Since there are multiple maintenance organizations, there are metrics addressed from the corporate level. However, once starting with Kanban, all the internal metrics were revised. The underlying concept was that the unnecessary metrics, i.e., the ones that do not support planning or improving, are removed. On the other hand, it was understood that new metrics are needed.

Currently, the metrics are as follows:

• Queue time in the backlog (per severity, average, and distribution)

• Lead time (per severity, average, and distribution)

• Overdue percentage (per severity)

• Inflow and outflow

• Number and percentage of cases on each column in the board
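As a rough illustration, the first three metrics in the list could be derived from per-case dates as sketched below. The record layout and the sample values are hypothetical. The sketch also assumes that the lead time is counted in calendar days from the arrival in the backlog until the case is done, which the paper does not specify.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical records: (severity, received, started, closed, target_days).
cases = [
    ("high", date(2011, 1, 3), date(2011, 1, 3), date(2011, 1, 7), 5),
    ("high", date(2011, 1, 10), date(2011, 1, 11), date(2011, 1, 18), 5),
    ("low", date(2011, 1, 4), date(2011, 1, 10), date(2011, 1, 24), 30),
]


def metrics_per_severity(records):
    """Average queue time, average lead time, and overdue share per severity."""
    grouped = defaultdict(list)
    for severity, received, started, closed, target_days in records:
        queue_days = (started - received).days  # time spent in the backlog
        lead_days = (closed - received).days    # assumed: arrival until done
        grouped[severity].append((queue_days, lead_days, lead_days > target_days))
    result = {}
    for severity, rows in grouped.items():
        result[severity] = {
            "avg_queue_days": mean(r[0] for r in rows),
            "avg_lead_days": mean(r[1] for r in rows),
            "overdue_pct": 100.0 * sum(r[2] for r in rows) / len(rows),
        }
    return result
```

The distributions mentioned in the list could be obtained from the same per-case values before averaging.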

One of the previous challenges was that the CSR handling performance was clearly indicated regarding the end-to-end lead time including all support layers but not regarding the time the support request spent in the current support organization. As described earlier, there is an agreement between the different support layers to define the target lead times for each layer. The status compared to these lead times was not visible. Thus, a CSR Radiator was built.

The Radiator extracts data to an Excel sheet. The sheet includes the key information of the CSR as well as the number of days to overdue and the reported hours. Also, the number of days the platform maintenance was involved is visualized. The overdue cases are marked with red and the soon-to-overdue with yellow. The data is updated automatically each morning. A dedicated computer displays the Radiator next to the Kanban board. Hence, each time one is next to the board, she or he can easily view the time left for the current CSRs. As stated earlier, the end-to-end performance was already clearly tracked earlier.
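The days-to-overdue and color marking shown by the Radiator can be sketched in a few lines. The actual Radiator is an Excel sheet and its rules are not given in the paper, so the function name, the two-day threshold for soon-to-overdue cases, and the use of calendar days are assumptions made for illustration.

```python
from datetime import date

SOON_THRESHOLD_DAYS = 2  # hypothetical; the paper does not state the threshold


def radiator_row(csr_id, received, target_days, today):
    """Return the CSR id, the days left until overdue, and a color mark."""
    days_left = target_days - (today - received).days
    if days_left < 0:
        color = "red"       # already overdue
    elif days_left <= SOON_THRESHOLD_DAYS:
        color = "yellow"    # soon to be overdue
    else:
        color = "none"
    return {"csr": csr_id, "days_to_overdue": days_left, "color": color}
```

Here `target_days` stands for the target lead time agreed for the current support layer, so the same function could be evaluated each morning with the current date.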

V. KANBAN IMPLEMENTATION AT THE FAULT HANDLING

The design maintenance activities were earlier organized as a project sharing resources with the new feature development. Development teams were formed around different parts of the software. These teams were software design teams without verification activities on the real target environment; the testing and verification was entirely performed on a simulated environment. The testing and troubleshooting on the target environment was performed either by the CSR handlers or by the Packet Testing (PT) engineers, managed by another project.

The setup caused handovers and additional coordination. Also, there was constantly a need to discuss priorities between the design maintenance activities and the new feature development. Due to peaks in the customer support activities, the new feature development often received a lower priority and thus suffered in the ability to meet the planned time schedules. This, in turn, caused a need for overtime and weekend work. The need to secure the focus of the developers on the new feature development was one of the main reasons to form the separate Fault Handling teams.

A. The People

Since in maintenance activities it is crucial to troubleshoot and fix customer problems fast, competence in the Fault Handling teams is a key factor. To involve motivated people in the teams, management did not decide the persons for the teams. Instead, the design teams were asked to decide which persons could move to the FH teams considering also the required competence level. The optimum team size (6-8 engineers) was defined based on the experiences with Scrum teams in the feature development organization.

Next, the volunteers formed teams that were competence-wise as equal as possible covering all parts of the product, except the ones that were handled by the secondary site. As well as the Scrum teams in the new feature development, the FH teams include, in addition to the SW design competence, function verification competence on the target environment. Thus, the teams are cross-functional. This is the main enabler for faster problem solving and end-to-end responsibility in an FH team.

After the decision to organize the design maintenance activities in an Agile way, actions took place promptly. Within weeks, the team setup was agreed, the office environment was renovated to an open office and all needed persons were allocated. Simultaneously, the continuation of the customer support and the design maintenance activities was secured seamlessly.

The Fault Handling teams and the PPOs, with support of the Team Coaches, were launched in March 2010. Already in the beginning, it was clear that the teams would rotate between the maintenance and the new feature development. Hence, multiple teams have the possibility to work close to the customer support and gain a valuable view regarding the issues raised by the customers.


The Fault Handling teams started with a full day kick-off session that included a presentation by the PPOs, Kanban training, and most importantly a session where the teams agreed on their Way of Working statements and the Definition of Done, as well as team names and values.

The Fault Handling PPOs used to work as technical coordinators; thus, they had gained a wide knowledge and experience regarding the product as a whole. In addition, they were familiar with the working methods in maintenance. The Team Coaches support the teams, solve impediments, and facilitate meetings. Also, the coaches help the teams to identify their strengths and weaknesses and to continuously improve as a team.

B. The Backlog

The main responsibility of the FH PPOs is to ensure that customer problems are prioritized in the backlog by business value and that the problems are solved. In addition, the PPOs support the teams in solving the problems by providing a customer viewpoint for the teams. In the Fault Handling, it is also possible for a team member to gain the real end customer view by participating in the escalation meetings.

Each FH PPO is responsible for one software release track, and since there is only one backlog, the PPOs need to reach consensus on the priority order of the tasks related to different release tracks. In addition to the technical severity of the problem, close communication inside the whole maintenance organization also helps in prioritization of the tasks in the backlog.

C. The Kanban Board

The teams decided to utilize a common Kanban board in the team space. The common board is practical, as there is only one backlog from which the teams select tasks. Thus, the overall situation in the Fault Handling is also evident to the PPOs, the management, and anyone interested. The first version of the Fault Handling Kanban board was simple, including only the most important phases of the typical maintenance case workflow: ‘Analysis and fixing’, ‘Pending’, ‘White verification’ and ‘Done’. The ‘Urgent’ box was established to trigger immediate involvement of the teams. It is utilized for high-severity customer cases. In practice, it is the PPOs that agree with the teams on who is to start working with the case and which case is put into pending phase due to re-prioritization.

Later on, more columns were added to better cover the workflow as a whole. Figure 2 visualizes the current Kanban board. Currently, the columns are: ‘Analysis and fixing’, ‘Pending’, ‘Integration ongoing’, ‘Package done’, ‘White verification’, ‘Map’, and ‘Done’. Also, the Definition of Done has been added on each column on the board to support the recall of the required tasks before moving to the next phase. White verification is a phase for verifying the corrections in a software version in which the correction is integrated with the rest of the system. This is practically a candidate for a customer package.

Correction mapping needs to be done between the delivery tracks. Forward mapping is mandatory so that no corrections are lost after an upgrade. Backward mapping creates a TR that will be fixed in the older release track. Mappings are performed immediately in all tracks since that has proven to be the most efficient solution.

The software integration is not conducted in the Fault Handling teams. Instead, there are teams providing continuous integration service to all development teams. In practice, a new SW package is built every night and consolidation tests on the target environment are automatically performed.

D. The Work In Progress Limits

The Work in Progress limit is one of the core factors in Kanban. The WIP limits are defined for the ‘Analysis and fixing’ and the ‘Pending’ columns. Since integration is performed outside the teams, it was decided that the WIP limits for work phases following the integration, i.e., ‘White verification’ and ‘Map’ columns in the Kanban board, are not reasonable. The FH teams decided the limits themselves in the kick-off meeting. The first values were set after considering the goal of focusing on very few issues at the same time and the willingness to promote team and pair working. Also, it was decided that the WIP limits should be adjusted only at retrospectives to ensure that they are not changed without careful consideration and discussions.

Figure 2. The FH Kanban board. The WIP limits are denoted with ‘X’. The number and distribution of the stickers represent a fictive situation.

It was realized that the initial values were too low and caused frustration since the team was not able to utilize all members effectively. There was a feeling of idling. After a couple of iterations of WIP limits, they have been stable for several months. The exception is that one of the FH teams is experimenting with the way of working without the WIP limits since they had the feeling that strict limiting is not the most efficient solution. As a result, currently there are tasks in the pending column waiting for time from the team. The solution for the problem remains to be seen.
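The effect of a WIP limit on pulling work can be illustrated with a minimal Python sketch. The class names, the card identifiers, and the limit values are assumptions made for the example; the limits actually agreed on by the teams are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Column:
    name: str
    wip_limit: Optional[int] = None  # None: the column is not limited
    cards: list = field(default_factory=list)

    def has_capacity(self) -> bool:
        return self.wip_limit is None or len(self.cards) < self.wip_limit


class Board:
    def __init__(self, columns):
        self.columns = {column.name: column for column in columns}

    def pull(self, card, target, source=None) -> bool:
        """Move `card` into `target` only if the WIP limit allows it."""
        column = self.columns[target]
        if not column.has_capacity():
            return False  # the team finishes ongoing work before pulling more
        if source is not None:
            self.columns[source].cards.remove(card)
        column.cards.append(card)
        return True


if __name__ == "__main__":
    # Assumed limits: 2 for 'Analysis and fixing', 1 for 'Pending'.
    board = Board([
        Column("Analysis and fixing", wip_limit=2),
        Column("Pending", wip_limit=1),
        Column("White verification"),  # not limited, as in the text above
    ])
    print(board.pull("TR-1", "Analysis and fixing"))  # True
    print(board.pull("TR-2", "Analysis and fixing"))  # True
    print(board.pull("TR-3", "Analysis and fixing"))  # False: limit of 2 reached
```

A limit that is too low shows up in such a model as refused pulls while team members are free, which corresponds to the feeling of idling described above.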

E. The Meetings

Similar to the CSR teams, the Fault Handling teams have also adopted practices typical for Scrum. These are, for instance, daily stand-up meetings and retrospectives. The main purpose of the daily stand-up meeting is to check within the team that everyone is focusing on the proper issues and that work is shared in the most efficient way. The PPOs are also present in the daily meetings to provide their support and to receive a view regarding the status of the ongoing activities (even though daily meetings are not for status reporting).

Since there are no sprints in Kanban, the retrospective frequency has varied in the Fault Handling teams depending on the customer support case situation and the teams’ own interest. For instance, one of the FH teams decided to perform a retrospective “on the need basis”. Different approaches have been utilized in retrospectives: discussion regarding a specific topic (e.g., pair programming or competence), focusing on the feelings and atmosphere only, identifying improvement needs for the team itself and also for the PPOs and the Team Coaches, or utilizing the “traditional” retrospective method of identifying what has gone well and what needs to be improved.

The FH teams are part of the development community. Hence, the FH teams participate in the Scrum of Scrums (SoS) meetings of the development organization as well as in the development organization Community of Practice (CoP) meetings. In SoS, all R&D teams share information regarding issues that also concern other teams.

In addition, there is a weekly walkthrough meeting for solved problems. In this meeting, the teams describe to each other the customer problems they have solved and discuss both the chosen solution and the verification strategy. The meeting has been established to guarantee that all possible impacts of a certain code modification have been considered before delivering the solution to the customer. These meetings have been valuable in increasing competence since people discuss different cases and solutions.

F. The Metrics and The Radiator

Measuring the lead time is a tool for optimizing the throughput in Kanban. Lead time statistics have been mainly utilized to support the PPOs in estimating whether all required corrections can be performed prior to the planned release date. Maintenance work is by nature challenging to estimate. The plan is to start utilizing the measurements also in challenging the teams to identify the means to improve the lead times.

As in the CSR handling, the Fault Handling utilizes a tool called ‘Radiator’ to support tracking the agreed internal time limits related to the customer problems. The Radiator is based on a Visual Basic macro that extracts data to an Excel sheet automatically. This tool sorts all customer problems based on the time remaining to provide the answer for the problem. The Radiator is also utilized for visualizing the TR inflow/outflow and the lead time success rate (the number of customer TRs answered in time in relation to all customer TRs).
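The two Radiator views described above, the ordering by time remaining and the lead time success rate, can be sketched as follows. This is a Python illustration under stated assumptions, not the Visual Basic macro itself: the record fields `answer_due` and `answered_at` are hypothetical, and the success rate is computed here over the answered TRs only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CustomerTR:
    tr_id: str
    answer_due: datetime                    # agreed internal time limit (assumed field)
    answered_at: Optional[datetime] = None  # None while the TR is still open


def by_time_remaining(trs, now):
    """Return the open TRs, the one with the least time remaining first."""
    open_trs = [tr for tr in trs if tr.answered_at is None]
    return sorted(open_trs, key=lambda tr: tr.answer_due - now)


def lead_time_success_rate(trs):
    """Share of answered customer TRs that were answered within the limit.

    Returns None when no TR has been answered yet.
    """
    answered = [tr for tr in trs if tr.answered_at is not None]
    if not answered:
        return None
    in_time = sum(1 for tr in answered if tr.answered_at <= tr.answer_due)
    return in_time / len(answered)
```

Already overdue TRs have a negative time remaining and therefore appear at the top of the sorted list, which is the behavior one would expect from a wall display meant to draw attention to the most urgent cases.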

VI. FAULT HANDLING IN THE SECONDARY SITE

The goal was to implement a similar workflow in both sites. In this chapter we concentrate on differences in the Kanban implementation between the sites.

A. The People

Earlier, the secondary site was involved only in the software development activities. Hence, verification on the target environment had to be learnt. Experts from the primary site spent three months in the secondary site. One of them joined a Fault Handling team and thus currently the entire life cycle of a fix is performed locally in the secondary site as well. The exception is that some test cases requiring major traffic load are handled outside the team. The situation is similar for the FH teams in the primary site as well.

B. The Backlog

Multi-site development causes challenges and introduces communication overhead. In addition to agreeing on unified working methods between the sites, a common solution is to utilize a Single Point of Contact (SPOC) role since there are issues that need regular communication. The most important of these issues is the backlog handling and prioritization. If the PPOs and the team are in the same location, the backlog can be maintained on the Kanban board. As this was not possible in the secondary site, a simple electronic backlog handling tool was created. In the tool, the state of the fix is stated and the fault is described.

C. The Kanban Board

The Kanban boards of Fault Handling teams are similar at the two sites. Pending issues can be put on the lower half of the table in any state to avoid items moving back and forth. Developers also utilize the backlog tool. They move an item into the ‘In Progress’ state once it is taken from the backlog and close it once it reaches the last column on the physical Kanban board. The approach enhances the self-organizing nature of the team.

D. The Work in Progress Limits

Competence buildup is supported by keeping the WIP limits low and thus encouraging pair working and learning. Competence development is closely related to the increased involvement in the CSR handling; the Fault Handling teams learn from each CSR that they are involved with.

E. The Meetings

The internal meetings discussed earlier are applicable here as well: daily stand-up, retrospective, and ‘Done cases walkthrough meeting’. Additionally, a weekly meeting is required to discuss backlog priorities. To improve the verification activities, there is a regular meeting with the primary site experts to discuss the target verification.

F. The Metrics

The same Radiator tools as in the primary site are utilized on a dedicated machine. Currently there are no measurements conducted that would be specific for the secondary site.

VII. IMPLEMENTATION SUMMARY

Table I summarizes the Kanban implementation in the CSR handling and the Fault Handling.

VIII. POSITIVE EXPERIENCES

Multiple positive effects of the Kanban implementation are visible. The level of teamwork has increased drastically. For instance, if a team member is out of the office, the team will provide backup. The teamwork is supported also by limiting the WIP and by the current seating arrangements. The Definition of Done is an enhancement for removing the partially done work.

The engineers are more willing to push the limits and perform tasks beyond their comfort zone. It has been seen both in the FH and CSR teams that the engineers have selected items from the backlog for which they do not feel they possess adequate knowledge in advance. Indeed, in the Fault Handling, some of the testers have conducted code changes and some of the designers have been involved in testing.

Also, cooperation between the design maintenance and the CSR handling has improved. Earlier it was common that the engineer fixing the fault did not even know the CSR handler and vice versa. Since the open offices are next to each other, it is possible to visit another office and immediately know with whom to discuss the case.

The current pull mindset disables the option of PPOs pushing tasks to individuals. Prior to the Agile transformation, it was common that the project managers pushed the items to the engineers. Naturally, there are still escalations that receive the highest priority. Once in a while, placing an urgent item at the top of the backlog is not enough. In case of a severe problem in a network, the PPOs need a volunteer immediately for the task. In these situations the current setup is an advantage: the PPOs know exactly the workload of each team and individual, other items can be down-prioritized dynamically, and the urgent task can be selected by a team instead of an individual.

One main factor impacting positively on the lead times in Fault Handling is the cross-functionality of the FH teams. Now it is possible at the same time to start fault reproduction on the target environment and the analysis of the code. This naturally enhances the troubleshooting and fault identification activities. Also, verification of the correction after the implementation in the same team provides instant feedback.

TABLE I. SUMMARY OF THE KANBAN IMPLEMENTATION IN THE CSR HANDLING AND THE FAULT HANDLING.

People

CSR Handling:
• Team members are highly experienced professionals in the domain of customer support
• Team members rotate between the CSR handling and other R&D organization

FH in the primary site:
• Teams formed from the earlier product teams
• Teams have cross-functional competence
• Teams & PPOs rotate between the FH and the feature development

Implementation concepts:
• Teams formed by the people themselves, not by the management
• Teams supported by the Team Coaches
• PPOs provide the customer view

Backlog

CSR Handling:
• One backlog for all teams
• Prioritized by CSR PPOs
• Color-coding for different severities and non-CSR work
• All work items included

FH in the primary site:
• One backlog per site
• Prioritized by FH PPOs
• Secondary site: web-based backlog tool with status information
• Secondary site: weekly backlog meeting between the SPOC & FH PPOs

Implementation concepts:
• One prioritized backlog for each area (CSR handling, FH). Hence, the importance of each task clear for the teams
• Customer value as basis on the backlog prioritization
• Fast track for urgent cases ensures flexibility that the maintenance activities require

Kanban Board

CSR Handling:
• One physical board for all the teams

FH in the primary site:
• One physical board for all the teams
• Secondary site: backlog handling tool including the status information

Implementation concepts:
• Visualizes the workflow
• Defined and adjusted by the teams themselves
• Always up-to-date

WIP

CSR Handling:
• Defined by each team
• Limits for each ongoing state, waiting states not limited

FH in the primary site:
• Defined by each team
• Limits for the most important states

Implementation concepts:
• Defined and adjusted by the teams themselves
• Experiment certain time prior to changing

Meetings

CSR Handling:
• Team stand-up meetings three times a week
• Common meetings for all CSR teams twice a week
• Retrospective every second week

FH in the primary site:
• Daily stand-up meeting for each team
• Retrospective frequency decided by each team
• Weekly walkthrough meeting regarding the solved cases
• Team representative participates in SoS & CoPs
• Secondary site: meeting of target verification

Implementation concepts:
• Minimize the amount of regular meetings
• Keep team daily stand-up meetings as short as possible

Metrics

CSR Handling:
• Queue time in the backlog
• Lead time
• Overdue percentage
• Inflow & outflow
• Number of cases on each column in the board

FH in the primary site:
• Lead time
• Overdue percentage
• Inflow & outflow

Implementation concepts:
• Customized tool utilized as a basis for a ‘Radiator’ screen on the wall to visualize the current status and the main metrics
• Multiple common metrics addressed from the organization above the support organization

Kanban and the retrospectives provide a solid ground for continuous improvement. In the current mindset, it is easy for anyone to raise their concerns and seek improvement. Previously, there was no focus on regularly reviewing and optimizing the process.

On top of the other positive experiences, in the new setup, the whole is more visible to all stakeholders. For instance, everyone is constantly aware of the highest priority items and also of the most common issues in the customer networks. In addition, the bottlenecks are easy to identify from the Kanban boards.

The main positive aspects can be summarized as follows:

• Teamwork, empowerment, and responsibility
• Pull instead of push
• Challenging the comfort zone
• Cooperation between FH and CSR handling
• Cross-functionality
• Mindset of continuous improvement
• Visibility

IX. CHALLENGES

In addition to the positive findings, there have been, and still are, challenges. First of all, the transition from a group of individuals to a team is time-consuming. In addition to the challenges in the social behavior and workload balancing, the software tools for both the CSRs and the Trouble Reports have been designed to support individual work.

A major challenge is that a severe CSR or TR can arrive anytime. Hence, once in a while, task switching cannot be avoided. In addition, the CSR handlers are sometimes required to travel to the site. In this case, the team members support each other as the team is responsible for the CSRs selected from the backlog.

Previously, there was push and follow-up from the management. In the new setup, the team has the responsibility for the work items. Like the transition towards teamwork, the change in mindset towards taking responsibility is time-consuming.

The transition from push to pull has been challenging also for the management. For instance, in the beginning, the PPOs were concerned regarding the growing backlog. After the first peaks in the size of the backlog, they became more confident with the concept of backlog and began to believe that the items are handled properly even if they are idle for a while in the backlog.

On the other hand, previously the management was regularly updated in status meetings. In the new setup, the management needs to trust that the engineers and the PPOs are actually taking the responsibility for the work items without formal status update meetings. For instance, it is challenging for the management to feel confident that the Fault Handling PPOs possess an adequate overall picture to prioritize the Trouble Reports in such an order that the most critical fixes can be delivered in the coming software package.

The challenges of distributed development have been widely studied, e.g., [4]. The challenges are visible also in our context. The chosen Kanban implementation relies on, e.g., face-to-face conversations, physical Kanban boards, and stand-up meetings. These are challenging concepts since the design maintenance is located in two sites. Currently, two backlogs exist for the Fault Handling: a physical one on the Kanban board in the primary site and a web-based one for the Fault Handling in the secondary site. Since there is a possibility of the backlogs not being synchronized, the backlogs should be merged. A Single Point of Contact has been utilized to enhance the cooperation between the sites.

The competences are different between the sites. The teams are competent but the areas of competence are different. Hence, there are challenges in selecting the topmost item from the backlog. A Component Guardian role has been established to support FH teams in the competence development. The Component Guardian is an experienced developer possessing deep knowledge of a certain technical area. She or he has allocated time for supporting other developers in learning that domain. Naturally, this requirement for continuous learning can also be exhausting. Indeed, some persons in the FH teams have been frustrated since they could not always work on the area they are used to. In the CSR handling, all teams are located in one room and possess equal competences. Thus, the problem of different competences does not affect the CSR domain.

Previously, the maintenance organization did not possess a function verification environment. With the establishment of the Fault Handling teams, function verification regression testing was also started in the maintenance tracks. This has been a major improvement but it also required effort from the teams to build the system and overcome the challenges.

The interface towards the platform maintenance has proven to be a bottleneck in the CSR handling. The bottleneck became even more visible with Kanban. Due to the different working methods between the discussed support organization and the platform maintenance as well as due to the complexity of the platform-related support requests, the issues are difficult to address.

The work in the platform maintenance is not performed with Agile methods. There are challenges in the visibility and it has not been possible to set the WIP limits for the CSRs forwarded to the platform since there are also other maintenance organizations sending CSRs to them. Based on the visibility regarding the bottlenecks, discussions with the platform maintenance have been initiated.

Overall, the main challenges have been (or are) related to the following:

• Unpredictable inflow of work
• Interface to the platform maintenance
• Multi-site working
• Establishing a team from a group of individuals
• Initiating the team responsibility
• Tools that are not supporting teamwork
• Changes in the testing
• Continuous learning

X. CONCLUSIONS

In this industry paper, the transition to Kanban has been described in a product maintenance organization. We have described the Kanban approach in a highly complex context in which multiple boundary rules exist, e.g., regarding the lead times. Also, challenges and positive experiences have been identified.

Multiple key success factors led to the successful transition. From the team working perspective, the common open offices, team responsibility, and team empowerment have led to intensive cooperation. The cooperation has been improved between the CSR handling and the Fault Handling as well as between the sites. Additionally, a team-level rotation between the Fault Handling and the new feature development is taking place.

The number of backlogs has been minimized. All work items are pulled via the backlog that is prioritized by the experienced PPOs. Without the shared Product Owner role, the backlog would not reflect the business value as the workload is high for the PPOs.

The Kanban principles have been obeyed. The visibility, pull mindset, and WIP limits have been introduced. In addition to the Kanban board, the visibility has been improved by introducing proper metrics and the Radiator. Also, best practices from the Scrum framework have been selected to complement Kanban.

The findings are generalizable to some extent. For instance, all of the support organizations are committed to Service Level Agreements; hence, the introduced metrics are likely beneficial also in other similar contexts. The aspects of team working are applicable for most Agile teams whereas the selected best practices from Scrum could benefit also other organizations implementing Kanban.

During the transition, the management support has been extensive. Team Coaches have been in place to support the teams and the whole organization in the change. We have been able to implement Kanban successfully in a complex and hectic multi-site context even though there have been challenges. The journey from the previous setup has been drastic. However, the journey is not finished. There are no pre-defined destinations: it is all about continuous improvement and learning as well as challenging the comfort zone.

ACKNOWLEDGMENT

The authors acknowledge the Ericsson personnel who supported the brainstorming of the topics and/or reviewed the article. These persons include Mikko Friman (Product Maintenance Department Manager), Kari Laipio (FH PPO), Christian Engblom (Agile Framework Manager), Kirsi Mikkonen (Agile Change Driver), Ismo Paukamainen (Kanban Team Coach), Kjell Lauren (Scrum Master), Juha Hakulinen (Scrum Master), and Ferenc Nagy (Technical Author).

This paper was completed in Tivit’s (the Strategic Centre for Science, Technology and Innovation in the Field of ICT, www.tivit.fi) Cloud Software Program (www.cloudsoftwareprogram.org). The work is partly funded by Tekes (the Finnish Funding Agency for Technology and Innovation, www.tekes.fi).

REFERENCES

[1] David J. Anderson, The Kanban Primer, Better Software, January/February 2009, pp. 84-90.

[2] Henrik Kniberg and Mattias Skarin, Kanban and Scrum – making the most of both, InfoQ, 2010.

[3] Mary Poppendieck and Tom Poppendieck, Lean Software Development – An Agile Toolkit, Addison Wesley, 2003.

[4] Maria Paasivaara, Sandra Durasiewicz, and Casper Lassenius, Using Scrum in Distributed Agile Development: A Multiple Case Study, 2009 Fourth IEEE International Conference on Global Software Engineering, pp. 195-204, DOI 10.1109/ICGSE.2009.27.
