mtr announces launch of east rail line new signalling
TRANSCRIPT
-more-
PR006/211 February 2021
MTR Announces Launch of East Rail Line New Signalling System and 9-car Trains on 6 February and
a Series of Measures to Enhance Shatin to Central Link Project Control MTR Corporation announces today (1 February 2021) the commissioning of the East Rail Line (EAL) new signalling system and 9-car trains on 6 February 2021, after the satisfactory completion of all further testings, as well as approvals from relevant Government departments on safe and sound condition of the new signalling system and trains. At the same time, the Corporation announces the establishment of a dedicated “Shatin to Central Link (SCL) Technical and Engineering Assurance Team”, directly accountable to the Chief Executive Officer (CEO), to monitor the SCL project from both a technical and service readiness perspective and to identify important unknown issues of the remaining works of the SCL project for timely reporting and follow up. The establishment of the team is an initiative of the Corporation after reviewing the Report of the Investigation Panel (the Report) into the postponement of the commissioning of the EAL new signalling system in mid-September last year. In addition, as per request by the Government, a new Service Reliability Report has been introduced as part of the Government’s reviewing mechanism of the commissioning of new lines to ensure the timely reporting and handling of issues with a potentially significant reliability impact. This report will complement the existing System Safety Report. Dr Jacob Kam, CEO of MTR Corporation, points out that the Corporation decided to implement these two measures following the experience gained from the postponement and a detailed review of the Report. The Corporation also accepts and will implement the other recommendations made in the Report, as follows:
1. providing internal procedures to ensure that relevant Government departments are kept adequately informed of all significant reliability issues in the future;
2. strengthening training to raise sensitivity around public concerns, effective
communications and the importance of service quality and reliability, in addition to safety issues, and
3. reinforcing second line of defence arrangements on risk management and compliance
control to detect and escalate important issues early. Dr Kam reiterated the Corporation’s commitment to enhancing the project’s preparation works in relation to safety and passenger service.
“We never compromise the safety of our passenger services, and also attach great importance to their reliability. This is also the case for the new signalling system and it will only be launched when the Corporation and the relevant Government departments are satisfied with its performance in these aspects,” said Dr Kam. Dr Kam expressed his gratitude to the Investigation Panel chaired by Ir Edmund Leung. The Corporation announced the setting up of the Panel on 13 September 2020 to look into the communication and reporting mechanisms of the Corporation both internally and with relevant Government departments from May 2020, when the issue leading to the eventual postponement was first identified, to 11 September 2020 when the deferral decision was made. The Corporation submitted to the Transport and Housing Bureau on 21 January 2021 the report prepared by the Panel, which is appended in full (Appendix 1). The Corporation acknowledges and accepts the findings of the Panel which include a finding that the issue concerned is not an issue of safety but of service reliability. Safety has been reaffirmed by the technical investigation, which has shown that the concerned issue was caused by a non-safety critical software module being overloaded by a new software module specifically built for the Corporation to provide extra train monitoring information to the Operations Control Centre. The contractor has resolved the issue by upgrading the software and stopping the new software module. The Investigation Panel has during its course of the investigation referenced the findings of the technical investigation. The report of the technical investigation is appended in full (Appendix 2).
-End- About MTR Corporation
Every day, MTR connects people and communities. As a recognised world-class operator of sustainable rail transport services, we are a leader
in safety, reliability, customer service and efficiency.
MTR has extensive end-to-end railway expertise with more than 40 years of railway projects experience from design to planning and
construction through to commissioning, maintenance and operations. Going beyond railway delivery and operation, MTR also creates and
manages dynamic communities around its network through seamless integration of rail, commercial and property development.
With more than 40,000 dedicated staff*, MTR carries over 13 million passenger journeys worldwide every weekday in Hong Kong, the United
Kingdom, Sweden, Australia and the Mainland of China. MTR strives to grow and connect communities for a better future.
For more information about MTR Corporation, please visit www.mtr.com.hk.
*includes our subsidiaries and associates in Hong Kong and worldwide
Appendix 1
Investigation Panel Report on Route Recall Issue Submitted by:
_________________ __________________ Ir Edmund KH Leung Dr Peter Ronald Ewen Chairman Deputy Chairman
Date: 8 December 2020
Table of Contents 1. Introduction ………………………………………………………….. P.1 2. Safe and Sound (S&S) …………………………………………….. P.3 3. Sequence of Events ………………………………………………... P.4 4. Findings ……………………………………………………………... P.13
4.1 Safe & Sound ……………………………………………….. P.13
4.2 Identification, Analysis and Classification of the Issue….. P.15
4.3 Internal Escalation ………………………………………….. P.16
4.4 Reporting to Government ………………………………….. P.17
4.5 Advisers …………………………………………………….. P.19
5. Summary of Findings ………………………………………………. P.19 6. Recommendations ………………………………………………… P.21
Page 1 of 22
1. Introduction
1.1 On 11 September 2020, MTR Corporation Limited (MTRCL) announced a postponement of the commissioning of the new signalling system and roll-out of new trains on the East Rail Line (EAL), which would have brought in the Mixed-Fleet Operation (MFO). The postponement decision was made after conducting a final review of the new system prior to service commencement. The announcement put on hold the changeover to MFO that was due to take place on 12 September 2020.
1.2 In the final review, MTRCL noted that during on-site testing in Non-Traffic Hours (NTH) on 11 May there had been a software issue that could potentially cause deviation of trains from their intended route. The software issue had been discovered and reported by Siemens, the contractor, as a defect that required remedial measures, but it had not been deemed to be a Safe and Sound (S&S) issue by the MTRCL team. Train deviations due to this software issue were only observed during analysis of logfiles during testing/simulation and did not actually occur in real operation on that day.1 Although the probability of occurrence was considered remote (at that time), train deviation from an intended route remained a possibility. The problem was identified as a Route Recall (RR) issue.
1.3 The precautionary postponement of the commencement of MFO
to address the software issue was deemed necessary to better ensure smooth and reliable operations of the new signalling system. It should be noted that the public has not experienced any disruption of the EAL service as a result of the deferment of MFO. However, the decision to defer the commissioning of the new signalling system and MFO just a day before the planned changeover has raised public concerns about the process of communication within MTRCL and its interaction with the Government.
1 In the course of successfully replicating the RR issue during testing in October and December, among other technical findings, the team also found that train was routed to the wrong platform of the correct station in two occasions (due to another known software bug in the time-table module and incorrect software installation sequence respectively, not caused by or linked to a RR. Both issues have now been corrected to prevent recurrence.).
Page 2 of 22
1.4 On 13 September, MTRCL announced the establishment of an Investigation Panel (IP) to investigate the matter.
1.5 The mandate of the Panel was:
• To ascertain how the potential route recall issue in the new signalling system was identified, confirmed, analysed and followed up
• To review whether the internal communication and reporting mechanism of MTRCL was sufficiently robust and was being timely and properly implemented during the above-mentioned process
• To investigate the reporting by MTRCL to relevant Government departments and ascertain whether this was timely and properly implemented
1.6 It should be noted that since postponement of MFO, MTRCL has
conducted a formal and comprehensive technical investigation into the RR issue. This is part of MTRCL’s ongoing effort to re-affirm its readiness for the commissioning of the new signalling system and MFO. The IP took note of this parallel stream of work by MTRCL which has, amongst other things, reviewed and validated the root cause2 of the RR issue and identified solutions accordingly. The scope of the technical investigation is different from that of this IP’s work. The focus of the IP is primarily on the period between the emergence of the software issue on 11 May and the announcement of deferral on 11 September, and to ascertain the way in which the RR issue was handled, review the internal communication and reporting mechanisms, and investigate the communication of the issue with the concerned Government departments. Since the technical investigation was conducted after the 11 September deferral, it bears little direct reference to the focus of the IP.
2 The new software link between the two internal ATS modules, which was built specifically for MTR to provide extra train fault status monitoring to the Traffic Controller, overloaded the Train Monitoring Tracking (TMT) data processing capacity and resulted in the RR issue.
Page 3 of 22
2. Safe and Sound (S&S) 2.1 For the purposes of the third part of the IP’s mandate as set out
in Paragraph 1.5 above, the reporting to relevant Government departments refers mainly to the S&S declaration process. The S&S process is a formal approval process between MTRCL and Government departments to ensure that a new project is safe and ready for operation which requires MTRCL to issue a declaration to the Railways Branch of EMSD (EMSD RB).
2.2 The requirements and guidance on what constitutes a safety-
related issue are comprehensive and clear; and having considered all these, the potential consequences of the RR issue, and the findings of technical investigation conducted after 12 September, the IP is of the view that the RR issue is NOT an issue of safety. This will be discussed in greater detail in later sections of this report and is in line with the view of the Independent Safety Assessor (ISA).3
2.3 The scope of what must be demonstrated to prove “Sound” is
agreed on a project-by-project basis. For MFO, Sound requirements related primarily to the demonstration of train service headway and journey time. The Sound requirements included the demonstration of the signalling system that it met the appropriate reliability performance; and the demonstration of the readiness of the Operating Team to operate the system and recover any potential reliability issues during service. These demonstrations were successfully completed for MFO.
2.4 A new project cannot commence passenger service without a
S&S declaration from MTRCL and then a follow-up letter of ‘satisfaction’ of S&S conditions from EMSD RB. MTRCL submitted two S&S declarations for MFO: one in May and one in August. The reason for the second submission was to reconfirm the first one, following the later discovery of some issues (unrelated to the RR issue) on 23 and 25 May, which will be described later. With the second submission, EMSD RB issued a letter of satisfaction of S&S conditions for MFO on 25 August.
3 The ISA is contracted by MTRCL to advise on critical safety aspects of the new signalling system.
Page 4 of 22
3. Sequence of Events 3.1 Initial discovery: As part of the process to satisfy itself of the
safety and soundness of the new signalling system and MFO (hereafter referred to as “the project”), MTRCL conducted extensive testing during NTH. The RR issue was first noted in an NTH test on 11 May. When a train that has already been given a route by the Automatic Route Setting (ARS) system physically moves past a signal, the route should be released. However, if the virtual representation of the train lags slightly behind its actual position (this phenomenon is termed “late stepping”) RR could occur. The virtual representation of trains is tracked and reported by a sub-function, Train Monitoring and Tracking (TMT) of the Automatic Train Supervision (ATS) system. When a ‘late stepping’ issue happens, the ARS sets or “re-calls” the identical route, which can be picked up by the following train.
Page 5 of 22
3.2 A RR issue can lead to a “deadlock” situation when a number of
trains cannot continue along their path, because they are blocked by another train. At bifurcations and at stations with multiple platforms, an RR issue could potentially lead to a train taking the wrong route. There are no safety implications as the signalling safety sub-systems would maintain the safe separation of trains and prevent collisions even if a train took the wrong route. However, a RR issue may result in service disruption or passenger inconvenience through a delay of service or, train movement along an unintended route, even though safety is protected throughout the whole movement.
3.3 Analysis and classification of the RR issue: As soon as the
problem was noted, Siemens, the contractor for the Shatin to Central Link signalling project, began analysis of the issue. On 12 May, the Siemen’s team presented to MTRCL their conclusion that the RR issue was caused by system overloading, which affected the TMT functions of the ATS system software. The RR issue was jointly (i.e. with MTRCL’s agreement) classified as a “medium” ATS issue and was categorised as a “Day 2” item. Day 2 items are those not considered to be critical to the commencement of service (commencement is “Day 1”) and can be resolved after commencement of MFO. The classification of it as a “medium” issue meant that it required a fix but was not
Page 6 of 22
critical. As the RR issue observed on 11 May had been classified a Day 2 item and was not considered to be an S&S issue, later on the day of 12 May MTRCL submitted it S&S declaration to EMSD RB.
3.4 Identification of corrective measure: By 4 June, Siemens
presented to MTRCL a corrective measure to resolve the RR issue and suggested it could be implemented with the next version of the ATS software. According to the plan, this new version would be installed on 15 September, shortly after the scheduled commencement of MFO on 12 September. The issue remained designated as a “medium” issue.
3.5 The IP observed that there were no significant consequences of
the RR issue identified at this point and, according to Siemens, the RR issue was not relevant to safety. It appears that neither Siemens or MTRCL had contemplated the RR issue from the perspective of “soundness”, which (as explained earlier) is based primarily on headway and journey time. This is despite the possibility, albeit considered at that time to be remote, of a deadlock and/or mis-routing in a real-life RR situation. Indeed, it appears that with the identification of a corrective action, the Siemens and MTRCL teams considered the issue as temporarily “resolved” insofar as a solution had been identified for near future implementation, and that no further action was therefore required at that stage. This judgement is now known to have been flawed. The IP opines that at this stage Siemens should have provided, and MTRCL should have requested, a full investigation of the RR issue including a probability and impact statement which should have then been included in the change request documentation (which will be discussed later). This analysis and the change process would have initiated the escalation and reporting process.
3.6 Meanwhile, on 23 May and 25 May separately, during a further
phase of NTH testing three significant issues occurred. The first incident involved the Signalling Automatic Train Supervision (ATS) subsystem and resulted in a display Greyout in the Operations Control Centre (OCC). It was concluded that the problem was the result of the activation of a data logging function, Paktel, in the ATS Subsystem during the testing and it was
Page 7 of 22
determined that the data logging function would not be used in normal service operation. The major lesson learned from this incident was to be very cautious about making last-minute changes to a system. The second incident involved the shutdown of the interlocking system and it was determined that the cause was due to the simultaneous manual shutdown of two of the four safety computers instead of the normal sequential shutdown. This was a procedural error and relevant maintenance manuals were subsequently updated to highlight those precautions and appropriate methods of shutting down safety related computers. The third incident involved a test train proceeding in the wrong direction and passing a red signal under “Restricted Manual (RM)” mode. This incident was attributed to human factor and the training and assessment of Train Captains on RM mode driving has since been enhanced.
3.7 While the three issues on 23 May and 25 May were not directly
related to the RR issue, the Greyout issue, in particular, had an impact on the resolution of the RR issue. First, resources and attention were focused on resolving these new issues. Second, because the root cause of the Greyout issue was related to the changing of a logging function, also used for debugging purposes in the ATS system, there was an increased caution in making any alterations to the ATS system, particularly as a last-minute change. It should be noted that all three issues were resolved before the scheduled date of MFO commencement and a full report was submitted to the Government4 and a Press Release5 issued by MTRCL in August. A subsequent inspection of logs during the technical investigation undertaken after 12 September has shown that RR issues had occurred in the same test sequence but went unremarked, due to the belief that RR was a Day 2 issue and a solution had already been identified and, that the SPAD, Greyout and interlocking shutdown issues demanded more immediate attention.
4 Report for Incidents during the Two Tests for East Rail Line (EAL) Mixed Fleet Operation (MFO) on 22^23 and 24^25 May 2020 dated 13 August 2020. 5 Press Release PR055/20 dated 17 August 2020: Report on the three incidents on East Rail Line May 2020.
Page 8 of 22
3.8 Submission of EDOC to adjust TMT trace-level settings: In early July, Siemens proposed adjusting the TMT trace-level settings on the ATS system to boost its overall performance (TMT trace-level setting alters the debugging and logging details but is different from the debugging tool Paktel, that caused the Greyout). Over the course of July, as testing of the systems and analysis was ongoing, it appeared (at that time) that this adjustment could also help improve the TMT system overloading issue, which Siemens had reported at the end of July to be the cause of the RR issue. (This was a mitigation but not the full fix, which was still planned to be achieved through the software upgrade post-commencement of MFO). Within MTRCL, with the formal handover of the system from the Projects team to Operations, as per normal practice, on 11 May, any change to the system required a formal application and approval through the Engineering Document (EDOC) process. The EDOC process is a change approval process for any change to a system under the custodianship of Operations. At the end of July, an EDOC was submitted to adjust the TMT trace-level settings, and this was approved on 18 August.
3.9 The EDOC process requires that all parties that could be
impacted by the EDOC review and sign the documents for the concerned system changes to proceed according to the implementation steps defined therein. This is to ensure that the integrity and configuration of the system is maintained. Once an EDOC is signed, it is expected that it will be implemented. In this case, implementation of the EDOC was planned to be before commencement of MFO (as ‘strongly suggested’ in Section 4.3 of the signed EDOC). The EDOC went through two drafts from 29 July to its final signing on 18 August. The second draft explained that the change to the TMT trace-level settings would improve TMT performance, as well as reduce the likelihood of occurrence of RR. However, although this might have been the understanding at the time, the IP notes that MTRCL’s technical investigation (referred to in Paragraph 1.6 above) has subsequently shown that this TMT trace-level settings adjustment would have no notable improvement for the RR issue.
3.10 Efforts to effect adjustment in the TMT trace-level settings prior
to MFO: At this stage, the RR issue continued to be a Day 2 issue.
Page 9 of 22
Over the course of August, there were increasing efforts to change the TMT trace-level settings, mainly because it was seen as a relatively easy procedure that would improve the performance of the ATS system. It is notable that the implications for soundness or reliability, such as possible train next stop/destination deviations or deadlocks, were not presented as reasons for making the change. However, Siemens urged MTRCL to make the TMT trace-level settings adjustment before commencement of MFO. On 7 August, Siemens “strongly suggest” implementing the change before the start of MFO, because “it is the first step to improve TMT performance without changing the software itself.” Although this “strong recommendation” was not explicitly linked by Siemens to the RR issue, it was understood to be so by the relevant MTRCL team and was articulated as such in the EDOC. It appears that the consequences of not implementing the TMT trace-level settings change were not made explicit by Siemens or the MTRCL team because it was assumed that it would be executed upon approval of the EDOC. However, Siemens never declared that the TMT trace-level settings adjustment was a pre-requisite for MFO, despite their repeated requests to implement the change. That is, Siemens never asked to stop MFO because the adjustment had not been made.
3.11 As will be discussed in later sections, the IP opines that the
receipt of the “strong recommendation” to implement the change before MFO should have been the trigger point for MTRCL to proactively engage EMSD RB. The IP also believes that if the root cause and full solution requirements of RR had been investigated fully at this point (as was done by the technical investigation after 12 September), it would have been known that the cause of TMT overloading was more complex and the full solution required much more consideration than was thought at that time. The IP therefore believes that MTRCL team initially under-estimated the complexity of the RR issue, leading to a delayed reporting and resolution of the issue.
3.12 MTRCL had planned to implement the TMT trace-level settings
change to test its effectiveness (with a plan to reset it after the test whilst the results were being analysed), prior to NTH testing on August 18^19, but due to a Typhoon Signal No. 8, this activity
Page 10 of 22
was cancelled. However, throughout August it was believed that the EDOC would still be implemented and subsequent attempts to implement the TMT trace-level settings adjustment were made before the end of the month.
3.13 Second S&S declaration issued: Meanwhile, the second and
final S&S declaration was under preparation and was signed off for submission to EMSD RB on 17 August. As mentioned in Paragraph 3.3, the first S&S declaration submitted on 12 May did not include the RR issue, because it was not considered a S&S issue. The second S&S declaration was also issued without mention of the RR issue because it was still considered a priority Day 2 issue i.e. highly desirable but not essential for Day 1 MFO operations. Furthermore, it was believed (at that time) that the issue would be mitigated by the adjustment of the TMT trace-level settings prior to MFO commencement and that there was therefore no reason to raise it to EMSD RB. The IP opines that if the complexity of the RR issue had been fully understood, the issue would have been raised to EMSD RB.
3.14 Growing pressure from Siemens to adjust the TMT trace-level
settings: By the end of August, there was growing pressure from Siemens to adjust the TMT trace-level settings. On 24 August, Siemens reiterated its advice to adjust the TMT trace-level settings. MTRCL advised that it would be discussed at the meeting of the Signalling Implementation Task Force (ITF) on 25 August6. At the meeting there was a discussion of the TMT trace-level settings adjustment and the strong need to get its implementation approved although there is no record of the potential full consequences being explicitly articulated.
3.15 There were then two further attempts in emails on 26 August and
31 August to get the EDOC implemented. Again, there were concerns expressed internally that any changes to the ATS system could lead to unforeseen issues, and therefore there was a reluctance to make the TMT trace-level settings adjustment. It appears that the priority was to deliver the commencement of MFO and that any changes to the systems were seen as creating
6 The ITF is an MTRCL internal co-ordination meeting to prioritise site-work on the new signalling system, and to minimise the impact of site-work to the existing train service on East Rail Line.
Page 11 of 22
a risk of potentially introducing unknown new problems and were therefore advised against.
3.16 Simulation tests were also carried out to assess the impact of the
TMT trace-level settings change. No notable improvement to the system loading and TMT performance was observed after the simulated change to the TMT trace-level settings. The IP opines that this was another trigger for a fuller technical investigation into the RR issue, at least to establish how much benefit the TMT trace-level settings adjustment could actually bring to improve the RR issue.
3.17 Final attempt to implement the EDOC and decision to defer
action by Joint T&C Safety Panel: The final attempt to implement the EDOC came in the week of 7 September. On 7 September, Siemens specified the possibility of a 20-minute delay and that a train could be wrongly routed. These issues were reiterated in an internal early morning email on 8 September.
3.18 This was then followed by a meeting on 8 September of the Joint
Testing & Commissioning (T&C) Safety Panel7. The focus of the discussion of the RR issue at the meeting was on the deadlocking situation in the terminus. There was a clear discussion and full agreement that this was not a safety issue. The potential consequences of a 20-minute delay due to deadlocking at Terminus and bifurcation point were discussed and understood. However, while the potential for wrong routing at a bifurcation point was mentioned by Siemens, it did not appear to have been picked up by all the participants at the meeting.
3.19 At this Joint T&C Safety Panel meeting, a decision was made to
delay the implementation of the TMT trace-level settings adjustment until after commencement MFO for a number of reasons:
7 The Joint T&C Safety Panel is an MTRCL panel with external experts and contractor attendance. It is responsible for reviewing the readiness for any major tests and drills of the new signalling system to be carried out on site.
Page 12 of 22
• The implementation of the TMT trace-level settings change would only mitigate and not solve the RR issue, so there was still a possibility that the issue could occur. In fact, technical investigations after 12 September have now shown that the TMT trace-level settings adjustment will have no notable improvement to the RR issue.
• It was considered that there would be some risks entailed in
implementing any change so close to the commencement of MFO, and the concern over this risk was heightened by the Greyout issue in May, which had also been related to late changes to the ATS system.
• The likelihood of occurrence of the RR issue was considered
to be remote. Furthermore, if RR happens in service, it was considered that the situation could be detected and controlled manually by operational procedure.
3.20 The IP agrees that making last-minute changes should be
avoided whenever possible; however, the decision whether or not to implement the TMT trace-level settings adjustment should have been made in the light of a clear understanding of the probability and impact of the RR issue occurring and, the potential benefits that the adjustment would bring. Furthermore, this decision should have been discussed with Government in order to develop an agreed way forward.
3.21 The Joint T&C Safety Panel asked the team to develop an
operational procedure to handle the deadlock issue and to further investigate what would need to be done before implementation of the TMT trace-level settings adjustment.
3.22 The development of operational measures to manage the RR
issue for passenger operations: Following the Joint T&C Safety Panel meeting on 8 September, the Projects team and Operation Control Centre (OCC) staff met on 9 and 10 September to discuss an operational procedure as the TMT trace-level settings adjustment was not going to be implemented before the commencement of MFO. There were some objections raised to the initial proposed procedure, as it would involve someone
Page 13 of 22
monitoring a screen for 19 hours. In light of this concern, a more robust operational procedure was developed.
3.23 Over the course of 9 and 10 September, the operational
procedure was refined, incorporating observations from various stakeholders.
3.24 On 10 September, Siemens reiterated the need to adjust the
TMT trace-level settings. Meanwhile, there were further discussions that day among various MTRCL staff to develop the operational procedure, and by the afternoon of 10 September, a fully developed operational procedure was in place.
3.25 At the same time, the media raised questions with the
Government and MTRCL about the train routing issue and the possible consequences, including the potential for wrong routing.
3.26 Later on 10 September, the Executive was informed about the
issue and the media interest. It was agreed to hold any decision making until the issue had been discussed with Government the following day.
3.27 After discussion with relevant Government departments, it was
decided that a technical solution, rather than operational procedure, would be the optimal way to deal with the RR issue. Accordingly, on 11 September, MTRCL decided to defer the commencement of MFO and an announcement was made in the afternoon. The IP opines that this was the correct decision on 11 September. Furthermore, if in June the RR issue, its root cause, consequence and solution requirements, and even interim operational procedures had been determined and discussed with EMSD RB, a more robust and earlier decision concerning MFO could have been made.
4. Findings
4.1 Safe & Sound
4.1.1 Having reviewed the evidence, including those related to the S&S declaration, the IP is of the view that the RR
Page 14 of 22
issue was not a safety related issue. MTRCL has conducted assessments of the potential for misrouting of a train including both as part of the risk review undertaken during the original project development process and subsequently as part of the technical investigation undertaken since 12 September. It has been formally assessed as having no safety impact and this view aligns with that of the ISA also given as part of the recent technical investigation undertaken.
4.1.2 While the IP could not identify any single definition for
‘soundness’ a process has been implemented based on a practice which is understood by relevant stakeholders. The scope of what must be demonstrated to Government departments to prove soundness is agreed on a project-by-project basis.
4.1.3 The assessment of the soundness of the Shatin to
Central Link (SCL) project followed the normal practice, notably by assessing and demonstrating journey time and train service headway reliability performance. The RR issue was not considered by the MTRCL team to fall into the formal criteria for reporting to Government departments based on the established S&S.
4.1.4 The IP opines that additional analysis on the root-cause
as well as the probability and impact of the RR issue on service reliability should have been undertaken from when the proposed 15 September software implementation fix was identified in June (as discussed in para 3.4). Further, even without this analysis, as the understanding of the ultimate consequences of this RR issue and the urge to make the TMT trace-level settings adjustment before commencement of MFO grew, MTRCL should have proactively discussed it with Government. Since the first identification of the issue in May, there were many opportunities to discuss the issue with Government officials through formal and ad hoc channels.
Page 15 of 22
4.2 Identification, Analysis and Classification of the Issue
4.2.1 The RR issue was first identified on 10^11 May. It was considered a Day 2 item upon its identification and analysis. However, the situation started to change from early July when Siemens first suggested an adjustment to the TMT trace-level settings to improve the performance of the ATS system.
4.2.2 By early August, the linkage between the adjustment of
TMT trace-level settings and the RR issue was understood by the MTRCL team and Siemens increased the strength of their recommendation. On 7 August, Siemens replied to an MTRCL email, stating: “We strongly suggest to implement this [trace-level] change before start of MFO because it is the first step to improve TMT performance without changing the SW [software] itself” as they considered that an improvement had been identified which was “a very simple means to gain significant performance, hence providing an extra level of confidence and therefore was strongly recommended to be done before start of revenue operation.” However, the TMT trace-level settings adjustment was still only considered by the MTRCL team to be a highly desirable short-term Day 1 mitigation measure, with the software update as the Day 2 full resolution. An EDOC was created to implement the TMT trace-level settings change before commencement of MFO. This EDOC could be considered as the beginning of the escalation process given its circulation for comments and the need to for sign-off. The EDOC was finally signed-off by MTRCL on 18 August.
4.2.3 The IP considers that the overall performance of the TMT
appeared to be becoming increasingly concerning over the period in question with respect to its impact on the functionality of the ARS system and the RR issue. Siemens’ change of stance to a “strong suggestion”, followed by the persistent urging to implement the TMT trace-level settings adjustment, suggests that this issue had grown in significance. This change of significance
Page 16 of 22
does not appear to have been fully recognised within the MTRCL team who then underestimated the complexity of the issue and trusted that there was a simple mitigation (by way of implementation of the TMT trace-level settings adjustment). Furthermore, the MTRCL team were not sensitive enough to the potential service impact of the RR issue. The IP opines that the MTRCL team were lacking in demanding Siemens provide a more detailed analysis and follow-up of the RR issue and the overall ARS functionality. This underestimation then influenced the reporting actions that are covered in the next sections. At the same time it was also incumbent on Siemens, as the signalling experts and system supplier and contractor, to provide a proper analysis of the RR issue and better explain their reasoning for requesting the trace level setting be adjusted before commencement of MFO. It appears too much reliance was being placed on the assumption that the trace level setting would help without there being a detailed analysis.
4.2.4 Technical Investigation after 12 September has now
shown that the RR issue is more complex and the TMT trace-level settings adjustment “strongly suggested by Siemens” has no notable effect in terms of mitigation and the full resolution requires software changes.
4.3 Internal Escalation
4.3.1 MTRCL has a hierarchy of governance and decision-
making bodies to oversee the commissioning of the new signalling system. Project progress is reported through a hierarchy of reports from team and project levels to Executive meetings, and to Board’s Capital Works Committee and the Board itself. The RR issue has not been reported in this path. To manage drills and exercises and to co-ordinate on-site testing and commissioning activities, there is the Joint T&C Safety Panel; and the ITF. There are therefore clear and well understood paths for escalation of issues and decision making at an appropriate level depending on the seriousness of an issue. Having reviewed this
Page 17 of 22
governance and reporting mechanism the IP considers it to be robust, as long as an issue has been appropriately classified and understood by the relevant teams which, in this case, was not.
4.3.2 There were three unsuccessful attempts to implement the
TMT trace-level settings adjustment, but these efforts were unsuccessful due to a typhoon and the prioritisation of other work which were considered to be more urgent. The issue was then escalated through the ITF to the Joint T&C Safety Panel on 8 September as a final last-minute attempt to get the TMT trace-level settings adjusted before MFO. However, by this time, commencement of MFO was only four days away. Moreover, there was an underlying concern that any last-minute software changes could lead to a repetition of the Greyout issue, which had caused significant project progress disruption between May and August. The lack of time and the uncertainty caused by previous issues created a reticence to implement the TMT trace-level settings adjustment.
4.3.3 Although the issue was eventually escalated to Director
level on 8 September at the Joint T&C Safety Panel, this was too late. The opinion of the IP is that the issue should have been escalated earlier and more widely within MTRCL from early August when Siemens’ position on the implementation of the adjustment to the TMT trace-level settings stated to be a “strong suggestion”. The IP therefore believes that MTRCL’s internal checking (known as ‘Second Line of Defence’) should be enhanced to detect and escalate issues early.
4.4 Reporting to Government
4.4.1 The fact that the RR issue was not considered a
reportable Day 1 issue was acceptable in May but became increasingly less so from the time Siemens first time stated their suggestion in early August.
Page 18 of 22
4.4.2 As discussed above, the complexity of the issue was underestimated by the MTRCL team who were not sensitive enough to the potential service impact of the issue. Both Siemens and MTRCL considered that the potential risk resulting from the RR issue would be mitigated by the implementation of the TMT trace-level settings adjustment prior to commencement of MFO. They also considered that the TMT trace-level settings adjustment was a simple procedure. The fact that the RR issue would be largely mitigated before MFO with a simple fix led MTRCL to believe that it was not an issue that warranted being raised with Government. It also limited internal discussion on the ultimate consequences of the issue and the operational procedure that could have been required.
4.4.3 The IP is of the opinion that the fact the Government was
not informed was a misjudgement on the part of MTRCL. Given this, and despite the belief at that time that a simple change was going to be made before commencement of MFO, the Panel opines that the Government should have been informed by MTRCL. Furthermore, once the decision not to implement the TMT Trace Settings adjustment was made at the Joint T&C Safety Panel, MTRCL should have discussed the issue with Government to agree a proposed way forward.
4.4.4 MTRCL’s Project Integrated Management System (PIMS)
guidelines on whether, when and how Government departments should be informed of potential soundness issues and the overall Sound process are unclear. Therefore, the IP recommends that PIMS should be reviewed to provide more clarity on the Sound process and reporting responsibilities. This should include clear reference to and guidance on working with the Government’s newly introduced Service Reliability Reporting mechanism.
Page 19 of 22
4.5 Advisers
4.5.1 Modern signalling systems are highly complex. While MTRCL has a high level of signalling expertise there is inevitably a degree of reliance on the deep system knowledge within the contractor organisation. Therefore, in order to support the assurance process MTRCL employs specialist advisers from whom advice can be sought, namely, the ISA and the Independent Reviewer (IR). As the ISA is contracted by MTRCL to advise on critical safety aspects of the new signalling system and given that this issue was never contemplated to be a safety issue by MTRCL and Siemens, it is understandable that the ISA was not engaged on this issue. However, the IR is contracted by MTRCL to provide advice and guidance on the technical maturity of the new signalling system and specifically on issues pertaining to performance and reliability. The IR was not consulted, despite working from the same office space. The IP considers that the IR should have been engaged as soon as Siemens made their “strong suggestion” and before making the decision as to whether or not to implement the TMT trace-level settings adjustment.
5. Summary of Findings
5.1 The key findings are as follows:
5.1.1 The RR issue was not a safety issue; it was one of reliability.
5.1.2 The root cause of the issue was not investigated
thoroughly for a number of reasons: there was a diversion of attention to arising from issues of greater potential operational and safety impact (23 and 25 May); the possibility of an RR was considered remote; and both Siemens and MTRCL believed that the proposed TMT trace-level settings adjustment was to be implemented before commencement of MFO and the issue would be sufficiently mitigated (now known to be incorrect).
Page 20 of 22
Furthermore, it was thought that the problem would be fully resolved shortly after commencement of MFO by a planned software upgrade. The IP opines that the MTRCL team and Siemens should have carried out a more detailed investigation earlier. Had they done so they would have realised there were additional underlying root causes to the issue, which were more complex. This would likely have alerted them to the necessity to escalate the matter within MTRCL and report it to Government. It was an error of judgement not to carry out a more detailed investigation earlier. The MTRCL team and Siemens were aware of the consequences of misrouting of a train, with a risk review undertaken as part of the project development process. It is considered this judgement error was influenced by the aforementioned factors and the IP found no evidence of deliberate concealment by MTRCL or Siemens of the RR issue.
5.1.3 The full consequences of the RR issue were not explicitly
articulated until 01 September. The IP opines that the consequences should have been more communicated by both Siemens and the MTRCL team.
5.1.4 The technical investigation conducted after 12
September has now identified the root cause of the RR issue and a full technical solution is being developed. This should have been done earlier by Siemens and MTRCL.
5.1.5 The issue was escalated internally to MTRCL Director
level, when it was realised that the TMT trace-level settings adjustment was not going to be undertaken before commencement of MFO. However, this was too late. Rather, it should have been escalated as soon as the contractor, Siemens, first stated their position to be a “strong suggestion”, that the TMT trace-level settings adjustment should be implemented before the commencement of MFO.
5.1.6 While it is not considered by the MTRCL work team to fall
within the pre-existing classification of matters required to be reported to relevant Government departments, the IP
Page 21 of 22
considers that this issue should have been discussed with the relevant Government departments. The IP notes and supports the recently added requirement of a Service Reliability Report, which would have reported matters such as the un-resolved RR issue, alongside the existing System Safety Report in the S&S process.
6. Recommendations
6.1 The Panel has the following recommendations after reviewing the relevant procedures and evidence: 6.1.1 There was no specific articulation, assessment, or
questioning of the probability or impact of the RR issue, nor an assessment of the potential reduction on the likelihood of an RR occurrence by the proposed adjustment to the TMT trace-level settings. Once the issue had been identified, the potential consequences of not implementing the change should have been explicitly articulated in the TMT trace-level settings adjustment documentation. The IP recommends a review of the PIMS and EDOC processes to give clear guidance on:
• The need to make explicit the probability and impact of
the subject issue occurring
• The quantifiable benefits or improvements expected from implementing a proposed change
• A clear recommendation of when the change should
be made and the relevance of that timing. 6.1.2 Relevant staff should be briefed to raise the sensitivity
and awareness of public concerns, effective communication and the importance of service quality and reliability issues, in addition to the current focus on safety.
6.1.3 As the importance of the RR issue grew with time, it
should have been escalated more quickly and better use
Page 22 of 22
should have been made of the IR. Furthermore, Government should have been informed. However, there was insufficient clarity in terms of whether this should have been reported to or shared with the relevant Government departments and by whom. The IP recommends a revision of PIMS to add more clarity on the use of both ISAs and IRs and, escalation and reporting including the importance and requirements of the S&S submission process.
6.1.4 The Second Line of Defence arrangements within
MTRCL should be enhanced in order to detect and escalate important issues early.
6.1.5 The IP is of the opinion that while the RR issue did not
concern safety, it was one of reliability and should have been reported to Government. The IP strongly supports the recent introduction of the Service Reliability Report, to accompany the existing System Safety Report. This should ensure that Government departments are kept adequately informed of all significant reliability issues in the future.
Appendix 2
Page 1
Technical Investigation Report
on Route Recall Issue
Submitted by:
Date: 21 January 2021
____________________ Operations Director
Dr. Tony Lee
Chairman of the Technical
Investigation Core Team
Page 2
Table of Contents
Executive Summary 1. Preamble
2. Route Recall (RR) Issue
3. Technical Investigation
4. Findings on the RR Issue
5. Two Additional Automatic Route Setting issues
6. Known system performance improvements
7. Conclusions
8. Recommendations
Annex 1: Technical Investigation Core Team Members
Annex 2: TMT Data Processing between ATS modules “lidi” and “TMT”
Annex 3: Delayed TD Stepping Phenomenon
Page 3
Executive Summary
Subsequent to the deferral of the planned Mixed Fleet Operation (MFO)
of the East Rail Line (EAL) on 12 September 2020 due to the “Route
Recall (RR)” Issue of the Automatic Train Supervision (ATS) subsystem
in the new signalling system, a technical investigation on the root cause
of the RR Issue together with other technical issues related to the
launching of EAL new signalling system has been conducted by the
Technical Investigation Core Team comprising MTR Operations and
Projects teams, Siemens (the Contractor) and External Technical
Advisors.
The Core Team has conducted a series of simulation and train testing
in the non-traffic hours (NTH) on 12, 19 and 28 October 2020 and
reviewed all the observed results to ascertain the root cause of the RR
Issue and develop the technical solutions.
The investigation confirmed that the RR Issue is not a safety issue but
a service reliability issue. It could lead to potential routing of a train to
an unintended destination and may generate false “Signal Passed at
Danger (SPAD)” alarms. It was attributed to the unexpected high
volume of data in the new Fault Classification Update (FCU) software
routine between two ATS modules, which was a non-standard software
specifically built by the Contractor as an add-on feature to fulfil MTRCL’s
requirement to provide extra train monitoring information to the Traffic
Controller in Operations Control Centre (OCC). The root cause of the
Page 4
RR Issue is due to the software defect in the new FCU software routine
which was not identified during testing carried out by the Contractor.
The overall system performance for normal train operation will not be
affected if the new FCU software routine is shut down. The provisioning
of the Fault Classification Update software routine for future use, if
needed, will be subjected to Government approval with further design,
testing and validation processes before its introduction.
Simulation tests done in August 2020 and the series of NTH testing in
October 2020 also confirmed that the adjustment of the “Train
Monitoring and Tracking (TMT)” trace-level settings inside the ATS
subsystem as proposed by the Contractor from July to early September
2020 would have no observable effect in resolving the RR issue.
To rectify the RR issue, the Contractor had (a) shut down the new FCU
software routine, (b) upgraded the computer hardware and (c) upgraded
the Automatic Route Setting (ARS) software as an extra assurance to
prevent the wrong routing. MTRCL and Contractor had conducted a
series of NTH testing and verified their effectiveness in eliminating the
RR Issue.
During the NTH train testing on 12 October and 6 December 2020, there
were two additional ARS issues that led to train route being assigned to
Page 5
the unintended platform of the correct station due to other software
defect and software installation issues respectively. They had been
fixed by upgrading the corresponding software and enhancing the
software upgrading control & authorization procedures respectively.
In the course of the technical investigation of the RR issue, the Core
Team has also taken the opportunity to review latest known signalling
system performance improvement items as at early January 2021 with
EMSD, Highways Department (HyD) and Transport Department (TD).
They have been properly addressed with the appropriate operational
measures derived to handle any related incidents during service
particularly those that may lead to minor service impact after MFO
commencement.
MTRCL will continue to monitor the performance of the signalling
system closely if any further improvement areas are identified thereafter
through operational experience after MFO. MTRCL will keep on
optimizing and enhancing its performance.
It is recommended to commence MFO after the completion of Safe and
Sound process.
Page 6
1. Preamble
1.1 North South Line (NSL), which is being built under the Shatin to
Central Link (SCL) project, is an extension of the existing East
Rail Line (EAL) to cross the harbor to the existing Admiralty
Station (ADM). The existing EAL Signalling System will be
migrated to the new EAL Signalling System prior to the opening
of NSL. The new EAL Signalling System will be put in passenger
service on EAL from the commencement of Mixed Fleet
Operation (MFO).
1.2 A new signalling system with introduction of MFO is necessary
as to replace the existing aged signalling system, meet
requirements of extended part of East Rail Line (EAL) to
Admiralty, and support gradual conversion of 12-car to 9-car train
operation during MFO to achieve final 9-car train operation for
SCL project.
1.3 In preparation for MFO on the East Rail Line (EAL), two tests
were conducted during the Non-Traffic Hours (NTH) of 22^23
and 24^25 May 2020 respectively to demonstrate the operation
of the new EAL signalling system. Two independent incidents on
system behavior were reported during the tests. One was about
Automatic Train Supervision (ATS) Line Overview Display grey-
Page 7
out caused by activation of the data logging function i.e. Paktel1,
which was wrongly deployed for normal operation. It degraded
the processing performance. Upon subsequent detailed
investigation of the root cause of the incident, this logging
function has since been banned from use. The other one was
Interlocking shutdown incident which was related to
simultaneous manual shutdown of the signalling interlocking
computers instead of sequential shutdown of computers. To
address the issue, revision to relevant maintenance manual has
been arranged together with necessary briefing to relevant staff
and posting of prominent warning notice besides the related
computers to prevent simultaneous manual shutdown. For
these incidents, MTRCL had submitted the investigation report
with findings reviewed and accepted by the Government on 17
August 2020 and the improvement action plan as mentioned
above has been implemented.
1.4 Thereafter, MTRCL planned to commission the new signalling
system and gradually introduce new 9-car trains on the EAL
(collectively, “MFO”) starting from 12 September 2020.
1 Paktel is a data logging function designed for use in testing and maintenance.
Page 8
2. Route Recall Issue
2.1 On 10 September 2020, the media raised questions with the
Government and MTRCL about the train routing issue and the
possible consequences, including the potential for wrong routing.
On 11 September 2020, MTRCL and Siemens (the Contractor)
notified the Government about an issue observed during train
testing in non-traffic hours on 11 May 2020 (the “Route Recall
Issue”). In essence, each scheduled train shall have a pre-
determined route in the pre-set train operation schedule. The
symptom of the Route Recall (RR) Issue was a slow response of
the ATS system so that, even though a train had already entered
the designated route, the route for this train was recalled by the
Automatic Route Setting function (ARS)2 for the following train.
As such, for train routing at a bifurcation, it was possible that the
following train may have gone to an unintended route and
destination e.g. Sheung Shui to Lo Wu/Lok Ma Chau, and
between Shatin and University via Fo Tan/Racecourse. Upon a
final review of the new system prior to service commencement
on 11 September 2020, the planned MFO commencement
scheduled for 12 September 2020 was held over to allow MTRCL
to find out the root cause and technical solution instead of using
operational measures to handle the issue.
2 Automatic Route Setting (ARS) is a function in ATS to perform route setting for train movement
automatically.
Page 9
2.2 Subsequent technical investigation revealed that RR also
occurred on 23 May during the NTH testing when the ATS grey
out was observed, which was due to the activation of a specific
data logging function, Paktel, in the ATS subsystem, as
mentioned in paragraph 1.3. On some occasions, false “Signal
Passed At Danger (SPAD)3” alarm will come with the RR as well,
although there were physically no such train passing through
stop signals in red. The original intent of SPAD alarm serves to
alert operators for intervention when detecting a train passing
through a stop signal in red without authorization.
3. Technical Investigation
3.1 A Technical Investigation Core Team was set up on 18
September 2020 to examine the root cause of the RR Issue and
develop technical solutions before the commencement of MFO.
The Core Team is staffed with MTRCL staff from Operations and
Projects teams, Contractor and External Technical Advisors (as
listed in Annex 1). The methodology adopted in the investigation
was primarily based on Root Cause Analysis i.e. identifying
problems, collecting facts, analyzing data, determining root
causes plus other causal factors, and verifying them through
simulation plus site testing. The Core Team has unanimously
agreed on the investigation findings, identified the root cause of
3 Signal Passed at Danger (SPAD) is an event when a train passes a stop signal without authorization.
Page 10
the RR issue, reviewed all observed issues during simulations
and train testing, and validated the identified technical solutions.
3.2 After initial analysis, the Contractor advised that the RR was
related to the performance of the “Train Monitoring and Tracking
(TMT)4” module inside ATS. The observed slow performance of
TMT was originally considered as the cause of RR. The then
trace-level setting 5 was initially considered as the immediate
cause of slow performance of TMT, thus in early August the
Contractor suggested to adjust the TMT trace-level setting.
4. Findings on the RR Issue
4.1 Through in-depth technical investigation by computer simulation
and actual train testing in non-traffic hours (NTH) on 12, 19 and
28 October 2020, much useful additional technical information
was obtained. It was revealed that the Computer Processing
Unit (CPU) assigned to process the “TMT” data received an
unexpectedly high volume of “fault-classification update” data
from another ATS module called List Display (“lidi”) as shown in
Annex 2. This high volume of data transfer could be triggered
by a range of events, such as
4 Train Monitoring and Tracking (TMT) is a software module to monitor the train position and track train
movement according to the route setting by ATS either manually or automatically by ARS.
5 Trace-level setting alters the debugging and logging details in testing and maintenance.
Page 11
(a) changing the control initiated by the changeover of
responsibilities at controller terminals of Traffic Controllers
in Operations Control Centre;
(b) rebooting of any interlocking sub-systems; and
(c) resuming a faulty trackside data link.
4.2 The software routine called Fault Classification Update (FCU),
which was a non-standard software routine, was built then with
messages being sent from the ATS module “lidi” to another ATS
module “TMT”. The data processing of the “fault classification
update” messages is a new6 feature tailor-made for the SCL new
signalling system by the Contractor to fulfill the MTRCL’s
requirement to provide extra train monitoring data to the Traffic
Controller. This FCU is an add-on feature by the Contractor to
its standard product but it will not affect the overall system
performance of normal train operation even if it is subsequently
shut down.
4.3 Under this new and additional software logic, a high demand of
“TMT” CPU processing power was deployed to handle the
unexpectedly high volume of “fault classification update” data,
resulting in a delay in processing of the real time train and
trackside information, thereby generating a delayed stepping7
6 Fault Classification Update (FCU) software routine was newly developed by Siemens and has not been
used for any other railway projects.
7 Delayed stepping is a virtual representation of the train lags slightly behind its actual position.
Page 12
phenomenon, i.e. virtual representation of the train position in
ATS lags slightly behind the actual train position (as shown in
Annex 3). The ATS used the virtual representation of the train
position to perform route setting. Because of the difference
between the virtual representation of the train position and the
actual train position, the ATS regarded the train (based on the
virtual representation) was still yet to enter the intended route
and therefore set the route again and led to a Route Recall. In
addition, since the actual train has passed the proceed signal
and turned it to a stop signal8, when the virtual representation of
the same train passes the stop signal, the ATS would generate
a false SPAD alarm. These two sequences of events were the
root cause of Route Recall and false SPAD alarm. More details
can be found in Annex 3. This finding revealed that the new
FCU software routine implemented by the Contractor had a
software defect which was not identified during testing resulting
in the RR issue.
4.4 From July to early September 2020, whilst noting the RR issue
is not a safety issue as all safety functions including Automatic
Train Protection system and Interlocking system of the new
signalling system were functional in full order, the Contractor had
recommended implementing an interim measure to alleviate the
“TMT” CPU heavy loading by adjusting its trace-level settings
8 The stop signal was generated by the safety interlocking system based on the actual train position
instead of ATS which based on virtual representation of the train position.
Page 13
with approval obtained through an MTRCL internal process. In
the absence of a full technical investigation into the root cause of
the RR issue, its effectiveness was however not duly tested, and
indeed it was later revealed through simulation tests that the
change resulted in no performance difference on the TMT
loading. Subsequent results obtained from the NTH train testing
arranged by the Core Team in collaboration with the Contractor
on 28 October 2020 also confirmed that the adjustment of the
“TMT” trace-level would have no observable effect in tackling the
data processing loading, hence would not have improved the RR
issue.
4.5 To eliminate the RR issue including the associated false SPAD,
the Contractor completed the following measures with site
testing to validate their effectiveness satisfactorily.
(a) applied a software change to shut down the new FCU
software routine built for sending messages between the
two internal ATS modules i.e. “TMT” and “lidi”, as the overall
performance for new signalling system, including the
flexibility for future operational expandability, will not be
affected;
(b) upgraded the computer server with a faster possessing
speed to give additional capacity buffer for the “TMT” CPU
process; and
Page 14
(c) upgraded the ARS software by data implementation in
software control logic to prevent RR Issue, hence wrong
routing, at critical junctions and bifurcations as an extra
assurance.
5. Two additional Automatic Route Setting (“ARS”) issues
5.1 During the NTH train testing on 12 October and 6 December
2020, there were two cases of train route being assigned to
unintended platform of the correct station due to two separate
issues.
5.2 The first case happened on 12 October 2020. A southbound test
train from Lo Wu was unintentionally routed to Sheung Shui
northbound platform instead of southbound platform. The
incident train, which was not in the loaded set of timetabled trains,
departed from Lo Wu. Investigation found that the ATS wrongly
treated the train as a timetabled train and further assigned to this
train an incorrect data of the next station stop to Sheung Shui
northbound platform. This undesirable result was caused by a
software defect implemented by the Contractor.
5.3 Another case happened on 6 December 2020 in which a
southbound test train from Lok Ma Chau was unintentionally
routed to Sheung Shui northbound platform instead of
southbound platform. Investigation revealed that during the
Page 15
software upgrading process for fixing the ATS software issues,
there was a human error, causing incorrect software upgrading
sequence, software version, and database files. The whole
software installation process was conducted by the Contractor
without MTRCL’s verification, resulting in having the
unintentional route assignment for the test train.
5.4 To address the issues of two cases of train route being assigned
to unintended platform, the Contractor applied a fix to the ATS
software and enhanced the software upgrading control &
authorization procedures to prevent incorrect software patch and
database file installation. Further train testing has shown that the
faults have been corrected.
6. Known system performance improvements
6.1 During the technical investigation of the RR issue, the Core
Team has also taken the opportunity to review all the latest
known signalling system performance improvement items with
EMSD, HyD and TD, and confirmed that they have been properly
addressed and enhanced, or appropriate operational measures
derived anew or by using existing ones to handle related issues
during service after MFO commencement.
6.2 MTRCL will continue to monitor the performance of the signalling
system closely if any further improvement areas are identified
Page 16
thereafter through operational experience after MFO. MTRCL
will keep on optimizing and enhancing its performance.
7. Conclusions
7.1 The Core Team had identified the root cause of the RR issue and
the two additional ARS issues. Corresponding software fixes,
new software upgrading control & authorization procedures have
been tested and verified in the presence of Government
departments. To eliminate the RR issue including the associated
false SPAD, the Contractor had already completed the following
measures with site testing to validate their effectiveness
satisfactorily.
(a) applied a software change to shut down the new FCU
software routine built for sending messages between the
two internal ATS modules i.e. “TMT” and “lidi”, so that the
overall performance for new signalling system, including the
flexibility for future operational expandability, will not be
affected;
(b) upgraded the computer server with a faster possessing
speed to give additional capacity buffer for the “TMT” CPU
process; and
Page 17
(c) upgraded the ARS software by data implementation in the
software control logic to prevent RR Issue, hence wrong
routing, at critical junctions and bifurcations as an extra
assurance.
7.2 The Core Team also reviewed all known system performance
improvement items and derived appropriate operational
measures in particular those items with minor service reliability
impact.
7.3 The investigation reconfirmed that the RR issue, the two
additional ARS issues, and all the known system performance
improvement items are not safety related. Throughout the test
and investigation, all safety functions including Automatic Train
Protection system and Interlocking system of the new signalling
system were functional in full order.
7.4 The investigation reconfirmed that the shutdown of this FCU
software routine, which is an add-on feature by The Contractor
to its standard product, to eliminate the RR issue would not affect
the overall system performance of normal train operation.
7.5 The two additional automatic route setting issues leading to train
route being assigned to an unintended platform at Sheung Shui
Station during testing had been addressed by fixing the ATS
Page 18
software by the Contractor and enhancing the software
upgrading control & authorization procedure.
7.6 The Independent Safety Assessor employed by the MTRCL
confirmed i) the correct identification of the root cause of the RR
Issue, ii) the corresponding rectification has successfully fixed
the RR Issue, iii) the effectiveness of related rectification
measures taken on the two additional route setting issues, and
iv) the known system performance improvement items are not
safety related.
8. Recommendations
8.1 The provisioning of the FCU software routine for future use, if
needed, will be subject to Government approval with further
design, testing and validation processes before its introduction.
8.2 Close monitoring on the performance of the new signalling
system and the implementation of all the committed system
performance improvement areas will be undertaken to ensure
the continuous pursuit of high reliability in the operation of new
signalling system.
8.3 It is recommended to commence EAL MFO after the completion
of Safe and Sound process.
Page 19
Annex 1 : Technical Investigation Core Team Members
MTRCL
Operations Director Tony Lee
(Chairman)
Chief of Operations Engineering Nelson Ng
Chief of Operating Sammy Wong
Chief Signal Engineer (Operations) Gordon Lam
Acting Head of Operating – N&ER Bess Ng
Project Safety Assurance Manager Maggie Chan
Head of E&M Construction CL Leung
General Manager – Planning & Development Ronald Cheng
General Manager – Planning & Development
(Designate)
Allen Ding
Acting Deputy General Manager – Planning
& Development
Mark Chan
General Manager – Special Duties John William Manho
Head of Operations Strategic Business
Manager
Siman Tang
(Secretarial)
Principal Engineering Advisor (Operations) Philip Wong
Project Manager – Signalling Terence Law
Construction Manager – SCL Signalling Clifford Chow
Infrastructure Development & Engineering
Asset Manager
Jonathan Lau
Chief Signalling Design Manager Joseph Sin
Deputy General Manager – Technical &
Asset Engineering
Michael Leung
Page 20
Project Manager – Rolling Stock CS Chan
Lead Design Manager – Rolling Stock
Mechanical
CC Hon
Operations Assurance Manager Ryan Lam
Support Engineer – Signal & Electronic
Control Engineer
Patrick Hsia
(Secretarial)
Siemens (Contractor)
CEO Siemens Mobility Limited Peter Brauner
Project Advisor (Offshore) Torsten Oelkers
Project Director Alex Tsang
Deputy Project Director Adam Schmidt
Project Manager (Offshore) Andreas Wiedemann
ATS Sub-project Manager Dirk Paulmann
System Integration Manager (Offshore) Matthias Lange
T&C Manager Saransh Mutreja
System Manager Ivano Folgosa
External Technical Advisors
Professor SL Ho – as Core Team member (Chair Professor of
Electricity Utilization)
Has published more than 400 technical papers, mostly in reputable journals
such as IEEE Transactions or IET Proceedings and has more than 40 years
of experiences in research and development on electrical/power system,
particularly in railway engineering in the last twenty years.
Page 21
Stephen Clarke – Supporting Core Team (Lead Signatory
Certification)
Technical Lead for Independent Assurance and Lead Signatory for Ricardo
Certification. He has had a long-term involvement in assessment of safety
cases for train control and signalling systems. Responsible for numerous
high-profile Independent Assurance jobs discharged by Ricardo in the UK,
Europe and the Middle East. Lead Signatory for Ricardo Certification,
overseeing and providing approval for a wide range of independent
assurance projects in the UK, Denmark, Spain and the Middle East.
Mike Castles – Supporting Core Team (Global Technical Expert—
Functional Safety)
Safety assurance manager with a strong background in safety-critical
signalling systems, infrastructure projects and rolling stock. Experienced in
the whole safety lifecycle, from hazard identification and safety planning
through to safety cases and approvals. Expert in application of the Common
Safety Method on Risk Evaluation and Assessment (CSM RA). Experienced
RAM practitioner. Specialist technical expertise in signalling control systems
and computer based interlockings.
Mike Ainsworth – Supporting Core Team (Principal Consultant)
Safety engineer with experience of developing, verifying and assessing
systems across a variety of industries, with the last ten years spent working
primarily on rail systems and projects.
Page 22
Annex 2: TMT Data Processing between ATS modules “lidi” and
“TMT”
The new TMT Data processing for “fault classification updates”
messages made between the ATS module “lidi” and “TMT” as shown
below, is a new and project specific feature tailor-made by the
Contractor to fulfill the MTRCL’s requirement.
Remark:
“lidi” – List Display
“tmt” – Train Monitoring and Tracking
Page 23
Annex 3: Delayed TD Stepping Phenomenon
1. “Route Recall”
During non-traffic hour (NTH) tests conducted on 11 May 2020,
appearances of “routes not released” were observed and recorded in
the relevant test report of 11 May 2020 NTH. According to the system
data log files, it was observed that the “route not released” was actually
the route being recalled with Train Describer (TD) stepping delay as
the symptom. The TD stepping delay led to recalling of a route already
given to a preceding train, hence the following train may pick up the
recalled route and go to an unintended destination as described below.
2. Normal TD stepping and route setting
The Automatic Route Setting (ARS) function in ATS uses a TD with a
destination code (e.g. YL stands for LMC destination and YM stands
for Lo Wu destination) to automatically set route for train movement to
its destination. The TD should be located at the front of the train
normally, and when the train is approaching a signal with its TD
entering the triggering tracks (namely OP1, OP2 and OP3), the ARS
will check if
(a) the train destination requires ARS,
(b) the route ahead is available for route setting; and
(c) the signal ahead is at “Red” aspect before initiating route setting
as illustrated below.
Page 24
Thereafter, the signal aspect would be turned to “Proceed” after route
setting with conditions checked and completed by interlocking.
Train planned to Lok Ma Chau
The TD would step forward as the physical train moves forward. As
the train passes the signal, the signal aspect would be turned to Red
and TD steps across the signal. The route set will be released as train
passed the route.
As the following train YM0125 approaches with Red signal aspect and
no route ahead, automatic route setting for it will occur when its TD
steps on OP1, leading to its own destination to LOW as defined by TD.
Train planned to Lo Wu
Train Describer
Page 25
3. TD Stepping Delay and RR issue
When the TMT CPU is overloaded as revealed in the investigation, the
“TD stepping” could be delayed and hence the TD location of the
preceding train YL0123 in the above would be lagging behind the
physical train position. As a result, the TD is located behind the train
as shown below.
4. Consequence
Since the TD is located behind the train, even after the train has passed
the signal and turned the signal to stop (Red) aspect, the ARS re-calls
the route again when the delayed TD steps on OP3, as illustrated
below. Such would lead to recalling of the same route as that of the
Page 26
preceding train (YL0123), for this example, with destination of Lok Ma
Chau (LMC) for the next train, and hence the following train (YM0125)
may pick up the recalled route and go to its unintended destination. As
illustrated in diagram below, the following train (YM0125) with original
destination at Lo Wu (LOW) might be unintentionally routed to LMC.
Another side effect is, subject to the extent of the delay “TD stepping”,
false SPAD alarm may be resulted in TMT processing when the
delayed TD steps across a red signal:
Page 27
5. Safety functions remain intact
Route Recall Issue is not a safety related issue because the ARS is a
non-safety related function provided by the ATS subsystem. The
safety checks provided by the Interlocking subsystem and Automatic
Train Control subsystem (including Automatic Train Protection i.e. ATP)
as below, for route setting and train separation remain intact, i.e. all
safety functions use the physical train locations and not the Train
Describers to maintain a safe separation between trains. The ATP
always monitors the physical position of the train to ensure that
adequate safety distance is maintained between trains; and if not so
maintained, the ATP will apply emergency braking.