CONFIDENTIAL

Triple Core Lock Step (TCLS) ARM FOR SPACE

Xabier Iturbe, Emre Ozer & Balaji Venu
Toby Proctor & Alex Robinson

ARM Research, Cambridge
[email protected], [email protected]




Agenda

• Big Picture - What is TCLS trying to address?
• Introduction to Cortex-R CPU family
• TCLS overview
  ◦ Introduction
  ◦ TCLS results
  ◦ Architecture, Implementation
  ◦ PPA comparison
• Status of the IP
• Future work
• Discussions


Disclaimer

TCLS system IP is a research project initiated by ARM Research, not by a product division

This is not a committed product

We built it on top of the Cortex-R5 product


What is TCLS trying to address?

Radiation hardening by process: state-of-the-art technique used in the community to guard against
• Total Ionising Dose (TID) up to 300 krad(Si) [1]
• Single Event Latch-up (SEL) up to 110 MeV-cm2/mg [1]
• Upsets & Transients (SEU, SET) up to 120 MeV [2]

SEUs are non-destructive, but they can still cause
• Mission-critical failures
• Non-deterministic behavior
• Silent data corruption

TCLS solution makes CPUs fully resilient to SEUs / soft errors

[1] http://www.voragotech.com/products/VA10820
[2] https://www.aerospaceonline.com/doc/how-rad-hard-do-you-need-the-changing-approac-0001


Resiliency to rescue

Make it ultra reliable: rad hard by process/technology

Choose parts with rad-hard-by-architecture/design features

Make the design more resilient, thus improving reliability

Addresses reliability concerns in COTS parts used in CubeSats


Background: Introduction to Cortex-R CPUs


ARM Cortex® Processor Profiles in Automotive

Three architecture variants profiled for the different application sectors

Cortex-R processors
• Actuation, fast control
• Fast response / Real-time control
• Extended Functional Safety

Cortex-M processors
• MCUs, IoT, sensors, motors
• RTOS
• DSP
• Smallest footprint / lowest power

Cortex-A processors
• Computation, robotics, computer vision
• Linux®, QNX
• Higher performance

ARMv8-R


Cortex-R5 CPU

Designed for functional safety (e.g. ISO 26262) in automotive; shipping for over a decade

Fault-tolerant features

• Dual-core lockstep (DCLS) operation for ASIL-D
  ◦ Detects transient and hard faults in CPUs
  ◦ Fail-safe capable (e.g. reset & restart, checkpoint & roll-back)
• ECC in caches/TCMs and on AXI/AHB buses

[Figure: DCLS block diagram. CPU0 and CPU1 share the I$/TCM and D$/TCM, both with ECC; a checker on the CPU output ports flags an error.]
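The role of the checker in the DCLS diagram can be illustrated with a minimal Python sketch. It assumes the checker simply compares the two CPUs' output ports every cycle and raises the error flag on any mismatch; the function name `dcls_check` and the integer encoding of the output ports are hypothetical and not taken from the Cortex-R5 design.

```python
def dcls_check(cpu0_out: int, cpu1_out: int) -> bool:
    """Return True (flag error) when the two lockstep outputs differ.

    Hypothetical model: each argument is the value seen on one CPU's
    output ports in a given cycle, encoded as an integer bit vector.
    """
    return cpu0_out != cpu1_out


# Two cycles of example output values; a bit flip hits CPU1 in the second one.
trace = [(0x1234, 0x1234), (0x00FF, 0x00FF ^ 0x0008)]
flags = [dcls_check(a, b) for a, b in trace]
print(flags)
```

With only two copies the checker can detect a divergence but cannot tell which CPU is wrong, which is why the recovery options listed for DCLS are fail-safe ones such as reset and restart.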


TCLS Overview


TCLS ARM for Space

Introduction
• Collaborative project funded by the European Commission H2020 Space program
• Public website: http://www.tcls-arm-for-space.eu/

Objectives
• Understand fail-functional design requirements and principles under heavy SEU scenarios
• Assess the fail-functional design using radiation-tolerant STM 65nm technology
• Trade-offs of TCLS vs. DCLS solutions vs. single-core rad-hard solution

[Figure: partners, with the project leader marked]


Project outcomes

1) TCLS System Arch Specification (v7R & v8R)

2) Implemented in 28nm tech; PPA comparison

3) Soft error failure rate analysis

Soft Error Failure Rate: 5.4%

Source: X. Iturbe, B. Venu, and E. Ozer: "Soft error vulnerability assessment of the real-time safety-related ARM Cortex-R5 CPU", International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT’16)


TCLS CPU Subsystem Architecture

• 3 CPUs in lockstep operation
• Shared memory protected by ECC
• No modifications to the CPU RTL
• TCLS Assist Unit: system-level fault handling
• Fail-functional operation

Source: X. Iturbe, B. Venu, E. Ozer and S. Das: "A Triple Core Lock-Step (TCLS) ARM Cortex-R5 Processor for Safety-Critical and Ultra-Reliable Applications", International Conference on Dependable Systems and Networks (DSN'16), June/July 2016.
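Why three lockstep CPUs allow fail-functional operation can be sketched in a few lines of Python. The sketch assumes, as is usual for triple modular redundancy, that the three CPU outputs are combined by bitwise 2-of-3 majority voting; the slides do not spell out the voter logic, and the names `majority_vote` and `diverged` are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three output words."""
    return (a & b) | (a & c) | (b & c)


def diverged(a: int, b: int, c: int) -> bool:
    """True when the three lockstep outputs are not all identical."""
    return not (a == b == c)


good = 0b1011_0010
faulty = good ^ 0b0000_0100      # single bit flip in one CPU's output
voted = majority_vote(good, faulty, good)

print(voted == good)             # the voted output masks the single fault
print(diverged(good, faulty, good))
```

Under this assumption a single upset in one CPU never reaches the shared memory, and the divergence signal is what would trigger resynchronization of the three CPUs.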

TCLS CPU Subsystem

Recovery time is fast: 2500 CPU cycles, or 5.5 µs @ 450 MHz

Correctable Error Sequence

1) Detect divergence
2) Go to SAVE state
   ◦ Interrupt (FIQ) all 3 CPUs
   ◦ Push out arch state to TCM
3) Go to RESTORE state
   ◦ Reset all 3 CPUs
   ◦ Pop in arch state from TCM
   ◦ Restart all 3 CPUs
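The correctable error sequence can be modelled as a small state machine. The Python sketch below is only an interpretation of the flattened diagram: the ordering of the steps (interrupt, save, reset, restore, restart) is an assumption, and the `Step` enum and `recover` function are hypothetical names, not part of the TCLS Assist Unit.

```python
from enum import Enum, auto


class Step(Enum):
    RUN = auto()
    INTERRUPT = auto()   # FIQ to all 3 CPUs
    SAVE = auto()        # push architectural state out to TCM
    RESET = auto()       # reset all 3 CPUs
    RESTORE = auto()     # pop architectural state back in from TCM
    RESTART = auto()     # restart all 3 CPUs


# Assumed order of the correctable-error sequence.
SEQUENCE = [Step.INTERRUPT, Step.SAVE, Step.RESET, Step.RESTORE, Step.RESTART]


def recover(outputs):
    """Return the steps taken for one cycle's three CPU outputs."""
    a, b, c = outputs
    if a == b == c:
        return [Step.RUN]            # no divergence, keep running
    return SEQUENCE + [Step.RUN]     # resynchronise, then resume


print([s.name for s in recover((7, 7, 7))])
print([s.name for s in recover((7, 5, 7))])
```

In this model all three CPUs go through the sequence together, so they leave it with identical architectural state and can resume in lockstep.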


System Level Challenges

[Figure: SoC block diagram with a CPU cluster (CPU0), ACE, interconnect, DRAM, SRAM, accelerator, DMA and UART / GPIO / peripherals]


System Level Challenges

TCLS solves SEU vulnerability from the CPU point of view

System IPs still need to be protected

TCLS enables the concept of “Reliable Root Node”

Build a fault-tolerant system leveraging the “Reliable Root Node”

Popular techniques:
• Scrubbing
• BIST (increases fault coverage)
• Implement system IPs using a rad-hard process (VORAGO HARDSIL technology)

[Figure: SoC block diagram with a TCLS cluster (three CPU0 instances plus TCLS assist), ACE, interconnect, DRAM, SRAM, accelerator, DMA and UART / GPIO / peripherals]


Soft error resiliency (TCLS vs. single core)

IP name                    Flip-flop count   Num of vulnerable flops
R5 single core             18,500            1K (5.4%)

IP name                    Flip-flop count   Num of vulnerable flops
TCLS config (3 R5 CPUs)    3 × 18,500        0
TCLS assist unit           7,414             81 (1.1%); easily reduced to 0% after local TMR of flops

CPU fully resilient against SEUs / soft errors using TCLS technology
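The percentages in the resiliency table follow directly from the flop counts. The short Python check below reproduces them, assuming "1K" means 1,000 vulnerable flops; `vulnerable_pct` is a hypothetical helper name.

```python
def vulnerable_pct(vulnerable: int, total: int) -> float:
    """Share of flip-flops that are vulnerable, in percent."""
    return 100.0 * vulnerable / total


single_core = vulnerable_pct(1_000, 18_500)   # "1K" taken as 1,000 (assumption)
assist_unit = vulnerable_pct(81, 7_414)

print(f"R5 single core:   {single_core:.1f}%")
print(f"TCLS assist unit: {assist_unit:.1f}%")
```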


TCLS status & plan


Status of the IP

Porting it to the V2M-MPS3 platform

Implement POC TCLS system on Xilinx Kintex-7 FPGA

Write bare-metal code to demonstrate TCLS functionality

Invite universities & industry to develop around TCLS

Target further CubeSat events

[Figure: TCLS IP on Xilinx FPGA; V2M-MPS3 platform]


The plan

HW deliverables
• TCLS RTL / FPGA bitstream

SW deliverables
• SW driver
• Application example
• Startup guide

• White paper creation
• Promotion at events
• Official channel of contact
• Co-ordinate the ARM-ESA CubeSat Challenge with ESA

[Figure labels: Silicon Partners, Space Agencies, Space & Avionics industry, Universities]


TCLS Executive Summary

System architecture solution with the following benefits

• CPU fully resilient against SEUs / soft errors
  ◦ Fault injection experiments estimate a failure rate of 5.4% for Cortex-R5; TCLS makes it 0%
• Complementary technology to existing state-of-the-art rad-hard process
• Resynchronization time upon a soft error is 2500 clock cycles (v7R arch)
  ◦ 5.5 µs @ 450 MHz (state of the art: 1 ms [3])
• Scrub feature further reduces non-deterministic behavior (i.e., real time)
• PPA comparable against single-core rad-hard technology, with the above advantages

[3] http://www.ddc-web.com/Products/Microelectronics/images/documents/SCS750_rev8_r6.pdf
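The resynchronization figures in the summary can be checked with simple arithmetic. The Python sketch below uses only the numbers quoted on the slide (2500 cycles, 450 MHz, and the 1 ms figure attributed to [3]); the variable names are hypothetical.

```python
RESYNC_CYCLES = 2500        # from the slide (v7R architecture)
CLOCK_HZ = 450e6            # 450 MHz, from the slide
STATE_OF_ART_S = 1e-3       # 1 ms, the figure the slide attributes to [3]

resync_s = RESYNC_CYCLES / CLOCK_HZ
ratio = STATE_OF_ART_S / resync_s

print(f"resync time: {resync_s * 1e6:.2f} us")
print(f"1 ms is about {ratio:.0f}x longer")
```

The result is roughly 5.56 µs, which the slide states as 5.5 µs, and it is about 180 times shorter than the 1 ms figure.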


References

Publications

Xabier Iturbe, Balaji Venu, Emre Ozer: "Triple Core Lock Step (TCLS) ARM FOR SPACE"
https://indico.esa.int/indico/event/148/session/10/contribution/71/material/slides/0.pdf

Xabier Iturbe, Balaji Venu, Emre Ozer, Shidhartha Das: "A Triple Core Lock-Step (TCLS) ARM® Cortex®-R5 Processor for Safety-Critical and Ultra-Reliable Applications"
http://ieeexplore.ieee.org/abstract/document/7575387/

Xabier Iturbe, Balaji Venu, Emre Ozer: "Soft error vulnerability assessment of the real-time safety-related ARM Cortex-R5 CPU"
http://ieeexplore.ieee.org/abstract/document/7684076/

Balaji Venu, Emre Ozer, Xabier Iturbe, Alex Robinson: "A Fail-Functional Automotive CPU Subsystem Architecture for Mitigating Single Point of Failures"
https://www.researchgate.net/publication/310480363_A_Fail-Functional_Automotive_CPU_Subsystem_Architecture_for_Mitigating_Single_Point_of_Failures


Thank you


Backup


Reliability estimation of R5 soft error failure rate

error propagation analysis (RAS talk earlier this year)


TCLS Fault Injection work

Component          # Seq. Elements   % of CPU
PFU                1,029             6%
MPU                1,065             7%
LSU                1,545             10%
CACHES             3,841             25%
  CACHE-LOGIC      189               1%
  DCACHE_16kB      604               4%
  ICACHE_16kB      494               4%
  CACHE_STB        681               4%
  CACHE-AXIM       1,873             12%
DPU                8,398             52%
  DPU_BR           220               1%
  DPU_CPSR         442               3%
  DPU_CTL          1,218             8%
  DPU_DE           597               4%
  DPU_LDST         205               1%
  DPU_REGBANK      974               6%
  DPU_FPU          1,663             10%
  DPU_FREGBANK     1,130             7%
  DPU_CP           777               5%
  DPU_DP           1,172             7%
TOTAL              15,878            100%
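The table is hierarchical: the CACHES and DPU rows are sums of the sub-blocks listed under them, and the five top-level rows add up to the total. A short Python check of the element counts, with the grouping into parent and child rows taken as an assumption from the component names:

```python
# Sequential-element counts copied from the table.
caches = {"CACHE-LOGIC": 189, "DCACHE_16kB": 604, "ICACHE_16kB": 494,
          "CACHE_STB": 681, "CACHE-AXIM": 1873}
dpu = {"DPU_BR": 220, "DPU_CPSR": 442, "DPU_CTL": 1218, "DPU_DE": 597,
       "DPU_LDST": 205, "DPU_REGBANK": 974, "DPU_FPU": 1663,
       "DPU_FREGBANK": 1130, "DPU_CP": 777, "DPU_DP": 1172}
top_level = {"PFU": 1029, "MPU": 1065, "LSU": 1545,
             "CACHES": sum(caches.values()), "DPU": sum(dpu.values())}

total = sum(top_level.values())
print(top_level["CACHES"], top_level["DPU"], total)
```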


Fault Injection (FI) Methodology

Fault Error Failure:

• FI in all sequential elements (FFs & memory cells)
• 10 iterations of 7 benchmarks of EEMBC AutoBench
• Benchmark execution time divided into 64 equally sized intervals
• FI instant randomly chosen within each interval
• Failure rate = (# of exp which resulted in error) / (total # of exp)
• Total # of exp = 16K * 64 * 7 ~= 8 Million

[Figure: benchmark execution time (10 iterations) drawn as initialization routines followed by Interval 1, Interval 2, Interval 3, Interval 4, ..., Interval 63, Interval 64]
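The sampling scheme described on this slide can be sketched in Python: the execution time is split into 64 equal intervals, one injection instant is drawn at random inside each, and the failure rate is the fraction of experiments that ended in an error. This is a minimal sketch under stated assumptions, not the authors' tooling; `injection_instants`, `failure_rate` and the cycle count used in the example are hypothetical.

```python
import random
from typing import List, Optional


def injection_instants(exec_cycles: int, n_intervals: int = 64,
                       rng: Optional[random.Random] = None) -> List[int]:
    """Pick one random injection cycle inside each equally sized interval."""
    rng = rng or random.Random()
    width = exec_cycles // n_intervals      # assumes exec_cycles >= n_intervals
    return [rng.randrange(i * width, (i + 1) * width)
            for i in range(n_intervals)]


def failure_rate(n_errors: int, n_experiments: int) -> float:
    """Failure rate = experiments that resulted in an error / all experiments."""
    return n_errors / n_experiments


# Hypothetical benchmark length of 640,000 cycles, seeded for repeatability.
instants = injection_instants(exec_cycles=640_000, rng=random.Random(0))
print(len(instants), instants[:3])
```

One experiment corresponds to one (sequential element, interval, benchmark) combination, which is where the product 16K × 64 × 7 on the slide comes from; taking 16K as 16,000, that product is about 7.2 million.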


Soft-Error Rate (SER)

Failure Rate: 5.4%

[Figure: chart with a 0% to 100% axis across the benchmarks ttsprk01, aifftr01, matrix01, tblook01, canrdr01, rspeed01, a2time01, puwmod01. Legend: OTHERS, DPU-CP, PFU, LSU, CACHE-AXIM, DPU-CTL, CACHE-STB, DPU-DE, DPU-FREGBANK, DPU-REGBANK]