the evolution of dependable computing at the university of ...the evolution of dependable computing...

22
The Evolution of Dependable Computing at the University of Illinois Dedicated to Professor Algirdas Avizienis R. Iyer, W. Sanders, J. Patel, Z. Kalbarczyk Center for Reliable and High-Performance Computing Department of Electrical and Computer Engineering and Coordinated Science Laboratory University of Illinois at Urbana-Champaign

Upload: others

Post on 26-Jan-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

  • The Evolution of Dependable Computing at the University of Illinois

    Dedicated toProfessor Algirdas Avizienis

    R. Iyer, W. Sanders, J. Patel, Z. KalbarczykCenter for Reliable and High-Performance Computing

    Department of Electrical and Computer Engineering andCoordinated Science Laboratory

    University of Illinois at Urbana-Champaign

  • Long Association with FTCS (now DSN) Al Aviziennis

    Ph.D. Illinois, Founder FTCS and IFIP WG 10.4

    Program Chair or General ChairGary Metze, faculty, FTCS-2, FTCS-4John Hayes, Ph.D. Illinois, FTCS-7Jacob Abraham, faculty, FTCS-11, FTCS-19, IEEE TC on Fault-Tolerant Computing chair

    Ravi Iyer, faculty, FTCS-19, FTCS-25, IEEE TC on Fault-Tolerant Computing chairKent Fuchs, faculty, FTCS-27IEEE TC on Fault-Tolerant Computing chairRam Chillarege, Ph.D. Illinois, FTCS-28Bill Sanders, faculty, FTCS-29 IEEE TC on Fault-Tolerant Computing chairZbigniew Kalbarczyk, Ph.D., DSN-02Tim Tsai, Ph.D. Illinois, DSN-04

  • Early Developments –Illiac-I and Illiac-II

    During 1950’s and 1960’s Illiac-I and Illiac-II

    Frequent failures of vacuum tubes jump started the early work in fault diagnosis

    A subtle design bug in the arithmetic unit, (-2) * (-2) giving (-4), escaped the tests

    caught after about nine months by a numerical double-check built into a user’s program

    While no attempt was made to model faults systematically, the handshake mechanism used in the ALU control exhibited the basic idea of what is now called self-checking operation.

    Avizienis conducts early research on parallel computer arithmetic

  • Pioneering Work of Seshu and Metzein Test and Diagnosis

    Sequential Analyzer (Seshu), which included a set of programs for

    automatic generation of fault simulation data (stuck-line faults) for a given logic circuit and test sequence

    automatic generation of test sequences for combinational and sequential circuits.

    studying computer self-diagnosis

    Sequential analyzer was applied in at Bell Telephone Laboratories to improve diagnosis procedures for ESS-1

    Chang, Manning, and Metze produced, as a tribute to Seshu; probably the first book devoted entirely to digital fault diagnosis.

    PMC (Preparata, Metze, Chien) model for computer self-diagnosis

  • Automatic Test Pattern Generation (ATPG) – from Research to Product

    First commercial ATPG in 70’s (Marlett) and then several updated versions through 80’s and 90’s

    Founding ATPG company Sunrise Test Systems, now a part of Synopsis (Niermann and Patel)

    Commercialization of PROOFS, the fastest, most memory efficient fault simulation algorithm, in 1990

    PROOFS now used in all commercial ATPG tools

  • Testing of multi-Million gates –Scan Design

    Early Scan Design

    NEC, IBM and William-Angel of Stanford, all claim to be the first to propose Scan

    Others claim that the first proposal of the scan idea was in the book by Chang, Manning and Metze of Illinois!

    Today large circuits with close to a million flip-flops in scan chain requires enormous test time and data

    Illinois Scan Architecture (1999-2002)

    Reduces test time and test data by factors of 100 or more with no logic overhead!

    Already in use at IBM, Intel, Syntest and others

  • Illinois Scan Architecture

    internal scan chains

    externalscan-in

    pins

    outputcompactor

  • Functional-Level Test GenerationMemory testing using functional-level fault model (i.e., without the availability of information about their internal structure)

    the initial fault model for memories included stuck bits in the memory and coupling between cells in a memory.

    Test generation procedures for microprocessors based on an extrapolation of the approach to testing memories

    A general graph-theoretic model at the register-transfer level to model microprocessors using only information about its instruction set and the functions performed. A fault simulation study on a real microprocessor showed extremely good fault coverage for tests developed using these procedures.

  • Detection & Recovery

  • Self-Checking and Time Redundancy

    Pioneering work of Carter led Anderson and Metze (at Illinois) to formulate Totally Self Checking (TSC) circuits; subsequent work led to introduction of Strongly Fault Secureproperty

    Triggered widespread research in academia and industry in the 70’s and 80’s

    Metze proposed time redundancy techniques for checking errors based on Alternating Logic; subsequently enhanced as RESO (Recomputing with Shifted Operands)

  • Algorithm-based Fault Tolerance –ABFT and Checkpointing

    ABFTMatrix encoding schemes for detecting and correcting errors when matrix operations are performed by processor arrays

    Generalized to linear arrays, Laplace equation solvers, and FFT networks

    Ideal for low-cost fault tolerance for special-purpose computations, including signal-processing applications

    Explored by a large number of researchers,

    more recently as part of the REE program at JPL

    CheckpointingAsynchronous checkpointing for distributed systems optimized for space overhead and performance

    Compiler-assisted multiple instructions rollback scheme to aid in speculative execution repair in microprocessors

    Low-overhead coordinated checkpointing for long-running parallel and high-availability applications

    RENEW toolset for rapid development and testing of checkpoint protocols

  • Current Directions

  • Hierarchical Application Aware Error Detection and Recovery

    Parity, ECC at processor level Control logic protection

    Reliable ultra-fast interconnect servicesApplication-aware programmable

    checks

    Application Support

    Compiler Support

    Operating System Support

    Hardware Support

    Robust (self-checking) middlewareAPI for High-Dependability support

    Control flow checking and data audit

    Application and/or OS instrumentation

    to customize and invoke FT mechanisms

    Kernel Health Monitor,Application transparent checkpointing

    Security services, e.g., runtime memory randomization

    Compiler generated assertions

    HW broadcast

    Randomizing memory layout

    Instructions to invoke hardware-

    levelerror checks

  • Middleware and Hardware Frameworks for Fault Tolerance and Security

    Hardware Platform

    ARMORs

    Application

    Operating System

    ExecutionARMOR

    ARMORInterface

    ApplicationHeartbeatARMOR

    DaemonARMOR

    Node 1

    Hardware Platform

    ARMORs

    Application

    Operating System

    ExecutionARMOR

    ARMORInterface

    ApplicationHeartbeatARMOR

    DaemonARMOR

    Node 1

    ARMOR are multithreaded processes composed of replaceable building blocks – elements, which implement error detection, recovery policies, security services, and management of runtime environment.

    ARMOR – High Availability and Security Middleware

    RSE – Reliability and Security EngineInstruction-

    Cache Instruction-FetchBranch

    Predictor

    Instruction-AddressTranslation Buffer

    4-entryFetch Buffer

    Dispatcher

    InstructionDecoder

    RegisterFile

    Load / StoreUnit

    IntegerUnit

    Multiply / Divide Unit

    Branch Resolve Unit

    ReorderBuffer

    Write-Buffer

    Data-Cache

    Commit-UnitData-AddressTranslation Buffer Arbiter

    Instruction-Cache Instruction-Fetch

    BranchPredictor

    Instruction-AddressTranslation Buffer

    4-entryFetch Buffer

    Dispatcher

    InstructionDecoder

    Dispatcher

    InstructionDecoder

    RegisterFile

    Load / StoreUnit

    IntegerUnit

    Multiply / Divide Unit

    Branch Resolve Unit

    ReorderBuffer

    Write-Buffer

    Data-Cache

    Commit-UnitData-AddressTranslation Buffer Arbiter

    ROBAllocPtr1 ( i )ROBAllocPtr2 ( i )

    InstrReg1InstrReg2

    MUX3

    MDUDataOutALUDataOutLSUEffAddr

    LoadFromALU ( i )LoadFromMDU ( i )LoadFromLSU ( i )

    LSUDataOut

    LoadFromLSU ( i )

    MUX5

    LSUSrc2ALUSrc2MDUSrc2

    IssueLSU ( i )IssueALU ( i )IssueMDU ( i )

    MUX2

    InstructionCheckerModule

    MemoryLayout

    Randomization

    DataDependency

    Tracker

    MemoryAccessUnit

    InstructionOutputQueue

    Memory Access Request Mem_Rdy

    Memory Access Request

    MemoryData

    ModuleOutputs

    RSE Framework

    CUCommitInstr1

    InstrToCommit ( i )

    MUX4CUCommitInstr2

    Mem

    AdaptiveHeartbeatMonitor

    RegFile_DataEntry i

    Execute_OutEntry i

    Fetch_Out Entry i

    Memory_OutEntry i

    Commit_OutEntry i

    ModuleEnable/Disable

    InstrReg3InstrReg4

    ROBAllocPtr3 ( i )ROBAllocPtr4 ( i )

    MUX1 Hardware Modules

    Input interface

    Fetch / Dispatch Width 4 instructionsIssue width 4 instructionsRUU / LSQ size 16/8 entriesInstruction L1 cache Size: 8 KB, 1-way associativeData L1 cache Size: 8 KB, 1-way associativeInstruction L2 cache Size: 64 KB, 2-way associative

    Data L2 Cache Size: 128 KB, 2-way associative

    checkcheckValid

    Bus-InterfaceUnit

    ExternalBus

    Framework to provide application-aware techniques for error-detection, masking of security vulnerabilities and recovery in a uniform, low overhead manner

    Algorithms and architectures for building dependable, object-oriented, distributed computer/communication systems

    AQuA – Adaptive Quality of Service for Availability

    ITUA - Intrusion Tolerance by Unpredictable Adaptation

    Algorithms and architectures for building intrusion-tolerant distributed systems

    GossipNameServer Gateway

    Object QuO

    ObjectFactory

    Gateway

    ProteusDependability

    Manager

    GatewayGateway

    Object QuO

    Underlying Group Communication System

    GossipNameServer Gateway

    Object QuO

    Gateway

    Object QuO

    ObjectFactory

    Gateway

    ObjectFactory

    Gateway

    ProteusDependability

    Manager

    GatewayGateway

    Object QuO

    Gateway

    Object QuO

    Underlying Group Communication System

  • Modeling and Simulation

    Componentlibraries

    Fault injector

    Simulationengines

    Faultdictionaries

    Other facilities

    - hardware componentsand their functional behavior

    - logical (architecture) components and their functionality

    - hardware componentsand their functional behavior

    - logical (architecture) components and their functionality

    - hardware componentsand their functional behavior

    - logical (architecture) components and their functionality

    - effect of faults on higher level in terms ofa low level fault model

    - effect of faults on higher level in terms ofa low level fault model

    - effect of faults on higher level in terms ofa low level fault model

    - random numbergenerators

    - recording, statisticsreporting

    - random numbergenerators

    - recording, statisticsreporting

    VHDLbehavioral

    model

    VHDLstructural

    model

    C++basedmodel

    Blockdiagram

    VHDLbehavioral

    model

    VHDLstructural

    model

    C++basedmodel

    Blockdiagram

    Task & Environment Manager

    DEPENDDEPENDMenupagesMenupagesTool-kit

    Graphical editorComp-lib builderVHDL translator

    A simulation framework to support the design of systems for fault tolerance and high availability. It takes as inputs both VHDL and C++ system description and produces as output dependability characteristics including fault coverage, availability, and performance. (License to several companies and employed to simulate a number of industrial Systems, e.g. Integrity S2 from Tandem)

    DEPEND

    MöbiusAn Extensible Modeling Tool for Quantifying Dependability, Security, and Performance Features:• Multiple model representation techniques facilitate modeling of hardware, software, protocols, and application in a unified manner• Unified representation of dependability, security and performance attributes• Integrated analytical/numerical and simulation-based solution(Licensed by over 190 institutions)

    Framework Component

    Atomic Model

    Composed Model

    Solvable Model

    Connected Model

    Study Specifier(generates multiplemodels)

  • Experimental System Evaluation and Benchmarking – Fault Injection

    CampaignScript

    CampaignScript

    LogLog

    ControlHost

    ControlHost

    ProcessManagerProcessManager

    InjectorProcessInjectorProcess

    ApplicationProcess

    ApplicationProcess

    ProcessManagerProcessManager

    InjectorProcessInjectorProcess

    ApplicationProcess

    ApplicationProcess

    LAN

    Error Injection Targets

    Control Host

    CampaignScript

    CampaignScript

    LogLog

    ControlHost

    ControlHost

    ProcessManagerProcessManager

    InjectorProcessInjectorProcess

    ApplicationProcess

    ApplicationProcess

    ProcessManagerProcessManager

    InjectorProcessInjectorProcess

    ApplicationProcess

    ApplicationProcess

    LAN

    Error Injection Targets

    Control HostNFTAPE A software framework for conducting automated

    fault/error injection-based dependability characterization;Evaluation/benchmarking of:- fault-tolerant systems, e.g., Tandem FT-platforms;- operating systems, e.g., Linux kernel on Pentium and PowerPC- applications, e.g., call processing, space-borne software- security violations due to errors, e.g., FTP, firewall facilities(Licensed to multiple institutions, including JPL-NASA)

    LokiA software fault injector for distributed systems in which the introduction of faults is triggered based on the global state of the system. Evaluation of large-scale distributed systems, e.g., a group membership protocol in Ensemble, and correlated network partitions in Coda distributed file system.

    . . .Loki RuntimeSystemUnderTest Loki RuntimeSystemUnderTest Loki RuntimeSystemUnderTest

    Loki Runtime

    SystemUnderTest

    Probe

    Inject Fault

    StateMachine

    Notifications

    Recorder

    StateMachineTransport

    FaultParser

    LAN1

    LAN2

    . . .Local Timelines

    Offline ClockSynchronization

    Uses timestamp datacollected before andafter the experiment

    Single Global Timeline

    To LAN2

    MeasureAnalysis

    Measures3711

    1

    N

    2

    N21

  • Operational System Data Analysis

    LAN of Windows NT Machines Internet Data type and number of machines

    System logs from 503 servers

    System logs from 68 mail servers

    Web site access success/failure logs from 97 most popular Web sites

    Period for data collecting 4 months 6 months 40 weeks Failure context

    Machine reboots logged in a server system log.

    Machine reboots logged in a server system log.

    Inability to contact a host and fetch an HTML file from it.

    Availability

    System perspective User perspective (user gets expected service)

    99% N/A

    99% 92%

    N/A 99%

    Downtime (median)

    N/A

    0.2h

    1h (45%) 4.5h (49%) 53h (6%)

    Comments

    Indication of error propagation across the network

    A few major network-related failures made nearly 70% of the hosts inaccessible

    Failure Data Analysis

    Impact of workload on hardware and software fault toleranceCharacterization of error latencyFailure characterization of UNIX and Windows NT servers Reliability of Internet hosts

    Security Vulnerability Analysis

    Size(PostD

    ata)bk=C

    B->fd=&addr_free-(offset of field bk)B->bk=Mcode♦-

    B->fd=&addr_free-(offset of field bk)B->bk=Mcode

    -♦ When buf is freed, execute B->fd->bk = B->bkB->fd and B->bk

    unchanged ♦-

    .GOT entry of function free points to MCode

    addr_free changed♦-

    addr_freeunchanged ♦- -♦ Execute addr_free when

    function free is called

    Mcode is executed

    Note: addr_free is the .GOT entry of function free

    X?

    Calloc is called

    -♦ Load addr_freeto the memory during program initializationManipulate the

    .GOT entry of function free(i.e., addr_free)

    pFSM1pFSM2

    pFSM3

    pFSM4

    Heap Layout

    Operation 1:

    Operation 2:

    Operation 3:

    Size(PostD

    ata)bk=C

    B->fd=&addr_free-(offset of field bk)B->

    Size(PostD

    ata)bk=C

    B->fd=&addr_free-(offset of field bk)B->bk=Mcode♦-

    B->fd=&addr_free-(offset of field bk)B->bk=Mcode

    -♦ When buf is freed, execute B->fd->bk = B->bkB->fd and B->bk

    unchanged ♦-

    .GOT entry of function free points to MCode

    addr_free changed♦-

    addr_freeunchanged ♦- -♦ Execute addr_free when

    function free is called

    Mcode is executed

    Note: addr_free is the .GOT entry of function free

    X?

    Calloc is called

    -♦ Load addr_freeto the memory during program initializationManipulate the

    .GOT entry of function free(i.e., addr_free)

    pFSM1pFSM2

    pFSM3

    pFSM4

    Heap Layout

    Operation 1:

    Operation 2:

    Operation 3:

    bk=Mcode♦-

    B->fd=&addr_free-(offset of field bk)B->bk=Mcode

    -♦ When buf is freed, execute B->fd->bk = B->bkB->fd and B->bk

    unchanged ♦-

    .GOT entry of function free points to MCode

    addr_free changed♦-

    addr_freeunchanged ♦- -♦ Execute addr_free when

    function free is called

    Mcode is executed

    Note: addr_free is the .GOT entry of function free

    X?

    Calloc is called

    -♦ Load addr_freeto the memory during program initializationManipulate the

    .GOT entry of function free(i.e., addr_free)

    pFSM1pFSM2

    pFSM3

    pFSM4

    Heap Layout

    Operation 1:

    Operation 2:

    Operation 3:

    1: PostData = calloc(contentLen +1024,sizeof(char));x=0; rc=0;

    2: pPostData= PostData;3: do {4: rc=recv(sock, pPostData,

    1024, 0);5: if (rc==-1) { 6: closeconnect(sid,1);7: return; 8: }9: pPostData+=rc;10: x+=rc;11: } while ((rc==1024) ||

    (x

  • Security Vulnerability Avoidance and Runtime Protection

    CERT advisories indicateCERT advisories indicate≥≥ 66% 66% vulnerabilitiesvulnerabilities due to pointer due to pointer taintednesstaintedness≥≥ 33% 33% due to errors in library functions due to errors in library functions or incorrect invocations of library or incorrect invocations of library functionsfunctions

    Pointer Pointer TaintednessTaintedness –– a unified basis a unified basis for reasoning about security for reasoning about security vulnerabilitiesvulnerabilities

    A pointer is tainted if a user value, A pointer is tainted if a user value, including a return address, is derived including a return address, is derived directly or indirectly from the user input. directly or indirectly from the user input.

    Pointer Pointer taintednesstaintedness semanticssemanticsapplied to formally reason and extract applied to formally reason and extract security specifications of library security specifications of library functions functions

    Exposes (and removes) security Exposes (and removes) security vulnerabilities vulnerabilities

    Vulnerability Avoidance Transparent Runtime Randomization

    Protects against about 60% of security attack reported by CERTBreaks the attacker’s assumptions on the fixed memory layout of the target systemMakes it impossible to correctly determine location of critical application associated memory regions essential in designing and launching a successful attackConverts an attack into application crashProgram initialization overhead – 1% to 6%NO runtime overhead Memory cost –extra copy of GOT(global offset table) 200 Byte to 3.5 KB

    use code

    Kernel space0xC0000000

    0xFFFFFFFF

    0x08048000static data

    bss

    shared libraries0x40000000 ± Rand

    0xBFFFFFFC - Randuser stack

    user heap End_of_bss + Rand

  • DPASA – Designing Protection & Adaptation Into a Survivability Architecture

    Client Zone

    JBI Management Staff

    ExecutiveZone

    CrumpleZone

    OperationsZone

    JBI Core

    Quad 1 Quad 2 Quad 3 Quad 4

    Network

    Access Proxy (Isolated Process Domains in SE-Linux)Domain6

    First Restart Domains Eventually Restart HostLocal Controller

    RMI

    STCPTCP

    PS Sensor Rpts

    TCP UDP

    IIOP

    PSQImplPSQImpl

    IIOP

    TCP

    DC

    Eascii

    Domain1 Domain2 Domain3 Domain4 Domain5Forward/Ratelimit

    Proxy LogicInspect / Forward / Rate Limit

    Client Zone

    JBI Management Staff

    ExecutiveZone

    CrumpleZone

    OperationsZone

    JBI Core

    Quad 1 Quad 2 Quad 3 Quad 4

    Network

    Access Proxy (Isolated Process Domains in SE-Linux)Domain6

    First Restart Domains Eventually Restart HostLocal Controller

    RMI

    STCPTCP

    PS Sensor Rpts

    TCP UDP

    IIOP

    PSQImplPSQImpl

    IIOP

    TCP

    DC

    Eascii

    Domain1 Domain2 Domain3 Domain4 Domain5Forward/Ratelimit

    Proxy LogicInspect / Forward / Rate Limit

    Access Proxy (Isolated Process Domains in SE-Linux)Domain6

    First Restart Domains Eventually Restart HostLocal Controller

    RMI

    STCPTCP

    PS Sensor Rpts

    TCP UDP

    IIOP

    PSQImplPSQImpl

    IIOP

    TCP

    DC

    Eascii

    Domain1 Domain2 Domain3 Domain4 Domain5Forward/Ratelimit

    Proxy LogicInspect / Forward / Rate Limit

    JBI critical mission objectives

    JBI critical functionality

    Initialized JBI provides essential services

    Authorized publish processed successfully

    ConfidentialityDataflow Timeliness Integrity

    (from functional model execution)

    Functional model assumptions hold

    JBI mission awareness

    CA1: Origin of Attacks on Clients

    CA2: Attack Propagation from Clients

    CA3: Client Process Corruption

    PA1: Client-Core

    Communication I & C

    PA2: Alternate Path

    Availability

    QA1: QIS Incorruptibility

    QA2: QIS Communication

    Cutoff

    QA3: QIS Input

    Integrity

    QA4: QIS Function

    Correctness

    AA1: AP Function

    Correctness

    AA2: AP Application-layer Integrity

    AA3: AP Application-layer Confidentiality

    AA4: Origin of Attacks on

    Access Proxy

    AA5: Attacks from AP

    AA6: DoSfrom Compromised

    Core

    AA7: AP Process Corruption

    AA8: DoSPrevention by Access Proxy

    DA1: DC Communications

    GA1: Process Corruption on Guardian

    DA3: Process

    Corruption on DC

    GA2: Attacks from Guardian

    DA2: Origin of Attacks on DC

    SA1: Origin of Attacks on PSQ Server

    SA2: Attacks from PSQ Server

    SA3: IO Integrity in PSQ Server

    SA4: Client Confidentiality in PSQ Server

    SA5: IO Authenticity

    SA6: Network-layer I & C

    SA7: Process Corruption in PSQ Server

    SeA1: Attacks from IDS Sensor

    SeA2: Sensor False Alarm

    Rate

    SeA3: Sensor Detection Delay

    SeA4: Sensor Detection Probability

    SeA5: Process

    Corruption in Sensor

    AcA1: Process Corruption in Actuator

    AcA2: Attacks from Actuator

    LA1: Process Corruption in

    Local Controller

    LA2: Attacks from Local Controller

    CoA1: CorrleatorFalse Alarm

    Rate

    CoA2: Origin of Attacks on Correlator

    CoA3: Attacks from Correlator

    CoA4: Alert IntegrityMA1: SM Byzantine Agreement

    MA2: Origin of Attacks on

    SM

    MA3: Attacks from SM

    PsA1: ADF Policy Server

    Input Correctness

    PsA1: ADF Policy Server

    Synchronization

    ScA1: Process Corruption in Subscribed

    Client

    System Connectivity

    Physical Topology

    Network Topology Restricted Routing No Tunneling Attacks

    Process Isolation

    SELinux Trusted Solaris Windows 2000

    Type Enforcement Hardened Kernel Hardened Kernel Kernel Loadable Wrappers

    VMWareover SELinux

    Platform Mechanisms Component-specific policy

    Private Key Confidentiality

    No Unauthorized Direct Access

    Keys Protected from Theft

    DoDCommon Access Card (CAC)

    PKCS #11 Tamperproof

    Keys Not Guessable

    Algorithmic Framework

    Key Length Key Lifetime

    No Unauthorized Indirect Access

    Physical Protection of CAC device

    Protection of CAC Authentication Data

    No Compromise of Authorized Process Accessing CAC

    No Cryptography in Access Proxy

    Not Preconfigured

    Not Reconfigurable

    ADF NIC services protected

    ADF Correctness

    ADF NIC Physical Security

    ADF NIC Firmware Initialization

    ADF Key Initialization

    ADF Agent Initialization

    ADF Protocol Correctness

    ADF Host Independence

    ADF Agent Correctness

    VPG Integrity VPG Confidentiality

    Policy Server Integrity

    ADF Policy Correctness

    Correctness of Registration Protocol

    Correctness of Reattachment

    Protocol

    Hard-wired Configuration

    Electrically Isolated

    Physically Protected

    Connectivity Physical Integrity

    Electrical Integrity

    Gate Configuration and

    Truth Table

    Proxy Protocol Configuration

    Can Identify Malformed Traffic

    Correctness of Flow Control Mechanisms

    Bidirectional Flow Control

    Correctness of Certificate Exchange

    IDS Experimental Evaluation

    Correctness of Modified ITUA Protocols

    Functional model faithful to design

    IDS objectives

    Notification Confidentiality

    IO Confidentiality (end-to-end)

    Confidential info not exposed

    Unauthorized activity properly rejected

    Authorized join/leave processed successfully

    Authorized query processed successfully

    Authorized subscribe processed successfully

    JBI properly initialized

    Design Team Review

    JBI critical mission objectives

    JBI critical functionality

    Initialized JBI provides essential services

    Authorized publish processed successfully

    ConfidentialityDataflow Timeliness Integrity

    (from functional model execution)

    Functional model assumptions hold

    JBI mission awareness

    CA1: Origin of Attacks on Clients

    CA2: Attack Propagation from Clients

    CA3: Client Process Corruption

    PA1: Client-Core

    Communication I & C

    PA2: Alternate Path

    Availability

    QA1: QIS Incorruptibility

    QA2: QIS Communication

    Cutoff

    QA3: QIS Input

    Integrity

    QA4: QIS Function

    Correctness

    AA1: AP Function

    Correctness

    AA2: AP Application-layer Integrity

    AA3: AP Application-layer Confidentiality

    AA4: Origin of Attacks on

    Access Proxy

    AA5: Attacks from AP

    AA6: DoSfrom Compromised

    Core

    AA7: AP Process Corruption

    AA8: DoSPrevention by Access Proxy

    DA1: DC Communications

    GA1: Process Corruption on Guardian

    DA3: Process

    Corruption on DC

    GA2: Attacks from Guardian

    DA2: Origin of Attacks on DC

    SA1: Origin of Attacks on PSQ Server

    SA2: Attacks from PSQ Server

    SA3: IO Integrity in PSQ Server

    SA4: Client Confidentiality in PSQ Server

    SA5: IO Authenticity

    SA6: Network-layer I & C

    SA7: Process Corruption in PSQ Server

    SeA1: Attacks from IDS Sensor

    SeA2: Sensor False Alarm

    Rate

    SeA3: Sensor Detection Delay

    SeA4: Sensor Detection Probability

    SeA5: Process

    Corruption in Sensor

    AcA1: Process Corruption in Actuator

    AcA2: Attacks from Actuator

    LA1: Process Corruption in

    Local Controller

    LA2: Attacks from Local Controller

    CoA1: CorrleatorFalse Alarm

    Rate

    CoA2: Origin of Attacks on Correlator

    CoA3: Attacks from Correlator

    CoA4: Alert IntegrityMA1: SM Byzantine Agreement

    MA2: Origin of Attacks on

    SM

    MA3: Attacks from SM

    PsA1: ADF Policy Server

    Input Correctness

    PsA1: ADF Policy Server

    Synchronization

    ScA1: Process Corruption in Subscribed

    Client

    System Connectivity

    Physical Topology

    Network Topology Restricted Routing No Tunneling Attacks

    Process Isolation

    SELinux Trusted Solaris Windows 2000

    Type Enforcement Hardened Kernel Hardened Kernel Kernel Loadable Wrappers

    VMWareover SELinux

    Platform Mechanisms Component-specific policy

    Private Key Confidentiality

    No Unauthorized Direct Access

    Keys Protected from Theft

    DoDCommon Access Card (CAC)

    PKCS #11 Tamperproof

    Keys Not Guessable

    Algorithmic Framework

    Key Length Key Lifetime

    No Unauthorized Indirect Access

    Physical Protection of CAC device

    Protection of CAC Authentication Data

    No Compromise of Authorized Process Accessing CAC

    No Cryptography in Access Proxy

    Not Preconfigured

    Not Reconfigurable

    ADF NIC services protected

    ADF Correctness

    ADF NIC Physical Security

    ADF NIC Firmware Initialization

    ADF Key Initialization

    ADF Agent Initialization

    ADF Protocol Correctness

    ADF Host Independence

    ADF Agent Correctness

    VPG Integrity VPG Confidentiality

    Policy Server Integrity

    ADF Policy Correctness

    Correctness of Registration Protocol

    Correctness of Reattachment

    Protocol

    Hard-wired Configuration

    Electrically Isolated

    Physically Protected

    Connectivity Physical Integrity

    Electrical Integrity

    Gate Configuration and

    Truth Table

    Proxy Protocol Configuration

    Can Identify Malformed Traffic

    Correctness of Flow Control Mechanisms

    Bidirectional Flow Control

    Correctness of Certificate Exchange

    IDS Experimental Evaluation

    Correctness of Modified ITUA Protocols

    Functional model faithful to design

    IDS objectivesIDS objectives

    Notification ConfidentialityNotification

    ConfidentialityIO Confidentiality

    (end-to-end)IO Confidentiality

    (end-to-end)

    Confidential info not exposed

    Confidential info not exposed

    Unauthorized activity properly rejected

    Unauthorized activity properly rejected

    Authorized join/leave processed successfully

    Authorized join/leave processed successfully

    Authorized query processed successfully

    Authorized query processed successfully

    Authorized subscribe processed successfullyAuthorized subscribe processed successfully

    JBI properly initializedJBI properly initialized

    Design Team Review

    JBI critical mission objectives

    JBI critical functionality

    Initialized JBI provides essential services

    Authorized publish processed successfully

    ConfidentialityDataflow Timeliness Integrity

    (from functional model execution)

    Functional model assumptions hold

    JBI mission awareness

    CA1: Origin of Attacks on Clients

    CA2: Attack Propagation from Clients

    CA3: Client Process Corruption

    PA1: Client-Core

    Communication I & C

    PA2: Alternate Path

    Availability

    QA1: QIS Incorruptibility

    QA2: QIS Communication

    Cutoff

    QA3: QIS Input

    Integrity

    QA4: QIS Function

    Correctness

    AA1: AP Function

    Correctness

    AA2: AP Application-layer Integrity

    AA3: AP Application-layer Confidentiality

    AA4: Origin of Attacks on

    Access Proxy

    AA5: Attacks from AP

    AA6: DoSfrom Compromised

    Core

    AA7: AP Process Corruption

    AA8: DoSPrevention by Access Proxy

    DA1: DC Communications

    GA1: Process Corruption on Guardian

    DA3: Process

    Corruption on DC

    GA2: Attacks from Guardian

    DA2: Origin of Attacks on DC

    SA1: Origin of Attacks on PSQ Server

    SA2: Attacks from PSQ Server

    SA3: IO Integrity in PSQ Server

    SA4: Client Confidentiality in PSQ Server

    SA5: IO Authenticity

    SA6: Network-layer I & C

    SA7: Process Corruption in PSQ Server

    SeA1: Attacks from IDS Sensor

    SeA2: Sensor False Alarm

    Rate

    SeA3: Sensor Detection Delay

    SeA4: Sensor Detection Probability

    SeA5: Process

    Corruption in Sensor

    AcA1: Process Corruption in Actuator

    AcA2: Attacks from Actuator

    LA1: Process Corruption in

    Local Controller

    LA2: Attacks from Local Controller

    CoA1: CorrleatorFalse Alarm

    Rate

    CoA2: Origin of Attacks on Correlator

    CoA3: Attacks from Correlator

    CoA4: Alert IntegrityMA1: SM Byzantine Agreement

    MA2: Origin of Attacks on

    SM

    MA3: Attacks from SM

    PsA1: ADF Policy Server

    Input Correctness

    PsA1: ADF Policy Server

    Synchronization

    ScA1: Process Corruption in Subscribed

    Client

    System Connectivity

    Physical Topology

    Network Topology Restricted Routing No Tunneling Attacks

    Process Isolation

    SELinux Trusted Solaris Windows 2000

    Type Enforcement Hardened Kernel Hardened Kernel Kernel Loadable Wrappers

    VMWareover SELinux

    Platform Mechanisms Component-specific policy

    Private Key Confidentiality

    No Unauthorized Direct Access

    Keys Protected from Theft

    DoDCommon Access Card (CAC)

    PKCS #11 Tamperproof

    Keys Not Guessable

    Algorithmic Framework

    Key Length Key Lifetime

    No Unauthorized Indirect Access

    Physical Protection of CAC device

    Protection of CAC Authentication Data

    No Compromise of Authorized Process Accessing CAC

    No Cryptography in Access Proxy

    Not Preconfigured

    Not R e c o n f i g u r a b l e

    ADF NIC services protected

    ADF Correctness

    ADF NIC Physical Security

    ADF NIC Firmware Initialization

    ADF Key Initialization

    ADF Agent Initialization

    ADF Protocol Correctness

    ADF Host Independence

    ADF Agent Correctness

    VPG Integrity VPG Confidentiality

    Policy Server Integrity

    ADF Policy Correctness

    Correctness of Registration Protocol

    Correctness of Reattachment

    Protocol

    Hard-wired Configuration

    Electrically Isolated

    Physically Protected

    Connectivity Physical Integrity

    Electrical Integrity

    Gate Configuration and

    Truth Table

    Proxy Protocol Configuration

    Can Identify Malformed Traffic

    Correctness of Flow Control Mechanisms

    Bidirectional Flow Control

    Correctness of Certificate Exchange

    IDS Experimental Evaluation

    Correctness of Modified ITUA Protocols

    Functional model faithful to design

    IDS objectives

    Notification Confidentiality

    IO Confidentiality (end-to-end)

    Confidential info not exposed

    Unauthorized activity properly rejected

    Authorized join/leave processed successfully

    Authorized query processed successfully

    Authorized subscribe processed successfully

    JBI properly initialized

    Design Team Review

    JBI critical mission objectives

    JBI critical functionality

    Initialized JBI provides essential services

    Authorized publish processed successfully

    ConfidentialityDataflow Timeliness Integrity

    (from functional model execution)

    Functional model assumptions hold

    JBI mission awareness

    CA1: Origin of Attacks on Clients

    CA2: Attack Propagation from Clients

    CA3: Client Process Corruption

    PA1: Client-Core

    Communication I & C

    PA2: Alternate Path

    Availability

    QA1: QIS Incorruptibility

    QA2: QIS Communication

    Cutoff

    QA3: QIS Input

    Integrity

    QA4: QIS Function

    Correctness

    AA1: AP Function

    Correctness

    AA2: AP Application-layer Integrity

    AA3: AP Application-layer Confidentiality

    AA4: Origin of Attacks on

    Access Proxy

    AA5: Attacks from AP

    AA6: DoSfrom Compromised

    Core

    AA7: AP Process Corruption

    AA8: DoSPrevention by Access Proxy

    DA1: DC Communications

    GA1: Process Corruption on Guardian

    DA3: Process

    Corruption on DC

    GA2: Attacks from Guardian

    DA2: Origin of Attacks on DC

    SA1: Origin of Attacks on PSQ Server

    SA2: Attacks from PSQ Server

    SA3: IO Integrity in PSQ Server

    SA4: Client Confidentiality in PSQ Server

    SA5: IO Authenticity

    SA6: Network-layer I & C

    SA7: Process Corruption in PSQ Server

    SeA1: Attacks from IDS Sensor

    SeA2: Sensor False Alarm

    Rate

    SeA3: Sensor Detection Delay

    SeA4: Sensor Detection Probability

    SeA5: Process

    Corruption in Sensor

    AcA1: Process Corruption in Actuator

    AcA2: Attacks from Actuator

    LA1: Process Corruption in

    Local Controller

    LA2: Attacks from Local Controller

    CoA1: CorrleatorFalse Alarm

    Rate

    CoA2: Origin of Attacks on Correlator

    CoA3: Attacks from Correlator

    CoA4: Alert IntegrityMA1: SM Byzantine Agreement

    MA2: Origin of Attacks on

    SM

    MA3: Attacks from SM

    PsA1: ADF Policy Server

    Input Correctness

    PsA1: ADF Policy Server

    Synchronization

    ScA1: Process Corruption in Subscribed

    Client

    System Connectivity

    Physical Topology

    Network Topology Restricted Routing No Tunneling Attacks

    Process Isolation

    SELinux Trusted Solaris Windows 2000

    Type Enforcement Hardened Kernel Hardened Kernel Kernel Loadable Wrappers

    VMWareover SELinux

    Platform Mechanisms Component-specific policy

    Private Key Confidentiality

    No Unauthorized Direct Access

    Keys Protected from Theft

    DoDCommon Access Card (CAC)

    PKCS #11 Tamperproof

    Keys Not Guessable

    Algorithmic Framework

    Key Length Key Lifetime

    No Unauthorized Indirect Access

    Physical Protection of CAC device

    Protection of CAC Authentication Data

    No Compromise of Authorized Process Accessing CAC

    No Cryptography in Access Proxy

    Not Preconfigured

    Not Reconfigurable

    ADF NIC services protected

    ADF Correctness

    ADF NIC Physical Security

    ADF NIC Firmware Initialization

    ADF Key Initialization

    ADF Agent Initialization

    ADF Protocol Correctness

    ADF Host Independence

    ADF Agent Correctness

    V P G I n t e g r i t y VPG Confidentiality

    Policy Server Integrity

    ADF Policy Correctness

    Correctness of Registration Protocol

    Correctness of Reattachment

    Protocol

    Hard-wired Configuration

    Electrically Isolated

    Physically Protected

    Connectivity Physical Integrity

    Electrical Integrity

    Gate Configuration and

    Truth Table

    Proxy Protocol Configuration

    Can Identify Malformed Traffic

    Correctness of Flow Control Mechanisms

    Bidirectional Flow Control

    Correctness of Certificate Exchange

    IDS Experimental Evaluation

    Correctness of Modified ITUA Protocols

    Functional model faithful to design

    IDS objectivesIDS objectives

    Notification ConfidentialityNotification

    ConfidentialityIO Confidentiality

    (end-to-end)IO Confidentiality

    (end-to-end)

    Confidential info not exposed

    Confidential info not exposed

    Unauthorized activity properly rejected

    Unauthorized activity properly rejected

    Authorized join/leave processed successfully

    Authorized join/leave processed successfully

    Authorized query processed successfully

    Authorized query processed successfully

    Authorized subscribe processed successfullyAuthorized subscribe processed successfully

    JBI properly initializedJBI properly initialized

    Design Team Review

    DemonstratesADF based protectionDJM: signing and signature checking, authentication, RBAC, semantic and behavioral checks, FT protocolsAdaptive response: rapid reaction, IO rejection, quad isolation, dynamic clients, fall back

    R o u t e r 1

    C A FC l i e n t

    A O D BC l i e n t

    W e a t h e rC l i e n t

    W 0W 01 0 M b p s R o u t e r 2 A t t a c k e r

    Q u a d 2 Q u a d 3 Q u a d 4

    H u b

    M a n a g e d S w i t c h

    N I D S H u b

    Q u a d 1

    1 0 0 M b p s

    1 0 0 M b p s

    R o u t e r 1

    C A FC l i e n t

    A O D BC l i e n t

    W e a t h e rC l i e n t

    W 0W 01 0 M b p s R o u t e r 2 A t t a c k e r

    Q u a d 2 Q u a d 3 Q u a d 4

    H u b

    M a n a g e d S w i t c h

    N I D S H u b

    Q u a d 1

    1 0 0 M b p s

    1 0 0 M b p s

    Enough proof of concept implementations developed to show 3 AFRL clients running a simplified scenario over 4 quad core

    DPASA W/O Signature Verification

    0

    200

    400

    600

    800

    1000

    1200

    0 100 200 300 400 500 600 700

    Publications

    Late

    ncy(

    ms)

    Series1

    Testbed configuration

    Screenshots and performance graph

    The DPASA IT-JBI design provides critical functionality with high probability even when under heavy successful attack.

    Total number of intrusions versus MTTD_A (min)

    0

    100

    200

    300

    400

    500

    600

    700

    800

    900

    10 100 1000 10000 100000

    MTTD_A (min)

    Tota

    l Num

    ber

    of In

    trusi

    ons

    12 hour mission 24 hour mission 48 hour mission

    98% of all publishes successful when new vulnerabilities are discovered, on the average, once a day or less often during a 12-hour mission. (An extremely high new vulnerability discovery rate; CERT data suggest MTTDA ~ 6000 min.)

    At this new vulnerability discovery rate, system provides correct functionality even when about 10 intrusions occur during a 12-hour mission.

    Fraction of successful publishes versus MTTD_A (min)

    0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    10 100 1000 10000 100000

    MTTD_A (min)

    Frac

    tion

    of S

    ucce

    ssfu

    l Pub

    lishe

    s

    12 hour mission 24 hour mission 48 hour mission

  • Secure ARMORs – Seamless Dependability & Security Testbed

    MultithreadedLSA, PDS Algorithms

    iMac

    iMac

    iMac

    Ad Hoc Wireless NetworkSimulator (NS/JavaSim)

    Highly ReliableWireline Network

    Errorspropagation

    Higher ErrorRate

    Physical Wireless Testbed

    Wireless Network Emulator

    ErrorInjection

    iMac

    •Frequent disconnections• Mobility• Fading

    • Noisy channels• Battery exhaustion• HW transients• SW bugs

    • HW transients• SW bugs

    High performance/ bandwidth

    Limited performance/ bandwidth

    Error Injection

    Secure ARMORscustomizable software and/or

    hardware to provide reliability and security

    services

    Well established techniques: • Replication• Checkpointing

    • Seamless interaction across domains• Limit error propagation• Reduce dependability bottleneck dueto wireless network

  • The Information Trust Institute (ITI)Trust is about public confidence –

    It is about security, but also correctness, reliability, availability, and survivability

    ITI: Integrated private/public R&D and workforce development to foster technology transfer, new industry, products, and services in order to provide design and validation tools and architectural constructs needed to ensure and justify public trust in critical applications and systems

    Ensuring Public Confidence

    Providing designers and agencies with the

    tools to measure and validate trust in IT-dependent critical

    applications and systems

    PublicEducation &

    Workforce DevelopmentAddressing the nation’s

    cybertrust workforce needs:identifying and encouraging

    talent in K-12,university & college programs,

    professional programs,public awareness

    Designing inTrust

    Providing system design& validation tools, and

    architectural constructs todeploy and validate

    trustworthiness in critical applications & systems

    The loss of public confidence in or access to critical systems can be economically devastating:Two weeks after 9/11: >$10B and 100,000 jobs lost in airline industryDowntime costs per hour: brokerage operations: $6.45 M, credit card authorization: $2.6 M

  • Research in Dependable and Secure Systems at Illinois

    FacultyMicroprocessor architecture – S. Patel, N. Carter Hardware checkpointing – J. TorrellasFormal methods – J. MeseguerTest and VLSI design – J. PatelStorage systems – Y. ZhouMobile computing – N. VaidyaSimulation – D. NicolModeling and design – W. SandersOptical networks – S. LumettaMeasurement and design – R. IyerBenchmarking and design – Z. Kalbarczyk

    Major PartnersIBM – benchmarkingSUN – next generation supercomputerBBN – Intrusion-Tolerant Computing and Adaptive SystemsMotorola – wireless communication and computingHP – utility computingBoeing – trusted softwareJPL – next generation space borne computing