HIS 2015: Prof. Ian Phillips - Stronger than its weakest link
TRANSCRIPT
1
Stronger than its weakest link
High Integrity Software Conference (HIS'15) 5 Nov 15: Bristol.
Pdf & SlideCast @ http://ianp24.blogspot.com
Opinions expressed are my own ...
Prof. Ian Phillips Principal Staff Engineer
ARM Ltd
Visiting Prof. at ...
Contribution to Industry Award 2008
2v0
2
High Integrity Software !?
..Or..
"The scientific method assumes that a system with perfect integrity yields a singular extrapolation within its domain that one can test against observed results" (Wikipedia)
§ Is Software the weakest link in High Integrity Systems?
§ Such that improving it is all that's necessary to produce High Integrity Systems?
§ When we say Software, are we actually thinking Computation?
§ But Computation is about results, not about implementation technologies!
3
We know what Proper Computing is...
§ HPC and Mainframe ... maybe Workstation
§ But not really Laptop or ... (Heaven forbid) a Pocketable?
4
Graham's Orrery - c1700
§ A machine to Compute the position of the planets
§ Single-Task, Continuous Time, Analogue, Mechanical, Computer (With backlash!)
George Graham. Clock-Maker (1674-1751)
5
Amsler’s Planimeter - c1856
Planimeter 2015 !
§ A Machine for Computing the Area of an arbitrary 2D shape
§ Technology: Precision Mechanics, Analogue
§ Available today ... Electronically enhanced
Jakob Amsler-Laffon. Mathematician, physicist, engineer (1823-1912)
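The planimeter turns a traced outline into an area. A digital counterpart, offered here only as an illustrative sketch that is not part of the slides, is the shoelace formula, which computes the area of a polygon from its corner coordinates. The name `polygon_area` and the sample shapes are assumptions chosen for the example.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices, via the shoelace formula."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the outline
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


# Hypothetical sample outlines, listed corner by corner.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
right_triangle = [(0, 0), (4, 0), (0, 3)]
print(polygon_area(unit_square), polygon_area(right_triangle))
```

Like the mechanical instrument, the routine only needs the boundary; the accuracy of the result depends on how finely the outline is sampled.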
6
IN (x): Enumerated Phenomena
OUT (y): Processed Data/Information
y=F(x)
§ State (s) and Time (t) are implicit or explicit variables in this
§ And so are Accuracy (a), Reliability (r) and Cost ($)
§ All of which can be balanced (Architected) to meet End-Customer needs
§ Exceeding needs almost always 'costs' more!
... Technologies and Methodologies just offer 'star$' options over basic functionality ... Not all of which will be commercially valuable
Computing is solving a Model of a Subset of Reality ... Fast enough to be useful and affordable by its customer
y=F(x,s,t,a,r,$)
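One way to see accuracy (a) and cost ($) as variables of the same computation is a numerical integration in which the number of steps buys accuracy. The sketch below is an assumption-laden illustration, not taken from the slides: `integrate`, `square` and the step counts are hypothetical, and "cost" is simply the number of function evaluations.

```python
def integrate(f, lo, hi, steps):
    """Trapezoidal estimate of the integral of f over [lo, hi].

    The work done (function evaluations) grows linearly with `steps`.
    """
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * width)
    return total * width


def square(x):
    return x * x


exact = 1.0 / 3.0  # integral of x^2 over [0, 1]
for steps in (4, 40, 400):
    error = abs(integrate(square, 0.0, 1.0, steps) - exact)
    print(steps, error)  # more steps: smaller error, more work
```

Choosing the step count is the kind of balancing the slide describes: beyond the accuracy the customer needs, extra steps only add cost.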
7
[Figure: Moore's law chart. Axes: Approximate Process Geometry (100um down to 10nm, ITRS’99); Transistors/Chip (M); Transistor/PM (K). Source: http://en.wikipedia.org/wiki/Moore’s_law]
Digital Electronics Changed the Computation Game ...
8
2012: Nvidia’s Tegra 3 Processor Unit (Around 1B transistors)
NB: The Tegra 3 is similar to the Apple A4
9
Computing Systems
§ The System is perceived at its Human Interface
§ Though the actual interface is (usually) relatively dumb
§ And its Compute Engine is almost always remote (and maybe shared)
10
The Invisible Face of Computing Today
Unrecognised but Vital ... All need to be Dependable
11
The Visible Face of Computing Today
Essential but not Vital ... But BIG-BIG-BIG $
12
§ Digital Electronics
§ Software
§ Memory
§ Optics
§ Analogue Electronic
§ Sensors/Transducers
§ Mechanics
§ Micro-Motors
§ Displays
§ Discharge Tube
§ Robotic Assembly
§ Plastic, Metal, Glass
Input: Image (Light) => Compute (Process Image) => Output: SD Card (Electrons)
... Many Technologies seamlessly cooperating, to Enhance Human Memory ... Traditional siloes (inc. SW and HW) are just a means to this end!
Electronic System (Cyber-physical System) - c2015
Incorporating DIGIC5+ (ARM)
System-Level Computation
‘Classic’ Computer
13
Computing for the Masses ... Technology Products are Increasingly ‘Intelligent’
[Chart: Millions of Units against year, 1970 to 2030. Labels: Human Population; Main Frame; Mini Computer; Personal Computer; Desktop Internet; Mobile Internet]
1st Era: Select work-tasks
2nd Era: Broad-based computing for specific tasks
3rd Era: Computing as part of our lives
Technology is the Driver
Consumer is the Driver
... Old Markets are still there; but don't drive the Technology today!
14
Typical 2015 Computing Platform ... ... is just 137.2 x 70.5 x 5.9 mm
15
Typical 2015 Computing Platform
Exynos 5422 Eight 32bit CPUs (big.LITTLE):
• Four big (2.1GHz ARM A15) for heavy tasks;
• Four small (1.5GHz ARM A7) for lighter tasks.
+ Nine Mali GPU cores ...
... A ~30 Core Heterogeneous Multi-Processor ... In your Shirt Pocket!
One Board ... 21 significant ‘Chips’
16
2010: Apple’s A4 SIP Package (Cross-section)
IC Packaging Technology
§ The processor is the centre rectangle. The silver circles beneath it are solder balls.
§ Two rectangles above are RAM die, offset to make room for the wirebonds.
§ Putting the RAM close to the processor reduces latency, making the RAM faster, and reduces power consumption ... But increases cost.
§ Memory: Unknown
§ Processor: Samsung/Apple (ARM Processor)
§ Packaging: Unknown (SIP Technology)
Source ... http://www.ifixit.com
Processor SOC Die
2 Memory Dies
Glue
Memory ‘Package’
4-Layer Platform ‘Package’
Steve Jobs WWDC 2010
17
2013: Samsung Solid-State Memory
§ Smart Memory (eMMC)
§ 16-128Gb in a single package
§ 8Gb/die. Stacked 2-16 die/package
§ Handles errors in the API (Smart Interface)
§ Package just 1.4mm thick! (11.5x13x1.4mm) ... Smaller than a postage stamp
18
[Figure: Moore's law chart. Axes: Approximate Process Geometry (100um down to 10nm, ITRS’99); Transistors/Chip (M); Transistor/PM (K). Annotations: “Verification Gap”; 100py, 1,800py, 8,500py]
Moore’s Law: Increasing Design Challenge...
http://en.wikipedia.org/wiki/Moore’s_law
19
§ They sell things that Their Customers desire and can afford
§ To satisfy the End-Customer's needs ... In an End-Product which may be several ‘layers’ above them.
§ Focus on their Core Competencies as a Component Provider in a Global Market
§ Avoid Commoditisation by Differentiation
§ Improved Cost and Quality (by improving Process) ..and..
§ Improved Business-Models (which make the Money) ..and..
§ Improved Functionality (by new Technology and Methods)
§ But New Product Development is a Cost and a Risk to be Minimised
§ Technology (HW, SW, Mechanics, Optics, Graphene, etc) just enables Options!
§ New-Technology may cost more (including risk) than it delivers in Product Value!
§ Over-Design costs ... Business can't afford the Precautionary Principle!
... Because successful End-Products fund their entire (RD&I) Value-Chains ... Reuse of their Technologies becomes an economic necessity in other markets!
Computing Technologies in Business Context Businesses have to be Competitive, Money Making Machines today ...
20
Component and Sub-Systems from Global Enterprise ... ... Global Teams contributing Specialist Knowledge & Knowhow
§ Apple ID’d 159 Tier-1 Suppliers ...
§ Thousands of Engineers Globally
§ Est. 10x Tier-2 Suppliers ...
§ Including Virtual Components [1] and Sub-Systems (ARM and other IP Providers)
§ Multiple Technologies ...
§ Hardware, Software, Optics, Mechanics, Acoustics, RF, Plastics, etc
§ Manufacturing, Test, Qualification, etc.
§ Methods, Tools, Training, etc
§ Tens of thousands of Engineers Globally
... More than 90% of Technology and Methods are Reused (productivity)!
1: Virtual Components do not appear on BOM
21
§ But the only way to economically realise this potential is by product evolution; reusing and reusing again the work of our technical predecessors ...
§ Hardware, Software and other Technologies; Methods and Tools; and throughout the stack
§ In-Company: Sourced and Evolved from Predecessor Products
§ Ex-Company: Sourced from businesses with Specialist Knowledge/Experience
§ Reuse Improves Quality; as objects are designed more carefully, and bug-fixes are incremental
§ Reuse Improves Productivity; as objects can be deployed without understanding their implementation technology (or its limitations)
... It delivers working systems quickly with finite teams; but the dependability cannot be quantified!
... Despite this, Commercial Technologies will be used in Systems on which people Depend
§ The cost of alternatives will be several orders of magnitude too great
§ The issue is (just) making dependable systems using undependable components
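A classic way to build a more dependable function from undependable parts is to run several replicas and vote on their outputs. The slides do not name this technique; it is introduced here purely as an illustration. The sketch assumes replica failures are independent and that the voter itself does not fail, and the names `majority_vote` and `system_failure_probability` are hypothetical.

```python
from collections import Counter


def majority_vote(replica_outputs):
    """Return the value produced by a strict majority of replicas, or None if there is none."""
    value, count = Counter(replica_outputs).most_common(1)[0]
    return value if count * 2 > len(replica_outputs) else None


def system_failure_probability(p):
    """Probability that at least 2 of 3 independent replicas fail, each with probability p."""
    return 3 * p * p * (1 - p) + p ** 3


print(majority_vote([7, 7, 9]))            # one faulty replica is out-voted
print(system_failure_probability(0.01))    # well below the single-replica 0.01
```

The gain only holds while failures really are independent; a shared design bug in reused components defeats the vote, which is one reason the dependability of such systems is hard to quantify.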
Designer Productivity has become the Limiting Factor The Customer Expectation of the Billions of available Transistors is irresistible!
22
ARM: Delivers Reuse-Based Productivity ...
.... 24 Processors in 6 Families for different Application Domains
About 50MTr
About 50KTr
23
... Tools to create optimal Heterogeneous Multi-Processors ...
[Block diagram: example system built around the CoreLink™ CCN-504 Cache Coherent Network. Blocks shown: four Quad Cortex-A15 clusters, each with L2 cache; Snoop Filter; 8-16MB L3 cache; two CoreLink™ DMC-520 (x72 DDR4-3200); NIC-400 Network Interconnect (two instances); Interrupt Control; IO Virtualisation with System MMU; ACE links; Flash, GPIO, USB, PHY, AHB, PCIe, 10-40GbE, DPI, Crypto, DSPs, SATA]
Dual channel DDR3/4 x72
Up to 4 cores per cluster
Up to 4 coherent clusters
Integrated L3 cache
Up to 18 AMBA interfaces for I/O coherent accelerators and IO
Peripheral address space
Heterogeneous processors – CPU, GPU, DSP and accelerators
Virtualized Interrupts
Uniform System memory
24
… Other Tools, Libraries and Partners to Realize the Potential
§ Technology to build Electronic System solutions:
§ Software, Drivers, OS-Ports, Tools, Utilities to create efficient systems with optimized software solutions
§ Diverse Physical Components, including CPU and GPU processors designed for specific tasks
§ Interconnect System IP delivering coherency and the quality of service required for lowest memory bandwidth
§ Optimised Cell-Libraries for highly optimized SoC implementations
§ Well Connected to Partners in the Life-Cycle:
§ For complementary tools and methods required by System Developers
§ Global Technology Global Partners:
§ >900 Licences; Millions of Developers
25
Are the Outcomes of this 'chain' Dependable? Evidently so: They are Functional and Dependable enough to satisfy Billions/yr!
(2Q2015)
Smart-Phone shipments 2Q15 - 185 million (~0.75B/yr)
... The probability of a 'fairly reliable' system failing, when you need to use it for an 'improbable' event, is 'highly improbable' ... And mostly this is enough
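The arithmetic behind these two claims can be sketched briefly. The shipment figure is the slide's; the failure and event probabilities in the example call are invented placeholders, and the joint probability assumes the failure and the event are independent. The function names are hypothetical.

```python
QUARTERLY_SHIPMENTS = 185e6  # smartphone shipments in 2Q15, from the slide


def annual_run_rate(quarterly_units):
    """Annualise a quarterly shipment figure."""
    return 4 * quarterly_units


def joint_failure_probability(p_unavailable, p_event):
    """Chance that the rare event occurs AND the system is down, assuming independence."""
    return p_unavailable * p_event


print(annual_run_rate(QUARTERLY_SHIPMENTS))      # about 0.75 billion per year
print(joint_failure_probability(0.01, 0.001))    # placeholder inputs, for illustration only
```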
26
Design is Transforming a Model of Behaviour ... evolving a Mathematical Model to meet Non-Functional Constraints
[Diagram: functions F1 to F5 mapped onto an ‘Optimal’ Platform of HW1 to HW4, Hardware Interface, RTOS/Drivers, Threads, Bus(es) and Processor(s)]
Create Functional-Model [1] on a 'Generic' Platform
Transform to a Functional-Model on an 'Optimal' (HW/SW) Platform
Evolving the Model (& Platform) until Functional and Non-Functional Performance is Adequate.
NOTE: 'Final SW' is still a Model of Behaviour!
1: This includes a Model of Execution such as a Java VM.
27
§ All models are a simplification of reality; therefore they all have limitations
§ "All models are wrong, but some are useful" (G. E. P. Box)
§ Normal Software Design Methods are create-it-wrong, test-it-right ...
§ Quality is established by Test; and bug-fixes/patches in the field (An inherently poor method)
§ Software Reuse offers hugely improved Productivity (Not-using it is not an option)
§ Software Reuse offers improved Quality (But over what?)
§ Examination shows that all code has high residual errors ...
§ Well structured and tested Source-Code has ~5 errors per 1,000 lines of code (E-KLOC)
§ Commercial code is typically ~5x worse than this
§ Most errors are harmless - But there is no useful correlation
§ Formal-Methods are better; but cost is high if you can't utilise (normal) legacy code.
§ But Even 'Perfect-Software' still has to execute on an Imperfect-Platform
... "YES!": But Good-Enough satisfies the Commercial Imperative for most applications
Is Software (Logic) Inherently Undependable? Software is a Model of Reality, executing on a Hardware and Software Platform
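To put the E-KLOC figures in proportion, the sketch below scales them to a code base of a given size. The two rates are the slide's; the function name `residual_errors` and the one-million-line example are assumptions for illustration, and the result is an expectation, not a count of harmful bugs.

```python
GOOD_CODE_ERRORS_PER_KLOC = 5   # slide: well structured and tested source code
COMMERCIAL_MULTIPLIER = 5       # slide: commercial code is typically ~5x worse


def residual_errors(lines_of_code, commercial=False):
    """Expected residual errors for a code base, using the slide's rough rates."""
    rate = GOOD_CODE_ERRORS_PER_KLOC
    if commercial:
        rate *= COMMERCIAL_MULTIPLIER
    return lines_of_code / 1000 * rate


# Hypothetical one-million-line product.
print(residual_errors(1_000_000), residual_errors(1_000_000, commercial=True))
```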
28
Open Source is Dependable? "Somebody will see the bugs!" (But only if they look!)
1: http://www.wired.com/2014/04/heartbleedslesson/
2: http://veridicalsystems.com/blog/of-money-responsibility-and-pride/
“It is now very clear that OpenSSL development could benefit from dedicated full-time, properly funded developers”
“OSF typically receives only $2,000 a year in donations”
§ OpenSSL Heartbleed bug (2014) [1]
§ Update was received just before a Public Holiday
§ Editor was a known and high-quality source
§ Code was reviewed informally and released
§ Editor was conflicted with day-job, family and holiday pressure [2]
§ Too little resources to do a proper job.
§ This was a classic E-KLOC error ...
§ Not a Coding, Formatting, or Functional error
§ It was a System error (an omission in a non-functional aspect of the code).
... Was the ‘fault’ with the software Source (OpenSSL Software Foundation (OSF))?
... Or a User Community too-ready to believe in the Myth of Open Source software?
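The omission in Heartbleed was a missing check: the heartbeat handler echoed back as many bytes as the request claimed to contain, without comparing that claim to the payload actually received. The toy model below illustrates that class of error in Python; it is not OpenSSL's code, and `SERVER_MEMORY`, `heartbeat_unchecked` and `heartbeat_checked` are hypothetical names.

```python
# Toy memory image: the 4-byte heartbeat payload followed by unrelated data.
SERVER_MEMORY = b"ping" + b"|secret-key-material|"


def heartbeat_unchecked(memory, claimed_length):
    """Echo claimed_length bytes, trusting the length field in the request."""
    return memory[:claimed_length]


def heartbeat_checked(memory, payload_length, claimed_length):
    """Echo the payload only if the claimed length fits what was actually received."""
    if claimed_length > payload_length:
        return None  # discard the malformed request
    return memory[:claimed_length]


print(heartbeat_unchecked(SERVER_MEMORY, 20))   # leaks bytes beyond the payload
print(heartbeat_checked(SERVER_MEMORY, 4, 20))  # malformed request is refused
```

Both versions echo a well-formed request correctly, which is why functional testing alone would not have exposed the difference.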
29
§ Boolean Mathematics (HDL) is Dependable; but implementation depends on reliably mapping its equations to the physical world through Logic-Gates
§ A Gate is a Saturated Analogue circuit; with Non-Functional attributes.
§ CMOS has been a 'reliable' Boolean mapping for 30 years, but ...
§ Today’s 20nm transistors (14nm soon) have larger variability, and there are many more on a chip (Typically 1B in 2014)
§ At 70degC, Vtn=130mV (sigma ~25mV); around 1 in 5 million transistors have Vt<0 (Can’t be turned off)
§ So that’s >100 transistors/chip that don’t switch off
§ And there's another >100 that only turn-on weakly (low drive/slow)
§ This is intrinsic (atomic), so will always be randomly located!
... "NO!": Today’s chips shouldn’t work! (So why do they?)
So is Hardware (Logic) Dependable? 1/3
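The order of magnitude of the threshold-voltage figures can be checked with a short tail-probability calculation. The sketch assumes Vt is normally distributed, which the slide does not state, and uses the slide's mean, sigma and transistor count; the function name is hypothetical. Under that assumption the result lands in the same range as the slide's "one in millions" and "around a hundred per chip".

```python
import math


def fraction_below_zero(mean_mv, sigma_mv):
    """Fraction of a normal distribution that falls below 0: Phi(-mean/sigma)."""
    z = mean_mv / sigma_mv
    return 0.5 * math.erfc(z / math.sqrt(2))


TRANSISTORS_PER_CHIP = 1e9  # slide: typically 1B in 2014

fraction = fraction_below_zero(130.0, 25.0)     # slide: Vtn = 130mV, sigma ~25mV
stuck_on_per_chip = fraction * TRANSISTORS_PER_CHIP
print(fraction, stuck_on_per_chip)
```

Because the mean sits just over five sigma from zero, small changes in either figure move the answer by large factors, which is part of the slide's point about guessed values.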
[Figure: CMOS NAND gate. Inputs A and B, supply +V, output OUT]
30
Mitigating this we have ...
§ Weak Transistors: Not all ...
§ Are at 70degC even if the die is (But some will be higher)
§ Are Minimum Size (Larger ‘area’ reduces variability)
§ Are on Critical Paths; and the probability of there being more than one on a path is low!
§ CMOS Logic: Is very robust and will continue to function with out-of-spec transistors
§ Leaky Gates and Faster Transitions are seldom functional failures (but they do hit reliability!)
§ Speed variations on a path average out (on average!)
§ Errors are frequently difficult to detect (and thus correct!)
§ Memory: Analogue Circuits are much more sensitive to transistor variation. But ...
§ Failures are easier to detect (and work around)
§ Spare rows/columns are included to fix manufacturing (static) defects ... but not dynamic (use)
§ NV-M limited write-cycles and bit failures are shielded by their smart API ... to some degree.
... Hardware failure is not always easily spotted at the functional level!
So is Hardware (Logic) Dependable? 2/3
31
§ And we haven't included imponderables ...
§ Internally and Externally generated noise? (Greater susceptibility at lower voltages)
§ High-energy particles? (Greater susceptibility at smaller geometries)
§ Wear-out: Vt/Gain drift and ElectroMigration? (Greater susceptibility at smaller geometries)
§ Local Hot-Spots? (140C is not uncommon on chip)
§ Limitations of Verification and Test (State-Space exploration is always a sub-set)
§ We are repeatedly multiplying tiny-improbables, by ever larger-numbers ...
§ And many of the values are only guesses!
§ We have no real idea about the reliability/dependability of modern Systems or Components
§ But we know that as process geometries shrink, Susceptibility will get worse ...
§ Chips will get ever more complex (and more chips will be used in more complex Systems)
§ Transistors will get smaller and Designers will erode safety margins to get performance
... Despite this; Chips and Systems do Yield more than we would rightly expect ...
... So we must be utilising Unknown Safety Margins!
So is Hardware (Logic) Dependable? 3/3
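"Multiplying tiny improbables by ever larger numbers" has a simple form when failures are independent: the chance that at least one of N parts fails is 1 - (1 - p)^N. The sketch below is an illustration under that independence assumption, with a hypothetical function name and invented example values; it uses `log1p` and `expm1` so that very small p and very large N do not lose precision.

```python
import math


def probability_any_failure(p_each, count):
    """P(at least one of `count` independent parts fails), each failing with p_each."""
    # 1 - (1 - p)^N, written as -expm1(N * log1p(-p)) for numerical accuracy.
    return -math.expm1(count * math.log1p(-p_each))


# Invented example: a one-in-a-billion fault across a billion transistors.
print(probability_any_failure(1e-9, 10**9))
```

With the example values the answer is far from negligible, and since p is often only a guess, the uncertainty in the result is as large as the slide suggests.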
32
Killing a Sacred Cow: SW and HW Logic are the Same ... They have different characteristics, so choice is a System Architectural decision!
// A master-slave type D-Flip Flop
module flop (data, clock, clear, q, qb);
input data, clock, clear;
output q, qb;
// primitive #delay instance-name
// (output, input1, input2, .....),
nand #10 nd1 (a, data, clock, clear),
nd2 (b, ndata, clock),
nd4 (d, c, b, clear),
nd5 (e, c, nclock),
nd6 (f, d, nclock),
nd8 (qb, q, f, clear);
nand #9 nd3 (c, a, d),
nd7 (q, e, qb);
not #10 inv1 (ndata, data),
inv2 (nclock, clock);
endmodule
'Hardware' Language (Verilog) 'Software' Language (C)
#include <stdio.h>
#include <time.h>
/* Use the PC's timer to check */
/* processing time */
int main(void)
{
    clock_t time, deltime;
    long junk, i;
    float secs;
LOOP:
    printf("input loop count: ");
    scanf("%ld", &junk);
    time = clock();
    for (i = 0; i < junk; i++)
        ;   /* empty body: only the loop overhead is timed */
    deltime = clock() - time;
    secs = (float) deltime / CLOCKS_PER_SEC;
    printf("for %ld loops, #tics = %ld, secs = %f\n", junk, (long) deltime, secs);
    goto LOOP;
...
Target Platform CMOS -------- CPU
Target Architecture Info
Compilers HW ----------- SW
Configuration Files HW -------------- SW
33
§ By the time you are writing Applications you are hugely dependent on the layered-accuracy of other people's work beneath
... Both Hardware and Software
So whilst Boolean Mathematics is Absolute ... ... all implementations of it are not
A Software View
A Hardware View
34
§ We Can’t Design them Right
§ HW is SW; and Coding errors remain. State-space too big for simulation exploration. Can’t model or explore whole Systems and they are too complex for Formal methods. Reuse embodies unknown bugs.
§ We Can’t Make them Right
§ Chips are subject to Process Imperfections and Variability. Chips and Systems are subject to Verification and Test Escapes. Boolean math is absolute; logic cells and real layouts are not.
§ We Can’t Keep them Right
§ Chips are susceptible to Supply Transients, Wear-Out and High-Energy particles. Most damage is not immediately obvious.
... And it will all get worse as process geometries shrink
... Yet every year we make Billions of Systems that work! "The Naysayers are just Harbingers of Doom!"
So Complex Electronic Systems are Impossible!
35
§ System-Level Dependability is what matters ...
§ Component and Sub-System dependability is inherently poor (and will get worse).
§ Productivity demands that Dependable Systems must Reuse Components and Sub-Systems (Physical and Virtual); and the affordable ones are of Commercial quality!
§ Clean-Sheet design is not an option for almost all complex products! ... the cost-is-no-object customer is an endangered species
§ Increasing the Dependability of Components and Sub-Systems helps; but can never be enough
§ ARM product is really; 'Enhanced Reuse for Electronic System Design and Manufacture'
... The Only Place to implement System-Level Dependability on an Undependable Platform, is at the System-Layer!
§ Reliable components and sub-systems will help, but cannot ever be enough
§ Predominantly a 'Software' challenge; but not alone (Don't forget the simple Watch-Dog)
Dependable on Undependable Any Methods that are based on perfection in HW or SW are untenable ...
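The watch-dog mentioned above is the simplest system-level mechanism of this kind: a supervised task must check in regularly, and if it stops doing so the system takes a corrective action it has chosen for itself. The sketch below is a software illustration only; real watch-dogs are usually hardware timers, and the class name, its methods and the injectable `clock` parameter are assumptions made for the example.

```python
import time


class Watchdog:
    """Software watchdog: expires if kick() is not called within timeout_s."""

    def __init__(self, timeout_s, on_expire, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.on_expire = on_expire   # system-level corrective action (the policy)
        self.clock = clock           # injectable so the sketch can be exercised in tests
        self.last_kick = clock()

    def kick(self):
        """Called by the supervised task to show it is still making progress."""
        self.last_kick = self.clock()

    def check(self):
        """Called periodically by the supervisor; fires the action on a timeout."""
        if self.clock() - self.last_kick > self.timeout_s:
            self.on_expire()
            self.last_kick = self.clock()  # restart timing after recovery
            return True
        return False
```

The watchdog needs no knowledge of why the task stalled, whether through a software bug or a hardware upset; deciding what "recover" means is left to the system layer, which supplies `on_expire`.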
36
The Real Conclusions
§ Systems are what End-Customers buy; they expect them to be Dependable Enough
§ A subjective concept; which is Application, State and Context dependent (& Technology independent)
§ Commercial Components (HW/SW) will be the building blocks of Dependable Systems
§ Commercial use gives us the Technologies which we are economically bound to use today
§ Though they work better than we would rightly expect, we cannot quantify their quality
§ Improving their Quality/Reliability/Dependability helps; but 100% is an asymptotic goal!
§ The System Knows what the System Wants
§ So: System behaviour and robustness must be handled at the System-Level (Top-Level); only it can know the expected action and appropriate corrective action for its domain.
§ And: Because of the size of the Functional and Non-Functional Space, conformance cannot be measured; so it will require a Policy Based approach.
... Meanwhile systems that people depend on will be produced
... The Commercial Imperative can’t/won't wait for the 'right methodology'
37
The END Is Very Nigh ...
Pdf & SlideCast through http://ianp24.blogspot.com