![Page 1: Revisiting Resource Partitioning for Multi -core Chipsrtsl-edge.cs.illinois.edu/CMAAS17/media/talk_1.pdfRevisiting Resource Partitioning for Multi -core Chips: ... AUTOSAR (automotive](https://reader036.vdocument.in/reader036/viewer/2022081515/5ad000ca7f8b9a4e7a8d6144/html5/thumbnails/1.jpg)
Revisiting Resource Partitioning for Multi-core Chips: Integration of Shared Resource Partitioning on a Commercial RTOS
21 Apr. 2017
PAK, EUNJI
Senior researcher, ETRI (Electronics and Telecommunications Research Institute)
CMAAS’2017

Agenda
• Qplus-AIR, a commercial RTOS
• Comprehensive shared resource partitioning implementation on Qplus-AIR

Qplus-AIR
ARINC 653 compliant RTOS
Certifiable for DO-178B Level A

Introduction to Qplus-AIR
• Qplus-AIR
  ◦ Developed by ETRI for safety-critical systems (2010~2012)
  ◦ Main operating system for the IFCC (Integrated Flight Control Computer) of a UAV (Unmanned Aerial Vehicle), KAI
    ◦ Integrates MC (Mission Control), FC (Flight Control), and C&C (Communications and Commands) in the IFCC
  ◦ ARINC 653 compliant RTOS*
  ◦ Robust partitioning among applications
    ◦ Spatial and temporal
    ◦ Prevents cross-application influence and error propagation among applications
    ◦ Easy integration of multiple applications with different degrees of criticality
* Airlines Electronic Engineering Committee, Avionics Application Software Standard Interface, ARINC Specification 653 Part 1, 2006

Introduction to Qplus-AIR
• Qplus-AIR
  ◦ Certifiable package for DO-178B Level A
  ◦ Lightweight ARINC 653 support: kernel-level implementation
  ◦ Support for multicore platforms (2014~)
• RTWORKS
  ◦ A commercial version of Qplus-AIR
  ◦ Managed by RTST (2013~), ETRI's spin-off company
    ◦ Started with 4 developers and now has 11 OS developers
  ◦ AUTOSAR (automotive industry standard) and ISO 26262 ASIL D are in progress
• ETRI focuses on research issues while RTST focuses on commercialization

Application Examples
• Safety-critical industrial applications
  ◦ Integrated flight control computer of an unmanned aerial vehicle, 2010~2012
  ◦ Tiltrotor flight control computer, 2012
  ◦ Nuclear power plant control system, 2013
  ◦ HUMS (Health and Usage Monitoring System) for helicopters, 2013~2016
  ◦ Subway screen-door control system, 2016 (exported to Brazil)
  ◦ Communication system of self-propelled guns, 2017~
  ◦ (project) Autonomous driving car, 2015~

Comprehensive shared resource partitioning implementation on Qplus-AIR

Contents
• Introduction
• HW platform: P4080
• Comprehensive resource partitioning implementation
  ◦ Memory bus bandwidth partitioning
  ◦ DRAM bank partitioning
  ◦ Shared cache partitioning: set-based / way-based
• Combining all the techniques on Qplus-AIR
• Evaluations
• Conclusions & Future Work

Introduction [1/2]
• Robust partitioning among applications (partitions)
  ◦ Qplus-AIR supports spatial and temporal partitioning
  ◦ Ensures independent execution of multiple applications with various safety-criticality levels
• Robust partitioning may no longer be valid on multicore
  ◦ Multiple cores share hardware resources such as cache or memory
  ◦ Concurrently executing applications affect each other due to contention on shared resources
    ◦ Major source of timing variability
    ◦ Pessimistic WCET estimation → overprovisioning of hardware resources and low system utilization
  ◦ In safety-critical systems, we had to turn off all but one core

Introduction [2/2]
• We must deal with resource contention properly
  ◦ So that the WCET of tasks stays guaranteed and tightly bounded
  ◦ Especially for safety-critical applications that require certification
• Requirement of inter-core interference mitigation
  ◦ "The applicant has identified the interference channels that could permit interference to affect the software applications hosted on the MCP cores, and has verified the applicant's chosen means of mitigation of the interference." - FAA CAST (Certification Authorities Software Team)-32A Position Paper*
• Comprehensive shared resource partitioning implementation on an ARINC 653 compliant RTOS
  ◦ Integrate a number of resource partitioning schemes, each of which targets a different shared hardware resource, on Qplus-AIR
  ◦ Unique challenges due to the fact that the RTOS did not support Linux-like dynamic paging
* Certification Authorities Software Team, Position Paper CAST-32A: Multi-core Processors, 2016.

HW platform, P4080 [1/2]
• P4080 architecture*
  ◦ Eight PowerPC e500mc cores
    ◦ Each core has a private 32 KB-I/32 KB-D L1 cache and a 128 KB L2 cache
  ◦ Two 32-way 1 MB L3 caches with cache-line interleaving
  ◦ Two memory controllers for two 2 GB DDR DIMM modules (each DIMM module has 16 DRAM banks)
  ◦ CoreNet coherency fabric: a high-bandwidth switch that interconnects the cores and other SoC modules and supports several concurrent transactions
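[Figure: P4080 block diagram. Each PowerPC e500mc core (L1 caches, L2 cache, CoreNet interface) attaches to the CoreNet fabric, together with two L3 caches, two DDR controllers with their DIMM modules, and peripherals such as DUART, GPIO, FMan, BMan, and QMan.]
* P4080 QorIQ Integrated Processor Hardware Specifications, Feb 2014.
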
HW platform, P4080 [2/2]
• Partitioning support of recent PowerPC processors*

[Excerpt from the Freescale document*]
Overall partitioning model
Figure 4. Example of a Partitioned System
In this model, there are four distinct partitions, each running on two cores. The main memory is divided into several physical regions:
• Private
• Shared between partitions; accessible at user level
• Shared among partitions; restricted to hypervisor level
This mapping is enforced by the cores’ MMUs accessible only at the hypervisor level. System peripherals (PCIe and sRIO) in this example are not shared -- each is allocated to a partition usage. As such, the hypervisor is able to restrict their DMA-accessible memory range to some part of the memory region assigned to the partition through the MMU.
The shared internal memory (CPC) is partially partitioned, which provides two partition-specific sub-ranges.
NOTE: This CPC allocation can be done per-way. Each way is configured to work either as a cache or as a fixed-address sRAM.
1.7 Hypervisors
Several hypervisor technologies are proposed for the P4080 to address different purposes.
RTOS suppliers, such as GreenHills, SysGo and WindRiver, have developed their own hypervisor technology with particular focus on safety and robust partitioning.
* Hardware Support for Robust Partitioning in Freescale QorIQ Multicore SoCs (P4080 and derivatives)
• Main memory is divided into several physical regions
  ◦ Private
  ◦ Shared between partitions; accessible at user level
  ◦ Shared among partitions; restricted to hypervisor level
• This mapping is enforced by the cores' MMUs
• System peripherals are not shared
  ◦ The hypervisor is able to restrict their DMA-accessible memory range to some part of the memory region (through the MMU)
• The CPC is partitioned
  ◦ Way partitioning (32 KB per way)
• Each core is allocated to a partition
• Restrict the coherency overhead
  ◦ Disable coherency to prevent snoop overhead
  ◦ Specify a group participating in coherency

Resource partitioning mechanisms
• 1. Memory bus (interconnect) bandwidth partitioning
• 2. Memory bank partitioning
• Shared cache partitioning
  ◦ 3. Set-based cache partitioning with page coloring
  ◦ 4. Way-based cache partitioning with the support of P4080 hardware
• Combine all the techniques and integrate them on Qplus-AIR
• Paging
  ◦ Memory bank partitioning and set-based cache partitioning assume that the OS supports Linux-like paging
  ◦ Paging implementation in Qplus-AIR

Resource partitioning mechanisms
Memory bus bandwidth regulator [1/2]
• Bus bandwidth regulator*
  ◦ Limits the bandwidth usage per core
[Figure: bandwidth regulation of Core 1 and Core 2 on the memory bus (CoreNet fabric), in four steps: 1) set the memory bus bandwidth budget; 2) count the number of requests sent to the memory bus (#/10 per core); 3) generate an interrupt (Core 1 at 10/10, Core 2 at 3/10); 4) throttle the requests from Core 1.]
* H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. Sha. Memory bandwidth management for efficient performance isolation in multi-core platforms. IEEE Transactions on Computers, 65:562–576, 2015.

Resource partitioning mechanisms
Memory bus bandwidth regulator [2/2]
• Implementation
  ◦ Set up the budget and configure the hardware to generate an interrupt when a core exhausts its budget
    ◦ Configure the performance monitoring control registers and performance monitoring counters (PMCs)
  ◦ The OS scheduler throttles further execution on that core
    ◦ Implement an interrupt handler for the interrupt that the PMC generates
    ◦ The scheduler de-schedules the tasks on the core
• Period of bandwidth regulator execution
  ◦ If the period is too short, overhead becomes excessive; if it is too long, predictability worsens
  ◦ The default period of our implementation is 5 ms

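
The budget, count, interrupt, and throttle cycle of the regulator can be illustrated with a small software model. This is only a sketch: the class and function names are hypothetical, the counter overflow interrupt is modeled as a flag, and the real implementation programs the performance monitoring registers and de-schedules tasks in the scheduler.

```python
from dataclasses import dataclass


@dataclass
class CoreRegulator:
    """Software model of a per-core memory bandwidth regulator (illustrative only)."""
    budget: int              # memory requests allowed per regulation period
    used: int = 0            # requests counted so far in the current period
    throttled: bool = False  # True once the budget is exhausted

    def on_period_start(self) -> None:
        # A new regulation period (5 ms by default on the slide) replenishes the budget.
        self.used = 0
        self.throttled = False

    def on_memory_request(self) -> bool:
        """Return True if the request reaches the bus, False if the core is throttled."""
        if self.throttled:
            return False
        self.used += 1
        if self.used >= self.budget:
            # Stands in for the counter interrupt: the handler asks the
            # scheduler to de-schedule the tasks on this core.
            self.throttled = True
        return True


def simulate(budget: int, demand_per_period: int, periods: int) -> list:
    """Count how many requests a core actually issues in each period."""
    reg = CoreRegulator(budget)
    issued = []
    for _ in range(periods):
        reg.on_period_start()
        issued.append(sum(1 for _ in range(demand_per_period) if reg.on_memory_request()))
    return issued
```

For example, `simulate(10, 25, 3)` models a core that wants 25 requests per period against a budget of 10, while `simulate(10, 3, 2)` models a core that stays under its budget and is never throttled.
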
Resource partitioning mechanisms
Bank-aware memory allocation
• DRAM bank-aware memory allocation*
  ◦ Manages memory allocation in such a way that no application shares its memory banks with applications running on other cores
[Figure: bank-aware allocation. 1) Application 1 (Core 1) and Application 2 (Core 2) request memory; 2) the OS allocates physical memory so that, through the page table (virtual-to-physical address translation) and the HW MMU, each application's virtual memory maps to physical memory in its own DRAM bank (Bank 1 or Bank 2).]
* H. Yun, R. Mancuso, Z.-P. Wu, and R. Pellizzoni. PALLOC: DRAM bank-aware memory allocator for performance isolation on multicore platforms. In RTAS, 2014.
[Figure: P4080 memory address mapping over address bits 0 to 31, marking the bit fields for banks, L3 cache sets, and L2 cache sets (surviving bit labels: 6, 7, 12, 14, 16, 18).]

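
A bank-aware allocator of this kind can be sketched as per-bank free lists of 4 KB pages, with each core drawing only from its own banks. The bank bit positions below are placeholders, since the address-mapping figure did not survive extraction; `BANK_SHIFT`, `build_free_lists`, and `alloc_page` are hypothetical names, and only the count of 16 banks per DIMM comes from the slides.

```python
PAGE_SIZE = 4096   # 4 KB pages, as required for page-granular allocation
BANK_SHIFT = 14    # ASSUMPTION: lowest bank-select address bit (placeholder)
BANK_BITS = 4      # 16 DRAM banks per DIMM module (from the slides)


def bank_of(phys_addr: int) -> int:
    """Extract the DRAM bank index from a physical address."""
    return (phys_addr >> BANK_SHIFT) & ((1 << BANK_BITS) - 1)


def build_free_lists(mem_size: int) -> dict:
    """Sort every free physical page into the free list of the bank it maps to."""
    free = {bank: [] for bank in range(1 << BANK_BITS)}
    for addr in range(0, mem_size, PAGE_SIZE):
        free[bank_of(addr)].append(addr)
    return free


def alloc_page(free: dict, allowed_banks: set) -> int:
    """Allocate one page, drawing only from the banks assigned to the caller's core."""
    for bank in sorted(allowed_banks):
        if free[bank]:
            return free[bank].pop(0)
    raise MemoryError("no free page left in the banks assigned to this core")
```

With `free = build_free_lists(1 << 20)`, a core restricted to banks `{0, 1}` and another restricted to `{2, 3}` never receive pages from the same bank, which is the isolation property the slide describes.
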
Resource partitioning mechanisms
Set-based cache partitioning [1/2]
• Set-based partitioning via page coloring*
  ◦ Allocation of physical memory considering the cache set location
  ◦ $\text{number of colors} = \dfrac{\text{cache size}}{\text{page size} \times \text{cache associativity}}$
[Figure: page coloring. 1) Application 1 (Core 1) and Application 2 (Core 2) request memory; 2) the RTOS allocates physical memory in DRAM so that each application uses its own part of the cache. An address diagram over bits 0 to 31 marks the colors, where the L3 cache set index overlaps the physical page number.]
* R. Mancuso, R. Dudko, E. Betti, M. Cesati, M. Caccamo, and R. Pellizzoni. Real-time cache management framework for multi-core architectures. In RTAS, 2013.
* M. Chisholm, B. C. Ward, N. Kim, and J. H. Anderson. Cache sharing and isolation tradeoffs in multicore mixed-criticality systems. In RTSS, 2015.

Resource partitioning mechanisms
Set-based cache partitioning [2/2]
• Implementation
  ◦ Manipulates the virtual-to-physical address mapping to allocate disjoint cache sets to each core
  ◦ Among address bits [15:7], the cache set index, it exploits bits [15:12], which intersect with the physical page number on the P4080
• L2 co-partitioning and restrictions of set-based partitioning
  ◦ Co-partitioning of the L2 cache
    ◦ The L3 cache partition (color) is determined by bits [15:12] and the L2 cache set by bits [13:6]
    ◦ Using bits [13:12] has the side effect of co-partitioning the L2 cache
  ◦ Only bits [15:14] are allowed for L3 cache set partitioning
    ◦ The number of cache partitions is limited to 4
    ◦ If we adopt this for 8 cores, some cache sets are inevitably shared by 2 cores
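[Figure: P4080 memory address mapping over address bits 0 to 31, marking the bit fields for banks, L3 cache sets, and L2 cache sets (surviving bit labels: 6, 7, 12, 14, 16, 18).]
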
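
The color count and the bit fields named on these slides can be checked with a few lines of arithmetic. This is a sketch with hypothetical helper names; treating the two cache-line-interleaved 1 MB L3 caches as one 2 MB, 32-way cache is an assumption made here for the calculation, chosen because it yields 16 colors, which matches the four color bits [15:12].

```python
KB = 1024
MB = 1024 * KB


def number_of_colors(cache_size: int, page_size: int, associativity: int) -> int:
    """number of colors = cache size / (page size * cache associativity)."""
    return cache_size // (page_size * associativity)


def bit_field(addr: int, hi: int, lo: int) -> int:
    """Return address bits [hi:lo] as an integer."""
    return (addr >> lo) & ((1 << (hi - lo + 1)) - 1)


def l3_color(phys_addr: int) -> int:
    """Color from bits [15:12], where the L3 set index overlaps the page number."""
    return bit_field(phys_addr, 15, 12)


def l3_partition(phys_addr: int) -> int:
    """Partition from bits [15:14] only, avoiding bits [13:12] shared with the L2 index."""
    return bit_field(phys_addr, 15, 14)


# ASSUMPTION: aggregate L3 capacity of 2 MB (two interleaved 1 MB caches), 32 ways.
L3_COLORS = number_of_colors(2 * MB, 4 * KB, 32)
```

Restricting the choice to bits [15:14] leaves 2 bits, so at most 4 distinct L3 partitions, which is the limit stated on the slide.
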
Resource partitioning mechanisms
Way-based cache partitioning [1/2]
• Way-based partitioning with hardware-level support
  ◦ Configure main memory with multiple distinct partitions
    ◦ For each partition, register the (memory range, target, partition ID) tuple in a LAW (Local Access Window) register
  ◦ Partition the L3 cache and allocate disjoint cache ways to each core
    ◦ Configure the L3 cache (CPC) related registers so that transactions from the specified partition can allocate blocks in the designated cache ways
    ◦ E.g., transactions from 'partition 1' allocate blocks in 'ways 0, 1, 2, 3'
[Figure: way-based partitioning. Four e6500 cores (each with an L1 cache and MMU, sharing a 2 MB banked L2 cache) connect through the CoreNet coherency fabric. Local Access Windows divide physical memory (DDR3 DRAM) into Part. 1 to Part. 4 plus a shared region, and the CPC configuration register assigns CPC (L3 cache) ways to Part. 1 to Part. 4.]

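
The idea of giving each partition a disjoint set of cache ways can be sketched with bitmasks. The encoding below (bit i stands for way i) and the function names are hypothetical and do not reproduce the actual LAW or CPC register layout; the 32 ways of 32 KB each are taken from the slides.

```python
NUM_WAYS = 32      # the L3 cache (CPC) is 32-way
WAY_SIZE_KB = 32   # each way provides 32 KB


def assign_ways(ways_per_partition: dict) -> dict:
    """Map {partition_id: way_count} to {partition_id: bitmask} with disjoint, contiguous ways."""
    masks = {}
    next_way = 0
    for pid, count in ways_per_partition.items():
        if count <= 0 or next_way + count > NUM_WAYS:
            raise ValueError("way request does not fit in the 32-way cache")
        masks[pid] = ((1 << count) - 1) << next_way  # bit i set means way i is usable
        next_way += count
    return masks


def capacity_kb(mask: int) -> int:
    """Cache capacity available to a partition with the given way mask."""
    return bin(mask).count("1") * WAY_SIZE_KB
```

For instance, `assign_ways({1: 4, 2: 4, 3: 4, 4: 4})` gives partition 1 ways 0 to 3, as in the slide's example, and each partition receives 128 KB.
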
Resource partitioning mechanisms
Way-based cache partitioning [2/2]
• Relaxed restrictions on the number of cache partitions
  ◦ With set-based cache partitioning, the number of cache partitions is restricted to at most four
  ◦ The P4080 supports cache partitioning with per-way granularity, with each way providing 32 KB
    ◦ The L3 cache is 32-way and can be partitioned into 32 parts
• Limitations of way-based cache partitioning
  ◦ Way-based cache partitioning cannot be used with set-based cache partitioning or memory bank partitioning
    ◦ Conflicting requirements for memory allocation
    ◦ Sequential vs. interleaved
    ◦ May be relevant to all other PowerPC chip models
  ◦ Cache way locking allows integration
    ◦ Most ARM processors support cache way locking
    ◦ The PowerPC e500mc processor supports cache locking at block granularity
[Figure: two memory layouts compared. Sequential: Part. 1 (core 1) followed by Part. 2 (core 2), vs. interleaved: Part. 1, Part. 2, Part. 1, Part. 2, Part. 1, Part. 2.]

Implementation issues
From the perspective of an RTOS [1/4]
• Challenges: paging
  ◦ Page coloring assumes that the OS manages memory with fixed-size pages (normally 4 KB)
  ◦ Qplus-AIR deliberately avoids paging because timing predictability worsens when a TLB miss occurs within a paging scheme
• Memory management of Qplus-AIR
  ◦ Managed with variable-sized pages rather than fixed 4 KB pages
    ◦ Kernel data/code, partition regions
    ◦ Manages each region as one large page: 1 TLB entry for each region
    ◦ The OS locks the entry in the TLB, forcing all the mapping data to stay in the TLB
  ◦ The size of memory for each application is configured by developers
  ◦ The MMU is used to prevent cross-application memory accesses
[Figure: memory layout with the regions Kernel data, Kernel code, Partition 1, Partition 2, and Partition 3, and example region sizes of 16 MB, 16 MB, 16 MB, 64 MB, and 64 MB.]

Implementation issues
From the perspective of an RTOS [2/4]
• Memory management in the P4080
  ◦ Two levels of MMU
    ◦ Hardware-managed L1 MMU
    ◦ Software-managed L2 MMU
  ◦ Each MMU consists of
    ◦ A TLB for variable-sized pages (VSP), with 11 different page sizes (4 KB~4 GB)
    ◦ A TLB for 4 KB fixed-sized pages (FSP)
    ◦ TLB locking for variable-sized pages
• Modified memory management of Qplus-AIR
  ◦ To support page coloring, which is used to implement memory bank partitioning and set-based cache partitioning
    ◦ Applications' memory regions are managed with 4 KB granularity
    ◦ Management of kernel regions is unchanged, to keep the performance of kernel execution predictable
[ref.] PowerPC e500mc core reference manual

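
The cost of moving from one large page per region to 4 KB pages can be made concrete by counting TLB entries. In this sketch the function names are hypothetical, and the assumption that the 11 variable page sizes grow by powers of four is mine; it is consistent with the slide's range of 4 KB to 4 GB.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def variable_page_sizes() -> list:
    """ASSUMPTION: the 11 variable page sizes are 4 KB * 4**k for k = 0..10."""
    return [4 * KB * 4 ** k for k in range(11)]


def entries_with_fixed_pages(region_bytes: int, page_size: int = 4 * KB) -> int:
    """Number of mappings needed to cover a region with fixed-size pages (rounded up)."""
    return -(-region_bytes // page_size)


def fits_in_one_variable_page(region_bytes: int) -> bool:
    """True if the region can be covered by a single locked variable-size entry."""
    return region_bytes in variable_page_sizes()
```

Under these assumptions a 16 MB or 64 MB region, as in the example layout, needs one locked entry with variable-sized pages but 4096 or 16384 mappings with 4 KB pages, far more than any TLB can hold at once.
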
Implementation issues
From the perspective of an RTOS [3/4]
• Overhead of paging
  ◦ 'Latency' benchmark with varying data size and access pattern
    ◦ Sequential access and random access of a linked list
  ◦ Measure the average memory access latency
[Figure: paging overhead (sequential access). Average memory latency (y-axis, 0 to 90) vs. data size in KB (x-axis, 0 to 10000), for "no paging" and "paging".]
[Figure: paging overhead (random access). Average memory latency (y-axis, 0 to 300) vs. data size in KB (x-axis, 0 to 10000), for "no paging" and "paging".]
Up to 6% overhead when data size > 2 MB
[note] TLB hit ratio = 98.43%; the L2 TLB has 512 entries
Up to 197% overhead when data size > 2 MB

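
One plausible reading of the 2 MB threshold is TLB reach: 512 entries of 4 KB pages cover exactly 2 MB. The sketch below only illustrates that arithmetic. It assumes every L2 TLB entry maps a 4 KB data page and ignores entries used for code or the kernel, and the function names are hypothetical.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries: int, page_size: int) -> int:
    """Amount of memory that can be mapped at once by the TLB."""
    return entries * page_size


def pages_touched(data_size: int, page_size: int = 4 * KB) -> int:
    """Number of distinct pages a benchmark touches for a given data size."""
    return -(-data_size // page_size)


def working_set_fits(data_size: int, entries: int = 512, page_size: int = 4 * KB) -> bool:
    """True if all pages of the working set can stay resident in the TLB."""
    return pages_touched(data_size, page_size) <= entries
```

Data sets larger than `tlb_reach(512, 4 * KB)` cannot keep all their translations resident, so TLB misses start to add to the measured latency.
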
Implementation issues
From the perspective of an RTOS [4/4]
• Analysis of overhead
  ◦ Degradation is due to the MMU architecture of the e500mc core
    ◦ L1 instruction and data TLBs and an L2 unified TLB
    ◦ The L1 MMU is controlled as an inclusive cache of the L2 MMU
    ◦ In the PowerPC e6500 core, the L1 and L2 MMUs are not inclusive
• Requirements for predictable paging
  ◦ Some studies have focused on predictable paging*
  ◦ COTS hardware provides means for implementing predictable paging: a software-managed TLB or TLB locking
[Figure: L1 TLB (I-TLB and D-TLB) and L2 TLB contents in three stages, with TLB entries for code and for data. As the data size increases, data entries evict (replace out) instruction TLB entries from the L2 TLB; the matching L1 entries are invalidated (inclusion property), causing an L1 I-TLB miss even if the code size is within the L1 I-TLB coverage.]
* D. Hardy and I. Puaut. Predictable code and data paging for real time systems. In ECRTS, 2008.
* T. Ishikawa, T. Kato, S. Honda, and H. Takada. Investigation and improvement on the impact of TLB misses in real-time systems. In OSPERT, 2013.

Resource partitioning mechanisms
Integration of partitioning schemes
• Four techniques with paging
  ◦ Memory bus partitioning (RP_BUS), memory bank partitioning (RP_BANK), set-based cache partitioning (RP_$SET), and way-based cache partitioning (RP_$WAY)
• Integration of memory bus, memory bank, and set-based and way-based cache partitioning mechanisms
  ◦ Note that way-based cache partitioning cannot be integrated with memory bank partitioning or set-based cache partitioning
• Possible integration options
  ◦ Integration option #1: RP_BUS, RP_BANK, and RP_$SET
    ◦ Restrictions on the number of available cache partitions
  ◦ Integration option #2: RP_BUS and RP_$WAY
    ◦ Contention on memory banks is unavoidable

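
The two integration options follow directly from the stated incompatibilities. As a sketch, with hypothetical names, the conflict rule can be encoded and every combination enumerated to recover the largest valid sets.

```python
from itertools import combinations

SCHEMES = ("RP_BUS", "RP_BANK", "RP_$SET", "RP_$WAY")

# Way-based cache partitioning conflicts with bank and set-based partitioning
# because their memory allocation requirements differ (sequential vs. interleaved).
CONFLICTS = {
    frozenset({"RP_$WAY", "RP_BANK"}),
    frozenset({"RP_$WAY", "RP_$SET"}),
}


def is_valid(combo) -> bool:
    """A combination is valid if no pair of its schemes is in conflict."""
    return not any(frozenset(pair) in CONFLICTS for pair in combinations(combo, 2))


def maximal_valid_combinations() -> list:
    """Valid combinations that are not strict subsets of another valid combination."""
    valid = [
        set(c)
        for r in range(1, len(SCHEMES) + 1)
        for c in combinations(SCHEMES, r)
        if is_valid(c)
    ]
    return [v for v in valid if not any(v < w for w in valid)]
```

The maximal sets are exactly option #1 (RP_BUS, RP_BANK, RP_$SET) and option #2 (RP_BUS, RP_$WAY).
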
Evaluations [1/5]
• Evaluation setup
  ◦ Hardware platform
    ◦ P4080 with 4 or 8 of the total 8 cores activated
  ◦ Software platform
    ◦ Qplus-AIR
  ◦ Synthetic benchmarks
    ◦ Latency: traverses a linked list to perform a read/write operation on each node; memory requests are made one at a time
    ◦ Bandwidth: accesses memory in sequence with no data dependency between consecutive accesses; the CPU generates multiple memory requests in parallel, maximizing the memory-level parallelism (MLP) available in the memory system
  ◦ Metric
    ◦ Average memory access latency (ns): time to read/write one block (64 B)
    ◦ Average latency is normalized to the best case without resource contention

Evaluations [2/5]
• Evaluation setup
  ◦ Two benchmark mixes
    ◦ 4-core MIX
      ◦ Causes contention on all the memory resources to evaluate each partitioning mechanism and the integrated one
    ◦ 8-core MIX
      ◦ Shows the limitation of set-based cache partitioning
  ◦ Data size configuration

4-core MIX
| Core 1 | Core 2, 3 | Core 4 |
|---|---|---|
| Latency (512 KB) | Bandwidth (4 MB) | Bandwidth (32 MB) |

8-core MIX
| Core 1, 2 | Core 3, 4, 5, 6 | Core 7, 8 |
|---|---|---|
| Latency (512 KB) | Bandwidth (4 MB) | Bandwidth (32 MB) |

| Data size | | Examples (platform: 2 MB LLC on 4-core CPU) | Cache (LLC) hit rate |
|---|---|---|---|
| LLC | Size of LLC divided by number of cores | 2 MB / 4 cores = 512 KB | 100% |
| DRAM/small | Twice the size of LLC | 2 MB × 2 = 4 MB | 0% |
| DRAM/large | Significantly larger than LLC | Much larger than 2 MB (32 MB in our experimental setup) | 0% |

Evaluations [3/5]
[Figure: bar chart of normalized performance per core (y-axis 0.2 to 1.1) for configurations (a) to (e); 1 is the performance w/o interference. Values:]

| | (a) WORST | (b) RP_BANK | (c) RP_BANK + RP_$SET | (d) RP_BANK + RP_$SET + RP_BUS | (e) BEST |
|---|---|---|---|---|---|
| core 1 | 0.41 | 0.55 | 0.97 | 0.97 | 1.00 |
| core 2 | 0.49 | 0.57 | 0.62 | 0.78 | 1.00 |
| core 3 | 0.50 | 0.57 | 0.62 | 0.79 | 1.00 |
| core 4 | 0.93 | 0.87 | 0.87 | 0.85 | 1.00 |

• 4-core MIX, integration option #1
  ◦ RP_BANK, RP_$SET, and RP_BUS
  ◦ (b) RP_BANK: all the cores are enabled to access banks in parallel
  ◦ (c) Adding RP_$SET ensures 512 KB of L3 cache for the Latency (LLC) app running on core 1 (56% improvement compared to the worst case)
    ◦ Moreover, the fewer main memory accesses requested by core 1 help performance on the other cores
  ◦ (d) Adding RP_BUS: performance when all techniques are put together

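
The per-core effect of each added mechanism can be read off the table as a difference in normalized performance. The sketch below uses the slide's values for the 4-core MIX with integration option #1; the helper name `gain` is hypothetical. The 56% figure quoted for core 1 appears to correspond to this difference (0.97 versus 0.41), that is, a gain in normalized-performance points.

```python
CONFIGS = ["WORST", "RP_BANK", "RP_BANK+RP_$SET", "RP_BANK+RP_$SET+RP_BUS", "BEST"]

# Normalized performance from the slide (1.00 is the performance without interference).
TABLE = {
    "core1": [0.41, 0.55, 0.97, 0.97, 1.00],
    "core2": [0.49, 0.57, 0.62, 0.78, 1.00],
    "core3": [0.50, 0.57, 0.62, 0.79, 1.00],
    "core4": [0.93, 0.87, 0.87, 0.85, 1.00],
}


def gain(core: str, before: str, after: str) -> float:
    """Change in normalized performance between two configurations, rounded to 2 places."""
    row = TABLE[core]
    return round(row[CONFIGS.index(after)] - row[CONFIGS.index(before)], 2)
```

For example, `gain("core2", "RP_BANK+RP_$SET", "RP_BANK+RP_$SET+RP_BUS")` shows what bus regulation adds for a Bandwidth application, while the same call for `"core4"` shows a small loss on the core that was least affected by interference to begin with.
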
Evaluations [4/5]
• 4-core MIX, integration option #2
  ◦ RP_$WAY and RP_BUS
  ◦ RP_BANK is inapplicable
    ◦ In this benchmark, memory accesses are not concentrated on a single bank since RP_$WAY allocates memory to each core sequentially
    ◦ However, the worst case could arise depending on task behavior
  ◦ RP_$WAY vs. RP_$SET
    ◦ Paging overhead on the RTOS degrades performance
    ◦ 3%, 16%, 17%, and 13% for the applications on cores 1, 2, 3, and 4, respectively
[Figure: bar chart of normalized performance per core (y-axis 0.2 to 1.1) for configurations (a) to (d), shown next to the (c) RP_BANK + RP_$SET result from integration option #1; 1 is the performance w/o interference. Values:]

| | (a) WORST | (b) RP_$WAY | (c) RP_$WAY + RP_BUS | (d) BEST |
|---|---|---|---|---|
| core 1 | 0.41 | 1.00 | 1.00 | 1.00 |
| core 2 | 0.49 | 0.78 | 0.91 | 1.00 |
| core 3 | 0.50 | 0.79 | 0.91 | 1.00 |
| core 4 | 0.93 | 1.01 | 0.89 | 1.00 |

Evaluations [5/5]
• 8-core MIX, Integration #1
  ◦ Restrictions on the number of possible cache partitions
    ◦ RP_$SET: 4 partitions; RP_$WAY: 32 partitions on the P4080 platform
    ◦ Performance of Latency (LLC) is about 64% and 88% with RP_$SET and RP_$WAY, respectively
  ◦ Overhead of paging
    ◦ Compare the performance in (b) and (d), or (c) and (e)
[Figure: bar chart of normalized performance for cores 1 to 8 (y-axis 0 to 1.2) for configurations (a) to (f); 1 is the performance w/o interference. Values:]

| | (a) WORST | (b) RP_BANK + RP_$SET | (c) RP_BANK + RP_$SET + RP_BUS | (d) RP_$WAY | (e) RP_$WAY + RP_BUS | (f) BEST |
|---|---|---|---|---|---|---|
| core 1 | 0.37 | 0.64 | 0.64 | 0.88 | 0.87 | 1.00 |
| core 2 | 0.37 | 0.64 | 0.63 | 0.88 | 0.86 | 1.00 |
| core 3 | 0.30 | 0.42 | 0.54 | 0.52 | 0.71 | 1.00 |
| core 4 | 0.30 | 0.42 | 0.54 | 0.52 | 0.71 | 1.00 |
| core 5 | 0.30 | 0.42 | 0.54 | 0.53 | 0.71 | 1.00 |
| core 6 | 0.30 | 0.42 | 0.54 | 0.53 | 0.71 | 1.00 |
| core 7 | 0.82 | 0.75 | 0.74 | 0.94 | 0.79 | 1.00 |
| core 8 | 0.82 | 0.74 | 0.73 | 0.94 | 0.79 | 1.00 |

Conclusions & Future Work
• Conclusions
  ◦ Qplus-AIR, an ARINC 653 compliant RTOS
  ◦ Comprehensive shared resource partitioning implementation on an ARINC 653 compliant RTOS, Qplus-AIR
    ◦ Issues in implementing and combining multiple resource partitioning mechanisms
    ◦ The unique challenges we encountered because the RTOS did not support Linux-like dynamic paging
• Future Work
  ◦ Predictable paging
  ◦ Evaluation with real-world applications

References [1/2]
[1] Airlines Electronic Engineering Committee, Avionics Application Software Standard Interface, ARINC Specification 653 Part 1, 2006.
[2] BIOS and kernel developer's guide for AMD family 15h processors, March 2012.
[3] ARM Cortex-A53 Technical Reference Manual, 2014.
[4] P4080 QorIQ Integrated Processor Hardware Specifications, Feb 2014.
[5] Certification Authorities Software Team, Position Paper CAST-32A: Multi-core Processors, 2016.
[6] QorIQ T2080 Reference Manual, 2016.
[7] M. Chisholm, B. C. Ward, N. Kim, and J. H. Anderson. Cache sharing and isolation tradeoffs in multicore mixed-criticality systems. In RTSS, 2015.
[8] J. Flodin, K. Lampka, and W. Yi. Dynamic budgeting for settling DRAM contention of co-running hard and soft real-time tasks. In SIES, 2014.
[9] D. Hardy and I. Puaut. Predictable code and data paging for real time systems. In ECRTS, 2008.
[10] T. Ishikawa, T. Kato, S. Honda, and H. Takada. Investigation and improvement on the impact of TLB misses in real-time systems. In OSPERT, 2013.
[11] H. Kim, A. Kandhalu, and R. Rajkumar. A coordinated approach for practical OS-level cache management in multi-core real-time systems. In ECRTS, 2013.
[12] T. Kim, D. Son, C. Shin, S. Park, D. Lim, H. Lee, B. Kim, and C. Lim. Qplus-AIR: A DO-178B certifiable ARINC 653 RTOS. In The 8th ISET, 2013.

References [2/2]
[13] R. Mancuso, R. Dudko, E. Betti, M. Cesati, M. Caccamo, and R. Pellizzoni. Real-time cache management framework for multi-core architectures. In RTAS, 2013.
[14] M. D. Bennett and N. C. Audsley. Predictable and efficient virtual addressing for safety-critical real-time systems. In ECRTS, 2001.
[15] J. Nowotsch and M. Paulitsch. Leveraging multi-core computing architectures in avionics. In EDCC, 2012.
[16] J. Nowotsch, M. Paulitsch, D. Buhler, H. Theiling, S. Wegener, and M. Schmidt. Multi-core interference-sensitive WCET analysis leveraging runtime resource capacity enforcement. In ECRTS, 2014.
[17] S. A. Panchamukhi and F. Mueller. Providing task isolation via TLB coloring. In RTAS, 2015.
[18] M. K. Qureshi and Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO, 2006.
[19] R. E. Kessler and M. D. Hill. Page replacement algorithms for large real-indexed caches. In ACM Trans. on Comp. Sys., 1992.
[20] L. Sha, M. Caccamo, R. Mancuso, J.-E. Kim, and M.-K. Yoon. Single core equivalent virtual machines for hard real-time computing on multicore processors, white paper. 2014.
[21] N. Suzuki, H. Kim, D. de Niz, B. Anderson, L. Wrage, M. Klein, and R. Rajkumar. Coordinated bank and cache coloring for temporal protection of memory accesses. In ICCSE, 2013.
[22] H. Yun, R. Mancuso, Z.-P. Wu, and R. Pellizzoni. PALLOC: DRAM bank-aware memory allocator for performance isolation on multicore platforms. In RTAS, 2014.
[23] H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. Sha. Memory bandwidth management for efficient performance isolation in multi-core platforms. IEEE Transactions on Computers, 65:562–576, 2015.