A Memory Consistency Model For RISC-V Formally Evaluated with TriCheck
Caroline Trippel, Princeton University, November 29, 2016
Caroline Trippel, Yatin Manerkar, Daniel Lustig, Michael Pellauer, and Margaret Martonosi. “TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA”. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17).
Role of the Instruction Set Architecture (ISA)
• Introduced in 1964 by IBM
• 1 set of software
• >1 hardware implementations
• Definitive spec. of hardware as seen by software:
  • Specification of what hardware must implement
  • Target for compiler translation
[Diagram: Software/HLL, ISA, and Hardware stack, shown once for each case below]
• Weak PPO (e.g., ARM, POWER): more ordering primitives (e.g., fences/barriers) inserted by compiler
• Strong PPO (e.g., SC, TSO): fewer ordering primitives (e.g., fences/barriers) inserted by compiler
Our Work: Memory Consistency Model Verification
Software/HLL Memory Model
ISA Memory Model
Hardware Memory Model
Compilation
Microarchitectural Implementation: PipeCheck [Lustig et al. MICRO-47], CCICheck [Manerkar et al. MICRO-48]
COATCheck [Lustig et al. ASPLOS '16]
ArMOR [Lustig et al. ISCA '15]
Operating System
TriCheck [Trippel et al. ASPLOS '17]
Memory Model Bugs Observed in Practice
ARM Read-after-Read Hazard [Alglave et al. TOPLAS '14]
• Ambiguous ISA spec. regarding same-address Ld→Ld ordering
• ARM compilers did not insert synchronization primitives (e.g., fences/barriers)
• Some ARM implementations relaxed same-address Ld→Ld ordering (e.g., Cortex-A9, Snapdragon 805)
• C/C++ atomics require same-address Ld→Ld ordering
• ARM issued errata¹: Rewrite compilers to insert fences (with performance penalties)
We've identified and characterized flaws in the current RISC-V memory model (i.e., the memory model defined in the current manual) [Trippel et al. ASPLOS '17]
Note that the modifications to fix these issues will be mostly compatible with current implementations.
¹ ARM. Cortex-A9 MPCore, programmer advice notice, read-after-read hazards. ARM Reference 761319, 2011. http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf.
Outline
• Role of Memory Models in ISAs
• What Should We Require From the Hardware?
• What Fences/Barriers Do We Need to Support C/C++?
• TriCheck Framework for Full-Stack Memory Model Verification
• On-Going Work & Conclusions
Sequential Consistency
• Memory models specify the allowed behavior of a multithreaded program executing with shared memory
• First defined by [Lamport 1979], execution is the same as if:
  (R1) Memory ops of each processor appear in program order
  (R2) Memory ops of all processors were executed in some global sequential order

Program
Thread 0: x=1; y=1
Thread 1: r1=y; r2=x

Legal Executions
x=1; y=1; r1=y; r2=x
x=1; r1=y; y=1; r2=x
x=1; r1=y; r2=x; y=1
r1=y; r2=x; x=1; y=1
r1=y; x=1; r2=x; y=1
r1=y; x=1; y=1; r2=x
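The six legal executions can be reproduced mechanically. The following is a minimal sketch, assuming both locations start at 0 (the slide does not state initial values); the helper names `interleavings` and `run` are illustrative only. It merges the two threads in every way that keeps program order (R1) and runs each merge as one global sequence (R2).

```python
from itertools import combinations

# Each thread is a list of operations in program order.
# ("st", var, value) writes memory; ("ld", var, reg) reads memory into a register.
THREAD0 = [("st", "x", 1), ("st", "y", 1)]
THREAD1 = [("ld", "y", "r1"), ("ld", "x", "r2")]


def interleavings(t0, t1):
    """Yield every merge of t0 and t1 that keeps each thread's program order (R1)."""
    total = len(t0) + len(t1)
    for slots in combinations(range(total), len(t0)):
        it0, it1 = iter(t0), iter(t1)
        yield [next(it0) if i in slots else next(it1) for i in range(total)]


def run(schedule):
    """Execute one global sequential order (R2) against a single shared memory."""
    mem = {"x": 0, "y": 0}  # assumed initial values
    regs = {}
    for kind, var, arg in schedule:
        if kind == "st":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r1"], regs["r2"]


schedules = list(interleavings(THREAD0, THREAD1))
outcomes = {run(s) for s in schedules}
print(len(schedules), sorted(outcomes))  # 6 [(0, 0), (0, 1), (1, 1)]
```

Under these assumptions, the outcome r1=1, r2=0 never appears: seeing the new y but the old x would require the two stores or the two loads to be reordered.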
Two Categories of Memory Model Relaxation
Preserved Program Order: Defines program orderings that hardware must preserve by default
Store Atomicity: Defines order in which stores become visible to cores
• Multiple-copy atomic (e.g., monolithic memory):
  • All cores see store simultaneously
• Read-own-write-early multiple-copy atomic (e.g., private store buffer):
  • Storing core can read its own store before other cores
  • Stores made visible to all remote cores simultaneously
• Non-multiple-copy atomic (e.g., shared store buffer):
  • Storing core can read its own store before other cores
  • Store is made visible to some remote cores before others
RISC-V Proposed Preserved Program Order and Store Atomicity
Preserved Program Order:
Store Atomicity: Non-multiple-copy atomic:
• Storing core can read its own store before other cores
• Store is made visible to some remote cores before others
Effects of Non-Multiple-Copy Atomic Stores
Initial conditions: x=0, y=0
T0            T1            T2             T3
st [x] ← 1    st [y] ← 1    ld x → [r0]    ld y → [r2]
                            F R, R         F R, R
                            ld y → [r1]    ld x → [r3]
Non-SC Outcome: r0=1, r1=0, r2=1, r3=0
This outcome corresponds to the case in which the stores on threads T0 and T1 arrive at threads T2 and T3 in different orders
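The effect of the two stores arriving at T2 and T3 in different orders can be illustrated with a toy model. This is a sketch under stated assumptions, not a model of any real memory system: each reader core receives the two stores in some arrival order, and the fence is assumed to keep each reader's two loads in program order. The names `reader` and `outcomes` are hypothetical.

```python
from itertools import permutations

STORES = [("x", 1), ("y", 1)]  # T0's store and T1's store


def reader(first, second, arrival_order, reads_after):
    """Run a two-load thread whose fence keeps the loads in program order.

    arrival_order: order in which the two stores reach this core.
    reads_after:   (i, j) with i <= j; the first load runs after i stores
                   have arrived and the second load after j have arrived.
    """
    local = {"x": 0, "y": 0}
    results = []
    arrived = 0
    for load_var, when in zip((first, second), reads_after):
        while arrived < when:
            var, val = arrival_order[arrived]
            local[var] = val
            arrived += 1
        results.append(local[load_var])
    return tuple(results)


def outcomes(shared_order):
    """Collect (r0, r1, r2, r3).

    shared_order=True:  both readers see the stores arrive in the same order
                        (multiple-copy atomic).
    shared_order=False: each reader may see its own order
                        (non-multiple-copy atomic).
    """
    timings = [(i, j) for i in range(3) for j in range(i, 3)]
    seen = set()
    for order2 in permutations(STORES):
        for order3 in permutations(STORES):
            if shared_order and order2 != order3:
                continue
            for t2 in timings:
                for t3 in timings:
                    r0, r1 = reader("x", "y", order2, t2)
                    r2, r3 = reader("y", "x", order3, t3)
                    seen.add((r0, r1, r2, r3))
    return seen


NON_SC = (1, 0, 1, 0)
print(NON_SC in outcomes(shared_order=True))   # False
print(NON_SC in outcomes(shared_order=False))  # True
```

In this sketch the non-SC outcome becomes reachable only when the two readers are allowed to disagree on the order of the stores.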
Why Allow Non-Multiple-Copy Atomic Stores?
• Commercial ISAs allow non-multiple-copy atomic stores (e.g., ARM, POWER)
• RISC-V is intended to be integrated with other vendor ISAs
  • Potential deployment in non-multiple-copy atomic memory systems
  • If sharing memory system, awareness that stores may be observed in orders that differ from other cores
Fences to Restore Multiple-Copy Atomicity
Initial conditions: x=0, y=0
T0            T1            T2             T3
st [x] ← 1    st [y] ← 1    ld x → [r0]    ld y → [r2]
                            pscF RW, RW    pscF RW, RW
                            ld y → [r1]    ld x → [r3]
Non-SC Outcome: r0=1, r1=0, r2=1, r3=0
Predecessor-/Successor-Cumulative Fence: Necessary to Restore SC for Non-Multiple-Copy Atomic Memory Systems
Other Fences/Barriers/Ordering Primitives
• Baseline Memory Model
  • PPO requires same-address R-R order to be maintained
  • PPO requires order to be maintained between most dependent instructions
  • Predecessor-/Successor-Cumulative F RW, RW; F IO, IO; F IORW, IORW
• Baseline + Atomics Extension
  • Predecessor-Cumulative F RW, W
TriCheck Full-Stack Verification Framework
Suite of C/C++ Litmus Tests
Suite of Small C/C++ Programs
Compiler Mappings from C/C++ to RISC-V
C/C++ Herd Model; RISC-V Check Model
C/C++ Outcome Forbidden implies ISA-Level Outcome Forbidden
TriCheck compares HLL outcomes to ISA-level outcomes for a spectrum of legal ISA microarchitectures.
ISA DOES NOT ALLOW outcomes prohibited by C/C++
ISA ALLOWS outcomes prohibited by C/C++
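The comparison step can be sketched as a small classifier over the three result categories used in the charts (Bugs, Overly Strict, Equivalent). This is an illustrative sketch, not TriCheck's actual code: the function name `classify` and its boolean inputs are hypothetical, and the reading of "Overly Strict" as "the HLL permits the outcome but the implementation never exhibits it" is an assumption.

```python
from enum import Enum


class Verdict(Enum):
    BUG = "Bug"
    OVERLY_STRICT = "Overly Strict"
    EQUIVALENT = "Equivalent"


def classify(hll_permits: bool, isa_observable: bool) -> Verdict:
    """Compare one litmus-test outcome at the HLL level and at the ISA level.

    hll_permits:    the C/C++ model allows the outcome.
    isa_observable: the compiled test can produce the outcome on the
                    modelled microarchitecture.
    """
    if isa_observable and not hll_permits:
        # "C/C++ forbidden implies ISA-level forbidden" is violated.
        return Verdict.BUG
    if hll_permits and not isa_observable:
        # Assumed meaning: hardware is stricter than the HLL requires.
        return Verdict.OVERLY_STRICT
    return Verdict.EQUIVALENT


# Hypothetical results for three test variations.
for permits, observable in [(False, True), (True, False), (True, True)]:
    print(permits, observable, classify(permits, observable).value)
```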
RISC-V Base: Lack of Cumulative Fences
[Figure: number of test variations (y-axis, 0 to 250) classified as Bugs, Overly Strict, or Equivalent. μSpec Model: WR, rWR, rWM, rMM, nWR, nMM, A9like. Variation: riscv-curr, riscv-ours. Litmus test: wrc. ISA: RISC-V Baseline (Base).]
• C/C++ acquire/release synchronization is transitive:
  • Accesses before a release write in program order, and observed by the releasing core prior to the release write, must be ordered before the release from the viewpoint of an acquire read that reads from the release write
• Base RISC-V ISA lacks cumulative fences
  • Minimally, the ISA requires a Predecessor-/Successor-Cumulative F RW, RW
  • Cannot currently fix bugs by modifying the compiler
Our current RISC-V proposal requires only a P-/S-Cumulative F RW, RW in the RISC-V Base ISA, and includes a weaker P-Cumulative F RW, W Fence in the Base+Atomics extension.
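Why a cumulative fence matters for wrc can be illustrated with a toy propagation model. The slides name the wrc test without listing it, so the test shape here is the commonly used one and is an assumption: T0 stores x=1; T1 loads x, fences, then stores y=1; T2 loads y, fences, then loads x. The event names and the `wrc_outcomes` function are hypothetical, and the model only tracks when each store becomes visible to each core.

```python
from itertools import permutations

# "x@T1" / "x@T2": T0's store to x becomes visible to T1 / T2.
# "y@T2":          T1's store to y becomes visible to T2.
# "ld_*":          a load executes on the named thread.
EVENTS = ["x@T1", "x@T2", "ld_x_T1", "y@T2", "ld_y_T2", "ld_x_T2"]


def wrc_outcomes(cumulative):
    """Return every reachable (r0, r1, r2) for the assumed wrc test.

    r0: T1's load of x, r1: T2's load of y, r2: T2's load of x.
    """
    seen = set()
    for order in permutations(EVENTS):
        t = {event: i for i, event in enumerate(order)}
        # T1's fence: its load precedes its store, so y cannot be
        # visible anywhere before the load has executed.
        if not t["ld_x_T1"] < t["y@T2"]:
            continue
        # T2's fence keeps its two loads in program order.
        if not t["ld_y_T2"] < t["ld_x_T2"]:
            continue
        r0 = int(t["x@T1"] < t["ld_x_T1"])
        # Cumulativity: a store T1 observed before its fence must reach
        # other cores before T1's post-fence store does.
        if cumulative and r0 == 1 and not t["x@T2"] < t["y@T2"]:
            continue
        r1 = int(t["y@T2"] < t["ld_y_T2"])
        r2 = int(t["x@T2"] < t["ld_x_T2"])
        seen.add((r0, r1, r2))
    return seen


print((1, 1, 0) in wrc_outcomes(cumulative=False))  # True
print((1, 1, 0) in wrc_outcomes(cumulative=True))   # False
```

In this sketch, a fence that orders only T1's own accesses still lets T2 see y=1 before x=1, which is the non-transitive behavior that C/C++ acquire/release forbids.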
On-Going Work & Conclusions
• We have formulated an English language diff of the current spec. with our proposed changes
• Currently we are constructing a formal model in Herd [Alglave et al., TOPLAS '14] of our proposed memory model modifications
• Memory model design choices are complicated and involve reasoning about the subtle interplay between many diverse features
• Defining an ISA specification in light of the evaluation of a single microarchitecture is not sufficient
• TriCheck is generalizable to any ISA and uncovered/quantified flaws in the RISC-V memory model.
RISC-V Base+A: Lack of Transitive Releases
• C/C++ acquire/release synchronization is transitive:
  • Accesses before a release write in program order, and observed by the releasing core prior to the release write, must be ordered before the release from the viewpoint of an acquire read that reads from the release write
[Figure: number of test variations (y-axis, 0 to 250) classified as Bugs, Overly Strict, or Equivalent. μSpec Model: WR, rWR, rWM, rMM, nWR, nMM, A9like. Variation: riscv-curr, riscv-ours. Litmus test: wrc. ISA: RISC-V Baseline+Atomics (Base+A).]
• Base+A RISC-V ISA lacks transitive releases
  • i.e., RISC-V acquires do not synchronize with RISC-V releases as required by C/C++
  • AMO.rl and stronger AMO.aq.rl are both insufficient
  • Cannot fix bugs by modifying compiler
• Our solution: redefine release operations in the Base+A RISC-V ISA to be transitive
RISC-V Base+A: No Roach-Motel Movement for SC Atomics
[Figure: number of test variations (y-axis, 0 to 90) classified as Bugs, Overly Strict, or Equivalent. μSpec Model: WR, rWR, rWM, rMM, nWR, nMM, A9like. Variation: riscv-curr, riscv-ours. Litmus tests: mp, sb. ISA: RISC-V Baseline+Atomics (Base+A).]
• Roach-motel movement = expansion of acquire-release critical section
  • C++ SC loads have C++ Acquire semantics
  • C++ SC stores have C++ Release semantics
• RISC-V SC loads and stores require both aq and rl bits set on AMOs
  • Operation has acquire and release semantics
  • Prohibits roach-motel movement
• Our solution: add an sc bit for implementing AMO.aq.sc and AMO.rl.sc instructions which are capable of implementing C/C++ SC loads and stores
RISC-V Base: Same Address Ld→Ld Re-Ordering
• C/C++ forbids same-address Ld→Ld reordering
• Bugs always when C/C++ loads are mapped to regular RISC-V loads
[Figure: number of test variations (y-axis, 0 to 90) classified as Bugs, Overly Strict, or Equivalent. μSpec Model: WR, rWR, rWM, rMM, nWR, nMM, A9. Variation: riscv-curr, riscv-ours. Litmus test: corr. ISA: RISC-V Baseline (Base).]
Initial conditions: x=0, y=0
T0                 T1
a: sw x1, (x5)     c: lw x3, (x5)
b: sw x2, (x5)     d: lw x4, (x5)
Forbidden HLL Outcome: x1=1, x2=2, x3=2, x4=1
• Base RISC-V ISA includes F R, R
• Possible to fix bugs by modifying compiler with potential performance penalty
  • 20.3% preliminary estimate of fence insertion performance penalty for ARM
• Our solution: modify Base RISC-V memory model to require same-address Ld→Ld ordering
Our current RISC-V proposal eliminates F R, R from the RISC-V Base ISA, and requires hardware to enforce same-address Ld→Ld order by default.
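The corr outcome can be checked with a small sketch. It assumes the location at (x5) takes the values 0, 1, 2 in that order (initial value, then store a, then store b) and that each load samples one point in that sequence; `corr_outcomes` is a hypothetical helper, not part of any tool.

```python
# Values of the location at (x5) over time: initial value, store a, store b.
COHERENCE_ORDER = [0, 1, 2]


def corr_outcomes(enforce_load_order):
    """Return every reachable (x3, x4) pair for loads c and d.

    enforce_load_order=True:  d executes after c, so d cannot sample an
                              earlier point in the coherence order than c.
    enforce_load_order=False: the two same-address loads may be reordered.
    """
    seen = set()
    for c_idx in range(len(COHERENCE_ORDER)):
        for d_idx in range(len(COHERENCE_ORDER)):
            if enforce_load_order and d_idx < c_idx:
                continue
            seen.add((COHERENCE_ORDER[c_idx], COHERENCE_ORDER[d_idx]))
    return seen


FORBIDDEN = (2, 1)  # x3=2, x4=1
print(FORBIDDEN in corr_outcomes(enforce_load_order=False))  # True
print(FORBIDDEN in corr_outcomes(enforce_load_order=True))   # False
```

Under these assumptions, requiring same-address Ld→Ld order in hardware removes the forbidden outcome without a fence between c and d.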
Re-ordering Dependent Operations
• RISC-V does not require ordering for dependent instructions
• Many commercial ISAs (x86, ARM, Power) respect dependencies
  • Can also be used as lightweight synchronization
• Explicit synchronization/fences needed when dependency ordering is required but not enforced by default, e.g., Linux
  • Macro read_barrier_depends() optionally inserts a barrier
  • Inserts a fence for Alpha, which does not respect dependencies¹
  • Inserts nothing for RISC-V, which does not respect dependencies²
• Our solution: modify Base RISC-V memory model to require the preservation of dependency orderings.
¹ Linus Torvalds et al. Linux kernel, 2016. https://github.com/torvalds/linux/blob/master/arch/alpha/include/asm/barrier.h
² RISC-V Foundation. RISC-V port of Linux kernel, 2016. https://github.com/riscv/riscv-linux/blob/master/rch/riscv/include/asm/barrier.h