Snort® Intrusion Detection System with Intel® Software Guard Extension (Intel® SGX)

Dmitrii Kuvaiskii, Somnath Chakrabarti, Mona Vij
{Dmitrii.Kuvaiskii, Somnath.Chakrabarti, Mona.Vij}@intel.com




Abstract

Network Function Virtualization (NFV) promises the benefits of reduced infrastructure, personnel, and management costs by outsourcing network middleboxes to the public or private cloud. Unfortunately, running network functions in the cloud entails security challenges, especially for complex stateful services.

In this paper, we describe our experiences with hardening the “king of middleboxes” – Intrusion Detection Systems (IDS) – using Intel® Software Guard Extensions (Intel® SGX) technology. Our IDS secured using Intel® SGX, called SEC-IDS, is an unmodified Snort® 3 with a DPDK network layer that achieves 10Gbps line rate. SEC-IDS guarantees computational integrity by running all Snort® code inside an Intel® SGX enclave. At the same time, SEC-IDS achieves near-native performance, with throughput close to 100% of vanilla Snort® 3, by retaining network I/O outside of the enclave. Our experiments indicate that performance is only constrained by the modest Enclave Page Cache size available on current Intel® SGX Skylake based E3 Xeon platforms. Finally, we kept the porting effort minimal by using the Graphene-SGX library OS: only 27 Lines of Code (LoC) were modified in Snort® and 178 LoC in Graphene-SGX itself.

1. Introduction

Network Function Virtualization (NFV) is a growing trend to move middleboxes such as switches, load-balancers, and firewalls from private networks into the cloud [1]. This allows companies to significantly reduce their infrastructure costs and ease resource management. The conundrum, however, is how to protect confidentiality and integrity of network functions once they are outsourced to a possibly adversarial cloud [2].

Adding protections to outsourced network functions is no easy task. To highlight its complexity, we focus on the “king of middleboxes” – Intrusion Detection Systems (IDS’s) – and discuss how current research efforts fail to protect them. In particular, we dissect Snort® 3 – the most popular and mature open-source IDS [3].

First, IDS’s are complex software systems that detect network attacks by analyzing all traffic against a set of signature-, protocol-, and anomaly-based rules. For example, Snort® decodes network packets, reassembles streams, maintains flow states, runs them through a string of protocol- and application-specific detectors, performs complex pattern matching, and signals alarms on suspicious traffic. Unfortunately, state-of-the-art crypto-scheme solutions for securing middleboxes such as BlindBox [4] and Embark [5] cannot support this rich IDS functionality.

Second, due to the aforementioned complexity of IDS’s, it is undesirable to build a new system from scratch or to introduce intrusive changes in the existing code base. Snort® has seven library dependencies, with 1 million LoC of C/C++ code; rewriting or modifying it for security purposes would be too time-consuming and error-prone. This requirement for legacy code support renders the whole middlebox-framework line of research [6–9] unsuitable for IDS’s.

Third, IDS’s must sustain high line rates of 10-40Gbps, e.g., a typical deployment of Snort® spawns many threads to achieve desired throughput. This requirement is at odds with software-based cryptographic solutions that inevitably degrade performance [4,5]. The better alternative is to use hardware support such as Intel® SGX [10] that provides integrity and confidentiality guarantees to applications while maintaining acceptable performance.

These three requirements dictated our approach to harden IDS’s for security with the example of Snort®. To sidestep the complexity of Snort®, we put its entire functionality inside the Intel® SGX enclave and require (almost) no code modifications. For performance, we keep the DPDK packet acquisition layer and the packets themselves outside of the enclave and feed pointers to them to Snort® via lockless ring buffers. We call the final system SEC-IDS.

The porting effort was kept minimal by using the Graphene-SGX library OS [11]: only 27 LoC were modified in Snort® and 1,205 LoC in the DPDK layer. We also resolved minor challenges of Snort®-DPDK separation and trusted clocks along the way, leading to 178 changed LoC in Graphene-SGX.

Our evaluation shows that SEC-IDS achieves near-native speed on workloads where Snort® state fits into Intel® SGX’s Enclave Page Cache (EPC), with throughput close to 100% of vanilla Snort®. Larger workload sizes can be accommodated with future versions of Intel® SGX, where the EPC size is expected to grow.

In this work, we describe the challenges of achieving near-native performance for Intel® SGX-hardened IDS (or indeed any high-performant complex application) as well as justify our design choices. In particular, we provide guidelines on cleanly separating trusted and untrusted parts of the application and avoiding expensive system calls.

2. Background

Intel® SGX. The Intel® Software Guard Extensions (SGX) is an ISA extension for recent Intel® CPUs that provides confidentiality and integrity protections for sensitive parts of applications [10]. With Intel® SGX, the code and data to be protected are put inside an Intel® SGX enclave – a region of memory that is opaque to all other software including privileged OS/hypervisor. The code inside the enclave can execute almost all CPU instructions (see important exceptions below) and can access data both inside and outside of the enclave. Any attempt to access enclave data from outside of the enclave fails.

At the hardware level, a handful of new x86 instructions were introduced to initialize, start, resume, and exit an enclave. When in enclave mode, the CPU disallows context switches to kernel (i.e., `syscall` and `int` instructions). Thus, when an interrupt or a system call happens, an enclave is first exited, the interrupt/syscall is handled by the kernel, and the enclave is resumed. Since an enclave exit is an expensive operation, this execution pattern leads to high performance overheads [12,13].

Another currently unsupported instruction is `rdtsc`, the low-overhead relative time source [14,15]. Unfortunately, a lot of real-world software relies on this instruction, usually through library or system calls such as `gettimeofday` and `clock_gettime`. The usual (but not secure!) workaround is to exit the enclave on `rdtsc`, execute the instruction in untrusted code, and pass the result back to the enclave [11]. Similar to syscalls, this can result in high overheads if the application uses `rdtsc` extensively.

To achieve confidentiality and integrity of enclave data, the hardware is augmented with a Memory Encryption Engine (MEE) and the Enclave Page Cache (EPC). The EPC is a designated area in physical memory inaccessible to any software other than the corresponding enclave. All data transferred from CPU to EPC is encrypted with a key associated with this particular CPU, and decrypted in the other direction. Currently, EPC size is a modest 128MB with only 96MB available for user data, and enclaves exceeding this amount require an expensive Intel® SGX-aware paging mechanism that securely swaps enclave pages to RAM. Because of MEE encryption and EPC paging, enclaves experience two sources of overhead: 1-10X overhead when data leaves LLC and 2-2000X when it leaves EPC [12]. Thus, it is important to keep enclave code and data to a minimum, at least on current SGX servers [7,13,16].

Other parts constituting the SGX technology, such as remote attestation, are not relevant for this paper, and we refer the reader to Intel® manuals and other sources [10].

Graphene-SGX. Graphene-SGX is a library OS tailored to Intel® SGX environments [11]. It allows running unmodified applications by providing a runtime to transparently exit the enclave and resume it on system calls, unsupported instructions, and interrupts. To provide additional security against, e.g., Iago attacks [17], Graphene-SGX introduces shields – sophisticated checks at the enclave-untrusted app interface.

Features that distinguish Graphene-SGX from similar shielded execution frameworks [12,14,18] include shielded dynamic loading (to support dynamically loaded libraries and run-time linking), multi-process abstractions, and file authentication. For security, a so-called manifest file is required with a whitelist of trusted libraries and files that can be used with Graphene-SGX. These libraries and files are also hashed, and Graphene-SGX checks that the hash calculated at runtime is equal to the one specified in the manifest.
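The manifest check described above can be illustrated with a minimal sketch. It is not Graphene-SGX code: the manifest is modeled as a plain dictionary, SHA-256 is an assumed hash function, and the names `verify_trusted_file` and `libexample.so` are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of the given bytes (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


def verify_trusted_file(name: str, contents: bytes, manifest: dict) -> bool:
    """Accept a file only if it is whitelisted and its runtime hash matches."""
    expected = manifest.get(name)
    if expected is None:
        return False  # not on the whitelist at all
    return sha256_hex(contents) == expected


# Hypothetical manifest with a single trusted library.
manifest = {"libexample.so": sha256_hex(b"trusted library bytes")}
```

A file that is missing from the whitelist is rejected in the same way as a file whose contents were modified after the manifest was produced.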

Akin to other frameworks, Graphene-SGX supports multithreaded applications (using a 1:1 threading model), exception handling, and a set of 28 Linux system calls. Graphene-SGX overheads are modest, with 25% lower throughput w.r.t. native for web applications.

Intel® DPDK. Intel® DPDK is a framework for high-speed network packet acquisition.¹ It relies on recent advances in NIC hardware and copies network packet contents directly into user space memory, completely bypassing the kernel network stack. DPDK also leverages HugePages for efficient memory management and reduced TLB pressure. This design allows achieving 10-40Gbps throughput in software middleboxes.

¹ http://dpdk.org/

DPDK consists of a set of libraries that are linked into the user application. For this paper, the main libraries are Environment Abstraction Layer (EAL), `mbuf`, and `ring`. EAL abstracts away bootstrapping details of underlying NIC hardware and drivers as well as memory and thread management. The `mbuf` library defines structures and functions to store and analyze network packets (DPDK does not use Linux kernel structures). The `ring` library provides the abstraction of an RTE ring – a lockless fixed-size FIFO queue of pointers, which is primarily used for passing references to network packets.
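To make the ring abstraction concrete, the sketch below shows a fixed-size FIFO of references with head and tail counters. It only illustrates the data structure: it is single-threaded Python rather than DPDK's lockless implementation, the class name `PointerRing` is hypothetical, and the power-of-two capacity is an assumption made here so that slot indices can be computed with a bit mask.

```python
class PointerRing:
    """Fixed-size FIFO of object references (illustrative, not lock-free)."""

    def __init__(self, size: int):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self._slots = [None] * size
        self._mask = size - 1
        self._head = 0  # total number of enqueued references
        self._tail = 0  # total number of dequeued references

    def enqueue(self, ref) -> bool:
        """Store a reference; return False if the ring is full."""
        if self._head - self._tail == len(self._slots):
            return False
        self._slots[self._head & self._mask] = ref
        self._head += 1
        return True

    def dequeue(self):
        """Return the oldest reference, or None if the ring is empty."""
        if self._head == self._tail:
            return None
        ref = self._slots[self._tail & self._mask]
        self._tail += 1
        return ref
```

Because only references are stored, the packet data itself is never copied when it moves through such a queue.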

Snort® IDS. Snort® is an Intrusion Detection System (IDS) that fetches packets from the network, preprocesses and analyzes them for malicious traffic [3]. In case an attack signature is detected, Snort® can either block the packet (if serving as a firewall) or generate an alert for the system administrator.

Figure 1: Snort® IDS workflow.

Figure 1 shows the high-level overview of Snort® functionality. Each packet from the network is decoded to determine the “protocol – source IP – source port – destination IP – destination port” 5-tuple; also, the protocol fields are examined for sanity. The decoding phase is stateless.
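As an illustration of the stateless decoding step, the following sketch extracts the 5-tuple from a raw IPv4 packet carrying TCP or UDP. It is a simplified stand-in rather than Snort® code: the function name `five_tuple` is hypothetical, only IPv4 is handled, and the sanity checks are limited to the version field and header length.

```python
import socket
import struct


def five_tuple(ip_packet: bytes):
    """Return (protocol, src IP, src port, dst IP, dst port) for IPv4 TCP/UDP."""
    version_ihl = ip_packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (version_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    if header_len < 20 or len(ip_packet) < header_len + 4:
        raise ValueError("truncated or malformed packet")
    protocol = ip_packet[9]
    src_ip = socket.inet_ntoa(ip_packet[12:16])
    dst_ip = socket.inet_ntoa(ip_packet[16:20])
    # TCP and UDP both start with 16-bit source and destination ports.
    src_port, dst_port = struct.unpack("!HH", ip_packet[header_len:header_len + 4])
    return (protocol, src_ip, src_port, dst_ip, dst_port)


# Hand-built sample: 20-byte IPv4 header (protocol 6 = TCP) plus a TCP header stub.
sample = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
    socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"),
) + struct.pack("!HH", 12345, 80) + bytes(16)
```

The function keeps no state between calls, which mirrors the stateless nature of the decoding phase.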

Next, TCP packets are grouped by live connections (flows) and TCP streams are reassembled for convenience of subsequent detection modules.² At this point, lists of all live flows and pending segments-to-reassemble are maintained. This constitutes Snort®’s state.
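The notion of pending segments can be sketched as follows. This is a deliberately simplified model, not Snort®'s stream preprocessor: the function name `reassemble` is hypothetical, and overlapping segments, retransmissions, and sequence-number wraparound are ignored.

```python
def reassemble(segments, initial_seq):
    """Join in-order TCP payloads; return (stream, segments still pending)."""
    pending = dict(segments)  # sequence number -> payload
    stream = bytearray()
    next_seq = initial_seq
    # Consume segments for as long as the next expected byte is available.
    while next_seq in pending:
        data = pending.pop(next_seq)
        stream += data
        next_seq += len(data)
    return bytes(stream), pending
```

Segments that arrive ahead of a gap stay in the pending set, which is the per-flow state that has to be kept between packets.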

After this preprocessing, the actual attack detection is invoked. Each packet payload is analyzed using two-phase pattern matching: first a simple search against a set of rules is performed. For rules that matched, the second heavyweight phase checks whether the payload contains a full attack signature, with regular expressions and flow-specific options. The set of rules is defined in a separate file and preprocessed at Snort® startup. We discuss Snort® rules in more detail in Appendix 1.
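The two-phase idea can be sketched as a cheap substring prefilter followed by a regular-expression check on the surviving rules. The rule set below is made up for illustration and is not taken from any Snort® rule file; the `Rule` structure and `match_payload` function are likewise hypothetical, and flow-specific options are omitted.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    sid: int                  # rule identifier
    fast_pattern: bytes       # cheap substring used in phase one
    signature: "re.Pattern"   # full signature checked in phase two


# Hypothetical rules, compiled once at startup.
RULES = [
    Rule(1, b"cmd.exe", re.compile(rb"GET /[^ ]*cmd\.exe\?/c\+dir")),
    Rule(2, b"passwd", re.compile(rb"\.\./\.\./etc/passwd")),
]


def match_payload(payload: bytes, rules=RULES):
    """Return the ids of rules whose full signature matches the payload."""
    # Phase one: simple substring search selects candidate rules.
    candidates = [rule for rule in rules if rule.fast_pattern in payload]
    # Phase two: heavyweight regular-expression check on candidates only.
    return [rule.sid for rule in candidates if rule.signature.search(payload)]
```

A payload such as `GET /cmd.exe HTTP/1.0` passes the first phase for rule 1 but is rejected in the second, so the expensive check runs only on the small fraction of traffic that looks suspicious.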

Finally, if an attack signature is matched, an alert is generated and output in the console or logged in a file. If Snort® is configured in inline (firewall) mode, it can also block the offending packet. If the packet is innocuous, Snort® allows the packet and pushes it back to the network (if in inline mode).

Snort® development started in 1998, and since then the system incorporated numerous features and became the most widely used IDS. Snort® 3 aka Snort®++ is a major rewrite of the original code base done in C++; we use this new version in the paper. Figure 2 presents the overall architecture of Snort® 3.

Snort® is multithreaded, following the “one-thread-do-all” model. Each worker thread operates in an infinite event loop: it fetches the next packet from the network, preprocesses and analyzes it as described above, and outputs alerts. Worker threads are specifically designed to share no state for performance reasons. This design dictates that the same network flow (characterized by a 5-tuple) must be processed by the same thread; bidirectional flows require special treatment.

² Actually, Snort® keeps track of connections even for stateless protocols like UDP and ICMP, but we skip these details.

Due to abundance of features, Snort® adopts a pluggable architecture and uses dynamic loading and linking extensively. It has seven library dependencies, one of which is linked statically (LibDAQ) and others dynamically. Preprocessing, analysis, and logging modules can be written as shared-library or Lua-script plugins and loaded at startup. Snort®’s configuration file specifies which modules and plugins to use as well as their specific settings. Due to all these features, the code base of Snort® with all dependencies consists of 1 million LoC in C/C++ as reported by cloc.³

³ https://github.com/AlDanial/cloc

Of particular interest to us is the LibDAQ library for network packet acquisition. LibDAQ clearly separates the fetching of packets from NIC and actual Snort® processing. LibDAQ’s default network library is PCAP (libpcap) – a platform-independent interface to capture packets in user-space. Due to its reliance on kernel support and interrupt-driven network I/O, it cannot handle data rates of 10-40Gbps. Alas, LibDAQ does not provide an official DPDK module as of this writing.

Figure 2: Original Snort® architecture.

3. Threat Model and Security Properties

We assume a threat model where the only trusted entities are the CPU and the code executing in an SGX enclave, similar to other papers [11–14,16,18]. All other unprivileged and privileged software including the OS and the hypervisor is potentially malicious. Physical attacks on RAM, memory bus, or network interface are also possible. Denial-of-service and side-channel attacks (including network side channels) [19,20] are out of scope.

The assets we aim to protect are summarized in Table 1. In the following, we provide rationales for the proposed levels of protection.

(No) integrity and confidentiality of network traffic. The main motivation behind this work is to demonstrate transparency, usability and performance benefits of Intel® SGX, thus we concentrate on immediate security benefits that Intel® SGX technology and Graphene-SGX framework provide out-of-the-box. For this reason, we do not target confidentiality of network traffic, where all traffic passing through the middlebox is encrypted but still correctly processed. This property is not trivial to achieve for an IDS like Snort® that expects packets in the clear. In theory, packets could arrive encrypted and then be decrypted by Snort® before processing, but such a setup would require provisioning all session keys to Snort®, which is infeasible.

We also do not aim to protect integrity of network packets. The attacker can drop or rearrange packets as well as inject malicious packets or modify parts of the true ones (since in our scenario traffic is in plain text). A low-cost weak-security solution would be to install a simple statistics-collecting middlebox at user’s premises and compare against statistics collected by enclavized Snort®. However, there is no general way to protect integrity of packets: a powerful network attacker can simply modify incoming or outgoing packets before or after Snort® inspection.⁴

⁴ For a network attack to succeed, the attacker needs access to a physical device or powerful OS privileges. This can be easily detected, unlike more subtle attacks on Snort® that we try to prevent in this work.

Integrity of execution and configuration. We aim to protect integrity of execution (aka computational integrity) of Snort®: the user is assured that Snort® runs correctly with correct configuration and set of rules. This is useful to verify functionality and performance of Snort® for audit purposes. In addition, Intel® SGX protects the confidentiality of the internal state of Snort® including metadata, flows, and streams. Note that it is trivial to add support for confidentiality of rules and configuration by providing encrypted rules/configuration files and decrypting them inside an enclave with a securely provisioned key.

System call interface. The OS can potentially launch attacks on the enclave using the system call interface, i.e., Iago attacks [17]. The Graphene-SGX framework already provides shields for 28 most commonly used system calls, but, as we will see, Snort® relies on additional syscalls for correct functioning. Thus, it is necessary to carefully examine the semantics of these syscalls. We revisit this requirement in Section 4.

Figure 3: Architecture of SEC-IDS.

Asset                                      Protection
Snort® execution                           Integrity
Snort® state (flows, streams, metadata)    Integrity and confidentiality
Snort® configuration and rules             Integrity and possibly confidentiality (need securely provisioned keys)
Network traffic                            Out of scope

Table 1: Snort®’s assets considered for protection.

4. Architecture

Our architectural goal is to run Snort® inside an Intel® SGX enclave with near-native performance and minimal changes to the code base. To accomplish this, we chose the path of iterative refinement, first putting all Snort® code inside the enclave and then adding features one by one. After five iterations, we came up with the final architecture presented in Figure 3. In the following, we discuss encountered challenges and our solutions.

Challenge 1: Snort® and its dependencies inside enclave. As a first step, we needed to port all Snort® functionality, including library dependencies and loadable plugins, inside an enclave with minimum effort.

A naïve solution would be to use the Intel® SGX Software Development Kit (SDK).⁵ The SDK provides building blocks to create, run, and attest enclaves, seal data to be stored on disk, and tools to develop ECALL/OCALL enclave interfaces. Even though the SDK can be used to port complex existing applications, the manual effort to define all required interfaces and to modify the original code base is significant (e.g., 11,078 LoC of changes in SGX-Tor) [21]. Manually porting huge code bases is known to be error-prone, leading to new vulnerabilities and negating the promise of Intel® SGX-enabled security.

⁵ https://software.intel.com/en-us/sgx-sdk

The appropriate solution is to use a shielded execution framework that hides the complexity of adjusting applications to enclaves. There are a number of frameworks to choose from: Haven [14], Graphene-SGX [11], SCONE [12], and Panoply [18]. Haven is an early closed-source Windows-based Library OS system with a large 210MB memory footprint. Graphene-SGX is a Library OS for Linux applications with a much smaller TCB. SCONE and Panoply decrease TCB even further but do not support dynamic linking and loading required by Snort®. In fact, SCONE’s features of user-level threading and asynchronous syscalls are ineffective in the Snort® + DPDK case: Snort® invokes no system calls during normal execution (except `clock_gettime` that we discuss next). Lastly, Panoply prioritizes minimal TCB over performance and thus exhibits higher overheads than Graphene-SGX.

Ultimately, we settled on Graphene-SGX since it perfectly aligns with Snort®’s requirements: it supports dynamic linking and loading, OS signals, 1:1 multithreading, and file authentication. On top of this, Graphene-SGX provides high performance and strong security via syscalls’ shields.

Even with Graphene-SGX, porting Snort® required some code and build-system modifications. We encountered three minor issues:

(i) The hwloc library – to set affinity of Snort® worker threads for better performance – issues `sched_setaffinity` and `sched_getaffinity` system calls. The kernel is supposed to set and get a CPU affinity mask, but there is no way to prove correctness of these actions from within the enclave. Thus, issuing these syscalls to a malicious OS is at best futile and at worst can trigger obscure application bugs (e.g., Snort® assumed that `sched_getaffinity` never fails and went into an infinite loop when we tried to stub the syscall with a dummy value). In the end, we decided to remove the hwloc dependency from Snort® altogether, patching 27 LoC in a single file.⁶

⁶ This was the only change we made in Snort® itself. We would even argue that this was beneficial for SEC-IDS: by removing hwloc, we instantly decreased TCB by 40,000 LoC.


(ii) The luajit library contains an embedded Lua interpreter to parse Snort®’s configuration file and Lua-based plugins. The library uses a rare `MAP_32BIT` flag when allocating memory via `mmap`. This flag instructs the OS to allocate memory in the first 2GB of process address space, for outdated performance reasons. Graphene-SGX did not support this flag, thus we introduced a one-line patch to Graphene to ignore it inside the enclave. Note that this change does not affect security since enclaves use their own memory management anyway.

(iii) The libpcap library is used for user-level packet capture in the PCAP format. It operates on so-called “raw sockets” to fetch IP packets. The only types of sockets Graphene-SGX supports are TCP/UDP, and adding shielding support for a too-powerful primitive like raw sockets would go against the LibOS philosophy. We removed libpcap as a dependency since SEC-IDS already uses the DPDK library residing outside of the enclave for packet acquisition.

These were the only changes required to run Snort® and its dependencies inside an Intel® SGX enclave.

Challenge 2: DPDK outside enclave. Our next challenge was to provide an interface to allow communication between Snort® threads inside the enclave and DPDK threads outside of it.

We first need to explain our decision to leave DPDK (and LibDAQ for that matter) outside of the enclave. After all, it is possible to have DPDK code and state inside the enclave. However, we see three problems with this design. First, it would still require adding low-level networking support to Graphene-SGX, similar to libpcap. Second, putting all DPDK code – 113,000 LoC – inside the enclave unnecessarily bloats TCB. Third, to store fetched packets, DPDK uses HugePages not supported by Intel® SGX, and there is little sense in protecting DPDK code and metadata while the packets themselves are left in untrusted memory.

As of this writing, there is no official support for DPDK in Snort® 3. Thus, we used a fork of the LibDAQ library with a DPDK module from Napatech.⁷ The library is feature-rich and allows correct load-balancing of network flows across multiple Snort® threads, using the Receive Side Scaling (RSS) network driver technology [22].

⁷ https://github.com/napatech/daq_dpdk_multiqueue

The original design of this DPDK module does not suit our needs: it combines DPDK packet fetching and Snort® processing in one thread (“one-thread-do-all” model). In other words, each worker thread runs an infinite loop with three steps: (1) receive packets from network in a burst, (2) run Snort® analysis on each received packet, and (3) transmit each allowed packet back to network, if Snort® is in inline mode.

Since we want to run DPDK code outside of the enclave, we break this functionality into two parts (compare Figures 2 and 3). We designate M “DPDK threads” that perform steps 1 and 3 as above. However, instead of directly calling into Snort®, each DPDK thread puts pointers to fetched packets in an RX ring and gets pointers to allowed packets from a TX ring, if in inline mode. A second set of N “Snort® threads” performs step 2, i.e., actual Snort® analysis. Each Snort® thread gets the next available pointer from an RX ring, analyzes the packet pointed to, and puts the pointer to this packet in a TX ring if the packet is allowed.
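The division of labor between the two kinds of threads can be sketched with a single-threaded simulation. This is an assumption-laden model rather than the SEC-IDS code: `collections.deque` objects stand in for the RX and TX rings, the function names `dpdk_step` and `snort_step` are hypothetical, and the analysis is reduced to a caller-supplied `is_allowed` predicate.

```python
from collections import deque


def dpdk_step(nic_rx, rx_ring, tx_ring, nic_tx, burst=32):
    """One iteration of a DPDK thread: steps (1) and (3)."""
    # Step 1: receive a burst from the NIC and hand references to Snort.
    for _ in range(min(burst, len(nic_rx))):
        rx_ring.append(nic_rx.popleft())
    # Step 3: push every packet that Snort allowed back to the network.
    while tx_ring:
        nic_tx.append(tx_ring.popleft())


def snort_step(rx_ring, tx_ring, is_allowed, inline=True):
    """One iteration of a Snort thread: step (2). Returns alerted packets."""
    alerts = []
    while rx_ring:
        packet = rx_ring.popleft()
        if is_allowed(packet):
            if inline:
                tx_ring.append(packet)  # forwarded only in inline mode
        else:
            alerts.append(packet)
    return alerts
```

Running `dpdk_step`, then `snort_step`, then `dpdk_step` again moves allowed packets from the receive side to the transmit side, while blocked packets never reach the TX ring.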

RX and TX rings are implemented as RTE lockless rings and serve as a main communication channel between the enclavized Snort® and outside DPDK. They are created at startup and allocated in unprotected HugePages memory. Note that these rings contain only pointers to packets (more precisely, to their `mbuf` DPDK representations), allowing zero-copy packet analysis.

There is one caveat with RX rings. Each Snort® thread operates on its own subset of network bidirectional flows. Thus, we cannot use a single shared RX ring: all Snort® threads would read from this ring in no particular order and network flows would become jumbled. To alleviate this, we create as many RX rings as there are Snort® threads. For every fetched packet, a DPDK thread looks into the last six bits of the RSS hash and maps them to the corresponding RX ring. Since all packets in the same network flow produce the same RSS hash, they end up in only one Snort® thread.
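A minimal sketch of this mapping is shown below. The text only states that the last six bits of the RSS hash select the ring; reducing those bits modulo the number of rings is an assumption of this sketch, and the function name `rx_ring_index` and the ring count of four are hypothetical.

```python
NUM_RX_RINGS = 4  # assumed to equal the number of Snort threads


def rx_ring_index(rss_hash: int, num_rings: int = NUM_RX_RINGS) -> int:
    """Pick an RX ring from the six least significant bits of the RSS hash."""
    low_bits = rss_hash & 0x3F      # values 0..63
    return low_bits % num_rings     # assumed mapping onto the available rings
```

Since the result depends only on the hash, every packet of a flow lands in the same ring, and six bits allow up to 64 distinct rings to be addressed.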

Pushing packets back to the network requires only a single shared TX ring. This is why we use a Multiple Producer Multiple Consumer (MPMC) TX ring, but allow Multiple Producer Single Consumer (MPSC) RX rings.

Figure 4 illustrates the interface between Snort® and DPDK threads via RX and TX rings. Note that the rings contain pointers to `mbuf` objects, denoted with an apostrophe. Also note that Snort® sends packets via TX ring and TX queue only in inline mode.

Using separate DPDK and Snort® threads entails two additional benefits. First, since each of these threads runs on a separate CPU core, the L1 cache is less stressed than in the “one-thread-do-all” design. Second, our experiments indicate that a single DPDK thread is enough to sustain 10Gbps network load.

Challenge 3: Interfaces between Snort® and DPDK. There are two interfaces between the enclavized Snort® and untrusted DPDK: (1) RX/TX rings to pass pointers to network packets, and (2) OCALLs to initialize and shut down LibDAQ and DPDK.

The interface of RX/TX rings was described in the previous section. It is the only interface during run-time, with asynchronous “exitless” communication. This interface is what makes SEC-IDS perform on par with vanilla Snort®.

For initialization and finalization of DPDK, SEC-IDS requires another interface with function call invocations. In particular, we introduce five OCALLs from enclave to untrusted code (see Table 2).

Figure 4: Interface between Snort® and DPDK threads via RX/TX rings.

OCALL name           Description
dpdk_initialize      Process command-line arguments, allocate memory for packets and RX/TX rings, initialize EAL
dpdk_start_device    Set up RX and TX queues for Ethernet device, set RSS hash for bidirectional flows, start Ethernet device
dpdk_acquire         Start infinite loop of DPDK threads: retrieve burst of packets from network, pass to RX ring, read from TX ring and push allowed packets back to network
dpdk_stop            Close Ethernet device
dpdk_shutdown        Close Ethernet device and free all allocated memory

Table 2: OCALL interface between Snort® and DPDK.

We manually add these OCALLs to Graphene-SGX and patch the DPDK module in LibDAQ to call them. The typical workflow is thus as follows (with the example of `dpdk_initialize`): (1) Snort® starts inside the enclave, (2) it calls LibDAQ initialization routine, (3) LibDAQ recognizes that it is inside the enclave and calls `ocall_dpdk_initialize`, (4) this OCALL exits the enclave, performs untrusted `dpdk_initialize`, and resumes enclave execution.

Note that we link LibDAQ both inside and outside of the enclave. The enclavized LibDAQ is compiled to serve as a minimal redirection layer: it only invokes OCALLs. The outside LibDAQ performs actual DPDK initialization/finalization.

The five OCALLs are invoked only on SEC-IDS startup and teardown and do not affect performance.

Challenge 4: Trusted clock. After we separated enclavized Snort® and untrusted DPDK threads, we made sure no system calls were issued by the application. To our surprise, the overhead of SEC-IDS over vanilla Snort® was still 1,000%!

Upon further examination, we noticed an unusually high number of enclave exits and resumes. Graphene-SGX exits the enclave only on system calls, but we observed none via strace, so what was happening?

The root cause turned out to be the `clock_gettime` system call. In usual environments, `clock_gettime` is virtualized with vDSO library, i.e., instead of context-switching to the kernel, this system call is resolved in user space via the `rdtsc` instruction. In Graphene-SGX, since `rdtsc` is disallowed inside enclaves, `clock_gettime` is treated as a syscall: upon its invocation, Graphene-SGX exits the enclave, executes the virtualized version outside, and resumes secure execution. Thus, we observe many enclave exits (that degrade performance) but no actual system calls.

Snort® heavily relies on `clock_gettime`, invoking it at least twice for each packet: it is used to measure timeouts, TCP/UDP flow expirations, packet latencies, and passage of time for statistics. Thus, we could not eliminate this syscall and had to find a performant workaround. Note that if we executed `rdtsc` outside and passed the resulting clock back to the enclave, we would be susceptible to Iago attacks (the hacker could easily subvert Snort® execution by providing bogus time values).

In the end, we settled for a “trusted-clock” thread, a technique described in the “Malware Guard Extension” paper [15]. We introduce a helper “clock” thread in Graphene-SGX that runs inside the enclave and infinitely increments a global variable. Additionally, we stubbed `clock_gettime` to read the value of this variable, adjust it to our CPU’s speed, and return time in microseconds. This simple technique provides a “good enough” relative time source for our purposes. Details can be found in Appendix 2.
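The counter-thread idea can be sketched in a few lines. This is only an illustration of the technique and not the Graphene-SGX patch: the class name `TrustedClock` is hypothetical, `ticks_per_usec` is an assumed calibration constant standing in for the adjustment to CPU speed, and a Python thread is far slower than a native spinning thread.

```python
import threading
import time


class TrustedClock:
    """Relative clock driven by a helper thread that increments a counter."""

    def __init__(self, ticks_per_usec: int):
        if ticks_per_usec <= 0:
            raise ValueError("ticks_per_usec must be positive")
        self.ticks_per_usec = ticks_per_usec  # hypothetical calibration value
        self.ticks = 0                        # plays the role of the global variable
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._spin, daemon=True)

    def _spin(self):
        # Helper "clock" thread: keep incrementing until asked to stop.
        while not self._stop.is_set():
            self.ticks += 1

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def now_usec(self) -> int:
        # Stand-in for the stubbed clock_gettime: scale ticks to microseconds.
        return self.ticks // self.ticks_per_usec


# Usage: two readings taken 10 ms apart never go backwards.
clock = TrustedClock(ticks_per_usec=1)
clock.start()
before = clock.now_usec()
time.sleep(0.01)
after = clock.now_usec()
clock.stop()
```

The counter has a single writer, so readers need no lock; the price is one core that is kept busy solely to advance the clock.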

5. Implementation Details

We used the following versions of software: Intel® DPDK 17.05.1, Intel® SGX Driver v1.9 (commit 3abcf82), Graphene-SGX commit 4d8eacd, Napatech’s fork of LibDAQ v2.2.1 (commit 7c40e02), Snort 3 tag BUILD_239, LuaJIT v5.1, PCRE v8.41, zlib v1.2.11, OpenSSL v1.1.0.

Software              Total LoC     Changed LoC
Snort®
  Snort® binary         228,633     27 (0.02%)
  LibDAQ                 40,523     1,205 (3%)
  hwloc                  40,397     N/A*
  other libs            701,234     0 (0%)
  total               1,010,787     1,232 (0.12%)
Graphene-SGX
  Graphene binary     1,233,484     0 (0%)
  new OCALLs                157     157 (100%)
  clock thread               21     21 (100%)
  total               1,233,662     178 (0.01%)

Table 3: Lines of code in all used software. *hwloc was removed from dependencies.

All changes in Snort® and Graphene-SGX are summarized in Table 3. We expand on the changes below.

In Snort® code base, we modified 27 LoC in one file (`thread_config.cc`) to disable the dependency on hwloc.

In Napatech’s LibDAQ, we modified 1,205 LoC in one file (`daq_dpdk.c`) to introduce the OCALL interface and add more functionality like RSS-based mapping to Snort® threads. For experiments, we also added code to manually pin Snort® threads to physical cores.

In Graphene-SGX, we made several changes: (1) one-line change to support `MAP_32BIT`, (2) adapted the build system to link against untrusted LibDAQ and DPDK, (3) added 157 LoC for five OCALLs, and (4) added 21 LoC for trusted-clock thread.

Graphene-SGX requires all its dependencies to be built with `-fPIC` flag (as position-independent libraries). We added this flag to DPDK and LibDAQ builds and observed no drop in performance. For enclavized Snort® build, we link it against the “dummy” LibDAQ that only redirects function calls to the outside LibDAQ via OCALLs.

We identified and fixed a multithreading bug in Graphene-SGX that led to premature exhaustion of enclave thread slots. We also added an `exit_group` syscall to terminate all process threads – this was needed to kill our trusted-clock thread. Finally, we observed an infrequent data race somewhere in the memory-allocation logic of Graphene, but could not pinpoint its root cause.

6. Evaluation

In our evaluation, we aim to answer the following questions:

• What is the overhead of SEC-IDS with respect to vanilla Snort® in terms of achievable throughput?

• What is the effect of the packet size and the number of flows in the workload?

• What is the effect of the number of rules, enabled functionality, and logging in Snort® configuration?

• What is the scalability of SEC-IDS?

System platform. We use two servers connected via a 10Gbps NIC. Each server has an Intel® Xeon® CPU E3-1270 v5 @ 3.60GHz with one socket, 4 physical cores, and hyper-threading disabled; 64GB of DDR4 2133MHz RAM; 8MB L3 cache; 256KB per-core L2 cache; and 32KB instruction and data per-core L1 caches. The NIC is an X710 for 10GbE SFP+ 1572, and we use the vfio-pci DPDK driver. We use Ubuntu 16.04 with kernel v4.4.0. We assign 16 1GB-sized HugePages (totaling 16GB of RAM) and pin DPDK and Snort® threads to dedicated cores. For workload generation, we use Intel® PktGen v3.4.0.

Methodology. We ran experiments on our SEC-IDS as well as on vanilla Snort®. By vanilla Snort® we mean the Snort®+DPDK version with the LibDAQ and DPDK code described above but without SGX enclaves (and thus without Graphene-SGX). We use the default configuration file of Snort® and the `community.rules` rule set from the official website. We also use the default build system of Snort®.

We use two workload scenarios: synthesized and real-PCAP ones. Synthesized workloads are TCP/IP packets with random payloads generated by PktGen. We vary packet sizes from 64B to 1024B, the number of simultaneous network flows (TCP connections) from 256 to 32,000, and the number of Snort® rules used for pattern matching from 0 to 3,462 (all rules in `community.rules`). Note that the number of flows in a real network is much higher, typically millions of simultaneous flows; however, PktGen segfaulted on more than 32,000 flows.

We also use three real PCAP workloads, downloaded from Tcpreplay (see footnote 8). The characteristics of these workloads are as follows: (1) `test.pcap` contains 141 packets with 37 flows and an average packet size of 445B, (2) `smallFlows.pcap` contains 14,261 packets with 1,209 flows and an average packet size of 646B, and (3) `bigFlows.pcap` contains 791,615 packets with 40,686 flows and an average packet size of 449B. These three workloads stress Snort® rules and output real alerts.

Each experiment is run for 2 minutes: we first initialize PktGen and start sending workload packets (PktGen repeats packets if it reaches the end of a PCAP file), then start Snort®, wait for 2 minutes, send a SIGINT signal to Snort® to terminate it gracefully, log all Snort® output, and stop PktGen. Snort® output contains statistics on packets received, analyzed, and allowed, as well as throughput numbers. We use these statistics to draw our plots. Appendix 3 has details on how exactly we run Snort® and PktGen.

8 http://tcpreplay.appneta.com/wiki/captures.html

Each experiment was run three times, and we report the arithmetic mean of the three runs. The standard error is less than 1% in all experiments.
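The aggregation over three runs can be sketched as follows. This is a minimal illustration, not the authors' tooling: it assumes that the “less than 1%” criterion means the standard error of the mean relative to the mean, and the function names are hypothetical. Squared quantities are compared so that no square root is needed.

```c
#include <stddef.h>

/* Arithmetic mean of n samples (n >= 1). */
double mean(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Squared standard error of the mean: s^2 / n, where s^2 is the sample
 * variance with the (n - 1) denominator. Requires n >= 2. */
double sem_squared(const double *x, size_t n)
{
    double m = mean(x, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++)
        ss += (x[i] - m) * (x[i] - m);
    return ss / (double)(n - 1) / (double)n;
}

/* Returns 1 if the standard error is below `fraction` of the mean.
 * Comparing squares avoids sqrt(); assumes a positive mean. */
int sem_below_fraction(const double *x, size_t n, double fraction)
{
    double limit = fraction * mean(x, n);
    return sem_squared(x, n) < limit * limit;
}
```

For example, three hypothetical throughput readings of 9.45, 9.50, and 9.55 Gbps have a mean of 9.50 Gbps and pass the 1% check, whereas readings of 5, 10, and 15 Gbps would not.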

Results with synthesized workloads. Figures 5-7 show Snort® throughput with varied packet sizes, number of flows, and number of rules.

One immediate thing to note: in many cases, SEC-IDS performs slightly better than vanilla Snort®. The reason is our trusted-clock thread in the Intel® SGX version; the vanilla version uses regular `rdtsc`. Reading the trusted clock has a lower latency than executing `rdtsc`, thus SEC-IDS executes `clock_gettime` slightly faster than vanilla.

Figure 5: Throughput of SEC-IDS and vanilla Snort® with increasing packet size.

Figure 6: Throughput of SEC-IDS and vanilla Snort® with increasing number of TCP flows.

Figure 7: Throughput of SEC-IDS and vanilla Snort® with increasing number of rules (max 3,462).


We chose to show four extreme points for each varied parameter. For example, Figure 5 has packet size on the X-axis and four combinations of # flows and # rules. In Figure 5a, we show the “performant” configuration with a tiny number of flows (256) and no rules at all (i.e., no pattern matching). This configuration clearly has the best performance for both Snort® and SEC-IDS: the state that Snort® keeps is very small because the number of flows is very small, and the processing per packet is very fast because there is no pattern matching. In Figure 5d, we see another extreme: Snort®'s state is large (up to 620MB for one thread and 812MB for two threads) because there are 32K flows, and processing per packet is slow because it requires matching of all 3,462 rules. Figures 5b and 5c show intermediate points: with many flows but no rules, and with a tiny number of flows but all rules. Figures 6 and 7 follow the same pattern.

Figure 8: Throughput of SEC-IDS and vanilla Snort® without and with a configuration file (increasing packet sizes).

Figure 9: Throughput of SEC-IDS and vanilla Snort® without and with outputting alerts (increasing packet sizes).

Figure 10: Throughput of SEC-IDS and vanilla Snort® with real-world PCAP traces (increasing number of rules).

It is clear that increasing the packet size also increases the throughput of both vanilla and SGX Snort®. As the packet size increases, the ratio of Snort® processing to “bytes received” decreases. In other words, the overhead is determined by the number of packets and not by the number of bytes per packet.
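The effect of packet size on the packet rate can be made concrete with a short calculation. The sketch below assumes that the packet sizes in the text are Ethernet frame sizes and adds the standard 20 bytes of per-frame wire overhead (preamble plus inter-frame gap); the function name is hypothetical.

```c
#include <stdint.h>

/* Per-frame overhead on the wire that is not part of the frame itself:
 * 8 bytes of preamble/start delimiter plus a 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Maximum frames per second on a link of `link_bps` bits per second when
 * every frame is `frame_bytes` long. Integer division rounds down. */
uint64_t line_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
    uint64_t bits_per_frame =
        ((uint64_t)frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u;
    return link_bps / bits_per_frame;
}
```

Under these assumptions, a saturated 10Gbps link carries about 14.88 million 64B frames per second but only about 1.2 million 1024B frames per second, so the per-packet work drops by roughly 12x at the same bit rate.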

Figure 6 indicates that increasing the number of flows has no effect on vanilla Snort® but hampers the performance of SEC-IDS. This hints at the Intel® SGX limitation: storing a flow in Snort® requires 2-4KB of memory, and while vanilla Snort® is not limited by RAM, SEC-IDS quickly exhausts the EPC and requires expensive EPC paging. This is confirmed by the CPU utilization of the Intel® SGX swapping daemon (`ksgxswapd`): with 32,000 flows it peaks at 12% for two threads and 24% for three threads (while in other cases it is zero). We also notice that SEC-IDS performs well with one thread but degrades sharply with two threads, as clearly seen in Figures 6b and 6d. This is because two Snort® threads do not share any state and thus require (partial) duplication of memory.
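A back-of-the-envelope estimate shows why the flow table alone can approach the EPC limit. The sketch below uses the 2-4KB per-flow figure from the text; the EPC budget passed in by the caller is an assumption (the usage note uses roughly 93MiB, a commonly quoted usable size for platforms with a 128MiB EPC, not a number taken from this section), and full duplication across threads is a worst case, since the text only says duplication is partial.

```c
#include <stdint.h>

/* Bytes of flow state for `flows` tracked flows on each of `threads`
 * threads, assuming every thread keeps its own full copy (worst case). */
uint64_t flow_state_bytes(uint64_t flows, uint64_t bytes_per_flow,
                          unsigned threads)
{
    return flows * bytes_per_flow * threads;
}

/* Returns 1 if the working set exceeds the EPC budget, i.e. EPC paging
 * is to be expected. */
int exceeds_epc(uint64_t working_set_bytes, uint64_t epc_bytes)
{
    return working_set_bytes > epc_bytes;
}
```

With 32,000 flows, one thread needs between 65,536,000 bytes (2KB per flow) and 131,072,000 bytes (4KB per flow) for flow state alone; the upper figure already exceeds an assumed 93MiB budget, and two threads exceed it under either per-flow size. Rules, code, and other Snort® state come on top of this.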

Figure 7 shows the impact of the number of rules. We take the first N rules from the `community.rules` file for this experiment, with a maximum of 3,462 rules. Here we observe that throughput drops slightly in both vanilla and SGX Snort®, but there is no difference between the vanilla and Intel® SGX versions (clearly seen in Figure 7c). This is understandable: the whole set of rules occupies only 28MB and does not stress memory.

Finally, we can compare the 1-thread and 2-thread versions in Figures 5-7. Vanilla Snort® shows perfect scalability, doubling its throughput when moving from a single thread to two threads. SEC-IDS has a more complex behavior. With a small number of flows, SEC-IDS also scales perfectly. However, as the number of flows increases, the scalability of SEC-IDS declines; this is seen especially in Figures 6b and 6d. We highlighted the reasons for this above.

As an additional experiment, we looked at the effect of the configuration file. Figure 8 shows two plots: without a configuration file on the left and with it on the right. Running Snort® without configuration means that it does not perform any preprocessing or analysis of packets: it simply fetches packets from the RX ring and immediately allows them (aka useless mode). Clearly, without any processing, Snort® does not keep any state and has almost-zero latency for each packet. As soon as we add the configuration file, Snort® runs at full capacity, and performance drops for both the vanilla and Intel® SGX versions. Note that the Intel® SGX version performs poorly in Figure 8b because we stress it with 32,000 flows.

Figure 9 shows the effect of enabling alerts, i.e., showing them in the console. There is no performance degradation for either vanilla or SGX Snort® except in one case: SEC-IDS is 4X worse with alerts on 1024B packets and 2 threads (right-most bar). This is again due to EPC paging: at this point, `ksgxswapd` CPU utilization jumps from 25% to 30%.

Finally, we experimented with three real-world PCAP traces: test (very small, only 37 flows), small (1,209 flows), and big (40,686 flows). Figure 10 shows the results with an increasing number of rules. Similar to the observations on synthesized workloads, the SEC-IDS performance overhead is a function of the number of flows: with the `big` PCAP and two threads, throughput is roughly halved.

In general, we conclude that the only source of performance degradation of SEC-IDS is the limited EPC size. This is clear when we stress the system with 32,000 flows and a 2-thread configuration. In low-memory-footprint cases, SEC-IDS performs very similarly to vanilla Snort®. Additional plots can be found in Appendix 4.

7. Related Work

Intel® SGX-enabled middleboxes. The benefits of Intel® SGX for network middleboxes were explored in previous works, though never at the scale of IDSs [7-9, 23]. For example, LightBox [7] introduces a new framework for Intel® SGX-enabled network functions, with secure configuration and data channels and a smart state cache to reduce the negative impact of EPC paging. In a similar vein, TrustedClick [8] and Slick [9] add secure communication channels with remote attestation to the existing Click architecture. None of these projects aims to support unmodified complex IDSs such as Snort®.

S-NFV [23] manually separates only the state-handling code and data (kept in the enclave) from the rest of the application. As one case study, S-NFV modifies Snort® to keep tag state in the enclave: the results indicate huge developer effort and high overheads for a small piece of functionality. It is unclear how this approach scales to flow states and stream reassembly, as well as what security implications result from leaving most of Snort® code and data outside of the enclave.

IDSs and Snort®. Research on IDSs, and Snort® in particular, has gained a lot of attention from the security community. To the best of our knowledge, all these efforts concerned increasing the performance and not the security of Snort®. For example, Kargus reengineers Snort® using (1) batch processing at all stages and (2) CPU-GPU parallel execution for pattern matching [24]. Similarly, GASPP rewrites Snort® to run completely on GPUs for performance benefit [25]. We are not aware of any attempts to harden complete Snort® for security.

Intel® SGX-enabled network applications. The work closest to ours in spirit is the recent SGX-Tor [21]. Similar to our work, SGX-Tor moves security-critical functionality of the Tor anonymity network application inside an Intel® SGX enclave. Unlike our work, SGX-Tor modifies a significant 3.4% (or 11,078 LoC) of the original code and uses the Intel® SGX SDK for manual separation. Finally, SGX-Tor shows that a smart partitioning into enclave and untrusted parts leads to low performance overheads of 4-12%.

8. Lessons Learned

Support for manual separation in shielded execution frameworks. Our Snort® case study highlights an interesting hybrid of unmodified execution and manual separation. On the one hand, it would be tedious and error-prone to reorganize a mature application like Snort® to use Intel® SGX enclaves, thus a shielded execution framework is a natural choice. On the other hand, the developer might want to tweak the application to achieve better performance by moving some parts outside of the enclave.

Based on our experience with manual Snort®-DPDK separation in Graphene-SGX, we believe that a generic extension to add application-specific OCALLs would be beneficial and easy to implement. Even in our case, with Graphene-SGX not designed for manual OCALLs, we were able to achieve clear separation in only 157 LoC. Expanding Graphene-SGX and probably combining it with the flexibility of the Intel® SGX SDK would be a step in the right direction.

Manual separation at the library interface. Our experience with Snort® and LibDAQ confirms an apparent intuition: separation between enclave and untrusted code/data is easy at the library interface. Libraries are designed to be self-contained and to expose only a minimal set of functions. This provides an obvious choice for OCALLs. Additionally, libraries usually expose their state in a well-defined set of struct objects; this shared state is allocated by the library outside of the enclave, and a pointer to it should be passed to the enclave.

Performance limitations of current Intel® SGX servers are temporary. Many current limitations (small EPC size, unavailability of the `rdtsc` instruction) will be addressed in future implementations of Intel® SGX. Our work on SEC-IDS indicates that there will be no tangible performance loss once the EPC size is sufficiently large.


Bibliography

[1] Justine Sherry, Shaddi Hasan, Colin Scott, Arvind Krishnamurthy, Sylvia Ratnasamy, and Vyas Sekar. Making middleboxes someone else's problem: network processing as a cloud service. SIGCOMM'2012

[2] Luca Melis, Hassan Jameel Asghar, Emiliano De Cristofaro, and Mohamed Ali Kaafar. Private Processing of Outsourced Network Functions: Feasibility and Constructions. arXiv:1601.06454, 2016

[3] Martin Roesch. Snort® - Lightweight Intrusion Detection for Networks. LISA'1999

[4] Justine Sherry, Chang Lan, Raluca Ada Popa, and Sylvia Ratnasamy. BlindBox: Deep Packet Inspection over Encrypted Traffic. SIGCOMM'2015

[5] Chang Lan, Justine Sherry, Raluca Ada Popa, Sylvia Ratnasamy, and Zhi Liu. Embark: securely outsourcing middleboxes to the cloud. NSDI'2016

[6] Aurojit Panda, Sangjin Han, Keon Jang, Melvin Walls, Sylvia Ratnasamy, and Scott Shenker. NetBricks: taking the V out of NFV. OSDI'2016

[7] Huayi Duan, Xingliang Yuan, and Cong Wang. LightBox: SGX-assisted Secure Network Functions at Near-native Speed. arXiv:1706.06261, 2017

[8] Michael Coughlin, Eric Keller, and Eric Wustrow. Trusted Click: Overcoming Security issues of NFV in the Cloud. SDN-NFVSec'2017

[9] Bohdan Trach, Alfred Krohmer, Sergei Arnautov, Franz Gregor, Pramod Bhatotia, and Christof Fetzer. Slick: Secure Middleboxes using Shielded Execution. arXiv:1709.04226, 2017

[10] Victor Costan and Srinivas Devadas. Intel® SGX Explained. IACR Cryptology ePrint Archive 2016

[11] Chia-Che Tsai, Mona Vij, and Donald Porter. Graphene-SGX: A Practical Library OS for Unmodified Applications on SGX. USENIX ATC'2017

[12] Sergei Arnautov, Bohdan Trach, Franz Gregor, Thomas Knauth, Andre Martin, Christian Priebe, Joshua Lind, Divya Muthukumaran, Dan O'Keeffe, Mark L. Stillwell, David Goltzsche, David Eyers, Rüdiger Kapitza, Peter Pietzuch, and Christof Fetzer. SCONE: secure Linux containers with Intel® SGX. OSDI'2016

[13] Meni Orenbach, Pavel Lifshits, Marina Minkin, and Mark Silberstein. Eleos: ExitLess OS Services for SGX Enclaves. EuroSys'2017

[14] Andrew Baumann, Marcus Peinado, and Galen Hunt. Shielding applications from an untrusted cloud with Haven. OSDI'2014

[15] Michael Schwarz, Samuel Weiser, Daniel Gruss, Clémentine Maurice, and Stefan Mangard. Malware Guard Extension: Using SGX to Conceal Cache Attacks. arXiv:1702.08719

[16] Dmitrii Kuvaiskii, Oleksii Oleksenko, Sergei Arnautov, Bohdan Trach, Pramod Bhatotia, Pascal Felber, and Christof Fetzer. SGXBOUNDS: Memory Safety for Shielded Execution. EuroSys'2017

[17] Stephen Checkoway and Hovav Shacham. Iago attacks: why the system call API is a bad untrusted RPC interface. ASPLOS'2013

[18] Shweta Shinde, Dat Le Tien, Shruti Tople, and Prateek Saxena. Panoply: Low-TCB Linux Applications with SGX Enclaves. NDSS'2017

[19] Shweta Shinde, Zheng Leong Chua, Viswesh Narayanan, and Prateek Saxena. Preventing Page Faults from Telling Your Secrets. ASIACCS'2016

[20] Marcus Haehnel, Weidong Cui, and Marcus Peinado. High-Resolution Side Channels for Untrusted Operating Systems. USENIX ATC'2017

[21] Seongmin Kim, Juhyeng Han, Jaehyeong Ha, Taesoo Kim, and Dongsu Han. Enhancing Security and Privacy of Tor's Ecosystem by Using Trusted Execution Environments. NSDI'2017

[22] Shinae Woo and Kyoungsoo Park. Scalable TCP Session Monitoring with Symmetric Receive-side Scaling. Technical Report. 2012

[23] Ming-Wei Shih, Mohan Kumar, Taesoo Kim, and Ada Gavrilovska. S-NFV: Securing NFV states by using SGX. SDN-NFVSecurity'2016

[24] Muhammad Asim Jamshed, Jihyung Lee, Sangwoo Moon, Insu Yun, Deokjin Kim, Sungryoul Lee, Yung Yi, and KyoungSoo Park. Kargus: a highly-scalable software-based intrusion detection system. CCS'2012

[25] Giorgos Vasiliadis, Lazaros Koromilas, Michalis Polychronakis, and Sotiris Ioannidis. GASPP: a GPU-accelerated stateful packet processing framework. USENIX ATC'2014


Appendix 1: Snort® Rules

Snort® rules are used to detect anomalies in the network, malicious packets, and hacker attacks. Most Snort® rules are written as regular expressions with additional protocol/flow properties attached. More complicated rules are written in C++ or Lua and distributed with Snort® or as separate plugins.

Sets of rules are compiled into a single file loaded at Snort® startup. These files can be freely downloaded from the official Snort® website (so-called “community” rules). There are also paid rule subscriptions that are updated more frequently and contain extra rules. In this work, we use the “community.rules” file.

As of this writing, the community set contains 3,462 rules. A typical rule is a one-liner that looks as follows:

alert tcp $HOME_NET [21,25,443,465,636,992,993,995,2484] -> $EXTERNAL_NET any ( msg:"OpenSSL SSLv3 large heartbeat response - possible ssl heartbleed attempt"; flow:to_client,established,only_stream; content:"|18 03 00|", depth 3; byte_test:2,>,128,0,relative; metadata:policy balanced-ips drop, policy security-ips drop, ruleset community; service:ssl; reference:cve,2014-0160; classtype:attempted-recon; sid:30514; rev:9; )

The above rule detects a Heartbleed attack and generates an alert with the message `msg`. Snort® will examine all TCP packets with source ports 21, 25, 443, etc. (typical ports that use SSL/TLS) that are sent to the client over an established connection, whose first 3 bytes are `18 03 00` (the heartbeat record type followed by the SSLv3 version) and whose next 2 bytes encode a value greater than 128 (i.e., the payload length is greater than 128 bytes, an indication of Heartbleed). Additionally, the rule instructs Snort® to drop the packet when running in Intrusion Prevention (inline) mode. The remaining fields indicate the CVE of the vulnerability, the rule set, and the rule SID and version.
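The payload part of this rule (the `content` and `byte_test` options) can be expressed as a few lines of C. This is only a sketch of the matching logic, not Snort®'s detection engine: the function name is hypothetical, the port and flow conditions are omitted, and the 2-byte value is read in network (big-endian) byte order, which is the default for `byte_test`.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the payload starts with 18 03 00 and the following two bytes,
 * read as a big-endian 16-bit integer, are greater than 128. */
int matches_heartbeat_rule(const uint8_t *payload, size_t len)
{
    if (len < 5)
        return 0;                       /* too short to hold header + length */

    /* content:"|18 03 00|", depth 3 */
    if (payload[0] != 0x18 || payload[1] != 0x03 || payload[2] != 0x00)
        return 0;

    /* byte_test:2,>,128,0,relative (two bytes right after the content match) */
    uint16_t length = (uint16_t)((payload[3] << 8) | payload[4]);
    return length > 128;
}
```

A response starting with `18 03 00 40 00` (length 16,384) matches, while `18 03 00 00 03` (length 3) does not.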

Appendix 2: Trusted-clock Thread

We implemented the trusted-clock thread using the following assembly code. We changed Graphene-SGX to start an additional enclave thread that infinitely executes `clock_thread_main`. The `trusted_clock` global variable serves as a time source.

volatile long unsigned trusted_clock;

int clock_thread_main(void* unused) {
    trusted_clock = 0;
    /* Spin forever: increment the counter in a register and store it back. */
    asm volatile (
        "mov %0, %%rcx\n\t"         /* rcx = &trusted_clock */
        "mov (%%rcx), %%rax\n\t"    /* rax = current counter value */
        "1: inc %%rax\n\t"
        "mov %%rax, (%%rcx)\n\t"    /* publish the new value */
        "jmp 1b"
        : /* no output operands */
        : "r"(&trusted_clock)
        : "%rax", "%rcx", "cc"
    );
    return 0;   /* never reached */
}

We patch the `ocall_gettime` system call to access `trusted_clock` and adjust it to return time in microseconds.

int ocall_gettime(unsigned long* microsec) {
#define CPUFREQ 3785.0   /* counter increments per microsecond on our machine */
    extern volatile long unsigned trusted_clock;
    *microsec = (long unsigned)(trusted_clock / CPUFREQ);
    return 0;
}

For our prototype, we skipped the issues around integer overflow and clock drift. We also hardcode the coefficient `CPUFREQ` for adjustment to reflect our particular machine. Note a subtle security implication: a malicious OS can preempt the clock thread at will and artificially slow down the passage of time. However, the OS cannot reverse the passage of time, since the counter only increases monotonically.

We consider our clock-thread feature a temporary workaround, since ultimately the `rdtsc` issue should be fixed in Intel® SGX hardware.
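As a side note on the conversion step, the division by a floating-point coefficient can also be written with integers only. The sketch below is an alternative formulation, not the code used in the prototype: `ticks_to_microsec` and `TICKS_PER_MS` are hypothetical names, and the constant simply restates the `CPUFREQ` coefficient of 3785.0 increments per microsecond as 3,785,000 increments per millisecond.

```c
#include <stdint.h>

/* Counter increments per millisecond; equivalent to 3785.0 per microsecond. */
#define TICKS_PER_MS 3785000ULL

/* Convert a raw counter value to microseconds using integer arithmetic only.
 * Splitting into whole milliseconds and a remainder keeps every intermediate
 * product well within 64 bits. */
uint64_t ticks_to_microsec(uint64_t ticks)
{
    uint64_t whole_ms = ticks / TICKS_PER_MS;
    uint64_t rest     = ticks % TICKS_PER_MS;
    return whole_ms * 1000ULL + (rest * 1000ULL) / TICKS_PER_MS;
}
```

Integer arithmetic avoids the rounding that occurs when a 64-bit counter above 2^53 is converted to a `double`, and the result never decreases as the counter grows, which preserves the monotonicity property discussed above.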


Appendix 3: Commands for Experiments

First, we modified the GRUB2 boot loader's settings for Linux:

GRUB_CMDLINE_LINUX="default_hugepagesz=1GB hugepagesz=1G hugepages=16 iommu=pt intel_iommu=on intel_idle.max_cstate=0 intel_pstate=disable isolcpus=2-3 nohz_full=2-3"

In particular, we instruct Linux to dedicate 16GB of RAM to 16 1GB-sized HugePages, to use IOMMU pass-through (`pt`) mode, to disable C-states (low-power modes), and to isolate CPU cores 2 and 3 (i.e., no scheduling of interrupts or other processes on these two cores). We then manually pin Snort® threads to cores 2 and 3.

A typical command to run PktGen with a synthesized workload looks as follows:

sudo pktgen -l 0-1 -n 2 -m 4096 -- -P -m 1.0 -f test_64B_256F.lua

The command initializes PktGen to run its GUI on core 0 and one DPDK thread on core 1 (handling RX/TX queues on port 0), with 2 memory channels and 4GB of HugePages RAM. PktGen enables promiscuous mode on all ports and reads the parameters of the workload from the file `test_64B_256F.lua`. This file contains a Lua script to start sending 64B packets in 256 TCP flows on port 0.

A typical command to run PktGen with a PCAP workload looks very similar:

sudo pktgen -l 0-1 -n 2 -m 4096 -- -P -m 1.0 -f test_start.lua -s 0:smallFlows.pcap

Here we add `-s` to specify that we want to stream the contents of a PCAP file on port 0. The script `test_start.lua` simply starts transmission.

A typical command to run Snort® or SEC-IDS:

sudo -E LD_LIBRARY_PATH="$LD_LIBRARY_PATH" snort --daq dpdk -i dpdk0 --daq-var dpdk_args="-n 2 -l 1 -m 4096" -z 3 -c snort.conf -R community_100.rules -A fast

The command executes Snort® with root privileges and passes `LD_LIBRARY_PATH` (required to find library dependencies). The arguments instruct Snort® to use the `dpdk` DAQ module and call the interface `dpdk0` (by convention). DPDK arguments are specified in `dpdk_args` and are similar to the ones above. We also specify three threads to run (one for DPDK, two for Snort®), the configuration file `snort.conf`, the rules file `community_100.rules` with 100 rules, and the `fast` alert output.

Appendix 4: Additional Experiments

For completeness, we report evaluation results for the percentage of dropped and analyzed packets.

Figures 4.1-4.5 show the percentage of dropped packets as reported by DPDK. Vanilla Snort® almost never drops packets, i.e., it consumes packets fast enough that the DPDK receive queue does not overflow. Surprisingly, SEC-IDS drops 5-20% of packets in all configurations (except the “useless mode” without a configuration file at all) without any particular pattern or cause. Upon further investigation, it turned out that the startup phase of SEC-IDS (the first 6-9 seconds of its execution) saw drop rates close to 100%, while during normal execution the drop rate never exceeded 1%. This behavior is understandable: during startup, SEC-IDS actively swaps pages in and out, with the Intel® SGX paging daemon `ksgxswapd` consuming 5-50% of CPU time. See Appendix 5 for example statistics.

Figures 4.6-4.10 show the percentage of analyzed packets as reported by Snort®. These figures exhibit the same patterns as our main evaluation (Figures 5-10). In general, by adding more Snort® threads, it would be possible to analyze 100% of packets. We project that 8-10 Snort® threads and a single DPDK thread would be sufficient to saturate a 10Gbps link (conditioned on the larger EPC size available in next-generation Intel® SGX machines).


Figure 4.1: Percentage of dropped packets of SEC-IDS and vanilla Snort® with increasing packet size.

Figure 4.2: Percentage of dropped packets of SEC-IDS and vanilla Snort® with increasing number of TCP flows.

Figure 4.3: Percentage of dropped packets of SEC-IDS and vanilla Snort® with increasing number of rules (max 3,462).

Figure 4.4: Percentage of dropped packets of SEC-IDS and vanilla Snort® without and with a configuration file.

Figure 4.5: Percentage of dropped packets of SEC-IDS and vanilla Snort® without and with outputting alerts.

Figure 4.6: Percentage of analyzed packets of SEC-IDS and vanilla Snort® with increasing packet size.

Figure 4.7: Percentage of analyzed packets of SEC-IDS and vanilla Snort® with increasing number of TCP flows.

Figure 4.8: Percentage of analyzed packets of SEC-IDS and vanilla Snort® with increasing number of rules (max 3,462).

Figure 4.9: Percentage of analyzed packets of SEC-IDS and vanilla Snort® without and with a configuration file.

Figure 4.10: Percentage of analyzed packets of SEC-IDS and vanilla Snort® without and with outputting alerts.


Appendix 5: Example of Drop Rate and CPU Utilization

As shown in Appendix 4 and Figures 4.1-4.5, the drop rate of SEC-IDS is very high, ranging from 5% to 20%. Our suspicion was that the startup phase of SEC-IDS caused the skew in the number of dropped packets. The root cause is that at startup, the enclave needs to initialize a lot of code and data, which leads to a high rate of EPC paging. The following table shows one example run with a 3-second breakdown of (1) the drop rate and (2) the CPU utilization of EPC paging, i.e., `ksgxswapd`.

To remove this skew from the drop rate in Figures 4.1-4.5, we could re-design our experiments such that SEC-IDS is first started, warmed up for several seconds, and only then loaded by PktGen. Fortunately, our current numbers on throughput (Figures 5-10) are still correct since these statistics were collected by Snort® after it went into steady state.

Second | Drop rate, % | ksgxswapd CPU utilization, %
0 | 0.0 | 0.0
3 | 100.0 | 49.3
6 | 100.0 | 19.7
9 | 0.95 | 0.0
12 | 0.21 | 4.7
15 | 0.78 | 3.0
18 | 0.15 | 27.2
21 | 0.0 | 0.0
24 | 0.0 | 0.0
27 | 0.0 | 0.0
30 | 0.0 | 0.0
33 | 0.0 | 0.0
36 | 0.0 | 0.0
39 | 0.0 | 0.0
42 | 0.28 | 0.0
45 | 0.0 | 0.0
48 | 0.0 | 0.0
51 | 0.90 | 0.0
54 | 0.79 | 0.0
57 | 0.67 | 0.0
60 | 0.98 | 0.0

Table 5.1: Example run for the SEC-IDS configuration: 2 threads, 1024B packet size, 256 flows, 3,462 rules.
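The skew caused by the startup phase can be checked directly on the drop-rate column of Table 5.1. The sketch below averages the per-interval drop rates with and without a warm-up cutoff; it assumes that every 3-second interval carries the same amount of traffic (so a plain average of percentages is meaningful), and the names `sample`, `mean_drop_rate`, and `table_5_1` are hypothetical.

```c
#include <stddef.h>

/* One row of the table: timestamp in seconds and drop rate in percent. */
struct sample {
    unsigned second;
    double drop_pct;
};

/* Average drop rate over all samples taken at or after `warmup_seconds`.
 * Returns 0.0 if no sample qualifies. */
double mean_drop_rate(const struct sample *s, size_t n, unsigned warmup_seconds)
{
    double sum = 0.0;
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].second < warmup_seconds)
            continue;                   /* skip the startup phase */
        sum += s[i].drop_pct;
        used++;
    }
    return used ? sum / (double)used : 0.0;
}

/* Drop-rate column of Table 5.1. */
static const struct sample table_5_1[] = {
    {0, 0.0},   {3, 100.0}, {6, 100.0}, {9, 0.95},  {12, 0.21}, {15, 0.78},
    {18, 0.15}, {21, 0.0},  {24, 0.0},  {27, 0.0},  {30, 0.0},  {33, 0.0},
    {36, 0.0},  {39, 0.0},  {42, 0.28}, {45, 0.0},  {48, 0.0},  {51, 0.90},
    {54, 0.79}, {57, 0.67}, {60, 0.98},
};
```

Averaged over the whole run, the drop rate is about 9.8%, which falls in the 5-20% range reported in Appendix 4; when the first 9 seconds are excluded, it falls to about 0.3%.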