sql server on linux, will it perform? - qcon london … · hosted windows apis sql server ... •...

28
SQL Server on Linux, will it perform? Slava Oks

Upload: lamminh

Post on 06-Sep-2018

223 views

Category:

Documents


0 download

TRANSCRIPT

SQLServeronLinux,willitperform?

SlavaOks

ThankYou!MicrosoftResearch

Windowsteam

Midori

OurgoalistomakeSQLServerperformandscaleonanyplatformorhardwareof

customerschoice

Prolog:MeetthePALs

• ModifiedWindowsKerneltoruninusermode,akaLibraryOSorLibOS

• DesignedforrunningonWindowsandleveragesPico-processfeature

• Pico-processisaNTprocesswithemptyaddressspace

IntrotoDrawbridge:Acontainertechnologytoachieveisolation,securityanddensityinthecloud

NT processshared

address space

user32gdi32

ntdll

host

O

S ntoskrnl

win32k

400+NT calls

800+Win32 calls

Picoprocesspicoprocess

isolated address space

ABIboundaryPAL

host

O

S security monitor

ntoskrnl

45calls

• All1200+systemcallsblockedfromuser-mode(NTOSandwin32k)

• Enforcedby35-linechangetoKiSystemServiceHandler

• Noperfimpacttootherprocesses—leverages“slowpath”usedbyUMS

• 45newsystemcallsaddedtoprocess(Drawbridgesystemcalls)

• Evenhard-codedtrapscan’tbreakout

NT UM

LibOS:AusermoderuntimelibraryexposingsemanticsofWindowskernel

Network Stack

I/O

Object Manager

Process Manager

DRTL Simple HeapUnion FS

AFD

Wait Pool

Threads Memory Manager

Loader

PEB/TEB

PAL ABI Handler

Sync Objects ThreadsStreams Memory Manager

APC

• StorageManager

• AsynchronousI/OsubmittedtothehostandregisteredwithWaitPool threads

• OncompletionWaitPool threadsdeliverI/Os totheoriginalthreadthroughAPC

• OriginalthreadsdeliverI/Os totheirfinaldestination

• NetworkManager

• CustomversionofAFD(WinSocksemantics)withathreadpool

• AFDAsynchronousI/OsubmittedtothehostandregisteredwithWaitPoolthreads

• OncompletionWaitPool threadsdeliverI/Os toAFDthreadsthroughAPC

• threadsdelivernetworkrequeststooriginalthreadsinitiatedI/OthroughAPC

• OriginalthreadsdeliverI/Os totheirfinaldestination

• I/OGeneral

• NopropersupportforScatter/Gather

• MemoryManager

• GlobalVirtualAddressDescriptor(VAD)list

• GlobalHeap

• ObjectManager

• GlobalDirectory

• ProcessManager

• Perprocessruntimelibraries– noimagesharing

• Threads

• APCs“injection”throughpolling

SQLOS(SOS):Ausermoderuntimelibraryprovidingperformance,scalabilityanddiagnostic

foundationforSQLServer

Memory Node

CPU Node

Network Manager

Scheduler

Storage Manager

Scheduler

Storage Manager

CPU Node

Network Manager

Scheduler

Storage Manager

Scheduler

Storage Manager

• NetworkManager

• I/Ocompletionport/threadperCPUNode

• Asynchronousdelivery

• StorageManager

• I/Oqueueperscheduler

• Synchronousdeliverythroughperiodicpolling

• MemoryManager/ObjectManager/SchedulingManager

• NUMAawareness

• Partitionedheaps

• Non-preemptivescheduling&UserModeThreads

• Synchronizationprimitives

Chapter1:SQL&PALs

Themarriageinheavenor…

SQLServerOnTopOfPALs

Ring3

SQLServer

SQLOS

Win32

LibOS

PAL

Ring0 LinuxKernel

Drawbridge

Technologies SQL LibOS Host Extension

Object Management ✔ ✔ ✔

MemoryManagement ✔ ✔ ✔

Threading/Scheduling ✔ ✔ ✔

Synchronization ✔ ✔ ✔

I/O(Disk,Network) ✔ ✔ ✔

Chapter2:Thesignisonthewall

IntroducingIntelligentHacks

• Kernelaio

• PumpthreadsvsWaitPool threads

• FastI/O

// We can do Fast I/O if and only if it follows rules employed by SQL Server// disk I/O: which is delivered nonpreemptively through polling an overlapped// data structure// - I/O is asynchronous// - No user mode APC required// - No I/O completion port specified// - Contains an event to be signaled (leveraged by SQL Server to wake up idle scheduler// - Disk I/O//

canDoFastIO = WaitForCompletion == FALSE;

canDoFastIO = canDoFastIO && (ApcRoutine == NULL && FileObject != NULL);

canDoFastIO = canDoFastIO && (Args->SkipCompletionPort ||NtpGetCompletionPortObject(FileObject,

&CompletionKey) == NULL);

canDoFastIO = canDoFastIO && (Args->EventObject != NULL && IoStatusBlock != NULL);

canDoFastIO = canDoFastIO && (NtpGetObjectType(Args->Object) == NTUM_FILE &&NtpIsIoAsynchronous(Args->Object));

canDoFastIO = canDoFastIO && ((FileObject->Type & NtpSeekableFile) &&(Type == NTUM_IO_READ ||Type == NTUM_IO_WRITE ||Type == NTUM_IO_WRITE_GATHER ||Type == NTUM_IO_READ_SCATTER));

// If it is Gather/Scatter I/O then length can't exceed DK_UIO_MAXIOV supported by the Host//canDoFastIO = canDoFastIO && (!(Type == NTUM_IO_WRITE_GATHER ||

Type == NTUM_IO_READ_SCATTER) ||Length <= DK_UIO_MAXIOV);

• PumpthreadsvsWaitPool

• FastI/O~AFDpassthrough

• SQLOScompletionthreadsarepumpthreads~nocontextswitchoncompletion

// Complete I/Os received via the the IOPort are submitted to the I/O// completion port queueStatus = NtpTryToProcessIoCompletion(IoCompletionPort,

IoCompletionInformation);

// Process any APCs or interruptions for this thread. //NtpProcessKernelApc(threadObject);

Request.IOPort = IoCompletionPort->IOPort;Request.PendingIOs = &PendingIOs;

Status = DrtlReadStreamSync(IoCompletionPort->Stream,0,0,(PVOID)&Request,NULL);

while (PendingIOs != NULL){

//// Remember I/O to complete and move to the next I/O before// we complete the current one since by the time we return from// completion routine the completed I/O will be freed//CompletedIO = PendingIOs;PendingIOs = (PDK_ASYNC_RESULTS_LINKED)PendingIOs->Next;

//// Complete I/O//NtpCompleteNetworkIoRequest((PNTUM_IO_REQUEST)CompletedIO->Request);

}

• MultipleHeaps

• I/ORequestfreelistperthread

• PerprocessVirtualAddressSpaceManager

• NUMAsupport

• ProcessorAffinity

PVOIDDrtlAllocate(

__in ULONG Flags,__in SIZE_T Size,__in ULONG Tag)

{ULONG heapIdx;

//// Early boot we might not have a thread//heapIdx = DrtlGetCurrentThreadId() % g_DrtlNumberHeaps;

return DrtlpAllocate(&g_DrtlHeaps[heapIdx], Flags, Size, Tag);}

NtpAllocateIORequestRaw(__in NTUM_IO_TYPE Type)

{

// Use cache if we have i/O request//LocalRequest = (PNTUM_IO_REQUEST)ExpInterlockedPopEntrySList(

&RequestingThread->IORequestsCache);

// If the cache was empty allocate a new request structure.//if (LocalRequest == NULL){

LocalRequest = (PNTUM_IO_REQUEST)ExAllocatePoolWithTag(PagedPool,sizeof(*LocalRequest),' PRI');

}

Chapter3:PressureisOn

HardwareConfigurationPowerSettings:OSControlpoweroption,HighPerformanceinOS,HTOFF,TurboboostOFFNetwork:1x10GBNetworkconnectionpermachineMachine configuration(serverandclient):Gen3systemsModel/Processors:[email protected](2S/16C),Memory:128GBStorage:4x447.13GBSSDs.AllSSDsarestripedtogetherandmountedas1volume.Bothdata

andlogarestoredonthisvolume.

HardwareConfigurationPowerSettings:OSControlpoweroption,HighPerformanceinOS,HTOFF,TurboboostOFFNetwork:1x10GBNetworkconnectionpermachineMachine configuration(serverandclient):4Ssystems(forTPCCtest)Model/Processors:[email protected](4S/40C),Memory:768GBDataStorage:2x1.46TBGBFusionIOdisk.Alldisksarestripedtogetherandmountedas1volume.LogStorage:1x5.54TBHDD

Chapter4:TheultimatePAL

IntroducingSQLPAL

Principles:

• Removeredundancy

• OptimizePerformancecriticalpaths(I/O)

• Shrinkcodepath-lengthLibOSandWin32

Technologies SQL SOSv2 Host Extension

Object Management ❌ ✔ ❌

MemoryManagement ❌ ✔ ✔Hosttranslation(jemalloc)

Threading/Scheduling ❌ ✔ ✔Hosttranslation(pthreads)

Synchronization ❌ ✔ ✔Hosttranslation(conditionvariables)

I/O(Disk,Network) ❌ ✔ ✔Hosttranslation(kaio)

Ring3

SQLServerWin32

SOSv2Lib-OS

HostExtension

Ring0 LinuxKernel

SQLPAL

SQLPALandSOSv2Architecture

HostExtensionandIntegration HEDebuggerBridge

SOSv2(Memory,Scheduling,Synchronization)

StorageManager

NetworkManager

ResourceManager

ProcessManager

SecurityManager

AvailabilityManager

NTUserMode

ConfigManager

PALDebuggerExtension

HostedWindowsAPIs

SQLServer

SOSDirectAPIs

Chapter5NaturalHabita(n)t

LinuxProcessLayout

•HostExtensionisnativeLinuxprocess

• TheHostExtensionloadstheSQLPALnativeWindowslibrary

• SQLPALloadsSQLServerintoavirtualWindowsProcess.

SQLServer(WindowsBinary)

LibOS(WindowsKernelinUserMode)

HostExtension(LinuxorOSX)

Win32Calls(1200+)

ABICalls(50)

LinuxorOSXOSCalls

LinuxorOSXOS

LLDB

Deb

ugger

Debugger

• DebuggerbridgeforWindbg

• FormostscenariosdebuggingisidenticaltoWindows

• LiveDebugging

• StartSQLonLinuxunderdebuggerbridge

• AttachwithWindbg

• Dscripts etc.worksameasagainstWindows

• CrashDump

• Rundebuggerbridgepassingincrashdumpfile

• AttachwithWindbg andit’sthesameasWindows

• ExtractWindowsdumpfromLinuxCoredump

• AbletoextractaWindowsdumpfromLinuxcoredump

• LosesLinuxinformation

• LinuxEnlightenment

• ThedebuggerextensionalsoaddscommandstodebugLinuxpartsofthePAL

• CommandsmirrornormalWindbg commands

• Examples:• ‘k’showsWindowsstack• ‘!k’showsLinuxstack• Samefordv(!dv),dt (!dt),etc.• Sourcecanbelistedandsourcesteppingworks

• VTune isacrossplatformperformancetool

• Process

• CaptureonLinuxandresolveonLinux

• CopytheprojecttoWindows

• Resolvesymbolsandrerunanalysis

• ThisaddstheWindowsinformationtotheproject

• Afterprocessingallthecodeisavailableforanalysis:Linuxcode,sqlpal.dll,Win32,andSQL

Chapter6:ThegameisON

ThankYou