.NET Core: Performance Storm
public.jugru.org/dotnext/2016/spb/day_1/track_1/sitnik.pdf
Adam Sitnik
![Page 2: NET Core: Performance Stormpublic.jugru.org/dotnext/2016/spb/day_1/track_1/sitnik.pdf · •Sockets (already used in ASP.NET Core) •File Streams •ADO.NET Data readers •Gains:](https://reader033.vdocument.in/reader033/viewer/2022051511/60141381be124e59e0036c4f/html5/thumbnails/2.jpg)
About myself
Work:
• Energy trading (.NET Core)
• Energy Production Optimization
• Balance Settlement
• Critical Events Detection
Open Source:
• BenchmarkDotNet (.NET Core)
• corefxlab (optimizations)
• & more
Contact: @[email protected]
https://github.com/adamsitnik
Agenda
• Slice
• ArrayPool
• NativeBufferPool
• ValueTask
• Unsafe
• Supported frameworks
• Performance Radar
• Questions
Slice (Span)
Slice is a simple struct that gives you uniform access to any kind of contiguous memory. It provides a uniform API for working with:
• Unmanaged memory buffers
• Arrays and subarrays
• Strings and substrings
It's fully type-safe and memory-safe, with almost no overhead.
Slice: internals
Supports unmanaged and managed memory
byte* pointerToStack = stackalloc byte[256];
Span<byte> stackMemory = new Span<byte>(pointerToStack, 256);
IntPtr unmanagedHandle = Marshal.AllocHGlobal(256);
Span<byte> unmanaged = new Span<byte>(unmanagedHandle.ToPointer(), 256);
char[] array = new char[] { 'D', 'O', 'T', ' ', 'N', 'E', 'X', 'T' };
Span<char> fromArray = new Span<char>(array);
Stackalloc is the fastest way to allocate small chunks of memory in .NET
Unmanaged memory => deterministic deallocation + NO GC
Uniform access to any kind of contiguous memory
public void Enumeration<T>(Span<T> buffer)
{
    for (int i = 0; i < buffer.Length; i++)
    {
        Use(buffer[i]);
    }
    foreach (T item in buffer)
    {
        Use(item);
    }
}
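The uniform-access idea can be sketched with the Span<T> type as it later shipped in System.Memory. This is a minimal illustration, not code from the talk: the names UniformAccess, Sum and SumOfStackAndArray are hypothetical, and it assumes a compiler that allows stackalloc to be assigned to a Span<T> in safe code (C# 7.2 or later).

```csharp
using System;

public static class UniformAccess
{
    // One method handles any contiguous memory, wherever it lives.
    public static int Sum(ReadOnlySpan<byte> buffer)
    {
        int sum = 0;
        foreach (byte b in buffer)
        {
            sum += b;
        }
        return sum;
    }

    public static int SumOfStackAndArray()
    {
        // Stack memory: no heap allocation, released when the method returns.
        Span<byte> onStack = stackalloc byte[4];
        onStack.Fill(2);

        // Managed array: converts implicitly to a span over the same elements.
        byte[] onHeap = { 1, 2, 3 };

        return Sum(onStack) + Sum(onHeap);
    }
}
```

The caller of Sum does not need to know whether the bytes sit on the stack, in a managed array, or in unmanaged memory.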
Make subslices without allocations
ReadOnlySpan<char> subslice =
".NET Core: Performance Storm!"
.Slice(start: 0, length: 9);
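The slide uses a Slice extension on string from the experimental corefxlab package. In the API that later shipped, the closest equivalent I am aware of is AsSpan() followed by Slice. A minimal sketch under that assumption (SubsliceDemo and FirstNine are hypothetical names):

```csharp
using System;

public static class SubsliceDemo
{
    public static string FirstNine()
    {
        // The span is a view over the original string: no copy, no allocation.
        ReadOnlySpan<char> subslice =
            ".NET Core: Performance Storm!".AsSpan().Slice(0, 9);

        // ToString() does allocate; it is called here only to show the result.
        return subslice.ToString();
    }
}
```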
Subslice benchmarks
Interpret untyped data without copying or allocating
[StructLayout(LayoutKind.Sequential)]
struct Bid
{
    ushort Amount;
    float Price;
}
void HandleRequest(Span<byte> rawBytes)
{
    // type and memory safety that comes for free!
    Span<Bid> bids = rawBytes.Cast<byte, Bid>();
}
bool areBytewiseEqual = sliceOfOneType.BlockEquals(sliceOfOtherType);
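Cast and BlockEquals are the experimental corefxlab names shown on the slide. As a hedged sketch of the same idea with the API that later shipped, MemoryMarshal.Cast reinterprets the bytes and SequenceEqual compares them. CastDemo, CountBids and RoundTrips are hypothetical names, and the fields are made public only so the example can read them.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct Bid
{
    public ushort Amount;
    public float Price;
}

public static class CastDemo
{
    // Reinterprets raw bytes as Bid values without copying or allocating.
    public static int CountBids(Span<byte> rawBytes)
    {
        Span<Bid> bids = MemoryMarshal.Cast<byte, Bid>(rawBytes);
        return bids.Length;
    }

    public static bool RoundTrips()
    {
        Bid[] original =
        {
            new Bid { Amount = 10, Price = 1.5f },
            new Bid { Amount = 20, Price = 2.5f },
        };

        // View the structs as bytes, then view the same bytes as structs again.
        Span<byte> raw = MemoryMarshal.AsBytes(original.AsSpan());
        Span<Bid> view = MemoryMarshal.Cast<byte, Bid>(raw);

        // Bytewise comparison of two views over the same memory.
        return view.Length == 2
            && view[1].Amount == 20
            && view[1].Price == 2.5f
            && raw.SequenceEqual(MemoryMarshal.AsBytes(view));
    }
}
```

With sequential layout and default packing, Bid occupies 8 bytes (2 for Amount, 2 of padding, 4 for Price), so a 16-byte buffer maps to two bids.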
Slice: Limitations
• Pointer + Offset overhead (GC compacts the heap)
• No cheap pinning (like fixed)
• For the unmanaged memory only Value Types are supported
• CIL Verification?
.NET Managed Heap*
Gen 0 | Gen 1 | Gen 2 | LOH
* - simplified, Workstation mode or view per logical processor in Server mode
.NET Managed Heap*
FULL GC: Gen 0 | Gen 1 | Gen 2 | LOH
* - simplified, Workstation mode or view per logical processor in Server mode
GC Pauses
ArrayPool
• The idea comes from corefxlab, got to corefx
• System.Buffers package, part of RC2
• Provides a resource pool that enables reusing instances of T[]
• Arrays allocated on managed heap with new operator
• The default maximum length of each array in the pool is 2^20 (1024*1024 = 1 048 576)
NativeBufferPool
• The idea comes from corefxlab
• Most probably will move together with Slices to corefx
• System.Buffers.Experimental package, corefxlab feed only
• Pool of Slices<T>
• Unmanaged memory, allocated with Marshal.AllocHGlobal
• The default for max size is 2MB, taken as the default since the average HTTP page is 1.9MB
How to work with Pools
• Get your own instance
  • ArrayPool<T>.Shared (thread safe)
  • NativeBufferPool<byte>.SharedByteBufferPool (thread safe)
  • ArrayPool<T>.Create(int maxArrayLength, ..);
  • Derive custom class from ArrayPool
• Rent
  • Specify the min size
  • A bigger array can be returned, don't rely on .Length
  • If min size > maxArrayLength, a new instance will be returned
• Finally { Return }
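The Rent / Finally { Return } pattern above can be sketched as follows, using ArrayPool<T>.Shared from System.Buffers. PoolDemo and SumOfOnes are hypothetical names chosen for illustration.

```csharp
using System;
using System.Buffers;

public static class PoolDemo
{
    public static int SumOfOnes(int minSize)
    {
        // The pool may hand back a larger array than requested.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(minSize);
        try
        {
            // Work only on the part that was asked for, not on buffer.Length.
            Span<byte> used = buffer.AsSpan(0, minSize);
            used.Fill(1);

            int sum = 0;
            foreach (byte b in used)
            {
                sum += b;
            }
            return sum;
        }
        finally
        {
            // Always give the array back, even if the work above throws.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

Slicing the rented array to the requested size is one way to avoid depending on .Length, which the slide warns about.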
10 KB
1 MB
1 MB
| Method | Median | StdDev | Scaled | Delta | Gen 0 | Gen 1 | Gen 2 |
|---|---|---|---|---|---|---|---|
| stackalloc | 51,689.8611 ns | 3,343.26 ns | 3.76 | 275.9% | - | - | - |
| New | 13,750.9839 ns | 974.0229 ns | 1.00 | Baseline | - | - | 23 935 |
| NativePool.Shared | 186.1173 ns | 12.6833 ns | 0.01 | -98.6% | - | - | - |
| ArrayPool.Shared | 61.4539 ns | 3.4862 ns | 0.00 | -99.6% | - | - | - |
| SizeAware | 54.5332 ns | 2.1022 ns | 0.00 | -99.6% | - | - | - |
10 MB
Async on hotpath
Task<T> SmallMethodExecutedVeryVeryOften()
{
    if (CanRunSynchronously()) // true most of the time
    {
        return Task.FromResult(ExecuteSynchronous());
    }
    return ExecuteAsync();
}
Async on hotpath: consuming method
while (true)
{
var result = await SmallMethodExecutedVeryVeryOften();
Use(result);
}
ValueTask<T>
• The idea comes from corefxlab, got to corefx
• System.Threading.Tasks.Extensions package, part of RC2
• Is a value type:
  • less space
  • fewer heap allocations = NO GC
  • probably faster memory reads
  • better data locality
ValueTask vs Task: Creation
| Method | Median | StdDev | Scaled | Delta | Gen 0 | Gen 1 | Gen 2 |
|---|---|---|---|---|---|---|---|
| Task.FromResult | 9.6268 ns | 0.1796 ns | 1.00 | Baseline | 9 793 | - | - |
| new ValueTask | 1.6605 ns | 0.0567 ns | 0.17 | -82.8% | - | - | - |
| ValueTask.FromResult | 1.7420 ns | 0.1672 ns | 0.18 | -81.9% | - | - | - |
ValueTask<T>: the idea
• Wraps a TResult and Task<TResult>, only one of which is used
• It should not replace Task, but help in some scenarios when:
  • method returns Task<TResult>
  • and very frequently returns synchronously (fast)
  • and is invoked so often that cost of allocation of Task<TResult> is a problem
Sample implementation of ValueTask usage
ValueTask<T> SampleUsage()
{
if (IsFastSynchronousExecutionPossible())
{
return ExecuteSynchronous(); // INLINEABLE!!!
}
return new ValueTask<T>(ExecuteAsync());
}
T ExecuteSynchronous() { }
Task<T> ExecuteAsync() { }
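A concrete variant of this pattern, written against ValueTask<T> as it ships in current .NET, might look like the sketch below. CachedSquares, GetAsync and ComputeAsync are hypothetical names and the cache is only an assumed example of a fast synchronous path. As far as I know the shipped type has no implicit conversion from the result type, so the sketch wraps the value explicitly with new ValueTask<int>(...).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class CachedSquares
{
    private readonly Dictionary<int, int> _cache = new Dictionary<int, int>();

    public ValueTask<int> GetAsync(int key)
    {
        // Fast path: the result is already known, so no Task is allocated.
        if (_cache.TryGetValue(key, out int cached))
        {
            return new ValueTask<int>(cached);
        }

        // Slow path: wrap the real asynchronous operation.
        return new ValueTask<int>(ComputeAsync(key));
    }

    private async Task<int> ComputeAsync(int key)
    {
        await Task.Yield(); // stands in for real asynchronous work
        int value = key * key;
        _cache[key] = value;
        return value;
    }
}
```

The first call for a key goes through the Task-based path; later calls for the same key complete synchronously from the cache.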
How to consume ValueTask
var valueTask = SampleUsage(); // INLINEABLE
if (valueTask.IsCompleted)
{
    Use(valueTask.Result);
}
else
{
    Use(await valueTask.AsTask()); // NO INLINING
}
ValueTask<T>: usage && gains
• Sample usage:
  • Sockets (already used in ASP.NET Core)
  • File Streams
  • ADO.NET Data readers
• Gains:
  • Fewer heap allocations
  • Method inlining is possible!
• Facts
  • Skynet 146ns for Task, 16ns for ValueTask
  • TechEmpower (Plaintext) +2.6%
System.Runtime.CompilerServices.Unsafe
• Provides the System.Runtime.CompilerServices.Unsafe class, which provides generic, low-level functionality for manipulating pointers.
• Very similar to https://github.com/DotNetCross/Memory.Unsafe
• Bleeding edge, +- 4 weeks old ;)
• Not in RC2, will be part of RTM, currently only corefx feed
• The whole code is a single IL file
Unsafe class api
public static T As<T>(object o) where T : class;
public static void* AsPointer<T>(ref T value);
public static void Copy<T>(void* destination, ref T source);
public static void Copy<T>(ref T destination, void* source);
public static void CopyBlock(void* destination, void* source, uint byteCount);
public static void InitBlock(void* startAddress, byte value, uint byteCount);
public static T Read<T>(void* source);
public static int SizeOf<T>();
public static void Write<T>(void* destination, T value);
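The signatures above take raw pointers and therefore need an unsafe context. As a rough sketch of the same kind of operation without one, the example below uses the ref-based members SizeOf, WriteUnaligned and ReadUnaligned, which were added to the Unsafe class in later releases and are not on the slide. UnsafeDemo and RoundTrip are hypothetical names.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class UnsafeDemo
{
    public static int RoundTrip(int value)
    {
        // SizeOf<T> works for any T, unlike the sizeof operator in safe code.
        byte[] buffer = new byte[Unsafe.SizeOf<int>()];

        // Write the int straight into the byte buffer, then read it back.
        // No bounds or type checks are performed beyond the first element.
        Unsafe.WriteUnaligned(ref buffer[0], value);
        return Unsafe.ReadUnaligned<int>(ref buffer[0]);
    }
}
```

The caller is responsible for making sure the buffer is large enough; that is the trade-off the class name advertises.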
Supported frameworks
| Package name | .NET Standard | .NET Framework | Release | NuGet feed |
|---|---|---|---|---|
| System.Slices | 1.0 | 4.5 | RTM? | corefxlab |
| System.Buffers | 1.1 | 4.5 | RC2 | nuget.org |
| System.Buffers.Experimental | 1.1 | 4.5 | RTM? | corefxlab |
| System.Threading.Tasks.Extensions | 1.0 | 4.5 | RC2 | nuget.org |
| System.Runtime.CompilerServices.Unsafe | 1.0 | 4.5 | RTM | corefx |
.NET Core Performance Radar
Questions?
You can find the benchmarks at https://github.com/adamsitnik/DotNetCorePerformance