Improving performance using .NET Core 3.0, Span<T>, and friends
Slides from a talk at the Sydney Alt.Net meetup about improving the performance of existing applications using .NET Core 3.0, Span, stackalloc and more.
less overall GC pressure. Less time allocating and deallocating objects means more CPU for you. Across the framework, lots of small improvements over many classes.
Can’t use it as a field in a class (since a class is on the heap) but can use it in a ref struct. Can’t do async/await with it (since the compiler creates a state machine… on the heap).
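One common way around the async restriction is to keep Memory<T> in the async method and hand a Span to a synchronous helper. The sketch below assumes that approach; the class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class AsyncSpanDemo
{
    // The async method holds Memory<byte>, which is allowed to live on the heap.
    public static async Task<int> CountZeroBytesAsync(Stream stream)
    {
        Memory<byte> buffer = new byte[4096];
        int zeros = 0;
        int read;
        while ((read = await stream.ReadAsync(buffer)) > 0)
        {
            // The Span only exists inside this statement, never across an await.
            zeros += CountZeros(buffer.Span.Slice(0, read));
        }
        return zeros;
    }

    // Synchronous helper: free to take and use a span.
    private static int CountZeros(ReadOnlySpan<byte> data)
    {
        int count = 0;
        foreach (byte b in data)
        {
            if (b == 0) count++;
        }
        return count;
    }
}
```

The span-heavy work stays in ordinary synchronous methods, and only the I/O part is async.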
a Span in a method. Create it from a string, array, or something implementing IMemoryOwner<T>. Lots of methods in .NET Core 2.1+ take Spans as arguments. Many more do so in .NET Core 3.0 (.NET Standard 2.1). https://apisof.net/
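A minimal sketch of creating a span from a string and slicing it instead of calling Substring or Split. SumCsv is a hypothetical helper and assumes well-formed input; int.Parse has accepted ReadOnlySpan<char> since .NET Core 2.1.

```csharp
using System;

public static class SpanParsing
{
    // Sums comma-separated integers without allocating any substrings.
    // Assumes well-formed input such as "10,20,30" (no validation here).
    public static int SumCsv(string csv)
    {
        ReadOnlySpan<char> remaining = csv.AsSpan();
        int total = 0;
        while (!remaining.IsEmpty)
        {
            int comma = remaining.IndexOf(',');
            ReadOnlySpan<char> token = comma >= 0 ? remaining.Slice(0, comma) : remaining;
            total += int.Parse(token); // span overload, no string is created
            remaining = comma >= 0 ? remaining.Slice(comma + 1) : ReadOnlySpan<char>.Empty;
        }
        return total;
    }
}
```

SpanParsing.SumCsv("10,20,30") returns 60. Each slice is just a view over the original string.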
- https://adamsitnik.com/Array-Pool/
    var samePool = ArrayPool<byte>.Shared;
    byte[] buffer = samePool.Rent(minLength); // may be longer than minLength
    try { Use(buffer); }
    finally { samePool.Return(buffer); }
Cheaper as soon as you need 1K of memory (or more) – and no allocations required.
an assembly into an “intern pool” and references point to them to avoid duplications. String.Intern() is for using the same concept at runtime. Warning: Strings in the intern pool are NEVER GC’ed. Great for unplanned memory leaks! Used with caution, it can reap large benefits in certain scenarios.
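A small sketch of what String.Intern() does at runtime: two equal strings built separately are distinct objects until they are interned. InternDemo is an illustrative name, not something from the talk.

```csharp
using System;

public static class InternDemo
{
    // Builds two equal strings at runtime and reports whether they are the same
    // object before and after interning.
    public static (bool Before, bool After) Compare()
    {
        string a = new string(new[] { 'S', 'Y', 'D' });
        string b = new string(new[] { 'S', 'Y', 'D' });

        bool before = object.ReferenceEquals(a, b); // two separate heap objects
        bool after = object.ReferenceEquals(string.Intern(a), string.Intern(b)); // one pooled instance

        return (before, after);
    }
}
```

InternDemo.Compare() returns (false, true). The pooled instance stays alive for the life of the process, which is the leak the warning refers to.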
ref int second, ref int third)
    {
        ref int max = ref first;
        // "max = ref x" re-points the ref local (C# 7.3+); a plain "max = x" would overwrite first instead.
        if (max < second) max = ref second;
        if (max < third) max = ref third;
        return ref max;
    }
The method result is simply a reference to whichever value was the largest. It has zero allocations.
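A self-contained sketch of the ref-return idea, showing that the caller gets a reference rather than a copy. The names Max and Demo are assumptions for illustration; re-pointing a ref local with "= ref" needs C# 7.3 or later.

```csharp
public static class RefReturnDemo
{
    // Returns a reference to the largest of the three arguments.
    public static ref int Max(ref int first, ref int second, ref int third)
    {
        ref int max = ref first;
        if (max < second) max = ref second; // re-point the ref local, no copy
        if (max < third) max = ref third;
        return ref max;
    }

    public static int[] Demo()
    {
        int a = 3, b = 9, c = 5;
        ref int largest = ref Max(ref a, ref b, ref c);
        largest = 0; // writes through the reference, so b becomes 0
        return new[] { a, b, c };
    }
}
```

RefReturnDemo.Demo() returns { 3, 0, 5 }: the assignment changed b itself, which a returned copy could not do.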
sloooow!
https://www.danielcrabtree.com/blog/191/casting-to-ienumerable-t-is-two-orders-of-magnitude-slower
Boxing operations create invisible allocations. Some boxing operations are hard to spot.
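Two illustrative cases of boxing, one obvious and one easier to miss. This is a sketch with made-up names, not code from the linked article.

```csharp
using System;
using System.Collections;

public static class BoxingDemo
{
    // Every conversion of a value type to object allocates a fresh box.
    public static bool BoxesAreDistinct()
    {
        int value = 42;
        object first = value;  // boxing: heap allocation
        object second = value; // boxing again: a second allocation
        return !object.ReferenceEquals(first, second);
    }

    // Harder to spot: a non-generic collection stores object,
    // so every Add of an int boxes it.
    public static int SumBoxed(int count)
    {
        var list = new ArrayList();
        for (int i = 0; i < count; i++)
        {
            list.Add(i); // hidden boxing
        }

        int sum = 0;
        foreach (object o in list)
        {
            sum += (int)o; // unboxing
        }
        return sum;
    }
}
```

Swapping ArrayList for List<int> removes the boxes without changing the loop.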
use the List<T> iterator. No casting and no hidden lambda code.
    public Symbol FindMatchingSymbol(string name)
    {
        foreach (Symbol s in symbols)
        {
            if (s.Name == name)
                return s;
        }
        return null;
    }
    [StructLayout(LayoutKind.Explicit)]
    public struct Bid
    {
        [FieldOffset(0)] public float Value;
        [FieldOffset(4)] public long ProductId;
        [FieldOffset(12)] public long UserId;
        [FieldOffset(20)] public DateTime Time;
    }
    …
    public Bid Deserialize(ReadOnlySpan<byte> serialized)
        => MemoryMarshal.Read<Bid>(serialized);
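A round-trip sketch of the same technique with a smaller, hypothetical struct (Tick is not from the slide). MemoryMarshal.Read<T> needs a span at least as long as the struct's size in memory, so the bytes here come from MemoryMarshal.AsBytes rather than from a hand-computed length.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical message type for illustration; not the Bid struct from the slide.
[StructLayout(LayoutKind.Explicit)]
public struct Tick
{
    [FieldOffset(0)] public float Price;
    [FieldOffset(4)] public long InstrumentId;
}

public static class TickSerializer
{
    // Copies the struct's raw bytes into a new array.
    public static byte[] Serialize(Tick tick)
    {
        Tick[] one = { tick };
        return MemoryMarshal.AsBytes(one.AsSpan()).ToArray();
    }

    // Reinterprets the leading bytes of the span as a Tick; no per-field parsing.
    public static Tick Deserialize(ReadOnlySpan<byte> serialized)
        => MemoryMarshal.Read<Tick>(serialized);
}
```

This only works for structs with no reference-type fields, and the byte layout depends on the machine's endianness, so it suits same-platform messaging better than a portable file format.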
stack. Don’t overdo it and keep it for short-lived usage. Beware: It’s easy to misuse this and make things worse.
    Span<byte> bytes = length <= 128
        ? stackalloc byte[length]
        : new byte[length];
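A sketch of the pattern in a complete method: a scratch buffer that lives on the stack for small inputs and on the heap otherwise. Reverse and StackLimit are illustrative names, and the threshold of 128 is simply the value from the slide.

```csharp
using System;

public static class StackallocDemo
{
    private const int StackLimit = 128; // small fixed bound; larger inputs use the heap

    // Reverses the UTF-16 code units of a string (surrogate pairs are not handled).
    public static string Reverse(string text)
    {
        int length = text.Length;
        Span<char> buffer = length <= StackLimit
            ? stackalloc char[length]
            : new char[length];

        text.AsSpan().CopyTo(buffer);
        buffer.Reverse();
        return new string(buffer); // the only allocation on the small-input path
    }
}
```

StackallocDemo.Reverse("abc") returns "cba". The buffer never leaves the method, which is what keeps stackalloc safe here.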
specific to ARM, x64, etc.
https://bits.houmus.org/2018-08-18/netcoreapp3.0-instrinsics-in-real-life-pt1
For general use the platform-independent Vector SIMD instructions are preferred (check System.Numerics.Vector.IsHardwareAccelerated).
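A minimal sketch of the platform-independent route using System.Numerics.Vector<T>, with a scalar loop for the tail and for machines without hardware acceleration. SimdSum is an illustrative name.

```csharp
using System;
using System.Numerics;

public static class SimdSum
{
    // Sums an int array, processing Vector<int>.Count elements per step when SIMD is available.
    public static int Sum(int[] values)
    {
        int i = 0;
        int total = 0;

        if (Vector.IsHardwareAccelerated && values.Length >= Vector<int>.Count)
        {
            Vector<int> acc = Vector<int>.Zero;
            int lastBlock = values.Length - Vector<int>.Count;
            for (; i <= lastBlock; i += Vector<int>.Count)
            {
                acc += new Vector<int>(values, i); // lane-wise add
            }
            total = Vector.Dot(acc, Vector<int>.One); // add the lanes together
        }

        // Scalar loop handles the remaining elements (or everything, without SIMD).
        for (; i < values.Length; i++)
        {
            total += values[i];
        }
        return total;
    }
}
```

SimdSum.Sum(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }) returns 55 on either path. Whether the vector path is actually faster for a given workload is something to measure, not assume.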
Identify an area you want to improve. Go ahead. Try and improve it. And prove it. ☺
Suggested developer loop:
1. Ensure all unit tests pass & baseline current performance
2. Make a change
3. Check unit tests still pass
4. Measure new performance and compare with baseline
5. Repeat from step 2 until happy
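For the baseline and measure steps, a rough sketch of a timing and allocation probe built on Stopwatch and GC.GetAllocatedBytesForCurrentThread. QuickMeasure is a hypothetical helper and only good for quick comparisons; a dedicated harness such as BenchmarkDotNet handles warm-up and statistics far more rigorously.

```csharp
using System;
using System.Diagnostics;

public static class QuickMeasure
{
    // Runs the action repeatedly and reports elapsed time and bytes allocated on this thread.
    public static (TimeSpan Elapsed, long AllocatedBytes) Run(Action action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not part of the measurement

        long bytesBefore = GC.GetAllocatedBytesForCurrentThread();
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        long bytesAfter = GC.GetAllocatedBytesForCurrentThread();

        return (watch.Elapsed, bytesAfter - bytesBefore);
    }
}
```

Run it once on the current code to get the baseline, then again after each change, and compare both numbers.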