Slide 1

Slide 1 text

Implementation Details Matter
David Fowler
Partner Software Architect, Microsoft

Slide 2

Slide 2 text

About Me
• Partner Software Architect at Microsoft on the .NET team.
• Co-creator of NuGet, SignalR, ASP.NET Core and the Azure SignalR Service.
• I spend a lot of my time helping customers debug and diagnose complex issues with .NET in production.
• I spend the rest of my time in meetings and writing code.

Slide 3

Slide 3 text

Why this talk?
• .NET has been evolving into a modern, high-performance platform.
• I’ve been spending a chunk of my time helping teams at Microsoft design high-performance libraries and diagnose problems in large .NET-based services.
• There’s a lot of FUD surrounding premature optimization that I’d like to help dispel.

Slide 4

Slide 4 text

Why do implementation details matter?
• Software is built by humans with specific assumptions and scenarios in mind.
• Understanding those assumptions can make a big difference at scale.
• Understanding the relevant details can help you root cause problems quickly.

Slide 5

Slide 5 text

What are some of those assumptions?
• “There should only be a handful of these instances in practice.”
• “This should only happen once at the start of the application, so it can be expensive.”
• “Nobody is going to create this per request.”
- Software Engineer

Slide 6

Slide 6 text

What does “at scale” mean?
Scale for an application can mean the number of users, the amount of input (like the size of data), or the number of times that data needs to be processed (e.g., the number of requests).
Scaling as an engineer means knowing what to ignore and what to pay close attention to.

Slide 7

Slide 7 text

How much do I need to know?
• Be curious and learn as much as you can about any area you are working in.
• There will be a point of diminishing returns.
• Building up a mental model for how things work can be very effective.
• Figure out the “cliff notes” (the TL;DR) of an area.

Slide 8

Slide 8 text

How do I learn the details?
• Build things.
• Read code others wrote.
• Talk to subject matter experts.
• Read the documentation, including the fine print.
• Debug the code.
• Read Stack Overflow answers.
• Read the source code again.

Slide 9

Slide 9 text

Is this a performance talk?

Slide 10

Slide 10 text

Examples of these assumptions in .NET
• The GC generally optimizes for throughput, not reduced memory usage.
• Timers are optimized for creation and deletion. The assumption is they don’t fire most of the time.
• Configuration is expected to be built once on application start.

Slide 11

Slide 11 text

Examples from the real world
• To put things in context, we’re going to use real examples.
• Various teams at Microsoft use .NET extensively for large scale services.
• We’ll look at some problems and root causes.
• These code samples are in the hot path.

Slide 12

Slide 12 text

The names of teams have been changed to protect the innocent.

Slide 13

Slide 13 text

Show me the code

Slide 14

Slide 14 text

internal static byte[] DecompressBytes(byte[] raw)
{
    using (var stream = new GZipStream(new MemoryStream(raw), CompressionMode.Decompress))
    {
        const int size = 4096;
        byte[] buffer = new byte[size];
        using (var memory = new MemoryStream())
        {
            int count;
            do
            {
                count = stream.Read(buffer, 0, size);
                if (count > 0)
                {
                    memory.Write(buffer, 0, count);
                }
            } while (count > 0);
            return memory.ToArray();
        }
    }
}

Callouts:
• input buffer
• Copy buffer
• Another internal buffer
• Creates a new buffer
• Copy into internal buffer
• Allocates an 8K buffer internally

Slide 15

Slide 15 text

Hardcoded at 8K

Slide 16

Slide 16 text

Excessive buffer allocations and copies
• This pattern is very common in server applications.
• Lots of large temporary buffers that are thrown away per operation.
• This can wreak havoc on the GC and your application as a result.
• Be mindful about excessive buffer allocations and copies.
• Use Streams/Pipelines for large data sets.
• Pool and re-use buffers when you need to operate on in-memory data.
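The pooling advice above can be sketched roughly as follows. This is a minimal illustration, not the code from the talk: `PooledDecompression` and `DecompressTo` are hypothetical names, and the sketch assumes the caller can supply a destination stream. It only removes the per-call scratch `byte[]`; `GZipStream` still manages its own internal buffers.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.IO.Compression;

public static class PooledDecompression
{
    // Hypothetical helper: streams decompressed bytes from `compressed` into
    // `destination`, renting the scratch buffer instead of allocating one per call.
    public static void DecompressTo(Stream compressed, Stream destination)
    {
        // Rent may hand back an array larger than requested, so use buffer.Length.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
        try
        {
            using var gzip = new GZipStream(compressed, CompressionMode.Decompress, leaveOpen: true);
            int count;
            while ((count = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                destination.Write(buffer, 0, count);
            }
        }
        finally
        {
            // Always return the buffer, even if decompression throws.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

Writing into a caller-supplied stream also avoids the final `ToArray()` copy, at the cost of a different method shape.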

Slide 17

Slide 17 text

internal static Stream DecompressBytes(byte[] raw)
{
    return new GZipStream(new MemoryStream(raw), CompressionMode.Decompress);
}

Callouts:
• Make the caller use a Stream

Optimizing the code sometimes means changing the pattern completely.
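To illustrate what "make the caller use a Stream" might look like on the calling side, here is a small sketch. `StreamingCaller` and `CountLines` are hypothetical names invented for this example; the point is only that the caller consumes the data incrementally and never holds the full decompressed payload in one array.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class StreamingCaller
{
    // Same shape as the method on the slide: wrap the input, don't copy it.
    internal static Stream DecompressBytes(byte[] raw) =>
        new GZipStream(new MemoryStream(raw), CompressionMode.Decompress);

    // Hypothetical caller: counts lines without materializing the whole payload.
    public static int CountLines(byte[] raw)
    {
        // Disposing the reader also disposes the underlying GZipStream.
        using var reader = new StreamReader(DecompressBytes(raw), Encoding.UTF8);
        int lines = 0;
        while (reader.ReadLine() != null)
        {
            lines++;
        }
        return lines;
    }
}
```

`ReadLine` still allocates a string per line, so this is a trade: many small short-lived allocations instead of one large buffer per operation.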

Slide 18

Slide 18 text

public static string OptimizeString(string value)
{
    if (value != null)
    {
        string interned = string.IsInterned(value);
        if (interned != null)
        {
            return interned;
        }
    }
    return value;
}

What does this do? Optimize what?

Slide 19

Slide 19 text

• This is a global lock
• This is the map/dictionary

Slide 20

Slide 20 text

What’s the fix?
• This code didn’t need to intern the strings.
• If the string is already interned, then the check is cheap and lock free.
• Strings that aren’t interned pay the cost of the global lock.
• If you need to intern strings, consider implementing your own cache or using a NuGet package that does it.
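A "your own cache" approach could look something like the sketch below. `StringCache` is a hypothetical type, not an API from .NET or from the talk. It assumes the set of distinct strings is bounded, since nothing is ever evicted.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class StringCache
{
    // Reads are lock free; adds lock only part of the dictionary,
    // not a single process-wide lock.
    private readonly ConcurrentDictionary<string, string> _cache =
        new ConcurrentDictionary<string, string>(StringComparer.Ordinal);

    // Returns one shared instance for every string with the same contents.
    public string GetOrAdd(string value)
    {
        if (value is null)
        {
            return null;
        }
        return _cache.GetOrAdd(value, value);
    }
}
```

Unlike the runtime's intern table, a cache like this can be scoped to one component and dropped when it is no longer needed.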

Slide 21

Slide 21 text

public static IEnumerable<Operation> GetOccurences(this Operation root, Func<Operation, bool> predicate)
{
    var visitedActivityIds = new HashSet();
    var unvisitedOperations = new Queue<Operation>();
    unvisitedOperations.Enqueue(root);
    while (unvisitedOperations.Any())
    {
        var currentOperation = unvisitedOperations.Dequeue();
        if (visitedActivityIds.Contains(currentOperation.ActivityId))
        {
            continue;
        }
        visitedActivityIds.Add(currentOperation.ActivityId);
        if (predicate.Invoke(currentOperation))
        {
            yield return currentOperation;
        }
        currentOperation.ChildActions.Where(r => r.IsComplete).ToList().ForEach(unvisitedOperations.Enqueue);
    }
}

Callouts:
• LINQ (the Any() call)
• LINQ (the Where() call)
• List allocation (ToList())
• Delegate allocation (ForEach(unvisitedOperations.Enqueue))

Slide 22

Slide 22 text

public static IEnumerable<Operation> GetOccurences(this Operation root, Func<Operation, bool> predicate)
{
    var visitedActivityIds = new HashSet();
    var unvisitedOperations = new Queue<Operation>();
    unvisitedOperations.Enqueue(root);
    while (unvisitedOperations.Count > 0)
    {
        var currentOperation = unvisitedOperations.Dequeue();
        if (!visitedActivityIds.Add(currentOperation.ActivityId))
        {
            continue;
        }
        if (predicate.Invoke(currentOperation))
        {
            yield return currentOperation;
        }
        foreach (var item in currentOperation.ChildActions)
        {
            if (item.IsComplete)
            {
                unvisitedOperations.Enqueue(item);
            }
        }
    }
}

Callouts:
• Directly use Count
• Add replaces the Contains call
• Use a normal foreach loop

Slide 23

Slide 23 text

Let's talk about LINQ…
• LINQ to objects is powerful and expressive.
• It should not be used in your application’s hot paths.
• When you need to reduce allocations or save CPU cycles, LINQ is an easy target.
• It is optimized for what it does.
• There are lots of special cases to avoid expensive calls.
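As a small illustration of why LINQ is "an easy target", the sketch below writes the same count two ways. `HotPathCounting` and its methods are hypothetical names for this example; exactly what the LINQ version allocates depends on the runtime version, so treat the comments as typical behavior rather than a guarantee.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class HotPathCounting
{
    // LINQ version: concise, but Where typically allocates an iterator object
    // on every call, and each element goes through a delegate invocation.
    public static int CountEvenLinq(List<int> values) =>
        values.Where(v => v % 2 == 0).Count();

    // Loop version: foreach over a List<int> variable uses the list's struct
    // enumerator, so the loop itself does not allocate.
    public static int CountEvenLoop(List<int> values)
    {
        int count = 0;
        foreach (int v in values)
        {
            if (v % 2 == 0)
            {
                count++;
            }
        }
        return count;
    }
}
```

Outside hot paths the LINQ version is usually the better choice because it is easier to read.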

Slide 24

Slide 24 text

public JsonDomNode(byte[] JsonDOMString, int length)
{
    string str = Encoding.UTF8.GetString(JsonDOMString, 0, length);
    this.JsonDOMString = str.ToCharArray();
    Parse(this.JsonDOMString, 0);
}

Callouts:
• str is thrown away

Slide 25

Slide 25 text

public JsonDomNode(byte[] JsonDOMString, int length)
{
    this.JsonDOMString = Encoding.UTF8.GetChars(JsonDOMString, 0, length);
    Parse(this.JsonDOMString, 0);
}

Callouts:
• Directly get the char[]

Slide 26

Slide 26 text

private bool CheckIfFilePathIsValid(string localPath)
{
    Regex validPath = new Regex(@"^[a-zA-Z]:\\(?<id>[a-z0-9\-]+)\\$", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    string potentialId = validPath.Match(localPath).Groups["id"].Value;
    string localPathRoot = Path.GetPathRoot(localPath).ToLower();
    string systemLocalPathRoot = Path.GetPathRoot(Environment.SystemDirectory).ToLower();
    Guid unused;
    return (Guid.TryParse(potentialId, out unused) && (!localPathRoot.Equals(systemLocalPathRoot)));
}

Callouts:
• Compiled Regex, per call
• 2 ToLower calls
• Case insensitive comparison

Slide 27

Slide 27 text

Compiled Regexes
• Regex compilation is about 5,000 lines of code.
• It generates IL specific to the regular expression specified.
• Every new call will create new methods that need to be JITted.
• Cache these aggressively.

Slide 28

Slide 28 text

private static readonly Regex validPath = new Regex(@"^[a-zA-Z]:\\(?<id>[a-z0-9\-]+)\\$", RegexOptions.IgnoreCase | RegexOptions.Compiled);

private bool CheckIfFilePathIsValid(string localPath)
{
    string potentialId = validPath.Match(localPath).Groups["id"].Value;
    if (!Guid.TryParse(potentialId, out var unused))
    {
        return false;
    }
    string localPathRoot = Path.GetPathRoot(localPath);
    string systemLocalPathRoot = Path.GetPathRoot(Environment.SystemDirectory);
    // As in the original version, the path is valid only when it is NOT on the system drive.
    return !localPathRoot.Equals(systemLocalPathRoot, StringComparison.OrdinalIgnoreCase);
}

Callouts:
• Compile once and cache
• Check for Guid first
• Remove ToLower calls and do case insensitive comparison

Slide 29

Slide 29 text

public static string GetLogLevel(LogLevel logLevel)
{
    return logLevel.ToString();
}

How does this work?

Slide 30

Slide 30 text

• Get the names and values for the enum type (this is cached)
• Binary search to find the index

Slide 31

Slide 31 text

public static string GetLogLevel(LogLevel logLevel) => logLevel switch
{
    LogLevel.Trace => "Trace",
    LogLevel.Debug => "Debug",
    LogLevel.Information => "Information",
    LogLevel.Warning => "Warning",
    LogLevel.Error => "Error",
    LogLevel.Critical => "Critical",
    LogLevel.None => "None",
    _ => throw new NotSupportedException()
};

Callouts:
• Constant strings

Slide 32

Slide 32 text

private static void ScanForExpiredItems(MemoryCache cache)
{
    DateTimeOffset now = cache._lastExpirationScan = cache._options.Clock.UtcNow;
    foreach (CacheEntry entry in cache._entries.Values)
    {
        if (entry.CheckExpired(now))
        {
            cache.RemoveEntry(entry);
        }
    }
}

Callouts:
• This is a ConcurrentDictionary

Slide 33

Slide 33 text

ConcurrentDictionary
• Conceptually a Dictionary with more granular locking.
• Reads are lock free.
• Writes are not!
• Several APIs lock the entire collection (.Keys, .Values, etc.).
• Some APIs take snapshots of the underlying collections (allocations and copies).
• The default number of concurrent writes is equal to the number of processors on the machine (you’ll have that many locks by default).
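The difference between the locking APIs and direct enumeration can be sketched as below. `ExpirationScan` and the key-to-expiry dictionary shape are hypothetical, chosen only to keep the example self-contained. The sketch relies on the documented behavior that a `ConcurrentDictionary` enumerator is safe to use alongside concurrent writes and is not a point-in-time snapshot.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class ExpirationScan
{
    // Hypothetical cache shape: key -> expiration time.
    public static int RemoveExpired(
        ConcurrentDictionary<string, DateTimeOffset> entries,
        DateTimeOffset now)
    {
        int removed = 0;
        // Enumerating the dictionary directly avoids the collection-wide lock
        // and the copy that .Values takes. Removing while enumerating is allowed.
        foreach (KeyValuePair<string, DateTimeOffset> item in entries)
        {
            if (item.Value <= now && entries.TryRemove(item.Key, out _))
            {
                removed++;
            }
        }
        return removed;
    }
}
```

Because the enumeration is not a snapshot, an entry replaced by another thread between the check and `TryRemove` could be removed by key; real code may need to compare the value before removing.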

Slide 34

Slide 34 text

private static void ScanForExpiredItems(MemoryCache cache)
{
    DateTimeOffset now = cache._lastExpirationScan = cache._options.Clock.UtcNow;
    foreach (KeyValuePair<object, CacheEntry> item in cache._entries)
    {
        CacheEntry entry = item.Value;
        if (entry.CheckExpired(now))
        {
            cache.RemoveEntry(entry);
        }
    }
}

Callouts:
• Enumerate entries directly

Slide 35

Slide 35 text

Lessons
• Measure what you care about! Do not trust your intuition.
• Think about how things are implemented.
• Quick to call != Quick to execute.
• Everything is a tradeoff.
• The compiler isn’t nearly as smart as you think it is…
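For "measure what you care about", a rough first check can be as small as the sketch below. `Measure.Run` is a hypothetical helper written for this example. It assumes a single-threaded action and a runtime that provides `GC.GetAllocatedBytesForCurrentThread` (.NET Core 3.0 or later). For numbers you intend to act on, a dedicated harness such as BenchmarkDotNet is a better fit.

```csharp
using System;
using System.Diagnostics;

public static class Measure
{
    // Runs `action` repeatedly and reports elapsed time and bytes allocated
    // on the calling thread. A quick sanity check, not a rigorous benchmark.
    public static (TimeSpan Elapsed, long AllocatedBytes) Run(Action action, int iterations)
    {
        action(); // warm-up call so one-time JIT cost is not counted

        long before = GC.GetAllocatedBytesForCurrentThread();
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        long after = GC.GetAllocatedBytesForCurrentThread();

        return (stopwatch.Elapsed, after - before);
    }
}
```

A call such as `Measure.Run(() => logLevel.ToString(), 1_000_000)` next to the switch-based version gives a first indication of whether a change is worth pursuing.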

Slide 36

Slide 36 text

Research Tools
• The .NET source (https://source.dot.net)
• SharpLab (https://sharplab.io)
• Grep.app (https://grep.app/)
• Decompilers (ILSpy, etc.)
• GitHub issues/discussions (ask questions!)
• C# Discord

Slide 37

Slide 37 text

Learning journey
• This is not “bad code”; we’re all on a learning journey.
• Learn from the code you’ve written in the past.
• Nobody is perfect.
• These examples came from Microsoft.
• I learned some things making these slides.

Slide 38

Slide 38 text

Implementation details are volatile
• The details are constantly evolving. Make sure your knowledge is up to date!
• Specific details often change; concepts don’t change as often.
• Continuous measurement is important. Things tend to improve over time.

Slide 39

Slide 39 text

Recipe
• Build a mental model for the major parts of the system.
• Figure out which parts need to scale.
• What’s on the hot path?
• Figure out how those parts work, down to the relevant details.
• Optimize your code appropriately.

Slide 40

Slide 40 text

Questions?

Slide 41

Slide 41 text

Follow me on Twitter: @davidfowl