
Implementation Details Matter

I'm sure you've heard the phrase "It's an implementation detail". Details matter when it's time to scale and milliseconds add up quickly. Understanding the relevant details of the .NET subsystems can make the difference between scaling and failing.


David Fowler

March 24, 2022


  1. Implementation Details Matter
     David Fowler, Partner Software Architect, Microsoft

  2. About Me
     • Partner Software Architect at Microsoft on the .NET team.
     • Co-creator of NuGet, SignalR, ASP.NET Core and the Azure SignalR Service.
     • I spend lots of my time helping customers debug and diagnose complex issues with .NET in production.
     • I spend the remainder of my time in meetings and writing code.
  3. Why this talk?
     • .NET has been evolving into a modern, high performance platform.
     • I’ve been spending a chunk of my time helping teams at Microsoft design high performance libraries and diagnosing problems in large .NET based services.
     • There’s a lot of FUD surrounding premature optimization that I’d like to help dispel.
  4. Why do implementation details matter?
     • Software is built by humans with specific assumptions and scenarios in mind.
     • Understanding those assumptions can make a big difference at scale.
     • Understanding the relevant details can help you root cause problems quickly.
  5. What are some of those assumptions?
     • “There should only be a handful of these instances in practice.”
     • “This should only happen once at the start of the application, so it can be expensive.”
     • “Nobody is going to create this per request.”
     Attribution on the slide: Software Engineer
  6. What does “at scale” mean?
     Scale for an application can mean the number of users, the amount of input (like the size of data), or the number of times that data needs to be processed (e.g., the number of requests). Scaling as an engineer means knowing what to ignore and knowing what to pay close attention to.
  7. How much do I need to know?
     • Be curious and learn as much as you can about any area you are working in.
     • There will be a point of diminishing returns.
     • Building up a mental model for how things work can be very effective.
     • Figure out the “cliff notes” (the TL;DR) of an area.
  8. How do I learn the details?
     • Build things.
     • Read code others wrote.
     • Talk to subject matter experts.
     • Read the documentation, including the fine print.
     • Debug the code.
     • Read Stack Overflow answers.
     • Read the source code again.
  9. Is this a performance talk?

  10. Examples of these assumptions in .NET
      • The GC generally optimizes for throughput, not reduced memory usage.
      • Timers are optimized for creation and deletion. The assumption is they don’t fire most of the time.
      • Configuration is expected to be built once on application start.
  11. Examples from the real world
      • To put things in context, we’re going to use real examples.
      • Various teams at Microsoft use .NET extensively for large scale services.
      • We’ll look at some problems and root causes.
      • These code samples are in the hot path.
  12. The names of teams have been changed to protect the

  13. Show me the code

  14. internal static byte[] DecompressBytes(byte[] raw)
      {
          using (var stream = new GZipStream(new MemoryStream(raw), CompressionMode.Decompress))
          {
              const int size = 4096;
              byte[] buffer = new byte[size];
              using (var memory = new MemoryStream())
              {
                  int count;
                  do
                  {
                      count = stream.Read(buffer, 0, size);
                      if (count > 0)
                      {
                          memory.Write(buffer, 0, count);
                      }
                  } while (count > 0);
                  return memory.ToArray();
              }
          }
      }
      Slide annotations: input buffer • Copy buffer • Another internal buffer • Creates a new buffer • Copy into internal buffer • Allocates an 8K buffer internally
  15. Hardcoded at 8K

  16. Excessive buffer allocations and copies
      • This pattern is very common in server applications.
      • Lots of large temporary buffers that are thrown away per operation.
      • This can wreak havoc on the GC and your application as a result.
      • Be mindful about excessive buffer allocations and copies.
      • Use Streams/Pipelines for large data sets.
      • Pool and re-use buffers when you need to operate on in-memory data.
  17. internal static Stream DecompressBytes(byte[] raw)
      {
          return new GZipStream(new MemoryStream(raw), CompressionMode.Decompress);
      }
      Slide annotations: Make the caller use a Stream • Optimizing the code sometimes means changing the pattern completely.
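When the data really must be processed in memory, the "pool and re-use buffers" advice from slide 16 can be sketched with `ArrayPool<byte>`. This is a minimal illustration, not code from the talk: the `PooledDecompression` class and `DecompressTo` method are hypothetical names, and the caller is assumed to supply both streams.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.IO.Compression;

public static class PooledDecompression
{
    // Hypothetical helper: decompresses gzip data from `source` into `destination`
    // using a rented scratch buffer instead of allocating a new byte[] per call.
    public static void DecompressTo(Stream source, Stream destination)
    {
        // Rent may hand back an array larger than requested, so use buffer.Length.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
        try
        {
            // leaveOpen: true so the caller keeps ownership of `source`.
            using var gzip = new GZipStream(source, CompressionMode.Decompress, leaveOpen: true);
            int count;
            while ((count = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                destination.Write(buffer, 0, count);
            }
        }
        finally
        {
            // Always return the buffer so later calls can reuse it.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

The scratch buffer is reused across calls, and the output goes wherever the caller points it (a response body, a file, another stream), so no intermediate `MemoryStream` or `ToArray()` copy is needed.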
  18. public static string OptimizeString(string value)
      {
          if (value != null)
          {
              string interned = string.IsInterned(value);
              if (interned != null)
              {
                  return interned;
              }
          }
          return value;
      }
      Slide annotations: What does this do? • Optimize what?
  19. Slide annotations: This is a global lock • This is the map/dictionary

  20. What’s the fix?
      • This code didn’t need to intern the strings.
      • If the string is already interned, then the check is cheap and lock free.
      • Strings that aren’t interned pay the cost of the global lock.
      • If you need to intern strings, consider implementing your own cache or using a NuGet package that does it.
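One way to read "implementing your own cache" is a small application-owned table keyed by string content. The sketch below is an assumption for illustration, not the approach from the talk: `StringCache` and `Intern` are hypothetical names, and the cache is unbounded, so it only suits a small, known set of distinct values.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class StringCache
{
    // Ordinal comparison: two strings map to the same entry only if they have identical contents.
    private readonly ConcurrentDictionary<string, string> _cache =
        new ConcurrentDictionary<string, string>(StringComparer.Ordinal);

    // Returns the canonical instance for `value`, adding it on first sight.
    // Lookups of already-cached strings do not take a lock.
    public string Intern(string value)
    {
        if (value == null)
        {
            return value;
        }
        return _cache.GetOrAdd(value, value);
    }
}
```

Because the cache belongs to the application rather than the runtime, its lifetime and size can be controlled, and contention is limited to the code that shares that instance.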
  21. public static IEnumerable<Operation> GetOccurences(this Operation root, Func<Operation, bool> predicate)
      {
          var visitedActivityIds = new HashSet<Guid>();
          var unvisitedOperations = new Queue<Operation>();
          unvisitedOperations.Enqueue(root);
          while (unvisitedOperations.Any())
          {
              var currentOperation = unvisitedOperations.Dequeue();
              if (visitedActivityIds.Contains(currentOperation.ActivityId))
              {
                  continue;
              }
              visitedActivityIds.Add(currentOperation.ActivityId);
              if (predicate.Invoke(currentOperation))
              {
                  yield return currentOperation;
              }
              currentOperation.ChildActions.Where(r => r.IsComplete).ToList().ForEach(unvisitedOperations.Enqueue);
          }
      }
      Slide annotations: LINQ • LINQ • List allocation • Delegate allocation
  22. public static IEnumerable<Operation> GetOccurences(this Operation root, Func<Operation, bool> predicate)
      {
          var visitedActivityIds = new HashSet<Guid>();
          var unvisitedOperations = new Queue<Operation>();
          unvisitedOperations.Enqueue(root);
          while (unvisitedOperations.Count > 0)
          {
              var currentOperation = unvisitedOperations.Dequeue();
              if (!visitedActivityIds.Add(currentOperation.ActivityId))
              {
                  continue;
              }
              if (predicate.Invoke(currentOperation))
              {
                  yield return currentOperation;
              }
              foreach (var item in currentOperation.ChildActions)
              {
                  if (item.IsComplete)
                  {
                      unvisitedOperations.Enqueue(item);
                  }
              }
          }
      }
      Slide annotations: Directly use Count • Add replaces the Contains call • Use a normal foreach loop
  23. Let's talk about LINQ…
      • LINQ to objects is powerful and expressive.
      • It should not be used in your application’s hot paths.
      • When you need to reduce allocations or save CPU cycles, LINQ is an easy target.
      • Optimized for what it does
      • There are lots of special cases to avoid expensive calls.
  24. public JsonDomNode(byte[] JsonDOMString, int length)
      {
          string str = Encoding.UTF8.GetString(JsonDOMString, 0, length);
          this.JsonDOMString = str.ToCharArray();
          Parse(this.JsonDOMString, 0);
      }
      Slide annotation: str is thrown away
  25. public JsonDomNode(byte[] JsonDOMString, int length)
      {
          this.JsonDOMString = Encoding.UTF8.GetChars(JsonDOMString, 0, length);
          Parse(this.JsonDOMString, 0);
      }
      Slide annotation: Directly get the char[]
  26. private bool CheckIfFilePathIsValid(string localPath)
      {
          Regex validPath = new Regex(@"^[a-zA-Z]:\\(?<Id>[a-z0-9\-]+)\\$", RegexOptions.IgnoreCase | RegexOptions.Compiled);
          string potentialId = validPath.Match(localPath).Groups["Id"].Value;
          string localPathRoot = Path.GetPathRoot(localPath).ToLower();
          string systemLocalPathRoot = Path.GetPathRoot(Environment.SystemDirectory).ToLower();
          Guid unused;
          return (Guid.TryParse(potentialId, out unused) && (!localPathRoot.Equals(systemLocalPathRoot)));
      }
      Slide annotations: Compiled Regex • Per call • 2 ToLower calls • Case insensitive comparison
  27. Compiled Regexes
      • Regex compilation is roughly 5000 lines of code.
      • It generates IL specific to the regular expression specified.
      • Every new call will create new methods that need to be JITted.
      • Cache these aggressively.
  28. private static readonly Regex validPath = new Regex(@"^[a-zA-Z]:\\(?<Id>[a-z0-9\-]+)\\$", RegexOptions.IgnoreCase | RegexOptions.Compiled);

      private bool CheckIfFilePathIsValid(string localPath)
      {
          string potentialId = validPath.Match(localPath).Groups["Id"].Value;
          if (!Guid.TryParse(potentialId, out var unused))
          {
              return false;
          }
          string localPathRoot = Path.GetPathRoot(localPath);
          string systemLocalPathRoot = Path.GetPathRoot(Environment.SystemDirectory);
          return !localPathRoot.Equals(systemLocalPathRoot, StringComparison.OrdinalIgnoreCase);
      }
      Slide annotations: Compile once and cache • Check for Guid first • Remove ToLower calls and do case insensitive comparison
  29. public static string GetLogLevel(LogLevel logLevel)
      {
          return logLevel.ToString();
      }
      Slide annotation: How does this work?
  30. Slide annotations: Get the types and values for the enum type (this is cached) • Binary search to find the index
  31. public static string GetLogLevel(LogLevel logLevel) => logLevel switch
      {
          LogLevel.Trace => "Trace",
          LogLevel.Debug => "Debug",
          LogLevel.Information => "Information",
          LogLevel.Warning => "Warning",
          LogLevel.Error => "Error",
          LogLevel.Critical => "Critical",
          LogLevel.None => "None",
          _ => throw new NotSupportedException()
      };
      Slide annotation: Constant strings
  32. private static void ScanForExpiredItems(MemoryCache cache)
      {
          DateTimeOffset now = cache._lastExpirationScan = cache._options.Clock.UtcNow;
          foreach (CacheEntry entry in cache._entries.Values)
          {
              if (entry.CheckExpired(now))
              {
                  cache.RemoveEntry(entry);
              }
          }
      }
      Slide annotation: This is a ConcurrentDictionary
  33. ConcurrentDictionary<TKey, TValue>
      • Is conceptually a Dictionary with more granular locking.
      • Reads are lock free.
      • Writes are not!
      • Several APIs lock the entire collection (.Keys, .Values, etc.).
      • Some APIs take snapshots of the underlying collections (allocations and copies).
      • The default number of concurrent writes is equal to the number of processors on the machine (you’ll have that many locks by default).
  34. private static void ScanForExpiredItems(MemoryCache cache)
      {
          DateTimeOffset now = cache._lastExpirationScan = cache._options.Clock.UtcNow;
          foreach (KeyValuePair<object, CacheEntry> item in cache._entries)
          {
              CacheEntry entry = item.Value;
              if (entry.CheckExpired(now))
              {
                  cache.RemoveEntry(entry);
              }
          }
      }
      Slide annotation: Enumerate entries directly
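The snapshot behaviour described for `.Values` can be observed directly. The following is a small sketch with hypothetical names (`SnapshotDemo`, `Run`): it reads `.Values`, then adds an entry, and compares the snapshot's count with what a direct enumeration of the dictionary sees.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class SnapshotDemo
{
    public static (int SnapshotCount, int LiveCount) Run()
    {
        var entries = new ConcurrentDictionary<string, int>();
        entries.TryAdd("a", 1);
        entries.TryAdd("b", 2);

        // .Values copies the current values into a new collection.
        ICollection<int> snapshot = entries.Values;

        // This later write is not reflected in the copy taken above.
        entries.TryAdd("c", 3);

        // Enumerating the dictionary itself walks the live entries without copying them.
        int live = 0;
        foreach (KeyValuePair<string, int> item in entries)
        {
            live++;
        }

        return (snapshot.Count, live);
    }
}
```

`SnapshotDemo.Run()` returns `(2, 3)`: the copy still holds two values while the direct enumeration sees all three entries. On a hot path, that copy is an allocation proportional to the size of the dictionary on every call.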
  35. Lessons
      • Measure what you care about! Do not trust your intuition.
      • Think about how things are implemented.
      • Quick to call != Quick to execute.
      • Everything is a tradeoff.
      • The compiler isn’t nearly as smart as you think it is…
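As a rough illustration of "measure what you care about", the sketch below times an action and reports how many bytes the current thread allocated while running it. The `QuickMeasure` helper is hypothetical and deliberately crude; it assumes a runtime where `GC.GetAllocatedBytesForCurrentThread` is available, and a dedicated benchmarking library is the better tool for numbers you intend to act on.

```csharp
using System;
using System.Diagnostics;

public static class QuickMeasure
{
    // Runs `action` `iterations` times and returns elapsed time plus bytes
    // allocated on the calling thread during the measured loop.
    public static (TimeSpan Elapsed, long AllocatedBytes) Run(Action action, int iterations)
    {
        action(); // warm-up call so one-time JIT cost is not part of the measurement

        long before = GC.GetAllocatedBytesForCurrentThread();
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        long after = GC.GetAllocatedBytesForCurrentThread();

        return (stopwatch.Elapsed, after - before);
    }
}
```

Running the "before" and "after" versions of a method through the same helper gives a first signal about whether a change reduced allocations; it does not replace profiling the real service under load.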
  36. Research Tools
      • The .NET source (https://source.dot.net)
      • Sharplab (https://sharplab.io)
      • Grep.app (https://grep.app/)
      • Decompilers (ILSpy etc.)
      • GitHub issues/discussions (ask questions!)
      • C# Discord
  37. Learning journey
      • This is not “bad code”, we’re all on a learning journey.
      • Learn from the code you’ve written in the past.
      • Nobody is perfect.
      • These examples came from Microsoft.
      • I learned some things making these slides.
  38. Implementation details are volatile
      • The details are constantly evolving. Make sure your knowledge is up to date!
      • Specific details often change, concepts don’t change as often.
      • Continuous measurement is important. Things tend to improve over time.
  39. Recipe
      • Build a mental model for the major parts of the system.
      • Figure out which parts need to scale.
      • What’s on the hot path?
      • Figure out how those parts work, down to the relevant details.
      • Optimize your code appropriately.
  40. Questions?

  41. Follow me on Twitter @davidfowl