Scaling ASP.NET Core Applications

David Fowler
January 30, 2019

Hey, my app doesn't scale! ____ Framework sucks! Well, you can write a slow app in any language. This talk shows why your app isn't scaling and gives you the DOs and DON'Ts of making big apps do big things in ASP.NET Core.

Transcript

  Slide 2: Disclaimer
    • We don’t build real applications
    • We see *A LOT* of broken applications
    • We help customers solve their scalability issues

  Slide 3: What do we mean by “scale”?
    • Scale is a measure of users/requests/connections per scale unit (machine, container, etc.)
    • “If you do nothing, you can scale it infinitely.” – Scott Hanselman

  Slide 4: Types of scaling
    • Horizontal scale (scaling out)
      • Adding more units of scale (machines/VMs/containers, etc.)
    • Vertical scale (scaling up)
      • Adding more capable resources to an existing scale unit (CPU, memory, bandwidth)

  Slide 5: Why doesn’t my application scale?
    • “Work” that doesn’t clean up after itself
    • Creating work faster than work is being executed

  Slide 6: What affects scale
    • CPU
      • Hot paths in your application
      • Contended locks
    • Memory
      • Memory leaks (work isn’t cleaning up properly when complete)
      • Inefficient memory usage (using more memory than expected for the work)
    • IO
      • Ephemeral port exhaustion
      • Running out of disk/storage space
      • Blocking
      • Bandwidth & latency

  Slide 7: What affects scale (CLR)
    • GC
      • Too many GC pauses
    • ThreadPool
      • Thread pool starvation
    • Timers
      • Too many timers
    • Exceptions
    • Locks
      • Highly contended locks
    • Synchronous IO

  Slide 8: Async Programming
    • Doing async right can increase scalability
    • Doing async wrong can severely decrease scalability
    • .NET has lots of async traps
    • The number one rule is DON’T BLOCK

  Slide 9: Load testing
    • Scale issues usually show up when it’s too late
    • It’s important to figure out how much load your application can handle
    • For a fixed RPS (requests per second), monitor CPU and memory usage
    • Understand how much each scale unit in your deployment can handle (e.g., each VM can handle 1,000 RPS)

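Once a load test tells you what one scale unit can handle, a rough unit count for a target peak follows from simple division. The sketch below is not from the talk: it assumes throughput grows linearly with the number of units and adds headroom via a hypothetical 70% target utilization. `UnitsNeeded` and all the numbers are illustrative.

```csharp
using System;

// Hypothetical helper (not from the talk): the number of scale units (VMs,
// containers, ...) needed for a peak load, given the RPS one unit sustained in a
// load test. Assumes throughput grows linearly with units, which is only approximate.
static int UnitsNeeded(double peakRps, double rpsPerUnit, double targetUtilization)
{
    if (rpsPerUnit <= 0) throw new ArgumentOutOfRangeException(nameof(rpsPerUnit));
    if (targetUtilization <= 0 || targetUtilization > 1)
        throw new ArgumentOutOfRangeException(nameof(targetUtilization));

    // Keep each unit below its measured maximum so load spikes don't saturate it.
    double usableRpsPerUnit = rpsPerUnit * targetUtilization;
    return (int)Math.Ceiling(peakRps / usableRpsPerUnit);
}

// Illustrative numbers: 4,500 RPS at peak, 1,000 RPS per VM, 70% target utilization.
int units = UnitsNeeded(peakRps: 4500, rpsPerUnit: 1000, targetUtilization: 0.7);
Console.WriteLine($"Scale units needed: {units}"); // prints "Scale units needed: 7"
```

Each unit effectively serves 700 RPS, so 4,500 RPS needs 7 units. Re-measure per-unit throughput whenever the application changes, since the linear assumption rarely holds exactly.
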
  Slide 11: Scalability Checklist: CPU
    • Machine resources
      • CPU usage
    • CLR resources
      • ThreadPool (work items and worker threads)
      • GC (Gen0, Gen1 and Gen2) collections
      • Locks
    • Application logic
      • Serialization
      • Chatty IO

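Several of the CLR items on this checklist can be read from code. The sketch below takes a one-off snapshot. It assumes .NET Core 3.0 or later, which postdates this talk, for `ThreadPool.ThreadCount`, `ThreadPool.PendingWorkItemCount`, `ThreadPool.CompletedWorkItemCount` and `Monitor.LockContentionCount`. In a real service you would sample these periodically while holding RPS fixed, or watch the equivalent counters with the `dotnet-counters` tool.

```csharp
using System;
using System.Threading;

// Point-in-time snapshot of the CLR resources on the CPU checklist.
// ThreadPool.ThreadCount, ThreadPool.PendingWorkItemCount,
// ThreadPool.CompletedWorkItemCount and Monitor.LockContentionCount
// require .NET Core 3.0 or later.
int gen0 = GC.CollectionCount(0);
int gen1 = GC.CollectionCount(1);
int gen2 = GC.CollectionCount(2);

int poolThreads = ThreadPool.ThreadCount;                   // worker threads right now
long queuedWorkItems = ThreadPool.PendingWorkItemCount;     // work items waiting to run
long completedWorkItems = ThreadPool.CompletedWorkItemCount;

long lockContentions = Monitor.LockContentionCount;         // contended Monitor.Enter calls so far

Console.WriteLine($"GC collections: gen0={gen0} gen1={gen1} gen2={gen2}");
Console.WriteLine($"Thread pool: threads={poolThreads} queued={queuedWorkItems} completed={completedWorkItems}");
Console.WriteLine($"Lock contentions so far: {lockContentions}");
```

A steadily growing queued count alongside a slowly climbing thread count is a typical sign of thread pool starvation.
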
  Slide 12: Scalability Checklist: Memory
    • Machine resources
      • Memory usage
      • Number of threads
    • CLR resources
      • Timers
      • GC (heap sizes for Gen0, Gen1 and Gen2)
    • Application logic
      • Strings
      • Reading everything into memory instead of streaming data
        • Disk IO
        • Network IO
      • Disposable objects not being disposed
      • AsyncLocal leaks

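Two items from this checklist, reading everything into memory and not disposing disposables, show up together in file handling. The sketch below contrasts them: it streams a file line by line instead of loading it whole, and disposes the reader with `using`. The temp file and the "count GET lines" task are hypothetical stand-ins for real request or log data.

```csharp
using System;
using System.IO;

// Create a small sample file so the sketch is self-contained.
string path = Path.GetTempFileName();
File.WriteAllLines(path, new[] { "GET /", "GET /health", "POST /orders" });

// DON'T (for large or unbounded inputs): File.ReadAllText(path) or
// File.ReadAllLines(path) materializes the whole file in memory at once,
// and large strings/arrays (85,000 bytes or more) land on the LOH.

// DO: stream the data and handle one line at a time, so memory stays bounded by
// the reader's buffer and the longest line. `using` disposes the reader (and
// its underlying FileStream) even if an exception is thrown.
int requestCount = 0;
using (var reader = new StreamReader(path))
{
    string? line;
    while ((line = reader.ReadLine()) is not null)
    {
        if (line.StartsWith("GET ", StringComparison.Ordinal)) requestCount++;
    }
}

File.Delete(path);
Console.WriteLine($"GET requests: {requestCount}"); // prints "GET requests: 2"
```
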
  Slide 13: Scalability Checklist: IO
    • Machine resources
      • Number of open files/handles/sockets (check ulimit)
    • CLR resources
      • IO threads
    • Application logic
      • HttpClient
      • DbConnection/SqlConnection
      • FileStream
      • Inefficient buffering (lots of small reads/writes or small packets)

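HttpClient is on this list because creating one per request is a classic route to the ephemeral port exhaustion mentioned earlier. Below is a minimal sketch of the usual fix: one long-lived shared instance. The `GetClient` accessor and the two-minute `PooledConnectionLifetime` are assumptions for illustration. In ASP.NET Core 2.1 and later, `IHttpClientFactory` is the built-in alternative for managing client lifetimes.

```csharp
using System;
using System.Net.Http;

// DON'T: create and dispose an HttpClient per request. Each instance has its own
// connection pool, and closed sockets linger in TIME_WAIT, so under load this
// can exhaust ephemeral ports.

// DO: share one long-lived client. SocketsHttpHandler (.NET Core 2.1+) with a
// bounded PooledConnectionLifetime recycles pooled connections so DNS changes
// are eventually picked up.
var handler = new SocketsHttpHandler
{
    PooledConnectionLifetime = TimeSpan.FromMinutes(2), // assumed value; tune for your service
};
var sharedClient = new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(30) };

// Hypothetical accessor used by the rest of the application.
HttpClient GetClient() => sharedClient;

Console.WriteLine(ReferenceEquals(GetClient(), GetClient())); // prints "True": same instance every time
```
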
  Slide 15: ThreadPool
    • Sync over async
      • APIs that masquerade as synchronous but actually block on async methods
      • Uses 2 threads to complete a single operation
    • Blocking APIs are BAD
      • Avoid blocking APIs where possible, e.g. Task.Wait, Task.Result, Thread.Sleep, GetAwaiter().GetResult()
    • Excessive blocking on thread pool threads can cause starvation
    • Thread injection beyond the configured minimum is slow (2 per second)

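The sync-over-async trap is easiest to see side by side. In the sketch below, `GetOrderCountAsync` is a hypothetical IO-bound call simulated with `Task.Delay`. The blocking wrapper holds the calling thread in `.Result` until the result arrives, while the async version releases it.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical IO-bound dependency (database, HTTP call, ...).
static async Task<int> GetOrderCountAsync()
{
    await Task.Delay(100); // stands in for real asynchronous IO
    return 42;
}

// DON'T: sync over async. The caller's thread sits blocked in .Result while
// another thread-pool thread runs the continuation, so one operation ties up two
// threads. Under load this is a common path to thread pool starvation.
static int GetOrderCountBlocking() => GetOrderCountAsync().Result;

// DO: stay async all the way up the call chain. No thread is held while the
// IO is pending; the continuation is scheduled once the result is ready.
static async Task<int> GetOrderCountNonBlockingAsync() => await GetOrderCountAsync();

int blocking = GetOrderCountBlocking();               // works here, but doesn't scale
int nonBlocking = await GetOrderCountNonBlockingAsync();
Console.WriteLine($"{blocking} {nonBlocking}");       // prints "42 42"
```

Both calls return the same value. The difference only shows under concurrency, when many blocked threads force the pool to inject new ones at its slow rate.
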
  Slide 19: Highly contended locks
    • Web applications are highly concurrent
    • Highly contended locks can be a death knell for scalable services
    • Lock contention is sometimes hard to see in basic profilers
      • Visual Studio Concurrency Visualizer
      • dotTrace timeline view
    • Prefer concurrent data structures
    • Understand which operations take locks and which are lock-free
    • Know which BCL APIs take locks on your behalf
      • String.Intern
      • System.Drawing (GDI)

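As one illustration of "prefer concurrent data structures", the sketch below replaces a hypothetical lock-around-a-`Dictionary` counter with `ConcurrentDictionary`. The per-route hit counter is an assumed scenario, not from the talk. Note that `ConcurrentDictionary` is not entirely lock-free: writes take one of several fine-grained internal locks, which is exactly the kind of detail the slide says to understand.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Shared state touched by many concurrent requests, e.g. hit counters per
// route (hypothetical example).
var hitsByRoute = new ConcurrentDictionary<string, int>();

// Simulate 1,000 concurrent "requests" spread over two routes.
Parallel.For(0, 1000, i =>
{
    string route = i % 2 == 0 ? "/orders" : "/health";
    // AddOrUpdate retries until its update is applied, so no increment is lost.
    // Writes take one of several fine-grained internal locks rather than one
    // global lock. The update delegate may run more than once, so keep it pure.
    hitsByRoute.AddOrUpdate(route, 1, (_, current) => current + 1);
});

// Reads such as TryGetValue are lock-free. Some members (Count, Keys, Values)
// acquire all of the dictionary's internal locks: avoid them on hot paths.
hitsByRoute.TryGetValue("/orders", out int orders);
hitsByRoute.TryGetValue("/health", out int health);
Console.WriteLine($"/orders={orders} /health={health}"); // prints "/orders=500 /health=500"
```
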
  Slide 22: GC issues
    • Allocating memory is very cheap; collecting it isn’t
    • Allocating lots of memory can lead to GC pauses
    • Objects of 85,000 bytes or more are allocated on the LOH (large object heap)
    • The LOH is collected with Gen2 but not compacted (by default)

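The LOH threshold can be observed directly, and a common way to keep large per-request buffers off it is to pool them. The sketch below assumes `ArrayPool<byte>.Shared` as the pooling mechanism (one option among several, not prescribed by the talk), and the buffer work is hypothetical. If fragmentation does become a problem, `GCSettings.LargeObjectHeapCompactionMode` can request a one-time LOH compaction.

```csharp
using System;
using System.Buffers;

// Arrays of 85,000 bytes or more go on the LOH. LOH objects are collected only
// during gen 2 collections, so GC.GetGeneration never reports them as gen 0 or 1.
byte[] large = new byte[100_000];
Console.WriteLine($"GC generation of a 100,000-byte array: {GC.GetGeneration(large)}");

// One way to avoid allocating a fresh large buffer per request: rent from the
// shared pool and return it when done. Rent may return a larger array than asked for.
byte[] buffer = ArrayPool<byte>.Shared.Rent(100_000);
try
{
    // Hypothetical work: touch only the part of the buffer that was requested.
    buffer.AsSpan(0, 100_000).Fill(0xFF);
}
finally
{
    // Don't use the array after returning it; another caller may rent it.
    ArrayPool<byte>.Shared.Return(buffer);
}
```
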
  Slide 25: TimerQueue
    • There’s a TimerQueue per CPU core
    • Timers within a TimerQueue form a linked list
    • Timers are optimized for adding and removing
    • Timer callbacks are scheduled to the thread pool
    • Each TimerQueue is protected by a lock
    • Disposing the timer removes it from the queue

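Because every timer lives in a lock-protected queue until it fires or is disposed, per-request timers that are never disposed pile up there. The sketch below shows two common shapes. It assumes a per-operation timeout built on `CancellationTokenSource(TimeSpan)`, which uses a timer internally, and a one-shot `System.Threading.Timer`. Both are disposed as soon as they are no longer needed. The "work" being timed out is hypothetical.

```csharp
using System;
using System.Threading;

// Per-operation timeout: CancellationTokenSource(TimeSpan) creates a timer
// internally. Disposing the source disposes that timer, removing it from its
// TimerQueue; a leaked source leaves its timer queued until it fires.
bool timedOut;
using (var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(50)))
{
    // Stand-in for work that observes the token (hypothetical).
    timedOut = cts.Token.WaitHandle.WaitOne(TimeSpan.FromSeconds(5));
}

// A one-shot System.Threading.Timer: period Timeout.Infinite means the callback
// runs once, on a thread-pool thread. Dispose it once it's no longer needed.
using var fired = new ManualResetEventSlim(false);
using (var timer = new Timer(_ => fired.Set(), null, dueTime: 10, period: Timeout.Infinite))
{
    fired.Wait(TimeSpan.FromSeconds(5));
}

Console.WriteLine($"timedOut={timedOut} fired={fired.IsSet}");
```
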
  Slide 37: Diagnostics: Performance Traces
    • Types of issues
      • High CPU
    • Tools
      • Visual Studio
      • dotTrace
      • PerfView
      • dotnet-collect

  Slide 38: Diagnostics: Post-Mortem Debugging
    • Types of issues
      • Crashes
      • Hangs (sync and async)
      • Memory leaks
      • Locks
    • Tools
      • Visual Studio
      • WinDbg
      • lldb
      • dotnet-analyze
      • dotMemory

  Slide 39: Future Enhancements
    • Improved documentation on how to scale web services
    • Improvements to the thread pool to better handle blocking workers
    • Analyzers to catch common mistakes with asynchronous programming
    • Tools to help better diagnose common issues
      • Async hangs
      • Thread pool starvation
    • More counters in .NET Core
    • Reduce the number of .NET async traps
      • IAsyncDisposable
      • FileStream (sync over async)