Scaling ASP.NET Core Applications

David Fowler (@davidfowl) and Damian Edwards (@damianedwards)

Disclaimer
• We don’t build real applications
• We see *A LOT* of broken applications
• We help customers solve their scalability issues

What do we mean by “scale”?
• Scale is a measure of users/requests/connections per scale unit (machine, container, etc.)
• “If you do nothing, you can scale it infinitely” – Scott Hanselman

Types of scaling
• Horizontal scale (scaling out)
  • Adding more units of scale (machines/VMs/containers, etc.)
• Vertical scale (scaling up)
  • Adding more capable resources to an existing scale unit (CPU, memory, bandwidth)

Why doesn’t my application scale?
• “Work” that doesn’t clean up after itself
• Creating work faster than work is being executed

What affects scale
• CPU
  • Hot paths in your application
  • Contended locks
• Memory
  • Memory leaks (work isn’t cleaning up properly when complete)
  • Inefficient memory usage (using more memory than expected for the work)
• IO
  • Ephemeral port exhaustion
  • Running out of disk/storage space
  • Blocking
  • Bandwidth & latency

What affects scale (CLR)
• GC
  • Too many GC pauses
• ThreadPool
  • Thread pool starvation
• Timers
  • Too many timers
• Exceptions
• Locks
  • Highly contended locks
• Synchronous IO

Async Programming
• Doing async right can increase scalability
• Doing async wrong can severely decrease scalability
• .NET has lots of async traps
• The number one rule is DON’T BLOCK

Load testing
• Scale issues usually show up when it’s too late
• It’s important to figure out how much load your application can handle
• For a fixed RPS (requests per second), monitor CPU usage and memory usage
• Understand how much each scale unit in your deployment can handle (e.g. each VM can handle 1000 RPS)

Load testing
Run load tests → Find bottleneck → Fix issues

Scalability Checklist: CPU
• Machine resources
  • CPU usage
• CLR resources
  • ThreadPool (work items and worker threads)
  • GC (Gen0, Gen1 and Gen2 collections)
  • Locks
• Application logic
  • Serialization
  • Chatty IO

Scalability Checklist: Memory
• Machine resources
  • Memory usage
  • Number of threads
• CLR resources
  • Timers
  • GC (heap sizes for Gen0, Gen1 and Gen2)
• Application logic
  • Strings
  • Reading everything into memory instead of streaming data
    • Disk IO
    • Network IO
  • Disposable objects not being disposed
  • AsyncLocal leaks

Scalability Checklist: IO
• Machine resources
  • Number of open files/handles/sockets (check ulimit)
• CLR resources
  • IO threads
• Application logic
  • HttpClient
  • DbConnection/SqlConnection
  • FileStream
  • Inefficient buffering (lots of small reads, writes, or packets)

Sync over Async

ThreadPool
• Sync over async
  • APIs that masquerade as synchronous but are actually blocking on async methods
  • Uses 2 threads to complete a single operation
• Blocking APIs are BAD
  • Avoid blocking APIs where possible, e.g. Task.Wait, Task.Result, Thread.Sleep, GetAwaiter().GetResult()
• Excessive blocking on thread pool threads can cause starvation
• Thread injection beyond the configured minimum is slow (roughly 2 per second)
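
As a minimal sketch of the sync-over-async pattern described above, compare a blocking wrapper with an awaited call. The `FetchLengthAsync` helper and its 100 ms delay are hypothetical stand-ins for real async IO such as a database or HTTP call.

```csharp
using System;
using System.Threading.Tasks;

public static class SyncOverAsyncSketch
{
    // Hypothetical async operation standing in for real IO.
    private static async Task<int> FetchLengthAsync(string value)
    {
        await Task.Delay(100); // simulated IO latency
        return value.Length;
    }

    // Sync over async: the calling thread is blocked until the task completes,
    // while another thread may be needed to run the continuation.
    public static int GetLengthBlocking(string value)
    {
        return FetchLengthAsync(value).Result; // avoid on thread pool threads
    }

    // Async all the way: the calling thread is released while the operation is pending.
    public static async Task<int> GetLengthAsync(string value)
    {
        return await FetchLengthAsync(value);
    }

    public static async Task Main()
    {
        Console.WriteLine(GetLengthBlocking("hello"));
        Console.WriteLine(await GetLengthAsync("hello"));
    }
}
```

Both methods return the same value; the difference only shows up under load, when many blocked callers hold thread pool threads at once.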

Demo: Sync over async

Cache Lookup

Highly contended locks
• Web applications are highly concurrent
• Highly contended locks can be a death knell for scalable services
• Lock contention is sometimes hard to see in basic profilers
  • Visual Studio Concurrency Visualizer
  • dotTrace timeline view
• Prefer concurrent data structures
  • Understand which operations take locks and which operations are lock free
• Know which BCL APIs take locks on your behalf
  • String.Intern
  • System.Drawing (GDI)
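
A rough illustration of "prefer concurrent data structures", assuming a simple string cache: the `LockedCache` and `ConcurrentCache` types are hypothetical names invented for this sketch, not code from the demo.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class LockedCache
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();

    // Every lookup, including a cache hit, serializes on the same lock.
    public string GetOrAdd(string key, Func<string, string> factory)
    {
        lock (_gate)
        {
            if (!_items.TryGetValue(key, out var value))
            {
                value = factory(key);
                _items[key] = value;
            }
            return value;
        }
    }
}

public sealed class ConcurrentCache
{
    private readonly ConcurrentDictionary<string, string> _items =
        new ConcurrentDictionary<string, string>();

    // Lookups of existing keys are lock free. The factory runs outside any lock,
    // so it may run more than once if several callers miss on the same key.
    public string GetOrAdd(string key, Func<string, string> factory)
        => _items.GetOrAdd(key, factory);
}

public static class Program
{
    public static void Main()
    {
        var cache = new ConcurrentCache();
        Console.WriteLine(cache.GetOrAdd("answer", key => key.ToUpperInvariant()));
    }
}
```

The trade-off is that `ConcurrentDictionary` does not guarantee the factory runs exactly once per key, so the factory should be cheap or safe to repeat.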

Demo: Cache Lookup

Parsing a JSON payload

GC issues
• Allocating memory is very cheap; collecting it isn’t
• Allocating lots of memory can lead to GC pauses
• Objects of 85,000 bytes or more (roughly 85 KB) are allocated on the LOH (large object heap)
• The LOH is collected with Gen2 but not compacted (by default)
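
One common way to reduce large allocations is to reuse buffers from a pool. The sketch below is an assumption about how that could look, not the demo's code: `CountBytesAsync` is a hypothetical helper and the 128 KB buffer size is arbitrary, chosen only because it is above the LOH threshold.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading.Tasks;

public static class PooledBufferSketch
{
    // Counts the bytes in a stream using a rented buffer instead of
    // allocating a new large array on every call.
    public static async Task<long> CountBytesAsync(Stream source)
    {
        // Rent may return an array larger than requested.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(128 * 1024);
        try
        {
            long total = 0;
            int read;
            while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                total += read;
            }
            return total;
        }
        finally
        {
            // Always return the buffer so later calls can reuse it.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }

    public static async Task Main()
    {
        using (var stream = new MemoryStream(new byte[300000]))
        {
            Console.WriteLine(await CountBytesAsync(stream));
        }
    }
}
```

Processing the stream in chunks also avoids reading the whole payload into a single string or array before parsing it.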

Demo: Parsing a JSON payload

Timeouts

TimerQueue
• There’s a TimerQueue per CPU core
• Timers within a TimerQueue form a linked list
• Timers are optimized for adding and removing
• Timer callbacks are scheduled to the thread pool
• Each TimerQueue is protected by a lock
• Disposing the timer removes it from the queue
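
To make the last bullet concrete, here is a hedged sketch of a timeout helper that disposes its timer promptly. `WithTimeoutAsync` is a hypothetical name, and the sketch relies on the assumption that `CancellationTokenSource(TimeSpan)` is backed by a timer internally.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Runs an operation with a timeout. Disposing the CancellationTokenSource
    // releases its internal timer as soon as the operation finishes, instead of
    // leaving the timer queued until the timeout elapses.
    public static async Task<T> WithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> operation, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource(timeout))
        {
            return await operation(cts.Token);
        }
    }

    public static async Task Main()
    {
        int result = await WithTimeoutAsync(async token =>
        {
            await Task.Delay(50, token); // simulated work that honors cancellation
            return 42;
        }, TimeSpan.FromSeconds(30));

        Console.WriteLine(result);
    }
}
```

Without the `using` block, each call would leave a 30-second timer in a queue even though the work completed in about 50 ms, which is one way timers pile up under load.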

[Diagram: TimerQueue instances, each holding a linked list of TimerQueueTimer entries]

Demo: Timer leak

Demo: Timeout (fixed)

.NET Async traps

.NET Async traps: ConcurrentDictionary

Diagnostics: Performance Traces
• Types of issues
  • High CPU
• Tools
  • Visual Studio
  • dotTrace
  • PerfView
  • dotnet-collect

Diagnostics: Post Mortem Debugging
• Types of issues
  • Crashes
  • Hangs (sync and async)
  • Memory leaks
  • Locks
• Tools
  • Visual Studio
  • WinDbg
  • lldb
  • dotnet-analyze
  • dotMemory

Future Enhancements
• Improved documentation on how to scale web services
• Improvements to the thread pool to better handle blocking workers
• Analyzers to catch common mistakes with asynchronous programming
• Tools to help better diagnose common issues
  • Async hangs
  • Thread pool starvation
• More counters in .NET Core
• Reduce the number of .NET async traps
  • IAsyncDisposable
  • FileStream (sync over async)

In summary… coding is hard
