Slide 1

Slide 1 text

Scaling ASP.NET Core Applications
David Fowler (@davidfowl)
Damian Edwards (@damianedwards)

Slide 2

Slide 2 text

Disclaimer
• We don’t build real applications
• We see *A LOT* of broken applications
• We help customers solve their scalability issues

Slide 3

Slide 3 text

What do we mean by “scale”?
• Scale is a measure of users/requests/connections per scale unit (machine, container, etc.)
• If you do nothing, you can scale it infinitely – Scott Hanselman

Slide 4

Slide 4 text

Types of scaling
• Horizontal scale (scaling out)
  • Adding more units of scale (machines/VMs/containers, etc.)
• Vertical scale (scaling up)
  • Adding more capable resources to an existing scale unit (CPU, memory, bandwidth)

Slide 5

Slide 5 text

Why doesn’t my application scale?
• “Work” that doesn’t clean up after itself
• Creating work faster than work is being executed

Slide 6

Slide 6 text

What affects scale
• CPU
  • Hot paths in your application
  • Contended locks
• Memory
  • Memory leaks (work isn’t cleaning up properly when complete)
  • Inefficient memory usage (using more memory than expected for the work)
• IO
  • Ephemeral port exhaustion
  • Running out of disk/storage space
  • Blocking
• Bandwidth & latency

Slide 7

Slide 7 text

What affects scale (CLR)
• GC
  • Too many GC pauses
• ThreadPool
  • Thread pool starvation
• Timers
  • Too many timers
• Exceptions
• Locks
  • Highly contended locks
• Synchronous IO

Slide 8

Slide 8 text

Async Programming
• Doing async right can increase scalability
• Doing async wrong can severely decrease scalability
• .NET has lots of async traps
• The number one rule is DON’T BLOCK

Slide 9

Slide 9 text

Load testing
• Scale issues usually show up when it’s too late
• It’s important to figure out how much load your application can handle
• For a fixed RPS, monitor CPU usage and memory usage
• Understand how much each scale unit in your deployment can handle (e.g. each VM can handle 1000 RPS)
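Once a load test has produced a per-unit figure, the capacity arithmetic is simple. The sketch below is an illustration only: `CapacityPlanning`, `ScaleUnitsNeeded`, and the `headroom` parameter are hypothetical names, and the numbers come from your own load tests, not from this deck.

```csharp
using System;

public static class CapacityPlanning
{
    // Returns how many scale units are needed to serve targetRps when one unit
    // sustains perUnitRps, keeping a fraction `headroom` (0.0 up to, but not
    // including, 1.0) of each unit in reserve. Hypothetical helper.
    public static int ScaleUnitsNeeded(double targetRps, double perUnitRps, double headroom = 0.0)
    {
        if (targetRps < 0) throw new ArgumentOutOfRangeException(nameof(targetRps));
        if (perUnitRps <= 0) throw new ArgumentOutOfRangeException(nameof(perUnitRps));
        if (headroom < 0 || headroom >= 1) throw new ArgumentOutOfRangeException(nameof(headroom));

        // Capacity of one unit after setting the reserve aside.
        double usablePerUnit = perUnitRps * (1.0 - headroom);

        // Round up: a fraction of a unit still means one more unit.
        return (int)Math.Ceiling(targetRps / usablePerUnit);
    }
}
```

With the deck's example of 1000 RPS per VM, `ScaleUnitsNeeded(5000, 1000)` gives 5 units, and `ScaleUnitsNeeded(5000, 1000, 0.2)` gives 7 once 20% of each unit is held back.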

Slide 10

Slide 10 text

Load testing
Run load tests → Find bottleneck → Fix issues

Slide 11

Slide 11 text

Scalability Checklist: CPU
• Machine resources
  • CPU usage
• CLR resources
  • ThreadPool (work-items and worker threads)
  • GC (Gen0, Gen1 and Gen2) collections
  • Locks
• Application logic
  • Serialization
  • Chatty IO
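Several of the CLR items on this checklist can be read from inside the process. The following is a minimal sketch, assuming .NET Core 3.0 or later (where `ThreadPool.ThreadCount` and `ThreadPool.PendingWorkItemCount` are available). `RuntimeSnapshot` and `Capture` are hypothetical names.

```csharp
using System;
using System.Threading;

public static class RuntimeSnapshot
{
    // Reads a few thread pool and GC counters in one call. Hypothetical helper;
    // a real service would more likely export these to a monitoring system.
    public static (int Threads, long QueuedWorkItems, int Gen0, int Gen1, int Gen2) Capture()
    {
        return (
            ThreadPool.ThreadCount,          // thread pool threads that currently exist
            ThreadPool.PendingWorkItemCount, // work items queued but not yet started
            GC.CollectionCount(0),           // collections so far, per generation
            GC.CollectionCount(1),
            GC.CollectionCount(2));
    }
}
```

Sampling `Capture()` at intervals during a load test shows trends: a queue that keeps growing while the thread count creeps up suggests thread pool starvation, and a fast-rising Gen2 count suggests heavy allocation.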

Slide 12

Slide 12 text

Scalability Checklist: Memory
• Machine resources
  • Memory usage
  • Number of threads
• CLR resources
  • Timers
  • GC (heap sizes for Gen0, Gen1 and Gen2)
• Application logic
  • Strings
  • Reading everything into memory instead of streaming data
    • Disk IO
    • Network IO
  • Disposable objects not being disposed
  • AsyncLocal leaks

Slide 13

Slide 13 text

Scalability Checklist: IO
• Machine resources
  • Number of open files/handles/sockets (check ulimit)
• CLR resources
  • IO threads
• Application logic
  • HttpClient
  • DbConnection/SqlConnection
  • FileStream
  • Inefficient buffering (lots of small reads/writes)
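HttpClient is on this list largely because creating and disposing one per request opens a new connection each time, which can exhaust sockets and ephemeral ports under load. A minimal sketch of the usual remedy, sharing one instance, follows. `SharedHttp` and the 30-second timeout are illustrative assumptions. In ASP.NET Core, `IHttpClientFactory` is another common way to get pooled handlers.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class SharedHttp
{
    // One HttpClient for the lifetime of the process. Its handler pools
    // connections, so requests reuse sockets instead of opening new ones.
    public static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30) // illustrative value
    };

    // Fetches a URL as a string using the shared client.
    public static async Task<string> GetStringAsync(Uri uri)
    {
        // The response is disposed so its connection returns to the pool.
        using (HttpResponseMessage response = await Client.GetAsync(uri))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```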

Slide 14

Slide 14 text

Sync over Async

Slide 15

Slide 15 text

ThreadPool
• Sync over async
  • APIs that masquerade as synchronous but are actually blocking on async methods
  • Uses 2 threads to complete a single operation
• Blocking APIs are BAD
  • Avoid blocking APIs where possible, e.g. Task.Wait, Task.Result, Thread.Sleep, GetAwaiter().GetResult()
• Excessive blocking on thread pool threads can cause starvation
• Thread injection beyond the configured minimum is slow (about 2 threads per second)
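The difference between blocking and awaiting can be sketched as follows. `Greeter` and `LoadNameAsync` are hypothetical; the latter stands in for any real async operation such as a database or HTTP call.

```csharp
using System.Threading.Tasks;

public static class Greeter
{
    // Hypothetical async operation standing in for real IO.
    public static async Task<string> LoadNameAsync()
    {
        await Task.Delay(50);
        return "world";
    }

    // Sync over async: the calling thread is parked inside .Result while a
    // second thread pool thread runs the continuation after the delay.
    public static string GreetBlocking()
    {
        return "hello " + LoadNameAsync().Result;
    }

    // Async all the way: the calling thread goes back to the pool during the
    // await, so no thread is held while the operation is in flight.
    public static async Task<string> GreetAsync()
    {
        return "hello " + await LoadNameAsync();
    }
}
```

Both methods return the same string. The cost of `GreetBlocking` only appears under load, when many requests hold threads at once and the pool cannot add new ones fast enough.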

Slide 16

Slide 16 text

Sync over Async

Slide 17

Slide 17 text

Demo: Sync over async

Slide 18

Slide 18 text

Cache Lookup

Slide 19

Slide 19 text

Highly contended locks
• Web applications are highly concurrent
• Highly contended locks can be a death knell for scalable services
• Lock contention is sometimes hard to see in basic profilers
  • Visual Studio Concurrency Visualizer
  • dotTrace timeline view
• Prefer concurrent data structures
  • Understand which operations take locks and which operations are lock free
• Know which BCL APIs take locks on your behalf
  • String.Intern
  • System.Drawing (GDI)
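To illustrate "prefer concurrent data structures", here is a minimal sketch of the same cache lookup written two ways. `LockedCache` and `ConcurrentCache` are hypothetical types and not the code from the demo.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Every lookup, hit or miss, serializes on a single lock.
public sealed class LockedCache
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();

    public string GetOrAdd(string key, Func<string, string> factory)
    {
        lock (_gate)
        {
            if (!_items.TryGetValue(key, out string value))
            {
                value = factory(key);
                _items[key] = value;
            }
            return value;
        }
    }
}

// Reads of existing keys in ConcurrentDictionary take no lock, so cache hits
// do not contend. Under a race the factory may run more than once for a key,
// but only one result is stored.
public sealed class ConcurrentCache
{
    private readonly ConcurrentDictionary<string, string> _items =
        new ConcurrentDictionary<string, string>();

    public string GetOrAdd(string key, Func<string, string> factory)
    {
        return _items.GetOrAdd(key, factory);
    }
}
```

The concurrent version trades a guarantee (the factory runs exactly once) for lock-free hits, which usually suits a read-heavy cache.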

Slide 20

Slide 20 text

Demo: Cache Lookup

Slide 21

Slide 21 text

Parsing a JSON payload

Slide 22

Slide 22 text

GC issues
• Allocating memory is very cheap, collecting it isn’t
• Allocating lots of memory can lead to GC pauses
• Objects of 85,000 bytes or more are allocated on the LOH (large object heap)
• The LOH is collected with Gen2 but not compacted (by default)
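One way these GC costs show up when parsing JSON is buffering the whole payload into a string before deserializing it. The sketch below contrasts that with reading from the stream. It assumes System.Text.Json, which the deck does not name, and `Order` and `OrderParser` are hypothetical.

```csharp
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical payload type.
public sealed class Order
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class OrderParser
{
    // Buffers the entire body into one string first. For a large payload that
    // string is itself a large allocation and can land on the LOH.
    public static async Task<Order> ParseBufferedAsync(Stream body)
    {
        using (var reader = new StreamReader(body))
        {
            string json = await reader.ReadToEndAsync();
            return JsonSerializer.Deserialize<Order>(json);
        }
    }

    // Deserializes while reading, so the payload never has to exist in memory
    // as a single contiguous string.
    public static async Task<Order> ParseStreamingAsync(Stream body)
    {
        return await JsonSerializer.DeserializeAsync<Order>(body);
    }
}
```

Both methods return the same object for the same input. They differ in how much memory is allocated along the way, which matters most for large request bodies.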

Slide 23

Slide 23 text

Demo: Parsing a JSON payload

Slide 24

Slide 24 text

Timeouts

Slide 25

Slide 25 text

TimerQueue
• There’s a TimerQueue per CPU core
• Timers within a TimerQueue form a linked list
• Timers are optimized for adding and removing
• Timer callbacks are scheduled to the thread pool
• Each TimerQueue is protected by a lock
• Disposing the timer removes it from the queue
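Because a timer stays in its queue until it fires or is disposed, a timeout helper should cancel its delay as soon as the real work finishes. The following is a minimal sketch of that pattern, not the demo's code. `TaskTimeouts` and `TimeoutAfter` are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskTimeouts
{
    // Completes with the task's result, or throws TimeoutException if the
    // task has not finished within `timeout`.
    public static async Task<T> TimeoutAfter<T>(Task<T> task, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task delay = Task.Delay(timeout, cts.Token);
            Task winner = await Task.WhenAny(task, delay);

            if (winner == delay)
            {
                throw new TimeoutException();
            }

            // The operation finished first. Cancelling the delay releases its
            // timer now instead of leaving it queued until the timeout elapses.
            cts.Cancel();
            return await task;
        }
    }
}
```

Without the `cts.Cancel()` call, every completed operation would leave a timer behind for the full timeout period. Under high request rates those timers accumulate.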

Slide 26

Slide 26 text

TimerQueue
[Diagram: two TimerQueue instances, each holding a linked list of five TimerQueueTimer entries]

Slide 27

Slide 27 text

Demo: Timer leak

Slide 28

Slide 28 text

Demo: Timeout (fixed)

Slide 29

Slide 29 text

.NET Async traps

Slide 30

Slide 30 text

.NET Async traps

Slide 31

Slide 31 text

.NET Async traps: ConcurrentDictionary
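The slide's code is not preserved in this transcript. As an assumption about the kind of trap meant here: `ConcurrentDictionary.GetOrAdd` may invoke its value factory more than once when callers race on the same key, so an async factory can start duplicate work. A commonly used mitigation, sketched below with the hypothetical type `AsyncCache`, is to store a `Lazy<Task<T>>`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class AsyncCache
{
    private readonly ConcurrentDictionary<string, Lazy<Task<string>>> _items =
        new ConcurrentDictionary<string, Lazy<Task<string>>>();

    // Returns the cached task for the key, starting the load only once.
    public Task<string> GetOrAddAsync(string key, Func<string, Task<string>> loader)
    {
        // Racing callers may each construct a Lazy, but only one is stored and
        // only the stored one has Value read, so the loader runs once per key.
        Lazy<Task<string>> entry =
            _items.GetOrAdd(key, k => new Lazy<Task<string>>(() => loader(k)));
        return entry.Value;
    }
}
```

One caveat with this design: a failed task stays cached, so a real implementation would need to evict entries whose task faulted.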

Slide 32

Slide 32 text

.NET Async traps

Slide 33

Slide 33 text

.NET Async traps

Slide 34

Slide 34 text

.NET Async traps

Slide 35

Slide 35 text

.NET Async traps

Slide 36

Slide 36 text

.NET Async traps

Slide 37

Slide 37 text

Diagnostics: Performance Traces
• Types of issues
  • High CPU
• Tools
  • Visual Studio
  • dotTrace
  • PerfView
  • dotnet-collect

Slide 38

Slide 38 text

Diagnostics: Post Mortem Debugging
• Types of issues
  • Crashes
  • Hangs (sync and async)
  • Memory leaks
  • Locks
• Tools
  • Visual Studio
  • WinDbg
  • lldb
  • dotnet-analyze
  • dotMemory

Slide 39

Slide 39 text

Future Enhancements
• Improved documentation on how to scale web services
• Improvements to the thread pool to better handle blocking workers
• Analyzers to catch common mistakes with asynchronous programming
• Tools to help better diagnose common issues
  • Async hangs
  • Thread pool starvation
• More counters in .NET Core
• Reduce the number of .NET async traps
  • IAsyncDisposable
  • FileStream (sync over async)

Slide 40

Slide 40 text

In summary… coding is hard