Slide 1

Slide 1 text

Entity Framework Core Performance Optimization Pawel Gerr @pawelgerr [email protected] .NET Developer Conference 2019

Slide 2

Slide 2 text

Entity Framework Core Performance Optimization Example for today – Products DB Product group Products Prices Sellers Studio

Slide 3

Slide 3 text

Entity Framework Core Performance Optimization My usual options for Entity Framework performance optimization 2nd level cache Hardware upgrade Materialized views Stored procedures Triggers Database Features Server Raw SELECT * FROM … Raw SQL Bulk operations Indices Entity Framework Requests reduction Query restructure … and some other (minor) features we will see later Use 3rd party libs if available CPU, Memory, I/O

Slide 4

Slide 4 text

Entity Framework Core Performance Optimization Choose your optimization strategy wisely Usually at its limits already Database load (SQL) Application server load (.NET / Entity Framework) Performance Implementation effort Hardware cost SQL generation Materialization

Slide 5

Slide 5 text

Entity Framework Core Performance Optimization Did „the change“ anything good? – My KPI Execution time (LINQ) SELECT * FROM … SELECT * FROM … SELECT * FROM … Number of requests Query statistics SELECT * FROM … “Shape” of SQL Execution plans Parse time Compile time Execution time (DB) Physical reads Logical reads Row count Round trip time SQL generation Execution time (DB) Materialization My key performance indicators

Slide 6

Slide 6 text

Entity Framework Core Performance Optimization Reducing database requests

Slide 7

Slide 7 text

Entity Framework Core Performance Optimization N+1 queries problem Execution of hundreds of requests for one use case Caused by: unintentional execution of queries in loops Possible solutions: • Better code structure (software architecture) • Avoid overgeneralization (shared/core projects) • Help each other to improve (review/feedback) • Know the insights of Entity Framework • Lazy loading • Limitations Reducing database requests

Slide 8

Slide 8 text

Simplified case of indirect execution of database requests in a loop Library Main Entity Framework Core Performance Optimization Multiple requests in loops Reducing database requests var products = LoadProducts(); var prices = new List(); foreach (var product in products) { var price = GetPrice(product.Id); prices.Add(price); } public Price GetPrice(Guid productId) { MyDbContext ctx = …; return ctx.Prices.FirstOrDefault(p => p.ProductId == productId); } Executed N times

Slide 9

Slide 9 text

Library Main Entity Framework Core Performance Optimization LINQ queries are loops Reducing database requests var products = LoadProducts(); var prices = products.Select(product => GetPrice(product.Id)) .ToList(); public Price GetPrice(Guid productId) { MyDbContext ctx = …; return ctx.Prices.FirstOrDefault(p => p.ProductId == productId); }

Slide 10

Slide 10 text

Library Main Entity Framework Core Performance Optimization Approach 1 – Use-case specific methods Reducing database requests var productsWithPrices = LoadProductsWithPrices(); public List LoadProductsWithPrices() { return ctx.Products .Include(p => p.Prices) // merely symbolic, fetching prices is usually more complex .ToList(); }

Slide 11

Slide 11 text

Library Main var products = LoadProducts(); var productIds = products.Select(p => p.Id); var prices = GetPrices(productIds); public List GetPrices(IEnumerable productIds) { return ctx.Prices .Where(p => productIds.Contains(p.ProductId)) .ToList(); } Entity Framework Core Performance Optimization Approach 2 – load all required prices at once Reducing database requests

Slide 12

Slide 12 text

Tips for more complex use cases: • Use abstractions to predetermine possible paths • Take a look at specification pattern to get good encapsulation and flexibility Library Main var deliverableBluRaysWithPrice = LoadProducts() .WithLatestPrice() .AreDeliverable() .Of(MediaType.BluRay) .ToList(); public ProductsQuery LoadProducts() { return new ProductsQuery(ctx); // alternatively, we can provide some kind of “repositories” to work on } Entity Framework Core Performance Optimization Approach 3 – combine 1 and 2 Reducing database requests

Slide 13

Slide 13 text

Given: 100 products being in 5 studio Loading of all products “in one go” leads to 1 + 5 queries Library Main var productsByStudio = ctx.Products .Include(p => p.Studio) .ToLookup(p => p.Studio.Name); Entity Framework Core Performance Optimization Lazy loading Reducing database requests public static ILookup ToLookup( this IEnumerable source, Func keySelector) { ... } Use eager loading to prevent unnecessary requests IEnumerable always forces query execution

Slide 14

Slide 14 text

LINQ Entity Framework Core Performance Optimization Limitations of Entity Framework Core Access to navigational property in EF 2.2 leads to N+1 queries if query is not trivial • Loading of the first product of 5 studios leads to 1 + 5 queries • The issue has been fixed in EF 3.0 Reducing database requests var studios = ctx.Studios .Select(s => { Studio = s, FirstProduct = s.Products.FirstOrDefault() }) .ToList(); not trivial query

Slide 15

Slide 15 text

EF 2.2 evaluates non-translatable parts of the query in memory EF 3.0 throws an InvalidOperationException: The LINQ expression could not be translated. Main Library Entity Framework Core Performance Optimization Client-side evaluation Reducing result set public bool IsDeliverable(Product product) { // business logic } EF cannot translate custom methods to SQL Rewrite the predicate Library public Expression> IsDeliverable() { return p => p.DeliverableFrom <= DateTime.Now && p.DeliverableUntil > DateTime.Now; } var products = ctx.Products .Where(IsDeliverable()) .Where(p => IsDeliverable(p)) .ToList();

Slide 16

Slide 16 text

Entity Framework Core Performance Optimization Finding the problem area Use profiling tools In case of Microsoft SQL Server: XEvent Profiler in SQL Server Management Studio SQL Server Profiler (deprecated, resource intensive) Azure Data Studio Logs coming from EF with EnableSensitiveDataLogging Reducing database requests Logging DbContextOptionsBuilder builder = ...; builder.UseSqlServer("...") // or any other database .UseLoggerFactory(loggerFactory) .EnableSensitiveDataLogging(); In development only because possible security leak!

Slide 17

Slide 17 text

Entity Framework Core Performance Optimization Finding the problem area Look for repetitive lines and cycles Reducing database requests Repetive lines Cycles

Slide 18

Slide 18 text

Entity Framework Core Performance Optimization Reducing query complexity

Slide 19

Slide 19 text

Entity Framework Core Performance Optimization Cartesian explosion problem Fetching a lot of data from multiple tables at once • Increased memory consumption and execution time • High I/O load Caused by: loading of multiple navigational collection-properties Possible solutions: • Reduce Includes (eager loading) • Reduce access to navigational properties in projections (i.e. Select) Reducing query complexity Query splitting

Slide 20

Slide 20 text

Entity Framework Core Performance Optimization Eager loading via Include EF 2.2 is loading collections separately EF 3.0 loads all data using 1 SQL statement • Bigger result sets: 100 products with 5 prices each and 3 sellers (100 * 5 * 3) rows > (100 + 500 + 3) records • EF-forced ORDER BY clause produces considerable load Reducing query complexity Product Id Price Id Seller Id 1 1 1 1 1 2 1 1 3 1 2 1 1 2 2 1 2 3 SQL LINQ var products = ctx.Products .Include(p => p.Studio) .Include(p => p.Prices) .Include(p => p.Sellers) .ToList(); SELECT * FROM Products INNER JOIN Studios ... LEFT JOIN Prices ON ... LEFT JOIN Sellers ON ... ... ORDER BY Products.Id, Prices.Id, Sellers.Id Cartesian explosion

Slide 21

Slide 21 text

Before Entity Framework Core Performance Optimization Using collections in projections Loading all data at once Statistics: • CPU: 31 • Duration: 75 • Reads: 5299 • Row Count: 12342 Reducing query complexity LINQ var studios = ctx.Studios .Select(s => new MyStudio { Id = s.Id, Name = s.Name, Infinity = s.Products.Where(p => p.Name.StartsWith("Infinity")), Endgame = s.Products.Where(p => p.Name.StartsWith("Endgame")) }) .ToList();

Slide 22

Slide 22 text

Before Usually, loading collection separately requires more code but less resources on database Statistics: • CPU: 31 • Duration: 75 • Reads: 5299 • Row Count: 12342 After LINQ Entity Framework Core Performance Optimization Query splitting Reducing query complexity var studios = ctx.Studios.Select(p => new Studio() { Id = p.Id, Name = p.Name }).ToList(); var infinity = ctx.Products.Where(p => p.Name.StartsWith("Infinity")).ToList(); var endgame = ctx.Products.Where(p => p.Name.StartsWith("Endgame")).ToList(); var infinityLookup = infinity.ToLookup(p => p.StudioId); var endgameLookup = endgame.ToLookup(p => p.StudioId); foreach (var studio in studios) { studio.Infinity = infinityLookup[studio.Id]; sturio.Endgame = endgameLooup[studio.Id]; } Necessary, if change tracking is disabled 16 3 352 2322

Slide 23

Slide 23 text

Entity Framework Core Performance Optimization Avoid unnecessary fuzzy searches

Slide 24

Slide 24 text

Entity Framework Core Performance Optimization Fuzzy searches may lead to • Bigger internal working sets • Sub-optimal JOIN order • Parallelization • Usage of temp tables • Higher query memory grant Possible solutions: • Good domain knowledge (code review) • Better code structure Avoid unnecessary fuzzy searches

Slide 25

Slide 25 text

Task: export of products for billing for a studio Option 2 Option 1 Entity Framework Core Performance Optimization Know the domain Avoid unnecessary fuzzy searches var studio = "Walt Disney"; var products = ctx.Products .Where(p => p.Studio.Name == studio) .ToList(); var studio = "Disney"; var products = ctx.Products .Where(p => p.Studio.Name.Contains(studio)) .ToList(); Looks like for the product search on the website

Slide 26

Slide 26 text

Product search, originally implemented for website became “shared” The method above can but should not be used for billing Sharing of code for different use cases may lead to bad performance (and bugs) Shared project Entity Framework Core Performance Optimization Better code structure Avoid unnecessary fuzzy searches public class ProductRepository { public List LoadProducts(string studio) { return ctx.Products .Where(p => p.Studio.Name.Contains(studio)) .ToList(); } } Often in shared/core project but rarely belongs there

Slide 27

Slide 27 text

Entity Framework Core Performance Optimization Understanding queries

Slide 28

Slide 28 text

LINQ 2 LINQ 1 Entity Framework Core Performance Optimization Which one is better? Understanding Queries var groups = ctx.ProductGroups .Select(g => new { g.Products.FirstOrDefault().Id, g.Products.FirstOrDefault().Name }) .ToList(); var groups = ctx.ProductGroups .Select(g => g.Products .Select(p => new { p.Id, p.Name }) .FirstOrDefault()) .ToList(); 2x “FirstOrDefault()” before selecting properties Selection of properties before “FirstOrDefault()”

Slide 29

Slide 29 text

SQL 2 SQL 1 Entity Framework Core Performance Optimization Lets try with SQL … Understanding Queries SELECT ( SELECT TOP(1) p.Id FROM Products p WHERE g.Id = p.GroupId ) AS FirstProductId, ( SELECT TOP(1) p.Name FROM Products p WHERE g.Id = p.GroupId ) AS FirstProductName FROM ProductGroups g SELECT p.Id, p.Name FROM ProductGroups g LEFT JOIN ( SELECT Id, Name, GroupId FROM ( SELECT Id, Name, GroupId, ROW_NUMBER() OVER(PARTITION BY GroupId ORDER BY Id) AS row FROM Products ) p WHERE row <= 1 ) p ON g.Id = p.GroupId 2 sub-selects Window function “ROW_NUMBER”

Slide 30

Slide 30 text

Entity Framework Core Performance Optimization Execution plans

Slide 31

Slide 31 text

Entity Framework Core Performance Optimization Textual representation of all database operations • Type and the order of operations (JOIN, filter, projection, ...) • Indexes being used • Amount of data flowing between two operations • Costs of an operation and a subtree Some tools have built-in support for displaying execution plans as a graph Execution plans

Slide 32

Slide 32 text

Entity Framework Core Performance Optimization Another try … Execution plans Execution plan 1 SQL 1 SELECT ( SELECT TOP(1) p.Id FROM Products p WHERE g.Id = p.GroupId ) AS FirstProductId, ( SELECT TOP(1) p.Name FROM Products p WHERE g.Id = p.GroupId ) AS FirstProductName FROM ProductGroups g

Slide 33

Slide 33 text

Entity Framework Core Performance Optimization Execution plans Execution plan 2 SQL 2 SELECT p.Id, p.Name FROM ProductGroups g LEFT JOIN ( SELECT Id, Name, GroupId FROM ( SELECT Id, Name, GroupId, ROW_NUMBER() OVER(PARTITION BY GroupId ORDER BY Id) AS row FROM Products ) p WHERE row <= 1 ) p ON g.Id = p.GroupId

Slide 34

Slide 34 text

Entity Framework Core Performance Optimization Comparing • Compare estimated subtree costs • Deviance: estimated vs actual number of rows • Take statistics (like reads) into consideration Execution plans Query 1 wins!

Slide 35

Slide 35 text

Entity Framework Core Performance Optimization SQL Server Operators General icon Cursors Parallelism T-SQL Execution plans

Slide 36

Slide 36 text

Entity Framework Core Performance Optimization Execution plans Crash course Table scan Clustered index scan Non-clustered index scan Clustered index seek Non-clustered index seek Scan No clustered index Seek Missing index? Fuzzy search? Bad discriminator? Filter Filtering Filtering Filtering Key lookup Missing (include) columns? RID lookup If no clustered index Sort “Stop & Go” operator! Required? Missing index? Sorting in .NET cheaper? Parallelism Expensive High query complexity

Slide 37

Slide 37 text

Entity Framework Core Performance Optimization JOINs Execution plans DB may perform sort on its own • Resource-saving • Data sets must be ordered • The “default” • Kind-of 2 nested loops • Speed depends on: • Scan of data set A • Seek of data set B • Large and unsorted sets • 2 phases: • Builds hash table for set A • Matches hash values from B Should be looked at Merge join Nested loop join Hash match join

Slide 38

Slide 38 text

Entity Framework Core Performance Optimization Learnings • Look for database requests in “loops” • Split complex queries if necessary • Write conditions as precise as possible • Learn to read execution plans - optimize queries accordingly Resources • Entity Framework Core: https://docs.microsoft.com/en-us/ef/core/ • Execution Plans: https://docs.microsoft.com/en-us/sql/relational-databases/performance/execution-plans • My EF playground: https://github.com/PawelGerr/EntityFrameworkCore-Demos Ask me @pawelgerr [email protected]