Entity Framework performance optimization

• Entity Framework
  • Requests reduction
  • Query restructure
  • … and some other (minor) features we will see later
• Raw SQL
  • Raw SELECT * FROM …
  • Bulk operations
• 2nd level cache
• Indices
• Database features
  • Materialized views
  • Stored procedures
  • Triggers
• Server
  • Hardware upgrade (CPU, memory, I/O)
• Use 3rd party libs if available
My key performance indicators (KPI)

• Execution time (LINQ)
  • SQL generation
  • Round trip time
  • Execution time (DB)
  • Materialization
• Number of requests
• Query statistics
  • Parse time
  • Compile time
  • Execution time (DB)
  • Physical reads
  • Logical reads
  • Row count
• “Shape” of SQL
• Execution plans
Reducing database requests

Hundreds of requests for one use case

Caused by: unintentional execution of queries in loops

Possible solutions:
• Better code structure (software architecture)
• Avoid overgeneralization (shared/core projects)
• Help each other to improve (review/feedback)
• Know the internals of Entity Framework
  • Lazy loading
  • Limitations
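The effect of running queries in a loop can be sketched without a real database. The following is a minimal simulation, not Entity Framework code: `FakeDatabase`, `LoadPricesFor` and `LoadPricesForAll` are hypothetical names, and every method call simply counts as one request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database: every public method call counts as one request.
public class FakeDatabase
{
    private readonly Dictionary<int, List<decimal>> pricesByProductId =
        new Dictionary<int, List<decimal>>
        {
            { 1, new List<decimal> { 9.99m, 7.99m } },
            { 2, new List<decimal> { 19.99m } },
            { 3, new List<decimal> { 4.99m } }
        };

    public int RequestCount { get; private set; }

    public List<int> LoadProductIds()
    {
        RequestCount++;
        return pricesByProductId.Keys.OrderBy(id => id).ToList();
    }

    // One request per product: harmless on its own, costly inside a loop.
    public List<decimal> LoadPricesFor(int productId)
    {
        RequestCount++;
        return pricesByProductId[productId];
    }

    // One request for all products at once.
    public Dictionary<int, List<decimal>> LoadPricesForAll(IEnumerable<int> productIds)
    {
        RequestCount++;
        return productIds.ToDictionary(id => id, id => pricesByProductId[id]);
    }
}

public static class RequestCounting
{
    // 1 request for the products + 1 request per product (the "1 + N" pattern).
    public static int CountRequestsInLoop()
    {
        var db = new FakeDatabase();
        foreach (var id in db.LoadProductIds())
        {
            db.LoadPricesFor(id);
        }
        return db.RequestCount;
    }

    // 1 request for the products + 1 request for all prices.
    public static int CountRequestsBatched()
    {
        var db = new FakeDatabase();
        var ids = db.LoadProductIds();
        db.LoadPricesForAll(ids);
        return db.RequestCount;
    }
}
```

With the three products in this sketch the loop issues 4 requests and the batched variant 2; the gap grows linearly with the number of products.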
Reducing database requests: use-case specific methods

    var productsWithPrices = LoadProductsWithPrices();

    public List<Product> LoadProductsWithPrices()
    {
        return ctx.Products
                  .Include(p => p.Prices) // merely symbolic, fetching prices is usually more complex
                  .ToList();
    }
Entity Framework Core Performance Optimization

Reducing database requests: approach 3 – combine 1 and 2

• Predetermine possible paths
• Take a look at the specification pattern to get good encapsulation and flexibility

Main:

    var deliverableBluRaysWithPrice = LoadProducts()
        .WithLatestPrice()
        .AreDeliverable()
        .Of(MediaType.BluRay)
        .ToList();

Library:

    public ProductsQuery LoadProducts()
    {
        return new ProductsQuery(ctx); // alternatively, we can provide some kind of “repositories” to work on
    }
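One possible shape of such a query object is sketched below. It is an assumption, not the implementation behind the slide: this `ProductsQuery` wraps an `IQueryable<Product>` instead of the context, `AreDeliverable` takes the current time as a parameter, the `Product` properties are made up for the example, and `WithLatestPrice` is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum MediaType { Dvd, BluRay }

// Hypothetical entity; property names are assumptions for this sketch.
public class Product
{
    public string Name { get; set; } = "";
    public MediaType MediaType { get; set; }
    public DateTime DeliverableFrom { get; set; }
    public DateTime DeliverableUntil { get; set; }
}

// Exposes only the predetermined "paths" and keeps the IQueryable private,
// so callers cannot append arbitrary (and possibly slow) operations.
public class ProductsQuery
{
    private readonly IQueryable<Product> query;

    public ProductsQuery(IQueryable<Product> query)
    {
        this.query = query;
    }

    public ProductsQuery AreDeliverable(DateTime now)
    {
        return new ProductsQuery(
            query.Where(p => p.DeliverableFrom <= now && p.DeliverableUntil > now));
    }

    public ProductsQuery Of(MediaType mediaType)
    {
        return new ProductsQuery(query.Where(p => p.MediaType == mediaType));
    }

    // The only place where the query is executed.
    public List<Product> ToList()
    {
        return query.ToList();
    }
}
```

Each method returns a new `ProductsQuery`, so the filters compose in any order and nothing runs until `ToList()` is called.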
Reducing database requests

Access to a navigational property in EF Core 2.2 leads to N+1 queries if the query is not trivial
• Loading the first product of 5 studios leads to 1 + 5 queries
• The issue has been fixed in EF Core 3.0

Non-trivial query:

    var studios = ctx.Studios
        .Select(s => new
        {
            Studio = s,
            FirstProduct = s.Products.FirstOrDefault()
        })
        .ToList();
Reducing result set: client-side evaluation

EF cannot translate custom methods to SQL

Library:

    public bool IsDeliverable(Product product)
    {
        // business logic
    }

Main:

    var products = ctx.Products
        .Where(p => IsDeliverable(p))
        .ToList();

EF 3.0 throws an InvalidOperationException: The LINQ expression could not be translated.

Rewrite the predicate

Library:

    public Expression<Func<Product, bool>> IsDeliverable()
    {
        return p => p.DeliverableFrom <= DateTime.Now
                 && p.DeliverableUntil > DateTime.Now;
    }

Main:

    var products = ctx.Products
        .Where(IsDeliverable())
        .ToList();
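The rewritten predicate can be tried out without a database, because an expression tree also works against an in-memory `IQueryable`. The sketch below makes two assumptions that are not in the slide: the `Product` class is a made-up stand-in, and the current time is passed in as a parameter to keep the example deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity for this sketch.
public class Product
{
    public string Name { get; set; } = "";
    public DateTime DeliverableFrom { get; set; }
    public DateTime DeliverableUntil { get; set; }
}

public static class ProductPredicates
{
    // Returns an expression tree (data), not a compiled delegate (code),
    // so a LINQ provider can inspect it and translate it.
    public static Expression<Func<Product, bool>> IsDeliverable(DateTime now)
    {
        return p => p.DeliverableFrom <= now && p.DeliverableUntil > now;
    }
}

public static class DeliverableProducts
{
    // Works with any IQueryable<Product>: an in-memory list here,
    // a DbSet<Product> in a real application.
    public static List<string> Names(IQueryable<Product> products, DateTime now)
    {
        return products
            .Where(ProductPredicates.IsDeliverable(now))
            .Select(p => p.Name)
            .ToList();
    }
}
```

Calling `DeliverableProducts.Names(list.AsQueryable(), DateTime.Now)` evaluates the predicate in memory; against a database provider the same expression is what gets translated into the WHERE clause.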
Reducing database requests: logging

Profiling tools, in case of Microsoft SQL Server:
• XEvent Profiler in SQL Server Management Studio
• SQL Server Profiler (deprecated, resource intensive)
• Azure Data Studio

Logs coming from EF with EnableSensitiveDataLogging:

    DbContextOptionsBuilder builder = ...;
    builder.UseSqlServer("...") // or any other database
           .UseLoggerFactory(loggerFactory)
           .EnableSensitiveDataLogging();

Use EnableSensitiveDataLogging in development only: it is a possible security leak!
Reducing query complexity: query splitting

A lot of data from multiple tables at once
• Increased memory consumption and execution time
• High I/O load

Caused by: loading of multiple navigational collection properties

Possible solutions:
• Reduce Includes (eager loading)
• Reduce access to navigational properties in projections (i.e. Select)
Avoid unnecessary fuzzy searches: know the domain

Option 1:

    var studio = "Walt Disney";
    var products = ctx.Products
        .Where(p => p.Studio.Name == studio)
        .ToList();

Option 2:

    var studio = "Disney";
    var products = ctx.Products
        .Where(p => p.Studio.Name.Contains(studio))
        .ToList();

The fuzzy search looks like it is meant for the product search on the website
Avoid unnecessary fuzzy searches: better code structure

Shared project:

    public class ProductRepository
    {
        public List<Product> LoadProducts(string studio)
        {
            return ctx.Products
                .Where(p => p.Studio.Name.Contains(studio))
                .ToList();
        }
    }

Such a method often sits in a shared/core project but rarely belongs there

• The code above can, but should not, be used for billing
• Sharing code between different use cases may lead to bad performance (and bugs)
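One way to act on this is to give each use case its own method instead of sharing a single fuzzy lookup. The sketch below is only an illustration: `ProductSearch`, `Billing`, their method names and the entity classes are hypothetical, and the queries run against an in-memory `IQueryable` rather than a real context.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entities for this sketch.
public class Studio
{
    public string Name { get; set; } = "";
}

public class Product
{
    public string Name { get; set; } = "";
    public Studio Studio { get; set; } = new Studio();
}

// Website search: a fuzzy match is what the user expects here.
public static class ProductSearch
{
    public static List<Product> FindByStudioNamePart(IQueryable<Product> products, string searchTerm)
    {
        return products.Where(p => p.Studio.Name.Contains(searchTerm)).ToList();
    }
}

// Billing: the studio is known exactly, so an equality comparison is both
// correct and cheaper (it can use an index on the name).
public static class Billing
{
    public static List<Product> LoadProductsOfStudio(IQueryable<Product> products, string studioName)
    {
        return products.Where(p => p.Studio.Name == studioName).ToList();
    }
}
```

With studios named "Walt Disney" and "Disney Channel", the search for "Disney" returns products of both, which is fine for the website but would be a bug in billing.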
Understanding queries

Which one is better?

2x “FirstOrDefault()” before selecting properties:

    var groups = ctx.ProductGroups
        .Select(g => new
        {
            g.Products.FirstOrDefault().Id,
            g.Products.FirstOrDefault().Name
        })
        .ToList();

Selection of properties before “FirstOrDefault()”:

    var groups = ctx.ProductGroups
        .Select(g => g.Products
            .Select(p => new
            {
                p.Id,
                p.Name
            })
            .FirstOrDefault())
        .ToList();
Understanding queries

Let's try with SQL …

2 sub-selects:

    SELECT
        (
            SELECT TOP(1) p.Id
            FROM Products p
            WHERE g.Id = p.GroupId
        ) AS FirstProductId,
        (
            SELECT TOP(1) p.Name
            FROM Products p
            WHERE g.Id = p.GroupId
        ) AS FirstProductName
    FROM ProductGroups g

Window function “ROW_NUMBER”:

    SELECT p.Id, p.Name
    FROM ProductGroups g
    LEFT JOIN
    (
        SELECT Id, Name, GroupId
        FROM
        (
            SELECT Id, Name, GroupId,
                   ROW_NUMBER() OVER(PARTITION BY GroupId ORDER BY Id) AS row
            FROM Products
        ) p
        WHERE row <= 1
    ) p ON g.Id = p.GroupId
Execution plans

Operations:
• Type and the order of operations (JOIN, filter, projection, ...)
• Indexes being used
• Amount of data flowing between two operations
• Costs of an operation and a subtree

Some tools have built-in support for displaying execution plans as a graph
SQL 1

    SELECT
        (
            SELECT TOP(1) p.Id
            FROM Products p
            WHERE g.Id = p.GroupId
        ) AS FirstProductId,
        (
            SELECT TOP(1) p.Name
            FROM Products p
            WHERE g.Id = p.GroupId
        ) AS FirstProductName
    FROM ProductGroups g

Execution plan 1 (figure)
SQL 2

    SELECT p.Id, p.Name
    FROM ProductGroups g
    LEFT JOIN
    (
        SELECT Id, Name, GroupId
        FROM
        (
            SELECT Id, Name, GroupId,
                   ROW_NUMBER() OVER(PARTITION BY GroupId ORDER BY Id) AS row
            FROM Products
        ) p
        WHERE row <= 1
    ) p ON g.Id = p.GroupId
Operators

• Table scan: no clustered index
• Clustered index scan, non-clustered index scan: Missing index? Fuzzy search? Bad discriminator?
• Clustered index seek, non-clustered index seek
• Filter: filtering
• Key lookup: Missing (include) columns?
• RID lookup: if no clustered index
• Sort: “Stop & Go” operator! Required? Missing index? Sorting in .NET cheaper?
• Parallelism: expensive, high query complexity
Merge join
• May perform a sort on its own
• Resource-saving
• Data sets must be ordered

Nested loop join
• The “default”
• Kind-of 2 nested loops
• Speed depends on:
  • Scan of data set A
  • Seek of data set B

Hash match join
• Large and unsorted sets
• 2 phases:
  • Builds hash table for set A
  • Matches hash values from B
• Should be looked at
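The three join strategies can be sketched in a few lines each. This is a simplified in-memory illustration, not how a database engine implements them: rows are `(Key, Value)` tuples, the class and method names are hypothetical, and the merge join assumes both inputs are sorted by key with unique keys in set A.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinSketches
{
    // Nested loop join: for every row of A, look for matching rows in B.
    // Here B is scanned linearly; a database would seek B through an index.
    public static List<(int Key, string Left, string Right)> NestedLoopJoin(
        List<(int Key, string Value)> a, List<(int Key, string Value)> b)
    {
        var result = new List<(int Key, string Left, string Right)>();
        foreach (var left in a)
        {
            foreach (var right in b)
            {
                if (left.Key == right.Key)
                {
                    result.Add((left.Key, left.Value, right.Value));
                }
            }
        }
        return result;
    }

    // Hash match join: phase 1 builds a hash table for A, phase 2 probes it with B.
    public static List<(int Key, string Left, string Right)> HashMatchJoin(
        List<(int Key, string Value)> a, List<(int Key, string Value)> b)
    {
        var table = new Dictionary<int, List<string>>();
        foreach (var left in a)
        {
            if (!table.TryGetValue(left.Key, out var bucket))
            {
                bucket = new List<string>();
                table[left.Key] = bucket;
            }
            bucket.Add(left.Value);
        }

        var result = new List<(int Key, string Left, string Right)>();
        foreach (var right in b)
        {
            if (table.TryGetValue(right.Key, out var matches))
            {
                foreach (var leftValue in matches)
                {
                    result.Add((right.Key, leftValue, right.Value));
                }
            }
        }
        return result;
    }

    // Merge join: walks both inputs once, in key order.
    // Assumption: both lists are sorted by Key and keys in A are unique.
    public static List<(int Key, string Left, string Right)> MergeJoin(
        List<(int Key, string Value)> a, List<(int Key, string Value)> b)
    {
        var result = new List<(int Key, string Left, string Right)>();
        int i = 0, j = 0;
        while (i < a.Count && j < b.Count)
        {
            if (a[i].Key < b[j].Key)
            {
                i++;
            }
            else if (a[i].Key > b[j].Key)
            {
                j++;
            }
            else
            {
                result.Add((a[i].Key, a[i].Value, b[j].Value));
                j++; // stay on the same A row: the next B row may share the key
            }
        }
        return result;
    }
}
```

The sketch shows why the merge join is resource-saving (a single pass, no extra data structure) but needs ordered input, while the hash match join copes with unsorted sets at the price of building the hash table first.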