Upgrade to Pro — share decks privately, control downloads, hide ads and more …

DDC 2019 - Entity Framework Core - Performance Optimization

Pawel Gerr
November 27, 2019

DDC 2019 - Entity Framework Core - Performance Optimization

Pawel Gerr

November 27, 2019
Tweet

More Decks by Pawel Gerr

Other Decks in Programming

Transcript

  1. Entity Framework Core
    Performance Optimization
    Pawel Gerr
    @pawelgerr
    [email protected]
    .NET Developer
    Conference 2019

    View full-size slide

  2. Entity Framework Core
    Performance Optimization
    Example for today – Products DB
    Product
    group
    Products
    Prices
    Sellers
    Studio

    View full-size slide

  3. Entity Framework Core
    Performance Optimization
    My usual options for Entity Framework performance optimization
    2nd level cache
    Hardware
    upgrade
    Materialized views
    Stored procedures
    Triggers
    Database Features Server
    Raw
    SELECT *
    FROM …
    Raw SQL
    Bulk operations
    Indices
    Entity Framework
    Requests reduction Query restructure
    … and some other (minor) features we will see later
    Use 3rd party
    libs if available
    CPU, Memory, I/O

    View full-size slide

  4. Entity Framework Core
    Performance Optimization
    Choose your optimization strategy wisely
    Usually at its
    limits already
    Database load
    (SQL)
    Application server load
    (.NET / Entity Framework)
    Performance
    Implementation
    effort
    Hardware
    cost
    SQL generation
    Materialization

    View full-size slide

  5. Entity Framework Core
    Performance Optimization
    Did „the change“ anything good? – My KPI
    Execution time (LINQ)
    SELECT *
    FROM …
    SELECT *
    FROM …
    SELECT *
    FROM …
    Number of requests Query statistics
    SELECT *
    FROM …
    “Shape” of SQL Execution plans
    Parse time
    Compile time
    Execution time (DB)
    Physical reads
    Logical reads
    Row count
    Round trip time
    SQL generation
    Execution time (DB)
    Materialization
    My key performance indicators

    View full-size slide

  6. Entity Framework Core
    Performance Optimization
    Reducing database requests

    View full-size slide

  7. Entity Framework Core
    Performance Optimization
    N+1 queries problem
    Execution of hundreds of requests for one use case
    Caused by: unintentional execution of queries in loops
    Possible solutions:
    • Better code structure (software architecture)
    • Avoid overgeneralization (shared/core projects)
    • Help each other to improve (review/feedback)
    • Know the insights of Entity Framework
    • Lazy loading
    • Limitations
    Reducing database requests

    View full-size slide

  8. Simplified case of indirect execution of database requests in a loop
    Library Main
    Entity Framework Core
    Performance Optimization
    Multiple requests in loops
    Reducing database requests
    var products = LoadProducts();
    var prices = new List();
    foreach (var product in products)
    {
    var price = GetPrice(product.Id);
    prices.Add(price);
    }
    public Price GetPrice(Guid productId)
    {
    MyDbContext ctx = …;
    return ctx.Prices.FirstOrDefault(p => p.ProductId == productId);
    }
    Executed N times

    View full-size slide

  9. Library Main
    Entity Framework Core
    Performance Optimization
    LINQ queries are loops
    Reducing database requests
    var products = LoadProducts();
    var prices = products.Select(product => GetPrice(product.Id))
    .ToList();
    public Price GetPrice(Guid productId)
    {
    MyDbContext ctx = …;
    return ctx.Prices.FirstOrDefault(p => p.ProductId == productId);
    }

    View full-size slide

  10. Library Main
    Entity Framework Core
    Performance Optimization
    Approach 1 – Use-case specific methods
    Reducing database requests
    var productsWithPrices = LoadProductsWithPrices();
    public List LoadProductsWithPrices()
    {
    return ctx.Products
    .Include(p => p.Prices) // merely symbolic, fetching prices is usually more complex
    .ToList();
    }

    View full-size slide

  11. Library Main
    var products = LoadProducts();
    var productIds = products.Select(p => p.Id);
    var prices = GetPrices(productIds);
    public List GetPrices(IEnumerable productIds)
    {
    return ctx.Prices
    .Where(p => productIds.Contains(p.ProductId))
    .ToList();
    }
    Entity Framework Core
    Performance Optimization
    Approach 2 – load all required prices at once
    Reducing database requests

    View full-size slide

  12. Tips for more complex use cases:
    • Use abstractions to predetermine possible paths
    • Take a look at specification pattern to get good encapsulation and flexibility
    Library Main
    var deliverableBluRaysWithPrice = LoadProducts()
    .WithLatestPrice()
    .AreDeliverable()
    .Of(MediaType.BluRay)
    .ToList();
    public ProductsQuery LoadProducts()
    {
    return new ProductsQuery(ctx); // alternatively, we can provide some kind of “repositories” to work on
    }
    Entity Framework Core
    Performance Optimization
    Approach 3 – combine 1 and 2
    Reducing database requests

    View full-size slide

  13. Given: 100 products being in 5 studio
    Loading of all products “in one go” leads to 1 + 5 queries
    Library Main
    var productsByStudio = ctx.Products
    .Include(p => p.Studio)
    .ToLookup(p => p.Studio.Name);
    Entity Framework Core
    Performance Optimization
    Lazy loading
    Reducing database requests
    public static ILookup ToLookup(
    this IEnumerable source,
    Func keySelector)
    {
    ...
    }
    Use eager loading to
    prevent unnecessary
    requests
    IEnumerable always
    forces query execution

    View full-size slide

  14. LINQ
    Entity Framework Core
    Performance Optimization
    Limitations of Entity Framework Core
    Access to navigational property in EF 2.2 leads to N+1 queries if query is not trivial
    • Loading of the first product of 5 studios leads to 1 + 5 queries
    • The issue has been fixed in EF 3.0
    Reducing database requests
    var studios = ctx.Studios
    .Select(s =>
    {
    Studio = s,
    FirstProduct = s.Products.FirstOrDefault()
    })
    .ToList();
    not trivial query

    View full-size slide

  15. EF 2.2 evaluates non-translatable parts of the query in memory
    EF 3.0 throws an InvalidOperationException: The LINQ expression could not be translated.
    Main
    Library
    Entity Framework Core
    Performance Optimization
    Client-side evaluation
    Reducing result set
    public bool IsDeliverable(Product product)
    {
    // business logic
    }
    EF cannot translate
    custom methods to
    SQL
    Rewrite the predicate
    Library
    public Expression> IsDeliverable()
    {
    return p => p.DeliverableFrom <= DateTime.Now &&
    p.DeliverableUntil > DateTime.Now;
    }
    var products = ctx.Products
    .Where(IsDeliverable())
    .Where(p => IsDeliverable(p))
    .ToList();

    View full-size slide

  16. Entity Framework Core
    Performance Optimization
    Finding the problem area
    Use profiling tools
    In case of Microsoft SQL Server:
    XEvent Profiler in SQL Server Management Studio
    SQL Server Profiler (deprecated, resource intensive)
    Azure Data Studio
    Logs coming from EF with EnableSensitiveDataLogging
    Reducing database requests
    Logging
    DbContextOptionsBuilder builder = ...;
    builder.UseSqlServer("...") // or any other database
    .UseLoggerFactory(loggerFactory)
    .EnableSensitiveDataLogging();
    In development only
    because possible security leak!

    View full-size slide

  17. Entity Framework Core
    Performance Optimization
    Finding the problem area
    Look for repetitive lines and cycles
    Reducing database requests
    Repetive lines
    Cycles

    View full-size slide

  18. Entity Framework Core
    Performance Optimization
    Reducing query complexity

    View full-size slide

  19. Entity Framework Core
    Performance Optimization
    Cartesian explosion problem
    Fetching a lot of data from multiple tables at once
    • Increased memory consumption and execution time
    • High I/O load
    Caused by: loading of multiple navigational collection-properties
    Possible solutions:
    • Reduce Includes (eager loading)
    • Reduce access to navigational properties in projections (i.e. Select)
    Reducing query complexity
    Query splitting

    View full-size slide

  20. Entity Framework Core
    Performance Optimization
    Eager loading via Include
    EF 2.2 is loading collections separately
    EF 3.0 loads all data using 1 SQL statement
    • Bigger result sets: 100 products with 5 prices each and 3 sellers
    (100 * 5 * 3) rows > (100 + 500 + 3) records
    • EF-forced ORDER BY clause produces considerable load
    Reducing query complexity
    Product Id Price Id Seller Id
    1 1 1
    1 1 2
    1 1 3
    1 2 1
    1 2 2
    1 2 3
    SQL LINQ
    var products = ctx.Products
    .Include(p => p.Studio)
    .Include(p => p.Prices)
    .Include(p => p.Sellers)
    .ToList();
    SELECT *
    FROM Products
    INNER JOIN Studios ...
    LEFT JOIN Prices ON ...
    LEFT JOIN Sellers ON ...
    ...
    ORDER BY Products.Id, Prices.Id, Sellers.Id
    Cartesian
    explosion

    View full-size slide

  21. Before
    Entity Framework Core
    Performance Optimization
    Using collections in projections
    Loading all data at once
    Statistics:
    • CPU: 31
    • Duration: 75
    • Reads: 5299
    • Row Count: 12342
    Reducing query complexity
    LINQ
    var studios = ctx.Studios
    .Select(s => new MyStudio
    {
    Id = s.Id,
    Name = s.Name,
    Infinity = s.Products.Where(p => p.Name.StartsWith("Infinity")),
    Endgame = s.Products.Where(p => p.Name.StartsWith("Endgame"))
    })
    .ToList();

    View full-size slide

  22. Before
    Usually, loading collection separately requires more code but less resources on database
    Statistics:
    • CPU: 31
    • Duration: 75
    • Reads: 5299
    • Row Count: 12342
    After
    LINQ
    Entity Framework Core
    Performance Optimization
    Query splitting
    Reducing query complexity
    var studios = ctx.Studios.Select(p => new Studio() { Id = p.Id, Name = p.Name }).ToList();
    var infinity = ctx.Products.Where(p => p.Name.StartsWith("Infinity")).ToList();
    var endgame = ctx.Products.Where(p => p.Name.StartsWith("Endgame")).ToList();
    var infinityLookup = infinity.ToLookup(p => p.StudioId);
    var endgameLookup = endgame.ToLookup(p => p.StudioId);
    foreach (var studio in studios)
    {
    studio.Infinity = infinityLookup[studio.Id];
    sturio.Endgame = endgameLooup[studio.Id];
    }
    Necessary, if change
    tracking is disabled
    16
    3
    352
    2322

    View full-size slide

  23. Entity Framework Core
    Performance Optimization
    Avoid unnecessary fuzzy searches

    View full-size slide

  24. Entity Framework Core
    Performance Optimization
    Fuzzy searches may lead to
    • Bigger internal working sets
    • Sub-optimal JOIN order
    • Parallelization
    • Usage of temp tables
    • Higher query memory grant
    Possible solutions:
    • Good domain knowledge (code review)
    • Better code structure
    Avoid unnecessary fuzzy searches

    View full-size slide

  25. Task: export of products for billing for a studio
    Option 2 Option 1
    Entity Framework Core
    Performance Optimization
    Know the domain
    Avoid unnecessary fuzzy searches
    var studio = "Walt Disney";
    var products = ctx.Products
    .Where(p => p.Studio.Name == studio)
    .ToList();
    var studio = "Disney";
    var products = ctx.Products
    .Where(p => p.Studio.Name.Contains(studio))
    .ToList();
    Looks like for the product search on the website

    View full-size slide

  26. Product search, originally implemented for website became “shared”
    The method above can but should not be used for billing
    Sharing of code for different use cases may lead to bad performance (and bugs)
    Shared project
    Entity Framework Core
    Performance Optimization
    Better code structure
    Avoid unnecessary fuzzy searches
    public class ProductRepository
    {
    public List LoadProducts(string studio)
    {
    return ctx.Products
    .Where(p => p.Studio.Name.Contains(studio))
    .ToList();
    }
    }
    Often in shared/core project
    but rarely belongs there

    View full-size slide

  27. Entity Framework Core
    Performance Optimization
    Understanding queries

    View full-size slide

  28. LINQ 2 LINQ 1
    Entity Framework Core
    Performance Optimization
    Which one is better?
    Understanding Queries
    var groups = ctx.ProductGroups
    .Select(g => new
    {
    g.Products.FirstOrDefault().Id,
    g.Products.FirstOrDefault().Name
    })
    .ToList();
    var groups = ctx.ProductGroups
    .Select(g => g.Products
    .Select(p => new
    {
    p.Id,
    p.Name
    })
    .FirstOrDefault())
    .ToList();
    2x “FirstOrDefault()”
    before selecting properties
    Selection of properties
    before “FirstOrDefault()”

    View full-size slide

  29. SQL 2 SQL 1
    Entity Framework Core
    Performance Optimization
    Lets try with SQL …
    Understanding Queries
    SELECT
    ( SELECT TOP(1) p.Id FROM Products p WHERE g.Id = p.GroupId ) AS FirstProductId,
    ( SELECT TOP(1) p.Name FROM Products p WHERE g.Id = p.GroupId ) AS FirstProductName
    FROM
    ProductGroups g
    SELECT p.Id, p.Name
    FROM
    ProductGroups g
    LEFT JOIN
    (
    SELECT Id, Name, GroupId
    FROM
    (
    SELECT Id, Name, GroupId, ROW_NUMBER() OVER(PARTITION BY GroupId ORDER BY Id) AS row
    FROM Products
    ) p
    WHERE row <= 1
    ) p ON g.Id = p.GroupId
    2 sub-selects
    Window function
    “ROW_NUMBER”

    View full-size slide

  30. Entity Framework Core
    Performance Optimization
    Execution plans

    View full-size slide

  31. Entity Framework Core
    Performance Optimization
    Textual representation of all database operations
    • Type and the order of operations (JOIN, filter, projection, ...)
    • Indexes being used
    • Amount of data flowing between two operations
    • Costs of an operation and a subtree
    Some tools have built-in support for displaying execution plans as a graph
    Execution plans

    View full-size slide

  32. Entity Framework Core
    Performance Optimization
    Another try …
    Execution plans
    Execution plan 1 SQL 1
    SELECT
    ( SELECT TOP(1) p.Id FROM Products p WHERE g.Id = p.GroupId ) AS FirstProductId,
    ( SELECT TOP(1) p.Name FROM Products p WHERE g.Id = p.GroupId ) AS FirstProductName
    FROM
    ProductGroups g

    View full-size slide

  33. Entity Framework Core
    Performance Optimization
    Execution plans
    Execution plan 2 SQL 2
    SELECT p.Id, p.Name
    FROM
    ProductGroups g
    LEFT JOIN
    (
    SELECT Id, Name, GroupId
    FROM
    (
    SELECT Id, Name, GroupId, ROW_NUMBER() OVER(PARTITION BY GroupId ORDER BY Id) AS row
    FROM Products
    ) p
    WHERE row <= 1
    ) p ON g.Id = p.GroupId

    View full-size slide

  34. Entity Framework Core
    Performance Optimization
    Comparing
    • Compare estimated subtree costs
    • Deviance: estimated vs actual number of rows
    • Take statistics (like reads) into consideration
    Execution plans Query 1 wins!

    View full-size slide

  35. Entity Framework Core
    Performance Optimization
    SQL Server Operators
    General icon
    Cursors
    Parallelism
    T-SQL
    Execution plans

    View full-size slide

  36. Entity Framework Core
    Performance Optimization
    Execution plans
    Crash course
    Table
    scan
    Clustered
    index scan
    Non-clustered
    index scan
    Clustered
    index seek
    Non-clustered
    index seek
    Scan
    No clustered index
    Seek
    Missing index?
    Fuzzy search?
    Bad discriminator?
    Filter
    Filtering
    Filtering Filtering
    Key lookup
    Missing (include) columns?
    RID lookup If no clustered index
    Sort
    “Stop & Go” operator!
    Required?
    Missing index?
    Sorting in .NET cheaper?
    Parallelism
    Expensive
    High query complexity

    View full-size slide

  37. Entity Framework Core
    Performance Optimization
    JOINs
    Execution plans
    DB may perform
    sort on its own
    • Resource-saving
    • Data sets must be ordered
    • The “default”
    • Kind-of 2 nested loops
    • Speed depends on:
    • Scan of data set A
    • Seek of data set B
    • Large and unsorted sets
    • 2 phases:
    • Builds hash table for set A
    • Matches hash values from B
    Should be looked at
    Merge join Nested loop join Hash match join

    View full-size slide

  38. Entity Framework Core
    Performance Optimization
    Learnings
    • Look for database requests in “loops”
    • Split complex queries if necessary
    • Write conditions as precise as possible
    • Learn to read execution plans - optimize queries accordingly
    Resources
    • Entity Framework Core: https://docs.microsoft.com/en-us/ef/core/
    • Execution Plans: https://docs.microsoft.com/en-us/sql/relational-databases/performance/execution-plans
    • My EF playground: https://github.com/PawelGerr/EntityFrameworkCore-Demos
    Ask me
    @pawelgerr
    [email protected]

    View full-size slide