
GeeCon 2017: Caching for Business Applications: Best Practices and Gotchas


Caching is relevant for a wide range of business applications, and there is a huge variety of products on the market, ranging from easy-to-adopt local heap-based caches to powerful distributed data grids. Most of these caches are promoted with examples from applications that have the luxury of 'eventual consistency' as a non-functional requirement. Most business / enterprise applications don't have that luxury. This talk is aimed at developers and architects who want to adopt a caching solution for their business application. I will present 15 caching patterns and best practices for these kinds of applications that address the typical questions asked in that context. These questions might be: 'what data can I cache?', 'how do I handle consistency in a distributed environment?', 'which cache provider should I choose?' or 'how do I integrate a cache provider into my application?'. This talk comes with many live demos, some of which run on a distributed cache cluster on Raspberry Pis.

Michael Plöd

May 17, 2017


Transcript

  1. Caching

    for business applications
    Michael Plöd - innoQ
    Twitter: @bitboss
    Kraków, 17-19 May 2017


  2. I will talk about
    Caching Types / Topologies
    Best Practices for Caching in Enterprise Applications
    I will NOT talk about
    Latency / Synchronization discussion
    What is the best caching product on the market
    HTTP / Database Caching
    Caching in JPA, Hibernate or other ORMs


  3. Cache
    / kæʃ /


    In computing, a cache is a component that transparently stores data so that future requests
    for that data can be served faster. The data that is stored within a cache might be values that
    have been computed earlier or duplicates of original values that are stored elsewhere. If
    requested data is contained in the cache (cache hit), this request can be served by simply
    reading the cache, which is comparatively faster. Otherwise (cache miss), the data has to be
    recomputed or fetched from its original storage location, which is comparatively slower. Hence,
    the greater the number of requests that can be served from the cache, the faster the overall
    system performance becomes.
    Source: http://en.wikipedia.org/wiki/Cache_(computing)
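
    The hit / miss distinction in the definition above can be sketched in a few lines. This is an illustration only, assuming a plain in-memory map; CountingCache and its loader function are hypothetical names, not part of the talk or of any cache product.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper for illustration: a lookup that counts hits and misses.
class CountingCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the slow original storage
    private int hits;
    private int misses;

    CountingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    synchronized V get(K key) {
        V value = store.get(key);
        if (value != null) {
            hits++;                 // cache hit: served from memory
            return value;
        }
        misses++;                   // cache miss: fetch from the original source
        value = loader.apply(key);
        store.put(key, value);
        return value;
    }

    synchronized int hits() { return hits; }
    synchronized int misses() { return misses; }
}

class CacheHitDemo {
    public static void main(String[] args) {
        CountingCache<String, String> cache =
                new CountingCache<>(key -> "value-for-" + key);
        cache.get("A"); // miss
        cache.get("A"); // hit
        cache.get("B"); // miss
        System.out.println("hits=" + cache.hits() + ", misses=" + cache.misses());
        // prints "hits=1, misses=2"
    }
}
```

    The higher the share of hits, the less often the slow loader runs, which is the effect the definition describes.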


  4. That’s awesome. Let’s cache everything
    and everywhere and distribute it all in
    a Cluster in a transactional manner
    ohhh by the way: Twitter has been
    doing that for ages
    Are you
    crazy?


  5. Business-Applications
    !=

    Twitter / Facebook & co.


  6. Many enterprise-grade projects
    adopt caching either too
    defensively or too aggressively
    and run into consistency or
    performance issues as a result


  7. But with a well-adjusted caching
    strategy you will make your
    application more scalable, faster
    and cheaper to operate.


  8. Types of CACHES
    Local Cache, Data Grid, Document Store, JPA
    First Level Cache, JPA Second Level Cache,
    Hybrid Cache

    Places for CACHES
    Database, Heap, HTTP Proxy, Browser,
    Processor, Disk, Off Heap, Persistence
    Framework, Application


  9. We will focus on local and
    distributed caching at the
    application level


  10. Which data shall I
    cache?
    Where shall I cache?
    Which cache shall I use?
    Which impact does it have on my
    infrastructure
    How about data-consistency
    How do I introduce
    caching?
    How do I abstract my
    cache implementation?


  11. 1 Identify suitable layers for
    caching


  12. Suitable Layers for Caching
    [Diagram: layer stack of ComplaintManagementRestController,
    ComplaintManagementBusinessService, DataAggregationManager,
    Host Commands, SAP Commands and Spring Data Repository;
    annotated with "HTTP Caching", four "Read Operations" labels
    and one "Read and Write Operations" label]


  13. 2 Stay local as long as possible


  14. Local In-Memory
    JVM
    Cache


  15. Clustered
    [Diagram: four JVMs, each with its own cache]


  16. Which data shall I
    cache?
    Where shall I cache?
    Which cache shall I use?
    Which impact does it have on my
    infrastructure
    How about data-consistency
    How do I introduce
    caching?
    How about caching in
    Spring?


  17. Clustered - with sync
    [Diagram: four JVMs, each with its own cache; sync labels:
    Invalidation, Replication]


  18. 3 Avoid real replication where
    possible


  19. Invalidation - Option 1
    [Diagram: four caches; four PUT (Insert) operations;
    entry #1 shown on each cache]


  20. Invalidation - Option 1
    [Diagram: four caches holding entry #1; one PUT (Update)
    of #1; invalidation message "inv #1"]


  21. Invalidation - Option 2
    [Diagram: four caches; four PUT (Insert) operations;
    entry #1 shown on each cache]


  22. Replication
    [Diagram: four caches; one PUT (Insert) and one PUT (Update);
    entry #1 shown on every cache]
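
    The difference between the invalidation and replication slides above can be modelled in a few lines. This is a deliberately simplified in-process sketch, assuming that an invalidating update keeps the new value only on the writing node and that a replicating update copies it to every node; Node, Cluster and the method names are hypothetical, and real products do this over the network with their own protocols.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one JVM-local cache.
class Node {
    final Map<String, String> local = new HashMap<>();
}

// Hypothetical model of a small cache cluster living in one process.
class Cluster {
    final List<Node> nodes = new ArrayList<>();

    Cluster(int size) {
        for (int i = 0; i < size; i++) {
            nodes.add(new Node());
        }
    }

    // Starting point for the demo: every node holds a copy of the entry.
    void insertEverywhere(String key, String value) {
        for (Node node : nodes) {
            node.local.put(key, value);
        }
    }

    // Invalidation: the writer stores the new value, peers only drop their copy.
    void updateWithInvalidation(int writer, String key, String value) {
        for (int i = 0; i < nodes.size(); i++) {
            if (i == writer) {
                nodes.get(i).local.put(key, value);
            } else {
                nodes.get(i).local.remove(key);
            }
        }
    }

    // Replication: the new value itself is shipped to every node.
    void updateWithReplication(String key, String value) {
        for (Node node : nodes) {
            node.local.put(key, value);
        }
    }

    long copiesOf(String key) {
        return nodes.stream().filter(n -> n.local.containsKey(key)).count();
    }
}

class SyncModesDemo {
    public static void main(String[] args) {
        Cluster invalidating = new Cluster(4);
        invalidating.insertEverywhere("#1", "v1");
        invalidating.updateWithInvalidation(0, "#1", "v2");
        System.out.println("copies after invalidation: " + invalidating.copiesOf("#1"));

        Cluster replicating = new Cluster(4);
        replicating.insertEverywhere("#1", "v1");
        replicating.updateWithReplication("#1", "v2");
        System.out.println("copies after replication: " + replicating.copiesOf("#1"));
    }
}
```

    In this model an invalidating update sends only a key to the peers, while a replicating update sends the full value to all of them, so every node ends up holding every entry.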


  23. So far, every cache could
    potentially hold all of the data,
    which consumes heap memory


  24. Big Heap
    ?


  25. Which data shall I
    cache?
    Where shall I cache?
    Which cache shall I use?
    Which impact does it have on my
    infrastructure
    How about data-consistency
    How do I introduce
    caching?
    How about caching in
    Spring?


  26. 4 Avoid big heaps just for caching


  27. Big heap
    leads to long
    major GCs
    Application
    Data
    Cache
    32 GB


  28. Long GCs can destabilize your
    cluster
    [Diagram: four JVMs, each with its own cache; two "GC" markers]


  29. Small caches
    are a bad idea!
    Many evictions, fewer hits,
    no "hot data".


    This is especially critical for
    replicating caches.
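
    The eviction effect named on this slide can be reproduced with a toy LRU cache. The sketch below is for illustration only (the talk itself advises against writing your own cache): it uses the JDK's LinkedHashMap in access order, and the cyclic access pattern is an assumed worst case, not a measurement from the talk. LruCache and EvictionDemo are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy LRU cache built on LinkedHashMap's access-order mode.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = order entries by last access
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }
}

class EvictionDemo {
    // Replays a cyclic access pattern over distinctKeys keys and counts the hits.
    static int countHits(int capacity, int distinctKeys, int requests) {
        LruCache<Integer, String> cache = new LruCache<>(capacity);
        int hits = 0;
        for (int i = 0; i < requests; i++) {
            int key = i % distinctKeys;
            if (cache.get(key) != null) {
                hits++;
            } else {
                cache.put(key, "v" + key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Cache too small for the working set: every entry is evicted before reuse.
        System.out.println(countHits(5, 10, 100));  // prints 0
        // Cache fits the working set: only the first pass misses.
        System.out.println(countHits(10, 10, 100)); // prints 90
    }
}
```

    Real access patterns are rarely this extreme, but the direction is the same: a cache smaller than the frequently used data spends its time evicting.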


  30. 5 Use a distributed cache for
    big amounts of data


  31. Distributed Caches
    [Diagram: four JVMs and Cache Nodes 1, 2 and 3]


  32. [Diagram: cache nodes 1 and 2 holding Customer #23,
    Customer #30, Customer #27 and Customer #32]


  33. Data is being
    distributed and
    backed up
    [Diagram: cache nodes 1 and 2 holding Customer #23,
    Customer #30, Customer #27 and Customer #32, plus BACKUP
    copies of #27, #32, #23 and #30]
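
    One way to picture "distributed and backed up" is a function that maps each key to an owning node and to a different backup node. The sketch below is an assumption for illustration, using a plain hash-modulo rule; it is not how any particular data grid assigns partitions, and Partitioner is a hypothetical name.

```java
// Hypothetical key-to-node mapping with one backup copy per entry.
class Partitioner {
    private final int nodeCount;

    Partitioner(int nodeCount) {
        if (nodeCount < 2) {
            throw new IllegalArgumentException("a backup needs at least two nodes");
        }
        this.nodeCount = nodeCount;
    }

    // Node that owns the primary copy of the entry.
    int ownerOf(Object key) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Node that holds the backup copy; always different from the owner.
    int backupOf(Object key) {
        return (ownerOf(key) + 1) % nodeCount;
    }
}

class PartitionDemo {
    public static void main(String[] args) {
        Partitioner twoNodes = new Partitioner(2);
        for (int customerId : new int[] {23, 30, 27, 32}) {
            System.out.println("Customer #" + customerId
                    + " owner=" + twoNodes.ownerOf(customerId)
                    + " backup=" + twoNodes.backupOf(customerId));
        }
    }
}
```

    With a naive modulo rule most keys change owner when a node joins, so products typically use a fixed number of partitions or consistent hashing to limit how much data moves during rebalancing.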


  34. [Diagram: cache nodes 1 and 2 holding Customer #23,
    Customer #30, Customer #27 and Customer #32, plus BACKUP
    copies of #27, #32, #23 and #30; cache node 3 appears]


  35. [Diagram: cache nodes 1, 2 and 3 holding Customer #23,
    Customer #30, Customer #27 and Customer #32, plus BACKUP
    copies of #27, #32, #23 and #30; cache node 4 appears]


  36. [Diagram: cache nodes 1, 2, 3 and 4 holding Customer #23,
    Customer #30, Customer #27 and Customer #32, plus BACKUP
    copies of #27, #32, #23 and #30]


  37. A distributed cache leads to
    smaller heaps, more capacity and
    is easy to scale
    Application
    Data
    Cache
    2 - 4 GB
    … Cache


  38. 6 The operations specialist is
    your new best friend


  39. Clustered caches are
    complex. Please make
    sure that operations
    and networking are
    involved as early as
    possible.


  40. Which data shall I
    cache?
    Where shall I cache?
    Which cache shall I use?
    Which impact does it have on my
    infrastructure
    How about data-consistency
    How do I introduce
    caching?
    How about caching in
    Spring?


  41. 7 Make sure that only suitable
    data gets cached


  42. The best cache candidates are
    read-mostly data, which are
    expensive to obtain


  43. If you absolutely must cache
    write-intensive data, make sure to
    use a distributed cache and not a
    replicated or invalidating one


  44. Which data shall I
    cache?
    Where shall I cache?
    Which cache shall I use?
    Which impact does it have on my
    infrastructure
    How about data-consistency
    How do I introduce
    caching?
    How about caching in
    Spring?


  45. 8 Only use existing cache
    implementations


  46. NEVER
    write your own cache
    implementation
    EVER


  47. CACHE Implementations
    Infinispan, EHCache, Hazelcast, Couchbase,
    Memcache, OSCache, SwarmCache, Xtreme
    Cache, Apache DirectMemory
    Terracotta, Coherence, Gemfire, Cacheonix,
    WebSphere eXtreme Scale, Oracle 12c In
    Memory Database


  48. Which data shall I
    cache?
    Where shall I cache?
    Which cache shall I use?
    Which impact does it have on my
    infrastructure
    How about data-consistency
    How do I introduce
    caching?
    How about caching in
    Spring?


  49. 9 Introduce Caching in three
    steps


  50. Optimize your
    application
    Local Cache Distributed Cache
    Performance
    Boost
    Performance
    Loss


  51. 10 Optimize Serialization


  52. Example: Hazelcast

    putting and getting 10,000 objects locally

    Strategy                      GET Time   PUT Time   Payload Size
    Serializable                  ?          ?          ?
    Data Serializable             ?          ?          ?
    Identifier Data Serializable  ?          ?          ?


  53. Example: Hazelcast

    putting and getting 10,000 objects locally

    Strategy                      GET Time   PUT Time   Payload Size
    Serializable                  1287 ms    1220 ms    1164 bytes
    Data Serializable             443 ms     408 ms     916 bytes
    Identifier Data Serializable  264 ms     207 ms     882 bytes
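
    Why the serialization strategy changes the payload size can be shown with the JDK alone. The sketch below does not use Hazelcast and does not reproduce the measurements above; it compares default Java serialization with a hand-written field-by-field encoding, which is an assumed stand-in for the idea behind custom serialization interfaces. Customer and SerializationSizes are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical cached value object.
class Customer implements Serializable {
    private static final long serialVersionUID = 1L;
    final long id;
    final String name;

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class SerializationSizes {
    // Default Java serialization: writes stream header and class metadata too.
    static int javaSerializedSize(Customer customer) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(customer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Hand-written encoding: only the field values are written.
    static int handWrittenSize(Customer customer) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(customer.id);   // 8 bytes
            out.writeUTF(customer.name);  // 2-byte length prefix + encoded characters
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Customer customer = new Customer(23L, "Ada");
        System.out.println("java serialization: " + javaSerializedSize(customer) + " bytes");
        System.out.println("hand-written:       " + handWrittenSize(customer) + " bytes");
    }
}
```

    The hand-written form is smaller because it skips the class description that default serialization adds to the stream; the price is that reader and writer must agree on the field order.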


  54. JAVA
    SERIALIZATION

    SUCKS
    for Caching if alternatives are present


  55. 11 Use Off-Heap Storage for
    Cache instances with more
    than 4 GB Heap Size


  56. JVM
    Cache Runtime
    Cache
    Data
    32 GB HEAP


  57. Off Heap
    30 GB RAM
    JVM
    Cache Runtime
    Cache
    Data
    2 GB HEAP
    No Garbage Collection
    Very short Garbage
    Collections
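
    What "off heap" means at the JVM level can be sketched with a direct ByteBuffer: the bytes live in native memory, and only a small wrapper object sits on the heap. This is a minimal illustration under that assumption; cache products ship their own off-heap stores, and OffHeapSlot is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical single-value store backed by native (off-heap) memory.
class OffHeapSlot {
    private final ByteBuffer buffer;

    OffHeapSlot(int capacityBytes) {
        // allocateDirect reserves memory outside the garbage-collected heap
        this.buffer = ByteBuffer.allocateDirect(capacityBytes);
    }

    void write(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        buffer.clear();
        buffer.putInt(bytes.length); // length prefix
        buffer.put(bytes);           // payload
    }

    String read() {
        int length = buffer.getInt(0);          // absolute read of the prefix
        byte[] bytes = new byte[length];
        ByteBuffer view = buffer.duplicate();   // independent position, shared content
        view.position(4);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }
}

class OffHeapDemo {
    public static void main(String[] args) {
        OffHeapSlot slot = new OffHeapSlot(1024);
        slot.write("Customer #23");
        System.out.println(slot.read() + " (off heap: " + slot.isOffHeap() + ")");
        // prints "Customer #23 (off heap: true)"
    }
}
```

    Values have to be turned into bytes on the way in and out, which is one reason serialization cost matters for off-heap caches.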


  58. 12 Mind the security gap


  59. Application
    "CRM" "Host" DB
    Security
    Security
    Security
    Cache
    CRM Data
    SAP Data
    DB Data
    ?
    Mind security when reading
    data from the cache
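
    One possible way to close this gap is to store the required permission next to each cached entry and check it on every read. The sketch below is an assumption about how that could look, not the approach shown in the talk; SecuredCache and the role names are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache wrapper that repeats the access check on cache reads.
class SecuredCache {
    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final Map<String, String> requiredRoles = new ConcurrentHashMap<>();

    // Remember which role the source system demanded for this entry.
    void put(String key, String value, String requiredRole) {
        values.put(key, value);
        requiredRoles.put(key, requiredRole);
    }

    // Returns null on a cache miss and rejects callers without the required role.
    String get(String key, Set<String> callerRoles) {
        String requiredRole = requiredRoles.get(key);
        if (requiredRole == null) {
            return null;
        }
        if (!callerRoles.contains(requiredRole)) {
            throw new SecurityException("caller may not read " + key);
        }
        return values.get(key);
    }
}

class SecuredCacheDemo {
    public static void main(String[] args) {
        SecuredCache cache = new SecuredCache();
        cache.put("crm:42", "CRM data", "CRM_READER");
        System.out.println(cache.get("crm:42", Collections.singleton("CRM_READER")));
        try {
            cache.get("crm:42", Collections.singleton("GUEST"));
        } catch (SecurityException e) {
            System.out.println("denied: " + e.getMessage());
        }
    }
}
```

    Without a check like this, data that was protected in the source system becomes readable by anyone who can reach the cache.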


  60. 13 Abstract your cache
    provider


  61. public Account retrieveAccount(String accountNumber) {
        // Ehcache 2.x API: Cache and Element come from net.sf.ehcache
        Cache cache = ehCacheMgr.getCache("accounts");
        Account account = null;
        Element element = cache.get(accountNumber);
        if (element == null) {
            // execute some business logic for retrieval
            // account = result of logic above
            cache.put(new Element(accountNumber, account));
        } else {
            account = (Account) element.getObjectValue();
        }
        return account;
    }
    Tying your code to a cache provider is bad practice


  62. public Account retrieveAccount(String accountNumber) {
        Cache cache = ehCacheMgr.getCache("accounts");
        Account account = null;
        Element element = cache.get(accountNumber);
        if (element == null) {
            // execute some business logic for retrieval
            // account = result of logic above
            cache.put(new Element(accountNumber, account));
        } else {
            account = (Account) element.getObjectValue();
        }
        return account;
    }
    Try switching from EHCache to Hazelcast
    You will have to adjust these lines of code to the Hazelcast API


  63. public Account retrieveAccount(String accountNumber) {
        Cache cache = ehCacheMgr.getCache("accounts");
        Account account = null;
        Element element = cache.get(accountNumber);
        if (element == null) {
            // execute some business logic for retrieval
            // account = result of logic above
            cache.put(new Element(accountNumber, account));
        } else {
            account = (Account) element.getObjectValue();
        }
        return account;
    }
    You can’t switch cache providers between environments
    EHCache is tightly coupled to your code


  64. public Account retrieveAccount(String accountNumber) {
        Cache cache = ehCacheMgr.getCache("accounts");
        Account account = null;
        Element element = cache.get(accountNumber);
        if (element == null) {
            // execute some business logic for retrieval
            // account = result of logic above
            cache.put(new Element(accountNumber, account));
        } else {
            account = (Account) element.getObjectValue();
        }
        return account;
    }
    You mess up your business logic with infrastructure
    This is all caching related code without any business relevance




  65. <bean id="cacheManager"
          class="org.springframework.cache.ehcache.EhCacheCacheManager"
          p:cacheManager-ref="ehcache"/>
    <bean id="ehcache"
          class="org.springframework.cache.ehcache.EhCacheManagerFactoryBean"
          p:configLocation="/ehcache.xml"/>
    @Cacheable("Customers")
    public Customer getCustomer(String customerNumber) {

    }
    Introducing Spring’s cache abstraction
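
    The idea behind such an abstraction can be sketched without any framework: the business code depends on a small interface, and the provider-specific part lives in one adapter class. The sketch below is a plain-Java illustration, not Spring's API; SimpleCache, MapBackedCache and AccountService are hypothetical names, and the account is reduced to a String to keep the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical provider-neutral contract the business code depends on.
interface SimpleCache<K, V> {
    V get(K key, Function<K, V> loader);
}

// One adapter per provider; this one is backed by a plain in-memory map.
class MapBackedCache<K, V> implements SimpleCache<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<>();

    @Override
    public V get(K key, Function<K, V> loader) {
        return map.computeIfAbsent(key, loader); // load once, then serve from the map
    }
}

// Business code: no provider types, no put / get bookkeeping.
class AccountService {
    private final SimpleCache<String, String> cache;
    int backendCalls; // exposed only to make the effect visible in the demo

    AccountService(SimpleCache<String, String> cache) {
        this.cache = cache;
    }

    String retrieveAccount(String accountNumber) {
        return cache.get(accountNumber, this::loadFromBackend);
    }

    private String loadFromBackend(String accountNumber) {
        backendCalls++;
        return "Account " + accountNumber;
    }
}

class AbstractionDemo {
    public static void main(String[] args) {
        AccountService service = new AccountService(new MapBackedCache<>());
        service.retrieveAccount("4711");
        service.retrieveAccount("4711");
        System.out.println("backend calls: " + service.backendCalls);
        // prints "backend calls: 1"
    }
}
```

    Switching providers then means writing another implementation of the interface and changing the wiring, while retrieveAccount stays untouched.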


  66. Michael Plöd - @bitboss Kraków, 17-19 May 2017
    THANK YOU!


    Follow me on Twitter for slides

    @bitboss
