Scaling to 1M concurrent users on the JVM

Livefyre built a platform that powers real-time comments and curated social media for some of the largest websites, such as CNN, Fox, Sky, CBS, Coca-Cola, HBO, CNET, Universal Music Group, and Break. On average it deals with one million concurrent users on its systems. Java EE will get you a long way, but with these numbers, the company needed to resort to some often-overlooked computer science tricks and techniques to scale its microservices architecture to handle as many as 100,000 active concurrent connections per JVM. This session covers some of the data structures, patterns, best practices, and datastores that Livefyre uses to make this all happen and keep it running.

Jo Voordeckers

October 29, 2015

Transcript

  1. Version 1.1
    Your Audience. Your Story.
    Scaling to 1,000,000 concurrent users on the JVM
    JavaOne 2015 - CON7220
    Jo Voordeckers
    Sr. Software Engineer - Livefyre platform
    @jovoordeckers
    [email protected]

  2. © LIVEFYRE 2015
    Livefyre helps over 1,500 of the most influential brands & media companies build an engaged audience

  3. © LIVEFYRE 2015
    Collect: real-time streams of UGC to scale content creation
    Organize: to quickly find and organize the best social content
    Publish: to your website with no coding required
    Engage: audiences with best in class engagement tools to increase time on site and build community
    (Product examples from the slide graphic: comments, reviews, chat, live blog, Sidenotes, photo upload, fan photos, jersey and jump-shot photos from the 2015 All-Star Game, hashtag campaigns, #TopicHub)

  4. Privileged and Confidential
    © LIVEFYRE 2015
    Real-Time Social Applications: Comments, Sidenotes, Reviews, Chat, Media Wall, Live Blog, Polls, Storify, Social Maps, Feed, Trending, Gallery

  5. © LIVEFYRE 2015
    1/ CHALLENGE

  6. © LIVEFYRE 2015
    Real-time challenge
    • 1,000,000 concurrent users
    • 150,000 per JVM
    • 100,000 req/s
    • 6-8x c3.2xlarge
    • long-poll + WebSockets
    • 100s - 1,000s of listeners per stream
    • up to 250,000 listeners
    • read-heavy
    • updates < 2s

  7. © LIVEFYRE 2015
    Real-time challenge
    • Presidential Debate on Fox News
    • from 50,000 req/s
    • to 200,000 req/s
    • 150,000+ listeners to the stream

  8. © LIVEFYRE 2015
    2/BE{TTER,ST} PRACTICES

  9. © LIVEFYRE 2015
    Don’t use the “tech stack du jour”
    • use the right tools for your problem
    • embrace polyglot
    • Java, Scala, Jython
    • Python
    • NodeJS
    • KISS + YAGNI

  10. © LIVEFYRE 2015
    Microservices, not your typical SOA
    • well defined tasks
    • horizontal scalability
    • deploy often
    • upstart & supervisord
    • java main()
    • docker?
    • Kafka
    • REST

  11. © LIVEFYRE 2015

  12. © LIVEFYRE 2015
    Monitor all the things!
    Are we sad?
    • error vs success rates and timing
    • queue depth or lag
    • system resources
    • sample high velocity
    • /ping and /deep-ping
    Access patterns
    • optimize scaling strategy
    • anticipate events

  13. © LIVEFYRE 2015
    Mo services mo problems
    Dashboards
    • service vs system health
    • correlate “strange events”
    • capacity planning
    • app specific
    Tools
    • statsd + graphite + grafana / gdash
    • sentry log4j appender
    • nagios + pagerduty

  18. © LIVEFYRE 2015
    Request distribution or “data access pattern”
    Keep in memory (L1 cache)
    Get from S3 (L2 cache)
    Similar requests: partition users
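
    A minimal sketch of the L1/L2 idea on the JVM, assuming a Guava LoadingCache in front of S3; the class name, sizing, and TTL below are illustrative, not Livefyre's actual code:

    import java.util.concurrent.TimeUnit;
    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;

    // L1: bounded in-memory cache for the hottest streams; L2: S3 snapshot on a miss.
    public class StreamCache {
        private final LoadingCache<String, String> l1 = CacheBuilder.newBuilder()
                .maximumSize(100_000)                      // illustrative sizing
                .expireAfterWrite(2, TimeUnit.SECONDS)     // illustrative TTL
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String streamId) {
                        return fetchSnapshotFromS3(streamId);   // L2 on an L1 miss
                    }
                });

        public String get(String streamId) {
            return l1.getUnchecked(streamId);
        }

        private String fetchSnapshotFromS3(String streamId) {
            // Fetch the serialized stream snapshot from S3 (omitted here);
            // partitioning users over hosts keeps similar requests hitting the same L1.
            return "...";
        }
    }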

  19. © LIVEFYRE 2015
    Forcing square pegs in a round hole
    • choose the right data stores
    • Database
    • Queue
    • sweet spot
    • type of data
    • type of queries
    • some optimized for write
    • some optimized for indexing
    • trade off of speed and consistency

  20. © LIVEFYRE 2015
    https://aphyr.com/tags/Jepsen
    Call me maybe - a story of unreliable communication

  21. © LIVEFYRE 2015
    3/BUILDING BLOCKS

  22. © LIVEFYRE 2015
    Throttling - Leaky bucket algorithm
    • capped output flow regardless of input flow
    • accrue output allowance over time
    • drop requests if insufficient allowance
    • cost function

    from time import time

    # 1 item per interval
    allowance = rate = 1.0
    # 10-second interval
    throttle_interval = 10
    # 1 req / 10 sec = 0.1 qps
    qps = rate / throttle_interval
    last_check = time()

    def throttle(item):
        """Return True if the item should be dropped (insufficient allowance)."""
        global allowance, last_check
        current = time()    # or item.created_at
        size = cost(item)   # user-supplied cost function, in [0..1]
        time_passed = current - last_check
        last_check = current
        # Accrue allowance over elapsed time, capped at the configured rate
        allowance = min(rate, allowance + time_passed * qps)
        if allowance < size:
            return True
        allowance -= size
        return False

  23. © LIVEFYRE 2015
    Counting ‘Heavy Hitters’ - Space Saving Algorithm
    • unbounded stream
    • TOP-K in constant space
    • k * (item, count, error)
    • overestimates on replace
    • min(count)
    • MIN Heap + HashMap
    counts = {}  # item -> estimated count (at most k entries)
    errors = {}  # item -> overestimation error

    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < k:
            counts[item] = 1
        else:
            # Evict the current minimum and inherit its count (overestimate)
            prev_min = min(counts, key=counts.get)  # or pop from a min-heap
            counts[item] = counts[prev_min] + 1
            errors[item] = counts[prev_min]
            del counts[prev_min]

  24. © LIVEFYRE 2015
    Partitioning - Consistent Hashing
    • article_id % server_count
    • what if hosts added/removed ?
    • thundering herd!
    • Hashing.consistentHash(item, server_count)
    • minimizes shuffling
    • ConsistentHashRing with virtual nodes
    • TreeSet with 100 replicas per node
    - hash(“node1:1”) .. hash(“node1:100”)
    - hash(“node2:1”) .. hash(“node2:100”), …
    • SortedMap.get(hash(item)) or
    • SortedMap.tailMap(hash(item)).firstKey()
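
    Guava's Hashing.consistentHash covers the simple bucket case above; below is a minimal sketch of the virtual-node ring, assuming Guava's murmur3 hash (class and names are illustrative):

    import java.nio.charset.StandardCharsets;
    import java.util.SortedMap;
    import java.util.TreeMap;
    import com.google.common.hash.Hashing;

    // Consistent-hash ring with virtual nodes: adding or removing a node
    // only reshuffles the keys that mapped to that node.
    class ConsistentHashRing {
        private final SortedMap<Integer, String> ring = new TreeMap<>();
        private final int replicas;

        ConsistentHashRing(int replicas) { this.replicas = replicas; }

        private int hash(String key) {
            return Hashing.murmur3_32().hashString(key, StandardCharsets.UTF_8).asInt();
        }

        void addNode(String node) {
            for (int i = 1; i <= replicas; i++) {
                ring.put(hash(node + ":" + i), node);   // hash("node1:1") .. hash("node1:100")
            }
        }

        void removeNode(String node) {
            for (int i = 1; i <= replicas; i++) {
                ring.remove(hash(node + ":" + i));
            }
        }

        String nodeFor(String item) {
            // First virtual node clockwise from hash(item), wrapping around the ring.
            SortedMap<Integer, String> tail = ring.tailMap(hash(item));
            return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
        }
    }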

  29. © LIVEFYRE 2015
    Membership test - Bloom Filters
    • very memory efficient
    • almost as fast as CHM
    • small % false pos
    • ZERO false neg
    • append only
    • see Cuckoo Filter
    • BloomFilter.create()
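
    A quick usage sketch of Guava's BloomFilter.create(); the expected insertions and false-positive rate are illustrative:

    import java.nio.charset.StandardCharsets;
    import com.google.common.hash.BloomFilter;
    import com.google.common.hash.Funnels;

    public class BloomExample {
        public static void main(String[] args) {
            // Sized for ~1M entries at a 1% false-positive rate (illustrative numbers).
            BloomFilter<CharSequence> seen = BloomFilter.create(
                    Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

            seen.put("user:42");
            System.out.println(seen.mightContain("user:42")); // true (small chance of a false positive)
            System.out.println(seen.mightContain("user:99")); // false means definitely never added
        }
    }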

  33. © LIVEFYRE 2015
    Concurrency for shared resources - Striped Lock
    • ConcurrentHashMap’s secret
    • e.g. ConcurrentBloomFilter
    • up to n threads non-blocking
    • n shards with a ReadWriteLock and BloomFilter
    • ConsistentHash index into shards
    • Striped in Guava
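
    A rough sketch of the sharded idea with Guava's Striped and BloomFilter; the class, shard count, and sizing are illustrative, not Livefyre's implementation:

    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.locks.ReadWriteLock;
    import com.google.common.hash.BloomFilter;
    import com.google.common.hash.Funnels;
    import com.google.common.util.concurrent.Striped;

    // Sharded Bloom filter: up to `shards` threads proceed without contending on one lock.
    class ConcurrentBloomFilter {
        private final int shards;
        private final Striped<ReadWriteLock> locks;
        private final BloomFilter<CharSequence>[] filters;

        @SuppressWarnings("unchecked")
        ConcurrentBloomFilter(int shards, long expectedInsertionsPerShard) {
            this.shards = shards;
            this.locks = Striped.readWriteLock(shards);
            this.filters = new BloomFilter[shards];
            for (int i = 0; i < shards; i++) {
                filters[i] = BloomFilter.create(
                        Funnels.stringFunnel(StandardCharsets.UTF_8),
                        expectedInsertionsPerShard, 0.01);
            }
        }

        private int shardFor(String item) {
            // Consistent, non-negative index into the shards.
            return Math.floorMod(item.hashCode(), shards);
        }

        void put(String item) {
            int i = shardFor(item);
            locks.getAt(i).writeLock().lock();
            try {
                filters[i].put(item);
            } finally {
                locks.getAt(i).writeLock().unlock();
            }
        }

        boolean mightContain(String item) {
            int i = shardFor(item);
            locks.getAt(i).readLock().lock();
            try {
                return filters[i].mightContain(item);
            } finally {
                locks.getAt(i).readLock().unlock();
            }
        }
    }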

  34. © LIVEFYRE 2015
    Random Sampling
    float sampleRate = 0.10f; // 10%
    if (ThreadLocalRandom.current().nextFloat() < sampleRate) {
        statsd.increment("high.velocity.request.success");
    }
    • for high velocity events
    • NEVER for sparse events

  35. © LIVEFYRE 2015
    Distributed Consensus - Zookeeper
    • metadata store
    • set membership
    • distributed lock
    • leader election
    • Netflix Curator
    • DON’T TRY THIS AT HOME!
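
    For the lock and leader-election bullets, a minimal Curator sketch; the connection string and ZooKeeper paths are illustrative:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.leader.LeaderLatch;
    import org.apache.curator.framework.recipes.locks.InterProcessMutex;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class ZkExample {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181",          // illustrative connection string
                    new ExponentialBackoffRetry(1000, 3));
            client.start();

            // Distributed lock around a critical section
            InterProcessMutex lock = new InterProcessMutex(client, "/locks/stream-42");
            lock.acquire();
            try {
                // critical section
            } finally {
                lock.release();
            }

            // Leader election for a group of workers
            LeaderLatch latch = new LeaderLatch(client, "/leaders/stream-worker");
            latch.start();
            latch.await();   // blocks until this node becomes leader
            // ... do leader-only work; close the latch and client on shutdown
        }
    }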

  36. © LIVEFYRE 2015
    Async IO
    • Get up to 1M connections, capped by bandwidth
    • Netty
    • EPOLL on Linux
    • (Composite)ByteBuf
    • ChannelGroup
    • HashedWheelTimer
    • READ THE SOURCE!
    • Others work as well:
    • Vert.x, NodeJS, Python Gevent
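
    A bare-bones Netty 4 server bootstrap along these lines, assuming the native epoll transport on Linux; the port and pipeline handlers are illustrative:

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.ChannelOption;
    import io.netty.channel.EventLoopGroup;
    import io.netty.channel.epoll.EpollEventLoopGroup;
    import io.netty.channel.epoll.EpollServerSocketChannel;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.handler.codec.http.HttpServerCodec;

    public class StreamServer {
        public static void main(String[] args) throws Exception {
            EventLoopGroup boss = new EpollEventLoopGroup(1);     // accepts connections
            EventLoopGroup workers = new EpollEventLoopGroup();   // handles IO for all channels
            try {
                ServerBootstrap b = new ServerBootstrap()
                        .group(boss, workers)
                        .channel(EpollServerSocketChannel.class)  // EPOLL transport on Linux
                        .childOption(ChannelOption.SO_KEEPALIVE, true)
                        .childHandler(new ChannelInitializer<SocketChannel>() {
                            @Override
                            protected void initChannel(SocketChannel ch) {
                                ch.pipeline().addLast(new HttpServerCodec());
                                // add long-poll / WebSocket handlers here
                            }
                        });
                b.bind(8080).sync().channel().closeFuture().sync();
            } finally {
                boss.shutdownGracefully();
                workers.shutdownGracefully();
            }
        }
    }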

  37. © LIVEFYRE 2015
    Data processing pipelines
    • Kafka Queues with many partitions
    • Auto-scale group of workers
    • commit batches of work to ZK (restart, lag)
    • Emit stats (success, error, timing)
    • Custom dashboard
    • sampled data from the stream
    • inject data in the stream (debug)
    • Future:
    • Spark Streaming
    • Mesos + Marathon + Chronos
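
    As a sketch of the partitioned-topic idea, a minimal keyed Kafka producer; brokers, topic, and key scheme are illustrative. Keying by article id keeps one stream's events on a single partition, so one worker in the auto-scaled group sees them in order:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PipelineProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka1:9092,kafka2:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Same key -> same partition -> in-order processing by one consumer.
                producer.send(new ProducerRecord<>("stream-events", "article-123", "{\"type\":\"comment\"}"));
            }
        }
    }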

  38. © LIVEFYRE 2015
    Mechanical Sympathy
    • Disruptor, lock-free Queue
    • BlockingQueue - backpressure!
    • JCTools - Multi Prod Single Cons Queue
    • CAS - Atomic* & Unsafe
    • OpenHFT
    • off-heap storage
    • cpu affinity for JVM threads
    • zero allocation hashing
    • mechanical-sympathy.blogspot.com
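
    A small sketch of the multi-producer/single-consumer queue with JCTools; capacity and the handling of a full queue are illustrative. A failed offer() is the natural backpressure signal:

    import org.jctools.queues.MpscArrayQueue;

    public class MpscExample {
        // Bounded, lock-free multi-producer / single-consumer queue.
        static final MpscArrayQueue<String> events = new MpscArrayQueue<>(64 * 1024);

        // Called from many producer threads.
        static void publish(String event) {
            if (!events.offer(event)) {
                // queue full: apply backpressure (drop, block, or throttle upstream)
            }
        }

        // A single consumer thread drains the queue.
        static void drain() {
            String event;
            while ((event = events.poll()) != null) {
                // process event
            }
        }
    }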

  39. THANK YOU
    San Francisco, CA · New York, NY · London, UK
    @livefyre.com
    press.livefyre.com
    blog.livefyre.com
    Jo Voordeckers
    SR. SOFTWARE ENGINEER - LF PLATFORM
    Email: [email protected]
    @jovoordeckers
