Shipping a Replacement Architecture in Elixir

Chris Bell
February 11, 2018

Talk given at EMPEX LA, 2018.

Describes Frame.io's journey of shipping a new set of Elixir-powered services to replace our aging Ruby on Rails and Node.js stack.

#elixir #phoenix

Transcript

  1. Shipping a Replacement
    Architecture in Elixir
    Chris Bell • @cjbell_
    EMPEX LA 2018

  2. I’m Chris and I <3 Elixir
    • 3 years of writing production Elixir apps
    • EMPEX NYC Organizer
    • ElixirTalk Co-host (with @desmondmonster)

  3. • Powers Review & Collaboration for Video teams
    • Used by Vice, Turner, BuzzFeed, NYTimes, NASA
    • 450,000+ customers
    • Founded in 2014, based in NYC

  4. OUR JOURNEY FROM:
    Ruby on Rails to Elixir & Phoenix

  5. PART I
    Why Rewrite & Why Elixir?

  6. We did that thing you should
    never do: a rewrite
    OOPS, SORRY JOEL

  7. Rapid growth and lots of shifting
    requirements left the codebase
    not in the best state

  8. API ISSUES PRE-MIGRATION
    • Ruby 1.9.3, Rails 3.2
    • No tests… at all… for anything
    • Custom ORM into DynamoDB
    • Metaprogramming everywhere
    • No logical separation of concerns (authorization,
    persistence, domain logic)
    • No metrics or visibility into performance

  9. DATABASE ISSUES PRE-MIGRATION
    • Everything is a string… even nulls.
    • Foreign keys stored on the model as a JSON-encoded list
    of strings. No atomic updates = lost data.
    • DynamoDB cost $$$ to support our workload
    • No ability to paginate anything = response times of up
    to 45s and large payloads (> 1 MB of JSON)
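
To see why "no atomic updates = lost data", picture two requests that each add a foreign key to the same record by reading the JSON-encoded list, appending to it, and writing the whole item back. This Python sketch is not the legacy ORM's code: a plain dict is a hypothetical stand-in for the DynamoDB table, and the `asset_ids` field name is invented for illustration.

```python
import json

# Hypothetical stand-in for a DynamoDB item whose foreign keys live on the
# model as a JSON-encoded list of string IDs ("asset_ids" is an assumed name).
store = {"project:1": {"asset_ids": json.dumps(["a1"])}}


def read(key):
    # Return a copy, as a network read from the datastore would.
    return dict(store[key])


def write(key, record):
    # Whole-item overwrite with no conditional check: last writer wins.
    store[key] = record


def add_asset(record, asset_id):
    # Client-side read-modify-write: decode, append, re-encode.
    ids = json.loads(record["asset_ids"])
    ids.append(asset_id)
    return {**record, "asset_ids": json.dumps(ids)}


# Two requests interleave: both read before either one writes.
snapshot_a = read("project:1")
snapshot_b = read("project:1")
write("project:1", add_asset(snapshot_a, "a2"))
write("project:1", add_asset(snapshot_b, "a3"))

final_ids = json.loads(store["project:1"]["asset_ids"])
print(final_ids)  # ['a1', 'a3']: the "a2" link was silently lost
```

Storing each relationship as its own row (for example, a join table in Postgres) turns the append into an independent insert, so concurrent writers no longer overwrite each other's links.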

  10. We felt we were technically
    bankrupt with our existing API
    and services

  11. So why Elixir for us?

  12. WHY ELIXIR?
    • Easy-ish ramp-up for our existing Ruby / Python
    developers
    • Highly concurrent: uses resources much more efficiently
    • Builds on a mature VM (BEAM) and an established
    language and framework (Erlang & OTP)
    • Language attributes promote explicitness:
    immutability, pattern matching, multiple function heads.

  13. PART II
    What We Shipped & A Peek Into The System

  14. LEGACY ARCHITECTURE
    Diagram components: Client (iOS / Web / Adobe), API (Ruby
    on Rails), websocket connection, Real-time Service (Node.js),
    Support Tool (Node.js), Push Notifications, SES, Email Digest
    Service (Node.js), DynamoDB

  15. LEGACY ARCHITECTURE: WHAT WE REPLACED
    Diagram components: Client (iOS / Web / Adobe), API (Ruby
    on Rails), websocket connection, Real-time Service (Node.js),
    Support Tool (Node.js), Push Notifications, SES, Email Digest
    Service (Node.js), DynamoDB

  16. WHAT WE SHIPPED
    • Elixir-powered API, notifications system, real-time
    service, and support tool
    • Migrated all of our data from DynamoDB to Postgres
    • Dockerized all of the above and rebuilt our tooling
    and deploy process from the ground up

  17. UPDATED ARCHITECTURE
    Diagram components: Client (iOS / Web / Adobe), websocket
    connection, Umbrella App, V2 API (Phoenix), Munger API
    (Phoenix), Real-time Service (Elixir / Phoenix), Support Tool
    (Phoenix), Core Business Logic, Email Service, Push
    Notifications, SES, Postgres, Memcached

  18. WHAT WE SHIPPED: RESULTS
    • ~40 EC2 instances down to ~5 (running on ECS)
    • API 95th percentile: ~30ms @ ~120rps
    • Database costs reduced by 91%
    • Full visibility into all parts of the system (via StatsD
    & Datadog)
    • Modular, documented, maintainable codebase

  19. AND BEST OF ALL:
    Predictable, stable performance
    with no out-of-hours incidents yet
    (and very few in general)

  20. A WHIRLWIND TOUR THROUGH THE SYSTEM
    1. The Intermediary API
    2. Umbrella App Structure
    3. Event System
    4. Moving millions of records

  21. THE INTERMEDIARY API
    • Consumes the new API, emits the old schemas, and
    maintains the legacy contract (by stringifying everything)
    • Allowed us to ship our new stack sooner with fewer
    implications for our different clients
    • Complexity is high, but it is designed to be thrown away
    (in ~6 months' time)

  22. THE INTERMEDIARY API: HOW IT WORKS
    1. HTTP Request
    2. Fetch Resources
    3. Translate New to Old
    4. Serialize
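
The four steps of the intermediary API can be sketched in Python as below. Everything here is hypothetical: the in-memory `NEW_API` dict stands in for the new Phoenix API, `FIELD_MAP` is an invented rename table, and spelling `None` as `"null"` and booleans as `"true"`/`"false"` are assumptions, since the slides only say that everything was stringified, even nulls.

```python
import json
from typing import Any, Dict

# Assumption: how the legacy contract spelled nulls.
LEGACY_NULL = "null"

# Hypothetical stand-in for the new API; a real service would fetch over HTTP.
NEW_API: Dict[str, Dict[str, Any]] = {
    "assets/1": {"id": 1, "name": "cut_v2.mov", "duration": 12.5,
                 "archived": False, "parent_id": None},
}

# Hypothetical renames from new field names to legacy field names.
FIELD_MAP = {"id": "asset_id", "name": "name", "duration": "duration",
             "archived": "is_archived", "parent_id": "parent"}


def stringify(value: Any) -> Any:
    """Recursively turn every scalar into a string, nulls included."""
    if value is None:
        return LEGACY_NULL
    if isinstance(value, bool):  # checked before str(), since str(False) is "False"
        return "true" if value else "false"
    if isinstance(value, dict):
        return {key: stringify(val) for key, val in value.items()}
    if isinstance(value, list):
        return [stringify(val) for val in value]
    return str(value)


def handle_request(path: str) -> str:
    # 1. HTTP request: assume routing has already produced `path`.
    # 2. Fetch resources from the new API.
    resource = NEW_API[path]
    # 3. Translate new to old: rename fields, then stringify every value.
    legacy = {FIELD_MAP[key]: stringify(val) for key, val in resource.items()}
    # 4. Serialize the legacy-shaped payload.
    return json.dumps(legacy, sort_keys=True)


print(handle_request("assets/1"))
```

Keeping the translation in one pure function makes the layer easy to delete later, which fits the plan of throwing it away once clients move to the new API.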

  25. A WHIRLWIND TOUR THROUGH THE SYSTEM
    1. The Intermediary API
    2. Umbrella App Structure
    3. Event System
    4. Moving millions of records

  26. UMBRELLA APP STRUCTURE
    • We use a single ‘monorepo’ that contains all of our
    separate applications, structured as an umbrella project
    • Total of 11 apps right now

  27. UMBRELLA APP STRUCTURE
    Diagram groups: Phoenix apps, business logic, shared components
    Apps: Core, API, Support Tool, Munger, DB, Cron, Dynasaur,
    Monitoring, Middleware, Auth, Email

  29. UMBRELLA APP STRUCTURE
    • Apps are built and deployed as separate Docker
    containers in CircleCI via Distillery
    • Each build & deploy takes ~5 minutes (run in parallel)
    • Blue / green deploys via ECS
    • All auto-scaled via CPU / memory threshold alarms

  31. UMBRELLA APP STRUCTURE: CORE
    • Core houses all of our business logic, services, Ecto
    schemas, access policies, deferred logic, and more
    • Broken into two contexts: Accounts & Projects
    • API & Support Tool use the Core to fetch data and
    execute requests – they are effectively dumb HTTP
    wrappers
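
A minimal sketch of the "dumb HTTP wrapper" split, written in Python rather than Elixir/Phoenix: the Core side owns data access and the access policy, and the web side only maps results and errors onto status codes. `Project`, `can_view`, `get_project`, and `show_project` are hypothetical names, and the owner-only policy is an assumed rule.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


# --- Hypothetical "Core": schemas, access policies, business logic ---
@dataclass(frozen=True)
class Project:
    id: int
    owner_id: int
    name: str


_PROJECTS: Dict[int, Project] = {1: Project(id=1, owner_id=42, name="Launch video")}


class NotFound(Exception):
    pass


class Forbidden(Exception):
    pass


def can_view(user_id: int, project: Project) -> bool:
    # Access policy lives in Core, not in the web layer (assumed owner-only rule).
    return project.owner_id == user_id


def get_project(user_id: int, project_id: int) -> Project:
    project = _PROJECTS.get(project_id)
    if project is None:
        raise NotFound(project_id)
    if not can_view(user_id, project):
        raise Forbidden(project_id)
    return project


# --- Hypothetical "API": maps Core results and errors onto HTTP ---
def show_project(user_id: int, project_id: int) -> Tuple[int, dict]:
    try:
        project = get_project(user_id, project_id)
    except NotFound:
        return 404, {"error": "not_found"}
    except Forbidden:
        return 403, {"error": "forbidden"}
    return 200, {"id": project.id, "name": project.name}
```

Because a support tool would call the same Core functions, the policy is enforced in one place rather than once per HTTP app.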

  32. $ Finished in 10.7 seconds
    $ 1183 tests, 0 failures

    … And a lot of tests ✨

  33. A WHIRLWIND TOUR THROUGH THE SYSTEM
    1. The Intermediary API
    2. Umbrella App Structure
    3. Event System
    4. Moving millions of records

  34. EVENT SYSTEM: WHAT IS IT?
    • All changes in our system are broadcast through a
    single, local event bus
    • Provides a powerful hook for building deferred
    functionality on top (like notifications, analytics tracking, etc.)
    • Implemented using GenStage and Protocols

  35. EVENT SYSTEM
    Diagram stages: Service, Broadcaster, Consumer, Implementation

  38. EVENT SYSTEM
    AssetCreated Event → Broadcaster → Audits, Analytics,
    Notifications, Usage, Cache
    • Broadcaster will notify all consumers concurrently.
    • Events are always typed as structs.
    • Consumers are implemented as DynamicSupervisors.
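
The broadcaster pattern can be sketched in Python under clearly swapped-in assumptions: a frozen dataclass plays the role of a typed Elixir struct, and a thread pool stands in for the GenStage stages and DynamicSupervisor-managed consumers, so this shows the shape of the fan-out, not the actual implementation. The consumer names mirror the slide; `Broadcaster` and `make_consumer` are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from threading import Lock
from typing import Callable, List

Consumer = Callable[[object], None]


@dataclass(frozen=True)
class AssetCreated:
    # A typed event; a frozen dataclass stands in for an Elixir struct.
    asset_id: str
    project_id: str


class Broadcaster:
    """Single, local event bus: every event goes to every consumer."""

    def __init__(self, max_workers: int = 8) -> None:
        self._consumers: List[Consumer] = []
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def subscribe(self, consumer: Consumer) -> None:
        self._consumers.append(consumer)

    def broadcast(self, event: object) -> List[Future]:
        # Hand the event to all consumers concurrently; the publisher is not
        # blocked unless it chooses to wait on the returned futures.
        return [self._pool.submit(consumer, event) for consumer in self._consumers]

    def close(self) -> None:
        self._pool.shutdown(wait=True)


received = []
lock = Lock()


def make_consumer(name: str) -> Consumer:
    # Stand-in for the Audits, Analytics, Notifications, Usage and Cache consumers.
    def consume(event: object) -> None:
        if isinstance(event, AssetCreated):
            with lock:
                received.append((name, event.asset_id))
    return consume


bus = Broadcaster()
for name in ["audits", "analytics", "notifications", "usage", "cache"]:
    bus.subscribe(make_consumer(name))

for future in bus.broadcast(AssetCreated(asset_id="a1", project_id="p1")):
    future.result()  # wait here only so the example is deterministic
bus.close()
```

Adding new deferred behaviour then means subscribing another consumer, without touching the services that emit events.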

  39. A WHIRLWIND TOUR THROUGH THE SYSTEM
    1. The Intermediary API
    2. Umbrella App Structure
    3. Event System
    4. Moving millions of records

  40. MOVING MILLIONS OF RECORDS
    • Moved all our records from DynamoDB into Postgres
    • Migrated via Flow tasks that streamed data from
    our tables and converted it into the appropriate
    Postgres schemas
    • Largest table was ~9M records, each record >
    100 KB (lots of JSON)
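
Converting an old record into the new schema mostly means undoing the stringly-typed encoding and turning JSON-encoded foreign-key lists into proper rows. Here is a hypothetical Python sketch for a single project item. The field names, the `"null"` encoding, and the epoch-second timestamps are assumptions, not the actual Frame.io schema.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

# Assumption: the legacy items stored null as the string "null".
LEGACY_NULL = "null"


def parse_nullable(value: str) -> Optional[str]:
    return None if value in (LEGACY_NULL, "") else value


def translate_project(item: Dict[str, str]) -> Tuple[dict, List[dict]]:
    """Turn one stringly-typed legacy item into a typed Postgres row plus
    join-table rows for the foreign keys stored as a JSON-encoded list."""
    row = {
        "id": item["id"],
        "name": item["name"],
        "archived": item["archived"] == "true",
        "storage_bytes": int(item["storage_bytes"]),
        "description": parse_nullable(item["description"]),
        # Assumption: timestamps were stored as epoch seconds in a string.
        "inserted_at": datetime.fromtimestamp(int(item["created_at"]), tz=timezone.utc),
    }
    memberships = [
        {"project_id": item["id"], "user_id": user_id}
        for user_id in json.loads(item["member_ids"])
    ]
    return row, memberships


legacy_item = {
    "id": "p1", "name": "Promo", "archived": "false", "storage_bytes": "1024",
    "description": "null", "created_at": "0", "member_ids": '["u1", "u2"]',
}
row, memberships = translate_project(legacy_item)
```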

  41. MOVING MILLIONS OF RECORDS: HOW IT WORKS
    1. Define Dynamo Schema
    2. Stream from table
    3. Translate Old to New
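
The three steps map onto a streaming pipeline. The migration itself used Elixir's Flow. The Python sketch below substitutes a generator and a thread pool to show the same shape: page lazily through a table, translate batches in parallel, and pass ordered results to a sink. `scan_page` imitates DynamoDB's `LastEvaluatedKey` pagination over an in-memory list, and all field names are hypothetical.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical in-memory table; a real job would issue Scan requests,
# passing each response's LastEvaluatedKey as the next ExclusiveStartKey.
TABLE = [{"id": str(i), "name": f"asset-{i}", "size": str(i * 10)} for i in range(25)]


def scan_page(start: int, limit: int) -> Dict:
    items = TABLE[start:start + limit]
    next_key = start + limit if start + limit < len(TABLE) else None
    return {"Items": items, "LastEvaluatedKey": next_key}


def stream_table(page_size: int = 10) -> Iterator[dict]:
    # Step 2: stream from the table, fetching one page at a time.
    key: Optional[int] = 0
    while key is not None:
        page = scan_page(key, page_size)
        yield from page["Items"]
        key = page["LastEvaluatedKey"]


def translate(item: dict) -> dict:
    # Step 1 is represented only by the field names assumed here;
    # step 3 translates old to new by undoing the stringly-typed encoding.
    return {"id": int(item["id"]), "name": item["name"], "size_bytes": int(item["size"])}


def translate_batch(batch: List[dict]) -> List[dict]:
    return [translate(item) for item in batch]


def batches(items: Iterator[dict], size: int) -> Iterator[List[dict]]:
    while batch := list(islice(items, size)):
        yield batch


def migrate(sink: Callable[[List[dict]], None], workers: int = 4, batch_size: int = 5) -> None:
    """Translate batches in parallel, keeping at most 2 * workers in flight."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        in_flight = deque()
        for batch in batches(stream_table(), batch_size):
            in_flight.append(pool.submit(translate_batch, batch))
            if len(in_flight) >= 2 * workers:
                sink(in_flight.popleft().result())  # e.g. a bulk INSERT into Postgres
        while in_flight:
            sink(in_flight.popleft().result())


migrated: List[dict] = []
migrate(migrated.extend, workers=3, batch_size=4)
```

`workers` and `batch_size` are the kind of per-job knobs the team refined while monitoring errors. In CPython, threads only help when the work is I/O-bound (network reads, database writes), whereas Flow spreads work across BEAM processes.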

  44. MOVING MILLIONS OF RECORDS: HOW IT WORKS
    • Each table migration job runs in its own isolated
    Docker container, launched via ECS RunTask
    • We monitored errors in our jobs and constantly refined
    and tweaked the parallelism for each job
    • Ran weekly migrations and manually checked the
    migrated data in our QA environment

  45. PART III
    Challenges & Takeaways

  46. CHALLENGES DURING THE MIGRATION
    • Bugs, and replicating old bugs
    • Team ramp-up: 3 new developers learning Elixir while trying
    to ship something is hard. Protip: establish patterns.
    • Understanding the performance characteristics of a
    new system and a new database
    • Estimating complexity: we went 6 weeks past our
    planned delivery date

  47. TAKEAWAY #1
    Elixir was a huge win for us, but
    might not be for you.

  48. TAKEAWAY #2
    If you do rewrite, don’t move
    databases at the same time

  49. TAKEAWAY #3
    Good code isn’t about getting it right the
    first time. Good code is just legacy code that
    doesn’t get in the way.
    @tef_ebooks

  50. Thank you. Questions?

    [email protected] • @cjbell_
