
Shipping a Replacement Architecture in Elixir

Chris Bell
February 11, 2018

Talk given at EMPEX LA, 2018.

Describes the journey at Frame.io of shipping a new Elixir-powered set of services to replace our aging Ruby on Rails and Node.js stack.

#elixir #phoenix


Transcript

  1. Shipping a Replacement
    Architecture in Elixir
    Chris Bell • @cjbell_
    EMPEX LA 2018



  2. I’m Chris and I <3 Elixir
    • 3 years of writing production Elixir apps
    • EMPEX NYC Organizer
    • ElixirTalk Co-host (with @desmondmonster)


  3. Frame.io
    • Powers Review & Collaboration for Video teams
    • Used by Vice, Turner, Buzzfeed, NYTimes, NASA
    • 450,000+ customers
    • Founded in 2014, based in NYC


  4. OUR JOURNEY FROM:
    Ruby on Rails to Elixir & Phoenix


  5. PART I
    Why Rewrite & Why Elixir?


  6. We did that thing you should
    never do: a rewrite
    OOPS, SORRY JOEL


  7. Rapid growth and lots of shifting
    requirements left the codebase
    not in the best state


  8. • Ruby 1.9.3, Rails 3.2
    • No tests… at all… for anything
    • Custom ORM into DynamoDB
    • Metaprogramming everywhere
    • No logical separation of concerns (authorization,
    persistence, domain logic)
    • No metrics or visibility into performance
    API ISSUES PRE-MIGRATION


  9. • Everything is a string … even nulls.
    • Foreign keys stored as a JSON encoded list of strings on
    the model. No atomic updates = lost data.
    • DynamoDB cost $$$ to support our workload
    • No ability to paginate anything = up to 45s response
    times and large payloads (>1 MB of JSON)
    DATABASE ISSUES PRE-MIGRATION
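To make the first two bullets concrete, here is a purely hypothetical sketch of the shape such a record took (illustrative field names and values, not real Frame.io data):

```elixir
# Hypothetical illustration of the legacy data shape described above:
# every attribute is a string, including nulls, and foreign keys live in
# a single JSON-encoded string that cannot be updated atomically.
legacy_project = %{
  "id" => "7f3a9c",                          # illustrative value
  "archived" => "false",                     # boolean stored as a string
  "deleted_at" => "null",                    # even nulls are strings
  "folder_ids" => ~s(["abc123","def456"])    # FK list as one JSON string
}
```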


  10. We felt we were technically
    bankrupt with our existing API
    and services


  11. So why Elixir for us?


  12. • Easy-ish ramp up for our existing Ruby / Python
    developers
    • Highly concurrent: use resources much more efficiently
    • Building on a mature VM (BEAM) and established
    language (Erlang & OTP)
    • Language attributes promote explicitness:
    immutability, pattern matching, multiple function heads.
    WHY ELIXIR?
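As a generic illustration of the last point (not code from the Frame.io codebase), multiple function heads and pattern matching make each case explicit:

```elixir
# Each case gets its own function head: no hidden branching, no mutation.
defmodule Display do
  def name(%{name: name}) when is_binary(name), do: name
  def name(%{email: email}) when is_binary(email), do: email
  def name(_anything_else), do: "anonymous"
end
```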


  13. PART II
    What We Shipped & A Peek Into The System


  14. LEGACY ARCHITECTURE
    [Diagram] Client (iOS / Web / Adobe) talks to the API (Ruby on Rails),
    which is backed by DynamoDB, and holds a websocket connection to a
    Real-time Service (Node.js). Alongside these sit a Support Tool
    (Node.js) and an Email Digest Service (Node.js), with push
    notifications and email sent via SES.


  15. LEGACY ARCHITECTURE: WHAT WE REPLACED
    [Same diagram as above, with the components we replaced highlighted]


  16. • Elixir-powered API, notifications system, real-time
    service, and support tool
    • Migrated all of our data to Postgres from DynamoDB
    • Dockerized all of the above and rebuilt our tooling
    and deploy process from the ground up
    WHAT WE SHIPPED


  17. UPDATED ARCHITECTURE
    [Diagram] Client (iOS / Web / Adobe) talks to the V2 API (Phoenix) and
    the Munger API (Phoenix), and holds a websocket connection to a
    Real-time Service (Elixir / Phoenix). These, plus the Support Tool
    (Phoenix) and the Core Business Logic, make up a single Umbrella App
    backed by Postgres and Memcached, with push notifications and email
    sent via SES (Email Service).


  18. • ~40 EC2 instances down to ~5 (running on ECS)
    • API 95th percentile: ~30ms @ ~120rps
    • Database costs reduced by 91%
    • Full visibility into all parts of the system (via statsd
    & Datadog)
    • Modular, documented, maintainable codebase
    WHAT WE SHIPPED: RESULTS


  19. AND BEST OF ALL:
    Predictable, stable performance
    with no out-of-hours incidents yet
    (and very few in general)


  20. 1. The Intermediary API
    2. Umbrella App Structure
    3. Event System
    4. Moving millions of records
    A WHIRLWIND TOUR THROUGH THE SYSTEM


  21. • Consumes the new API and emits the old schemas, maintaining
    the legacy contract (by stringifying everything)
    • Allowed us to ship our new stack sooner with minimal impact
    on our different clients
    • Complexity is high, but it's designed to be thrown away
    (in ~6 months' time)
    THE INTERMEDIARY API


  22. THE INTERMEDIARY API: HOW IT WORKS
    1. HTTP Request
    2. Fetch Resources
    3. Translate New to Old
    4. Serialize
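A minimal sketch of step 3, with hypothetical module and field names (the talk doesn't show the real translation code):

```elixir
# Hypothetical sketch of the "translate new to old" step: the intermediary
# ("munger") API re-shapes a new-schema map into the legacy contract,
# stringifying every value the way the old API did.
defmodule Munger.Translator do
  def to_legacy(%{id: id, name: name, deleted_at: deleted_at}) do
    %{
      "id" => to_string(id),
      "name" => to_string(name),
      # the legacy contract represented nil as the string "null"
      "deleted_at" => stringify(deleted_at)
    }
  end

  defp stringify(nil), do: "null"
  defp stringify(value), do: to_string(value)
end
```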


  25. 1. The Intermediary API
    2. Umbrella App Structure
    3. Event System
    4. Moving millions of records
    A WHIRLWIND TOUR THROUGH THE SYSTEM


  26. • We use a single 'monorepo' that contains all of our
    separate applications, structured as an umbrella project
    • 11 apps in total right now (see the mix.exs sketch below)
    UMBRELLA APP STRUCTURE
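For reference, the root of such an umbrella monorepo is just a mix.exs that points at apps/ (module name illustrative):

```elixir
# mix.exs at the umbrella root: every app lives under apps/ and is built,
# tested, and released from this one repository.
defmodule FrameUmbrella.Mixfile do
  use Mix.Project

  def project do
    [
      apps_path: "apps",
      start_permanent: Mix.env() == :prod,
      deps: []
    ]
  end
end
```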


  27. UMBRELLA APP STRUCTURE
    PHOENIX APPS: API • Support Tool • Munger
    BUSINESS LOGIC: Core
    SHARED COMPONENTS: DB • Cron • Dynasaur • Monitoring •
    Middleware • Auth • Email


  29. • Apps built and deployed as separate Docker
    containers in CircleCI via Distillery
    • Each build & deploy takes ~5 minutes (run in parallel)
    • Blue / green deploys via ECS
    • All auto-scaled via CPU / Memory threshold alarms
    UMBRELLA APP STRUCTURE
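A minimal sketch of what per-app releases look like with Distillery 1.x (the era's release tooling), with illustrative release and app names:

```elixir
# rel/config.exs: one Distillery release per deployable app, so each can
# be packaged into its own Docker image and deployed independently.
use Mix.Releases.Config,
  default_release: :api,
  default_environment: Mix.env()

environment :prod do
  set include_erts: true
  set include_src: false
end

release :api do
  set version: current_version(:api)
  set applications: [:api, :core]
end

release :support_tool do
  set version: current_version(:support_tool)
  set applications: [:support_tool, :core]
end
```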


  31. • Core houses all of our business logic, services, Ecto
    schemas, access policies, deferred logic, and more
    • Broken into two contexts: Accounts & Projects
    • API & Support Tool use the Core to fetch data and
    execute requests – they are effectively dumb HTTP
    wrappers
    UMBRELLA APP STRUCTURE: CORE
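A sketch of that division of labor, with hypothetical module and function names: the API app stays a thin HTTP wrapper while Core owns policies and persistence.

```elixir
# Hypothetical controller in the API app: all authorization and data
# access is delegated to the Core application.
defmodule API.AssetController do
  use API.Web, :controller

  def index(conn, %{"project_id" => project_id} = params) do
    user = conn.assigns.current_user

    with {:ok, project} <- Core.Projects.fetch_project(project_id),
         :ok <- Core.Projects.authorize(user, :list_assets, project) do
      assets = Core.Projects.list_assets(project, params)
      render(conn, "index.json", assets: assets)
    end
  end
end
```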


  32. $ Finished in 10.7 seconds
    $ 1183 tests, 0 failures

    … And a lot of tests ✨


  33. 1. The Intermediary API
    2. Umbrella App Structure
    3. Event System
    4. Moving millions of records
    A WHIRLWIND TOUR THROUGH THE SYSTEM


  34. • All changes to our system are broadcast through a
    single, local event bus
    • Provides a powerful hook to build deferred functionality
    on top of (like notifications, analytics tracking, etc.)
    • Implemented using GenStage and Protocols
    EVENT SYSTEM: WHAT IS IT?
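A minimal sketch of such a bus with GenStage, using hypothetical module names; the talk notes the real consumers are DynamicSupervisors, but a plain consumer keeps the sketch short, and the real system adds supervision and error handling.

```elixir
# A GenStage producer using BroadcastDispatcher, so every subscribed
# consumer sees every event.
defmodule EventBus.Broadcaster do
  use GenStage

  def start_link(_opts) do
    GenStage.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # Services call this after a successful change, e.g.
  # EventBus.Broadcaster.notify(%Events.AssetCreated{...})
  def notify(event) do
    GenStage.cast(__MODULE__, {:notify, event})
  end

  @impl true
  def init(:ok) do
    {:producer, :ok, dispatcher: GenStage.BroadcastDispatcher}
  end

  @impl true
  def handle_cast({:notify, event}, state) do
    {:noreply, [event], state}
  end

  @impl true
  def handle_demand(_demand, state) do
    # Events are pushed by notify/1; nothing to produce on demand.
    {:noreply, [], state}
  end
end

# A consumer subscribes to the broadcaster and handles events as they
# arrive; each consumer (audits, analytics, ...) is its own module.
defmodule EventBus.AnalyticsConsumer do
  use GenStage

  def start_link(_opts) do
    GenStage.start_link(__MODULE__, :ok)
  end

  @impl true
  def init(:ok) do
    {:consumer, :ok, subscribe_to: [EventBus.Broadcaster]}
  end

  @impl true
  def handle_events(events, _from, state) do
    Enum.each(events, &track/1)
    {:noreply, [], state}
  end

  defp track(_event), do: :ok # send to the analytics service here
end
```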


  35. EVENT SYSTEM
    Service → Broadcaster → Consumer → Implementation


  38. EVENT SYSTEM
    [Diagram] An AssetCreated event flows through the Broadcaster to the
    Audits, Analytics, Notifications, Usage, and Cache consumers.
    • The Broadcaster notifies all consumers concurrently
    • Events are always typed as structs
    • Consumers are implemented as DynamicSupervisors
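And a sketch of the typed-event / protocol side, again with illustrative names (the talk doesn't show the real protocol):

```elixir
# Events are plain structs, so every consumer pattern-matches on a
# well-defined shape.
defmodule Events.AssetCreated do
  @enforce_keys [:asset, :actor]
  defstruct [:asset, :actor, :inserted_at]
end

# A protocol lets each event type declare what a consumer should do
# with it, falling back to a no-op for events it doesn't care about.
defprotocol EventBus.Notification do
  @fallback_to_any true
  @doc "Builds the notifications (if any) that an event should trigger."
  def build(event)
end

defimpl EventBus.Notification, for: Events.AssetCreated do
  def build(%Events.AssetCreated{asset: asset, actor: actor}) do
    [%{type: :asset_created, asset_id: asset.id, actor_id: actor.id}]
  end
end

defimpl EventBus.Notification, for: Any do
  def build(_event), do: []
end
```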


  39. A WHIRLWIND TOUR THROUGH THE SYSTEM
    1. The Intermediary API
    2. Umbrella App Structure
    3. Event System
    4. Moving millions of records


  40. • Moved all of our records from DynamoDB into Postgres
    • Migrated through Flow tasks that streamed data from
    our tables and converted it into the appropriate
    Postgres schemas
    • Largest table was ~9m records, each record >100 KB
    (lots of JSON)
    MOVING MILLIONS OF RECORDS
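A simplified sketch of one such Flow job, assuming ExAws.Dynamo for the scan and an Ecto repo for the writes; names, fields, and the cursor handling are illustrative, and the real jobs also handled batching, retries, and metrics.

```elixir
defmodule Migrator.Projects do
  # Stream a legacy table, translate each item, and write it to Postgres
  # with Flow-managed parallelism.
  def run do
    "projects_legacy"
    |> stream_dynamo_items()
    |> Flow.from_enumerable(max_demand: 100)
    |> Flow.map(&translate/1)
    # Repo = your Ecto repo; the real jobs inserted in batches.
    |> Flow.map(&Repo.insert_all("projects", [&1]))
    |> Flow.run()
  end

  # Lazily page through a DynamoDB scan, following LastEvaluatedKey.
  # Simplified: production code must pass the key back in the exact
  # format ExAws expects.
  defp stream_dynamo_items(table) do
    Stream.resource(
      fn -> :start end,
      fn
        nil ->
          {:halt, nil}

        cursor ->
          opts = if cursor == :start, do: [], else: [exclusive_start_key: cursor]
          response = table |> ExAws.Dynamo.scan(opts) |> ExAws.request!()
          {response["Items"], response["LastEvaluatedKey"]}
      end,
      fn _ -> :ok end
    )
  end

  # DynamoDB returns typed attributes (%{"S" => ...}); convert a stringly
  # typed item into a row for the new Postgres schema (fields illustrative).
  defp translate(%{"id" => %{"S" => id}, "name" => %{"S" => name}}) do
    %{id: id, name: name}
  end
end
```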


  41. MOVING MILLIONS OF RECORDS: HOW IT WORKS
    1. Define Dynamo Schema
    2. Stream from table
    3. Translate Old to New


  44. MOVING MILLIONS OF RECORDS: HOW IT WORKS
    • Each table migration job runs in its own isolated
    Docker container via the ECS run-task command
    • We monitored errors in our jobs and continually tuned
    the parallelism of each job
    • Ran weekly migrations and manually checked the
    migrated data in our QA environment


  45. PART III
    Challenges & Takeaways


  46. • New bugs, and having to replicate old bugs to preserve
    legacy behavior
    • Team ramp-up: 3 new developers learning Elixir while trying
    to ship a thing is hard. Pro tip: establish patterns early.
    • Understanding the performance characteristics of a
    new system and a new database.
    • Estimating complexity: we went 6 weeks over our
    planned delivery date.
    CHALLENGES DURING THE MIGRATION


  47. TAKEAWAY #1
    Elixir was a huge win for us, but
    might not be for you.


  48. TAKEAWAY #2
    If you do rewrite, don’t move
    databases at the same time


  49. TAKEAWAY #3
    Good code isn’t about getting it right the
    first time. Good code is just legacy code that
    doesn’t get in the way.
    @tef_ebooks


  50. Thank you. Questions?

    [email protected] • @cjbell_
