Shipping a Replacement Architecture in Elixir

Chris Bell
February 11, 2018

Talk given at EMPEX LA, 2018.

Describes the journey at Frame.io of shipping a new set of Elixir-powered services to replace our aging Ruby on Rails and Node.js stack.

#elixir #phoenix

Transcript

  1. Shipping a Replacement Architecture in Elixir
     Chris Bell • @cjbell_ • EMPEX LA 2018
  2. I’m Chris and I <3 Elixir
     • 3 years of writing production Elixir apps
     • EMPEX NYC Organizer
     • ElixirTalk Co-host (with @desmondmonster)
  3. • Powers Review & Collaboration for Video teams
     • Used by Vice, Turner, Buzzfeed, NYTimes, NASA
     • 450,000+ customers
     • Founded in 2014, based in NYC
  4. OUR JOURNEY FROM: Ruby on Rails to Elixir & Phoenix
  5. PART I: Why Rewrite & Why Elixir?
  6. We did that thing you should never do: a rewrite
     OOPS, SORRY JOEL
  7. Rapid growth and lots of shifting requirements left the codebase not in the best state
  8. API ISSUES PRE-MIGRATION
     • Ruby 1.9.3, Rails 3.2
     • No tests… at all… for anything
     • Custom ORM into DynamoDB
     • Metaprogramming everywhere
     • No logical separation of concerns (authorization, persistence, domain logic)
     • No metrics or visibility into performance
  9. DATABASE ISSUES PRE-MIGRATION
     • Everything is a string… even nulls.
     • Foreign keys stored as a JSON-encoded list of strings on the model. No atomic updates = lost data.
     • DynamoDB cost $$$ to support our workload
     • No ability to paginate anything = up to 45s response times and large payloads (> 1 MB of JSON)
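The "no atomic updates = lost data" point can be illustrated with a minimal sketch. It is written in Python for brevity, not in the Ruby or Elixir from the talk, and the `store` dict, the `asset_ids` field, and both helper names are hypothetical. Two writers each read the JSON-encoded list, append to their own copy, and write the whole value back, so the second write silently discards the first.

```python
import json

# Hypothetical record: foreign keys kept as a JSON-encoded list of strings.
store = {"project-1": {"asset_ids": json.dumps(["a1"])}}

def read_ids(key):
    """Read and decode the whole list (step 1 of read-modify-write)."""
    return json.loads(store[key]["asset_ids"])

def write_ids(key, ids):
    """Overwrite the whole encoded list (step 2); nothing detects a stale read."""
    store[key]["asset_ids"] = json.dumps(ids)

# Two writers read the same starting value before either one writes back.
ids_seen_by_a = read_ids("project-1")
ids_seen_by_b = read_ids("project-1")

write_ids("project-1", ids_seen_by_a + ["a2"])
write_ids("project-1", ids_seen_by_b + ["a3"])  # clobbers writer A's update

final_ids = read_ids("project-1")  # "a2" is gone
```

One common way to avoid this class of bug is to store each relationship as its own row, so that adding one never rewrites the others.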
  10. We felt we were technically bankrupt with our existing API and services
  11. So why Elixir for us?

  12. WHY ELIXIR?
      • Easy-ish ramp-up for our existing Ruby / Python developers
      • Highly concurrent: use resources much more efficiently
      • Building on a mature VM (BEAM) and established language (Erlang & OTP)
      • Language attributes promote explicitness: immutability, pattern matching, multiple function heads.
  13. PART II: What We Shipped & A Peek Into The System
  14. LEGACY ARCHITECTURE (diagram)
      Components shown: Client (iOS / Web / Adobe), API (Ruby on Rails), Websocket connection, Real-time Service (Node.js), Support Tool (Node.js), Email Digest Service (Node.js), Push Notifications, SES, DynamoDB
  15. LEGACY ARCHITECTURE: WHAT WE REPLACED (diagram)
      Components shown: Client (iOS / Web / Adobe), API (Ruby on Rails), Websocket connection, Real-time Service (Node.js), Support Tool (Node.js), Email Digest Service (Node.js), Push Notifications, SES, DynamoDB
  16. WHAT WE SHIPPED
      • Elixir-powered API, notifications system, real-time service, and support tool
      • Migrated all of our data to Postgres from DynamoDB
      • Dockerized all of the above and rebuilt our tooling and deploy process from the ground up
  17. UPDATED ARCHITECTURE (diagram)
      Components shown: Client (iOS / Web / Adobe), Munger API (Phoenix), V2 API (Phoenix), Websocket connection, Real-time Service (Elixir / Phoenix), Support Tool (Phoenix), Core Business Logic, Umbrella App, Email Service, Push Notifications, SES, Postgres, Memcached
  18. WHAT WE SHIPPED: RESULTS
      • ~40 EC2 instances down to ~5 (running on ECS)
      • API 95th percentile: ~30ms @ ~120 rps
      • Database cost reduced by 91%
      • Full visibility into all parts of the system (via StatsD & Datadog)
      • Modular, documented, maintainable codebase
  19. AND BEST OF ALL: Predictable, stable performance with no out-of-hours incidents yet (and very few in general)
  20. A WHIRLWIND TOUR THROUGH THE SYSTEM
      1. The Intermediary API
      2. Umbrella App Structure
      3. Event System
      4. Moving millions of records
  21. THE INTERMEDIARY API
      • Consumes the new API, spits out the old schemas, and maintains the legacy contract (by stringifying everything)
      • Allowed us to ship our new stack sooner with fewer implications for our different clients
      • Complexity is high, but it is designed to be thrown away (in ~6 months' time)
  22. THE INTERMEDIARY API: HOW IT WORKS
      1. HTTP Request
      2. Fetch Resources
      3. Translate New to Old
      4. Serialize
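As a rough illustration of the "Translate New to Old" and "Serialize" steps, the sketch below turns a typed record from a new API into an all-strings legacy payload. It is written in Python for brevity, not in the Elixir used in the talk. The field names, the `"null"` encoding for missing values, and every function name are assumptions made for the example; the slides only say that the legacy contract was kept "by stringifying everything".

```python
import json

def stringify(value):
    """Convert one new-style value into the all-strings legacy form."""
    if value is None:
        return "null"                      # assumed legacy encoding of nulls
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, dict)):
        return json.dumps(value)           # nested data becomes a JSON string
    return str(value)

def translate_new_to_old(resource):
    """Map every field of a new-schema resource to its legacy string form."""
    return {key: stringify(value) for key, value in resource.items()}

def serialize(resource):
    """Final step: encode the legacy-shaped resource as a JSON response body."""
    return json.dumps(translate_new_to_old(resource), sort_keys=True)

# Hypothetical new-schema resource, as if fetched from the new API.
new_asset = {"id": 42, "archived": False, "parent_id": None, "tags": ["raw", "4k"]}
legacy_asset = translate_new_to_old(new_asset)
```

Keeping the translation in one pure function like this makes the layer easy to test and easy to delete once clients move to the new contract.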
  25. A WHIRLWIND TOUR THROUGH THE SYSTEM
      1. The Intermediary API
      2. Umbrella App Structure
      3. Event System
      4. Moving millions of records
  26. UMBRELLA APP STRUCTURE
      • We use a single ‘monorepo’ to contain all our separate applications, structured as an Umbrella
      • Total of 11 apps right now
  27. UMBRELLA APP STRUCTURE (diagram)
      Groups: Phoenix Apps, Business Logic, Shared Components
      Apps: Core, API, Support Tool, Munger, DB, Cron, Dynasaur, Monitoring, Middleware, Auth, Email
  29. UMBRELLA APP STRUCTURE
      • Apps built and deployed as separate Docker containers in CircleCI via Distillery
      • Each build & deploy takes ~5 minutes (run in parallel)
      • Blue / green deploys via ECS
      • All auto-scaled via CPU / Memory threshold alarms
  30. UMBRELLA APP STRUCTURE (diagram)
      Groups: Phoenix Apps, Business Logic, Shared Components
      Apps: Core, API, Support Tool, Munger, DB, Cron, Dynasaur, Monitoring, Middleware, Auth, Email
  31. UMBRELLA APP STRUCTURE: CORE
      • Core houses all of our business logic, services, Ecto schemas, access policies, deferred logic, and more
      • Broken into two contexts: Accounts & Projects
      • API & Support Tool use the Core to fetch data and execute requests; they are effectively dumb HTTP wrappers
  32. … And a lot of tests ✨
      $ Finished in 10.7 seconds
      $ 1183 tests, 0 failures
  33. A WHIRLWIND TOUR THROUGH THE SYSTEM
      1. The Intermediary API
      2. Umbrella App Structure
      3. Event System
      4. Moving millions of records
  34. EVENT SYSTEM: WHAT IS IT?
      • All changes in our system are broadcast through a single, local event bus
      • Provides a powerful hook to build deferred functionality on top of (like notifications, analytics tracking, etc.)
      • Implemented using GenStage and Protocols
  35. EVENT SYSTEM (diagram): Service, Broadcaster, Consumer, Implementation
  38. EVENT SYSTEM (diagram): AssetCreated Event, Broadcaster, and the consumers Audits, Analytics, Notifications, Usage, Cache
      • Broadcaster will notify all consumers concurrently.
      • Events are always typed as structs
      • Consumers are implemented with DynamicSupervisors
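The fan-out described above can be sketched in a few lines. This is an illustration in Python, not the GenStage implementation from the talk: a thread pool stands in for concurrent consumer processes, a frozen dataclass stands in for a typed event struct, and the `Broadcaster` class, its methods, and the event fields are all hypothetical. It does not reproduce supervision or back-pressure.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class AssetCreated:
    """A typed, immutable event; the fields are hypothetical."""
    asset_id: str
    project_id: str

class Broadcaster:
    """Fans each event out to every registered consumer concurrently."""

    def __init__(self):
        self._consumers = {}

    def subscribe(self, name, handler):
        self._consumers[name] = handler

    def broadcast(self, event):
        # Run every handler on its own worker thread and wait for all of them.
        workers = max(1, len(self._consumers))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {name: pool.submit(handler, event)
                       for name, handler in self._consumers.items()}
        # A failing consumer is reported by name and does not affect the others.
        results = {}
        for name, future in futures.items():
            error = future.exception()
            results[name] = "ok" if error is None else f"failed: {error}"
        return results

audit_log = []

def record_audit(event):
    audit_log.append(("asset_created", event.asset_id))

def send_notification(event):
    raise RuntimeError("push service unavailable")  # simulated consumer failure

bus = Broadcaster()
bus.subscribe("audits", record_audit)
bus.subscribe("notifications", send_notification)
outcome = bus.broadcast(AssetCreated(asset_id="a1", project_id="p1"))
```

The point of the design is isolation: the service that creates an asset only emits one event, and a slow or failing consumer such as notifications cannot break auditing or the original request.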
  39. A WHIRLWIND TOUR THROUGH THE SYSTEM
      1. The Intermediary API
      2. Umbrella App Structure
      3. Event System
      4. Moving millions of records
  40. MOVING MILLIONS OF RECORDS
      • Moved all our records from DynamoDB into Postgres
      • Migrated through Flow tasks that streamed data from our tables and converted it into the appropriate Postgres schemas
      • Largest table size was ~9m records, each record > 100 KB (lots of JSON)
  41. MOVING MILLIONS OF RECORDS: HOW IT WORKS
      1. Define Dynamo Schema
      2. Stream from table
      3. Translate Old to New
  44. MOVING MILLIONS OF RECORDS: HOW IT WORKS
      • Each table migration job runs in its own isolated Docker container using the ECS run task
      • We monitored errors in our jobs and constantly refined and tweaked the parallelism for each job
      • Ran weekly migrations and manually checked the migrated data in our QA environment
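The stream-and-translate shape of such a job can be sketched as below. This is a sequential Python illustration, not the Flow pipeline from the talk: the parallelism is left out, an in-memory list stands in for a DynamoDB table scan, and the field names and the `"null"` string convention are assumptions. Rows that fail to translate are collected rather than aborting the run, which is one simple way to get the kind of error monitoring the slide mentions.

```python
import json
from itertools import islice

def stream_table(rows, page_size=2):
    """Yield pages of legacy rows lazily, standing in for a paginated table scan."""
    iterator = iter(rows)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page

def translate_old_to_new(row):
    """Undo the all-strings legacy encoding for one record (assumed schema)."""
    return {
        "id": row["id"],
        "name": None if row["name"] == "null" else row["name"],
        "size_bytes": int(row["size_bytes"]),
        "asset_ids": json.loads(row["asset_ids"]),
    }

def migrate(rows):
    """Translate every row, collecting failures instead of aborting the job."""
    migrated, errors = [], []
    for page in stream_table(rows):
        for row in page:
            try:
                migrated.append(translate_old_to_new(row))
            except (KeyError, ValueError) as exc:
                errors.append((row.get("id"), repr(exc)))
    return migrated, errors

legacy_rows = [
    {"id": "p1", "name": "Trailer", "size_bytes": "1024", "asset_ids": '["a1", "a2"]'},
    {"id": "p2", "name": "null", "size_bytes": "oops", "asset_ids": "[]"},
    {"id": "p3", "name": "null", "size_bytes": "0", "asset_ids": "[]"},
]
migrated, errors = migrate(legacy_rows)
```

Because each row is translated independently, the inner loop is the part that a real job could spread across workers, with the degree of parallelism tuned per table.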
  45. PART III: Challenges & Takeaways
  46. CHALLENGES DURING THE MIGRATION
      • Bugs and replicating old bugs
      • Team ramp-up: 3 new developers learning Elixir while trying to ship is hard. Protip: establish patterns.
      • Understanding the performance characteristics of a new system and new database.
      • Estimation of complexity: went 6 weeks over our planned delivery date.
  47. TAKEAWAY #1: Elixir was a huge win for us, but might not be for you.
  48. TAKEAWAY #2: If you do rewrite, don’t move databases at the same time
  49. TAKEAWAY #3: Good code isn’t about getting it right the first time. Good code is just legacy code that doesn’t get in the way. (@tef_ebooks)
  50. Thank you. Questions?
      chris@frame.io • @cjbell_