Tokyo RubyKaigi 12 - Scaling Ruby at GitHub

John Hawthorn

January 26, 2025

Transcript

  1. John Hawthorn
    GitHub: @jhawthorn
    Twitter: @jhawthorn
    Bluesky: @jhawthorn.com
    Mastodon: @[email protected]
    ジョン ホーソーン
    Staff Software Engineer @ GitHub
    Ruby committer
    Rails core team
  2. • Joined GitHub in 2018
    • "Ruby Architecture" Team
    • Developing & upgrading Ruby
    • Developing & upgrading Rails
    • Integrating Ruby/Rails into GitHub
    • Some database stuff
  3. "The Monolith"
    ~3.8 million lines of code
    ~2 million lines of tests
    >1000 developers
    2500 pull requests / month
  4. Problems:
    I can't read 3.8 million lines of code
    I can't review 2500 pull requests / month
  5. Error Attribution
    ActiveRecord::StatementInvalid
    active_record/lib/active_record/query.rb:123 in `query`
    app/models/user.rb:124:in 'Array#each'
    app/models/user.rb:123:in 'find_user'
    app/models/issue.rb:123:in 'some_user'
    app/controllers/issues_controller.rb:123:in 'show'
    lib/middleware/cache_middleware.rb:123:in 'call'
  6. Error Attribution
    ActiveRecord::StatementInvalid
    active_record/lib/active_record/query.rb:123 in `query`
    app/models/user.rb:124:in 'Array#each'
    app/models/user.rb:123:in 'find_user'
    app/models/issue.rb:123:in 'some_user'
    app/controllers/issues_controller.rb:123:in 'show'
    lib/middleware/cache_middleware.rb:123:in 'call'
    user.rb, "users" service, @github/users-team
  7. Error Attribution
    ActiveRecord::StatementInvalid
    active_record/lib/active_record/query.rb:123 in `query`
    app/models/user.rb:124:in 'Array#each'
    app/models/user.rb:123:in 'find_user'
    app/models/issue.rb:123:in 'some_user'
    app/controllers/issues_controller.rb:123:in 'show'
    lib/middleware/cache_middleware.rb:123:in 'call'
    issues_controller.rb, "issues" service, @github/issues-team
  8. Error Attribution
    ActiveRecord::StatementInvalid
    active_record/lib/active_record/query.rb:123 in `query`
    app/models/user.rb:124:in 'Array#each'
    app/models/user.rb:123:in 'find_user'
    app/models/issue.rb:123:in 'some_user'
    app/controllers/issues_controller.rb:123:in 'show'
    lib/middleware/cache_middleware.rb:123:in 'call'
    service: users (@github/users-team) cause service: issues (@github/issues-team)
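
The attribution on the slides above can be sketched as a lookup from backtrace frames to owning services. This is only an illustration: the `OWNERS` table, the `Attribution` struct, and the `attribute` helper are hypothetical names, and the slides do not show how GitHub actually stores or resolves ownership.

```ruby
# Hypothetical ownership table: file path => [service, team].
# The paths and teams are the ones shown on the slides.
OWNERS = {
  "app/models/user.rb" => ["users", "@github/users-team"],
  "app/models/issue.rb" => ["issues", "@github/issues-team"],
  "app/controllers/issues_controller.rb" => ["issues", "@github/issues-team"],
}.freeze

Attribution = Struct.new(:service, :team)

# Returns the owners of the application frames, innermost first, without repeats.
# Frames with no entry in OWNERS (framework code, middleware) are skipped.
def attribute(backtrace)
  backtrace.filter_map { |frame|
    path = frame.split(":").first
    service, team = OWNERS[path]
    Attribution.new(service, team) if service
  }.uniq
end

backtrace = [
  "active_record/lib/active_record/query.rb:123:in 'query'",
  "app/models/user.rb:124:in 'Array#each'",
  "app/models/user.rb:123:in 'find_user'",
  "app/models/issue.rb:123:in 'some_user'",
  "app/controllers/issues_controller.rb:123:in 'show'",
  "lib/middleware/cache_middleware.rb:123:in 'call'",
]

owners = attribute(backtrace)
owners.each { |owner| puts "#{owner.service} (#{owner.team})" }
```

Walking the whole backtrace instead of stopping at the first match keeps both candidate services visible, which is what the last slide in this group appears to show.
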
  9. Remove OpenStruct
    # test/linters/openstruct_test.rb
    openstruct_count = grep(/OpenStruct/).count
    assert_equal 0, openstruct_count
    # CODEOWNERS
    test/linters/openstruct_test.rb @github/ruby-architecture
    Our team has to approve any new OpenStruct
    50
  10. "Ratchet" technique 🔧
    # test/linters/openstruct_test.rb
    openstruct_count = grep(/OpenStruct/).count
    if openstruct_count > 50
      flunk "OpenStruct has performance issues. " \
        "Please use Struct or a plain Ruby class instead."
    elsif openstruct_count < 50
      flunk "You removed an OpenStruct. Thank you! " \
        "Please update the counter in #{__FILE__}"
    end
  11. commit: "Remove OpenStruct everywhere" +1243 -1243
    😬 Hard to review
    Likely to have merge conflicts
  12. commit: "Remove OpenStruct everywhere" +1243 -1243
    script/create-prs-by-codeowner
    commit: "Remove OpenStruct from issues" +30 -30
    commit: "Remove OpenStruct from users" +20 -20
    ...
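
The slides do not show what `script/create-prs-by-codeowner` does internally. One plausible core step, sketched here under that assumption, is grouping the changed files by owner so that each group can become its own small pull request. The `RULES` table and `group_by_owner` helper are hypothetical, and real CODEOWNERS files use glob patterns rather than the simple path prefixes used here.

```ruby
# Hypothetical, simplified ownership rules: [path prefix, owner].
RULES = [
  ["app/models/user", "@github/users-team"],
  ["app/models/issue", "@github/issues-team"],
  ["app/controllers/issues", "@github/issues-team"],
].freeze

# Groups changed paths by the first rule whose prefix matches.
# Paths with no matching rule are collected under "unowned".
def group_by_owner(paths)
  paths.group_by { |path|
    rule = RULES.find { |prefix, _owner| path.start_with?(prefix) }
    rule ? rule.last : "unowned"
  }
end

changed = [
  "app/models/user.rb",
  "app/models/issue.rb",
  "app/controllers/issues_controller.rb",
  "lib/util.rb",
]

group_by_owner(changed).each do |owner, paths|
  puts "#{owner}: #{paths.join(', ')}"
end
```
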
  13. Branch deploys
    https://github.blog/engineering/engineering-principles/deploying-branches-to-github-com/
    1. git branch some_branch
    2. Commit and push work on some_branch
    3. Deploy some_branch to production
    4. Merge some_branch to main
    "main is always production-ready"
  14. If one PR has a bug
    The whole merge group fails
    The next group is delayed (waiting for the rollback, waiting for CI, etc.)
  15. As your application grows, the cost of a bug grows
    Users impacted
    Developers impacted
  16. Feature flags
    if Flipper.enabled?(:some_feature)
    Gates
    100% enabled: true
    Enabled 50% of the time: rand() < 0.5
    "dark shipping"
    0% enabled: false
  17. Feature flags
    if Flipper.enabled?(:some_feature, current_user)
    current_user is the actor; the actor could be any type
    User: @jhawthorn
    Organization: @ruby
    Repository: @ruby/ruby
  18. Feature flags
    if Flipper.enabled?(:some_feature, current_user)
    Gates
    Enable for 10% of users: crc(actor.id) % 10 == 0
    Enable "staff" group: actor.staff?
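
The actor gates above can be sketched in plain Ruby. This is not Flipper's implementation: `Actor` and `ActorGate` are hypothetical names used only to show why hashing the actor id gives each actor a stable answer, unlike the `rand()` gate, which can change on every call.

```ruby
require "zlib"

# Hypothetical actor type with the two things the slide's gates rely on:
# an id and a staff? predicate.
Actor = Struct.new(:id, :staff) do
  def staff?
    staff == true
  end
end

# Hypothetical gate combining a "staff" group with a percentage of actors.
class ActorGate
  def initialize(percentage: 0, staff: false)
    @percentage = percentage
    @staff = staff
  end

  def enabled?(actor)
    return true if @staff && actor.staff?

    # CRC32 of the id is deterministic, so the same actor always
    # lands in the same bucket between 0 and 99.
    Zlib.crc32(actor.id.to_s) % 100 < @percentage
  end
end

gate = ActorGate.new(percentage: 10, staff: true)
staffer = Actor.new(1, true)
visitor = Actor.new(2, false)
puts gate.enabled?(staffer)
puts gate.enabled?(visitor) == gate.enabled?(visitor)
```
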
  19. ENV "Feature flags"
    require "socket"
    require "zlib"
    hostname = Socket.gethostname
    hostname_rand = Zlib.crc32(hostname) / 2.0 ** 32
    if hostname_rand < ENV["ENABLE_YJIT_PCT"].to_f / 100.0
      RubyVM::YJIT.enable
    end
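
The hostname check on this slide can be pulled into a small predicate, which makes the behaviour easier to test: each host maps to a fixed point in [0, 1), so raising the percentage only ever adds hosts to the rollout. The helper name `host_in_rollout?` is hypothetical; `ENABLE_YJIT_PCT` is the variable from the slide.

```ruby
require "socket"
require "zlib"

# Maps the hostname to a stable fraction in [0, 1) and compares it with
# the rollout percentage. A missing variable (nil) counts as 0%.
def host_in_rollout?(hostname, percentage)
  hostname_rand = Zlib.crc32(hostname) / 2.0**32
  hostname_rand < percentage.to_f / 100.0
end

enabled = host_in_rollout?(Socket.gethostname, ENV["ENABLE_YJIT_PCT"])
puts "YJIT rollout includes this host: #{enabled}"
```
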
  20. Scientist
    users_orig = original_code(args)
    users_new = new_code(args)
    unless users_new == users_orig
      report_error "users didn't match"
    end
    return users_orig
  21. Scientist
    https://github.com/github/scientist
    science "users" do |e|
      e.use { original_code(args) }
      e.try { new_code(args) }
    end
    use: Runs every time. Result returned
    try: Controlled by feature flag. Result compared to "use" block
    Reports metrics of both!
    Mismatches recorded
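
To make the behaviour listed on this slide concrete, here is a minimal plain-Ruby stand-in. It is not the scientist gem's implementation: the `Experiment` class, its `enabled:` and `reporter:` options, and the `Observation` struct are hypothetical. It only illustrates the contract: the "use" block always runs and its value is returned, the "try" block runs only when enabled, both are timed, and a mismatch or exception in "try" is recorded without affecting the caller.

```ruby
# Minimal stand-in for an experiment; NOT the scientist gem's implementation.
class Experiment
  Observation = Struct.new(:name, :matched, :control_seconds, :candidate_seconds)

  def initialize(name, enabled: true, reporter: ->(observation) {})
    @name = name
    @enabled = enabled   # stands in for the feature flag on the slide
    @reporter = reporter # receives timings and the match result
  end

  def use(&block)
    @control = block
  end

  def try(&block)
    @candidate = block
  end

  def run
    control_value, control_seconds = timed(&@control)
    return control_value unless @enabled && @candidate

    begin
      candidate_value, candidate_seconds = timed(&@candidate)
      matched = candidate_value == control_value
    rescue StandardError
      matched = false # a failing candidate must never break the caller
    end
    @reporter.call(Observation.new(@name, matched, control_seconds, candidate_seconds))
    control_value
  end

  private

  def timed
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    value = yield
    [value, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
  end
end

def science(name, **options)
  experiment = Experiment.new(name, **options)
  yield experiment
  experiment.run
end

observations = []
users = science("users", reporter: ->(o) { observations << o }) do |e|
  e.use { ["alice", "bob"] }
  e.try { ["alice"] }
end
puts users.inspect
puts observations.first.matched
```
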
  22. Trilogy
    • MySQL-compatible database client
    • Fast!
    • No dependency on libmysqlclient/libmariadb
    • Automatic casting
    • Embedding friendly
    https://github.blog/open-source/maintainers/introducing-trilogy-a-new-database-adapter-for-ruby-on-rails/
  23. DB Scaling - Replication
    class ApplicationRecord < ActiveRecord::Base
      self.abstract_class = true
      connects_to database: { writing: :primary, reading: :primary_replica }
    end
  24. DB Scaling - Replication
    ActiveRecord::Base.connected_to(role: :reading) do
      User.find(123)
    end
    Queries the replica: SELECT * FROM users WHERE id = 123
    Available since Rails 6.0.0! https://github.com/rails/rails/pull/34052 by @eileencodes
  25. "Read your own write"
    Primary / Replicas (< 1s delay)
    logged in: Write in last 1s? yes / no
  26. "Read your own write" w/ freno
    Primary / Replicas
    logged in: Is data replicated? yes / no
    98% / 2%
    50x scale!
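
The time-based version of "read your own write" can be sketched as a role choice: a user who wrote within the replica delay window reads from the primary, and everyone else reads from a replica. The helper `role_for` and the constant `REPLICA_WINDOW_SECONDS` are hypothetical names; the 1 second window and the `:writing` / `:reading` roles come from the slides. The freno variant is not sketched here, since the slides only say it asks whether the data has been replicated.

```ruby
# Window during which a replica may not yet have the user's own write.
REPLICA_WINDOW_SECONDS = 1.0

# Picks the connection role for a read, given when the current user
# last wrote (nil if they have not written in this session).
def role_for(last_write_at, now: Time.now)
  return :reading if last_write_at.nil?

  (now - last_write_at) < REPLICA_WINDOW_SECONDS ? :writing : :reading
end

now = Time.now
puts role_for(nil, now: now)
puts role_for(now - 0.2, now: now)
puts role_for(now - 30, now: now)
```

In a Rails application the chosen role could then be passed to `ActiveRecord::Base.connected_to(role: ...)`, as on the replication slide.
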
  27. (mostly) Default in Rails 7.1+
    # Hack: assume all queries are retryable
    def raw_execute(*args, **kwargs)
      super(*args, **kwargs.merge(allow_retry: true))
    end
  28. After a certain number of failures: Disallow requests
    circuit_breaker = CircuitBreaker.get("key")
    if circuit_breaker.allow_request?
      begin
        # do something expensive
        circuit_breaker.success
      rescue => e
        circuit_breaker.failure
        # do fallback
      end
    else
      # do fallback
    end
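
The slide uses `CircuitBreaker.get`, `allow_request?`, `success`, and `failure` without showing the class behind them. A minimal sketch of such a class follows, assuming a simple consecutive-failure threshold and a cooldown after which requests are allowed again. The `failure_threshold`, `cooldown`, and `clock` options are hypothetical, and this single-process version is not thread-safe.

```ruby
# Minimal circuit breaker sketch; the options are assumptions,
# only the four method names come from the slide.
class CircuitBreaker
  @registry = {}

  # Returns the shared breaker for a key, creating it on first use.
  def self.get(key, **options)
    @registry[key] ||= new(**options)
  end

  def initialize(failure_threshold: 5, cooldown: 30.0,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @failure_threshold = failure_threshold
    @cooldown = cooldown
    @clock = clock
    @failures = 0
    @opened_at = nil
  end

  # Closed: always allow. Open: allow again once the cooldown has passed.
  def allow_request?
    return true if @failures < @failure_threshold

    @clock.call - @opened_at >= @cooldown
  end

  def success
    @failures = 0
    @opened_at = nil
  end

  def failure
    @failures += 1
    @opened_at = @clock.call if @failures >= @failure_threshold
  end
end

breaker = CircuitBreaker.get("key", failure_threshold: 2)
2.times { breaker.failure }
puts breaker.allow_request?
```

Allowing a request after the cooldown gives the protected system a chance to prove it has recovered; one more failure re-opens the breaker immediately.
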
  29. class ResilientTrilogy
        def query(...)
          if @circuit_breaker.allow_request?
            begin
              ret = @trilogy.query(...)
              @circuit_breaker.success
              ret
            rescue ConnectionError => e
              @circuit_breaker.failure
              raise
            end
          else
            raise CircuitOpenError
          end
        end
      end
  30. No fallback, but allows systems to recover
    class ResilientTrilogy
      def query(...)
        if @circuit_breaker.allow_request?
          begin
            ret = @trilogy.query(...)
            @circuit_breaker.success
            ret
          rescue ConnectionError => e
            @circuit_breaker.failure
            raise
          end
        else
          raise CircuitOpenError
        end
      end
    end