Kasa Hsiao
August 29, 2025

Challenges and Strategies in Self-hosting Canvas LMS - Expanded Part for RubyJam

Slides presented in the 2nd half of Ruby Jam 2025-08.

Within the Ruby community, Canvas LMS is one of the most longstanding and actively maintained open-source learning management systems: https://github.com/instructure/canvas-lms

During this Ruby Jam session, engineers from NTUCOOL shared:

Part One:
(Slides here)
• The reason why they chose to self-host the open-source Canvas LMS
• The system architecture of Canvas LMS with LTI integration

Part Two:
• Distinct upstream implementations in Canvas LMS and their approaches to adapting them

Transcript

  1. > puts NtuCool.self_host(canvas_lms).expanded_talks[:ruby_jam]

    ⚠ We sincerely appreciate Instructure’s commitment to maintaining such a valuable open source EdTech project. However, we have encountered certain challenges that may make it difficult for new contributors to get involved. By sharing our experiences and suggestions, we hope to help lower these barriers and enable more stakeholders to participate and advance EdTech together.
    Author.where(name: ['Kasa', 'Steven'])
    Date.new(2025, 8, 26)
  2. Pitfalls.each

    • 🐷 LOC
    • 💎 Gemfiles
    • 💾 Schema & Migrations
    • 🌏 I18n
    • 🌿 Git Branching Strategy
    Some uncommon (at least to us!) implementations in this Rails-based project 🚂 that you might find ‘interesting’... github.com/instructure/canvas-lms
    I’ll keep this simple and focus on the real impacts we’re seeing. We don’t have all the answers yet, so I’d really appreciate your thoughts and ideas!
  3. 🐷 LOC

    You should get a similar result with:
    scc \
      --include-ext rb,ts,js,jsx,tsx,erb,haml \
      --exclude-dir node_modules,vendor,test,spec,tmp,log,coverage,dist,docs \
      --no-cocomo
  4. 🐷 LOC

    > 1.7M LOC (excluding tests). Not just large overall: some files exceed 4,500 LOC each.
    ➡ Impacts?
    Dev Experience:
    • Dev tools (AI-assisted or not) may hit limits or run less efficiently.
    • Searching overhead.
    ♾ CI/CD:
    • Longer static analysis (e.g., SonarQube) and build times.
  5. 🐷 LOC

    > 1.7M LOC (excluding tests). Not just large overall: some files exceed 4,500 LOC each.
    ➡ Impacts?
    Dev Experience:
    • Dev tools (AI-assisted or not) may hit limits or run less efficiently.
    • Searching overhead.
    ♾ CI/CD:
    • Longer static analysis (e.g., SonarQube) and build times.
    Why don’t you refactor? …We’ll come back to that later.
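To find which files cross a size threshold like the one mentioned above, a small script can help. This is only a sketch: `oversized_files` is a hypothetical helper, it counts raw lines (not code lines as scc does), and the demo runs against a temporary directory rather than a Canvas checkout.

```ruby
require "find"
require "tmpdir"

# Lists files under `root` with more than `threshold` raw lines, largest first.
# Raw line counts include blanks and comments, so they will differ from scc's LOC.
def oversized_files(root, threshold:, extensions: %w[.rb .ts .js .jsx .tsx .erb .haml])
  results = []
  Find.find(root) do |path|
    next unless File.file?(path) && extensions.include?(File.extname(path))

    lines = File.foreach(path).count
    results << [path, lines] if lines > threshold
  end
  results.sort_by { |_, lines| -lines }
end

# Demo on throwaway files; point `root` at a real checkout to use it for real.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "big.rb"), "x = 1\n" * 5000)
  File.write(File.join(dir, "small.rb"), "x = 1\n" * 10)
  oversized_files(dir, threshold: 4500).each do |path, lines|
    puts "#{lines} #{File.basename(path)}"
  end
end
```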
  6. 💎 Gemfiles

    You might think of eval_gemfile. …Yes, they do use it. But the real point is the extensive custom logic built around it. They even go so far as to ‘tweak’ Bundler itself.
  7. 💎 Gemfiles

    These are some possible intentions behind the design (we won’t go into detail today):
    • Custom plugin management and enforcement
    • Dynamic and conditional loading of Gemfiles via eval_gemfile
    • Overriding Bundler methods (such as gem and source) to enable vendoring, pinning, and custom sources
    • Rigorous checks to prevent plugins from introducing unintended dependencies
    • Support for multiple Rails versions and plugin sets through custom lockfile logic
  8. 💎 Gemfiles

    ➡ Impact? In short, the mechanics of bundle install here involve extensive custom logic. Make sure you understand that logic when you first start working in this codebase, especially if you need to add or change gems or versions (for example, to address security or compatibility requirements).
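To make the idea of wrapping the `gem` method more concrete, here is a minimal sketch. It does not use Bundler and is not Canvas's code: `PluginGemfile`, `ALLOWED_GEMS`, `eval_fragment`, and `UnexpectedDependency` are hypothetical names, and the gem versions are made up. It only illustrates how an overridden `gem` can pin versions and reject dependencies a plugin was not supposed to add.

```ruby
# Toy stand-in for a Gemfile DSL; NOT Bundler and NOT Canvas's implementation.
class UnexpectedDependency < StandardError; end

class PluginGemfile
  # Hypothetical allowlist: gems that plugins may declare, with pinned versions.
  ALLOWED_GEMS = { "nokogiri" => "1.16.0", "rack" => "3.0.8" }.freeze

  attr_reader :declared

  def initialize
    @declared = {}
  end

  # Evaluate a Gemfile fragment in the context of this object,
  # similar in spirit to what eval_gemfile does for a real Gemfile.
  def eval_fragment(source)
    instance_eval(source)
    self
  end

  # The overridden `gem`: reject unknown gems and force the pinned version,
  # ignoring whatever requirement the fragment asked for.
  def gem(name, *_requirements)
    pinned = ALLOWED_GEMS.fetch(name) do
      raise UnexpectedDependency, "plugin tried to add #{name}"
    end
    @declared[name] = pinned
  end
end

gemfile = PluginGemfile.new.eval_fragment(%(gem "rack", "~> 2.0"))
puts gemfile.declared.inspect
```

With this kind of wrapper, a fragment that asks for `rack "~> 2.0"` still ends up with the pinned version, which is why a plain edit to a Gemfile may not behave the way you expect in such a codebase.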
  9. 💾 Schema & Migrations

    • No, there’s no schema.rb. The project is designed to run all existing migrations when building the database for the first time (their migration squashing strategy will be discussed later).
    • Yes, we still have the familiar and convenient ActiveRecord, which relies on the `SHOW FULL FIELDS` SQL query and a schema_cache. However:
      • Rails uses its own conventional implementation for schema caching.
      • Canvas, on the other hand, has developed its own schema_cache logic based on the cache store (in our case, Redis).
    • Canvas schema_cache data flow: Database → Redis → Process Memory
  10. 💾 Schema & Migrations

    Mechanism in LoadAccount.check_schema_cache:
    → Has the schema already been loaded?
      • Yes: return.
      • No: try to load a valid schema_cache from MultiCache (Redis).
    → Is there schema cache data in Redis?
      • Yes: load it.
      • No: query the schema from the database and write it into Redis.
  11. 💾 Schema & Migrations

    When is LoadAccount.check_schema_cache called? One scenario is during service startup initialization:
    → During the database migration process, the schema_cache is cleared.
    → Then, during initialization, LoadAccount.check_schema_cache queries the latest schema from the database and writes it into Redis.
    ➡ Impact? Ensure that the service is connected to the cache store (Redis) when running the DB migration script in your CI/CD pipeline. Otherwise, the schema_cache will not be updated, and you may not notice the issue until a user encounters a NoMethodError.
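The lookup order and the stale-cache risk described above can be modelled in a few lines. This is a toy sketch, not the Canvas implementation: `SchemaCacheSketch` is a hypothetical class, a plain Hash stands in for Redis, and a lambda stands in for the database query.

```ruby
# Toy model of the flow Database -> cache store -> process memory.
class SchemaCacheSketch
  attr_reader :db_queries

  def initialize(cache_store:, database:)
    @cache_store = cache_store # stand-in for MultiCache (Redis)
    @database = database       # callable returning the current schema
    @loaded = nil              # per-process copy
    @db_queries = 0
  end

  def check_schema_cache
    return @loaded if @loaded # already loaded in this process

    cached = @cache_store["schema_cache"]
    if cached
      @loaded = cached # found in the cache store
    else
      @db_queries += 1 # fall back to the database and fill the cache store
      @loaded = @database.call
      @cache_store["schema_cache"] = @loaded
    end
    @loaded
  end
end

redis = {}
schema_v1 = { "users" => %w[id name] }
first = SchemaCacheSketch.new(cache_store: redis, database: -> { schema_v1 })
first.check_schema_cache

# A second process reuses the cached schema without touching the database.
second = SchemaCacheSketch.new(cache_store: redis, database: -> { raise "no DB access expected" })
second.check_schema_cache

# Failure mode: a migration added a column, but the cached entry was never
# cleared (for example, the migration step could not reach the cache store).
schema_v2 = { "users" => %w[id name email] }
stale = SchemaCacheSketch.new(cache_store: redis, database: -> { schema_v2 })
puts stale.check_schema_cache["users"].include?("email") # prints false
```

In the last step the process still sees the old column list, which is the kind of mismatch that would only surface later as a NoMethodError.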
  12. 💾 Schema & Migrations

    The “Squash Event” as we define it:
    • Migration files accumulate over time, so Canvas has been periodically cleaning them up since 2024.
    • Every time a squash event happens, some migration files in db/migrate disappear.
    Consider…
    • No schema.rb
    • (And the upstream Git branching strategy we’ll mention later)
  13. 💾 Schema & Migrations

    For example:
    • Target version: 2024-05-22
    • Current version: 2023-09-27
    Between them, two squash events occurred: 2024-03-27 and 2024-04-10.
    → We must first ensure that all the squashed migration files have been executed before we can safely upgrade to 2024-05-22.
    Based on our investigation, after the first major squash event in March 2024, Canvas has continued performing squash events every few months.
    [Timeline: current ver → Squash Event → Squash Event → target ver]
  14. 💾 Schema & Migrations

    ➡ Impact? If one or more squash events occurred between the current and target versions…
    → Run the migrations at a version before each squash event, so that no migration is missed! 💣
    How? There are two main signs that a squash event has occurred:
    1. The number of files in db/migrate decreases.
    2. In *_validate_migration_integrity.rb, the value of last_squashed_migration_version changes.
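The upgrade rule above can be turned into a small planning helper. This is a sketch under assumptions: `squash_events_between` and `upgrade_plan` are hypothetical helpers, the squash dates have to be collected by hand (for example from the two signs listed above), and the sample dates are the ones from the example in this talk.

```ruby
require "date"

# Returns the squash events that fall after `current` and up to `target`.
def squash_events_between(current, target, squash_events)
  from = Date.iso8601(current)
  to = Date.iso8601(target)
  squash_events.map { |d| Date.iso8601(d) }
               .select { |d| d > from && d <= to }
               .sort
end

# Builds an ordered list of steps: stop before each squash event,
# run migrations there, then continue to the target version.
def upgrade_plan(current, target, squash_events)
  steps = squash_events_between(current, target, squash_events).map do |event|
    "migrate on the last release before #{event}"
  end
  steps << "migrate on #{target}"
end

plan = upgrade_plan("2023-09-27", "2024-05-22", %w[2024-03-27 2024-04-10])
puts plan
```

For the example versions this yields three steps: one stop before each of the two squash events, then the target itself.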
  15. 🌏 I18n

    Complexity of… Translation Source Files:
    • Diverse sources/formats of translation files
    • Large number of translation entries (> 32,000 lines per file)
    • Translation quality may not meet UX requirements: duplicated values, and some entries may even be incomprehensible to native speakers.
  16. 🌏 I18n

    Complexity of… I18n Mechanisms:
    • Multiple translation syntaxes implemented on the frontend
    • I18nliner key integration
    Credit: 0711kps
  17. 🌏 I18n

    Multiple translation syntaxes implemented on the frontend:
    • handlebars.js
    • jQuery or vanilla.js or backbone, similar to Rails
    • React
    Credit: 0711kps
  18. 🌏 I18n

    In Rails, using path key:
      I18n.t('key_of_an_umbrella') # => '一把傘'
      I18n.t('tools.key_of_three_umbrella') # => '三把傘'
    In Canvas, using I18nliner key:
      CanvasI18n.t('an umbrella') # => '一把傘'
    Be careful: a ‘one layer path key’ will be treated as a ‘liner key’, so in a view t(...) != I18n.t(...) 🤯
      CanvasI18n.t('key_of_an_umbrella') # => 'key_of_an_umbrella'
      CanvasI18n.t('tools.key_of_three_umbrella') # => '三把傘'
    I18nliner integration:
      # Given yml file
      an_umbrella_{an auto-gen hash}: 一把傘
      key_of_an_umbrella: 一把傘
      tools:
        key_of_three_umbrella: 三把傘
    Credit: 0711kps
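The behaviour above can be reproduced with a toy lookup. This is a sketch and not CanvasI18n or I18nliner itself: `toy_t` and `liner_key` are hypothetical helpers, and the key scheme (slug of the English text plus a CRC32 suffix) is an assumption used only to stand in for the auto-generated hash.

```ruby
require "zlib"

# Toy translation table mirroring the yml shown in the slide.
TRANSLATIONS = {
  "key_of_an_umbrella" => "一把傘",
  "tools" => { "key_of_three_umbrella" => "三把傘" }
}

# Assumed key scheme: slug of the English text plus a CRC32 suffix.
def liner_key(text)
  slug = text.downcase.gsub(/[^a-z0-9]+/, "_")
  "#{slug}_#{Zlib.crc32(text).to_s(16)}"
end

# Register the auto-generated entry for the inline text 'an umbrella'.
TRANSLATIONS[liner_key("an umbrella")] = "一把傘"

def toy_t(key)
  if key.include?(".")
    # Dotted keys are resolved as nested path keys.
    TRANSLATIONS.dig(*key.split(".")) || key
  else
    # Anything without a dot is treated as inline English text (a liner key),
    # so a one-layer path key is never looked up directly.
    TRANSLATIONS.fetch(liner_key(key), key)
  end
end

puts toy_t("an umbrella")
puts toy_t("key_of_an_umbrella")
puts toy_t("tools.key_of_three_umbrella")
```

The second call falls through to the key itself because the lookup hashes the text instead of reading the `key_of_an_umbrella` entry, which matches the surprise shown in the slide.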
  19. 🌏 I18n

    ➡ Impact?
    • High demand for translation modification, but specific entries are hard to find.
    • Even when found, consistent maintenance is difficult.
    • Frequent conflicts occur during upgrades.
    (Status quo: zh-Hant is not on the crowdsourced list on Transifex and seems to be supported through another service provider. Recent language request threads on the forum are not active.)
  20. 🌿 Git Branching Strategy

    The Canvas LMS Git tree uses treesame commits → completely replaced:
    • 4 and 6 are exactly the same.
    • 5 and 7 are exactly the same.
    • In other words, 7 will not contain the changes from 6 (4).
    [Diagram: commits 3 to 7, with labels 05-24, 11-03, prod, “treesame commit of 05-24” and “treesame commit of 11-03”]
    Credit: woolninesun
  21. 🌿 Git Branching Strategy

    [Diagram: commits 3 to 7, with labels 05-24, 11-03, prod, “treesame commit of 05-24” and “treesame commit of 11-03”]
    Credit: woolninesun
  22. 🌿 Git Branching Strategy

    We were naive and used ‘merge’ …
    • Conflicts unrelated to our changes (conflicts between 6 and 7)
    • Parts without conflicts 💣: our release ends up as a mixed version of 05-24 and 11-03.
    …Things might seem fine at first, but major issues are only discovered later by QA.
    [Diagram labels: commits 3 to 7, 05-24, 11-03, prod, v11.0.0, v12.0.0]
    Credit: woolninesun
  23. 🌿 Git Branching Strategy

    Now we go for ‘rebase’ on their stable/YYYY-MM-DD branch …
    ➡ Impact?
    • Branch manipulation overhead…
    • More consistent feature evolution.
    • Better understanding of the estimated workload of each upgrade.
    [Diagram labels: commits 3, 4, 6, 05-24, prod; “Historical modifications clarified”, “Historical modifications to be clarified”, “Modifications after switching to rebasing”, “Our stable ver head”]
    Credit: woolninesun
  24. Pitfalls.each

    • 🐷 LOC
    • 💎 Gemfiles
    • 💾 Schemas & Migrations
    • 🌏 i18n
    • 🌿 Git Branching Strategy
    There’re more! 🥲
    We’d love to hear from you if you happen to have experience working with this OSS project! So far the only one I personally know is Aleks! 🤝
  25. > slides.last == 'Thank You!' # => true