Scaling Magento by COPIOUS Inc.

Kyle Terry
October 18, 2013

Reid Parham, Aaron Edmonds, and Kyle Terry talk about what it takes to scale Magento for a large retailer.

Aaron talks about how to work around some core Magento-isms. Kyle gives an overview of the hardware cluster powering the whole system. Reid goes on to explain the people and team aspects of working on such a large and complex system.

Transcript

  1. Scaling Magento
    Reid Parham, Aaron Edmonds, and Kyle Terry
    Public distribution: sensitive information omitted.

  2. COPIOUS
    ● User-Centered Digital Experience Agency
    ● Strategy
    ● Experience
    ● Engineering
    ● http://copio.us/about

  3. Scale Your Code
    A.K.A. Magento is hard

  4. Code Management
    ● Magento is big!
    ○ Our project has over 820,000 lines of PHP
    ● Multi-lingual, multi-currency, multi-store
    ● Classes can have complex names
    ○ *cough*
    Enterprise_Reward_Block_Adminhtml_Customer_Edit_Tab_Reward_History_Grid_Column_Renderer_Reason
    *cough*

  5. Code Management (cont.)
    ● Configuration is driven by XML
    ● The dreaded EAV
    ● Magento Indices
    ● Event-Observer

  6. Code Management (Tools)
    Good tools make the job easier!
    ● A good IDE
    ○ Magicento
    ● Commerce Bug 2
    ● n98-magerun

  7. Code Management
    ● NEVER modify core files
    ○ Magento’s forum never helped
    ● NEVER* add files to app/code/local/Mage
    ○ Magento was built to be modular**
    ● Test your code with flat catalog enabled
    and disabled
    ● Before overwriting classes, check for events

  8. Code Optimization (Quick Wins)
    Caching Magento Blocks
    ● DIY! Event to add cache data:
    core_block_abstract_to_html_before
    ● OR use a module: https://github.com/aligent/CacheObserver

  9. Code Optimization (Quick Wins)
    Mage::getModel('catalog/product')->load($_product->getId());
    ● This is bad in templates and when looping
    over product collections
    ● Instead, load the needed attributes in the
    collection's initial select
    ○ used_in_product_listing attribute option
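
A minimal sketch of the difference, not Magento code: Python and an in-memory SQLite table stand in for PHP and MySQL, and every name here (`make_catalog`, `CountingConnection`, the `product` table) is hypothetical. It counts the SELECT statements issued by a per-item load inside a loop versus a single initial select that already includes the needed column.

```python
import sqlite3


def make_catalog(n=50):
    """Build an in-memory table standing in for a product catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO product (id, name, price) VALUES (?, ?, ?)",
        [(i, f"Product {i}", 10.0 + i) for i in range(1, n + 1)],
    )
    return conn


class CountingConnection:
    """Thin wrapper that counts the SELECT statements sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.selects = 0

    def query(self, sql, params=()):
        self.selects += 1
        return self.conn.execute(sql, params).fetchall()


def names_per_item_load(db):
    """Anti-pattern: fetch the ids, then load each row again inside the loop."""
    ids = [row[0] for row in db.query("SELECT id FROM product ORDER BY id")]
    names = []
    for product_id in ids:
        rows = db.query("SELECT name FROM product WHERE id = ?", (product_id,))
        names.append(rows[0][0])
    return names


def names_initial_select(db):
    """Preferred: request the needed column in the initial select."""
    rows = db.query("SELECT id, name FROM product ORDER BY id")
    return [row[1] for row in rows]


slow = CountingConnection(make_catalog(50))
fast = CountingConnection(make_catalog(50))
slow_names = names_per_item_load(slow)
fast_names = names_initial_select(fast)
print(slow.selects, fast.selects)  # 51 1
```

Both functions return the same names; only the number of round trips differs, and that gap grows with the size of the collection.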

  10. Code Optimization
    Make efficient use of Magento indices
    ● Example: Catalog URL Rewrites
    ○ Includes all products by default (including products
    marked as “Not Visible Individually”)
    ○ Do you need SEO friendly URLs for products that
    will never be seen???
    ○ Reduce your index size by up to 95%
    ○ Mage_Catalog_Model_Resource_Url::_getProducts
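
A rough illustration of where a reduction of that size can come from, assuming a hypothetical catalog in which each configurable product has 19 simple children marked "Not Visible Individually". The visibility values mirror Magento 1's `Mage_Catalog_Model_Product_Visibility` constants; the `Product` class, the SKUs, and the `rewrite_candidates` helper are invented for the sketch and say nothing about how the authors changed `_getProducts`.

```python
from dataclasses import dataclass

# Visibility values as defined in Magento 1.
NOT_VISIBLE_INDIVIDUALLY = 1
CATALOG = 2
SEARCH = 3
CATALOG_SEARCH = 4


@dataclass
class Product:
    sku: str
    visibility: int


def rewrite_candidates(products, skip_hidden):
    """Return the products that would receive a URL rewrite row."""
    if not skip_hidden:
        return list(products)  # default behaviour: every product is indexed
    return [p for p in products if p.visibility != NOT_VISIBLE_INDIVIDUALLY]


# Hypothetical catalog: 100 configurables, each with 19 hidden simple children.
catalog = []
for n in range(100):
    catalog.append(Product(f"config-{n}", CATALOG_SEARCH))
    for child in range(19):
        catalog.append(Product(f"config-{n}-simple-{child}", NOT_VISIBLE_INDIVIDUALLY))

before = len(rewrite_candidates(catalog, skip_hidden=False))
after = len(rewrite_candidates(catalog, skip_hidden=True))
reduction_percent = 100 * (before - after) // before
print(before, after, reduction_percent)  # 2000 100 95
```

The actual saving depends on the ratio of hidden to visible products in a given catalog.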

  11. Code Optimization (Quick Wins)
    Mage_Catalog_Model_Resource_Product_Type_Configurable_Product_Collection::isEnabledFlat?
    FALSE

  12. Systems
    ● Hardware Profile
    ● Cluster Design
    ● Scaling

  13. Hardware Profile (overview)
    ● 2 racks of hardware and dozens of servers
    ● Highest-quality available (and compatible)
    chipsets and memory
    ● Buffered DDR3; 1 channel per CPU
    ● 126 kW of stable, reliable, redundant, and
    backed up power
    ● Minor kernel tweaks

  14. Hardware Profile (network)
    ● NetScaler for load balancing
    ○ Vserver pools
    ○ Balances web, database, admin and endeca
    ○ Monitors will remove downed hosts
    ● Redundant Network Infrastructure
    ○ Backplane uses LACP (link aggregation) for
    redundancy, load balancing and failover
    ○ HA pairing of configurations

  15. Hardware Profile (network)
    Dynamic port forwarding for browsing:
    kyle@localhost $ ssh -L 2221:127.0.0.1:2221 whitelistedhost.example.com
    kyle@whitelistedhost $ ssh -D 2221 cluster.example.com
    Static port forwarding for Navicat SSH tunneling (tunneling through a tunnel):
    kyle@localhost $ ssh -L 2222:127.0.0.1:2222 whitelistedhost.example.com
    kyle@whitelistedhost $ ssh -L 2222:127.0.0.1:22 cluster.example.com

  16. Hardware Profile (web)
    ● Dual Intel Xeon E3-1230 @ 3.30GHz
    ● 32 GB RAM
    ● Dozens of servers
    ● nginx and PHP5-FPM
    ● 6:1 ratio of PHP processes to CPU cores
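
A back-of-the-envelope sketch of what the 6:1 ratio implies per web host. The 32 GB of RAM and the ratio come from the slide; the core count of 8 and the 4 GB held back for the OS and nginx are assumptions made only for the arithmetic, and `fpm_worker_budget` is a hypothetical helper. The resulting worker count is the kind of figure one might use for PHP-FPM's `pm.max_children`.

```python
def fpm_worker_budget(cpu_cores, ram_gb, ratio=6, reserved_gb=4):
    """Return (worker_count, MB of RAM available per worker).

    ratio: PHP processes per CPU core (6 comes from the slide).
    reserved_gb: RAM held back for the OS, nginx, etc. (an assumption).
    """
    workers = cpu_cores * ratio
    mb_per_worker = (ram_gb - reserved_gb) * 1024 // workers
    return workers, mb_per_worker


# Assumed: 8 cores per web host; the slide gives 32 GB RAM and the 6:1 ratio.
workers, mb_per_worker = fpm_worker_budget(cpu_cores=8, ram_gb=32)
print(workers, mb_per_worker)  # 48 597
```

If a typical request needs more memory than the per-worker figure, the ratio or the reserve would need adjusting.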

  17. Hardware Profile (database)
    ● Redundant database hosts
    ● MySQL 5.6 chosen for scaling capability
    ● tcmalloc further improves throughput
    ● Master/slave replication
    ● Standby hosts for warm failover
    ● Failure point: > 4,000 checkouts/hour

  18. Hardware Profile (database)
    ● Quad Intel Xeon E7-2860
    ○ 10 cores each, with hyperthreading, totalling 80 threads
    ● 128 GB of RAM
    ● RAID10 SSDs for data
    ○ writeback cache; noatime,noexec mount options
    ● RAID1 HDDs for OS

  19. Hardware Profile (cache)
    ● Powering discrete instances of Redis
    ○ Sessions
    ○ Full page cache
    ○ Magento back end cache
    ○ Background processing queues
    ● Discrete instances are for threading, differing
    memory limits, differing backup rules, and
    multi-db deprecation

  20. Hardware Profile (cache)
    ● Content is compressed with LZF
    ○ Compression and decompression with LZF is faster
    than gzip so it’s an ideal solution
    ● Decreased utilization of network capacity
    ● Sentinel for failover (soon)
    ● RDB BGSAVE: prime number intervals
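
The slide does not say why prime intervals were chosen. One plausible reading is that they keep the discrete Redis instances from running BGSAVE at the same moment. The sketch below tests that reading with made-up intervals in minutes; the interval lists and function names are assumptions, not the authors' configuration.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def minutes_until_all_coincide(intervals):
    """First minute at which every instance is due to save at once."""
    return reduce(lcm, intervals)


def coinciding_minutes(intervals, horizon):
    """Count minutes in 1..horizon where two or more instances save together."""
    count = 0
    for minute in range(1, horizon + 1):
        due = sum(1 for interval in intervals if minute % interval == 0)
        if due >= 2:
            count += 1
    return count


round_intervals = [10, 20, 30, 60]   # hypothetical "tidy" schedule
prime_intervals = [11, 13, 17, 19]   # hypothetical prime schedule

one_day = 24 * 60
print(minutes_until_all_coincide(round_intervals))  # 60
print(minutes_until_all_coincide(prime_intervals))  # 46189
print(coinciding_minutes(round_intervals, one_day),
      coinciding_minutes(prime_intervals, one_day))
```

With the tidy schedule all four instances save together every hour; with the prime schedule that happens roughly once a month, and pairwise overlaps are rarer as well.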

  21. Compression Outcomes

  22. Hardware Profile (cache)
    ● Quad Intel Xeon E5-2620 @ 2.00GHz
    ● 128 GB of RAM
    ● 4 bonded network interfaces
    ○ Prevents saturation of private network
    ○ 4 Gb/s
    ○ Bonding mode 5 (balance-tlb)
    ■ No special switch support
    ■ Nice when the colo manages the switch

  23. Hardware Profile (utility)
    ● Cron and systems jobs
    ● Scripts
    ● Deploys
    ● Chef Server 10 for deploy and configuration
    ● Tests
    ○ Database test suite in Perl (Test::DatabaseRow)
    ● Backups (and copies)

  24. Cluster Overview
    ● Production
    ○ Most hardware serves production
    ● Staging
    ○ Some data promoted to production nightly
    ● Preview{1..n}
    ○ Instances for testing and previewing new features,
    bug fixes and design changes.

  25. Production Uptime
    ● Aggregate hardware availability exceeds
    six nines (99.9999%)
    ● Software availability is ~99.999%
    ● Software, including deployments: 99.98%
    ● Software, including maintenance: 99.9%
    ● Non-recoverable human errors: 98%
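
To put those percentages in concrete terms, this short sketch converts each availability figure from the slide into the downtime it allows over a 365-day year. The helper names are hypothetical, and the result is plain arithmetic rather than measured outage data.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def downtime_seconds_per_year(availability_percent):
    """Seconds of downtime per year permitted by an availability percentage."""
    return (100.0 - availability_percent) / 100.0 * SECONDS_PER_YEAR


def humanize(seconds):
    """Format a duration using the largest unit that fits."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.2f} {unit}"
    return f"{seconds:.1f} seconds"


figures = [
    ("Aggregate hardware", 99.9999),
    ("Software", 99.999),
    ("Software, including deployments", 99.98),
    ("Software, including maintenance", 99.9),
    ("Non-recoverable human errors", 98.0),
]

for label, percent in figures:
    print(f"{label}: {percent}% -> {humanize(downtime_seconds_per_year(percent))}")
```

Six nines leaves about half a minute per year, while 99.9% leaves a little under nine hours.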

  26. Scale Your Team

  27. Team Profile
    ● 16 committers; 8.25 FTE
    ● 4 Project Managers
    ● 5 departments
    ● 31 vendors
    ● 5 time zones

  28. Team Values
    ● State your needs; respect others’
    ● Respect is given, then adjusted
    ● Process can always change and improve
    ● Work/life balance
    ● Mature and non-aggressive; mediate conflict
    ● Honesty and transparency

  29. Team Mantras
    ● Trust (relevant) data; make things visible
    ● Measurable, repeatable, falsifiable
    (scientific method)
    ● Redundancy reduces risks (if documented)
    ● Set expectations (timing, contents, formats)
    and deliver on them

  30. Team Mantras
    ● Automate what is repeated
    ● Use known patterns and
    proven architectures
    ● Grow talent from within
    ● Compartmentalization of some data,
    code, and knowledge

  31. 10 Integrated Vendors
    Adobe, Akamai, tax calculation,
    legacy software, eBay, gift cards,
    ERP (fulfillment and inventory), Oracle,
    Tierpoint (Dallas, Seattle, Spokane),
    Endeca provider

  32. 21 Accessory Vendors
    advertising, application analytics, email,
    hardware analysis and functionality, maps,
    offsite storage, promotions, payment gateways,
    remarketing, shipping estimates, SMS,
    social networks, uptime

  33. Effective Communication
    ● Group emails: avoid general questions,
    assign actions to people, minimize
    distribution lists
    ● Identify urgency of requests
    ● Use email filters
    ● Coach and mentor

  34. Effective Communication
    ● Daily phone calls: only while needed
    ● Set an agenda; keep to a schedule
    ● Encourage people to skip calls
    or to leave early
    ● End the call when completed

  35. Tools
    ● GitHub
    ● Google Docs
    ● Pivotal Tracker
    ● Conference calls, Skype, and IM
    ● BugHerd

  36. QA preparation

  37. Off-hours chaos

  38. Build Knowledge
    ● Document the “obvious”
    ● 1000-line README
    ● Capture failures and solutions
    ● What happens when?
    ● Which database and server?

  39. Automation Schedule

  40. “This is how we work.”

  41. Example Git Workflow

  42. Learn from previous failures.

  43. Code Review
    ● Standardize pull request structures
    ● Constructive feedback; ask questions
    ● emoji-cheat-sheet.com

  44. Code Review
    Pull requests can also be workspaces

  45. Releases and Git flow: rhythm, ownership, and pride.

  46. Deployments
    ● Monday through Thursday only!
    ● Communication: tickets, cross references,
    pull requests, QA status, and releases
    ● Set expectations: timings for outages,
    maintenance, and degraded functionality
    ● Are we done yet?
    ● Explain outcomes and options

  47. Community Participation
    ● Patches submitted
    ○ Redis
    ○ Cm_RedisSession
    ○ Cm_Cache_Backend_Redis
    ○ https://github.com/magento/magento2
    ● Modules improved
    ○ CacheObserver
    ○ VF_CustomMenu

  48. Community Participation
    ● http://magento.stackexchange.com/
    ● http://stackoverflow.com/
    ● phpredis bug(s)

  49. Collaboration Texts
    ● Spence, Muneera U. Collaborative
    Processes lecture. 13 Apr. 2006.
    ● Marks, Andrea. "The Role of Writing in a
    Design Curriculum." AIGA: Design Education
    (2004).
    ● Katzenbach, Jon R., and Douglas K. Smith.
    The Wisdom of Teams. HarperCollins, 2003.

  50. Collaboration Texts
    ● Bennis, Warren, and Patricia W. Biederman.
    Organizing Genius. Perseus, 1997.
    ● Marcum, James W. After the Information
    Age. Peter Lang, 2006.
    ● https://en.wikipedia.org/wiki/Collaboration
    (and collaborative method)

  51. See Also
    GitHub (and Gist)
    @parhamr
    @kyleterry
    @aedmonds
