Sustainable Small Architecture

We will look at the sorts of problems startups face and how those problems affect the applications they build. We will cover tools that can assist with technical decisions and look at what a maintainable product and business might look like. We will also discuss decisions some companies made in an effort to scale and grow, and reflect on how successful those decisions were.

Robbie Clutton

March 08, 2013

Transcript

  1. @robb1e how to lean on others to get stuff done

    Software Engineer, Pivotal Labs Robbie Clutton Startup Architecture
  2. @robb1e how to lean on others to get stuff done

    Software Engineer, Pivotal Labs Robbie Clutton Startup Architecture
  3. @robb1e with Rails applications Software Engineer, Pivotal Labs Robbie Clutton

    Startup Architecture
  4. @robb1e Simple, small web application Service oriented RESTful/pubsub/LMAX/P2P distributed architecture

    Gray area
  5. @robb1e Our codebase is 5 years old and too hard

    to change
  6. @robb1e We’ve allowed our design to evolve into a big

    ball of mud
  7. @robb1e We’ll probably create services at some point, might as

    well start there
  8. @robb1e I’m going to design everything up front based on

    unvalidated assumptions
  9. @robb1e Kent Beck Make it work, make it right, make

    it fast
  10. @robb1e • Features that have hypotheses • Hypotheses that can

    be easily validated • Code that is always production ready • Code that is easy to change Goals
  11. @robb1e Creating sustainable small architectures Software Engineer, Pivotal Labs Robbie

    Clutton Startup Architecture
  12. @robb1e Real stories from colleagues and myself

  13. @robb1e Names have been changed to protect the innocent

  14. @robb1e Some stories are pre-production, others are in production

  15. @robb1e Crazy Egg Story

  16. @robb1e 10am deploy CrazyEgg

  17. @robb1e 5pm review CrazyEgg

  18. @robb1e Users clicking headers that are not links

  19. @robb1e You could feel the users’ frustration

  20. @robb1e Simple user testing can pay dividends Lesson

  21. @robb1e • CrazyEgg.com • UserTesting.com • SilverbackApp.com • LeanLaunchLab.com •

    Trello.com Tools
  22. @robb1e Funnels, user testing, hypotheses and validations Story

  23. @robb1e Product with wizard-like pages which pre-selected default services

  24. @robb1e Changes to the basket update the price in real time

  25. @robb1e Funnel showed massive drop off at a certain step

  26. @robb1e In-person user testing to discover why the drop

    off was occurring
  27. @robb1e Create hypothesis to stop users leaving at this junction

  28. @robb1e Implement change: allow users to use default or create

    own
  29. @robb1e Review funnel after deployment

  30. @robb1e Learn what is blocking the users Lesson

  31. @robb1e • KissMetrics.com • StatsD (Etsy) • Cube (Square) Tools

  32. @robb1e Always be validating Take away
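
The funnel review in this story comes down to simple arithmetic: compare how many users reach each step and find the transition that loses the largest share. The sketch below is illustrative only. The step names and counts are hypothetical, not figures from the talk, and a hosted funnel tool would normally do this calculation for you.

```ruby
# Hypothetical funnel counts: users who reached each step of a wizard.
FUNNEL = [
  ["landing",        1000],
  ["choose_service",  800],
  ["basket",          300],
  ["checkout",        270],
].freeze

# Returns, for each transition, the fraction of users lost between two steps.
def drop_offs(funnel)
  funnel.each_cons(2).map do |(from, from_count), (to, to_count)|
    lost = from_count.zero? ? 0.0 : (from_count - to_count).fdiv(from_count)
    { from: from, to: to, lost: lost }
  end
end

# The transition that loses the largest share of users.
def worst_step(funnel)
  drop_offs(funnel).max_by { |step| step[:lost] }
end

worst = worst_step(FUNNEL)
puts format("%s -> %s loses %.1f%% of users", worst[:from], worst[:to], worst[:lost] * 100)
```

Running the same calculation before and after a change is one way to check whether a hypothesis about the drop-off held up.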

  33. @robb1e You’re gonna need a bigger boat Story

  34. @robb1e Dave walks into a new job START-UP 3.0

  35. @robb1e We need more RAM for the database Product manager

    tells you
  36. @robb1e Product manager tells you This report takes 20 minutes

    to run.
  37. @robb1e Hmm, ok

  38. @robb1e There are no indexes

  39. @robb1e

  40. @robb1e No primary or foreign keys

  41. @robb1e Needed more RAM so the whole database could fit

    in memory
  42. @robb1e Dave cleans up a bit, report now takes 10

    seconds to run
  43. @robb1e Use tools to discover simple mistakes Lesson

  44. @robb1e Passing tests don’t imply production quality Bonus

  45. @robb1e • Rails Best Practices • SQL Explain • NewRelic.com

    Tools
  46. @robb1e Instrument, refactor, repeat Story

  47. @robb1e Dave walks into a new job START-UP 3.0

  48. @robb1e Client moving from ColdFusion to Ruby

  49. @robb1e Yes, there are people still using ColdFusion

  50. @robb1e Ruby is slow and we’re going to production next

    week. Product manager tells you
  51. @robb1e Product manager tells you We’ve made a terrible mistake...

  52. @robb1e You say... Hold on a minute, let’s take a

    look
  53. @robb1e Instrument to find slow requests/queries

  54. @robb1e Refactor slowest query until more performant with green tests

  55. @robb1e Rinse and repeat until performance has improved enough

  56. @robb1e Paul Hammond, 2012 Every scaling story: 1. Find the

    biggest problem 2. Fix the biggest problem 3. Repeat
  57. @robb1e ‘Friday afternoon’ performance refactoring can build upon itself Bonus

  58. @robb1e • NewRelic.com • CodeClimate.com • Emma, FindBugs Tools

  59. @robb1e Use tools to discover improvements Take away
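
The "instrument, refactor, repeat" loop can be sketched without any hosted tool. The example below is a minimal timing recorder, assuming you can wrap the suspect code in a block. The `Instrumenter` class and the labels are made up for illustration, and the `sleep` calls stand in for real queries. A product such as New Relic collects far more detail than this.

```ruby
# Minimal timing recorder; the class and label names are illustrative only.
class Instrumenter
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @timings = Hash.new { |hash, label| hash[label] = [] }
  end

  # Runs the block, records how long it took under +label+, returns its result.
  def measure(label)
    started = @clock.call
    yield
  ensure
    @timings[label] << (@clock.call - started)
  end

  # Labels ordered by total time spent, slowest first.
  def slowest(limit = 3)
    @timings
      .map { |label, samples| [label, samples.sum] }
      .sort_by { |_, total| -total }
      .first(limit)
  end
end

instrumenter = Instrumenter.new
instrumenter.measure("users#index query")  { sleep 0.02 } # stand-in for a query
instrumenter.measure("reports#show query") { sleep 0.05 } # stand-in for a query

instrumenter.slowest(1).each do |label, total|
  puts format("%-20s %.3fs", label, total)
end
```

The slowest label is the next candidate for refactoring. Measure again after the change, with the tests still green, before moving on to the next one.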

  60. @robb1e Distributed cache Story

  61. @robb1e Website was growing and gaining visitors

  62. @robb1e Scaling strategy was to add app servers

  63. @robb1e Each server had the web app and a local

    cache
  64. @robb1e Spinning up a new server meant more pressure on

    the database
  65. @robb1e Using a distributed cache bought the team time to

    make improvements
  66. @robb1e Caching can buy significant performance improvements Lesson

  67. @robb1e • MemcacheD.org • Varnish-Cache.org • Squid-Cache.org Tools
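
To see why each new app server added pressure on the database, the sketch below counts database reads under both setups. It is a toy simulation: a plain Hash stands in for a shared cache such as memcached, and the `Database` and `AppServer` classes are hypothetical.

```ruby
# Stand-in for the database; counts how many reads reach it.
class Database
  attr_reader :reads

  def initialize
    @reads = 0
  end

  def fetch(key)
    @reads += 1
    "value-for-#{key}"
  end
end

# An app server that checks a cache before falling back to the database.
class AppServer
  def initialize(database, cache)
    @database = database
    @cache = cache
  end

  def get(key)
    @cache.fetch(key) { @cache[key] = @database.fetch(key) }
  end
end

# Local caches: every new server starts cold and hits the database itself.
def reads_with_local_caches(server_count, key)
  database = Database.new
  servers = Array.new(server_count) { AppServer.new(database, {}) }
  servers.each { |server| server.get(key) }
  database.reads
end

# Shared cache: all servers look in the same place before the database.
def reads_with_shared_cache(server_count, key)
  database = Database.new
  shared = {}
  servers = Array.new(server_count) { AppServer.new(database, shared) }
  servers.each { |server| server.get(key) }
  database.reads
end

puts reads_with_local_caches(5, "homepage") # one read per server
puts reads_with_shared_cache(5, "homepage") # one read in total
```

With local caches the database load grows with the number of servers. With a shared cache a newly started server benefits from entries that other servers have already populated.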

  68. @robb1e To cache, or not to cache? Story

  69. @robb1e Sometimes code speaks to you Yo.

  70. @robb1e This part is slow, let’s cache it. Problem solved

  71. @robb1e But I’m going to invalidate that elsewhere

  72. @robb1e Collection of widgets being rendered with new and old

    design
  73. @robb1e Can’t replicate on staging or locally

  74. @robb1e Clear ALL the cache

  75. @robb1e Changing the template had not invalidated the entry

  76. @robb1e - Phil Karlton "There are only two hard things

    in Computer Science: cache invalidation and naming things."
  77. @robb1e Caching can obscure poorly written code Bonus

  78. @robb1e Be careful what you cache Take away
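
One way to avoid the stale-widget problem in this story is to make the template part of the cache key, so that editing the template changes the key. The sketch below assumes a tiny in-memory cache and a made-up `{{id}}` placeholder syntax. It is not the code from the project described, only an illustration of the idea.

```ruby
require "digest"

# Tiny in-memory fragment cache. The key includes a digest of the template
# source, so editing the template changes the key and old entries are skipped.
class FragmentCache
  attr_reader :renders

  def initialize
    @store = {}
    @renders = 0
  end

  def render(widget_id, template)
    key = "widget/#{widget_id}/#{Digest::MD5.hexdigest(template)}"
    @store.fetch(key) do
      @renders += 1
      @store[key] = template.gsub("{{id}}", widget_id.to_s)
    end
  end
end

cache = FragmentCache.new
old_template = "<div class='old'>{{id}}</div>"
new_template = "<div class='new'>{{id}}</div>"

cache.render(1, old_template)      # rendered and cached
cache.render(1, old_template)      # served from cache
puts cache.render(1, new_template) # new key, so the new design is rendered
```

Entries stored under the old key are never read again but stay in the store, so a real cache still needs eviction or expiry.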

  79. @robb1e Non-essential work during a request Story

  80. @robb1e User registration stopped working

  81. @robb1e Mailing list provider was down

  82. @robb1e Exception bubbled up and prevented registering new user

  83. @robb1e Put mailing list subscription in background job

  84. @robb1e Shorten the request/response cycle Lesson

  85. @robb1e When dealing with integrations, some healthy paranoia is a

    good thing Bonus
  86. @robb1e • Background workers • Message Queues • Threads Tools
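
The registration fix can be sketched with a queue and a worker thread from Ruby's standard library. The `Registration` and `FlakyMailingList` classes are hypothetical. A production setup would more likely use a background job library and retry failed jobs, but the shape is the same: essential work happens in the request, non-essential work is queued.

```ruby
# Hypothetical mailing list client that is currently down.
class FlakyMailingList
  def subscribe(email)
    raise IOError, "mailing list provider unavailable"
  end
end

class Registration
  attr_reader :users, :failed_jobs

  def initialize(mailing_list, jobs: Queue.new)
    @mailing_list = mailing_list
    @jobs = jobs
    @users = []
    @failed_jobs = []
  end

  # Essential work happens inline; the subscription is only enqueued.
  def register(email)
    @users << email
    @jobs << email
    true
  end

  # Run by a background worker. Failures are recorded, not raised to the user.
  def work_off
    until @jobs.empty?
      email = @jobs.pop
      begin
        @mailing_list.subscribe(email)
      rescue StandardError => error
        @failed_jobs << [email, error.message]
      end
    end
  end
end

registration = Registration.new(FlakyMailingList.new)
registration.register("dave@example.com")

worker = Thread.new { registration.work_off }
worker.join

puts registration.users.inspect
puts registration.failed_jobs.inspect
```

The user is registered even though the provider is down, and the failed subscription is kept so that it can be retried later.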

  87. @robb1e A tale of two websites Story

  88. @robb1e www.guardian.co.uk 125 requests 1.2MB HTML: 3.7s Loaded: 8.4s

  89. @robb1e m.guardian.co.uk 44 requests 340KB HTML: 1.68s Loaded: 3.32s

  90. @robb1e That’s not the result of better SQL or server

    optimizations
  91. @robb1e Result of highly tuned client-side JavaScript and CSS

  92. @robb1e No (large) JavaScript libraries

  93. @robb1e Not even jQuery

  94. @robb1e Conditional loading of secondary content

  95. @robb1e - Steve Souders, 2007 “Optimize front-end performance first, that's

    where 80% or more of the end-user response time is spent”
  96. @robb1e • Firebug • Chrome Developer Tools • Compass •

    YSlow • YUI Compressor Tools
  97. @robb1e Perceived performance is more important than actual performance Take

    away
  98. @robb1e Was that really the best use of your time?

    Story
  99. @robb1e During technical due diligence for an acquisition

  100. @robb1e The company had built their own message queue

  101. @robb1e No persistence

  102. @robb1e Didn’t use a standard protocol like AMQP

  103. @robb1e Not explicitly sending a terminating character would eventually result

    in the queue crashing
  104. @robb1e Almost all transactions passed through this queue

  105. @robb1e Not buying a message queue company

  106. @robb1e - Joel Spolsky, 2001 "If it's a core business

    function - do it yourself, no matter what."
  107. @robb1e Time is the most expensive outgoing Bonus

  108. @robb1e Real-time vs near-time Story

  109. @robb1e Trading system which updates users’ screen every 10 seconds

  110. @robb1e Lots of number crunching and message queues

  111. @robb1e Did some in-the-field research

  112. @robb1e Traders only checked values every few minutes

  113. @robb1e This was not highly volatile trading http://www.boldjack.com/wp-content/uploads/2012/01/wall_street4.jpg

  114. @robb1e Removed message queues and moved to publishing updates to

    web server directly
  115. @robb1e Reduced complexity of the product

  116. @robb1e Ron Jeffries, ~2005 Always implement things when you actually

    need them, never when you just foresee that you need them
  117. @robb1e ‘Real-time’ can mean different things depending on who you

    talk to Bonus
  118. @robb1e Buy vs build Story

  119. @robb1e $50 a month is really expensive for this hosted

    service
  120. @robb1e We can build it ourselves and get exactly the

    features we need
  121. @robb1e Can you build the widget service yourself in that

    time?
  122. @robb1e Are you in the widget business?

  123. @robb1e Francis Hwang, 2012 The biggest expense for a startup

    is your time. Not your laptop, not your hosting bill, not your office, but the hours in your day.
  124. @robb1e Focus on your differentiators Bonus

  125. @robb1e Over-engineering is a form of waste Take away

  126. @robb1e Horizontal Scalability Story

  127. @robb1e Guardian Content API is read only and eventually consistent

  128. @robb1e Used by m., iPhone app, parts of www. and

    more
  129. @robb1e Just a simple API over an indexed data store

  130. @robb1e Each server has its own data store

  131. @robb1e Each data store is a replica of an internal

    master
  132. @robb1e Simple, elegant design can prevent complex architecture creep Lesson

  133. @robb1e • Solr • Elasticsearch • MongoDB Tools

  134. @robb1e Emergency mode Story

  135. @robb1e Use of feature switches at Guardian enables ‘super happy

    fun mode’
  136. @robb1e Turn features off when site under increased load

  137. @robb1e Content is king and must be readable at all

    times
  138. @robb1e Page pressing enables zero downtime and last fallback

  139. @robb1e Feature flags can offer resilience as well as a

    way to roll out new features Lesson
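
A feature switch with an emergency mode can be very small. The sketch below assumes each feature is marked as essential or not, and that emergency mode leaves only the essential ones on. The feature names and the `FeatureSwitches` class are invented for illustration and are not the Guardian's implementation.

```ruby
# Feature switches with an emergency mode. Feature names are made up.
class FeatureSwitches
  # features: { name => { enabled: true/false, essential: true/false } }
  def initialize(features)
    @features = features
    @emergency = false
  end

  def emergency!(on = true)
    @emergency = on
  end

  # Unknown or disabled features are off. In emergency mode only
  # essential features stay on.
  def on?(name)
    feature = @features[name]
    return false if feature.nil? || !feature[:enabled]

    !@emergency || feature[:essential] == true
  end
end

switches = FeatureSwitches.new({
  article_body:    { enabled: true,  essential: true },
  comments:        { enabled: true,  essential: false },
  related_content: { enabled: true,  essential: false },
  new_gallery:     { enabled: false, essential: false }
})

switches.emergency!
puts switches.on?(:article_body) # true
puts switches.on?(:comments)     # false
```

The same switch that hides a feature under load can also keep an unfinished feature dark until it is ready to roll out.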
  140. @robb1e Complex should be lots of simple Take away

  141. @robb1e Allow architecture to evolve Spend your time wisely Refactor

    continuously
  142. @robb1e Sandi Metz “The wrong abstraction is far more damaging

    than no abstraction at all. Waiting trumps guessing every time”
  143. @robb1e Q/A