5 ways we screwed up Microservices

Lessons learned while adopting Microservices at Yammer

Gonzalo Maldonado

April 23, 2015

Transcript

  1. Hi! My name is Gonzalo and we're going to go over some challenges we had while adopting an SOA architecture.
  2. * We didn't use circuit breakers at the start, so we didn't have back pressure.
     * We had unbounded queues (which can cause RabbitMQ to run out of memory).
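
To make the two points on this slide concrete, here is a minimal sketch of a circuit breaker and of a bounded queue that pushes back on producers. It is not the code used at Yammer: the names `CircuitBreaker`, `CircuitOpenError`, and `try_enqueue`, and the thresholds, are hypothetical, and the in-process `queue.Queue` only stands in for a broker queue with a configured length limit.

```python
import queue
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Hypothetical breaker: opens after max_failures consecutive errors."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock              # injectable so tests can control time
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling more work onto a sick dependency.
                raise CircuitOpenError("circuit open, failing fast")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def try_enqueue(q, item):
    """Return False when the bounded queue is full instead of growing it."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False
```

A caller that gets `False` from `try_enqueue`, or a `CircuitOpenError` from the breaker, can shed load or return a degraded response, which is the back pressure the slide says was missing.
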
  3. * We thought all services had the same storage requirements and created a shared AP storage that could get saturated by a single service...
     * We are not a db company, so our custom sharding/routing logic wasn't the best.
     * Our consistent hashing didn't spread partitions enough and created hotspots.
     * Sharing storage also complicates tracking state changes and makes database migrations a nightmare.
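
One common way to reduce the hotspots mentioned on this slide is to give each physical node many positions ("virtual nodes") on the hash ring. The sketch below illustrates that general technique under assumed names (`HashRing`, `node_for`, `load_by_node`, the `vnodes` parameter); it is not a reconstruction of the custom routing logic the talk describes.

```python
import bisect
import hashlib
from collections import Counter


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node owns `vnodes` points, so its share of the ring
        # is the sum of many small arcs instead of one large one.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


def load_by_node(ring, keys):
    """Count how many of the given keys land on each node."""
    return Counter(ring.node_for(key) for key in keys)
```

Comparing `load_by_node` for `vnodes=1` against a few hundred virtual nodes on a sample of real keys is a quick way to see how uneven a placement is before it reaches production.
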
  4. * JSON is awesome and lightweight… But it's schema-less, so you can't deprecate old endpoints easily and you will need to create new endpoints for adding new required fields.
     * Integration testing is vital.
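
Because JSON carries no schema, each endpoint version has to state its own contract somewhere. As a rough illustration, the sketch below checks required fields per version at the service boundary; the field names, the `REQUIRED_FIELDS` table, and `validate_message` are all made up for this example and do not come from the talk.

```python
import json

# Hypothetical contract: version 2 adds a new required field, which is why
# it has to be a new endpoint version rather than a change to version 1.
REQUIRED_FIELDS = {
    1: {"id", "body"},
    2: {"id", "body", "thread_id"},
}


def validate_message(raw: str, version: int) -> dict:
    """Parse a JSON object and reject it if required fields are missing."""
    payload = json.loads(raw)
    missing = REQUIRED_FIELDS[version] - set(payload)
    if missing:
        raise ValueError(f"v{version} payload is missing: {sorted(missing)}")
    return payload
```

A check like this is also a natural thing to exercise from integration tests, since it fails loudly when a client and a service disagree about the payload.
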
  5. * We created some mini-monoliths.
     * Model dependencies are bad; service dependencies are ten times worse (SLA degrades exponentially as your dependencies increase). If you have two services at 99.99% availability, now you are down to 99.95%. (Note: These numbers are made up, but sometimes it is that bad.)
     * Graceful degradation should not be an afterthought.
     * We didn't enforce strict boundaries, which created dependencies on service internals. This creates fragile services.
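
The slide flags its own figures as made up, so here is a small worked calculation of how availability compounds. It assumes that a request needs every dependency to be up and that failures are independent, which real systems often violate (correlated failures, retries, and timeouts can make the result worse). The function names are illustrative.

```python
def serial_availability(availabilities):
    """Availability of a request that needs every listed service to be up.

    Assumes independent failures, so the probabilities multiply.
    """
    total = 1.0
    for availability in availabilities:
        total *= availability
    return total


def downtime_minutes_per_year(availability):
    """Expected minutes of unavailability in a 365-day year."""
    return (1.0 - availability) * 365 * 24 * 60
```

Under these assumptions, two services at 99.99% multiply out to 0.9999 × 0.9999 = 0.99980001, roughly 99.98%, and every additional dependency multiplies in another factor below one, which is the exponential decay the slide refers to.
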
  6. * We didn't have consistent metrics.
     * Our dashboards were slow.
     * We didn't have aggregated logs.
     * Tracing issues across services is hard.
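
One widely used way to make cross-service tracing easier is to attach a request ID at the edge and pass it along on every downstream call and log line. The sketch below assumes an `X-Request-ID` header and helper names (`with_request_id`, `log_line`) chosen for illustration; the talk does not say which approach was eventually adopted.

```python
import uuid

REQUEST_ID_HEADER = "X-Request-ID"  # assumed header name, a common convention


def with_request_id(incoming_headers):
    """Return headers for a downstream call, reusing or creating a request ID."""
    headers = dict(incoming_headers)
    if not headers.get(REQUEST_ID_HEADER):
        # First service in the chain: mint a new ID.
        headers[REQUEST_ID_HEADER] = uuid.uuid4().hex
    return headers


def log_line(service, headers, message):
    """Format a log line that aggregated logs can be filtered on by request."""
    return f"request_id={headers[REQUEST_ID_HEADER]} service={service} {message}"
```

With aggregated logs in place, searching for a single `request_id` value then shows the path one request took across services.
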
  7. And watch this Big Ruby talk by Brian Morton.
     https://www.youtube.com/watch?v=GuJ49PNBsn8