
Urbanesia - Development History


Slide deck for Business Connect - 29 October 2012

Indonesia Ministry of Tourism & Creative Economy

Batista Harahap

October 29, 2012


Transcript

  1. PROS
     • Data structures in MySQL
     • Effective memory caching implementations
     • Effective SEO implementations
     • Effective search server implementations
     • Urbanesia is successfully consumed as a directory
  2. CONS
     • No effective separation of backend & frontend web applications
     • Source code = spaghetti code
     • Storing low value, high volume data in MySQL
     • Many queries using GROUP BY on highly populated tables
     • After a warm boot, any page takes 20+ seconds to generate
     • Difficult to scale horizontally & vertically
     • Very low concurrency
     • The product's identity is weak
     • So many features left unused by users
  3. WHAT WE LEARNED
     • Do NOT use MySQL as session storage
     • Use a NoSQL database for low value, high volume data
     • Separate backend & frontend web applications, create APIs for backends
     • Use output caching where available
     • When using PHP-APC, make sure apc.stat = 0
     • Increase concurrency by reverse proxying requests to Apache
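The output caching idea on this slide can be sketched as follows. This is a minimal Python illustration, not CodeIgniter's own caching API: the `OutputCache` class, its `get_or_render` method and the `render_home` function are all hypothetical names, and a real deployment would likely keep the cache in Memcache or on disk instead of a process-local dict.

```python
import time
from typing import Callable, Dict, Tuple


class OutputCache:
    """Caches fully rendered pages in memory, keyed by URL path (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # path -> (expiry timestamp, rendered HTML)
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_render(self, path: str, render: Callable[[], str]) -> str:
        now = self.clock()
        cached = self._store.get(path)
        if cached is not None and cached[0] > now:
            return cached[1]  # cache hit: skip queries and templating entirely
        html = render()  # cache miss: do the expensive work once
        self._store[path] = (now + self.ttl_seconds, html)
        return html


render_calls = 0


def render_home() -> str:
    """Stand-in for an expensive page render (assumption for the example)."""
    global render_calls
    render_calls += 1
    return "<h1>Urbanesia</h1>"


cache = OutputCache(ttl_seconds=60)
first = cache.get_or_render("/", render_home)
second = cache.get_or_render("/", render_home)  # served from the cache
```

The second request never reaches `render_home`, which is the effect that matters when crawlers request the same pages repeatedly.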
  4. CHALLENGES
     • Handle Google bot traffic of over 1 TB/month with only 2 servers
     • Do output caching with CodeIgniter
     • Achieve sub-second page generation even in warm boots
     • Redesign the backend by creating an API for our native apps
  5. PROS
     • Achieved sub-second page generation in warm boots
     • Aggressive & effective caching mechanism
     • Optimized MY_Controller
     • Session storage handled by Memcache
     • MySQL read/write access lowered from ~400 qps to only 1 qps
     • Lean memory usage in database server
     • Created an OAuth-enabled API
     • Concurrency increased by using nginx as reverse proxy
     • The same server setup can theoretically handle 10x the current traffic without scaling horizontally
     • Google bots are limited only by bandwidth, not by code efficiency
     • Index properly with MySQL
     • Instead of MySQL, used a custom-built MySQL alternative: Percona Server
  6. CONS
     • Source code = spaghetti code
     • Unpredictable code behavior because of V0 inheritance; as tables fill with more rows, queries become bottlenecks
     • Subqueries still exist
     • Everything is still synchronous, no message queue yet
     • The end product fails to give users the illusion of speed
     • New hires have a steeper learning curve because of the inherited complexity on top of V1's own complexity
     • Still difficult to scale horizontally & vertically
  7. WHAT WE LEARNED
     • CodeIgniter enables fast product delivery, but optimization & efficiency of code are questionable at best
     • Need to enable asynchronous architecture
     • Do not do things in realtime; instead offload to message queues
     • To impress users with the illusion of speed, JavaScript must be thoroughly implemented
     • Emails should not be handled by ourselves; use third party email solutions like AWS SES
     • Offload server side international bandwidth to clients; for Facebook, use the Facebook JS SDK instead of the PHP SDK
     • The product gains more engagement with content that is more focused (thematic)
     • Speed of content delivery is important to engagement metrics
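The "offload to message queues" point can be sketched like this. It is a minimal single-process Python illustration using the standard library `queue` and `threading` modules; the slides do not name a queue product, so `handle_signup`, `email_queue` and the `sent` list are hypothetical, and a production setup would use a real broker and a separate worker process.

```python
import queue
import threading
from typing import List

email_queue: queue.Queue = queue.Queue()
sent: List[str] = []  # stand-in for mail actually handed to a provider


def worker() -> None:
    """Drains the queue in the background; None is the shutdown signal."""
    while True:
        recipient = email_queue.get()
        if recipient is None:
            email_queue.task_done()
            break
        sent.append(recipient)  # the slow work happens here, off the request path
        email_queue.task_done()


def handle_signup(recipient: str) -> str:
    """Request handler: enqueue the job and answer immediately."""
    email_queue.put(recipient)
    return "202 Accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

status = handle_signup("user@example.com")

email_queue.join()     # wait until the queued job has been processed
email_queue.put(None)  # ask the worker to stop
thread.join()
```

The handler returns before the email is sent, so page generation time no longer depends on the slowest side effect.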
  8. CHALLENGES
     • Build a third iteration with a strong identity based on users' personas
     • Focus more on verticals, create the illusion of a discovery/recommendation platform
     • Progressive disclosure of content
     • A JavaScript framework that is light, fast and has minimal dependencies
     • Make everything asynchronous and message/event based
     • Redefine Urbanesia's atomic data structure
     • Do MySQL JOINs on the server side
     • Get the data first FAST, compute later
  9. PRODUCTS & TECHNOLOGIES
     Does the product make the technology, or does the technology make the product?
  10. REAL WORLD EXAMPLES
     • We need to know which part of Urbanesia will really work for users
     • Store the preferences for each user's dynamic activity
     • Make calculations of other content a user might consume
     • Present the content unobtrusively
     • Do it fast and almost realtime
  11. TECHNICAL SPEAK
     We need to know which part of Urbanesia will really work for users
     • Mine all users' data each time they visit, including anonymous users
     • Log everything FAST and asynchronously
     • Low value & high volume data
     • Avoid MySQL at all cost
     • Model data based on the chosen NoSQL database model
  12. TECHNICAL SPEAK
     Introducing Redis
     • Read/write data from memory
     • Stores data on disk
     • Key/value similarity with Memcache
     • Ability to perform atomic tasks without worrying about state
     • Redis' primitive data types are very simple
     • Ideal for low value/high volume data
     • Less is more!
  13. TECHNICAL SPEAK
     Store the preferences for each user's dynamic activity
     • Simple increments
     • Perfect for Redis sorted sets (sorted hashmaps)
     • Need them sorted so that analytics functions are supported natively by Redis == high performance
     • Fire & forget: consider using async frameworks like Node.js & trigger using JavaScript
     • Why trigger with JavaScript? To make sure at the very least that it's actually users accessing the page
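The "simple increments" on a sorted structure can be sketched as below. To stay runnable without a server, this Python example uses a small in-memory stand-in, `SortedSetStore`, that mimics two real Redis sorted set commands, ZINCRBY and ZREVRANGE. The class itself, the key `user:42:prefs` and the category names are assumptions made for illustration; the slides do not show the actual key layout.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class SortedSetStore:
    """In-memory stand-in for the Redis ZINCRBY and ZREVRANGE commands."""

    def __init__(self) -> None:
        self._sets: DefaultDict[str, Dict[str, float]] = defaultdict(dict)

    def zincrby(self, key: str, amount: float, member: str) -> float:
        """Add `amount` to the member's score, creating the member at 0 if needed."""
        scores = self._sets[key]
        scores[member] = scores.get(member, 0.0) + amount
        return scores[member]

    def zrevrange(self, key: str, start: int, stop: int) -> List[Tuple[str, float]]:
        """Members from highest to lowest score, returned with their scores.

        `stop` is inclusive as in Redis; this sketch only supports
        non-negative indexes plus stop == -1 for "until the end".
        """
        ranked = sorted(
            self._sets[key].items(), key=lambda kv: (kv[1], kv[0]), reverse=True
        )
        end = None if stop == -1 else stop + 1
        return ranked[start:end]


store = SortedSetStore()

# Each page view fires one increment for the category the user looked at.
for category in ["coffee", "sushi", "coffee", "bars", "coffee", "sushi"]:
    store.zincrby("user:42:prefs", 1, category)

# The two strongest preferences, already sorted by the store.
top = store.zrevrange("user:42:prefs", 0, 1)
```

Because the store keeps members ordered by score, reading a user's top preferences is a range lookup and needs no GROUP BY or sorting in application code.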
  14. TECHNICAL SPEAK
     Node.js & Socket.io
     • Node.js is a network-ready daemon with Chrome's V8 JavaScript engine inside
     • Node.js is asynchronous by default (event based)
     • Socket.io is the transport used for data
     • Socket.io is abstracted to fall back gracefully between WebSocket, Flash and plain AJAX
     • JavaScript clients should only subscribe to onFailed events to minimize overhead
  15. TECHNICAL SPEAK
     Make calculations of other content a user might consume
     • Use machine learning algorithms to learn users' behaviors
     • Naïve Bayes Classifier to the rescue
     • Assumes each keyword is independent
     • Proven algorithm used by many big websites
  16. TECHNICAL SPEAK
     Naïve Bayes Classifier
     • There are no wrong or right assumptions, only accuracy
     • Accuracy is increased with more data and better classifications
     • Relatively easy to code
     • Lots of libraries out there in different languages
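To show why the classifier is "relatively easy to code", here is a minimal Python sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. It is not Urbanesia's implementation: the `NaiveBayes` class, the labels and the training keywords are all made up for the example.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set


class NaiveBayes:
    """Multinomial Naive Bayes over keywords, with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocabulary: Set[str] = set()

    def train(self, label: str, words: List[str]) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocabulary.update(words)

    def classify(self, words: List[str]) -> str:
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = "", -math.inf
        for label, docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            # log P(label) + sum of log P(word | label); each keyword is
            # treated as independent of the others (the "naive" assumption).
            score = math.log(docs / total_docs)
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = NaiveBayes()
nb.train("food", ["sushi", "ramen", "coffee"])
nb.train("food", ["coffee", "cake"])
nb.train("nightlife", ["bar", "cocktail", "dj"])
nb.train("nightlife", ["bar", "live", "music"])

guess = nb.classify(["coffee", "cake"])
```

Working in log space avoids multiplying many small probabilities together, and the smoothing keeps an unseen keyword from zeroing out a whole category.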
  17. TECHNICAL SPEAK
     Present the content unobtrusively
     • Giving users the illusion that we understand them
     • Do not make this feature dominant
     • Show it where you want the content to look smart
  18. TECHNICAL SPEAK
     Do it fast and almost realtime
     • Fast is an illusion
     • Realtime is overrated
     • If you don't have enough resources to do so, schedule it and pre-generate content
     • Scale vertically
  19. NAÏVE BAYES CLASSIFIER
     First Iteration:
     • Took ~1000 seconds to classify 1 keyword
     • MySQL as storage
     • No micro optimizations
  20. NAÏVE BAYES CLASSIFIER
     Second Iteration:
     • Took ~400 seconds to classify 1 keyword
     • MongoDB as storage
     • Macro optimization trimmed 600 of 1000 seconds
     • No micro optimizations
  21. NAÏVE BAYES CLASSIFIER
     Third Iteration:
     • Took ~1 second to classify 1 keyword
     • Redis as storage
     • Insane macro optimization boost
     • No micro optimizations
  22. NAÏVE BAYES CLASSIFIER
     Fourth Iteration:
     • Took 0.01428 second to classify 1 keyword
     • Redis as storage
     • Reworked classification algorithm
     • Get the data first and compute later
     • More memory usage, faster execution time
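One reading of "get the data first and compute later" is to fetch every count the classifier needs in a single batched request and then do the arithmetic in memory, trading memory for fewer round trips. The slides do not say how the rework was done, so the Python sketch below is only an interpretation. `CountingStore` is a hypothetical in-memory stand-in that counts round trips; its `mget` mirrors the idea of the Redis MGET command, and the `count:` key prefix is an assumption.

```python
from typing import Dict, List


class CountingStore:
    """Stand-in for a remote key/value store that counts round trips."""

    def __init__(self, data: Dict[str, int]) -> None:
        self._data = data
        self.round_trips = 0

    def get(self, key: str) -> int:
        self.round_trips += 1  # one network round trip per key
        return self._data.get(key, 0)

    def mget(self, keys: List[str]) -> List[int]:
        self.round_trips += 1  # one round trip for the whole batch
        return [self._data.get(key, 0) for key in keys]


def score_one_by_one(store: CountingStore, words: List[str]) -> int:
    """Interleaves fetching and computing: a round trip for every keyword."""
    return sum(store.get(f"count:{word}") for word in words)


def score_fetch_first(store: CountingStore, words: List[str]) -> int:
    """Fetches everything up front, then computes purely in memory."""
    counts = store.mget([f"count:{word}" for word in words])
    return sum(counts)


data = {"count:coffee": 3, "count:cake": 1}
words = ["coffee", "cake", "tea"]

slow_store = CountingStore(data)
fast_store = CountingStore(data)
slow_total = score_one_by_one(slow_store, words)
fast_total = score_fetch_first(fast_store, words)
```

Both functions return the same total, but the batched version holds all counts in memory at once and makes a single round trip, which matches the slide's "more memory usage, faster execution time".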
  23. NAÏVE BAYES CLASSIFIER
     Fifth Iteration:
     • Reworked the trainer methods
     • Created deTrain method to update data
     • Created helpers to do keyword blacklists
     • Consistent performance from CLI or HTTP
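A deTrain method and a keyword blacklist helper could look roughly like the Python sketch below. The slides only name these features, so the behavior shown here is an assumption: `detrain` subtracts previously trained keywords so stale data can be corrected, and `filter_keywords` drops blacklisted words before they reach the counts. The function names and the `BLACKLIST` contents are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set

# Hypothetical blacklist of keywords that carry no signal.
BLACKLIST: Set[str] = {"the", "and", "di"}


def filter_keywords(words: Iterable[str], blacklist: Set[str] = BLACKLIST) -> List[str]:
    """Lowercase keywords and drop the blacklisted ones."""
    return [w.lower() for w in words if w.lower() not in blacklist]


def train(counts: Counter, words: Iterable[str]) -> None:
    """Add keyword occurrences for one category."""
    counts.update(filter_keywords(words))


def detrain(counts: Counter, words: Iterable[str]) -> None:
    """Undo an earlier training call for the same keywords."""
    counts.subtract(filter_keywords(words))
    # Remove keywords whose count dropped to zero or below so they
    # no longer take part in classification.
    for word in [w for w, c in counts.items() if c <= 0]:
        del counts[word]


counts: Counter = Counter()
train(counts, ["Coffee", "and", "cake"])
train(counts, ["coffee"])
detrain(counts, ["coffee", "cake"])
```

After the calls above only `coffee` remains, with a count of 1: the blacklisted "and" was never stored and "cake" was removed once its count reached zero.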
  24. NAÏVE BAYES CLASSIFIER
     What we learned:
     • Always be open to new things
     • Geek Talk with peers from the industry
     • Very talented people will always come up with smarter and better ways to do something
     • Decide, get smart or get smarter?
     • Algorithms are the engine, but they don't mean anything without implementation
     • Consider opening up source code for others to examine; the smarter the population, the better the products we create
     • Focus on USERS instead of technology
  25. JAJAN
     Jajan is Open Source, get the source code:
     • Blackberry - https://github.com/Urbanesia/Jajan-Blackberry
     • Android - https://github.com/Urbanesia/Jajan
     • HTML5 - https://github.com/Urbanesia/jajan-html5
     Platforms:
     • Blackberry - https://appworld.blackberry.com/webstore/content/54742/
     • Android - https://play.google.com/store/apps/details?id=com.bango.jajan
     • iOS - https://itunes.apple.com/us/app/jajan/id527278768?mt=8
     • HTML5 - https://jajan5.urbanesia.com/
  26. WHAT'S NEXT
     • A rework from scratch both in product design and technical implementation
     • Focusing more on users and our RICH content
     • A social network useful for everyday city life
     • Machine learning implementation for our recommendation engine
  27. KEY TAKEAWAYS
     • Empower people working with you
     • Invest in company culture
     • Focus on USERS, not technology
     • Macro to micro optimizations & scaling
     • Be open to new ideas (things)
     • Geek Talks over whatever, like basketball or beer
     • Good is not Great
     • Whatever WORKS
  28. THANK YOU
     Email me: bati[email protected]
     Twitter: @tista
     Github: tistaharahap
     Blog: www.bango29.com