web applications • Source Code = Spaghetti Code • Storing low value, high volume data in MySQL • Many queries using GROUP BY on highly populated tables • After a warm boot, any page takes 20+ seconds to generate • Difficult to scale horizontally & vertically • Very low concurrency • The product’s identity is weak • So many features left unused by users
session storage • Use a NoSQL database for low value, high volume data • Separate backend & frontend web applications, create APIs for backends • Use output caching where available • When using PHP-APC, make sure apc.stat = 0 • Increase concurrency by reverse proxying requests to Apache
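The output-caching bullet above can be illustrated with a minimal sketch. It is not the caching layer of any particular framework: the `OutputCache` class, its `get_or_render` method, and the injectable `clock` parameter are hypothetical names chosen for illustration, and a plain dictionary stands in for whatever cache backend is actually available.

```python
import time
from typing import Callable, Dict, Tuple


class OutputCache:
    """Tiny in-memory page cache keyed by URL path (hypothetical helper)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_render(self, path: str, render: Callable[[], str]) -> str:
        now = self.clock()
        hit = self._store.get(path)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip rendering and database work
        html = render()  # expensive path: queries, templates, etc.
        self._store[path] = (now + self.ttl, html)
        return html
```

With a 60-second TTL, a page requested many times per minute is rendered once per minute, which is the kind of reduction in database traffic the slide is aiming for.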
TB/month with only 2 servers • Do output caching with CodeIgniter • Achieving sub-second page generation even in warm boots • Redesign backend by creating an API for our native apps
boots • Aggressive & effective caching mechanism • Optimized MY_Controller • Session storage handled by Memcache • MySQL read/write access lowered from ~400 qps to only 1 qps • Lean memory usage in database server • Created an OAuth-enabled API • Concurrency increased by using nginx as reverse proxy • The same server setup can theoretically handle 10x the current traffic without scaling horizontally • Google bots are limited only by bandwidth, not by inefficient code • Index properly with MySQL • Don’t use stock MySQL; use a custom-built MySQL alternative: Percona Server
Unpredictable code behavior because of V0 inheritance; as tables fill with more rows, queries become bottlenecks • Subqueries still exist • Everything is still synchronous, no message queue yet • The end product fails to give users the illusion of speed • New hires face a steeper learning curve because of the inherited complexity on top of V1’s own complexity • Still difficult to scale horizontally & vertically
delivery but optimization & efficiency of the code are questionable at best • Need to enable asynchronous architecture • Do not do things in real time; instead, offload to message queues • To impress users with the illusion of speed, JavaScript must be thoroughly implemented • Emails should not be handled by ourselves; use third-party email solutions like AWS SES • Offload server-side international bandwidth to clients; for Facebook, use the Facebook JS SDK instead of the PHP SDK • The product gains more engagement with content that is more focused (thematic) • Speed of content delivery is important to engagement metrics
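The advice to offload work to message queues can be sketched as follows. This is a single-process illustration that uses Python's standard `queue` and `threading` modules in place of a real message broker; `handle_signup`, `start_worker`, and the `sent` list are hypothetical names, and appending to a list stands in for the slow operation (for example, handing an email to a third-party service).

```python
import queue
import threading

jobs = queue.Queue()  # stand-in for a real message queue
sent = []             # records what the worker processed (illustration only)


def worker() -> None:
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        sent.append(f"email to {job['to']}")  # the slow work happens here
        jobs.task_done()


def start_worker() -> threading.Thread:
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def handle_signup(email: str) -> str:
    """Request handler: enqueue the slow work and respond immediately."""
    jobs.put({"to": email})
    return "202 Accepted"
```

The request handler returns as soon as the job is enqueued, so the response time no longer depends on how long the email delivery takes.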
identity based on users’ personas • Focus more on verticals, create the illusion of a discovery/recommendation platform • Progressive Disclosure of content • A JavaScript framework that is light and fast, with minimal dependencies • Make everything asynchronous and message/event based • Redefine Urbanesia’s atomic data structure • Do MySQL JOINs on the server side • Get the data first FAST, compute later
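One reading of the last two bullets is that rows are fetched with simple, fast queries and then combined in application code rather than with a SQL JOIN. Under that assumption, a minimal sketch looks like this; the `reviews` and `users` row shapes and the `join_in_app` function are hypothetical and not taken from Urbanesia's schema.

```python
from typing import Dict, List


def join_in_app(reviews: List[dict], users: List[dict]) -> List[dict]:
    """Combine two simple result sets in memory (an application-side hash join)."""
    users_by_id: Dict[int, dict] = {u["id"]: u for u in users}  # build the lookup once
    joined = []
    for review in reviews:
        user = users_by_id.get(review["user_id"])
        if user is None:
            continue  # same effect as an INNER JOIN: drop unmatched rows
        joined.append({**review, "user_name": user["name"]})
    return joined
```

Each input list would come from a single-table query that can use an index or a cache, and the combination step costs one dictionary lookup per row.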
part of Urbanesia will really work for users • Store preferences from each user’s dynamic activity • Calculate other content a user might consume • Present the content unobtrusively • Do it fast and in near real time
Urbanesia will really work for users • Mine all users’ data each time they visit, including anonymous users • Log everything FAST and asynchronously • Low value & high volume data • Avoid MySQL at all costs • Model data based on the chosen NoSQL database model
from memory • Stores data on disk • Key/value store, similar to Memcache • Ability to perform atomic operations without worrying about state • Redis’ primitive data types are very simple • Ideal for low value/high volume data • Less is more!
activity • Simple increments • Perfect for Sorted Hashmaps in Redis • Need them sorted so that analytics functions are supported natively by Redis == High Performance • Fire & Forget: consider using async frameworks like Node.js & trigger using JavaScript • Why trigger with JavaScript? To make sure, at the very least, that it’s actually users accessing the page
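The increment-and-keep-sorted pattern above can be sketched in memory, assuming the slide's "Sorted Hashmaps" refers to Redis sorted sets. The `ActivityCounter` class below is a hypothetical stand-in that mimics the behavior of the Redis commands `ZINCRBY` (add to a member's score) and `ZREVRANGE ... WITHSCORES` (read members by descending score); it does not talk to a Redis server, and the key and member names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ActivityCounter:
    """In-memory imitation of a Redis sorted set used as a per-user counter."""

    def __init__(self) -> None:
        # key -> member -> score
        self._scores: Dict[str, Dict[str, float]] = defaultdict(dict)

    def zincrby(self, key: str, amount: float, member: str) -> float:
        """Add `amount` to the member's score, creating it at 0 if missing."""
        new_score = self._scores[key].get(member, 0.0) + amount
        self._scores[key][member] = new_score
        return new_score

    def top(self, key: str, n: int) -> List[Tuple[str, float]]:
        """Return the n highest-scoring members, highest first."""
        items = self._scores.get(key, {}).items()
        # Descending by score, then by member name, as ZREVRANGE orders ties.
        ranked = sorted(items, key=lambda kv: (kv[1], kv[0]), reverse=True)
        return ranked[:n]
```

In a real deployment each page view would issue one increment and return immediately, and the sorted order would be maintained by Redis rather than recomputed on read as it is here.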
is a network-ready daemon with Chrome’s V8 JavaScript engine inside • Node.js is asynchronous by default (event based) • Socket.io is the transport used for data • Socket.io is abstracted to fall back gracefully between WebSocket, Flash and plain AJAX • JavaScript clients should only subscribe to onFailed events to minimize overhead
might consume • Use Machine Learning algorithms to learn users’ behaviors • Naïve Bayes Classifier to the rescue • Independent per-keyword assumptions • Proven algorithm used by many big websites
is no wrong or right assumptions, only accuracy • Accuracy is increased with more data and better classifications • Relatively easy to code • Lots of libraries out there in different languages
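To make the Naïve Bayes idea concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier over keywords. It is a textbook formulation with add-one (Laplace) smoothing, not the classifier used in the talk; the `NaiveBayes` class and the example category names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, Optional


class NaiveBayes:
    """Multinomial Naive Bayes: each keyword is treated as independent given the category."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # category -> keyword -> count
        self.doc_counts = Counter()              # category -> training examples seen
        self.vocab = set()

    def train(self, keywords: Iterable[str], category: str) -> None:
        self.doc_counts[category] += 1
        for word in keywords:
            self.word_counts[category][word] += 1
            self.vocab.add(word)

    def classify(self, keywords: Iterable[str]) -> Optional[str]:
        keywords = list(keywords)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            # Work in log space so many small probabilities do not underflow.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in keywords:
                count = self.word_counts[category][word]
                # Add-one smoothing keeps unseen keywords from zeroing the score.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best
```

Training only increments counters, so more data can be folded in continuously, and accuracy depends on how much labeled data has been seen, as the slide notes.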
second to classify 1 keyword • Redis as storage • Reworked classification algorithm • Get the data first and compute later • More memory usage, faster execution time
be open to new things • Geek Talk with peers from the industry • Very talented people will always come up with smarter and better ways to do something • Decide: get smart or get smarter? • Algorithms are the engine, but they mean nothing without implementation • Consider opening up source code for others to examine; the smarter the population, the better the products we create • Focus on USERS instead of technology
Product Design and Technical Implementation • Focusing more on users and our RICH content • A social network useful for everyday city life • Machine learning implementation for our recommendation engine
• Invest in company culture • Focus on USERS, not technology • Macro to Micro optimizations & scaling • Be open to new ideas (things) • Geek Talks over whatever, like Basketball or Beer • Good is not Great • Whatever WORKS