APPLICATION ARCHITECTURE FOR THE REST OF US

Presentation of my talk at phpXperts DevCon 2012 [https://www.facebook.com/events/348343888531593/] covering enterprise application architecture, along with an explanation of leading technology tools, best practices, and guidelines on DOs and DON'Ts.

M N Islam Shihan

February 26, 2012

Transcript

  1. Introduction
    - Target Audience
    - What is Architecture?
      - Architecture is the foundation of your application
      - Applications are not like skyscrapers
    - Enterprise vs. Personal Architecture
    - Why look ahead in Architecture?
      - Adaptability with growth
      - Maintainability
      - Requirements never end
  2. Enterprise Architecture (cont…)
    - Security
    - Responsiveness
    - Extendibility
    - Availability
    - Load Management
    - Distributed Computation
    - Caching
    - Scalability
  3. Security (cont…)
    - Think about security first of all
    - Network Security: Implement a firewall & reverse proxy for your network
    - SQL Injection: Never forget to escape field values in your queries
    - XSS (Cross-Site Scripting): Never trust user-provided data (or data grabbed from third-party sources), and never display it without sanitizing/escaping it
    - CSRF (Cross-Site Request Forgery): Never let your forms be submitted from third-party sites
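A minimal sketch of the SQL injection and XSS rules above. Python is used here purely for illustration; in PHP the same ideas map onto PDO prepared statements and `htmlspecialchars`. Instead of escaping by hand, the sketch uses a parameterized query, a common and safer alternative to manual escaping. The `users` table and the helper names are hypothetical.

```python
import html
import sqlite3

def find_user(conn: sqlite3.Connection, username: str):
    # The "?" placeholder makes the driver bind the value as data, so a hostile
    # string such as "x' OR '1'='1" cannot change the shape of the query.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()

def render_comment(text: str) -> str:
    # Escape user-supplied (or third-party) text before embedding it in HTML.
    return "<p>" + html.escape(text, quote=True) + "</p>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))           # [(1, 'alice')]
print(find_user(conn, "x' OR '1'='1"))    # []
print(render_comment('<script>alert("hi")</script>'))
```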
  4. Security (cont…)
    - DDoS (Distributed Denial of Service): Enable real-time monitoring of access to detect and prevent DDoS attacks
    - Session fixation: Implement session key regeneration for every request (at the very least, on every login)
    - Always hash your security tokens/cookies with new random salts on a per-request or per-session basis (or at an interval)
    - Stay tuned and up to date with security news and releases of all the tools and technologies you use
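A sketch of salted, per-session tokens. It assumes an HMAC-SHA256 scheme keyed by a hypothetical server secret, which is one of several reasonable designs and not the deck's prescribed one. Issuing a brand-new session ID and salt after login is the usual defence against session fixation, and it makes every previously issued token stop verifying.

```python
import hashlib
import hmac
import secrets

# Hypothetical application-wide key; in practice load it from configuration.
SERVER_SECRET = secrets.token_bytes(32)

def new_session() -> dict:
    # Fresh random session ID plus a fresh per-session salt.
    return {"id": secrets.token_urlsafe(32), "salt": secrets.token_hex(16)}

def issue_token(session: dict) -> str:
    # HMAC-SHA256 over salt + session ID, keyed with the server secret.
    msg = (session["salt"] + session["id"]).encode()
    return hmac.new(SERVER_SECRET, msg, hashlib.sha256).hexdigest()

def verify_token(session: dict, token: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(issue_token(session), token)

session = new_session()
token = issue_token(session)          # e.g. embed in a hidden form field
print(verify_token(session, token))   # True
session = new_session()               # e.g. right after a successful login
print(verify_token(session, token))   # False: the old token is now useless
```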
  5. Responsiveness (cont…)
    - Web applications should be as responsive as desktop applications
    - Plan well and make good use of JavaScript to achieve responsiveness
      - Detect browsers and provide a separate response/interface depending on the detected browser type
      - Implement unobtrusive JavaScript
      - Make optimal use of Ajax
      - Use Comet programming instead of polling
    - Implement deferred/asynchronous processing of large computations using a job queue
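Deferred processing with a job queue can be illustrated with an in-process sketch built on Python's standard `queue` and `threading` modules. It stands in for a real job server such as Gearman or RabbitMQ, both listed in the references. The job names and the computation are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
results = {}

def worker():
    # Background worker: pulls jobs until it sees the None sentinel.
    while True:
        job_id, n = jobs.get()
        if job_id is None:
            jobs.task_done()
            break
        # Stand-in for a large computation (e.g. building a report).
        results[job_id] = sum(i * i for i in range(n))
        jobs.task_done()

def handle_request(job_id, n):
    # The request handler only enqueues the work and answers immediately.
    jobs.put((job_id, n))
    return {"status": "accepted", "job": job_id}

threading.Thread(target=worker, daemon=True).start()
print(handle_request("report-1", 1000))  # {'status': 'accepted', 'job': 'report-1'}
jobs.put((None, 0))         # ask the worker to stop once the queue drains
jobs.join()                 # demo only: wait until every queued item is processed
print(results["report-1"])  # 332833500
```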
  6. Extendibility
    - Implement and use a robust data access interface, so that it can be exposed easily via web services (like REST, SOAP, JSONP)
    - Use architectural patterns & best practices
      - SOA (Service-Oriented Architecture)
      - MVC (Model View Controller)
      - Modular architecture with pluggability
      - Allow hooks and overrides through events
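Hooks and overrides through events can be sketched as a tiny dispatcher. The core fires a named event, and the plugins registered for it may observe or replace the payload. The class and event names are hypothetical and do not belong to any particular framework's API.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

class EventDispatcher:
    """Minimal hook registry: plugins subscribe, the core fires events."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Callable[[Any], Any]]] = defaultdict(list)

    def on(self, event: str, listener: Callable[[Any], Any]) -> None:
        self._listeners[event].append(listener)

    def fire(self, event: str, payload: Any) -> Any:
        # Each listener may return a replacement payload (an "override");
        # returning None leaves the payload unchanged.
        for listener in self._listeners[event]:
            result = listener(payload)
            if result is not None:
                payload = result
        return payload

events = EventDispatcher()
events.on("post.title", lambda title: title.strip())
events.on("post.title", lambda title: title.title())
print(events.fire("post.title", "  hello world "))  # Hello World
print(events.fire("unknown.event", 42))             # 42
```

Listeners run in registration order, so a plugin loaded later can override what an earlier one produced.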
  7. Availability (cont…)
    - Implement a well-planned disaster recovery policy
      - Use version control for your sources
      - Use RAID for your storage devices
      - Keep a hot-standby fallback for each of your primary data/content servers
      - Perform periodic backups of your source repository, files & data
    - Implement periodic archiving of your old data
      - Provide a mechanism for users to switch between current and archived data when possible
  8. Load Management (cont…)
    - Monitor and benchmark your servers periodically and find the peak usage time
      - Optimize to support at least 150% of the peak-time load
    - Use web servers with high I/O performance
    - Introduce a load balancer to distribute load among multiple application servers
      - Start with a software load balancer (e.g. a reverse proxy), then move to a hardware load balancer only if necessary
    - Use CDNs to serve your static content
      - Use public CDNs to serve open-source JavaScript or CSS files when possible
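The 150% rule and round-robin distribution each reduce to a few lines. This sketch assumes throughput is measured in requests per second and that the servers are identical. The numbers and host names are made up. In practice a reverse proxy such as HAProxy or Pound does the balancing.

```python
import itertools
import math

def servers_needed(peak_rps: float, per_server_rps: float, headroom: float = 1.5) -> int:
    # Capacity target = 150% of the measured peak load (the slide's rule of thumb).
    return math.ceil(peak_rps * headroom / per_server_rps)

class RoundRobinBalancer:
    """Hands each incoming request to the next backend in turn."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)

print(servers_needed(peak_rps=800, per_server_rps=300))  # 4
lb = RoundRobinBalancer(["app1:8080", "app2:8080", "app3:8080"])
print([lb.pick() for _ in range(4)])  # ['app1:8080', 'app2:8080', 'app3:8080', 'app1:8080']
```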
  9. Caching
    - To Cache or Not to Cache?
      - Analyze the nature of the content and responses generated by your application very well
    - What to cache?
      - Analyze and set proper expiry times
      - Invalidate the cache whenever content changes
      - Partial caching will also bring you speed
    - When is caching bad?
    - Understand the various types of web caches
      - Browser cache
      - Proxy cache
      - Gateway cache
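Setting an expiry time and invalidating on change can be shown with a small time-to-live cache. It is a single-process sketch with an injectable clock so that expiry can be demonstrated deterministically. The keys and TTL values are hypothetical.

```python
import time

class TTLCache:
    """Dict-backed cache where every entry carries its own expiry time."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:  # expired: drop it and report a miss
            del self._data[key]
            return default
        return value

    def invalidate(self, key):
        # Call this whenever the underlying content changes.
        self._data.pop(key, None)

now = [0.0]  # fake clock for the demo
cache = TTLCache(clock=lambda: now[0])
cache.set("page:/home", "<html>v1</html>", ttl_seconds=60)
print(cache.get("page:/home"))  # <html>v1</html>
now[0] = 61.0
print(cache.get("page:/home"))  # None (expired)
cache.set("page:/home", "<html>v2</html>", ttl_seconds=60)
cache.invalidate("page:/home")  # content changed
print(cache.get("page:/home"))  # None
```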
  10. Caching (cont…)
    - Implement server-side caching
      - Runtime in-memory cache
        - Per request: global variables
        - Shared: Memcached
      - Persistent cache
        - Per server: file-based, APC
        - Shared: DB-based, Redis
      - Optimizers and accelerators: eAccelerator, XCache
      - Reverse proxy/gateway cache
        - Varnish Cache
  11. Scalability
    - What the heck is this?
      - Scalability is the soul of enterprise architecture
    - [Figure: Scalability pyramid]
  12. Scalability
    - Database Scalability
      - Vertical: Add resources to the server as needed
        - In most cases this produces a single point of failure
      - Horizontal: Distribute/replicate data among multiple servers
      - Cloud services: Store your data in third-party data centers and pay according to your usage
  13. Scalability (cont…) Scaling Database
    - Scaling options
      - Master/Slave
        - Master for writes, slaves for reads
      - Cluster Computing
        - Single storage with multiple server nodes
      - Table Partitioning
        - Large tables are split among partitions
      - Federated Tables
        - Tables are shared among multiple servers
      - Distributed Key-Value Stores
      - Distributed Object DB
      - Database Sharding
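A sketch of "master for writes, slaves for reads" as a query router. It classifies statements by their first keyword, which is a simplification. Statements like `SELECT ... FOR UPDATE`, and reads that must see a write immediately (replicas can lag), should also go to the master. The host names are hypothetical.

```python
import itertools

class ReplicatedRouter:
    """Routes writes to the master and spreads reads over the slaves (round-robin)."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql: str):
        # Classify by the statement's first keyword (a deliberate simplification).
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._slaves)

router = ReplicatedRouter("db-master", ["db-slave-1", "db-slave-2"])
print(router.route("SELECT * FROM users"))          # db-slave-1
print(router.route("update users SET name = 'x'"))  # db-master
print(router.route("SELECT COUNT(*) FROM users"))   # db-slave-2
```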
  14. Scalability (cont…) Database Sharding
    - Smaller databases are easier to manage
    - Smaller databases are faster
    - Database sharding can reduce costs
    - Needs one or more well-defined shard functions
    - "Don't do it, if you don't need to!" (37signals.com)
    - "Shard early and often!" (startuplessonslearned.blogspot.com)
  15. Scalability (cont…) Database Sharding
    - When is it appropriate?
      - High-transaction database applications
      - Mixed-workload database usage
        - Frequent reads, including complex queries and joins
        - Write-intensive transactions (CRUD statements, including INSERT, UPDATE, DELETE)
        - Contention for common tables and/or rows
      - General business reporting
        - Typical "repeating segment" report generation
        - Some data analysis (mixed with other workloads)
    - What to analyze?
      - Identify all transaction-intensive tables in your schema.
      - Determine the transaction volume your database is currently handling (or is expected to handle).
      - Identify all common SQL statements (SELECT, INSERT, UPDATE, DELETE) and the volumes associated with each.
      - Develop an understanding of the "table hierarchy" contained in your schema; in other words, the main parent-child relationships.
      - Determine the "key distribution" for transactions on high-volume tables, to find out whether they are evenly spread or concentrated in narrow ranges.
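Checking the "key distribution" can start with a histogram. Split the key space into equal ranges and see whether any range carries far more than its fair share. The sample data, the four-range split, and the 50% tolerance are illustrative assumptions.

```python
from collections import Counter

def key_distribution(keys, num_buckets, key_range):
    """Share of `keys` falling into each of `num_buckets` equal-width ranges of [0, key_range)."""
    width = key_range / num_buckets
    counts = Counter(min(int(k // width), num_buckets - 1) for k in keys)
    return [counts[b] / len(keys) for b in range(num_buckets)]

def is_skewed(shares, tolerance=0.5):
    # Flag any range holding more than (1 + tolerance) times its fair share.
    fair = 1 / len(shares)
    return any(share > fair * (1 + tolerance) for share in shares)

# Hypothetical key samples from a transaction log, with keys in [0, 1000).
even = list(range(1000))                             # every key used once
hot = list(range(250)) * 3 + list(range(250, 1000))  # low keys are 3x busier

print(key_distribution(even, 4, 1000))             # [0.25, 0.25, 0.25, 0.25]
print(is_skewed(key_distribution(even, 4, 1000)))  # False
print(is_skewed(key_distribution(hot, 4, 1000)))   # True
```

A skewed result like the second one suggests that range-based sharding on this key would overload one shard.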
  16. Scalability (cont…) Database Sharding
    - Challenges
      - Reliability
        - Automated backups
        - Database shard redundancy
        - Cost-effective hardware redundancy
        - Automated failover
        - Disaster recovery
      - Distributed queries
        - Aggregation of statistics
        - Queries that support comprehensive reports
  17. Scalability (cont…) Database Sharding
    - Challenges (cont…)
      - Avoidance of cross-shard joins
      - Auto-increment key management
      - Support for multiple shard schemes
        - Session-based sharding
        - Transaction-based sharding
        - Statement-based sharding
    - Determine the optimum method for sharding the data
      - Shard by a primary key on a table
      - Shard by the modulus of a key value
      - Maintain a master shard index table
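The modulus method, the master index table, and the auto-increment challenge can be sketched together. The shard count, keys, and index contents are hypothetical, and a dict stands in for the master shard index table. The interleaved ID generator mirrors the idea behind MySQL's `auto_increment_increment` and `auto_increment_offset` settings.

```python
NUM_SHARDS = 4  # hypothetical shard count

def shard_by_modulus(key: int) -> int:
    # "Shard by the modulus of a key value": stateless, but changing
    # NUM_SHARDS later moves most keys to a different shard.
    return key % NUM_SHARDS

class ShardIndex:
    """'Maintain a master shard index table': an explicit key -> shard lookup.

    A dict stands in for the index table; a real one lives in a database.
    """

    def __init__(self):
        self._index = {}

    def assign(self, key, shard):
        self._index[key] = shard

    def lookup(self, key):
        return self._index[key]

def id_generator(shard, num_shards=NUM_SHARDS):
    # Interleaved auto-increment: shard s issues s+1, s+1+N, s+1+2N, ...
    # so IDs generated on different shards never collide.
    next_id = shard + 1
    while True:
        yield next_id
        next_id += num_shards

print([shard_by_modulus(k) for k in (10, 11, 12, 13)])  # [2, 3, 0, 1]
index = ShardIndex()
index.assign("customer:42", 3)      # e.g. placed by hand on a larger shard
print(index.lookup("customer:42"))  # 3
gen0, gen1 = id_generator(0), id_generator(1)
print([next(gen0) for _ in range(3)], [next(gen1) for _ in range(3)])  # [1, 5, 9] [2, 6, 10]
```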
  18. Tools
    - Application framework
    - Load balancer with multiple application servers
    - Continuous integration
    - Automated testing
      - TDD (Test-Driven Development)
      - BDD (Behavior-Driven Development)
    - Monitoring
      - Services
      - Servers
    - Error logging
    - Access logging
    - Content Delivery Networks (CDN)
    - FOSS
  19. Think Ahead (cont…)
    - Understand the business model
    - Analyze requirements in the greatest detail
    - Plan for extendibility
    - Be agile, do incremental architecture
    - Create/use frameworks
    - SQL or NoSQL?
    - Sharding or clustering or both?
    - Cloud services?
  20. Guidelines
    - Enrich your knowledge: read, read & read. Read anything available, from jokes to religion.
    - Follow patterns & best practices
    - Mix technologies
    - Don’t let your tools/technologies limit your vision
    - Invent/customize technology if required
    - Use FOSS
    - Don’t expect ready-made solutions
      - Find the closest match
      - Customize as needed
  21. Guidelines (cont…) Database Optimization
    - Use established & proven solutions
      - MySQL
      - PostgreSQL
      - MongoDB
      - Redis
      - Memcached
      - CouchDB
    - Understand and utilize indexing & full-text search
    - Use optimized DB structures & algorithms
      - Modified Preorder Tree Traversal (MPTT)
      - MapReduce
    - ORM or not?
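MPTT stores a left and a right number on each node, so that a whole subtree can be fetched with one range condition. This sketch assigns the numbers with a preorder walk over a hypothetical category tree held in a dict. In the database, the descendants of a node become `WHERE lft > :lft AND rgt < :rgt`.

```python
def mptt_numbers(tree, root):
    """Assign (lft, rgt) values by a preorder walk of `tree` (parent -> children)."""
    numbers = {}
    counter = 1

    def visit(node):
        nonlocal counter
        lft = counter          # number assigned on the way down
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        numbers[node] = (lft, counter)  # number assigned on the way back up
        counter += 1

    visit(root)
    return numbers

def descendants(numbers, node):
    # Every descendant satisfies lft > node.lft AND rgt < node.rgt: one range
    # check instead of one query per tree level.
    lft, rgt = numbers[node]
    found = [n for n, (l, r) in numbers.items() if lft < l and r < rgt]
    return sorted(found, key=lambda n: numbers[n][0])

tree = {"Food": ["Fruit", "Meat"], "Fruit": ["Red", "Yellow"], "Meat": ["Beef"]}
nums = mptt_numbers(tree, "Food")
print(nums["Food"], nums["Fruit"])  # (1, 12) (2, 7)
print(descendants(nums, "Fruit"))   # ['Red', 'Yellow']
```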
  22. Guidelines (cont…) Database Optimization
    - Optimize your queries
      - One big query is faster than repetitive smaller queries
      - Never be too lazy to write optimized queries
      - One Ring to Rule ’em All
    - Use a runtime in-memory cache
      - Filtering an in-memory cached dataset is much faster than executing a query in the DB
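The "one big query" advice is commonly illustrated with the N+1 query pattern. This SQLite sketch, with hypothetical tables and data, fetches the same result twice: first with one query per author, then with a single JOIN. It finally filters the cached result in memory instead of querying again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bo');
    INSERT INTO posts VALUES (1, 1, 'Intro'), (2, 1, 'Caching'), (3, 2, 'Sharding');
""")

def titles_per_author_repetitive():
    # 1 query for the authors + 1 query per author (the "N+1" pattern).
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in rows]
    return result

def titles_per_author_single():
    # One JOIN fetches everything in a single round trip.
    # Note: the INNER JOIN omits authors who have no posts.
    result = {}
    rows = conn.execute(
        "SELECT a.name, p.title FROM authors a "
        "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    )
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result

print(titles_per_author_repetitive())  # {'Ann': ['Intro', 'Caching'], 'Bo': ['Sharding']}
print(titles_per_author_single())      # {'Ann': ['Intro', 'Caching'], 'Bo': ['Sharding']}

# Filter the cached result in memory instead of issuing another query.
cached = titles_per_author_single()
cached_match = [t for titles in cached.values() for t in titles if t.endswith("ing")]
print(cached_match)  # ['Caching', 'Sharding']
```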
  23. Guidelines (cont…) One Ring to Rule ’em All
    - Perform selection, then projection, then join
    - [Figure: tables A (1,000 records), B (1,000,000 records) and C (1,000,000,000 records), linked by a_id and b_id]
    - A simple example
      - Write a standard SQL query to find all records with fields A.a1, B.b1 and C.c1 from tables A (id, a1, a2, a3, …, aP), B (id, a_id, b1, b2, b3, …, bQ) and C (id, b_id, c1, c2, c3, …, cR), given that A.aX, B.bY and C.cZ must match the values 'X', 'Y' and 'Z' respectively.
      - Assume all tables A, B, C have primary keys defined by the id column, and that a_id and b_id are the foreign keys in B (from A) and in C (from B) respectively.
  24. Guidelines (cont…) One Ring to Rule ’em All: Solution 1

    SELECT A.a1, B.b1, C.c1
    FROM A, B, C
    WHERE A.id = B.a_id AND B.id = C.b_id
      AND A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'

    Why does it suck?
    - Remember the sizes of tables A, B and C?
    - Cross products of tables are always memory-intensive. Why?
    - Evaluated naively, A x B x C would have 1,000 x 1,000,000 x 1,000,000,000 records with (P + 1) + (Q + 2) + (R + 2) fields
    - Can you imagine the size of the in-memory result set of the joined tables?
    - It will be HUGE
  25. Guidelines (cont…) One Ring to Rule ’em All: Solution 2

    SELECT A.a1, B.b1, C.c1
    FROM A
    INNER JOIN B ON A.id = B.a_id
    INNER JOIN C ON B.id = C.b_id
    WHERE A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'

    Why does it still suck?
    - A ⋈ B ⋈ C will produce (1,000 x 1,000,000) records to perform A ⋈ B, then another (1,000 x 1,000,000,000) records to compute (A ⋈ B) ⋈ C, and only then filter the records defined by the WHERE clause.
    - The number of fields (P + 1 in A, Q + 2 in B and R + 2 in C) also contributes to memory consumption.
    - It is more optimized, but still HUGE with respect to memory consumption and computation.
  26. Guidelines (cont…) One Ring to Rule ’em All: Optimal Solution

    SELECT A.a1, B.b1, C.c1
    FROM (SELECT id, a1 FROM A WHERE aX = 'X') AS A
    INNER JOIN (SELECT id, b1, a_id FROM B WHERE bY = 'Y') AS B ON A.id = B.a_id
    INNER JOIN (SELECT id, c1, b_id FROM C WHERE cZ = 'Z') AS C ON B.id = C.b_id

    Why does this solution outperform the others?
    - Let’s leave the explanation as an exercise
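To check that the three solutions are interchangeable, this sketch runs them against a tiny SQLite dataset. The rows are hypothetical, and the tables contain only the columns the queries touch. It confirms that the queries return the same rows; it does not measure speed. Many query planners push filters down on their own, so compare the actual plans with EXPLAIN on your own database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, a1 TEXT, aX TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES A(id), b1 TEXT, bY TEXT);
    CREATE TABLE C (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES B(id), c1 TEXT, cZ TEXT);
    INSERT INTO A VALUES (1, 'a-one', 'X'), (2, 'a-two', 'no');
    INSERT INTO B VALUES (10, 1, 'b-ten', 'Y'), (11, 2, 'b-eleven', 'Y'), (12, 1, 'b-twelve', 'no');
    INSERT INTO C VALUES (100, 10, 'c-hundred', 'Z'), (101, 11, 'c-101', 'Z'), (102, 12, 'c-102', 'Z');
""")

SOLUTION_1 = """SELECT A.a1, B.b1, C.c1 FROM A, B, C
WHERE A.id = B.a_id AND B.id = C.b_id AND A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'"""

SOLUTION_2 = """SELECT A.a1, B.b1, C.c1 FROM A
INNER JOIN B ON A.id = B.a_id INNER JOIN C ON B.id = C.b_id
WHERE A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'"""

OPTIMAL = """SELECT A.a1, B.b1, C.c1
FROM (SELECT id, a1 FROM A WHERE aX = 'X') AS A
INNER JOIN (SELECT id, b1, a_id FROM B WHERE bY = 'Y') AS B ON A.id = B.a_id
INNER JOIN (SELECT id, c1, b_id FROM C WHERE cZ = 'Z') AS C ON B.id = C.b_id"""

for name, sql in [("Solution 1", SOLUTION_1), ("Solution 2", SOLUTION_2), ("Optimal", OPTIMAL)]:
    # Each query should print [('a-one', 'b-ten', 'c-hundred')].
    print(name, conn.execute(sql).fetchall())
```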
  27. Reference: Tools
    - Security
      - Nmap: http://nmap.org/
      - Nikto: http://cirt.net/Nikto2
      - List of tools: http://sectools.org/
    - Caching
      - APC: http://php.net/manual/en/book.apc.php
      - XCache: http://xcache.lighttpd.net/
      - eAccelerator: http://sourceforge.net/projects/eaccelerator/
      - Varnish Cache: https://www.varnish-cache.org/
      - Memcached: http://memcached.org/
      - Redis: http://redis.io/
    - Load Balancer
      - HAProxy: http://haproxy.1wt.eu/
      - Pound: http://www.apsis.ch/pound/
  28. Reference: Tools (cont…)
    - NoSQL
      - MongoDB: http://www.mongodb.org/
      - CouchDB: http://couchdb.apache.org/
      - A complete list: http://nosql-database.org/
    - Distributed Computing
      - Gearman: http://gearman.org/
    - Message Queue/Job Server
      - RabbitMQ: http://www.rabbitmq.com/
      - ActiveMQ: http://activemq.apache.org/
    - Monitoring
      - Nagios: http://www.nagios.org/
    - Testing
      - Selenium: http://seleniumhq.org/
      - Cucumber: http://cukes.info/
      - Watir: http://watir.com/
      - PHPUnit: http://www.phpunit.de/manual/3.7/en/
    - MPTT
      - Shameless promotion: https://github.com/mnishihan/phpMptt
  29. Reference: Articles
    - Caching
      - http://www.mnot.net/cache_docs/
      - http://bit.ly/9cTJfA
    - Load Balancing
      - http://en.wikipedia.org/wiki/Load_balancing_%28computing%29
      - http://1wt.eu/articles/2006_lb/index.html
    - Scalability & Architecture
      - http://www.diranieh.com/DistributedDesign_1/Scalability.htm
      - http://www.infoq.com/presentations/Facebook-Software-Stack
      - http://99designs.com/tech-blog/blog/2012/01/30/infrastructure-at-99designs/
      - http://bit.ly/16cKu
    - Database Sharding
      - http://www.codefutures.com/database-sharding/
      - http://bit.ly/Y3b3J
      - http://www.startuplessonslearned.com/2009/01/sharding-for-startups.html
    - CDN
      - http://bit.ly/sMRyxC
    - MPTT
      - http://www.sitepoint.com/hierarchical-data-database/