APPLICATION ARCHITECTURE FOR THE REST OF US

Slides from my talk at phpXperts DevCon 2012 [https://www.facebook.com/events/348343888531593/], covering enterprise application architecture along with an explanation of leading technology tools, best practices, and guidelines on dos and don'ts.

M N Islam Shihan

February 26, 2012

Transcript

  1. Introduction
     - Target Audience
     - What is Architecture?
       - Architecture is the foundation of your application
       - Applications are not like skyscrapers
       - Enterprise vs. Personal Architecture
     - Why look ahead in Architecture?
       - Adaptability with Growth
       - Maintainability
       - Requirements never end
  2. Enterprise Architecture (cont…)
     - Security
     - Responsiveness
     - Extendibility
     - Availability
     - Load Management
     - Distributed Computation
     - Caching
     - Scalability
  3. Security (cont…)
     Think about security first of all.
     - Network Security: Put a firewall and a reverse proxy in front of your network
     - SQL Injection: Never build queries from unescaped field values; escape them or, better, bind them as query parameters
     - XSS (Cross-Site Scripting): Never trust user-provided data (or data grabbed from third-party sources), and never display it without sanitizing/escaping
     - CSRF (Cross-Site Request Forgery): Never let your forms be submitted from third-party sites
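The SQL injection and XSS points above can be sketched in a few lines. This is a minimal illustration, not the talk's own code: it assumes Python's standard `sqlite3` and `html` modules, and the `users` table, `find_user` and `render_comment` are hypothetical names made up for the example.

```python
import html
import sqlite3


def find_user(conn, name):
    """Look up a user with a bound parameter instead of string concatenation."""
    # The "?" placeholder makes the driver treat `name` strictly as data,
    # so quote characters in it cannot change the shape of the query.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def render_comment(text):
    """Escape user-provided text before embedding it in HTML output."""
    return "<p>" + html.escape(text) + "</p>"


# Hypothetical in-memory table, only here to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

rows_ok = find_user(conn, "alice")
rows_attack = find_user(conn, "alice' OR '1'='1")  # classic injection attempt
rendered = render_comment("<script>alert(1)</script>")
```

With the bound parameter, the injection attempt is compared as a literal name and matches nothing; with escaping, the script tag is displayed as text instead of being executed by the browser.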
  4. Security (cont…)
     - DDoS (Distributed Denial of Service): Enable real-time monitoring of access to detect and prevent DDoS attacks
     - Session fixation: Regenerate the session key on every request (at the very least on login and on any privilege change)
     - Always hash your security tokens/cookies with new random salts on a per-request or per-session basis (or at an interval)
     - Stay up to date with security news and releases for all the tools and technologies you use
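One way to read the salted-token and session-regeneration advice above is sketched below. It swaps the plain "hash with a salt" for a keyed HMAC, which is a common choice but an assumption on my part, not something the slide specifies. `SERVER_SECRET`, `issue_token`, `verify_token` and `regenerate_session_id` are hypothetical names.

```python
import hashlib
import hmac
import secrets

# Assumption: a real application would load this from protected configuration.
SERVER_SECRET = secrets.token_bytes(32)


def _sign(salt, session_id):
    message = (salt + ":" + session_id).encode("utf-8")
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()


def issue_token(session_id):
    """Create a token bound to the session, with a fresh random salt each time."""
    salt = secrets.token_hex(16)
    return salt + ":" + _sign(salt, session_id)


def verify_token(session_id, token):
    """Recompute the signature and compare it in constant time."""
    salt, _, digest = token.partition(":")
    return hmac.compare_digest(digest, _sign(salt, session_id))


def regenerate_session_id():
    """Return a new unpredictable session identifier (e.g. after login)."""
    return secrets.token_urlsafe(32)
```

Because the salt changes on every call, two tokens issued for the same session differ, yet both verify against that session and fail against any other.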
  5. Responsiveness (cont…)
     - Web applications should be as responsive as desktop applications
     - Plan well and make good use of JavaScript to achieve responsiveness
     - Detect browsers and provide a separate response/interface depending on the detected browser type
     - Use JavaScript unobtrusively
     - Make optimal use of Ajax
     - Use Comet programming instead of polling
     - Implement deferred/asynchronous processing of large computations using a job queue
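The job-queue point above can be sketched with nothing but the standard library. This is an in-process stand-in, assuming a single worker thread; a production setup would more likely use a dedicated job server such as the ones listed in the references. The `submit` helper and the job payload are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
results = {}


def worker():
    """Pull jobs off the queue and run the expensive part in the background."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel: stop the worker
            jobs.task_done()
            break
        job_id, numbers = job
        results[job_id] = sum(n * n for n in numbers)  # stand-in for heavy work
        jobs.task_done()


def submit(job_id, numbers):
    """Enqueue the work and return immediately, so the request stays fast."""
    jobs.put((job_id, numbers))
    return job_id


thread = threading.Thread(target=worker, daemon=True)
thread.start()

submit("report-1", range(1000))
jobs.join()        # a web request would not wait; done here to read the result
jobs.put(None)
thread.join()
```

The request handler only pays for `submit`; the result is picked up later, for example by a follow-up Ajax call that checks `results`.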
  6. Extendibility
     - Implement and use a robust data access interface, so that it can be exposed easily via web services (like REST, SOAP, JSONP)
     - Use architectural patterns & best practices
       - SOA (Service Oriented Architecture)
       - MVC (Model View Controller)
       - Modular architecture with pluggability
       - Allow hooks and overrides through events
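"Hooks and overrides through events" can be as small as a dispatcher that lets modules register listeners on named events. The sketch below is one possible shape, not a reference to any particular framework; `EventDispatcher`, `on`, `apply` and the `page.title` event name are all hypothetical.

```python
from collections import defaultdict


class EventDispatcher:
    """Minimal hook registry: listeners may transform a value in turn."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def on(self, event, listener):
        """Register a listener (any callable taking and returning a value)."""
        self._listeners[event].append(listener)

    def apply(self, event, value):
        """Pass `value` through every listener registered for `event`."""
        for listener in self._listeners[event]:
            value = listener(value)
        return value


dispatcher = EventDispatcher()
# Two independent "plugins" hook the same event without touching core code.
dispatcher.on("page.title", str.strip)
dispatcher.on("page.title", str.title)

title = dispatcher.apply("page.title", "  application architecture ")
```

The core only calls `apply`; new behavior arrives by registering more listeners, which is what makes the module boundary pluggable.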
  7. Availability (cont…)
     - Implement a well-planned disaster recovery policy
     - Use version control for your sources
     - Use RAID for your storage devices
     - Keep a hot standby fallback for each of your primary data/content servers
     - Perform periodic backups of your source repository, files & data
     - Implement periodic archiving of your old data
     - Provide a mechanism for users to switch between current and archived data when possible
  8. Load Management (cont…)
     - Monitor and benchmark your servers periodically and find the peak usage time
     - Optimize to support at least 150% of the peak-time load
     - Use web servers with high I/O performance
     - Introduce a load balancer to distribute load among multiple application servers
       - Start with a software load balancer (a.k.a. reverse proxy), then grow to a hardware load balancer only if necessary
     - Use CDNs to serve your static content
       - Use public CDNs to serve open source JavaScript or CSS files when possible
  9. Caching
     - To cache or not to cache?
       - Analyze the nature of the content and responses generated by your application very well
     - What to cache?
       - Analyze and set a proper expiry time
       - Invalidate the cache whenever content changes
       - Partial caching will also bring you speed
     - When is caching bad?
     - Understand the various types of web caches
       - Browser cache
       - Proxy cache
       - Gateway cache
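The expiry and invalidation rules above can be sketched as a tiny time-to-live cache. This is a per-process illustration only; `TTLCache`, its `get`/`invalidate` methods and the injectable `clock` parameter are hypothetical names chosen for the example.

```python
import time


class TTLCache:
    """Cache entries expire after `ttl_seconds` or when explicitly invalidated."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._store = {}             # key -> (expires_at, value)

    def get(self, key, loader):
        """Return the cached value, calling `loader()` on a miss or after expiry."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop an entry as soon as the underlying content changes."""
        self._store.pop(key, None)
```

A typical call site would be `cache.get("home", render_home_page)`, with `cache.invalidate("home")` called from whatever code path edits the page.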
  10. Caching (cont…)
      - Implement server-side caching
        - Runtime in-memory cache
          - Per request: Global variables
          - Shared: Memcached
        - Persistent cache
          - Per server: File based, APC
          - Shared: DB based, Redis
        - Optimizers and accelerators: eAccelerator, XCache
        - Reverse proxy/gateway cache
          - Varnish cache
  11. Scalability
      - What the heck is this?
      - Scalability is the soul of enterprise architecture
      - Scalability pyramid
  12. Scalability
      - Database Scalability
        - Vertical: Add resources to the server as needed
          - In most cases this produces a single point of failure
        - Horizontal: Distribute/replicate data among multiple servers
        - Cloud Services: Store your data in third-party data centers and pay according to your usage
  13. Scalability (cont…) Scaling Database
      - Scaling options
        - Master/Slave: Master for writes, slaves for reads
        - Cluster Computing: Single storage with multiple server nodes
        - Table Partitioning: Large tables are split among partitions
        - Federated Tables: Tables are shared among multiple servers
        - Distributed Key Value Stores
        - Distributed Object DB
        - Database Sharding
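The master/slave option above usually shows up in application code as a small router that sends writes to the master and spreads reads over the slaves. The sketch below routes on the first SQL keyword, which is a simplification; `ReplicatedRouter` and the server names are hypothetical, and plain strings stand in for real connections.

```python
import itertools


class ReplicatedRouter:
    """Pick a server for a statement: master for writes, slaves for reads."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        return self.master


router = ReplicatedRouter("master-db", ["slave-1", "slave-2"])
```

Keep in mind that slaves lag behind the master, so a read that must see a write made a moment ago (or any read inside a transaction) should be sent to the master explicitly.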
  14. Scalability (cont…) Database Sharding
      - Smaller databases are easier to manage
      - Smaller databases are faster
      - Database sharding can reduce costs
      - Needs one or more well-defined shard functions
      - "Don't do it, if you don't need to!" (37signals.com)
      - "Shard early and often!" (startuplessonslearned.blogspot.com)
  15. Scalability (cont…) Database Sharding
      - When appropriate?
        - High-transaction database applications
        - Mixed workload database usage
          - Frequent reads, including complex queries and joins
          - Write-intensive transactions (CRUD statements, including INSERT, UPDATE, DELETE)
          - Contention for common tables and/or rows
        - General Business Reporting
          - Typical "repeating segment" report generation
          - Some data analysis (mixed with other workloads)
      - What to analyze?
        - Identify all transaction-intensive tables in your schema.
        - Determine the transaction volume your database is currently handling (or is expected to handle).
        - Identify all common SQL statements (SELECT, INSERT, UPDATE, DELETE), and the volumes associated with each.
        - Develop an understanding of the "table hierarchy" contained in your schema; in other words, the main parent-child relationships.
        - Determine the "key distribution" for transactions on high-volume tables, to determine if they are evenly spread or are concentrated in narrow ranges.
  16. Scalability (cont…) Database Sharding
      - Challenges
        - Reliability
          - Automated backups
          - Database shard redundancy
          - Cost-effective hardware redundancy
          - Automated failover
          - Disaster recovery
        - Distributed queries
          - Aggregation of statistics
          - Queries that support comprehensive reports
  17. Scalability (cont…) Database Sharding
      - Challenges (cont…)
        - Avoidance of cross-shard joins
        - Auto-increment key management
        - Support for multiple shard schemes
          - Session-based sharding
          - Transaction-based sharding
          - Statement-based sharding
        - Determine the optimum method for sharding the data
          - Shard by a primary key on a table
          - Shard by the modulus of a key value
          - Maintain a master shard index table
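Two of the shard functions named above, "modulus of a key value" and "master shard index table", can be sketched side by side. The shard names, `shard_by_modulus` and `ShardIndex` are hypothetical, and a dictionary stands in for what would be a real lookup table.

```python
# Hypothetical shard names; a real setup would hold connection settings here.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_by_modulus(key):
    """Derive the shard directly from a numeric key."""
    return SHARDS[key % len(SHARDS)]


class ShardIndex:
    """Master index: an explicit key -> shard mapping that can be changed later."""

    def __init__(self):
        self._index = {}

    def assign(self, key, shard):
        self._index[key] = shard

    def lookup(self, key):
        return self._index[key]


index = ShardIndex()
index.assign("customer:42", "shard-1")
```

The trade-off: the modulus needs no lookup but ties every key's placement to the number of shards, so adding a shard moves most keys. The index table costs an extra lookup per request but lets individual keys be moved between shards.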
  18. Tools
      - Application framework
      - Load balancer with multiple application servers
      - Continuous integration
      - Automated testing
        - TDD (Test Driven Development)
        - BDD (Behavior Driven Development)
      - Monitoring
        - Services
        - Servers
      - Error logging
      - Access logging
      - Content Delivery Networks (CDN)
      - FOSS
  19. Think Ahead (cont…)
      - Understand the business model
      - Analyze the requirements in the greatest detail
      - Plan for extendibility
      - Be agile, do incremental architecture
      - Create/use frameworks
      - SQL or NoSQL?
      - Sharding or clustering or both?
      - Cloud services?
  20. Guidelines
      - Enrich your knowledge: Read, read & read. Read anything available, from jokes to religions.
      - Follow patterns & best practices
      - Mix technologies
      - Don’t let your tools/technologies limit your vision
      - Invent/customize technology if required
      - Use FOSS
      - Don’t expect ready solutions
        - Find the closest match
        - Customize as needed
  21. Guidelines (cont…) Database Optimization
      - Use established & proven solutions
        - MySQL
        - PostgreSQL
        - MongoDB
        - Redis
        - Memcached
        - CouchDB
      - Understand and utilize indexing & full-text search
      - Use optimized DB structures & algorithms
        - Modified Preorder Tree Traversal (MPTT)
        - Map Reduce
      - ORM or not?
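Modified Preorder Tree Traversal, mentioned above, stores a left and a right number for every node so that a whole subtree can be fetched with a single range comparison. The sketch below is a minimal in-memory version, not the phpMptt implementation; `number_tree`, `descendants` and the sample category tree are hypothetical.

```python
def number_tree(tree, root):
    """Assign (left, right) values with one preorder walk.

    `tree` maps each node to the list of its children.
    """
    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        left = counter
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        bounds[node] = (left, counter)   # right value is set on the way back up
        counter += 1

    visit(root)
    return bounds


def descendants(bounds, node):
    """All nodes whose interval lies strictly inside the interval of `node`."""
    left, right = bounds[node]
    inside = [n for n, (l, r) in bounds.items() if left < l and r < right]
    return sorted(inside, key=lambda n: bounds[n][0])


tree = {
    "Food": ["Fruit", "Meat"],
    "Fruit": ["Red", "Yellow"],
    "Red": ["Cherry"],
    "Yellow": ["Banana"],
    "Meat": ["Beef", "Pork"],
}
bounds = number_tree(tree, "Food")
```

In a database the same lookup becomes a single `WHERE lft > ? AND rgt < ?` query (column names assumed), and the number of descendants of a node is `(right - left - 1) / 2` without touching any other row.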
  22. Guidelines (cont…) Database Optimization
      - Optimize your queries
        - One big query is usually faster than many repeated smaller queries
        - Never be too lazy to write optimized queries
        - One Ring to Rule `em All
      - Use a runtime in-memory cache
        - Filtering an in-memory cached dataset is much faster than executing a query in the DB
  23. Guidelines (cont…) One Ring to Rule `em All
      Perform Selection, then Projection, then Join
      Table sizes: A has 1,000 records, B has 1,000,000 records, C has 1,000,000,000 records (B references A through a_id; C references B through b_id)
      A simple example: Write a standard SQL query to find all records with fields A.a1, B.b1 and C.c1 from tables A (id, a1, a2, a3, …, aP), B (id, a_id, b1, b2, b3, …, bQ), and C (id, b_id, c1, c2, c3, …, cR), given that A.aX, B.bY and C.cZ match the values ‘X’, ‘Y’ and ‘Z’ respectively. Assume that tables A, B and C each have a primary key on the id column, and that a_id in B and b_id in C are foreign keys referencing A and B respectively.
  24. Guidelines One Ring to Rule `em All (cont…) Solution 1
      SELECT A.a1, B.b1, C.c1
      FROM A, B, C
      WHERE A.id = B.a_id AND B.id = C.b_id
        AND A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
      Why does it suck?
      - Remember the sizes of tables A, B and C?
      - A cross product of tables is always memory intensive. Why?
      - A x B x C will have 1,000 x 1,000,000 x 1,000,000,000 records with (P + 1) + (Q + 2) + (R + 2) fields
      - Can you imagine the size of the in-memory result set of the joined tables?
      - It will be HUGE
  25. Guidelines One Ring to Rule `em All (cont…) Solution 2
      SELECT A.a1, B.b1, C.c1
      FROM A
      INNER JOIN B ON A.id = B.a_id
      INNER JOIN C ON B.id = C.b_id
      WHERE A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
      Why does it still suck?
      - Evaluated naively, A ⋈ B ⋈ C will produce (1,000 x 1,000,000) records to perform A ⋈ B, then another (1,000 x 1,000,000,000) records to compute (A ⋈ B) ⋈ C, and only then filter the records with the WHERE clause.
      - The number of fields, that is P + 1 in A, Q + 2 in B and R + 2 in C, also contributes to memory consumption.
      - It is more optimized, but will still be HUGE with respect to memory consumption and computation
  26. Guidelines One Ring to Rule `em All (cont…) Optimal Solution
      SELECT A.a1, B.b1, C.c1
      FROM (SELECT id, a1 FROM A WHERE aX = 'X') AS A
      INNER JOIN (SELECT id, b1, a_id FROM B WHERE bY = 'Y') AS B ON A.id = B.a_id
      INNER JOIN (SELECT id, c1, b_id FROM C WHERE cZ = 'Z') AS C ON B.id = C.b_id
      Why does this solution outperform the others?
      - Let’s keep the explanation as an exercise
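As a starting point for the exercise, the three solutions can be run against a tiny SQLite database to confirm that they return the same rows, so the difference is only in how much intermediate data is built. This is a sketch under assumptions: the columns `a2`, `b2` and `c2` stand in for aX, bY and cZ, the sample rows are invented, and the subquery aliases `sa`, `sb`, `sc` replace the slide's reuse of A, B, C purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, a1 TEXT, a2 TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES A(id),
                    b1 TEXT, b2 TEXT);
    CREATE TABLE C (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES B(id),
                    c1 TEXT, c2 TEXT);
    INSERT INTO A VALUES (1, 'a-one', 'X'), (2, 'a-two', 'other');
    INSERT INTO B VALUES (1, 1, 'b-one', 'Y'), (2, 1, 'b-two', 'other'),
                         (3, 2, 'b-three', 'Y');
    INSERT INTO C VALUES (1, 1, 'c-one', 'Z'), (2, 1, 'c-two', 'other'),
                         (3, 3, 'c-three', 'Z');
""")

SOLUTION_1 = """
    SELECT A.a1, B.b1, C.c1 FROM A, B, C
    WHERE A.id = B.a_id AND B.id = C.b_id
      AND A.a2 = 'X' AND B.b2 = 'Y' AND C.c2 = 'Z'
"""
SOLUTION_2 = """
    SELECT A.a1, B.b1, C.c1 FROM A
    INNER JOIN B ON A.id = B.a_id
    INNER JOIN C ON B.id = C.b_id
    WHERE A.a2 = 'X' AND B.b2 = 'Y' AND C.c2 = 'Z'
"""
# Selection and projection happen inside each subquery, before the joins.
OPTIMAL = """
    SELECT sa.a1, sb.b1, sc.c1
    FROM (SELECT id, a1 FROM A WHERE a2 = 'X') AS sa
    INNER JOIN (SELECT id, b1, a_id FROM B WHERE b2 = 'Y') AS sb ON sa.id = sb.a_id
    INNER JOIN (SELECT id, c1, b_id FROM C WHERE c2 = 'Z') AS sc ON sb.id = sc.b_id
"""

r1 = conn.execute(SOLUTION_1).fetchall()
r2 = conn.execute(SOLUTION_2).fetchall()
r3 = conn.execute(OPTIMAL).fetchall()
```

All three queries return the same single row. Whether the third one is actually faster depends on the engine: many query planners push WHERE filters below the joins on their own, so it is worth comparing the plans with your database's EXPLAIN before rewriting queries by hand.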
  27. Reference: Tools
      - Security
        - Nmap: http://nmap.org/
        - Nikto: http://cirt.net/Nikto2
        - List of Tools: http://sectools.org/
      - Caching
        - APC: http://php.net/manual/en/book.apc.php
        - XCache: http://xcache.lighttpd.net/
        - eAccelerator: http://sourceforge.net/projects/eaccelerator/
        - Varnish Cache: https://www.varnish-cache.org/
        - Memcached: http://memcached.org/
        - Redis: http://redis.io/
      - Load Balancer
        - HAProxy: http://haproxy.1wt.eu/
        - Pound: http://www.apsis.ch/pound/
  28. Reference: Tools (cont…)
      - NoSQL
        - MongoDB: http://www.mongodb.org/
        - CouchDB: http://couchdb.apache.org/
        - A complete list: http://nosql-database.org/
      - Distributed Computing
        - Gearman: http://gearman.org/
      - Message Queue/Job Server
        - RabbitMQ: http://www.rabbitmq.com/
        - ActiveMQ: http://activemq.apache.org/
      - Monitoring
        - Nagios: http://www.nagios.org/
      - Testing
        - Selenium: http://seleniumhq.org/
        - Cucumber: http://cukes.info/
        - Watir: http://watir.com/
        - PHPUnit: http://www.phpunit.de/manual/3.7/en/
      - MPTT
        - Shameless Promotion: https://github.com/mnishihan/phpMptt
  29. Reference: Articles
      - Caching
        - http://www.mnot.net/cache_docs/
        - http://bit.ly/9cTJfA
      - Load Balancing
        - http://en.wikipedia.org/wiki/Load_balancing_%28computing%29
        - http://1wt.eu/articles/2006_lb/index.html
      - Scalability & Architecture
        - http://www.diranieh.com/DistributedDesign_1/Scalability.htm
        - http://www.infoq.com/presentations/Facebook-Software-Stack
        - http://99designs.com/tech-blog/blog/2012/01/30/infrastructure-at-99designs/
        - http://bit.ly/16cKu
      - Database Sharding
        - http://www.codefutures.com/database-sharding/
        - http://bit.ly/Y3b3J
        - http://www.startuplessonslearned.com/2009/01/sharding-for-startups.html
      - CDN
        - http://bit.ly/sMRyxC
      - MPTT
        - http://www.sitepoint.com/hierarchical-data-database/