Leveraging the Web for Services at Yahoo!

In this talk, recorded at QCon London, Mark Nottingham explains how Yahoo! uses Web technologies, specifically HTTP-based caching with Squid, to build a high-performance architecture for integrating multiple Yahoo! properties. He concludes that the Web already provides sophisticated integration techniques, without requiring SOA tooling such as ESBs.

Mark Nottingham

March 16, 2007

Transcript

  1. Leveraging the Web for Services at Yahoo! Mark Nottingham <mnot@yahoo-inc.com>

  2. 992 1994 1996 1998 2000 2002 2004 2006 200 100,000 users 25 million users Y! Japan Yahooligans! Y! UK Y! Germany Y! France Y! Singapore Y! Classifieds Y! Australia Y! Korea Y! Mail Y! Travel Y! Sports Y! Games Y! Italy Y! Movies Y! Spain Y! Small Business Y! Auctions Y! Shopping GeoCities Y! Entertainment Broadcast.com Y! Health Y! Brazil Y! Messenger Y! China Y! Mexico Y! Photos Y! Argentina Y! India Y! Groups LAUNCH HotJobs Y! Maps Inktomi Y! Search Overture Kelkoo Y! 360 Y! Music Y! Podcasts Y! Go Y! Video Y! Food Y! Tech Bix Y! 500 million users 49 employees Flickr del.icio.us Upcoming 11,400 employees 4 billion daily page views 200 million users 1.3 billion daily page views 65 million daily page views Y! Local Y! Calendar Y! Personal Finance
  3. Y! Japan Yahooligans! Y! UK Y! Germany Y! France Y! Singapore Y! Classifieds Y! Australia Y! Korea Y! Mail Y! Travel Y! Sports Y! Games Y! Italy Y! Chinese Y! Spain Y! Small Business Y! Auctions Y! Shopping GeoCities Y! Entertainment Broadcast.com Y! Health Y! Brazil Y! Messenger Y! China Y! Mexico Y! Photos Y! Argentina Y! India Y! Groups LAUNCH HotJobs Y! Maps Inktomi Y! Search Overture Kelkoo Y! 360 Y! Music Y! Podcasts Y! Go Y! Video Y! Food Y! Tech Bix Y! Flickr del.icio.us Upcoming Integration Nightmare
  4. Between properties With partners Within over time With acquisitions

  5. “Scale”

  6. None
  7. Y! Sports Y! News Media Group Y! Fantasy Sports Y! Finance Y! Tech Y! Food Y! Music Y! Personal Finance Y! Movies Y! TV Y! Games Y! Kids Y! Astrology Y! Health Y! Wii
  8. None
  9. Front-End Master Database Slave Database

  10. Large Datasets Don’t Push Well

  11. Large Datasets Don’t Push Well Adding Capacity is Expensive

  12. Large Datasets Don’t Push Well Adding Capacity is Expensive What to do after push failure?
  13. Large Datasets Don’t Push Well User-Generated Content Adding Capacity is Expensive What to do after push failure?
  14. None
  15. None
  16. Building New Sites

  17. Requirements

  18. Requirements: massive scalability; flexible deployment; highly dynamic; separation of concerns; mashability
  19. i.e., Services

  20. Scalability Simplicity Reuse Interoperability

  21. i.e., HTTP

  22. Caching Back-End Front-End Database

  23. Single Source of Truth

  24. Single Source of Truth Cache Replicates Naturally

  25. Single Source of Truth Adding Capacity Is Easy Cache Replicates Naturally
  26. Single Source of Truth Adding Capacity Is Easy UGC Pushes Through Cache Replicates Naturally
  27. None
  28. HTTP Caching Intermediaries

  29. HTTP Caching Intermediaries Freshness

  30. HTTP Caching Intermediaries Freshness Validation

  31. HTTP Caching Intermediaries Freshness Validation Metrics 0101011010010101101010011001010101

  32. None
  33. HTTP Caching Intermediaries Freshness Validation Load Balancing Metrics

  34. HTTP Caching Intermediaries Freshness Validation Cache Peering Load Balancing Metrics

  35. HTTP Caching Intermediaries Freshness Validation Cache Peering Negative Caching Load Balancing Metrics
  36. HTTP Caching Intermediaries Freshness Validation Cache Peering Collapsed Forwarding Negative Caching Load Balancing Metrics
  37. HTTP Caching Intermediaries Freshness Validation Cache Peering Collapsed Forwarding Negative Caching stale-while-revalidate Load Balancing Metrics
  38. HTTP Caching Intermediaries Freshness Validation Cache Peering Collapsed Forwarding Negative Caching stale-while-revalidate stale-if-error Load Balancing Metrics ?
  39. HTTP Caching Intermediaries Freshness Validation Cache Peering Collapsed Forwarding Negative Caching stale-while-revalidate stale-if-error Invalidation Channels Load Balancing Metrics
  40. 12,000 req/sec/core

  41. Tens of Thousands of Connections

  42. None
  43. pitfalls: REST vs. WS-* wars; theory vs. practice; human-intuitive, but not programmer-intuitive; different deployment/operational concerns; formats are hard; format / interface proliferation; authentication isn’t there yet; tools have a way to go
  44. what’s needed: tools; web description language; data-oriented schema language; investment in the Atom stack; HTTP test suite
  45. None
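
The freshness-related items named on slides 29 to 38 (Freshness, stale-while-revalidate, stale-if-error) can be illustrated with a small sketch. The Python below is not taken from the talk and is not how Squid is implemented; it assumes a simplified cache entry with a known age and a stored Cache-Control header, and the names `CachedResponse`, `directive`, and `decide` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedResponse:
    """Hypothetical, simplified cache entry (an assumption, not a Squid structure)."""
    age: float          # seconds since the origin generated the response
    cache_control: str  # Cache-Control header value stored with the response


def directive(cache_control: str, name: str) -> int:
    """Return the integer value of a Cache-Control directive, or 0 if absent."""
    for part in cache_control.split(","):
        key, _, value = part.strip().partition("=")
        value = value.strip().strip('"')
        if key.strip().lower() == name and value.isdigit():
            return int(value)
    return 0


def decide(resp: CachedResponse, origin_up: bool = True) -> str:
    """Decide what a cache could do with a stored response.

    Fresh responses are served directly. Stale ones may still be served
    inside the stale-while-revalidate window (while a background refresh
    runs) or, if the origin is failing, inside the stale-if-error window.
    """
    max_age = directive(resp.cache_control, "max-age")
    swr = directive(resp.cache_control, "stale-while-revalidate")
    sie = directive(resp.cache_control, "stale-if-error")

    staleness = resp.age - max_age  # negative while the response is fresh
    if staleness < 0:
        return "serve-fresh"
    if origin_up:
        if staleness <= swr:
            return "serve-stale-and-revalidate-in-background"
        return "revalidate-before-serving"
    if staleness <= sie:
        return "serve-stale-on-error"
    return "return-error"


if __name__ == "__main__":
    header = "max-age=600, stale-while-revalidate=30, stale-if-error=86400"
    for age, up in [(100, True), (620, True), (700, True), (700, False)]:
        print(age, up, decide(CachedResponse(age, header), origin_up=up))
```

In this sketch a back-end service controls all three behaviours through one response header, so the front-end and the cache need no extra coordination; the specific header values are made up for the example.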