Leveraging the Web for Services at Yahoo!

In this talk, recorded at QCon London, Mark Nottingham explains how Yahoo! leverages Web technologies, specifically HTTP-based caching using Squid, to create a high-performance architecture for integrating multiple Yahoo! properties, concluding that the Web provides sophisticated techniques without using SOA tooling such as ESBs.

Mark Nottingham

March 16, 2007

Transcript

  1. Leveraging the Web for Services at Yahoo!
    Mark Nottingham

  2. Timeline of Yahoo! growth, with a year axis labeled 1994, 1996, 1998, 2000, 2002, 2004, 2006 (the first and last labels are cut off)
    Employees: 49; 11,400
    Users: 100,000; 25 million; 200 million; 500 million
    Daily page views: 65 million; 1.3 billion; 4 billion
    Properties and acquisitions: Y! Japan, Yahooligans!, Y! UK, Y! Germany, Y! France, Y! Singapore, Y! Classifieds, Y! Australia, Y! Korea, Y! Mail, Y! Travel, Y! Sports, Y! Games, Y! Italy, Y! Movies, Y! Spain, Y! Small Business, Y! Auctions, Y! Shopping, GeoCities, Y! Entertainment, Broadcast.com, Y! Health, Y! Brazil, Y! Messenger, Y! China, Y! Mexico, Y! Photos, Y! Argentina, Y! India, Y! Groups, LAUNCH, HotJobs, Y! Maps, Inktomi, Y! Search, Overture, Kelkoo, Y! 360, Y! Music, Y! Podcasts, Y! Go, Y! Video, Y! Food, Y! Tech, Bix, Y!, Flickr, del.icio.us, Upcoming, Y! Local, Y! Calendar, Y! Personal Finance

  3. Y! Japan
    Yahooligans!
    Y! UK
    Y! Germany
    Y! France
    Y! Singapore
    Y! Classifieds
    Y! Australia
    Y! Korea
    Y! Mail
    Y! Travel
    Y! Sports
    Y! Games
    Y! Italy
    Y! Chinese
    Y! Spain
    Y! Small Business
    Y! Auctions
    Y! Shopping
    GeoCities
    Y! Entertainment
    Broadcast.com
    Y! Health
    Y! Brazil
    Y! Messenger
    Y! China
    Y! Mexico
    Y! Photos
    Y! Argentina
    Y! India
    Y! Groups
    LAUNCH
    HotJobs
    Y! Maps
    Inktomi
    Y! Search
    Overture
    Kelkoo
    Y! 360
    Y! Music
    Y! Podcasts
    Y! Go
    Y! Video
    Y! Food
    Y! Tech
    Bix
    Y!
    Flickr
    del.icio.us
    Upcoming
    Integration Nightmare

  4. Between properties
    With partners
    Within over time
    With acquisitions

  5. “Scale”

  6. (no text)

  7. Y! Sports
    Y! News
    Media Group
    Y! Fantasy Sports
    Y! Finance
    Y! Tech
    Y! Food
    Y! Music
    Y! Personal Finance
    Y! Movies
    Y! TV
    Y! Games
    Y! Kids
    Y! Astrology
    Y! Health
    Y! Wii

  8. (no text)

  9. Front-End
    Master Database
    Slave Database

  10. Large Datasets Don’t Push Well

  11. Large Datasets Don’t Push Well
    Adding Capacity is Expensive

  12. Large Datasets Don’t Push Well
    Adding Capacity is Expensive
    What to do after push failure?

  13. Large Datasets Don’t Push Well
    User-Generated Content
    Adding Capacity is Expensive
    What to do after push failure?

  14. (no text)

  15. (no text)

  16. Building New Sites

  17. Requirements

  18. Requirements
    massive scalability
    flexible deployment
    highly dynamic
    separation of concerns
    mashability

  19. i.e.,
    Services

  20. Scalability
    Simplicity
    Reuse
    Interoperability

  21. i.e.,
    HTTP

  22. Caching
    Back-End
    Front-End
    Database

  23. Single Source of Truth

  24. Single Source of Truth
    Cache Replicates Naturally

  25. Single Source of Truth
    Adding Capacity Is Easy
    Cache Replicates Naturally

  26. Single Source of Truth
    Adding Capacity Is Easy
    UGC Pushes Through
    Cache Replicates Naturally
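
The pull model these slides describe can be sketched as a read-through cache: the front-end asks the cache, and only a miss reaches the back-end, which stays the single source of truth. This is an illustrative sketch, not code from the talk; `ReadThroughCache` and `fetch_from_origin` are hypothetical names.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Minimal read-through cache: misses are pulled from the origin on demand."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin      # hypothetical back-end accessor
        self._store: Dict[str, str] = {}
        self.origin_hits = 0                 # counts how often the origin was contacted

    def get(self, key: str) -> str:
        if key not in self._store:
            # Miss: pull the representation from the single source of truth.
            self.origin_hits += 1
            self._store[key] = self._fetch(key)
        return self._store[key]
```

Adding capacity in this model means adding another cache in front of the same back-end; nothing has to be pushed to it beforehand.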

  27. (no text)

  28. HTTP Caching Intermediaries

  29. HTTP Caching Intermediaries
    Freshness
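
As a rough illustration of freshness, a cache may reuse a stored response without contacting the origin while the response's age is below its freshness lifetime. The sketch below assumes the lifetime comes only from `Cache-Control: max-age`; it ignores `Expires`, `s-maxage`, `no-cache`, and heuristic freshness, and the function names are hypothetical.

```python
import re
from typing import Optional


def max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds from a Cache-Control header, if present."""
    match = re.search(r"max-age=(\d+)", cache_control.lower())
    return int(match.group(1)) if match else None


def is_fresh(cache_control: str, current_age: int) -> bool:
    """A stored response is fresh while its age is below its freshness lifetime."""
    lifetime = max_age(cache_control)
    return lifetime is not None and current_age < lifetime
```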

  30. HTTP Caching Intermediaries
    Freshness
    Validation
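
Validation lets a cache check a stale copy cheaply: it sends the stored validator back in `If-None-Match`, and the origin answers `304 Not Modified` with no body if nothing changed. The sketch below is a simplified origin-side handler under stated assumptions: the entity tag is derived from a hash of the body, and only a single exact tag is compared (no tag lists, no `*`). The names `make_etag` and `respond` are hypothetical.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive an entity tag from the body (one common choice, assumed here)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET: 304 with an empty body if the client's validator still matches."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```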

  31. HTTP Caching Intermediaries
    Freshness
    Validation
    Metrics

  32. (no text)

  33. HTTP Caching Intermediaries
    Freshness
    Validation
    Load Balancing
    Metrics

  34. HTTP Caching Intermediaries
    Freshness
    Validation
    Cache Peering
    Load Balancing
    Metrics

  35. HTTP Caching Intermediaries
    Freshness
    Validation
    Cache Peering
    Negative Caching
    Load Balancing
    Metrics
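
Negative caching, as listed on this slide, means remembering error responses for a short time so that a failing back-end is not hit again by every request. A minimal sketch, assuming any status of 400 or above counts as an error and a fixed time-to-live; `NegativeCache`, `error_ttl`, and the injectable `clock` are hypothetical, and successful responses are deliberately not cached here.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (status code, body)


class NegativeCache:
    """Remembers error responses briefly so repeated failures spare the origin."""

    def __init__(self, fetch: Callable[[str], Response], error_ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._error_ttl = error_ttl
        self._clock = clock
        self._errors: Dict[str, Tuple[float, Response]] = {}

    def get(self, url: str) -> Response:
        cached = self._errors.get(url)
        if cached is not None and self._clock() < cached[0]:
            return cached[1]  # still inside the negative-cache window
        response = self._fetch(url)
        if response[0] >= 400:
            # Store the error together with the time at which it expires.
            self._errors[url] = (self._clock() + self._error_ttl, response)
        else:
            self._errors.pop(url, None)
        return response
```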

  36. HTTP Caching Intermediaries
    Freshness
    Validation
    Cache Peering
    Collapsed Forwarding
    Negative Caching
    Load Balancing
    Metrics
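
Collapsed forwarding means that when several requests for the same URL arrive while a fetch is already in flight, the cache forwards only one request and hands the same response to everyone waiting. The following is a thread-based sketch of that idea, not how any particular cache implements it; `CollapsedFetcher` is a hypothetical name, and error handling is left out (if the fetch raises, waiting callers receive `None`).

```python
import threading
from typing import Callable, Dict


class _InFlight:
    """State shared by all callers waiting on one origin fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result = None


class CollapsedFetcher:
    """Lets concurrent requests for one URL share a single origin fetch."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _InFlight] = {}

    def get(self, url: str) -> str:
        with self._lock:
            entry = self._in_flight.get(url)
            leader = entry is None
            if leader:
                # First caller for this URL: register the fetch and perform it.
                entry = _InFlight()
                self._in_flight[url] = entry
        if leader:
            try:
                entry.result = self._fetch(url)
            finally:
                with self._lock:
                    del self._in_flight[url]
                entry.done.set()
        else:
            # Later callers wait for the leader instead of contacting the origin.
            entry.done.wait()
        return entry.result
```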

  37. HTTP Caching Intermediaries
    Freshness
    Validation
    Cache Peering
    Collapsed Forwarding
    Negative Caching
    stale-while-revalidate
    Load Balancing
    Metrics
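
The `stale-while-revalidate` extension allows a cache to keep serving a response for a bounded time after it goes stale while it refreshes the copy in the background, so clients do not wait on the origin. The sketch below only classifies a cached response given a header such as `Cache-Control: max-age=600, stale-while-revalidate=30`; the function name and the returned labels are hypothetical, and the exact boundary handling is an assumption.

```python
def swr_decision(age: int, max_age: int, stale_while_revalidate: int) -> str:
    """Classify a cached response by its age in seconds."""
    if age < max_age:
        return "serve-fresh"
    if age < max_age + stale_while_revalidate:
        # Stale, but inside the window: answer immediately, refresh asynchronously.
        return "serve-stale-and-revalidate-in-background"
    return "revalidate-before-serving"
```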

  38. HTTP Caching Intermediaries
    Freshness
    Validation
    Cache Peering
    Collapsed Forwarding
    Negative Caching
    stale-while-revalidate
    stale-if-error
    Load Balancing
    Metrics
    ?
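
The `stale-if-error` extension allows a cache to fall back to a stale copy when the origin fails, for a bounded time after the copy went stale. A minimal sketch under stated assumptions: an error is a 500, 502, 503, or 504 response or a connection failure, the stored copy was originally a 200 response, and the name `revalidate_or_serve_stale` and its parameters are hypothetical.

```python
from typing import Callable, Optional, Tuple

ERROR_STATUSES = {500, 502, 503, 504}


def revalidate_or_serve_stale(
    fetch: Callable[[], Tuple[int, str]],
    stale_body: Optional[str],
    seconds_stale: int,
    stale_if_error: int,
) -> Tuple[int, str]:
    """Try the origin; on an error, fall back to the stale copy if still allowed."""
    try:
        status, body = fetch()
    except OSError:
        # Treat an unreachable origin like a gateway timeout.
        status, body = 504, "origin unreachable"
    if status in ERROR_STATUSES and stale_body is not None and seconds_stale < stale_if_error:
        return 200, stale_body
    return status, body
```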

  39. HTTP Caching Intermediaries
    Freshness
    Validation
    Cache Peering
    Collapsed Forwarding
    Negative Caching
    stale-while-revalidate
    stale-if-error
    Invalidation Channels
    Load Balancing
    Metrics

  40. 12,000 req/sec/core

  41. Tens of Thousands of Connections

  42. (no text)

  43. pitfalls
    REST vs. WS-* wars
    theory vs. practice
    human-intuitive, but not programmer-intuitive
    different deployment/operational concerns
    formats are hard
    format / interface proliferation
    authentication isn’t there yet
    tools have a way to go

  44. what’s needed
    tools
    web description language
    data-oriented schema language
    investment in the Atom stack
    HTTP test suite

  45. (no text)