Leveraging the Web for Services at Yahoo!

In this talk, recorded at QCon London, Mark Nottingham explains how Yahoo! leverages Web technologies, specifically HTTP-based caching using Squid, to create a high-performance architecture for integrating multiple Yahoo! properties, concluding that the Web provides sophisticated techniques without using SOA tooling such as ESBs.

Mark Nottingham

March 16, 2007

Transcript

  1. Leveraging the Web
    for Services at Yahoo!
    Mark Nottingham

  2. 1992 1994 1996 1998 2000 2002 2004 2006 2008
    100,000
    users
    25 million
    users
    Y! Japan
    Yahooligans!
    Y! UK
    Y! Germany
    Y! France
    Y! Singapore
    Y! Classifieds
    Y! Australia
    Y! Korea
    Y! Mail
    Y! Travel
    Y! Sports
    Y! Games
    Y! Italy
    Y! Movies
    Y! Spain
    Y! Small Business
    Y! Auctions
    Y! Shopping
    GeoCities
    Y! Entertainment
    Broadcast.com
    Y! Health
    Y! Brazil
    Y! Messenger
    Y! China
    Y! Mexico
    Y! Photos
    Y! Argentina
    Y! India
    Y! Groups
    LAUNCH
    HotJobs
    Y! Maps
    Inktomi
    Y! Search
    Overture
    Kelkoo
    Y! 360
    Y! Music
    Y! Podcasts
    Y! Go
    Y! Video
    Y! Food
    Y! Tech
    Bix
    Y!
    500 million
    users
    49 employees
    Flickr
    del.icio.us
    Upcoming
    11,400 employees
    4 billion
    daily page views
    200 million
    users
    1.3 billion
    daily page views
    65 million
    daily page views
    Y! Local
    Y! Calendar
    Y! Personal Finance

  3. Y! Japan
    Yahooligans!
    Y! UK
    Y! Germany
    Y! France
    Y! Singapore
    Y! Classifieds
    Y! Australia
    Y! Korea
    Y! Mail
    Y! Travel
    Y! Sports
    Y! Games
    Y! Italy
    Y! Chinese
    Y! Spain
    Y! Small Business
    Y! Auctions
    Y! Shopping
    GeoCities
    Y! Entertainment
    Broadcast.com
    Y! Health
    Y! Brazil
    Y! Messenger
    Y! China
    Y! Mexico
    Y! Photos
    Y! Argentina
    Y! India
    Y! Groups
    LAUNCH
    HotJobs
    Y! Maps
    Inktomi
    Y! Search
    Overture
    Kelkoo
    Y! 360
    Y! Music
    Y! Podcasts
    Y! Go
    Y! Video
    Y! Food
    Y! Tech
    Bix
    Y!
    Flickr
    del.icio.us
    Upcoming
    Integration
    Nightmare

  4. Between properties
    With partners
    Within over time
    With acquisitions

  5. “Scale”

  6. Y! Sports
    Y! News
    Media Group
    Y! Fantasy Sports
    Y! Finance
    Y! Tech
    Y! Food
    Y! Music
    Y! Personal Finance
    Y! Movies
    Y! TV
    Y! Games
    Y! Kids
    Y! Astrology
    Y! Health
    Y! Wii

  7. Front-End
    Master
    Database
    Slave
    Database

  8. Large Datasets Don’t Push Well

  9. Large Datasets Don’t Push Well
    Adding Capacity is Expensive

  10. Large Datasets Don’t Push Well
    Adding Capacity is Expensive
    What to do after push failure?

  11. Large Datasets Don’t Push Well
    User-Generated Content
    Adding Capacity is Expensive
    What to do after push failure?

  12. Building New Sites

  13. Requirements

  14. Requirements
    massive scalability
    flexible deployment
    highly dynamic
    separation of concerns
    mashability

  15. i.e.,
    Services

  16. Scalability
    Simplicity
    Reuse
    Interoperability

  17. Caching
    Back-End
    Front-End
    Database

  18. Single Source of Truth

  19. Single Source of Truth
    Cache Replicates Naturally

  20. Single Source of Truth
    Adding Capacity Is Easy
    Cache Replicates Naturally

  21. Single Source of Truth
    Adding Capacity Is Easy
    UGC Pushes Through
    Cache Replicates Naturally

  22. HTTP Caching Intermediaries

  23. HTTP Caching Intermediaries
    Freshness

  24. HTTP Caching Intermediaries
    Freshness
    Validation
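
The two mechanisms named on this slide can be illustrated with a minimal Python sketch. It is not taken from the talk: the `CachedResponse` type and the helper names are hypothetical, and it assumes the common case of a `Cache-Control: max-age` lifetime and an `ETag` validator sent back in `If-None-Match`.

```python
from dataclasses import dataclass


@dataclass
class CachedResponse:
    body: bytes
    etag: str         # validator supplied by the origin
    max_age: int      # freshness lifetime in seconds (Cache-Control: max-age)
    stored_at: float  # time the response was stored or last validated


def is_fresh(entry: CachedResponse, now: float) -> bool:
    """Freshness: the entry may be served without contacting the origin."""
    return (now - entry.stored_at) < entry.max_age


def validation_headers(entry: CachedResponse) -> dict:
    """Validation: headers for a conditional request about a stale entry."""
    return {"If-None-Match": entry.etag}


def handle_validation(entry, status, now, new_body=None, new_etag=None):
    """Apply the origin's answer to the conditional request."""
    if status == 304:
        # Not Modified: keep the stored body and restart the freshness clock.
        return CachedResponse(entry.body, entry.etag, entry.max_age, now)
    # In this sketch any other status is treated as a full replacement.
    return CachedResponse(new_body, new_etag, entry.max_age, now)
```

A cache would call `is_fresh` first and fall back to a conditional request only when the entry has gone stale, so a 304 costs a round trip but no body transfer.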

  25. HTTP Caching Intermediaries
    Freshness
    Validation
    Metrics

  26. HTTP Caching Intermediaries
    Freshness
    Validation
    Load Balancing
    Metrics

  27. HTTP Caching Intermediaries
    Freshness
    Validation
    Cache Peering
    Load Balancing
    Metrics

  28. HTTP Caching Intermediaries
    Freshness
    Validation
    Cache Peering
    Negative Caching
    Load Balancing
    Metrics
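
Negative caching, as listed above, means remembering error responses for a short time so that repeated requests for a missing or failing resource do not all reach the back end. A minimal sketch follows, assuming a hypothetical `fetch(url)` callable that returns `(status, body)`; the class name and TTL values are illustrative and not from the talk.

```python
import time


class NegativeCachingStore:
    """Tiny read-through cache that also remembers error responses briefly."""

    def __init__(self, fetch, ok_ttl=60.0, error_ttl=5.0, clock=time.monotonic):
        self._fetch = fetch          # hypothetical origin call: url -> (status, body)
        self._ok_ttl = ok_ttl        # lifetime for successful responses
        self._error_ttl = error_ttl  # shorter lifetime for 4xx/5xx responses
        self._clock = clock
        self._entries = {}           # url -> (expires_at, status, body)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now < entry[0]:
            # Still within its lifetime, whether it was a success or an error.
            return entry[1], entry[2]
        status, body = self._fetch(url)
        ttl = self._ok_ttl if status < 400 else self._error_ttl
        self._entries[url] = (now + ttl, status, body)
        return status, body
```

The error lifetime is kept short on purpose: it shields the origin from a burst of identical failing requests without hiding a recovery for long.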

  29. HTTP Caching Intermediaries
    Freshness
    Validation
    Cache Peering
    Collapsed Forwarding
    Negative Caching
    Load Balancing
    Metrics
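
Collapsed forwarding, as named above, means that when several clients ask for the same uncached URL at once, the intermediary sends one request to the back end and shares the answer. The threaded sketch below illustrates the idea under stated assumptions: `fetch` is a hypothetical blocking origin call, and the class and method names are invented for this example rather than taken from Squid or the talk.

```python
import threading


class CollapsingFetcher:
    """Collapse concurrent requests for the same URL into one origin fetch."""

    def __init__(self, fetch):
        self._fetch = fetch    # hypothetical blocking origin call: url -> body
        self._lock = threading.Lock()
        self._inflight = {}    # url -> shared state for the request in progress

    def followers(self, url):
        """Number of callers currently waiting on an in-flight fetch of url."""
        with self._lock:
            entry = self._inflight.get(url)
            return entry["followers"] if entry else 0

    def get(self, url):
        with self._lock:
            entry = self._inflight.get(url)
            leader = entry is None
            if leader:
                entry = {"done": threading.Event(), "result": None,
                         "error": None, "followers": 0}
                self._inflight[url] = entry
            else:
                entry["followers"] += 1
        if leader:
            try:
                entry["result"] = self._fetch(url)  # the only origin request
            except Exception as exc:
                entry["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[url]
                entry["done"].set()                 # wake every follower
        else:
            entry["done"].wait()
        if entry["error"] is not None:
            raise entry["error"]
        return entry["result"]
```

The first caller for a URL becomes the leader and performs the fetch; later callers block until the leader publishes the result, so a burst of identical misses costs the origin one request.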

  30. HTTP Caching Intermediaries
    Freshness
    Validation
    Cache Peering
    Collapsed Forwarding
    Negative Caching
    stale-while-revalidate
    Load Balancing
    Metrics

  31. HTTP Caching Intermediaries
    Freshness
    Validation
    Cache Peering
    Collapsed Forwarding
    Negative Caching
    stale-while-revalidate
    stale-if-error
    Load Balancing
    Metrics
    ?

  32. HTTP Caching Intermediaries
    Freshness
    Validation
    Cache Peering
    Collapsed Forwarding
    Negative Caching
    stale-while-revalidate
    stale-if-error
    Invalidation Channels
    Load Balancing
    Metrics
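
The two `stale-*` items in the list above can be read as extra windows after a response stops being fresh. The sketch below shows one way a cache might choose an action; it follows the semantics later written down in RFC 5861, while the function name, parameter names and returned labels are assumptions made for this illustration.

```python
def cache_decision(age, max_age, stale_while_revalidate=0,
                   stale_if_error=0, origin_failed=False):
    """Pick an action for a stored response; all times are in seconds."""
    if age < max_age:
        return "serve-fresh"
    staleness = age - max_age
    if staleness <= stale_while_revalidate:
        # Answer immediately from cache and refresh in the background.
        return "serve-stale-and-revalidate-in-background"
    if not origin_failed:
        # Outside the window: the client waits for the origin's answer.
        return "revalidate-before-serving"
    if staleness <= stale_if_error:
        # The origin is unreachable or erroring: fall back to the stale copy.
        return "serve-stale-on-error"
    return "return-error"
```

For example, with `max_age=60` and `stale_while_revalidate=30`, a response aged 70 seconds is still served at once while the cache refreshes it, so clients never wait on the back end during that window.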

  33. 12,000 req/sec/core

  34. Tens of Thousands of Connections

  35. pitfalls
    REST vs. WS-* wars
    theory vs. practice
    human-intuitive, but not programmer-intuitive
    different deployment/operational concerns
    formats are hard
    format / interface proliferation
    authentication isn’t there yet
    tools have a way to go

  36. what’s needed
    tools
    web description language
    data-oriented schema language
    investment in the Atom stack
    HTTP test suite
