Slide 27
- Stats (as Monoids) Storage System
  - All we wanted was approximate aggregates in real time
- HTML Archive System
  - Stores ~120TB of URL- and timestamp-indexed HTML pages
- Real-time scheduler for our crawlers
  - Finds out which 20 URLs to crawl now out of 3+ billion URLs
  - Helps crawlers crawl 20+ million URLs every day
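
The "Stats (as Monoids)" idea can be sketched roughly as follows. This is only an illustration, not the system described on the slide: the `Stats` class, its fields, and the `aggregate` helper are hypothetical names. The point is that when a statistic has an identity value and an associative combine operation, partial aggregates can be merged in any grouping, which is what makes incremental, real-time aggregation practical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """Hypothetical aggregate; the default value is the monoid identity."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    @staticmethod
    def of(value: float) -> "Stats":
        # Lift a single observation into the monoid.
        return Stats(1, value, value, value)

    def combine(self, other: "Stats") -> "Stats":
        # Associative merge: grouping of partial results does not matter.
        return Stats(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )


EMPTY = Stats()


def aggregate(parts):
    """Fold any iterable of Stats into one, starting from the identity."""
    return reduce(Stats.combine, parts, EMPTY)


# Two partial aggregates (for example, from two batches) merge into the
# same result as aggregating all observations at once.
batch_a = aggregate(Stats.of(v) for v in [1.0, 2.0])
batch_b = aggregate(Stats.of(v) for v in [3.0, 4.0])
merged = batch_a.combine(batch_b)
```

Approximate structures such as HyperLogLog sketches can be merged in the same associative way, so they fit this pattern as well. Whether the system on the slide used them is not stated here.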
rocksdb @indix