More Than Websites: PHP And The Firehose @DataSift

PHP is the world's #1 programming language for creating websites. But it's capable of so much more. How about processing the social firehose in real time? :)

Stuart Herbert

March 18, 2013

Transcript

  1. More Than Websites: PHP And The Firehose @DataSift (Saturday, 23 March 13)
  2. Introduce Yourselves
  3. @stuherbert
  4. What is DataSift?
  5. Sift through social data: Twitter firehose, Facebook, bitly clicks, news, videos, comments and more
  6. Gain insights using augmentations: language, gender, trends, links, sentiment, salience & entity analysis and more
  7. Realtime: get matching data within seconds of it being posted
  8. Historics: search our social data archive going back to January 2010
  9. Pull the data from our servers via HTTP/1.1 streaming or WebSockets
  10. Let us push data to you: have the data delivered directly to your servers or into your databases
  11. DataSift in numbers
  12. 30: sources of social data and data augmentations
  13. Up to 20,000: number of new pieces of data ingested into DataSift every second
  14. 3 terabytes: amount of new data added to the Historics archive every week
  15. 12: different ways we can deliver data to you
  16. 1: average number of seconds it takes data to pass through DataSift
  17. 12: number of services data passes through inside DataSift
  18. 25: number of engineers who write code for the DataSift platform
  19. 5: primary programming languages (C++, Node, PHP, Python, Scala)
  20. 154: private GitHub repos
  21. Our GitHub Repositories (chart of repository counts by language: PHP, Java & Scala, C & C++, JS & Node, Unclassified, Python, Shell Script, Ruby, C#, VimL)
  22. Architecture
  23. Three major data pipelines + supporting services
  24. Data Archiving: adds new data to the Historics archive
  25. Filtering Pipeline: filtering and delivery of data in realtime
  26. Playback Pipeline: filtering and delivery of data from the Historics archive
  27. DataSift Technical Architecture (diagram: "DataSift Architecture 2.2", @lorenzoalberton)
  28. Filtering Pipeline (the architecture diagram, highlighting the filtering pipeline)
  29. Data Archiving Pipeline (the architecture diagram, highlighting the data archiving pipeline)
  30. Playback Pipeline (the architecture diagram, highlighting the playback pipeline)
  31. Written In PHP (the architecture diagram, highlighting the components written in PHP)
  32. 100%: every piece of data is handled by our PHP code in realtime
  33. What we do in PHP
  34. Marketing website: runs on Drupal
  35. Our main webapp: customer signup, stream creation, account management
  36. Our external API: our main interface with customers
  37. Boring! That’s all very standard, well-understood stuff. The interesting uses are behind the scenes.
  38. Behind the scenes? Are you mad?!? Everyone knows that PHP is only for building websites!
  39. Internal services: APIs that support our data pipelines (user management, billing, data security)
  40. Data assembly: convert incoming data into a common ‘interaction’ structure
  41. 100%: every piece of data is handled by our PHP code in realtime
  42. Push delivery: outbound delivery of data to customers’ servers and into their databases
  43. 1 MP3/sec: how much data we can deliver to a single EC2 micro-instance
  44. 500: number of simultaneous deliveries to customers every second
  45. Hornet: our EvilTestTool(tm), designed to melt the data centre
  46. Storyteller: our functional test tool. Brings user stories to life; fires up VMs, deploys code, tests services. Reproducibly.
  47. Why PHP?
  48. Our History: DataSift grew out of TweetMeme
  49. Our Product: PHP is superb at handling unstructured data
  50. Our Customers: PHP can talk to any server, database or datastore that we want to deliver data to
  51. Our People: several ‘names’ from the PHP community; PHP is a language most engineers know
  52. Our Time: PHP is a great language for building high-quality code very, very quickly
  53. Our Performance: PHP is fast enough for data assembly work, and is getting faster with every major release
  54. Our Sanity: our PHP applications require less Ops time than any of the others
  55. Frameworks
  56. Rolled our own: Frink & Stone
  57. Right choice for us: we’re not part of the target demographic for the major PHP frameworks (nor the minor ones, tbh)
  58. Frink: TweetMeme’s framework, built to handle millions of tweeted links a day
  59. Built for speed: stripped down to the bare essentials, a reaction to our experience with early Zend Framework
  60. Jobqueues: long-running daemon processes; worker processes handle data queues, and a manager process monitors the workers
  61. Stone: the foundation of our in-house test tools, Hornet and Storyteller
  62. Built for speed: powers our fake Twitter firehose used for testing
  63. Built for inspection: allows us to measure activity normally hidden by libraries and PHP extensions
  64. Tools & utilities
  65. PHP 5.3.latest: compiled in-house, with extensions statically linked for performance
  66. ZeroMQ extension: transport layer for our pipelines
  67. APC extension: shared memory for app metrics. PHP is too slow without an opcache, and the lack of APC has prevented us from moving to PHP 5.4
  68. XHProf extension: for profiling code; skews the results less than Xdebug
  69. Redis extension: buffering and queueing (being phased out)
  70. Xdebug: for code coverage metrics (and readable var_dump()s!)
  71. PHPUnit: for all our unit tests
  72. phpdoc2: for code documentation (although nobody reads it; code is king)
  73. Maven: for building all release RPM packages
  74. Jenkins: continuous integration
  75. RPM: packages for deployment into dev, test, staging, and production
  76. Thank you. PS: We’re hiring :-)
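Slides 40 and 60 describe two mechanisms that fit together: data assembly, which converts incoming data into a common ‘interaction’ structure, and Frink's jobqueues, where worker processes handle data queues while a manager process monitors them. The talk shows no code for either. What follows is only a minimal sketch under stated assumptions, written in Python rather than the PHP DataSift uses. Everything in it is hypothetical: the `Interaction` fields, the input shapes for the two sources, the `type` dispatch key, and the `Manager` class with its `supervise()` and `shutdown()` methods. It also uses threads in place of the separate long-running processes the slides describe, purely to keep the example short and self-contained.

```python
import queue
import threading
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


@dataclass
class Interaction:
    """Hypothetical common structure that every source is converted into."""
    source: str
    author: str
    content: str
    created_at: datetime


def _from_twitter(raw: Dict[str, Any]) -> Interaction:
    # Assumed input shape: {"user": {"screen_name": ...}, "text": ..., "timestamp": unix_secs}
    return Interaction("twitter", raw["user"]["screen_name"], raw["text"],
                       datetime.fromtimestamp(raw["timestamp"], tz=timezone.utc))


def _from_facebook(raw: Dict[str, Any]) -> Interaction:
    # Assumed input shape: {"from": {"name": ...}, "message": ..., "timestamp": unix_secs}
    return Interaction("facebook", raw["from"]["name"], raw["message"],
                       datetime.fromtimestamp(raw["timestamp"], tz=timezone.utc))


# Data assembly: one converter per source, selected by an assumed "type" key.
ASSEMBLERS: Dict[str, Callable[[Dict[str, Any]], Interaction]] = {
    "twitter": _from_twitter,
    "facebook": _from_facebook,
}


def assemble(raw: Dict[str, Any]) -> Interaction:
    return ASSEMBLERS[raw["type"]](raw)


_STOP = object()  # sentinel that tells a worker to exit cleanly


class Manager:
    """Runs worker threads (standing in for worker processes) that drain
    `inbox`, and can restart any worker that has died."""

    def __init__(self, n_workers: int, inbox: queue.Queue, outbox: queue.Queue):
        self.inbox, self.outbox = inbox, outbox
        self.workers: List[threading.Thread] = [self._spawn() for _ in range(n_workers)]

    def _spawn(self) -> threading.Thread:
        worker = threading.Thread(target=self._work, daemon=True)
        worker.start()
        return worker

    def _work(self) -> None:
        while True:
            raw = self.inbox.get()
            try:
                if raw is _STOP:
                    return
                self.outbox.put(assemble(raw))
            except KeyError:
                pass  # unknown source or missing field: drop the item
            finally:
                self.inbox.task_done()  # runs even if an unexpected error kills this worker

    def supervise(self) -> None:
        # Meant to be called periodically: replace workers that exited unexpectedly.
        self.workers = [w if w.is_alive() else self._spawn() for w in self.workers]

    def shutdown(self) -> None:
        for _ in self.workers:
            self.inbox.put(_STOP)
        for worker in self.workers:
            worker.join()


inbox: queue.Queue = queue.Queue()
outbox: queue.Queue = queue.Queue()
manager = Manager(2, inbox, outbox)
inbox.put({"type": "twitter", "user": {"screen_name": "stuherbert"}, "text": "hello", "timestamp": 0})
inbox.put({"type": "facebook", "from": {"name": "Stuart"}, "message": "hi", "timestamp": 0})
inbox.put({"type": "myspace"})  # no assembler registered for this source: dropped
inbox.join()  # block until every queued item has been handled
manager.shutdown()

results = []
while not outbox.empty():
    results.append(outbox.get_nowait())
print(sorted(r.source for r in results))  # ['facebook', 'twitter']
```

Catching `KeyError` drops malformed items without stopping a worker. Any other exception ends that worker thread, which is the situation `supervise()` is meant to repair when a manager loop calls it periodically.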