
Florian Pfeiffer - User Behavior Tracking

In this talk I'll outline the different ideas we had when we built our system to track users and events on our websites. Besides the infrastructure we built, I'll also present an actual use case for that system: feeding data into our recommendation engine for newly registered users to avoid the cold-start problem.

MunichDataGeeks

November 26, 2013

Transcript

  1. User Behaviour Tracking: Track - Store - Process! // Florian Pfeiffer - Head of Data & Infrastructure - gutefrage.net
  2. Ideas, Thoughts & Goals: fast / minimal impact on page loading time; high availability; track users over multiple platforms; storage engine? -> HBase
  3. Numbers! 10-20 ms response time per pixel; record for now: ~2,500 concurrent requests; 1.5 billion entries in HBase; 10 nodes in the Hadoop cluster
  4. Storing Infrastructure: every nginx node runs flume-ng; Flume ingests the log file; an AsyncHBaseSink with a custom serializer writes directly to HBase
  5. Why Flume? We already had it in production ;) Storm might be an interesting alternative
  6. Why? You can scan through all data and use filters to select specific data, but scanning with a start & stop row speeds things up (a lot)
  7. HBase rowkey design: Do I need a fast user lookup or a fast timespan lookup? User: clientid,ts<,connectionId>; Timespan: ts,clientid<,connectionId>
  8. Inverse Timestamps: Data in HBase is stored lexicographically sorted. Normal TS: a scan yields the oldest results first. Inverse TS: newer entries come first (and you can cancel the scan once you have enough data)
  9. The olden times… or Cookies: easy to drop a 3rd-party cookie with a userId on different websites; gets blocked more and more (Safari, FF, ...)
  10. Fingerprinting: yields interesting results on desktop, difficult on e.g. the iPhone; invisible to the user; last resort if everything else fails?
  11. Batch Processing: calculate how many users are active on platform A and also on B; get the traffic of all questions belonging to channel X, sorted by country
  12. Recommendations @ GF.net: users emit signals on question view, like, giving an answer, and answer voted best; the application sends the signals through RabbitMQ to the recommendation servers
  13. ?
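Slide 4 mentions a custom serializer that turns each logged tracking-pixel request into an HBase write. The real serializer would be Java code plugged into Flume's AsyncHBaseSink; the Python sketch below only illustrates the parsing step under stated assumptions: the log line format, the `/pixel.gif` path, the query parameters `cid`, `ev`, `q`, `ts` and the column family `e` are all hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical tracking-pixel request as it might appear in an nginx access log.
# Path, parameter names and values are illustrative assumptions only.
SAMPLE = ('10.0.0.1 - - [26/Nov/2013:10:00:00 +0100] '
          '"GET /pixel.gif?cid=abc123&ev=view&q=42&ts=1385456400000 HTTP/1.1" 200 43')

def parse_pixel_line(line: str) -> dict:
    """Extract the query parameters of the pixel request from one log line."""
    request = line.split('"')[1]   # 'GET /pixel.gif?... HTTP/1.1'
    path = request.split(" ")[1]   # '/pixel.gif?...'
    params = parse_qs(urlsplit(path).query)
    # parse_qs returns lists; keep the first value of each parameter
    return {key: values[0] for key, values in params.items()}

def to_hbase_put(line: str, family: bytes = b"e") -> tuple[bytes, dict[bytes, bytes]]:
    """Map a log line to (rowkey, {column: value}), as a serializer might."""
    p = parse_pixel_line(line)
    # 'User' rowkey layout from slide 7: clientid,ts
    rowkey = f"{p['cid']},{p['ts']}".encode()
    columns = {
        family + b":" + key.encode(): value.encode()
        for key, value in p.items()
        if key not in ("cid", "ts")  # already encoded in the rowkey
    }
    return rowkey, columns
```

For the sample line this yields the rowkey `abc123,1385456400000` and the columns `e:ev` and `e:q`.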
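Slide 6's point can be illustrated with a toy model: because HBase keeps rows sorted by rowkey, a start/stop row lets a scan jump straight to the first matching key and end at the range boundary, while a filter-only scan has to look at every row. The sketch below uses a sorted Python list as a hypothetical stand-in for a table; it is not the HBase client API.

```python
import bisect

# Toy stand-in for an HBase table: rowkeys kept in lexicographic order,
# following the hypothetical "clientid,ts" layout.
ROWS = sorted([
    "alice,1385456400000", "alice,1385456500000",
    "bob,1385456400000", "bob,1385456450000", "bob,1385456600000",
    "carol,1385456400000",
])

def full_scan_with_filter(rows, predicate):
    """Visit every row and keep matches, as a filter-only scan must."""
    visited, hits = 0, []
    for key in rows:
        visited += 1
        if predicate(key):
            hits.append(key)
    return hits, visited

def range_scan(rows, start, stop):
    """Return rows with start <= key < stop; binary search finds both ends."""
    lo = bisect.bisect_left(rows, start)
    hi = bisect.bisect_left(rows, stop)
    return rows[lo:hi], hi - lo  # only the rows inside the range are visited
```

The stop row `bob-` works as a prefix bound because `-` directly follows `,` in ASCII, so every key starting with `bob,` sorts below it: `range_scan(ROWS, "bob,", "bob-")` visits 3 rows where the filtered full scan visits all 6.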
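The two rowkey layouts from slide 7, combined with the inverse timestamps from slide 8, could look like the following. This is a minimal sketch, assuming millisecond timestamps encoded as 8-byte big-endian integers and subtracted from `2**63 - 1` (the value of Java's `Long.MAX_VALUE`); the helper names are hypothetical.

```python
import struct

# Assumption: epoch-millisecond timestamps, stored as 8-byte big-endian
# unsigned integers so that byte order matches numeric order.
MAX_TS = 2**63 - 1  # same value as Java's Long.MAX_VALUE

def inverse_ts(ts_ms: int) -> bytes:
    """Newer timestamps give smaller byte strings, so they sort first."""
    return struct.pack(">Q", MAX_TS - ts_ms)

def user_key(client_id: str, ts_ms: int, connection_id: str = "") -> bytes:
    """'User' layout clientid,ts<,connectionId>: fast lookups per user."""
    key = client_id.encode() + b"," + inverse_ts(ts_ms)
    return key + b"," + connection_id.encode() if connection_id else key

def timespan_key(client_id: str, ts_ms: int, connection_id: str = "") -> bytes:
    """'Timespan' layout ts,clientid<,connectionId>: fast lookups per time range."""
    key = inverse_ts(ts_ms) + b"," + client_id.encode()
    return key + b"," + connection_id.encode() if connection_id else key
```

Sorting the keys of one user puts the newest event first, so a scan over the `bob,` prefix can be cancelled after the first N rows, which is the early-stop benefit slide 8 describes.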
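Slide 10 does not say how the fingerprint is computed. One common approach, shown here purely as an assumption, is to hash a canonical serialization of attributes the browser exposes; the attribute names below are hypothetical. The sketch also hints at why iPhones are hard: identical devices with identical settings produce identical fingerprints.

```python
import hashlib
import json

def fingerprint(attrs: dict) -> str:
    """Hash a canonical JSON form of browser attributes into a stable ID.

    sort_keys makes the result independent of attribute order, so the same
    browser yields the same fingerprint on every page view.
    """
    canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Any attribute that differs (time zone, screen size, installed fonts) changes the hash; devices that expose the same values collapse into one fingerprint.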
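The two batch questions on slide 11 boil down to a set intersection and a grouped count. In the talk these presumably run as batch jobs on the Hadoop cluster; the in-memory Python below only illustrates the logic, with hypothetical event tuples `(client_id, platform, question_id, country)` and a hypothetical question-to-channel mapping.

```python
from collections import Counter

# Hypothetical tracked events: (client_id, platform, question_id, country)
EVENTS = [
    ("c1", "A", "q1", "DE"),
    ("c1", "B", "q2", "AT"),
    ("c2", "A", "q1", "DE"),
    ("c3", "B", "q3", "CH"),
    ("c3", "A", "q2", "AT"),
]
# Hypothetical mapping of questions to channels
QUESTION_CHANNEL = {"q1": "X", "q2": "X", "q3": "Y"}

def users_on_both(events, a, b):
    """Count client ids seen on platform a and on platform b."""
    on_a = {cid for cid, platform, *_ in events if platform == a}
    on_b = {cid for cid, platform, *_ in events if platform == b}
    return len(on_a & on_b)

def channel_traffic_by_country(events, channel, question_channel):
    """Page views of all questions in a channel, counted per country."""
    counts = Counter(
        country
        for _, _, qid, country in events
        if question_channel.get(qid) == channel
    )
    return sorted(counts.items())  # sorted by country code
```

With the sample data, `users_on_both(EVENTS, "A", "B")` counts `c1` and `c3`, and channel X splits into `[("AT", 2), ("DE", 2)]`.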
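Slide 12 has the application send user signals through RabbitMQ to the recommendation servers. A minimal sketch of the message side, assuming a JSON payload with hypothetical field and signal names; the actual publish call (for example `basic_publish` on a pika channel) is left out so the example runs without a broker. The `cold_start_signals` helper is an assumption about how history tracked before registration could be replayed for a newly registered user, the use case named in the abstract.

```python
import json

# Signal types listed on slide 12; the string identifiers are hypothetical.
SIGNAL_TYPES = {"view", "like", "answer", "best_answer"}

def build_signal(user_id: str, question_id: str, signal: str, ts_ms: int) -> bytes:
    """Serialize one user signal as a JSON message body."""
    if signal not in SIGNAL_TYPES:
        raise ValueError(f"unknown signal type: {signal}")
    payload = {"user": user_id, "question": question_id,
               "signal": signal, "ts": ts_ms}
    return json.dumps(payload, sort_keys=True).encode("utf-8")

def cold_start_signals(new_user_id: str, tracked_history) -> list[bytes]:
    """Re-emit events tracked under the anonymous client id for the new account.

    tracked_history: iterable of (question_id, signal, ts_ms) tuples,
    e.g. read from the per-user rowkey range in HBase (assumption).
    """
    return [build_signal(new_user_id, qid, sig, ts) for qid, sig, ts in tracked_history]
```

Each returned message body would then be published to the queue the recommendation servers consume.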