Fraud, Fun and Fortune in Event Log Analytics

Presentation on Video Ad Fraud Analytics at PyData 2015 in NYC.

Feyzi R. Bagirov

November 11, 2015

Transcript

  1. Agenda
     • What is Real-Time Bidding (RTB) and how does it work?
     • What are logs and how do they work?
     • Describing a fraud
     • Exploratory log data analysis demo
  2. What is Real-Time Bidding?
     • A technology (based on the IAB OpenRTB protocol) in the online advertising industry that enables real-time auctions of online advertising.
  3. How does Real-Time Bidding work?
     Participants:
     • Supply Side Platform (SSP): an advertising network that supplies spots for impressions to the auction
     • Demand Side Platform (DSP): an advertising network that provides advertisement content to the auction
     • Data Management Platform (DMP): a user data supplier
  4. Mechanics of Real-Time Bidding
     [Flow diagram; labels: Advertising Server, RTB Platform]
     • User opens browser, goes to the website (e.g. cnn.com)
       – Request ad from ad server
     • The site auctions ads
       – Redirect to ad exchange
     • DSPs (DataXu, etc.) bid
       – Send to real-time bidders
     • DSP wins bid
       – Winner serves ad
     • User sees ad
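
The bidding step in the flow above can be sketched in a few lines of Python. This is a minimal illustration, not the exchange's actual logic: it assumes a plain "highest bid at or above a floor price wins" rule, and the `Bid` class, the `run_auction` function, the DSP names, and the prices are all hypothetical. Real exchanges apply their own pricing rules, which the slides do not describe.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bid:
    dsp: str          # hypothetical bidder name
    price_cpm: float  # bid price per thousand impressions


def run_auction(bids: List[Bid], floor_cpm: float = 0.0) -> Optional[Bid]:
    """Return the highest bid at or above the floor, or None if no bid qualifies."""
    eligible = [b for b in bids if b.price_cpm >= floor_cpm]
    if not eligible:
        return None  # corresponds to a "no ad" outcome
    return max(eligible, key=lambda b: b.price_cpm)


# Assumed example bids, for illustration only.
bids = [Bid("dsp_a", 2.10), Bid("dsp_b", 3.40), Bid("dsp_c", 1.75)]
winner = run_auction(bids, floor_cpm=2.00)
print(winner.dsp if winner else "no ad")
```

Returning `None` when no bid clears the floor keeps the "no ad" case explicit, so a caller has to handle it rather than serve an empty slot silently.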
  5. Video Ad Logs from Treasure Data
     • Over 162,000,000 records across 14 tables
     • Some of the log variables:
       – Referrer
       – Ad source
       – Agent (browser and device)
       – Locations
       – Impression data (when somebody is actually viewing an ad)
     • Log event types:
       – Ad call started
       – Ad call completed
       – Error No Ad
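
A first exploratory pass over logs like these is often a count of events by type. The sketch below assumes each record is a dictionary with an `event` field and uses the made-up labels `ad_call_started`, `ad_call_completed`, and `error_no_ad` for the three event types listed above; the real Treasure Data schema is not shown in the slides, and the sample records are invented.

```python
from collections import Counter

# Invented sample records; field names and values are assumptions.
events = [
    {"event": "ad_call_started", "referrer": "example.com"},
    {"event": "ad_call_completed", "referrer": "example.com"},
    {"event": "ad_call_started", "referrer": "example.org"},
    {"event": "ad_call_completed", "referrer": "example.org"},
    {"event": "ad_call_started", "referrer": "example.net"},
    {"event": "error_no_ad", "referrer": "example.net"},
    {"event": "ad_call_started", "referrer": "example.com"},
    {"event": "ad_call_completed", "referrer": "example.com"},
]


def summarize(records):
    """Count events by type and compute completed / started ad calls."""
    counts = Counter(r["event"] for r in records)
    started = counts["ad_call_started"]
    completion_rate = counts["ad_call_completed"] / started if started else 0.0
    return counts, completion_rate


counts, completion_rate = summarize(events)
print(dict(counts), completion_rate)
```

Grouping the same counts by referrer or agent is a natural next step, since an unusually high or low completion rate for one source is the kind of anomaly worth a closer look.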
  6. Opportunities for Video Ad Fraud
     • Invalid Traffic
       – Bot fraud
       – Single pixel
       – Stacked ads
       – Below the fold
     • Yield Optimization
       – Arbitrage problem
     • Inventory Trading
       – Publishers and user inventory
  7. Fraud Taxonomy (1/3)
     • Illegitimate and Non-Human Traffic Sources
       – Hijacked device: any user's device (browser, phone, app or other system) that has been modified to call HTML or make ad requests that are not under the user's control and are made without the user's consent
       – Crawler masquerading as a legitimate user: a browser, server or app that makes page load calls automatically without declaring itself as a robot, instead declaring a valid regular browser or app user agent where there is no real human user
  8. Fraud Taxonomy (2/3)
     • Hijacked Tags:
       – Ad Tag Hijacking: taking ad tags from a publisher's site and putting them onto another site without the publisher's knowledge
       – Creative Hijacking: copying the creative tags from a legitimately served ad so they can be rendered at a later time, without the consent of the advertiser or their contracted service provider
  9. Fraud Taxonomy (3/3)
     • Site or Impression Attributes:
       – Hidden Ads: ads placed in such a manner that they can never be viewable, e.g. stacked ads, ads clipped by iframes, zero-opacity ads
     • Ad creative/other
       – Cookie-stuffing: the process by which a client is provided with cookies from other domains as if the user had visited those other domains
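
Two of the hidden-ad patterns named in the taxonomy, single-pixel and zero-opacity placements, can be expressed as simple rule checks. The sketch below assumes that rendered width, height, and opacity are available for each placement; the slides do not say whether the logs carry these fields, and the function name and thresholds are illustrative only. Stacked ads and iframe clipping need page layout information and are not covered here.

```python
def hidden_ad_reasons(width_px, height_px, opacity):
    """Return the hidden-ad rules a placement trips (empty list if none)."""
    reasons = []
    if width_px <= 1 and height_px <= 1:
        reasons.append("single pixel")   # ad rendered into a 1x1 (or smaller) area
    if opacity == 0:
        reasons.append("zero opacity")   # ad fully transparent
    return reasons


# Assumed example placements: (width, height, opacity).
print(hidden_ad_reasons(1, 1, 1.0))
print(hidden_ad_reasons(640, 360, 0.0))
print(hidden_ad_reasons(640, 360, 1.0))
```

Returning the list of tripped rules, rather than a single boolean, makes it easier to report why a placement was flagged.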