Modern data pipelines in AdTech - life in the trenches

"The modern data pipelines approach helps us solve a variety of tasks in different domains, including advertising. Modern data pipelines let us process data more efficiently, with a diverse set of data transformation tools for both batch and streaming processing. AdTech is a traditional industry that constantly changes and innovates. Today, it draws a lot of attention as we expand our reach and move toward a cookieless world.

In this talk, you will learn how to use modern data pipelines for reporting and analytics, as well as for historical data reprocessing in AdTech. We’ll dive deeper into each case, exploring the problem itself, implementation, challenges, and future improvements. In cases like business rule changes or errors in past data, we need to reprocess our historical data, and that is not a trivial task: each step requires a lot of time, precision, and computational resources. For this reason, a whole section of the talk will be devoted to approaches to historical data reprocessing and data lifecycle management."

Presented at QCon London 2022 (London, UK), QCon Plus 2022, O'Reilly Data Superstream 2022, Codemotion Spain 2022, Big Data Tech Warsaw 2023, Codementors meetup

Roksolana

April 06, 2022

Transcript

  1. Roksolana Diachuk • Big Data Developer at Captify • Diversity & Inclusion ambassador at Captify • Women Who Code Kyiv Data Engineering Lead • Speaker
  2. Agenda 1. What is AdTech? 2. Data pipelines in AdTech 3. Practical examples 4. Historical data reprocessing 5. Conclusions
  3. What does Captify do? Captify’s technologies unite to collect, connect and categorise billions of real-time search events from 2.3 billion consumers.
  4. Data pipelines in AdTech • Reporting • Insights • Data cost attribution • User audience building • All kinds of data processing/storage
  5. Future with Delta Lake • Parquet files => Delta files • Spark tables => Delta tables • … • Leveraging data versions through Delta tables history • Vacuum unsuitable data
  6. Conclusions 1. AdTech is an exciting domain for big data 2. There is more than one approach to leveraging data
  7. Conclusions 1. AdTech is an exciting domain for big data 2. There is more than one approach to leveraging data 3. There is always room for improvement