Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

Roksolana Diachuk
• Big Data Developer at Captify
• Diversity & Inclusion ambassador at Captify
• Women Who Code Kyiv Data Engineering Lead
• Speaker

Slide 3

Slide 3 text

Agenda
1. What is big data?
2. What does Captify do?
3. Big data tasks at Captify
4. Practical examples
5. Conclusions

Slide 4

Slide 4 text

Big Data

Slide 5

Slide 5 text

5 Vs

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

AdTech
AdTech methodologies deliver the right content at the right time to the right consumer.

Slide 8

Slide 8 text

What Captify does
Captify’s technologies unite to collect, connect and categorize billions of real-time search events from 2.3bn consumers.

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

Where’s Scala?

Slide 11

Slide 11 text

Big data tasks
• Reporting
• Insights
• Data costs attribution
• User audience building
• All kinds of data processing/storage

Slide 12

Slide 12 text

Reporting

Slide 13

Slide 13 text

Reporting
• Data provider
• Transformer
• Loader

Slide 14

Slide 14 text

Data ingestion

Slide 15

Slide 15 text

Data ingestion
• S3 lister
• Date parsing
• Schema
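The three ingestion pieces named on this slide could fit together roughly as in the Python sketch below. The slides show no code, so everything here is an assumption made for illustration: the `date=YYYY-MM-DD` key layout, the `SCHEMA` columns, and the helper names. The keys are passed in as a plain list; in a real pipeline they would come from an S3 listing call.

```python
import re
from datetime import date

# Hypothetical key layout; the slides do not show the real one.
DATE_IN_KEY = re.compile(r"date=(\d{4})-(\d{2})-(\d{2})")

# Hypothetical schema: column name -> parser for the raw string value.
SCHEMA = {"campaign_id": str, "impressions": int, "spend": float}


def parse_key_date(key):
    """Return the date embedded in an object key, or None if there is none."""
    match = DATE_IN_KEY.search(key)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def select_keys(keys, start, end):
    """Keep keys whose embedded date lies in [start, end], oldest first."""
    dated = [(parse_key_date(key), key) for key in keys]
    in_range = [(d, key) for d, key in dated if d is not None and start <= d <= end]
    return [key for _, key in sorted(in_range)]


def apply_schema(row):
    """Cast a raw row (all string values) to the types declared in SCHEMA."""
    missing = SCHEMA.keys() - row.keys()
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return {name: cast(row[name]) for name, cast in SCHEMA.items()}
```

Keys without a date (marker files, for example) are skipped rather than treated as errors, which keeps the lister tolerant of whatever else a provider drops into the bucket.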

Slide 16

Slide 16 text

Data loading

Slide 17

Slide 17 text

Data loading
• Metadata handling
• First upload vs scheduled ones
• Schema definition
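One way to read "first upload vs scheduled ones" is that stored metadata decides which dates a run should load. The Python sketch below illustrates that reading only; the function name, the `last_loaded` metadata value, and the `backfill_start` parameter are all hypothetical and not taken from the talk.

```python
from datetime import date, timedelta


def dates_to_load(last_loaded, today, backfill_start):
    """Return the dates the next run should load, oldest first.

    last_loaded is None on the first upload, so the run backfills from
    backfill_start. On scheduled runs it resumes the day after last_loaded.
    """
    first = backfill_start if last_loaded is None else last_loaded + timedelta(days=1)
    days = (today - first).days + 1
    return [first + timedelta(days=offset) for offset in range(max(days, 0))]
```

Under this assumption, a scheduled run that was skipped for a few days catches up by itself, because the range always starts from the last date recorded in the metadata.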

Slide 18

Slide 18 text

Challenges
• Diverse data types
• Time dependency
• External data storage
• Constant connection with end users

Slide 19

Slide 19 text

Insights

Slide 20

Slide 20 text

Insights
• Search data
• Impala Loader
• ES Loader
• Keywords profiler

Slide 21

Slide 21 text

Keywords profiler

Slide 22

Slide 22 text

Keywords profiler
• Extracting geographical data
• Extracting demographics

Slide 23

Slide 23 text

Data loading

Slide 24

Slide 24 text

Data loading
• ES service
• Time-bounded user profile building
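A time-bounded user profile can be pictured as an aggregate over only the events that fall inside a recent window. The Python sketch below is a minimal illustration under assumed inputs: the `(user_id, keyword, timestamp)` event shape and the `build_profiles` name are hypothetical, and the step that would write the result to Elasticsearch is left out.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def build_profiles(events, now, window):
    """Count keywords per user, using only events inside [now - window, now].

    events is an iterable of (user_id, keyword, timestamp) tuples.
    Users with no event inside the window get no profile at all.
    """
    cutoff = now - window
    profiles = defaultdict(Counter)
    for user_id, keyword, timestamp in events:
        if cutoff <= timestamp <= now:
            profiles[user_id][keyword] += 1
    return {user_id: dict(counts) for user_id, counts in profiles.items()}
```

For example, with a 7-day window, a search from a month ago no longer contributes to the profile, so the profile tracks current interest rather than full history.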

Slide 25

Slide 25 text

Challenges
• Same as reporting
• Large data volume in Elasticsearch
• Cross-team collaboration

Slide 26

Slide 26 text

Data costs attribution

Slide 27

Slide 27 text

Data costs attribution
• Log-level data
• Mapper
• Ingestion
• Transformer

Slide 28

Slide 28 text

Data costs attribution

Slide 29

Slide 29 text

Data costs attribution
• Mapping
• Ingestion
• Transformation

Slide 30

Slide 30 text

Data costs attribution
• Mapping
• Ingestion
• Transformation
• S3 lister
• Unified schema
• Data costs calculation
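The mapping, unified schema, and cost calculation steps could be sketched in Python as below. This is not the pipeline from the talk: the provider names, field mappings, and per-segment prices are invented for illustration, and the CPM formula (cost = impressions / 1000 × price per thousand) is a common ad-tech convention assumed here rather than something the slides state.

```python
# Hypothetical per-provider mappings from raw field names to unified ones.
FIELD_MAPS = {
    "provider_a": {"imps": "impressions", "seg": "segment_id"},
    "provider_b": {"impression_count": "impressions", "segment": "segment_id"},
}

# Hypothetical price per thousand impressions (CPM) for each data segment.
CPM_BY_SEGMENT = {"s1": 1.5, "s2": 0.5}


def to_unified(provider, record):
    """Rename a provider-specific record's fields to the unified schema."""
    mapping = FIELD_MAPS[provider]
    return {unified: record[raw] for raw, unified in mapping.items()}


def attribute_costs(records):
    """Sum the data cost per segment over (provider, record) pairs."""
    totals = {}
    for provider, record in records:
        row = to_unified(provider, record)
        segment = row["segment_id"]
        cost = row["impressions"] / 1000 * CPM_BY_SEGMENT[segment]
        totals[segment] = totals.get(segment, 0.0) + cost
    return totals
```

Mapping every provider to one schema before calculating costs means the calculation itself is written once, and adding a provider only requires a new entry in the mapping table.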

Slide 31

Slide 31 text

Challenges
• Processing and storing really large data volumes (!)
• Failure handling
• Historical data re-processing

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

My contact info
• dead_flowers22
• roksolana-d
• roksolanadiachuk
• roksolanad