Slide 1
Slide 1 text
No content
Slide 2
Slide 2 text
Roksolana Diachuk • Big Data Developer at Captify • Diversity & Inclusion ambassador at Captify • Women Who Code Kyiv Data Engineering Lead • Speaker
Slide 3
Slide 3 text
Agenda 1. What is big data? 2. What does Captify do? 3. Big data tasks at Captify 4. Practical examples 5. Conclusions
Slide 4
Slide 4 text
Big Data
Slide 5
Slide 5 text
5 Vs
Slide 6
Slide 6 text
No content
Slide 7
Slide 7 text
AdTech AdTech methodologies deliver the right content at the right time to the right consumer
Slide 8
Slide 8 text
What Captify does Captify’s technologies unite to collect, connect and categorize billions of real-time search events from 2.3bn consumers.
Slide 9
Slide 9 text
No content
Slide 10
Slide 10 text
Where’s Scala?
Slide 11
Slide 11 text
Big data tasks • Reporting • Insights • Data costs attribution • User audience building • All kinds of data processing/storage
Slide 12
Slide 12 text
Reporting
Slide 13
Slide 13 text
Reporting Data provider Transformer Loader
Slide 14
Slide 14 text
Data ingestion
Slide 15
Slide 15 text
S3 lister Dates parsing Schema Data ingestion
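The slide names an S3 lister and a date-parsing step without showing them. As a rough illustration only, here is a minimal Python sketch of picking the object keys that fall inside a date range. The key layout (`date=YYYY-MM-DD` inside the path) and the helper names `parse_key_date` and `keys_in_range` are assumptions made for this example, not details from the talk, and the keys are passed in as plain strings rather than fetched from S3.

```python
import re
from datetime import date

# Hypothetical key layout: <prefix>/date=YYYY-MM-DD/<file>
DATE_RE = re.compile(r"date=(\d{4})-(\d{2})-(\d{2})")


def parse_key_date(key):
    """Return the date embedded in a key, or None if absent or invalid."""
    match = DATE_RE.search(key)
    if match is None:
        return None
    year, month, day = map(int, match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # e.g. date=2021-02-30 is not a real calendar date
        return None


def keys_in_range(keys, start, end):
    """Keep keys whose embedded date lies in [start, end], oldest first."""
    selected = []
    for key in keys:
        key_date = parse_key_date(key)
        if key_date is not None and start <= key_date <= end:
            selected.append((key_date, key))
    return [key for _, key in sorted(selected)]
```

Keys without a parsable date are skipped rather than raising, which is one possible choice when a bucket mixes data files with markers or manifests.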
Slide 16
Slide 16 text
Data loading
Slide 17
Slide 17 text
Data loading Metadata handling First upload vs scheduled ones Schema definition
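One way to read "first upload vs scheduled ones" is that stored metadata decides how much data a run has to load. The Python sketch below works under that assumption: a hypothetical `last_loaded` date is kept as metadata, a first upload backfills from a hypothetical `history_start`, and a scheduled run only picks up the days after the last successful load. None of these names come from the slides.

```python
from datetime import date, timedelta


def dates_to_load(last_loaded, today, history_start):
    """Return the list of days a loader run should process.

    last_loaded   -- date of the last successful load, or None on first upload
    today         -- last day to include
    history_start -- first day to backfill when nothing was loaded before
    """
    if last_loaded is None:
        first = history_start  # first upload: load the whole history
    else:
        first = last_loaded + timedelta(days=1)  # scheduled run: only new days
    day_count = max((today - first).days + 1, 0)
    return [first + timedelta(days=offset) for offset in range(day_count)]
```

If the metadata is already up to date, the function returns an empty list, so a repeated scheduled run has nothing to do.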
Slide 18
Slide 18 text
Challenges • Diverse data types • Time dependency • External data storage • Constant connection with end users
Slide 19
Slide 19 text
Insights
Slide 20
Slide 20 text
Insights Search data Impala Loader ES Loader Keywords profiler
Slide 21
Slide 21 text
Keywords profiler
Slide 22
Slide 22 text
Keywords profiler Extracting geographical data Extracting demographics
Slide 23
Slide 23 text
Data loading
Slide 24
Slide 24 text
Data loading ES service Time-bounded user profile building
Slide 25
Slide 25 text
Challenges • Same as reporting • Large data volume in Elasticsearch • Cross-team collaboration
Slide 26
Slide 26 text
Data costs attribution
Slide 27
Slide 27 text
Data costs attribution Log-level data Mapper Ingestion Transformer
Slide 28
Slide 28 text
Data costs attribution
Slide 29
Slide 29 text
Data costs attribution Mapping Ingestion Transformation
Slide 30
Slide 30 text
Data costs attribution Mapping Ingestion Transformation S3 lister Unified schema Data costs calculation
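The mapping, unified schema, and cost calculation steps are only named on the slide. The Python sketch below shows one plausible shape for them: provider-specific log records are renamed into a shared schema and costs are then summed per segment. The provider names, field names, and the use of integer micro-units for cost are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical per-provider mappings: source field -> unified field
FIELD_MAPPINGS = {
    "provider_a": {"ts": "timestamp", "seg": "segment_id", "price_micros": "cost_micros"},
    "provider_b": {"event_time": "timestamp", "segment": "segment_id", "cost": "cost_micros"},
}


def to_unified(provider, record):
    """Rename a provider-specific record into the unified schema."""
    mapping = FIELD_MAPPINGS[provider]
    unified = {target: record[source] for source, target in mapping.items()}
    unified["provider"] = provider
    return unified


def cost_per_segment(unified_records):
    """Sum cost (integer micro-units) for each segment id."""
    totals = defaultdict(int)
    for record in unified_records:
        totals[record["segment_id"]] += record["cost_micros"]
    return dict(totals)
```

Integer micro-units avoid floating-point rounding when many small costs are added up. A missing source field raises `KeyError`, which surfaces schema drift early instead of silently dropping data.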
Slide 31
Slide 31 text
Challenges • Processing and storing really large data volumes (!) • Failure handling • Historical data re-processing
Slide 32
Slide 32 text
No content
Slide 33
Slide 33 text
My contact info dead_flowers22 roksolana-d roksolanadiachuk roksolanad