Roksolana Diachuk
• Big Data Developer at Captify
• Women Who Code Kyiv Data Engineering Lead
• Speaker and traveller
• Big data and functional programming fan
Slide 3
Slide 3 text
WHAT IS BIG DATA
Slide 4
Slide 4 text
5 Vs
Slide 5
Slide 5 text
Big data
• Structured
• Semi-structured
• Unstructured
Slide 7
Slide 7 text
Main tasks
• Building data pipelines
• Data Storage
• Data Processing
• Infrastructure
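As an illustration of how the first three tasks fit together, here is a minimal sketch of a pipeline in Python. The event format, the function names (`extract`, `transform`, `load`) and the plain dict standing in for storage are assumptions made for this example, not something taken from the talk.

```python
import json
from collections import Counter

# Hypothetical raw input: one JSON event per line, with one corrupt record.
RAW_EVENTS = [
    '{"user": "a", "action": "click"}',
    '{"user": "b", "action": "view"}',
    'not valid json',
    '{"user": "a", "action": "view"}',
]


def extract(lines):
    """Pipeline step: parse raw lines, skipping records that fail to parse."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def transform(events):
    """Processing step: count events per user."""
    counts = Counter()
    for event in events:
        counts[event["user"]] += 1
    return counts


def load(counts, store):
    """Storage step: write the aggregates into a dict standing in for a database."""
    store.update(counts)
    return store


def run_pipeline(lines):
    """Chain the three steps into one pipeline run."""
    return load(transform(extract(lines)), {})


result = run_pipeline(RAW_EVENTS)
print(result)
```

In a real system each step would typically be a separate component (a message queue, a processing engine, a distributed store), and the infrastructure task is what keeps them running.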
Slide 9
Slide 9 text
Building data pipelines
Slide 10
Slide 10 text
Data storage
Slide 11
Slide 11 text
Data processing
Slide 12
Slide 12 text
Infrastructure
Slide 13
Slide 13 text
BIG DATA vs DATA SCIENCE
Slide 14
Slide 14 text
Big data
• Data processing and cleaning
• Data pipelines
• Testing and maintenance
• Storing data
Data science
• Data Pre-Processing
• Data Analysis
• Building ML models
Slide 15
Slide 15 text
Data Scientist
Data Analyst
Data Engineer
Data Communication
Math, Stats, Algorithms
Software Engineering
Slide 16
Slide 16 text
General ML workflow
Slide 17
Slide 17 text
RDBMS → ML model → Metrics
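The three stages of this workflow can be sketched end to end with the Python standard library. This is only an illustrative sketch: the in-memory SQLite table `samples`, the toy data, the least-squares line used as the "model" and the mean absolute error used as the metric are all assumptions made for the example.

```python
import sqlite3

# Stage 1, RDBMS: an in-memory SQLite database stands in for the relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [(1, 2), (2, 4), (3, 6), (4, 8)],
)
rows = conn.execute("SELECT x, y FROM samples ORDER BY x").fetchall()
conn.close()

# Hold out the last row for evaluation.
train, test = rows[:3], rows[3:]


def fit_line(points):
    """Stage 2, ML model: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    slope = numerator / denominator
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train)

# Stage 3, Metrics: mean absolute error on the held-out row.
mae = sum(abs(y - (slope * x + intercept)) for x, y in test) / len(test)
print(slope, intercept, mae)
```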
Slide 18
Slide 18 text
RDBMS → ML model → Metrics
BIG DATA DATA SCIENCE
Slide 19
Slide 19 text
Back to big data engineering…
Slide 20
Slide 20 text
Challenges
• Nature of the data
• Instant data processing
• Historical data processing
• Storage of large volumes of data
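To make the difference between historical and instant processing concrete, here is a small sketch that computes the same totals in two ways: once over a complete dataset (batch) and once incrementally as each event arrives (streaming). The event shape, `batch_totals` and `StreamingTotals` are hypothetical names chosen for this example.

```python
from collections import defaultdict

# Hypothetical events: (key, value) pairs.
EVENTS = [("page_a", 3), ("page_b", 5), ("page_a", 4)]


def batch_totals(events):
    """Historical processing: aggregate a complete dataset in one pass."""
    totals = defaultdict(int)
    for key, value in events:
        totals[key] += value
    return dict(totals)


class StreamingTotals:
    """Instant processing: keep running state and update it per event."""

    def __init__(self):
        self.totals = defaultdict(int)

    def update(self, key, value):
        """Apply one event and return the current total for its key."""
        self.totals[key] += value
        return self.totals[key]


batch = batch_totals(EVENTS)

stream = StreamingTotals()
running = [stream.update(key, value) for key, value in EVENTS]

print(batch)
print(running)
```

Both approaches end in the same state; the streaming version simply makes an up-to-date answer available after every event, at the cost of keeping state between events.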
Slide 21
Slide 21 text
Pros
• Innovative sphere
• Complicated tasks
• Little competition
• Diverse tech stack
Cons
• Tech stack might be too diverse
• High entry threshold
• Invisible work
Slide 22
Slide 22 text
Background
Slide 23
Slide 23 text
Software engineer
Slide 24
Slide 24 text
Databases developer
Slide 25
Slide 25 text
Data analysis
Slide 26
Slide 26 text
How to become a big data engineer
Slide 27
Slide 27 text
• Big data degree
• Internship
• Switch from other IT professions
Slide 28
Slide 28 text
My career
Slide 29
Slide 29 text
Timeline, 2014 to 2017
• First years in NAU
• Freelance as Java developer at Upwork
• French Spring School in KPI
• Coursera Big Data Specialization
(Month markers on the timeline: Jan, Sep, May, Aug)
Slide 30
Slide 30 text
Timeline, 2017
• Java courses in EPAM
• DataRoot University
• Job offer from Ciklum
• Graph databases research
(Month markers on the timeline: Oct, Dec, Jan)
Slide 31
Slide 31 text
Graph databases research
Slide 32
Slide 32 text
Timeline, 2019 to 2020
• Job offer from Captify
• Big data developer at Captify
• Project in banking
• Kubernetes research
(Month markers on the timeline: Jun, Jul, Feb, Oct)
Slide 33
Slide 33 text
Kubernetes research
Slide 35
Slide 35 text
Project in banking
Slide 37
Slide 37 text
Why you should become a big data engineer
• Edge of innovation
• Diverse projects/products
• Challenging tasks
• Popularity rise
Slide 38
Slide 38 text
My article about big data
https://dou.ua/lenta/articles/what-is-big-data-engineering/
Slide 39
Slide 39 text
Coursera Big data specialisation
https://www.coursera.org/specializations/big-data
Slide 40
Slide 40 text
My contact info
dead_flowers22
roksolana-d
roksolanadiachuk
roksolanad