Slide 1
Slide 1 text
Building Scalable Data Science Pipeline
luigi | spark | flask
www.unnati.xyz
Raghotham S, Nischal HP
Slide 2
Slide 2 text
Agenda
● Introduction
● Data engineering
● Machine Learning
● Data Pipelines
● API
● Hands on
Slide 3
Slide 3 text
Introduction
Applying software engineering principles to Data Science
Slide 4
Slide 4 text
Data Engineering
Process of acquiring, cleaning, transforming & persisting data
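The four stages named on this slide (acquire, clean, transform, persist) can be sketched with the Python standard library alone. This is a minimal illustration, not the pipeline used in the talk: the CSV rows, column names, and the `trips` table are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: made-up rows, not taken from any real dataset.
RAW_CSV = """trip_id,duration_sec,start_station
1,300,Market St
2,,Howard St
3,-5,Market St
4,1200, Townsend St
"""


def acquire(text):
    """Acquire: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: keep only rows whose duration is a positive integer."""
    cleaned = []
    for row in rows:
        value = (row["duration_sec"] or "").strip()
        # str.isdigit() rejects empty strings and negative numbers.
        if value.isdigit() and int(value) > 0:
            cleaned.append(row)
    return cleaned


def transform(rows):
    """Transform: cast types, trim station names, derive minutes."""
    return [
        (int(r["trip_id"]), r["start_station"].strip(), int(r["duration_sec"]) / 60)
        for r in rows
    ]


def persist(records, conn):
    """Persist: write the transformed records to a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips "
        "(trip_id INTEGER PRIMARY KEY, start_station TEXT, duration_min REAL)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run all four stages and return what ended up in the table."""
    persist(transform(clean(acquire(text))), conn)
    query = "SELECT trip_id, start_station, duration_min FROM trips ORDER BY trip_id"
    return conn.execute(query).fetchall()
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` keeps trips 1 and 4 and drops the rows with a missing or negative duration. Keeping each stage as its own function makes it straightforward to wrap the stages as pipeline tasks later.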
Slide 5
Slide 5 text
Machine Learning
Art & science of choosing a model & scaling it
Slide 6
Slide 6 text
Data Pipelines
Plumbing data engineering & machine learning tasks
Slide 7
Slide 7 text
API
Expose data science as a service
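One way to expose a model as a service is a small Flask endpoint that accepts JSON and returns a prediction. The sketch below is an assumption about what such a service might look like, not the API built in the talk: the `/predict` route, the `hour` field, and the rule inside `predict_duration_minutes` are hypothetical stand-ins for a trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_duration_minutes(hour):
    """Hypothetical stand-in for a trained model: a fixed rush-hour rule."""
    is_rush_hour = (7 <= hour <= 9) or (16 <= hour <= 18)
    return 12.0 if is_rush_hour else 8.0


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on a missing or bad JSON body.
    payload = request.get_json(silent=True) or {}
    hour = payload.get("hour")
    # Reject anything that is not an integer in 0..23 (bool is a subclass of int).
    if isinstance(hour, bool) or not isinstance(hour, int) or not 0 <= hour <= 23:
        return jsonify(error="'hour' must be an integer from 0 to 23"), 400
    return jsonify(hour=hour, predicted_duration_min=predict_duration_minutes(hour))
```

The endpoint can be exercised without starting a server through `app.test_client()`, for example `app.test_client().post("/predict", json={"hour": 8})`. In a real service the stand-in function would load and call a persisted model instead.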
Slide 8
Slide 8 text
Project Structure
Slide 9
Slide 9 text
Hands on
Dataset: Bay Area Bike Share
Hypothesis-based solution
Slide 10
Slide 10 text
Apache Spark
● Distributed in-memory computing
● Distributed machine learning framework
● Claimed to be up to 100x faster than Hadoop MapReduce for in-memory workloads
● RDDs (Resilient Distributed Datasets)
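The idea behind RDD operations such as `map` and `reduceByKey` is that data is split into partitions, each partition is reduced locally, and the partial results are then merged per key. The sketch below imitates that pattern in plain Python so it runs without a cluster. It is not Spark code and does not use the Spark API: the partitions, station names, and helper functions are all made up for illustration.

```python
# Made-up (station, 1) records, split into two "partitions" the way an RDD
# would be spread across workers. Not real data.
partitions = [
    [("Market St", 1), ("Howard St", 1), ("Market St", 1)],
    [("Howard St", 1), ("Market St", 1)],
]


def reduce_partition(records, func):
    """Local step: combine values per key within a single partition."""
    acc = {}
    for key, value in records:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc


def reduce_by_key(parts, func):
    """Merge step: combine the per-partition results into one value per key."""
    merged = {}
    for partial in (reduce_partition(p, func) for p in parts):
        for key, value in partial.items():
            merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Count trips per start station across all partitions.
trip_counts = reduce_by_key(partitions, lambda a, b: a + b)
```

Because the combining function is associative, reducing inside each partition first gives the same totals as reducing everything in one place, which is what lets the work be distributed.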
Slide 11
Slide 11 text
Luigi
● Complex pipelines
● Dependency resolution
● Workflow management
● Visualization
● Exception handling
Slide 12
Slide 12 text
What did we learn today?
Slide 13
Slide 13 text
What did we learn today?
Building a scalable data science platform is easy
Slide 14
Slide 14 text
What did we learn today?
Building a scalable data science platform is easy
Slide 15
Slide 15 text
Thank You @unnati_xyz