Slide 1

Slide 1 text

Build high-quality ML models quickly using a central Feature Generator Library

2023-09-12

Slide 2

Slide 2 text

Who are we?

Nischay Ghattamaraju
Roel Bertens

Slide 3

Slide 3 text

The Problem(s)

What is holding teams back?

Slide 4

Slide 4 text

Solution

Feature Generator Library (FGL)

What is it?
- A Python package containing well-defined, reusable and tested features
- Managed dependencies between features
- Automated generation of documentation and diagrams

Benefits
- Collaboration
    - Quality
    - Single source of truth
- Reusability
    - Iteration / development speed
    - Consistency between PoC and PROD
    - Re-usable code + documentation
- Efficient computation
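One way to picture "managed dependencies" and "automated generation of diagrams" is the minimal sketch below. It is not the FGL implementation: the dependency mapping between the three feature names (taken from the usage slide) is an assumption, and `computation_order` and `dependency_diagram` are hypothetical helpers. It uses `graphlib` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each feature name maps to the features it needs first.
DEPENDENCIES = {
    "avg_nr_of_items": set(),
    "avg_nr_of_items__max_per_city": {"avg_nr_of_items"},
    "avg_nr_of_items__compared_to_city_max": {
        "avg_nr_of_items",
        "avg_nr_of_items__max_per_city",
    },
}


def computation_order(dependencies):
    """Return feature names so that every feature comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def dependency_diagram(dependencies):
    """Render the dependency graph as Graphviz DOT text, one edge per dependency."""
    lines = ["digraph features {"]
    for feature in sorted(dependencies):
        for parent in sorted(dependencies[feature]):
            lines.append(f'    "{parent}" -> "{feature}";')
    lines.append("}")
    return "\n".join(lines)


order = computation_order(DEPENDENCIES)
print(order)
print(dependency_diagram(DEPENDENCIES))
```

Because the order and the diagram are both derived from the same mapping, the documentation cannot drift away from the code that computes the features.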

Slide 5

Slide 5 text

Comparison

FGL vs Feature Store

FGL                                   | Feature Store
--------------------------------------|-------------------------------------
Stores the logic to generate features | Stores the features
Computation on demand                 | Frequently triggered updates
Computes only what is needed          | Precomputes all the features
Retrieval is slow                     | Retrieval is fast
Easier to introduce                   | Increases complexity of the platform
No storage costs involved             | Storage costs are needed
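"Computes only what is needed" can be sketched as follows, assuming the library knows each feature's dependencies. The function `required_features`, the dependency mapping, and the feature `days_since_last_order` are all made up for illustration and are not taken from the FGL itself.

```python
def required_features(requested, dependencies):
    """Return the requested features plus everything they transitively depend on."""
    needed = set()
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name in needed:
            continue  # already handled, avoids repeated work
        needed.add(name)
        stack.extend(dependencies[name])
    return needed


# Hypothetical library content: feature name -> names of its dependencies.
DEPENDENCIES = {
    "avg_nr_of_items": [],
    "avg_nr_of_items__max_per_city": ["avg_nr_of_items"],
    "avg_nr_of_items__compared_to_city_max": [
        "avg_nr_of_items",
        "avg_nr_of_items__max_per_city",
    ],
    "days_since_last_order": [],
}

needed = required_features(["avg_nr_of_items__max_per_city"], DEPENDENCIES)
print(sorted(needed))
```

A model that asks for one feature only pays for that feature and its ancestors, whereas a feature store would have precomputed and stored all four.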

Slide 6

Slide 6 text

Code Walkthroughs

Slide 7

Slide 7 text

Code Walkthroughs

How we define a Feature
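The code shown on this slide is not part of the text export. As a stand-in, here is a minimal sketch of what a feature definition could look like: a name, a description, the compute logic, and the features it depends on. The `Feature` class, its fields, and the input column `items_per_order` are assumptions, not the actual FGL API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class Feature:
    """Hypothetical feature definition: metadata plus the logic to compute it."""

    name: str
    description: str
    compute: Callable[[Row], object]
    depends_on: Tuple[str, ...] = ()


# Illustrative feature: average number of items per order for one client.
avg_nr_of_items = Feature(
    name="avg_nr_of_items",
    description="Average number of items per order for a client.",
    compute=lambda row: sum(row["items_per_order"]) / len(row["items_per_order"]),
)

row = {"items_per_order": [2, 3, 4]}  # made-up input data
value = avg_nr_of_items.compute(row)
print(avg_nr_of_items.name, value)
```

Keeping the description next to the logic is what makes generated documentation possible without a separate write-up.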

Slide 8

Slide 8 text

Code Walkthroughs

How we define a Feature Group
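The slide's code is again missing from the export. The sketch below shows one plausible shape for a feature group: a named collection of features that share an entity key, rejects duplicates and unknown dependencies, and can describe itself. `FeatureGroup`, the trimmed-down `Feature`, the group name, and the descriptions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Feature:
    """Trimmed-down, hypothetical feature: metadata only."""

    name: str
    description: str
    depends_on: Tuple[str, ...] = ()


@dataclass
class FeatureGroup:
    """Hypothetical container for features that share one entity key."""

    name: str
    entity_key: str
    features: Dict[str, Feature] = field(default_factory=dict)

    def add(self, feature: Feature) -> None:
        """Register a feature; its dependencies must already be in the group."""
        if feature.name in self.features:
            raise ValueError(f"duplicate feature: {feature.name}")
        missing = [d for d in feature.depends_on if d not in self.features]
        if missing:
            raise ValueError(f"{feature.name} depends on unknown features: {missing}")
        self.features[feature.name] = feature

    def describe(self) -> str:
        """Generate a plain-text documentation listing for the group."""
        lines = [f"{self.name} (key: {self.entity_key})"]
        for feature in self.features.values():
            lines.append(f"  {feature.name}: {feature.description}")
        return "\n".join(lines)


group = FeatureGroup(name="client_order_features", entity_key="Client_id")
group.add(Feature("avg_nr_of_items", "Average number of items per order."))
group.add(
    Feature(
        "avg_nr_of_items__max_per_city",
        "Highest avg_nr_of_items among clients in the same city.",
        depends_on=("avg_nr_of_items",),
    )
)
print(group.describe())
```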

Slide 9

Slide 9 text

Code Walkthroughs

Example Feature Group

Slide 10

Slide 10 text

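Code Walkthroughs

Usage of the FGL

Client_id | avg_nr_of_items | avg_nr_of_items__max_per_city | avg_nr_of_items__compared_to_city_max | city
----------|-----------------|-------------------------------|---------------------------------------|--------
client_1  | 3               | 3                             | 1.0                                   | Utrecht
client_2  | 2               | 3                             | 0.66                                  | Utrecht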
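The call that produced this table is not in the export, so the sketch below only reproduces the arithmetic behind the two derived columns in plain Python. The helper `add_city_features` is hypothetical and the derivations are inferred from the column names. The ratio for client_2 is 2/3, which the slide displays as 0.66.

```python
# Input rows taken from the table: one row per client.
rows = [
    {"Client_id": "client_1", "avg_nr_of_items": 3, "city": "Utrecht"},
    {"Client_id": "client_2", "avg_nr_of_items": 2, "city": "Utrecht"},
]


def add_city_features(rows):
    """Return new rows with the per-city maximum and the ratio to that maximum."""
    # First pass: highest avg_nr_of_items per city.
    city_max = {}
    for row in rows:
        city = row["city"]
        current = city_max.get(city, row["avg_nr_of_items"])
        city_max[city] = max(current, row["avg_nr_of_items"])

    # Second pass: attach the derived columns without mutating the input.
    result = []
    for row in rows:
        highest = city_max[row["city"]]
        ratio = row["avg_nr_of_items"] / highest if highest else None
        result.append(
            {
                **row,
                "avg_nr_of_items__max_per_city": highest,
                "avg_nr_of_items__compared_to_city_max": ratio,
            }
        )
    return result


result = add_city_features(rows)
for row in result:
    print(row)
```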

Slide 11

Slide 11 text

Learnings

Some of the lessons we learnt
- Thorough documentation makes it easier to collaborate
- Invest if you want to scale to more features/models (so no custom feature pipelines)
- Don’t start with a Feature Store but start simple and upgrade when needed
- Think about who is going to own and maintain the code
- Communicate the value clearly to users and contributors to avoid shelfware

Slide 12

Slide 12 text

Questions?