[Diagram: Data sets A to E; Feature Eng 1 to 3; Train Model 1 to 3, each followed by Score]
Modelling in silos - models built and deployed in isolation
Feature engineering is in a silo - no reuse between model builds
Slide 4
[Diagram: Data sets A to E; Feature Eng A to E; Feature Store (Ivory); Train Model 1 to 3, each followed by Score]
Modelling in silos - models built and deployed in isolation
Feature engineering is done once
Features are reused across model builds
User experience
• Consistent command line tooling
• Version metadata
• Workflows
• End-to-end example system
Slide 43
Extensible dictionary
• Support for rich attribute types (e.g. structs, arrays)
• Arbitrary attribute metadata
• Specification of valid attribute values (e.g. only ‘M’ and ‘F’ for gender)
• Improved validation
• Improved on-disk representation
• Useful for downstream applications, e.g. plots
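A minimal sketch of what a dictionary entry with valid values and arbitrary metadata could look like. The `AttributeDefinition` class, its field names, and the `demographics:gender` attribute are hypothetical and are not Ivory's actual dictionary format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, Optional


@dataclass
class AttributeDefinition:
    """One dictionary entry (hypothetical shape, not Ivory's on-disk format)."""
    name: str
    encoding: str                                    # e.g. "string", "struct", "array"
    allowed_values: Optional[FrozenSet[Any]] = None  # None means unrestricted
    metadata: Dict[str, str] = field(default_factory=dict)

    def is_valid(self, value: Any) -> bool:
        """Return True when the value is permitted for this attribute."""
        if self.allowed_values is None:
            return True
        return value in self.allowed_values


# Assumed example attribute: only 'M' and 'F' are accepted for gender.
gender = AttributeDefinition(
    name="demographics:gender",
    encoding="string",
    allowed_values=frozenset({"M", "F"}),
    metadata={"plot": "bar"},  # arbitrary metadata a downstream tool might read
)
```

Validation then becomes a lookup against the dictionary (`gender.is_valid("M")`) rather than logic repeated in each model build.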
Slide 44
Lazy feature generation
• Lazily generate features derived from existing facts on extract (chord/snapshot)
• Derived “meta” features (i.e. ‘select’)
• Windowing functions (e.g. “average over last 3 months”)
• Row-level features
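A rough sketch of a windowing function such as “average over last 3 months”, computed from stored facts only when a snapshot is requested. The `Fact` type, `months_before`, and `windowed_average` are hypothetical names for illustration, not Ivory's API, and the month arithmetic is deliberately simplified.

```python
from datetime import date
from typing import Iterable, NamedTuple, Optional


class Fact(NamedTuple):
    """A single stored fact (hypothetical shape)."""
    entity: str
    attribute: str
    on: date
    value: float


def months_before(day: date, months: int) -> date:
    """Shift a date back by whole months, clamping the day to 28 so it stays valid."""
    index = day.year * 12 + (day.month - 1) - months
    return date(index // 12, index % 12 + 1, min(day.day, 28))


def windowed_average(facts: Iterable[Fact], entity: str, attribute: str,
                     snapshot: date, months: int = 3) -> Optional[float]:
    """Average the attribute over the window ending at the snapshot date.

    Nothing is precomputed: the feature is derived from existing facts
    at extraction time. Returns None when the window holds no facts.
    """
    start = months_before(snapshot, months)
    values = [f.value for f in facts
              if f.entity == entity and f.attribute == attribute
              and start < f.on <= snapshot]
    return sum(values) / len(values) if values else None
```

Because the window is evaluated per snapshot date, the same stored facts can serve different windows without being reloaded.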
Slide 45
[Diagram: Data sets A to E; Feature Store (Ivory) with integrated Feature Eng (Ivory); Train Model 1 to 3, each followed by Score]
Modelling in silos - models built and deployed in isolation
Feature engineering is integrated and lazily generated on extraction
Source data loaded directly
Slide 46
Repository forking
• Low-cost cloning/forking of repositories:
• “master” production repo
• “experimental” cloned repo
• Allow a data scientist to join production features with their own without affecting production operations
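One way to picture a low-cost fork is as a thin private layer over the production repository: reads fall through to production, writes stay in the fork. The sketch below models this with Python's `collections.ChainMap`. The `fork` helper and the feature names are hypothetical, and this is an analogy rather than how Ivory implements forking.

```python
from collections import ChainMap

# "master" production repo, modelled here as feature name -> encoding (assumed data).
production = {"customer:age": "int", "customer:spend_3m": "double"}


def fork(parent):
    """Return a view of the parent whose writes land in a new, private layer.

    Nothing is copied, so the fork is cheap regardless of the parent's size.
    """
    return ChainMap({}, parent)


# "experimental" cloned repo: sees production features plus its own.
experimental = fork(production)
experimental["customer:churn_score"] = "double"
```

The data scientist can read `experimental["customer:age"]` alongside the new feature, while `production` itself is unchanged.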
Slide 47
Other filesystems
• Support for repository metadata and fact sets to be on different file systems
• Support HDFS, S3 and POSIX
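A small sketch of how a location string could be mapped to one of the three file systems, so that metadata and fact sets can each point somewhere different. The `storage_backend` function, the scheme table, and the example locations are assumptions for illustration, not Ivory's configuration format.

```python
from urllib.parse import urlparse

# Assumed mapping from URI scheme to file system; a bare path is treated as POSIX.
SUPPORTED = {"hdfs": "HDFS", "s3": "S3", "file": "POSIX", "": "POSIX"}


def storage_backend(location: str) -> str:
    """Pick the file system for a location from its URI scheme."""
    scheme = urlparse(location).scheme
    try:
        return SUPPORTED[scheme]
    except KeyError:
        raise ValueError(f"unsupported file system scheme: {scheme!r}") from None


# Hypothetical repository layout: metadata on S3, fact sets on HDFS.
repository = {
    "metadata": "s3://example-bucket/ivory/metadata",
    "factsets": "hdfs://namenode/ivory/factsets",
}
backends = {part: storage_backend(loc) for part, loc in repository.items()}
```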