Slide 1

IVORY DATA MODELLING
http://github.com/ambiata/ivory
© Ambiata 2014

Slide 2

WHAT WE START WITH

Slide 3

Slide 4

WHAT WE NEED

Slide 5

Feature vectors (one row per instance/entity, one column per feature; "-" means no value)

entity    gender  balance  purchases  zipcode  prop_online  num_accs
89340218  M       0.00     3          3001     1.00         1
48149407  -       634.83   16         4670     0.6875       3
18452274  F       15.12    2          -        0.50         1
07499337  M       33.56    2          -        1.00         1
62948721  F       98.34    12         3303     0.8333       4
93754723  -       523.81   23         2046     0.4782       1
00272446  F       1086.05  17         -        1.00         2
13374497  F       224.81   9          -        0.2222       1
31989993  M       78.21    2          2134     0.50         1
46474236  -       126.48   4          -        0.0          1

Slide 6

Ingest facts → Ivory Repository → Extract features

Slide 7

Source data → Fact ETL (entity resolution + attribution) → Factset → Ingest facts → Ivory Repository → Extract features
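A minimal Python sketch of this flow, under loudly stated assumptions: the `SourceRecord` and `Fact` layouts, the `id_map` used for entity resolution and the in-memory `Repository` class are all hypothetical illustrations, not Ivory's actual data format or API. Extraction here simply applies a caller-supplied function to each entity's facts dated strictly before a cut-off date.

```python
import datetime as dt
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

# HYPOTHETICAL layouts for illustration only; not Ivory's storage format.
@dataclass(frozen=True)
class SourceRecord:
    source_id: str   # identifier used by the source system
    name: str        # which field of the source record this is
    value: object
    date: dt.date

@dataclass(frozen=True)
class Fact:
    entity: str      # resolved entity id
    attribute: str
    value: object
    date: dt.date

def fact_etl(records: List[SourceRecord], id_map: Dict[str, str]) -> List[Fact]:
    """Fact ETL: entity resolution (source id -> entity id) plus attribution
    (each value becomes a fact about that entity). Records whose source id
    cannot be resolved are simply dropped in this sketch."""
    factset = []
    for r in records:
        entity = id_map.get(r.source_id)
        if entity is None:
            continue
        factset.append(Fact(entity, r.name, r.value, r.date))
    return factset

class Repository:
    """HYPOTHETICAL in-memory stand-in for the fact repository."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def ingest(self, factset: List[Fact]) -> None:
        self.facts.extend(factset)

    def extract(self, feature: Callable[[List[Fact]], object],
                as_of: dt.date) -> Dict[str, object]:
        """Apply `feature` to each entity's facts dated strictly before `as_of`."""
        by_entity: Dict[str, List[Fact]] = defaultdict(list)
        for f in self.facts:
            if f.date < as_of:
                by_entity[f.entity].append(f)
        return {entity: fs_feature for entity, fs_feature in
                ((e, feature(fs)) for e, fs in by_entity.items())}
```

For example, `repo.extract(len, as_of)` counts the facts known about each entity before `as_of`; any other per-entity function can be passed in the same way.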

Slide 8

WHAT’S A FACT?

Slide 9

WHAT’S A FEATURE?

Slide 10

FACT
• Atomic piece of information attributed to an entity
• 2 types: states and events
• Captured as close to the “source” as possible
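One way to model a fact in code, as a minimal sketch: a single record tying one raw value to one entity at one date, with the state/event distinction attached to the attribute. The field names and the `ATTRIBUTE_TYPES` dictionary are hypothetical choices for illustration, not Ivory's schema.

```python
import datetime as dt
from dataclasses import dataclass
from enum import Enum

class FactType(Enum):
    STATE = "state"  # holds until superseded, e.g. a zipcode
    EVENT = "event"  # happens at a point in time, e.g. a purchase

# HYPOTHETICAL attribute dictionary: the fact type belongs to the attribute,
# so every fact about "zipcode" is a state fact.
ATTRIBUTE_TYPES = {
    "gender": FactType.STATE,
    "zipcode": FactType.STATE,
    "purchase": FactType.EVENT,
}

@dataclass(frozen=True)
class Fact:
    """One atomic piece of information attributed to one entity."""
    entity: str      # resolved entity id
    attribute: str   # what the fact describes
    value: object    # raw value, kept as close to the source as possible
    date: dt.date    # when it was observed or happened

    @property
    def fact_type(self) -> FactType:
        return ATTRIBUTE_TYPES[self.attribute]
```

Keeping `value` raw (a zipcode string, an unconverted amount) leaves derived quantities to the features built from these facts.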

Slide 11

• State facts
  • Demographics, e.g. gender, DOB, zipcode, etc.
  • Account statuses
  • Subscription states
  • Snapshots, e.g. account balance at end of month
  • Segments

Slide 12

• Event facts
  • Purchases
  • Page views
  • Phone calls
  • Queries
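The two fact types behave differently when they are read back. A minimal sketch for a single entity's history, assuming each fact is a hypothetical `(entity, attribute, value, date)` tuple: a newer state fact replaces an older one, while every event fact is a separate occurrence.

```python
import datetime as dt
from typing import List, NamedTuple, Optional

class Fact(NamedTuple):  # HYPOTHETICAL fact layout
    entity: str
    attribute: str
    value: object
    date: dt.date

def current_state(facts: List[Fact], attribute: str) -> Optional[object]:
    """State facts: a newer value supersedes an older one, so only the latest counts."""
    matching = [f for f in facts if f.attribute == attribute]
    if not matching:
        return None
    return max(matching, key=lambda f: f.date).value

def event_count(facts: List[Fact], attribute: str) -> int:
    """Event facts: each fact is its own occurrence, so they accumulate."""
    return sum(1 for f in facts if f.attribute == attribute)
```

With a subscription that moved from "trial" to "paid", `current_state` reports only "paid", whereas two purchase facts give an `event_count` of 2.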

Slide 13

FEATURE
• Attribute that describes one aspect of an entity
• Derived from facts
• Simplest feature is “latest value before ‘date’”
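A minimal sketch of the “latest value before ‘date’” feature, computed for a list of entities so that the result is one column of a feature vector. It assumes a hypothetical `(entity, attribute, value, date)` fact tuple, reads “before” as strictly before (whether the cut-off date itself should count is an assumption), and returns `None` where a feature-vector table would show “-”.

```python
import datetime as dt
from typing import Dict, Iterable, List, NamedTuple, Optional

class Fact(NamedTuple):  # HYPOTHETICAL fact layout
    entity: str
    attribute: str
    value: object
    date: dt.date

def latest_before(facts: Iterable[Fact], attribute: str, as_of: dt.date,
                  entities: List[str]) -> Dict[str, Optional[object]]:
    """Per entity: the most recent value of `attribute` dated strictly before
    `as_of`, or None if the entity has no such fact."""
    best: Dict[str, Fact] = {}
    for f in facts:
        if f.attribute != attribute or f.date >= as_of:
            continue  # other attribute, or not yet known at `as_of`
        prev = best.get(f.entity)
        # Strict '>' means that on a date tie the first fact seen is kept.
        if prev is None or f.date > prev.date:
            best[f.entity] = f
    return {e: (best[e].value if e in best else None) for e in entities}
```

The tie rule is arbitrary here; a real system would need an explicit policy for two facts with the same date.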

Slide 14

• Latest
• Days since latest, days since earliest
• Count, sum
• Mean, quantile, proportion
• Gradient, state changes
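A sketch of these feature types over one entity's time series of `(date, value)` observations, keeping only observations strictly before a cut-off date. The observation layout, the nearest-rank definition of quantile, the least-squares slope used for “gradient” and the hypothetical `v > 50` predicate used for “proportion” are all illustrative assumptions, not definitions taken from Ivory.

```python
import datetime as dt
import math
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

Obs = Tuple[dt.date, float]  # (date, value) for one entity and one attribute

def before(obs: List[Obs], as_of: dt.date) -> List[Obs]:
    """Observations dated strictly before `as_of`, oldest first."""
    return sorted((o for o in obs if o[0] < as_of), key=lambda o: o[0])

def quantile(values: List[float], q: float) -> Optional[float]:
    """Nearest-rank quantile for 0 < q <= 1 (one of several common definitions)."""
    if not values:
        return None
    s = sorted(values)
    return s[max(1, math.ceil(q * len(s))) - 1]

def proportion(values: List[float], pred: Callable[[float], bool]) -> Optional[float]:
    """Fraction of values satisfying `pred`."""
    return sum(1 for v in values if pred(v)) / len(values) if values else None

def gradient(obs: List[Obs]) -> Optional[float]:
    """Least-squares slope of value against days since the first observation."""
    if len(obs) < 2:
        return None
    xs = [(d - obs[0][0]).days for d, _ in obs]
    ys = [v for _, v in obs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None  # all observations on the same day: slope undefined
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

def state_changes(obs: List[Obs]) -> int:
    """How many times a value differs from the one before it."""
    return sum(1 for (_, a), (_, b) in zip(obs, obs[1:]) if a != b)

def features(obs: List[Obs], as_of: dt.date) -> Dict[str, Optional[float]]:
    o = before(obs, as_of)
    vals = [v for _, v in o]
    return {
        "latest": vals[-1] if vals else None,
        "days_since_latest": (as_of - o[-1][0]).days if o else None,
        "days_since_earliest": (as_of - o[0][0]).days if o else None,
        "count": len(vals),
        "sum": sum(vals),
        "mean": mean(vals) if vals else None,
        "median": quantile(vals, 0.5),
        "prop_over_50": proportion(vals, lambda v: v > 50),  # hypothetical threshold
        "gradient": gradient(o),
        "state_changes": state_changes(o),
    }
```

Each entity's dictionary becomes one row of a feature vector table; an entity with no observations gets a count of 0 and `None` for the features that are undefined on an empty series.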