Ivory - Data Modelling
Ambiata
October 20, 2014
Transcript
IVORY DATA MODELLING http://github.com/ambiata/ivory © Ambiata 2014
WHAT WE START WITH
WHAT WE NEED
Feature vectors (one row per instance, one column per feature)

instance  gender  balance  purchases  zipcode  prop_online  num_accs
89340218  M       0.00     3          3001     1.00         1
48149407  -       634.83   16         4670     0.6875       3
18452274  F       15.12    2          -        0.50         1
07499337  M       33.56    2          -        1.00         1
62948721  F       98.34    12         3303     0.8333       4
93754723  -       523.81   23         2046     0.4782       1
00272446  F       1086.05  17         -        1.00         2
13374497  F       224.81   9          -        0.2222       1
31989993  M       78.21    2          2134     0.50         1
46474236  -       126.48   4          -        0.0          1
Ingest facts -> Ivory Repository -> Extract features
Fact ETL: Source data -> Entity resolution + attribution -> Factset -> Ingest facts -> Ivory Repository -> Extract features
WHAT’S A FACT?
WHAT’S A FEATURE?
FACT
• Atomic piece of information attributed to an entity
• 2 types: states and events
• Captured as close to the “source” as possible
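A minimal sketch of how a fact could be represented, assuming a record that carries an entity identifier, an attribute name, a value and a date. The names `Fact` and `FactKind`, the field layout and the sample values are illustrative assumptions, not Ivory's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Union


class FactKind(Enum):
    STATE = "state"  # a value that holds until superseded, e.g. gender
    EVENT = "event"  # something that happened at a point in time, e.g. a purchase


@dataclass(frozen=True)
class Fact:
    entity: str                    # who the fact is attributed to
    attribute: str                 # what is being recorded
    value: Union[str, int, float]  # the atomic piece of information
    when: date                     # when the fact was observed
    kind: FactKind


# Invented sample facts for one entity.
facts = [
    Fact("89340218", "gender", "M", date(2014, 1, 5), FactKind.STATE),
    Fact("89340218", "purchase", 19.95, date(2014, 3, 2), FactKind.EVENT),
]
```

Keeping each fact this small means it can be recorded close to the source without deciding in advance which features will be derived from it.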
• State facts
  • Demographics, e.g. gender, DOB, zipcode
  • Account statuses
  • Subscription states
  • Snapshots, e.g. account balance at end of month
  • Segments
• Event facts
  • Purchases
  • Page views
  • Phone calls
  • Queries
FEATURE
• Attribute that describes one aspect of an entity
• Derived from facts
• Simplest feature is “latest value before ‘date’”
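The "latest value before 'date'" feature can be sketched as follows. This is an illustration only: the tuple layout, the `latest_before` helper and the sample zipcode facts are hypothetical, and treating "before" as strictly earlier than the date is an assumption.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# Hypothetical fact layout: (entity, attribute, value, observed_on)
FactRow = Tuple[str, str, object, date]


def latest_before(facts: Iterable[FactRow], entity: str, attribute: str,
                  as_of: date) -> Optional[object]:
    """Return the most recent value observed strictly before `as_of`, or None."""
    candidates = [
        (observed_on, value)
        for (e, a, value, observed_on) in facts
        if e == entity and a == attribute and observed_on < as_of
    ]
    if not candidates:
        return None
    # Compare on the date alone, so values never need to be comparable.
    return max(candidates, key=lambda pair: pair[0])[1]


# Invented state facts: one entity whose zipcode changes over time.
facts = [
    ("18452274", "zipcode", "3001", date(2013, 6, 1)),
    ("18452274", "zipcode", "2134", date(2014, 2, 15)),
    ("18452274", "zipcode", "4670", date(2014, 9, 30)),
]

print(latest_before(facts, "18452274", "zipcode", date(2014, 6, 1)))  # 2134
```

An entity with no fact before the date yields `None`, which corresponds to a missing value in a feature vector.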
• Latest
• Days since latest, days since earliest
• Count, sum
• Mean, quantile, proportion
• Gradient, state changes
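Several of the feature types listed above can be derived from the same event facts. The sketch below assumes a hypothetical list of purchase events for a single entity and an extraction date `as_of`; the data and the feature names are invented for illustration and are not Ivory's own expressions.

```python
from datetime import date
from statistics import mean

# Hypothetical purchase events for one entity: (observed_on, amount, channel)
purchases = [
    (date(2014, 1, 10), 20.0, "online"),
    (date(2014, 3, 5), 35.0, "store"),
    (date(2014, 7, 21), 5.0, "online"),
    (date(2014, 9, 2), 40.0, "online"),
]
as_of = date(2014, 10, 1)

# Only facts observed before the extraction date contribute.
window = [p for p in purchases if p[0] < as_of]
dates = [d for d, _, _ in window]
amounts = [amount for _, amount, _ in window]

features = {
    "count": len(window),
    "sum": sum(amounts),
    "mean": mean(amounts),
    "days_since_latest": (as_of - max(dates)).days,
    "days_since_earliest": (as_of - min(dates)).days,
    "prop_online": sum(1 for _, _, ch in window if ch == "online") / len(window),
}

print(features)
```

Each entry becomes one column of the entity's feature vector; quantiles, gradients and state-change counts would follow the same pattern over the same window.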