Ivory - Concepts
Ambiata
October 20, 2014
Transcript
IVORY CONCEPTS http://github.com/ambiata/ivory © Ambiata 2014

IVORY
A scalable and extensible data store for storing facts and extracting features.

[Diagram: facts are ingested into an Ivory repository; features are extracted from it.]

REPOSITORY
• Stores and extracts data for a single class of entity, e.g.:
  • customer
  • account
  • asset

DATA MODEL

A single “fact”:
    customer-1   balance   634 @ 2014-02-01

Fact: Entity - Attribute - Value - Time
The value of a feature (attribute) for a given entity, known to be valid from a certain point in time.

[Figure, labelled "scalable": a balance column for customer-1 to customer-4, holding the facts 634 @ 2014-02-01, 469 @ 2014-02-01, 276 @ 2014-04-01 and 1966 @ 2014-03-01.]

[Figure, labelled "extensible": the same grid widened to the attributes gender, balance, purchases and zipcode, adding facts such as ‘M’ @ 2012-01-01, 3 @ 2014-03-27 and ‘4670’ @ 2009-05-13.]

[Figure, labelled "sparse": the full grid of customer-1 to customer-4 against gender, balance, purchases and zipcode. Cells hold timestamped values (e.g. ‘F’ @ 2007-04-01, ‘3001’ @ 2011-09-14, 1876 @ 2014-02-01); not every cell is filled.]

INGESTING FACTS
• Facts are ingested in atomic units called factsets
• Facts in a factset can span any set of:
  • entities
  • attributes
  • dates/times

Example factset:

entity      attribute  value  date
customer-1  balance    634    2014-02-01
customer-3  balance    184    2014-02-01
customer-4  purchases  4      2014-02-04
customer-2  balance    312    2014-03-01
customer-3  gender     F      2007-04-01
customer-2  zipcode    3001   2011-03-14
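
To make the Entity - Attribute - Value - Time shape concrete, here is a minimal Python sketch of the example factset. The `Fact` class and its field names are illustrative assumptions, not Ivory's own types or storage format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Union

Value = Union[str, int, float]


@dataclass(frozen=True)
class Fact:
    """Hypothetical in-memory form of one fact (not Ivory's real type)."""
    entity: str
    attribute: str
    value: Value
    time: date


# The example factset from the slide, held as one immutable unit.
factset = (
    Fact("customer-1", "balance", 634, date(2014, 2, 1)),
    Fact("customer-3", "balance", 184, date(2014, 2, 1)),
    Fact("customer-4", "purchases", 4, date(2014, 2, 4)),
    Fact("customer-2", "balance", 312, date(2014, 3, 1)),
    Fact("customer-3", "gender", "F", date(2007, 4, 1)),
    Fact("customer-2", "zipcode", "3001", date(2011, 3, 14)),
)

# A single factset spans several entities, attributes and dates.
entities = sorted({f.entity for f in factset})
attributes = sorted({f.attribute for f in factset})
```

A tuple of frozen dataclasses is used here only to mirror the idea that a factset is ingested as an atomic unit.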

ATTRIBUTE DICTIONARY
• Any attribute that is ingested must be declared in the repository’s dictionary
• The dictionary stores metadata for each attribute
• Updated dictionaries can be imported into a repository at any time

namespace     name       encoding  type         description
demographics  gender     string    categorical  Gender
demographics  zipcode    string    categorical  Post-code, zip-code
accounts      balance    double    numerical    Balance of savings account
accounts      purchases  int       numerical    Number of credit-card purchases
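
The rule that every ingested attribute must be declared can be sketched as a lookup against the dictionary above. This is a hedged illustration: `AttributeDef`, `check_fact` and the encoding check are assumptions made for the example, and the slides do not say how Ivory itself validates values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeDef:
    """Hypothetical dictionary entry mirroring the slide's columns."""
    namespace: str
    name: str
    encoding: str  # "string", "int" or "double"
    type: str      # "categorical" or "numerical"
    description: str


DICTIONARY = {
    d.name: d
    for d in [
        AttributeDef("demographics", "gender", "string", "categorical", "Gender"),
        AttributeDef("demographics", "zipcode", "string", "categorical",
                     "Post-code, zip-code"),
        AttributeDef("accounts", "balance", "double", "numerical",
                     "Balance of savings account"),
        AttributeDef("accounts", "purchases", "int", "numerical",
                     "Number of credit-card purchases"),
    ]
}

# Assumed mapping from declared encodings to acceptable Python types.
PY_TYPES = {"string": (str,), "int": (int,), "double": (int, float)}


def check_fact(attribute, value, dictionary=DICTIONARY):
    """Return None if the fact is acceptable, otherwise an error message."""
    definition = dictionary.get(attribute)
    if definition is None:
        return f"undeclared attribute: {attribute}"
    expected = PY_TYPES[definition.encoding]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, expected):
        return f"{attribute}: expected {definition.encoding}, got {type(value).__name__}"
    return None
```

For example, `check_fact("income", 1)` reports an undeclared attribute, while `check_fact("balance", 634)` passes because an integer is accepted for a double-encoded attribute in this sketch.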

EXTRACTING FEATURES

Each row is an instance and each column is a feature:

instance  gender  balance  purchases  zipcode
89340218  M       0.00     3          3001
48149407  -       634.83   16         4670
18452274  F       15.12    2          -
07499337  M       33.56    2          -
62948721  F       98.34    12         3303
93754723  -       523.81   23         2046
00272446  F       1086.05  17         -
13374497  F       224.81   9          -
31989993  M       78.21    2          2134
46474236  -       126.48   4          -

SNAPSHOTS
• Attribute values for entities at a point in time
• Same time for all entities
• Select latest attribute values with respect to that time
• Typically used in preparing instances for scoring

[Figure: the grid of timestamped facts for customer-1 to customer-4 (gender, balance, purchases, zipcode), with a snapshot taken at 2014-03-01.]

[Figure: the resulting snapshot, one row per customer with columns gender, balance, purchases and zipcode. Values shown: ‘M’, 312, 4, ‘4670’, ‘F’, ‘3001’, 1966, 634, 469.]
• Snapshots are assumed to run periodically, e.g. daily or weekly
• Ivory exploits this assumption to improve the runtime of successive snapshots
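
The snapshot rule (one time for all entities, latest value at or before that time) can be sketched as follows. This is a naive illustration under stated assumptions: the tuple layout and the `snapshot` function are hypothetical, the cut-off is assumed to be inclusive, and none of Ivory's optimisations for successive snapshots are modelled. The sample facts are taken from the slides.

```python
from datetime import date

# (entity, attribute, value, time) tuples taken from the slides.
facts = [
    ("customer-1", "balance", 634, date(2014, 2, 1)),
    ("customer-2", "balance", 184, date(2014, 2, 1)),
    ("customer-2", "balance", 312, date(2014, 3, 1)),
    ("customer-2", "balance", 276, date(2014, 4, 1)),
    ("customer-2", "zipcode", "3001", date(2011, 3, 14)),
    ("customer-3", "gender", "F", date(2007, 4, 1)),
]


def snapshot(facts, at):
    """Latest value per (entity, attribute) among facts with time <= at."""
    latest = {}
    for entity, attribute, value, time in facts:
        if time > at:
            continue  # not yet valid at the snapshot time
        key = (entity, attribute)
        if key not in latest or time > latest[key][1]:
            latest[key] = (value, time)
    return {key: value for key, (value, _) in latest.items()}


snap = snapshot(facts, date(2014, 3, 1))
```

Here the 276 @ 2014-04-01 balance is ignored because it only becomes valid after the snapshot time, so customer-2's balance in the snapshot is 312.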

CHORDS
• Attribute values for entities at a point in time
• Different times for different entities
• Select latest attribute values with respect to the times
• Typically used in preparing instances for training

[Figure: the same grid of timestamped facts, with chord times customer-2 @ 2014-03-01 and customer-4 @ 2014-01-01.]

[Figure: the resulting chord, one row each for customer-2 @ 2014-03-01 and customer-4 @ 2014-01-01, with columns gender, balance, purchases and postcode. Values shown: ‘M’, 312, 6, ‘4670’, ‘3001’, 1876.]
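
A chord differs from a snapshot only in that each entity carries its own cut-off time. The sketch below illustrates that difference; the `chord` function and the data layout are assumptions made for the example, and the cut-off is again assumed to be inclusive. The facts and chord times are taken from the slides.

```python
from datetime import date

# (entity, attribute, value, time) tuples taken from the slides.
facts = [
    ("customer-1", "balance", 634, date(2014, 2, 1)),
    ("customer-2", "balance", 184, date(2014, 2, 1)),
    ("customer-2", "balance", 312, date(2014, 3, 1)),
    ("customer-2", "balance", 276, date(2014, 4, 1)),
    ("customer-2", "zipcode", "3001", date(2011, 3, 14)),
    ("customer-4", "purchases", 4, date(2014, 2, 4)),
]

# Chord times from the slide: one cut-off per entity.
entity_times = {
    "customer-2": date(2014, 3, 1),
    "customer-4": date(2014, 1, 1),
}


def chord(facts, entity_times):
    """Latest value per (entity, attribute) at or before each entity's own time."""
    latest = {}
    for entity, attribute, value, time in facts:
        at = entity_times.get(entity)
        if at is None or time > at:
            continue  # entity not requested, or fact not yet valid
        key = (entity, attribute)
        if key not in latest or time > latest[key][1]:
            latest[key] = (value, time)
    return {key: value for key, (value, _) in latest.items()}


result = chord(facts, entity_times)
```

Because customer-4 is cut off at 2014-01-01, its purchases fact from 2014-02-04 is excluded, which is the property that matters when preparing training instances without leaking later data.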

DERIVED FACTS

customer-2  balance:         184 @ 2014-02-01,  312 @ 2014-03-01,  276 @ 2014-04-01
customer-2  max.balance.4M:  ?

The maximum balance over the last 4 months can be derived from the set of balance facts.
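
As a worked illustration of deriving max.balance.4M from the balance series above, the following sketch takes the maximum over a trailing window. The helper names and the window convention (start exclusive, end inclusive, calendar months) are assumptions, since the slides do not define the boundaries.

```python
import calendar
from datetime import date


def months_before(d, n):
    """The date n calendar months before d, clamping the day if needed."""
    index = d.year * 12 + (d.month - 1) - n
    year, month0 = divmod(index, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def max_over_months(series, at, months):
    """Maximum value with start < time <= at, or None if the window is empty."""
    start = months_before(at, months)
    window = [value for time, value in series if start < time <= at]
    return max(window) if window else None


# customer-2's balance facts from the slide.
balance = [
    (date(2014, 2, 1), 184),
    (date(2014, 3, 1), 312),
    (date(2014, 4, 1), 276),
]

max_balance_4m = max_over_months(balance, date(2014, 4, 1), 4)
```

Under these assumptions all three facts fall inside the window ending 2014-04-01, so the derived value is 312.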

Many facts can be derived from a time series of base facts.

Base fact: balance
• Maximum balance over the last month
• Mean balance over the last 2 months
• Balance gradient over the last 3 months

Base fact: purchase
• Number of purchases in the last 3 weeks
• Proportion of supermarket purchases in the last 2 weeks

Base fact: zipcode
• Number of times the zipcode has changed in the last 5 years
• Longest period where the zipcode has not changed in the last 5 years

VIRTUAL FEATURES
• Ivory represents derived facts as virtual features
• Virtual features are declared in the dictionary
• They specify expressions against base facts
• They are computed lazily when features are extracted

name                source    expression  window
max.balance.4M      balance   max         4 months
mean.balance.6M     balance   mean        6 months
num.purchases.3W    purchase  count       3 weeks
changes.zipcode.5Y  zipcode   num_flips   5 years
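
One way to picture declarative, lazily computed virtual features is a table of declarations plus an evaluator that runs only when a value is requested. The sketch below is hypothetical throughout: the `VirtualFeature` class, the `evaluate` function, the window convention and the reading of `num_flips` as "number of changes between consecutive values" are assumptions, not Ivory's dictionary syntax or semantics. The customer-9 zipcode facts are invented for the example.

```python
import calendar
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean


@dataclass(frozen=True)
class VirtualFeature:
    name: str
    source: str      # base attribute the expression reads
    expression: str  # key into EXPRESSIONS
    amount: int      # window length
    unit: str        # "weeks", "months" or "years"


def num_flips(values):
    """Assumed meaning: how often consecutive values differ."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


EXPRESSIONS = {"max": max, "mean": mean, "count": len, "num_flips": num_flips}

# Declarations copied from the slide's table.
VIRTUAL_FEATURES = {
    f.name: f
    for f in [
        VirtualFeature("max.balance.4M", "balance", "max", 4, "months"),
        VirtualFeature("mean.balance.6M", "balance", "mean", 6, "months"),
        VirtualFeature("num.purchases.3W", "purchase", "count", 3, "weeks"),
        VirtualFeature("changes.zipcode.5Y", "zipcode", "num_flips", 5, "years"),
    ]
}


def window_start(at, amount, unit):
    if unit == "weeks":
        return at - timedelta(weeks=amount)
    months = amount * 12 if unit == "years" else amount
    year, month0 = divmod(at.year * 12 + (at.month - 1) - months, 12)
    day = min(at.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def evaluate(feature, facts, entity, at):
    """Compute one virtual feature on demand; nothing is stored."""
    start = window_start(at, feature.amount, feature.unit)
    in_window = sorted(
        ((time, value) for e, attribute, value, time in facts
         if e == entity and attribute == feature.source and start < time <= at),
        key=lambda pair: pair[0],
    )
    values = [value for _, value in in_window]
    if not values and feature.expression in ("max", "mean"):
        return None  # no base facts to aggregate
    return EXPRESSIONS[feature.expression](values)


facts = [
    ("customer-2", "balance", 184, date(2014, 2, 1)),
    ("customer-2", "balance", 312, date(2014, 3, 1)),
    ("customer-2", "balance", 276, date(2014, 4, 1)),
    # Hypothetical zipcode history, invented for this example.
    ("customer-9", "zipcode", "2000", date(2010, 1, 1)),
    ("customer-9", "zipcode", "3000", date(2012, 6, 1)),
    ("customer-9", "zipcode", "2000", date(2013, 6, 1)),
]
```

Calling `evaluate(VIRTUAL_FEATURES["max.balance.4M"], facts, "customer-2", date(2014, 4, 1))` computes the value from the base facts at extraction time instead of reading a stored fact.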

COMMITS

[Diagram: an Ivory repository with three operations: ingest facts, import dictionary, extract features.]

• A commit is recorded for any repository change:
  • factset ingestions
  • dictionary imports
• The repository at a given commit is an immutable data store
• Snapshots and chords can be run against a specific commit

[Figure: commit timeline]
0  create repository
1  import dictionary
2  ingest factset
3  ingest factset
4  import dictionary
5  ingest factset
Two snapshots and a chord are shown running against individual commits on this timeline.
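
The commit model can be sketched as an append-only list of immutable repository states. The `Commit` and `Repository` classes below are illustrative assumptions rather than Ivory's implementation, and the attribute lists and factset ids are placeholders; only the sequence of operations follows the timeline above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Immutable view of the repository after one change (hypothetical)."""
    id: int
    dictionary: tuple  # names of declared attributes
    factsets: tuple    # ids of ingested factsets


class Repository:
    def __init__(self):
        # Commit 0: create repository.
        self.commits = [Commit(0, (), ())]

    def _commit(self, dictionary, factsets):
        commit = Commit(len(self.commits), tuple(dictionary), tuple(factsets))
        self.commits.append(commit)
        return commit

    def import_dictionary(self, attributes):
        return self._commit(attributes, self.commits[-1].factsets)

    def ingest_factset(self, factset_id):
        head = self.commits[-1]
        return self._commit(head.dictionary, head.factsets + (factset_id,))

    def at(self, commit_id):
        """State to run a snapshot or chord against."""
        return self.commits[commit_id]


# Replay the timeline from the slide (names are placeholders).
repo = Repository()
repo.import_dictionary(["gender", "balance"])                          # commit 1
repo.ingest_factset("factset-1")                                       # commit 2
repo.ingest_factset("factset-2")                                       # commit 3
repo.import_dictionary(["gender", "balance", "purchases", "zipcode"])  # commit 4
repo.ingest_factset("factset-3")                                       # commit 5
```

An extraction run against `repo.at(2)` sees only `factset-1` and the first dictionary, regardless of what is ingested or imported later.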

KEY CONCEPTS
• Repository
• Commit
• Dictionary
• Factset
• Base fact
• Virtual feature
• Snapshot
• Chord