Billing the Cloud
Pierre-Yves Ritschard
December 15, 2016
This talk describes how Exoscale approaches usage metering and billing with Apache Kafka
Transcript
1 Billing the cloud Real world stream processing
2 . 1 @pyr
Co-Founder, CTO at Exoscale
Open source developer
3 . 1 Tonight
Problem domain
Scaling methodologies
Our approach
4 . 1
5 . 1
6 . 1 7 . 1 Infrastructure isn't free!
8 . 1 Business Model
Provide cloud infrastructure
???
Profit!
9 . 1
10 . 1 11 . 1 10,000-mile-high view
12 . 1 Quantities Resources
13 . 1 14 . 1 Quantities
10 megabytes have been sent from 159.100.251.251 over the last minute
15 . 1 Resources
Account geneva-jug started instance foo with profile large today at 12:00
Account geneva-jug stopped instance foo today at 12:15
16 . 1 A bit closer to reality
{:type     :usage
 :entity   :vm
 :action   :create
 :time     #inst "2016-12-12T15:48:32.000-00:00"
 :template "ubuntu-16.04"
 :source   :cloudstack
 :account  "geneva-jug"
 :uuid     "7a070a3d-66ff-4658-ab08-fe3cecd7c70f"
 :version  1
 :offering "medium"}
17 . 1 A bit closer to reality
message IPMeasure {
  /* Versioning */
  required uint32 header = 1;
  required uint32 saddr = 2;
  required uint64 bytes = 3;
  /* Validity */
  required uint64 start = 4;
  required uint64 end = 5;
}
18 . 1 Theory
19 . 1 Quantities are simple
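One way to read "quantities are simple": each measurement is already a number, so metering reduces to summing per source. The sketch below is an illustration only; `bytes_per_address` and the `(saddr, bytes)` tuple shape are assumptions made for this example, loosely modelled on the IPMeasure fields, and are not taken from the talk.

```python
from collections import defaultdict


def bytes_per_address(measures):
    """Sum transferred bytes per source address.

    `measures` is an iterable of (saddr, nbytes) pairs, a hypothetical
    simplification of the IPMeasure message shown earlier.
    """
    totals = defaultdict(int)
    for saddr, nbytes in measures:
        totals[saddr] += nbytes
    return dict(totals)


measures = [
    ("159.100.251.251", 6_000_000),
    ("159.100.251.251", 4_000_000),
    ("159.100.251.252", 1_000),
]
totals = bytes_per_address(measures)
```

Because addition is associative and commutative, the measurements can arrive in any order and be summed in any grouping, which is what makes this case easy compared to resources.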
20 . 1 21 . 1 Resources are harder
22 . 1 23 . 1 This is per-account
24 . 1 25 . 1 Solving for all events
    resources = {}
    metering = []

    def usage_metering():
        for event in fetch_all_events():
            uuid = event.uuid()
            time = event.time()
            if event.action() == 'start':
                resources[uuid] = time
            else:
                timespan = duration(resources[uuid], time)
                usage = Usage(uuid, timespan)
                metering.append(usage)
        return metering
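The slide code leaves `fetch_all_events`, `duration` and `Usage` undefined. A minimal runnable sketch of the same start/stop pairing follows. The `Event` and `Usage` dataclasses, the `'start'`/`'stop'` action names, and the choice to skip a stop that has no known start are all assumptions made for this example, not details from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    uuid: str
    action: str      # 'start' or 'stop' (hypothetical action names)
    time: datetime


@dataclass(frozen=True)
class Usage:
    uuid: str
    timespan: timedelta


def usage_metering(events):
    """Pair each stop event with its matching start and emit a Usage record."""
    resources = {}   # uuid -> start time of resources currently running
    metering = []
    for event in sorted(events, key=lambda e: e.time):
        if event.action == 'start':
            resources[event.uuid] = event.time
        elif event.uuid in resources:
            started = resources.pop(event.uuid)
            metering.append(Usage(event.uuid, event.time - started))
        # A stop without a known start is skipped rather than guessed at.
    return metering


events = [
    Event('foo', 'start', datetime(2016, 12, 12, 12, 0)),
    Event('foo', 'stop', datetime(2016, 12, 12, 12, 15)),
]
usage = usage_metering(events)
```

With the two events from the earlier "Resources" slide (instance foo started at 12:00 and stopped at 12:15), this yields a single usage record of 15 minutes.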
26 . 1 Practical matters
This is a never-ending process
Minute-precision billing
Only apply once an hour
Avoid overbilling at all costs
Avoid underbilling (we need to eat!)
27 . 1 Practical matters Keep a small operational footprint
28 . 1 A naive approach
32 * * * * usage-metering >/dev/null 2>&1
29 . 1
30 . 1
31 . 1 32 . 1 Advantages
Low operational overhead
Simple functional boundaries
Easy to test
33 . 1 34 . 1 Drawbacks
High pressure on SQL server
Hard to avoid overlapping jobs
Overlaps result in longer metering intervals
You are in a room full of overlapping cron jobs.
You can hear the screams of a dying MySQL server.
An Oracle vendor is here.
To the West, a door is marked "Map/Reduce".
To the East, a door is marked "Streaming".
35 . 1 36 . 1 > Talk to Oracle
You have been eaten by a grue.
37 . 1 38 . 1 > Go West
39 . 1 Conceptually simple
Spreads easily
Data-locality aware processing
40 . 1 ETL
High latency
High operational overhead
41 . 1
42 . 1 43 . 1 > Go East
44 . 1 Continuous computation on an unbounded stream
45 . 1 Each event processed as it comes in
Very low latency
A never-ending reduce
46 . 1
(reductions + [1 2 3 4])
;; => (1 3 6 10)
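For readers more at home in Python than Clojure, `itertools.accumulate` plays roughly the same role as `reductions`: it yields every intermediate result of a reduce, lazily. The sketch below is an illustration, not code from the talk; `running_totals` is a name chosen for this example.

```python
from itertools import accumulate, count, islice
from operator import add


def running_totals(stream):
    """Yield the running sum after each element, like Clojure's reductions."""
    return accumulate(stream, add)


totals = list(running_totals([1, 2, 3, 4]))

# The same function works on an unbounded stream because it is lazy:
# only the results that are actually consumed get computed.
first_four = list(islice(running_totals(count(1)), 4))
```

The second call is the "never-ending reduce" in miniature: the input never finishes, yet each new element still produces an up-to-date result.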
47 . 1 Conceptually harder
Where do we store intermediate results?
How does data flow between computation steps?
48 . 1
49 . 1 50 . 1 Deciding factors
51 . 1 Our shopping list
Operational simplicity
Integration through our whole stack
Going beyond billing
Room to grow
52 . 1 53 . 1 Operational simplicity
Experience matters
Spark and Storm are intimidating
HBase & Hive discarded
54 . 1 Integration
HDFS would require simple integration
Spark usually goes hand in hand with Cassandra
Storm tends to prefer Kafka
55 . 1 Room to grow
A ton of logs
A ton of metrics
56 . 1 Thursday confessions
Previously knew Kafka
57 . 1
58 . 1 Publish & Subscribe Processing Store
59 . 1 60 . 1 Publish & Subscribe
Messages are produced to topics
Topics have a predefined number of partitions
Messages have a key which determines their partition
Consumers get assigned a set of partitions
Consumers store their last consumed offset
Brokers own partitions, handle replication
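The two ideas in the list above (a key picks a partition, and each consumer owns a set of partitions) can be sketched without any Kafka client. This is a toy model under stated assumptions: `partition_for` uses CRC32 purely for illustration, whereas Kafka's own partitioner uses a different hash function, and the round-robin `assign` is a stand-in for Kafka's real assignment strategies.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition with a stable hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions, consumers):
    """Spread partitions over consumers round-robin (hypothetical strategy)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


NUM_PARTITIONS = 6
# Every event for the same account lands in the same partition...
p1 = partition_for("geneva-jug", NUM_PARTITIONS)
p2 = partition_for("geneva-jug", NUM_PARTITIONS)
# ...and each partition is read by exactly one consumer in the group.
assignment = assign(list(range(NUM_PARTITIONS)), ["consumer-a", "consumer-b"])
```

The property that matters for billing is that all events for one key reach the same consumer in order, which is what allows that consumer to keep per-account state in memory.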
61 . 1
62 . 1 Stable consumer topology
Memory disaggregation
Can rely on in-memory storage
63 . 1 64 . 1 Stream expiry
65 . 1
66 . 1
67 . 1
68 . 1 69 . 1 Problem solved?
Process crashes
Undelivered message?
Avoiding double billing
70 . 1 71 . 1 Process crashes
Triggers a rebalance
Loss of in-memory cache
No initial state!
72 . 1 Reconciliation
Snapshot of full inventory
Converges stored resource state if necessary
Handles failed deliveries as well
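A reconciliation pass of this kind can be sketched as a diff between the stored state and an inventory snapshot. The sketch below is one possible reading of the slide, not the talk's implementation: `reconcile`, `converge`, and the `uuid -> offering` dictionaries are assumptions made for this example.

```python
def reconcile(stored, snapshot):
    """List the corrections needed for `stored` to match `snapshot`.

    Both arguments map a resource uuid to its offering (hypothetical shape).
    """
    corrections = []
    for uuid, offering in snapshot.items():
        if stored.get(uuid) != offering:
            # Missed start event, or the offering changed without notice.
            corrections.append(("start", uuid, offering))
    for uuid, offering in stored.items():
        if uuid not in snapshot:
            # Missed stop event: the resource no longer exists.
            corrections.append(("stop", uuid, offering))
    return corrections


def converge(stored, snapshot):
    """Apply the corrections in place and return them for auditing."""
    corrections = reconcile(stored, snapshot)
    for action, uuid, offering in corrections:
        if action == "start":
            stored[uuid] = offering
        else:
            stored.pop(uuid, None)
    return corrections


stored = {"a": "medium", "b": "large"}
snapshot = {"a": "medium", "c": "small"}
corrections = converge(stored, snapshot)
```

The same pass covers both failure modes named on the slides: a consumer that restarts with an empty cache starts from `stored = {}`, and a lost message shows up as a single mismatching entry.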
73 . 1 Avoiding double billing
Reconciler acts as logical clock
When supplying usage, attach a unique transaction ID
Reject multiple transaction attempts on a single ID
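The transaction ID rule amounts to making the billing write idempotent. A minimal in-memory sketch follows. The `Ledger` class, the `transaction_id` format, and the use of a reconciler tick number as the logical clock value are assumptions made for this example; the talk does not describe how the IDs are built or stored.

```python
class Ledger:
    """Accepts each transaction ID at most once (in-memory illustration)."""

    def __init__(self):
        self._seen = set()
        self.entries = []

    def record(self, txid, account, minutes):
        """Store the usage and return True, or return False on a repeat ID."""
        if txid in self._seen:
            return False
        self._seen.add(txid)
        self.entries.append((account, minutes))
        return True


def transaction_id(uuid, tick):
    """Build an ID from the resource and a logical clock tick (hypothetical)."""
    return f"{uuid}:{tick}"


ledger = Ledger()
txid = transaction_id("7a070a3d-66ff-4658-ab08-fe3cecd7c70f", 42)
first = ledger.record(txid, "geneva-jug", 15)
# A retry after a crash or rebalance reuses the same ID and is rejected.
second = ledger.record(txid, "geneva-jug", 15)
```

Because the ID is derived from the resource and the clock tick rather than generated at send time, a replayed message produces the same ID and cannot bill twice.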
74 . 1 Looking back
Things stay simple (roughly 600 LoC)
Room to grow
Stable and resilient
DNS, Logs, Metrics, Event Sourcing
75 . 1 What about batch?
Streaming doesn't work for everything
Sometimes throughput matters more than latency
Building models in batch, applying with stream processing
76 . 1 Questions? Thanks!