Billing the Cloud
Pierre-Yves Ritschard
December 15, 2016
This talk describes how Exoscale approaches usage metering and billing with Apache Kafka
Transcript
1 Billing the cloud
Real-world stream processing
2 . 1 @pyr
Co-Founder, CTO at Exoscale
Open source developer
3 . 1 Tonight
Problem domain
Scaling methodologies
Our approach
7 . 1 Infrastructure isn't free!
8 . 1 Business Model
Provide cloud infrastructure
???
Profit!
11 . 1 10,000-mile-high view
12 . 1 Quantities
Resources
14 . 1 Quantities
10 megabytes have been sent from 159.100.251.251 over the last minute
15 . 1 Resources
Account geneva-jug started instance foo with profile large today at 12:00
Account geneva-jug stopped instance foo today at 12:15
16 . 1 A bit closer to reality
    {:type :usage
     :entity :vm
     :action :create
     :time #inst "2016-12-12T15:48:32.000-00:00"
     :template "ubuntu-16.04"
     :source :cloudstack
     :account "geneva-jug"
     :uuid "7a070a3d-66ff-4658-ab08-fe3cecd7c70f"
     :version 1
     :offering "medium"}
17 . 1 A bit closer to reality
    message IPMeasure {
      /* Versioning */
      required uint32 header = 1;
      required uint32 saddr = 2;
      required uint64 bytes = 3;
      /* Validity */
      required uint64 start = 4;
      required uint64 end = 5;
    }
18 . 1 Theory
19 . 1 Quantities are simple
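Quantities are simple because every measurement is self-contained: billing a period amounts to summing the measurements that fall inside it. A minimal Python sketch of that idea, assuming a hypothetical `IPMeasure` tuple loosely mirroring the protobuf message above. The source address is kept as a string for readability, the byte count is named `nbytes`, and `start`/`end` are treated as epoch seconds, none of which is stated in the slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class IPMeasure(NamedTuple):
    """Hypothetical stand-in for the IPMeasure protobuf message."""
    saddr: str    # source address (a uint32 in the real message)
    nbytes: int   # bytes sent during [start, end]
    start: int    # validity start, assumed to be epoch seconds
    end: int      # validity end, assumed to be epoch seconds


def bytes_per_address(measures: Iterable[IPMeasure],
                      window_start: int,
                      window_end: int) -> Dict[str, int]:
    """Sum the bytes of every measure fully contained in the window."""
    totals: Dict[str, int] = defaultdict(int)
    for m in measures:
        if window_start <= m.start and m.end <= window_end:
            totals[m.saddr] += m.nbytes
    return dict(totals)
```

No state has to be carried between measurements, which is what separates quantities from resources.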
21 . 1 Resources are harder
23 . 1 This is per-account
25 . 1 Solving for all events
    resources = {}  # uuid -> start time of each running resource
    metering = []   # usage records produced so far

    def usage_metering():
        for event in fetch_all_events():
            uuid = event.uuid()
            time = event.time()
            if event.action() == 'start':
                resources[uuid] = time
            else:
                # Any other action ends the resource: bill the elapsed span.
                timespan = duration(resources[uuid], time)
                usage = Usage(uuid, timespan)
                metering.append(usage)
        return metering
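The pseudocode above leaves `fetch_all_events`, `duration` and `Usage` undefined. One way to make it runnable, as a sketch only: events are passed in as plain tuples, timestamps are `datetime` values, and usage is counted in whole minutes rounded down. The rounding is an assumption (it errs toward under-billing); the talk does not specify it. The `Event` and `Usage` types are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List, NamedTuple


class Event(NamedTuple):
    uuid: str
    action: str      # 'start' or anything else, treated as a stop
    time: datetime


class Usage(NamedTuple):
    uuid: str
    minutes: int


def usage_metering(events: Iterable[Event]) -> List[Usage]:
    """Pair each start with the next stop for the same resource."""
    running: Dict[str, datetime] = {}
    metering: List[Usage] = []
    # Process in time order so a stop always follows its start.
    for event in sorted(events, key=lambda e: e.time):
        if event.action == "start":
            running[event.uuid] = event.time
        elif event.uuid in running:
            started = running.pop(event.uuid)
            # Whole minutes, rounded down (assumption, see above).
            minutes = int((event.time - started).total_seconds() // 60)
            metering.append(Usage(event.uuid, minutes))
        # A stop with no known start is skipped rather than raising.
    return metering
```

Fed the instance from slide 15 (started at 12:00, stopped at 12:15), this yields a single 15-minute usage record.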
26 . 1 Practical matters
This is a never-ending process
Minute-precision billing
Only apply once an hour
Avoid overbilling at all costs
Avoid underbilling (we need to eat!)
27 . 1 Practical matters
Keep a small operational footprint
28 . 1 A naive approach
    32 * * * * usage-metering >/dev/null 2>&1
32 . 1 Advantages
Low operational overhead
Simple functional boundaries
Easy to test
34 . 1 Drawbacks
High pressure on SQL server
Hard to avoid overlapping jobs
Overlaps result in longer metering intervals
You are in a room full of overlapping cron jobs.
You can hear the screams of a dying MySQL server.
An Oracle vendor is here.
To the West, a door is marked "Map/Reduce".
To the East, a door is marked "Streaming".
36 . 1 > Talk to Oracle
You have been eaten by a grue.
38 . 1 > Go West
39 . 1 Conceptually simple
Spreads easily
Data-locality aware processing
40 . 1 ETL
High latency
High operational overhead
43 . 1 > Go East
44 . 1 Continuous computation on an unbounded stream
45 . 1 Each event processed as it comes in
Very low latency
A never-ending reduce
46 . 1
    (reductions + [1 2 3 4])
    ;; => (1 3 6 10)
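For readers less familiar with Clojure, the same running reduce can be sketched in Python with `itertools.accumulate`, which lazily yields every intermediate result instead of only the final one. This is an illustrative equivalent, not something shown in the talk.

```python
from itertools import accumulate
from operator import add

# Each output element is the reduction of all inputs seen so far.
running_totals = list(accumulate([1, 2, 3, 4], add))
print(running_totals)  # [1, 3, 6, 10]
```

Because `accumulate` is lazy, it also works on a generator that never ends, which is the property a streaming job relies on.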
47 . 1 Conceptually harder
Where do we store intermediate results?
How does data flow between computation steps?
50 . 1 Deciding factors
51 . 1 Our shopping list
Operational simplicity
Integration through our whole stack
Going beyond billing
Room to grow
53 . 1 Operational simplicity
Experience matters
Spark and Storm are intimidating
HBase & Hive discarded
54 . 1 Integration
HDFS would require simple integration
Spark usually goes hand in hand with Cassandra
Storm tends to prefer Kafka
55 . 1 Room to grow
A ton of logs
A ton of metrics
56 . 1 Thursday confessions
Previously knew Kafka
58 . 1 Publish & Subscribe
Processing
Store
60 . 1 Publish & Subscribe
Messages are produced to topics
Topics have a predefined number of partitions
Each message has a key which determines its partition
Consumers get assigned a set of partitions
Consumers store their last consumed offset
Brokers own partitions, handle replication
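The key-to-partition rule is what keeps every event for a given key on the same consumer. A rough sketch of the idea follows, using CRC32 as a stand-in hash (Kafka's own default partitioner uses a different hash function, so partition numbers here will not match a real cluster) and a simplified round-robin assignment in place of Kafka's consumer group protocol. Both helper functions are hypothetical, and keying by account name is an assumption.

```python
import zlib
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition; same key, same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions: int,
                      consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over consumers round-robin (simplified)."""
    return {
        consumer: [p for p in range(num_partitions)
                   if p % len(consumers) == index]
        for index, consumer in enumerate(consumers)
    }
```

With events keyed this way, `partition_for("geneva-jug", 8)` returns the same partition on every call, so one consumer sees the whole start/stop history for that key.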
62 . 1 Stable consumer topology
Memory disaggregation
Can rely on in-memory storage
64 . 1 Stream expiry
69 . 1 Problem solved?
Process crashes
Undelivered message?
Avoiding double billing
71 . 1 Process crashes
Triggers a rebalance
Loss of in-memory cache
No initial state!
72 . 1 Reconciliation
Snapshot of full inventory
Converges stored resource state if necessary
Handles failed deliveries as well
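Reconciliation can be pictured as diffing the consumer's in-memory state against an authoritative inventory snapshot and emitting whatever events are needed to converge. A minimal sketch, assuming both sides are dictionaries mapping resource UUID to start time; the function name and the synthetic event shape are hypothetical and not Exoscale's implementation.

```python
from typing import Dict, List, Tuple

Correction = Tuple[str, str, int]  # (action, uuid, time)


def reconcile(state: Dict[str, int],
              snapshot: Dict[str, int],
              now: int) -> List[Correction]:
    """Return synthetic events that bring `state` in line with `snapshot`."""
    corrections: List[Correction] = []
    # In the inventory but unknown to us: a start was missed,
    # or the in-memory state was lost in a crash.
    for uuid in sorted(snapshot):
        if uuid not in state:
            corrections.append(("start", uuid, snapshot[uuid]))
    # Known to us but gone from the inventory: a stop was missed.
    for uuid in sorted(state):
        if uuid not in snapshot:
            corrections.append(("stop", uuid, now))
    return corrections
```

A consumer that has just restarted with an empty state receives one synthetic start per live resource, which covers the "No initial state!" case as well as failed deliveries.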
73 . 1 Avoiding double billing
Reconciler acts as logical clock
When supplying usage, attach a unique transaction ID
Reject multiple transaction attempts on a single ID
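The transaction ID rule amounts to an idempotent write: the same ID may be submitted any number of times but is applied only once. A sketch with an in-memory set standing in for whatever durable store backs the real billing system; the `Ledger` class and the ID format used in the example are hypothetical.

```python
from typing import List, Set, Tuple


class Ledger:
    """Accepts each transaction ID at most once."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.entries: List[Tuple[str, int]] = []

    def submit(self, txid: str, account: str, minutes: int) -> bool:
        """Record usage; return False if this txid was already applied."""
        if txid in self._seen:
            return False  # duplicate attempt, rejected
        self._seen.add(txid)
        self.entries.append((account, minutes))
        return True


ledger = Ledger()
first = ledger.submit("tick-42:foo", "geneva-jug", 15)   # applied
retry = ledger.submit("tick-42:foo", "geneva-jug", 15)   # rejected
```

A producer that crashes after sending but before recording its offset can then safely resend: the retry is rejected and the account is billed once.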
74 . 1 Looking back
Things stay simple (roughly 600 LoC)
Room to grow
Stable and resilient
DNS, Logs, Metrics, Event Sourcing
75 . 1 What about batch?
Streaming doesn't work for everything
Sometimes throughput matters more than latency
Building models in batch, applying with stream processing
76 . 1 Questions? Thanks!