Billing the Cloud

A real-world use case of stream processing: how we went from simple, naive cron jobs to Kafka stream processing to bill our customers.

Marc-Aurèle Brothier

February 23, 2017

Transcript

  1. @pyr: Co-Founder, CTO at Exoscale. Open source developer.

    @marcaurele: Senior Engineer at Exoscale. Ex-sport junky & world traveler.
  2. Quantities

    10 megabytes have been sent from 159.100.251.251 over the last minute
  3. Resources

    Account kickass-company started instance foo with profile large today at 12:00
    Account kickass-company stopped instance foo today at 12:15
  4. A bit closer to reality

        {:type     :usage
         :entity   :vm
         :action   :create
         :time     #inst "2016-12-12T15:48:32.000-00:00"
         :template "ubuntu-16.04"
         :source   :cloudstack
         :account  "kickass-company"
         :uuid     "7a070a3d-66ff-4658-ab08-fe3cecd7c70f"
         :version  1
         :offering "medium"}
  5. A bit closer to reality

        message IPMeasure {
          /* Versioning */
          required uint32 header = 1;
          required uint32 saddr = 2;
          required uint64 bytes = 3;
          /* Validity */
          required uint64 start = 4;
          required uint64 end = 5;
        }
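  6. Solving for all events

        # fetch_all_events, duration and Usage are helpers defined elsewhere.
        resources = {}  # uuid -> start time of each running resource
        metering = []   # completed usage records

        def usage_metering():
            for event in fetch_all_events():
                uuid = event.uuid()
                time = event.time()
                if event.action() == 'start':
                    resources[uuid] = time
                else:
                    # Any other action closes the interval opened by 'start'.
                    timespan = duration(resources[uuid], time)
                    usage = Usage(uuid, timespan)
                    metering.append(usage)
            return metering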
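The loop on this slide relies on helpers that the deck does not show. A minimal self-contained sketch of the same idea follows, assuming events are simple records with `uuid`, `action` and `time` fields; the `Event` and `Usage` classes here are hypothetical stand-ins, not the authors' types.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: a resource was started or stopped."""
    uuid: str
    action: str  # "start" or "stop"
    time: datetime


@dataclass(frozen=True)
class Usage:
    """Hypothetical usage record: how long a resource was running."""
    uuid: str
    timespan: timedelta


def usage_metering(events):
    """Pair each start with the next stop of the same resource."""
    running = {}   # uuid -> start time
    metering = []  # completed usage records
    for event in sorted(events, key=lambda e: e.time):
        if event.action == "start":
            running[event.uuid] = event.time
        elif event.uuid in running:
            # Remove the start time so a repeated stop is not billed twice.
            started = running.pop(event.uuid)
            metering.append(Usage(event.uuid, event.time - started))
    return metering


events = [
    Event("foo", "start", datetime(2016, 12, 12, 12, 0)),
    Event("foo", "stop", datetime(2016, 12, 12, 12, 15)),
]
for usage in usage_metering(events):
    print(usage.uuid, usage.timespan)
```

With the two events from the "Resources" slide, this yields a single 15-minute usage record for instance foo. A stop with no matching start is ignored here, which is one possible policy rather than the one the deck prescribes.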
  7. Practical matters

    This is a never-ending process
    Minute precision billing
    Only apply once an hour
    Avoid over billing at all cost
    Avoid under billing (we need to eat!)
  8. Drawbacks

    High pressure on SQL server
    Hard to avoid overlapping jobs
    Overlaps result in longer metering intervals
  9. You are in a room full of overlapping cron jobs. You can hear the screams of a dying MySQL server. An Oracle vendor is here.

    To the West, a door is marked "Map/Reduce".
    To the East, a door is marked "Streaming".
  10. Conceptually simple

    Spreads easily
    Data-locality aware processing
  11. Each event processed as it comes in

    Very low latency
    A never-ending reduce
  12. (reductions + [1 2 3 4]) ;; => (1 3 6 10)
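For readers more at home in Python than Clojure, the same running reduce can be sketched with `itertools.accumulate`, which yields every intermediate result instead of only the final one. The `reductions` wrapper below is a hypothetical helper named after the Clojure function, added only to mirror the slide.

```python
import operator
from itertools import accumulate


def reductions(fn, values):
    """Lazily yield each intermediate result of reducing values with fn."""
    return accumulate(values, fn)


print(list(reductions(operator.add, [1, 2, 3, 4])))  # [1, 3, 6, 10]
```

Because `accumulate` is lazy, it also works on an unbounded iterator of events, which is what makes it a fair picture of a "never-ending reduce".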
  13. Conceptually harder

    Where do we store intermediate results?
    How does data flow between computation steps?
  14. Operational simplicity

    Experience matters
    Spark and Storm are intimidating
    HBase and Hive discarded
  15. Integration

    HDFS would require simple integration
    Spark usually goes hand in hand with Cassandra
    Storm tends to prefer Kafka
  16. Publish & Subscribe

    Messages are produced to topics
    Topics have a predefined number of partitions
    Each message has a key, which determines its partition
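The key-to-partition mapping can be illustrated with a small sketch. It assumes a simple "hash the key, take it modulo the partition count" scheme; CRC32 stands in for whatever hash a real Kafka client uses, and `partition_for` is a hypothetical name rather than a Kafka API.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index in [0, num_partitions)."""
    # CRC32 is an illustrative stand-in for the client's real hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Every event for one account lands on the same partition.
first = partition_for("kickass-company", 8)
second = partition_for("kickass-company", 8)
print(first == second)  # True
```

The useful property is determinism: all events sharing a key are read, in order, by whichever consumer owns that partition. That is what makes per-account in-memory state workable.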
  17. Consumers get assigned a set of partitions

    Consumers store their last consumed offset
    Brokers own partitions, handle replication
  18. Stable consumer topology

    Memory disaggregation
    Can rely on in-memory storage
  19. Process crashes

    Triggers a rebalance
    Loss of in-memory cache
    No initial state!
  20. Reconciliation

    Snapshot of full inventory
    Converges stored resource state if necessary
    Handles failed deliveries as well
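The deck does not show how the reconciler works internally. One plausible reading is a set difference between the inventory snapshot and the state the billing process believes in, as in the sketch below. The `reconcile` function and its tuple-shaped corrections are assumptions made for illustration.

```python
def reconcile(stored: set, snapshot: set) -> list:
    """Return the corrective events that make stored match snapshot.

    stored:   uuids the billing process believes are running
    snapshot: uuids actually running according to the full inventory
    """
    corrections = []
    # Running in reality but unknown to billing: a start event was missed.
    for uuid in snapshot - stored:
        corrections.append(("start", uuid))
    # Known to billing but gone from the inventory: a stop event was missed.
    for uuid in stored - snapshot:
        corrections.append(("stop", uuid))
    return sorted(corrections)


stored = {"a", "b"}
snapshot = {"b", "c"}
print(reconcile(stored, snapshot))  # [('start', 'c'), ('stop', 'a')]
```

Run after a crash, with an empty `stored` set, the same function rebuilds the full state as a list of start events. That is one way to address the "No initial state!" problem of the previous slide.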
  21. Avoiding double billing

    Reconciler acts as logical clock
    When supplying usage, attach a unique transaction ID
    Reject multiple transaction attempts on a single ID
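Rejecting repeated transaction IDs amounts to making the billing write idempotent. A minimal in-memory sketch follows; the `Ledger` class is hypothetical, and a production system would presumably enforce the same rule with a uniqueness constraint in durable storage.

```python
class Ledger:
    """Hypothetical billing ledger that accepts each transaction ID once."""

    def __init__(self):
        self._seen = set()  # transaction IDs already applied
        self.entries = []   # (account, amount) pairs that were accepted

    def record(self, transaction_id: str, account: str, amount: int) -> bool:
        """Apply a usage record; return False if the ID was already used."""
        if transaction_id in self._seen:
            return False
        self._seen.add(transaction_id)
        self.entries.append((account, amount))
        return True


ledger = Ledger()
print(ledger.record("tx-1", "kickass-company", 15))  # True
print(ledger.record("tx-1", "kickass-company", 15))  # False
```

A replayed message, for example after a consumer rebalance, then becomes a no-op instead of a second charge.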
  22. Looking back

    Things stay simple (roughly 600 LoC)
    Room to grow
    Stable and resilient
    DNS, Logs, Metrics, Event Sourcing
  23. What about batch?

    Streaming doesn't work for everything
    Sometimes throughput matters more than latency
    Building models in batch, applying with stream processing