Billing the Cloud

Updated Billing the Cloud slides for WeAreDevelopers 2017 in Vienna

Pierre-Yves Ritschard

May 12, 2017

Transcript

  1. @pyr Billing the cloud Real world stream processing

  2. @pyr Three-line bio • CTO & co-founder at Exoscale •

    Open Source Developer • Monitoring & Distributed Systems Enthusiast
  3. @pyr Billing the cloud Real world stream processing

  4. @pyr • Billing resources • Scaling methodologies • Our approach

  5. @pyr

  6. @pyr

     provider "exoscale" {
       api_key    = "${var.exoscale_api_key}"
       secret_key = "${var.exoscale_secret_key}"
     }

     resource "exoscale_instance" "web" {
       template  = "ubuntu 17.04"
       disk_size = "50g"
       profile   = "medium"
       ssh_key   = "production"
     }
  7. None
  8. None
  9. @pyr Infrastructure isn’t free! (sorry)

  10. @pyr Business Model • Provide cloud infrastructure • (???) •

    Profit!
  11. None
  12. None
  13. @pyr 10000 mile high view

  14. None
  15. Quantities

  16. Quantities • 10 megabytes have been sent from 159.100.251.251 over

    the last minute
  17. Resources

  18. Resources • Account WAD started instance foo with profile large

    today at 12:00 • Account WAD stopped instance foo today at 12:15
  19. A bit closer to reality

     {:type     :usage
      :entity   :vm
      :action   :create
      :time     #inst "2016-12-12T15:48:32.000-00:00"
      :template "ubuntu-16.04"
      :source   :cloudstack
      :account  "geneva-jug"
      :uuid     "7a070a3d-66ff-4658-ab08-fe3cecd7c70f"
      :version  1
      :offering "medium"}
  20. A bit closer to reality

     message IPMeasure {
       /* Versioning */
       required uint32 header = 1;
       required uint32 saddr  = 2;
       required uint64 bytes  = 3;
       /* Validity */
       required uint64 start  = 4;
       required uint64 end    = 5;
     }
  21. @pyr Theory

  22. @pyr Quantities are simple

  23. None
  24. @pyr Resources are harder

  25. None
  26. @pyr This is per account

  27. None
  28. @pyr Solving for all events

  29. resources = {}
      metering = []

      def usage_metering():
          for event in fetch_all_events():
              uuid = event.uuid()
              time = event.time()
              if event.action() == 'start':
                  resources[uuid] = time
              else:
                  timespan = duration(resources[uuid], time)
                  usage = Usage(uuid, timespan)
                  metering.append(usage)
          return metering
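The slide's pseudocode can be made runnable. A minimal sketch, with hypothetical `Event` and `Usage` stand-ins replacing the talk's actual event feed by an in-memory list:

```python
from collections import namedtuple

# Hypothetical stand-ins for the talk's event feed and usage record.
Event = namedtuple("Event", ["uuid", "action", "time"])
Usage = namedtuple("Usage", ["uuid", "timespan"])

def usage_metering(events):
    """Pair start/stop events per resource and emit usage durations."""
    resources = {}  # uuid -> start time of the currently running resource
    metering = []
    for event in events:
        if event.action == "start":
            resources[event.uuid] = event.time
        else:
            timespan = event.time - resources[event.uuid]
            metering.append(Usage(event.uuid, timespan))
    return metering

events = [
    Event("vm-1", "start", 0),
    Event("vm-2", "start", 300),
    Event("vm-1", "stop", 900),
]
# vm-1 ran for 900 seconds; vm-2 is still running, so it emits no usage yet.
print(usage_metering(events))  # [Usage(uuid='vm-1', timespan=900)]
```

This is the "solve for all events" shape: a full scan over history, which is exactly what the following slides show breaking down in practice.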
  30. @pyr In Practice

  31. @pyr • This is a never-ending process • Minute-precision billing

    • Applied every hour
  32. @pyr • Avoid overbilling at all cost • Avoid underbilling

    (we need to eat!)
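One way to square "minute-precision billing" with "avoid overbilling at all cost" is to truncate partial minutes in the customer's favour. A sketch under that assumed rounding policy (the slides do not spell out the rule):

```python
def billable_minutes(start_ts: int, stop_ts: int) -> int:
    """Whole minutes between two Unix timestamps.

    Assumed policy: partial minutes are truncated, so rounding
    can only underbill, never overbill.
    """
    return max(0, (stop_ts - start_ts) // 60)
```

For example, `billable_minutes(0, 3599)` bills 59 minutes; the trailing 59 seconds are dropped rather than rounded up.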
  33. @pyr • Keep a small operational footprint

  34. @pyr A naive approach

  35. 30 * * * * usage-metering >/dev/null 2>&1

  36. None
  37. @pyr Advantages

  38. @pyr • Low operational overhead • Simple functional boundaries •

    Easy to test
  39. @pyr Drawbacks

  40. @pyr • High pressure on SQL server • Hard to

    avoid overlapping jobs • Overlaps result in longer metering intervals
  41. You are in a room full of overlapping cron jobs.

    You can hear the screams of a dying MySQL server. An Oracle vendor is here. To the West, a door is marked “Map/Reduce” To the East, a door is marked “Stream Processing”
  42. > Talk to Oracle

  43. You’ve been eaten by a grue.

  44. > Go West

  45. @pyr

  46. @pyr • Conceptually simple • Spreads easily • Data locality

    aware processing
  47. @pyr • ETL • High latency • High operational overhead

  48. > Go East

  49. @pyr

  50. @pyr • Continuous computation on an unbounded stream • Each

    record processed as it arrives • Very low latency
  51. @pyr • Conceptually harder • Where do we store intermediate

    results? • How does data flow between computation steps?
  52. @pyr Deciding factors

  53. @pyr Our shopping list • Operational simplicity • Integration through

    our whole stack • Room to grow
  54. @pyr Operational simplicity • Experience matters • Spark and Storm

    are intimidating • HBase & Hive discarded
  55. @pyr Integration • HDFS & Kafka require simple integration •

    Spark goes hand in hand with Cassandra
  56. @pyr Room to grow • A ton of logs •

    A ton of metrics
  57. @pyr Small confession • Previously knew Kafka

  58. @pyr

  59. None
  60. @pyr • Publish & Subscribe • Processing • Store

  61. @pyr Publish & Subscribe • Records are produced on topics

    • Topics have a predefined number of partitions • Records have a key which determines their partition
  62. @pyr • Consumers get assigned a set of partitions •

    Consumers store their last consumed offset • Brokers own partitions, handle replication
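The key-to-partition mapping above can be sketched as a stable hash modulo the topic's partition count. Kafka's default partitioner hashes the key bytes with murmur2; this stdlib-only sketch substitutes CRC-32, which keeps the property that matters here: equal keys always land on the same partition.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Stable hash of the record key, mod the predefined partition count.
    # (Kafka's default partitioner uses murmur2; CRC-32 stands in here.)
    return zlib.crc32(key) % num_partitions
```

Keying records by account id therefore sends all of one account's events to a single partition, so the consumer assigned to it sees them in order.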
  63. None
  64. @pyr • Stable consumer topology • Memory disaggregation • Can

    rely on in-memory storage • Age expiry and log compaction
  65. @pyr

  66. @pyr Billing at Exoscale

  67. None
  68. None
  69. None
  70. @pyr Problem solved?

  71. @pyr • Process crashes • Undelivered message? • Avoiding overbilling

  72. @pyr Reconciliation • Snapshot of full inventory • Converges stored

    resource state if necessary • Handles failed deliveries as well
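The converging step can be sketched as a diff of the full inventory snapshot against stored state, emitting the corrections to apply; lost deliveries fall out naturally (names and state values here are illustrative, not the production schema):

```python
def reconcile(stored: dict, snapshot: dict) -> dict:
    """Corrections needed to converge stored state onto the snapshot."""
    corrections = {uuid: state
                   for uuid, state in snapshot.items()
                   if stored.get(uuid) != state}
    # Anything stored but absent from the snapshot no longer exists.
    corrections.update({uuid: "absent"
                        for uuid in stored if uuid not in snapshot})
    return corrections
```

If a stop event for `vm-2` and a start event for `vm-3` were both lost, `reconcile({"vm-1": "running", "vm-2": "running"}, {"vm-1": "running", "vm-3": "running"})` repairs both at once.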
  73. @pyr Avoiding overbilling • Reconciler acts as logical clock •

    When supplying usage, attach a unique transaction ID • Reject multiple transaction attempts on a single ID
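The transaction-ID check can be sketched as a set of already-seen IDs (hypothetical names; in practice the seen set would live in durable storage so it survives process crashes):

```python
class UsageSink:
    """Reject replayed usage transactions by unique transaction ID."""

    def __init__(self):
        self.seen = set()      # transaction IDs already applied
        self.accepted = []     # usage records actually billed

    def supply(self, txn_id: str, usage) -> bool:
        if txn_id in self.seen:
            return False       # duplicate attempt: rejected, no overbilling
        self.seen.add(txn_id)
        self.accepted.append(usage)
        return True
```

A retry after an uncertain delivery is then safe: the second `supply` with the same ID is a no-op rather than a double charge.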
  75. @pyr Parting words

  76. @pyr Looking back • Things stay simple (roughly 600 LoC)

    • Room to grow • Stable and resilient • DNS, Logs, Metrics, Event Sourcing
  77. @pyr What about batch? • Streaming doesn’t work for everything

    • Sometimes throughput matters more than latency • Building models in batch, applying with stream processing
  78. @pyr Thanks! Questions?