
Billing the Cloud

Updated "Billing the Cloud" slides for We Are Developers 2017 in Vienna

Pierre-Yves Ritschard

May 12, 2017

Transcript

  1. @pyr
    Billing the cloud
    Real world stream processing

    View Slide

  2. @pyr
    Three-line bio
    ● CTO & co-founder at Exoscale
    ● Open Source Developer
    ● Monitoring & Distributed Systems Enthusiast

    View Slide

  3. @pyr
    Billing the cloud
    Real world stream processing

    View Slide

  4. @pyr
    ● Billing resources
    ● Scaling methodologies
    ● Our approach

    View Slide

  5. @pyr

    View Slide

  6. @pyr
    provider "exoscale" {
      api_key    = "${var.exoscale_api_key}"
      secret_key = "${var.exoscale_secret_key}"
    }
    resource "exoscale_instance" "web" {
      template  = "ubuntu 17.04"
      disk_size = "50g"
      profile   = "medium"
      ssh_key   = "production"
    }

    View Slide

  7. View Slide

  8. View Slide

  9. @pyr
    Infrastructure isn’t free!
    (sorry)

    View Slide

  10. @pyr
    Business Model
    ● Provide cloud infrastructure
    ● (???)
    ● Profit!

    View Slide

  11. View Slide

  12. View Slide

  13. @pyr
    10,000-mile-high view

    View Slide

  14. View Slide

  15. Quantities

    View Slide

  16. Quantities
    ● 10 megabytes have been sent from
    159.100.251.251 over the last minute

    View Slide

  17. Resources

    View Slide

  18. Resources
    ● Account WAD started instance foo with profile
    large today at 12:00
    ● Account WAD stopped instance foo today at
    12:15

    View Slide

  19. A bit closer to reality
    {:type     :usage
     :entity   :vm
     :action   :create
     :time     #inst "2016-12-12T15:48:32.000-00:00"
     :template "ubuntu-16.04"
     :source   :cloudstack
     :account  "geneva-jug"
     :uuid     "7a070a3d-66ff-4658-ab08-fe3cecd7c70f"
     :version  1
     :offering "medium"}

    View Slide

  20. A bit closer to reality
    message IPMeasure {
      /* Versioning */
      required uint32 header = 1;
      required uint32 saddr = 2;
      required uint64 bytes = 3;
      /* Validity */
      required uint64 start = 4;
      required uint64 end = 5;
    }

    View Slide
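
The `saddr` field stores an address in a `uint32`. A minimal sketch of that conversion, assuming `saddr` holds an IPv4 address as its usual big-endian integer (the slide does not say so); the helper names are illustrative:

```python
import ipaddress


def saddr_to_uint32(address: str) -> int:
    """Encode a dotted-quad IPv4 address as an unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(address))


def uint32_to_saddr(value: int) -> str:
    """Decode an unsigned 32-bit integer back to dotted-quad form."""
    return str(ipaddress.IPv4Address(value))


# Assumption: the address from the "Quantities" slide, packed for an IPMeasure.
packed = saddr_to_uint32("159.100.251.251")
unpacked = uint32_to_saddr(packed)
```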

  21. @pyr
    Theory

    View Slide

  22. @pyr
    Quantities are simple

    View Slide

  23. View Slide
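
Metering a quantity comes down to summing measurements per source over the billing window. A minimal sketch, assuming records shaped like the `IPMeasure` message; the `Measure` tuple, the `bytes_per_source` helper and the sample values are illustrative, not from the talk:

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Measure(NamedTuple):
    """Hypothetical in-memory stand-in for one IPMeasure record."""
    saddr: str    # source address, kept as text here for readability
    nbytes: int   # bytes observed during [start, end)
    start: int    # validity start, seconds
    end: int      # validity end, seconds


def bytes_per_source(measures: Iterable[Measure], start: int, end: int) -> dict:
    """Sum bytes per source for measurements lying entirely inside [start, end)."""
    totals = defaultdict(int)
    for m in measures:
        if m.start >= start and m.end <= end:
            totals[m.saddr] += m.nbytes
    return dict(totals)


measures = [
    Measure("159.100.251.251", 10_000_000, 0, 60),
    Measure("159.100.251.251", 5_000_000, 60, 120),
    Measure("159.100.251.252", 1_000_000, 0, 60),
    Measure("159.100.251.251", 7_000_000, 3600, 3660),  # falls in the next hour
]
hourly = bytes_per_source(measures, 0, 3600)
```

Because addition is order-independent, the measurements can arrive in any order within the window.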

  24. @pyr
    Resources are harder

    View Slide

  25. View Slide

  26. @pyr
    This is per account

    View Slide

  27. View Slide

  28. @pyr
    Solving for all events

    View Slide

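  29. resources = {}  # uuid -> time the resource was started
    metering = []   # completed Usage records
    # fetch_all_events, duration and Usage are assumed to be defined elsewhere.
    def usage_metering():
        for event in fetch_all_events():
            uuid = event.uuid()
            time = event.time()
            if event.action() == 'start':
                resources[uuid] = time
            elif uuid in resources:
                # Stop event: meter from the recorded start, then forget the resource.
                timespan = duration(resources.pop(uuid), time)
                usage = Usage(uuid, timespan)
                metering.append(usage)
        return metering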

    View Slide
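
The loop above can be exercised end to end with a few hand-written events, grouped per account as the earlier slides suggest. A sketch, assuming events carry an account, a resource uuid, an action and a timestamp; the `Event` tuple and the sample data are illustrative:

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    account: str
    uuid: str
    action: str      # "start" or "stop"
    time: datetime


def usage_per_account(events):
    """Pair start/stop events per resource and total the run time per account."""
    running = {}                      # uuid -> start time
    totals = defaultdict(timedelta)   # account -> accumulated run time
    for event in sorted(events, key=lambda e: e.time):
        if event.action == "start":
            running[event.uuid] = event.time
        elif event.uuid in running:   # ignore a stop with no matching start
            totals[event.account] += event.time - running.pop(event.uuid)
    return dict(totals)


events = [
    Event("WAD", "foo", "start", datetime(2017, 5, 12, 12, 0)),
    Event("WAD", "foo", "stop", datetime(2017, 5, 12, 12, 15)),
    Event("WAD", "bar", "start", datetime(2017, 5, 12, 13, 0)),
    Event("WAD", "bar", "stop", datetime(2017, 5, 12, 13, 30)),
]
totals = usage_per_account(events)
```

Resources that are still running at the end of the event list stay in `running` and contribute nothing yet, which is one reason the real process can never be a one-shot computation.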

  30. @pyr
    In Practice

    View Slide

  31. @pyr
    ● This is a never-ending process
    ● Minute-precision billing
    ● Applied every hour

    View Slide
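
Minute-precision billing applied every hour means clipping each resource's run time to the hour being billed. A minimal sketch; the `billable_minutes` helper is hypothetical, and rounding partial minutes down is an assumption rather than the talk's stated policy:

```python
from datetime import datetime, timedelta


def billable_minutes(started, stopped, window_start):
    """Whole minutes a resource ran inside the hour starting at window_start.

    `stopped` is None for a resource that is still running.
    """
    window_end = window_start + timedelta(hours=1)
    begin = max(started, window_start)
    end = min(stopped, window_end) if stopped is not None else window_end
    if end <= begin:
        return 0  # the resource did not run during this window
    # Assumption: partial minutes are rounded down.
    return int((end - begin).total_seconds() // 60)


noon = datetime(2017, 5, 12, 12, 0)
short_run = billable_minutes(noon, noon + timedelta(minutes=15), noon)
still_running = billable_minutes(noon - timedelta(minutes=10), None, noon)
```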

  32. @pyr
    ● Avoid overbilling at all costs
    ● Avoid underbilling (we need to eat!)

    View Slide

  33. @pyr
    ● Keep a small operational footprint

    View Slide

  34. @pyr
    A naive approach

    View Slide

  35. 30 * * * * usage-metering >/dev/null 2>&1

    View Slide

  36. View Slide

  37. @pyr
    Advantages

    View Slide

  38. @pyr
    ● Low operational overhead
    ● Simple functional boundaries
    ● Easy to test

    View Slide

  39. @pyr
    Drawbacks

    View Slide

  40. @pyr
    ● High pressure on SQL server
    ● Hard to avoid overlapping jobs
    ● Overlaps result in longer metering intervals

    View Slide
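
One common way to keep cron runs from overlapping is a non-blocking file lock taken at startup. This is a generic Unix technique, not something the talk describes; the lock path and the `try_lock` helper are illustrative:

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock on path, or None if it is taken."""
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle  # the lock is released when this file is closed


# Hypothetical lock file location, created in a fresh temporary directory.
lock_path = os.path.join(tempfile.mkdtemp(), "usage-metering.lock")
first = try_lock(lock_path)    # the hourly run takes the lock
second = try_lock(lock_path)   # an overlapping run finds it taken and should exit
```

A lock like this only prevents concurrent runs. A skipped run still lengthens the next metering interval, and the pressure on the SQL server is unchanged.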

  41. You are in a room full of overlapping cron jobs.
    You can hear the screams of a dying MySQL server.
    An Oracle vendor is here.
    To the West, a door is marked “Map/Reduce”
    To the East, a door is marked “Stream Processing”

    View Slide

  42. > Talk to Oracle

    View Slide

  43. You’ve been eaten by a grue.

    View Slide

  44. > Go West

    View Slide

  45. @pyr

    View Slide

  46. @pyr
    ● Conceptually simple
    ● Spreads easily
    ● Data locality aware processing

    View Slide

  47. @pyr
    ● ETL
    ● High latency
    ● High operational overhead

    View Slide

  48. > Go East

    View Slide

  49. @pyr

    View Slide

  50. @pyr
    ● Continuous computation on an unbounded stream
    ● Each record processed as it arrives
    ● Very low latency

    View Slide
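
The idea of handling each record as it arrives can be sketched with a generator that updates a running total per key and emits it immediately. The names and sample records are illustrative:

```python
def running_totals(stream):
    """Consume (key, amount) records one at a time, yielding the updated total."""
    totals = {}
    for key, amount in stream:
        totals[key] = totals.get(key, 0) + amount
        yield key, totals[key]


# A finite list stands in for an unbounded stream of (account, bytes) records.
records = [("geneva-jug", 10), ("WAD", 5), ("geneva-jug", 7)]
emitted = list(running_totals(iter(records)))
```

The `totals` dictionary is the intermediate state the next slide asks about: here it lives in process memory and is lost on a crash.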

  51. @pyr
    ● Conceptually harder
    ● Where do we store intermediate results?
    ● How does data flow between computation steps?

    View Slide

  52. @pyr
    Deciding factors

    View Slide

  53. @pyr
    Our shopping list
    ● Operational simplicity
    ● Integration through our whole stack
    ● Room to grow

    View Slide

  54. @pyr
    Operational simplicity
    ● Experience matters
    ● Spark and Storm are intimidating
    ● HBase & Hive discarded

    View Slide

  55. @pyr
    Integration
    ● HDFS & Kafka require simple integration
    ● Spark goes hand in hand with Cassandra

    View Slide

  56. @pyr
    Room to grow
    ● A ton of logs
    ● A ton of metrics

    View Slide

  57. @pyr
    Small confession
    ● Previously knew Kafka

    View Slide

  58. @pyr

    View Slide

  59. View Slide

  60. @pyr
    ● Publish & Subscribe
    ● Processing
    ● Store

    View Slide

  61. @pyr
    Publish & Subscribe
    ● Records are produced on topics
    ● Topics have a predefined number of partitions
    ● Records have a key which determines their
    partition

    View Slide
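
Key-based partitioning can be sketched as a stable hash of the key modulo the partition count. CRC32 is used here only as a stand-in; Kafka's own default partitioner uses a different hash (murmur2), and the key and partition count are illustrative:

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition using a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Assumption: records are keyed by account, on a topic with 8 partitions.
first = partition_for("geneva-jug", 8)
again = partition_for("geneva-jug", 8)
```

Since the same key always lands on the same partition, all events for one account are seen in order by a single consumer.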

  62. @pyr
    ● Consumers get assigned a set of partitions
    ● Consumers store their last consumed offset
    ● Brokers own partitions, handle replication

    View Slide
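
Stored offsets are what let a consumer resume where it left off. A toy in-memory sketch of that idea, with no relation to the real Kafka client API; the `Partition` and `Consumer` classes are illustrative:

```python
class Partition:
    """Toy append-only log: a record's offset is its index in the list."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the appended record


class Consumer:
    """Reads from a partition starting at its stored offset."""

    def __init__(self, partition, committed=0):
        self.partition = partition
        self.committed = committed  # next offset to read

    def poll(self):
        batch = self.partition.records[self.committed:]
        self.committed += len(batch)
        return batch


log = Partition()
for record in ("start foo", "stop foo", "start bar"):
    log.append(record)

consumer = Consumer(log)
first_batch = consumer.poll()

log.append("stop bar")
# A replacement consumer resumes from the stored offset instead of offset 0.
restarted = Consumer(log, committed=consumer.committed)
second_batch = restarted.poll()
```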

  63. View Slide

  64. @pyr
    ● Stable consumer topology
    ● Memory disaggregation
    ● Can rely on in-memory storage
    ● Age expiry and log compaction

    View Slide
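
Log compaction keeps only the most recent record for each key, and a record with a null value (a tombstone) removes the key. A simplified sketch of the outcome, not of how brokers implement it; the sample log is illustrative:

```python
def compact(log):
    """Keep the latest record per key, in offset order; None values drop the key."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)
    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    )
    return [(key, value) for _, key, value in survivors]


log = [
    ("vm-1", "running"),
    ("vm-2", "running"),
    ("vm-1", "stopped"),
    ("vm-2", None),       # tombstone: vm-2 is removed entirely
]
compacted = compact(log)
```

A consumer that replays the compacted topic rebuilds the same final state as one that replays the full history, which is what makes in-memory state practical.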

  65. @pyr

    View Slide

  66. @pyr
    Billing at Exoscale

    View Slide

  67. View Slide

  68. View Slide

  69. View Slide

  70. @pyr
    Problem solved?

    View Slide

  71. @pyr
    ● Process crashes
    ● Undelivered message?
    ● Avoiding overbilling

    View Slide

  72. @pyr
    Reconciliation
    ● Snapshot of full inventory
    ● Converges stored resource state if necessary
    ● Handles failed deliveries as well

    View Slide
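
Reconciliation can be pictured as a diff between the stored resource state and an inventory snapshot, emitting corrective events for anything that differs. A sketch under assumed data shapes; the `reconcile` helper and the event dictionaries are illustrative:

```python
def reconcile(stored, snapshot, now):
    """Return corrective events that converge stored state to the snapshot.

    stored:   dict of uuid -> start time for resources believed to be running
    snapshot: set of uuids the inventory reports as running
    """
    known = set(stored)
    events = []
    for uuid in sorted(snapshot - known):   # running, but the start was missed
        events.append({"uuid": uuid, "action": "start", "time": now})
    for uuid in sorted(known - snapshot):   # gone, but the stop was missed
        events.append({"uuid": uuid, "action": "stop", "time": now})
    return events


stored = {"a": 100, "b": 200}
snapshot = {"b", "c"}
corrections = reconcile(stored, snapshot, now=300)
```

Feeding the corrections back through the normal event path means a lost message delays a state change until the next snapshot instead of losing it.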

  73. @pyr
    Avoiding overbilling
    ● Reconciler acts as logical clock
    ● When supplying usage, attach a unique transaction ID
    ● Reject multiple transaction attempts on a single ID

    View Slide

  74. @pyr
    Avoiding overbilling
    ● Reconciler acts as logical clock
    ● When supplying usage, attach a unique transaction ID
    ● Reject multiple transaction attempts on a single ID

    View Slide
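
Rejecting repeated transaction IDs makes usage submission idempotent, so a retry after a crash cannot bill twice. A minimal in-memory sketch; the `UsageLedger` class is hypothetical, and deriving the ID from the resource uuid plus a reconciler tick is an assumption about how the logical clock might be used:

```python
class UsageLedger:
    """Accepts each transaction ID at most once."""

    def __init__(self):
        self.seen = set()
        self.entries = []

    def record(self, transaction_id, account, minutes):
        """Store the usage and return True, or return False for a repeated ID."""
        if transaction_id in self.seen:
            return False
        self.seen.add(transaction_id)
        self.entries.append((account, minutes))
        return True


def transaction_id(uuid, tick):
    """Assumed scheme: one ID per resource per reconciler tick."""
    return f"{uuid}:{tick}"


ledger = UsageLedger()
tx = transaction_id("7a070a3d-66ff-4658-ab08-fe3cecd7c70f", 42)
first = ledger.record(tx, "geneva-jug", 60)
retry = ledger.record(tx, "geneva-jug", 60)  # same tick replayed after a crash
```

A production ledger would need the uniqueness check to be durable, for example a unique constraint in the billing database.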

  75. @pyr
    Parting words

    View Slide

  76. @pyr
    Looking back
    ● Things stay simple (roughly 600 LoC)
    ● Room to grow
    ● Stable and resilient
    ● DNS, Logs, Metrics, Event Sourcing

    View Slide

  77. @pyr
    What about batch?
    ● Streaming doesn’t work for everything
    ● Sometimes throughput matters more than latency
    ● Building models in batch, applying with stream
    processing

    View Slide

  78. @pyr
    Thanks!
    Questions?

    View Slide