Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Fuel Prices Exposed

alcaeus
October 09, 2023

Fuel Prices Exposed

Held at Symfony Live Berlin 2023.

alcaeus

October 09, 2023
Tweet

More Decks by alcaeus

Other Decks in Programming

Transcript

  1. Station Data What are we working with? { "_id": "0e18d0d3-ed38-4e7f-a18e-507a78ad901d",

    "brand": "OIL!", "city": "München", "first_active": "1970-01-01 01:00:00+01", "house_number": "", "latitude": "48.1807", "longitude": "11.4609", "name": "OIL! Tankstelle München", "openingtimes_json": "...", "post_code": "80999", "street": "Eversbuschstraße 33" }
  2. Price Reports All prices { "_id": ..., "date": "2022-01-01 00:01:08+01",

    "station_uuid": "227a99f5-0be5-49f1-8f58-645ab5da32a8", "diesel": "1.609", "e5": "1.769", "e10": "0.000", "dieselchange": "1", "e5change": "1", "e10change": "0" }
  3. SQL Queries It can be so easy SELECT <columns> FROM

    <table> JOIN <table> WHERE <predicate on rows> GROUP BY <columns> HAVING <predicate on groups> ORDER BY <columns> OFFSET FETCH FIRST
  4. SQL Queries Do we really SELECT FROM? JOIN <table> WHERE

    <predicate on rows> GROUP BY <columns> HAVING <predicate on groups> ORDER BY <columns> OFFSET FETCH FIRST SELECT <columns> FROM <table>
  5. SQL Queries Who cares about order? JOIN <table> WHERE <predicate

    on rows> GROUP BY <columns> HAVING <predicate on groups> ORDER BY <columns> OFFSET FETCH FIRST FROM <table> SELECT <columns>
  6. SQL Queries Tomayto, Tomahto FROM <table> JOIN <table> GROUP BY

    <columns> SELECT <columns> ORDER BY <columns> OFFSET FETCH FIRST WHERE <predicate on rows> HAVING <predicate on groups>
  7. Aggregation Pipeline Write it like this db.collection.aggregate([ { $match: {

    ... }}, { $lookup: { ... }}, { $match: { ... }}, { $group: { ... }}, { $match: { ... }}, { $project: { ... }}, { $sort: { ... }}, { $skip: { ... }}, { $limit: { ... }}, ]);
  8. db.collection.aggregate([ { $lookup: { ... }}, { $group: { ...

    }}, { $project: { ... }}, { $sort: { ... }}, { $skip: { ... }}, { $limit: { ... }}, ]); Aggregation Pipeline Duplicate stages { $match: { ... }}, { $match: { ... }}, { $match: { ... }},
  9. Aggregation Pipeline Debug: run part of a pipeline db.collection.aggregate([ {

    $match: { ... }}, { $lookup: { ... }}, { $match: { ... }}, { $group: { ... }}, { $match: { ... }}, { $project: { ... }}, { $sort: { ... }}, { $skip: { ... }}, { $limit: { ... }}, ]);
  10. Aggregation Pipeline Create views db.createView('viewName', 'collection', [ { $match: {

    ... }}, { $lookup: { ... }}, { $match: { ... }}, { $group: { ... }}, { $match: { ... }}, { $project: { ... }}, { $sort: { ... }}, { $skip: { ... }}, { $limit: { ... }}, ]);
  11. Aggregation Pipeline On-demand materialised views { $out: { ... }},

    db.collection.aggregate([ { $match: { ... }}, { $lookup: { ... }}, { $match: { ... }}, { $group: { ... }}, { $match: { ... }}, { $project: { ... }}, { $sort: { ... }}, { $skip: { ... }}, { $limit: { ... }}, ]);
  12. Aggregation Pipeline Station Data { $replaceWith: { _id: "$_id", name:

    "$name", brand: "$brand", location: { type: "Point", coordinates: [ { $toDouble: "$longitude" }, { $toDouble: "$latitude" }, ], }, address: { street: "$street", number: "$house_number", postCode: "$post_code", city: "$city", }, }}
  13. What’s A Station? Cleaned up and organised { "_id": "0e18d0d3-ed38-4e7f-a18e-507a78ad901d",

    "name": "OIL! Tankstelle München", "brand": "OIL!", "location": { "type": "Point", "coordinates": [11.4609, 48.1807] }, "address": { "street": "Eversbuschstraße 33", "number": "", "postCode": "80999", "city": "München" } }
  14. What’s A Station? Cleaned up and organised { "_id": "0e18d0d3-ed38-4e7f-a18e-507a78ad901d",

    "name": "OIL! Tankstelle München", "brand": "OIL!", "location": { "type": "Point", "coordinates": [11.4609, 48.1807] }, } "address": { "street": "Eversbuschstraße 33", "number": "", "postCode": "80999", "city": "München" }
  15. { "_id": "0e18d0d3-ed38-4e7f-a18e-507a78ad901d", "name": "OIL! Tankstelle München", "brand": "OIL!", "address":

    { "street": "Eversbuschstraße 33", "number": "", "postCode": "80999", "city": "München" } } What’s A Station? Cleaned up and organised "location": { "type": "Point", "coordinates": [11.4609, 48.1807] }, }
  16. Aggregation Pipeline Price reports [ { $set: { date: {

    $dateFromString: { dateString: "$date" } }, station: { _id: "$station_uuid" }, changes: [ { fuel: "diesel", changed: { $toBool: { $toInt: "$dieselchange" } }, price: { $toDouble: "$diesel" }, }, { fuel: "e10", changed: { $toBool: { $toInt: "$e10change" } }, price: { $toDouble: "$e10" }, }, { fuel: "e5", changed: { $toBool: { $toInt: "$e5change" } }, price: { $toDouble: "$e5" }, }, ], }}, { $set: { day: { $dateTrunc: { date: "$date", unit: "day", timezone: "+01", }}, changes: { $filter: { input: "$changes", cond: { $eq: ["$$this.changed", true] }, }}, }}, { $unwind: "$changes" }, { $replaceWith: { _id: { $concat: ["$_id", "-", "$changes.fuel"] }, date: "$date", day: "$day", station: "$station", fuel: "$changes.fuel", price: "$changes.price", }}, ]
  17. Price Change Make it easier to handle { "_id": "2022-01-01-prices-5-diesel",

    "date": { "$date": "2021-12-31T23:01:08.000Z" }, "day": { "$date": "2021-12-31T23:00:00.000Z" }, "station": { "_id": "227a99f5-0be5-49f1-8f58-645ab5da32a8" }, "fuel": "diesel", "price": 1.609 } { "_id": "2022-01-01-prices-5-e5", "date": { "$date": "2021-12-31T23:01:08.000Z" }, "day": { "$date": "2021-12-31T23:00:00.000Z" }, "station": { "_id": "227a99f5-0be5-49f1-8f58-645ab5da32a8" }, "fuel": "e5", "price": 1.769 }
  18. Import All Data Clone the repository git clone https://[email protected]/... Cloning

    into 'tankerkoenig-data'... remote: Azure Repos remote: Found 19574 objects to send. (376 ms) Receiving objects: 0% (64/19574), 83.04 MiB | 20.74 MiB/s
  19. Import All Data What have we got? $ du -hcs

    * 8.0K README.md 57G prices 7.1G stations 64G total $ wc -l prices/*/*/*.csv ... 731239757 total
  20. Import All Data Reduce the dataset $ wc -l prices/2022/*/*.csv

    ... 123735340 total $ wc -l stations/2022/01/2022-01-01-stations.csv 16480 stations/2022/01/2022-01-01-stations.csv
  21. Import All Data 2022 in numbers • ~124 million price

    records • 339000 per day • 16480 stations
  22. Optimise Storage Bucket Pattern { "day": { "$date": "2021-12-31T23:00:00.000Z" },

    "station": { "_id": "227a99f5-0be5-49f1-8f58-645ab5da32a8" }, "fuel": "diesel", "prices": [ { "_id": "2022-01-01-prices-5-diesel", "date": { "$date": "2021-12-31T23:01:08.000Z" }, "price": 1.609 }, { "_id": "2022-01-01-prices-224783-diesel", "date": { "$date": "2022-01-01T16:59:10.000Z" }, "price": 1.579 } ] }
  23. weightedAverage: { $round: [ { $divide: [ { $reduce: {

    input: "$prices", initialValue: 0, in: { $add: [ "$$value", { $multiply: [ "$$this.seconds", "$$this.price" ] }, ]}, }}, { $dateDiff: { startDate: "$day", endDate: { $dateAdd: { startDate: "$day", unit: "day", amount: 1, }}, unit: "second", }}, ]}, 3, ]}, }}, ]
  24. Daily Bucket Metadata { "_id": { "$oid": "650063b7630a232653ab7817" }, "day":

    { "$date": "2022-01-01T23:00:00.000Z" }, "fuel": "diesel", "station": { "_id": "278130b1-e062-4a0f-80cc-19e486b4c024", "name": "Aral Tankstelle", "brand": "ARAL", "location": { "type": "Point", "coordinates": [13.4214869, 52.5141525] }, "address": { "postCode": "10179" } }, ... }
  25. Daily Bucket Full price list { ... "prices": [ {

    "_id": "2022-01-02-prices-16852-diesel", "date": { "$date": "2022-01-02T06:02:11.000Z" }, "price": 1.639, "previousPrice": 1.569 }, { "_id": "2022-01-02-prices-49287-diesel", "date": { "$date": "2022-01-02T08:01:08.000Z" }, "price": 1.689, "previousPrice": 1.639 } ], ... }
  26. Daily Bucket Daily price data { ... "openingPrice": 1.569, "closingPrice":

    1.629, "weightedAverage": 1.591, "lowestPrice": { "_id": "2022-01-02-prices-127016-diesel", "date": { "$date": "2022-01-02T12:05:08.000Z" }, "price": 1.549 }, "highestPrice": { "_id": "2022-01-02-prices-76907-diesel", "date": { "$date": "2022-01-02T09:31:09.000Z" }, "price": 1.709 }, ... }
  27. Daily Bucket Daily averages { ... "dailyAverage": { "lowestPrice": 1.399,

    "highestPrice": 1.939, "averagePrice": 1.575, "percentiles": { "p50": 1.572, "p90": 1.62, "p95": 1.636, "p99": 1.725 } } }
  28. Numbers, At Last Di ff erent Format? Rank Brand Changes

    Diesel e5 e10 1 LD Tankestelle Westheim 101.91 34.04 33.94 33.94 2 Rapp - Freie Tankstelle 85.71 27.87 29.30 29.29 3 Freie Tankstelle Endrass 85.71 26.35 29.68 29.68 4 Sb 79.48 25.89 26.75 26.84 5 DBV 78.93 22.50 22.86 33.56