at least once per 48 hours (some of them more often)
Our database holds 1.2 billion flights (we perform around 3.5 billion updates each day) - it only covers 6 months into the future
This is around 2 TB of stored data
contains things like:
one way, return, direct and multi-stop flights (we call the legs segments)
weird pricing policies, fare classes, special offers, child and infant prices, points of sale, different providers, …
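One possible way to picture such a record - field names below are illustrative, taken from the list above, not the real schema:

```python
# Hypothetical sketch of a stored flight; not the actual data model.
from dataclasses import dataclass, field

@dataclass
class Flight:
    source: str
    destination: str
    date_of_departure: str
    segments: list = field(default_factory=list)  # a direct flight has one segment
    fare_class: str = "economy"
    adult_price: float = 0.0
    child_price: float = 0.0
    infant_price: float = 0.0
    point_of_sale: str = ""
    provider: str = ""

f = Flight("PRG", "LHR", "2017-06-01", segments=["PRG-LHR"])
assert len(f.segments) == 1  # direct flight: a single segment
```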
aware of where flights should be
She used Redis to store her data there
Routing is based on a combination of data - (source, destination, date_of_departure)
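The routing idea can be sketched like this - a hypothetical reconstruction, not the original code: hash the (source, destination, date_of_departure) triple to deterministically pick one of N Redis shards.

```python
# Minimal sketch of routing by a composite key; shard count and key format
# are assumptions for illustration.
import hashlib

def shard_for(source: str, destination: str, date_of_departure: str,
              num_shards: int) -> int:
    """Map a routing key to a shard index deterministically."""
    key = f"{source}:{destination}:{date_of_departure}".encode()
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# The same key always lands on the same shard:
assert shard_for("PRG", "LHR", "2017-06-01", 8) == shard_for("PRG", "LHR", "2017-06-01", 8)
```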
with PostgreSQL write speed
So the engineer wrote a Redis layer to cache the PostgreSQL data
And wrote an insanely complex update/invalidate mechanism to keep everything in PostgreSQL consistent with as few writes as possible
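A hedged sketch of the idea behind such a layer (names and structure are illustrative, not the original implementation): coalesce repeated updates to the same flight in memory, and flush only the latest value per key to the database in one batch, minimizing writes.

```python
# Illustrative coalescing write buffer; not the original mechanism.
class CoalescingWriteBuffer:
    def __init__(self, flush_callback):
        self._pending = {}           # key -> latest value
        self._flush = flush_callback

    def update(self, key, value):
        # Overwriting the same key coalesces N updates into one write.
        self._pending[key] = value

    def flush(self):
        batch = self._pending
        self._pending = {}
        self._flush(batch)           # one bulk write instead of many

writes = []
buf = CoalescingWriteBuffer(writes.append)
for price in (100, 95, 90):          # three updates to the same flight...
    buf.update(("PRG", "LHR", "2017-06-01"), price)
buf.flush()
# ...become a single database write holding only the latest price:
assert writes == [{("PRG", "LHR", "2017-06-01"): 90}]
```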
composite ( (partition1, partition2), cluster1, cluster2, …)
You need to specify the whole partition key in the WHERE clause; then you can add clustering keys one by one (in order), and you can use a range only on the last clustering key used
There are secondary indexes, but you should not use them because of performance
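The three rules above can be expressed as a small checker - a sketch using the placeholder column names from the composite key, not a real schema:

```python
# Sketch of Cassandra's restriction rules for a table with primary key
# ((partition1, partition2), cluster1, cluster2). Column names are the
# placeholders used above.
PARTITION_KEYS = ("partition1", "partition2")
CLUSTERING_KEYS = ("cluster1", "cluster2")

def query_is_valid(eq_columns, range_column=None):
    """eq_columns: columns restricted with '='; range_column: column with <, >, etc."""
    eq = set(eq_columns)
    # 1. The whole partition key must be specified.
    if not set(PARTITION_KEYS) <= eq:
        return False
    # 2. Clustering keys restricted with '=' must form a prefix, in order.
    used = [c for c in CLUSTERING_KEYS if c in eq]
    if used != list(CLUSTERING_KEYS[:len(used)]):
        return False
    # 3. A range is only allowed on the clustering key right after that prefix.
    if range_column is not None:
        if len(used) >= len(CLUSTERING_KEYS):
            return False
        return range_column == CLUSTERING_KEYS[len(used)]
    return True

assert query_is_valid(["partition1", "partition2", "cluster1"])               # ok
assert query_is_valid(["partition1", "partition2"], range_column="cluster1")  # ok
assert not query_is_valid(["partition1"])                                     # partial partition key
assert not query_is_valid(["partition1", "partition2", "cluster2"])           # skips cluster1
```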
don’t need the Redis cache layer
It is a cluster - we don’t need any “router” to manually shard data in the application
It has redundancy and all the other features
loaded from the older solution and from the new one
We’ll have a Python module in place for reading those data in all other parts of our system (right now under development)
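What such a reading module's core might look like - table and column names here are assumptions, not the real schema; executing the query would go through a driver session (e.g. cassandra-driver's `session.execute`), while the query-building part stays pure:

```python
# Hypothetical query builder for the reading module; restricts by the full
# partition key, as Cassandra requires.
def build_flights_query(source, destination, date_of_departure):
    """Build a parameterized CQL SELECT for one route and departure date."""
    cql = ("SELECT * FROM flights "
           "WHERE source = %s AND destination = %s AND date_of_departure = %s")
    return cql, (source, destination, date_of_departure)

cql, params = build_flights_query("PRG", "LHR", "2017-06-01")
assert params == ("PRG", "LHR", "2017-06-01")
```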