Epistemology (and disks)

Alonisser
September 14, 2021

Transcript

  1. Castles of the mind

    “The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination.” Frederick P. Brooks Jr., The Mythical Man-Month: Essays on Software Engineering
  2. Hi, I’m Alon Nisser and I work at zencity.io

    I like open source and exploring new tech. Distrust software. @alonisser on Twitter, Medium, GitHub, Gmail. We help local governments of all sizes and shapes make data-driven decisions. Powered by AI, Zencity transforms data from all of the touchpoints residents have with their city into actionable insights. Passionate about local government? Want to build new systems in a growing startup? Come work with us. We’re hiring across the board.
  3. Epistemology

    From Greek: epistēmē, 'knowledge'. The branch of philosophy concerned with knowledge. Epistemologists study the nature, origin, and scope of knowledge, epistemic justification, the rationality of belief, and various related issues.
  4. Epistemology

    From Greek: epistēmē, 'knowledge'. Asks questions like: What is knowledge? How can a belief be justified? How do we know that something is true?
  5. Now that’s real big data

    150 TiB!! $6,000 a month just for storage. But that doesn’t seem to be our data size.
  6. Should I believe the “table” UI showing my mere 100 GB of data? Or the Azure portal UI showing a storage account with 150 TiB?

    Epistemology: the branch of philosophy concerned with knowledge. Epistemologists study the nature, origin, and scope of knowledge, epistemic justification, the rationality of belief, and various related issues.
  7. Tragedy was unfolding in the Slack channel

    Obviously we need to remove data. We try to remove quite a lot of data (unused fields and inactive clients), but storage usage is not going down. We reach out to Azure for help; they are as clueless as we are. Resizing the cluster has some effect, but it is far from what we need. What is going on?
  8. Epistemology

    There is a mathematical “solution” to this problem (Zeno’s paradox of Achilles and the tortoise), at least for a converging series, i.e. as long as Achilles and the tortoise aren’t moving at speeds that form a non-converging series. But the main point here is the duality of time/space (or of numbers), which can be seen as an infinite series divisible into measurable, ever smaller units, but also as a continuum. So reality is not only different from what we perceive; different representations of “things” can also be “true” at the same time. (This doesn’t mean there is no truth, or that all representations are truthful.)
  9. What is a Delta table?

    • A Delta table is an abstraction that lets us use Spark, with SQL-like syntax, to query a bunch of Parquet files while maintaining “ACID”-like guarantees.
    • But the _delta_log files track changes and keep versions for possible rollback, so they reference lots of files that aren’t in use anymore: the leftovers of OPTIMIZE commands, awaiting a VACUUM command to remove them.
    • Alas, no VACUUM command ever came.
  10. What is a Delta table?

    • The table UI measured the bytes of data in the “table” abstraction, i.e. what Spark SQL can query.
    • The storage account measured the actual bytes on disk.
    Both are bytes. But different ones.
  11. Running Hosted Mongo in the clouds

    • It’s managed by someone else!
    • It’s running in our datacenter, so we get low-millisecond latency.
    • It’s fast and scalable…
    • … until the number of reads/writes jumps over a certain threshold and suddenly it’s SLOWWWWW.
  12. Running Hosted Mongo in the clouds

    • Underneath this great abstraction, data is still being read from disks. And while SSDs are fast, our cloud provider sets a limit on provisioned IOPS (I/O operations per second; basically, disk activity). As our usage increased, we hit the limit, and from then on read/write I/O was throttled by the cloud provider.
  13. Running Hosted Mongo in the clouds

    • And just to make things a bit more confusing, the cloud provider decides the “allowed” IOPS level on premium SSD disks according to the disk size...
  14. Running Hosted Mongo in the clouds

    Both of these represent the same thing, but in different layers.
  15. Castles of the mind and castles of dirt and flesh

    While our technological achievements often obscure the dirt underneath, the laws of physics still apply to our great castles of the mind. Underneath lies the messy reality of silicon: disks spinning and seeking the correct sector to read from, and messy data transmitted over noisy networks. Both representations are true.
  16. Epistemology and Programming: Wrapping up

    • Different realities might exist at different levels of abstraction. Accept that.
    • Try to build a mental model of the technology you use. (Mental model != knowing every knob and detail.)
    • Don’t be afraid to peek underneath.
    • The truth is out there (at least multiple, eventually consistent versions of it ¯\_(ツ)_/¯).
    • Uncovering it might require us to venture out of our comfort zone.
  17. Read/watch

    • https://www.slideshare.net/holograph/how-shit-works-storage (and the rest of the “how things work” YouTube videos by Tomer Gabel)
    • Explaining the Zeno paradox to a child (in Slate)
    • Numbers every programmer should know
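
Slides 5 and 6 put 100 GB next to 150 TiB. A quick back-of-the-envelope check shows how far apart the two views are and what the quoted $6,000 a month implies per GiB. This is a sketch that assumes the table UI means decimal gigabytes (10^9 bytes) and the portal means binary tebibytes (2^40 bytes); either UI may define its units differently.

```python
# Compare the two sizes quoted on slides 5-6 and the storage bill they imply.
# Assumption: "100GB" means decimal gigabytes and "150TiB" binary tebibytes.
GB = 10**9    # decimal gigabyte
GiB = 2**30   # binary gibibyte
TiB = 2**40   # binary tebibyte

table_ui_bytes = 100 * GB       # size shown by the "table" UI
storage_bytes = 150 * TiB       # size shown by the storage account
monthly_cost_usd = 6000         # storage bill quoted on slide 5

ratio = storage_bytes / table_ui_bytes
cost_per_gib_month = monthly_cost_usd / (storage_bytes / GiB)

print(f"storage holds ~{ratio:,.0f}x what the table reports")
print(f"implied price: ${cost_per_gib_month:.4f} per GiB-month")
```

Roughly 1,649 times more bytes on disk than in the table: not a rounding error or a GB/GiB mix-up, but a sign that the two UIs are counting different things.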
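
Slide 8 mentions the mathematical “solution” to Achilles and the tortoise. A minimal sketch, with a hypothetical head start and speeds: each leg of the chase takes the previous leg's time multiplied by r = v_tortoise / v_achilles, so the legs form a geometric series that converges to d / (v_achilles - v_tortoise) whenever r < 1.

```python
def zeno_partial_time(head_start, v_achilles, v_tortoise, stages):
    """Total time for the first `stages` legs of Zeno's chase.

    Each leg, Achilles runs to where the tortoise *was*; meanwhile the
    tortoise moves on, so the remaining gap shrinks by v_tortoise / v_achilles.
    """
    gap = head_start
    total = 0.0
    for _ in range(stages):
        leg_time = gap / v_achilles
        total += leg_time
        gap = v_tortoise * leg_time   # distance the tortoise covered meanwhile
    return total


def catch_up_time(head_start, v_achilles, v_tortoise):
    """Closed form of the infinite series (valid when v_tortoise < v_achilles)."""
    return head_start / (v_achilles - v_tortoise)


# Hypothetical numbers: 100 m head start, Achilles 10 m/s, tortoise 1 m/s.
for n in (1, 2, 5, 20):
    print(n, zeno_partial_time(100, 10, 1, n))
print("limit", catch_up_time(100, 10, 1))
```

Infinitely many legs, finite total time: the “series” view and the “continuum” view describe the same race.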
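
Slide 9's point, that the log keeps referencing files which only VACUUM physically deletes, can be illustrated with a toy class. This is a hypothetical, heavily simplified accounting model, not Delta Lake's implementation: OPTIMIZE and DELETE-style rewrites mark old files as removed (tombstones) instead of erasing them, so the queryable table shrinks while billed storage grows until a vacuum runs past the retention window. In real Delta Lake the corresponding SQL is `VACUUM table_name [RETAIN num HOURS]`, with a default retention of 7 days.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    """Hypothetical model of how a Delta-style log accounts for files.

    live:       path -> size of files in the current snapshot (queryable)
    tombstones: path -> (size, removed_at) for files the log marked as
                removed but that still sit in storage until vacuumed
    """
    live: dict = field(default_factory=dict)
    tombstones: dict = field(default_factory=dict)

    def write(self, path, size):
        self.live[path] = size

    def optimize(self, new_path, now):
        # Compact all live files into one; the old files become tombstones.
        total = sum(self.live.values())
        for path, size in self.live.items():
            self.tombstones[path] = (size, now)
        self.live = {new_path: total}

    def rewrite(self, old_path, new_path, new_size, now):
        # DELETE/UPDATE-style rewrite: the old file is tombstoned, not erased,
        # and a new file holding the surviving rows is added.
        self.tombstones[old_path] = (self.live.pop(old_path), now)
        self.live[new_path] = new_size

    def vacuum(self, now, retention):
        # Physically delete tombstoned files removed at least `retention` ago.
        expired = sorted(
            path for path, (_, removed_at) in self.tombstones.items()
            if now - removed_at >= retention
        )
        for path in expired:
            del self.tombstones[path]
        return expired

    def table_bytes(self):
        # What a "table" view reports: only the current snapshot.
        return sum(self.live.values())

    def storage_bytes(self):
        # What the storage account bills: snapshot plus unvacuumed tombstones.
        return self.table_bytes() + sum(size for size, _ in self.tombstones.values())


# Sizes in GB, times in days; all numbers are made up for illustration.
table = ToyDeltaTable()
table.write("a.parquet", 60)
table.write("b.parquet", 40)
table.optimize("c.parquet", now=0)                  # a + b -> c
table.rewrite("c.parquet", "d.parquet", 30, now=1)  # "remove data": keep 30 of 100
before = (table.table_bytes(), table.storage_bytes())
early = table.vacuum(now=3, retention=7)            # nothing old enough yet
expired = table.vacuum(now=8, retention=7)
after = (table.table_bytes(), table.storage_bytes())
print(before, early, expired, after)
```

In this toy run, removing 70 GB of rows raises billed storage from 200 to 230 GB; only the vacuum brings it down to the 30 GB the table actually holds, which mirrors the “we remove data but usage doesn’t go down” confusion from slide 7.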
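
Slide 10's two measurements can be reproduced for a table on a local filesystem. This sketch assumes the Delta transaction-log layout: commits are newline-delimited JSON files in `_delta_log/`, named by a 20-digit zero-padded version, and each `add` action carries the file's `size`. It replays only those JSON commits (ignoring Parquet checkpoints, deletion vectors, and anything else a real reader handles). `make_demo_table` is a hypothetical helper whose actions contain only the fields the sketch reads, not everything a real writer emits.

```python
import json
import os
import tempfile
from pathlib import Path


def logical_bytes(table_dir):
    """Replay JSON commits in _delta_log/ and sum the sizes of live files.

    Assumption: every commit is still present as a .json file; real readers
    start from the latest Parquet checkpoint, which this sketch ignores.
    """
    live = {}
    for commit in sorted((Path(table_dir) / "_delta_log").glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live[action["add"]["path"]] = action["add"]["size"]
            elif "remove" in action:
                live.pop(action["remove"]["path"], None)
    return sum(live.values())


def physical_bytes(table_dir):
    """Sum every file under the table directory, log included (what storage bills)."""
    return sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, names in os.walk(table_dir)
        for name in names
    )


def make_demo_table(root):
    """Hypothetical two-commit table: write a + b, then compact them into c."""
    log = Path(root) / "_delta_log"
    log.mkdir(parents=True)
    for name, size in (("a.parquet", 60), ("b.parquet", 40), ("c.parquet", 100)):
        (Path(root) / name).write_bytes(b"\0" * size)
    commits = [
        [{"add": {"path": "a.parquet", "size": 60}},
         {"add": {"path": "b.parquet", "size": 40}}],
        [{"remove": {"path": "a.parquet"}},
         {"remove": {"path": "b.parquet"}},
         {"add": {"path": "c.parquet", "size": 100}}],
    ]
    for version, actions in enumerate(commits):
        text = "\n".join(json.dumps(a) for a in actions)
        (log / f"{version:020d}.json").write_text(text)


with tempfile.TemporaryDirectory() as tmp:
    make_demo_table(tmp)
    table_size, disk_size = logical_bytes(tmp), physical_bytes(tmp)

print(table_size, disk_size)
```

Both numbers are bytes, but the first counts what the log says is live, while the second counts everything still sitting in the directory.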
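
Slide 12's “suddenly it’s SLOWWWWW” is what a hard cap looks like from the client side. A hypothetical fluid-queue model, which ignores burst credits and anything else a real provider may do: while demand stays under the cap the queue stays empty, but any sustained excess piles up, so the wait keeps growing every second instead of staying a few percent higher.

```python
def simulate_backlog(demand_iops, cap_iops, seconds):
    """Hypothetical fluid model of a provider-side IOPS cap.

    Each second `demand_iops` operations arrive and at most `cap_iops` are
    served; the rest wait in a queue. Returns (backlog, wait) per second,
    where wait = backlog / cap is roughly the extra delay, in seconds,
    that a newly arriving operation would see.
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog = max(0, backlog + demand_iops - cap_iops)
        history.append((backlog, backlog / cap_iops))
    return history


cap = 5000  # hypothetical provisioned IOPS limit
for demand in (4000, 4900, 5100, 6000):
    backlog, wait = simulate_backlog(demand, cap, seconds=60)[-1]
    print(f"demand {demand}: backlog {backlog} ops, ~{wait:.2f}s extra wait after 60s")
```

Going from 98% to 102% of the cap is the difference between no queue at all and a delay that keeps growing for as long as the load lasts, which is why the slowdown feels like a cliff rather than a slope.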
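
Slide 13's coupling of IOPS to disk size means the cheapest way to get more IOPS can be to buy capacity you don't need. A sketch with a hypothetical tier table: the numbers are placeholders for illustration, not any provider's real price list.

```python
# Hypothetical tier table: (disk size in GiB, provisioned IOPS).
# Placeholder numbers only; check your provider's documentation for real tiers.
TIERS = [(128, 500), (256, 1_000), (512, 2_000), (1024, 5_000), (2048, 7_500)]


def smallest_disk_for(needed_gib, needed_iops, tiers=TIERS):
    """Return the smallest tier that satisfies both capacity and IOPS, or None."""
    for size_gib, iops in sorted(tiers):
        if size_gib >= needed_gib and iops >= needed_iops:
            return size_gib, iops
    return None


# 100 GiB of data, but the workload needs ~4,000 IOPS.
print(smallest_disk_for(100, 4_000))
```

Under these made-up tiers, 100 GiB of data that needs 4,000 IOPS ends up on a 1 TiB disk: the capacity number on the bill is really paying for I/O.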