only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination.” Frederick P. Brooks Jr., The Mythical Man-Month: Essays on Software Engineering
like open source and exploring new tech. Distrust software. @alonisser on Twitter, Medium, GitHub, Gmail. We’re helping Data-driven Decision Making for Local Governments of all sizes and shapes. Powered by AI, Zencity transforms data from all of the touchpoints residents have with their city into actionable insights. Passionate about local government? Want to build new systems in a growing startup? Come work with us. We’re hiring across the board.
of data? Or the Azure portal UI showing a storage account with 150TiB? Epistemology: the branch of philosophy concerned with knowledge. Epistemologists study the nature, origin, and scope of knowledge, epistemic justification, the rationality of belief, and various related issues.
to remove data. We try to remove quite a lot of data (in unused fields and inactive clients), but data usage is not going down. We’ve reached out to Azure for help. They are as clueless as we are. Resizing the cluster has some effect, but far from what we need. What is going on?
(at least for converging series, given that Achilles and the tortoise aren’t moving at speeds that form a non-converging series) The main point here is the duality of time/space (or of numbers), which can be seen as an infinite series divisible into ever smaller yet measurable units, but is also a continuum. So reality is not only different from what we perceive; different representations of “things” can also be “true” at the same time. (This doesn’t mean there is no truth, or that all representations are truthful.)
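To make the "converging series" remark concrete, here is a minimal sketch of the two representations side by side: the infinite series of ever smaller stages, and the single closed-form answer. The speeds and head start are made-up numbers for illustration, and the function names are hypothetical.

```python
def catch_up_partial_sum(head_start_m, achilles_mps, tortoise_mps, stages):
    """Add up the duration of the first `stages` Zeno stages (in seconds)."""
    total = 0.0
    gap = head_start_m
    for _ in range(stages):
        stage_time = gap / achilles_mps  # time for Achilles to cover the current gap
        total += stage_time
        gap = tortoise_mps * stage_time  # the tortoise crawled ahead in the meantime
    return total


def catch_up_closed_form(head_start_m, achilles_mps, tortoise_mps):
    """The same moment, seen as a point on a continuum."""
    if achilles_mps <= tortoise_mps:
        raise ValueError("the series does not converge: Achilles never catches up")
    return head_start_m / (achilles_mps - tortoise_mps)


if __name__ == "__main__":
    # Assumed numbers: 100 m head start, Achilles 10 m/s, tortoise 1 m/s.
    for n in (1, 2, 5, 20):
        print(n, "stages:", catch_up_partial_sum(100, 10, 1, n))
    print("closed form:", catch_up_closed_form(100, 10, 1))
```

The partial sums never stop growing, yet they stay below the closed-form value and approach it as closely as we like. Both descriptions are of the same race.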
an abstraction allowing us to query a bunch of parquet files with Spark, using SQL-like syntax, while maintaining “ACID”-like guarantees • But the delta_log files track changes, keep versions for possible rollback, and reference lots of files that aren’t in use anymore: results of OPTIMIZE commands, awaiting a VACUUM command to remove them. • Alas, no VACUUM command ever came
the bytes of data in the abstraction of a “table” that can be queried by Spark SQL • While the storage account measured actual bytes on disk. Both are bytes, but different.
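The two measurements can be reproduced with a rough sketch like the one below. It assumes a Delta table on a local filesystem whose `_delta_log` holds only JSON commit files (no checkpoints) with relative file paths, and it relies on the `add` and `remove` actions of the Delta transaction log. The function names are hypothetical, and this is an illustration, not a replacement for Delta's own tooling.

```python
import json
from pathlib import Path


def live_files(table_dir):
    """Replay the JSON commits and return {path: size} for the latest table version."""
    live = {}
    log_dir = Path(table_dir) / "_delta_log"
    # Commit file names are zero-padded, so sorting by name replays them in order.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live[action["add"]["path"]] = action["add"]["size"]
            elif "remove" in action:
                live.pop(action["remove"]["path"], None)
    return live


def table_bytes(table_dir):
    """Bytes the 'table' abstraction knows about: files in the current version only."""
    return sum(live_files(table_dir).values())


def disk_bytes(table_dir):
    """Bytes the storage account is billed for: every data file still on disk."""
    root = Path(table_dir)
    return sum(
        p.stat().st_size
        for p in root.rglob("*.parquet")
        if "_delta_log" not in p.parts  # skip the log's own files
    )
```

On a table that has been OPTIMIZEd repeatedly and never VACUUMed, `disk_bytes` keeps growing while `table_bytes` stays roughly flat, which is the gap described above.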
someone else! It’s running in our datacenter, so latency is in the low milliseconds • It’s fast and scalable • … until the number of writes/reads jumps over a certain threshold and suddenly it’s SLOWWWWW.
abstraction, data is still being read from disks. And while SSDs are fast, our cloud provider sets a limit on provisioned IOPS (basically, disk operations per second). As our usage increased, we hit the limit, and read/write IO was then throttled by the cloud provider.
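Why the slowdown feels so sudden can be seen in a toy queue model. This is a deliberately simplified sketch: real providers may add burst credits and other mechanisms, and the numbers and the `simulate` function are hypothetical. Below the provisioned limit nothing waits; just above it, a backlog builds up and the wait grows every second.

```python
def simulate(requested_per_second, provisioned_iops):
    """Return, for each second, how long (in seconds) the last queued request must wait."""
    backlog = 0
    waits = []
    for requested in requested_per_second:
        backlog += requested
        served = min(backlog, provisioned_iops)  # the provider serves at most the cap
        backlog -= served
        waits.append(backlog / provisioned_iops)  # time needed to drain what is left
    return waits


if __name__ == "__main__":
    # Assumed load ramp against an assumed cap of 1000 IOPS.
    load = [800, 900, 1000, 1200, 1500, 1500]
    for requested, wait in zip(load, simulate(load, 1000)):
        print(f"requested={requested:5d}  extra wait={wait:.1f}s")
```

With this ramp the first three seconds show no extra wait at all, and the last three show 0.2s, 0.7s and 1.2s: the database did not change, the disk underneath it simply stopped keeping up.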
While our technological achievements often obscure the dirt underneath, the laws of physics still apply to our great castles of the mind. Underneath is the messy reality of silicon: disks spinning and seeking the correct sector to read from, and messy data transmitted over noisy networks. Both representations are true.
exists in different levels of abstraction - accept that • Try to build a mental model of the technology you use. (Mental model != knowing every knob and detail) • Don’t be afraid to peek underneath. • The truth is out there (at least multiple, eventually consistent versions of it ¯\_(ツ)_/¯) • Uncovering it might require us to venture out of our comfort zone