When to rebuild things that already exist

This year I built a library that already exists. The existing solutions didn’t quite meet my needs, and I wanted something that ticked all of my boxes. When I was thinking about building something new, people referred me to xkcd #927. But I did it anyway.

For the last 6 years I’ve maintained dask-kubernetes, a Python library for deploying Dask clusters on Kubernetes. In that time I’ve tried nearly every Python Kubernetes client library on PyPI. In fact, dask-kubernetes today uses more than five different libraries and tools to interact with the Kubernetes API. Each one has different strengths and weaknesses, features and bugs. No single library can satisfy all of the needs of Dask Kubernetes on its own.

Should I continue to build wrappers and shims in dask-kubernetes to homogenize the various dependencies? Should I contribute to an existing one to fill in the blanks? Or can I build one library to rule them all?

Earlier this year I decided to build exactly the library I needed. Not a perfect universal library to supersede everything, not a wrapper for everything that exists. Just the library I need to solve my problems, to reduce complexity in my projects and to help me learn the things I need to know to maintain these projects into the future.

In this talk I will dig into my perspective on when to wrap a dependency, when to contribute to a dependency and when to build a new dependency from scratch.

Jacob Tomlinson

September 24, 2024
Transcript

  1. When to rebuild things that already exist, Jacob Tomlinson
  2. The Problem

    Dask Kubernetes had grown to use many different Kubernetes libraries because none of them individually had all the features we needed. Some libraries are also much more verbose than others and most have poor documentation. Using many Kubernetes libraries made things hard to maintain.
  3. Library pros/cons

    • kubernetes. Pros: official, lots of examples. Cons: autogenerated, no asyncio.
    • kubernetes_asyncio. Pros: semi-official, asyncio. Cons: autogenerated, poor docs.
    • pykube-ng. Pros: pleasant API. Cons: no asyncio, no port-forward.
    • kopf. Pros: pleasant API. Cons: only useful for controller.
    • kubectl (via subprocess). Pros: familiar to k8s users. Cons: adds binary dependency.
    • kubectl (via pytest-kind). Pros: familiar, nice Python API. Cons: can only be used in tests.
  4. Can I bridge the gap by contributing?

    • kubernetes and kubernetes_asyncio are both auto-generated libraries using the OpenAPI specification. They are generally not open to Pull Requests that fundamentally change the design, as that would conflict with the auto-generation.
    • pykube-ng is built with requests and has no asyncio support. Adding async support would mean rebuilding the entire internals of the library.
    • kopf is designed for building operators in an event-driven manner. Its components would be hard to factor out into an imperative Kubernetes client.
    • kubectl (via subprocess) is a widely adopted Go application that would be hard to modify if needed and unpleasant to expose in Python via subprocess.
    • kubectl (via pytest-kind) adds some syntactic sugar around subprocess in pytest-kind fixtures, but it would need to be extracted and released separately for use in our code.
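To illustrate what exposing kubectl through subprocess tends to look like, here is a minimal sketch. It is not code from dask-kubernetes: the `kubectl_get` helper and its `run` parameter are hypothetical names introduced for this example, and it assumes a `kubectl` binary is on the PATH and already configured for a cluster.

```python
import json
import subprocess


def kubectl_get(kind, namespace="default", run=subprocess.run):
    """Return the names of objects of `kind` by shelling out to kubectl.

    `run` is a hypothetical injection point so the helper can be exercised
    without a real cluster; by default it is subprocess.run.
    """
    cmd = ["kubectl", "get", kind, "--namespace", namespace, "--output", "json"]
    # check=True raises CalledProcessError on a non-zero exit status.
    proc = run(cmd, capture_output=True, text=True, check=True)
    # `kubectl get ... --output json` prints a list object with an "items" key.
    return [item["metadata"]["name"] for item in json.loads(proc.stdout)["items"]]
```

Every operation needs its own argument list, its own output parsing and its own error handling, and the binary has to be installed wherever the code runs, which is the friction the slide describes.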
  5. What about a wrapper?

    If I can’t add what I need upstream, can I wrap or shim my dependencies to fill in gaps and expose a consistent internal API?
  6. Blocker

    aiopykube was a wrapper around pykube-ng, which makes HTTP requests to the Kubernetes API using requests. To work with asyncio, aiopykube ran each call via a threadpool executor. When it came to implementing a pure-Python port forward using websockets, it was clear this would need low-level access to requests and would be a huge challenge with this design. aiopykube as a wrapper was a dead end.
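The threadpool approach described on this slide can be sketched as follows. This is an illustration of the general technique, not aiopykube's actual code: `blocking_get` and `AsyncShim` are hypothetical names, and `time.sleep` stands in for a blocking HTTP request.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def blocking_get(resource, delay=0.1):
    """Stand-in for a synchronous HTTP call (hypothetical, not pykube-ng's API)."""
    time.sleep(delay)
    return {"kind": resource, "items": []}


class AsyncShim:
    """Expose a blocking callable to asyncio by running it in worker threads."""

    def __init__(self, max_workers=4):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    async def call(self, func, *args, **kwargs):
        loop = asyncio.get_running_loop()
        # run_in_executor only forwards positional args, so bind kwargs first.
        return await loop.run_in_executor(
            self._executor, partial(func, *args, **kwargs)
        )

    def close(self):
        self._executor.shutdown(wait=True)


async def main():
    shim = AsyncShim()
    try:
        # The two blocking calls overlap because each runs in its own thread.
        return await asyncio.gather(
            shim.call(blocking_get, "Pod"),
            shim.call(blocking_get, "Service"),
        )
    finally:
        shim.close()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A shim like this works for simple request and response calls, but the event loop never sees the underlying connection, which is why long-lived, bidirectional features such as a websocket port forward are hard to build on top of it.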
  7. “I wish I had something that feels like kubectl, but in pure Python.” Me, 6 months ago
  8. Should I build a new library?

    “Fortunately, the charging one has been solved now that we've all standardized on mini-USB. Or is it micro-USB? Shit.” https://xkcd.com/927/
  9. Wishlist for a new library

    • The existing libraries all expose the user experience of the Kubernetes API in Python. I want to expose the user experience of kubectl.
    • I want a library that supports both sync and async usage.
    • I want to stop using auto-generated code and make strict client-side schema validation optional.
    • I want authentication and client creation to be implicit.
    • I want pure Python implementations of quality-of-life features like port forwarding but with APIs as human as kubectl.
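One common way to support both sync and async usage is to write the implementation once as async code and put a thin blocking facade in front of it. The sketch below shows that pattern only; it is not taken from the library discussed in the talk, and `AsyncClient`, `SyncClient` and `get` are hypothetical names.

```python
import asyncio


class AsyncClient:
    """Hypothetical async-first client; get() stands in for a real API call."""

    async def get(self, kind, namespace="default"):
        await asyncio.sleep(0)  # placeholder for awaiting network I/O
        return [f"{namespace}/{kind.lower()}-0"]


class SyncClient:
    """Blocking facade that drives the async implementation to completion."""

    def __init__(self):
        self._client = AsyncClient()

    def get(self, kind, namespace="default"):
        # asyncio.run starts a fresh event loop, so this must not be called
        # from code that is already running inside an event loop.
        return asyncio.run(self._client.get(kind, namespace))
```

With this layout, `SyncClient().get("Pod")` and `await AsyncClient().get("Pod")` share one code path, so features only need to be implemented once.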
  10. Goals

    • Make Dask Kubernetes more maintainable and reduce contributor friction.
    • Build the library that I want to use.
    • Learn more about how the Kubernetes API works, but shield others from its complexity.

    Non-Goals

    • Create a new standard.
    • Solve everyone’s problems.
    • Get everyone to adopt it.
  11. Should I stop and adopt lightkube instead?

    I’ve already achieved my goal of learning more about the Kubernetes API, and it meets some of my requirements. Maybe this is the library I want to use?
  12. Goals from lightkube

    • Extensive type hints to avoid common mistakes and to support autocompletion.
    • Models and resources generated from the swagger specifications using standard dataclasses.
    • Support for installing a specific version of the kubernetes models (1.15 to 1.27).
    • Lazy instantiation of inner models.
  13. When should you build from scratch?

    • When you can’t contribute to or wrap an existing project.
    • When your needs are met by something fundamentally different.
    • When you find a niche, don’t just rebuild everything that exists.
    • When you’ve done your research and exhausted your options.
    • When you want to learn how something works.
  14. Final thoughts

    Come and see me for a sticker.

    • Building something from scratch will take longer than you think. It’s a commitment.
    • Don’t hesitate to rebuild things simply to learn!
    • Consider adding interoperability with existing solutions.
    • There are many valid reasons to start from scratch.
    • Check out the companion blog post to this talk for a deeper dive into everything that wouldn’t fit into this talk. http://jacobtomlinson.dev/posts/2023/when-to-rebuild-things-that-already-exist/
  15. Thank you. @_jacobtomlinson https://jacobtomlinson.dev