Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Nathaniel Manista, Augie Fackler - Code Unto Others

Nathaniel Manista, Augie Fackler - Code Unto Others

Large codebases written by many authors over long periods of time too often become tragedies of the commons riddled with complexity and technical debt. We’ll cover the pathologies that specifically encumber collaborative software development (drawing on examples from the Mercurial codebase) and describe alternative practices, their efficacy, and the costs of adopting them.


PyCon 2016

May 29, 2016

More Decks by PyCon 2016

Other Decks in Programming


  1. Augie is a co-maintainer of Mercurial; Nathaniel works on gRPC.

    NEXT: Let's jump right into some code... Code Unto Others Augie Fackler & Nathaniel Manista Google, Inc. 30 May 2016 Code Unto Others http://localhost:8080/print/ 1 of 75 5/30/16, 09:41
  2. We're just going to jump right into code: here's a

    class in a years-established, widely-used codebase * 1700 lines! * 74 public methods and 36 private methods! * 18 public attributes and 9 private attributes! Code Unto Others http://localhost:8080/print/ 2 of 75 5/30/16, 09:41
  3. Okay so we're good, right? Quick lecture? Is this a

    problem? Not for compiler, interpreter, or tools that process the class. Not for the hardware that executes the class. Not for users who use the software built with the class. Code Unto Others http://localhost:8080/print/ 3 of 75 5/30/16, 09:41
  4. Most software is developed collaboratively, so this is a problem.

    The guilty party here is localrepo.localrepository in Mercurial, which we picked as a test case because the maintainers already know it's a problem and it's publicly visible. Software is Made of People Code Unto Others http://localhost:8080/print/ 4 of 75 5/30/16, 09:41
  5. This is exactly what we call "readability". You don't scale

    in time: you won't be on the project forever. You don't scale in communication: your project may have hundreds of collaborating developers maintaining its code and tens of thousands of developer-users writing other software that relies on it. NEXT: Hierarchy of code needs. Readability Your software needs to describe itself to readers the way you would describe it. Code Unto Others http://localhost:8080/print/ 5 of 75 5/30/16, 09:41
  6. Readability is third priority at best, but orthogonal to the

    others so there's no excuse. NEXT: Our first specific complaint with localrepository is lack of cohesion. Correct Efficient Readable Self-Actualization Kill All Humans Code Unto Others http://localhost:8080/print/ 6 of 75 5/30/16, 09:41
  7. A grab-bag with too little relationship among its parts. Roughly,

    this class is serving three roles: storage logic, business logic on top of that storage, and being a data container for related data. It could probably be (at least) three layers of objects via composition. Lack Of Cohesion Code Unto Others http://localhost:8080/print/ 7 of 75 5/30/16, 09:41
  8. Some of its methods do actual work; some merely wrap

    work-doing-methods for convenience. At least the "set" method comes out and says so in an implementation comment. Mixes function and convenience def set(self, expr, *args): '''Find revisions matching a revset and emit changectx instances. This is a convenience wrapper around ``revs()`` that iterates the result and is a generator of changectx instances. ''' for r in self.revs(expr, *args): yield self[r] Code Unto Others http://localhost:8080/print/ 8 of 75 5/30/16, 09:41
  9. Mixes layers of abstraction (Low-level structures and high-level structures that

    use those low-level structures.) Code Unto Others http://localhost:8080/print/ 9 of 75 5/30/16, 09:41
  10. Most people's working memory is seven plus or minus two

    items. Ten is a sharp person. Ninety-two is right out. How many is too many? When you can't talk about a group of elements without using the language of subdivision and segmentation, that's when the group is too large. Just too many elements in its API Code Unto Others http://localhost:8080/print/ 10 of 75 5/30/16, 09:41
  11. 1700 lines is roughly a magazine article or long blog

    post. No one should look at a single class and think "I'd better put a pot of coffee on". NEXT: well, that's emotional and subjective, can we be more concrete? Text is too long. Code Unto Others http://localhost:8080/print/ 11 of 75 5/30/16, 09:41
  12. quadratic number of relationships is surprising but also not realistic.

    def long_function(parameter): v1 = # expression using parameter v2 = # expression using parameter, v1 v3 = # expression using parameter, v1, v2 v4 = # expression using parameter, v1, v2, v3 v5 = # expression using parameter, v1...v4 # # (much more) # v30 = # expression using parameter, v1...v29 return # expression using parameter, v1...v30 Code Unto Others http://localhost:8080/print/ 12 of 75 5/30/16, 09:41
  13. in practice most values are computed from few recently-computed values

    and relationships are linear. NEXT: but the reader doesn't know that in the moment! def more_realistic_long_function(parameter): v1 = # expression using parameter v2 = # expression using parameter, v1 v3 = # expression using parameter, v1, v2 v4 = # expression using v1, v2, v3 v5 = # expression using v3, v4 # # (much more) # v30 = # expression using v28, 29 return # expression using v29, v30 Code Unto Others http://localhost:8080/print/ 13 of 75 5/30/16, 09:41
  14. last line - the reader has to keep parameter and

    v1 through v30 in working memory; all of those values are *potentially* used in that return expression. Code length is about reader's working memory too. Ends of scopes allow the reader to "clear head". NEXT: localrepository not inflicted by enemy. def long_function_as_read(parameter): v1 = # expression using parameter v2 = # expression using parameter, v1 v3 = # expression using parameter, v1, v2 v4 = # expression using v1, v2, v3 v5 = # expression using v3, v4 # # (much more) # v30 = # expression using v28, 29 # Now imagine yourself about to read # the next line... Code Unto Others http://localhost:8080/print/ 14 of 75 5/30/16, 09:41
  15. No ill will was intended by past authors, yet here

    we are. How can we do better? Let's reflect on the software development process and figure out how it got to this undesirable state without an evil actor. TRANSITION: First, we'll talk about what goes into a change. localrepository wasn't inflicted by a malevolent enemy Code Unto Others http://localhost:8080/print/ 15 of 75 5/30/16, 09:41
  16. All changes must satisfy these requirements, and they get reviewed

    and considered mostly on their incremental merits. Occasionally someone will look at the bigger picture, but that's rare. TRANSITION: It's a near-universal truth that we underestimate how long software lives. Requirements of any software change: A change must meet a need in the domain of the software. A change's author must understand the problem sufficiently to create the change. Code Unto Others http://localhost:8080/print/ 16 of 75 5/30/16, 09:41
  17. Structure is typically given only as much attention as is

    required to avoid broken code. And of course authors have to understand the problems that they are commanding computers to solve. TRANSITION: So what we're looking for is some silver bullets. Most of the time we guess about the lifetime of code, we guess short, and the code outlives our projections. Code Unto Others http://localhost:8080/print/ 17 of 75 5/30/16, 09:41
  18. Tools/practices/etc that we can add or can do that cost

    little but that keep the code clean on an incremental basis. Important for these to be cheap because "pay a lot now; you'll be happy in ten years" won't sell. Silver Bullets Code Unto Others http://localhost:8080/print/ 18 of 75 5/30/16, 09:41
  19. Some early work in the field. Most of the time

    people cite this statement it's to describe how easy it is to infer behavior from state relative to hard it is to infer state from behavior. We agree with that, but really we bring it up now to point out that he's saying he can understand things without even mentioning the problem domain. "Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious." — Fred Brooks The Mythical Man-Month Code Unto Others http://localhost:8080/print/ 19 of 75 5/30/16, 09:41
  20. Mastering a particular problem domain can't generically be made easier,

    but readability and maintainability emerge from code structure, not problem domain. Code Unto Others http://localhost:8080/print/ 20 of 75 5/30/16, 09:41
  21. Again we're speaking independently of problem domain. What's in control,

    and to what does it pass control? What's changing, and what observes and reacts to that change? Out-parameters hide change. Mutable global state hides change and runs the risk of being a hidden input to nearly every function in your system. TRANSITION: Let's talk about how we cope with similar things in the physical world. If it's apparent where the state is changed and how program control is passed from one component to another, most of the puzzle is solved. Code Unto Others http://localhost:8080/print/ 21 of 75 5/30/16, 09:41
  22. We want to exclude undesirable influences from the system under

    test. Why do we perform experiments in laboratories? Code Unto Others http://localhost:8080/print/ 22 of 75 5/30/16, 09:41
  23. Failure rates increase when contaminants are circulating around. Why do

    we manufacture materially complex goods in clean rooms? Code Unto Others http://localhost:8080/print/ 23 of 75 5/30/16, 09:41
  24. Deliberately artificially simple environments not only reduce the problems that

    can take place inside them but also reduce the questions that can be asked about how the things inside them work. Do Where the Doing is Simplest Code Unto Others http://localhost:8080/print/ 24 of 75 5/30/16, 09:41
  25. This sounds good, right? Code Where the Coding is Simplest

    Code Unto Others http://localhost:8080/print/ 25 of 75 5/30/16, 09:41
  26. This is just a constant at class scope - what's

    remarkable about it? * Override a superclass constant? * Be overridden by a subclass? * Be shadowed by a subclass's instance variable? * Be used from outside the class? class bar(foo): ANSWER = 42 Code Unto Others http://localhost:8080/print/ 26 of 75 5/30/16, 09:41
  27. Understand that other maintainers of your code are going to

    have those questions about what you put in a class. The maintainer of this code is going to need to know the answers to all of those. Placing a code element at class-scope invites questions. Code Unto Others http://localhost:8080/print/ 27 of 75 5/30/16, 09:41
  28. Avoid placing code elements at class-scope unless you have no

    alternative. Code Unto Others http://localhost:8080/print/ 28 of 75 5/30/16, 09:41
  29. "'only ever used from' implies 'should be syntactically nested inside'"

    is one of the biggest misconceptions we see. Should all programs be written as one giant main method? The class "looks nice" with it there. I really want it there! It's only ever used from the class. Code Unto Others http://localhost:8080/print/ 29 of 75 5/30/16, 09:41
  30. So where do things go? The class cannot function as

    required by its users without the code element at class scope. Code Unto Others http://localhost:8080/print/ 30 of 75 5/30/16, 09:41
  31. Only "promote" code elements into other scopes when required to

    do so. Place all code elements at module-scope by default. Code Unto Others http://localhost:8080/print/ 31 of 75 5/30/16, 09:41
  32. A test input string that is only used in one

    test method: let's not go overboard; that's fine as a local constant in the one test method. Now that we've talked some about where to put code, let's talk about when it's reasonable to use a class. Classes: module-scope. Functions: module-scope. Constants: module-scope. Code Unto Others http://localhost:8080/print/ 32 of 75 5/30/16, 09:41
  33. Only use classes for what classes are for. And what

    are classes for? Be a class realist. Code Unto Others http://localhost:8080/print/ 33 of 75 5/30/16, 09:41
  34. An important note here is that there's no behavior -

    just data. 1. Classes provide a way to structure and aggregate data. Code Unto Others http://localhost:8080/print/ 34 of 75 5/30/16, 09:41
  35. This is probably the most recognizably traditional use of classes.

    4. Classes provide a way to create arbitrarily many instances that behave in ways that are mostly similar, but different according to values specified at construction. Code Unto Others http://localhost:8080/print/ 37 of 75 5/30/16, 09:41
  36. 3 is implementing an abstract type, 4 is parametric behavior

    Maintainable classes typically avoid mixing these. (3) and (4) are okay but not the others. Code Unto Others http://localhost:8080/print/ 38 of 75 5/30/16, 09:41
  37. How can that be? They're so different! Being functions. Code

    Unto Others http://localhost:8080/print/ 40 of 75 5/30/16, 09:41
  38. need state to compute something? Better a public function using

    a private class than a public class to exist just to expose a single public method. my_object = MyClass(construction_parameter) my_value = my_object.my_method(method_parameter) # is the same as my_value = MyClass(construction_parameter).my_method( method_parameter) # rename `my_method` to `__call__`... my_value = MyClass(construction_parameter)( method_parameter) # elide the class entirely my_value = my_function_that_was_a_class( construction_parameter, method_parameter) Code Unto Others http://localhost:8080/print/ 41 of 75 5/30/16, 09:41
  39. Code smell of using classes for namespacing; use module instead.

    Being concrete, but never being instantiated. Code Unto Others http://localhost:8080/print/ 42 of 75 5/30/16, 09:41
  40. when you're writing an interface and a handful of implementations

    of that interface that you think are the only ones that will ever need to exist. Enumerated polymorphism. Code Unto Others http://localhost:8080/print/ 43 of 75 5/30/16, 09:41
  41. this looks like what we just covered... class MyClass(object): MY_CONSTANT

    = 'value in class' @classmethod def my_class_method(cls, parameter): # some implementation Code Unto Others http://localhost:8080/print/ 44 of 75 5/30/16, 09:41
  42. Don't use class objects for enumerated polymorphism. Python has abc

    module for polymorphism; use that. class MyFirstSubClass(MyClass): MY_CONSTANT = 'value in first subclass' @classmethod def my_class_method(cls, parameter): # some different implementation class MySecondSubClass(MyClass): MY_CONSTANT = 'value in second subclass' @classmethod def my_class_method(cls, parameter): # some still different implementation Code Unto Others http://localhost:8080/print/ 45 of 75 5/30/16, 09:41
  43. Science-fiction movie with alien thing falling to earth and taking

    over the scientist who finds and studies it. Scientist becomes villain because thing couldn't be used without its user becoming it. Code Unto Others http://localhost:8080/print/ 46 of 75 5/30/16, 09:41
  44. Intended to be helper code, but "you can't use this

    without becoming it" is a very antisocial of "help". If you just wanted help and your public API is enlarged by accepting it, that's not help! NEXT: Let's talk some about how to maximize the clarity of class implementations when they're in use. Mixins (Eww, yuck!) Code Unto Others http://localhost:8080/print/ 47 of 75 5/30/16, 09:41
  45. We've talked a fair amount about what classes are for.

    Now let's take a little more time to talk about classes, in terms of things to avoid structurally in their implementing code. Design of Classes Code Unto Others http://localhost:8080/print/ 48 of 75 5/30/16, 09:41
  46. This isn't always bad, but it's often a path to

    confusion. It's also a great way to make your class hard or impossible to subclass (see also Josh Bloch's _Effective Java_, item 17). We bring it up because it displays layering violation - one element using another shows the API spread across at least two layers of abstraction. Avoid self-use of public APIs Code Unto Others http://localhost:8080/print/ 49 of 75 5/30/16, 09:41
  47. We've found it's worth going a step further and not

    passing self as a parameter to functions called from within a class definition. It's a good way to end up with infinite recursion or reference cycles. Avoid self-escape in class implementation Code Unto Others http://localhost:8080/print/ 50 of 75 5/30/16, 09:41
  48. Standard remedy: do you really want the subsystem to which

    you are passing "self" to be able later to call any arbitrary method of self? Mostly the answer is "no, just one part of self". So just pass that one part of "self". Pass something less than self: The value of an instance field. A custom type composed of the values of several instance fields. A bound method. Code Unto Others http://localhost:8080/print/ 51 of 75 5/30/16, 09:41
  49. Never have "these fields are used during a call to

    [method]". (Nathaniel's leaked ten megabytes story.) Minimize instance state. Code Unto Others http://localhost:8080/print/ 52 of 75 5/30/16, 09:41
  50. We've talked about them being the default place for code

    elements. Now we're going to talk about how to organize things in your modules for easier maintenance. Anything to say about modules? Code Unto Others http://localhost:8080/print/ 53 of 75 5/30/16, 09:41
  51. If you don't have a directed acyclic graph of modules,

    you want your code and continuous integration to scream it rather than have it slip by accidentally because you allowed imports to be sprinkled throughout your functions. Always place imports at the top of your modules. Code Unto Others http://localhost:8080/print/ 54 of 75 5/30/16, 09:41
  52. If a function in your module returns an instance of

    referenced_module.UsefulType, import referenced_module into your module. This helps maintain abstraction and avoid circularity and catches places where you're accidentally referring to non-public code elements. Import modules referenced in specification. import referenced_module import used_module def useful_value(parameter): """Returns a referenced_module.UsefulType value.""" return used_module.create_useful_value(parameter + 5) Code Unto Others http://localhost:8080/print/ 55 of 75 5/30/16, 09:41
  53. The underscore only looks weird the first eight hundred times

    you type it. Default to using private visibility for code elements and only "promote" them to public when you make them a deliberate and intentional part of a public API. Code Unto Others http://localhost:8080/print/ 56 of 75 5/30/16, 09:41
  54. You've spent the last week on a software problem. Understanding,

    designing, coding, testing, debugging. You were thinking about the problem not just while working but also at meals, while bathing, right before sleep, right after waking. Are you in the right position to judge what's obvious about the problem and what's obscure? Like those others you are compromised. What's a limit for which you could have signed up back when your judgement was objective? Code Unto Others http://localhost:8080/print/ 59 of 75 5/30/16, 09:41
  55. The longer you spent understanding the subject area and authoring

    code that solves a problem, the less qualified you are to decide anything about what's obvious and not obvious about the code. Line limits and complexity limits on functions, classes, and modules. Code Unto Others http://localhost:8080/print/ 60 of 75 5/30/16, 09:41
  56. Does anyone respect this claim? You don't want to be

    this programmer. How many of you want to investigate a bug in someone else's thousand-line class? "Once you understand everything that's involved in horckleblaxing a fnastitude it's obvious why this function needs to be two hundred lines long!" Code Unto Others http://localhost:8080/print/ 61 of 75 5/30/16, 09:41
  57. NEXT: more than any other reason, laziness What makes conforming

    to line count limits hard? Code Unto Others http://localhost:8080/print/ 62 of 75 5/30/16, 09:41
  58. Since your function is defined in terms of a, b,

    c, x, y, and z, you probably implemented it in terms of a, b, c, x, y, and z, were happy to be done, and walked away. def long_function(a, b, c): """Returns triplet of x, y, and z.""" # # Looooooooooong function body omitted # # # # # # (But trust us, it's really loooooooong!) # ↓ ↓ ↓ ↓ ↓ ↓ Code Unto Others http://localhost:8080/print/ 63 of 75 5/30/16, 09:41
  59. Go ahead and define the D through W. Reduce the

    size of your too-large code elements. They are a kindness to your co-maintainers with no effect on your public API. Helper functions for parts of a process. Data structures for partial results. Code too obvious? Be too obvious rather than not obvious enough. This is "code unto others", right? Code Unto Others http://localhost:8080/print/ 64 of 75 5/30/16, 09:41
  60. Preexisting abbreviations like "TCP" and "HTTP" are fine, but don't

    create new ones. NEXT: blue marble. Don't abbreviate when naming. Code Unto Others http://localhost:8080/print/ 65 of 75 5/30/16, 09:41
  61. Software engineering is a cosmopolitan endeavor. Spend a little horizontal

    space writing names that are more easily understood by readers who aren't as skilled with your language. Code Unto Others http://localhost:8080/print/ 66 of 75 5/30/16, 09:41
  62. In storytelling, across media and in both fiction and nonfiction,

    characters are introduced in isolation before interacting with others. Except when they aren't; that's a narrative technique to surprise the reader. "1984" telescreen moment. Kool-Aid Man. Surprises are the opposite of what we want when reading code. Code Unto Others http://localhost:8080/print/ 67 of 75 5/30/16, 09:41
  63. Having read a lot of code we've found that this

    is an encumberance. The reader has to pause their comprehension and scan ahead for _shadowy_stranger_function further down in the file. def _first_familiar_function(parameter): # some implementation def _second_familiar_function(parameter): # some implementation def high_level_function(parameter): v1 = _first_familiar_function(parameter) v2 = _second_familiar_function(parameter) return _shadowy_stranger_function(v1, v2) Code Unto Others http://localhost:8080/print/ 68 of 75 5/30/16, 09:41
  64. Only mutual recursion can get in the way of this.

    How many of you write mutually recursive recursive functions every day? Sort your code elements in definition- before-use order. Code Unto Others http://localhost:8080/print/ 69 of 75 5/30/16, 09:41
  65. So because private code elements are used by public code

    elements, this means that private code elements appear earlier in files and public code elements appear later in files. NEXT: Pushback is about users having to scroll. def _first_familiar_function(parameter): # some implementation def _second_familiar_function(parameter): # some other implementation def _shadowy_stranger_function(first, second): # not so shadowy and strange anymore! def public_function(first_parameter, second_parameter): v1 = _first_private_function(first_parameter) v2 = _second_private_function(second_parameter) return _shadowy_stranger_function(v1, v2) Code Unto Others http://localhost:8080/print/ 70 of 75 5/30/16, 09:41
  66. Your code has two audiences: maintainers and users. "But I

    want the public parts of my code at the top so that users don't have to scroll!" Code Unto Others http://localhost:8080/print/ 71 of 75 5/30/16, 09:41
  67. You can please all of the people some of the

    time; this is not one of those times. Maintainers and users are two different audiences of readers. Code Unto Others http://localhost:8080/print/ 72 of 75 5/30/16, 09:41
  68. Your maintainers have to read your file, so write your

    file to make your maintainers most happy. Send your users to the documentation generated from your code, since that's all they need to be happy and you have additional tools there to make them happy. Code Unto Others http://localhost:8080/print/ 73 of 75 5/30/16, 09:41
  69. The more you know about your code, the harder you

    may find relating to new readers and users of it Code Unto Others Readability is independent of correctness, efficiency, and problem domain Classes invite questions and complexity Consider setting your own judgement aside When forced to choose favor writing for maintainers rather than users Code Unto Others http://localhost:8080/print/ 74 of 75 5/30/16, 09:41