Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Inheriting a Sloppy Codebase: A Practical Guid...

Casey Kinsey
September 02, 2014

Inheriting a Sloppy Codebase: A Practical Guide to Wrangling Chaotic Code

High fidelity interactive slides available at:
hirelofty.com/djangocon14

In an industry where “lean” has become the mantra and rapidly iterated products imply tight budgets and tighter deadlines, how do you effectively take ownership of someone’s hastily written Django code? In this talk, we’ll dive into a step-by-step process for dicing up legacy projects, short-sighted prototypes, and plain ol’ spaghetti code to turn them into codebases you’ll show off with pride.

Casey Kinsey

September 02, 2014
Tweet

Other Decks in Programming

Transcript

  1. Where does sloppy code come from? 1. Inexperience - Not

    our topic today. 2. Emphasis on development speed 3. Lack of long-term ownership
  2. In the rest of the engineering world A prototype is

    a proof of concept. Prototypes never see production.
  3. Debuggers (Django Debug Toolbar Doesn't Count) There will be much

    debugging. If you're not familiar with a Python debugger, choose one and get comfy with it. pdb, ipdb, pudb, and many more. Bonus Points: Have one integrated with your IDE.
  4. Coverage.py Super easy to use: $ coverage run manage.py test

    Here's a basic .coveragerc for Django: [run] source = . omit = *migrations*,*wsgi.py,*settings.py Generate a report: $ coverage report Name Stmts Miss Cover -------------------------------------------------------- campaign/__init__ 0 0 100% campaign/admin 27 5 81% campaign/forms 38 2 95% campaign/middleware 0 0 100% campaign/models 89 2 98% campaign/tests 139 0 100% campaign/views 99 6 94% -------------------------------------------------------- TOTAL 392 15 96%
  5. PyLint and pep8 PyLint helps clean up poorly structured code.

    Very useful during refactors. pep8 will keep your code legible. It will also make you irreversibly anal. Bonus Points: Have these integrated with your IDE.
  6. Do This First $ git checkout -b original_project master Now,

    when you want to see the way the code originally worked: $ git diff original_project master -- myapp/models.py And once everything is all cleaned up... Bragging rights: $ git diff original_project..master --stat
  7. Fire It Up The goal is to get the project

    running in its last known working configuration. Create a virtual environment. But don't get attached to it. If the project shipped with a dependency list, use it. If the project shipped with tests, try to get them running. Be sure to freeze your dependencies into requirements.txt if you added new ones.
  8. We're not going to go too crazy here Integration test

    coverage for high-level Django concepts. Integration and/or unit test coverage for low-level business critical processes.
  9. Integration Tests For our purposes, an integration test end-to-end test

    of the Python/Django code. Our goal is coverage, first and foremost. We want to catch landmines and see that refactors in one module do not unexpectedly affect another.
  10. Integration Tests: An Example class HeapsortIntegrationTests(HeapsortIntegrationTestCase): def test_home_anon(self): """ Tests

    the homepage view works for anonymous users """ # Unauthenticated self.client.logout() response = self.client.get(reverse('home')) self.assertEqual(response.status_code, 200) self.assertTemplateUsed('home.html')
  11. Unit Tests Our goal is not to achieve full unit

    test coverage. Unit tests will be largely negated by upcoming refactoring. Instead, we'll focus unit testing efforts to core components where accuracy is critical.
  12. Unit Tests: An Example def test_jobs_expire_after_30_days(self): """ A job should

    no longer be returned by the manager when it expires """ now = datetime.datetime.now() the_future = datetime.timedelta(days=30) datetime_mock = mock.Mock() with mock.patch.multiple('jobs.models', datetime=datetime_mock): datetime_mock.datetime.now.return_value = now # Create a job. The job should be returned by the manager test_job = self._create_job( "My Test Job") self.assertTrue(test_job in Job.active.all()) # Fast-forard to... THE FUTURE datetime_mock.datetime.now.return_value = now + the_future self.assertFalse(test_job in Job.active.all())
  13. Test Coverage: How much is enough? Integration test coverage for

    all Django views and user paths, management commands, 3rd party APIs. Unit test coverage for business critical processes and non view-logic. Billing processes Transactional communication Asynchronous tasks Coverage > 95% is ideal. No individual files left uncovered.
  14. The Patchwork Quilt of Dependencies Pattern $ git diff --stat

    original_initial_product development -- requirements.txt requirements.txt | 135 +------------------------------------------------------------------ 1 file changed, 2 insertions(+), 133 deletions(-)
  15. All Dependencies Introduce Complexities Whether or not to use a

    3rd party module is a matter of determining if the ends justify the means Each dependency is a potential obstacle to upgrades. Low-level Django extensions can conflict and make debugging a nightmare. 3rd party modules installed from VCS make no stability or availability guarantees.
  16. Trim the Fat Remember that virtual environment you created? Good.

    Now throw that shit away. Install the absolute bare minimum of requirements you know you need. (Django, South, database drivers, APIs that drive core functionality)
  17. Audit each other requirement one-by-one Research each requirement and know

    exactly what it is, how it works, and why it is used. They will fall into one of three categories. 1. Totally unnecessary, or even unused. Remove these now. 2. Potentially unnecessary, but requires refactoring to remove. Make a judgement call. 3. Necessary functionality for the project.
  18. Test and Upgrade Run the tests! If you missed something,

    go back to the previous step. Once your tests are passing, this is a good time to upgrade packages. Upgrade. Test. Repeat
  19. Monoliths are an organizational nightmare. You cannot succinctly evaluate the

    functionality of any one component. Nothing is portable. Usually rife with from my_app import *
  20. Refactoring Monoliths Create a sane app structure. Kill off 'import

    *' and let PyLint do its thing. Migrate models to their new homes. Everything else will follow.
  21. Migrating Models Across Apps If the project is pre-production, you

    can get away with automatic schema migrations. Using South A great discussion of techniques is on StackOverflow: http://stackoverflow.com/questions/1258130/how-do-i-migrate-a- model-out-of-one-django-app-and-into-a-new-one
  22. The "Every Model is an App" Pattern "I like my

    INSTALLED_APPS to be a list of database tables."
  23. The "Every Model is an App" Project This type of

    structure is another organizational nightmare, for similar reasons to a monolith but due to an opposite design pattern. You cannot succinctly evaluate the functionality of any one component. Nothing is portable. You'll get very acquainted with: Traceback (most recent call last): ... File "/app1/foo.py", line 1, in <module> from app2 import bar File "/app2/bar.py", line 1, in <module> from app1 import foo ImportError: cannot import name foo
  24. The "Every Model is an App" Project Follow the same

    steps as a monolith. The challenges are the same; create an organized structure, migrate models across apps into their new home, and everything else follows. This should be less work than a monolith, as one of your apps will become the 'host' for others.
  25. Signals/Receivers Are Useful Signals allow for a decoupling of models

    which initiate side effects. They are especially useful for triggering side effects in 3rd party code. Signals are great for maintaining data integrity.
  26. ... But Not Always Signal receivers can hide important functionality

    from developers. Signal receivers can become unnecessary abstraction between two models in the same application. Signal receivers can implement over-zealous business logic.
  27. Getting Rid of Superfluous Signal Receivers Be on the lookout

    for: Signal receivers that interact between two database related models. Signal receivers in which the instance simply operates on itself. class MyModel(models.Model): # ... def save(self, *args, **kwargs): # Pre-save! self.some_property = True super(MyModel, self).save(*args, **kwargs) # Post-save! self.foreign_relation.some_poperty = True self.foreign_relation.save()
  28. Getting Rid of Superfluous Signal Receivers Be on the lookout

    for: Signal receivers that implement transactional or business logic. class UserReceiver(ModelReceiver): @staticmethod def pre_save(sender, **kwargs): instance = kwargs['instance'] instance.notify = instance.pk is None @staticmethod def post_save(sender, **kwargs): instance = kwargs['instance'] if instance.notify: NewUserEmail().send(extra_context={'user': instance,}, to=constants.NOTIFY_EMAILS)
  29. ... (reminds me of when I discovered list comprehensions in

    Python) def publish_all(): # NOT SURE WHY IM DOING THIS # BUT IT IS COOL AS HELL! [p.publish() for p in Post.objects.all()]
  30. Context Processors Context Processors are widely abused and can quickly

    impair performance. You should evaluate each context processor in the project and see if:
  31. 1. The context processor is better suited as a templatetag.

    # in myapp.context_processors.py from myapp.utils import products_by_category def products(request): return {'products_by_category': products_by_category()} # in myapp.templatetags from django import template from myapp.utils import products_by_category register = template.Library() @register.assignment_tag(takes_context=True) def get_products_by_category(context): return products_by_category() Assignment Tags are effective here.
  32. 2. The context processor logic should be refactored into relevant

    views. # in myapp.context_processors.py from myapp.utils import products_by_category def products(request): return {'products_by_category': products_by_category()} # in myapp.views from myapp.utils import products_by_category class MyView(DetailView): def get_context_data(self, **kwargs): context = super(MyView, self).get_context_data(kwargs) context.update({'products_by_category': products_by_category()}) return context
  33. Middleware Not as widely abused as Context Processors, but potentially

    have larger performance implications. Do I need this logic to execute on every single request and/or response? If not, you should engineer a way around.
  34. The Undocumented Black Box Full of Black Boxes # Who

    put all these hashtags in my code?
  35. The only thing worse than code with no comments... def

    get_classification(self): if self.classification and len(self.classification) == 2: level = CLASSIFICATIONS.get(self.classification[0], '') class_level = self.classification[1] if class_level == 'U': class_level = '' else: class_level = 'class %s '%class_level string = '%s%s'%(class_level, level) return dict({'level':level, 'class': class_level, 'string':string,}) else: return None
  36. ... is code with just enough comments to piss you

    off def get_classification(self): if self.classification and len(self.classification) == 2: level = CLASSIFICATIONS.get(self.classification[0], '') class_level = self.classification[1] if class_level == 'U': class_level = '' else: class_level = 'class %s '%class_level string = '%s%s'%(class_level, level) # returns dict return dict({'level':level, 'class': class_level, 'string':string,}) else: return None
  37. There's No Easy Way Out of Undocumented Code You could

    try and brute force your way through it, but that's impractical.
  38. Document as You Go Make it policy: If you edit

    the code, you document the code. If you have to dissect code to understand it, comment what you find.
  39. Document as You Go Every module, function, class, and method

    get a docstring. No exceptions! These are the workhorses of your documentation effort. Break apart long, multi-step logical branches with comments. If a statement is difficult read or has some sort of ambiguity, leave a comment!
  40. Other Gotchas to Watch For Hard-coded URLs Try and chase

    these down before refactoring. Broad Try/Except Blocks try: except: pass <-- This ain't @PHP VCS Requirements Fork 'em! Whoever owns the project should own the repo. Misnamed Concepts Bite the bullet and rename them. Premature Configuration Pre-configured code overcomplicates deployments.
  41. Key Concepts Tests first. Dependencies introduce complexity. Organization is important.

    Watch for hidden signals and receivers. Audit context processors and middleware. Document as you go.