Slide 1

Slide 1 text

Inheriting a Sloppy Codebase A Practical Guide to Wrangling Chaotic Code 0

Slide 2

Slide 2 text

Casey Kinsey @cordiskinsey github.com/ckinsey linkedin.com/in/caseykinsey heapsortjobs.com/casey

Slide 3

Slide 3 text

hirelofty.com http://github.com/loftylabs

Slide 4

Slide 4 text

heapsortjobs.com @heapsortapp

Slide 5

Slide 5 text

Get These Slides hirelofty.com/djangocon14

Slide 6

Slide 6 text

Where does sloppy code come from? 1. Inexperience - Not our topic today. 2. Emphasis on development speed 3. Lack of long-term ownership

Slide 7

Slide 7 text

Rapid prototyping is a lie. Or at least what we're calling "prototyping".

Slide 8

Slide 8 text

In the rest of the engineering world A prototype is a proof of concept. Prototypes never see production.

Slide 9

Slide 9 text

In the Software Engineering World Prototypes go into production all the time.

Slide 10

Slide 10 text

You're Gonna Need Some Tools

Slide 11

Slide 11 text

A Good Editor/IDE Code navigation Code introspection Refactoring tools PyCharm, PyDev (Eclipse), emacs, and more

Slide 12

Slide 12 text

Debuggers (Django Debug Toolbar Doesn't Count) There will be much debugging. If you're not familiar with a Python debugger, choose one and get comfy with it. pdb, ipdb, pudb, and many more. Bonus Points: Have one integrated with your IDE.

Slide 13

Slide 13 text

Coverage.py Super easy to use: $ coverage run manage.py test Here's a basic .coveragerc for Django: [run] source = . omit = *migrations*,*wsgi.py,*settings.py Generate a report: $ coverage report Name Stmts Miss Cover -------------------------------------------------------- campaign/__init__ 0 0 100% campaign/admin 27 5 81% campaign/forms 38 2 95% campaign/middleware 0 0 100% campaign/models 89 2 98% campaign/tests 139 0 100% campaign/views 99 6 94% -------------------------------------------------------- TOTAL 392 15 96%

Slide 14

Slide 14 text

PyLint and pep8 PyLint helps clean up poorly structured code. Very useful during refactors. pep8 will keep your code legible. It will also make you irreversibly anal. Bonus Points: Have these integrated with your IDE.

Slide 15

Slide 15 text

Cool. You've just gotten your first look at the sloppy codebase.

Slide 16

Slide 16 text

Don't. Touch. Anything. Everything is Hot Lava

Slide 17

Slide 17 text

You're Gonna Do Some Prep-work

Slide 18

Slide 18 text

Do This First $ git checkout -b original_project master Now, when you want to see the way the code originally worked: $ git diff original_project master -- myapp/models.py And once everything is all cleaned up... Bragging rights: $ git diff original_project..master --stat

Slide 19

Slide 19 text

Fire It Up The goal is to get the project running in its last known working configuration. Create a virtual environment. But don't get attached to it. If the project shipped with a dependency list, use it. If the project shipped with tests, try to get them running. Be sure to freeze your dependencies into requirements.txt if you added new ones.

Slide 20

Slide 20 text

You're Gonna Need Test Coverage

Slide 21

Slide 21 text

We're not going to go too crazy here Integration test coverage for high-level Django concepts. Integration and/or unit test coverage for low-level business critical processes.

Slide 22

Slide 22 text

Integration Tests For our purposes, an integration test end-to-end test of the Python/Django code. Our goal is coverage, first and foremost. We want to catch landmines and see that refactors in one module do not unexpectedly affect another.

Slide 23

Slide 23 text

Integration Tests: An Example class HeapsortIntegrationTests(HeapsortIntegrationTestCase): def test_home_anon(self): """ Tests the homepage view works for anonymous users """ # Unauthenticated self.client.logout() response = self.client.get(reverse('home')) self.assertEqual(response.status_code, 200) self.assertTemplateUsed('home.html')

Slide 24

Slide 24 text

Unit Tests Our goal is not to achieve full unit test coverage. Unit tests will be largely negated by upcoming refactoring. Instead, we'll focus unit testing efforts to core components where accuracy is critical.

Slide 25

Slide 25 text

Unit Tests: An Example def test_jobs_expire_after_30_days(self): """ A job should no longer be returned by the manager when it expires """ now = datetime.datetime.now() the_future = datetime.timedelta(days=30) datetime_mock = mock.Mock() with mock.patch.multiple('jobs.models', datetime=datetime_mock): datetime_mock.datetime.now.return_value = now # Create a job. The job should be returned by the manager test_job = self._create_job( "My Test Job") self.assertTrue(test_job in Job.active.all()) # Fast-forard to... THE FUTURE datetime_mock.datetime.now.return_value = now + the_future self.assertFalse(test_job in Job.active.all())

Slide 26

Slide 26 text

Test Coverage: How much is enough? Integration test coverage for all Django views and user paths, management commands, 3rd party APIs. Unit test coverage for business critical processes and non view-logic. Billing processes Transactional communication Asynchronous tasks Coverage > 95% is ideal. No individual files left uncovered.

Slide 27

Slide 27 text

It's Time to Start Hacking There's No Telling What You'll Find

Slide 28

Slide 28 text

The Patchwork Quilt of Dependencies Pattern $ git diff --stat original_initial_product development -- requirements.txt requirements.txt | 135 +------------------------------------------------------------------ 1 file changed, 2 insertions(+), 133 deletions(-)

Slide 29

Slide 29 text

All Dependencies Introduce Complexities Whether or not to use a 3rd party module is a matter of determining if the ends justify the means Each dependency is a potential obstacle to upgrades. Low-level Django extensions can conflict and make debugging a nightmare. 3rd party modules installed from VCS make no stability or availability guarantees.

Slide 30

Slide 30 text

Trim the Fat Remember that virtual environment you created? Good. Now throw that shit away. Install the absolute bare minimum of requirements you know you need. (Django, South, database drivers, APIs that drive core functionality)

Slide 31

Slide 31 text

Audit each other requirement one-by-one Research each requirement and know exactly what it is, how it works, and why it is used. They will fall into one of three categories. 1. Totally unnecessary, or even unused. Remove these now. 2. Potentially unnecessary, but requires refactoring to remove. Make a judgement call. 3. Necessary functionality for the project.

Slide 32

Slide 32 text

Test and Upgrade Run the tests! If you missed something, go back to the previous step. Once your tests are passing, this is a good time to upgrade packages. Upgrade. Test. Repeat

Slide 33

Slide 33 text

The Monolithic App of Death Pattern One app to rule them all.

Slide 34

Slide 34 text

Monoliths are an organizational nightmare. You cannot succinctly evaluate the functionality of any one component. Nothing is portable. Usually rife with from my_app import *

Slide 35

Slide 35 text

Refactoring Monoliths Create a sane app structure. Kill off 'import *' and let PyLint do its thing. Migrate models to their new homes. Everything else will follow.

Slide 36

Slide 36 text

Migrating Models Across Apps If the project is pre-production, you can get away with automatic schema migrations. Using South A great discussion of techniques is on StackOverflow: http://stackoverflow.com/questions/1258130/how-do-i-migrate-a- model-out-of-one-django-app-and-into-a-new-one

Slide 37

Slide 37 text

The "Every Model is an App" Pattern "I like my INSTALLED_APPS to be a list of database tables."

Slide 38

Slide 38 text

The "Every Model is an App" Project This type of structure is another organizational nightmare, for similar reasons to a monolith but due to an opposite design pattern. You cannot succinctly evaluate the functionality of any one component. Nothing is portable. You'll get very acquainted with: Traceback (most recent call last): ... File "/app1/foo.py", line 1, in from app2 import bar File "/app2/bar.py", line 1, in from app1 import foo ImportError: cannot import name foo

Slide 39

Slide 39 text

The "Every Model is an App" Project Follow the same steps as a monolith. The challenges are the same; create an organized structure, migrate models across apps into their new home, and everything else follows. This should be less work than a monolith, as one of your apps will become the 'host' for others.

Slide 40

Slide 40 text

The "Receivers Everywhere!" Pattern It's like a Hail Mary.

Slide 41

Slide 41 text

Signals/Receivers Are Useful Signals allow for a decoupling of models which initiate side effects. They are especially useful for triggering side effects in 3rd party code. Signals are great for maintaining data integrity.

Slide 42

Slide 42 text

... But Not Always Signal receivers can hide important functionality from developers. Signal receivers can become unnecessary abstraction between two models in the same application. Signal receivers can implement over-zealous business logic.

Slide 43

Slide 43 text

Getting Rid of Superfluous Signal Receivers Be on the lookout for: Signal receivers that interact between two database related models. Signal receivers in which the instance simply operates on itself. class MyModel(models.Model): # ... def save(self, *args, **kwargs): # Pre-save! self.some_property = True super(MyModel, self).save(*args, **kwargs) # Post-save! self.foreign_relation.some_poperty = True self.foreign_relation.save()

Slide 44

Slide 44 text

Getting Rid of Superfluous Signal Receivers Be on the lookout for: Signal receivers that implement transactional or business logic. class UserReceiver(ModelReceiver): @staticmethod def pre_save(sender, **kwargs): instance = kwargs['instance'] instance.notify = instance.pk is None @staticmethod def post_save(sender, **kwargs): instance = kwargs['instance'] if instance.notify: NewUserEmail().send(extra_context={'user': instance,}, to=constants.NOTIFY_EMAILS)

Slide 45

Slide 45 text

The "Context Processors/Middleware Are the Coolest Thing Ever" Pattern Taking DRY to the Max

Slide 46

Slide 46 text

... (reminds me of when I discovered list comprehensions in Python) def publish_all(): # NOT SURE WHY IM DOING THIS # BUT IT IS COOL AS HELL! [p.publish() for p in Post.objects.all()]

Slide 47

Slide 47 text

Context Processors Context Processors are widely abused and can quickly impair performance. You should evaluate each context processor in the project and see if:

Slide 48

Slide 48 text

1. The context processor is better suited as a templatetag. # in myapp.context_processors.py from myapp.utils import products_by_category def products(request): return {'products_by_category': products_by_category()} # in myapp.templatetags from django import template from myapp.utils import products_by_category register = template.Library() @register.assignment_tag(takes_context=True) def get_products_by_category(context): return products_by_category() Assignment Tags are effective here.

Slide 49

Slide 49 text

2. The context processor logic should be refactored into relevant views. # in myapp.context_processors.py from myapp.utils import products_by_category def products(request): return {'products_by_category': products_by_category()} # in myapp.views from myapp.utils import products_by_category class MyView(DetailView): def get_context_data(self, **kwargs): context = super(MyView, self).get_context_data(kwargs) context.update({'products_by_category': products_by_category()}) return context

Slide 50

Slide 50 text

Middleware Not as widely abused as Context Processors, but potentially have larger performance implications. Do I need this logic to execute on every single request and/or response? If not, you should engineer a way around.

Slide 51

Slide 51 text

The Undocumented Black Box Full of Black Boxes # Who put all these hashtags in my code?

Slide 52

Slide 52 text

The only thing worse than code with no comments... def get_classification(self): if self.classification and len(self.classification) == 2: level = CLASSIFICATIONS.get(self.classification[0], '') class_level = self.classification[1] if class_level == 'U': class_level = '' else: class_level = 'class %s '%class_level string = '%s%s'%(class_level, level) return dict({'level':level, 'class': class_level, 'string':string,}) else: return None

Slide 53

Slide 53 text

... is code with just enough comments to piss you off def get_classification(self): if self.classification and len(self.classification) == 2: level = CLASSIFICATIONS.get(self.classification[0], '') class_level = self.classification[1] if class_level == 'U': class_level = '' else: class_level = 'class %s '%class_level string = '%s%s'%(class_level, level) # returns dict return dict({'level':level, 'class': class_level, 'string':string,}) else: return None

Slide 54

Slide 54 text

There's No Easy Way Out of Undocumented Code You could try and brute force your way through it, but that's impractical.

Slide 55

Slide 55 text

Document as You Go Make it policy: If you edit the code, you document the code. If you have to dissect code to understand it, comment what you find.

Slide 56

Slide 56 text

Document as You Go Every module, function, class, and method get a docstring. No exceptions! These are the workhorses of your documentation effort. Break apart long, multi-step logical branches with comments. If a statement is difficult read or has some sort of ambiguity, leave a comment!

Slide 57

Slide 57 text

Other Gotchas to Watch For Hard-coded URLs Try and chase these down before refactoring. Broad Try/Except Blocks try: except: pass <-- This ain't @PHP VCS Requirements Fork 'em! Whoever owns the project should own the repo. Misnamed Concepts Bite the bullet and rename them. Premature Configuration Pre-configured code overcomplicates deployments.

Slide 58

Slide 58 text

Key Concepts Tests first. Dependencies introduce complexity. Organization is important. Watch for hidden signals and receivers. Audit context processors and middleware. Document as you go.

Slide 59

Slide 59 text

Thank You! http://hirelofty.com/djangocon14 Casey Kinsey [email protected] @cordiskinsey