Divide and divide – Refactoring code in Python

Nino R. Eclarin

- Anything Developer @ Hacarus
- Backyard astronomer
- Music enthusiast
- Physics and astronomy are <3
- I love data!
- Rick and Morty FTW
- Curious, always.

Call me Ninz!
GitHub: @pprmint
Medium: @ninzz

“Any code of your own that you haven't looked at for six or more months might as well have been written by someone else.”
-- Eagleson's Law

Let’s start!
• What is refactoring?
• How does it affect code quality?
• Do you need to refactor?
  ◦ When do you need to refactor?
  ◦ Setting goals
  ◦ Refactoring timeline
• Refactoring approach
  ◦ Looking for affected code
  ◦ Common design patterns and solutions
  ◦ Single Responsibility Principle
• Results assessment
  ◦ Is it effective?
  ◦ Showing the improvements
  ◦ Future implications of refactoring

Refactoring
The process of restructuring existing computer code without changing its external behavior. (Wikipedia)
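As a small illustration of "without changing its external behavior", here is a minimal sketch. The functions `total_price_before` and `total_price_after` and the `cart` data are hypothetical and not from the talk. The body changes, but callers get the same result.

```python
def total_price_before(items):
    # Original version: manual loop with an accumulator.
    total = 0
    for item in items:
        total = total + item['price'] * item['qty']
    return total


def total_price_after(items):
    # Refactored version: same inputs, same output, shorter body.
    return sum(item['price'] * item['qty'] for item in items)


# Hypothetical sample data.
cart = [{'price': 10, 'qty': 2}, {'price': 5, 'qty': 1}]

# External behavior is unchanged, so both versions agree.
assert total_price_before(cart) == total_price_after(cart) == 25
```

A check like the last line is a cheap safety net to keep around while refactoring.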

Refactoring
• Making code pretty (fresh graduate)
• Cleaning up the mess of the first developers (junior devs)
• Jumping into the abyss (senior dev)
• Just don’t break any existing tests (QA)
• Do we have time for that? (project managers/leads)
• It’s okay, I guess, ermmm, can you add this new feature? (product owners)
• Re . . . what??!! (executives)

Code Quality Initial code base

Code Quality Our goal as developers

Code Quality ==

Code Quality
- Makes the code more readable and easier to maintain
- Easy to test
- Faster integration

Do you need to refactor?
YES

Do you need to refactor?
AT SOME POINT

All of us at some point refactor our code, sometimes unknowingly. Why is it needed?
(Image: me looking at my old code)

Do you need to refactor?
Watch out for the following:
• Repeating logic in processes/use cases
• Difficulty extending a feature/function/class
• Tests breaking
• New devs having a hard time onboarding

Do you need to refactor?
Goals in refactoring
• Improve code quality: make it easier for new people to understand the code
• Make testing easier
• Make integrations faster and more efficient
• Make the system more modular and flexible

Do you need to refactor?
Schedule your refactoring
• Check with leads/PM/PO for possible time allotment
• Do not refactor too early! (Usually a big mistake; it wastes a lot of time.)
• Do not refactor too late! (It causes a lot of technical debt.)
• Small, incremental refactoring is ideal.

Refactoring approach

Simple Flask Application

from flask import Flask, abort, make_response

app = Flask(__name__)


@app.route('/one')
def one():
    return 'This is a success.'


@app.route('/two')
def two():
    print('This will return a 404 error')
    abort(404)


@app.route('/three')
def three():
    print('This will return a 500 error')
    return make_response('Error', 500)


if __name__ == '__main__':
    app.run(debug=True)

Looking for affected code
Routes “/two” and “/three” do the same kind of thing: both return an error, that is, a response other than 200. Each route builds its error response in its own way, so the same behaviour is implemented twice.

Class Usage

class Rick:
    name = 'Rick'
    dimension = 'c137'

    def wubulubudubdub(self):
        print('I am not okay')

    def show_dimension(self):
        print('I am {} from dimension {}'.format(self.name, self.dimension))


class Morty:
    name = 'Morty'
    dimension = 'c137'

    def awwwww_geeez(self):
        print('Geeeezz Rick!')

    def show_dimension(self):
        print('I am {} from dimension {}'.format(self.name, self.dimension))


if __name__ == '__main__':
    Rick().show_dimension()
    Morty().show_dimension()

Looking for affected code
Both classes define the same method, “show_dimension”. Cases like this are the easy ones to spot because they are literally code “duplicates”.

Filter-based operation

import datetime


def type_based_operation(data):
    if type(data) == str:
        return 'The string is: {}'.format(data)
    elif type(data) == int:
        return 100 + data
    elif type(data) == datetime.datetime:
        return datetime.datetime.now()
    return


print(type_based_operation('Hello'))
print(type_based_operation(200))
print(type_based_operation(datetime.datetime(2018, 1, 1)))

Looking for affected code
Chains of conditionals can often be converted to hash-based access (a dictionary lookup), which can run faster when there are many branches. I will show a more detailed example.

Data formatting

# Result from MySQL database
ninjas = [
    {'name': 'Naruto', 'team': 7, 'skill': 'bunshin'},
    {'name': 'Ino', 'team': 4, 'skill': 'mind transfer'},
    {'name': 'Sakura', 'team': 7, 'skill': 'strength'},
    {'name': 'hinata', 'team': 2, 'skill': 'byakugan'},
    {'name': 'Neji', 'team': 3, 'skill': 'byakugan'},
    # ....
]

Data formatting

[
    {
        'members': [
            {'name': 'Naruto', 'team': 7, 'skill': 'bunshin'},
            {'name': 'Sakura', 'team': 7, 'skill': 'strength'},
            {'name': 'Sasuke', 'team': 7, 'skill': 'sharingan'}
        ],
        'member_count': 3
    },
    {
        'members': [
            {'name': 'Ino', 'team': 4, 'skill': 'mind transfer'},
            {'name': 'Choji', 'team': 4, 'skill': 'expansion'}
        ],
        'member_count': 2
    },
    # . . . .
]

Data formatting

teams = []
temp_dict = {}
for row in ninjas:
    if row['team'] not in temp_dict:
        temp_dict[row['team']] = {
            'members': [],
            'member_count': 0
        }
    temp_dict[row['team']]['members'].append(row)
    temp_dict[row['team']]['member_count'] += 1

for team in temp_dict.values():
    teams.append(team)

print(teams)

Looking for affected code
When dealing with formatting or other similar kinds of data handling, it is important to be explicit about the assumptions you make about the data. Always consider the following:
• Data structure
• Ordering of data
• Type of data
• Size of data
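One way to make such assumptions explicit is to check them in code before formatting. The sketch below is not from the talk: `check_rows` and its `required_keys` parameter are hypothetical names, and it only covers structure and type (ordering and size would need their own checks).

```python
def check_rows(rows, required_keys=('name', 'team', 'skill')):
    """Return a list of problems; an empty list means the rows look as expected."""
    problems = []
    for i, row in enumerate(rows):
        # Structure: every row should be a dictionary.
        if not isinstance(row, dict):
            problems.append('row {} is not a dict'.format(i))
            continue
        # Structure: every row should carry the keys the formatter reads.
        missing = [key for key in required_keys if key not in row]
        if missing:
            problems.append('row {} is missing {}'.format(i, missing))
        # Type: the grouping key is assumed to be an integer.
        elif not isinstance(row['team'], int):
            problems.append('row {} has a non-integer team'.format(i))
    return problems


# Hypothetical sample rows, two of them deliberately broken.
rows = [
    {'name': 'Naruto', 'team': 7, 'skill': 'bunshin'},
    {'name': 'Ino', 'team': '4', 'skill': 'mind transfer'},  # team is a string
    {'name': 'Neji', 'skill': 'byakugan'},                   # team is missing
]
for problem in check_rows(rows):
    print(problem)
```

Running it reports the second and third rows, which is usually easier to debug than a KeyError deep inside the formatting loop.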

Design Patterns and Solutions Running away from all the bugs.

Design Patterns and Solutions
1) Create a custom error response class with Exception as its parent class.
2) Use Flask's @app.errorhandler(ErrorResponse) decorator to catch the custom exception in one place.
3) Raise the custom exception wherever a route needs to return an error.

Solution - Flask App

from flask import Flask, abort, make_response, jsonify

app = Flask(__name__)


class ErrorResponse(Exception):
    status_code = 400

    def __init__(self, message, status_code=None, payload=None):
        Exception.__init__(self)
        self.message = message
        if status_code is not None:
            self.status_code = status_code
        self.payload = payload

    def to_dict(self):
        return dict(code=self.status_code, message=self.message, data=self.payload)


@app.errorhandler(ErrorResponse)
def exception_encountered(error):
    error = error.to_dict()
    # You can modify this to return any kind of error
    # You can return an error page
    # You can use json return for pure API
    return make_response(jsonify(error), error['code'])

Solution - Flask App

@app.route('/one')
def one():
    return 'This is a success.'


@app.route('/two')
def two():
    raise ErrorResponse('This will be 404', status_code=404)


@app.route('/three')
def three():
    raise ErrorResponse('This will be a 500 error',
                        payload='With additional data', status_code=500)


if __name__ == '__main__':
    app.run(debug=True)

Design Patterns and Solutions
1) You can group the common methods of different classes by creating a parent class that defines them.
2) Utilize class features such as interfaces, subclasses and singleton patterns. Singletons, for example, are useful for storing state in many kinds of applications.
3) Method overriding and method overloading. (Python has no signature-based overloading; default arguments or functools.singledispatch usually fill that role.)

Solution - Class Usage

class InfoFormatter:
    def show_dimension(self):
        print('I am {} from dimension {}'.format(self.name, self.dimension))


class Rick(InfoFormatter):
    name = 'Rick'
    dimension = 'c137'

    def wubulubudubdub(self):
        print('I am not okay')


class Morty(InfoFormatter):
    name = 'Morty'
    dimension = 'c137'

    def awwwww_geeez(self):
        print('Geeeezz Rick!')


if __name__ == '__main__':
    Rick().show_dimension()
    Morty().show_dimension()
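Method overriding builds on the same parent class idea. The sketch below is an illustration, not code from the talk: the `describe` method, the `EvilMorty` class and the default attribute values are hypothetical. `describe` returns a string instead of printing so the result is easy to check.

```python
class InfoFormatter:
    # Defaults so the shared method works even if a subclass omits them.
    name = 'Unknown'
    dimension = 'unknown'

    def describe(self):
        # Shared behavior lives in the parent class.
        return 'I am {} from dimension {}'.format(self.name, self.dimension)


class Rick(InfoFormatter):
    name = 'Rick'
    dimension = 'c137'


class EvilMorty(InfoFormatter):
    name = 'Morty'

    def describe(self):
        # Override: reuse the parent's output, then extend it.
        return super().describe() + ' (or am I?)'


print(Rick().describe())       # I am Rick from dimension c137
print(EvilMorty().describe())  # I am Morty from dimension unknown (or am I?)
```

Only the class that needs different behavior carries extra code; everything else still comes from the parent.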

Design Patterns and Solutions
1) Use a dictionary to store the lambda function needed for each data type.
2) Utilize dictionaries and mappings for faster access and filtering. In Python, there are several “hashable” data types that you can use as dictionary keys.
This is a basic example, but the approach has big potential, especially when dealing with much more complex problems.

Solution - Type based operation

import datetime

type_based_operation = dict()
type_based_operation[str] = lambda data: 'The string is: {}'.format(data)
type_based_operation[int] = lambda data: 100 + data
type_based_operation[datetime.datetime] = lambda data: datetime.datetime.now()

dt = datetime.datetime(2018, 1, 1)
print(type_based_operation[type('Hello')]('Hello'))
print(type_based_operation[type(200)](200))
print(type_based_operation[type(dt)](dt))
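One difference from the if/elif version is worth noting: indexing the dictionary with a type that has no entry raises KeyError, while the original function returned None. A minimal sketch of one way to keep the old behavior follows; the `OPERATIONS` name and the wrapper function are assumptions for illustration, not part of the talk's code.

```python
import datetime

# Same mapping idea, kept in a module-level dictionary (hypothetical name).
OPERATIONS = {
    str: lambda data: 'The string is: {}'.format(data),
    int: lambda data: 100 + data,
    datetime.datetime: lambda data: datetime.datetime.now(),
}


def type_based_operation(data):
    # Look up the handler by exact type. Unknown types fall back to a
    # handler that returns None, like the bare `return` in the if/elif code.
    handler = OPERATIONS.get(type(data), lambda _data: None)
    return handler(data)


print(type_based_operation('Hello'))  # The string is: Hello
print(type_based_operation(200))      # 300
print(type_based_operation(3.5))      # None
```

Callers keep the plain function call, and supporting a new type becomes one more dictionary entry.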

Design Patterns and Solutions
A more complex application of the hash-based approach is the ninja grouping from the Data formatting slides earlier: the rows from MySQL are collected in a temporary dictionary keyed by 'team', and the values of that dictionary become the list of teams, each with its 'members' and 'member_count'.
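The same dictionary-based grouping can be written a little more compactly with collections.defaultdict from the standard library. This is a sketch with a shortened sample of the ninja rows; the `members_by_team` name is an assumption, and the output shape follows the Data formatting example.

```python
from collections import defaultdict

# Shortened sample of the rows from the Data formatting example.
ninjas = [
    {'name': 'Naruto', 'team': 7, 'skill': 'bunshin'},
    {'name': 'Ino', 'team': 4, 'skill': 'mind transfer'},
    {'name': 'Sakura', 'team': 7, 'skill': 'strength'},
]

# Each new team key starts with an empty member list automatically.
members_by_team = defaultdict(list)
for row in ninjas:
    members_by_team[row['team']].append(row)

# member_count is derived from the list instead of being tracked by hand.
teams = [
    {'members': members, 'member_count': len(members)}
    for members in members_by_team.values()
]
print(teams)
```

Deriving the count with len() removes one piece of state that could drift out of sync with the member list.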

Design Patterns and Solutions
• Oftentimes, code structure depends on the data it handles
• Data structure, arrangement and data type affect the speed of your code
• Know the best data structures for different types of tasks, e.g. trees for search
• Sometimes there is a tradeoff between speed and aesthetics

Data formatting

teams = []

# The rows are assumed to arrive already sorted by team from MySQL.
# This sort only simulates that for the example, so ignore it.
ninjas = sorted(ninjas, key=lambda x: x['team'])

current_team = None
for row in ninjas:
    if current_team != row['team']:
        # A new team starts here, so open a new entry.
        current_team = row['team']
        teams.append({
            'members': [],
            'member_count': 0
        })
        index = len(teams) - 1
    teams[index]['members'].append(row)
    teams[index]['member_count'] += 1

print(teams)
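When the rows really are sorted by team, itertools.groupby from the standard library expresses the same single pass. This is a sketch on a shortened sample, not the talk's code; the `by_team` helper name is an assumption.

```python
from itertools import groupby
from operator import itemgetter

# Shortened sample of the rows from the Data formatting example.
ninjas = [
    {'name': 'Naruto', 'team': 7, 'skill': 'bunshin'},
    {'name': 'Ino', 'team': 4, 'skill': 'mind transfer'},
    {'name': 'Sakura', 'team': 7, 'skill': 'strength'},
]

by_team = itemgetter('team')
teams = []
# groupby only merges adjacent rows with equal keys, so the input must be
# sorted by the same key first (or already arrive sorted from the database).
for _team, rows in groupby(sorted(ninjas, key=by_team), key=by_team):
    members = list(rows)
    teams.append({'members': members, 'member_count': len(members)})

print(teams)
```

If the sort is forgotten and the data is unsorted, the same team can show up as several separate groups, which is exactly the kind of ordering assumption worth writing down.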

Other examples
Using “shortcuts” and language quirks.

# Merge two dictionaries using unpacking
a = {'name': 'john', 'age': 25}
b = {'weight': 55, 'height': 160}
c = {**a, **b}
# c == {'name': 'john', 'age': 25, 'weight': 55, 'height': 160}
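One detail to keep in mind with this shortcut is what happens when both dictionaries share a key. A small sketch, with hypothetical `defaults` and `overrides` dictionaries:

```python
# Hypothetical configuration dictionaries that share the 'port' key.
defaults = {'host': 'localhost', 'port': 8000}
overrides = {'port': 9000}

# On duplicate keys, the value from the right-most dictionary wins.
settings = {**defaults, **overrides}
print(settings)  # {'host': 'localhost', 'port': 9000}

# Unpacking builds a new dictionary, so the inputs are left untouched.
assert defaults['port'] == 8000
```

So the order of the unpacked dictionaries matters whenever keys overlap.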

Else on loops

chars = ['a', 'b', 'c', 'd', 'e', 'f']
look_for = 'n'

# Usual implementation
found = False
for c in chars:
    if look_for == c:
        print('Found you!')
        found = True
        break
if not found:
    print('Not found "{}"'.format(look_for))

# Use Else
for c in chars:
    if look_for == c:
        print('Found you!')
        break
else:
    print('Not found "{}"'.format(look_for))

List operations

# Prefer list comprehensions over map(), filter(), etc. where they fit.
# They are usually easier to read and often faster than map() with a lambda.

# Using map()
num = [1, 2, 3]
doubles = map(lambda x: x * 2, num)

# Using a list comprehension
num = [1, 2, 3]
doubles = [x * 2 for x in num]
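Apart from speed, the two forms also behave differently in Python 3: map() returns a one-shot iterator, while a comprehension builds a list right away. A short sketch of the difference (the variable names are illustrative only):

```python
num = [1, 2, 3]

lazy_doubles = map(lambda x: x * 2, num)  # a one-shot iterator in Python 3
doubles = [x * 2 for x in num]            # a real list

print(list(lazy_doubles))  # [2, 4, 6]
print(list(lazy_doubles))  # [] because the iterator is already exhausted
print(doubles)             # [2, 4, 6], and it can be reused freely

# filter() has a comprehension equivalent too.
evens = [x for x in num if x % 2 == 0]
print(evens)               # [2]
```

Forgetting that a map object can only be consumed once is a common source of "empty result" bugs after a refactor.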

Single Responsibility Principle
(Diagram: Module or Class; Functionality or Feature)

Single Responsibility Principle
- One function/class/module should handle only one responsibility. E.g. an add() function should only do addition.
- Keep each responsibility/use case as small as possible.
- Each function/class should integrate cleanly with the others. E.g. add(), subtract() and multiply() should be able to combine into a calculator() function.
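The calculator example above can be sketched as follows. The function names come from the slide; the `OPERATIONS` mapping, the `symbol` parameter and the error handling are assumptions added for illustration.

```python
def add(a, b):
    return a + b


def subtract(a, b):
    return a - b


def multiply(a, b):
    return a * b


# Hypothetical mapping from an operator symbol to the function that handles it.
OPERATIONS = {'+': add, '-': subtract, '*': multiply}


def calculator(a, symbol, b):
    # calculator() only routes to the right operation;
    # the math itself stays in the small functions above.
    try:
        operation = OPERATIONS[symbol]
    except KeyError:
        raise ValueError('Unsupported operator: {}'.format(symbol))
    return operation(a, b)


print(calculator(2, '+', 3))  # 5
print(calculator(7, '-', 4))  # 3
print(calculator(6, '*', 7))  # 42
```

Each small function can be tested on its own, and adding division later means one new function and one new dictionary entry.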

Is code refactoring worth it?
YES

Is code refactoring worth it?
MAYBE?

Measuring effectiveness
Look at your goals.
• Did your code quality improve?
• Is it much more readable?
• Is your testing a lot better? No tests should break randomly.
• Is it easier to expand and integrate your code?
• Is the overall system more modular?

Showing off
It’s important that other people in your company understand what the changes are and how they improved the current system. Organize a small brown bag session to discuss the improvements.

Showing off
• Do not forget to DOCUMENT the changes.
• Create benchmarks to show quantitative data of the improvements. You can use http://pyperformance.readthedocs.io
• Ask your leads/PM to inform middle managers about the improvements.
• If you created a new library as a result of refactoring, do not forget to tell other developers that the library or tool exists.
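For a quick first number before reaching for a full benchmark suite, the standard library's timeit module is often enough. The sketch below compares simplified versions of the if/elif and dictionary approaches from earlier; the function names `with_conditionals` and `with_dictionary` and the `HANDLERS` mapping are hypothetical, and the timings will vary by machine, so no expected numbers are shown.

```python
import timeit


def with_conditionals(data):
    # Simplified if/elif version of the type-based operation.
    if type(data) == str:
        return 'The string is: {}'.format(data)
    elif type(data) == int:
        return 100 + data
    return None


HANDLERS = {
    str: lambda data: 'The string is: {}'.format(data),
    int: lambda data: 100 + data,
}


def with_dictionary(data):
    # Dictionary-based version of the same operation.
    handler = HANDLERS.get(type(data))
    return handler(data) if handler else None


# Check that behavior matches before comparing speed.
assert with_conditionals(200) == with_dictionary(200)

for func in (with_conditionals, with_dictionary):
    seconds = timeit.timeit(lambda: func(200), number=100000)
    print('{}: {:.4f}s for 100,000 calls'.format(func.__name__, seconds))
```

Measure on realistic inputs too: with only a few branches the two versions may be close, and the numbers are what make the case to others.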

Going forward
• Have a regular code assessment and review. Dedicate time once a month.
• Make sure your leads and managers understand the importance of a clean codebase.
• Early refactoring wastes a lot of time.
• Refactoring too late incurs a lot of technical debt.
• Use language features and design patterns as much as possible.

You can follow my articles on https://medium.com/@ninzz. I try to write stuff at least once a month.

All code examples can be found at https://github.com/pprmint/pycon_kr

If you are interested in AI and healthcare technology, you can check us out at Hacarus. We use Sparse Modeling instead of the traditional ML/Deep Learning approaches.
https://hacarus.com/
https://hacarus.com/information/tech/sparse-modeling-for-it-engineers/

HAPPY REFACTORING! Questions?

Resources
• https://github.com/pprmint
• http://pyperformance.readthedocs.io
• https://medium.com/@ninzz/divide-and-divide-refactoring-code-at-hacarus-caab8e757559
• https://docs.quantifiedcode.com/python-anti-patterns/
• https://medium.com/unbabel-dev/refactoring-a-python-codebase-using-the-single-responsibility-principle-ed1367baefd6
