Divide and divide – Refactoring code in Python

Slides presented at PyCon Korea 2018.
- Why and when to refactor
- How to approach refactoring
- Good and bad ways of refactoring
- Python design patterns and library features

Hacarus Inc.

August 18, 2018

Transcript

  1. Call me Ninz!

    - Anything Developer @ Hacarus
    - Backyard astronomer
    - Music enthusiast
    - Physics and astronomy is <3
    - I love data!
    - Rick and Morty FTW
    - Curious, always.

    Github: @pprmint
    Medium: @ninzz
  2. “Any code of your own that you haven't looked at for six or more months might as well have been written by someone else.” -- Eagleson's Law

    Let’s start!

    • What is refactoring?
    • How does it affect code quality?
    • Do you need to refactor?
      ◦ When do you need to refactor?
      ◦ Setting goals
      ◦ Refactoring timeline
    • Refactoring approach
      ◦ Looking for affected code
      ◦ Common design patterns and solutions
      ◦ Single Responsibility Principle
    • Results assessment
      ◦ Is it effective?
      ◦ Showing the improvements
      ◦ Future implications of refactoring
  3. Refactoring

    • Making code pretty (fresh graduate)
    • Cleaning up the mess of the first developers (junior devs)
    • Jumping into the abyss (senior dev)
    • Just don’t break any existing tests (QA)
    • Do we have time for that? (project managers/leads)
    • It’s okay, I guess, ermmm, can you add this new feature? (product owners)
    • Re . . . what??!! (executives)
  4. All of us at some point refactor our code, sometimes unknowingly. Why is it needed?

    (Image caption: Me looking at my old code)
  5. Do you need to refactor?

    Watch out for the following:

    • Repeated logic across processes/use cases
    • Difficulty extending a feature/function/class
    • Tests breaking
    • New devs having a hard time onboarding
  6. Do you need to refactor?

    Goals in refactoring:

    • Improve code quality: make it easier for new people to understand the code
    • Make testing easier
    • Make integrations faster and more efficient
    • Make the system more modular and flexible
  7. Do you need to refactor?

    Schedule your refactoring:

    • Check with leads/PM/PO for possible time allotment
    • Do not refactor too early! (Usually a big mistake; it wastes a lot of time)
    • Do not refactor too late! (It builds up a lot of technical debt)
    • Small, incremental refactoring is ideal.
  8. Simple Flask Application

        from flask import Flask, abort, make_response

        app = Flask(__name__)


        @app.route('/one')
        def one():
            return 'This is a success.'


        @app.route('/two')
        def two():
            print('This will return a 404 error')
            abort(404)


        @app.route('/three')
        def three():
            print('This will return a 500 error')
            return make_response('Error', 500)


        if __name__ == '__main__':
            app.run(debug=True)
  9. Looking for affected code

    Routes “/two” and “/three” do the same thing: both return an error response, i.e. a status other than 200. They share the same behaviour, written in two different ways.
  10. Class Usage

        class Rick:
            name = 'Rick'
            dimension = 'c137'

            def wubulubudubdub(self):
                print('I am not okay')

            def show_dimension(self):
                print('I am {} from dimension {}'.format(self.name, self.dimension))


        class Morty:
            name = 'Morty'
            dimension = 'c137'

            def awwwww_geeez(self):
                print('Geeeezz Rick!')

            def show_dimension(self):
                print('I am {} from dimension {}'.format(self.name, self.dimension))


        if __name__ == '__main__':
            Rick().show_dimension()
            Morty().show_dimension()
  11. Looking for affected code

    Both classes use the same method, “show_dimension”. These are the easy ones to spot because they are literal code duplicates.
  12. Filter-based operation

        import datetime


        def type_based_operation(data):
            if type(data) == str:
                return 'The string is: {}'.format(data)
            elif type(data) == int:
                return 100 + data
            elif type(data) == datetime.datetime:
                return datetime.datetime.now()
            return


        print(type_based_operation('Hello'))
        print(type_based_operation(200))
        print(type_based_operation(datetime.datetime(2018, 1, 1)))
  13. Looking for affected code

    Chains of conditionals can often be converted to hash-based (dictionary) lookups, which can run faster when there are many branches and are easier to extend. A more detailed example follows.
  14. Data formatting

        # Result from MySQL database
        ninjas = [
            {'name': 'Naruto', 'team': 7, 'skill': 'bunshin'},
            {'name': 'Ino', 'team': 4, 'skill': 'mind transfer'},
            {'name': 'Sakura', 'team': 7, 'skill': 'strength'},
            {'name': 'hinata', 'team': 2, 'skill': 'byakugan'},
            {'name': 'Neji', 'team': 3, 'skill': 'byakugan'},
            ....
        ]
  15. Data formatting

        [
            {
                'members': [
                    {'name': 'Naruto', 'team': 7, 'skill': 'bunshin'},
                    {'name': 'Sakura', 'team': 7, 'skill': 'strength'},
                    {'name': 'Sasuke', 'team': 7, 'skill': 'sharingan'}
                ],
                'member_count': 3
            },
            {
                'members': [
                    {'name': 'Ino', 'team': 4, 'skill': 'mind transfer'},
                    {'name': 'Choji', 'team': 4, 'skill': 'expansion'}
                ],
                'member_count': 2
            },
            . . . .
        ]
  16. Data formatting

        teams = []
        temp_dict = {}

        for row in ninjas:
            if row['team'] not in temp_dict:
                temp_dict[row['team']] = {
                    'members': [],
                    'member_count': 0
                }
            temp_dict[row['team']]['members'].append(row)
            temp_dict[row['team']]['member_count'] += 1

        for team in temp_dict.values():
            teams.append(team)

        print(teams)
  17. Looking for affected code

    When dealing with formatting or similar kinds of data handling, it is important to know which assumptions you can make about the data. Always consider the following:

    • Data structure
    • Ordering of data
    • Type of data
    • Size of data
  18. Simple Flask Application (the same code as slide 8, repeated before the solution)
  19. Design Patterns and Solutions

    1) Create a custom error response class with Exception as its parent class.
    2) Use Flask's @app.errorhandler(ErrorResponse) decorator to catch the custom exception.
    3) Raise the exception wherever an error response is needed.
  20. Solution - Flask App

        from flask import Flask, abort, make_response, jsonify

        app = Flask(__name__)


        class ErrorResponse(Exception):
            status_code = 400

            def __init__(self, message, status_code=None, payload=None):
                Exception.__init__(self)
                self.message = message
                if status_code is not None:
                    self.status_code = status_code
                self.payload = payload

            def to_dict(self):
                return dict(code=self.status_code,
                            message=self.message,
                            data=self.payload)


        @app.errorhandler(ErrorResponse)
        def exception_encountered(error):
            error = error.to_dict()
            # You can modify this to return any kind of error
            # You can return an error page
            # You can use json return for pure API
            return make_response(jsonify(error), error['code'])
  21. Solution - Flask App

        @app.route('/one')
        def one():
            return 'This is a success.'


        @app.route('/two')
        def two():
            raise ErrorResponse('This will be 404', status_code=404)


        @app.route('/three')
        def three():
            raise ErrorResponse('This will be a 500 error',
                                payload='With additional data',
                                status_code=500)


        if __name__ == '__main__':
            app.run(debug=True)
  22. Class Usage (the same code as slide 10, repeated before the solution)
  23. Design Patterns and Solutions

    1) Group methods/functions shared by different classes by creating a parent class that defines them.
    2) Use class features such as interfaces, subclasses and singleton patterns. Singletons, for example, are very useful for storing state in any kind of application.
    3) Method overriding and method overloading
  24. Solution - Class Usage

        class InfoFormatter:
            def show_dimension(self):
                print('I am {} from dimension {}'.format(self.name, self.dimension))


        class Rick(InfoFormatter):
            name = 'Rick'
            dimension = 'c137'

            def wubulubudubdub(self):
                print('I am not okay')


        class Morty(InfoFormatter):
            name = 'Morty'
            dimension = 'c137'

            def awwwww_geeez(self):
                print('Geeeezz Rick!')


        if __name__ == '__main__':
            Rick().show_dimension()
            Morty().show_dimension()
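
Method overriding, mentioned on the preceding solutions slide, fits the same parent class. A minimal sketch, assuming a hypothetical `describe()` helper and a hypothetical `EvilMorty` subclass, neither of which appears in the original deck:

```python
class InfoFormatter:
    """Parent class holding the behaviour shared by every character."""

    def describe(self):
        # Hypothetical helper: returns the text so it can be reused or tested.
        return 'I am {} from dimension {}'.format(self.name, self.dimension)

    def show_dimension(self):
        print(self.describe())


class Morty(InfoFormatter):
    name = 'Morty'
    dimension = 'c137'


class EvilMorty(Morty):
    # Hypothetical subclass, used only to illustrate overriding.
    name = 'Evil Morty'

    def describe(self):
        # Override: extend the parent's text instead of copying it.
        return super().describe() + ' (or so I claim)'


if __name__ == '__main__':
    Morty().show_dimension()
    EvilMorty().show_dimension()
```

Because `show_dimension()` calls `self.describe()`, the subclass changes the output without redefining the printing logic.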
  25. Filter-based operation (the same code as slide 12, repeated before the solution)
  26. Design Patterns and Solutions

    1) Use a dictionary to store the lambda function needed for each data type.
    2) Use dictionaries and mappings for fast lookup and filtering. In Python, any “hashable” object (strings, numbers, tuples, even types) can be a dictionary key.

    This is a basic example, but the approach has big potential when dealing with much more complex problems.
  27. Solution - Type based operation

        import datetime

        type_based_operation = dict()
        type_based_operation[str] = lambda data: 'The string is: {}'.format(data)
        type_based_operation[int] = lambda data: 100 + data
        type_based_operation[datetime.datetime] = lambda data: datetime.datetime.now()

        dt = datetime.datetime(2018, 1, 1)
        print(type_based_operation[type('Hello')]('Hello'))
        print(type_based_operation[type(200)](200))
        print(type_based_operation[type(dt)](dt))
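
One caveat with the exact-type lookup above: a type with no entry (for example `float`, or `bool`, whose type is not `int`) raises `KeyError`, whereas the if/elif version returned `None`. A minimal sketch of a wrapper that keeps that fallback; the names `OPERATIONS` and `apply_operation` are assumptions, not part of the original deck:

```python
import datetime

# Hypothetical name for the dispatch table from the slide.
OPERATIONS = {
    str: lambda data: 'The string is: {}'.format(data),
    int: lambda data: 100 + data,
    datetime.datetime: lambda data: datetime.datetime.now(),
}


def apply_operation(data):
    """Run the operation registered for type(data), or return None."""
    operation = OPERATIONS.get(type(data))
    if operation is None:
        # Same result as the bare `return` at the end of the if/elif chain.
        return None
    return operation(data)


if __name__ == '__main__':
    print(apply_operation('Hello'))
    print(apply_operation(200))
    print(apply_operation(3.14))  # no float entry, so None
```

Adding support for a new type then means adding one dictionary entry instead of another `elif` branch.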
  28. Data formatting (the same MySQL result rows as slide 14)
  29. Data formatting (the same target structure as slide 15)
  30. Data formatting (the same grouping code as slide 16)
  31. Design Patterns and Solutions

    • Oftentimes, code structure depends on the data it handles.
    • Data structure, ordering and data type affect the speed of your code.
    • Know the best data structure for each type of task, e.g. trees for search.
    • Sometimes there is a tradeoff between speed and aesthetics.
  32. Data formatting

        teams = []

        # Too lazy to create sorted return data; assume it's sorted in MySQL
        ninjas = sorted(ninjas, key=lambda x: x['team'])  # ignore this

        current_team = None
        for row in ninjas:
            if current_team != row['team']:
                current_team = row['team']
                teams.append({
                    'members': [],
                    'member_count': 0
                })
            index = len(teams) - 1
            teams[index]['members'].append(row)
            teams[index]['member_count'] += 1

        print(teams)
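
The same sorted-input idea can also be written with `itertools.groupby` from the standard library, which groups consecutive rows that share a key. A sketch, assuming the rows are sorted by `team` first; the helper name `group_by_team` is hypothetical and the sample rows are taken from the earlier slides:

```python
from itertools import groupby
from operator import itemgetter


def group_by_team(rows):
    """Group ninja rows into the {'members', 'member_count'} structure."""
    # groupby only merges adjacent rows, so sort by the same key first.
    ordered = sorted(rows, key=itemgetter('team'))
    teams = []
    for _, members in groupby(ordered, key=itemgetter('team')):
        members = list(members)
        teams.append({'members': members, 'member_count': len(members)})
    return teams


if __name__ == '__main__':
    ninjas = [
        {'name': 'Naruto', 'team': 7, 'skill': 'bunshin'},
        {'name': 'Ino', 'team': 4, 'skill': 'mind transfer'},
        {'name': 'Sakura', 'team': 7, 'skill': 'strength'},
    ]
    print(group_by_team(ninjas))
```

If the database already returns rows ordered by team, the `sorted()` call can be dropped, which is the assumption the slide makes.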
  33. Other examples: using “shortcuts” and language quirks

        # Merge two dictionaries using unpacking
        a = {'name': 'john', 'age': 25}
        b = {'weight': 55, 'height': 160}
        c = {**a, **b}
        # c == {'name': 'john', 'age': 25, 'weight': 55, 'height': 160}
  34. Else on loops

        chars = ['a', 'b', 'c', 'd', 'e', 'f']
        look_for = 'n'

        # Usual implementation
        found = False
        for c in chars:
            if look_for == c:
                print('Found you!')
                found = True
                break
        if not found:
            print('Not found "{}"'.format(look_for))

        # Use else
        for c in chars:
            if look_for == c:
                print('Found you!')
                break
        else:
            print('Not found "{}"'.format(look_for))
  35. List operations

        # Avoid map(), filter(), etc. where a list comprehension fits.
        # Comprehensions are usually easier to read and often faster than map() with a lambda.
        num = [1, 2, 3]
        doubles = map(lambda x: x * 2, num)

        num = [1, 2, 3]
        doubles = [x * 2 for x in num]
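
In Python 3, `map()` returns a lazy iterator rather than a list, so the two snippets above are not quite interchangeable. A short sketch of the difference (the variable names are illustrative):

```python
num = [1, 2, 3]

# map() is lazy in Python 3: nothing is computed until it is consumed.
lazy_doubles = map(lambda x: x * 2, num)
doubles_from_map = list(lazy_doubles)

# The iterator is now exhausted; consuming it again yields nothing.
leftover = list(lazy_doubles)

# The comprehension builds the list directly and can be reused freely.
doubles = [x * 2 for x in num]

# filter() has a comprehension equivalent too.
evens = [x for x in num if x % 2 == 0]

if __name__ == '__main__':
    print(doubles_from_map, leftover, doubles, evens)
```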
  36. Single Responsibility Principle

    - One function/class/module should handle only one responsibility. E.g. an add() function should only do addition.
    - Keep each responsibility/use case as small as possible.
    - Each function/class should integrate cleanly with the others. E.g. add(), subtract() and multiply() should be enough to build a calculator() function.
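
A minimal sketch of the calculator example. The function bodies and the `calculator()` signature are assumptions, since the slide only names the functions:

```python
def add(a, b):
    return a + b


def subtract(a, b):
    return a - b


def multiply(a, b):
    return a * b


# Each operator symbol maps to exactly one single-purpose function.
OPERATIONS = {'+': add, '-': subtract, '*': multiply}


def calculator(a, operator, b):
    """Only picks the operation; the arithmetic lives in the small functions."""
    try:
        operation = OPERATIONS[operator]
    except KeyError:
        raise ValueError('Unsupported operator: {}'.format(operator))
    return operation(a, b)


if __name__ == '__main__':
    print(calculator(2, '+', 3))
    print(calculator(2, '*', 3))
```

Each small function can be tested on its own, and supporting a new operation only touches the lookup table.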
  37. Measuring effectiveness

    Look at your goals.

    • Did your code quality improve?
    • Is the code more readable?
    • Is your testing a lot better? No tests should break randomly.
    • Is it easier to extend and integrate your code?
    • Is the overall system more modular?
  38. Showing off

    It’s important that other people in your company understand what changed and how it improved the current system. Organize a small brown bag session to discuss the improvements.
  39. Showing off

    • Do not forget to DOCUMENT the changes.
    • Create benchmarks to show quantitative data on the improvements. You can use http://pyperformance.readthedocs.io
    • Ask your leads/PM to inform mid-level managers about the improvements.
    • If you created a new library as a result of refactoring, do not forget to tell other developers that the library/tools exist.
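
For a quick before/after number on a small piece of code, the standard library's `timeit` module is often enough. A sketch, assuming the two functions stand in for the old and the refactored implementation (all names here are hypothetical):

```python
import timeit


def doubles_with_map(num):
    # Stand-in for the "before" code.
    return list(map(lambda x: x * 2, num))


def doubles_with_comprehension(num):
    # Stand-in for the "after" code.
    return [x * 2 for x in num]


def best_time(func, data, number=1000, repeat=5):
    """Return the fastest of `repeat` runs, each calling func `number` times."""
    timer = timeit.Timer(lambda: func(data))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == '__main__':
    data = list(range(1000))
    # Both versions must agree before their timings are worth comparing.
    assert doubles_with_map(data) == doubles_with_comprehension(data)
    print('map:           {:.4f}s'.format(best_time(doubles_with_map, data)))
    print('comprehension: {:.4f}s'.format(best_time(doubles_with_comprehension, data)))
```

Timings vary between machines and runs, so report them together with the Python version and hardware used.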
  40. Going forward

    • Have a regular code assessment and review. Dedicate time once a month.
    • Make sure your leads and managers understand the importance of a clean codebase.
    • Early refactoring wastes a lot of time.
    • Refactoring too late incurs a lot of technical debt.
    • Use language features and design patterns as much as possible.
  41. You can follow my articles on https://medium.com/@ninzz. I try to write stuff at least once a month.

    All code examples can be found at https://github.com/pprmint/pycon_kr

    If you are interested in AI and healthcare technology, you can check us out at Hacarus. We use Sparse Modeling instead of the traditional ML/Deep Learning approaches.

    https://hacarus.com/
    https://hacarus.com/information/tech/sparse-modeling-for-it-engineers/