Python Performance Tuning

Slide 1

Slide 1 text

Python Performance Tuning
FALUDI, Bence
Database Manager

Slide 2

Slide 2 text

Introduction
FALUDI, Bence

Positions:
-  Database Manager @ Mito
-  Head of Development @ Ozmo
-  Organizer @ Database Meetup

Tasks:
-  Data warehouse design
-  Mathematical predictions
-  Data cleansing & analytics
-  ETL & Python development
-  IT project management

Slide 3

Slide 3 text

Program optimization
Program optimization is the process of modifying a software system so that it works more efficiently or uses fewer resources.
•  Design level: architectural design, resource management, choice of efficient algorithms
•  Source code level: avoiding poor-quality code and obvious slowdowns, using the right functions
•  Compile level: build option flags, compiler prediction, optimizing run-time compilers

Slide 4

Slide 4 text

Bottleneck
The hot spot is the region of a computer program where most of the execution time is spent.
Pareto principle: roughly 80% of the effects come from 20% of the causes (also known as the 80-20 rule).
In source-code-level optimization, you first have to find the hot spot and then tune its performance.

Slide 5

Slide 5 text

cProfile
cProfile provides deterministic profiling of Python programs: it collects statistics on how often and for how long the different parts of the program are executed. It is a C extension with reasonable overhead, which makes it suitable for profiling long-running programs. The statistics can be processed with the pstats module. Use it to find the program's hot spots.
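A minimal in-process sketch of the same workflow, assuming you want to profile a single function call instead of a whole script. The names `slow_square_sum`, `program` and `profile_report` are hypothetical and exist only for illustration; the cProfile and pstats calls are the standard library ones.

```python
import cProfile
import io
import pstats

def slow_square_sum(n):
    # Deliberately naive hot spot: builds a full list before summing it.
    return sum([i * i for i in range(n)])

def program():
    # Hypothetical "program" that spends most of its time in slow_square_sum.
    total = 0
    for _ in range(50):
        total += slow_square_sum(10_000)
    return total

def profile_report(func, sort_key="cumulative", limit=5):
    """Run func under cProfile; return (result, formatted pstats text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.strip_dirs().sort_stats(sort_key).print_stats(limit)
    return result, stream.getvalue()

if __name__ == "__main__":
    result, report = profile_report(program)
    print(report)
```

Sorting by "cumulative" puts the functions whose call trees consume the most time at the top, which is usually where the hot spot is.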

Slide 6

Slide 6 text

cProfile
Start profiling:
    python -m cProfile -o <filename> <script.py>
Make formatted statistics:
    import pstats
    pstats.Stats(<filename>) \
        .strip_dirs() \
        .sort_stats("cumulative") \
        .print_stats()
Measure the execution time of individual functions with timeit.
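A small timeit sketch (Python 3) for comparing two implementations of the same function once cProfile has pointed at it. `upper_loop`, `upper_comprehension`, `WORDS` and `best_time` are hypothetical names; taking the minimum of several repeats is a common convention because it is the measurement least disturbed by other system activity.

```python
import timeit

def upper_loop(words):
    # Explicit loop with append.
    result = []
    for word in words:
        result.append(word.upper())
    return result

def upper_comprehension(words):
    # Same result as a list comprehension.
    return [word.upper() for word in words]

WORDS = ["alpha", "beta", "gamma"] * 1000

def best_time(func, number=200, repeat=5):
    """Best total time in seconds of `number` calls to func(WORDS), over `repeat` runs."""
    timer = timeit.Timer(lambda: func(WORDS))
    return min(timer.repeat(repeat=repeat, number=number))

if __name__ == "__main__":
    for func in (upper_loop, upper_comprehension):
        print(f"{func.__name__}: {best_time(func):.4f}s")
```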

Slide 7

Slide 7 text

cProfile

Slide 8

Slide 8 text

Remove deepcopy
deepcopy is a very common cause of speed issues, so remove it wherever you can. Consider the following example:
    class Item(object):
        def __init__(self, name):
            self.name = name.upper()

    class Container(object):
        def __init__(self, name, items):
            self.name, self.items = name, items

Slide 9

Slide 9 text

Remove deepcopy
Test: 1000 operations* on the following object:
    c = Container('test data', [Item(str(x)) for x in xrange(100)])

Example                                              Time*     Last items the same?
d = copy.copy(c);     d.items.append(Item('fck'))    0.0131s   Yes: the original changed too, not good!
d = copy.deepcopy(c); d.items.append(Item('fck'))    2.4987s   No
d = c.clone();        d.items.append(Item('fck'))    0.0909s   No

* Total time of 1000 operations.
copy.copy is fast but shares the inner objects with the original, while deepcopy is correct but slow. Write your own method to create copies of the object.

Slide 10

Slide 10 text

Remove deepcopy
A better version is the following, roughly 27x faster than deepcopy (0.0909s vs. 2.4987s):
    class Item(object):
        def __init__(self, name):
            self.name = name.upper()

        def clone(self):
            return Item(self.name)

    class Container(object):
        def __init__(self, name, items):
            self.name, self.items = name, items

        def clone(self):
            return Container(self.name, [i.clone() for i in self.items])
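A Python 3 sketch that reproduces the comparison above: it checks whether appending to the copy also changes the original, and times each approach. The helpers `make_container`, `last_item_shared` and `COPIERS` are hypothetical; the clone methods follow the slide. Note that `clone` is only as deep as deepcopy here because `Item` holds nothing but an immutable string.

```python
import copy
import timeit

class Item(object):
    def __init__(self, name):
        self.name = name.upper()

    def clone(self):
        # Passing the already upper-cased name through __init__ again is harmless.
        return Item(self.name)

class Container(object):
    def __init__(self, name, items):
        self.name, self.items = name, items

    def clone(self):
        # A new list of new Item objects, so the copy shares no mutable state.
        return Container(self.name, [i.clone() for i in self.items])

def make_container():
    return Container("test data", [Item(str(x)) for x in range(100)])

def last_item_shared(copier):
    """Copy a fresh container, append to the copy, report whether the original saw it."""
    c = make_container()
    d = copier(c)
    d.items.append(Item("new"))
    return c.items[-1] is d.items[-1]

COPIERS = {
    "copy.copy": copy.copy,
    "copy.deepcopy": copy.deepcopy,
    "clone": lambda c: c.clone(),
}

if __name__ == "__main__":
    c = make_container()
    for label, copier in COPIERS.items():
        seconds = timeit.timeit(lambda: copier(c), number=1000)
        print(f"{label:14} shared={last_item_shared(copier)!s:5} {seconds:.4f}s")
```

Absolute timings will differ from the slide's, which were measured on Python 2.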

Slide 11

Slide 11 text

Xrange instead of Range
Avoid:
    range(100000000)    # Create: 2.7908s  Loop: 0.7717s  All: 3.5625s
Use:
    xrange(100000000)   # Create: 0.0000s  Loop: 2.0659s  All: 2.0659s
In Python 2, range returns a list, while xrange returns a lazy xrange object. It is created instantly, computes each value on demand, and does not allocate a huge amount of memory unnecessarily. (In Python 3, range already behaves like xrange.)
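In Python 3 the lazy behaviour of xrange is what range does by default, so the memory difference can be shown by comparing a range object with an explicit list. This sketch uses a hypothetical helper, `footprint`; `sys.getsizeof` measures only the container itself, not the int objects a list points to.

```python
import sys

def footprint(n):
    """Return (size of a range object, size of the equivalent list) in bytes.

    sys.getsizeof reports only the container: for the list that is the header
    plus one pointer per element, not the int objects it points to.
    """
    lazy = range(n)      # stores start/stop/step only; values computed on demand
    eager = list(lazy)   # materialises all n values up front
    return sys.getsizeof(lazy), sys.getsizeof(eager)

if __name__ == "__main__":
    for n in (10, 1_000_000):
        print(n, footprint(n))
```

The range object stays the same size no matter how large n is, while the list grows linearly.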

Slide 12

Slide 12 text

Generator functions
Use generator functions or generator expressions to save memory and avoid collecting unnecessary data over and over.
Avoid:
    def getUpperWords(words):
        rlst = []
        for word in words:
            rlst.append(word.upper())
        return rlst

    y = [x for x in xrange(n) if x % 3 == 0]
Use:
    def getUpperWords(words):
        for word in words:
            yield word.upper()

    # or, as a generator expression:
    def getUpperWords(words):
        return (w.upper() for w in words)

    y = (x for x in xrange(n) if x % 3 == 0)
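One practical gain of the generator version is that the consumer can stop early, so work is done only for the items actually used. A Python 3 sketch with hypothetical names (`get_upper_words_list`, `get_upper_words_gen`, `first_long_word`) standing in for the slide's `getUpperWords`:

```python
def get_upper_words_list(words):
    # Builds the whole result list in memory before returning it.
    result = []
    for word in words:
        result.append(word.upper())
    return result

def get_upper_words_gen(words):
    # Generator function: produces one upper-cased word per iteration.
    for word in words:
        yield word.upper()

def first_long_word(words, min_len):
    # Stops at the first match: words after it are never upper-cased.
    for word in get_upper_words_gen(words):
        if len(word) >= min_len:
            return word
    return None

if __name__ == "__main__":
    words = ["a", "bb", "python", "performance"]
    print(first_long_word(words, 5))  # PYTHON
```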

Slide 13

Slide 13 text

Generators vs. Iterators
A generator is different from an object that merely supports iteration, such as a list! You can iterate over the data produced by a generator only once; to iterate again, you have to call the generator function again. A list, by contrast, can be iterated over as many times as you want.
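A short sketch of the difference: a generator object is its own iterator and is exhausted after one pass, while each `iter()` call on a list starts a fresh iterator. `squares` is a hypothetical generator function.

```python
def squares(n):
    # Generator function: each call returns a new, single-use generator object.
    for i in range(n):
        yield i * i

gen = squares(4)
print(iter(gen) is gen)   # True: a generator is its own iterator
first_pass = list(gen)    # [0, 1, 4, 9]
second_pass = list(gen)   # []: the generator is already exhausted

data = [0, 1, 4, 9]
print(iter(data) is data)             # False: a list hands out a new iterator each time
list_passes = [sum(data), sum(data)]  # [14, 14]: the list can be iterated again

third_pass = list(squares(4))  # call the generator function again to start over
```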

Slide 14

Slide 14 text

String concatenation
Avoid:
    s = ''
    for substring in substrings:
        s += substring
Use:
    s = ''.join(substrings)

Avoid:
    s = ''
    for substring in substrings:
        s += call_fn(substring)
Use:
    s = ''.join([call_fn(ss) for ss in substrings])

Avoid:
    "" + head + prologue + query + tail + ""
Use:
    "%(head)s%(prologue)s%(query)s%(tail)s" % locals()

These methods are faster and do not allocate unnecessary intermediate strings.
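A runnable Python 3 version of the three patterns. The function names are hypothetical, and `format_query` passes an explicit dict instead of `locals()` so that the template's inputs are visible. Note that CPython can sometimes optimise `s += part` in place, so the cost of the loop version varies between interpreters.

```python
def concat_plus(parts):
    # Repeated += may create a new string on each step (quadratic in the worst case).
    s = ""
    for part in parts:
        s += part
    return s

def concat_join(parts):
    # join computes the total length once and copies each piece a single time.
    return "".join(parts)

def concat_mapped(parts, call_fn):
    # Transform every piece first, then join once.
    return "".join([call_fn(p) for p in parts])

def format_query(head, prologue, query, tail):
    # One formatting operation instead of a chain of + concatenations.
    return "%(head)s%(prologue)s%(query)s%(tail)s" % {
        "head": head, "prologue": prologue, "query": query, "tail": tail,
    }

if __name__ == "__main__":
    print(format_query("SELECT ", "* ", "FROM t", ";"))  # SELECT * FROM t;
```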

Slide 15

Slide 15 text

Split strings
Given a string in the format \w+-\d+, we want to return its integer part. This is how most people solve it:
    int('normalstandardstr-42'.split('-', 1)[1])
Split the string on the dash, take the second part and convert it to an integer. Is it efficient?

Slide 16

Slide 16 text

Split strings
split allocates a list of two elements holding two new strings: it creates two copies, converts one of them to an integer and throws the rest away. Some solutions:
1.  int(s.split('-', 1)[1])
2.  int(s[s.index('-')+1:])
3.  int(s[s.find('-')+1:])
The third way is the best solution! index works like find, but raises an exception (ValueError) when the substring is not found.
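A Python 3 sketch of the three variants with a timing loop; the function names are hypothetical, and no particular timing result is claimed since it depends on the interpreter. It also shows the edge case: when the dash is missing, `find` returns -1, so the slice silently becomes the whole string, while `index` raises `ValueError`.

```python
import timeit

def by_split(s):
    # Allocates a two-element list and two new strings.
    return int(s.split("-", 1)[1])

def by_index(s):
    # str.index raises ValueError if "-" is missing.
    return int(s[s.index("-") + 1:])

def by_find(s):
    # str.find returns -1 if "-" is missing, so the slice silently becomes s[0:].
    return int(s[s.find("-") + 1:])

if __name__ == "__main__":
    s = "normalstandardstr-42"
    for func in (by_split, by_index, by_find):
        seconds = timeit.timeit(lambda: func(s), number=200_000)
        print(f"{func.__name__}: {func(s)} in {seconds:.3f}s")
```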

Slide 17

Slide 17 text

Remove dots
Avoid:
    nlst = []
    for word in oldlist:
        nlst.append(word.upper())
Use:
    nlst = []
    upper = str.upper
    append = nlst.append
    for word in oldlist:
        append(upper(word))
nlst.append and word.upper are attribute lookups that are re-evaluated on every iteration of the loop; binding them to local names once avoids this. Use it with caution, but in basic loops this method can bring a major improvement: 25-33%.
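A Python 3 sketch to measure the effect on your own interpreter; the function names and the word list are hypothetical. Binding `str.upper` only works if every item is a `str`, which is one reason the slide says to use the trick with caution.

```python
import timeit

def upper_with_dots(oldlist):
    nlst = []
    for word in oldlist:
        nlst.append(word.upper())  # two attribute lookups per iteration
    return nlst

def upper_without_dots(oldlist):
    nlst = []
    upper = str.upper      # looked up once; assumes every item is a str
    append = nlst.append   # bound method cached in a local variable
    for word in oldlist:
        append(upper(word))
    return nlst

if __name__ == "__main__":
    words = ["word%d" % i for i in range(100_000)]
    for func in (upper_with_dots, upper_without_dots):
        print(func.__name__, timeit.timeit(lambda: func(words), number=20))
```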

Slide 18

Slide 18 text

Call function to all list items
You could use map to push the loop from the interpreter into compiled C code. An even better way is not to create the entire list at once, but a generator object that can be iterated over item by item.
Avoid:
    nlst = []
    for word in oldlist:
        nlst.append(word.upper())
Use:
    # Use genexp instead of list
    nlst = (w.upper() for w in oldlist)
    # Use map function (7x)
    nlst = map(str.upper, oldlist)
Note: in Python 2, map returns a list; in Python 3, it returns a lazy iterator.
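In Python 3 both forms are lazy, and each can be consumed only once; wrap the result in `list()` when the whole list is needed at once. A minimal sketch with a hypothetical word list:

```python
words = ["alpha", "beta", "gamma"]

# Python 3: map returns a lazy iterator, much like a generator expression.
upper_iter = map(str.upper, words)
upper_gen = (w.upper() for w in words)

print(list(upper_iter))  # ['ALPHA', 'BETA', 'GAMMA']
print(list(upper_gen))   # ['ALPHA', 'BETA', 'GAMMA']

# Build the full list only when you really need it (the Python 2 map behaviour).
upper_list = list(map(str.upper, words))
```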

Slide 19

Slide 19 text

Counter
Another frequent situation is creating a counter, for example when building a dictionary of word frequencies. Usually you would do one of the following:
1. Good when the number of keys is close to the number of words:
    wdict = {}
    for word in words:
        if word not in wdict:
            wdict[word] = 0
        wdict[word] += 1
2. Never a good choice:
    wdict = {}
    for word in words:
        wdict.setdefault(word, 0)
        wdict[word] += 1
In option 1, the if test is true only the first time a key is seen; for every later occurrence of the word it is evaluated for nothing. If the list has many items and few keys, this solution is not good enough and will be slow.

Slide 20

Slide 20 text

Counter
Let's see some solutions that use the collections module (from collections import Counter, defaultdict) and some tricks to avoid the condition:
3. Never a good choice:
    wdict = Counter(words)
4. One of the best solutions:
    wdict = defaultdict(int)
    for word in words:
        wdict[word] += 1
5. Good when the number of keys is much smaller than the number of words:
    wdict = {}
    for word in words:
        try:
            wdict[word] += 1
        except KeyError:
            wdict[word] = 1
6. Best solution:
    wdict = {}
    get = wdict.get
    for word in words:
        wdict[word] = get(word, 0) + 1
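The counting options above, wrapped as Python 3 functions (hypothetical names) so they can be checked against each other and timed with timeit on your own data. Their relative ranking on a modern interpreter may differ from the measurements on the next slide.

```python
from collections import Counter, defaultdict

def count_counter(words):
    # Option 3: collections.Counter does the counting for you.
    return Counter(words)

def count_defaultdict(words):
    # Option 4: missing keys start at int() == 0, so no membership test is needed.
    wdict = defaultdict(int)
    for word in words:
        wdict[word] += 1
    return wdict

def count_try_except(words):
    # Option 5: cheap when KeyError is rare, i.e. few distinct keys relative to words.
    wdict = {}
    for word in words:
        try:
            wdict[word] += 1
        except KeyError:
            wdict[word] = 1
    return wdict

def count_get(words):
    # Option 6: one dict.get call per word, bound to a local name beforehand.
    wdict = {}
    get = wdict.get
    for word in words:
        wdict[word] = get(word, 0) + 1
    return wdict

if __name__ == "__main__":
    words = "the cat and the hat and the bat".split()
    print(count_get(words))
```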

Slide 21

Slide 21 text

Counter
The 6th option is the quickest most of the time, and it is the best solution when there are many different keys. The 4th option is very good when few keys are expected.

Method   Few keys: running time (place)   Many keys: running time (place)
1st      0.0231s (3)                      0.4915s (2)
2nd      0.0363s (5)                      0.5698s (4)
3rd      0.0441s (6)                      0.5769s (5)
4th      0.0165s (1)                      0.5159s (3)
5th      0.0304s (4)                      1.0867s (6)
6th      0.0220s (2)                      0.3880s (1)

Slide 22

Slide 22 text

Runtime function mapping
    class Test2(object):
        def run(self, lines):
            wordnumbers = []
            append = wordnumbers.append
            for line in lines:
                self.do(line, append)
            return wordnumbers

        def do(self, line, append):
            append(line * 100)
            if line.count('f') == 4 and line.count('f') == 4 and line[0] == '6':
                self.do = self.doThen

        def doThen(self, line, append):
            append(line)

When your loop contains a complex if statement* and, once a certain limit is reached, you want to do something else, use runtime function mapping: the method replaces itself, so the condition is no longer evaluated.
=> 15-25% improvement, depending on the condition.
* Never use it when your condition is not complicated: it could cause a 50% slowdown.
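A Python 3 sketch of the same idea that counts how often the condition is evaluated, to make the mechanism visible. `Switcher`, `checks` and `is_switch_point` are hypothetical names, and the condition is a simplified stand-in for the slide's. It works because `run` looks up `self.do` on every iteration, so the instance attribute assigned in `do` shadows the class method from then on.

```python
class Switcher(object):
    """Hypothetical sketch: evaluate an expensive condition only until it first holds."""

    def __init__(self):
        self.checks = 0  # how many times the condition was evaluated

    def run(self, lines):
        out = []
        append = out.append
        for line in lines:
            self.do(line, append)  # attribute lookup happens on every iteration
        return out

    def do(self, line, append):
        append(line * 2)
        self.checks += 1
        if self.is_switch_point(line):
            # Rebind on this instance: later calls go straight to do_then.
            self.do = self.do_then

    def do_then(self, line, append):
        append(line)

    @staticmethod
    def is_switch_point(line):
        # Stand-in for the "complex" condition from the slide.
        return line.count("f") == 4 and line[0] == "6"

if __name__ == "__main__":
    s = Switcher()
    print(s.run(["ab", "6ffff", "cd", "ef"]), s.checks)
```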

Slide 23

Slide 23 text

PyPy
PyPy is a fast alternative implementation of Python. It offers many unique features that can improve performance and speed. PyPy contains:
•  A just-in-time compiler
•  Better memory management
•  Compatibility with existing Python code
•  Support for sandboxing and micro-threads for concurrency

Slide 24

Slide 24 text

PyPy
Python lists cannot be pre-allocated, so as new elements are appended the list has to be resized repeatedly and its data copied. If you know the list's length in advance, use newlist_hint.
Avoid (0.6906s for 10M numbers):
    nlst = []
    append = nlst.append
    for i in xrange(n):
        append(i+i)
Use (0.1140s for 10M numbers):
    from __pypy__ import newlist_hint
    nlst = newlist_hint(n)
    append = nlst.append
    for i in xrange(n):
        append(i+i)
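Since `__pypy__` exists only on PyPy, a portable variant can fall back to a plain list elsewhere. This is a sketch under that assumption: the fallback `newlist_hint` and the `doubled` function are hypothetical helpers, and on CPython the fallback simply ignores the size hint, so you get no speedup there.

```python
try:
    from __pypy__ import newlist_hint  # only importable when running on PyPy
except ImportError:
    def newlist_hint(n):
        # Hypothetical fallback for other interpreters: an ordinary empty list,
        # the size hint is ignored.
        return []

def doubled(n):
    nlst = newlist_hint(n)  # on PyPy: empty list with room reserved for n items
    append = nlst.append
    for i in range(n):
        append(i + i)
    return nlst

if __name__ == "__main__":
    print(doubled(5))  # [0, 2, 4, 6, 8]
```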

Slide 25

Slide 25 text

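Still slow?
If you want to use more than one CPU core from native Python code, you have to spawn multiple processes (in CPython, the global interpreter lock prevents threads from executing Python bytecode in parallel). Usable modules:
•  multiprocessing
•  Parallel Python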
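A minimal multiprocessing sketch (Python 3) that spreads a CPU-bound function over a pool of worker processes. `cpu_bound` and `parallel_sums` are hypothetical; the worker function must be defined at module level so it can be pickled, and the `__main__` guard is needed on platforms that start workers by re-importing the main module.

```python
from multiprocessing import Pool

def cpu_bound(n):
    # Hypothetical CPU-heavy task: sum of squares below n.
    return sum(i * i for i in range(n))

def parallel_sums(sizes, processes=4):
    # Each size is handled in a separate worker process; results keep input order.
    with Pool(processes=processes) as pool:
        return pool.map(cpu_bound, sizes)

if __name__ == "__main__":
    print(parallel_sums([1_000_000, 2_000_000, 3_000_000]))
```

Processes pay for start-up and for pickling arguments and results, so this only helps when each task does enough work to outweigh that overhead.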

Slide 26

Slide 26 text

Monitoring
Never forget to monitor your tools and web applications. Code optimization is never done; it will always be needed. Give New Relic a try to monitor your WSGI application.

Slide 27

Slide 27 text

Thank you for your attention!
FALUDI, Bence
Website: http://bfaludi.com
Tests are available to download.

Slide 28

Slide 28 text

Join Us!
We are looking for a Junior Database Developer to maintain and develop Python BI tools and to design data warehouses for well-known clients.
http://mito.hu/karrier/junior-adatbazis-fejleszto/