Python Performance Tuning

Python Performance Tuning FALUDI, Bence Database Manager

Positions: -  Database Manager @ Mito -  Head Of Development
@ Ozmo -  Organizer @ Database Meetup Tasks: -  Data warehouse design -  Mathematical predictions -  Data Cleansing & Analytics -  ETL & Python Development -  IT Project Management Introduction FALUDI, Bence [email protected]

Process of modifying a software system to work more efficiently
or use fewer resources. •  Design level: architectural design, manage resources, choose efficient algorithms, create architectural design •  Source code level: avoiding poor quality code and obvious slowdowns, use the correct functions •  Compile level: option build flags, compiler prediction, optimize run-time compilers Program optimization

Region of a computer program where most time is spent
during the execution. Pareto principle: Roughly 80% of the eﬀects come from 20% of the causes. Also known as 80-20 rule. In source code level optimization have to ﬁnd the hot spot and make performance tuning. Bottleneck

Deterministic profiling of Python programs. Make statistics which contains how
long and how often different parts of the program executed. cProfile is C extension which suitable for long running program profiling. Statistics can be processed via pstats module. Used to determine the program’s hot spot. cProfile

Start proﬁling: python -m cProfile –o <filename> <script> Make formatted
statistics: import pstats pstats.Stats(<filename>) \ .strip_dirs() \ .sort_stats("cumulative") \ .print_stats() Measure functions executing time with timeit cProﬁle

cProﬁle

Remove deepcopy This is the most common cause of the
speed issues. We have to remove it immediately. Let’s see the following example: class Item( object ): def __init__( self, name ): self.name = name.upper() class Container( object ): def __init__( self, name, items ): self.name, self.items = name, items

Remove deepcopy * 1000 operation with following object: Container('test data',[Item(str(x))
for x in xrange(100)]) Examples Speed*, last items are same? d = copy.copy(c) d.items.append(Item(’fck')) Same last item, it’s not good! 0.0131s d = copy.deepcopy(c) d.items.append(Item(’fck')) Last items are not equal! 2.4987s d = c.clone() d.items.append(Item(’fck')) Last items are not equal! 0.0909s Copy is quick but inaccurate with inner objects while deepcopy is slow and accurate. Have to write own method to create copies for the object.

Remove deepcopy Good version could be the following with ~27x
speed upgrade. class Item( object ): def __init__( self, name ): self.name = name.upper() def clone(self): return Item(self.name) class Container( object ): def __init__( self, name, items ): self.name, self.items = name, items def clone(self): return Container( self.name, [i.clone() for i in self.items] )

Xrange instead of Range Avoid Use range(100000000) # Create: 2.7908s
# Loop: 0.7717s # All: 3.5625s xrange(100000000) # Create: 0.0000s # Loop: 2.0659s # All: 2.0659s Range result will be list while xrange is a generator object. It could be created instantly, calculates the next value in real-time and don’t allocate huge amount of memory unnecessary!

Generator functions Use generator object or generator expressions to save
memory and not collect unnecessary data over and over. Avoid Use def getUpperWords(words): rlst = [] for word in words: rlst.append( word.upper() ) return rlst def getUpperWords(words): for word in words: yield word.upper() def getUpperWords(words): return (w.upper() for w in words) y = [ x for x in xrange(n) if x % 3 == 0 ] y = ( x for x in xrange(n) if x % 3 == 0 )

Generators vs. Iterators A generator function is diﬀerent than an
object that supports iteration! In generator function you can iterate over the generated data once, but if you want to do it again, you have to call the generator function again. It is diﬀerent than a list you can not iterate over as many times as you want!

String concatenation Avoid Use s = ‘’ for substring in
list: s += substring s = ‘’.join(list) s = ‘’ for substring in list: s += call_fn( substring ) s = ‘’.join([call_fn(ss) for ss in list]) "<html>" + head + prologue + query + tail + "</html>" "<html>%(head)s%(prologue)s% (query)s%(tail)s</html>" % locals() These methods are more quicker and don’t allocate more memory unnecessary.

Given a string with the following format and want to
return the integer part of the string: \w+-\d+ This is how most of you solve it: int('normalstandardstr-42'.split('-',1)[1]) Split the string on the dash, get the second part and convert it into integer. Is it eﬃcient? Split strings

Split is allocate a list of two elements with two
string. It is create two copies and convert one of it into integer and throwing the rest away. Some solutions: 1.  int(s.split('-',1)[1]) 2.  int(s[s.index('-')+1:]) 3.  int(s[s.find('-')+1:]) The correct and best solution to do this is the third way! Index is using ﬁnd function but throw exception when the substring is not found. Split strings

Append and upper are function references and have to reevaluated
every time in the loop. Use it with caution but in basic loops these method could result mayor improvement. 25-33% Remove dots Avoid Use nlst = [] for word in oldlist: nlst.append(word.upper()) nlst = [] upper = str.upper append = nlst.append for word in oldlist: append(upper(word))

You could use map to push from interpreter into compiled
C code. More better way if you don’t create the entire list at once instead of create a generator object which can be iterated over item by item. Call function to all list items Avoid Use nlst = [] for word in oldlist: nlst.append( word.upper() ) # Use genexp instead of list nlst = ( w.upper() for w in oldlist ) # Use map function (7x) nlst = map(str.upper,oldlist)

Other frequent situation when we want to create a counter.
For example we building a dictionary of word frequencies. Usually you would do one of the following: Counter 1. Good when key ~ words 2. Never good choice wdict = {} for word in words: if word not in wdict: wdict[word] = 0 wdict[word] += 1 wdict = {} for word in words: wdict.setdefault(word,0) wdict[word] += 1 Except for the ﬁrst time the if statement test will fail. If you have many items in a list and few keys this solution is not good enough and will be slow.

Let’s see some solution for the problem for using collections
module and some tricks to avoid condition: Counter 3. Never good choice 4. One of the best solution wdict = Counter( words ) widct = defaultdict(int) for word in words: wdict[word] += 1 5. Good when key << word 6. Best solution wdict = {} for word in words: try: wdict[word] += 1 except KeyError: wdict[word] = 1 wdict = {} get = wdict.get for word in words: wdict[word] = get(word,0)+1

Counter Quickest solution the 6th option for most of the
time and it is the best solution when it has many diﬀerent keys. The 4th option is very good when few keys expected. Method Few key running time Many key running time Place 1st 2nd 3rd 4th 5th 6th 0.0231s (3) 0.0363s (5) 0.0441s (6) 0.0165s (1) 0.0304s (4) 0.0220s (2) 0.4915s (2) 0.5698s (4) 0.5769s (5) 0.5159s (3) 1.0867s (6) 0.3880s (1) 3rd 2nd 1st

Runtime function mapping class Test2( object ): def run( self
): wordnumbers = [] append = wordnumbers.append for line in lines: self.do( line, append ) def do( self, line, append ): append(line*100) if line.count('f’)==4 and line.count('f’)==4 and line[0]= '6': self.do = self.doThen def doThen( self, line, append ): append( line ) When you have a complex IF statement* in your loop and exist a limit when you want to do other you should use runtime function mapping. => 15-25% based on the condition * Never use when your condition is not pretty complicated. Could occur 50% slowdowns.

PyPy PyPy is a fast alternative implementation of Python. Many
unique features and ability available in PyPy which could improve the performance and speed. PyPy contains: Just In Time compiler, Better memory management, Compatible with existing python codes, Supports sandboxing and micro-threads for concurrency

PyPy Lists hasn’t got pre-allocation in Python language so every
time a new element appended it has to resize the entire list and copy all data. If you know the list’s length use newlist_hint Avoid (0.6906s for 10M number) Use (0.1140s for 10M number) nlst = [] append = nlst.append for i in xrange(n): append(i+i) from __pypy__ import newlist_hint nlst = newlist_hint(n) append = nlst.append for i in xrange(n): append(i+i)

Still slow? If you want to use more then once
CPU core in native Python code you have to spawn multiple processes. Usable modules: •  Multiprocessing •  Parallel Python

Monitoring Never forget to monitor your tools and web applications.
Code optimization will always be required, it is never done. Give a try to New Relic to monitor your WSGI application.

FALUDI, Bence [email protected] Website: http://bfaludi.com Thank you for your
attention! Tests are available to download

Looking for a Junior Database Developer to maintain and
develop Python BI tools and design data warehouses for well-‐known clients. h@p://mito.hu/karrier/junior-‐adatbazis-‐fejleszto/ Join Us!

Python Performance Tuning

Python Performance Tuning

Bence Faludi

More Decks by Bence Faludi

Other Decks in Technology

Featured

Transcript

Python Performance Tuning FALUDI, Bence Database Manager

Positions: -  Database Manager @ Mito -  Head Of Development

Process of modifying a software system to work more eﬃciently

Region of a computer program where most time is spent

Deterministic proﬁling of Python programs. Make statistics which contains how

Start proﬁling: python -m cProfile –o <filename> <script> Make formatted

cProﬁle

Remove deepcopy This is the most common cause of the

Remove deepcopy * 1000 operation with following object: Container('test data',[Item(str(x))

Remove deepcopy Good version could be the following with ~27x

Xrange instead of Range Avoid Use range(100000000) # Create: 2.7908s

Generator functions Use generator object or generator expressions to save

Generators vs. Iterators A generator function is diﬀerent than an

String concatenation Avoid Use s = ‘’ for substring in

Given a string with the following format and want to

Split is allocate a list of two elements with two

Append and upper are function references and have to reevaluated

You could use map to push from interpreter into compiled

Other frequent situation when we want to create a counter.

Let’s see some solution for the problem for using collections

Counter Quickest solution the 6th option for most of the

Runtime function mapping class Test2( object ): def run( self

PyPy PyPy is a fast alternative implementation of Python. Many

PyPy Lists hasn’t got pre-allocation in Python language so every

Still slow? If you want to use more then once

Monitoring Never forget to monitor your tools and web applications.

FALUDI, Bence [email protected] Website: http://bfaludi.com Thank you for your

Looking for a Junior Database Developer to maintain and