Python Performance Tuning

A talk about code optimization: why it is important and what we can do to minimize CPU and memory usage. We want to write stable and fast code every time, but what are the limits, and when should we stop performance tuning and accept a good-enough version?

Bence Faludi

June 26, 2013

Transcript

  1. Introduction: FALUDI, Bence

    Positions:
    - Database Manager @ Mito
    - Head Of Development @ Ozmo
    - Organizer @ Database Meetup

    Tasks:
    - Data warehouse design
    - Mathematical predictions
    - Data cleansing & analytics
    - ETL & Python development
    - IT project management
  2. Program optimization

    The process of modifying a software system to work more efficiently or use fewer resources.
    • Design level: architectural design, resource management, choosing efficient algorithms
    • Source code level: avoiding poor-quality code and obvious slowdowns, using the correct functions
    • Compile level: build flags (compiler options), compiler prediction, optimizing run-time compilers
  3. Bottleneck

    The region of a computer program where most of the time is spent during execution. Pareto principle: roughly 80% of the effects come from 20% of the causes (also known as the 80-20 rule). In source-code-level optimization, we first have to find the hot spot and then tune its performance.
  4. cProfile

    Deterministic profiling of Python programs. It collects statistics on how long and how often the different parts of the program execute. cProfile is a C extension, which makes it suitable for profiling long-running programs. The statistics can be processed with the pstats module. Use it to find the program's hot spot.
  5. cProfile

    Start profiling:

        python -m cProfile -o <filename> <script>

    Print formatted statistics:

        import pstats
        pstats.Stats(<filename>) \
            .strip_dirs() \
            .sort_stats("cumulative") \
            .print_stats()

    Measure the execution time of individual functions with timeit.
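
A minimal in-process sketch of the same workflow, assuming Python 3 (the deck targets Python 2). `slow_sum` and `main` are hypothetical stand-ins for real code. Because `pstats.Stats` also accepts a `cProfile.Profile` object, no statistics file is needed here.

```python
import cProfile
import io
import pstats
import timeit


def slow_sum(n):
    # Deliberately naive loop that plays the role of the hot spot
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return [slow_sum(10_000) for _ in range(50)]


# In-process equivalent of "python -m cProfile -o <filename> <script>"
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Same formatting chain as on the slide, written into a string buffer
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer) \
    .strip_dirs() \
    .sort_stats("cumulative") \
    .print_stats(5)  # show only the 5 most expensive entries
report = buffer.getvalue()
print(report)

# timeit: best of 5 repeats, each timing 20 calls of one function
best = min(timeit.repeat(lambda: slow_sum(10_000), number=20, repeat=5))
print(f"slow_sum: {best / 20:.6f}s per call")
```

Taking the minimum of several repeats reduces noise from other processes; the profile tells you where to look, and timeit tells you whether a change helped.
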
  6. Remove deepcopy

    This is the most common cause of speed issues; remove it immediately. Let's look at the following example:

        class Item(object):
            def __init__(self, name):
                self.name = name.upper()

        class Container(object):
            def __init__(self, name, items):
                self.name, self.items = name, items
  7. Remove deepcopy

    Speed is measured over 1000 operations on the following object:

        c = Container('test data', [Item(str(x)) for x in xrange(100)])

    Examples (time for 1000 operations; are the last items of c and d the same?):
    - d = copy.copy(c); d.items.append(Item('fck')): same last item, which is not good! 0.0131s
    - d = copy.deepcopy(c); d.items.append(Item('fck')): last items differ, as expected. 2.4987s
    - d = c.clone(); d.items.append(Item('fck')): last items differ, as expected. 0.0909s

    copy.copy is quick but inaccurate with inner objects, while deepcopy is accurate but slow. We have to write our own method to create copies of the object.
  8. Remove deepcopy

    A good version could be the following, with a ~27x speedup over deepcopy:

        class Item(object):
            def __init__(self, name):
                self.name = name.upper()

            def clone(self):
                return Item(self.name)

        class Container(object):
            def __init__(self, name, items):
                self.name, self.items = name, items

            def clone(self):
                return Container(self.name, [i.clone() for i in self.items])
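
A runnable check of the comparison above, sketched for Python 3 (`range` instead of `xrange`, and a hypothetical `'new'` label instead of the slide's). It times 1000 copies with each strategy and then verifies that `copy.copy` shares the inner list while `deepcopy` and `clone()` do not. Absolute timings will differ from the slide's figures.

```python
import copy
import timeit


class Item(object):
    def __init__(self, name):
        self.name = name.upper()

    def clone(self):
        return Item(self.name)


class Container(object):
    def __init__(self, name, items):
        self.name, self.items = name, items

    def clone(self):
        # New list of new Item objects: only what this class actually needs
        return Container(self.name, [i.clone() for i in self.items])


c = Container('test data', [Item(str(x)) for x in range(100)])

# Time 1000 copies with each strategy, before any copy is modified
timings = {
    'copy.copy': timeit.timeit(lambda: copy.copy(c), number=1000),
    'copy.deepcopy': timeit.timeit(lambda: copy.deepcopy(c), number=1000),
    'clone': timeit.timeit(c.clone, number=1000),
}
for label, seconds in timings.items():
    print(f'{label:>14}: {seconds:.4f}s')

shallow, deep, cloned = copy.copy(c), copy.deepcopy(c), c.clone()
for d in (deep, cloned):
    d.items.append(Item('new'))   # independent copies: c is unaffected
original_len = len(c.items)
shallow.items.append(Item('new'))  # shared list: c gains the item too
```

The hand-written `clone()` is faster because it copies exactly the structure it knows about, while `deepcopy` has to inspect every object generically and keep a memo of what it has already copied.
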
  9. Xrange instead of Range

    Avoid:

        range(100000000)   # Create: 2.7908s, Loop: 0.7717s, All: 3.5625s

    Use:

        xrange(100000000)  # Create: 0.0000s, Loop: 2.0659s, All: 2.0659s

    range returns a list, while xrange returns a lazy xrange object (strictly speaking not a generator): it is created instantly, calculates the next value on demand, and doesn't allocate a huge amount of memory unnecessarily!
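
In Python 3, `range` is already lazy and behaves like Python 2's `xrange`, so the same comparison can be sketched as `range` versus an explicit `list(range(...))`. A minimal sketch, assuming Python 3; note that `sys.getsizeof` reports only the container itself, not the integer objects a list points to.

```python
import sys

n = 1_000_000

lazy = range(n)         # like Python 2 xrange: values computed on demand
eager = list(range(n))  # like Python 2 range: every value materialized now

lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)  # pointer array only; the int objects cost extra
print(f'range object: {lazy_size} bytes, list: {eager_size} bytes')

# A lazy range still supports len(), indexing and fast membership tests
last = lazy[-1]
contains = (n - 1) in lazy
```
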
  10. Generator functions

    Use generator functions or generator expressions to save memory and to avoid collecting unnecessary data over and over.

    Avoid:

        def getUpperWords(words):
            rlst = []
            for word in words:
                rlst.append(word.upper())
            return rlst

        y = [x for x in xrange(n) if x % 3 == 0]

    Use:

        def getUpperWords(words):
            for word in words:
                yield word.upper()

        # or, with a generator expression:
        def getUpperWords(words):
            return (w.upper() for w in words)

        y = (x for x in xrange(n) if x % 3 == 0)
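
A short sketch comparing the memory footprint of a list comprehension and the equivalent generator expression, assuming Python 3 (`range` in place of `xrange`, and the hypothetical PEP 8 name `get_upper_words` in place of the slide's `getUpperWords`):

```python
import sys


def get_upper_words(words):
    # Generator function: each word is upper-cased only when the caller asks for it
    for word in words:
        yield word.upper()


n = 1_000_000
as_list = [x for x in range(n) if x % 3 == 0]  # all ~333k values built up front
as_gen = (x for x in range(n) if x % 3 == 0)   # nothing computed yet

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
print(f'list: {list_size} bytes, generator: {gen_size} bytes')

# sum() pulls the values one at a time, so no second list is ever built
total = sum(as_gen)
upper = list(get_upper_words(['spam', 'eggs']))
```
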
  11. Generators vs. Iterators

    A generator is different from an object that merely supports iteration, such as a list! You can iterate over a generator's data only once; if you want to do it again, you have to call the generator function again. A list, by contrast, can be iterated over as many times as you want.
  12. String concatenation

    Avoid:

        s = ''
        for substring in list:
            s += substring

        s = ''
        for substring in list:
            s += call_fn(substring)

        "<html>" + head + prologue + query + tail + "</html>"

    Use:

        s = ''.join(list)

        s = ''.join([call_fn(ss) for ss in list])

        "<html>%(head)s%(prologue)s%(query)s%(tail)s</html>" % locals()

    These methods are quicker and don't allocate memory unnecessarily.
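
A Python 3 sketch comparing the two approaches; `parts`, `concat_loop`, `concat_join` and `render` are hypothetical. CPython can sometimes extend a string in place on `+=`, so the gap you measure there may be smaller than the slide suggests, while `''.join` stays linear regardless of such interpreter-specific optimizations.

```python
import timeit

parts = [str(i) for i in range(10_000)]


def concat_loop(strings):
    s = ''
    for substring in strings:
        s += substring  # may build a new string on each step
    return s


def concat_join(strings):
    # join measures the pieces first, then allocates the result once
    return ''.join(strings)


def render(head, prologue, query, tail):
    # Inside this function, locals() is exactly the four parameters
    return "<html>%(head)s%(prologue)s%(query)s%(tail)s</html>" % locals()


loop_time = timeit.timeit(lambda: concat_loop(parts), number=100)
join_time = timeit.timeit(lambda: concat_join(parts), number=100)
print(f'+= loop: {loop_time:.4f}s, join: {join_time:.4f}s')
```
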
  13. Split strings

    Given a string in the format \w+-\d+, we want to return its integer part. This is how most of you would solve it:

        int('normalstandardstr-42'.split('-', 1)[1])

    Split the string on the dash, take the second part and convert it into an integer. Is it efficient?
  14. Split strings

    split allocates a list of two elements holding two new strings, converts one of them into an integer and throws the rest away. Some solutions:
    1. int(s.split('-', 1)[1])
    2. int(s[s.index('-')+1:])
    3. int(s[s.find('-')+1:])

    The third way is the correct and best solution! index performs the same search as find, but raises an exception when the substring is not found.
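
A Python 3 sketch that checks the three variants agree on the example string and times them; the function names are hypothetical. One caveat worth knowing: when there is no dash, `find` returns -1, so the slice becomes the whole string and an input like `'42'` is parsed silently, whereas `index` raises `ValueError`.

```python
import timeit

s = 'normalstandardstr-42'


def by_split(s):
    return int(s.split('-', 1)[1])    # list + two new strings


def by_index(s):
    return int(s[s.index('-') + 1:])  # one new string; ValueError if no dash


def by_find(s):
    return int(s[s.find('-') + 1:])   # one new string; find() returns -1 if no dash


for fn in (by_split, by_index, by_find):
    seconds = timeit.timeit(lambda: fn(s), number=200_000)
    print(f'{fn.__name__}: {seconds:.4f}s')
```
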
  15. Remove dots

    nlst.append and word.upper are attribute lookups (the "dots") that have to be re-evaluated on every iteration of the loop. Binding them to local names once, before the loop, avoids this. Use it with caution, but in basic loops this method can result in a major improvement: 25-33%.

    Avoid:

        nlst = []
        for word in oldlist:
            nlst.append(word.upper())

    Use:

        nlst = []
        upper = str.upper
        append = nlst.append
        for word in oldlist:
            append(upper(word))
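
A minimal Python 3 timing sketch; the word list and function names are made up. Recent CPython versions speed up attribute and method lookups considerably, so the gain you measure may be well below the 25-33% quoted on the slide. Also note that `str.upper` only works when every element is a `str`.

```python
import timeit

oldlist = ['alpha', 'beta', 'gamma', 'delta'] * 25_000


def with_dots(oldlist):
    nlst = []
    for word in oldlist:
        nlst.append(word.upper())  # two attribute lookups per iteration
    return nlst


def without_dots(oldlist):
    nlst = []
    upper = str.upper      # look the functions up once, outside the loop
    append = nlst.append
    for word in oldlist:
        append(upper(word))
    return nlst


for fn in (with_dots, without_dots):
    print(fn.__name__, timeit.timeit(lambda: fn(oldlist), number=10))
```
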
  16. Call a function on all list items

    You can use map to push the loop from the interpreter into compiled C code. An even better way is not to create the entire list at once, but to create a generator object that can be iterated over item by item.

    Avoid:

        nlst = []
        for word in oldlist:
            nlst.append(word.upper())

    Use:

        # Use a generator expression instead of a list
        nlst = (w.upper() for w in oldlist)

        # Use the map function (7x faster)
        nlst = map(str.upper, oldlist)
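
One difference to keep in mind when moving to Python 3: there `map` returns a lazy iterator instead of a list (in Python 2 it built the whole list), so it combines both ideas of this slide. A small sketch with a made-up word list:

```python
oldlist = ['alpha', 'beta', 'gamma']

# In Python 3 map() is lazy: it returns an iterator, not a list
mapped = map(str.upper, oldlist)
genexp = (w.upper() for w in oldlist)

# Materialize only if you really need a list
as_list = list(mapped)
first = next(genexp)  # items are produced one by one
rest = list(genexp)
```

Like a generator, the `map` object can be consumed only once; after `list(mapped)` it yields nothing more.
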
  17. Counter

    Another frequent situation is when we want to create a counter, for example when building a dictionary of word frequencies. Usually you would do one of the following:

    1. Good when the number of keys ~ the number of words:

        wdict = {}
        for word in words:
            if word not in wdict:
                wdict[word] = 0
            wdict[word] += 1

    2. Never a good choice:

        wdict = {}
        for word in words:
            wdict.setdefault(word, 0)
            wdict[word] += 1

    Except for the first occurrence of each word, the if test fails, so the check is wasted work. If you have many items in the list and few keys, this solution is not good enough and will be slow.
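
The two variants above, wrapped in hypothetical functions so they can be checked against each other (Python 3; the sample sentence is made up):

```python
def count_with_if(words):
    # Variant 1: explicit membership test on every word
    wdict = {}
    for word in words:
        if word not in wdict:
            wdict[word] = 0
        wdict[word] += 1
    return wdict


def count_with_setdefault(words):
    # Variant 2: setdefault is looked up and called on every word
    wdict = {}
    for word in words:
        wdict.setdefault(word, 0)
        wdict[word] += 1
    return wdict


words = 'the quick brown fox jumps over the lazy dog the end'.split()
freq1 = count_with_if(words)
freq2 = count_with_setdefault(words)
```
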
  18. Counter

    Let's see some solutions that use the collections module, and some tricks to avoid the condition:

    3. Never a good choice:

        from collections import Counter
        wdict = Counter(words)

    4. One of the best solutions:

        from collections import defaultdict
        wdict = defaultdict(int)
        for word in words:
            wdict[word] += 1

    5. Good when the number of keys << the number of words:

        wdict = {}
        for word in words:
            try:
                wdict[word] += 1
            except KeyError:
                wdict[word] = 1

    6. Best solution:

        wdict = {}
        get = wdict.get
        for word in words:
            wdict[word] = get(word, 0) + 1
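
Options 3 to 6 as hypothetical functions, checked for identical results on a made-up sentence (Python 3):

```python
from collections import Counter, defaultdict


def count_counter(words):
    # Option 3
    return Counter(words)


def count_defaultdict(words):
    # Option 4: missing keys start at int() == 0
    wdict = defaultdict(int)
    for word in words:
        wdict[word] += 1
    return wdict


def count_try_except(words):
    # Option 5: the exception path runs once per distinct key
    wdict = {}
    for word in words:
        try:
            wdict[word] += 1
        except KeyError:
            wdict[word] = 1
    return wdict


def count_get(words):
    # Option 6: dict.get bound once, outside the loop
    wdict = {}
    get = wdict.get
    for word in words:
        wdict[word] = get(word, 0) + 1
    return wdict


words = 'the quick brown fox jumps over the lazy dog the end'.split()
results = [dict(f(words)) for f in
           (count_counter, count_defaultdict, count_try_except, count_get)]
```

In Python 3, `Counter` does its counting in a C helper, so the "never a good choice" verdict from the Python 2 era may not hold on newer interpreters; measure before deciding.
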
  19. Counter

    The 6th option is the quickest most of the time, and it is the best solution when there are many different keys. The 4th option is very good when few keys are expected.

    | Method | Few keys: running time (place) | Many keys: running time (place) |
    |--------|--------------------------------|---------------------------------|
    | 1st    | 0.0231s (3)                    | 0.4915s (2)                     |
    | 2nd    | 0.0363s (5)                    | 0.5698s (4)                     |
    | 3rd    | 0.0441s (6)                    | 0.5769s (5)                     |
    | 4th    | 0.0165s (1)                    | 0.5159s (3)                     |
    | 5th    | 0.0304s (4)                    | 1.0867s (6)                     |
    | 6th    | 0.0220s (2)                    | 0.3880s (1)                     |
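
To re-check the ranking on your own interpreter, a small harness like the following could be used. It is a sketch under assumptions: the input sizes and key counts are made up (the slide does not state its test data), and only options 4 and 6 are compared.

```python
import random
import timeit
from collections import defaultdict


def count_defaultdict(words):
    wdict = defaultdict(int)
    for word in words:
        wdict[word] += 1
    return wdict


def count_get(words):
    wdict = {}
    get = wdict.get
    for word in words:
        wdict[word] = get(word, 0) + 1
    return wdict


random.seed(0)
# Hypothetical inputs: 100k words drawn from 10 keys vs. from 50k keys
few_keys = [str(random.randrange(10)) for _ in range(100_000)]
many_keys = [str(random.randrange(50_000)) for _ in range(100_000)]

for label, words in (('few keys', few_keys), ('many keys', many_keys)):
    for fn in (count_defaultdict, count_get):
        seconds = timeit.timeit(lambda: fn(words), number=5)
        print(f'{label:>9} {fn.__name__:>17}: {seconds:.4f}s')
```
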
  20. Runtime function mapping

        class Test2(object):
            def run(self, lines):
                wordnumbers = []
                append = wordnumbers.append
                for line in lines:
                    self.do(line, append)

            def do(self, line, append):
                append(line * 100)
                if line.count('f') == 4 and line.count('f') == 4 and line[0] == '6':
                    self.do = self.doThen

            def doThen(self, line, append):
                append(line)

    When you have a complex if statement* in your loop and, past a certain point, you want to do something else, you should use runtime function mapping: replace the method on the instance, so the condition is no longer evaluated. => 15-25% faster, depending on the condition.

    * Never use it when your condition is not complicated; it could cause a 50% slowdown.
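
A self-contained Python 3 sketch of the same trick; `Processor`, its `upper()` transformation and the sample lines are hypothetical. The instance attribute `do` shadows the class method of the same name, so once the condition has been met, later calls go straight to `do_then` without evaluating it again.

```python
class Processor(object):
    def run(self, lines):
        out = []
        append = out.append
        for line in lines:
            self.do(line, append)  # looks up do on the instance first, then the class
        return out

    def do(self, line, append):
        append(line.upper())
        # The "complex" condition is evaluated only until it first holds
        if line.count('f') == 4 and line.startswith('6'):
            self.do = self.do_then  # instance attribute shadows the method

    def do_then(self, line, append):
        # No condition here any more
        append(line)


processor = Processor()
output = processor.run(['abc', '6ffff', 'xyz'])
switched = processor.do == processor.do_then
```

The switch is per instance and permanent, so create a new instance if processing has to start over from the original behavior.
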
  21. PyPy

    PyPy is a fast alternative implementation of Python. It offers many unique features and capabilities that can improve performance and speed. PyPy provides:
    • a Just-In-Time compiler
    • better memory management
    • compatibility with existing Python code
    • support for sandboxing and micro-threads for concurrency
  22. PyPy

    The Python language gives you no way to pre-allocate a list, so as a list grows it has to be resized from time to time, copying all its data. If you know the list's final length, use newlist_hint.

    Avoid (0.6906s for 10M numbers):

        nlst = []
        append = nlst.append
        for i in xrange(n):
            append(i + i)

    Use (0.1140s for 10M numbers):

        from __pypy__ import newlist_hint
        nlst = newlist_hint(n)
        append = nlst.append
        for i in xrange(n):
            append(i + i)
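
A portable sketch, assuming the code must also run on CPython, where the `__pypy__` module does not exist. The fallback `newlist_hint` below is hypothetical: it simply returns an ordinary empty list, so on CPython it behaves exactly like the "Avoid" version, while PyPy gets the pre-sized list.

```python
try:
    # Only available on PyPy
    from __pypy__ import newlist_hint
except ImportError:
    def newlist_hint(size_hint):
        # Hypothetical CPython fallback: no public pre-sizing hook exists,
        # so return an empty list and let it grow as usual
        return []


def doubled(n):
    nlst = newlist_hint(n)
    append = nlst.append
    for i in range(n):
        append(i + i)
    return nlst
```
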
  23. Still slow?

    If you want to use more than one CPU core in native Python code, you have to spawn multiple processes, because the Global Interpreter Lock lets only one thread execute Python bytecode at a time. Usable modules:
    • multiprocessing
    • Parallel Python
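
A minimal sketch with the standard-library `multiprocessing` module (Python 3); `sum_of_squares` and its inputs are hypothetical stand-ins for CPU-bound work. The `if __name__ == '__main__':` guard matters because on platforms that start fresh interpreters for the workers, each worker re-imports the main module.

```python
from multiprocessing import Pool


def sum_of_squares(n):
    # CPU-bound toy task standing in for real work
    return sum(i * i for i in range(n))


inputs = [200_000, 300_000, 400_000, 500_000]

if __name__ == '__main__':
    # Each worker is a separate process with its own interpreter and GIL
    with Pool(processes=4) as pool:
        results = pool.map(sum_of_squares, inputs)
    print(results)
```
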
  24. Monitoring

    Never forget to monitor your tools and web applications. Code optimization will always be required; it is never done. Give New Relic a try for monitoring your WSGI application.
  25. Join Us!

    Looking for a Junior Database Developer to maintain and develop Python BI tools and design data warehouses for well-known clients.
    http://mito.hu/karrier/junior-adatbazis-fejleszto/