Why is Python So Fast? - Eric Chiang

A peek under the hood of all your favorite scientific Python packages.

PyGotham 2014

August 17, 2014

Transcript

  1. Me

  2. Case in point:

    • All That Is Solid Melts Into Air - Data and the Law
    • Big Data in Biology: A Case Study in Computational Proteomics with Python and MongoDB
    • Bend Postgres to Your Pythonic Will
    • Canonical sectors and evolutions of US stocks: an application of machine learning in Python
    • Debunking Other People's Data Science
    • Building flexible tools to store sums and report on CSV data
    • Enough Machine Learning to Make Hacker News Readable Again
    • Graph Analysis with Python
    • Harlem Election Rematch: Money and Demographics in the 13th
    • Helping Python Play Chess
    • How I use Python to Fight human trafficking
    • MEDS: Malware Evolution Discovery System
    • Monary: Really fast analysis with MongoDB and NumPy
    • One Gestalt to Rule Them All
    • Practical Approaches to Problems in the Financial Industry using Python
    • Pretty Pictures Please
    • Preventing Data Flat-lining
    • Python for Curious People who Like Natural Language a Lot
    • Python in the Video Game Industry – Best Practices and Finding Cheaters
    • SMS for Humans: Using NLP To Make Text Message Interfaces That Fat Fingers Can Use
    • Sparkling Pandas - using Apache Spark to scale Pandas
    • Speed without drag
    • Statistics and Linear Regression Models with Python
    • TSAR (the TimeSeries AggregatoR)
    • ...
    • Video, Python and FFmpeg: What you can do!
    • Weather of The Century
    • What Problem Are You Trying to Solve, Anyway?
  3. Awesome toolbox

      > from sklearn.linear_model import ElasticNet
      > import statsmodels.api as sm
      > sm.tsa.ARMA
      > from pandas import DataFrame
  4. Python is slow

      a = [0, 1, 2, 3, 4]
      b = [x + 1 for x in a]  # [1, 2, 3, 4, 5]
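For comparison, a minimal sketch of the same increment written with NumPy, one of the scientific packages this talk looks under the hood of (assuming NumPy is installed). The addition is a single array expression instead of a Python-level loop:

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])

# One expression adds 1 to every element; the element-by-element loop
# runs inside NumPy's compiled code, not in the Python interpreter.
b = a + 1
print(b)  # [1 2 3 4 5]
```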
  7. Python is slow

      import random
      import time

      n = 10000000  # ten million
      m = 100       # repetitions

      _min = -100000
      _max = 100000
      a = [random.randint(_min, _max) for _ in range(n)]

      # Time m passes of adding 1 to every element with a list comprehension.
      begin = time.time()
      for _ in range(m):
          b = [x + 1 for x in a]
      end = time.time()

      print("Time spent: %f" % (end - begin))

      ~100 sec
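A rough sketch of the same benchmark with a NumPy array timed next to the list comprehension, assuming Python 3 and NumPy. The sizes `n` and `m` are scaled down so it finishes quickly, and `time_it` is a hypothetical helper introduced only for this example. No timings are claimed here; they depend on the machine.

```python
import time

import numpy as np

n = 200_000  # scaled down from the slide's ten million
m = 10       # scaled down from the slide's 100 repetitions

# Same value range as the slide; endpoint=True makes the upper bound
# inclusive, matching random.randint.
rng = np.random.default_rng(0)
a_array = rng.integers(-100000, 100000, size=n, endpoint=True)
a_list = a_array.tolist()  # plain Python ints for the list version


def time_it(fn, repeats):
    """Hypothetical helper: total wall-clock seconds for `repeats` calls of fn."""
    begin = time.perf_counter()
    for _ in range(repeats):
        fn()
    return time.perf_counter() - begin


list_seconds = time_it(lambda: [x + 1 for x in a_list], m)
numpy_seconds = time_it(lambda: a_array + 1, m)

print("List comprehension: %f sec" % list_seconds)
print("NumPy a + 1:        %f sec" % numpy_seconds)
```

Both versions compute the same values; only where the loop runs differs.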
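  10. Python is slow

      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>

      #define n 10000000
      #define m 100

      int main(){
          clock_t begin, end;
          double time_spent;
          int *a = malloc(n * sizeof(int));
          int *b = malloc(n * sizeof(int));
          int i, j;
          if (a == NULL || b == NULL) return 1;
          /* Same value range as the Python version; keeps a[j] + 1 from overflowing. */
          for(i=0; i<n; i++) { a[i] = rand() % 200001 - 100000; }
          begin = clock();
          for(i=0; i<m; i++)
              for(j=0; j<n; j++)
                  b[j] = a[j] + 1;
          end = clock();
          time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
          printf("Time spent: %f\n", time_spent);
          free(a);
          free(b);
          return 0;
      }

      ~2.5 sec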
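Part of the gap is memory layout: the C program stores `n` unboxed ints back to back, while a Python list holds pointers to separate int objects. A small sketch comparing the two footprints, assuming CPython and NumPy. Exact byte counts vary by version and platform, and cached small-int objects are counted once per reference, so the list figure is only an approximation.

```python
import sys

import numpy as np

n = 1000  # much smaller than the slide's ten million

# C-style storage: n unboxed 32-bit ints in one contiguous buffer,
# comparable to malloc(n * sizeof(int)) where int is 4 bytes.
c_like = np.zeros(n, dtype=np.int32)
buffer_bytes = c_like.nbytes

# A Python list stores n pointers; each points to a separate int object.
py_list = list(range(n))
# Approximate total: the list's own storage plus every int object it
# references (CPython-specific sizes).
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

print("contiguous int32 buffer:", buffer_bytes, "bytes")
print("list of Python ints:    ", list_bytes, "bytes (approx.)")
```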
  11. Adding two numbers

      x = 1
      y = 1
      z = x + y

      An example from "Why Python is Slow" by Jake Vanderplas
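Under the hood, CPython compiles these three lines to bytecode, and the `+` becomes a generic binary-add instruction that has to check the operands' types at run time before it can add them. A small sketch using the standard `dis` module to list those instructions; opcode names differ between Python versions (for example `BINARY_ADD` before 3.11 and `BINARY_OP` from 3.11 on).

```python
import dis

# The slide's example, compiled the way CPython compiles module code.
source = "x = 1\ny = 1\nz = x + y\n"
code = compile(source, "<slide>", "exec")

# Print each bytecode instruction the interpreter will execute.
for instruction in dis.get_instructions(code):
    print(instruction.opname, instruction.argrepr)

# Collect the opcode names so the add instruction can be picked out.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]
```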
  12. This isn’t novel

    “Using Python, better applications can be developed because different kinds of programmers can work together on a project. For example, when building a scientific application, C/C++ programmers can implement efficient numerical algorithms, while scientists on the same project can write Python programs that test and use those algorithms.” - Guido van Rossum; 1998
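The glue pattern in the quote can be sketched with the standard library's `ctypes` module, one of the approaches discussed on the NumPy docs page "Using Python as Glue". Here `sqrt` from the C math library stands in for an efficient numerical routine written by a C programmer and called directly from Python. This assumes a Unix-like system where the C math library can be loaded.

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like system. find_library("m") locates the C math
# library; if it cannot, fall back to symbols already loaded into the
# Python process (often including libm on Linux).
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

# Declare the C signature: double sqrt(double).
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

# Python code calls the C routine like any other function.
result = libm.sqrt(2.0)
print(result)
```

Declaring `argtypes` and `restype` matters: without them, `ctypes` assumes the C function returns an `int`.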
  16. Interesting links:

    • Jake Vanderplas; Why Python is Slow: Looking Under the Hood; May, 2014; http://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/
    • Matt Asay; Python Displacing R As The Programming Language For Data Science; November, 2013; http://readwrite.com/2013/11/25/python-displacing-r-as-the-programming-language-for-data-science
    • Guido van Rossum; Glue It All Together With Python; January, 1998; https://www.python.org/doc/essays/omg-darpa-mcc-position/
    • NumPy docs; Using Python as Glue; (Last updated) March, 2014; http://docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html