Slide 1

Slide 1 text

Thursday, February 28, 13

Slide 2

Slide 2 text

Why Python, Ruby, and Javascript are slow
Alex Gaynor
Waza 2013

Hi everyone, thanks for coming here on such short notice. I know there are a lot of far more talented speakers you could be seeing right now.

Slide 3

Slide 3 text

You may know me from...

So, first, who am I? Here are some places you may have seen me.

Slide 4

Slide 4 text

rdio.com

I work for a company called rdio; we do streaming internet music. We’re awesome, we’re hiring, and even if you don’t want a job you should totally use our product.

Slide 5

Slide 5 text

• CPython
• Django
• PyPy

I also write a lot of open source. I write primarily in Python: web stuff, compilers. I like infrastructure-type projects.

Slide 6

Slide 6 text

Twitter rants about how computers are bad

If you have the misfortune to follow me on Twitter, I tweet about how computers are bad, software is bad, and we should all feel bad.

Slide 7

Slide 7 text

Topaz
topazruby.com

And recently I open sourced Topaz, a high performance implementation of Ruby on top of RPython, which is the toolchain that powers PyPy. It’s pretty sweet, check it out and contribute!

Slide 8

Slide 8 text

There is no benchmark but your benchmark

If you’ve read that PyPy is X times faster than CPython, or that C is Y times faster than Java, or any other number like that: IGNORE THEM. These are myths, lies, and inappropriate attempts to use statistics.
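One way to act on this is to time your own workload rather than trust a published ratio. The sketch below uses the standard `timeit` module in Python 3; `parse_id`, `best_time`, and the sample input are hypothetical stand-ins for whatever code you actually care about.

```python
import timeit


def parse_id(s):
    # Hypothetical workload: a stand-in for the code you actually run.
    return int(s.split("-", 1)[1])


def best_time(func, arg, repeat=5, number=10000):
    # Take the minimum over several runs; it is the measurement least
    # disturbed by other activity on the machine.
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    seconds = best_time(parse_id, "order-12345")
    print("%.1f ns per call" % (seconds * 1e9))
```

The number this prints only describes this function, on this interpreter, on this machine, which is the point.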

Slide 9

Slide 9 text

Lame Excuses about why they’re Slow

So we’re going to talk about why a bunch of dynamic languages are slow (and yes, this basically applies to all of them, and also to a bunch of statically typed languages). First let’s look at the common wisdom about why they’re slow and see where that gets us (hint: this slide is telling you).

Slide 10

Slide 10 text

Dynamic Typing

So, the first bit of common wisdom: dynamic typing makes stuff slow because the compiler can’t optimize based on types. This is complete and utter bollocks. Dynamic typing is not a problem. We have good ways to solve it, like tracing JITs, method JITs with runtime type feedback, and predictive type inferencers.

Slide 11

Slide 11 text

“You can monkey patch anything”

Again, not a problem. When you’re building a JIT, it’s basically a solved problem. It’s called “deoptimization”, and it means: generate really optimized code, but still check your assumptions, and if they’re wrong, bail out. Most advanced JITs, like RPython’s (as seen in PyPy and Topaz) and v8, have this.
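The guard-and-bail-out idea can be sketched in plain Python 3. This is a toy model, not how RPython or v8 are actually implemented: `generic_add`, `make_specialized_add`, `add`, and the `Deoptimize` exception are all hypothetical names invented for the illustration.

```python
class Deoptimize(Exception):
    """Raised when a guard fails and the fast path must be abandoned."""


def generic_add(a, b):
    # Slow path: handles every type the language allows.
    return a + b


def make_specialized_add():
    # Pretend the JIT observed only ints and emitted int-only code.
    def fast_add(a, b):
        # Guard: check the assumption the fast code was compiled under.
        if type(a) is not int or type(b) is not int:
            raise Deoptimize
        return a + b  # imagine a single machine add here
    return fast_add


def add(a, b, _fast=make_specialized_add()):
    try:
        return _fast(a, b)
    except Deoptimize:
        # Assumption violated: bail out to the general implementation.
        return generic_add(a, b)
```

Calling `add(2, 3)` stays on the guarded fast path, while `add("a", "b")` fails the guard and falls back to the generic version. Monkey patching is handled the same way: the patch invalidates an assumption, and the optimized code is thrown away.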

Slide 12

Slide 12 text

Harder to Optimize vs. Slow

It’s harder to write a good VM for these languages because of those features. It’s hard to make reading an attribute in Python as fast as reading a field out of a struct in C. But we’ve done that, so now we’re going to talk about why stuff is REALLY slow.

Slide 13

Slide 13 text

The Truth

In truth, there are two things that can cause code to be slow: data structures and algorithms. Those are of course so broad, so general, that they could apply to anything. But let’s start with a core point: I can run a line of Python or Ruby as fast as you can run a line of C or Java. It’s all about what that line does.

Slide 14

Slide 14 text

Let’s talk about C

So C is fast; it’s basically a DSL for assembly, right? It’s fast because it’s basically like x86 instructions. But comparing what we do in it with what we do in higher level languages can make stuff clear.

Slide 15

Slide 15 text

    struct Point {
        double x;
        double y;
        double z;
    };

This is defining a Point struct in C. It’s got 3 fields, they’re all doubles.

Slide 16

Slide 16 text

    class Point(object):
        def __init__(self, x, y, z):
            self.x = x
            self.y = y
            self.z = z

This is a reasonable Python version of that. Some obvious differences:

* no type declarations
* you can add any other field you want to point instances
* point instances are always allocated by the GC on the heap

NONE OF THAT MATTERS. I WRITE COMPILERS, I’LL DEAL WITH THAT.

Slide 17

Slide 17 text

    data = {
        "x": x,
        "y": y,
        "z": z,
    }

Here’s what I see people do a lot. This is a dictionary. Dictionaries are not objects, dictionaries are hash tables.

Slide 18

Slide 18 text

Dictionary vs. Object

A dictionary is a mapping of arbitrary keys (usually of homogeneous type) to arbitrary values. An object is a thing with a fixed set of properties and behaviors. You should never confuse them. Even if the performance difference didn’t exist, these are different things.
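In Python terms the distinction might look like the sketch below. The `__slots__` declaration is an addition for illustration, chosen to make the “fixed set of properties” visible; the `Point` class on the earlier slide does not use it. `norm_squared` and `word_counts` are likewise made-up examples.

```python
class Point(object):
    # Fixed set of properties: assigning any other attribute fails.
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    def norm_squared(self):
        # Behavior lives with the data.
        return self.x * self.x + self.y * self.y + self.z * self.z


# A dictionary: arbitrary keys of one type mapped to arbitrary values.
word_counts = {}
for word in "a b a c a".split():
    word_counts[word] = word_counts.get(word, 0) + 1

p = Point(1.0, 2.0, 3.0)
```

The set of keys in `word_counts` is only known at runtime, so a hash table is the right tool. The fields of `p` are known when the class is written, so an object is.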

Slide 19

Slide 19 text

    std::unordered_map<std::string, double> point;
    point["x"] = x;
    point["y"] = y;
    point["z"] = z;

I wanted to use a pure C hash table implementation, but googling “C hash table” didn’t bring up useful stuff. Anyways, if you put this in a code review, first your coworkers would laugh a lot, and then they’d make mean jokes about you for the next year. It looks ridiculous, and it is ridiculous.

Slide 20

Slide 20 text

And it would be slow

If you mixed up hash tables and objects in C or the like, it would be idiotically slow.

Slide 21

Slide 21 text

Why don’t people care?

On naive implementations of these languages (that is, CPython, MRI, and most JS implementations up until 5 years ago), there’s no performance difference. When you have a real JIT, though, the difference is huge. So people train themselves to think of a dictionary as a lightweight object, and it becomes a weird dogma. And in Javascript it’s all one type, so that’s a mess anyways.

Slide 22

Slide 22 text

Let’s talk about strings

Strings are probably the most used data structure in all of programming. You use them EVERYWHERE. It’s pretty rare to have a single function that doesn’t have a string in it.

Slide 23

Slide 23 text

Given a string matching “\w+-\d+”, return the integral part of the value

This type of problem is a pretty common practical parsing task.

Slide 24

Slide 24 text

    int(s.split("-", 1)[1])

This is probably how like 99% of Python people would solve it. Split the string on the dash, get the second part, convert to an int. Let’s talk about efficiency. split is going to allocate a list of 2 elements, allocate 2 strings, and do 2 copies; then we convert one string to an int and throw the rest away.

Slide 25

Slide 25 text

    atoi(strchr(s, '-') + 1)

What does this do? It finds the first instance of a -, and converts the remainder of the string to an int. 0 allocations, 0 copies. Doing this with 0 copies is pretty much impossible in Python, and probably in Ruby and Javascript too.
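Python can’t get to zero copies here, but it can get closer. A sketch, assuming the input always contains a dash: find the dash with `str.index` and slice once, which allocates one substring instead of a list plus two strings. `parse_split` and `parse_index` are illustrative names.

```python
def parse_split(s):
    # Allocates a list and two strings, then discards all but one.
    return int(s.split("-", 1)[1])


def parse_index(s):
    # Allocates only the substring after the first dash.
    return int(s[s.index("-") + 1:])
```

Both return the same value for well-formed input. Whether the second is measurably faster on a given interpreter is something to benchmark on your own workload.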

Slide 26

Slide 26 text

Things that take time

• Hash table lookups
• Allocations
• Copying

The JIT will take care of the accidental ones, but the ones that are a fundamental part of your algorithm, you need to deal with those.

Slide 27

Slide 27 text

The C way: BYOB

Bring Your Own Buffer. In C, basically no stdlib function allocates a string; you’re always responsible for bringing a buffer to write data into. In practice this means C programmers tend to work on a single allocated block of memory.

Slide 28

Slide 28 text

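    char *data = malloc(1024);
    while (true) {
        /* read() returns how many bytes actually arrived */
        ssize_t n = read(fd, data, 1024);
        if (n <= 0) {
            break;
        }
        char *start = data;
        /* skip leading whitespace within the bytes we received */
        while (start < data + n) {
            if (!isspace((unsigned char)*start)) {
                break;
            }
            start++;
        }
        /* the buffer is not NUL-terminated, so print by length */
        printf("%.*s\n", (int)(data + n - start), start);
    }

A C function that reads data from a file descriptor, and then prints each chunk with the leading spaces removed. It does one allocation the whole time.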

Slide 29

Slide 29 text

    while True:
        data = os.read(fd, 1024)
        print data.lstrip()

The Python version is much prettier, much shorter, and much slower. It does multiple allocations per iteration of the loop. We can easily imagine wanting to add a .lower() to print the data in lowercase; now there’s an extra allocation and an extra copy in the loop, whereas the C one could easily add that with no additional copying.
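Python does have a few bring-your-own-buffer APIs. A sketch in Python 3 (the slides use Python 2), assuming a binary file object that supports `readinto`: one `bytearray` is allocated up front and reused for every chunk. `strip_leading_chunks` and `WHITESPACE` are hypothetical names, and the final `bytes(...)` call still copies the part of the chunk that is kept.

```python
import io

WHITESPACE = b" \t\n\r\x0b\x0c"


def strip_leading_chunks(stream, size=1024):
    buf = bytearray(size)           # input buffer, reused for every chunk
    view = memoryview(buf)          # slicing a memoryview does not copy
    while True:
        n = stream.readinto(buf)    # fills the existing buffer in place
        if not n:
            break
        start = 0
        while start < n and buf[start] in WHITESPACE:
            start += 1
        yield bytes(view[start:n])  # one copy, only of what we keep


if __name__ == "__main__":
    for chunk in strip_leading_chunks(io.BytesIO(b"   hello world")):
        print(chunk)
```

This is closer to the C version in spirit, but it is also clearly less pretty than `data.lstrip()`, which is the tension the talk is pointing at.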

Slide 30

Slide 30 text

    long *squares(long n) {
        long *sq = malloc(sizeof(long) * n);
        for (long i = 0; i < n; i++) {
            sq[i] = i * i;
        }
        return sq;
    }

This is a basically idiomatic C function to return an array of squares. One allocation, no copying.

Slide 31

Slide 31 text

    def squares(n):
        sq = []
        for i in xrange(n):
            sq.append(i * i)
        return sq

A basically idiomatic version of the same in Python. There’s no list pre-allocation, so on every iteration of the loop we potentially need to resize the list and copy all the data. That’s inefficient.
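Without a dedicated API, the closest portable idiom is to build the list at its final size and assign by index. A sketch in Python 3 (`range` rather than the slides’ `xrange`); `squares_prealloc` is an illustrative name. Whether this beats `append` on a given interpreter is something to measure, not assume.

```python
def squares_prealloc(n):
    sq = [0] * n          # one allocation at the final size
    for i in range(n):
        sq[i] = i * i     # fill in place, no resizing
    return sq
```

The cost is that the list briefly holds placeholder zeros, and the code only works when the final length is known up front.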

Slide 32

Slide 32 text

Missing APIs

In Python, Ruby, and Javascript, it’s not that there’s any necessary reason for these inefficiencies. It’s that we’re missing APIs, and we could have GREAT APIs for this. PyPy has some secret ones:

Slide 33

Slide 33 text

    from __pypy__ import newlist_hint

    def squares(n):
        sq = newlist_hint(n)
        for i in xrange(n):
            sq.append(i * i)
        return sq

newlist_hint returns you a list that looks totally normal, but internally it’s been preallocated. For building large lists this is way faster, with 0 loss of power, 0 loss of flexibility.
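Because `__pypy__` exists only on PyPy, code that wants the hint but must still run elsewhere needs a fallback. A sketch in Python 3, where the fallback is an assumption of this example: it simply ignores the hint and returns an empty list.

```python
try:
    from __pypy__ import newlist_hint
except ImportError:
    def newlist_hint(sizehint):
        # Fallback for interpreters without __pypy__: ignore the hint.
        return []


def squares(n):
    sq = newlist_hint(n)
    for i in range(n):
        sq.append(i * i)
    return sq
```

The calling code is identical on both interpreters, and only PyPy gets the preallocation.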

Slide 34

Slide 34 text

Don’t make us add heuristics

As a compiler author I’m begging you: don’t make dynamic language compilers add crazy heuristics. We need good APIs in Python, Ruby, and Javascript for preallocating data, for not copying stuff left and right. And we can build them; it’s way easier to make a kickass API for something in any of these languages than in C.

Slide 35

Slide 35 text

Heuristics = WAG

WAG = Wild Ass Guess. And that’s what heuristics devolve to. But if we, as a community, don’t embrace a set of idioms for high performance Python, then as a compiler author this is what I’ll end up writing, and it’ll make me sad.

Slide 36

Slide 36 text

Growing divide between optimizing and not

Code that is optimal for PyPy is not optimal for CPython, and this divide is growing wider as we add more optimizations. The same is true of Ruby and Javascript VMs. As a dynamic language community we need to get our act together on this front.

Slide 37

Slide 37 text

Recap

• Line for line these languages are fast!
• Take care in data structures (data structure heuristics are the WORST)
• We need better no-copy/preallocate APIs

Slide 38

Slide 38 text

Don’t abandon beauty, simplicity, our values for performance
Make performance beautiful.

Slide 39

Slide 39 text

Thank you!
https://speakerdeck.com/alex
@alex_gaynor

I’m told there’s no official Q/A, so please come find me in the halls and bug me. I’m super excited to talk about this stuff, PyPy, Topaz, web application architecture, pretty much anything.

Slide 40

Slide 40 text

If there’s time

• Java collections vs. Array and Hash. Need more choices.
• Stop writing C extensions, use something like cffi
• Teach good benchmarking practices