2013 Thursday, February 28, 13 Hi everyone, thanks for coming here on such short notice. I know there are a lot of far more talented speakers you could be seeing right now.
source topaz. A high performance implementation of Ruby on top of RPython, which is the toolchain that powers PyPy. It’s pretty sweet, check it out and contribute!
If you’ve read that PyPy is X times faster than CPython, or that C is Y times faster than Java, or any other number like that: IGNORE THEM. These are myths, lies, and inappropriate attempts to use statistics.
So we’re going to talk about why a bunch of dynamic languages (and yes this basically applies to all of them, and also a bunch of statically typed languages) are slow. First let’s look at the common wisdom about why they’re slow and see where that gets us (hint: this slide is telling you).
common wisdom: dynamic typing, it makes stuff slow because the compiler can’t optimize stuff based on types. This is complete and utter bollocks. Dynamic typing is not a problem. We have good ways to solve it, like tracing JITs, method JITs with runtime type feedback, and predictive type inferencers.
not a problem. When you’re building a JIT, it’s basically a solved problem. It’s called “deoptimization”, and basically that means: generate really optimized code, but still check your assumptions, and if they’re wrong you bail out. Most advanced JITs, like RPython (as seen in PyPy and Topaz) and V8, have this.
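The guard-and-bail-out idea can be sketched in plain Python. This is only a toy illustration with hypothetical function names; it is not how RPython or V8 implement deoptimization, where the fast path would be generated machine code rather than another Python function.

```python
def generic_add(a, b):
    # Slow path: fully dynamic, works for any operand types.
    return a + b


def specialized_add(a, b):
    # Guard: the optimized path assumes both operands are plain ints.
    if type(a) is int and type(b) is int:
        # Stand-in for the tight code a JIT would emit for int + int.
        return a + b
    # The assumption was wrong: bail out to the generic path.
    return generic_add(a, b)
```

The point of the sketch is that the check is cheap compared with the work it lets the fast path skip, and correctness never depends on the assumption holding.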
harder to write a good VM for these languages because of those features. It’s hard to make reading an attribute in Python as fast as reading a field out of a struct in C. But we’ve done that so now we’re going to talk about why stuff is REALLY slow.
two things that can cause code to be slow: data structures and algorithms. Those are of course so broad, so general that they could apply to anything. But let’s start with a core point: I can run a line of Python or Ruby as fast as you can run a line of C or Java. It’s all about what that line does.
is fast, it’s basically a DSL for assembly, right? So C is fast because it’s basically like x86 instructions. But comparing what we do in it with what we do in higher level languages can make stuff clear.
self.y = y
self.z = z

This is a reasonable Python version of that. Some obvious differences:
* no type declarations
* you can add any other field you want to point instances
* point instances are always allocated by the GC on the heap
NONE OF THAT MATTERS. I WRITE COMPILERS, I’LL DEAL WITH THAT.
a mapping of arbitrary keys (usually of homogeneous type) to arbitrary values. An object is a thing with a fixed set of properties and behaviors. You should never confuse them. Even if the performance difference didn’t exist, these are different things.
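To make the distinction concrete, here is a small sketch in Python. The `Point` class mirrors the x/y/z fields discussed in the talk; the variable names and sample values are assumptions for illustration.

```python
class Point(object):
    # An object: a fixed, known set of properties.
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


# The object version: the fields are part of the type.
point_obj = Point(1, 2, 3)

# The dictionary version: the same data stored as arbitrary string keys.
# It works, but nothing says which keys are expected to exist.
point_dict = {"x": 1, "y": 2, "z": 3}
```

Both hold the same numbers, but only the class states the intent that every point has exactly these three fields, which is also the shape information a JIT can exploit.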
= z;

I wanted to use a pure C hash table implementation, but googling “C hash table” didn’t bring up useful stuff. Anyways, if you put this in a code review, first your coworkers would laugh a lot, and then they’d make mean jokes about you for the next year. It looks ridiculous, and it is ridiculous.
implementations of these languages, that is CPython, MRI, and most JS implementations up until 5 years ago. There’s no performance difference. When you have a real JIT though, the difference is huge. People train themselves to think of a dictionary as a lightweight object, and it turns into a weird dogma. And in JavaScript it’s all one type, so that’s a mess anyways.
probably the most used data structure in all of programming. You use them EVERYWHERE. It’s pretty rare to have a single function that doesn’t have a string in it.
like 99% of Python people would solve it. Split the string on the dash, get the second part, convert to an int. Let’s talk about efficiency. split is going to allocate a list of 2 elements, allocate 2 strings, do 2 copies, and then convert one to an int, throwing the rest away.
this do? Finds the first instance of a -, and converts the remainder of the string to an int. 0 allocations, 0 copies. Doing this with 0 copies is pretty much impossible in Python, and probably in Ruby and JavaScript too.
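A minimal Python sketch of the two approaches, assuming an input shaped like `name-123`. The function names are hypothetical and this is not the code from the slides.

```python
def parse_with_split(s):
    # Allocates a list plus one new string per piece, then keeps only one.
    return int(s.split("-")[1])


def parse_with_find(s):
    # Locates the dash without building a list of pieces.
    idx = s.find("-")
    if idx == -1:
        raise ValueError("no '-' in input")
    # The slice still allocates and copies one new string, so even this
    # version is not zero-copy in Python.
    return int(s[idx + 1:])
```

The second version does less work, but the slice shows the limit described above: the language offers no way to hand `int` a view into the existing string.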
• Copying
The JIT will take care of the accidental ones, but the ones that are a fundamental part of your algorithm, you need to deal with those.
Own Buffer. In C, basically no stdlib function allocates a string; you’re always responsible for bringing a buffer to put the data into. In practice this means C programmers tend to work on a single allocated block of memory.
char *start = data;
while (start < data + 1024) {
    if (!isspace(*start)) {
        break;
    }
    start++;
}
printf("%s\n", start);
}

A C function that reads data from a file descriptor, and then prints each chunk with the leading spaces removed. It does one allocation the whole time.
The Python version is much prettier, much shorter, and much slower. It does multiple allocations per iteration of the loop. We can easily imagine that if we wanted to add a .lower() to print the chunk in lowercase, there would now be an extra allocation and an extra copy in the loop, whereas the C one could easily add that with no additional copying.
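Python does have a few APIs that allow the bring-your-own-buffer style. A minimal sketch, assuming Python 3 and binary streams; `print_chunks` and its parameters are hypothetical names, not code from the talk. It uses the standard `readinto` method and `memoryview` slicing so that the chunk data is read into one reusable buffer and sliced without being copied.

```python
import io

# ASCII whitespace bytes, matching what C's isspace() accepts in the "C" locale.
WHITESPACE = b" \t\n\r\x0b\x0c"


def print_chunks(stream, out, chunk_size=1024):
    buf = bytearray(chunk_size)  # the single buffer, allocated once
    view = memoryview(buf)       # slicing a memoryview does not copy the data
    while True:
        n = stream.readinto(buf)  # fills the existing buffer in place
        if not n:
            break
        start = 0
        # Skip leading whitespace by moving an index, not by building a string.
        while start < n and buf[start] in WHITESPACE:
            start += 1
        out.write(view[start:n])
        out.write(b"\n")
```

For example, `print_chunks(io.BytesIO(b"  hello"), sys.stdout.buffer)` would read from an in-memory stream. The output stream still copies what it is given; the saving is in not creating a fresh bytes object for every read and every strip.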
for (long i = 0; i < n; i++) {
    sq[i] = i * i;
}
return sq;
}

This is a basically idiomatic C function to return an array of squares. One allocation, no copying.
* i)
return sq

A basically idiomatic version of the same in Python. No list pre-allocation, so on every iteration through the loop we have the potential to need to resize the list and copy all the data. That’s inefficient.
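For comparison, here is a sketch of the append-based version next to a workaround that is available in portable Python today: create the list at its final size and fill it by index. The function names are hypothetical, and whether the second form is actually faster depends on the implementation; it only illustrates what "preallocating" means here.

```python
def squares_append(n):
    # Grows the list as it goes; the list may be resized along the way.
    sq = []
    for i in range(n):
        sq.append(i * i)
    return sq


def squares_prealloc(n):
    # Allocates the list at its final length up front, then fills it in place.
    sq = [0] * n
    for i in range(n):
        sq[i] = i * i
    return sq
```

The workaround has a cost in flexibility: the list is full of placeholder values until the loop finishes, and the code has to know the final length exactly.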
not that there’s any necessary reason for these inefficiencies. It’s that we’re missing APIs, and we could have GREAT APIs for this. PyPy has some secret ones:
i in xrange(n):
    sq.append(i * i)
return sq

newlist_hint returns you a list that looks totally normal, but internally it’s been preallocated. For building large lists this is way faster, with 0 loss of power, 0 loss of flexibility.
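A self-contained sketch of how this might be used so that the same code also runs outside PyPy. `newlist_hint` lives in PyPy’s `__pypy__` module; the fallback definition and the `squares` function name are assumptions added here, and `range` is used instead of `xrange` so the sketch runs on Python 3.

```python
try:
    # Only available when running on PyPy.
    from __pypy__ import newlist_hint
except ImportError:
    def newlist_hint(sizehint):
        # Fallback for other implementations: ignore the hint and
        # return an ordinary empty list.
        return []


def squares(n):
    sq = newlist_hint(n)  # behaves like [], may be preallocated internally
    for i in range(n):
        sq.append(i * i)
    return sq
```

Because the returned object is an ordinary list, the hint can be wrong without affecting correctness; it only affects how much resizing happens.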
a compiler author I’m begging you: don’t make dynamic language compilers add crazy heuristics. We need good APIs in Python, Ruby, and JavaScript for preallocating data, for not copying stuff left and right. And we can build them; it’s way easier to make a kickass API for something in any of these languages than in C.
Ass Guess. And that’s what heuristics devolve to. But if we, as a community, don’t embrace a set of idioms for high performance Python, then as a compiler author this is what I’ll end up writing, and it’ll make me sad.
Code that is optimal for PyPy is not optimal for CPython, and this divide is growing wider as we add more optimizations. The same is true of Ruby and JavaScript VMs. As a dynamic language community we need to get our act together on this front.
there’s no official Q&A, so please come find me in the halls and bug me. I’m super excited to talk about this stuff, PyPy, Topaz, web application architecture, pretty much anything.