In an office considerably less sparse than this one, I assure you. Mea culpa, that’s “worked” in the past tense. I quit to join a startup last month. After signing up to give this talk. But I left on very good terms so I’m still doing it.
Adams kept popping into my brain. I do feel like we had Scalding thrust upon us at Etsy, rather than choosing it intentionally. Which is not the same as saying that I was personally unhappy with it, exactly. I was not. This is the character that went on to try to insult every being in the cosmos in alphabetical order. So I’m not sure if it was intended as an allegory about the Scala community.
Don’t run for the exits or anything. What I want to communicate with it is that, in the abstract, we aggregate logs from the live site and put them on HDFS. Then from there we crunch them to build internal tooling and features. For live features we’re putting job outputs into MySQL shards; for backend tools we typically use a BI database (Vertica) to fill the same need.
there. Cases where you’re probably the first person to ask not just this specific question, but any question even similar to it. Like this one. Etsy gets traffic to items that have already sold. How often could we redirect that traffic to items that have similar tags and titles?
is in a relatively raw form, which I’ll wave my hands and call analysis. And then we also build features and systems with Scalding, which is more like what I’d call “engineering.” We do work for ranking, for recommendations, and so on in Scalding.
about 800 Scalding jobs in source control. And if everyone is like me, there are probably twice as many in working directories, not committed. Only about 90 of those, though, run as part of our nightly batch process.
added Scalding to the build. And then he started trying to make things with it. Etsy’s not bureaucratic in any sense of the word that I understand. But in theory there’s supposed to be at least some discussion before you start using a new framework. That didn’t happen at all with Scalding.
force of his intellect and personality doesn’t explain Scalding’s runaway success. If that’s all it was about, everyone would have stopped using it the minute he left. But the opposite of that happened.
two branches, one for the searches and one for the purchases. Then you cross join them and filter that shit down. And then you wind up with a branch for conversions per search term and a branch for visits per term, and you join those back together to get your answer.
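To make that flow concrete, here is a minimal sketch using plain Scala collections rather than actual Scalding pipes. Everything in it is an assumption made for illustration: the `Search` and `Purchase` records, the `visitId` and `time` fields, the rule that a purchase counts only if it comes after the search in the same visit, and the sample data. It is not the job Etsy ran, just the same shape.

```scala
// Plain Scala collections standing in for Scalding pipes.
// All names, fields, and rules here are hypothetical.
object SearchConversionSketch {
  final case class Search(visitId: String, term: String, time: Long)
  final case class Purchase(visitId: String, time: Long)

  def conversionRates(searches: Seq[Search],
                      purchases: Seq[Purchase]): Map[String, Double] = {
    // Cross join the two branches, then filter down to pairs from the
    // same visit where the purchase happened after the search.
    val converted: Seq[(String, String)] = for {
      s <- searches
      p <- purchases
      if s.visitId == p.visitId && p.time > s.time
    } yield (s.term, s.visitId)

    // Branch one: distinct converting visits per search term.
    val conversionsPerTerm: Map[String, Int] =
      converted.distinct.groupBy(_._1).map { case (term, rows) => term -> rows.size }

    // Branch two: distinct visits per search term.
    val visitsPerTerm: Map[String, Int] =
      searches.map(s => (s.term, s.visitId)).distinct
        .groupBy(_._1).map { case (term, rows) => term -> rows.size }

    // Join the branches back together; terms with no conversions get 0.
    visitsPerTerm.map { case (term, visits) =>
      term -> conversionsPerTerm.getOrElse(term, 0).toDouble / visits
    }
  }
}
```

On real data you would not want a literal cross join in memory like this; the point is only the two-branches-then-join-back structure.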
turns out to be a lot slower, too. Cascading doesn’t have a query optimizer, and this might be a lot closer if it did. But it doesn’t, so the JRuby version winds up running in many more MapReduce steps and takes like eight times longer.
resource problem in JRuby, which was taking seven hours to run every night. Someone rewrote it in Scalding in a day or two and got it down to 20 minutes. The problem wasn’t that anything was impossible in cascading.jruby. The point is merely that Scalding makes doing it the right way feel natural.
still carrying the baggage of 20th century software around with us. So analysis up front, which you’d do to see if you can make a case for doing the feature at all, feels like you’re not working. And the stuff in the middle feels like you’re really making progress. Even if it’s progress on something that could never actually work.
programmers are using day to day. Don’t mistake this as me saying they’re not smart enough, because they are. And it's not that learning FP wouldn't be good for everyone, because I think it would be. And it's not that functional programming is fundamentally too hard, or anything like that. It’s just a statement of fact. Most programmers I know are not experienced with functional programming, and Scala shares many functional idioms.
asking the question and getting an answer, there’s this weird period in the middle where you have to learn a bunch of category theory. Sure, it’s good for them, or something. But it’s also going to stop them from getting their answer.