other systems so we can quickly list/suggest friends. This started life on Redis, but the data was growing too quickly. We decoupled it from Redis and wrote a Cassandra backend. We incrementally deployed it and got Redis out of the picture within two weeks. That was cool.
before launch, I switched to “devlops” mode. Along with Adam McManus, our ops guy, we focused on tuning Cassandra for better read performance and to resolve stability problems. We ended up bringing in a DataStax consultant to help us verify we were doing the right things with Cassandra. The result of this was that, at launch, our cluster held up well and we didn’t have any Cassandra-related problems.
interesting and easy to tune. There is a little bit of upfront research in ﬁguring out exactly what the knobs mean and what the reporting tools are saying. Once I ﬁgured that out, it was easy to iteratively tweak things and see if they were having a positive effect on the performance of our cluster.
databases I’ve tinkered with, Cassandra stands out in terms of modeling time-related data. If an application is going to pull data in time-order most of the time, Cassandra is a really great place to start. I also like the column-oriented data model. It’s great if you mostly need a key-value store, but occasionally need a key-key-value store.
Cassandra in the 0.6 release, when it was a giant pain to set up locally (XML conﬁgs). It’s better now, but I should have put more energy into helping the other developers on our team getting Cassandra up and working properly. If I were to do it again, I’d probably look into leaning on the install scripts the cassandra gem includes, rather than Homebrew and a myriad of scripts to hack the Cassandra conﬁg.
does not work like MySQL or Redis. It has different design constraints and a relatively unique approach to those constraints. In advocating and explaining Cassandra, I think I pitched too much as a database nerd and not enough as “here’s a great tool that can help us solve some problems”. I hope that CQL makes it easier to put Cassandra in front of non-database nerds in terms that they can easily relate to and immediately ﬁnd productivity.
several million rows of data into Cassandra, we found it difficult to quickly change how we represented that data. It became a game of “how can we incrementally rejigger this data structure to have these other properties we just ﬁgured out we want?” I’m not sure that’s a game you can easily win at with Cassandra. I’d love to read more about building evolvable data structures in Cassandra and see how people are dealing with high- volume, evolving data.
29, 11 Having developed a database-like thing, I have come to the conclusion that developers really don’t like them very much. AR was hugely successful because it was so much more effective than anything previous to it that tried to make databases just go away. The closer a database is to one of the native data structures in the host language, the better. If it’s not a native data structure, it should be something they can create in a REPL and then say “magically save this for me!”
every abstraction leaks. Once it does, developers want simple and useful tools that let them ﬁgure out what’s going on, what the data really looks like, tinker with it, and get back to their abstracted world as quickly as possible. This starts with tools for setting up the database, continues through interacting with it (database REPL), and for operating it (logging, introspection, etc.) Cassandra does pretty well with these tools, but they’re still a bit nerdy.
applications to use secondary indexes (a great feature) because they didn’t exist just yet. I should have spent more time integrating this into the design of our services. We got bit a lot towards the end of our release cycle because we were building all of our indexes in the application and hadn’t designed for reverse indexes. We also designed a rather coarse schema, which further complicated ad-hoc querying, which is another thing non-database-nerds love.