Slide 1

Slide 1 text

Tom Stuart, LRUG 2011-02-07 RELATIONAL ALGEBRA & AREL

Slide 2

Slide 2 text

A record is a set of attributes (name-value pairs). (name, “Tom Stuart”) (email, “[email protected]”) (twitter, “@tomstuart”) (name, “John Doe”) (email, “[email protected]”) (twitter, “@johndoe”) records relation A relation is a set of records. RELATIONAL ALGEBRA Relational algebra is the maths of records and relations. In maths, a record is a set of attributes, so each of these green things is a record, each containing three attributes with names and values. A relation is just a set of these records. You can think of it as a mathematical model of a database table.

Slide 3

Slide 3 text

An algebraic structure is a set with operations. They turn members of the set into other members. So that's the "relational" part, but what about the "algebra" part? Well, an algebraic structure is just a set with some operations that turn members of the set into other members. So you might have a unary operation that turns one member into another, or a binary operation that combines two members to make a third. If the members are relations then this whole thing is a relational algebra.

Slide 4

Slide 4 text

A “mathematical query language” for manipulating sets of records. So relational algebra = operations on relations. relational algebra SQL database vendor API raw data abstract concrete So relational algebra is all about operations on relations. It's like a mathematical query language for manipulating sets of records. In this respect it's a more abstract version of SQL, which is itself an abstraction over different database vendor APIs, which themselves abstract over the raw data stored in a database.

Slide 5

Slide 5 text

Selection: σP(R) P is a predicate Projection: πa,b,c,...(R) a,b,c,... are attribute names σtwitter=“@johndoe”(users) = πname,twitter(users) = (name, “Tom Stuart”) (twitter, “@tomstuart”) (name, “John Doe”) (twitter, “@johndoe”) (name, “John Doe”) (email, “[email protected]”) (twitter, “@johndoe”) There are six primitive operations in relational algebra. The first one is projection, which lets you pick out particular attributes from all records — you can see here that I've picked just the "name" and "twitter" attributes of all users. The second operation is selection, which lets you pick out all the records that satisfy some predicate — here are all the users whose Twitter screen name is "johndoe".

Slide 6

Slide 6 text

Product: R1 ! R2 users ! currencies = (name, “Tom Stuart”) (twitter, “@tomstuart”) (email, “[email protected]”) (code, “GBP”) (symbol, “£”) (name, “John Doe”) (twitter, “@johndoe”) (email, “[email protected]”) (code, “GBP”) (symbol, “£”) (name, “Tom Stuart”) (twitter, “@tomstuart”) (email, “[email protected]”) (code, “USD”) (symbol, “$”) (name, “John Doe”) (twitter, “@johndoe”) (email, “[email protected]”) (code, “USD”) (symbol, “$”) Rename: ρa/b(R) a and b are attribute names Union: R1 ∪ R2 Difference: R1 - R2 Then there's the product operation, which gives you every combination of records from one relation with records from another. So if you have two users and two currencies, you get four records in the product. And there's also rename, union and difference operations, which I don't have time to go into now.

Slide 7

Slide 7 text

You can compose operations: Can build natural join (⋈), outer joins (⟕, ⟖, ⟗), semijoins (⋉, ⋊), antijoin (Ὂ), intersection (∩), ... πname,tweet(σtwitter=screen_name(users ! ρtweet/text(tweets))) (name, “Tom Stuart”) (tweet, “Heading to @lrug”) (name, “Tom Stuart”) (tweet, “Talking at @lrug right now!”) (name, “John Doe”) (tweet, “WORST. LRUG TALK. EVER.”) = You can compose these operations. You can start out with one relation, then rename one of its attributes, then take the product with another relation, then do a selection, then do a projection, and get a relation like this. By composing the basic operations you can build more complex ones like the many varieties of database join.

Slide 8

Slide 8 text

Representing queries like this has advantages. •simple •unambiguous •flexible •transformable (e.g. query optimisation) It’s better than concatenating strings. Representing queries in this mathematical way has advantages. It's simple and unambiguous; it's flexible; and because the operations are well understood, you can safely transform relational algebra expressions to make them more efficient. In any case it's much less complex than manipulating strings of SQL.

Slide 9

Slide 9 text

AREL December 2007: Nick Kallen (Pivotal) starts sql_algebra January 2008: sql_algebra is renamed to Active Relation April 2008: Active Relation is renamed to Arel So, onto Arel. In 2007 Nick Kallen started the sql_algebra project, soon renamed to Active Relation and then ARel. Its tagline was "Arel is a Relational Algebra for Ruby", and it provided Ruby implementations of relations and the relational operations, plus the ability to convert them into SQL and execute them against a database.

Slide 10

Slide 10 text

>> comments = Arel::Table.new(:comments) => # >> Arel::Table.superclass => Arel::Relation >> comments[:user_id].count => > >> Arel::Count.superclasses => [Arel::Expression, Arel::Attribute, Object] >> comments[:user_id].eq(1) => #, @operand2=1> >> Arel::Equality.superclasses => [Arel::Binary, Arel::Predicate, Object] With Arel you can create a relation object which represents a table in a database. You do that by instantiating the Arel::Table class, which is a subclass of Relation. Once you have a relation you can build up expressions, like counting all the values of an attribute, or predicates, like testing the value of an attribute.

Slide 11

Slide 11 text

>> comment_counts = comments. group(comments[:user_id]). project(comments[:user_id], comments[:id].count.as(:comment_count)) => #, >], @relation= #], @relation=#>> >> Arel::Project.superclasses => [Arel::Compound, Arel::Relation, Object] You can apply operations to relations. Here I'm using projection to make a new relation which shows the number of comments per user. That gives me an instance of Arel::Project, which is also a subclass of Relation, so I can keep applying more operations to it until I get the relation I want.

Slide 12

Slide 12 text

>> users_with_comment_counts = users. join(comment_counts). on(users[:id].eq(comment_counts[:user_id])) => #], @relation1=#, @relation2= #, ], @relation= #], @relation=#>>> >> Arel::InnerJoin.superclasses => [Arel::Join, Arel::Relation, Object] For example, I can join it with another relation to cross-reference user names with comment counts. The resulting object is an Arel::InnerJoin, which is still a relation. I can keep doing this all day, just building up larger and larger Relation objects.

Slide 13

Slide 13 text

>> comment_counts.to_sql => "SELECT comments.user_id, COUNT(comments.id) AS comment_count FROM comments GROUP BY comments.user_id" >> users_with_comment_counts.to_sql => "SELECT users.id, users.name, users.email, users.twitter, comments_external.user_id, comments_external.comment_count FROM users INNER JOIN (SELECT comments.user_id, COUNT (comments.id) AS comment_count FROM comments GROUP BY comments.user_id) AS comments_external ON users.id = comments_external.user_id" Each Arel relation has a #to_sql method, which tells you what to execute in a database to actually get that relation's records. If you've built up complex relations with joins and calculations then this SQL will get quite long and involved, but you don't need to care about the details.

Slide 14

Slide 14 text

>> users_with_comment_counts.first.tuple => ["1", "Tom Stuart", "[email protected]", "@tomstuart", "1", "2"] >> users_with_comment_counts. project(users_with_comment_counts[:name], users_with_comment_counts[:comment_count]). first.tuple => ["Tom Stuart", "2"] >> tweets = Arel::Array.new([['@tomstuart', 'Talking'], ['@joebloggs', 'Worst ever']], [:screen_name, :text]) => # >> Arel::Array.superclass => Arel::Relation >> users.join(tweets).on(tweets[:screen_name]. eq(users[:twitter])).project(users[:name], tweets[:text]). first.tuple => ["Tom Stuart", "Talking"] In fact it'll execute the query *for* you. Here I'm pulling out the first record from that big relation, and here I'm building another relation and pulling a record from that. Relations don't even have to backed by a database table: here I'm making an in-memory relation using Arel::Array, then joining it with a relation from the database and pulling out a record. Magic.

Slide 15

Slide 15 text

ACTIVERECORD::RELATION June 2008: Nick Kallen moves to Twitter, loses interest May 2009: Bryan Helmkamp takes over as Arel maintainer GSoC 2009: Emilio Tagua integrates it with Active Record So, in 2008 Nick Kallen moved to Twitter and lost interest in Arel, but it was picked up by Bryan Helmkamp in 2009, and he mentored Emilio Tagua on a Google Summer of Code project to integrate Arel with Active Record. Here's the commit where he introduced ActiveRecord::Relation, which lets Active Record talk to the database through Arel.

Slide 16

Slide 16 text

• Internally: Active Record can build queries with the Arel API instead of concatenating SQL • Externally: ActiveRecord::Base.where et al return a lazy relation (like a named scope) • e.g. maybe your partial that renders the collection is in the fragment cache Why bother? The point of integrating Arel is to give Active Record a clean way of building queries without assembling strings of SQL. Another benefit is that Active Record's finder methods can return a relation instead of an array. Relations are lazy like named scopes, so they don't need to hit the database until you actually enumerate them.

Slide 17

Slide 17 text

Last August, Aaron Patterson landed an Arel commit with the ominous message "clean slate". And it's a very clean slate: he basically deleted all of the files and started again from scratch. This was the genesis of Arel 2.0, which is what Rails uses as of 3.0.2 (November). Arel 2.0 is very different — basically an SQL compiler.

Slide 18

Slide 18 text

Arel 2.0 Last August, Aaron Patterson landed an Arel commit with the ominous message "clean slate". And it's a very clean slate: he basically deleted all of the files and started again from scratch. This was the genesis of Arel 2.0, which is what Rails uses as of 3.0.2 (November). Arel 2.0 is very different — basically an SQL compiler.

Slide 19

Slide 19 text

>> comments = Arel::Table.new(:comments) => # >> Arel::Table.superclass => Object >> comment_counts = comments. group(comments[:user_id]). project(comments[:user_id], comments[:id].count.as('comment_count')) => #, @ctx=#, @visitor=#, @connection=#, ...> >> Arel::SelectManager.superclasses => [Arel::TreeManager, Object] ? ? So now when you make an Arel::Table, it's not a relation, it's just a table. When you do a projection on a table you get a SelectManager object, which isn't a relation either. But you can use these objects to build the abstract syntax tree of an SQL query.

Slide 20

Slide 20 text

>> comment_counts.to_sql => "SELECT comments.user_id, COUNT(comments.id) AS comment_count FROM comments GROUP BY comments.user_id" >> users_with_comment_counts = users. join(comment_counts). on(users[:id].eq(comment_counts[:user_id])) NoMethodError: undefined method `[]' for # >> users_with_comment_counts = users. join(comment_counts). on(users[:id].eq(comments[:user_id])) => # >> users_with_comment_counts.to_sql TypeError: Cannot visit Arel::SelectManager ? ? Once you have an abstract syntax tree you can turn it into SQL just like a relation. I tried to do a join with a table but it didn't work until I tweaked the syntax a bit, and then the resulting object blew up when I called to_sql on it, so it's safe to say I have no idea how to use the Arel 2.0 API.

Slide 21

Slide 21 text

Arel 1.0 was slow. "This implementation of “closure under composition” resulted in a “dup the world” on every method call. People were reporting that it would take minutes before the query even went to the database. Another problem is that nobody actually used its features. I asked Aaron Patterson why he started again. He said: "Arel 1.0 was slow. This implementation of "closure under composition" resulted in a "dup the world" on every method call. People were reporting that it would take minutes before the query even went to the database. Another problem is that nobody actually used its features."

Slide 22

Slide 22 text

Arel was an aesthetically motivated project. I'm not sure that there's a place in the world for art- software — certainly there's no audience for it. I'm not sure whether what I produced is good, but I did accomplish what I set out to accomplish. It still empowers Active Record's query language to be OO and composable like named_scope. I'm glad the Rails world has that now. I asked Nick Kallen what he thought about all this. He said: "Arel was an aesthetically motivated project. I'm not sure that there's a place in the world for art-software — certainly there's no audience for it. I'm not sure whether what I produced is good, but I did accomplish what I set out to accomplish. It still empowers Active Record's query language to be OO and composable like named_scope. I'm glad the Rails world has that now."

Slide 23

Slide 23 text

EOF