Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Relational Algebra and Arel

Tom Stuart
February 07, 2011

Relational Algebra and Arel

A 20x20 lightning talk about relational algebra and the Arel library.

Given at the February 2011 LRUG meeting (http://lrug.org/meetings/2011/01/16/february-2011-meeting/). There's a video of this talk at http://skillsmatter.com/podcast/ajax-ria/arel-relational-algebra.

Tom Stuart

February 07, 2011
Tweet

More Decks by Tom Stuart

Other Decks in Programming

Transcript

  1. A record is a set of attributes (name-value pairs). (name,

    “Tom Stuart”) (email, “[email protected]”) (twitter, “@tomstuart”) (name, “John Doe”) (email, “[email protected]”) (twitter, “@johndoe”) records relation A relation is a set of records. RELATIONAL ALGEBRA Relational algebra is the maths of records and relations. In maths, a record is a set of attributes, so each of these green things is a record, each containing three attributes with names and values. A relation is just a set of these records. You can think of it as a mathematical model of a database table.
  2. An algebraic structure is a set with operations. They turn

    members of the set into other members. So that's the "relational" part, but what about the "algebra" part? Well, an algebraic structure is just a set with some operations that turn members of the set into other members. So you might have a unary operation that turns one member into another, or a binary operation that combines two members to make a third. If the members are relations then this whole thing is a relational algebra.
  3. A “mathematical query language” for manipulating sets of records. So

    relational algebra = operations on relations. relational algebra SQL database vendor API raw data abstract concrete So relational algebra is all about operations on relations. It's like a mathematical query language for manipulating sets of records. In this respect it's a more abstract version of SQL, which is itself an abstraction over different database vendor APIs, which themselves abstract over the raw data stored in a database.
  4. Selection: σP(R) P is a predicate Projection: πa,b,c,...(R) a,b,c,... are

    attribute names σtwitter=“@johndoe”(users) = πname,twitter(users) = (name, “Tom Stuart”) (twitter, “@tomstuart”) (name, “John Doe”) (twitter, “@johndoe”) (name, “John Doe”) (email, “[email protected]”) (twitter, “@johndoe”) There are six primitive operations in relational algebra. The first one is projection, which lets you pick out particular attributes from all records — you can see here that I've picked just the "name" and "twitter" attributes of all users. The second operation is selection, which lets you pick out all the records that satisfy some predicate — here are all the users whose Twitter screen name is "johndoe".
  5. Product: R1 ! R2 users ! currencies = (name, “Tom

    Stuart”) (twitter, “@tomstuart”) (email, “[email protected]”) (code, “GBP”) (symbol, “£”) (name, “John Doe”) (twitter, “@johndoe”) (email, “[email protected]”) (code, “GBP”) (symbol, “£”) (name, “Tom Stuart”) (twitter, “@tomstuart”) (email, “[email protected]”) (code, “USD”) (symbol, “$”) (name, “John Doe”) (twitter, “@johndoe”) (email, “[email protected]”) (code, “USD”) (symbol, “$”) Rename: ρa/b(R) a and b are attribute names Union: R1 ∪ R2 Difference: R1 - R2 Then there's the product operation, which gives you every combination of records from one relation with records from another. So if you have two users and two currencies, you get four records in the product. And there's also rename, union and difference operations, which I don't have time to go into now.
  6. You can compose operations: Can build natural join (⋈), outer

    joins (⟕, ⟖, ⟗), semijoins (⋉, ⋊), antijoin (Ὂ), intersection (∩), ... πname,tweet(σtwitter=screen_name(users ! ρtweet/text(tweets))) (name, “Tom Stuart”) (tweet, “Heading to @lrug”) (name, “Tom Stuart”) (tweet, “Talking at @lrug right now!”) (name, “John Doe”) (tweet, “WORST. LRUG TALK. EVER.”) = You can compose these operations. You can start out with one relation, then rename one of its attributes, then take the product with another relation, then do a selection, then do a projection, and get a relation like this. By composing the basic operations you can build more complex ones like the many varieties of database join.
  7. Representing queries like this has advantages. •simple •unambiguous •flexible •transformable

    (e.g. query optimisation) It’s better than concatenating strings. Representing queries in this mathematical way has advantages. It's simple and unambiguous; it's flexible; and because the operations are well understood, you can safely transform relational algebra expressions to make them more efficient. In any case it's much less complex than manipulating strings of SQL.
  8. AREL December 2007: Nick Kallen (Pivotal) starts sql_algebra January 2008:

    sql_algebra is renamed to Active Relation April 2008: Active Relation is renamed to Arel So, onto Arel. In 2007 Nick Kallen started the sql_algebra project, soon renamed to Active Relation and then ARel. Its tagline was "Arel is a Relational Algebra for Ruby", and it provided Ruby implementations of relations and the relational operations, plus the ability to convert them into SQL and execute them against a database.
  9. >> comments = Arel::Table.new(:comments) => #<Arel::Table @name="comments"> >> Arel::Table.superclass =>

    Arel::Relation >> comments[:user_id].count => <Arel::Count <Arel::Attribute user_id>> >> Arel::Count.superclasses => [Arel::Expression, Arel::Attribute, Object] >> comments[:user_id].eq(1) => #<Arel::Equality @operand1=<Arel::Attribute user_id>, @operand2=1> >> Arel::Equality.superclasses => [Arel::Binary, Arel::Predicate, Object] With Arel you can create a relation object which represents a table in a database. You do that by instantiating the Arel::Table class, which is a subclass of Relation. Once you have a relation you can build up expressions, like counting all the values of an attribute, or predicates, like testing the value of an attribute.
  10. >> comment_counts = comments. group(comments[:user_id]). project(comments[:user_id], comments[:id].count.as(:comment_count)) => #<Arel::Project @projections=

    [<Arel::Attribute user_id>, <Arel::Count <Arel::Attribute id>>], @relation= #<Arel::Group @groupings=[<Arel::Attribute user_id>], @relation=#<Arel::Table @name="comments">>> >> Arel::Project.superclasses => [Arel::Compound, Arel::Relation, Object] You can apply operations to relations. Here I'm using projection to make a new relation which shows the number of comments per user. That gives me an instance of Arel::Project, which is also a subclass of Relation, so I can keep applying more operations to it until I get the relation I want.
  11. >> users_with_comment_counts = users. join(comment_counts). on(users[:id].eq(comment_counts[:user_id])) => #<Arel::InnerJoin @predicates=[#<Arel::Equality ...>],

    @relation1=#<Arel::Table @name="users">, @relation2= #<Arel::Project @projections= [<Arel::Attribute ...>, <Arel::Count ...>], @relation= #<Arel::Group @groupings=[<Arel::Attribute user_id>], @relation=#<Arel::Table @name="comments">>>> >> Arel::InnerJoin.superclasses => [Arel::Join, Arel::Relation, Object] For example, I can join it with another relation to cross-reference user names with comment counts. The resulting object is an Arel::InnerJoin, which is still a relation. I can keep doing this all day, just building up larger and larger Relation objects.
  12. >> comment_counts.to_sql => "SELECT comments.user_id, COUNT(comments.id) AS comment_count FROM comments

    GROUP BY comments.user_id" >> users_with_comment_counts.to_sql => "SELECT users.id, users.name, users.email, users.twitter, comments_external.user_id, comments_external.comment_count FROM users INNER JOIN (SELECT comments.user_id, COUNT (comments.id) AS comment_count FROM comments GROUP BY comments.user_id) AS comments_external ON users.id = comments_external.user_id" Each Arel relation has a #to_sql method, which tells you what to execute in a database to actually get that relation's records. If you've built up complex relations with joins and calculations then this SQL will get quite long and involved, but you don't need to care about the details.
  13. >> users_with_comment_counts.first.tuple => ["1", "Tom Stuart", "[email protected]", "@tomstuart", "1", "2"]

    >> users_with_comment_counts. project(users_with_comment_counts[:name], users_with_comment_counts[:comment_count]). first.tuple => ["Tom Stuart", "2"] >> tweets = Arel::Array.new([['@tomstuart', 'Talking'], ['@joebloggs', 'Worst ever']], [:screen_name, :text]) => #<Arel::Array @array=[["@tomstuart", "Talking"], ["@joebloggs", "Worst ever"]], @attribute_names= [:screen_name, :text]> >> Arel::Array.superclass => Arel::Relation >> users.join(tweets).on(tweets[:screen_name]. eq(users[:twitter])).project(users[:name], tweets[:text]). first.tuple => ["Tom Stuart", "Talking"] In fact it'll execute the query *for* you. Here I'm pulling out the first record from that big relation, and here I'm building another relation and pulling a record from that. Relations don't even have to backed by a database table: here I'm making an in-memory relation using Arel::Array, then joining it with a relation from the database and pulling out a record. Magic.
  14. ACTIVERECORD::RELATION June 2008: Nick Kallen moves to Twitter, loses interest

    May 2009: Bryan Helmkamp takes over as Arel maintainer GSoC 2009: Emilio Tagua integrates it with Active Record So, in 2008 Nick Kallen moved to Twitter and lost interest in Arel, but it was picked up by Bryan Helmkamp in 2009, and he mentored Emilio Tagua on a Google Summer of Code project to integrate Arel with Active Record. Here's the commit where he introduced ActiveRecord::Relation, which lets Active Record talk to the database through Arel.
  15. • Internally: Active Record can build queries with the Arel

    API instead of concatenating SQL • Externally: ActiveRecord::Base.where et al return a lazy relation (like a named scope) • e.g. maybe your partial that renders the collection is in the fragment cache Why bother? The point of integrating Arel is to give Active Record a clean way of building queries without assembling strings of SQL. Another benefit is that Active Record's finder methods can return a relation instead of an array. Relations are lazy like named scopes, so they don't need to hit the database until you actually enumerate them.
  16. Last August, Aaron Patterson landed an Arel commit with the

    ominous message "clean slate". And it's a very clean slate: he basically deleted all of the files and started again from scratch. This was the genesis of Arel 2.0, which is what Rails uses as of 3.0.2 (November). Arel 2.0 is very different — basically an SQL compiler.
  17. Arel 2.0 Last August, Aaron Patterson landed an Arel commit

    with the ominous message "clean slate". And it's a very clean slate: he basically deleted all of the files and started again from scratch. This was the genesis of Arel 2.0, which is what Rails uses as of 3.0.2 (November). Arel 2.0 is very different — basically an SQL compiler.
  18. >> comments = Arel::Table.new(:comments) => #<Arel::Table @name="comments"> >> Arel::Table.superclass =>

    Object >> comment_counts = comments. group(comments[:user_id]). project(comments[:user_id], comments[:id].count.as('comment_count')) => #<Arel::SelectManager @ast=#<Arel::Nodes::SelectStatement>, @ctx=#<Arel::Nodes::SelectCore>, @visitor=#<Arel::Visitors::MySQL>, @connection=#<ActiveRecord::...::Mysql2Adapter>, ...> >> Arel::SelectManager.superclasses => [Arel::TreeManager, Object] ? ? So now when you make an Arel::Table, it's not a relation, it's just a table. When you do a projection on a table you get a SelectManager object, which isn't a relation either. But you can use these objects to build the abstract syntax tree of an SQL query.
  19. >> comment_counts.to_sql => "SELECT comments.user_id, COUNT(comments.id) AS comment_count FROM comments

    GROUP BY comments.user_id" >> users_with_comment_counts = users. join(comment_counts). on(users[:id].eq(comment_counts[:user_id])) NoMethodError: undefined method `[]' for #<Arel::SelectManager> >> users_with_comment_counts = users. join(comment_counts). on(users[:id].eq(comments[:user_id])) => #<Arel::SelectManager ...> >> users_with_comment_counts.to_sql TypeError: Cannot visit Arel::SelectManager ? ? Once you have an abstract syntax tree you can turn it into SQL just like a relation. I tried to do a join with a table but it didn't work until I tweaked the syntax a bit, and then the resulting object blew up when I called to_sql on it, so it's safe to say I have no idea how to use the Arel 2.0 API.
  20. Arel 1.0 was slow. "This implementation of “closure under composition”

    resulted in a “dup the world” on every method call. People were reporting that it would take minutes before the query even went to the database. Another problem is that nobody actually used its features. I asked Aaron Patterson why he started again. He said: "Arel 1.0 was slow. This implementation of "closure under composition" resulted in a "dup the world" on every method call. People were reporting that it would take minutes before the query even went to the database. Another problem is that nobody actually used its features."
  21. Arel was an aesthetically motivated project. I'm not sure that

    there's a place in the world for art- software — certainly there's no audience for it. I'm not sure whether what I produced is good, but I did accomplish what I set out to accomplish. It still empowers Active Record's query language to be OO and composable like named_scope. I'm glad the Rails world has that now. I asked Nick Kallen what he thought about all this. He said: "Arel was an aesthetically motivated project. I'm not sure that there's a place in the world for art-software — certainly there's no audience for it. I'm not sure whether what I produced is good, but I did accomplish what I set out to accomplish. It still empowers Active Record's query language to be OO and composable like named_scope. I'm glad the Rails world has that now."
  22. EOF