Tom Stuart
February 07, 2011
910

# Relational Algebra and Arel

A 20x20 lightning talk about relational algebra and the Arel library.

Given at the February 2011 LRUG meeting (http://lrug.org/meetings/2011/01/16/february-2011-meeting/). There's a video of this talk at http://skillsmatter.com/podcast/ajax-ria/arel-relational-algebra.

## Tom Stuart

February 07, 2011

## Transcript

1. Tom Stuart, LRUG 2011-02-07
RELATIONAL
ALGEBRA
& AREL

2. A record is a set of attributes (name-value pairs).
(name, “Tom Stuart”)
(email, “[email protected]”)
(name, “John Doe”)
(email, “[email protected]”)
records
relation
A relation is a set of records.
RELATIONAL ALGEBRA
Relational algebra is the maths of records and relations. In maths, a record is a set of
attributes, so each of these green things is a record, each containing three attributes with
names and values. A relation is just a set of these records. You can think of it as a
mathematical model of a database table.

3. An algebraic structure is a set with operations.
They turn members of the set into other members.
So that's the "relational" part, but what about the "algebra" part? Well, an algebraic structure
is just a set with some operations that turn members of the set into other members. So you
might have a unary operation that turns one member into another, or a binary operation that
combines two members to make a third. If the members are relations then this whole thing is
a relational algebra.

4. A “mathematical query language” for manipulating
sets of records.
So relational algebra = operations on relations.
relational algebra
SQL
database vendor API
raw data
abstract
concrete
So relational algebra is all about operations on relations. It's like a mathematical query
language for manipulating sets of records. In this respect it's a more abstract version of SQL,
which is itself an abstraction over different database vendor APIs, which themselves abstract
over the raw data stored in a database.

5. Selection: σP(R) P is a predicate
Projection: πa,b,c,...(R) a,b,c,... are attribute names
(name, “John Doe”)
(name, “John Doe”)
(email, “[email protected]”)
There are six primitive operations in relational algebra. The ﬁrst one is projection, which lets
you pick out particular attributes from all records — you can see here that I've picked just the
"name" and "twitter" attributes of all users. The second operation is selection, which lets you
pick out all the records that satisfy some predicate — here are all the users whose Twitter
screen name is "johndoe".

6. Product: R1
! R2
users !
currencies =
(name, “Tom Stuart”)
(email, “[email protected]”)
(code, “GBP”)
(symbol, “£”)
(name, “John Doe”)
(email, “[email protected]”)
(code, “GBP”)
(symbol, “£”)
(name, “Tom Stuart”)
(email, “[email protected]”)
(code, “USD”)
(symbol, “\$”)
(name, “John Doe”)
(email, “[email protected]”)
(code, “USD”)
(symbol, “\$”)
Rename: ρa/b(R) a and b are attribute names
Union: R1 ∪ R2
Difference: R1 - R2
Then there's the product operation, which gives you every combination of records from one
relation with records from another. So if you have two users and two currencies, you get four
records in the product. And there's also rename, union and difference operations, which I
don't have time to go into now.

7. You can compose operations:
Can build natural join (⋈), outer joins (⟕, ⟖, ⟗),
semijoins (⋉, ⋊), antijoin (Ὂ), intersection (∩), ...
(name, “Tom Stuart”)
(name, “Tom Stuart”)
(tweet, “Talking at @lrug right now!”)
(name, “John Doe”)
(tweet, “WORST. LRUG TALK. EVER.”)
=
You can compose these operations. You can start out with one relation, then rename one of
its attributes, then take the product with another relation, then do a selection, then do a
projection, and get a relation like this. By composing the basic operations you can build more
complex ones like the many varieties of database join.

8. Representing queries like this has advantages.
•simple
•unambiguous
•ﬂexible
•transformable (e.g. query optimisation)
It’s better than concatenating strings.
Representing queries in this mathematical way has advantages. It's simple and unambiguous;
it's ﬂexible; and because the operations are well understood, you can safely transform
relational algebra expressions to make them more efficient. In any case it's much less
complex than manipulating strings of SQL.

9. AREL
December 2007: Nick Kallen (Pivotal) starts sql_algebra
January 2008: sql_algebra is renamed to Active Relation
April 2008: Active Relation is renamed to Arel
So, onto Arel. In 2007 Nick Kallen started the sql_algebra project, soon renamed to Active
Relation and then ARel. Its tagline was "Arel is a Relational Algebra for Ruby", and it provided
Ruby implementations of relations and the relational operations, plus the ability to convert
them into SQL and execute them against a database.

=> #
>> Arel::Table.superclass
=> Arel::Relation
=> >
>> Arel::Count.superclasses
=> [Arel::Expression, Arel::Attribute, Object]
=> #@operand1=, @operand2=1>
>> Arel::Equality.superclasses
=> [Arel::Binary, Arel::Predicate, Object]
With Arel you can create a relation object which represents a table in a database. You do that
by instantiating the Arel::Table class, which is a subclass of Relation. Once you have a
relation you can build up expressions, like counting all the values of an attribute, or
predicates, like testing the value of an attribute.

=> #@projections=
[,
>],
@relation=
#@groupings=[],
@relation=#>>
>> Arel::Project.superclasses
=> [Arel::Compound, Arel::Relation, Object]
You can apply operations to relations. Here I'm using projection to make a new relation which
shows the number of comments per user. That gives me an instance of Arel::Project, which is
also a subclass of Relation, so I can keep applying more operations to it until I get the
relation I want.

12. >> users_with_comment_counts = users.
join(comment_counts).
on(users[:id].eq(comment_counts[:user_id]))
=> #@predicates=[#],
@relation1=#,
@relation2=
#@projections=
[, ],
@relation=
#@groupings=[],
@relation=#>>>
>> Arel::InnerJoin.superclasses
=> [Arel::Join, Arel::Relation, Object]
For example, I can join it with another relation to cross-reference user names with comment
counts. The resulting object is an Arel::InnerJoin, which is still a relation. I can keep doing
this all day, just building up larger and larger Relation objects.

13. >> comment_counts.to_sql
>> users_with_comment_counts.to_sql
=> "SELECT users.id, users.name, users.email,
Each Arel relation has a #to_sql method, which tells you what to execute in a database to
actually get that relation's records. If you've built up complex relations with joins and
calculations then this SQL will get quite long and involved, but you don't need to care about
the details.

14. >> users_with_comment_counts.first.tuple
=> ["1", "Tom Stuart", "[email protected]", "@tomstuart",
"1", "2"]
>> users_with_comment_counts.
project(users_with_comment_counts[:name],
users_with_comment_counts[:comment_count]).
first.tuple
=> ["Tom Stuart", "2"]
>> tweets = Arel::Array.new([['@tomstuart', 'Talking'],
['@joebloggs', 'Worst ever']], [:screen_name, :text])
=> #["@joebloggs", "Worst ever"]], @attribute_names=
[:screen_name, :text]>
>> Arel::Array.superclass
=> Arel::Relation
>> users.join(tweets).on(tweets[:screen_name].
first.tuple
=> ["Tom Stuart", "Talking"]
In fact it'll execute the query *for* you. Here I'm pulling out the ﬁrst record from that big
relation, and here I'm building another relation and pulling a record from that. Relations don't
even have to backed by a database table: here I'm making an in-memory relation using
Arel::Array, then joining it with a relation from the database and pulling out a record. Magic.

15. ACTIVERECORD::RELATION
June 2008: Nick Kallen moves to Twitter, loses interest
May 2009: Bryan Helmkamp takes over as Arel maintainer
GSoC 2009: Emilio Tagua integrates it with Active Record
So, in 2008 Nick Kallen moved to Twitter and lost interest in Arel, but it was picked up by
Bryan Helmkamp in 2009, and he mentored Emilio Tagua on a Google Summer of Code
project to integrate Arel with Active Record. Here's the commit where he introduced
ActiveRecord::Relation, which lets Active Record talk to the database through Arel.

16. • Internally: Active Record can build queries with
the Arel API instead of concatenating SQL
• Externally: ActiveRecord::Base.where et al
return a lazy relation (like a named scope)
• e.g. maybe your partial that renders the
collection is in the fragment cache
Why bother?
The point of integrating Arel is to give Active Record a clean way of building queries without
assembling strings of SQL. Another beneﬁt is that Active Record's ﬁnder methods can return
a relation instead of an array. Relations are lazy like named scopes, so they don't need to hit
the database until you actually enumerate them.

17. Last August, Aaron Patterson landed an Arel commit with the ominous message "clean slate".
And it's a very clean slate: he basically deleted all of the ﬁles and started again from scratch.
This was the genesis of Arel 2.0, which is what Rails uses as of 3.0.2 (November). Arel 2.0 is
very different — basically an SQL compiler.

18. Arel 2.0
Last August, Aaron Patterson landed an Arel commit with the ominous message "clean slate".
And it's a very clean slate: he basically deleted all of the ﬁles and started again from scratch.
This was the genesis of Arel 2.0, which is what Rails uses as of 3.0.2 (November). Arel 2.0 is
very different — basically an SQL compiler.

=> #
>> Arel::Table.superclass
=> Object
=> #@ast=#,
@ctx=#,
@visitor=#,
@connection=#,
...>
>> Arel::SelectManager.superclasses
=> [Arel::TreeManager, Object]
?
?
So now when you make an Arel::Table, it's not a relation, it's just a table. When you do a
projection on a table you get a SelectManager object, which isn't a relation either. But you can
use these objects to build the abstract syntax tree of an SQL query.

20. >> comment_counts.to_sql
>> users_with_comment_counts = users.
join(comment_counts).
on(users[:id].eq(comment_counts[:user_id]))
NoMethodError: undefined method `[]' for
#
>> users_with_comment_counts = users.
join(comment_counts).
=> #
>> users_with_comment_counts.to_sql
TypeError: Cannot visit Arel::SelectManager
?
?
Once you have an abstract syntax tree you can turn it into SQL just like a relation. I tried to
do a join with a table but it didn't work until I tweaked the syntax a bit, and then the resulting
object blew up when I called to_sql on it, so it's safe to say I have no idea how to use the Arel
2.0 API.

21. Arel 1.0 was slow. "This implementation of
“closure under composition” resulted in a
“dup the world” on every method call.
People were reporting that it would take
minutes before the query even went to the
database.
Another problem is that nobody actually
used its features.
I asked Aaron Patterson why he started again. He said: "Arel 1.0 was slow. This
implementation of "closure under composition" resulted in a "dup the world" on every method
call. People were reporting that it would take minutes before the query even went to the
database. Another problem is that nobody actually used its features."

22. Arel was an aesthetically motivated project. I'm not
sure that there's a place in the world for art-
software — certainly there's no audience for it.
I'm not sure whether what I produced is good, but I
did accomplish what I set out to accomplish. It still
empowers Active Record's query language to be
OO and composable like named_scope.
I'm glad the Rails world has that now.
I asked Nick Kallen what he thought about all this. He said: "Arel was an aesthetically
motivated project. I'm not sure that there's a place in the world for art-software — certainly
there's no audience for it. I'm not sure whether what I produced is good, but I did accomplish
what I set out to accomplish. It still empowers Active Record's query language to be OO and
composable like named_scope. I'm glad the Rails world has that now."

23. EOF