Ruby's Enumerable module provides simple, functional methods that allow for a great deal of flexibility and composability. We'll examine some lesser-known features to expand our toolkit for dealing with collections. Learn more: http://bit.ly/1U3RRHw
Learn how to use Enumerators ‣ See lots of examples

At a high level, I hope you’ll get some new tips and ideas for using Enumerable and Enumerators out of this talk. We’re going to see a lot of examples, so buckle your seatbelts.
…growing up. Out of college, I joined Teach for America to teach 8th-grade science in Houston, TX. After my very first day of school, my department chair handed me a Lego Mindstorms kit and said, “I don’t know what to do with this; you figure it out.” So there I was, a green 21-year-old in an inner-city classroom with no idea how to teach or program, so I did what anyone would do in my position.
    …printf("Time left=%ds\n", i);
        sleep(10.0);
      }
    }

For years, this was what I thought programming was: cobbling together routines and variables with for loops and while loops. Very procedural. I got into “software development” and began programming full-time in Java. So, in other words, not much better than this.
…I first saw Ruby, I was blown away. I remember seeing an expression like this. I didn’t fully understand how it worked at first, but the significance was huge: code could be concise and beautiful, too. So that’s some context. And we know this particular expression relates to the functionality of the Enumerable module.
…joyful and expressive. Though you’re all familiar with Enumerable, it has some features that are not widely used or well understood. That’s what I’m going to focus on today.
    p Array.included_modules
    p Hash.included_modules
    require 'set'
    p Set.included_modules
    p Range.included_modules
    p Array.included_modules - Object.included_modules

It says something that Enumerable is one of only two modules included by default in Ruby’s most important collection classes. In fact, Kernel is included in Object, so Enumerable is the only module directly included in these classes.
    …}
    p [1, 2, 3].select { |n| n > 1 }
    p [1, 2, 3].any? { |n| n > 1 }
    p [1, 2, 3].all? { |n| n > 1 }

Output:

    [1, 4, 9]
    [2, 3]
    true
    false

And you know how to use many of Enumerable’s methods. Most of you have used map, select, any?, and all?, to name a few. And we know and love this pattern of using a block to describe how to interact with members of the collection.
Let’s begin by looking at some lesser-known methods and forms of Enumerable. The code you’ll see in today’s presentation is written for Ruby 2.2. Your mileage may vary in other Ruby versions.
    p Enumerable.instance_methods.sort

Output:

    [..., :drop, :drop_while, :each_cons, :each_entry, :each_slice, :each_with_index, :each_with_object, :entries, :find, :find_all, :find_index, :first, :flat_map, :grep, :group_by, :include?, :inject, :lazy, :map, :max, :max_by, :member?, :min, :min_by, :minmax, :minmax_by, :none?, :one?, :partition, :reduce, :reject, :reverse_each, :select, :slice_after, :slice_before, :slice_when, :sort, :sort_by, :take, :take_while, :to_a, :to_h, :zip]

Speaking of Enumerable methods, we’ve got a lot to choose from. Enumerable provides over 50 instance methods, some of which you know well and, I’ll bet, some that are less familiar. It’s worth exploring the ones you haven’t used much yet.
    …}
    p (1..100).select { |n| n % 13 == 0 }.length
    p (1..100).count { |n| n % 13 == 0 }

Output:

    [13, 26, 39, 52, 65, 78, 91]
    7
    7

Since select builds a new array that this expression doesn’t need, it’s wasteful. It’s more direct to use the block form of count, which simply returns the number.
…Enumerable’s grep method with a regular expression to select a set of string values. It’s useful in other situations as well, since grep matches its argument against each item using the three-equals (===) operator.
    …
    dates = [Date.new(1988, 7, 15), Date.new(1990, 1, 1)]
    p dates.select { |date| eighties.member?(date) }.map { |d| d.to_s }
    p dates.grep(eighties) { |d| d.to_s }
    p eighties === Date.new(1988, 7, 15)

Output:

    ["1988-07-15"]
    ["1988-07-15"]
    true

We can use a date range with grep. Consider selecting the dates that fall in the eighties and printing them as strings. Instead of a select/map expression, pass a date range and a block to grep. This works because Range#=== returns true for items contained in the range.
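Here is a self-contained version of the grep-with-a-range idea. The slide’s definition of `eighties` is cut off, so this sketch assumes it is a Date range from 1 January 1980 to 31 December 1989; the third date is an extra, made-up example.

```ruby
require 'date'

# Assumption: the slide's cut-off `eighties` range spans 1980-01-01..1989-12-31.
eighties = Date.new(1980, 1, 1)..Date.new(1989, 12, 31)

dates = [Date.new(1988, 7, 15), Date.new(1990, 1, 1), Date.new(1983, 3, 2)]

# grep keeps each date for which `eighties === date` is true,
# then passes every match through the block (here: format as a string).
in_the_eighties = dates.grep(eighties) { |d| d.to_s }
p in_the_eighties # => ["1988-07-15", "1983-03-02"]

# The same membership test grep performs internally:
p eighties === Date.new(1990, 1, 1) # => false
```

Any object that defines `===` can be handed to grep this way: ranges, regular expressions, classes, or your own matcher objects.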
So we’re seeing there are often several ways to solve a given problem. The same applies to grouping. We’re now going to look at several ways to count the occurrences of words in some text. The result will be a hash of key-value pairs mapping each word to its count, something like { a => 1, ruby => 5 }: basically a frequency histogram.
    …
    words.each do |w|
      histogram[w] ||= 0
      histogram[w] += 1
    end
    histogram
    puts histogram.sort_by { |k, v| -v }.take(6).to_h

Output:

    {"the"=>29, "you"=>23, "thriller"=>20, "and"=>14, "to"=>12, "night"=>10, ...}

Here’s the naive approach. We initialize a hash outside the iteration, lazily initialize each key-value pair to 0, and increment the word count on each pass.
    …do |hist, n|
      hist[n] += 1
      hist
    end
    puts histogram.sort_by { |k, v| -v }.take(6).to_h

Output:

    {"the"=>29, "you"=>23, "thriller"=>20, "and"=>14, "to"=>12, "night"=>10, ...}

Well, this works, but we can add some sophistication. We know we can initialize a Hash with a default value of 0, so there’s no need to lazily initialize the counts. We can also use reduce, also known as inject, to accumulate the histogram in a variable internal to the block. This means we don’t need to initialize a variable in the outer scope. We need to be careful to return the hash from the block, because reduce passes the block’s return value along as the accumulator for the next element.
    letters.reduce(Hash.new(0)) do |hist, n|
      hist[n] += 1
      hist
    end

For the letters a, a, b, the accumulator grows step by step: {} → {a=>1} → {a=>2} → {a=>2, b=>1}.

Reduce really confused me when I first encountered it. I’ll attempt to explain it with some janky pastel diagrams. Given a collection of letters, when we call reduce, we can inject a start value, in this case a hash, feed each value through some operation, which is to increment a count in the hash, and accumulate the final result like a snowball.
    …{ |sum, n| sum + n }

Step by step: 0 + 2 = 2, then 2 + 9 = 11, then 11 + 8 = 19.

Remember, we can reduce to any value, such as a number: here, the accumulated sum of the values in a numeric array.
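To make the snowball visible, this short sketch prints each step of the reduction. The input `[2, 9, 8]` is inferred from the diagram’s arithmetic; the `puts` call exists only to trace the running total.

```ruby
numbers = [2, 9, 8]

# reduce passes the running total (sum) and the next element (n) to the block;
# whatever the block returns becomes `sum` for the next element.
total = numbers.reduce(0) do |sum, n|
  puts "#{sum} + #{n} = #{sum + n}"
  sum + n
end

p total # => 19
```

This prints `0 + 2 = 2`, `2 + 9 = 11`, and `11 + 8 = 19` before the final result.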
    …do |hist, n|
      hist[n] += 1
      hist
    end
    puts histogram.sort_by { |k, v| -v }.take(6).to_h

Output:

    {"the"=>29, "you"=>23, "thriller"=>20, "and"=>14, "to"=>12, "night"=>10, ...}

Back to our word count example. We can actually replace reduce with the `each_with_object` method.
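    …do |n, hist|
      hist[n] += 1
    end
    puts histogram.sort_by { |k, v| -v }.take(6).to_h

Output:

    {"the"=>29, "you"=>23, "thriller"=>20, "and"=>14, "to"=>12, "night"=>10, ...}

It’s basically reduce with side effects. No matter what the block returns, the object passed in is what the method returns, so we no longer need to return the hash from the block and can clean up our reduce expression. Note that the block arguments come in the opposite order from reduce: the element first, then the memo object.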
    …{ |w| w }
      .flat_map { |k, v| [k, v.size] }]
    puts histogram.sort_by { |k, v| -v }.take(6).to_h

Output:

    {"the"=>29, "you"=>23, "thriller"=>20, "and"=>14, "to"=>12, "night"=>10, ...}

Instead of reducing, we chain `group_by` with `flat_map`: group identical words together, count the size of each group, and flatten the result into a single array of alternating words and counts, which can be splatted into Hash[] to produce the same histogram. I won’t diagram this one out, but it’s worth taking a look at on your own.
    …
        words.each do |w| histogram[w] ||= 0; histogram[w] += 1 end
      end
      x.report("reduce") do
        words.reduce(Hash.new(0)) { |hist, n| hist[n] += 1; hist }
      end
      x.report("each_with_object") do
        words.each_with_object(Hash.new(0)) { |n, hist| hist[n] += 1 }
      end
      x.report("group_by + flat_map") do
        Hash[*words.group_by { |w| w }.flat_map { |k, v| [k, v.size] }]
      end
      x.compare! # Output the comparison
    end

When you have multiple ways of solving a problem like this, it’s worth weighing the pros and cons, such as semantics and benchmark results. Comparing our four methods via the benchmark-ips gem…
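If you want to try the comparison without installing the benchmark-ips gem, here is a sketch using the standard library’s `Benchmark` module instead. The word list is a made-up stand-in for the lyrics used in the talk, and the sketch first checks that all four strategies agree before timing them; the timings themselves will vary by machine.

```ruby
require 'benchmark'

# Hypothetical input: the talk used song lyrics; any array of words works.
words = ("the night and the day and the night " * 1_000).split

strategies = {
  "naive" => -> {
    histogram = {}
    words.each do |w|
      histogram[w] ||= 0 # lazily initialize the count
      histogram[w] += 1
    end
    histogram
  },
  # reduce: the block must return the accumulator
  "reduce" => -> { words.reduce(Hash.new(0)) { |hist, w| hist[w] += 1; hist } },
  # each_with_object: the memo object is returned automatically
  "each_with_object" => -> { words.each_with_object(Hash.new(0)) { |w, hist| hist[w] += 1 } },
  # group_by + flat_map: group identical words, then splat [word, count, ...] into Hash[]
  "group_by + flat_map" => -> { Hash[*words.group_by { |w| w }.flat_map { |k, v| [k, v.size] }] }
}

# Sanity check: every strategy must produce the same histogram before we time them.
results = strategies.transform_values(&:call)
raise "strategies disagree" unless results.values.all? { |h| h == results["naive"] }

# Label column width of 20 keeps the report aligned.
Benchmark.bm(20) do |x|
  strategies.each do |name, strategy|
    x.report(name) { 20.times { strategy.call } }
  end
end
```

Hash equality ignores default values, so the `Hash.new(0)` results compare equal to the plain hash from the naive version.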
…always been straightforward to iterate over multiple arrays at once. For this, we can use zip. Let’s build Pascal’s triangle using zip. Recall that Pascal’s triangle is a collection of rows, where each element is the sum of the two adjacent elements above it in the preceding row.
    …
    p row = pascal_row(row)
    p row = pascal_row(row)
    p row = pascal_row(row)
    p row = pascal_row(row)

Output:

    [1]
    [1, 1]
    [1, 2, 1]
    [1, 3, 3, 1]
    [1, 4, 6, 4, 1]

Let’s write a method called pascal_row that takes a row as an argument and returns the next row in the triangle.
Let’s break down how this will work, using the row [1, 1]. We’ll create two arrays from the given row, one with 0 prepended and one with 0 appended (you’ll see why in a second). We’ll zip the two arrays together to form a single array of pairs. We’ll then add each pair together with map to produce the next row.

    [0, 1, 1] zip [1, 1, 0]  →  [[0, 1], [1, 1], [1, 0]]
    map { |a, b| a + b }     →  [0+1, 1+1, 1+0]  =  [1, 2, 1]
    …|a, b| a + b }
    end

    row = [1]
    p row
    p row = pascal_row(row)
    p row = pascal_row(row)
    p row = pascal_row(row)
    p row = pascal_row(row)

Output:

    [1]
    [1, 1]
    [1, 2, 1]
    [1, 3, 3, 1]
    [1, 4, 6, 4, 1]

Here’s our implementation: prepend and append a zero, zip the arrays, and map to sum the pairs.
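The slide cuts off the top of the method, so here is a complete version assembled from the steps just described (pad with zeros, zip, sum each pair). It collects the rows in an array instead of printing them one at a time, which is a choice made here for testability.

```ruby
# Returns the row that follows `row` in Pascal's triangle.
def pascal_row(row)
  # [0] + row shifts everything right; row + [0] keeps it in place.
  # zip pairs each element with its left neighbour, and map sums each pair.
  ([0] + row).zip(row + [0]).map { |a, b| a + b }
end

row = [1]
rows = [row]
4.times { rows << (row = pascal_row(row)) }
rows.each { |r| p r }
```

Because the zeros pad both ends, the first and last pairs always sum to 1, which is why every row starts and ends with 1.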
…the 1000-digit number that have the greatest product. What is the value of this product?

    text = <<-SEQ
    73167176531330624919225119674426574742355349194934
    96983520312774506326239578318016984801869478851843
    85861560789112949495459501737958331952853208805511
    12540698747158523863050715693290963295227443043557
    ...
    SEQ
    numbers = text.gsub(/\s+/, '').each_char.map(&:to_i)
    p numbers.each_cons(13) # ...

One of my favorite chunking methods is `each_cons`. It enumerates consecutive elements in groups of a given size, like a sliding window. Check out Project Euler problem 8: you need to find the greatest product of thirteen adjacent digits in a huge string of numbers. I’ve given you a hint, but it would be bad form to tell you the answer.
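To see the sliding window in action without giving away the Euler answer, this sketch uses only the first 11 digits of the puzzle input and a window of 3 instead of 13.

```ruby
digits = "73167176531".each_char.map(&:to_i)

# each_cons(3) yields every run of 3 consecutive elements: a sliding window.
windows = digits.each_cons(3).to_a
p windows.first(3) # => [[7, 3, 1], [3, 1, 6], [1, 6, 7]]

# Pick the window whose digits multiply to the largest product.
best = digits.each_cons(3).max_by { |window| window.reduce(:*) }
p best                 # => [7, 6, 5]
p best.reduce(:*)      # => 210
```

Note that `each_cons(n)` produces `size - n + 1` windows, and each window overlaps the previous one by `n - 1` elements; `each_slice(n)` is the non-overlapping counterpart.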
…Enumerable into our own classes to add collection behavior. I’d like to show a few good use cases for doing so. The main point of these next examples is how we can exploit our implementation of each.
    …
        yield "yellow"
        yield "green"
        yield "blue"
        yield "indigo"
        yield "violet"
      end
    end

(from The Well-Grounded Rubyist)

Here’s a simple example of a Rainbow Enumerable, which has the basic components of a custom collection: it includes Enumerable, it implements `each`, and `each` yields items. We don’t need to use hashes or arrays.
    …}
    puts "Starts with y?", rainbow.grep(%r{^y}, &:upcase)

Output:

    Next color: red
    Next color: orange
    Next color: yellow
    Next color: green
    Next color: blue
    Next color: indigo
    Next color: violet
    Starts with y?
    YELLOW

Now we can call any Enumerable method we like.
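Because the slide’s Rainbow class is cut off at the top, here is a complete, runnable version of the same idea. The color order follows the one printed in the output above.

```ruby
class Rainbow
  include Enumerable

  # Enumerable needs only #each; here each yields hard-coded values,
  # so no array or hash backs the collection.
  def each
    yield "red"
    yield "orange"
    yield "yellow"
    yield "green"
    yield "blue"
    yield "indigo"
    yield "violet"
  end
end

rainbow = Rainbow.new
rainbow.each { |color| puts "Next color: #{color}" }

# grep with a block maps each match through the block (here: upcase).
puts "Starts with y?", rainbow.grep(%r{^y}, &:upcase)
```

Every other Enumerable method, such as `count`, `first`, `sort`, or `each_slice`, is now available on Rainbow for free, because each of them is built on top of `each`.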
    …S ]
      RANKS = %w[ 2 3 4 5 6 7 8 9 10 J Q K A ]

      def initialize(n = 1)
        @cards = SUITS.cycle(n).flat_map do |suit|
          RANKS.map do |rank|
            [suit, rank]
          end
        end
      end

      def each(&block)
        @cards.each(&block)
      end
    end

(from The Well-Grounded Rubyist)

We can also wrap a standard collection in another class. Here we layer in the behavior of a card deck to build a collection of playing cards. The custom each method simply delegates to the underlying array of cards.
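Here is a runnable version of the Deck. The slide’s `SUITS` constant is cut off, so the single-letter suit names below are an assumption; everything else mirrors the slide.

```ruby
class Deck
  include Enumerable

  # Assumption: the slide's SUITS list is cut off; single-letter suits are a stand-in.
  SUITS = %w[C D H S]
  RANKS = %w[2 3 4 5 6 7 8 9 10 J Q K A]

  # n is how many 52-card decks to combine: cycle(n) repeats SUITS n times.
  def initialize(n = 1)
    @cards = SUITS.cycle(n).flat_map do |suit|
      RANKS.map { |rank| [suit, rank] }
    end
  end

  # Delegate to the underlying array; Enumerable builds everything else on this.
  def each(&block)
    @cards.each(&block)
  end
end

deck = Deck.new
p deck.count        # => 52
p deck.first(2)     # => [["C", "2"], ["C", "3"]]
p Deck.new(2).count # => 104
```

Delegating `each` keeps the underlying array private while still exposing `sort_by`, `each_slice` (handy for dealing hands), and the rest of Enumerable.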
…at a more pragmatic example, like an API client that fetches a collection of tweets or search results. A custom collection class can be really useful when you’re consuming an endpoint with paginated data.
    …
        Array(@collection[start..-1]).each do |element|
          yield(element)
        end
        unless last?
          start = [@collection.size, start].max
          fetch_next_page # api call adds to @collection
          each(start, &Proc.new)
        end
        self
      end
    end
    end

The twitter gem implements `each` to hide the complexity of fetching pages from the API. Here, each enumerates what it has, fetches the next page, and recursively calls itself until the last page is reached, all hidden away from the caller. This is really mind-blowing: you can start enumerating before you even have data to enumerate!
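To experiment with this pattern without a network connection, here is a sketch of a hypothetical `PagedCollection` that follows the same recursive `each` shape, with an in-memory array of pages standing in for the API. Two deliberate differences from the slide: the block is captured explicitly as `&block`, because Ruby 2.7 deprecated capturing the method’s block via `Proc.new` with no block and Ruby 3.0 removed it; and a `fetches` counter is added so you can see how few pages are requested.

```ruby
# Hypothetical paginated collection: each "API call" returns one page.
class PagedCollection
  include Enumerable

  attr_reader :fetches

  def initialize(pages)
    @pages = pages   # stand-in for a remote API
    @collection = [] # everything fetched so far
    @next_page = 0
    @fetches = 0
  end

  def last?
    @next_page >= @pages.size
  end

  def each(start = 0, &block)
    return to_enum(:each, start) unless block_given?
    # Yield everything already fetched from `start` onward.
    Array(@collection[start..-1]).each { |element| yield element }
    unless last?
      start = [@collection.size, start].max
      fetch_next_page
      each(start, &block) # recurse to yield the newly fetched items
    end
    self
  end

  private

  def fetch_next_page
    @fetches += 1
    @collection.concat(@pages[@next_page])
    @next_page += 1
  end
end

collection = PagedCollection.new([[1, 2, 3], [4, 5, 6], [7]])
p collection.first(2) # stops after the first page
p collection.fetches
p collection.to_a     # fetches the remaining pages on demand
p collection.fetches
```

Because `first(2)` breaks out of the iteration as soon as it has two items, only one page is fetched; `to_a` later pulls in the rest, reusing the items already cached in `@collection`.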
…example? Since the Rack API expects the body of the response to respond to each, we can exploit this to implement a streaming endpoint that sends data to the client before the page is fully rendered.
    …{'Content-Type' => 'text/plain'}, self]
      end

      def each
        yield 'one'
        sleep 1
        yield 'two'
        sleep 2
        yield 'three'
        sleep 3
      end
    end

    use Rack::Chunked
    run App.new

This is how you put the Rack::Chunked middleware to work. Here, I show a small Rack app that yields content in steps interspersed with sleep statements. The Rack::Chunked middleware does some work, including adding the “chunked” Transfer-Encoding header, to stream the data back in chunks. You can see the headers return immediately, then the content gradually comes in.
    …
              term = "\r\n"
              @body.each do |chunk|
                size = chunk.bytesize
                next if size == 0
                chunk = chunk.dup.force_encoding(Encoding::BINARY)
                yield [size.to_s(16), term, chunk, term].join
              end
              yield "0#{term}#{term}"
            end
          end
        end
    end

Rack::Chunked wraps the body of the response and overrides its each method: every non-empty chunk is yielded prefixed with its byte size in hexadecimal, and the stream ends with a zero-length terminating chunk.
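Here is a simplified, standalone sketch of that body wrapper, so you can see the framing without running Rack. `ChunkedBody` is a hypothetical name; unlike the Rack code above, it skips the binary re-encoding step, which is safe here only because the sample chunks are plain ASCII.

```ruby
# Hypothetical, simplified take on a chunked-body wrapper: it wraps any body
# that responds to #each and re-yields each chunk framed for HTTP/1.1
# chunked transfer encoding.
class ChunkedBody
  TERM = "\r\n"
  TAIL = "0#{TERM}#{TERM}" # zero-length chunk marks the end of the stream

  def initialize(body)
    @body = body
  end

  def each
    @body.each do |chunk|
      size = chunk.bytesize
      next if size.zero? # an empty chunk would look like end-of-stream, so skip it
      yield [size.to_s(16), TERM, chunk, TERM].join
    end
    yield TAIL
  end
end

framed = []
ChunkedBody.new(["one", "", "two"]).each { |piece| framed << piece }
p framed.join # => "3\r\none\r\n3\r\ntwo\r\n0\r\n\r\n"
```

The wrapper never buffers the whole body: each framed piece is handed to the server as soon as the wrapped body yields it, which is what makes streaming possible.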
    …render stream: true
      end
    end

This is the same mechanism invoked when you pass the `stream` option to render in your Rails controller. Just one Rails slide today.
    …do |item, index|
      [item, index % 3]
    end
    p pairs

Output:

    [["a", 0], ["b", 1], ["c", 2], ["d", 0], ["e", 1]]

For instance, we used to complain that there was `each_with_index` but no `map_with_index`. You can now chain map and `with_index`, a special Enumerator method. Both the item and the index are yielded to the block at the end of the expression.
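A complete version of the chained call, plus the optional starting offset that `with_index` accepts. The letters `a` through `e` match the output shown above; the numbered-list example is an addition.

```ruby
letters = %w[a b c d e]

# map without a block returns an Enumerator; with_index adds the index to each yield.
pairs = letters.map.with_index { |item, index| [item, index % 3] }
p pairs # => [["a", 0], ["b", 1], ["c", 2], ["d", 0], ["e", 1]]

# with_index takes an optional offset, handy for 1-based numbering.
numbered = letters.map.with_index(1) { |item, n| "#{n}. #{item}" }
p numbered # => ["1. a", "2. b", "3. c", "4. d", "5. e"]
```

The same trick works with any Enumerator, for example `select.with_index` or `each_slice(2).with_index`.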
    …do |item, index|
      index % 3
    end
    group_2 = letters.reverse_each.each_with_index.group_by do |item, index|
      index % 3
    end
    p group_1
    p group_2

Output:

    {0=>["e", "b"], 1=>["d", "a"], 2=>["c"]}
    {0=>[["e", 0], ["b", 3]], 1=>[["d", 1], ["a", 4]], 2=>[["c", 2]]}

You can chain several Enumerators, like reversing, grouping, and indexing, in one go. Keep in mind that the order may affect the results. So we’re extending the behavior of our existing Enumerable methods.
    …['aliceblue', 'ghostwhite'].cycle
    require 'erb'
    erb = (<<-ERB)
    <table>
      <% projects.each_with_index do |project, index| %>
      <tr style="background: <%= colors.next %>">
        <td><%= index + 1 %></td>
        <td><%= project.name %></td>
      </tr>
      <% end %>
    </table>
    ERB
    puts ERB.new(erb).result(binding).gsub(/^$\n/, "")

Output:

    <table>
      <tr style="background: aliceblue">
        <td>1</td>
        <td>TODO</td>
      </tr>
      <tr style="background: ghostwhite">
        <td>2</td>
        <td>Work</td>
      </tr>
      <tr style="background: aliceblue">
        <td>3</td>
        <td>Home</td>
      </tr>
    </table>

We use cycle here with the next method to get external enumeration. When we hold a reference to an enumerator, calling next repeatedly returns its values one by one. So we can create striped table rows by fetching the next color on each pass through the loop in this ERB template.
    …enumerator.next
    rescue StopIteration => e
      p "Halt!!! #{e.class}"
    end
    enumerator.rewind
    loop do
      p enumerator.next
    end
    p "Done!!!"

Output:

    1
    2
    "Halt!!! StopIteration"
    1
    2
    "Done!!!"

When using an Enumerator with next, it’s worth knowing that a StopIteration error is raised when the end of the enumeration is reached. rewind lets you start the enumeration over. Ruby’s loop is smart enough to rescue StopIteration and exit the loop, as you can see here.
"red" yield "orange" yield "yellow" yield "green" yield "blue" yield "indigo" yield "violet" end end Here’s how to do it: the Kernel module provides `to_enum`, also called `enum_for`. It takes a method name and other arguments if necessary and returns an Enumerator with a reference to your Enumerable object and method that was called. Now we can expose a colors method to get an enumerable method on Rainbow - we don’t actually need to include Enumerable to get this behavior.
    …}
    puts "Starts with y?", rainbow.colors.grep(%r{^y}, &:upcase)

Output:

    Next color: red
    Next color: orange
    Next color: yellow
    Next color: green
    Next color: blue
    Next color: indigo
    Next color: violet
    Starts with y?
    YELLOW

Now we can call any Enumerable method we like.
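Here is a runnable sketch of the `to_enum` version. The slide’s method header is cut off, so the guard clause `return to_enum(:colors) unless block_given?` is an assumption based on the pattern described above; the colors are yielded from an array for brevity.

```ruby
class Rainbow
  # No `include Enumerable` needed: calling colors without a block
  # hands back an Enumerator, and Enumerator includes Enumerable.
  def colors
    return to_enum(:colors) unless block_given?
    %w[red orange yellow green blue indigo violet].each { |color| yield color }
  end
end

rainbow = Rainbow.new
rainbow.colors { |color| puts "Next color: #{color}" }
puts "Starts with y?", rainbow.colors.grep(%r{^y}, &:upcase)

# The returned Enumerator also supports external enumeration with next.
colors = rainbow.colors
p colors.next # => "red"
p colors.next # => "orange"
```

This is the same guard clause most built-in iterators use, which is why `[1, 2, 3].each` without a block returns an Enumerator.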
        def each(start = 0)
          return to_enum(:each, start) unless block_given?
          Array(@collection[start..-1]).each do |element|
            yield(element)
          end
          unless last?
            start = [@collection.size, start].max
            fetch_next_page # api call adds to @collection
            each(start, &Proc.new)
          end
          self
        end
      end
    end
    …end
    p enum.map { |n| n * n }

Output:

    [1, 4]

Here’s an example of a bare Enumerator. We can create one with `Enumerator.new` and pass a block that declares how values will be generated. This example behaves like a two-item array.
    …end
    enum.map { |n| n * n }

Enumerator::Yielder

Taking a closer look, notice that a y parameter is given to the block. This object is called a Yielder. We’re not actually using the yield keyword in the block; we’re calling a yield method on the Yielder. The Yielder passes each value from the Enumerator’s block to the block of whatever is consuming it, here the block given to map.
    …do
        y.yield n
        n += 1
      end
    end
    enum.each { |i| puts i }

And we can do anything we want within the generator block, like maintaining state and using loops. You can even use an Enumerator to generate an infinite sequence.
    …1
        loop do
          y.yield a
          a, b = b, a + b
        end
      end
    end
    p fibonacci.take(10)

Output:

    [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

Here’s a Fibonacci sequence implemented with an Enumerator. This is pretty amazing: we usually think of Enumerable in terms of concrete collections, but now we can apply it to mathematical concepts.
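A complete version of the Fibonacci Enumerator, since the slide is cut off at the top. It is written here as a local variable rather than a method, which is a choice made for the sketch; the last line previews how `lazy` lets you filter an infinite sequence.

```ruby
fibonacci = Enumerator.new do |y|
  a, b = 1, 1
  loop do
    y.yield a       # hand the current term to whoever is consuming the Enumerator
    a, b = b, a + b # advance to the next pair of terms
  end
end

p fibonacci.take(10) # => [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

# Filtering an infinite sequence needs lazy; otherwise select would never return.
p fibonacci.lazy.select(&:even?).first(5) # => [2, 8, 34, 144, 610]
```

`take` is safe on an infinite Enumerator because it stops iterating as soon as it has collected enough values.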
    …current = first
          loop do
            y.yield current
            current = next_row(current)
          end
        end
      end

      def next_row(row)
        ([0] + row).zip(row + [0]).map { |a, b| a + b }
      end
    end

    require 'pp'
    pp PascalsTriangle.new.rows.take(7)

Output:

    [[1],
     [1, 1],
     [1, 2, 1],
     [1, 3, 3, 1],
     [1, 4, 6, 4, 1],
     [1, 5, 10, 10, 5, 1],
     [1, 6, 15, 20, 15, 6, 1]]

We can go one step further with the Pascal’s triangle example, using an Enumerator that repeatedly calls our next_row method to generate the triangle as a sequence.
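Here is a complete, runnable version of the class. The top of the slide is cut off, so the `rows(first = [1])` signature is an assumption inferred from the `current = first` fragment.

```ruby
class PascalsTriangle
  # Returns an infinite Enumerator of rows; each row is built from the previous one.
  # Assumption: the starting row is passed in as `first`, defaulting to [1].
  def rows(first = [1])
    Enumerator.new do |y|
      current = first
      loop do
        y.yield current
        current = next_row(current)
      end
    end
  end

  # Pad with zeros, pair neighbours with zip, and sum each pair.
  def next_row(row)
    ([0] + row).zip(row + [0]).map { |a, b| a + b }
  end
end

require 'pp'
pp PascalsTriangle.new.rows.take(7)
```

Nothing is computed until a consumer such as `take(7)` asks for rows, so the infinite `loop` inside the Enumerator is harmless.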
    …x }.first(10)

Result: never returns (Ctrl-C!!!)

Consider the problem with this infinite number generator. We’re attempting to map over an infinite range and then take the first 10 results. This will never finish because, by default, Enumerable methods are eager: map processes every value before passing its result down the chain. With one little change, inserting the lazy method, which returns a special type of Enumerator, a lazy one, voilà: we get the first ten results.
    …x }.first(10)

Output:

    [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Trace (each input followed by its square):

    1 1
    2 4
    3 9

Lazy is a special type of Enumerator that passes each value down the chain before enumerating the remaining items, instead of processing all of them eagerly. Each method in the chain is reimplemented to behave lazily, which means the end of the chain controls the flow of execution. In this case, once first has received ten items, it breaks out of the iteration, which stops the entire chain. We can take advantage of this behavior in special use cases.
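To see the value-at-a-time flow for yourself, this sketch records every step in a `trace` array. The `select` step and the odd-square filter are additions for illustration; they are not on the slide.

```ruby
trace = []

# With lazy, each value travels through map and select before the next one is generated.
result = (1..Float::INFINITY).lazy
  .map    { |x| trace << "map #{x}"; x * x }
  .select { |sq| trace << "select #{sq}"; sq.odd? }
  .first(3)

p result # => [1, 9, 25]
puts trace
```

The trace alternates `map 1`, `select 1`, `map 2`, `select 4`, and so on, and stops at `select 25`: once `first(3)` has its third item, no further values are generated. Without `lazy`, the `map` over an infinite range would never return.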
    …rate = row['ADM_RATE'].to_f
        rate > 0 && range === rate
      end
    end

    CSV.open('../../data/college-scorecard.csv', 'rb', headers: true) do |csv|
      results = csv.each.lazy.
        grep(CSVAdmissionRate.new(0.0..0.10)).take(10).
        each_with_object({}) do |row, data|
          puts row['INSTNM'] # streaming results
        end
    end

Output:

    Yale University
    University of Chicago
    Harvard University
    Massachusetts Institute of Technology
    Dartmouth College
    Princeton University
    Columbia University
    Cooper Union for the Advancement of Science and Art
    The Juilliard School
    Curtis Institute of Music

Imagine we’re processing a large CSV file in a memory-constrained environment. Here I’m using the College Scorecard data you can download from data.gov, along with the CSV class from the standard library. We have a CSVAdmissionRate object that implements the three-equals method to select CSV rows whose admission rate falls in a given range. Using lazy, I can grep for the first 10 colleges with an admission rate of at most 10% and stream back the results without loading the entire CSV file into memory.
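The same pattern can be tried without the data file. This sketch swaps the CSV rows for a small, made-up list of hashes (the college names and rates are placeholders, not real data) and counts how many rows are actually read. `AdmissionRate` is a simplified, hypothetical stand-in for the slide’s `CSVAdmissionRate`.

```ruby
# Hypothetical matcher: grep calls `matcher === row` for every row.
class AdmissionRate
  def initialize(range)
    @range = range
  end

  def ===(row)
    rate = row['ADM_RATE'].to_f # blank values become 0.0 and are excluded below
    rate > 0 && @range === rate
  end
end

# Stand-in for streamed CSV rows; rows_read shows how far the stream was consumed.
rows_read = 0
rows = Enumerator.new do |y|
  [
    { 'INSTNM' => 'College A', 'ADM_RATE' => '0.05' },
    { 'INSTNM' => 'College B', 'ADM_RATE' => '0.45' },
    { 'INSTNM' => 'College C', 'ADM_RATE' => '' },
    { 'INSTNM' => 'College D', 'ADM_RATE' => '0.08' },
    { 'INSTNM' => 'College E', 'ADM_RATE' => '0.07' }
  ].each do |row|
    rows_read += 1
    y << row
  end
end

selective = rows.lazy.grep(AdmissionRate.new(0.0..0.10)).first(2).map { |row| row['INSTNM'] }
p selective # => ["College A", "College D"]
p rows_read # => 4
```

The fifth row is never read: as soon as the second match arrives, `first(2)` stops the lazy chain, which is exactly what keeps memory use flat on a large file.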
…web crawler in Ruby. We’ll create a Spider class that processes a root webpage, follows links in breadth-first order, and records data from each page. We only know the root url to start. We’ll dynamically add to our list of urls to visit while simultaneously consuming those urls and recording a dataset of page information as we go. We’ll use Enumerators to do this.
    …block_given?
        i = @results.length
        url_enum.each do |url, handler, data|
          send handler, agent.get(url), data
          if block_given? && @results.length > i
            yield @results.last
            i += 1
          end
          sleep @interval if @interval > 0
        end
      end
    end

Our public method, results, yields crawled data as we iterate over the urls to crawl. Notice the call to to_enum if no block is given. For each url, we call its handler with the fetched page, yield any newly recorded result, then move on to the next url and repeat. We also provide a sleep interval to respect the crawl limits requested in robots.txt.
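    …do |y|
          while index < @urls.count && index <= @max_urls
            url = @urls[index]
            index += 1
            next unless url
            handler, data = @handlers[url]
            y.yield url, handler, data
          end
        end
      end
    end

Our `url_enum` is an Enumerator that yields each url along with the handler (given as a method name) and data used to process its page. Because the while condition rechecks `@urls.count` on every pass, urls queued during the crawl are picked up as we go.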
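The Spider relies on an HTTP agent and live pages, so here is a network-free sketch of its core idea: an Enumerator over a queue that grows while it is being consumed, which yields pages in breadth-first order. `TinySpider` and the `LINKS` graph are hypothetical stand-ins, and this version caps the crawl with `index < @max_urls`, so `max_urls` is an exact upper bound on pages visited.

```ruby
# Hypothetical link graph standing in for fetched pages.
LINKS = {
  "/"  => ["/a", "/b"],
  "/a" => ["/c", "/b"],
  "/b" => ["/c"],
  "/c" => []
}

class TinySpider
  def initialize(root, max_urls: 10)
    @urls = [root]
    @max_urls = max_urls
  end

  # Yields urls in the order they were queued; @urls may grow during iteration.
  def url_enum
    index = 0
    Enumerator.new do |y|
      while index < @urls.count && index < @max_urls
        y.yield @urls[index]
        index += 1
      end
    end
  end

  def results
    return to_enum(:results) unless block_given?
    url_enum.each do |url|
      # Stand-in for fetching and parsing a page: queue links not seen yet.
      LINKS.fetch(url, []).each { |link| @urls << link unless @urls.include?(link) }
      yield url
    end
  end
end

p TinySpider.new("/").results.to_a               # => ["/", "/a", "/b", "/c"]
p TinySpider.new("/", max_urls: 2).results.to_a  # => ["/", "/a"]
```

Because `results` returns an Enumerator when called without a block, callers can also write `TinySpider.new("/").results.first(2)` and the crawl stops after two pages.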
    …
        links = page.links_with(href: %r{^/ebooks/\d+$})
        links.map do |a|
          title, author = a.text.strip.split("\n")
          process resolve_url(a.href, page), :process_book, title: title
        end
      end

      def process_book(page, data)
        books = %w[epub kindle txt htm].each_with_object({}) do |fmt, hash|
          hash[fmt] = page.links_with(href: %r{ebooks/[^/]*#{fmt}.*$}).map(&:href)
        end
        record data.merge(books)
      end
    end

    …
    @urls << url
    …
    @results << data

The GutenbergSpider subclass specifies handlers for crawling gutenberg.org to grab links to freely available classic literature in ebook format.
    # Search for Charles Dickens ebooks
    spider = GutenbergSpider.new('http://www.gutenberg.org/ebooks/author/37', :process_index)
    spider.results.lazy.take(5).each_with_index do |result, i|
      puts "storing #{i}: #{result.inspect}" # streaming
      store.transaction do
        store[result[:title]] = result
        store.commit
      end
    end

To use the spider, we simply enumerate the results. The caller doesn’t need to be concerned with the implementation details of fetching and parsing pages. We make the enumeration lazy to avoid processing all the pages and to stream the first 5 results into a datastore. Here I’m using YAML::Store, a nice abstraction over file-based data storage provided by the standard library.
…like to emphasize that the Enumerable API is **not magic**. I highly recommend implementing it yourself in Ruby as an exercise to better understand how it works under the hood.
    …@list = CustomList.new(3, 13, 42, 4, 7)
      end

      it "supports map" do
        @list.map { |x| x + 1 }.must_equal([4, 14, 43, 5, 8])
      end

      it "supports find" do
        @list.find { |x| x > 40 }.must_equal(42)
        @list.find(-> { 0 }) { |x| x > 50 }.must_equal(0)
      end

      it "supports select" do
        @list.select { |x| x.even? }.must_equal([42, 4])
      end
    end

(from practicingruby.com)

Consider a test-driven approach where you write specs for a “Custom Enumerable” module that you will implement yourself. Here are some sample specs for map, find, and select.
    …each
      end

      def sum
        # each
      end

      def average
        # each
      end
    end

For homework, implement as many of the Enumerable API methods as you like. Start with reduce or grep. Then try adding your own extensions to Enumerable, like sum and average. In all cases, you should be able to write your implementation in terms of each.
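If you want a head start on the homework, here is one possible sketch of a hypothetical `MyEnumerable` module that satisfies the sample specs above, with every method written only in terms of `each`. It is deliberately simplified: `reduce` requires an initial value, and `average` returns nil for an empty collection, both choices made for this sketch.

```ruby
module MyEnumerable
  def map
    result = []
    each { |x| result << yield(x) }
    result
  end

  def select
    result = []
    each { |x| result << x if yield(x) }
    result
  end

  # ifnone is an optional callable used when nothing matches, like Enumerable#find.
  def find(ifnone = nil)
    each { |x| return x if yield(x) }
    ifnone && ifnone.call
  end

  # Simplified: always takes an explicit initial value.
  def reduce(initial)
    acc = initial
    each { |x| acc = yield(acc, x) }
    acc
  end

  # Homework extensions.
  def sum
    reduce(0) { |total, x| total + x }
  end

  def average
    count = 0
    total = 0
    each { |x| count += 1; total += x }
    count.zero? ? nil : total.fdiv(count)
  end
end

class CustomList
  include MyEnumerable

  def initialize(*items)
    @items = items
  end

  def each(&block)
    @items.each(&block)
    self
  end
end

list = CustomList.new(3, 13, 42, 4, 7)
p list.map { |x| x + 1 }            # => [4, 14, 43, 5, 8]
p list.find(-> { 0 }) { |x| x > 50 } # => 0
p list.sum                          # => 69
p list.average                      # => 13.8
```

Note how `find` uses `return` inside the block to stop iterating early; that non-local return is what lets Enumerable methods short-circuit without any special support from `each`.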
To wrap up, I hope you’ll be inspired to use more of the API, practice implementing Enumerable methods on your own for deeper understanding, and play with Enumerators to see how they can be useful in your everyday coding.
…is that Enumerable provides us with basic building blocks that perform simple, functional roles and combine in many ways to create more powerful constructs. In other words, Enumerable is like Ruby Legos.