Enumerable - How I Fell in Love with Ruby

ENUMERABLE How I fell in love with Ruby NYC.rb Nov
10, 2015 1 Welcome!

@rossta 2 My name is Ross Kaffenberger. I’m @rossta on
the Internet. This is my dog Simon.

3 I’m the head of engineering at Devpost, the leading
platform for hackathons. Check us out at devpost.com. For developers like yourselves, it’s a great place to share your work.

What to expect ‣ Tips for using the Enumerable API ‣
Learn how to use Enumerators ‣ See lots of examples 4 At a high level, I hope you’ll get some new tips and ideas for using Enumerable and Enumerators out of this talk. We’re going to see a lot of examples, so fasten your seatbelts.

BRIEF HISTORY teacher to programmer 5 I didn’t program much
growing up. Out of college, I joined Teach for America to teach 8th grade science in Houston TX. After my very first day of school, my department chair handed me a Lego Mindstorms kit and said I don’t know what to do with this, you figure it out. So here I am, this green, 21 year old in an inner-city classroom with no idea how to teach or program, so I did what anyone would do in my position.

6 BOTBALL I started a robotics club. I’d teach myself
how to program just enough to pass on to my students each day. Eventually we entered competitions, including one called Botball.

7 BOTBALL We learned how to program in C to
manipulate servos and sensors for this micro controller called the MIT Handyboard, a sort of predecessor to the Arduino.

8 BOTBALL And we competed. And we won the Texas State Botball Championship. And I discovered that I loved programming.

9 void play_time(int play_time_) { int i; for(i=play_time_; i>=10; i=i-10){
printf("Time left=%ds\n",i); sleep(10.0); } } For years, this was what I thought programming was: cobbling together routines and variables with for loops and while loops. Very procedural. I got into “software development” and began programming full-time in Java. So in other words, not much better than this.

10 [1, 2, 3].each { |i| puts i } When
I first saw Ruby, I was blown away. I remember seeing an expression like this. I didn’t fully understand how it worked at first, but the significance was huge. Code could be concise and beautiful too. So there’s some context. And we know this particular expression relates to the functionality of the Enumerable module.

11 include Enumerable I’ve always found Enumerable to be quite
joyful and expressive. Though you’re all familiar with Enumerable, it has some features that are not widely used or well understood. That’s what I’m going to focus on today.

ENUMERABLE PROVIDES traversal searching sorting... and more 12 We know,
as the Ruby docs state, that Enumerable is a module that provides methods for traversal, searching and sorting. I think you’ll agree it’s more than that.

13 p Array.included_modules p Hash.included_modules require 'set' p Set.included_modules p Range.included_modules p Array.included_modules - Object.included_modules
[Enumerable, Kernel] [Enumerable, Kernel] [Enumerable, Kernel] [Enumerable, Kernel] [Enumerable] It says something that Enumerable is one of only two modules included by default in Ruby’s most important collection classes. In fact, Kernel is included in Object, so Enumerable is the only module directly included in these classes.

14 p [1, 2, 3].map { |n| n * n
} p [1, 2, 3].select { |n| n > 1 } p [1, 2, 3].any? { |n| n > 1 } p [1, 2, 3].all? { |n| n > 1 } [1, 4, 9] [2, 3] true false And you know how to use many of Enumerable methods. Most of you have used map, select, any?, all? to name a few. And we know and love this pattern of using a block to describe how to interact with members of the collection.

USE THE API more than just map and select 15
Let’s begin by looking at some lesser known methods and forms of Enumerable. The code you’ll see in today’s presentation is written for Ruby 2.2. Your mileage may vary in other Ruby versions.

16 [:all?, :any?, :chunk, :collect, :collect_concat, :count, :cycle, :detect,
:drop, :drop_while, :each_cons, :each_entry, :each_slice, :each_with_index, :each_with_object, :entries, :find, :find_all, :find_index, :first, :flat_map, :grep, :group_by, :include?, :inject, :lazy, :map, :max, :max_by, :member?, :min, :min_by, :minmax, :minmax_by, :none?, :one?, :partition, :reduce, :reject, :reverse_each, :select, :slice_after, :slice_before, :slice_when, :sort, :sort_by, :take, :take_while, :to_a, :to_h, :zip] p Enumerable.instance_methods.sort Speaking of Enumerable methods, we’ve got a lot to choose from. There are over 50 instance methods provided by Enumerable, some of which you know well and, I’ll bet, some that are less familiar. It could benefit you to explore the ones you haven’t used much yet.

COUNT BLOCK > select Length 17 Ok, so to count
a subset of a collection, use the count method. I often see folks use select and length to accomplish this.

18 p (1..100).select { |n| n % 13 == 0
} p (1..100).select { |n| n % 13 == 0 }.length p (1..100).count { |n| n % 13 == 0 } [13, 26, 39, 52, 65, 78, 91] 7 7 Here select builds a new array that we discard as soon as we have its length, which is wasteful. It is more direct to use the block form of count, which simply returns the number of matches.
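As a small sketch of the point above, the two approaches can be wrapped in helper methods and compared side by side. The method names `count_via_select` and `count_via_block` are made up for this illustration; the last two lines show the other forms of `count` (no argument, and a value to compare with `==`).

```ruby
# Hypothetical helpers comparing two ways to count multiples of 13.
def count_via_select(range)
  range.select { |n| n % 13 == 0 }.length # allocates an array, then measures it
end

def count_via_block(range)
  range.count { |n| n % 13 == 0 } # tallies matches without the extra array
end

p count_via_select(1..100) # => 7
p count_via_block(1..100)  # => 7

# The other two forms of count: no argument, and a value compared with ==.
p [1, 2, 2, 3].count    # => 4
p [1, 2, 2, 3].count(2) # => 2
```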

GREP === not just for regexp 19 We usually see
Enumerable’s grep method with a regular expression to select a set of string values. It’s useful for other situations as well since grep matches its argument to items with the three-equals operator.

20 require 'date' eighties = Date.new(1980, 1, 1)...Date.new(1990, 1, 1)
dates = [Date.new(1988, 7, 15), Date.new(1990, 1, 1)] p dates.select { |date| eighties.member?(date) }.map { |d| d.to_s } p dates.grep(eighties) { |d| d.to_s } p eighties === Date.new(1988, 7, 15) ["1988-07-15"] ["1988-07-15"] true We can use a date range with grep. Consider selecting a group of dates that happened in the eighties and printing them as strings. Instead of using a select/map expression, use a date range and a block. This works because range three-equals will return true for items that are contained in the range.
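Because grep relies on three-equals, anything that implements `===` can act as the pattern. The following sketch, using made-up sample data and hypothetical helper names, shows a class, a range, and a lambda used as grep arguments.

```ruby
# Hypothetical sample data mixing several types.
MIXED = [1, "two", :three, 4.0, 5, nil, "six"].freeze

def integers(items)
  items.grep(Integer) # Class#=== is true for instances of the class
end

def between_one_and_four(items)
  items.grep(1..4) # Range#=== is true for values inside the range
end

def three_letter_strings(items)
  matcher = ->(x) { x.is_a?(String) && x.length == 3 }
  items.grep(matcher) # Proc#=== calls the lambda with each item
end

p integers(MIXED)             # => [1, 5]
p between_one_and_four(MIXED) # => [1, 4.0]
p three_letter_strings(MIXED) # => ["two", "six"]
```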

GROUP THERAPY more than one way 21 { learn => 3, a => 1, ruby => 5 }
So we’re seeing there are often several ways to solve a given problem. The same applies to grouping. We’re now going to look at several ways to count the occurrences of words in some text. The result will be a hash with key-value pairs of words to word counts. Basically a frequency histogram.

words = ["it's", "close", "to", "midnight", ...] histogram = {}
words.each do |w| histogram[w] ||= 0 histogram[w] += 1 end histogram puts histogram.sort_by { |k, v| -v }.take(6).to_h 22 {"the"=>29, "you"=>23, "thriller"=>20, "and"=>14, "to"=>12, "night"=>10, ...} Here’s the naive approach. We’re initializing a hash outside the iteration, lazy initializing each key value pair to 0, and incrementing the word count on each pass.

words = ["it's", "close", "to", "midnight", ...] histogram = words.reduce(Hash.new(0))
do |hist, n| hist[n] += 1 hist end puts histogram.sort_by { |k, v| -v }.take(6).to_h 23 {"the"=>29, "you"=>23, "thriller"=>20, "and"=>14, "to"=>12, "night"=>10, ...} Well, this works, but we can add some sophistication. We know that we can initialize Hash with a default value of 0 so there’s no need to lazy initialize the counts. We can also use reduce, also known as inject, to accumulate the histogram as an internal variable to the block. This means we don’t need to initialize a variable in the outer scope. We need to be careful to return the hash in the block.

24 reduce letters.reduce(Hash.new(0)) do |hist, n| hist[n] += 1 hist end [Diagram: letters fed one at a time into a hash that starts as {} and grows to {a=>1}, {a=>2}, {a=>2, b=>1}, and so on]
Reduce really confused me when I first encountered it. I’ll attempt to explain it with some janky pastel diagrams. Given a collection of letters, when we call reduce, we can inject a start value, in this case a hash, and feed values through some operation, which is to increment a value in a hash, and accumulate the final result like a snowball.

25 reduce numbers.reduce(0) { |sum, n| sum + n } [Diagram: a running sum that starts at 0 and adds each number in turn]
Remember, we can reduce to any value, like a number that is the accumulated sum of the values in an array of numbers.
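To make the snowball visible, here is a minimal sketch that records each step of a reduce. The helper name `traced_sum` and the input numbers are chosen for this illustration only.

```ruby
# Hypothetical helper: sums numbers with reduce and logs every step.
def traced_sum(numbers)
  steps = []
  total = numbers.reduce(0) do |sum, n|
    steps << "#{sum} + #{n} = #{sum + n}"
    sum + n # the block's return value becomes sum on the next pass
  end
  [total, steps]
end

total, steps = traced_sum([2, 9, 8])
puts steps
# 0 + 2 = 2
# 2 + 9 = 11
# 11 + 8 = 19
p total # => 19
```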

words = ["it's", "close", "to", "midnight", ...] histogram = words.reduce(Hash.new(0))
do |hist, n| hist[n] += 1 hist end puts histogram.sort_by { |k, v| -v }.take(6).to_h 26 {"the"=>29, "you"=>23, "thriller"=>20, "and"=>14, "to"=>12, "night"=>10, ...} Back to our word count example. We can actually replace reduce with the `each_with_object` method.

words = ["it's", "close", "to", "midnight", ...] histogram = words.each_with_object(Hash.new(0))
do |n, hist| hist[n] += 1 end puts histogram.sort_by { |k, v| -v }.take(6).to_h 27 {"the"=>29, "you"=>23, "thriller"=>20, "and"=>14, "to"=>12, "night"=>10, ...} It’s basically reduce with side effects. No matter what we do in the block, the object given is returned from the method, so we can clean up our reduce expression.

words = ["it's", "close", "to", "midnight", ...] histogram = Hash[*words.group_by
{ |w| w } .flat_map { |k, v| [k, v.size] }] puts histogram.sort_by { |k, v| -v }.take(6).to_h 28 {"the"=>29, "you"=>23, "thriller"=>20, "and"=>14, "to"=>12, "night"=>10, ...} Instead of reducing, we chain `group_by` with `flat_map` to group words by occurrence, count the size of each group, and convert the result into a flattened array that can be used as an argument to the Hash brackets method to produce the same histogram. I won’t diagram this one out, but it’s worth taking a look at on your own.
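For anyone who wants to take that look, this sketch exposes the intermediate values of the `group_by` and `flat_map` chain on a tiny word list. The helper name `histogram_steps` and the sample words are assumptions made for the example.

```ruby
# Hypothetical helper returning each intermediate value of the chain.
def histogram_steps(words)
  groups = words.group_by { |w| w }                                       # word => all its occurrences
  flat   = groups.flat_map { |word, occurrences| [word, occurrences.size] } # flattened key/value list
  { groups: groups, flat: flat, histogram: Hash[*flat] }
end

steps = histogram_steps(%w[learn ruby a ruby ruby])
p steps[:groups]    # => {"learn"=>["learn"], "ruby"=>["ruby", "ruby", "ruby"], "a"=>["a"]}
p steps[:flat]      # => ["learn", 1, "ruby", 3, "a", 1]
p steps[:histogram] # => {"learn"=>1, "ruby"=>3, "a"=>1}

# An equivalent spelling that avoids splatting a long array into Hash[]:
p steps[:groups].map { |word, occurrences| [word, occurrences.size] }.to_h
```

Splatting a very large array into `Hash[]` passes every element as a method argument, so the `map` plus `to_h` form may be the safer choice for big inputs.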

require "benchmark/ips" Benchmark.ips do |x| x.report("each") do histogram = {}
words.each do |w| histogram[w] ||= 0; histogram[w] += 1 end end x.report("reduce") do words.reduce(Hash.new(0)) { |hist, n| hist[n] += 1; hist } end x.report("each_with_object") do words.each_with_object(Hash.new(0)) { |n, hist| hist[n] += 1 } end x.report("group_by + flat_map") do Hash[*words.group_by { |w| w }.flat_map { |k, v| [k, v.size] }] end x.compare! # Output the comparison end 29 When you have multiple ways of solving a problem like this, it’s worth considering the pros and cons like semantics and benchmarking. Comparing our four methods via the benchmark/ips gem…

30 Calculating -------------------------------------
                each   603.000  i/100ms
              reduce   750.000  i/100ms
    each_with_object   770.000  i/100ms
 group_by + flat_map   460.000  i/100ms
-------------------------------------------------
                each     6.227k (± 2.4%) i/s -     31.356k
              reduce     7.305k (± 6.3%) i/s -     36.750k
    each_with_object     7.580k (± 3.1%) i/s -     38.500k
 group_by + flat_map     4.444k (± 4.7%) i/s -     22.540k

Comparison:
    each_with_object:     7579.8 i/s
              reduce:     7304.7 i/s - 1.04x slower
                each:     6227.0 i/s - 1.22x slower
 group_by + flat_map:     4444.0 i/s - 1.71x slower

The comparison shows that each_with_object and reduce are slightly faster than the alternatives, though the gap between those two is within the reported margin of error. Consider applying this kind of analysis in your own applications when performance is of high value.

ZIP IT UP more than one array 31 It hasn’t
always been straightforward to iterate over multiple arrays at once. For this, we can use zip. Let’s solve pascal’s triangle using zip. Recall that pascal’s triangle is a collection of rows, where each element is the sum of the adjacent members of the preceding row.

def pascal_row(row = [1]) end row = [1] p row
p row = pascal_row(row) p row = pascal_row(row) p row = pascal_row(row) p row = pascal_row(row) 32 [1] [1, 1] [1, 2, 1] [1, 3, 3, 1] [1, 4, 6, 4, 1] Let’s write a method called pascal row that will take a row as an argument and return the next row in the triangle.

Given [1, 1], return [1, 2, 1] 33 [Diagram: the row 1, 1 becomes 0, 1, 1 and 1, 1, 0; zip pairs them as 0, 1 / 1, 1 / 1, 0; map adds each pair, 0+1 1+1 1+0, to give 1, 2, 1]
Let’s break down how this will work. We’ll create two arrays from the given row, one with 0 prepended and one with 0 appended; you’ll see why in a second. We’ll zip the two arrays together to form a single array of pairs. We’ll then add each pair together with map to produce the final row.

def pascal_row(row = [1]) ([0] + row).zip(row + [0]).map {
|a, b| a + b } end row = [1] p row p row = pascal_row(row) p row = pascal_row(row) p row = pascal_row(row) p row = pascal_row(row) 34 [1] [1, 1] [1, 2, 1] [1, 3, 3, 1] [1, 4, 6, 4, 1] Here’s our implementation: append and prepend zero, zip the arrays, map to sum the pairs.
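To see why the zeros are needed, this sketch returns every intermediate value for one row. The helper name `pascal_steps` is invented for the illustration; the arithmetic is the same as in `pascal_row`.

```ruby
# Hypothetical helper exposing each stage of the zip-and-map approach.
def pascal_steps(row)
  left  = [0] + row        # shifted right by one position
  right = row + [0]        # shifted left by one position
  pairs = left.zip(right)  # adjacent members of the row, edges padded with 0
  { left: left, right: right, pairs: pairs, next_row: pairs.map { |a, b| a + b } }
end

steps = pascal_steps([1, 1])
p steps[:left]     # => [0, 1, 1]
p steps[:right]    # => [1, 1, 0]
p steps[:pairs]    # => [[0, 1], [1, 1], [1, 0]]
p steps[:next_row] # => [1, 2, 1]
```

The padding zeros give the first and last elements a partner, so the edges of every row stay 1 without a special case.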

CHUNK AND RUN 35 flavors of each_* It’s also good
to know how to iterate a collection in chunks.

# Euler No. 8: Find the thirteen adjacent digits in
the 1000-digit number that have the greatest product. What is the value of this product? text = <<-SEQ 73167176531330624919225119674426574742355349194934 96983520312774506326239578318016984801869478851843 85861560789112949495459501737958331952853208805511 12540698747158523863050715693290963295227443043557 ... SEQ numbers = text.gsub(/\s+/, '').each_char.map(&:to_i) p numbers.each_cons(13) # ... 36 One of my favorite chunking Enumerables is `each_cons`. It enumerates consecutive elements in groups of a given size like a sliding window. Check out Project Euler number 8. You need to find the greatest product of thirteen adjacent digits in a huge string of numbers. I’ve given you a hint, but it would be bad form to tell you the answer.
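The sliding window is easier to picture on a toy input. This sketch uses six made-up digits and a window of three, not the Euler data; `windows` and `greatest_window_product` are hypothetical helper names.

```ruby
# Hypothetical helpers demonstrating each_cons on a short digit list.
def windows(digits, size)
  digits.each_cons(size).to_a # every run of `size` consecutive elements
end

def greatest_window_product(digits, size)
  digits.each_cons(size).map { |window| window.reduce(1) { |product, d| product * d } }.max
end

digits = [3, 1, 4, 1, 5, 9]
p windows(digits, 3)                 # => [[3, 1, 4], [1, 4, 1], [4, 1, 5], [1, 5, 9]]
p greatest_window_product(digits, 3) # => 45
```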

INCLUDE ENUMERABLE enumerability by contract: each 37 We can include
Enumerable into our own classes to add collection behavior. I’d like to show a few good use cases for doing so. The main point in these next examples is how we can exploit our implementation of each.

class Rainbow include Enumerable def each yield "red" yield "orange"
yield "yellow" yield "green" yield "blue" yield "indigo" yield "violet" end end 38 from Well-Grounded Rubyist Here’s a simple example of a Rainbow Enumerable which has the basic components of a custom collection: it has included enumerable, it implements `each`, and `each` yields items. We don’t need to use hashes or arrays.

rainbow = Rainbow.new puts rainbow.map { |color| "Next color: #{color}"
} puts "Starts with y?", rainbow.grep(%r{^y}, &:upcase) 39 Next color: red Next color: orange Next color: yellow Next color: green Next color: blue Next color: indigo Next color: violet Starts with y? YELLOW Now we can call any Enumerable method we like.

class Deck include Enumerable SUITS = %w[ C D H
S ] RANKS = %w[ 2 3 4 5 6 7 8 9 10 J Q K A ] def initialize(n = 1) @cards = SUITS.cycle(n).flat_map do |suit| RANKS.map do |rank| [suit, rank] end end end def each(&block) @cards.each(&block) end end 40 from Well-Grounded Rubyist We can also wrap a standard collection in another class. Here we layer in the behavior of a card deck to build a collection of playing cards. The custom each method simply delegates to the underlying array of cards.

DEFER DATA FETCHING 41 api clients for example Let’s look
at a more pragmatic example like an API client that fetches a collection of tweets or search results. A custom collection class can be really useful when you’re consuming an endpoint with paginated data.

module Twitter module Enumerable include ::Enumerable def each(start = 0)
Array(@collection[start..-1]).each do |element| yield(element) end unless last? start = [@collection.size, start].max fetch_next_page # api call adds to @collection each(start, &Proc.new) end self end end end 42 The twitter gem implements `each` to hide the complexity of fetching pages from the API. Here each will enumerate, fetch the next page, and recursively call itself until the last page is reached, all hidden away from the caller. This is really mind-blowing: you can start enumerating before you even have data to enumerate!
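The same idea can be tried without a network. The sketch below is not the twitter gem's code: `PagedNumbers` is a hypothetical class in which an in-memory array of pages stands in for the remote API, and a loop replaces the recursion. It counts fetches so we can see that pages are only requested when the caller needs them.

```ruby
# Hypothetical paginated collection; @pages simulates a remote API.
class PagedNumbers
  include Enumerable
  attr_reader :fetches

  def initialize(pages)
    @pages = pages
    @collection = []
    @fetches = 0
  end

  def each
    return to_enum(:each) unless block_given?
    index = 0
    loop do
      # Yield everything fetched so far that the caller has not seen yet.
      while index < @collection.size
        yield @collection[index]
        index += 1
      end
      break if last?
      fetch_next_page
    end
    self
  end

  private

  def last?
    @fetches >= @pages.size
  end

  def fetch_next_page
    @collection.concat(@pages[@fetches]) # a real client would call the API here
    @fetches += 1
  end
end

feed = PagedNumbers.new([[1, 2], [3, 4], [5]])
p feed.first(3) # => [1, 2, 3]
p feed.fetches  # => 2 (the third page was never requested)
p feed.to_a     # => [1, 2, 3, 4, 5]
```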

STREAMING DATA 43 for a website! How about a web
example? Since the Rack API expects the body of the response to respond to each, we can exploit this to implement a streaming endpoint to send data to the client before the page is fully rendered.

44 #!/usr/bin/env rackup -s puma class App def call(env) [200,
{'Content-Type' => 'text/plain'}, self] end def each yield 'one' sleep 1 yield 'two' sleep 2 yield 'three' sleep 3 end end use Rack::Chunked run App.new This is how the Rack::Chunked middleware works. Here, I show a small rack app that yields content in steps interspersed with sleep statements. The Rack::Chunked middleware does some work, including adding the “chunked” Transfer-Encoding header, to stream the data back in chunks. You can see headers return immediately then the content gradually comes in.

45 module Rack class Chunked class Body def each term
= "\r\n" @body.each do |chunk| size = chunk.bytesize next if size == 0 chunk = chunk.dup.force_encoding(Encoding::BINARY) yield [size.to_s(16), term, chunk, term].join end yield "0#{term}#{term}" end end end end Rack::Chunked wraps the body of the response and hijacks the each method to yield each chunk prefixed with its byte size in hexadecimal, followed by a final zero-length chunk that marks the end of the stream.
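To see the wire format this produces, here is a standalone sketch that applies the same framing to a plain array. `ChunkedBody` is a hypothetical class written for this example and does not depend on Rack.

```ruby
# Hypothetical stand-in for a chunked response body.
class ChunkedBody
  TERM = "\r\n"

  def initialize(body)
    @body = body # anything that responds to each and yields strings
  end

  def each
    return to_enum(:each) unless block_given?
    @body.each do |chunk|
      size = chunk.bytesize
      next if size == 0 # an empty chunk would be read as the end marker
      yield [size.to_s(16), TERM, chunk, TERM].join
    end
    yield "0#{TERM}#{TERM}" # zero-length chunk terminates the stream
  end
end

p ChunkedBody.new(["one", "", "hello world!"]).each.to_a
# => ["3\r\none\r\n", "c\r\nhello world!\r\n", "0\r\n\r\n"]
```

The size is written in hexadecimal, which is why the 12-byte string is announced as `c`.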

46 class StreamingController < ApplicationController def index @articles = Article.most_recent
render stream: true end end This is the same mechanism invoked when you declare the `stream` option in your Rails controller. Just one Rails slide today.

USE ENUMERATORS extend and combine behavior 47 A lesser known
feature of Enumerable is the Enumerator class. I love Enumerators and hope you’ll be able to see how they can be useful.

p [1, 2, 3].each p [1, 2, 3].map 48 #<Enumerator:
[1, 2, 3]:each> #<Enumerator: [1, 2, 3]:map> Many of the Enumerable methods will return an instance of Enumerator when called without a block.

CHAIN FOR GAIN index for any enum 49 What does
this get us? This is useful when you need to combine behaviors by chaining Enumerators together.

letters = %w[a b c d e] pairs = letters.map.with_index
do |item, index| [item, index % 3] end p pairs 50 [["a", 0], ["b", 1], ["c", 2], ["d", 0], ["e", 1]] For instance, we used to complain about the fact that there was `each_with_index` but no `map_with_index`. You can now chain map and `with_index`, a special Enumerator method. Both the item and index are yielded to the block at the end of the expression.
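Two more variations may be worth knowing, sketched here with hypothetical helper names: `with_index` accepts a starting offset, and it chains onto other blockless methods such as `select`.

```ruby
# Hypothetical helpers showing with_index with an offset and on select.
def numbered(items)
  items.map.with_index(1) { |item, position| "#{position}. #{item}" }
end

def at_even_positions(items)
  items.select.with_index { |_item, index| index.even? }
end

p numbered(%w[a b c])          # => ["1. a", "2. b", "3. c"]
p at_even_positions(%w[a b c]) # => ["a", "c"]
```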

letters = %w[a b c d e] group_1 = letters.reverse_each.group_by.each_with_index
do |item, index| index % 3 end group_2 = letters.reverse_each.each_with_index.group_by do |item, index| index % 3 end p group_1 p group_2 51 {0=>["e", "b"], 1=>["d", "a"], 2=>["c"]} {0=>[["e", 0], ["b", 3]], 1=>[["d", 1], ["a", 4]], 2=>[["c", 2]]} You can chain several Enumerators, such as reversing, grouping, and indexing, in one go. Keep in mind that the order of the chain affects the results. So we’re extending the behavior of our existing Enumerable methods.

RE CYCLE Enumerate repeatedly 52 Another use case for an
Enumerator is the cycle method. Cycle also returns an Enumerator when called without a block.

p ['aliceblue', 'ghostwhite'].cycle.take(5) 53 ["aliceblue", "ghostwhite", "aliceblue", "ghostwhite", "aliceblue"] Cycle
will repeat the iteration over and over forever unless we take a specified number of items.

Project = Struct.new(:name) projects = [Project.new("TODO"), Project.new("Work"), Project.new("Home")] colors =
['aliceblue', 'ghostwhite'].cycle require 'erb' erb = (<<-ERB) <table> <% projects.each_with_index do |project, index| %> <tr style="background: <%= colors.next %>"> <td><%= index + 1 %></td> <td><%= project.name %></td> </tr> <% end %> </table> ERB puts ERB.new(erb).result(binding).gsub(/^$\n/, "") 54 <table> <tr style="background: aliceblue"> <td>1</td> <td>TODO</td> </tr> <tr style="background: ghostwhite"> <td>2</td> <td>Work</td> </tr> <tr style="background: aliceblue"> <td>3</td> <td>Home</td> </tr> </table> We use cycle here with the next method to get external enumeration. When we have a reference to an enumerator, calling next repeatedly will enumerate values one by one. So we can create striped table rows, by getting the next color in each iteration of the block in this ERB tag.

enumerator = [1, 2].each p enumerator.next p enumerator.next begin p
enumerator.next rescue StopIteration => e p "Halt!!! #{e.class}" end enumerator.rewind loop do p enumerator.next end p "Done!!!" 55 1 2 "Halt!!! StopIteration" 1 2 "Done!!!" When using an Enumerator with next, it’s worth knowing that a StopIteration error will be raised when the end of the enumeration is reached. Rewind allows you to start the enumeration over. Ruby’s loop is smart enough to rescue a StopIteration error and exit the loop, as you can see here.

RETURN ENUMERATOR unless block_given? 56 As we’ve seen, calling many
of our Enumerable methods without a block will return instances of Enumerator.

57 class Rainbow def colors return to_enum(:colors) unless block_given? yield
"red" yield "orange" yield "yellow" yield "green" yield "blue" yield "indigo" yield "violet" end end Here’s how to do it: the Kernel module provides `to_enum`, also called `enum_for`. It takes a method name, plus any arguments that method needs, and returns an Enumerator that holds a reference to the receiver and the named method. Now we can expose a colors method on Rainbow that can be enumerated like a collection; we don’t actually need to include Enumerable to get this behavior.

rainbow = Rainbow.new puts rainbow.colors.map { |color| "Next color: #{color}"
} puts "Starts with y?", rainbow.colors.grep(%r{^y}, &:upcase) 58 Next color: red Next color: orange Next color: yellow Next color: green Next color: blue Next color: indigo Next color: violet Starts with y? YELLOW Now we can call any Enumerable method we like.
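The Rainbow example takes no arguments, so here is a short sketch of the other case, where `to_enum` forwards the method's arguments to the Enumerator. `Multiples` and `up_to` are hypothetical names invented for this illustration.

```ruby
# Hypothetical class whose iterator method takes an argument.
class Multiples
  def initialize(n)
    @n = n
  end

  def up_to(limit)
    # Pass limit along so the Enumerator can call up_to(limit) again later.
    return to_enum(:up_to, limit) unless block_given?
    @n.step(limit, @n) { |m| yield m }
    self
  end
end

p Multiples.new(3).up_to(10).to_a            # => [3, 6, 9]
p Multiples.new(3).up_to(20).select(&:even?) # => [6, 12, 18]
```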

59 module Twitter module Enumerable include ::Enumerable def each(start =
0) return to_enum(:each, start) unless block_given? Array(@collection[start..-1]).each do |element| yield(element) end unless last? start = [@collection.size, start].max fetch_next_page # api call adds to @collection each(start, &Proc.new) end self end end end

CREATE ENUMERATORS templates for generating values 60 Let’s take a
closer look at creating Enumerators outside the context of an Enumerable.

61 enum = Enumerator.new do |y| y.yield 1 y.yield 2
end p enum.map { |n| n * n } [1, 4] Here’s an example of a bare Enumerator. We can create one with `Enumerator.new` and pass a block to declare how values will be generated. This example will behave like a two item array.

62 enum = Enumerator.new do |y| y.yield 1 y.yield 2
end enum.map { |n| n * n } Enumerator::Yielder Taking a closer look, notice a y parameter is given to the block. This object is called a Yielder. Notice that we’re not actually using the yield keyword in the block but calling a yield method on the Yielder. The Yielder allows us to pass values from one block to another.
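The Yielder also responds to `<<`, which some people find reads more naturally than `y.yield`. A small sketch, with `countdown` as a hypothetical method name:

```ruby
# Hypothetical generator built with the Yielder's << method.
def countdown(from)
  Enumerator.new do |yielder|
    from.downto(1) { |n| yielder << n } # same effect as yielder.yield(n)
    yielder << "liftoff"
  end
end

p countdown(3).to_a # => [3, 2, 1, "liftoff"]
p countdown(3).next # => 3
```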

63 enum = Enumerator.new do |y| n = 0 loop
do y.yield n n += 1 end end enum.each { |i| puts i } And we can do anything we want within the template block like maintain state and use loops. You can use an Enumerator to generate an infinite sequence.

64 def fibonacci Enumerator.new do |y| a, b = 1,
1 loop do y.yield a a, b = b, a + b end end end p fibonacci.take(10) [1, 1, 2, 3, 5, 8, 13, 21, 34, 55] Here’s a Fibonacci sequence implemented with an Enumerator. This is pretty amazing: we usually think of Enumerables as concrete collections, but now we can apply them to mathematical concepts.

class PascalsTriangle def rows(first = [1]) Enumerator.new do |y| current
= first loop do y.yield current current = next_row(current) end end end def next_row(row) ([0] + row).zip(row + [0]).map { |a, b| a + b } end end require 'pp' pp PascalsTriangle.new.rows.take(7) 65 [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1], [1, 5, 10, 10, 5, 1], [1, 6, 15, 20, 15, 6, 1]] We can go one step further with the Pascal’s triangle example using an Enumerator to repeatedly call our next_row method to generate the triangle as a sequence.

BE MORE LAZY avoid eager evaluation 66 Another amazing thing
we can do with Enumerators is evaluate them lazily. This feature is a more recent addition to the Ruby language and may be less familiar.

67 range = (1..Float::INFINITY) p range.map { |x| x *
x }.first(10) Ctrl-C!!! Consider the problem with this infinite number generator. We’re attempting to map over the results of an infinite range then take the first 10. This will never finish because by default, enumerables are eager: all values will be processed before being passed down the chain. With one little change, we can insert the lazy method, which returns a special type of enumerator, a lazy one, and voila, we get the first ten results.

68 range = (1..Float::INFINITY) p range.lazy.map { |x| x *
x }.first(10) [1, 4, 9, 16, 25, 36, 49, 64, 81, 100] [Diagram: each value is squared as it is requested: 1 → 1, 2 → 4, 3 → 9] Lazy is a special type of Enumerator that yields each value to the next caller before enumerating the remaining items, instead of processing the whole collection eagerly. Each successive Enumerator in the chain is reimplemented to behave lazily. This means the end of the chain can control the flow of execution. In this case, when the first ten items are received, first at the end will raise an error which is rescued further back up the chain, allowing the enumeration to exit. We can take advantage of this feature in special use cases.
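One way to observe the difference in evaluation order is to log each step of the same chain run eagerly and lazily. The `trace` helper below is a hypothetical name written for this sketch.

```ruby
# Hypothetical helper: runs a map/select chain and logs each block call.
def trace(enum)
  log = []
  result = enum
    .map    { |x| log << "map #{x}"; x * x }
    .select { |x| log << "select #{x}"; x.even? }
    .first(2)
  [result, log]
end

eager_result, eager_log = trace(1..4)
lazy_result,  lazy_log  = trace((1..Float::INFINITY).lazy)

p eager_result # => [4, 16]
p eager_log    # all maps run first, then all selects
p lazy_result  # => [4, 16]
p lazy_log     # map and select alternate, one value at a time
```

Eagerly, the log reads map 1, map 2, map 3, map 4 and only then the four selects. Lazily, each value travels through the whole chain before the next one starts, and the infinite range stops as soon as two results are found.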

require 'csv' CSVAdmissionRate = Struct.new(:range) do def ===(row) rate =
row['ADM_RATE'].to_f rate > 0 && range === rate end end CSV.open('../../data/college-scorecard.csv', 'rb', headers: true) do |csv| results = csv.each.lazy. grep(CSVAdmissionRate.new(0.0..0.10)).take(10). each_with_object({}) do |row, data| puts row['INSTNM'] # streaming results end end 69 Yale University University of Chicago Harvard University Massachusetts Institute of Technology Dartmouth College Princeton University Columbia University Cooper Union for the Advancement of Science and Art The Juilliard School Curtis Institute of Music Imagine we’re processing a large CSV file in a memory-constrained environment. Here I’m using the college scorecard data you can download from data.gov and the CSV class from the standard library to process CSV data. We have a CSVAdmissionRate object that implements the three-equals method to select CSV rows with an admission rate falling in a given range. Using lazy, I can grep for the first 10 colleges with an admission rate of 10% or less and stream back the results without loading the entire CSV file into memory.
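The claim that lazy avoids reading the whole source can be checked on a small scale. This sketch swaps the CSV file for an in-memory StringIO with made-up lines; `first_matching_lines` is a hypothetical helper. After taking two matches, the unread remainder is still sitting in the stream.

```ruby
require 'stringio'

# Hypothetical helper: lazily read lines and stop once enough match.
def first_matching_lines(io, pattern, limit)
  io.each_line.lazy.map(&:chomp).grep(pattern).first(limit)
end

io = StringIO.new("a1\nb2\na3\na4\na5\n")
p first_matching_lines(io, /\Aa/, 2) # => ["a1", "a3"]
p io.read                            # => "a4\na5\n" (never consumed by the search)
```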

KITCHEN SINK webcrawler 70 Let’s take a look at a
web crawler in Ruby. We’ll create a Spider class that will process a root webpage and follow links in breadth-first search order, recording data from each page. We only know the root url to start. We’ll dynamically add to our list of urls to visit, while simultaneously consuming those urls and recording a dataset of page information as we go. We’ll use enumerators to do this.

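71 [Diagram: Spider. Each url is pushed onto @urls (@urls << url), page info is merged into data (data.merge(info)) and pushed onto @results (@results << data), and results.each hands the data back to the caller.]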

72 require 'mechanize' class Spider def results return enum_for(:results) unless
block_given? i = @results.length url_enum.each do |url, handler, data| send handler, agent.get(url), data if block_given? && @results.length > i yield @results.last i += 1 end sleep @interval if @interval > 0 end end end Our public method, results, yields crawled data as we iterate over the urls to crawl. Notice the call to enum_for (an alias of to_enum) when no block is given. After each page is processed, we grab the next url and repeat. We also provide a sleep interval to respect the crawl limits requested in robots.txt.

73 class Spider private def url_enum index = 0 Enumerator.new
do |y| while index < @urls.count && index <= @max_urls url = @urls[index] index += 1 next unless url handler, data = @handlers[url] y.yield url, handler, data end end end end Our `url_enum` is an Enumerator that yields urls along with handlers, given as method names, to process each page. The index advances before the nil check so that a missing url cannot stall the loop.
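The interesting part is that the Enumerator walks an array that keeps growing while it is being consumed. The sketch below isolates that trick without any HTTP: `LINKS` is a made-up link graph and `crawl_order` a hypothetical helper, not part of the Spider class.

```ruby
# Hypothetical link graph standing in for real pages: page => pages it links to.
LINKS = {
  "a" => %w[b c],
  "b" => %w[d],
  "c" => %w[d a]
}.freeze

def crawl_order(start, links, max_pages)
  queue = [start]
  visits = Enumerator.new do |y|
    index = 0
    # Re-check queue.size on every pass because the consumer appends to it.
    while index < queue.size && index < max_pages
      y.yield queue[index]
      index += 1
    end
  end

  visits.map do |page|
    # "Processing" a page discovers new links and enqueues the unseen ones.
    links.fetch(page, []).each { |link| queue << link unless queue.include?(link) }
    page
  end
end

p crawl_order("a", LINKS, 10) # => ["a", "b", "c", "d"]
p crawl_order("a", LINKS, 2)  # => ["a", "b"]
```

Using a while loop over an index, rather than `each`, is what lets the producer notice urls that were added after enumeration began.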

74 class GutenbergSpider < Spider def process_index(page, data = {})
links = page.links_with(href: %r{^/ebooks/\d+$}) links.map do |a| title, author = a.text.strip.split("\n") process resolve_url(a.href, page), :process_book, title: title end end def process_book(page, data) books = %w[epub kindle txt htm].each_with_object({}) do |fmt, hash| hash[fmt] = page.links_with(href: %r{ebooks/[^/]*#{fmt}.*$}).map(&:href) end record data.merge(books) end end The Gutenberg Spider subclass specifies handlers for crawling gutenberg.org to grab links for freely available classic literature in ebook format.

75 require 'yaml/store' path = File.expand_path("../../data/ebooks.store", __FILE__) store = YAML::Store.new(path)
# Search for Charles Dickens ebooks spider = GutenbergSpider.new('http://www.gutenberg.org/ebooks/author/37', :process_index) spider.results.lazy.take(5).each_with_index do |result, i| puts "storing #{i}: #{result.inspect}" # streaming store.transaction do store[result[:title]] = result store.commit end end To use the spider, we’ll simply enumerate the results. No need for the caller to be concerned with the implementation details of fetching and parsing pages. We make it lazy to avoid processing all the pages and to stream back the first 5 results into a datastore. Here I’m using YAML::Store, a nice abstraction over file-based data storage provided by the standard library.

ROLL YOUR OWN learn Enumerable by implementing Enumerable 76 I’d
like to emphasize that the Enumerable API is **not magic**. I highly recommend implementing it yourself in Ruby as an exercise to better understand how it works under the hood.

require "minitest/autorun" require_relative "./custom_list" describe "CustomEnumerable" do before do @list
= CustomList.new(3, 13, 42, 4, 7) end it "supports map" do @list.map { |x| x + 1 }.must_equal([4, 14, 43, 5, 8]) end it "supports find" do @list.find { |x| x > 40 }.must_equal(42) @list.find(-> { 0 }) { |x| x > 50 }.must_equal(0) end it "supports select" do @list.select { |x| x.even? }.must_equal([42, 4]) end end 77 from practicingruby.com Consider a test-driven approach where you specify assertions for a “Custom Enumerable” module that you will implement yourself. Here are some sample specs for map, find, and select.
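As one possible starting point, and certainly not the only way to do it, here is a sketch that would satisfy the three sample specs. It assumes `CustomList` simply wraps an array; the real exercise file may be organized differently.

```ruby
# A minimal sketch: map, find, and select written only in terms of each.
module CustomEnumerable
  def map
    result = []
    each { |item| result << yield(item) }
    result
  end

  def find(ifnone = nil)
    each { |item| return item if yield(item) } # return exits find itself
    ifnone && ifnone.call                      # fallback when nothing matched
  end

  def select
    result = []
    each { |item| result << item if yield(item) }
    result
  end
end

# Assumed shape of the list under test: a thin wrapper around an array.
class CustomList
  include CustomEnumerable

  def initialize(*items)
    @items = items
  end

  def each(&block)
    @items.each(&block)
    self
  end
end

list = CustomList.new(3, 13, 42, 4, 7)
p list.map { |x| x + 1 }              # => [4, 14, 43, 5, 8]
p list.find { |x| x > 40 }            # => 42
p list.find(-> { 0 }) { |x| x > 50 }  # => 0
p list.select { |x| x.even? }         # => [42, 4]
```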

module CustomEnumerable def reduce # each end def grep #
each end def sum # each end def average # each end end 78 For homework, implement as many of the Enumerable API methods as you like. Start with reduce or grep. Then try adding your own extensions to Enumerable, like sum and average. In all cases, you should be able to write your implementation in terms of each.

79 rubinius/rubinius Check out the specs for Enumerable from rubinius,
which is Ruby implemented in Ruby, to go further with this.

LOVE ENUMERABLE use the api get “under the hood” embrace
enumerators 80 To wrap up, I hope you’ll be inspired to use more of the API, practice implementing Enumerable methods on your own for deeper understanding, and play with Enumerators to see how they can be useful in your everyday coding.

BUILDING BLOCKS 81 The key point I’ve been working towards
is that Enumerable provides us with basic building blocks that perform simple, functional roles and combine in many ways to create more powerful constructs. In other words, Enumerable is like Ruby Legos.

82 rossta/loves-enumerable @rossta ['thanks'] You can find all the code
snippets I shared today published on github at rossta/loves-enumerable. Say hi to me after the meetup or on twitter at rossta.
