Enumerable - How I Fell in Love with Ruby

Ruby's Enumerable module provides simple, functional methods that allow for a great deal of flexibility and composability. We'll examine some lesser known features to expand our toolkit for dealing with collections. Learn more: http://bit.ly/1U3RRHw

Ross Kaffenberger

November 10, 2015

Transcript

  1. ENUMERABLE
    How I fell in love with Ruby
    NYC.rb
    Nov 10, 2015
    1
    Welcome!

    View Slide

  2. @rossta
    2
    My name is Ross Kaffenberger. I’m @rossta on the Internet. This is my dog Simon.

    View Slide

  3. 3
    I’m the head of engineering at Devpost, the leading platform for hackathons. Check us out at devpost.com. For developers like yourselves, it’s a great place to share your work.

    View Slide

  4. What to expect

    tips FOR USING EnumerablE API

    LEARN HOW TO USE Enumerators

    SEE Lots of examples
    4
    At a high level, I hope you’ll get some new tips and ideas for using Enumerable and Enumerators out of this talk. We’re going to see a lot of examples, so fasten your
    seatbelts.

    View Slide

  5. BRIEF HISTORY
    teacher to programmer
    5
    I didn’t program much growing up. Out of college, I joined Teach for America to teach 8th grade science in Houston TX. After my very first day of school, my department chair
    handed me a Lego Mindstorms kit and said I don’t know what to do with this, you figure it out. So here I am, this green, 21 year old in an inner-city classroom with no idea
    how to teach or program, so I did what anyone would do in my position.

    View Slide

  6. 6
    BOTBALL
    I started a robotics club. I’d teach myself how to program just enough to pass on to my students each day.
    Eventually we entered competitions, including one called Botball.

    View Slide

  7. 7
    BOTBALL
    We learned how to program in C to manipulate servos and sensors for a microcontroller called the MIT Handyboard, a sort of predecessor to the Arduino.

    View Slide

  8. 8
    BOTBALL
    And we competed. And we won the Texas State Botball Championship. And I discovered that I loved programming.

    View Slide

  9. 9
    void play_time(int play_time_)
    {
    int i;
    for(i=play_time_; i>=10; i=i-10){
    printf("Time left=%ds\n",i);
    sleep(10.0);
    }
    }
    For years, this was what I thought programming was: cobbling together routines and variables with for loops and while loops. Very procedural. I got into “software
    development” and began programming full-time in Java. So in other words, not much better than this.

    View Slide

  10. 10
    [1, 2, 3].each { |i| puts i }
    When I first saw Ruby, I was blown away. I remember seeing an expression like this. I didn’t fully understand how it worked at first, but the significance was huge. Code could be
    concise and beautiful too. So there’s some context. And we know this particular expression relates to the functionality of the Enumerable module.

    View Slide

  11. 11
    include Enumerable
    I’ve always found Enumerable to be quite joyful and expressive. Though you’re all familiar with Enumerable, it has some features that are not widely used or well understood.
    That’s what I’m going to focus on today.

    View Slide

  12. ENUMERABLE PROVIDES
    traversal
    searching
    sorting... and more
    12
    We know, as the Ruby docs state, that Enumerable is a module that provides methods for traversal, searching, and sorting. I think you’ll agree it’s more than that.

    View Slide

  13. 13
    [Enumerable, Kernel]
    [Enumerable, Kernel]
    [Enumerable, Kernel]
    [Enumerable, Kernel]
    [Enumerable]
    p Array.included_modules
    p Hash.included_modules
    require 'set'
    p Set.included_modules
    p Range.included_modules
    p Array.included_modules - Object.included_modules
    It says something that Enumerable is one of only two modules included by default in Ruby’s most important collection classes. In fact, Kernel is included in Object, so
    Enumerable is the only module directly included in these classes.

    View Slide

  14. 14
    p [1, 2, 3].map { |n| n * n }
    p [1, 2, 3].select { |n| n > 1 }
    p [1, 2, 3].any? { |n| n > 1 }
    p [1, 2, 3].all? { |n| n > 1 }
    [1, 4, 9]
    [2, 3]
    true
    false
    And you know how to use many of Enumerable’s methods. Most of you have used map, select, any?, and all?, to name a few.
    And we know and love this pattern of using a block to describe how to interact with members of the collection.

    View Slide

  15. USE THE API
    more than just
    map and select
    15
    Let’s begin by looking at some lesser known methods and forms of Enumerable. The code you’ll see in today’s presentation is written for Ruby 2.2. Your mileage may vary in
    other Ruby versions.

    View Slide

  16. 16
    [:all?, :any?, :chunk, :collect,
    :collect_concat, :count, :cycle,
    :detect, :drop, :drop_while,
    :each_cons, :each_entry,
    :each_slice, :each_with_index,
    :each_with_object, :entries,
    :find, :find_all, :find_index,
    :first, :flat_map, :grep,
    :group_by, :include?, :inject,
    :lazy, :map, :max, :max_by,
    :member?, :min, :min_by, :minmax,
    :minmax_by, :none?, :one?,
    :partition, :reduce, :reject,
    :reverse_each, :select,
    :slice_after, :slice_before,
    :slice_when, :sort, :sort_by,
    :take, :take_while, :to_a, :to_h,
    :zip]
    p Enumerable.instance_methods.sort
    Speaking of Enumerable methods, we’ve got a lot to choose from. There are over 50 instance methods provided by Enumerable, some of which you know well, and I’ll bet, some
    less familiar. It could benefit you to explore aspects you haven’t used much yet.

    View Slide

  17. COUNT BLOCK
    > select Length
    17
    Ok, so to count a subset of a collection, use the count method. I often see folks use select and length to accomplish this.

    View Slide

  18. 18
    p (1..100).select { |n| n % 13 == 0 }
    p (1..100).select { |n| n % 13 == 0 }.length
    p (1..100).count { |n| n % 13 == 0 }
    [13, 26, 39, 52, 65, 78, 91]
    7
    7
    Since select builds a new array only so that we can ask for its length, the select/length version is wasteful. It is more direct to use the block form of count, which simply returns
    the number.

    View Slide

  19. GREP ===
    not just for regexp
    19
    We usually see Enumerable’s grep method with a regular expression to select a set of string values. It’s useful for other situations as well since grep matches its argument to
    items with the three-equals operator.
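
    Because grep relies on ===, any object that implements === can serve as the pattern. A minimal sketch with a class, a range, and a lambda; the sample data is made up for illustration:

    ```ruby
    # Sample data is hypothetical; any mixed collection works.
    items = [1, "two", 3.0, :four, 15, nil]

    # Class#=== tests whether the item is an instance of the class.
    integers = items.grep(Integer)

    # Range#=== tests whether the item falls inside the range.
    teens = [9, 13, 15, 21].grep(13..19)

    # Proc#=== calls the proc with the item.
    even = ->(n) { n.even? }
    evens = [1, 2, 3, 4].grep(even)

    p integers # [1, 15]
    p teens    # [13, 15]
    p evens    # [2, 4]
    ```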

    View Slide

  20. 20
    require 'date'
    eighties = Date.new(1980, 1, 1)...Date.new(1990, 1, 1)
    dates = [Date.new(1988, 7, 15), Date.new(1990, 1, 1)]
    p dates.select { |date| eighties.member?(date) }.map { |d| d.to_s }
    p dates.grep(eighties) { |d| d.to_s }
    p eighties === Date.new(1988, 7, 15)
    ["1988-07-15"]
    ["1988-07-15"]
    true
    We can use a date range with grep. Consider selecting a group of dates that happened in the eighties and printing them as strings.
    Instead of using a select/map expression, use a date range and a block. This works because range three-equals will return true for items that are contained in the range.

    View Slide

  21. GROUP THERAPY
    more than one way
    21
    learn => 3
    a => 1
    ruby => 5
    { }
    So we’re seeing there are often several ways to solve a given problem. The same applies to grouping.
    We’re now going to look at several ways to count the occurrences of words in some text. The result will be a hash with key value pairs of words to word count. Basically a
    frequency histogram.

    View Slide

  22. words = ["it's", "close", "to", "midnight", ...]
    histogram = {}
    words.each do |w|
    histogram[w] ||= 0
    histogram[w] += 1
    end
    histogram
    puts histogram.sort_by { |k, v| -v }.take(6).to_h
    22
    {"the"=>29, "you"=>23, "thriller"=>20,
    "and"=>14, "to"=>12, "night"=>10, ...}
    Here’s the naive approach. We’re initializing a hash outside the iteration, lazy initializing each key value pair to 0, and incrementing the word count on each pass.

    View Slide

  23. words = ["it's", "close", "to", "midnight", ...]
    histogram = words.reduce(Hash.new(0)) do |hist, n|
    hist[n] += 1
    hist
    end
    puts histogram.sort_by { |k, v| -v }.take(6).to_h
    23
    {"the"=>29, "you"=>23, "thriller"=>20,
    "and"=>14, "to"=>12, "night"=>10, ...}
    Well, this works, but we can add some sophistication. We know that we can initialize Hash with a default value of 0 so there’s no need to lazy initialize the counts. We can also
    use reduce, also known as inject, to accumulate the histogram as an internal variable to the block. This means we don’t need to initialize a variable in the outer scope. We need
    to be careful to return the hash in the block.

    View Slide

  24. 24
    reduce
    letters.reduce(Hash.new(0)) do |hist, n|
    hist[n] += 1
    hist
    end
    [Diagram: letters pass through reduce one at a time. The hash starts as {} and grows to {a=>1}, {a=>2}, {a=>2, b=>1}, and so on, ending at {a=>3, b=>2, q=>1}.]
    Reduce really confused me when I first encountered it. I’ll attempt to explain it with some janky pastel diagrams. Given a collection of letters, when we call reduce, we can inject
    a start value, in this case a hash, and feed values through some operation, which is to increment a value in a hash, and accumulate the final result like a snowball.

    View Slide

  25. 25
    reduce
    numbers.reduce(0) { |sum, n| sum + n }
    [Diagram: the numbers 2, 1, 7, 8, 3, 9, 0 pass through reduce one at a time while a running sum accumulates, starting from 0.]
    Remember, we can reduce to any value, like a number that is the accumulated sum of the values in an array of numbers.

    View Slide

  26. words = ["it's", "close", "to", "midnight", ...]
    histogram = words.reduce(Hash.new(0)) do |hist, n|
    hist[n] += 1
    hist
    end
    puts histogram.sort_by { |k, v| -v }.take(6).to_h
    26
    {"the"=>29, "you"=>23, "thriller"=>20,
    "and"=>14, "to"=>12, "night"=>10, ...}
    Back to our word count example. We can actually replace reduce with the `each_with_object` method.

    View Slide

  27. words = ["it's", "close", "to", "midnight", ...]
    histogram = words.each_with_object(Hash.new(0)) do |n, hist|
    hist[n] += 1
    end
    puts histogram.sort_by { |k, v| -v }.take(6).to_h
    27
    {"the"=>29, "you"=>23, "thriller"=>20,
    "and"=>14, "to"=>12, "night"=>10, ...}
    It’s basically reduce with side effects. No matter what we do in the block, the object given is returned from the method, so we can clean up our reduce expression.
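
    To make the difference concrete, here is a small sketch that runs both forms side by side and also shows what happens when the reduce block forgets to return the hash. The word list is a made-up sample, not the lyrics from the slides:

    ```ruby
    words = %w[the night the thriller the night] # made-up sample

    # reduce: the block's return value becomes the next accumulator,
    # so the hash has to be returned explicitly.
    with_reduce = words.reduce(Hash.new(0)) do |hist, word|
      hist[word] += 1
      hist
    end

    # each_with_object: the same object is passed to every iteration
    # and returned at the end, whatever the block returns.
    with_object = words.each_with_object(Hash.new(0)) do |word, hist|
      hist[word] += 1
    end

    # Forgetting to return the hash hands an Integer to the next
    # iteration, which then raises.
    broken = begin
      words.reduce(Hash.new(0)) { |hist, word| hist[word] += 1 }
    rescue StandardError => e
      e.class
    end

    p with_reduce == with_object # true
    p with_reduce["the"]         # 3
    p broken                     # an exception class
    ```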

    View Slide

  28. words = ["it's", "close", "to", "midnight", ...]
    histogram = Hash[*words.group_by { |w| w }
    .flat_map { |k, v| [k, v.size] }]
    puts histogram.sort_by { |k, v| -v }.take(6).to_h
    28
    {"the"=>29, "you"=>23, "thriller"=>20,
    "and"=>14, "to"=>12, "night"=>10, ...}
    Instead of reducing, we chain `group_by` with `flat_map` to group words by occurrence, count the size of each group, and convert the result into a flattened array that can be
    used as an argument to the Hash brackets method to produce the same histogram.
    I won’t diagram this one out, but it’s worth taking a look at on your own.

    View Slide

  29. require "benchmark/ips"
    Benchmark.ips do |x|
    x.report("each") do
    histogram = {}
    words.each do |w|
    histogram[w] ||= 0; histogram[w] += 1
    end
    end
    x.report("reduce") do
    words.reduce(Hash.new(0)) { |hist, n| hist[n] += 1; hist }
    end
    x.report("each_with_object") do
    words.each_with_object(Hash.new(0)) { |n, hist| hist[n] += 1 }
    end
    x.report("group_by + flat_map") do
    Hash[*words.group_by { |w| w }.flat_map { |k, v| [k, v.size] }]
    end
    x.compare! # Output the comparison
    end
    29
    When you have multiple ways of solving a problem like this, it’s worth considering the pros and cons like semantics and benchmarking. Comparing our four methods via the
    benchmark/ips gem…

    View Slide

  30. 30
    Calculating -------------------------------------
    each 603.000 i/100ms
    reduce 750.000 i/100ms
    each_with_object 770.000 i/100ms
    group_by + flat_map 460.000 i/100ms
    -------------------------------------------------
    each 6.227k (± 2.4%) i/s - 31.356k
    reduce 7.305k (± 6.3%) i/s - 36.750k
    each_with_object 7.580k (± 3.1%) i/s - 38.500k
    group_by + flat_map 4.444k (± 4.7%) i/s - 22.540k
    Comparison:
    each_with_object: 7579.8 i/s
    reduce: 7304.7 i/s - 1.04x slower
    each: 6227.0 i/s - 1.22x slower
    group_by + flat_map: 4444.0 i/s - 1.71x slower
    The comparison shows that each_with_object and reduce are slightly faster than the alternatives here. Consider applying this kind of analysis in your own applications when
    performance matters.

    View Slide

  31. ZIP IT UP
    more than one array
    31
    It hasn’t always been straightforward to iterate over multiple arrays at once. For this, we can use zip. Let’s build Pascal’s triangle using zip.
    Recall that Pascal’s triangle is a collection of rows, where each element is the sum of the adjacent members of the preceding row.

    View Slide

  32. def pascal_row(row = [1])
    end
    row = [1]
    p row
    p row = pascal_row(row)
    p row = pascal_row(row)
    p row = pascal_row(row)
    p row = pascal_row(row)
    32
    [1]
    [1, 1]
    [1, 2, 1]
    [1, 3, 3, 1]
    [1, 4, 6, 4, 1]
    Let’s write a method called pascal_row that will take a row as an argument and return the next row in the triangle.

    View Slide

  33. Given [1, 1], return [1, 2, 1]
    33
    [1, 1]
    prepend 0: [0, 1, 1]    append 0: [1, 1, 0]
    zip: [[0, 1], [1, 1], [1, 0]]
    map: [0+1, 1+1, 1+0]
    [1, 2, 1]
    Let’s break down how this will work. We’ll create two arrays from the given row with 0 appended to either end - you’ll see why in a second. We’ll zip the two arrays together to
    form a single array of pairs. We’ll then add each pair together with map to produce the final row.
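
    The same steps, written out one intermediate value at a time for the row [1, 2, 1]. The variable names are only for illustration:

    ```ruby
    row = [1, 2, 1]

    left  = [0] + row       # [0, 1, 2, 1]
    right = row + [0]       # [1, 2, 1, 0]
    pairs = left.zip(right) # [[0, 1], [1, 2], [2, 1], [1, 0]]

    # Each pair holds the two neighbours from the previous row.
    next_row = pairs.map { |a, b| a + b }

    p next_row # [1, 3, 3, 1]
    ```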

    View Slide

  34. def pascal_row(row = [1])
    ([0] + row).zip(row + [0]).map { |a, b| a + b }
    end
    row = [1]
    p row
    p row = pascal_row(row)
    p row = pascal_row(row)
    p row = pascal_row(row)
    p row = pascal_row(row)
    34
    [1]
    [1, 1]
    [1, 2, 1]
    [1, 3, 3, 1]
    [1, 4, 6, 4, 1]
    Here’s our implementation: append and prepend zero, zip the arrays, map to sum the pairs.

    View Slide

  35. CHUNK AND RUN
    35
    flavors of each_*
    It’s also good to know how to iterate a collection in chunks.

    View Slide

  36. # Euler No. 8: Find the thirteen adjacent digits in the 1000-digit number
    # that have the greatest product. What is the value of this product?
    text = <<-SEQ
    73167176531330624919225119674426574742355349194934
    96983520312774506326239578318016984801869478851843
    85861560789112949495459501737958331952853208805511
    12540698747158523863050715693290963295227443043557
    ...
    SEQ
    numbers = text.gsub(/\s+/, '').each_char.map(&:to_i)
    p numbers.each_cons(13) # ...
    36
    One of my favorite chunking Enumerables is `each_cons`. It enumerates consecutive elements in groups of a given size like a sliding window. Check out Project Euler number 8.
    You need to find the greatest product of thirteen adjacent digits in a huge string of numbers. I’ve given you a hint, but it would be bad form to tell you the answer.
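
    To see the sliding window without spoiling the puzzle, here is a sketch on a much smaller input: seven digits and a window of three instead of thirteen.

    ```ruby
    digits = "7316717".each_char.map(&:to_i) # [7, 3, 1, 6, 7, 1, 7]

    # each_cons(3) yields every run of three adjacent digits.
    windows = digits.each_cons(3).to_a
    # [[7, 3, 1], [3, 1, 6], [1, 6, 7], [6, 7, 1], [7, 1, 7]]

    # Multiply each window and keep the largest product.
    best = digits.each_cons(3).map { |w| w.reduce(:*) }.max

    p windows.length # 5
    p best           # 49
    ```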

    View Slide

  37. INCLUDE ENUMERABLE
    enumerability by
    contract: each
    37
    We can include Enumerable into our own classes to add collection behavior. I’d like to show a few good use cases for doing so. The main point in these next examples is how we
    can exploit our implementation of each.

    View Slide

  38. class Rainbow
    include Enumerable
    def each
    yield "red"
    yield "orange"
    yield "yellow"
    yield "green"
    yield "blue"
    yield "indigo"
    yield "violet"
    end
    end
    38
    from Well-Grounded Rubyist
    Here’s a simple example of a Rainbow Enumerable which has the basic components of a custom collection: it has included enumerable, it implements `each`, and `each` yields
    items. We don’t need to use hashes or arrays.

    View Slide

  39. rainbow = Rainbow.new
    puts rainbow.map { |color| "Next color: #{color}" }
    puts "Starts with y?", rainbow.grep(%r{^y}, &:upcase)
    39
    Next color: red
    Next color: orange
    Next color: yellow
    Next color: green
    Next color: blue
    Next color: indigo
    Next color: violet
    Starts with y?
    YELLOW
    Now we can call any Enumerable method we like.

    View Slide

  40. class Deck
    include Enumerable
    SUITS = %w[ C D H S ]
    RANKS = %w[ 2 3 4 5 6 7 8 9 10 J Q K A ]
    def initialize(n = 1)
    @cards = SUITS.cycle(n).flat_map do |suit|
    RANKS.map do |rank|
    [suit, rank]
    end
    end
    end
    def each(&block)
    @cards.each(&block)
    end
    end
    40
    from Well-Grounded Rubyist
    We can also wrap a standard collection in another class. Here we layer in the behavior of a card deck to build a collection of playing cards. The custom each method simply
    delegates to the underlying array of cards.

    View Slide

  41. DEFER DATA FETCHING
    41
    api clients for example
    Let’s look at a more pragmatic example like an API client that fetches a collection of tweets or search results. A custom collection class can be really useful when you’re
    consuming an endpoint with paginated data.

    View Slide

  42. module Twitter
    module Enumerable
    include ::Enumerable
    def each(start = 0)
    Array(@collection[start..-1]).each do |element|
    yield(element)
    end
    unless last?
    start = [@collection.size, start].max
    fetch_next_page # api call adds to @collection
    each(start, &Proc.new)
    end
    self
    end
    end
    end
    42
    The twitter gem implements `each` to hide the complexity of fetching pages from the API. Here, each enumerates the items it already has, fetches the next page, and recursively calls itself until the
    last page is reached, all hidden away from the caller.
    This is really mind-blowing: you can start enumerating before you even have data to enumerate!
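
    The idea can be tried without a network. The sketch below is not the twitter gem's code: FakeApi and PagedCollection are hypothetical names, and the "API" is an in-memory array of pages. It shows that the caller only triggers as many page fetches as it needs.

    ```ruby
    # Hypothetical stand-in for a paginated API: each fetch returns one page.
    class FakeApi
      PAGES = [%w[a b], %w[c d], %w[e]].freeze

      attr_reader :requests

      def initialize
        @requests = 0
      end

      def fetch(page)
        @requests += 1
        PAGES[page] || []
      end

      def last_page?(page)
        page >= PAGES.length - 1
      end
    end

    # Hypothetical collection: each yields items and fetches pages on demand.
    class PagedCollection
      include Enumerable

      def initialize(api)
        @api = api
      end

      def each
        page = 0
        loop do
          @api.fetch(page).each { |item| yield item }
          break if @api.last_page?(page)
          page += 1
        end
        self
      end
    end

    api = FakeApi.new
    results = PagedCollection.new(api)

    p results.first(3) # ["a", "b", "c"]
    p api.requests     # 2, the third page was never fetched
    ```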

    View Slide

  43. STREAMING DATA
    43
    for a website!
    How about a web example? Since the Rack API expects the body of the response to respond to each, we can exploit this to implement a streaming endpoint to send data to the
    client before the page is fully rendered.

    View Slide

  44. 44
    #!/usr/bin/env rackup -s puma
    class App
    def call(env)
    [200, {'Content-Type' => 'text/plain'}, self]
    end
    def each
    yield 'one'
    sleep 1
    yield 'two'
    sleep 2
    yield 'three'
    sleep 3
    end
    end
    use Rack::Chunked
    run App.new
    This is how the Rack::Chunked middleware works. Here, I show a small rack app that yields content in steps interspersed with sleep statements. The Rack::Chunked middleware
    does some work, including adding the “chunked” Transfer-Encoding header, to stream the data back in chunks. You can see headers return immediately then the content
    gradually comes in.

    View Slide

  45. 45
    module Rack
    class Chunked
    class Body
    def each
    term = "\r\n"
    @body.each do |chunk|
    size = chunk.bytesize
    next if size == 0
    chunk = chunk.dup.force_encoding(Encoding::BINARY)
    yield [size.to_s(16), term, chunk, term].join
    end
    yield "0#{term}#{term}"
    end
    end
    end
    end
    Rack::Chunked wraps the body of the response and overrides the each method to yield each non-empty chunk prefixed with its byte size in hexadecimal, followed by a zero-size chunk that ends the body.
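
    The wire format is easier to see on a plain array of strings. This is a simplified sketch of the framing only, not Rack's own code; encode_chunks is a hypothetical helper:

    ```ruby
    # Each chunk is sent as "<size in hex>\r\n<data>\r\n", and a zero-size
    # chunk marks the end of the body. Empty parts are skipped because a
    # zero-size chunk would end the stream early.
    def encode_chunks(parts)
      term = "\r\n"
      encoded = parts.reject(&:empty?).map do |part|
        [part.bytesize.to_s(16), term, part, term].join
      end
      encoded << "0#{term}#{term}"
      encoded.join
    end

    p encode_chunks(["one", "two", "three"])
    # "3\r\none\r\n3\r\ntwo\r\n5\r\nthree\r\n0\r\n\r\n"
    ```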

    View Slide

  46. 46
    class StreamingController < ApplicationController
    def index
    @articles = Article.most_recent
    render stream: true
    end
    end
    This is the same mechanism invoked when you declare the `stream` option in your Rails controller. Just one Rails slide today.

    View Slide

  47. USE ENUMERATORS
    extend and combine
    behavior
    47
    A lesser known feature of Enumerable is the Enumerator class. I love Enumerators and hope you’ll be able to see how they can be useful.

    View Slide

  48. p [1, 2, 3].each
    p [1, 2, 3].map
    48
    #<Enumerator: [1, 2, 3]:each>
    #<Enumerator: [1, 2, 3]:map>
    Many of the Enumerable methods will return an instance of Enumerator when called without a block.

    View Slide

  49. CHAIN FOR GAIN
    index for any enum
    49
    What does this get us? This is useful when you need to combine behaviors by chaining Enumerators together.

    View Slide

  50. letters = %w[a b c d e]
    pairs = letters.map.with_index do |item, index|
    [item, index % 3]
    end
    p pairs
    50
    [["a", 0], ["b", 1], ["c", 2], ["d", 0], ["e", 1]]
    For instance, we used to complain about the fact that there was `each_with_index` but no `map_with_index`. You can now chain map and `with_index`, a special Enumerator
    method. Both the item and index are yielded to the block at the end of the expression.
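
    Two small variations on the same idea, as a sketch: with_index takes an optional starting offset, and it chains onto other block-less Enumerable calls such as select.

    ```ruby
    letters = %w[a b c]

    # with_index accepts a starting offset, handy for 1-based numbering.
    numbered = letters.map.with_index(1) { |letter, i| "#{i}. #{letter}" }

    # The block's result is handed back to select, so the index can drive
    # the filtering.
    every_other = letters.select.with_index { |_letter, i| i.even? }

    p numbered    # ["1. a", "2. b", "3. c"]
    p every_other # ["a", "c"]
    ```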

    View Slide

  51. letters = %w[a b c d e]
    group_1 = letters.reverse_each.group_by.each_with_index do |item, index|
    index % 3
    end
    group_2 = letters.reverse_each.each_with_index.group_by do |item, index|
    index % 3
    end
    p group_1
    p group_2
    51
    {0=>["e", "b"], 1=>["d", "a"], 2=>["c"]}
    {0=>[["e", 0], ["b", 3]], 1=>[["d", 1], ["a", 4]], 2=>[["c", 2]]}
    You can chain several Enumerators to reverse, group, and index results in one go. Keep in mind that the order of the chain may affect the results.
    So we’re extending behavior of our existing enumerable methods.

    View Slide

  52. RE CYCLE
    Enumerate repeatedly
    52
    Another use case for an Enumerator is the cycle method. Cycle also returns an Enumerator when called without a block.

    View Slide

  53. p ['aliceblue', 'ghostwhite'].cycle.take(5)
    53
    ["aliceblue",
    "ghostwhite",
    "aliceblue",
    "ghostwhite",
    "aliceblue"]
    Cycle will repeat the iteration forever unless we take a specified number of items.

    View Slide

  54. Project = Struct.new(:name)
    projects = [Project.new("TODO"),
    Project.new("Work"),
    Project.new("Home")]
    colors = ['aliceblue', 'ghostwhite'].cycle
    require 'erb'
    erb = (<<-ERB)

    <% projects.each_with_index do |project, index| %>

    <%= index + 1 %>
    <%= project.name %>

    <% end %>

    ERB
    puts ERB.new(erb).result(binding).gsub(/^$\n/, "")
    54


    1
    TODO


    2
    Work


    3
    Home


    We use cycle here with the next method to get external enumeration. When we have a reference to an enumerator, calling next repeatedly will enumerate values one by one. So
    we can create striped table rows, by getting the next color in each iteration of the block in this ERB tag.
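
    Here is the same striping idea as a sketch in plain Ruby, without ERB. The row markup is an assumption made for illustration; only the use of cycle and next follows the slide.

    ```ruby
    projects = %w[TODO Work Home]
    colors = ['aliceblue', 'ghostwhite'].cycle

    # Each call to colors.next advances the external enumeration by one,
    # wrapping around when the list of colors runs out.
    rows = projects.each_with_index.map do |name, index|
      %(<tr style="background: #{colors.next}"><td>#{index + 1}</td><td>#{name}</td></tr>)
    end

    puts rows
    ```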

    View Slide

  55. enumerator = [1, 2].each
    p enumerator.next
    p enumerator.next
    begin
    p enumerator.next
    rescue StopIteration => e
    p "Halt!!! #{e.class}"
    end
    enumerator.rewind
    loop do
    p enumerator.next
    end
    p "Done!!!"
    55
    1
    2
    "Halt!!! StopIteration"
    1
    2
    "Done!!!"
    When using an Enumerator with next, it’s worth knowing that an error will be raised when the end of the enumeration is reached. Rewind allows you to start the enumeration
    over. Ruby’s loop is smart enough to rescue from a StopIteration error to exit the loop as you can see here.

    View Slide

  56. RETURN ENUMERATOR
    unless block_given?
    56
    As we’ve seen, calling many of our Enumerable methods without a block will return instances of Enumerator.

    View Slide

  57. 57
    class Rainbow
    def colors
    return to_enum(:colors) unless block_given?
    yield "red"
    yield "orange"
    yield "yellow"
    yield "green"
    yield "blue"
    yield "indigo"
    yield "violet"
    end
    end
    Here’s how to do it: the Kernel module provides `to_enum`, also called `enum_for`. It takes a method name and other arguments if necessary and returns an Enumerator with a
    reference to your Enumerable object and method that was called.
    Now we can expose a colors method to get an enumerable method on Rainbow - we don’t actually need to include Enumerable to get this behavior.
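
    When the method takes arguments, they need to be passed to to_enum as well so the Enumerator can replay the call. A minimal sketch; the Countdown class is hypothetical and not from the talk:

    ```ruby
    # Hypothetical example class.
    class Countdown
      def from(start, step = 1)
        # Forward the arguments so the Enumerator calls from(start, step) later.
        return to_enum(:from, start, step) unless block_given?

        start.step(0, -step) { |n| yield n }
      end
    end

    countdown = Countdown.new

    p countdown.from(3).to_a                  # [3, 2, 1, 0]
    p countdown.from(10, 5).map { |n| n * 2 } # [20, 10, 0]
    ```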

    View Slide

  58. rainbow = Rainbow.new
    puts rainbow.colors.map { |color| "Next color: #{color}" }
    puts "Starts with y?", rainbow.colors.grep(%r{^y}, &:upcase)
    58
    Next color: red
    Next color: orange
    Next color: yellow
    Next color: green
    Next color: blue
    Next color: indigo
    Next color: violet
    Starts with y?
    YELLOW
    Now we can call any Enumerable method we like.

    View Slide

  59. 59
    module Twitter
    module Enumerable
    include ::Enumerable
    def each(start = 0)
    return to_enum(:each, start) unless block_given?
    Array(@collection[start..-1]).each do |element|
    yield(element)
    end
    unless last?
    start = [@collection.size, start].max
    fetch_next_page # api call adds to @collection
    each(start, &Proc.new)
    end
    self
    end
    end
    end

    View Slide

  60. CREATE ENUMERATORS
    templates for
    generating values
    60
    Let’s take a closer look at creating Enumerators outside the context of an Enumerable.

    View Slide

  61. 61
    enum = Enumerator.new do |y|
    y.yield 1
    y.yield 2
    end
    p enum.map { |n| n * n }
    [1, 4]
    Here’s an example of a bare Enumerator. We can create one with `Enumerator.new` and pass a block to declare how values will be generated. This example will behave like a two
    item array.

    View Slide

  62. 62
    enum = Enumerator.new do |y|
    y.yield 1
    y.yield 2
    end
    enum.map { |n| n * n }
    Enumerator::Yielder
    Taking a closer look, notice a y parameter is given to the block. This object is called a Yielder. Notice that we’re not actually using the yield keyword in the block but calling a
    yield method on the Yielder. The Yielder allows us to pass values from one block to another.
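
    The Yielder also responds to <<, which passes a value along just as yield does and returns the Yielder, so calls can be chained. A short sketch:

    ```ruby
    enum = Enumerator.new do |y|
      y.yield 1   # same form as on the slide
      y << 2 << 3 # << returns the yielder, so it can be chained
    end

    p enum.to_a # [1, 2, 3]
    p enum.next # 1
    ```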

    View Slide

  63. 63
    enum = Enumerator.new do |y|
    n = 0
    loop do
    y.yield n
    n += 1
    end
    end
    enum.each { |i| puts i }
    And we can do anything we want within the template block like maintain state and use loops. You can use an Enumerator to generate an infinite sequence.
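
    Calling each on an infinite Enumerator like this never returns, so in practice the caller decides when to stop. A sketch of three ways to consume only part of the sequence:

    ```ruby
    naturals = Enumerator.new do |y|
      n = 0
      loop do
        y.yield n
        n += 1
      end
    end

    # take and first stop the enumeration once they have enough values.
    first_five  = naturals.take(5)
    first_three = naturals.first(3)

    # An eager select would never return here, so go through lazy first.
    evens = naturals.lazy.select(&:even?).first(3)

    p first_five  # [0, 1, 2, 3, 4]
    p first_three # [0, 1, 2]
    p evens       # [0, 2, 4]
    ```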

    View Slide

  64. 64
    def fibonacci
    Enumerator.new do |y|
    a, b = 1, 1
    loop do
    y.yield a
    a, b = b, a + b
    end
    end
    end
    p fibonacci.take(10)
    [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
    Here’s a Fibonacci sequence implemented with an Enumerator. This is pretty amazing: we usually think of concrete collections with Enumerables; but now we can apply them to
    mathematical concepts.

    View Slide

  65. class PascalsTriangle
    def rows(first = [1])
    Enumerator.new do |y|
    current = first
    loop do
    y.yield current
    current = next_row(current)
    end
    end
    end
    def next_row(row)
    ([0] + row).zip(row + [0]).map { |a, b| a + b }
    end
    end
    require 'pp'
    pp PascalsTriangle.new.rows.take(7)
    65
    [[1],
    [1, 1],
    [1, 2, 1],
    [1, 3, 3, 1],
    [1, 4, 6, 4, 1],
    [1, 5, 10, 10, 5, 1],
    [1, 6, 15, 20, 15, 6, 1]]
    We can go one step further with the Pascal’s triangle example using an Enumerator to repeatedly call our next_row method to generate the triangle as a sequence.

    View Slide

  66. BE MORE LAZY
    avoid eager evaluation
    66
    Another amazing thing we can do with Enumerators is evaluate them lazily. This feature is a more recent addition to the Ruby language and may be less familiar.

    View Slide

  67. 67
    range = (1..Float::INFINITY)
    p range.map { |x| x * x }.first(10)
    Ctrl-C!!!
    Consider the problem with this infinite number generator. We’re attempting to map over the results of an infinite range then take the first 10. This will never finish because by
    default, enumerables are eager: all values will be processed before being passed down the chain. With one little change, we can insert the lazy method, which returns a special
    type of enumerator, a lazy one, and voila, we get the first ten results.

    View Slide

  68. 68
    range = (1..Float::INFINITY)
    p range.lazy.map { |x| x * x }.first(10)
    [1, 4, 9, 16, 25,
    36, 49, 64, 81, 100]
    1 1
    2 4
    3 9
    A lazy Enumerator is a special type of Enumerator that yields each value to the next step in the chain before enumerating the remaining items, instead of processing the whole collection eagerly. Each successive Enumerator
    in the chain is reimplemented to behave lazily. This means the end of the chain can control the flow of execution. In this case, when the first ten items are received, first at the
    end will raise an error which is rescued further back up the chain, allowing the enumeration to exit. We can take advantage of this feature in special use cases.
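
    One way to observe the difference is to log the order in which the blocks run. In this sketch the eager chain finishes map for every item before select starts, while the lazy chain pushes one item through both steps and stops as soon as first has what it needs:

    ```ruby
    log = []

    eager = (1..3).map { |x| log << "map #{x}"; x * x }
                  .select { |x| log << "select #{x}"; x.odd? }
    eager_log = log.dup
    log.clear

    lazy = (1..3).lazy.map { |x| log << "map #{x}"; x * x }
                 .select { |x| log << "select #{x}"; x.odd? }
                 .first(1)
    lazy_log = log.dup

    p eager     # [1, 9]
    p eager_log # ["map 1", "map 2", "map 3", "select 1", "select 4", "select 9"]
    p lazy      # [1]
    p lazy_log  # ["map 1", "select 1"]
    ```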

    View Slide

  69. require 'csv'
    CSVAdmissionRate = Struct.new(:range) do
    def ===(row)
    rate = row['ADM_RATE'].to_f
    rate > 0 && range === rate
    end
    end
    CSV.open('../../data/college-scorecard.csv',
    'rb', headers: true) do |csv|
    results = csv.each.lazy.
    grep(CSVAdmissionRate.new(0.0..0.10)).take(10).
    each_with_object({}) do |row, data|
    puts row['INSTNM'] # streaming results
    end
    end
    69
    Yale University
    University of Chicago
    Harvard University
    Massachusetts Institute of
    Technology
    Dartmouth College
    Princeton University
    Columbia University
    Cooper Union for the Advancement
    of Science and Art
    The Juilliard School
    Curtis Institute of Music
    Imagine we’re processing a large CSV file in a memory constrained environment. Here I’m using the college scorecard data you can download off data.gov and the CSV class from
    the standard library to process CSV data. We have a CSVAdmissionRate object that implements the three-equals method to select CSV rows with an admission rate falling in a given
    range.
    Using lazy, I can grep for the first 10 colleges with an admission rate of less than 10% and stream back the results without loading the entire CSV file in memory.
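
    The same pattern can be tried without the data file. In this sketch the rows are made-up hashes standing in for parsed CSV rows, RateBelow is a hypothetical matcher, and a counter records how many rows were actually read before the chain stopped:

    ```ruby
    # Hypothetical matcher: defining === lets grep treat it like a pattern.
    RateBelow = Struct.new(:limit) do
      def ===(row)
        rate = row['ADM_RATE'].to_f
        rate > 0 && rate < limit
      end
    end

    # Made-up rows standing in for parsed CSV data.
    rows = [
      { 'INSTNM' => 'A College', 'ADM_RATE' => '0.08' },
      { 'INSTNM' => 'B College', 'ADM_RATE' => '0.55' },
      { 'INSTNM' => 'C College', 'ADM_RATE' => 'NULL' },
      { 'INSTNM' => 'D College', 'ADM_RATE' => '0.05' },
      { 'INSTNM' => 'E College', 'ADM_RATE' => '0.02' }
    ]

    seen = 0
    counted = rows.lazy.map { |row| seen += 1; row }
    names = counted.grep(RateBelow.new(0.10)).first(2).map { |row| row['INSTNM'] }

    p names # ["A College", "D College"]
    p seen  # 4, the last row was never read
    ```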

    View Slide

  70. KITCHEN SINK
    webcrawler
    70
    Let’s take a look at a web crawler in Ruby. We’ll create a Spider class that will process a root webpage, follow links in breadth-first order, and record data from each
    page. We only know the root url to start. We’ll dynamically add to our list of urls to visit, while simultaneously consuming those urls and recording a dataset of page information
    as we go. We’ll use enumerators to do this.
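
    The core trick, an Enumerator over a list that grows while it is being consumed, can be sketched without any HTTP. The link graph below is made up, and the names are for illustration only:

    ```ruby
    # Made-up link graph standing in for real pages.
    LINKS = {
      'root' => %w[a b],
      'a'    => %w[c],
      'b'    => %w[c d],
      'c'    => [],
      'd'    => []
    }.freeze

    urls = ['root']

    # The enumerator re-checks urls.length on every pass, so urls appended
    # by the consumer are picked up later, in breadth-first order.
    url_enum = Enumerator.new do |y|
      index = 0
      while index < urls.length
        y.yield urls[index]
        index += 1
      end
    end

    visited = url_enum.map do |url|
      LINKS.fetch(url).each { |link| urls << link unless urls.include?(link) }
      url
    end

    p visited # ["root", "a", "b", "c", "d"]
    ```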

    View Slide

  71. 71
    Spider
    @urls << url
    data.merge(info)
    @results << data
    results.each
    url
    url
    info
    info

    View Slide

  72. 72
    require 'mechanize'
    class Spider
    def results
    return enum_for(:results) unless block_given?
    i = @results.length
    url_enum.each do |url, handler, data|
    send handler, agent.get(url), data
    if block_given? && @results.length > i
    yield @results.last
    i += 1
    end
    sleep @interval if @interval > 0
    end
    end
    end
    Our public method, results, yields crawled data as we iterate over the urls to crawl. Notice the call to enum_for, an alias of to_enum, if no block is given. Then we’ll grab the next url to
    process and repeat. We also provide a sleep interval to respect the crawl limits requested in robots.txt.

  73. 73
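    class Spider
      private
      def url_enum
        index = 0
        Enumerator.new do |y|
          # @urls can grow while we iterate, so walk it by index
          while index < @urls.count && index <= @max_urls
            url = @urls[index]
            index += 1 # advance before `next`, otherwise a nil url would loop forever
            next unless url
            handler, data = @handlers[url]
            y.yield url, handler, data
          end
        end
      end
    end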
    Our `url_enum` is an Enumerator that yields urls and handlers given as method names to process each page.
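    To see why an index-driven Enumerator suits a list that grows during iteration, here is a self-contained sketch with plain integers instead of urls. The `growing_queue` helper and its `max` cap are hypothetical names for this example, and the cap uses a strict `<` comparison.

```ruby
# Yields items from an array that may be appended to while it is consumed.
# `max` is a hypothetical cap on how many items are visited.
def growing_queue(items, max:)
  Enumerator.new do |y|
    index = 0
    while index < items.length && index < max
      y << items[index] # y << value is shorthand for y.yield(value)
      index += 1
    end
  end
end

queue = [1]
visited = growing_queue(queue, max: 6).map do |n|
  # "Discover" two more items for every item visited, like links on a page
  queue << n * 2 << n * 2 + 1
  n
end

p visited
p queue.length
```

    Items are visited in the order they were enqueued, which is what gives the crawler its breadth-first behavior.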

  74. 74
    class GutenbergSpider < Spider
      def process_index(page, data = {})
        links = page.links_with(href: %r{^/ebooks/\d+$})
        links.map do |a|
          title, author = a.text.strip.split("\n")
          process resolve_url(a.href, page), :process_book, title: title
        end
      end
      def process_book(page, data)
        books = %w[epub kindle txt htm].each_with_object({}) do |fmt, hash|
          hash[fmt] = page.links_with(href: %r{ebooks/[^/]*#{fmt}.*$}).map(&:href)
        end
        record data.merge(books)
      end
    end
    [Slide callouts: `@urls << url` (next to process), `@results << data` (next to record)]
    The GutenbergSpider subclass defines handlers for crawling gutenberg.org to collect links to freely available classic literature in ebook formats.
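    The slides don’t show `process` and `record` themselves, so the following is only a guess at their shape, based on the `@urls << url` and `@results << data` callouts and on how `url_enum` reads `@handlers`. The `SketchSpider` name, the public readers, and the duplicate-url check are assumptions.

```ruby
# Hypothetical stand-in for the parts of Spider not shown on the slides
class SketchSpider
  attr_reader :urls, :handlers, :results

  def initialize
    @urls = []      # urls to visit, in discovery order
    @handlers = {}  # url => [handler method name, data to pass along]
    @results = []   # recorded page data
  end

  # Enqueue a url along with the handler that should process its page
  def process(url, handler, data = {})
    return if @handlers.key?(url) # assumption: never enqueue a url twice
    @urls << url
    @handlers[url] = [handler, data]
  end

  # Store one finished record
  def record(data = {})
    @results << data
  end
end

spider = SketchSpider.new
spider.process "http://example.com/a", :process_book, title: "A"
spider.process "http://example.com/a", :process_book, title: "A again" # ignored
spider.record title: "A", epub: []

p spider.urls
p spider.handlers
p spider.results
```

    Keeping the handler name and data next to each url is what would let an enumerator like `url_enum` yield all three together.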

  75. 75
    require 'yaml/store'
    path = File.expand_path("../../data/ebooks.store", __FILE__)
    store = YAML::Store.new(path)
    # Search for Charles Dickens ebooks
    spider = GutenbergSpider.new('http://www.gutenberg.org/ebooks/author/37', :process_index)
    spider.results.lazy.take(5).each_with_index do |result, i|
      puts "storing #{i}: #{result.inspect}" # streaming
      store.transaction do
        store[result[:title]] = result
        store.commit
      end
    end
    To use the spider, we’ll simply enumerate the results. The caller doesn’t need to be concerned with the implementation details of fetching and parsing pages. We make it lazy to
    avoid processing all the pages and to stream the first 5 results into a datastore. Here I’m using YAML::Store, a nice abstraction over file-based data storage provided by the
    standard library.
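    The same streaming pattern can be tried without a network connection. In this sketch an endless Enumerator of made-up records stands in for `spider.results`; the `fake_results` and `store_first` helpers, the temp directory, and the record fields are all assumptions.

```ruby
require "yaml/store"
require "tmpdir"

# Stand-in for spider.results: an endless stream of made-up records
def fake_results
  Enumerator.new do |y|
    n = 0
    loop do
      n += 1
      y << { "title" => "Book #{n}", "formats" => %w[epub txt] }
    end
  end
end

# Persist only the first `count` results, one transaction per record,
# then return the keys that ended up in the store
def store_first(results, count, path)
  store = YAML::Store.new(path)
  results.lazy.take(count).each do |result|
    store.transaction { store[result["title"]] = result }
  end
  store.transaction(true) { store.roots } # read-only transaction
end

path = File.join(Dir.mktmpdir, "ebooks.store")
p store_first(fake_results, 3, path)
```

    Because `take` is applied lazily, the endless source is only asked for as many records as we store.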

  76. ROLL YOUR OWN
    learn Enumerable by implementing Enumerable
    I’d like to emphasize that the Enumerable API is **not magic**. I highly recommend implementing it yourself in Ruby as an exercise to better understand how it works under the
    hood.

  77. 77
    require "minitest/autorun"
    require_relative "./custom_list"
    describe "CustomEnumerable" do
      before do
        @list = CustomList.new(3, 13, 42, 4, 7)
      end
      it "supports map" do
        @list.map { |x| x + 1 }.must_equal([4, 14, 43, 5, 8])
      end
      it "supports find" do
        @list.find { |x| x > 40 }.must_equal(42)
        @list.find(-> { 0 }) { |x| x > 50 }.must_equal(0)
      end
      it "supports select" do
        @list.select { |x| x.even? }.must_equal([42, 4])
      end
    end
    from practicingruby.com
    Consider a test-driven approach where you specify assertions for a “Custom Enumerable” module that you will implement yourself. Here are some sample specs for map, find,
    and select.

  78. 78
    module CustomEnumerable
      def reduce
        # each
      end
      def grep
        # each
      end
      def sum
        # each
      end
      def average
        # each
      end
    end
    For homework, implement as many of the Enumerable API methods as you like. Start with reduce or grep. Then try adding your own extensions to Enumerable, like sum and
    average. In all cases, you should be able to write your implementation in terms of each.
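    If you want something to compare against, here is one possible sketch, not a reference solution. It covers map, select, find, reduce, sum, and average, each written in terms of each; the `CustomList` class and the `NOT_GIVEN` sentinel are my own assumptions, and grep is left for you.

```ruby
module CustomEnumerable
  NOT_GIVEN = Object.new.freeze # sentinel: reduce was called without a seed

  def map
    result = []
    each { |x| result << yield(x) }
    result
  end

  def select
    result = []
    each { |x| result << x if yield(x) }
    result
  end

  # ifnone, when given, is called if nothing matches
  def find(ifnone = nil)
    each { |x| return x if yield(x) }
    ifnone&.call
  end

  # Without a seed, the first element becomes the initial accumulator
  def reduce(memo = NOT_GIVEN)
    each do |x|
      memo = memo.equal?(NOT_GIVEN) ? x : yield(memo, x)
    end
    memo.equal?(NOT_GIVEN) ? nil : memo
  end

  def sum
    reduce(0) { |total, x| total + x }
  end

  def average
    count = reduce(0) { |n, _| n + 1 }
    count.zero? ? nil : sum.to_f / count
  end
end

# Hypothetical host class: it only needs to provide each
class CustomList
  include CustomEnumerable

  def initialize(*items)
    @items = items
  end

  def each(&block)
    @items.each(&block)
    self
  end
end

list = CustomList.new(3, 13, 42, 4, 7)
p list.map { |x| x + 1 }
p list.find(-> { 0 }) { |x| x > 50 }
p list.average
```

    Notice that `CustomList` never mentions map or reduce; everything comes from the module once `each` exists.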

  79. 79
    rubinius/rubinius
    Check out the specs for Enumerable from rubinius, which is Ruby implemented in Ruby, to go further with this.

  80. LOVE ENUMERABLE
    use the api
    get “under the hood”
    embrace enumerators
    To wrap up, I hope you’ll be inspired to use more of the API, practice implementing Enumerable methods on your own for deeper understanding, and play with Enumerators to
    see how they can be useful in your everyday coding.

  81. BUILDING BLOCKS
    The key point I’ve been working towards is that Enumerable provides us with basic building blocks that perform simple, functional roles and combine in many ways to create
    more powerful constructs. In other words, Enumerable is like Ruby Legos.

  82. 82
    rossta/loves-enumerable
    @rossta
    ['thanks']
    You can find all the code snippets I shared today published on GitHub at rossta/loves-enumerable.
    Say hi to me after the meetup or on Twitter at @rossta.
