‣Created the Ruby Quiz
‣Released FasterCSV, HighLine, and Elif
‣Wrote a couple of Pragmatic Programmer books with a lot of Ruby in them
‣I’ve given a talk at every LSRC so far
‣We can always write C extensions for speed-critical portions
‣This is rarely actually needed, though
‣We can make use of libraries, some written in C, that help with our problem
‣We can add more processing power
‣We can rework our data structures to better support the task at hand
‣Always the big win, in my opinion
generate images for one project
‣The PPM code was taking about 1.3 seconds for a 400 by 200 pixel image
‣I replaced a two-dimensional Array of Color objects with a 3D NArray
‣I changed less than ten lines of code
‣The speed on the same image dropped to about 1/100th of a second
numbers in various sizes
‣Aside from indexing and iteration, NArray supports data generation, arithmetic operations, comparisons, bitwise manipulations, statistical calculations, and more
‣View the large API, with examples, at: http://narray.rubyforge.org/
country for that IP
‣This was a Ruby Quiz
‣Solutions were also expected to be efficient in memory and speed
‣This is a real-world task I’ve had to do for my job
the file
‣Most of those preprocessed the file to make that search easier
‣I’m going to show a SQLite solution
‣It’s very close to the same speed (about 1/3rd of a second to look up an IP)
‣I didn’t have to be clever or even add an index
‣It was easier for me to use full country names
db = SQLite3::Database.new(LOCAL_DB)
db.execute(<<-END_TABLE.strip)
  CREATE TABLE ips (
    low_ip  INTEGER,
    high_ip INTEGER,
    country TEXT
  )
END_TABLE
open(REMOTE_DB) do |url|
  Zlib::GzipReader.new(url).each do |line|
    next if line =~ /\A\s*(?:#|\z)/
    args = FCSV.parse_line(line).values_at(0..1, 6)
    db.execute(<<-END_INSERT.strip, *args)
      INSERT INTO ips(low_ip, high_ip, country)
      VALUES(?, ?, ?)
    END_INSERT
  end
end
db.close

# ...
query results in a Hash to index by column name and/or convert column values to Ruby objects based on type
‣It can work with in-memory databases
‣It can run queries across tables in multiple database files
‣You can define SQL functions and aggregates for it in Ruby code
a real binary search
‣But we don’t have to write it
‣This drops the search time below 1/1,000th of a second
‣We will Marshal the RBTree to build our persistent database
‣We will use RBTree’s bounds search methods to perform the search
ips = RBTree.new
open(REMOTE_DB) do |url|
  Zlib::GzipReader.new(url).each do |line|
    next if line =~ /\A\s*(?:#|\z)/
    low, high, country = FCSV.parse_line(line).values_at(0..1, 6)
    ips[Integer(low)] = [Integer(high), country]
  end
end
File.open(LOCAL_DB, "wb") { |file| Marshal.dump(ips, file) }

# ...
replacement for a Hash you want to keep ordered by keys
‣Just having RBTree available magically speeds up Ruby’s SortedSet (over 15 times faster for simple iteration) in the standard library
argument, like --rbtree, and this sets up the load

require "rubygems" unless ARGV.empty?

require "set"
dictionary = SortedSet.new
File.foreach("/usr/share/dict/words") do |word|
  dictionary << word.strip if word =~ /\S/
end

start = Time.now
dictionary.to_a  # force the set into order
puts "Time to order: #{Time.now - start}"
of server monitoring
‣We collect various statistics from servers at regular intervals
‣We later analyze this data for spikes and trends
‣Time-series data is one thing an RDBMS doesn’t handle well
focus in on the parts that matter to you now
‣FSDB is essentially a Hash backed by the filesystem
‣This allows you to use paths to drill down to subsets of the data
‣It avoids irrelevant data, and even the need for an index, in some cases
‣Techniques like this improved our graphing speed from almost four seconds to well under one
module TimeSeries
  DB = FSDB::Database.new("server_stats/")

  module_function

  def record(data, time = Time.now)
    DB[time.strftime("%Y/%m/%d/%H/%M.obj")] = data
  end
  # ...
  path = [year, *args][0..5].join("/").gsub(/\b\d\b/, '0\0')
  if File.extname(path) == ".obj"
    total += block[DB[path]]
  else
    (DB[path] || []).each do |new_path|
      total += sum(File.join(path, new_path), &block)
    end
  end
  total
end

def average(*args)
  count = 0
  sum(*args) { |data| count += 1; yield data } / count.to_f
end
end

# ...
# a read only transaction (shared lock)
db.browse "2008/09/04/12/00.obj" do |data|
  p data[:load_average]  # >> 70
  p data[:disk_free]     # >> 50
end

# a read/write transaction (exclusive lock)
db.replace "2008/09/04/12/00.obj" do |data|
  data.merge(:uptime => 21 * 60)
end

p db["2008/09/04/12/00.obj"][:uptime]  # >> 1260
that can be formed by rearranging those letters
‣This task represents any task that just needs some processing time to sort out
‣All this requires is some I/O and simple comparisons
  def signature
    strip.downcase.delete("^a-z").split("").sort.join
  end
end

pattern     = ARGV.shift.signature
descrambled = [ ]

File.foreach("/usr/share/dict/words") do |word|
  descrambled << word if word.signature == pattern
end

puts descrambled
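The signature trick works because two words are anagrams exactly when they reduce to the same canonical form; a self-contained sketch of the same String extension, with made-up sample words:

```ruby
class String
  # canonical form: lowercase letters only, sorted
  def signature
    strip.downcase.delete("^a-z").split("").sort.join
  end
end

p "Listen".signature                         # => "eilnst"
p "Silent".signature                         # => "eilnst"
p "Listen".signature == "Silent".signature   # => true
```

Sorting the letters means the dictionary scan needs only a single string comparison per word.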
processes doing the work
‣This is true multiprocessing, unlike Ruby’s thread model
‣Rinda’s TupleSpace makes the Inter-Process Communication super easy
‣This task is I/O bound, but putting four processes on it still halved the time
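A minimal in-process sketch of the TupleSpace pattern (with DRb, the same space can be shared across real worker processes; the tuple contents here are invented):

```ruby
require "rinda/tuplespace"

ts = Rinda::TupleSpace.new

# a producer writes work tuples into the space
%w[apple lemon].each { |word| ts.write([:task, word]) }

# workers take matching tuples; take removes the tuple atomically,
# so no two workers can grab the same job
results = [ ]
2.times do
  _, word = ts.take([:task, nil])
  results << word
end
p results.sort  # => ["apple", "lemon"]
```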