BUILDING PRODUCTS THAT MAKE DATA SPEAK

LEGACY / CRM / HEALTHCARE (unable to answer questions despite having the data) → ETL → MODERN PRODUCT / APP (tailored to answer those questions)
YOU CAN DO MORE WITH ETL & DATA PROCESSING…

▸ Migrate your data from one schema to another (with grace).
▸ Generate reports automatically.
▸ Synchronize all or part of the data between two of your apps.
▸ Prepare and clean data before indexing it for full-text search.
▸ Aggregate data from multiple sources inside your searchable app.
▸ Geocode records to present them online to your users.
▸ Implement a data import or export for your users.
require 'csv'

# Source: yields each CSV row as a hash keyed by the header names.
class CSVSource
  def initialize(filename:)
    @filename = filename
  end

  def each
    csv = CSV.open(@filename, headers: true)
    csv.each do |row|
      yield(row.to_hash)
    end
    csv.close
  end
end
require 'recurly'

# Source: yields one row per invoice number in the given range,
# fetching through the cache to avoid repeated API calls.
class RecurlyInvoices
  def initialize(from:, to:, fields:, cache: NullCache.new)
    @range = (from..to)
    @cache = cache
    @fields = fields
  end

  def each
    @range.each do |number|
      cache_key = ([number] + @fields).map(&:to_s).join(':')
      row = @cache.fetch(cache_key) do
        invoice = Recurly::Invoice.find(number)
        invoice.attributes.slice(*@fields)
      end
      yield row.dup
    end
  end
end
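RecurlyInvoices defaults to NullCache.new, which the slide does not define. A minimal sketch of what it could look like, assuming the cache only needs a fetch(key) method that takes a block; MemoryCache is a hypothetical in-memory variant added here for illustration only.

```ruby
# Hypothetical no-op cache: always runs the block and stores nothing.
class NullCache
  def fetch(_key)
    yield
  end
end

# Hypothetical in-memory cache exposing the same fetch(key) { ... } interface.
class MemoryCache
  def initialize
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end
```

Because both classes respond to fetch in the same way, either one could be passed as the cache: argument without changing the source.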
require 'date'

# Transform: parses the string in row[from] and stores a Date in row[to].
class ParseDate
  def initialize(from:, to:, format:)
    @from, @to = from, to
    @format = format
  end

  def process(row)
    row[@to] = Date.strptime(row[@from], @format)
    row
  end
end
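Since a transform is a plain Ruby object, it can be exercised on its own, outside any job. A small sketch of that idea, repeating the class so the snippet runs standalone; the 'created_at' and 'created_on' keys and the sample date are made up for illustration.

```ruby
require 'date'

class ParseDate
  def initialize(from:, to:, format:)
    @from = from
    @to = to
    @format = format
  end

  def process(row)
    row[@to] = Date.strptime(row[@from], @format)
    row
  end
end

# Hypothetical column names and sample value.
transform = ParseDate.new(from: 'created_at', to: 'created_on', format: '%Y-%m-%d')
result = transform.process('created_at' => '2015-03-21')
# result['created_on'] now holds a Date, and the original string is kept under 'created_at'.
```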
require 'csv'

# Destination: writes the header line once (from the first row's keys),
# then one CSV line per row.
class CSVDestination
  def initialize(output_file)
    @csv = CSV.open(output_file, 'w')
    @headers_written = false
  end

  def write(row)
    unless @headers_written
      @headers_written = true
      @csv << row.keys
    end
    @csv << row.values
  end

  def close
    @csv.close
  end
end
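The three kinds of components shown so far share small interfaces: a source responds to each, a transform to process(row), a destination to write(row) and close. A rough sketch of how they fit together, using a hand-rolled runner rather than Kiba's own implementation; ArraySource, UpcaseName, ArrayDestination and run_pipeline are hypothetical names chosen so the example runs without files or gems.

```ruby
# Hypothetical in-memory source: yields a copy of each row.
class ArraySource
  def initialize(rows)
    @rows = rows
  end

  def each
    @rows.each { |row| yield(row.dup) }
  end
end

# Hypothetical transform: upcases the :name field.
class UpcaseName
  def process(row)
    row[:name] = row[:name].upcase
    row
  end
end

# Hypothetical destination: collects rows in an array.
class ArrayDestination
  attr_reader :rows

  def initialize
    @rows = []
  end

  def write(row)
    @rows << row
  end

  def close; end
end

# Hypothetical runner: each row flows source -> transforms -> destination,
# and the destination is closed even if a step raises.
def run_pipeline(source, transforms, destination)
  source.each do |row|
    row = transforms.reduce(row) { |current, transform| transform.process(current) }
    destination.write(row)
  end
ensure
  destination.close
end

destination = ArrayDestination.new
source = ArraySource.new([{ name: 'alice' }, { name: 'bob' }])
run_pipeline(source, [UpcaseName.new], destination)
# destination.rows now contains the transformed rows, in source order.
```

Swapping ArraySource for CSVSource, or ArrayDestination for CSVDestination, would not require touching the runner, which is the separation of concerns the components aim for.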
KIBA ETL

▸ Code-centric ETL.
▸ Versioned in git (branches, yay!).
▸ Testable components, with clear separation of concerns.
▸ Reusable components across jobs.
▸ ETL jobs that stay easy to maintain over the very long run.
▸ Ruby ecosystem (tap into gems for extra features).
▸ Blueprints and components soon available in Kiba Pro.