
Lightweight Business Intelligence in Ruby

Long the province of specialists in the hermetic world of enterprise software development, business intelligence (BI) is increasingly important to smaller, more agile companies and startups that need access to near-real-time information to make critical business decisions. With its support for aggregation and reduction of massive amounts of data and its flexible schemas, MongoDB is a great choice for creating lightweight, denormalized data stores optimized for BI, with the added bonus of peaceful co-existence with transactional data stores. In this talk I will explore how Trunk Club captures and analyzes customer information, monitors user behaviour, feeds machine-learning algorithms for decision support, and delivers value to business stakeholders through simple querying and reporting interfaces.

Coraline Ada Ehmke

April 06, 2013

Transcript

  1. WHO AM I?
      - A developer with a long memory & a longer history
      - An active open source contributor
      - A senior engineer at Trunk Club
      - A lifelong learner
  2. TRUNK CLUB
      - Service-oriented startup
      - Software optimizes and streamlines business processes
      - Technology is a key differentiator
      - Engineering provides leverage for scaling the business
  3. OUR ENGINEERING GOALS
      - Startups make critical decisions on a daily basis
      - Better data leads to better decision-making
      - Our mission is ∴ to provide this data in a timely and useful form
  4. SOME CRITICAL DATA POINTS
      - Marketing campaign performance
      - Member on-boarding funnel
      - Trunk lifecycle
      - Stylist interactions
      - Product performance
  5. WHAT IS BUSINESS INTELLIGENCE?
      - Collection & organization of mission-critical knowledge
      - Historical view of business operations
      - Tools to support decision making
  6. THE NAÏVE APPROACH
      - Reporting out of the transactional database
      - Raw SQL embedded in your code for “performance”
      - Granting direct db access to stakeholders
  7. THE NAÏVE APPROACH: SHORTCOMINGS
      - Fighting the schema with complex joins
      - Poor performance
      - Impact on production resources
  8. THE ENTERPRISE APPROACH
      - Distinct BI database
      - Nightly ETL (extract, transform, load)
      - Schema designed for reporting
      - Separate hardware and software stack
      - Combination of static and dynamic reports
  9. THE ENTERPRISE APPROACH: SHORTCOMINGS
      - 24 hour delay in information
      - Expensive to configure and maintain
      - Requires highly specialized resources
      - Hard to change your mind or adapt
      - Enterprise-y
  10. TRADITIONAL BI: THE WRONG HAMMER
      - Waterfall approach
      - Painful data migrations
      - Brittle ETL processes
      - No automated testing
      - Logic embedded in the data store
  11. THE FOUR NOBLE GOALS OF THE LIGHTWEIGHT BI APPROACH
      - Provide real-time data to support decisions
      - Leverage familiar technology & infrastructure
      - Use existing development staff
      - Support an iterative, agile approach
  12. GETTING STARTED
      - Collaborate!
      - Determine KPIs that actually matter
      - Figure out what sort of questions the business is asking on a daily basis
  13. TURN INFERENCES INTO FACTS
      - Find a way to tell a story with your data
      - Design your schema based on facts
      - De-normalize like a boss
  14. PRESENT ANSWERS
      - Provide a central, single source of truth
      - Present the data wherever it’s needed
      - Dashboard design is harder than you think
      - Plan for iterations & ongoing collaboration
  15. BRINGING RUBY TO THE PARTY
      - Makes agile, test-driven development easy
      - Quickly deploy new apps
      - Plenty of visualization libraries
      - Powerful SQL & NoSQL ORMs
      - Great data munging capabilities
  16. LEVERAGING MONGODB
      - Flexible and dynamic schemas
      - Support for native datatypes
      - Powerful querying and aggregation
      - Fast and performant
      - Easy to scale up
  17. PARALLEL DB DEPLOYMENT
      - Modern frameworks support multiple ORMs
      - Use SQL for transactions
      - Use NoSQL for reporting
      - Business logic in your apps, not in your database
  18. STATISTICAL MODELS
      - Collections of facts, not attributes
      - Data spanning deep and wide object graphs
      - De-normalized and optimized for reporting
  19. STATISTICAL MODELS

          module MemberDataProfiles
            class Performance
              include Mongoid::Document
              include Mongoid::Timestamps

              field :member_id,              :type => Integer
              field :annual_trunk_frequency, :type => Integer, :default => 0
              field :average_trunk_value,    :type => Float,   :default => 0.0
              field :is_customer,            :type => Boolean, :default => false
              field :keep_rate,              :type => Float,   :default => 0.0
              field :last_transaction_date,  :type => Date
              field :member_creation_date,   :type => Date
              field :total_value,            :type => Float,   :default => 0.0
              field :total_number_of_trunks, :type => Integer, :default => 0
            end
          end
  20. STREAMING ETL
      - Event-triggered, continuous data extraction
      - Calculations are defined in code rather than SQL or ETL scripts
      - Allow resource-intensive data munging to happen in the background
      - Provide near-real-time data
  21. STREAMING ETL

          # rabbit_notifier.rb
          class RabbitNotifier
            def self.notify_of_action(model, action, extras = {})
              notify(
                :model_actions,
                model,
                headers(model, extras.merge('action' => action.to_s))
              )
            end
            ...
          end
  22. STREAMING ETL

          # listeners.rb
          ListenerConfig.map do
            route :product_shipped,  :to => UpdatesProduct
            route :product_returned, :to => UpdatesProduct
          end
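  23. STREAMING ETL

          # updates_product.rb
          # Entry point for a routed message: load or build the product, then recalculate.
          def self.with(json)
            product = Product.init_with_params(extracted_params(json))
            product.update_stats!
          end

          # Parse the JSON payload into a hash with symbol keys.
          def self.extracted_params(json)
            JSON.parse(json, :symbolize_names => true)
          end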
  24. STREAMING ETL

          # product.rb
          class Product
            include Mongoid::Document
            include Mongoid::Timestamps
            include Products::Calculations

            field :name
            field :price,             :type => Float,   :default => 0.0
            field :cost,              :type => Float,   :default => 0.0
            field :profit_margin,     :type => Float,   :default => 0.0
            field :keep_rate,         :type => Float,   :default => 0.0
            field :quantity_shipped,  :type => Integer, :default => 0
            field :quantity_returned, :type => Integer, :default => 0
            field :profitability,     :type => Float,   :default => 0.0
            field :trunkability,      :type => Float,   :default => 0.0
            field :positive_feedback, :type => Array,   :default => []
            field :negative_feedback, :type => Array,   :default => []
            ...
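  25. STREAMING ETL

          # product.rb cont’d
            ...
            # Find an existing product by name, or build a new one from the params.
            def self.init_with_params(params={})
              product = Product.where(:name => params[:name]).first
              product ||= Product.new(params)
            end

            # Recalculate the derived fields, then persist the document.
            def update_stats!
              calculate_stats
              save
            end
          end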
  26. APIS EVERYWHERE
      - Provide easy access to your data
      - Allow reuse of data in novel ways
      - Quickly build dashboards and data explorers
      - Use the Faceted gem to make building APIs easy
  27. LEVIATHAN
      - Records events from all applications
      - Subscribes to all message queues
      - Collects and displays real-time data
      - Browse, search, & drill-down interface
      - Longitudinal analysis with dynamic cohorts
  28. LEVIATHAN: EVENT MODEL

          class Event
            include Mongoid::Document
            include Mongoid::Timestamps

            field :label
            field :application
            field :details, :type => Hash, :default => {}

            # Persist one event with its free-form details hash.
            def self.record!(label, application, params={})
              Event.create(
                :label => label,
                :application => application,
                :details => params
              )
            end

            # Case-insensitive regex match on a single details key;
            # only the first key/value pair in criteria is used.
            def self.search_details(criteria={})
              where("details.#{criteria.keys.first}" => /#{criteria.values.first}/i)
            end
          end
  29. SUMMING UP
      - Real-time data is valuable and possible
      - Step outside of purely relational thinking
      - Use the technologies you’re most familiar with
  30. SUMMING UP
      - Wrap your data in easy-to-use APIs
      - Build micro-apps to deliver stakeholder-specific dashboards
      - Be agile, not enterprisey
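
Slides 18 and 19 describe statistical models as de-normalized collections of facts. A minimal plain-Ruby sketch of how such a profile might be derived from raw trunk records follows. It reuses field names from the `Performance` model, but the input shape (`items_sent`, `items_kept`, `value_kept`, `shipped_on`), the helper `build_performance_profile`, and every formula are assumptions for illustration, not Trunk Club's actual calculations.

```ruby
require 'date'

# Hypothetical helper: collapse a member's raw trunk records into one
# flat "facts" hash, similar in spirit to the Performance document.
# All formulas below are assumed for illustration.
def build_performance_profile(member_id, trunks)
  total_value = trunks.sum { |t| t[:value_kept] }.to_f
  items_sent  = trunks.sum { |t| t[:items_sent] }
  items_kept  = trunks.sum { |t| t[:items_kept] }

  {
    member_id: member_id,
    total_number_of_trunks: trunks.size,
    total_value: total_value,
    average_trunk_value: trunks.empty? ? 0.0 : total_value / trunks.size,
    # Assumed definition: share of shipped items the member kept.
    keep_rate: items_sent.zero? ? 0.0 : items_kept.to_f / items_sent,
    # Assumed definition: a member who has kept anything is a customer.
    is_customer: total_value.positive?,
    last_transaction_date: trunks.map { |t| t[:shipped_on] }.max
  }
end

trunks = [
  { shipped_on: Date.new(2013, 1, 15), items_sent: 10, items_kept: 4, value_kept: 400.0 },
  { shipped_on: Date.new(2013, 3, 2),  items_sent: 10, items_kept: 6, value_kept: 600.0 }
]

profile = build_performance_profile(42, trunks)
puts profile[:keep_rate]           # 0.5
puts profile[:average_trunk_value] # 500.0
```

Because each value is computed once and stored, a report can read it directly without joining across the transactional schema.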
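
Slides 20 through 25 outline the streaming ETL flow: an event arrives, a route maps it to a handler, and the handler upserts a product and recalculates its stats. The sketch below imitates that flow in plain Ruby so it runs without RabbitMQ or MongoDB. `ProductStats`, `UpdatesProductSketch`, `ROUTES`, and `dispatch` are hypothetical stand-ins, the in-memory `STORE` replaces the Mongoid collection, and the `keep_rate` formula is an assumption, since the deck does not show `Products::Calculations`.

```ruby
require 'json'

# In-memory stand-in for the Mongoid-backed Product model.
class ProductStats
  STORE = {} # name => ProductStats; hypothetical replacement for a collection

  attr_reader :name, :quantity_shipped, :quantity_returned, :keep_rate

  # Find an existing record by name or build a new one.
  def self.find_or_build(name)
    STORE[name] ||= new(name)
  end

  def initialize(name)
    @name = name
    @quantity_shipped = 0
    @quantity_returned = 0
    @keep_rate = 0.0
  end

  # Apply one event to the counters, then refresh derived values.
  def record(event)
    case event
    when :product_shipped  then @quantity_shipped += 1
    when :product_returned then @quantity_returned += 1
    end
    update_stats!
  end

  # Assumed formula: share of shipped units that were not returned.
  def update_stats!
    kept = @quantity_shipped - @quantity_returned
    @keep_rate = @quantity_shipped.zero? ? 0.0 : kept.to_f / @quantity_shipped
    self
  end
end

# Handler in the spirit of UpdatesProduct: parse JSON, upsert, recalculate.
class UpdatesProductSketch
  def self.with(event, json)
    params = JSON.parse(json, symbolize_names: true)
    ProductStats.find_or_build(params[:name]).record(event)
  end
end

# Hypothetical routing table playing the role of ListenerConfig.map.
ROUTES = {
  product_shipped: UpdatesProductSketch,
  product_returned: UpdatesProductSketch
}.freeze

def dispatch(event, json)
  handler = ROUTES.fetch(event) # raises KeyError for unrouted events
  handler.with(event, json)
end

payload = JSON.generate({ name: 'Oxford Shirt' })
4.times { dispatch(:product_shipped, payload) }
dispatch(:product_returned, payload)
puts ProductStats::STORE['Oxford Shirt'].keep_rate # 0.75
```

Keeping the calculation in `update_stats!` rather than in SQL or an ETL script is what makes it straightforward to unit test.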
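
The Leviathan event model on slide 28 records events with a free-form details hash and searches them by a case-insensitive pattern on one details key. A rough in-memory sketch of that behaviour is below. `EventSketch` and `EventLog` are hypothetical names, and the call to `Regexp.escape` is an addition not present on the slide, which interpolates the raw search value into the pattern.

```ruby
# Plain-Ruby stand-in for the Mongoid Event document.
EventSketch = Struct.new(:label, :application, :details)

class EventLog
  def initialize
    @events = []
  end

  # Store one event along with its free-form details hash.
  def record!(label, application, params = {})
    event = EventSketch.new(label, application, params)
    @events << event
    event
  end

  # Mirrors search_details: case-insensitive match on the first
  # criterion only. Regexp.escape is an addition in this sketch so
  # that characters such as "." or "(" are matched literally.
  def search_details(criteria = {})
    key, value = criteria.first
    return [] if key.nil?

    pattern = /#{Regexp.escape(value.to_s)}/i
    @events.select { |event| pattern.match?(event.details[key].to_s) }
  end
end

log = EventLog.new
log.record!('trunk_shipped', 'fulfillment', { stylist: 'Alex', city: 'Chicago' })
log.record!('trunk_returned', 'fulfillment', { stylist: 'Sam', city: 'chicago heights' })
log.record!('member_signed_up', 'onboarding', { city: 'Boston' })

puts log.search_details({ city: 'chicago' }).map(&:label).inspect
# ["trunk_shipped", "trunk_returned"]
```

Whether to escape the search value is a design choice: escaping treats input as a literal substring, while the slide's version lets callers pass regular expression syntax.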