Opower(ing) Energy Efficiency

Elastic Co
October 06, 2015

The challenge and success of using Elasticsearch to provide data visualization and targeting for personalizing utility customer messaging intended to maximize energy savings.

Transcript

  1. What we do at Opower
    •  Motivate and enable a sustainable energy future!
    •  Reduce energy usage by working with utilities to inform their customers about their energy usage and how to reduce it.
    •  We send out reports, various alerts, and provide a web presence.
    •  Since we started we have saved ~6 terawatt-hours!
  2. Segmentation & Targeting at Opower
    •  We provide the tools that allow our users to
      ▪  Explore our customer population
      ▪  Break it down into smaller populations
      ▪  Customize the experience for those customers
      ▪  Do it all visually without deep technical expertise in our datastore
  3. Engineering is people!
    Joanna Kochaniak, Jan Rubio, Nayyara Samuel, Jamie Swogger, Salman Suhail, Nicholas Grippin, Franklin Zheng, Ravjot Pasricha, Anton Vattay, Ben Siemon
  4. The Problem
    •  Unique Customers
      ▪  Optimize energy savings
      ▪  Many attributes
    •  Utilities Requirements
      ▪  Respect their structure
    •  Prove our Savings
      ▪  Randomized Control Tests
      ▪  Verified by 3rd parties
    •  Customer
      ▪  Recipient
      ▪  Did not opt out
      ▪  Active Account
      ▪  Is a home owner
    •  Has a larger house
    •  Content: Improve Insulation, New Water Heater, etc.
  5. The Old Way
    •  Datastores
      ▪  MySQL cluster
      ▪  HDFS
    •  Ad-hoc selection
    •  Hours of execution
    •  Hard to generate
    •  No Attribute Library
    •  select * from customer
      ▪  inner join account…
      ▪  inner join service point
      ▪  inner join preferences
      ▪  inner join blah…
      ▪  left join report
      ▪  where blah is null
      ▪  etc…
    •  Come back in a few hours
  6. How do we solve this?
    •  Use a search index!
      ▪  Provide a place to fuse our data
      ▪  Query data quickly, on the order of seconds
    •  Create a DSL to model the problem
    •  Create a UI to visualize it
    •  Why Elasticsearch?
      ▪  Easiest to set up
      ▪  Extensive documentation
      ▪  Good scaling story
  7. How did we get there?
    •  How do we get our data into Elasticsearch?
    •  How do we support validating energy savings?
    •  How do we represent and enforce our data schema?
    •  How do we represent belonging to some group?
    •  How do we represent the document hierarchy?
    •  How does this DSL look?
    •  How does the UI work?
    •  How does Elasticsearch scale with our needs?
  8. Data Import / Refresh
    •  Guiding principles
      ▪  Idempotent
      ▪  Disposable
      ▪  Create don’t update
      ▪  Concurrency (r/w speed)
    •  Map Reduce and batch
      ▪  Read elementary components
      ▪  Compose into hierarchy
      ▪  (Over)Write to ES
      ▪  Can also do minor transforms
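The refresh principles above (idempotent, create rather than update, overwrite on write) can be sketched in a few lines. This is only an illustration, not Opower's pipeline: the in-memory `index` dict stands in for an Elasticsearch index, and `compose_customer`, the field names, and the `customer-<id>` id scheme are all hypothetical.

```python
from typing import Dict, List


def compose_customer(customer: dict, accounts: List[dict]) -> dict:
    """Compose elementary records into one hierarchical document."""
    return {
        "customer_id": customer["id"],
        "accounts": sorted(
            (a for a in accounts if a["customer_id"] == customer["id"]),
            key=lambda a: a["id"],
        ),
    }


def refresh(index: Dict[str, dict], customers: List[dict], accounts: List[dict]) -> None:
    """Write every document under a deterministic id, replacing any old copy."""
    for customer in customers:
        doc = compose_customer(customer, accounts)
        # Full overwrite keyed by a stable id (hypothetical scheme):
        # re-running the import never duplicates or partially updates a document.
        index[f"customer-{doc['customer_id']}"] = doc


customers = [{"id": 1}, {"id": 2}]
accounts = [
    {"id": 10, "customer_id": 1, "type": "ELEC"},
    {"id": 11, "customer_id": 1, "type": "GAS"},
    {"id": 12, "customer_id": 2, "type": "ELEC"},
]

index: Dict[str, dict] = {}  # stand-in for an Elasticsearch index
refresh(index, customers, accounts)
first_pass = {key: dict(doc) for key, doc in index.items()}
refresh(index, customers, accounts)  # a second run leaves the index unchanged
```

Because each write replaces the whole document under the same id, a failed or repeated batch can simply be run again, which is what makes the index disposable.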
  9. Balanced Random Population Splits
    •  Start with balanced population
      ▪  Measuring savings is easier at the end.
      ▪  Must balance the variance of many attributes
    •  What is “good enough” balance?
    •  Solution
      ▪  Run many random splits (thousands)
      ▪  Index candidates (lists of customer ids)
      ▪  Query Elasticsearch for variance of each candidate
      ▪  Choose the best balanced candidate
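The "run many random splits and keep the best balanced one" idea can be sketched as follows. The slide describes querying Elasticsearch for the variance of each candidate; this sketch swaps that for local statistics so it runs on its own. The `imbalance` score, the attribute names, and the sample data are assumptions made for illustration, and attributes on very different scales would need normalizing first.

```python
import random
from statistics import mean, pvariance
from typing import Dict, List, Tuple

Customer = Dict[str, float]


def imbalance(a: List[Customer], b: List[Customer], attrs: List[str]) -> float:
    """Sum of absolute differences in mean and variance over all attributes."""
    score = 0.0
    for attr in attrs:
        xa = [c[attr] for c in a]
        xb = [c[attr] for c in b]
        score += abs(mean(xa) - mean(xb)) + abs(pvariance(xa) - pvariance(xb))
    return score


def best_split(
    customers: List[Customer], attrs: List[str], tries: int = 1000, seed: int = 0
) -> Tuple[Tuple[List[Customer], List[Customer]], float]:
    """Try many random halvings and return the one with the lowest imbalance."""
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(tries):
        shuffled = customers[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        a, b = shuffled[:half], shuffled[half:]
        score = imbalance(a, b, attrs)
        if score < best_score:
            best, best_score = (a, b), score
    return best, best_score


# Hypothetical population with two made-up attributes.
data_rng = random.Random(42)
customers = [
    {"id": i, "usage_kwh": data_rng.uniform(200, 1500), "house_sqft": data_rng.uniform(600, 4000)}
    for i in range(40)
]
attrs = ["usage_kwh", "house_sqft"]
(treatment, control), score = best_split(customers, attrs)
```

What counts as "good enough" is still a judgment call; the sketch only ranks candidates, it does not set a threshold.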
  10. Document Schema
    •  Guiding Principles
      ▪  Idempotent refresh
      ▪  Immutable / Overwrite Only
      ▪  Mapping not canonical
    •  Class + Annotations
      ▪  Document hierarchy
      ▪  Attribute metadata
  11. Membership Representation
    •  As a basic field, required customer re-write
      ▪  Changes often
    •  Documents of customer ids
      ▪  Terms Lookup Filter was slow
    •  “Marker Document”
      ▪  Child document of Customer
      ▪  Tiny data-less document
      ▪  Independent update and query
    (Diagram: Group 1 = 1, 2, 5, 8, 9, …, n; Customer1 with child Group 1)
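A marker document lookup can be expressed with Elasticsearch's `has_child` query, which matches parents that have at least one matching child. The sketch below only builds the request body as a Python dict. The child type `group_marker` and the field `group_id` are hypothetical names, and the talk does not show the exact query Opower used.

```python
import json


def marker_document(group_id: str) -> dict:
    """Body of a tiny marker document; it would be indexed as a child of a customer."""
    return {"group_id": group_id}


def members_of(group_id: str, child_type: str = "group_marker") -> dict:
    """Search body matching customers that have a marker child for the group."""
    return {
        "query": {
            "has_child": {
                "type": child_type,  # hypothetical child type name
                "query": {"term": {"group_id": group_id}},
            }
        }
    }


body = members_of("group-1")
print(json.dumps(body, indent=2))
```

Since the marker is its own document, adding or removing a customer from a group touches only the marker and never rewrites the customer document.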
  12. Document/Attribute Hierarchy
    •  All queries are relative to a root “Customer” object
    •  Mix of child and nested documents
      ▪  Children for independent refresh
    •  Remaining challenge: Querying
      ▪  Want: Active/ELEC
      ▪  Get: One active GAS, one inactive ELEC
    (Diagram: Customer with two Utility Accounts, one Active? F / Type: ELEC, one Active? T / Type: GAS)
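The querying challenge above is a cross-object matching problem: two conditions are each satisfied, but by different utility accounts. The sketch below reproduces it in plain Python with the customer from the slide's diagram; the function names and document layout are illustrative only. In Elasticsearch, the `nested` query is the usual way to require that all conditions hold on the same nested object.

```python
customer = {
    "id": 1,
    "accounts": [
        {"active": False, "type": "ELEC"},
        {"active": True, "type": "GAS"},
    ],
}


def matches_flattened(doc: dict, active: bool, type_: str) -> bool:
    """Each condition may be satisfied by a different account (the unwanted match)."""
    return any(a["active"] == active for a in doc["accounts"]) and any(
        a["type"] == type_ for a in doc["accounts"]
    )


def matches_same_account(doc: dict, active: bool, type_: str) -> bool:
    """Both conditions must hold on one and the same account (the intended match)."""
    return any(a["active"] == active and a["type"] == type_ for a in doc["accounts"])


wrongly_matched = matches_flattened(customer, True, "ELEC")
correctly_rejected = not matches_same_account(customer, True, "ELEC")
```

The customer has no active electric account, so only the per-account check gives the answer the user wants.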
  13. DSL Design (SRL)
    •  Guiding Principles
      ▪  Model a dataflow
      ▪  Encapsulate against future backwards incompatible changes.
        o  Have done multiple Elasticsearch upgrades
      ▪  Easy to generate
    •  Each population (node in the tree)
      ▪  Generate Elasticsearch filter to get counts or aggregations.
    •  Function
      ▪  Split on Attribute
      ▪  Balanced Random Split
      ▪  Merges
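One way to picture a population tree whose nodes each generate a filter is sketched below. The talk does not show SRL's syntax, so the `Population` class, `split_on`, and the attribute names are all hypothetical. The generated filter uses the `bool`/`filter` form from later Elasticsearch versions, which is not necessarily what the 2015 system emitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Population:
    name: str
    clause: Optional[dict] = None  # condition introduced at this node
    parent: Optional["Population"] = field(default=None, repr=False)
    children: List["Population"] = field(default_factory=list, repr=False)

    def split_on(self, attribute: str, values: List[str]) -> List["Population"]:
        """Create one child population per attribute value."""
        kids = [
            Population(f"{self.name}/{attribute}={v}", {"term": {attribute: v}}, self)
            for v in values
        ]
        self.children.extend(kids)
        return kids

    def to_filter(self) -> dict:
        """Combine this node's clause with every ancestor's clause, root first."""
        clauses = []
        node: Optional["Population"] = self
        while node is not None:
            if node.clause is not None:
                clauses.append(node.clause)
            node = node.parent
        clauses.reverse()
        return {"bool": {"filter": clauses}} if clauses else {"match_all": {}}


root = Population("all")
owners, renters = root.split_on("home_owner", ["true", "false"])
elec, gas = owners.split_on("fuel_type", ["ELEC", "GAS"])
elec_filter = elec.to_filter()
```

Keeping filter generation inside the node means an Elasticsearch upgrade only changes `to_filter`, not the stored tree, which is the kind of encapsulation the guiding principles call for.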
  14. Scaling Elasticsearch
    •  Started with 0.90.7
      ▪  Upgrade often.
      ▪  Metrics
    •  Some war stories
      ▪  Child document id cache (< 1.3.2)
        o  The summer of manual cache clears
      ▪  The incredibly slow terms lookup query
      ▪  An endless cycle of garbage collection
      ▪  An incredibly deep and foreboding search queue and UX implications
      ▪  Aliasing across all indexes can cause a field cache overflow
  15. It all comes together
    •  Elasticsearch is really critical in bringing our vision to life
      ▪  Currently used for selecting almost every utility with Opower
      ▪  Used daily by non-technical users
      ▪  Enabled difficult selections
      ▪  Met cost saving goals for operations
      ▪  Deliver unique campaigns to our customers
      ▪  Saved energy!
    •  It’s been two years
      ▪  Planning on 2.0 upgrade
      ▪  Long term core technology