Building Data Products at Home: Automating Philips Hue Lights

Today’s smart devices come with full-fledged APIs, driving the proliferation of micro-services as apps. Philips Hue lights are no exception. With the coming of age of open source libraries for data science, building data products at home has become easier than ever. Not only can we automate away repetitive tasks, we also get to experiment freely with the tools we use: a comfortable way to learn new tricks.

These are the slides of the talk I gave at the 2017 Budapest BI Conference on how I built a small system to control my Philips Hue lights at home, and what I learned in the process. The brains of the project are written in R, while more general programming tasks such as interfacing with APIs and workflow management are done in Python.

Tamas Szilagyi

November 20, 2017

Transcript

  1. About me • Senior Data Analyst at in Amsterdam • Avid user • Blog on hobby data projects: http://tamaszilagyi.com/
  2. What are data products? “Data products automate complex analysis tasks or use technology to expand the utility of a data informed model, algorithm or inference.” (Coursera: Developing Data Products)
  3. What are data products? • You have most likely developed or worked on data products.
  4. What are data products? • You have most likely developed or worked on data products. • You have certainly interacted with them as an end user.
  5. Doing them at home. Pros: • Both creator and end user • Experiment with tools
  6. Doing them at home. Pros: • Both creator and end user • Experiment with tools • No deadline
  7. Doing them at home. Pros: • Both creator and end user • Experiment with tools • No deadline. Cons?
  8. Doing them at home. Pros: • Both creator and end user • Experiment with tools • No deadline. Cons?
  9. R and Python …wait, what? Most discussion focuses on which one is better. Why not use both? They complement each other.
  10. Case study: Hue lights. Automating away the light switch. Diagram labels: Merge daily files • Train model • Save .json daily • Expose model as API • PUT request • GET request
  11. Case study: Hue lights. Automating away the light switch. 1. Log, clean and save data 2. Explore, train model 3. Communicate with bridge
  12. Case study: Hue lights. Automating away the light switch. 1. Log, clean and save data 2. Explore, train model 3. Communicate with bridge
  13. Case study: Hue lights. Automating away the light switch. 1. Log, clean and save data 2. Explore, train model 3. Communicate with bridge
  14. Workflow • Prototype on laptop • Push to git • Deploy on Pi
  15. Log, clean and save data • Python is more mature when it comes to general purpose programming • The glue is the workflow management tool Luigi • Tasks are executed in order of their dependencies
  16. Log, clean and save data. 1. Send a GET request and save the response to a new local file every day.
  17. Log, clean and save data. 1. Send a GET request and save the response to a new local file every day. 2. Copy data to S3 for long-term storage.
  18. Log, clean and save data. 1. Send a GET request and save the response to a new local file every day. 2. Copy data to S3 for long-term storage. 3. Read full data, train and save model.
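The logging step on the slides above could look roughly like the following. This is a minimal sketch, not the code from the talk: the function names, the `lights_<date>.json` file naming and the one-JSON-document-per-line layout are assumptions made for illustration. The URL follows the Hue bridge's local `/api/<username>/lights` pattern, with the bridge address and username supplied by the caller.

```python
import json
import urllib.request
from datetime import date
from pathlib import Path


def fetch_lights(bridge_ip, username):
    """GET the current state of all lights from the bridge."""
    url = f"http://{bridge_ip}/api/{username}/lights"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def save_daily_snapshot(lights, log_dir, day=None):
    """Append one snapshot to that day's file, one JSON document per line."""
    day = day or date.today()
    # Assumed naming scheme: ISO dates sort chronologically by file name.
    path = Path(log_dir) / f"lights_{day.isoformat()}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(lights) + "\n")
    return path


def read_all_snapshots(log_dir):
    """Merge every daily file back into a single list, oldest day first."""
    snapshots = []
    for path in sorted(Path(log_dir).glob("lights_*.json")):
        with path.open(encoding="utf-8") as handle:
            snapshots.extend(json.loads(line) for line in handle if line.strip())
    return snapshots
```

A scheduled job would call `save_daily_snapshot(fetch_lights(ip, user), "logs")` at a fixed interval, and the training step would start from `read_all_snapshots("logs")`.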
  19. Log, clean and save data. An example of a Luigi task. This way we can chain together all the tasks of our pipeline. One task’s output goes into the next one’s requires() method.
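The code shown on this slide is not part of the transcript. As a rough illustration of the chaining idea, here is a standard-library sketch of the `requires()` / `output()` / `run()` pattern that Luigi tasks follow. It deliberately does not import Luigi: the `Task` base class, the `build` helper and both example tasks are hypothetical stand-ins, and a local `archive` folder takes the place of S3.

```python
import shutil
from pathlib import Path


class Task:
    """Minimal stand-in for a workflow task (illustrative, not Luigi's class)."""

    def __init__(self, workdir):
        self.workdir = Path(workdir)

    def requires(self):
        return []  # upstream tasks that must be complete first

    def output(self):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        # A task counts as done once its output file exists.
        return self.output().exists()


class SaveDailyLog(Task):
    def output(self):
        return self.workdir / "daily_log.json"

    def run(self):
        # Placeholder payload; a real task would store the bridge response.
        self.output().write_text('{"bri": 120}\n', encoding="utf-8")


class ArchiveLog(Task):
    def requires(self):
        return [SaveDailyLog(self.workdir)]

    def output(self):
        return self.workdir / "archive" / "daily_log.json"

    def run(self):
        # The upstream task's output is this task's input.
        source = self.requires()[0].output()
        self.output().parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(source, self.output())


def build(task):
    """Run upstream tasks first, skip finished ones, return names that ran."""
    ran = []
    for upstream in task.requires():
        ran.extend(build(upstream))
    if not task.complete():
        task.run()
        ran.append(type(task).__name__)
    return ran
```

Calling `build(ArchiveLog("some_dir"))` runs `SaveDailyLog` before `ArchiveLog`; a second call does nothing because both outputs already exist, which is the property that makes such pipelines safe to re-run.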
  20. Explore data, train model. This is where R really shines: • Interactive use • Stats
  21. Explore data, train model. Potential caveats: • Unbalanced dataset • We have a time series
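The slide only names the two caveats. One common way to respect them, sketched here in Python for consistency with the other examples (the talk does its modelling in R, and nothing below is taken from it), is to split the data by time rather than at random and to weight the rare class more heavily. The `timestamp` field name and both helper functions are assumptions.

```python
from collections import Counter


def chronological_split(rows, train_fraction=0.8):
    """Train on the earliest rows and test on the latest, to avoid leakage."""
    ordered = sorted(rows, key=lambda row: row["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: total / (len(counts) * n) for label, n in counts.items()}
```

With eight "off" observations and two "on" observations, `class_weights` gives the "on" class four times the weight of the "off" class, so mistakes on the rare state cost more during training.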
  22. Add model training as a task to the Luigi pipeline? Package up the R code as a script and use subprocess.
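Calling an R script from Python with `subprocess` might look like this. It is a sketch under assumptions: the script name `train_model.R` and its arguments are hypothetical, and `Rscript` must be on the `PATH` of the machine running the pipeline. The `runner` parameter exists only so the call can be exercised without R installed.

```python
import subprocess


def build_rscript_command(script_path, *args):
    """Assemble the command line; --vanilla skips saved workspaces and profiles."""
    return ["Rscript", "--vanilla", str(script_path), *map(str, args)]


def run_r_script(script_path, *args, runner=subprocess.run):
    """Run the R script and raise CalledProcessError if it exits with an error."""
    command = build_rscript_command(script_path, *args)
    return runner(command, check=True, capture_output=True, text=True)
```

Inside a pipeline task's `run()` method this reduces to a single call such as `run_r_script("train_model.R", "data/")`. Because of `check=True`, a failing R script makes the task fail instead of silently producing no model.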
  23. Communicate with Bridge. Custom predict function: 1. Read last model from S3 2. Expose model as an API on Pi 3. GET prediction, PUT brightness value
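The last step, sending the predicted brightness to the bridge, can be sketched as below. The function names are illustrative and the prediction is assumed to arrive as a plain number from the model API. The request targets the bridge's `/lights/<id>/state` resource with a JSON body, and the value is clamped to the 1 to 254 range that the `bri` attribute accepts.

```python
import json
import urllib.request


def build_brightness_request(bridge_ip, username, light_id, brightness):
    """Build a PUT request that switches the light on at the given brightness."""
    bri = max(1, min(254, int(round(brightness))))  # clamp to the valid range
    url = f"http://{bridge_ip}/api/{username}/lights/{light_id}/state"
    body = json.dumps({"on": True, "bri": bri}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def set_brightness(bridge_ip, username, light_id, brightness):
    """Send the request and return the bridge's JSON reply."""
    request = build_brightness_request(bridge_ip, username, light_id, brightness)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to check the URL and payload without a bridge on the network.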
  24. Lessons learned • There is something deeply satisfying about seeing the model in action • Try out new tech on something other than the iris dataset
  25. Lessons learned • There is something deeply satisfying about seeing the model in action • Try out new tech on something other than the iris dataset • Compiling Rcpp on Raspberry Pi takes forever