R *is* Production Safe

sellorm
November 15, 2016

For those of us within the R community, whether R is production safe almost seems like a silly question, but in my experience of working with people outside the community, it can be a very real fear. In a sense, this talk is aimed more at those for whom R is a strange and exotic beast to be feared, but hopefully you’ll get something from it too ;)

Transcript

  1. R is Production-Safe

    Mark Sellors – Head of Data Engineering [email protected]
    Mark Sellors, Head of Data Engineering - Mango Solutions
  2. About Me

    • Head of Data Engineering
    • Run Mango Data Labs
    • Architecture
    • Automation
    • Industrialisation
    • DevOps/DataOps
    • @sellorm
    • dsradar.com
  3. What is production?

    • Many different things to many people
    • Could be:
    • A web-scale, public-facing server environment
    • Could just be a simple script
    • In general it’s just something that gets run more than once, and is important to your organisation
  4. Things We Know About R…

    • Mature
    • Well documented
    • Well understood
    • Commercial versions and/or support
    • Dynamic – fast moving ecosystem
    • Amazing community
  5. Things other people think about R…

    • Weird syntax
    • Not many users
    • Only used by statisticians and Data Scientists
    • Performance problems
    • Not very versatile
    • No good for large data sets
  6. Some myths about R…

    • Weird syntax
    • Not many users
    • Only used by statisticians and Data Scientists
    • Performance problems
    • Not very versatile
    • No good for large data sets
  7. Disconnect, what disconnect?

    • We all know that R is great
    • “All the people I follow on Twitter are using R in production”
    • There are conferences, like this one (!), where even more people are using R
  8. Why the disconnect?

    • Great communities can also create powerful feedback loops and echo chambers
    • R’s domain-specific nature makes it somewhat unapproachable for many outside the community
    • Fear – A thing that I don’t understand must surely be a bad thing
  9. How do we combat fear?

    • Education
    • Outreach
    • Collaboration
    • Empathy
  10. Allianz

    • What: Insurance claim scoring
    • How: APIs in R
    • Using: Rserve, Java
  11. Worldpay

    • What: Call Centre CRM tool and x-sell pipeline
    • How: Web app written in R
    • Using: Shiny
  12. ONS

    • What: Migrate from SAS
    • How: Migrate SAS-based services to R
    • Using: Plumber & command line
  13. Hedge Fund

    • What: Batch-based scoring system
    • How: Sophisticated command line application
    • Using: R and various databases
  14. It’s not that hard!

    • Collaboration
    • Thinking about what they’re doing
    • Implementing more formal methodologies
    • Rigour around processes
    • Well managed release strategy
    • Isolated environments
  15. How much rigour do I need?

    This all sounds like a lot of effort
  16. Rigour investment checklist

    • What’s the impact of what you’re doing?
    • Who’s affected if it’s not working?
    • What is the cost of an outage to the business?
    • Why is the new solution even needed?
    • Where will the code be run?
    • When will it be run?
  17. Why does any of this matter?

    • Reduce time to insight
    • Lower costs
    • Improved quality
  18. Don’t despair, there is hope

    • Find a sympathetic person on the ops side
    • Learn to do it yourself
    • Start small
    • Amaze people!
  19. > summary(talk)

    • You can use R in production environments
    • If you think about what you’re doing
    • and approach the task with care
    • Find sympathetic collaborators
    • If you are doing it already, tell people about it
    • If you aren’t, what are you waiting for?