Data Science DC Meetup Talk - Jan. 2020

kellobri
January 15, 2020

Transcript

  1. Reflections on a year spent talking to Data Scientists about DevOps
  2. Solutions Engineering isn’t Dev and it isn’t Ops...

    Industrial Research Business Management Human Resources Government Work Regulated Environments Big Data Applications Cloud Infrastructure R in Production

    What is there to learn? What are the needs? What are the problems? Solutions Engineers!
  3. What are the problems? 1. Legitimacy

    How do you get R recognized as an analytic standard? How do you make R a legitimate part of your organization and get the resources you need to support it?

    In many organizations, R enters through the back door when analysts download the free software and install it on their local workstations… Some organizations struggle to standardize on R due to a lack of management and governance around open source software. At the same time, organizations may neglect R on user workstations, thereby increasing security, legal, and operational risks.

    - Nathan Stephens, R Views 2016
  4. What are the problems?

  5. Starting a conversation can be challenging

    Organizational learning and a safety culture (or lack thereof)
    Chapter 4, The DevOps Handbook (2016), Kim, Debois, Willis, Humble
    Westrum Organizational Typology Model: How Orgs Process Information
  6. (super-quick) Introduction to DevOps

  7. None
  8. 1. DevOps is a philosophy / set of practices
     2. Which create new processes for collaboration between Dev and Ops teams
     3. There’s nothing new in DevOps

    A framework for making sense out of common sense
  9. Healthy emphasis on introspection: Are we part of the problem?

  10. Classic DevOps Silo Diagram

    Dev Silo
    IT/Ops Silo
    Focus on THE FEAR
    “Hey - could you just put this thing in production real quick?”
    “Uh.. I just deployed this little change, and something might be broken”
  11. SUPER-vicious cycle of mutual resentment and distrust

    Data Science Silo
    IT/Ops Silo
    THE FEAR
    “Hey - I wrote this code using a bunch of open source packages some random person from the internet created … Also, I built a Web App - is that cool?”
  12. SUPER-vicious cycle of mutual resentment and distrust

    Data Science Silo
    IT/Ops Silo
    THE FEAR
    “Hey - I wrote this code using a bunch of open source packages some random person from the internet created … Also, I built a Web App - is that cool?”
    A Credibility Crisis
  13. Challenges for the R User

    Organizational
    • Legitimizing R
    • Working with IT

    Technical
    • Experience
    • Education
    • Exposure

    Credibility Crisis Management Plan
  14. Strategies for Managing Code Handoffs

    Steal Existing & Define Shared Goals
  15. Code Quality and Performance

    The “Hour-Long-Talk” of Data Products
    - Rambling, Cluttered
    - Parts that work well
    - Parts that work not-so well
    Local Development: EDA, Prototyping, Iteration

    The “Lightning-Talk” of Data Products
    - Targeted
    - Elegant
    - Streamlined
    - Optimized
    Production Development
  16. Turn a Prototype into a Production Application

    Performance Workflow
    1. Use shinyloadtest to see if app is fast enough
    2. If not, use profvis to see what’s making it slow
    3. Optimize
       a. Move work out of shiny (very often)
       b. Make code faster (very often)
       c. Use caching (sometimes)
       d. Use async (occasionally)
    4. Repeat!
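
The caching step in the workflow above (3c) can be illustrated outside of Shiny. The following is a minimal Python sketch of the general idea, not Shiny or R code and not something shown in the talk. The function `expensive_summary` and the counter `call_count` are hypothetical names invented for this illustration, and the function body is a placeholder for real slow work.

```python
from functools import lru_cache

# Counts how often the slow code path actually executes.
call_count = 0


@lru_cache(maxsize=128)
def expensive_summary(region: str) -> int:
    """Hypothetical slow computation; the body is only a placeholder."""
    global call_count
    call_count += 1
    return sum(ord(ch) for ch in region)


# The first call computes the value; the second identical call is
# answered from the cache without running the function body again.
first = expensive_summary("east")
second = expensive_summary("east")
```

Caching only pays off when the same inputs recur, which may be why the slide files it under "sometimes" rather than "very often".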
  17. Start by answering some questions…

    - What is a Shiny Application?
    - Who is the audience?
    - What is your service level agreement definition? (SLA)
    - What does your analytic architecture look like today?
    - What are your goals for evolving this architecture?
    - How will monitoring be handled?
    - Who is responsible for maintenance?

    Make work visible, Define shared goals, Build a checklist, Iterate

    Empathetic Communication is Challenging

    What does ‘Production’ mean?
    - Keep it up: unplanned outages are rare or nonexistent
    - Keep it safe: data, functionality, and code are all kept safe from unauthorized users
    - Keep it correct: works as intended, provides the right answers
    - Keep it snappy: fast response times, ability to predict needed capacity for expected traffic
  18. Shiny in Production Journey

    Code Profiling Version Control Testing Deployment/Release Access/Security Performance Tuning

    Shared Goal: Shorten the distance between development and production
    Shared Goal: The improvement of daily work
    Shared Goal: Reduce the risk of deploying a breaking change

    Testing! Automated Testing! Getting a Sandbox!
  19. Shared Goal: Shorten the distance between development and production

    ADVOCATE FOR A SANDBOX PUBLISHING ENVIRONMENT
    A. Automated Snapshot Testing
    B. User Acceptance Testing
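
Automated snapshot testing, named on the slide above, generally means recording an application's output once and comparing later runs against that record. The following is a minimal Python sketch of that idea under stated assumptions. It is not the API of any Shiny testing package, and `render_summary` and `check_snapshot` are hypothetical helpers written for this illustration.

```python
import json
import tempfile
from pathlib import Path


def render_summary(values):
    """Hypothetical app output; stands in for whatever the application renders."""
    return {"n": len(values), "total": sum(values), "max": max(values)}


def check_snapshot(name, output, snapshot_dir):
    """Record the output on the first run, then compare on later runs."""
    path = Path(snapshot_dir) / f"{name}.json"
    serialized = json.dumps(output, sort_keys=True, indent=2)
    if not path.exists():
        path.write_text(serialized)
        return "recorded"
    return "match" if path.read_text() == serialized else "mismatch"


# A temporary directory stands in for a snapshot folder kept in version control.
snapshot_dir = tempfile.mkdtemp()
first_run = check_snapshot("summary", render_summary([1, 2, 3]), snapshot_dir)
second_run = check_snapshot("summary", render_summary([1, 2, 3]), snapshot_dir)
changed_run = check_snapshot("summary", render_summary([1, 2, 4]), snapshot_dir)
```

A mismatch does not always mean a bug. It can also flag an intended change, in which case the stored snapshot is re-recorded after review.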
  20. Learning Environments for Zero Dollars

  21. • Deployment is any push of code to an environment (test, prod)
      • Release is when that code (feature) is made available to users

    Application-based release patterns vs. Environment-based release patterns

    DevOps Learning: Decouple deployment from release
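
One common application-based way to decouple deployment from release is a feature toggle: the code ships to production but stays hidden until a switch is flipped. The following is a minimal Python sketch of that pattern, not code from the talk. The `FEATURE_` environment-variable convention and the names `feature_enabled`, `render_dashboard`, and `new_plot` are assumptions made for this illustration.

```python
import os


def feature_enabled(name, environ=os.environ):
    """Read a toggle from an environment variable such as FEATURE_NEW_PLOT=1."""
    return environ.get(f"FEATURE_{name.upper()}", "0") == "1"


def render_dashboard(environ=os.environ):
    """Return the panels visible to users; the new panel sits behind a toggle."""
    panels = ["summary_table"]
    if feature_enabled("new_plot", environ):
        panels.append("new_plot")
    return panels


# The same deployed code, first with the feature unreleased, then released.
released_off = render_dashboard({})
released_on = render_dashboard({"FEATURE_NEW_PLOT": "1"})
```

Passing the environment in as an argument keeps the sketch easy to test. A real application would more likely read the toggle from its deployment configuration.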
  22. The DevOps Handbook

    Three principles form the underpinnings of DevOps:

    1. Accelerate Flow
       - Make work visible
       - Limit Work in Progress (WIP)
       - Reduce Batch Sizes
       - Reduce the number of handoffs
       - Continually identify and elevate constraints
       - Eliminate hardships and waste
    2. Utilize Feedback
       - See problems as they occur
       - Swarm to solve problems and build new knowledge
       - Keep pushing quality closer to the source
       - Enable optimizing for downstream work centers
    3. Learn and Experiment
       - Enable organizational learning and a safety culture
       - Institutionalize the improvement of daily work
       - Transform local discoveries into global improvements
       - Inject resilience patterns into daily work
  23. January
    1. Shiny in Production Workshop
    2. Configuration Management Tools for the R Admin

    April
    3. Championing Analytic Infrastructure

    July
    4. Art of the Feature Toggle
    5. Environmental Release Patterns

    August
    6. Shiny in Production: Building bridges from data science to IT

    September
    7. Data Product Delivery: The R user’s journey toward improving daily work
    8. The R in Production Handoff: Building bridges from data science to IT

    October
    9. Interactivity in Production
    10. Is there a Future for DevOps?

    speakerdeck.com/kellobri
    solutions.rstudio.com
    community.rstudio.com #radmins
  24. We’re hiring!

  25. A conference for R users on March 28!

    SatRday DC is a community-run event that focuses on the R statistical language. The goal of this conference is to bring together and inspire useRs located in the Washington metropolitan area. We encourage all talks on Data Science, Data Visualization, Data Engineering, working in data teams, data education, and anything relating to R.

    dc2020.netlify.com