Slide 1

Slide 1 text

Reflections on a year spent talking to Data Scientists about DevOps

Slide 2

Slide 2 text

Solutions Engineering isn’t Dev and it isn’t Ops...
- Industrial Research
- Business Management
- Human Resources
- Government Work
- Regulated Environments
- Big Data Applications
- Cloud Infrastructure
- R in Production
What is there to learn? What are the needs? What are the problems?
Solutions Engineers!

Slide 3

Slide 3 text

What are the problems?
1. Legitimacy
How do you get R recognized as an analytic standard? How do you make R a legitimate part of your organization and get the resources you need to support it?
In many organizations, R enters through the back door when analysts download the free software and install it on their local workstations… Some organizations struggle to standardize on R due to a lack of management and governance around open source software. At the same time, organizations may neglect R on user workstations, thereby increasing security, legal, and operational risks.
- Nathan Stephens, R Views 2016

Slide 4

Slide 4 text

What are the problems?

Slide 5

Slide 5 text

Starting a conversation can be challenging
Organizational learning and a safety culture (or lack thereof)
Chapter 4, The DevOps Handbook (2016), Kim, Debois, Willis, Humble
Westrum Organizational Typology Model: How Orgs Process Information

Slide 6

Slide 6 text

(super-quick) Introduction to DevOps

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

1. DevOps is a philosophy / set of practices
2. Which create new processes for collaboration between Dev and Ops teams
3. There’s nothing new in DevOps
A framework for making sense out of common sense

Slide 9

Slide 9 text

Healthy emphasis on introspection
Are we part of the problem?

Slide 10

Slide 10 text

Classic DevOps Silo Diagram
Dev Silo
IT/Ops Silo
Focus on THE FEAR
“Hey - could you just put this thing in production real quick?”
“Uh.. I just deployed this little change, and something might be broken”

Slide 11

Slide 11 text

SUPER-vicious cycle of mutual resentment and distrust
Data Science Silo
IT/Ops Silo
THE FEAR
“Hey - I wrote this code using a bunch of open source packages some random person from the internet created … Also, I built a Web App - is that cool?”

Slide 12

Slide 12 text

SUPER-vicious cycle of mutual resentment and distrust
Data Science Silo
IT/Ops Silo
THE FEAR
“Hey - I wrote this code using a bunch of open source packages some random person from the internet created … Also, I built a Web App - is that cool?”
A Credibility Crisis

Slide 13

Slide 13 text

Challenges for the R User
Organizational
● Legitimizing R
● Working with IT
Technical
● Experience
● Education
● Exposure
Credibility Crisis Management Plan

Slide 14

Slide 14 text

Strategies for Managing Code Handoffs
Steal Existing & Define Shared Goals

Slide 15

Slide 15 text

Code Quality and Performance
The “Hour-Long-Talk” of Data Products
- Rambling, Cluttered
- Parts that work well
- Parts that work not-so well
Local Development: EDA, Prototyping, Iteration
The “Lightning-Talk” of Data Products
- Targeted
- Elegant
- Streamlined
- Optimized
Production Development

Slide 16

Slide 16 text

Turn a Prototype into a Production Application
Performance Workflow
1. Use shinyloadtest to see if app is fast enough
2. If not, use profvis to see what’s making it slow
3. Optimize
   a. Move work out of shiny (very often)
   b. Make code faster (very often)
   c. Use caching (sometimes)
   d. Use async (occasionally)
4. Repeat!
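The "measure first, then cache" part of the workflow above can be sketched in a few lines. This is a language-agnostic illustration written in Python, not R, and it does not use shinyloadtest or profvis. The names `expensive_summary`, `cached_summary`, `timed`, and `CALLS` are hypothetical and exist only for this example.

```python
import time
from functools import lru_cache

# Counts how often the slow path really runs (illustration only).
CALLS = {"count": 0}


def expensive_summary(n: int) -> int:
    """Hypothetical stand-in for a slow computation inside an app."""
    CALLS["count"] += 1
    time.sleep(0.01)  # simulate slow work
    return sum(i * i for i in range(n))


@lru_cache(maxsize=128)
def cached_summary(n: int) -> int:
    """Same result as expensive_summary, but repeat inputs are served from memory."""
    return expensive_summary(n)


def timed(fn, *args):
    """Return (result, elapsed_seconds) so a slow step can be measured before optimizing."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Calling `timed(cached_summary, 1000)` twice should run the slow path only once; the second call is answered from the cache. Caching only helps when the same inputs recur, which is consistent with the slide listing it as a "sometimes" optimization.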

Slide 17

Slide 17 text

Start by answering some questions…
- What is a Shiny Application?
- Who is the audience?
- What is your service level agreement definition? (SLA)
- What does your analytic architecture look like today?
- What are your goals for evolving this architecture?
- How will monitoring be handled?
- Who is responsible for maintenance?
Make work visible, Define shared goals, Build a checklist, Iterate
Empathetic Communication is Challenging
What does ‘Production’ mean?
- Keep it up: unplanned outages are rare or nonexistent
- Keep it safe: data, functionality, and code are all kept safe from unauthorized users
- Keep it correct: works as intended, provides the right answers
- Keep it snappy: fast response times, ability to predict needed capacity for expected traffic

Slide 18

Slide 18 text

Shiny in Production Journey
- Code Profiling
- Version Control
- Testing
- Deployment/Release
- Access/Security
- Performance Tuning
Shared Goal: Shorten the distance between development and production
Shared Goal: The improvement of daily work
Shared Goal: Reduce the risk of deploying a breaking change
Testing! Automated Testing! Getting a Sandbox!

Slide 19

Slide 19 text

Shared Goal: Shorten the distance between development and production
ADVOCATE FOR A SANDBOX PUBLISHING ENVIRONMENT
A. Automated Snapshot Testing
B. User Acceptance Testing
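The idea behind automated snapshot testing can be sketched as follows: record an output once, then fail the check whenever a later run produces something different. This is a generic illustration in Python, not the Shiny tooling the talk has in mind. The function names and the JSON-file storage are assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path


def check_snapshot(name: str, value, snapshot_dir: Path) -> bool:
    """Return True if value matches the stored snapshot; record it on the first run."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    path = snapshot_dir / f"{name}.json"
    current = json.dumps(value, sort_keys=True, indent=2)
    if not path.exists():
        path.write_text(current)  # first run: save the baseline
        return True
    return path.read_text() == current


def demo() -> list:
    """Run three checks in a throwaway directory: baseline, unchanged, changed."""
    with tempfile.TemporaryDirectory() as tmp:
        snapshots = Path(tmp)
        return [
            check_snapshot("summary_table", {"rows": 10, "mean": 2.5}, snapshots),
            check_snapshot("summary_table", {"rows": 10, "mean": 2.5}, snapshots),
            check_snapshot("summary_table", {"rows": 10, "mean": 3.0}, snapshots),
        ]
```

The third check in `demo()` fails because the output changed after the baseline was recorded. In a sandbox environment, a failure like this is a prompt to review the change before it reaches production, not proof that the change is wrong.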

Slide 20

Slide 20 text

Learning Environments for Zero Dollars

Slide 21

Slide 21 text

● Deployment is any push of code to an environment (test, prod)
● Release is when that code (feature) is made available to users
Application-based release patterns vs. Environment-based release patterns
DevOps Learning: Decouple deployment from release
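One application-based way to decouple deployment from release is a feature toggle: the code ships to production switched off, and flipping the toggle is the release. The sketch below is a minimal illustration in Python under that assumption. `FeatureToggles`, `render_dashboard`, and the `forecast_panel` name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureToggles:
    """In-memory toggle store; a real one might read a config file or environment variable."""
    enabled: set = field(default_factory=set)

    def release(self, name: str) -> None:
        self.enabled.add(name)

    def is_released(self, name: str) -> bool:
        return name in self.enabled


def render_dashboard(toggles: FeatureToggles) -> list:
    """Return the panels a user would see, given the current toggle state."""
    panels = ["summary"]
    # The forecast code is deployed either way; users only see it once released.
    if toggles.is_released("forecast_panel"):
        panels.append("forecast")
    return panels
```

With a fresh `FeatureToggles()`, `render_dashboard` returns only the summary panel. After `release("forecast_panel")`, the same deployed code also returns the forecast panel, with no new deployment needed.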

Slide 22

Slide 22 text

The DevOps Handbook
Three principles form the underpinnings of DevOps:
1. Accelerate Flow
- Make work visible
- Limit Work in Progress (WIP)
- Reduce Batch Sizes
- Reduce the number of handoffs
- Continually identify and elevate constraints
- Eliminate hardships and waste
2. Utilize Feedback
- See problems as they occur
- Swarm to solve problems and build new knowledge
- Keep pushing quality closer to the source
- Enable optimizing for downstream work centers
3. Learn and Experiment
- Enable organizational learning and a safety culture
- Institutionalize the improvement of daily work
- Transform local discoveries into global improvements
- Inject resilience patterns into daily work

Slide 23

Slide 23 text

January
1. Shiny in Production Workshop
2. Configuration Management Tools for the R Admin
April
3. Championing Analytic Infrastructure
July
4. Art of the Feature Toggle
5. Environmental Release Patterns
August
6. Shiny in Production: Building bridges from data science to IT
September
7. Data Product Delivery: The R user’s journey toward improving daily work
8. The R in Production Handoff: Building bridges from data science to IT
October
9. Interactivity in Production
10. Is there a Future for DevOps?
speakerdeck.com/kellobri
solutions.rstudio.com
community.rstudio.com
#radmins

Slide 24

Slide 24 text

We’re hiring!

Slide 25

Slide 25 text

A conference for R users on March 28!
SatRday DC is a community-run event that focuses on the R statistical language. The goal of this conference is to bring together and inspire useRs located in the Washington metropolitan area. We encourage all talks on Data Science, Data Visualization, Data Engineering, working in data teams, data education, and anything relating to R.
dc2020.netlify.com