Data Science DC Meetup Talk - Jan. 2020

kellobri
January 15, 2020

Transcript

  1. Reflections on a year spent talking to Data Scientists about DevOps

  2. Solutions Engineering isn’t Dev and it isn’t Ops...
    Industrial Research
    Business Management
    Human Resources
    Government Work
    Regulated Environments
    Big Data Applications
    Cloud Infrastructure
    R in Production
    What is there to learn?
    What are the needs?
    What are the problems?
    Solutions Engineers!

  3. What are the problems?
    1. Legitimacy
    How do you get R recognized as an analytic standard?
    How do you make R a legitimate part of your organization and get the resources you need to support it?
    “In many organizations, R enters through the back door when analysts download the free software and install it on their local workstations…
    Some organizations struggle to standardize on R due to a lack of management and governance around open source software.
    At the same time, organizations may neglect R on user workstations, thereby increasing security, legal, and operational risks.”
    - Nathan Stephens, R Views 2016

  4. What are the problems?

  5. Starting a conversation can be challenging
    Organizational learning and a safety culture (or lack thereof)
    Chapter 4, The DevOps Handbook (2016), Kim, Debois, Willis, Humble
    Westrum Organizational Typology Model: How Orgs Process Information

  6. (super-quick) Introduction to DevOps

  8. 1. DevOps is a philosophy / set of practices
    2. Which create new processes for collaboration between Dev and Ops teams
    3. There’s nothing new in DevOps
    A framework for making sense out of common sense

  9. Healthy emphasis on introspection
    Are we part of the problem?

  10. Classic DevOps Silo Diagram
    Dev Silo | IT/Ops Silo
    Focus on THE FEAR
    “Hey - could you just put this thing in production real quick?”
    “Uh... I just deployed this little change, and something might be broken”

  11. SUPER-vicious cycle of mutual resentment and distrust
    Data Science Silo | IT/Ops Silo
    THE FEAR
    “Hey - I wrote this code using a bunch of open source packages some random person from the internet created …
    Also, I built a Web App - is that cool?”

  12. SUPER-vicious cycle of mutual resentment and distrust
    Data Science Silo | IT/Ops Silo
    THE FEAR
    “Hey - I wrote this code using a bunch of open source packages some random person from the internet created …
    Also, I built a Web App - is that cool?”
    A Credibility Crisis

  13. Challenges for the R User
    Organizational
    ● Legitimizing R
    ● Working with IT
    Technical
    ● Experience
    ● Education
    ● Exposure
    Credibility Crisis Management Plan

  14. Strategies for Managing Code Handoffs
    Steal Existing & Define Shared Goals

  15. Code Quality and Performance
    Local Development: EDA, Prototyping, Iteration
    The “Hour-Long-Talk” of Data Products
    - Rambling, Cluttered
    - Parts that work well
    - Parts that work not so well
    Production Development
    The “Lightning-Talk” of Data Products
    - Targeted
    - Elegant
    - Streamlined
    - Optimized

  16. Turn a Prototype into a Production Application
    Performance Workflow
    1. Use shinyloadtest to see if app is fast enough
    2. If not, use profvis to see what’s making it slow
    3. Optimize
    a. Move work out of shiny (very often)
    b. Make code faster (very often)
    c. Use caching (sometimes)
    d. Use async (occasionally)
    4. Repeat!

  17. Start by answering some questions…
    - What is a Shiny Application?
    - Who is the audience?
    - What is your service level agreement (SLA) definition?
    - What does your analytic architecture look like today?
    - What are your goals for evolving this architecture?
    - How will monitoring be handled?
    - Who is responsible for maintenance?
    Make work visible, Define shared goals, Build a checklist, Iterate
    Empathetic Communication is Challenging
    What does ‘Production’ mean?
    Keep it up: unplanned outages are rare or nonexistent
    Keep it safe: data, functionality, and code are all kept safe from unauthorized users
    Keep it correct: works as intended, provides the right answers
    Keep it snappy: fast response times, ability to predict needed capacity for expected traffic

  18. Shiny in Production Journey
    Code Profiling
    Version Control
    Testing
    Deployment/Release
    Access/Security
    Performance Tuning
    Shared Goal: Shorten the distance between development and production
    Shared Goal: The improvement of daily work
    Shared Goal: Reduce the risk of deploying a breaking change
    Testing!
    Automated Testing!
    Getting a Sandbox!

  19. Shared Goal: Shorten the distance between development and production
    ADVOCATE FOR A SANDBOX PUBLISHING ENVIRONMENT
    A. Automated Snapshot Testing
    B. User Acceptance Testing
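
The idea behind automated snapshot testing can be sketched in a few lines: record an application's output once, then compare every later run against that record. The example below is a generic Python illustration, not the API of any Shiny testing package. The helper `check_snapshot` and the `sales_table` data are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def check_snapshot(name: str, value, snapshot_dir: Path) -> str:
    """Record the value on first run, then compare later runs against it."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    path = snapshot_dir / f"{name}.json"
    # Serialize deterministically so equal values always produce equal text.
    current = json.dumps(value, sort_keys=True, indent=2)
    if not path.exists():
        path.write_text(current)
        return "recorded"
    return "match" if path.read_text() == current else "mismatch"


with tempfile.TemporaryDirectory() as tmp:
    snapshots = Path(tmp) / "snapshots"
    first = check_snapshot("sales_table", {"rows": 3, "total": 42}, snapshots)
    second = check_snapshot("sales_table", {"rows": 3, "total": 42}, snapshots)
    third = check_snapshot("sales_table", {"rows": 3, "total": 40}, snapshots)
    print(first, second, third)  # recorded match mismatch
```

A mismatch does not say whether the change was intended. That judgment is where user acceptance testing in a sandbox environment complements the automated check.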

  20. Learning Environments for Zero Dollars

  21. ● Deployment is any push of code to an environment (test, prod)
    ● Release is when that code (feature) is made available to users
    Application-based release patterns vs. Environment-based release patterns
    DevOps Learning: Decouple deployment from release
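
One application-based way to decouple deployment from release is a feature toggle: the new code ships to production but stays dark until a flag is switched on. The sketch below is a minimal Python illustration under assumed names. The `FEATURE_` environment-variable convention, `flag_enabled`, `render_dashboard`, and the panel names are all hypothetical.

```python
import os


def flag_enabled(name: str, env=None) -> bool:
    """Return True when the named toggle is switched on in the environment."""
    env = os.environ if env is None else env
    value = env.get(f"FEATURE_{name.upper()}", "off")
    return value.lower() in {"1", "on", "true"}


def render_dashboard(env=None) -> list:
    """Build the list of panels a user sees."""
    panels = ["summary_table"]
    if flag_enabled("forecast_panel", env):
        # This code path is already deployed; it is released only when the toggle is on.
        panels.append("forecast_panel")
    return panels


# Deployed but not released: users see only the existing panel.
print(render_dashboard({}))
# Released: the same deployed code now exposes the new panel.
print(render_dashboard({"FEATURE_FORECAST_PANEL": "on"}))
```

Turning the flag off again acts as a rollback that needs no redeploy, which is one reason to keep the two steps separate.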

  22. The DevOps Handbook
    Three principles form the underpinnings of DevOps:
    1. Accelerate Flow
    - Make work visible
    - Limit Work in Progress (WIP)
    - Reduce Batch Sizes
    - Reduce the number of handoffs
    - Continually identify and elevate constraints
    - Eliminate hardships and waste
    2. Utilize Feedback
    - See problems as they occur
    - Swarm to solve problems and build new knowledge
    - Keep pushing quality closer to the source
    - Enable optimizing for downstream work centers
    3. Learn and Experiment
    - Enable organizational learning and a safety culture
    - Institutionalize the improvement of daily work
    - Transform local discoveries into global improvements
    - Inject resilience patterns into daily work

  23. January
    1. Shiny in Production Workshop
    2. Configuration Management Tools for the R Admin
    April
    3. Championing Analytic Infrastructure
    July
    4. Art of the Feature Toggle
    5. Environmental Release Patterns
    August
    6. Shiny in Production: Building bridges from data science to IT
    September
    7. Data Product Delivery: The R user’s journey toward improving daily work
    8. The R in Production Handoff: Building bridges from data science to IT
    October
    9. Interactivity in Production
    10. Is there a Future for DevOps?
    speakerdeck.com/kellobri
    solutions.rstudio.com
    community.rstudio.com
    #radmins

  24. We’re hiring!

  25. A conference for R users on March 28!
    SatRday DC is a community-run event that focuses on the R statistical language.
    The goal of this conference is to bring together and inspire useRs located in the Washington metropolitan area.
    We encourage all talks on Data Science, Data Visualization, Data Engineering, working in data teams, data education, and anything relating to R.
    dc2020.netlify.com
