Slide 1

Slide 1 text

Constructing Open Source SDKs for Ops Teams with REST and GraphQL
Chris Wahl

Slide 2

Slide 2 text

Chris Wahl
❖ Chief Technologist @ Rubrik
❖ Author of Networking for VMware Administrators
❖ Open Source Enabler at Rubrik Build
❖ he/him
Twitter: @ChrisWahl
GitHub: chriswahl
LinkedIn: /wahlchris
Blog: Wahl Network

Slide 3

Slide 3 text

@ChrisWahl | #DevWeek2019 3 https://twitter.com/AxolotlCure/status/1136284938830045184

Slide 4

Slide 4 text

This is a story about toil
And a lot of learning through triumphs and mistakes

Slide 5

Slide 5 text

“The kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” - Toil

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Life of an operator
• At the end of the release cycle
• “Here’s a thing, make it work, keep it working”
• A myriad of systems to understand and maintain while short-staffed

Slide 8

Slide 8 text

“I need a one-liner or script to accomplish this task so I can copy and paste it into my environment, solve my problem, and get back to putting out a hundred other fires.” - Systems Administrators

Slide 9

Slide 9 text

Abuse from Crude Tools
Tools like AutoIt
• Script GUI actions using a DSL
• The ultimate “sad panda”

Slide 10

Slide 10 text

Key Ingredients
• RESTful API
• Operator Audience
• Free Time
• SDK

Slide 11

Slide 11 text

Initial Research
• Our audience preferred Microsoft PowerShell
• Auto-generating the SDK produced ugly results
• Our Swagger specification was non-standard
• Decided to craft a bespoke SDK

Slide 12

Slide 12 text

The Mission
• Give operators a familiar tool to manage our product and remove toil
• Use my background as an operator to control the UX
• Selfishly: learn how to build an SDK

Slide 13

Slide 13 text

Project Plan
• Everything in GitHub as an open source project
• MIT licensing (Legal)
• One project per repository
• Official product support for projects
• Unit tests for new features
• External CI: AppVeyor, Azure Pipelines
• Internal CI: CircleCI
• Integration of Jira and GitHub via Zapier

Slide 14

Slide 14 text

People use this thing?
The mysterious tale of unloved APIs

Slide 15

Slide 15 text

Our API’s Original Purpose
• Let distributed systems chat with each other
• Supply the GUI with an interface

Slide 16

Slide 16 text

This created friction
• There were no API versions
• Breaking changes were normal
• Standards for models, params, enums, etc. did not exist
• The product surface area was rapidly expanding

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

We Made Versions!
• Internal
  • Meant for testing and developing new features, and for providing command and control endpoints for the software itself.
• Versioned (Vn)
  • Meant for public consumption, with a published policy on versioning, deprecation, and when breaking changes would be introduced.
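The Internal / Versioned split can be made visible to SDK users at the point where a request path is built. The sketch below is a minimal illustration, not the product's SDK: the `/api/<version>/` layout and the `api_path` helper are hypothetical assumptions. Versioned prefixes (`v1`, `v2`, ...) pass through quietly, while the internal prefix raises a warning because it carries no compatibility promise.

```python
import warnings

# Hypothetical prefix name; the slide only says "Internal" vs "Versioned (Vn)".
INTERNAL = "internal"


def api_path(endpoint: str, version: str = "v1") -> str:
    """Build a request path under an assumed /api/<version>/ layout.

    Versioned prefixes (v1, v2, ...) come with a published deprecation policy.
    The internal prefix has no such promise, so the caller is warned.
    """
    path = f"/api/{version}/{endpoint.lstrip('/')}"
    if version == INTERNAL:
        warnings.warn(
            f"{path} is not a versioned endpoint and may change without notice",
            stacklevel=2,
        )
    elif not (version.startswith("v") and version[1:].isdigit()):
        raise ValueError(f"unknown API version: {version!r}")
    return path
```

Warning rather than refusing keeps internal endpoints usable for operators who need them, while making the lack of a stability guarantee obvious every time one is called.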

Slide 20

Slide 20 text

“API versioning does not prevent breaking changes. It just helps control when, where, and how the break occurs. Someone must still update their code.” - Me

Slide 21

Slide 21 text

More Cleanup
• Placed major integrations at the parent (root) level
• Leveraged HTTP methods to simplify workflows
  Ugly: POST to “/add_node” and “/remove_node/{id}”
  Pretty: POST to “/node” and DELETE to “/node/{id}”
• Used Boolean field naming conventions
  Start with ‘has’, ‘is’, or ‘should’ to make it clear that the field is a Boolean
  Examples: ‘hasRootAccess’, ‘isAdmin’, and ‘shouldDoSomething’
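Both cleanup rules can be expressed directly in client code. This is a minimal sketch: the host, the `/api/v1` prefix, and the helper names (`add_node`, `remove_node`, `is_boolean_field_name`) are hypothetical. It builds the "pretty" requests with the standard library without sending them, and checks field names against the has/is/should convention.

```python
import re
import urllib.request

BASE_URL = "https://cluster.example.com/api/v1"  # hypothetical host and prefix


def add_node(payload: bytes) -> urllib.request.Request:
    # Create: POST to the collection "/node" rather than a verb route "/add_node".
    return urllib.request.Request(
        f"{BASE_URL}/node",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def remove_node(node_id: str) -> urllib.request.Request:
    # Delete: DELETE on the resource "/node/{id}" rather than "/remove_node/{id}".
    return urllib.request.Request(f"{BASE_URL}/node/{node_id}", method="DELETE")


# Prefix must be followed by an uppercase letter, so "island" does not count.
BOOLEAN_NAME = re.compile(r"^(has|is|should)[A-Z]\w*$")


def is_boolean_field_name(name: str) -> bool:
    # True when a field name follows the has/is/should Boolean convention.
    return BOOLEAN_NAME.match(name) is not None
```

With the resource-oriented routes, the URL names the thing and the HTTP method names the action, so an SDK needs one path per resource instead of one per verb.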

Slide 22

Slide 22 text

“The sooner you start to code, the longer the program will take.” - Roy Carlson

Slide 23

Slide 23 text

Internal Became the Hypnotoad
• No incentives for versioning
• Over 95% of the API resided in Internal

Slide 24

Slide 24 text

The Universal Solvent
Embracing our audience further

Slide 25

Slide 25 text

Too Much Complexity
• Each function within the SDK was a closed loop
• The community found it too difficult to contribute
• A new architecture was needed

Slide 26

Slide 26 text

SDK Design Goal
API File
• Gather information for each supported endpoint
• Supply the SDK with methods, params, status codes, etc.
• Version the data for backwards compatibility
Generic Functions
• Functions look at the API File to understand their purpose
• Functions can alter their state based on the target product version
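The API File plus Generic Functions pattern can be sketched as a lookup table and one function that consults it. This is an illustration under stated assumptions, not the SDK's actual file format: the `API_FILE` layout, the `get_node` entry, its URIs, and the `min_version` field are all hypothetical. Each entry records the method, URI, allowed params, and success code, keyed by the minimum product version it applies to; the generic function picks the newest entry the target product supports.

```python
from typing import Any

# Hypothetical API File: each endpoint lists entries valid from a minimum
# product version. Field names and URIs are illustrative only.
API_FILE: dict[str, list[dict[str, Any]]] = {
    "get_node": [
        {"min_version": (1, 0), "method": "GET", "uri": "/api/v1/node",
         "params": ["name"], "success_code": 200},
        {"min_version": (5, 0), "method": "GET", "uri": "/api/v2/node",
         "params": ["name", "cluster_id"], "success_code": 200},
    ],
}


def parse_version(version: str) -> tuple[int, ...]:
    # "5.1" -> (5, 1); tuples compare element by element.
    return tuple(int(part) for part in version.split("."))


def resolve(endpoint: str, product_version: str) -> dict[str, Any]:
    """Return the newest entry whose min_version <= the product version."""
    target = parse_version(product_version)
    candidates = [e for e in API_FILE[endpoint] if e["min_version"] <= target]
    if not candidates:
        raise LookupError(f"{endpoint} is not supported on {product_version}")
    return max(candidates, key=lambda e: e["min_version"])


def build_request(endpoint: str, product_version: str,
                  **params: Any) -> tuple[str, str, dict[str, Any]]:
    """Generic function: method, URI, and allowed params come from the API File."""
    entry = resolve(endpoint, product_version)
    unknown = set(params) - set(entry["params"])
    if unknown:
        raise ValueError(
            f"unsupported params on {product_version}: {sorted(unknown)}")
    return entry["method"], entry["uri"], params
```

Under this design, supporting a new product release means adding a data entry rather than writing a new function, which lowers the bar for community contributions.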

Slide 27

Slide 27 text

Diagram labels: “Product versions 1.0+” and “Product versions 5.0+”

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

Enablement and Communication
Too focused on the technology
Not enough focus on hygiene
Lots of questions from our customers
General fear of GitHub and coding
More was needed

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

Choose Your Own Adventure

Slide 33

Slide 33 text

Educational Workshops for Operators

Slide 34

Slide 34 text

Communication Efforts
• The rules of versioning and deprecation
• Future deprecation of endpoints / resources
• New or updated endpoints / resources

Slide 35

Slide 35 text

And then GraphQL appeared
There goes the neighborhood

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

“You haven't mastered a tool until you understand when it should not be used.” - Kelsey Hightower

Slide 38

Slide 38 text

Initial Research in 2017
• Dramatic speed improvements for the GUI
• As more objects are added, REST falls further behind
• Simple to query all objects and use cursor-based pagination
• More flexibility in the values we return
Stress-tested load times
• 95th percentile load time with GraphQL: 3.256 seconds
• 95th percentile load time with REST: 6.619 seconds
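Cursor-based pagination is what makes "query all objects" simple: the client keeps requesting the next page until the server says there is none. The sketch below assumes a Relay-style connection shape (`edges`, `pageInfo { hasNextPage endCursor }`) and a hypothetical `nodes` field; the real schema may differ. `execute` stands in for whatever sends a GraphQL query and returns the decoded `data` object.

```python
from typing import Any, Callable, Iterator, Optional

# Hypothetical query using the common Relay-style connection pattern.
QUERY = """
query Nodes($first: Int!, $after: String) {
  nodes(first: $first, after: $after) {
    edges { node { id name } }
    pageInfo { hasNextPage endCursor }
  }
}
"""

Executor = Callable[[str, dict[str, Any]], dict[str, Any]]


def iter_nodes(execute: Executor, page_size: int = 100) -> Iterator[dict[str, Any]]:
    """Yield every node, following endCursor until hasNextPage is false."""
    cursor: Optional[str] = None
    while True:
        data = execute(QUERY, {"first": page_size, "after": cursor})
        connection = data["nodes"]
        for edge in connection["edges"]:
            yield edge["node"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return
        cursor = page_info["endCursor"]
```

Because it is a generator, callers can stop early without fetching every page, and the paging logic lives in one place instead of in every script.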

Slide 39

Slide 39 text

Since Then
• Added GraphQL to our on-premises product
  • Reporting
  • Dashboards
  • Various other components
• Constructed a SaaS platform with GraphQL as the standard API
  • Started from scratch
  • Using what we learned
  • Lots of tweaking

Slide 40

Slide 40 text

Challenges
• Schema is in flux
• There are no versions
• Documentation holy wars
• We’re all still learning GraphQL
• Graph-Que-What?

Slide 41

Slide 41 text

Current State
• Schema tools (Voyager, GraphiQL) for visualization
• Internal construction of new SDKs
• Existing auth methods (e.g., tokens) are valid globally
Base platform will continue with REST and GraphQL
SaaS platform will remain entirely GraphQL
Using GitHub private repos for development
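Because existing auth methods are valid globally, an SDK can hold one token and attach it to both REST and GraphQL calls. This minimal sketch assumes bearer-token auth, a hypothetical host, and hypothetical `/api/v1/...` and `/api/graphql` paths; it builds standard-library requests without sending them. The GraphQL body uses the common JSON `{"query": ..., "variables": ...}` POST convention.

```python
import json
import urllib.request
from typing import Any, Optional

BASE = "https://cluster.example.com"  # hypothetical host


def auth_headers(token: str) -> dict[str, str]:
    # One token reused for both API styles (assumes bearer-token auth).
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


def rest_request(token: str, path: str) -> urllib.request.Request:
    # REST call on an assumed /api/v1 prefix.
    return urllib.request.Request(
        f"{BASE}/api/v1/{path}", headers=auth_headers(token), method="GET")


def graphql_request(token: str, query: str,
                    variables: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    # GraphQL call: same headers, JSON body with "query" and "variables".
    body = json.dumps({"query": query, "variables": variables or {}}).encode()
    return urllib.request.Request(
        f"{BASE}/api/graphql", data=body, headers=auth_headers(token), method="POST")
```

Sharing `auth_headers` keeps the login experience identical regardless of which API a given SDK function happens to call underneath.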

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

SDK Development
Let use cases drive stack-ranking
Mimic a near-identical UX
Educate and enable in parallel
Invite early adopters and give them checklists

Slide 44

Slide 44 text

Takeaways

Slide 45

Slide 45 text

If we could do it all over again
• Increase collaboration with engineering and support
• Create incentives to document and polish the API
• Make documentation a top priority
• Educate internal stakeholders on API usage
• Bring operators into your SDK build process
  • Use cases, UX, testing, feedback

Slide 46

Slide 46 text

Thank you
Twitter: @ChrisWahl
GitHub: chriswahl