Achieving repeatable, extensible and self-serve infrastructure

tasdikrahman.me @tasdikrahman • Product Engineer @ Gojek • Contributor to oVirt • Backpacker • Weekend chef • Chelsea FC!!

What does Gojek do?

Ref: gojek.io

What am I gonna talk about?

Ref: shutterstock.com

Evolution of Infrastructure @ Gojek

Ref: shutterstock.com

Travelling back in time

Rapid demand

How to deal with it?

Central Infrastructure Team

Intent?

Abstract out infrastructure for product teams

Outcome?

Ad hoc requests

“Measure what is measurable, and make measurable what is not so.” - Galileo (Credits: biography.com)

Service request tickets

Example service request raised by a team in our ticket system (names redacted)

Example service request to increase disk size (names redacted)

The number of service requests kept increasing as we scaled and more product groups came on board

Ref: gunshowcomic.com/648

How does one keep up with service requests?

Scale your team vertically and keep doing so

Sustainable?

Very hard to do, so mostly no

Eventually, we noticed we were becoming the bottleneck

Give access to someone from the product team?

Risk of security loopholes

Ref: https://blog.codinghorror.com/the-broken-window-theory/

What do we do then?

Quick detour

Where did systems administration start?

Evolution of Automation at Gojek
• Scripts
• Chef cookbooks
• Rundeck
• Deployment scripts

Problems with the earlier solutions
• Multiple ways of building and using automation
• Managing dependencies for the automation, e.g. people using gcloud/AWS
• Lack of conventions, leading to meagre contributions to automation from developers
• Ad hoc management of access to tools like Terraform and knife, leading to stray accidents
• No central platform for automation

The number of tickets being created was still not decreasing

Clearing infrastructure debt

Moving from maintenance to innovation mode

Making infrastructure boring for product teams

Proctor: our automation orchestrator (Ref: github.com/gojek/proctor)

Installation

Helm all the way (reference values: stable/proctor-service/values.yaml)

Automation using Proctor

Sample proc to increase disk size
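A proc is a reviewed script that Proctor runs on the requester's behalf. The sketch below shows what the disk-resize step of such a script might look like. It assumes the proc receives its arguments as environment variables, and the names `DISK_NAME`, `DISK_SIZE_GB` and `ZONE` are hypothetical, since the slides do not show them. The script validates its inputs before building a `gcloud compute disks resize` call.

```python
import os
import re
import shlex
import subprocess

# Hypothetical argument names; the real proc's inputs are not shown in the slides.
REQUIRED_ARGS = ("DISK_NAME", "DISK_SIZE_GB", "ZONE")
# GCE resource names: lowercase letters, digits and hyphens, starting with a letter.
NAME_PATTERN = re.compile(r"^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$")


def build_resize_command(env):
    """Validate the proc's arguments and return the gcloud command as a list."""
    missing = [name for name in REQUIRED_ARGS if not env.get(name)]
    if missing:
        raise ValueError(f"missing required args: {', '.join(missing)}")
    disk = env["DISK_NAME"]
    if not NAME_PATTERN.match(disk):
        raise ValueError(f"invalid disk name: {disk!r}")
    size = int(env["DISK_SIZE_GB"])  # raises ValueError on non-integer input
    if size <= 0:
        raise ValueError("disk size must be positive")
    return [
        "gcloud", "compute", "disks", "resize", disk,
        f"--size={size}GB", f"--zone={env['ZONE']}",
        "--quiet",  # skip the interactive confirmation prompt
    ]


def run(env=os.environ, dry_run=True):
    """Proc entry point; prints the command instead of running it by default."""
    cmd = build_resize_command(env)
    if dry_run:
        print(shlex.join(cmd))
    else:
        # Persistent disks can only grow; the API rejects a smaller size.
        subprocess.run(cmd, check=True)
    return cmd
```

Inside the proc's container, `run(dry_run=False)` would perform the resize. Defaulting to a dry run lets reviewers check the generated command first.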

Developers can write scripts, which get added to Proctor after our review
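One way to enforce that review step is to have the orchestrator run only procs that are registered with their metadata and explicitly approved. This is a hypothetical sketch. `Proc`, `ProcRegistry` and their fields are illustrative names, not Proctor's actual data model or API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Proc:
    """Metadata submitted alongside a script (hypothetical fields)."""
    name: str
    image: str                 # container image wrapping the script
    required_args: tuple = ()
    approved: bool = False     # set only after the infra team's review


class ProcRegistry:
    """Hypothetical registry: only reviewed procs are runnable."""

    def __init__(self):
        self._procs = {}

    def submit(self, proc):
        # Developers submit procs; a new proc always starts unapproved.
        if proc.name in self._procs:
            raise ValueError(f"proc {proc.name!r} already exists")
        self._procs[proc.name] = replace(proc, approved=False)

    def approve(self, name):
        # Called by the infra team once the script has been reviewed.
        self._procs[name] = replace(self._procs[name], approved=True)

    def prepare_run(self, name, args):
        """Return the env vars for an approved proc, or raise if it cannot run."""
        proc = self._procs.get(name)
        if proc is None or not proc.approved:
            raise PermissionError(f"proc {name!r} is not available")
        missing = [a for a in proc.required_args if a not in args]
        if missing:
            raise ValueError(f"missing args for {name}: {missing}")
        return {a: str(args[a]) for a in proc.required_args}
```

Because approval is separate from submission, developers can contribute automation freely while the infra team remains the gate on what actually runs.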

Sample procs in our ecosystem

Demo

Profit?

Outcome of having Proctor?

A decrease in the number of tickets that were mechanical in nature

Running Terraform inside CI

But before that

Creating the gcloud project

Sample directory structure

.gitlab-ci.yml for the gcloud project in GitLab

Plan and apply
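HashiCorp's "Running Terraform in automation" guide recommends a non-interactive `init`, a `plan` saved to a file, and an `apply` of that saved plan. The sketch below shows one way a CI job could derive those steps. It assumes a branch policy that the slides do not state: every pipeline plans, and only pipelines on the default branch apply. `CI_COMMIT_BRANCH` and `CI_DEFAULT_BRANCH` are GitLab's predefined variables. The function names are hypothetical.

```python
import subprocess

PLAN_FILE = "tfplan"


def terraform_steps(env):
    """Return the terraform commands a CI job should run (a sketch).

    Every pipeline runs init + plan and saves the plan to a file. Only
    pipelines on the default branch apply it, and applying the saved file
    means apply executes exactly the changes plan printed in the job log.
    """
    steps = [
        ["terraform", "init", "-input=false"],
        ["terraform", "plan", "-input=false", f"-out={PLAN_FILE}"],
    ]
    branch = env.get("CI_COMMIT_BRANCH")
    if branch and branch == env.get("CI_DEFAULT_BRANCH"):
        steps.append(["terraform", "apply", "-input=false", PLAN_FILE])
    return steps


def run_steps(steps, env):
    # TF_IN_AUTOMATION makes Terraform avoid suggesting follow-up commands.
    job_env = {**env, "TF_IN_AUTOMATION": "true"}
    for cmd in steps:
        subprocess.run(cmd, check=True, env=job_env)
```

In a GitLab job this could be invoked as `run_steps(terraform_steps(os.environ), dict(os.environ))`.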

Private Terraform registry consisting of 90+ modules
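Terraform addresses modules from a private registry as `<HOSTNAME>/<NAMESPACE>/<NAME>/<PROVIDER>`. A three-part address without a hostname resolves to the public registry. As a hedged sketch, a CI lint step could use that format to flag module sources that bypass the private registry. The hostname `terraform.example.internal` and the policy itself are assumptions, not something the slides describe.

```python
from typing import NamedTuple

PUBLIC_REGISTRY = "registry.terraform.io"


class ModuleAddress(NamedTuple):
    hostname: str
    namespace: str
    name: str
    provider: str


def parse_registry_source(source):
    """Parse '[<HOSTNAME>/]<NAMESPACE>/<NAME>/<PROVIDER>[//subdir]'."""
    if source.startswith(("./", "../")):
        raise ValueError(f"local path, not a registry module: {source!r}")
    address = source.split("//", 1)[0]  # drop an optional '//subdir' suffix
    parts = address.split("/")
    if len(parts) == 3:  # no hostname means the public registry
        parts = [PUBLIC_REGISTRY, *parts]
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"not a registry module address: {source!r}")
    return ModuleAddress(*parts)


def sources_outside_registry(sources, private_host):
    """Return module sources that do not come from the private registry."""
    flagged = []
    for source in sources:
        try:
            if parse_registry_source(source).hostname != private_host:
                flagged.append(source)
        except ValueError:
            flagged.append(source)  # local paths, git URLs, etc.
    return flagged
```

Flagging non-registry sources keeps teams on the shared modules, where the best practices live. A real policy might allow local paths explicitly.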

Outcome?

Teams manage and provision their own infra, with our best practices baked into the Terraform modules

OSS alternatives?

Reference: runatlantis.io

Ideal state?

Ref: Google SRE book, "Eliminating Toil"
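The SRE book characterizes toil as work that is manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows. Below is a minimal sketch of using those traits to triage recurring service-request types. It assumes each type is tagged by hand and given a rough monthly volume and handling time. The fields, the 0.8 threshold and the ranking rule are hypothetical.

```python
from dataclasses import dataclass

# Traits of toil as listed in the Google SRE book's "Eliminating Toil" chapter.
TOIL_TRAITS = (
    "manual", "repetitive", "automatable",
    "tactical", "no_enduring_value", "scales_with_growth",
)


@dataclass
class RequestType:
    name: str
    traits: frozenset        # which TOIL_TRAITS apply to this request type
    per_month: int           # how many such tickets arrive per month
    minutes_each: float      # rough handling time per ticket


def toil_score(req):
    """Fraction of toil traits the request type exhibits (0.0 to 1.0)."""
    return len(req.traits & set(TOIL_TRAITS)) / len(TOIL_TRAITS)


def automation_candidates(requests, min_score=0.8):
    """Toil-heavy request types, ordered by monthly minutes spent on them."""
    toil = [r for r in requests if toil_score(r) >= min_score]
    return sorted(toil, key=lambda r: r.per_month * r.minutes_each, reverse=True)
```

Ranking by time spent, rather than automating every request type, is one way to avoid premature automation.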

Known caveats?

Deletion of infra

Teams forget what they are using

Lessons learnt?

Avoid premature automation

A high volume of service requests from product teams is a smell

No big-bang changes

Documentation should go hand in hand with tooling, as it directly affects productivity

Reduce the steps for onboarding to your tooling; the fewer, the better

Invisible infrastructure

Product managers in infrastructure teams

Prioritizing innovation

Links and References
• https://github.com/gojek/proctor
• https://blog.gojekengineering.com/olympus-terraforming-repeatable-and-extensible-infrastructure-at-go-jek-42ad5b0a4f9a
• https://learn.hashicorp.com/terraform/development/running-terraform-in-automation
• https://lethain.com/product-management-infra-engineering/

