Slide 1

Slide 1 text

Growing a Terraform provider: lessons learned
4 million downloads, 90%+ test coverage, and other fun things
Serge Smertin, June 2022

Slide 2

Slide 2 text

About Serge
▪ Lead maintainer of the Databricks Terraform Provider
▪ Worked in all stages of the data lifecycle for the past 15 years
▪ Built a couple of data science platforms from scratch
▪ Tracked cyber criminals through massively scaled data forensics
▪ Now bringing Databricks strategic customers to the next level as a full-time job

Slide 3

Slide 3 text

Databricks Lakehouse Platform
▪ Simple: unify your data warehousing and AI use cases on a single platform
▪ Open: built on open source and open standards
▪ Multicloud: one consistent data platform across clouds
The Lakehouse Platform covers data warehousing, data engineering, data science and ML, and data streaming over all structured and unstructured data in the cloud data lake, with Unity Catalog providing fine-grained governance for data and AI and Delta Lake providing data reliability and performance.

Slide 4

Slide 4 text

Databricks Terraform Provider
▪ 76,000 lines of code, and a lot of reflection
▪ 80 entities (resources and data sources), and growing
▪ 90%+ unit test coverage: ~1,200 tests, 40 seconds to run
▪ 4,000,000+ provider downloads, 130,000+ every week
Top 5% among all providers by downloads, entities, contributors, and commits.

Slide 5

Slide 5 text

Where a tiny bit of reflection won't actually hurt and can save 50%-60% of boilerplate code?
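To make the boilerplate claim concrete, here is a minimal sketch, assuming nothing about the provider's actual common.StructToSchema implementation, of how Go reflection over json/tf struct tags can replace hand-written schema maps; the structToSchema helper and the Catalog struct below are illustrative only.

package main

import (
	"fmt"
	"reflect"
	"strings"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// Catalog is a hypothetical API struct in the style used by the provider.
type Catalog struct {
	Name    string `json:"name"`
	Comment string `json:"comment,omitempty"`
	Owner   string `json:"owner,omitempty" tf:"computed"`
}

// structToSchema derives a Terraform schema from json/tf struct tags,
// replacing a hand-written map[string]*schema.Schema per resource.
func structToSchema(v interface{}) map[string]*schema.Schema {
	s := map[string]*schema.Schema{}
	t := reflect.TypeOf(v)
	for i := 0; i < t.NumField(); i++ {
		field := t.Field(i)
		jsonTag := field.Tag.Get("json")
		name := strings.Split(jsonTag, ",")[0]
		if name == "" || name == "-" {
			continue
		}
		attr := &schema.Schema{Required: true}
		switch field.Type.Kind() {
		case reflect.String:
			attr.Type = schema.TypeString
		case reflect.Int, reflect.Int32, reflect.Int64:
			attr.Type = schema.TypeInt
		case reflect.Bool:
			attr.Type = schema.TypeBool
		}
		// fields marked omitempty become optional, tf:"computed" fields computed
		if strings.Contains(jsonTag, "omitempty") {
			attr.Required, attr.Optional = false, true
		}
		if strings.Contains(field.Tag.Get("tf"), "computed") {
			attr.Required, attr.Computed = false, true
		}
		s[name] = attr
	}
	return s
}

func main() {
	for name, attr := range structToSchema(Catalog{}) {
		fmt.Printf("%s: type=%v optional=%v computed=%v\n", name, attr.Type, attr.Optional, attr.Computed)
	}
}

One such helper, plus the tags already needed for JSON serialization, is what lets the resources on the following slides skip most schema boilerplate.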

Slide 6

Slide 6 text

Minimal Working Terraform Provider

package main

import (
	"context"
	"fmt"

	"github.com/hashicorp/terraform-plugin-sdk/v2/diag"
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
	"github.com/hashicorp/terraform-plugin-sdk/v2/plugin"
)

func provider() *schema.Provider {
	type providerConfig map[string]string
	// Define and validate configuration
	withName := map[string]*schema.Schema{
		"name": {
			Type:     schema.TypeString,
			Required: true,
		},
	}
	return &schema.Provider{
		Schema: withName,
		// Configure connectivity to the backend
		ConfigureContextFunc: func(ctx context.Context, rd *schema.ResourceData) (interface{}, diag.Diagnostics) {
			return providerConfig{
				"name": rd.Get("name").(string),
			}, nil
		},
		ResourcesMap: map[string]*schema.Resource{
			"dummy_thing": {
				Schema: withName,
				// Connect to the backend and do something, setting the resource ID on success
				CreateContext: func(ctx context.Context, rd *schema.ResourceData, i interface{}) diag.Diagnostics {
					conf := i.(providerConfig)
					rd.SetId(fmt.Sprintf("%s/%s", conf["name"], rd.Get("name").(string)))
					return nil
				},
				ReadContext:   schema.NoopContext,
				UpdateContext: schema.NoopContext,
				DeleteContext: schema.NoopContext,
			},
		},
	}
}

func main() {
	plugin.Serve(&plugin.ServeOpts{
		ProviderFunc: provider,
	})
}

Slide 7

Slide 7 text

Minimal boilerplate as the primary goal: working on making things work instead of making wrappers.

func ResourceCustomerManagedKey() *schema.Resource {
	s := common.StructToSchema(CustomerManagedKey{}, nil) // reflection
	p := common.NewPairSeparatedID("account_id", "customer_managed_key_id", "/")
	return common.Resource{
		Create: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error {
			var cmk CustomerManagedKey
			common.DataToStructPointer(d, s, &cmk) // reflection
			customerManagedKeyData, err := NewCustomerManagedKeysAPI(ctx, c).Create(cmk)
			if err != nil {
				return err
			}
			d.Set("customer_managed_key_id", customerManagedKeyData.CustomerManagedKeyID)
			p.Pack(d)
			return nil
		},
		Read: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error {
			accountID, cmkID, err := p.Unpack(d)
			if err != nil {
				return err
			}
			cmk, err := NewCustomerManagedKeysAPI(ctx, c).Read(accountID, cmkID)
			if err != nil {
				return err
			}
			return common.StructToData(cmk, s, d) // reflection
		},
		Delete: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error {
			accountID, cmkID, err := p.Unpack(d)
			if err != nil {
				return err
			}
			return NewCustomerManagedKeysAPI(ctx, c).Delete(accountID, cmkID)
		},
		Schema:        s,
		SchemaVersion: 1,
		StateUpgraders: []schema.StateUpgrader{
			{
				Version: 0,
				Type:    ResourceCustomerManagedKeyV0(),
				Upgrade: migrateResourceCustomerManagedKeyV0,
			},
		},
	}.ToResource()
}

Slide 8

Slide 8 text

Life is easy for compatible schema changes … but harder with incompatible changes.

// AwsKeyInfo has information about the KMS key for BYOK
type AwsKeyInfo struct {
	KeyArn    string `json:"key_arn"`
	KeyAlias  string `json:"key_alias"`
	KeyRegion string `json:"key_region,omitempty" tf:"computed"`
}

// CustomerManagedKey contains key information and metadata for BYOK for E2
type CustomerManagedKey struct {
	CustomerManagedKeyID string      `json:"customer_managed_key_id,omitempty" tf:"computed"`
	AwsKeyInfo           *AwsKeyInfo `json:"aws_key_info" tf:"force_new"`
	AccountID            string      `json:"account_id" tf:"force_new"`
	CreationTime         int64       `json:"creation_time,omitempty" tf:"computed"`
	UseCases             []string    `json:"use_cases"`
}

func ResourceCustomerManagedKeyV0() cty.Type {
	return (&schema.Resource{
		Schema: map[string]*schema.Schema{
			"account_id": {
				Type:     schema.TypeString,
				ForceNew: true,
			},
			"customer_managed_key_id": {
				Type:     schema.TypeString,
				Optional: true,
				Computed: true,
			},
			"creation_time": {
				Type:     schema.TypeInt,
				Computed: true,
			},
			"aws_key_info": {
				Type:     schema.TypeList,
				ForceNew: true,
				Elem: &schema.Resource{
					Schema: map[string]*schema.Schema{
						"key_arn": {
							Type: schema.TypeString,
						},
						"key_alias": {
							Type: schema.TypeString,
						},
						"key_region": {
							Type:     schema.TypeString,
							Optional: true,
							Computed: true,
						},
					},
				},
			},
		},
	}).CoreConfigSchema().ImpliedType()
}
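Incompatible changes are where the StateUpgraders from the previous slide come in. The deck does not show migrateResourceCustomerManagedKeyV0 itself, so the following is only a hedged sketch of the shape a state upgrade function takes in the Terraform plugin SDK; the backfilled use_cases default is a made-up example, not the provider's real migration logic.

package mws // package name is illustrative

import "context"

// Sketch: upgrade raw state written under schema version 0 so it is valid
// against the version 1 schema.
func migrateResourceCustomerManagedKeyV0(ctx context.Context,
	rawState map[string]interface{}, meta interface{}) (map[string]interface{}, error) {
	if rawState == nil {
		return rawState, nil
	}
	if _, ok := rawState["use_cases"]; !ok {
		// hypothetical: V0 state predates use_cases, so backfill a default
		// instead of forcing users to recreate the resource
		rawState["use_cases"] = []interface{}{"MANAGED_SERVICES"}
	}
	return rawState, nil
}

The SDK calls such an upgrader automatically whenever it finds state recorded at a lower SchemaVersion, which is what keeps incompatible schema changes from breaking existing deployments.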

Slide 9

Slide 9 text

Reflection is not a silver bullet: reflected schemas frequently need customization.

var jobSchema = common.StructToSchema(JobSettings{},
	func(s map[string]*schema.Schema) map[string]*schema.Schema {
		jobSettingsSchema(&s, "")
		jobSettingsSchema(&s["task"].Elem.(*schema.Resource).Schema, "task.0.")
		jobSettingsSchema(&s["job_cluster"].Elem.(*schema.Resource).Schema, "job_cluster.0.")
		gitSourceSchema(s["git_source"].Elem.(*schema.Resource), "")
		if p, err := common.SchemaPath(s, "schedule", "pause_status"); err == nil {
			p.ValidateFunc = validation.StringInSlice([]string{"PAUSED", "UNPAUSED"}, false)
		}
		s["max_concurrent_runs"].ValidateDiagFunc = validation.ToDiagFunc(validation.IntAtLeast(1))
		s["max_concurrent_runs"].Default = 1
		s["url"] = &schema.Schema{
			Type:     schema.TypeString,
			Computed: true,
		}
		s["always_running"] = &schema.Schema{
			Optional: true,
			Default:  false,
			Type:     schema.TypeBool,
		}
		return s
	})

// JobSettings contains the information for configuring a job on databricks
type JobSettings struct {
	Name string `json:"name,omitempty" tf:"default:Untitled"`

	// BEGIN Jobs API 2.0
	ExistingClusterID      string              `json:"existing_cluster_id,omitempty" tf:"group:cluster_type"`
	NewCluster             *clusters.Cluster   `json:"new_cluster,omitempty" tf:"group:cluster_type"`
	NotebookTask           *NotebookTask       `json:"notebook_task,omitempty" tf:"group:task_type"`
	SparkJarTask           *SparkJarTask       `json:"spark_jar_task,omitempty" tf:"group:task_type"`
	SparkPythonTask        *SparkPythonTask    `json:"spark_python_task,omitempty" tf:"group:task_type"`
	SparkSubmitTask        *SparkSubmitTask    `json:"spark_submit_task,omitempty" tf:"group:task_type"`
	PipelineTask           *PipelineTask       `json:"pipeline_task,omitempty" tf:"group:task_type"`
	PythonWheelTask        *PythonWheelTask    `json:"python_wheel_task,omitempty" tf:"group:task_type"`
	Libraries              []libraries.Library `json:"libraries,omitempty" tf:"slice_set,alias:library"`
	TimeoutSeconds         int32               `json:"timeout_seconds,omitempty"`
	MaxRetries             int32               `json:"max_retries,omitempty"`
	MinRetryIntervalMillis int32               `json:"min_retry_interval_millis,omitempty"`
	RetryOnTimeout         bool                `json:"retry_on_timeout,omitempty"`
	// END Jobs API 2.0

	// BEGIN Jobs API 2.1
	Tasks       []JobTaskSettings `json:"tasks,omitempty" tf:"alias:task"`
	Format      string            `json:"format,omitempty" tf:"computed"`
	JobClusters []JobCluster      `json:"job_clusters,omitempty" tf:"alias:job_cluster"`
	// END Jobs API 2.1

	// BEGIN Jobs + Repo integration preview
	GitSource *GitSource `json:"git_source,omitempty"`
	// END Jobs + Repo integration preview

	Schedule           *CronSchedule       `json:"schedule,omitempty"`
	MaxConcurrentRuns  int32               `json:"max_concurrent_runs,omitempty"`
	EmailNotifications *EmailNotifications `json:"email_notifications,omitempty" tf:"suppress_diff"`
	Tags               map[string]string   `json:"tags,omitempty"`
}

Slide 10

Slide 10 text

A custom overlay on top of the Terraform SDK enables rapid growth of the provider with new features.
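The next slide shows the overlay's ToResource() method; for context, here is a rough reconstruction of the wrapper struct behind it, with typed CRUD callbacks that receive an already-configured client and return plain errors. Any field beyond what the slides show is an assumption.

package common

import (
	"context"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// DatabricksClient stands in for the provider's configured API client.
type DatabricksClient struct{}

// Resource is a reconstruction of the overlay type: resource authors fill in
// plain-error callbacks and never touch interface{} meta or diag.Diagnostics.
type Resource struct {
	Schema         map[string]*schema.Schema
	SchemaVersion  int
	StateUpgraders []schema.StateUpgrader
	Create         func(ctx context.Context, d *schema.ResourceData, c *DatabricksClient) error
	Read           func(ctx context.Context, d *schema.ResourceData, c *DatabricksClient) error
	Update         func(ctx context.Context, d *schema.ResourceData, c *DatabricksClient) error
	Delete         func(ctx context.Context, d *schema.ResourceData, c *DatabricksClient) error
	Timeouts       *schema.ResourceTimeout
}

ToResource() then translates this struct into the SDK's *schema.Resource exactly once, so behaviors like panic recovery and drift detection live in a single place rather than in every resource.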

Slide 11

Slide 11 text

Focus on nicer error messaging for the end user, automate everything that occurs 5+ times, recover from panics, and hook into context.

func (r Resource) ToResource() *schema.Resource {
	read := func(ctx context.Context, d *schema.ResourceData, m interface{}) diag.Diagnostics {
		err := recoverable(r.Read)(ctx, d, m.(*DatabricksClient)) // no panic
		if IsMissing(err) {
			// cleanup: resources removed on the backend are dropped from state
			log.Printf("[INFO] %s[id=%s] is removed on backend",
				ResourceName.GetOrUnknown(ctx), d.Id())
			d.SetId("")
			return nil
		}
		if err != nil {
			err = nicerError(ctx, err, "read")
			return diag.FromErr(err) // diags are inconvenient, resources work with plain errors
		}
		return nil
	}
	return &schema.Resource{
		Schema: r.Schema, // with ForceNew on all attributes if no Update … (auto CRUD vs CRD)
		// correct creation with drift detection in one single place
		CreateContext: func(ctx context.Context, d *schema.ResourceData, m interface{}) diag.Diagnostics {
			c := m.(*DatabricksClient)
			err := recoverable(r.Create)(ctx, d, c)
			if err != nil {
				err = nicerError(ctx, err, "create")
				return diag.FromErr(err)
			}
			if err = recoverable(r.Read)(ctx, d, c); err != nil {
				err = nicerError(ctx, err, "read")
				return diag.FromErr(err)
			}
			return nil
		},
		ReadContext:   read,
		UpdateContext: update,
		DeleteContext: …,
		// default importers make contributing a new resource super simple
		Importer: &schema.ResourceImporter{
			StateContext: func(...) {
				d.MarkNewResource()
				diags := read(ctx, d, m)
				…
				return []*schema.ResourceData{d}, err
			},
		},
		Timeouts: r.Timeouts,
	}
}

Slide 12

Slide 12 text

context.WithValue

func (c *DatabricksClient) userAgent(ctx context.Context) string {
	resource := "unknown"
	terraformVersion := "unknown"
	if rn, ok := ctx.Value(ResourceName).(string); ok {
		resource = rn
	}
	if c.Provider != nil {
		terraformVersion = c.Provider.TerraformVersion
	}
	return fmt.Sprintf("databricks-tf-provider/%s (+%s) terraform/%s",
		Version(), resource, terraformVersion)
}

func AddContextToAllResources(p *schema.Provider, prefix string) {
	for k, r := range p.DataSourcesMap {
		name := strings.ReplaceAll(k, prefix+"_", "")
		wrap := op(r.ReadContext).addContext(ResourceName, name).addContext(IsData, "yes")
		r.ReadContext = schema.ReadContextFunc(wrap)
	}
	for k, r := range p.ResourcesMap {
		k = strings.ReplaceAll(k, prefix+"_", "")
		addContextToResource(k, r)
	}
}
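The op(...).addContext(...) helper used above is not shown in the deck; a plausible sketch, assuming typed context keys and a simple closure wrapper, might look like this.

package common

import (
	"context"

	"github.com/hashicorp/terraform-plugin-sdk/v2/diag"
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// contextKey mirrors the typed context keys referenced on the slides.
type contextKey int

const (
	ResourceName contextKey = iota
	IsData
)

// op wraps the SDK's context-aware CRUD signature.
type op func(ctx context.Context, d *schema.ResourceData, m interface{}) diag.Diagnostics

// addContext returns a function that stores key/value in the context before
// delegating to the wrapped function, so lower layers such as userAgent can
// read it back with ctx.Value(key).
func (f op) addContext(key contextKey, value string) op {
	return func(ctx context.Context, d *schema.ResourceData, m interface{}) diag.Diagnostics {
		return f(context.WithValue(ctx, key, value), d, m)
	}
}

Every API call then knows which resource triggered it, which is how the user agent string ends up carrying the resource name and feeding the usage data behind the data-driven decisions mentioned later.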

Slide 13

Slide 13 text

Reverse engineering deployments for customers who don't do IaC yet

Slide 14

Slide 14 text

"databricks_repo": {
	Service: "repos",
	Name: func(d *schema.ResourceData) string {
		name := d.Get("path").(string)
		if name == "" {
			return d.Id()
		}
		return strings.TrimPrefix(name, "/")
	},
	// List all relevant resources
	List: func(ic *importContext) error {
		repoList, err := repos.NewReposAPI(ic.Context, ic.Client).ListAll()
		if err != nil {
			return err
		}
		for offset, repo := range repoList {
			if repo.Url != "" {
				ic.Emit(&resource{
					Resource: "databricks_repo",
					ID:       fmt.Sprintf("%d", repo.ID),
				})
			}
			log.Printf("[INFO] Scanned %d of %d repos", offset+1, len(repoList))
		}
		return nil
	},
	// Discover relevant resources
	Import: func(ic *importContext, r *resource) error {
		if ic.meAdmin {
			ic.Emit(&resource{
				Resource: "databricks_permissions",
				ID:       fmt.Sprintf("/repos/%s", r.ID),
				Name:     "repo_" + ic.Importables["databricks_repo"].Name(r.Data),
			})
		}
		return nil
	},
},

Slide 15

Slide 15 text

Referential integrity resolution

"databricks_group_instance_profile": {
	Service: "access",
	Depends: []reference{
		{Path: "group_id", Resource: "databricks_group"},
		{Path: "instance_profile_id", Resource: "databricks_instance_profile"},
	},
},

Before resolution, raw IDs:

resource "databricks_instance_profile" "instance_profile" { // ID: bcd
  instance_profile_arn = "my_instance_profile_arn"
}

resource "databricks_group" "my_group" { // ID: abc
  display_name = "my_group_name"
}

resource "databricks_group_instance_profile" "my_group_instance_profile" {
  group_id            = "abc"
  instance_profile_id = "bcd"
}

After resolution, references:

resource "databricks_instance_profile" "instance_profile" {
  instance_profile_arn = "my_instance_profile_arn"
}

resource "databricks_group" "my_group" {
  display_name = "my_group_name"
}

resource "databricks_group_instance_profile" "my_group_instance_profile" {
  group_id            = databricks_group.my_group.id
  instance_profile_id = databricks_instance_profile.instance_profile.id
}

Slide 16

Slide 16 text

Similar tools for reverse-engineering deployments
1. Aztfy (https://azure.github.io/aztfy/#1): Azure-focused only, though it has a great introduction to the intentions behind this kind of tooling.
2. Terraforming (http://terraforming.dtan4.net/): AWS-focused only.
3. Terraformer (https://github.com/GoogleCloudPlatform/terraformer): at the time of writing, it was easier to ship the exporter within the same artifact as the Terraform provider itself, since keeping resources and resource schemas in sync was a concern.
4. TerraCognita (https://github.com/cycloidio/terracognita/): at the time of creation, it showed no significant development activity.

Slide 17

Slide 17 text

HTTP fixture framework: a strange testing technique that saves a plethora of time
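The internals of qa.HTTPFixture are not shown in the deck; the general idea can be sketched with net/http/httptest, matching each request against a list of stubbed method/path pairs and failing loudly on anything unexpected. The startFixtureServer helper and the field subset below are assumptions for illustration.

package qa

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
)

// HTTPFixture describes one stubbed API call (a subset of the fields visible
// on the following slides).
type HTTPFixture struct {
	Method   string
	Resource string
	Response interface{}
	Status   int
}

// startFixtureServer serves the fixtures and answers 404 for any call that
// has no fixture, which is what makes a missing stub show up as an almost
// self-explaining test failure.
func startFixtureServer(fixtures []HTTPFixture) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		for _, f := range fixtures {
			if f.Method == r.Method && f.Resource == r.URL.Path {
				if f.Status != 0 {
					w.WriteHeader(f.Status)
				}
				_ = json.NewEncoder(w).Encode(f.Response)
				return
			}
		}
		http.Error(w, "no fixture for "+r.Method+" "+r.URL.Path, http.StatusNotFound)
	}))
}

In the tests, the stub server's URL would then be wired into the DatabricksClient host before applying the resource under test, so the whole CRUD cycle runs without any real API.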

Slide 18

Slide 18 text

Make testing fun again: the failure that almost fixes itself, with the simplest heuristics.

func TestSomethingExciting(t *testing.T) {
	qa.ResourceFixture{
		Fixtures: []qa.HTTPFixture{
			// .. pure TDD!
		},
		Resource: ResourceCatalog(),
		Create:   true,
		HCL: `
		name = "a"
		comment = "b"
		properties = {
			c = "d"
		}
		`,
	}.ApplyNoError(t)
}

func TestCatalogCornerCases(t *testing.T) {
	qa.ResourceCornerCases(t, ResourceCatalog())
}

Slide 19

Slide 19 text

After two years in practice: tests read almost like documentation, making it easier for new people to respond to escalations, and simple copy-paste creates bug-reproduction scaffolding that is immediately debuggable.

func TestCatalogCreateAlsoDeletesDefaultSchema(t *testing.T) {
	qa.ResourceFixture{
		Fixtures: []qa.HTTPFixture{
			{
				Method:   "POST",
				Resource: "/api/2.0/unity-catalog/catalogs",
				ExpectedRequest: CatalogInfo{
					Name:    "a",
					Comment: "b",
					Properties: map[string]string{
						"c": "d",
					},
				},
			},
			{
				Method:   "DELETE",
				Resource: "/api/2.0/unity-catalog/schemas/a.default",
			},
			{
				Method:   "GET",
				Resource: "/api/2.0/unity-catalog/catalogs/a",
				Response: CatalogInfo{
					Name:    "a",
					Comment: "b",
					Properties: map[string]string{
						"c": "d",
					},
					MetastoreID: "e",
					Owner:       "f",
				},
			},
		},
		Resource: ResourceCatalog(),
		Create:   true,
		HCL: `
		name = "a"
		comment = "b"
		properties = {
			c = "d"
		}
		`,
	}.ApplyNoError(t)
}

Slide 20

Slide 20 text

Charts: unit test coverage trend and customer adoption trend, with adoption jumps highlighted.

Slide 21

Slide 21 text

There's more than just unit tests

Slide 22

Slide 22 text

[Logo on slide] is awesome

Slide 23

Slide 23 text


Slide 24

Slide 24 text

It's actually awesome until you hit an issue like this:

Slide 25

Slide 25 text

Acceptance testing across AWS, Azure, and GCP
With 90% unit test coverage, acceptance tests focus on making sure that 8 different authentication methods work, API serialization is up to date, and there is no unexpected configuration drift.
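For illustration only, here is what a minimal acceptance test could look like using the plugin SDK's test harness against the toy provider from the earlier slide; the provider's real acceptance-test helpers are not shown in the deck, and the CLOUD_ENV gate and test body are assumptions. This would live in a _test.go file next to the minimal provider.

package main

import (
	"os"
	"testing"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/resource"
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// TestAccDummyThing runs real HCL through the real plugin protocol and state.
// resource.Test itself only runs when TF_ACC is set; the extra CLOUD_ENV gate
// is an illustrative way to pick which cloud the run targets.
func TestAccDummyThing(t *testing.T) {
	if os.Getenv("CLOUD_ENV") == "" { // e.g. "aws", "azure", or "gcp"
		t.Skip("set CLOUD_ENV to run acceptance tests against a live backend")
	}
	resource.Test(t, resource.TestCase{
		ProviderFactories: map[string]func() (*schema.Provider, error){
			"dummy": func() (*schema.Provider, error) { return provider(), nil },
		},
		Steps: []resource.TestStep{
			{
				Config: `
					provider "dummy" {
						name = "backend"
					}
					resource "dummy_thing" "this" {
						name = "sandbox"
					}
				`,
				Check: resource.TestCheckResourceAttr("dummy_thing.this", "id", "backend/sandbox"),
			},
		},
	})
}

The real provider's acceptance suite applies the same pattern to live Databricks workspaces on all three clouds, which is where the authentication methods and serialization checks mentioned above actually get exercised.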

Slide 26

Slide 26 text

Always make data-driven decisions
Enhance the most used, test the most critical, automate the reporting as much as possible, try to keep up with features behind gated enablement, and keep backwards compatibility almost forever.

Slide 27

Slide 27 text

Thank you
©2022 Databricks Inc. — All rights reserved