Reaching 3M downloads and 90%+ unit test coverage for a Terraform provider: lessons learned

This is the story of building and growing the Databricks Terraform Provider over two years, along with the tactics, techniques, and procedures that helped it reach millions of installations. The talk is useful for every Terraform provider maintainer, and for anyone planning to write one.

Serge Smertin

July 14, 2022

Transcript

  1. Growing a Terraform provider: lessons learned

    4 million downloads, 90%+ test coverage, and other fun things
    Serge Smertin, June 2022
    ©2022 Databricks Inc. — All rights reserved
  2. About Serge

    ▪ Lead maintainer of the Databricks Terraform Provider
    ▪ Worked in all stages of the data lifecycle for the past 15 years
    ▪ Built a couple of data science platforms from scratch
    ▪ Tracked cyber criminals through massively scaled data forensics
    ▪ Now brings Databricks strategic customers to the next level as a full-time job
  3. Databricks Lakehouse Platform

    ▪ Simple: unify your data warehousing and AI use cases on a single platform
    ▪ Open: built on open source and open standards
    ▪ Multicloud: one consistent data platform across clouds
    ▪ Lakehouse Platform: Data Warehousing, Data Engineering, Data Science and ML, Data Streaming
    ▪ Unity Catalog: fine-grained governance for data and AI
    ▪ Delta Lake: data reliability and performance
    ▪ Cloud Data Lake: all structured and unstructured data
  4. Databricks Terraform Provider

    ▪ 76,000 lines of code, and a lot of reflection
    ▪ 80 entities (resources and data sources), and growing
    ▪ 90%+ unit test coverage: ~1200 tests, 40 seconds to run
    ▪ 4,000,000+ provider downloads, 130,000+ every week
    ▪ Top 5% among all providers by downloads, entities, contributors, and commits
  5. Where a tiny bit of reflection won’t actually hurt and can save 50%-60% of boilerplate code?
  6. Minimal Working Terraform Provider

    package main

    import …

    func provider() *schema.Provider {
        type providerConfig map[string]string
        withName := map[string]*schema.Schema{
            "name": {
                Type:     schema.TypeString,
                Required: true,
            },
        }
        return &schema.Provider{
            Schema: withName,
            ConfigureContextFunc: func(ctx context.Context, rd *schema.ResourceData) (interface{}, diag.Diagnostics) {
                return providerConfig{
                    "name": rd.Get("name").(string),
                }, nil
            },
            ResourcesMap: map[string]*schema.Resource{
                "dummy_thing": {
                    Schema: withName,
                    CreateContext: func(ctx context.Context, rd *schema.ResourceData, i interface{}) diag.Diagnostics {
                        conf := i.(providerConfig)
                        rd.SetId(fmt.Sprintf("%s/%s", conf["name"], rd.Get("name").(string)))
                        return nil
                    },
                    ReadContext:   schema.NoopContext,
                    UpdateContext: schema.NoopContext,
                    DeleteContext: schema.NoopContext,
                },
            },
        }
    }

    func main() {
        plugin.Serve(&plugin.ServeOpts{
            ProviderFunc: provider,
        })
    }

    Slide callouts:
    ▪ Define and validate configuration
    ▪ Configure connectivity to backend
    ▪ Connect to backend and do something, setting resource ID on success
  7. Minimal boilerplate as the primary goal

    Working on making things work instead of making wrappers.

    func ResourceCustomerManagedKey() *schema.Resource {
        s := common.StructToSchema(CustomerManagedKey{}, nil)
        p := common.NewPairSeparatedID("account_id", "customer_managed_key_id", "/")
        return common.Resource{
            Create: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error {
                var cmk CustomerManagedKey
                common.DataToStructPointer(d, s, &cmk)
                customerManagedKeyData, err := NewCustomerManagedKeysAPI(ctx, c).Create(cmk)
                if err != nil {
                    return err
                }
                d.Set("customer_managed_key_id", customerManagedKeyData.CustomerManagedKeyID)
                p.Pack(d)
                return nil
            },
            Read: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error {
                accountID, cmkID, err := p.Unpack(d)
                if err != nil {
                    return err
                }
                cmk, err := NewCustomerManagedKeysAPI(ctx, c).Read(accountID, cmkID)
                if err != nil {
                    return err
                }
                return common.StructToData(cmk, s, d)
            },
            Delete: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error {
                accountID, cmkID, err := p.Unpack(d)
                if err != nil {
                    return err
                }
                return NewCustomerManagedKeysAPI(ctx, c).Delete(accountID, cmkID)
            },
            Schema:        s,
            SchemaVersion: 1,
            StateUpgraders: []schema.StateUpgrader{
                {
                    Version: 0,
                    Type:    ResourceCustomerManagedKeyV0(),
                    Upgrade: migrateResourceCustomerManagedKeyV0,
                },
            },
        }.ToResource()
    }

    Code annotations on the slide: "reflection" (three places), "boilerplate" (two places).
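
The `common.NewPairSeparatedID` call on slide 7 packs two attributes into one resource ID on create and unpacks them on read and delete. A minimal sketch of that idea follows, assuming a plain attribute map instead of `*schema.ResourceData`. The `pairID` type and its methods are hypothetical names for illustration, not the provider's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// pairID is a hypothetical stand-in for a pair-separated ID helper.
type pairID struct {
	left, right, sep string // two attribute names and the separator between their values
}

// Pack joins the two attribute values into a single resource ID.
func (p pairID) Pack(attrs map[string]string) (string, error) {
	l, r := attrs[p.left], attrs[p.right]
	if l == "" || r == "" {
		return "", fmt.Errorf("%s and %s must both be set", p.left, p.right)
	}
	return l + p.sep + r, nil
}

// Unpack splits a resource ID back into its two parts.
func (p pairID) Unpack(id string) (string, string, error) {
	parts := strings.SplitN(id, p.sep, 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("invalid ID %q: expected %s%s%s", id, p.left, p.sep, p.right)
	}
	return parts[0], parts[1], nil
}

func main() {
	p := pairID{left: "account_id", right: "customer_managed_key_id", sep: "/"}
	id, err := p.Pack(map[string]string{
		"account_id":              "abc",
		"customer_managed_key_id": "def",
	})
	fmt.Println(id, err)

	accountID, keyID, err := p.Unpack(id)
	fmt.Println(accountID, keyID, err)
}
```

Keeping packing and unpacking in one helper means every resource with a composite ID validates it the same way, which fits the slide's goal of minimal boilerplate.
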
  8. Life is easy for compatible schema changes … but harder with incompatible changes

    // AwsKeyInfo has information about the KMS key for BYOK
    type AwsKeyInfo struct {
        KeyArn    string `json:"key_arn"`
        KeyAlias  string `json:"key_alias"`
        KeyRegion string `json:"key_region,omitempty" tf:"computed"`
    }

    // CustomerManagedKey contains key information and metadata for BYOK for E2
    type CustomerManagedKey struct {
        CustomerManagedKeyID string      `json:"customer_managed_key_id,omitempty" tf:"computed"`
        AwsKeyInfo           *AwsKeyInfo `json:"aws_key_info" tf:"force_new"`
        AccountID            string      `json:"account_id" tf:"force_new"`
        CreationTime         int64       `json:"creation_time,omitempty" tf:"computed"`
        UseCases             []string    `json:"use_cases"`
    }

    func ResourceCustomerManagedKeyV0() cty.Type {
        return (&schema.Resource{
            Schema: map[string]*schema.Schema{
                "account_id": {
                    Type:     schema.TypeString,
                    ForceNew: true,
                },
                "customer_managed_key_id": {
                    Type:     schema.TypeString,
                    Optional: true,
                    Computed: true,
                },
                "creation_time": {
                    Type:     schema.TypeInt,
                    Computed: true,
                },
                "aws_key_info": {
                    Type:     schema.TypeList,
                    ForceNew: true,
                    Elem: &schema.Resource{
                        Schema: map[string]*schema.Schema{
                            "key_arn": {
                                Type: schema.TypeString,
                            },
                            "key_alias": {
                                Type: schema.TypeString,
                            },
                            "key_region": {
                                Type:     schema.TypeString,
                                Optional: true,
                                Computed: true,
                            },
                        },
                    },
                },
            },
        }).CoreConfigSchema().ImpliedType()
    }
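
The structs on slide 8 carry `json` and `tf` tags that a reflection helper such as `common.StructToSchema` can turn into a schema. The sketch below shows the general technique with the standard `reflect` package. It produces a simplified `fieldSpec` rather than a real `*schema.Schema`. Both `fieldSpec` and `structToSpec` are hypothetical names, and the rule that `omitempty` means optional is an assumption inferred from how `key_region` appears in the hand-written V0 schema.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// fieldSpec is a simplified, hypothetical stand-in for a Terraform schema entry.
type fieldSpec struct {
	Optional, Computed, ForceNew bool
}

// AwsKeyInfo mirrors the struct shown on the slide.
type AwsKeyInfo struct {
	KeyArn    string `json:"key_arn"`
	KeyAlias  string `json:"key_alias"`
	KeyRegion string `json:"key_region,omitempty" tf:"computed"`
}

// structToSpec expects a struct value. For every field it takes the attribute
// name from the `json` tag and the flags from the `tf` tag.
func structToSpec(v interface{}) map[string]fieldSpec {
	out := map[string]fieldSpec{}
	t := reflect.TypeOf(v)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		jsonTag := strings.Split(f.Tag.Get("json"), ",")
		name := jsonTag[0]
		if name == "" || name == "-" {
			continue // field is not serialized, so it gets no attribute
		}
		spec := fieldSpec{}
		for _, opt := range jsonTag[1:] {
			if opt == "omitempty" {
				spec.Optional = true // assumption: omitempty maps to Optional
			}
		}
		for _, opt := range strings.Split(f.Tag.Get("tf"), ",") {
			switch opt {
			case "computed":
				spec.Computed = true
			case "force_new":
				spec.ForceNew = true
			}
		}
		out[name] = spec
	}
	return out
}

func main() {
	for name, spec := range structToSpec(AwsKeyInfo{}) {
		fmt.Printf("%s: %+v\n", name, spec)
	}
}
```

With this approach a compatible change, such as a new optional field, is a one-line struct edit, which is one way to read the slide's "life is easy" remark.
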
  9. Reflection is not a silver bullet

    And reflected schemas frequently use customization.

    var jobSchema = common.StructToSchema(JobSettings{},
        func(s map[string]*schema.Schema) map[string]*schema.Schema {
            jobSettingsSchema(&s, "")
            jobSettingsSchema(&s["task"].Elem.(*schema.Resource).Schema, "task.0.")
            jobSettingsSchema(&s["job_cluster"].Elem.(*schema.Resource).Schema, "job_cluster.0.")
            gitSourceSchema(s["git_source"].Elem.(*schema.Resource), "")
            if p, err := common.SchemaPath(s, "schedule", "pause_status"); err == nil {
                p.ValidateFunc = validation.StringInSlice([]string{"PAUSED", "UNPAUSED"}, false)
            }
            s["max_concurrent_runs"].ValidateDiagFunc = validation.ToDiagFunc(validation.IntAtLeast(1))
            s["max_concurrent_runs"].Default = 1
            s["url"] = &schema.Schema{
                Type:     schema.TypeString,
                Computed: true,
            }
            s["always_running"] = &schema.Schema{
                Optional: true,
                Default:  false,
                Type:     schema.TypeBool,
            }
            return s
        })

    // JobSettings contains the information for configuring a job on databricks
    type JobSettings struct {
        Name string `json:"name,omitempty" tf:"default:Untitled"`

        // BEGIN Jobs API 2.0
        ExistingClusterID      string              `json:"existing_cluster_id,omitempty" tf:"group:cluster_type"`
        NewCluster             *clusters.Cluster   `json:"new_cluster,omitempty" tf:"group:cluster_type"`
        NotebookTask           *NotebookTask       `json:"notebook_task,omitempty" tf:"group:task_type"`
        SparkJarTask           *SparkJarTask       `json:"spark_jar_task,omitempty" tf:"group:task_type"`
        SparkPythonTask        *SparkPythonTask    `json:"spark_python_task,omitempty" tf:"group:task_type"`
        SparkSubmitTask        *SparkSubmitTask    `json:"spark_submit_task,omitempty" tf:"group:task_type"`
        PipelineTask           *PipelineTask       `json:"pipeline_task,omitempty" tf:"group:task_type"`
        PythonWheelTask        *PythonWheelTask    `json:"python_wheel_task,omitempty" tf:"group:task_type"`
        Libraries              []libraries.Library `json:"libraries,omitempty" tf:"slice_set,alias:library"`
        TimeoutSeconds         int32               `json:"timeout_seconds,omitempty"`
        MaxRetries             int32               `json:"max_retries,omitempty"`
        MinRetryIntervalMillis int32               `json:"min_retry_interval_millis,omitempty"`
        RetryOnTimeout         bool                `json:"retry_on_timeout,omitempty"`
        // END Jobs API 2.0

        // BEGIN Jobs API 2.1
        Tasks       []JobTaskSettings `json:"tasks,omitempty" tf:"alias:task"`
        Format      string            `json:"format,omitempty" tf:"computed"`
        JobClusters []JobCluster      `json:"job_clusters,omitempty" tf:"alias:job_cluster"`
        // END Jobs API 2.1

        // BEGIN Jobs + Repo integration preview
        GitSource *GitSource `json:"git_source,omitempty"`
        // END Jobs + Repo integration preview

        Schedule           *CronSchedule       `json:"schedule,omitempty"`
        MaxConcurrentRuns  int32               `json:"max_concurrent_runs,omitempty"`
        EmailNotifications *EmailNotifications `json:"email_notifications,omitempty" tf:"suppress_diff"`
        Tags               map[string]string   `json:"tags,omitempty"`
    }
  10. Custom overlay on top of the Terraform SDK to enable rapid growth of the provider with new features
  11. Make contributing new resource super simple

    Focus on nicer error messaging for the end user, automate everything with 5+ occurrences, recover from panics, hook into context.

    func (r Resource) ToResource() *schema.Resource {
        read := func(ctx context.Context, d *schema.ResourceData, m interface{}) diag.Diagnostics {
            err := recoverable(r.Read)(ctx, d, m.(*DatabricksClient))
            if IsMissing(err) {
                log.Printf("[INFO] %s[id=%s] is removed on backend",
                    ResourceName.GetOrUnknown(ctx), d.Id())
                d.SetId("")
                return nil
            }
            if err != nil {
                err = nicerError(ctx, err, "read")
                return diag.FromErr(err)
            }
            return nil
        }
        return &schema.Resource{
            Schema: r.Schema, // with ForceNew to all attributes if no Update
            …
            CreateContext: func(ctx context.Context, d *schema.ResourceData, m interface{}) diag.Diagnostics {
                c := m.(*DatabricksClient)
                err := recoverable(r.Create)(ctx, d, c)
                if err != nil {
                    err = nicerError(ctx, err, "create")
                    return diag.FromErr(err)
                }
                if err = recoverable(r.Read)(ctx, d, c); err != nil {
                    err = nicerError(ctx, err, "read")
                    return diag.FromErr(err)
                }
                return nil
            },
            ReadContext:   read,
            UpdateContext: update,
            DeleteContext: …,
            Importer: &schema.ResourceImporter{
                StateContext: func(...) {
                    d.MarkNewResource()
                    diags := read(ctx, d, m)
                    …
                    return []*schema.ResourceData{d}, err
                },
            },
            Timeouts: r.Timeouts,
        }
    }

    Slide callouts:
    ▪ no panic
    ▪ cleanup
    ▪ Diags are inconvenient
    ▪ auto CRUD vs CRD
    ▪ Correct creation with drift detection in one single place
    ▪ Default importers
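
Slide 11 wraps every callback in `recoverable(...)` and treats a missing backend object as drift rather than an error. The slide does not show how `recoverable` is written, so the sketch below is only one plausible shape, using Go's `defer` and `recover`. The `crudFunc` type, `errMissing`, and the simplified `read` signature are hypothetical and stand in for the SDK types.

```go
package main

import (
	"errors"
	"fmt"
)

// crudFunc is a simplified, hypothetical stand-in for a resource callback.
type crudFunc func(id string) error

// errMissing is a hypothetical marker for "object no longer exists on the backend".
var errMissing = errors.New("not found")

// recoverable wraps a callback so that a panic inside it becomes an ordinary error.
func recoverable(f crudFunc) crudFunc {
	return func(id string) (err error) {
		defer func() {
			if r := recover(); r != nil {
				err = fmt.Errorf("panic: %v", r)
			}
		}()
		return f(id)
	}
}

// read returns the ID to keep in state. An empty ID means the resource is gone
// on the backend, so the next plan would recreate it.
func read(f crudFunc, id string) (string, error) {
	err := recoverable(f)(id)
	if errors.Is(err, errMissing) {
		return "", nil
	}
	if err != nil {
		return id, fmt.Errorf("cannot read %s: %w", id, err)
	}
	return id, nil
}

func main() {
	// A callback that panics by writing to a nil map.
	broken := func(id string) error {
		var m map[string]string
		m[id] = "boom"
		return nil
	}
	id, err := read(broken, "abc")
	fmt.Println(id, err)

	// A callback that reports the object as missing.
	gone := func(id string) error { return errMissing }
	id, err = read(gone, "abc")
	fmt.Printf("%q %v\n", id, err)
}
```

Putting the wrapper in one place means individual resources return plain `error` values and never need their own panic handling.
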
  12. context.WithValue

    func (c *DatabricksClient) userAgent(ctx context.Context) string {
        resource := "unknown"
        terraformVersion := "unknown"
        if rn, ok := ctx.Value(ResourceName).(string); ok {
            resource = rn
        }
        if c.Provider != nil {
            terraformVersion = c.Provider.TerraformVersion
        }
        return fmt.Sprintf("databricks-tf-provider/%s (+%s) terraform/%s",
            Version(), resource, terraformVersion)
    }

    func AddContextToAllResources(p *schema.Provider, prefix string) {
        for k, r := range p.DataSourcesMap {
            name := strings.ReplaceAll(k, prefix+"_", "")
            wrap := op(r.ReadContext).addContext(ResourceName, name).addContext(IsData, "yes")
            r.ReadContext = schema.ReadContextFunc(wrap)
        }
        for k, r := range p.ResourcesMap {
            k = strings.ReplaceAll(k, prefix+"_", "")
            addContextToResource(k, r)
        }
    }
  13. Reverse engineering deployments for customers who don’t do IaC yet
  14. List all relevant resources, discover relevant resources

    "databricks_repo": {
        Service: "repos",
        Name: func(d *schema.ResourceData) string {
            name := d.Get("path").(string)
            if name == "" {
                return d.Id()
            }
            return strings.TrimPrefix(name, "/")
        },
        List: func(ic *importContext) error {
            repoList, err := repos.NewReposAPI(ic.Context, ic.Client).ListAll()
            if err != nil {
                return err
            }
            for offset, repo := range repoList {
                if repo.Url != "" {
                    ic.Emit(&resource{
                        Resource: "databricks_repo",
                        ID:       fmt.Sprintf("%d", repo.ID),
                    })
                }
                log.Printf("[INFO] Scanned %d of %d repos", offset+1, len(repoList))
            }
            return nil
        },
        Import: func(ic *importContext, r *resource) error {
            if ic.meAdmin {
                ic.Emit(&resource{
                    Resource: "databricks_permissions",
                    ID:       fmt.Sprintf("/repos/%s", r.ID),
                    Name:     "repo_" + ic.Importables["databricks_repo"].Name(r.Data),
                })
            }
            return nil
        },
    },
  15. Referential Integrity resolution

    "databricks_group_instance_profile": {
        Service: "access",
        Depends: []reference{
            {Path: "group_id", Resource: "databricks_group"},
            {Path: "instance_profile_id", Resource: "databricks_instance_profile"},
        },
    },

    Before resolution:

    resource "databricks_instance_profile" "instance_profile" {
      // ID: bcd
      instance_profile_arn = "my_instance_profile_arn"
    }

    resource "databricks_group" "my_group" {
      // ID: abc
      display_name = "my_group_name"
    }

    resource "databricks_group_instance_profile" "my_group_instance_profile" {
      group_id            = "abc"
      instance_profile_id = "bcd"
    }

    After resolution:

    resource "databricks_instance_profile" "instance_profile" {
      instance_profile_arn = "my_instance_profile_arn"
    }

    resource "databricks_group" "my_group" {
      display_name = "my_group_name"
    }

    resource "databricks_group_instance_profile" "my_group_instance_profile" {
      group_id            = databricks_group.my_group.id
      instance_profile_id = databricks_instance_profile.instance_profile.id
    }
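
Slide 15 replaces literal IDs with references to the resources that own those IDs, driven by the `Depends` list. The sketch below illustrates that substitution over plain maps. The `reference` struct copies the two fields visible on the slide, while `resolve` and the `known` index are hypothetical and simplify whatever the exporter really does.

```go
package main

import "fmt"

// reference mirrors the two fields visible on the slide.
type reference struct {
	Path     string // attribute that holds an ID
	Resource string // resource type that owns that ID
}

// resolve renders attribute values as HCL expressions. A value becomes a
// reference when its ID belongs to an already exported resource, and stays
// a quoted literal otherwise. known maps "type/ID" to the Terraform name.
func resolve(attrs map[string]string, deps []reference, known map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range attrs {
		out[k] = fmt.Sprintf("%q", v) // quoted literal by default
	}
	for _, dep := range deps {
		id, ok := attrs[dep.Path]
		if !ok {
			continue
		}
		if name, found := known[dep.Resource+"/"+id]; found {
			out[dep.Path] = fmt.Sprintf("%s.%s.id", dep.Resource, name)
		}
	}
	return out
}

func main() {
	deps := []reference{
		{Path: "group_id", Resource: "databricks_group"},
		{Path: "instance_profile_id", Resource: "databricks_instance_profile"},
	}
	known := map[string]string{
		"databricks_group/abc":            "my_group",
		"databricks_instance_profile/bcd": "instance_profile",
	}
	attrs := map[string]string{"group_id": "abc", "instance_profile_id": "bcd"}

	resolved := resolve(attrs, deps, known)
	fmt.Println("group_id =", resolved["group_id"])
	fmt.Println("instance_profile_id =", resolved["instance_profile_id"])
}
```

Leaving unknown IDs as literals keeps the generated configuration valid even when a dependency was not exported.
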
  16. Similar tools for reverse-engineering deployments

    1. Aztfy (https://azure.github.io/aztfy/#1) - Azure-only, though it has a great introduction to the intentions behind this tooling.
    2. Terraforming (http://terraforming.dtan4.net/) - AWS-only.
    3. Terraformer (https://github.com/GoogleCloudPlatform/terraformer) - at the time of writing, it was easier to ship the exporter within the same artifact as the Terraform provider itself, because keeping resources and resource schemas together was a concern.
    4. TerraCognita (https://github.com/cycloidio/terracognita/) - at the time of creation, there was no significant dev activity.
  17. HTTP Fixture framework: a strange testing technique that saves a plethora of time
  18. Make testing fun again

    func TestSomethingExciting(t *testing.T) {
        qa.ResourceFixture{
            Fixtures: []qa.HTTPFixture{
                //.. pure TDD!
            },
            Resource: ResourceCatalog(),
            Create:   true,
            HCL: `
            name = "a"
            comment = "b"
            properties = {
                c = "d"
            }
            `,
        }.ApplyNoError(t)
    }

    func TestCatalogCornerCases(t *testing.T) {
        qa.ResourceCornerCases(t, ResourceCatalog())
    }

    Slide callouts:
    ▪ The failure that almost fixes itself
    ▪ Simplest heuristics
    ▪ change (two places)
  19. After two years in practice

    Tests read almost like documentation, which makes it easier for new people to respond to escalations. Simple copy-paste creates bug-reproduction scaffolding that is immediately debuggable.

    func TestCatalogCreateAlsoDeletesDefaultSchema(t *testing.T) {
        qa.ResourceFixture{
            Fixtures: []qa.HTTPFixture{
                {
                    Method:   "POST",
                    Resource: "/api/2.0/unity-catalog/catalogs",
                    ExpectedRequest: CatalogInfo{
                        Name:    "a",
                        Comment: "b",
                        Properties: map[string]string{
                            "c": "d",
                        },
                    },
                },
                {
                    Method:   "DELETE",
                    Resource: "/api/2.0/unity-catalog/schemas/a.default",
                },
                {
                    Method:   "GET",
                    Resource: "/api/2.0/unity-catalog/catalogs/a",
                    Response: CatalogInfo{
                        Name:    "a",
                        Comment: "b",
                        Properties: map[string]string{
                            "c": "d",
                        },
                        MetastoreID: "e",
                        Owner:       "f",
                    },
                },
            },
            Resource: ResourceCatalog(),
            Create:   true,
            HCL: `
            name = "a"
            comment = "b"
            properties = {
                c = "d"
            }
            `,
        }.ApplyNoError(t)
    }

    Slide callout: Favorite button
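
The fixtures on slide 19 describe the exact HTTP calls a resource is expected to make. The `qa` package itself is not shown in the deck, so the sketch below only illustrates the underlying idea with the standard `net/http/httptest` package. The `fixture` type and `fixtureServer` are hypothetical, and replying to an unmatched request with a ready-to-paste stub is an assumption about what "the failure that almost fixes itself" might mean.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fixture is a hypothetical, simplified HTTP fixture: one expected call and its canned reply.
type fixture struct {
	Method, Resource, Response string
}

// fixtureServer starts a local server that answers only the listed calls.
// Any other request gets a 500 whose body is a stub of the missing fixture.
func fixtureServer(fixtures []fixture) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		for _, f := range fixtures {
			if f.Method == r.Method && f.Resource == r.URL.Path {
				fmt.Fprint(w, f.Response)
				return
			}
		}
		stub := fmt.Sprintf("missing fixture: {Method: %q, Resource: %q}", r.Method, r.URL.Path)
		http.Error(w, stub, http.StatusInternalServerError)
	}))
}

// get performs a GET request and returns the status code and body.
func get(url string) (int, string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body), err
}

func main() {
	srv := fixtureServer([]fixture{
		{Method: "GET", Resource: "/api/2.0/unity-catalog/catalogs/a", Response: `{"name":"a"}`},
	})
	defer srv.Close()

	status, body, err := get(srv.URL + "/api/2.0/unity-catalog/catalogs/a")
	fmt.Println(status, body, err)

	status, body, err = get(srv.URL + "/api/2.0/unity-catalog/schemas/a.default")
	fmt.Println(status, body, err)
}
```

Because the server is local and deterministic, a test built this way needs no cloud credentials and runs quickly, which is consistent with the ~1200 tests in 40 seconds reported on slide 4.
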
  20. [Charts] Unit test coverage trend and customer adoption trend, annotated with "Adoption jump"
  21. There’s more than just unit tests
  22. [image] is awesome
  23. [image-only slide]
  24. It’s actually awesome until you hit an issue like this:
  25. Acceptance testing across AWS, Azure, and GCP

    With 90% unit test coverage, acceptance tests focus on making sure that 8 different authentication methods are working, API serialization is up-to-date, and there is no unexpected configuration drift.
  26. Always make data-driven decisions

    Enhance the most used, test the most critical, automate the reporting as much as possible, try to keep up with the features behind gated enablement, keep backwards compatibility almost forever.
  27. Thank you