Building Databricks integrations with Go

Serge Smertin

June 13, 2024

Transcript

  1. ©2024 Databricks Inc. — All rights reserved

    CAN BE EVERYTHING: FROM INFRASTRUCTURE TO APPLIED LLM USE
  2. About Serge
    ▪ At Databricks since 2019
    ▪ Author of the Databricks Terraform Provider (written in Go)
    ▪ Author of the Databricks SDK for Go (and Python, Java, …)
    ▪ Initiated the replatforming of the Databricks CLI from Python to Go
    ▪ Driving Databricks Labs
    ▪ Has had a love-hate relationship with Go since 2020
  3. - RELEASE NOTES WITH LLMS
    - CLEANUP OF DATABRICKS WORKSPACES
  4. ~50 REPOSITORIES
  5. Seven steps:
    1. UNRELEASED COMMITS
    2. GIT DIFF PER COMMIT
    3. SPLIT DIFF PER FILE
    4. SUMMARIZE FILE CHANGE
    5. SUMMARIZE ALL CHANGES
    6. WRITE CHANGELOG.md
    7. WRITE ANNOUNCEMENT
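Step 3 (splitting a commit diff per file) needs only the Go standard library. The sketch below is an illustration, not the code behind the slides: it assumes the input is in `git diff` format, where every file section begins with a `diff --git a/<path> b/<path>` header, and the `FileDiff` type and the `splitDiff` and `pathFromHeader` functions are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// FileDiff is one file's section of a commit-level unified diff.
type FileDiff struct {
	Path string // file path taken from the "b/" side of the header
	Body string // the whole section, "diff --git" header included
}

const fileHeader = "diff --git "

// splitDiff cuts the output of `git diff` (or a GitHub ".diff" URL) into
// per-file sections. Each section starts with a "diff --git" line, which is
// used as the boundary; anything before the first header is dropped.
func splitDiff(diff string) []FileDiff {
	var files []FileDiff
	var current []string // lines of the section being collected
	flush := func() {
		if current == nil {
			return
		}
		files = append(files, FileDiff{
			Path: pathFromHeader(current[0]),
			Body: strings.Join(current, "\n"),
		})
		current = nil
	}
	for _, line := range strings.Split(strings.TrimRight(diff, "\n"), "\n") {
		if strings.HasPrefix(line, fileHeader) {
			flush()
			current = []string{line}
			continue
		}
		if current != nil {
			current = append(current, line)
		}
	}
	flush()
	return files
}

// pathFromHeader extracts "x/y.go" from "diff --git a/x/y.go b/x/y.go".
// Quoted paths and paths with unusual characters are not handled here.
func pathFromHeader(header string) string {
	rest := strings.TrimPrefix(header, fileHeader)
	if i := strings.LastIndex(rest, " b/"); i >= 0 {
		return rest[i+len(" b/"):]
	}
	return ""
}

func main() {
	diff := "diff --git a/go.mod b/go.mod\n--- a/go.mod\n+++ b/go.mod\n@@ -1 +1 @@\n-go 1.21\n+go 1.22\n" +
		"diff --git a/main.go b/main.go\n--- a/main.go\n+++ b/main.go\n@@ -1 +1 @@\n-package foo\n+package main\n"
	for _, f := range splitDiff(diff) {
		fmt.Printf("%s: %d lines\n", f.Path, strings.Count(f.Body, "\n")+1)
	}
}
```

Each `FileDiff.Body` can then be sent to the model on its own (step 4), which keeps individual prompts small even for large commits.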
  6. var fileDiffTemplate = MessageTemplate(`
    Here is the commit message terminated by --- for the context:
    {{.Message}}
    ---
    Do not hallucinate. You are a Staff Software Engineer, and you are reviewing
    one file at a time in a unified diff format. Do not use phrases like
    "In this diff", "In this pull request", or "In this file". Do not mention
    file names, because they are not relevant for the feature description.
    If new methods are added, explain what these methods are doing. If existing
    functionality is changed, explain the scope of these changes. Please
    summarize the input as a single paragraph of text written in American
    English. Your target audience is software engineers, who adopt your project.
    If the prompt contains ordered or unordered lists, rewrite the entire
    response as a paragraph of text.
    `)

    func (lln *llNotes) Commit(ctx context.Context, commit *github.RepositoryCommit) (History, error) {
        …
        // Download the whole commit as a unified diff.
        err := lln.http.Do(ctx, "GET",
            fmt.Sprintf("https://github.com/%s/%s/commit/%s.diff", lln.org, lln.repo, commit.SHA),
            httpclient.WithResponseUnmarshal(&buf))
        var httpErr *httpclient.HttpError
        if errors.As(err, &httpErr) && httpErr.StatusCode == 404 {
            // A missing commit becomes a note in the history instead of a failure.
            return History{
                AssistantMessage(fmt.Sprintf("Commit %s was not found", commit.SHA)),
            }, nil
        }
        if err != nil {
            // Any other download error must not be summarized as an empty diff.
            return nil, err
        }
        // Cap the commit message at 15,000 space-separated words,
        // presumably to keep the prompt within the model's context window.
        tokens := strings.Split(commit.Commit.Message, " ")
        if len(tokens) > 15_000 {
            commit.Commit.Message = strings.Join(tokens[:15_000], " ")
        }
        return lln.explainDiff(ctx, History{
            fileDiffTemplate.AsSystem(commit.Commit),
        }, &buf)
    }
  7. func (lln *llNotes) Talk(ctx context.Context, h History) (History, error) {
        logger.Debugf(ctx, "Talking with AI:\n%s", h.Excerpt(80))
        response, err := lln.w.ServingEndpoints.Query(ctx, serving.QueryEndpointInput{
            Name:      lln.model,
            Messages:  h.Messages(),
            MaxTokens: lln.cfg.MaxTokens,
        })
        if err != nil {
            return nil, fmt.Errorf("llm: %w", err)
        }
        for _, v := range response.Choices {
            h = h.With(AssistantMessage(v.Message.Content))
        }
        return h, nil
    }
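The `History` type behind `h.With(...)` and `h.Excerpt(80)` is not defined on the slides. One way to build it is sketched below, assuming `With` returns a new slice instead of mutating its receiver, so that several per-file conversations can branch safely from the same system prompt. `Message` and the exact truncation behavior of `Excerpt` are guesses, not the author's code.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical chat message: a role plus text content.
type Message struct {
	Role    string
	Content string
}

// AssistantMessage builds a message attributed to the model.
func AssistantMessage(content string) Message {
	return Message{Role: "assistant", Content: content}
}

// History is a hypothetical, append-only conversation log.
type History []Message

// With returns a copy of h extended by msgs; h itself is never modified,
// so two branches created from the same prefix cannot clobber each other.
func (h History) With(msgs ...Message) History {
	out := make(History, 0, len(h)+len(msgs))
	out = append(out, h...)
	return append(out, msgs...)
}

// Excerpt renders one line per message, with newlines flattened and the
// content cut to n runes, which keeps debug logs readable.
func (h History) Excerpt(n int) string {
	var sb strings.Builder
	for _, m := range h {
		r := []rune(strings.ReplaceAll(m.Content, "\n", " "))
		if len(r) > n {
			r = append(r[:n], '…')
		}
		fmt.Fprintf(&sb, "%s: %s\n", m.Role, string(r))
	}
	return sb.String()
}

func main() {
	base := History{{Role: "system", Content: "Summarize the diff as one paragraph."}}
	perFile := base.With(Message{Role: "user", Content: "diff --git a/go.mod b/go.mod"})
	fmt.Print(perFile.Excerpt(20))
	fmt.Println(len(base)) // the shared prefix is untouched
}
```

Copying in `With` costs one allocation per call, but it avoids the Go pitfall where two `append` calls on a slice with spare capacity write into the same backing array.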
  8. NO LANGCHAIN. PURE DATABRICKS REST API. ONE BINARY AS A RESULT.
  9. After two weeks of part-time effort, in between other work and meetings, I was able to reduce the release time from ~1 hour (explaining the release and writing announcements) to ~7 minutes (removing LLM hallucinations and other minor edits).
  10. HOW TO WIPE THOUSANDS OF TEST JOBS / USERS … AT SCALE
  11. Five steps:
    1. LIST ALL WORKSPACES
    2. LIST 30 OBJECT TYPES
    3. VERIFY EXPECTED CONFIGS
    4. WAIT FOR ANY CI JOBS TO FINISH
    5. RANDOMIZED PARALLEL DELETE