
From Days to Minutes: How We Taught an AI to Onboard 50+ Tenants on our AI Features

We run a multi-tenant AI platform on Databricks serving 50+ B2B wholesale customers. Each customer gets 5+ ML/Data products deployed as Databricks Asset Bundles across 3 environments. Configuring these AI products for a new customer used to take 2-3 days of ML engineer time: analyzing the customer's data, tuning algorithm parameters, and generating the right bundle configuration.

We reduced that to ~30 minutes by building two things: (1) dabgen, a code generator that sits above Databricks Asset Bundles and produces tenant-specific bundles from hierarchical config templates, and (2) Claude Code skills -- AI-powered workflows that query the customer's production data via the Databricks SQL MCP server, make data-driven configuration decisions, and generate the bundle with human confirmation at key checkpoints.


Miguel Cabrera

April 27, 2026


Transcript

  1. RUNBOOK № 01 · DATABRICKS UG BERLIN · 2026-04-28

    From Days to Minutes. How we taught an AI to onboard 50+ tenants on our AI features. AUTHOR Miguel Cabrera · ROLE AI/ML Engineering Lead · ORG Plato · platoapp.ai
  2. § 00 · AUTHOR NOTE

    About me. MIGUEL CABRERA · ROLE AI/ML Engineering Lead AT Plato · platoapp.ai · PRIOR Shopify · NewYorker · TrustYou · COMMUNITY Co-founder, Munich Datageeks; past organiser, PyData Berlin & PyBerlin

    ABOUT PLATO: SALES INTELLIGENCE FOR B2B WHOLESALERS. We predict which customers are about to leave, which products they should be buying, and which accounts deserve the most attention. 50+ TENANTS · 25+ ALGORITHMS · 100+ CATALOGS · ALL ON DATABRICKS.

    RUNBOOK · FROM DAYS TO MINUTES · DATABRICKS UG BERLIN · 2026-04-28 · MIGUEL CABRERA · PLATO
  3. § 01 · PROLOGUE · THE DESK VISIT

    The day a demo turned into an ultimatum. In B2B SaaS, every signed customer comes with a pilot. They give us their data and want to see the AI magic as soon as possible.

    “We have a demo tomorrow. Can we have the AI features live?” (Our Customer Success lead, standing at my desk.)

    CS is excited. I'm doing the math in my head.
  4. § 02 · PRODUCT · EXHIBIT A

    What the sales rep sees. FOUR SURFACES · ONE DAILY PIPELINE

    01 Churn signals: revenue impact
    02 Cross-sell: category & item
    03 Account timeline: next action
    04 ROI tracking: AI impact

    Tip of the iceberg. Generated daily for every customer. Now let me show you what it takes to build that.
  5. § 02 · SCOPE

    The math that broke us. What it actually is: two parallel tracks.

    TRACK 01 · INFRASTRUCTURE · Terraform: catalogs · S3 · IAM · service principals · databases. ONGOING EFFORT, NOT TODAY'S FOCUS.

    TRACK 02 · AI PRODUCTS · dabgen + skills: 5 product lines · 25+ algorithms · business rules. TODAY'S FOCUS, WAS ENTIRELY MANUAL.

    50+ tenants × 5 product lines × 25+ algorithms · 250+ job configs
    AI onboarding per customer: 2–3 days of ML engineer time
    Growth target: 10+ new customers / quarter

    We needed to go from artisan to factory, for the AI layer.
  6. § 02 · FIG. 1

    The end-to-end data flow. Every box is tenant-isolated. Everything on Databricks ships via DABs. The 2–3 days of onboarding? Configuration and hyperparameters of PlatoML algorithms.
  7. § 03 · What 2–3 days actually means.

    nolte.yml: the tenant-specific config override. 14+ lines. Each one is a data-driven decision.

    KNN VS INDUSTRY? Query the data.
    ALS RANK? Catalog density.
    REGULARIZATION? Matrix sparsity.
    CLUSTER COUNT? Measure the CV.
    REPEAT THRESHOLD? Check the p75.

        sales_insights:
          category_depth: 2                     # depths 1-4, 2 optimal
          revenue_estimation_method: combined   # industry_sector 0.05% NULL
        category_recommender:
          als_factors: 8                        # low rank, small catalog
          als_regularization: 8                 # high reg, sparse matrix
          als_iterations: 50                    # converges fast
        share_of_wallet:
          n_clusters: 11                        # CV=1.60 → moderate
          threshold_by_customer: 0.01           # sparse purchase matrix
        previously_purchased_category:
          min_repeat_purchase_rate: 0.35        # p75 = 36%
        item_recommendations:
          min_transactions: 50                  # noise floor

    This is what took 2–3 days per customer. Not the tooling: the decisions.
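The two statistics named on this slide can be sketched with the Python standard library. This is only an illustration: it reads "CV" as coefficient of variation (the slide does not expand the abbreviation), the sample figures are invented, and how the real skill maps these numbers to `n_clusters` or `min_repeat_purchase_rate` is not shown in the deck.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean


def percentile_75(values):
    """75th percentile, using the inclusive quantile method."""
    return statistics.quantiles(values, n=4, method="inclusive")[2]


# Hypothetical per-customer figures, not taken from the talk.
customer_revenue = [120.0, 95.0, 400.0, 60.0, 2500.0, 310.0, 75.0, 1800.0]
repeat_purchase_rates = [0.10, 0.22, 0.31, 0.36, 0.18, 0.40, 0.27, 0.35]

cv = coefficient_of_variation(customer_revenue)
p75 = percentile_75(repeat_purchase_rates)
print(f"revenue CV = {cv:.2f}")
print(f"repeat-purchase p75 = {p75:.2%}")
```

In the workflow described in the talk, the inputs would come from SQL against the tenant's catalog rather than from literals.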
  8. § 03 · PRIMER

    Quick primer: Databricks Asset Bundles (DABs).

    TYPICAL DAB LAYOUT

        my_pipeline/
        ├── databricks.yml
        └── resources/
            └── jobs.yml

    WHAT DABS GIVE YOU: version-controlled job YAML · multi-environment deploys · CI/CD integration

    WHAT DABS DON'T SOLVE: 50+ tenants, each needing a slightly different version of the same bundle. That's where we needed something more.

        # databricks.yml -- this is what a DAB looks like
        bundle:
          name: ai__nolte
        resources:
          jobs:
            churn_warnings:
              name: "[nolte] [sales_insights] generation"
              schedule:
                quartz_cron_expression: "0 0 2 ? * *"
              tasks:
                - task_key: calculate_churn
                  notebook_task:
                    notebook_path: ../jobs/churn_warnings.py
  9. § 03 · DABGEN

    Generating generators: dabgen. DABs don't have native multi-tenant provisioning. We needed a generator that generates the bundle configs.

    Claude Code skills · Intent layer · analyzes data, decides config · NEW · AI AUTOMATION
    ↓
    dabgen · Generation layer · config + templates → bundles · OUR TOOL
    ↓
    Databricks Asset Bundles · Deployment layer · YAML → deployed jobs · DATABRICKS NATIVE
    ↓
    Databricks · Runtime layer · jobs · clusters · catalogs · PLATFORM

    CDK → CloudFormation. Terragrunt → Terraform. dabgen → DABs.
  10. § 03 · EXHIBIT

    Override by example. Global defaults + a per-tenant file → the final merged config the bundle deploys.

    DEFAULT.YML.J2 · GLOBAL TEMPLATE

        production:
          sp_id: "ssm:///{{ tenant_slug }}/runner_id"
        sales_insights:
          category_depth: 2
          revenue_estimation_method: industry
        category_recommender:
          als_factors: 16
          als_regularization: 0.05
          als_iterations: 100
        share_of_wallet:
          n_clusters: 5

    NOLTE.YML · OVERRIDE

        sales_insights:
          revenue_estimation_method: combined
        category_recommender:
          als_factors: 8
          als_regularization: 8
          als_iterations: 50
        share_of_wallet:
          n_clusters: 11

    MERGED · WHAT DEPLOYS

        sales_insights:
          category_depth: 2
          revenue_estimation_method: combined
        category_recommender:
          als_factors: 8
          als_regularization: 8
          als_iterations: 50
        share_of_wallet:
          n_clusters: 11

    Tenant file wins on conflict. Fields the tenant omits inherit from defaults. Most tenants override fewer than 15 fields.

    MECHANISM: OmegaConf.merge(default, tenant) → resolve ssm:/// refs → Jinja2 render templates → bundles/dist/{tenant}/
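The "tenant wins, omitted fields inherit" rule can be sketched as a recursive dictionary merge. The slide says dabgen uses `OmegaConf.merge`; the `deep_merge` helper below is a hypothetical standard-library stand-in that shows the same behaviour on the values from the slide, and is not dabgen's code.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict: nested dicts merge, and override wins on conflict."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Values copied from the slide; the nesting is an assumption.
defaults = {
    "sales_insights": {"category_depth": 2, "revenue_estimation_method": "industry"},
    "category_recommender": {
        "als_factors": 16,
        "als_regularization": 0.05,
        "als_iterations": 100,
    },
    "share_of_wallet": {"n_clusters": 5},
}
nolte = {
    "sales_insights": {"revenue_estimation_method": "combined"},
    "category_recommender": {
        "als_factors": 8,
        "als_regularization": 8,
        "als_iterations": 50,
    },
    "share_of_wallet": {"n_clusters": 11},
}

merged = deep_merge(defaults, nolte)
print(merged["sales_insights"])
```

Because the merge returns a new dictionary, the global defaults stay untouched and can be reused for the next tenant.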
  11. § 03 · DABGEN

    dabgen: one command, 50+ tenants.

    default.yml.j2 (global defaults + SSM) + {tenant}.yml (14-line override)
    ▶ Merged config (OmegaConf merge)
    ▶ Resolved config (SSM refs injected)
    ▶ 5 product bundles / tenant (databricks.yml.j2 + resources/*/jobs.yml.j2 → dozens of YAML files)

    Every tenant. Regenerated in seconds.

        python scripts/dabgen.py --auto -w --sync
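The "SSM refs injected" step can be pictured as a walk over the merged config that swaps every `ssm:///` string for a looked-up value. The sketch below is an assumption about the shape of that step, not dabgen's implementation: `resolve_refs` and `fake_store` are hypothetical names, and the in-memory dictionary stands in for whatever parameter store "SSM" refers to (presumably AWS Systems Manager Parameter Store, which the slides do not spell out).

```python
SSM_PREFIX = "ssm:///"


def resolve_refs(node, lookup):
    """Recursively replace 'ssm:///<path>' strings with lookup(path)."""
    if isinstance(node, dict):
        return {key: resolve_refs(value, lookup) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, lookup) for item in node]
    if isinstance(node, str) and node.startswith(SSM_PREFIX):
        return lookup(node[len(SSM_PREFIX):])
    return node


# Hypothetical stand-in for a parameter store; the id is made up.
fake_store = {"nolte/runner_id": "0000-example-id"}

config = {
    "production": {"sp_id": "ssm:///nolte/runner_id"},
    "share_of_wallet": {"n_clusters": 11},
}
resolved = resolve_refs(config, fake_store.__getitem__)
print(resolved["production"]["sp_id"])
```

Passing the lookup in as a callable keeps the walk testable without network access; a missing key raises `KeyError`, so an unresolved reference fails loudly instead of reaching a rendered bundle.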
  12. § 03 · SAFETY · FIRST-CLASS, NOT AN AFTERTHOUGHT

    "This is boring, and that’s the point."

    LOCAL · DABGEN: Preview (dry run) → Stage (bundles/tmp/) → Sync (bundles/dist/) → Validate (bundle validate) → Review (PR diff, human + @claude review)

    CI/CD · GITHUB ACTIONS: dab-check on PR · parallel bundle validate per tenant ▶ dab-deploy on merge to main · matrix deploy to prod

    AI generates → human + @claude review → CI validates → merge triggers deploy.
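A parallel "bundle validate per tenant" check could look roughly like the sketch below. It assumes that every directory under `bundles/dist/` containing a `databricks.yml` is one bundle, which the slides do not confirm, and the function names are hypothetical. `databricks bundle validate` is the Databricks CLI command; the `run` parameter exists only so the logic can be exercised without the CLI installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def find_bundles(dist_dir):
    """Return every directory under dist_dir that contains a databricks.yml."""
    return sorted(path.parent for path in Path(dist_dir).rglob("databricks.yml"))


def validate_bundle(bundle_dir, run=subprocess.run):
    """Run `databricks bundle validate` inside bundle_dir; return (dir, ok)."""
    result = run(
        ["databricks", "bundle", "validate"],
        cwd=bundle_dir,
        capture_output=True,
        text=True,
    )
    return bundle_dir, result.returncode == 0


def validate_all(dist_dir, run=subprocess.run, workers=8):
    """Validate all bundles concurrently; return the paths that failed."""
    bundles = find_bundles(dist_dir)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda bundle: validate_bundle(bundle, run), bundles))
    return [str(path) for path, ok in results if not ok]
```

A CI job would call `validate_all("bundles/dist")` and fail the build if the returned list is non-empty. The talk describes GitHub Actions handling the per-tenant fan-out, so this is a local approximation of the same check.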
  13. § 04 · DEFINITIONS

    What’s a Claude Code skill? Claude Code is Anthropic's CLI coding agent; it operates in your repo. SKILL.md files in .claude/skills/ tell it how to do specific workflows. It picks the right one based on your prompt.

    TRADITIONAL RUNBOOK · MARKDOWN DOC: "Run this SQL query…" "Check if the result is > 100…" "If yes, set parameter to X…" Engineer reads, interprets, executes.

    CLAUDE CODE SKILL · EXECUTABLE MARKDOWN: Runs the SQL query against Databricks. Analyzes the results programmatically. Makes data-driven parameter decisions. Asks for human confirmation only when needed.

    Same knowledge. The AI executes it. Our skills connect to the Databricks SQL MCP server to query Unity Catalog directly.
  14. § 04 · EXHIBIT

    A skill is just a file. Markdown frontmatter, steps, SQL snippets. No DSL. Versioned alongside the code it operates on.
  15. § 04 · TAXONOMY · THE SKILL FAMILY

    Eight composable skills.

    TIER 01 · ORCHESTRATORS: COMPOSE OTHER SKILLS
    tenant-onboarding: end-to-end customer onboarding
    bsr-issue-resolver: Linear ticket → investigation → PR

    TIER 02 · SPECIALISTS: DO ONE THING WELL
    bundle-generator: generate YAML
    bsr-annotation-gen: NL → regex
    bsr-executor: run on prod
    job-run-inspector: diagnose failures
    insight-triage: analyze feedback

    TIER 03 · PRIMITIVES: SHARED BUILDING BLOCKS
    notebook-runner: called by bsr-executor
    Databricks SQL MCP: SQL against Unity Catalog
    Linear MCP: tickets · comments · PRs

    Orchestrators chain specialists; specialists call primitives.
  16. § 04 · WORKFLOW

    What the onboarding skill does. 10-step automated workflow:

    01 Verify infrastructure & data history: catalogs, order volume
    02 Revenue estimation method: knn / industry / combined
    03 Category depth configuration: scores depths 1–4
    04 Previously purchased category: p75 repeat-purchase rate
    05 Share of Wallet params: cluster sizing via CV
    06 Inactive product filtering: remove noise
    07 Generate consolidated bundle: dabgen, 5 product bundles
    08 FBT parameters: item_detail_page thresholds
    09 Validate ITC/GTC coverage: training/generation data
    10 Commit and PR: human review

    Each step runs real SQL against production. Human checkpoints: config sign-off, PR approval. Re-run to catch drift.
  17. § 04 · KNOWLEDGE SCAFFOLDING

    The agent doesn’t know your stack. Claude Code knows bash, git, python. Your data model, your SQL dialect, your deploy order: everything else, you teach it.

    00 Documented data model · DATA_MODEL.md: schema, joins, semantics. Same prereq as Databricks AI/BI Genie. · FOUNDATION
    01 CLAUDE.md · project identity · key commands · repo layout · EVERY SESSION
    02 .claude/rules/ · standing knowledge: SQL style · logging · bundle ops · data-querying protocol · WHEN RELEVANT
    03 .claude/skills/ · workflows: onboard tenant · resolve BSR issue · generate bundle · ON DEMAND
    04 Tools · databricks cli · dabgen.py · MCP + query_databricks.py fallback · HOW IT ACTS

    Prompt-engineering gets you a demo. Knowledge scaffolding gets you production.
  18. § 05 · DEMO · RE-ONBOARDING TENANT "NOLTE"

    Onboarding a real customer.

    01 REAL SQL AGAINST UNITY CATALOG
    02 DATA-DRIVEN CONFIG
    03 2 HUMAN DECISIONS
  19. § 06 · RESULTS

    The scoreboard. AI PRODUCT ONBOARDING: BEFORE VS AFTER

    METRIC: BEFORE → AFTER
    AI config time: 2–3 days → ~30 min
    Config errors: 1–2 per run → 0 (validated)
    Engineer hrs/month: 20+ → ~5 (review)
    Tenants with all 5 products: ~60% → 100%
    Active tenant configs: N/A → 50+

    WHAT THE DEMO SHOWED: 16 SQL queries · 7 config decisions · dozens of files generated · 2 human decisions · 1 skill invoked

    Zero config errors. Every parameter validated against real data. Bonus: re-run on existing tenants to catch drift, such as data-model changes, catalog growth, behavior shifts.
  20. § 07 · TAKEAWAYS · PATTERNS TO STEAL

    Eight patterns you can steal.

    01 Generator of generators. Template the templates. CDK for CloudFormation, dabgen for DABs.
    02 Skills, not scripts. Convert runbooks into executable Claude Code skills.
    03 Safety workflows for AI. Stage, validate, then review. Never let AI deploy directly.
    04 Hierarchical config merging. Default + override. Most tenants need zero overrides.
    05 LLM as translator. Natural language to regex, blacklists, whitelists.
    06 Data model as foundation. AI agents need well-documented data. Same as Databricks AI/BI Genie.
    07 Compose skills into loops. Linear ticket → investigation → fix → PR. Fully autonomous.
    08 Knowledge scaffolding. CLAUDE.md → rules/ → skills/ → tools.
  21. § 07 · CALL TO ACTION · THREE THINGS FOR THIS WEEK

    Start here.

    1 Pick your most repetitive Databricks workflow. Write a Claude Code skill file for it. Just create .claude/skills/your-skill/SKILL.md.
    2 If you manage multiple tenants, build a config hierarchy. Default + override pattern. You'll be amazed how many tenants need zero overrides.
    3 Try Claude Code with the Databricks SQL MCP server. Let the AI query your catalogs directly. Start read-only. The results are startling.
  22. Questions?

    Remember that desk visit? Demo tomorrow, AI features live. Now when CS walks over, the AI products are configured in 30 minutes.

    Miguel Cabrera · AI/ML ENGINEERING LEAD · PLATO · PLATOAPP.AI · @MFCABRERA · GITHUB · LINKEDIN · X · SLIDES · SPEAKERDECK
  23. Backup slides

    multi-tenant pattern · autonomous loop
  24. BACKUP

    The multi-tenant pattern. EXAMPLE: TENANT_SLUG = NOLTE

    THE ONBOARDING PIPELINE: Terraform provisions infra → dbt ingests data → dabgen configures AI products → web app serves insights

    The slug is the glue. Every tool in the stack derives resource names from it.

    Data Catalog: customer__{tenant_slug}__{env} → customer__nolte__production
    AI Catalog: internal__{tenant_slug}__{env} → internal__nolte__production
    Service Principal: ai-{tenant_slug}-runner → ai-nolte-runner
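The naming convention on this slide is simple enough to express as one function. The sketch below uses the three patterns exactly as shown; the `TenantResources` class and `resources_for` function are hypothetical names introduced for illustration, not part of dabgen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantResources:
    data_catalog: str
    ai_catalog: str
    service_principal: str


def resources_for(tenant_slug, env):
    """Derive resource names from the tenant slug using the slide's patterns."""
    return TenantResources(
        data_catalog=f"customer__{tenant_slug}__{env}",
        ai_catalog=f"internal__{tenant_slug}__{env}",
        service_principal=f"ai-{tenant_slug}-runner",
    )


nolte = resources_for("nolte", "production")
print(nolte.data_catalog, nolte.ai_catalog, nolte.service_principal)
```

Keeping the derivation in one place means every tool that needs a catalog or service principal name asks the same function instead of rebuilding the string.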
  25. BACKUP · BSR-ISSUE-RESOLVER

    The autonomous loop: from ticket to fix.

    01 READ · Reads a Linear ticket: "Customer X wants to exclude shipping fees"
    02 QUERY · Investigates the data: queries Unity Catalog, analyzes revenue impact
    03 DESIGN · Designs regex patterns: German terms, compound words, case sensitivity
    04 VALIDATE · Validates against prod: RLIKE queries, confirms match counts
    05 EXECUTE · Executes business rules: runs notebooks on Databricks
    06 REPORT · Posts investigation report: results back to the Linear ticket
  26. § 04 · LLM AS TRANSLATOR

    The business rules magic. The hardest part isn't the SQL. It's translating business intent into machine-readable rules.

    01 · SALES INTERVIEW: "Exclude customers starting with Kasse or Bar."
    02 · REGEX PATTERN: (?i)^(Kasse|Bar)
    03 · VALIDATION: 23 matches, RLIKE against prod
    04 · BUSINESS RULE: BSR: blacklist + entity links

    Supports blacklists, whitelists, specific value conditions, all path-based and auditable. The LLM handles German business terms and validates every rule against real data.

    MORE EXAMPLES (BUSINESS REQUEST → GENERATED REGEX → MATCHES)
    "Exclude shipping fees" → (?i)versand.*kosten|verpackungskosten → 847 products, 98.8% revenue
    "Customers containing Zugabe" → (?i).Zugabe. → 12 matches
    "Products with asterisk prefix" → ^(*) → 34 matches
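The "validate every rule against real data" step can be rehearsed locally before any RLIKE query runs. The sketch below uses Python's `re` module with search semantics as an approximation of RLIKE (the two regex engines are similar but not identical), and the customer names are invented for illustration.

```python
import re


def find_matches(pattern, names):
    """Return the names in which the pattern matches anywhere (search semantics)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.search(name)]


# Hypothetical customer names, not taken from any tenant.
customers = ["Kasse 1", "BAR Verkauf", "Barbara Schmidt GmbH", "Müller AG", "Neue Kasse"]

matched = find_matches(r"(?i)^(Kasse|Bar)", customers)
print(len(matched), "matches:", matched)
```

Listing the matched names, and not only the count, is what makes the review useful: in this made-up sample the pattern also catches a customer whose name merely begins with "Bar", which is the kind of case a human checkpoint would want to see before the rule is applied.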
  27. Skills, Not Scripts

    "Runbooks document what to do. Skills do it."