Documenting your open source projects for machines

Jacob Tomlinson
Open Source Software Maintainer

The software library user is changing
Users used to read your docs and write the code themselves. Now they ask a coding agent to do it for them. The agent reads the docs and writes the code. Or does it?

Agents are your users
If your library is hard for an agent, it's invisible to a growing slice of your users.

How agents get their knowledge
Training: They ingest documentation and code into their weights. This is biased towards libraries that are better represented in the training set.
Context: They use tools to fill gaps, get up-to-date information, or pull out specific details. If something is not in their weights they need to add it to their context.

Ask your agent to "Analyse this CSV"
────────────────────────────────────────────────────────────────────────────────────────
❯ Can you analyse the titanic.csv data and group by age range and mortality?
────────────────────────────────────────────────────────────────────────────────────────
Let me read the Titanic CSV to understand its structure first.
Reading 1 file, running 1 shell command…
  ⎿ $ python3 -c "
    import pandas as pd
    df = pd.read_csv('titanic.csv')
    bins = [0, 12, 18, 30, 45, 60, 100]
    labels = ['0-12', '13-18', '19-30', '31-45', '46-60', '61+']
    df['AgeRange'] = pd.cut(df['Age'], bins=bins, labels=labels, right=True)
    result = df.groupby('AgeRange', observed=True).agg(
        Total=('Su…
The agent doesn't read anything, it just uses pandas.

Agents lean on what they know
Left unsteered, an agent reaches for the entrenched libraries already in its weights.
Getting new libraries into the weights takes years of building an online presence.
Internal, proprietary, and niche libraries will never get there.
If you steer the agent to the library of your choice it has to discover the rest for itself.

    # AGENTS.md
    When asked to analyse data use the `supercsv` library
How does the agent learn about supercsv? Nothing in its weights. So it has to go looking.

Everything it reads is its documentation
Installed source in site-packages is version-exact and offline.
Tests show real, working, idiomatic usage.
README examples give a quick orientation.
Search will follow conventions like "supercsv CSV analysis 2026".
The docs site may be discovered via search, but might describe a different version from the one you have installed.

It opens __init__.py first
The most information per token.
__all__ tells it your public API.
Imports show where everything lives.
A module docstring can hand it a working example.
Miss this and it guesses, reaching for internals you never meant to expose.
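To get a rough feel for what an agent picks up from that first read, the sketch below pulls the module docstring and `__all__` out of an importable package. It uses the standard library's `json` package as a stand-in for your own library, and `summarise_package` is a hypothetical helper name, not part of any tool mentioned in this talk.

```python
import importlib


def summarise_package(name: str) -> dict:
    """Collect the cheap, high-signal facts exposed by a package's __init__.py."""
    module = importlib.import_module(name)
    docstring = (module.__doc__ or "").strip()
    return {
        # First docstring line: usually a one-sentence description.
        "summary": docstring.splitlines()[0] if docstring else "",
        # __all__ is the declared public API; without it a reader has to guess.
        "public_api": sorted(getattr(module, "__all__", [])),
        # Where the installed, version-exact source lives on disk.
        "location": getattr(module, "__file__", None),
    }


if __name__ == "__main__":
    info = summarise_package("json")  # stand-in for your own library
    print(info["summary"])
    print(info["public_api"])
```

If `public_api` comes back empty or `summary` is blank for your own package, that is a hint the agent will have to dig further than it should.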

Leave a breadcrumb trail
Discoverability is key, nudge it to resources you want it to read.
    """supercsv: super analysis for tabular data.
    Docs: https://supercsv.dev/llms.txt
    Source: https://github.com/supercsv/supercsv
    Examples: see the bundled docs/cookbook/ folder.
    """

Progressive disclosure
Reveal detail only when it's needed.
An old interface design idea. Show the essentials and keep the detail one step away.
A settings screen does this. So does an agent skill.
Follow the trail to the part you need and ignore the rest.

Making your library agent-readable

Publish markdown documentation
Clean, link-navigable markdown an agent can cheaply read.
With HTML the agent pays to parse nav bars and CSS. With markdown it doesn't.
Much of the community is rallying around llms.txt as a standard.
See https://llmstxt.org/
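As a rough illustration of what such an index can look like, here is a minimal sketch that renders an llms.txt-style file: a title, a one-line summary in a blockquote, then sections of links with a short description each. The `DocPage` class, `render_llms_txt` function, and the page URLs are all hypothetical; check https://llmstxt.org/ for the actual conventions before relying on this layout.

```python
from dataclasses import dataclass


@dataclass
class DocPage:
    title: str
    url: str
    description: str


def render_llms_txt(project: str, summary: str, sections: dict[str, list[DocPage]]) -> str:
    """Render a small llms.txt-style markdown index (layout is an assumption)."""
    lines = [f"# {project}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            # One link per line, with a description so the agent can pick cheaply.
            lines.append(f"- [{page.title}]({page.url}): {page.description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical pages for the talk's made-up supercsv library.
    index = render_llms_txt(
        "supercsv",
        "Super analysis for tabular data.",
        {"Docs": [DocPage("Quickstart", "https://supercsv.dev/quickstart.md", "Load and group a CSV.")]},
    )
    print(index)
```

In practice a tool such as sphinx-llm would generate this for you; the sketch only shows how little structure the file needs.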

sphinx-llm
Markdown in, enriched markdown out.
Generates llms.txt and llms-full.txt alongside your HTML.
Runs a parallel markdown Sphinx build and merges it into the output.
Runs all extensions including autodoc and intersphinx.
Works with RST, MyST, whatever you already write.
    pip install sphinx-llm
https://github.com/NVIDIA/sphinx-llm

Also ship it in the wheel
Docs in the artifact match the version someone actually installed.
The live website is usually "latest", and may describe an API they don't have.
Breadcrumb from __init__.py → docs/index.md.
"Won't this bloat my wheel?" It's just text, it compresses nicely.
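Once the markdown ships inside the package, it can be found with the standard library alone. The sketch below assumes the docs live in a `docs/` folder inside the package itself (so they are installed next to `__init__.py`); `read_bundled_doc` is a hypothetical helper name.

```python
from importlib.resources import files


def read_bundled_doc(package: str, relative_path: str = "docs/index.md") -> str:
    """Read a markdown file shipped inside an installed package."""
    resource = files(package)
    # Walk the path one segment at a time so this works for any package layout
    # that importlib.resources can traverse.
    for part in relative_path.split("/"):
        resource = resource / part
    return resource.read_text(encoding="utf-8")
```

Calling `read_bundled_doc("supercsv")` would then return the index for exactly the version that is installed, with no network access needed.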

You don't need to relearn how to write docs

Diátaxis
diataxis.fr

Libraries are shipping skills
The Agent Skills standard.
A skill is a folder with a SKILL.md: name, description, instructions.
It can bundle scripts, references, and assets.
The description loads up front. The body loads when a task matches.
https://agentskills.io
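To make the folder shape concrete, here is a minimal sketch that writes a skill folder containing a SKILL.md. It assumes the name and description sit in a YAML front matter block ahead of the markdown instructions; treat that layout, and the `write_skill` helper itself, as assumptions and check https://agentskills.io for the actual required fields.

```python
from pathlib import Path


def write_skill(folder: Path, name: str, description: str, instructions: str) -> Path:
    """Create a skill folder with a SKILL.md (front matter layout is an assumption)."""
    folder.mkdir(parents=True, exist_ok=True)
    skill_file = folder / "SKILL.md"
    skill_file.write_text(
        "---\n"
        f"name: {name}\n"
        # Keep the description short and plain: it is the part loaded up front.
        f"description: {description}\n"
        "---\n\n"
        # The body is the how-to the agent reads once the task matches.
        f"{instructions.strip()}\n",
        encoding="utf-8",
    )
    return skill_file
```

The values are written as plain text, so a description containing characters that are special in YAML would need quoting; a real generator should use a YAML library.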

A skill is a recipe
It's a how-to guide the agent loads when the task matches.

NVIDIA verified skills
github.com/NVIDIA/skills
About 110 skills across NVIDIA's libraries, including RAPIDS and cuDF.
Each ships a SKILL.md, a governance card, a signature, and eval datasets.
Skills overlap your human docs, so you can reuse what you already have.

Deep modules
Matt Pocock makes the case for deep modules: a lot of behaviour behind a small interface.
Agents left unguided fragment everything into shallow files.
Give the agent a small number of things to reason about.
https://www.aihero.dev/skills-improve-codebase-architecture

Chunk your docs like your code
The code layout is the doc layout.
A deep module is one coherent thing, so its docs are one coherent chunk.
The boundary you drew in the code is the boundary for your doc files.
Don't fragment a doc file mid-concept any more than you'd fragment a module.
An index with a good description per file is what makes that cheap to navigate.

Type hints are documentation
The agent reads them directly.
    def read_csv(path: Path, *, sep: str = ",") -> DataFrame
says more than a paragraph.
Annotations are an API contract the agent reads straight from the code.
More information in the code means better generated code.
It's why people are so positive about pairing agents with languages like Rust. Type hints bring some of that to Python.
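The signature above can be read mechanically as well as by eye. The sketch below defines a stub with the same signature and inspects it with the standard library; `DataFrame` here is a placeholder class standing in for a real table type, and the function body is omitted.

```python
import inspect
from pathlib import Path
from typing import get_type_hints


class DataFrame:
    """Placeholder standing in for a real table type."""


def read_csv(path: Path, *, sep: str = ",") -> DataFrame:
    """Load a CSV file into a DataFrame (body omitted in this sketch)."""
    return DataFrame()


# The annotations say what goes in and what comes out.
hints = get_type_hints(read_csv)

# The signature adds how each argument must be passed and what it defaults to.
signature = inspect.signature(read_csv)
sep_param = signature.parameters["sep"]

if __name__ == "__main__":
    print(hints)
    print(sep_param.kind, repr(sep_param.default))
```

From the code alone a reader learns that `path` is a `Path`, that `sep` is keyword-only with a default of a comma, and that a `DataFrame` comes back.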

Error messages are documentation
They close a self-correction loop.
I've always liked errors that point at a URL.
Push it further and have the error name the doc file to read.
Use the same wording in the error and the docs.
When it fires the agent knows where to go and fixes itself.
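One way to do that is to make the doc reference a required part of the exception. This is a minimal sketch: `SuperCSVError`, `check_separator`, and the `docs/how-to/separators.md` path are hypothetical names invented for the talk's made-up supercsv library.

```python
class SuperCSVError(Exception):
    """Hypothetical base error that always names the doc file to read."""

    def __init__(self, message: str, doc: str) -> None:
        self.doc = doc
        # The doc path is part of the message, so it shows up in any traceback.
        super().__init__(f"{message} See {doc}")


def check_separator(sep: str) -> str:
    """Validate a separator, pointing at the relevant how-to guide on failure."""
    if len(sep) != 1:
        raise SuperCSVError(
            f"Separator must be a single character, got {sep!r}.",
            doc="docs/how-to/separators.md",
        )
    return sep
```

Because the doc path travels with the exception, an agent that hits the error in a traceback has its next file to open spelled out.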

QA your docs with an agent
Something to try this week.
Create your breadcrumbs from your README.md and __init__.py.
Build your wheel with your docs inside.
Hand it to an agent.
Ask it to do a thing.
See how it worked.
Iterate.
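Before handing the wheel to an agent, it is worth confirming the docs actually made it in. A wheel is a zip archive, so a quick check needs only the standard library; `markdown_files_in_wheel` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def markdown_files_in_wheel(wheel_path: Path) -> list[str]:
    """List the markdown files packaged inside a built wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(name for name in wheel.namelist() if name.endswith(".md"))
```

An empty list means your packaging configuration is not including the docs folder yet.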

…so, ship man pages again
None of this is new, it's just the Unix philosophy to the rescue again.

Thank you
Questions?
github.com/NVIDIA/sphinx-llm
github.com/NVIDIA/skills
jacobtomlinson.dev
