Agents as Pets: An Experiment

A Borg cube in space with a cat sitting in a tiny illuminated cat flap on one of its faces.

Most AI coding agent runners I've used (or built) treat each invocation as disposable. New container, new context window, new checkout, fresh npm install, replay the conversation history if you want any of it back. The agent is a stateless function: input prompt, output diff, throw it away.

This is a perfectly good model. At work I built something that does exactly that, and for the workloads it serves (a fleet of short-lived agents, each handling one well-defined task) it's the right call. Cattle scale. Cattle isolate. Cattle don't surprise you with leftover state from yesterday.

But it's not the only model. And I got curious what the other end of the spectrum looks like. So I built Vinculum for fun, in a weekend, to find out.

Why Pets Might Be Interesting

The cattle pattern came from a real place. Reproducibility, isolation, easy cleanup. The same logic that made stateless web servers a good idea fifteen years ago. And for one-shot work (code review, batch refactors, anything you'd run from CI) it's clearly correct.

But not all agent work is one-shot. Some of it is a session. You're working on a thing, you poke it, you read the diff, you push back, you poke again. Each round depends on the round before. An agent that already knows we tried the obvious fix two prompts ago and it broke the test suite is a more useful collaborator than one that has to re-derive that from a 400-line conversation log on every boot.

And it's not just conversation. It's the workspace. Setting up a clone of a real repository takes minutes: the first run pulls a Docker image, runs npm install, indexes the codebase, maybe boots a dev server. The second run, in a fresh container, does it all again. That's a real tax, and one of the costs you implicitly accept when you go cattle.

What if you just… didn't? Kubernetes is also good at running long-lived stateful workloads.

Drones That Remember

Vinculum runs each agent as a long-lived Kubernetes Deployment. One pod per agent. The pod holds an open charmbracelet/crush session and a persistent /workspace PVC. You submit work to it as Task resources, which run serially inside the same pod. Conversation history accumulates. Filesystem state accumulates. The node_modules from the last task is still there for the next one.

Resistance is, in fact, fertile.

The four CRDs are the whole API surface:

apiVersion: vinculum.dev/v1alpha1
kind: Agent
metadata:
  name: locutus
spec:
  model: claude-sonnet-4-6
  providerSecretRef: anthropic-keys
  workspaceSize: 10Gi
  mcpServerRefs: [filesystem, github]
---
apiVersion: vinculum.dev/v1alpha1
kind: Task
metadata:
  generateName: refactor-
spec:
  agentRef: locutus
  prompt: "Convert the auth middleware to async/await."
  workspace: { mode: shared }

The operator reconciles Agent into a Deployment + Service + PVC + RBAC. Task is a unit of work. AgentSchedule is a cron trigger that stamps Tasks from a template (your nightly "audit dependencies" agent that picks up where it left off). MCPServer is a reusable tool definition you attach to agents by reference.
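To make the cron piece concrete, here's roughly what an AgentSchedule could look like. This is a sketch reconstructed from the description above, not copied from the repo: the `schedule` and `taskTemplate` field names are illustrative, so check the actual CRD schema before relying on them.

```yaml
apiVersion: vinculum.dev/v1alpha1
kind: AgentSchedule
metadata:
  name: nightly-dep-audit
spec:
  # Which long-lived agent receives the stamped-out Tasks.
  agentRef: locutus
  # Standard cron syntax: every night at 03:00.
  schedule: "0 3 * * *"
  # Template the operator copies into a new Task on each tick.
  taskTemplate:
    prompt: "Audit the dependencies and flag anything risky or outdated."
    workspace: { mode: shared }
```

Because the workspace is shared and persistent, each nightly run starts from wherever the last one left off instead of re-deriving the whole dependency picture.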

That's the whole thing. Everything else is derived.

MCP Servers as First-Class Drones

This is the design decision I'm most quietly proud of. Most agent runners treat MCP servers as configuration buried inside the agent definition: every agent declares its own copy of "here's the GitHub MCP, here's the filesystem MCP, here's the credentials". You end up with the same six MCP server definitions copy-pasted across every agent.

In Vinculum, an MCPServer is its own resource. Declare it once:

vnclm create mcp --name filesystem --command npx \
  --arg -y --arg @modelcontextprotocol/server-filesystem --arg /workspace --enabled

…then any agent attaches it by reference: mcpServerRefs: [filesystem, github]. Stdio MCPs run as subprocesses inside the agent pod with their secrets injected via envFrom. HTTP MCPs are referenced by URL. One server definition, many drones using it. Update the GitHub MCP's secret in one place, every agent sees it next reconcile.
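For the curious, that CLI invocation stamps out a resource shaped something like this (again, field names are illustrative rather than guaranteed to match the real schema):

```yaml
apiVersion: vinculum.dev/v1alpha1
kind: MCPServer
metadata:
  name: filesystem
spec:
  # A stdio MCP runs as a subprocess inside each agent pod that references it.
  transport: stdio
  command: npx
  args: ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"]
  enabled: true
```

One definition, referenced from `mcpServerRefs` on as many agents as you like.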

It's a small thing, but it's the kind of thing that only feels obvious after you've built it.

How a Pile of Go Code Got Built

Full disclosure: this got built almost exclusively by vibe-coding. A Kubernetes operator, an in-pod agent supervisor, a CLI with port-forwarding and shell completion, a Helm chart, a Homebrew tap, a GitHub Actions release pipeline, a marketing site. Three Go services, an OCI-distributed chart, the works. I did not personally write most of the lines.

This is the part that probably needs unpacking, because the discourse around "vibe coding" tends to swing between two equally lazy poles: "it's magic, anyone can ship anything" and "it's slop, nothing real ever gets built this way". The truth, as usual, is more interesting.

What the agent was great at:

  • The boilerplate operator code. controller-runtime reconcilers, CRD scaffolding, status patching, finalizers, RBAC manifests. This is the sort of code that takes hours to write and minutes to verify, which is exactly the shape AI handles well.
  • Glue work: the GitHub Actions matrix, the Homebrew formula, the goreleaser config, the vhs tape for the demo gif. All things I know how to do but find tedious.
  • The website. I described what I wanted; it produced HTML that looked nothing like generic AI slop because I'd given it a strong, weird brief (Borg, terminal aesthetics, Star Trek quotes). Strong taste in, strong output out.

What the agent could not do for me:

  • Decide to try the pets model in the first place. That's a design choice. It came from getting curious about a corner of the design space my work project deliberately ignored, and wondering what it would feel like. Agents are happy to build whatever you ask; they won't volunteer "actually, what about the opposite of what you do at work?"
  • Decide that MCP servers should be their own CRD instead of nested config. That's an architectural call about coupling and reuse, the kind of decision where the agent is happy to do whichever you ask, but neither answer is what it'll volunteer.
  • Name it Vinculum. The Borg framing is the whole personality of the project. It's not retrofitted color, it's the thing that made the design coherent in my head before any code was written.

I wrote about this from a different angle in Vibe-Docs: the agent is excellent at the work that's well-specified and tedious, and useless for the work of figuring out what should be built. Vinculum was the same lesson played out on a side project that, two years ago, I realistically wouldn't have shipped on my own time. The thinking was mine. The typing was mostly the agent's. That's a real shift in how much one person can ship on a weekend, and I don't think enough people have noticed yet.

Try It

Vinculum is open source (MIT, on GitHub) and at v0.1.0. If you have a Kubernetes cluster lying around (any flavor, any context), a few commands get you assimilated:

helm install vinculum oci://ghcr.io/florianwenzel/helm/vinculum \
  --version 0.1.0 -n vinculum-system --create-namespace

brew install FlorianWenzel/vinculum/vnclm

vnclm create provider   # interactive wizard
vnclm create agent      # interactive wizard
vnclm run "Compose a haiku about the Borg collective."

Things I think Vinculum is good for:

  • Long iterative coding sessions where you don't want to lose context.
  • Multi-agent setups where each agent has its own personality + workspace + tool loadout.

Things it isn't good for:

  • Anything where you want strong sandboxing between tasks.
  • Anything that needs more than one replica per agent.
  • Hosted multi-tenant scenarios.

(For those, use cattle.)