How Docker’s Fleet of AI Agents Streamlines Testing and Bug Fixing

Docker's Coding Agent Sandboxes team has created a virtual team of seven AI agents, known as the Fleet, to automate testing, issue triage, release notes, and bug fixes. Built on Claude Code skills, these agents operate autonomously in CI and locally, accelerating development and improving reliability.

What Are Coding Agent Sandboxes and the Fleet?

The Coding Agent Sandboxes project, or sbx, provides secure microVM-based isolation for running AI coding agents like Claude Code, Gemini, Codex, Docker Agent, and Kiro. Each agent gets full autonomy inside a sandbox—its own Docker daemon, network, and filesystem—without affecting the host system. On top of this, the team built the Fleet: a virtual team of seven AI agent roles that test the product, triage issues, post release notes, and even fix bugs, all running autonomously in CI. This accelerates the development cycle and reduces manual overhead.


How Does the Fleet Work?

The Fleet relies on Claude Code skills, which are markdown files that define an agent's persona, responsibilities, and allowed tools. A skill is not a script with rigid steps; it is a role description that says, “you are the build engineer, here’s what you know and how you make decisions.” This distinction is crucial because agents need judgment, not just instructions. When a test fails unexpectedly, a script stops—but a role investigates. The same skill file produces identical behavior whether run on a developer’s laptop or in CI, ensuring consistency and reducing debugging time.
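To make this concrete, a role-style skill file might look like the sketch below. It follows Claude Code's SKILL.md convention (a markdown file with YAML frontmatter), but the role name, fields, and contents here are hypothetical illustrations, not Docker's actual Fleet skills:

```markdown
---
name: cli-tester
description: Exploratory tester for the sbx CLI. Use when a fresh build needs validation.
---

# Role

You are the CLI test engineer for sbx. You build the binaries, exercise the
CLI commands, and report any issues you find.

# How you work

- Build from the current checkout before testing anything.
- When a command fails unexpectedly, investigate first: read the logs,
  reproduce the failure minimally, and form a diagnosis.
- Report confirmed problems as issues with clear reproduction steps.
```

Note that nothing in the file is a step-by-step procedure; it describes context and judgment, which is exactly what distinguishes it from a script.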

What Is the “Local First, CI Second” Principle?

Every skill in the Fleet is first developed and tested locally before being deployed to CI. For example, the /cli-tester skill was invoked on a developer’s terminal to build binaries, exercise CLI commands, find issues, and report them. Only after it performed correctly locally was it wired into a GitHub workflow. This approach avoids the painful commit-push-wait-read-logs cycle of debugging CI-only agents. Local iteration takes seconds, whereas CI cycles can take minutes. CI is simply another runtime for the same skill, with no separate version or translation layer—just environment setup and invocation.
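The local loop described above might look something like this. Claude Code's `claude -p` print mode is a real CLI feature, but pointing it at a `/cli-tester` skill in this way is an assumption for illustration:

```shell
# Iterate locally: run the skill from the repo checkout and read its output
# in the terminal, instead of pushing a commit and waiting on CI logs.
claude -p "/cli-tester"

# Once the skill behaves correctly here, the same invocation goes into a
# GitHub workflow step -- same skill file, different runtime.
```

Because CI reuses the identical skill file and invocation, there is no separate "CI version" of the agent to keep in sync.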

How Are Skills Different from Traditional Scripts?

Traditional scripts execute a predetermined sequence of steps and fail if something unexpected occurs. Skills, by contrast, provide a role-based persona with context, decision-making guidelines, and tool access. They allow agents to adapt to novel situations. For instance, if a test fails due to an environmental issue, a skill-based agent can investigate logs, diagnose the root cause, and perhaps even apply a fix. A script would simply halt and require human intervention. This flexibility makes the Fleet effective for exploratory testing, bug triage, and release management—tasks that require judgment and not just automation.
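The script side of that contrast is easy to picture. A hypothetical traditional test harness for sbx (the subcommands below are invented for illustration) can only abort when something unexpected happens:

```shell
#!/bin/sh
set -e   # any unexpected failure aborts the whole run

sbx build                   # hypothetical sbx subcommands, for illustration
sbx run hello-world
sbx status | grep running   # if the environment is off, this just fails --
                            # no log reading, no diagnosis, a human steps in
```

A skill-based agent given the same goal could notice that `sbx status` failed, read the daemon logs, and decide whether the cause is a real bug or a transient environment problem before reporting anything.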


What Specific Agent Roles Exist in the Fleet?

The Fleet includes seven distinct agent roles, each with specialized responsibilities. One example is the /cli-tester role, which performs exploratory testing of the sbx CLI tool across multiple platforms (macOS, Linux, Windows). Other roles likely handle issue triage, release notes generation, and bug fixing. Each role has its own skill file defining its persona, tools, and decision framework. These agents work autonomously in CI, running nightly to ensure continuous coverage. By offloading these tasks to agents, the team maintains high velocity without dedicating full-time human effort to repetitive or analytical work.

How Does the Fleet Handle Multiplatform Testing?

sbx runs on macOS, Linux, and Windows. Every release must be tested across all three platforms, including upgrade paths and sustained load to catch resource leaks. The Fleet runs the same markdown-based /cli-tester skill nightly on each platform's CI runners. The skill autonomously builds the binaries, tests CLI commands, validates behavior, and reports issues. Because the skill is platform-agnostic (it runs in the sandboxed environment), the same role works identically everywhere. This ensures comprehensive, consistent testing without maintaining separate test suites for each OS.
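A nightly multiplatform run of this kind maps naturally onto a GitHub Actions matrix. The workflow below is a hedged sketch of how that wiring could look, not Docker's actual workflow; the skill invocation and schedule are assumptions:

```yaml
name: fleet-cli-tester
on:
  schedule:
    - cron: "0 2 * * *"        # nightly run (time is an assumption)
jobs:
  cli-tester:
    strategy:
      matrix:
        os: [macos-latest, ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.os }}  # same skill, three platforms
    steps:
      - uses: actions/checkout@v4
      - name: Run the /cli-tester skill
        run: claude -p "/cli-tester"   # hypothetical invocation of the skill
```

The matrix is the only platform-specific part; the skill file itself stays untouched across all three runners.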

What Are the Key Benefits of Using Agent Teams?

Using a fleet of AI agents provides several advantages. Speed: agents work in parallel and run continuously, reducing manual testing and triage time. Consistency: the same skill runs locally and in CI, eliminating environment-specific bugs. Autonomy: agents investigate failures and adapt, not just stop. Scalability: adding new roles is as simple as writing a new skill file. Focus: human developers can concentrate on feature work instead of routine testing and bug triage. The Fleet effectively acts as a force multiplier, allowing the small team to ship faster and with higher quality.
