
Build Your Own AI Agent Fleet: A Step-by-Step Guide to Shipping Faster with Virtual Teams

2026-05-02 18:30:28

Introduction

Imagine a team of seven AI assistants working around the clock to test your product, triage bugs, write release notes, and even fix issues – all without you lifting a finger. That’s exactly what the Coding Agent Sandboxes team at Docker achieved with their “Fleet” of virtual agents. By leveraging Claude Code skills and secure sandbox isolation, they transformed a set of static scripts into a dynamic, autonomous workforce that accelerates shipping. In this guide, you’ll learn how to build a similar fleet, step by step, from defining agent roles to running them seamlessly in CI. Whether you’re a solo developer or part of a larger team, this approach can help you ship faster and reduce manual overhead.

Source: www.docker.com

What You Need

- An AI coding agent that supports skill files (Docker's team used Claude Code)
- A sandboxed, isolated environment with access to your source repository
- A CI system that can run the same sandbox on a schedule (e.g., GitHub Actions)
- A shortlist of the manual tasks you want to hand off

Step 1: Define Your Agent Roles and Responsibilities

Before writing any code, identify the manual tasks that slow down your shipping. Common candidates include exploratory testing, regression checks, triaging issues, writing release notes, and fixing repetitive bugs. For each task, define a distinct role (e.g., “CLI Tester,” “Build Engineer,” “Release Manager”). Give each role a clear set of responsibilities and boundaries. For example, a CLI tester should focus on exercising commands and reporting failures, while a build engineer might manage version upgrades and performance benchmarks. This clarity will guide the skill file you create in the next step.


Step 2: Create Skill Files as Role Descriptions

A skill file is a markdown document that describes an agent’s persona, what it knows, and how it makes decisions. Think of it as a role description, not a script. For instance, a build engineer skill might say: “You are an expert in Docker builds. Your main task is to compile the CLI tool across three platforms (macOS, Linux, Windows) and flag any compilation errors. You have access to a sandbox with internet and the source repository. When a build fails, investigate the error message and propose a fix.”

Structure your skill file with clear sections: Persona, Responsibilities, Tools & Permissions, Decision Rules. Use natural language that the AI can interpret. The key is to enable judgment – if a test fails unexpectedly, the agent should investigate, not stop. Save each file with a descriptive name like /cli-tester-skill.md. Ensure the same skill behaves identically whether run on your laptop or in CI.
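Putting those sections together, a skill file might look like the sketch below. The role name, tool list, and decision rules are illustrative examples, not Docker's actual skill content:

```markdown
# CLI Tester

## Persona
You are a meticulous QA engineer who tests a command-line tool the way
a skeptical first-time user would.

## Responsibilities
- Build the CLI from the current branch and exercise its core commands.
- Report every failure with the exact command, the output, and the
  behavior you expected.

## Tools & Permissions
- A sandbox with the source repository checked out and internet access.
- Permission to run builds and file issues; no permission to push code.

## Decision Rules
- If a command fails unexpectedly, investigate before reporting: re-run
  it, read the error, and check recent commits for a likely cause.
- If you are unsure whether behavior is a bug, file it as a question
  rather than staying silent.
```

Note that nothing here is imperative scripting; every line describes judgment the agent should exercise, which is what lets the same file work both locally and in CI.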


Step 3: Test Skills Locally First

Do not wire your skill directly into a CI workflow. Instead, run it on your development machine using the same environment (sandbox plus AI agent) that CI will use. Invoke the skill manually: watch the agent think, note where it gets confused, and review the decisions it makes. This fast feedback loop saves hours of debugging later. For example, if your CLI tester skill builds the binary and runs commands, observe whether it correctly identifies a broken flag or misinterprets an error message. Tweak the skill file, re-invoke, and repeat until it performs as desired. Only after local success should you consider CI integration.
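To keep the local loop fast, it can help to wrap the invocation in a small helper so each tweak-and-rerun cycle is one command. This is a hypothetical sketch: it assumes the Claude Code CLI's non-interactive `claude -p <prompt>` mode, and the idea of prepending the skill file to the task prompt is an illustrative convention, not Docker's documented setup.

```python
import subprocess
from pathlib import Path


def build_skill_command(skill_file: str, task: str) -> list[str]:
    """Assemble the CLI invocation for a local skill run.

    Assumes Claude Code's non-interactive print mode (`claude -p`);
    adjust the command for whichever agent runner you use.
    """
    skill_text = Path(skill_file).read_text()
    prompt = f"{skill_text}\n\nTask: {task}"
    return ["claude", "-p", prompt]


def run_skill(skill_file: str, task: str) -> str:
    """Invoke the agent and return its transcript for inspection."""
    result = subprocess.run(
        build_skill_command(skill_file, task),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Because the command is built in one place, the CI workflow in the next step can reuse exactly the same invocation, which is the whole point of keeping a single skill file.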


Step 4: Wire the Skill into CI Without Modification

Your skill file is now validated locally. To run it in CI, create a workflow (e.g., GitHub Actions) that sets up the sandbox environment, checks out the repository, and calls the exact same skill file. Do not create a separate “CI version.” The workflow should only handle environment variables, secrets, and scheduling – the agent’s logic remains untouched. For example, a nightly workflow can trigger the CLI tester on macOS, Linux, and Windows runners simultaneously. The same skill file that worked on your laptop now runs autonomously in CI, producing consistent results.
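A sketch of what such a nightly workflow could look like in GitHub Actions. The runner matrix and checkout step use standard Actions syntax; the final agent-invocation step is an assumption – substitute whatever command you used to run the skill locally:

```yaml
name: nightly-cli-tester
on:
  schedule:
    - cron: "0 2 * * *"   # run nightly at 02:00 UTC

jobs:
  test:
    strategy:
      matrix:
        os: [macos-latest, ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      # Environment variables and secrets live in the workflow,
      # never in the skill file itself.
      - name: Run the CLI tester skill
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        # Hypothetical invocation -- mirror your local command exactly.
        run: claude -p "$(cat cli-tester-skill.md)"
```

The workflow file contains zero agent logic: if the agent misbehaves, you edit the skill file and re-test locally, never the YAML.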


Step 5: Iterate and Expand Your Fleet

Once your first agent is running in CI, monitor its reports and performance. Use the insights to refine the skill file – add new responsibilities, adjust decision rules, or improve prompting. Then, repeat the cycle for other roles: create a skill, test locally, add to CI. Docker’s fleet, for instance, grew to seven roles covering testing, triage, release notes, and bug fixing. Over time, you can schedule agents to run on different triggers (pull requests, nightly, weekly) and even let them collaborate (e.g., a tester agent files an issue, a triage agent reads it and auto-assigns). Keep each skill focused and independent to avoid conflicts.
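As the fleet grows, each agent can get its own trigger. A sketch of the `on:` blocks two different agents might use, shown side by side (the workflow file names and schedules are illustrative):

```yaml
# triage-agent.yml -- reacts to newly opened issues
on:
  issues:
    types: [opened]

# release-notes-agent.yml -- runs weekly, before the release cut
on:
  schedule:
    - cron: "0 9 * * MON"
```

Keeping one workflow file per agent preserves the independence the step above recommends: a misfiring triage agent can be paused without touching the testers.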


Tips for Success

- Keep each skill focused and independent so agents do not conflict with one another.
- Always validate a skill locally before wiring it into CI, and never fork a separate “CI version.”
- Start with one role, prove its value, then expand the fleet one agent at a time.

Building an AI agent fleet is not about replacing your team – it’s about augmenting it. By following these steps, you can automate repetitive tasks, reduce manual toil, and ship more confidently. The Docker team proved that a virtual squad of agents can dramatically speed up development, and now you have the blueprint to do the same. Happy coding!
