Diagnosing Agent Failures in LLM Multi-Agent Systems: A Practical Guide to Automated Failure Attribution


Overview

Large Language Model (LLM) multi-agent systems are gaining traction for tackling complex tasks through collaborative workflows. Yet, even with multiple agents working in parallel, failures are common—and pinpointing the exact agent and moment of failure is notoriously difficult. Manually trawling through thousands of interaction logs is like finding a needle in a haystack, slowing down debugging and optimization.


To solve this, researchers from Penn State University and Duke University, in collaboration with Google DeepMind, University of Washington, Meta, Nanyang Technological University, and Oregon State University, introduced the problem of automated failure attribution. They built the first dedicated benchmark dataset, Who&When, and developed several attribution methods. This work was accepted as a Spotlight presentation at ICML 2025. The code and dataset are fully open-source.

This guide walks you through the core concepts, prerequisites, and practical steps to implement automated failure attribution using the Who&When dataset and proposed techniques. By the end, you'll understand how to programmatically determine which agent caused a failure and when it happened, dramatically reducing manual debugging effort.

Prerequisites

  * Python 3.9+ with pip and git installed
  * Basic familiarity with LLM multi-agent workflows (agents exchanging messages to solve a task)
  * A machine able to install the repository's Python dependencies

Step-by-Step Instructions

Step 1: Understand the Who&When Dataset

The Who&When dataset (Hugging Face) contains multi-agent interaction logs. Each log includes a sequence of agent messages, a ground-truth label of the failing agent (the who), and the step at which the failure occurred (the when). The tasks range from reasoning to code generation, with failures caused by single-agent errors, miscommunication, or information cascade breakdowns.
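Before loading anything, it helps to have a mental model of a single log. A hypothetical record shaped like the ones this guide works with might look as follows (field names are assumptions taken from the loading code in Step 3; inspect the real dataset to confirm its schema):

```python
# A hypothetical Who&When-style record (schema assumed for illustration).
sample = {
    "messages": [
        {"agent_id": "planner",  "content": "Break the task into subtasks."},
        {"agent_id": "coder",    "content": "def add(a, b): return a - b"},  # the buggy step
        {"agent_id": "reviewer", "content": "Looks good, shipping it."},
    ],
    "failure_agent": "coder",  # who: the agent responsible for the failure
    "failure_step": 1,         # when: index of the failing message
}

print(sample["failure_agent"], sample["failure_step"])  # coder 1
```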

Step 2: Set Up the Environment

  1. Clone the official repository:
    git clone https://github.com/mingyin1/Agents_Failure_Attribution.git
    cd Agents_Failure_Attribution
  2. Create a virtual environment and install dependencies:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
  3. Download the dataset (if not automatically loaded):
    python download_dataset.py

Step 3: Load and Explore the Data

Use the Hugging Face datasets library to load the dataset:

from datasets import load_dataset

dataset = load_dataset("Kevin355/Who_and_When")
print(dataset)

# View a sample interaction
sample = dataset['train'][0]
print(sample['messages'])  # List of agent utterances
print(sample['failure_agent'])
print(sample['failure_step'])

Each sample has three key fields: messages (the ordered list of agent utterances), failure_agent (the who), and failure_step (the when).

Step 4: Implement a Baseline Attribution Method

The paper proposes several methods. We'll implement the simplest: Trace-to-Failure. It works by tracking which agents contributed to the final erroneous output. You'll need to parse the conversation to find the last agent message that directly influenced the failure.

def trace_to_failure(messages, final_error):
    # Heuristic: scan backwards for the last agent whose message contains
    # the final erroneous output, and return that agent plus the step index.
    # (Indexing backwards directly avoids messages.index(), which would
    # return the first occurrence of a duplicated message.)
    for idx in range(len(messages) - 1, -1, -1):
        if final_error in messages[idx]['content']:
            return messages[idx]['agent_id'], idx
    return None, None
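As a quick sanity check, here is the heuristic run on a tiny hypothetical conversation (the function is repeated so the snippet runs on its own). Note how it blames the last agent to repeat the error, not necessarily the agent who introduced it:

```python
def trace_to_failure(messages, final_error):
    # Walk backwards; blame the last agent whose message contains the error.
    for idx in range(len(messages) - 1, -1, -1):
        if final_error in messages[idx]['content']:
            return messages[idx]['agent_id'], idx
    return None, None

messages = [
    {'agent_id': 'planner', 'content': 'Compute 2 + 2.'},
    {'agent_id': 'solver',  'content': 'The answer is 5.'},   # the real mistake
    {'agent_id': 'checker', 'content': 'Final answer: 5.'},   # repeats it
]

agent, step = trace_to_failure(messages, '5')
print(agent, step)  # checker 2 -- not the true culprit, 'solver' at step 1
```

This limitation is exactly why the stronger methods in the repository exist: propagated errors make the last speaker an unreliable suspect.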

More sophisticated methods (e.g., counterfactual reasoning, causal graphs) are in the repository under methods/.

Step 5: Evaluate Against Ground Truth

Run the baseline on a subset and compute accuracy:

correct_who = 0
correct_when = 0
total = 0

for sample in dataset['test']:
    pred_agent, pred_step = trace_to_failure(sample['messages'], sample['final_error'])
    if pred_agent == sample['failure_agent']:
        correct_who += 1
    if pred_step == sample['failure_step']:
        correct_when += 1
    total += 1

print(f"Who Accuracy: {correct_who/total:.2%}")
print(f"When Accuracy: {correct_when/total:.2%}")
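Who and When accuracy can each look good while the method still fails to localize errors end to end, so a stricter joint metric is worth tracking as well. A minimal sketch (a hypothetical helper, not part of the repository) that counts a prediction only when both the agent and the step are right:

```python
def joint_accuracy(preds, golds):
    # preds and golds are lists of (agent, step) tuples; a hit requires
    # both elements to match.
    hits = sum(1 for p, g in zip(preds, golds) if p == g)
    return hits / len(golds) if golds else 0.0

# One of two predictions gets both the agent and the step right.
print(joint_accuracy([('a', 1), ('b', 2)], [('a', 1), ('b', 3)]))  # 0.5
```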

Step 6: Visualize Results

Create a confusion matrix for agent attribution and a histogram of step errors. The code includes plotting utilities:

from utils.visualization import plot_confusion_matrix
plot_confusion_matrix(predictions, ground_truth, labels=agent_names)
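If you prefer not to depend on the repository's plotting utilities, a dependency-free tally gives a quick text-mode view of the same confusion structure (this is a standalone sketch, not the repo's API):

```python
from collections import Counter

def confusion_counts(predictions, ground_truth):
    # Tally (true_agent, predicted_agent) pairs; each count is one cell
    # of the confusion matrix.
    return Counter(zip(ground_truth, predictions))

counts = confusion_counts(['coder', 'coder'], ['coder', 'planner'])
print(counts[('coder', 'coder')], counts[('planner', 'coder')])  # 1 1
```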

Common Mistakes

  * Blaming the last agent to speak: the agent that surfaces an error is often just repeating a mistake made earlier in the conversation.
  * Off-by-one step indices: confirm whether failure_step is zero- or one-based before scoring When accuracy.
  * Reporting only Who accuracy: a method can guess the right agent while missing the actual failing step, so evaluate both.

Summary

Automated failure attribution is a crucial step toward reliable LLM multi-agent systems. By using the Who&When dataset and the open-source tools, you can now systematically identify which agent caused a failure and at what point in the interaction, replacing manual log archaeology with a reproducible, data-driven approach. Start experimenting with the provided code and adapt the attribution methods to your own multi-agent architectures.
