
Agent Team Blueprint

The architecture behind a high-performing multi-agent AI team. How to define roles, design handoffs, build reliable orchestration — and what we learned breaking ours on Day 2.

Updated March 23, 2026 ⏱ 25 min read 🟡 Intermediate Requires: OpenClaw setup

Why teams beat solo agents

A single AI agent trying to do everything is like hiring one person to be your accountant, developer, content writer, and operations manager simultaneously. They can technically do all of it. They'll do none of it well.

The same is true for AI agents. Context window limitations, task complexity, and the fundamental trade-off between speed and quality mean that specialised agents outperform generalist ones on almost every meaningful task. A team of three focused agents — each expert in their domain — will produce better output, catch each other's errors, and run reliably in ways a single overloaded agent cannot.

At SSBAA, we run three agents overnight on 14 cron jobs. Here's the architecture we use and why it works.

The role framework

Every agent on your team needs three things defined clearly before they do a single task:

  1. A single primary responsibility — one domain, not many. "Build and deploy technical systems" is a role. "Do whatever needs doing" is not.
  2. Clear input/output contracts — what does this agent receive, and what does it produce? ZELDA receives a content brief and produces a complete draft. FOX receives a spec and produces deployed code with a live URL.
  3. Escalation rules — when does this agent stop and ask for help? When does it escalate to the orchestrator? When does it escalate to you?
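The three requirements above can be captured in a small data structure that refuses to run an agent with an undefined contract. This is a minimal sketch; the class and field names (`AgentRole`, `accepts`, `produces`, `escalate_to`) are illustrative, not OpenClaw's actual schema.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class AgentRole:
    name: str
    responsibility: str   # one primary domain, not many
    accepts: str          # input contract: what the agent receives
    produces: str         # output contract: what the agent emits
    escalate_to: str      # who this agent asks when it gets stuck

    def validate(self) -> None:
        # every field must be defined before the agent does a single task
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"{self.name or '?'}: {f.name} is undefined")

zelda = AgentRole(
    name="ZELDA",
    responsibility="research + content",
    accepts="structured content brief",
    produces="complete draft file, not an outline",
    escalate_to="YOSHI",
)
zelda.validate()  # all three requirements are present
```

If any contract field is blank, `validate()` raises before the agent ever touches the queue — defining the role is a precondition, not documentation.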

The SSBAA stack

Here are the four components of our stack — three agents plus the scheduled automation layer — and the model choice behind each agent:

🎯
YOSHI — Operations Director
Kimi K2.5 via OpenRouter
Coordinates the team. Reads the task queue, assigns work to FOX and ZELDA, verifies outputs, manages MEMORY.md, and sends the morning brief. Nothing enters or exits the system without YOSHI's sign-off.
Rule: YOSHI never executes build tasks. YOSHI only orchestrates.
🦊
FOX — Build Agent
DeepSeek V3 via OpenRouter
Technical execution. Writes code, builds components, deploys to Vercel, manages the codebase. Every output must be verified with a live URL or file check before YOSHI marks it complete.
Rule: FOX is on probationary status. FOX_FAILURE_COUNT tracked. Threshold: 3 failures = escalate to RAY.
🔍
ZELDA — Research + Content
Gemini 2.5 Flash via OpenRouter
Research cycles, blog posts, content calendars, competitor analysis, social media drafts. ZELDA works from structured briefs and produces complete files — not outlines, not drafts, complete deliverables.
Rule: ZELDA outputs go to YOSHI for review before publication. No direct publishing.
The Overnight Stack
14 cron jobs · 22:00–06:30
The scheduled automation layer. Runs every night without human intervention. Pre-flight at 22:00, work session 23:00–04:00, memory compaction at 02:00, security audit at 04:00, morning brief at 06:00.
Rule: EVOLVE_ALLOW_SELF_MODIFY=false. No agent modifies its own config.
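FOX's probationary rule is easy to enforce with a per-agent counter. A sketch of the FOX_FAILURE_COUNT logic — the class name is illustrative, and the assumption that a success resets the counter is ours, not a documented OpenClaw rule.

```python
class FailureTracker:
    """Track failures per agent; flag when one crosses the threshold."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts: dict[str, int] = {}

    def record_failure(self, agent: str) -> bool:
        """Return True when this failure crosses the escalation threshold."""
        self.counts[agent] = self.counts.get(agent, 0) + 1
        return self.counts[agent] >= self.threshold

    def record_success(self, agent: str) -> None:
        # assumption: a verified success resets the probation counter
        self.counts[agent] = 0

tracker = FailureTracker(threshold=3)
assert not tracker.record_failure("fox")   # 1st failure: keep going
assert not tracker.record_failure("fox")   # 2nd failure: keep going
assert tracker.record_failure("fox")       # 3rd failure: escalate to RAY
```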

The YOSHI pattern

The most important architectural decision in a multi-agent team is separating orchestration from execution. YOSHI never builds anything. FOX never coordinates anything. This separation is what makes the team reliable.

Without it, you get agents that try to do both — they start a task, get confused about scope, and either loop or go silent. The YOSHI pattern means there's always one agent whose job is to know the state of everything. When FOX went dark for 6 hours on Day 2, YOSHI caught it because YOSHI's entire job is to know what every agent is doing at every moment.

The rule: Your orchestrator agent should be your most reliable model. Ours is Kimi K2.5 — we switched to it from DeepSeek on Day 3 specifically because we needed an orchestrator that actually writes to files when it says it does. Don't put a flaky model in the YOSHI role.
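The separation is simple enough to enforce mechanically: map each role to the actions it may perform and refuse everything else. A sketch under our own conventions — the role names match the config below, but the action names are illustrative.

```python
# The YOSHI pattern as a permission table: the orchestrator may
# assign, verify, and log — never execute a build task.
ALLOWED_ACTIONS = {
    "orchestrator": {"assign", "verify", "log"},
    "builder":      {"execute"},
    "research":     {"execute"},
}

def perform(role: str, action: str) -> None:
    if action not in ALLOWED_ACTIONS[role]:
        raise PermissionError(f"{role} may not {action}")

perform("orchestrator", "assign")        # fine: YOSHI assigns work
try:
    perform("orchestrator", "execute")   # YOSHI never builds
except PermissionError as e:
    print(e)
```

A hard error at the boundary is the point: an agent that quietly drifts across the orchestration/execution line is exactly the agent that loops or goes silent.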

Designing handoffs

A handoff is what happens when one agent finishes and another picks up. Poorly designed handoffs are where most multi-agent systems break down. Here's the pattern we use:

1
YOSHI
Write task to delivery queue
A structured file in the delivery-queue folder with: task ID, assigned agent, required output format, verification method, and deadline.
2
FOX / ZELDA
Read task, execute, produce output
Agent reads the task file, executes, and writes the output to a defined location. Output must match the format specified in the task.
3
FOX / ZELDA
Write completion proof to queue
The executing agent writes a completion record: task ID, output location, verification data (URL, file path, grep output). Not a self-report — actual evidence.
4
YOSHI
Verify independently
YOSHI checks the completion proof against the task requirements. For code: visits the live URL. For files: runs wc -c to confirm the file exists and has content. Only marks complete after verification.
5
YOSHI
Log to MEMORY.md + brief RAY
Completed tasks get a one-line summary in MEMORY.md and appear in the morning brief. The task file is archived, not deleted.
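Steps 1–3 above are just files on disk: YOSHI writes a task record into the queue, and the executing agent writes a completion record next to it. A minimal sketch — the field names and file naming are illustrative, not OpenClaw's actual queue schema.

```python
import json
import pathlib
import tempfile

# stand-in for the delivery-queue folder (real path elided in the article)
queue = pathlib.Path(tempfile.mkdtemp()) / "delivery-queue"
queue.mkdir()

def write_task(task_id, agent, output_format, verification, deadline):
    """Step 1: YOSHI writes a structured task file to the queue."""
    task = {"task_id": task_id, "agent": agent,
            "output_format": output_format,
            "verification": verification, "deadline": deadline}
    (queue / f"{task_id}.task.json").write_text(json.dumps(task, indent=2))

def write_completion(task_id, output_path, evidence):
    """Step 3: the executing agent writes proof — evidence, not a self-report."""
    proof = {"task_id": task_id, "output": output_path, "evidence": evidence}
    (queue / f"{task_id}.done.json").write_text(json.dumps(proof, indent=2))

write_task("T-014", "fox", "deployed page", "live URL check", "04:00")
write_completion("T-014", "/app/blog", {"url": "https://example.com/blog"})

proof = json.loads((queue / "T-014.done.json").read_text())
assert proof["evidence"]["url"].startswith("https://")  # step 4 starts here
```

Keeping both records as separate files is deliberate: YOSHI verifies the proof against the task independently, and archiving the pair (step 5) gives you an audit trail for free.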

Shared memory

All agents read from a single MEMORY.md file. This is how they maintain continuity between sessions — YOSHI writes key facts, decisions, and task status here, and all agents read it at session start.

The most important operational rule: MEMORY.md must stay under 8,000 characters. Above that, agents start hallucinating from context overload. We learned this the hard way when MEMORY.md hit 13,739 characters on Day 3 and one model invented a fictional agent, writing it into the file as a real team member.

# Check MEMORY.md size
wc -c ~/.openclaw/workspace-yoshi/MEMORY.md
7,630 ~/.openclaw/workspace-yoshi/MEMORY.md ← safe
# If over 8,000 — run compaction
openclaw memory compact
✓ MEMORY.md compacted: 13,739 → 7,630 chars

Verification rules

This is the single most important operational principle we have: done means nothing without proof.

DeepSeek V3 (FOX's model) will confidently report a task complete without having executed it. We discovered this on Day 2 when FOX reported the blog was built, YOSHI accepted the self-report, and the blog didn't exist. The fix was simple but non-negotiable: no self-reports accepted. Every completion claim must come with independently verifiable evidence — a live URL for deploys, a file check for content.

There are no exceptions to this. If an agent can't produce verification, the task is not complete.
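The two verification checks YOSHI runs are cheap to implement. A sketch assuming our conventions: the task declares its verification method (matching the queue format above), and the function names here are illustrative.

```python
import pathlib
import tempfile
import urllib.request

def verify_file(path: str, min_bytes: int = 1) -> bool:
    """The file-check half: exists and has content (what wc -c confirms)."""
    p = pathlib.Path(path)
    return p.is_file() and p.stat().st_size >= min_bytes

def verify_url(url: str, timeout: float = 10.0) -> bool:
    """The live-URL half: the deployed page actually responds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def mark_complete(task: dict) -> bool:
    """A task is complete only when its declared verification passes."""
    kind, target = task["verification"]
    check = {"file": verify_file, "url": verify_url}[kind]
    return check(target)

# demo: a real file passes, a missing one does not
demo = pathlib.Path(tempfile.mkdtemp()) / "draft.md"
demo.write_text("complete deliverable")
assert mark_complete({"verification": ("file", str(demo))})
assert not mark_complete({"verification": ("file", "/no/such/file")})
```

Note what is absent: there is no code path that marks a task complete from the agent's own report.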

Configuration

The multi-agent config in OpenClaw lives in ~/.openclaw/openclaw.json. Here's the structure we use:

// ~/.openclaw/openclaw.json (simplified)
{
  "agents": {
    "yoshi": {
      "model": "openrouter/moonshotai/kimi-k2.5",
      "role": "orchestrator",
      "workspace": "workspace-yoshi",
      "thinkLevel": "low",
      "contextTokens": 50000
    },
    "fox": {
      "model": "openrouter/deepseek/deepseek-chat",
      "role": "builder",
      "workspace": "workspace/fox",
      "failureThreshold": 3
    },
    "zelda": {
      "model": "google/gemini-2.5-flash",
      "role": "research",
      "workspace": "workspace/zelda"
    }
  },
  "memory": {
    "maxChars": 8000,
    "autoCompact": true
  },
  "security": {
    "EVOLVE_ALLOW_SELF_MODIFY": false,
    "telegramGroupPolicy": "allowlist"
  }
}
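It's worth sanity-checking this config before any agent starts, since the invariants (the 8,000-char memory limit, self-modification off, exactly one orchestrator) are what the whole section depends on. A sketch — we have no idea whether OpenClaw validates any of this itself.

```python
def check_config(cfg: dict) -> list[str]:
    """Return a list of invariant violations; empty means safe to start."""
    problems = []
    if cfg["memory"]["maxChars"] > 8000:
        problems.append("memory.maxChars above the 8,000-char safe limit")
    if cfg["security"]["EVOLVE_ALLOW_SELF_MODIFY"]:
        problems.append("self-modification must stay disabled")
    orchestrators = [name for name, agent in cfg["agents"].items()
                     if agent["role"] == "orchestrator"]
    if len(orchestrators) != 1:
        problems.append("exactly one orchestrator required (the YOSHI pattern)")
    return problems

# minimal stand-in mirroring the config above
cfg = {
    "agents": {"yoshi": {"role": "orchestrator"}, "fox": {"role": "builder"}},
    "memory": {"maxChars": 8000},
    "security": {"EVOLVE_ALLOW_SELF_MODIFY": False},
}
assert check_config(cfg) == []
```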

Handling failure

Agents fail. Plan for it before it happens, not after. The three failure modes we've hit and how we handle each:

Silent failure (agent stops responding)

Mandatory 15-minute ping protocol. If an agent misses a ping, YOSHI escalates to Telegram immediately. Don't wait for morning to find out your build agent went dark at 3am.
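The ping check itself is a one-liner over a heartbeat table. A sketch assuming timestamps are epoch seconds; the agent names and the escalation side (Telegram) are as described above, but the function shape is ours.

```python
import time

PING_WINDOW = 15 * 60  # seconds: the mandatory ping interval

def stale_agents(last_seen: dict[str, float], now: float) -> list[str]:
    """Agents whose last heartbeat is older than the ping window."""
    return [agent for agent, ts in last_seen.items() if now - ts > PING_WINDOW]

now = time.time()
heartbeats = {"fox": now - 6 * 3600, "zelda": now - 60}
assert stale_agents(heartbeats, now) == ["fox"]  # dark for 6 hours: escalate
```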

Confident wrong answer (agent reports complete without executing)

The verification rules above. No self-reporting accepted. Every task completion requires independently verifiable evidence.

Memory hallucination (agent invents facts)

Keep MEMORY.md under 8,000 characters. Run the identity check — YOSHI cross-references every agent name in memory against the verified agent registry. Unknown entity in memory = immediate alert.
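The identity check is a cross-reference: every agent name that appears in memory must exist in the verified registry. A deliberately naive sketch — the all-caps heuristic and the sample hallucinated name are illustrative, not our production scan.

```python
import re

# the verified agent registry: the only names allowed to appear in memory
REGISTRY = {"YOSHI", "FOX", "ZELDA", "RAY"}

def unknown_entities(memory_text: str) -> set[str]:
    """Naive scan: any all-caps token not in the registry is suspect."""
    tokens = set(re.findall(r"\b[A-Z]{2,}\b", memory_text))
    return {t for t in tokens if t not in REGISTRY}

memory = "FOX deployed the blog. GANON added to team roster."
assert unknown_entities(memory) == {"GANON"}  # unknown entity: alert
```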

⚠️ Don't learn these lessons the hard way. We had a silent failure on Day 2, confident wrong answers from Day 1, and a memory hallucination on Day 3. All three are documented in the build log. The protocols above exist because each of these happened to us.