Agent Team Blueprint
The architecture behind a high-performing multi-agent AI team. How to define roles, design handoffs, build reliable orchestration — and what we learned breaking ours on Day 2.
Why teams beat solo agents
A single AI agent trying to do everything is like hiring one person to be your accountant, developer, content writer, and operations manager simultaneously. They can technically do all of it. They'll do none of it well.
The same is true for AI agents. Context window limitations, task complexity, and the fundamental trade-off between speed and quality mean that specialised agents outperform generalist ones on almost every meaningful task. A team of three focused agents — each expert in their domain — will produce better output, catch each other's errors, and run reliably in ways a single overloaded agent cannot.
At SSBAA, we run three agents overnight on 14 cron jobs. Here's the architecture we use and why it works.
The role framework
Every agent on your team needs three things defined clearly before they do a single task:
- A single primary responsibility — one domain, not many. "Build and deploy technical systems" is a role. "Do whatever needs doing" is not.
- Clear input/output contracts — what does this agent receive, and what does it produce? ZELDA receives a content brief and produces a complete draft. FOX receives a spec and produces deployed code with a live URL.
- Escalation rules — when does this agent stop and ask for help? When does it escalate to the orchestrator? When does it escalate to you?
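The input/output contract in the second point can be made concrete as typed records. This is an illustrative sketch only — the class and function names (`ContentBrief`, `Draft`, `zelda`) are ours, not SSBAA's actual code:

```python
from dataclasses import dataclass

# Hypothetical sketch: an agent's contract expressed as typed records.
# ZELDA's real brief format is not published; these fields are illustrative.

@dataclass(frozen=True)
class ContentBrief:
    topic: str
    audience: str
    word_count: int

@dataclass(frozen=True)
class Draft:
    brief: ContentBrief  # the draft carries the brief it was produced from
    body: str

def zelda(brief: ContentBrief) -> Draft:
    """Placeholder for the content agent: receives a brief, produces a draft."""
    return Draft(brief=brief, body=f"Draft on {brief.topic} for {brief.audience}")
```

Freezing the dataclasses makes the contract explicit: an agent receives exactly these fields and cannot mutate its input, which keeps handoffs auditable.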
The SSBAA stack
Here are the components of our agent team, how we chose the model for each, and why:
The YOSHI pattern
The most important architectural decision in a multi-agent team is separating orchestration from execution. YOSHI never builds anything. FOX never coordinates anything. This separation is what makes the team reliable.
Without it, you get agents that try to do both — they start a task, get confused about scope, and either loop or go silent. The YOSHI pattern means there's always one agent whose job is to know the state of everything. When FOX went dark for 6 hours on Day 2, YOSHI caught it because YOSHI's entire job is to know what every agent is doing at every moment.
The rule: Your orchestrator agent should be your most reliable model. Ours is Kimi K2.5 — we switched to it from DeepSeek on Day 3 specifically because we needed an orchestrator that actually writes to files when it says it does. Don't put a flaky model in the YOSHI role.
Designing handoffs
A handoff is what happens when one agent finishes and another picks up. Poorly designed handoffs are where most multi-agent systems break down. Here's the pattern we use:
Shared memory
All agents read from a single MEMORY.md file. This is how they maintain continuity between sessions — YOSHI writes key facts, decisions, and task status here, and all agents read it at session start.
The most important operational rule: MEMORY.md must stay under 8,000 characters. Above that, agents start hallucinating from context overload. We learned this the hard way when MEMORY.md hit 13,739 characters on Day 3 and one of the models invented a false agent entry that was written into the file as a real team member.
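The 8,000-character ceiling is easy to enforce mechanically. A minimal sketch, assuming the file path and threshold from above (the function name is ours):

```python
from pathlib import Path

# Threshold from the article: above ~8,000 characters, agents started
# hallucinating from context overload.
MEMORY_LIMIT = 8_000

def check_memory_size(path: str) -> tuple[int, bool]:
    """Return (char_count, within_limit) for the shared memory file."""
    size = len(Path(path).read_text(encoding="utf-8"))
    return size, size <= MEMORY_LIMIT
```

Running this at the top of every orchestrator session turns a silent failure mode into a visible one: the check fires before any agent reads an oversized file.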
Verification rules
This is the single most important operational principle we have: done means nothing without proof.
DeepSeek V3 (FOX's model) will confidently report a task complete without having executed it. We discovered this on Day 2 when FOX reported the blog was built, YOSHI accepted the self-report, and the blog didn't exist. The fix was simple but non-negotiable:
- Code deployed? Show me the live URL. YOSHI visits it.
- File written? Show me `ls -la filename` and `wc -c filename`.
- Text replaced? Show me the grep output before and after.
- API call made? Show me the response object.
There are no exceptions to this. If an agent can't produce verification, the task is not complete.
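The checks above can be sketched as code. This is a minimal illustration of the principle — the function names are ours, and a real gate would cover the grep and API cases too:

```python
import os
import urllib.request

def verify_file_written(path: str) -> bool:
    """The ls -la / wc -c check: file exists and is non-empty."""
    return os.path.isfile(path) and os.path.getsize(path) > 0

def verify_url_live(url: str, timeout: float = 10.0) -> bool:
    """The orchestrator visits the URL: deployed means it actually responds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False
```

The point is that the orchestrator runs these itself rather than trusting the executing agent's self-report: completion is a property the system observes, not a claim an agent makes.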
Configuration
The multi-agent config in OpenClaw lives in ~/.openclaw/openclaw.json. Here's the structure we use:
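As an illustrative sketch only — OpenClaw's actual schema isn't reproduced here, and every field name below is an assumption — a config of this shape would encode the roles, memory limit, and ping protocol described in this post:

```json
{
  "agents": {
    "YOSHI": { "model": "kimi-k2.5", "role": "orchestrator" },
    "FOX":   { "model": "deepseek-v3", "role": "builder" },
    "ZELDA": { "model": "<your-model>", "role": "writer" }
  },
  "memory": { "file": "MEMORY.md", "maxChars": 8000 },
  "pings":  { "intervalMinutes": 15, "escalate": "telegram" }
}
```

Consult the OpenClaw documentation for the real key names; the value of keeping this in one file is that the orchestrator and every worker read the same role and escalation definitions.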
Handling failure
Agents fail. Plan for it before it happens, not after. The three failure modes we've hit and how we handle each:
Silent failure (agent stops responding)
Mandatory 15-minute ping protocol. If an agent misses a ping, YOSHI escalates to Telegram immediately. Don't wait for morning to find out your build agent went dark at 3am.
Confident wrong answer (agent reports complete without executing)
The verification rules above. No self-reporting accepted. Every task completion requires independently verifiable evidence.
Memory hallucination (agent invents facts)
Keep MEMORY.md under 8,000 characters. Run the identity check — YOSHI cross-references every agent name in memory against the verified agent registry. Unknown entity in memory = immediate alert.
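The identity check reduces to a set difference. A minimal sketch, using the three agent names from this post as the registry (the function name, and any unknown name used to test it, are illustrative):

```python
# Verified agent roster from the article; anything in memory outside this
# set is treated as a hallucinated entity.
REGISTRY = {"YOSHI", "FOX", "ZELDA"}

def identity_check(names_in_memory: set[str], registry: set[str] = REGISTRY) -> set[str]:
    """Return unknown entities found in memory; non-empty means alert now."""
    return names_in_memory - registry
```

Because the registry is the single source of truth, a hallucinated agent written into MEMORY.md is caught on the next cross-reference rather than quietly becoming "real" to the rest of the team.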
⚠️ Don't learn these lessons the hard way. We had a silent failure on Day 2, confident wrong answers from Day 1, and a memory hallucination on Day 3. All three are documented in the build log. The protocols above exist because each of these happened to us.