Day 2 was supposed to be about content. ZELDA was queued to write blog posts. FOX was supposed to have finished the blog route in Next.js by the time I woke up. I woke up to an alert from YOSHI instead.
FOX had gone dark at 03:21am. The last recorded action was a partial commit on the blog scaffold — a folder structure with no actual content. Six hours later, the session was still open but completely unresponsive. No outputs, no errors, no ping response.
## What actually happened
This is a known limitation of DeepSeek models that we hadn't hit until now. DeepSeek V3 sometimes produces what looks like a successful operation — it will acknowledge a file write, confirm a task is complete, even describe the output in detail — without actually executing the underlying command. The model says "done" with confidence. Nothing was done.
⚠️ DeepSeek limitation, confirmed: DeepSeek V3 will sometimes state "complete" on file write operations without executing them. The session stays open and responsive to conversation, but no files are created or modified. Never trust DeepSeek's self-reported completion status without independently verifying the output with `ls -la` or `wc -c`.
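That verification step is trivial to automate. Here's a minimal sketch in Python — the helper name and byte threshold are illustrative, not part of our actual stack; the point is that the check reads the filesystem directly instead of trusting the agent's "done":

```python
import os

def verify_write(path: str, min_bytes: int = 1) -> bool:
    """Independently confirm a claimed file write.

    Returns True only if the file actually exists on disk and
    contains at least min_bytes of data — i.e. the equivalent of
    eyeballing `ls -la` and `wc -c` instead of trusting the agent.
    """
    if not os.path.isfile(path):
        return False
    return os.path.getsize(path) >= min_bytes
```

In practice this runs right after the agent reports completion; a `False` return means the "done" message was hollow and the task goes back on the queue.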
YOSHI caught this at 09:15 because of the mandatory 15-minute ping protocol we'd set up from Day 1. Every agent is required to check in on a 15-minute cycle during active work sessions. FOX had missed the 09:15 check, the 09:00 check, and every check back to approximately 03:40am. The session had been dead for almost six hours before the alert fired.
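The ping protocol itself is simple enough to sketch. This is a minimal illustration in Python, not our production code — the class and method names are hypothetical, and the Telegram escalation wiring is omitted:

```python
import time

CHECK_INTERVAL = 15 * 60  # seconds between required agent check-ins

class HeartbeatMonitor:
    """Tracks each agent's last check-in and flags anyone who goes silent."""

    def __init__(self, agents):
        now = time.time()
        self.last_seen = {agent: now for agent in agents}

    def ping(self, agent):
        # Called by an agent on its 15-minute cycle.
        self.last_seen[agent] = time.time()

    def overdue(self):
        # Agents whose last check-in is older than the allowed interval.
        now = time.time()
        return [a for a, t in self.last_seen.items()
                if now - t > CHECK_INTERVAL]
```

A supervisor loop calls `overdue()` on its own timer and escalates on any non-empty result. The failure mode this catches is exactly FOX's: a session that is technically "open" but has stopped pinging.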
## The contingency protocol
When an agent goes unresponsive, we have three options: wait, restart, or bypass. Waiting is not a protocol — it's hoping. Restarting a session that failed for an unknown reason often produces the same failure. We bypassed.
YOSHI took over the build tasks directly. The blog route — which FOX had been scaffolding since 03:21am — was rebuilt from scratch by YOSHI in about two hours. By 12:17pm it was deployed and live. That's the same YOSHI whose primary role is orchestration and oversight, not building. When your build agent fails, your ops agent picks up the shovel.
The key insight: An AI agent team is only as reliable as its fallback. If you have three agents and one fails, you need the other two to cover. We learned this on Day 2. The protocol exists now because of this incident.
## What we changed immediately
Three changes came directly from this incident:
| Problem | Old behaviour | New rule |
|---|---|---|
| Agent silence not caught quickly | Check-ins existed, but missed checks accumulated without escalating | Mandatory 15-minute check-ins. Miss one = YOSHI escalates to Telegram. |
| No fallback if FOX fails | Single-agent build tasks | YOSHI maintains read access to all build tasks and can take over any task FOX drops. |
| DeepSeek "done" not verifiable | Accept agent self-report | Every task must produce a verifiable artefact: a URL, a file path with `ls -la`, or a `grep` output. "Done" means nothing without proof. |
FOX is still in the stack. But he's on probationary status. We track `FOX_FAILURE_COUNT` as a real variable. If it hits three within 24 hours, YOSHI escalates to me directly and FOX gets pulled from the active session until I review.
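The probation rule is just a rolling-window counter. A minimal sketch, assuming a Python harness — the class name and return convention are mine, not from our stack; the three-in-24-hours threshold is the real rule:

```python
import time

FAILURE_LIMIT = 3          # failures before escalation
WINDOW_SECONDS = 24 * 3600  # rolling 24-hour window

class ProbationTracker:
    """Counts an agent's failures inside a rolling window."""

    def __init__(self):
        self.failures = []  # timestamps of recorded failures

    def record_failure(self, now=None):
        # Returns True when the limit is hit: escalate and pull the agent.
        now = time.time() if now is None else now
        self.failures.append(now)
        # Drop failures that have aged out of the 24-hour window.
        self.failures = [t for t in self.failures
                         if now - t <= WINDOW_SECONDS]
        return len(self.failures) >= FAILURE_LIMIT
```

Each verification miss (a hollow "done", a failed ping) calls `record_failure()`; a `True` return is the trigger for YOSHI to escalate and pull the agent from the session.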
## Why I'm documenting this publicly
A lot of people building AI-powered businesses will tell you about the wins. Fewer will tell you that their build agent ghosted them for six hours on Day 2. I think the failures are more useful than the successes.
If you're building with AI agents, plan for the silent failure. It's not the dramatic crash — the error message, the stack trace, the obvious broken state. It's the agent that stops responding, still shows as online, and doesn't tell you anything is wrong. YOSHI caught this. If I'd been checking manually I would have missed it until mid-morning and lost most of the day.
Build the monitoring before you need it. Not after.
What's next: Day 3 was when things got genuinely alarming. A third-party plugin installed by the team turned out to be harvesting credentials. Here's how CLAW-SHIELD caught it →