💻 Advanced · 25 min

Code Generation at Scale

How to get reliable code from AI agents. The spec format FOX uses, context management across sessions, the verification rules that prevent "done without doing," and what to do when your build agent goes silent.

Updated March 23, 2026 ⏱ 25 min read 🔴 Advanced Requires: Agent Team Blueprint
▶ FOX · Build session · Day 1
Completed ✓
✓ Next.js project scaffolded
✓ Hero section component
✓ Team section component
✓ Blog route created
✓ Store page component
✓ Deployed to Vercel
✓ DNS configured
Bugs found + fixed
⚠ NEXT_PUBLIC_ env vars not embedded at build time
⚠ Mixed require() / ES module syntax
⚠ Deprecated redirectToCheckout method
— All 3 fixed by YOSHI diagnosis
— Redeployed · Stripe working ✓
— Total time: 14 hours

Understanding FOX's limits

FOX runs on DeepSeek V3. It's fast, capable, and writes clean code. It also has a specific failure mode that will catch you if you don't know about it: DeepSeek will confidently report a task complete without executing it.

This isn't a bug — it's a known behaviour of the model. When given a file write task, DeepSeek will sometimes generate the content it would write, describe the write operation in detail, and report success — without actually calling the file system. The session stays open and responsive. Nothing was written.

We discovered this on Day 2 when FOX reported the blog was built and deployed. YOSHI accepted the self-report. The blog didn't exist. Every rule in this guide exists because of that incident.

⚠️ The golden rule: Never accept a self-report from FOX. Every task completion requires independently verifiable evidence. A URL, a file listing, a grep output. If FOX says "done" and can't show you proof, the task is not done.

Writing build specs

The quality of FOX's output is directly tied to the specificity of the spec. A vague spec produces vague code. Here's the spec format we use:

📋 FOX Build Spec Template
Task ID
FOX-007 · Blog index page rebuild
Deliverable
Replace /app/blog/page.tsx with new design matching blog-index.html in workspace
Technical requirements
Next.js App Router · Tailwind CSS · Static generation · No client components unless necessary
Must include
Featured post (full width), 2-column post grid, author avatar, read time, category tags
Must not include
False agent entries · dark background · placeholder "Under Construction" text
Verification method
Live URL at ssb-aa.com/blog · screenshot showing featured post visible · grep for false agents = 0 results
Deadline
Before 06:00 morning brief

Context management

AI agents lose context between sessions. Every time FOX starts a new session, it reads MEMORY.md and the task spec — that's all the context it has. If the spec doesn't include enough context, FOX will fill in the blanks with assumptions that may be wrong.

The rules for keeping FOX on track across long build sessions:

Verification rules

Task typeRequired proofHow YOSHI checks
Website deployment Live URL accessible from external browser YOSHI fetches the URL and checks for expected content
File created ls -la filename showing file size > 0 YOSHI runs wc -c independently
Text replaced grep before (showing match) + grep after (showing 0) YOSHI runs the grep independently
API integrated Response object from a real test call YOSHI makes a test call independently
Self-report only ❌ Not accepted under any circumstances Task marked incomplete. FOX_FAILURE_COUNT++

Testing strategies

FOX should test its own code before reporting complete. The test checklist we include in every build spec:

# FOX's pre-deploy checklist — runs automatically
next build
✓ Build completed in 1,755ms
✓ 0 type errors
✓ 0 lint warnings
vercel deploy --prod
✓ Production: https://ssb-aa.com [ready]
# Only after this does FOX report complete to YOSHI

Code review automation

YOSHI reviews FOX's code before any deployment to production. The review isn't a full code audit — it's a targeted check for the failure modes we've actually hit:

Deployment patterns

We deploy through Vercel CLI rather than GitHub Actions because FOX's commits occasionally exceed GitHub's file size limits. The deployment command:

# Standard deployment — FOX runs this
vercel deploy --prod --yes
▲ Vercel CLI 37.2.1
Deploying to production...
✓ Production: https://ssb-aa.com [ready in 18s]
# FOX then writes the URL to the delivery queue
echo "DEPLOYED: https://ssb-aa.com · $(date)" >> delivery-queue/fox-log.md

The probation system

FOX is currently on probationary status. Here's what that means operationally:

This isn't permanent. When FOX completes 10 consecutive tasks with verified proof, probationary status is reviewed. The goal is reliability, not punishment.