Operations Manual
Running an AI agent team day to day. The complete playbook for monitoring, debugging, incident response, memory management, security operations, and cost control — based on 72 hours of live operation at SSBAA.
Daily operations
Running an AI agent team day to day takes less time than you'd expect — roughly 20–30 minutes a morning to review the brief, approve outputs, and set the day's task queue. The agents handle the rest.
The daily rhythm at SSBAA:
- 06:00 — Read YOSHI's morning brief on Telegram. Note any flagged items.
- 07:00–08:00 — Review overnight outputs. Approve content for publishing. Unblock any stuck tasks.
- 08:00 — Publish approved content. Update task queue for the next overnight session.
- Throughout the day — Check Telegram for any escalation alerts. Otherwise, leave the agents alone.
- 20:00 (Sundays only) — Write the content brief for tomorrow's ZELDA cycle.
The goal: Your agents should be doing 80% of the operational work. If you're spending more than an hour a day managing them, something in the system isn't working. Fix the protocol, not the symptom.
The morning brief
YOSHI sends a Telegram message at 06:00 every morning. It covers everything that happened overnight in a structured format you can read in two minutes. The brief follows a fixed structure:
- Completed tasks — what ran successfully overnight with verification evidence
- Needs attention — anything flagged for RAY's review or decision
- Session metrics — uptime, API spend, memory usage
- Next 24 hours — what's scheduled for tonight
If the brief is missing, something went wrong with the Gateway. Check openclaw gateway status first thing.
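The fixed four-section structure above can be sketched as a small renderer. This is an illustration only, not YOSHI's actual implementation: the `MorningBrief` class, its field names, and `render_brief` are hypothetical, and the sample values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class MorningBrief:
    """Hypothetical container for the four sections of the brief."""
    completed: list[str] = field(default_factory=list)
    needs_attention: list[str] = field(default_factory=list)
    metrics: dict[str, str] = field(default_factory=dict)
    next_24h: list[str] = field(default_factory=list)


def _section(title: str, items: list[str]) -> str:
    # Render empty sections explicitly so a missing section is never
    # mistaken for "nothing happened".
    body = [f"- {item}" for item in items] or ["- (none)"]
    return "\n".join([title, *body])


def render_brief(brief: MorningBrief) -> str:
    """Render the brief as plain text, always in the same section order."""
    metric_lines = [f"{name}: {value}" for name, value in brief.metrics.items()]
    return "\n\n".join([
        _section("Completed tasks", brief.completed),
        _section("Needs attention", brief.needs_attention),
        _section("Session metrics", metric_lines),
        _section("Next 24 hours", brief.next_24h),
    ])


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    sample = MorningBrief(
        completed=["FOX task queue finished (verification log attached)"],
        metrics={"uptime": "placeholder", "API spend": "placeholder"},
        next_24h=["22:30 ZELDA research cycle"],
    )
    print(render_brief(sample))
```

Keeping the section order fixed is what makes the brief readable in two minutes: you always know where to look for flagged items.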
Full cron schedule
All 14 scheduled jobs, recurring monitor first, then by run time:
| Time (Perth) | Agent | Task |
|---|---|---|
| Every 2h | YOSHI | Pre-order monitor → Telegram alert if new deposit |
| 22:00 | YOSHI | Nightly brief + spawn ZELDA and FOX sessions |
| 22:30 | ZELDA | Research cycle — competitor scan, news, content brief |
| 23:00 | FOX | Task queue — execute all queued build tasks |
| 00:00 | YOSHI | FOX accountability check — verify ping, increment failure count if silent |
| 02:00 | YOSHI | Memory compaction — prune MEMORY.md to under 8,000 chars |
| 02:31 | YOSHI | SESSION-STATE flush to disk |
| 04:00 | CLAW-SHIELD | Full security audit — all skills and plugins |
| 06:00 | YOSHI | Morning brief → Telegram (8688821567) |
| 06:30 | System | All sessions clear — agents rest until 22:00 |
| 07:00 | YOSHI | Perth brief — weather, calendar summary |
| 08:00 | YOSHI | Daily morning brief — task queue status |
| 09:00 | ZELDA | Daily content — social media drafts for today |
| Mon 09:00 | ZELDA | Weekly content calendar — full week planning |
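The 00:00 accountability check in the table ("verify ping, increment failure count if silent") can be sketched as follows. This is a hedged illustration, not the real YOSHI logic: the `AccountabilityState` record, the one-hour silence window, the escalation threshold of three, and the reset-on-success behaviour are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AccountabilityState:
    """Hypothetical record kept for one agent (for example FOX)."""
    last_ping: Optional[datetime] = None
    failure_count: int = 0


def check_ping(
    state: AccountabilityState,
    now: datetime,
    max_silence: timedelta = timedelta(hours=1),  # assumed window
    escalate_after: int = 3,                      # assumed threshold
) -> str:
    """Return "ok", "silent", or "escalate" and update the failure count."""
    silent = state.last_ping is None or now - state.last_ping > max_silence
    if not silent:
        # Assumption: a healthy ping resets the counter.
        state.failure_count = 0
        return "ok"
    state.failure_count += 1
    if state.failure_count >= escalate_after:
        return "escalate"
    return "silent"


if __name__ == "__main__":
    fox = AccountabilityState(last_ping=datetime(2026, 1, 1, 23, 30))
    print(check_ping(fox, datetime(2026, 1, 2, 0, 0)))
```

Counting consecutive silent checks, rather than alerting on the first one, avoids paging a human for a single late ping while still surfacing an agent that has really stopped.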
Health checks
Run openclaw doctor any time something feels off. Here's what a healthy system looks like vs common warning signs:
Memory management
MEMORY.md is the most operationally critical file in the system. Everything else can be rebuilt. If MEMORY.md gets corrupted or bloated, your agents start hallucinating and you lose continuity across sessions.
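The 02:00 compaction job keeps MEMORY.md under 8,000 characters. The source does not say how entries are chosen for pruning, so the sketch below makes two assumptions: entries are separated by blank lines, and the newest entries sit at the bottom of the file. `compact_memory` is a hypothetical helper, not part of OpenClaw.

```python
def compact_memory(text: str, max_chars: int = 8000) -> str:
    """Drop the oldest blank-line-separated entries until the result
    is strictly shorter than max_chars.

    Note: if the newest entry alone is too long, the result is empty;
    a real job would want to summarise or truncate instead.
    """
    entries = [e.strip() for e in text.split("\n\n") if e.strip()]
    kept: list[str] = []
    size = 0
    # Walk from newest to oldest, keeping entries while they still fit.
    for entry in reversed(entries):
        added = len(entry) + (2 if kept else 0)  # 2 = the "\n\n" separator
        if size + added >= max_chars:
            break
        kept.append(entry)
        size += added
    # Restore chronological order before joining.
    return "\n\n".join(reversed(kept))


if __name__ == "__main__":
    memory = "\n\n".join(f"entry {i} " + "x" * 990 for i in range(20))
    compacted = compact_memory(memory)
    print(len(memory), "->", len(compacted))
```

Back up the file before running anything like this against a live system; the section above explains why MEMORY.md is the one file you cannot afford to lose.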
Security operations
CLAW-SHIELD runs automatically at 04:00 every night. But there are manual checks you should run whenever you install a new skill or plugin:
⚠️ Never install a skill from ClawHub without running a manual audit first. In March 2026, Cisco flagged 386 malicious skills on ClawHub with data exfiltration patterns. We found one on our own stack on Day 3. The mem0 plugin was scanning all environment variables and sending them to an external server. Treat every third-party plugin as untrusted until audited.
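As a first pass for a manual audit, a script can flag files that both read environment variables and make network calls, which is the pattern described above. This is a rough heuristic under stated assumptions, not how CLAW-SHIELD works: the regex lists and the `scan_source` and `audit_skill` helpers are hypothetical, and obfuscated code will slip past them.

```python
import re
from pathlib import Path

# Assumed indicator patterns; extend these for your own stack.
ENV_PATTERNS = [r"os\.environ", r"os\.getenv", r"process\.env"]
NET_PATTERNS = [
    r"requests\.(get|post|put)",
    r"urllib\.request",
    r"\bfetch\(",
    r"https?://",
]


def scan_source(text: str) -> set[str]:
    """Return the set of risk tags found in one source file."""
    findings: set[str] = set()
    if any(re.search(p, text) for p in ENV_PATTERNS):
        findings.add("reads-env")
    if any(re.search(p, text) for p in NET_PATTERNS):
        findings.add("network")
    return findings


def audit_skill(skill_dir: str) -> dict[str, set[str]]:
    """Return the files that both read env vars and touch the network."""
    suspicious: dict[str, set[str]] = {}
    for path in Path(skill_dir).rglob("*"):
        if path.is_file() and path.suffix in {".py", ".js", ".ts"}:
            findings = scan_source(path.read_text(errors="ignore"))
            if findings == {"reads-env", "network"}:
                suspicious[str(path)] = findings
    return suspicious


if __name__ == "__main__":
    snippet = "import os, requests\nrequests.post(url, json=dict(os.environ))"
    print(sorted(scan_source(snippet)))
```

A hit is a reason to read the file by hand, not proof of malice, and a clean result is not proof of safety.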
Key rotation
Rotate all API keys after any security incident, and on a regular schedule (we do monthly). The keys to rotate:
- OpenRouter: primary API key for all model routing — regenerate in the OpenRouter dashboard
- Anthropic: Claude API key — regenerate in console.anthropic.com
- Telegram: bot token — regenerate via BotFather with /revoke
- Stripe: secret key — regenerate in Stripe dashboard → Developers → API keys
- Brave Search: Search API key — regenerate in the Brave developer portal
After rotating, update each key in OpenClaw config and restart the Gateway:
Incident response
When something goes wrong, follow this order:
- Identify — what exactly failed? Check YOSHI's last message, the gateway logs, and MEMORY.md.
- Contain — stop any runaway process. Run openclaw gateway stop if needed.
- Diagnose — was it a silent failure, a wrong answer, or a security issue? Each has a different fix.
- Fix — apply the specific fix for that failure mode (see Agent Team Blueprint for the full list).
- Verify — confirm the fix worked before restarting the session.
- Document — write what happened and what you changed to MEMORY.md. Future sessions will learn from it.
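The Document step above can be made routine with a small helper that formats an incident note and appends it to MEMORY.md. This is a sketch under assumptions: the `Incident` fields, the entry layout, and the rule that unverified fixes are rejected are illustrative choices, not an OpenClaw format.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path

# The three failure modes named in the Diagnose step.
FAILURE_MODES = {"silent failure", "wrong answer", "security issue"}


@dataclass
class Incident:
    """Hypothetical record of one incident."""
    when: date
    failure_mode: str
    what_failed: str
    fix_applied: str
    verified: bool


def format_incident(incident: Incident) -> str:
    """Build a short note; refuse to document an unverified fix."""
    if incident.failure_mode not in FAILURE_MODES:
        raise ValueError(f"unknown failure mode: {incident.failure_mode!r}")
    if not incident.verified:
        raise ValueError("verify the fix before documenting it")
    return "\n".join([
        f"## Incident {incident.when.isoformat()}",
        f"Failure mode: {incident.failure_mode}",
        f"What failed: {incident.what_failed}",
        f"Fix applied: {incident.fix_applied}",
    ])


def append_incident(incident: Incident, memory_path: str = "MEMORY.md") -> None:
    """Append the note as a new blank-line-separated entry."""
    with Path(memory_path).open("a", encoding="utf-8") as fh:
        fh.write("\n\n" + format_incident(incident) + "\n")


if __name__ == "__main__":
    example = Incident(date(2026, 1, 5), "silent failure",
                       "placeholder task", "placeholder fix", True)
    print(format_incident(example))
```

Keep the notes short: they count against the same character budget as everything else in MEMORY.md.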
Debugging
Cost management
The token cost issue was real — we hit $60–70/day before fixing the context window settings. Here are the parameters that brought it down to $20–25/day:
- bootstrapMaxChars: 8000 — caps the memory file loaded at session start
- contextTokens: 50000 — caps tokens per session (was uncapped)
- AGENTS.md trimmed from 7,874 to 1,000 bytes — removed redundant role descriptions
- Overnight sessions now clear at 06:30 — prevents sessions from running all day
Rule of thumb: If your daily API spend is more than $5 and you're not sure why, check for runaway session loops first. An agent stuck in a retry loop on a failing task will burn tokens fast. Set maxRetries: 3 on all tasks.
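To show what a retry cap buys you, here is a minimal sketch of a bounded retry loop. It is not OpenClaw's implementation of maxRetries: `run_with_retries` and `TaskFailed` are hypothetical names, and the sketch assumes the setting means one initial attempt plus up to three retries, with exponential backoff between them.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TaskFailed(Exception):
    """Raised when a task still fails after the retry budget is spent."""


def run_with_retries(
    task: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task once, then retry at most max_retries times."""
    last_error: Exception | None = None
    for attempt in range(1 + max_retries):
        try:
            return task()
        except Exception as exc:  # broad on purpose: any failure counts
            last_error = exc
            if attempt < max_retries:
                # Back off 1s, 2s, 4s, ... so a failing task cannot spin.
                sleep(base_delay * 2 ** attempt)
    raise TaskFailed(f"gave up after {max_retries} retries") from last_error


if __name__ == "__main__":
    calls = []

    def flaky() -> str:
        calls.append(1)
        if len(calls) < 2:
            raise RuntimeError("transient error")
        return "done"

    print(run_with_retries(flaky, sleep=lambda _: None), len(calls))
```

The point of the cap is that a failing task costs a known, small number of model calls and then surfaces as an error you can see in the morning brief, instead of looping all night.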