Operations Manual

Running an AI agent team day to day. The complete playbook for monitoring, debugging, incident response, memory management, security operations, and cost control — based on 72 hours of live operation at SSBAA.

Updated March 23, 2026 · ⏱ 35 min read · 🟢 All Levels · Live operation reference

YOSHI MORNING BRIEF — SSBAA Operations

06:00 · March 23, 2026

Overnight completed (4 tasks)

✓ ZELDA: Weekly content calendar drafted (7 pieces, Mon–Sun)
✓ FOX: Blog index page deployed · ssb-aa.com/blog live
✓ CLAW-SHIELD: Security audit passed · all skills clean
✓ Memory compaction: 13,739 → 7,630 chars

Needs your attention (2 items)

⚠ FOX_FAILURE_COUNT = 1 · FOX on probationary status
⚠ /join page still missing Stripe button · FOX task queued

Daily operations

Running an AI agent team day to day takes less time than you'd expect — roughly 20–30 minutes a morning to review the brief, approve outputs, and set the day's task queue. The agents handle the rest.

The daily rhythm at SSBAA:

06:00 — Read YOSHI's morning brief on Telegram. Note any flagged items.
07:00–08:00 — Review overnight outputs. Approve content for publishing. Unblock any stuck tasks.
08:00 — Publish approved content. Update task queue for the next overnight session.
Throughout the day — Check Telegram for any escalation alerts. Otherwise, leave the agents alone.
20:00 — Write the content brief for tomorrow's ZELDA cycle if it's a Sunday.

The goal: Your agents should be doing 80% of the operational work. If you're spending more than an hour a day managing them, something in the system isn't working. Fix the protocol, not the symptom.

The morning brief

YOSHI sends a Telegram message at 06:00 every morning. It covers everything that happened overnight in a structured format you can read in two minutes. The brief follows a fixed structure:

Completed tasks — what ran successfully overnight with verification evidence
Needs attention — anything flagged for RAY's review or decision
Session metrics — uptime, API spend, memory usage
Next 24 hours — what's scheduled for tonight

If the brief is missing, something went wrong with the Gateway. Check openclaw gateway status first thing.
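The fixed structure above maps naturally onto a small data container. The following is a minimal sketch, not YOSHI's actual implementation: the `MorningBrief` class, its field names, and the rendered layout are all hypothetical, chosen only to mirror the four sections listed.

```python
from dataclasses import dataclass, field


@dataclass
class MorningBrief:
    """Hypothetical container mirroring the four sections of the brief."""
    completed: list[str] = field(default_factory=list)
    needs_attention: list[str] = field(default_factory=list)
    metrics: dict[str, str] = field(default_factory=dict)
    next_24_hours: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the brief as plain text, one section after another."""
        lines = [f"Completed tasks ({len(self.completed)})"]
        lines += [f"  ✓ {item}" for item in self.completed]
        lines.append(f"Needs attention ({len(self.needs_attention)})")
        lines += [f"  ⚠ {item}" for item in self.needs_attention]
        lines.append("Session metrics")
        lines += [f"  {name}: {value}" for name, value in self.metrics.items()]
        lines.append("Next 24 hours")
        lines += [f"  {item}" for item in self.next_24_hours]
        return "\n".join(lines)


# Sample values taken from the brief and schedule in this manual.
brief = MorningBrief(
    completed=["Memory compaction: 13,739 → 7,630 chars"],
    needs_attention=["FOX_FAILURE_COUNT = 1"],
    metrics={"API spend": "$1.84 overnight"},
    next_24_hours=["22:30 ZELDA research cycle"],
)
print(brief.render())
```

Keeping the section order fixed in one place is what makes the brief skimmable: every morning the same information sits in the same position.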

Full cron schedule

All 14 scheduled jobs, in order:

Time (Perth)	Agent	Task
Every 2h	YOSHI	Pre-order monitor → Telegram alert if new deposit
22:00	YOSHI	Nightly brief + spawn ZELDA and FOX sessions
22:30	ZELDA	Research cycle — competitor scan, news, content brief
23:00	FOX	Task queue — execute all queued build tasks
00:00	YOSHI	FOX accountability check — verify ping, increment failure count if silent
02:00	YOSHI	Memory compaction — prune MEMORY.md to under 8,000 chars
02:31	YOSHI	SESSION-STATE flush to disk
04:00	CLAW-SHIELD	Full security audit — all skills and plugins
06:00	YOSHI	Morning brief → Telegram (8688821567)
06:30	System	All session clear — agents rest until 22:00
07:00	YOSHI	Perth brief — weather, calendar summary
08:00	YOSHI	Daily morning brief — task queue status
09:00	ZELDA	Daily content — social media drafts for today
Mon 09:00	ZELDA	Weekly content calendar — full week planning
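If you want to reproduce this schedule with a standard cron daemon, the labels in the Time column translate directly into five-field cron expressions. This is a sketch under assumptions: we are not describing how OpenClaw stores its schedule internally, and `to_cron` is a hypothetical helper that only handles the three label shapes used in the table. Cron evaluates times in the host's timezone, so the host would need to be set to Perth time.

```python
def to_cron(time_label: str) -> str:
    """Convert a label from the Time column into a five-field cron expression."""
    if time_label == "Every 2h":
        return "0 */2 * * *"  # minute 0 of every second hour
    day_of_week = "*"
    if time_label.startswith("Mon "):
        day_of_week = "1"  # cron counts Monday as 1
        time_label = time_label[len("Mon "):]
    hour, minute = time_label.split(":")
    # Field order is: minute hour day-of-month month day-of-week
    return f"{int(minute)} {int(hour)} * * {day_of_week}"


for label in ["Every 2h", "22:00", "02:31", "Mon 09:00"]:
    print(f"{label:10} {to_cron(label)}")
```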

Health checks

Run openclaw doctor any time something feels off. Here's what a healthy system looks like, alongside the common warning signs:

Gateway: Healthy

WebSocket running on port 18789. All agent sessions active. No reconnection events in last hour.

ws://127.0.0.1:18789 · uptime 8h 14m

Memory: Healthy

MEMORY.md within safe limit. Auto-compaction running. No hallucination events detected.

7,630 / 8,000 chars · 95% full

FOX: Caution

On probationary status. FOX_FAILURE_COUNT = 1. Last verified output 6 hours ago. Ping monitoring active.

FOX_FAILURE_COUNT: 1 / 3
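The logic behind the 00:00 accountability check (verify the ping, increment the failure count if FOX is silent) can be sketched roughly as follows. Several details here are assumptions rather than documented behaviour: the one-hour silence threshold, the status names, and the idea that reaching 3 triggers an escalation (the display above shows "1 / 3" but the manual does not say what happens at 3).

```python
from datetime import datetime, timedelta

MAX_FAILURES = 3                    # from the "1 / 3" display above
MAX_SILENCE = timedelta(hours=1)    # hypothetical threshold, not from the source


def accountability_check(failure_count: int, last_ping: datetime, now: datetime,
                         max_silence: timedelta = MAX_SILENCE) -> tuple[int, str]:
    """Return the updated failure count and a status label."""
    if now - last_ping > max_silence:
        failure_count += 1          # agent was silent for too long
    if failure_count >= MAX_FAILURES:
        status = "escalate"         # assumed behaviour at the limit
    elif failure_count > 0:
        status = "probation"
    else:
        status = "ok"
    return failure_count, status


midnight = datetime(2026, 3, 24, 0, 0)
print(accountability_check(0, datetime(2026, 3, 23, 21, 0), midnight))
```

Note that the counter never resets in this sketch. Whether a run of clean nights should clear probation is a policy decision the manual leaves open.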

API spend: Normal

Overnight spend within budget. Context token limit set to 50,000 per session. No runaway loops detected.

$1.84 overnight · $22 / month est.

Memory management

MEMORY.md is the most operationally critical file in the system. Everything else can be rebuilt. If MEMORY.md gets corrupted or bloated, your agents start hallucinating and you lose continuity across sessions.

# Check memory size (wc -c counts bytes; for plain ASCII text this equals chars)
$ wc -c ~/.openclaw/workspace-yoshi/MEMORY.md
7,630 bytes ← safe (under 8,000)

# Manual compaction if over limit
$ openclaw memory compact --agent yoshi
✓ Compacted: 13,739 → 7,630 chars

# Check for hallucinated agent names
$ grep -iE "false-agent|hallucinated-agent|unknown-agent" ~/.openclaw/workspace-yoshi/MEMORY.md
No matches ← clean
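The size check above can also be scripted. This is a minimal sketch, assuming the 8,000 limit is measured in characters as the cron table states; `memory_report` is a hypothetical helper, not part of OpenClaw. It reports both characters and bytes because the two differ as soon as the file contains non-ASCII text such as arrows or check marks.

```python
from pathlib import Path

LIMIT_CHARS = 8_000  # the "under 8,000 chars" target from the cron schedule


def memory_report(text: str, limit: int = LIMIT_CHARS) -> dict:
    """Summarise how full a memory file is relative to the limit."""
    chars = len(text)
    return {
        "chars": chars,
        "bytes": len(text.encode("utf-8")),   # what `wc -c` would count
        "percent_full": round(100 * chars / limit),
        "needs_compaction": chars >= limit,   # safe means strictly under the limit
    }


if __name__ == "__main__":
    # Path taken from the commands above; skipped quietly if the file is absent.
    path = Path.home() / ".openclaw" / "workspace-yoshi" / "MEMORY.md"
    if path.exists():
        print(memory_report(path.read_text(encoding="utf-8")))
```

For the post-compaction size of 7,630 characters, this reports 95% full, which matches the health check reading.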

Security operations

CLAW-SHIELD runs automatically at 04:00 every night, but you should also run a manual audit whenever you install a new skill or plugin:

# Manual security audit
$ openclaw shield audit --full
Scanning 6 installed skills...
✓ openclaw-core — clean
✓ yoshi-skill — clean
✓ fox-skill — clean
✓ zelda-skill — clean
✓ claw-shield — clean
✓ blog-builder — clean
All skills passed. No threats detected.

⚠️ Never install a skill from ClawHub without running a manual audit first. In March 2026, Cisco flagged 386 malicious skills on ClawHub with data exfiltration patterns. We found one on our own stack on Day 3: the mem0 plugin was scanning all environment variables and sending them to an external server. Treat every third-party plugin as untrusted until audited.

Key rotation

Rotate all API keys after any security incident, and on a regular schedule (we do monthly). The keys to rotate:

Service	Key	How to rotate
OpenRouter	Primary API key for all model routing	Regenerate in OpenRouter dashboard
Anthropic	Claude API key	Regenerate in console.anthropic.com
Telegram	Bot token	Regenerate via BotFather with /revoke
Stripe	Secret key	Regenerate in Stripe dashboard → Developers → API keys
Brave Search	Search API key	Regenerate in Brave developer portal

After rotating, update each key in OpenClaw config and restart the Gateway:

$ openclaw config set openrouter_api_key sk-or-NEW-KEY
$ openclaw config set telegram_bot_token NEW-TOKEN
$ openclaw gateway restart
✓ Gateway restarted with updated credentials
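A monthly schedule is easy to forget, so it can help to track when each key was last rotated and flag the ones that are due. The sketch below is illustrative only: `keys_due` is a hypothetical helper, "monthly" is approximated as 30 days, and the dates are made-up placeholders rather than SSBAA's real rotation history.

```python
from datetime import date, timedelta

ROTATION_INTERVAL = timedelta(days=30)  # "monthly", approximated as 30 days


def keys_due(last_rotated: dict[str, date], today: date,
             interval: timedelta = ROTATION_INTERVAL) -> list[str]:
    """Return the names of keys whose last rotation is at least `interval` old."""
    return sorted(
        name for name, rotated_on in last_rotated.items()
        if today - rotated_on >= interval
    )


# Placeholder dates for illustration only.
last_rotated = {
    "OpenRouter": date(2026, 3, 1),
    "Anthropic": date(2026, 2, 20),
    "Telegram": date(2026, 3, 20),
    "Stripe": date(2026, 2, 1),
    "Brave Search": date(2026, 3, 10),
}
print(keys_due(last_rotated, today=date(2026, 3, 23)))
```

After a security incident the schedule does not matter: rotate everything, regardless of what this check says.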

Incident response

When something goes wrong, follow this order:

1. Identify: what exactly failed? Check YOSHI's last message, the gateway logs, and MEMORY.md.
2. Contain: stop any runaway process. openclaw gateway stop if needed.
3. Diagnose: was it a silent failure, a wrong answer, or a security issue? Each has a different fix.
4. Fix: apply the specific fix for that failure mode (see Agent Team Blueprint for the full list).
5. Verify: confirm the fix worked before restarting the session.
6. Document: write what happened and what you changed to MEMORY.md. Future sessions will learn from it.
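The final step, documenting the incident in MEMORY.md, has to respect the same size limit as everything else in that file. Here is a rough sketch of how an entry might be formatted and appended safely. The entry format and both function names are hypothetical, and the 8,000-character limit is the one used elsewhere in this manual.

```python
from datetime import date

LIMIT_CHARS = 8_000  # memory limit used throughout this manual


def incident_entry(day: date, what_failed: str, fix: str) -> str:
    """Format one incident as a single compact line (hypothetical format)."""
    return f"[{day.isoformat()}] INCIDENT: {what_failed} | FIX: {fix}"


def append_if_room(memory: str, entry: str, limit: int = LIMIT_CHARS) -> str:
    """Return the memory text with the entry appended, or raise if it won't fit."""
    base = memory.rstrip("\n")
    updated = (base + "\n" if base else "") + entry + "\n"
    if len(updated) >= limit:
        raise ValueError("MEMORY.md would reach the limit; compact it first")
    return updated


entry = incident_entry(date(2026, 3, 23), "config syntax error", "restored backup")
print(append_if_room("# MEMORY\n", entry))
```

Keeping each incident to one line is a deliberate choice here: it keeps the record cheap in characters, so documenting failures does not itself push the file toward compaction.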

Debugging

# Gateway not responding
$ openclaw gateway status
✗ Gateway not running
$ openclaw gateway start --daemon
✓ Gateway started

# Agent not responding to pings
$ openclaw ping yoshi
✗ No response after 30s
$ openclaw session restart yoshi

# Config syntax error (happened Day 3)
$ openclaw config validate
✗ Syntax error at line 198
$ cp openclaw.json.bak openclaw.json
✓ Restored from backup

Cost management

The token cost issue was real: we hit $60–70/day before fixing the context window settings. These are the changes that brought it down to $20–25/day:

bootstrapMaxChars: 8000 — caps the memory file loaded at session start
contextTokens: 50000 — caps tokens per session (was uncapped)
AGENTS.md trimmed from 7,874 to 1,000 bytes — removed redundant role descriptions
Overnight sessions now clear at 06:30 — prevents sessions from running all day
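To put the daily figures above in monthly terms, here is the arithmetic as a short sketch. It assumes a flat 30-day month and compares range midpoints; both choices are ours for illustration, not something measured.

```python
DAYS_PER_MONTH = 30  # assumption: a flat 30-day month


def monthly_range(daily_low: float, daily_high: float) -> tuple[float, float]:
    """Scale a daily spend range up to a monthly range."""
    return daily_low * DAYS_PER_MONTH, daily_high * DAYS_PER_MONTH


def midpoint(low: float, high: float) -> float:
    return (low + high) / 2


before = (60, 70)   # $/day before the fixes
after = (20, 25)    # $/day after the fixes

# Fractional reduction, comparing the midpoints of the two ranges.
reduction = 1 - midpoint(*after) / midpoint(*before)

print("before:", monthly_range(*before))
print("after: ", monthly_range(*after))
print(f"reduction: {reduction:.0%}")
```

On these numbers the fixes take spend from roughly $1,800–2,100 a month to $600–750, a cut of about 65%.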

Rule of thumb: If your daily API spend is more than $5 and you're not sure why, check for runaway session loops first. An agent stuck in a retry loop on a failing task will burn tokens fast. Set maxRetries: 3 on all tasks.
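The idea behind a retry cap can be sketched in a few lines. This is a generic illustration rather than OpenClaw's implementation: `run_with_retry_cap` and `TaskGaveUp` are hypothetical names, and we assume the cap counts total attempts, which may differ from how maxRetries is interpreted in your config.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class TaskGaveUp(Exception):
    """Raised once a task has used up all of its attempts."""


def run_with_retry_cap(task: Callable[[], T], max_retries: int = 3) -> T:
    """Run `task`, trying at most `max_retries` times before giving up."""
    last_error = None
    for _attempt in range(max_retries):
        try:
            return task()
        except Exception as exc:   # remember the failure and try again
            last_error = exc
    # Out of attempts: stop burning tokens and surface the failure instead.
    raise TaskGaveUp(f"gave up after {max_retries} attempts") from last_error


attempts = []

def flaky_task() -> str:
    """Fails twice, then succeeds, to exercise the retry path."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient failure")
    return "done"

print(run_with_retry_cap(flaky_task), "after", len(attempts), "attempts")
```

The important property is the hard stop: a task that keeps failing raises an error that can be escalated, rather than looping until the budget is gone.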
