Case Study
Cortex Agent Fleet
A production multi-agent AI system that autonomously manages engineering operations — from PR lifecycle to job search infrastructure — with 73+ tasks completed and 5 repos under active AI-driven development.
The Problem
Managing a multi-project engineering operation solo — active job search, client projects (ALCBF non-profit, SCF Dance), and open-source infrastructure — created a coordination bottleneck. Context evaporated between sessions, tasks fell through cracks, and driving work from idea to merged PR required constant manual steering.
The deeper problem: AI coding tools are stateless. Each session starts cold. Without a durable cross-session memory architecture and a systematic way to delegate and verify, any "AI-assisted" workflow is just autocomplete with extra steps.
The Solution
Cortex is a production multi-agent platform running 2 gateway agents — Dara Fox (Distinguished Engineer) and Clara Nova (Chief of Staff) — each with distinct domain authority, shared infrastructure, and a dispatched sub-agent model for implementation work.
Fleet Architecture
- **2 Opus-tier gateway agents** — domain-isolated orchestrators that architect, delegate, verify, and integrate. They never implement directly; that's the sub-agents' job.
- **8+ specialist sub-agents on demand** — frontend-engineer, backend-engineer, devops, security-auditor, test-engineer, docs-writer, and others. Each operates with a constrained tool set and a full delegation brief.
- **Fleet cohesion via dual primitives** — a rolling `current-state.md` (regenerated every 30 minutes by the heartbeat cron) and an append-only `activity.jsonl` stream written by every component. Any process reads these two files at start and achieves cross-session coherence without shared memory.
- **Queue-based dispatch** — cron jobs enqueue work via flock-protected temp scripts, solving the tmux send-keys payload size limit that caused a 7-hour fleet-wide dispatch wedge in April 2026.
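The dispatch primitive can be sketched in shell. Everything below is an assumption (directory layout, file names, the sample payload) except the core idea stated above: write the payload to a temp script under an exclusive flock, so only a short path ever travels through tmux send-keys.

```shell
# Sketch of flock-protected queue dispatch (all names hypothetical).
QUEUE_DIR="${QUEUE_DIR:-$(mktemp -d)}"

enqueue() {
  # $1 = payload the daemon should run later. Writing it to a temp
  # script sidesteps the tmux send-keys payload size limit: only a
  # short file path needs to be sent to the session.
  job="$(mktemp "$QUEUE_DIR/job.XXXXXX")"
  printf '%s\n' "$1" > "$job"
  chmod +x "$job"
  # flock serializes concurrent cron writers on the queue index.
  (
    flock -x 9
    printf '%s\n' "$job" >> "$QUEUE_DIR/index"
  ) 9>> "$QUEUE_DIR/index.lock"
  # Daemon side (not shown) would receive just the path, e.g.:
  #   tmux send-keys -t cortex "sh $job" Enter
}

enqueue 'echo "advance Notion task"'
```

The lock file is separate from the index so readers never contend with the lock descriptor itself.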
Autonomous Scheduling
- **Execute cron (every 3h)** — advances up to 3 open Notion tasks per run: CI monitoring, CodeRabbit resolution, rebases, PR promotion through a 3-gate quality check (CI green + CodeRabbit clean + mergeable).
- **Plan cron (every 4h)** — scans active projects, creates up to 5 new tasks per run, promotes Backlog → Ready for unblocked tasks with clear acceptance criteria.
- **Heartbeat cron (every 30 min)** — pings 7 identity services, checks cron cadence, posts to #alerts only on failure, regenerates `current-state.md`.
- **PR digest cron (8 AM + 4 PM MT)** — reviews open PRs, age-tiers them, posts a Slack digest + Telegram DM to D.
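As a hypothetical crontab rendering of the four cadences (script names are placeholders, and the cron syntax itself is illustrative: per Key Engineering Decisions below, the real system fires via launchd; the digest entry assumes `TZ=America/Denver`):

```cron
0 */3 * * *   /opt/cortex/bin/execute-tick   # advance up to 3 tasks
0 */4 * * *   /opt/cortex/bin/plan-tick      # create up to 5 tasks
*/30 * * * *  /opt/cortex/bin/heartbeat      # health checks + current-state.md
0 8,16 * * *  /opt/cortex/bin/pr-digest      # Slack digest + Telegram DM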
PR Quality Gates
No PR reaches "In Review" (D's queue) until three gates pass: (1) CI fully green on all required checks, (2) all CodeRabbit review threads resolved via automated triage, and (3) PR mergeable with no conflicts or stale base. The Execute cron drives PRs through these gates across successive runs — D clicks Merge on a clean queue.
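A sketch of how the three gates might be evaluated mechanically. The JSON field names match what `gh pr view --json mergeable,statusCheckRollup` returns; the function name, sample data, and the stubbed CodeRabbit gate (resolved review threads require a separate GraphQL query) are assumptions, not the actual Execute cron.

```shell
# Evaluate the three promotion gates over gh-CLI JSON (sketch).
gates_pass() {
  # $1 = output of: gh pr view <num> --json mergeable,statusCheckRollup
  # Gate 1: every check in the rollup concluded SUCCESS.
  ci_green="$(printf '%s' "$1" \
    | jq '[.statusCheckRollup[] | .conclusion == "SUCCESS"] | all')"
  # Gate 3: GitHub reports the branch mergeable (no conflicts, fresh base).
  mergeable="$(printf '%s' "$1" | jq '.mergeable == "MERGEABLE"')"
  # Gate 2 stub: resolved CodeRabbit threads need a GraphQL reviewThreads
  # query, elided here.
  coderabbit_clean=true
  [ "$ci_green" = true ] && [ "$mergeable" = true ] && \
    [ "$coderabbit_clean" = true ]
}

sample='{"mergeable":"MERGEABLE","statusCheckRollup":[{"conclusion":"SUCCESS"},{"conclusion":"SUCCESS"}]}'
if gates_pass "$sample"; then echo "promote to In Review"; fi
```

Because each gate is a pure predicate over one API response, the Execute cron can re-evaluate it cheaply on every run until the PR clears.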
Key Engineering Decisions
- **Stateless orchestration over shared state**
Each cron fire and daemon turn reads two files and writes one event. No database, no message broker, no shared memory. Fleet cohesion emerges from append-only logs and rolling summaries — the same pattern used in distributed tracing.
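A minimal sketch of one such turn. The two file names come from the architecture above; the event schema and actor names are assumptions.

```shell
# One stateless turn: read two files, write one append-only event.
STATE_DIR="${STATE_DIR:-$(mktemp -d)}"
: > "$STATE_DIR/current-state.md"    # rolling summary (regenerated elsewhere)
: > "$STATE_DIR/activity.jsonl"      # append-only event stream

turn() {
  # 1. Read: rolling summary + recent events = full cross-session context.
  cat "$STATE_DIR/current-state.md" > /dev/null
  tail -n 50 "$STATE_DIR/activity.jsonl" > /dev/null
  # 2. Work happens here (delegate, verify, integrate ...).
  # 3. Write exactly one event; single-line appends on one host are
  #    effectively atomic, so no lock is needed for the stream.
  printf '{"ts":"%s","actor":"%s","event":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" \
    >> "$STATE_DIR/activity.jsonl"
}

turn "execute-cron" "task_advanced"
```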
- **Notion as single source of truth**
All task tracking lives in Notion (not GitHub Issues, not Slack threads). This separates task lifecycle from implementation artifacts (PRs in GitHub) and gives D a single dashboard view across 5 repos.
- **GitHub App identity per agent**
Each specialist sub-agent has its own GitHub App bot identity (dara-fox[bot], eli-cortex[bot], zara-cortex[bot], etc.). Commits, PRs, and reviews are attributed per-agent — full audit trail, no shared credentials.
- **TCC-aware launchd + tmux hybrid**
macOS TCC (Transparency, Consent, Control) blocks keychain access for launchd-spawned processes. Solution: launchd fires into long-lived tmux sessions bootstrapped from Terminal (which has TCC approval). Crons enqueue; daemons execute. One-time setup, indefinitely reliable.
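A hypothetical launchd job illustrating the split: launchd never does keychain-touching work itself, it only sends a short command into the Terminal-bootstrapped tmux session. The label, tmux path, session name, and `dequeue-and-run` command are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.cortex.execute</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/tmux</string>
    <string>send-keys</string>
    <string>-t</string>
    <string>cortex</string>
    <string>dequeue-and-run</string>
    <string>Enter</string>
  </array>
  <!-- Execute cadence: every 3 hours (10800 seconds). -->
  <key>StartInterval</key>
  <integer>10800</integer>
</dict>
</plist>
```

The `cortex` session is created once by hand in Terminal.app (which holds the TCC keychain approval), so the daemon inside it inherits that grant indefinitely.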
Tech Stack
What I Learned
- **Coherence across stateless processes is an architecture problem, not a memory problem.** The `activity.jsonl` + rolling summary pattern generalizes to any distributed system where agents need shared context without a shared runtime.
- **Quality gates at the automation boundary matter more than at the human boundary.** Three-gate PR promotion (CI + CodeRabbit + mergeability) eliminated the pattern of D reviewing un-mergeable PRs — trust compounds when the automation is never wrong.
- **Out-of-band monitoring is non-negotiable.** A watchdog that shares a substrate with the thing it watches fails with it. Every monitor needs a disjoint failure domain — learned from a 7-hour fleet-wide dispatch wedge that the in-process watchdog couldn't detect.
- **Token efficiency and autonomy are orthogonal.** Dropping from 37M to 14.5M tokens/week required only cadence tuning (hourly Execute → every 3h) — no capability regression. Most of the token budget was clock ticks, not work.