TL;DR
AI coding agents like Claude Code are impressively good -- but they forget everything between sessions. In a project with 139 sessions, 24 rule files, and 14,000+ commits, I show how to build a "harness" that keeps your agent consistent, safe, and productive. It's not the prompts that make the difference. It's the system around them.
What Is a Harness?
When you work with an AI coding agent, you have a problem: the agent has no memory between sessions. Session 1 and session 100 start with the same knowledge. The agent knows neither your architecture, nor your conventions, nor the mistakes it made yesterday.
A harness is the solution. It's all the files and systems that give the agent the necessary context at every start. Think of it as an onboarding package for a new developer -- except that developer starts fresh every day.
In my case, the harness consists of four layers:
- CLAUDE.md: The main document. Commands, stack, architecture decisions.
- Rules: 24 rule files defining specific standards.
- Memory: Persistent information that applies across sessions.
- Session Handovers: 102 handover documents passing state between sessions.
Layer 1: CLAUDE.md as the Main Document
CLAUDE.md is the first thing the agent reads. It must be short, precise, and current. In my project, it contains:
- Commands: `bun dev`, `npm test`, `tsgo --noEmit`. The agent needs to know how to verify the code.
- Stack: Next.js 16.2.1, React 19, TypeScript 5.9, Tailwind 4. No guessing.
- Architecture: Three Supabase client patterns (Browser, Server, Admin). Who uses what when.
- Quality Gate: `tsgo --noEmit && bun run build && npm test`. Before every deploy.
A common mistake: CLAUDE.md gets too long. Everything gets stuffed in until it hits 500 lines. The agent skims it and forgets half. My solution: stay under 100 lines. Everything detailed lives in the Rules.
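As an illustration, a sub-100-line CLAUDE.md might open like this. The structure here is my sketch; the commands and stack details are the ones listed above:

```markdown
# CLAUDE.md

## Commands
- Dev server: bun dev
- Tests: npm test
- Type check: tsgo --noEmit
- Quality gate (before every deploy): tsgo --noEmit && bun run build && npm test

## Stack
Next.js 16.2.1, React 19, TypeScript 5.9, Tailwind 4. No guessing.

## Architecture
Three Supabase client patterns: Browser, Server, Admin.
Detailed standards live in .claude/rules/ -- read the relevant rule before editing.
```

The point is density: every line is something the agent acts on, and anything that needs elaboration points into the Rules instead of living here.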
Layer 2: Rules as Domain-Specific Standards
24 files in `.claude/rules/`. Each covers exactly one topic:

- `error-handling-catch.md`: No `.catch(() => {})`. Every catch must log.
- `api-response-consistency.md`: All API responses via `apiResponse.*()`. No raw `NextResponse.json()`.
- `test-colocation.md`: Tests belong in `__tests__/` directories next to the source.
- `voice-consistency.md`: First-person "I" perspective, no "we" on the website.
- `content-writing-standards.md`: No em-dashes. No AI buzzwords.
The key: each rule contains not just the rule, but also anti-patterns and correct examples. An agent that only reads "Use apiResponse" still makes mistakes. An agent that sees a concrete WRONG and CORRECT example makes the right decision.
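For the error-handling rule, the WRONG/CORRECT pair might look like this. This is a hedged sketch; `fetchUser` and the fallback behavior are illustrative, not the project's actual code:

```typescript
// WRONG: the error is swallowed, the failure becomes invisible.
// fetchUser(id).catch(() => {});

// CORRECT: every catch logs before handling or rethrowing.
async function fetchUserSafely(
  fetchUser: (id: string) => Promise<{ name: string }>,
  id: string
): Promise<{ name: string } | null> {
  try {
    return await fetchUser(id);
  } catch (err) {
    // The log line is the whole point of the rule: failures leave a trace.
    console.error(`fetchUser failed for id=${id}:`, err);
    return null; // explicit, logged fallback instead of a silent swallow
  }
}
```

Seeing both sides of the pattern removes the ambiguity that a one-line rule leaves open.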
Layer 3: Session Handovers as Memory Bridges
After every session, I write a handover document. 102 of them over 139 sessions. Each handover contains:
- What was done (commits, changes)
- What's open (open issues, known bugs)
- Next steps (prioritized)
The next session's agent reads the handover and immediately knows where to start. Without handovers, session 140 would start from zero and potentially redo work that was completed in session 138.
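A handover can be as simple as a short markdown file mirroring those three sections. The exact format is my sketch, not necessarily the project's:

```markdown
# Handover: Session 139

## Done
- <commits and changes>

## Open
- <open issues, known bugs>

## Next steps (prioritized)
1. <highest-priority task>
```

The agent reads this first, before touching any code.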
Layer 4: Automated Quality Gates
Humans forget things. So do agents. That's why I automate everything that can be checked:
- TypeScript: `tsgo --noEmit`. 0 errors as baseline. Non-negotiable.
- Build: `bun run build`. Missing translations, broken imports, everything surfaces here.
- Tests: 7,474 tests in 250 files. Each run takes 27 seconds.
- Content checks: 23 automated checks. Em-dashes, AI buzzwords, i18n parity, CTA repetitions.
- ESLint: Custom rules against `console.log` (PII leak prevention) and `error.message` in responses (SEC-015).
The quality gates run before every deploy. The agent cannot deploy without all checks being green.
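To make "content checks" concrete, here is a minimal sketch of one such check. The buzzword list and function name are assumptions; the real checks are more extensive:

```typescript
// Illustrative buzzword list -- the real project maintains its own.
const BUZZWORDS = ["delve", "leverage", "seamless"];

// Returns a list of issues found in a piece of content; empty means clean.
function checkContent(text: string): string[] {
  const issues: string[] = [];
  // "\u2014" is the em-dash character the writing standards forbid.
  if (text.includes("\u2014")) {
    issues.push("em-dash found");
  }
  for (const word of BUZZWORDS) {
    if (new RegExp(`\\b${word}\\b`, "i").test(text)) {
      issues.push(`buzzword: ${word}`);
    }
  }
  return issues;
}
```

Wired into the deploy pipeline, a non-empty result fails the build, so stylistic drift is caught the same way type errors are.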
Common Failure Modes Without a Harness
What happens without a harness:
Context Drift: Session 50 starts using patterns that session 30 already replaced. Without rules, there's no reference for what's "current."
Hallucination Accumulation: The agent invents a function that doesn't exist. Without quality gates, this only surfaces in production.
Architecture Erosion: Each session makes small deviations. After 20 sessions, you have five different error-handling patterns in the same project.
Repeated Mistakes: The agent makes the same mistake as 10 sessions ago. Without feedback memory, there's no learning effect.
What Worked Over 139 Sessions
A few patterns that proved themselves:
- SSOT files for everything that can change. Prices, biographical facts, service definitions. One file, one truth. The agent doesn't invent, it looks up.
- Composable middleware instead of copy-paste. `withErrorHandler()`, `withAdminAuth()`, `withRateLimit()`. Three wrappers that combine freely. No boilerplate across 62 admin routes.
- Rules with "why." Not just "Use apiResponse." But: "Because raw `NextResponse.json()` produces inconsistent error shapes and breaks the tests." Agents that know the "why" apply rules correctly even in edge cases.
- Feedback memory as a learning loop. When the agent makes a mistake and I correct it, I save that as memory. Next session: the same mistake doesn't occur.
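The composable-middleware idea can be sketched like this. The real project wraps Next.js route handlers; here a handler is simplified to a plain function, and the types and bodies are my assumptions:

```typescript
// Simplified stand-ins for a request and response.
type Req = { isAdmin?: boolean };
type Res = { status: number; body: string };
type Handler = (req: Req) => Promise<Res>;

// Catches anything the inner handler throws and logs it.
const withErrorHandler =
  (next: Handler): Handler =>
  async (req) => {
    try {
      return await next(req);
    } catch (err) {
      console.error("handler failed:", err);
      return { status: 500, body: "internal error" };
    }
  };

// Rejects non-admin requests before they reach the inner handler.
const withAdminAuth =
  (next: Handler): Handler =>
  async (req) =>
    req.isAdmin ? next(req) : { status: 403, body: "forbidden" };

// The wrappers compose freely -- no per-route boilerplate.
const route: Handler = withErrorHandler(
  withAdminAuth(async () => ({ status: 200, body: "ok" }))
);
```

Because each wrapper takes and returns a `Handler`, any combination composes without special cases, which is what keeps 62 admin routes boilerplate-free.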
The Harness Grows With the Project
In session 1, I had a CLAUDE.md with 30 lines. In session 139, I have 24 rules, 102 handovers, a memory system, and automated quality gates with 23 content checks.
The harness isn't a project you set up once. It grows with every problem you solve. Every mistake that occurs twice becomes a rule. Every architecture decision that needs context becomes an SSOT document.
Conclusion
AI coding agents are tools. Like any tool, the result depends on how you use it. The prompts are the obvious part. The harness is the invisible one. And often the decisive one.