07 /Engineering Manifest

How I work with AI agents, before you work with me.

session-orchestrator is the plugin I use to structure multi-agent sessions. Open source, continuously versioned. You can read it, install it, and fork it yourself.

02 /What is session-orchestrator

Five facts that show how the plugin works.

5 typed waves per session

Discovery first, Quality last. Every wave has a clear input, output, and acceptance criterion.

Inter-wave reviews

Explicit confidence scores instead of blanket approval. No code merge without a gate pass.

STATE.md persists

Crashes resume instead of starting over. Context is preserved across sessions.

Markdown only

No runtime code. Remove the plugin and your editor still works.

session-orchestrator: 7360+ tests, 40+ skills, 20 commands

MIT license. Node.js 20+. Compatible with Claude Code, Codex CLI, and Cursor IDE.
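
To make the wave-and-gate idea concrete, here is a minimal sketch of waves that only advance on a gate pass. It is not the plugin's implementation: the plugin is Markdown, and the text above names only the first and last wave. The middle wave names, the `Review` type, and the 0.8 threshold are placeholders I made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical wave names: the text only names Discovery (first) and Quality (last).
WAVES = ["Discovery", "wave-2", "wave-3", "wave-4", "Quality"]


@dataclass
class Review:
    wave: str
    confidence: float  # reviewer's confidence score, 0.0 to 1.0


def gate_passes(review: Review, threshold: float = 0.8) -> bool:
    """A gate passes only when the confidence score meets the threshold."""
    return review.confidence >= threshold


def run_session(reviews: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Run waves in order and stop at the first gate that fails.

    Returns the waves that passed their gate. A wave without a review
    counts as confidence 0.0, so it can never pass by default.
    """
    passed = []
    for wave in WAVES:
        review = Review(wave, reviews.get(wave, 0.0))
        if not gate_passes(review, threshold):
            break  # later waves never start without a gate pass
        passed.append(wave)
    return passed
```

With `run_session({"Discovery": 0.9, "wave-2": 0.6})` only Discovery passes, and the remaining waves never start. Treating a missing review as a failure is a deliberate choice in this sketch: approval has to be explicit.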

03 /How I build with it

Three real examples, visible in this project.

Discovery first

Wave plan before code. Prevents scope creep and ensures all agents share the same context.

Wave gates as regression protection

Every wave sign-off catches errors before they reach the main branch.

Cross-session learnings

Patterns are recorded in learnings.jsonl and automatically fed into subsequent sessions.
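
A rough sketch of how a JSON Lines file can carry learnings from one session to the next. Only the file name learnings.jsonl comes from the text above. The field names (`session`, `pattern`) and both helper functions are assumptions for illustration, not the plugin's actual schema.

```python
import json
from pathlib import Path


def record_learning(path: Path, pattern: str, session: str) -> None:
    """Append one learning as a single JSON line (hypothetical schema)."""
    entry = {"session": session, "pattern": pattern}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def load_learnings(path: Path) -> list[dict]:
    """Read all recorded learnings; a missing file means no learnings yet."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Appending one line per entry keeps the file diff-friendly and lets a later session load everything with a single pass, which is the main reason to prefer JSON Lines over one large JSON document here.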

OSS repository signals

CI on every push

typecheck
lint
wording-lint
unit tests (this website)
secret-scan
npm-audit
build
e2e (Playwright)

Pipeline runs on a privately hosted CI system on every commit to main. No public badge because the infrastructure is private. Job list instead of decoration.

Production Eval with Rubric + Cost Assertion (Promptfoo)

# Shared prompt: defined once at the top level and applied to every test case.
prompts:
  - |
    Classify the following support request by priority and category.
    Subject: {{subject}}
    Message: {{body}}
    Output: Priority (CRITICAL | HIGH | NORMAL) and Category (TECHNICAL | BILLING | FEATURE).

providers:
  - id: openai:gpt-4o-mini
  - id: anthropic:claude-3-5-sonnet-20241022

tests:
  - description: Critical Incident
    vars:
      subject: Critical error in checkout flow
      body: Customer cannot complete order. Payment gateway returning errors. Urgent assistance required.
    assert:
      # Quality gate: the grader's rubric score must reach the threshold.
      - type: llm-rubric
        value: Reliably identifies genuine production incidents (payment errors, system outages, data loss). Only returns CRITICAL for those, never for usability requests.
        threshold: 0.85
      # Cost gate: the test fails if inference cost exceeds the budget.
      - type: cost
        threshold: 0.003

  - description: Feature Request misclassified as urgent
    vars:
      subject: Idea for a mobile app
      body: It would be convenient if I could use your website from my mobile phone. Can you offer that?
    assert:
      - type: llm-rubric
        value: Distinguishes product wishes from real incidents. Returns NORMAL or HIGH (never CRITICAL) for feature requests.
        threshold: 0.85
      - type: cost
        threshold: 0.003

  - description: Billing Dispute at Medium Priority
    vars:
      subject: Incorrect invoice for last month
      body: I was double-charged. Please review my billing history and correct it.
    assert:
      - type: llm-rubric
        value: Recognizes billing issues and rates them HIGH (not CRITICAL) so genuine emergencies do not get blocked in the queue.
        threshold: 0.80
      - type: cost
        threshold: 0.003

A real Promptfoo run from a champion pilot. Support-email classification with LLM rubric (quality) and cost budget per test. Distinguishes genuine incidents from feature requests.

eval / results - 5 rows · model: gpt-4o-mini · grader: claude-sonnet-4-5

5/5 PASS

#	Prompt	Response (truncated)	Criterion	Pass	Score	Cost
01	Explain AI bias in one sentence for beginners.	AI systems can inherit biases from training data and systematically disadvantage certain groups as a result.	Concise, correct, no jargon	PASS	0.92	$0.0004
02	Name three GDPR obligations when using AI.	Duty to inform, purpose limitation, and the right to explanation for automated decisions.	At least 3 correct points	PASS	1.00	$0.0006
03	How do I explain EU AI Act compliance to the board?	The EU AI Act classifies AI systems by risk level; high-risk applications require documentation, audits, and human oversight.	Board-ready, no legal citations	PASS	0.88	$0.0005
04	What is the difference between ChatGPT and a fine-tuned model?	ChatGPT is a general-purpose model; a fine-tuned model has been further trained on specific data and is therefore more precise in its domain.	Technically correct, understandable for non-experts	PASS	0.95	$0.0004
05	Describe a hallucination risk in LLMs.	LLMs can generate convincingly worded but factually incorrect information, a risk particularly in legal or medical contexts.	Concrete risk example included	PASS	0.90	$0.0003

Reproducible with promptfoo eval --output results.html. Values from a real eval run, no AI-generated placeholders.
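
For readers who want to see what "rubric plus cost assertion" means per row, here is a small sketch that applies both checks to the score and cost columns of the results table above. The pairing is an assumption: the 0.85 score threshold and the 0.003 cost budget are borrowed from the support-classification config, and the thresholds actually used for this table are not stated. `row_passes` and `summarize` are illustrative names, not Promptfoo functions.

```python
# Score and cost columns copied from the results table above.
ROWS = [
    ("01", 0.92, 0.0004),
    ("02", 1.00, 0.0006),
    ("03", 0.88, 0.0005),
    ("04", 0.95, 0.0004),
    ("05", 0.90, 0.0003),
]

SCORE_THRESHOLD = 0.85  # assumption: rubric threshold borrowed from the config
COST_BUDGET = 0.003     # assumption: per-test cost budget borrowed from the config


def row_passes(score: float, cost: float) -> bool:
    """A row passes only if it clears the rubric score and stays within budget."""
    return score >= SCORE_THRESHOLD and cost <= COST_BUDGET


def summarize(rows):
    """Return (rows passed, rows total, total cost rounded to 4 decimals)."""
    passed = sum(row_passes(score, cost) for _, score, cost in rows)
    total_cost = sum(cost for _, _, cost in rows)
    return passed, len(rows), round(total_cost, 4)


print(summarize(ROWS))
```

Under these assumed thresholds all five rows pass, and the five costs add up to $0.0022 for the whole run.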

04 /What this signals

What session-orchestrator reveals about my engineering style.

Structure over velocity

I do not deliver features in a sprint rush. I deliver systems that can be built upon after handover.

Verification gates over trust

Every wave ends with an explicit gate. No implicit 'it probably works'.
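
One way to make a gate explicit rather than implied is to tie it to the exit code of a reproducible check. The sketch below assumes that approach. `verify_step` and the two stand-in checks are hypothetical, and in a real project the command would be the test or lint job for that wave.

```python
import subprocess
import sys


def verify_step(name: str, check_cmd: list[str]) -> dict:
    """Run a reproducible check; the step is done only if it exits with 0."""
    result = subprocess.run(check_cmd, capture_output=True, text=True)
    return {
        "step": name,
        "done": result.returncode == 0,
        "evidence": (result.stdout or result.stderr).strip(),
    }


# Stand-in checks so the sketch runs anywhere Python is installed.
passing = verify_step("parser", [sys.executable, "-c", "assert 1 + 1 == 2; print('ok')"])
failing = verify_step("exporter", [sys.executable, "-c", "raise SystemExit(1)"])

print(passing["done"], failing["done"])
```

Keeping the command output as `evidence` means the gate record shows why a step passed or failed, not just that it did.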

Readable decisions

All decisions land as Markdown in the codebase, not in Slack threads or someone's memory.

Portability over lock-in

The plugin is plain Markdown. It runs on three platforms. You do not need a proprietary tool.

Pragmatism over perfection

The plugin is not finished, but it is mature enough. The next version comes when a real need emerges.

05 /What I avoid

Three anti-patterns session-orchestrator deliberately avoids.

Running an agent without a gate

An agent that runs end-to-end with no check produces code nobody reviews. Here every wave ends at a gate before the next one starts.

Context in the chat log instead of the repo

Decisions that live only in an agent's transcript are gone after the session. They belong in the codebase as Markdown, where the next run can read them again.
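
As a sketch of what "the next run can read them again" can look like, the snippet below persists progress as a Markdown checklist and picks up at the first open item. The layout is my assumption: the real STATE.md format is not shown on this page, and `write_state` and `next_wave` are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical STATE.md layout: one Markdown checkbox per wave.
CHECKBOX = re.compile(r"^- \[(?P<done>[ x])\] (?P<wave>.+)$")


def write_state(path: Path, waves: list[str], completed: set[str]) -> None:
    """Persist progress as a Markdown checklist."""
    lines = ["# Session state", ""]
    for wave in waves:
        mark = "x" if wave in completed else " "
        lines.append(f"- [{mark}] {wave}")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def next_wave(path: Path) -> Optional[str]:
    """Return the first unchecked wave, or None when all waves are done."""
    for line in path.read_text(encoding="utf-8").splitlines():
        match = CHECKBOX.match(line)
        if match and match["done"] == " ":
            return match["wave"]
    return None
```

Because the state is plain Markdown, a person can read or correct it in any editor, and a resumed session needs nothing more than the file itself.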

Calling output done without verifying it

"Looks good" is not completion. A step counts as done only once a test or a reproducible check confirms it.

06 /Compliance & Data Protection

How GDPR and the EU AI Act fit into the architecture.

session-orchestrator is an orchestration framework, not a data processor. It structures multi-agent workflows without touching customer data itself. The compliance requirements (GDPR Art. 28, EU AI Act) sit with the project. That is exactly where I support you, both in advisory sessions and in delivery.

GDPR Art. 28 Data Processing Agreement

Clear DPA templates for the cloud providers in use (Anthropic, Azure EU, AWS EU). We document which provider processes which data.

EU AI Act classification

High-risk systems (e.g., HR scoring, credit decisions) need specific testing and documentation. We build that in from day one, not retroactively.

Self-Hosted / On-Prem with § 9 RAO

Your own infrastructure instead of someone else's cloud. Client data stays with you and under your control. Attorney-client confidentiality under § 9 RAO (Austria) and comparable professional secrecy obligations (tax advisors, doctors, banks) remain intact. Optional: a self-hosted LLM and pipeline on your infrastructure, with premium pricing on request.

DPA template (Data Processing Agreement)

Ready to adapt: Art. 28 GDPR + professional secrecy clauses.

Download DPA template

As of 2026-06. Not a substitute for legal advice. Have it reviewed before use.

08 /The Repository

session-orchestrator on GitHub.

Installation in Claude Code

/plugin marketplace add Kanevry/session-orchestrator
/plugin install session-orchestrator@kanevry

Requires Node.js 20 or later.

Supported platforms

Claude Code
Codex CLI
Cursor IDE

View repository on GitHub

You see how I work. Let us work together.

30 minutes, no commitment. Or explore the repository on your own.
