Bernhard Götzendorfer
07 /Engineering Manifest

How I work with AI agents, before you work with me.

session-orchestrator is the plugin I use to structure multi-agent sessions. Open source. v3.6.0. You can read it, install it, and fork it yourself.

02 /What is session-orchestrator

Five facts that show how the plugin works.

  • 5 typed waves per session: Discovery first, Quality last. Every wave has a clear input, output, and acceptance criterion.
  • Inter-wave reviews: Explicit confidence scores instead of blanket approval. No code merge without a gate pass.
  • STATE.md persists: Crashes resume instead of starting over. Context is preserved across sessions.
  • Markdown only: No runtime code. Remove the plugin and your editor still works.
  • v3.6.0, 5129 tests, 36 skills, 16 commands: MIT license. Node.js 20+. Compatible with Claude Code, Codex CLI, and Cursor IDE.
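
The gate and resume behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the plugin's code (the plugin itself is Markdown): the `Review` type, the 0 to 1 confidence scale, the 0.8 threshold, and both function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Review:
    wave: str          # e.g. "Discovery" or "Quality", the two wave names the text gives
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (full confidence)


def gate_passes(reviews, threshold=0.8):
    """Pass only if at least one review exists and every review meets the threshold."""
    return bool(reviews) and all(r.confidence >= threshold for r in reviews)


def next_wave(waves, completed):
    """Return the first wave that is not yet completed, or None when all are done."""
    for wave in waves:
        if wave not in completed:
            return wave
    return None
```

With persisted state, resuming after a crash reduces to calling `next_wave` with the set of waves already signed off, instead of starting again at the first wave.
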
03 /How I build with it

Three real examples, visible in this project.

  1. Discovery first: Wave plan before code. Prevents scope creep and ensures all agents share the same context.
  2. Wave gates as regression protection: Every wave sign-off catches errors before they reach the main branch.
  3. Cross-session learnings: Patterns are recorded in learnings.jsonl and automatically fed into subsequent sessions.
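
A cross-session log like learnings.jsonl can be sketched as an append-only JSON Lines file. The snippet below is a minimal sketch under assumptions: the field names `pattern` and `session` and both helper functions are hypothetical, and the plugin's actual schema may differ.

```python
import json
from pathlib import Path


def record_learning(path, pattern, session):
    """Append one learning as a single JSON object per line (JSON Lines)."""
    entry = {"pattern": pattern, "session": session}  # hypothetical fields
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def load_learnings(path):
    """Read all recorded learnings; a missing file means no learnings yet."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Appending one line per learning keeps earlier entries untouched, so a later session can load the whole file at start-up without any merge step.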

OSS repository signals

  • session-orchestrator current version
  • session-orchestrator last commit
  • session-orchestrator test coverage

Production Eval with Rubric + Cost Assertion (Promptfoo)
# The prompt is identical for every test, so it is declared once at the top level.
prompts:
  - |
    Classify the following support request by priority and category.
    Subject: {{subject}}
    Message: {{body}}
    Output: Priority (CRITICAL | HIGH | NORMAL) and Category (TECHNICAL | BILLING | FEATURE).

providers:
  - id: openai:gpt-4o-mini
  - id: anthropic:claude-3-5-sonnet-20241022

tests:
  # Genuine production incident: must be rated CRITICAL.
  - description: Critical Incident
    vars:
      subject: Critical error in checkout flow
      body: Customer cannot complete order. Payment gateway returning errors. Urgent assistance required.
    assert:
      - type: llm-rubric
        value: Reliably identifies genuine production incidents (payment errors, system outages, data loss). Only returns CRITICAL for those, never for usability requests.
        threshold: 0.85
      - type: cost
        threshold: 0.003

  # Product wish: must never be rated CRITICAL.
  - description: Feature Request misclassified as urgent
    vars:
      subject: Idea for a mobile app
      body: It would be convenient if I could use your website from my mobile phone. Can you offer that?
    assert:
      - type: llm-rubric
        value: Distinguishes product wishes from real incidents. Returns NORMAL or HIGH (never CRITICAL) for feature requests.
        threshold: 0.85
      - type: cost
        threshold: 0.003

  # Billing problem: HIGH, but below genuine emergencies.
  - description: Billing Dispute at Medium Priority
    vars:
      subject: Incorrect invoice for last month
      body: I was double-charged. Please review my billing history and correct it.
    assert:
      - type: llm-rubric
        value: Recognizes billing issues and rates them HIGH (not CRITICAL) so genuine emergencies do not get blocked in the queue.
        threshold: 0.80
      - type: cost
        threshold: 0.003

A real Promptfoo configuration from a champion pilot: support-email classification, checked per test with an LLM rubric (quality) and a cost budget. The tests verify that genuine incidents are distinguished from feature requests and billing issues.
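
How the two assertions combine per test can be sketched in Python. This is a simplified illustration, not Promptfoo's implementation: it assumes the rubric score must reach its threshold and the cost must stay at or below its budget, and the function name `case_passes` and its parameter names are hypothetical.

```python
def case_passes(rubric_score, cost, rubric_threshold=0.85, cost_threshold=0.003):
    """A test case passes only if quality is high enough AND the call is cheap enough."""
    quality_ok = rubric_score >= rubric_threshold  # llm-rubric: score at or above threshold
    budget_ok = cost <= cost_threshold             # cost: spend at or below the budget
    return quality_ok and budget_ok
```

Pairing the two checks means a model cannot pass by being accurate but expensive, or cheap but unreliable.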

04 /What this signals

What session-orchestrator reveals about my engineering style.

Structure over velocity

I do not deliver features in a sprint rush. I deliver systems that can be built upon after handover.

Verification gates over trust

Every wave ends with an explicit gate. No implicit 'it probably works'.

Readable decisions

All decisions land as Markdown in the codebase, not in Slack threads or someone's memory.

Portability over lock-in

The plugin is plain Markdown. It runs on three platforms. You do not need a proprietary tool.

Pragmatism over perfection

v3.6.0 is not finished, but it is mature enough. The next version comes when a real need emerges.

06 /Compliance & Data Protection

How GDPR and the EU AI Act fit into the architecture.

session-orchestrator is an orchestration framework, not a data processor. It structures multi-agent workflows without touching customer data itself. The compliance requirements (GDPR Art. 28, EU AI Act) sit with the project, and that is exactly where I support you, both in sparring sessions and in delivery.

  • GDPR Art. 28 Data Processing Addendum: Clear DPA templates for the cloud providers in use (Anthropic, Azure EU, AWS EU). We document which provider processes which data.
  • EU AI Act classification: High-risk systems (e.g., HR scoring, credit decisions) need specific testing and documentation. We build that in from day one, not retroactively.
  • Self-Hosted / On-Prem with § 9 RAO: Your own infrastructure instead of someone else's cloud. Client data stays in your hands and under your control. Attorney-client privilege under AT § 9 RAO and comparable professional secrecy obligations (tax advisors, doctors, banks) remain intact. Optional self-hosted LLM and pipeline on your infrastructure; premium pricing on request.

05 /The Repository

session-orchestrator on GitHub.

Installation in Claude Code

/plugin marketplace add Kanevry/session-orchestrator
/plugin install session-orchestrator@kanevry

Requires Node.js 20 or later.

Supported platforms

  • Claude Code
  • Codex CLI
  • Cursor IDE
View repository on GitHub

You see how I work. Let us work together.

30 minutes, no commitment. Or explore the repository on your own.