BitGN PAC Hackathon Vienna: 1st Place with Autonomous AI Agent

Context

BitGN PAC stands for "Personal Agent Challenge". The format differs from classic hackathons: no jury evaluating slides, no pitch. Instead: deterministic scoring.

Each participant builds an agent running in a sandboxed workspace. The sandbox is a file-based workspace containing calendars, notes, emails, and contacts. The agent receives natural-language tasks and is evaluated based on observable side effects: which files were touched, which tool calls were made, which result was returned.

104 tasks. Blind scoring. 3-hour build window on-site. Organised by BitGN, AI Impact Mission, Klartext AI, and AI Factory Austria in Vienna. I competed solo.

Preparation

The hackathon took place on 11 April 2026. Starting 4 April, I had one week to develop the agent. A dev benchmark with 43 tasks was available in advance. The full prod benchmark with 104 tasks was only accessible on the day itself.

For development, I used session-orchestrator, my own Claude Code plugin for parallel wave-based development. This let me build the security layer and metrics system in parallel rather than sequentially.

Architecture

The agent is built on TypeScript, Vercel AI SDK v6 with native tool calling, and ConnectRPC as transport to the BitGN platform. Claude Sonnet serves as the primary model.

Parallel evaluation: Rather than processing tasks serially, I ran six agents in parallel, each handling one task at a time. This cut the total time for 104 tasks to under 20 minutes.
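A minimal sketch of how a fixed number of parallel lanes can drain a task queue. The helper name runPool and its signature are hypothetical and not taken from the agent's code; the only detail from the text is the limit of six.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
async function runPool<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane claims items one at a time until the queue is drained.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results; // results keep the order of `items`
}
```

Called as runPool(tasks, 6, solveTask), this keeps six tasks in flight until all are processed. In this sketch a rejected worker rejects the whole pool, so a real run would likely catch errors per task.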

Shell-style output: Tool results are formatted like cat, ls, and rg output rather than raw JSON. LLMs have seen large amounts of CLI output in training and tend to reason more reliably over it.
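To illustrate the idea, here is a small sketch that renders tool results as shell-like transcripts. The ToolResult shapes and the toShellStyle name are assumptions made for this example; the actual BitGN tool payloads may look different.

```typescript
// Hypothetical result shapes; the real tool payloads may differ.
type ToolResult =
  | { kind: "read"; path: string; content: string }
  | { kind: "list"; path: string; entries: string[] }
  | { kind: "search"; pattern: string; matches: { path: string; line: number; text: string }[] };

// Render a tool result the way a terminal session would show it.
function toShellStyle(result: ToolResult): string {
  switch (result.kind) {
    case "read":
      return `$ cat ${result.path}\n${result.content}`;
    case "list":
      return `$ ls ${result.path}\n${result.entries.join("\n")}`;
    case "search": {
      // One match per line as path:line:text, similar to rg with line numbers.
      const lines = result.matches.map((m) => `${m.path}:${m.line}:${m.text}`);
      return `$ rg ${JSON.stringify(result.pattern)}\n${lines.join("\n")}`;
    }
  }
}
```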

Architecture Decisions (ADR Summary)

Decision	Chosen	Alternative	Rationale
Parallelisation	6 parallel agents	Sequential	104 tasks in < 20 min vs. several hours
Transport layer	ConnectRPC (SDK)	REST polling	Native SDK integration, less boilerplate
Security architecture	5 independent layers (B1-B5)	Monolithic guard	Each layer deactivatable, defense-in-depth
Output format	Shell-style (`cat`/`ls`/`rg`)	JSON	Better LLM reasoning quality
Feature flags	Env variable per layer	Hardcoded	Live competition: layer can be toggled in 30 seconds

Five Security Layers

The agent has five independent defense layers, all written in pure TypeScript without external dependencies, each controllable via environment variables.
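A sketch of per-layer toggling via environment variables. The variable naming scheme (GUARD_B1 and so on) and the accepted "off" values are assumptions for illustration. The environment is passed in as a plain object so the function can be tested without touching the real process environment.

```typescript
type Layer = "B1" | "B2" | "B3" | "B4" | "B5";
type Env = Record<string, string | undefined>;

const ALL_LAYERS: Layer[] = ["B1", "B2", "B3", "B4", "B5"];

// Assumed convention: GUARD_B3=off disables layer B3; every layer defaults to on.
function isLayerEnabled(layer: Layer, env: Env): boolean {
  const raw = (env[`GUARD_${layer}`] ?? "on").trim().toLowerCase();
  return !["off", "0", "false"].includes(raw);
}

// List the layers that should run for the current environment.
function enabledLayers(env: Env): Layer[] {
  return ALL_LAYERS.filter((layer) => isLayerEnabled(layer, env));
}
```

Defaulting to "on" means a missing or mistyped variable leaves the guard active, which is the safer failure mode during a live run.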

B1: Path Traversal Guard blocks access to /etc/, ~/.ssh/, .env, and .. traversals. Checked before every tool call.
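A minimal sketch of such a guard, using only the deny rules named above. The function name and exact rule set are assumptions. A production guard would also need to resolve symlinks and absolute paths against the sandbox root, which this sketch does not attempt.

```typescript
// Deny rules taken from the examples in the text; the real list may be longer.
const DENIED_ROOTS = ["/etc", "~/.ssh"];

function isPathAllowed(rawPath: string): boolean {
  // Normalise separators so "\\" and "//" cannot hide a denied prefix.
  const path = rawPath.replace(/\\/g, "/").replace(/\/{2,}/g, "/");
  const segments = path.split("/");

  if (segments.includes("..")) return false; // any parent-directory traversal
  if (segments.some((s) => s === ".env" || s.startsWith(".env."))) return false;

  return !DENIED_ROOTS.some((root) => path === root || path.startsWith(root + "/"));
}
```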

B2: PII Refusal detects requests for personal data and refuses them.

B3: Grounding-Refs Validation checks whether cited file paths in the result were actually read. Prevents hallucinations.
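The check can be sketched as a set difference between cited paths and paths the agent actually read. The class and method names below are hypothetical.

```typescript
// Tracks which files were read during a task and flags citations without a read.
class GroundingTracker {
  private readonly readPaths = new Set<string>();

  // Call from the read tool whenever a file is successfully read.
  recordRead(path: string): void {
    this.readPaths.add(path);
  }

  // Returns the cited paths that were never read; an empty array means fully grounded.
  ungrounded(citedPaths: readonly string[]): string[] {
    return citedPaths.filter((p) => !this.readPaths.has(p));
  }
}
```

A non-empty result could either strip the offending references or send the agent back for another pass; which of these the original agent does is not stated.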

B4: Destructive Brake limits write operations to a maximum of 10 per task. A soft cap at 35 iterations guards against infinite loops.
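A sketch of the two counters with the limits from the text (10 writes, 35 iterations). The class shape is hypothetical, and treating the soft cap as a "wrap up now" signal rather than a hard abort is an assumption.

```typescript
class DestructiveBrake {
  private writes = 0;
  private iterations = 0;
  private readonly maxWrites: number;
  private readonly softIterationCap: number;

  constructor(maxWrites = 10, softIterationCap = 35) {
    this.maxWrites = maxWrites;
    this.softIterationCap = softIterationCap;
  }

  // Returns false once the write budget is used up; the caller should skip the write.
  allowWrite(): boolean {
    if (this.writes >= this.maxWrites) return false;
    this.writes++;
    return true;
  }

  // Call once per agent loop iteration; returns true when the loop should wrap up.
  tick(): boolean {
    this.iterations++;
    return this.iterations >= this.softIterationCap;
  }
}
```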

B5: Secret Redaction redacts AWS keys, GitHub tokens, JWTs, and PEM certificates before they enter the LLM context.
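A sketch of regex-based redaction for the four secret types listed. The patterns follow the publicly known token formats but are illustrative assumptions, not the agent's actual rules, and they will miss variants such as fine-grained GitHub tokens.

```typescript
// Illustrative patterns; order matters so PEM blocks are replaced as a whole first.
const SECRET_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "pem", pattern: /-----BEGIN [A-Z ]+-----[\s\S]*?-----END [A-Z ]+-----/g },
  { label: "aws-key", pattern: /\b(?:AKIA|ASIA)[0-9A-Z]{16}\b/g },
  { label: "github-token", pattern: /\bgh[pousr]_[A-Za-z0-9]{36,}\b/g },
  { label: "jwt", pattern: /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g },
];

// Replace every match with a labelled placeholder before the text reaches the model.
function redactSecrets(text: string): string {
  return SECRET_PATTERNS.reduce(
    (acc, { label, pattern }) => acc.replace(pattern, `[REDACTED:${label}]`),
    text,
  );
}
```

Keeping the label in the placeholder lets the model still reason about what kind of value was present without seeing it.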

An additional injection scanner with 16+ patterns detects HTML comments, Base64 variants, and domain mismatches in emails.
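The three signal types named above can be sketched as follows. The Email shape, the finding labels, the 40-character Base64 threshold, and the From/Reply-To comparison used for "domain mismatch" are all assumptions; the real scanner has 16+ patterns that are not shown in the text.

```typescript
// Hypothetical email shape for this sketch.
interface Email {
  from: string;
  replyTo?: string;
  body: string;
}

const HTML_COMMENT = /<!--[\s\S]*?-->/;
// A run of 40+ Base64 characters; prone to false positives on long URLs.
const BASE64_RUN = /(?:[A-Za-z0-9+/]{4}){10,}={0,2}/;

// Extract the lower-cased domain from "name@domain" or "Name <name@domain>".
function domainOf(address: string): string {
  const at = address.lastIndexOf("@");
  return at === -1 ? "" : address.slice(at + 1).replace(/>\s*$/, "").toLowerCase();
}

// Returns a list of finding labels; an empty list means nothing suspicious was seen.
function scanEmail(email: Email): string[] {
  const findings: string[] = [];
  if (HTML_COMMENT.test(email.body)) findings.push("html-comment");
  if (BASE64_RUN.test(email.body)) findings.push("base64-blob");
  if (email.replyTo !== undefined && domainOf(email.replyTo) !== domainOf(email.from)) {
    findings.push("domain-mismatch");
  }
  return findings;
}
```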

Two Incidents on Hackathon Day

MCP Scope Leak: During the final regression test, the agent created an actual appointment in my real Google Calendar instead of the sandbox. Root cause: bypassPermissions had inherited OAuth scopes for Gmail and Google Calendar. Fix: a canUseTool runtime gate that denies every tool outside mcp__bitgn__*. Three lines of code.
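The gate amounts to a prefix check on the tool name. The sketch below uses a local, assumed decision type rather than any SDK's exact definition, and the example tool names in the usage note are hypothetical.

```typescript
// Local decision type for this sketch; an SDK's real callback result may differ.
type GateDecision = { behavior: "allow" } | { behavior: "deny"; message: string };

const ALLOWED_PREFIX = "mcp__bitgn__";

// Allow only sandbox tools; deny everything else regardless of inherited scopes.
function canUseTool(toolName: string): GateDecision {
  return toolName.startsWith(ALLOWED_PREFIX)
    ? { behavior: "allow" }
    : { behavior: "deny", message: `Tool ${toolName} is outside ${ALLOWED_PREFIX}*` };
}
```

With this in place, a call such as canUseTool("mcp__google_calendar__create_event") is denied even though the session technically holds the OAuth scope for it.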

Platform Disk Full: At 14:47 CEST, in the middle of evaluation, StartRun and StartPlayground returned 502 errors. BitGN platform: disk full. Rinat Abdullin resolved the issue live using agentic coding in minutes.

Result

1st place on-site in Vienna. Certificate signed by Felix Krause (CTO Klartext AI), Rinat Abdullin (BitGN founder), and Markus Keiblinger (President AIM International).

Evidence diagram: BitGN Run / Score / Guard Map

A compact view of the challenge: run scope, score gap, and security layers.

Placement: 1st place

Run scope: 104 tasks, 3 hours on-site

Guard map: 5 layers (B1-B5): Path Traversal Guard, PII Refusal, Grounding-Refs Validation, Destructive Brake, Secret Redaction

Score: own score 79 / 104, 2nd place 65

Evidence: Public repo

Sources: BitGN certificate, Vienna leaderboard. 104 tasks scored via observable side effects.

Metric	Own score	Benchmark	Δ
Points	79	65 (2nd place)	+14
Tasks solved	79	104 (total)	76 %

Lessons

Preparation beats improvisation. One week of prep time with access to the dev benchmark was decisive. On hackathon day, there was no time left to make architecture decisions.

Feature flags on all security layers. In a live competition, this is essential. If a layer causes unexpected problems: disable it, continue, debug later.

Always submit something. If the agent crashes or a brake fires: submit a report_completion with a best-guess outcome anyway. Zero points for a crash, partial points for a half-correct guess.
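The "always submit" rule can be sketched as a wrapper that reports a fallback when the solver throws. The Completion shape and function names are hypothetical; report stands in for whatever call sends report_completion to the platform.

```typescript
// Hypothetical payload shape for this sketch.
interface Completion {
  outcome: string;
  refs: string[];
}

// Run the solver; on any failure, fall back to a best guess and still report.
async function solveWithFallback(
  solve: () => Promise<Completion>,
  bestGuess: () => Completion,
  report: (completion: Completion) => Promise<void>,
): Promise<Completion> {
  let completion: Completion;
  try {
    completion = await solve();
  } catch {
    // Crash or brake fired: a partial answer can still score, a missing one cannot.
    completion = bestGuess();
  }
  await report(completion);
  return completion;
}
```

This sketch does not retry if report itself fails, which a real run against a flaky platform would probably want.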