Bernhard Götzendorfer
hackathon

BitGN PAC Hackathon Vienna: 1st Place with Autonomous AI Agent

Autonomous AI agent built for the BitGN Personal Agent Challenge hackathon in Vienna. Solo development over one week, 104 tasks, 5 security layers. Result: 1st place on-site.

1st Place

79 of 104 tasks solved · 2nd place had 65 points · Solo development, 3 hours on-site

Client: BitGN / AI Factory Austria · Year: 2026 · Stack: TypeScript · Vercel AI SDK · Claude Sonnet · ConnectRPC

Context

BitGN PAC stands for "Personal Agent Challenge". The format differs from classic hackathons: no jury evaluating slides, no pitch. Instead: deterministic scoring.

Each participant builds an agent running in a sandboxed workspace. The sandbox is a file-based workspace containing calendars, notes, emails, and contacts. The agent receives natural-language tasks and is evaluated based on observable side effects: which files were touched, which tool calls were made, which result was returned.

104 tasks. Blind scoring. 3-hour build window on-site. Organised by BitGN, AI Impact Mission, Klartext AI, and AI Factory Austria in Vienna. I competed solo.

Preparation

The hackathon took place on 11 April 2026. Starting 4 April, I had one week to develop the agent. A dev benchmark with 43 tasks was available in advance. The full prod benchmark with 104 tasks was only accessible on the day itself.

For development, I used session-orchestrator, my own Claude Code plugin for parallel wave-based development. This let me build the security layer and metrics system in parallel rather than sequentially.

Architecture

The agent is built on TypeScript and Vercel AI SDK v6 with native tool calling, with ConnectRPC as the transport to the BitGN platform. Claude Sonnet is the primary model.

Parallel evaluation: Rather than processing tasks serially, I ran 6 agents in parallel, each handling one task at a time. This reduced total time for 104 tasks to under 20 minutes.
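A bounded worker pool is one way to get this behaviour. The following is a minimal sketch, not the agent's actual code: `runWithConcurrency` and its parameters are hypothetical names, and the worker stands in for "run one agent on one task".

```typescript
// Hypothetical helper: run `worker` over all items with at most `limit` in flight.
async function runWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane claims the next unclaimed item until none are left.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results; // same order as `items`
}
```

With `limit` set to 6, a slow task occupies only one lane while the other five keep draining the queue, which is why a pool like this tends to beat fixed batches of six.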

Shell-style output: Tool results are formatted as cat, ls, and rg output rather than raw JSON. The rationale: LLMs have seen large amounts of CLI output in training and tend to reason more reliably over it.
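To illustrate the idea, here is a hedged sketch of such a formatter. The `ToolResult` shapes and the function name `toShellStyle` are assumptions made for this example; the agent's real tool types are not shown in this write-up.

```typescript
// Hypothetical tool-result shapes, for illustration only.
type ToolResult =
  | { tool: "read"; path: string; content: string }
  | { tool: "list"; path: string; entries: string[] }
  | {
      tool: "search";
      pattern: string;
      matches: { path: string; line: number; text: string }[];
    };

// Render a tool result the way a terminal session would show it.
function toShellStyle(result: ToolResult): string {
  switch (result.tool) {
    case "read":
      return `$ cat ${result.path}\n${result.content}`;
    case "list":
      return `$ ls ${result.path}\n${result.entries.join("\n")}`;
    case "search":
      return [
        `$ rg -n ${JSON.stringify(result.pattern)}`,
        // path:line:text mirrors ripgrep's line-numbered output
        ...result.matches.map((m) => `${m.path}:${m.line}:${m.text}`),
      ].join("\n");
  }
}
```

For example, a `list` result for `notes` with entries `a.md` and `b.md` renders as the line `$ ls notes` followed by one file name per line.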

Architecture Decisions (ADR Summary)

| Decision | Chosen | Alternative | Rationale |
| --- | --- | --- | --- |
| Parallelisation | 6 parallel agents | Sequential | 104 tasks in < 20 min vs. several hours |
| Transport layer | ConnectRPC (SDK) | REST polling | Native SDK integration, less boilerplate |
| Security architecture | 5 independent layers (B1-B5) | Monolithic guard | Each layer deactivatable, defense-in-depth |
| Output format | Shell-style (cat/ls/rg) | JSON | Better LLM reasoning quality |
| Feature flags | Env variable per layer | Hardcoded | Live competition: layer can be toggled in 30 seconds |

Five Security Layers

The agent has five independent defense layers, all written in pure TypeScript without external dependencies, each controllable via environment variables.
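A per-layer switch of this kind might look like the sketch below. The variable naming scheme (`AGENT_GUARD_B1` and so on) and the default-on behaviour are assumptions for illustration; the actual variable names are not given here.

```typescript
type Layer = "B1" | "B2" | "B3" | "B4" | "B5";

// Hypothetical naming scheme: AGENT_GUARD_B1=off disables layer B1.
// `env` is passed in explicitly so the function is easy to test;
// in Node one would pass process.env.
function isLayerEnabled(
  layer: Layer,
  env: Record<string, string | undefined>,
): boolean {
  const raw = env[`AGENT_GUARD_${layer}`];
  // Assumption: layers default to ON; only an explicit off/0/false disables one.
  return !["off", "0", "false"].includes((raw ?? "").trim().toLowerCase());
}
```

Defaulting to "enabled" means a missing or misspelled variable fails safe: the guard stays active unless someone deliberately turns it off.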

B1: Path Traversal Guard blocks access to /etc/, ~/.ssh/, .env, and .. traversals. Checked before every tool call.
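As a rough sketch of such a guard, assuming a simple deny-list built from the examples named above (the function name and the exact rules are hypothetical, not the agent's real implementation):

```typescript
// Assumed deny rules, modeled on the examples in the text.
const BLOCKED_PREFIXES = ["/etc/", "~/.ssh/"];

function isPathAllowed(rawPath: string): boolean {
  const path = rawPath.replace(/\\/g, "/"); // treat backslashes as separators
  const segments = path.split("/");

  // Reject any parent-directory hop, wherever it appears in the path.
  if (segments.includes("..")) return false;
  // Reject .env and variants such as .env.local in any directory.
  if (segments.some((s) => s === ".env" || s.startsWith(".env."))) return false;
  // Reject the blocked directories themselves and everything below them.
  return !BLOCKED_PREFIXES.some(
    (prefix) => path.startsWith(prefix) || path + "/" === prefix,
  );
}
```

A deny-list like this only covers the patterns it names. A stricter variant would resolve every path against the workspace root and reject anything that ends up outside it.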

B2: PII Refusal detects requests for personal data and refuses them.

B3: Grounding-Refs Validation checks whether cited file paths in the result were actually read. Prevents hallucinations.
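The check reduces to a set comparison between cited paths and paths that were actually read. A minimal sketch, with hypothetical names and a deliberately simple path normalisation:

```typescript
// Returns the cited paths that never appeared in a read tool call.
// An empty result means every reference is grounded.
function findUngroundedRefs(citedPaths: string[], readPaths: string[]): string[] {
  // Assumption: "./a//b" and "a/b" should count as the same file.
  const normalize = (p: string) => p.replace(/^\.\//, "").replace(/\/+/g, "/");
  const seen = new Set(readPaths.map(normalize));
  return citedPaths.filter((p) => !seen.has(normalize(p)));
}
```

If the returned list is non-empty, the agent could either drop those references from the answer or read the files before answering; which of the two the real agent does is not stated here.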

B4: Destructive Brake limits write operations to a maximum of 10 per task. A soft cap at 35 iterations guards against infinite loops.
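The two limits can be modelled as a small counter object. This is a sketch under assumptions: the class and method names are invented, and "soft cap" is interpreted here as a signal to wrap up rather than a hard abort.

```typescript
class DestructiveBrake {
  private writes = 0;
  private iterations = 0;
  private readonly maxWrites: number;
  private readonly softIterationCap: number;

  // Defaults follow the limits named in the text: 10 writes, 35 iterations.
  constructor(maxWrites = 10, softIterationCap = 35) {
    this.maxWrites = maxWrites;
    this.softIterationCap = softIterationCap;
  }

  // Call once per agent loop step; true means "wrap up now".
  tick(): boolean {
    this.iterations += 1;
    return this.iterations >= this.softIterationCap;
  }

  // Call before each write; false means the write must be refused.
  allowWrite(): boolean {
    if (this.writes >= this.maxWrites) return false;
    this.writes += 1;
    return true;
  }
}
```

One brake instance per task keeps the counters isolated when several tasks run in parallel.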

B5: Secret Redaction redacts AWS keys, GitHub tokens, JWTs, and PEM certificates before they enter the LLM context.
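Redaction of this kind is typically a list of regular expressions applied before text reaches the model. The patterns below are illustrative assumptions based on the publicly known formats of these secrets, not the agent's actual pattern set:

```typescript
// Illustrative patterns only; real-world coverage needs more variants.
const SECRET_PATTERNS: [label: string, pattern: RegExp][] = [
  // AWS access key IDs: AKIA/ASIA prefix plus 16 uppercase alphanumerics
  ["AWS_KEY", /\b(?:AKIA|ASIA)[0-9A-Z]{16}\b/g],
  // GitHub tokens: ghp_, gho_, ghu_, ghs_, ghr_ prefixes
  ["GITHUB_TOKEN", /\bgh[pousr]_[A-Za-z0-9]{36,}\b/g],
  // JWTs: three base64url segments, the first two starting with "eyJ"
  ["JWT", /\beyJ[\w-]+\.eyJ[\w-]+\.[\w-]+/g],
  // PEM blocks: everything between the BEGIN and END markers
  ["PEM", /-----BEGIN [A-Z ]+-----[\s\S]*?-----END [A-Z ]+-----/g],
];

function redactSecrets(text: string): string {
  return SECRET_PATTERNS.reduce(
    (acc, [label, pattern]) => acc.replace(pattern, `[REDACTED:${label}]`),
    text,
  );
}
```

Keeping the label in the placeholder lets the model still reason about the fact that a key was present without ever seeing its value.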

An additional injection scanner with 16+ patterns detects HTML comments, Base64 variants, and domain mismatches in emails.
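Three of these checks can be sketched as follows. This is an assumption-laden illustration: the `Email` shape is hypothetical, "domain mismatch" is interpreted here as From versus Reply-To, and the Base64 heuristic is a simple length threshold rather than the scanner's real rule.

```typescript
// Hypothetical email shape for this example.
interface Email {
  from: string;
  replyTo?: string;
  body: string;
}

function scanForInjection(email: Email): string[] {
  const findings: string[] = [];

  // Hidden instructions are often tucked into HTML comments.
  if (/<!--[\s\S]*?-->/.test(email.body)) findings.push("html-comment");

  // Heuristic: a run of 40+ Base64 characters may hide encoded instructions.
  if (/(?:[A-Za-z0-9+/]{4}){10,}={0,2}/.test(email.body)) findings.push("base64-blob");

  // Assumed meaning of "domain mismatch": Reply-To points at another domain.
  const domain = (addr: string) =>
    addr.trim().replace(/>$/, "").split("@").pop()?.toLowerCase() ?? "";
  if (email.replyTo && domain(email.replyTo) !== domain(email.from)) {
    findings.push("domain-mismatch");
  }
  return findings;
}
```

Returning a list of finding labels rather than a boolean leaves the decision (refuse, warn, or continue) to the caller.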

Two Incidents on Hackathon Day

MCP Scope Leak: During the final regression test, the agent created an actual appointment in my real Google Calendar instead of the sandbox. Root cause: bypassPermissions had inherited OAuth scopes for Gmail and Google Calendar. Fix: a canUseTool runtime gate that blocks every tool outside mcp__bitgn__*. Three lines of code.
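A gate of this kind might look like the sketch below. The result shape is an assumption modeled on allow/deny permission callbacks, defined locally so the example stands alone; the exact callback signature of the SDK in use is not shown in this write-up, and the tool names in the usage note are made up.

```typescript
// Assumed result shape for a permission callback; not taken from a specific SDK.
type PermissionResult =
  | { behavior: "allow"; updatedInput: Record<string, unknown> }
  | { behavior: "deny"; message: string };

// Only tools from the BitGN MCP server may run.
const ALLOWED_PREFIX = "mcp__bitgn__";

async function canUseTool(
  toolName: string,
  input: Record<string, unknown>,
): Promise<PermissionResult> {
  if (toolName.startsWith(ALLOWED_PREFIX)) {
    return { behavior: "allow", updatedInput: input };
  }
  return { behavior: "deny", message: `Blocked outside sandbox: ${toolName}` };
}
```

Under this sketch, a hypothetical `mcp__bitgn__write_file` call passes, while a hypothetical `mcp__google_calendar__create_event` call is denied regardless of which OAuth scopes the session has inherited.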

Platform Disk Full: At 14:47 CEST, in the middle of evaluation, StartRun and StartPlayground returned 502 errors. Cause: the BitGN platform's disk was full. Rinat Abdullin resolved the issue live using agentic coding in minutes.

Result

1st place on-site in Vienna. Certificate signed by Felix Krause (CTO Klartext AI), Rinat Abdullin (BitGN founder), and Markus Keiblinger (President AIM International).

| Metric | Own score | Benchmark | Δ |
| --- | --- | --- | --- |
| Own points | 79 | 65 (2nd place) | +14 |
| Tasks solved | 79 | 104 (total) | 76 % |

Sources: BitGN certificate, Vienna leaderboard. 104 tasks scored via observable side effects.

Lessons

Preparation beats improvisation. One week of prep time with access to the dev benchmark was decisive. On hackathon day, there was no time left to make architecture decisions.

Feature flags on all security layers. In a live competition, this is essential. If a layer causes unexpected problems: disable it, continue, debug later.

Always submit something. If the agent crashes or a brake fires: submit a report_completion with a best-guess outcome anyway. Zero points for a crash, partial points for a half-correct guess.
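The "always submit" rule amounts to a try/catch around the task run with a guaranteed submission afterwards. A minimal sketch, in which `runTask`, `submit`, `bestGuess`, and the `Completion` shape are hypothetical stand-ins for the agent loop and the report_completion call:

```typescript
// Hypothetical completion payload.
interface Completion {
  outcome: string;
  message: string;
}

async function runAndAlwaysSubmit(
  runTask: () => Promise<Completion>,
  submit: (completion: Completion) => Promise<void>,
  bestGuess: () => Completion,
): Promise<void> {
  let completion: Completion;
  try {
    completion = await runTask();
  } catch (err) {
    // Crash or brake: fall back to a best guess instead of submitting nothing.
    const fallback = bestGuess();
    completion = {
      ...fallback,
      message: `${fallback.message} (fallback after: ${String(err)})`,
    };
  }
  await submit(completion); // runs on both the success and the failure path
}
```

The submission sits outside the try block on purpose: a failure inside `submit` itself should surface as an error rather than trigger a second, confusing fallback.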