BitGN PAC Hackathon Vienna: 1st Place with Autonomous AI Agent
Autonomous AI agent built for the BitGN Personal Agent Challenge hackathon in Vienna. Solo development over one week, 104 tasks, 5 security layers. Result: 1st place on-site.
1st Place
79 of 104 tasks solved · 2nd place had 65 points · Solo development, 3 hours on-site
Context
BitGN PAC stands for "Personal Agent Challenge". The format differs from classic hackathons: no jury evaluating slides, no pitch. Instead: deterministic scoring.
Each participant builds an agent running in a sandboxed workspace. The sandbox is a file-based workspace containing calendars, notes, emails, and contacts. The agent receives natural-language tasks and is evaluated based on observable side effects: which files were touched, which tool calls were made, which result was returned.
104 tasks. Blind scoring. 3-hour build window on-site. Organised by BitGN, AI Impact Mission, Klartext AI, and AI Factory Austria in Vienna. I competed solo.
Preparation
The hackathon took place on 11 April 2026. Starting 4 April, I had one week to develop the agent. A dev benchmark with 43 tasks was available in advance. The full prod benchmark with 104 tasks was only accessible on the day itself.
For development, I used session-orchestrator, my own Claude Code plugin for parallel wave-based development. This let me build the security layer and metrics system in parallel rather than sequentially.
Architecture
The agent is built on TypeScript, Vercel AI SDK v6 with native tool calling, and ConnectRPC as transport to the BitGN platform. Claude Sonnet is the primary model.
Parallel evaluation: Rather than processing tasks serially, I ran 6 agents in parallel, each working on one task at a time. This cut the total time for 104 tasks to under 20 minutes.
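The fan-out can be sketched as a small worker pool. This is an illustration under assumptions, not the agent's actual scheduler: `runWithConcurrency` and its parameters are hypothetical names, and `worker` stands in for whatever runs a single task.

```typescript
// Hypothetical helper: runs `worker` over all items with at most `limit` in flight.
async function runWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each lane claims the next unprocessed index until the queue is empty.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

With `limit` set to 6, up to six tasks are in flight at any moment, and results come back in input order regardless of which task finishes first.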
Shell-style output: Tool results are formatted as cat, ls, and rg output rather than raw JSON. LLMs see a lot of CLI output in training and tend to reason more reliably over it.
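A minimal sketch of what such formatting could look like. The `DirEntry` and `SearchHit` shapes and both function names are assumptions for illustration; the agent's real tool result types are not shown here.

```typescript
// Hypothetical tool result shapes.
interface DirEntry {
  name: string;
  isDirectory: boolean;
}

interface SearchHit {
  path: string;
  line: number;
  text: string;
}

// Render a listing in the style of `ls -p`: one entry per line, "/" after directories.
function formatAsLs(entries: readonly DirEntry[]): string {
  return entries.map((e) => (e.isDirectory ? `${e.name}/` : e.name)).join("\n");
}

// Render search hits in the style of `rg --no-heading -n`: path:line:text.
function formatAsRg(hits: readonly SearchHit[]): string {
  return hits.map((h) => `${h.path}:${h.line}:${h.text}`).join("\n");
}
```

The model then receives a few plain lines such as `notes/` or `inbox/mail-01.md:3:Meeting moved to Friday` instead of a nested JSON object.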
Architecture Decisions (ADR Summary)
| Decision | Chosen | Alternative | Rationale |
|---|---|---|---|
| Parallelisation | 6 parallel agents | Sequential | 104 tasks in < 20 min vs. several hours |
| Transport layer | ConnectRPC (SDK) | REST polling | Native SDK integration, less boilerplate |
| Security architecture | 5 independent layers (B1-B5) | Monolithic guard | Each layer deactivatable, defense-in-depth |
| Output format | Shell-style (cat/ls/rg) | JSON | Better LLM reasoning quality |
| Feature flags | Env variable per layer | Hardcoded | Live competition: layer can be toggled in 30 seconds |
Five Security Layers
The agent has five independent defense layers, all written in pure TypeScript without external dependencies, each controllable via environment variables.
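A per-layer switch can be as small as the sketch below. The variable naming scheme (`AGENT_DISABLE_B1` and so on) is an assumption made for this example; the post does not name the actual variables.

```typescript
type LayerId = "B1" | "B2" | "B3" | "B4" | "B5";

// Assumed convention: AGENT_DISABLE_<layer>=1 (or "true") switches a layer off.
// Every layer is on by default, so a missing variable never weakens the defenses.
function isLayerEnabled(
  layer: LayerId,
  env: Record<string, string | undefined>,
): boolean {
  const raw = env[`AGENT_DISABLE_${layer}`];
  return raw !== "1" && raw !== "true";
}
```

In a Node process this would be called as `isLayerEnabled("B4", process.env)` before the layer runs. Passing the environment in as a parameter keeps the check easy to test.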
B1: Path Traversal Guard blocks access to /etc/, ~/.ssh/, .env, and .. traversals. Checked before every tool call.
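A sketch of such a guard in dependency-free TypeScript. The blocked locations come from the description above; the function name and the exact matching rules are assumptions.

```typescript
// Blocked locations named in the text; the real list may be longer.
const BLOCKED_PREFIXES: readonly string[] = ["/etc", "~/.ssh"];

function isPathAllowed(requested: string): boolean {
  // Split on forward and backward slashes, collapsing repeated separators.
  const segments = requested.split(/[\\\/]+/);

  // Reject any ".." segment outright instead of trying to resolve it.
  if (segments.some((s) => s === "..")) return false;

  // Reject dotenv files anywhere in the tree (.env, .env.local, ...).
  if (segments.some((s) => s === ".env" || s.startsWith(".env."))) return false;

  const normalized = segments.join("/");
  return !BLOCKED_PREFIXES.some(
    (p) => normalized === p || normalized.startsWith(`${p}/`),
  );
}
```

This is a string-level check only. It does not resolve symlinks, so it complements a sandboxed filesystem rather than replacing one.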
B2: PII Refusal detects requests for personal data and refuses them.
B3: Grounding-Refs Validation checks whether cited file paths in the result were actually read. Prevents hallucinations.
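The core of this check is a set difference between cited and read paths. A minimal sketch, assuming the agent records every file it reads in a set (the function and parameter names are hypothetical):

```typescript
// Returns the cited paths that were never read during the task.
// An empty result means every reference is grounded.
function findUngroundedRefs(
  citedPaths: readonly string[],
  readPaths: ReadonlySet<string>,
): string[] {
  return citedPaths.filter((p) => !readPaths.has(p));
}
```

If the result is non-empty, the agent can drop those references or re-read the files before returning its answer.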
B4: Destructive Brake limits write operations to a maximum of 10 per task. A soft cap at 35 iterations guards against infinite loops.
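Both limits can be sketched as two counters per task. The limits 10 and 35 come from the text; the interface, the names, and the reading of "soft cap" as a wrap-up signal rather than a hard stop are assumptions.

```typescript
interface Brake {
  allowWrite(): boolean;
  shouldWrapUp(): boolean;
}

// One brake instance per task; counters are not shared between tasks.
function createBrake(maxWrites = 10, softIterationCap = 35): Brake {
  let writes = 0;
  let iterations = 0;
  return {
    // Call before each write tool call; returns false once the budget is spent.
    allowWrite(): boolean {
      if (writes >= maxWrites) return false;
      writes += 1;
      return true;
    },
    // Call once per agent loop iteration; true means "finish and report now".
    shouldWrapUp(): boolean {
      iterations += 1;
      return iterations >= softIterationCap;
    },
  };
}
```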
B5: Secret Redaction redacts AWS keys, GitHub tokens, JWTs, and PEM certificates before they enter the LLM context.
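A redaction pass of this kind is typically a list of regular expressions applied in sequence. The patterns below follow the publicly documented token shapes (AWS access key IDs starting with `AKIA` or `ASIA`, GitHub tokens with a `ghp_`-style prefix, three-part JWTs, PEM blocks); they are illustrative and not the agent's actual pattern set.

```typescript
interface SecretPattern {
  label: string;
  pattern: RegExp;
}

// Illustrative patterns; a production list would cover more token families.
const SECRET_PATTERNS: readonly SecretPattern[] = [
  { label: "pem", pattern: /-----BEGIN [A-Z ]+-----[\s\S]*?-----END [A-Z ]+-----/g },
  { label: "aws-access-key", pattern: /\b(?:AKIA|ASIA)[0-9A-Z]{16}\b/g },
  { label: "github-token", pattern: /\bgh[pousr]_[A-Za-z0-9]{36,}\b/g },
  { label: "jwt", pattern: /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g },
];

// Replace every match with a labelled placeholder before the text reaches the model.
function redactSecrets(text: string): string {
  return SECRET_PATTERNS.reduce(
    (acc, { label, pattern }) => acc.replace(pattern, `[REDACTED:${label}]`),
    text,
  );
}
```

Keeping the label in the placeholder lets the model still reason about the fact that a file contains, say, a GitHub token without ever seeing its value.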
An additional injection scanner with 16+ patterns detects HTML comments, Base64 variants, and domain mismatches in emails.
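Three of these pattern families can be sketched as follows. The heuristics are deliberately crude and are assumptions for illustration: in particular, "domain mismatch" is read here as a link pointing to a host other than the sender's domain, which may not match the scanner's actual definition.

```typescript
// Hypothetical minimal email shape.
interface EmailLike {
  fromAddress: string;
  body: string;
}

function scanForInjection(email: EmailLike): string[] {
  const findings: string[] = [];

  // Hidden instructions are often tucked into HTML comments.
  if (/<!--[\s\S]*?-->/.test(email.body)) findings.push("html-comment");

  // A long unbroken Base64-looking run may hide an encoded payload.
  if (/[A-Za-z0-9+\/]{40,}={0,2}/.test(email.body)) findings.push("base64-blob");

  // Flag links whose host is neither the sender's domain nor a subdomain of it.
  const senderDomain = email.fromAddress
    .slice(email.fromAddress.lastIndexOf("@") + 1)
    .toLowerCase();
  const linkPattern = /https?:\/\/([a-z0-9.-]+)/gi;
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(email.body)) !== null) {
    const host = (match[1] ?? "").toLowerCase();
    if (host !== senderDomain && !host.endsWith(`.${senderDomain}`)) {
      findings.push("domain-mismatch");
      break;
    }
  }

  return findings;
}
```

A non-empty result does not prove an attack. It is a signal to treat the email body as untrusted data rather than as instructions.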
Two Incidents on Hackathon Day
MCP Scope Leak: During the final regression test, the agent created an actual appointment in my real Google Calendar instead of the sandbox. Root cause: the agent session, running with bypassPermissions, had inherited OAuth scopes for Gmail and Google Calendar. Fix: a canUseTool runtime gate that denies every tool outside mcp__bitgn__*. Three lines of code.
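The gate amounts to a prefix check on the tool name. The sketch below is not the original three lines: `gateToolCall`, the `GateDecision` type, and the tool names in the usage note are hypothetical, and the allow/deny result shape is only modelled on a typical permission callback.

```typescript
// Assumed decision shape; how it is wired into the agent runtime is not shown here.
type GateDecision =
  | { behavior: "allow"; updatedInput: Record<string, unknown> }
  | { behavior: "deny"; message: string };

const ALLOWED_PREFIX = "mcp__bitgn__";

// Deny by default: only tools under the sandbox prefix may run.
function gateToolCall(
  toolName: string,
  input: Record<string, unknown>,
): GateDecision {
  if (toolName.startsWith(ALLOWED_PREFIX)) {
    return { behavior: "allow", updatedInput: input };
  }
  return { behavior: "deny", message: `Tool ${toolName} is outside ${ALLOWED_PREFIX}*` };
}
```

A call such as `gateToolCall("mcp__google_calendar__create_event", {})` is denied, while `gateToolCall("mcp__bitgn__read_file", { path: "notes/a.md" })` passes through unchanged. Because the rule is an allowlist by prefix, a newly inherited integration is blocked without anyone having to know it exists.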
Platform Disk Full: At 14:47 CEST, in the middle of evaluation, StartRun and StartPlayground began returning 502 errors because the BitGN platform's disk was full. Rinat Abdullin resolved the issue live within minutes using agentic coding.
Result
1st place on-site in Vienna. Certificate signed by Felix Krause (CTO Klartext AI), Rinat Abdullin (BitGN founder), and Markus Keiblinger (President AIM International).
<ScoreDeltaTable rows={[ { category: 'Own points', ownValue: 79, benchmarkValue: 65, benchmarkLabel: '2nd place', delta: '+14', }, { category: 'Tasks solved', ownValue: 79, benchmarkValue: 104, benchmarkLabel: 'total', delta: '76 %', }, ]} headerMetric="Metric" headerOwn="Own score" headerBenchmark="Benchmark" headerDelta="Δ" caption="Sources: BitGN certificate, Vienna leaderboard. 104 tasks scored via observable side effects." />
Lessons
Preparation beats improvisation. One week of prep time with access to the dev benchmark was decisive. On hackathon day, there was no time left to make architecture decisions.
Feature flags on all security layers. In a live competition, this is essential. If a layer causes unexpected problems: disable it, continue, debug later.
Always submit something. If the agent crashes or a brake fires: submit a report_completion with a best-guess outcome anyway. Zero points for a crash, partial points for a half-correct guess.
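The "always submit" rule can be sketched as a wrapper around the task run. Only the name report_completion comes from the text; the `Completion` shape, `runTaskSafely`, and the `bestGuess` callback are assumptions for illustration.

```typescript
// Hypothetical payload for report_completion.
interface Completion {
  outcome: string;
  message: string;
}

async function runTaskSafely(
  solve: () => Promise<Completion>,
  reportCompletion: (completion: Completion) => Promise<void>,
  bestGuess: () => Completion,
): Promise<void> {
  let completion: Completion;
  try {
    completion = await solve();
  } catch {
    // Crash or brake fired: fall back to whatever the agent knows so far.
    completion = bestGuess();
  }
  // Exactly one report is sent per task, whether or not solve() succeeded.
  await reportCompletion(completion);
}
```

The report call sits outside the `try` block on purpose: a failure while solving never prevents a submission.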