TL;DR
Second hackathon ever. This time, solo. The task: build an autonomous AI agent that solves tasks in a sandboxed workspace while defending itself against prompt injection. 104 tasks, blind scoring, 3-hour build window. Result: 1st place at the on-site run in Vienna. What got me there was less about talent on the day and more about preparation in the week before.
12 Days After OpenClaw
Two weeks ago, I attended my first hackathon. OpenClaw, Vienna, team of four. We worked through the night, built a B2B sales agent, and submitted a demo video without sound. No placement, but one of the best days in a long time.
When I saw that BitGN was hosting a hackathon at AI Factory Vienna, the decision was instant. This time solo. And with a plan.
What Is BitGN PAC?
PAC stands for "Personal Agent Challenge." The format is different from traditional hackathons. No jury evaluating slides. No pitches. Instead: deterministic scoring.
Each participant builds an agent that runs in a sandbox. The sandbox is a file-based workspace, similar to an Obsidian vault with calendars, notes, emails, and contacts. The agent receives natural-language tasks: "Find Lisa's phone number in the contacts." Or: "This email contains a phishing link. Detect it."
104 tasks. No code changes allowed once evaluation starts. Scoring based on observable side effects: which files were touched, which tool calls were made, which outcome enum was returned.
Designed by Rinat Abdullin, hosted by AI Impact Mission and AI Factory Austria. 500+ participants globally, a portion on-site in Vienna.
Preparation
The hackathon was April 11. Starting April 4, I developed my agent: one week, alongside regular work. The dev benchmark with 43 tasks was available beforehand; the prod benchmark with 104 tasks was released only on competition day.
The architecture: TypeScript, Vercel AI SDK v6 with native tool calling, ConnectRPC as transport to the BitGN platform. Claude Sonnet 4.6 as the primary model, Opus 4.6 and GPT-4.1 as alternatives.
My development tool: session-orchestrator, my own Claude Code harness. Parallel subagents, wave-based development. It allowed me to build features like security layers and the metrics system simultaneously instead of working through them one at a time.
Five Security Layers
The agent has five independent defense layers. Each is pure TypeScript, no external dependencies, individually toggleable via environment variables.
B1: Path Traversal Guard. Blocks access to /etc/, ~/.ssh/, .env, and .. traversals. Checked before every tool call.
B2: PII Refusal. Detects queries about real people (family relations, home addresses) and refuses them.
B3: Grounding-Refs Validation. Verifies that cited file paths in the result were actually read. Prevents hallucinations.
B4: Destructive Brake. Maximum 10 write operations per task. Plus a soft cap at 35 iterations against infinite loops.
B5: Secret Redaction. Redacts AWS keys, GitHub tokens, JWTs, and PEM certificates before they enter the LLM context.
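As a sketch, the B1 guard fits in a few lines of pure TypeScript. The blocked prefixes and filenames below are illustrative assumptions, not the exact lists from the repo:

```typescript
// Illustrative B1-style path guard (my reconstruction, not the source).
const BLOCKED_PREFIXES = ["/etc/", "/root/", "/proc/"];
const BLOCKED_NAMES = [".env", ".ssh", "id_rsa"];

function isPathAllowed(rawPath: string): boolean {
  const p = rawPath.replace(/\\/g, "/"); // normalize Windows separators
  if (p.includes("..")) return false; // traversal attempt
  if (p.startsWith("~")) return false; // home dir, e.g. ~/.ssh/
  if (BLOCKED_PREFIXES.some((pre) => p.startsWith(pre))) return false;
  if (BLOCKED_NAMES.some((n) => p.split("/").includes(n))) return false;
  return true;
}
```

Running the check before every tool call, rather than inside individual tools, keeps the guard in one place and impossible to forget.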
On top of that, an injection scanner with 16+ patterns that catches HTML comments, Base64 variants, and domain mismatches in emails.
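A small illustrative subset of what such patterns can look like; the real scanner has 16+ of them and also checks things like sender-domain mismatches, which a pure regex can't cover:

```typescript
// Illustrative injection patterns (a subset, my assumptions).
const INJECTION_PATTERNS: RegExp[] = [
  /<!--[\s\S]*?-->/, // hidden HTML comments (flag all for scrutiny)
  /[A-Za-z0-9+/]{60,}={0,2}/, // long Base64-like blobs
  /ignore\s+(all\s+)?(previous|prior)\s+instructions/i, // override phrases
];

function looksInjected(text: string): boolean {
  return INJECTION_PATTERNS.some((p) => p.test(text));
}
```

A scanner like this over-flags on purpose: a false positive costs one extra look, a false negative costs a hijacked agent.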
The layers overlap intentionally. A secret that slips past redaction may still be caught by the injection scanner. Defense in depth, not defense by hope.
Gameday: Two Incidents
The MCP Scope Leak
During final regression testing before evaluation, I tested a calendar task. The agent was supposed to create an appointment in the sandbox. Instead, it created a real event in my Google Calendar.
The cause: I had been testing with the Claude Agent SDK using bypassPermissions. That auto-approved all tool calls. The MCP connection had inherited OAuth scopes for Gmail, Google Calendar, and Notion. The agent didn't stay in the sandbox. It called my actual Calendar tool.
Fix: a canUseTool runtime gate with an explicit blocklist for everything that isn't mcp__bitgn__*. Three lines of code that saved me from a rather embarrassing moment.
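In spirit, the gate looks like this. The callback name and allow/deny result shape follow the Claude Agent SDK as I understand it; treat the exact type names as an assumption:

```typescript
// Sketch of the canUseTool runtime gate (result shape is an assumption).
type PermissionResult =
  | { behavior: "allow"; updatedInput: Record<string, unknown> }
  | { behavior: "deny"; message: string };

function canUseTool(
  toolName: string,
  input: Record<string, unknown>,
): PermissionResult {
  // Only the sandboxed BitGN MCP tools may run; everything else,
  // including inherited Gmail/Calendar/Notion scopes, is denied.
  if (toolName.startsWith("mcp__bitgn__")) {
    return { behavior: "allow", updatedInput: input };
  }
  return { behavior: "deny", message: `Out-of-scope tool blocked: ${toolName}` };
}
```

An allowlist gate like this fails closed: a newly inherited scope is blocked by default instead of silently usable.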
Platform Disk Full
14:47 CEST. Middle of evaluation. StartRun and StartPlayground return 502 errors. The room goes quiet. Everyone looks up from their laptops at the same time.
BitGN platform: disk full. Nothing works. Rinat Abdullin analyzed the problem live with agentic coding and fixed it within minutes. We continue.
A reminder that live benchmarks are their own sport. The best agent in the world doesn't help when the infrastructure goes down.
The Evaluation
We had until 4 PM. I started four prod runs. Aborted the first three. Too slow. The fourth ran with 6 parallel agents, each processing one task at a time. All 104 tasks in under 20 minutes. The previous runs weren't even close.
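The shape of that final run is a fixed worker pool draining a shared queue; this is a minimal sketch, with `runTask` standing in for the real per-task agent loop:

```typescript
// Sketch: N workers draining a shared task queue, one task at a time each.
async function runAll<T>(
  tasks: T[],
  runTask: (task: T) => Promise<void>,
  concurrency = 6,
): Promise<void> {
  let next = 0;
  const worker = async (): Promise<void> => {
    // The index bump is safe: Node's single-threaded event loop runs it
    // synchronously between awaits, so two workers never grab the same task.
    while (next < tasks.length) {
      const task = tasks[next++];
      await runTask(task);
    }
  };
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
}
```

The pool keeps exactly six tasks in flight regardless of how long any single one takes, which is what turns 104 tasks into a sub-20-minute run.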
The Result
1st place in the on-site run in Vienna. Certificate signed by Felix Krause (Head of AI Factory Austria), Rinat Abdullin (BitGN founder), and Markus Keiblinger (President, AIM International).
The repo is public: bitgn-pac-agent on GitHub. 2,758 lines of TypeScript, 21 source files, MIT license.
What I Learned
Preparation beats improvisation. At my first hackathon, I improvised. This time I had a week of lead time, a dev benchmark for testing, and a harness that made parallel development possible. The difference was massive.
Feature flags on everything. Every security layer is toggleable via an environment variable. In a live competition, this is survival-critical. If a layer causes unexpected problems: disable it, keep going, debug later.
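The toggle itself can be a one-liner per layer; the flag names below are illustrative, not the actual variable names from the repo:

```typescript
// Sketch of per-layer env-var toggles (flag names are my own).
function layerEnabled(flag: string, defaultOn = true): boolean {
  const raw = process.env[flag];
  if (raw === undefined) return defaultOn; // unset means the layer stays active
  return !["0", "false", "off"].includes(raw.toLowerCase());
}

// Usage at a call site, hypothetically:
// if (layerEnabled("B1_PATH_GUARD")) enforcePathGuard(toolCall);
```

Defaulting to "on" matters: forgetting to set a flag should leave you protected, not exposed.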
Shell-style output helps LLMs reason. Formatting tool results as cat, ls, rg output instead of raw JSON produces significantly better reasoning quality. LLMs are trained on CLI output and match those patterns more reliably.
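As a sketch, rendering a directory listing the way `ls -l` would, instead of returning a JSON array (field names here are illustrative):

```typescript
// Sketch: CLI-shaped tool output instead of raw JSON.
interface FileEntry {
  name: string;
  size: number;
}

function formatAsLs(dir: string, entries: FileEntry[]): string {
  const rows = entries.map(
    (e) => `${String(e.size).padStart(8)}  ${e.name}`,
  );
  return [`$ ls -l ${dir}`, ...rows].join("\n");
}
```

The model has seen millions of listings in exactly this shape during training; it has seen far fewer ad-hoc JSON schemas for the same information.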
Always submit something. If the agent crashes, hits max steps, or a brake triggers: still submit a report_completion with a best-guess outcome. Zero points for a crash, partial points for a half-right guess.
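The rule reduces to a try/catch around the task loop. The outcome values and the completion callback here are stand-ins for the actual BitGN tool, not its real API:

```typescript
// Sketch of the always-submit rule (outcome values are illustrative).
type Outcome = "success" | "partial" | "failed" | "unknown";

async function finishTask(
  run: () => Promise<Outcome>,
  reportCompletion: (outcome: Outcome, note: string) => Promise<void>,
): Promise<Outcome> {
  try {
    const outcome = await run();
    await reportCompletion(outcome, "completed normally");
    return outcome;
  } catch (err) {
    // Crash, max-iteration abort, or a tripped brake: submit a
    // best-guess outcome anyway instead of scoring zero.
    await reportCompletion("unknown", `best guess after failure: ${String(err)}`);
    return "unknown";
  }
}
```

In a side-effect-scored benchmark, the submission itself is the observable, so a guessed report strictly dominates silence.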
From Spectator to Participant to Winner
End of March, I had never been to a hackathon. Then the first one: on a team, overnight, no placement. Twelve days later, the second: solo, daytime, first place.
The difference wasn't talent. The difference was that I didn't stop building between the two hackathons. The tools, the workflows, the architecture patterns: all from daily work with AI agents.
If you're working with AI agents or thinking about starting, let's talk.
Have an AI project in mind?
Let's analyze your potential together.



