The VibeSec Reckoning

Why prompting your AI to “be secure” is not enough, and what actually is

Ruth is an AI engineer in Global Marketing at Thoughtworks who builds intelligent systems and AI agents that turn complex data into practical insights and scalable, real-world solutions.

Neil is an AI Engineer in Global Marketing at Thoughtworks, specialising in data engineering, multi-agent systems, and research into new AI capabilities.

Lucian is an AI Engineer in Global Marketing at Thoughtworks, specialising in Retrieval Augmented Generation and agentic systems.

Gautam is Head of AI applications, Global Marketing at Thoughtworks. He leads AI platform initiatives and applied AI engineering teams focused on building and scaling production-ready, token-efficient GenAI applications across the Google ecosystem.

“Vibe coding”, the practice of non-technical citizen builders using generative AI tools to rapidly develop applications, has significantly accelerated software prototyping. However, because AI agents naturally prioritise the path of least resistance, they frequently recommend insecure configurations, creating systemic security exposure across industries. To combat this, we need to write a security context file to guide the AI, be cautious with AI permission requests, create a daily security intelligence feed, and provide builders with a secure-by-default harness and templates.

Vibe coding is enabling non-technical users (or, as we call them, citizen builders) to build applications with AI that they simply could not have built before. When our AI applications team in Global Marketing at Thoughtworks was asked to scale a vibe-coded prototype built by one of our citizen builders, we discovered serious cracks that prevent vibe-coded applications from going into production safely.

Speed without guardrails is a risk no team can afford to ignore. What follows is the story of what we found, what it means for teams building with AI, and the steps we are taking to make sure every workflow, prototype, and app we ship is one we can stand behind.

What we learned the hard way

The AI applications team within Global Marketing was asked to scale a video assembly prototype built with Gemini, Replit AI and Claude AI to create on-brand videos to be used across our 10,000 employees. The team ran into two moments that stopped work cold. In both cases, the AI suggested a path with serious security implications. In both cases, it took a human asking the right question to catch it.

Security risk #1: Public storage access

The AI recommended making the storage bucket public, or setting cloud file storage to “anyone with the link.” When challenged, it justified this by saying every company does it. Only a firm rejection prompted a secure alternative. This could have leaked sensitive unreleased brand assets and audience data to the public internet.

Security risk #2: Excessive token permissions

A service account was assigned the Access Token Creator role, granting it the ability to create short-lived tokens and access databases and other resources far beyond what the task required. The team caught this before running the code. This would have allowed a compromised service account to move laterally through an entire cloud workspace.

The key insight here is that AI tools often suggest the path of least resistance. That path is not always the secure one. Human judgment remains essential, but it should not be the only control. The goal is to give agents technical security rules as context from the first prompt, then validate their output through deterministic checks in the development workflow so insecure code, permissions, secrets, or infrastructure cannot pass unnoticed.
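To make "deterministic checks" concrete, here is a minimal sketch of one such check. It is not the check our team runs. It assumes the agent's proposed permissions are available as a Google Cloud IAM policy in its JSON shape (a list of bindings, each with a role and its members), and that the deny-list below is an illustrative stand-in for one agreed with your security team. The function names are hypothetical.

```python
from typing import List

# Principals that make a resource readable by anyone (Google Cloud IAM identifiers).
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}

# Assumption: an illustrative deny-list, not an exhaustive one.
RISKY_ROLES = {"roles/iam.serviceAccountTokenCreator"}


def find_violations(policy: dict) -> List[str]:
    """Return one message per risky binding found in an IAM policy dict."""
    violations = []
    for binding in policy.get("bindings", []):
        role = binding.get("role", "")
        members = binding.get("members", [])
        # Risk 1: the binding exposes the resource to the public internet.
        public = sorted(PUBLIC_MEMBERS.intersection(members))
        if public:
            violations.append(f"{role} is granted to public members: {', '.join(public)}")
        # Risk 2: the role allows minting tokens for other service accounts.
        if role in RISKY_ROLES:
            violations.append(f"{role} is on the deny-list")
    return violations


def gate_exit_code(policy: dict) -> int:
    """Return 1 when the policy must be blocked and 0 when it may pass."""
    return 1 if find_violations(policy) else 0
```

Wired into a pre-merge step, a non-zero result from `gate_exit_code` would stop the change no matter how the prompt was phrased or how the agent justified its suggestion.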

The real problem: prompts are not enough

After sharing these incidents with engineering and security colleagues, a clear message came back: telling an AI agent to be safe is not the same as enforcing that it is safe. Prompts can be overridden, misunderstood, or ignored. The moment a user pushes back on a restriction, or phrases a request differently, the constraint can evaporate.

“It is not sufficient to merely tell the LLM the desired behavior of your output artifacts. If you absolutely do not want something to be true, it must be codified in non-negotiable rules somewhere in your development lifecycle.” - Engineering leadership

Think of it this way: prompting for test-driven development is not the same as enforcing code coverage thresholds in your build tool. One is a suggestion. The other is a gate.

Birgitta Böckeler’s work on harness engineering makes this concrete by outlining a mental model for building trust in coding agents. Instead of relying solely on prompts, developers wrap the agent in an outer “harness” structured along two axes: Guides (feedforward controls) anticipate unwanted behavior and steer the model before it acts, while Sensors (feedback controls) observe the code after the agent acts to flag errors.
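The two axes can be sketched in a few lines. This is our own simplified illustration of the guide and sensor idea, not code from the harness engineering work itself. The rules, the patterns, and the function names `guide`, `sensor`, and `gate` are all hypothetical, and the regular expressions are far narrower than a dedicated secret scanner.

```python
import re
from typing import List

# Hypothetical rules; in practice these would live in a security context file.
SECURITY_RULES = [
    "Never make storage buckets or shared files public.",
    "Grant service accounts the narrowest role that completes the task.",
    "Never hard-code credentials; read them from the environment.",
]

# Illustrative patterns only.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)(?:password|api_key|secret)\s*=\s*['\"][^'\"]+['\"]"),
]


def guide(task: str) -> str:
    """Feedforward: put the rules in front of the agent before it acts."""
    rules = "\n".join(f"- {rule}" for rule in SECURITY_RULES)
    return f"Security rules (non-negotiable):\n{rules}\n\nTask: {task}"


def sensor(generated_code: str) -> List[str]:
    """Feedback: inspect the agent's output and report suspicious lines."""
    findings = []
    for number, line in enumerate(generated_code.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            findings.append(f"line {number}: possible hard-coded secret")
    return findings


def gate(generated_code: str) -> bool:
    """Return True only when the sensor reports nothing."""
    return not sensor(generated_code)
```

The guide is still a suggestion: the agent may ignore it. The sensor and gate are what turn the rule into something enforceable, because they run on the output regardless of what the prompt said.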
