Skip to main content

Your AI-generated code works. It's probably not production ready.

3 min read

Shipping features with Claude Code or Cursor is fast now. Getting that code to hold up in production is a separate problem entirely. AI reduces implementation time. It does not produce production engineering.

I went through 8 AI-generated production apps. They all had roughly the same issues:

  • Supabase RLS misconfigured
  • secrets sitting in the codebase
  • no rate limiting, no caching
  • bad data structures
  • components re-rendering constantly
  • AI features open to prompt injection and RAG attacks
  • basically no tests around anything important

Most of them worked. Hardly any were production ready.

A year ago, writing code was the bottleneck. Now it’s reviewing and hardening what got generated. That’s a different skill, and most teams aren’t there yet.

The reason this keeps happening: AI is excellent at extending local patterns. It’s much worse at understanding long-term system boundaries, scaling behavior, and operational risk. It generates code that looks right in isolation and breaks under real conditions.

Production review flow

Six things I look at before calling AI-generated code production ready.


Security

This is where things go quietly wrong. The code compiles, tests pass, and then someone finds a misconfigured auth check six months later.

  • Auth flows: does every protected route actually verify the session? Are role checks on the server, not just the client?
  • Exposed secrets: API keys in frontend code, .env values hardcoded as fallbacks, secrets logged during error handling
  • Injection risks: SQL, command, and path injection in any user-controlled inputs
  • LLM prompt injection: if your app passes user input into an AI prompt, can a user rewrite what the AI does?
  • RAG document injection: can a user upload a document that manipulates your AI’s behavior?

Quality

Happy path works fine. The edges are where it falls apart.

  • Dead code and unused imports — AI generates confidently, including things it never wires up
  • Weak typing: any used to paper over uncertainty, missing null checks, unsafe type assertions
  • Anti-patterns: misused hooks, unnecessary useEffects, logic in the wrong layer
  • Architecture drift: after 50 prompts, does the codebase still follow the same conventions it started with?

Performance

AI-generated code tends to duplicate logic instead of abstracting correctly, miss caching layers entirely, and generate DB access patterns that work fine in development and fall apart under load.

  • Slow queries: missing indexes, N+1 patterns, fetching more columns than the page needs
  • Cold starts: heavy dependencies, unoptimized bundles, serverless functions loading too much on init
  • Render cascades: components re-rendering on every state change because nothing is memoized
  • Heavy bundles: entire libraries pulled in at the top level when one function was needed

Compliance

Some of this is stack-specific. PII handling isn’t.

  • Payment flows: are Stripe webhooks handled correctly? Any card data stored that shouldn’t be?
  • App Store: are in-app purchases routed right? Anything that’ll get the app rejected on review?
  • Data handling: GDPR basics — deletion, consent, data residency for EU users
  • PII and external APIs: are you sending user data to an AI API you haven’t agreed to share it with?

Testing

There are usually tests. They usually test the wrong things.

  • Critical paths: auth, payments, data writes — the flows that actually hurt users when they fail
  • Are tests checking behavior, or just that the function runs without throwing?
  • Are edge cases there, or just the happy path the AI was given in the prompt?

Observability

Most AI-generated codebases have none. Everything’s fine locally. Then it breaks in production and there’s nothing to look at.

  • Error tracking: are exceptions being captured, or swallowed silently?
  • Logging: structured and useful, or just console.log statements scattered around?
  • Alerting: do you find out when something breaks, or do users tell you?
  • Tracing: for AI calls and external APIs, can you follow a request end to end?

Most teams are treating code generation and code review as the same problem. They’re not. The faster teams ship with AI, the faster review debt accumulates — and most teams have no process for it yet.

That’s why I built Vibe Audit. It runs this audit automatically across your codebase and surfaces production risks before they become incidents. GitHub

Ran it on a real app. This is what came back.

Vibe Audit sample report

Share X (Twitter) LinkedIn