Built with AI and ready to launch? A production-readiness checklist

June 2, 2026
Timo Wevelsiep
veriploy

Built with AI, but actually ready to launch? A production-readiness checklist covering repo, security, CVEs, database and infrastructure before go-live.

veriploy.de Blog

This post is for general technical information and does not replace an individual review or legal advice. It reflects the state of knowledge at the time of publication.

AI now builds in hours a working product that used to take weeks. The bottleneck is no longer writing code but verifying it: working is not the same as production-ready, and that is exactly where a launch holds or breaks.

The verification gap

The speed at which AI tools produce code has created a new problem: more code is generated than is carefully reviewed. Sonar's State of Code Developer Survey (January 2026, over 1,100 developers) captures the gap precisely. 96% of developers do not fully trust that AI-generated code is functionally correct. Yet only 48% always check their AI-assisted code before committing it.[1]

That discrepancy is the verification gap. Almost everyone knows they should look. Fewer than half do so consistently. Sonar describes it as a new bottleneck at the verification stage of software development: value is no longer defined by the speed of writing code, but by the confidence in deploying it.[1]

Production data shows that the problem is not theoretical. In CloudBees' 2026 State of Code Abundance (213 enterprise technology leaders), 81% report an increase in production issues attributable to AI-generated code. The striking part: in the same study, 92% say they trust the production readiness of AI code.[2] That overconfidence is exactly the danger. Trust the code more than it deserves and you review less, then notice the deficit only when it gets expensive.

National security agencies are raising the same point. The CEO of the UK's NCSC, Dr Richard Horne, warned in March 2026 that insecure software produced without human review could potentially propagate vulnerabilities.[3] The conclusion across all three sources is identical: go-live needs a deliberate, human verification step. A launch checklist is exactly that, in structured form.

Why AI code fails in production

AI generators are good at writing code that meets the obvious requirement. They are bad at thinking about what was not in the prompt: permissions, error cases, abuse, operations. Veracode tested code from over 100 LLMs and found that around 45% of code samples failed security tests and introduced OWASP Top 10 vulnerabilities.[4] Notably, larger and newer models wrote functionally better code, but not more secure code. More model power does not close the security gap by itself.

The OWASP Top 10:2021 is a proven frame for classifying the typical failure patterns.[5] Five categories show up especially often in AI-built apps:

  • A01 Broken Access Control has been the most frequently found category since 2021 and is the classic AI-code failure. Generators often build the login (authentication) but no clean role and object checks (authorisation). The result: logged-in users access other people's records by ID (IDOR), admin endpoints are reachable unprotected, or state-changing API routes (POST, PUT, DELETE) check no permissions. OWASP lists all of these explicitly under A01.[6]
  • A03 Injection covers, among other things, dynamic queries in which user input is not validated, filtered or handled in a context-safe way. The defence: separate data from commands and use parameterised APIs or ORMs.[7]
  • A05 Security Misconfiguration arises when framework defaults are left untouched: error pages with stack traces, open CORS, active default configurations, missing hardening, unnecessary features.[8]
  • A06 Vulnerable and Outdated Components hits apps whose dependencies are unknown, outdated or unscanned. OWASP explicitly calls it vulnerable if you do not scan for vulnerabilities regularly and do not patch in a timely fashion.[9]
  • A09 Security Logging and Monitoring Failures is the operational blind spot. AI apps often log neither logins nor failed logins nor security-relevant events, and there is no alerting. Attacks simply stay invisible.[10]
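
The A03 defence of separating data from commands can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in database; the users table, the function names and the payload are illustrative assumptions and not taken from any specific app.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Anti-pattern: user input is concatenated straight into the SQL text.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterised query: the value is passed separately from the SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def demo_connection():
    # Hypothetical demo data in an in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


conn = demo_connection()
payload = "x' OR '1'='1"

# The concatenated query matches every row; the parameterised one matches none,
# because the payload is compared as a plain string.
print(len(find_user_unsafe(conn, payload)))
print(len(find_user_safe(conn, payload)))
```

The same principle applies to any driver or ORM that supports bound parameters; only the placeholder syntax differs.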

The pattern behind it is always the same: AI builds the feature, not the operational readiness. Skip checking that before launch and you are relying on the generator having thought about something that was never part of the task.

The launch checklist

The checklist below is grouped by the five pillars we also look at in ongoing operations at Veriploy. It is deliberately scannable: work through it and honestly mark what is open.

Repo & code

Check | Why
No dead or commented-out security code (e.g. disabled auth checks) | AI often leaves TODO placeholders or "temporarily open" in place
Branch protection and review on the main branch | Stops unreviewed code landing straight in production
.env, dumps and build artefacts in .gitignore | Secrets and data do not belong in the history
Error handling instead of swallowed exceptions | Silent failures become data loss in production

Security (access control)

This is where the biggest risk sits. Check authorisation separately from authentication:

  • Every read and write route checks whether the logged-in user may see or change this specific object. Not just whether they are logged in.
  • Admin functions are checked server-side against the role, not just hidden in the frontend.
  • IDs in URLs and API calls cannot be enumerated (test: insert someone else's ID; the request should return 403).
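
As a rough illustration of the difference between authentication and object-level authorisation, the following sketch checks ownership on every read. The User and Invoice types, the in-memory INVOICES store and the Forbidden exception are hypothetical names chosen for this example; a real app would run the same check against its database and map Forbidden to an HTTP 403 response.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """The caller may not access the requested object (would map to HTTP 403)."""


@dataclass(frozen=True)
class User:
    id: int
    role: str = "member"


@dataclass(frozen=True)
class Invoice:
    id: int
    owner_id: int
    total_cents: int


# Hypothetical in-memory store standing in for the database.
INVOICES = {
    1: Invoice(id=1, owner_id=10, total_cents=4200),
    2: Invoice(id=2, owner_id=20, total_cents=990),
}


def get_invoice(current_user: User, invoice_id: int) -> Invoice:
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        raise KeyError(invoice_id)
    # Being logged in is not enough: the object must belong to the caller,
    # unless the caller holds the admin role (checked server-side).
    if invoice.owner_id != current_user.id and current_user.role != "admin":
        raise Forbidden(f"user {current_user.id} may not read invoice {invoice_id}")
    return invoice


# User 10 reads their own invoice; asking for invoice 2 would raise Forbidden.
print(get_invoice(User(id=10), 1).total_cents)
```

The "insert someone else's ID" test from the list corresponds to calling get_invoice with user 10 and invoice 2 and expecting the Forbidden error.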

Supabase RLS as a concrete example: Row Level Security is Postgres' mechanism for row-based access control. A correct policy restricts each user to their own rows, for example with the condition using ( (select auth.uid()) = user_id ).[11] The decisive pitfall: RLS is not on globally. Tables created via the Table Editor have RLS enabled by default, but tables created via raw SQL or the SQL editor do not.[11] Without RLS, a table in the exposed public schema is fully accessible to any role with a matching grant, for example anon, which means anyone holding the public key.[12]

This exact pattern led to CVE-2025-48757 (CVSS 9.3): in AI- and low-code-generated apps (Lovable), tables without correct RLS exposed data to unauthenticated requests, affecting 303 endpoints across 170 projects.[13] Important framing: this is not a Supabase bug but missing configuration by the developer or generator.

Checklist item: RLS enabled on all exposed tables, matching policies in place, and a real multi-user test (user A must not see user B's data).

Two recurring practical mistakes should also be closed before launch: secrets in the client bundle and missing rate limits on login and API endpoints. Conceptually, OWASP places them under A05 (hardening) and A04 Insecure Design (missing anti-automation).
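
A rate limit on login attempts can be sketched as a sliding window per client key. The class name, the limits and the injectable clock are assumptions made for this example. The state lives in process memory, so a deployment with several instances would need a shared store instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most max_attempts per key within window_seconds."""

    def __init__(self, max_attempts, window_seconds, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._attempts = defaultdict(deque)

    def allow(self, key):
        now = self.clock()
        attempts = self._attempts[key]
        # Drop timestamps that have left the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False  # caller should answer with HTTP 429
        attempts.append(now)
        return True


# Example: at most 5 login attempts per minute and client address.
login_limiter = SlidingWindowLimiter(max_attempts=5, window_seconds=60)
print(login_limiter.allow("203.0.113.7"))
```

Keying by client address is only one option; keying by account name as well makes it harder to spread a guessing attack across many addresses.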

Dependencies & CVEs

Scanning dependencies is mandatory, not optional. The tools deliver a list of known vulnerabilities in seconds:

Tool | What it does
npm audit (Node) | Lists known CVEs by severity (info, low, moderate, high, critical); npm audit fix patches automatically where possible without a breaking change[14]
composer audit (PHP) | Lists vulnerabilities and additionally flags abandoned packages and packages flagged as malware[15]
Dependabot alerts (GitHub) | Reports vulnerable dependencies continuously; security updates automatically open patch PRs to the minimum fixed version[16][17]

But seeing alerts is not the same as understanding risk. Severity follows CVSS,[18] yet whether a vulnerability is genuinely exploitable depends on context. Is the package in production or only a dev dependency? Is it included directly or transitively? GitHub provides signals for this, such as EPSS (exploit likelihood over the next 30 days, generally available since February 2025) and a "Most important" sort order.[19] GitHub says so itself: "Alerts can't catch every security issue." Only reviewed advisories trigger alerts, and manifest and lock files must be current.[16] Auto-patches and audits are the baseline, but they do not replace a deliberate review before go-live. That judgement, turning 40 alerts into the three that really block, is where a human eye counts.
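
The context questions above can be turned into a first-pass triage, sketched here under explicit assumptions. The Alert fields, the package names, the EPSS threshold of 0.1 and the blocking rule are illustrative only; they are not a GitHub or npm data format and not a standard, and the result is a starting point for human judgement rather than a replacement for it.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"low": 1, "moderate": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Alert:
    package: str
    severity: str  # low, moderate, high or critical
    runtime: bool  # True if shipped to production, False if dev-only
    direct: bool   # True if listed in the manifest, False if transitive
    epss: float    # estimated exploit probability between 0.0 and 1.0


def blocks_launch(alert, epss_threshold=0.1):
    # Illustrative rule: dev-only packages never block on their own.
    if not alert.runtime:
        return False
    is_severe = SEVERITY_RANK[alert.severity] >= SEVERITY_RANK["high"]
    return is_severe or alert.epss >= epss_threshold


def triage(alerts):
    # Runtime first, then severity, then EPSS, then direct before transitive.
    def key(a):
        return (a.runtime, SEVERITY_RANK[a.severity], a.epss, a.direct)
    return sorted(alerts, key=key, reverse=True)


# Hypothetical alerts for demonstration.
alerts = [
    Alert("dev-linter", "critical", runtime=False, direct=True, epss=0.02),
    Alert("http-client", "high", runtime=True, direct=True, epss=0.40),
    Alert("date-utils", "moderate", runtime=True, direct=False, epss=0.01),
    Alert("template-engine", "critical", runtime=True, direct=False, epss=0.05),
]

for alert in triage(alerts):
    print(alert.package, "BLOCKS" if blocks_launch(alert) else "can wait")
```

In this sample, the critical finding in a dev-only linter ranks last, while the two severe runtime findings come out as launch blockers.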

Database

  • RLS or equivalent server-side access control on all tables with user data (see above).
  • Migrations versioned and reproducible, not maintained by hand in the console.
  • Indexes on the columns the app filters by. AI code generates queries, rarely matching indexes.
  • No production data in test or staging environments.
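
Whether a filter column is actually backed by an index can be checked by looking at the query plan. The sketch below uses Python's built-in sqlite3 module and its EXPLAIN QUERY PLAN statement as a stand-in; the orders table and the index name are assumptions, and on Postgres the equivalent check would use EXPLAIN with different output.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail column.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total_cents INTEGER)"
)
sql = "SELECT id, total_cents FROM orders WHERE user_id = ?"

# Without an index the plan is a full table scan.
plan_before = query_plan(conn, sql, (42,))

# After indexing the filter column the plan switches to an index search.
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan_after = query_plan(conn, sql, (42,))

print(plan_before)
print(plan_after)
```

Running this kind of check for the handful of queries behind the busiest screens is usually enough to find the missing indexes before real traffic does.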

Infrastructure

Check | Why
Backups in place AND a restore tested | A backup without a tested restore is not a backup
Monitoring and error tracking active | So a failure does not first surface via a customer email (A09)
Secrets in a secret manager, not in the repo or CI config | OWASP: keys do not belong in version control; ideally pre-commit checks (e.g. gitleaks)[20]
Deployment via CI/CD instead of manually onto the server | Reproducible and secured instead of error-prone by hand[21]
Staging strictly separated from production | Prevents accidental changes to live data
Rate limits against abuse and cost blow-ups | Protects login, API and expensive endpoints
Structured logging without secrets or personal data in plain text | Traceability without a new data-protection risk
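
The first row of the table, a backup plus a tested restore, can be sketched as an automated check. The example uses SQLite's backup API from Python's built-in sqlite3 module as a stand-in for the real database; the customers table, the file name and the row-count comparison are assumptions, and a production check would restore into a separate environment and verify more than counts.

```python
import os
import sqlite3
import tempfile


def backup_to_file(source, path):
    target = sqlite3.connect(path)
    try:
        source.backup(target)  # sqlite3's online backup API
    finally:
        target.close()


def verify_restore(path, expected_counts):
    # Open the backup as a restore would and compare row counts per table.
    # Table names come from trusted configuration; identifiers cannot be
    # passed as bound parameters.
    restored = sqlite3.connect(path)
    try:
        for table, expected in expected_counts.items():
            (actual,) = restored.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
            if actual != expected:
                return False
        return True
    finally:
        restored.close()


# Hypothetical live database with two customer rows.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
live.executemany(
    "INSERT INTO customers (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)
live.commit()

with tempfile.TemporaryDirectory() as tmp:
    backup_path = os.path.join(tmp, "backup.db")
    backup_to_file(live, backup_path)
    restore_ok = verify_restore(backup_path, {"customers": 2})
    restore_detects_loss = not verify_restore(backup_path, {"customers": 3})

print(restore_ok)
```

Scheduling such a check regularly turns "we have backups" into evidence that a restore actually works.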

Tool review vs. the human eye

AI code review tools are useful. They comment on pull requests, catch obvious bugs and suggest style changes. But they see exactly what is in the PR: code diffs. They do not see whether the backup was ever restored, whether secrets are stored correctly and separately in the production environment, whether the RLS policy holds in a real multi-user case, or whether a finding is critical or irrelevant for your specific business.

That is not a failure of the tools but a limit of how they are built. A review tool prioritises findings by general heuristics. A human with senior experience prioritises by business risk: which of the 40 alerts really blocks the launch, and which can wait? That translation from scanner output into a decision is what a tool alone cannot deliver, and what the NCSC is pointing at with its call for human review.[3]

How Veriploy helps

Veriploy is exactly that second, human eye, but not as an expensive one-off audit that is stale after three weeks. AI-built software changes weekly; a snapshot does not fit that. So Veriploy is ongoing technical oversight: a regular look at repo, CVEs and infrastructure, prioritised risks and clear next steps instead of a wall of scanner output.

A sensible entry point is the Preflight check: a quick first assessment of your app. If you want the depth of an initial inventory, start with the Baseline and then move into ongoing plans, from a monthly look (Watch) through weekly sparring (Guard) to close support for live products (Launch). The sample report shows what a finding actually looks like. Questions up front can go straight through the contact form.

Clearly scoped: Veriploy does not make your app "secure", gives no guarantee and is no penetration-test replacement. It is also not feature development. Veriploy creates visibility and prioritisation so you can decide, with a clear basis, what really needs to be done before launch.

Conclusion

The bottleneck in building with AI is no longer writing, it is verifying. 96% distrust the code, only 48% review it consistently, and 81% of tech leaders see more production issues from AI code. A launch checklist closes exactly that gap: access control before login polish, RLS plus a real multi-user test, a dependency scan with context-aware prioritisation, secrets out of the repo, tested backups, and monitoring that alerts before the customer does. AI builds the feature fast. Whether it survives the first real surge of users is decided by the verification beforehand, and that is human work, not a prompt.

Sources

  1. Sonar, "Sonar Data Reveals Critical Verification Gap in AI Coding" (press release, 08.01.2026): https://www.sonarsource.com/company/press-releases/sonar-data-reveals-critical-verification-gap-in-ai-coding/
  2. CloudBees, "Enterprise Technology Leaders Report Production Failures from AI-Generated Code" (2026 State of Code Abundance, 19.05.2026): https://www.cloudbees.com/newsroom/enterprise-technology-leaders-report-production-failures-from-ai-generated-code
  3. NCSC, Dr Richard Horne, "Seize disruptive vibe coding opportunity to make software more secure" (24.03.2026): https://www.ncsc.gov.uk/news/ncsc-ceo-seize-disruptive-vibe-coding-opportunity-to-make-software-more-secure
  4. Veracode, "2025 GenAI Code Security Report" (blog, 30.07.2025): https://www.veracode.com/blog/genai-code-security-report/
  5. OWASP Top 10:2021 (overview): https://owasp.org/Top10/2021/
  6. OWASP A01:2021 Broken Access Control: https://owasp.org/Top10/2021/A01_2021-Broken_Access_Control/
  7. OWASP A03:2021 Injection: https://owasp.org/Top10/2021/A03_2021-Injection/
  8. OWASP A05:2021 Security Misconfiguration: https://owasp.org/Top10/2021/A05_2021-Security_Misconfiguration/
  9. OWASP A06:2021 Vulnerable and Outdated Components: https://owasp.org/Top10/2021/A06_2021-Vulnerable_and_Outdated_Components/
  10. OWASP A09:2021 Security Logging and Monitoring Failures: https://owasp.org/Top10/2021/A09_2021-Security_Logging_and_Monitoring_Failures/
  11. Supabase Docs, Row Level Security: https://supabase.com/docs/guides/database/postgres/row-level-security
  12. Supabase Docs, Securing your API: https://supabase.com/docs/guides/api/securing-your-api
  13. Matt Palmer, CVE-2025-48757 (disclosure): https://mattpalmer.io/posts/2025/05/CVE-2025-48757/
  14. npm Docs, npm audit (CLI v10): https://docs.npmjs.com/cli/v10/commands/npm-audit
  15. Composer Docs, composer audit: https://getcomposer.org/doc/03-cli.md
  16. GitHub Docs, About Dependabot alerts: https://docs.github.com/en/code-security/dependabot/dependabot-alerts/about-dependabot-alerts
  17. GitHub Docs, About Dependabot security updates: https://docs.github.com/en/code-security/dependabot/dependabot-security-updates/about-dependabot-security-updates
  18. GitHub Docs, About the GitHub Advisory Database: https://docs.github.com/en/code-security/security-advisories/working-with-global-security-advisories-from-the-github-advisory-database/about-the-github-advisory-database
  19. GitHub Docs, Prioritizing Dependabot alerts using metrics (EPSS): https://docs.github.com/en/code-security/tutorials/manage-security-alerts/prioritizing-dependabot-alerts-using-metrics
  20. OWASP Secrets Management Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html
  21. OWASP CI/CD Security Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/CI_CD_Security_Cheat_Sheet.html

Frequently Asked Questions

Is an AI-built app automatically secure enough to launch?
No. AI tools produce working code fast, but working is not the same as production-ready. Veracode tested code from over 100 LLMs and found that around 45% of code samples failed security tests and introduced OWASP Top 10 vulnerabilities. Before launch, access control, dependencies, secrets handling and infrastructure all need deliberate review.

What is the verification gap in AI code?
According to Sonar's State of Code Developer Survey (January 2026, over 1,100 developers), 96% of developers do not fully trust that AI-generated code is functionally correct, yet only 48% always check their AI-assisted code before committing it. That gap between distrust and actual review is the verification gap.

Is an AI code review tool enough as a pre-launch check?
Not on its own. AI review tools see the code in pull requests and comment on it line by line. They do not see whether a backup was ever restored, whether secrets sit correctly in your infrastructure, or which finding is business-critical. That is the difference between tool output and human judgement.

Why is Supabase RLS so often the problem in AI-built apps?
Row Level Security is not globally on: tables created with raw SQL do not have RLS enabled by default. Without RLS, a table in the exposed public schema is fully accessible to anyone with the public anon key. This exact pattern led to CVE-2025-48757 (CVSS 9.3) in AI- and low-code-generated apps.

What does a red Dependabot alert actually mean?
An alert reports a known vulnerability in a dependency, ranked by CVSS severity (Low/Moderate/High/Critical). Whether the gap is genuinely relevant depends on context: is the package in production or only a dev dependency, direct or transitive. GitHub provides signals like EPSS, but prioritisation still needs judgement.

Does Veriploy make my app secure or replace a pentest?
No, neither. Veriploy gives you ongoing technical oversight: visibility into repo, CVEs and infrastructure, prioritised risks and clear next steps. Veriploy is not a penetration test, not a security guarantee and not feature development, but a second pair of eyes with senior experience.

What should a solo founder check at minimum before launch?
At minimum: per-object access control (not just login), RLS on all exposed tables plus matching policies, a dependency scan with npm audit or composer audit, secrets kept out of the repo, a tested backup restore, and monitoring so a failure does not first surface via a customer email.

Written by

Timo Wevelsiep

Developer & Founder · Veriploy

Veriploy: technical oversight for AI-built software. Run by Timo Wevelsiep.
