
AI demos reward speed. Shipping rewards judgement.

Why an impressive AI-assisted demo is not the same as software you can trust.

2026-05-25 · 8 min read

The most impressive AI-assisted development demo is usually the one where almost nothing exists at the start. A prompt goes in. A working interface appears. There is a little suspense, a little theatre, and then a browser tab with something that looks surprisingly complete.

That is a real shift. It is also a bad model for judging whether a product is ready to ship.

Demos reward what you can see quickly. Shipping rewards work that keeps holding up after release. Those are related, but they are not the same thing. A demo can hide unclear rules for what "done" means, edge cases that were never checked, vague ownership, broken page information (titles, descriptions, share previews), poor accessibility, and shortcuts in how it goes live. A real release cannot hide those things for long.

The closer the work gets to production, the less useful it is to ask, "did the model make something?" The better question is, "do we understand what changed, why it changed, and what proof says it is safe enough?"

Generation is not the bottleneck anymore

AI has made the act of producing code cheaper. It can create a page, set up page information, generate a first pass at copy, suggest a test shape, or turn a rough instruction into usable code. That speed is valuable. It gives small builders more leverage, especially when the task is clear and the surrounding system is simple enough to understand.

But faster production does not automatically mean safer production. In many cases, the bottleneck moves from typing to judging. Someone still has to decide whether the generated output fits the intent. Someone has to inspect the changed files, notice the extra dependency that should not be there, catch the page that does not belong, and say no to the tempting flourish that would make the demo better and the product worse.
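Spotting an unwanted dependency is one of the few parts of that judging work that can be made mechanical. Below is a minimal sketch, assuming a package.json-style manifest: it compares the declared packages before and after a change and lists anything new. The function names and example package names are hypothetical, not part of any tool used on this site.

```typescript
// Only the package.json fields this check reads (an assumed, simplified shape).
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Every package name declared in either dependency group, without duplicates.
function declaredPackages(manifest: PackageManifest): string[] {
  return Array.from(
    new Set([
      ...Object.keys(manifest.dependencies ?? {}),
      ...Object.keys(manifest.devDependencies ?? {}),
    ]),
  );
}

// Names present after the change but not before, sorted for stable output.
function addedPackages(before: PackageManifest, after: PackageManifest): string[] {
  const existing = new Set(declaredPackages(before));
  return declaredPackages(after)
    .filter((name) => !existing.has(name))
    .sort();
}

// Hypothetical example: a generated change that quietly pulled in an animation library.
const before: PackageManifest = {
  dependencies: { "example-framework": "^1.0.0" },
  devDependencies: { "example-linter": "^2.0.0" },
};
const after: PackageManifest = {
  dependencies: { "example-framework": "^1.0.0", "example-animation-lib": "^3.1.0" },
  devDependencies: { "example-linter": "^2.0.0" },
};

console.log(addedPackages(before, after)); // flags only "example-animation-lib"
```

The script only surfaces the change. Whether the new package earns its place is still a decision for the person reviewing it.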

That steering work is not glamorous, but it is the work. The person using the tool has to frame the task, limit the scope, protect the structure, run checks, read failures, and decide whether "looks done" has become "is safe to ship." AI can help with each of those steps, but it cannot remove responsibility for them.

Clear rules for done get more important

When code is cheap to generate, vague requests become expensive in a different way. The cost shows up as review time, rework, and false confidence. A model can fill gaps quickly, but it may fill them with assumptions that feel plausible and still miss the thing that matters. Clear rules for what "done" means stop the work wandering.

While I was building the site you are reading now, the prompts that worked best were specific: build the initial shell, use the existing site setup, keep analytics, do not add a workbench, do not add 3D, do not create fake metrics. Later passes had similarly narrow goals: remove one browser warning, add a principles section, add a contact path, add the first project notes, strengthen the page information, add basic security settings, and tidy the public codebase.

Those limits did not slow the work down. They made the work shippable. Without them, it would have been easy to wander into the sort of impressive side quest that creates more surface area than value. The 3D or workbench idea can wait. The basic site needed identity, pages, copy, analytics, a path to production, email, page information, and a clean public codebase first.

Evidence beats vibes

A generated change can look tidy and still be wrong. It can pass a glance and fail a build. It can work on the page you are looking at and still break page information, links, or accessibility elsewhere.

While I was building the site you are reading now, one generated update looked correct in the browser yet dropped the social share image, because a single page had quietly overwritten the shared settings. The page still worked. The page still loaded. But the share preview was worse, and the mistake was easy to miss without a careful check.
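That kind of failure is easy to reproduce in miniature. The sketch below uses a hypothetical metadata shape (`PageMetadata` with a nested `share` object), not any particular framework's API. A shallow merge lets a page that only sets a share title replace the whole shared object, so the default image disappears without an error. Merging the nested object field by field keeps it.

```typescript
// Hypothetical metadata shape; real frameworks use different field names.
interface ShareSettings {
  title?: string;
  description?: string;
  image?: string;
}

interface PageMetadata {
  title?: string;
  share?: ShareSettings;
}

// Site-wide defaults every page is meant to inherit.
const siteDefaults: PageMetadata = {
  title: "Example site",
  share: { title: "Example site", image: "/share-card.png" },
};

// Shallow merge: a page that sets `share` replaces the entire default object.
function shallowMerge(base: PageMetadata, page: PageMetadata): PageMetadata {
  return { ...base, ...page };
}

// Nested merge: the page overrides only the share fields it actually sets.
function mergeWithShareDefaults(base: PageMetadata, page: PageMetadata): PageMetadata {
  return { ...base, ...page, share: { ...base.share, ...page.share } };
}

// A page that only wanted its own share title.
const articlePage: PageMetadata = {
  title: "Article",
  share: { title: "Article" },
};

const broken = shallowMerge(siteDefaults, articlePage);
const fixed = mergeWithShareDefaults(siteDefaults, articlePage);

console.log(broken.share?.image); // logs undefined: the share image is gone
console.log(fixed.share?.image); // logs /share-card.png
```

Some frameworks merge nested page metadata shallowly in this way, so it is worth checking how yours behaves before relying on inherited defaults.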

That is why tests, checks, and proof become more important with AI, not less.

The useful signals were ordinary ones: bun run format, bun run lint, bun run lint:biome, and bun run build. The build output showed which pages were generated. Live checks confirmed redirects, search files, browser files, security settings, and the contact path. Vercel inspection confirmed whether production was actually serving the right commit. GitHub Actions and Dependabot were not dramatic additions, but they made the public codebase harder to break by accident.
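Live checks like "security settings" can be turned into a small, repeatable assertion. Here is a minimal sketch that reports which expected response headers are missing. The list of headers is an assumption for illustration; the article does not say which ones this site sets. Names are compared case-insensitively, because HTTP header names are case-insensitive.

```typescript
// Assumed baseline of security headers; adjust to what the site actually intends to send.
const REQUIRED_HEADERS: string[] = [
  "content-security-policy",
  "strict-transport-security",
  "x-content-type-options",
  "referrer-policy",
];

// Return the required headers absent from a response's headers.
function missingSecurityHeaders(headers: Record<string, string>): string[] {
  const present = new Set(Object.keys(headers).map((name) => name.toLowerCase()));
  return REQUIRED_HEADERS.filter((name) => !present.has(name));
}

// Example with a partial set of headers, written in mixed case on purpose.
const sample: Record<string, string> = {
  "Content-Security-Policy": "default-src 'self'",
  "X-Content-Type-Options": "nosniff",
};

console.log(missingSecurityHeaders(sample));
```

Against a deployed site, the input could come from `Object.fromEntries((await fetch(url)).headers)` in a runtime with the standard Fetch API. A check like this proves the headers exist, not that their values are right; that part still needs reading.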

None of this is anti-AI. It is the opposite. If AI lets you move faster, your checking loop has to keep up. Speed without evidence just creates a nicer-looking pile of uncertainty.

Review loops are product work

There is a common mistake in treating review as a final gate, something that happens after the interesting creative work is done. With AI-assisted development, review is part of the creative work. The first generated version is often raw material. The quality comes from shaping it, cutting scope, checking behaviour, and asking whether it serves the product rather than the demo.

A site audit is a good example. It is easy to generate a homepage that looks complete. It is harder to notice that there is no contact method, project cards all point to the same place, the social share preview is missing, search files are thin, muted text may be low contrast, and the page has too many "coming soon" signals.
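Some of those audit findings are simple enough to check automatically. The sketch below works on a hypothetical, simplified snapshot of a homepage rather than real rendered HTML, and it covers only the missing contact method, the missing share image, duplicate project links, and repeated "coming soon" text. Contrast and search files need proper tools. The threshold of one "coming soon" is an arbitrary assumption.

```typescript
// Hypothetical page model; a real audit would parse the rendered HTML.
interface ProjectCard {
  title: string;
  href: string;
}

interface HomepageSnapshot {
  contactHref?: string;
  shareImage?: string;
  projectCards: ProjectCard[];
  bodyText: string;
}

// Collect every finding instead of stopping at the first,
// so one run surfaces everything a reviewer should look at.
function auditHomepage(page: HomepageSnapshot, maxComingSoon = 1): string[] {
  const findings: string[] = [];
  if (!page.contactHref) findings.push("no contact method");
  if (!page.shareImage) findings.push("social share image is missing");

  const hrefs = new Set(page.projectCards.map((card) => card.href));
  if (page.projectCards.length > 1 && hrefs.size === 1) {
    findings.push("all project cards point to the same place");
  }

  const comingSoon = (page.bodyText.match(/coming soon/gi) ?? []).length;
  if (comingSoon > maxComingSoon) {
    findings.push(`"coming soon" appears ${comingSoon} times`);
  }
  return findings;
}

// Example snapshot with several of the problems described above.
const snapshot: HomepageSnapshot = {
  shareImage: "/share-card.png",
  projectCards: [
    { title: "Project one", href: "/projects" },
    { title: "Project two", href: "/projects" },
  ],
  bodyText: "Notes coming soon. Projects coming soon. Writing coming soon.",
};

for (const finding of auditHomepage(snapshot)) console.log(finding);
```

The output is a list for a person to judge, not a verdict. A single "coming soon" may be fine; three on one page probably is not.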

Those are not exotic engineering problems. They are judgement problems. They are the difference between a page that exists and a page that earns a little trust.

The same pattern applies to putting a site live. A site can build on my machine and still have production connected to the wrong branch. A domain can resolve and still need its preferred address checked, for example whether the bare domain redirects to the www version or the other way round. A public codebase can still need a licence note, automated checks, dependency alerts, secret scanning, and branch protection. Shipping is full of these small, unglamorous checks. They are easy to skip because each one feels minor. Together, they are the release.
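Checking the preferred address is one of those small checks that is easy to write down precisely. Here is a minimal sketch, assuming a placeholder preferred origin of https://www.example.com: for any URL the site answers on, it computes where a redirect should send the visitor, keeping the path and query but switching to the preferred scheme and host. It returns null when no redirect is expected.

```typescript
// Assumed placeholder; substitute the site's real preferred origin.
const PREFERRED_ORIGIN = "https://www.example.com";

// Where a request to `requestUrl` should be redirected, or null if it is already preferred.
function expectedRedirect(requestUrl: string): string | null {
  const url = new URL(requestUrl);
  const preferred = new URL(PREFERRED_ORIGIN);
  if (url.origin === preferred.origin) return null;

  // Keep path and query; replace scheme, host, and port.
  url.protocol = preferred.protocol;
  url.host = preferred.host;
  url.port = preferred.port; // "" for default ports, which clears any explicit port
  return url.toString();
}

console.log(expectedRedirect("http://example.com/writing?x=1"));
console.log(expectedRedirect("https://www.example.com/"));
```

To check a live site, the result could be compared with the `Location` header of a request made with `fetch(url, { redirect: "manual" })`, in a runtime such as Node where manual redirects expose that header. Whether the redirect should be permanent or temporary is a separate decision this sketch does not make.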

The person steering it matters

The best AI workflow I have found so far is not "ask for everything." It is closer to working with a fast, capable collaborator who needs clear boundaries. I want the tool to propose, implement, inspect, and verify. I do not want it to silently expand the product, invent facts, add dependencies, or turn every task into a redesign.

That means the person steering the work has to keep a thread of intent through it. What is the smallest useful change? What should not change? What evidence will we accept? What is deliberately out of scope? Where is the risk?

A good prompt is not just a request for output. It is a set of operating conditions.
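One way to keep those operating conditions from getting lost between passes is to write them down as structure before turning them into a prompt. The sketch below is hypothetical: the `TaskBrief` fields simply mirror the questions above, and the example values are taken from the constraints used on this site's first pass.

```typescript
// Hypothetical structure; the fields mirror the questions in the text.
interface TaskBrief {
  goal: string;
  mustNotChange: string[];
  outOfScope: string[];
  evidence: string[];
  risks: string[];
}

// Render the brief as plain-text operating conditions, skipping empty sections.
function renderBrief(brief: TaskBrief): string {
  const section = (heading: string, items: string[]): string =>
    items.length > 0 ? `${heading}:\n${items.map((item) => `- ${item}`).join("\n")}` : "";
  return [
    `Goal: ${brief.goal}`,
    section("Do not change", brief.mustNotChange),
    section("Out of scope", brief.outOfScope),
    section("Done means", brief.evidence),
    section("Risks to watch", brief.risks),
  ]
    .filter((part) => part.length > 0)
    .join("\n\n");
}

// Example brief built from the first-pass constraints described earlier.
const brief: TaskBrief = {
  goal: "Build the initial site shell using the existing site setup",
  mustNotChange: ["analytics"],
  outOfScope: ["a workbench", "3D", "fake metrics"],
  evidence: ["bun run lint passes", "bun run build passes"],
  risks: ["new dependencies", "overwritten shared page settings"],
};

console.log(renderBrief(brief));
```

The format matters less than the habit: if a section is hard to fill in, the task is probably not clear enough to hand over yet.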

This matters even more for small builders. AI can make a solo or small-team project feel less constrained by time and blank-page friction. That is powerful. It also makes it easier to create more than you can maintain. The discipline is not only in shipping more. It is in choosing what not to ship yet.

Looks done is not done

The central gap between AI demos and shipping is the gap between appearance and confidence. A demo needs to look done. A release needs to be understood well enough that someone is willing to own it.

That does not mean every small change needs enterprise ceremony. It means the proof should match the risk. For a personal site, the proof might be a clean build, working pages, correct redirects, readable copy, accessible focus states, sane page information, no secrets in the codebase, and a live site connected to the right branch. For a tax calculator, a payment flow, a healthcare workflow, or anything with meaningful user harm, the bar should be much higher.

AI-assisted development is not a shortcut around judgement. It is a way to spend less time on mechanical production and more time on framing, review, testing, proof, and product decisions.

Used well, that is a serious advantage. Used lazily, it just moves the mess faster.

The promise is not that AI makes shipping effortless. The promise is that it makes careful shipping more accessible. The work still has to be clear and limited. The changed files still have to be read. The checks still have to pass. The person steering the work still has to decide whether the result is good enough to carry their name.

That decision is what separates the demo from the product.