Engineering · 2026-04-28

A response to James Bennett: LLM coding is not just "faster typing"

James Bennett argues that LLM coding isn't a silver bullet, so adoption isn't urgent. We think that misframes what's happening. LLMs change the interface between intent, implementation, testing, and review, and the skill of working in that loop takes a year to build.

James Bennett is right about one important thing: LLMs are not magic. They do not remove the hard parts of software engineering. They do not make architecture, specification, testing, security, deployment, observability, or product thinking disappear.

But that is also where we think his post goes wrong.

The mistake is treating LLM coding mainly as a faster way to generate source code, and then arguing that faster code generation cannot be a silver bullet because code generation was never the real bottleneck. Bennett repeatedly returns to Fred Brooks's "No Silver Bullet" framing: the hard parts of software are specification, design, testing, and conceptual correctness, not writing syntax faster. From this he concludes that LLM coding is likely to be incremental rather than revolutionary, and that delayed adoption has little downside.

That argument is clean, but it attacks a weaker version of what is actually happening.

The serious case for LLMs is not:

"The model types code faster, therefore software engineering is solved."

The serious case is:

"The model changes the interface between intent, implementation, testing, review, and iteration."

That is a much bigger shift.

A good LLM workflow is not simply opening a chatbot and saying "build me a professional application." That is exactly how you get AI slop. The real skill is learning how to describe the system clearly enough that the model has the right context: architecture, constraints, user flows, data ownership, failure modes, security invariants, acceptance criteria, test expectations, and non-goals.

In other words, successful LLM coding does not remove the developer mindset. It makes the developer mindset more important.

The developer's value shifts from manually producing every line of code to specifying, steering, reviewing, testing, and hardening the output. That is not "less engineering." That is engineering moved up a level.
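To make that context concrete, a team can treat it as a structured document with required sections instead of a free-form prompt. The TypeScript below is a minimal sketch under that assumption. `FeatureSpec`, `missingSections`, and `renderSpec` are hypothetical names, and the fields simply mirror the list of context described above: architecture, constraints, user flows, data ownership, failure modes, security invariants, acceptance criteria, test expectations, and non-goals. The sketch refuses to produce model input while any section is empty.

```typescript
// Hypothetical shape for the context a model needs before it writes code.
// Field names mirror the list in the text; this is not a standard format.
interface FeatureSpec {
  architecture: string;
  constraints: string[];
  userFlows: string[];
  dataOwnership: string;
  failureModes: string[];
  securityInvariants: string[];
  acceptanceCriteria: string[];
  testExpectations: string[];
  nonGoals: string[];
}

// Returns the names of empty sections, so gaps are caught before the
// model has a chance to fill them with its own assumptions.
function missingSections(spec: FeatureSpec): string[] {
  return (Object.keys(spec) as (keyof FeatureSpec)[]).filter((key) => {
    const value = spec[key];
    return Array.isArray(value)
      ? value.filter((item) => item.trim() !== "").length === 0
      : value.trim() === "";
  });
}

// Renders the spec as plain text for the model, but only when it is complete.
function renderSpec(spec: FeatureSpec): string {
  const missing = missingSections(spec);
  if (missing.length > 0) {
    throw new Error(`Spec incomplete: ${missing.join(", ")}`);
  }
  return (Object.keys(spec) as (keyof FeatureSpec)[])
    .map((key) => {
      const value = spec[key];
      const body = Array.isArray(value) ? value.map((v) => `- ${v}`).join("\n") : value;
      return `## ${key}\n${body}`;
    })
    .join("\n\n");
}
```

An empty `nonGoals` list, for instance, makes `renderSpec` throw rather than leaving the model to decide on its own what is out of scope.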

Validation is the bottleneck, not adoption

Bennett is correct that validation and integration are bottlenecks. But the conclusion should not be "therefore LLM adoption is not urgent." The conclusion should be: adopt LLMs together with stronger validation systems.

CircleCI's 2026 data actually supports this more nuanced view. Their report says AI has made it easier to write code, but shipping is harder; most teams generate more activity on feature branches while main-branch throughput suffers. However, the top 5% of teams nearly doubled throughput and were able to ship more because their validation systems kept up. CircleCI's own conclusion is that success in the AI era depends on validation, integration, and recovery speed, not merely code-writing speed.

That is not an argument against LLM coding. It is an argument against immature LLM coding.

The same applies to DORA. DORA does not say "AI is useless." It says AI acts as an amplifier: it magnifies strong organisations and also magnifies weak ones. The best returns come from the underlying system: internal platforms, clear workflows, and organisational alignment.

Again, that is not a reason to avoid LLMs. It is a reason to stop treating them as a toy and start integrating them into serious engineering workflows.

What a proper loop looks like

A proper AI-assisted development loop looks more like this:

1. Write a clear specification.
2. Define architecture and boundaries.
3. Generate the implementation with the LLM.
4. Generate or refine tests with the LLM.
5. Run linting, type checks, unit tests, integration tests, E2E tests, and security checks.
6. Feed failures, traces, and logs back to the LLM.
7. Have a human review the risky parts: architecture, data flow, security, business rules, edge cases.
8. Merge only when the hard checks pass.
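The last three steps amount to a deterministic gate, which is easy to sketch. This is an illustration only, assuming each check reports a pass/fail result plus its log output. `CheckResult`, `decideMerge`, `feedbackForModel`, and the `RISKY_PREFIXES` paths are hypothetical, and in practice this policy would usually live in CI configuration and branch-protection rules rather than in application code.

```typescript
// Outcome of one deterministic check (lint, type check, tests, security scan).
interface CheckResult {
  name: string;
  passed: boolean;
  output: string; // log or failure trace, reused as feedback to the model
}

// Hypothetical path prefixes a team has marked as needing human review.
const RISKY_PREFIXES = ["src/auth/", "src/billing/", "migrations/"];

interface MergeDecision {
  canMerge: boolean;
  reasons: string[];
}

// Merge only when at least one check ran, every check passed, and a human
// has approved any change that touches a risky path.
function decideMerge(
  results: CheckResult[],
  changedFiles: string[],
  humanApproved: boolean,
): MergeDecision {
  const reasons: string[] = [];
  if (results.length === 0) reasons.push("no checks ran");
  for (const r of results) {
    if (!r.passed) reasons.push(`check failed: ${r.name}`);
  }
  const risky = changedFiles.filter((f) => RISKY_PREFIXES.some((p) => f.startsWith(p)));
  if (risky.length > 0 && !humanApproved) {
    reasons.push(`human review required for: ${risky.join(", ")}`);
  }
  return { canMerge: reasons.length === 0, reasons };
}

// Turns failing checks into feedback text for the next generation attempt.
function feedbackForModel(results: CheckResult[]): string {
  return results
    .filter((r) => !r.passed)
    .map((r) => `Check "${r.name}" failed:\n${r.output}`)
    .join("\n\n");
}
```

The same check output serves two purposes: it blocks the merge, and it becomes the feedback text for the model's next attempt.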

This is where tools like Playwright matter. Playwright's own best practices emphasise testing user-visible behaviour rather than implementation details, which is exactly the kind of evidence AI-generated software needs. With Playwright MCP, LLMs can interact with web pages through structured accessibility snapshots, meaning they can help operate, inspect, and test real browser flows instead of only generating code in isolation.

That changes the game. The LLM is no longer only a code generator. It becomes part of a loop:

specification → implementation → browser test → failure trace → fix → regression test

That is much closer to a real engineering process.
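The specification-to-regression-test loop can be sketched as a bounded retry cycle. This is a minimal sketch, assuming two injected hooks, `generate` and `runScenarios` (both hypothetical), that stand in for the model call and a browser test runner such as a Playwright suite; it is not Playwright's API or any MCP server's API. Two design choices matter: failure traces go back to the model verbatim, and every scenario that failed once is recorded so it can be kept as a regression test.

```typescript
// One browser scenario (e.g. "user resets password") and its latest outcome.
interface ScenarioResult {
  scenario: string;
  passed: boolean;
  trace: string; // failure trace or log excerpt
}

// Hypothetical hooks: in a real setup these would call the model and a
// browser test runner; here they are injected so the loop stays testable.
interface LoopHooks {
  generate: (spec: string, feedback: string) => string;
  runScenarios: (code: string, scenarios: string[]) => ScenarioResult[];
}

interface LoopOutcome {
  code: string;
  attempts: number;
  regressionSuite: string[];
  green: boolean;
}

function runLoop(spec: string, scenarios: string[], hooks: LoopHooks, maxAttempts = 3): LoopOutcome {
  const regressionSuite = new Set<string>();
  let feedback = "";
  let code = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    code = hooks.generate(spec, feedback);
    const failures = hooks.runScenarios(code, scenarios).filter((r) => !r.passed);
    // Every scenario that has ever failed is kept as a regression test.
    failures.forEach((f) => regressionSuite.add(f.scenario));
    if (failures.length === 0) {
      return { code, attempts: attempt, regressionSuite: Array.from(regressionSuite), green: true };
    }
    // Feed the failure traces back to the model verbatim.
    feedback = failures.map((f) => `${f.scenario}: ${f.trace}`).join("\n");
  }
  // Out of attempts: stop and hand over to a human instead of looping forever.
  return { code, attempts: maxAttempts, regressionSuite: Array.from(regressionSuite), green: false };
}
```

Capping `maxAttempts` is deliberate: once the model has had a few tries against the same failing evidence, the problem usually needs human judgement rather than another guess.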

The skill curve isn't trivial

Bennett is also too relaxed about delayed adoption. His argument is basically: if LLMs are not a silver bullet, late adopters lose little; if they do become a silver bullet, the breakthrough will be so different that current workflows may not matter much.

We think that is wrong.

There is a practical skill curve here, and it is not trivial. Learning to use LLMs well is not the same as learning a new IDE plugin. You have to learn how to provide context, how to decompose work, how to write machine-readable specs, how to constrain the model, how to detect plausible nonsense, how to review AI-generated code, how to use tests as guardrails, how to recover from bad generations, and how to prevent the model from filling gaps with dangerous assumptions.

Those are not skills you pick up instantly once your employer mandates AI tools.
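Part of detecting plausible nonsense can be made mechanical. One concrete case is generated code that imports a package which sounds right but is not a declared dependency of the project. The sketch below assumes the declared dependency names are already loaded into a set (for example from `package.json`). `packageName` and `undeclaredImports` are hypothetical helpers, and the regular expression is deliberately rough. In a real project, a maintained lint rule such as `import/no-extraneous-dependencies` from eslint-plugin-import is the more robust option.

```typescript
// Extracts the package name from an import specifier:
// "lodash/fp" -> "lodash", "@scope/pkg/sub" -> "@scope/pkg".
// Returns null for relative or absolute paths and "node:" built-ins.
function packageName(specifier: string): string | null {
  if (specifier.startsWith(".") || specifier.startsWith("/") || specifier.startsWith("node:")) {
    return null;
  }
  const parts = specifier.split("/");
  return specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
}

// Flags imports of packages that are not declared dependencies: a cheap,
// deterministic check for plausible-looking but invented packages.
// Rough on purpose: it can be fooled by import-like text inside strings.
function undeclaredImports(source: string, declared: Set<string>): string[] {
  // Matches: ... from "x" | import("x") | require("x") | import "x"
  const pattern = /(?:from\s+|import\s*\(\s*|require\s*\(\s*|import\s+)["']([^"']+)["']/g;
  const found = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const name = packageName(match[1]);
    if (name !== null && !declared.has(name)) found.add(name);
  }
  return Array.from(found).sort();
}
```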

The industry is already moving. Stack Overflow's 2025 Developer Survey reports that 84% of respondents either use or plan to use AI tools in development, and 51% of professional developers use them daily. JetBrains reports that 85% of developers regularly use AI tools for coding and development, with 62% relying on at least one coding assistant, agent, or AI code editor. GitHub's Octoverse 2025 says nearly 80% of new developers on GitHub use Copilot within their first week.

That does not prove LLMs are perfect. But it does show that this is no longer a niche experiment. It is becoming part of the default development environment.

So the real risk of not adopting now is not that you will miss one magic tool. The risk is that you will miss a year of learning how to work in the new mode.

The shape of the new mode

That new mode matters.

Future software development is likely to involve more reviewing, directing, testing, and orchestrating. The developer becomes less of a manual typist and more of a system designer, reviewer, tester, and quality gatekeeper. But that does not make developers obsolete. It makes weak developers more dangerous and strong developers more leveraged.

This is also where the "democratisation" argument needs nuance. LLMs will allow more people to produce working-looking software. That is both powerful and dangerous. The market will probably be flooded with people who can generate CRUD apps, dashboards, scripts, and prototypes at low cost. Some of them will think that makes them developers.

It does not.

A person who cannot reason about architecture, data boundaries, authentication, authorisation, threat models, edge cases, state management, race conditions, deployment, observability, and regression testing is not suddenly a professional engineer because an LLM produced code for them.

But that does not weaken the case for LLMs. It strengthens the case for experienced developers using them properly.

The winners will not be people who blindly trust LLMs. The winners will be people who know when to trust them, how to constrain them, how to test them, and how to make them prove their output.

What the cited studies actually say

Bennett uses studies like METR's early-2025 research as evidence for scepticism. That study found that experienced open-source developers working on familiar repositories took 19% longer when allowed to use AI. But the authors themselves warn against overgeneralising the result: they do not claim that AI fails to speed up most developers, future tools, less experienced developers, unfamiliar codebases, or different workflows.

That study is useful. It shows that naïve or poorly matched AI usage can slow people down. It does not show that LLM-native workflows are a dead end.

There is also evidence in the other direction. A controlled Microsoft/GitHub Copilot experiment found that developers using Copilot completed a JavaScript HTTP server task 55.8% faster than the control group. That does not mean every software team becomes 55.8% faster end-to-end. But it does show that dismissing LLMs as merely producing more review burden is too simplistic.
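The gap between a task-level speedup and an end-to-end speedup is essentially Amdahl's law: if only part of the delivery work gets faster, total time shrinks far less. The sketch below assumes "55.8% faster" means the coding task took 55.8% less time, and uses a made-up 30% coding share of total delivery time purely for illustration. Neither the share nor the function comes from the study.

```typescript
// End-to-end time factor when only part of the delivery work changes speed
// (Amdahl's law). codingShare: fraction of total time spent writing code.
// codingTimeReduction: 0.558 means the coding part takes 55.8% less time;
// a negative value models a slowdown (e.g. -0.19 for 19% longer).
function endToEndTimeFactor(codingShare: number, codingTimeReduction: number): number {
  if (codingShare < 0 || codingShare > 1) {
    throw new RangeError("codingShare must be in [0, 1]");
  }
  // Untouched work keeps its time; the coding share is scaled.
  return (1 - codingShare) + codingShare * (1 - codingTimeReduction);
}
```

With those assumptions, `endToEndTimeFactor(0.3, 0.558)` is about 0.83, roughly a 17% reduction in total time. Passing a negative reduction models a slowdown: `endToEndTimeFactor(1, -0.19)` gives 1.19, the METR result if coding were the whole job.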

The honest conclusion

LLMs are not a silver bullet.

But "not a silver bullet" does not mean "not strategically important."

They are a force multiplier. And like every force multiplier, they amplify both competence and incompetence.

If your team has bad specs, weak tests, no CI discipline, poor review culture, and unclear ownership, LLMs will help you create garbage faster.

If your team has strong architecture, clear requirements, CI/CD, linting, type checks, Playwright E2E tests, security review, and fast feedback loops, LLMs can dramatically increase iteration speed.

That is why the right answer is not Bennett's slow-adoption posture.

The right answer is controlled adoption now.

Not hype.

Not blind trust.

Not "let the model build the whole thing."

But also not sitting on the sidelines pretending this is just autocomplete with better marketing.

The important skill of the next few years will be learning how to steer the beast: how to specify clearly, how to constrain outputs, how to build testable requirements, how to use functional user testing, how to review generated code, and how to combine LLMs with deterministic tools that provide hard evidence.

Code generation is only the visible part.

The real shift is that software development is becoming a loop between human intent, machine generation, automated validation, and human judgement.

Bennett is right that the hard parts of software remain hard.

He is wrong to treat that as a reason for slow adoption.

Because the people who start learning this workflow now will not just know the tools. They will know the failure modes, the prompting patterns, the review techniques, the testing strategies, and the architectural discipline needed to use those tools safely.

That experience will matter.

And waiting until the industry has already normalised these workflows is not prudence. It is falling behind.
