Notes on AI-assisted development

Most of what I've read about AI-assisted development is either uncritically enthusiastic or uncritically skeptical. I want to write something more measured, because the reality is interesting in ways that don't fit either camp.

I've been building with Claude Code as my primary development partner since January, about six months now, across several projects. The largest was a production CRM and marketing platform for a Tanzania-based safari operator, a ground-up rebuild in TypeScript on Next.js of a static site I had hand-coded the year before. It is live now, running the business day to day, and the biggest thing I have shipped working alone. I want to describe what working this way is actually like, in concrete terms, with what is good and what is hard kept separate.

The shape of the collaboration

The collaboration is asymmetric. I bring the context: the domain knowledge, the architectural intent, the judgment about what's worth building. Claude Code brings raw fluency in TypeScript, exhaustive familiarity with Next.js patterns, and the willingness to write a hundred lines of test scaffolding without complaint. Neither of us could do this work alone in this timeframe.

The pattern that works for me has four steps:

I describe the problem in full sentences. Not "add a booking form," but "add a three-step booking wizard with cross-step validation, session persistence, and conditional fields that change based on package selection."
Claude Code drafts an implementation. The first draft is usually wrong in some way, but in the right neighborhood. It looks like the kind of code I'd write if I were typing fast.
I read the draft carefully. This is the step most discussions of AI coding skip. The draft is not a finished thing. It is a proposal that needs to be evaluated.
I push back, refine, or accept. Sometimes I rewrite parts by hand because the model's pattern doesn't match the rest of the codebase. Sometimes the draft is exactly right and I commit it. Most often I refine, pointing out what's wrong, suggesting a different abstraction, asking for a smaller change.

The rejection-and-revision loop is normal. It is the work, not a sign that the tool is failing.

What makes a draft good or bad

I've gotten better at reading drafts critically. The ones that ship without modification share a few properties:

The function does one thing, and the boundaries are clear.
The error handling matches what's already in the codebase, not the model's defaults.
The naming aligns with conventions established earlier in the project.
There are no redundant helpers, no defensive validation for inputs that come from internal code.

The drafts that need work tend to have one of these problems:

The implementation is plausibly correct, but the off-by-one or the missed edge case is hidden in plain sight.
The code reaches for a pattern from a different codebase instead of matching what already exists.
The error handling is overgrown: try/catch around things that can't fail, fallbacks for scenarios that don't happen.
The comment says what the code obviously does, instead of why.

These aren't syntactic errors. They are intent errors, and catching them is the part of the work that doesn't get easier.

The productivity question

I'm going to be careful here, because the productivity question is where most accounts of AI-assisted development get sloppy.

In rough terms, my experience is this. For things with well-known patterns (auth flows, form validation, REST endpoints, test scaffolding), the cost of an additional feature drops substantially. Not by a small amount, but by enough that the calculus of what makes it into v1 changes. For things that require novel logic, where the pattern doesn't exist in some other codebase, the gain is much smaller, and sometimes there isn't one.

I cannot give you precise numbers. I haven't run the controlled experiment, and the experience is too noisy to reduce to a single number. What I can say is that the projects I'm willing to attempt have changed.

The clearest case is one I lived. In December 2024 I hand-built a static marketing site for a safari operator: plain HTML, professional-looking, and inert, because a real CRM behind it was more than I could justify building alone. A year later I rebuilt the whole thing with AI assistance, a booking pipeline, an admin console, AI-drafted itineraries, the back-office that actually runs the business. Same developer, same nights and weekends. What changed was not my skill. It was the cost of attempting something that large solo. That is the shift that matters to me, and line counts are a secondary signal at best.

Where the leverage doesn't reach

It does not help with product decisions. It does not help with pricing strategy, content strategy, or what to build next. It does not help when the codebase is unusual, when the patterns are bespoke, the model's defaults fight the conventions I've established, and the back-and-forth becomes friction instead of velocity.

It also does not help with the parts that require domain context the model doesn't have. When I needed to model how a Tanzanian safari business actually quotes prices across multiple parks, multiple seasons, multiple permit fees, and four currencies, Claude Code could implement what I described, but I had to do the modeling. The structure of the domain came from me.

The discipline this requires

The single most important skill I've developed working with Claude Code is reading code carefully. Not for syntax, but for intent. Most of the bugs I've shipped came from drafts I half-read. The cost of moving fast with AI tools is that the bug surface area per unit time goes up. If you skip review, you ship bugs faster.

The pattern that keeps me honest is treating each draft as if a colleague had submitted it to me for review. I read it, I push back where I disagree, and I don't commit anything I haven't understood. That sounds simple. It is not always easy when the draft looks right at a glance and the next problem is waiting.

The objections I take seriously

It would be easy to stop there, on discipline and leverage. But that's the practitioner's flattering version, and not all of the criticism aimed at these tools is reflexive. Three objections land for me, and I don't think the honest response to any of them is "you're holding it wrong."

Skill atrophy is real, and I feel the pull. The same loop that makes me fast (describe, read, refine) degrades easily into describe, skim, accept. When it does, I stop learning the thing I'm shipping. I've caught myself reaching for the model on a problem I could have reasoned through in ten minutes, and the cost isn't the ten minutes. It's the reps I didn't do. For someone with years of pattern-matching already in muscle memory, that's a slow tax on sharpness. For someone earlier on, it can be the difference between building the fundamentals and renting them. A junior who can prompt their way to working code without ever struggling through why it works is in a weaker position than they look, and the tool will cheerfully hide that gap from them and from whoever reviews them. "The AI can just do it" is the most dangerous sentence in this whole shift, and it lands hardest on the people with the least experience to catch what it gets wrong.

The code is plausible, which is exactly the problem. I said the bug surface per unit time goes up. That's the polite version. The drafts that are wrong are rarely wrong in ways that look wrong. They're wrong in the off-by-one, the unhandled null, the auth check that's a little too permissive, the dependency pulled in for a single function. Security is where this worries me most. A model trained on the average of public code will reproduce the average of public code's security, which is not good: the string-concatenated SQL, the over-broad CORS, the secret logged "just for now." None of it announces itself. The only thing between that and production is a human reading every line as if a stranger wrote it, because one did. If you don't yet have the experience to read that way, the tool hasn't removed the danger. It's removed your ability to see it.

The labor question is the one I'm least willing to hand-wave. The optimistic story (the floor rises, the ceiling rises, judgment gets more valuable) has a hole in it. Seniors are made out of juniors. If the entry-level work that used to train people is the first work to get automated, the pipeline that produces the experienced engineers everyone still agrees you need starts to break, and it breaks quietly, years before anyone feels it. "New kinds of jobs will appear" can be true in aggregate and still be cold comfort to a specific person whose role got compressed this quarter. I benefit from these tools. That benefit is not costless, and it is not evenly shared. I don't have the policy answer, and I'm wary of anyone who says they do, in either direction.

None of this changes how I work tomorrow. It changes how I talk about it. The tool is genuinely useful and genuinely double-edged, and both halves are worth saying out loud.

What I think is coming

I'll resist the grand prediction. The near-term thing I'm fairly sure of is smaller. The gap widens between people who use these tools with judgment and people who let the tools do the judging. The first group looks like clear thinkers who can specify a problem precisely, read a draft critically, and hold a codebase coherent while it grows faster than their typing. The second ships more bugs, faster, and feels productive the entire time.

That's not "AI replaces engineers," and it's not "AI changes nothing." It is a tool that raises what's possible and, in the same motion, raises the cost of using it carelessly: to your codebase, to your own skill, and, at the scale of an industry, to the people whose way into the field used to run through exactly the work it now does first. I find it genuinely useful, and I'm trying to hold that and its costs together rather than pick one.

I have more to say about specific patterns: how I structure context for long conversations, how I keep architectural consistency across a few hundred files, what I do when the model's pattern fights mine. Those are the next posts.