Manual Testing — On Understanding the Software You Build

By Jamie Aronson

The Playwright Spiral

I spent three hours debugging a Playwright test that kept failing. The button was there. The form worked. But the test couldn't find the element. I tried different selectors. I added waits. I asked Claude to fix it. Claude generated new tests. Those failed too. We went back and forth, tweaking timeouts and retry logic, until I honestly couldn't remember what the feature was even supposed to do anymore.

The test suite thought everything was fine. The software was broken.

Or maybe the software was fine and the tests were broken. I genuinely didn't know. I'd lost touch with what I was building.

Eventually I deleted the whole test suite and just opened the app.

The Distance Problem

Here's what I've come to believe: automated testing creates distance. And when you're a solo builder working with AI, distance is the last thing you need.

When you write tests — especially with AI — you start thinking in test logic. Does this button submit the form? Does this API return the right status code? Can Playwright find this element after the async load completes? You're debugging the test infrastructure instead of understanding the actual product.

And here's the thing that took me a while to see: AI is really good at building features that are technically correct but experientially wrong. It can scaffold a billing system, wire up the forms, hook up the database mutations, make all the pieces work — and the result can still feel off in ways that are hard to articulate but impossible to miss when you actually use it.

Playwright will never tell you that. Playwright checks if the pieces work. It doesn't check if they should.

What Manual Testing Actually Is

I'm building a veterinary assistant app. Recently I implemented a billing catalog — a place where vets can add services, medications, procedures, lab work, all the things they bill clients for. AI built it. The forms worked. The validation worked. The database mutations worked. Everything worked.

Then I sat down and actually used it.

I started adding items like I was a vet setting up their practice. "Routine Exam — $75." "Blood Panel — $120." "Extraction — $200." And as I was cataloging these items, I started noticing things.

The save button was always clickable, even when I hadn't changed anything. That felt wrong. Not broken — just wrong. A small thing, but it created a sense of uncertainty. "Did my last change save? Should I click this again?"
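The fix I eventually wanted here is a dirty check: only enable Save when the draft actually differs from the saved record. A minimal sketch, assuming a flat form state (the `CatalogItem` shape and field names are hypothetical, not the app's actual code):

```typescript
// Hypothetical flat form state for a catalog item.
interface CatalogItem {
  name: string;
  price: number;
}

// Enable Save only when the draft differs from the last saved record.
function isDirty(saved: CatalogItem, draft: CatalogItem): boolean {
  // A shallow field comparison is enough for a flat form like this one.
  return saved.name !== draft.name || saved.price !== draft.price;
}

const saved: CatalogItem = { name: "Routine Exam", price: 75 };
console.log(isDirty(saved, { ...saved }));            // false: nothing changed, Save stays disabled
console.log(isDirty(saved, { ...saved, price: 80 })); // true: price edited, Save lights up
```

In React terms this would feed something like `disabled={!isDirty(saved, draft)}`, but the framework is beside the point. What matters is that the behaviour answers the user's question, "did my last change save?", instead of leaving it open.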

Then I got to the tax settings. I could toggle "tax included in prices" on and off. I could set a tax rate. But as I was clicking through it, a question formed that wouldn't have occurred to me from just reading the code: How does this actually work?

If tax is included in prices, does that mean the $75 exam already has tax baked in, and we're just showing the breakdown on the invoice? Or does it mean we add tax on top when generating the bill? What does a vet expect this to mean?

And then another question: Should we even support multiple taxes? In the US, you might have state tax and local tax. Does a small vet practice need that? Is this overengineered? Or is it something we'll regret not building now?
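The two readings of that toggle can be made concrete with a few lines of arithmetic. A sketch, assuming a single hypothetical 8% rate (the function names are mine, not the app's):

```typescript
// Reading 1: the sticker price already contains tax; the invoice only shows the breakdown.
function breakdownInclusive(price: number, rate: number) {
  const net = price / (1 + rate);
  return { net, tax: price - net, total: price };
}

// Reading 2: tax is added on top when the bill is generated.
function addExclusive(price: number, rate: number) {
  const tax = price * rate;
  return { net: price, tax, total: price + tax };
}

// Same $75 exam, same 8% rate, different bills.
console.log(breakdownInclusive(75, 0.08).total); // 75: the client pays the sticker price
console.log(addExclusive(75, 0.08).total);       // 81: the client pays more than the sticker price
```

The code is trivial. The decision it encodes is not, and it's exactly the kind of decision that only surfaced by clicking through the settings.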

These questions didn't come from reading the code. They didn't come from writing tests. They came from using the software as if I were the user.

Understanding vs Automation

Manual testing, in this sense, isn't about catching bugs. It's about maintaining understanding.

When I manually test, I'm not thinking "does this button work?" I'm thinking "would a vet understand this?" I'm thinking "does this flow make sense?" I'm thinking "is this the right feature?"

These are product questions, not test questions. And they only surface when you slow down and actually use the thing you're building.

This is especially true when you're working with AI. AI can generate features faster than you can think about them. It's easy to end up with a codebase full of technically correct functionality that doesn't cohere into a product that makes sense. The speed becomes a liability if you're not deliberately staying connected to what you're building.

Automated tests make you think like an engineer: "Does this function return the right value?"

Manual testing makes you think like a user: "Why would I even use this function?"

Both matter. But when you're building alone with AI, the second one is the bottleneck. AI is not short on engineering capability. It's short on product judgment. And the only way to develop that judgment is to use the software.

The Flavour

There's something more subtle I've been trying to articulate. It's not just about catching issues or asking better questions. It's about developing a flavour for the software.

When you manually test regularly, you start to develop an intuition for how things should feel. You know when a loading state takes too long, not because you timed it, but because it feels slow. You know when a form is asking for too much information, not because you counted the fields, but because filling it out felt tedious. You know when a feature is overengineered, not because you measured the complexity, but because using it felt unnecessarily complicated.

This flavour is what gets lost when you rely entirely on automation. Playwright doesn't have taste. It doesn't know what "smooth" feels like, or what "intuitive" means, or when something is technically correct but experientially wrong.

And here's the thing: you can't shortcut your way to this. You can't write a test that checks for "feels right." You have to use the software. You have to click through the flows. You have to add the catalog items and toggle the settings and navigate the tabs and feel whether it works.

I lost that when I was deep in Playwright. I was so focused on making the tests pass that I stopped asking whether the thing being tested was even worth building.

Craftsmanship

This connects to something I keep coming back to: software as craft, not just engineering.

A woodworker doesn't just measure whether the joints fit. They run their hand along the surface. They check the grain. They feel whether the piece is right. The measurements matter, but they're not the whole story.

Manual testing is the equivalent for software. It's the part where you step back from the code and experience the thing as a whole. Not as a collection of functions and API calls, but as a tool someone will actually use.

When I was stuck in Playwright, I'd forgotten that. I was thinking in units of "does this test pass?" instead of "does this software work?" And those are not the same question.

Where Automation Fits

I'm not saying automated tests are useless. For certain things — especially backend logic, API contracts, critical paths — they're valuable. If you have a billing calculation that must be precise, write a test. If you have authentication logic that must be secure, write a test.

But I've stopped trying to automate the experience layer. I've stopped trying to get Playwright to click through flows that I should be clicking through myself. Because every hour I spent debugging test selectors was an hour I wasn't spending understanding my product.

And when you're building with AI, understanding your product is the most important thing you can do.

The Real Test

The real test isn't whether Playwright can find the button. The real test is whether you, as the builder, can use the software and feel confident that it does what it should.

Can you add a billing item and know, intuitively, that the flow makes sense?

Can you toggle a setting and understand, immediately, what it's changing?

Can you navigate through the app and feel that it's cohesive, that the pieces fit together, that it's something you'd be willing to show someone else?

If you can't, no amount of automated testing will save you. Because the problem isn't that the code is broken. The problem is that you don't understand what you're building anymore.

Sometimes You Must Be Ready

This circles back to something I wrote about in another essay: sometimes things align in unique ways. You must be ready.

When you're manually testing, you're not just checking for errors. You're staying ready. Ready to notice when something feels off. Ready to ask the question that leads to a better design. Ready to see the feature that's technically correct but strategically wrong.

AI moves fast. Automated tests move fast. Manual testing slows you down — deliberately — so you can think. So you can stay connected to the craft. So you can maintain the flavour.

And in a world where AI can build anything you ask for, the bottleneck isn't speed. It's understanding. It's knowing what to ask for in the first place.

Manual testing is how you stay in touch with that.

Unanswered Questions

  • Is there a version of automated testing that preserves understanding instead of replacing it?
  • How do you know when you've tested enough manually? When do you trust the software?
  • What role does manual testing play in larger teams? Does the problem change when you have dedicated QA?
  • Can AI ever develop the "flavour" for software, or is that inherently human?
  • Is the real lesson here about slowing down in general, not just about testing?