Why Natural Language Is the Best Way to Write Tests

Feb 7, 2026

Brenn Hill

Test scripts written in English are more readable, more maintainable, and more accessible than any test framework. Here’s why natural language testing with Gasoline MCP is the future.

Tests Are Supposed to Be Documentation

The original promise of testing was simple: tests describe what the software should do. If someone new joins the team, they read the tests and understand the product.

That promise died somewhere between page.locator('.btn-primary').nth(2).click() and await expect(wrapper.find('[data-testid="modal-close"]')).toBeVisible().

Nobody reads test code to understand the product. They read it to figure out why CI is red.

Natural language testing brings the promise back. A test that says “Click ‘Add to Cart’ and verify the cart shows 1 item” is documentation that runs.

Everyone Can Read It, Everyone Can Write It

A Playwright test requires JavaScript knowledge, framework familiarity, and understanding of CSS selectors. The audience for that test is maybe 5 people on your team.

A natural language test requires knowing what the product should do. The audience is everyone — product managers, designers, QA, support, executives, and engineers.

Test: Password Reset Flow

1. Click "Forgot Password" on the login page
2. Enter "user@example.com" in the email field
3. Click "Send Reset Link"
4. Verify the page shows "Check your email"
5. Verify no errors in the console
6. Verify the API call to /api/auth/reset returned 200

A product manager wrote that. A designer can review it. QA can run it. An engineer can debug it when it fails. Everyone works from the same artifact.

Maintenance Costs Drop to Near Zero

Traditional test maintenance is a tax on velocity. Every UI change risks breaking tests. Teams either spend hours fixing selectors or stop running the tests entirely.

Natural language tests break only when the product behavior changes — and that’s exactly when you want them to break.

UI Change	Playwright breaks?	Natural language breaks?
Button class renamed	Yes	No
Form restructured	Yes	No
CSS framework swapped	Yes	No
Component library upgraded	Yes	No
Button text “Submit” to “Register”	No	Yes — intentionally
Checkout flow adds a step	No	Yes — intentionally

The test breaks when the product changes. It doesn’t break when the implementation changes. That’s the correct behavior for an acceptance test.

AI Fills the Gaps You’d Forget

When you write a Playwright test, you write exactly what you coded. Nothing more. If you forgot to check for console errors, the test doesn’t check for console errors.

When an AI executes a natural language test with Gasoline, it has access to the full browser state. You can write:

5. Verify no errors on the page

And the AI calls observe({what: "errors"}) to check the console, and looks at the page for visible error messages, and can check observe({what: "network_bodies", status_min: 400}) for failing API calls.

You described the intent. The AI was thorough about the implementation.

Tests Match How You Think About the Product

Product people think in workflows: “The user signs up, gets a welcome email, clicks the confirmation link, and lands on the dashboard.”

Engineers think in selectors: “Click #signup-btn, fill input[name='email'], submit the form, wait for [data-testid='welcome-modal'].”

Natural language tests match the product mental model, not the implementation mental model. This means:

Acceptance criteria become tests directly. The criteria in your Jira ticket are the test. No translation step.
Test reviews are product reviews. When a PM reviews a test, they’re reviewing product behavior, not code.
Gap analysis is intuitive. “We test the happy path but not the error case” is obvious when tests are in English.

Deeper Verification Than Code Can Express

With Gasoline, the AI can verify things that are awkward or impossible in traditional test frameworks:

8. Verify the WebSocket reconnects after the connection drop

The AI calls observe({what: "websocket_status"}) and checks the connection state. Try writing that in Selenium.

12. Verify the page loads in under 3 seconds

The AI checks observe({what: "vitals"}) for LCP. No performance testing library needed.

15. Verify the page is accessible

The AI runs observe({what: "accessibility"}) for a WCAG audit. No axe-core setup needed.

Natural language lets you describe what matters. The AI and Gasoline figure out how to measure it.

The BDD Promise, Finally Delivered

Behavior-Driven Development (BDD) tried to solve this problem with Gherkin syntax:

Given the user is on the login page
When they enter valid credentials
Then they should see the dashboard

But Gherkin still required step definitions — glue code that mapped English to implementation. Someone still had to write Given('the user is on the login page', () => page.goto('/login')). The maintenance burden just moved.

With Gasoline, there are no step definitions. The AI is the step definition. It reads “the user is on the login page” and navigates there. No glue code. No mapping layer. No maintenance.

BDD was right about the idea. It just needed AI to finish the job.

When to Use Natural Language Tests

Natural language tests are ideal for:

Acceptance testing — Verify the product meets requirements
Regression testing — Re-run after deploys to catch breakage
Exploratory testing — “Navigate the settings page and verify nothing looks broken”
Cross-product workflows — “Log in to the admin panel, create a user, then switch to the customer app and verify the user can log in”
Demo verification — “Run the demo script and verify every step completes”

They complement (not replace) unit tests and integration tests. Your engineers still write fast, focused unit tests for business logic. Natural language tests cover the full-stack, end-to-end workflows that live at the product level.

The best test is the one that gets written. And the test that gets written is the one that’s easy to write.