
Why Natural Language Is the Best Way to Write Tests

Test scripts written in English are more readable, more maintainable, and more accessible than any test framework. Here’s why natural language testing with Gasoline MCP is the future.

The original promise of testing was simple: tests describe what the software should do. If someone new joins the team, they read the tests and understand the product.

That promise died somewhere between page.locator('.btn-primary').nth(2).click() and await expect(wrapper.find('[data-testid="modal-close"]')).toBeVisible().

Nobody reads test code to understand the product. They read it to figure out why CI is red.

Natural language testing brings the promise back. A test that says “Click ‘Add to Cart’ and verify the cart shows 1 item” is documentation that runs.

Everyone Can Read It, Everyone Can Write It

A Playwright test requires JavaScript knowledge, framework familiarity, and understanding of CSS selectors. The audience for that test is maybe 5 people on your team.

A natural language test requires knowing what the product should do. The audience is everyone — product managers, designers, QA, support, executives, and engineers.

Test: Password Reset Flow
1. Click "Forgot Password" on the login page
2. Enter "user@example.com" in the email field
3. Click "Send Reset Link"
4. Verify the page shows "Check your email"
5. Verify no errors in the console
6. Verify the API call to /api/auth/reset returned 200

A product manager wrote that. A designer can review it. QA can run it. An engineer can debug it when it fails. Everyone works from the same artifact.

Traditional test maintenance is a tax on velocity. Every UI change risks breaking tests. Teams either spend hours fixing selectors or stop running the tests entirely.

Natural language tests break only when the product behavior changes — and that’s exactly when you want them to break.

| UI change | Playwright breaks? | Natural language breaks? |
| --- | --- | --- |
| Button class renamed | Yes | No |
| Form restructured | Yes | No |
| CSS framework swapped | Yes | No |
| Component library upgraded | Yes | No |
| Button text “Submit” to “Register” | No | Yes — intentionally |
| Checkout flow adds a step | No | Yes — intentionally |

The test breaks when the product changes. It doesn’t break when the implementation changes. That’s the correct behavior for an acceptance test.

When you write a Playwright test, you write exactly what you coded. Nothing more. If you forgot to check for console errors, the test doesn’t check for console errors.

When an AI executes a natural language test with Gasoline, it has access to the full browser state. You can write:

5. Verify no errors on the page

And the AI calls observe({what: "errors"}) to check the console, and looks at the page for visible error messages, and can check observe({what: "network_bodies", status_min: 400}) for failing API calls.

You described the intent. The AI was thorough about the implementation.
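That fan-out from one English step to several concrete checks can be sketched in code. This is an illustration only, not Gasoline's actual implementation: the observe() here is a hypothetical stub with hard-coded browser state standing in for the real MCP tool.

```javascript
// Hypothetical stub standing in for Gasoline's observe() MCP tool.
// The mocked state includes one failing API call so the check has something to find.
function observe({ what, status_min }) {
  const mockBrowserState = {
    errors: [],                                           // console is clean
    network_bodies: [{ url: "/api/cart", status: 500 }],  // one failing request
    dom: "<main>Checkout</main>",                         // no visible error text
  };
  if (what === "network_bodies") {
    return mockBrowserState.network_bodies.filter(r => r.status >= (status_min ?? 0));
  }
  return mockBrowserState[what];
}

// One English step ("Verify no errors on the page") fanned out into three checks,
// the way an AI executor might interpret it.
function verifyNoErrors() {
  const problems = [];
  if (observe({ what: "errors" }).length > 0) problems.push("console errors");
  if (/error/i.test(observe({ what: "dom" }))) problems.push("visible error text");
  if (observe({ what: "network_bodies", status_min: 400 }).length > 0)
    problems.push("failing API calls");
  return problems.length === 0 ? "pass" : `fail: ${problems.join(", ")}`;
}

console.log(verifyNoErrors()); // -> "fail: failing API calls"
```

The point of the sketch: the test author wrote one line of intent, and the executor decided how many measurements that intent implies.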

Tests Match How You Think About the Product

Product people think in workflows: “The user signs up, gets a welcome email, clicks the confirmation link, and lands on the dashboard.”

Engineers think in selectors: “Click #signup-btn, fill input[name='email'], submit the form, wait for [data-testid='welcome-modal'].”

Natural language tests match the product mental model, not the implementation mental model. This means:

  • Acceptance criteria become tests directly. The criteria in your Jira ticket are the test. No translation step.
  • Test reviews are product reviews. When a PM reviews a test, they’re reviewing product behavior, not code.
  • Gap analysis is intuitive. “We test the happy path but not the error case” is obvious when tests are in English.

With Gasoline, the AI can verify things that are awkward or impossible in traditional test frameworks:

8. Verify the WebSocket reconnects after the connection drop

The AI calls observe({what: "websocket_status"}) and checks the connection state. Try writing that in Selenium.

12. Verify the page loads in under 3 seconds

The AI checks observe({what: "vitals"}) for LCP. No performance testing library needed.

15. Verify the page is accessible

The AI runs observe({what: "accessibility"}) for a WCAG audit. No axe-core setup needed.

Natural language lets you describe what matters. The AI and Gasoline figure out how to measure it.
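A step like “Verify the page loads in under 3 seconds” reduces to a single comparison once the AI has the vitals. A minimal sketch, with observe() mocked to return an assumed payload shape — the real field names in Gasoline's vitals response may differ:

```javascript
// Mocked observe() returning an assumed vitals shape (lcp_ms etc. are
// illustrative field names, not Gasoline's documented payload).
function observe({ what }) {
  if (what === "vitals") return { lcp_ms: 2140, cls: 0.02, inp_ms: 90 };
  throw new Error(`unmocked: ${what}`);
}

// "Verify the page loads in under N seconds" -> compare LCP to the budget.
function verifyLoadsUnder(seconds) {
  const { lcp_ms } = observe({ what: "vitals" });
  return lcp_ms < seconds * 1000
    ? `pass: LCP ${lcp_ms}ms is under ${seconds}s`
    : `fail: LCP ${lcp_ms}ms exceeds ${seconds}s`;
}

console.log(verifyLoadsUnder(3)); // -> "pass: LCP 2140ms is under 3s"
```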

Behavior-Driven Development (BDD) tried to solve this problem with Gherkin syntax:

Given the user is on the login page
When they enter valid credentials
Then they should see the dashboard

But Gherkin still required step definitions — glue code that mapped English to implementation. Someone still had to write Given('the user is on the login page', () => page.goto('/login')). The maintenance burden just moved.
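The glue layer Gherkin required can be pictured as a registry mapping English patterns to automation code. A toy illustration (not real Cucumber): every pattern below is code someone had to write, and every new phrasing that didn't match a pattern produced the classic "undefined step" failure.

```javascript
// A toy step-definition registry in the style of BDD glue code.
const steps = [];
function Given(pattern, fn) { steps.push({ pattern, fn }); }

// Every English phrase needs a hand-written mapping like these:
Given(/^the user is on the login page$/, (world) => { world.page = "/login"; });
Given(/^they enter valid credentials$/, (world) => { world.loggedIn = true; });

// The runner matches a sentence against the registry and executes the glue.
function run(sentence, world) {
  const step = steps.find(s => s.pattern.test(sentence));
  if (!step) throw new Error(`Undefined step: "${sentence}"`); // the classic BDD failure
  step.fn(world);
}

const world = {};
run("the user is on the login page", world);
run("they enter valid credentials", world);
console.log(world); // -> { page: '/login', loggedIn: true }
```

Rephrase a step even slightly and the regex stops matching — which is exactly the maintenance burden the paragraph above describes.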

With Gasoline, there are no step definitions. The AI is the step definition. It reads “the user is on the login page” and navigates there. No glue code. No mapping layer. No maintenance.

BDD was right about the idea. It just needed AI to finish the job.

Natural language tests are ideal for:

  • Acceptance testing — Verify the product meets requirements
  • Regression testing — Re-run after deploys to catch breakage
  • Exploratory testing — “Navigate the settings page and verify nothing looks broken”
  • Cross-product workflows — “Log in to the admin panel, create a user, then switch to the customer app and verify the user can log in”
  • Demo verification — “Run the demo script and verify every step completes”

They complement (not replace) unit tests and integration tests. Your engineers still write fast, focused unit tests for business logic. Natural language tests cover the full-stack, end-to-end workflows that live at the product level.

The best test is the one that gets written. And the test that gets written is the one that’s easy to write.

Why Product Managers Love Gasoline

Record demos, explore bugs, file detailed issue reports, and even fix simple issues — all without waiting for an engineer. Gasoline gives PMs superpowers.

You’re the product manager. You know the product better than anyone. You found the bug, you can reproduce it, you know exactly when it started. But you can’t fix it. You can’t even file a useful bug report without an engineer helping you extract the console errors, network responses, and steps to reproduce.

So you write “the checkout button doesn’t work sometimes” in Jira, attach a screenshot, and wait three days for an engineer to ask you to reproduce it on a call.

Gasoline changes this equation. With your AI tool connected to the browser, you can:

  1. Record polished product demos without engineering help
  2. Explore and diagnose bugs with full technical context
  3. File rich issue reports with errors, network data, and reproduction scripts
  4. Fix simple issues directly — yes, actually fix them

Superpower 1: Record Product Demos

Writing a demo used to mean begging engineering for a staging environment and a walkthrough. Now you write a script in English:

Demo: New Onboarding Flow
1. Navigate to the signup page
2. Subtitle: "New users see a simplified signup with just email and password."
3. Fill in the signup form with test data
4. Click Create Account
5. Subtitle: "After signup, users land directly in the guided setup."
6. Wait for the setup wizard to load
7. Subtitle: "The wizard adapts based on the user's role selection."
8. Select "Product Manager" from the role dropdown
9. Click Continue
10. Subtitle: "PMs get a pre-built dashboard with team metrics and sprint progress."

Hand this to your AI. It drives the browser, shows the subtitles, and you narrate over the top. Save it. Replay it for the next meeting. Modify it for a different audience.

No engineering support. No screen recording software. No slides.

Superpower 2: Explore and Diagnose Bugs

You found a bug. Instead of writing a vague description, tell your AI:

“The checkout page is showing the wrong total. Can you check what’s happening?”

Your AI, connected to the browser via Gasoline, does what an engineer would do:

Checks the console:

observe({what: "errors"})
// -> TypeError: Cannot read property 'amount' of null at cart.js:142

Checks the API response:

observe({what: "network_bodies", url: "/api/cart"})
// -> Response: {items: [...], discount: null, total: 89.99}

Checks the visual state:

observe({what: "screenshot"})
// -> The page shows "$89.99" but the discount line shows "$0.00 off"

Diagnosis: “The API returns discount: null instead of discount: {amount: 10, code: 'SAVE10'}. The frontend crashes trying to read discount.amount from null. The total is correct, but the discount display is broken.”
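The crash in that diagnosis is a classic null-shape bug. A hypothetical sketch — not the app's actual cart.js — reproducing it and the defensive fix:

```javascript
// The API response observed above: discount is null instead of an object.
const cart = { items: [], discount: null, total: 89.99 };

// Naive rendering crashes the way the console error shows:
// const label = `$${cart.discount.amount.toFixed(2)} off`; // TypeError: reading 'amount' of null

// Defensive rendering with optional chaining survives the null,
// which is why the page shows "$0.00 off" instead of the real discount.
const amount = cart.discount?.amount ?? 0;
console.log(`$${amount.toFixed(2)} off`); // -> "$0.00 off"
```

The real fix belongs on the backend (persist the discount so the API stops returning null), but the sketch shows why the frontend degrades the way the screenshot captured.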

You just did 15 minutes of engineering triage in 30 seconds. Without opening DevTools. Without knowing JavaScript.

Superpower 3: File Rich Issue Reports

Instead of “checkout is broken,” you file:

Bug: Discount not displaying on checkout page
Steps to Reproduce:
1. Add item to cart
2. Apply discount code "SAVE10"
3. Navigate to checkout
Expected: Discount shows "$10.00 off"
Actual: Discount shows "$0.00 off", console error on page
Technical Details:
- Console error: TypeError at cart.js:142 — discount is null
- API response from /api/cart: discount field is null
(expected: {amount: 10, code: "SAVE10"})
- The /api/cart/apply-discount endpoint returned 200
but didn't persist the discount to the cart object
Reproduction script attached.

Your AI can also generate a Playwright reproduction script:

generate({format: "reproduction"})

The engineer gets a one-click reproduction, the exact error, the API response, and a root cause hypothesis. They start fixing, not investigating.

Superpower 4: Fix Simple Issues Directly

This is the big one. For certain classes of issues, you don’t need an engineer at all.

Copy changes: “The button says ‘Submit’ but it should say ‘Save Changes.’” Tell your AI, it finds the text in the codebase, changes it, runs the tests, and opens a PR.

Configuration issues: “The timeout on the upload page is too short — users with large files are getting errors.” Your AI observes the error, finds the timeout configuration, adjusts it, and verifies the fix.

Styling issues: “The modal is cut off on mobile.” Your AI takes a screenshot, identifies the CSS issue, fixes it, and shows you the before/after.

You’re not writing code. You’re describing the problem in English, and the AI — with full visibility into the browser via Gasoline — has enough context to fix it.

Obviously, this doesn’t replace engineers for complex features, architectural decisions, or security-critical changes. But for the dozens of small issues that sit in the backlog because they’re “not worth an engineer’s time” — now they’re worth your time, because your time is all it takes.

Superpower 5: Create Living Acceptance Tests

You write acceptance criteria anyway. Now they run:

Acceptance Criteria: User Registration
1. Navigate to the registration page
2. Fill in name, email, and password
3. Check the Terms of Service checkbox
4. Click Create Account
5. Verify the welcome page loads
6. Verify no errors in the console
7. Verify the registration API returned 200

Tell your AI to run it against staging after each deploy. You get a report: “All 7 steps passed” or “Step 5 failed — the welcome page returned a 500 error from the API.”

You’re not depending on QA bandwidth. You’re not waiting for an engineer to write the test. The acceptance criteria you already wrote are the test.
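The pass/fail report described above is easy to picture as a small runner. A sketch where each step's execution is mocked as a boolean result — in reality the AI would execute each step against the browser via Gasoline:

```javascript
// Mocked step results for the registration acceptance test above.
const steps = [
  { text: "Navigate to the registration page", passed: true },
  { text: "Fill in name, email, and password", passed: true },
  { text: "Check the Terms of Service checkbox", passed: true },
  { text: "Click Create Account", passed: true },
  { text: "Verify the welcome page loads", passed: false, detail: "500 from the API" },
  { text: "Verify no errors in the console", passed: true },
  { text: "Verify the registration API returned 200", passed: true },
];

// Summarize results the way the post describes: all-pass, or first failure.
function report(steps) {
  const failed = steps.findIndex(s => !s.passed);
  if (failed === -1) return `All ${steps.length} steps passed`;
  return `Step ${failed + 1} failed (${steps[failed].detail})`;
}

console.log(report(steps)); // -> "Step 5 failed (500 from the API)"
```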

| Task | Before Gasoline | With Gasoline |
| --- | --- | --- |
| File a bug report | Screenshot + vague description | Full technical report with errors, API data, and reproduction script |
| Record a demo | Coordinate with engineering, screen record | Write a text script, AI runs it, replay anytime |
| Validate a fix | Wait for deploy, manually test | AI runs the acceptance test and reports results |
| Explore an issue | Open DevTools (if you know how) | Tell AI “what’s happening on this page?” |
| Fix a copy typo | File a ticket, wait for prioritization | AI fixes it and opens a PR in 2 minutes |
| Run acceptance tests | Depend on QA schedule | Run your natural language tests on demand |
Getting started takes five steps:

  1. Install Gasoline — Follow the Getting Started guide. It takes 2 minutes.
  2. Connect your AI tool — Claude Code, Cursor, or any MCP-compatible tool.
  3. Start with observation — Browse your product normally and ask the AI: “What errors are happening on this page?” You’ll be surprised what you find.
  4. Try a demo script — Write 5 steps in English. Ask the AI to run them. See it work.
  5. File your first rich bug report — Next time you find a bug, ask the AI to diagnose it. Paste the diagnosis into your ticket.

You don’t need to learn JavaScript. You don’t need to understand CSS selectors. You don’t need DevTools.

You need to describe what you see and what you expect. The AI and Gasoline handle the rest.

Gasoline v5.8.0 Released

Gasoline v5.8.0 solves a long-standing WebSocket capture blind spot: pages that create WebSocket connections before the inject script loads now have those connections captured automatically. This release also adds visual feedback for AI actions and ships a comprehensive 106-test UAT suite.

  • Early-patch WebSocket capture — A new content script running in world: "MAIN" patches window.WebSocket before any page JavaScript runs. This means sites like Binance that create WebSocket connections immediately on page load now have those connections captured and visible via observe(websocket_status). Buffered connections are seamlessly adopted when the full inject script initializes.

  • Visual action toasts — When AI tools use interact() to navigate, execute JavaScript, or highlight elements, a brief toast overlay appears on the page showing what the AI is doing. This makes AI actions visible to developers watching the browser.

  • Fixed camelCase to snake_case field mapping for network waterfall entries (duration, transfer_size, etc.)
  • Command results now route through the /sync endpoint with proper client ID filtering
  • After navigation, tracking state is broadcast so favicon updates correctly
  • Empty arrays return [] instead of null in JSON responses
  • Bridge timeouts now return a proper extension_timeout error code
  • 106-test parallel UAT suite replacing the previous 8-test script, covering observe, generate, configure, interact, and data pipeline categories
  • 16-test human smoke test with error clusters, DOM query, full form lifecycle, highlight, and real WebSocket traffic tests
  • All tests default to fail (not pass) with strict field validation throughout

Binary sizes decreased ~4% from v5.7.5. All SLOs continue to pass:

  • MCP fast-start: ~130ms
  • Tool response: < 1ms
  • Max binary: 7.7 MB (target: < 15 MB)
npx gasoline-mcp@5.8.0

v5.7.5…v5.8.0

Gasoline v5.6.0 Released

Gasoline v5.6.0 focuses on server-side reliability with persistence guarantees and comprehensive architecture invariant tests.

  • Persistent Message Queue — Guarantees no messages lost during server restarts
  • Transaction-Safe State — Atomic operations for observer state updates
  • Architecture Validation — New test suite validating core invariants
  • Improved graceful shutdown of long-running observations
  • Better handling of concurrent client connections
  • Enhanced observability into server health and performance
  • Stricter validation of MCP protocol compliance
  • 50+ new architecture invariant tests
  • Stress testing with high message volume
  • Connection resilience testing
npm install -g gasoline-mcp@5.6.0

v5.6.0 Release

Gasoline v5.7.5 Released

This release makes Gasoline MCP feel instant. The new fast-start mode responds to MCP clients in ~130ms while the daemon boots in the background.

Previously, MCP clients had to wait for the full daemon to boot before getting any response. Now, initialize and tools/list respond immediately from the bridge process:

Before: Client → wait 2-4s for daemon → get response
After: Client → get response in ~130ms → daemon boots in background

This means your AI coding agent gets tool definitions instantly and can start planning while the server finishes starting up. If you call a tool before the daemon is ready, you get a helpful retry message instead of a hang.

The --doctor command now checks if port 7890 is available:

npx gasoline-mcp --doctor
# Now shows:
# ✅ Port 7890
# Default port is available
#
# Or if blocked:
# ⚠️ Port 7890
# Port 7890 is in use (PID: 12345)
# Suggestion: Use --port 7891 or kill the process using the port

When the daemon can’t start because the port is blocked, you now get actionable suggestions:

Server failed to start: port 7890 already in use. Port may be in use. Try: npx gasoline-mcp --port 7891

Daemon startup timeout reduced from 10s to 4s. If something is wrong, you’ll know in 4 seconds instead of 10.

npx gasoline-mcp@5.7.5

See the complete list of changes on GitHub.