
Why Natural Language Is the Best Way to Write Tests

Test scripts written in English are more readable, more maintainable, and more accessible than any test framework. Here’s why natural language testing with Gasoline MCP is the future.

The original promise of testing was simple: tests describe what the software should do. If someone new joins the team, they read the tests and understand the product.

That promise died somewhere between page.locator('.btn-primary').nth(2).click() and await expect(wrapper.find('[data-testid="modal-close"]')).toBeVisible().

Nobody reads test code to understand the product. They read it to figure out why CI is red.

Natural language testing brings the promise back. A test that says “Click ‘Add to Cart’ and verify the cart shows 1 item” is documentation that runs.

Everyone Can Read It, Everyone Can Write It

A Playwright test requires JavaScript knowledge, framework familiarity, and understanding of CSS selectors. The audience for that test is maybe 5 people on your team.

A natural language test requires knowing what the product should do. The audience is everyone — product managers, designers, QA, support, executives, and engineers.

Test: Password Reset Flow
1. Click "Forgot Password" on the login page
2. Enter "user@example.com" in the email field
3. Click "Send Reset Link"
4. Verify the page shows "Check your email"
5. Verify no errors in the console
6. Verify the API call to /api/auth/reset returned 200

A product manager wrote that. A designer can review it. QA can run it. An engineer can debug it when it fails. Everyone works from the same artifact.

Traditional test maintenance is a tax on velocity. Every UI change risks breaking tests. Teams either spend hours fixing selectors or stop running the tests entirely.

Natural language tests break only when the product behavior changes — and that’s exactly when you want them to break.

| UI change | Playwright breaks? | Natural language breaks? |
| --- | --- | --- |
| Button class renamed | Yes | No |
| Form restructured | Yes | No |
| CSS framework swapped | Yes | No |
| Component library upgraded | Yes | No |
| Button text “Submit” to “Register” | No | Yes — intentionally |
| Checkout flow adds a step | No | Yes — intentionally |

The test breaks when the product changes. It doesn’t break when the implementation changes. That’s the correct behavior for an acceptance test.

When you write a Playwright test, you write exactly what you coded. Nothing more. If you forgot to check for console errors, the test doesn’t check for console errors.

When an AI executes a natural language test with Gasoline, it has access to the full browser state. You can write:

5. Verify no errors on the page

And the AI calls observe({what: "errors"}) to check the console, and looks at the page for visible error messages, and can check observe({what: "network_bodies", status_min: 400}) for failing API calls.

You described the intent. The AI was thorough about the implementation.
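That fan-out from one English step to several concrete checks can be sketched in code. This is an illustration only, not Gasoline's actual implementation: the observe() here is a hypothetical stub with hard-coded browser state standing in for the real MCP tool.

```javascript
// Hypothetical stub standing in for Gasoline's observe() MCP tool.
// The mocked state includes one failing API call so the check has something to find.
function observe({ what, status_min }) {
  const mockBrowserState = {
    errors: [],                                           // console is clean
    network_bodies: [{ url: "/api/cart", status: 500 }],  // one failing request
    dom: "<main>Checkout</main>",                         // no visible error text
  };
  if (what === "network_bodies") {
    return mockBrowserState.network_bodies.filter(r => r.status >= (status_min ?? 0));
  }
  return mockBrowserState[what];
}

// One English step ("Verify no errors on the page") fanned out into three checks,
// the way an AI executor might interpret it.
function verifyNoErrors() {
  const problems = [];
  if (observe({ what: "errors" }).length > 0) problems.push("console errors");
  if (/error/i.test(observe({ what: "dom" }))) problems.push("visible error text");
  if (observe({ what: "network_bodies", status_min: 400 }).length > 0)
    problems.push("failing API calls");
  return problems.length === 0 ? "pass" : `fail: ${problems.join(", ")}`;
}

console.log(verifyNoErrors()); // -> "fail: failing API calls"
```

The point of the sketch: the test author wrote one line of intent, and the executor decided how many measurements that intent implies.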

Tests Match How You Think About the Product

Product people think in workflows: “The user signs up, gets a welcome email, clicks the confirmation link, and lands on the dashboard.”

Engineers think in selectors: “Click #signup-btn, fill input[name='email'], submit the form, wait for [data-testid='welcome-modal'].”

Natural language tests match the product mental model, not the implementation mental model. This means:

  • Acceptance criteria become tests directly. The criteria in your Jira ticket are the test. No translation step.
  • Test reviews are product reviews. When a PM reviews a test, they’re reviewing product behavior, not code.
  • Gap analysis is intuitive. “We test the happy path but not the error case” is obvious when tests are in English.

With Gasoline, the AI can verify things that are awkward or impossible in traditional test frameworks:

8. Verify the WebSocket reconnects after the connection drop

The AI calls observe({what: "websocket_status"}) and checks the connection state. Try writing that in Selenium.

12. Verify the page loads in under 3 seconds

The AI checks observe({what: "vitals"}) for LCP. No performance testing library needed.

15. Verify the page is accessible

The AI runs observe({what: "accessibility"}) for a WCAG audit. No axe-core setup needed.

Natural language lets you describe what matters. The AI and Gasoline figure out how to measure it.
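A step like “Verify the page loads in under 3 seconds” reduces to a single comparison once the AI has the vitals. A minimal sketch, with observe() mocked to return an assumed payload shape — the real field names in Gasoline's vitals response may differ:

```javascript
// Mocked observe() returning an assumed vitals shape (lcp_ms etc. are
// illustrative field names, not Gasoline's documented payload).
function observe({ what }) {
  if (what === "vitals") return { lcp_ms: 2140, cls: 0.02, inp_ms: 90 };
  throw new Error(`unmocked: ${what}`);
}

// "Verify the page loads in under N seconds" -> compare LCP to the budget.
function verifyLoadsUnder(seconds) {
  const { lcp_ms } = observe({ what: "vitals" });
  return lcp_ms < seconds * 1000
    ? `pass: LCP ${lcp_ms}ms is under ${seconds}s`
    : `fail: LCP ${lcp_ms}ms exceeds ${seconds}s`;
}

console.log(verifyLoadsUnder(3)); // -> "pass: LCP 2140ms is under 3s"
```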

Behavior-Driven Development (BDD) tried to solve this problem with Gherkin syntax:

Given the user is on the login page
When they enter valid credentials
Then they should see the dashboard

But Gherkin still required step definitions — glue code that mapped English to implementation. Someone still had to write Given('the user is on the login page', () => page.goto('/login')). The maintenance burden just moved.
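The glue layer Gherkin required can be pictured as a registry mapping English patterns to automation code. A toy illustration (not real Cucumber): every pattern below is code someone had to write, and every new phrasing that didn't match a pattern produced the classic "undefined step" failure.

```javascript
// A toy step-definition registry in the style of BDD glue code.
const steps = [];
function Given(pattern, fn) { steps.push({ pattern, fn }); }

// Every English phrase needs a hand-written mapping like these:
Given(/^the user is on the login page$/, (world) => { world.page = "/login"; });
Given(/^they enter valid credentials$/, (world) => { world.loggedIn = true; });

// The runner matches a sentence against the registry and executes the glue.
function run(sentence, world) {
  const step = steps.find(s => s.pattern.test(sentence));
  if (!step) throw new Error(`Undefined step: "${sentence}"`); // the classic BDD failure
  step.fn(world);
}

const world = {};
run("the user is on the login page", world);
run("they enter valid credentials", world);
console.log(world); // -> { page: '/login', loggedIn: true }
```

Rephrase a step even slightly and the regex stops matching — which is exactly the maintenance burden the paragraph above describes.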

With Gasoline, there are no step definitions. The AI is the step definition. It reads “the user is on the login page” and navigates there. No glue code. No mapping layer. No maintenance.

BDD was right about the idea. It just needed AI to finish the job.

Natural language tests are ideal for:

  • Acceptance testing — Verify the product meets requirements
  • Regression testing — Re-run after deploys to catch breakage
  • Exploratory testing — “Navigate the settings page and verify nothing looks broken”
  • Cross-product workflows — “Log in to the admin panel, create a user, then switch to the customer app and verify the user can log in”
  • Demo verification — “Run the demo script and verify every step completes”

They complement (not replace) unit tests and integration tests. Your engineers still write fast, focused unit tests for business logic. Natural language tests cover the full-stack, end-to-end workflows that live at the product level.

The best test is the one that gets written. And the test that gets written is the one that’s easy to write.

Why Product Managers Love Gasoline

Record demos, explore bugs, file detailed issue reports, and even fix simple issues — all without waiting for an engineer. Gasoline gives PMs superpowers.

You’re the product manager. You know the product better than anyone. You found the bug, you can reproduce it, you know exactly when it started. But you can’t fix it. You can’t even file a useful bug report without an engineer helping you extract the console errors, network responses, and steps to reproduce.

So you write “the checkout button doesn’t work sometimes” in Jira, attach a screenshot, and wait three days for an engineer to ask you to reproduce it on a call.

Gasoline changes this equation. With your AI tool connected to the browser, you can:

  1. Record polished product demos without engineering help
  2. Explore and diagnose bugs with full technical context
  3. File rich issue reports with errors, network data, and reproduction scripts
  4. Fix simple issues directly — yes, actually fix them

Superpower 1: Record Product Demos

Writing a demo used to mean begging engineering for a staging environment and a walkthrough. Now you write a script in English:

Demo: New Onboarding Flow
1. Navigate to the signup page
2. Subtitle: "New users see a simplified signup with just email and password."
3. Fill in the signup form with test data
4. Click Create Account
5. Subtitle: "After signup, users land directly in the guided setup."
6. Wait for the setup wizard to load
7. Subtitle: "The wizard adapts based on the user's role selection."
8. Select "Product Manager" from the role dropdown
9. Click Continue
10. Subtitle: "PMs get a pre-built dashboard with team metrics and sprint progress."

Hand this to your AI. It drives the browser, shows the subtitles, and you narrate over the top. Save it. Replay it for the next meeting. Modify it for a different audience.

No engineering support. No screen recording software. No slides.

Superpower 2: Explore and Diagnose Bugs

You found a bug. Instead of writing a vague description, tell your AI:

“The checkout page is showing the wrong total. Can you check what’s happening?”

Your AI, connected to the browser via Gasoline, does what an engineer would do:

Checks the console:

observe({what: "errors"})
// -> TypeError: Cannot read property 'amount' of null at cart.js:142

Checks the API response:

observe({what: "network_bodies", url: "/api/cart"})
// -> Response: {items: [...], discount: null, total: 89.99}

Checks the visual state:

observe({what: "screenshot"})
// -> The page shows "$89.99" but the discount line shows "$0.00 off"

Diagnosis: “The API returns discount: null instead of discount: {amount: 10, code: 'SAVE10'}. The frontend crashes trying to read discount.amount from null. The total is correct, but the discount display is broken.”
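The crash in that diagnosis is a classic null-shape bug. A hypothetical sketch — not the app's actual cart.js — reproducing it and the defensive fix:

```javascript
// The API response observed above: discount is null instead of an object.
const cart = { items: [], discount: null, total: 89.99 };

// Naive rendering crashes the way the console error shows:
// const label = `$${cart.discount.amount.toFixed(2)} off`; // TypeError: reading 'amount' of null

// Defensive rendering with optional chaining survives the null,
// which is why the page shows "$0.00 off" instead of the real discount.
const amount = cart.discount?.amount ?? 0;
console.log(`$${amount.toFixed(2)} off`); // -> "$0.00 off"
```

The real fix belongs on the backend (persist the discount so the API stops returning null), but the sketch shows why the frontend degrades the way the screenshot captured.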

You just did 15 minutes of engineering triage in 30 seconds. Without opening DevTools. Without knowing JavaScript.

Superpower 3: File Rich Issue Reports

Instead of “checkout is broken,” you file:

Bug: Discount not displaying on checkout page
Steps to Reproduce:
1. Add item to cart
2. Apply discount code "SAVE10"
3. Navigate to checkout
Expected: Discount shows "$10.00 off"
Actual: Discount shows "$0.00 off", console error on page
Technical Details:
- Console error: TypeError at cart.js:142 — discount is null
- API response from /api/cart: discount field is null
(expected: {amount: 10, code: "SAVE10"})
- The /api/cart/apply-discount endpoint returned 200
but didn't persist the discount to the cart object
Reproduction script attached.

Your AI can also generate a Playwright reproduction script:

generate({format: "reproduction"})

The engineer gets a one-click reproduction, the exact error, the API response, and a root cause hypothesis. They start fixing, not investigating.

Superpower 4: Fix Simple Issues Directly

This is the big one. For certain classes of issues, you don’t need an engineer at all.

Copy changes: “The button says ‘Submit’ but it should say ‘Save Changes.’” Tell your AI, it finds the text in the codebase, changes it, runs the tests, and opens a PR.

Configuration issues: “The timeout on the upload page is too short — users with large files are getting errors.” Your AI observes the error, finds the timeout configuration, adjusts it, and verifies the fix.

Styling issues: “The modal is cut off on mobile.” Your AI takes a screenshot, identifies the CSS issue, fixes it, and shows you the before/after.

You’re not writing code. You’re describing the problem in English, and the AI — with full visibility into the browser via Gasoline — has enough context to fix it.

Obviously, this doesn’t replace engineers for complex features, architectural decisions, or security-critical changes. But for the dozens of small issues that sit in the backlog because they’re “not worth an engineer’s time” — now they’re worth your time, because your time is all it takes.

Superpower 5: Create Living Acceptance Tests

You write acceptance criteria anyway. Now they run:

Acceptance Criteria: User Registration
1. Navigate to the registration page
2. Fill in name, email, and password
3. Check the Terms of Service checkbox
4. Click Create Account
5. Verify the welcome page loads
6. Verify no errors in the console
7. Verify the registration API returned 200

Tell your AI to run it against staging after each deploy. You get a report: “All 7 steps passed” or “Step 5 failed — the welcome page returned a 500 error from the API.”

You’re not depending on QA bandwidth. You’re not waiting for an engineer to write the test. The acceptance criteria you already wrote are the test.
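The pass/fail report described above is easy to picture as a small runner. A sketch where each step's execution is mocked as a boolean result — in reality the AI would execute each step against the browser via Gasoline:

```javascript
// Mocked step results for the registration acceptance test above.
const steps = [
  { text: "Navigate to the registration page", passed: true },
  { text: "Fill in name, email, and password", passed: true },
  { text: "Check the Terms of Service checkbox", passed: true },
  { text: "Click Create Account", passed: true },
  { text: "Verify the welcome page loads", passed: false, detail: "500 from the API" },
  { text: "Verify no errors in the console", passed: true },
  { text: "Verify the registration API returned 200", passed: true },
];

// Summarize results the way the post describes: all-pass, or first failure.
function report(steps) {
  const failed = steps.findIndex(s => !s.passed);
  if (failed === -1) return `All ${steps.length} steps passed`;
  return `Step ${failed + 1} failed (${steps[failed].detail})`;
}

console.log(report(steps)); // -> "Step 5 failed (500 from the API)"
```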

| Task | Before Gasoline | With Gasoline |
| --- | --- | --- |
| File a bug report | Screenshot + vague description | Full technical report with errors, API data, and reproduction script |
| Record a demo | Coordinate with engineering, screen record | Write a text script, AI runs it, replay anytime |
| Validate a fix | Wait for deploy, manually test | AI runs the acceptance test and reports results |
| Explore an issue | Open DevTools (if you know how) | Tell AI “what’s happening on this page?” |
| Fix a copy typo | File a ticket, wait for prioritization | AI fixes it and opens a PR in 2 minutes |
| Run acceptance tests | Depend on QA schedule | Run your natural language tests on demand |
Getting started takes five steps:

  1. Install Gasoline — Follow the Getting Started guide. It takes 2 minutes.
  2. Connect your AI tool — Claude Code, Cursor, or any MCP-compatible tool.
  3. Start with observation — Browse your product normally and ask the AI: “What errors are happening on this page?” You’ll be surprised what you find.
  4. Try a demo script — Write 5 steps in English. Ask the AI to run them. See it work.
  5. File your first rich bug report — Next time you find a bug, ask the AI to diagnose it. Paste the diagnosis into your ticket.

You don’t need to learn JavaScript. You don’t need to understand CSS selectors. You don’t need DevTools.

You need to describe what you see and what you expect. The AI and Gasoline handle the rest.

Gasoline v5.8.0 Released

Gasoline v5.8.0 solves a long-standing WebSocket capture blind spot: pages that create WebSocket connections before the inject script loads now have those connections captured automatically. This release also adds visual feedback for AI actions and ships a comprehensive 106-test UAT suite.

  • Early-patch WebSocket capture — A new content script running in world: "MAIN" patches window.WebSocket before any page JavaScript runs. This means sites like Binance that create WebSocket connections immediately on page load now have those connections captured and visible via observe(websocket_status). Buffered connections are seamlessly adopted when the full inject script initializes.

  • Visual action toasts — When AI tools use interact() to navigate, execute JavaScript, or highlight elements, a brief toast overlay appears on the page showing what the AI is doing. This makes AI actions visible to developers watching the browser.

  • Fixed camelCase to snake_case field mapping for network waterfall entries (duration, transfer_size, etc.)
  • Command results now route through the /sync endpoint with proper client ID filtering
  • After navigation, tracking state is broadcast so favicon updates correctly
  • Empty arrays return [] instead of null in JSON responses
  • Bridge timeouts now return a proper extension_timeout error code
  • 106-test parallel UAT suite replacing the previous 8-test script, covering observe, generate, configure, interact, and data pipeline categories
  • 16-test human smoke test with error clusters, DOM query, full form lifecycle, highlight, and real WebSocket traffic tests
  • All tests default to fail (not pass) with strict field validation throughout

Binary sizes decreased ~4% from v5.7.5. All SLOs continue to pass:

  • MCP fast-start: ~130ms
  • Tool response: < 1ms
  • Max binary: 7.7 MB (target: < 15 MB)
npx gasoline-mcp@5.8.0

v5.7.5…v5.8.0

Gasoline v5.6.0 Released

Gasoline v5.6.0 focuses on server-side reliability with persistence guarantees and comprehensive architecture invariant tests.

  • Persistent Message Queue — Guarantees no messages lost during server restarts
  • Transaction-Safe State — Atomic operations for observer state updates
  • Architecture Validation — New test suite validating core invariants
  • Improved graceful shutdown of long-running observations
  • Better handling of concurrent client connections
  • Enhanced observability into server health and performance
  • Stricter validation of MCP protocol compliance
  • 50+ new architecture invariant tests
  • Stress testing with high message volume
  • Connection resilience testing
npm install -g gasoline-mcp@5.6.0

v5.6.0 Release

Gasoline v5.7.5 Released

This release makes Gasoline MCP feel instant. The new fast-start mode responds to MCP clients in ~130ms while the daemon boots in the background.

Previously, MCP clients had to wait for the full daemon to boot before getting any response. Now, initialize and tools/list respond immediately from the bridge process:

Before: Client → wait 2-4s for daemon → get response
After: Client → get response in ~130ms → daemon boots in background

This means your AI coding agent gets tool definitions instantly and can start planning while the server finishes starting up. If you call a tool before the daemon is ready, you get a helpful retry message instead of a hang.

The --doctor command now checks if port 7890 is available:

npx gasoline-mcp --doctor
# Now shows:
# ✅ Port 7890
# Default port is available
#
# Or if blocked:
# ⚠️ Port 7890
# Port 7890 is in use (PID: 12345)
# Suggestion: Use --port 7891 or kill the process using the port

When the daemon can’t start because the port is blocked, you now get actionable suggestions:

Server failed to start: port 7890 already in use. Port may be in use. Try: npx gasoline-mcp --port 7891

Daemon startup timeout reduced from 10s to 4s. If something is wrong, you’ll know in 4 seconds instead of 10.

npx gasoline-mcp@5.7.5

See the complete list of changes on GitHub.