3 posts with the tag “testing”

AI-Powered QA: How to Test Your Web App Without Writing Test Code

What if you could test your web application by describing what should happen — in plain English — and have an AI actually run the tests?

No Playwright scripts. No Selenium WebDriver setup. No npm install or pip install. No learning CSS selectors, XPath, or assertion libraries. Just tell the AI what to test, and it tests it.

This isn’t a future vision. It works today with Gasoline MCP.

Writing automated tests is expensive:

  • Setup cost: Install Node.js, install Playwright, configure the test runner, set up CI/CD
  • Writing cost: Learn the API, figure out selectors, handle async operations, manage test data
  • Maintenance cost: Every UI change breaks selectors. Every flow change breaks sequences. Tests that took 2 hours to write take 4 hours to maintain.

The result? Most teams end up with one of the following:

  1. No automated tests — manual QA only
  2. Fragile tests — break on every deploy, ignored by the team
  3. Expensive tests — dedicated QA engineers maintaining a test suite that’s always behind

With Gasoline, testing looks like this:

"Go to the login page. Enter 'test@example.com' as the email and 'password123'
as the password. Click Sign In. Verify that you land on the dashboard and there
are no console errors."

The AI:

  1. Navigates to the login page
  2. Finds the email field (using semantic selectors — label=Email, not #email-input-field-v2)
  3. Types the email
  4. Finds the password field
  5. Types the password
  6. Clicks the Sign In button (by text, not by CSS selector)
  7. Waits for navigation
  8. Checks the URL contains /dashboard
  9. Checks for console errors

If anything fails, the AI reports exactly what happened: “The Sign In button was found and clicked, but the page navigated to /error instead of /dashboard. The API returned a 401 with {"error": "invalid credentials"}.”

The same flow as a Playwright test:

await page.goto('https://myapp.com/login');
await page.locator('#email-input').fill('test@example.com');
await page.locator('#password-input').fill('password123');
await page.locator('button[type="submit"]').click();
await expect(page).toHaveURL(/.*dashboard/);

Gasoline natural language:

Log in with test@example.com / password123.
Verify you reach the dashboard.

The Playwright test breaks when:

  • The email field ID changes from #email-input to #email-field
  • The submit button gets a new class or is replaced with a different component
  • The form structure changes (inputs wrapped in a new div)

The natural language test survives all of these because the AI uses meaning-based selectors: “the email field” → label=Email, “the sign in button” → text=Sign In.
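
The difference is easy to sketch in plain JavaScript. In this toy model of a page (the element list and helper names are invented for illustration — they are not part of Gasoline or Playwright), an ID-based lookup breaks after a rename while a meaning-based lookup still resolves:

```javascript
// Simplified model of a page after a refactor: the email input's id
// changed from "email-input" to "email-field", but its label text
// ("Email") and the button text ("Sign In") are unchanged.
const elements = [
  { tag: 'input', id: 'email-field', label: 'Email' },
  { tag: 'input', id: 'password-field', label: 'Password' },
  { tag: 'button', id: 'btn-7f3a', text: 'Sign In' },
];

// Brittle: coupled to an implementation detail (the id).
const byId = (id) => elements.find((el) => el.id === id);

// Resilient: coupled to meaning (label or visible text).
const byLabel = (label) => elements.find((el) => el.label === label);
const byText = (text) => elements.find((el) => el.text === text);

console.log(byId('email-input'));   // undefined — broke on the rename
console.log(byLabel('Email').id);   // "email-field" — still resolves
console.log(byText('Sign In').tag); // "button"
```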

"Sign up with a new account, verify the welcome email prompt appears,
dismiss it, navigate to settings, change the display name, and verify
the change is reflected in the header."
"Submit the contact form with an empty email. Verify an error message
appears. Then enter a valid email and submit. Verify it succeeds."
"Navigate to a product page that doesn't exist (/products/99999).
Verify a 404 page is shown and there are no console errors."
"Navigate to the homepage. Check that LCP is under 2.5 seconds and
there are no layout shifts above 0.1."
"Run an accessibility audit on the checkout page. Report any critical
or serious violations."
"Submit an order. Verify the API returns a 201 status and the response
includes an order ID."

Natural language tests are great for exploratory testing and quick validation. But for CI/CD, you need repeatable tests.

After running a natural language test session:

generate({
  format: "test",
  test_name: "guest-checkout",
  assert_network: true,
  assert_no_errors: true
})

Gasoline generates a complete Playwright test from the session — every action translated to Playwright commands with proper selectors, network assertions, and error checking. The AI ran the test in natural language; Gasoline converts it to code for CI.

This is the best of both worlds:

  1. Write tests in English — fast, no setup
  2. Export to Playwright — repeatable, CI-ready
  3. Re-run in English — if the generated test breaks, describe the flow again and regenerate

You know the user flows better than anyone. You shouldn’t need to write JavaScript to verify them. Describe the flow, the AI tests it, and you see the results.

You don’t have dedicated QA engineers, and your developers are building features, not writing tests. Natural language testing gives you test coverage without the headcount.

You already know how to test. Natural language testing lets you work faster — describe 10 test cases in the time it takes to code 1. Generate Playwright tests from the ones that should be permanent.

You just shipped a feature and want to verify the happy path before the PR review. A 30-second natural language test is faster than writing a proper test and faster than manual testing.

Resilience: Why AI Tests Survive UI Changes


Traditional tests are tightly coupled to the UI implementation:

// Breaks when the button text changes from "Submit" to "Place Order"
await page.locator('button:has-text("Submit")').click();
// Breaks when the ID changes
await page.locator('#checkout-submit-btn').click();
// Breaks when the class changes
await page.locator('.btn-primary.submit').click();

The AI uses semantic selectors that adapt:

  • text=Submit → If the button now says “Place Order”, the AI reads the page and finds the new text
  • label=Email → Works regardless of whether it’s an <input>, a Material UI <TextField>, or a custom component
  • role=button → Works regardless of styling or class names

And if a selector doesn’t match, the AI doesn’t just fail — it calls interact({action: "list_interactive"}) to discover what’s actually on the page and adapts.
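
That fallback behavior can be sketched in a few lines, assuming a simplified page model. This is an illustration of the idea only — not Gasoline's actual matching algorithm, and findClickTarget is a hypothetical helper:

```javascript
// Sketch: when an exact text match fails, fall back to the closest
// interactive element by simple word overlap, mimicking the
// list_interactive -> adapt loop described above.
function findClickTarget(wanted, interactive) {
  const exact = interactive.find((el) => el.text === wanted);
  if (exact) return exact;
  // Fallback: score each element by shared lowercase words.
  const words = wanted.toLowerCase().split(/\s+/);
  let best = null, bestScore = 0;
  for (const el of interactive) {
    const elWords = (el.text || '').toLowerCase().split(/\s+/);
    const score = words.filter((w) => elWords.includes(w)).length;
    if (score > bestScore) { best = el; bestScore = score; }
  }
  return best;
}

const interactive = [
  { role: 'button', text: 'Place Order' },
  { role: 'link', text: 'Continue Shopping' },
];

// "Submit Order" no longer exists, but "Place Order" shares a word.
console.log(findClickTarget('Submit Order', interactive).text); // "Place Order"
```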

For tests you run regularly:

"Save this test flow as 'checkout-happy-path'."
configure({action: "store", store_action: "save",
namespace: "tests", key: "checkout-happy-path",
data: {steps: ["navigate to /checkout", "fill in shipping...", ...]}})
"Load and run the 'checkout-happy-path' test."
configure({action: "store", store_action: "load",
namespace: "tests", key: "checkout-happy-path"})

Save browser state at key points:

interact({action: "save_state", snapshot_name: "logged-in"})

Later, restore that state instead of repeating the login flow:

interact({action: "load_state", snapshot_name: "logged-in", include_url: true})
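
Conceptually, a snapshot store behaves like the sketch below. The snapshot shape and helper names are assumptions for illustration, not Gasoline's real storage format:

```javascript
// In-memory snapshot store keyed by name.
const snapshots = new Map();

function saveState(name, browser) {
  // Capture enough to skip re-doing a login: storage plus the URL.
  // Deep-copy so later mutations don't corrupt the snapshot.
  snapshots.set(name, JSON.parse(JSON.stringify(browser)));
}

function loadState(name, { includeUrl = false } = {}) {
  const snap = snapshots.get(name);
  if (!snap) throw new Error(`No snapshot named "${name}"`);
  const { url, ...storage } = snap;
  // include_url restores navigation too; otherwise only storage.
  return includeUrl ? snap : storage;
}

saveState('logged-in', { url: '/dashboard', localStorage: { token: 'abc123' } });
console.log(loadState('logged-in', { includeUrl: true }).url); // "/dashboard"
console.log('url' in loadState('logged-in'));                  // false
```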

To get started:

  1. Install Gasoline (Quick Start)
  2. Open your web app
  3. Tell your AI: “Test the login flow — go to the login page, enter test credentials, sign in, and verify you reach the dashboard.”

No setup. No dependencies. No test code. Just describe what should happen.

Gasoline MCP vs Playwright: When to Use Which

Gasoline and Playwright aren’t competitors — they’re complementary. Playwright is a browser automation library for writing repeatable test scripts. Gasoline is an AI-powered browser observation and control layer. Gasoline can even generate Playwright tests.

But they serve different purposes, and knowing when to use each saves significant time.

|                | Gasoline MCP                             | Playwright                       |
|----------------|------------------------------------------|----------------------------------|
| Interface      | Natural language via AI                  | JavaScript/TypeScript/Python API |
| Who uses it    | Developers, PMs, QA — anyone             | Developers and QA engineers      |
| Setup          | Install extension + npx gasoline-mcp     | npm init playwright@latest       |
| Selectors      | Semantic (text=Submit, label=Email)      | CSS, XPath, role, text, test-id  |
| Test creation  | Describe in English                      | Write code                       |
| Execution      | AI runs it interactively                 | CLI or CI/CD pipeline            |
| Debugging      | Real-time browser observation            | Trace viewer, screenshots        |
| Maintenance    | AI adapts to UI changes                  | Manual selector updates          |
| CI/CD          | Generate Playwright tests → run in CI    | Native CI/CD support             |
| Observability  | Console, network, WebSocket, vitals, a11y| Limited (what you assert)        |
| Performance    | Built-in Web Vitals + perf_diff          | Manual performance assertions    |
| Cost           | Free, open source                        | Free, open source                |

You’re checking if a feature works. You don’t want to write a script — you want to try it.

Playwright: Write a script, run it, read the output, modify, repeat.

Gasoline: “Go to the checkout page, add two items, and complete the purchase. Tell me if anything breaks.”

For one-off verification, natural language is 10x faster.

Your test failed. Now what?

Playwright: Open the trace viewer. Scrub through screenshots. Check the assertion error message. Maybe add console.log statements to the test and re-run.

Gasoline: The AI already sees everything — console errors, network responses, WebSocket state, performance metrics. It can diagnose while testing.

observe({what: "error_bundles"})

One call returns the error with its correlated network requests and user actions. No trace viewer needed.
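
The idea behind an error bundle can be sketched as time-window correlation. This is illustrative only — the field names and the 2-second window are assumptions, not Gasoline's actual bundling logic:

```javascript
// A console error and recent network activity, with timestamps in ms.
const error = { message: 'Uncaught TypeError: cart is undefined', ts: 10500 };
const requests = [
  { url: '/api/cart', status: 500, ts: 10420 }, // 80ms before the error
  { url: '/api/user', status: 200, ts: 3000 },  // long before — unrelated
];

// Pair an error with requests that completed shortly before it.
function bundle(err, reqs, windowMs = 2000) {
  return {
    error: err.message,
    related: reqs.filter((r) => err.ts - r.ts >= 0 && err.ts - r.ts <= windowMs),
  };
}

console.log(bundle(error, requests).related.length); // 1 (the 500 from /api/cart)
```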

A designer renamed “Submit” to “Place Order” and restructured the form.

Playwright: Tests fail. You update selectors manually across 15 test files. You hope you caught them all.

Gasoline: The AI reads the page, finds the new button text, and continues. No manual updates.

A product manager wants to verify the user flow before release.

Playwright: Not an option without JavaScript knowledge.

Gasoline: “Walk through the signup flow and make sure it works.” The PM can do this themselves.

Playwright tests only check what you explicitly assert. If you don’t assert “no console errors,” you’ll never know about them.

Gasoline observes everything passively:

  • Console errors the test didn’t check for
  • Slow API responses the test didn’t measure
  • Layout shifts the test didn’t detect
  • Third-party script failures the test couldn’t see

Playwright: You can measure timing with custom code, but there’s no built-in Web Vitals collection or before/after comparison.

Gasoline: Web Vitals are captured automatically. Navigate or refresh, and you get a perf_diff with deltas, ratings, and a verdict. No custom code.
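
The shape of such a before/after comparison can be sketched as follows. The thresholds match the standard Web Vitals "good" cutoffs (LCP ≤ 2.5s, CLS ≤ 0.1, INP ≤ 200ms), but the output schema here is an assumption for illustration, not Gasoline's real perf_diff format:

```javascript
// "Good" thresholds per metric: LCP in ms, CLS as a unitless score, INP in ms.
const THRESHOLDS = { lcp: 2500, cls: 0.1, inp: 200 };

function perfDiff(before, after) {
  const result = {};
  for (const metric of Object.keys(THRESHOLDS)) {
    const delta = after[metric] - before[metric];
    result[metric] = {
      before: before[metric],
      after: after[metric],
      delta,
      rating: after[metric] <= THRESHOLDS[metric] ? 'good' : 'needs-improvement',
    };
  }
  // Any positive delta means a metric got worse between the two runs.
  const regressed = Object.values(result).some((m) => m.delta > 0);
  return { metrics: result, verdict: regressed ? 'regressed' : 'improved-or-equal' };
}

const diff = perfDiff({ lcp: 2100, cls: 0.05, inp: 180 },
                      { lcp: 2600, cls: 0.05, inp: 150 });
console.log(diff.metrics.lcp.rating); // "needs-improvement"
console.log(diff.verdict);            // "regressed"
```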

Playwright tests run headlessly in GitHub Actions, GitLab CI, or any CI system. They’re deterministic, repeatable, and fast.

Gasoline generates Playwright tests, but the actual CI execution is Playwright’s domain. Gasoline runs interactively with an AI assistant — it’s not designed to be a CI test runner.

Playwright can shard tests across multiple workers and run them in parallel. For a suite of 500 tests, this means finishing in minutes instead of hours.

Gasoline is single-session — one AI, one browser, one tab at a time.

Playwright supports Chromium, Firefox, and WebKit out of the box.

Gasoline’s extension currently runs in Chrome/Chromium only.

When you need a test that passes or fails the exact same way every time, Playwright’s explicit assertions are the right tool:

await expect(page.getByRole('heading')).toHaveText('Welcome back');
expect(response.status()).toBe(200);

AI-driven testing is intelligent but non-deterministic — the AI might take different paths or interpret “verify it works” differently across runs.

Playwright can intercept and mock network requests, letting you test error states, slow responses, and edge cases without a real backend.

Gasoline observes real traffic — it doesn’t mock it.

The Best of Both: Generate Playwright from Gasoline


The power move: use Gasoline for exploration and Playwright for CI.

"Walk through the checkout flow — add an item, go to cart, enter
shipping info, and complete the purchase."

The AI runs the flow interactively, handling UI variations and reporting issues in real time.

"Generate a Playwright test from this session."
generate({format: "test", test_name: "checkout-flow",
base_url: "http://localhost:3000",
assert_network: true,
assert_no_errors: true,
assert_response_shape: true})

Gasoline produces a complete Playwright test:

import { test, expect } from '@playwright/test';

test('checkout-flow', async ({ page }) => {
  const consoleErrors = [];
  page.on('console', msg => {
    if (msg.type() === 'error') consoleErrors.push(msg.text());
  });

  await page.goto('http://localhost:3000/products');
  await page.getByRole('button', { name: 'Add to Cart' }).click();
  await page.getByRole('link', { name: 'Cart' }).click();
  await page.getByLabel('Address').fill('123 Main St');
  // ...
  expect(consoleErrors).toHaveLength(0);
});

The generated test runs in your CI pipeline like any other Playwright test. Deterministic, repeatable, fast.

The UI changed and the Playwright test fails. Instead of manually updating selectors:

"The checkout test is failing because the form changed.
Walk through the checkout flow again and generate a new test."

The AI adapts to the new UI, generates a fresh Playwright test, and you’re back in CI.

| Scenario                     | Use                                   |
|------------------------------|---------------------------------------|
| Quick feature verification   | Gasoline                              |
| CI/CD regression suite       | Playwright (generated by Gasoline)    |
| Debugging a test failure     | Gasoline (better observability)       |
| Non-developer testing        | Gasoline                              |
| Cross-browser testing        | Playwright                            |
| Performance monitoring       | Gasoline (built-in vitals)            |
| Network mocking              | Playwright                            |
| Accessibility auditing       | Gasoline (built-in axe-core)          |
| Exploratory testing          | Gasoline                              |
| 500+ test parallel execution | Playwright                            |
| Test maintenance             | Gasoline (regenerate broken tests)    |

  1. Develop — use Gasoline for real-time debugging and quick validation
  2. Generate — convert validated flows to Playwright tests
  3. CI — run Playwright tests on every push
  4. Maintain — when tests break, re-explore with Gasoline and regenerate

Gasoline doesn’t replace Playwright. It makes Playwright tests easier to create, easier to maintain, and easier to debug when they fail.

Why Natural Language Is the Best Way to Write Tests

Test scripts written in English are more readable, more maintainable, and more accessible than any test framework. Here’s why natural language testing with Gasoline MCP is the future.

The original promise of testing was simple: tests describe what the software should do. If someone new joins the team, they read the tests and understand the product.

That promise died somewhere between page.locator('.btn-primary').nth(2).click() and await expect(wrapper.find('[data-testid="modal-close"]')).toBeVisible().

Nobody reads test code to understand the product. They read it to figure out why CI is red.

Natural language testing brings the promise back. A test that says “Click ‘Add to Cart’ and verify the cart shows 1 item” is documentation that runs.

Everyone Can Read It, Everyone Can Write It


A Playwright test requires JavaScript knowledge, framework familiarity, and understanding of CSS selectors. The audience for that test is maybe 5 people on your team.

A natural language test requires knowing what the product should do. The audience is everyone — product managers, designers, QA, support, executives, and engineers.

Test: Password Reset Flow
1. Click "Forgot Password" on the login page
2. Enter "user@example.com" in the email field
3. Click "Send Reset Link"
4. Verify the page shows "Check your email"
5. Verify no errors in the console
6. Verify the API call to /api/auth/reset returned 200

A product manager wrote that. A designer can review it. QA can run it. An engineer can debug it when it fails. Everyone works from the same artifact.

Traditional test maintenance is a tax on velocity. Every UI change risks breaking tests. Teams either spend hours fixing selectors or stop running the tests entirely.

Natural language tests break only when the product behavior changes — and that’s exactly when you want them to break.

| UI Change                         | Playwright breaks? | Natural language breaks? |
|-----------------------------------|--------------------|--------------------------|
| Button class renamed              | Yes                | No                       |
| Form restructured                 | Yes                | No                       |
| CSS framework swapped             | Yes                | No                       |
| Component library upgraded        | Yes                | No                       |
| Button text “Submit” to “Register”| No                 | Yes — intentionally      |
| Checkout flow adds a step         | No                 | Yes — intentionally      |

The test breaks when the product changes. It doesn’t break when the implementation changes. That’s the correct behavior for an acceptance test.

When you write a Playwright test, you write exactly what you coded. Nothing more. If you forgot to check for console errors, the test doesn’t check for console errors.

When an AI executes a natural language test with Gasoline, it has access to the full browser state. You can write:

5. Verify no errors on the page

And the AI calls observe({what: "errors"}) to check the console, and looks at the page for visible error messages, and can check observe({what: "network_bodies", status_min: 400}) for failing API calls.

You described the intent. The AI was thorough about the implementation.
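
That thoroughness amounts to aggregating several observation sources. A minimal sketch, with hard-coded data standing in for live observe() results (the variable names and report shape are invented for illustration):

```javascript
// Mocked observation results; the real tool returns live browser data.
const consoleErrors = ['TypeError: cart is undefined'];
const failedRequests = [{ url: '/api/cart', status: 500 }];
const visibleErrorText = [];

// "Verify no errors" expands into checks across every source.
function verifyNoErrors() {
  const problems = [];
  if (consoleErrors.length) problems.push(`console: ${consoleErrors.length} error(s)`);
  if (failedRequests.length) problems.push(`network: ${failedRequests.length} request(s) >= 400`);
  if (visibleErrorText.length) problems.push('page: visible error message');
  return { pass: problems.length === 0, problems };
}

console.log(verifyNoErrors().pass); // false — two sources flagged problems
```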

Tests Match How You Think About the Product


Product people think in workflows: “The user signs up, gets a welcome email, clicks the confirmation link, and lands on the dashboard.”

Engineers think in selectors: “Click #signup-btn, fill input[name='email'], submit the form, wait for [data-testid='welcome-modal'].”

Natural language tests match the product mental model, not the implementation mental model. This means:

  • Acceptance criteria become tests directly. The criteria in your Jira ticket are the test. No translation step.
  • Test reviews are product reviews. When a PM reviews a test, they’re reviewing product behavior, not code.
  • Gap analysis is intuitive. “We test the happy path but not the error case” is obvious when tests are in English.

With Gasoline, the AI can verify things that are awkward or impossible in traditional test frameworks:

8. Verify the WebSocket reconnects after the connection drop

The AI calls observe({what: "websocket_status"}) and checks the connection state. Try writing that in Selenium.

12. Verify the page loads in under 3 seconds

The AI checks observe({what: "vitals"}) for LCP. No performance testing library needed.

15. Verify the page is accessible

The AI runs observe({what: "accessibility"}) for a WCAG audit. No axe-core setup needed.

Natural language lets you describe what matters. The AI and Gasoline figure out how to measure it.

Behavior-Driven Development (BDD) tried to solve this problem with Gherkin syntax:

Given the user is on the login page
When they enter valid credentials
Then they should see the dashboard

But Gherkin still required step definitions — glue code that mapped English to implementation. Someone still had to write Given('the user is on the login page', () => page.goto('/login')). The maintenance burden just moved.
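
That glue layer can be sketched in a few lines of Cucumber-style JavaScript (the step patterns and return values here are hypothetical). Every new English phrase needs another hand-written entry, which is exactly the maintenance burden that moved:

```javascript
// A minimal step-definition registry in the Cucumber style.
const steps = [];
function Given(pattern, fn) { steps.push({ pattern, fn }); }

// Each English step needs a hand-maintained mapping to code.
Given(/^the user is on the (\w+) page$/, (page) => `goto:/${page}`);
Given(/^they enter valid credentials$/, () => 'fill:credentials');

function run(step) {
  for (const { pattern, fn } of steps) {
    const m = step.match(pattern);
    if (m) return fn(...m.slice(1));
  }
  // Any phrase without glue code fails, even if the product works.
  throw new Error(`Undefined step: "${step}" — write more glue code`);
}

console.log(run('the user is on the login page')); // "goto:/login"
// run('they land on the dashboard') -> throws: undefined step
```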

With Gasoline, there are no step definitions. The AI is the step definition. It reads “the user is on the login page” and navigates there. No glue code. No mapping layer. No maintenance.

BDD was right about the idea. It just needed AI to finish the job.

Natural language tests are ideal for:

  • Acceptance testing — Verify the product meets requirements
  • Regression testing — Re-run after deploys to catch breakage
  • Exploratory testing — “Navigate the settings page and verify nothing looks broken”
  • Cross-product workflows — “Log in to the admin panel, create a user, then switch to the customer app and verify the user can log in”
  • Demo verification — “Run the demo script and verify every step completes”

They complement (not replace) unit tests and integration tests. Your engineers still write fast, focused unit tests for business logic. Natural language tests cover the full-stack, end-to-end workflows that live at the product level.

The best test is the one that gets written. And the test that gets written is the one that’s easy to write.