3 posts with the tag “testing”

AI-Powered QA: How to Test Your Web App Without Writing Test Code

What if you could test your web application by describing what should happen — in plain English — and have an AI actually run the tests?

No Playwright scripts. No Selenium WebDriver setup. No npm install or pip install. No learning CSS selectors, XPath, or assertion libraries. Just tell the AI what to test, and it tests it.

This isn’t a future vision. It works today with Gasoline MCP.

Writing automated tests is expensive:

  • Setup cost: Install Node.js, install Playwright, configure the test runner, set up CI/CD
  • Writing cost: Learn the API, figure out selectors, handle async operations, manage test data
  • Maintenance cost: Every UI change breaks selectors. Every flow change breaks sequences. Tests that took 2 hours to write take 4 hours to maintain.

The result? Most teams end up with one of the following:

  1. No automated tests — manual QA only
  2. Fragile tests — break on every deploy, ignored by the team
  3. Expensive tests — dedicated QA engineers maintaining a test suite that’s always behind

With Gasoline, testing looks like this:

"Go to the login page. Enter 'test@example.com' as the email and 'password123'
as the password. Click Sign In. Verify that you land on the dashboard and there
are no console errors."

The AI:

  1. Navigates to the login page
  2. Finds the email field (using semantic selectors — label=Email, not #email-input-field-v2)
  3. Types the email
  4. Finds the password field
  5. Types the password
  6. Clicks the Sign In button (by text, not by CSS selector)
  7. Waits for navigation
  8. Checks the URL contains /dashboard
  9. Checks for console errors

If anything fails, the AI reports exactly what happened: “The Sign In button was found and clicked, but the page navigated to /error instead of /dashboard. The API returned a 401 with {"error": "invalid credentials"}.”

The same flow as a Playwright test:

await page.goto('https://myapp.com/login');
await page.locator('#email-input').fill('test@example.com');
await page.locator('#password-input').fill('password123');
await page.locator('button[type="submit"]').click();
await expect(page).toHaveURL(/.*dashboard/);

Gasoline natural language:

Log in with test@example.com / password123.
Verify you reach the dashboard.

The Playwright test breaks when:

  • The email field ID changes from #email-input to #email-field
  • The submit button gets a new class or is replaced with a different component
  • The form structure changes (inputs wrapped in a new div)

The natural language test survives all of these because the AI uses meaning-based selectors: “the email field” → label=Email, “the sign in button” → text=Sign In.
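
The difference is easy to sketch in plain JavaScript. In this toy model of a page (the element list and helper names are invented for illustration — they are not part of Gasoline or Playwright), an ID-based lookup breaks after a rename while a meaning-based lookup still resolves:

```javascript
// Simplified model of a page after a refactor: the email input's id
// changed from "email-input" to "email-field", but its label text
// ("Email") and the button text ("Sign In") are unchanged.
const elements = [
  { tag: 'input', id: 'email-field', label: 'Email' },
  { tag: 'input', id: 'password-field', label: 'Password' },
  { tag: 'button', id: 'btn-7f3a', text: 'Sign In' },
];

// Brittle: coupled to an implementation detail (the id).
const byId = (id) => elements.find((el) => el.id === id);

// Resilient: coupled to meaning (label or visible text).
const byLabel = (label) => elements.find((el) => el.label === label);
const byText = (text) => elements.find((el) => el.text === text);

console.log(byId('email-input'));   // undefined — broke on the rename
console.log(byLabel('Email').id);   // "email-field" — still resolves
console.log(byText('Sign In').tag); // "button"
```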

"Sign up with a new account, verify the welcome email prompt appears,
dismiss it, navigate to settings, change the display name, and verify
the change is reflected in the header."
"Submit the contact form with an empty email. Verify an error message
appears. Then enter a valid email and submit. Verify it succeeds."
"Navigate to a product page that doesn't exist (/products/99999).
Verify a 404 page is shown and there are no console errors."
"Navigate to the homepage. Check that LCP is under 2.5 seconds and
there are no layout shifts above 0.1."
"Run an accessibility audit on the checkout page. Report any critical
or serious violations."
"Submit an order. Verify the API returns a 201 status and the response
includes an order ID."

Natural language tests are great for exploratory testing and quick validation. But for CI/CD, you need repeatable tests.

After running a natural language test session:

generate({
  format: "test",
  test_name: "guest-checkout",
  assert_network: true,
  assert_no_errors: true
})

Gasoline generates a complete Playwright test from the session — every action translated to Playwright commands with proper selectors, network assertions, and error checking. The AI ran the test in natural language; Gasoline converts it to code for CI.

This is the best of both worlds:

  1. Write tests in English — fast, no setup
  2. Export to Playwright — repeatable, CI-ready
  3. Re-run in English — if the generated test breaks, describe the flow again and regenerate

You know the user flows better than anyone. You shouldn’t need to write JavaScript to verify them. Describe the flow, the AI tests it, and you see the results.

You don’t have dedicated QA engineers, and your developers are building features, not writing tests. Natural language testing gives you test coverage without the headcount.

You already know how to test. Natural language testing lets you work faster — describe 10 test cases in the time it takes to code 1. Generate Playwright tests from the ones that should be permanent.

You just shipped a feature and want to verify the happy path before the PR review. A 30-second natural language test is faster than writing a proper test and faster than manual testing.

Resilience: Why AI Tests Survive UI Changes


Traditional tests are tightly coupled to the UI implementation:

// Breaks when the button text changes from "Submit" to "Place Order"
await page.locator('button:has-text("Submit")').click();
// Breaks when the ID changes
await page.locator('#checkout-submit-btn').click();
// Breaks when the class changes
await page.locator('.btn-primary.submit').click();

The AI uses semantic selectors that adapt:

  • text=Submit → If the button now says “Place Order”, the AI reads the page and finds the new text
  • label=Email → Works regardless of whether it’s an <input>, a Material UI <TextField>, or a custom component
  • role=button → Works regardless of styling or class names

And if a selector doesn’t match, the AI doesn’t just fail — it calls interact({action: "list_interactive"}) to discover what’s actually on the page and adapts.
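
That fallback behavior can be sketched in a few lines, assuming a simplified page model. This is an illustration of the idea only — not Gasoline's actual matching algorithm, and findClickTarget is a hypothetical helper:

```javascript
// Sketch: when an exact text match fails, fall back to the closest
// interactive element by simple word overlap, mimicking the
// list_interactive -> adapt loop described above.
function findClickTarget(wanted, interactive) {
  const exact = interactive.find((el) => el.text === wanted);
  if (exact) return exact;
  // Fallback: score each element by shared lowercase words.
  const words = wanted.toLowerCase().split(/\s+/);
  let best = null, bestScore = 0;
  for (const el of interactive) {
    const elWords = (el.text || '').toLowerCase().split(/\s+/);
    const score = words.filter((w) => elWords.includes(w)).length;
    if (score > bestScore) { best = el; bestScore = score; }
  }
  return best;
}

const interactive = [
  { role: 'button', text: 'Place Order' },
  { role: 'link', text: 'Continue Shopping' },
];

// "Submit Order" no longer exists, but "Place Order" shares a word.
console.log(findClickTarget('Submit Order', interactive).text); // "Place Order"
```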

For tests you run regularly:

"Save this test flow as 'checkout-happy-path'."
configure({action: "store", store_action: "save",
namespace: "tests", key: "checkout-happy-path",
data: {steps: ["navigate to /checkout", "fill in shipping...", ...]}})
"Load and run the 'checkout-happy-path' test."
configure({action: "store", store_action: "load",
namespace: "tests", key: "checkout-happy-path"})

Save browser state at key points:

interact({action: "save_state", snapshot_name: "logged-in"})

Later, restore that state instead of repeating the login flow:

interact({action: "load_state", snapshot_name: "logged-in", include_url: true})
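
Conceptually, a snapshot store behaves like the sketch below. The snapshot shape and helper names are assumptions for illustration, not Gasoline's real storage format:

```javascript
// In-memory snapshot store keyed by name.
const snapshots = new Map();

function saveState(name, browser) {
  // Capture enough to skip re-doing a login: storage plus the URL.
  // Deep-copy so later mutations don't corrupt the snapshot.
  snapshots.set(name, JSON.parse(JSON.stringify(browser)));
}

function loadState(name, { includeUrl = false } = {}) {
  const snap = snapshots.get(name);
  if (!snap) throw new Error(`No snapshot named "${name}"`);
  const { url, ...storage } = snap;
  // include_url restores navigation too; otherwise only storage.
  return includeUrl ? snap : storage;
}

saveState('logged-in', { url: '/dashboard', localStorage: { token: 'abc123' } });
console.log(loadState('logged-in', { includeUrl: true }).url); // "/dashboard"
console.log('url' in loadState('logged-in'));                  // false
```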

To get started:

  1. Install Gasoline (Quick Start)
  2. Open your web app
  3. Tell your AI: “Test the login flow — go to the login page, enter test credentials, sign in, and verify you reach the dashboard.”

No setup. No dependencies. No test code. Just describe what should happen.

Gasoline MCP vs Playwright: When to Use Which

Gasoline and Playwright aren’t competitors — they’re complementary. Playwright is a browser automation library for writing repeatable test scripts. Gasoline is an AI-powered browser observation and control layer. Gasoline can even generate Playwright tests.

But they serve different purposes, and knowing when to use each saves significant time.

|                | Gasoline MCP                             | Playwright                       |
|----------------|------------------------------------------|----------------------------------|
| Interface      | Natural language via AI                  | JavaScript/TypeScript/Python API |
| Who uses it    | Developers, PMs, QA — anyone             | Developers and QA engineers      |
| Setup          | Install extension + npx gasoline-mcp     | npm init playwright@latest       |
| Selectors      | Semantic (text=Submit, label=Email)      | CSS, XPath, role, text, test-id  |
| Test creation  | Describe in English                      | Write code                       |
| Execution      | AI runs it interactively                 | CLI or CI/CD pipeline            |
| Debugging      | Real-time browser observation            | Trace viewer, screenshots        |
| Maintenance    | AI adapts to UI changes                  | Manual selector updates          |
| CI/CD          | Generate Playwright tests → run in CI    | Native CI/CD support             |
| Observability  | Console, network, WebSocket, vitals, a11y| Limited (what you assert)        |
| Performance    | Built-in Web Vitals + perf_diff          | Manual performance assertions    |
| Cost           | Free, open source                        | Free, open source                |

You’re checking if a feature works. You don’t want to write a script — you want to try it.

Playwright: Write a script, run it, read the output, modify, repeat.

Gasoline: “Go to the checkout page, add two items, and complete the purchase. Tell me if anything breaks.”

For one-off verification, natural language is 10x faster.

Your test failed. Now what?

Playwright: Open the trace viewer. Scrub through screenshots. Check the assertion error message. Maybe add console.log statements to the test and re-run.

Gasoline: The AI already sees everything — console errors, network responses, WebSocket state, performance metrics. It can diagnose while testing.

observe({what: "error_bundles"})

One call returns the error with its correlated network requests and user actions. No trace viewer needed.
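
The idea behind an error bundle can be sketched as time-window correlation. This is illustrative only — the field names and the 2-second window are assumptions, not Gasoline's actual bundling logic:

```javascript
// A console error and recent network activity, with timestamps in ms.
const error = { message: 'Uncaught TypeError: cart is undefined', ts: 10500 };
const requests = [
  { url: '/api/cart', status: 500, ts: 10420 }, // 80ms before the error
  { url: '/api/user', status: 200, ts: 3000 },  // long before — unrelated
];

// Pair an error with requests that completed shortly before it.
function bundle(err, reqs, windowMs = 2000) {
  return {
    error: err.message,
    related: reqs.filter((r) => err.ts - r.ts >= 0 && err.ts - r.ts <= windowMs),
  };
}

console.log(bundle(error, requests).related.length); // 1 (the 500 from /api/cart)
```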

A designer renamed “Submit” to “Place Order” and restructured the form.

Playwright: Tests fail. You update selectors manually across 15 test files. You hope you caught them all.

Gasoline: The AI reads the page, finds the new button text, and continues. No manual updates.

A product manager wants to verify the user flow before release.

Playwright: Not an option without JavaScript knowledge.

Gasoline: “Walk through the signup flow and make sure it works.” The PM can do this themselves.

Playwright tests only check what you explicitly assert. If you don’t assert “no console errors,” you’ll never know about them.

Gasoline observes everything passively:

  • Console errors the test didn’t check for
  • Slow API responses the test didn’t measure
  • Layout shifts the test didn’t detect
  • Third-party script failures the test couldn’t see

Playwright: You can measure timing with custom code, but there’s no built-in Web Vitals collection or before/after comparison.

Gasoline: Web Vitals are captured automatically. Navigate or refresh, and you get a perf_diff with deltas, ratings, and a verdict. No custom code.
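
The shape of such a before/after comparison can be sketched as follows. The thresholds match the standard Web Vitals "good" cutoffs (LCP ≤ 2.5s, CLS ≤ 0.1, INP ≤ 200ms), but the output schema here is an assumption for illustration, not Gasoline's real perf_diff format:

```javascript
// "Good" thresholds per metric: LCP in ms, CLS as a unitless score, INP in ms.
const THRESHOLDS = { lcp: 2500, cls: 0.1, inp: 200 };

function perfDiff(before, after) {
  const result = {};
  for (const metric of Object.keys(THRESHOLDS)) {
    const delta = after[metric] - before[metric];
    result[metric] = {
      before: before[metric],
      after: after[metric],
      delta,
      rating: after[metric] <= THRESHOLDS[metric] ? 'good' : 'needs-improvement',
    };
  }
  // Any positive delta means a metric got worse between the two runs.
  const regressed = Object.values(result).some((m) => m.delta > 0);
  return { metrics: result, verdict: regressed ? 'regressed' : 'improved-or-equal' };
}

const diff = perfDiff({ lcp: 2100, cls: 0.05, inp: 180 },
                      { lcp: 2600, cls: 0.05, inp: 150 });
console.log(diff.metrics.lcp.rating); // "needs-improvement"
console.log(diff.verdict);            // "regressed"
```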

Playwright tests run headlessly in GitHub Actions, GitLab CI, or any CI system. They’re deterministic, repeatable, and fast.

Gasoline generates Playwright tests, but the actual CI execution is Playwright’s domain. Gasoline runs interactively with an AI assistant — it’s not designed to be a CI test runner.

Playwright can shard tests across multiple workers and run them in parallel. For a suite of 500 tests, this means finishing in minutes instead of hours.

Gasoline is single-session — one AI, one browser, one tab at a time.

Playwright supports Chromium, Firefox, and WebKit out of the box.

Gasoline’s extension currently runs in Chrome/Chromium only.

When you need a test that passes or fails the exact same way every time, Playwright’s explicit assertions are the right tool:

await expect(page.getByRole('heading')).toHaveText('Welcome back');
expect(response.status()).toBe(200);

AI-driven testing is intelligent but non-deterministic — the AI might take different paths or interpret “verify it works” differently across runs.

Playwright can intercept and mock network requests, letting you test error states, slow responses, and edge cases without a real backend.

Gasoline observes real traffic — it doesn’t mock it.

The Best of Both: Generate Playwright from Gasoline


The power move: use Gasoline for exploration and Playwright for CI.

"Walk through the checkout flow — add an item, go to cart, enter
shipping info, and complete the purchase."

The AI runs the flow interactively, handling UI variations and reporting issues in real time.

"Generate a Playwright test from this session."
generate({format: "test", test_name: "checkout-flow",
base_url: "http://localhost:3000",
assert_network: true,
assert_no_errors: true,
assert_response_shape: true})

Gasoline produces a complete Playwright test:

import { test, expect } from '@playwright/test';

test('checkout-flow', async ({ page }) => {
  const consoleErrors = [];
  page.on('console', msg => {
    if (msg.type() === 'error') consoleErrors.push(msg.text());
  });

  await page.goto('http://localhost:3000/products');
  await page.getByRole('button', { name: 'Add to Cart' }).click();
  await page.getByRole('link', { name: 'Cart' }).click();
  await page.getByLabel('Address').fill('123 Main St');
  // ...
  expect(consoleErrors).toHaveLength(0);
});

The generated test runs in your CI pipeline like any other Playwright test. Deterministic, repeatable, fast.

The UI changed and the Playwright test fails. Instead of manually updating selectors:

"The checkout test is failing because the form changed.
Walk through the checkout flow again and generate a new test."

The AI adapts to the new UI, generates a fresh Playwright test, and you’re back in CI.

| Scenario                     | Use                                   |
|------------------------------|---------------------------------------|
| Quick feature verification   | Gasoline                              |
| CI/CD regression suite       | Playwright (generated by Gasoline)    |
| Debugging a test failure     | Gasoline (better observability)       |
| Non-developer testing        | Gasoline                              |
| Cross-browser testing        | Playwright                            |
| Performance monitoring       | Gasoline (built-in vitals)            |
| Network mocking              | Playwright                            |
| Accessibility auditing       | Gasoline (built-in axe-core)          |
| Exploratory testing          | Gasoline                              |
| 500+ test parallel execution | Playwright                            |
| Test maintenance             | Gasoline (regenerate broken tests)    |

  1. Develop — use Gasoline for real-time debugging and quick validation
  2. Generate — convert validated flows to Playwright tests
  3. CI — run Playwright tests on every push
  4. Maintain — when tests break, re-explore with Gasoline and regenerate

Gasoline doesn’t replace Playwright. It makes Playwright tests easier to create, easier to maintain, and easier to debug when they fail.

Why Natural Language Is the Best Way to Write Tests

Test scripts written in English are more readable, more maintainable, and more accessible than any test framework. Here’s why natural language testing with Gasoline MCP is the future.

The original promise of testing was simple: tests describe what the software should do. If someone new joins the team, they read the tests and understand the product.

That promise died somewhere between page.locator('.btn-primary').nth(2).click() and await expect(wrapper.find('[data-testid="modal-close"]')).toBeVisible().

Nobody reads test code to understand the product. They read it to figure out why CI is red.

Natural language testing brings the promise back. A test that says “Click ‘Add to Cart’ and verify the cart shows 1 item” is documentation that runs.

Everyone Can Read It, Everyone Can Write It


A Playwright test requires JavaScript knowledge, framework familiarity, and understanding of CSS selectors. The audience for that test is maybe 5 people on your team.

A natural language test requires knowing what the product should do. The audience is everyone — product managers, designers, QA, support, executives, and engineers.

Test: Password Reset Flow
1. Click "Forgot Password" on the login page
2. Enter "user@example.com" in the email field
3. Click "Send Reset Link"
4. Verify the page shows "Check your email"
5. Verify no errors in the console
6. Verify the API call to /api/auth/reset returned 200

A product manager wrote that. A designer can review it. QA can run it. An engineer can debug it when it fails. Everyone works from the same artifact.

Traditional test maintenance is a tax on velocity. Every UI change risks breaking tests. Teams either spend hours fixing selectors or stop running the tests entirely.

Natural language tests break only when the product behavior changes — and that’s exactly when you want them to break.

| UI Change                         | Playwright breaks? | Natural language breaks? |
|-----------------------------------|--------------------|--------------------------|
| Button class renamed              | Yes                | No                       |
| Form restructured                 | Yes                | No                       |
| CSS framework swapped             | Yes                | No                       |
| Component library upgraded        | Yes                | No                       |
| Button text “Submit” to “Register”| No                 | Yes — intentionally      |
| Checkout flow adds a step         | No                 | Yes — intentionally      |

The test breaks when the product changes. It doesn’t break when the implementation changes. That’s the correct behavior for an acceptance test.

When you write a Playwright test, you write exactly what you coded. Nothing more. If you forgot to check for console errors, the test doesn’t check for console errors.

When an AI executes a natural language test with Gasoline, it has access to the full browser state. You can write:

5. Verify no errors on the page

And the AI calls observe({what: "errors"}) to check the console, and looks at the page for visible error messages, and can check observe({what: "network_bodies", status_min: 400}) for failing API calls.

You described the intent. The AI was thorough about the implementation.
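
That thoroughness amounts to aggregating several observation sources. A minimal sketch, with hard-coded data standing in for live observe() results (the variable names and report shape are invented for illustration):

```javascript
// Mocked observation results; the real tool returns live browser data.
const consoleErrors = ['TypeError: cart is undefined'];
const failedRequests = [{ url: '/api/cart', status: 500 }];
const visibleErrorText = [];

// "Verify no errors" expands into checks across every source.
function verifyNoErrors() {
  const problems = [];
  if (consoleErrors.length) problems.push(`console: ${consoleErrors.length} error(s)`);
  if (failedRequests.length) problems.push(`network: ${failedRequests.length} request(s) >= 400`);
  if (visibleErrorText.length) problems.push('page: visible error message');
  return { pass: problems.length === 0, problems };
}

console.log(verifyNoErrors().pass); // false — two sources flagged problems
```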

Tests Match How You Think About the Product


Product people think in workflows: “The user signs up, gets a welcome email, clicks the confirmation link, and lands on the dashboard.”

Engineers think in selectors: “Click #signup-btn, fill input[name='email'], submit the form, wait for [data-testid='welcome-modal'].”

Natural language tests match the product mental model, not the implementation mental model. This means:

  • Acceptance criteria become tests directly. The criteria in your Jira ticket are the test. No translation step.
  • Test reviews are product reviews. When a PM reviews a test, they’re reviewing product behavior, not code.
  • Gap analysis is intuitive. “We test the happy path but not the error case” is obvious when tests are in English.

With Gasoline, the AI can verify things that are awkward or impossible in traditional test frameworks:

8. Verify the WebSocket reconnects after the connection drop

The AI calls observe({what: "websocket_status"}) and checks the connection state. Try writing that in Selenium.

12. Verify the page loads in under 3 seconds

The AI checks observe({what: "vitals"}) for LCP. No performance testing library needed.

15. Verify the page is accessible

The AI runs observe({what: "accessibility"}) for a WCAG audit. No axe-core setup needed.

Natural language lets you describe what matters. The AI and Gasoline figure out how to measure it.

Behavior-Driven Development (BDD) tried to solve this problem with Gherkin syntax:

Given the user is on the login page
When they enter valid credentials
Then they should see the dashboard

But Gherkin still required step definitions — glue code that mapped English to implementation. Someone still had to write Given('the user is on the login page', () => page.goto('/login')). The maintenance burden just moved.
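
That glue layer can be sketched in a few lines of Cucumber-style JavaScript (the step patterns and return values here are hypothetical). Every new English phrase needs another hand-written entry, which is exactly the maintenance burden that moved:

```javascript
// A minimal step-definition registry in the Cucumber style.
const steps = [];
function Given(pattern, fn) { steps.push({ pattern, fn }); }

// Each English step needs a hand-maintained mapping to code.
Given(/^the user is on the (\w+) page$/, (page) => `goto:/${page}`);
Given(/^they enter valid credentials$/, () => 'fill:credentials');

function run(step) {
  for (const { pattern, fn } of steps) {
    const m = step.match(pattern);
    if (m) return fn(...m.slice(1));
  }
  // Any phrase without glue code fails, even if the product works.
  throw new Error(`Undefined step: "${step}" — write more glue code`);
}

console.log(run('the user is on the login page')); // "goto:/login"
// run('they land on the dashboard') -> throws: undefined step
```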

With Gasoline, there are no step definitions. The AI is the step definition. It reads “the user is on the login page” and navigates there. No glue code. No mapping layer. No maintenance.

BDD was right about the idea. It just needed AI to finish the job.

Natural language tests are ideal for:

  • Acceptance testing — Verify the product meets requirements
  • Regression testing — Re-run after deploys to catch breakage
  • Exploratory testing — “Navigate the settings page and verify nothing looks broken”
  • Cross-product workflows — “Log in to the admin panel, create a user, then switch to the customer app and verify the user can log in”
  • Demo verification — “Run the demo script and verify every step completes”

They complement (not replace) unit tests and integration tests. Your engineers still write fast, focused unit tests for business logic. Natural language tests cover the full-stack, end-to-end workflows that live at the product level.

The best test is the one that gets written. And the test that gets written is the one that’s easy to write.