May 7, 2026·12 min read

Claude Code with Playwright: E2E Testing Workflow

Claude Code · Playwright · E2E Testing · QA

Why Playwright E2E tests need Claude Code configuration

Playwright is not a trivial tool. It has several locator strategies (user-facing locators such as getByRole, test IDs, and raw CSS or XPath), two execution contexts (Node and browser), a Page Object Model convention, a fixtures system, a trace viewer, and distinct patterns for authentication, network interception, and parallel execution. Claude Code is familiar with all of these.

The problem is scope. Without project-specific configuration, Claude Code generates Playwright tests that are technically valid but operationally difficult: they use fragile CSS selectors instead of getByRole, skip the Page Object Model your team agreed on, hard-code credentials that belong in fixtures, and produce test files that run in isolation but break in parallel. The tests pass on first run and fail in CI.

A project-specific CLAUDE.md eliminates most of these failure modes before they appear. This guide builds that configuration layer, then covers the Page Object Model generation workflow, selector strategy, flaky test diagnosis, and CI integration. If you are new to Claude Code entirely, read the Claude Code setup guide for installation and authentication first.

The Playwright CLAUDE.md

A good Playwright CLAUDE.md answers five questions: how is the project structured, what selector strategy is required, how is authentication handled, what is the Page Object Model pattern, and how are tests run?

# Playwright E2E test rules

## Project structure
- Tests in tests/e2e/{feature}/
- Page Object Models in tests/e2e/pages/{PageName}.page.ts
- Fixtures in tests/e2e/fixtures/
- Test data factories in tests/e2e/factories/
- Playwright config: playwright.config.ts at project root

## Selector strategy (strictly enforced)
- Preferred: getByRole(), getByLabel(), getByText() (user-visible selectors)
- Acceptable: getByTestId() when ARIA roles are insufficient
- Banned: CSS selectors (.class, #id), XPath, nth-child
- Add data-testid attributes when a test cannot be written with ARIA selectors

## Authentication
- Never hard-code credentials in test files
- Auth state stored in tests/e2e/.auth/{role}.json via storageState
- Fixture for each user role: adminPage, userPage, guestPage
- Use page.request for API-based auth setup (faster than UI login)

## Page Object Model
- One POM class per page/major section
- POM class name: {PageName}Page (e.g. DashboardPage)
- Locators declared as class properties in the constructor
- Actions as async methods that return the POM (for chaining)
- Assertions as expect() calls inside action methods or in the test body

## Test conventions
- describe() block per feature, test() per scenario
- Test names: verb + noun + condition ("displays error message when email is invalid")
- No test.only() in committed code
- No hardcoded waits (page.waitForTimeout). Use expect().toBeVisible() or waitForResponse().
- Tests must be independent, no shared state between tests

## Running tests
- Full suite: `npx playwright test`
- Specific file: `npx playwright test tests/e2e/auth/login.spec.ts`
- Specific test: `npx playwright test --grep "displays error"`
- Headed (debug): `npx playwright test --headed`
- UI mode: `npx playwright test --ui`
- Trace: `npx playwright test --trace on`

This file sets the ground rules for every Playwright session. The selector strategy section is the most important: without it, Claude Code uses whatever selector is shortest, which produces CSS selectors that break on class name changes.

Generating Page Object Models

The POM is the most time-consuming part of a Playwright test suite to build by hand. It is also the easiest to generate with Claude Code, because POM generation is a pattern-matching task: read the page, identify interactive elements, map them to locators, wrap actions into methods.

The prompt structure that works:

"Read the file src/app/dashboard/page.tsx. Generate a DashboardPage POM class in tests/e2e/pages/DashboardPage.page.ts. Include locators for: the stats cards (total users, total revenue, active sessions), the date range filter, the export CSV button, and the user table. Include action methods for: changing the date range, clicking export, and filtering the table by status. Follow the POM conventions in CLAUDE.md."

With the CLAUDE.md loaded, Claude Code generates:

// tests/e2e/pages/DashboardPage.page.ts
import { type Page, type Locator } from "@playwright/test";

export class DashboardPage {
  readonly page: Page;

  // Stats cards
  readonly totalUsersCard: Locator;
  readonly totalRevenueCard: Locator;
  readonly activeSessionsCard: Locator;

  // Controls
  readonly dateRangeFilter: Locator;
  readonly exportCsvButton: Locator;
  readonly statusFilter: Locator;

  // Table
  readonly userTable: Locator;
  readonly userTableRows: Locator;

  constructor(page: Page) {
    this.page = page;
    this.totalUsersCard = page.getByRole("region", { name: /total users/i });
    this.totalRevenueCard = page.getByRole("region", { name: /total revenue/i });
    this.activeSessionsCard = page.getByRole("region", { name: /active sessions/i });
    this.dateRangeFilter = page.getByRole("combobox", { name: /date range/i });
    this.exportCsvButton = page.getByRole("button", { name: /export csv/i });
    this.statusFilter = page.getByRole("combobox", { name: /filter by status/i });
    this.userTable = page.getByRole("table", { name: /users/i });
    this.userTableRows = this.userTable.getByRole("row").filter({ hasNot: page.getByRole("columnheader") });
  }

  async setDateRange(range: "7d" | "30d" | "90d"): Promise<DashboardPage> {
    // Start waiting before the action so a fast response is not missed.
    const metricsResponse = this.page.waitForResponse(
      (resp) => resp.url().includes("/api/metrics") && resp.status() === 200,
    );
    await this.dateRangeFilter.selectOption(range);
    await metricsResponse;
    return this;
  }

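  async exportCsv(): Promise<string> {
    // Register the listener before clicking so the download event is not missed.
    const downloadPromise = this.page.waitForEvent("download");
    await this.exportCsvButton.click();
    const download = await downloadPromise;
    // Return the server-suggested filename; download.path() is a temporary path
    // with a generated name, so it would not contain the date range.
    return download.suggestedFilename();
  }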

  async filterByStatus(status: "active" | "inactive" | "pending"): Promise<DashboardPage> {
    await this.statusFilter.selectOption(status);
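    // Note: Playwright's docs discourage "networkidle"; once the filter's endpoint is known, wait for that response instead.
    await this.page.waitForLoadState("networkidle");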
    return this;
  }

  async getUserCount(): Promise<number> {
    const rows = await this.userTableRows.count();
    return rows;
  }
}

Every locator uses getByRole with accessible names. The setDateRange method waits for the network response rather than a timeout. The setDateRange and filterByStatus methods return this for chaining. Claude generates this pattern because the CLAUDE.md specifies it.

Without CLAUDE.md, the same prompt produces page.locator('.stats-card:nth-child(1)') and page.locator('#export-btn'), along with page.waitForTimeout(1000) between actions.
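
The banned patterns above are regular enough to check mechanically. The following is a minimal sketch of such a check, not part of Playwright or Claude Code: the function name findBannedPatterns, the rule names, and the regular expressions are all illustrative assumptions, and a real project would more likely use a lint rule.

```typescript
// Hypothetical lint-style check that flags patterns the CLAUDE.md rules ban.
// Rule names and regular expressions are illustrative assumptions.
interface Finding {
  line: number; // 1-based line number in the scanned source
  rule: string;
}

const BANNED: { rule: string; pattern: RegExp }[] = [
  // .locator("...") whose selector starts with ".", "#", "//" or "xpath="
  { rule: "css-or-xpath-selector", pattern: /\.locator\(\s*["'`](\.|#|\/\/|xpath=)/ },
  { rule: "nth-child", pattern: /nth-child/ },
  { rule: "hardcoded-wait", pattern: /\.waitForTimeout\(/ },
  { rule: "test-only", pattern: /\btest(\.describe)?\.only\(/ },
];

// Scans the source line by line and reports every rule that matches.
function findBannedPatterns(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { rule, pattern } of BANNED) {
      if (pattern.test(text)) findings.push({ line: index + 1, rule });
    }
  });
  return findings;
}
```

Run against a generated spec file, an empty result means none of these particular patterns slipped through. It does not prove the selectors are good, only that the obvious violations are absent.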

Writing tests against the POM

Once the POM exists, test generation is fast. Claude Code reads the POM class and generates tests that use its methods:

"Write E2E tests for the dashboard date range filter. Test that: (1) changing to 30d updates the stats cards with new values, (2) the URL reflects the selected range as a query parameter, (3) the export CSV button downloads a file with the correct date range in the filename. Use the DashboardPage POM."

// tests/e2e/dashboard/date-range.spec.ts
import { test, expect } from "@playwright/test";
import { DashboardPage } from "../pages/DashboardPage.page";

test.describe("Dashboard date range filter", () => {
  let dashboard: DashboardPage;

  test.beforeEach(async ({ page }) => {
    await page.goto("/dashboard");
    dashboard = new DashboardPage(page);
  });

  test("changing to 30d updates stats cards", async () => {
    const initialUserCount = (await dashboard.totalUsersCard.textContent()) ?? "";
    await dashboard.setDateRange("30d");
    // Web-first assertion: retries until the card re-renders with new text.
    await expect(dashboard.totalUsersCard).not.toHaveText(initialUserCount);
  });

  test("selected range reflects in URL", async ({ page }) => {
    await dashboard.setDateRange("30d");
    // toHaveURL retries, so a URL updated shortly after the response still passes.
    await expect(page).toHaveURL(/range=30d/);
  });

  test("export downloads file with date range in filename", async ({ page }) => {
    await dashboard.setDateRange("30d");
    const filePath = await dashboard.exportCsv();
    expect(filePath).toContain("30d");
  });
});

Notice that setDateRange internally waits for the network response, so none of these tests need explicit waits. This is the operational benefit of building waits into POM actions rather than the test body: the tests read like specifications, and the waits are an implementation detail of the POM.

Authentication fixtures

Authentication is the most common source of E2E test duplication. Without fixtures, every test that requires authentication calls the login function manually. With fixtures, login runs once per role, the resulting auth state is saved to disk, and each test that declares the fixture starts from that saved state.

Claude Code generates auth fixtures reliably with this prompt:

"Create a Playwright fixture in tests/e2e/fixtures/auth.fixture.ts that provides three fixtures: adminPage (authenticated as admin@example.com), userPage (authenticated as user@example.com), and guestPage (not authenticated). Use API-based login via POST /api/auth/login to avoid UI login overhead. Store the auth state in tests/e2e/.auth/{role}.json and reuse across tests."

The generated fixture uses request.post to authenticate at the API level, stores the resulting session cookie via storageState, and injects the authenticated page object into tests that declare it as a dependency. Tests that use adminPage start already authenticated with no UI interaction.
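
The core of that fixture can be sketched as follows. This is a minimal illustration under stated assumptions, not the code Claude Code will necessarily produce: the interfaces mirror only the subset of Playwright's APIRequestContext used here (post and storageState), the request body shape is a guess at what POST /api/auth/login expects, and the environment variable names are hypothetical.

```typescript
// Sketch of API-based auth setup. These interfaces mirror only the parts of
// Playwright's APIRequestContext and APIResponse that the code relies on.
type Role = "admin" | "user";

interface LoginResponse {
  ok(): boolean;
  status(): number;
}

interface AuthRequestContext {
  post(url: string, options: { data: unknown }): Promise<LoginResponse>;
  storageState(options: { path: string }): Promise<unknown>;
}

// Where each role's storage state is written (matches the CLAUDE.md rule).
function authStatePath(role: Role): string {
  return `tests/e2e/.auth/${role}.json`;
}

// Reads the password from the environment instead of the test file.
// The variable names (E2E_ADMIN_PASSWORD, E2E_USER_PASSWORD) are hypothetical.
function credentialsFor(role: Role, env: Record<string, string | undefined>) {
  const password = env[`E2E_${role.toUpperCase()}_PASSWORD`];
  if (!password) throw new Error(`Missing password for role "${role}"`);
  return { email: `${role}@example.com`, password };
}

// Logs in through the API, then persists cookies and storage for reuse.
async function saveAuthState(
  request: AuthRequestContext,
  role: Role,
  env: Record<string, string | undefined>,
): Promise<string> {
  const response = await request.post("/api/auth/login", {
    data: credentialsFor(role, env), // assumed body shape: { email, password }
  });
  if (!response.ok()) {
    throw new Error(`Login for ${role} failed with status ${response.status()}`);
  }
  const path = authStatePath(role);
  await request.storageState({ path });
  return path;
}
```

In a real suite, Playwright's built-in request fixture can be passed as the first argument, for example saveAuthState(request, "admin", process.env) inside a setup test. An adminPage fixture would then open a context with browser.newContext({ storageState: authStatePath("admin") }), while guestPage uses a fresh context with no storage state.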

Add the auth state paths to .gitignore:

tests/e2e/.auth/

These files contain session tokens. They should not be committed. Claude Code includes this note in the fixture setup when auth security is mentioned in CLAUDE.md.

Diagnosing flaky tests

Flaky tests are the biggest maintenance cost in any E2E suite. Claude Code diagnoses them effectively when you give it the right input: the test file, the error output from the flaky run, and the Playwright trace if available.

The pattern that works:

"This test fails intermittently in CI but passes locally. Here is the test file: [paste]. Here is the CI failure output: [paste]. Diagnose the cause and fix it."

Claude Code reads the error output looking for the specific failure signatures:

Timeout exceeded on a locator wait: the locator is too broad, or the element appears only after an async operation that is not being awaited properly
Element not found on the first interaction: navigation has not completed, or a loading state is blocking the element
Element is detached from DOM: the page re-renders between locating and clicking, which happens when state updates trigger a full component remount
strict mode violation: multiple elements match the locator, causing Playwright to refuse the action

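For each failure type, Claude identifies the specific line causing the issue and produces a fix: tightening the locator, registering waitForResponse before the interaction that triggers the request, adding a web-first assertion such as expect(locator).toBeVisible() ahead of the click, or making the locator more specific with additional filtering. Playwright locators already auto-wait for actionability, so wrapping a click in a manual waitFor() call rarely fixes anything.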
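
If you collect many CI failures, a rough pre-sort by signature can help decide which ones to hand over first. The sketch below is an illustration only: classifyFailure and the category names are hypothetical, and the substrings are assumptions based on the signatures listed above, since the exact wording of Playwright's output varies by version.

```typescript
type FailureKind =
  | "strict-mode-violation"
  | "detached-element"
  | "locator-timeout"
  | "element-not-found"
  | "unknown";

// Checked in order; the first matching signature wins.
// The patterns are assumptions, adjust them to the output you actually see.
const SIGNATURES: { kind: FailureKind; pattern: RegExp }[] = [
  { kind: "strict-mode-violation", pattern: /strict mode violation/i },
  { kind: "detached-element", pattern: /detached from (the )?DOM|not attached to (the )?DOM/i },
  { kind: "locator-timeout", pattern: /timeout \d+ms exceeded|timeout exceeded/i },
  { kind: "element-not-found", pattern: /element(\(s\))? not found/i },
];

// Maps raw error output to one of the failure categories, or "unknown".
function classifyFailure(errorOutput: string): FailureKind {
  const match = SIGNATURES.find(({ pattern }) => pattern.test(errorOutput));
  return match ? match.kind : "unknown";
}
```

The label is only a starting hint for the prompt. The test file and the full error output still need to be pasted for a real diagnosis.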

The Claude Code debugging guide covers the broader input-error-fix loop that applies across all debugging scenarios, including test failures.

Parallel execution and test isolation

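Playwright runs test files in parallel by default, and with fullyParallel: true (as in the CI config later in this guide) tests inside a single file run in parallel too. Tests that share state break in either mode. Claude Code identifies shared state issues when you point it at your test suite: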

"Scan all tests in tests/e2e/ for shared state that would cause failures in parallel execution. Look for: tests that write to the database without cleanup, tests that depend on a specific record existing without creating it, and tests that modify global application state (feature flags, configuration). Report each issue with a fix."

The most common isolation failures Claude identifies:

Global test data dependencies. A test that says await expect(totalUsersCard).toContainText('1,247') breaks when another test runs first and changes the user count. Fix: assert on relative changes, not absolute values, or seed and clean up test data in beforeEach/afterEach.

Shared auth state mutations. A test that changes an admin setting and does not restore it after the run breaks the next test that assumes the default setting. Fix: restore state in afterEach, or create a separate admin fixture per test that resets to baseline.

Sequential test dependencies. A test that assumes it follows another test (because it relies on data the prior test created) breaks when Playwright runs them in a different order. Fix: each test creates its own data in beforeEach.

Add this to CLAUDE.md to prevent these patterns:

## Test isolation rules
- Every test creates its own test data in beforeEach and cleans up in afterEach
- No test asserts on absolute database counts
- No test depends on another test having run first
- Use factories in tests/e2e/factories/ to create test data with consistent IDs
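
A factory that satisfies these rules might look like the sketch below. It reads "consistent IDs" as a consistent format with a unique value per call, which is an interpretation rather than something the rules spell out. The TestUser shape and the buildUser name are hypothetical.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical shape for a user record; adapt it to your application's model.
interface TestUser {
  id: string;
  email: string;
  status: "active" | "inactive" | "pending";
}

// Builds a user whose ID follows a predictable format but carries a random
// suffix, so records created by parallel workers are very unlikely to collide.
function buildUser(overrides: Partial<TestUser> = {}): TestUser {
  const suffix = randomUUID().slice(0, 8);
  return {
    id: `e2e-user-${suffix}`,
    email: `e2e-user-${suffix}@example.com`,
    status: "active",
    ...overrides, // explicit values win over the generated defaults
  };
}
```

A test would call buildUser({ status: "pending" }) in beforeEach, insert the record through whatever seeding API the project exposes, and delete it by id in afterEach. The shared e2e-user- prefix also makes leftover rows easy to find and purge.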

Network interception for external services

Tests that call external APIs (payments, email, SMS) are slow, non-deterministic, and expensive. Claude Code generates network interception mocks that replace external calls with controlled responses:

"Add network interception to the checkout test suite. Mock the Stripe payment endpoint (POST https://api.stripe.com/v1/payment_intents) to return a success response for card number 4242 4242 4242 4242 and a decline response for 4000 0000 0000 0002. Mock the order confirmation email endpoint to capture the request body without sending."

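Claude generates page.route() intercepts that match the URL patterns and return the appropriate JSON responses. Note that page.route() only sees requests made by the browser; calls your server makes to a third party have to be stubbed on the server side. With the mocks in place, the tests make no external calls, avoid third-party latency, and return the same responses on every run.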
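
The response-selection logic behind such an intercept can be sketched as a plain function. This follows the prompt's premise that the card number appears in the intercepted request body. The JSON payloads are simplified placeholders rather than Stripe's full response schema, and mockPaymentIntent is a hypothetical name.

```typescript
// Subset of the options accepted by Playwright's route.fulfill().
interface MockResponse {
  status: number;
  contentType: string;
  body: string;
}

const SUCCESS_CARD = "4242424242424242";
const DECLINE_CARD = "4000000000000002";

// Wraps a payload as a JSON response with the given HTTP status.
function json(status: number, payload: unknown): MockResponse {
  return { status, contentType: "application/json", body: JSON.stringify(payload) };
}

// Picks a canned response based on the card number found in the request body.
function mockPaymentIntent(postData: string | null): MockResponse {
  // Drop whitespace, "+" and "%20" so "4242 4242 4242 4242" matches whether
  // the body is JSON or form-encoded.
  const normalized = (postData ?? "").replace(/\s|\+|%20/g, "");
  if (normalized.includes(DECLINE_CARD)) {
    return json(402, { error: { type: "card_error", code: "card_declined" } });
  }
  if (normalized.includes(SUCCESS_CARD)) {
    return json(200, { id: "pi_mock_123", status: "succeeded" });
  }
  // Any other input fails loudly so an unmocked scenario is noticed.
  return json(400, { error: { message: "Unexpected card in mocked request" } });
}

// Wiring inside a test (requires @playwright/test):
// await page.route("https://api.stripe.com/v1/payment_intents", (route) =>
//   route.fulfill(mockPaymentIntent(route.request().postData())),
// );
```

Keeping the decision in a pure function means the mock itself can be unit tested, and the route handler stays a one-liner.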

CI integration

Running Playwright in CI requires a few configuration adjustments. Claude Code produces a CI-ready config when you ask:

"Update playwright.config.ts for CI: use 4 workers in CI (process.env.CI check), set retries to 2 in CI and 0 locally, output JUnit XML for the test reporter, enable trace on retry, and set a baseURL from the CI_BASE_URL environment variable."

// playwright.config.ts
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./tests/e2e",
  fullyParallel: true,
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 4 : undefined,
  reporter: process.env.CI
    ? [["junit", { outputFile: "test-results/results.xml" }], ["html"]]
    : "html",
  use: {
    baseURL: process.env.CI_BASE_URL ?? "http://localhost:3000",
    trace: "on-first-retry",
    screenshot: "only-on-failure",
  },
  projects: [
    {
      name: "chromium",
      use: { ...devices["Desktop Chrome"] },
    },
    {
      name: "mobile-chrome",
      use: { ...devices["Pixel 5"] },
    },
  ],
});

Add the Playwright CI setup to your GitHub Actions workflow:

- name: Install Playwright browsers
  run: npx playwright install --with-deps chromium

- name: Run E2E tests
  run: npx playwright test
  env:
    CI: true
    CI_BASE_URL: ${{ env.DEPLOYED_URL }}

- name: Upload test results
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: playwright-report
    path: playwright-report/
    retention-days: 7

The if: always() on the upload step ensures test artifacts are available even when tests fail.

What QA engineers get wrong first

Three patterns appear consistently when teams start using Claude Code for Playwright.

Not specifying the selector strategy. Without a rule, Claude Code picks the shortest selector. For most UIs, that means CSS selectors or nth-child indices. These break on any visual refactor. One rule in CLAUDE.md banning CSS selectors and requiring getByRole prevents most of this, and it produces tests that are more readable as a side effect.

Generating tests without POMs. For a suite of fewer than five tests, POM overhead does not pay off. For twenty tests, a POM saves more time than it costs on the first refactor. Ask Claude Code to generate POMs first, then tests. The test generation step is fast once the POM exists.

Letting Claude run tests before the environment is ready. Claude Code can execute npx playwright test via the Bash tool. If the test environment is not running, it gets connection refused errors and generates a loop of fixes for a problem that is not in the test code. Confirm that the dev server or preview URL is accessible before starting a test generation session. Add this to CLAUDE.md:

## Before running tests
- Verify the dev server is running at http://localhost:3000
- If baseURL returns a connection error, do not attempt to fix the test, fix the environment first
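
A complementary option, not part of the rules above, is Playwright's webServer setting, which starts the application before the tests and waits for the URL to respond. The fragment below is a sketch to merge into the existing playwright.config.ts. The start command npm run dev is an assumption; substitute whatever starts your app.

```typescript
// Fragment to pass as the `webServer` key of the object given to defineConfig().
// "npm run dev" is an assumed start command.
const webServer = {
  command: "npm run dev",
  // Playwright waits until this URL responds before running any test.
  url: "http://localhost:3000",
  // Locally, reuse a server that is already running; in CI, start a fresh one.
  reuseExistingServer: !process.env.CI,
  // Fail the run if the server is not reachable within two minutes.
  timeout: 120_000,
};
```

When the suite targets an already deployed preview URL through CI_BASE_URL, this fragment is unnecessary and the manual check in CLAUDE.md remains the safeguard.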

Building a complete E2E test suite

The workflow in this guide produces a Playwright test suite where every test uses stable selectors, authentication is handled by fixtures rather than repeated UI login, tests are isolated enough to run in parallel, and external services are intercepted so the suite runs fast in CI.

The foundation is the CLAUDE.md template above. Add it to your project root, generate one POM and its associated tests end-to-end, and adjust the rules based on what Claude produces. The selector strategy and POM conventions rarely need adjustment after the first session. Authentication fixtures and isolation rules are usually where teams add two or three project-specific rules.

For the unit and integration testing workflow that sits below E2E in the testing pyramid, the Claude Code testing guide covers Vitest, TDD, and coverage analysis. For the custom commands and hook patterns that automate test runs during development, see the Claude Code hooks guide and custom agents.

Want a Playwright setup that runs from day one? Claudify includes a Playwright CLAUDE.md template, POM generator patterns, auth fixture setup, and CI configuration. One command: npx create-claudify.
