The Real Problem With Recorded Tests
The checkout flow test passes on your machine, passes in your demo to the team, and then fails at 3am in CI every day for two weeks. You dig into the logs and find a different failure mode each time: sometimes a missing element, sometimes a timeout, sometimes the test just hangs. That’s not a flaky test. That’s a recorder that captured what you clicked without understanding why the click was valid at that moment.
Most web test recorders work by serializing user interactions into selector-based scripts, and the selectors they generate are almost always wrong for production use. A recorder watching you click a button will grab whatever identifier is convenient, often something like div:nth-child(3) > button.sc-bdfxgS, a hashed CSS class from a CSS-in-JS library that changes every build. Your test passes once because it ran against the same build you recorded on; it fails permanently after the next deploy. I’ve seen teams spend more time debugging generated selectors than they would have spent writing the tests by hand.
The race condition problem is worse and harder to see. A recorder captures a sequence of events with fixed timings baked in. But your checkout flow has async operations (address validation, shipping rate calculation, payment tokenization), and the recorder doesn’t know they’re async. It just saw you click “Continue” 800ms after filling in the ZIP code, because that’s how long it took on your fast dev machine with a warm cache. In CI, on cold infrastructure, that operation takes 2.4 seconds. The recorder’s implicit wait is long gone and the test is clicking into a loading state. No amount of rerunning fixes this without actually addressing the wait strategy.
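The fix is replacing fixed delays with condition-based waits. Outside any particular framework, the idea is just a polling loop; here is a generic sketch (the waitFor helper is invented for illustration, not a Playwright or Cypress API):

```javascript
// Condition-based wait: poll a predicate instead of sleeping a fixed duration.
// Hypothetical helper for illustration only.
async function waitFor(predicate, { timeout = 5000, interval = 100 } = {}) {
  const deadline = Date.now() + timeout;
  while (Date.now() < deadline) {
    if (await predicate()) return true; // condition met: proceed immediately
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
  throw new Error(`condition not met within ${timeout}ms`);
}
```

A recorded sleep(800) encodes one machine’s timing; a predicate-based wait encodes the actual precondition, which is what survives cold CI infrastructure.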
Auth state is the third failure mode nobody documents until it bites them. You record a test while already logged in, and the recorder captures the authenticated session flow perfectly. Then CI runs the test from scratch, hits a redirect to /login, and the entire script is now operating on the wrong DOM. Worse, some recorders will try to “recover” and keep clicking, producing a failure trace that looks nothing like the actual problem. The recorders that handle this well let you explicitly define preconditions: seed a session token, call an API to bootstrap state, or run a login step as a reusable setup block rather than replaying the login UI on every run.
The dividing line between a recorder you keep and one you rip out after week two usually comes down to three things:
- Selector strategy control: can you tell it to prefer data-testid attributes, ARIA roles, or visible text over brittle auto-generated CSS paths?
- Explicit wait handling: does it insert waitForSelector/waitForResponse-style assertions, or does it just hardcode sleep(500)?
- Exportable, readable output: if the generated test is a binary blob or locked inside a proprietary UI, you can’t review it, diff it, or fix it when it breaks.
The tools that get this right make a deliberate choice to prioritize CI reliability over easy demos. The ones that look impressive in a five-minute screenshare are often the ones that generate uneditable scripts full of absolute coordinates and pixel offsets.
The Tools I’ve Actually Run in Production (My Current Stack)
Playwright Codegen: The One I Open First Now
I switched my whole team to Playwright Codegen about 18 months ago, and what actually sold me wasn’t the recording quality; it was the locator strategy. Most recorders grab id attributes or fragile CSS selectors. Codegen defaults to ARIA roles and accessible text first, then falls back to data-testid, then to CSS. That ordering matters enormously for playback reliability. A test that selects a button by [aria-label="Submit order"] survives a design overhaul. A test that selects #btn-147 breaks the next sprint when someone renames a component.
# Fire up the recorder against your local dev server
npx playwright codegen --viewport-size="1280,720" http://localhost:3000
# If you need to capture network requests too
npx playwright codegen --save-har=auth-flow.har http://localhost:3000/login
The generated output is TypeScript (or JS/Python/Java, depending on flags) that you can paste directly into a test file. It isn’t clean; you’ll see redundant waits and the occasional bizarre selector choice, but the structure is correct. I treat the output like a first draft from a junior dev: good bones, needs a real code review. The --save-har flag is something most people miss. If you’re recording an authenticated flow, capturing the HAR means you can replay network conditions in CI without standing up a live backend. That alone justified the switch for one of our projects.
Cypress Studio: Useful in a Narrow Slice
Cypress Studio is not for greenfield test suites. I used it as a primary recorder for about four months and quietly stopped. The issue isn’t the UI; it’s that Studio only records within an existing it() block, meaning you’re extending tests you’ve already written. That’s fine for adding assertions to a partial test, but it’s a weird workflow to teach a team. You end up with a hybrid approach where you write the navigation manually, let Studio capture the assertions, and then wonder why the generated .contains() calls break on dynamic text.
Where I still reach for it: adding quick regression assertions to a test that already exists and works. If I have a checkout flow test that I want to extend with three new UI checks after a feature ship, Cypress Studio does that in about 90 seconds. The selector generation is worse than Playwright’s (it leans heavily on nth-child and positional selectors), but for assertions against static UI elements it’s usually fine. The real trade-off: Cypress Studio requires the Cypress Test Runner running locally, so it’s not something you can hand to a non-dev easily. It’s a dev tool with a graphical layer, not a QA-friendly recorder.
Selenium IDE: The One You Inherit, Not Choose
I’ve never started a project with Selenium IDE; I’ve inherited three. My honest take: the recording fidelity is genuinely impressive for a browser extension, and the Selenese command format is readable by non-engineers. The problem is everything around the recorder. SIDE (Selenium IDE project) files are JSON, which sounds nice until you’re trying to diff them in a PR. The playback CLI (selenium-side-runner) lags behind browser updates; I’ve had it break twice on Chrome major version bumps and once on a Firefox ESR update, each time requiring a full day of diagnosis.
# Running SIDE files headlessly (when it actually works)
npm install -g selenium-side-runner
selenium-side-runner -c "browserName=chrome" my-suite.side
# Expected: test results to stdout
# Reality: check that chromedriver matches your installed Chrome version exactly
If you’re maintaining a legacy suite on Selenium IDE, the migration path I’d recommend is: export to Java or Python, run those through a cleanup pass, then port to Playwright Test. It’s about two days of work per 50 tests if the tests were written reasonably. Not fun, but it removes a permanent dependency on a tool that’s one Chrome update away from being broken.
TestCafe Studio: The Config-Free Option That Actually Delivers
TestCafe Studio surprised me. The zero-config value proposition is real: you install the desktop app, point it at a URL, and record. No browser driver management, no npm install for the runner, no webdriver-manager update hell. For teams where the people writing tests aren’t comfortable in the terminal, this matters a lot. I set up TestCafe Studio for a QA team at a client where the testers were strong on manual testing but had no Node.js background, and they were producing runnable tests in their first hour.
The trade-offs are concrete, though. TestCafe Studio costs $599/year per user (as of early 2025; check their current pricing before budgeting). The selectors it generates are specific to TestCafe’s own selector engine, so if you ever want to migrate to Playwright or Cypress you’re rewriting, not porting. The test files are in TestCafe Studio’s own .testcafe format internally, though you can export to JavaScript. CI integration requires the TestCafe CLI, which adds a dependency you’re now maintaining separately from your recorder. I wouldn’t use it for a team that’s already comfortable with JavaScript tooling; Playwright Codegen gets you 80% of the same usability for free.
Why I Stopped Recommending Recording as the End Product
The mindset shift that actually made our test suite reliable was treating every recorded test as a scaffold, not a finished artifact. Recording gets you 70-80% of a working test in five minutes. That’s the value. But the remaining 20-30% (replacing positional selectors with data-testid attributes, adding proper wait conditions instead of implicit sleeps, extracting repeated flows into reusable helpers) is what makes tests survive six months without constant maintenance. Tests that go straight from recorder to CI without review fail in ways that waste more time than the recording saved.
My current workflow: record with Playwright Codegen, do a 15-minute code review pass on the output (primarily checking selectors and removing any hardcoded timeouts), add the test to a PR for review before it ever touches CI. The review checklist I actually use:
- Selectors: is anything using nth-child or a raw CSS class? Replace it with a role or data-testid.
- Waits: does the recorder-generated code have waitForTimeout(2000) anywhere? Replace it with waitForSelector or an assertion-based wait.
- Reuse: does this test repeat a login flow that already exists as a fixture? Extract it.
- Assertions: Codegen often captures clicks and fills but misses the most important assertions; add explicit expect calls for the actual outcome you care about.
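For larger suites, the first two checklist items are mechanical enough to automate as a pre-review pass. A minimal sketch using regexes (reviewRecordedTest is a hypothetical helper; a real implementation would walk the AST rather than pattern-match source text):

```javascript
// Flag the most common recorder smells in a generated test file's source.
// Returns a list of human-readable issues; empty means the draft looks clean.
function reviewRecordedTest(source) {
  const issues = [];
  if (/nth-child/.test(source)) issues.push('positional selector (nth-child)');
  if (/waitForTimeout\(\s*\d+\s*\)/.test(source)) issues.push('hardcoded timeout');
  if (!/\bexpect\(/.test(source)) issues.push('no explicit assertions');
  return issues;
}
```

Wiring something like this into a pre-commit hook keeps the checklist from depending on reviewer memory.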
The teams I’ve seen succeed with recorded tests treat them like generated API client code from OpenAPI: useful starting point, always requires a human pass before it ships. The teams that struggle are the ones expecting the recorder to output production-ready tests. No recorder does that, and honestly, promising it does sets up the whole practice for failure.
Playwright Codegen: What I Actually Use Day-to-Day
The thing that caught me off guard with Playwright Codegen wasn’t how good it was at recording; it was how good the generated selectors are. Most recorders I’ve used produce brittle XPath soup or CSS selectors tied to implementation details like div.MuiButton-root:nth-child(3). Codegen defaults to getByRole, getByLabel, and getByTestId, and those survive redesigns in a way that position-based selectors just don’t.
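That preference order is easy to picture as a fallback chain: semantic locator first, label next, test id after that, raw CSS only as a last resort. A toy sketch (the element shape and pickSelector helper are invented for illustration; this is not Playwright’s internal logic):

```javascript
// Choose the least brittle locator available for a recorded element.
// `el` is an assumed snapshot shape: { role, name, label, testId, cssPath }.
function pickSelector(el) {
  if (el.role && el.name) {
    return `getByRole('${el.role}', { name: '${el.name}' })`; // semantics first
  }
  if (el.label) return `getByLabel('${el.label}')`;   // accessible label next
  if (el.testId) return `getByTestId('${el.testId}')`; // stable test hook
  return `locator('${el.cssPath}')`;                   // brittle last resort
}
```

The point of the ordering: everything above the last line survives a markup refactor; the last line usually doesn’t.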
Getting It Running
If you’re starting from scratch in a Node project:
# bootstraps playwright, installs browsers, creates playwright.config.ts
npm init playwright@latest
# open the browser inspector + code panel against your actual app
npx playwright codegen https://yourapp.com
That second command opens two windows: a Chromium instance where you interact with your app, and a code panel that updates in real time. Every click, fill, and navigation gets transcribed. The code it generates is ready to paste into a .spec.ts file and run: not pseudo-code, not a YAML DSL, actual TypeScript with real Playwright API calls.
The Config That Actually Makes This Useful
Recording anonymous flows is easy. Recording an authenticated flow (your dashboard, your settings page, anything behind a login) is where most recorders fall apart. Playwright handles this with storage state:
# Record the login flow first, save cookies + localStorage to auth.json
npx playwright codegen \
--browser firefox \
--save-storage auth.json \
https://yourapp.com/login
# Now record the authenticated flow, starting already logged in
npx playwright codegen \
--browser firefox \
--load-storage auth.json \
https://yourapp.com/dashboard
The --save-storage flag captures the full browser storage state after you finish the recording session. On the second invocation, --load-storage restores that state, so you’re already past the auth wall before you touch anything. I use --browser firefox for anything involving form autofill behavior because Chromium and Firefox handle it differently and my users are split between the two; I want tests that match real behavior. Add --viewport-size="390,844" for mobile flows; without it you’ll waste an afternoon wondering why your hamburger menu tests fail in CI when they ran fine locally at 1440px wide.
Three Things the README Doesn’t Emphasize Enough
- Iframe handling: Codegen automatically scopes actions to the correct frame context. The generated code uses page.frameLocator('#my-iframe') for you; you don’t have to figure out the frame switching yourself. I’ve tried four other recorders that silently missed iframe interactions entirely.
- The assertion toolbar: there’s a button in the code panel that switches from “record actions” mode to “record assertions” mode. Click an element in that mode and it generates an expect(locator).toBeVisible() or expect(locator).toHaveText(). Most people I know skip this and write assertions manually later. Don’t; the assertions it generates are actually good.
- The pause button: mid-recording, you can pause and the browser enters inspector mode. This is useful when you need to wait for a loading state to settle before your next action and you want the recorded test to include a proper wait rather than a flaky page.waitForTimeout(2000).
Where It Breaks Down
SPAs with heavy CSS transitions or JavaScript-driven animations cause the recorder to generate redundant click events, because the element is technically clickable before it’s visually stable. You end up with tests that pass locally and flake in CI under slower CPU throttling. The fix is usually manual: wrap those actions in page.waitForLoadState('networkidle') or add an explicit expect(locator).toBeVisible() before the action.
Third-party OAuth is a flat-out dead end for recording. Google’s login and GitHub’s OAuth flow detect automation and either block you or serve a degraded version. The right answer there is to use --load-storage with a pre-authenticated state and skip recording the OAuth dance entirely. File upload dialogs are also inconsistent: Codegen handles the input[type=file] path fine, but if your app opens a native OS file picker via a drag-and-drop zone or a non-standard button, some Chromium builds on Linux won’t trigger the dialog at all during recording. You’ll need to write that step manually using page.setInputFiles().
Cypress Studio: Good Idea, Inconsistent Execution
The thing that surprised me about Cypress Studio is that it has been experimental since Cypress 6 and is still experimental in Cypress 13. That’s not a knock; it’s a useful warning signal. When something stays behind an experimental flag for multiple major versions, the team is either not happy with the output quality or the edge-case surface is too large to stabilize. With Studio, it’s a bit of both.
Enabling it is one line:
// cypress.config.js
const { defineConfig } = require('cypress')
module.exports = defineConfig({
e2e: {
experimentalStudio: true,
baseUrl: 'http://localhost:3000',
// nothing else required; Studio activates in the test runner UI
},
})
After that, you’ll see an “Add Commands to Test” wand icon in the Cypress runner next to each test. Click it, interact with your app, and Studio records the actions. The friction to get started is genuinely low. That’s where my praise for it ends and the selector quality problem begins.
The docs claim Studio generates “solid selectors.” My actual experience: you get a lot of .get(':nth-child(3)'), .contains('Submit') attached to elements three levels up, and class-based selectors like [class='btn btn-primary'] that will break the moment a designer adds a utility class. Compare that to what I’d write by hand:
// What Studio gave me
cy.get(':nth-child(3) > .form-group > [class="btn btn-primary"]').click()
// What I'd actually write
cy.get('[data-testid="checkout-submit-btn"]').click()
The gap is real and it matters. If your team has a data-testid convention (which you should), Studio mostly ignores it unless your elements already carry those attributes. It doesn’t understand your selector conventions; it just walks the DOM and picks whatever looks unique at that moment. On dynamic apps where class names are generated by CSS Modules or Tailwind’s JIT, the recorded selectors can break on the next build.
That said, there’s one specific situation where I reach for Studio without hesitation: adding assertions to an existing, stable spec. You’ve already got a test that navigates to the right page, and you just want to bolt on a few should('be.visible') or should('have.text', '...') checks without tab-switching back to your editor. Studio handles this well because you’re clicking on elements that are already on screen and the assertion UI is visual. The recorded output for assertions is cleaner than for interactions; you get .should('have.value', 'Acme Corp') rather than fragile structural selectors.
Skip it entirely for greenfield projects (you’ll need to clean up every selector anyway, so you’ve saved nothing), anything with server-rendered dynamic data where element order shifts between runs, and teams that enforce strict selector standards in code review. In those cases, you’ll spend more time fixing Studio’s output than you would’ve spent writing the test from scratch. My default is to write tests by hand with data-testid attributes and only open Studio when I need to quickly prototype an assertion on a page I’m already looking at.
Selenium IDE: The One You’re Probably Stuck With
The thing that catches most people off guard with Selenium IDE isn’t what it can’t record; it’s what happens the moment you try to move those recordings somewhere useful. The browser extension installs in 30 seconds, you click record, and it looks like magic. Then you try to run that .side file in CI and spend two afternoons debugging why your pipeline keeps crashing on a blank screen.
Install path is dead simple: Chrome Web Store or Firefox Add-ons, search “Selenium IDE”, click install. No Node, no Python environment, no config files. That zero-friction onboarding is genuinely its best feature, and also exactly why teams get stuck with it. Someone in QA recorded 40 tests three years ago, nobody has touched them since, and now you’re the person asked to “just make them run in GitHub Actions.” Good luck.
Playback outside the browser goes through selenium-side-runner; code export happens from the IDE itself (right-click a test or suite, choose Export, then pick a language target such as JavaScript Mocha, Python pytest, or Java JUnit). The runner side is straightforward:
# Install once
npm install -g selenium-side-runner
# Run your recorded suite against staging
npx selenium-side-runner --base-url https://staging.yourapp.com tests/recorded.side
The generated JavaScript output is where the pain starts. The code it produces is flat: no page object model, no separation of selectors from logic, every locator inlined as a raw XPath or CSS string. A generated file for a login test might have driver.findElement(By.xpath("//input[@id='username']")) repeated four different ways across the same file, because the recorder captured slight variations each time. You can’t diff it meaningfully in a PR review, and the moment someone renames a field in your app you’re hunting through machine-generated spaghetti instead of updating one selector in one place.
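What the generated code is missing is a thin page-object layer, where each selector lives in exactly one place. A framework-agnostic sketch (LoginPage and the driver interface here are invented for illustration; this is not what the exporter produces):

```javascript
// Page object sketch: selectors defined once, instead of inlined four ways
// across a generated file. `driver` is any object with type()/click() methods.
class LoginPage {
  static selectors = {
    username: "//input[@id='username']",
    password: "//input[@id='password']",
    submit: "//button[@type='submit']",
  };
  constructor(driver) {
    this.driver = driver;
  }
  async logIn(user, pass) {
    const s = LoginPage.selectors;
    await this.driver.type(s.username, user);
    await this.driver.type(s.password, pass);
    await this.driver.click(s.submit);
  }
}
```

When a field gets renamed, you update one entry in `selectors` instead of hunting through every test that touches the login form.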
The Xvfb problem is the silent killer in CI. Selenium IDE’s runner expects a real display, or a convincing fake. On headless Ubuntu runners (GitHub Actions, CircleCI) you’ll hit errors like unknown error: Chrome failed to start: exited abnormally with no useful context. The workaround that actually works:
# In your CI yaml (GitHub Actions example)
- name: Install Xvfb and Chrome deps
run: |
sudo apt-get install -y xvfb libgbm-dev
- name: Run Selenium IDE tests with virtual display
run: |
Xvfb :99 -screen 0 1920x1080x24 &
export DISPLAY=:99
npx selenium-side-runner \
--capabilities '{"browserName":"chrome","goog:chromeOptions":{"args":["--no-sandbox","--disable-dev-shm-usage"]}}' \
--base-url https://staging.yourapp.com \
tests/recorded.side
The --no-sandbox and --disable-dev-shm-usage flags are non-negotiable inside Docker or constrained CI environments. Skip either one and Chrome silently dies before it opens a tab. The /dev/shm partition on most CI runners is 64MB; Chrome’s renderer will OOM trying to use it for shared memory, which manifests as flaky tests that pass locally and fail randomly in CI. That bug wasted a Friday for me before I figured it out.
Honest verdict: Selenium IDE earns its place on a team where QA writes tests and developers never touch them. If your workflow is “record in browser, run manually before a release, done,” it’s genuinely fine. The moment you need those tests committed to a repo, reviewed in PRs, maintained alongside feature code, or executed reliably in a pipeline without babysitting, you’re fighting the tool instead of using it. The .side format is JSON under the hood, but it’s not human-readable in any meaningful way, merge conflicts in it are a nightmare, and there’s no clean abstraction layer. For that workflow, you’re better off spending a day migrating to Playwright’s codegen or Cypress’s Studio: the recording experience is comparable, and what you get out the other end is actually maintainable code.
TestCafe Studio: The Underdog Worth Knowing About
The thing that surprised me most about TestCafe Studio is that it doesn’t touch WebDriver at all. No ChromeDriver binary sitting in your node_modules getting out of sync with your browser version, no `webdriver-manager update` ritual before CI runs. TestCafe injects a script directly into the page and drives interaction through that. This sounds like a minor implementation detail until you’ve spent an afternoon debugging a test suite that was failing because Chrome auto-updated to 115 and your ChromeDriver was still on 114. That whole category of problem just disappears.
The visual recorder genuinely is the most polished of any I’ve tried, and I’ve run through Selenium IDE, Playwright’s codegen, Cypress Studio (before they semi-deprecated it), and a few commercial options. The action timeline is clean, editing a step inline feels natural, and it doesn’t produce the kind of selector spaghetti Selenium IDE used to dump out. If you hand this to a QA analyst who lives in spreadsheets, they’ll figure it out without a two-hour onboarding session. That matters for adoption.
The wait strategy is where TestCafe makes an interesting architectural bet. They call it the Smart Assertion Query Mechanism: assertions are retried over a configurable timeout window rather than requiring you to manually pepper your test with explicit waits. On paper this makes tests less brittle, because you’re not writing:
// This breaks if the network is 200ms slower than usual
await t
.wait(3000)
.expect(Selector('#submit-btn').exists).ok();
// TestCafe's approach: retries automatically within the timeout
await t
.expect(Selector('#submit-btn').exists).ok({ timeout: 8000 });
The honest caveat: this holds up well for standard SPAs with predictable DOM updates. I’ve seen it get confused on apps with complex nested shadow DOM components or custom Web Components that don’t follow standard lifecycle patterns. Before you commit your team to this tool, record 10 tests on your actual app β not a demo β and run them 20 times in headless mode. Flakiness shows up fast under that kind of repetition.
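The retry idea itself is simple to reason about in isolation: re-run the assertion until it passes or the clock runs out, and only surface the last failure. A generic sketch of that pattern (my own reimplementation of the concept for illustration, not TestCafe’s code):

```javascript
// Retry an assertion (a function that throws on failure) until it passes
// or the timeout expires; rethrow the last failure if it never passes.
async function expectEventually(check, { timeout = 8000, interval = 50 } = {}) {
  const deadline = Date.now() + timeout;
  let lastError;
  do {
    try {
      return await check(); // passed: stop retrying immediately
    } catch (err) {
      lastError = err; // remember the most recent failure
    }
    await new Promise((resolve) => setTimeout(resolve, interval));
  } while (Date.now() < deadline);
  throw lastError;
}
```

The design trade-off is visible in the sketch: a passing assertion returns as soon as the DOM settles, but a genuinely broken one still costs you the full timeout before it fails.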
Pricing is the friction point you need to verify before putting this in front of your manager. TestCafe (the open-source runner) is MIT-licensed and free. TestCafe Studio, the visual recorder and polished IDE, moved to a commercial model. Prices change, so check their site directly rather than trusting any number I’d write here. The short version: if you’re a solo dev evaluating tools, the cost might be a non-starter. If you’re a team of 5-10 where one commercial seat saves you 3 hours a week of test maintenance, the math likely works out. Just factor it into the decision explicitly rather than discovering it mid-trial.
The fit is specific and honest: small QA teams, orgs where non-developers need to own test creation, and .NET shops that are already bought into the DevExpress ecosystem. If your company uses DevExpress components across your frontend, the integration and support story is coherent in a way it isn’t for teams arriving from a Playwright or Cypress background. For a startup where everyone writes code and Playwright’s codegen is already in the workflow, TestCafe Studio is probably redundant overhead. For an enterprise team where a business analyst is expected to write regression tests without touching a terminal, it’s one of the more realistic options I’ve seen.
Head-to-Head: The Comparison That Actually Matters
I’ve burned time on all four of these tools in real projects, and the differences between them aren’t always obvious from the marketing pages. The table below cuts to what you’ll actually hit after day one. One thing upfront: pricing moves fast in this space, so treat the free/paid column as directional and verify current tiers before you commit anything to a budget doc.
| Tool | Selector Quality | CI Integration Pain | Auth State Handling | Free Tier | Biggest Dealbreaker |
|---|---|---|---|---|---|
| Playwright Codegen | Role-based and text-based; best in class | Zero pain; it’s just Playwright | --save-storage works great | Fully free, always | Locked to the Playwright test runner |
| Cypress Studio | Fragile; data-cy attrs or bust | Built in, if you’re already on Cypress | No dedicated auth recording flow | Free in OSS Cypress | Still experimental, breaks unexpectedly |
| Selenium IDE | XPath-heavy, brittle under DOM changes | Needs selenium-side-runner setup | Manual workarounds only | Free browser extension | Generated code is genuinely ugly |
| TestCafe Studio | Smart selectors, survive refactors better | Integrates cleanly with TestCafe runner | Decent session handling out of the box | Paid (Studio license required) | Full vendor lock-in to TestCafe ecosystem |
The selector quality column is the one that kills teams silently. Selenium IDE generating XPath like //div[@class='container']/ul/li[3]/a feels fine until your designer moves a wrapper div and half your suite red-screens. Playwright’s codegen produces getByRole('link', { name: 'Submit' }); that selector survives most UI refactors because it’s tied to semantics, not DOM structure. I switched two projects off Selenium IDE specifically for this reason and didn’t miss a single recorded test.
The auth state column deserves more attention than it usually gets. Playwright’s approach is dead simple once you know it:
# Record session and save cookies + localStorage
npx playwright codegen --save-storage=auth.json https://yourapp.com/login
# Reuse that session in all subsequent recordings or tests
npx playwright codegen --load-storage=auth.json https://yourapp.com/dashboard
Cypress Studio has nothing equivalent. You either write a cy.session() block yourself or you record the login flow every single time, which means your recorded tests include brittle login steps that break independently of whatever you’re actually testing. For anything behind auth (which is most of the interesting UI), this is a real gap.
TestCafe Studio’s biggest hidden cost isn’t the license fee; it’s the switching cost. Studio writes TestCafe-flavored JS that doesn’t map cleanly to Playwright or Cypress primitives, so the moment you want to migrate, you’re re-recording everything. Weigh that against whether you actually need the better session handling it provides; for many teams the answer is no, and Playwright codegen is the free equivalent that doesn’t hold your tests hostage.
Practical guide for picking one fast:
- Greenfield project, no runner preference: Playwright Codegen. Not close.
- Already deep in the Cypress ecosystem: Cypress Studio for quick wins, but keep your data-cy attributes consistent or the generated selectors will embarrass you.
- Legacy Selenium suite you can’t replace: Selenium IDE is tolerable for net-new recordings if you accept you’ll clean the generated code before committing it.
- Non-engineer QA team that needs a visual tool and has budget: TestCafe Studio is the most polished recording experience, just go in knowing you’re picking a platform.
When to Pick What: Match the Tool to Your Situation
The honest answer is that most “which tool should I use” debates waste time because the answer is almost always determined by your current stack and team composition, not by feature matrices. Here’s how I actually think through it.
Starting Fresh with No Existing Test Suite
Playwright Codegen, full stop. Run npx playwright codegen https://yourapp.localhost and you get a browser with a live recorder that generates real Playwright test syntax as you click. No account needed, no config file required to start, and the generated code uses locators that are decent by default: it tries getByRole and getByLabel before falling back to CSS selectors. The output lands directly in a .spec.ts file you can run immediately with npx playwright test. What caught me off guard the first time was how well it handles form fills; it captures fill() calls with the actual values, so your recorded test is runnable on the first try, not just a skeleton.
Already on Cypress and Need Coverage Fast
Use Cypress Studio as a shortcut, but go in knowing it’s scaffolding, not finished code. Enable it in cypress.config.js with experimentalStudio: true, record your flow by interacting with the app inside the Cypress runner, then immediately audit the generated selector chain. Studio leans hard on cy.get('[data-cy="..."]') when those attributes exist, but on apps where nobody added data-cy attributes it will generate things like cy.get('.MuiButton-root:nth-child(3)'), which breaks the second someone updates a Material UI version. The plan should always be: record fast, then spend 20 minutes swapping brittle selectors for cy.contains() or cy.findByRole() from Testing Library before committing.
Stuck on a Selenium Project You Can’t Migrate
Selenium IDE still works for recording, but treat the exported code as a rough draft, never as production test code. The real workflow is: record in the IDE, export to your language of choice (Java, Python, JS), then hand-edit before a single line goes into version control. The XPath selectors Selenium IDE generates are famously fragile; you’ll see output like //div[@class='container']/div[2]/button, which breaks if someone adds a sibling element. Replace those with explicit IDs or By.cssSelector("[data-testid='submit']"). Also, Selenium IDE’s export doesn’t handle waits intelligently: add explicit WebDriverWait calls around any assertion that depends on async behavior or you’ll be debugging intermittent failures for weeks.
Non-Technical QA Team Needs to Own Tests
TestCafe Studio is worth having the cost conversation with your manager. The free testcafe CLI is excellent, but TestCafe Studio adds a visual recorder with a point-and-click interface that a QA analyst without JavaScript experience can actually use without a developer holding their hand. The license cost (check their current pricing at testcafe.io; it’s seat-based) is usually cheaper than the engineering hours spent translating QA’s manual test scripts into code. The trade-off is that the generated tests still need a developer review before they go into CI, because non-technical users tend to over-assert on visual properties that change constantly. Budget for a monthly cleanup pass.
Recording Authenticated Flows Reliably in CI
Don’t try to record the login flow and replay it inline – that approach breaks constantly on CSRF tokens and session expiry. Instead, use Playwright’s storage state pattern. Write a one-time auth setup script:
// auth.setup.ts
import { chromium } from '@playwright/test';
async function globalSetup() {
const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://yourapp.com/login');
await page.fill('[name="email"]', process.env.TEST_EMAIL!);
await page.fill('[name="password"]', process.env.TEST_PASSWORD!);
await page.click('[type="submit"]');
// save cookies + localStorage so every test starts authenticated
await page.context().storageState({ path: 'playwright/.auth/user.json' });
await browser.close();
}
export default globalSetup;
Then reference it in playwright.config.ts with globalSetup: './auth.setup.ts' and storageState: 'playwright/.auth/user.json' in your project config. Every recorded test you generate with codegen after this point can skip the login entirely. Commit the setup script, add playwright/.auth/ to .gitignore.
React or Next.js Apps with Lots of Dynamic Content
Pure record-and-playback on a heavily dynamic React app is a trap. The recorder captures what was in the DOM at that exact moment – if your component re-renders with different keys or your Next.js app does hydration that shifts element order, your recorded selectors become stale within days. Use codegen strictly as a scaffold: get the basic navigation and interaction flow captured, then manually replace generated selectors with role-based queries. The difference between a test that survives a refactor and one that doesn’t is usually this:
// what codegen gives you – breaks on className changes
await page.locator('.checkout-button--primary').click();
// what you replace it with – survives refactors
await page.getByRole('button', { name: 'Complete Purchase' }).click();
Spend the 10 minutes per test doing that swap. Your future self – the one debugging CI failures at 11pm – will be grateful.
The Gotchas Nobody Puts in the Docs
The thing that burned me first wasn’t picking the wrong recorder – it was realizing that every recorder I tried produced tests that passed once, then failed randomly for the next three days. The output looks like a test. It has selectors, assertions, waits. It just doesn’t survive contact with your actual CI pipeline.
Dynamic IDs Will Eat Your Selectors Alive
React, Vue, Angular – they all have a habit of generating IDs at runtime. You record a click on #input-3f8a2b and by the next render that hash has rotated. Playwright Codegen, Selenium IDE, Cypress Studio – none of them have a magic fix for this because it’s your app’s fault, not theirs. The recorder faithfully writes down what it sees. If what it sees is garbage, you get garbage tests.
The fix is a team convention, not a tool feature. Every interactive element that a test might need to target gets a data-testid attribute baked in at the component level:
<!-- In your React component -->
<input
type="email"
data-testid="login-email-input"
{...props}
/>
Then run Playwright Codegen with --target=playwright-test so the output is a ready-to-run @playwright/test spec. The flag controls the output format; recent codegen versions pick up data-testid attributes on their own when they exist in the DOM:
# Playwright 1.40+ – emit a @playwright/test spec; data-testid locators are used automatically
npx playwright codegen \
--target=playwright-test \
https://app.yoursite.dev
You can also set testIdAttribute in your config so Playwright’s locator engine knows which attribute to use universally:
// playwright.config.ts
export default defineConfig({
use: {
testIdAttribute: 'data-testid', // now page.getByTestId() uses this
},
});
Without this, even Playwright’s codegen will fall back to positional selectors like nth=2 or fragile text matches – the former break when the layout changes, the latter when the UI copy changes.
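A cheap way to enforce the convention is a lint pass over recorded specs before they go to review. The helper below is a sketch, not a real tool – findBrittleSelectors and its patterns are hypothetical, tuned to the failure modes described above (positional nth-child chains, hashed CSS-in-JS classes, runtime-generated IDs):

```typescript
// Hypothetical lint helper: flag recorded selectors that tend to rot between builds.
const BRITTLE_PATTERNS: Array<[RegExp, string]> = [
  [/:nth-child\(\d+\)/, "positional nth-child selector"],
  [/\.[A-Za-z]+-[A-Za-z0-9]{6,}/, "looks like a hashed CSS-in-JS class"],
  [/#\w+-\d+/, "looks like a runtime-generated id"],
];

function findBrittleSelectors(specSource: string): string[] {
  const findings: string[] = [];
  // Pull the first string argument out of locator()/cy.get()/querySelector() calls.
  for (const match of specSource.matchAll(
    /(?:locator|get|querySelector)\(\s*["']([^"']+)["']/g
  )) {
    const selector = match[1];
    for (const [pattern, reason] of BRITTLE_PATTERNS) {
      if (pattern.test(selector)) findings.push(`${selector}: ${reason}`);
    }
  }
  return findings;
}
```

Wire something like this into a pre-commit hook and the data-testid convention stops depending on reviewer vigilance.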
Hardcoded Waits Are a Flakiness Time Bomb
Every recorder I’ve used – without exception – inserts waitForTimeout(2000) or page.waitFor(3000) when it detects you paused between actions. That number is completely made up. It was fast enough on your laptop the day you recorded. It won’t be fast enough in CI running on a 2-core shared runner at 2am under load.
Delete every hardcoded timeout in your recorded output. Replace with intent-based waits:
// ❌ What the recorder gives you
await page.waitForTimeout(2000);
await page.click('#submit-btn');
// ✅ What you actually want
await page.waitForSelector('#submit-btn:not([disabled])');
await page.click('#submit-btn');
// Or for pages that do a full data fetch before showing content:
await page.waitForLoadState('networkidle');
await page.getByTestId('dashboard-table').waitFor();
networkidle isn’t perfect – some SPAs have persistent polling that prevents it from ever firing – but it’s significantly better than a magic number. The pattern I default to is waitForSelector on the element you’re about to interact with, combined with a visible state check. This mirrors what a real user does: they wait until the button is there before clicking it.
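The replacement pattern is tool-agnostic: stripped of Playwright, an intent-based wait is just polling a predicate against a deadline. A sketch in plain TypeScript – waitUntil is my name for the concept behind waitForSelector and WebDriverWait, not a real API:

```typescript
// Generic intent-based wait: poll a condition until it holds or the deadline passes.
async function waitUntil(
  condition: () => boolean | Promise<boolean>,
  opts: { timeoutMs?: number; intervalMs?: number } = {}
): Promise<void> {
  const { timeoutMs = 5000, intervalMs = 100 } = opts;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await condition()) return; // state reached – proceed immediately
    if (Date.now() >= deadline) {
      // Fail loudly with a reason, instead of silently clicking a loading state
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    // back off briefly, then re-check
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The contrast with a fixed sleep: waitForTimeout(2000) always costs the full two seconds and still gives no guarantee, while this returns the moment the condition flips and produces a diagnosable error when it never does.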
Auth Tokens Expire Mid-Run and Nobody Warns You
This one cost me half a Friday. I recorded a full authenticated flow, ran it locally twelve times – all green. Pushed to CI, watched it fail on step 4 with a 401. The token the recorder captured in storage was 1 hour old by the time the pipeline actually ran it.
Playwright’s storageState solves this if you set it up correctly. The key is doing the auth programmatically in a fixture that runs before each test suite, not capturing it once at record time:
// tests/fixtures/auth.ts
import { test as base } from '@playwright/test';
export const test = base.extend({
authenticatedPage: async ({ page }, use) => {
// Fresh auth on every CI run – no stale tokens
await page.goto('/login');
await page.getByTestId('email-input').fill(process.env.TEST_USER_EMAIL!);
await page.getByTestId('password-input').fill(process.env.TEST_USER_PASSWORD!);
await page.getByTestId('login-submit').click();
await page.waitForURL('/dashboard');
// Save state for reuse across tests in this worker
await page.context().storageState({ path: 'playwright/.auth/user.json' });
await use(page);
},
});
// playwright.config.ts – reference this in your project setup
export default defineConfig({
projects: [
{
name: 'setup',
testMatch: /global.setup\.ts/,
},
{
name: 'authenticated',
use: { storageState: 'playwright/.auth/user.json' },
dependencies: ['setup'],
},
],
});
The dependencies array is the part that’s easy to miss. Without it, your auth setup and your actual tests can run in parallel, and you’ll get a race condition that only fails on full CI runs – never locally.
HTTPS with Self-Signed Certs Will Block Every Recorder
Local dev environments using mkcert or self-signed certs for HTTPS simulation will hard-block Playwright Codegen by default. The browser it launches will throw a cert error before your app even loads. The fix is a single flag, but it’s buried in the CLI help:
# Record against a local HTTPS environment with a self-signed cert
npx playwright codegen \
--ignore-https-errors \
https://local.dev:3000
# Combine with the testid preference flag:
npx playwright codegen \
--ignore-https-errors \
--target=playwright-test \
https://local.dev:3000
For Cypress, the equivalent is setting chromeWebSecurity: false in cypress.config.js – but that’s a broader flag that disables cross-origin checks entirely, which you may not want. Playwright’s --ignore-https-errors is scoped to the codegen session only, so it’s cleaner. If you’re using Selenium IDE against a local HTTPS environment, you’ll need to add the cert to your OS trust store rather than relying on a flag – there’s no CLI equivalent.
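For completeness, here’s where the Cypress flag lives – a sketch only, since as noted it disables more than cert checking:

```typescript
// cypress.config.ts – chromeWebSecurity: false ignores cert errors but ALSO
// disables cross-origin checks; keep it in a local-dev-only config if you use it
import { defineConfig } from 'cypress';

export default defineConfig({
  chromeWebSecurity: false,
});
```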
Turning a Recorded Test Into Something That Won’t Break
The first thing I do after any codegen session is run the output immediately without touching it. Not because I trust it – I don’t – but because I need a baseline failure rate. Raw codegen almost always breaks on the second or third run due to timing issues and brittle selectors. Knowing how it breaks tells me exactly what to fix first.
Step 1: Capture the Baseline
Run the generated file, let it fail, and screenshot where it dies. Playwright’s --reporter=html, combined with --trace=on, embeds a full timeline trace that shows exactly which selector timed out. I always run it three times in a row on CI before touching a single line – if it passes all three, I’ll still fix the selectors, but I know the timing at least is survivable.
# Run codegen output with full tracing before you edit anything
npx playwright test tests/recorded.spec.ts \
--reporter=html \
--trace=on \
--retries=0
# Then open the report
npx playwright show-report
The trace viewer is what makes this actually useful. You’ll see the DOM snapshot at the moment of failure, which immediately tells you whether it’s a selector problem, a timing problem, or a navigation problem. Usually it’s all three at once.
Step 2: Burn the Fragile Selectors
Codegen loves to output page.locator('div.MuiButton-root:nth-child(3)'). That selector will break the moment a designer adds a wrapper div or swaps the CSS framework. Replace everything with semantic locators – Playwright’s built-in ones are backed by ARIA roles, so they survive refactors that don’t change the user-facing behavior.
// Raw codegen output β do not ship this
await page.locator('div.login-form > button:nth-child(2)').click();
await page.locator('#app > div > div.content > input[type="text"]').fill('[email protected]');
// After cleanup β this survives a CSS refactor
await page.getByRole('button', { name: 'Sign in' }).click();
await page.getByLabel('Email address').fill('[email protected]');
// If semantic markup isn't there yet, add data-testid and use this
// <button data-testid="submit-login">Sign in</button>
await page.getByTestId('submit-login').click();
getByRole is my first choice because it tests what the user actually sees. getByLabel is second for form inputs – it locates the input through its associated <label>, which forces your team to keep labels wired up to inputs. getByTestId is the escape hatch for dynamic content where role-based selection is genuinely impossible, not just inconvenient.
Step 3: Delete Every waitForTimeout
Every waitForTimeout(2000) in codegen output is a confession that the recorder didn’t know when the page was ready. These turn into flaky tests the moment CI runs on a slower machine. Replace them with assertions that describe what you’re actually waiting for.
// Codegen output β this is a sleep, not a test
await page.waitForTimeout(2000);
await page.locator('.dashboard-widget').click();
// Correct version: wait for the state you care about
await page.waitForLoadState('networkidle'); // for pages with heavy AJAX
await expect(page.getByRole('region', { name: 'Dashboard' })).toBeVisible();
await page.getByRole('region', { name: 'Dashboard' }).getByRole('button', { name: 'Widgets' }).click();
// For API-driven UIs, intercepting the response is more reliable than networkidle
const dashboardReady = page.waitForResponse(resp =>
resp.url().includes('/api/dashboard') && resp.status() === 200
);
await page.goto('/dashboard');
await dashboardReady;
waitForLoadState('networkidle') works for most cases but can hang on pages that poll an endpoint every few seconds. In those situations I use waitForResponse with a URL pattern – it resolves the moment the specific call you care about completes, regardless of background activity.
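A nice side effect of waitForResponse taking a callback is that the matching logic can be a plain, unit-testable function instead of an inline closure. A sketch – the /api/dashboard path comes from the example above, and isDashboardResponse is my name, not an API:

```typescript
// Extracted predicate: "the dashboard data has arrived".
// Shape-compatible with the Response object Playwright passes to waitForResponse.
interface ResponseLike {
  url(): string;
  status(): number;
}

function isDashboardResponse(resp: ResponseLike): boolean {
  // Require both the right endpoint and a success status – a 302 to /login
  // on an expired session should NOT count as "ready"
  return resp.url().includes('/api/dashboard') && resp.status() === 200;
}

// In the test: const ready = page.waitForResponse(isDashboardResponse);
```

Keeping predicates like this in one module means a backend route rename is a one-line fix instead of a grep across every spec.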
Step 4: Extract the Repeated Flows
Login appears in 90% of tests. If it’s copy-pasted into every file, one password policy change will break your entire suite. Playwright fixtures are the right tool – they’re composable, they get proper setup/teardown, and they make the actual test body read like a spec, not an instruction manual.
// fixtures/auth.ts
import { test as base, expect, Page } from '@playwright/test';
type AuthFixtures = {
authenticatedPage: Page;
};
export const test = base.extend<AuthFixtures>({
authenticatedPage: async ({ page }, use) => {
// Log in through the UI once per worker – fresh state, no stale tokens.
// (If login is slow, capture state instead: npx playwright codegen --save-storage=auth.json)
await page.goto('/login');
await page.getByLabel('Email').fill(process.env.TEST_EMAIL!);
await page.getByLabel('Password').fill(process.env.TEST_PASSWORD!);
await page.getByRole('button', { name: 'Sign in' }).click();
await expect(page.getByRole('navigation')).toBeVisible();
await use(page);
},
});
// In your actual test file
import { test } from '../fixtures/auth';
import { expect } from '@playwright/test';
test('user can update profile', async ({ authenticatedPage }) => {
// starts already logged in – no boilerplate
await authenticatedPage.getByRole('link', { name: 'Profile' }).click();
await expect(authenticatedPage.getByRole('heading', { name: 'Your Profile' })).toBeVisible();
});
The Real Before/After
Here’s an actual comparison from a checkout flow test I cleaned up. The raw version passed locally twice and never made it through CI without retries. The cleaned version has run in CI for three months without a single flake.
// ❌ Raw codegen output – flaky within a week
test('checkout flow', async ({ page }) => {
await page.goto('http://localhost:3000/shop');
await page.waitForTimeout(1500);
await page.locator('#app > div.product-grid > div:nth-child(1) > button').click();
await page.waitForTimeout(500);
await page.locator('div.cart-icon').click();
await page.waitForTimeout(1000);
await page.locator('input[name="email"]').fill('[email protected]');
await page.locator('button.checkout-btn').click();
await page.waitForTimeout(3000);
await expect(page.locator('h1')).toContainText('Order confirmed');
});
// ✅ Cleaned version – survived 3 months in CI, zero flakes
test('checkout flow', async ({ authenticatedPage: page }) => {
await page.goto('/shop');
// Wait for products to actually render, not a fixed time
await expect(page.getByRole('list', { name: 'Products' })).toBeVisible();
await page.getByRole('listitem').first().getByRole('button', { name: 'Add to cart' }).click();
// Assert cart count updated before proceeding
await expect(page.getByRole('status', { name: 'Cart' })).toContainText('1');
await page.getByRole('link', { name: 'Cart' }).click();
await expect(page.getByRole('heading', { name: 'Your cart' })).toBeVisible();
await page.getByRole('button', { name: 'Checkout' }).click();
// Wait for Stripe iframe to load – networkidle handles this reliably
await page.waitForLoadState('networkidle');
await page.getByLabel('Email').fill('[email protected]');
await page.getByRole('button', { name: 'Place order' }).click();
// Explicit assertion on the confirmation, not a sleep
await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible({ timeout: 10000 });
});
The timeout bump on the final assertion (10000ms) is intentional – order confirmation hits a real payment processor in staging, so it legitimately takes longer than a page navigation. Raising that one timeout explicitly is far better than sprinkling waitForTimeout calls throughout. Everything else should resolve in under 5 seconds or there’s a real problem worth knowing about.