The Problem: Your Tests Pass Locally and Explode in CI
The classic failure: green locally, red in CI, nobody knows why
You run your Selenium suite locally — 47 tests, all green, ship it. The PR hits GitHub Actions and 23 tests fail. Half of them are TimeoutException, a few are NoSuchElementException, and one is some cryptic StaleElementReferenceException that you’ve never seen before. This is the Selenium tax on JavaScript apps, and I’ve paid it on enough projects that I can tell you exactly why it happens and how to stop it.
The root issue is almost never your test logic. It’s the environment gap. Your local machine runs Chrome with your profile, your cached assets, your CPU, and a real display. CI runs a headless Chrome (or Chromium, which has slightly different behavior) with a clean slate, constrained CPU, and — this is the one that catches people — no GPU acceleration. JavaScript-heavy SPAs that rely on CSS transitions to signal “component is ready” will render measurably slower under those constraints. Your driver.findElement() fires before the DOM is stable, and suddenly a test that passed every single time locally is flaky 40% of the time in CI.
JavaScript apps make this worse than almost any other stack. A React or Vue SPA doesn’t give you a simple “page loaded” signal — the browser fires DOMContentLoaded, then JS hydrates, then components mount async, then data fetches complete, then conditional renders trigger. If you’re using something like driver.wait(until.elementLocated(...), 5000) with a hardcoded timeout, you’re essentially hoping 5 seconds is enough on a GitHub Actions runner that might be under load. The thing that caught me off guard the first time was that document.readyState === 'complete' means nothing for a SPA — the DOM is “ready” before your actual content exists. You need explicit waits tied to application state, not browser lifecycle events.
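To make that concrete, here's a rough sketch of an application-state wait in raw selenium-webdriver, assuming your app sets a readiness flag. The window.__APP_READY__ name is hypothetical; the point is that your own code flips it after the initial data fetch resolves, not when the DOM parses:
// Hypothetical readiness flag: the app sets window.__APP_READY__ = true once
// its initial data fetch resolves. The test waits on that, not on readyState.
await driver.wait(
  async () => driver.executeScript('return window.__APP_READY__ === true;'),
  10000,
  'App never signaled it was ready'
);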
This guide walks through a specific, working stack: WebdriverIO as the Selenium wrapper, Jasmine as the test framework (with the Jest-style expect() assertions WebdriverIO ships), and GitHub Actions as the CI target. I’ll include notes on GitLab CI where the config differs meaningfully. I picked WebdriverIO over raw selenium-webdriver because it handles a lot of the async retry logic automatically — its $ selector commands will retry by default for up to the configured waitforTimeout, which eliminates an entire category of race conditions you’d otherwise handle manually. The trade-off: WebdriverIO’s docs have gaps, especially around custom reporters and the browser.executeAsync() API. Expect to read source code.
# What you'll need before we start
node --version # 18+ required, WebdriverIO 8 drops Node 16
npx wdio --version # should be 8.x
google-chrome --version # or chromium-browser on Ubuntu runners
chromedriver --version # must match Chrome major version exactly
That last line — chromedriver version matching Chrome — is where most CI setups silently break. Locally you installed Chrome 120 and the matching ChromeDriver months ago and never touched it. The GitHub Actions ubuntu-22.04 runner updates Chrome periodically, and if your workflow pins ChromeDriver at 120 while the runner now has Chrome 124, you’ll get a cryptic session not created error that looks like a networking issue. The fix is to stop pinning and use wdio-chromedriver-service with autoDownload: true, or switch entirely to @wdio/devtools-service which connects over the Chrome DevTools Protocol and skips the ChromeDriver version dance altogether.
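If you do keep pinning, a cheap guard is to compare the two major versions at the start of the run and fail with a readable message instead of the cryptic session error. A minimal sketch, assuming both binaries are on the runner's PATH:
// Sketch: fail fast on a Chrome/ChromeDriver major version mismatch.
// Assumes google-chrome and chromedriver are both on PATH in the CI image.
const { execSync } = require('child_process');

function majorVersion(command) {
  // "Google Chrome 124.0.6367.91" or "ChromeDriver 124.0.6367.91 (...)" -> "124"
  const output = execSync(command).toString();
  return (output.match(/(\d+)\./) || [])[1];
}

const chromeMajor = majorVersion('google-chrome --version');
const driverMajor = majorVersion('chromedriver --version');
if (chromeMajor !== driverMajor) {
  throw new Error(
    `Chrome ${chromeMajor} and ChromeDriver ${driverMajor} do not match; fix the pin before running tests`
  );
}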
Choosing Your Selenium Wrapper (and Why I Use WebdriverIO Over Raw Selenium)
The raw selenium-webdriver npm package will get you running in about 15 minutes, but you’ll spend the next two days fighting async timing issues. Every element interaction is a Promise chain, and if you forget a single await, you get silent failures that are nearly impossible to debug in CI logs. Here’s what basic raw Selenium looks like — and why it gets old fast:
// Raw selenium-webdriver — the pain starts here
const { Builder, By, until } = require('selenium-webdriver');
const driver = await new Builder().forBrowser('chrome').build();
await driver.get('https://yourapp.com');
await driver.wait(until.elementLocated(By.css('.login-btn')), 5000);
const btn = await driver.findElement(By.css('.login-btn'));
await btn.click();
// Miss any await here and you're debugging ghosts
That driver.wait(until.elementLocated(...)) call is what kills you at scale. Every single interaction needs its own explicit wait strategy. In a 200-test suite, that’s hundreds of manual timeout decisions. Miss one and your CI pipeline starts randomly failing on Tuesdays for no apparent reason.
What WebdriverIO v8 Actually Solves
I switched to WebdriverIO after my third “works locally, fails in CI” incident. The thing that caught me off guard was how much the auto-wait feature changes the experience. WebdriverIO waits for elements to be clickable, visible, and interactable by default — without you explicitly asking. The $() syntax is synchronous-looking even though it’s async under the hood, which makes tests readable by humans instead of Promise-chain archaeologists:
// WebdriverIO v8 — the same flow, less ceremony
await browser.url('https://yourapp.com');
await $('.login-btn').click(); // auto-waits, no manual until() needed
await expect($('.dashboard-title')).toBeDisplayed();
Install it with the CLI which walks you through the config interactively:
npm init wdio@latest ./
It’ll ask which browser you’re targeting, which framework (Mocha/Jasmine/Cucumber), which reporter, and whether you want cloud grid support. That last choice matters a lot — pick it now rather than retrofitting later. WebdriverIO also ships first-party allure, spec, and junit reporters. JUnit XML is what your CI system (GitHub Actions, Jenkins, whatever) needs to display test results natively. With raw Selenium you’re wiring that up yourself.
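As a rough example, wiring the JUnit reporter alongside spec output looks something like this. The outputDir and file-name callback are the options I've used, but double-check the @wdio/junit-reporter docs for your version:
// wdio.conf.js (sketch): readable spec output for humans, JUnit XML for CI.
// Requires @wdio/junit-reporter installed as a dev dependency.
reporters: [
  'spec',
  ['junit', {
    outputDir: './junit-results',
    outputFileFormat: (opts) => `results-${opts.cid}.xml`, // one XML file per worker
  }],
],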
The Honest Answer on Playwright and Cypress
If you’re starting a brand new project with no existing test infrastructure, seriously consider Playwright first. The auto-waiting is equally good, the debugging tools are arguably better, and the docs are excellent. Cypress is great if your app is purely single-origin and you don’t need real multi-tab or iframe support — but it’ll hit a wall the moment you need to test OAuth flows across domains or work with browser extensions.
The situations where you should ignore that advice and stay with WebdriverIO: you have an existing Selenium test suite (migration cost is real — don’t rewrite what works), you need to run tests on real device farms like Sauce Labs or BrowserStack against actual Safari on iOS, or your org’s QA team already knows Selenium and you’re not going to retrain them for a tooling preference. Playwright’s BrowserStack integration exists but is newer and I’ve personally hit quirks with session handling on their Automate grid that I haven’t seen with WebdriverIO’s native service.
The Real Reason I Stay: Grid Integrations That Just Work
WebdriverIO has first-class services for both Sauce Labs and BrowserStack. You install the service, add credentials, and you’re running on 50 browser/OS combinations without writing a single line of custom grid code:
npm install @wdio/sauce-service --save-dev
// wdio.conf.js — Sauce Labs in under 10 lines of config
export const config = {
user: process.env.SAUCE_USERNAME,
key: process.env.SAUCE_ACCESS_KEY,
services: ['sauce'],
capabilities: [{
browserName: 'chrome',
browserVersion: 'latest',
platformName: 'Windows 11',
'sauce:options': { build: process.env.CI_BUILD_ID }
}]
};
BrowserStack’s equivalent is @wdio/browserstack-service and the config structure is nearly identical. What this gives you in CI is automatic test status reporting back to the Sauce/BrowserStack dashboard — pass/fail, session videos, logs — without any custom webhook setup. I tried wiring the same thing manually with raw Selenium once. It took a full afternoon and the video recording still didn’t attach to failures correctly. The WebdriverIO service handles all of that. That single integration is the practical reason I haven’t switched away despite Playwright being genuinely impressive on other metrics.
Installing and Configuring WebdriverIO
Run this in your project root and let the wizard do the heavy lifting:
npm init wdio@latest .
The interactive prompts look friendly, but a few of them will trip you up if you don’t know what you’re actually choosing. The environment prompt (browser vs. Node.js) — pick browser, obviously. Automation backend asks where the browser driver lives; choose “Run chromedriver automatically” unless you have a Selenium Grid. The base URL prompt defaults to http://localhost — change it to your actual dev server URL now, because hunting for it later in wdio.conf.js is annoying. Framework selection comes next, and this is where most tutorials lose me by being neutral about it.
Pick Jasmine: You Still Get the Jest-Style expect() API
The wizard gives you Mocha, Jasmine, and Cucumber; there is no native Jest runner. But WebdriverIO’s built-in expect-webdriverio assertions give you the same expect() API and mental model as the Jest suites already in every JS project I work on, so I pick Jasmine and lose nothing. Mocha is fine but requires wiring up Chai separately for assertions, which is an extra dependency I don’t want to manage. Cucumber is for teams doing BDD where product owners write the test scenarios in Gherkin — if that’s not your workflow, it’s overhead with no payoff. One honest caveat: you’re getting Jest’s assertion style, not the full Jest toolchain, so anything that leans on Jest-specific features like jest.mock() stays in your unit tests. Still worth it for the consistency.
The Six Fields in wdio.conf.js You’ll Actually Touch
The generated config file is about 300 lines. Ignore most of it. Here are the six fields that matter:
- specs — glob pattern pointing to your test files. Default is ./test/specs/**/*.js. Change this to wherever your tests actually live.
- baseUrl — your app’s base address. Use browser.url('/login') in tests and WebdriverIO prepends this automatically.
- capabilities — defines which browser(s) to run. This is also where you configure headless mode (more on this below).
- waitforTimeout — global default for how long WebdriverIO waits before declaring an element not found. Default is 3000ms. I bump it to 10000ms for any app doing server-side rendering or lazy loading.
- services — plugins that manage browser driver lifecycle. In 2024 this should be ['chromedriver'] and nothing else for local Chrome testing.
- reporters — how results are displayed. 'spec' gives you readable terminal output. Add 'allure' if you need HTML reports for your CI dashboard.
The Exact Capability Object for Headless Chrome in CI
This is the thing that catches people — headless mode isn’t a top-level config option, it’s buried inside capabilities. Paste this into your wdio.conf.js:
capabilities: [{
browserName: 'chrome',
'goog:chromeOptions': {
args: [
'--headless=new',
'--no-sandbox',
'--disable-dev-shm-usage',
'--disable-gpu',
'--window-size=1920,1080'
]
}
}]
Use --headless=new not --headless — the old flag activates a legacy headless mode that Chrome has been deprecating since version 112. The --no-sandbox and --disable-dev-shm-usage flags are non-negotiable in Docker/CI environments; without them Chrome crashes silently and you spend an hour confused about why your tests pass locally and fail in the pipeline. The thing that caught me off guard the first time: --window-size actually matters. Without it, headless Chrome defaults to a tiny viewport and responsive layouts break your tests in ways that feel like flakiness but are actually just CSS.
Use wdio-chromedriver-service — selenium-standalone Is Done
Install it once:
npm install wdio-chromedriver-service --save-dev
wdio-chromedriver-service automatically downloads a ChromeDriver version that matches your installed Chrome — no version pinning, no manual updates, no "ChromeDriver 114 doesn't support Chrome 120" errors at 11pm. The selenium-standalone service still shows up in older tutorials and blog posts everywhere, but the package itself warns you it's deprecated on install. It works until it doesn't, and when it breaks it breaks in confusing ways involving Java version mismatches and port conflicts. If you're maintaining an old project still using it, migrating to wdio-chromedriver-service takes about 20 minutes and is worth doing before it bites you in a release cycle.
Writing Tests That Don’t Flake
Kill browser.pause() Before It Kills Your Pipeline
The single fastest way to write a flaky test is browser.pause(2000). I’ve seen codebases where every test had three or four of these scattered around — and the suite still failed randomly on CI because sometimes the animation took 2100ms instead of 1800ms. Hard-coded pauses don’t adapt. They just make your tests slow and unreliable at the same time, which is genuinely the worst possible outcome. Replace every single one with explicit waits:
// ❌ This will haunt you
await browser.pause(2000);
await $('#submit-btn').click();
// ✅ Wait until the element is actually ready to be clicked
await $('#submit-btn').waitForClickable({ timeout: 8000 });
await $('#submit-btn').click();
// For elements that need to exist in the DOM but aren't necessarily visible
await $('#toast-message').waitForExist({ timeout: 5000 });
// For visibility checks (good for modals, dropdowns, overlays)
await $('#modal').waitForDisplayed({ timeout: 6000 });
The difference matters: waitForExist() just checks the DOM, waitForDisplayed() checks that it’s visible, and waitForClickable() checks both visibility and that it’s not disabled. Use the wrong one and you’ll click an invisible element and wonder why nothing happened.
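For conditions none of the built-ins express, such as waiting for an element's text or attribute to change, browser.waitUntil is the escape hatch. A quick sketch; the cart-count selector is a placeholder:
// Wait for a cart badge (placeholder selector) to show something other than 0,
// a condition no single waitFor* command covers.
await browser.waitUntil(
  async () => (await $('[data-testid="cart-count"]').getText()) !== '0',
  { timeout: 8000, timeoutMsg: 'Cart count never updated' }
);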
Why React (and Vue, Angular) Will Break Your Button Click on First Load
Framework hydration is the thing that caught me off guard when I first tested a Next.js app. The page renders HTML server-side, so Selenium can “see” the button immediately — waitForDisplayed() passes, your click fires, and absolutely nothing happens. The JavaScript bundle hasn’t finished attaching event listeners yet. The DOM is there. The interactivity is not. This is the hydration gap, and every SSR framework has it.
The fix is to wait for something that only exists after hydration completes. My preferred approach is a hydration sentinel — a small bit of code in your app that sets a data-hydrated="true" attribute on the body element after the framework mounts. Then in your tests:
// In your React root component (e.g., _app.tsx or main.tsx)
import { useEffect } from 'react';

useEffect(() => {
  document.body.setAttribute('data-hydrated', 'true');
}, []);
// In your WebdriverIO test
await $('body[data-hydrated="true"]').waitForExist({ timeout: 10000 });
await $('[data-testid="login-button"]').waitForClickable({ timeout: 5000 });
await $('[data-testid="login-button"]').click();
If touching the app code is off the table, an alternative is waiting for the network to go idle. WebdriverIO doesn’t give you that directly, but you can poll document.readyState via browser.execute() and combine it with a short custom wait. It’s less reliable than the sentinel approach, but it works in a pinch.
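A sketch of that fallback, for when you can't touch the app code:
// Fallback without a sentinel: wait for the browser's readyState, then add a
// short, bounded wait on the element you actually need. Less reliable than
// the data-hydrated attribute, but better than a blind pause.
await browser.waitUntil(
  async () => browser.execute(() => document.readyState === 'complete'),
  { timeout: 10000, timeoutMsg: 'document.readyState never reached complete' }
);
await $('[data-testid="login-button"]').waitForClickable({ timeout: 5000 });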
The waitforTimeout Global Is Set Too Low by Default — Fix It for CI
WebdriverIO’s default waitforTimeout is 3000ms. On your local machine with a fast SSD and hot caches, that’s fine. On a CI runner that’s also building Docker images, running linters, and fighting for CPU time, 3000ms is not enough — and you’ll get a wall of element ("...") still not displayed after 3000ms errors that disappear when you re-run the suite locally. Every. Single. Time.
Set the global to 10000ms in your wdio.conf.ts for CI runs. I use an environment variable to switch between environments:
// wdio.conf.ts
export const config: Options.Testrunner = {
waitforTimeout: process.env.CI ? 10000 : 4000,
connectionRetryTimeout: process.env.CI ? 120000 : 90000,
connectionRetryCount: 3,
// rest of your config...
};
The per-command timeout in individual waitFor* calls overrides this global, so you can still use shorter timeouts for specific assertions where you actually want fast failure. Use the global as your safety net, not your primary timeout strategy.
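For example, a validation error that should appear instantly can keep a tight per-command timeout even with the 10-second global in place (the selector is a placeholder):
// The global waitforTimeout is the safety net; this assertion should fail fast,
// so the per-command timeout wins here.
await $('[data-testid="validation-error"]').waitForDisplayed({ timeout: 1500 });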
Page Object Model: A Real Login Page Implementation
POM gets talked about a lot but rarely shown with enough detail to be useful. Here’s an actual pattern I use — the key design decision is that every selector lives in the class, and every method returns a meaningful value or throws a descriptive error:
// test/pageobjects/login.page.ts
class LoginPage {
get usernameInput() { return $('[data-testid="username-input"]'); }
get passwordInput() { return $('[data-testid="password-input"]'); }
get submitButton() { return $('[data-testid="login-submit"]'); }
get errorMessage() { return $('[data-testid="login-error"]'); }
get dashboardHeading() { return $('[data-testid="dashboard-title"]'); }
async open() {
await browser.url('/login');
await $('body[data-hydrated="true"]').waitForExist({ timeout: 10000 });
}
async login(username: string, password: string) {
await this.usernameInput.waitForDisplayed();
await this.usernameInput.setValue(username);
await this.passwordInput.setValue(password);
await this.submitButton.waitForClickable();
await this.submitButton.click();
}
async expectSuccessfulLogin() {
await this.dashboardHeading.waitForDisplayed({ timeout: 8000 });
return this.dashboardHeading.getText();
}
async expectLoginError() {
await this.errorMessage.waitForDisplayed({ timeout: 5000 });
return this.errorMessage.getText();
}
}
export default new LoginPage();
// test/specs/login.spec.ts
import LoginPage from '../pageobjects/login.page.js';
describe('Login flow', () => {
it('redirects to dashboard on valid credentials', async () => {
await LoginPage.open();
await LoginPage.login('[email protected]', 'correcthorsebattery');
const heading = await LoginPage.expectSuccessfulLogin();
expect(heading).toContain('Dashboard');
});
it('shows error message on invalid password', async () => {
await LoginPage.open();
await LoginPage.login('[email protected]', 'wrongpassword');
const error = await LoginPage.expectLoginError();
expect(error).toBe('Invalid credentials. Please try again.');
});
});
Notice that the test file reads almost like English. That’s the actual value of POM — not code reuse (though you get that too), but making failures obvious. When a test breaks, you know immediately whether the problem is in the page interaction logic or the test assertion.
Selector Strategy: data-testid Over Everything Else
CSS class selectors break when your designer refactors the component library. XPath selectors break when the DOM structure changes. Both of these will happen. I’ve watched a team spend a full sprint re-writing Selenium tests after a Tailwind migration renamed every CSS class in the app. The tests weren’t wrong — the selectors just became meaningless.
data-testid attributes are the right answer because they create a contract between the app and the test suite. A designer won’t touch them. A CSS refactor won’t touch them. They survive almost any structural change. The pushback you’ll hear is “I don’t want test concerns in my production HTML” — my answer is that the attribute adds maybe 20 bytes per element and the debugging time you save is measured in days, not hours.
The one real problem with data-testid is dynamic generation — frameworks that auto-generate IDs like data-testid="button-7f3a2b" from component metadata. If you’re stuck with those, use a compound selector that combines a stable attribute with a structural relationship:
// Don't do this — the hash will change on every build
await $('[data-testid="button-7f3a2b"]').click();
// Better — combine a stable parent selector with element type
await $('[data-testid="checkout-form"] button[type="submit"]').click();
// Best — lobby your team to add explicit test IDs to interactive elements
// This is a conversation worth having once, not a workaround you maintain forever
If you’re starting a greenfield project, add data-testid to every interactive element from day one. It takes 30 seconds per component and saves hours per incident. I’ve never once regretted having too many test IDs.
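One way to make that stick is baking the attribute into your shared components so individual call sites can't forget it. A sketch; the component and prop names are placeholders, not a convention from any library:
// Shared button wrapper that requires a test id, so every interactive element
// gets one by construction. AppButton and testId are placeholder names.
export function AppButton({ testId, children, ...rest }) {
  return (
    <button data-testid={testId} {...rest}>
      {children}
    </button>
  );
}

// Usage: <AppButton testId="login-submit" onClick={handleSubmit}>Sign in</AppButton>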
Setting Up GitHub Actions to Run These Tests
The thing that caught me off guard the first time I set this up was Chrome. You see ubuntu-latest in everyone’s GitHub Actions examples and assume it’s a full desktop OS with a browser ready to go. It’s not. It’s a stripped-down server image. No Chrome, no display server, nothing. I wasted an afternoon wondering why my Selenium tests were passing locally and immediately dying in CI with a cryptic WebDriverException: unknown error: cannot find Chrome binary. Here’s the full workflow file, every line explained, so you don’t lose that afternoon.
# .github/workflows/e2e.yml
name: E2E Tests
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
jobs:
e2e:
# ubuntu-latest is 22.04 as of mid-2024 — Chrome is NOT included
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
# Install Google Chrome stable from Google's own apt repo
# Do NOT use `chromium-browser` from Ubuntu's default repos —
# it's a snap package on 22.04 and snap doesn't work in GitHub Actions runners
- name: Install Google Chrome
run: |
wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee /etc/apt/sources.list.d/google-chrome.list
sudo apt-get update -y
sudo apt-get install -y google-chrome-stable
# Xvfb creates a virtual framebuffer — a fake display
# Even in headless mode, some Selenium operations still touch the display layer
# Missing this causes random failures in shadow DOM interactions and drag-and-drop
- name: Start Xvfb
run: |
sudo apt-get install -y xvfb
Xvfb :99 -screen 0 1920x1080x24 &
env:
DISPLAY: :99
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: '20'
# Cache node_modules — the key pattern here is intentional
# Hashing package-lock.json means cache busts only when deps actually change
# Without this, npm install on every run adds ~45-90 seconds depending on your tree
- name: Cache node_modules
uses: actions/cache@v3
with:
path: node_modules
key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-
- name: Install dependencies
run: npm ci
# Build the app and serve it — Selenium needs an actual HTTP server to hit
# The & at the end runs serve in the background so the step doesn't block
# sleep 5 gives serve time to bind to port 3000 before tests start
- name: Build and serve app
run: |
npm run build
npx serve -s build -l 3000 &
sleep 5
env:
CI: false # Prevents create-react-app from treating warnings as errors during build
# Set DISPLAY here too — env set in earlier steps doesn't always persist
- name: Run E2E tests
run: npm run test:e2e
env:
DISPLAY: :99
APP_URL: http://localhost:3000
That chromium-browser trap deserves its own callout. On Ubuntu 22.04, which is what ubuntu-latest resolves to, chromium-browser is a snap alias — not a real apt package. Snaps require snapd with a bunch of system services that don’t exist in the GitHub Actions sandbox. You’ll get a package that installs “successfully” but the binary path is wrong and ChromeDriver can’t find it. Always use google-chrome-stable from Google’s repo. This is one of those things that’s completely undocumented in the Selenium or GitHub Actions docs but has bitten basically everyone.
The Xvfb setup looks unnecessary if you’re running Chrome with --headless, and honestly I thought so too. But I’ve hit specific cases where it matters: anything involving window.dragEvent, some file input dialogs, and any component library that does screen-size-dependent rendering at mount time. The virtual framebuffer is cheap to set up (the install and boot take under 3 seconds) and prevents a whole category of flaky tests. The 1920x1080x24 resolution matters — I’ve seen tests fail with the default 640×480 because media queries kick in and hide elements the tests expect to be visible.
The cache key pattern ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }} is the one that actually works in a monorepo or multi-package setup. The double star glob catches nested package-lock.json files too. The restore-keys fallback to ${{ runner.os }}-node- is important — it means on a cache miss you still restore the most recent valid cache, then npm ci only installs the delta. Without that fallback, every dependency update starts from zero. I don’t use actions/setup-node‘s built-in cache here because I want explicit control over what gets cached and when it busts.
The build-and-serve step has two details worth understanding. First, CI: false — if you’re on Create React App or Vite with similar linting setup, warnings in your source files cause the build to exit with code 1 in CI mode. That’s intentional for production pipelines but it’ll silently kill your E2E workflow before tests ever run. Set it to false here. Second, sleep 5 is a blunt instrument but it works. The cleaner approach is polling — something like npx wait-on http://localhost:3000 --timeout 30000 — which waits until the port actually responds rather than guessing. I’ve switched to wait-on on larger apps where the build takes over 30 seconds and sleep 5 wasn’t enough buffer.
GitLab CI Setup (Because Not Everyone Uses GitHub)
The thing that caught me off guard the first time I ran Selenium inside GitLab CI was the networking. I’d used GitHub Actions before, so I assumed the mental model transferred. It doesn’t — not cleanly. GitLab’s Docker-in-Docker service architecture handles inter-container communication differently, and if you don’t know the one rule upfront, you’ll spend an afternoon staring at ERR_CONNECTION_REFUSED errors wondering what’s broken.
The simplest path is using selenium/standalone-chrome as a service in your job, not as the main image. Your main image should be whatever runs your Node tests — I use node:20-alpine. The standalone-chrome image spins up a full Chrome browser with a WebDriver endpoint already listening on port 4444. No Xvfb hacks, no manual chromedriver version management. Here’s the actual config:
stages:
- test
selenium_tests:
stage: test
image: node:20-alpine
services:
- name: selenium/standalone-chrome:latest
alias: selenium-chrome
variables:
SELENIUM_HOST: selenium-chrome
SELENIUM_PORT: "4444"
before_script:
- npm ci
script:
- npm run test:e2e
Now here’s the rule I mentioned: never use localhost to reach a GitLab CI service. Inside a GitLab CI job, each service runs in its own container. They’re not on the same network namespace as your job container, so localhost:4444 points to nothing. The alias you define in the services block — in this case selenium-chrome — is the actual hostname you need. GitLab wires up DNS so that alias resolves correctly. This is the single most common mistake I see in Selenium CI configs that were copied from a local Docker Compose setup where everything runs on the same host.
On the application side, make sure your WebDriver initialization reads from environment variables, not hardcoded strings. Something like this:
const { Builder } = require('selenium-webdriver');
const driver = await new Builder()
.forBrowser('chrome')
.usingServer(
`http://${process.env.SELENIUM_HOST || 'localhost'}:${process.env.SELENIUM_PORT || 4444}/wd/hub`
)
.build();
This way the same code works locally (where localhost is correct) and inside CI (where the alias takes over). One file, zero conditional logic.
A few operational notes from running this setup for a while: the selenium/standalone-chrome:latest tag is convenient but dangerous in CI. I’ve had pipelines break after an upstream Chrome update changed how certain WebDriver commands behave. I now pin to a specific version like selenium/standalone-chrome:4.20.0-20240425 and update it deliberately. Also, if your tests are slow or flaky, add a before_script step that waits for the WebDriver endpoint to be ready before running tests — the service container takes a few seconds to boot, and your test runner might hit port 4444 before Chrome is accepting connections. A simple wget --retry-connrefused --waitretry=5 --timeout=30 http://$SELENIUM_HOST:4444/wd/hub/status loop handles it cleanly without adding a brittle sleep 10.
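If you'd rather keep that readiness check in Node than in shell, the same polling can live in WebdriverIO's onPrepare hook. A sketch using only Node built-ins; the env variable names match the job config above:
// wdio.conf.js (sketch): poll the Selenium status endpoint until it answers,
// instead of sleeping and hoping. Node built-ins only.
const http = require('http');

async function waitForGrid(host, port, timeoutMs = 30000) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const ready = await new Promise((resolve) => {
      http.get({ host, port, path: '/wd/hub/status' }, (res) => {
        res.resume();
        resolve(res.statusCode === 200);
      }).on('error', () => resolve(false));
    });
    if (ready) return;
    await new Promise((r) => setTimeout(r, 2000));
  }
  throw new Error(`Selenium at ${host}:${port} never became ready`);
}

// In the config:
// onPrepare: () => waitForGrid(process.env.SELENIUM_HOST, process.env.SELENIUM_PORT),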
The Gotchas I Hit (So You Don’t Have To)
The ChromeDriver version mismatch error is the first thing that will make you question your career choices. The error message looks something like SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 114 — completely unhelpful unless you already know what it means. It means your chromedriver npm package and the Chrome binary installed in your CI container are out of sync. The fix I landed on: stop letting Chrome auto-update in your Docker image and pin it explicitly.
# In your Dockerfile
RUN apt-get install -y google-chrome-stable=114.0.5735.198-1 \
&& apt-mark hold google-chrome-stable
Then pin the matching chromedriver in package.json:
"chromedriver": "114.0.5735.39"
Yes, you’ll have to update both manually when you bump Chrome. That’s annoying. But the alternative is a flaky pipeline that breaks randomly when the CI image pulls a newer Chrome version. The manual update is five minutes. The mystery debugging session is three hours.
The App Wasn’t Ready — But Selenium Didn’t Care
Tests timing out against localhost:3000 because your app hasn’t finished starting is a classic race condition. Your CI step starts the dev server, immediately runs Selenium, and Selenium tries to load a page that returns a connection refused error. The fix is wait-on — a tiny utility that polls a URL until it responds with a 2xx before letting the next step proceed.
npm install --save-dev wait-on
# In your CI config (GitHub Actions example)
- name: Start app
  run: npm start &
- name: Wait for app
  run: npx wait-on http://localhost:3000 --timeout 60000
- name: Run Selenium tests
  run: npm run test:e2e
Set the timeout to something realistic for your app’s cold start. I use 60 seconds because our Next.js build takes a while on the CI machine. If you skip wait-on and just add a sleep 10, you’ll eventually have a slow day where the app takes 12 seconds and your tests fail for no good reason.
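wait-on also has a Node API, so if you'd rather not add another CI step, you can do the same polling inside WebdriverIO's onPrepare hook. A sketch, assuming the app is served on port 3000:
// wdio.conf.js (sketch): block the test run until the app answers HTTP GETs.
// The http-get:// prefix makes wait-on issue a GET instead of a HEAD request.
const waitOn = require('wait-on');

exports.config = {
  // ...the rest of your config
  onPrepare: async () => {
    await waitOn({
      resources: ['http-get://localhost:3000'],
      timeout: 60000, // match your app's realistic cold start
    });
  },
};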
Screenshots on Failure: They’re Silently Disappearing
This one stung me because the tests were screenshotting on failure — the files just got written to a temp path inside the container and evaporated when the job finished. You need two things: tell WebdriverIO where to write screenshots, then tell your CI to upload that directory as an artifact.
// wdio.conf.js
exports.config = {
outputDir: './test-output',
// ...
afterTest: async function(test, context, { error }) {
if (error) {
await browser.saveScreenshot(
`./test-output/screenshots/${test.title.replace(/\s/g, '_')}.png`
);
}
}
}
# GitHub Actions
- name: Upload test artifacts
if: failure()
uses: actions/upload-artifact@v3
with:
name: selenium-screenshots
path: ./test-output/screenshots/
The if: failure() part means the artifact upload only runs when tests fail — no point clogging up your artifact storage on green builds. I learned this the hard way after our artifact storage ballooned with thousands of screenshots from passing runs.
The --no-sandbox Flag Nobody Explains Properly
Every guide tells you to add --no-sandbox to your Chrome args. Almost none of them explain why. Chrome’s sandbox is a security feature that requires the process to not be running as root. CI containers — Docker containers in particular — run as root by default. When Chrome tries to set up its sandbox as root, it refuses and crashes. That’s the whole reason.
// wdio.conf.js
capabilities: [{
browserName: 'chrome',
'goog:chromeOptions': {
args: [
'--no-sandbox',
'--disable-dev-shm-usage', // also needed — /dev/shm is tiny in Docker
'--headless',
'--window-size=1920,1080'
]
}
}]
Add --disable-dev-shm-usage at the same time. Docker containers give Chrome a tiny shared memory allocation by default and Chrome will crash or behave strangely on anything with complex rendering. Together, these two flags are the baseline for running Chrome in any container environment.
Turn Off Parallel Runs Until Your Tests Deserve Them
The default maxInstances in WebdriverIO is 5. That means five browser instances running your test suite simultaneously. If your tests share state — same user account, same database, same localStorage — parallel runs will tear each other apart. The failures look completely random, which makes them brutal to debug.
// wdio.conf.js — start here
maxInstances: 1,
Get your suite fully green at maxInstances: 1 first. Then bump it to 2, run the suite a few times, check if failures creep back in. If they do, you have shared state to fix before you earn parallel runs. Going straight to 5 instances when your tests are new is borrowing trouble. The time you think you’re saving on parallel execution you’ll spend debugging ghost failures that only reproduce under concurrency.
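The other half of earning parallel runs is removing the shared state itself; the cheapest fix is giving every worker its own data. A sketch, where the account-creation step is a placeholder for whatever your app actually provides:
// test/helpers/uniqueUser.js (sketch): one user per worker so parallel specs
// can't trample each other. The signup step itself is app-specific.
export function uniqueTestUser() {
  const suffix = `${process.pid}-${Date.now()}`;
  return {
    email: `e2e-${suffix}@example.com`,
    password: `pw-${suffix}`,
  };
}

// In a spec:
// const user = uniqueTestUser();
// await signUpViaApi(user);            // placeholder for your account setup
// await LoginPage.login(user.email, user.password);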
Uploading Test Artifacts and Reading Reports
Setting Up Allure and Getting Useful Reports Out of CI
Most teams I’ve seen skip proper reporting and just read raw terminal output. That works until you have 80 tests failing at 2am and you’re squinting at a scrollback buffer trying to figure out which ones are flaky versus genuinely broken. Allure fixes this — it gives you a proper HTML report with per-test timelines, screenshots attached inline, and retry history. The setup is maybe 10 minutes.
npm install @wdio/allure-reporter allure-commandline --save-dev
Then in your wdio.conf.js, you wire in the reporter and tell it where to dump the raw results:
reporters: [
'spec',
['allure', {
outputDir: 'allure-results',
disableWebdriverStepsReporting: true,
disableWebdriverScreenshotsReporting: false,
}]
],
The disableWebdriverStepsReporting: true flag is one I wish I’d turned on from day one. Without it, every single WebDriver command — every findElement, every getAttribute — gets logged as a step in the Allure report. You end up with 300-step tests where finding the actual failure is harder than reading raw logs. Turn it off. You want your logical test steps, not the WebDriver plumbing.
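If you want meaningful steps to show up instead, the reporter exposes helpers you can call from your page objects. A sketch using addStep; check the @wdio/allure-reporter API for the rest of the helpers:
// Logical steps in the Allure timeline instead of raw WebDriver commands.
import allureReporter from '@wdio/allure-reporter';

async function login(username, password) {
  allureReporter.addStep(`Log in as ${username}`);
  await $('[data-testid="username-input"]').setValue(username);
  await $('[data-testid="password-input"]').setValue(password);
  await $('[data-testid="login-submit"]').click();
}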
After your test run, you generate the HTML report from the raw JSON output:
npx allure generate allure-results --clean -o allure-report
That allure-results/ directory is what CI needs to upload. The generated allure-report/ is what humans actually open. Upload both — the results let you regenerate or diff later, the HTML report is what you send to stakeholders or link in a PR comment.
Uploading Artifacts to GitHub Actions
Here’s the artifact upload block I use. It runs whether tests pass or fail — that if: always() is critical. If you omit it, GitHub Actions skips the upload step when tests fail, which is exactly when you need the screenshots most. I burned an afternoon before I caught that.
- name: Upload Allure Report
uses: actions/upload-artifact@v3
if: always()
with:
name: allure-report
path: allure-report/
retention-days: 14
- name: Upload Screenshots
uses: actions/upload-artifact@v3
if: always()
with:
name: screenshots
path: screenshots/
retention-days: 7
Keep screenshot retention shorter than reports. Screenshots stack up fast — a single test suite across 5 browsers can generate a few hundred MB. GitHub Actions gives you 500MB of artifact storage on the free tier before it starts eating into your paid storage quota, and screenshot folders are usually the main culprit. 7 days is plenty; if you haven’t debugged a failure within a week, you’re not going to.
Don’t Run Selenium on Every Branch Push
This is the one I push hardest on with junior devs: Selenium tests are slow. A realistic suite — login flows, checkout, form validation across multiple pages — takes 5 to 12 minutes on a standard GitHub-hosted runner. If you trigger that on every push to every feature branch, you’ll burn through your free Actions minutes in a week, and developers will start ignoring CI because it takes forever.
Gate Selenium on what matters:
on:
push:
branches:
- main
pull_request:
branches:
- main
Feature branches get your unit tests and maybe a smoke test. Full Selenium runs on main pushes and on PRs targeting main. This keeps feedback fast for day-to-day work and still catches regressions before they land. I’ve also seen teams add a workflow_dispatch trigger so developers can manually kick off a full run when they need it — that’s a sensible escape hatch.
Slack Notifications on Test Failure
The simplest Slack notification that actually works is a raw curl against an Incoming Webhook URL. No third-party GitHub Action required, no extra dependencies. You set up an Incoming Webhook in Slack’s app config (takes 3 minutes), store the URL as a GitHub Actions secret, and add this step:
- name: Notify Slack on Failure
if: failure()
run: |
curl -X POST -H 'Content-type: application/json' \
--data '{"text":"❌ Selenium tests failed on `${{ github.ref_name }}` — <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View Run>"}' \
${{ secrets.SLACK_WEBHOOK_URL }}
That’s it. The message links directly to the failing Actions run. The if: failure() condition means it only fires when something actually broke — not on every run. One thing that tripped me up: if: failure() checks whether any previous step in the job failed, not whether the overall workflow failed. If your test step is in a separate job, you need to pass the job status through using needs and check needs.test.result == 'failure' instead. Simple distinction, annoying to debug when your notifications stop firing.
If you want richer messages — attaching the number of failed tests, which suite, which browser — you can generate a small JSON payload from your test results and pass it into the curl body. But honestly, for most teams a link to the failing run is enough. People click through to see the details. Keep the notification simple or it turns into its own maintenance burden.
When to Reach for a Remote Grid Instead of Running Chrome in CI
Most CI setups run Chrome headless fine — until they don’t. The breaking point is usually one of three things: you need to test a browser your CI machine can’t run natively (Safari, IE11), your parallel test count has grown to where you’re queueing tests for minutes waiting on containers, or your flaky test rate spiked because you’re starving Chrome of RAM on a 2-core build runner. That’s the actual moment you should start looking at a remote grid — not before.
Sauce Labs vs BrowserStack vs Self-Hosted Selenium Grid: The Honest Breakdown
I’ve used all three at different jobs and the honest answer is that none of them is clearly better — they solve different problems. BrowserStack has a better browser coverage story and their Automate product is polished. As of 2024, their Automate plans start around $29/month for 1 parallel and go to $249/month for 5 parallels on the standard tier. Sauce Labs pricing is comparable but their free tier (called “Free Trial”) gives you 150 automated test minutes and then stops — it’s not a permanent free tier, it’s a trial. That distinction bit me when I set up a project on it and expected it to keep working. BrowserStack’s free tier for open source projects is genuinely free if your repo is public, which is useful for OSS work. Self-hosted Selenium Grid costs you nothing in licensing but you’re paying with engineering time. Running the Hub + Node architecture on your own infra means you’re dealing with Docker Compose configs, session limits, node registration failures, and the occasional mystery where a node disappears from the grid mid-test run.
When Self-Hosted Actually Makes Sense
I switched a team to self-hosted Grid for two specific reasons: we needed IE11 (a legacy enterprise client requirement), and we were running around 60 parallel tests on a suite that needed to finish in under 8 minutes. At that parallelism, Sauce Labs would have cost us roughly $800–$1,000/month on a plan with enough concurrency. Spinning up a Selenium Grid on 4 EC2 instances with Windows Server nodes for IE11 cost about $180/month in compute and maybe a day of setup. The maintenance overhead was real — we had to update drivers manually when Chrome updated and IE11 nodes needed babysitting — but the math made sense. If you’re under 20 parallel tests and don’t need exotic browsers, just use BrowserStack or Sauce and stop thinking about it.
The WebdriverIO Config Change — It’s Actually Four Lines
People make this sound more complicated than it is. You go from your local ChromeDriver setup to pointing at a remote grid by changing your wdio.conf.js like this:
// Before (local)
services: ['chromedriver'],
capabilities: [{ browserName: 'chrome' }],
// After (BrowserStack example)
hostname: 'hub.browserstack.com',
port: 443,
protocol: 'https',
path: '/wd/hub',
user: process.env.BROWSERSTACK_USERNAME,
key: process.env.BROWSERSTACK_ACCESS_KEY,
capabilities: [{
browserName: 'chrome',
'bstack:options': {
os: 'OS X',
osVersion: 'Sonoma',
browserVersion: 'latest',
projectName: 'My App',
buildName: process.env.CI_BUILD_NUMBER,
}
}],
That’s it. Remove the chromedriver service, add the four connection lines (hostname, port, protocol, path), swap in your credentials from environment variables, and add the bstack:options block to your capabilities. For Sauce Labs the hostname is ondemand.us-west-1.saucelabs.com and the cap namespace is sauce:options. For self-hosted Grid, hostname is your Grid Hub IP and you drop the auth entirely. The thing that caught me off guard the first time was forgetting to set buildName in the caps — without it, BrowserStack groups all your test sessions under "Unknown" in their dashboard and debugging a failing run becomes a painful scroll through hundreds of unlabeled sessions.
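For comparison, the self-hosted Grid variant of the same change is shorter still. The IP is a placeholder, and depending on your Grid version the path may be '/' instead of '/wd/hub':
// wdio.conf.js (sketch): self-hosted Grid, no credentials needed.
// 10.0.3.15 stands in for wherever your Hub actually runs.
hostname: '10.0.3.15',
port: 4444,
path: '/wd/hub',
capabilities: [{ browserName: 'chrome' }],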
Check Pricing Before You Commit — The Free Tier Limits Are Real
Both services have limits on their free tiers that will stop your CI pipeline dead without warning if you hit them. BrowserStack Automate’s free plan gives you 100 minutes of Automate testing for open source or a 30-day trial for private projects. Once those minutes are gone, your tests don’t degrade — they just fail to start a session and your CI job hangs or throws an authentication error. I’ve seen this confuse junior devs who think their code broke when actually the account just ran out of credits. Before you wire either service into CI, go to their pricing pages directly (browserstack.com/pricing and saucelabs.com/pricing) and look at the concurrency limits on each tier, not just the monthly price. A plan that looks affordable at $49/month might only give you 2 parallel sessions, which means a 100-test suite that could run in 4 minutes will take 40 minutes sequentially. That kills the whole point of using a remote grid.
Quick Reference: The wdio.conf.js Fields You Actually Need
Copy this config, drop it into your project root, and you have a working CI/CD-ready WebdriverIO setup. I’ve stripped out every field that exists just to fill documentation and left only what actually changes behavior in a pipeline context. Every comment explains the why, not just the what.
// wdio.conf.js
exports.config = {
// ---------------------------------------------------------------------------
// Runner: 'local' runs specs in the same process. Use 'browser' only for
// browser-native runner (WDIO v8+). Stick with 'local' for Node-side logic.
// ---------------------------------------------------------------------------
runner: 'local',
// ---------------------------------------------------------------------------
// baseUrl: The environment variable pattern is non-negotiable in CI.
// Locally you hit localhost:3000. In CI you hit an ephemeral preview URL.
// The fallback keeps your local dev flow working without touching .env files.
// ---------------------------------------------------------------------------
baseUrl: process.env.BASE_URL || 'http://localhost:3000',
// ---------------------------------------------------------------------------
// specs: Glob patterns. Keep e2e tests in a dedicated folder, not colocated
// with unit tests. Mixing them causes accidental parallel execution headaches.
// ---------------------------------------------------------------------------
specs: ['./test/e2e/**/*.spec.js'],
// ---------------------------------------------------------------------------
// exclude: Anything in helpers/ or fixtures/ that gets picked up by the glob.
// ---------------------------------------------------------------------------
exclude: ['./test/e2e/helpers/**'],
// ---------------------------------------------------------------------------
// maxInstances: In CI, match this to the number of CPU cores available.
// GitHub Actions free runners give you 2 cores. Setting this to 5 on a
// 2-core machine doesn't speed things up — it thrashes and creates flaky
// timeouts. I learned that the hard way on a 40-spec suite.
// ---------------------------------------------------------------------------
maxInstances: process.env.CI ? 2 : 5,
capabilities: [{
browserName: 'chrome',
'goog:chromeOptions': {
// ---------------------------------------------------------------------------
// args: headless:new is the modern flag (replaces --headless). The old
// --headless flag still works but renders differently from headed Chrome.
// --no-sandbox is required inside Docker/CI containers. Without it,
// Chrome refuses to start and gives a cryptic exit code 127.
// --disable-dev-shm-usage prevents crashes on low-memory CI containers.
// A default Docker container caps /dev/shm at 64MB,
// which Chrome will blow through instantly on heavier pages.
// ---------------------------------------------------------------------------
args: [
'--headless=new',
'--no-sandbox',
'--disable-dev-shm-usage',
'--window-size=1280,800',
],
},
}],
// ---------------------------------------------------------------------------
// logLevel: 'warn' in CI. 'info' locally when debugging. 'trace' when you're
// really stuck and need to see every WebDriver command round-trip.
// Commit with 'warn' or your CI logs become unreadable walls of text.
// ---------------------------------------------------------------------------
logLevel: process.env.CI ? 'warn' : 'info',
// ---------------------------------------------------------------------------
// bail: Stops the suite after N failures. I set this to 1 in CI to fail fast
// and not burn 15 minutes running specs against a broken build.
// Set to 0 (default) locally so you see all failures in one run.
// ---------------------------------------------------------------------------
bail: process.env.CI ? 1 : 0,
// ---------------------------------------------------------------------------
// waitforTimeout: Default timeout for waitFor* commands (ms).
// 5000ms locally is usually fine. Bump to 10000 in CI because containerized
// Chrome + network latency adds up, especially on app startup waits.
// ---------------------------------------------------------------------------
waitforTimeout: process.env.CI ? 10000 : 5000,
// ---------------------------------------------------------------------------
// connectionRetryTimeout / connectionRetryCount: How long WDIO waits to
// establish a WebDriver session. If chromedriver takes >120s to respond
// in CI, something is broken upstream. Don't mask that with a huge timeout.
// ---------------------------------------------------------------------------
connectionRetryTimeout: 120000,
connectionRetryCount: 3,
// ---------------------------------------------------------------------------
// services: chromedriver handles the chromedriver binary for you.
// No need to globally install chromedriver or manage PATH manually.
// ---------------------------------------------------------------------------
services: ['chromedriver'],
// ---------------------------------------------------------------------------
// framework: jasmine gives you describe/it/expect. mocha is fine too.
// Pick one and stay consistent across the team. Switching frameworks mid-
// project is a waste of an afternoon.
// ---------------------------------------------------------------------------
framework: 'jasmine',
jasmineOpts: {
defaultTimeoutInterval: 60000,
},
// ---------------------------------------------------------------------------
// reporters: spec reporter gives readable terminal output locally.
// allure reporter generates the HTML report CI can archive as an artifact.
// You don't need both in production — spec is fine for CI logs, allure for
// the scheduled nightly run you'll actually want to inspect.
// ---------------------------------------------------------------------------
reporters: [
'spec',
['allure', {
outputDir: 'allure-results',
disableWebdriverStepsReporting: true, // cuts noise significantly
disableWebdriverScreenshotsReporting: false,
}],
],
// ---------------------------------------------------------------------------
// onPrepare: Runs once before any spec files are loaded. This is where I
// validate that the app is actually up before wasting time launching Chrome.
// A failed health check here gives you a clear error instead of 40 specs
// failing with "net::ERR_CONNECTION_REFUSED".
// ---------------------------------------------------------------------------
onPrepare: async function() {
const http = require('http');
const url = new URL(exports.config.baseUrl);
await new Promise((resolve, reject) => {
const req = http.get(
{ host: url.hostname, port: url.port || 80, path: '/health' },
(res) => {
if (res.statusCode === 200) {
console.log(`[onPrepare] App is up at ${exports.config.baseUrl}`);
resolve();
} else {
reject(new Error(`Health check returned ${res.statusCode}`));
}
}
);
req.on('error', reject);
req.setTimeout(5000, () => reject(new Error('Health check timed out')));
});
},
// ---------------------------------------------------------------------------
// afterTest: Runs after every single test. This is your debugging lifeline.
// Screenshots on failure + the browser console log saved to disk means you
// can diagnose a flaky test from a CI artifact without re-running anything.
// ---------------------------------------------------------------------------
afterTest: async function(test, context, { error, passed }) {
if (!passed) {
const fs = require('fs');
const path = require('path');
const timestamp = Date.now();
const safeName = test.title.replace(/[^a-z0-9]/gi, '_').toLowerCase();
// Screenshot
const screenshotDir = './test-artifacts/screenshots';
fs.mkdirSync(screenshotDir, { recursive: true });
await browser.saveScreenshot(
path.join(screenshotDir, `${safeName}_${timestamp}.png`)
);
// Browser console logs — catches JS errors your test assertions miss
const logs = await browser.getLogs('browser');
const logDir = './test-artifacts/logs';
fs.mkdirSync(logDir, { recursive: true });
fs.writeFileSync(
path.join(logDir, `${safeName}_${timestamp}.json`),
JSON.stringify(logs, null, 2)
);
console.error(`[afterTest] FAILED: "${test.title}" — artifacts saved`);
}
},
};
The BASE_URL environment variable pattern is the single change that makes this config environment-agnostic. Locally, your shell has no BASE_URL set, so it falls back to localhost:3000. In GitHub Actions or CircleCI, you export BASE_URL to whatever your ephemeral deployment URL is — a Vercel preview URL, a Docker service alias like http://app:3000, or a staging subdomain. The config never changes. The pipeline handles it:
# In your CI pipeline (GitHub Actions example)
- name: Run E2E tests
env:
BASE_URL: http://localhost:3000 # or your preview deploy URL
run: npx wdio run wdio.conf.js
The thing that caught me off guard early on: onPrepare throwing an error does not automatically fail the pipeline with a useful message unless you let the error bubble. If you swallow it in a try/catch without re-throwing, WDIO happily proceeds to launch 40 specs against a dead server. Every spec fails with a connection error and you spend 20 minutes thinking your selectors broke. Let the rejection propagate. The health check endpoint path /health assumes your app exposes one — if it doesn’t, ping the root path and check for a non-500 status instead.
The afterTest hook saves browser console logs alongside screenshots. Most guides show just the screenshot. The browser console capture is what actually tells you why the page looked wrong — uncaught promise rejections, failed network requests, CSP violations that silently block a script. I’ve had tests fail because a third-party analytics script threw an error that froze an event listener, and the screenshot showed a perfectly normal-looking page. The console log had the real story. Make sure your CI job archives the test-artifacts/ directory as a build artifact, or those files disappear the moment the runner terminates.
Fields you can safely ignore for now
- suites — useful only once you have 100+ specs and need selective runs. Premature optimization before that point.
- exclude — only matters if your glob is too greedy. A well-structured test/e2e/ folder avoids the problem entirely.
- autoCompileOpts — needed only if you’re writing specs in TypeScript. Add it then, not before.
- hostname / port — only relevant if you’re pointing at a remote Selenium Grid. For local chromedriver, the defaults are correct.