Autonomous AI agent that shops e-commerce stores and reports what's broken
Third-party apps and widgets update silently — and when they do, they can break parts of your store without anyone knowing: A reviews widget adds 3 seconds to checkout, the cookie consent banner starts reappearing on every page because the add-on pushed a config change, and a geolocation popup covers the search bar on Android but looks fine on desktop.
It gets worse when code that worked perfectly in development hits production. A theme update shifts the Add to Cart button behind a sticky header — but only on mobile. Layout issues that the dev team solved locally resurface under real-world conditions: different devices, network throttling, CDN caching, third-party scripts loading in a different order.
These aren't theoretical — they're things we found the moment we ran the system against production stores.
The standard industry answer is synthetic monitoring (Datadog, Pingdom, New Relic), which tells you if an endpoint is up and how fast it responds, or session replay tools, like FullStory and Contentsquare, which show you what happened to real users, after the fact. Neither actually walks through a purchase journey on your live store every day and tells you concretely what went wrong and where.
We built an agent that navigates e-commerce sites the way a customer would — browsing products, adding to cart, filling checkout forms — and evaluates the experience at every step along the way.
The AI agent drives a headless Chromium instance with realistic device profiles to simulate the experience of a real user. For example, the mobile profile gets a Pixel 5 viewport with 4G network throttling and CPU constraints, while the desktop profile runs Chrome on Windows at broadband speeds.
The navigation is agentic. The system uses computer vision and DOM analysis to identify interactive elements — search inputs, product cards, Add to Cart buttons, form fields — and decides how to interact with them. Platform-specific knowledge (Shopify, WooCommerce) provides structural hints, but the agent handles actual page interaction through dynamic element resolution rather than brittle CSS paths. When a Shopify theme renames its classes after an update, the agent adapts because it resolves elements by intent — find the primary call-to-action, find the checkout button — not by hardcoded selectors.
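As a rough illustration of resolving elements by intent rather than by selector, here is a minimal sketch. It only covers a text-and-attribute heuristic over DOM candidates, not the computer vision side, and every name in it (`Candidate`, `INTENT_HINTS`, `resolve`, the scoring weights and the `MIN_SCORE` threshold) is a hypothetical stand-in rather than the system's actual code.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Candidate:
    """A clickable element extracted from the DOM (hypothetical shape)."""
    tag: str
    text: str = ""
    attrs: dict = field(default_factory=dict)
    area: float = 0.0  # rendered area in CSS pixels

# Assumed keyword hints per intent; a real system would likely use more signals.
INTENT_HINTS = {
    "add_to_cart": ("add to cart", "add to bag", "add to basket"),
    "checkout": ("checkout", "check out"),
}
MIN_SCORE = 2.0  # assumed cut-off below which no candidate is trusted

def score(c: Candidate, intent: str) -> float:
    """Score how well a candidate matches an intent, ignoring class names."""
    if c.area <= 0:
        return 0.0  # not rendered, so never a valid target
    hints = INTENT_HINTS[intent]
    visible_text = c.text.strip().lower()
    aria_label = c.attrs.get("aria-label", "").lower()
    points = 0.0
    if any(h in visible_text for h in hints):
        points += 3.0
    if any(h in aria_label for h in hints):
        points += 2.0
    if c.tag in ("button", "a", "input"):
        points += 1.0
    return points

def resolve(candidates: list, intent: str) -> Optional[Candidate]:
    """Return the best-matching candidate, or None if nothing is convincing."""
    best = max(candidates, key=lambda c: score(c, intent), default=None)
    if best is None or score(best, intent) < MIN_SCORE:
        return None
    return best
```

Because the score never looks at CSS classes, a theme update that renames `btn-primary` to something else leaves the result unchanged as long as the visible label or accessible name still expresses the intent.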
Every run executes full purchase flows (search → product page → cart → checkout) on both device profiles and produces a full evidence package: step-by-step screenshots, video recording, HAR traces, console logs, and a structured issue report. Blockers trigger instant email alerts.
Driving and navigating a site like a human is one part of the story; identifying what's wrong is the other. We built a set of detectors that evaluate the experience at every interaction point during a flow:
Before every click, the agent verifies the target is visible, has real dimensions, and isn't covered by fixed or sticky elements. A header sitting on top of 60% of the Add to Cart button on mobile — that gets flagged with a screenshot showing the exact overlap.
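The pre-click check above boils down to rectangle arithmetic once bounding boxes are known. The sketch below assumes the boxes have already been read from the browser; `Rect`, `preclick_issues`, and the 25% `max_covered` threshold are illustrative names and values, not the product's real ones.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float

    @property
    def area(self) -> float:
        return max(self.width, 0.0) * max(self.height, 0.0)

def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles."""
    w = min(a.x + a.width, b.x + b.width) - max(a.x, b.x)
    h = min(a.y + a.height, b.y + b.height) - max(a.y, b.y)
    return max(w, 0.0) * max(h, 0.0)

def covered_fraction(target: Rect, overlay: Rect) -> float:
    """Share of the target hidden under one overlay (0.0 to 1.0)."""
    if target.area == 0:
        return 0.0
    return overlap_area(target, overlay) / target.area

def preclick_issues(target: Rect, overlays: dict, max_covered: float = 0.25) -> list:
    """List human-readable problems that should be flagged before clicking."""
    if target.area == 0:
        return ["target has no rendered size"]
    issues = []
    for name, rect in overlays.items():
        frac = covered_fraction(target, rect)
        if frac > max_covered:  # max_covered is an assumed threshold
            issues.append(f"{name} covers {frac:.0%} of target")
    return issues
```

For a 200×50 button at `(20, 40)` under a 70-pixel-tall header pinned to the top of a 390-pixel-wide viewport, `preclick_issues` reports that the header covers 60% of the target. Each overlay is checked on its own here; combining several partial overlaps would need a union of rectangles.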
After every click, we classify what happened: page navigation, SPA route change, DOM mutation, or nothing. "Nothing" means the click had no observable effect. That could be a broken handler, an intercepted event, or an element that looks clickable but isn't wired up. All of these are nearly impossible to catch from server logs.
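One way to picture the four-way classification is as a comparison of two page snapshots taken before and after the click. This is only a sketch: the `PageSnapshot` fields (`document_id`, `mutation_count`) are assumed to be collected by the browser harness and are not taken from the actual system.

```python
from dataclasses import dataclass
from enum import Enum

class ClickOutcome(Enum):
    NAVIGATION = "page navigation"
    SPA_ROUTE_CHANGE = "SPA route change"
    DOM_MUTATION = "DOM mutation"
    NOTHING = "nothing"

@dataclass(frozen=True)
class PageSnapshot:
    url: str
    document_id: str     # hypothetical: a token that changes on every full document load
    mutation_count: int  # hypothetical: running total reported by a MutationObserver

def classify_click(before: PageSnapshot, after: PageSnapshot) -> ClickOutcome:
    """Pick the strongest observable effect of a click."""
    if after.document_id != before.document_id:
        return ClickOutcome.NAVIGATION        # a new document was loaded
    if after.url != before.url:
        return ClickOutcome.SPA_ROUTE_CHANGE  # same document, different URL
    if after.mutation_count > before.mutation_count:
        return ClickOutcome.DOM_MUTATION      # page content changed in place
    return ClickOutcome.NOTHING               # no observable effect at all
```

The checks run from strongest to weakest signal, so a full navigation is never misreported as a DOM mutation even though the mutation counter restarts with the new document.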
LCP, CLS, INP, FCP, TTFB, long task count — measured at each step. Your site might be "fast" on average, but if search loads in 1.2s and checkout takes 4.8s, that matters.
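To make per-step measurement concrete, the sketch below rates each step's metrics against the publicly documented Core Web Vitals thresholds (good / needs improvement / poor). Long task count is left out because it has no published threshold, and the helper names `rate` and `rate_steps` are illustrative.

```python
# (good_upper_bound, poor_lower_bound) per metric.
# Times are in milliseconds; CLS is unitless.
THRESHOLDS = {
    "LCP": (2500, 4000),
    "INP": (200, 500),
    "CLS": (0.1, 0.25),
    "FCP": (1800, 3000),
    "TTFB": (800, 1800),
}

def rate(metric: str, value: float) -> str:
    """Rate a single measurement as good, needs improvement, or poor."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"

def rate_steps(steps: dict) -> dict:
    """Rate every metric of every step: {step: {metric: rating}}."""
    return {
        step: {metric: rate(metric, value) for metric, value in metrics.items()}
        for step, metrics in steps.items()
    }
```

Treating the two load times from the example as LCP values (an assumption), `rate_steps({"search": {"LCP": 1200}, "checkout": {"LCP": 4800}})` rates search as good and checkout as poor, which is exactly the difference a site-wide average hides.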
CLS measured across each user action, not just initial page load. Product added to cart causes a 0.3 shift? We show exactly which step triggered it.
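A simple way to attribute layout shift to user actions is to bucket each shift by the step that was active when it happened. The sketch below assumes the harness already has step start times and a list of layout-shift entries. It sums shifts per step and deliberately skips the session-window grouping that the official CLS definition uses, so the numbers are a per-step approximation.

```python
from bisect import bisect_right

def shift_per_step(step_starts: list, shifts: list) -> dict:
    """Sum layout-shift values per step.

    step_starts: (step_name, start_ms) tuples sorted by start time.
    shifts:      (timestamp_ms, shift_value) tuples in any order.
    """
    names = [name for name, _ in step_starts]
    starts = [start for _, start in step_starts]
    totals = {name: 0.0 for name in names}
    for timestamp, value in shifts:
        # Index of the last step that started at or before this shift.
        idx = bisect_right(starts, timestamp) - 1
        if idx >= 0:  # ignore shifts recorded before the first step
            totals[names[idx]] += value
    return totals
```

With steps starting at `open_pdp = 0 ms` and `add_to_cart = 5000 ms`, shifts of 0.2 and 0.1 arriving at 5300 ms and 5600 ms are both credited to `add_to_cart`, giving a per-step total of about 0.3.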
The system handles known overlays (cookies, geo banners, newsletter popups, chat widgets) automatically but tracks cumulative time consumed. A cookie dialog reappearing on every navigation? That shows up as measurable friction.
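Tracking overlay friction can be as simple as keeping a running tally per overlay type. The class below is a hypothetical sketch; `OverlayFriction` and its method names are not from the actual system.

```python
from collections import defaultdict

class OverlayFriction:
    """Accumulates time spent dismissing overlays during one flow."""

    def __init__(self) -> None:
        self.seconds = defaultdict(float)  # total seconds per overlay kind
        self.count = defaultdict(int)      # number of appearances per kind

    def record(self, kind: str, seconds: float) -> None:
        """Register one overlay appearance and the time it cost."""
        self.seconds[kind] += seconds
        self.count[kind] += 1

    def total(self) -> float:
        """Total seconds lost to overlays across the whole flow."""
        return sum(self.seconds.values())

    def repeats(self, min_count: int = 2) -> list:
        """Overlay kinds that appeared at least min_count times."""
        return sorted(k for k, c in self.count.items() if c >= min_count)
```

A cookie dialog dismissed three times at 1.5 seconds each then shows up both in `total()` and in `repeats()`, while a one-off newsletter popup only adds to the total.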
Console errors, failed requests, 4xx/5xx responses: all captured throughout the flow. Severity depends on volume and context: 40 JS errors and 12 failed API calls during checkout is a blocker, while a single console warning is not.
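The idea that severity scales with volume can be sketched as a small threshold function. The cut-offs below (`blocker_errors=10`, `blocker_failed=5`) and the severity labels are assumptions chosen for illustration, not the thresholds the system actually uses.

```python
from dataclasses import dataclass

@dataclass
class StepSignals:
    """Error signals captured during one step of a flow."""
    js_errors: int = 0
    failed_requests: int = 0   # includes 4xx/5xx responses
    console_warnings: int = 0

def severity(s: StepSignals, blocker_errors: int = 10, blocker_failed: int = 5) -> str:
    """Map raw counts to a severity label using assumed thresholds."""
    if s.js_errors >= blocker_errors or s.failed_requests >= blocker_failed:
        return "blocker"
    if s.js_errors or s.failed_requests:
        return "warning"
    return "info" if s.console_warnings else "ok"
```

Under these assumed thresholds, `severity(StepSignals(js_errors=40, failed_requests=12))` comes out as a blocker, while a step with a single console warning is only informational.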
Not every overlap blocks a click: an overlay with pointer-events: none looks like it's blocking but isn't. We built a grid-sampling system that probes a matrix of points across each target element, traces through the DOM to find fixed or sticky ancestors at each point, computes effective opacity through the full ancestor chain, and classifies the obstruction as hit-blocking or visual-only, with different severity thresholds for each.
The agent runs against live stores, given only a basic purchase goal.
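For the obstruction check, a minimal sketch of grid sampling might look like the following. It works on precomputed boxes instead of a live DOM, and `Box`, `Overlay`, `classify_obstruction`, the 5×5 grid, and the `min_opacity` cut-off are all assumptions made for the example.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height

@dataclass(frozen=True)
class Overlay:
    box: Box
    pointer_events_none: bool = False
    opacity: float = 1.0  # effective opacity, see effective_opacity()

def effective_opacity(chain: list) -> float:
    """Opacity composes multiplicatively from the element up through its ancestors."""
    return math.prod(chain)

def grid_points(target: Box, rows: int = 5, cols: int = 5) -> list:
    """Centres of a rows x cols grid of cells laid over the target."""
    return [
        (target.x + (c + 0.5) * target.width / cols,
         target.y + (r + 0.5) * target.height / rows)
        for r in range(rows) for c in range(cols)
    ]

def classify_obstruction(target: Box, overlays: list, rows: int = 5,
                         cols: int = 5, min_opacity: float = 0.1) -> dict:
    """Fraction of sample points that are hit-blocked vs. only visually covered."""
    points = grid_points(target, rows, cols)
    hit = visual = 0
    for px, py in points:
        covering = [o for o in overlays if o.box.contains(px, py)]
        if any(not o.pointer_events_none for o in covering):
            hit += 1      # something here would intercept the click
        elif any(o.opacity >= min_opacity for o in covering):
            visual += 1   # visible but click-through (pointer-events: none)
    return {"hit_blocking": hit / len(points), "visual_only": visual / len(points)}
```

A 60-pixel header over a button spanning y = 40 to 90 covers two of the five sample rows, so the result is 40% hit-blocking. Marking the same header as `pointer_events_none=True` moves that 40% into the visual-only bucket, which is the distinction that lets the two cases carry different severities.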
On the first run against a mid-size fashion retailer, it flagged a sticky header covering 40% of the Add to Cart button on mobile, a cookie banner re-triggering on every page navigation, and checkout load times above 4 seconds under 4G conditions. None of this was visible in their analytics.
| Step | Status | Time | LCP (ms) | CLS |
|---|---|---|---|---|
| Flow: Search → Cart | | 38s | | |
| open_home | ✓ | 6.6s | 1508 | 0.00 |
| submit_search | ✓ | 8.3s | 3912 | 0.40 |
| open_pdp | ✓ | 5.0s | 904 | 0.00 |
| add_to_cart | ✗ | 4.0s | — | 0.00 |
| Flow: Cart → Checkout | | 33s | | |
| go_checkout | ✓ | 8.1s | 4200 | 0.01 |
| fill_email | ✓ | 2.3s | — | 0.00 |