FOR TEAMS BUILDING SHOPPING AGENTS

Ship a shopping agent that checks out safely.

You’re building a shopping-orchestration AI that carts and checks out across merchant platforms over the Universal Commerce Protocol (UCP). It handles real money and real user accounts. We test the ways it goes wrong that a passing conformance run can’t catch: paying the wrong amount, getting phished, failing the checkout.

39
Failure modes tested
6
You can watch live
100%
Proven to catch the bug

Your agent can pass every check and still lose money.

A conformance test reads the shape of the messages your agent sends. It can’t see your agent quietly paying a total that doesn’t add up, following a phishing link out of an error message, or trusting a store response it never verified. Those are behaviors — and they’re where real checkouts go wrong, with a real user’s card.

~99% of UCP stores pass conformance
Nearly every store passes the schema checks — yet a large share of real agent checkouts still fail. The message shapes are fine; the agent’s behavior is what nobody tests. Conformance is not reliability. That’s the gap we close.

The six ways an agent goes wrong — that conformance misses.

Each is a real behavior your agent must get right when it shops. Every test is proven to catch its own bug — it passes a known-good agent and provably fails the broken one. Watch all six live in the demo.
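To make "proven to catch its own bug" concrete, here is a minimal sketch of the idea: a check only counts if it passes a known-good agent and fails an agent with the bug seeded in. The agents, the check, and the `check_is_proven` helper are all hypothetical stand-ins for illustration, not the suite's actual code.

```python
def good_agent(items, total):
    """Known-good toy agent: buys only when the items add up to the total."""
    return "buy" if sum(items) == total else "stop"


def broken_agent(items, total):
    """Toy agent with a seeded bug: it never looks at the total."""
    return "buy"


def check_total_reconciles(agent):
    """Pass only if the agent stops on a checkout whose total is wrong."""
    return agent(items=[10, 5], total=20) == "stop"


def check_is_proven(check, good, broken):
    """A check is proven when it passes the good agent and fails the broken one."""
    return check(good) and not check(broken)


proven = check_is_proven(check_total_reconciles, good_agent, broken_agent)
```

A check that passes both agents proves nothing, which is why the broken agent matters as much as the good one.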

💰

Pays the wrong amount

The line items and total don’t reconcile, but the agent completes the purchase anyway instead of stopping for the buyer.

watch live
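A minimal sketch of the reconciliation an agent could run before paying, assuming a simplified payload. The field names (`line_items`, `amount`, `total`) are illustrative and are not taken from the UCP schema.

```python
from decimal import Decimal


def totals_reconcile(line_items, stated_total, tolerance=Decimal("0.00")):
    """Return True when the line-item amounts add up to the stated total."""
    computed = sum((Decimal(item["amount"]) for item in line_items), Decimal("0"))
    return abs(computed - Decimal(stated_total)) <= tolerance


# Hypothetical checkout payload; the stated total is 5.00 too high.
checkout = {
    "line_items": [
        {"label": "Headphones", "amount": "59.99"},
        {"label": "Shipping", "amount": "4.99"},
        {"label": "Tax", "amount": "5.20"},
    ],
    "total": "75.18",
}

if totals_reconcile(checkout["line_items"], checkout["total"]):
    decision = "complete"
else:
    decision = "escalate_to_buyer"  # stop and ask instead of paying
```

`Decimal` is used instead of floats so that currency amounts compare exactly.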
🎣

Gets phished

The agent follows a decoy link hidden in an error message and hands the user’s login to an attacker’s server.

watch live
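One way an agent might guard against this, sketched under the assumption that it already knows the merchant's origin: refuse any link whose scheme, host, or port differs from that origin. The function name and the example hosts are hypothetical.

```python
from urllib.parse import urlsplit


def is_trusted_link(url, merchant_origin):
    """Return True only for HTTPS links on the merchant's own host and port."""
    link = urlsplit(url)
    origin = urlsplit(merchant_origin)
    return (
        link.scheme == "https"
        and link.hostname is not None
        and link.hostname == origin.hostname
        and (link.port or 443) == (origin.port or 443)
    )


merchant = "https://shop.example"
real_link = "https://shop.example/account/login"
lookalike = "https://shop.example.evil.test/login"  # extra suffix, different host
userinfo_trick = "https://shop.example@evil.test/login"  # real host is evil.test
```

Comparing parsed hostnames, rather than searching the URL string for the merchant's name, is what defeats both decoys.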
🔒

Trusts a forged response

The agent skips verifying the store’s signed response, so a tampered or fake reply is accepted as real.

watch live
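The shape of the check, as a rough sketch: verify the signature over the exact bytes received before trusting anything in the body. This example uses HMAC-SHA256 with a shared key purely as a stand-in, because it is available in the Python standard library; it is not the signing scheme UCP specifies, and the key and payloads are made up.

```python
import hashlib
import hmac


def verify_response(body: bytes, signature_hex: str, key: bytes) -> bool:
    """Return True when signature_hex is a valid HMAC-SHA256 of body under key."""
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(expected, signature_hex)


# Hypothetical demo values (assumption: a shared key stands in for the real scheme).
key = b"demo-key"
body = b'{"total":"70.18"}'
signature = hmac.new(key, body, hashlib.sha256).hexdigest()
tampered_body = b'{"total":"7.18"}'

accept_original = verify_response(body, signature, key)
accept_tampered = verify_response(tampered_body, signature, key)
```

An agent that skips this step treats `tampered_body` exactly like `body`.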
🔑

Links accounts unsafely

Missing PKCE or an unchecked issuer lets an attacker hijack the OAuth flow and capture the linked account.

watch live
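A short sketch of the two safeguards named above: an S256 PKCE challenge as defined in RFC 7636, and an exact-match issuer comparison. The helper names and the example issuer URLs are hypothetical; how the issuer reaches your agent depends on your OAuth setup.

```python
import base64
import hashlib
import secrets


def pkce_challenge(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) with padding removed."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair():
    """Create a fresh random verifier and its matching challenge."""
    verifier = secrets.token_urlsafe(32)  # 43 URL-safe characters
    return verifier, pkce_challenge(verifier)


def issuer_matches(expected_issuer, returned_issuer):
    """Exact string match only; a missing issuer is a failure."""
    return returned_issuer is not None and returned_issuer == expected_issuer


verifier, challenge = make_pkce_pair()
```

The agent sends `challenge` with the authorization request and `verifier` with the token request, so a stolen authorization code is useless without the verifier.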
💳

Pays with the wrong method

The agent pays with a payment type the store never offered — an unauthorized instrument.

watch live
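A minimal sketch of the guard, assuming the store advertises its accepted payment types as a list of strings. The method names are made up for illustration.

```python
def choose_payment_method(offered, wallet):
    """Return the first wallet method the store offered, or None if there is none."""
    offered_set = set(offered)
    for method in wallet:  # wallet is ordered by the buyer's preference
        if method in offered_set:
            return method
    return None  # nothing authorized: stop instead of guessing


# Hypothetical values: the store never offered "crypto".
store_offers = ["card", "bank_transfer"]
buyer_wallet = ["crypto", "card"]

selected = choose_payment_method(store_offers, buyer_wallet)
no_overlap = choose_payment_method(store_offers, ["crypto"])
```

Returning `None` forces the caller to handle the no-match case explicitly rather than falling back to an instrument the store never authorized.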
🧹

Leaks or over-shares

The agent sends fields it shouldn’t, forgets to identify itself, or never revokes access when the user unlinks.

+ 34 more

Point your agent at a verified store. See if it actually buys.

There’s no easy way to try a real UCP checkout end-to-end yet — so we host a store that’s provably correct, and one that behaves badly on purpose. Your agent shops both, and we grade exactly how it behaves.

1

Your agent shops our store

Point it at our verified merchant sandbox and let it run a full checkout — discovery, payment, the works.

2

We throw problems at it

Bad signatures, spliced login servers, mismatched totals, phishing decoys — the things a real store might do wrong or an attacker might try.

3

You get a reliability report

Exactly which behaviors your agent got right, and which ones would have cost a user real money — the findings conformance never surfaces.

Watch a correct agent — then break it.

The demo replays real recorded runs. Flip one flaw and watch exactly what the agent does wrong — and how it’s caught. No signup.

or run it against your own agent
# clone the suite and run the agent-reliability lane
$ git clone https://github.com/vishkaty/ucp-conformance
$ cd ucp-conformance
$ python3 conformance/agent/run_agent.py
agent lane: reference agent ran 8 ops (ok); 38 checks
PASS — every check passes a correct agent and catches its bug.
Open the live demo →
Building the merchant side instead? Test that AI shopping agents can actually discover and buy from your catalog, cart & checkout.
Merchant platforms →

Building a shopping agent — or vetting one?

If you’re shipping an AI that checks out, or a platform that has to trust third-party agents, test what actually breaks before it touches a real card. The demo is free and open.

Unofficial. spck.dev is an independent, community effort — not affiliated with or endorsed by the Universal Commerce Protocol project. Results reflect only the checks run, not certified compliance; the official conformance suite is authoritative. We grade an agent’s observed behavior against the spec’s normative requirements.