FOR TEAMS BUILDING SHOPPING AGENTS

Ship a shopping agent that checks out safely.

You’re building a shopping-orchestration AI that carts and checks out across merchant platforms over UCP. It handles real money and real user accounts. We test the ways it goes wrong that a passing conformance run can’t catch: paying the wrong amount, getting phished, failing the checkout.

Watch an agent break, live →

Why conformance isn’t enough ↓

Failure modes tested

You can watch live

100%

Proven to catch the bug

The problem

Your agent can pass every check and still lose money.

A conformance test reads the shape of the messages your agent sends. It can’t see your agent quietly paying a total that doesn’t add up, following a phishing link out of an error message, or trusting a store response it never verified. Those are behaviors — and they’re where real checkouts go wrong, with a real user’s card.

~99% of UCP stores pass conformance

Nearly every store passes the schema checks — yet a large share of real agent checkouts still fail. The message shapes are fine; the agent’s behavior is what nobody tests. Conformance is not reliability. That’s the gap we close.

What we catch

Six ways an agent goes wrong that conformance misses.

Each is a real behavior your agent must get right when it shops. Every test is proven to catch its own bug — it passes a known-good agent and provably fails the broken one. Watch all six live in the demo.

💰

Pays the wrong amount

The line items and total don’t reconcile, but the agent completes the purchase anyway instead of stopping for the buyer.

watch live
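
The reconciliation behind this check can be sketched in a few lines. This is a minimal illustration, not UCP's schema: the `LineItem` type, its field names, the separate `tax` and `shipping` amounts, and the use of integer minor units (cents) are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class LineItem:
    unit_price: int  # minor units (cents); hypothetical field name
    quantity: int

def totals_reconcile(items, tax, shipping, stated_total):
    """Return True only if the stated total equals the recomputed sum."""
    computed = sum(i.unit_price * i.quantity for i in items) + tax + shipping
    return computed == stated_total

cart = [LineItem(unit_price=1999, quantity=2), LineItem(unit_price=500, quantity=1)]
# 1999*2 + 500 = 4498, plus 360 tax and 599 shipping = 5457
ok = totals_reconcile(cart, tax=360, shipping=599, stated_total=5457)
# A total that is off by a single cent should stop the checkout.
off_by_one = totals_reconcile(cart, tax=360, shipping=599, stated_total=5458)
```

Working in integer minor units avoids floating-point rounding, so an exact equality check is safe. When the function returns False, the agent should hand control back to the buyer rather than complete the purchase.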

🎣

Gets phished

The agent follows a decoy link hidden in an error message and hands the user’s login to an attacker’s server.

watch live
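
One way an agent might avoid this is to follow a link only when its host exactly matches a host it already trusts. The sketch below assumes a hypothetical allowlist (`TRUSTED_HOSTS`) and placeholder hostnames; how a real agent learns which hosts belong to a store is outside this example.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; in practice it would come from the store's
# verified discovery data, not from the error message itself.
TRUSTED_HOSTS = {"shop.example.com", "auth.example.com"}

def is_trusted_link(url, trusted_hosts=TRUSTED_HOSTS):
    """Accept only https links whose host exactly matches a trusted host."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in trusted_hosts

good = is_trusted_link("https://auth.example.com/login")
# Look-alike host: the trusted name is only a prefix of the real host.
lookalike = is_trusted_link("https://auth.example.com.evil.test/login")
# Userinfo trick: the real host is whatever follows the "@".
userinfo = is_trusted_link("https://auth.example.com@evil.test/login")
```

Exact matching on the parsed hostname, rather than a substring search on the raw URL, is what defeats the look-alike and userinfo variants.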

🔒

Trusts a forged response

The agent skips verifying the store’s signed response, so a tampered or fake reply is accepted as real.

watch live
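
The shape of the missing check can be sketched as follows. The page does not say which signature scheme a store uses, so HMAC-SHA256 with a shared key stands in here purely for illustration; a real agent may instead verify an asymmetric signature against the store's published key. The function name, key, and message body are hypothetical.

```python
import hashlib
import hmac

def response_is_authentic(body: bytes, signature_hex: str, key: bytes) -> bool:
    """Recompute the MAC over the exact bytes received and compare safely."""
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of a forged signature happened to match.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())

key = b"demo-shared-key"                      # hypothetical key
body = b'{"total": 5457, "currency": "USD"}'  # hypothetical response body
signature = hmac.new(key, body, hashlib.sha256).hexdigest()

genuine = response_is_authentic(body, signature, key)
# Changing even one byte of the body invalidates the signature.
tampered = response_is_authentic(body.replace(b"5457", b"0001"), signature, key)
```

Whatever the scheme, the point is the same: the agent acts on a response only after verification succeeds, and treats a missing signature as a failure rather than a pass.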

🔑

Links accounts unsafely

Missing PKCE or an unchecked issuer lets an attacker hijack the OAuth flow and capture the linked account.

watch live
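
Both safeguards named above are small in code. The sketch below generates a PKCE verifier and its S256 challenge as defined in RFC 7636, and compares the issuer returned by the login server with the one the agent expected. The function names are illustrative, and how an agent learns the expected issuer is an assumption left outside the example.

```python
import base64
import hashlib
import secrets

def new_code_verifier() -> str:
    # 64 random bytes encode to 86 URL-safe characters (allowed range: 43-128).
    return secrets.token_urlsafe(64)

def code_challenge_s256(verifier: str) -> str:
    # S256: base64url(SHA-256(verifier)) with the trailing "=" padding removed.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def issuer_is_expected(expected_issuer: str, returned_issuer) -> bool:
    # Exact string match; a missing issuer is treated as a failure.
    return returned_issuer is not None and returned_issuer == expected_issuer

verifier = new_code_verifier()
challenge = code_challenge_s256(verifier)
```

The agent sends `challenge` with the authorization request and `verifier` with the token request, so an intercepted authorization code is useless without the verifier.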

💳

Pays with the wrong method

The agent pays with a payment type the store never offered — an unauthorized instrument.

watch live
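
A correct agent picks its payment method from the intersection of what the buyer allows and what the store offered. A minimal sketch, with hypothetical method identifiers and a hypothetical function name:

```python
def choose_payment_method(offered, preferred):
    """Return the first buyer-preferred method the store offered, else None."""
    offered_set = set(offered)
    for method in preferred:
        if method in offered_set:
            return method
    return None  # nothing in common: stop and ask the buyer

store_offers = ["card", "wallet"]       # hypothetical identifiers
buyer_prefers = ["bank_transfer", "wallet", "card"]

chosen = choose_payment_method(store_offers, buyer_prefers)
no_overlap = choose_payment_method(["card"], ["bank_transfer"])
```

Returning `None` instead of falling back to a default is deliberate: with no overlap, the safe behavior is to stop, not to pay with an instrument the store never listed.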

🧹

Leaks or over-shares

Sends fields it shouldn’t, forgets to identify itself, or never revokes access when the user unlinks.
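
Over-sharing is often easiest to prevent with an explicit allowlist applied just before a request is sent. The sketch below assumes hypothetical field names and a hypothetical per-operation allowlist; it illustrates only the field-minimization part of this failure mode, not agent identification or revocation.

```python
def minimize_payload(payload, allowed_fields):
    """Keep only the fields this operation is allowed to send."""
    return {k: v for k, v in payload.items() if k in allowed_fields}

# Hypothetical allowlist for a checkout request.
CHECKOUT_FIELDS = {"cart_id", "shipping_address", "payment_method"}

draft = {
    "cart_id": "c-123",
    "shipping_address": "1 Main St",
    "payment_method": "card",
    "date_of_birth": "1990-01-01",  # not needed for checkout
    "loyalty_password": "hunter2",  # should never leave the agent
}

outgoing = minimize_payload(draft, CHECKOUT_FIELDS)
```

An allowlist fails closed: a new field added to the agent's internal state stays private until someone deliberately permits it.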

+ 34 more

How you test it

Point your agent at a verified store. See if it actually buys.

There’s no easy way to try a real UCP checkout end-to-end yet — so we host a store that’s provably correct, and one that behaves badly on purpose. Your agent shops both, and we grade exactly how it behaves.

Your agent shops our store

Point it at our verified merchant sandbox and let it run a full checkout — discovery, payment, the works.

We throw problems at it

Bad signatures, spliced login servers, mismatched totals, phishing decoys — the things a real store might do wrong or an attacker might try.

You get a reliability report

Exactly which behaviors your agent got right, and which ones would have cost a user real money — the findings conformance never surfaces.

Try it now

Watch a correct agent — then break it.

The demo replays real recorded runs. Flip one flaw and watch exactly what the agent does wrong — and how it’s caught. No signup.

or run it against your own agent

# clone the suite and run the agent-reliability lane
$ git clone https://github.com/vishkaty/ucp-conformance
$ cd ucp-conformance
$ python3 conformance/agent/run_agent.py
agent lane: reference agent ran 8 ops (ok); 38 checks
PASS — every check passes a correct agent and catches its bug.

Open the live demo →

Building a shopping agent — or vetting one?

If you’re shipping an AI that checks out, or a platform that has to trust third-party agents, test what actually breaks before it touches a real card. The demo is free and open.

Try the live demo

Run it on GitHub

Unofficial. spck.dev is an independent, community effort — not affiliated with or endorsed by the Universal Commerce Protocol project. Results reflect only the checks run, not certified compliance; the official conformance suite is authoritative. We grade an agent’s observed behavior against the spec’s normative requirements.