Sieve: conformance testing ML-KEM and ML-DSA against the bugs that matter

TL;DR

Read this first

Known-answer tests are necessary but not sufficient: they exercise only round-trips on well-formed inputs. Sieve exercises malformed, out-of-bounds, and edge-case inputs, with each test tagged to a real bug class. Run it when you first integrate a library, in CI, and before each release.

A conformance harness is not a unit-test suite for one library. It exercises any implementation against a curated set of test categories, each targeting a specific class of bug we have observed in real-world audits or the public literature.

What conformance testing does and does not give you

It detects known bug classes; it cannot find an unknown one.
It validates a primitive, not the system that uses it. A correct ML-KEM can still be misused at the protocol layer.
It complements an audit; it does not replace one. Audits look at design and integration.

Decision

When to run it

Run conformance tests at three points: once when you first integrate a post-quantum library to establish a baseline, in CI on every commit that touches the cryptographic dependency, and once before each release against the release artefact rather than a development build.

From the audit floor

Implementations that pass KATs but fail conformance

We have seen multiple ML-KEM implementations pass every NIST known-answer test yet accept malformed ciphertexts that a stricter implementation would reject. Passing the KATs is necessary, but it is not sufficient.
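To make the gap concrete, here is a minimal Python sketch of two input checks that FIPS 203 specifies for ML-KEM and that a KAT-only run never exercises: the ciphertext length check before decapsulation, and the length and modulus checks on an encapsulation key before encapsulation. The function names and the PARAMS table layout are illustrative assumptions, not part of Sieve or of any particular library.

```python
# Illustrative sketch only: names and structure are assumptions, not Sieve's code.
Q = 3329  # ML-KEM modulus

# Per-parameter-set constants from FIPS 203:
# k is the module rank, du and dv are the ciphertext compression bit widths.
PARAMS = {
    "ML-KEM-512": {"k": 2, "du": 10, "dv": 4},
    "ML-KEM-768": {"k": 3, "du": 10, "dv": 4},
    "ML-KEM-1024": {"k": 4, "du": 11, "dv": 5},
}


def ciphertext_length_ok(name: str, ct: bytes) -> bool:
    """Ciphertext type check: length must be exactly 32 * (du * k + dv) bytes."""
    p = PARAMS[name]
    return len(ct) == 32 * (p["du"] * p["k"] + p["dv"])


def encapsulation_key_ok(name: str, ek: bytes) -> bool:
    """Encapsulation key checks: exact length, then every coefficient below Q."""
    k = PARAMS[name]["k"]
    if len(ek) != 384 * k + 32:
        return False
    body = ek[: 384 * k]  # the trailing 32 bytes are the seed, not coefficients
    # Each group of 3 bytes packs two 12-bit little-endian coefficients.
    for i in range(0, len(body), 3):
        b0, b1, b2 = body[i], body[i + 1], body[i + 2]
        c0 = b0 | ((b1 & 0x0F) << 8)
        c1 = (b1 >> 4) | (b2 << 4)
        if c0 >= Q or c1 >= Q:
            return False  # a canonical encoding never contains a value >= Q
    return True
```

An implementation that skips either check can still pass every known-answer test, because KAT inputs are always well formed. A conformance test feeds it a truncated ciphertext or a key containing a coefficient of 3329 or more and expects a rejection.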

The wiring is deliberately small: Sieve speaks a simple stdin/stdout JSON-line protocol, so the same battery runs against an implementation in any language behind a thin shim. When we find a bug we missed, it becomes a new test, so the framework grows sharper over time.
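As a rough illustration of how thin such a shim can be, the Python sketch below reads one JSON object per line, dispatches it, and writes one JSON object per line in reply. The message fields ("id", "op", "status", "detail") and the "ping" operation are hypothetical placeholders chosen for this example; they are not Sieve's actual schema, and a real shim would call the implementation under test inside handle().

```python
import json
import sys


def handle(request: dict) -> dict:
    """Dispatch one request. Stubbed: a real shim calls the library under test."""
    op = request.get("op")  # hypothetical field name
    if op == "ping":  # hypothetical operation
        return {"status": "ok"}
    return {"status": "unsupported"}


def serve(infile=sys.stdin, outfile=sys.stdout) -> None:
    """Read JSON lines from infile and answer each with one JSON line."""
    for line in infile:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between requests
        try:
            request = json.loads(line)
            if not isinstance(request, dict):
                raise ValueError("request must be a JSON object")
            response = handle(request)
            response["id"] = request.get("id")  # echo the id for correlation
        except ValueError:
            # json.JSONDecodeError is a subclass of ValueError
            response = {"id": None, "status": "error", "detail": "malformed request"}
        outfile.write(json.dumps(response) + "\n")
        outfile.flush()  # the harness reads replies line by line
```

Calling serve() with no arguments runs the loop on the process's real stdin and stdout. Flushing after every reply matters in this design, because a harness waiting on a pipe would otherwise block on buffered output.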
