We wrote a realistic Python Kerberos / SPNEGO authentication service, asked Claude Sonnet 4.6 (training cutoff: August 2025) to perform a full security review including dependency CVE checks, then ran Gadriel Code Assurance against the same repository with a live OSV snapshot from 2026-06-27 — 270,336 advisories across 11 ecosystems.
The result is unambiguous and structural: large language models cannot perform reliable Software Composition Analysis. Their knowledge is frozen at a training cutoff. New CVEs published after that cutoff are invisible to them — no matter how thorough the review looks.
“gssapi 1.8.2: No published CVEs for the Python bindings known to me.” — Claude Sonnet 4.6, June 2026. It was wrong!
The scenario
A minimal enterprise Kerberos service — the kind of thing any developer might scaffold with an AI assistant. Pinned dependencies, intentionally not updated:
flask==2.3.0 ← released 2023, contains CVEs from 2023, 2024, and 2026
gssapi==1.8.2 ← Python wrapper around system libgssapi_krb5
krb5==0.7.1 ← Python wrapper around MIT krb5 C library (MGASA-2026-0233)
Werkzeug==2.3.7 ← contains CVE-2023-46136 and CVE-2024-34069
requests==2.31.0
This is realistic: pinned versions like these sit in every production requirements.txt, accumulating vulnerabilities long after the last review.
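To make concrete what checking pins against a live feed involves, here is a minimal sketch that parses the pins above and builds one query per package for the public OSV API (`POST https://api.osv.dev/v1/query`). The function names (`parse_pins`, `osv_payload`, `query_osv`, `scan`) are illustrative assumptions. This is not how Gadriel works internally, only an outline of the kind of lookup a frozen model cannot perform.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"

# The pinned dependencies from the scenario above.
REQUIREMENTS = """\
flask==2.3.0
gssapi==1.8.2
krb5==0.7.1
Werkzeug==2.3.7
requests==2.31.0
"""


def parse_pins(requirements_text):
    """Return (name, version) pairs for every `name==version` line."""
    pins = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line:
            continue  # unpinned or blank lines are skipped in this sketch
        name, version = line.split("==", 1)
        pins.append((name.strip(), version.strip()))
    return pins


def osv_payload(name, version, ecosystem="PyPI"):
    """Build the JSON body for a single OSV package/version query."""
    return {"package": {"name": name, "ecosystem": ecosystem}, "version": version}


def query_osv(name, version, timeout=10):
    """POST one query to OSV and return the advisory IDs it reports."""
    body = json.dumps(osv_payload(name, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    # OSV omits the "vulns" key when nothing matches.
    return [vuln["id"] for vuln in data.get("vulns", [])]


def scan(requirements_text):
    """Map each pinned package to the advisory IDs OSV returns right now."""
    return {
        f"{name}=={version}": query_osv(name, version)
        for name, version in parse_pins(requirements_text)
    }
```

Calling `scan(REQUIREMENTS)` needs network access, and its result depends on the day it is run. That time dependence is the point.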
Step 1 — AI security review
Claude produced a competent review covering code-level issues: a session token generated but never stored (auth bypass), missing TLS enforcement, race conditions in credential init, unbounded Authorization header size, and base64 decoding without validation. It also showed partial knowledge of known CVEs: CVE-2023-30861 in Flask and CVE-2023-46136 in Werkzeug. Ten-plus findings, well-written, with sensible severities.
And then, verbatim:
“krb5 0.7.1: No published CVEs for the Python bindings known to me.”
Claude’s Flask entry listed only CVE-2023-30861. It had zero awareness of CVE-2026-27205 — a 2026 Flask session cookie-leak variant rated 5.0 CVSS that lives in the OSV PyPI shard right now.
This is not a criticism of Claude. It is a structural property of any AI with a training cutoff: no model can know about vulnerabilities that did not exist when it was trained.
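The cutoff argument can be shown with a crude sketch. It assumes the year embedded in a CVE ID is a lower bound on when the advisory could have been published, so any ID with a year after the cutoff year is certainly outside the model's training data. IDs from the cutoff year itself are ambiguous and are treated as possibly known. The helper names are hypothetical.

```python
import re

CVE_ID = re.compile(r"^CVE-(\d{4})-\d{4,}$")

# CVE IDs mentioned in this article.
MENTIONED = [
    "CVE-2023-30861",
    "CVE-2023-46136",
    "CVE-2024-34069",
    "CVE-2026-27205",
    "CVE-2026-40355",
    "CVE-2026-40356",
]


def cve_year(cve_id):
    """Extract the year component of a CVE ID."""
    match = CVE_ID.match(cve_id)
    if match is None:
        raise ValueError(f"not a CVE ID: {cve_id!r}")
    return int(match.group(1))


def split_by_cutoff(cve_ids, cutoff_year):
    """Partition IDs into (possibly_known, certainly_unseen) for a model
    whose training data ends during `cutoff_year`."""
    possibly_known, certainly_unseen = [], []
    for cve_id in cve_ids:
        if cve_year(cve_id) > cutoff_year:
            certainly_unseen.append(cve_id)
        else:
            possibly_known.append(cve_id)
    return possibly_known, certainly_unseen
```

With `cutoff_year=2025`, the three 2026 IDs land in the second list. No amount of prompting moves them into the first.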
Step 2 — Gadriel SCA scan
$ gadriel code policies --osv
✓ OSV shards synced — 270,336 advisories across 11 ecosystems
$ gadriel code scan . --osv-auto-sync=yes
┌─────────────────┬──────────────────────┬──────────┬────────┬─────────────────────┐
│ ID │ Risk │ Severity │ Type │ Finding │
├─────────────────┼──────────────────────┼──────────┼────────┼─────────────────────┤
│ CODE-W1-SCA-056 │ critical (9.5) │ critical │ sca │ flask@2.3.0 │
│ CODE-W3-SCA-060 │ critical (9.5) │ critical │ sca │ flask@2.3.0 │
│ CODE-W1-SCA-003 │ high (8.0) │ high │ sca │ Werkzeug@2.3.7 │
│ CODE-W1-SCA-001 │ medium (5.0) │ medium │ sca │ flask@2.3.0 │
│ CODE-W1-SCA-002 │ medium (5.0) │ medium │ sca │ Werkzeug@2.3.7 │
│ CODE-W1-SCA-004 │ medium (5.0) │ medium │ sca │ Werkzeug@2.3.7 │
│ CODE-W1-L3-040 │ low/unverified │ critical │ sast │ src/auth_service.py │
└─────────────────┴──────────────────────┴──────────┴────────┴─────────────────────┘
Verdict: PARTIAL — 7.16 / 10.0
The finding Claude missed: CVE-2026-27205
{
"id": "CODE-W1-SCA-001",
"severity": "medium",
"scan_type": "sca",
"what_was_tested": {
"title": "Known vulnerability (CVE-2026-27205) in flask 2.3.0",
"ecosystem": "PyPI",
"method": "osv_query"
},
"failure": {
"reason": "CVE-2026-27205: Flask session does not add 'Vary: Cookie' header when accessed in some ways (in PyPI flask — 2.3.0)",
"risk_score": 5.0
}
}
Gadriel also escalated CODE-W1-SCA-056 (CVE-2023-30861) to critical because the affected Flask release was yanked from PyPI, a supply-chain integrity signal beyond the raw CVSS score. It additionally raised CODE-W3-SCA-060, which flags a post-install network call, something Claude does not check at all.
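Findings shaped like the JSON record above are easy to gate on in CI. The sketch below assumes the scanner's findings are available as a JSON list of such objects. That export format, the `failing_findings` helper, and the 5.0 threshold are assumptions for illustration, not documented Gadriel behavior.

```python
import json

# One record, copied from the finding shown above, wrapped in a list.
SAMPLE = json.loads("""
[
  {
    "id": "CODE-W1-SCA-001",
    "severity": "medium",
    "scan_type": "sca",
    "what_was_tested": {
      "title": "Known vulnerability (CVE-2026-27205) in flask 2.3.0",
      "ecosystem": "PyPI",
      "method": "osv_query"
    },
    "failure": {
      "reason": "CVE-2026-27205: Flask session does not add 'Vary: Cookie' header when accessed in some ways",
      "risk_score": 5.0
    }
  }
]
""")


def failing_findings(findings, min_risk=5.0, scan_type="sca"):
    """Return (id, risk_score) pairs at or above `min_risk`, highest first."""
    hits = []
    for finding in findings:
        if finding.get("scan_type") != scan_type:
            continue
        score = finding.get("failure", {}).get("risk_score")
        # Skip records without a numeric score instead of guessing one.
        if isinstance(score, (int, float)) and score >= min_risk:
            hits.append((finding["id"], score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

A CI step could exit non-zero whenever `failing_findings(...)` returns a non-empty list, so a newly published advisory fails the build without anyone having to ask for a review.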
The ecosystem boundary problem
Gadriel’s PyPI scan correctly returns no PyPI advisory for krb5==0.7.1, even though the underlying MIT krb5 C library it wraps is exposed to MGASA-2026-0233 (CVE-2026-40355 / 40356). OSV treats them as separate artifacts in separate ecosystems. The defense is layered scanning: language-level SCA + OS-level SCA against the container image, both consuming the SBOMs Gadriel emits (SPDX 2.3 and CycloneDX 1.5). Claude can do neither.
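The boundary can be sketched with a toy advisory index keyed by (ecosystem, package name). The single entry comes from this article. The `Mageia` ecosystem label is an assumption inferred from the MGASA prefix, and versions are ignored for brevity. A real scanner matches version ranges, and a real image may use a different distribution.

```python
from typing import NamedTuple


class Component(NamedTuple):
    ecosystem: str
    name: str


# Toy index: advisories are filed per ecosystem, not per upstream project.
# "Mageia" is an assumed ecosystem label for MGASA-2026-0233.
ADVISORIES = {
    ("Mageia", "krb5"): ["MGASA-2026-0233"],
}


def lookup(component):
    """Return the advisory IDs filed under this exact ecosystem and name."""
    return ADVISORIES.get((component.ecosystem, component.name), [])


def layered_scan(language_components, os_components):
    """Scan both layers and return only the components with advisories."""
    results = {}
    for component in [*language_components, *os_components]:
        hits = lookup(component)
        if hits:
            results[component] = hits
    return results


python_layer = [Component("PyPI", "krb5")]    # from requirements.txt
image_layer = [Component("Mageia", "krb5")]   # from the container image SBOM
```

Scanning `python_layer` alone returns nothing. Adding `image_layer` surfaces MGASA-2026-0233, which is why both SBOM layers need to be scanned.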
Claude vs. Gadriel — capability matrix
| Capability | Claude (AI review) | Gadriel |
|---|---|---|
| Code-level SAST | Excellent (probabilistic) | Excellent (deterministic) |
| Known CVEs (pre-cutoff) | Partial | Complete |
| CVEs published after Aug 2025 | Blind | Live OSV feed |
| Supply-chain integrity | — | Yes |
| SBOM generation | — | SPDX + CycloneDX |
| Runs at every commit | Manual | Pre-commit + CI |
| Deterministic / reproducible | No | Yes |
The takeaway: do not rely on your AI to security-test itself
Asking your AI coding assistant to review its own output feels safe. It produces a confident, well-formatted report. It cites CVE numbers. It uses the right vocabulary. It looks like security.
It is not. An AI bounded by a training cutoff cannot see the CVEs published yesterday, last month, or last quarter — the exact window where active exploitation lives. Trusting that review is how serious vulnerabilities ship to production: unpatched dependencies, yanked releases, supply-chain compromises, and post-install network calls the model has never heard of. The exploit does not care that the AI sounded sure.
Never rely on an AI coding tool to security-test the code it just wrote. The cutoff is the attacker's window.
Get Gadriel today. It runs natively inside Claude Code, Cursor, Windsurf, Aider, ChatGPT, and Google AI Studio — the same coding surface you already use — and validates every line against a live OSV feed across 11 ecosystems, the full eight pillars, deterministically and reproducibly. Same input, same output. No training cutoff. No guessing.
Generated against Gadriel Code Assurance v1.1.3 · OSV snapshot 270,336 advisories · 11 ecosystems · synced 2026-06-27.

