Your AI Coding Assistant Cannot See This CVE (and 50,000 more identified vulnerabilities)

We wrote a realistic Python Kerberos / SPNEGO authentication service, asked Claude Sonnet 4.6 (training cutoff: August 2025) to perform a full security review including dependency CVE checks, then ran Gadriel Code Assurance against the same repository with a live OSV snapshot from 2026-06-27 — 270,336 advisories across 11 ecosystems.

The result is unambiguous and structural: large language models cannot perform reliable Software Composition Analysis. Their knowledge is frozen at a training cutoff. New CVEs published after that cutoff are invisible to them — no matter how thorough the review looks.

“gssapi 1.8.2: No published CVEs for the Python bindings known to me.” — Claude Sonnet 4.6, June 2026. It was wrong!

The scenario

A minimal enterprise Kerberos service — the kind of thing any developer might scaffold with an AI assistant. Pinned dependencies, intentionally not updated:

flask==2.3.0      ← released 2023, contains CVEs from 2023, 2024, and 2026
gssapi==1.8.2     ← Python wrapper around system libgssapi_krb5
krb5==0.7.1       ← Python wrapper around MIT krb5 C library (MGASA-2026-0233)
Werkzeug==2.3.7   ← contains CVE-2023-46136 and CVE-2024-34069
requests==2.31.0

This is realistic: pinned versions like these sit in every production requirements.txt, accumulating vulnerabilities long after the last review.
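Any dependency check starts by turning those pins into (name, version) pairs. A minimal sketch of that step follows, assuming only exact `==` pins like the ones listed above; `parse_pins` is a hypothetical helper for illustration, not part of Gadriel or pip.

```python
import re

# Matches "name==version" at the start of a line. Anything after the
# version (such as the "←" notes used in this article) is ignored.
PIN_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)==([A-Za-z0-9.!+*_-]+)")


def parse_pins(text):
    """Return {normalized_name: version} for every exact pin in `text`."""
    pins = {}
    for line in text.splitlines():
        match = PIN_RE.match(line)
        if match:
            name, version = match.groups()
            # PEP 503 style normalization: lowercase, runs of -_. become "-".
            pins[re.sub(r"[-_.]+", "-", name).lower()] = version
    return pins


REQUIREMENTS = """\
flask==2.3.0
gssapi==1.8.2
krb5==0.7.1
Werkzeug==2.3.7
requests==2.31.0
"""

pins = parse_pins(REQUIREMENTS)
```

Normalizing names matters because advisory databases and package indexes do not always agree on case (`Werkzeug` vs `werkzeug`).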

Step 1 — AI security review

Claude produced a competent review covering code-level issues: a session token generated but never stored (auth bypass), missing TLS enforcement, race conditions in credential init, unbounded Authorization header size, and base64 decoding without validation. It also showed partial knowledge of dependency CVEs: CVE-2023-30861 in Flask and CVE-2023-46136 in Werkzeug. Ten-plus findings, well-written, with sensible severities.

And then, verbatim:

“krb5 0.7.1: No published CVEs for the Python bindings known to me.”

Claude’s Flask entry listed only CVE-2023-30861. It had zero awareness of CVE-2026-27205 — a 2026 Flask session cookie-leak variant rated 5.0 CVSS that lives in the OSV PyPI shard right now.

This is not a criticism of Claude. It is a structural property of any AI with a training cutoff: no model can know about vulnerabilities that did not exist when it was trained.
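The cutoff argument reduces to a date comparison. The sketch below is illustrative only: the advisory publication dates are placeholders chosen for the example (they are not taken from OSV), and the end-of-month cutoff day is an assumption, since the article only states "August 2025".

```python
from datetime import date

# "August 2025" comes from the article; the exact day is an assumption.
TRAINING_CUTOFF = date(2025, 8, 31)

# Placeholder records: the published dates are illustrative, not OSV data.
ADVISORIES = [
    {"id": "CVE-2023-30861", "published": date(2023, 5, 2)},
    {"id": "CVE-2026-27205", "published": date(2026, 2, 1)},
]


def invisible_to_model(advisories, cutoff):
    """Return IDs of advisories published after the training cutoff."""
    return [a["id"] for a in advisories if a["published"] > cutoff]


unseen = invisible_to_model(ADVISORIES, TRAINING_CUTOFF)
```

Whatever the real dates are, any advisory published after the cutoff lands in `unseen`, and no amount of prompting moves it out.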

Step 2 — Gadriel SCA scan

$ gadriel code policies --osv
✓ OSV shards synced — 270,336 advisories across 11 ecosystems

$ gadriel code scan . --osv-auto-sync=yes

┌─────────────────┬──────────────────────┬──────────┬────────┬─────────────────────┐
│ ID              │ Risk                 │ Severity │ Type   │ Finding             │
├─────────────────┼──────────────────────┼──────────┼────────┼─────────────────────┤
│ CODE-W1-SCA-056 │ critical (9.5)       │ critical │ sca    │ flask@2.3.0         │
│ CODE-W3-SCA-060 │ critical (9.5)       │ critical │ sca    │ flask@2.3.0         │
│ CODE-W1-SCA-003 │ high (8.0)           │ high     │ sca    │ Werkzeug@2.3.7      │
│ CODE-W1-SCA-001 │ medium (5.0)         │ medium   │ sca    │ flask@2.3.0         │
│ CODE-W1-SCA-002 │ medium (5.0)         │ medium   │ sca    │ Werkzeug@2.3.7      │
│ CODE-W1-SCA-004 │ medium (5.0)         │ medium   │ sca    │ Werkzeug@2.3.7      │
│ CODE-W1-L3-040  │ low/unverified       │ critical │ sast   │ src/auth_service.py │
└─────────────────┴──────────────────────┴──────────┴────────┴─────────────────────┘
Verdict: PARTIAL — 7.16 / 10.0
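For readers who want to see what a live lookup involves, here is a minimal sketch against the public OSV query endpoint (`POST https://api.osv.dev/v1/query`). It is not Gadriel's implementation, which works from synced shards as shown above; the helper names are hypothetical, and the network call is left for the reader to run.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name, version, ecosystem="PyPI"):
    """Build the JSON body OSV expects for a package/version lookup."""
    return {"package": {"name": name, "ecosystem": ecosystem}, "version": version}


def query_osv(name, version, ecosystem="PyPI", timeout=10):
    """Return the list of OSV records affecting name@version (needs network)."""
    body = json.dumps(build_osv_query(name, version, ecosystem)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OSV omits the "vulns" key when nothing matches.
    return payload.get("vulns", [])


def advisory_ids(vulns):
    """Collect primary IDs and aliases (e.g. CVE numbers) from OSV records."""
    ids = set()
    for vuln in vulns:
        ids.add(vuln["id"])
        ids.update(vuln.get("aliases", []))
    return sorted(ids)
```

A call such as `advisory_ids(query_osv("flask", "2.3.0"))` returns whatever OSV lists on the day it runs, which is the property a model with a fixed cutoff cannot have.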

The finding Claude missed: CVE-2026-27205

{
  "id": "CODE-W1-SCA-001",
  "severity": "medium",
  "scan_type": "sca",
  "what_was_tested": {
    "title": "Known vulnerability (CVE-2026-27205) in flask 2.3.0",
    "ecosystem": "PyPI",
    "method": "osv_query"
  },
  "failure": {
    "reason": "CVE-2026-27205: Flask session does not add 'Vary: Cookie' header when accessed in some ways (in PyPI flask — 2.3.0)",
    "risk_score": 5.0
  }
}
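Because the finding is structured JSON, it can be consumed by a script rather than read by eye. The sketch below assumes the field names shown in the excerpt above hold for every finding; `cve_ids`, `should_block`, and the threshold values are hypothetical and not part of Gadriel.

```python
import json
import re

# Trimmed copy of the finding shown above.
FINDING_JSON = """
{
  "id": "CODE-W1-SCA-001",
  "severity": "medium",
  "scan_type": "sca",
  "what_was_tested": {
    "title": "Known vulnerability (CVE-2026-27205) in flask 2.3.0",
    "ecosystem": "PyPI",
    "method": "osv_query"
  },
  "failure": {"risk_score": 5.0}
}
"""

CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}")


def cve_ids(finding):
    """Extract CVE identifiers mentioned in the finding title."""
    return CVE_RE.findall(finding["what_was_tested"]["title"])


def should_block(finding, threshold):
    """Hypothetical CI gate: fail when the risk score reaches the threshold."""
    return finding["failure"]["risk_score"] >= threshold


finding = json.loads(FINDING_JSON)
```

A gate like this gives the same answer for the same input on every run.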

Gadriel also escalated CODE-W1-SCA-056 (CVE-2023-30861) to critical because the affected Flask release was yanked from PyPI, a supply-chain integrity signal beyond the raw CVSS score. It additionally raised CODE-W3-SCA-060, which flags a post-install network call, something Claude does not check at all.
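Yank status is public metadata: the PyPI JSON API (`https://pypi.org/pypi/<name>/<version>/json`) lists each release file with a `yanked` flag and a `yanked_reason`. The sketch below shows one way to read it. How Gadriel itself detects yanked releases is not described here, so treat this as an independent illustration with hypothetical helper names and made-up sample data.

```python
import json
import urllib.request


def fetch_release(name, version, timeout=10):
    """Download PyPI's JSON metadata for one release (needs network)."""
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def yanked_files(release_json):
    """Return (filename, reason) for every file marked as yanked."""
    return [
        (f["filename"], f.get("yanked_reason"))
        for f in release_json.get("urls", [])
        if f.get("yanked")
    ]


# Made-up metadata in the shape PyPI returns, for illustration only.
SAMPLE = {
    "urls": [
        {"filename": "example-1.0.tar.gz", "yanked": True, "yanked_reason": "broken"},
        {"filename": "example-1.0-py3-none-any.whl", "yanked": False, "yanked_reason": None},
    ]
}

flagged = yanked_files(SAMPLE)
```

A non-empty result is a signal worth raising regardless of CVSS, since a yank means the maintainers themselves withdrew the release.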

The ecosystem boundary problem

Gadriel’s PyPI scan correctly returns no PyPI advisory for krb5==0.7.1, even though the underlying MIT krb5 C library it wraps is exposed to MGASA-2026-0233 (CVE-2026-40355 / 40356). OSV treats them as separate artifacts in separate ecosystems. The defense is layered scanning: language-level SCA + OS-level SCA against the container image, both consuming the SBOMs Gadriel emits (SPDX 2.3 and CycloneDX 1.5). Claude can do neither.
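One way to picture the layering is to route SBOM components by package URL (purl) type: `pkg:pypi/...` entries go to language-level SCA, while `pkg:rpm/...`, `pkg:deb/...`, and `pkg:apk/...` entries go to OS-level SCA. The sketch below assumes a CycloneDX JSON document with a `components` array carrying `purl` fields; the sample SBOM, including the OS package version, is hypothetical.

```python
# purl types treated as operating-system packages in this sketch.
OS_PURL_TYPES = {"deb", "rpm", "apk"}


def purl_type(purl):
    """Return the type segment of a purl: 'pkg:pypi/flask@2.3.0' -> 'pypi'."""
    if not purl.startswith("pkg:"):
        return None
    return purl[4:].split("/", 1)[0].lower()


def split_components(sbom):
    """Split CycloneDX components into (language_purls, os_purls)."""
    language, system = [], []
    for component in sbom.get("components", []):
        kind = purl_type(component.get("purl", ""))
        if kind is None:
            continue  # no usable purl, nothing to route
        (system if kind in OS_PURL_TYPES else language).append(component["purl"])
    return language, system


# Hypothetical SBOM fragment: same project name, two different artifacts.
SBOM = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"name": "krb5", "version": "0.7.1", "purl": "pkg:pypi/krb5@0.7.1"},
        {"name": "krb5", "version": "1.0-1", "purl": "pkg:rpm/mageia/krb5@1.0-1"},
    ],
}

language_purls, os_purls = split_components(SBOM)
```

The two `krb5` entries share a name but not an ecosystem, which is why an advisory against the C library never appears in a PyPI-only query.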

Claude vs. Gadriel — capability matrix

Capability	Claude (AI review)	Gadriel
Code-level SAST	Excellent (probabilistic)	Excellent (deterministic)
Known CVEs (pre-cutoff)	Partial	Complete
CVEs published after Aug 2025	Blind	Live OSV feed
Supply-chain integrity	—	Yes
SBOM generation	—	SPDX + CycloneDX
Runs at every commit	Manual	Pre-commit + CI
Deterministic / reproducible	No	Yes

The takeaway: do not rely on your AI to security-test itself

Asking your AI coding assistant to review its own output feels safe. It produces a confident, well-formatted report. It cites CVE numbers. It uses the right vocabulary. It looks like security.

It is not. An AI bounded by a training cutoff cannot see the CVEs published yesterday, last month, or last quarter — the exact window where active exploitation lives. Trusting that review is how serious vulnerabilities ship to production: unpatched dependencies, yanked releases, supply-chain compromises, and post-install network calls the model has never heard of. The exploit does not care that the AI sounded sure.

Never rely on an AI coding tool to security-test the code it just wrote. The cutoff is the attacker's window.

Get Gadriel today. It runs natively inside Claude Code, Cursor, Windsurf, Aider, ChatGPT, and Google AI Studio — the same coding surface you already use — and validates every line against a live OSV feed across 11 ecosystems, the full eight pillars, deterministically and reproducibly. Same input, same output. No training cutoff. No guessing.