AI CODE ASSURANCE · PART 2 · 2026-06-27

    Anthropic Skills Couldn't Find the RCE. Gadriel Did.

    We loaded 817 Anthropic cybersecurity skills into Claude Sonnet 4.6 to hunt CVEs in a Flask + Kerberos stack. Skills improved structure but never surfaced CVE-2026-40355 — an unauthenticated krb5 RCE. Only Gadriel found it.

    After showing that vanilla Claude Sonnet 4.6 and Opus 4.8 both missed CVE-2026-27205 (Flask 2026 session fixation), we asked the obvious follow-up: does loading a comprehensive cybersecurity skills library into Claude change the result? We installed the mukul975/Anthropic-Cybersecurity-Skills pack — 817 skills, v1.3.0 — and re-ran the same supply-chain test.

    Short answer: skills do not add live CVE data. They teach methodology. The gap closed only when the recommended tools were actually executed — not when the skills were loaded.

    Setup

    npx skills add mukul975/Anthropic-Cybersecurity-Skills
    # → 817 skills installed across .claude/skills/ and .agents/skills/

    Target dependencies (intentionally vulnerable):

    flask==2.3.0       ← CVE-2026-27205 (session fixation, CVSS 5.0) — 2026
    krb5==0.7.1        ← MGASA-2026-0233 / CVE-2026-40355 (unauthenticated RCE)
    gssapi==1.8.2      ← same advisory, gss_accept_sec_context() vulnerable
    Werkzeug==2.3.7    ← CVE-2024-34069 (debugger RCE), CVE-2023-46136
    requests==2.31.0   ← CVE-2024-35195, CVE-2026-25645

    Skills activated for this test:

    • analyzing-sbom-for-supply-chain-vulnerabilities — parse SBOM, query NVD 2.0, grype cross-validation
    • hunting-for-supply-chain-compromise — threat hunting for compromised PyPI/npm packages
    • performing-sca-dependency-scanning-with-snyk — pip-audit, snyk, OWASP Dependency-Check
    • detecting-golden-ticket-attacks-in-kerberos-logs — Kerberos TGT anomaly detection

    Test 1 — Skills-armed static analysis (no tool execution)

    Claude with skills loaded, asked to identify all CVEs using only its built-in knowledge:

    Flask 2.3.0:
      ✓ CVE-2023-30861 (CVSS 7.5)            [2023, pre-cutoff]
      ✗ CVE-2026-27205 (CVSS 5.0)            [2026 — NOT FOUND]
    
    Werkzeug 2.3.7:
      ✓ CVE-2023-46136, CVE-2024-34069       [pre-cutoff]
      ✗ CVE-2026-21860, CVE-2026-27199       [2026 — NOT FOUND]
    
    requests 2.31.0:
      ✓ CVE-2024-35195
      ✗ CVE-2026-25645                       [2026 — NOT FOUND]
    
    krb5 0.7.1 (Python bindings):
      ✓ CVE-2022-42898, CVE-2023-36054       [C library, pre-cutoff]
      ✗ CVE-2026-40355, CVE-2026-40356       [2026 — NOT FOUND]

    Skills improved the structure of the analysis: explicit MITRE ATT&CK technique IDs (T1195.001), system-level krb5 C library CVEs, dependency-graph risk scoring, and a prominent training-cutoff warning. They did not change coverage. Every CVE disclosed after the August 2025 cutoff remained invisible.

    “My training cutoff is August 2025. Today is June 27, 2026. Approximately 10 months of CVE disclosures are completely invisible to this analysis.” — Claude, with 817 skills loaded

    Test 2 — NVD API query (the skill's Step 3)

    The SBOM skill prescribes querying NVD 2.0 for each component. We executed it exactly. Results from live keyword search on 2026-06-27:

    Query: "flask 2.3.0"      → 0 CVEs
    Query: "werkzeug 2.3.7"   → 0 CVEs
    Query: "requests 2.31.0"  → 1 CVE (CVE-2023-32681) ← FALSE POSITIVE (fixed in 2.31.0)
    Query: "krb5 0.7.1"       → 0 CVEs
    Query: "gssapi 1.8.2"     → 0 CVEs

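    CVE-2026-27205 was not found. Root cause: NVD keyword search matches words in CVE descriptions, not CPE strings, so a name-plus-version query only hits when the description happens to contain those words. Our SBOM had PURLs but no CPE identifiers, so the more precise CPE query was not available. The skill correctly notes that CPE queries are more precise, but methodology documentation doesn't guarantee correct execution when the inputs are incomplete.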
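
    The difference between the two query styles can be sketched as follows. This is a minimal illustration, not the skill's own code: it only builds request URLs for the NVD CVE API 2.0 using its keywordSearch and cpeName parameters. The vendor and product strings in the example (palletsprojects, flask) are assumptions that should be checked against the NVD CPE dictionary before use.

```python
from urllib.parse import urlencode

NVD_CVE_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def keyword_query_url(name: str, version: str) -> str:
    """Free-text search: matches words in CVE descriptions, not CPE data."""
    params = {"keywordSearch": f"{name} {version}"}
    return f"{NVD_CVE_API}?{urlencode(params)}"


def cpe_query_url(vendor: str, product: str, version: str) -> str:
    """CPE search: asks which CVEs list this exact product version as affected."""
    # CPE 2.3 formatted string: part "a" (application), then vendor, product,
    # version, and seven remaining attributes left as wildcards.
    cpe_name = f"cpe:2.3:a:{vendor}:{product}:{version}:*:*:*:*:*:*:*"
    return f"{NVD_CVE_API}?{urlencode({'cpeName': cpe_name})}"


if __name__ == "__main__":
    # What Test 2 effectively sent for Flask.
    print(keyword_query_url("flask", "2.3.0"))
    # What a CPE-aware SBOM would have allowed (vendor name is an assumption).
    print(cpe_query_url("palletsprojects", "flask", "2.3.0"))
```

    The second URL can only be built when the SBOM carries a vendor and product for each component, which is exactly the information a PURL-only SBOM lacks.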

    Test 3 — pip-audit (the skill-recommended tool)

    $ pip-audit -r /tmp/auditable-reqs.txt --no-deps
    
    flask 2.3.0:     ✓ CVE-2023-30861, ✓ CVE-2026-27205     ← 2026 FOUND
    Werkzeug 2.3.7:  ✓ 6 CVEs including CVE-2026-21860, CVE-2026-27199
    requests 2.31.0: ✓ CVE-2024-35195, CVE-2024-47081, CVE-2026-25645
    
    krb5 0.7.1:   ✗ FAILED — cannot build (no libkrb5-dev C headers)
    gssapi 1.8.2: ✗ FAILED — same

    pip-audit found CVE-2026-27205 because it queries a live advisory service at execution time (the PyPI advisory feed by default, or OSV when run with -s osv). The skill pointed to the right tool, the tool ran, and the live database had the 2026 data. But pip-audit could not audit krb5 or gssapi at all (missing C headers), and CVE-2026-40355, the most dangerous finding (an unauthenticated remote crash), was missed by every skill-driven path.
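
    A live advisory lookup of the kind described above can be sketched against OSV's public query endpoint. This is not pip-audit's internal code, just a minimal illustration of why execution time matters: the answer comes from the database as it stands when the request is made. The helper names are hypothetical, and the sketch ignores response pagination.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def osv_query_payload(name: str, version: str, ecosystem: str = "PyPI") -> dict:
    """Build the JSON body for a single package-and-version OSV query."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def fetch_vuln_ids(name: str, version: str) -> list[str]:
    """POST the query and return advisory IDs (performs a network request)."""
    body = json.dumps(osv_query_payload(name, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        result = json.load(response)
    # OSV returns an empty object when nothing matches, so default to [].
    return [vuln["id"] for vuln in result.get("vulns", [])]
```

    Calling fetch_vuln_ids("flask", "2.3.0") returns whatever advisories the service holds on the day it runs, which is the property a model's static knowledge cannot provide.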

    Complete results: 4-way comparison

    CVE                        Year  Sonnet / Opus  + 817 Skills (static)  NVD API (skill step)  pip-audit  Gadriel
    CVE-2023-30861             2023
    CVE-2024-34069             2024  partial
    CVE-2024-47081             2024
    CVE-2026-27205 (Flask)     2026
    CVE-2026-21860 / 27199     2026
    CVE-2026-25645 (requests)  2026
    CVE-2026-40355 (krb5 RCE)  2026                                                              FAILED
    CVE-2026-40356 (krb5 DoS)  2026                                                              FAILED

    What this proves

    1. Skills teach methodology, not CVE data

    Loading 817 cybersecurity skills did not add a single post-cutoff CVE to what Claude could find. Skills are knowledge about how to perform security tasks — which tools to run, which APIs to query, which patterns to look for. They are not a substitute for the live databases those tools query. A medical textbook teaches a doctor how to diagnose pneumonia; it doesn't replace today's blood test.

    2. Skills improved rigor, not coverage

    With skills loaded, Claude's report was visibly better structured: MITRE ATT&CK mappings, system-level CVEs, explicit risk scoring. The post-cutoff blind spot was identical. Skills added rigor inside the training window; they did not extend it.

    3. The skill-recommended tools require execution — and still miss native libraries

    When pip-audit was actually run, it closed the Flask / Werkzeug / requests gap. But executing a skill-recommended tool requires: installing it, running it, parsing output, acting on findings — every time, for every dependency change. And pip-audit still failed on krb5 and gssapi. CVE-2026-40355 — unauthenticated remote crash via NegoEx packet — was found by neither the NVD workflow nor pip-audit. Gadriel found it via OSV cross-ecosystem correlation against the SBOM's PURL data.
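
    One way to picture cross-ecosystem correlation from PURL data is sketched below. It is an illustration only, not Gadriel's implementation. NATIVE_LIBRARY_HINTS and DISTRO_ECOSYSTEMS are hypothetical, and the ecosystem names are assumptions to verify against OSV's ecosystem list. The idea is that a Python binding such as gssapi is looked up under PyPI and also under the system library it wraps, so no local build or C headers are needed.

```python
from dataclasses import dataclass

# Hypothetical mapping from a PyPI binding to the native library it wraps.
NATIVE_LIBRARY_HINTS = {
    "krb5": ["krb5"],
    "gssapi": ["krb5"],
}

# Assumed OSV ecosystem names for distro-level advisories.
DISTRO_ECOSYSTEMS = ["Debian", "Mageia"]


@dataclass(frozen=True)
class Purl:
    type: str
    name: str
    version: str


def parse_purl(purl: str) -> Purl:
    """Parse the simple form pkg:<type>/<name>@<version> (no namespace or qualifiers)."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    type_and_name, _, version = purl[len("pkg:"):].partition("@")
    pkg_type, _, name = type_and_name.partition("/")
    if not (pkg_type and name and version):
        raise ValueError(f"unsupported PURL form: {purl!r}")
    return Purl(pkg_type.lower(), name.lower(), version)


def correlated_queries(purl: str) -> list[dict]:
    """Expand one SBOM PURL into advisory lookups across ecosystems."""
    parsed = parse_purl(purl)
    queries = []
    if parsed.type == "pypi":
        queries.append(
            {"version": parsed.version,
             "package": {"name": parsed.name, "ecosystem": "PyPI"}}
        )
    # The binding's version says nothing about the C library's version, so the
    # native lookups are by package name only (check how the advisory service
    # treats version-less queries before relying on this).
    for native_name in NATIVE_LIBRARY_HINTS.get(parsed.name, []):
        for ecosystem in DISTRO_ECOSYSTEMS:
            queries.append({"package": {"name": native_name, "ecosystem": ecosystem}})
    return queries
```

    For pkg:pypi/gssapi@1.8.2 this yields one PyPI lookup plus one krb5 lookup per distro ecosystem, while pkg:pypi/flask@2.3.0 yields only the PyPI lookup.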

    Gadriel vs. skills-armed Claude

    Step                        Skills-armed Claude            Gadriel
    Load skills / rules         817 skills (manual install)    Built-in
    Identify tools to run       Recommended pip-audit, grype   Built-in
    Run SCA tool                Manual execution               Automatic
    Handle native-library gap   Not solved                     OSV cross-ecosystem
    Generate SBOM               Not generated                  SPDX 2.3 + CycloneDX 1.5
    Pre-commit gate             No                             Yes (blocks on critical)
    CVE-2026-27205 found        Only via manual pip-audit      Automatic
    CVE-2026-40355 found        No                             Yes

    The takeaway

    The value of methodology skills is real, but bounded by execution. Skills correctly told Claude to run pip-audit. pip-audit found CVE-2026-27205. But every step from "skill suggests pip-audit" to "CVE blocked at the CI gate" requires a human in the loop — and still leaves native libraries uncovered.
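
    The last step in that chain, turning findings into a blocked commit, can be sketched as a small gate function. This is a generic illustration under stated assumptions, not Gadriel's gate: the finding records and their "cvss" field are a hypothetical format, and the 9.0 threshold is the start of the CVSS v3.x Critical band.

```python
import sys

CRITICAL_CVSS = 9.0  # CVSS v3.x rates scores of 9.0 and above as Critical


def blocking_findings(findings: list[dict], threshold: float = CRITICAL_CVSS) -> list[dict]:
    """Return the findings whose CVSS score meets or exceeds the threshold."""
    return [
        finding for finding in findings
        if finding.get("cvss") is not None and finding["cvss"] >= threshold
    ]


def run_gate(findings: list[dict], threshold: float = CRITICAL_CVSS) -> int:
    """Report blocking findings on stderr and return a hook exit code.

    A git hook or CI step that exits non-zero stops the commit or the build.
    """
    blocked = blocking_findings(findings, threshold)
    for finding in blocked:
        print(f"BLOCKED: {finding['id']} (CVSS {finding['cvss']})", file=sys.stderr)
    return 1 if blocked else 0
```

    A hook script would end with sys.exit(run_gate(findings)). With only the two Flask scores quoted in this article (7.5 and 5.0) the gate passes at the default threshold, so the threshold is a policy choice rather than a fixed rule.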

    Skills tell you to run the scanner. Gadriel is the scanner that runs itself.

    Gadriel closes the loop automatically. It is not that Gadriel has better skills — it is that Gadriel is the tool the skills were pointing toward, executing deterministically, across ecosystems, every commit, with no manual install and no missing C headers.

    Skills library: github.com/mukul975/Anthropic-Cybersecurity-Skills (817 skills, v1.3.0) · Gadriel Code Assurance v1.1.3 · OSV snapshot 270,336 advisories · 11 ecosystems · synced 2026-06-27.