Demos, teardowns, and primary research on what AI writes, what AI does, and where assurance tooling earns its keep.

We asked Claude Sonnet 4.6 and Opus 4.8 to security-review a Python Kerberos service. Both models missed a 2026 Flask CVE that Gadriel caught from a live OSV feed. The structural reason matters.

We loaded 817 Anthropic cybersecurity skills into Claude Sonnet 4.6 to hunt CVEs in a Flask + Kerberos stack. The skills made the output better structured but never surfaced CVE-2026-40355, an unauthenticated krb5 RCE. Only Gadriel found it.