
Red Teams, Pen Tests, Tabletops, and Vulnerability Scans: Where Persistent Purple Teaming Fits

Persistent Purple Teaming fills the last 10% gap left behind by annual pen tests, tabletops, vulnerability scans, and scripted validation tools.

March 2, 2026 · 9 min read · tooling • telemetry • validation

We run the pen test. We do the tabletop. We have the vulnerability scanner running. The SIEM is correlating. The MSSP is on overnight shift. By every visible measure, the security program is working.

And then a developer spins up an Azure instance without MFA. Novel malware hits a legacy Linux box that does not have EDR coverage. An obfuscated attack chain unfolds across a part of the network that has never been tested. The program that looked fine is not fine, because nobody ever tested the last 10%.

For the full picture of what Persistent Purple Teaming is and how it works, read Persistent Purple Teaming Explained: Why Continuous Validation Changes Everything.

What Is Actually Wrong with Annual Pen Testing?

The first pen test is almost always genuinely eye-opening. A qualified tester moves laterally through the environment, escalates privileges, reaches the core of the tech stack, and reveals things that would never have surfaced through a vulnerability scan or a compliance audit. That is valuable. The problem is what happens after that.

Our co-founder Matt Stewart has a clear-eyed view of the pen test cycle after years of incident response. If you never rotate the pen testing vendor, you get the same tactics year after year, and the test becomes a benchmark of what you already know rather than a probe of what you do not. If you do rotate, quality becomes unpredictable: every company that has run eight pen tests has had a couple of bad ones. Either way, you are measuring breadth and calling it security validation. What never gets tested is the last 10%: the obfuscated techniques, the zero-day logic, the novel attack chains that are actually being used in the wild right now.

Sean Martin, co-founder of ITSPmagazine, put it directly during our conversation: organizations can show improving scores, faster detection times, better team coordination, and still have a 10-20% blind spot that nobody has ever actually looked at. That blind spot is the one that matters most when a motivated adversary shows up.

Why Do Tabletops and Breach and Attack Simulation Tools Have the Same Problem?

Tabletop exercises are eye-opening the first time for the same reason pen tests are: they surface communication failures, decision-making gaps, and the disconnect between engineering, management, and leadership that everyone suspected but nobody had documented. After a few cycles, they become familiar. Teams rehearse the known scenarios more confidently. The unknown scenarios, the ones outside the neat package of what is planned, remain untested.

Breach and attack simulation platforms have a different version of the same limitation. They can run scripted attack simulations at scale and surface detection gaps against known patterns. What they cannot do is improvise, chain techniques creatively, adapt to the specific environment, or simulate the kind of adaptive attack chain that a real adversary actually uses. They tell you whether known techniques are logged. They cannot tell you whether an attacker who deviates from the script would be caught.

How Does AI Change the Risk Equation for Security Programs?

Matt shared an observation from a developer colleague that applies directly to security: AI makes the best operators faster and better, the worst operators faster and worse, and everyone faster. In security operations, that means a program with solid foundations, well-tuned detections, and validated coverage can accelerate meaningfully as AI automates the routine work. A program with gaps (untuned rules, unvalidated assumptions, coverage that has never been depth-tested) will see those gaps expand faster than they could before AI entered the picture.

The implication is that the cost of not doing depth testing is rising, not staying flat. Every year that annual testing remains the standard, the gap between what a program appears to cover and what it would actually catch under a real, adaptive attack grows a little wider.

What Does the Last 10% Look Like in a Real Environment?

After 23 years in forensics and incident response, Matt has walked into enough breached environments to describe what the last 10% actually looks like in practice. A developer spins up an Azure instance without two-factor authentication; it is not connected to the main monitoring stack, and it creates a lateral movement pathway nobody anticipated. Novel malware arrives that is obfuscated enough to bypass EDR, or that hits a legacy Linux box, an IoT device, or an OT environment with no EDR coverage at all. Old software versions run on equipment that was never decommissioned. Systems log activity but are not ingested into the SIEM. Configurations were set during initial deployment and never validated since.

None of these are hypothetical. They are what we find, consistently, in environments that have solid security programs by every conventional measure. The EDRs are firing. The MSSP is responding to critical alerts. The dashboard is green. And there are still pathways the adversary can walk through that nobody has ever tested.
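Some of these gaps can be surfaced mechanically before an adversary finds them. The sketch below cross-references an asset inventory against the SIEM's log sources to flag hosts that lack EDR, are not shipping logs, or both. It is a minimal illustration with invented data: `asset_inventory` and `siem_log_sources` are hypothetical stand-ins for a CMDB export and the SIEM's log-source list, not real APIs.

```python
# Hypothetical data: in practice these would come from a CMDB export
# and the SIEM's list of active log sources.
asset_inventory = {
    "web-01": {"os": "linux", "edr": True},
    "legacy-db": {"os": "linux", "edr": False},      # legacy box, no EDR agent
    "iot-cam-7": {"os": "embedded", "edr": False},   # IoT device, unmanaged
    "dev-azure-3": {"os": "windows", "edr": True},   # spun up outside the pipeline
}

siem_log_sources = {"web-01", "dev-azure-3"}  # hosts actually shipping logs

def coverage_gaps(inventory, log_sources):
    """Return hosts missing EDR, SIEM ingestion, or both."""
    gaps = {}
    for host, attrs in inventory.items():
        missing = []
        if not attrs["edr"]:
            missing.append("edr")
        if host not in log_sources:
            missing.append("siem")
        if missing:
            gaps[host] = missing
    return gaps

print(coverage_gaps(asset_inventory, siem_log_sources))
# flags legacy-db and iot-cam-7 as missing both EDR and SIEM ingestion
```

A check like this only tells you a gap exists; whether an attacker could actually walk through it is what the depth testing is for.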

What Does Persistent Purple Teaming Add That Other Testing Methods Cannot?

The persistent part is, as Matt puts it, where the rubber meets the road: the most important part and the hardest. It is hard because it requires ongoing commitment rather than an annual event. It is important because the threat landscape does not take annual breaks. The developer who spins up an unsecured cloud instance does not wait for your testing cycle. The adversary with a new technique does not wait until after your pen test report is filed.

Monthly persistent purple teaming means engineers see misconfigurations live and fix them live. Analysts build real, specific knowledge of how attacks unfold in their environment, not rehearsed playbooks, but actual tested experience. The overall architecture improves continuously because depth testing drives it. And the human intelligence layer, the experienced operators who have seen this malware, this configuration failure, this attack chain in real engagements, is what makes the automation and the tooling around it actually work.

It is also, Matt notes, what makes the work worth doing again. When analysts know they can detect lateral movement because it was tested last month and the fix held, that is not checkbox security. That is confidence built on evidence. And that changes how the whole team shows up.

Watch the full Brand Story and let us know if you want to connect; we are always open to a conversation with security leaders who are asking the right questions.

See The Operating Model

Understand how Persistent Purple Teaming turns insight into proof.

Review the operating model behind continuous validation before you commit to an assessment.