SPEAKER NOTES

Stable-Edge Filtering
KPMG · Cyber Tech Resilience

9 slides + cold-open · about 6–7 minutes. Swipe up to move through your notes in sync with the deck.

→ / Space advance & reveal · ← back
1–9 jump · N on-screen notes · F fullscreen
From the last slide, → loops to the cold-open.

BEFORE YOU START

COLD-OPEN · ATTRACT SCREEN

Which connections can you safely throw away?

▸ press F for fullscreen · let it sit · → to begin

Leave the animated graph up while people settle. Press F to go fullscreen; the animation keeps running. Open with the question on screen, then press → to begin.

SLIDE 1

MSc THESIS · UvA SNE × KPMG CYBER

The clean-up step that broke the classifier

▸ no reveals · press → for next

I tried an obvious, cheap trick to make passive OT asset classification more robust on messy client traffic. It doesn't help, and in one realistic situation it actively breaks classification, in a way that would hit any classifier, not just mine. So this is less a method pitch than a warning about a preprocessing step that looks free and isn't. I'll keep the machine learning to a minimum and stay on what it means for the work we actually deliver. About six to seven minutes.

SLIDE 2

THE PROBLEM

We classify what we can't scan

▸ 2 reveals as you talk · then → for next

Here's why this is our problem, not just an academic one. On a live plant you can't run an active scan without risking the process, so we infer device roles from captured traffic; that's a deliverable we ship. The catch: research benchmarks these classifiers on clean, one-hour testbed captures, and real client traffic never looks like that. Engineering laptops come and go, vulnerability scanners sweep the network, equipment gets serviced. So, two honest questions: does the classifier still work when the network is changing, and does a simple clean-up step before classifying help it stay robust? That second question is what I tested.

SLIDE 3

THE IDEA + THE LAB

Keep only the connections that last

▸ 2 reveals as you talk · then → for next

The idea is almost too obvious to argue with. We build a graph of who talks to whom, and the model classifies devices from that pattern of connections. Before classifying, we throw away edges that don't persist across time windows; call it stable-edge filtering. The intuition: a transient connection, like a one-off engineering session or a scan sweep, is just noise, so clean it out and the picture should get sharper. To test it credibly I built a real OT lab: twenty hosts across five device classes, one passive capture point on a shared segment, and four scripted operational-change scenarios with ground truth. Crucially, I evaluated on held-out hosts the model never trained on, averaged over ten lab seeds times ten model seeds, and I've released the code and scenarios. This builds on Heo and Shin, who worked only on clean traces.
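For anyone who wants the filter spelled out, here is a minimal Python sketch. It assumes the capture is already cut into time windows, each reduced to a set of (source, destination) host pairs, and that "persists" means "appears in at least a fraction `min_fraction` of the windows". Both the window representation and the 0.8 default are illustrative assumptions, not the settings used in the thesis.

```python
from collections import Counter

Edge = tuple[str, str]  # (source host, destination host)

def stable_edges(windows: list[set[Edge]], min_fraction: float = 0.8) -> set[Edge]:
    """Keep edges seen in at least `min_fraction` of the time windows.

    `min_fraction` is an illustrative threshold, not the thesis setting.
    """
    if not windows:
        return set()
    # Each window is a set, so this counts the number of windows an edge appears in.
    counts = Counter(edge for window in windows for edge in window)
    needed = min_fraction * len(windows)
    return {edge for edge, n in counts.items() if n >= needed}

windows = [
    {("hmi1", "plc1")},
    {("hmi1", "plc1"), ("eng1", "plc1")},  # one-off engineering session
    {("hmi1", "plc1")},
]
print(stable_edges(windows))  # {('hmi1', 'plc1')}
```

Here hmi1 polls plc1 in every window and eng1 shows up once, so only the HMI poll survives: exactly the clean-up the intuition promises.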

SLIDE 4

BUILDING THE LAB · THE FACTORY

What the devices actually do

▸ 1 reveal as you talk · then → for next

One picture to make the lab concrete, because the whole point is that the connections mean something. The twenty containers play a small bottling plant: four controllers, each driving one production stage (the storage tank, the filler, the capper and the labeller). In the control room, supervisory stations watch the line; engineering workstations configure the controllers; historians in the server room log every reading; and IT gateways bridge to the enterprise. So when we build the graph of who talks to whom, it isn't random: it mirrors how a plant actually runs, and that structure is exactly what the classifier learns from.

SLIDE 5

BUILDING THE LAB · THE TESTBED

20 containers, one passive tap

▸ 2 reveals as you talk · then → for next

A quick word on the testbed, because the result only means something if the lab is honest. This is twenty Docker containers standing in for a small plant: five device classes, four of each. That's the controllers, plus the HMIs, historians and engineering stations that poll them, and the IT endpoints. They all share one network segment, watched by a single passive tap, exactly the position we’re in on a client site. Every container runs the same image; what makes one behave like a PLC and another like an HMI is just its class in a config file. The controllers serve Modbus and S7; every other container polls them on its own schedule. And this isn’t a laptop demo: it runs around the clock on a real server, and it has already produced seventy-odd scenario runs and about twenty-four gigabytes of captured traffic.

SLIDE 6

BUILDING THE LAB · OPERATIONAL CHANGE

Four scripted changes, known ground truth

▸ 2 reveals as you talk · then → for next

Here’s why I built a lab instead of using a public capture: I need to change the network on purpose and still know the ground truth. A single phase signal tells every container when the world changes, live, in the middle of a capture: pause a controller, bring a new HMI online, repoint an engineering workstation to a different PLC, or let a scanner loose for half an hour. Five states in total: a steady baseline plus four realistic changes. And the crucial part: the labels for which connections are genuinely stable versus transient come from the configuration, never from the traffic; otherwise the filter would be grading its own homework. Keep your eye on maintenance: a controller paused for forty minutes. That’s the one that breaks, and that’s exactly where we go next.
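To make "labels come from the configuration, never from the traffic" concrete, here is a hypothetical Python sketch. The config layout (`BASELINE_POLLS`, `SCENARIO_EXTRA`), the host names and the labelling rule (configured polls are stable, edges a scenario event adds are transient) are assumptions for illustration, not the lab's actual schema. Note what the rule implies for maintenance: the paused PLC's polls stay labelled stable, even though they go silent on the wire.

```python
from typing import Literal

Edge = tuple[str, str]  # (poller, polled host)
Label = Literal["stable", "transient"]

# Hypothetical lab config: whom each device polls in steady state.
BASELINE_POLLS: dict[str, list[str]] = {
    "hmi1": ["plc1", "plc2"],
    "hist1": ["plc1", "plc2"],
    "eng1": ["plc1"],
}

# Hypothetical scenario events: edges a scenario adds on top of the baseline.
SCENARIO_EXTRA: dict[str, set[Edge]] = {
    "scanning": {("scan1", "plc1"), ("scan1", "plc2"), ("scan1", "hmi1")},
    "maintenance": set(),  # a pause silences traffic but removes no configured edge
}

def ground_truth(scenario: str) -> dict[Edge, Label]:
    """Label edges from configuration alone; captured traffic is never read."""
    labels: dict[Edge, Label] = {
        (src, dst): "stable" for src, targets in BASELINE_POLLS.items() for dst in targets
    }
    for edge in SCENARIO_EXTRA.get(scenario, set()):
        labels.setdefault(edge, "transient")
    return labels

print(ground_truth("maintenance")[("hmi1", "plc1")])  # stable
print(ground_truth("scanning")[("scan1", "plc1")])    # transient
```

Because the function never touches a capture, a filter scored against these labels cannot agree with them just by echoing what the traffic happened to show.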

SLIDE 7

RQ2 · THE RESULT (held-out macro-F1)

Neutral on four. It breaks the fifth.

▸ 2 reveals as you talk · then → for next

Here's what actually happened. The filter buys you no robustness. On four of the five scenarios (steady state, onboarding, configuration drift and benign scanning) the effect is essentially zero, with nothing significant either way. So the upside we hoped for simply isn't there. And in the fifth scenario, maintenance, it doesn't just fail to help; it significantly hurts. Held-out macro-F1 drops from 0.45 to 0.36. That's significant at p equals 0.027, and it's worse in eight of the ten runs, so it's not one unlucky seed. Watch the bar chart: four bars sit flat at zero, and one warm bar points the wrong way. A clean-up step that was supposed to add robustness introduced its own failure mode instead.

SLIDE 8

THE MECHANISM · MAINTENANCE

A paused controller looks like an idle laptop

▸ 2 reveals as you talk · then → for next

Let me tell you exactly how it breaks, because the mechanism is the lesson. During maintenance, one controller, a PLC, is paused for forty minutes. While it's paused, all the polls coming into it stop. The filter sees those connections vanish, decides they aren't persistent, and deletes all twenty of them. In-degree goes from twenty to zero, incoming bytes from 2.1 million to zero. But here's the thing: those polls are exactly what make a controller look like a controller. Strip them away and the paused PLC has no connections and no traffic, indistinguishable from an idle IT laptop. So the classifier confidently calls it an IT endpoint, and the share of controllers classified as IT jumps from 0.08 to 0.30; watch that cell. And this isn't a quirk of the fancy graph model: a plain random forest with no graph at all breaks the same way. The filter didn't confuse the model; it destroyed the evidence, and no model can recover evidence that isn't there.
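The mechanism fits in a toy Python example. It restates a simple persistence filter (keep an edge only if it appears in at least 80% of time windows, an assumed threshold) and computes two of the features mentioned above, in-degree and incoming bytes. Host names, window count and byte volumes are made up; the point is only the shape of the failure.

```python
from collections import Counter

Edge = tuple[str, str]  # (source host, destination host)

def stable_edges(windows: list[set[Edge]], min_fraction: float = 0.8) -> set[Edge]:
    """Persistence filter: keep edges present in >= min_fraction of windows."""
    counts = Counter(edge for window in windows for edge in window)
    return {e for e, n in counts.items() if n >= min_fraction * len(windows)}

def in_features(host: str, edges: set[Edge], bytes_by_edge: dict[Edge, int]) -> tuple[int, int]:
    """Two graph features for `host`: in-degree and total incoming bytes."""
    incoming = [e for e in edges if e[1] == host]
    return len(incoming), sum(bytes_by_edge[e] for e in incoming)

# Hypothetical toy lab: five devices poll plc1; laptop it1 only talks outward.
plc_polls = {(src, "plc1") for src in ("hmi1", "hmi2", "hist1", "hist2", "eng1")}
laptop = {("it1", "gw1")}
bytes_by_edge = {e: 10_000 for e in plc_polls | laptop}  # illustrative volumes

# Eight capture windows; plc1 is paused in windows 3 and 4, so its polls vanish there.
windows = [laptop | (set() if i in (3, 4) else plc_polls) for i in range(8)]

all_edges = set().union(*windows)
kept = stable_edges(windows)

print(in_features("plc1", all_edges, bytes_by_edge))  # (5, 50000)
print(in_features("plc1", kept, bytes_by_edge))       # (0, 0)
print(in_features("it1", kept, bytes_by_edge))        # (0, 0)
```

A pause covering just two of eight windows is enough at this threshold: each polling edge is seen in six windows, below the 6.4 needed, so all of them are deleted and plc1 ends up with the same incoming features as the laptop. Any classifier fed these features, graph-based or not, sees two identical rows.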

SLIDE 9

THE TAKEAWAY

Persistence is the wrong thing to filter on.

▸ 2 reveals as you talk · then → for next

So here's the deployable lesson, the part to take back to client work: don't blindly pre-filter passive OT graphs by how often a connection recurs. The edges that disappear during an outage aren't noise; they're precisely the ones that define what a device is. Persistence feels like a safe, free clean-up step, and it isn't: it can silently erase the evidence your classifier relies on, and you won't even notice, because the pipeline still runs and still outputs a label. If you're going to filter, filter on what an edge means (the protocol, the direction, the roles of the two endpoints), not on how stable it looks over time. The whole lab, the four scenarios and the code are released, so you can reproduce this on your own captures. Thank you. Everything's at jvdh.tech, and I'm happy to take questions.
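One way to read "filter on what an edge means" is a rule keyed on protocol and direction rather than recurrence. The Python below is a hypothetical sketch, not something evaluated in the thesis: it assumes flows are already aggregated over the whole capture, treats any flow toward a Modbus/TCP (port 502) or S7comm (ISO-TSAP, port 102) server port as controller evidence that is never dropped, and prunes only other flows that look like single-packet probes. The `Flow` type and the `min_packets` threshold are illustrative.

```python
from dataclasses import dataclass

# Well-known server ports: Modbus/TCP listens on 502; Siemens S7comm runs over ISO-TSAP on 102.
OT_SERVER_PORTS = {502: "modbus", 102: "s7comm"}

@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record aggregated over the whole capture."""
    src: str
    dst: str
    dst_port: int  # server-side port, so src -> dst is the request direction
    packets: int

def keep_edge(flow: Flow, min_packets: int = 2) -> bool:
    """Semantic filter sketch: decide on protocol and direction, never on recurrence."""
    if flow.dst_port in OT_SERVER_PORTS:
        # An OT request toward a controller is class evidence: always keep it,
        # however rarely it was seen.
        return True
    # Outside OT protocols, drop only probe-like flows (e.g. a single packet from a sweep).
    return flow.packets >= min_packets

flows = [
    Flow("hmi1", "plc1", 502, 1200),
    Flow("eng1", "plc2", 102, 1),
    Flow("scan1", "hist1", 8080, 1),
    Flow("it1", "gw1", 443, 40),
]
print([f"{f.src}->{f.dst}" for f in flows if keep_edge(f)])
# ['hmi1->plc1', 'eng1->plc2', 'it1->gw1']
```

Because the decision never asks how often an edge recurs, a PLC that went quiet for forty minutes keeps its polling edges, while the scanner's one-packet probe is still removed.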

AFTER SLIDE 9

CLOSE · Q&A

Loop back to the cold-open

Pressing → past the takeaway loops back to the cold-open graph, a calm backdrop for questions. Everything's at jvdh.tech and github.com/jonathanvdheuvel.
