IT-in-Git

JFR + AI: Stop Guessing What Your JVM Is Doing

4. 6. 2026

JDK Flight Recorder has been quietly sitting in the JVM for years, mostly used as a last-resort "dump a recording and pray" tool. But combine it with the JFR Streaming API and modern AI tooling, and you get something actually useful: a runtime observability loop that can catch problems before your oncall does.

There's a talk from Jfokus 2026 by Yagmur Eren and Joakim Nordström called "Intelligent JVM Monitoring: Combining JDK Flight Recorder with AI." The slides are mostly images so I can't quote them directly, but the idea they're pushing is solid enough to dig into properly.

The pitch: stream live JFR data into an AI system, spot anomalies automatically, and build self-improving applications. I know how that sounds — it's the kind of thing that gets slapped on a conference slide with a very round diagram and zero code. But there's actual substance here, and the ecosystem around it has been quietly maturing. Let me walk through what this actually means in practice.

JFR is not just a dump-and-analyze tool

Most teams I've seen use JFR in exactly one way: something goes wrong in production, someone remembers it exists, they start a recording, dump a .jfr file, open it in JDK Mission Control, and stare at flame graphs for an hour. That's fine, but it's reactive and manual. You're debugging the past.

JEP 349 (Java 14) changed that by introducing the JFR Event Streaming API. The key class is RecordingStream. You can now subscribe to events as they happen, in-process, with latency of roughly a second (by default the JVM flushes event data to the stream about once per second). No manual file dump. No JMC. Just a callback.

The API is surprisingly clean:

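import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

try (var rs = new RecordingStream()) {
    rs.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(1));
    rs.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ofMillis(10));
    // GCHeapSummary is emitted around each GC, not on a timer, so no period is set.
    rs.enable("jdk.GCHeapSummary");

    rs.onEvent("jdk.CPULoad", event -> {
        float jvmUser = event.getFloat("jvmUser");
        if (jvmUser > 0.85f) {
            alertHighCpu(jvmUser); // your own alerting hook
        }
    });

    // start() blocks and keeps delivering events. With startAsync() the call
    // would return at once and the try block would close the stream immediately.
    rs.start();
}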

That's it. You've got real-time CPU monitoring with a threshold alarm, in about 15 lines of Java. Add GC heap summaries, lock contention events, allocation pressure — the full picture is right there, streaming.

The overhead stays low. The two knobs that matter: how many events you enable, and whether you capture full stack traces at high sampling rates. Stick to the default configuration and you're in the territory JFR was designed for: JEP 328 set a target of at most 1% overhead out of the box. JFR was designed to run in production; that wasn't an accident.

What events are actually useful?

There are hundreds of built-in jdk.* events. In practice, a handful cover 90% of the interesting cases:

  • jdk.CPULoad — JVM vs system CPU, sampled on a period
  • jdk.GCHeapSummary / jdk.GarbageCollection — heap pressure and GC pause times
  • jdk.JavaMonitorEnter — lock contention, with a threshold so you only catch slow ones
  • jdk.ObjectAllocationInNewTLAB / jdk.ObjectAllocationOutsideTLAB — allocation hotspots
  • jdk.ExecutionSample — method profiling, the basis for flame graphs
  • jdk.SocketRead / jdk.FileRead etc. — I/O latency
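To make a couple of those events concrete, here is a minimal sketch of a long-running monitor that listens for GC pauses and slow monitor enters. JvmMonitor, describePause, the pause budget, and the message format are all made up for illustration; only the RecordingStream calls and the event and field names come from the JDK.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Consumer;
import jdk.jfr.consumer.RecordedClass;
import jdk.jfr.consumer.RecordingStream;

// Hypothetical wrapper (not a JDK class) that owns a stream for the life of a service.
class JvmMonitor implements AutoCloseable {
    private final RecordingStream stream = new RecordingStream();

    JvmMonitor(Duration pauseBudget, Consumer<String> sink) {
        stream.enable("jdk.GarbageCollection");
        stream.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ofMillis(10));

        stream.onEvent("jdk.GarbageCollection", e -> {
            Object gcName = e.getValue("name"); // collector name as recorded by the JVM
            describePause(String.valueOf(gcName), e.getDuration("longestPause"), pauseBudget)
                .ifPresent(sink);
        });
        stream.onEvent("jdk.JavaMonitorEnter", e -> {
            RecordedClass monitor = e.getClass("monitorClass");
            String owner = monitor == null ? "unknown monitor" : monitor.getName();
            sink.accept("blocked " + e.getDuration().toMillis() + " ms on " + owner);
        });
    }

    // Pure formatting logic, kept separate so it can be checked without a live stream.
    static Optional<String> describePause(String gcName, Duration longestPause, Duration budget) {
        if (longestPause.compareTo(budget) <= 0) {
            return Optional.empty(); // within budget, nothing to report
        }
        return Optional.of(gcName + " paused for " + longestPause.toMillis()
            + " ms (budget " + budget.toMillis() + " ms)");
    }

    void start() {
        stream.startAsync(); // returns immediately; callbacks run on a JFR-owned thread
    }

    @Override
    public void close() {
        stream.close();
    }
}
```

In a service you would create one of these at startup, for example new JvmMonitor(Duration.ofMillis(200), System.err::println), call start(), and close it on shutdown. The 200 ms budget is an arbitrary placeholder, not a recommendation.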

You can also define custom events. Subclass jdk.jfr.Event, annotate your fields, and call event.commit(). Those events show up in recordings alongside the built-in ones. This is huge if you want to correlate JVM internals with your own application logic — think: "this request processing event happened, and at the same time GC paused for 200ms."
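A minimal sketch of what that looks like. The event name, the fields, and the RequestHandler wrapper are hypothetical; the annotations and the begin()/commit() calls are the standard jdk.jfr API.

```java
import java.util.function.IntSupplier;
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Hypothetical application event; the name and fields are illustrative only.
@Name("com.example.RequestProcessed")
@Label("Request Processed")
@Category("Application")
class RequestProcessedEvent extends Event {
    @Label("Endpoint")
    String endpoint;

    @Label("Status Code")
    int status;
}

class RequestHandler {
    // Wraps a unit of work in a JFR event so it lands on the same timeline
    // as the built-in GC and lock events.
    static int handle(String endpoint, IntSupplier work) {
        RequestProcessedEvent event = new RequestProcessedEvent();
        event.begin(); // records the start time
        int status = 500; // assumed default if the work throws
        try {
            status = work.getAsInt();
            return status;
        } finally {
            event.endpoint = endpoint;
            event.status = status;
            // commit() ends the event if end() was not called, and writes
            // nothing when no recording has the event enabled.
            event.commit();
        }
    }
}
```

Calling RequestHandler.handle("/orders", () -> 200) works whether or not a recording is running, which is the point: the instrumentation can stay in the code path permanently.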

Where does AI fit in?

Here's where it gets interesting. The straightforward version: you take the event stream and feed it into a model that's been trained or prompted to reason about JVM performance patterns. The model watches the stream like a senior engineer would — except it doesn't get paged at 3am and doesn't miss patterns because it got distracted.

But "feed events to an LLM" is vague. There are a couple of more concrete approaches worth knowing about.

Anomaly detection on the stream. You're not sending raw events to a GPT. You aggregate them — rolling averages, p99 latencies, contention ratios — and feed those summaries to a model that classifies the current state as normal or suspicious. ML models trained on historical JFR data can flag when heap growth rate is trending toward OOM, or when lock contention on a specific monitor spikes beyond baseline. This is closer to time-series anomaly detection than "ask the AI what's wrong."
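To show the shape of the aggregation step, here is a deliberately simple stand-in: a rolling z-score check over a fixed window of samples. This is plain statistics, not a trained model and not what the talk presented. The class name, window size, and threshold are assumptions for illustration; in practice the input would be something like the jvmUser value from jdk.CPULoad or a per-interval contention total.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Rolling z-score detector over a fixed window of aggregated samples.
class RollingAnomalyDetector {
    private final int window;
    private final double zThreshold;
    private final Deque<Double> samples = new ArrayDeque<>();

    RollingAnomalyDetector(int window, double zThreshold) {
        this.window = window;
        this.zThreshold = zThreshold;
    }

    // Returns true when the new value is more than zThreshold standard
    // deviations from the mean of the current window. Nothing is flagged
    // until the window is full. The value joins the window afterwards.
    boolean isAnomalous(double value) {
        boolean anomalous = false;
        if (samples.size() == window) {
            double mean = samples.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
            double variance = samples.stream()
                .mapToDouble(v -> (v - mean) * (v - mean))
                .sum() / window;
            double std = Math.sqrt(variance);
            // A perfectly flat window has std == 0; skip it rather than divide by zero.
            anomalous = std > 0 && Math.abs(value - mean) / std > zThreshold;
            samples.removeFirst();
        }
        samples.addLast(value);
        return anomalous;
    }
}
```

One design choice to be aware of: the spike itself enters the window, so the baseline is briefly polluted after an anomaly. A real detector would probably exclude flagged samples or use a more robust statistic.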

AI-assisted analysis of recordings. The more immediately practical angle: jfr-mcp, a project by Jaroslav Bachorik from BTrace, exposes a JFR analysis session as an MCP server. You point Claude (or any MCP-compatible agent) at a .jfr file and it can run structured queries using JfrPath expressions, correlate event types, and reason about bottlenecks using established frameworks like the USE method (Utilization, Saturation, Errors across CPU / memory / threads / I/O). The toolchain runs over JBang, so once JBang is installed the setup is a single command:

claude mcp add jafar -- jbang jfr-mcp@btraceio --stdio

Run that once to register the server with Claude and you've got a JVM performance analyst available on demand. Does it replace knowing what you're doing? No. But for triaging "something is slow" incidents, having an agent systematically check every resource class before drilling into hot methods is genuinely useful.

Auto-documentation of events. This one's more of a footnote, but worth knowing: Johannes Bechberger used GPT-3.5 to generate descriptions for JFR events from the OpenJDK source. A lot of the built-in events have either no description or a one-liner that doesn't tell you much. As far as I can tell, the generated descriptions cover the JDK 21 event set and are published on his JFR Event Collection site rather than in the JDK itself. It's a small thing, but it lowers the barrier to actually using events you haven't seen before.

ML-assisted JVM tuning

Yagmur Eren has been pushing this angle for a while — Jfokus rated her ML-assisted JVM flag tuning talk as one of the top three at the conference. The basic idea: JVM flags like GC algorithm selection, heap sizing, and GC pause targets have massive impact on performance but tuning them requires deep expertise and a lot of trial and error. Machine learning models trained on JFR telemetry from production runs can recommend flag configurations adapted to the actual workload. Honestly, the amount of time teams waste cargo-culting -Xmx and -XX:+UseG1GC from Stack Overflow answers from 2017 is staggering. Automating even part of that loop is worth it.

The live streaming loop

The thing I find most interesting about this direction isn't the AI part per se — it's the feedback loop. JFR Streaming gives you continuous observability with negligible overhead. Pair that with any kind of automated analysis, and you close the gap between "something happened" and "we know what happened."

Traditional approach: wait for user complaints → trigger recording → analyze offline → ship fix → repeat.

With a live JFR stream feeding into anomaly detection: event happens → threshold crossed → alert fired → developer gets context (event timeline, stack traces, correlated GC activity) already assembled → fix targeted.
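The "context already assembled" part is mostly bookkeeping. One way to sketch it, assuming you keep short text summaries of recent events in memory: a small bounded buffer that the alert path snapshots when a threshold is crossed. RecentEventBuffer and its methods are hypothetical names, not part of JFR.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Keeps the most recent event summaries so an alert can ship with context.
class RecentEventBuffer {
    private final int capacity;
    private final Deque<String> recent = new ArrayDeque<>();

    RecentEventBuffer(int capacity) {
        this.capacity = capacity;
    }

    // Called from the JFR callback thread for every event of interest.
    synchronized void record(String summary) {
        if (recent.size() == capacity) {
            recent.removeFirst(); // drop the oldest entry once full
        }
        recent.addLast(summary);
    }

    // Called from the alert path; returns a copy, oldest entry first.
    synchronized List<String> snapshot() {
        return new ArrayList<>(recent);
    }
}
```

Wiring it to a stream could be a single catch-all handler such as rs.onEvent(e -> buffer.record(e.getEventType().getName() + " at " + e.getStartTime())), with the alert handler attaching buffer.snapshot() to whatever it sends.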

That's not magic. It's just applying monitoring patterns that the ops world has had for ages — alert on signals, not on symptoms — to JVM internals that were always observable but never wired up properly.

Where the tooling stands right now

JFR itself is production-ready and has been since Java 11, when it was open-sourced. (It was originally a JRockit feature that Oracle kept proprietary for years. I genuinely don't understand the business logic there; it slowed adoption significantly.) The streaming API has been stable since Java 14. Custom events work well. The overhead story is solid.

The AI layer is the part that's still early. jfr-mcp is functional but requires Java 25+ to build from source. The ML-based auto-tuning approaches being demoed at conferences are mostly research prototypes or internal tools at Oracle/Red Hat. The gap between "this is a real architecture" and "this is in your production stack" is still substantial.

But the trajectory is clear. JFR streaming gives you the raw signal. The event model is rich enough that an AI with the right context can reason about it meaningfully. The MCP tooling demonstrates that structured AI analysis of JFR data is tractable today. Automated tuning is harder but the feedback data is there.

If you're running Java services in production and you're not using JFR at all, start there. Get recordings, open them in JMC, understand what your application looks like at runtime. Then look at the streaming API. The AI stuff will follow naturally once you have the observability foundation in place.

Don't wait for the conference slides to become production tooling. Wire up a RecordingStream this week. It'll take you an afternoon.

