Hands-on examples of common JVM issues diagnosed through heap dumps, thread dumps, and core dumps.
Each case includes a reproducible Java application, Kubernetes deployment, and step-by-step analysis guide with screenshots.
Inspired by lldb-netcore-use-cases
```bash
# Build the project
mvn clean package -DskipTests

# Run a specific case (e.g., memory leak)
docker build -t jvm-dump-cases .
docker run -it jvm-dump-cases memoryleak

# Or deploy on Kubernetes
kubectl apply -f k8s/01-infinitewait.yaml
```

| # | Case | CPU | Memory | Diagnosis Tool | Doc |
|---|---|---|---|---|---|
| 1 | Low CPU Hang (Infinite Wait) | 🟢 Low | 🟢 Normal | Thread Dump | 📖 |
| 2 | High CPU Hang (Infinite Loop) | 🔴 High | 🟢 Normal | Thread Dump + `top -H` | 📖 |
| 3 | Heap Memory Leak | 🟢 Normal | 🔴 Growing | Heap Dump + MAT | 📖 |
| 4 | Single Thread High Memory | 🟢 Normal | 🔴 Spike | Heap Dump + MAT | 📖 |
| 5 | Unhandled Exception / Crash | — | — | `-XX:+CrashOnOutOfMemoryError` | 📖 |
| 6 | Classloader Leak | 🟢 Normal | 🔴 Metaspace | Heap Dump (class histogram) | 📖 |
| 7 | Thread Leak | 🟡 Medium | 🔴 Growing | Thread Dump + `jstack` | 📖 |
| 8 | Log4j Appender Blocking | 🟢 Normal | 🟢 Normal | Thread Dump (BLOCKED) | 📖 |
| 9 | Finalizer Queue Leak | 🟢 Normal | 🔴 Growing | Heap Dump (Finalizer) | 📖 |
| 10 | Unmanaged (Native) Memory Leak | 🟢 Normal | 🔴 RSS growing | NMT + `pmap` | — |
| 11 | Deadlock | 🟢 Low | 🟢 Normal | Thread Dump (`jstack`) | 📖 |
| 12 | GC Thrashing | 🔴 High (GC) | 🔴 Full | GC Logs + `jstat` | 📖 |
| 13 | Metaspace OOM | 🟢 Normal | 🔴 Metaspace | NMT + class histogram | 📖 |
| 14 | Off-Heap Leak | 🟢 Normal | 🔴 RSS growing | NMT + `pmap` | 📖 |
| 15 | Connection Pool Exhaustion | 🟢 Normal | 🟢 Normal | Thread Dump (WAITING) | 📖 |
| 16 | Thread Pool Saturation | 🟡 Medium | 🟢 Normal | Thread Dump + metrics | 📖 |
| 17 | Stack Overflow | 🟢 Normal | 🟢 Normal | Stack trace analysis | 📖 |
| 18 | File Descriptor Leak | 🟢 Normal | 🟢 Normal | `lsof` + `/proc/fd` | 📖 |
- Deadlock Detection — Two threads holding locks and waiting for each other. Diagnose with `jstack` (shows "Found one Java-level deadlock"). Classic producer-consumer deadlock scenario.
- GC Thrashing / GC Overhead Limit — Application spending >98% of its time in GC with <2% of the heap recovered. Diagnose with GC logs (`-Xlog:gc*`) and GCViewer. Triggers `java.lang.OutOfMemoryError: GC overhead limit exceeded`.
- Metaspace / PermGen OOM — Metaspace exhaustion from dynamic class generation (Groovy scripts, CGLIB proxies, excessive reflection). Diagnose with `-XX:+HeapDumpOnOutOfMemoryError` and a class histogram.
- Direct ByteBuffer / Off-Heap Leak — Native memory leak via `ByteBuffer.allocateDirect()` or NIO channels. RSS grows but the heap looks fine. Diagnose with NMT (`-XX:NativeMemoryTracking=detail`) and `jcmd VM.native_memory`.
- Connection Pool Exhaustion — Database connection pool (HikariCP/C3P0) fully consumed. Threads stuck waiting for a connection. Diagnose with a thread dump (threads waiting on the pool) + pool metrics.
- Thread Pool Saturation — `ThreadPoolExecutor` with a bounded queue full. Tasks rejected with `RejectedExecutionException`. Diagnose with a thread dump (all pool threads RUNNABLE) + JMX metrics.
- Excessive Object Creation (Allocation Pressure) — High allocation rate causing frequent young GCs. Short-lived objects dominating Eden space. Diagnose with allocation profiling (JFR/async-profiler).
- Stack Overflow — Deep recursion causing `StackOverflowError`. Diagnose with `-Xss` tuning; the thread dump shows a deep call stack.
- String/StringBuilder Abuse — Massive String concatenation in loops creating GC pressure. Compare `String +=` vs `StringBuilder` vs `String.join()`. Diagnose with an allocation profiler.
- Zombie / Orphan Threads — Threads created but never properly shut down (missing `ExecutorService.shutdown()`). Thread count grows indefinitely. Diagnose with `jstack` thread counts over time.
- Class Data Sharing (CDS) Issues — AppCDS misconfiguration causing slow startup or class loading failures. Diagnose with `-Xlog:class+load`.
- JIT Compilation Issues — Code deoptimization causing performance cliffs. C2 compiler bailouts. Diagnose with `-XX:+PrintCompilation` and JFR.
- Safepoint Stalls — Long time-to-safepoint causing latency spikes. Diagnose with `-XX:+PrintSafepointStatistics` and JFR safepoint events.
- TLAB Resizing / Allocation Contention — Multi-threaded allocation contention outside TLABs. Diagnose with `-XX:+PrintTLAB` and JFR.
- File Descriptor Leak — `java.io.IOException: Too many open files`. Streams/connections opened but never closed. Diagnose with `lsof -p <pid>` and `/proc/<pid>/fd`.
- DNS Resolution Hang — `InetAddress.getByName()` blocking under load. JVM DNS caching issues (`networkaddress.cache.ttl`). The thread dump shows threads stuck in DNS resolution.
- SSL/TLS Handshake Issues — Slow or failing TLS handshakes. Certificate validation problems, cipher negotiation. Diagnose with `-Djavax.net.debug=ssl:handshake`.
- Container Memory Limits — JVM not respecting container memory limits (old JVMs). `-XX:+UseContainerSupport` vs a manual `-Xmx`. OOMKilled by Kubernetes vs JVM OOM.
- Large Object Allocation in Old Gen — Objects too large for the young gen are allocated directly in the old gen, causing premature full GCs. Diagnose with `-XX:PretenureSizeThreshold` and GC logs.
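The deadlock scenario above can be reproduced deterministically. A minimal sketch (class and method names are mine, not the repo's) that builds a two-lock deadlock and then confirms it via `ThreadMXBean` — the same JVM-level detection `jstack` uses for its "Found one Java-level deadlock" report:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockSketch {

    // Start a daemon thread that takes `first`, waits until the partner thread
    // also holds its own first lock, then blocks forever trying to take `second`.
    static void spawn(Object first, Object second, CountDownLatch bothLocked) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothLocked.countDown();
                try { bothLocked.await(); } catch (InterruptedException ignored) {}
                synchronized (second) { /* never reached */ }
            }
        });
        t.setDaemon(true); // let the JVM exit despite the permanently blocked threads
        t.start();
    }

    // Returns the number of deadlocked threads the JVM itself reports.
    static int detectDeadlock() throws InterruptedException {
        Object lockA = new Object(), lockB = new Object();
        CountDownLatch bothLocked = new CountDownLatch(2);
        spawn(lockA, lockB, bothLocked);
        spawn(lockB, lockA, bothLocked);
        bothLocked.await(); // both threads now hold their first lock: deadlock is certain
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();
        while (ids == null) {           // wait for both threads to reach BLOCKED
            Thread.sleep(50);
            ids = mx.findDeadlockedThreads();
        }
        return ids.length;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + detectDeadlock()); // prints 2
    }
}
```

The `CountDownLatch` removes the timing race: neither thread attempts its second lock until both already hold their first, so the cycle always forms.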
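The off-heap leak item is worth a sketch too, because it is invisible to ordinary heap analysis: direct buffers live outside the Java heap, so RSS climbs while `jmap`'s histogram shows only tiny `DirectByteBuffer` shell objects. This illustrative example (names are mine) pins native memory by retaining direct buffers:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class OffHeapLeakSketch {
    // Retaining the buffers keeps their native backing memory allocated;
    // it is freed only after the buffers become unreachable and are GC'd.
    static final List<ByteBuffer> RETAINED = new ArrayList<>();

    // Allocates `chunks` direct buffers of `bytesEach` bytes and returns
    // the total native bytes this call pinned outside the Java heap.
    static long leakNative(int chunks, int bytesEach) {
        for (int i = 0; i < chunks; i++) {
            RETAINED.add(ByteBuffer.allocateDirect(bytesEach));
        }
        return (long) chunks * bytesEach;
    }

    public static void main(String[] args) {
        long bytes = leakNative(8, 1024 * 1024);
        // With -XX:NativeMemoryTracking=detail enabled, `jcmd <pid> VM.native_memory`
        // attributes this growth outside the "Java Heap" section.
        System.out.println("native bytes retained: " + bytes);
    }
}
```

`-XX:MaxDirectMemorySize` caps this kind of allocation; without it, the limit defaults to roughly the maximum heap size on HotSpot.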
| Tool | What It Captures | Command |
|---|---|---|
| jstack | Thread dump | `jstack <pid>` |
| jmap | Heap dump | `jmap -dump:format=b,file=heap.hprof <pid>` |
| jcmd | All-in-one diagnostics | `jcmd <pid> GC.heap_dump heap.hprof` |
| jstat | GC statistics | `jstat -gcutil <pid> 1000` |
| jinfo | JVM flags | `jinfo -flags <pid>` |
| JFR | Flight Recorder | `jcmd <pid> JFR.start duration=60s filename=rec.jfr` |
| MAT | Heap analysis (GUI) | Eclipse Memory Analyzer |
| TDA | Thread dump analysis | Thread Dump Analyzer |
| async-profiler | CPU/allocation profiling | `./profiler.sh -d 30 -f profile.html <pid>` |
| NMT | Native memory tracking | `-XX:NativeMemoryTracking=detail` + `jcmd <pid> VM.native_memory` |
| GCViewer | GC log analysis | Parse `-Xlog:gc*:file=gc.log` |
| VisualVM | Live monitoring | Connect via JMX |
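When you cannot exec `jmap` or `jcmd` inside a locked-down container, the same HPROF dump can be triggered from inside the JVM via the HotSpot diagnostic MXBean. A small sketch (the file name is arbitrary, chosen for illustration):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.lang.management.ManagementFactory;

public class HeapDumpSketch {

    // Writes a heap dump in the same HPROF format jmap/jcmd produce.
    // The output path must end in ".hprof" and must not already exist.
    static File dumpHeap(String path, boolean liveObjectsOnly) throws Exception {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // live=true forces a full GC first, like jmap's -dump:live option
        bean.dumpHeap(path, liveObjectsOnly);
        return new File(path);
    }

    public static void main(String[] args) throws Exception {
        File f = dumpHeap("manual-heap.hprof", true);
        System.out.println("wrote " + f.length() + " bytes to " + f.getName());
    }
}
```

This pairs well with an HTTP admin endpoint or a signal handler, so a dump can be captured exactly when symptoms appear rather than after a pod restart.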
```bash
# Heap dump on OOM (essential for production)
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof

# GC logging (JDK 11+)
-Xlog:gc*:file=gc.log:time,uptime,level,tags:filecount=5,filesize=100m

# Native Memory Tracking
-XX:NativeMemoryTracking=detail

# Flight Recorder (always-on in production)
-XX:StartFlightRecording=dumponexit=true,filename=recording.jfr,maxage=1h

# Crash on OOM (let Kubernetes restart the pod)
-XX:+CrashOnOutOfMemoryError

# Container-aware memory settings
-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0
```

```
jvm-dump-usecases/
├── src/main/java/com/pamir/dump/cases/
│   ├── Application.java                 # Main entry point (selects case by arg)
│   ├── Case.java                        # Base interface
│   ├── InfiniteWait.java                # Low CPU hang
│   ├── InfiniteLoop.java                # High CPU hang
│   ├── MemoryLeak.java                  # Heap memory leak
│   ├── SingleThreadHighMemoryUsage.java
│   ├── CrashOnError.java                # Unhandled exception
│   ├── ClassloaderLeak.java             # Metaspace leak
│   ├── ThreadLeak.java                  # Thread count growing
│   ├── Log4JCase.java                   # Log4j blocking
│   ├── FinalizerCase.java               # Finalizer queue leak
│   └── UnhandledException.java
├── docs/                                # Step-by-step analysis guides with screenshots
├── k8s/                                 # Kubernetes deployment manifests
├── Dockerfile                           # Container image
├── Dockerfile_oom                       # OOM-specific image
└── pom.xml                              # Maven build
```
- eBay SRE: Triage a Non-Heap JVM OOM Issue
- lldb-netcore-use-cases (inspiration for .NET equivalent)
- Java Performance: In-Depth Advice — Scott Oaks
- JVM Troubleshooting Guide
- Eclipse MAT Documentation
- async-profiler
MIT