Heap Walking and Allocation Tracking: TLABs and Heap Analysis
Understand how the JVM allocates memory with TLABs, how to track allocations with low overhead, and how heap walking tools analyze object graphs.
Every Java object you create eventually lives on the heap, and understanding how and where those objects get allocated is critical for diagnosing memory issues. The JVM uses Thread-Local Allocation Buffers (TLABs) to make allocation fast and lock-free, but this also makes tracking allocation origins harder. This guide explains how TLABs work, how to use allocation profiling tools without destroying performance, and how to walk the heap graph when you need to find what is keeping your memory hostage.
Introduction
Every Java object you create lives on the heap, but the path it takes from your code to memory is more nuanced than a simple allocation call. The JVM uses Thread-Local Allocation Buffers (TLABs) to make allocation fast and lock-free, but this optimization also hides most allocations from traditional profilers. Understanding how TLABs work, how to track allocations without destroying performance, and how to walk the object graph when things go wrong are essential skills for anyone diagnosing memory issues in production Java applications.
TLABs sit at the intersection of allocation performance and observability. They eliminate allocation contention in multi-threaded applications, but they also make allocation profiling significantly harder because most objects never touch shared data structures during allocation. To actually see where objects are born, you need sampling-based profiling, JFR events, or JVMTI callbacks—each with different trade-offs. Meanwhile, heap walking answers a different question: not where objects are created, but why they are still alive and what is keeping them in memory.
This post covers TLAB mechanics and allocation tracking, then shifts to heap walking when you need to find retention leaks that allocation data cannot explain. You will learn when to reach for allocation profiling versus heap dumps, how to use async-profiler and JFR for production-safe diagnostics, and how to trace GC root chains to find what is keeping your memory hostage. Practical failure scenarios show how TLAB bias, retention leaks, and off-heap memory all masquerade as different problems.
How the JVM Allocates Memory
Before getting into diagnostics, you need to understand how modern JVMs actually put objects on the heap.
The Problem with Concurrent Allocation
In a multi-threaded application, naively allocating objects on a shared heap requires synchronization. If every thread had to grab a lock to allocate memory, allocation would become a serious bottleneck. The JVM solves this with thread-local allocation buffers.
Thread-Local Allocation Buffers (TLABs)
TLABs are pre-allocated regions of Eden space that are assigned to each application thread. When a thread creates a new object, it allocates memory from its TLAB using a simple pointer bump — no locks, no atomic operations, just writing the object into the buffer and advancing a pointer.
graph TB
subgraph "Heap"
direction TB
subgraph "Eden Space"
T1[TLAB 1<br/>Thread-1]
T2[TLAB 2<br/>Thread-2]
T3[TLAB 3<br/>Thread-3]
end
subgraph "Survivor Spaces"
S1[S0]
S2[S1]
end
subgraph "Old Generation"
OG[Old Gen]
end
end
T1 -->|Fill, GC| S1
T2 -->|Fill, GC| S1
T3 -->|Fill, GC| S2
S1 -->|Aging| OG
S2 -->|Aging| OG
When a TLAB is exhausted, the thread requests a new one from Eden. This happens without stopping other threads.
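The fast and slow paths described above can be sketched as a toy simulation. This is a minimal sketch, not HotSpot's implementation: the class names (TlabSketch, Eden, Tlab), the use of plain long offsets as addresses, and the fixed TLAB size are assumptions made for illustration, and the real refill-waste heuristics are left out.

```java
import java.util.concurrent.atomic.AtomicLong;

final class TlabSketch {

    /** Shared Eden. Handing out a chunk is the only step that needs an atomic operation. */
    static final class Eden {
        private final long capacity;
        private final AtomicLong top = new AtomicLong();

        Eden(long capacity) { this.capacity = capacity; }

        /** Returns the start offset of a fresh chunk, or -1 when Eden is full
         *  (a real JVM would run a young collection at this point). */
        long carve(long size) {
            while (true) {
                long start = top.get();
                if (start + size > capacity) return -1;
                if (top.compareAndSet(start, start + size)) return start;
            }
        }
    }

    /** One buffer per thread; never shared, so no synchronization is needed. */
    static final class Tlab {
        private final Eden eden;
        private final long tlabSize;
        private long top;   // next free offset inside the current buffer
        private long end;   // first offset past the current buffer
        int refills;        // how often the slow path requested a new buffer

        Tlab(Eden eden, long tlabSize) {
            this.eden = eden;
            this.tlabSize = tlabSize;
        }

        /** Returns the offset of the new object, or -1 if Eden is exhausted. */
        long allocate(long size) {
            if (top + size <= end) {            // fast path: bounds check + pointer bump
                long address = top;
                top += size;
                return address;
            }
            if (size > tlabSize) {              // too large for any TLAB: allocate outside
                return eden.carve(size);
            }
            long start = eden.carve(tlabSize);  // slow path: abandon the old buffer, take a new one
            if (start < 0) return -1;
            refills++;
            top = start + size;
            end = start + tlabSize;
            return start;
        }
    }
}
```

Only carve touches shared state, which is the property that keeps the common case contention-free. It is also why low-overhead profilers hook the slow path: the refills counter marks the only points where the simulated runtime is involved at all.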
What TLABs Mean for Profiling
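Because the TLAB fast path is a pointer bump that the JIT compiles inline, it never calls into the VM runtime, so the JVM has no cheap place to report each allocation. Bytecode instrumentation of every new site can see them all, but the overhead is high and the added code defeats optimizations such as escape analysis. Low-overhead profilers therefore observe only the slow paths: TLAB refills and allocations outside a TLAB.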
To actually see allocation sites, you need either:
- Sampling-based allocation profiling (record a stack trace when a thread refills its TLAB or crosses a byte-count sampling interval)
- ObjectAllocationInNewTLAB and ObjectAllocationOutsideTLAB events from JFR
- SampledObjectAlloc callbacks from JVMTI (JDK 11+)
When to Use Allocation Tracking
Ideal Use Cases
- Finding allocation hotspots: Which code paths create the most objects?
- Memory leak investigation: What objects are accumulating and who allocates them?
- GC tuning: Which generations are receiving the most allocation traffic?
- Off-heap vs on-heap decisions: How much memory goes to specific types?
When Allocation Tracking Misleads
- TLAB bias: Only the allocation that triggers a TLAB refill or lands outside a TLAB is recorded, so small objects allocated on the fast path are represented statistically and rare ones may never appear
- Sampling bias: Statistical sampling may miss rare but expensive allocations
- Survivor confusion: Allocation rate does not equal heap pressure — a churned survivor space still stresses GC
Implementation: Reading TLAB Statistics
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;

public class TlabDiagnostics {
    public void dumpTlabStats() {
        List<MemoryPoolMXBean> pools = ManagementFactory.getMemoryPoolMXBeans();
        for (MemoryPoolMXBean pool : pools) {
            // MemoryPoolMXBean has no per-TLAB counters; TLABs are carved out of
            // the Eden pool, so its occupancy is the closest available proxy.
            MemoryUsage usage = pool.getUsage();
            if (usage == null) continue; // pool is no longer valid
            String name = pool.getName();
            System.out.println("Pool: " + name + " used=" + usage.getUsed());
        }
    }

    public void parseAllocationFromJfr(String jfrFile) throws Exception {
        // Placeholder: the allocation events can be read with the jdk.jfr.consumer
        // API or printed from the command line:
        System.out.println("Analyzing allocation from " + jfrFile);
        // jfr print --events ObjectAllocationInNewTLAB,ObjectAllocationOutsideTLAB file.jfr
    }
}
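The JFR parsing step left as a placeholder above could look roughly like the following. This is a sketch under assumptions: it relies on the jdk.jfr module being available, on the event names jdk.ObjectAllocationInNewTLAB and jdk.ObjectAllocationOutsideTLAB, and on their objectClass and allocationSize fields. The JfrAllocationSummary class and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedClass;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

final class JfrAllocationSummary {

    /** Sums the recorded allocationSize per class across both allocation events. */
    static Map<String, Long> bytesByClass(Path jfrFile) {
        Map<String, Long> totals = new TreeMap<>();
        try (RecordingFile file = new RecordingFile(jfrFile)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                String type = event.getEventType().getName();
                boolean allocation = type.equals("jdk.ObjectAllocationInNewTLAB")
                        || type.equals("jdk.ObjectAllocationOutsideTLAB");
                if (!allocation || !event.hasField("objectClass")) continue;
                RecordedClass cls = event.getClass("objectClass");
                if (cls == null) continue;
                totals.merge(cls.getName(), event.getLong("allocationSize"), Long::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return totals;
    }

    /** Records the two events while the workload runs, then summarizes the recording. */
    static Map<String, Long> recordAndSummarize(Runnable workload) {
        try {
            Path file = Files.createTempFile("alloc", ".jfr");
            try (Recording recording = new Recording()) {
                recording.enable("jdk.ObjectAllocationInNewTLAB");
                recording.enable("jdk.ObjectAllocationOutsideTLAB");
                recording.start();
                workload.run();
                recording.stop();
                recording.dump(file);
            }
            Map<String, Long> totals = bytesByClass(file);
            Files.deleteIfExists(file);
            return totals;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The totals are sums over sampled events, not exact byte counts: each event stands for one slow-path allocation, so the numbers are best read as relative weights between classes.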
Using async-profiler for Allocation Profiling
async-profiler can show allocation flame graphs with very low overhead:
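# Profile allocations for 60 seconds and write a flame graph
# (the launcher is profiler.sh in async-profiler 1.x/2.x and asprof in 3.x)
./profiler.sh -e alloc -d 60 -f alloc_profile.html <pid>
# Combined CPU + allocation profiling (multiple events require JFR output)
./profiler.sh -e cpu,alloc -d 30 -f combined.jfr <pid>
The alloc mode does not intercept every allocation. It samples TLAB refills and allocations outside TLABs (through JVMTI's SampledObjectAlloc event on JDK 11+, and through hooks on HotSpot's TLAB slow path on older JDKs), which is why it is suitable for production with overhead around 2-5%.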
Heap Walking
Heap walking traverses the entire object graph to find references between objects. This is how tools like Eclipse MAT find the GC roots that keep objects alive.
GC Roots
Objects are kept alive by references from:
- Thread stacks: Local variables and operand stacks
- JNI references: Native code holding Java object references
- Static fields: Class-level object references
- Running threads: The thread object itself
- Classloaders: Classloader objects
- Finalization queue: Objects awaiting finalization
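To make the idea of a root chain concrete, here is a toy breadth-first search over a hand-built reference graph. It is only an illustration: real tools read the graph from a heap dump, and the RootPathFinder class and the string node names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RootPathFinder {

    /**
     * refs maps each object to the objects it references; roots are the GC roots.
     * Returns the shortest chain from a root to the target, or an empty list
     * if the target is unreachable (and therefore collectable).
     */
    static List<String> pathFromRoot(Map<String, List<String>> refs,
                                     Set<String> roots, String target) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>(roots);
        Set<String> seen = new HashSet<>(roots);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(target)) {
                LinkedList<String> path = new LinkedList<>();
                for (String node = current; node != null; node = parent.get(node)) {
                    path.addFirst(node);
                }
                return path;
            }
            for (String next : refs.getOrDefault(current, List.of())) {
                if (seen.add(next)) {       // first visit records the shortest route
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }
}
```

For a graph where a static field holds a map whose entry holds a session object, the result reads root first: static field, map, entry, session. That is the same chain a heap analyzer displays when asked what is keeping the session alive.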
Walking the Heap with JHSDB
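# Attach the command-line debugger to a live process (the target is paused while attached)
jhsdb clhsdb --pid <pid>
# Once attached, commands such as universe and inspect <address> examine the heap and individual objects
# Print a class histogram without an interactive session
jhsdb jmap --histo --pid <pid>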
Programmatic Heap Walking with JVMTI
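#include <jvmti.h>
#include <string.h>

static jlong TAG_LEAKED = 1;

/* Sent when a tagged object is reclaimed
   (requires the can_generate_object_free_events capability). */
void JNICALL
ObjectFree(jvmtiEnv *jvmti_env, jlong tag) {
    /* Object being freed - if it had a tag, we know it was alive */
}

/* Called once per reference discovered while following the graph from the roots. */
static jint JNICALL
heap_reference_callback(jvmtiHeapReferenceKind ref_kind,
                        const jvmtiHeapReferenceInfo *info,
                        jlong class_tag, jlong referrer_class_tag,
                        jlong size, jlong *tag_ptr, jlong *referrer_tag_ptr,
                        jint length, void *user_data) {
    /* Keep following references out of this object */
    return JVMTI_VISIT_OBJECTS;
}

/* Walks all reachable objects (requires the can_tag_objects capability). */
jvmtiError walk_heap(jvmtiEnv *jvmti) {
    jvmtiHeapCallbacks callbacks;
    memset(&callbacks, 0, sizeof(callbacks));
    callbacks.heap_reference_callback = heap_reference_callback;
    /* No heap filter, NULL klass, NULL initial object: start from the heap roots */
    return (*jvmti)->FollowReferences(jvmti, 0, NULL, NULL, &callbacks, NULL);
}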
Production Failure Scenarios
Scenario 1: Allocation Rate Mismatch
Symptom: Your profiler shows low allocation rate but GC is constantly running.
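Investigation: Most allocations happen on the TLAB fast path, which the profiler never observes directly. It records only the allocations that trigger a TLAB refill or land outside a TLAB and extrapolates from those, so a profile built from raw sample counts or a coarse sampling interval can badly understate churn from small, short-lived objects.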
What TLAB hiding looks like: A profiler might show 100MB/s of allocations, but young GC is reclaiming 2GB/s. The difference is short-lived objects that never get sampled.
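Fix: Use GC logs to measure the actual allocation rate. The YGC and YGCT columns of jstat -gcutil show how often young collections run and how long they take, and JFR's jdk.GarbageCollection and jdk.GCHeapSummary events show how much heap each collection reclaims.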
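A cheap cross-check on a profiler's allocation figure is the per-thread allocation counter that HotSpot exposes through com.sun.management.ThreadMXBean, which also counts bytes allocated on the TLAB fast path. A minimal sketch, assuming a HotSpot-based JVM; the AllocationMeter class and its method name are hypothetical helpers, not part of any library.

```java
import java.lang.management.ManagementFactory;

final class AllocationMeter {

    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /**
     * Returns the number of bytes the calling thread allocated while running
     * the task, or -1 if the JVM does not support or has disabled the counter.
     */
    static long bytesAllocatedBy(Runnable task) {
        if (!THREADS.isThreadAllocatedMemorySupported()
                || !THREADS.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long threadId = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(threadId);
        task.run();
        return THREADS.getThreadAllocatedBytes(threadId) - before;
    }
}
```

Dividing the result by the elapsed time gives an allocation rate that can be compared with what the profiler and the GC log report. A large gap between the counter and the profile points to sampling loss rather than to a quiet application.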
Scenario 2: Retention Leak Masked as Allocation Leak
Symptom: Heap grows but allocation profiling shows no obvious culprit.
Investigation: The problem is not how many objects you allocate, but how long they live. A cache with no eviction policy might allocate a modest number of entries but never release them.
Approach: Take a heap dump with jmap or jcmd, then use MAT to find the GC root chain keeping objects alive. Allocation profiling alone cannot find retention leaks.
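# Take heap dump (live objects only; forces a full GC first)
jmap -dump:live,format=b,file=heap.hprof <pid>
# Equivalent with jcmd
jcmd <pid> GC.heap_dump heap.hprof
# jhat was removed in JDK 9; open the dump in Eclipse MAT or VisualVM instead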
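A dump can also be triggered from inside the application, for example from an admin endpoint. The sketch below assumes a HotSpot JVM, where com.sun.management.HotSpotDiagnosticMXBean is available. The HeapDumper wrapper is a hypothetical name. The target file must not already exist, and recent JDKs require the .hprof extension.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

final class HeapDumper {

    /**
     * Writes an HPROF heap dump to outputFile.
     * With liveOnly=true only reachable objects are written, which
     * forces a full GC before the dump starts.
     */
    static void dump(String outputFile, boolean liveOnly) {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            diagnostics.dumpHeap(outputFile, liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The call pauses the application just like the command-line tools do, so the same caution about production use applies.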
Scenario 3: NIO DirectByteBuffer Appearing as Native Memory Leak
Symptom: Native memory (off-heap) grows continuously, heap looks fine.
Investigation: Direct ByteBuffer memory is allocated by ByteBuffer.allocateDirect through Unsafe.allocateMemory, outside the Java heap. Use NMT (Native Memory Tracking) to see the breakdown.
# Enable NMT with a JVM startup flag
-XX:NativeMemoryTracking=detail
# Print the current breakdown
jcmd <pid> VM.native_memory summary
# Record a baseline, run the workload, then diff against the baseline
jcmd <pid> VM.native_memory baseline
# ... run workload ...
jcmd <pid> VM.native_memory detail.diff
What it showed: File channel reads were accumulating DirectByteBuffer objects with 1MB native allocations each, and only 64KB were being freed because of a bug in the cleanup code.
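Direct buffer usage can also be watched from inside the process through the platform BufferPoolMXBean, without enabling NMT. A minimal sketch; the DirectBufferStats class is a hypothetical helper, and the pool name "direct" is the one used by OpenJDK-based runtimes.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

final class DirectBufferStats {

    /** Bytes currently held by direct buffers, or -1 if the pool is not exposed. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    /** Number of live direct buffers, or -1 if the pool is not exposed. */
    static long directBufferCount() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getCount();
            }
        }
        return -1;
    }
}
```

A count that stays flat while the byte total climbs suggests a few buffers growing; a count that climbs with it suggests buffers that are never released.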
Trade-off Table
| Aspect | JFR Allocation Events | async-profiler alloc | jmap Heap Dump | JVMTI Custom |
|---|---|---|---|---|
| Overhead | 1-5% | 2-5% | 50-200% (stop-the-world) | 5-20% |
| Data captured | TLAB vs non-TLAB | Call stacks | Full graph | Configurable |
| Production safe | Yes | Yes | No (pause) | Conditional |
| Shows retention | No | No | Yes | Yes |
| TLAB visibility | Yes | Partial | Yes | Yes |
| Setup complexity | Low | Medium | Low | High |
Observability Checklist
- Enable JFR ObjectAllocationInNewTLAB and ObjectAllocationOutsideTLAB for allocation visibility
- Use async-profiler alloc mode for production-safe allocation flame graphs
- Correlate allocation rate with GC logs: a mismatch reveals TLAB hiding
- Take heap dumps during or immediately after incidents (use jcmd instead of jmap)
- Track BufferPool and DirectByteBuffer counts via MBean if using NIO
- Use NMT to understand native vs Java heap memory split
- Set allocation sampling thresholds appropriately: too low creates noise, too high misses small allocations
- Track heap occupancy over time with MemoryMXBean to find gradual leaks
- When retention is the issue, heap dumps are unavoidable, so plan for the pause
Security Notes
Allocation data can expose sensitive information:
- Allocation call stacks reveal business logic, API endpoints, and data structures
- Heap dumps contain full application state including session data, tokens, and PII
- Allocation profiling data stored in files may be accessible to unauthorized users
Best practices:
- Take heap dumps only in staging or with explicit approval in production
- Store heap dumps in encrypted storage with access controls
- Cleanse sensitive data from allocation logs before sharing
- Restrict access to allocation flame graph outputs
- Use temporary files for profiling data that are deleted after analysis
Common Pitfalls / Anti-Patterns
Pitfall 1: Confusing Allocation Rate with Heap Pressure
Problem: You optimize the top allocation sites, but heap still grows.
Explanation: Allocation profiling shows where objects are created, not how long they live. Reducing allocations that die young in Eden does not reduce old generation pressure. You need to find retention, not allocation.
Fix: Use heap dumps and GC logs together. GC logs show how much memory survives each collection; heap dumps show what is holding on to it.
Pitfall 2: Sampling Too Coarsely
Problem: Rare allocations that cause OOM never appear in profiles.
Explanation: If you sample 1 in 1000 allocations, objects allocated infrequently but in large batches (e.g., loading a large dataset once) will be invisible.
Fix: When OOM is your problem, do not rely on sampling. Use heap dumps and look at the total size of objects by class.
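The arithmetic behind this pitfall is short. If each allocation is sampled independently with probability p, the chance that at least one of n allocations from a given site is sampled is 1 - (1 - p)^n. The helper below is an illustrative sketch (the SamplingOdds class is hypothetical) and uses the simplified per-allocation model from the explanation above, not any profiler's actual byte-based sampler.

```java
final class SamplingOdds {

    /**
     * Probability that at least one of the given allocations is sampled,
     * assuming each is sampled independently with sampleProbability.
     */
    static double chanceOfAtLeastOneSample(long allocations, double sampleProbability) {
        return 1.0 - Math.pow(1.0 - sampleProbability, allocations);
    }
}
```

With p = 1/1000, a site that allocates 1000 times has roughly a 63% chance of showing up at all, while a one-off bulk load that performs 10 large allocations has about a 1% chance, regardless of how many bytes those 10 allocations consumed.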
Pitfall 3: Ignoring DirectByteBuffer Off-Heap Memory
Problem: Java heap looks fine but OS reports the process using far more memory.
Explanation: Direct ByteBuffer and other off-heap allocations do not appear in Java heap metrics.
Fix: Enable NMT (-XX:NativeMemoryTracking=detail) and monitor jcmd <pid> VM.native_memory.
Pitfall 4: Misreading GC Root Chains
Problem: Heap dump analysis shows a massive object as the leak suspect, but it is just a symptom.
Explanation: MAT-style tools show the path keeping the largest object alive. But that object might itself be referenced by something bigger. Follow the chain from the suspect all the way back to its GC root instead of stopping at the largest object.
Fix: Use “Path to GC Roots” in MAT, excluding weak references, to see what is really pinning memory.
Pitfall 5: TLAB Sizes Changing Under Load
Problem: Allocation profile changes dramatically between idle and peak load.
Explanation: TLAB sizes are dynamically computed based on allocation rate and thread count. When load changes, TLAB sizes change, which changes what appears in allocation profiles.
Fix: Run allocation profiling under realistic load to match production conditions.
Quick Recap Checklist
- TLABs make allocation fast and lock-free but hide most allocations from profilers
- Allocation profiles miss short-lived objects that die inside TLABs
- Use JFR ObjectAllocationInNewTLAB and ObjectAllocationOutsideTLAB events to see TLAB vs non-TLAB allocation
- async-profiler alloc mode is production-safe for allocation flame graphs
- Retention leaks require heap dumps, not allocation profiling
- NMT (jcmd VM.native_memory) tracks off-heap memory including DirectByteBuffer
- GC logs show actual heap pressure; allocation profiles show where objects originate
- Take heap dumps with jcmd, not jmap, for more controlled pauses
- Heap dump analysis with MAT requires understanding GC root chains
- Production allocation profiling overhead should stay under 5%
Interview Questions
A TLAB (Thread-Local Allocation Buffer) is a region of Eden space assigned to a single application thread. When a thread allocates an object, it simply writes the object into its TLAB and bumps a pointer. No locks, no CAS, no contention. The problem for profiling is that this fast path never calls into the VM or touches a shared data structure, so there is nothing cheap to hook. To profile allocations with low overhead, you need sampling-based approaches (like async-profiler's alloc mode) or JVM events like ObjectAllocationInNewTLAB from JFR, which the JVM emits when an allocation forces the thread to take a new TLAB.
Allocation profiling tells you where objects are born. A retention leak is about why objects keep living when they should not. To find retention leaks, take a heap dump and analyze it with MAT or similar tools. Look at the objects with the largest retained size first, then trace their GC root chains using "Path to GC Roots" (excluding weak references). You are looking for unexpected references keeping collections, caches, or stateful objects alive long past their useful life. Common culprits are: maps used as caches without eviction, listeners registered but never unregistered, static collections accumulating entries, and objects stored in a ThreadLocal on pooled threads that live far longer than the request that set them.
Allocation profiling tells you how many objects of each type or from each call site are being created per second. It answers "where are objects born." Heap walking traverses the graph of live objects to find references between them. It answers "why is this object still alive" and "what is keeping my heap full." Allocation profiling can run with low overhead in production (1-5%). Heap walking requires stopping the world (for a consistent snapshot) or a very sophisticated concurrent algorithm, and the analysis is interactive, not real-time.
For CPU profiling, async-profiler uses perf_events or a SIGPROF timer to interrupt running threads and walks each stack with the HotSpot-specific AsyncGetCallTrace API. Unlike JVMTI GetStackTrace, this works outside safepoints, so the samples do not suffer from safepoint bias. By itself this is just CPU profiling. Allocation profiling does not rely on timer signals: a stack trace is recorded when a thread allocates in a new TLAB or outside a TLAB, using JVMTI SampledObjectAlloc events (the can_generate_sampled_object_alloc_events capability) on JDK 11+ and hooks on HotSpot's TLAB slow path on older JDKs. The combination gives you allocation call stacks with overhead typically under 5%.
Java heap is just one part of a JVM process's memory footprint. Native memory usage includes: direct ByteBuffer allocations (via Unsafe.allocateMemory), JIT compiled code stored in the code cache, native code generated for dynamic languages, memory-mapped files used by JVM internals, thread stacks (typically 1MB each), and JNI allocations. Use -XX:NativeMemoryTracking=detail and jcmd to see the breakdown. Common culprits for unexplained native memory growth are: DirectByteBuffer leaks (when ByteBuffers are not properly released), classloader leaks (holding onto native libraries), and metaspace growth from dynamic class loading.
TLAB size is computed dynamically when -XX:+ResizeTLAB is on (the default). HotSpot tracks each thread's share of Eden allocation as an exponentially weighted average and sizes the TLAB so that the thread refills it a target number of times between young collections. Roughly, desired TLAB size = thread's allocation fraction * Eden capacity / target refills, where the target is derived from -XX:TLABWasteTargetPercent. A thread that allocates more gets a larger TLAB, which reduces refill overhead. When a thread exits or a collection starts, its TLAB is retired and the unused tail is filled with a dummy object so the heap stays parsable. -XX:MinTLABSize sets the lower bound and -XX:TLABSize sets the initial size, but with resizing enabled the JVM adjusts from there based on observed allocation behavior.
An object that does not fit in the remaining space of the current TLAB takes one of two paths. If the leftover space is small, the thread retires the TLAB and requests a new one. If discarding it would waste too much space, or the object is larger than a TLAB can be, the object is allocated directly in the shared heap. These allocations appear in ObjectAllocationOutsideTLAB events rather than ObjectAllocationInNewTLAB. Very large objects are a separate case in G1: anything at least half a region in size is a humongous allocation, placed in dedicated contiguous regions outside Eden and never copied through the survivor spaces. Frequent humongous allocations fragment the heap and can trigger extra collections, a common source of GC performance issues when applications allocate large buffers or cache entries.
G1 (Garbage First) uses a different heap layout than Parallel GC: it divides the heap into many equally sized regions instead of contiguous Eden/Survivor/Old spaces. TLABs still exist within G1's Eden regions, and a TLAB must fit within a single region, so the region size caps the TLAB size. Every young collection evacuates all Eden regions, so unused TLAB space is reclaimed along with them, while old regions are chosen for mixed collections by how much garbage they contain. G1 also treats objects of at least half a region as humongous and allocates them in dedicated regions, bypassing TLABs entirely.
ZGC is a concurrent collector that performs most GC work while the application runs, so GC pauses stay short even on large heaps. That does not make heap walking free: a JVMTI traversal such as FollowReferences still needs a consistent view of the object graph, so expect the walk itself to pause the application in proportion to the live heap. For allocation profiling, ZGC's load barriers add cost to reference loads rather than to the TLAB allocation fast path, and the trade-off is predictable pause times. However, ZGC's concurrent nature means some events (like object free callbacks) may fire at different points compared to stop-the-world collectors, which can affect how profiling data lines up with GC activity.
Escape analysis determines whether an object escapes a method's scope (e.g., returned, stored in a collection, or passed to another thread). When an object does not escape, the JIT can avoid the heap allocation entirely by replacing the object with its individual fields, held in registers or stack slots. This is called "scalar replacement"; HotSpot does not place whole objects on the stack. A scalar-replaced object never appears in heap profiles or heap dumps. This is why you sometimes see fewer allocations in profiles than expected for well-optimized code. The interaction with TLABs is indirect: scalar-replaced objects bypass TLAB allocation entirely since they never touch the heap. Understanding this helps explain discrepancies between the allocations visible in source code and what profiles and GC logs report.
jmap writes heap dumps in the HPROF binary format (-dump:format=b,file=heap.hprof); adding the live option dumps only reachable objects after forcing a full GC. For lighter-weight inspection, jmap -histo prints a class-level histogram without writing a dump. The jmap -heap summary available through JDK 8 moved to jhsdb jmap --heap in JDK 9. For very large heaps, use jcmd <pid> GC.heap_dump instead of jmap because it handles large dumps more gracefully.
Shallow heap is the size of the object itself (just the object header plus field sizes). Retained heap is the total size of the object plus all objects it keeps alive through direct or indirect references. In MAT, the "shallow size" of a HashMap is just the HashMap object size (~56 bytes), while its "retained size" includes the HashMap object plus all Entry objects plus all key-value objects reachable from those entries. Retention leaks cause retained heap to grow while shallow heap stays small. When analyzing dumps, always look at retained size — a large shallow heap object might not be the leak if it only points to small objects.
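The distinction can be made concrete on a toy graph: the retained size of an object is the amount of memory that would become unreachable if that object were removed. The sketch below computes it by brute force with two reachability passes. MAT derives the same figure from a dominator tree, which this example does not attempt. The RetainedSize class, the string node names, and the sizes are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RetainedSize {

    /** Total shallow size of everything reachable from the roots, skipping 'excluded' (may be null). */
    static long reachableBytes(Map<String, List<String>> refs, Map<String, Long> shallowSizes,
                               Set<String> roots, String excluded) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        for (String root : roots) {
            if (!root.equals(excluded) && seen.add(root)) stack.push(root);
        }
        long total = 0;
        while (!stack.isEmpty()) {
            String current = stack.pop();
            total += shallowSizes.getOrDefault(current, 0L);
            for (String next : refs.getOrDefault(current, List.of())) {
                if (!next.equals(excluded) && seen.add(next)) stack.push(next);
            }
        }
        return total;
    }

    /** Retained size = reachable bytes with the target present minus reachable bytes without it. */
    static long retainedBytes(Map<String, List<String>> refs, Map<String, Long> shallowSizes,
                              Set<String> roots, String target) {
        return reachableBytes(refs, shallowSizes, roots, null)
                - reachableBytes(refs, shallowSizes, roots, target);
    }
}
```

An object that is also reachable through some other path is not counted, which is why a shared value referenced from two places adds nothing to the retained size of either holder.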
Card marking is how the JVM tracks which parts of the old generation might contain references into the young generation. The heap is divided into "cards" (typically 512 bytes). When a reference field is written, a write barrier marks the card containing that field as "dirty." During young GC, only dirty cards need to be scanned for old-to-young references instead of the entire old generation. G1 builds its remembered sets from the same card information. Excessive card dirtying from frequent reference updates in old-generation objects can cause GC performance issues, a pattern common in applications with large write-heavy caches.
Objects with a finalize() method are not reclaimed immediately when unreachable — they are placed in the finalization queue and finalized by a dedicated thread before being reclaimed. During this time, the object is still alive (retained in heap) even though it is effectively dead. MemoryMXBean.getObjectPendingFinalizationCount() tells you how many objects are waiting for finalization. Large numbers of objects pending finalization indicate finalizer threads are overwhelmed or finalizers are slow. This is a common source of memory leaks because objects that should die quickly accumulate until the finalizer runs. Using try-finally or Cleaner (Java 9+) is preferred over finalization.
When large objects appear in the leak suspect list, trace their inbound references backward (not forward) using "Path to GC Roots" in MAT. The pattern to look for: the large object is a container (List, Map, cache) and its size reflects accumulated entries. If the container itself appears in the leak list, find what is keeping it alive — usually a static reference, a singleton cache, or a thread-local map. If the large object is a byte array or DirectByteBuffer, the leak is likely in NIO buffer management or native memory. Check the GC roots to see if it is reachable from thread stacks, static fields, or classloader hierarchies.
Weak references (WeakHashMap, WeakReference) are cleared on the next GC cycle if the referent is only weakly reachable, so they do not prevent garbage collection. Soft references are cleared only when the JVM is low on memory, making them suitable for memory-sensitive caches. Phantom references are enqueued after an object is finalized but before its memory is reclaimed, and are used for post-mortem cleanup. For leak analysis, WeakHashMap is a common culprit: only its keys are weakly referenced, so an entry whose value strongly references its own key is never cleared and the map grows like any other unbounded cache.
async-profiler in alloc mode does not receive every allocation. It records the allocations that trigger a TLAB refill or land outside a TLAB, so by default the effective sampling interval is roughly the TLAB size. The interval is measured in bytes of allocated memory, not in number of allocations, and is configurable via the -i flag (or --alloc in recent versions); for example, an interval of 1000000 takes about one sample per megabyte allocated. Larger intervals lower the overhead at the cost of resolution. This is why allocation profiles show sampling statistics rather than exact counts.
Object alignment (controlled by -XX:ObjectAlignmentInBytes, default 8 bytes) ensures objects start at addresses that are multiples of this value. This affects both object size (padded to alignment) and TLAB sizing (TLAB size must be a multiple of alignment). Larger alignment reduces fragmentation and can improve performance on architectures with efficient vector operations, but increases memory overhead (a 17-byte object becomes 24 bytes with 8-byte alignment). TLAB sizing algorithms use alignment to ensure TLAB boundaries align with object boundaries, preventing wasted space from cross-TLAB allocations.
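The padding arithmetic is a standard round-up to the next multiple of a power of two. A short sketch; the ObjectAlignment class is a hypothetical helper and not JVM code.

```java
final class ObjectAlignment {

    /**
     * Rounds size up to the next multiple of alignment.
     * The alignment must be a positive power of two, as it is for
     * -XX:ObjectAlignmentInBytes.
     */
    static long alignUp(long size, int alignment) {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException("alignment must be a power of two: " + alignment);
        }
        long mask = alignment - 1L;
        return (size + mask) & ~mask;   // add the mask, then clear the low bits
    }
}
```

With the default 8-byte alignment, alignUp(17, 8) gives 24 and alignUp(16, 8) stays at 16; raising the alignment to 16 turns the same 17-byte object into 32 bytes.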
The dead code trap occurs when you optimize allocation sites based on profiling data that includes code paths that are no longer executed in production (e.g., debugging code, feature flags, or dead branches after refactoring). This makes the allocation profile misleading — you spend effort optimizing code that does not matter. To avoid it: verify allocation hotspots against production traffic patterns, use production traffic for profiling rather than synthetic benchmarks, and re-profile after code changes that remove or add allocation-heavy paths.
Heap dumps are stop-the-world: the JVM pauses all threads at a safepoint while the heap is written to disk. On large heaps (100GB+), this can cause minutes-long pauses. Safer approaches: use jcmd <pid> GC.heap_dump instead of jmap -dump as it handles errors more gracefully; use -XX:+HeapDumpOnOutOfMemoryError to capture dumps only on OOM, when the application is already failing; take the instance out of the load balancer before dumping; or capture an OS-level core dump, which is usually faster, and convert it to a heap dump offline with jhsdb jmap --binaryheap.
Further Reading
- async-profiler GitHub - Allocation profiling and flame graphs
- Eclipse MAT Documentation - Memory Analyzer Tool reference
- JEP 425: Virtual Threads (Java 19+) - How TLAB interacts with virtual threads
- HSDB / jhsdb Tutorial - Using the HotSpot Debugger
- NMT (Native Memory Tracking) - Tracking off-heap memory with jcmd
Conclusion
The key to effective heap analysis is picking the right tool for the question you are asking. Allocation profiling tells you where objects are born — useful for finding churn hot spots — while heap walking tells you why objects are still alive — essential for finding retention leaks. TLABs keep allocation fast and lock-free but add complexity when profiling. For production diagnostics, use JFR allocation events or async-profiler; for deep retention analysis, plan for heap dumps during maintenance windows and use MAT to trace GC root chains.