Execution Engine: Interpreter, JIT Compiler, and Garbage Collector

Deep dive into the JVM Execution Engine covering bytecode interpretation, JIT compilation, and Garbage Collector architecture and algorithms.

Reading time: 27 min | Author: GeekWorkBench



Introduction

The Execution Engine is where the JVM actually runs your code. It consumes bytecode from the Runtime Data Areas and executes it through one of two paths: interpretation, which reads and executes bytecode instructions one at a time, or JIT compilation, which translates hot bytecode to native machine code at runtime for much faster execution. The third major component of the execution engine is the Garbage Collector, which reclaims memory from objects that are no longer reachable. Together, these three pieces determine your application’s performance characteristics — startup speed, peak throughput, and latency stability.

Understanding the Execution Engine gives you the mental model for reasoning about JIT warmup behavior, choosing the right GC algorithm, and diagnosing performance problems in production. When your application starts slowly and then speeds up, that is tiered compilation in action. When you see periodic latency spikes despite ample heap, that is often Full GC pause behavior. When your CPU usage spikes during startup and then stabilizes, that is the JIT compiler at work.

This guide covers the interpreter, JIT compiler, and garbage collector as integrated components of the JVM execution pipeline. You will learn how bytecode becomes native code, which JIT optimizations drive the biggest performance gains, how the GC identifies and reclaims unreachable objects, and which collector to choose based on your latency versus throughput requirements.

When NOT to Use

If your application does not have strict latency or throughput requirements, you can treat the interpreter, JIT compiler, and garbage collector as black boxes. The JVM defaults handle most workloads adequately. Do not switch GC algorithms or disable tiered compilation without measurements showing actual problems. A service handling hundreds of requests per second with consistent latency does not need ZGC; G1 is fine.

Avoid tuning JIT internals unless you are tracking down a specific issue. Compilation thresholds, inlining limits, and escape analysis settings are JVM-specific and change between versions. Time spent tuning -XX:CompileCommand flags is usually better spent improving your hot code paths or their algorithmic complexity. The exception is when async-profiler, Java Flight Recorder, or GC logs give you clear evidence that compilation itself is the problem.

Pick your GC algorithm based on data, not preference. G1 balances latency and throughput across a wide range of workloads. Batch jobs that care about throughput may run faster with Parallel GC, and trading systems with ultra-low-latency requirements may need ZGC or Shenandoah. If you do not have measurable latency outliers violating your SLAs, stick with the G1 defaults. The Runtime Data Areas and JVM Startup and Shutdown articles cover the memory layout and initialization context that sit underneath the execution engine.

Execution Engine Architecture

```mermaid
graph LR
    Bytecode["Bytecode<br/>Instructions"]

    subgraph "Execution Engine"
        Interpreter["Interpreter"]
        JIT["JIT Compiler<br/>Client/Server"]
        GC["Garbage<br/>Collector"]
        JNI["JNI/Native<br/>Interface"]
    end

    Execution["Native Code<br/>Execution"]

    Bytecode -->|Hot Code| JIT
    Bytecode -->|Cold Code| Interpreter
    Interpreter -->|Profile| JIT
    JIT -->|Native Code| Execution
    Execution --> JNI
    JNI --> GC
```

The Interpreter

The interpreter reads bytecode instructions one at a time and executes them directly. When the JVM starts, it uses the interpreter for all code. Interpretation is fast to start but slow to execute because each instruction requires a lookup and dispatch cycle.

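Conceptually, the interpreter switches on each opcode and jumps to the matching implementation. HotSpot actually uses a template interpreter, which generates a machine-code snippet for each bytecode at startup, but the per-instruction fetch and dispatch remains. This dispatch overhead adds up. Even a simple bytecode such as iadd costs several native machine instructions: it must pop its operands from the operand stack, add them, and push the result.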
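The toy stack-machine interpreter below makes the dispatch cost concrete. Its opcodes (PUSH, ADD, MUL, HALT) and the int[] program encoding are invented for this sketch and are not real JVM bytecode. What it does share with a real interpreter is the fetch, decode, and dispatch step that runs once per instruction.

```java
/** Toy stack interpreter: hypothetical opcodes, not real JVM bytecode. */
final class ToyInterpreter {
    // Hypothetical opcodes for this sketch
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    /** Executes the program and returns the value left on top of the operand stack. */
    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0; // operand stack pointer
        int pc = 0; // program counter
        while (true) {
            int opcode = code[pc++];            // fetch
            switch (opcode) {                   // decode + dispatch, paid on every instruction
                case PUSH -> stack[sp++] = code[pc++];
                case ADD -> { int b = stack[--sp]; int a = stack[--sp]; stack[sp++] = a + b; }
                case MUL -> { int b = stack[--sp]; int a = stack[--sp]; stack[sp++] = a * b; }
                case HALT -> { return stack[sp - 1]; }
                default -> throw new IllegalStateException("bad opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```

Every instruction here pays for the opcode load, the switch jump, and the operand-stack traffic, however trivial the instruction is. A JIT compiler would emit a few register operations for the same expression. Because the operands here are constants, it could also fold the whole expression to 20 outright.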

The interpreter records profiling information: which methods are called frequently, which branches are taken, which types flow through each call site. This profiling data feeds the JIT compiler.

Tiered Compilation

Modern JVMs use tiered compilation to balance startup speed and peak performance:

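HotSpot numbers its compilation tiers as levels 0 through 4:

  • Level 0: Interpreted bytecode. Fast startup, slower execution, collects profiling data.
  • Level 1: C1 (client compiler) with full optimization and no profiling. Used for trivial methods that would not benefit from C2.
  • Level 2: C1 with limited profiling (invocation and back-edge counters only). Used mainly when the C2 queue is backed up.
  • Level 3: C1 with full profiling. The usual first compilation for a warm method: quick native code that keeps gathering type and branch profiles.
  • Level 4: C2 (server compiler). Maximum optimization for proven hot paths, guided by the profile collected at level 3.

The common path for a hot method is level 0 → 3 → 4.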

The JVM promotes methods through tiers as they become hotter. Deoptimization can cause a method to drop back to a lower tier if assumptions are violated.
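The promotion logic can be pictured as a small state machine. This is a simplified model built on stated assumptions. It uses HotSpot's level numbers (0 = interpreter, 3 = C1 with full profiling, 4 = C2), but the two thresholds are hypothetical round numbers. HotSpot's real policy also weighs loop back-edge counts and compile-queue lengths.

```java
/** Toy model of tiered promotion: 0 = interpreter, 3 = C1 with profiling, 4 = C2. */
final class TierModel {
    // Hypothetical thresholds for illustration only
    static final int C1_THRESHOLD = 200;
    static final int C2_THRESHOLD = 5_000;

    private int level = 0;        // current compilation level
    private long invocations = 0; // invocation counter kept while interpreted/profiled

    /** Records one invocation and promotes the method when a threshold is crossed. */
    void invoke() {
        invocations++;
        if (level == 0 && invocations >= C1_THRESHOLD) {
            level = 3; // quick C1 compile that keeps collecting profile data
        } else if (level == 3 && invocations >= C2_THRESHOLD) {
            level = 4; // optimized C2 compile driven by the collected profile
        }
    }

    /** A violated speculation discards compiled code; this toy also restarts counting. */
    void deoptimize() {
        level = 0;
        invocations = 0;
    }

    int level() {
        return level;
    }

    public static void main(String[] args) {
        TierModel m = new TierModel();
        for (int i = 0; i < 10_000; i++) m.invoke();
        System.out.println("level after warmup: " + m.level()); // prints level after warmup: 4
        m.deoptimize();
        System.out.println("level after deopt: " + m.level());  // prints level after deopt: 0
    }
}
```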

Just-In-Time Compiler

The JIT compiler solves the interpretation bottleneck by compiling hot bytecode to native machine code at runtime. Once a method has been invoked often enough, a background compiler thread spends CPU time compiling it. After the compiled code is installed, subsequent calls run the native code instead of the interpreter.

JIT Compilation Process

  1. Profiling: The interpreter collects execution statistics. Which methods are hot? What types flow through each call site? Which branches are taken most frequently?

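  2. Compilation Request: When a method's invocation count, or the back-edge count of a loop inside it, exceeds the compilation threshold, the method is queued for compilation. A hot loop inside a long-running method can be compiled and entered mid-execution through on-stack replacement (OSR).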

  3. Code Generation: The JIT compiler generates native machine code. Different JVMs use different compilers: HotSpot uses C1 (client) and C2 (server); GraalVM uses Graal as a modern alternative.

  4. Installation: The compiled code replaces the interpreted version. Calls to this method now go directly to native code.

  5. Deoptimization: If assumptions are violated (e.g., a class gets loaded that invalidates type speculation), the JVM can revert to interpreted mode.
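You can watch steps 2 through 4 happen with HotSpot's -XX:+PrintCompilation flag. It prints a line as each method is compiled and marks code that is later invalidated as “made not entrant”. The class below is a hypothetical demo; run it with `java -XX:+PrintCompilation WarmupDemo.java`. Exact timings depend on the machine and JDK, so none are claimed here.

```java
/** Warmup demo: run with `java -XX:+PrintCompilation WarmupDemo.java` to see JIT activity. */
final class WarmupDemo {
    /** A small hot method: sum of squares modulo a prime, so the JIT has real work to compile. */
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum = (sum + (long) i * i) % 1_000_000_007L;
        }
        return sum;
    }

    public static void main(String[] args) {
        long checksum = 0;
        for (int round = 0; round < 5; round++) {
            long start = System.nanoTime();
            for (int call = 0; call < 2_000; call++) {
                checksum += sumOfSquares(1_000);
            }
            long micros = (System.nanoTime() - start) / 1_000;
            // Early rounds typically run slower (interpreter, then C1) than later ones (C2)
            System.out.printf("round %d: %d us%n", round, micros);
        }
        System.out.println("checksum " + checksum); // keeps the work observable
    }
}
```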

JIT Optimizations

The JIT compiler performs aggressive optimizations based on profiling data:

Inlining replaces method calls with the method body. This eliminates call overhead and enables further optimizations across method boundaries. The JIT inlines small methods and hot call sites automatically.
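Written out by hand, the effect of inlining looks like the sketch below. The JIT works on its intermediate representation, not on source code. So sumAfter only illustrates what the compiled code for sumBefore effectively becomes once the tiny accessors are inlined. The class names are hypothetical.

```java
/** Illustrates inlining by hand; the JIT performs this on its IR, not on source. */
final class InliningIllustration {
    static final class Point {
        private final int x, y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        int getX() { return x; } // tiny accessor: a prime inlining candidate
        int getY() { return y; }
    }

    /** As written: two method calls per element. */
    static long sumBefore(Point[] points) {
        long total = 0;
        for (Point p : points) total += p.getX() + p.getY();
        return total;
    }

    /** What the compiled code effectively does after inlining the accessors. */
    static long sumAfter(Point[] points) {
        long total = 0;
        for (Point p : points) total += p.x + p.y; // no calls, just field loads
        return total;
    }

    public static void main(String[] args) {
        Point[] pts = {new Point(1, 2), new Point(3, 4)};
        System.out.println(sumBefore(pts) + " == " + sumAfter(pts)); // prints 10 == 10
    }
}
```

Once the calls are gone, the loop body is visible as plain field loads and additions. That is what lets further optimizations such as unrolling or vectorization apply.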

Dead Code Elimination removes code whose results are never used. If a computation is performed but its result is discarded, the JIT can eliminate the computation entirely.
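Dead code elimination is the classic microbenchmark trap. In this hypothetical demo, discardsResult computes values that nothing reads. Once C2 has inlined the side-effect-free work method, it may remove the loop body entirely, and the timing collapses. keepsResult returns its accumulator, so its work must stay. Whether and when the elimination happens depends on the JVM version. For this reason, benchmark harnesses such as JMH feed results into a Blackhole.

```java
/** Dead code elimination sketch: why naive microbenchmarks can end up measuring nothing. */
final class DeadCodeDemo {
    /** Side-effect free, so an unused result is dead code. */
    static double work(int i) {
        return Math.sqrt(i) * Math.log(i + 1);
    }

    /** The result of work(i) is never used; after inlining, C2 may drop the computation. */
    static void discardsResult(int n) {
        for (int i = 0; i < n; i++) {
            work(i);
        }
    }

    /** The results feed a returned value, so the computation must be kept. */
    static double keepsResult(int n) {
        double acc = 0;
        for (int i = 0; i < n; i++) {
            acc += work(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            discardsResult(1_000_000);
            long t1 = System.nanoTime();
            double r = keepsResult(1_000_000);
            long t2 = System.nanoTime();
            System.out.printf("discarded: %d us, kept: %d us (result %.3f)%n",
                (t1 - t0) / 1_000, (t2 - t1) / 1_000, r);
        }
    }
}
```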

Constant Folding evaluates constant expressions at compile time. If x is always 10, expressions like x + 5 become 15.
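Constant folding happens at two levels. javac already folds compile-time constants, so X + 5 is emitted as 15 in the bytecode. The JIT can go further with static final fields whose values are only known once the class is initialized, because HotSpot treats such fields as constants in compiled code. The system property name demo.limit below is hypothetical.

```java
final class ConstantFoldingDemo {
    // Compile-time constant: javac already folds X + 5 into 15 in the bytecode
    static final int X = 10;

    // Not a compile-time constant: read from a (hypothetical) system property at class init.
    // Because it is static final, the JIT can treat its value as a constant once the class
    // is initialized, fold LIMIT * 4 + 2, and drop branches that depend on it.
    static final int LIMIT = Integer.getInteger("demo.limit", 10);

    static int plusFive() {
        return X + 5;
    }

    static int scaledLimit() {
        return LIMIT * 4 + 2;
    }

    static String mode() {
        // With LIMIT known, compiled code only needs one arm of this branch
        return LIMIT > 100 ? "large" : "small";
    }

    public static void main(String[] args) {
        // prints 15 42 small when demo.limit is unset
        System.out.println(plusFive() + " " + scaledLimit() + " " + mode());
    }
}
```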

Loop Unrolling replicates the loop body so that each pass through the loop performs several iterations. This reduces the number of branches and counter checks and enables vectorization.
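C2 unrolls hot loops automatically, so you rarely need to do it by hand. The sketch below shows the transformation explicitly. The unroll factor of 4 is an arbitrary choice for illustration, and the trailing loop handles the 0 to 3 leftover elements.

```java
final class LoopUnrollingDemo {
    /** Straightforward loop: one bounds/counter check and branch per element. */
    static long sumSimple(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i];
        return sum;
    }

    /** Unrolled by 4 by hand: one check and branch per four elements, plus a remainder loop. */
    static long sumUnrolled(int[] a) {
        long sum = 0;
        int i = 0;
        int limit = a.length - (a.length % 4);
        for (; i < limit; i += 4) {
            sum += a[i];
            sum += a[i + 1];
            sum += a[i + 2];
            sum += a[i + 3];
        }
        for (; i < a.length; i++) sum += a[i]; // leftover 0-3 elements
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7};
        System.out.println(sumSimple(data) + " " + sumUnrolled(data)); // prints 28 28
    }
}
```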

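Escape Analysis determines whether an object escapes the method or thread that created it. If an object does not escape, the JIT can avoid the heap allocation altogether. HotSpot's C2 does this through scalar replacement: it breaks the object into its individual fields and keeps them in registers or stack slots. That removes both the allocation and the garbage collection work for that object.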
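A typical escape-analysis candidate is a small value object used only for intermediate math. In the hypothetical example below, the three Vec instances never leave distance, so after inlining C2 can scalar-replace them. To observe the effect, run the class with -Xlog:gc, then again with -XX:-DoEscapeAnalysis (the optimization is on by default). With it disabled you would likely see far more young collections.

```java
/** Escape analysis candidate: none of the Vec objects leaves distance(). */
final class EscapeAnalysisDemo {
    record Vec(double x, double y) {
        Vec minus(Vec o) {
            return new Vec(x - o.x, y - o.y);
        }

        double length() {
            return Math.sqrt(x * x + y * y);
        }
    }

    /** Three Vec allocations, none of which escapes this method. */
    static double distance(double x1, double y1, double x2, double y2) {
        Vec a = new Vec(x1, y1);
        Vec b = new Vec(x2, y2);
        return a.minus(b).length();
    }

    public static void main(String[] args) {
        double total = 0;
        for (int i = 0; i < 50_000_000; i++) {
            total += distance(i, 0, 0, i);
        }
        System.out.println(total); // consume the result so the loop is not dead code
    }
}
```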

Garbage Collection Architecture

The Garbage Collector automatically reclaims memory from objects that are no longer reachable. This eliminates dangling pointers and use-after-free bugs while reducing manual memory management burden.

GC Roots

The GC identifies reachable objects by tracing from GC roots. GC roots are objects that are inherently reachable:

  • Local variables and parameters on the JVM stack
  • Active Java threads
  • Static variables (loaded from class metadata)
  • JNI references
  • Exception objects currently being thrown

Anything not reachable from GC roots through object references is garbage.
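Marking is a graph traversal from the roots. The toy below models objects as integers and references as adjacency lists. This representation is hypothetical and is not the JVM's object layout. Objects 4 and 5, which reference each other but are not reachable from a root, are not marked. A tracing collector reclaims such cycles, where naive reference counting would leak them.

```java
import java.util.*;

/** Toy reachability marking: objects are ints, references are adjacency lists. */
final class ToyMarker {
    /** Returns every object reachable from the roots (depth-first, explicit stack). */
    static Set<Integer> mark(Map<Integer, List<Integer>> refs, Collection<Integer> roots) {
        Set<Integer> marked = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            int obj = stack.pop();
            if (marked.add(obj)) { // first visit: mark it, then trace its references
                for (int child : refs.getOrDefault(obj, List.of())) {
                    stack.push(child);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        // 1 -> 2 -> 3 is reachable from root 1; 4 <-> 5 is an unreachable cycle (garbage)
        Map<Integer, List<Integer>> refs = Map.of(
            1, List.of(2),
            2, List.of(3),
            4, List.of(5),
            5, List.of(4));
        Set<Integer> live = mark(refs, List.of(1));
        System.out.println("live: " + new TreeSet<>(live)); // prints live: [1, 2, 3]
    }
}
```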

GC Algorithms

Modern JVMs offer several GC algorithms, each optimized for different workload characteristics:

Serial GC uses a single thread for all GC work. It stops the world (STW) during collections. Appropriate for small heaps and single-threaded applications. Enabled with -XX:+UseSerialGC.

Parallel GC (Throughput GC) uses multiple threads to speed up GC. Designed to maximize throughput by completing GC quickly. Appropriate for batch workloads. Enabled with -XX:+UseParallelGC.

CMS (Concurrent Mark Sweep) aims to reduce pause times by doing most work concurrently with application threads. It was deprecated in Java 9 and removed in Java 14. It was intended for interactive applications that cannot tolerate long pauses. Enabled with -XX:+UseConcMarkSweepGC on JDK 13 and earlier.

G1 (Garbage First) divides the heap into regions and collects garbage region by region. It aims to meet pause time goals while delivering high throughput. Default GC since Java 9. Enabled with -XX:+UseG1GC.

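ZGC and Shenandoah are ultra-low-pause collectors designed for applications requiring minimal latency. They perform most work, including object relocation, concurrently with the application. This keeps pause times very short (typically around a millisecond or less) and largely independent of heap size. Enabled with -XX:+UseZGC or -XX:+UseShenandoahGC; Shenandoah is not included in every JDK build.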
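To confirm which collector a JVM is actually running, you can query the standard java.lang.management API. Bean names vary by collector and JDK version, so this sketch prints whatever the JVM reports. Try it with different flags such as -XX:+UseParallelGC. The class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Prints the collectors the running JVM uses, e.g. `java -XX:+UseParallelGC ActiveGc.java`. */
final class ActiveGc {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Cumulative count and approximate elapsed time since JVM start
            System.out.printf("%s: %d collections, %d ms total%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```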

GC Phases

Most GC algorithms work in phases:

Mark identifies all reachable objects by tracing from GC roots. This phase is often done with the application paused (stop-the-world).

Remark completes marking by handling edge cases like objects modified during the concurrent mark phase.

Sweep/Compact reclaims memory from unreachable objects and moves surviving objects to eliminate fragmentation.
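The compaction step can be sketched as a sliding compactor over an array of slots. This toy rests on simplifying assumptions. Objects are strings, and liveness comes from a precomputed mark array. The reference-updating pass a real collector needs, which rewrites every pointer to a moved object, is omitted.

```java
import java.util.Arrays;

/** Toy sweep + sliding compaction over a heap modeled as an array of slots. */
final class ToyCompactor {
    /**
     * Slides live objects (marked[i] == true) to the start of the heap, preserving order,
     * and clears the freed tail. Returns the new allocation pointer (first free slot).
     */
    static int compact(String[] heap, boolean[] marked) {
        int free = 0; // next destination slot
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                heap[free++] = heap[i]; // move the survivor down
            }
        }
        Arrays.fill(heap, free, heap.length, null); // everything past 'free' is reclaimed
        return free;
    }

    public static void main(String[] args) {
        String[] heap = {"a", "x", "b", "y", "c"};
        boolean[] marked = {true, false, true, false, true};
        int top = compact(heap, marked);
        System.out.println(top + " " + Arrays.toString(heap)); // prints 3 [a, b, c, null, null]
    }
}
```

After compaction, free space is one contiguous block. New allocations can therefore use a simple pointer bump starting at the returned index.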

Young vs. Old Generation GC

Minor GC (young GC) collects the young generation. It is usually fast because the young generation is small and most objects are short-lived. Minor GC is stop-the-world but brief.

Major GC (old GC) collects the old generation. It is slower because the old generation is larger. The G1 algorithm refers to “mixed GC” when it collects both young and old regions.

Full GC collects the entire heap. It is the most disruptive and usually indicates a problem: either the heap is too small or a memory leak exists.
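The generational behavior behind minor GC can be modeled in a few lines. Dead young objects cost nothing to reclaim, and survivors age until they reach a tenuring threshold. This toy is a sketch built on assumptions. The threshold is a constructor parameter standing in for -XX:MaxTenuringThreshold, and real collectors also promote early when survivor space runs out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy generational model: survivors age by one per minor GC and are promoted at a threshold. */
final class ToyGenerations {
    static final class Obj {
        final String name;
        int age; // number of minor GCs survived

        Obj(String name) {
            this.name = name;
        }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    private final int tenuringThreshold; // stands in for -XX:MaxTenuringThreshold

    ToyGenerations(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    void allocate(String name) {
        young.add(new Obj(name)); // new objects always start in the young generation
    }

    /** Minor GC: dead young objects are simply skipped; survivors age and may be promoted. */
    void minorGc(Set<String> live) {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (!live.contains(o.name)) {
                continue; // dead: reclaimed without being copied
            }
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o);       // promoted (tenured) to the old generation
            } else {
                survivors.add(o); // copied to a survivor space
            }
        }
        young.clear();
        young.addAll(survivors);
    }

    public static void main(String[] args) {
        ToyGenerations heap = new ToyGenerations(2);
        heap.allocate("request"); // short-lived
        heap.allocate("session"); // long-lived
        heap.minorGc(Set.of("session"));
        heap.minorGc(Set.of("session"));
        System.out.println("young=" + heap.young.size() + " old=" + heap.old.size()); // prints young=0 old=1
    }
}
```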

Production Failure Scenarios

| Scenario | Root Cause | Symptoms | Resolution |
|---|---|---|---|
| GC Thrashing | Heap too small, excessive short-lived objects | High CPU, frequent GC, application pauses | Increase heap, optimize allocation patterns |
| Long GC Pauses | Wrong GC algorithm for workload, heap misconfiguration | Latency spikes, timeouts | Switch to low-pause GC (G1, ZGC), tune parameters |
| JIT Compilation Overhead | Excessive hot methods, compilation queue backlog | Sudden CPU spikes during warmup | Tiered compilation flags, increase code cache |
| Deoptimization Storms | Volatile type assumptions, class loading patterns | Performance instability | Reduce dynamic class loading, stabilize types |
| Metaspace GC Pressure | Classloader churn | High GC CPU, metaspace growth | Audit classloader lifecycle, limit dynamic proxies |
| Direct Buffer Memory Exhaustion | Excessive NIO buffer allocation | Native OOM, disk swap | -XX:MaxDirectMemorySize, audit ByteBuffer usage |

Trade-off Analysis

| Trade-off | Considerations |
|---|---|
| Throughput vs. Latency | Parallel GC maximizes throughput but causes long pauses. ZGC minimizes pauses but uses more CPU. G1 balances both. |
| Memory Footprint vs. Fragmentation | Compacting collectors eliminate fragmentation but spend extra memory and CPU moving objects. Non-compacting collectors are cheaper per collection but leave the heap fragmented. |
| JIT Compilation Time vs. Run Time | More aggressive JIT compilation improves peak performance but costs more CPU during warmup and more code cache memory. |
| Young vs. Old Collection | A small, frequently collected young generation keeps each young collection cheap but can promote objects prematurely. Old collections are rare but expensive when they happen. |
| Concurrent vs. Stop-The-World | Concurrent GC reduces pause times but uses CPU that could run application code. STW collectors use less total CPU but pause every application thread while they run. |

Failure Scenarios Deep Dive

Deoptimization Storms in Polymorphic Call Sites

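When a method call site is polymorphic (calling different implementations depending on the receiver type), the JIT compiler relies on the type profile gathered so far. If only one receiver type has been seen, C2 inlines that implementation behind a cheap type guard (monomorphic). If it has seen two types, it can inline both (bimorphic). When a new type shows up, the guard fails and the compiled code is deoptimized. The method is then recompiled, eventually falling back to slower megamorphic virtual dispatch. This can cause “deoptimization storms” when class loading is uneven or when libraries dynamically generate proxy classes. On recent JDKs, deoptimization events show up in Java Flight Recorder (the jdk.Deoptimization event). Invalidated code also appears as “made not entrant” lines in -XX:+PrintCompilation output. The fix usually involves reducing dynamic proxy generation or keeping hot call sites monomorphic or bimorphic, for example by splitting a shared call site by type.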
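This sequence can be reproduced with a small hypothetical program. The s.area() call site in total first sees only Circle, then Square, then Tri. If you run it with -XX:+PrintCompilation, you may see total recompiled and earlier versions marked “made not entrant” as the call site's profile widens. The exact decisions depend on the JDK version and its thresholds.

```java
import java.util.Arrays;

/** Call-site profile demo: run with `java -XX:+PrintCompilation CallSiteDemo.java`. */
final class CallSiteDemo {
    interface Shape { double area(); }

    record Circle(double r) implements Shape { public double area() { return Math.PI * r * r; } }
    record Square(double s) implements Shape { public double area() { return s * s; } }
    record Tri(double b, double h) implements Shape { public double area() { return 0.5 * b * h; } }

    /** The s.area() call site is the one whose receiver-type profile matters. */
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area();
        return sum;
    }

    static Shape[] fill(int n, Shape s) {
        Shape[] a = new Shape[n];
        Arrays.fill(a, s);
        return a;
    }

    public static void main(String[] args) {
        double checksum = 0;
        // Phase 1: monomorphic. Only Circle seen, so C2 may inline Circle.area() behind a guard
        for (int i = 0; i < 20_000; i++) checksum += total(fill(100, new Circle(1)));
        // Phase 2: a second type fails the guard; expect deoptimization, then a bimorphic recompile
        for (int i = 0; i < 20_000; i++) checksum += total(fill(100, new Square(2)));
        // Phase 3: a third type; the site may go megamorphic and use virtual dispatch
        for (int i = 0; i < 20_000; i++) checksum += total(fill(100, new Tri(2, 3)));
        System.out.println(checksum);
    }
}
```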

CMS Concurrent Mode Failure

CMS (Concurrent Mark Sweep) can fail in two ways. First, “concurrent mode failure” occurs when the concurrent collection cycle cannot finish before the old generation fills up, triggering a fallback to a stop-the-world full GC. Second, “promotion failed” occurs when the JVM tries to promote an object from the young to the old generation but cannot find enough contiguous free space in the fragmented old generation. Both show up as unexpectedly long pauses. The fix involves lowering CMSInitiatingOccupancyFraction so collection starts earlier, or switching to G1, which handles fragmentation better. Since CMS was removed in Java 14, migrating to G1 or another supported collector is the only long-term option.

ZGC Heaps Larger Than Available Physical Memory

ZGC is designed for very large heaps (into the multi-terabyte range), but it only delivers low latency when the whole heap is backed by physical memory. A tracing collector regularly walks live objects spread across the entire heap. If the heap exceeds available RAM, the operating system starts swapping heavily, causing dramatic latency increases that defeat the purpose of using ZGC. The rule: ZGC works best when the heap fits comfortably in physical memory. For memory-constrained environments, G1 with proper tuning is often a better choice. Use -XX:ZAllocationSpikeTolerance to handle transient allocation spikes, but do not rely on it to compensate for undersized memory.

Implementation Patterns

```java
// Requesting GC (not recommended for production, but useful for testing)
public class ForceGC {
    public static void main(String[] args) {
        // Drop the only reference so the array becomes eligible for collection
        byte[] bigArray = new byte[1024 * 1024];
        bigArray = null;

        // Request GC: only a hint, which the JVM may ignore
        // (it always does when started with -XX:+DisableExplicitGC)
        System.gc();
    }
}
```

```java
// Soft references for caches (cleared before the JVM throws OutOfMemoryError)
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

public class SoftReferenceCache<K, V> {
    private final Map<K, SoftReference<V>> cache = new HashMap<>();

    public V get(K key) {
        SoftReference<V> ref = cache.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // The GC cleared the referent; drop the stale entry
            cache.remove(key);
        }
        return value;
    }

    public void put(K key, V value) {
        cache.put(key, new SoftReference<>(value));
    }

    public void clear() {
        cache.clear();
    }
}
```

```java
// Weak keys for canonical mappings (an entry disappears once its key is only weakly reachable)
import java.util.WeakHashMap;

public class WeakCache<K, V> {
    // Keys are weakly referenced: after the GC clears a key, its entry is removed.
    // Values are held strongly, so a value must not refer back to its own key.
    private final WeakHashMap<K, V> cache = new WeakHashMap<>();

    public V get(K key) {
        return cache.get(key);
    }

    public void put(K key, V value) {
        cache.put(key, value);
    }
}
```

```java
// Phantom references for post-mortem cleanup (enqueued after the referent becomes unreachable)
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

public class ResourceCleanup {
    // Placeholder type standing in for an object that owns native resources
    static class MyResource {}

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<MyResource> queue = new ReferenceQueue<>();
        // The referent is unreachable immediately; the PhantomReference itself must stay reachable
        PhantomReference<MyResource> ref = new PhantomReference<>(new MyResource(), queue);

        // Request a GC so the unreachable referent is discovered (only a hint)
        System.gc();

        // Wait up to one second for the reference to be enqueued; get() on it always returns null
        Reference<? extends MyResource> polled = queue.remove(1000);
        if (polled != null) {
            // Clean up native resources here, then clear the reference
            polled.clear();
        }
        Reference.reachabilityFence(ref);
    }
}
```
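For post-mortem cleanup, java.lang.ref.Cleaner (Java 9+) wraps the phantom-reference pattern, so you do not manage a ReferenceQueue yourself. The sketch below pairs it with AutoCloseable, so cleanup normally happens explicitly and the Cleaner serves only as a safety net. NativeBuffer and its State class are hypothetical. The rule that matters is that the cleanup action must not hold a reference to the tracked object; otherwise that object never becomes unreachable.

```java
import java.lang.ref.Cleaner;

/** Cleaner-based cleanup, the replacement for finalize(); NativeBuffer is hypothetical. */
final class NativeBuffer implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Cleanup state: must NOT reference the NativeBuffer, or it could never become unreachable. */
    private static final class State implements Runnable {
        private volatile boolean released;

        @Override
        public void run() {
            // Release the (hypothetical) native handle here; Cleaner runs this at most once
            released = true;
        }
    }

    private final State state = new State();
    private final Cleaner.Cleanable cleanable = CLEANER.register(this, state);

    /** Deterministic release; if close() is never called, the Cleaner runs State after GC. */
    @Override
    public void close() {
        cleanable.clean();
    }

    boolean isReleased() {
        return state.released;
    }

    public static void main(String[] args) {
        NativeBuffer buf = new NativeBuffer();
        try (buf) {
            System.out.println("using buffer");
        }
        System.out.println("released: " + buf.isReleased()); // prints released: true
    }
}
```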

Observability Checklist

  • Enable GC logging: -Xlog:gc*:file=gc.log:time,uptime,level,tags
  • Monitor pause times vs. latency SLAs
  • Track allocation and promotion rates (bytes/second allocated in the young generation and promoted to the old generation)
  • Monitor JIT compilation time and code cache size
  • Use jstat to track GC statistics: jstat -gcutil <pid> 1000
  • Monitor deoptimization counts
  • Watch for CMS/G1 old gen exhaustion patterns
  • Track metaspace reclamation after classloader cleanup
  • Use async-profiler or Java Flight Recorder for detailed analysis
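Several of these numbers are also available in-process through the standard management API. The hypothetical helper below reports total JIT compilation time and per-pool memory usage. On recent JDKs, the pool list typically includes the code cache segments and Metaspace alongside the heap pools.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Reports JIT compilation time and memory pool usage (hypothetical helper). */
final class JitAndPoolStats {
    /** Total JIT compilation time in ms, or -1 if this JVM does not report it. */
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean(); // null if no JIT
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        System.out.println("JIT compilation time: " + totalCompilationMillis() + " ms");
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                System.out.printf("%-35s %-16s used=%,d bytes%n",
                    pool.getName(), pool.getType(), usage.getUsed());
            }
        }
    }
}
```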

Security Notes

JIT compilation introduces a security consideration: JIT compilers generate code at runtime. If an attacker can influence what code gets JIT compiled (through polymorphic call patterns, for example), they might be able to trigger generation of specific code sequences. Modern JVMs include protections against JIT spraying attacks.

The GC does not automatically sanitize memory. If you store sensitive data in objects and rely on GC to “erase” it, you may be disappointed. The GC reclaims the memory but does not overwrite it. Use explicit clearing (setting byte arrays to zero) for sensitive data that must not be accessible after use.
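In practice, that means holding secrets in mutable arrays rather than Strings, which are immutable and cannot be wiped, and overwriting them as soon as you are done. This is a minimal sketch with a hypothetical check method. A copying collector may already have left older copies of the array elsewhere in memory, so wiping reduces exposure rather than eliminating it.

```java
import java.util.Arrays;

/** Wiping a secret after use; checkAndWipe is a hypothetical helper. */
final class SecretHandling {
    /** Compares the password and overwrites it afterwards, even if the comparison throws. */
    static boolean checkAndWipe(char[] password, char[] expected) {
        try {
            return Arrays.equals(password, expected); // note: not a constant-time comparison
        } finally {
            Arrays.fill(password, '\0'); // overwrite: the GC would not do this for you
        }
    }

    public static void main(String[] args) {
        char[] input = {'s', '3', 'c', 'r', '3', 't'};
        boolean ok = checkAndWipe(input, "s3cr3t".toCharArray());
        System.out.println(ok + " " + (int) input[0]); // prints true 0
    }
}
```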

Common Pitfalls / Anti-Patterns

Assuming System.gc() forces garbage collection. System.gc() is only a hint to the JVM. The JVM may ignore it entirely. In production, never rely on System.gc() to manage memory.

Setting heap too small based on visible memory usage. The JVM uses more memory than the heap. Metaspace, code cache, thread stacks, native memory allocations, and internal JVM data structures all consume memory outside the heap.
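You can see part of this gap from inside the JVM. The hypothetical class below compares the heap limit with non-heap usage (Metaspace, code cache, and similar pools). Neither number includes thread stacks or direct buffers. For a complete breakdown, start the JVM with -XX:NativeMemoryTracking=summary and run `jcmd <pid> VM.native_memory summary`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

/** Heap limit vs. non-heap usage (hypothetical helper). */
final class HeapVsNonHeap {
    /** Bytes used outside the heap by JVM-managed pools such as Metaspace and the code cache. */
    static long nonHeapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        System.out.printf("max heap (-Xmx): %,d bytes%n", Runtime.getRuntime().maxMemory());
        System.out.printf("heap used:       %,d bytes%n", mem.getHeapMemoryUsage().getUsed());
        // Thread stacks, direct buffers, and malloc'd native memory are not included here
        System.out.printf("non-heap used:   %,d bytes%n", nonHeapUsedBytes());
    }
}
```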

Ignoring GC logs. GC logs are invaluable for diagnosing performance issues. They show pause times, allocation rates, and promotion patterns. Enable them in production.

Over-tuning GC parameters. Modern JVMs auto-tune well. Start with defaults and only tune after measuring. Premature tuning based on intuition often makes things worse.

Assuming more threads means better parallel GC performance. Parallel GC threads compete for CPU. Too many threads cause context switching overhead that outweighs parallelism benefits.

Treating G1 as always the best choice. G1 is the default but not optimal for all workloads. Throughput-focused batch jobs may run faster with Parallel GC. Ultra-low-latency requirements may need ZGC or Shenandoah.

Quick Recap Checklist

  • Interpreter runs bytecode initially, JIT compiles hot code to native
  • Tiered compilation balances startup speed and peak performance
  • JIT optimizations include inlining, dead code elimination, escape analysis
  • GC reclaims unreachable objects automatically
  • GC roots include stack variables, statics, active threads, JNI refs
  • Serial GC is single-threaded; Parallel uses multiple threads
  • G1 divides heap into regions, aims for pause time goals
  • ZGC and Shenandoah are ultra-low-pause collectors
  • Young GC collects short-lived objects; old GC collects long-lived ones
  • System.gc() is only a hint, not a command
  • Different GC algorithms prioritize different aspects (throughput vs. latency)

Interview Questions

1. Explain the difference between interpretation and JIT compilation.

Interpretation executes bytecode by reading each instruction and performing the corresponding action through a dispatch on the opcode. It has no compilation overhead, so it starts fast, but every instruction pays the dispatch cost. JIT compilation translates hot bytecode methods to native machine code at runtime. Compiling costs CPU time on background compiler threads. Once the compiled code is installed, calls run directly on the CPU without interpretation overhead. Modern JVMs use tiered compilation. Code runs interpreted at first while the JVM counts invocations and loop iterations, and methods that cross the thresholds are queued for compilation. Hot methods get compiled multiple times with increasing optimization levels.

2. What is escape analysis and how does it improve performance?

Escape analysis determines whether an object escapes the method or thread in which it was created. An object escapes if its reference is stored somewhere that outlives the method (such as a static field or a field of an escaping object), returned from the method, or passed to code the compiler cannot see. If it does not escape, the JIT compiler can optimize the allocation away. HotSpot's C2 uses scalar replacement. Instead of allocating the object on the heap, it keeps the object's fields in registers or stack slots, which is effectively stack allocation. This removes the allocation itself and the garbage collection work for that object, so nothing is left to reclaim when the method returns. It also enables lock elision on objects that never leave the thread. This optimization is particularly powerful for small, short-lived objects inside hot methods.

3. How do Serial, Parallel, CMS, and G1 collectors differ?

Serial GC uses one thread for all GC work and stops the world during collections. It is simple and has low overhead, but its pauses grow with heap size. Parallel GC uses multiple threads for faster collection, aiming to maximize throughput rather than minimize pauses. CMS (Concurrent Mark Sweep) does most marking and sweeping concurrently with application threads to reduce pause times. It pays for this with extra CPU and memory overhead, and because it does not compact the old generation, it is prone to fragmentation. G1 (Garbage First) divides the heap into equal-sized regions and collects garbage region by region, prioritizing regions with the most garbage. G1 aims to meet configurable pause time goals while maintaining good throughput. CMS was deprecated in Java 9 and removed in Java 14; G1 is the default.

4. What causes GC thrashing and how do you fix it?

GC thrashing happens when the heap is too small for the application's allocation rate or live data set, and the JVM spends most of its time in GC. The young generation fills up immediately after each collection, and survivor spaces overflow. Objects that would have died young get promoted to the old generation prematurely and fill it up in turn. The fix involves increasing heap size (-Xmx), optimizing allocation patterns to create fewer short-lived objects, choosing a different GC algorithm, or fixing memory leaks. Measure the application's allocation rate and size the young generation so that most short-lived objects are already dead by the next young GC.

5. What is the difference between Minor, Major, and Full GC?

Minor GC collects the young generation (Eden and Survivor spaces). It happens frequently and is usually brief because the young generation is small and most objects die young. Major GC collects the old generation. It is less frequent but slower because the old generation is larger. In G1, old regions are reclaimed through mixed collections, which include both young and selected old regions. Full GC collects the entire heap (young and old generations) and typically also unloads unused classes from Metaspace. It is the most disruptive and usually indicates a problem. Full GC can be triggered by old generation exhaustion, by reaching the Metaspace GC threshold, or by an explicit System.gc() call.

6. How does tiered compilation balance startup speed against peak performance?

Tiered compilation works in stages. When an application starts, the JVM runs bytecode in the interpreter (level 0) with no compilation delay. As methods become warm (invocation or loop back-edge counts exceed thresholds), the client JIT compiler (C1) compiles them quickly. It applies basic optimizations plus profiling instrumentation (usually level 3), giving faster execution while still collecting type and branch profiles. Methods that stay hot are recompiled by the server JIT compiler (C2) with aggressive, profile-guided optimizations (level 4), producing peak performance. This avoids waiting for a slow C2 compilation before anything runs faster, while still achieving high performance for sustained hot methods. If a speculative assumption in compiled code is later violated, the method is deoptimized back to the interpreter and can be recompiled with the updated profile.

7. What is the difference between ZGC and Shenandoah, and when would you choose one over the other?

Both ZGC and Shenandoah are ultra-low-pause collectors that do most of their work, including compaction, concurrently with application threads. They differ in how they let the application keep running while objects move. ZGC uses colored pointers: metadata bits stored in object references, checked by load barriers whenever a reference is loaded from the heap. Shenandoah originally added a Brooks forwarding pointer (an extra word) to every object. Since JDK 13 it uses load reference barriers instead and no longer needs that extra word. Both keep pauses very short, but neither is entirely pause-free; each still has brief stop-the-world phases. Availability also differs. ZGC was experimental in JDK 11 and became production-ready in JDK 15. Shenandoah was experimental in JDK 12 and became production-ready in JDK 15. It has also been backported to JDK 11 updates and to some vendors' JDK 8 builds, but it is not shipped in every vendor's JDK. Choose based on which collector your JDK version and vendor support, then measure pause times and CPU overhead on your actual workload.

8. What causes JIT compilation to cause CPU spikes during application warmup?

JIT compilation is CPU-intensive work. When a hot method crosses a compilation threshold, the compiler threads spend significant CPU time generating native code. Many methods often cross thresholds at once, for example right after startup or during the first traffic spike. The compilation queue then backs up, and compiler threads compete with application threads for CPU. The result is a CPU spike that coincides with JIT activity. Mitigation strategies include:

  • warming up critical code paths with synthetic traffic before the instance takes production load
  • adjusting the number of compiler threads with `-XX:CICompilerCount`
  • increasing the code cache with `-XX:ReservedCodeCacheSize`, so compilation does not stop when the cache fills
  • reducing the amount of hot code through code optimization

Tiered compilation (`-XX:+TieredCompilation`) is already enabled by default on modern JDKs. It is what lets cheap C1 compilations arrive before expensive C2 ones.

9. What is the relationship between escape analysis, lock elision, and JIT optimization?

Escape analysis determines whether an object escapes a method or thread. If an object does not escape, the JIT compiler can apply several optimizations. Lock elision (also called lock elimination) applies when an object is proven to be thread-local, so a lock on it can never be contended. The JIT removes the lock operation entirely, eliminating its atomic operations and memory barriers. Scalar replacement replaces an object allocation with its individual fields, held as local values in registers or stack slots, so no heap object is created at all. This is how HotSpot achieves the effect of stack allocation. These optimizations compound with inlining. Inlining exposes more code to escape analysis, so an object that would otherwise escape through a call can be proven local once the callee is inlined. That in turn enables more scalar replacement and lock elision.

10. How do soft references, weak references, and phantom references differ in behavior with the GC?

These reference types form a hierarchy of reachability from strongest to weakest. A soft reference is cleared at the GC's discretion in response to memory demand. The only guarantee is that all softly reachable objects are cleared before the JVM throws OutOfMemoryError; HotSpot also clears soft references that have not been used recently. Use soft references for caches where you want memory-sensitive eviction. A weak reference is cleared as soon as the GC finds that its referent is no longer strongly or softly reachable, so the referent is reclaimed at the next GC that examines it. Use WeakHashMap for canonicalizing mappings where you want an entry removed once no other references to its key exist. A phantom reference never returns its referent (get() always returns null). It is enqueued on its ReferenceQueue once the referent becomes phantom reachable, after any finalization, and since Java 9 it is cleared automatically at that point. Use phantom references, or the Cleaner API built on them, for cleanup actions that must run after an object becomes unreachable.

11. What is the purpose of the GC log and what information does it contain?

GC logs record every garbage collection event with timing, memory before/after, and pause duration. They are essential for diagnosing GC issues. A typical entry shows: GC type (young, mixed, full), which GC algorithm ran, heap usage before and after collection, promotion amounts, and pause time. With -Xlog:gc*:file=gc.log:time,uptime,level,tags, logs include timestamps and uptime. GC logs reveal patterns: frequent young GC with low reclaim rates indicates allocation issues; long pauses indicate GC algorithm mismatch; steady heap growth without full GC suggests a memory leak. Always enable GC logs in production for incident investigation.

12. What is the G1 garbage collector's approach to pause time goals?

G1 (Garbage First) lets you set a pause time target with -XX:MaxGCPauseMillis (default 200 ms). G1 does not guarantee that the target is always met. It works to stay within it by limiting how much work, that is how many regions, it collects in a single pause. G1 divides the heap into equal-sized regions (typically 1 to 32 MB). Instead of collecting an entire generation, G1 prefers old regions that contain the most garbage, hence the name "Garbage First". Before each pause, G1 predicts the cost of evacuating candidate regions and picks a collection set that fits the target. Leftover old regions are handled in subsequent mixed collections. Since JDK 12, G1 can also abandon an optional part of a mixed collection set if a pause is running long. G1 is the default because it balances throughput and latency for most workloads.
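The selection step can be sketched as a greedy pick under a time budget. Everything here is hypothetical: the linear cost model (1 ms per MB of live data to copy), the region sizes, and the simple sort by reclaimable garbage. G1's real predictor is adaptive and ranks regions by reclamation efficiency. The shape of the decision is the same, though: collect what fits in the pause and leave the rest for the next mixed collection.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy "garbage first" selection under a pause budget; the cost model is hypothetical. */
final class ToyCollectionSet {
    /** liveBytes drives copy cost; garbageBytes is what collecting the region reclaims. */
    record Region(int id, long liveBytes, long garbageBytes) {}

    // Hypothetical: predicted milliseconds to evacuate one MB of live data
    static final double MS_PER_LIVE_MB = 1.0;

    static List<Region> choose(List<Region> oldRegions, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(oldRegions);
        sorted.sort(Comparator.comparingLong(Region::garbageBytes).reversed()); // most garbage first
        List<Region> chosen = new ArrayList<>();
        double predictedMs = 0;
        for (Region r : sorted) {
            double cost = r.liveBytes() / (1024.0 * 1024.0) * MS_PER_LIVE_MB;
            if (predictedMs + cost > pauseBudgetMs) {
                break; // leave the rest for a later mixed collection
            }
            predictedMs += cost;
            chosen.add(r);
        }
        return chosen;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        List<Region> old = List.of(
            new Region(1, 2 * mb, 30 * mb),
            new Region(2, 8 * mb, 20 * mb),
            new Region(3, 1 * mb, 10 * mb));
        for (Region r : choose(old, 5.0)) {
            System.out.println("collect region " + r.id()); // prints collect region 1
        }
    }
}
```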

13. What is the purpose of card marking in generational GC?

Card marking is a technique generational collectors use to track old-to-young references efficiently, because a minor GC must treat those references as additional roots. Rather than scanning the whole old generation at every young collection, the heap is divided into "cards" (typically 512 bytes each). These cards are tracked in a card table, a byte array indexed by address range. When a reference field in an old-generation object is written, a write barrier marks the corresponding card as "dirty." During young GC, only dirty cards need to be scanned to find references from the old generation into the young generation. This greatly reduces scanning overhead and keeps minor GC pause times from growing in proportion to the size of the old generation.
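A toy card table makes the mechanics concrete. Heap addresses are plain long offsets, and the 512-byte card size matches the text. The CLEAN/DIRTY byte values are arbitrary choices for this sketch, and HotSpot's own encoding may differ.

```java
/** Toy card table: one byte per 512-byte card, dirtied by a post-write barrier. */
final class ToyCardTable {
    static final int CARD_SHIFT = 9;        // 2^9 = 512-byte cards
    static final byte CLEAN = 0, DIRTY = 1; // arbitrary encoding for this sketch

    final byte[] cards;
    final long heapBase;

    ToyCardTable(long heapBase, long heapSize) {
        this.heapBase = heapBase;
        this.cards = new byte[(int) ((heapSize + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT)];
    }

    /** Post-write barrier: after storing a reference into the field at fieldAddress, dirty its card. */
    void onReferenceStore(long fieldAddress) {
        cards[(int) ((fieldAddress - heapBase) >>> CARD_SHIFT)] = DIRTY;
    }

    /** Young GC scans only dirty cards for old-to-young pointers, then cleans them. */
    int scanAndCleanDirtyCards() {
        int scanned = 0;
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                scanned++; // a real GC would walk objects in [heapBase + i * 512, + 512)
                cards[i] = CLEAN;
            }
        }
        return scanned;
    }

    public static void main(String[] args) {
        ToyCardTable table = new ToyCardTable(0, 4096); // 8 cards
        table.onReferenceStore(10);   // card 0
        table.onReferenceStore(100);  // card 0 again
        table.onReferenceStore(1024); // card 2
        System.out.println(table.scanAndCleanDirtyCards()); // prints 2
    }
}
```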

14. What causes promotion failed in CMS and how do you fix it?

Promotion failed occurs when the JVM tries to promote an object from young to old generation during minor GC but finds insufficient contiguous space in the old generation—the old generation is fragmented. The minor GC completes, but immediately triggers a Full GC to compact the old generation, causing an unexpectedly long pause. This happens when old generation fills up gradually through promotions and then a large object needs space. Fixes: increase old generation size, reduce promotion rate by sizing young generation larger, use G1 which handles fragmentation better through region-based collection, or switch to a compacting collector like Parallel GC.

15. What is the difference between the client JIT compiler (C1) and the server JIT compiler (C2)?

C1 (client compiler) focuses on fast compilation with basic optimizations, suitable for short-running applications where quick startup matters. It uses simpler heuristics and less aggressive inlining. C2 (server compiler) performs intensive optimizations—aggressive inlining, sophisticated escape analysis, loop optimizations, and speculative optimizations—but requires longer compilation time and more resources. C2 produces higher-quality native code for long-running hot methods. In tiered compilation, methods start with C1 and migrate to C2 when they remain hot enough for peak optimization.

16. How does the GC identify and handle objects with finalizers?

Objects whose class overrides finalize() are treated specially. After the GC determines that such an object is unreachable, it is placed on a queue for finalization instead of being reclaimed. A finalizer thread runs finalize() methods asynchronously. The object's memory becomes reclaimable only after finalization, once the GC finds the object unreachable again. This delays reclamation: finalizable objects survive at least one extra GC cycle. Additionally, if a finalize() method stores a reference to the dying object somewhere reachable, the object is resurrected and becomes reachable again. Because of these quirks and performance costs, finalization has been deprecated for removal since Java 18. Use try-with-resources or Cleaner for cleanup instead.

17. What is the purpose of GC ergonomics in the JVM?

GC ergonomics is the JVM's self-tuning mechanism for GC. At startup, it picks a collector and default heap sizes based on the machine's CPU count and memory. At runtime, the collector adjusts heap and generation sizes, survivor space sizes, and tenuring thresholds based on observed GC behavior. For Parallel GC this is controlled by -XX:+UseAdaptiveSizePolicy (enabled by default), while G1 uses its own adaptive heuristics to size the young generation. The goal is to meet the user-specified pause time goal (-XX:MaxGCPauseMillis) or throughput goal (-XX:GCTimeRatio). Ergonomics lets the JVM adapt to workload changes without manual tuning, and for most applications it produces reasonable results without intervention.

18. What is the difference between a stop-the-world GC and a concurrent GC?

Stop-the-world (STW) pauses all application threads during critical GC phases like marking or compacting. This ensures consistency but adds directly to application latency. Serial and Parallel GC are entirely STW. Concurrent collectors do part or most of their work while application threads run. CMS marks and sweeps concurrently. G1 marks concurrently but still evacuates objects in STW pauses. ZGC and Shenandoah also relocate objects concurrently. Concurrent collectors still have short STW phases, such as initial mark and remark. ZGC reaches sub-millisecond pauses using colored pointers and load barriers, and Shenandoah uses load reference barriers to the same end. The tradeoff is that concurrent GC uses more CPU and may have somewhat lower throughput.

19. What is humongous allocation in G1 and why does it affect performance?

In G1, an object whose size is half a region or more is called "humongous" and handled specially. Instead of going into a regular region, it gets its own contiguous set of regions, and any unused space at the end of the last region is wasted. Humongous objects are not moved during young or mixed collections, so they can fragment the heap. Finding enough contiguous free regions for a new one can trigger extra GC cycles or even a Full GC. Recent JDKs can eagerly reclaim some dead humongous objects during young GC. Even so, frequent allocation of large, short-lived objects, such as large arrays or buffers, can still degrade G1 performance. Increasing the region size with -XX:G1HeapRegionSize can turn such objects back into regular allocations.
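The size rule is easy to check. The helpers below assume a 4 MB region size purely for illustration; the real size depends on the heap size or -XX:G1HeapRegionSize. They also count only payload bytes. A real array carries an object header as well, so an array just under the threshold can still tip over it.

```java
/** Humongous-size arithmetic for G1 (payload bytes only; headers ignored). */
final class HumongousCheck {
    /** G1 treats an allocation of at least half a region as humongous. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    /** Contiguous regions a humongous object occupies; the tail of the last one is wasted. */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long region = 4L * 1024 * 1024; // hypothetical 4 MB region
        long array = 3L * 1024 * 1024;  // a 3 MB byte[] payload
        System.out.println(isHumongous(array, region) + ", regions=" + regionsNeeded(array, region)); // prints true, regions=1
    }
}
```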

20. How does the JIT compiler perform method inlining and what are its limits?

The JIT compiler inlines a method by replacing the call with the callee's body in its intermediate representation. This eliminates call overhead (frame setup, argument passing, return handling) and enables other optimizations that span method boundaries. HotSpot decides based on several criteria:

  • Bytecode size. By default, methods smaller than -XX:MaxInlineSize (35 bytes) are inlined regardless of call frequency, and larger methods up to -XX:FreqInlineSize are inlined at frequently executed call sites.
  • Call-site frequency.
  • The receiver-type profile for virtual calls. Monomorphic and bimorphic sites can be inlined behind a type guard; megamorphic sites generally are not.

Limits include the maximum inlining depth (-XX:MaxInlineLevel), a cap on recursive inlining, the size of already-compiled callees, and the compilation tier (C2 inlines far more aggressively than C1). The @ForceInline and @DontInline annotations exist, but they are JDK-internal (jdk.internal.vm.annotation) and not available to ordinary application code. The supported way to influence specific methods is -XX:CompileCommand with the inline or dontinline option.


Conclusion

The Execution Engine converts bytecode to native code through interpretation and JIT compilation, while the Garbage Collector reclaims unreachable objects through various algorithms (Serial, Parallel, CMS, G1, ZGC, Shenandoah). Tiered compilation balances startup speed against peak performance, and JIT optimizations like inlining, dead code elimination, and escape analysis dramatically improve runtime efficiency. Choose the right GC algorithm based on your latency vs. throughput requirements.
