Serial vs Parallel GC: Throughput-Focused Collectors

A practical guide to Serial and Parallel garbage collectors - how they work, when to use each, throughput trade-offs, and production tuning tips.

published: May 26, 2026 reading time: 19 min read author: GeekWorkBench


Serial and Parallel are the oldest garbage collectors in the JVM. They share a philosophy: maximize how much work the application does versus how much time GC steals. They are not flashy. No sub-millisecond pauses here. What they deliver is raw throughput - the highest percentage of CPU time possible going to your application instead of the garbage collector.

This covers how each works, when to use one versus the other, and the tuning flags that actually matter.

Introduction

Serial and Parallel are the workhorse garbage collectors in the JVM — the oldest collectors, with the simplest design, and the ones that deliver the highest raw throughput when configured correctly. They share a philosophy: maximize the percentage of CPU time that goes to your application rather than to GC. They are not flashy; there are no sub-millisecond pauses here. What they deliver is efficiency — the most application work per unit of wall-clock time on a given hardware configuration.

The key difference between them is that Serial uses a single GC thread, while Parallel uses multiple threads for the same stop-the-world marking, copying, and compaction work. Neither collector runs concurrently with the application. This makes Parallel significantly faster on multi-core systems for throughput-bound workloads. Understanding when to reach for one versus the other, or when to skip both in favor of a low-pause collector like G1, ZGC, or Shenandoah, requires knowing the trade-off between throughput, latency, and footprint. This post walks through how each collector works internally, the JVM flags that control them, and the concrete scenarios where each is the right choice.

When to Use This Knowledge

Use when:

Building batch processing, ETL, or data pipeline applications where throughput matters most
Running on single-core or resource-constrained environments
Debugging why your application is not utilizing CPU fully during GC
Choosing a GC strategy before reaching for the low-latency collectors (G1, ZGC, Shenandoah)

Do not use when:

Your application requires low pause times (GC pauses of hundreds of milliseconds or more are unacceptable)
You are running on machines with many cores under reactive workloads
You have strict latency SLAs that GC pauses would violate

When NOT to Use This Knowledge

If your batch pipeline tolerates its multi-second pauses and CPU utilization is within budget, tuning ParallelGCThreads or SurvivorRatio is wasted effort. The default JVM ergonomics handle these scenarios without intervention.

The trap is thinking you need to master GC algorithms before you have a GC problem. Profilers and monitoring should guide your tuning. If your GC logs show healthy collection frequencies and pause times within your SLA thresholds, leave the defaults alone. Premature tuning based on benchmarks rather than production metrics leads to brittle configurations that are hard to maintain.

For applications needing sub-200ms response times under load, Serial and Parallel GC are the wrong tool. Evaluate G1, ZGC, or Shenandoah instead. Spending time on Parallel GC internals for a low-latency use case is a dead end.

The Throughput Goal

Throughput-focused collectors optimize for this equation:

Throughput = Application Time / (Application Time + GC Time)

If your application runs for 1000ms and spends 50ms in GC, your throughput is 1000 / 1050 = 95.2%. Serial and Parallel collectors push this number as high as possible. They do not care about individual pause times as long as the overall ratio looks good.
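The arithmetic above can also be applied to a running JVM. The following is a minimal sketch, not a definitive measurement tool: the class name ThroughputEstimate and its methods are hypothetical, and it assumes that JVM uptime approximates application time plus GC time and that the cumulative collection time reported by the GarbageCollectorMXBean interface approximates GC time. For stop-the-world collectors such as Serial and Parallel that is a reasonable first approximation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ThroughputEstimate {
    /** Throughput = application time / (application time + GC time), as a fraction. */
    public static double throughput(double appMillis, double gcMillis) {
        return appMillis / (appMillis + gcMillis);
    }

    /** Sums the accumulated collection time of every collector in this JVM. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // The worked example from the text: 1000 ms of application time, 50 ms of GC.
        System.out.printf("worked example: %.1f%%%n", 100 * throughput(1000, 50));

        // Rough estimate for this JVM (assumption: uptime = application time + GC time).
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        long app = Math.max(uptime - gc, 1); // guard against a zero denominator right at startup
        System.out.printf("this JVM so far: %.1f%%%n", 100 * throughput(app, gc));
    }
}
```

The estimate is only meaningful after the application has run long enough for several collections to occur; GC logs remain the more precise source.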

Serial GC

Serial GC uses a single thread for all garbage collection work. It pauses every application thread (a stop-the-world pause) while it runs.

graph TB
    subgraph SerialGC["Serial GC - Single Thread"]
        direction TB
        APP["Application Thread PAUSED"] --> STW1["Mark Phase<br/>(single thread)"]
        STW1 --> STW2["Sweep/Compact<br/>(single thread)"]
        STW2 --> RESUME["Resume Application"]
    end

How It Works

Young generation uses a stop-the-world copying collector (similar to the basic Copying algorithm)
Old generation uses a stop-the-world Mark-Compact collector
Everything runs on a single thread
The world stops while GC runs
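To make the old-generation algorithm concrete, here is a toy single-threaded Mark-Compact pass. It is a teaching sketch only and not how HotSpot is implemented: the heap is modeled as an array of slots, references are slot indices, and the names ToyMarkCompact, Obj, and collect are hypothetical. It assumes every root and every reference points at an occupied slot.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class ToyMarkCompact {
    /** A toy heap object: a label plus the slot indices it references. */
    public static final class Obj {
        public final String label;
        public final int[] refs;
        public Obj(String label, int... refs) { this.label = label; this.refs = refs; }
    }

    /** Compacts the heap in place, rewrites roots and references, returns the live count. */
    public static int collect(Obj[] heap, int[] roots) {
        // 1. Mark: depth-first traversal from the roots, on one thread.
        boolean[] marked = new boolean[heap.length];
        Deque<Integer> pending = new ArrayDeque<>();
        for (int root : roots) pending.push(root);
        while (!pending.isEmpty()) {
            int slot = pending.pop();
            if (heap[slot] == null || marked[slot]) continue;
            marked[slot] = true;
            for (int ref : heap[slot].refs) pending.push(ref);
        }
        // 2. Compute forwarding addresses: live objects slide toward slot 0 in order.
        int[] forward = new int[heap.length];
        int live = 0;
        for (int slot = 0; slot < heap.length; slot++) {
            if (marked[slot]) forward[slot] = live++;
        }
        // 3. Update references (and roots) to the new addresses.
        for (int slot = 0; slot < heap.length; slot++) {
            if (!marked[slot]) continue;
            int[] refs = heap[slot].refs;
            for (int k = 0; k < refs.length; k++) refs[k] = forward[refs[k]];
        }
        for (int k = 0; k < roots.length; k++) roots[k] = forward[roots[k]];
        // 4. Move objects; forward[slot] <= slot, so nothing unprocessed is overwritten.
        for (int slot = 0; slot < heap.length; slot++) {
            if (marked[slot]) heap[forward[slot]] = heap[slot];
        }
        Arrays.fill(heap, live, heap.length, null); // the tail is now one contiguous free region
        return live;
    }

    public static void main(String[] args) {
        Obj[] heap = { new Obj("A", 2), new Obj("B"), new Obj("C", 4), new Obj("D"), new Obj("E") };
        int[] roots = { 0 };
        System.out.println("live objects after GC: " + collect(heap, roots));
    }
}
```

The point of the compaction step is that free space ends up contiguous, which is why a compacting old generation does not fragment the way a pure mark-sweep heap can.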

When to Use Serial GC

Serial GC is viable in specific scenarios:

Very small heaps (under 100MB) - no benefit to parallelism overhead
Single-core machines - multiple threads would compete for CPU anyway
Batch jobs where pauses do not matter - ETL pipelines, nightly batch processing
Resource-constrained containers - avoids thread creation overhead

JVM Flags for Serial GC

-XX:+UseSerialGC
-Xms512m -Xmx512m    # Small, stable heap

Key Characteristics

Aspect	Behavior
GC threads	1
Young gen collector	Copying (stop-the-world)
Old gen collector	Mark-Compact (stop-the-world)
Throughput	Moderate - limited by single thread
Memory footprint	Lowest
CPU overhead	Minimal
Pause times	Longest per collection

Parallel GC

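Parallel GC (also called the Throughput Collector; its old-generation half was historically known as Parallel Old) uses multiple threads for GC work. It was the default collector on server-class machines through JDK 8. Since JDK 9 the default is G1, so on current JDKs you select Parallel GC explicitly with -XX:+UseParallelGC.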

graph TB
    subgraph ParallelGC["Parallel GC - Multiple Threads"]
        direction TB
        APP["Application Threads PAUSED"] --> PT["Parallel Mark Phase<br/>(N threads)"]
        PT --> ST["Parallel Sweep/Compact<br/>(N threads)"]
        ST --> RESUME["Resume Application"]
    end

How It Works

Young generation uses a multi-threaded stop-the-world copying collector
Old generation uses a multi-threaded stop-the-world Mark-Compact collector
Number of GC threads controlled by -XX:ParallelGCThreads=N
Default thread count: one per CPU on machines with up to 8 CPUs; on larger machines, 8 plus 5/8 of the CPUs beyond 8 (for example, 23 threads on a 32-CPU host)
The world stops while GC runs on multiple threads
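The default thread count is easier to reason about as a formula. The sketch below reproduces the HotSpot heuristic as it is commonly documented (all CPUs up to 8, then 5/8 of each additional CPU); the class and method names are hypothetical, and the real value can differ by JVM version and container limits, so confirm it on your own system with -XX:+PrintFlagsFinal.

```java
public class DefaultGcThreads {
    /** Commonly documented HotSpot default for ParallelGCThreads given a CPU count. */
    public static int defaultParallelGcThreads(int cpus) {
        // Integer arithmetic on purpose: the fractional part is truncated.
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {2, 8, 16, 32, 64}) {
            System.out.println(cpus + " CPUs -> " + defaultParallelGcThreads(cpus) + " GC threads");
        }
    }
}
```

The sub-linear growth reflects diminishing returns: beyond a handful of threads, synchronization between GC workers eats into the benefit of adding more.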

When to Use Parallel GC

Parallel GC shines when:

CPU is abundant - many cores available for GC work
Throughput is the primary metric - batch processing, computation-heavy workloads
Pause times in the seconds range are acceptable - most pauses are under 1 second but can be longer
Memory is plentiful - parallel GC benefits from larger heaps where work can be divided

JVM Flags for Parallel GC

-XX:+UseParallelGC          # Parallel young and old generation collection
-XX:+UseParallelOldGC      # Legacy flag: implied by UseParallelGC, deprecated in JDK 14, removed in JDK 15
-Xms4g -Xmx4g              # Heap sized for your workload
-XX:ParallelGCThreads=16   # Explicit thread count (optional)
-XX:+UseAdaptiveSizePolicy  # Enable adaptive heap sizing (default ON)

Key Characteristics

Aspect	Behavior
GC threads	Multiple (CPU-dependent)
Young gen collector	Parallel Copying (stop-the-world)
Old gen collector	Parallel Mark-Compact (stop-the-world)
Throughput	Highest of all collectors
Memory footprint	Standard
CPU overhead	Higher than Serial
Pause times	Shorter than Serial per unit of work, but still Stop-The-World

Comparing Serial and Parallel

Aspect	Serial GC	Parallel GC
Threads	1	Many
Throughput	Lower	Highest
Pause time per GC	Longer	Shorter per unit of work
CPU overhead	Minimal	Higher
Memory overhead	Minimal	Slight (thread stacks)
Best for	Small heaps, single core	Large heaps, multi-core batch
Default for	Client-mode JVMs (JDK 8 and earlier); machines that are not server-class	Server-class machines through JDK 8 (G1 since JDK 9)

Production Failure Scenarios

1. Full GC Freezes in Parallel GC

Symptom: Application freezes for several seconds during old generation collection. GC logs show Full GC with long pause times.

Cause: Even with multiple threads, a full heap Mark-Compact is expensive. With many live objects, compaction can take seconds.

Solution: Reduce heap size to reduce work per GC cycle. Or switch to G1/ZGC for shorter pauses. Profile to confirm this is the actual GC cause.

# Example GC log showing long Full GC
[Full GC (Allocation Failure) [PSYoungGen: 512K->0K(1536K)] [ParOldGen: 4095M->4095M(4096M)] 4.1234567secs]

2. Excessive GC Thread Contention

Symptom: High CPU usage during GC but low overall throughput. GC threads fighting for CPU on a loaded system.

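Cause: Too many GC threads (-XX:ParallelGCThreads) for the CPUs the JVM can actually use. Application threads are paused during a stop-the-world collection, so the competition usually comes from other processes on the host or from a container CPU quota that is smaller than the number of cores the thread count was sized for.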

Solution:

# Limit GC threads to avoid contention
-XX:ParallelGCThreads=8

# Or use fewer threads than available cores
# Rule of thumb: leave 1-2 cores for other processes sharing the host

3. Adaptive Size Policy Misfire

Symptom: Heap resizing causes performance instability - throughput spikes and dips unpredictably.

Cause: -XX:+UseAdaptiveSizePolicy automatically adjusts heap sizes based on allocation behavior, which can cause resizing during critical production periods.

Solution:

# Disable adaptive sizing
-XX:-UseAdaptiveSizePolicy

# Or set explicit ratios
-XX:NewRatio=2
-XX:SurvivorRatio=8

Implementation Snippets

Enabling and Configuring Parallel GC

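# Basic Parallel GC configuration (JDK 9+; GC logging uses unified logging)
java -XX:+UseParallelGC \
     -Xms4g -Xmx4g \
     -XX:ParallelGCThreads=16 \
     -Xlog:gc*:file=gc.log:time \
     -jar myapp.jar

# On JDK 8, replace the -Xlog line with:
#     -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log
# The two logging styles cannot be combined: each JDK rejects the other's flags.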

Reading Parallel GC Logs

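Parallel GC logs show the occupancy of each generation before and after a collection and how long the collection took. The sample below is in the JDK 8 format; unified logging on JDK 9+ reports the same information in a different layout.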

[Full GC [PSYoungGen: 512K->0K(1536K)] [ParOldGen: 4095M->4095M(4096M)] 4.1234567secs]
[Times: user=32.10 sys=1.20, real=4.12 secs]

user = CPU time across all threads (32 seconds of CPU work done in 4 seconds wall time = 8 threads)
sys = time spent in kernel calls
real = actual wall clock time (should be close to user/N for N threads)
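The user/real ratio can be extracted from a log line mechanically. This is a small sketch that assumes the JDK 8 style "[Times: user=... sys=..., real=... secs]" line shown above; the class name GcTimesLine and the method effectiveParallelism are hypothetical helpers, not part of any JDK tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcTimesLine {
    // Matches the three timings in a JDK 8 style "[Times: ...]" line.
    private static final Pattern TIMES =
            Pattern.compile("user=([0-9.]+)\\s+sys=([0-9.]+),\\s+real=([0-9.]+)");

    /**
     * Returns user time divided by real time, a rough estimate of how many GC
     * threads were busy on average. Returns NaN if the line does not match
     * or if real time is reported as zero.
     */
    public static double effectiveParallelism(String line) {
        Matcher m = TIMES.matcher(line);
        if (!m.find()) {
            return Double.NaN;
        }
        double user = Double.parseDouble(m.group(1));
        double real = Double.parseDouble(m.group(3));
        return real == 0.0 ? Double.NaN : user / real;
    }

    public static void main(String[] args) {
        String line = "[Times: user=32.10 sys=1.20, real=4.12 secs]";
        System.out.println("effective parallelism: " + effectiveParallelism(line));
    }
}
```

A ratio well below the configured ParallelGCThreads value suggests the GC workers spent much of the pause idle or waiting, which is the contention signal described above.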

Checking Active GC Algorithm

import java.lang.management.*;
import java.util.*;

public class WhichGC {
    public static void main(String[] args) {
        List<GarbageCollectorMXBean> gcs = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : gcs) {
            System.out.println("Collector: " + gc.getName());
            System.out.println("  Pools: " + Arrays.toString(gc.getMemoryPoolNames()));
        }

        // Print the flags passed on the command line (defaults such as adaptive sizing are not listed)
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        System.out.println("Input args: " + runtime.getInputArgs());
    }
}
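The input arguments only list flags that were passed explicitly. To see the value a flag actually has, including defaults and values chosen by ergonomics, one option on HotSpot-based JVMs is the com.sun.management.HotSpotDiagnosticMXBean interface. The sketch below assumes a HotSpot JVM (the bean is not available on every JVM implementation); the class name EffectiveGcFlags and the method describe are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class EffectiveGcFlags {
    /** Returns "name=value (origin)" for a HotSpot flag such as UseAdaptiveSizePolicy. */
    public static String describe(String flagName) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the flag does not exist.
        VMOption option = hotspot.getVMOption(flagName);
        // The origin tells you whether the value is a default, came from the
        // command line, or was set by JVM ergonomics.
        return option.getName() + "=" + option.getValue() + " (" + option.getOrigin() + ")";
    }

    public static void main(String[] args) {
        String[] flags = {"UseSerialGC", "UseParallelGC", "ParallelGCThreads", "UseAdaptiveSizePolicy"};
        for (String flag : flags) {
            System.out.println(describe(flag));
        }
    }
}
```

Running it once with -XX:+UseParallelGC and once with -XX:+UseSerialGC shows how the selected collector changes the effective thread count.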

Observability Checklist

Enable GC logging: -Xlog:gc*:file=gc.log
Parse GC logs for pause time distribution: look for real= vs user= to spot contention
Monitor jstat -gc <pid> for FGC (Full GC count) and FGCT (Full GC time)
Track heap occupancy before and after Full GC to understand live set size
Watch CPU usage during GC events - high sys time may indicate OS-level overhead
Use -XX:+PrintGCApplicationConcurrentTime on JDK 8, or -Xlog:safepoint on JDK 9+, to measure time between GC pauses
Consider NMT (Native Memory Tracking): -XX:NativeMemoryTracking=summary
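For the live-set item in the checklist, the standard MemoryPoolMXBean interface exposes each pool's usage as measured right after its most recent collection. The sketch below sums that figure over the heap pools as a rough live-set estimate. The class name LiveSetEstimate is hypothetical, and the number is only an approximation: each pool reports its own most recent collection, so the old generation figure is only as fresh as the last collection that included it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class LiveSetEstimate {
    /** Sums heap pool usage recorded after each pool's most recent collection, in bytes. */
    public static long usedAfterLastGcBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip Metaspace, code cache, and other non-heap pools
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Request a collection so the figures are recent; the JVM is free to ignore this.
        System.gc();
        System.out.println("heap used after last GC: "
                + usedAfterLastGcBytes() / (1024 * 1024) + " MB");
    }
}
```

Sampling this value periodically and charting it over time gives a trend line for the live set without parsing GC logs.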

Security Notes

GC Logs can reveal application allocation patterns and memory behavior - protect them
JMX/Management Interface access to GC beans should be restricted in production
Heap Dumps after OOM contain full application state - handle with care
NMT output can reveal native memory allocation details useful for attacks

Common Pitfalls / Anti-Patterns

Pitfall	What happens	Fix
Too many GC threads	GC threads oversubscribe the CPUs available to the JVM	Set `-XX:ParallelGCThreads` explicitly
AdaptiveSizePolicy on in production	Unpredictable heap resizing	Disable with `-XX:-UseAdaptiveSizePolicy`
Setting heap too large	Fewer but longer Full GCs	Tune based on live set size
Setting heap too small	Constant GC thrashing	Profile to find working set size
Ignoring old gen occupancy	Promotion failure causes Full GC	Monitor promotion rate with `jstat`

Quick Recap Checklist

Serial GC = single thread, lowest throughput, lowest overhead
Parallel GC = multiple threads, highest throughput, default on server-class machines through JDK 8 (G1 since JDK 9)
Throughput = Application Time / (Application Time + GC Time)
Parallelism helps when CPU cores are plentiful and work is divisible
Both are Stop-The-World collectors - pauses freeze the entire application
-XX:ParallelGCThreads controls thread count - tune based on available cores
-XX:+UseAdaptiveSizePolicy auto-tunes heap sizes but can cause instability
For low-latency requirements, look at G1, ZGC, or Shenandoah

Interview Questions

1. What is the difference between Serial and Parallel GC?

Serial GC uses one thread for all garbage collection work. Parallel GC uses multiple threads. On a multi-core machine, Parallel GC finishes GC work faster because it divides the work across threads, but both are stop-the-world collectors - the application freezes while either one runs. Parallel GC targets throughput; Serial GC targets simplicity and low overhead.

2. How do you determine the right number of GC threads for Parallel GC?

HotSpot defaults to one GC thread per CPU on machines with up to 8 CPUs, and to 8 plus 5/8 of the CPUs beyond 8 on larger machines. On a 32-CPU machine that is 23 threads. Application threads are paused during a collection, so the default is usually fine on a dedicated host. It becomes too many when other processes share the machine or when a container CPU quota is lower than the CPU count the thread count was sized for. If your GC logs show high system time, or real time far above user time divided by the thread count, try a lower -XX:ParallelGCThreads value.

3. What does the `user=` and `real=` time in GC logs mean?

`user=` is total CPU time across all threads. `real=` is wall clock time. If you see `user=32.10 real=4.12`, that means 32 seconds of CPU time was used across threads in 4 seconds of wall time - roughly 8 threads doing work. If `real` is close to `user` divided by thread count, GC scaled well. If `real` is much higher, something is causing contention or serialization.

4. When would you choose Serial GC over Parallel GC?

Single-core machines, very small heaps (under 100MB), or batch workloads where pause times genuinely do not matter. Parallel GC has overhead from thread management and synchronization that is not worth it on a single-core or resource-constrained environment. If you are running a small container with strict memory limits and one CPU, Serial GC is often the better choice.

5. What is promotion failure and how does it relate to Parallel GC?

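Promotion failure happens when an object cannot be promoted from the young to the old generation because the old generation does not have enough free space. In Parallel GC, this triggers a Full GC. It is often caused by objects too large to fit in the Survivor spaces being promoted early, or by an old generation that is already close to full. Because Parallel GC compacts the old generation during a Full GC, fragmentation is rarely the root cause. Sizing the young generation with NewSize/MaxNewSize and tuning SurvivorRatio helps prevent the premature promotion that leads to this scenario.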

6. Why was Parallel GC the default for server workloads for so long?

Through JDK 8, HotSpot selected Parallel GC by default on server-class machines; G1 became the default in JDK 9. Server workloads typically run on multi-core machines and care about throughput. Parallel GC maximizes application throughput by minimizing total GC time, even if individual pauses are longer. For batch processing, ETL, and computation-heavy workloads where pauses are acceptable, Parallel GC often still delivers the best overall performance. The trade-off is longer stop-the-world pauses, which is fine when your SLA is measured in minutes, not milliseconds.

7. What is the relationship between -XX:ParallelGCThreads and available CPU cores?

During a stop-the-world collection the application threads are paused, so GC threads mainly compete with other processes on the host rather than with your own application. The default scales with the CPU count: one thread per CPU up to 8, then 5/8 of each additional CPU. If the JVM shares the machine, or runs in a container whose CPU quota is below the visible core count, set ParallelGCThreads at or below the number of CPUs actually available to it. Setting it above the number of physical cores makes GC threads context-switch heavily and lengthens pauses instead of shortening them.

8. How does UseAdaptiveSizePolicy interact with explicit heap sizing?

When UseAdaptiveSizePolicy is enabled (the default for Parallel GC), the JVM monitors allocation and survival rates and dynamically adjusts generation sizes, survivor space sizes, and the tenuring threshold, within the bounds set by -Xms/-Xmx and any explicit young generation size. Ratio settings such as -XX:SurvivorRatio may not hold while the policy is active. In production, resizing can cause unpredictable pause spikes. Disable adaptive sizing with -XX:-UseAdaptiveSizePolicy if you need deterministic behavior.

9. What does "Throughput = Application Time / (Application Time + GC Time)" mean in practice?

It measures what percentage of total time your application runs versus time spent in GC. If your app runs 95 seconds and GC takes 5 seconds, throughput is 95%. By default Parallel GC aims for roughly 99% throughput (-XX:GCTimeRatio=99, meaning about 1% of total time in GC). A 50% throughput means GC is taking half of your run time, which usually indicates the heap is too small or the allocation rate is extremely high.

10. What causes Full GC to run with Parallel GC and how do you diagnose it?

Full GC triggers when old gen cannot accommodate a promotion from young gen, when System.gc() is called, or when Metaspace is exhausted. In GC logs, the cause appears in parentheses after "Full GC": "Allocation Failure" means an allocation or promotion could not be satisfied, and "Ergonomics" means the collector's own heuristics decided a full collection was needed. Use jstat -gc to monitor old gen capacity (OC) versus used (OU). If OU approaches OC frequently, either increase heap or tune promotion rate with SurvivorRatio and MaxTenuringThreshold.

11. What is the difference between UseParallelGC and UseParallelOldGC?

UseParallelGC enables parallel young generation collection (multi-threaded copying). UseParallelOldGC enables parallel old generation collection (multi-threaded Mark-Compact). On early JVMs, enabling only UseParallelGC gave you a parallel young generation with a serial old generation, a mixed configuration that rarely performs well. Since JDK 7u4, UseParallelGC turns on the parallel old collector as well. The separate UseParallelOldGC flag was deprecated in JDK 14 and removed in JDK 15, so on current JDKs -XX:+UseParallelGC is the only flag you need.

12. Why does setting -Xms different from -Xmx cause performance issues in production?

When -Xms < -Xmx, the JVM grows and shrinks the heap as needed. Resizing decisions are made as part of GC cycles, and a heap that starts small collects more often until it has grown, so both directions can introduce unpredictable pause spikes. The JVM ergonomics for heap growth and shrink rates may not match your workload's allocation patterns, causing oscillation. Setting the two values equal is the usual recommendation for predictable performance in production.
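Whether a heap is still resizing can be observed from inside the JVM. The sketch below prints the heap figures from the standard MemoryMXBean interface; the class name HeapSizingCheck is hypothetical. As an assumption to verify on your own JVM, committed memory sitting below the maximum is taken to mean the heap still has room to grow, which typically happens when -Xms is smaller than -Xmx.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapSizingCheck {
    /** Snapshot of the whole Java heap: init, used, committed, and max sizes in bytes. */
    public static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = heap();
        long mb = 1024 * 1024;
        // getInit() and getMax() return -1 when undefined, which prints as 0 MB here.
        System.out.println("init      = " + heap.getInit() / mb + " MB");
        System.out.println("used      = " + heap.getUsed() / mb + " MB");
        System.out.println("committed = " + heap.getCommitted() / mb + " MB");
        System.out.println("max       = " + heap.getMax() / mb + " MB");
        if (heap.getMax() > 0 && heap.getCommitted() < heap.getMax()) {
            System.out.println("committed < max: the heap can still grow");
        }
    }
}
```

Logging the committed value at intervals shows whether the heap oscillates in size under a real workload.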

13. How does thread local allocation buffers (TLABs) affect Parallel GC performance?

TLABs give each thread a private allocation buffer in Eden, reducing contention on the global allocation path. Threads allocate in their TLAB with minimal synchronization. This matters for any multi-threaded application, whichever collector is in use: without TLABs, every allocating thread would contend on the shared Eden allocation pointer. The JVM handles TLAB sizing automatically, but you can tune with -XX:TLABSize and -XX:+ResizeTLAB.

14. What is the trade-off between heap size and GC frequency in Parallel GC?

Larger heap means fewer GC cycles but longer pauses per cycle (more objects to mark and compact). Smaller heap means more frequent GC cycles but shorter pauses. For Parallel GC targeting throughput, larger heaps usually win because total GC time decreases even if individual pauses are longer. For latency-sensitive workloads, smaller heaps with G1 or ZGC are better because they spread work more evenly.

15. What does "ParallelGCThreads scaling beyond available cores" mean?

GC threads are CPU-bound. If you have 8 physical cores and set 32 GC threads, the OS must time-slice those threads, causing context-switch overhead. In practice, GC threads should not exceed roughly the number of physical cores (not logical cores with hyperthreading), minus 1-2 if other processes share the machine. On hyperthreaded cores, you get partial parallelism but not full scaling - a 16-core machine with 32 logical cores may only need 13-14 GC threads.

16. How does Parallel GC handle big objects and what happens when they exceed Survivor space?

-XX:PretenureSizeThreshold is honored by Serial GC, not by Parallel GC. With Parallel GC, an allocation that does not fit in Eden can be placed directly in the old generation when it is large relative to Eden. This is sometimes the right behavior for large, long-lived objects like caches or buffers. Objects that fit in Eden but exceed the Survivor space are promoted to the old generation at the next minor GC if they are still live. Sizing the young generation so that short-lived large objects die in Eden keeps them from flooding the old generation.

17. What is the relationship between GC threads and application throughput?

Throughput = Application Time / (Application Time + GC Time). More GC threads reduce GC time for the same amount of work but add synchronization overhead and CPU contention. With too many GC threads, context switching and cache contention reduce efficiency. With too few, GC takes longer, eating into application time. The sweet spot depends on the number of physical cores and whether the workload is CPU-bound or I/O-bound.

18. How does UseParallelGC interact with G1 or other collectors that might be configured simultaneously?

On current JDKs, UseParallelGC selects Parallel GC for both generations: a parallel copying collector for the young generation and parallel Mark-Compact for the old generation. You cannot mix collectors. HotSpot rejects conflicting selections such as -XX:+UseParallelGC together with -XX:+UseG1GC and refuses to start. The correct approach is to use either Parallel GC or switch entirely to G1 or another collector. Parallel GC and G1 have incompatible heap layouts and cannot be combined.

19. What happens when the JVM runs out of heap memory with Parallel GC?

When heap is exhausted with Parallel GC, the JVM triggers a full stop-the-world GC. If full GC still cannot free enough memory (because the live set plus the new allocation exceeds the heap), an OutOfMemoryError is thrown with the "Java heap space" message. If the JVM spends nearly all of its time in GC while recovering very little memory, Parallel GC can also throw OutOfMemoryError with the message "GC overhead limit exceeded". Frequent OOM despite reasonable heap size indicates either memory leaks, objects being promoted too aggressively to old gen, or allocation rate exceeding what the heap can handle.

20. Why might an application using Parallel GC show high CPU usage even when throughput appears acceptable?

High CPU with "acceptable" throughput often means GC threads are consuming CPU that other work could use. If GC logs show real time close to user time divided by the number of GC threads, GC is scaling well. If real time is much higher than that, GC is not scaling and threads are contending for CPU or waiting on each other. Another cause is an excessive allocation rate requiring frequent minor GC or Full GC. Check for memory leaks causing frequent Full GC, premature promotion flooding old gen, or TLAB sizing issues causing allocation contention.

Conclusion

Serial GC uses one thread for all GC work; Parallel GC uses multiple threads for higher throughput. Both are stop-the-world collectors that freeze the application during collections. Parallel GC was the default on server-class machines through JDK 8 and remains the throughput-first choice on multi-core machines, at the cost of longer individual pauses. Serial GC suits small heaps and single-core environments. For low-latency needs, look to G1, ZGC, or Shenandoah instead.
