Serial vs Parallel GC: Throughput-Focused Collectors
A practical guide to Serial and Parallel garbage collectors - how they work, when to use each, throughput trade-offs, and production tuning tips.
Serial and Parallel are the oldest garbage collectors in the JVM. They share a philosophy: maximize how much work the application does versus how much time GC steals. They are not flashy. No sub-millisecond pauses here. What they deliver is raw throughput - the highest percentage of CPU time possible going to your application instead of the garbage collector.
This covers how each works, when to use one versus the other, and the tuning flags that actually matter.
Introduction
The key difference between them is that Serial uses a single GC thread, while Parallel uses multiple GC threads to do the same stop-the-world work (marking, copying, and compaction) in parallel. Neither one runs concurrently with the application. This makes Parallel significantly faster on multi-core systems for throughput-bound workloads. Knowing when to reach for one or the other, or when to skip both in favor of a low-pause collector like G1, ZGC, or Shenandoah, requires understanding the trade-off between throughput, latency, and footprint. This post walks through how each collector works internally, the JVM flags that control them, and the concrete scenarios where each is the right choice.
When to Use This Knowledge
Use when:
- Building batch processing, ETL, or data pipeline applications where throughput matters most
- Running on single-core or resource-constrained environments
- Debugging why your application is not utilizing CPU fully during GC
- Choosing a GC strategy before reaching for the low-latency collectors (G1, ZGC, Shenandoah)
Do not use when:
- Your application requires low pause times (GC pauses of hundreds of milliseconds or more are unacceptable)
- You are running latency-sensitive, reactive workloads on machines with many cores
- You have strict latency SLAs that GC pauses would violate
When NOT to Use This Knowledge
If your batch pipeline tolerates multi-second pauses and the share of CPU time spent in GC is within budget, tuning ParallelGCThreads or SurvivorRatio is wasted effort. The default JVM ergonomics handle these scenarios fine without intervention.
The trap is thinking you need to master GC algorithms before you have a GC problem. Profilers and monitoring should guide your tuning. If your GC logs show healthy collection frequencies and pause times within your SLA thresholds, leave the defaults alone. Premature tuning based on benchmarks rather than production metrics leads to brittle configurations that are hard to maintain.
For applications needing sub-200ms response times under load, Serial and Parallel GC are the wrong tool. Evaluate G1, ZGC, or Shenandoah instead. Spending time on Parallel GC internals for a low-latency use case is a dead end.
The Throughput Goal
Throughput-focused collectors optimize for this equation:
Throughput = Application Time / (Application Time + GC Time)
If your application runs for 1000ms and spends 50ms in GC, your throughput is 1000 / 1050 = 95.2%. Serial and Parallel collectors push this number as high as possible. They do not care about individual pause times as long as the overall ratio looks good.
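The formula is simple enough to check by hand. A minimal Java sketch that reproduces the 1000 ms / 50 ms example (the class and method names are illustrative, not from any library):

```java
final class ThroughputMath {
    // Throughput = Application Time / (Application Time + GC Time)
    static double throughput(double appMillis, double gcMillis) {
        if (appMillis < 0 || gcMillis < 0 || appMillis + gcMillis == 0) {
            throw new IllegalArgumentException("times must be non-negative and not both zero");
        }
        return appMillis / (appMillis + gcMillis);
    }

    public static void main(String[] args) {
        double t = throughput(1000, 50);
        System.out.printf("Throughput: %.1f%%%n", t * 100); // prints "Throughput: 95.2%"
    }
}
```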
Serial GC
Serial GC uses a single thread for all garbage collection work, and it pauses every application thread (a Stop-The-World pause) while it runs.
```mermaid
graph TB
  subgraph SerialGC["Serial GC - Single Thread"]
    direction TB
    APP["Application Thread PAUSED"] --> STW1["Mark Phase<br/>(single thread)"]
    STW1 --> STW2["Sweep/Compact<br/>(single thread)"]
    STW2 --> RESUME["Resume Application"]
  end
```
How It Works
- Young generation uses a stop-the-world copying collector (similar to the basic Copying algorithm)
- Old generation uses a stop-the-world Mark-Compact collector
- Everything runs on a single thread
- The world stops while GC runs
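To make the young-generation copying step concrete, here is a deliberately simplified, single-threaded toy model (written for illustration, not HotSpot code; needs Java 16+ for records). Objects that survive a minor GC are copied from Eden and the from-survivor space into the to-survivor space and their age is incremented. Objects that reach a tenuring threshold, or that no longer fit in the to-space, are promoted to the old generation. Liveness is given as a precomputed flag instead of being traced from GC roots.

```java
import java.util.ArrayList;
import java.util.List;

final class ToyCopyingMinorGc {
    // Hypothetical object model: liveness is precomputed instead of traced from roots.
    record Obj(String name, int sizeKb, int age, boolean live) {}

    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>();
    final List<Obj> oldGen = new ArrayList<>();
    final int survivorCapacityKb;
    final int tenuringThreshold;

    ToyCopyingMinorGc(int survivorCapacityKb, int tenuringThreshold) {
        this.survivorCapacityKb = survivorCapacityKb;
        this.tenuringThreshold = tenuringThreshold;
    }

    // One stop-the-world minor GC on a single thread, in the spirit of Serial GC.
    void minorGc() {
        List<Obj> toSurvivor = new ArrayList<>();
        int usedKb = 0;
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (Obj o : candidates) {
            if (!o.live()) continue;                         // dead objects are never copied
            Obj aged = new Obj(o.name(), o.sizeKb(), o.age() + 1, true);
            boolean fits = usedKb + aged.sizeKb() <= survivorCapacityKb;
            if (aged.age() >= tenuringThreshold || !fits) {
                oldGen.add(aged);                            // promote (tenure) to old gen
            } else {
                toSurvivor.add(aged);                        // copy into to-space
                usedKb += aged.sizeKb();
            }
        }
        eden.clear();                                        // Eden is empty after every minor GC
        fromSurvivor = toSurvivor;                           // survivor spaces swap roles
    }

    public static void main(String[] args) {
        ToyCopyingMinorGc gc = new ToyCopyingMinorGc(64, 2);
        gc.eden.add(new Obj("session", 16, 0, true));
        gc.eden.add(new Obj("tempBuffer", 128, 0, true));    // too big for the survivor space
        gc.eden.add(new Obj("garbage", 32, 0, false));
        gc.minorGc();
        System.out.println("survivor=" + gc.fromSurvivor.size() + " old=" + gc.oldGen.size());
        gc.minorGc();                                        // "session" reaches age 2 and is promoted
        System.out.println("survivor=" + gc.fromSurvivor.size() + " old=" + gc.oldGen.size());
    }
}
```

The cost of each minor GC is proportional to the live objects copied, not to the garbage left behind, which is why copying suits a young generation where most objects die young.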
When to Use Serial GC
Serial GC is viable in specific scenarios:
- Very small heaps (under 100MB) - there is too little work to repay the overhead of coordinating multiple GC threads
- Single-core machines - multiple threads would compete for CPU anyway
- Batch jobs where pauses do not matter - ETL pipelines, nightly batch processing
- Resource-constrained containers - avoids thread creation overhead
JVM Flags for Serial GC
```
-XX:+UseSerialGC
-Xms512m -Xmx512m   # Small, stable heap
```
Key Characteristics
| Aspect | Behavior |
|---|---|
| GC threads | 1 |
| Young gen collector | Copying (stop-the-world) |
| Old gen collector | Mark-Compact (stop-the-world) |
| Throughput | Moderate - limited by single thread |
| Memory footprint | Lowest |
| CPU overhead | Minimal |
| Pause times | Longest per collection |
Parallel GC
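Parallel GC (also called the Throughput collector; its old-generation half is known as Parallel Old) uses multiple threads for GC work. It was the default collector on server-class machines through JDK 8; since JDK 9, G1 is the default, so Parallel GC must now be selected explicitly with -XX:+UseParallelGC.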
```mermaid
graph TB
  subgraph ParallelGC["Parallel GC - Multiple Threads"]
    direction TB
    APP["Application Threads PAUSED"] --> PT["Parallel Mark Phase<br/>(N threads)"]
    PT --> ST["Parallel Sweep/Compact<br/>(N threads)"]
    ST --> RESUME["Resume Application"]
  end
```
How It Works
- Young generation uses a multi-threaded stop-the-world copying collector
- Old generation uses a multi-threaded stop-the-world Mark-Compact collector
- Number of GC threads controlled by `-XX:ParallelGCThreads=N`
- Default thread count: one thread per CPU when the JVM sees 8 or fewer CPUs; above 8, roughly 8 plus 5/8 of the remaining CPUs (for example, 23 threads on a 32-CPU machine)
- The world stops while GC runs on multiple threads
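The default is worth computing rather than guessing. To my knowledge, HotSpot's ergonomics use every CPU up to 8 and then add 5 threads for every 8 CPUs beyond that; the sketch below reproduces that rule as an assumption (the helper name is made up, and it ignores platform-specific variants and how the JVM counts CPUs inside containers). Compare its result with the `ParallelGCThreads` value printed by `java -XX:+UseParallelGC -XX:+PrintFlagsFinal -version` on your machine.

```java
final class DefaultParallelGcThreads {
    // Assumed ergonomic rule: ncpus if ncpus <= 8, else 8 + (ncpus - 8) * 5 / 8.
    static int defaultThreads(int ncpus) {
        if (ncpus <= 8) {
            return ncpus;
        }
        return 8 + (ncpus - 8) * 5 / 8;   // integer division rounds down
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " CPUs -> " + defaultThreads(cpus) + " GC threads (estimated)");
        for (int n : new int[] {1, 4, 8, 16, 32, 64}) {
            System.out.println(n + " CPUs -> " + defaultThreads(n));
        }
    }
}
```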
When to Use Parallel GC
Parallel GC shines when:
- CPU is abundant - many cores available for GC work
- Throughput is the primary metric - batch processing, computation-heavy workloads
- Pause times in the seconds range are acceptable - young-generation pauses are usually well under a second, but full GCs on large heaps can take several seconds
- Memory is plentiful - parallel GC benefits from larger heaps where work can be divided
JVM Flags for Parallel GC
```
-XX:+UseParallelGC           # Parallel young gen (and parallel old gen, implied on JDK 8+)
-XX:+UseParallelOldGC        # Redundant on JDK 8+; deprecated in JDK 14, obsolete since JDK 15
-Xms4g -Xmx4g                # Heap sized for your workload
-XX:ParallelGCThreads=16     # Explicit thread count (optional)
-XX:+UseAdaptiveSizePolicy   # Enable adaptive heap sizing (default ON)
```
Key Characteristics
| Aspect | Behavior |
|---|---|
| GC threads | Multiple (CPU-dependent) |
| Young gen collector | Parallel Copying (stop-the-world) |
| Old gen collector | Parallel Mark-Compact (stop-the-world) |
| Throughput | Typically the highest of the HotSpot collectors |
| Memory footprint | Standard |
| CPU overhead | Higher than Serial |
| Pause times | Shorter than Serial per unit of work, but still Stop-The-World |
Comparing Serial and Parallel
| Aspect | Serial GC | Parallel GC |
|---|---|---|
| Threads | 1 | Many |
| Throughput | Lower | Highest |
| Pause time per GC | Longer | Shorter per unit of work |
| CPU overhead | Minimal | Higher |
| Memory overhead | Minimal | Slight (thread stacks) |
| Best for | Small heaps, single core | Large heaps, multi-core batch |
| Default for | Client VM and small (non-server-class) machines | Server-class machines through JDK 8 (G1 since JDK 9) |
Production Failure Scenarios
1. Full GC Freezes in Parallel GC
Symptom: Application freezes for several seconds during old generation collection. GC logs show Full GC with long pause times.
Cause: Even with multiple threads, a full heap Mark-Compact is expensive. With many live objects, compaction can take seconds.
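Solution: Profile first to confirm GC is the actual cause. If old generation occupancy barely drops after each Full GC (as in the log below, where ParOldGen stays at 4095M), the live set nearly fills the heap: look for a leak or give the heap more headroom, because shrinking the heap would only make this worse. If Full GCs are long but reclaim plenty of memory, consider switching to G1 or ZGC for shorter pauses.
```
# Example GC log showing long Full GC
[Full GC (Allocation Failure) [PSYoungGen: 512K->0K(1536K)] [ParOldGen: 4095M->4095M(4096M)] 4.1234567secs]
```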
2. Excessive GC Thread Contention
Symptom: High CPU usage during GC but low overall throughput. GC threads fighting for CPU on a loaded system.
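Cause: More GC threads (-XX:ParallelGCThreads) than CPUs actually available to the JVM, or several JVMs and other processes on one host whose threads compete for the same cores. Because Parallel GC is stop-the-world, its threads do not compete with the same JVM's application threads; the contention comes from everything else on the machine.
Solution:
```
# Limit GC threads to avoid contention
-XX:ParallelGCThreads=8
# Or use fewer threads than the CPUs available to the JVM
# Rule of thumb: leave 1-2 cores for other processes sharing the host
```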
3. Adaptive Size Policy Misfire
Symptom: Heap resizing causes performance instability - throughput spikes and dips unpredictably.
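Cause: -XX:+UseAdaptiveSizePolicy (on by default) automatically resizes the young generation, the survivor spaces, and the old generation based on allocation behavior and the collector's pause-time and throughput goals, which can cause resizing during critical production periods.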
Solution:
```
# Disable adaptive sizing
-XX:-UseAdaptiveSizePolicy
# Or set explicit ratios
-XX:NewRatio=2
-XX:SurvivorRatio=8
```
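Before changing these flags, it helps to confirm what the running JVM actually uses, since flags left at their defaults never appear on the command line. A minimal sketch using the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean` (available on HotSpot-based JDKs; the list of flags is just the ones discussed in this post, and the class name is illustrative):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

final class EffectiveGcFlags {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        String[] flags = {"UseParallelGC", "UseAdaptiveSizePolicy", "NewRatio",
                          "SurvivorRatio", "ParallelGCThreads"};
        for (String flag : flags) {
            try {
                VMOption option = hotspot.getVMOption(flag);
                // getOrigin() reports where the value came from: DEFAULT, VM_CREATION (command line), ERGONOMIC, ...
                System.out.println(flag + " = " + option.getValue() + " (" + option.getOrigin() + ")");
            } catch (IllegalArgumentException e) {
                System.out.println(flag + " is not recognized by this JVM");
            }
        }
    }
}
```

Running it with and without `-XX:-UseAdaptiveSizePolicy` shows the origin switch from `DEFAULT` to `VM_CREATION`, which is a quick way to verify a deployment picked up the intended settings.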
Implementation Snippets
Enabling and Configuring Parallel GC
```bash
# Basic Parallel GC configuration (JDK 9+ unified logging)
java -XX:+UseParallelGC \
     -Xms4g -Xmx4g \
     -XX:ParallelGCThreads=16 \
     -Xlog:gc*:file=gc.log:time,uptime,level,tags \
     -jar myapp.jar

# On JDK 8, replace the -Xlog option with:
#   -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log
```
Reading Parallel GC Logs
Parallel GC logs show which generations were collected and how long each pause took. In the JDK 8 `-XX:+PrintGCDetails` format:
```
[Full GC [PSYoungGen: 512K->0K(1536K)] [ParOldGen: 4095M->4095M(4096M)] 4.1234567secs]
[Times: user=32.10 sys=1.20, real=4.12 secs]
```
- `user=` is CPU time summed across all GC threads (32 seconds of CPU work done in about 4 seconds of wall time means roughly 8 threads were busy)
- `sys=` is time spent in kernel calls
- `real=` is actual wall-clock time (should be close to user/N for N threads)
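Doing this arithmetic by hand for every pause gets tedious. A small sketch (Java 16+ for records; class and method names are illustrative) that pulls `user`, `sys`, and `real` out of a JDK 8-style `[Times: ...]` line and reports the effective parallelism, user / real. The regular expression is written against the sample line above; other log formats will need a different pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcTimesLine {
    // Matches e.g. "[Times: user=32.10 sys=1.20, real=4.12 secs]"
    private static final Pattern TIMES = Pattern.compile(
            "user=([0-9.]+)\\s+sys=([0-9.]+),\\s*real=([0-9.]+)\\s*secs");

    record Times(double user, double sys, double real) {
        // CPU seconds of GC work per wall-clock second: roughly how many threads were busy.
        double effectiveParallelism() {
            return real == 0 ? 0 : user / real;
        }
    }

    static Times parse(String line) {
        Matcher m = TIMES.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no Times section in: " + line);
        }
        return new Times(Double.parseDouble(m.group(1)),
                         Double.parseDouble(m.group(2)),
                         Double.parseDouble(m.group(3)));
    }

    public static void main(String[] args) {
        Times t = parse("[Times: user=32.10 sys=1.20, real=4.12 secs]");
        int gcThreads = 8; // assumption: substitute your ParallelGCThreads value
        System.out.printf("parallelism=%.1f of %d threads%n", t.effectiveParallelism(), gcThreads);
    }
}
```

If the reported parallelism sits well below the configured thread count across many pauses, the GC threads are not getting the CPUs they were promised.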
Checking Active GC Algorithm
```java
import java.lang.management.*;
import java.util.*;

public class WhichGC {
    public static void main(String[] args) {
        List<GarbageCollectorMXBean> gcs = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : gcs) {
            System.out.println("Collector: " + gc.getName());
            System.out.println("  Pools: " + Arrays.toString(gc.getMemoryPoolNames()));
        }
        // Print the flags passed on the command line (e.g. -XX:-UseAdaptiveSizePolicy).
        // Flags left at their defaults do not appear in this list.
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        System.out.println("Input args: " + runtime.getInputArguments());
    }
}
```
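The printed collector names identify the algorithm. On HotSpot, Serial GC reports beans named `Copy` and `MarkSweepCompact`, and Parallel GC reports `PS Scavenge` and `PS MarkSweep`. A hypothetical helper that maps those names to a collector family (anything else, such as G1's beans, falls through to "other"):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

final class CollectorKind {
    // Bean names as reported by HotSpot for the two throughput collectors.
    static String classify(List<String> beanNames) {
        if (beanNames.contains("PS Scavenge") || beanNames.contains("PS MarkSweep")) {
            return "Parallel";
        }
        if (beanNames.contains("Copy") || beanNames.contains("MarkSweepCompact")) {
            return "Serial";
        }
        return "other";
    }

    public static void main(String[] args) {
        List<String> names = ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
        System.out.println(names + " -> " + classify(names));
    }
}
```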
Observability Checklist
- Enable GC logging: `-Xlog:gc*:file=gc.log` (JDK 9+)
- Parse GC logs for pause time distribution, and compare `real=` with `user=` to spot contention
- Monitor `jstat -gc <pid>` for FGC (Full GC count) and FGCT (Full GC time)
- Track heap occupancy before and after Full GC to understand live set size
- Watch CPU usage during GC events - high sys time may indicate OS-level overhead
- Use `-XX:+PrintGCApplicationConcurrentTime` (JDK 8) or `-Xlog:safepoint` (JDK 9+) to measure time between GC pauses
- Consider NMT (Native Memory Tracking): start with `-XX:NativeMemoryTracking=summary`, then inspect with `jcmd <pid> VM.native_memory summary`
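`jstat -gc <pid> 1000` samples these counters from outside the process; the same collection counts and accumulated collection times are also exposed in-process through `GarbageCollectorMXBean`. A minimal sampler sketch (the class name, one-second interval, and five samples are arbitrary choices) that prints how many collections ran in each interval and what share of the interval they took:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

final class GcSampler {
    // Totals summed over all collectors: {collection count, accumulated collection time in ms}.
    static long[] totals() {
        long count = 0;
        long timeMs = 0;
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : beans) {
            // Both getters return -1 when the value is unavailable; treat that as zero.
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMs = 1000;
        long[] previous = totals();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(intervalMs);
            long[] current = totals();
            long gcs = current[0] - previous[0];
            long gcMs = current[1] - previous[1];
            System.out.printf("collections=%d gcTime=%dms (%.1f%% of interval)%n",
                    gcs, gcMs, 100.0 * gcMs / intervalMs);
            previous = current;
        }
    }
}
```

Summing across collectors keeps the sketch short; printing each bean separately (for Parallel GC, `PS MarkSweep` is the full-GC collector) gets closer to jstat's separate FGC and FGCT columns.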
Security Notes
- GC Logs can reveal application allocation patterns and memory behavior - protect them
- JMX/Management Interface access to GC beans should be restricted in production
- Heap Dumps after OOM contain full application state - handle with care
- NMT output can reveal native memory allocation details useful for attacks
Common Pitfalls / Anti-Patterns
| Pitfall | What happens | Fix |
|---|---|---|
| Too many GC threads | GC threads oversubscribe the CPUs available to the JVM or fight other processes on the host | Set -XX:ParallelGCThreads explicitly |
| AdaptiveSizePolicy on in production | Unpredictable heap resizing | Disable with -XX:-UseAdaptiveSizePolicy |
| Setting heap too large | Fewer but longer Full GCs | Tune based on live set size |
| Setting heap too small | Constant GC thrashing | Profile to find working set size |
| Ignoring old gen occupancy | Promotion failure causes Full GC | Monitor promotion rate with jstat |
Quick Recap Checklist
- Serial GC = single thread, lowest throughput, lowest overhead
- Parallel GC = multiple threads, highest throughput, default on server-class machines through JDK 8
- Throughput = Application Time / (Application Time + GC Time)
- Parallelism helps when CPU cores are plentiful and work is divisible
- Both are Stop-The-World collectors - pauses freeze the entire application
- `-XX:ParallelGCThreads` controls thread count - tune based on available cores
- `-XX:+UseAdaptiveSizePolicy` auto-tunes heap sizes but can cause instability
- For low-latency requirements, look at G1, ZGC, or Shenandoah
Interview Questions
Serial GC uses one thread for all garbage collection work. Parallel GC uses multiple threads. On a multi-core machine, Parallel GC finishes GC work faster because it divides the work across threads, but both are stop-the-world collectors - the application freezes while either one runs. Parallel GC targets throughput; Serial GC targets simplicity and low overhead.
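On machines with 8 or fewer CPUs, the JVM defaults to one GC thread per CPU; above that, it uses roughly 8 plus 5/8 of the remaining CPUs, so a 32-CPU machine gets 23 GC threads. Because Parallel GC is stop-the-world, these threads do not compete with the same JVM's application threads, but they can oversubscribe the CPUs actually available to the process or compete with other JVMs and processes on the same host. A common rule of thumb is to leave 1-2 cores of headroom for other work on the host, but this varies by workload. If your GC logs show high system time, or `real` time that stops shrinking as you add threads, you probably have too many threads.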
`user=` is total CPU time across all threads. `real=` is wall clock time. If you see `user=32.10 real=4.12`, that means 32 seconds of CPU time was used across threads in 4 seconds of wall time - roughly 8 threads doing work. If `real` is close to `user` divided by thread count, GC scaled well. If `real` is much higher, something is causing contention or serialization.
Single-core machines, very small heaps (under 100MB), or batch workloads where pause times genuinely do not matter. Parallel GC has overhead from thread management and synchronization that is not worth it on a single-core or resource-constrained environment. If you are running a small container with strict memory limits and one CPU, Serial GC is often the better choice.
Promotion failure happens when an object cannot be promoted from the young to the old generation because the old generation does not have enough free space. In Parallel GC, this triggers a Full GC. Because Parallel GC compacts the old generation, fragmentation is rarely the cause; the usual culprits are an old generation that is simply too full, or survivor spaces that overflow so that short-lived objects are promoted prematurely. Tuning SurvivorRatio and NewSize/MaxNewSize helps prevent the premature promotion that leads to this scenario.
Server workloads typically run on multi-core machines and care about throughput, which is why Parallel GC was the default on server-class machines through JDK 8 (G1 replaced it in JDK 9). Parallel GC maximizes application throughput by minimizing total GC time, even if individual pauses are longer. For batch processing, ETL, and computation-heavy workloads where pauses are acceptable, Parallel GC delivers the best overall performance. The trade-off is longer stop-the-world pauses, which is fine when your SLA is measured in minutes, not milliseconds.
Because Parallel GC stops all application threads during a collection, GC threads do not compete with the same JVM's application threads; they compete with everything else on the machine: other JVMs, sidecar processes, and the OS. If several JVMs share a 16-core host and each runs 16 GC threads, overlapping pauses oversubscribe the cores. A common recommendation is to set ParallelGCThreads to (available cores - 1) or (available cores - 2), leaving headroom for other processes. On a 32-core machine, 30 threads may still be aggressive, because GC threads context-switch heavily when they outnumber the physical cores they can actually get.
When UseAdaptiveSizePolicy is enabled (the default), the JVM monitors allocation and survival rates and dynamically adjusts the sizes of Eden, the survivor spaces, and the old generation - even if you set explicit values with -Xmn, -XX:NewRatio, etc. The policy can override your explicit settings within bounds. In production, this causes unpredictable pause spikes when resizing occurs. Disable adaptive sizing with -XX:-UseAdaptiveSizePolicy if you need deterministic behavior.
It measures what percentage of total time your application runs versus time spent in GC. If your app runs 95 seconds and GC takes 5 seconds, throughput is 95%. Parallel GC's default goal is 99% throughput: its adaptive sizing targets -XX:GCTimeRatio=99, meaning at most 1/(1+99) = 1% of time in GC. A 50% throughput means GC is consuming half of the elapsed time, which usually indicates the heap is too small or the allocation rate is extremely high.
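Parallel GC expresses its throughput goal through `-XX:GCTimeRatio=N` (99 by default for Parallel GC): the collector aims to spend at most 1 / (1 + N) of total time in GC. A small sketch of that conversion (`GCTimeRatio` is a real flag; the class and method names are illustrative):

```java
final class GcTimeRatioGoal {
    // Throughput goal implied by GCTimeRatio: GC time <= 1 / (1 + GCTimeRatio) of total time.
    static double maxGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        for (int ratio : new int[] {99, 19, 9}) {
            double gc = maxGcFraction(ratio);
            System.out.printf("GCTimeRatio=%d -> max %.1f%% GC, goal %.1f%% throughput%n",
                    ratio, gc * 100, (1 - gc) * 100);
        }
    }
}
```

Lowering the ratio (for example to 19, a 95% goal) tells adaptive sizing it may spend more time in GC, which typically lets it keep the heap smaller.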
Full GC triggers when old gen cannot accommodate a promotion from young gen, when System.gc() is called, or when Metaspace is exhausted. In GC logs, look for "Full GC (Allocation Failure)" which indicates promotion failure. Use jstat -gc to monitor old gen capacity (OC) versus used (OU). If OU approaches OC frequently, either increase heap or tune promotion rate with SurvivorRatio and MaxTenuringThreshold.
UseParallelGC selects the Parallel collector: multi-threaded copying for the young generation and, on JDK 8 and later, multi-threaded Mark-Compact for the old generation as well. UseParallelOldGC historically controlled the parallel old-generation collector separately, and turning it off (-XX:+UseParallelGC -XX:-UseParallelOldGC) gave you parallel young gen with a serial old gen, a mixed configuration that rarely performs well. That combination was deprecated in JDK 14 (JEP 366) and the UseParallelOldGC flag is obsolete since JDK 15, so on modern JDKs -XX:+UseParallelGC alone is the correct setting.
When -Xms < -Xmx, the JVM grows and shrinks the heap at the end of GC cycles based on how much free space remains. While the heap is still small, collections happen more often, and each resize commits or uncommits memory, which can add to pause times. The JVM ergonomics for heap growth and shrink rates may not match your workload's allocation patterns, causing oscillation. Set them equal in production for predictable performance.
TLABs give each thread a private allocation buffer in Eden, reducing contention on the global allocation path. A thread allocates inside its TLAB by bumping a pointer, with no synchronization; only refilling the TLAB touches shared state. This matters for every collector, Parallel GC included, whenever many application threads allocate rapidly: without TLABs, every allocation would compete to atomically update the same shared Eden top pointer. The JVM handles TLAB sizing automatically, but you can tune with -XX:TLABSize and -XX:+ResizeTLAB.
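The TLAB idea fits in a few lines: a shared Eden top pointer that threads advance with one atomic operation only when they need a new buffer, and a thread-local bump pointer for everything else. This is a toy model with made-up sizes and names (addresses are plain longs, nothing is really allocated), not HotSpot's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

final class ToyTlab {
    static final long EDEN_SIZE = 1 << 20;               // hypothetical 1 MiB Eden
    static final long TLAB_SIZE = 4096;                  // hypothetical fixed TLAB size
    static final AtomicLong edenTop = new AtomicLong(0); // shared, contended pointer
    static final ThreadLocal<ToyTlab> TLAB = ThreadLocal.withInitial(ToyTlab::new);

    // This thread's private slice of Eden is [top, end).
    long top = 0;
    long end = 0;

    // Returns the "address" of the new object, or -1 when Eden is full (a real JVM would GC).
    long allocate(long size) {
        if (top + size <= end) {                         // fast path: thread-private bump
            long addr = top;
            top += size;
            return addr;
        }
        if (size > TLAB_SIZE) {                          // too big for a TLAB: use shared Eden
            long addr = edenTop.getAndAdd(size);
            return addr + size <= EDEN_SIZE ? addr : -1;
        }
        long start = edenTop.getAndAdd(TLAB_SIZE);       // refill: one atomic op per TLAB
        if (start + TLAB_SIZE > EDEN_SIZE) {
            return -1;                                   // Eden exhausted: time for a minor GC
        }
        top = start;                                     // unused tail of the old TLAB is wasted
        end = start + TLAB_SIZE;
        long addr = top;
        top += size;
        return addr;
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable worker = () -> {
            for (int i = 0; i < 1000; i++) {
                TLAB.get().allocate(64);
            }
        };
        Thread a = new Thread(worker);
        Thread b = new Thread(worker);
        a.start();
        b.start();
        a.join();
        b.join();
        // Each thread needs 16 refills of 4096 bytes for 1000 x 64-byte objects.
        System.out.println("Eden handed out as TLABs: " + edenTop.get() + " bytes");
    }
}
```

Each thread touches the shared pointer 16 times instead of 1000, which is the whole point: contention scales with refills, not with allocations.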
Larger heap means fewer GC cycles but longer pauses per cycle (more objects to mark and compact). Smaller heap means more frequent GC cycles but shorter pauses. For Parallel GC targeting throughput, larger heaps usually win because total GC time decreases even if individual pauses are longer. For latency-sensitive workloads, smaller heaps with G1 or ZGC are better because they spread work more evenly.
GC threads are CPU-bound. If you have 8 physical cores and set 32 GC threads, the OS must time-slice those threads, causing context-switch overhead. In practice, GC threads should not exceed roughly the number of physical cores (not logical cores with hyperthreading), minus 1-2 if other processes share the machine. On hyperthreaded cores, you get partial parallelism but not full scaling - a 16-core machine with 32 logical cores may only need 13-14 GC threads.
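-XX:PretenureSizeThreshold (default 0, meaning disabled) sends objects larger than the given size straight to the old generation, bypassing the young generation. This is sometimes the right behavior for large, long-lived objects like caches or connection pools. Note that Serial GC honors this flag but Parallel GC ignores it; Parallel GC places objects directly in the old generation only in special cases, such as an object too large for the young generation. Separately, objects that survive a minor GC but do not fit in the to-survivor space are promoted immediately, however young they are. Sizing survivor spaces for your live young data (and, on Serial GC, setting PretenureSizeThreshold) keeps such objects from flooding the old generation before they have had a chance to die.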
Throughput = Application Time / (Application Time + GC Time). More GC threads reduce GC time for the same amount of work but add synchronization overhead and CPU contention. With too many GC threads, context switching and cache contention reduce efficiency. With too few, GC takes longer, eating into application time. The sweet spot depends on the number of physical cores and whether the workload is CPU-bound or I/O-bound.
On modern JDKs, UseParallelGC selects the Parallel collector for both generations: young gen uses parallel copying and old gen uses parallel Mark-Compact; the separate UseParallelOldGC flag is obsolete. You cannot mix collectors: enabling UseParallelGC together with UseG1GC makes the JVM refuse to start with an error about conflicting or multiple collectors. Parallel GC and G1 have incompatible heap layouts (contiguous generations versus regions), so choose one collector for the whole heap.
When heap is exhausted with Parallel GC, the JVM triggers a full stop-the-world GC. If full GC still cannot free enough memory (because the live set plus the new allocation exceeds the heap), an OutOfMemoryError is thrown with the message "Java heap space". Parallel GC can also throw OutOfMemoryError with "GC overhead limit exceeded" when it spends nearly all of its time in GC while recovering very little memory (controlled by -XX:+UseGCOverheadLimit, on by default). Frequent OOM despite reasonable heap size indicates either memory leaks, objects being promoted too aggressively to old gen, or an allocation rate exceeding what the heap can handle.
High CPU with "acceptable" throughput often means GC threads are burning CPU without proportional progress. Compare `user` and `real` in the GC logs: if `real` is close to `user` divided by the number of GC threads, GC is scaling well; if `real` stays much higher than that, the threads are not running in parallel and are fighting for CPU. Another cause is an excessive allocation rate requiring frequent minor GC or Full GC. Check for memory leaks causing frequent Full GC, premature promotion flooding old gen, or TLAB sizing issues causing allocation contention.
Conclusion
Serial GC uses one thread for all GC work; Parallel GC uses multiple threads for higher throughput. Both are stop-the-world collectors that freeze the application during collections. Parallel GC, the default on server-class machines through JDK 8, maximizes throughput on multi-core machines at the cost of longer individual pauses. Serial GC suits small heaps and single-core environments. For low-latency needs, look to G1, ZGC, or Shenandoah instead.