HashSet in Java

Learn HashSet: unique element collections backed by HashMap, contains and add operations, and the internal ELEMENT object pattern.

published: reading time: 13 min read author: GeekWorkBench

HashSet in Java

HashSet is Java’s implementation of a set — a collection that stores only unique elements. Under the hood, it is a thin wrapper around a HashMap that uses a shared sentinel object as the value for every key (the element itself). This design gives HashSet all the O(1) performance characteristics of HashMap.

Introduction

HashSet solves a simple problem: you need a collection of unique elements with fast membership checks. Behind the scenes, it is a clever hack — a HashMap where the element becomes the key and a shared dummy object (PRESENT) becomes the value. Since map keys are unique by definition, you get deduplication for free. The O(1) add() and contains() operations come directly from HashMap’s implementation.

The implication is important: HashSet inherits every behavioral quirk of HashMap. The element’s hashCode() must be stable after insertion or the element becomes unfindable. The set is not thread-safe. Iteration order is undefined — HashSet makes no promises about the sequence in which elements are visited, unlike LinkedHashSet which preserves insertion order. And because HashSet is just a thin wrapper, mutating an element in place can silently break the set’s internal consistency without any warning.

This post covers how HashSet delegates to HashMap, the PRESENT sentinel pattern that makes this work, the scenarios where HashSet is the right tool (deduplication, fast lookups, set algebra operations), and when to choose TreeSet (sorted) or LinkedHashSet (insertion order) instead.

When to Use HashSet

Use HashSet when:

  • You need to store a collection of unique elements
  • You need O(1) membership checks (contains())
  • You do not need to store duplicates or maintain order
  • You want fast add, remove, and clear operations

Do not use HashSet when:

  • You need sorted elements (use TreeSet)
  • You need insertion-order iteration (use LinkedHashSet)
  • You need to maintain counts for duplicate elements (use Map<E, Integer> or Multiset)
  • You need thread-safe unique collections (use ConcurrentHashSet or Collections.synchronizedSet())

Internal Structure

HashSet internally uses a HashMap<E, Object> where the element is stored as the key and a shared PRESENT object is used as the value:

private transient HashMap<E, Object> map;
private static final Object PRESENT = new Object();

// add() calls map.put(element, PRESENT) — PRESENT is never used
public boolean add(E e) {
    return map.put(e, PRESENT) == null;  // null means key was absent (new element)
}

Mermaid Diagram: HashSet Internals

graph TD
    A["HashSet<E>"] --> B["HashMap<E, Object>"]
    B --> C["map.put(element, PRESENT)"]
    C --> D["PRESENT = new Object()"]
    D --> E["Used as dummy value for all entries"]

Failure Scenarios

ScenarioCauseResult
NullPointerExceptionAdding null when set does not permit nullsRuntime crash
ClassCastExceptionStoring incompatible typesRuntime crash on retrieval
ConcurrentModificationExceptionModifying during iterationRuntime crash
Duplicate add() returnsAdding an element already presentReturns false, no exception

Trade-Off Table

AspectHashSetTreeSetLinkedHashSet
OrderingNoneSorted (Red-Black tree)Insertion order
contains() / add()O(1) averageO(log n)O(1) average
Memory overheadLowestTree node overheadLinked list overhead
null elementOne allowedNot allowedOne allowed
Sorted operationsN/Afirst(), last(), subSet()N/A

Code Snippets

Basic Operations

Set<String> languages = new HashSet<>();
languages.add("Java");
languages.add("Python");
languages.add("Java");   // Duplicate — ignored, returns false
System.out.println(languages.size());      // 2
System.out.println(languages.contains("Java")); // true
languages.remove("Python");

Set Algebra Operations

Set<String> a = Set.of("Java", "Python", "Go");
Set<String> b = Set.of("Python", "Rust", "Go");

Set<String> intersection = new HashSet<>(a);
intersection.retainAll(b);  // ["Python", "Go"]

Set<String> union = new HashSet<>(a);
union.addAll(b);  // ["Java", "Python", "Go", "Rust"]

Set<String> difference = new HashSet<>(a);
difference.removeAll(b);  // ["Java"]

Converting Between List and Set

List<String> withDuplicates = List.of("a", "b", "a", "c", "b");
// Deduplicate
Set<String> unique = new HashSet<>(withDuplicates);
// Back to list preserving order
List<String> deduped = new ArrayList<>(unique);

Observability Checklist

  • Monitor set size distributions — detect unbounded growth
  • Track contains() call frequency in hot paths — these are O(1) but often assumed O(n)
  • Profile hash function quality if set operations appear slow despite correct implementation
  • Log ConcurrentModificationException to identify iteration conflicts
  • Use Set.of() for small, immutable sets — avoids defensive copying overhead

Security Notes

  • HashSet is not thread-safe; concurrent modification can cause data corruption
  • Avoid storing sensitive elements (passwords, tokens) in sets that may be serialized or logged
  • Consider Collections.unmodifiableSet() when returning sets to external callers
  • When using custom objects as set elements, ensure hashCode() and equals() are consistent and do not leak sensitive state

Common Pitfalls / Anti-Patterns

  1. Mutable elements as set members: If an element’s hashCode() changes after insertion, the element becomes unretrievable and orphaned in the set
  2. Confusing add() return value: add() returns true if the element was added, false if it was already present — this is not an error condition
  3. null element: Most HashSet implementations allow one null element; attempting to add a second throws NullPointerException
  4. Assuming set preserves insertion order: It does not; use LinkedHashSet if order matters
  5. Using contains() on collections with custom equals(): The operation is O(n) for List (linear scan) but O(1) for HashSet — choose wisely

Quick Recap

  • HashSet wraps a HashMap internally, using a dummy PRESENT object as all values
  • Element uniqueness is enforced via the map’s key-based deduplication
  • O(1) average add(), contains(), and remove() operations
  • No ordering guarantees; use TreeSet for sorted or LinkedHashSet for insertion-order
  • Thread-unsafe; synchronize externally or use concurrent alternatives

Interview Questions

::: info The following .qa-card components contain typical interview questions you may encounter. Reviewing these will help reinforce key concepts. :::

1. How does HashSet ensure element uniqueness?

Model Answer: "`HashSet` internally uses a `HashMap` where the element itself is the key and a shared `PRESENT` object is the dummy value. Since `HashMap` keys are unique by definition, and `HashMap` uses `equals()` and `hashCode()` to determine key equality, `HashSet` inherits this uniqueness guarantee automatically. Adding a duplicate element simply overwrites the same key with the same `PRESENT` value."

2. What is the time complexity of `HashSet.contains()`?

Model Answer: "**O(1) average-case**, O(n) worst-case. The method computes `hashCode()` of the element, finds the bucket, then traverses the chain (or tree) calling `equals()` on each candidate. With a good hash function and reasonable load factor, collisions are rare, making lookups constant time."

3. Can a HashSet contain null?

Model Answer: "Yes, most implementations allow exactly **one `null` element**. Adding a second `null` throws a `NullPointerException`. However, attempting to add `null` to a `Collections.checkedSet()` or certain synchronized wrappers may throw an exception even for the first `null`."

4. What is the relationship between `HashSet` and `HashMap`?

Model Answer: "`HashSet` is implemented as a **thin wrapper around `HashMap`**. The set's elements become map keys, and a constant dummy object (`PRESENT`) is used as all map values. All set operations (`add()`, `remove()`, `contains()`) delegate directly to the corresponding map operations. This is a classic GoF **Adapter Pattern** application."

5. How do you iterate over a HashSet efficiently?

Model Answer: "You can iterate via `iterator()`, `for-each` loop (which uses the iterator under the hood), or `forEach()`. All three are equivalent in performance — O(n) total, O(1) per element. The iteration visits every bucket, so even empty buckets contribute to iteration time. There is no guaranteed order unless using `LinkedHashSet`."

6. How does HashSet's `add()` method return true or false?

Model Answer: "`add()` calls `map.put(element, PRESENT)`. `HashMap.put()` returns the **previous value** associated with the key, or `null` if the key was absent. Since `PRESENT` is never `null`, `add()` returns `true` when `put()` returns `null` (new element) and `false` when `put()` returns `PRESENT` (duplicate, already present)."

7. Can two different objects have the same hash code and still be stored separately in a HashSet?

Model Answer: "Yes. HashMap uses both `hashCode()` (to find the bucket) and `equals()` (to find the element within the bucket). If two objects have the same hash code but are not `equals()` to each other, they are stored in the same bucket as separate entries — typically as a linked list or tree structure. Having many such objects degrades performance to O(n) for that bucket."

8. What is the initial capacity and load factor of a HashSet?

Model Answer: "Default initial capacity is **16** and default load factor is **0.75**. When the number of entries exceeds `capacity * loadFactor`, the map doubles in size and rehashes all entries. You can override defaults: `new HashSet<>(initialCapacity, loadFactor)`. Setting initial capacity avoids intermediate resizes for known-size sets."

9. How do you create an immutable Set in Java?

Model Answer: "Use `Set.of("a", "b", "c")` (Java 9+) which creates an immutable set with O(1) operations. Alternatively, `Collections.unmodifiableSet(set)` wraps an existing set. Both prevent modifications but with different performance characteristics — `Set.of()` is typically more efficient for small, fixed sets."

10. What is the difference between HashSet and LinkedHashSet?

Model Answer: "`LinkedHashSet` extends `HashSet` but additionally maintains a **doubly-linked list** across entries in insertion order. This provides O(1) operations (inherited from HashSet) plus predictable iteration order (insertion order). The tradeoff is slightly higher memory overhead (~8 bytes per entry for the linked list pointers)."

11. Why does HashSet not allow duplicate elements?

Model Answer: "HashSet uses `HashMap` as its backing store, where elements become keys. `HashMap` keys are unique by definition — when you call `put(key, value)` with a key that already exists, it overwrites and returns the old value. `HashSet.add()` checks this return value and returns `false` for duplicates, silently ignoring the second insert."

12. What is the difference between `HashSet` and `IdentityHashMap`?

Model Answer: "`IdentityHashMap` uses reference equality (`==`) instead of `equals()` for key comparison. This means two distinct objects that are content-equal can both exist as separate keys. `IdentityHashMap` is rarely used — primarily for implementing object identity semantics like deep clones or custom memoization."

13. How does HashSet handle element retrieval if `hashCode()` changes after insertion?

Model Answer: "If a mutable object's `hashCode()` changes after insertion into a HashSet, the object becomes effectively **unretrievable**. The set stored it in the bucket corresponding to the original hash code, but now computes a different bucket index. The object is still in the set but cannot be found through normal `contains()` or `get()` operations."

14. What is the difference between `HashSet.contains()` and `List.contains()`?

Model Answer: "`HashSet.contains()` is O(1) average-case (hash-based lookup). `List.contains()` is O(n) (linear scan). For frequent membership checks on large collections, `HashSet` is dramatically faster. Use `HashSet` when you need fast lookups without order requirements."

15. Can HashSet be used as a cache with eviction based on size?

Model Answer: "`HashSet` itself does not support size-based eviction. Use `LinkedHashSet` with `removeEldestEntry()` for LRU cache implementation, or use `WeakHashMap`/`WeakHashSet` for reference-based cleanup. For bounded caches with eviction policies, consider `com.google.common.cache.Cache` from Guava."

16. What happens when you add the same element to a HashSet twice?

Model Answer: "The second `add()` call returns `false` (element was already present) and the set remains unchanged. No exception is thrown. The element's position in the internal HashMap is not modified — only the mapping's value (PRESENT) is re-set to itself."

17. How does HashSet's `clear()` method work internally?

Model Answer: "`clear()` calls `HashMap.clear()` which iterates through all buckets and sets each entry to `null`, resetting `size` to 0. It does not shrink the bucket array (capacity stays the same). For memory optimization after bulk removal, create a new smaller HashSet instead."

18. What is the difference between `HashSet.of()` and `new HashSet<>()` for small sets?

Model Answer: "`HashSet.of("a", "b", "c")` (Java 9+) creates an **immutable** set — modifications throw `UnsupportedOperationException`. It uses less memory and is faster for small fixed sets. `new HashSet<>()` creates a mutable set suitable for dynamic collections. Choose based on whether the set needs to change after creation."

19. Why should HashSet elements implement both `hashCode()` and `equals()`?

Model Answer: "HashSet uses `hashCode()` to find the bucket and `equals()` to identify the correct entry within the bucket. If two objects are `equals()` but have different `hashCode()` values, they would be stored in different buckets and treated as separate elements — violating the set contract. Both methods must be consistent for proper HashSet behavior."

20. How does LinkedHashSet differ from HashSet in iteration order?

Model Answer: "`LinkedHashSet` extends `HashSet` and adds a doubly-linked list that maintains **insertion order** iteration. `HashSet` has no ordering guarantee. `LinkedHashSet` provides O(1) operations with predictable iteration, at the cost of ~8 bytes extra per entry for the linked list pointers."

Further Reading

Conclusion

HashSet provides the simplest path to unique-element storage with O(1) membership checks. It is backed by a HashMap, using the element as the key and a shared dummy object as the value — all the O(1) behavior comes from that underlying map. The main constraint is no ordering guarantees and no duplicates.

The most common mistake is using mutable objects as set elements, which can silently break retrieval when hashCode() changes after insertion. Stick to immutable keys for any set that will be queried repeatedly. HashSet shines when deduplication, fast contains(), and raw uniqueness are what you need.

HashSet pairs naturally with HashMap — understanding one makes the other obvious since the mechanics are nearly identical. For cases requiring sorted unique elements, TreeSet offers O(log n) operations with full ordering at the cost of some performance.

Category

Related Posts

Abstract Classes in Java

Learn about partially implemented classes that define contracts for subclasses using abstract methods and concrete implementations.

#java-abstract-classes #java #java-fundamentals

Arithmetic Operators in Java

Master Java arithmetic operators: addition, subtraction, multiplication, division, and modulo with integer division gotchas and operator precedence explained.

#java-arithmetic-operators #java #java-fundamentals

Array Basics in Java

Learn Java array fundamentals: declaration, initialization, element access, and the length property explained simply.

#java-array-basics #java #java-fundamentals