The String Class
Master Java's String class: immutability, concatenation, interning, and key methods like substring, split, and format for text processing.
The String class is one of the most fundamental and frequently used types in Java. Representing sequences of Unicode characters, Strings are immutable objects that provide rich functionality for text manipulation, searching, and formatting.
Introduction
The String class is the most frequently used reference type in Java: virtually every program works with text, and Strings appear in everything from configuration to user input to network data. Strings in Java are immutable objects: once created, the character sequence cannot change. This immutability is a deliberate design decision that enables thread safety, stable hash codes, and the String Pool memory optimization. It also means that naive string concatenation in loops creates many intermediate objects, degrading performance.
Understanding String interning — the String Pool mechanism where identical literals share memory — is essential for writing memory-efficient code. Knowing when to use StringBuilder instead of + concatenation prevents O(n^2) performance pitfalls in loops. The wide range of built-in methods (indexOf, substring, split, replace, format) covers most text manipulation needs without external libraries.
This post covers String immutability and its implications, the String Pool and when to use intern(), the critical performance difference between concatenation and StringBuilder, essential method patterns with production code examples, and security considerations like using char[] for passwords instead of Strings. You will also learn how to handle Unicode correctly and avoid common pitfalls with split() and substring().
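A minimal sketch of what immutability means in practice. The class and method names are illustrative and not part of the original post; the point is only that methods which look like mutators return a new String and leave the receiver alone.

```java
class ImmutabilityDemo {
    // Calls "modifying" methods and discards their results; the input is untouched.
    static String unchangedAfterCalls(String original) {
        original.concat(" world");    // returns a new String, which is ignored here
        original.replace('l', 'L');   // same: the receiver is never altered
        return original;
    }

    public static void main(String[] args) {
        String greeting = "hello";
        String longer = greeting.concat(" world"); // a separate object

        System.out.println(unchangedAfterCalls(greeting)); // hello
        System.out.println(longer);                        // hello world
    }
}
```

To keep the result of such a call, assign it to a variable, as `longer` does above.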
When to Use / When Not to Use
Use String when:
- Working with text data of any kind
- You need immutable text storage (keys, constants, etc.)
- Building up strings through concatenation in non-critical paths
- Pattern matching with split() or regex
Consider alternatives when:
- Building strings in loops (use StringBuilder)
- Frequent string modification (StringBuilder or StringBuffer)
- Locale-sensitive comparison or sorting of user-visible text (use java.text.Collator)
- Large text manipulation (consider CharSequence or specialized libraries)
String Memory Architecture
graph TD
A["String Literal Pool<br/>(in the heap since Java 7)"] --> B["\"hello\"<br/>Reference: s1"]
A --> C["\"hello\"<br/>Reference: s2<br/>Same object as s1"]
D["Heap Memory"] --> E["new String(\"hello\")<br/>New object, different from pool"]
F["Reference Variables"] --> A
F --> D
style A stroke:#00ff00,color:#00ff00
style D stroke:#ff00ff,color:#ff00ff
Production Failure Scenarios
| Scenario | Cause | Mitigation |
|---|---|---|
| String concatenation in loops | O(n²) performance with + operator | Use StringBuilder |
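| Memory from intern() misuse | Over-interning fills PermGen (JDK 6 and earlier) or bloats the heap-resident string table (JDK 7+) | Intern only when needed |
| Case-sensitive comparison bugs | "ABC".equals("abc") returns false | Use equalsIgnoreCase() or a Collator |
| Split dropping or keeping empty tokens | split(",") keeps leading and interior empty strings but drops trailing ones | Pass an explicit limit (-1 keeps trailing empties), then filter or trim tokens |
| Substring memory leak | Before JDK 7u6, substring() shared the parent's char[] | On old JDKs, copy with new String(big.substring(...)) |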
// Performance pitfall: concatenation in loop
String slow = "";
for (String item : items) {
    slow += item + ","; // Creates a new String each iteration!
}
// Better: use StringBuilder
StringBuilder sb = new StringBuilder();
for (String item : items) {
    sb.append(item).append(",");
}
String result = sb.toString();
// Split gotcha: trailing empty strings
String input = "a,b,c,";
String[] parts = input.split(","); // [a, b, c] - trailing empty string dropped
// split(regex) is shorthand for split(regex, 0), so the result is the same
String[] parts2 = input.split(",", 0); // Still [a, b, c]
// For controlled splitting, pass a negative limit
String[] limited = input.split(",", -1); // [a, b, c, ""] - keeps trailing empty
// Substring before Java 7u6: memory leak (shared char[])
String big = "large string...";
String small = big.substring(0, 5); // In old JDKs, 'small' held reference to large char[]
// In modern JDKs, this is fixed - substring creates new char[] copy
Trade-off Table
| Operation | Method | Performance | Notes |
|---|---|---|---|
| Concatenation | + | O(n²) in loops | javac builds one StringBuilder chain (or an invokedynamic call since JDK 9) per expression, not per loop |
| Building | StringBuilder.append() | O(n) | Preferred for multiple appends |
| Thread-safe building | StringBuffer | Slower than Builder | Use only when needed |
| Search | indexOf() | O(n) | Linear scan |
| Immutable key | String in HashMap | O(1) average | Good for caching |
| Pattern split | split() or regex | O(n) + regex overhead | A precompiled Pattern or an indexOf() loop can help in hot paths |
Implementation Snippets
String Creation and Interning
public class StringCreation {
    public static void main(String[] args) {
        // String literals go to the pool (interned)
        String a = "hello";
        String b = "hello";
        System.out.println(a == b); // true - same pool reference
        // new creates heap object
        String c = new String("hello");
        System.out.println(a == c); // false - different objects
        // Explicit interning
        String d = c.intern(); // Returns pool reference
        System.out.println(a == d); // true
        // String constants are compile-time merged
        String e = "hel" + "lo"; // Compiler optimizes to "hello"
        System.out.println(a == e); // true
    }
}
Essential String Methods
public class StringMethods {
    public static void main(String[] args) {
        String text = "Hello, World!";
        // Searching
        int idx = text.indexOf("World"); // 7
        int last = text.lastIndexOf("o"); // 8
        boolean has = text.contains("ell"); // true
        boolean starts = text.startsWith("Hell"); // true
        boolean ends = text.endsWith("!"); // true
        // Extraction
        String sub = text.substring(7); // "World!"
        String sub2 = text.substring(7, 12); // "World"
        char ch = text.charAt(0); // 'H'
        String trimmed = text.trim(); // Removes leading/trailing whitespace
        // Transformation
        String upper = text.toUpperCase(); // "HELLO, WORLD!"
        String lower = text.toLowerCase(); // "hello, world!"
        String replaced = text.replace("World", "Java"); // "Hello, Java!"
        String[] words = text.split(", "); // ["Hello", "World!"]
        // Formatting
        String formatted = String.format("Name: %s, Age: %d", "Alice", 30);
        // Null-safe operations
        String nullStr = null;
        String safe = String.valueOf(nullStr); // "null" not NPE
        // Or use Objects.toString()
    }
}
StringBuilder Patterns
import java.util.List;

public class StringBuilderDemo {
    // Efficient string building
    public static String buildCsv(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) sb.append(",");
            sb.append(escapeCsv(values.get(i)));
        }
        return sb.toString();
    }

    // Reverse string (StringBuilder.reverse() keeps surrogate pairs intact)
    public static String reverse(String input) {
        return new StringBuilder(input).reverse().toString();
    }

    // Check palindrome, ignoring case and non-alphanumeric ASCII characters
    public static boolean isPalindrome(String s) {
        String clean = s.replaceAll("[^a-zA-Z0-9]", "").toLowerCase();
        return clean.equals(new StringBuilder(clean).reverse().toString());
    }

    // Quote a field if it contains a comma, quote, or line break; double embedded quotes
    private static String escapeCsv(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }
}
Observability Checklist
- Monitor string memory usage (String pool size in heap)
- Track StringBuilder usage vs concatenation in performance profiling
- Alert on excessive String.split() operations in hot paths
- Log when interning is called (can indicate memory pressure)
- Measure substring creation overhead in old JVM versions
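// Observability for string operations (assumes SLF4J is on the classpath)
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class StringMetrics {
    private static final Logger LOG = LoggerFactory.getLogger(StringMetrics.class);

    public static void trackBuildTime(StringBuilder sb, String context) {
        if (sb.length() > 10_000) {
            LOG.info("Large StringBuilder [{}]: {} chars", context, sb.length());
        }
    }

    public static void trackSplit(String input, String delimiter) {
        String[] parts = input.split(delimiter); // delimiter is interpreted as a regex
        if (parts.length > 100) {
            // Log the input length, not the input itself, to avoid leaking or flooding logs
            LOG.warn("Large split result [{} chars]: {} parts", input.length(), parts.length);
        }
    }
}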
Security Considerations
- Password handling: Strings are immutable, so passwords stay in memory until GC. Use char[] for sensitive data.
- Log injection: User input in logs can be exploited; sanitize newlines and special chars.
- String comparison timing attacks: Constant-time comparison is not native to String (use MessageDigest.isEqual()).
- Encoding issues: Always specify the charset when converting bytes to String.
// Security: avoid storing passwords in Strings
import java.util.Arrays;

public class SecurePassword {
    public boolean verify(char[] input, char[] stored) {
        try {
            if (input.length != stored.length) return false;
            // Constant-time comparison: no early exit on the first mismatch
            boolean match = true;
            for (int i = 0; i < input.length; i++) {
                match &= (input[i] == stored[i]);
            }
            return match;
        } finally {
            // Clear both arrays on every path, including the length mismatch
            Arrays.fill(input, '\0');
            Arrays.fill(stored, '\0');
        }
    }
}
// Security: specify charset to avoid garbling
byte[] bytes = getData();
String safe = new String(bytes, StandardCharsets.UTF_8); // Always specify
String unsafe = new String(bytes); // Platform default - unpredictable
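The log-injection and timing-attack items above can be sketched in the same spirit. This is a minimal illustration, assuming that escaping CR/LF is enough for the log format in use; the class and method names are hypothetical. `MessageDigest.isEqual()` is the JDK call mentioned in the list.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

class SecurityHelpers {
    // Escapes CR/LF so user input cannot forge additional log lines.
    static String sanitizeForLog(String userInput) {
        if (userInput == null) return "null";
        return userInput.replace("\r", "\\r").replace("\n", "\\n");
    }

    // Compares two secrets (for example API tokens) without exiting at the first mismatch.
    static boolean constantTimeEquals(String expected, String actual) {
        byte[] a = expected.getBytes(StandardCharsets.UTF_8);
        byte[] b = actual.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(a, b);
    }
}
```

The comparison helper still takes Strings for brevity; for real passwords the char[] advice above applies, and stored values would normally be salted hashes rather than plaintext.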
Common Pitfalls / Anti-patterns
- Using == to compare string values
// BAD - compares references, not content
String a = new String("hello");
String b = new String("hello");
if (a == b) { } // false
// GOOD - compares values
if (a.equals(b)) { } // true
- Concatenating in loops
// BAD - creates many intermediate String objects
String result = "";
for (String s : list) { result += s; }
// GOOD - StringBuilder
StringBuilder sb = new StringBuilder();
for (String s : list) { sb.append(s); }
String result = sb.toString();
- Ignoring empty string vs null
// BAD - NPE if str is null
boolean empty = str.isEmpty();
// GOOD - handles null
boolean empty = (str == null) || str.isEmpty();
// or use Apache Commons StringUtils.isEmpty()
- Case-insensitive comparison without Locale
// BAD - toUpperCase() uses the default locale; under a Turkish locale "i" becomes "İ"
if (str.toUpperCase().equals("YES")) { }
// GOOD - equalsIgnoreCase() compares char by char and ignores the default locale
if (str.equalsIgnoreCase("yes")) { }
// GOOD - or pin the locale explicitly
if (str.toUpperCase(Locale.ROOT).equals("YES")) { }
Quick Recap Checklist
- String is immutable: methods that appear to modify it return a new object and leave the original unchanged
- String literals are interned (stored in String pool for reuse)
- Use new String() only when you need a distinct heap object
- intern() adds a string to the pool and returns the pooled reference
- Use StringBuilder for building strings in loops or multiple operations
- split() with no limit silently drops trailing empty tokens; pass a limit of -1 to keep them
- Use char[] instead of String for passwords (immutability prevents clearing)
- Always specify charset when creating strings from bytes
Interview Questions
Model Answer: String immutability provides several benefits: 1) Security: strings are used as class names, file paths, and network URLs, and mutation after validation could corrupt these. 2) Thread safety: immutable strings need no synchronization. 3) String pooling: immutability allows interning and memory sharing without fear of modification. 4) HashMap/HashSet keys: immutability ensures the hash code remains stable. 5) Performance: the hash code can be cached and String objects can be shared safely. Once a String is created, its character sequence cannot change.
Model Answer: String is immutable: every modification creates a new object. StringBuilder is mutable and designed for single-threaded string building: append, insert, and reverse modify the internal buffer in place, which is faster. StringBuffer is the thread-safe counterpart of StringBuilder: all methods are synchronized, making it safe for multi-threaded use but slower. Use StringBuilder for most new code; use StringBuffer only when sharing the buffer across threads.
Model Answer: String interning keeps one canonical copy of each distinct string in a shared pool. Literals are interned automatically; calling intern() on any other string returns the pooled instance if an equal string is already there, and otherwise adds the string and returns it. Use intern() when you hold many long-lived duplicates of a small set of values and want to save memory. Comparing interned strings with == is possible but fragile, since a single non-interned string breaks it, and equals() already short-circuits on identical references. Avoid intern() when the strings are numerous, mostly unique, or short-lived, because each call costs a string-table lookup and adds pool pressure. Since Java 7 the pool lives in the main heap rather than PermGen, regardless of the garbage collector in use, so interned strings can be collected once they are unreachable.
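Model Answer: split(regex) behaves like split(regex, 0), and a limit of zero discards trailing empty strings. For example, "a,b,c,".split(",") returns ["a", "b", "c"]: the trailing empty string is dropped. A positive limit caps the array length and leaves the rest of the input, trailing empties included, in the last element. To preserve every trailing empty string, use a negative limit: "a,b,c,".split(",", -1) returns ["a", "b", "c", ""]. Leading and interior empty strings are kept in all cases. Passing -1 matters when parsing fixed-column data such as CSV, where an empty field at the end of a line would otherwise be silently lost.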
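The effect of the limit argument is easier to see side by side. The following is a small sketch; the class name `SplitLimits` and the helper `show` are illustrative only.

```java
import java.util.Arrays;

class SplitLimits {
    // Splits on commas with the given limit and renders the resulting array.
    static String show(String input, int limit) {
        return Arrays.toString(input.split(",", limit));
    }

    public static void main(String[] args) {
        String line = "a,b,c,,";
        System.out.println(show(line, 0));   // [a, b, c]      trailing empties dropped
        System.out.println(show(line, -1));  // [a, b, c, , ]  trailing empties kept
        System.out.println(show(line, 2));   // [a, b,c,,]     at most two elements
        System.out.println(show(",a", 0));   // [, a]          leading empty is kept
    }
}
```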
Model Answer: In modern JDKs (7u6+), substring() copies the requested characters into a new backing array, so there is no memory leak. Before this fix, substring() in JDK 6 (and earlier) created a new String object that shared the parent String's underlying char array, with offset and count fields pointing into it. Holding onto a small substring therefore kept the entire large char array alive, a common cause of OutOfMemoryErrors when processing large strings and keeping small substrings. The usual workaround on old JDKs was new String(big.substring(0, 5)), which forces a copy. The trade-off in modern JDKs is that substring() costs O(n) in the substring length instead of O(1).
Model Answer: indexOf() searches from the beginning of the string toward the end, returning the first index where the substring is found, or -1 if not found. lastIndexOf() searches from the end toward the beginning, returning the last (rightmost) index of the substring. Both support an optional start position parameter. Use indexOf() when you want the first occurrence; use lastIndexOf() when you want the last occurrence, such as finding the final path separator in a file path (path.lastIndexOf('/')). Both scan linearly: O(n) for a single character, and up to O(n*m) in the worst case for a substring of length m.
Model Answer: Use s.matches("\\d+") for simple cases, but for performance-sensitive code use a manual loop: public static boolean isNumeric(String s) { for (int i = 0; i < s.length(); i++) { if (!Character.isDigit(s.charAt(i))) return false; } return true; }. This avoids regex overhead, since matches() compiles the pattern on every call, and it exits at the first non-digit. Two differences are worth knowing. First, the manual loop returns true for an empty string (no character contradicts it), whereas "\\d+" requires at least one digit; add a length check if you need false for empty input. Second, Character.isDigit() accepts any Unicode decimal digit, whereas \\d matches only ASCII 0-9 by default. Apache Commons Lang StringUtils.isNumeric() likewise accepts Unicode digits.
Model Answer: String has four replace methods: replace(char, char) replaces all occurrences of a character. replace(CharSequence, CharSequence) replaces all occurrences of a literal substring. replaceFirst(String regex, String replacement) replaces only the first match of a regex pattern. replaceAll(String regex, String replacement) replaces all matches of a regex. All return a new String (String is immutable), and the two literal variants scan in linear time. For the regex-based variants, the pattern is compiled on each call; in loops, compile it once with Pattern.compile() and reuse the compiled Pattern.
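As a sketch of the "compile once, reuse" advice, the class below holds a precompiled pattern in a constant. The class name and the whitespace-collapsing task are illustrative assumptions, not something the original post prescribes.

```java
import java.util.regex.Pattern;

class WhitespaceNormalizer {
    // Compiled once at class load instead of on every replaceAll() call.
    private static final Pattern RUNS_OF_WHITESPACE = Pattern.compile("\\s+");

    // Collapses each run of whitespace into a single space.
    static String collapse(String input) {
        return RUNS_OF_WHITESPACE.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(collapse("a  b\t\nc")); // a b c
    }
}
```

The result is the same as `input.replaceAll("\\s+", " ")`; only the compilation cost moves out of the loop.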
Model Answer: The String API is defined in terms of UTF-16: each char is a 16-bit code unit. (Since Java 9, compact strings may store Latin-1-only text as one byte per character internally, but the API behaves the same.) For characters in the Basic Multilingual Plane (BMP, U+0000 to U+FFFF), one char is sufficient. For supplementary characters (Unicode code points beyond U+FFFF, like many emoji), two chars are needed: a surrogate pair. Methods like length() return the number of chars, not code points. For correct character counting, use codePointCount(0, length()). Similarly, charAt() returns a single char, which may be half of a supplementary character. Use codePointAt() or codePoints() for proper supplementary character handling.
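A short illustration of the difference between chars and code points, using U+1F600 written as its surrogate pair. The class name is hypothetical.

```java
class CodePointDemo {
    // 'a' followed by U+1F600, which needs the surrogate pair D83D DE00.
    static final String TEXT = "a\uD83D\uDE00";

    static int charCount(String s) {
        return s.length();
    }

    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(charCount(TEXT));       // 3
        System.out.println(codePointCount(TEXT));  // 2
        // Iterate by code point rather than by char.
        TEXT.codePoints().forEach(cp -> System.out.println(Integer.toHexString(cp)));
    }
}
```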
Model Answer: trim() removes leading and trailing characters whose code is <= U+0020 (space and the ASCII control characters) and returns the result. Its limitation is that it does not recognize Unicode whitespace above U+0020, such as EM SPACE (U+2003) or IDEOGRAPHIC SPACE (U+3000). For Unicode-aware trimming, use strip() (Java 11+), which relies on Character.isWhitespace(). Note that Character.isWhitespace() deliberately excludes the non-breaking spaces (U+00A0, U+2007, U+202F), and the BOM (U+FEFF) is not whitespace at all, so neither trim() nor strip() removes them. stripLeading() and stripTrailing() remove from one side only. For HTML/entity whitespace, use a regex or a library like Apache Commons Text.
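The following sketch contrasts the two methods on an EM SPACE and on a NO-BREAK SPACE. It assumes Java 11 or later for strip(); the class and constant names are illustrative.

```java
class TrimVsStrip {
    static final String EM_SPACED = "\u2003hi\u2003";   // EM SPACE on both sides
    static final String NBSP_PADDED = "\u00A0hi\u00A0"; // NO-BREAK SPACE on both sides

    public static void main(String[] args) {
        // trim() only looks at chars <= U+0020, so the EM SPACEs survive.
        System.out.println(EM_SPACED.trim().length());     // 4
        // strip() uses Character.isWhitespace(), which covers U+2003.
        System.out.println(EM_SPACED.strip().length());    // 2
        // Non-breaking spaces are excluded by Character.isWhitespace().
        System.out.println(NBSP_PADDED.strip().length());  // 4
    }
}
```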
Model Answer: Pass an explicit locale: toLowerCase(Locale.ROOT) or toUpperCase(Locale.ROOT) for locale-neutral text such as protocol keywords and identifiers (Locale.US also works for ASCII-only strings), or the user's locale for display text. The no-argument toLowerCase() and toUpperCase() use the JVM's default locale, which can produce unexpected results: under a Turkish locale, "i".toUpperCase() produces "İ" (dotted capital I), not "I". Because the default locale can change between environments, relying on it causes subtle bugs in internationalized applications.
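The Turkish case can be reproduced without changing the JVM default by passing the locale explicitly. A minimal sketch, with illustrative names:

```java
import java.util.Locale;

class TurkishLocaleDemo {
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    static String upperTurkish(String s) {
        return s.toUpperCase(TURKISH);
    }

    static String upperRoot(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(upperTurkish("title")); // TİTLE (dotted capital I, U+0130)
        System.out.println(upperRoot("title"));    // TITLE
        // A comparison against "TITLE" therefore fails under the Turkish rules.
        System.out.println(upperTurkish("title").equals("TITLE")); // false
    }
}
```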
Model Answer: String.split() takes a regex, which adds compile-and-match overhead, although OpenJDK has a fast path that skips the regex engine for single-character literal delimiters. StringTokenizer is purpose-built for simple delimiter-based splitting and can be faster for basic tokenization, but its Javadoc describes it as a legacy class whose use is discouraged in new code, and it silently skips empty tokens: "a,,b" yields only two tokens. For simple single-character delimiters, String.indexOf() in a loop can be faster than both. For regex-based splitting (multiple delimiters, patterns), split() is more convenient. The performance difference is negligible for most applications; use whichever is more readable for your use case.
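The difference in empty-token handling is worth seeing once. This sketch uses illustrative names and a comma delimiter:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerVsSplit {
    static List<String> withSplit(String s) {
        return Arrays.asList(s.split(","));
    }

    static List<String> withTokenizer(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, ",");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withSplit("a,,b").size());     // 3: the interior empty field is kept
        System.out.println(withTokenizer("a,,b").size()); // 2: the empty field is skipped
    }
}
```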
Model Answer: String's hashCode is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], using int arithmetic that wraps on overflow, where s[i] is the char value and n is the string length. The constant 31 was chosen empirically as a good compromise between spread and performance (multiplication by 31 can be optimized to (i << 5) - i). HashMap and HashSet use this value to pick a bucket for String keys. Because Strings are immutable, the hashCode is cached after first computation and subsequent calls return the cached value. This makes String a good HashMap key: the hash is stable, and caching avoids repeated computation.
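The polynomial can be evaluated by hand with Horner's rule and checked against the built-in method. A small sketch with a hypothetical helper name:

```java
class HashCodeDemo {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] using Horner's rule.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow wraps, as in String.hashCode()
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(manualHash("hello") == "hello".hashCode()); // true
        // Distinct strings can collide: 65*31 + 97 == 66*31 + 66 == 2112
        System.out.println(manualHash("Aa")); // 2112
        System.out.println(manualHash("BB")); // 2112
    }
}
```

The "Aa" / "BB" collision is a reminder that equal hash codes do not imply equal strings; HashMap still calls equals() within a bucket.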
Model Answer: Null strings require defensive handling: 1) Null check before use: if (str != null) { ... }. 2) Use String.valueOf(obj), which returns "null" for a null reference rather than throwing NPE. 3) Use Objects.requireNonNull() to fail fast on null input. 4) Use Objects.toString() with a default: Objects.toString(str, "default"). 5) In equals comparisons, put the literal first: "literal".equals(str). This is null-safe because the receiver is a literal that can never be null, and a null argument simply yields false. 6) Apache Commons Lang StringUtils.defaultString() and related utilities handle null gracefully.
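Several of these idioms fit in a few lines. The sketch below uses only java.util.Objects; the class and method names are illustrative.

```java
import java.util.Objects;

class NullSafeStrings {
    // Literal-first equals: a null argument yields false instead of an NPE.
    static boolean isYes(String s) {
        return "yes".equals(s);
    }

    // Falls back to a default when the value is null.
    static String orDefault(String s) {
        return Objects.toString(s, "default");
    }

    // Fails fast with a descriptive message on null input.
    static int lengthOf(String s) {
        return Objects.requireNonNull(s, "s must not be null").length();
    }

    public static void main(String[] args) {
        System.out.println(isYes(null));     // false
        System.out.println(orDefault(null)); // default
        System.out.println(lengthOf("abc")); // 3
    }
}
```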
Model Answer: String.format() takes a format string with printf-style specifiers (%s, %d, %.2f) and returns the formatted result. Use it for complex formatted output (tables, aligned columns, fixed decimal places) and when one format is applied to many values; pass a Locale explicitly when the output is machine-read, since number formatting is locale-sensitive. For internationalized messages, java.text.MessageFormat is the dedicated tool. Avoid format() in tight loops, because the format string is parsed on every call. For simple cases like "Hello " + name, concatenation is clearer, and javac compiles it to a StringBuilder chain (or an invokedynamic call since JDK 9). Profile before switching between the two in hot paths.
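A small sketch of aligned-column output. The column widths and the `row` helper are illustrative choices; Locale.ROOT is passed so the decimal separator does not depend on the machine's locale.

```java
import java.util.Locale;

class FormatDemo {
    // Left-aligned name (10 wide), right-aligned quantity (5 wide), price with 2 decimals (8 wide).
    static String row(String name, int qty, double price) {
        return String.format(Locale.ROOT, "%-10s|%5d|%8.2f", name, qty, price);
    }

    public static void main(String[] args) {
        System.out.println(row("apple", 3, 1.5));
        System.out.println(row("watermelon", 12, 24.99));
    }
}
```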
Model Answer: The String(byte[], Charset) constructor decodes the byte array using the given charset; malformed input is replaced with the charset's replacement character rather than raising an error. The older String(byte[], String charsetName) overload throws the checked UnsupportedEncodingException if the name is not recognized, which is one reason to prefer the StandardCharsets constants. Always specify a charset explicitly: new String(bytes, StandardCharsets.UTF_8). Without a charset, new String(bytes) uses the JVM's default charset, which before JDK 18 varied by platform and locale (JDK 18 made UTF-8 the default), a common source of encoding bugs. Similarly, getBytes(charset) should be used instead of getBytes() for consistent encoding. For binary data in strings (Base64, etc.), use proper encoding utilities rather than this constructor.
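The kind of bug a mismatched charset produces can be shown with a single accented character. This is a sketch with illustrative names; it encodes as UTF-8 and then decodes once correctly and once with ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Encode and decode with the same charset: the text survives.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // Encode as UTF-8 but decode as ISO-8859-1: each byte becomes its own character.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String word = "caf\u00E9"; // "café"
        System.out.println(roundTrip(word).equals(word)); // true
        System.out.println(misdecode(word).length());     // 5: the two UTF-8 bytes of é became two chars
    }
}
```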
Model Answer: equals() first checks reference identity, then type and length, and only then compares contents (O(n) in the string length). == compares references only: it is O(1), but misleading for value comparison. Equal literals share one pooled object, so == returns true for them; strings created with new String() or built at runtime are separate objects, so == returns false even with identical content. Always use equals() for value comparison. The identity and length checks make equals() cheap in the common cases, so reference comparison is rarely worth the risk.
Model Answer: For a known small number of strings, + is readable and javac compiles the expression to a StringBuilder chain (or an invokedynamic call since JDK 9). For unknown or large numbers, explicitly use StringBuilder: StringBuilder sb = new StringBuilder(initialCapacity); for (String s : strings) { sb.append(s); } return sb.toString();. Set the initial capacity to the sum of the lengths, when known, to avoid resizing. For a delimiter between elements, use String.join() or, for streams, Collectors.joining(), which is built on StringJoiner. Avoid StringBuffer unless you need thread safety (synchronized).
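The three approaches mentioned above can be sketched together. Names are illustrative; the presized builder assumes the parts are all available up front so their lengths can be summed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class JoinDemo {
    // Presizes the builder to the exact total length to avoid internal resizing.
    static String joinWithBuilder(List<String> parts) {
        int capacity = 0;
        for (String p : parts) {
            capacity += p.length();
        }
        StringBuilder sb = new StringBuilder(capacity);
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    // Delimiter-separated output without manual "is this the first element" checks.
    static String joinCsv(List<String> parts) {
        return String.join(",", parts);
    }

    // Stream variant with a delimiter, prefix, and suffix.
    static String joinStream(List<String> parts) {
        return parts.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c");
        System.out.println(joinWithBuilder(parts)); // abc
        System.out.println(joinCsv(parts));         // a,b,c
        System.out.println(joinStream(parts));      // [a, b, c]
    }
}
```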
Model Answer: isEmpty() (Java 6+) returns true only if length() == 0; it does not check for whitespace. isBlank() (Java 11+) returns true if the string is empty or contains only whitespace characters (according to Character.isWhitespace()). Example: " ".isEmpty() is false, but " ".isBlank() is true. "".isBlank() is true (empty is blank by definition). For input validation, isBlank() is usually what you want, since it treats spaces, tabs, and other whitespace as empty. To check that a string has actual content, use !str.isBlank(), or !str.trim().isEmpty() on JDKs before 11; both throw NullPointerException on null, so check for null first.
Model Answer: String implements Comparable<String> with lexicographic comparison based on the numeric values of its UTF-16 chars (code units, not code points, so supplementary characters do not sort in code point order). compareTo() compares char by char, stopping at the first difference or when one string is exhausted. Shorter strings compare as "less than" longer strings when the longer starts with the shorter: "ab".compareTo("abc") returns a negative number. This is NOT case-insensitive: 'A' (65) < 'a' (97), so every uppercase ASCII letter sorts before every lowercase one. For case-insensitive or locale-aware comparison, use String.CASE_INSENSITIVE_ORDER or Collator.getInstance(). This natural ordering is what Arrays.sort(), Collections.sort(), and TreeSet/TreeMap use by default.
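A brief sketch of how natural ordering and case-insensitive ordering differ on the same input. The class and method names are illustrative.

```java
import java.util.Arrays;

class OrderingDemo {
    // Sorts a copy using String's natural (char value) ordering.
    static String[] sortedNatural(String... input) {
        String[] copy = input.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Sorts a copy ignoring case differences.
    static String[] sortedIgnoringCase(String... input) {
        String[] copy = input.clone();
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedNatural("banana", "apple", "Cherry")));
        // [Cherry, apple, banana]  uppercase 'C' (67) sorts before lowercase 'a' (97)
        System.out.println(Arrays.toString(sortedIgnoringCase("banana", "apple", "Cherry")));
        // [apple, banana, Cherry]
        System.out.println("ab".compareTo("abc") < 0); // true: a prefix sorts first
    }
}
```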
Further Reading
- Java Wrapper Classes - Immutable wrappers bridging primitives and objects
- String Implementation - OpenJDK - Source code for String class
- String Performance - Baeldung - Performance characteristics of String operations
- String Interning Deep Dive - Understanding String pool and intern() method
- Unicode and Java Strings - Oracle tutorial on Unicode handling
Conclusion
The String class in Java is an immutable object representing a sequence of Unicode characters. Because strings are so fundamental, Java optimizes them heavily — literal strings are interned in the String Pool for memory sharing, and the class provides rich methods for searching, extraction, transformation, and formatting.
Key takeaways: immutability makes String thread-safe and suitable for keys in HashMap/HashSet, but also means every concatenation in loops creates new objects — use StringBuilder instead. The String Pool stores literal strings for reuse, but new String() always creates a separate heap object. split() discards trailing empty strings unless you use a negative limit parameter.
Strings are the most common reference type in Java and appear in virtually every program. For understanding how wrapper classes like Integer and Double handle the boundary between primitives and objects, see Java Wrapper Classes.