Elasticsearch: Full-Text Search at Scale

Learn how Elasticsearch powers search at scale with inverted indexes, sharding, replicas, and its powerful Query DSL for modern applications.

published: March 22, 2026 reading time: 27 min read author: GeekWorkBench

Elasticsearch: Full-Text Search at Scale

Elasticsearch stores data in inverted indexes — the key data structure that makes full-text search fast across billions of documents. This post covers how it works: inverted indexes, shards, replicas, and the Query DSL for writing searches that actually perform.

Core Concepts

The Inverted Index

Elasticsearch stores data in Lucene indices. The core data structure is the inverted index — a mapping from terms to the documents containing those terms. When you search for a word, the engine looks it up and retrieves matching documents immediately. No full table scan. This is what makes Elasticsearch fast.

{
  "inverted_index": {
    "elasticsearch": [{ "doc_id": 1, "positions": [0, 3] }],
    "tutorial": [{ "doc_id": 1, "positions": [1] }],
    "full-text": [{ "doc_id": 2, "positions": [0] }]
  }
}

When you search for “elasticsearch tutorial,” Elasticsearch looks up both terms in the inverted index and finds matching documents instantly. No full table scan required.

The index also stores metadata about each term: document frequency, term frequency, positions for phrase queries, and norms for field-length normalization. This metadata is what makes relevance scoring, phrase matching, and fuzzy search possible.

Analyzer Pipeline

Before terms enter the inverted index, they pass through an analyzer consisting of three stages:

Character filters remove HTML tags, convert characters, or apply language-specific normalizations.
Tokenizer splits text into individual terms (tokens).
Token filters lowercase terms, remove stop words, apply synonyms, and perform stemming.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "snowball", "asciifolding"]
        }
      }
    }
  }
}

Choosing the right analyzer matters. The standard analyzer works fine for most English text. For domain-specific vocabularies, you might need custom analyzers with synonym filters or language-specific stemmers.

Sharding: Distributing Data Across Nodes

A single Elasticsearch node can store millions of documents, but eventually you will need more storage, CPU, or memory than one machine provides. Elasticsearch solves this with shards: horizontal partitions of your index.

When you create an index, you specify the number of primary shards. Each primary shard is an independent Lucene index that stores a subset of your documents.

{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

With three primary shards, Elasticsearch distributes documents roughly evenly across them. A document with ID doc123 is routed to a specific shard using a hash of the ID: shard = hash(_id) % num_primary_shards.

Shard Routing Explained

graph TD
    Client[Client] -->|"search request"| LB[Load Balancer]
    LB --> Node1[Node 1]
    Node1 -->|"coordination"| Coordinator[Coordinator]
    Coordinator -->|"scatter"| Shard1[Primary Shard 1]
    Coordinator -->|"scatter"| Shard2[Primary Shard 2]
    Coordinator -->|"scatter"| Shard3[Primary Shard 3]
    Shard1 -->|"gather"| Coordinator
    Shard2 -->|"gather"| Coordinator
    Shard3 -->|"gather"| Coordinator
    Coordinator -->|"reduce"| Client

The node receiving the search request becomes the coordinator. It broadcasts the query to all relevant shards, collects results, and merges them into a single response. The scatter-gather pattern here is what lets you search across all shards in parallel.

Replica Shards

Every primary shard can have replicas for fault tolerance and read scalability. Replicas are never allocated on the same node as their primary. If a node fails, Elasticsearch promotes replicas to primaries automatically. Add replicas to linearly scale search throughput for read-heavy workloads.

The Query DSL: Expressing Search Logic

Elasticsearch’s Query DSL is a JSON-based language for expressing complex searches. It separates leaf queries (match, term) from compound queries (bool). Use filter context for non-scoring queries to leverage caching. Use query context when relevance scoring matters.

The Bool Query

The bool query is the workhorse of Elasticsearch search. It supports four clauses:

must: Documents must match (AND logic)
should: Documents should match (OR logic)
filter: Same as must but without scoring (faster)
must_not: Documents must not match (NOT logic)

{
  "query": {
    "bool": {
      "must": [{ "match": { "title": "elasticsearch" } }],
      "filter": [
        { "range": { "publish_date": { "gte": "2024-01-01" } } },
        { "term": { "status": "published" } }
      ],
      "should": [
        { "match": { "tags": "search" } },
        { "match": { "tags": "database" } }
      ],
      "minimum_should_match": 1
    }
  }
}

The filter context is particularly important. Queries in filter context bypass scoring entirely, and Elasticsearch caches filter bitsets for reuse. If you filter by a static field like status, subsequent queries become faster.

Relevance Scoring

Elasticsearch uses BM25 (Okapi BM25) as its default similarity algorithm. BM25 considers term frequency saturation and field length normalization. A term appearing 10 times in a 100-word field scores roughly the same as appearing 5 times in a 50-word field.

You can debug relevance with the explain parameter:

GET /my-index/_search
{
  "explain": true,
  "query": {
    "match": { "content": "elasticsearch tutorial" }
  }
}

The response shows exactly how each document scored, including term frequencies, inverse document frequencies, and field norms.

Designing Indices for Performance

Index design profoundly impacts search performance. A few principles:

Size your shards wisely. Shards between 10GB and 50GB work well. Too many small shards cause overhead; too few large shards cause memory pressure.

Use aliases for flexibility. Index aliases let you reindex without downtime:

POST /_aliases
{
  "actions": [
    { "remove": { "index": "my-index-v1", "alias": "my-index" } },
    { "add": { "index": "my-index-v2", "alias": "my-index" } }
  ]
}

Denormalize for read performance. Elasticsearch does not support joins like SQL. If you frequently query documents with nested objects, consider flattening the structure or using denormalization.

Capacity Estimation

Sizing an Elasticsearch cluster means estimating heap and document counts.

Heap allocation per node:

heap_per_node ≈
    (shards_per_node × segment_overhead)
  + (indexing_buffer × 10-20% of heap)
  + (query_cache × 10% of heap)
  + (fielddata × 10-20% of heap)
  + OS reserve (1GB)

Elasticsearch heap is shared across all shards on a node. Segment overhead alone is roughly 25MB per shard. With 30 shards on one node, that is 750MB before you even touch indexing buffers or caches.

Docs per shard guidelines:

Shard Size Target	Docs per Shard (approx)
10GB shard	10-30M docs (variable by doc size)
30GB shard	30-100M docs
50GB shard	50-150M docs

Doc count is harder to predict than shard size. A 1KB doc and a 50KB doc land at very different doc counts even at the same shard size. Watch both.

Example: 500GB index with 3 primary shards and 1 replica each:

shards_per_node = 3 primaries + 3 replicas = 6 shards
at 30GB/shard = 180GB per node

heap needed:
  segment overhead: 6 × 25MB = 150MB
  indexing buffer: 512MB (default, scales with heap)
  query cache: 276MB (10% of ~2.7GB heap)
  fielddata: 276MB
  OS reserve: 1GB
  total ≈ 2.5GB minimum, 4GB recommended

The sweet spot is 30GB-32GB heap per node. Above 32GB, Lucene starts using direct memory buffers outside the Java heap, which shifts the memory math. Most production clusters run 31GB heap on 64GB machines — the remaining RAM goes to the OS page cache, which Lucene uses heavily for file system caching.

Under the Hood: Segments, Translog, and Merge Policies

Elasticsearch is built on Apache Lucene, and understanding how Lucene stores your data at the file level demystifies many production behaviors — sudden latency spikes, disk usage patterns, and heap growth all trace back to segment internals.

Lucene Segments

Every shard contains one or more Lucene segments. A segment is a self-contained, immutable inverted index. When you index a document, it goes into a small in-memory buffer. On refresh, that buffer is flushed into a new segment file on disk. Searches hit all segments and merge results.

Segment immutability means writes never modify existing segments — new documents create new segments. This design enables locking-free writes but creates a problem: many small segments degrade search performance.

Segment Lifecycle

    index    flush    merge
doc -> buffer -> segment -> (merged) -> larger segment
         \                  /
          \----------------/
           translog backup

The refresh_interval controls how often in-memory buffers become searchable (default: 1 second). Between flushes, data lives in both the buffer and the translog.

The Translog

The translog is a write-ahead log that durability-safes every indexed document before it is committed to a segment. If a crash occurs, Elasticsearch replays the translog to recover un-flushed documents. This is the key difference between Elasticsearch’s near-real-time promise and true real-time guarantees.

Translog truncation happens automatically after a successful flush. You can force a translog flush:

POST /my-index/_flush

For write-heavy workloads, sizing the translog matters:

{
  "index": {
    "translog": {
      "disable_threshold": "1h",
      "sync_interval": "5s",
      "retention": {
        "size": "512MB",
        "age": "12h"
      }
    }
  }
}

Setting retention.size too low causes data loss on crashes. Setting it too high wastes disk space. The right value depends on your crash-recovery SLA.

Merge Policies

When segments accumulate, Lucene runs a merge policy to coalesce small segments into larger ones. The default policy is TieredMergePolicy (Lucene 9+) or LogByteSizeMergePolicy (older).

TieredMergePolicy targets a balanced segment size ladder:

{
  "index": {
    "merge_policy": {
      "type": "tiered",
      "max_merge_at_once": 10,
      "segments_per_tier": 10,
      "max_merged_segment_size_bytes": "5GB"
    }
  }
}

Key parameters:

max_merge_at_once — how many segments can be merged in a single merge round (default: 10)
segments_per_tier — how many small segments are allowed before a merge is triggered (default: 10)
max_merged_segment_size_bytes — caps the size of merged segments to prevent giant segments

LogByteSizeMergePolicy (simpler, older):

{
  "index": {
    "merge_policy": {
      "type": "log_byte_size",
      "min_merge_size": "2MB",
      "max_merge_size": "10GB"
    }
  }
}

For write-heavy workloads (like log ingestion), set max_merge_at_once lower so merges don’t steal CPU from indexing. For read-heavy workloads, higher values mean fewer but larger segments — faster searches.

Force Merge

Running a force merge reduces segment count dramatically, at the cost of heavy I/O:

POST /my-index/_forcemerge?max_num_segments=1

This collapses all segments into one. It dramatically speeds up searches on read-heavy indices and reduces file handle usage. On large indices, run it during maintenance windows — it is I/O and CPU intensive, and Elasticsearch blocks new searches against indices being force-merged.

When to Use / When Not to Use

When to Use Elasticsearch

Log and event analysis at scale, especially with the ELK stack
Full-text search with complex relevance tuning and fuzzy matching
Real-time analytics on time-series data with aggregations
Distributed search where horizontal scalability is a requirement
Autocomplete and type-ahead features via completion suggester

When Not to Use Elasticsearch

Primary data store requiring ACID transactions (use a proper database)
Simple key-value lookups where a document store suffices
Heavy join operations across multiple entity types
Systems requiring strong consistency (Elasticsearch is eventually consistent by default)
Small datasets where a single PostgreSQL tsvector or SQLite FTS5 would suffice

Trade-off Analysis

Elasticsearch is not the only game in town. Depending on your use case, an alternative may serve you better.

Elasticsearch vs Apache Solr

Factor	Elasticsearch	Apache Solr
Ecosystem	ELK Stack (Kibana, etc.)	Solr + SolrCloud, Banana (older)
Operational model	Born for distributed	Distributed via SolrCloud
Schema	Dynamic mappings	Strict schema (SolrCell for parsing)
Query DSL	JSON-based, developer-friendly	XML-based (older), JSON supported
Scaling	Horizontal by default	Horizontal with auto-replication
Security	XPack (paid), builtin	Apache Shiro-based, configurable
Learning curve	Lower	Higher (deeper Lucene knowledge needed)
Use case fit	Logs, metrics, search	Enterprise search, faceted catalogs

Solr has a deeper history in enterprise search and handles complex faceted search (like e-commerce product catalogs) beautifully. Elasticsearch wins on operational simplicity, the Kibana ecosystem, and native horizontal scaling.

Elasticsearch vs OpenSearch

Factor	Elasticsearch	OpenSearch
Licensing	SSPL (since 2021)	Apache 2.0
Governance	Elastic N.V.	Linux Foundation (community-driven)
Feature parity	Faster innovation cycles	Elasticsearch features, ported over
Plugins	Commercial XPack	OpenSearch plugins
Security	XPack Security (paid)	OpenSearch Security (free)
Use case fit	Proprietary stack	If you need Apache-licensed search

If licensing concerns (SSPL) are a blocker, OpenSearch is the most mature drop-in replacement with near-complete API compatibility.

Elasticsearch vs Meilisearch

Factor	Elasticsearch	Meilisearch
Complexity	Full cluster management	Single binary, runs anywhere
Relevance tuning	Deep BM25 + scoring	Basic relevance, typo tolerance
Indexing speed	Very fast at scale	Extremely fast, even at small scale
Memory footprint	Several GB (JVM heap)	< 100MB
Distributed	Yes (native sharding)	Limited (single replica only)
Use case fit	Billions of docs, complex queries	Small-medium datasets, typo-tolerant search, fast prototype
Configuration	Expert-level tuning	Works out of the box

Meilisearch is the right choice when you want Elasticsearch-grade relevance without the operational overhead. It excels at typo-tolerant search, autocomplete, and datasets under 100 million documents.

Elasticsearch vs Typesense

Factor	Elasticsearch	Typesense
License	SSPL	GPL 3 (self-hosted), cloud-only (hosted)
Memory	JVM heap-heavy	~100MB-1GB
Query model	Full Query DSL	Simple, typo tolerance built-in
Distributed	Native sharding	Raft-based clustering
Faceting	Rich aggregations	Limited (planned)
Use case fit	Complex enterprise search	Typo-tolerant search, autocomplete, ranking
Dev experience	Steep learning curve	Quick start, friendly API

Typesense is optimized for developer experience and typo-tolerant search. If you need sub-100ms search with minimal DevOps and can live with simpler faceting, Typesense wins on simplicity.

Decision Matrix

Your need	Best choice
Billions of logs + Kibana dashboards	Elasticsearch
Enterprise faceted catalog search	Apache Solr
License-free search engine	OpenSearch
Fast prototype, small team, typo tolerance	Meilisearch
Autocomplete, typo-tolerant, low-ops	Typesense
Deep relevance tuning, complex queries	Elasticsearch
ACID transactions with search	PostgreSQL + tsvector

Production Failure Scenarios

Failure	Impact	Mitigation
Primary shard allocation fails	Index unavailable for writes	Set `index.number_of_replicas >= 1` and use automatic retry
Coordinating node OOM	Search queries fail across cluster	Limit `search.max_buckets`, add circuit breakers, increase heap
Split-brain during network partition	Duplicate data or conflicting primaries	Use `minimum_master_nodes` ( quorum), prefer single-zone clusters
Bulk indexing queue overflow	Documents rejected, indexing lag	Size queue with `thread_pool.write.queue_size`, implement backpressure
Hot/Warm node imbalance	Some nodes run out of disk or CPU	Use Index Lifecycle Management (ILM), allocate shards manually
Incorrect analyzer causing data loss	Documents return no results	Test analyzers with `_analyze` API before applying to production indices

Common Pitfalls / Anti-Patterns

Over-Sharding

Creating too many shards wastes memory and increases cluster metadata overhead. Each shard maintains its own segment files, segment metadata, and caches. A rule of thumb: keep shard size between 10GB and 50GB. If you have 1TB of data, five 200GB shards is better than fifty 20GB shards.

Fix: Plan shard count based on expected data volume. Use index templates with ILM to auto-delete old indices.

Using Query Context for Filters

Queries in must context score every document, which is expensive when you only need filtering. Move static filters to filter context.

Before (slow):

{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "search term" } },
        { "term": { "status": "published" } }
      ]
    }
  }
}

After (faster):

{
  "query": {
    "bool": {
      "must": [{ "match": { "content": "search term" } }],
      "filter": [{ "term": { "status": "published" } }]
    }
  }
}

Ignoring Refresh Interval

New documents are not searchable until the next refresh (default 1 second). For bulk indexing, this is fine, but for near-real-time requirements, understand the tradeoff. Setting refresh_interval: -1 disables auto-refresh and dramatically speeds up bulk ingestion.

Fix: Adjust refresh_interval based on write vs. read latency requirements.

Not Using Aliases for Zero-Downtime Reindexing

Reindexing directly into an existing index causes downtime and potential data inconsistency.

Fix: Use index aliases with swap operations:

POST /_aliases
{
  "actions": [
    { "remove": { "index": "my-index-v1", "alias": "my-index" } },
    { "add": { "index": "my-index-v2", "alias": "my-index" } }
  ]
}

Quick Recap

Key Bullets

Elasticsearch stores data in inverted indexes, enabling millisecond full-text queries
Shard count is fixed at index creation — plan wisely (10GB-50GB per shard target)
Use filter context for non-scoring queries to leverage caching
Replicas provide fault tolerance and read scaling; they do not help write throughput
The Query DSL separates leaf queries (match, term) from compound queries (bool)
Index aliases enable zero-downtime reindexing and blue-green deployments
BM25 is the default similarity algorithm; it handles term frequency saturation

Copy/Paste Checklist

# Check cluster health
GET /_cluster/health

# View shard allocation
GET /_cat/shards?v

# Monitor search latency
GET /_nodes/stats/indices/search?filter_path=**.*.query_total&timeout=30s

# Force merge to reduce segments
POST /my-index/_forcemerge?max_num_segments=1

# Update replica count
PUT /my-index/_settings
{
  "number_of_replicas": 2
}

# Check slow logs
GET /_nodes/stats/indices/indexing?filter_path=**.*.indexing.index.failed

# Set ILM policy
PUT /_ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "7d" }
        }
      }
    }
  }
}

Observability Checklist

Metrics to Monitor

{
  "cluster_metrics": {
    "cluster_health": "green/yellow/red status",
    "number_of_pending_tasks": "< 100 typically",
    "task_duration_avg": "< 1s for search, < 5s for bulk"
  },
  "node_metrics": {
    "heap_used_percent": "< 85% sustained",
    "cpu_usage_percent": "< 70% sustained",
    "disk_io_read/write": "baseline + anomaly detection",
    "open_file_handles": "< 80% of ulimit"
  },
  "index_metrics": {
    "indexing_rate": "documents/second",
    "search_latency_p99": "< 500ms for interactive, < 2s for batch",
    "refresh_latency": "< 1s per segment",
    "segments_count": "< 50 per shard (force merge if higher)"
  }
}

Key Logs to Capture

Error logs: logs/elasticsearch.log — shard failures, OOM events, circuit breaker trips
Deprecation logs: deprecated API usage, settings scheduled for removal
Slow search logs: configure index.search.slowlog.threshold to capture queries exceeding latency thresholds
Indexing slow logs: index.indexing.slowlog.threshold for bulk insert issues

# Enable slow logs via API (persistent across restarts)
PUT /my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.indexing.slowlog.threshold.index.warn": "10s"
}

Alerts to Configure

Alert	Condition	Severity
Cluster health	`status == red`	Critical
Node heap usage	`heap_used_percent > 90%` for > 5 min	Warning
Search latency	`search_latency_p99 > 2s`	Warning
Unassigned shards	`unassigned_shards > 0` for > 5 min	Critical
Bulk queue rejection	`bulk_queue_rejections > 0`	Warning
Disk watermark	`disk.watermark.low` exceeded	Warning

Security Checklist

Enable XPack Security (or OpenSearch Security if using OpenSearch distro)
Use role-based access control (RBAC) — define roles with least-privilege principles
Encrypt node-to-node communication with TLS certificates
Restrict JMX/REST endpoints to internal networks; do not expose publicly
Validate input to prevent injection via query strings and aggregations
Audit access logs — log all write operations to sensitive indices
Rotate credentials regularly; use a secrets manager for API keys
Configure field-level security if certain fields must be hidden from some roles
Enable audit logging to track unauthorized access attempts

# Example: Create a read-only role for an index
POST /_security/role/read_only_blogs
{
  "indices": [
    {
      "names": ["blogs-*"],
      "privileges": ["read"]
    }
  ]
}

Interview Questions

1. What is an inverted index and how does it differ from a row-based database index?

An inverted index maps terms (words) to documents rather than documents to terms. When you search for a word, the database looks up the word in the index and immediately retrieves matching documents — no full table scan. A row-based index (like a B-tree in PostgreSQL) maps rows to disk locations for fast primary-key lookups. Inverted indexes enable full-text search capabilities that row-based indexes cannot efficiently support without scanning every row.

2. How does shard routing work in Elasticsearch?

Documents are routed to shards using a consistent hash of the document ID: shard = hash(_id) % num_primary_shards. This formula is deterministic — the same document ID always routes to the same shard as long as the number of primary shards does not change. When a search request arrives, the coordinating node broadcasts to all shards, collects results, merges them, and returns the response.

3. What is the difference between `must`, `should`, `filter`, and `must_not` clauses in a bool query?

must — documents must match; contributes to relevance score (AND logic). should — documents should match; contributes to score if they do (OR logic). filter — documents must match but without scoring; takes advantage of filter cache (faster). must_not — documents must not match; excluded from results (NOT logic). Filters bypass scoring entirely, making them faster than queries in must context.

4. What is BM25 and how does it handle relevance scoring?

BM25 (Okapi BM25) is Elasticsearch's default similarity algorithm. It improves on TF-IDF by introducing term frequency saturation — a term appearing 10 times does not score 10x a term appearing once. It also normalizes by field length: a term in a 10-word field scores higher than the same term in a 500-word field. BM25's saturation function prevents common words from dominating scores and handles long documents more fairly than naive TF-IDF.

5. What are primary shards and replica shards, and how do they differ in function?

Primary shards store the original data and handle both reads and writes. Replica shards are copies of primary shards, serve read requests for horizontal read scaling, and provide fault tolerance — if a primary fails, a replica is promoted automatically. Replicas never contain more data than their primary and are never allocated on the same node as their primary. Replicas do not improve write throughput; only primary shards handle writes.

6. What is the Elasticsearch analyzer pipeline and what are its three stages?

The analyzer pipeline has three stages: (1) Character filters — strip HTML tags, normalize Unicode, apply character mappings; (2) Tokenizer — splits text into individual tokens; (3) Token filters — lowercase tokens, remove stop words, apply synonyms, perform stemming. The output of all three stages is what gets stored in the inverted index. Choosing the right analyzer is critical — using the wrong one can make documents unfindable.

7. How does the translog provide durability in Elasticsearch?

The translog is a write-ahead log. Every indexed document is written to the translog before being acknowledged. On a crash, un-flushed documents are recovered by replaying the translog. This is why Elasticsearch guarantees durability even though it is not a true ACID database. Translog segments are truncated after a successful segment flush to disk. You can tune translog.sync_interval and translog.retention based on your crash-recovery SLA.

8. What is the difference between filter context and query context in Elasticsearch?

In filter context, a clause determines whether a document is included or excluded but does not affect the relevance score. Elasticsearch caches the resulting bitset for reuse — subsequent identical filters are near-instant. In query context, clauses affect the relevance score; every matching document is scored. Always use filter context for non-scoring criteria like date ranges, status fields, or numeric filters to leverage caching and skip scoring.

9. What is a Lucene segment merge policy and why does it matter?

Lucene writes new documents into small immutable segments. As segments accumulate, search performance degrades. A merge policy (e.g., TieredMergePolicy) periodically coalesces small segments into larger ones. Key parameters include max_merge_at_once (segments merged per round), segments_per_tier (small segments allowed before merge), and max_merged_segment_size_bytes (cap on merged segment size). Tuning these matters for write-heavy vs. read-heavy workloads — too aggressive merging hurts indexing throughput.

10. What is the recommended heap sizing for Elasticsearch nodes and why?

The sweet spot is 30–32GB heap per node. Below 30GB, Lucene uses heap memory for file system caches, which hurts I/O. Above 32GB, JVM pointer compression breaks down and Lucene may start using direct memory buffers outside the heap, making memory accounting less predictable. A typical production node: 64GB machine with 31GB heap, 1GB OS reserve, remaining RAM for the OS page cache (Lucene uses this heavily for file system caching). Always set -Xms equal to -Xmx to prevent heap resizing.

11. When would you use an index alias instead of a direct index name?

Index aliases enable zero-downtime reindexing. During a reindex from my-index-v1 to my-index-v2, you atomically swap the alias my-index from the old index to the new one. Applications continue querying my-index without knowing which version is active. Aliases also support blue-green deployments, traffic routing to specific shards via routing rules, and querying across multiple indices with a single name.

12. How do you prevent split-brain scenarios in an Elasticsearch cluster?

Split-brain occurs during network partitions when nodes cannot communicate but both sides still accept writes. Elasticsearch prevents this using the quorum formula: (primary_shards + replica_shards) / 2 + 1. Only nodes with a quorum can elect a master. Set discovery.zen.minimum_master_nodes to a value that ensures a quorum cannot exist on both sides of a partition. In modern Elasticsearch (7+), the default discovery avoids split-brain by requiring a majority for master election.

13. How does Elasticsearch handle sorting and what is the difference between `keyword` and `text` fields for sorting?

Text fields are analyzed — they break text into tokens (e.g., "Hello World" becomes ["hello", "world"]). You cannot sort or aggregate on analyzed text fields directly because the tokenized values lose the original order. Use keyword fields (or a multi-field with a keyword sub-field) for sorting, faceting, and aggregations. For sortable full-text, use a multi-field mapping with both text (for search) and keyword (for sorting). Example: "name": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }.

14. What is the difference between `term`, `match`, and `query_string` queries?

term queries look up an exact value in the inverted index — no analysis is applied. Use it for keyword fields, IDs, and exact matches. match queries analyze the query string and then look up each resulting token — this is what you want for full-text search on text fields. query_string parses a query syntax string (like Lucene's) supporting AND, OR, NOT, phrase queries, wildcards, and field-specific searches in a single string. query_string is powerful but risky — it can throw parse errors on malformed input and has security implications if user-supplied.

15. What are index templates and when would you use them?

Index templates define settings and mappings that are automatically applied when new indices are created matching a pattern. They are essential for time-based data (like logs) where you create a new index per day or month. A template can set the number of shards, replicas, refresh intervals, field mappings, and ILM policies. Use composable templates (7.8+) over legacy index templates for better control over priority and module merging. Templates are applied at index creation time — they never update existing indices.

16. How does Index Lifecycle Management (ILM) work and what are the typical phases?

ILM automates index management across its lifetime. Typical phases: Hot — active indexing and searching, high resources; Warm — read-only, reduced shards or force-merged; Cold — no new writes, searchable but with slower storage; Delete — remove old indices after retention period. ILM policies are attached to indices via index templates. Actions within phases include rollover (start new index when size/age threshold is hit), shrink (reduce primary shards), force merge, freeze, and delete.

17. What is the difference between refresh, flush, and force merge in Elasticsearch?

Refresh makes in-memory indexed data searchable. It creates a new Lucene segment from the index buffer. Default interval is 1 second — this is why Elasticsearch is near-real-time, not truly real-time. Flush persists buffered data to disk as a new Lucene segment and clears the translog. It is about durability, not searchability. Force merge reduces the number of Lucene segments by merging many small ones into fewer large ones. It improves search performance on read-heavy indices but is I/O intensive and should be scheduled during maintenance windows.

18. What is a rollover index and how does it relate to ILM?

A rollover index creates a new write index when the current one reaches a size, age, or doc-count threshold. Instead of appending to a single ever-growing index (which creates giant shards), you roll over to a fresh index. This is the core of ILM's hot phase — the rollover alias always points to the active write index while older indices transition through warm, cold, and delete phases. Example: PUT /my-index-000001 with alias my-index as write index; when it hits max_age: 7d, ILM creates my-index-000002 and switches the alias.

19. How does Elasticsearch handle nested documents and when would you need them?

Nested documents are stored as separate Lucene documents with a shared doc ID, preserving the relationship between objects. Without nested mapping, objects are flattened — a query matching two conditions might match different child objects in the same parent. Nested queries treat the array as independent documents and correctly enforce must-match-on-same-child logic. Downside: nested queries are significantly slower because they require a join-like step. If you frequently query by fields inside nested objects and exact object-level matching matters, use nested. Otherwise, consider denormalization (flattening) for better performance.

20. What are the circuit breakers in Elasticsearch and how do they prevent OOM?

Elasticsearch has multiple circuit breakers that abort requests that would consume too much heap memory, preventing OutOfMemoryError: parent circuit breaker — total heap across all breakers (defaults to 95% of heap); fielddata circuit breaker — estimated size to load fielddata into memory (default 40%); request circuit breaker — per-request memory allocations like buckets in aggregations (default 40%); inflight requests circuit breaker — total size of incoming requests (default 100%); accounting circuit breaker — memory held by segments not released on merge (default 100%). When a breaker trips, Elasticsearch returns a 429 status with a circuit_breaking_exception. Tuning these requires care — setting them too high defeats the purpose; too low causes spurious rejections.

Elasticsearch: Full-Text Search at Scale

Core Concepts

The Inverted Index

Analyzer Pipeline

Sharding: Distributing Data Across Nodes

Shard Routing Explained

Replica Shards

The Query DSL: Expressing Search Logic

The Bool Query

Relevance Scoring

Designing Indices for Performance

Capacity Estimation

Under the Hood: Segments, Translog, and Merge Policies

Lucene Segments

The Translog

Merge Policies

Force Merge

When to Use / When Not to Use

When to Use Elasticsearch

When Not to Use Elasticsearch

Trade-off Analysis

Elasticsearch vs Apache Solr

Elasticsearch vs OpenSearch

Elasticsearch vs Meilisearch

Elasticsearch vs Typesense

Decision Matrix

Production Failure Scenarios

Common Pitfalls / Anti-Patterns

Over-Sharding

Using Query Context for Filters

Ignoring Refresh Interval

Not Using Aliases for Zero-Downtime Reindexing

Quick Recap

Key Bullets

Copy/Paste Checklist

Observability Checklist

Metrics to Monitor

Key Logs to Capture

Alerts to Configure

Security Checklist

Interview Questions

Further Reading

Category

Tags

Related Posts

Apache Solr: Enterprise Search Platform

Search Scaling: Sharding, Routing, and Horizontal Growth

Skip Lists: Layered Linked Lists for Fast Search