Elasticsearch: Full-Text Search at Scale

Learn how Elasticsearch powers search at scale with inverted indexes, sharding, replicas, and its powerful Query DSL for modern applications.

published: reading time: 27 min read author: GeekWorkBench

Elasticsearch: Full-Text Search at Scale

Elasticsearch stores data in inverted indexes — the key data structure that makes full-text search fast across billions of documents. This post covers how it works: inverted indexes, shards, replicas, and the Query DSL for writing searches that actually perform.

Core Concepts

The Inverted Index

Elasticsearch stores data in Lucene indices. The core data structure is the inverted index — a mapping from terms to the documents containing those terms. When you search for a word, the engine looks it up and retrieves matching documents immediately. No full table scan. This is what makes Elasticsearch fast.

{
  "inverted_index": {
    "elasticsearch": [{ "doc_id": 1, "positions": [0, 3] }],
    "tutorial": [{ "doc_id": 1, "positions": [1] }],
    "full-text": [{ "doc_id": 2, "positions": [0] }]
  }
}

When you search for “elasticsearch tutorial,” Elasticsearch looks up both terms in the inverted index and finds matching documents instantly. No full table scan required.

The index also stores metadata about each term: document frequency, term frequency, positions for phrase queries, and norms for field-length normalization. This metadata is what makes relevance scoring, phrase matching, and fuzzy search possible.

Analyzer Pipeline

Before terms enter the inverted index, they pass through an analyzer consisting of three stages:

  1. Character filters remove HTML tags, convert characters, or apply language-specific normalizations.
  2. Tokenizer splits text into individual terms (tokens).
  3. Token filters lowercase terms, remove stop words, apply synonyms, and perform stemming.
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "snowball", "asciifolding"]
        }
      }
    }
  }
}

Choosing the right analyzer matters. The standard analyzer works fine for most English text. For domain-specific vocabularies, you might need custom analyzers with synonym filters or language-specific stemmers.

Sharding: Distributing Data Across Nodes

A single Elasticsearch node can store millions of documents, but eventually you will need more storage, CPU, or memory than one machine provides. Elasticsearch solves this with shards: horizontal partitions of your index.

When you create an index, you specify the number of primary shards. Each primary shard is an independent Lucene index that stores a subset of your documents.

{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

With three primary shards, Elasticsearch distributes documents roughly evenly across them. A document with ID doc123 is routed to a specific shard using a hash of the ID: shard = hash(_id) % num_primary_shards.

Shard Routing Explained

graph TD
    Client[Client] -->|"search request"| LB[Load Balancer]
    LB --> Node1[Node 1]
    Node1 -->|"coordination"| Coordinator[Coordinator]
    Coordinator -->|"scatter"| Shard1[Primary Shard 1]
    Coordinator -->|"scatter"| Shard2[Primary Shard 2]
    Coordinator -->|"scatter"| Shard3[Primary Shard 3]
    Shard1 -->|"gather"| Coordinator
    Shard2 -->|"gather"| Coordinator
    Shard3 -->|"gather"| Coordinator
    Coordinator -->|"reduce"| Client

The node receiving the search request becomes the coordinator. It broadcasts the query to all relevant shards, collects results, and merges them into a single response. The scatter-gather pattern here is what lets you search across all shards in parallel.

Replica Shards

Every primary shard can have replicas for fault tolerance and read scalability. Replicas are never allocated on the same node as their primary. If a node fails, Elasticsearch promotes replicas to primaries automatically. Add replicas to linearly scale search throughput for read-heavy workloads.

The Query DSL: Expressing Search Logic

Elasticsearch’s Query DSL is a JSON-based language for expressing complex searches. It separates leaf queries (match, term) from compound queries (bool). Use filter context for non-scoring queries to leverage caching. Use query context when relevance scoring matters.

The Bool Query

The bool query is the workhorse of Elasticsearch search. It supports four clauses:

  • must: Documents must match (AND logic)
  • should: Documents should match (OR logic)
  • filter: Same as must but without scoring (faster)
  • must_not: Documents must not match (NOT logic)
{
  "query": {
    "bool": {
      "must": [{ "match": { "title": "elasticsearch" } }],
      "filter": [
        { "range": { "publish_date": { "gte": "2024-01-01" } } },
        { "term": { "status": "published" } }
      ],
      "should": [
        { "match": { "tags": "search" } },
        { "match": { "tags": "database" } }
      ],
      "minimum_should_match": 1
    }
  }
}

The filter context is particularly important. Queries in filter context bypass scoring entirely, and Elasticsearch caches filter bitsets for reuse. If you filter by a static field like status, subsequent queries become faster.

Relevance Scoring

Elasticsearch uses BM25 (Okapi BM25) as its default similarity algorithm. BM25 considers term frequency saturation and field length normalization. A term appearing 10 times in a 100-word field scores roughly the same as appearing 5 times in a 50-word field.

You can debug relevance with the explain parameter:

GET /my-index/_search
{
  "explain": true,
  "query": {
    "match": { "content": "elasticsearch tutorial" }
  }
}

The response shows exactly how each document scored, including term frequencies, inverse document frequencies, and field norms.

Designing Indices for Performance

Index design profoundly impacts search performance. A few principles:

Size your shards wisely. Shards between 10GB and 50GB work well. Too many small shards cause overhead; too few large shards cause memory pressure.

Use aliases for flexibility. Index aliases let you reindex without downtime:

POST /_aliases
{
  "actions": [
    { "remove": { "index": "my-index-v1", "alias": "my-index" } },
    { "add": { "index": "my-index-v2", "alias": "my-index" } }
  ]
}

Denormalize for read performance. Elasticsearch does not support joins like SQL. If you frequently query documents with nested objects, consider flattening the structure or using denormalization.

Capacity Estimation

Sizing an Elasticsearch cluster means estimating heap and document counts.

Heap allocation per node:

heap_per_node ≈
    (shards_per_node × segment_overhead)
  + (indexing_buffer × 10-20% of heap)
  + (query_cache × 10% of heap)
  + (fielddata × 10-20% of heap)
  + OS reserve (1GB)

Elasticsearch heap is shared across all shards on a node. Segment overhead alone is roughly 25MB per shard. With 30 shards on one node, that is 750MB before you even touch indexing buffers or caches.

Docs per shard guidelines:

Shard Size TargetDocs per Shard (approx)
10GB shard10-30M docs (variable by doc size)
30GB shard30-100M docs
50GB shard50-150M docs

Doc count is harder to predict than shard size. A 1KB doc and a 50KB doc land at very different doc counts even at the same shard size. Watch both.

Example: 500GB index with 3 primary shards and 1 replica each:

shards_per_node = 3 primaries + 3 replicas = 6 shards
at 30GB/shard = 180GB per node

heap needed:
  segment overhead: 6 × 25MB = 150MB
  indexing buffer: 512MB (default, scales with heap)
  query cache: 276MB (10% of ~2.7GB heap)
  fielddata: 276MB
  OS reserve: 1GB
  total ≈ 2.5GB minimum, 4GB recommended

The sweet spot is 30GB-32GB heap per node. Above 32GB, Lucene starts using direct memory buffers outside the Java heap, which shifts the memory math. Most production clusters run 31GB heap on 64GB machines — the remaining RAM goes to the OS page cache, which Lucene uses heavily for file system caching.

Under the Hood: Segments, Translog, and Merge Policies

Elasticsearch is built on Apache Lucene, and understanding how Lucene stores your data at the file level demystifies many production behaviors — sudden latency spikes, disk usage patterns, and heap growth all trace back to segment internals.

Lucene Segments

Every shard contains one or more Lucene segments. A segment is a self-contained, immutable inverted index. When you index a document, it goes into a small in-memory buffer. On refresh, that buffer is flushed into a new segment file on disk. Searches hit all segments and merge results.

Segment immutability means writes never modify existing segments — new documents create new segments. This design enables locking-free writes but creates a problem: many small segments degrade search performance.

Segment Lifecycle

    index    flush    merge
doc -> buffer -> segment -> (merged) -> larger segment
         \                  /
          \----------------/
           translog backup

The refresh_interval controls how often in-memory buffers become searchable (default: 1 second). Between flushes, data lives in both the buffer and the translog.

The Translog

The translog is a write-ahead log that durability-safes every indexed document before it is committed to a segment. If a crash occurs, Elasticsearch replays the translog to recover un-flushed documents. This is the key difference between Elasticsearch’s near-real-time promise and true real-time guarantees.

Translog truncation happens automatically after a successful flush. You can force a translog flush:

POST /my-index/_flush

For write-heavy workloads, sizing the translog matters:

{
  "index": {
    "translog": {
      "disable_threshold": "1h",
      "sync_interval": "5s",
      "retention": {
        "size": "512MB",
        "age": "12h"
      }
    }
  }
}

Setting retention.size too low causes data loss on crashes. Setting it too high wastes disk space. The right value depends on your crash-recovery SLA.

Merge Policies

When segments accumulate, Lucene runs a merge policy to coalesce small segments into larger ones. The default policy is TieredMergePolicy (Lucene 9+) or LogByteSizeMergePolicy (older).

TieredMergePolicy targets a balanced segment size ladder:

{
  "index": {
    "merge_policy": {
      "type": "tiered",
      "max_merge_at_once": 10,
      "segments_per_tier": 10,
      "max_merged_segment_size_bytes": "5GB"
    }
  }
}

Key parameters:

  • max_merge_at_once — how many segments can be merged in a single merge round (default: 10)
  • segments_per_tier — how many small segments are allowed before a merge is triggered (default: 10)
  • max_merged_segment_size_bytes — caps the size of merged segments to prevent giant segments

LogByteSizeMergePolicy (simpler, older):

{
  "index": {
    "merge_policy": {
      "type": "log_byte_size",
      "min_merge_size": "2MB",
      "max_merge_size": "10GB"
    }
  }
}

For write-heavy workloads (like log ingestion), set max_merge_at_once lower so merges don’t steal CPU from indexing. For read-heavy workloads, higher values mean fewer but larger segments — faster searches.

Force Merge

Running a force merge reduces segment count dramatically, at the cost of heavy I/O:

POST /my-index/_forcemerge?max_num_segments=1

This collapses all segments into one. It dramatically speeds up searches on read-heavy indices and reduces file handle usage. On large indices, run it during maintenance windows — it is I/O and CPU intensive, and Elasticsearch blocks new searches against indices being force-merged.

When to Use / When Not to Use

When to Use Elasticsearch

  • Log and event analysis at scale, especially with the ELK stack
  • Full-text search with complex relevance tuning and fuzzy matching
  • Real-time analytics on time-series data with aggregations
  • Distributed search where horizontal scalability is a requirement
  • Autocomplete and type-ahead features via completion suggester

When Not to Use Elasticsearch

  • Primary data store requiring ACID transactions (use a proper database)
  • Simple key-value lookups where a document store suffices
  • Heavy join operations across multiple entity types
  • Systems requiring strong consistency (Elasticsearch is eventually consistent by default)
  • Small datasets where a single PostgreSQL tsvector or SQLite FTS5 would suffice

Trade-off Analysis

Elasticsearch is not the only game in town. Depending on your use case, an alternative may serve you better.

Elasticsearch vs Apache Solr

FactorElasticsearchApache Solr
EcosystemELK Stack (Kibana, etc.)Solr + SolrCloud, Banana (older)
Operational modelBorn for distributedDistributed via SolrCloud
SchemaDynamic mappingsStrict schema (SolrCell for parsing)
Query DSLJSON-based, developer-friendlyXML-based (older), JSON supported
ScalingHorizontal by defaultHorizontal with auto-replication
SecurityXPack (paid), builtinApache Shiro-based, configurable
Learning curveLowerHigher (deeper Lucene knowledge needed)
Use case fitLogs, metrics, searchEnterprise search, faceted catalogs

Solr has a deeper history in enterprise search and handles complex faceted search (like e-commerce product catalogs) beautifully. Elasticsearch wins on operational simplicity, the Kibana ecosystem, and native horizontal scaling.

Elasticsearch vs OpenSearch

FactorElasticsearchOpenSearch
LicensingSSPL (since 2021)Apache 2.0
GovernanceElastic N.V.Linux Foundation (community-driven)
Feature parityFaster innovation cyclesElasticsearch features, ported over
PluginsCommercial XPackOpenSearch plugins
SecurityXPack Security (paid)OpenSearch Security (free)
Use case fitProprietary stackIf you need Apache-licensed search

If licensing concerns (SSPL) are a blocker, OpenSearch is the most mature drop-in replacement with near-complete API compatibility.

Elasticsearch vs Meilisearch

FactorElasticsearchMeilisearch
ComplexityFull cluster managementSingle binary, runs anywhere
Relevance tuningDeep BM25 + scoringBasic relevance, typo tolerance
Indexing speedVery fast at scaleExtremely fast, even at small scale
Memory footprintSeveral GB (JVM heap)< 100MB
DistributedYes (native sharding)Limited (single replica only)
Use case fitBillions of docs, complex queriesSmall-medium datasets, typo-tolerant search, fast prototype
ConfigurationExpert-level tuningWorks out of the box

Meilisearch is the right choice when you want Elasticsearch-grade relevance without the operational overhead. It excels at typo-tolerant search, autocomplete, and datasets under 100 million documents.

Elasticsearch vs Typesense

FactorElasticsearchTypesense
LicenseSSPLGPL 3 (self-hosted), cloud-only (hosted)
MemoryJVM heap-heavy~100MB-1GB
Query modelFull Query DSLSimple, typo tolerance built-in
DistributedNative shardingRaft-based clustering
FacetingRich aggregationsLimited (planned)
Use case fitComplex enterprise searchTypo-tolerant search, autocomplete, ranking
Dev experienceSteep learning curveQuick start, friendly API

Typesense is optimized for developer experience and typo-tolerant search. If you need sub-100ms search with minimal DevOps and can live with simpler faceting, Typesense wins on simplicity.

Decision Matrix

Your needBest choice
Billions of logs + Kibana dashboardsElasticsearch
Enterprise faceted catalog searchApache Solr
License-free search engineOpenSearch
Fast prototype, small team, typo toleranceMeilisearch
Autocomplete, typo-tolerant, low-opsTypesense
Deep relevance tuning, complex queriesElasticsearch
ACID transactions with searchPostgreSQL + tsvector

Production Failure Scenarios

FailureImpactMitigation
Primary shard allocation failsIndex unavailable for writesSet index.number_of_replicas >= 1 and use automatic retry
Coordinating node OOMSearch queries fail across clusterLimit search.max_buckets, add circuit breakers, increase heap
Split-brain during network partitionDuplicate data or conflicting primariesUse minimum_master_nodes ( quorum), prefer single-zone clusters
Bulk indexing queue overflowDocuments rejected, indexing lagSize queue with thread_pool.write.queue_size, implement backpressure
Hot/Warm node imbalanceSome nodes run out of disk or CPUUse Index Lifecycle Management (ILM), allocate shards manually
Incorrect analyzer causing data lossDocuments return no resultsTest analyzers with _analyze API before applying to production indices

Common Pitfalls / Anti-Patterns

Over-Sharding

Creating too many shards wastes memory and increases cluster metadata overhead. Each shard maintains its own segment files, segment metadata, and caches. A rule of thumb: keep shard size between 10GB and 50GB. If you have 1TB of data, five 200GB shards is better than fifty 20GB shards.

Fix: Plan shard count based on expected data volume. Use index templates with ILM to auto-delete old indices.

Using Query Context for Filters

Queries in must context score every document, which is expensive when you only need filtering. Move static filters to filter context.

Before (slow):

{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "search term" } },
        { "term": { "status": "published" } }
      ]
    }
  }
}

After (faster):

{
  "query": {
    "bool": {
      "must": [{ "match": { "content": "search term" } }],
      "filter": [{ "term": { "status": "published" } }]
    }
  }
}

Ignoring Refresh Interval

New documents are not searchable until the next refresh (default 1 second). For bulk indexing, this is fine, but for near-real-time requirements, understand the tradeoff. Setting refresh_interval: -1 disables auto-refresh and dramatically speeds up bulk ingestion.

Fix: Adjust refresh_interval based on write vs. read latency requirements.

Not Using Aliases for Zero-Downtime Reindexing

Reindexing directly into an existing index causes downtime and potential data inconsistency.

Fix: Use index aliases with swap operations:

POST /_aliases
{
  "actions": [
    { "remove": { "index": "my-index-v1", "alias": "my-index" } },
    { "add": { "index": "my-index-v2", "alias": "my-index" } }
  ]
}

Quick Recap

Key Bullets

  • Elasticsearch stores data in inverted indexes, enabling millisecond full-text queries
  • Shard count is fixed at index creation — plan wisely (10GB-50GB per shard target)
  • Use filter context for non-scoring queries to leverage caching
  • Replicas provide fault tolerance and read scaling; they do not help write throughput
  • The Query DSL separates leaf queries (match, term) from compound queries (bool)
  • Index aliases enable zero-downtime reindexing and blue-green deployments
  • BM25 is the default similarity algorithm; it handles term frequency saturation

Copy/Paste Checklist

# Check cluster health
GET /_cluster/health

# View shard allocation
GET /_cat/shards?v

# Monitor search latency
GET /_nodes/stats/indices/search?filter_path=**.*.query_total&timeout=30s

# Force merge to reduce segments
POST /my-index/_forcemerge?max_num_segments=1

# Update replica count
PUT /my-index/_settings
{
  "number_of_replicas": 2
}

# Check slow logs
GET /_nodes/stats/indices/indexing?filter_path=**.*.indexing.index.failed

# Set ILM policy
PUT /_ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "7d" }
        }
      }
    }
  }
}

Observability Checklist

Metrics to Monitor

{
  "cluster_metrics": {
    "cluster_health": "green/yellow/red status",
    "number_of_pending_tasks": "< 100 typically",
    "task_duration_avg": "< 1s for search, < 5s for bulk"
  },
  "node_metrics": {
    "heap_used_percent": "< 85% sustained",
    "cpu_usage_percent": "< 70% sustained",
    "disk_io_read/write": "baseline + anomaly detection",
    "open_file_handles": "< 80% of ulimit"
  },
  "index_metrics": {
    "indexing_rate": "documents/second",
    "search_latency_p99": "< 500ms for interactive, < 2s for batch",
    "refresh_latency": "< 1s per segment",
    "segments_count": "< 50 per shard (force merge if higher)"
  }
}

Key Logs to Capture

  • Error logs: logs/elasticsearch.log — shard failures, OOM events, circuit breaker trips
  • Deprecation logs: deprecated API usage, settings scheduled for removal
  • Slow search logs: configure index.search.slowlog.threshold to capture queries exceeding latency thresholds
  • Indexing slow logs: index.indexing.slowlog.threshold for bulk insert issues
# Enable slow logs via API (persistent across restarts)
PUT /my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.indexing.slowlog.threshold.index.warn": "10s"
}

Alerts to Configure

AlertConditionSeverity
Cluster healthstatus == redCritical
Node heap usageheap_used_percent > 90% for > 5 minWarning
Search latencysearch_latency_p99 > 2sWarning
Unassigned shardsunassigned_shards > 0 for > 5 minCritical
Bulk queue rejectionbulk_queue_rejections > 0Warning
Disk watermarkdisk.watermark.low exceededWarning

Security Checklist

  • Enable XPack Security (or OpenSearch Security if using OpenSearch distro)
  • Use role-based access control (RBAC) — define roles with least-privilege principles
  • Encrypt node-to-node communication with TLS certificates
  • Restrict JMX/REST endpoints to internal networks; do not expose publicly
  • Validate input to prevent injection via query strings and aggregations
  • Audit access logs — log all write operations to sensitive indices
  • Rotate credentials regularly; use a secrets manager for API keys
  • Configure field-level security if certain fields must be hidden from some roles
  • Enable audit logging to track unauthorized access attempts
# Example: Create a read-only role for an index
POST /_security/role/read_only_blogs
{
  "indices": [
    {
      "names": ["blogs-*"],
      "privileges": ["read"]
    }
  ]
}

Interview Questions

1. What is an inverted index and how does it differ from a row-based database index?

An inverted index maps terms (words) to documents rather than documents to terms. When you search for a word, the database looks up the word in the index and immediately retrieves matching documents — no full table scan. A row-based index (like a B-tree in PostgreSQL) maps rows to disk locations for fast primary-key lookups. Inverted indexes enable full-text search capabilities that row-based indexes cannot efficiently support without scanning every row.

2. How does shard routing work in Elasticsearch?

Documents are routed to shards using a consistent hash of the document ID: shard = hash(_id) % num_primary_shards. This formula is deterministic — the same document ID always routes to the same shard as long as the number of primary shards does not change. When a search request arrives, the coordinating node broadcasts to all shards, collects results, merges them, and returns the response.

3. What is the difference between `must`, `should`, `filter`, and `must_not` clauses in a bool query?

must — documents must match; contributes to relevance score (AND logic). should — documents should match; contributes to score if they do (OR logic). filter — documents must match but without scoring; takes advantage of filter cache (faster). must_not — documents must not match; excluded from results (NOT logic). Filters bypass scoring entirely, making them faster than queries in must context.

4. What is BM25 and how does it handle relevance scoring?

BM25 (Okapi BM25) is Elasticsearch's default similarity algorithm. It improves on TF-IDF by introducing term frequency saturation — a term appearing 10 times does not score 10x a term appearing once. It also normalizes by field length: a term in a 10-word field scores higher than the same term in a 500-word field. BM25's saturation function prevents common words from dominating scores and handles long documents more fairly than naive TF-IDF.

5. What are primary shards and replica shards, and how do they differ in function?

Primary shards store the original data and handle both reads and writes. Replica shards are copies of primary shards, serve read requests for horizontal read scaling, and provide fault tolerance — if a primary fails, a replica is promoted automatically. Replicas never contain more data than their primary and are never allocated on the same node as their primary. Replicas do not improve write throughput; only primary shards handle writes.

6. What is the Elasticsearch analyzer pipeline and what are its three stages?

The analyzer pipeline has three stages: (1) Character filters — strip HTML tags, normalize Unicode, apply character mappings; (2) Tokenizer — splits text into individual tokens; (3) Token filters — lowercase tokens, remove stop words, apply synonyms, perform stemming. The output of all three stages is what gets stored in the inverted index. Choosing the right analyzer is critical — using the wrong one can make documents unfindable.

7. How does the translog provide durability in Elasticsearch?

The translog is a write-ahead log. Every indexed document is written to the translog before being acknowledged. On a crash, un-flushed documents are recovered by replaying the translog. This is why Elasticsearch guarantees durability even though it is not a true ACID database. Translog segments are truncated after a successful segment flush to disk. You can tune translog.sync_interval and translog.retention based on your crash-recovery SLA.

8. What is the difference between filter context and query context in Elasticsearch?

In filter context, a clause determines whether a document is included or excluded but does not affect the relevance score. Elasticsearch caches the resulting bitset for reuse — subsequent identical filters are near-instant. In query context, clauses affect the relevance score; every matching document is scored. Always use filter context for non-scoring criteria like date ranges, status fields, or numeric filters to leverage caching and skip scoring.

9. What is a Lucene segment merge policy and why does it matter?

Lucene writes new documents into small immutable segments. As segments accumulate, search performance degrades. A merge policy (e.g., TieredMergePolicy) periodically coalesces small segments into larger ones. Key parameters include max_merge_at_once (segments merged per round), segments_per_tier (small segments allowed before merge), and max_merged_segment_size_bytes (cap on merged segment size). Tuning these matters for write-heavy vs. read-heavy workloads — too aggressive merging hurts indexing throughput.

10. What is the recommended heap sizing for Elasticsearch nodes and why?

The sweet spot is 30–32GB heap per node. Below 30GB, Lucene uses heap memory for file system caches, which hurts I/O. Above 32GB, JVM pointer compression breaks down and Lucene may start using direct memory buffers outside the heap, making memory accounting less predictable. A typical production node: 64GB machine with 31GB heap, 1GB OS reserve, remaining RAM for the OS page cache (Lucene uses this heavily for file system caching). Always set -Xms equal to -Xmx to prevent heap resizing.

11. When would you use an index alias instead of a direct index name?

Index aliases enable zero-downtime reindexing. During a reindex from my-index-v1 to my-index-v2, you atomically swap the alias my-index from the old index to the new one. Applications continue querying my-index without knowing which version is active. Aliases also support blue-green deployments, traffic routing to specific shards via routing rules, and querying across multiple indices with a single name.

12. How do you prevent split-brain scenarios in an Elasticsearch cluster?

Split-brain occurs during network partitions when nodes cannot communicate but both sides still accept writes. Elasticsearch prevents this using the quorum formula: (primary_shards + replica_shards) / 2 + 1. Only nodes with a quorum can elect a master. Set discovery.zen.minimum_master_nodes to a value that ensures a quorum cannot exist on both sides of a partition. In modern Elasticsearch (7+), the default discovery avoids split-brain by requiring a majority for master election.

13. How does Elasticsearch handle sorting and what is the difference between `keyword` and `text` fields for sorting?

Text fields are analyzed — they break text into tokens (e.g., "Hello World" becomes ["hello", "world"]). You cannot sort or aggregate on analyzed text fields directly because the tokenized values lose the original order. Use keyword fields (or a multi-field with a keyword sub-field) for sorting, faceting, and aggregations. For sortable full-text, use a multi-field mapping with both text (for search) and keyword (for sorting). Example: "name": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }.

14. What is the difference between `term`, `match`, and `query_string` queries?

term queries look up an exact value in the inverted index — no analysis is applied. Use it for keyword fields, IDs, and exact matches. match queries analyze the query string and then look up each resulting token — this is what you want for full-text search on text fields. query_string parses a query syntax string (like Lucene's) supporting AND, OR, NOT, phrase queries, wildcards, and field-specific searches in a single string. query_string is powerful but risky — it can throw parse errors on malformed input and has security implications if user-supplied.

15. What are index templates and when would you use them?

Index templates define settings and mappings that are automatically applied when new indices are created matching a pattern. They are essential for time-based data (like logs) where you create a new index per day or month. A template can set the number of shards, replicas, refresh intervals, field mappings, and ILM policies. Use composable templates (7.8+) over legacy index templates for better control over priority and module merging. Templates are applied at index creation time — they never update existing indices.

16. How does Index Lifecycle Management (ILM) work and what are the typical phases?

ILM automates index management across its lifetime. Typical phases: Hot — active indexing and searching, high resources; Warm — read-only, reduced shards or force-merged; Cold — no new writes, searchable but with slower storage; Delete — remove old indices after retention period. ILM policies are attached to indices via index templates. Actions within phases include rollover (start new index when size/age threshold is hit), shrink (reduce primary shards), force merge, freeze, and delete.

17. What is the difference between refresh, flush, and force merge in Elasticsearch?

Refresh makes in-memory indexed data searchable. It creates a new Lucene segment from the index buffer. Default interval is 1 second — this is why Elasticsearch is near-real-time, not truly real-time. Flush persists buffered data to disk as a new Lucene segment and clears the translog. It is about durability, not searchability. Force merge reduces the number of Lucene segments by merging many small ones into fewer large ones. It improves search performance on read-heavy indices but is I/O intensive and should be scheduled during maintenance windows.

18. What is a rollover index and how does it relate to ILM?

A rollover index creates a new write index when the current one reaches a size, age, or doc-count threshold. Instead of appending to a single ever-growing index (which creates giant shards), you roll over to a fresh index. This is the core of ILM's hot phase — the rollover alias always points to the active write index while older indices transition through warm, cold, and delete phases. Example: PUT /my-index-000001 with alias my-index as write index; when it hits max_age: 7d, ILM creates my-index-000002 and switches the alias.

19. How does Elasticsearch handle nested documents and when would you need them?

Nested documents are stored as separate Lucene documents with a shared doc ID, preserving the relationship between objects. Without nested mapping, objects are flattened — a query matching two conditions might match different child objects in the same parent. Nested queries treat the array as independent documents and correctly enforce must-match-on-same-child logic. Downside: nested queries are significantly slower because they require a join-like step. If you frequently query by fields inside nested objects and exact object-level matching matters, use nested. Otherwise, consider denormalization (flattening) for better performance.

20. What are the circuit breakers in Elasticsearch and how do they prevent OOM?

Elasticsearch has multiple circuit breakers that abort requests that would consume too much heap memory, preventing OutOfMemoryError: parent circuit breaker — total heap across all breakers (defaults to 95% of heap); fielddata circuit breaker — estimated size to load fielddata into memory (default 40%); request circuit breaker — per-request memory allocations like buckets in aggregations (default 40%); inflight requests circuit breaker — total size of incoming requests (default 100%); accounting circuit breaker — memory held by segments not released on merge (default 100%). When a breaker trips, Elasticsearch returns a 429 status with a circuit_breaking_exception. Tuning these requires care — setting them too high defeats the purpose; too low causes spurious rejections.

Further Reading

Elasticsearch: The Definitive Guide — official comprehensive guide

The inverted index is what makes Elasticsearch fast. Queries that would scan a relational database for minutes return in milliseconds. Sharding and replicas give you horizontal scalability and fault tolerance.

Category

Related Posts

Apache Solr: Enterprise Search Platform

Explore Apache Solr's powerful search capabilities including faceted search, relevance tuning, indexing strategies, and how it compares to Elasticsearch.

#search #solr #apache

Search Scaling: Sharding, Routing, and Horizontal Growth

Learn how to scale search systems horizontally with index sharding strategies, query routing, replication patterns, and cluster management techniques.

#search #scaling #elasticsearch

Skip Lists: Layered Linked Lists for Fast Search

Understand skip lists as probabilistic alternatives to balanced trees, providing O(log n) search with simple implementation and lock-free variants.

#skip-lists #data-structures #probabilistic