DNS-Based Service Discovery: Kubernetes, Consul, and etcd

Learn how DNS-based service discovery works in microservices platforms like Kubernetes, Consul, and etcd, including DNS naming conventions and SRV records.

published: March 24, 2026 reading time: 21 min read author: GeekWorkBench updated: May 17, 2026

Quick Summary

DNS-based service discovery stretches traditional DNS for dynamic microservice environments using short TTLs (30 seconds or less), SRV records for port discovery, and multi-value responses that naturally load-balance. Kubernetes CoreDNS watches the API and auto-creates DNS records for services, while headless services return individual pod IPs for stateful workloads needing direct pod-to-pod communication. Consul speaks DNS across datacenters with prepared queries for geo-based routing, and DNS caching at application, OS, and proxy layers creates staleness windows that readiness probes and short TTLs mitigate.

Service discovery sits at the heart of any distributed system. Before a client can communicate with a service, it needs to find where that service lives on the network. DNS, the same protocol that translates domain names to IP addresses, has been stretched and adapted to solve this problem in modern microservices platforms.

This post covers how DNS-based service discovery works, the trade-offs involved, and how platforms like Kubernetes, Consul, and etcd each approach it.

How DNS Has Been Adapted for Service Discovery

Traditional DNS was designed for relatively static infrastructure. A server might change IP addresses once every few months, so TTLs (Time To Live) of hours or even days made sense. Microservices change constantly—pods get created and destroyed, containers scale up and down, services move between nodes.

DNS-based service discovery adapts the protocol in several ways:

Short TTLs: Service records expire quickly, often within 30 seconds or less, so clients pick up changes within seconds. The price is more frequent queries, so TTLs are tuned to balance freshness against load on the DNS infrastructure.

Dynamic Registration: Services register themselves (or are registered by an agent) as they come online. When a service instance fails or is replaced, its DNS record is removed automatically.

SRV Records: Standard DNS A records map a name to an IP address, but services run on different ports. SRV records store both the target host and the port number, so DNS alone can describe a complete endpoint.

Multi-value Responses: A single DNS query can return multiple IP addresses. Load balancing becomes a matter of rotating through these values.
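As a rough sketch of "rotating through these values", the Go program below resolves a name with the standard library's net.DefaultResolver.LookupHost, which returns every address in the answer, and hands the addresses out in round-robin order. RoundRobin and its methods are hypothetical helpers written for illustration, not part of any discovery library, and "localhost" stands in for a real service name so the example runs anywhere.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sync"
)

// RoundRobin is a hypothetical helper that hands out the addresses from a
// multi-value DNS answer in rotation. It is safe for concurrent use.
type RoundRobin struct {
	mu    sync.Mutex
	addrs []string
	next  int
}

// Update replaces the address set, for example after re-resolving the name.
func (r *RoundRobin) Update(addrs []string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.addrs = append([]string(nil), addrs...)
	r.next = 0
}

// Pick returns the next address in rotation, or false if none are known.
func (r *RoundRobin) Pick() (string, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.addrs) == 0 {
		return "", false
	}
	a := r.addrs[r.next]
	r.next = (r.next + 1) % len(r.addrs)
	return a, true
}

func main() {
	// LookupHost returns every address in the answer. In a cluster the name
	// might be a headless service; "localhost" keeps the example runnable.
	addrs, err := net.DefaultResolver.LookupHost(context.Background(), "localhost")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	var rr RoundRobin
	rr.Update(addrs)
	for i := 0; i < 3; i++ {
		a, _ := rr.Pick()
		fmt.Println("send request to", a)
	}
}
```

Calling Update again after each re-resolution keeps the rotation in step with the current DNS answer.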

graph TD
    Client[Client Application] -->|Queries| DNS[DNS Server]
    DNS -->|Returns A/SRV records| Client

    subgraph "Service Instances"
        S1[Service-A:8080]
        S2[Service-A:8080]
        S3[Service-B:3000]
    end

    Registry[Service Registry] -->|Watches for changes| DNS
    S1 -->|Registers| Registry
    S2 -->|Registers| Registry
    S3 -->|Registers| Registry

    S1 -.->|Health check fails| Registry
    Registry -.->|Removes record| DNS

The diagram shows the basic pattern. Services register with a central registry. The registry pushes updates to DNS. Clients query DNS to discover endpoints. When health checks fail, records disappear.

Kubernetes DNS

Kubernetes operates its own internal DNS service for pod and service discovery. Understanding how this works helps you design better service communication patterns.

kube-dns and CoreDNS

Early Kubernetes versions shipped with kube-dns, which bundled SkyDNS. Modern clusters run CoreDNS instead, a modular DNS server written in Go that reached general availability in Kubernetes 1.11 and became the default cluster DNS in 1.13.

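CoreDNS runs as a Deployment in kube-system, usually with two replicas for high availability, behind a Service that is still named kube-dns for compatibility. Its kubernetes plugin watches the API server for Service and endpoint changes and answers queries from an in-memory view of those objects, so new records appear without any zone reload.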

The CoreDNS configuration lives in a ConfigMap fittingly named coredns. The default setup looks like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }

Service DNS Naming Conventions

Kubernetes services get DNS names that follow a predictable pattern:

<service-name>.<namespace>.svc.<cluster-domain>

So a service named “api-gateway” in the “production” namespace becomes:

api-gateway.production.svc.cluster.local

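Same namespace? You can usually use just the service name. Different namespace? You need at least <service-name>.<namespace>; the pod's DNS search list fills in the rest, and the fully qualified name always works.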
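The helpers below encode that naming pattern, plus the SRV names Kubernetes publishes for each named service port (_<port-name>._<protocol>.<service>.<namespace>.svc.<cluster-domain>). ServiceFQDN and ServiceSRVName are hypothetical function names, and the example assumes the default cluster.local domain.

```go
package main

import (
	"fmt"
	"strings"
)

// ServiceFQDN builds <service>.<namespace>.svc.<clusterDomain>.
func ServiceFQDN(service, namespace, clusterDomain string) string {
	return fmt.Sprintf("%s.%s.svc.%s", service, namespace, clusterDomain)
}

// ServiceSRVName builds the SRV name for a named port:
// _<port-name>._<protocol>.<service>.<namespace>.svc.<clusterDomain>.
func ServiceSRVName(portName, protocol, service, namespace, clusterDomain string) string {
	return fmt.Sprintf("_%s._%s.%s", portName, strings.ToLower(protocol),
		ServiceFQDN(service, namespace, clusterDomain))
}

func main() {
	fmt.Println(ServiceFQDN("api-gateway", "production", "cluster.local"))
	fmt.Println(ServiceSRVName("http", "TCP", "api-gateway", "production", "cluster.local"))
}
```

Running it prints api-gateway.production.svc.cluster.local and _http._tcp.api-gateway.production.svc.cluster.local.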

Headless services behave differently. When you set clusterIP: None, CoreDNS skips the VIP entirely and returns the IPs of backing pods directly:

apiVersion: v1
kind: Service
metadata:
  name: stateful-service
spec:
  clusterIP: None # This makes it headless
  selector:
    app: stateful-app
  ports:
    - port: 8080
      targetPort: http

With a headless service, DNS returns individual pod IPs. Your application handles load balancing—which is exactly what you want for stateful services where clients need to reach specific pods directly.

Consul DNS Interface

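HashiCorp Consul takes a more traditional approach to service discovery. It runs a client agent on every node and a small set of server agents that store the service catalog, replicated with Raft. Services register with their local agent, which syncs them to the servers, while a gossip protocol handles cluster membership and failure detection.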

The Consul DNS interface exposes the catalog through standard DNS queries, by default on port 8600 of each agent. You do not need the HTTP API or a client library; familiar DNS tools work:

# Query for web service instances
dig @127.0.0.1 -p 8600 web.service.consul SRV

# Get just the IP addresses
dig @127.0.0.1 -p 8600 web.service.consul

Consul uses the .consul domain by default. Queries for web.service.consul return A records with the IP addresses of all healthy service instances.

DNS SRV Records for Port Discovery

SRV records become essential when services run on non-standard ports. Imagine a service catalog where different teams run their own instances on arbitrary ports. Clients do not hardcode port numbers; they discover them through DNS.

A Consul SRV response looks something like this:

;; ANSWER SECTION:
api.service.consul.    0   IN  SRV 1 1 8080 node1.node.dc1.consul.
api.service.consul.    0   IN  SRV 1 1 8081 node2.node.dc1.consul.

;; ADDITIONAL SECTION:
node1.node.dc1.consul.  0   IN  A    10.0.1.10
node2.node.dc1.consul.  0   IN  A    10.0.1.11

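The SRV records tell you that two instances exist, on ports 8080 and 8081 respectively, and the additional section supplies the IP address of each node. The TTL of 0 is Consul's default for service lookups, which tells resolvers not to cache the answers at all.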

Prepared Queries

Consul supports prepared queries—saved query templates on the server side. These enable advanced patterns like geo-based routing:

{
  "Name": "geo-routing",
  "Service": {
    "Service": "api-fleet",
    "OnlyPassing": true,
    "Near": "_agent",
    "ServiceMeta": {
      "version": "v2"
    },
    "Failover": {
      "NearestN": 2
    }
  },
  "DNS": {
    "TTL": "10s"
  }
}

Clients then query geo-routing.query.consul and Consul returns instances based on the query definition.

etcd for Service Registration

etcd is the persistent store behind Kubernetes and many other distributed systems. It is not a DNS server, but it often sits underneath service registries that expose DNS interfaces.

Services store endpoint information in etcd’s hierarchical key-value space:

/services/api/10.0.1.10:8080
/services/api/10.0.1.11:8080

A separate component, such as a custom controller, watches these paths and updates DNS records when values change, which keeps storage separate from DNS serving. CoreDNS's etcd plugin is a ready-made variant of this pattern: it serves records directly from SkyDNS-style keys stored in etcd.

The advantage of etcd is its consistency story. As a Raft-based consensus system, it keeps registration data strongly consistent through network partitions: the side of a partition that holds a quorum keeps accepting writes, while the minority side rejects them instead of diverging.

etcd watches stream change events to clients as soon as they are committed. With the official Go client (go.etcd.io/etcd/client/v3), a prefix watch looks like this:

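// imports: "log", clientv3 "go.etcd.io/etcd/client/v3"; cli is a *clientv3.Client
watchCh := cli.Watch(ctx, "/services/", clientv3.WithPrefix())
for resp := range watchCh {
    for _, ev := range resp.Events {
        switch ev.Type {
        case clientv3.EventTypePut: // instance registered or updated
            log.Printf("register %s", ev.Kv.Key)
        case clientv3.EventTypeDelete: // instance deregistered
            log.Printf("deregister %s", ev.Kv.Key)
        }
    }
}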

This reactive model works well for keeping DNS records current.
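To show what processing those events might look like, here is a dependency-free sketch that folds put and delete events into an in-memory endpoint set that a DNS-serving component could then publish. It assumes the /services/<service>/<host:port> key layout shown above; EventType, Registry, and Apply are hypothetical names, and a real watcher would map clientv3.EventTypePut and clientv3.EventTypeDelete onto them.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// EventType mirrors the two kinds of etcd watch events.
type EventType int

const (
	EventPut EventType = iota
	EventDelete
)

// Registry is a hypothetical in-memory view built from watch events.
type Registry struct {
	endpoints map[string]map[string]struct{} // service -> set of host:port
}

func NewRegistry() *Registry {
	return &Registry{endpoints: make(map[string]map[string]struct{})}
}

// Apply updates the view for one event; keys outside the layout are ignored.
func (r *Registry) Apply(t EventType, key string) {
	const prefix = "/services/"
	if !strings.HasPrefix(key, prefix) {
		return
	}
	parts := strings.Split(strings.TrimPrefix(key, prefix), "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return // not a /services/<service>/<host:port> key
	}
	svc, ep := parts[0], parts[1]
	switch t {
	case EventPut:
		if r.endpoints[svc] == nil {
			r.endpoints[svc] = make(map[string]struct{})
		}
		r.endpoints[svc][ep] = struct{}{}
	case EventDelete:
		delete(r.endpoints[svc], ep)
		if len(r.endpoints[svc]) == 0 {
			delete(r.endpoints, svc) // last instance gone: drop the name
		}
	}
}

// Endpoints returns the sorted endpoints a DNS layer would publish for svc.
func (r *Registry) Endpoints(svc string) []string {
	out := make([]string, 0, len(r.endpoints[svc]))
	for ep := range r.endpoints[svc] {
		out = append(out, ep)
	}
	sort.Strings(out)
	return out
}

func main() {
	reg := NewRegistry()
	reg.Apply(EventPut, "/services/api/10.0.1.10:8080")
	reg.Apply(EventPut, "/services/api/10.0.1.11:8080")
	reg.Apply(EventDelete, "/services/api/10.0.1.10:8080")
	fmt.Println(reg.Endpoints("api")) // [10.0.1.11:8080]
}
```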

DNS Caching Challenges and TTL Considerations

Caching happens at multiple layers in the DNS resolution path. Each layer brings its own headaches for service discovery.

Application-Level Caching: Applications cache DNS lookups to avoid repeated queries. If your cached entry lives for 5 minutes but the service moved 1 minute ago, you are sending traffic to a dead address.

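Operating System Caching: Whether the OS caches depends on the platform. glibc on Linux does not cache by itself, but local caches such as nscd and systemd-resolved do, as do the macOS and Windows resolvers, and they generally respect the record's TTL. The default Corefile that kubeadm installs sets a 30-second TTL on Kubernetes records.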

Load Balancer / Proxy Caching: If your service sits behind a proxy or load balancer, that component may cache DNS independently. Your 30-second TTL means nothing if the proxy cached the entry for 5 minutes.

The result: a window where traffic flows to addresses that no longer exist. Mitigation strategies include:

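Readiness Probes: A pod that fails its readiness probe (after the configured failure threshold) is removed from the Service's endpoints. For a ClusterIP service this works regardless of DNS caching, because clients resolve the stable virtual IP and kube-proxy simply stops routing to that pod; for headless services, clients may still hold the pod IP until their cache expires.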
Connection Draining: Allow existing connections to complete while routing new traffic only to healthy instances.
Client-Side Re-resolution: Some clients re-resolve DNS periodically or on connection errors, rather than relying solely on cached entries.
Very Short TTLs: Some deployments use TTLs under 10 seconds, accepting the increased query load in exchange for faster convergence.

Headless Services in Kubernetes

Headless services change DNS semantics in ways that matter. When you set clusterIP: None, CoreDNS returns pod IPs directly instead of a service VIP.

This shows up in a few scenarios:

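StatefulSets: Database clusters like MongoDB or Cassandra need pod-to-pod communication where clients connect to specific instances. A StatefulSet's governing headless service gives each pod a stable DNS name of the form <pod-name>.<service>.<namespace>.svc.cluster.local (for example mongo-0.mongo.default.svc.cluster.local), in addition to resolving the service name to the individual pod IPs.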

Custom Load Balancing: Some applications implement their own load balancing. They need to see all available pods and make their own routing decisions.

Service Mesh: With a service mesh like Istio, sidecar proxies often handle load balancing. They may need direct pod IPs for proper traffic management.

The trade-off is that your application takes on complexity the service proxy would normally handle. Without a VIP, failed pods mean failed connections unless your client implements retry logic.

Link Local DNS vs Global DNS

Service discovery DNS typically stays local to your infrastructure. These addresses do not resolve on the public internet and do not need delegation to global DNS servers.

Private DNS (sometimes loosely called link-local DNS, although that term strictly refers to multicast DNS on a single network segment) operates within a bounded environment:

Kubernetes cluster DNS lives in cluster.local (or a custom domain)
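Consul DNS lives under .consul, with datacenter-qualified names such as web.service.dc1.consul
AWS VPC private DNS assigns instance hostnames under ec2.internal (us-east-1) or <region>.compute.internal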

These names are answered only by internal resolvers, so api.service.consul inside your network never collides with names on the public internet.

When you need external access, you expose services through an ingress or gateway that bridges internal and external DNS. The external name points to a load balancer or reverse proxy that forwards traffic into your internal network.

Global DNS matters when you need geographic distribution of service discovery. A service registered in one datacenter should be discoverable from another. Consul, for instance, keeps a separate catalog per datacenter and forwards queries for another datacenter (such as web.service.dc2.consul) to that datacenter's servers over the WAN, while prepared queries can fail over to remote datacenters automatically.

Connecting the Patterns

DNS-based service discovery works well in many scenarios, but it has limits. When you need:

Strong consistency: DNS caching means some clients may see stale data. For leader election or configuration changes, you need a more consistent store.
Rich metadata: DNS records carry limited information. When you need health check details, latency metrics, or custom attributes, a service registry with an API works better.
Fine-grained routing: DNS operates at host/port level. When you need header-based routing, traffic splitting, or canary deployments, your service mesh or API gateway provides more control.

Most production systems use DNS for basic discovery, the service mesh for traffic management, and a service registry API for operational tooling.

Topic Deep Dive: CoreDNS Configuration for Service Discovery

CoreDNS is the default DNS server for Kubernetes clusters. Understanding its configuration helps you debug discovery issues and implement custom behaviors.

CoreDNS Architecture

CoreDNS works as a plugin-based DNS server. Each plugin handles a specific function—kubernetes plugin for K8s discovery, forward plugin for upstream DNS, cache plugin for caching responses.

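# Default-style Corefile for Kubernetes (annotated)
.:53 {
    errors                         # Log errors to stdout
    health                         # Liveness endpoint on :8080/health
    ready                          # Readiness endpoint on :8181/ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure              # Answer pod A records without checking the pod exists
        fallthrough in-addr.arpa ip6.arpa  # Pass unmatched reverse lookups to later plugins
        ttl 30                     # TTL on answers (plugin default is 5 seconds)
    }
    prometheus :9153               # Metrics endpoint
    forward . /etc/resolv.conf     # Send external queries to the node's upstream resolvers
    cache 30                       # Cache responses for at most 30 seconds
    loop                           # Detect forwarding loops and halt
    reload                         # Reload the Corefile when the ConfigMap changes
    loadbalance                    # Randomize the order of A/AAAA records in answers
}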

Custom DNS Records with CoreDNS

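For custom service discovery needs, you can add server blocks of your own: forward a particular domain to a chosen resolver, or serve your own records from a zone file with the file plugin: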

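# Forward one domain to a specific upstream resolver (a stub domain)
api.example.com {
    forward . 8.8.8.8
}

# Serve custom records for a zone from a zone file
service-discovery.local {
    file /var/lib/coredns/custom.db {
        reload 10s   # Check the zone file for a new SOA serial every 10 seconds
    }
}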

DNS TTL Considerations

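Two settings control how quickly clients see updates: the kubernetes plugin's ttl option sets the TTL on answers (5 seconds if unset; kubeadm's default Corefile sets 30), and the cache plugin caps how long CoreDNS itself keeps responses.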

TTL Setting	Use Case	Trade-off
30 seconds (kubeadm default)	Most Kubernetes workloads	Balance between freshness and query load
10 seconds or less	Services that scale frequently	Higher DNS query load
5 minutes or more	Stable services, reduced query load	Slower convergence on changes

For headless services with frequent pod changes, lower TTLs ensure faster convergence. For stable services, higher TTLs reduce infrastructure load.
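A rough way to reason about the query-load column, assuming every client re-resolves each name it uses exactly once per TTL (real clients may resolve less often if they cache longer, or more often because of search-list expansion). The cluster size (2,000 pods) and names per pod (5) are illustrative assumptions:

```latex
\text{QPS} \approx \frac{N_{\text{clients}} \cdot N_{\text{names}}}{\text{TTL}}
\qquad
\frac{2000 \cdot 5}{30\,\text{s}} \approx 333\ \text{queries/s}
\qquad
\frac{2000 \cdot 5}{5\,\text{s}} = 2000\ \text{queries/s}
```

Load scales inversely with TTL, so cutting it from 30 to 5 seconds multiplies steady-state query volume by six.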

Real-world Failure Scenarios

Scenario	What Happens	Root Cause	Mitigation
DNS cache staleness	Traffic routes to terminated pod	Cached DNS entry not yet expired	Use readiness probes to remove from endpoints; set short TTLs
CoreDNS OOM kill	New pods cannot discover existing services	CoreDNS memory limits too low	Increase CoreDNS resource limits
High ndots (Kubernetes default ndots:5)	Extra DNS queries for every external lookup	Names with fewer than five dots are tried against each search domain before the absolute name	Lower ndots via dnsConfig.options in the pod spec, or use FQDNs with a trailing dot
Headless service with no endpoints	DNS returns no addresses	No ready pods yet, or the selector matches no pods	Check pod readiness; verify the selector matches pod labels
Cross-namespace lookup	Query fails	Missing or wrong namespace in DNS name	Use service.namespace or the FQDN service.namespace.svc.cluster.local
CoreDNS crash loop	All service discovery fails cluster-wide	ConfigMap error or plugin failure	Validate Corefile syntax; check CoreDNS logs

Trade-off Comparison: DNS-Based Discovery Platforms

Feature	Kubernetes CoreDNS	Consul DNS	etcd + DNS Bridge	AWS Route 53
Integration	Native K8s	Multi-platform	Platform-agnostic	Cloud-native
SRV Record Support	Yes	Yes	Via controller	Yes
Health-aware routing	Via readiness probes	Built-in	Custom implementation	Via health checks
Multi-datacenter	No (single cluster)	Yes, via gossip	Via federation	Yes, via latency routing
TTL flexibility	Yes	Yes	Configurable	Per-record
Direct instance IPs	Via headless services	Default (returns instance IPs)	Custom	Via multivalue answer routing
Operational complexity	Low	Medium	Medium	Low

Quick Recap Checklist

DNS-based service discovery adapts traditional DNS with short TTLs for rapid service instance changes
Kubernetes CoreDNS integrates with the Kubernetes API to automatically create DNS records for services and headless services
Consul provides a DNS interface with SRV records for port discovery and multi-datacenter support
DNS caching at multiple layers creates staleness windows; use readiness probes and short TTLs to mitigate
Headless services return individual pod IPs for custom client-side load balancing
For advanced routing (canary deployments, traffic splitting), combine DNS discovery with service mesh or API gateway

Interview Questions

1. How does CoreDNS differ from traditional DNS servers like BIND in the context of service discovery?

Traditional DNS servers like BIND serve static zone data that changes slowly. CoreDNS is designed for dynamic environments where records change constantly as pods scale up and down. CoreDNS watches the Kubernetes API and updates DNS records automatically as services and endpoints change.

CoreDNS uses a plugin architecture where each plugin handles specific DNS functionality. The kubernetes plugin translates K8s services and endpoints into DNS records. When you create or delete a service, CoreDNS updates its zone data without manual intervention.

2. What are SRV records and why are they important for DNS-based service discovery?

SRV records map a service name to a hostname and port. Unlike A records that only map names to IP addresses, SRV records include the port number, allowing complete endpoint discovery from DNS alone.

The format: _service._proto.name. TTL IN SRV priority weight port target

SRV records let clients discover both the IP address and port number of a service without hardcoding port numbers or making separate discovery calls.

3. What is the difference between headless services and clusterIP services in Kubernetes DNS?

A clusterIP service creates a virtual IP (VIP) that load balances to backing pods. DNS for a clusterIP service returns the VIP address.

A headless service (clusterIP: None) skips the VIP. DNS returns the individual pod IP addresses directly, allowing clients to do their own load balancing or connect to specific pods directly.

Headless services suit stateful applications where clients need specific instance connections—MongoDB replicas, Cassandra nodes, or custom load balancing.

4. How does DNS caching at multiple layers affect service discovery reliability?

DNS resolution caches at multiple layers: application-level, OS-level, and potentially load balancer or proxy caches. Each layer has its own TTL, creating staleness possibilities.

If Kubernetes DNS returns a record with 30-second TTL but the OS caches it for longer, or your application caches for 5 minutes, updates take time to propagate.

Mitigation: use short TTLs, configure OS-level cache appropriately, implement connection-level health checking that verifies actual connectivity.

5. How does Consul's DNS interface work and what advantages does it provide over HTTP-based service discovery?

Consul exposes service discovery through standard DNS queries on port 8600. You query for service names with the Consul domain suffix, and Consul returns A records for service IPs and SRV records for service IPs plus ports.

Advantages: DNS is universally supported, no client library needed, works at the network level.

Limitations: DNS has limited record types for rich metadata. HTTP APIs can return health status and custom attributes.

6. What is the ndots problem in Kubernetes DNS and how does it affect service discovery performance?

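The ndots option in /etc/resolv.conf is a threshold: a name containing fewer than ndots dots is tried against every search domain before it is tried as an absolute name. Kubernetes sets ndots:5, so a lookup for api.example.com (two dots) first tries api.example.com.<namespace>.svc.cluster.local, then the other search domains, and only then the real name.

This means every external lookup generates several failed queries (doubled when the resolver asks for both A and AAAA records), adding latency and load to CoreDNS. Short in-cluster names are less affected, because they usually resolve on the first search domain.

Fix: lower ndots through the pod's dnsConfig.options, or use fully qualified domain names with a trailing dot (api.example.com.) in application code.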
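The ordering rule is easier to see in code. The sketch below models glibc-style search-list logic and ignores details such as parallel A/AAAA queries and retry options; candidateNames is a hypothetical helper, and the search list assumes a pod in the default namespace with the standard cluster.local domain.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames returns the names a glibc-style stub resolver tries, in
// order, for a lookup of name given the search list and the ndots option.
func candidateNames(name string, search []string, ndots int) []string {
	if strings.HasSuffix(name, ".") {
		return []string{name} // trailing dot: absolute name, no search list
	}
	var searched []string
	for _, domain := range search {
		searched = append(searched, name+"."+strings.TrimSuffix(domain, ".")+".")
	}
	absolute := name + "."
	if strings.Count(name, ".") >= ndots {
		return append([]string{absolute}, searched...) // absolute name first
	}
	return append(searched, absolute) // search domains first
}

func main() {
	search := []string{"default.svc.cluster.local", "svc.cluster.local", "cluster.local"}
	for _, q := range candidateNames("api.example.com", search, 5) {
		fmt.Println(q)
	}
}
```

With ndots:5 this prints three search-domain candidates before api.example.com., and the resolver stops at the first name that answers. Lowering ndots to 2, or adding a trailing dot, puts the real name first.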

7. How do you handle DNS-based service discovery when running services both inside and outside Kubernetes?

You need an external service registry like Consul that bridges both environments. Consul agents run on Kubernetes and on VM infrastructure, and they gossip across the cluster boundary.

Services inside K8s register with their local Consul agent, and services outside register with theirs. All agents join the same gossip pool and sync their registrations to the same Consul servers, which hold a single service catalog that every agent can query.

8. What is the role of etcd in DNS-based service discovery architectures?

etcd is a distributed key-value store used by Kubernetes to store cluster state. Kubernetes discovery components do not read it directly: CoreDNS and other controllers watch the API server, which persists its objects in etcd. Outside Kubernetes, etcd can back DNS directly, for example through CoreDNS's etcd plugin.

External-dns is a controller that watches Service and Ingress resources through the API server and updates DNS records in providers like Route 53 accordingly.

9. How does Kubernetes handle DNS for services across multiple namespaces?

Kubernetes DNS uses a hierarchical structure: api-service.production.svc.cluster.local

Within the same namespace, you can use just the service name. From a different namespace, you need at least service.namespace, which the pod's search domains expand; the fully qualified name always works.

10. What are the limitations of DNS-based service discovery compared to API-based discovery?

DNS records have limited metadata: only IPs, ports, and basic priority/weight. Apart from what a registry encodes into query names (Consul tag lookups such as v2.web.service.consul, for example), you cannot ask DNS for instances by version or custom attributes.

DNS updates have eventual consistency. For immediate consistency requirements, DNS is not suitable.

DNS does not support header-based routing, traffic splitting, or canary deployments. These require service meshes or API gateways.

11. What happens to DNS records when a Kubernetes pod experiences a graceful termination?

When a pod is deleted, the API server marks it as terminating and the endpoints controller removes it from the Service's ready endpoints. CoreDNS watches these changes and stops returning the pod IP in DNS answers.

The timing depends on both endpoint propagation and the DNS TTL. For a headless service with a 30-second TTL, clients may keep using the stale pod IP for up to 30 seconds after removal; ClusterIP lookups are unaffected because they return the stable VIP.

Readiness probes cover pods that become unhealthy while running. For graceful termination, the usual mitigation is a preStop hook that sleeps for a few seconds, so the pod keeps serving while endpoint removal and cache expiry propagate.

12. How does the gossip protocol in Consul work and why is it used for service discovery?

Consul uses HashiCorp's Serf library for its gossip protocol, which disseminates information across the cluster through a peer-to-peer model.

Each Consul agent maintains a member list and periodically exchanges messages with random nodes. Membership changes and node failures propagate across the cluster through this indirect diffusion, while service registrations and health check results are sent to the Consul servers over RPC.

Gossip provides scalable membership and failure detection without every agent heartbeating to a central server. If one node fails, the others detect it and spread the news, and the servers then mark that node's services unhealthy so they drop out of DNS answers.

13. Explain the difference between SRV records and A records in the context of service discovery.

A records map a hostname to an IPv4 address only. They do not carry port information.

SRV records provide more complete service endpoint information: they specify the target hostname, port number, priority, and weight for a service.

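In Kubernetes, a ClusterIP service's A record returns the cluster IP (VIP), and its SRV records (one per named port) point at the service name itself, so they also resolve to the VIP. For a headless service, the A records are the pod IPs and the SRV records point at per-pod hostnames along with their port numbers.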

14. What is external-dns and how does it integrate with DNS-based service discovery?

External-dns is a Kubernetes controller that synchronizes exposed Services and Ingresses with external DNS providers like Route 53, Cloudflare, or Google Cloud DNS.

It watches Kubernetes resources and creates DNS records in the cloud provider when services are created or modified.

This bridges internal Kubernetes DNS with external DNS, enabling external clients to discover services running inside the cluster.

15. How do you debug DNS resolution issues in a Kubernetes cluster?

First, verify CoreDNS is running: kubectl get pods -n kube-system -l k8s-app=kube-dns

Test DNS resolution from within a pod: kubectl exec -it test-pod -- nslookup kubernetes.default

Check CoreDNS logs: kubectl logs -n kube-system -l k8s-app=kube-dns

Verify the coredns ConfigMap is valid and check /etc/resolv.conf in pods for correct search domain configuration.

16. What are the security implications of DNS-based service discovery?

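DNS cache poisoning or spoofing can redirect service traffic to malicious endpoints. DNSSEC validates responses but is rarely deployed inside clusters, so mutual TLS, which lets each side verify the other's identity regardless of what DNS returned, is the more common defense.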

In multi-tenant clusters, any pod can resolve services in every namespace by default, so DNS reveals service names and IPs across tenants. Network policies restrict traffic, not DNS answers.

Headless services expose individual pod IPs, which may be undesirable in security-sensitive environments. Use network policies to restrict pod-to-pod communication.

17. How does Consul handle network partitions in DNS-based service discovery?

Consul uses the Raft consensus protocol for consistent state replication. During a network partition, minority nodes cannot elect a leader and stop processing writes.

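Consul DNS allows stale reads by default (the dns_config.allow_stale setting), so any server, including one on the minority side, can keep answering DNS queries from its local copy of the catalog, which may be out of date. Writes such as new registrations succeed only on the majority side.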

Consul's built-in health checking continues evaluating node and service health, removing failed instances from DNS responses.

18. What is the role of the endpoints object in Kubernetes DNS?

The Endpoints object in Kubernetes contains a list of IP addresses and ports for all pods that back a Service.

CoreDNS watches Endpoints (EndpointSlices in current versions) and creates DNS records corresponding to the current pod IPs in the endpoint list.

When a pod is added or removed (through readiness probes, termination, or scaling), the endpoints object updates, triggering CoreDNS to update DNS records accordingly.

19. How does DNS-based service discovery interact with service meshes like Istio?

Service meshes like Istio layer their own discovery on top of DNS. The application typically still resolves the service name (Istio can optionally answer these queries from the sidecar's DNS proxy), but the sidecar intercepts the resulting connection and picks the endpoint itself, using endpoint data pushed by the control plane and the mesh's routing policies.

Istio can route traffic based on headers or version labels and apply circuit breaking, capabilities that DNS-based discovery does not provide.

The application sees a stable local endpoint through the sidecar, while the sidecar handles actual endpoint selection and health checking.

20. What are the performance implications of using very short DNS TTLs in service discovery?

Very short TTLs (under 10 seconds) mean clients re-resolve DNS more frequently, increasing query load on the DNS infrastructure.

This can strain CoreDNS or Consul agents, especially at scale with thousands of services and frequent churn.

The trade-off is faster convergence when services change. For stable services, longer TTLs (30-60 seconds) reduce infrastructure load with minimal impact on discovery accuracy.

Conclusion

DNS-based service discovery provides a simple, universal mechanism for service location in microservices architectures. By adapting traditional DNS with short TTLs and dynamic registration, services can discover each other without hardcoded addresses. Kubernetes CoreDNS and Consul are the dominant solutions, each with distinct approaches suited to different deployment models. While DNS discovery handles basic location well, advanced traffic management requires additional layers like service mesh or API gateways.
