Sockets & Network IPC

Learn about Unix domain sockets, TCP/UDP sockets for local and network IPC, socket pairs, and advanced socket options for high-performance inter-process communication.

published: May 19, 2026 reading time: 36 min read author: GeekWorkBench

If pipes and message queues are the local delivery trucks of the IPC world, sockets are the postal service — they can deliver data not just between processes on the same machine, but across the network to any reachable host. Sockets are the most versatile and widely-used form of IPC on Unix systems, and understanding them is essential for every systems programmer. Whether you are building a web server, a database client, a microservice communication layer, or a local daemon, sockets are the foundational building block.

Introduction

A socket is a bidirectional communication endpoint. Unlike anonymous pipes, which are unidirectional, a single socket carries data in both directions. Sockets can be connection-oriented (stream) or connectionless (datagram), and local (Unix domain) or network-based (TCP/IP, UDP/IP).

There are two main families of sockets:

Unix domain sockets (AF_UNIX / AF_LOCAL): Use filesystem paths as addresses. Data never leaves the kernel. Used for local inter-process communication with stream (TCP-like) or datagram (UDP-like) semantics. Usually faster than loopback TCP because the data skips the network stack, although shared memory still avoids the copy through the kernel.

Internet domain sockets (AF_INET / AF_INET6): Use IP addresses and port numbers. Data flows through the full network stack. Used for network communication between processes on different hosts.

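Within each family, there are two main socket types:

SOCK_STREAM: Connection-oriented, reliable, byte-stream, no message boundaries. Similar to a pipe but bidirectional. In the internet families it is implemented by TCP.

SOCK_DGRAM: Connectionless, message-oriented with preserved boundaries. Each send delivers a discrete message. In the internet families it is implemented by UDP and is unreliable; Unix domain datagrams stay inside the kernel and are not lost in transit.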
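
The boundary difference can be observed directly with a socket pair. The following is a minimal sketch, not a definitive test: `first_recv_size` is a hypothetical helper that sends two 5-byte messages and reports what the first recv() returns. With SOCK_DGRAM the result is one message (5 bytes); with SOCK_STREAM the kernel is free to merge them, so the result is typically 10 but any split is allowed.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send two 5-byte messages over a socket pair of the given type and
 * return how many bytes the first recv() hands back (-1 on error). */
ssize_t first_recv_size(int type) {
    int sv[2];
    char buf[64];

    if (socketpair(AF_UNIX, type, 0, sv) == -1)
        return -1;

    /* Both sends complete before the peer reads anything. */
    if (send(sv[0], "hello", 5, 0) != 5 || send(sv[0], "world", 5, 0) != 5) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    ssize_t n = recv(sv[1], buf, sizeof(buf), 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Calling `first_recv_size(SOCK_DGRAM)` and `first_recv_size(SOCK_STREAM)` side by side shows why stream protocols need explicit framing.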

The socket API was originally developed for BSD Unix and standardized in POSIX. It consists of socket(), bind(), listen(), accept(), connect(), send(), recv(), close(), and related functions.

When to Use / When Not to Use

Use Unix domain sockets when:

You need IPC between processes on the same machine with TCP-like semantics
You need bidirectional communication
You want a simpler alternative to shared memory (with built-in synchronization at the kernel level)
You need to use select()/poll()/epoll for multiplexing multiple connections
You need a connection-oriented channel with backpressure (the sender blocks once the receiver's buffer is full)

Use TCP sockets when:

You need network communication between different machines
You need reliable, ordered, connection-oriented delivery
You need to handle many concurrent connections efficiently

Use UDP sockets when:

You need low-latency communication and can tolerate some packet loss
You are building systems that handle brief disconnections gracefully
You are doing broadcast or multicast communication

Do not use sockets when:

You need maximum throughput for local communication (shared memory may be faster)
You need simple unidirectional streaming (pipes are simpler)
You need message queue semantics with priorities (message queues fit better)
You are communicating between threads in the same process (use condition variables or channels)

Architecture or Flow Diagram

Server Socket Setup

Server side socket setup starts by creating a file descriptor, binding it to a filesystem path or port, then marking it ready to accept connections. The diagram below shows each kernel call and what it returns.

sequenceDiagram
    participant S as Server
    participant K as Kernel (Socket Layer)

    S->>K: socket(AF_UNIX, SOCK_STREAM, 0)
    K-->>S: fd = socketfd

    S->>K: bind(socketfd, "/tmp/mysock")
    S->>K: listen(socketfd, backlog=5)

Client Connection & Accept

Client side is simpler: create a socket, then call connect to reach the server’s address. For TCP the kernel completes the three-way handshake; Unix domain sockets have no handshake on the wire. Either way, the kernel places the finished connection in the server’s accept queue.

sequenceDiagram
    participant S as Server
    participant K as Kernel (Socket Layer)
    participant C as Client

    Note over S,C: Stream Server-Client Flow (Unix Domain or TCP)

    C->>K: socket(AF_UNIX, SOCK_STREAM, 0)
    C->>K: connect(socketfd, "/tmp/mysock")

    K-->>S: Notification: new connection
    S->>K: accept(socketfd)
    K-->>S: connfd = new connection socket

    Note right of K: Kernel maintains accept queue

Data Exchange

Once connected, both sides exchange data through send and recv. The kernel buffers data on each end, so writes go into the send buffer and reads pull from the receive buffer. Neither side has to wait for the other unless a buffer fills up (send blocks) or is empty (recv blocks).

sequenceDiagram
    participant S as Server
    participant K as Kernel (Socket Layer)
    participant C as Client

    C->>K: send(data)
    K-->>C: 17 bytes accepted

    K-->>S: data available on connfd
    S->>K: recv(connfd, buf, 1024)
    K-->>S: "Hello from client"

    S->>K: send(connfd, "Hi back", 8)
    K-->>C: "Hi back"

    Note over S,C: close both sockets when done

Core Concepts

Unix Domain Socket Creation and Binding

#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>  // memset, strncpy

int main() {
    int server_fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (server_fd == -1) {
        perror("socket");
        exit(1);
    }

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/tmp/my_socket", sizeof(addr.sun_path) - 1);

    // Remove existing socket file (avoid EADDRINUSE)
    unlink("/tmp/my_socket");

    if (bind(server_fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        perror("bind");
        exit(1);
    }

    if (listen(server_fd, 5) == -1) {
        perror("listen");
        exit(1);
    }

    printf("Server listening on %s\n", addr.sun_path);

    // Accept a connection
    int client_fd = accept(server_fd, NULL, NULL);
    if (client_fd == -1) {
        perror("accept");
        exit(1);
    }

    char buf[256];
    ssize_t n = recv(client_fd, buf, sizeof(buf) - 1, 0);
    if (n > 0) {
        buf[n] = '\0';
        printf("Received: %s\n", buf);
    }

    send(client_fd, "Hello from server", 17, 0);

    close(client_fd);
    close(server_fd);
    unlink("/tmp/my_socket");

    return 0;
}
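
A matching client for the server above could look like the following sketch. `connect_unix` is a hypothetical helper name, not part of the socket API; it assumes the server is already listening on the given path and returns a connected descriptor that the caller must close.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to a Unix domain stream socket at `path`.
 * Returns a connected fd, or -1 with errno set. */
int connect_unix(const char *path) {
    struct sockaddr_un addr;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        int saved = errno;
        close(fd);          /* do not leak the fd on the error path */
        errno = saved;
        return -1;
    }
    return fd;
}
```

A caller would then use `send()` and `recv()` on the returned descriptor, for example `int fd = connect_unix("/tmp/my_socket");`, and check for -1 before using it.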

Socket Pairs (Anonymous Connected Sockets)

A socket pair is a pair of connected sockets where data written to one can be read from the other. Created with socketpair(), they are useful for creating bidirectional communication channels between related processes:

#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    int sv[2];  // Two connected sockets; each end can both send and receive

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        perror("socketpair");
        exit(1);
    }

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        // Child: keep sv[0], close the parent's end
        close(sv[1]);
        char buf[128];
        // Leave room for the terminating NUL
        ssize_t n = recv(sv[0], buf, sizeof(buf) - 1, 0);
        if (n > 0) {
            buf[n] = '\0';
            printf("Child received: %s\n", buf);
        }
        close(sv[0]);
        _exit(0);
    } else {
        // Parent: keep sv[1], close the child's end
        close(sv[0]);
        send(sv[1], "Hello from parent!", 18, 0);
        close(sv[1]);
        wait(NULL);
    }

    return 0;
}

TCP Server with select() Multiplexing

#include <sys/select.h>  // select, fd_set, FD_* macros
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PORT 8080
#define MAX_CLIENTS 10

int main() {
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (server_fd == -1) {
        perror("socket");
        exit(1);
    }
    int opt = 1;
    setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(PORT);

    if (bind(server_fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        perror("bind");
        exit(1);
    }
    if (listen(server_fd, 5) == -1) {
        perror("listen");
        exit(1);
    }

    fd_set readfds;
    int client_fds[MAX_CLIENTS] = {0};

    while (1) {
        FD_ZERO(&readfds);
        FD_SET(server_fd, &readfds);
        int maxfd = server_fd;

        for (int i = 0; i < MAX_CLIENTS; i++) {
            if (client_fds[i] > 0) {
                FD_SET(client_fds[i], &readfds);
                if (client_fds[i] > maxfd) maxfd = client_fds[i];
            }
        }

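        int activity = select(maxfd + 1, &readfds, NULL, NULL, NULL);
        if (activity < 0) {
            perror("select");
            continue;  // fd sets are unspecified after an error (e.g. EINTR); rebuild them
        }

        // New connection?
        if (FD_ISSET(server_fd, &readfds)) {
            int client_fd = accept(server_fd, NULL, NULL);
            if (client_fd >= 0) {
                int i;
                for (i = 0; i < MAX_CLIENTS; i++) {
                    if (client_fds[i] == 0) {
                        client_fds[i] = client_fd;
                        break;
                    }
                }
                if (i == MAX_CLIENTS) close(client_fd);  // Table full: drop it rather than leak the fd
            }
        }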

        // Client data?
        for (int i = 0; i < MAX_CLIENTS; i++) {
            if (client_fds[i] > 0 && FD_ISSET(client_fds[i], &readfds)) {
                char buf[1024];
                ssize_t n = recv(client_fds[i], buf, sizeof(buf), 0);
                if (n <= 0) {
                    close(client_fds[i]);
                    client_fds[i] = 0;
                } else {
                    // Echo back
                    send(client_fds[i], buf, n, 0);
                }
            }
        }
    }

    return 0;
}

Production Failure Scenarios

EADDRINUSE — Socket Already Bound

EADDRINUSE means bind() was called on an address that is already claimed. For TCP/UDP ports, something is already listening, or (for TCP) connections from a previous run are still in TIME_WAIT. For Unix domain sockets, the socket file path already exists on the filesystem, even if no process currently has it open. The usual culprit is a server that crashed or got killed: its TCP connections linger in TIME_WAIT, or its socket file was never unlinked.

Set SO_REUSEADDR before calling bind() to allow rebinding to an address in TIME_WAIT:

int opt = 1;
setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
// bind() now succeeds even if the previous socket on this port is in TIME_WAIT

The behavior differs by socket type:

Socket type	What SO_REUSEADDR allows
`AF_INET` / `AF_INET6` (TCP/UDP)	Bind to a port in `TIME_WAIT` from a previous connection on the same IP:port tuple
`AF_UNIX` (stream or dgram)	Nothing: an existing socket path still causes `EADDRINUSE`

For Unix domain sockets, SO_REUSEADDR has no effect. You need to unlink() the path before rebinding if the file still exists. Closing the socket releases the kernel object, but the filesystem entry sticks around until you remove it:

unlink("/tmp/my_socket");  // Remove stale socket file before rebinding
bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));

Skip the unlink() and you get EADDRINUSE regardless of SO_REUSEADDR. This trips up a lot of people restarting Unix domain socket servers during development.

For TCP servers, SO_REUSEADDR does not bypass TIME_WAIT safety — it only permits binding to an address in that state. Incoming packets for the old connection still route correctly. The kernel holds the port for 60 seconds (2 * MSL on Linux) to drain delayed packets; SO_REUSEADDR is the standard workaround for restarts within that window.

Connection Refused Under Load

When a server is overwhelmed with connections, the listen queue (backlog) fills up. New connections are refused with ECONNREFUSED or silently dropped (depending on OS). Set an appropriate backlog and monitor queue depth.

The kernel maintains two queues per listening socket: the SYN queue (half-open connections for which the server has received a SYN and replied with a SYN-ACK but not yet received the final ACK) and the accept queue (connections that have finished the three-way handshake and are waiting for accept() to be called). The backlog argument to listen(fd, backlog) sets the maximum size of the accept queue. The actual cap is the lesser of your backlog value and somaxconn (4096 by default since Linux 5.4, 128 on older kernels). When the accept queue is full, Linux by default ignores the client's final ACK, so the connection never completes on the server side and the client stalls until the queue drains or it times out.

Under heavy load, your server appears to refuse connections even though it is running. Clients see ECONNREFUSED or timeouts. This is distinct from EMFILE (file descriptor exhaustion) and from port exhaustion in the TIME_WAIT state.

Mitigation: Increase the listen backlog with listen(fd, backlog). Set it to a value that handles burst traffic without masking a real overload condition. For high-throughput servers, values between 256 and 1024 are common. You can view and set somaxconn system-wide:

# View current cap
cat /proc/sys/net/core/somaxconn

# Set it (requires root)
echo 4096 > /proc/sys/net/core/somaxconn

Setting a large backlog does not help if your application cannot call accept() fast enough. The queue just fills with completed connections waiting there. If ss -ltn shows a consistently full queue under load, your application-level processing is the bottleneck, not the queue size. Implement connection limiting at the application level and use load balancing to distribute connection attempts across multiple server processes.

Socket Leak — File Descriptors Not Closed

Every socket is represented as a file descriptor, and file descriptors are a finite system resource. On Linux, the default soft limit per process is commonly 1024 (viewable with ulimit -n or getrlimit(RLIMIT_NOFILE, ...)). A process can raise its soft limit up to the hard limit; raising the hard limit requires privilege, and the value remains bounded either way. If sockets are not closed in all code paths, especially error branches, file descriptors accumulate until the process hits its limit. At that point, socket(), accept(), and open() fail with EMFILE (Too many open files).
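
Servers that expect many connections often raise the soft limit to the hard limit at startup. The following is a minimal sketch of that step; `raise_nofile_limit` is an illustrative name, and on some systems the hard limit is reported as unlimited, in which case setrlimit() may refuse the request and the function returns -1.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_NOFILE to the current hard limit.
 * Returns 0 on success (or if already equal), -1 on failure. */
int raise_nofile_limit(void) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    if (rl.rlim_cur == rl.rlim_max)
        return 0;                 /* nothing to do */

    rl.rlim_cur = rl.rlim_max;    /* the hard limit itself is left unchanged */
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

Raising the limit buys headroom but does not fix a leak; a leaking server only takes longer to hit EMFILE.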

Socket leaks commonly occur in two scenarios. First, in error handling paths: after socket() succeeds but before connect() or bind() succeeds, the fd must be closed. Forgetting a close on any error return leaves a leak. Second, in long-running servers: each accepted connection produces a new socket fd. If the server does not close the client fd after handling (e.g., the parent in a fork-per-connection pattern never closes its copy), leaked fds accumulate with every request.

Mitigation: Always close sockets in all code paths. The robust pattern is a cleanup goto block:

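#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

// run_server is an illustrative name. Returns 0 on success, -1 on failure;
// server_fd is closed on every path after it has been created.
int run_server(const struct sockaddr *addr, socklen_t addrlen) {
    int ret = -1;
    int server_fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (server_fd == -1) {
        perror("socket");
        return -1;  // Nothing to clean up yet
    }

    if (bind(server_fd, addr, addrlen) == -1) {
        perror("bind");
        goto cleanup_server;
    }

    if (listen(server_fd, 5) == -1) {
        perror("listen");
        goto cleanup_server;
    }

    // ... accept and handle connections ...
    ret = 0;

cleanup_server:
    close(server_fd);  // Single close shared by the success and error paths
    return ret;
}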

Monitor file descriptor usage in real time:

# List open fds for a process
ls /proc/<pid>/fd/ | wc -l

# Show which files/sockets are open
lsof -p <pid>

# Check the system-wide and per-process limits
cat /proc/sys/fs/file-max # system-wide limit
ulimit -n                          # soft limit for shell
cat /proc/<pid>/limits # hard/soft limits for a process

Watch for a growing fd count over time — a stable server’s fd count should be constant under steady load. If it climbs, you have a leak somewhere in your connection lifecycle.
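
The same count can be taken from inside the process, which is handy for a leak check in tests or a metrics endpoint. This sketch is Linux-specific because it reads /proc/self/fd; `count_open_fds` is a hypothetical helper name.

```c
#include <dirent.h>
#include <string.h>

/* Count the entries in /proc/self/fd (Linux-specific).
 * Returns -1 if the directory cannot be opened. The count includes the
 * descriptor that opendir() itself holds while scanning, so compare two
 * results from this function rather than treating one as absolute. */
int count_open_fds(void) {
    DIR *d = opendir("/proc/self/fd");
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(d);
    return count;
}
```

Taking one reading before a batch of requests and one after gives a quick signal: under steady load the two numbers should match.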

Partial Send / recv

send() may transmit fewer bytes than requested if the kernel’s send buffer is full (especially on non-blocking sockets). recv() may return fewer bytes than requested. Always check return values and handle partial operations.

On blocking sockets, send() normally blocks until all of the data has been copied into the kernel buffer, but it can still return a short count, for example when a signal interrupts it after part of the data was queued or when a send timeout expires. On non-blocking sockets, send() returns immediately with the number of bytes accepted, or fails with EAGAIN/EWOULDBLOCK if the buffer is full. recv() returns what is available at that moment, which may be fewer bytes than requested even on blocking sockets, because a stream socket does not preserve the sender's write boundaries.

The consequence of not handling partial operations is protocol corruption. If you send a 100-byte message and only 50 bytes arrive, the receiver must know whether to wait for the remaining 50 or treat the partial as complete. Without a framing protocol, both sides go out of sync.

Mitigation: Loop until all data is sent. Use a sendall() wrapper that handles partial sends and EINTR (interrupted system call):

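#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

// Send exactly len bytes on a blocking socket; returns len, or -1 on error
ssize_t sendall(int sockfd, const void *buf, size_t len) {
    size_t total = 0;
    while (total < len) {
        ssize_t n = send(sockfd, (const char *)buf + total, len - total, 0);
        if (n == -1) {
            if (errno == EINTR) continue;  // Retry on signal
            return -1;
        }
        total += n;  // Partial send: advance and try again
    }
    return total;
}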

For recv(), accumulate bytes until a complete message is received. Choose a framing strategy:

Fixed-length: Every message is exactly N bytes. Simple but wastes bandwidth for variable-size data.
Length prefix: Prepend a fixed-size length field (e.g., 4-byte uint32_t in network byte order) before each message. The receiver reads the length first, then reads exactly that many bytes.
Delimiter-based: A special byte sequence (e.g., \n for line-oriented protocols) marks message boundaries. Beware of delimiter injection attacks.

Which framing you pick depends on your protocol. Length prefix is the most common for binary protocols. Delimiter-based is common for text protocols such as HTTP headers.
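
A length-prefixed protocol can be sketched in a few functions. The names `send_frame` and `recv_frame` are hypothetical, and the code assumes blocking stream sockets and a 4-byte big-endian length header. The receiver rejects frames larger than its buffer instead of trusting the peer's length.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; returns 0 on success, -1 on error or early EOF. */
static int read_full(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0) return -1;            /* peer closed mid-message */
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send a 4-byte network-order length followed by the payload. */
int send_frame(int fd, const void *msg, uint32_t len) {
    uint32_t be = htonl(len);
    if (write_full(fd, &be, sizeof(be)) == -1) return -1;
    return write_full(fd, msg, len);
}

/* Receive one frame into buf. Returns the payload length, or -1 on
 * error, EOF, or when the announced length exceeds cap. */
ssize_t recv_frame(int fd, void *buf, uint32_t cap) {
    uint32_t be;
    if (read_full(fd, &be, sizeof(be)) == -1) return -1;
    uint32_t len = ntohl(be);
    if (len > cap) return -1;             /* never trust the peer's length */
    if (read_full(fd, buf, len) == -1) return -1;
    return (ssize_t)len;
}
```

Two frames sent back to back are received as two separate messages, regardless of how the stream splits or merges the underlying bytes. After an oversized frame the stream is out of sync, so a real protocol would close the connection at that point.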

UDP Packet Loss and Reordering

UDP has no built-in mechanism for detecting lost packets, restoring the original order, or removing duplicates. When you send a datagram, the kernel hands it to the IP layer and walks away: no confirmation, no retry, no reordering. For DNS queries, video streams, or game state updates, this is usually fine. For other applications, it is a problem you have to solve yourself.

Packet loss happens when a router’s output buffer fills and it drops the datagram. The sender never finds out unless the application layer handles it. Reordering happens when packets take different paths — a datagram sent after another one can arrive first. Duplication is less common but real, usually from network retransmissions firing duplicates into the mix.

Building reliability on top of UDP means adding four things at the application layer:

Sequence numbers — Assign a monotonically increasing number to each datagram. The receiver tracks the last number seen and can spot gaps (missing packets) or duplicates.
Acknowledgments — The receiver sends an ACK when it gets a valid packet. If the sender does not see an ACK within a timeout, it retransmits.
Retransmission with backoff — After a timeout, resend. After repeated failures, back off exponentially so you do not make congestion worse.
Reordering buffer — The receiver holds onto out-of-order packets briefly, waiting for gaps to fill before passing data up to the application.
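
The receiver side of the sequence-number idea can be sketched as a small tracker. This is a deliberately simplified illustration with hypothetical names (`seq_tracker`, `seq_track`): it has no reordering buffer, so a packet that arrives after a gap has been recorded is classified as old rather than slotted back in.

```c
#include <stdint.h>

enum seq_result { SEQ_IN_ORDER, SEQ_GAP, SEQ_DUPLICATE_OR_OLD };

struct seq_tracker {
    uint32_t next_expected;   /* lowest sequence number not yet seen */
};

/* Classify an incoming sequence number and advance the tracker. */
enum seq_result seq_track(struct seq_tracker *t, uint32_t seq) {
    if (seq == t->next_expected) {
        t->next_expected++;
        return SEQ_IN_ORDER;
    }
    /* The signed difference keeps the comparison correct across
     * wrap-around of the 32-bit counter. */
    if ((int32_t)(seq - t->next_expected) > 0) {
        t->next_expected = seq + 1;   /* packets in between are lost or late */
        return SEQ_GAP;
    }
    return SEQ_DUPLICATE_OR_OLD;
}
```

A receiver would call `seq_track` for each datagram, deliver in-order packets, drop duplicates, and use a gap result to trigger a NACK or start a reordering timer.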

Reliability patterns:

Pattern	How it works	Tradeoff
Stop-and-wait	Send one packet, wait for ACK, then send the next	Simple, but high latency and wastes bandwidth while waiting
Sliding window	Keep a window of unacknowledged packets in flight; advance as ACKs arrive	Throughput-efficient, but more involved to implement correctly
Cumulative ACK	ACK carries the highest contiguous sequence number received	Low ACK overhead, but cannot acknowledge non-contiguous ranges without NACK
Negative ACK (NACK)	Receiver explicitly asks for missing sequence numbers	Cuts down on ACK traffic when losses are sparse, but needs an explicit NACK mechanism
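
The stop-and-wait row is the easiest to put into code. The sketch below assumes a connected datagram socket, a packet layout of a 4-byte sequence number followed by the payload, and an ACK that is just the 4-byte sequence number echoed back; none of that is a standard format, and `send_with_ack` is a hypothetical name. The timeout doubles after each failed attempt.

```c
#include <arpa/inet.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_PAYLOAD 1024

/* Send [seq | payload] and wait for a 4-byte ACK carrying the same seq.
 * Retries up to max_tries times with exponential backoff.
 * Returns the number of attempts used, or -1 if no valid ACK arrived. */
int send_with_ack(int fd, uint32_t seq, const void *payload, size_t len,
                  int timeout_ms, int max_tries) {
    unsigned char pkt[4 + MAX_PAYLOAD];
    uint32_t be_seq = htonl(seq);

    if (len > MAX_PAYLOAD)
        return -1;
    memcpy(pkt, &be_seq, 4);
    memcpy(pkt + 4, payload, len);

    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (send(fd, pkt, 4 + len, 0) == -1)
            return -1;

        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, timeout_ms) > 0) {
            uint32_t ack;
            ssize_t n = recv(fd, &ack, sizeof(ack), 0);
            if (n == (ssize_t)sizeof(ack) && ntohl(ack) == seq)
                return attempt;
            /* Wrong or malformed ACK: treat as a miss and retry. */
        }
        timeout_ms *= 2;   /* back off so retries do not add to congestion */
    }
    return -1;
}
```

Only one packet is in flight at a time, which is exactly why this pattern is simple and slow. A sliding window generalizes it by tracking several unacknowledged sequence numbers at once.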

For most applications, just use TCP. The complexity of building a reliable UDP protocol is hard to justify when TCP already does the job. But when you need custom congestion control tuned to your workload (real-time audio that tolerates a dropped packet but not a latency spike), or when you need multicast delivery to multiple peers, application-layer reliability on UDP is the practical path. QUIC (the transport behind HTTP/3) and WebRTC’s data channel both sit on UDP for exactly this reason.

One thing to watch: a sender that retransmits aggressively on a congested link will make packet loss worse, not better. Track round-trip time using timestamps or sequence number timing, and pace your transmissions to stay within estimated available bandwidth. For non-critical logging or metrics over UDP, none of this matters — fire and forget.

Trade-off Table

Feature	Unix Domain Socket	TCP Socket	UDP Socket	Named Pipe (FIFO)
Scope	Local only	Local + network	Local + network	Local only
Connection model	Connection-oriented (SOCK_STREAM) or datagram	Connection-oriented	Connectionless	Connectionless
Message boundaries	SOCK_STREAM: no, SOCK_DGRAM: yes	No (byte stream)	Yes (preserved)	No (byte stream)
Bidirectional	Yes	Yes	Yes (send/recv on same socket)	No (unidirectional)
Select/poll/epoll	Yes	Yes	Yes	Yes (via file descriptor)
Performance	Very high (kernel, no network)	High (kernel, local); moderate (network)	High (no connection setup, no retransmission)	High
Reliability	Depends on protocol	Reliable, ordered	Unreliable, best-effort	Reliable (kernel buffer)
Typical use	Local daemons, high-perf IPC	Network services, clients	Streaming, low-latency	Simple cross-process streaming

Implementation Snippet(s)

Python: Unix Domain Socket Client

Python’s built-in socket module talks to the BSD socket API so you don’t manage file descriptors directly. This client connects to a Unix domain socket, sends a message, and prints whatever the server sends back.

import socket

SOCKET_PATH = "/tmp/my_socket"

# Create Unix domain socket
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)

try:
    client.connect(SOCKET_PATH)
    client.sendall(b"Hello from Python client")
    data = client.recv(1024)
    print(f"Server said: {data.decode()}")
except (ConnectionRefusedError, FileNotFoundError):
    # Refused: stale socket file; not found: no socket file at all
    print("Server not running")
except Exception as e:
    print(f"Error: {e}")
finally:
    client.close()

Bash: Using netcat for Socket Testing

netcat (nc) is a command-line tool for probing socket connections during debugging. The examples below show Unix domain sockets, TCP servers, and some common diagnostic commands.

# Connect to a Unix domain socket (needs a netcat build with -U, e.g. OpenBSD netcat)
nc -U /tmp/my_socket

# Listen on a Unix domain socket
nc -l -U /tmp/my_socket

# Test TCP server
nc localhost 8080

# Send HTTP request to test server (printf handles \r\n portably)
printf 'GET / HTTP/1.0\r\n\r\n' | nc localhost 80

# Check what is listening on TCP ports
ss -tlnp | grep :8080
netstat -tlnp | grep :8080

Observability Checklist

Open sockets: lsof -p <pid> shows all file descriptors including sockets
Listening ports: ss -tlnp (preferred over netstat on modern Linux) shows listening sockets with process info
Established connections: ss -tnp shows all established TCP connections
Socket buffers: Check with cat /proc/sys/net/core/rmem_default and /proc/sys/net/core/wmem_default
Connection state: ss -ti shows TCP connection state, retransmissions, congestion window
strace: strace -e trace=network -p <pid> traces socket system calls (bind, listen, accept, connect, and so on); on x86-64 Linux, send() and recv() show up as sendto and recvfrom
perf: perf stat -e syscalls:sys_enter_bind,syscalls:sys_enter_connect to measure socket call frequency

Common Pitfalls / Anti-Patterns

Socket Permissions & Network Exposure: Unix domain sockets respect filesystem permissions on the socket file path, so use appropriate permissions on the containing directory (0700 for sensitive IPC). TCP/UDP sockets bound to 0.0.0.0 or :: are network-accessible; bind to localhost (127.0.0.1 or ::1) for local-only access and use firewall rules for additional protection. Any local process with sufficient permissions on the socket path can connect to it and talk to your server, so use filesystem permissions and separate namespaces for sensitive IPC.

DoS via Connection Flood & Audit: An attacker can exhaust server resources by opening many connections (a SYN flood against TCP, or a flood of completed connections against any SOCK_STREAM server). Use connection limits, rate limiting, proper timeout configuration, and consider a load balancer or SYN cookies. Socket operations (bind, listen, connect) can be recorded by the Linux audit subsystem when rules for those system calls are configured; for compliance, monitor for unexpected socket creation or binding to unusual ports.

Not handling EINTR on socket calls — same as pipes and other blocking calls, socket operations can return EINTR. Handle it or use SA_RESTART.
Ignoring partial send/recv — send() and recv() may process fewer bytes than requested. Loop until all data is transferred.
Forgetting SO_REUSEADDR — not setting this causes “Address already in use” errors after server restart, especially during development.
Buffer overflow in recv — always bounds-check the buffer size. Malicious clients can send more data than expected.
Using UDP for reliable data — UDP makes no guarantee of delivery, order, or uniqueness. If you need reliability on top of UDP, implement sequence numbers, ACKs, and retransmission.
Not setting socket timeouts — default socket operations may block forever. Set SO_RCVTIMEO and SO_SNDTIMEO for production code.
Leaving sockets in TIME_WAIT too long — after closing a connection, the kernel holds the port in TIME_WAIT state. Use SO_REUSEADDR to allow rebinding, or design your protocol to use longer-lived connections.
Not draining in edge-triggered epoll: in edge-triggered mode (EPOLLET), if you stop reading before recv() returns EAGAIN, you may never be notified about the data still pending. Use level-triggered mode or drain completely.
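
For the timeout pitfall above, a short sketch of setting both options on a socket follows. `set_io_timeout` is an illustrative helper name; after the timeout expires, a blocked recv() or send() fails with EAGAIN or EWOULDBLOCK.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Apply the same timeout (in milliseconds) to blocking recv and send
 * calls on fd. Returns 0 on success, -1 on error. */
int set_io_timeout(int fd, long ms) {
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1)
        return -1;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}
```

These options bound each individual call, not the whole request, so a peer that trickles one byte at a time can still hold a connection open; an overall deadline needs to be tracked by the application.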

Quick Recap Checklist

Sockets provide bidirectional IPC for both local (AF_UNIX) and network (AF_INET) communication
SOCK_STREAM is connection-oriented, reliable, byte-stream (like a bidirectional pipe)
SOCK_DGRAM is connectionless, unreliable, message-oriented (preserves boundaries)
Unix domain sockets (AF_UNIX) are the fastest local IPC mechanism with full socket API features
Always set SO_REUSEADDR before bind() to handle server restarts gracefully
Handle EINTR on all socket calls and partial send/recv by looping
Use select(), poll(), or epoll for multiplexing many connections in a single thread
Monitor socket file descriptor usage to prevent leaks; always close in all code paths
UDP requires application-level reliability if your use case needs it
Socket buffer sizes affect performance — tune with SO_RCVBUF and SO_SNDBUF
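
As a sketch of the last item, the helper below requests a receive buffer size and reads back what the kernel actually applied. `set_rcvbuf` is a hypothetical name. The value read back may differ from the request: Linux, for example, doubles it to account for bookkeeping overhead and caps it at a system-wide maximum.

```c
#include <sys/socket.h>

/* Request a receive buffer of `bytes` and return the size the kernel
 * reports afterwards, or -1 on error. */
int set_rcvbuf(int fd, int bytes) {
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;

    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        return -1;
    return actual;
}
```

SO_SNDBUF works the same way for the send side. Measure before and after tuning, since larger buffers trade memory for throughput and can add latency.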

Interview Questions

1. What is the difference between AF_UNIX and AF_INET sockets?

AF_UNIX (also called AF_LOCAL) Unix domain sockets use a filesystem path as the address. Data never leaves the kernel — it is copied directly from sender's buffer to receiver's buffer through the kernel's socket infrastructure. They are used for local IPC between processes on the same machine and offer the highest performance.

AF_INET (IPv4) and AF_INET6 (IPv6) are internet domain sockets that use IP address and port number pairs as addresses. Data flows through the full TCP/IP network stack — through the kernel's networking layers and potentially across a physical network. They support communication with processes on remote machines.

Both support SOCK_STREAM (reliable, connection-oriented, byte-stream) and SOCK_DGRAM (message-oriented, unreliable). Unix domain sockets are generally faster since they avoid network stack overhead, but they are limited to local communication.

2. How does select() work with sockets, and what are its limitations?

select() allows a process to monitor multiple file descriptors, blocking until one or more become "ready" (readable, writable, or have an error condition). Internally, select() copies three bitmap sets (readfds, writefds, exceptfds) into the kernel, which checks each fd's state. When any fd is ready, the kernel updates the bitmaps in-place and returns.

Limitations:

O(n) scanning: On return, you must iterate through all fds to find which are ready, even if only one was ready. Poor scaling with thousands of fds.
Bitmap limit: The fd sets use fixed-size bitmaps (typically FD_SETSIZE, often 1024), limiting the number of fds you can monitor.
Reset on return: The fd sets are modified by select(), so you must reinitialize them on each call.

Modern alternatives: poll() removes the FD_SETSIZE limit (it takes an array instead of a bitmap) but still scans every fd. epoll (Linux) addresses both problems: the kernel keeps the interest list, epoll_wait() returns only ready fds, it scales to very large numbers of connections, and it supports edge-triggered mode. kqueue (BSD/macOS) provides similar functionality.
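
For comparison with the select() server shown earlier, here is a minimal poll() sketch. `first_readable` and `MAX_POLL_FDS` are hypothetical names chosen for illustration; the array-based interface means there is no bitmap to rebuild and no FD_SETSIZE ceiling, only the array size you pick.

```c
#include <poll.h>
#include <sys/socket.h>

#define MAX_POLL_FDS 64

/* Wait up to timeout_ms for any of the n descriptors to become readable.
 * Returns the index of the first ready one, or -1 on timeout or error. */
int first_readable(const int *fds, int n, int timeout_ms) {
    struct pollfd pfds[MAX_POLL_FDS];

    if (n <= 0 || n > MAX_POLL_FDS)
        return -1;

    for (int i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)n, timeout_ms) <= 0)
        return -1;

    /* Still an O(n) scan: poll reports readiness per entry. */
    for (int i = 0; i < n; i++) {
        if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR))
            return i;
    }
    return -1;
}
```

The final loop is the cost that epoll avoids by returning only the ready descriptors.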

3. What is the difference between SOCK_STREAM and SOCK_DGRAM?

SOCK_STREAM provides a connection-oriented, reliable, byte-stream channel. It behaves like a bidirectional pipe — you send bytes, they arrive in order at the other end, with no message boundaries. If you send 100 bytes then 50 bytes, the receiver might read 150 bytes at once, or 50 bytes then 100 bytes, or any other division. TCP is the protocol that implements SOCK_STREAM over IP.

SOCK_DGRAM provides a connectionless, unreliable, message-oriented channel. Each send() delivers a discrete message (datagram) that arrives as a unit. Messages have boundaries — a single recv() returns exactly one datagram. Packets can be lost, duplicated, or arrive out of order. UDP is the protocol that implements SOCK_DGRAM over IP.

Choose SOCK_STREAM when you need reliable, ordered delivery and are prepared to frame messages yourself. Choose SOCK_DGRAM when you need low latency and can tolerate packet loss, or when each message is self-contained and should arrive as a unit in a single recv call.

4. What is the TIME_WAIT state and how does SO_REUSEADDR help?

When a TCP connection is closed, the endpoint that initiates the close (the one sending the first FIN) enters TIME_WAIT state for a duration of 2 * Maximum Segment Lifetime (MSL), typically 60 seconds on Linux. During this time, the socket pair (IP:port combination) cannot be reused. This exists so that delayed packets from the old connection are not mistaken for packets from a new connection using the same tuple.

This causes problems when restarting a server — you try to bind to port 8080 but it is still in TIME_WAIT from the previous run.

SO_REUSEADDR tells the kernel to allow binding to an address that is in TIME_WAIT. For server sockets, you set this option before calling bind(). It does not help with Unix domain sockets, which have no TIME_WAIT state; there you unlink() the socket path if the file still exists.

int opt = 1;
setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
bind(sockfd, ...);

This does not violate the TIME_WAIT safety property because the kernel only allows binding to an address in TIME_WAIT — it does not allow binding to a connection that is still active. Incoming packets for the old connection will still be handled correctly.

5. How do you handle partial sends and receives with sockets?

send() and recv() may transmit or receive fewer bytes than requested when the kernel's socket buffer is full (send) or no data is immediately available (recv). Both return the number of bytes actually transferred, or -1 on error.

Handling partial sends — always loop until all data is sent:

ssize_t sendall(int sockfd, const void *buf, size_t len) {
    size_t total = 0;
    while (total < len) {
        ssize_t n = send(sockfd, (const char *)buf + total, len - total, 0);
        if (n == -1) {
            if (errno == EINTR) continue;  // Retry
            return -1;
        }
        total += n;
    }
    return total;
}

Handling partial receives — accumulate until a complete message is received. Use a length prefix protocol (4-byte length header) so the receiver knows when a message is complete:

// Read exactly len bytes
ssize_t recvall(int sockfd, void *buf, size_t len) {
    size_t total = 0;
    while (total < len) {
        ssize_t n = recv(sockfd, (char *)buf + total, len - total, 0);
        if (n == 0) return total;  // EOF
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        total += n;
    }
    return total;
}

For streaming protocols, consider using a circular buffer and maintaining a receive state machine that tracks how many bytes of the current message have been received.
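The length-prefix idea can be sketched as a pair of framing functions. The names send_frame and recv_frame, the static helpers, and the choice of a big-endian 4-byte header are assumptions for illustration; the article only specifies "4-byte length header".

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; returns 0 on success, -1 on error or early EOF. */
static int read_full(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0) return -1;  /* peer closed mid-message */
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one message: 4-byte big-endian length, then the payload. */
int send_frame(int fd, const void *payload, uint32_t len) {
    uint32_t hdr = htonl(len);
    if (write_full(fd, &hdr, sizeof(hdr)) == -1) return -1;
    return write_full(fd, payload, len);
}

/* Receive one message into buf. Returns the payload length, or -1 on
 * error, EOF, or when the announced length exceeds cap. */
ssize_t recv_frame(int fd, void *buf, size_t cap) {
    uint32_t hdr;
    if (read_full(fd, &hdr, sizeof(hdr)) == -1) return -1;
    uint32_t len = ntohl(hdr);
    if (len > cap) return -1;  /* never trust the peer's length blindly */
    if (read_full(fd, buf, len) == -1) return -1;
    return (ssize_t)len;
}
```

Rejecting frames larger than the caller's buffer is a deliberate choice here: a length field is attacker-controlled input on a network socket.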

6. How does accept() behave with regard to the listen backlog queue?

The kernel maintains two queues for a listening socket: the SYN queue (incomplete connections, where the SYN has been received and a SYN-ACK sent, but the client's final ACK has not yet arrived) and the accept queue (completed connections ready for accept()). When listen(fd, backlog) is called, the backlog argument specifies the maximum size of the accept queue. The actual maximum is also capped by /proc/sys/net/core/somaxconn.

When the accept queue is full, Linux by default ignores the client's final ACK rather than queuing the connection. The client, which believes the connection is established, keeps retransmitting until the server has room or the attempt times out. accept() simply removes connections from the accept queue; it does not affect the SYN queue. High-performance servers must balance backlog size (memory) against the rate of new connections arriving. Linux also provides TCP_FASTOPEN, which lets a client holding a valid cookie from an earlier connection send data in its SYN and so saves a round trip; it does not remove the handshake or the queue limits.

7. What is the difference between edge-triggered and level-triggered notification in epoll?

Level-triggered mode (the default for epoll) notifies you whenever a file descriptor is ready, as long as the condition persists. If you do not drain all available data when notified, you will be notified again on the next epoll_wait call. Edge-triggered mode notifies you only when the state changes from not-ready to ready, or when new data arrives on a file descriptor that was previously empty.

Edge-triggered mode requires careful handling: you must keep reading (or writing) until the call returns EAGAIN, and you must handle every file descriptor reported by each epoll_wait call, or you will miss events. Edge-triggered is typically used with non-blocking sockets in high-performance servers to reduce redundant wakeups. Level-triggered is easier to program but can generate more events in high-throughput scenarios.
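The drain-until-EAGAIN rule can be sketched as follows. The helper names set_nonblocking, watch_edge_triggered, and drain are hypothetical, and a real server would process each chunk where the comment indicates instead of discarding it.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Edge-triggered epoll is only safe with non-blocking descriptors. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Register fd for edge-triggered read notifications. */
int watch_edge_triggered(int epfd, int fd) {
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Read until the kernel reports EAGAIN. Returns the number of bytes
 * consumed, or -1 on a real error. *eof is set to 1 if the peer closed. */
ssize_t drain(int fd, int *eof) {
    char buf[4096];
    ssize_t total = 0;
    *eof = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0) {
            total += n;  /* real code would process buf[0..n) here */
            continue;
        }
        if (n == 0) {
            *eof = 1;
            return total;
        }
        if (errno == EINTR) continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK) return total;
        return -1;
    }
}
```

Stopping after a single recv() would leave data in the buffer with no further notification, which is the classic edge-triggered stall.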

8. How does SO_KEEPALIVE work and what are its configuration parameters?

SO_KEEPALIVE enables periodic probes on idle TCP connections to detect if the peer has crashed or become unreachable. When enabled and no data has been sent or received for a configurable period, the kernel sends a keepalive probe: a segment with no payload that the peer must acknowledge. If the peer responds with an ACK, the connection is alive. If probes fail repeatedly, the connection is closed with ETIMEDOUT.

Configurable via socket options: TCP_KEEPIDLE (time before first probe, default 7200s on Linux), TCP_KEEPINTVL (interval between probes, default 75s), and TCP_KEEPCNT (number of failed probes before giving up, default 9). These can be set per socket with setsockopt() and override the system-wide defaults. Keepalive is useful for detecting dead peers in long-lived connections like database connections, but with the defaults a dead connection can go undetected for more than two hours (7200s + 9 × 75s).
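A minimal sketch of per-socket keepalive tuning on Linux. The helper name enable_keepalive and the example values are assumptions; the worst-case detection time is roughly idle + interval × count, so 60, 10, 3 gives about 90 seconds.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on keepalive and override the Linux
 * defaults for this socket only. Returns 0 on success, -1 on error. */
int enable_keepalive(int fd, int idle_s, int intvl_s, int count) {
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    /* Seconds of idleness before the first probe. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) == -1)
        return -1;
    /* Seconds between unanswered probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) == -1)
        return -1;
    /* Unanswered probes before the connection is dropped. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) == -1)
        return -1;
    return 0;
}
```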

9. What is the difference between Unix domain sockets and named pipes (FIFOs)?

Both Unix domain sockets and FIFOs use filesystem paths as addresses and respect filesystem permissions. Key difference: a FIFO is a unidirectional byte stream with plain read/write semantics (write() returns once the data is copied to the kernel buffer), and all writers share that one stream, while Unix domain sockets are bidirectional, give each client its own connection, and can be stream-oriented (SOCK_STREAM) or datagram-oriented (SOCK_DGRAM).

FIFOs work with select()/poll()/epoll(), but they cannot separate multiple clients the way a listening socket does with accept(): data from several writers is interleaved in one stream. A Unix domain SOCK_STREAM socket pair behaves like a bidirectional pipe, while a FIFO requires two named pipes for bidirectional communication. Unix sockets support ancillary messages (file descriptors, credentials), byte-stream ordering, and connection-oriented communication, making them more versatile than FIFOs for most IPC scenarios.

10. How does UDP connect() work and what advantages does it provide for UDP sockets?

Calling connect() on a UDP socket does not perform a handshake (unlike TCP). Instead, it associates the socket with a specific peer address and records this association in the kernel's socket state. Subsequent send() and recv() calls use the connected peer address, datagrams from other addresses are not delivered to the socket, and if the peer host answers with an ICMP port unreachable message, a later call on the socket fails with ECONNREFUSED. Without connect(), every sendto() must specify the destination.

Connected UDP provides: automatic use of the peer address (no need to specify it on every send), error notification when the peer port is unreachable (ICMP errors are reported on a later call on the socket, not on the send() that triggered them), and on some systems, improved performance because the route lookup does not have to be repeated for every datagram. Connected UDP still has no delivery guarantees but is more efficient for bidirectional UDP communication with a single peer.
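As a small sketch, fixing the peer of a UDP socket looks like this. The helper name udp_connect is hypothetical; no packet is sent by connect() itself.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket whose peer is fixed to *peer.
 * Returns the socket fd, or -1 on error. Nothing goes on the wire here. */
int udp_connect(const struct sockaddr_in *peer) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1) return -1;
    if (connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) == -1) {
        close(fd);
        return -1;
    }
    return fd;  /* send()/recv() now work without an explicit address */
}
```

After this call the application can use plain send() and recv() instead of sendto() and recvfrom().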

11. What are raw sockets and when would you use them?

Raw sockets bypass the normal protocol stack layers and allow you to construct custom network packets at the IP level or below. With socket(AF_INET, SOCK_RAW, protocol), you receive and send raw IP datagrams with full control over IP headers and payload. On Linux, packet sockets (AF_PACKET) go one layer lower and give direct access to Ethernet frames; the older SOCK_PACKET interface is obsolete.

Legitimate uses: network diagnostics tools (ping uses raw ICMP sockets), packet sniffers with BPF, custom tunneling protocols (like GRE or IP-in-IP), network simulation, and firewall implementations. Requires CAP_NET_RAW capability. Raw sockets are a security concern because they can be used for reconnaissance, crafting spoofed packets, and network scanning, which is why many production environments restrict or disable them.

12. How does socket buffer sizing affect network performance?

Socket buffers (SO_RCVBUF and SO_SNDBUF) are kernel-managed buffers for incoming and outgoing data. If the send buffer is full, a send() call blocks or returns EAGAIN (non-blocking), backpressuring the application. If the receive buffer is full, incoming data is dropped (UDP) or the sender's TCP window closes (TCP), reducing throughput.

The kernel auto-tunes these on modern Linux, but high-throughput or low-latency applications may benefit from manual tuning; note that setting SO_RCVBUF explicitly turns off receive-buffer auto-tuning for that socket. Increasing the receive buffer helps when receiving bursts of data. For low-latency, smaller buffers reduce queuing delay. For high-bandwidth-delay-product links (like WAN connections), larger buffers allow more data to be in flight. Linux also provides the low-water-mark options SO_RCVLOWAT and SO_SNDLOWAT, although SO_SNDLOWAT cannot be changed on Linux.
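A minimal sketch of manual receive-buffer sizing. The helper name set_rcvbuf is hypothetical. On Linux the value read back is normally larger than the request, because the kernel doubles it to account for bookkeeping overhead and caps the request at net.core.rmem_max.

```c
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of `bytes` and return
 * the size the kernel reports afterwards, or -1 on error. */
int set_rcvbuf(int fd, int bytes) {
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;

    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        return -1;
    return actual;  /* always check this instead of assuming the request */
}
```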

13. What is the difference between poll() and epoll() for socket multiplexing?

poll() and select() both copy the file descriptor sets from user space to kernel space and scan all fds linearly on each call, making every call O(n) in the number of monitored descriptors. poll() uses an array of pollfd structures (no fixed FD_SETSIZE limit like select()), but still has the linear scan problem. select() overwrites its fd sets, so they must be rebuilt before every call; poll() writes results only to the revents field, so the same array can be reused, but it is still copied and scanned in full each time.

epoll uses a kernel-maintained red-black tree of monitored file descriptors and a separate ready list of file descriptors that have events. epoll_wait() returns entries directly from the ready list without scanning all fds, so its cost depends on the number of ready descriptors rather than the number monitored. It supports edge-triggered mode and one-shot mode (EPOLLONESHOT: the descriptor is disabled after one event until explicitly rearmed). epoll scales to hundreds of thousands of file descriptors efficiently, which is why nginx and other high-performance servers use it.

14. What is the purpose of TCP_NODELAY and when should you disable Nagle's algorithm?

Nagle's algorithm (enabled by default on TCP sockets) holds back small outgoing segments while previously sent data is still unacknowledged, coalescing small writes into larger segments to reduce packet overhead on low-speed links. For interactive, low-latency applications like remote shells or real-time game updates, this buffering adds unacceptable latency.

TCP_NODELAY disables Nagle's algorithm, sending data immediately without waiting for ACKs. Use it when you have small, latency-sensitive messages that should be sent immediately: keystrokes in an SSH session, game state updates, real-time chat messages. The tradeoff is more packets on the wire and potentially reduced throughput for bulk transfers. Most interactive applications disable Nagle's algorithm.
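Toggling the option is a single setsockopt() call. A minimal sketch, with the helper name set_nodelay as an assumption:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enabled = 1 disables Nagle's algorithm,
 * enabled = 0 restores the default. Returns 0 on success, -1 on error. */
int set_nodelay(int fd, int enabled) {
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &enabled, sizeof(enabled));
}
```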

15. How does socketpair() differ from pipe() for IPC?

pipe() creates a unidirectional channel with two file descriptors: fd[0] for reading, fd[1] for writing. Data written to fd[1] is read from fd[0]. socketpair(AF_UNIX, SOCK_STREAM) creates a pair of bidirectional, connected stream sockets where both file descriptors can both send and receive.

socketpair() with SOCK_STREAM provides a bidirectional pipe that works with shutdown() (SHUT_RD, SHUT_WR, SHUT_RDWR) for half-close semantics, can be used with select()/poll()/epoll() for multiplexed communication (as pipes can), and supports file descriptor passing via sendmsg() with SCM_RIGHTS. Out-of-band data (MSG_OOB) on Unix domain stream sockets is available only on newer Linux kernels. A bidirectional protocol is more natural to implement on a socketpair than coordinating two unidirectional pipes.
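File descriptor passing is the feature pipes cannot offer. The following is a minimal sketch of sending one descriptor with SCM_RIGHTS over a Unix domain socket; the function names send_fd and recv_fd are hypothetical, and only a single descriptor per message is handled.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send one open descriptor over a Unix domain socket. 0 on success. */
int send_fd(int sock, int fd_to_send) {
    char byte = 'F';  /* at least one data byte must accompany the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {  /* the union guarantees cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor. Returns the new fd, or -1 on error. */
int recv_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1) return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;  /* a new descriptor referring to the same open file */
}
```

The receiver gets its own descriptor number for the same open file, much like dup() across process boundaries, and is responsible for closing it.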

16. What are the CLOSE_WAIT and LAST_ACK states in TCP connection termination?

When the remote end sends a FIN (initiating connection close), the local TCP acknowledges it and moves the connection to CLOSE_WAIT. The application learns that the peer has finished sending when recv() returns 0 (EOF); it can still send data at this point. The application must call close() to complete the close. When it does, the local TCP sends its own FIN and moves to LAST_ACK. The connection stays in LAST_ACK until the peer's ACK of that FIN is received.

Connections stuck in CLOSE_WAIT indicate the application is not calling close() after receiving the remote close. This commonly happens when the application does not properly handle half-closes from the peer. A socket in CLOSE_WAIT holds kernel resources (receive buffer) until the application calls close(). Detecting the peer's close (recv() returning 0) and then calling close() on the descriptor prevents socket leaks.

17. What is the difference between recv() and read() on a socket?

On Linux, read(fd, buf, len) and recv(sockfd, buf, len, flags) are essentially equivalent for socket file descriptors. The key difference is recv() accepts a flags argument: MSG_PEEK (peek at data without consuming it), MSG_DONTWAIT (non-blocking), MSG_OOB (out-of-band data), and MSG_WAITALL (wait for full request).

read() is the generic POSIX file descriptor operation and works on any file descriptor (pipes, files, sockets). recv() is socket-specific and provides socket-related control via flags. For regular sockets without special flags, they behave identically. Using recv() makes the socket-specific intent explicit and provides access to features that read() cannot express.
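MSG_PEEK is one flag that read() has no equivalent for. A minimal sketch, with the wrapper name peek_bytes as an assumption:

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: copy up to len bytes from the receive queue
 * without consuming them. The same bytes are returned again by the
 * next recv() or read(). */
ssize_t peek_bytes(int fd, void *buf, size_t len) {
    return recv(fd, buf, len, MSG_PEEK);
}
```

This is handy for inspecting a protocol header before deciding which handler should consume the message.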

18. How do getsockopt() and setsockopt() work for socket configuration?

setsockopt() modifies kernel-level socket behavior at various levels: SOL_SOCKET (generic options like SO_REUSEADDR, SO_KEEPALIVE, SO_RCVBUF), SOL_TCP (TCP-specific like TCP_NODELAY, TCP_QUICKACK), SOL_IP (IP-specific like IP_MTU_DISCOVER), and protocol-family-specific options. Each option has a type and value that the kernel interprets according to the protocol's implementation.

getsockopt() retrieves the current value of an option. Options can be integer values, structs, or binary blobs depending on the option. Some options are read-only (determined by the kernel or connection state) and return errors when set. Setting options before bind() or connect() is important because some options affect connection establishment and cannot be changed after the socket is fully established.
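Two read-oriented options illustrate the getsockopt() calling convention. The helper names socket_type and pending_error are hypothetical.

```c
#include <sys/socket.h>

/* Returns SOCK_STREAM, SOCK_DGRAM, ... for fd, or -1 on error. */
int socket_type(int fd) {
    int type = 0;
    socklen_t len = sizeof(type);  /* in: buffer size, out: bytes written */
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
        return -1;
    return type;
}

/* Fetches and clears the pending error on fd (0 means none).
 * Returns -1 if getsockopt() itself fails. Commonly used after a
 * non-blocking connect() reports writability. */
int pending_error(int fd) {
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return -1;
    return err;
}
```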

19. What is the difference between shutdown() and close() for sockets?

close() closes the file descriptor and, when the last reference to the socket is closed, initiates the TCP close sequence (FIN exchange) if it is a connected socket. It releases the file descriptor and kernel resources. If multiple processes share the socket (via fork), each close() decrements the reference count and only the last one initiates the TCP close.

shutdown() operates at the socket level, shutting down one or both directions of the connection: SHUT_RD (no more receptions allowed, receive buffer is discarded), SHUT_WR (no more sends, initiates FIN), SHUT_RDWR (both directions). shutdown() is useful for half-close patterns where one direction closes while the other remains open, which close() cannot express. shutdown() does not release the file descriptor itself.
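A minimal sketch of the half-close pattern: one side sends a request and signals "no more data" with SHUT_WR, then keeps reading the reply on the same descriptor. The function names send_request_and_finish and read_until_eof are hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send the whole request, then half-close the sending direction.
 * Returns 0 on success, -1 on error. The fd stays open for reading. */
int send_request_and_finish(int fd, const void *req, size_t len) {
    const char *p = req;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return shutdown(fd, SHUT_WR);  /* peer's recv() will now return 0 */
}

/* Read until the peer half-closes or closes (EOF) or buf is full.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_until_eof(int fd, void *buf, size_t cap) {
    char *p = buf;
    size_t total = 0;
    while (total < cap) {
        ssize_t n = recv(fd, p + total, cap - total, 0);
        if (n == 0) break;  /* EOF */
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

The descriptor still has to be released with close() once the reply has been read; shutdown() alone does not free it.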

20. What is io_uring and how does it improve asynchronous socket I/O?

io_uring is a Linux kernel interface (5.1+) that enables high-performance asynchronous I/O for sockets, files, and other I/O operations. Unlike select/poll/epoll, which only report readiness and leave the actual I/O to separate synchronous system calls, io_uring uses a pair of ring buffers (submission queue and completion queue) shared between the application and kernel. The application submits I/O operations by filling the submission queue ring buffer; the kernel processes them and returns results in the completion queue ring buffer.

Key advantages: true async send/recv with IORING_OP_SEND and IORING_OP_RECV that return immediately and notify via the completion queue, batched submissions that amortize syscall overhead, and features like fixed buffers that avoid mapping the buffer memory again on every operation. For high-frequency socket I/O, io_uring reduces system call overhead significantly compared to epoll with non-blocking send/recv, which needs at least one system call per operation. Libraries like liburing make io_uring accessible to applications.

Conclusion

Sockets are the most flexible form of IPC: bidirectional, connection-oriented or connectionless, local or network-spanning. Unix domain sockets provide high-performance local communication, while TCP/UDP sockets extend that capability across network boundaries. The layered API (socket, bind, listen, accept, connect, send, recv) has remained remarkably stable since BSD introduced it decades ago.

Mastering socket programming means mastering edge cases: partial sends and receives requiring loops, EINTR handling for interrupted system calls, TIME_WAIT state and SO_REUSEADDR for server restarts, and appropriate use of select/poll/epoll for scalable multiplexing. These fundamentals apply whether you’re building a local daemon, a network service, or a distributed system.

Looking forward, the evolution continues with zero-copy socket APIs, io_uring integration for asynchronous I/O, and increasingly sophisticated offloading to network hardware. The socket API adapts, but the underlying principles of bidirectional communication endpoints and kernel-mediated data transfer remain constant.
