Fork & Exec System Calls

fork() duplicates a running process, then exec() replaces it with a new program. Together they power every shell, web server, and daemon on Unix-like systems.

published: May 20, 2026 reading time: 26 min read author: GeekWorkBench


Every time you type a command in a terminal, run a background service, or spawn a worker process in a web server, two Unix system calls are doing the heavy lifting: fork() and exec(). Together, they form the foundation of process creation in every Unix-like operating system — Linux, macOS, BSD, and even the Android kernel.

If you have been running programs without understanding these calls, you are missing a big piece of the picture. Once you see how fork() duplicates a running program and how exec() swaps it for something new, the entire process model makes sense. Engineers who understand these calls debug systems properly. Everyone else just restarts services and hopes for the best.

Introduction

The fork() and exec() system calls are the backbone of process creation in Unix-like operating systems. While they are almost always used together, each serves a distinct purpose:

fork() duplicates the current process, creating a child with an identical copy of the parent’s address space
exec() replaces the current process’s program image with a new one, without creating a new process

Separating these two operations gives operating systems flexibility. The shell can fork a child, let the child set up its environment (redirecting I/O, changing working directory), and then exec the target program. Web servers use the same pattern to spawn worker processes. Daemons fork to detach from the terminal. Understanding this pattern is essential for anyone working with systems programming, operating systems, or infrastructure.

In this post, we will examine how each call works individually, how they combine in the fork()+exec() pattern, what happens to process state and resources, and the edge cases every systems programmer should know.

The Anatomy of fork()

fork() is a strange beast. It takes zero arguments and returns twice — once in the original process (the parent) and once in the brand-new process (the child). That is not a typo.

The way this works is that fork() creates a duplicate of the calling process. The child gets an exact copy of the parent’s address space — its memory, its open files, its variables. Everything. The child starts executing at the exact instruction where fork() returned in the parent. The only difference is the return value.

Return Values Tell You Who You Are

This is the key to understanding every fork()-based program:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main() {
    pid_t pid = fork();

    if (pid < 0) {
        // fork() failed — not enough processes allowed, out of memory
        perror("fork failed");
        return 1;
    } else if (pid == 0) {
        // We are the child process
        printf("I am the child. My PID is %d\n", getpid());
    } else {
        // We are the parent process
        printf("I am the parent. My child's PID is %d\n", pid);
    }

    return 0;
}

The return value in the child is 0. The return value in the parent is the child’s PID — a positive integer. The parent needs the child’s PID to manage it (wait for it, send signals to it). The child does not know its own PID from the return value — it calls getpid() if it needs to know.

If fork() returns -1, the call failed entirely and no process was created. Common failure reasons include hitting the process limit (EAGAIN) or running out of virtual memory.

Copy-on-Write: Why fork() Does Not Destroy Performance

A naive reading of fork() suggests it must copy the entire address space — all memory pages — from parent to child. For a process with hundreds of megabytes of memory, that would be devastatingly slow and would waste enormous amounts of RAM.

Unix kernels solve this with copy-on-write (COW). At the moment of fork(), the kernel does not copy the memory pages at all. Instead, both processes share the same physical pages. The kernel marks those pages as read-only. As long as both processes only read from their memory, nothing needs to be copied.

The moment either process tries to write to a page, the write is trapped by the CPU. The kernel then allocates a new physical page, copies the original content there, and remaps the writing process’s page table to point at the new page. From that point on, the two processes have independent copies of that page.

This means fork() is fast — it does not need to copy all the data immediately. It only needs to copy the page tables and mark pages as read-only. The actual copying is deferred until a write is necessary, and in many programs, large portions of memory are never written at all.

The performance story shifts on multi-core chips. When a process writes to a COW page, the kernel allocates a new physical page and updates the writing process’s page table. That write also invalidates any TLB entries that other cores hold for the old page. TLB shootdowns have a cost — the kernel must send an Inter-Processor Interrupt to every core that might have cached the old mapping. On a 32-core machine, a process that touches many COW pages can trigger a wave of shootdowns. This is why fork() followed by heavy writes in the child can drag on large NUMA systems — the kernel has to allocate new pages local to the child node, and cross-node traffic adds latency.

For daemons that fork() many children, page fault overhead matters too. Every first write to a shared page costs a fault plus a page copy, and depending on how much of the page table the kernel copied at fork(), a child can also take minor faults just to read pages it has not touched yet. Under memory pressure, shared anonymous pages may be swapped out, and a later access has to fault them back in before any copy can happen. Knowing this helps you reason about the vm.swappiness and vm.overcommit_memory settings for fork-heavy workloads.

The Address Space Duplication

When fork() returns in the child, the child has an identical but independent copy of:

The code segment (the program’s executable instructions)
The data segment (global and static variables)
The heap (dynamically allocated memory)
The stack (local variables and return addresses)
Open file descriptors (pointing to the same file table entries)

The child does not share memory with the parent — it has its own separate address space. But the contents at the moment of fork() are identical. If you want to understand what goes into a Process Control Block and how the OS tracks all this state, see the Process Concept post.
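To make the "identical but independent" point concrete, here is a minimal sketch. The helper name value_after_child_write is hypothetical, chosen only for illustration; it forks, lets the child overwrite a local variable, and reports what the parent still sees.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's view of `value` after the child has overwritten
 * its own copy. Returns -1 if fork() or waitpid() fails. */
int value_after_child_write(void) {
    int value = 42;              /* lives on the stack, duplicated by fork() */

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        value = 99;              /* the write lands in the child's private copy */
        _exit(value == 99 ? 0 : 1);
    }

    if (waitpid(pid, NULL, 0) < 0) {
        return -1;
    }
    return value;                /* the parent's copy was never touched */
}
```

Calling value_after_child_write() in the parent should yield 42: the child's assignment triggered a copy-on-write fault and changed only the child's page.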

The exec() Family: Replacing the Program Image

fork() alone is not enough to run a different program. It only duplicates the calling process. To run an actual different program — say, invoking ls from a shell — the child process needs to replace its address space with the new program’s code and data.

That is the job of the exec() family. There are six standard functions in the family, all front ends to the same kernel service (the execve() system call on Linux):

Function	Arguments	Example
`execl()`	list	`execl("/bin/ls", "ls", "-la", NULL);`
`execv()`	array	`execv("/bin/ls", argv);`
`execle()`	list + env	`execle("/bin/ls", "ls", NULL, envp);`
`execve()`	array + env	`execve("/bin/ls", argv, envp);`
`execlp()`	list + PATH	`execlp("ls", "ls", "-la", NULL);`
`execvp()`	array + PATH	`execvp("ls", argv);`

The p variants search PATH so you can run ls without typing /bin/ls. The e variants pass a custom environment instead of inheriting the parent’s.
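As an illustration of the e variants, the sketch below passes a hand-built environment to execve(). The function name run_with_custom_env and the GREETING variable are made up for this example, and it assumes /bin/sh exists, as it does on most Unix-like systems.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs /bin/sh with an environment containing only GREETING=hello and
 * returns the shell's exit status (0 if the variable arrived intact),
 * or -1 on error. */
int run_with_custom_env(void) {
    char *const argv[] = { "sh", "-c", "test \"$GREETING\" = hello", NULL };
    char *const envp[] = { "GREETING=hello", NULL };

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execve("/bin/sh", argv, envp);   /* 'v' = argv array, 'e' = explicit env */
        _exit(127);                      /* only reached if execve() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Because envp replaces the environment wholesale, the child sees GREETING but none of the parent's variables, not even PATH. That is why the sketch uses an absolute path and a shell builtin.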

What exec() Actually Does

Calling exec() does not create a new process. It replaces the current process’s address space with the code and data of the executable file. The PID does not change. Open file descriptors that are not marked close-on-exec (the FD_CLOEXEC flag, usually set through O_CLOEXEC) remain open. The process simply stops running its old program and starts running the new one from the entry point.

If exec() succeeds, the function never returns, because the old program is gone. If exec() returns at all, it has failed: it returns -1 with errno set, and execution continues in the old program on the line after the call.

#include <stdio.h>
#include <unistd.h>

int main() {
    printf("About to exec ls...\n");
    fflush(stdout);  // exec() discards any output still sitting in stdio buffers

    execlp("ls", "ls", "-la", (char *)NULL);

    // If we reach here, exec failed
    perror("exec failed");
    return 1;
}
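The failure path can be exercised on purpose. The following sketch, in which exec_missing_program and the nonexistent path are illustrative assumptions, tries to exec a file that should not exist and reports the outcome through the child's exit status. The 127 and 126 codes follow the usual shell convention for "not found" and "found but could not run".

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Tries to exec a path that should not exist and reports what happened.
 * Returns 127 if execv() failed with ENOENT, 126 for any other exec
 * failure, or -1 if fork() or waitpid() failed. */
int exec_missing_program(void) {
    char *const argv[] = { "no-such-program", NULL };

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execv("/nonexistent/no-such-program", argv);
        /* Still here, so execv() returned -1 and set errno. */
        _exit(errno == ENOENT ? 127 : 126);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```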

The fork()+exec() Pattern: How Shells and Servers Work

Every time you run a command in a shell, the shell performs the classic fork()+exec() sequence:

The shell calls fork() to create a child process.
The child calls exec() to replace its own address space with the program you requested.
The shell (the parent) calls wait() to suspend itself until the child finishes.

The shell does not run your command directly — it first duplicates itself, then swaps the duplicate for your program. This separation is deliberate. It means the shell’s own address space stays intact, ready to parse the next command.

A Complete fork()+exec()+wait() Example

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main() {
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork failed");
        return 1;
    }

    if (pid == 0) {
        // Child process: replace with ls
        execlp("ls", "ls", "-la", NULL);
        // exec failed if we reach here
        perror("exec failed");
        _exit(127);  // _exit() skips stdio cleanup inherited from the parent
    }

    // Parent process: wait for child to finish
    int status;
    waitpid(pid, &status, 0);

    if (WIFEXITED(status)) {
        printf("Child exited with code %d\n", WEXITSTATUS(status));
    }

    return 0;
}
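The gap between fork() and exec() is where a shell sets up redirection. A minimal sketch of that step, assuming an echo binary is on PATH, is shown here; capture_echo is a hypothetical helper that points the child's stdout at a pipe before exec and reads the result in the parent.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `echo hello` with stdout redirected into a pipe and copies what it
 * printed into buf. Returns the number of bytes captured, or -1 on error. */
ssize_t capture_echo(char *buf, size_t cap) {
    int fds[2];
    if (pipe(fds) < 0) {
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);                       /* child only writes */
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) < 0) {
                _exit(126);
            }
            close(fds[1]);                   /* stdout now refers to the pipe */
        }
        execlp("echo", "echo", "hello", (char *)NULL);
        _exit(127);                          /* exec failed */
    }

    close(fds[1]);                           /* parent only reads */
    size_t total = 0;
    ssize_t n;
    while (total < cap && (n = read(fds[0], buf + total, cap - total)) > 0) {
        total += (size_t)n;
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

The new program never learns that its stdout is a pipe. It inherits descriptor 1 exactly as the child left it, which is the same mechanism behind `ls > out.txt` and `ls | wc`.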

How Web Servers Spawn Workers

Web servers like Apache and Nginx build on the same primitives, just at much larger scale. At startup, the master process binds to port 80. In the classic forking design, the master calls fork() for each incoming connection; pre-forking servers instead create a pool of workers up front. Either way the child inherits the listening socket, so it does not need to rebind. The child then either calls exec() to run a different program or, in most modern servers, keeps running the same server code in worker mode.

Apache's prefork MPM forks a pool of children in advance, and each child handles one connection at a time. The full fork()+exec() pattern appears when a request maps to a CGI script: a process is forked and calls exec() to run the script. Nginx takes a different route: a single master process forks a pool of workers at startup, and each worker handles thousands of connections without ever calling exec(). The workers just run the same nginx binary in worker mode. The pattern is fork-without-exec: fork() to spawn multiple workers, no exec() because the new program is identical to the parent.

Both models depend on socket inheritance through the file descriptor table. When fork() duplicates the table, the child gets fd 0, 1, 2 plus whatever else the parent had open. If the parent had a socket bound to port 80, that socket is in the table too — the child can accept connections on it immediately, no rebinding needed. This is exactly why fork()+exec() works so cleanly for servers: one call transfers all the established I/O infrastructure.
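Socket inheritance can be shown in a few dozen lines. The sketch below is a toy, not how Apache or Nginx are written: prefork_one_worker is a hypothetical function that binds a loopback listener on a kernel-chosen port, forks a single worker, and then acts as its own client so the example is self-contained.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent binds a loopback listener, forks one worker that accepts a single
 * connection on the inherited socket, then connects to it as a client.
 * Returns the number of bytes the worker sent back, or -1 on error. */
ssize_t prefork_one_worker(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) {
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a free port */
    socklen_t len = sizeof addr;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 16) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(lfd);
        return -1;
    }
    if (pid == 0) {
        /* Worker: lfd is already bound and listening, no rebinding needed. */
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0) {
            _exit(1);
        }
        ssize_t w = write(cfd, "hi\n", 3);
        close(cfd);
        _exit(w == 3 ? 0 : 1);
    }

    /* Parent plays the client for the sake of the demo. */
    ssize_t total = -1;
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s >= 0 && connect(s, (struct sockaddr *)&addr, sizeof addr) == 0) {
        char buf[8];
        ssize_t n;
        total = 0;
        while ((n = read(s, buf, sizeof buf)) > 0) {
            total += n;                      /* read until the worker closes */
        }
    } else {
        kill(pid, SIGKILL);                  /* worker would block in accept() */
    }
    if (s >= 0) {
        close(s);
    }
    close(lfd);
    waitpid(pid, NULL, 0);
    return total;
}
```

A real pre-forking server would fork several workers in a loop and have each one call accept() repeatedly; the kernel hands each incoming connection to one of the waiting workers.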

Process managers such as systemd and PM2 have taken over much of the manual fork+exec worker management. The manager starts the service and restarts it if it dies. PM2's cluster mode can run several instances of an application, commonly one per CPU core. systemd's socket activation goes a step further: systemd creates the listening socket itself and hands it to the service as an inherited file descriptor. In those setups the application never calls fork(); it just accepts connections on a socket it was given, and the pool is managed from outside.

wait() and Zombie Processes

When a child process terminates, it does not disappear immediately. The kernel keeps certain information about it — its exit status, resource usage statistics — until the parent retrieves it. A process in this state is called a zombie.

The parent retrieves this information using wait() or waitpid(). Until the parent calls one of these, the child’s entry in the process table remains, consuming a slot. If the parent exits before the child, the child is adopted by the init process (PID 1), which always calls wait() on its children. This is why orphan processes do not become zombies — init cleans them up.

A Practical wait() Example

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork failed");
        return 1;
    }

    if (pid == 0) {
        // Child does some work
        sleep(2);
        printf("Child finishing\n");
        exit(42);  // Exit with status 42
    }

    // Parent waits specifically for this child
    int status;
    waitpid(pid, &status, 0);

    if (WIFEXITED(status)) {
        printf("Child exited with status %d\n", WEXITSTATUS(status));
    }

    return 0;
}

What Happens When a Parent Exits

If a parent exits before its child, the child is reparented to init: the parent PID recorded in its PCB (Process Control Block) now points at PID 1. init calls wait() whenever one of its children exits, collecting the exit status and removing the entry from the process table. Orphans themselves can run for as long as they like; what reparenting guarantees is that they do not linger as zombies once they exit. Zombies accumulate only when a parent that is still alive fails to call wait().

A common mistake is for a parent to fork and then continue doing other work without ever calling wait(). Each zombie still occupies a PID and a process table slot, so over time this can exhaust them and make further fork() calls fail. The solution is to call wait() or waitpid() in the parent. If the parent cannot afford to block, it can poll with waitpid(-1, NULL, WNOHANG) in a loop, typically from a SIGCHLD handler. If it does not care about exit statuses at all, it can set SIGCHLD to SIG_IGN or use SA_NOCLDWAIT so the kernel discards them.

Reparenting changes nothing except the parent PID in the child's PCB. File descriptors stay open: sockets, files, and pipes all remain functional. The child's working directory does not move. Memory mappings and signal dispositions are untouched. On Linux, prctl(PR_SET_PDEATHSIG, sig) lets a process ask for a signal when its parent dies; worker processes use this to notice that their parent is gone and shut down cleanly.

Sessions add another wrinkle. If the exiting parent was the session leader (typically the shell that owns the controlling terminal), the kernel sends SIGHUP to the session's foreground process group, and the terminal stops being the session's controlling terminal. Children that survive the SIGHUP keep whatever terminal file descriptors they already had open. Daemons avoid the whole problem with a double fork: the first child calls setsid() to start a new session with no controlling terminal, then forks again so that the final process is not a session leader and cannot acquire a controlling terminal by accident.
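The double fork is easier to follow in code. This is a sketch rather than a complete daemonization routine (it skips chdir, umask, and closing the standard descriptors), and double_fork_demo is a hypothetical name. Instead of running forever, the final process reports back through a pipe whether it is a session leader.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Double-fork sketch. Returns 1 if the final process is NOT a session
 * leader, 0 if it is, or -1 on error. A real daemon would keep running
 * instead of reporting back. */
int double_fork_demo(void) {
    int fds[2];
    if (pipe(fds) < 0) {
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        close(fds[0]);
        if (setsid() < 0) {               /* new session, no controlling terminal */
            _exit(1);
        }

        pid_t pid2 = fork();              /* second fork: drop session leadership */
        if (pid2 < 0) {
            _exit(1);
        }
        if (pid2 > 0) {
            _exit(0);                     /* intermediate process exits at once */
        }

        /* Grandchild: orphaned, in the new session, but not its leader. */
        char not_leader = (getsid(0) != getpid()) ? 1 : 0;
        ssize_t w = write(fds[1], &not_leader, 1);
        _exit(w == 1 ? 0 : 1);
    }

    close(fds[1]);
    waitpid(pid, NULL, 0);                /* reap the intermediate process */
    char result;
    ssize_t n = read(fds[0], &result, 1); /* the grandchild is reaped by init */
    close(fds[0]);
    return n == 1 ? result : -1;
}
```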

On systemd systems, reparented processes go to systemd, which reaps them as they exit, so zombie buildup from orphaned children is not a concern there. In containers whose PID 1 is an ordinary application rather than a real init, though, nobody may be reaping adopted children, and zombies can pile up until the PID limit is hit.

Fork Bombs and Process Limits

A fork bomb is a denial-of-service attack where a process keeps calling fork() as fast as possible, creating exponentially growing numbers of child processes until the system runs out of process table entries, memory, or both. The classic form:

// Do not run this
while(1) fork();

Modern systems mitigate this with per-user and per-process resource limits. The ulimit command (in the shell) and the setrlimit() system call control the maximum number of processes a user can create. When a process hits its limit, fork() returns -1 with errno set to EAGAIN.
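A process can also tighten its own limit before spawning anything untrusted. The sketch below assumes a platform that defines RLIMIT_NPROC (Linux and the BSDs do; it is not part of POSIX), and cap_process_limit is a hypothetical helper name.

```c
#include <sys/resource.h>

/* Lowers the soft RLIMIT_NPROC to at most `max_procs`, leaving the hard
 * limit alone. Returns 0 on success, -1 on error. */
int cap_process_limit(rlim_t max_procs) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NPROC, &rl) < 0) {
        return -1;
    }
    if (rl.rlim_cur > max_procs) {
        rl.rlim_cur = max_procs;      /* only ever lower, never raise */
    }
    return setrlimit(RLIMIT_NPROC, &rl);
}
```

The limit counts processes belonging to the same user, not just descendants of the caller, and it is inherited by children. On Linux, privileged processes are generally exempt from it, so it is not a defense against a fork bomb running as root.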

System administrators can also use control groups (cgroups) on Linux to limit the number of processes a container or user can spawn. For more on how threads relate to processes in this context, see Threads & Lightweight Processes.

Interview Questions

1. What happens to open file descriptors during fork()?

All open file descriptors are duplicated in the child process. The child shares the same file table entries as the parent, meaning they reference the same open file description. This is why the fork()+exec() pattern works for I/O redirection — the child can close or duplicate file descriptors before calling exec(), affecting what the new program reads from or writes to.
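Sharing an open file description means sharing its offset, which is easy to observe. In this sketch, byte_after_child_read is a hypothetical helper and the /tmp path is an assumption about the system. The child reads three bytes, and the parent's next read continues where the child stopped.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Writes "abcdef" to a temporary file, lets the child read the first three
 * bytes, then reads one byte in the parent. Returns that byte ('d' when the
 * offset is shared), or -1 on error. */
int byte_after_child_read(void) {
    char path[] = "/tmp/fork-offset-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);                         /* file lives on until fd is closed */

    if (write(fd, "abcdef", 6) != 6 || lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        char skip[3];
        _exit(read(fd, skip, 3) == 3 ? 0 : 1);   /* advances the shared offset */
    }

    int status;
    char c;
    int result = -1;
    if (waitpid(pid, &status, 0) > 0 && WIFEXITED(status) &&
        WEXITSTATUS(status) == 0 && read(fd, &c, 1) == 1) {
        result = (unsigned char)c;
    }
    close(fd);
    return result;
}
```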

2. What is the key difference between fork() and vfork()?

vfork() was introduced to avoid the cost of copying the parent's address space in the days before copy-on-write. Like fork(), it creates a child process, but the child borrows the parent's address space until it calls exec() or _exit(), and the parent is suspended in the meantime. Copy-on-write makes fork() efficient enough for most use cases, so vfork() is rarely needed in application code, and POSIX.1-2008 removed it from the standard. On Linux it is still available, implemented through clone() with the CLONE_VM and CLONE_VFORK flags.

3. How does exec() affect the signal handling of a process?

When exec() replaces the process image, every signal that has a handler installed is reset to its default disposition, because the handler code no longer exists in the new image. Handlers set with signal() or sigaction() are therefore not carried over. Signals set to SIG_IGN stay ignored, and signals already at SIG_DFL stay at the default. The signal mask (which signals are blocked) and the set of pending signals are also preserved across exec().

4. Why does fork() return twice instead of once?

The kernel implements fork() by duplicating the current process and scheduling both the parent and child to continue running. From the kernel's perspective, both processes exist and both need to resume execution. The return value differs because the kernel detects which process is running — it sets the return value to 0 in the child (the newly created process) and to the child's PID in the parent. This design lets both processes determine their role and branch accordingly. It is the fundamental mechanism that makes the parent-child relationship explicit in the code.

5. What is the relationship between fork(), zombies, and the wait() system call?

When a child terminates, its exit status and resource usage are preserved in the kernel until the parent retrieves them via wait(). Until the parent calls wait() or waitpid(), the child remains in the process table as a zombie. If the parent never calls wait() (a programming error), the zombie entry persists and can eventually exhaust process table slots. If the parent exits before the child, init inherits the child and automatically calls wait(), so zombies from orphaned children are cleaned up promptly. SIGCHLD handling with SA_NOCLDWAIT can also prevent zombies by instructing the kernel to discard child exit information.

6. What happens to the child's signal handling landscape after an exec() call?

When a process calls exec(), every caught signal is reset to its default action, since the old handler functions are not part of the new image. The new program starts with no handlers installed and has to call sigaction() itself if it wants any.

However, the signal mask (which signals are currently blocked) is preserved across exec(). Dispositions of SIG_IGN and SIG_DFL are preserved as well.

This gives the code between fork() and exec() a useful window: the child can ignore or unblock signals there, and those settings carry into the new program, while the parent's custom handlers never do. The nohup utility relies on the same rule: it sets SIGHUP to SIG_IGN and then execs the command.

7. What is the difference between O_CLOEXEC and FD_CLOEXEC?

O_CLOEXEC is a flag passed to open() when creating a file descriptor (on Linux, pipe2() accepts it too, and socket() and accept4() take the analogous SOCK_CLOEXEC). It sets the close-on-exec flag at creation time, so the descriptor is closed automatically when an exec() call replaces the process image.

FD_CLOEXEC is the descriptor flag itself. You set it on an existing descriptor with fcntl(fd, F_SETFD, FD_CLOEXEC), which is how you get the same effect for descriptors you did not create, such as ones inherited from a parent process.

Both achieve the same result: preventing file descriptor leakage across exec(). O_CLOEXEC is the safer choice in multi-threaded programs because the flag is set atomically at creation, which closes the window in which another thread could fork() and exec() between open() and fcntl().
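For descriptors that already exist, the usual idiom reads the current flags first so that nothing else is clobbered. A small sketch, with set_cloexec as a hypothetical helper name:

```c
#include <fcntl.h>

/* Sets the close-on-exec flag on an existing descriptor while preserving
 * any other descriptor flags. Returns 0 on success, -1 on error. */
int set_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}
```

For descriptors you create yourself, passing the flag at creation, as in open(path, O_RDONLY | O_CLOEXEC), avoids the second call and the race described above.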

8. What is posix_spawn() and when should you use it instead of fork()+exec()?

posix_spawn() combines fork() and exec() into a single call. It creates a child process and replaces it with a new program, handling file descriptor inheritance and signal management through attribute parameters.

Use posix_spawn() when:

Forking in a multi-threaded program: fork() copies only the calling thread, so any lock held by another thread at that moment stays locked forever in the child
You need to control the child's signal handling, file descriptor table, or scheduling parameters atomically
Implementing a shell or command executor where you need predictable fork+exec behavior

posix_spawn() is effectively a standardized interface that handles the tricky parts of fork()+exec() in a way that works correctly in multi-threaded programs.
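A minimal posix_spawnp() call looks like the sketch below. spawn_shell_command is a hypothetical wrapper; it passes NULL for both the file actions and the attributes, so none of the optional signal or descriptor controls are exercised here.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Runs `sh -c <command>` via posix_spawnp() and returns its exit status,
 * or -1 if the spawn or the wait failed. */
int spawn_shell_command(const char *command) {
    char *const argv[] = { "sh", "-c", (char *)command, NULL };

    pid_t pid;
    /* NULL file actions and attributes: the child inherits the defaults. */
    int err = posix_spawnp(&pid, "sh", NULL, NULL, argv, environ);
    if (err != 0) {
        return -1;        /* posix_spawnp() returns an errno value, not -1 */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

The child still has to be reaped with waitpid(), exactly as after fork(). Redirections and signal setup would go into a posix_spawn_file_actions_t and a posix_spawnattr_t respectively.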

9. What is the copy-on-write mechanism in detail, and what triggers the actual copying?

At fork(), the kernel marks all pages in the parent's address space as read-only in both parent's and child's page tables. Both processes share the same physical pages. No actual memory copying happens at fork() time.

When either process tries to write to a shared page:

CPU raises a page fault (write to read-only page)
Kernel intercepts the fault, allocates a new physical page
Kernel copies the original page content to the new page
Kernel updates the writing process's page table to point to the new page
Kernel marks the new page as writable
Other process's page table entry still points to the original (read-only) page

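The other process's entry stays read-only, so its next write faults as well. If the kernel then finds that this process holds the only remaining reference to the page, it can typically just make the page writable again without copying anything.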

10. How does the kernel handle fork() failures and what are the common error codes?

fork() can fail and return -1 with errno set to:

EAGAIN: The process's RLIMIT_NPROC limit has been reached, or the system's process table is full. This is the most common failure in containerized or resource-constrained environments.
ENOMEM: Not enough kernel memory to allocate the child's PCB or page tables. Very rare on modern systems with swap.

For EAGAIN, the application should throttle fork attempts or increase ulimit -u. For ENOMEM, the system itself is in trouble and needs intervention.

Note: fork() can succeed even if there isn't enough memory for the child to run (COW means actual memory is only allocated on write). fork() succeeds but the child may be killed by the OOM killer later if it writes heavily and there is no memory available.
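One way to throttle on EAGAIN is a bounded retry with a growing delay. The sketch below is one possible policy, not a recommendation for every workload; fork_with_retry and its delay values are assumptions made for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Calls fork(), retrying with a growing delay while it fails with EAGAIN.
 * Returns whatever the last fork() returned. */
pid_t fork_with_retry(int max_attempts) {
    struct timespec delay = { 0, 10 * 1000 * 1000 };   /* start at 10 ms */
    pid_t pid = -1;

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        pid = fork();
        if (pid >= 0 || errno != EAGAIN) {
            break;                   /* success, or an error retrying won't fix */
        }
        nanosleep(&delay, NULL);
        if (delay.tv_nsec < 320 * 1000 * 1000) {
            delay.tv_nsec *= 2;      /* back off: 10, 20, 40 ... up to 320 ms */
        }
    }
    return pid;
}
```

As with plain fork(), the caller still branches on the result: 0 in the child, a positive PID in the parent, and -1 with errno set once the attempts are exhausted.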

11. What is the relationship between fork(), COW, and the OOM killer?

fork() succeeds quickly because COW defers memory copying: at fork() time the kernel allocates page tables and bookkeeping structures, but no data pages. The child starts with a copy-on-write mapping of the parent's memory.

If both parent and child write heavily (triggering COW for many pages), they may collectively allocate significantly more memory than either would have alone. If the system runs out of memory, the OOM killer selects a process to terminate.

The OOM killer favors the processes using the most memory, and each process's score can be adjusted through oom_score_adj. A fork bomb or heavy COW activity can trigger it. cgroups v2 allows controlling OOM behavior per container.

12. How does fork() interact with memory-mapped files (mmap)?

When fork() is called, memory mappings created by mmap() are inherited by the child. For MAP_PRIVATE mappings, whether file-backed or anonymous, copy-on-write applies: both processes initially share pages, and the first write by either gives the writer a private copy.

MAP_SHARED mappings behave differently: they are NOT copied on write. A store by either process is immediately visible to every process sharing the mapping, and for file-backed mappings it is eventually written back to the underlying file.

After fork(), mmap() calls in either process are independent — new mappings in one process don't appear in the other.

13. What is the child process's initial CPU time after fork()?

After fork(), the child starts with utime and stime both set to 0. The child has not yet used any CPU time.

Accounting begins when the child is first scheduled. On Linux, the scheduler records the time when a process is scheduled in and calculates the delta when the process is scheduled out.

This means that if you fork a child and immediately wait() for it, you may see non-zero CPU times if the child ran briefly between fork and wait (even for a moment).

14. What happens to the child's nice value after fork()?

Nice value is inherited across fork(). The child starts with the same nice value as the parent. However, the child can then call setpriority() or nice() to adjust its own priority independently.

This inheritance applies to all scheduler properties — the child gets the same scheduler policy (SCHED_OTHER by default), the same CPU affinity mask, and the same nice value as the parent.

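This is why a background job started with `nice -n 10 ./job &` runs at nice 10 even though the shell itself stays at its default: the shell forks, the child execs the nice utility, and nice raises its own nice value before exec'ing ./job. The value survives both fork() and exec().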

15. What is the difference between _exit() and exit() in Unix?

exit() is a standard C library function that performs cleanup before terminating: flushing stdio buffers, calling atexit() handlers, then calling _exit().

_exit() is a raw system call that terminates immediately without cleanup. It does not flush buffers, does not call atexit handlers, and does not invoke C++ destructors.

In a fork()+exec() context, if the child needs to terminate without running the new program (e.g., error handling after fork() before exec()), it should call _exit() rather than exit(). The child inherited a copy of the parent's unflushed stdio buffers and its atexit() handlers, so calling exit() in the child can write buffered output a second time and run cleanup code that was meant to run once, in the parent.

16. What is the semantic difference between wait() and waitpid() with WNOHANG?

wait() blocks until any child terminates, then returns its PID and status. If there are no children to wait for, it returns -1 with errno ECHILD.

waitpid(pid, status, 0) waits for a specific child (or any child if pid=-1). By default it blocks if the child hasn't exited.

waitpid(pid, status, WNOHANG) is non-blocking — if the specified child hasn't exited yet, it returns 0 immediately. This is essential in event-driven programs or signal handlers where blocking would be unacceptable.

A common pattern: in a SIGCHLD handler, use waitpid(-1, NULL, WNOHANG) in a loop to reap all terminated children without blocking the handler.
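Outside a signal handler, the same flag supports a simple polling loop. The sketch below uses a hypothetical helper, poll_for_child, with arbitrary timings. The parent checks on the child every few milliseconds and is free to do other work between checks.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Forks a child that exits with `code` after roughly 50 ms and polls for it
 * with WNOHANG. Stores the number of polls that found the child still
 * running in *polls_while_running. Returns the exit code, or -1 on error. */
int poll_for_child(int code, int *polls_while_running) {
    struct timespec nap = { 0, 50 * 1000 * 1000 };   /* 50 ms */
    struct timespec tick = { 0, 5 * 1000 * 1000 };   /* 5 ms between polls */

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        nanosleep(&nap, NULL);
        _exit(code);
    }

    int status;
    *polls_while_running = 0;
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) {
            break;                       /* child has exited and is now reaped */
        }
        if (r < 0) {
            return -1;                   /* e.g. ECHILD */
        }
        (*polls_while_running)++;        /* r == 0: still running, do other work */
        nanosleep(&tick, NULL);
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```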

17. How does the kernel handle a child that outlives its parent?

When a process's parent exits before it, the kernel re-parents the child to init (PID 1, which is systemd on most modern Linux systems). On Linux, if an ancestor has marked itself as a child subreaper with prctl(PR_SET_CHILD_SUBREAPER), the nearest such ancestor adopts the child instead.

The reparenting happens immediately upon the parent's termination. The child continues running exactly as before, just with a different parent PID.

init/systemd reaps adopted children as they exit, which prevents zombie accumulation. Orphans can keep running indefinitely, but once they terminate they do not stay zombies for long.

18. What is the difference between a zombie and a defunct process?

Zombie and defunct are the same thing. A zombie (or defunct) process is one that has terminated but whose PCB entry remains in the process table because the parent has not yet read its exit status via wait().

Once the parent calls wait() and reads the exit status, the zombie is cleaned up. If the parent never calls wait() (a programming bug), the zombie persists indefinitely — or until the parent is killed, at which point the zombie is adopted and reaped by init.

19. What is the relationship between the CLONE_VM flag in clone() and fork()?

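On Linux, fork() is implemented on top of clone(), the more general call that lets the caller choose what parent and child share. When CLONE_VM is set (shared address space), writes by either process are visible to the other. When CLONE_VM is NOT set (fork semantics), the child gets its own page tables, whose entries initially reference the parent's pages copy-on-write.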

vfork() sets CLONE_VM but also CLONE_VFORK and blocks the parent until the child calls exec() or exit(). The combination of shared VM and parent blocking is what allows vfork() to work without COW overhead.

20. How does system call interception relate to process creation (ptrace, seccomp)?

ptrace() allows a tracer process to observe and control another process's execution, including intercepting system calls. When ptrace attaches to a child after fork(), the tracer can intercept every system call the child makes.

seccomp (secure computing mode) filters system calls. In strict mode a process may only call read(), write(), _exit(), and sigreturn(), and any other syscall results in SIGKILL. In filter mode a BPF program decides for each call whether to allow it, fail it with an error, trap, or kill the process.

Together, fork(), seccomp, and ptrace form the basis of many sandboxes: a process forks a child, the child installs a seccomp filter (narrowing its available syscalls) and then executes untrusted code, and the parent can use ptrace to monitor it. strace is built on ptrace alone; sandboxing tools typically rely on seccomp filters, sometimes combined with ptrace.

Quick Recap Checklist

fork() creates a child process; returns 0 in child, child’s PID in parent, -1 on error
exec() family replaces the current process image with a new program; never returns on success
fork()+exec() is the universal pattern for creating and running new programs
wait() / waitpid() retrieves a child’s exit status and removes its zombie entry
Copy-on-write defers memory copying at fork() until a write actually occurs
File descriptors are duplicated (shared) across fork(); use O_CLOEXEC to have them closed automatically on exec()
vfork() is an obsolete optimization that shares the address space until exec()
Fork bombs exploit the fact that fork() succeeds until process limits are hit; mitigate with ulimit and cgroups

Conclusion

fork() and exec() are two of the most important system calls in Unix-like operating systems, and they are almost always used together. fork() duplicates the calling process, creating a child with an identical address space. exec() replaces that address space with a new program, without creating a new process. Together, they form the foundation of how shells run commands, how web servers spawn workers, and how every process on your system comes to exist.

The key takeaways are straightforward: fork() returns twice (once per process) with different values, copy-on-write makes fork() efficient by deferring memory copies until a write occurs, file descriptors are duplicated but can be redirected before exec(), and wait() is necessary to prevent zombie processes. Understanding these details will make you noticeably better at debugging systems, writing shell scripts, and designing robust server architectures.

For a deeper dive into the concepts introduced here, explore the Process Concept post to understand how the OS tracks processes internally, and the Process Scheduling post to see how the OS decides which process runs next.
