GraphQL vs REST: Choosing the Right API Paradigm
Compare GraphQL and REST APIs: when to use each approach, schema design, queries, mutations, and the trade-offs between the two paradigms.
Introduction
GraphQL came out of Facebook in 2012 and reached open source in 2015. The pitch: instead of multiple endpoints returning fixed data structures, you get one endpoint where the client asks for exactly what it needs.
That sounds simple, but it changes how you think about APIs. REST is resource-oriented — you fetch what the server decides. GraphQL is query-oriented — the client decides. Each model has strengths and trade-offs.
graph LR
A[Client] -->|"POST /graphql"| B[GraphQL Server]
B --> C[Schema]
C --> D[Resolvers]
D --> E[Data Sources]
The client specifies exactly what data it needs in the query. The server returns exactly that data, nothing more.
Core Concepts
REST: Multiple Endpoints
REST uses different endpoints for different resources:
# REST: Multiple endpoints
GET /users/123
GET /users/123/posts
GET /users/123/followers
Each endpoint returns a fixed data structure. If you need a user’s posts with follower counts, you might need multiple requests or get more data than necessary.
GraphQL: Single Endpoint
GraphQL uses a single endpoint with flexible queries:
# GraphQL: Single endpoint, flexible query
POST /graphql
query {
user(id: 123) {
name
posts {
title
}
followersCount
}
}
One request gets exactly the data you need.
Query Patterns
Queries
REST Queries
# REST: Get user and their posts
GET /users/123
GET /users/123/posts
GraphQL Queries
# GraphQL: One request, precise data
query GetUserWithPosts($userId: ID!) {
user(id: $userId) {
name
email
posts {
title
createdAt
}
}
}
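Sibling fields in a query may be resolved concurrently: the specification allows it, and JavaScript servers built on graphql-js start every sibling resolver without waiting for the others to finish. Top-level mutation fields are the exception and always run serially, in the order written.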
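The ordering rule for sibling fields can be illustrated with a minimal sketch. This is not graphql-js itself; `resolveFields` and `tracedResolver` are hypothetical helpers that only mimic the difference between concurrent query fields and serial mutation fields.

```javascript
// Sketch only: mimics the ordering rules, not a real GraphQL executor.
// `resolvers` maps field names to functions that may return promises.
async function resolveFields(resolvers, { serial }) {
  const result = {};
  const entries = Object.entries(resolvers);
  if (serial) {
    // Mutation-style: each top-level field waits for the previous one
    for (const [name, resolve] of entries) {
      result[name] = await resolve();
    }
  } else {
    // Query-style: start every resolver, then wait for all of them
    const values = await Promise.all(entries.map(([, resolve]) => resolve()));
    entries.forEach(([name], i) => {
      result[name] = values[i];
    });
  }
  return result;
}

// Builds a resolver that records when it starts and finishes
function tracedResolver(log, name, delayMs) {
  return () => {
    log.push(`start:${name}`);
    return new Promise((done) =>
      setTimeout(() => {
        log.push(`end:${name}`);
        done(name);
      }, delayMs),
    );
  };
}
```

With `serial: false`, a slow field does not delay the start of its siblings; with `serial: true`, each field starts only after the previous one has finished.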
Mutation Patterns
Mutations
REST Mutations
# REST: Create, update, delete with different HTTP methods
POST /users
PUT /users/123
DELETE /users/123
GraphQL Mutations
# GraphQL: Mutations are explicit
mutation CreateUser($input: CreateUserInput!) {
createUser(input: $input) {
id
name
email
}
}
mutation UpdateUser($id: ID!, $input: UpdateUserInput!) {
updateUser(id: $id, input: $input) {
id
name
email
}
}
mutation DeleteUser($id: ID!) {
deleteUser(id: $id)
}
Subscription Architecture
GraphQL Subscriptions Deep Dive
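GraphQL includes subscriptions as a first-class operation type for real-time updates. REST has no equivalent in the paradigm itself, so REST APIs typically rely on polling, webhooks, or a separate push channel such as Server-Sent Events.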
Subscription Protocol
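The GraphQL specification does not mandate a transport, but most servers deliver subscriptions over WebSockets (Server-Sent Events is a common alternative). The client subscribes once, and the server pushes updates: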
sequenceDiagram
Client->>Server: Subscribe to postCreated (WebSocket)
Server-->>Client: Confirm subscription
Server->>Database: Watch for changes
Database-->>Server: New data event
Server-->>Client: Push: { data: { postCreated: {...} } }
Server->>Database: Continue watching
Database-->>Server: Another event
Server-->>Client: Push: { data: { postCreated: {...} } }
Schema Definition
type Subscription {
postCreated: Post!
postUpdated(id: ID!): Post!
userJoined(roomId: ID!): User!
}
type Mutation {
createPost(input: CreatePostInput!): Post!
updatePost(id: ID!, input: UpdatePostInput!): Post!
joinRoom(roomId: ID!): Room!
}
type Query {
posts: [Post!]!
}
Subscription Resolver Implementation
import { PubSub } from 'graphql-subscriptions';
const pubsub = new PubSub();
// Define event names as constants
const POST_CREATED = 'POST_CREATED';
const POST_UPDATED = 'POST_UPDATED';
// Resolvers
// Note: graphql-subscriptions v3 renamed asyncIterator to asyncIterableIterator
const resolvers = {
  Subscription: {
    postCreated: {
      subscribe: () => pubsub.asyncIterator([POST_CREATED]),
    },
    postUpdated: {
      subscribe: (_, { id }) => pubsub.asyncIterator(`${POST_UPDATED}_${id}`),
    },
  },
  Mutation: {
    // Resolvers must be async because they await the database call
    createPost: async (_, { input }, { pubsub }) => {
      const post = await db.posts.create(input);
      // Publish the event to all subscribers
      pubsub.publish(POST_CREATED, { postCreated: post });
      return post;
    },
    updatePost: async (_, { id, input }, { pubsub }) => {
      const post = await db.posts.update(id, input);
      // Publish to specific post subscribers
      pubsub.publish(`${POST_UPDATED}_${id}`, { postUpdated: post });
      return post;
    },
  },
};
Subscription Production Patterns
Filtering Subscriptions
Not all clients should receive all updates. Filter by authorization, room membership, or other criteria:
type Subscription {
# Only gets events the user is authorized to see
documentUpdated(documentId: ID!): Document!
}
const resolvers = {
Subscription: {
documentUpdated: {
subscribe: async function* (_, { documentId }, context) {
// Check authorization
if (!await context.user.canView(documentId)) {
throw new Error('Not authorized');
}
// Create async generator that yields when matching events
const eventEmitter = context.documentEvents.filter(
event => event.documentId === documentId
);
for await (const event of eventEmitter) {
yield { documentUpdated: event.document };
}
},
},
},
};
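The filtering step can be isolated into a small, reusable async generator. The sketch below assumes nothing about the pub/sub library: `filterEvents`, `documentEvents`, and `collectUpdates` are hypothetical names, and the in-memory event stream stands in for whatever async iterator your server exposes.

```javascript
// Yields only the events that satisfy `predicate` (sync or async)
async function* filterEvents(source, predicate) {
  for await (const event of source) {
    if (await predicate(event)) yield event;
  }
}

// Hypothetical in-memory event stream standing in for a pub/sub iterator
async function* documentEvents() {
  yield { documentId: "doc-1", document: { title: "Draft" } };
  yield { documentId: "doc-2", document: { title: "Other" } };
  yield { documentId: "doc-1", document: { title: "Final" } };
}

// Collects the payloads a subscriber to `documentId` would receive
async function collectUpdates(documentId) {
  const payloads = [];
  const matching = filterEvents(
    documentEvents(),
    (event) => event.documentId === documentId,
  );
  for await (const event of matching) {
    payloads.push({ documentUpdated: event.document });
  }
  return payloads;
}
```

Keeping the predicate separate from the stream makes it easy to add further checks, such as re-validating authorization for each event.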
Production Subscription Architecture
graph TB
Client1[WebSocket Client] --> LB[Load Balancer]
Client2[WebSocket Client] --> LB
Client3[WebSocket Client] --> LB
LB --> Server1[GraphQL Server 1]
LB --> Server2[GraphQL Server 2]
Server1 --> RedisPubSub[Redis Pub/Sub]
Server2 --> RedisPubSub
RedisPubSub --> MessageBroker[Redis / RabbitMQ]
MessageBroker --> Server1
MessageBroker --> Server2
For multi-server deployments, use Redis Pub/Sub or a message broker:
import { RedisPubSub } from "graphql-redis-subscriptions";
const pubsub = new RedisPubSub({
connection: {
host: process.env.REDIS_HOST,
port: 6379,
retryStrategy: (times) => Math.min(times * 50, 2000),
},
});
// Use Redis-backed pub/sub for horizontal scaling
const resolvers = {
Subscription: {
postCreated: {
subscribe: () => pubsub.asyncIterator([POST_CREATED]),
},
},
};
Subscription Gotchas and Mitigations
| Issue | Problem | Mitigation |
|---|---|---|
| Memory leaks | Subscriptions hold server resources indefinitely | Cap subscriptions per connection; close idle connections; implement client heartbeat |
| Reconnection storms | Clients reconnect in burst after outage | Implement exponential backoff; deduplicate on reconnect |
| Authorization drift | User loses access but keeps subscription | Re-validate authorization periodically; emit “kicked” event |
| Query complexity | Subscription queries are as complex as regular queries | Apply same complexity limits; consider simplified subscription payloads |
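// The in-memory PubSub from graphql-subscriptions takes no limit or TTL options,
// so enforce limits in the connection layer. Sketch with hypothetical names:
const MAX_SUBSCRIPTIONS_PER_CONNECTION = 100;
const activeCounts = new WeakMap(); // connection object -> active subscription count
function registerSubscription(connection) {
  const count = activeCounts.get(connection) ?? 0;
  if (count >= MAX_SUBSCRIPTIONS_PER_CONNECTION) {
    throw new Error("Too many subscriptions on this connection");
  }
  activeCounts.set(connection, count + 1);
}
// Call registerSubscription at the top of each subscribe function
// and decrement the count when the subscription ends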
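The exponential backoff mitigation for reconnection storms can be sketched on the client side as follows. `reconnectDelay` and its default values are illustrative assumptions, not part of any GraphQL client library; the "full jitter" strategy shown here is one common choice among several.

```javascript
// Delay before reconnect attempt `attempt` (0-based), in milliseconds.
// Full jitter: pick a random delay between 0 and the capped exponential ceiling,
// so clients that disconnected together do not reconnect together.
function reconnectDelay(
  attempt,
  { baseMs = 500, maxMs = 30000, random = Math.random } = {},
) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

The `random` parameter is injectable only so the function can be tested deterministically; production code would leave it at the default.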
When to Use Subscriptions vs Polling vs Webhooks
| Approach | Latency | Scalability | Use Case |
|---|---|---|---|
| Subscriptions | Instant | Medium | UI updates, live collaboration |
| Polling | Poll interval | High | Infrequent updates, simple clients |
| Webhooks | Near-instant | High | Cross-service communication |
| Server-Sent Events | Near-instant | Medium | One-way server push, simpler than WS |
For most GraphQL use cases: subscriptions for real-time UI, REST webhooks for cross-service events.
Persisted Queries
Persisted Queries & Query Whitelisting
By default, GraphQL accepts any query string sent by clients. This flexibility is powerful but creates security and performance problems. Persisted queries solve both.
The Problem with Dynamic Queries
Every GraphQL request sends the full query string:
# Every request - even identical ones - sends the full query
POST /graphql
{
"query": "query GetUser($id: ID!) { user(id: $id) { name email posts { title } } }",
"variables": { "id": "123" }
}
This means:
- Security: Attackers can send complex or malicious queries
- Performance: Server parses and validates the same queries repeatedly
- Bandwidth: Large query strings are re-sent in full on every request
How Persisted Queries Work
Instead of sending the full query, clients send a hash that references a pre-registered query:
# Instead of full query...
POST /graphql
{ "query": "{ user(id: 123) { name email } }" }
# Client sends query ID (SHA-256 hash)
POST /graphql
{ "extensions": { "persistedQuery": { "version": 1, "sha256Hash": "a1b2c3d4e5f6..." } } }
The server looks up the hash and executes the pre-validated query.
Apollo Server Implementation
import { ApolloServer } from "@apollo/server";
// 1. Plugin that records every query the server executes (useful in development
//    to seed a registry). queryHash is the SHA-256 of the query text, the same
//    hash clients send for automatic persisted queries (APQ).
const createPersistedQueryPlugin = (queryRegistry) => ({
  async requestDidStart() {
    return {
      async didResolveOperation({ source, queryHash }) {
        if (!queryRegistry.has(queryHash)) {
          queryRegistry.set(queryHash, source);
        }
      },
    };
  },
});
// 2. Create server with plugin
const server = new ApolloServer({
  typeDefs,
  resolvers,
  plugins: [createPersistedQueryPlugin(queryRegistry)],
  // APQ is enabled by default: a client sends the hash and falls back to the
  // full query when the server does not recognize it.
  // Pass `persistedQueries: false` to turn APQ off entirely.
  // Apollo Server has no built-in option that rejects unregistered queries;
  // that requires a plugin (see Query Whitelisting).
});
Building the Query Registry
// Build-time: generate registry from client queries
// (run as part of your CI/CD build)
import fs from "node:fs";
import { createHash } from "node:crypto";
import { globSync } from "glob"; // glob v9 or later
import { parse } from "graphql";
const QUERIES_DIR = "./src/queries";
const REGISTRY_FILE = "./query-registry.json";
function buildQueryRegistry() {
  const registry = {};
  // Find all .graphql files
  const queryFiles = globSync(`${QUERIES_DIR}/**/*.graphql`);
  for (const file of queryFiles) {
    const content = fs.readFileSync(file, "utf-8");
    const document = parse(content); // fails the build on syntax errors
    // Hash the exact string the client will send. If the client prints or
    // transforms the document before hashing, apply the same step here.
    const hash = createHash("sha256").update(content).digest("hex");
    // Extract operation names
    const operationNames = document.definitions
      .filter((definition) => definition.kind === "OperationDefinition")
      .map((definition) => definition.name?.value || "anonymous");
    registry[hash] = { operationNames, file, query: content };
  }
  // Write registry to file (upload to CDN/deployment)
  fs.writeFileSync(REGISTRY_FILE, JSON.stringify(registry, null, 2));
  console.log(`Registered ${Object.keys(registry).length} queries`);
}
buildQueryRegistry();
Client-Side Query Integration
Client-Side Integration
// React Apollo Client (Apollo Client 3)
import { ApolloClient, InMemoryCache, HttpLink } from "@apollo/client";
import { createPersistedQueryLink } from "@apollo/client/link/persisted-queries";
import { sha256 } from "crypto-hash";
// Sends the hash first and retries with the full query if the server does not know it
const persistedLink = createPersistedQueryLink({ sha256 });
const httpLink = new HttpLink({ uri: "/graphql" });
// The persisted-query link must come before the terminating HTTP link
const link = persistedLink.concat(httpLink);
const client = new ApolloClient({
  cache: new InMemoryCache(),
  link,
});
Query Whitelisting (Strict Mode)
For maximum security, reject any query not in your pre-approved registry:
// Server-side: only allow registered queries
import { GraphQLError } from "graphql";
const server = new ApolloServer({
  // ... other config
  // Apollo Server has no built-in flag for this, so enforce the allowlist in a plugin
  plugins: [
    {
      async requestDidStart() {
        return {
          async didResolveOperation({ queryHash }) {
            // queryHash is the SHA-256 of the query text, whether the client
            // sent the full query or only a persisted-query hash
            if (!queryRegistry.has(queryHash)) {
              throw new GraphQLError(`Unknown query: ${queryHash}`, {
                extensions: { code: "QUERY_NOT_ALLOWED" }, // illustrative error code
              });
            }
          },
        };
      },
    },
  ],
});
Benefits Summary
| Benefit | Without Persisted Queries | With Persisted Queries |
|---|---|---|
| Security | Full query injection risk | Only pre-approved queries run |
| Parse overhead | Every request parsed | Parsed once at build time |
| Network | Full query string each request | 64-char hash only |
| CDN caching | Not cacheable | Persisted queries can use GET |
| Rate limiting | Hard to fingerprint | Per-query rate limits possible |
When to Use
| Use Case | Recommendation |
|---|---|
| Public API | Required—prevent abuse |
| Mobile apps | Highly recommended—bandwidth savings |
| Internal tools | Optional—still helps with parsing overhead |
| Development | Skip—full queries for flexibility |
Data Fetching and N+1 Problem
Data Fetching
Overfetching and Underfetching
REST often leads to overfetching (getting more data than needed) or underfetching (needing multiple requests):
# REST: Gets more data than needed
GET /users/123
# Returns: { id, name, email, created_at, updated_at, profile_url, bio, ... }
# REST: Multiple requests for related data
GET /users/123 # User info
GET /users/123/posts # User's posts
GET /posts/456/comments # Comments for a specific post
GraphQL solves both problems:
# GraphQL: Exact data
query {
user(id: 123) {
name # Only what you need
posts {
title # Only what you need
}
}
}
N+1 Problem
GraphQL can suffer from the N+1 problem: fetching a list of users, then making a separate request for each user’s posts:
# This could trigger many database queries
query {
users {
name
posts {
title # Triggers query for each user's posts
}
}
}
DataLoader solves this by batching requests.
DataLoader Patterns Deep Dive
The N+1 problem is GraphQL’s most notorious performance pitfall. DataLoader, a batching and caching utility that originated at Facebook, is the standard remedy: it coalesces many individual loads into a few database queries.
DataLoader Deep Dive
How DataLoader Works
DataLoader queues the individual load(key) calls made during a single tick of the event loop, then passes all of the collected keys to your batch function in one call.
import DataLoader from "dataloader";
// Create a batch function that fetches users by IDs
const userLoader = new DataLoader(async (ids) => {
// This runs once for all pending user lookups
const users = await db.users.findMany({ where: { id: { in: ids } } });
// DataLoader expects results in the same order as input IDs
const userMap = new Map(users.map((u) => [u.id, u]));
return ids.map((id) => userMap.get(id) || null);
});
// In your resolver
const resolvers = {
User: {
posts: (user, args, context) => context.postLoader.load(user.id),
},
Post: {
author: (post, args, context) => context.userLoader.load(post.authorId),
},
};
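To show the batching mechanism on its own, here is a deliberately tiny loader. It is a teaching sketch, not the DataLoader implementation: `TinyLoader` is a hypothetical class with no caching, and it schedules its dispatch with `queueMicrotask`, which is simpler than the scheduling the real library uses.

```javascript
// Toy loader: illustrates batching only (no caching, no deduplication)
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = []; // pending { key, resolve, reject }
  }

  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first load in this tick schedules one dispatch for the whole batch
      if (this.queue.length === 1) {
        queueMicrotask(() => this.dispatch());
      }
    });
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      // One call with every key collected so far; results must match key order
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}
```

Three `load` calls made synchronously, as sibling resolvers would make them, result in a single call to the batch function with all three keys.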
Batching vs Caching
DataLoader provides two distinct benefits:
// Batching: posts for many users requested in the same query
// Query: { users { posts { title } } }
// Only ONE batch call loads the posts for ALL users
const postsByAuthorLoader = new DataLoader(async (userIds) => {
  // Single query: SELECT * FROM posts WHERE authorId IN (...)
  const allPosts = await Post.findMany({ where: { authorId: { in: userIds } } });
  // Group by authorId
  const postsByAuthor = allPosts.reduce((acc, post) => {
    (acc[post.authorId] ||= []).push(post);
    return acc;
  }, {});
  // One array of posts per requested user, in the same order as userIds
  return userIds.map((id) => postsByAuthor[id] || []);
});
// Caching: the same key requested more than once on one loader
// Query: { posts { title author { name } } }
// Ten posts by the same author trigger one load for that author;
// the other nine lookups are served from the loader's cache
Common DataLoader Gotchas
1. Mismatched key types
// Problem: DataLoader compares cache keys with Map semantics,
// so "123" and 123 are different keys and the record is loaded twice
const loader = new DataLoader((keys) => batchLoad(keys));
loader.load("123"); // string key
loader.load(123); // number key: cache miss, second lookup
// Better: normalize every key to one type
const normalizedLoader = new DataLoader(
  (keys) => batchLoad(keys.map((k) => String(k))),
  { cacheKeyFn: (k) => String(k) },
);
2. Memoization vs database freshness
// DataLoader memoizes for the lifetime of the loader instance.
// Create loaders per request, or add a TTL layer if a loader must outlive one request.
// (For a cache shared across servers, back this Map with Redis or Memcached.)
const createPersistentLoader = (batchFn, ttlMs = 60000) => {
  const cache = new Map(); // key -> { value, timestamp }
  return new DataLoader(
    async (keys) => {
      const now = Date.now();
      // Keys without a fresh cache entry need a real batch load
      const uncachedKeys = keys.filter((key) => {
        const cached = cache.get(key);
        return !cached || now - cached.timestamp >= ttlMs;
      });
      if (uncachedKeys.length > 0) {
        const loaded = await batchFn(uncachedKeys);
        loaded.forEach((value, i) => {
          cache.set(uncachedKeys[i], { value, timestamp: now });
        });
      }
      // Every key now has an entry; return values in input order
      return keys.map((key) => cache.get(key).value);
    },
    { cache: false }, // disable DataLoader's own memoization so the TTL is honored
  );
};
3. Handling partial failures in batches
// Some IDs fail, some succeed - handle gracefully
const userLoader = new DataLoader(async (ids) => {
const users = await User.findMany({ where: { id: { in: ids } } });
const userMap = new Map(users.map((u) => [u.id, u]));
return ids.map((id) => {
const user = userMap.get(id);
if (!user) {
// Return Error for missing, not null
// This preserves the error in GraphQL response
return new Error(`User ${id} not found`);
}
return user;
});
});
DataLoader with Different Data Sources
// REST API as a data source (still one HTTP call per ID; true batching needs a bulk endpoint)
const remoteServiceLoader = new DataLoader(async (ids) => {
const responses = await Promise.all(
ids.map((id) => fetch(`/api/users/${id}`).then((r) => r.json())),
);
return responses;
});
// MongoDB with aggregation pipeline
const mongoLoader = new DataLoader(async (objectIds) => {
const results = await User.aggregate([
{ $match: { _id: { $in: objectIds } } },
{
$lookup: {
from: "posts",
localField: "_id",
foreignField: "authorId",
as: "posts",
},
},
]);
const resultMap = new Map(results.map((r) => [r._id.toString(), r]));
return objectIds.map((id) => resultMap.get(id.toString()) || null);
});
Error Handling
Error Handling Overview
REST Errors
REST uses HTTP status codes:
HTTP/1.1 404 Not Found
Content-Type: application/json
{"error": "User not found"}
GraphQL Errors
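Most GraphQL servers return 200 OK even when a field fails, because a single response can carry partial data alongside errors. The errors are reported in the response body: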
{
"data": null,
"errors": [
{
"message": "User not found",
"locations": [{ "line": 3, "column": 5 }],
"path": ["user"]
}
]
}
This is controversial. Some prefer HTTP status codes for errors.
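Because the HTTP status alone does not signal failure, clients usually inspect the body. A minimal sketch of that check follows; `GraphQLResponseError` and `unwrapGraphQLResponse` are hypothetical names, not part of any client library.

```javascript
// Thrown when a GraphQL response carries an `errors` array
class GraphQLResponseError extends Error {
  constructor(errors, data) {
    super(errors.map((e) => e.message).join("; "));
    this.name = "GraphQLResponseError";
    this.errors = errors;
    this.data = data; // partial data may still be present
  }
}

// Returns `data`, or throws if the body reports errors despite a 200 status
function unwrapGraphQLResponse(body) {
  if (Array.isArray(body.errors) && body.errors.length > 0) {
    throw new GraphQLResponseError(body.errors, body.data ?? null);
  }
  return body.data;
}
```

Throwing on any error is the strictest policy. An application that can render partial data might instead return both `data` and `errors` and let the caller decide.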
Caching Strategies
Effective caching is critical for performance in both REST and GraphQL, but each requires different strategies.
REST Caching
REST works well with HTTP caching:
GET /users/123
Cache-Control: max-age=3600
ETag: "v1"
CDNs, browser caches, and libraries like React Query handle REST caching well.
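The ETag mechanism boils down to a conditional check on the server. The sketch below is framework-agnostic and simplified: `conditionalResponse` is a hypothetical helper, it compares tags as plain strings, and it does not implement weak validators (the `W/` prefix).

```javascript
// Decides between 200 and 304 for a conditional GET.
// `ifNoneMatch` is the raw If-None-Match request header (may be undefined).
function conditionalResponse(ifNoneMatch, currentEtag, body) {
  const headers = { ETag: currentEtag, "Cache-Control": "max-age=3600" };
  const candidates = (ifNoneMatch ?? "").split(",").map((tag) => tag.trim());
  if (candidates.includes(currentEtag) || candidates.includes("*")) {
    return { status: 304, headers, body: null }; // client copy is still valid
  }
  return { status: 200, headers, body };
}
```

A 304 response carries no body, which is where the bandwidth saving comes from once the `max-age` window has expired.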
GraphQL Caching
GraphQL POST requests are harder to cache by default. Solutions:
- Normalized caching with Apollo Client or Relay
- Persisted queries that become GET requests
- Response caching at the CDN level
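To make the "persisted queries that become GET requests" option concrete, the following sketch builds such a URL using the same `extensions.persistedQuery` shape shown earlier. `persistedQueryUrl` is a hypothetical helper and the endpoint is a placeholder; the hash is assumed to be the SHA-256 hex digest of the query text.

```javascript
// Builds a cacheable GET URL for a persisted query.
// The URL is stable for a given hash and variables, so a CDN can key on it.
function persistedQueryUrl(endpoint, hash, variables = {}) {
  const params = new URLSearchParams({
    extensions: JSON.stringify({
      persistedQuery: { version: 1, sha256Hash: hash },
    }),
    variables: JSON.stringify(variables),
  });
  return `${endpoint}?${params.toString()}`;
}
```

Only queries should be sent this way. Mutations change state and belong in POST requests, which caches will not store.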
Client-Side Caching Strategies
Server-side caching for GraphQL is tricky because POST requests with dynamic queries don’t benefit from URL-based caching. Client-side caching becomes essential.
Apollo Client Cache
Apollo Client 3 uses a normalized in-memory cache with automatic cache updates:
import { ApolloClient, InMemoryCache, HttpLink, makeVar } from '@apollo/client';
// Reactive variables for local state
export const cartItemsVar = makeVar<string[]>([]);
// Configure normalized cache
const cache = new InMemoryCache({
  typePolicies: {
    // Customize field-level read/write
    User: {
      fields: {
        // Merge paginated posts into one list
        posts: {
          keyArgs: false, // Ignore pagination args so every page shares one cache entry
          merge(existing = { items: [] }, incoming) {
            // Cursor-based pagination merge
            return {
              ...incoming,
              items: [...existing.items, ...incoming.items],
              cursor: incoming.cursor,
            };
          },
        },
        // Real-time: update cache when subscription fires
        notifications: {
          merge(existing, incoming) {
            return incoming; // Replace on each notification
          },
        },
      },
    },
    Query: {
      fields: {
        // Cache one result list per distinct search string
        searchUsers: {
          keyArgs: ['query'],
          merge(existing, incoming) {
            return incoming; // Keep only the most recent result for that string
          },
        },
      },
    },
  },
});
const link = new HttpLink({ uri: '/graphql' });
const client = new ApolloClient({ cache, link });
Cache Normalization
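Apollo normalizes responses by default: any object with a __typename and an id (or _id) is stored once and referenced everywhere it appears. Customize the cache ID only when your types use different keys:
import { InMemoryCache, defaultDataIdFromObject } from "@apollo/client";
const cache = new InMemoryCache({
  // dataIdFromObject is deprecated in Apollo Client 3;
  // prefer typePolicies with keyFields in new code
  dataIdFromObject: (object) => {
    switch (object.__typename) {
      case "User":
        return `User:${object.id}`;
      case "Post":
        return `Post:${object.id}`;
      default:
        // Fall back to the default so IDs from different types cannot collide
        return defaultDataIdFromObject(object);
    }
  },
});
// Because each entity is stored once, the cache handles updates automatically:
// if Post with id:123 is updated via mutation,
// all queries showing that post update automatically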
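What normalization does to a response can be sketched in a few lines. This is a simplified illustration, not Apollo's implementation: `normalize` is a hypothetical function, and it assumes every entity carries `__typename` and `id` fields.

```javascript
// Flattens a nested response into a store keyed by `__typename:id`.
// Nested entities are replaced by { __ref } pointers, so each entity lives in one place.
function normalize(value, store = {}) {
  if (Array.isArray(value)) {
    return value.map((item) => normalize(item, store));
  }
  if (value === null || typeof value !== "object") {
    return value; // scalars are stored as-is
  }
  const fields = {};
  for (const [key, child] of Object.entries(value)) {
    fields[key] = normalize(child, store);
  }
  if (value.__typename && value.id != null) {
    const cacheId = `${value.__typename}:${value.id}`;
    store[cacheId] = { ...store[cacheId], ...fields }; // merge with earlier sightings
    return { __ref: cacheId };
  }
  return fields; // objects without an identity stay embedded in their parent
}
```

Pass an empty object as `store` and inspect it afterwards: a user that appears both at the top level and as a post's author ends up as a single `User:1` entry, which is why one write updates every view of that entity.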
Advanced Caching Patterns
Relay Cursor Connections
For paginated data, the Relay Cursor Connections specification provides a standardized pagination shape:
# Standardized connection pagination
query {
user(id: "123") {
postsConnection(first: 10, after: "cursor123") {
edges {
node {
id
title
}
cursor
}
pageInfo {
hasNextPage
endCursor
}
}
}
}
// Apollo ships a field policy helper for Relay-style connections
import { InMemoryCache, useQuery } from "@apollo/client";
import { relayStylePagination } from "@apollo/client/utilities";
const cache = new InMemoryCache({
  typePolicies: {
    User: {
      fields: {
        // Merges edges and pageInfo across pages
        postsConnection: relayStylePagination(),
      },
    },
  },
});
// Pagination query
const { data, fetchMore } = useQuery(POSTS_QUERY, {
  variables: { first: 10 },
});
// Load more: the field policy merges the new page into the cache,
// so no updateQuery callback is needed
const loadMore = () =>
  fetchMore({
    variables: {
      after: data.user.postsConnection.pageInfo.endCursor,
    },
  });
Cache Invalidation Patterns
GraphQL cache invalidation is trickier than REST because the server doesn’t know what’s cached:
// Pattern 1: Cache as source of truth - mutate cache directly
const [createPost] = useMutation(CREATE_POST_MUTATION, {
update: (cache, { data: { createPost } }) => {
// Read existing query
const existing = cache.readQuery({ query: USER_POSTS_QUERY });
// Write new data to cache
cache.writeQuery({
query: USER_POSTS_QUERY,
data: {
user: {
...existing.user,
posts: [createPost, ...existing.user.posts],
},
},
});
},
});
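// Pattern 2: Refetch after the mutation
const [deletePost] = useMutation(DELETE_POST_MUTATION, {
  // Re-run the affected query once the mutation completes
  refetchQueries: [{ query: USER_POSTS_QUERY }],
  // Resolve the mutation promise only after the refetch finishes
  awaitRefetchQueries: true,
  // To evict instead of refetching, call cache.evict({ id: cache.identify(post) })
  // followed by cache.gc() inside an update function
});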
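// Pattern 3: Using @client directive for local-only fields
// @client is applied to fields in a query, not in the server schema
import { gql, makeVar, InMemoryCache, ApolloClient, useQuery } from "@apollo/client";
const isOnlineVar = makeVar(false);
const cache = new InMemoryCache({
  typePolicies: {
    User: {
      fields: {
        // Local-only field backed by a reactive variable
        isOnline: { read: () => isOnlineVar() },
      },
    },
  },
});
const client = new ApolloClient({ cache });
const GET_USER = gql`
  query GetUser($id: ID!) {
    user(id: $id) {
      name
      isOnline @client
    }
  }
`;
// Read local state alongside server data
const { data } = useQuery(GET_USER, { variables: { id: "123" } });
// Update local state directly
isOnlineVar(true);
// Queries that read isOnline re-render automatically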
Cache Performance Tips
| Pattern | Use When | Benefit |
|---|---|---|
| keyArgs: ['field'] | Multiple similar queries | Cache reused across components |
| merge(existing, incoming) | Pagination | Append to lists correctly |
| read function | Transform data | Format dates, combine fields client-side |
| @client directive | Local-only state | No server roundtrip for UI state |
| cache-and-network fetch policy | Real-time updates | Show cached immediately, update from server |
Comparison Table
| Aspect | REST | GraphQL |
|---|---|---|
| Data fetching | Multiple requests | Single request |
| Data shape | Fixed per endpoint | Client specifies |
| Typing | Documentation | Schema enforced |
| Caching | HTTP caching | Custom caching |
| Learning curve | Lower | Higher |
| Tooling | Mature | Evolving |
| Error handling | HTTP status codes | 200 + error body |
| Overfetching | Common | Avoided |
Combining Both
You do not have to choose one exclusively. Some teams use REST for simple operations and GraphQL for complex data requirements.
# REST for simple operations
GET /health
GET /config
# GraphQL for complex data
POST /graphql
Schema Federation
As GraphQL usage scales across multiple teams, a single monolithic schema becomes unwieldy. Schema Federation and Schema Stitching are approaches for composing distributed GraphQL schemas.
Schema Federation (Apollo)
Federation decomposes a schema into independent subgraphs that can be developed and deployed separately:
graph TB
Client[GraphQL Client] --> Gateway[Apollo Gateway]
Gateway --> Users[Users Subgraph]
Gateway --> Orders[Orders Subgraph]
Gateway --> Products[Products Subgraph]
Users --> UsersDB[(Users DB)]
Orders --> OrdersDB[(Orders DB)]
Products --> ProductsDB[(Products DB)]
// products subgraph - src/subgraphs/products.ts
// (Federation 1 syntax; Federation 2 schemas opt in with @link)
import gql from 'graphql-tag';
import { buildSubgraphSchema } from '@apollo/subgraph';
const typeDefs = gql`
  type Product @key(fields: "id") {
    id: ID!
    name: String!
    price: Float!
  }
  extend type Query {
    product(id: ID!): Product
    productsOnSale: [Product!]!
  }
`;
const resolvers = {
  Product: {
    // Called by the gateway when another subgraph references a Product by its key
    __resolveReference: (reference) => getProductById(reference.id),
  },
  Query: {
    product: (_, { id }) => getProductById(id),
    productsOnSale: () => getProductsOnSale(),
  },
};
export const productsSchema = buildSubgraphSchema({ typeDefs, resolvers });
// users subgraph extends Product with reviews
// src/subgraphs/users.ts
import gql from 'graphql-tag';
import { buildSubgraphSchema } from '@apollo/subgraph';
const typeDefs = gql`
  # Product is owned by the products subgraph; this subgraph adds reviews to it
  extend type Product @key(fields: "id") {
    id: ID! @external
    reviews: [Review!]!
  }
  type Review @key(fields: "id") {
    id: ID!
    rating: Int!
    comment: String!
    author: User!
  }
  type User @key(fields: "id") {
    id: ID!
    name: String!
    reviews: [Review!]!
  }
`;
// Resolvers for Product.reviews, Review.author, and User.reviews omitted for brevity
export const usersSchema = buildSubgraphSchema({ typeDefs });
Schema Stitching (Legacy)
Stitching merges schemas at the boundary level, often with custom logic:
import { stitchSchemas } from "@graphql-tools/stitch";
import { delegateToSchema } from "@graphql-tools/delegate";
import { makeExecutableSchema } from "@graphql-tools/schema";
const usersSchema = makeExecutableSchema({
  typeDefs: usersTypeDefs,
  resolvers: usersResolvers,
});
const ordersSchema = makeExecutableSchema({
  typeDefs: ordersTypeDefs,
  resolvers: ordersResolvers,
});
// Merge schemas
const stitchedSchema = stitchSchemas({
  subschemas: [usersSchema, ordersSchema],
  // Link the schemas: add User.orders at the gateway
  // (assumes ordersSchema defines Order and Query.ordersByUser)
  typeDefs: `
    extend type User {
      orders: [Order!]!
    }
  `,
  // Custom resolvers for cross-schema references
  resolvers: {
    User: {
      orders: {
        selectionSet: `{ id }`, // make sure user.id is fetched
        resolve(user, args, context, info) {
          return delegateToSchema({
            schema: ordersSchema,
            operation: "query",
            fieldName: "ordersByUser",
            args: { userId: user.id },
            context,
            info,
          });
        },
      },
    },
  },
});
Federation vs Stitching
| Aspect | Federation | Stitching |
|---|---|---|
| Architecture | Gateway + autonomous subgraphs | Centralized merged schema |
| Schema ownership | Teams own their types | Central team owns merged schema |
| Deployment | Independent subgraph deploys | Full redeploy on changes |
| Query planning | Gateway routes to subgraphs | Stitching layer plans queries |
| @key directive | Entities shared across subgraphs | Type merging for shared types |
| Production maturity | Battle-tested at scale (Netflix, Airbnb) | More complex, less common now |
When to Use Federation
Federation shines when:
- Multiple teams own different domain areas
- Services are already microservices
- Independent deployment is important
- You want clear ownership boundaries
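# Example: User service owns User type, can reference Product
type User @key(fields: "id") {
  id: ID!
  name: String!
  # Products this user has purchased - reference to Product entity.
  # The resolver returns { __typename: "Product", id } stubs; the gateway
  # fetches the remaining Product fields from the products subgraph.
  purchasedProducts: [Product!]!
}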
Hybrid Pattern: REST + GraphQL Federation
You can bring REST APIs into a federated graph by wrapping each one in a thin GraphQL subgraph:
import { ApolloGateway, IntrospectAndCompose } from "@apollo/gateway";
const gateway = new ApolloGateway({
  supergraphSdl: new IntrospectAndCompose({
    subgraphs: [
      { name: "users", url: "http://users-service/graphql" },
      { name: "products", url: "http://products-service/graphql" },
      // Thin GraphQL subgraph whose resolvers call the inventory REST API
      { name: "inventory", url: "http://inventory-service/graphql" },
    ],
  }),
});
// The gateway only talks to GraphQL subgraphs. Inside the inventory subgraph,
// resolvers wrap the REST API (for example with RESTDataSource from
// @apollo/datasource-rest) at http://inventory-service/api
When to Choose REST vs GraphQL
When to Choose REST
REST works well when:
- Your API is simple with predictable data requirements
- You need HTTP caching
- You are building public APIs consumed by many clients
- Your team is familiar with REST
- You need simple documentation (just list endpoints)
Examples: CRUD applications, public APIs for third-party developers
When to Choose GraphQL
GraphQL works well when:
- Clients need different data shapes
- Mobile apps need minimal data transfer
- The domain is complex, with many related entities
- Frontend teams iterate rapidly
- You want strong typing and schema validation
Examples: Mobile apps, complex dashboards, microservices with varying client needs
Trade-off Analysis
Decision Framework
| Factor | REST | GraphQL |
|---|---|---|
| Data Fetching | Multiple endpoints; over/underfetching common | Single endpoint; exact data shapes |
| Caching | Native HTTP caching; CDN-friendly | Custom client-side caching; persisted queries for CDN |
| Type Safety | No enforced schema; OpenAPI optional | Strongly typed schema; self-documenting |
| Learning Curve | Simpler for beginners; familiar HTTP concepts | Steeper; requires understanding of queries and resolvers |
| Tooling | Mature ecosystem; standard HTTP debugging | GraphQL-specific IDEs; introspection for discovery |
| Performance | Predictable; caching reduces load | N+1 risk without DataLoader; complex query optimization |
| Real-time | Polling or webhooks; not native | Native subscriptions via WebSockets |
| Security | Endpoint-based rate limiting; familiar patterns | Query complexity limits; introspection control |
| Schema Evolution | Versioned APIs; breaking changes managed | Additive changes; @deprecated directive |
| Team Ownership | Endpoint-per-team; clear boundaries | Schema ownership; federation for scaling |
Coexistence Pattern
Many organizations use both. A pragmatic approach:
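// api/index.js
import express from "express";
import { graphql } from "graphql";
import { schema } from "./schema"; // assumed module exporting the executable schema
const app = express();
// REST for simple, stable resources
app.use("/api/users", usersRestRouter);
app.use("/api/health", healthCheckRouter);
// GraphQL for complex, client-driven data
app.use("/api/graphql", graphqlMiddleware);
// Bridge: REST endpoints execute GraphQL queries internally
app.get("/api/users/:id/summary", async (req, res) => {
  const result = await graphql({
    schema,
    source: `query UserSummary($id: ID!) {
      user(id: $id) { name email role }
    }`,
    variableValues: { id: req.params.id },
  });
  // Translate GraphQL errors back into an HTTP status for REST clients
  if (result.errors) {
    return res.status(500).json({ error: result.errors[0].message });
  }
  res.json(result.data);
});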
When to Migrate
| Signal | Action |
|---|---|
| Mobile app requests 15+ REST calls per screen | Consider GraphQL |
| Frontend team blocked by backend endpoint pace | GraphQL gives frontend autonomy |
| Complex data graphs with varying client needs | GraphQL shines here |
| Heavy HTTP caching requirements | REST likely sufficient |
| Simple CRUD with predictable data shapes | REST is probably fine |
| Third-party public API | REST for stability and caching |
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Query complexity explosion | Server overwhelmed; possible DoS | Implement query depth limiting; set complexity budgets |
| N+1 query problem | Database flooded with queries; slow responses | Use DataLoader for batching; optimize resolvers |
| Schema introspection abuse | Information leakage; enumeration attacks | Disable introspection in production; restrict access |
| Subscription memory leaks | Server memory grows; eventual crash | Set subscription limits; implement timeouts |
| Mutation race conditions | Data inconsistency between related operations | Implement optimistic locking; use transactions |
| Persisted query abuse | With automatic persisted queries, attackers can register arbitrary expensive queries | Accept only hashes from a build-time allowlist; rate limit |
| Error masking with 200 OK | Errors hidden; debugging difficult | Return proper error status codes; use extensions |
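As a rough illustration of the depth-limiting mitigation, the sketch below estimates nesting depth by counting braces outside string literals. It is intentionally crude, and `estimateDepth` and `assertDepthWithin` are hypothetical names: input-object arguments inflate the count, and fragment spreads are not expanded, so a production limiter should walk the parsed query AST instead.

```javascript
// Crude nesting-depth estimate: tracks `{` nesting outside string literals
function estimateDepth(query) {
  let depth = 0;
  let maxDepth = 0;
  let inString = false;
  for (let i = 0; i < query.length; i++) {
    const ch = query[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
      maxDepth = Math.max(maxDepth, depth);
    } else if (ch === "}") {
      depth--;
    }
  }
  return maxDepth;
}

// Rejects a query before execution when it nests deeper than `limit`
function assertDepthWithin(query, limit) {
  const depth = estimateDepth(query);
  if (depth > limit) {
    throw new Error(`Query depth ${depth} exceeds limit ${limit}`);
  }
  return depth;
}
```

Running the check before parsing and validation keeps the cost of rejecting an abusive query low.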
Observability Checklist
Metrics
- Query rate by operation type (query/mutation/subscription)
- Query complexity distribution
- Request duration by operation and field
- Error rate by error type
- DataLoader batch size and cache hit ratio
- Schema introspection requests
- Subscription active count
- Query depth and breadth distribution
Logs
- Query requests with variables and complexity
- Mutation requests with authorization context
- DataLoader batch operations and cache misses
- Error responses with path and locations
- Schema change events
- Subscription lifecycle events
- Security events (introspection attempts, rate limit hits)
Alerts
- Query complexity exceeds threshold
- Error rate exceeds normal baseline
- DataLoader cache hit ratio drops below 80%
- Subscription count exceeds limits
- Unusual introspection activity
- Query depth spikes sharply (a possible attack)
Security Checklist
- Disable schema introspection in production
- Implement query complexity limits and depth limits
- Use persisted queries so that only pre-registered operations can run
- Validate and sanitize all variable inputs
- Implement proper authorization at resolver level
- Log and monitor unusual query patterns
- Rate limit queries per client
- Protect against batching attacks (arrays of operations in one request, or many aliased copies of one field in one query)
- Use query allowlisting for sensitive operations
- Validate that mutations affect only intended fields
- Implement request timeout at GraphQL layer
- Do not expose internal error details in responses
Common Pitfalls / Anti-Patterns
Overusing GraphQL for Simple Cases
GraphQL adds a schema, resolvers, and query validation to every request. For simple, fixed-shape data, a plain REST endpoint is often the better choice.
# Overkill: GraphQL for simple, predictable data
query {
healthCheck {
status
}
}
# Better: Simple REST endpoint
GET /health
Ignoring N+1 Queries
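GraphQL makes N+1 problems easy to create, because each nested field is resolved independently.
# Problem: a naive posts resolver runs one query per user
query {
  users {
    name
    posts {
      title
    } # N queries for N users
  }
}
# Better: the query itself is unchanged; the fix is server-side.
# The posts resolver loads through DataLoader, which batches the lookups.
query {
  users {
    name
    posts {
      title
    } # One batched query for all users
  }
}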
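The server-side batching behind that fix can be sketched as follows. `TinyLoader` is a simplified stand-in written for this example, not the real DataLoader library, and `postsByUser`, `batchLoadPosts`, and `resolvePostsForUsers` are hypothetical names. The sketch only coalesces `load` calls made in the same synchronous pass; the real library schedules its batches more carefully.

```javascript
// Simplified stand-in for the DataLoader pattern (not the real library).
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => Promise of values, same order as keys
    this.cache = new Map(); // key -> Promise, intended to live for one request
    this.queue = [];        // pending { key, resolve, reject }
  }

  load(key) {
    if (this.cache.has(key)) return this.cache.get(key);
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first queued key schedules one dispatch after the current synchronous work.
      if (this.queue.length === 1) queueMicrotask(() => this.dispatch());
    });
    this.cache.set(key, promise);
    return promise;
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => {
        // An Error in a slot fails only the caller that asked for that key.
        if (values[i] instanceof Error) item.reject(values[i]);
        else item.resolve(values[i]);
      });
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}

// Hypothetical data source that counts how many "queries" it receives.
const postsByUser = { 1: ["Hello"], 2: ["GraphQL", "REST"], 3: [] };
let queryCount = 0;

async function batchLoadPosts(userIds) {
  queryCount += 1; // one query for the whole batch, e.g. WHERE user_id IN (...)
  return userIds.map((id) => postsByUser[id] || []);
}

// A posts resolver would call loader.load(user.id) once per user.
function resolvePostsForUsers(userIds) {
  const loader = new TinyLoader(batchLoadPosts); // create one loader per request
  return Promise.all(userIds.map((id) => loader.load(id)));
}
```

Calling `resolvePostsForUsers([1, 2, 3])` triggers a single call to `batchLoadPosts`, whereas a naive resolver would hit the data source once per user.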
Not Implementing Proper Error Handling
Most GraphQL servers return 200 OK even when the response contains errors.
// Problem: Error masked as success
{
  "data": { "user": null },
  "errors": [{ "message": "Not authorized" }]
}
// Better: add a machine-readable code in extensions (and a non-2xx status when no data comes back)
{
  "errors": [{
    "message": "Not authorized",
    "extensions": { "code": "UNAUTHORIZED" }
  }]
}
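One way to turn the error code into something HTTP-aware tooling can see is to pick the status from the response body before sending it. The sketch below is an assumption-heavy illustration: the `STATUS_BY_CODE` table and the `httpStatusFor` helper are invented for this example, and the rule "keep 200 whenever any field resolved" is one possible policy, not a standard.

```javascript
// Hypothetical mapping from extensions.code to an HTTP status; adjust to your own codes.
const STATUS_BY_CODE = {
  UNAUTHENTICATED: 401,
  UNAUTHORIZED: 403,
  BAD_USER_INPUT: 400,
};

// Chooses an HTTP status for a GraphQL response body of the shape { data, errors }.
function httpStatusFor(result) {
  const errors = result.errors || [];
  if (errors.length === 0) return 200;

  // Partial success: at least one top-level field resolved, so keep 200
  // and let clients inspect the errors array.
  const hasData =
    result.data != null &&
    Object.values(result.data).some((value) => value !== null);
  if (hasData) return 200;

  // Nothing resolved: derive the status from the first error's code.
  const code = errors[0].extensions && errors[0].extensions.code;
  return STATUS_BY_CODE[code] || 500;
}
```

With this policy, a response whose only content is an `UNAUTHORIZED` error leaves the server as a 403, while a response with partial data still returns 200 and carries its errors in the body.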
Exposing Schema Internals
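Introspection lets any client list every type, field, and argument in your schema.
// Disable introspection in production
import { ApolloServer } from "apollo-server"; // Apollo Server 2.x; the playground option was removed in 3.x
const isProduction = process.env.NODE_ENV === "production";
const server = new ApolloServer({
  schema,
  introspection: !isProduction, // off in production
  playground: !isProduction, // off in production
});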
Interview Questions
Fundamentals
The N+1 problem occurs when fetching a list of items triggers separate requests for each item's related data. For example, querying 100 users and their posts makes 1 query for users plus 100 queries for posts.
DataLoader solves this by:
- Batching: Collecting all requested IDs during query execution, then loading them in a single database query
- Caching: Memoizing results within a request to avoid duplicate loads
- Queuing: Deferring loads made in the same tick of the event loop, so resolvers that call the loader independently are coalesced into one batch
Queries are idempotent reads—they don't modify data. Mutations are operations that cause side effects and modify server state.
Naming conventions:
- Queries as nouns (user, posts) — you "ask for" data
- Mutations as verbs (createUser, updatePost) — you "command" an action
This convention makes it immediately clear in tooling and documentation which operations modify state.
GraphQL servers conventionally return 200 OK even when the response contains errors, because HTTP and GraphQL live on separate layers. A query might partially succeed: some fields resolve, others don't. Returning 200 with both data and errors preserves that nuance.
It trips up traditional monitoring, though. HTTP-aware tools expect non-2xx for failures, not a 200 with an error body. And authorization errors masquerading as successes are genuinely unsettling.
Best practice: put error codes in extensions so clients can detect failures programmatically.
Design & Architecture
GraphQL subscriptions typically run over WebSockets, which give a long-lived, bidirectional connection. The server pushes data when relevant events occur.
Comparison:
- Subscriptions: Instant push, long-lived connections, best for UI updates
- Polling: Simple, HTTP-based, predictable load—good for infrequent updates
- Webhooks: HTTP callbacks from one server to another, standard for cross-service events
Many systems combine all three for different use cases.
Schema stitching merges multiple schemas into one at a gateway layer. The gateway acts as a unified API surface.
Federation (Apollo) decomposes schema ownership—each subgraph owns its types and the gateway routes queries to the right subgraph. Federation is more scalable and team-friendly.
Stitching is older, more complex to maintain, and less commonly used in new projects.
Persisted queries replace full query strings with SHA-256 hashes sent from client to server.
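Security: When the server accepts only pre-registered hashes, arbitrary queries cannot run, which blocks ad hoc expensive queries and schema probing. Automatic persisted queries, where clients register new queries at runtime, do not give this guarantee.
Performance: Queries can be validated at build time, so the server can skip parsing and validation for known hashes. The network payload shrinks from the full query text to a 64-character hex hash.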
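The hash-and-lookup step can be sketched with Node's built-in crypto module. The helper names (`sha256Hex`, `buildAllowlist`, `resolvePersistedQuery`) are made up for this example, and the allowlist here is a plain in-memory Map; a real deployment would more likely generate it at build time and store it somewhere shared.

```javascript
const { createHash } = require("node:crypto");

// SHA-256 of the query text, hex-encoded (always 64 characters).
function sha256Hex(query) {
  return createHash("sha256").update(query, "utf8").digest("hex");
}

// Built ahead of time from the queries the client bundle actually contains.
function buildAllowlist(queries) {
  return new Map(queries.map((query) => [sha256Hex(query), query]));
}

// Returns the stored query text, or null when the hash was never registered.
function resolvePersistedQuery(allowlist, hash) {
  return allowlist.get(hash) || null;
}
```

The client sends only the hash; the server looks it up and rejects the request when `resolvePersistedQuery` returns null.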
Apollo's normalized cache stores each entity under a key built from its __typename and id, such as User:123, so updating that entity automatically updates every cached query that references it. This prevents cache inconsistency.
Use normalized cache when:
- Multiple components show the same entity
- Mutations update entities that appear in cached queries
- You want automatic cache updates without manual refetchQueries
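The idea behind normalization can be shown with a toy cache. This is a deliberately small sketch, not how Apollo Client's InMemoryCache is implemented: the `NormalizedCache` class and its methods are hypothetical, and it handles only flat entities with an `id` field.

```javascript
// Toy illustration of a normalized cache; names and structure are assumptions.
class NormalizedCache {
  constructor() {
    this.entities = new Map(); // "User:123" -> merged fields
  }

  // Cache key built from the type name and id.
  static keyOf(entity) {
    return `${entity.__typename}:${entity.id}`;
  }

  // Merges the entity into what is already cached and returns a reference to it.
  write(entity) {
    const key = NormalizedCache.keyOf(entity);
    this.entities.set(key, { ...(this.entities.get(key) || {}), ...entity });
    return { __ref: key };
  }

  // Resolves a reference to the current state of the entity.
  read(ref) {
    return this.entities.get(ref.__ref);
  }
}
```

Two query results that mention the same user end up holding references to one entry, so a mutation that writes a new name is visible through both references without refetching either query.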
REST wins for public APIs because HTTP caching is built in, debugging is straightforward, and providers can rate-limit per endpoint. GraphQL gives clients flexibility to fetch exactly what they need with a self-documenting schema, but that flexibility comes at a cost: no built-in HTTP caching, a steeper learning curve, and harder debugging.
For diverse third-party consumers, REST is usually the safer bet. For internal tools or complex client-driven UIs, GraphQL pays off.
Advanced & Production
To defend against malicious or expensive queries, stack several approaches:
- Query depth limits stop deeply nested attacks
- Complexity analysis assigns costs to fields—reject anything too expensive
- Persisted queries mean only pre-registered operations run
- Rate limiting per client prevents abuse
- Timeouts kill queries that run too long
- Turn off introspection in production so attackers can't enumerate your schema
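The first two items in that list, depth limits and complexity analysis, can be sketched on a simplified selection tree. The `{ name, selections }` shape, the `fieldCosts` table, and the function names are assumptions for this example; a real server would walk the parsed query AST and handle fragments, arguments, and list multipliers.

```javascript
// Depth of a selection tree: a leaf field counts as depth 1.
function depthOf(field) {
  const children = field.selections || [];
  if (children.length === 0) return 1;
  return 1 + Math.max(...children.map((child) => depthOf(child)));
}

// Total cost: each field costs 1 unless fieldCosts assigns it a different value.
function costOf(field, fieldCosts) {
  const own = fieldCosts[field.name] ?? 1;
  const children = field.selections || [];
  return own + children.reduce((sum, child) => sum + costOf(child, fieldCosts), 0);
}

// Rejects a query before execution when it exceeds either budget.
function checkQuery(root, { maxDepth, maxCost, fieldCosts = {} }) {
  const depth = depthOf(root);
  if (depth > maxDepth) {
    return { ok: false, reason: `depth ${depth} exceeds ${maxDepth}` };
  }
  const cost = costOf(root, fieldCosts);
  if (cost > maxCost) {
    return { ok: false, reason: `cost ${cost} exceeds ${maxCost}` };
  }
  return { ok: true, depth, cost };
}
```

For the `user { name posts { title } followersCount }` query from earlier in the article, the depth is 3. With `posts` priced at 10 and every other field at 1, the total cost is 14.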
CDN caching breaks with GraphQL because POST bodies aren't URL-keyed. Your options:
- Persisted queries as GET requests—hash goes in the URL, CDN can cache
- Response caching keyed by operation name plus variables hash
- Some CDNs (Cloudflare, for example) can fingerprint GraphQL requests and cache accordingly
In the browser, Apollo Client or Relay handles client-side caching. Service workers add offline support. A normalized cache keeps shared entity data consistent across components.
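For the response-caching option keyed by operation name plus a variables hash, the main subtlety is that two variable objects with the same content but different key order should produce the same key. Here is a minimal sketch; `stableStringify` and `responseCacheKey` are hypothetical helpers, and the key format is an arbitrary choice for the example.

```javascript
const { createHash } = require("node:crypto");

// Serializes JSON-compatible values with sorted object keys,
// so { a: 1, b: 2 } and { b: 2, a: 1 } give the same string.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Cache key: operation name plus a SHA-256 digest of the variables.
function responseCacheKey(operationName, variables = {}) {
  const digest = createHash("sha256")
    .update(stableStringify(variables))
    .digest("hex");
  return `${operationName}:${digest}`;
}
```

If a response depends on who is asking, the key also needs something that identifies the viewer, or cached data will leak between users.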
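DataLoader queues up load calls during execution, then fires one batched request once the current tick of the event loop finishes. Each key either hits the per-request cache or misses and joins the batch. The batch function must return results in the same order as the keys it received, so each result maps back to the right caller.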
Partial failures are interesting—the batch function can return Error objects alongside successful results. DataLoader propagates the Error to whichever field requested the failed key, while everything else succeeds. Nice for granular error handling.
Offset pagination (skip/limit) is dead simple but doesn't scale. To serve page N, the database scans and discards every row before it. Insert or delete a row and every later page shifts.
Cursors use stable markers. Page 2 after page 1 stays consistent even with concurrent writes. This is why infinite scroll and social feeds use cursors.
Use offset for small admin UIs where jumping to page 5 directly is valuable. Use cursors for anything user-facing with large datasets.
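A cursor-based page fetch can be sketched over an in-memory array. The cursor format (a base64-encoded id), the `paginate` function, and the Relay-style `edges` / `pageInfo` shape are choices made for this example; in a database the filter would become something like WHERE id > :after ORDER BY id LIMIT :first + 1.

```javascript
// Cursors are base64-encoded ids here; clients should treat them as opaque strings.
const encodeCursor = (id) => Buffer.from(`id:${id}`, "utf8").toString("base64");
const decodeCursor = (cursor) =>
  Number(Buffer.from(cursor, "base64").toString("utf8").slice(3));

// rows must be sorted by ascending id.
function paginate(rows, { first, after }) {
  const afterId = after ? decodeCursor(after) : -Infinity;
  // Fetch one extra row to learn whether another page exists.
  const candidates = rows.filter((row) => row.id > afterId).slice(0, first + 1);
  const page = candidates.slice(0, first);
  return {
    edges: page.map((row) => ({ cursor: encodeCursor(row.id), node: row })),
    pageInfo: {
      hasNextPage: candidates.length > first,
      endCursor: page.length ? encodeCursor(page[page.length - 1].id) : null,
    },
  };
}
```

Because the second page is defined as "rows after this marker" rather than "rows 3 and 4", a row inserted before the marker between requests does not cause duplicates or gaps.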
A few things trip people up when adopting GraphQL in an existing REST system:
- Teams need time to internalize GraphQL thinking—it's not just another REST
- Your existing tooling (CI/CD, monitoring, API gateways) probably needs updates
- Don't rewrite everything at once—run GraphQL alongside REST and migrate domain by domain
- Schema ownership gets thorny fast: who can modify what? Set clear boundaries early
- N+1 problems will surface. DataLoader isn't optional.
- Security surface is different—introspection attacks, query complexity, different rate-limiting vectors
GraphQL spec says nothing about files. You've got options:
- Multipart requests via the graphql-upload library
- Base64 encoding (terrible, since it bloats the payload)
- Pre-signed URLs (S3, GCS, whatever)—upload goes direct, you just pass the URL to GraphQL
- REST for uploads, GraphQL for the mutation that references the result
Pre-signed URLs scale best. The heavy lifting happens outside your server.
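Enforce field-level authorization in resolvers:
const resolvers = {
  User: {
    // Only the account owner or an admin may read the email address
    email: (user, args, context) => {
      const viewer = context.user; // missing for unauthenticated requests
      if (!viewer || (viewer.id !== user.id && !viewer.isAdmin)) {
        return null;
      }
      return user.email;
    },
  },
};
Always check at resolver level, never trust the client to filter. Returning null for an unauthorized field hides it quietly and leaves the rest of the response intact, provided the field is declared nullable in the schema. Throwing also yields partial data on a nullable field (the field becomes null and an entry lands in errors), but on a non-null field the null propagates up to the nearest nullable parent and can wipe out a much larger part of the response.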
Subscriptions usually run over WebSockets, which provide bidirectional, long-lived connections. The client opens a WebSocket, sends a subscription query, and the server pushes updates when events occur.
For distributed scaling (multiple GraphQL servers):
- Use Redis Pub/Sub or RabbitMQ to broadcast events across server instances
- Each server subscribes to the message broker and forwards relevant events to its connected clients
- Connection state (subscribed channels per client) must be tracked and cleaned up on disconnect
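The fan-out described in those three points can be sketched without any infrastructure. Everything here is a stand-in: `Broker` plays the role of Redis Pub/Sub or RabbitMQ, `SubscriptionServer` plays one GraphQL server instance, and a "client" is any object with a `send` method in place of a real WebSocket.

```javascript
// In-memory stand-in for a message broker; all names are hypothetical.
class Broker {
  constructor() {
    this.listeners = new Set();
  }

  subscribe(listener) {
    this.listeners.add(listener);
    return () => this.listeners.delete(listener);
  }

  // Delivers the event to every server instance, not just the one that published it.
  publish(topic, payload) {
    for (const listener of this.listeners) listener(topic, payload);
  }
}

// One GraphQL server instance holding its own WebSocket clients.
class SubscriptionServer {
  constructor(broker) {
    this.topicsByClient = new Map(); // client -> Set of subscribed topics
    this.stop = broker.subscribe((topic, payload) => this.forward(topic, payload));
  }

  subscribe(client, topic) {
    if (!this.topicsByClient.has(client)) this.topicsByClient.set(client, new Set());
    this.topicsByClient.get(client).add(topic);
  }

  // Called when the socket closes, so state does not leak.
  disconnect(client) {
    this.topicsByClient.delete(client);
  }

  // Forwards a broker event only to local clients subscribed to that topic.
  forward(topic, payload) {
    for (const [client, topics] of this.topicsByClient) {
      if (topics.has(topic)) client.send({ topic, payload });
    }
  }
}
```

A mutation handled by any instance publishes to the broker once, and each instance forwards the event to its own subscribers. Dropping the per-client entry on disconnect is what keeps the subscription memory from growing without bound.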
GraphQL is usually served over HTTP, so authentication is typically HTTP-based:
- JWT tokens in Authorization header—validated in middleware before GraphQL execution
- Cookies for web apps; tokens for mobile/native clients
- Authorization happens in resolvers—check context.user permissions per field
Field-level authorization: return null for unauthorized fields rather than throwing, so partial data still returns.
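The "validate the JWT in middleware, then hand the user to resolvers" flow can be sketched with Node's built-in crypto module. This is an illustration under assumptions: it supports only HS256, `signToken` exists purely so the example is self-contained, and `verifyToken` and `buildContext` are hypothetical helper names. Production code should rely on a maintained JWT library rather than hand-rolled verification.

```javascript
const { createHmac, timingSafeEqual } = require("node:crypto");

const b64url = (text) => Buffer.from(text, "utf8").toString("base64url");

// Example-only helper: issues an HS256 token.
function signToken(payload, secret) {
  const head = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(payload));
  const sig = createHmac("sha256", secret).update(`${head}.${body}`).digest("base64url");
  return `${head}.${body}.${sig}`;
}

// Returns the payload of a valid, unexpired HS256 token, otherwise null.
function verifyToken(token, secret) {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [head, body, sig] = parts;
  try {
    const header = JSON.parse(Buffer.from(head, "base64url").toString("utf8"));
    if (header.alg !== "HS256") return null;
    const expected = createHmac("sha256", secret).update(`${head}.${body}`).digest();
    const actual = Buffer.from(sig, "base64url");
    // Constant-time comparison; lengths must match before calling it.
    if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;
    const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
    if (typeof payload.exp === "number" && payload.exp * 1000 <= Date.now()) return null;
    return payload;
  } catch {
    return null; // malformed JSON in header or payload
  }
}

// Runs before GraphQL execution; resolvers then read context.user.
function buildContext(headers, secret) {
  const match = /^Bearer (.+)$/.exec(headers.authorization || "");
  return { user: match ? verifyToken(match[1], secret) : null };
}
```

Resolvers never parse tokens themselves. They receive `context.user`, which is either the verified payload or null, and make their per-field decisions from that.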
Schema-first: you write the SDL first, then implement resolvers against it. Pros: clear contract, easy review, self-documenting. Cons: resolver code may drift from the schema.
Code-first: you define types in code (using decorators, classes, or builder patterns) and the schema is generated from them. Pros: type safety in your language, less duplication. Cons: the schema is derived, not explicit.
Many teams end up with a hybrid: they write SDL for complex types but generate parts programmatically.
The @deprecated directive marks fields or enum values as obsolete:
type User {
name: String!
fullName: String @deprecated(reason: "Use 'name' instead")
role: UserRole!
}
Client tooling that reads introspection can show deprecated fields with warnings. This lets you:
- Remove fields gradually—old clients see warnings, new clients use the replacement
- Communicate breaking changes without immediately cutting off old clients
- Keep schema backward compatible while guiding migration
Test a GraphQL API at multiple layers:
- Unit tests: resolver logic with mock context
- Integration tests: execute full queries against a test database
- Schema tests: validate that schema transformations produce expected output
Tools:
- graphql-testing-library for integration tests
- @graphql-tools/mock for schema mocking
- Apollo Server test mode for end-to-end execution
- Load testing with k6 or artillery for performance
Conclusion
REST and GraphQL make different trade-offs. REST is resource-oriented with fixed endpoints and built-in HTTP caching. GraphQL is query-oriented with flexible data fetching and strong typing.
REST works well for simple, predictable APIs where caching matters. GraphQL works well for complex, client-driven data requirements where you want to avoid overfetching. Neither is universally better.
For REST API design, see the RESTful API Design post. For versioning both types of APIs, see the API Versioning Strategies post.