The Architecture Behind Apps That Handle 10M Users (And What Yours Is Missing)
Most apps break at 10K users because of the same 5 mistakes. Here's the architecture playbook we use at PMML to build systems that scale — with real diagrams and code.
You don't need to be Netflix to think about scale. If you're building a SaaS, marketplace, or mobile app, these architecture patterns will save you from a 3 AM production meltdown.
The 5 Things That Break First
Every app that fails to scale hits the same walls:
- Database is the bottleneck — single Postgres instance, no read replicas, N+1 queries everywhere
- No caching layer — every request hits the database
- Synchronous processing — image resizing, emails, and webhooks block the request
- Monolithic deployment — a bug in billing takes down the entire app
- No observability — you don't know what's slow until users complain
Let's fix each one.
Pattern 1: Read Replicas + Connection Pooling
import { Pool } from "pg";

// Connection strings are assumed to come from the environment
const { DATABASE_URL, PRIMARY_DB_URL, READ_REPLICA_URL } = process.env;

// Before: a single pool for every query
const db = new Pool({ connectionString: DATABASE_URL });

// After: separate write and read pools (each URL can point at PgBouncer)
const writePool = new Pool({
  connectionString: PRIMARY_DB_URL,
  max: 20,
});
const readPool = new Pool({
  connectionString: READ_REPLICA_URL,
  max: 50, // More read capacity
});

// Route queries based on intent
export function getDb(intent: "read" | "write" = "read") {
  return intent === "write" ? writePool : readPool;
}
Impact: In most apps, roughly 70% of queries are reads. Moving them to a replica takes that load off the primary and leaves it free for writes. The exact gain depends on your read/write mix.
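Replicas apply changes slightly after the primary, so a user who just saved something may not see it on their next read. Here is a minimal sketch of one way to handle this, assuming a hypothetical `chooseTarget` helper and an assumed lag window `stickyMs`. Neither is part of the snippet above, and the in-process `Map` would need to become a shared store (a cookie or Redis, for example) once you run more than one app instance.

```typescript
// Hypothetical helper: send a user's reads to the primary for a short window
// after their last write, so they can read their own writes despite replica lag.
type Target = "primary" | "replica";

const lastWriteAt = new Map<string, number>(); // userId -> epoch ms of last write

function recordWrite(userId: string, now: number = Date.now()): void {
  lastWriteAt.set(userId, now);
}

function chooseTarget(
  userId: string,
  intent: "read" | "write",
  stickyMs: number = 2000, // assumed upper bound on replica lag; tune per setup
  now: number = Date.now(),
): Target {
  if (intent === "write") return "primary";
  const last = lastWriteAt.get(userId);
  // Recent writer: the replica may not have caught up yet.
  if (last !== undefined && now - last < stickyMs) return "primary";
  return "replica";
}
```

The result maps directly onto the pools: `"primary"` means the write pool, `"replica"` means the read pool.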
Pattern 2: Multi-Layer Caching
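import Redis from "ioredis";
import { LRUCache } from "lru-cache"; // assumes lru-cache v10+ (named export)
import { getDb } from "./shared/database"; // assumed path to the pools from Pattern 1

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
const lruCache = new LRUCache<string, object>({ max: 10_000, ttl: 60_000 });
const db = getDb("read");

async function getUser(id: string) {
  const key = `user:${id}`;
  // Layer 1: In-memory LRU (same-process, ~0.01ms)
  const memCached = lruCache.get(key);
  if (memCached) return memCached;
  // Layer 2: Redis (~1ms)
  const redisCached = await redis.get(key);
  if (redisCached) {
    const parsed = JSON.parse(redisCached);
    lruCache.set(key, parsed);
    return parsed;
  }
  // Layer 3: Database (~5-50ms)
  const { rows } = await db.query("SELECT * FROM users WHERE id = $1", [id]);
  const user = rows[0]; // query() returns a result object; the row is in rows[0]
  if (!user) return null; // unknown id: nothing to cache
  await redis.setex(key, 300, JSON.stringify(user)); // 5-minute TTL
  lruCache.set(key, user);
  return user;
}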
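The in-process layer needs a size bound, an expiry, and a way to drop an entry when the underlying row changes. Otherwise each app instance can keep serving a stale user indefinitely. Below is a minimal sketch of such a cache. The names `TtlLruCache`, `userCache`, and `invalidateUser` are hypothetical, and in practice a library such as lru-cache would normally fill this role.

```typescript
// Minimal in-process cache with per-entry TTL and LRU eviction.
// Map preserves insertion order, so the first key is the least recently used.
class TtlLruCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxSize: number;
  private ttlMs: number;

  constructor(maxSize: number, ttlMs: number) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // Evict the least recently used entry (first key in insertion order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }

  delete(key: string): void {
    this.entries.delete(key);
  }
}

const userCache = new TtlLruCache<{ id: string; name: string }>(1000, 30_000);

// Call after any update to a user so the next read reloads fresh data.
function invalidateUser(id: string): void {
  userCache.delete(`user:${id}`);
  // With Redis as layer 2 you would also run: await redis.del(`user:${id}`)
}
```

Invalidation only clears the local process. Other instances keep their copy until its TTL expires, which is why the in-memory TTL should stay short.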
Pattern 3: Background Job Queues
Never make users wait for things they don't need to wait for.
// Before: Blocking the request
app.post("/api/signup", async (req, res) => {
  const user = await createUser(req.body);
  await sendWelcomeEmail(user); // 2-5 seconds
  await resizeAvatar(user.avatarUrl); // 3-10 seconds
  await syncToCRM(user); // 1-3 seconds
  res.json(user); // User waits 6-18 seconds 😱
});

// After: Queue background jobs
app.post("/api/signup", async (req, res) => {
  const user = await createUser(req.body);
  await queue.add("send-welcome-email", { userId: user.id });
  await queue.add("resize-avatar", { url: user.avatarUrl });
  await queue.add("sync-crm", { userId: user.id });
  res.json(user); // User waits ~200ms ⚡
});
We use BullMQ (Redis-backed) for most projects. For critical financial operations, we use a Postgres-backed queue for transactional guarantees.
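Queued jobs can be retried or redelivered, so a handler may run more than once for the same job. Here is a minimal sketch of an idempotent handler plus a retry loop with exponential backoff. Everything in it is a hypothetical in-memory stand-in (`handleWelcomeEmail`, `runWithRetry`, `sentTo`, `outbox`), not BullMQ's API. In a real system the "already sent" check would live in the database, and the queue would handle the retries.

```typescript
// Hypothetical job shape and in-memory stand-ins, for illustration only.
interface WelcomeEmailJob {
  userId: string;
}

const sentTo = new Set<string>(); // stand-in for a persisted "welcome email sent" flag
const outbox: string[] = []; // stand-in for the real email provider

// Idempotent handler: running it twice for the same user sends one email.
async function handleWelcomeEmail(job: WelcomeEmailJob): Promise<"sent" | "skipped"> {
  if (sentTo.has(job.userId)) return "skipped";
  outbox.push(job.userId);
  sentTo.add(job.userId);
  return "sent";
}

// Retry with exponential backoff: waits baseDelayMs, then 2x, 4x, ... between attempts.
async function runWithRetry<T>(
  fn: () => Promise<T>,
  attempts: number,
  baseDelayMs: number,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise<void>((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

Because the handler checks before it acts, a retry after a partial failure does not send a second email.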
Pattern 4: Service Boundaries (Not Microservices)
You don't need microservices. You need clear boundaries.
monolith/
├── src/
│ ├── modules/
│ │ ├── auth/ # Own database schema, own API routes
│ │ ├── billing/ # Can be extracted later if needed
│ │ ├── notifications/ # Already async via queue
│ │ └── core/ # Shared types and utilities
│ ├── shared/
│ │ ├── database.ts
│ │ ├── queue.ts
│ │ └── cache.ts
│ └── index.ts
Each module has its own schema, routes, and business logic. They communicate through well-defined interfaces — not direct database queries across boundaries. When a module needs to scale independently, extracting it into a service is straightforward because the boundary already exists.
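A well-defined interface between modules can be as small as one exported type. In the sketch below, the names `BillingApi`, `createBillingModule`, and `createNotifications` are hypothetical, as is the plan logic, and an in-memory map stands in for billing's own schema.

```typescript
type Plan = "free" | "pro";

// billing's public surface: the only thing other modules may depend on.
interface BillingApi {
  getPlan(userId: string): Promise<Plan>;
}

// billing's internals: it alone reads its own data.
function createBillingModule(plans: Map<string, Plan>): BillingApi {
  return {
    async getPlan(userId: string): Promise<Plan> {
      return plans.get(userId) ?? "free";
    },
  };
}

// notifications depends on the interface, never on billing's tables.
function createNotifications(billing: BillingApi) {
  return {
    async dailyEmailLimit(userId: string): Promise<number> {
      const plan = await billing.getPlan(userId);
      return plan === "pro" ? 100 : 10;
    },
  };
}
```

If billing later becomes its own service, only `createBillingModule` changes, for example to an HTTP client that implements the same `BillingApi`. The notifications code stays untouched.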
Pattern 5: Observability From Day One
import { trace } from "@opentelemetry/api";

const tracer = trace.getTracer("api");

app.use((req, res, next) => {
  const span = tracer.startSpan(`${req.method} ${req.path}`);
  // req.userId is assumed to be set by earlier auth middleware
  span.setAttribute("user.id", req.userId ?? "anonymous");
  const start = performance.now();
  res.on("finish", () => {
    const duration = performance.now() - start;
    // Attribute name follows the OpenTelemetry HTTP semantic conventions
    span.setAttribute("http.response.status_code", res.statusCode);
    span.setAttribute("duration_ms", duration);
    // Tag slow requests so alert rules can filter on this attribute
    if (duration > 1000) {
      span.setAttribute("alert", "slow_request");
    }
    span.end();
  });
  next();
});
Ship OpenTelemetry on day one. When something breaks at scale, you'll know exactly which query, which service, and which user is affected — in seconds, not hours.
The Scale Checklist
Before launching to production, make sure you have:
- Read replica(s) for your database
- Redis caching with TTL strategy
- Background job queue for async work
- Clear module boundaries in your codebase
- Structured logging and distributed tracing
- Load testing results (we use k6)
- Database indexes on all frequently queried columns
- Connection pooling (PgBouncer or equivalent)
What PMML Ships
Every production system we deliver includes these patterns by default. Not because every app will hit 10M users on day one — but because retrofitting scale is 10x harder than building it in.
Building something that needs to scale? Start a conversation with us — we'll architect it right from day one.