The Architecture Behind Apps That Handle 10M Users (And What Yours Is Missing)

Most apps break at 10K users because of the same 5 mistakes. Here's the architecture playbook we use at PMML to build systems that scale — with real diagrams and code.

PMML Engineering · Studio · 22 May 2026 · 10 min read

You don't need to be Netflix to think about scale. If you're building a SaaS, marketplace, or mobile app, these architecture patterns will save you from a 3 AM production meltdown.

The 5 Things That Break First

Every app that fails to scale hits the same walls:

1. Database is the bottleneck: single Postgres instance, no read replicas, N+1 queries everywhere
2. No caching layer: every request hits the database
3. Synchronous processing: image resizing, emails, and webhooks block the request
4. Monolithic deployment: a bug in billing takes down the entire app
5. No observability: you don't know what's slow until users complain

Let's fix each one.

Pattern 1: Read Replicas + Connection Pooling

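import { Pool } from "pg";

// Before: a single pool shared by every query
const db = new Pool({ connectionString: DATABASE_URL });

// After: separate write and read pools. The code is the same whether each
// URL points at Postgres directly or at a PgBouncer instance in front of it.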
const writePool = new Pool({
  connectionString: PRIMARY_DB_URL,
  max: 20,
});

const readPool = new Pool({
  connectionString: READ_REPLICA_URL,
  max: 50, // More read capacity
});

// Route queries based on intent
export function getDb(intent: "read" | "write" = "read") {
  return intent === "write" ? writePool : readPool;
}

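Impact: in most apps the majority of queries are reads (70% is a typical figure). Moving them to a replica frees the primary for writes, and each additional replica adds read capacity. Replicas lag the primary slightly, so a read that must see a just-committed write should still go to the primary.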
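Routing purely by intent has one catch: a user who just saved something may read the old version from a replica that has not caught up yet. One common mitigation, sketched here under assumptions, is to keep a user's reads on the primary for a short window after their last write. `chooseTarget`, `STICKY_WINDOW_MS`, and the 2-second window are hypothetical names and values, and the in-memory map only works within a single process; a multi-instance deployment would need a cookie or a shared store instead.

```typescript
type Intent = "read" | "write";
type Target = "primary" | "replica";

// How long after a write a user's reads stay on the primary.
// 2 seconds is an illustrative guess; tune it to your observed replication lag.
const STICKY_WINDOW_MS = 2_000;

// userId -> timestamp (ms) of that user's most recent write.
// Assumption: single process. Entries are never evicted in this sketch.
const lastWriteAt = new Map<string, number>();

function chooseTarget(
  userId: string,
  intent: Intent,
  now: number = Date.now(),
): Target {
  if (intent === "write") {
    lastWriteAt.set(userId, now);
    return "primary";
  }
  const last = lastWriteAt.get(userId);
  if (last !== undefined && now - last < STICKY_WINDOW_MS) {
    // The replica may not have this user's write yet.
    return "primary";
  }
  return "replica";
}
```

The returned target would then pick between `writePool` and `readPool`. The `now` parameter exists only so the window can be tested without waiting.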

Pattern 2: Multi-Layer Caching

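import Redis from "ioredis";
import { LRUCache } from "lru-cache"; // named export in lru-cache v10+

const redis = new Redis(REDIS_URL);
// Small per-process cache. Keep the TTL short: other instances cannot invalidate it.
const lruCache = new LRUCache<string, object>({ max: 10_000, ttl: 30_000 });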

async function getUser(id: string) {
  // Layer 1: In-memory LRU (same-process, ~0.01ms)
  const memCached = lruCache.get(`user:${id}`);
  if (memCached) return memCached;

  // Layer 2: Redis (~1ms)
  const redisCached = await redis.get(`user:${id}`);
  if (redisCached) {
    const parsed = JSON.parse(redisCached);
    lruCache.set(`user:${id}`, parsed);
    return parsed;
  }

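  // Layer 3: Database (~5-50ms)
  // pg resolves to a result object; the record itself is in rows[0]
  const { rows } = await db.query("SELECT * FROM users WHERE id = $1", [id]);
  const user = rows[0];
  if (!user) return null; // unknown id: nothing to cache
  await redis.setex(`user:${id}`, 300, JSON.stringify(user));
  lruCache.set(`user:${id}`, user);
  return user;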
}
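The read path above never removes an entry, so an updated user can be served stale for up to the Redis TTL of 300 seconds. A minimal sketch of the matching write path follows: persist first, then delete the key from every cache layer so the next read repopulates it. The `CacheLayer` interface, `MemoryLayer`, and `updateUser` are hypothetical names for this illustration; `MemoryLayer` is a Map-backed stand-in, not Redis.

```typescript
// Minimal async key-value interface (an assumption for this sketch).
// A Redis client or an in-process LRU can be adapted to it.
interface CacheLayer {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  del(key: string): Promise<void>;
}

// Map-backed layer with expiry, standing in for a real cache.
class MemoryLayer implements CacheLayer {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | null> {
    const hit = this.entries.get(key);
    if (!hit) return null;
    if (Date.now() >= hit.expiresAt) {
      this.entries.delete(key);
      return null;
    }
    return hit.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }

  async del(key: string): Promise<void> {
    this.entries.delete(key);
  }
}

// Write path: save to the database, then drop every cached copy.
// `save` is whatever performs the actual UPDATE.
async function updateUser<T>(
  id: string,
  save: () => Promise<T>,
  layers: CacheLayer[],
): Promise<T> {
  const saved = await save();
  await Promise.all(layers.map((layer) => layer.del(`user:${id}`)));
  return saved;
}
```

Deleting rather than overwriting keeps the write path simple and avoids caching a value that a concurrent write is about to change. In-memory copies held by other processes are not reached by this delete, which is one reason to give that layer a short TTL.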

Pattern 3: Background Job Queues

Never make users wait for things they don't need to wait for.

// Before: Blocking the request
app.post("/api/signup", async (req, res) => {
  const user = await createUser(req.body);
  await sendWelcomeEmail(user);       // 2-5 seconds
  await resizeAvatar(user.avatarUrl);  // 3-10 seconds
  await syncToCRM(user);              // 1-3 seconds
  res.json(user); // User waits 6-18 seconds 😱
});

// After: Queue background jobs
app.post("/api/signup", async (req, res) => {
  const user = await createUser(req.body);
  await queue.add("send-welcome-email", { userId: user.id });
  await queue.add("resize-avatar", { url: user.avatarUrl });
  await queue.add("sync-crm", { userId: user.id });
  res.json(user); // User waits ~200ms ⚡
});

We use BullMQ (Redis-backed) for most projects. For critical financial operations, we use a Postgres-backed queue for transactional guarantees.
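The other half of the pattern is the worker that picks jobs up and retries them when a downstream service fails. The sketch below is an in-memory stand-in written for illustration, not BullMQ: it has no persistence, concurrency control, or backoff, and `TinyQueue`, `drain`, and `maxAttempts` are hypothetical names. It only shows the control flow of enqueue, process, and retry.

```typescript
type Job<T> = { name: string; data: T; attemptsMade: number };
type Handler<T> = (job: Job<T>) => Promise<void>;

// In-memory stand-in for a real queue (NOT BullMQ).
class TinyQueue<T> {
  private pending: Job<T>[] = [];
  readonly failed: Job<T>[] = [];
  private maxAttempts: number;

  constructor(maxAttempts = 3) {
    this.maxAttempts = maxAttempts;
  }

  // Producer side: returns immediately, so the request handler is not blocked.
  add(name: string, data: T): void {
    this.pending.push({ name, data, attemptsMade: 0 });
  }

  // Worker side: process jobs in order, re-queueing failures until
  // maxAttempts is reached, then parking them in `failed`.
  async drain(handler: Handler<T>): Promise<void> {
    while (this.pending.length > 0) {
      const job = this.pending.shift()!;
      job.attemptsMade += 1;
      try {
        await handler(job);
      } catch {
        if (job.attemptsMade < this.maxAttempts) {
          this.pending.push(job);
        } else {
          this.failed.push(job);
        }
      }
    }
  }
}
```

Because a job can run more than once, handlers should be idempotent: sending the welcome email twice or charging a card twice is the failure mode to design against.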

Pattern 4: Service Boundaries (Not Microservices)

You don't need microservices. You need clear boundaries.

monolith/
├── src/
│   ├── modules/
│   │   ├── auth/          # Own database schema, own API routes
│   │   ├── billing/       # Can be extracted later if needed
│   │   ├── notifications/ # Already async via queue
│   │   └── core/          # Shared types and utilities
│   ├── shared/
│   │   ├── database.ts
│   │   ├── queue.ts
│   │   └── cache.ts
│   └── index.ts

Each module has its own schema, routes, and business logic. They communicate through well-defined interfaces — not direct database queries across boundaries. When a module needs to scale independently, extracting it into a service is straightforward because the boundary already exists.
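As a rough illustration of what such an interface can look like, the sketch below has a notifications module ask billing for a user's plan through a typed API instead of querying billing tables. `BillingApi`, `createBillingModule`, `createNotifications`, and the email limits are all hypothetical; a real module would back `getPlan` with its own schema and would likely be async.

```typescript
type Plan = "free" | "pro";

// --- modules/billing: the only surface other modules may use ---
interface BillingApi {
  getPlan(userId: string): Plan;
}

// The Map stands in for the billing schema. Nothing outside this
// function can reach it directly.
function createBillingModule(initial: Record<string, Plan>): BillingApi {
  const plans = new Map(Object.entries(initial));
  return {
    getPlan: (userId) => plans.get(userId) ?? "free",
  };
}

// --- modules/notifications: depends on the interface, not on billing tables ---
function createNotifications(billing: BillingApi) {
  return {
    dailyEmailLimit(userId: string): number {
      return billing.getPlan(userId) === "pro" ? 1000 : 50;
    },
  };
}
```

If billing is later extracted into its own service, only the implementation behind `BillingApi` changes (for example to an HTTP client); `createNotifications` keeps working against the same interface.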

Pattern 5: Observability From Day One

import { trace } from "@opentelemetry/api";

const tracer = trace.getTracer("api");

app.use(async (req, res, next) => {
  const span = tracer.startSpan(`${req.method} ${req.path}`);
  span.setAttribute("user.id", req.userId ?? "anonymous");

  const start = performance.now();
  res.on("finish", () => {
    const duration = performance.now() - start;
    span.setAttribute("http.response.status_code", res.statusCode);
    span.setAttribute("duration_ms", duration);

    // Tag slow requests so dashboards and alert rules can filter on them
    if (duration > 1000) {
      span.setAttribute("alert", "slow_request");
    }
    span.end();
  });
  next();
});

Ship OpenTelemetry on day one. When something breaks at scale, you'll know exactly which query, which service, and which user is affected — in seconds, not hours.
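Traces pair well with structured logs, which also appear on the checklist below. The following is a dependency-free sketch of a per-request log record that reuses the 1000 ms slow-request threshold from the middleware. The field names, `buildRequestLog`, and `toLogLine` are illustrative assumptions, not an OpenTelemetry or logging-library schema.

```typescript
// Shape of one log record. Field names are illustrative.
interface RequestLog {
  level: "info" | "warn";
  method: string;
  path: string;
  status: number;
  durationMs: number;
  userId: string;
  slow: boolean;
}

// Same threshold the tracing middleware uses to tag slow requests.
const SLOW_REQUEST_MS = 1000;

function buildRequestLog(
  method: string,
  path: string,
  status: number,
  durationMs: number,
  userId?: string,
): RequestLog {
  const slow = durationMs > SLOW_REQUEST_MS;
  return {
    // Slow requests and server errors are raised to "warn".
    level: slow || status >= 500 ? "warn" : "info",
    method,
    path,
    status,
    durationMs: Math.round(durationMs),
    userId: userId ?? "anonymous",
    slow,
  };
}

// One JSON object per line is easy for log pipelines to parse and index.
function toLogLine(entry: RequestLog): string {
  return JSON.stringify(entry);
}
```

Inside the `res.on("finish", ...)` callback, a call such as `toLogLine(buildRequestLog(req.method, req.path, res.statusCode, duration, req.userId))` would produce one line per request that can be filtered by `slow`, `status`, or `userId`.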

The Scale Checklist

Before launching to production, make sure you have:

Read replica(s) for your database
Redis caching with TTL strategy
Background job queue for async work
Clear module boundaries in your codebase
Structured logging and distributed tracing
Load testing results (we use k6)
Database indexes on all frequently queried columns
Connection pooling (PgBouncer or equivalent)

What PMML Ships

Every production system we deliver includes these patterns by default. Not because every app will hit 10M users on day one — but because retrofitting scale is 10x harder than building it in.

Building something that needs to scale? Start a conversation with us — we'll architect it right from day one.
