The Architecture Behind Apps That Handle 10M Users (And What Yours Is Missing)

Most apps break at 10K users because of the same 5 mistakes. Here's the architecture playbook we use at PMML to build systems that scale — with real diagrams and code.

PMML Engineering · Studio · 22 May 2026 · 10 min read

You don't need to be Netflix to think about scale. If you're building a SaaS, marketplace, or mobile app, these architecture patterns will save you from a 3 AM production meltdown.

The 5 Things That Break First

Every app that fails to scale hits the same walls:

  1. Database is the bottleneck — single Postgres instance, no read replicas, N+1 queries everywhere
  2. No caching layer — every request hits the database
  3. Synchronous processing — image resizing, emails, and webhooks block the request
  4. Monolithic deployment — a bug in billing takes down the entire app
  5. No observability — you don't know what's slow until users complain
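
The N+1 pattern in item 1 is the easiest of these to miss, so here is a minimal sketch of what it looks like and how batching removes it. The `Query` type, the `Order` shape, and the table name are illustrative assumptions; with `pg`, a thin wrapper around `pool.query` would fill the same role.

```typescript
// Hypothetical query function: takes SQL text and parameters, resolves to rows.
type Row = Record<string, unknown>;
type Query = (sql: string, params: unknown[]) => Promise<Row[]>;

interface Order {
  id: number;
  userId: number;
}

// N+1: one query per order, so 100 orders cost 100 round trips.
async function loadUsersNPlusOne(query: Query, orders: Order[]): Promise<Row[]> {
  const users: Row[] = [];
  for (const order of orders) {
    const rows = await query("SELECT * FROM users WHERE id = $1", [order.userId]);
    if (rows[0]) users.push(rows[0]);
  }
  return users;
}

// Batched: collect the distinct ids and fetch them in a single round trip.
async function loadUsersBatched(query: Query, orders: Order[]): Promise<Row[]> {
  const ids = Array.from(new Set(orders.map((o) => o.userId)));
  if (ids.length === 0) return [];
  return query("SELECT * FROM users WHERE id = ANY($1)", [ids]);
}
```

The batched version returns each distinct user once, so callers that need a user per order should index the result by id.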

Let's fix each one.

Pattern 1: Read Replicas + Connection Pooling

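import { Pool } from "pg";

// DATABASE_URL, PRIMARY_DB_URL and READ_REPLICA_URL are assumed to come from your config.

// Before: a single pool, so every read and write lands on the primary
const db = new Pool({ connectionString: DATABASE_URL });

// After: separate read/write pools
// (point both URLs at PgBouncer if you run it in front of Postgres)
const writePool = new Pool({
  connectionString: PRIMARY_DB_URL,
  max: 20,
});

const readPool = new Pool({
  connectionString: READ_REPLICA_URL,
  max: 50, // More read capacity
});

// Route queries based on intent; anything that modifies data must pass "write"
export function getDb(intent: "read" | "write" = "read") {
  return intent === "write" ? writePool : readPool;
}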

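Impact: in most apps reads far outnumber writes (70% of queries is a common rule of thumb). Moving them to a replica takes that load off the primary, which can roughly double the throughput your database tier handles. Replicas lag the primary slightly, so send any read that must see a just-committed write to the primary.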
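
A read that immediately follows a write can miss it if it lands on a replica that has not caught up yet. One common mitigation is to pin a user's reads to the primary for a short window after they write. The sketch below is a minimal, single-process version of that idea; the `ReplicaRouter` class, the 2-second default, and the injectable clock are assumptions for illustration, and the window should be tuned to your measured lag.

```typescript
type Target = "primary" | "replica";

class ReplicaRouter {
  // userId -> timestamp (ms) of that user's most recent write
  private lastWrite = new Map<string, number>();

  constructor(
    private stickyMs: number = 2000, // assumed window, not a recommendation
    private now: () => number = Date.now,
  ) {}

  // Call this after any successful write made on behalf of the user.
  recordWrite(userId: string): void {
    this.lastWrite.set(userId, this.now());
  }

  // Writes always go to the primary; reads go there only inside the sticky window.
  targetFor(userId: string, intent: "read" | "write"): Target {
    if (intent === "write") return "primary";
    const last = this.lastWrite.get(userId);
    if (last !== undefined && this.now() - last < this.stickyMs) return "primary";
    return "replica";
  }
}
```

With several app servers, the last-write timestamp would need to live somewhere shared, such as Redis or a signed cookie, and old entries should be evicted so the map does not grow without bound.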

Pattern 2: Multi-Layer Caching

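import Redis from "ioredis";
import { LRUCache } from "lru-cache"; // assumed in-process cache; any LRU with get/set works

const redis = new Redis(REDIS_URL);
// Short TTL so each process drops stale entries on its own
const lruCache = new LRUCache<string, object>({ max: 10_000, ttl: 60_000 });

async function getUser(id: string) {
  const key = `user:${id}`;

  // Layer 1: In-memory LRU (same-process, ~0.01ms)
  const memCached = lruCache.get(key);
  if (memCached) return memCached;

  // Layer 2: Redis (~1ms)
  const redisCached = await redis.get(key);
  if (redisCached) {
    const parsed = JSON.parse(redisCached);
    lruCache.set(key, parsed);
    return parsed;
  }

  // Layer 3: Database (~5-50ms); getDb is the helper from Pattern 1
  const { rows } = await getDb("read").query("SELECT * FROM users WHERE id = $1", [id]);
  const user = rows[0]; // query() resolves to a result object, not the row itself
  if (!user) return null; // don't cache a missing user

  await redis.setex(key, 300, JSON.stringify(user));
  lruCache.set(key, user);
  return user;
}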
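
One gap in a read-through cache like this is the stampede: when a hot key expires, every concurrent request misses and hits the database together. A minimal sketch of request coalescing, where concurrent callers for the same key share one in-flight load. `singleFlight` and its `loader` argument are hypothetical names, and this only deduplicates within a single process.

```typescript
// key -> promise for the load currently in progress
const inFlight = new Map<string, Promise<unknown>>();

function singleFlight<T>(key: string, loader: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;

  // Start the loader on the next microtask so a synchronous throw
  // becomes a rejection instead of escaping before the map is updated.
  const promise = Promise.resolve().then(loader);
  inFlight.set(key, promise);

  // Clear the entry whether the load succeeds or fails, so the next
  // caller after completion triggers a fresh load.
  const cleanup = () => {
    inFlight.delete(key);
  };
  promise.then(cleanup, cleanup);

  return promise;
}
```

A call such as `singleFlight("user:" + id, () => getUser(id))` would then let a burst of identical requests share one database round trip.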

Pattern 3: Background Job Queues

Never make users wait for things they don't need to wait for.

// Before: Blocking the request
app.post("/api/signup", async (req, res) => {
  const user = await createUser(req.body);
  await sendWelcomeEmail(user);       // 2-5 seconds
  await resizeAvatar(user.avatarUrl);  // 3-10 seconds
  await syncToCRM(user);              // 1-3 seconds
  res.json(user); // User waits 6-18 seconds 😱
});

// After: Queue background jobs
app.post("/api/signup", async (req, res) => {
  const user = await createUser(req.body);
  await queue.add("send-welcome-email", { userId: user.id });
  await queue.add("resize-avatar", { url: user.avatarUrl });
  await queue.add("sync-crm", { userId: user.id });
  res.json(user); // User waits ~200ms ⚡
});

We use BullMQ (Redis-backed) for most projects. For critical financial operations, we use a Postgres-backed queue instead, so a job is enqueued in the same transaction as the data it depends on and both commit or roll back together.
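
Queues of this kind generally deliver at-least-once: a job can run again after a retry or a worker crash, so handlers should be safe to repeat. A minimal sketch of an idempotency guard follows. `makeIdempotent`, the key format, and the in-memory `Set` are illustrative assumptions; in production the processed-key record would live in Redis or Postgres so it survives restarts and is shared across workers.

```typescript
type Handler<T> = (payload: T) => Promise<void>;

// Wraps a job handler so a payload with an already-processed key is skipped.
function makeIdempotent<T>(
  keyOf: (payload: T) => string,
  handler: Handler<T>,
  processed: Set<string> = new Set<string>(), // stand-in for a durable store
): Handler<T> {
  return async (payload: T) => {
    const key = keyOf(payload);
    if (processed.has(key)) return; // duplicate delivery: nothing to do
    await handler(payload);
    processed.add(key); // mark only after success so a failed job can retry
  };
}
```

This guard does not stop two workers from running the same job at the same moment; that case needs an atomic check-and-set in the shared store.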

Pattern 4: Service Boundaries (Not Microservices)

You don't need microservices. You need clear boundaries.

monolith/
├── src/
│   ├── modules/
│   │   ├── auth/          # Own database schema, own API routes
│   │   ├── billing/       # Can be extracted later if needed
│   │   ├── notifications/ # Already async via queue
│   │   └── core/          # Shared types and utilities
│   ├── shared/
│   │   ├── database.ts
│   │   ├── queue.ts
│   │   └── cache.ts
│   └── index.ts

Each module has its own schema, routes, and business logic. They communicate through well-defined interfaces — not direct database queries across boundaries. When a module needs to scale independently, extracting it into a service is straightforward because the boundary already exists.
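
A minimal sketch of what such an interface can look like in code. The `BillingApi` interface, `createBillingModule`, and the plan names are hypothetical; the point is that other modules depend only on the interface and never read billing's tables.

```typescript
type Plan = "free" | "pro";

// modules/billing (hypothetical): the only surface other modules may import.
interface BillingApi {
  getPlan(userId: string): Promise<Plan>;
}

// Billing's storage stays private to the module.
// An in-memory Map stands in for its database schema here.
function createBillingModule(seed: Array<[string, Plan]> = []): BillingApi {
  const plans = new Map<string, Plan>(seed);
  return {
    async getPlan(userId: string): Promise<Plan> {
      return plans.get(userId) ?? "free";
    },
  };
}

// modules/notifications (hypothetical): depends on the interface,
// not on how or where billing stores its data.
async function welcomeMessage(billing: BillingApi, userId: string): Promise<string> {
  const plan = await billing.getPlan(userId);
  return plan === "pro" ? "Welcome back, Pro member" : "Welcome";
}
```

If billing is later extracted into its own service, an HTTP client that implements `BillingApi` can replace the in-process module without changing any caller.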

Pattern 5: Observability From Day One

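import express from "express";
import { trace, SpanStatusCode } from "@opentelemetry/api";

const app = express();
const tracer = trace.getTracer("api");

app.use((req, res, next) => {
  const span = tracer.startSpan(`${req.method} ${req.path}`);
  // Assumes an earlier auth middleware attached userId to the request
  span.setAttribute("user.id", (req as any).userId ?? "anonymous");

  const start = performance.now();
  res.on("finish", () => {
    const duration = performance.now() - start;
    span.setAttribute("http.response.status_code", res.statusCode);
    span.setAttribute("duration_ms", duration);

    // Mark server errors so they stand out in the trace view
    if (res.statusCode >= 500) span.setStatus({ code: SpanStatusCode.ERROR });

    // Tag slow requests; the alert rule itself lives in your tracing backend
    if (duration > 1000) {
      span.setAttribute("alert", "slow_request");
    }
    span.end();
  });
  next();
});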

Ship OpenTelemetry on day one. When something breaks at scale, you'll know exactly which query, which service, and which user is affected — in seconds, not hours.
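
To get from request-level spans down to "which query", the same idea can wrap individual operations. A minimal sketch, assuming a tracer with the `startSpan` / `setAttribute` / `end` shape used in the middleware above. `TracerLike`, `SpanLike`, and `traced` are hypothetical names, and the attribute keys are placeholders rather than official semantic conventions.

```typescript
// Minimal structural types so the helper can be exercised without a tracing SDK.
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): unknown;
  end(): void;
}

interface TracerLike {
  startSpan(name: string): SpanLike;
}

// Runs fn inside a span and records its outcome and duration.
async function traced<T>(
  tracer: TracerLike,
  name: string,
  fn: () => Promise<T>,
  now: () => number = Date.now, // injectable clock, mainly for tests
): Promise<T> {
  const span = tracer.startSpan(name);
  const start = now();
  try {
    const result = await fn();
    span.setAttribute("outcome", "ok");
    return result;
  } catch (err) {
    span.setAttribute("outcome", "error");
    throw err; // the caller still sees the original error
  } finally {
    span.setAttribute("duration_ms", now() - start);
    span.end(); // always end the span, even on failure
  }
}
```

A call such as `traced(tracer, "db.getUser", () => getUser(id))` would then give each database lookup its own span.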

The Scale Checklist

Before launching to production, make sure you have:

  • Read replica(s) for your database
  • Redis caching with TTL strategy
  • Background job queue for async work
  • Clear module boundaries in your codebase
  • Structured logging and distributed tracing
  • Load testing results (we use k6)
  • Database indexes on all frequently queried columns
  • Connection pooling (PgBouncer or equivalent)

What PMML Ships

Every production system we deliver includes these patterns by default. Not because every app will hit 10M users on day one, but because retrofitting scale onto a live system is far harder than building it in.

Building something that needs to scale? Start a conversation with us — we'll architect it right from day one.
