
System Design — Design WhatsApp (The Walkthrough I'd Use in an Actual Interview)

The most common mistake candidates make in a system design interview is jumping straight to the architecture diagram. They’ve seen the “design WhatsApp” question on a hundred YouTube videos, they have the answer memorized, and they start drawing boxes 30 seconds in.

That’s a fail signal for a senior interviewer. The job in the first 5 minutes isn’t to design — it’s to scope. WhatsApp the product is enormous. WhatsApp the system design question is whatever subset you and the interviewer agree to in those first 5 minutes. Get scoping right and the rest writes itself. Skip it and you’ll spend 40 minutes designing the wrong thing.

This post is a complete walk-through of the design, but with the framing I’d use in an actual loop. We’ll scope, estimate scale, pick the protocol, design the message pipeline, handle presence and read receipts, deal with media, then encryption, then scale. Throughout, I’ll flag the questions an interviewer is actually probing for at each step.


Step 1: Scope the Problem

Start by listing every feature WhatsApp has, then explicitly cut down with the interviewer. The list:

  • 1:1 messaging
  • Group messaging (up to 1024 members in real WhatsApp)
  • Voice and video calls
  • Status / Stories
  • Media sharing (photos, videos, documents, voice notes)
  • Read receipts and typing indicators
  • End-to-end encryption
  • Multi-device sync (linked devices)
  • Backup & restore
  • Communities, Channels, Payments, Business APIs

For a 45–60 minute interview, no one expects all of this. A reasonable scoping conversation:

You: “Should I focus on 1:1 and group messaging plus media, or should I include calls and status too?”

Interviewer: “Let’s focus on messaging and media. Skip calls.”

You: “End-to-end encryption — should I assume it as a requirement and design around it, or skip the crypto details?”

Interviewer: “Assume it’s required, you don’t need to design the protocol.”

That’s 90 seconds and you’ve cut the problem to a manageable size. Functional scope: 1:1 and group messaging with media, with E2EE assumed. Non-functional: low latency (<500ms message delivery to online recipient), high availability (99.99%), durability (no message loss), works offline (queue and sync).


Step 2: Back-of-Envelope Numbers

Senior interviewers want to see you reason about scale before architecture. WhatsApp has on the order of 2–3 billion monthly active users; take 2 billion daily actives. If the average user sends 40 messages a day, that’s 80B messages/day — roughly 1M messages/second on average and ~3M/sec at peak.

Storage: an average text message is ~100 bytes, so 80 billion × 100 bytes ≈ 8 TB of message data per day, just for text. Media is the real cost — if 10% of messages carry media at an average of 500 KB, that’s another ~4 PB/day.
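
If you want to keep the arithmetic honest at the whiteboard, the whole estimate fits in a few lines. Every input below is an assumption for the exercise, not an official figure:

// Back-of-envelope sanity check — all inputs are assumptions, not official numbers
fun main() {
    val dau = 2_000_000_000L                            // assumed daily actives
    val msgsPerDay = dau * 40                           // 40 msgs/user/day → 80 billion/day
    val msgsPerSecAvg = msgsPerDay / 86_400             // ≈ 926K/sec average
    val textBytesPerDay = msgsPerDay * 100              // ~100 bytes per text message → ~8 TB/day
    val mediaBytesPerDay = (msgsPerDay / 10) * 500_000  // 10% carry ~500 KB media → ~4 PB/day
    println("$msgsPerDay msgs/day, $msgsPerSecAvg msgs/sec avg, " +
            "${textBytesPerDay / 1_000_000_000_000} TB text/day, " +
            "${mediaBytesPerDay / 1_000_000_000_000_000} PB media/day")
}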

Connections: with hundreds of millions of users online at any given moment, you need a connection-handling tier that can hold tens of millions of persistent connections per region. This is one of the hardest parts and we’ll come back to it.

You don’t need exact numbers. You need to demonstrate that you know where the scaling pressure lands: the connection tier, the media pipeline, and the message fan-out for groups. Mention these three in your estimate and the interviewer knows you’re not going to design a single Postgres instance and call it done.


Step 3: Pick the Transport — Why XMPP / WebSocket and Not REST

The single most important architectural decision is the transport, and most candidates underweight it.

REST/HTTP polling is wrong here. Polling at 1-second intervals across 2 billion users is 2 billion requests/second just to check for new messages — orders of magnitude more than the message rate. Long-polling is better but still creates and tears down connections constantly.

The right answer is a persistent bidirectional connection. Two real options: WebSocket (simpler, widely supported) or a custom XMPP variant (what real WhatsApp historically ran, with more efficient binary framing). For an interview, WebSocket is the cleanest answer. Mention you’d use a binary encoding on top (Protobuf, MessagePack) for payload efficiency.

Critically, the connection lives between the device and an edge server / gateway tier, not your application servers directly. Why: connection management is its own scaling problem. The gateway holds the socket, terminates TLS, authenticates, and routes messages to backend services over fast intra-DC channels.

┌─────────────┐  WebSocket  ┌────────────────┐  gRPC/Kafka  ┌──────────────┐
│   Android   │◄───────────►│  Gateway Tier  │◄────────────►│  Msg Service │
│   Client    │   (TLS)     │  (millions of  │   (intra-DC) │  (stateless) │
└─────────────┘             │   connections) │              └──────┬───────┘
                            └────────────────┘                     │
                                                                   ▼
                                                            ┌──────────────┐
                                                            │  Cassandra   │
                                                            │  Message DB  │
                                                            └──────────────┘

This separation is exactly what unlocks horizontal scaling on the connection side without coupling it to your application logic.
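
On the client side, that persistent connection is only a few dozen lines. A minimal sketch, assuming OkHttp — the URL, auth header, and binary framing here are illustrative, not WhatsApp’s real protocol:

import okhttp3.*
import okio.ByteString
import java.util.concurrent.TimeUnit

// Sketch: one long-lived socket from the device to a gateway (OkHttp assumed).
fun connectToGateway(sessionToken: String): WebSocket {
    val client = OkHttpClient.Builder()
        .pingInterval(30, TimeUnit.SECONDS)          // keep-alive; lets both sides detect dead sockets
        .build()
    val request = Request.Builder()
        .url("wss://gateway.example.com/v1/connect") // illustrative endpoint
        .addHeader("Authorization", "Bearer $sessionToken")
        .build()
    return client.newWebSocket(request, object : WebSocketListener() {
        override fun onMessage(webSocket: WebSocket, bytes: ByteString) {
            // Binary frame (e.g. Protobuf) pushed by the gateway — decode, persist locally, then ACK
        }
        override fun onFailure(webSocket: WebSocket, t: Throwable, response: Response?) {
            // Reconnect with backoff; anything typed meanwhile waits in the local outbox
        }
    })
}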


Step 4: The Message Send Path

Walk through what happens when Alice sends “hey” to Bob. This is the heart of the design and where interviewers go deep.

  1. Alice’s Android client opens a WebSocket to a gateway (sticky session, keep-alive)
  2. Client constructs the message: { from: alice, to: bob, body: 'hey', client_msg_id: uuid, timestamp: 16... }
  3. Client sends over the socket and shows the message locally with status “sending”
  4. Gateway authenticates Alice’s session, forwards to the Message Service
  5. Message Service: assigns server-side message ID (monotonic per conversation), persists to message store, publishes to a delivery queue
  6. Delivery worker picks up the message, looks up Bob’s active sessions in a Presence Service, finds his gateway, pushes the message down his open socket
  7. Bob’s client receives, ACKs back to gateway, gateway ACKs to Message Service
  8. Message Service emits a delivery event — this propagates back to Alice’s client (status changes from “sent” → “delivered”)
  9. When Bob opens the chat, his client sends a read event — same path, status → “read”

The key insight: the client shows the message locally before any server confirms it. This is the “feels instant” UX that defines messaging apps. The status indicators (single check, double check, blue check) are async signals that catch up.

Senior follow-up: “What if Bob is offline?” The Message Service persists the message regardless. When Bob comes online, his client opens a socket, sends “give me messages since last_seen_id,” the gateway streams them down. This is the offline queue, and on Android it’s also why you keep a local SQLite/Room database that’s the source of truth for UI — the network is a sync mechanism, not a data dependency.
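
To make step 6 concrete, here’s a rough sketch of the delivery worker. Every interface in it is hypothetical — a real system would drive this from a queue and batch pushes per gateway:

// Sketch of the delivery step — hypothetical types, real systems feed this from Kafka
data class StoredMessage(
    val conversationId: String,
    val messageId: String,
    val recipientId: String,
    val ciphertext: ByteArray
)
data class GatewaySession(val gatewayId: String, val connectionId: String)

interface PresenceService { fun activeSessions(userId: String): List<GatewaySession> }
interface GatewayClient { fun push(session: GatewaySession, message: StoredMessage): Boolean }

class DeliveryWorker(private val presence: PresenceService, private val gateways: GatewayClient) {
    // Returns true if the message reached at least one live device;
    // false means it simply waits in the recipient's mailbox until they reconnect.
    fun deliver(message: StoredMessage): Boolean =
        presence.activeSessions(message.recipientId)
            .map { session -> gateways.push(session, message) }
            .any { it }
}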


Step 5: Designing the Message Store

Pick the database based on access patterns, not vendor familiarity.

Access patterns for messaging:

  • Heavy write throughput (millions per second globally)
  • Read by conversation, ordered by time
  • Reads typically scoped to last N messages or “since timestamp X”
  • Almost no random access by message ID
  • Messages are immutable once sent (no updates except metadata: read status)

This profile screams Cassandra or another wide-column store. Schema:

CREATE TABLE messages (
    conversation_id  UUID,
    message_id       TIMEUUID,    -- time-ordered UUID
    sender_id        UUID,
    body             BLOB,        -- encrypted payload
    media_ref        TEXT,        -- pointer to blob storage if media
    PRIMARY KEY ((conversation_id), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

Partition key is conversation_id — all messages for a conversation live on the same node, sorted by time. Reading the last 50 messages is one slice on one partition: very fast. Inserting is one write to one partition. Cassandra also handles the write throughput Postgres can’t.
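
As a quick sketch of how cheap that read is, here’s the slice query through the DataStax Java driver — the driver choice and the row mapping are assumptions, the CQL is just the schema above:

import com.datastax.oss.driver.api.core.CqlSession
import java.util.UUID

// Sketch: reading the latest messages is a single partition slice, already sorted DESC
// by the clustering key, so no sorting or scatter-gather happens at read time.
fun lastMessages(session: CqlSession, conversationId: UUID, limit: Int = 50) =
    session.execute(
        "SELECT message_id, sender_id, body, media_ref FROM messages WHERE conversation_id = ? LIMIT ?",
        conversationId, limit
    ).map { row -> row.getUuid("message_id") to row.getByteBuffer("body") }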

The trade-off you should call out: Cassandra’s eventual consistency means a read immediately after a write may not see the new message. Use QUORUM consistency on writes and reads if that matters, or rely on the application-level ACK chain instead.

For metadata that needs strong consistency (user accounts, contacts, group membership), use Postgres or Spanner. Different data, different store. Don’t try to make Cassandra do everything.


Step 6: Group Messaging and the Fan-Out Problem

1:1 messaging is straightforward. Groups are where things get interesting because of fan-out: one send becomes N delivery operations.

Two strategies:

Fan-out on write. When Alice sends to a group of 200, the server writes 200 copies (one per recipient’s mailbox), and each delivery is a separate operation. Heavy on writes, light on reads.

Fan-out on read. One write to the group’s message log, every member reads from the shared log. Light on writes, requires every reader to track their own “last read” cursor.

WhatsApp uses fan-out on write for delivery (because mobile clients aren’t always online to read), with the message also stored once in the conversation log for history. The hybrid works because your storage is cheap (Cassandra) and your bottleneck is connection density.

For very large groups (1000+ members), fan-out on write becomes a problem — one send turns into 1000 delivery operations. Mitigations: batch them per-gateway (one fan-out per gateway holding multiple recipients), use a backpressure-aware queue (Kafka), and rate-limit truly enormous groups. You don’t need to solve the 100K-member channel problem in this interview unless asked — but mention you’re aware groups have a different scaling profile.
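
A rough sketch of the per-gateway batching idea — every type here is hypothetical; it’s the shape that matters:

// Sketch: fan-out on write, batched per gateway (hypothetical types).
// One push per gateway instead of one per recipient keeps large groups manageable.
data class Recipient(val userId: String, val gatewayId: String?)   // null = currently offline
interface GroupGatewayClient { fun pushBatch(gatewayId: String, userIds: List<String>, payload: ByteArray) }

fun fanOut(recipients: List<Recipient>, payload: ByteArray, gateways: GroupGatewayClient): List<String> {
    val (online, offline) = recipients.partition { it.gatewayId != null }
    online.groupBy { it.gatewayId!! }                      // one batch per gateway holding members
        .forEach { (gatewayId, members) ->
            gateways.pushBatch(gatewayId, members.map { it.userId }, payload)
        }
    return offline.map { it.userId }                       // these wait in their mailboxes
}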


Step 7: Presence and Typing Indicators — Cheap-Looking, Surprisingly Hard

“Last seen” and typing indicators look like trivial features. They’re actually one of the heaviest sub-systems.

Why: presence updates fire constantly. User opens app → online. User backgrounds app → offline. User types → typing event every keystroke. Multiply by 2 billion users and naive presence becomes a flood.

Design choices:

Presence Service is in-memory only. Don’t persist to disk. Use Redis or a custom in-memory store sharded by user ID. Lookups are O(1).

Push presence only to interested parties. When Bob opens a chat with Alice, his client subscribes to Alice’s presence. Only then does Alice’s “came online” event get pushed to Bob’s gateway. Subscription expires when Bob backgrounds. This bounds the fan-out of presence events to active conversations.

Throttle typing events. Client sends one “typing” per 3 seconds, not per keystroke. Server forwards once. UI shows the indicator for 5 seconds before fading.

Heartbeat-based offline detection. Don’t track explicit logout — mobile networks drop connections constantly. Track presence as “active socket within last N seconds.” If no heartbeat for 30s, mark offline.
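
A minimal sketch of that heartbeat model, assuming Redis via Jedis — the key names and TTL are illustrative:

import redis.clients.jedis.Jedis

// Sketch: presence as a Redis key with a TTL. The client's keep-alive refreshes the key;
// if heartbeats stop, the key expires and the user reads as offline — no explicit logout needed.
class PresenceStore(private val redis: Jedis) {
    fun heartbeat(userId: String, gatewayId: String) {
        redis.setex("presence:$userId", 30L, gatewayId)   // 30s TTL ≈ "no heartbeat for 30s → offline"
    }
    fun isOnline(userId: String): Boolean = redis.exists("presence:$userId")
    fun gatewayFor(userId: String): String? = redis.get("presence:$userId")
}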

Read receipts use the same delivery infrastructure as messages but with a special type. They’re a normal message in the queue, just lighter weight.


Step 8: Media — Don’t Send Bytes Through the Message Pipe

Sending a 5 MB photo through the message pipeline and into Cassandra is the wrong design. Media takes a separate path:

  1. Alice picks a photo. Client computes its hash (SHA-256).
  2. Client requests a pre-signed upload URL from the Media Service, passing the hash.
  3. If the hash already exists in storage (someone uploaded the same photo before), Media Service returns the existing URL — deduplication, big storage savings.
  4. If new, Media Service issues a pre-signed S3 URL, valid for 5 minutes.
  5. Client uploads directly to S3 (object storage).
  6. Once upload completes, client sends a message through the normal pipe: text body is empty, media_ref points to the S3 object key.
  7. Bob’s client receives the message, sees the media_ref, downloads from S3 (or CDN edge node).

This means Cassandra never holds binary blobs (it’s bad at them anyway). The message pipe stays small and fast. The CDN handles bulk byte movement near the user.

For E2EE, media is encrypted client-side before upload. The recipient’s client gets the decryption key via the (encrypted) message body. The CDN serves opaque ciphertext — even WhatsApp can’t see what’s in the photo.

Senior follow-up: “How do you handle thumbnail previews if everything is encrypted?” The sending client generates the thumbnail locally, encrypts it, embeds it in the message body. Fast preview, full media downloads on tap.
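
A sketch of the upload half of that flow — the MediaApi interface is hypothetical, OkHttp does the PUT, and the client-side encryption step is elided:

import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import java.security.MessageDigest

// Sketch: hash → ask for an upload slot → PUT directly to object storage.
// In the real flow the bytes are encrypted before hashing and upload.
interface MediaApi { suspend fun requestUploadUrl(sha256Hex: String): UploadSlot }
data class UploadSlot(val alreadyExists: Boolean, val objectKey: String, val presignedUrl: String?)

suspend fun uploadMedia(bytes: ByteArray, api: MediaApi, http: OkHttpClient): String {
    val hash = MessageDigest.getInstance("SHA-256").digest(bytes).joinToString("") { "%02x".format(it) }
    val slot = api.requestUploadUrl(hash)
    if (slot.alreadyExists) return slot.objectKey           // dedup hit — nothing to upload
    val put = Request.Builder()
        .url(slot.presignedUrl!!)
        .put(bytes.toRequestBody("application/octet-stream".toMediaType()))
        .build()
    http.newCall(put).execute().use { check(it.isSuccessful) { "upload failed: ${it.code}" } }
    return slot.objectKey                                    // goes into media_ref on the message
}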


Step 9: Multi-Device Sync

The user has WhatsApp on phone, tablet, and web. All three should show the same state. This is harder than it sounds because of E2EE — the encryption keys live on each device, not on the server.

The model: there’s one “primary” device (phone) that holds the master identity key. Other devices register as “linked devices” with their own per-device keys. The Signal Protocol (which WhatsApp uses) supports this with per-device sessions — a sender encrypts each message N times, once for each of the recipient’s devices.

For an interview, you don’t need to detail the crypto. You need to call out the design implications:

  • Server stores ciphertext per recipient device, not per recipient user
  • Adding a new linked device requires syncing message history — either re-encrypted from a backup, or via a key transfer from the primary device
  • Removing a device must invalidate its keys server-side and re-key future messages
  • This is where backup/restore gets complex: backing up encrypted messages requires storing the keys somewhere (encrypted user-controlled cloud backup is the WhatsApp answer)

Mention these and you’re demonstrating awareness of the cross-cutting concerns. Don’t go deeper unless asked.
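
One concrete way to hold the first implication in your head: the unit the server stores and routes is a per-device envelope, not a per-user message (names illustrative):

// Sketch: ciphertext is addressed per (recipient, device), not per recipient.
data class DeviceEnvelope(
    val recipientUserId: String,
    val deviceId: String,          // each linked device has its own keys and its own copy
    val ciphertext: ByteArray
)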


Step 10: The Android Client — Local Source of Truth

This is the part most system design write-ups ignore but matters massively for the user experience. The Android client isn’t a thin view of the server — it’s a full local replica.

// Conceptual local schema, mirrors the server’s but device-specific
@Entity
data class LocalMessage(
    @PrimaryKey val messageId: String,
    val conversationId: String,
    val senderId: String,
    val body: String,         // decrypted plaintext — safe, never leaves device
    val timestamp: Long,
    val status: MessageStatus // SENDING, SENT, DELIVERED, READ, FAILED
)

enum class MessageStatus { SENDING, SENT, DELIVERED, READ, FAILED }

Why this matters for the design:

UI reads from local DB only. The chat list, message list, search — all from Room. Network is a sync layer, not a UI dependency. This is what makes WhatsApp feel instant on a slow connection.

// The repository pattern that makes this work
class MessageRepository @Inject constructor(
    private val localDb: MessageDao,
    private val gatewayClient: GatewayClient
) {
    fun observeConversation(id: String): Flow<List<LocalMessage>> =
        localDb.observeMessages(id)
        // UI binds to this Flow — updates whenever local DB changes

    suspend fun send(text: String, conversationId: String) {
        val msg = LocalMessage(
            messageId = UUID.randomUUID().toString(),
            conversationId = conversationId,
            senderId = currentUserId,
            body = text,
            timestamp = System.currentTimeMillis(),
            status = MessageStatus.SENDING
        )
        localDb.insert(msg)
        // ✅ UI updates immediately via the Flow above

        try {
            val serverId = gatewayClient.sendMessage(msg)
            localDb.updateStatus(msg.messageId, MessageStatus.SENT, serverId)
        } catch (e: Exception) {
            localDb.updateStatus(msg.messageId, MessageStatus.FAILED)
            // Retry handled by WorkManager
        }
    }
}

Outbox pattern for sends. Anything queued with status SENDING gets retried on reconnect. WorkManager handles the “send when network returns” case for messages typed offline.

Sync on socket open. When the WebSocket connects after being offline, client sends “last_message_id_per_conversation” — server streams down everything since. Client merges into Room, UI updates via the Flow.
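
A sketch of that catch-up, reusing the repository shape from above — lastMessageIdPerConversation, fetchSince, and insertAll are hypothetical methods added for illustration, not APIs defined earlier:

// Sketch: catch-up sync when the socket (re)connects. Merged rows land in Room,
// so the UI updates through the same Flow that serves live messages.
suspend fun syncOnConnect(localDb: MessageDao, gatewayClient: GatewayClient) {
    val cursors = localDb.lastMessageIdPerConversation()   // Map<conversationId, last local messageId>
    val missed = gatewayClient.fetchSince(cursors)         // server streams down everything newer
    localDb.insertAll(missed)                              // idempotent upsert keyed by messageId
}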

This pattern (offline-first, sync via Flow) is the right architecture for almost any app where network is intermittent. It’s not a WhatsApp specialty — it’s how every modern messaging or collaborative app works.


Step 11: Scaling and Geographic Distribution

If asked to scale to global, talk about:

Regional gateway clusters. Users in India connect to Mumbai. Users in Brazil connect to São Paulo. DNS-based routing or anycast IPs.

Cross-region replication for the message store. Cassandra does this natively. Writes go to the local region first, replicated async to others. For 1:1 messaging where both users are in-region, latency stays low. Cross-region messaging (Alice in India texts Bob in the US) costs the inter-DC round trip but is unavoidable.

Connection rebalancing. Gateways are stateful (they hold sockets). When you add a gateway, existing connections stay on old ones until they reconnect. Don’t try to migrate live connections — it’s not worth the complexity.

Caching. User profile, contact list, group membership — cache in Redis with short TTL. Don’t cache messages (they’re already partitioned and fast to read by conversation).


The Architecture — In One Picture

                          ┌──────────┐
                          │  Mobile  │
                          │  Client  │
                          └─────┬────┘
                                │ WebSocket (TLS, Protobuf)
                ┌───────────────┴───────────────┐
                │       Gateway Tier            │
                │   (regional, stateful,        │
                │    millions of connections)   │
                └─┬─────────────┬───────────┬───┘
                  │             │           │
                  ▼             ▼           ▼
           ┌──────────┐  ┌──────────┐  ┌──────────┐
           │ Message  │  │ Presence │  │  Media   │
           │ Service  │  │ Service  │  │ Service  │
           │ stateless│  │ (Redis)  │  │          │
           └────┬─────┘  └──────────┘  └────┬─────┘
                │                           │
                ▼                           ▼
        ┌──────────────┐             ┌──────────────┐
        │  Cassandra   │             │  Object Store│
        │  (messages)  │             │  + CDN (S3)  │
        └──────────────┘             └──────────────┘

        ┌──────────────┐             ┌──────────────┐
        │  Postgres    │             │   Kafka      │
        │  (users,     │             │  (delivery   │
        │   groups)    │             │   queue)     │
        └──────────────┘             └──────────────┘

A gateway tier, three backend services, three storage layers, and one queue. Each piece does one thing. That’s the recommendation.


What Senior Interviewers Are Actually Probing

Everything above this section is setup. The deep-dive prompts that separate hire from no-hire:

“What happens if the gateway crashes mid-message?” — Tests whether you understand at-least-once delivery. Client retries on reconnect using client_msg_id; server deduplicates. Idempotency keys are how this works.
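
If you want a one-screen way to say it, the dedup boils down to an idempotency check before persisting — illustrative only; in practice this is a conditional insert or unique constraint in the message store:

import java.util.concurrent.ConcurrentHashMap

// Sketch: at-least-once delivery made safe by an idempotency key (client_msg_id).
class IdempotentReceiver {
    private val processed = ConcurrentHashMap.newKeySet<String>()
    fun onMessage(clientMsgId: String, persist: () -> Unit) {
        if (processed.add(clientMsgId)) persist()   // first delivery: persist and forward
        // a retried clientMsgId is ACKed back to the client but not persisted again
    }
}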

“How do you guarantee message ordering in a group?” — Per-conversation ordering via the Cassandra clustering key (TIMEUUID). Cross-conversation ordering is not guaranteed and shouldn’t be (no useful semantics).

“What if two people send simultaneously in a group?” — Server assigns the canonical order. Clients reconcile when they see the server-assigned IDs.

“How does ‘message edited’ work given E2EE and immutable messages?” — New encrypted message with a reference to the original. Clients overlay the edit in UI. Audit trail preserved.

“If Cassandra is eventually consistent, how do you avoid showing ‘phantom’ messages that disappear?” — Read your own writes via local DB; trust the server only as a sync source. The local DB is authoritative for the UI.

Each of these takes 60–90 seconds to answer well. If the interviewer asks four of them, you’ve had a good interview.


Common Failures in This Interview

Designing only the happy path. Senior interviewers will ask about offline, network drop, message loss, conflict resolution. If you didn’t plan for these, you’ll improvise badly.

Picking Postgres for messages. It’ll work at low scale and fall over at the rates we calculated. Pick the right tool and explain why.

Conflating presence with delivery. They’re different services with different consistency requirements. Presence is in-memory and lossy. Delivery is durable and guaranteed.

Forgetting the client. “Where does the data live on the device” is half the user experience. Mobile interviewers especially want to hear about local storage, sync, offline-first.

Trying to design too much. Don’t cover calls, Status, Payments, Communities unless asked. Depth beats breadth.


Closing

The trick to system design interviews isn’t knowing the “answer” — there isn’t one. It’s having a structured way to scope, estimate, design, and probe trade-offs out loud. WhatsApp is a useful exercise because it touches every interesting distributed-systems problem: persistent connections at scale, fan-out, eventual consistency, encrypted state, mobile sync.

If you remember three things: scope before designing, separate the connection tier from application logic, and the mobile client is a full local replica with the network as a sync layer. The rest you can derive from there.

Happy coding!
