System Design — Design Instagram Feed (Push vs. Pull, the Decision That Shapes Everything)

The thing that makes “design Instagram’s feed” a great interview question is that it has one pivotal architectural decision and a candidate’s answer to it tells you everything about their seniority. Get that decision right, defend the trade-offs, and the rest of the design follows. Get it wrong — or worse, not realize there’s a decision to make — and you’ll spend the interview retrofitting around it.

The decision: do you compute each user’s feed when they post, or when their followers open the app? Push vs. pull. Fan-out-on-write vs. fan-out-on-read. Every other choice in this design — storage, caching, ranking, image delivery — gets shaped by it.

This post is the full walk-through with that decision as its spine. We’ll scope, estimate, work through both models honestly (including why each is wrong on its own), arrive at the hybrid that real Instagram-scale systems use, then handle ranking, image delivery, the Android client, and the deep-dive questions senior interviewers actually ask. The structure differs from my WhatsApp post because the problem itself has a different shape: messaging is about getting bytes from A to B; feeds are about choosing which bytes to show out of millions of options.


Step 1: Scope — What “Instagram Feed” Means in 45 Minutes

Instagram has feeds, Reels, Stories, Explore, DMs, Shopping, and a dozen other surfaces. Most of those are out of scope. Get explicit:

You: “I’ll focus on the home feed — the chronological-or-ranked timeline of posts from people the user follows. Should I include Stories, Reels, or Explore?”

Interviewer: “Just the home feed. Photo and video posts.”

You: “Should I cover ranking, or assume chronological for now?”

Interviewer: “Cover ranking briefly — not the ML model, but where it sits in the architecture.”

That gives you the right scope: home feed, photo/video, ranking-aware but not ML-deep, mobile-first. Functional requirements: a user opens the app, sees a feed of posts from people they follow, can scroll to load more, can like/comment. Non-functional: feed loads in <500ms (cold) and <100ms (warm), images load progressively, works on flaky mobile networks, scales to billions of users with celebrity-level fan-out skew.


Step 2: Numbers — The Scale That Forces the Architecture

The reason this is hard: scale.

  • ~2 billion monthly users, ~1 billion daily
  • Average user follows ~150 accounts; some follow thousands
  • Average user posts ~once per few days; influencers post multiple times a day; celebrities have 100M+ followers
  • Photos: ~95B viewed per day; videos: 100B+ counting Reels (out of scope but informative)
  • Average post: image ~250KB after compression, video ~3MB compressed for feed playback

The pressure points to flag in your interview:

Read-heavy. Most users open the app >10 times a day, post <1 time. Read:write is roughly 100:1. Architecture should optimize reads.

Celebrity skew. A user with 100M followers posts once. That single write has to reach 100M feeds. The fan-out math here is brutal.

Image bandwidth dominates. Text metadata is tiny; images and videos are 99% of bytes served. The CDN is doing more work than the database.

You don’t need exact numbers. You need to demonstrate awareness that reads vastly outnumber writes, that the fan-out distribution is wildly uneven, and that media delivery is a separate bandwidth problem from feed delivery. Mention these three and you’ve framed the design.


Step 3: The Pivotal Decision — Push vs. Pull vs. Hybrid

Here’s where the question earns its keep. There are three models, each with real trade-offs.

Push Model (Fan-Out on Write)

When Alice posts, the system writes that post into the feed cache of every one of Alice’s followers. When Bob opens the app, his feed is already pre-computed — just read his timeline cache and return.

// Conceptual flow
1. Alice posts a photo
2. PostService writes the post to the post store
3. PostService publishes “new post by Alice” to a fan-out queue
4. Fan-out workers look up Alice’s followers (say, 500 people)
5. For each follower, write the post ID into their personal timeline cache
6. When Bob opens the app, read Bob’s timeline cache directly — instant

Wins: reads are O(1) cache lookups. Open-app latency is amazing. Feed is always ready before the user asks for it.

Loses: celebrity problem is brutal. A celebrity with 100M followers posts once and you have 100M cache writes to do. That’s a write storm. Storage is also enormous — you’re storing every post N times where N is the follower count.

Worse: users who haven’t opened the app in a year still get every post fanned-out into their cache. Wasted writes. Wasted storage.
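
To make the write path concrete, here’s a minimal sketch of a fan-out worker in Kotlin, assuming Jedis as the Redis client and a feed:{userId} sorted-set key scheme; FollowerStore is a hypothetical wrapper over the social graph service:

import redis.clients.jedis.JedisPooled

// Hypothetical lookup backed by the social graph service.
interface FollowerStore {
    fun followersOf(userId: String): List<String>
}

class FanOutWorker(
    private val followers: FollowerStore,
    private val redis: JedisPooled = JedisPooled("localhost", 6379),
    private val timelineCap: Long = 1000  // keep only the ~1000 newest post IDs per user
) {
    // Called once per "new post" event consumed from the fan-out queue.
    fun fanOut(authorId: String, postId: String, createdAtMillis: Long) {
        for (followerId in followers.followersOf(authorId)) {
            val key = "feed:$followerId"
            // Sorted set scored by timestamp, so reads are a single top-K range query.
            redis.zadd(key, createdAtMillis.toDouble(), postId)
            // Trim to the newest timelineCap entries (lowest ranks are oldest).
            redis.zremrangeByRank(key, 0L, -(timelineCap + 1))
        }
    }
}

That inner loop is the whole problem: for a 500-follower account it’s 500 cheap writes; for a 100M-follower account it’s a write storm.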

Pull Model (Fan-Out on Read)

When Alice posts, the system writes one post record. Done. When Bob opens the app, the system looks up everyone Bob follows, fetches recent posts from each, merges them, returns the result.

// Conceptual flow
1. Alice posts a photo
2. PostService writes one post record. That’s it.
3. When Bob opens the app:
   a. Look up Bob’s following list (say, 150 accounts)
   b. For each, query “recent posts since X”
   c. Merge results, sort by timestamp (or rank)
   d. Return the top N

Wins: writes are trivial — one record per post. Storage is minimal. No fan-out storm on celebrity posts. Inactive users cost nothing.

Loses: reads become expensive. Every feed open does ~150 queries (one per followed account), then a merge step. With 1B daily users opening the app multiple times, that’s hundreds of billions of queries per day (millions per second). Database melts.

Worse: latency. The merge step doesn’t parallelize well past a point, and the slowest sub-query gates the whole feed.
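
For contrast, a sketch of the pull-model read, using Kotlin coroutines to parallelize the per-followee queries; Post and PostClient are hypothetical stand-ins for the post store API:

import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope

data class Post(val id: String, val authorId: String, val createdAt: Long)

// Hypothetical client for "recent posts by user X since T".
interface PostClient {
    suspend fun recentPosts(userId: String, sinceMillis: Long): List<Post>
}

suspend fun pullFeed(
    following: List<String>,   // ~150 accounts for an average user
    posts: PostClient,
    sinceMillis: Long,
    limit: Int = 30
): List<Post> = coroutineScope {
    following
        .map { async { posts.recentPosts(it, sinceMillis) } }  // one query per followee
        .awaitAll()
        .flatten()
        .sortedByDescending { it.createdAt }  // chronological merge; ranking slots in here
        .take(limit)
}

Even fully parallelized, the slowest of those ~150 sub-queries gates the whole response, and every single app open pays this cost.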

Hybrid — What Real Systems Do

Neither model survives at Instagram scale on its own. The hybrid:

  • Most users use push. If you have 500 followers, fan-out on write is cheap. Pre-compute their timelines.
  • Celebrities use pull. If you have >1M followers, don’t fan out. Mark the account as “celebrity” (in production this is more nuanced — based on follower count, posting frequency, follower activity). On feed read, the system fetches the timeline cache (push-built for non-celebs) AND queries celebrity posts directly, then merges.
  • Inactive users get demand-driven push. If a user hasn’t opened the app in a month, don’t fan out to their cache. When they next open the app, run a one-time pull-model rebuild for them.

The threshold (the “celebrity boundary”) is tunable and product-driven. Real numbers are around 100K–1M followers. Above that, pull. Below, push. The exact number is determined by measuring.
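
The write-time decision itself reduces to a small branch. A sketch, where CELEBRITY_THRESHOLD and the 30-day activity window are illustrative assumptions, not measured values:

const val CELEBRITY_THRESHOLD = 1_000_000L  // tunable; found by measuring, not guessing

data class Author(val id: String, val followerCount: Long)

// Per post: fan out on write for normal accounts, defer to read-time pull for celebrities.
fun shouldFanOut(author: Author): Boolean =
    author.followerCount < CELEBRITY_THRESHOLD

// Per follower, inside the fan-out worker: skip dormant users; they get a
// one-time pull-model rebuild on their next app open instead.
fun shouldDeliverTo(lastActiveMillis: Long, nowMillis: Long): Boolean =
    nowMillis - lastActiveMillis < 30L * 24 * 60 * 60 * 1000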

Senior signal: walking the interviewer through both pure models, identifying the failure modes of each, and arriving at the hybrid as a deliberate design — not jumping to “hybrid” because you read it on a blog. The reasoning is the answer.


Step 4: The Storage — Three Stores, Three Jobs

One database can’t do this. Three storage layers, each chosen for what it does well:

Post Store. Holds the actual posts — metadata (caption, author, timestamp, location), references to media. Write-once-read-many. Modest update load (likes, comments stored elsewhere). Use Cassandra partitioned by user ID, clustered by post timestamp. Same reasoning as in the WhatsApp design: wide-column stores excel at “all posts by user X, ordered by time, give me the last 50.”

CREATE TABLE posts (
    user_id      UUID,
    post_id      TIMEUUID,    -- time-ordered
    caption      TEXT,
    media_refs   LIST<TEXT>,  -- pointers to object storage
    location     TEXT,
    created_at   TIMESTAMP,
    PRIMARY KEY ((user_id), post_id)
) WITH CLUSTERING ORDER BY (post_id DESC);

Timeline Cache (the push-built feeds). Per-user feed state, holding ~1000 most recent post IDs. Use Redis — in-memory, sharded by user ID, sorted set by timestamp/rank score. Reads are O(log N + K) for “give me the top K.” This is the cache that fan-out-on-write writes to and that feed reads pull from.
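
The read side of that cache is a single range query; a two-line sketch with Jedis, reusing the hypothetical feed:{userId} key scheme from the fan-out sketch above:

import redis.clients.jedis.JedisPooled

// Top-K post IDs for a user, newest first; O(log N + K) inside Redis.
fun topPostIds(redis: JedisPooled, userId: String, k: Int): List<String> =
    redis.zrevrange("feed:$userId", 0L, (k - 1).toLong())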

Social Graph Store. Who follows whom. Read-heavy (every feed pull touches this), needs to handle “list followers of user X” and “list users X follows” with low latency. Use a graph store (Neo4j, FlockDB-style) or a sharded relational store with careful indexing. Many companies use a custom service here because the access patterns are specific.

Notice none of these is “a Postgres instance with all the data.” The whole point of the design is recognizing that posts, timelines, and the social graph have wildly different access patterns and shouldn’t share a database.


Step 5: The Feed Read Path — Walked Through

What happens when a user opens the app and pulls to refresh:

  1. Client requests /v1/feed?cursor=X from the API gateway
  2. Gateway authenticates, forwards to Feed Service
  3. Feed Service reads the user’s timeline cache from Redis (top ~1000 post IDs ranked by score)
  4. Feed Service checks: any of the user’s followed accounts marked “celebrity”? If yes, query their recent posts from Post Store directly
  5. Merge celebrity posts with cached timeline, applying ranking score
  6. For the top N post IDs (say, 30), batch-fetch full post metadata from Post Store
  7. For each post, attach signed URLs to images/videos (pointing at CDN)
  8. Apply privacy filters (blocked users, hidden posts), de-duplicate
  9. Return the list to the client

The merge in step 5 is where the hybrid does its work. The cached timeline gives Bob a pre-ranked feed of ~990 normal posts; querying directly from Justin Bieber’s post stream adds whatever’s recent there. The ranking layer then reconciles everything into one ordered list.

Senior follow-up: “What if the timeline cache misses?” Cache miss means “rebuild from pull-model.” The user waits a moment longer (~500ms vs. ~50ms) while the system runs a one-time pull. Cache miss recovery should be invisible to the user beyond the initial wait. Mark the timeline as warm so future reads are fast.
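
Here’s a sketch of steps 3–5 plus the cache-miss fallback as one Feed Service method; every collaborator interface is a hypothetical stand-in for the real services:

data class Post(val id: String, val authorId: String, val createdAt: Long)
data class Followee(val id: String, val isCelebrity: Boolean)

interface TimelineCache {
    fun topPostIds(userId: String, k: Int): List<String>?  // null signals a cache miss
    fun warm(userId: String, postIds: List<String>)
}
interface PostStore {
    suspend fun byIds(ids: List<String>): List<Post>
    suspend fun recentPosts(userId: String, sinceMillis: Long): List<Post>
}
interface SocialGraph { suspend fun following(userId: String): List<Followee> }
interface Ranker { suspend fun rank(userId: String, candidates: List<Post>): List<Post> }

class FeedService(
    private val timeline: TimelineCache,
    private val posts: PostStore,
    private val graph: SocialGraph,
    private val ranker: Ranker
) {
    suspend fun feed(userId: String, sinceMillis: Long, limit: Int = 30): List<Post> {
        val following = graph.following(userId)

        // Step 3: push-built timeline, or a one-time pull rebuild on cache miss.
        val cachedIds = timeline.topPostIds(userId, 1000) ?: run {
            val rebuilt = following
                .flatMap { posts.recentPosts(it.id, sinceMillis) }
                .sortedByDescending { it.createdAt }
                .take(1000)
                .map { it.id }
            timeline.warm(userId, rebuilt)  // mark warm so future reads are fast
            rebuilt
        }

        // Step 4: celebrities were never fanned out, so query their streams directly.
        val celebPosts = following
            .filter { it.isCelebrity }
            .flatMap { posts.recentPosts(it.id, sinceMillis) }

        // Step 5: merge both sources; the online ranker produces the final order.
        return ranker.rank(userId, posts.byIds(cachedIds) + celebPosts).take(limit)
    }
}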


Step 6: Ranking — Where Does the ML Live?

You’re not designing the ranking model. You’re designing where it sits in the architecture. The senior answer:

Two-stage ranking. Stage 1 (offline, runs every few minutes per user): a candidate generator picks the ~1000 posts that could be in your feed. Stage 2 (online, runs at request time): a faster scoring model re-ranks those 1000 to pick the top 30 right now, considering recency, engagement signals, and time-of-day.

The candidate generator can run as fan-out-on-write because it’s what produces the timeline cache: it answers “what posts might Bob care about?” The online ranker is what produces the actual order Bob sees, and runs in <50ms during feed load.

Why this matters architecturally: the ranking model needs features at request time (Bob’s recent activity, time of day, his current location, his last opened post). A feature store sits next to the Feed Service for this. Without the two-stage split, you’d need to evaluate the heavy model against millions of candidate posts at request time — impossible.

Don’t go deeper on the ML in an interview unless asked. Just demonstrate awareness that ranking is two stages and that real-time features need a feature store.
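
To make the split concrete, here’s a toy version of the stage-2 scorer; the features and weights are invented for illustration (the real thing is a learned model fed from the feature store):

import kotlin.math.exp

data class Candidate(
    val postId: String,
    val ageHours: Double,        // recency
    val engagementRate: Double,  // precomputed by the candidate generator, 0..1
    val authorAffinity: Double   // request-time feature from the feature store, 0..1
)

// Stage 2: re-rank ~1000 candidates in-process, inside the <50ms request budget.
fun rerank(candidates: List<Candidate>, topN: Int = 30): List<Candidate> =
    candidates
        .sortedByDescending { c ->
            // Exponential recency decay times a weighted engagement/affinity blend.
            exp(-c.ageHours / 24.0) * (0.6 * c.engagementRate + 0.4 * c.authorAffinity)
        }
        .take(topN)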


Step 7: Media — Separate Path, Separate Bandwidth

This part is largely the same as the WhatsApp design, with one major addition: image variants.

  1. User picks photo from gallery on Android client
  2. Client requests pre-signed upload URL from Media Service, sends hash for dedup
  3. If new, client uploads original to S3-compatible object storage
  4. Upload triggers a transcoding pipeline (Lambda/Cloud Function) that generates 4–6 image variants: thumbnail (150px), small (400px), medium (800px), large (1080px), HD (1440px)
  5. Each variant is uploaded to the CDN with cache-friendly headers
  6. The post record references the media ID; URLs for all variants are derived

The Android client picks the right variant based on viewport size and connection quality. On a slow connection, load the small variant first, then progressively upgrade. On a fast connection, load medium directly.
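
A sketch of that client-side choice, assuming the variant widths from the pipeline above; isActiveNetworkMetered is the real ConnectivityManager signal, while the selection policy itself is illustrative:

import android.content.Context
import android.net.ConnectivityManager

// Widths of the server-generated variants, smallest to largest.
private val VARIANT_WIDTHS = listOf(150, 400, 800, 1080, 1440)

fun pickVariantWidth(context: Context, viewportWidthPx: Int): Int {
    val cm = context.getSystemService(Context.CONNECTIVITY_SERVICE) as ConnectivityManager
    // Metered: take the largest variant that still fits the viewport and
    // upgrade progressively later. Unmetered: fetch the first variant at
    // least as wide as the viewport so it never looks soft.
    return if (cm.isActiveNetworkMetered) {
        VARIANT_WIDTHS.lastOrNull { it <= viewportWidthPx } ?: VARIANT_WIDTHS.first()
    } else {
        VARIANT_WIDTHS.firstOrNull { it >= viewportWidthPx } ?: VARIANT_WIDTHS.last()
    }
}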

Why this matters: serving a 4MB original photo to a phone displaying it at 400px is a bandwidth crime — easily 10× more bytes than needed. At Instagram scale, this is hundreds of millions of dollars per year of CDN cost.

For video, similar pipeline produces multiple bitrates for adaptive streaming (HLS or DASH). The Android player switches bitrate based on observed bandwidth.


Step 8: The Android Client — Where Mobile Engineering Earns Its Title

Most system design write-ups stop at the server. For a mobile interviewer, the client side is half the answer — and it’s where Android candidates have an asymmetric advantage if they use it.

The pieces:

Local feed cache (Room). The feed shown on app launch comes from local DB, not the network. The network is a sync layer. This is what makes the app feel “instant” even on slow connections.

import androidx.room.Entity
import androidx.room.PrimaryKey
import javax.inject.Inject
import kotlinx.coroutines.flow.Flow

@Entity(tableName = "feed_items")
data class FeedItem(
    @PrimaryKey val postId: String,
    val authorId: String,
    val caption: String?,
    val mediaUrls: List<String>,    // All variants; needs a Room @TypeConverter
    val createdAt: Long,
    val rankScore: Float,           // Server-provided
    val viewedLocally: Boolean = false,
    val likedLocally: Boolean = false  // Optimistic UI state
)

class FeedRepository @Inject constructor(
    private val feedDao: FeedDao,
    private val api: FeedApi,
    private val imageLoader: ImageLoader
) {
    fun observeFeed(): Flow<List<FeedItem>> = feedDao.observeAll()
    // UI binds to this Flow — updates whenever local DB changes
    // No network dependency for showing the feed

    suspend fun refresh() {
        val response = api.getFeed(cursor = null)
        feedDao.replaceAll(response.items.map { it.toEntity() })
        prefetchImages(response.items.take(10)) // First 10 visible
    }
}

Image prefetching. When the feed loads, prefetch images for the next 5–10 visible posts before the user scrolls to them. Coil and Glide both expose prefetching APIs. The trade-off: bandwidth vs. perceived performance. Aggressive prefetching is the right call on Wi-Fi; conservative on mobile data.
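
With Coil, prefetching is just an ImageRequest enqueued with no target; a sketch (the Wi-Fi/metered counts are a policy assumption, not anything Coil prescribes):

import android.content.Context
import coil.ImageLoader
import coil.request.ImageRequest

// Warm Coil's memory/disk cache for the posts the user is about to scroll to.
fun prefetch(context: Context, imageLoader: ImageLoader, urls: List<String>, onWifi: Boolean) {
    val count = if (onWifi) 10 else 3  // aggressive on Wi-Fi, conservative on metered data
    urls.take(count).forEach { url ->
        val request = ImageRequest.Builder(context)
            .data(url)
            .build()  // no target set, so Coil loads into cache only
        imageLoader.enqueue(request)
    }
}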

Pagination via cursor. The server returns posts plus a nextCursor token. Client requests the next page when the user scrolls near the end. Use Paging 3 for this on Android — it handles the load-more trigger, in-flight request deduplication, and page caching. Paged data flows into the LazyColumn via collectAsLazyPagingItems().

Optimistic UI for likes/comments. User taps the like button, UI updates immediately, request fires in the background. If the server rejects (rare), revert. The 100ms felt-difference between “optimistic” and “wait for server” is the entire reason Instagram feels responsive.
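
A sketch of that flow, with minimal stand-ins for the FeedDao/FeedApi from the repository above (setLiked and likePost are assumed signatures):

// Assumed signatures, not the real DAO/API from above.
interface LikeDao { suspend fun setLiked(postId: String, liked: Boolean) }
interface LikeApi { suspend fun likePost(postId: String) }

class LikeRepository(private val dao: LikeDao, private val api: LikeApi) {
    suspend fun like(postId: String) {
        dao.setLiked(postId, true)            // 1. local DB flips first; UI updates via the Flow
        runCatching { api.likePost(postId) }  // 2. network call fires in the background
            .onFailure { dao.setLiked(postId, false) }  // 3. revert on server rejection (rare)
    }
}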

Connection awareness. Listen to ConnectivityManager. On Wi-Fi, prefetch aggressively, load HD variants. On metered mobile data, load smaller variants, defer prefetching. In airplane mode, work entirely from local cache — user can scroll their cached feed even offline.

Mention these on the mobile interview and you’ve demonstrated that you understand the half of the system the server interviewer can’t fully evaluate.


Step 9: Architecture — In One Picture

                          ┌────────────────┐
                          │  Mobile Client │
                          │  (Room cache,  │
                          │   Paging 3)    │
                          └───────┬────────┘
                                  │ HTTPS / Protobuf
                  ┌───────────────┴───────────────┐
                  │       API Gateway / Edge      │
                  │   (auth, rate limit, routing) │
                  └─┬───────────┬───────────┬─────┘
                    │           │           │
                    ▼           ▼           ▼
          ┌──────────┐  ┌──────────┐  ┌──────────┐
          │   Feed   │  │  Post    │  │  Media   │
          │ Service  │  │ Service  │  │ Service  │
          └────┬─────┘  └────┬─────┘  └────┬─────┘
               │             │             │
       ┌───────┼───────┐     │             │
       ▼       ▼       ▼     │             │
   ┌─────┐ ┌─────┐ ┌─────┐   │             │
   │Redis│ │Rank │ │Soc. │   │             │
   │(TLs)│ │ Svc │ │Graph│   │             │
   └─────┘ └─────┘ └─────┘   │             │
                             ▼             ▼
                       ┌──────────┐  ┌──────────┐
                        │Cassandra │  │Obj. Store│
                       │ (posts)  │  │ + CDN    │
                       └──────────┘  └──────────┘

   ┌─────────────┐    ┌─────────────┐
   │   Kafka     │    │ Fan-Out     │
   │ (post-events│───▶│ Workers     │
   │  topic)     │    │ (push model)│
   └─────────────┘    └─────────────┘

Six services, four storage layers, one queue, and the fan-out worker pool. Each piece has one job. The Feed Service is the orchestrator that does the merge between push-built timelines and pull-model celebrity posts.


Step 10: The Deep-Dive Questions Senior Interviewers Actually Ask

The setup questions are above. These are the prompts that separate hire from no-hire:

“What happens if the timeline cache for a user is corrupted or stale?” — Answer: TTL on the cache, fallback to pull-model rebuild on cache miss. The recovery is built in.

“How do you handle a user who unfollows someone? Their cached timeline still has that user’s posts.” — Two options: lazily filter at read time using the current follow list, or eagerly invalidate timeline entries on unfollow. Real systems usually use lazy filtering — cheaper, and unfollow is rare relative to feed reads.

“What about new follows? The user follows someone and expects their posts in the feed.” — On follow event, queue a one-time backfill: pull recent posts from the new followee, merge into the follower’s timeline cache. Async, eventual consistency — user sees them within a minute.

“Hot post problem — what if one post gets a million likes in a minute?” — Like counts shouldn’t be in the post record (write contention). Stored in a separate counter store, periodically aggregated. Reads can use cached approximations — nobody notices if a celebrity post shows 1.2M likes vs. 1.21M.
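
One common shape for that counter store is a sharded counter: spread each post's count across N keys so no single key takes all the write contention, then sum (or cache an approximation) on read. A hedged sketch with Jedis; the key scheme and shard count are assumptions:

import redis.clients.jedis.JedisPooled
import kotlin.random.Random

class ShardedLikeCounter(private val redis: JedisPooled, private val shards: Int = 64) {
    // Writes land on a random shard, so a hot post spreads across all keys.
    fun increment(postId: String) {
        redis.incr("likes:$postId:${Random.nextInt(shards)}")
    }

    // Reads sum the shards; serve a cached approximation of this, not a live sum.
    fun approxCount(postId: String): Long =
        (0 until shards).sumOf { redis.get("likes:$postId:$it")?.toLong() ?: 0L }
}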

“How do you A/B test ranking changes without affecting all users?” — The Feed Service consults a feature-flag service per user. Different users get different ranking models, with logged outcomes. The architecture supports this because ranking is a separate service the Feed Service calls.

“What’s the latency budget for a feed open, and where does the time go?” — ~500ms cold-start budget. Auth + routing: ~30ms. Timeline cache read: ~10ms. Celebrity merge query: ~50ms. Ranking pass: ~50ms. Post metadata fetch: ~100ms. Image URL signing: ~10ms. Network: ~100ms. The bottleneck candidates are post metadata fetch (parallelizable) and network (mostly client-side).

Each of these takes 90 seconds to answer well. Hit four out of six and you’ve had a great interview.


Common Failures in This Interview

Picking pure push or pure pull and defending it religiously. The hybrid isn’t a compromise — it’s a recognition that the workload is bimodal (most users are normal, a few are celebrities) and one strategy fits each side. Sticking to one model when the interviewer presses on celebrity scale is a junior signal.

Forgetting the social graph is its own beast. “Just JOIN the followers table” doesn’t work at this scale. The graph is its own service with its own access patterns; calling that out demonstrates awareness.

Putting media in the database. Same lesson as WhatsApp: 4MB photos through Cassandra is the wrong tool for the job. Object storage + CDN, references in the database.

Skipping the client side entirely. Mobile interviewers want to hear about local cache, prefetching, optimistic UI. Server-only system design shows you don’t understand half the system you’re shipping.

Treating ranking as a black box. You don’t need to design the model, but you should know where it lives — offline candidate generation vs. online re-ranking, why both exist, what features go where. “ML magic” isn’t an answer.

Not estimating numbers. The whole architecture is shaped by “reads vs. writes,” “normal vs. celebrity,” “text vs. media bandwidth.” If you didn’t do back-of-envelope math, you can’t justify the design.


What Pairing This With WhatsApp Teaches You

If you’ve read the WhatsApp post, the contrast is the lesson. WhatsApp is about delivery — getting bytes from one user to specific other users with strong guarantees. Instagram is about selection — choosing which bytes to show out of millions of options. Different problem, different architecture.

WhatsApp uses fan-out on write because messages have a small, known set of recipients. Instagram uses hybrid because followers are a huge, uneven set. WhatsApp’s scaling pressure is connection density. Instagram’s is read amplification.

An interviewer giving you a third system design question — YouTube, Twitter, TikTok, Spotify — is testing whether you can pattern-match: is this a delivery problem or a selection problem? Is the workload uniform or skewed? Is bandwidth the bottleneck or query rate? Once you have those questions in muscle memory, you can design any of these systems on a whiteboard.


Closing

Instagram’s feed is one of the most famous interview prompts because the answer reveals so much about the candidate. It tests architectural taste (push vs. pull), scale awareness (celebrity skew, read-heaviness), service boundaries (post storage vs. timeline cache vs. social graph), media handling, and — if you’re lucky — mobile client design. Get the central decision right, defend it with numbers, walk through the consequences, and the rest writes itself.

The framework: scope first, estimate before designing, identify the pivotal decision, walk both extremes, justify the hybrid. That framework transfers to any feed/timeline/recommendation system. The specifics change; the shape of the reasoning doesn’t.

Happy coding!
