
Designing Real-Time Systems: WebSockets, SSE, and Event-Driven Architecture

AI Interview Trainer Team
Tags: WebSockets, SSE, Real-Time, System Design, Architecture

Real-time features are no longer optional — users expect live updates, instant notifications, and collaborative experiences. Designing these systems at scale is a common (and challenging) system design interview topic.

This guide covers the protocols, architectures, and trade-offs you need to ace any real-time system design question.

---

The Protocols

WebSockets — Full-Duplex Persistent Connection

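WebSockets start as an HTTP/1.1 request carrying an Upgrade header. Once the server answers 101 Switching Protocols, the same TCP connection stays open and carries WebSocket frames in both directions, so client and server can each send messages at any time.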

Handshake (client request):

GET /ws HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
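To accept the upgrade, the server replies with HTTP/1.1 101 Switching Protocols and Upgrade, Connection, and Sec-WebSocket-Accept headers. RFC 6455 defines the accept value as the Base64-encoded SHA-1 hash of the client's Sec-WebSocket-Key concatenated with a fixed GUID. The Node.js sketch below shows only that derivation. The names computeAcceptKey and buildUpgradeResponse are illustrative, and in practice a library such as ws handles this step.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key.
function computeAcceptKey(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WS_GUID)
    .digest('base64');
}

// Build the raw 101 response; the trailing empty entries produce the final blank line.
function buildUpgradeResponse(secWebSocketKey) {
  return [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket',
    'Connection: Upgrade',
    `Sec-WebSocket-Accept: ${computeAcceptKey(secWebSocketKey)}`,
    '',
    '',
  ].join('\r\n');
}

console.log(computeAcceptKey('dGhlIHNhbXBsZSBub25jZQ==')); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The browser recomputes this value and fails the connection if it does not match. That check keeps a proxy or a non-WebSocket server from accidentally "completing" the handshake.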

When to use:

- Bidirectional communication (chat, games, collaborative editing)

- High-frequency updates (trading platforms, live sports)

- Low-latency requirements (< 100ms)

Pitfalls:

- Connection management at scale (10k+ concurrent connections per server)

- Reconnection logic (exponential backoff with jitter)

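- Delivery guarantees: TCP keeps frames in order on a live connection, but anything in flight when the connection drops is lost unless you add acknowledgments and replay

- Compression is opt-in: the permessage-deflate extension (RFC 7692) must be negotiated during the handshake and costs CPU and memory per connection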

Server-Sent Events (SSE) — One-Way from Server

SSE uses standard HTTP. The server keeps a response open with Content-Type: text/event-stream and writes a stream of events to it, and the client listens through the EventSource API.

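const eventSource = new EventSource('/api/notifications');

// Fires for events sent without an "event:" field (type "message").
eventSource.onmessage = (event) => {
  console.log('New notification:', event.data);
};

// Fires only for events sent with "event: order-update".
eventSource.addEventListener('order-update', (event) => {
  const order = JSON.parse(event.data);
  updateOrderUI(order); // application-defined render function
});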
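On the server side, each event is a block of text lines. It can carry optional event:, id:, and retry: fields, has one data: line per line of payload, and ends with a blank line. The sketch below assumes a hypothetical helper named formatSseEvent that serializes one event into that wire format.

```javascript
// Serialize one Server-Sent Event into the text/event-stream wire format.
// Multi-line payloads become several "data:" lines; the browser rejoins them with "\n".
function formatSseEvent({ data, event, id, retry }) {
  const lines = [];
  if (event) lines.push(`event: ${event}`);               // named event, e.g. 'order-update'
  if (id !== undefined) lines.push(`id: ${id}`);          // resent as Last-Event-ID on reconnect
  if (retry !== undefined) lines.push(`retry: ${retry}`); // reconnection delay hint in ms
  for (const line of String(data).split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  return lines.join('\n') + '\n\n'; // the blank line terminates the event
}

const frame = formatSseEvent({
  event: 'order-update',
  id: 42,
  data: JSON.stringify({ orderId: 7, status: 'shipped' }),
});
process.stdout.write(frame);
```

With Node's http module, you would call res.writeHead(200, { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' }) once and then res.write(frame) for each event. The frame above would trigger the client's order-update listener rather than onmessage.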

When to use:

- One-way updates (notifications, stock tickers, feed updates)

- When you need to work through firewalls and proxies (uses standard HTTP)

- Simpler server-side implementation

Limitations:

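- Browser connection limit: over HTTP/1.1, browsers allow only about 6 concurrent connections per domain, shared by all tabs, so a few open SSE streams can exhaust it. HTTP/2 multiplexes streams over one connection and largely removes this limit.

- No native binary support: the stream is UTF-8 text, so binary payloads must be encoded (e.g., Base64)

- Reconnection is automatic but browser-managed: the server can only suggest a delay with the retry: field, and the browser resends the last id: value in the Last-Event-ID header

- Unidirectional only: the client must use separate HTTP requests to send data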

WebRTC — Peer-to-Peer Real-Time Communication

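WebRTC enables direct browser-to-browser communication for audio, video, and data. Peers exchange connection details (SDP offers/answers and ICE candidates) through a signaling server, often over WebSocket. Media then flows peer-to-peer, or through a TURN relay when NAT traversal fails.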

When to use: Video calls, screen sharing, P2P file transfer, low-latency gaming.

Interview context: WebRTC is typically discussed as a specialized solution — most real-time design questions focus on WebSockets with SSE as an alternative.

---

Scaling WebSockets: The Architecture

Single Server (Simple)

Client 1 --+
Client 2 --+-- WebSocket Server -- Redis Pub/Sub -- Database
Client 3 --+

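Works up to roughly 10k concurrent connections per server as a conservative planning figure. Tuned servers can hold considerably more, with the real ceiling set by memory per connection, file-descriptor limits, and message rate.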

Horizontal Scaling

                +-- WebSocket Server 1 --+
Load Balancer --+-- WebSocket Server 2 --+-- Redis Pub/Sub -- Database
                +-- WebSocket Server 3 --+

Key challenge: Client A connects to Server 1, Client B to Server 2. How does A send a message to B?

Solution — Redis Pub/Sub:

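Each server subscribes to a Redis channel for every user connected to it (for example, user:B). When Server 1 receives a message from A targeting B, it publishes to B's channel. Server 2, where B is connected, receives the event and forwards it through B's WebSocket. Redis Pub/Sub is fire-and-forget: a message published while nobody is subscribed is dropped, so durable history must be written to a database separately.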
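The routing step can be simulated without a real Redis. In the sketch below, Node's built-in EventEmitter stands in for Redis Pub/Sub, and each hypothetical ChatServer subscribes to a per-user channel (user:<id>) for every client it holds. The class, channel naming, and fake socket are illustrative assumptions. With a real Redis client you would replace broker.on and broker.emit with SUBSCRIBE and PUBLISH.

```javascript
const { EventEmitter } = require('events');

// In-memory stand-in for Redis Pub/Sub: like Redis, a message with no subscriber is dropped.
const broker = new EventEmitter();

class ChatServer {
  constructor(name) {
    this.name = name;
    this.sockets = new Map();  // userId -> socket-like object with send()
    this.handlers = new Map(); // userId -> channel listener, kept so we can unsubscribe
  }

  // A client connects here: remember its socket and subscribe to its channel.
  attach(userId, socket) {
    const handler = (message) => socket.send(message);
    this.sockets.set(userId, socket);
    this.handlers.set(userId, handler);
    broker.on(`user:${userId}`, handler);
  }

  // The client disconnects: unsubscribe so publishes for it no longer reach this server.
  detach(userId) {
    const handler = this.handlers.get(userId);
    if (handler) broker.off(`user:${userId}`, handler);
    this.handlers.delete(userId);
    this.sockets.delete(userId);
  }

  // A local client sends a message: publish it to the recipient's channel.
  // Returns false when no server holds the recipient, mirroring PUBLISH returning 0.
  route(fromUserId, toUserId, text) {
    return broker.emit(`user:${toUserId}`, JSON.stringify({ from: fromUserId, text }));
  }
}

// Hypothetical fake socket that records what it was sent.
const makeSocket = () => ({ received: [], send(msg) { this.received.push(msg); } });

const server1 = new ChatServer('server-1');
const server2 = new ChatServer('server-2');
const socketA = makeSocket();
const socketB = makeSocket();
server1.attach('A', socketA); // Client A is connected to Server 1
server2.attach('B', socketB); // Client B is connected to Server 2

server1.route('A', 'B', 'hello'); // Server 1 publishes; Server 2's subscription delivers to B
console.log(socketB.received);
```

A false return value, or a crash between publish and delivery, means the recipient must recover the message from persistent storage later. Pub/Sub handles live routing only.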

Production Architecture

                 +-- WebSocket Server --+
Load Balancer ---+-- WebSocket Server --+-- Redis Cluster (Pub/Sub + State)
(sticky sessions)+-- WebSocket Server --+
                      |
                  +---+---+
                  |       |
            Message Queue  Database
             (Kafka)      (PostgreSQL)
                  |
                  v
          Analytics / Monitoring

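Persistence layer: Store chat history in a write-optimized wide-column store such as Cassandra, partitioned by conversation and ordered by time. The relational data shown in the diagram stays in PostgreSQL.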

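Presence service: A Redis sorted set (ZSET) whose members are user IDs and whose scores are last-heartbeat timestamps. Users whose last heartbeat is older than a TTL count as offline and are periodically removed.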
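One common way to drive that sorted set has three steps. Each heartbeat runs ZADD presence <now> <userId>. A periodic job evicts stale members with ZREMRANGEBYSCORE presence -inf (<now - ttl>, where the "(" makes the bound exclusive. "Who is online" is then ZRANGEBYSCORE presence <now - ttl> +inf. The in-memory class below mirrors those commands so the logic runs without Redis. The key name presence, the class name, and the 30-second TTL are assumptions.

```javascript
// In-memory mirror of a Redis ZSET used for presence:
// member = userId, score = last heartbeat time in ms.
class PresenceTracker {
  constructor(ttlMs = 30000) {
    this.ttlMs = ttlMs;
    this.scores = new Map(); // userId -> last heartbeat timestamp
  }

  // ZADD presence <now> <userId>
  heartbeat(userId, now = Date.now()) {
    this.scores.set(userId, now);
  }

  // ZSCORE presence <userId>, then check the score falls inside the TTL window.
  isOnline(userId, now = Date.now()) {
    const last = this.scores.get(userId);
    return last !== undefined && now - last <= this.ttlMs;
  }

  // ZREMRANGEBYSCORE presence -inf (<now - ttl>: evict users whose heartbeats stopped.
  sweep(now = Date.now()) {
    const cutoff = now - this.ttlMs;
    for (const [userId, last] of this.scores) {
      if (last < cutoff) this.scores.delete(userId); // deleting during Map iteration is safe
    }
  }

  // ZRANGEBYSCORE presence <now - ttl> +inf: everyone seen within the window.
  onlineUsers(now = Date.now()) {
    const cutoff = now - this.ttlMs;
    return [...this.scores].filter(([, last]) => last >= cutoff).map(([userId]) => userId);
  }
}

const presence = new PresenceTracker(30000);
presence.heartbeat('alice', 0);
console.log(presence.isOnline('alice', 10000)); // true: last heartbeat was 10 s earlier
```

A single sorted set keeps both updates and range queries at O(log N) in Redis, so the sweep can run on a timer instead of tracking a separate expiry per user.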

---

Design Example: Live Notifications System

Functional requirements:

- Users receive real-time notifications (likes, comments, follows)

- Users can mark notifications as read

- Notifications persist for history

- 50M users, 200M notifications/day
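A quick back-of-the-envelope pass turns these requirements into rates. The 3× peak-to-average factor is an illustrative assumption, not part of the requirements.

$$
\frac{200 \times 10^{6}\ \text{notifications/day}}{86{,}400\ \text{s/day}} \approx 2{,}315\ \text{writes/s on average}, \qquad 3 \times 2{,}315 \approx 6{,}900\ \text{writes/s at peak}
$$

$$
\frac{200 \times 10^{6}\ \text{notifications/day}}{50 \times 10^{6}\ \text{users}} = 4\ \text{notifications per user per day}
$$

The write rate is modest. The number that usually drives the gateway design is how many of the 50M users hold an open SSE connection at the same time.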

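Protocol choice: SSE. Notifications flow only from server to client, SSE is simpler than WebSockets, and it works through most corporate proxies. The one client-to-server action, marking notifications as read, is an ordinary REST request.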

Database schema:

CREATE TABLE notifications (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id BIGINT NOT NULL,
  type VARCHAR(50) NOT NULL,
  actor_id BIGINT NOT NULL,
  target_id BIGINT,
  metadata JSONB,
  is_read BOOLEAN NOT NULL DEFAULT FALSE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_notifications_user_time
  ON notifications (user_id, created_at DESC);

Scaling considerations:

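- Each SSE Gateway instance keeps an in-memory map: user_id -> open response streams (one user may have several tabs or devices)

- Route each user to a gateway instance by consistent hashing on user_id, so adding or removing an instance remaps only a small fraction of users

- Store the user-to-gateway mapping in Redis with a TTL that heartbeats refresh, so entries for dead connections expire on their own

- Batch Kafka messages for high-throughput scenarios (e.g., tune the producer's linger.ms and batch.size)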
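Consistent hashing can be sketched as a sorted ring of hash points, with several virtual nodes per gateway to even out load. A user belongs to the first point at or after hash(user_id), wrapping around at the end. The gateway names, the 100 virtual nodes per gateway, and the choice of MD5 (used for spread, not security) are assumptions of this sketch.

```javascript
const crypto = require('crypto');

// Map a string to a 32-bit unsigned position on the ring.
function hash32(key) {
  return crypto.createHash('md5').update(key).digest().readUInt32BE(0);
}

class HashRing {
  constructor(nodes = [], virtualNodes = 100) {
    this.virtualNodes = virtualNodes;
    this.ring = []; // sorted array of { point, node }
    for (const node of nodes) this.addNode(node);
  }

  // Place `virtualNodes` points for this gateway around the ring.
  addNode(node) {
    for (let i = 0; i < this.virtualNodes; i++) {
      this.ring.push({ point: hash32(`${node}#${i}`), node });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  removeNode(node) {
    this.ring = this.ring.filter((entry) => entry.node !== node);
  }

  // Binary-search the first point >= hash(key); wrap to index 0 past the end.
  getNode(key) {
    if (this.ring.length === 0) throw new Error('ring is empty');
    const h = hash32(key);
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].node;
  }
}

const ring = new HashRing(['sse-gw-1', 'sse-gw-2', 'sse-gw-3']);
console.log(ring.getNode('user:12345')); // the same gateway every time for this user
```

When a gateway is removed, only the users whose arc belonged to it move, roughly 1/N of them. A plain hash(user_id) % N would reshuffle almost everyone.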

---

Design Example: Collaborative Document Editing

Requirements:

- Multiple users edit the same document simultaneously

- Changes sync in real-time (< 200ms)

- Conflict resolution (Operational Transform or CRDT)

- Version history

Protocol: WebSockets (bidirectional, low latency)

Key challenge — conflict resolution:

Operational Transform (OT): The Google Docs approach. Each operation is transformed against concurrent operations so that every client ends up with the same document. In practice, OT relies on a central server to impose a single order on operations.
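The core OT idea fits in a few lines for the simplest case: two concurrent inserts into plain text. This is a textbook-style sketch, not Google Docs' actual algorithm. Breaking ties at the same index with a siteId field is an assumption of the example.

```javascript
// An operation inserts `text` at `index`; `siteId` breaks ties deterministically.
function applyInsert(doc, op) {
  return doc.slice(0, op.index) + op.text + doc.slice(op.index);
}

// Transform `op` so it can be applied after `other`, which was generated concurrently.
// If `other` inserted earlier in the document (or at the same index from a lower
// siteId), shift `op` right by the length of the text `other` inserted.
function transformInsert(op, other) {
  const otherFirst =
    other.index < op.index || (other.index === op.index && other.siteId < op.siteId);
  return otherFirst ? { ...op, index: op.index + other.text.length } : op;
}

const base = 'Hello world';
const opA = { index: 5, text: ',', siteId: 'A' };  // user A adds a comma
const opB = { index: 11, text: '!', siteId: 'B' }; // user B concurrently appends "!"

// Site A applies its own op, then B's op transformed against A's.
const atA = applyInsert(applyInsert(base, opA), transformInsert(opB, opA));
// Site B applies its own op, then A's op transformed against B's.
const atB = applyInsert(applyInsert(base, opB), transformInsert(opA, opB));
console.log(atA, atB); // both sites converge to "Hello, world!"
```

A full OT system also needs insert/delete and delete/delete cases, plus a server that orders operations so every client transforms against the same history.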

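CRDT (Conflict-Free Replicated Data Types): Replicas can merge each other's changes in any order and still converge, without a central coordinator. Figma describes its multiplayer system as CRDT-inspired, with a central server as the authority. Open-source libraries such as Yjs and Automerge implement CRDTs for collaborative text.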
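To make "converge without a coordinator" concrete, here is a deliberately simplified sequence CRDT in the spirit of Logoot. Every character carries a unique id (its clientId plus a per-client counter) and a fractional position. Deletes leave tombstones, and merging is a set union. This is an illustrative sketch under those assumptions, not how Figma or any production library works: floating-point positions eventually run out of precision, and concurrent inserts at the same spot can interleave.

```javascript
// Deterministic string comparison (avoids locale-dependent ordering).
const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);

class FractionalText {
  constructor(clientId) {
    this.clientId = clientId;
    this.counter = 0;       // per-client counter; `${clientId}:${counter}` is unique
    this.chars = new Map(); // id -> { id, pos, clientId, value, deleted }
  }

  // Visible characters in document order. Every replica sorts by the same key
  // (pos, then clientId, then id), so equal sets of characters render identically.
  ordered() {
    return [...this.chars.values()]
      .sort((a, b) => a.pos - b.pos || cmp(a.clientId, b.clientId) || cmp(a.id, b.id))
      .filter((c) => !c.deleted);
  }

  // Insert before the visible character at `index`, halfway between its visible
  // neighbours (0 and 1 act as the document's bounds).
  insert(index, value) {
    const visible = this.ordered();
    const left = index > 0 ? visible[index - 1].pos : 0;
    const right = index < visible.length ? visible[index].pos : 1;
    this.counter += 1;
    const id = `${this.clientId}:${this.counter}`;
    this.chars.set(id, { id, pos: (left + right) / 2, clientId: this.clientId, value, deleted: false });
  }

  // Tombstone instead of removing, so a later merge cannot bring the character back.
  delete(index) {
    const target = this.ordered()[index];
    if (target) target.deleted = true;
  }

  // Union of characters where "deleted" wins: commutative, associative, and
  // idempotent, which is what lets replicas converge in any merge order.
  merge(other) {
    for (const [id, c] of other.chars) {
      const mine = this.chars.get(id);
      if (mine) mine.deleted = mine.deleted || c.deleted;
      else this.chars.set(id, { ...c });
    }
  }

  toString() {
    return this.ordered().map((c) => c.value).join('');
  }
}

const alice = new FractionalText('alice');
const bob = new FractionalText('bob');
alice.insert(0, 'H');
alice.insert(1, 'i');
bob.merge(alice);     // both replicas read "Hi"
alice.insert(2, '!'); // concurrent edits at the same spot...
bob.insert(2, '?');
alice.merge(bob);     // ...exchanged in either order
bob.merge(alice);
console.log(alice.toString(), bob.toString()); // "Hi!?" on both replicas
```

The two concurrent inserts got the same position, and the tie was broken by clientId on both sides. This is the deterministic merge the interview answer below relies on.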

Interview answer structure:

> "For collaborative editing, I'd use WebSockets with a CRDT-based approach. Each character is assigned a unique identifier (client_id + Lamport timestamp). When two users type simultaneously, the CRDT merges both changes deterministically, so all clients arrive at the same document state. The server relays and persists operations rather than transforming them, which simplifies the architecture compared to OT."

---

Common Interview Questions

"How do you handle reconnection after a dropped WebSocket?"

Use exponential backoff with jitter:

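const url = 'wss://example.com/ws'; // placeholder endpoint
const baseDelayMs = 1000;
const maxDelayMs = 30000;
const maxAttempts = 10;
let attempt = 0;

function connect() {
  const ws = new WebSocket(url);
  ws.onopen = () => { attempt = 0; }; // healthy again: reset the backoff
  ws.onclose = () => {
    if (attempt >= maxAttempts) return; // give up and surface an error to the user
    // "Full jitter": wait a random time up to the capped exponential delay, so
    // clients that dropped at the same moment do not reconnect in lockstep.
    const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
    attempt++;
    setTimeout(connect, Math.random() * cap);
  };
}

connect();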

On reconnect, the client sends the ID of the last message it received, and the server replays everything after it from a recent-message buffer or the database.
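Replaying from a "last received ID" requires the server to assign increasing IDs per user and to retain recent messages. Below is a minimal in-memory sketch. The class name ReplayBuffer and the 1,000-message retention cap are assumptions, and a real system would read older history from the database.

```javascript
// Per-user outbox with increasing sequence numbers, used to replay missed messages.
class ReplayBuffer {
  constructor(maxPerUser = 1000) {
    this.maxPerUser = maxPerUser;
    this.logs = new Map();    // userId -> [{ seq, payload }], oldest first
    this.nextSeq = new Map(); // userId -> next sequence number to assign
  }

  // Record an outgoing message; send the returned seq to the client as the message ID.
  append(userId, payload) {
    const seq = this.nextSeq.get(userId) ?? 1;
    this.nextSeq.set(userId, seq + 1);
    const log = this.logs.get(userId) ?? [];
    log.push({ seq, payload });
    if (log.length > this.maxPerUser) log.shift(); // evict the oldest entry
    this.logs.set(userId, log);
    return seq;
  }

  // Messages after lastSeenSeq. `complete` is false when some were already evicted,
  // meaning the client must fall back to a full reload from the database.
  since(userId, lastSeenSeq) {
    const log = this.logs.get(userId) ?? [];
    const messages = log.filter((m) => m.seq > lastSeenSeq);
    const oldestKept = log.length > 0 ? log[0].seq : (this.nextSeq.get(userId) ?? 1);
    return { messages, complete: lastSeenSeq + 1 >= oldestKept };
  }
}

const outbox = new ReplayBuffer();
outbox.append('user-42', 'first');  // seq 1, delivered
outbox.append('user-42', 'second'); // seq 2, lost when the socket dropped
// The client reconnects and reports lastSeenSeq = 1; only 'second' is replayed.
console.log(outbox.since('user-42', 1));
```

For SSE, the browser sends the last id: value automatically in the Last-Event-ID header on reconnect, so the server only has to call something like since() with it.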

"How do you broadcast to millions of users?"

Fanout patterns:

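1. In-app fanout: each server subscribes to a single shared Redis Pub/Sub instance and fans out to its own sockets (~100k users)

2. Middle scale: sharded Pub/Sub on a Redis Cluster (SSUBSCRIBE/SPUBLISH, Redis 7+), so each channel lives on one shard instead of being broadcast to every node (~1M users)

3. Massive scale: dedicated push service + CDN-based SSE (10M+ users)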
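At broadcast scale, the usual trick is to publish once per topic rather than once per user. Each server subscribes to a topic channel (say, match:123) only while at least one local client follows it, and then fans out to its own sockets. In the reference-counting sketch below, broker is a hypothetical object with subscribe and unsubscribe methods standing in for a Redis client, and the fake broker exists only for illustration.

```javascript
class TopicFanout {
  constructor(broker) {
    this.broker = broker;
    this.local = new Map(); // topic -> Set of local sockets following it
  }

  join(topic, socket) {
    let sockets = this.local.get(topic);
    if (!sockets) {
      sockets = new Set();
      this.local.set(topic, sockets);
      // First local follower: subscribe this server to the topic exactly once.
      this.broker.subscribe(topic, (message) => this.deliver(topic, message));
    }
    sockets.add(socket);
  }

  leave(topic, socket) {
    const sockets = this.local.get(topic);
    if (!sockets) return;
    sockets.delete(socket);
    if (sockets.size === 0) {
      // Last local follower left: stop receiving this topic.
      this.local.delete(topic);
      this.broker.unsubscribe(topic);
    }
  }

  // One broker message becomes N writes to local sockets.
  deliver(topic, message) {
    for (const socket of this.local.get(topic) ?? []) socket.send(message);
  }
}

// Minimal fake broker with one handler per topic (a single server, for illustration only).
const fakeBroker = {
  handlers: new Map(),
  subscribe(topic, handler) { this.handlers.set(topic, handler); },
  unsubscribe(topic) { this.handlers.delete(topic); },
  publish(topic, message) { this.handlers.get(topic)?.(message); },
};

const server = new TopicFanout(fakeBroker);
const viewer1 = { got: [], send(m) { this.got.push(m); } };
const viewer2 = { got: [], send(m) { this.got.push(m); } };
server.join('match:123', viewer1);
server.join('match:123', viewer2);           // still a single broker subscription
fakeBroker.publish('match:123', 'GOAL 1-0'); // one publish reaches both viewers
```

With this pattern, broker traffic grows with the number of servers subscribed to a topic rather than with the size of the audience.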

"WebSockets vs SSE — which would you use for a live sports score app?"

"I'd use SSE because: (1) updates are unidirectional (server to client), (2) SSE works through corporate firewalls, (3) it is simpler to implement and debug, and (4) auto-reconnection is built into the EventSource API. I'd switch to WebSockets only if the client needed to send data frequently (e.g., betting actions)."

---

Practice Real-Time System Design

These scenarios are common in FAANG interviews. [AI Interview Trainer](https://t.me/developing_interview_trainer_bot) helps you practice:

- Design WhatsApp / Telegram (messaging + real-time delivery)

- Design YouTube Live (streaming + chat)

- Design a real-time collaboration tool (like Figma or Notion)

- Design a live auction system (bids + notifications)

You also get scored on your architecture decisions and trade-off analysis.
