Distributed Chat App: Architecture & Design

1. Overview

This project is a distributed chat system built to explore the engineering challenges behind high-throughput real-time messaging at scale. The goal was to build something that could handle serious load, handle failures gracefully, and separate the concerns of message delivery and message persistence cleanly.

The system benchmarks at 13,000 messages per second under load testing, with production-grade fault tolerance patterns throughout.

2. System Architecture

Four Spring Boot WebSocket servers sit behind an Application Load Balancer. RabbitMQ handles all message routing between servers, ensuring a message sent on server A reaches a recipient connected to server B. Dedicated consumers read from RabbitMQ and write to Cassandra for persistence, completely decoupled from the delivery path. Redis handles caching for active session data and recent messages.

Client

WebSocket connection

→

Load Balancer

AWS ALB

→

Chat Servers (x4)

Spring Boot WebSocket

→

RabbitMQ

Message routing

RabbitMQ

Two queues: delivery + persist

→

Persistence Consumer

Dedicated worker

→

Cassandra

Message storage

All servers run on self-managed AWS EC2 instances. The infrastructure is intentionally self-managed rather than using managed services, to expose the operational realities of running distributed systems without abstractions.

3. Message Flow

When a user sends a message, the following sequence happens:

The client sends the message over its WebSocket connection to whichever chat server it is connected to.
The server publishes the message to RabbitMQ on two separate queues: one for delivery, one for persistence. These are independent so a slow write to Cassandra cannot block delivery to the recipient.
The delivery consumer routes the message to the target server (or broadcasts if it is a group message). The target server pushes it over the recipient's WebSocket connection.
The persistence consumer writes the message to Cassandra asynchronously.
Redis caches active session state and recent message history so reconnecting clients can fetch recent messages without hitting Cassandra on every load.

Send Message

Over WebSocket

→

Publish to RabbitMQ

Delivery + Persist queues

→

Deliver to Recipient

Target server WebSocket

4. Reliability Patterns

Dead letter queues: messages that fail processing after a configurable number of retries are moved to a dead letter queue for inspection rather than dropped silently.
Exponential backoff: transient failures on the consumer side retry with increasing delays to avoid hammering a recovering downstream system.
Circuit breaker: if Cassandra or Redis becomes unavailable, the circuit breaker trips and prevents repeated failed calls from accumulating, giving the system time to recover without cascading the failure.
Message persistence decoupling: because delivery and persistence are separate consumers on separate queues, a Cassandra outage does not interrupt message delivery. Messages queue up in the persistence queue until Cassandra recovers.

5. Engineering Trade-offs

Cassandra over PostgreSQL: chat message storage is an append-heavy workload with high read volume for recent history. Cassandra's wide-column model with time-based partitioning handles this pattern well and scales horizontally without the write bottlenecks that come from PostgreSQL at high insert rates.
RabbitMQ over Kafka: the messaging patterns here are point-to-point routing between servers, not a durable event log with multiple independent consumers. RabbitMQ's exchange and queue model fits this exactly. Kafka would have added complexity without a clear benefit for this use case.
Self-managed EC2 over managed services: this was a deliberate choice to learn the operational layer. Running your own servers forces you to understand what managed services are actually abstracting. ECS or EKS would have been more practical in production.
Four servers: enough to demonstrate load balancing and cross-server message routing without the cost of running a larger cluster. The architecture scales horizontally by adding more servers behind the ALB.