
HTTP Caching Proxy

C++17 · POSIX sockets · Docker · Networking · Systems

Overview

HTTP Caching Proxy is a multithreaded HTTP/1.1 forward proxy written in C++ from scratch — no third-party HTTP libraries. It supports GET with an LRU cache (expiration, conditional revalidation), POST forwarding, and CONNECT tunneling for HTTPS. The proxy runs fully containerized via Docker and produces structured, per-request logs.

Description

HTTP Caching Proxy is a standards-aware forward proxy that sits between clients and origin servers, caching eligible GET responses and relaying all other traffic. Each incoming connection is handled on its own thread. The cache enforces an LRU eviction policy, RFC 7234-style expiration (from max-age, Expires, Last-Modified heuristics), and conditional revalidation using If-None-Match / If-Modified-Since. For CONNECT requests the proxy establishes a raw TCP tunnel to the target host and bidirectionally relays bytes, enabling HTTPS through the proxy. Every request and cache event is written to a structured log file, making it straightforward to audit cache behavior and trace request lifecycles.

What it is and what it does

The proxy accepts TCP connections on a configurable port and dispatches each client to one of three handlers. For GET requests it first queries the in-memory LRU cache by host + URL key. A cache hit that is still fresh is served immediately; a stale or must-revalidate entry triggers a conditional request to the origin — on a 304 the cached response is refreshed and served, on a 200 the cache is updated. A cache miss fetches the full response, which is stored if cacheable (status 200, no no-store, not private). For POST requests the proxy connects upstream, forwards the reconstructed request, and streams the response back without caching. For CONNECT requests the proxy connects to the target host and port, sends 200 Connection established to the client, then enters a select-based bidirectional relay loop until one side closes or an idle timeout fires. Malformed requests produce 400 responses, upstream failures produce 502, and unknown methods produce 501.
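The GET branching above can be sketched as a small decision function. The `CacheStatus` values mirror the enum described in the Implementation section; the action strings and the function itself are illustrative assumptions, not the project's code.

```cpp
#include <cassert>
#include <string>

// Cache lookup outcomes, mirroring the proxy's CacheStatus enum.
enum class CacheStatus { NOT_IN_CACHE, VALID, EXPIRED, REQUIRES_VALIDATION };

// Decide the next step for a GET request, given the cache lookup result and
// the status of any conditional upstream response (0 = none sent yet).
std::string next_action(CacheStatus status, int upstream_status = 0) {
    switch (status) {
        case CacheStatus::VALID:
            return "serve-from-cache";            // fresh hit: respond immediately
        case CacheStatus::NOT_IN_CACHE:
            return "fetch-and-maybe-store";       // miss: full fetch, store if cacheable
        case CacheStatus::EXPIRED:
        case CacheStatus::REQUIRES_VALIDATION:
            if (upstream_status == 304) return "refresh-and-serve-cached";
            if (upstream_status == 200) return "update-cache-and-serve";
            return "send-conditional-request";    // stale: revalidate with the origin
    }
    return "error";
}
```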

Capabilities

  • GET caching with LRU eviction: cache holds up to 50 entries; least-recently-used entry is evicted on overflow; expired entries cleaned up periodically (default 300 s interval)
  • Multi-source expiration: freshness lifetime derived from Date + max-age, Expires, must-revalidate, or a heuristic based on Last-Modified vs. Date difference
  • Conditional revalidation: on stale cache hits, If-None-Match and If-Modified-Since are added to the upstream request; 304 responses refresh the cache entry without a full transfer
  • Chunked and large-body handling: chunked GET responses are streamed to the client while being accumulated for caching; non-chunked responses with Content-Length > 65536 take a dedicated large-response path
  • HTTPS via CONNECT tunnel: raw byte relay with select, MSG_NOSIGNAL, and a 10.5 s idle timeout
  • Thread-safe shared cache: std::shared_mutex allows concurrent reads; exclusive write lock only on insert/update/evict
  • Structured logging: every request gets a monotonic ID; cache status (hit, miss, expired, revalidate), upstream interactions, tunnel open/close, and errors logged with timestamps to /var/log/erss/proxy.log
  • Graceful shutdown: SIGINT sets an atomic flag, closes the listen socket, and joins client threads
  • Containerized deployment: Dockerfile (Ubuntu 22.04) and docker-compose.yml expose port 12345 and bind-mount ./logs into the container
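The multi-source expiration rule above follows a precedence order: an explicit max-age wins, then Expires relative to Date, then the Last-Modified heuristic. A minimal sketch of that selection, assuming the header values have already been parsed into second counts (a negative value marking an absent field):

```cpp
#include <cassert>

// Freshness lifetime in seconds, chosen by precedence as described above.
// Inputs are hypothetical pre-parsed values; -1 means the header was absent.
long freshness_lifetime(long max_age, long expires_minus_date,
                        long date_minus_last_modified) {
    if (max_age >= 0) return max_age;                        // Cache-Control: max-age
    if (expires_minus_date >= 0) return expires_minus_date;  // Expires - Date
    if (date_minus_last_modified >= 0)
        return date_minus_last_modified / 10;                // heuristic: 10% of age
    return 0;  // no freshness information: treat as immediately stale
}
```

The 10% heuristic factor follows the common RFC 7234 suggestion; the project may use a different fraction.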

Implementation

Networking and I/O: The server socket uses SO_REUSEADDR and a backlog of 100. accept runs inside a select loop with a 1 s timeout so the stop() signal is observed promptly. Each accepted client socket gets a 30 s SO_RCVTIMEO. Upstream connections use getaddrinfo + connect with a 10 s receive timeout. The CONNECT tunnel uses a select-based loop reading both file descriptors simultaneously.
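The accept-loop pattern above — block in select with a short timeout so a stop flag can be checked between wakeups — can be isolated into a small helper. This is a sketch under the assumption that any readable descriptor stands in for the listening socket; the test below uses a pipe rather than a real socket.

```cpp
#include <sys/select.h>
#include <unistd.h>

// Wait until fd is readable or the timeout elapses. Returns select()'s result:
// 1 if fd is ready, 0 on timeout (the caller then checks its stop flag), -1 on error.
int wait_readable(int fd, long timeout_sec) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    timeval tv{timeout_sec, 0};   // re-initialized each call: select may modify it
    return select(fd + 1, &readfds, nullptr, nullptr, &tv);
}
```

Because select can modify the timeval on Linux, the timeout struct must be rebuilt on every iteration of the loop, as the helper does.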

Request parsing (`Request`): Parses the request line (method, URL, HTTP version) and significant headers (Host, User-Agent, Connection, validators). Request_line() reconstructs a minimal HTTP/1.1 request line for forwarding. The request ID is a globally incrementing std::atomic<int>.
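The request-line parse splits `METHOD URL VERSION` on whitespace. A minimal sketch of that step — the struct and function names here are illustrative, not the project's `Request` API:

```cpp
#include <sstream>
#include <string>

// Parsed form of an HTTP request line, e.g. "GET http://example.com/ HTTP/1.1".
struct RequestLine {
    std::string method, url, version;
};

// Whitespace-split the request line; returns false if any field is missing.
bool parse_request_line(const std::string& line, RequestLine& out) {
    std::istringstream iss(line);
    return static_cast<bool>(iss >> out.method >> out.url >> out.version);
}
```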

Response parsing (`Response`): Parses status code, all response headers into a std::map<string,string>, and the start of the body. parseCacheControl() extracts no-store, no-cache, must-revalidate, private, max-age, and s-maxage directives. isCacheable() and needsRevalidation() gate cache storage and conditional-fetch logic.
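One of the directives parseCacheControl() extracts is max-age. A sketch of that extraction, assuming a simple comma-split tokenization (the project's parsing may differ):

```cpp
#include <sstream>
#include <string>

// Extract max-age from a Cache-Control header value; returns -1 if absent.
long parse_max_age(const std::string& cache_control) {
    std::istringstream iss(cache_control);
    std::string tok;
    while (std::getline(iss, tok, ',')) {
        size_t start = tok.find_first_not_of(' ');   // trim leading spaces
        if (start == std::string::npos) continue;
        tok = tok.substr(start);
        if (tok.rfind("max-age=", 0) == 0)           // directive prefix match
            return std::stol(tok.substr(8));
    }
    return -1;
}
```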

Cache (`Cache`): std::unordered_map<string, CacheEntry> holds Response* pointers keyed by host + url. A std::list<string> tracks LRU order. get() returns a CacheStatus enum (NOT_IN_CACHE, VALID, EXPIRED, REQUIRES_VALIDATION). put() evicts the LRU tail while size >= max_entries, then inserts; cleanExpiredResponse() sweeps all entries when the cleanup interval elapses.
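The map-plus-list LRU bookkeeping described above can be sketched as follows. Only the key ordering is shown; entry values, expiry checks, and locking are omitted, and the class name is illustrative.

```cpp
#include <list>
#include <string>
#include <unordered_map>

// LRU key tracking: unordered_map gives O(1) lookup, the list tracks recency.
class LruKeys {
    size_t max_entries_;
    std::list<std::string> order_;   // front = most recently used
    std::unordered_map<std::string, std::list<std::string>::iterator> pos_;
public:
    explicit LruKeys(size_t max) : max_entries_(max) {}

    // Record a use of key, inserting it if new and evicting the LRU tail on overflow.
    void touch(const std::string& key) {
        auto it = pos_.find(key);
        if (it != pos_.end()) order_.erase(it->second);  // unlink old position
        order_.push_front(key);
        pos_[key] = order_.begin();
        while (order_.size() > max_entries_) {           // evict least recently used
            pos_.erase(order_.back());
            order_.pop_back();
        }
    }

    bool contains(const std::string& key) const { return pos_.count(key) != 0; }
};
```

Storing list iterators in the map keeps both lookup and recency updates O(1); std::list iterators stay valid across unrelated insertions and erasures, which is what makes this pairing work.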

Logging (`Logger`): Opens the log file on construction (truncating); all writes serialized with a std::mutex. Log lines include request ID, method, URL, origin host, cache status, and error descriptions.

Concurrency: Each client connection runs on its own std::thread; the thread vector retains the handles so stop() can join any still-running workers during shutdown. Cache reads use a shared lock; all mutating cache operations acquire an exclusive lock.

Demo

No live demo available for this project. See the repo for build and Docker run instructions.

Tech & Tools

C++17 · POSIX sockets (socket, bind, listen, accept, connect, send, recv, poll, select, getaddrinfo) · std::thread / std::mutex / std::shared_mutex / std::atomic · Docker / docker-compose · make / g++

Highlights

  • Multithreaded proxy handling GET, POST, and CONNECT from scratch in C++
  • RFC 7234-aligned LRU cache with expiration, eviction, and conditional revalidation
  • HTTPS tunneling via CONNECT with bidirectional byte relay
  • Thread-safe cache with shared_mutex for concurrent read performance
  • Fully containerized with Docker; structured per-request logging
