About Cloudsmith

Universal package management & supply chain security

  • Secure artifact repository for 32+ package formats
  • 110M+ API requests/day
  • Petabytes of packages served daily
  • 10-year-old Django monolith

Every non-cached request hits our Django API ⚡

The Problem

┌─────────┐    ┌──────────┐    ┌────────┐
│  NGINX  │───▶│ HAProxy  │───▶│ uWSGI  │───▶ Django
└─────────┘    └──────────┘    └────────┘
SSL + Basic    Queuing &        App
Validation     Routing          Server
  • Multiple components to maintain and run = debugging hell
  • uWSGI threading model = black box latency spikes
  • No request cancellation (WSGI limitation)
  • Throwing hardware at scaling problems

Why This Matters

  • ❌ Full rewrites aren't viable (10 years of business logic)
  • ❌ Limited engineering bandwidth
  • ✅ Need performance gains NOW
  • ✅ Must maintain reliability

The question: Can we get massive wins without rewriting everything?

The Methodology

  1. Measure everything — baselines before changes
  2. Identify bottlenecks — profile & trace first
  3. Find existing Rust tools — don't reinvent
  4. Test extensively — months in lower envs
  5. Phased rollout — canary by region
  6. Monitor & iterate — production differs

What matters: Minimal code changes, maximum impact

Finding the Bottleneck

Our tools:

  • Datadog & OpenTelemetry traces & metrics
  • Loadtesting:
    1. Locust ❌ — Couldn't generate enough load
    2. wrk ❌ — Same problem
    3. Goose ✅ — Rust-based, actually stressed our system

What we found:

  • Application server layer was where we spent all our time
  • uWSGI was causing weird latency spikes
  • Serialization overhead everywhere (JSON & XML)
  • Complex proxy chain adding latency

No single "aha moment" - we already knew the stack was complicated

The Rust Tools We Used

  1. Granian — Tokio/Hyper-based ASGI/WSGI server
  2. orjson — PyO3-based JSON serialization
  3. jsonschema-rs — High-performance validation

Each solves a specific bottleneck with minimal integration effort

orjson: Drop-in JSON Performance

What it does: PyO3-based JSON serialization

Where we used it:

  • Django REST Framework responses (see the renderer sketch below)
  • python-json-logger integration
  • Every json.dumps() / json.loads() call
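
For DRF specifically, the swap can live in a custom renderer. A minimal sketch, assuming DRF's BaseRenderer API (class name and wiring are illustrative, not our exact integration):

import orjson
from rest_framework.renderers import BaseRenderer

class ORJSONRenderer(BaseRenderer):
    media_type = "application/json"
    format = "json"

    def render(self, data, accepted_media_type=None, renderer_context=None):
        # DRF renderers may return bytes, and orjson.dumps produces bytes
        if data is None:
            return b""
        return orjson.dumps(data)

Registered via REST_FRAMEWORK["DEFAULT_RENDERER_CLASSES"] in settings.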

Integration:

Before

import json
data = json.dumps(payload)

After

import orjson
data = orjson.dumps(payload)
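
The one real API difference to watch for: orjson.dumps() returns bytes where json.dumps() returns str. A hedged sketch of a shim for call sites that expect a str (hypothetical helpers, not our code):

import orjson

def dumps(payload) -> str:
    # orjson serializes to UTF-8 bytes; decode where a str is expected
    return orjson.dumps(payload).decode("utf-8")

def loads(data):
    # orjson.loads accepts str, bytes, bytearray, and memoryview
    return orjson.loads(data)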

Surprises:

  • ✅ Near-zero compatibility issues
  • ⚠️ One edge case (customer parsing JSON manually in bash)

jsonschema-rs: The Accidental Win

Discovery: We were running BOTH Python jsonschema AND jsonschema-rs

Fix: Remove the Python one

Result: Free performance win
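
The pattern, sketched with a made-up schema and payload, assuming jsonschema-rs's module-level validate():

import jsonschema_rs

schema = {"type": "object", "required": ["name"]}
package = {"name": "cloudsmith-cli"}

# Before: the same payload passed through Python jsonschema AND jsonschema-rs
# After: one Rust-backed pass; raises jsonschema_rs.ValidationError on failure
jsonschema_rs.validate(schema, package)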

Granian: Tokio/Hyper for Python

What it is: Rust HTTP server for Python (ASGI/WSGI), built on the Tokio async runtime and the Hyper HTTP library

What it replaced:

BEFORE: NGINX → HAProxy → uWSGI → Django

AFTER: Granian → Django
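
What Granian serves is just the stock Django WSGI entry point; a sketch with illustrative module names (this is the app:application target in the tuning slide later):

# myproject/wsgi.py
import os
from django.core.wsgi import get_wsgi_application

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
application = get_wsgi_application()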

Loadtest: uWSGI vs Granian

Metric               uWSGI      Granian    Improvement
Throughput (RPS)     18.38      24.20      +32%
Avg Response Time    6,985ms    3,837ms    -45%
P50 Latency          7,000ms    3,000ms    -57%
P95 Latency          11,000ms   12,000ms   +9%
Total Requests       18,124     23,235     +28%

Test: 0-200 concurrent users, 16 minutes (Goose v0.17.2)

Loadtest: Full Stack (The Real Win)

Metric               NGINX/HAProxy/uWSGI   Granian    Improvement
Throughput (RPS)     70.89                 144.39     +104% (2x)
Avg Response Time    1,799ms               841ms      -53%
P50 Latency          1,000ms               500ms      -50%
P95 Latency          5,000ms               3,000ms    -40%
P99 Latency          7,000ms               4,000ms    -43%
Total Requests       69,828                142,225    +104%

2x throughput, 50%+ latency reduction

The Migration Journey

Timeline: Months from testing → production

Phased rollout:

  1. Staging rollout
  2. Internal environment validation
  3. Canary deployment by region
  4. Gradual rollout over days

Why: 110M requests/day = high-stakes change

What Broke: Threading Model Differences

1. Database Connection Exhaustion 💥

Root cause: Django 4's "pooling" = just connection reuse

  • Granian's true threading: each thread spawns connections
  • uWSGI green threads: better connection sharing

Fix:

  • Django 5.1's real connection pooling
  • Still in progress (why Granian isn't at 100%)
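
A minimal sketch of that fix in settings.py, assuming Django ≥ 5.1 with psycopg 3 (pool sizes are illustrative, not our production values):

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "cloudsmith",
        # Django 5.1+: a real psycopg_pool-backed connection pool
        "OPTIONS": {
            "pool": {
                "min_size": 2,   # warm connections kept open
                "max_size": 8,   # hard cap per process
                "timeout": 10,   # seconds to wait for a free connection
            },
        },
        # Pooling requires persistent connections to be off (the default)
        "CONN_MAX_AGE": 0,
    },
}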

What Broke: HTTP Strictness

2. HTTP Handling Differences

Root cause: NGINX normalizes quirky HTTP silently

  • Hyper does everything correctly
  • AWS ALB couldn't handle the weird responses

Fix: We're isolating this logic and bringing it in line with the specs

What Broke: Debugger Support

3. Development Workflow

Problem: Can't attach Python debugger to Granian

Workaround: Using a different server in dev

Tuning: Granian

Key parameters:

granian --interface wsgi \
        --workers 8 \
        --threads 4 \
        --backlog 2048 \
        app:application

  • Backlog: Max connections to hold in queue
  • Threads: Per-worker blocking I/O threads
  • Workers: Process count

Goal: Recreate HAProxy's queue without HAProxy

Contributing Back to Granian

What we needed: Production observability

What was missing: Metrics

What we did:

  • Forked, added Prometheus metrics endpoint
  • Exposed backlog depth, event loop stats
  • Opened a PR (it never got merged)

Status: Will be in the next release

Where We Are Today

Granian deployment:

  • ✅ Running in internal envs
  • ⏳ Not 100% yet (Django upgrade in progress)
  • ✅ Debugging easier than uWSGI black box
  • ✅ Performance gains confirmed

Rust in the stack:

  • orjson: ✅ Everywhere
  • jsonschema-rs: ✅ Everywhere
  • Granian: ⏳ Phased rollout

Cultural Shift

This gave us the confidence to experiment more

We now build new services in pure Rust

Where Rust-Python Tools Win

Good fit when you have:

  • ✅ A clear performance bottleneck
  • ✅ Drop-in API compatibility

Thank You!

Questions?

Cian Butler — Senior SRE @ Cloudsmith 📧 cbutler@cloudsmith.io

tl;dr: Your Rust tools gave us 2x throughput on 110M req/day. No rewrites. Just good libraries.