About Cloudsmith
Universal package management & supply chain security
- Secure artifact repository for 32+ package formats
- 110M+ API requests/day
- Petabytes of packages delivered daily
- 10-year-old Django monolith
Every non-cached request hits our Django API ⚡
The Problem
┌─────────┐    ┌──────────┐    ┌────────┐
│  NGINX  │───▶│ HAProxy  │───▶│ uWSGI  │───▶ Django
└─────────┘    └──────────┘    └────────┘
SSL + Basic    Queuing &       App
Validation     Routing         Server
- Multiple components to maintain and run = debugging hell
- uWSGI threading model = black box latency spikes
- No request cancellation (WSGI limitation)
- Throwing hardware at scaling problems
Why This Matters
- ❌ Full rewrites aren't viable (10 years of business logic)
- ❌ Limited engineering bandwidth
- ✅ Need performance gains NOW
- ✅ Must maintain reliability
The question: Can we get massive wins without
rewriting everything?
The Methodology
- Measure everything — baselines before changes
- Identify bottlenecks — profile & trace first
- Find existing Rust tools — don't reinvent
- Test extensively — months in lower envs
- Phased rollout — canary by region
- Monitor & iterate — production differs
What matters: Minimal code changes, maximum
impact
Finding the Bottleneck
Our tools:
- Datadog & OpenTelemetry traces & metrics
- Loadtesting:
  - Locust ❌ — Couldn't generate enough load
  - wrk ❌ — Same problem
  - Goose ✅ — Rust-based, actually stressed our system
What we found:
- Application server layer was where we spent all our time
- uWSGI was causing weird latency spikes
- Serialization overhead everywhere (JSON & XML)
- Complex proxy chain adding latency
No single "aha moment" - we all knew it
was complicated
The Rust Tools We Used
- Granian — Tokio/Hyper-based ASGI/WSGI server
- orjson — PyO3-based JSON serialization
- jsonschema-rs — High-performance validation
Each solves a specific bottleneck with minimal integration effort
orjson: Drop-in JSON Performance
What it does: PyO3-based JSON serialization
Where we used it:
- Django REST Framework responses
- python-json-logger integration
- Every json.dumps() / json.loads() call
Integration:
Before
import json
data = json.dumps(payload)  # returns str

After
import orjson
data = orjson.dumps(payload)  # returns bytes, not str
Surprises:
- ✅ Near-zero compatibility issues
- ⚠️ One edge case (customer parsing JSON manually in bash)
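A minimal sketch of the DRF side, assuming a custom renderer (class name and wiring are illustrative, not our exact code):

import orjson
from rest_framework.renderers import JSONRenderer

class ORJSONRenderer(JSONRenderer):
    """Serialize DRF responses with orjson instead of stdlib json."""
    def render(self, data, accepted_media_type=None, renderer_context=None):
        if data is None:
            return b""
        # orjson.dumps() returns bytes, which is what render() must return
        return orjson.dumps(data)

Registered via REST_FRAMEWORK["DEFAULT_RENDERER_CLASSES"] in settings.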
jsonschema-rs: The Accidental Win
Discovery: We were running BOTH Python jsonschema
AND jsonschema-rs
Fix: Remove the Python one
Result: Free performance win
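For reference, a minimal sketch of validating through the jsonschema-rs Python bindings (check the call against your installed version):

import jsonschema_rs

schema = {"type": "object", "properties": {"name": {"type": "string"}}}
# the schema is compiled in Rust; is_valid returns a plain bool
assert jsonschema_rs.is_valid(schema, {"name": "cloudsmith"})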
Granian: Tokio/Hyper for Python
What it is: A Rust HTTP server for Python (ASGI/WSGI),
built on the Tokio async runtime and the Hyper HTTP library
What it replaced:
BEFORE: NGINX → HAProxy → uWSGI → Django
AFTER: Granian → Django
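Granian serves the standard Django WSGI callable; the app itself doesn't change. A minimal wsgi.py, with the project name as a placeholder:

import os
from django.core.wsgi import get_wsgi_application

# standard Django entry point; Granian targets this callable directly
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
application = get_wsgi_application()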
Loadtest: uWSGI vs Granian
| Metric            | uWSGI    | Granian  | Improvement |
|-------------------|----------|----------|-------------|
| Throughput (RPS)  | 18.38    | 24.20    | +32%        |
| Avg Response Time | 6,985ms  | 3,837ms  | -45%        |
| P50 Latency       | 7,000ms  | 3,000ms  | -57%        |
| P95 Latency       | 11,000ms | 12,000ms | +9%         |
| Total Requests    | 18,124   | 23,235   | +28%        |
Test: 0-200 concurrent users, 16 minutes (Goose v0.17.2)
Loadtest: Full Stack (The Real Win)
| Metric            | NGINX/HAProxy/uWSGI | Granian | Improvement |
|-------------------|---------------------|---------|-------------|
| Throughput (RPS)  | 70.89               | 144.39  | +104% (2x)  |
| Avg Response Time | 1,799ms             | 841ms   | -53%        |
| P50 Latency       | 1,000ms             | 500ms   | -50%        |
| P95 Latency       | 5,000ms             | 3,000ms | -40%        |
| P99 Latency       | 7,000ms             | 4,000ms | -43%        |
| Total Requests    | 69,828              | 142,225 | +104%       |
✅ 2x throughput ✅ 50%+ latency reduction
The Migration Journey
Timeline: Months from testing → production
Phased rollout:
- Staging rollout
- Internal environment validation
- Canary deployment by region
- Gradual rollout over days
Why: 110M requests/day = high-stakes change
What Broke: Threading Model Differences
1. Database Connection Exhaustion 💥
Root cause: Django 4's "pooling" = just per-thread connection reuse
(persistent connections via CONN_MAX_AGE, not a real pool)
- Granian's true threading: each thread opens its own connections
- uWSGI green threads: better connection sharing
Fix:
- Django 5.1's real connection pooling
- Still in progress (why Granian isn't at 100%)
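For illustration, Django 5.1's native pooling is enabled per database (PostgreSQL with psycopg 3); the pool sizes below are placeholders, not our production values:

# settings.py (Django 5.1+, psycopg 3)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "cloudsmith",  # illustrative
        "OPTIONS": {
            # forwarded to psycopg_pool.ConnectionPool
            "pool": {"min_size": 2, "max_size": 4},
        },
    }
}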
What Broke: HTTP Strictness
2. HTTP Handling Differences
Root cause: NGINX normalizes quirky HTTP silently
- Hyper does everything correctly
- AWS ALB couldn't handle the weird responses
Fix: We're isolating this logic and following the spec
What Broke: Debugger Support
3. Development Workflow
Problem: Can't attach Python debugger to Granian
Workaround: Using a different server in dev
Tuning: Granian
Key parameters:
granian --interface wsgi \
    --workers 8 \
    --threads 4 \
    --backlog 2048 \
    app:application
- Backlog: Max connections to hold in queue
- Threads: Per-worker blocking I/O threads
- Workers: Process count
Goal: Recreate HAProxy's queue without HAProxy
Contributing Back to Granian
What we needed: Production observability
What was missing: Metrics
What we did:
- Forked, added Prometheus metrics endpoint
- Exposed backlog depth, event loop stats
- Opened a PR (not merged yet)
Status: Will be in the next release
Where We Are Today
Granian deployment:
- ✅ Running in internal envs
- ⏳ Not 100% yet (Django upgrade in progress)
- ✅ Debugging easier than uWSGI black box
- ✅ Performance gains confirmed
Rust in the stack:
- orjson: ✅ Everywhere
- jsonschema-rs: ✅ Everywhere
- Granian: ⏳ Phased rollout
Cultural Shift
This gave us confidence to:
- Experiment more
- Build new services in pure Rust
Where Rust-Python Tools Win
Good fit when you have:
- ✅ A clear performance bottleneck
- ✅ Drop-in API compatibility
Thank You!
Questions?
Cian Butler — Senior SRE @ Cloudsmith 📧
cbutler@cloudsmith.io
tl;dr: Your Rust tools gave us 2x throughput on
110M req/day. No rewrites. Just good libraries.