About Cloudsmith
Universal package management & supply chain security
- Secure artifact repository for 32+ package formats
- 110M+ API requests/day
- Petabytes of packages delivered daily
- 10-year-old Django monolith
Every non-cached request hits our Django API ⚡
The Problem
┌─────────┐    ┌──────────┐    ┌────────┐
│  NGINX  │───▶│ HAProxy  │───▶│ uWSGI  │───▶ Django
└─────────┘    └──────────┘    └────────┘
SSL + Basic    Queuing &       App
Validation     Routing         Server
- Multiple components to maintain and run = debugging hell
- uWSGI threading model = black box latency spikes
- No request cancellation (WSGI limitation)
- Throwing hardware at scaling problems
Why This Matters
- ❌ Full rewrites aren't viable (10 years of business logic)
- ❌ Limited engineering bandwidth
- ✅ Need performance gains NOW
- ✅ Must maintain reliability
The question: Can we get massive wins without
rewriting everything?
The Methodology
- Measure everything — baselines before changes
- Identify bottlenecks — profile & trace first
- Find existing Rust tools — don't reinvent
- Test extensively — months in lower envs
- Phased rollout — canary by region
- Monitor & iterate — production differs
What matters: Minimal code changes, maximum
impact
Finding the Bottleneck
Our tools:
- Datadog & OpenTelemetry traces & metrics
- Loadtesting:
  - Locust ❌ — Couldn't generate enough load
  - wrk ❌ — Same problem
  - Goose ✅ — Rust-based, actually stressed our system
What we found:
- Application server layer was where we spent all our time
- uWSGI was causing weird latency spikes
- Serialization overhead everywhere (JSON & XML)
- Complex proxy chain adding latency
No single "aha moment" - we all knew it
was complicated
The Rust Tools We Used
- Granian — Tokio/Hyper-based ASGI/WSGI server
- orjson — PyO3-based JSON serialization
- jsonschema-rs — High-performance validation
Each solves a specific bottleneck with minimal integration effort
orjson: Drop-in JSON Performance
What it does: PyO3-based JSON serialization
Where we used it:
- Django REST Framework responses
- python-json-logger integration
- Every json.dumps() / json.loads() call
Integration:
Before
import json
data = json.dumps(payload)  # returns str

After
import orjson
data = orjson.dumps(payload)  # returns bytes, not str
Surprises:
- ✅ Near-zero compatibility issues
- ⚠️ One edge case (customer parsing JSON manually in bash)
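A minimal sketch of the DRF side, assuming a custom renderer (class name and wiring are illustrative, not our exact code):

import orjson
from rest_framework.renderers import JSONRenderer

class ORJSONRenderer(JSONRenderer):
    """Serialize DRF responses with orjson instead of stdlib json."""
    def render(self, data, accepted_media_type=None, renderer_context=None):
        if data is None:
            return b""
        # orjson.dumps() returns bytes, which is what render() must return
        return orjson.dumps(data)

Registered via REST_FRAMEWORK["DEFAULT_RENDERER_CLASSES"] in settings.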
jsonschema-rs: The Accidental Win
Discovery: We were running BOTH Python jsonschema
AND jsonschema-rs
Fix: Remove the Python one
Result: Free performance win
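For reference, a minimal sketch of validating through the jsonschema-rs Python bindings (check the call against your installed version):

import jsonschema_rs

schema = {"type": "object", "properties": {"name": {"type": "string"}}}
# the schema is compiled in Rust; is_valid returns a plain bool
assert jsonschema_rs.is_valid(schema, {"name": "cloudsmith"})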
Granian: Tokio/Hyper for Python
What it is: A Rust HTTP server for Python (ASGI/WSGI),
built on the Tokio async runtime and the Hyper HTTP library
What it replaced:
BEFORE: NGINX → HAProxy → uWSGI → Django
AFTER: Granian → Django
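Granian serves the standard Django WSGI callable; the app itself doesn't change. A minimal wsgi.py, with the project name as a placeholder:

import os
from django.core.wsgi import get_wsgi_application

# standard Django entry point; Granian targets this callable directly
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
application = get_wsgi_application()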
Loadtest: uWSGI vs Granian
| Metric            | uWSGI    | Granian  | Improvement |
|-------------------|----------|----------|-------------|
| Throughput (RPS)  | 18.38    | 24.20    | +32%        |
| Avg Response Time | 6,985ms  | 3,837ms  | -45%        |
| P50 Latency       | 7,000ms  | 3,000ms  | -57%        |
| P95 Latency       | 11,000ms | 12,000ms | +9%         |
| Total Requests    | 18,124   | 23,235   | +28%        |
Test: 0-200 concurrent users, 16 minutes (Goose v0.17.2)
Loadtest: Full Stack (The Real Win)
| Metric            | NGINX/HAProxy/uWSGI | Granian | Improvement |
|-------------------|---------------------|---------|-------------|
| Throughput (RPS)  | 70.89               | 144.39  | +104% (2x)  |
| Avg Response Time | 1,799ms             | 841ms   | -53%        |
| P50 Latency       | 1,000ms             | 500ms   | -50%        |
| P95 Latency       | 5,000ms             | 3,000ms | -40%        |
| P99 Latency       | 7,000ms             | 4,000ms | -43%        |
| Total Requests    | 69,828              | 142,225 | +104%       |
✅ 2x throughput ✅ 50%+ latency reduction
The Migration Journey
Timeline: Months from testing → production
Phased rollout:
- Staging rollout
- Internal environment validation
- Canary deployment by region
- Gradual rollout over days
Why: 110M requests/day = high-stakes change
What Broke: Threading Model Differences
1. Database Connection Exhaustion 💥
Root cause: Django 4's "pooling" = just per-thread connection reuse
(persistent connections via CONN_MAX_AGE, not a real pool)
- Granian's true threading: each thread opens its own connections
- uWSGI green threads: better connection sharing
Fix:
- Django 5.1's real connection pooling
- Still in progress (why Granian isn't at 100%)
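For illustration, Django 5.1's native pooling is enabled per database (PostgreSQL with psycopg 3); the pool sizes below are placeholders, not our production values:

# settings.py (Django 5.1+, psycopg 3)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "cloudsmith",  # illustrative
        "OPTIONS": {
            # forwarded to psycopg_pool.ConnectionPool
            "pool": {"min_size": 2, "max_size": 4},
        },
    }
}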
What Broke: HTTP Strictness
2. HTTP Handling Differences
Root cause: NGINX normalizes quirky HTTP silently
- Hyper does everything correctly
- AWS ALB couldn't handle the weird responses
Fix: We're isolating this logic and following the spec
What Broke: Debugger Support
3. Development Workflow
Problem: Can't attach Python debugger to Granian
Workaround: Using a different server in dev
Tuning: Granian
Key parameters:
granian --interface wsgi \
    --workers 8 \
    --threads 4 \
    --backlog 2048 \
    app:application
- Backlog: Max connections to hold in queue
- Threads: Per-worker blocking I/O threads
- Workers: Process count
Goal: Recreate HAProxy's queue without HAProxy
Contributing Back to Granian
What we needed: Production observability
What was missing: Metrics
What we did:
- Forked, added Prometheus metrics endpoint
- Exposed backlog depth, event loop stats
- Opened a PR (not merged yet)
Status: Will be in the next release
Where We Are Today
Granian deployment:
- ✅ Running in internal envs
- ⏳ Not 100% yet (Django upgrade in progress)
- ✅ Debugging easier than uWSGI black box
- ✅ Performance gains confirmed
Rust in the stack:
- orjson: ✅ Everywhere
- jsonschema-rs: ✅ Everywhere
- Granian: ⏳ Phased rollout
Cultural Shift
This gave us confidence to:
- Experiment more
- Build new services in pure Rust
Where Rust-Python Tools Win
Good fit when you have:
- ✅ A clear performance bottleneck
- ✅ Drop-in API compatibility
Thank You!
Questions?
Cian Butler — Senior SRE @ Cloudsmith 📧
cbutler@cloudsmith.io
tl;dr: Your Rust tools gave us 2x throughput on
110M req/day. No rewrites. Just good libraries.