Gunicorn Configuration
Current Procfile
web: gunicorn Lenzeye:app --workers=1 --threads=6 --timeout=120 --graceful-timeout=30 --keep-alive=5 --worker-class=gthread --worker-tmp-dir=/dev/shm --log-level=info --limit-request-line=4096 --limit-request-field_size=8190 --max-requests=1000 --max-requests-jitter=50
Parameter Breakdown
| Parameter | Value | Reason |
|---|---|---|
| `--workers=1` | 1 | Single process to prevent RAM doubling |
| `--threads=6` | 6 | Handle up to 6 concurrent requests in one process |
| `--timeout=120` | 120s | Large encrypted uploads can take >60s on slow connections |
| `--graceful-timeout=30` | 30s | Give active requests time to finish on worker restart |
| `--keep-alive=5` | 5s | HTTP keep-alive for repeated requests (upload parts) |
| `--worker-class=gthread` | gthread | Thread-based; suitable for I/O-bound Flask workloads |
| `--worker-tmp-dir=/dev/shm` | RAM-backed | Worker heartbeat files on RAM disk; avoids disk I/O |
| `--log-level=info` | info | Log info-level messages, warnings, and errors |
| `--limit-request-line=4096` | 4096 | Max size in bytes of the HTTP request line (method, URL, protocol version) |
| `--limit-request-field_size=8190` | 8190 | Max size in bytes of a single HTTP request header field |
| `--max-requests=1000` | 1000 | Restart worker after 1000 requests to prevent memory drift |
| `--max-requests-jitter=50` | 50 | Random jitter (0 to 50 requests) added to the restart threshold; staggers restarts when more than one worker runs |
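The same flags can also be expressed as a Gunicorn config file, which is plain Python. The sketch below assumes a hypothetical `gunicorn.conf.py` placed next to the Procfile (it is not part of the current setup); the setting names are Gunicorn's documented equivalents of the command-line flags above.

```python
# gunicorn.conf.py (hypothetical file; mirrors the Procfile flags one-to-one)

# Process model: one worker process, six request threads
workers = 1
threads = 6
worker_class = "gthread"

# Timeouts, in seconds
timeout = 120
graceful_timeout = 30
keepalive = 5

# Heartbeat files on a RAM-backed filesystem
worker_tmp_dir = "/dev/shm"

# Logging and request size limits (bytes)
loglevel = "info"
limit_request_line = 4096
limit_request_field_size = 8190

# Periodic worker recycling to contain memory drift
max_requests = 1000
max_requests_jitter = 50
```

With such a file in place, the Procfile entry could shrink to `web: gunicorn -c gunicorn.conf.py Lenzeye:app`. Whether that is preferable to the single-line form is a matter of taste; the behaviour should be the same.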
Why 1 Worker?
The Render Starter plan provides 512 MB RAM. Under production load:
| State | RAM |
|---|---|
| Baseline (1 worker, idle) | ~237 MB |
| Normal upload load | 243–377 MB |
| Peak (4 concurrent encrypted uploads) | 398 MB |
| Headroom | ~114 MB |
With 2 workers, each worker loads the full Flask application independently (models, blueprints, lazy imports). Baseline would be ~474 MB — leaving only ~38 MB headroom. A single encrypted upload would cause OOM.
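The arithmetic behind these figures can be checked with a short sketch. The constants come from the table above; the function names are illustrative, and the peak estimate assumes (as a simplification) that the upload overhead observed with one worker stays the same when a second worker is added.

```python
PLAN_RAM_MB = 512                # Render Starter plan
BASELINE_PER_WORKER_MB = 237     # idle footprint of one worker
PEAK_ONE_WORKER_MB = 398         # measured peak, 4 concurrent encrypted uploads

# Extra memory attributable to uploads at peak (assumed independent of worker count)
PEAK_UPLOAD_OVERHEAD_MB = PEAK_ONE_WORKER_MB - BASELINE_PER_WORKER_MB


def idle_headroom_mb(workers: int) -> int:
    """RAM left when every worker is idle."""
    return PLAN_RAM_MB - workers * BASELINE_PER_WORKER_MB


def peak_headroom_mb(workers: int) -> int:
    """RAM left at peak upload load; a negative value means an OOM kill."""
    return idle_headroom_mb(workers) - PEAK_UPLOAD_OVERHEAD_MB


for n in (1, 2):
    print(n, idle_headroom_mb(n), peak_headroom_mb(n))
# 1 275 114
# 2 38 -123
```

With one worker the peak headroom matches the ~114 MB in the table; with two workers the same peak load would overshoot the plan limit by roughly 123 MB under this assumption.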
Do NOT increase workers without re-validating RAM
This configuration was validated over 6h 56min with 2,630+ files and 0 crashes. Increasing --workers on the Starter plan will cause Out-of-Memory kills.
Why gthread?
gthread (threaded Gunicorn worker) allows one process to handle multiple concurrent requests using Python threads:
- 6 threads = 6 concurrent requests
- For I/O-bound work (S3 uploads, DB queries), threads are effective — threads block on I/O, not CPU
- GIL is not a bottleneck for I/O workloads
- The alternative (`gevent`) would require monkey-patching and has known compatibility issues with the `cryptography` library
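The effect of threads on I/O-bound work can be illustrated outside Gunicorn. In this minimal sketch, `time.sleep` stands in for a blocking S3 or database call (it releases the GIL in the same way blocking I/O does); the function names and the 0.2s delay are made up for the demonstration and are not taken from the application.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_request(request_id: int, delay: float = 0.2) -> int:
    """Simulate a request that spends its time waiting on I/O."""
    time.sleep(delay)  # blocks this thread only; other threads keep running
    return request_id


def run_batch(num_requests: int = 6, threads: int = 6, delay: float = 0.2):
    """Run the simulated requests on a thread pool and time the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(lambda i: fake_io_request(i, delay), range(num_requests)))
    return results, time.perf_counter() - start


results, elapsed = run_batch()
print(f"{len(results)} requests finished in {elapsed:.2f}s")
```

Six requests that each wait 0.2s should complete in roughly 0.2s in total rather than 1.2s, because the waits overlap. CPU-bound work would not benefit in the same way, since only one thread can hold the GIL at a time.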
Timeout Considerations
- `--timeout=120`: a 50 GB file in 10 MB chunks = 5,000 parts. Each part request is short-lived. The timeout applies per request, not per session, so 120s per request is sufficient.
- If the timeout is too short (e.g., 30s), slow connections uploading a 10 MB chunk will time out and fail the upload.
- `--graceful-timeout=30` allows in-flight upload-part requests to complete before Gunicorn kills the worker on restart.
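The numbers above can be reproduced with a small calculation. This sketch follows the section's reading of the timeout as a per-request bound and uses decimal units (1 MB = 10^6 bytes); the helper names are illustrative only.

```python
GB = 10**9
MB = 10**6


def part_count(file_bytes: int, chunk_bytes: int) -> int:
    """Number of upload parts, rounding the last partial chunk up."""
    return -(-file_bytes // chunk_bytes)  # ceiling division


def min_throughput_kb_s(chunk_bytes: int, timeout_s: int) -> float:
    """Slowest sustained upload rate (kB/s) that still finishes one chunk in time."""
    return chunk_bytes / timeout_s / 1000


print(part_count(50 * GB, 10 * MB))                  # 5000
print(round(min_throughput_kb_s(10 * MB, 120), 1))   # 83.3
print(round(min_throughput_kb_s(10 * MB, 30), 1))    # 333.3
```

Under these assumptions, a client needs only about 83 kB/s to finish a 10 MB part within 120s, whereas a 30s timeout would require about 333 kB/s, which slow mobile or congested connections may not sustain.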
Lazy Loading Impact
Heavy libraries are lazy-loaded in Lenzeye.py to reduce baseline RAM:
```python
cv2 = None          # Loaded on first use
numpy = None        # Loaded on first use
vision = None       # google.cloud.vision, loaded on first use
pytesseract = None  # Loaded on first use
razorpay = None     # Loaded on first use
```
This reduced baseline from ~319 MB to ~237 MB — an 82 MB saving that enabled sustained operation under load.
TL;DR
1 worker, 6 threads, 120s timeout, gthread class. This is the validated config for Render Starter (512 MB). The single worker constraint comes from the in-process _hmac_registry (HMAC accumulation) and RAM limits. Do not change without re-validating with a full production load test.