
Gunicorn Configuration


Current Procfile

web: gunicorn Lenzeye:app --workers=1 --threads=6 --timeout=120 --graceful-timeout=30 --keep-alive=5 --worker-class=gthread --worker-tmp-dir=/dev/shm --log-level=info --limit-request-line=4096 --limit-request-field_size=8190 --max-requests=1000 --max-requests-jitter=50
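The same flags could also live in a Gunicorn config file, which Gunicorn executes as Python. The sketch below is a hypothetical gunicorn.conf.py equivalent to the Procfile line, not the current setup; the file name and the move out of the Procfile are assumptions. Each CLI flag maps to a setting named like the flag with dashes turned into underscores, except --keep-alive (keepalive) and --log-level (loglevel).

```python
# gunicorn.conf.py: hypothetical equivalent of the Procfile flags above.
# Gunicorn runs this file as Python and reads these module-level names.
workers = 1                       # --workers=1
threads = 6                       # --threads=6
worker_class = "gthread"          # --worker-class=gthread
timeout = 120                     # --timeout=120
graceful_timeout = 30             # --graceful-timeout=30
keepalive = 5                     # --keep-alive=5
worker_tmp_dir = "/dev/shm"       # --worker-tmp-dir=/dev/shm
loglevel = "info"                 # --log-level=info
limit_request_line = 4096         # --limit-request-line=4096
limit_request_field_size = 8190   # --limit-request-field_size=8190
max_requests = 1000               # --max-requests=1000
max_requests_jitter = 50          # --max-requests-jitter=50
```

With such a file, the Procfile entry would shrink to web: gunicorn -c gunicorn.conf.py Lenzeye:app.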


Parameter Breakdown

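| Parameter | Value | Reason |
|---|---|---|
| --workers=1 | 1 | Single process, so the app's RAM footprint is not duplicated |
| --threads=6 | 6 | Handle up to 6 concurrent requests in one process |
| --timeout=120 | 120 s | Large encrypted uploads can take >60 s on slow connections |
| --graceful-timeout=30 | 30 s | Give active requests time to finish when the worker restarts |
| --keep-alive=5 | 5 s | Wait up to 5 s for the next request on a keep-alive connection (repeated upload-part requests) |
| --worker-class=gthread | gthread | Thread-based; suitable for I/O-bound Flask workloads |
| --worker-tmp-dir=/dev/shm | /dev/shm (RAM-backed) | Worker heartbeat file lives on a RAM disk, avoiding disk I/O |
| --log-level=info | info | Log server events and errors at info level (per-request access logs additionally require --access-logfile) |
| --limit-request-line=4096 | 4096 bytes | Max size of the HTTP request line (method, URL, protocol version), which effectively caps URL length |
| --limit-request-field_size=8190 | 8190 bytes | Max size of a single HTTP request header field |
| --max-requests=1000 | 1000 | Restart the worker after 1000 requests to contain memory drift |
| --max-requests-jitter=50 | 50 | Add a random 0 to 50 to each worker's restart threshold so multiple workers don't restart at once |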
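Gunicorn's documentation describes the jitter as randomizing each worker's restart point by randint(0, max_requests_jitter). The sketch below reproduces that arithmetic (it is an illustration, not Gunicorn's own code) to show the range these two settings produce.

```python
import random

MAX_REQUESTS = 1000        # --max-requests
MAX_REQUESTS_JITTER = 50   # --max-requests-jitter


def restart_threshold(rng: random.Random) -> int:
    """Per-worker restart point: max_requests plus randint(0, jitter)."""
    return MAX_REQUESTS + rng.randint(0, MAX_REQUESTS_JITTER)


rng = random.Random(0)  # seeded only so the sketch is repeatable
thresholds = [restart_threshold(rng) for _ in range(5)]
print(thresholds)  # every value lies between 1000 and 1050 inclusive
```

With --workers=1 there is nothing to stagger, so here the jitter only makes the single worker's recycle point vary between 1000 and 1050 requests.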

Why 1 Worker?

The Render Starter plan provides 512 MB RAM. Under production load:

| State | RAM |
|---|---|
| Baseline (1 worker, idle) | ~237 MB |
| Normal upload load | 243–377 MB |
| Peak (4 concurrent encrypted uploads) | 398 MB |
| Headroom at peak (512 MB − 398 MB) | ~114 MB |

With 2 workers, each worker loads the full Flask application independently (models, blueprints, lazy imports). Baseline would be ~474 MB (2 × 237 MB), leaving only ~38 MB of headroom. A single encrypted upload would cause OOM.
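The headroom arithmetic above can be written out as a quick check. This sketch uses only the measured figures from the table; the assumptions, as in the paragraph above, are that every worker pays the full baseline on its own and that the peak upload overhead lands on a single worker.

```python
PLAN_RAM_MB = 512             # Render Starter plan
BASELINE_PER_WORKER_MB = 237  # measured: 1 worker, idle
PEAK_ONE_WORKER_MB = 398      # measured: 4 concurrent encrypted uploads


def idle_headroom_mb(workers: int) -> int:
    """RAM left at idle, assuming each worker loads the full app independently."""
    return PLAN_RAM_MB - workers * BASELINE_PER_WORKER_MB


# Extra RAM the measured peak upload load adds on top of the idle baseline.
upload_overhead_mb = PEAK_ONE_WORKER_MB - BASELINE_PER_WORKER_MB

for n in (1, 2):
    headroom = idle_headroom_mb(n)
    fits = headroom >= upload_overhead_mb
    print(f"{n} worker(s): {headroom} MB idle headroom, peak upload load fits: {fits}")
```

This prints 275 MB of idle headroom for one worker and 38 MB for two, against a measured peak upload overhead of 161 MB.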

Do NOT increase workers without re-validating RAM

This configuration was validated over 6h 56min with 2,630+ files and 0 crashes. Increasing --workers on the Starter plan will cause Out-of-Memory kills.


Why gthread?

gthread (threaded Gunicorn worker) allows one process to handle multiple concurrent requests using Python threads:

  • 6 threads = 6 concurrent requests
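  • For I/O-bound work (S3 uploads, DB queries), threads are effective: a thread waiting on network or disk I/O releases the GIL, so the other threads keep running
  • The GIL is therefore not a bottleneck for I/O-bound workloads, although CPU-bound Python code in a request still contends for it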
  • Alternative (gevent) would require monkey-patching and has known compatibility issues with cryptography library
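Gunicorn's gthread worker hands incoming requests to a pool of --threads threads. The sketch below is not Gunicorn code: it uses a standard-library ThreadPoolExecutor, with time.sleep as a stand-in for a blocking I/O call, to show why six threads can overlap six waiting requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor

THREADS = 6  # matches --threads=6


def fake_io_request(i: int) -> int:
    # time.sleep stands in for blocking I/O (an S3 upload, a DB query);
    # like real blocking I/O, it releases the GIL while waiting.
    time.sleep(0.2)
    return i


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=THREADS) as pool:
    results = list(pool.map(fake_io_request, range(THREADS)))
elapsed = time.perf_counter() - start

# The six 0.2 s waits overlap, so this takes roughly 0.2 s instead of 1.2 s.
print(results, f"{elapsed:.2f}s")
```

Replacing time.sleep with a tight CPU loop would remove most of the speedup, which is the GIL caveat in the list above.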

Timeout Considerations

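  • --timeout=120: a 50 GB file in 10 MB chunks is 5,000 parts, and each part request is short-lived, so no single request has to span a whole upload session. For gthread specifically, Gunicorn's documentation notes that on non-sync workers the timeout only means the worker process is still communicating with the arbiter; it is not tied to how long a single request takes. Here, 120s therefore acts as a liveness margin rather than a hard per-request cap.
  • With a sync worker, the timeout does bound each request: a short value (e.g., 30s) would kill a slow connection partway through a 10 MB chunk and fail the upload.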
  • --graceful-timeout=30 allows in-flight upload-part requests to complete before Gunicorn kills the worker on restart.
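The chunk arithmetic above can be checked directly. The second function shows how slow an upstream link would have to be before one 10 MB part exceeds a 120 s per-request limit, wherever such a limit is actually enforced. Decimal units are assumed, since that is what makes 50 GB / 10 MB come out at exactly 5,000.

```python
import math

MB = 1_000_000  # decimal megabyte (assumption behind the 5,000-part figure)
GB = 1_000 * MB


def part_count(file_size: int, part_size: int) -> int:
    """Number of parts needed to send file_size bytes in part_size chunks."""
    return math.ceil(file_size / part_size)


def min_throughput_bps(part_size: int, limit_s: float) -> float:
    """Slowest upstream rate (bytes/s) at which one part still finishes within limit_s."""
    return part_size / limit_s


parts = part_count(50 * GB, 10 * MB)
rate = min_throughput_bps(10 * MB, 120)
print(parts)                      # 5000
print(f"{rate / 1000:.1f} kB/s")  # 83.3 kB/s
```

In other words, a client would need to upload slower than about 83 kB/s (roughly 0.67 Mbit/s) for a single 10 MB part to take longer than 120 s.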

Lazy Loading Impact

Heavy libraries are lazy-loaded in Lenzeye.py to reduce baseline RAM:

```python
# Module-level placeholders in Lenzeye.py; each is assigned on first use.
cv2 = None          # Loaded on first use
numpy = None        # Loaded on first use
vision = None       # google.cloud.vision, loaded on first use
pytesseract = None  # Loaded on first use
razorpay = None     # Loaded on first use
```

This reduced the baseline from ~319 MB to ~237 MB, an 82 MB saving that enabled sustained operation under load.


TL;DR

1 worker, 6 threads, 120s timeout, gthread class. This is the validated config for Render Starter (512 MB). The single-worker constraint has two sources: the in-process _hmac_registry (HMAC accumulation), which lives in one process's memory and is not shared between workers, and the RAM limits. Do not change without re-validating with a full production load test.