HMAC Integrity Verification
The Problem Solved
Verifying a file's integrity after upload normally requires re-downloading it from S3 to compute an HMAC. If that download is read into memory in one piece, a 10 GB file spikes RAM by 10 GB, and even a streamed re-download costs an extra full GET per file. Lenzeye's solution is to accumulate the HMAC during the upload, chunk by chunk, in server memory. No re-download is needed.
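The approach relies on one property of HMAC: feeding the data in pieces with update() produces the same tag as hashing the concatenated bytes in a single call. The minimal sketch below checks this. It uses Python's standard-library hmac module rather than the cryptography package that Lenzeye's code uses, but both compute standard HMAC-SHA256. The key and part sizes are arbitrary illustration values.

```python
import hashlib
import hmac
import os

key = os.urandom(32)                                  # illustrative 256-bit key
parts = [os.urandom(1024 * 1024) for _ in range(3)]   # three 1 MiB "parts"

# Incremental: only the running HMAC state is kept between parts
h = hmac.new(key, digestmod=hashlib.sha256)
for part in parts:
    h.update(part)
incremental = h.digest()

# One-shot over the concatenated bytes, as a post-upload re-download would compute
one_shot = hmac.new(key, b"".join(parts), hashlib.sha256).digest()

print(incremental == one_shot)  # True
```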
_hmac_registry Design
```python
import threading

_hmac_registry: dict[str, dict] = {}
_hmac_registry_lock = threading.Lock()
```
- Module-level dict in guest_upload_routes.py
- Key: S3 upload_id (unique per multipart upload)
- Value: {"h": HMAC_object, "iv": iv_bytes}
- One entry per in-progress encrypted upload
- Thread-safe via _hmac_registry_lock
Accumulation Flow
Part 1
```python
from cryptography.hazmat.primitives import hashes, hmac as crypto_hmac

# Create new HMAC entry
with _hmac_registry_lock:
    h = crypto_hmac.HMAC(key, hashes.SHA256())
    entry = _hmac_registry[upload_id] = {"h": h, "iv": iv}
# Update HMAC: seed with IV first
entry["h"].update(iv)                    # 16-byte IV
entry["h"].update(encrypted_chunk[16:])  # then ciphertext (skip stored IV prefix)
```
Parts 2, 3, 4...
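```python
# Parts must be fed in part-number order: the HMAC is order-sensitive
with _hmac_registry_lock:
    entry = _hmac_registry[upload_id]
# Just feed ciphertext bytes
entry["h"].update(encrypted_chunk)  # pure ciphertext, no IV prefix
```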
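The Part 1 rule and the Part N rule work together. The minimal sketch below assumes that part 1 is uploaded as iv || ciphertext, which is what "skip stored IV prefix" suggests, and that later parts are pure ciphertext. Under that assumption, the accumulated tag equals an HMAC over the assembled object bytes in part order. The sketch also shows that feeding parts out of order yields a different tag. It uses the standard-library hmac module, and the sizes are arbitrary.

```python
import hashlib
import hmac
import os

IV_LEN = 16  # IV length used by the Part 1 step

key = os.urandom(32)
iv = os.urandom(IV_LEN)
# Assumption: part 1 carries the IV prefix, later parts are ciphertext only
parts = [iv + os.urandom(1000), os.urandom(1000), os.urandom(1000)]

h = hmac.new(key, digestmod=hashlib.sha256)
for part_number, encrypted_chunk in enumerate(parts, start=1):
    if part_number == 1:
        h.update(iv)                        # seed with the IV first
        h.update(encrypted_chunk[IV_LEN:])  # then the ciphertext after the prefix
    else:
        h.update(encrypted_chunk)           # later parts: ciphertext only

# The tag covers exactly the assembled object's bytes, in part order
expected = hmac.new(key, b"".join(parts), hashlib.sha256).digest()
in_order_ok = h.digest() == expected

# Swapping parts 2 and 3 changes the tag
shuffled = hmac.new(key, iv + parts[0][IV_LEN:] + parts[2] + parts[1], hashlib.sha256).digest()
print(in_order_ok, shuffled != expected)  # True True
```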
On Complete
```python
with _hmac_registry_lock:
    entry = _hmac_registry.pop(upload_id)  # remove from registry

hmac_bytes = entry["h"].finalize()  # 32-byte HMAC
hmac_hex = hmac_bytes.hex()         # 64-char hex string

# Store in S3 metadata via copy_object.
# MetadataDirective="REPLACE" overwrites all existing user metadata.
# A single CopyObject call handles sources up to 5 GB; larger objects need a multipart copy (UploadPartCopy).
s3.copy_object(
    Bucket=bucket_name,
    Key=s3_key,
    CopySource={"Bucket": bucket_name, "Key": s3_key},
    Metadata={"lenzeye-hmac": hmac_hex, ...},
    MetadataDirective="REPLACE",
)
```
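With MetadataDirective="REPLACE", S3 discards the object's existing user metadata, so any metadata that should survive has to be sent again alongside lenzeye-hmac. The minimal sketch below builds the copy_object keyword arguments from metadata previously read with head_object. The function name build_hmac_copy_args and its merge policy are hypothetical. The function only constructs the arguments, so it runs without boto3 or AWS credentials.

```python
def build_hmac_copy_args(bucket: str, key: str, hmac_hex: str,
                         existing_metadata: dict[str, str]) -> dict:
    """Hypothetical helper: kwargs for s3.copy_object that add lenzeye-hmac in place."""
    if len(hmac_hex) != 64:
        raise ValueError("expected a 64-character hex HMAC-SHA256 tag")
    metadata = dict(existing_metadata)   # keep current user metadata
    metadata["lenzeye-hmac"] = hmac_hex  # add or overwrite the integrity tag
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},
        "Metadata": metadata,
        # S3 rejects copying an object onto itself unless something, such as
        # metadata, changes; REPLACE is what makes this in-place copy legal.
        "MetadataDirective": "REPLACE",
    }


# Usage: s3.copy_object(**args) would then apply it
args = build_hmac_copy_args("my-bucket", "uploads/file.bin", "ab" * 32, {"owner": "guest"})
print(sorted(args["Metadata"]))  # ['lenzeye-hmac', 'owner']
```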
Verification on Download
```python
import hmac

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.hmac import HMAC

# Read stored HMAC from S3 metadata
head = s3.head_object(Bucket=bucket_name, Key=s3_key)
stored_hmac = bytes.fromhex(head["Metadata"]["lenzeye-hmac"])

# Stream ciphertext and compute HMAC simultaneously
h = HMAC(key, hashes.SHA256())
h.update(iv)  # IV first
for chunk in stream_from_s3():
    h.update(chunk)  # must be the same ciphertext bytes that were fed at upload time
    # ... decrypt and yield ...

# Verify AFTER full stream
computed = h.finalize()
if not hmac.compare_digest(computed, stored_hmac):
    raise InvalidSignature("File integrity check failed")
```
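The single-pass pattern can be run on its own with three stand-ins: the standard-library hmac module, a local IntegrityError in place of cryptography's InvalidSignature, and a hypothetical decrypt_chunk callable (identity in the usage) in place of real decryption. An in-memory list takes the place of stream_from_s3(). Running it makes the ordering visible: every chunk is yielded before the final compare_digest runs.

```python
import hashlib
import hmac
from typing import Callable, Iterable, Iterator


class IntegrityError(Exception):
    """Local stand-in for cryptography's InvalidSignature."""


def stream_and_verify(key: bytes, iv: bytes, stored_tag: bytes,
                      ciphertext_chunks: Iterable[bytes],
                      decrypt_chunk: Callable[[bytes], bytes]) -> Iterator[bytes]:
    """Yield plaintext chunk by chunk and check the tag only at the end."""
    h = hmac.new(key, digestmod=hashlib.sha256)
    h.update(iv)  # IV first, matching the upload side
    for chunk in ciphertext_chunks:
        h.update(chunk)
        yield decrypt_chunk(chunk)  # leaves the generator before verification
    if not hmac.compare_digest(h.digest(), stored_tag):
        raise IntegrityError("File integrity check failed")


# Usage: identity "decryption" keeps the example self-contained
key, iv = b"k" * 32, b"\x00" * 16
tag = hmac.new(key, iv + b"abcdef", hashlib.sha256).digest()

received = []
try:
    for piece in stream_and_verify(key, iv, tag, [b"abc", b"deX"], lambda c: c):
        received.append(piece)
except IntegrityError:
    pass
print(received)  # [b'abc', b'deX']: both chunks came out before the check failed
```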
Fail closed on HMAC mismatch
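If the HMAC does not match, InvalidSignature is raised, the Flask streaming generator stops, and the response is aborted. This only fails closed for the end of the stream. Flask does not buffer streaming responses: every chunk the generator has already yielded has been handed to the WSGI server and normally sent, along with the 200 status line. The client therefore receives a truncated download rather than an error response. Guaranteeing that no unverified plaintext is delivered requires checking the HMAC before the first plaintext chunk is yielded.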
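One alternative is to read the object twice. The first pass MACs the ciphertext and produces no output. The second pass decrypts and yields, and it runs only after the tag has matched. This is a minimal sketch of that technique under stated assumptions, not what Lenzeye currently does. open_stream is a hypothetical stand-in for a fresh S3 GET, decrypt_chunk is a hypothetical decryption callable (identity in the usage), and the standard-library hmac module replaces cryptography's HMAC.

Calling verify_ciphertext before constructing the Flask Response means a mismatch can still become an ordinary error status. The trade-off is a second full read. Both reads should also be pinned to the same object version (for example with get_object's VersionId or IfMatch parameters) so the object cannot change between passes.

```python
import hashlib
import hmac
from typing import Callable, Iterable, Iterator


class IntegrityError(Exception):
    """Local stand-in for cryptography's InvalidSignature."""


def verify_ciphertext(key: bytes, iv: bytes, stored_tag: bytes,
                      ciphertext_chunks: Iterable[bytes]) -> None:
    """Pass 1: MAC the whole ciphertext without producing any plaintext."""
    h = hmac.new(key, digestmod=hashlib.sha256)
    h.update(iv)
    for chunk in ciphertext_chunks:
        h.update(chunk)
    if not hmac.compare_digest(h.digest(), stored_tag):
        raise IntegrityError("File integrity check failed")


def decrypt_stream(ciphertext_chunks: Iterable[bytes],
                   decrypt_chunk: Callable[[bytes], bytes]) -> Iterator[bytes]:
    """Pass 2: only started after verify_ciphertext has returned."""
    for chunk in ciphertext_chunks:
        yield decrypt_chunk(chunk)


# Usage with an in-memory "object"
key, iv = b"k" * 32, b"\x00" * 16
stored = [b"abc", b"def"]
tag = hmac.new(key, iv + b"".join(stored), hashlib.sha256).digest()


def open_stream() -> Iterator[bytes]:
    return iter(stored)  # hypothetical: a new, version-pinned S3 GET each call


verify_ciphertext(key, iv, tag, open_stream())           # raises before any response exists
body = list(decrypt_stream(open_stream(), lambda c: c))  # in Flask: Response(decrypt_stream(...))
```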
Why Upload-Time HMAC (Not Post-Upload)
| Approach | RAM cost | S3 calls | Risk |
|---|---|---|---|
| Re-download after complete | Full file size in RAM | Extra GET per file | OOM on large files |
| Accumulate during upload-part | ~0 bytes extra | 0 extra calls | In-process state (requires a single worker) |
The upload-time approach adds no memory that grows with file size. The HMAC object accumulates its state incrementally and costs only about 200 bytes, whether the file is 1 MB or 10 GB.
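The constant-memory claim can be checked empirically. The sketch below feeds 16 MiB through a standard-library HMAC in 64 KiB chunks and records the peak traced allocation. The sizes are arbitrary. Note that tracemalloc only sees Python-level allocations, not memory held inside OpenSSL, so this is a partial check rather than a full measurement.

```python
import hashlib
import hmac
import tracemalloc

CHUNK = 64 * 1024   # 64 KiB per simulated part
N_CHUNKS = 256      # 16 MiB in total


def simulated_parts():
    for _ in range(N_CHUNKS):
        yield bytes(CHUNK)  # a fresh chunk each time; the previous one is freed


tracemalloc.start()
h = hmac.new(b"k" * 32, digestmod=hashlib.sha256)
for chunk in simulated_parts():
    h.update(chunk)
tag = h.digest()
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

total = CHUNK * N_CHUNKS
print(f"hashed {total} bytes, peak traced allocation {peak} bytes")
```

The peak stays near the size of one or two chunks rather than the 16 MiB total, because only the current chunk and the HMAC state are alive at any time.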
Single-Worker Constraint
The _hmac_registry lives in process memory. With multiple Gunicorn workers:
- Part 1 hits Worker A → registry entry created in Worker A
- Part 2 hits Worker B → registry entry not found → HMAC accumulation breaks
This is why --workers=1 is a hard requirement for encrypted uploads.
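The failure mode can be reproduced without Gunicorn by giving each simulated worker its own registry dict, which is effectively what separate processes have. The Worker class and handle_part method are hypothetical, and the sketch uses the standard-library hmac module. It raises a descriptive error when the entry is missing instead of silently skipping the update, which is one way to make a misconfigured worker count fail loudly.

```python
import hashlib
import hmac


class Worker:
    """Hypothetical stand-in for one Gunicorn worker with its own in-memory registry."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.registry: dict[str, dict] = {}

    def handle_part(self, upload_id: str, part_number: int, key: bytes,
                    iv: bytes, ciphertext: bytes) -> None:
        if part_number == 1:
            h = hmac.new(key, digestmod=hashlib.sha256)
            h.update(iv)
            self.registry[upload_id] = {"h": h, "iv": iv}
        entry = self.registry.get(upload_id)
        if entry is None:
            raise RuntimeError(
                f"{self.name}: no HMAC state for upload {upload_id}; "
                "was part 1 handled by a different worker?"
            )
        entry["h"].update(ciphertext)


worker_a, worker_b = Worker("A"), Worker("B")
worker_a.handle_part("u1", 1, b"k" * 32, b"\x00" * 16, b"part-1")
try:
    worker_b.handle_part("u1", 2, b"k" * 32, b"\x00" * 16, b"part-2")
except RuntimeError as exc:
    print(exc)
```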
TL;DR
What: HMAC-SHA256 accumulated chunk by chunk during upload in the _hmac_registry dict. Stored in S3 metadata on complete. Verified on download as the ciphertext streams, with the final comparison after the last chunk.
Key techniques: Thread-safe dict with threading.Lock, HMAC.update() per chunk (O(1) memory), HMAC.finalize() on complete, copy_object to update S3 metadata, compare_digest for timing-safe comparison.
Constraint: Requires --workers=1 (in-process registry). Fail-closed on mismatch.