Skip to content

HMAC Integrity Verification


The Problem Solved

Verifying a file's integrity after upload normally requires re-downloading it from S3 to compute an HMAC — a 10 GB file would spike RAM by 10 GB. Lenzeye's solution: accumulate the HMAC during the upload, chunk by chunk, in server memory. Zero re-download needed.


_hmac_registry Design

python _hmac_registry: dict[str, dict] = {} _hmac_registry_lock = threading.Lock()

  • Module-level dict in guest_upload_routes.py
  • Key: S3 upload_id (unique per multipart upload)
  • Value: {"h": HMAC_object, "iv": iv_bytes}
  • One entry per in-progress encrypted upload
  • Thread-safe via _hmac_registry_lock

Accumulation Flow

Part 1

```python

Create new HMAC entry

with _hmac_registry_lock: h = crypto_hmac.HMAC(key, hashes.SHA256()) _hmac_registry[upload_id] = {"h": h, "iv": iv}

Update HMAC: seed with IV first

entry["h"].update(iv) # 16 bytes IV entry["h"].update(encrypted_chunk[16:]) # then ciphertext (skip stored IV prefix) ```

Parts 2, 3, 4...

```python

Just feed ciphertext bytes

entry["h"].update(encrypted_chunk) # pure ciphertext, no IV prefix ```

On Complete

```python with _hmac_registry_lock: entry = _hmac_registry.pop(upload_id) # remove from registry

hmac_bytes = entry["h"].finalize() # 32-byte HMAC hmac_hex = hmac_bytes.hex() # 64-char hex string

Store in S3 metadata via copy_object

s3.copy_object( Bucket=bucket_name, Key=s3_key, CopySource={"Bucket": bucket_name, "Key": s3_key}, Metadata={"lenzeye-hmac": hmac_hex, ...}, MetadataDirective="REPLACE" ) ```


Verification on Download

```python

Read stored HMAC from S3 metadata

head = s3.head_object(Bucket=bucket_name, Key=s3_key) stored_hmac = bytes.fromhex(head["Metadata"]["lenzeye-hmac"])

Stream ciphertext and compute HMAC simultaneously

h = HMAC(key, hashes.SHA256()) h.update(iv) # IV first for chunk in stream_from_s3(): h.update(chunk) # ... decrypt and yield ...

Verify AFTER full stream

computed = h.finalize() if not hmac.compare_digest(computed, stored_hmac): raise InvalidSignature("File integrity check failed") ```

Fail closed on HMAC mismatch

If HMAC fails, InvalidSignature is raised. The Flask streaming generator stops. No plaintext has been delivered to the client (Flask buffers streaming responses until the generator completes or raises).


Why Upload-Time HMAC (Not Post-Upload)

Approach RAM cost S3 calls Risk
Re-download after complete Full file size in RAM Extra GET per file OOM on large files
Accumulate during upload-part ~0 bytes extra 0 extra calls None

The upload-time approach adds literally zero RAM overhead — the HMAC object accumulates state incrementally, costing only ~200 bytes regardless of file size.


Single-Worker Constraint

The _hmac_registry lives in process memory. With multiple Gunicorn workers:

  • Part 1 hits Worker A → registry entry created in Worker A
  • Part 2 hits Worker B → registry entry not found → HMAC accumulation breaks

This is why --workers=1 is a hard requirement for encrypted uploads.


TL;DR

What: HMAC-SHA256 accumulated chunk-by-chunk during upload in _hmac_registry dict. Stored in S3 metadata on complete. Verified on download before decryption.

Key techniques: Thread-safe dict with threading.Lock, HMAC.update() per chunk (O(1) memory), HMAC.finalize() on complete, copy_object to update S3 metadata, compare_digest for timing-safe comparison.

Constraint: Requires --workers=1 (in-process registry). Fail-closed on mismatch.