Guest Upload Module: Internal Deep Dive
File: Features/SecureStorage/guest_upload_routes.py
Blueprint: guest_upload_bp
Public name: Lenzeye File Transfer
Module-Level State
Three shared objects are initialized at module load time and shared across all threads within a Gunicorn worker:
```python
import threading

_hmac_registry = {}  # { upload_id: {"h": HMAC, "iv": bytes} }
_hmac_registry_lock = threading.Lock()
_encrypt_semaphore = threading.BoundedSemaphore(4)
```
| Object | Purpose |
|---|---|
| _hmac_registry | Stores in-progress HMAC state keyed by upload_id. Populated by upload-part, consumed by complete, deleted by abort. |
| _hmac_registry_lock | Protects the dict structure (adding and removing keys). Individual HMAC objects are only touched sequentially per upload (parts are sequential per file). |
| _encrypt_semaphore | Caps simultaneous encrypt+S3-upload operations at 4 across all users/threads. With 10 MB chunks and 4 slots, the peak encryption buffer is about 80 MB (4 × 10 MB alone is 40 MB; the 80 MB figure presumably counts both the plaintext and the ciphertext copy held per slot). Threads beyond the limit wait; no request is rejected. |
Do not change BoundedSemaphore(4)
This was tuned to stay within 512 MB RAM on Render's Starter plan. Validated over a 6h 56min 2,630+ file upload session with peak RAM of 398 MB. Increasing it risks OOM crashes.
Architecture Overview
```mermaid
flowchart TD
    Browser --> PreChecks
    PreChecks --> Guidelines["guidelines/status<br/>guide/accept"]
    PreChecks --> Storage["storage-check"]
    PreChecks --> FolderCheck["check-folder-name"]
    Browser --> UploadPath{Encryption<br/>enabled?}
    UploadPath -- No --> PlainPath
    UploadPath -- Yes --> EncPath
    PlainPath --> PI["initiate<br/>create_multipart_upload"]
    PI --> PP["presigned-url<br/>browser PUT direct to S3"]
    PP --> PC["complete<br/>complete_multipart_upload"]
    EncPath --> EI["encrypted/initiate<br/>session_token AES-GCM sealed"]
    EI --> EP["encrypted/upload-part<br/>encrypt chunk → S3"]
    EP --> EC["encrypted/complete<br/>HMAC → S3 metadata"]
    PC --> Link["generate-link<br/>GuestDownloadLink + OTP"]
    EC --> Link
```
Route Reference
Pre-Upload Phase
POST /guest/upload-guidelines/status
Checks if the sender has accepted the current version of the upload guidelines.
Logic:
- Loads guidelines/guestuploadguidelines.txt from disk.
- Computes SHA-256 hash of the file content.
- Queries User.guest_upload_guidelines_accepted_hash and compares.
- Also calls reset_guest_upload_if_new_month() on every request — resets monthly guest upload counters if a new month has started.
- Returns { accepted, guest_upload_enabled, guidelines, guidelines_hash }.
Key detail: The guidelines hash is the version identifier. If the guidelines text changes, all senders must re-accept — the hash comparison enforces this automatically.
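The hash-as-version idea can be sketched in a few lines. The function names below are hypothetical, and the UTF-8 encoding and hex digest format are assumptions; the source only states that a SHA-256 hash of the file content is compared with the stored value.

```python
import hashlib
from typing import Optional


def compute_guidelines_hash(text: str) -> str:
    """Return a SHA-256 hex digest to use as the guidelines version id."""
    # Assumption: the file content is hashed as UTF-8 bytes.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_accepted_current(accepted_hash: Optional[str], current_text: str) -> bool:
    """True only if the stored hash matches the hash of the current text."""
    if accepted_hash is None:
        return False  # sender never accepted any version
    return accepted_hash == compute_guidelines_hash(current_text)
```

Any edit to the guidelines text, even a single character, produces a different digest, so a previously stored hash no longer matches and the sender is asked to accept again.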
POST /guest/upload-guidelines/accept
Records acceptance of the current guidelines for a sender email.
Logic:
- Validates that the submitted guidelines_hash matches the current file hash (rejects if guidelines changed between the user reading and accepting).
- Calls auto_register_guest_user(email) — creates a User record with is_verified=False if one doesn't exist.
- Sets user.guest_upload_guidelines_accepted_hash and user.guest_upload_guidelines_accepted_at.
- If the sender is a new (auto-registered) user, sets enable_guest_upload=True and saves the email in the Flask session as temporary_guest_upload_email. guest_generate_link later uses this flag to disable guest upload after link generation (one-time use for non-registered senders).
POST /guest/storage-check
Validates the receiver has enough storage for the sender's selected files.
Logic:
- Calls calculate_storage_used_from_wasabi(email) — queries Wasabi S3 for actual byte usage under {email}/ prefix.
- Converts selected_size_bytes to GB and compares it against the available space, storage_limit - storage_used.
- Returns detailed breakdown: storage_limit, storage_used, storage_available, selected_size_gb.
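A minimal sketch of the comparison, under stated assumptions: the function name `check_storage` and the `allowed` key are hypothetical, and one GB is taken as 1024³ bytes, which the source does not specify.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def check_storage(selected_size_bytes: int,
                  storage_limit_gb: float,
                  storage_used_gb: float) -> dict:
    """Compare the selected upload size against the receiver's free space."""
    selected_size_gb = selected_size_bytes / BYTES_PER_GB
    # Clamp at zero so an over-quota account never reports negative space.
    storage_available = max(storage_limit_gb - storage_used_gb, 0.0)
    return {
        "allowed": selected_size_gb <= storage_available,  # hypothetical key
        "storage_limit": storage_limit_gb,
        "storage_used": storage_used_gb,
        "storage_available": storage_available,
        "selected_size_gb": selected_size_gb,
    }
```

With a 10 GB limit and 9 GB used, a 1 GB selection fits exactly while a 2 GB selection is refused.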
POST /guest/check-folder-name
Ensures the upload folder name is unique within the receiver's S3 namespace.
Logic:
- Checks if prefix {email}/{folder_name}/ exists in S3 via list_objects_v2 with MaxKeys=1.
- If it exists, tries {folder_name}_v1, {folder_name}_v2, ... up to {folder_name}_v99.
- Falls back to {folder_name}_YYYYMMDD_HHMMSS if all 99 versions are taken.
- Returns { folder_name, was_unique }.
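The naming fallback above can be sketched as a pure function. `pick_unique_folder_name` and the `prefix_exists` callback are hypothetical; in the real route the existence check is the list_objects_v2 call with MaxKeys=1.

```python
from datetime import datetime
from typing import Callable, Optional, Tuple


def pick_unique_folder_name(
    folder_name: str,
    prefix_exists: Callable[[str], bool],
    now: Optional[datetime] = None,
) -> Tuple[str, bool]:
    """Return (folder_name, was_unique) using _v1.._v99, then a timestamp."""
    if not prefix_exists(folder_name):
        return folder_name, True
    for version in range(1, 100):  # _v1 .. _v99
        candidate = f"{folder_name}_v{version}"
        if not prefix_exists(candidate):
            return candidate, False
    # All 99 versions taken: fall back to a timestamp suffix.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{folder_name}_{stamp}", False
```

Passing the existence check in as a callable keeps the sketch testable without S3; the `now` parameter exists only to make the timestamp fallback deterministic in tests.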
Plain (Unencrypted) Multipart Upload Path
POST /upload/guest/initiate
Initiates a Wasabi S3 multipart upload session.
Logic:
- Calls auto_register_guest_user(email) to ensure user exists.
- Calls s3.create_multipart_upload(Bucket, Key="{email}/{file_key}").
- Returns { upload_id, user_id }.
S3 key format: {guest_email}/{folder_name}/{filename}
GET /upload/guest/presigned-url
Generates a presigned URL for a specific part number.
Logic:
- Calls s3.generate_presigned_url("upload_part", ...) with ExpiresIn=100000 seconds (about 27.8 hours).
- The browser uses this URL to PUT the part directly to Wasabi — the chunk never passes through the Lenzeye server.
- Returns { url }.
Direct-to-S3 upload
In the plain path, data goes Browser → S3 directly. The server only generates the presigned URL token. This is the most RAM-efficient upload path.
POST /upload/guest/complete
Finalises the multipart upload by assembling all parts.
Logic:
- Calls s3.complete_multipart_upload(Bucket, Key, UploadId, MultipartUpload={"Parts": parts}).
- parts is the list of { ETag, PartNumber } collected by the browser from each presigned URL PUT response.
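S3 multipart completion expects the parts listed in ascending PartNumber order, while a browser uploading chunks in parallel may collect ETags out of order. The helper below is a hypothetical sketch of normalising the browser-supplied list before passing it on; it is not taken from the module.

```python
from typing import Dict, Iterable, List


def build_multipart_upload_arg(parts: Iterable[dict]) -> Dict[str, List[dict]]:
    """Normalise browser-collected parts into a MultipartUpload argument."""
    cleaned = [
        {"ETag": p["ETag"], "PartNumber": int(p["PartNumber"])}
        for p in parts
    ]
    cleaned.sort(key=lambda p: p["PartNumber"])  # ascending order
    numbers = [p["PartNumber"] for p in cleaned]
    if len(set(numbers)) != len(numbers):
        raise ValueError("duplicate part numbers in completion request")
    return {"Parts": cleaned}
```

The returned dict has the same shape as the MultipartUpload={"Parts": parts} argument shown above.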
POST /upload/guest/list-parts
Validates an existing multipart upload session; used for resumability checks.
- Calls s3.list_parts(...) and returns all uploaded part numbers, ETags, and sizes.
- Returns { valid: false } gracefully if the upload session no longer exists (NoSuchUpload).
POST /upload/guest/abort
Aborts an incomplete multipart upload and cleans up its parts from S3.
- Calls s3.abort_multipart_upload(...).
- Returns success even if already aborted (NoSuchUpload is treated as success).
Encrypted Multipart Upload Path
This path is used when the receiver has encrypt_data_b=True on their User record and the master key is configured.
Architecture: Stateless Session Token
```mermaid
flowchart LR
    A["initiate:<br/>get user key from DB"] --> B["generate<br/>16B IV"]
    B --> C["create_upload_token<br/>seal with AES-256-GCM"]
    C --> D["return session_token<br/>to browser"]
    D --> E["browser sends token<br/>on every upload-part"]
    E --> F["server decrypts token<br/>zero DB reads per part"]
```
The session_token is an AES-256-GCM sealed blob (keyed with the master key) that carries {user_key, iv, part_size}. Layout of plaintext:
32 bytes user_key
16 bytes iv
4 bytes part_size (big-endian uint32)
──────────────────
52 bytes plaintext → 80 bytes token (after GCM tag + nonce)
~108 base64url chars
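This means zero DB reads per upload part: any Gunicorn worker can decrypt the token and recover the key material without shared state. Note that the HMAC accumulator in _hmac_registry is still per-worker memory, so the parts of a given upload must reach the same worker process for the HMAC to cover the whole file.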
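The size arithmetic in the layout can be checked with the standard library. This sketch only packs and unpacks the 52-byte plaintext; the actual AES-256-GCM sealing done by create_upload_token is not reproduced. The helper names are hypothetical, and the 12-byte nonce is an assumption that is consistent with the stated 80-byte total.

```python
import base64
import struct

# user_key (32 bytes), iv (16 bytes), part_size (big-endian uint32)
TOKEN_LAYOUT = struct.Struct(">32s16sI")
GCM_NONCE_LEN = 12  # assumption: standard 96-bit GCM nonce
GCM_TAG_LEN = 16    # GCM authentication tag


def pack_token_plaintext(user_key: bytes, iv: bytes, part_size: int) -> bytes:
    """Pack {user_key, iv, part_size} into the 52-byte plaintext layout."""
    if len(user_key) != 32 or len(iv) != 16:
        raise ValueError("user_key must be 32 bytes and iv 16 bytes")
    return TOKEN_LAYOUT.pack(user_key, iv, part_size)


def unpack_token_plaintext(plaintext: bytes):
    """Inverse of pack_token_plaintext: returns (user_key, iv, part_size)."""
    return TOKEN_LAYOUT.unpack(plaintext)


def sealed_token_length(plaintext_len: int) -> int:
    """Length of nonce + ciphertext + tag for a GCM-sealed blob."""
    return GCM_NONCE_LEN + plaintext_len + GCM_TAG_LEN


def base64url_length(n_bytes: int) -> int:
    """Length of the padded base64url encoding of n_bytes."""
    return len(base64.urlsafe_b64encode(bytes(n_bytes)))
```

With these definitions, a packed plaintext is 52 bytes, `sealed_token_length(52)` is 80, and `base64url_length(80)` is 108, matching the figures in the layout.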
POST /upload/guest/encrypted/initiate
Logic:
1. Validates part_size >= 5 MB (Wasabi S3 minimum part size).
2. Fetches or creates the user's encryption key via get_or_create_user_key(user) — 2 DB reads, once per file.
3. Generates a random 16-byte IV: iv = os.urandom(16).
4. Seals {user_key, iv, part_size} into session_token via create_upload_token(...).
5. Calls s3.create_multipart_upload(...) with S3 metadata lenzeye-encrypted=true, lenzeye-key-version=<version>.
6. Returns { upload_id, session_token }.
POST /upload/guest/encrypted/upload-part
This is the most complex route in the system.
Query params: upload_id, part_number, guest_email, file_key, session_token
Body: Raw binary plaintext chunk (application/octet-stream)
Logic (step by step):
```python
key, iv, part_size = decrypt_upload_token(session_token)  # 1 AES-GCM op, zero DB

plaintext_chunk = request.get_data()
byte_offset = (part_number - 1) * part_size

with _encrypt_semaphore:  # max 4 concurrent across all users
    encrypted_chunk = encrypt_multipart_chunk(plaintext_chunk, key, iv, byte_offset)
    if part_number == 1:
        encrypted_chunk = iv + encrypted_chunk  # prepend IV to first part only

    # Accumulate HMAC
    with _hmac_registry_lock:
        if upload_id not in _hmac_registry:
            h = crypto_hmac.HMAC(key, hashes.SHA256())
            _hmac_registry[upload_id] = {"h": h, "iv": iv}
        entry = _hmac_registry[upload_id]

    if part_number == 1:
        entry["h"].update(iv)                    # seed with IV
        entry["h"].update(encrypted_chunk[16:])  # then ciphertext (skip stored IV prefix)
    else:
        entry["h"].update(encrypted_chunk)       # pure ciphertext

    resp = s3.upload_part(...)  # pushes encrypted chunk to Wasabi
```
Why byte_offset?
AES-CTR is a seekable stream cipher. The counter block for any byte offset is:
block_number = byte_offset // 16
counter_iv = (iv_as_int + block_number) % 2**128
This means part 3 can be encrypted without having seen parts 1 and 2. Each part's encryption is fully independent.
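The counter derivation can be written out in plain Python. This is a sketch, not the body of encrypt_multipart_chunk: the function name is hypothetical, and it assumes the IV is interpreted as a big-endian 128-bit integer and that byte_offset is a multiple of the 16-byte AES block size (which holds when part_size is block-aligned).

```python
AES_BLOCK_SIZE = 16  # bytes


def counter_block_for_offset(iv: bytes, byte_offset: int) -> bytes:
    """Return the initial 16-byte counter block for a part at byte_offset."""
    if len(iv) != AES_BLOCK_SIZE:
        raise ValueError("iv must be 16 bytes")
    if byte_offset % AES_BLOCK_SIZE != 0:
        # A mid-block offset would also require skipping keystream bytes.
        raise ValueError("byte_offset must be block-aligned")
    block_number = byte_offset // AES_BLOCK_SIZE
    # Assumption: big-endian counter that wraps modulo 2**128.
    counter = (int.from_bytes(iv, "big") + block_number) % (1 << 128)
    return counter.to_bytes(AES_BLOCK_SIZE, "big")
```

For a 10 MB part_size, part 3 starts at byte offset 2 × part_size, so its counter is the IV advanced by 1,310,720 blocks, computed without touching parts 1 or 2.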
S3 object format after all parts uploaded:
Part 1: [16B IV][encrypted bytes 0 .. part_size-1]
Part 2: [encrypted bytes part_size .. 2*part_size-1]
Part 3: [encrypted bytes 2*part_size .. 3*part_size-1]
...
Combined: [16B IV][full ciphertext]
POST /upload/guest/encrypted/complete
Logic:
1. Decrypts session_token (zero DB reads).
2. Pops the HMAC entry from _hmac_registry — finalizes the HMAC: hmac_hex = entry["h"].finalize().hex() → 32 bytes → 64 hex chars.
3. Calls s3.complete_multipart_upload(...) with all parts — assembles the object on S3.
4. Does s3.head_object(...) with up to 5 retries (1s sleep between each) to handle Wasabi eventual consistency.
5. Calls s3.copy_object(...) with MetadataDirective="REPLACE" to store HMAC in S3 metadata — zero bytes through the Lenzeye server, Wasabi handles the metadata-only copy internally.
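The head_object retry in step 4 follows a common bounded-retry pattern, sketched here in generic form. `call_with_retries` is a hypothetical helper; the real route would wrap s3.head_object(...) and would likely catch only the client's not-found error rather than every exception.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to `attempts` times, sleeping delay_s between failures."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:  # sketch: real code should narrow this
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)  # no sleep after the final attempt
    assert last_error is not None
    raise last_error
```

The injectable `sleep` parameter is there so the behaviour can be tested without waiting on real one-second delays.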
Why not compute HMAC at complete time?
Computing HMAC at complete would require downloading the entire file from Wasabi to hash it — causing a RAM spike proportional to file size. The accumulation-during-upload approach keeps RAM at O(1) regardless of file size.
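The reason accumulation works is that feeding an HMAC its input in pieces yields the same tag as hashing the concatenation in one call. The sketch below demonstrates this with Python's standard hmac module rather than the cryptography library's HMAC class used in the route; the function names are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable


def hmac_incremental(key: bytes, iv: bytes, ciphertext_parts: Iterable[bytes]) -> str:
    """Accumulate HMAC-SHA256 over the IV, then each ciphertext part in order."""
    h = hmac.new(key, digestmod=hashlib.sha256)
    h.update(iv)  # seed with the IV, as upload-part does for part 1
    for part in ciphertext_parts:
        h.update(part)  # only one part needs to be in memory at a time
    return h.hexdigest()


def hmac_one_shot(key: bytes, iv: bytes, full_ciphertext: bytes) -> str:
    """Reference: HMAC-SHA256 over IV + the whole ciphertext at once."""
    return hmac.new(key, iv + full_ciphertext, hashlib.sha256).hexdigest()
```

Both functions return the same 64-character hex string for the same data, provided the parts are fed in order; feeding them out of order changes the tag.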
POST /upload/guest/encrypted/abort
- Calls s3.abort_multipart_upload(...).
- Removes the entry from _hmac_registry if present (cleanup).
- Handles NoSuchUpload gracefully.
Post-Upload Phase
POST /guest-generate-link
Generates the shareable download link after upload completes.
Logic:
1. Calls auto_register_guest_user(email) (idempotent).
2. Calls reset_guest_upload_if_new_month().
3. Constructs s3_key = f"{email}/{folder_name}/".
4. Creates GuestDownloadLink record: token = secrets.token_urlsafe(32).
5. Calls link.guest_generate_otp() — generates a one-time OTP bound to the link.
6. If sender was a temporary guest (temporary_guest_upload_email session key), sets user.enable_guest_upload = False — prevents the auto-registered sender from re-using guest upload.
7. Returns { link, token, otp, folder_name }.
Shareable URL format:
/guest/transfer-manager?token=<32-byte-urlsafe>&otp=<otp>
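A short sketch of the token and URL construction. Only secrets.token_urlsafe(32) and the URL shape come from the text above; the helper names are hypothetical, and the OTP value in the usage note is a placeholder because the format produced by guest_generate_otp() is not described here.

```python
import secrets
from urllib.parse import urlencode


def make_download_token() -> str:
    """32 random bytes, base64url-encoded without padding (43 characters)."""
    return secrets.token_urlsafe(32)


def build_share_url(token: str, otp: str,
                    base_path: str = "/guest/transfer-manager") -> str:
    """Assemble the shareable URL with properly escaped query parameters."""
    return f"{base_path}?{urlencode({'token': token, 'otp': otp})}"
```

For example, `build_share_url("abc", "123456")` returns `/guest/transfer-manager?token=abc&otp=123456`.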
POST /guest/send-download-link-email
Currently disabled — route exists, returns success, but does not send email. Email notification to receiver is an upcoming feature.
Utility & Monitoring Routes
GET /upload/ram-status
Live RAM usage for the guest upload page's RAM meter.
Logic:
- Uses psutil to get RSS of the current process.
- Walks up to the Gunicorn parent process and sums RSS of all worker siblings.
- Returns { used_mb, total_mb: 512, pct }.
- No DB or S3 calls — purely OS-level memory read.
GET /guest/wasabi-stats
Returns total files and bytes stored in Wasabi — shown on the guest upload landing page.
Caching strategy:
- Cache TTL: 5 minutes.
- If cache is fresh → return instantly.
- If cache is stale or empty → spawn a background threading.Thread to recalculate via s3.get_paginator('list_objects_v2').
- While recalculating → return stale data immediately with status: "refreshing".
- If no data at all → return status: "calculating".
This ensures the guest upload page never blocks waiting for a full S3 paginate walk.
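The caching strategy above is a stale-while-revalidate pattern. The class below is a minimal sketch of it, not the module's code: `StatsCache`, its `wait` helper, and the `"ok"` status for fresh data are hypothetical, and `compute` stands in for the S3 paginator walk.

```python
import threading
import time
from typing import Callable, Optional


class StatsCache:
    """Serve cached stats instantly; refresh in a background thread."""

    def __init__(self, compute: Callable[[], dict], ttl_s: float = 300.0):
        self._compute = compute
        self._ttl_s = ttl_s  # 5 minutes by default
        self._lock = threading.Lock()
        self._data: Optional[dict] = None
        self._fetched_at = 0.0
        self._refreshing = False
        self._worker: Optional[threading.Thread] = None

    def _refresh(self) -> None:
        try:
            data = self._compute()  # slow walk happens outside the lock
            with self._lock:
                self._data = data
                self._fetched_at = time.monotonic()
        finally:
            with self._lock:
                self._refreshing = False

    def get(self) -> dict:
        with self._lock:
            age = time.monotonic() - self._fetched_at
            if self._data is not None and age < self._ttl_s:
                return {"status": "ok", **self._data}  # hypothetical status
            if not self._refreshing:  # at most one refresh at a time
                self._refreshing = True
                self._worker = threading.Thread(target=self._refresh, daemon=True)
                self._worker.start()
            if self._data is not None:
                return {"status": "refreshing", **self._data}  # stale data
            return {"status": "calculating"}  # nothing cached yet

    def wait(self) -> None:
        """Block until the current background refresh finishes (for tests)."""
        worker = self._worker
        if worker is not None:
            worker.join()
```

The `_refreshing` flag ensures that a burst of requests against a stale cache starts only one recalculation, while every caller still gets an immediate response.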
GET /guest/total-users
Returns User.query.count() — fast SQL count, no caching needed.
POST /upload/test-speed / POST /upload/guest/speedtest
Accepts binary data and discards it — used by the frontend to measure actual upload speed to the server before starting a real upload.
Key Internal Functions
auto_register_guest_user(email)
Creates a User record with is_verified=False if one doesn't exist for the given email. Normalizes email to lowercase. Idempotent — safe to call multiple times.
get_unique_folder_name(email, folder_name)
Checks S3 for prefix existence. Returns original name if available, otherwise tries _v1…_v99, then falls back to _YYYYMMDD_HHMMSS.
get_guest_upload_guidelines()
Reads guidelines/guestuploadguidelines.txt from disk and returns (text, sha256_hash). The hash serves as the version identifier for guidelines acceptance tracking.
Memory Safety Design
| Design Decision | Memory Impact |
|---|---|
| BoundedSemaphore(4) | Caps peak encryption buffer at ~80 MB (4 × 10 MB chunks) |
| HMAC accumulated during upload-part | /complete never downloads from S3 → zero RAM spike |
| Presigned URLs for plain upload | Browser → S3 direct → server RAM = 0 for data |
| session_token carries key+IV | Zero DB reads per part → no connection pool pressure |
| 5-min cache for Wasabi stats | Background thread, never blocks request handler |
Error Handling Summary
| Scenario | Handling |
|---|---|
| NoSuchUpload on abort/list-parts | Handled gracefully: abort returns success (idempotent), list-parts returns { valid: false } |
| session_token tampered | decrypt_upload_token raises ValueError → 400 |
| HMAC registry missing at complete | Server restart mid-upload → 500 with retry message |
| Wasabi eventual consistency on head_object | Up to 5 retries with 1s sleep |
| Storage over limit | Blocked at pre-upload check, clear error message |
| Guidelines changed between read and accept | 409 Conflict response |
Related Files
| File | Role |
|---|---|
| lenzeye_encryption_service.py | encrypt_multipart_chunk, create_upload_token, decrypt_upload_token, get_or_create_user_key |
| wasabiboto3.py | s3 client, bucket_name |
| lenzeye_database.py | GuestDownloadLink model |
| lenzeye_BiodataStructure.py | User, UserEncryptionKey models |
| guest_upload_monthly_reset.py | reset_guest_upload_if_new_month() |
| admin_db_operations.py | calculate_storage_used_from_wasabi() |
TL;DR: Quick Overview
What it does: Allows anyone to upload files into a registered user's cloud storage without needing an account. Generates a secure download link after upload.
Two upload paths:
- Plain path — Browser uploads directly to Wasabi S3 via presigned URLs. Server RAM usage for data = zero.
- Encrypted path — Chunks pass through the server, encrypted with AES-256-CTR before hitting S3. HMAC-SHA256 integrity tag accumulated per chunk, stored in S3 metadata on completion.
Key techniques used:
- S3 Multipart Upload — files split into 10 MB parts, assembled on S3 side.
- AES-256-CTR — seekable stream cipher; each chunk encrypted independently at its byte offset.
- HMAC-SHA256 (Encrypt-then-MAC) — integrity verified on every download.
- AES-256-GCM session token — carries encryption key + IV sealed by master key; zero DB reads per upload part, works across all Gunicorn workers.
- BoundedSemaphore(4) — caps concurrent encrypt+upload ops to bound RAM at ~80 MB peak.
- SHA-256 guidelines hash — version-tracks upload guidelines; forces re-acceptance on any change.
- Background thread + 5-min cache — Wasabi storage stats never block the upload page.
- OTP-protected download links: secrets.token_urlsafe(32) token + one-time OTP per upload session.