Guest Upload Module: Internal Deep Dive

File: Features/SecureStorage/guest_upload_routes.py
Blueprint: guest_upload_bp
Public name: Lenzeye File Transfer

Module-Level State

Three shared objects are initialized at module load time and shared across all threads within a Gunicorn worker:

    import threading

    _hmac_registry = {}  # { upload_id: {"h": HMAC, "iv": bytes} }
    _hmac_registry_lock = threading.Lock()
    _encrypt_semaphore = threading.BoundedSemaphore(4)

| Object | Purpose |
|---|---|
| `_hmac_registry` | Stores in-progress HMAC state keyed by `upload_id`. Populated by `upload-part`, consumed by `complete`, deleted by `abort`. |
| `_hmac_registry_lock` | Protects dict structure (add/remove keys). Individual HMAC objects are only touched sequentially per upload (parts are sequential per file). |
| `_encrypt_semaphore` | Caps simultaneous encrypt+S3-upload operations at 4 across all users/threads. 10 MB chunk × 4 slots, counting both the plaintext and the ciphertext copy of each chunk, ≈ 80 MB peak encryption buffer. Threads beyond this wait; no request is rejected. |

Do not change BoundedSemaphore(4)

This was tuned to stay within 512 MB RAM on Render's Starter plan. Validated over a 6h 56min 2,630+ file upload session with peak RAM of 398 MB. Increasing it risks OOM crashes.

Architecture Overview

flowchart TD
    Browser --> PreChecks
    PreChecks --> Guidelines["guidelines/status<br/>guide/accept"]
    PreChecks --> Storage["storage-check"]
    PreChecks --> FolderCheck["check-folder-name"]

    Browser --> UploadPath{Encryption<br/>enabled?}

    UploadPath -- No --> PlainPath
    UploadPath -- Yes --> EncPath

    PlainPath --> PI["initiate<br/>create_multipart_upload"]
    PI --> PP["presigned-url<br/>browser PUT direct to S3"]
    PP --> PC["complete<br/>complete_multipart_upload"]

    EncPath --> EI["encrypted/initiate<br/>session_token AES-GCM sealed"]
    EI --> EP["encrypted/upload-part<br/>encrypt chunk → S3"]
    EP --> EC["encrypted/complete<br/>HMAC → S3 metadata"]

    PC --> Link["generate-link<br/>GuestDownloadLink + OTP"]
    EC --> Link

Route Reference

Pre-Upload Phase

`POST /guest/upload-guidelines/status`

Checks if the sender has accepted the current version of the upload guidelines.

Logic:

- Loads guidelines/guestuploadguidelines.txt from disk.
- Computes SHA-256 hash of the file content.
- Queries User.guest_upload_guidelines_accepted_hash and compares.
- Also calls reset_guest_upload_if_new_month() on every request, which resets monthly guest upload counters if a new month has started.
- Returns { accepted, guest_upload_enabled, guidelines, guidelines_hash }.

Key detail: The guidelines hash is the version identifier. If the guidelines text changes, all senders must re-accept — the hash comparison enforces this automatically.
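
As a rough illustration of hash-as-version, the sketch below derives the identifier and checks a stored acceptance against it. The function names are hypothetical, and hashing the UTF-8 encoding of the text is an assumption; the real module may hash the raw file bytes.

```python
import hashlib
from typing import Optional


def guidelines_hash(text: str) -> str:
    """Return the SHA-256 hex digest used as the guidelines version id.

    Assumption: the text is hashed as UTF-8.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_accepted_current(accepted_hash: Optional[str], current_text: str) -> bool:
    """True only if the stored hash matches the current guidelines text."""
    return accepted_hash is not None and accepted_hash == guidelines_hash(current_text)
```

Any edit to the guidelines text, even a single character, changes the digest, so a previously stored acceptance stops matching without any explicit version bump.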

`POST /guest/upload-guidelines/accept`

Records acceptance of the current guidelines for a sender email.

Logic:

- Validates that the submitted guidelines_hash matches the current file hash (rejects if the guidelines changed between the user reading and accepting).
- Calls auto_register_guest_user(email), which creates a User record with is_verified=False if one doesn't exist.
- Sets user.guest_upload_guidelines_accepted_hash and user.guest_upload_guidelines_accepted_at.
- If the sender is a new (auto-registered) user, sets enable_guest_upload=True and saves the email in the Flask session as temporary_guest_upload_email. guest_generate_link later uses this flag to disable guest upload after link generation (one-time use for non-registered senders).

`POST /guest/storage-check`

Validates the receiver has enough storage for the sender's selected files.

Logic:

- Calls calculate_storage_used_from_wasabi(email), which queries Wasabi S3 for actual byte usage under the {email}/ prefix.
- Converts selected_size_bytes to GB and compares it against storage_limit - storage_used.
- Returns a detailed breakdown: storage_limit, storage_used, storage_available, selected_size_gb.
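
The comparison above might look roughly like the following sketch. The `allowed` key and the function name are hypothetical, and the use of binary gigabytes (1024³ bytes) is an assumption; the module may divide by 10⁹ instead.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def storage_check(selected_size_bytes: int, storage_limit_gb: float,
                  storage_used_gb: float) -> dict:
    """Compare the selected upload size against the receiver's free space."""
    selected_size_gb = selected_size_bytes / BYTES_PER_GB
    storage_available = max(storage_limit_gb - storage_used_gb, 0.0)
    return {
        "allowed": selected_size_gb <= storage_available,  # hypothetical field
        "storage_limit": storage_limit_gb,
        "storage_used": storage_used_gb,
        "storage_available": storage_available,
        "selected_size_gb": selected_size_gb,
    }
```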

`POST /guest/check-folder-name`

Ensures the upload folder name is unique within the receiver's S3 namespace.

Logic:

- Checks if the prefix {email}/{folder_name}/ exists in S3 via list_objects_v2 with MaxKeys=1.
- If it exists, tries {folder_name}_v1, {folder_name}_v2 ... up to {folder_name}_v99.
- Falls back to {folder_name}_YYYYMMDD_HHMMSS if all 99 versions are taken.
- Returns { folder_name, was_unique }.
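
A minimal sketch of the naming scheme follows. The S3 lookup is replaced by an injected `prefix_exists` callable (hypothetical) so the logic can be followed without a bucket; the real route performs that check with list_objects_v2.

```python
from datetime import datetime
from typing import Callable, Tuple


def unique_folder_name(email: str, folder_name: str,
                       prefix_exists: Callable[[str], bool]) -> Tuple[str, bool]:
    """Return (folder_name, was_unique) following the _v1.._v99 scheme."""
    if not prefix_exists(f"{email}/{folder_name}/"):
        return folder_name, True
    for version in range(1, 100):  # _v1 through _v99
        candidate = f"{folder_name}_v{version}"
        if not prefix_exists(f"{email}/{candidate}/"):
            return candidate, False
    # Every versioned name is taken: fall back to a timestamp suffix.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{folder_name}_{stamp}", False
```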

Plain (Unencrypted) Multipart Upload Path

`POST /upload/guest/initiate`

Initiates a Wasabi S3 multipart upload session.

Logic:

- Calls auto_register_guest_user(email) to ensure the user exists.
- Calls s3.create_multipart_upload(Bucket, Key="{email}/{file_key}").
- Returns { upload_id, user_id }.

S3 key format: {guest_email}/{folder_name}/{filename}

`GET /upload/guest/presigned-url`

Generates a presigned URL for a specific part number.

Logic:

- Calls s3.generate_presigned_url("upload_part", ...) with ExpiresIn=100000 seconds (about 27.8 hours).
- The browser uses this URL to PUT the part directly to Wasabi, so the chunk never passes through the Lenzeye server.
- Returns { url }.

Direct-to-S3 upload

In the plain path, data goes Browser → S3 directly. The server only generates the presigned URL token. This is the most RAM-efficient upload path.

`POST /upload/guest/complete`

Finalises the multipart upload by assembling all parts.

Logic:

- Calls s3.complete_multipart_upload(Bucket, Key, UploadId, MultipartUpload={"Parts": parts}).
- parts is the list of { ETag, PartNumber } collected by the browser from each presigned URL PUT response.
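
S3 expects the parts list in ascending PartNumber order, so a helper along these lines could normalise what the browser sends before the complete call. `build_parts_payload` is a hypothetical name, and the gap check is an extra assumption about this application (S3 itself only requires ascending order).

```python
from typing import Dict, List


def build_parts_payload(collected: Dict[int, str]) -> Dict[str, List[dict]]:
    """Turn {part_number: etag} into the MultipartUpload argument."""
    if not collected:
        raise ValueError("no parts were uploaded")
    numbers = sorted(collected)
    expected = list(range(1, len(numbers) + 1))
    if numbers != expected:
        # Assumption: this application always numbers parts 1..N with no gaps.
        raise ValueError(f"missing parts: expected {expected}, got {numbers}")
    return {"Parts": [{"ETag": collected[n], "PartNumber": n} for n in numbers]}
```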

`POST /upload/guest/list-parts`

Validates an existing multipart upload session; used for resumability checks.

- Calls s3.list_parts(...) and returns all uploaded part numbers, ETags, and sizes.
- Returns { valid: false } gracefully if the upload session no longer exists (NoSuchUpload).

`POST /upload/guest/abort`

Aborts an incomplete multipart upload and cleans up its parts from S3.

- Calls s3.abort_multipart_upload(...).
- Returns success even if already aborted (NoSuchUpload is treated as success).

Encrypted Multipart Upload Path

This path is used when the receiver has encrypt_data_b=True on their User record and the master key is configured.

Architecture: Stateless Session Token

flowchart LR
    A[initiate:<br/>get user key from DB] --> B[generate<br/>16B IV]
    B --> C[create_upload_token<br/>seal with AES-256-GCM]
    C --> D[return session_token<br/>to browser]
    D --> E[browser sends token<br/>on every upload-part]
    E --> F[server decrypts token<br/>zero DB reads per part]

The session_token is an AES-256-GCM sealed blob (keyed with the master key) that carries {user_key, iv, part_size}. Layout of the plaintext:

    32 bytes  user_key
    16 bytes  iv
     4 bytes  part_size (big-endian uint32)
    ──────────────────
    52 bytes plaintext → 80 bytes token (after GCM tag + nonce) → ~108 base64url chars

This means zero DB reads per upload part, and any Gunicorn worker can decrypt the token. Token handling is fully stateless across workers. The HMAC accumulator in _hmac_registry, by contrast, lives in the memory of a single worker (see Module-Level State).
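
The size arithmetic in the layout can be checked with a small sketch that packs only the plaintext portion. It does not perform the AES-GCM sealing done by create_upload_token; the helper names are hypothetical, and the 12-byte nonce and 16-byte tag are assumptions chosen because they are the usual GCM sizes and account for the 28-byte difference between 52 and 80.

```python
import base64
import struct

TOKEN_LAYOUT = ">32s16sI"  # user_key, iv, part_size (big-endian uint32)
GCM_NONCE_LEN = 12  # assumption: standard 96-bit GCM nonce
GCM_TAG_LEN = 16    # assumption: full 128-bit GCM tag


def pack_token_plaintext(user_key: bytes, iv: bytes, part_size: int) -> bytes:
    """Pack the 52-byte plaintext that would be sealed into the token."""
    if len(user_key) != 32 or len(iv) != 16:
        raise ValueError("user_key must be 32 bytes and iv 16 bytes")
    return struct.pack(TOKEN_LAYOUT, user_key, iv, part_size)


def unpack_token_plaintext(blob: bytes) -> tuple:
    """Inverse of pack_token_plaintext: returns (user_key, iv, part_size)."""
    return struct.unpack(TOKEN_LAYOUT, blob)


def sealed_token_length(plaintext_len: int) -> int:
    """Length of nonce + ciphertext + tag under the assumed GCM sizes."""
    return GCM_NONCE_LEN + plaintext_len + GCM_TAG_LEN


def base64url_length(n_bytes: int) -> int:
    """Length of the padded base64url encoding of n_bytes bytes."""
    return len(base64.urlsafe_b64encode(bytes(n_bytes)))
```

Under these assumptions, a 52-byte plaintext gives an 80-byte sealed token, which encodes to 108 base64url characters including padding.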

`POST /upload/guest/encrypted/initiate`

Logic:

1. Validates part_size >= 5 MB (Wasabi S3 minimum part size).
2. Fetches or creates the user's encryption key via get_or_create_user_key(user): 2 DB reads, once per file.
3. Generates a random 16-byte IV: iv = os.urandom(16).
4. Seals {user_key, iv, part_size} into session_token via create_upload_token(...).
5. Calls s3.create_multipart_upload(...) with S3 metadata lenzeye-encrypted=true, lenzeye-key-version=<version>.
6. Returns { upload_id, session_token }.

`POST /upload/guest/encrypted/upload-part`

This is the most complex route in the system.

Query params: upload_id, part_number, guest_email, file_key, session_token
Body: Raw binary plaintext chunk (application/octet-stream)

Logic (step by step):

```python
key, iv, part_size = decrypt_upload_token(session_token)  # 1 AES-GCM op, zero DB

plaintext_chunk = request.get_data()
byte_offset = (part_number - 1) * part_size

with _encrypt_semaphore:  # max 4 concurrent across all users
    encrypted_chunk = encrypt_multipart_chunk(plaintext_chunk, key, iv, byte_offset)

if part_number == 1:
    encrypted_chunk = iv + encrypted_chunk  # prepend IV to first part only

# Accumulate HMAC
with _hmac_registry_lock:
    if upload_id not in _hmac_registry:
        h = crypto_hmac.HMAC(key, hashes.SHA256())
        _hmac_registry[upload_id] = {"h": h, "iv": iv}
    entry = _hmac_registry[upload_id]

if part_number == 1:
    entry["h"].update(iv)                    # seed with IV
    entry["h"].update(encrypted_chunk[16:])  # then ciphertext (skip stored IV prefix)
else:
    entry["h"].update(encrypted_chunk)       # pure ciphertext

resp = s3.upload_part(...)  # pushes encrypted chunk to Wasabi
```

Why byte_offset?

AES-CTR is a seekable stream cipher. The counter block for any byte offset is:

    block_number = byte_offset // 16
    counter_iv = (iv_as_int + block_number) % 2^128

This means part 3 can be encrypted without having seen parts 1 and 2; each part's encryption is fully independent. The formula assumes byte_offset is a multiple of 16, which holds whenever part_size is a multiple of the 16-byte AES block size.
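
The counter arithmetic can be written out as a short sketch. `counter_block_for_offset` is a hypothetical helper that covers only the counter derivation, not the encryption itself, and it rejects offsets that are not block-aligned.

```python
AES_BLOCK_SIZE = 16


def counter_block_for_offset(iv: bytes, byte_offset: int) -> bytes:
    """Return the 16-byte CTR counter block for a block-aligned offset."""
    if len(iv) != AES_BLOCK_SIZE:
        raise ValueError("iv must be 16 bytes")
    if byte_offset % AES_BLOCK_SIZE != 0:
        raise ValueError("byte_offset must be a multiple of 16")
    block_number = byte_offset // AES_BLOCK_SIZE
    # Treat the IV as a 128-bit big-endian integer and wrap on overflow.
    counter = (int.from_bytes(iv, "big") + block_number) % (1 << 128)
    return counter.to_bytes(AES_BLOCK_SIZE, "big")
```

For example, with a 10 MB part size, part 3 starts at byte offset 2 × part_size, so its first counter block is the IV advanced by (2 × part_size) / 16 blocks, computed without touching parts 1 or 2.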

S3 object format after all parts uploaded:

    Part 1: [16B IV][encrypted bytes 0 .. part_size-1]
    Part 2: [encrypted bytes part_size .. 2*part_size-1]
    Part 3: [encrypted bytes 2*part_size .. 3*part_size-1]
    ...

Combined: [16B IV][full ciphertext]

`POST /upload/guest/encrypted/complete`

Logic:

1. Decrypts session_token (zero DB reads).
2. Pops the HMAC entry from _hmac_registry and finalizes the HMAC: hmac_hex = entry["h"].finalize().hex() (32 bytes → 64 hex chars).
3. Calls s3.complete_multipart_upload(...) with all parts, which assembles the object on S3.
4. Calls s3.head_object(...) with up to 5 retries (1s sleep between each) to handle Wasabi eventual consistency.
5. Calls s3.copy_object(...) with MetadataDirective="REPLACE" to store the HMAC in S3 metadata. Zero bytes pass through the Lenzeye server; Wasabi handles the metadata-only copy internally.
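
The bounded retry in step 4 could be structured like the sketch below. `retry_call` is a hypothetical helper; the real route calls s3.head_object directly, and a production version would catch botocore's ClientError rather than every exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_call(fn: Callable[[], T], attempts: int = 5, delay_s: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds, sleeping delay_s between failed attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:  # simplification: real code would narrow this
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)  # no sleep after the final failure
    raise last_error
```

Injecting `sleep` keeps the helper testable without real delays; with the defaults, the worst case adds about 4 seconds before the error surfaces.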

Why not compute HMAC at complete time?

Computing HMAC at complete would require downloading the entire file from Wasabi to hash it — causing a RAM spike proportional to file size. The accumulation-during-upload approach keeps RAM at O(1) regardless of file size.
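
The equivalence that makes this safe can be shown with a small sketch: feeding the IV and then each ciphertext part into one HMAC object gives the same tag as hashing the whole object at once. The sketch swaps in the standard library's `hmac` module for the `cryptography` HMAC object used by the route, and the function names are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable, List


def hmac_incremental(key: bytes, iv: bytes, ciphertext_parts: Iterable[bytes]) -> str:
    """Accumulate HMAC-SHA256 over iv + parts, one update per part."""
    h = hmac.new(key, digestmod=hashlib.sha256)
    h.update(iv)  # seed with the IV, as upload-part does for part 1
    for part in ciphertext_parts:
        h.update(part)  # only one part is held in memory at a time
    return h.hexdigest()


def hmac_one_shot(key: bytes, iv: bytes, ciphertext_parts: List[bytes]) -> str:
    """Reference value: HMAC over the fully assembled object."""
    return hmac.new(key, iv + b"".join(ciphertext_parts), hashlib.sha256).hexdigest()
```

The result depends on the order of updates, which is why the route relies on parts arriving sequentially per file.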

`POST /upload/guest/encrypted/abort`

- Calls s3.abort_multipart_upload(...).
- Removes the entry from _hmac_registry if present (cleanup).
- Handles NoSuchUpload gracefully.

Post-Upload Phase

`POST /guest-generate-link`

Generates the shareable download link after upload completes.

Logic:

1. Calls auto_register_guest_user(email) (idempotent).
2. Calls reset_guest_upload_if_new_month().
3. Constructs s3_key = f"{email}/{folder_name}/".
4. Creates a GuestDownloadLink record: token = secrets.token_urlsafe(32).
5. Calls link.guest_generate_otp(), which generates an OTP bound to the link.
6. If the sender was a temporary guest (temporary_guest_upload_email session key), sets user.enable_guest_upload = False, which prevents the auto-registered sender from re-using guest upload.
7. Returns { link, token, otp, folder_name }.

Shareable URL format: /guest/transfer-manager?token=<32-byte-urlsafe>&otp=<otp>
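
A rough sketch of assembling such a link is shown below. `make_otp` is a hypothetical stand-in for link.guest_generate_otp(), and the 6-digit numeric format is an assumption; only secrets.token_urlsafe(32) and the URL shape come from the description above.

```python
import secrets
from urllib.parse import urlencode


def make_otp(digits: int = 6) -> str:
    """Hypothetical OTP generator; the real guest_generate_otp() may differ."""
    return "".join(secrets.choice("0123456789") for _ in range(digits))


def build_share_link(base_path: str = "/guest/transfer-manager") -> dict:
    """Create a token and OTP and combine them into the shareable URL."""
    token = secrets.token_urlsafe(32)  # 32 random bytes -> 43 URL-safe chars
    otp = make_otp()
    query = urlencode({"token": token, "otp": otp})
    return {"token": token, "otp": otp, "link": f"{base_path}?{query}"}
```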

`POST /guest/send-download-link-email`

Currently disabled — route exists, returns success, but does not send email. Email notification to receiver is an upcoming feature.

Utility & Monitoring Routes

`GET /upload/ram-status`

Live RAM usage for the guest upload page's RAM meter.

Logic:

- Uses psutil to get the RSS of the current process.
- Walks up to the Gunicorn parent process and sums the RSS of all worker siblings.
- Returns { used_mb, total_mb: 512, pct }.
- No DB or S3 calls; purely an OS-level memory read.

`GET /guest/wasabi-stats`

Returns total files and bytes stored in Wasabi, shown on the guest upload landing page.

Caching strategy:

- Cache TTL: 5 minutes.
- If cache is fresh → return instantly.
- If cache is stale or empty → spawn a background threading.Thread to recalculate via s3.get_paginator('list_objects_v2').
- While recalculating → return stale data immediately with status: "refreshing".
- If no data at all → return status: "calculating".

This ensures the guest upload page never blocks waiting for a full S3 paginate walk.
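
The strategy is a form of stale-while-revalidate caching and could be sketched as follows. `StatsCache` is a hypothetical class, the `"ok"` status for fresh data is an assumption (the description names only "refreshing" and "calculating"), and `compute` stands in for the paginated S3 walk.

```python
import threading
import time
from typing import Callable, Optional

CACHE_TTL_S = 5 * 60  # 5-minute TTL from the description above


class StatsCache:
    """Serve cached stats immediately; refresh in a background thread."""

    def __init__(self, compute: Callable[[], dict],
                 clock: Callable[[], float] = time.monotonic):
        self._compute = compute  # e.g. a full list_objects_v2 walk
        self._clock = clock
        self._lock = threading.Lock()
        self._data: Optional[dict] = None
        self._fetched_at = 0.0
        self._worker: Optional[threading.Thread] = None

    def _refresh(self) -> None:
        data = self._compute()  # slow work happens off the request thread
        with self._lock:
            self._data = data
            self._fetched_at = self._clock()

    def get(self) -> dict:
        with self._lock:
            fresh = (self._data is not None
                     and self._clock() - self._fetched_at < CACHE_TTL_S)
            if fresh:
                return {"status": "ok", **self._data}  # "ok" is an assumed label
            # Start at most one refresh at a time.
            if self._worker is None or not self._worker.is_alive():
                self._worker = threading.Thread(target=self._refresh, daemon=True)
                self._worker.start()
            if self._data is None:
                return {"status": "calculating"}
            return {"status": "refreshing", **self._data}

    def wait(self) -> None:
        """Block until any in-flight refresh finishes (useful in tests)."""
        worker = self._worker
        if worker is not None:
            worker.join()
```

The request thread only ever takes the lock briefly; the expensive computation runs outside it, so a slow S3 walk cannot stall `get()`.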

`GET /guest/total-users`

Returns User.query.count() — fast SQL count, no caching needed.

`POST /upload/test-speed` / `POST /upload/guest/speedtest`

Accepts binary data and discards it — used by the frontend to measure actual upload speed to the server before starting a real upload.

Key Internal Functions

`auto_register_guest_user(email)`

Creates a User record with is_verified=False if one doesn't exist for the given email. Normalizes email to lowercase. Idempotent — safe to call multiple times.

`get_unique_folder_name(email, folder_name)`

Checks S3 for prefix existence. Returns original name if available, otherwise tries _v1…_v99, then falls back to _YYYYMMDD_HHMMSS.

`get_guest_upload_guidelines()`

Reads guidelines/guestuploadguidelines.txt from disk and returns (text, sha256_hash). The hash serves as the version identifier for guidelines acceptance tracking.

Memory Safety Design

| Design Decision | Memory Impact |
|---|---|
| `BoundedSemaphore(4)` | Caps peak encryption buffer at ~80 MB (4 slots × 10 MB chunks, plaintext plus ciphertext) |
| HMAC accumulated during upload-part | `/complete` never downloads from S3 → zero RAM spike |
| Presigned URLs for plain upload | Browser → S3 direct → server RAM = 0 for data |
| `session_token` carries key+IV | Zero DB reads per part → no connection pool pressure |
| 5-min cache for Wasabi stats | Background thread, never blocks request handler |

Error Handling Summary

| Scenario | Handling |
|---|---|
| `NoSuchUpload` on abort | Treated as success (idempotent) |
| `NoSuchUpload` on list-parts | Returns `{ valid: false }` instead of an error |
| `session_token` tampered | `decrypt_upload_token` raises `ValueError` → 400 |
| HMAC registry missing at complete | Server restart mid-upload → 500 with retry message |
| Wasabi eventual consistency on `head_object` | Up to 5 retries with 1s sleep |
| Storage over limit | Blocked at pre-upload check, clear error message |
| Guidelines changed between read and accept | 409 Conflict response |

| File | Role |
|---|---|
| `lenzeye_encryption_service.py` | `encrypt_multipart_chunk`, `create_upload_token`, `decrypt_upload_token`, `get_or_create_user_key` |
| `wasabiboto3.py` | `s3` client, `bucket_name` |
| `lenzeye_database.py` | `GuestDownloadLink` model |
| `lenzeye_BiodataStructure.py` | `User`, `UserEncryptionKey` models |
| `guest_upload_monthly_reset.py` | `reset_guest_upload_if_new_month()` |
| `admin_db_operations.py` | `calculate_storage_used_from_wasabi()` |

TL;DR: Quick Overview

What it does: Allows anyone to upload files into a registered user's cloud storage without needing an account. Generates a secure download link after upload.

Two upload paths:

- Plain path: Browser uploads directly to Wasabi S3 via presigned URLs. Server RAM usage for data = zero.
- Encrypted path: Chunks pass through the server and are encrypted with AES-256-CTR before hitting S3. An HMAC-SHA256 integrity tag is accumulated per chunk and stored in S3 metadata on completion.

Key techniques used:

- S3 Multipart Upload: files split into 10 MB parts, assembled on the S3 side.
- AES-256-CTR: seekable stream cipher; each chunk encrypted independently at its byte offset.
- HMAC-SHA256 (Encrypt-then-MAC): integrity verified on every download.
- AES-256-GCM session token: carries encryption key + IV sealed by the master key; zero DB reads per upload part.
- BoundedSemaphore(4): caps concurrent encrypt+upload ops to bound RAM at ~80 MB peak.
- SHA-256 guidelines hash: version-tracks upload guidelines; forces re-acceptance on any change.
- Background thread + 5-min cache: Wasabi storage stats never block the upload page.
- OTP-protected download links: secrets.token_urlsafe(32) token + OTP per upload session.
