
Guest Upload Module — Internal Deep Dive

File: Features/SecureStorage/guest_upload_routes.py
Blueprint: guest_upload_bp
Public name: Lenzeye File Transfer


Module-Level State

Three shared objects are initialized at module load time and shared across all threads within a Gunicorn worker:

    _hmac_registry = {}  # { upload_id: {"h": HMAC, "iv": bytes} }
    _hmac_registry_lock = threading.Lock()
    _encrypt_semaphore = threading.BoundedSemaphore(4)

| Object | Purpose |
| --- | --- |
| _hmac_registry | Stores in-progress HMAC state keyed by upload_id. Populated by upload-part, consumed by complete, deleted by abort. |
| _hmac_registry_lock | Protects the dict structure (adding and removing keys). Individual HMAC objects are only touched sequentially per upload (parts are sequential per file). |
| _encrypt_semaphore | Caps simultaneous encrypt+S3-upload operations at 4 across all users and threads. With 10 MB chunks and 4 slots, the peak encryption buffer is about 80 MB (presumably counting both the plaintext and the ciphertext copy of each chunk, since 4 × 10 MB of plaintext alone is 40 MB). Threads beyond this limit wait; no request is rejected. |

Do not change BoundedSemaphore(4)

This was tuned to stay within 512 MB RAM on Render's Starter plan. Validated over a 6h 56min 2,630+ file upload session with peak RAM of 398 MB. Increasing it risks OOM crashes.
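As a rough illustration of how a bounded semaphore caps concurrency without rejecting work, here is a minimal sketch. The function `encrypt_and_upload` and the counters are hypothetical stand-ins, not the module's real code; only `threading.BoundedSemaphore(4)` comes from the text above.

```python
import threading
import time

# Matches BoundedSemaphore(4) in the module; everything else here is illustrative.
_encrypt_semaphore = threading.BoundedSemaphore(4)

_active = 0
_peak = 0
_counter_lock = threading.Lock()


def encrypt_and_upload(chunk: bytes) -> int:
    """Hypothetical stand-in for the encrypt + S3 upload step."""
    global _active, _peak
    with _encrypt_semaphore:  # a fifth caller blocks here until a slot frees up
        with _counter_lock:
            _active += 1
            _peak = max(_peak, _active)
        time.sleep(0.01)  # simulate encryption and upload work
        with _counter_lock:
            _active -= 1
    return len(chunk)


def run_demo(n_threads: int = 12) -> int:
    """Run n_threads workers and return the peak number inside the guarded block."""
    threads = [
        threading.Thread(target=encrypt_and_upload, args=(b"x" * 1024,))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return _peak
```

Calling `run_demo(12)` starts twelve threads, yet the observed peak never exceeds four; the other threads simply wait their turn.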


Architecture Overview

flowchart TD
    Browser --> PreChecks
    PreChecks --> Guidelines["guidelines/status<br/>guide/accept"]
    PreChecks --> Storage["storage-check"]
    PreChecks --> FolderCheck["check-folder-name"]

    Browser --> UploadPath{Encryption<br/>enabled?}

    UploadPath -- No --> PlainPath
    UploadPath -- Yes --> EncPath

    PlainPath --> PI["initiate<br/>create_multipart_upload"]
    PI --> PP["presigned-url<br/>browser PUT direct to S3"]
    PP --> PC["complete<br/>complete_multipart_upload"]

    EncPath --> EI["encrypted/initiate<br/>session_token AES-GCM sealed"]
    EI --> EP["encrypted/upload-part<br/>encrypt chunk → S3"]
    EP --> EC["encrypted/complete<br/>HMAC → S3 metadata"]

    PC --> Link["generate-link<br/>GuestDownloadLink + OTP"]
    EC --> Link

Route Reference

Pre-Upload Phase

POST /guest/upload-guidelines/status

Checks if the sender has accepted the current version of the upload guidelines.

Logic:

  • Loads guidelines/guestuploadguidelines.txt from disk.
  • Computes the SHA-256 hash of the file content.
  • Queries User.guest_upload_guidelines_accepted_hash and compares.
  • Also calls reset_guest_upload_if_new_month() on every request, which resets the monthly guest upload counters if a new month has started.
  • Returns { accepted, guest_upload_enabled, guidelines, guidelines_hash }.

Key detail: The guidelines hash is the version identifier. If the guidelines text changes, all senders must re-accept — the hash comparison enforces this automatically.
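The hash-as-version idea can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and hashing the UTF-8 encoding of the text is an assumption (the real module may hash the raw file bytes).

```python
import hashlib
from typing import Optional


def guidelines_hash(text: str) -> str:
    """Return the SHA-256 hex digest of the guidelines text (UTF-8 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_accepted_current(accepted_hash: Optional[str], current_text: str) -> bool:
    """True only if the stored hash matches the hash of the current guidelines."""
    return accepted_hash is not None and accepted_hash == guidelines_hash(current_text)
```

Any edit to the guidelines text, even a single character, yields a different digest, so a previously stored hash stops matching and the sender is asked to accept again.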


POST /guest/upload-guidelines/accept

Records acceptance of the current guidelines for a sender email.

Logic:

  • Validates that the submitted guidelines_hash matches the current file hash (rejects if the guidelines changed between the user reading and accepting).
  • Calls auto_register_guest_user(email), which creates a User record with is_verified=False if one doesn't exist.
  • Sets user.guest_upload_guidelines_accepted_hash and user.guest_upload_guidelines_accepted_at.
  • If the sender is a new (auto-registered) user, sets enable_guest_upload=True and saves the email in the Flask session as temporary_guest_upload_email. guest_generate_link later uses this flag to disable guest upload after link generation (one-time use for non-registered senders).


POST /guest/storage-check

Validates the receiver has enough storage for the sender's selected files.

Logic:

  • Calls calculate_storage_used_from_wasabi(email), which queries Wasabi S3 for the actual byte usage under the {email}/ prefix.
  • Converts selected_size_bytes to GB and compares it against storage_limit - storage_used.
  • Returns a detailed breakdown: storage_limit, storage_used, storage_available, selected_size_gb.


POST /guest/check-folder-name

Ensures the upload folder name is unique within the receiver's S3 namespace.

Logic:

  • Checks if the prefix {email}/{folder_name}/ exists in S3 via list_objects_v2 with MaxKeys=1.
  • If it exists, tries folder_v1, folder_v2 ... up to folder_v99.
  • Falls back to folder_YYYYMMDD_HHMMSS if all 99 versions are taken.
  • Returns { folder_name, was_unique }.
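The naming fallback described above could look roughly like the sketch below. The S3 lookup is abstracted behind a hypothetical `prefix_exists` callable (the real code uses list_objects_v2), and treating `was_unique` as "the original name was free" is an assumption.

```python
from datetime import datetime
from typing import Callable, Optional, Tuple


def pick_unique_folder_name(
    folder_name: str,
    prefix_exists: Callable[[str], bool],
    now: Optional[datetime] = None,
) -> Tuple[str, bool]:
    """Return (folder_name, was_unique) following the _v1.._v99 then timestamp rule."""
    if not prefix_exists(folder_name):
        return folder_name, True
    for n in range(1, 100):  # _v1 through _v99
        candidate = f"{folder_name}_v{n}"
        if not prefix_exists(candidate):
            return candidate, False
    # All 99 versions are taken: fall back to a timestamp suffix.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{folder_name}_{stamp}", False
```

Passing the existence check in as a callable keeps the naming rule testable without touching S3.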


Plain (Unencrypted) Multipart Upload Path

POST /upload/guest/initiate

Initiates a Wasabi S3 multipart upload session.

Logic:

  • Calls auto_register_guest_user(email) to ensure the user exists.
  • Calls s3.create_multipart_upload(Bucket, Key="{email}/{file_key}").
  • Returns { upload_id, user_id }.

S3 key format: {guest_email}/{folder_name}/{filename}


GET /upload/guest/presigned-url

Generates a presigned URL for a specific part number.

Logic:

  • Calls s3.generate_presigned_url("upload_part", ...) with ExpiresIn=100000 seconds (about 27.8 hours).
  • The browser uses this URL to PUT the part directly to Wasabi, so the chunk never passes through the Lenzeye server.
  • Returns { url }.

Direct-to-S3 upload

In the plain path, data goes Browser → S3 directly. The server only generates the presigned URL token. This is the most RAM-efficient upload path.


POST /upload/guest/complete

Finalises the multipart upload by assembling all parts.

Logic:

  • Calls s3.complete_multipart_upload(Bucket, Key, UploadId, MultipartUpload={"Parts": parts}).
  • parts is the list of { ETag, PartNumber } collected by the browser from each presigned URL PUT response.
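Taken together, the presigned-url and complete steps might be sketched as follows. The helper names are hypothetical, and `s3` is assumed to be a boto3-style S3 client passed in by the caller; only the `generate_presigned_url("upload_part", ...)` call and the `{ ETag, PartNumber }` shape come from the text above.

```python
from typing import Dict, List


def presign_part_url(s3, bucket: str, key: str, upload_id: str,
                     part_number: int, expires_in: int = 100000) -> str:
    """Ask the S3 client for a URL the browser can PUT one part to."""
    return s3.generate_presigned_url(
        "upload_part",
        Params={
            "Bucket": bucket,
            "Key": key,
            "UploadId": upload_id,
            "PartNumber": part_number,
        },
        ExpiresIn=expires_in,
    )


def build_parts(etags_by_part: Dict[int, str]) -> List[dict]:
    """Build the Parts list for complete_multipart_upload, in ascending part order."""
    return [
        {"ETag": etags_by_part[n], "PartNumber": n}
        for n in sorted(etags_by_part)
    ]
```

Sorting matters because S3 expects the parts list in ascending PartNumber order, while the browser may finish its PUT requests in any order.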


POST /upload/guest/list-parts

Validates an existing multipart upload session — used for resumability checks.

  • Calls s3.list_parts(...) and returns all uploaded part numbers, ETags, and sizes.
  • Returns { valid: false } gracefully if the upload session no longer exists (NoSuchUpload).

POST /upload/guest/abort

Aborts an incomplete multipart upload — cleans up incomplete parts from S3.

  • Calls s3.abort_multipart_upload(...).
  • Returns success even if already aborted (NoSuchUpload is treated as success).

Encrypted Multipart Upload Path

This path is used when the receiver has encrypt_data_b=True on their User record and the master key is configured.

Architecture: Stateless Session Token

flowchart LR
    A[initiate:<br/>get user key from DB] --> B[generate<br/>16B IV]
    B --> C[create_upload_token<br/>seal with AES-256-GCM]
    C --> D[return session_token<br/>to browser]
    D --> E[browser sends token<br/>on every upload-part]
    E --> F[server decrypts token<br/>zero DB reads per part]

The session_token is an AES-256-GCM sealed blob (keyed with the master key) that carries {user_key, iv, part_size}. Layout of plaintext:

    32 bytes  user_key
    16 bytes  iv
     4 bytes  part_size (big-endian uint32)
    ──────────────────
    52 bytes  plaintext → 80 bytes token (after GCM tag + nonce), ~108 base64url chars

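This means zero DB reads per upload part. The token itself is stateless across workers: any Gunicorn worker can decrypt it and encrypt any part. The HMAC accumulator in _hmac_registry, however, lives in a single worker's memory (see Module-Level State).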
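The size arithmetic of the token can be checked with a small sketch. It covers only the plaintext layout and the length bookkeeping, not the AES-GCM sealing itself. The helper names are hypothetical, and the 12-byte nonce plus 16-byte tag split is an assumption that happens to match the 52 → 80 byte figures above.

```python
import base64
import struct

# user_key (32 bytes), iv (16 bytes), part_size (big-endian uint32)
_LAYOUT = struct.Struct(">32s16sI")

GCM_NONCE_LEN = 12  # assumption: standard 96-bit GCM nonce
GCM_TAG_LEN = 16    # assumption: full 128-bit GCM tag


def pack_token_plaintext(user_key: bytes, iv: bytes, part_size: int) -> bytes:
    """Serialize the three fields into the 52-byte plaintext layout."""
    if len(user_key) != 32 or len(iv) != 16:
        raise ValueError("user_key must be 32 bytes and iv must be 16 bytes")
    return _LAYOUT.pack(user_key, iv, part_size)


def unpack_token_plaintext(blob: bytes):
    """Inverse of pack_token_plaintext: returns (user_key, iv, part_size)."""
    return _LAYOUT.unpack(blob)


def sealed_token_length(plaintext_len: int) -> int:
    """Length of nonce + ciphertext + tag for a GCM-sealed blob."""
    return GCM_NONCE_LEN + plaintext_len + GCM_TAG_LEN


def base64url_length(n_bytes: int) -> int:
    """Length of the padded base64url encoding of n_bytes bytes."""
    return len(base64.urlsafe_b64encode(bytes(n_bytes)))
```

With these assumptions, a 52-byte plaintext seals to 80 bytes, which encodes to 108 base64url characters including one padding character.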


POST /upload/guest/encrypted/initiate

Logic:

  1. Validates part_size >= 5 MB (Wasabi S3 minimum part size).
  2. Fetches or creates the user's encryption key via get_or_create_user_key(user) (2 DB reads, once per file).
  3. Generates a random 16-byte IV: iv = os.urandom(16).
  4. Seals {user_key, iv, part_size} into session_token via create_upload_token(...).
  5. Calls s3.create_multipart_upload(...) with S3 metadata lenzeye-encrypted=true, lenzeye-key-version=<version>.
  6. Returns { upload_id, session_token }.


POST /upload/guest/encrypted/upload-part

This is the most complex route in the system.

Query params: upload_id, part_number, guest_email, file_key, session_token

Body: Raw binary plaintext chunk (application/octet-stream)

Logic (step by step):

```python
key, iv, part_size = decrypt_upload_token(session_token)  # 1 AES-GCM op, zero DB reads

plaintext_chunk = request.get_data()
byte_offset = (part_number - 1) * part_size

# Encrypt and upload both run inside the semaphore (max 4 concurrent across all users).
with _encrypt_semaphore:
    encrypted_chunk = encrypt_multipart_chunk(plaintext_chunk, key, iv, byte_offset)

    if part_number == 1:
        encrypted_chunk = iv + encrypted_chunk  # prepend IV to first part only

    # Accumulate HMAC; the lock only guards the registry dict itself.
    with _hmac_registry_lock:
        if upload_id not in _hmac_registry:
            h = crypto_hmac.HMAC(key, hashes.SHA256())
            _hmac_registry[upload_id] = {"h": h, "iv": iv}
        entry = _hmac_registry[upload_id]

    if part_number == 1:
        entry["h"].update(iv)                    # seed with IV
        entry["h"].update(encrypted_chunk[16:])  # then ciphertext (skip stored IV prefix)
    else:
        entry["h"].update(encrypted_chunk)       # pure ciphertext

    resp = s3.upload_part(...)  # pushes encrypted chunk to Wasabi
```

Why byte_offset?

AES-CTR is a seekable stream cipher. The counter block for any byte offset is:

    block_number = byte_offset // 16
    counter_iv = (iv_as_int + block_number) % 2^128

This means part 3 can be encrypted without having seen parts 1 and 2. Each part's encryption is fully independent.
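The counter arithmetic above can be written out as a small, self-contained function. This is a sketch, not the module's encrypt_multipart_chunk: the function name is hypothetical, the IV is assumed to be interpreted as a big-endian integer, and the offset is assumed to be a multiple of 16 (true whenever part_size is a multiple of the AES block size).

```python
AES_BLOCK = 16  # AES block size in bytes


def counter_block_for_offset(iv: bytes, byte_offset: int) -> bytes:
    """Return the 16-byte CTR counter block used at byte_offset."""
    if len(iv) != AES_BLOCK:
        raise ValueError("iv must be 16 bytes")
    if byte_offset < 0 or byte_offset % AES_BLOCK != 0:
        raise ValueError("byte_offset must be a non-negative multiple of 16")
    block_number = byte_offset // AES_BLOCK
    # Add the block index to the IV and wrap around at 2**128.
    counter = (int.from_bytes(iv, "big") + block_number) % (1 << 128)
    return counter.to_bytes(AES_BLOCK, "big")
```

For example, with 10 MB (10 × 1024 × 1024 byte) parts, part 2 starts at block 655360, so its counter is the IV plus 0x0A0000. A worker that has never seen part 1 can compute this directly.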

S3 object format after all parts uploaded:

    Part 1: [16B IV][encrypted bytes 0 .. part_size-1]
    Part 2: [encrypted bytes part_size .. 2*part_size-1]
    Part 3: [encrypted bytes 2*part_size .. 3*part_size-1]
    ...

    Combined: [16B IV][full ciphertext]


POST /upload/guest/encrypted/complete

Logic:

  1. Decrypts session_token (zero DB reads).
  2. Pops the HMAC entry from _hmac_registry and finalizes it: hmac_hex = entry["h"].finalize().hex() (32 bytes, i.e. 64 hex chars).
  3. Calls s3.complete_multipart_upload(...) with all parts, which assembles the object on S3.
  4. Calls s3.head_object(...) with up to 5 retries (1s sleep between each) to handle Wasabi eventual consistency.
  5. Calls s3.copy_object(...) with MetadataDirective="REPLACE" to store the HMAC in S3 metadata. Zero bytes pass through the Lenzeye server; Wasabi handles the metadata-only copy internally.
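The head_object retry in step 4 might follow a pattern like the one below. This is a generic sketch: `head` is a hypothetical zero-argument callable wrapping the real s3.head_object call, "up to 5 retries" is read here as 5 attempts in total, and real code would catch botocore's ClientError rather than a bare Exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def head_with_retry(head: Callable[[], T], attempts: int = 5,
                    delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call head() until it succeeds, sleeping delay_s between failed attempts."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return head()
        except Exception as exc:  # assumption: narrow this to ClientError in real code
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)  # give the store time to become consistent
    raise last_error
```

Injecting `sleep` makes the wait observable in tests without actually pausing.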

Why not compute HMAC at complete time?

Computing HMAC at complete would require downloading the entire file from Wasabi to hash it — causing a RAM spike proportional to file size. The accumulation-during-upload approach keeps RAM at O(1) regardless of file size.
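The equivalence that makes accumulation safe can be demonstrated with the standard library. Note the swap: the module uses the cryptography package's HMAC class, while this sketch uses Python's built-in hmac module, and the function name is hypothetical.

```python
import hashlib
import hmac
from typing import Iterable


def incremental_tag(key: bytes, iv: bytes, ciphertext_parts: Iterable[bytes]) -> str:
    """Feed the IV, then each ciphertext part in order, and return the hex tag."""
    h = hmac.new(key, digestmod=hashlib.sha256)
    h.update(iv)  # seed with the IV, as in upload-part for part 1
    for part in ciphertext_parts:
        h.update(part)  # only one part is held in memory at a time
    return h.hexdigest()
```

Feeding the parts one by one produces the same tag as hashing the IV plus the full ciphertext in one call, as long as the parts arrive in order. That ordering requirement is why the registry relies on parts being sequential per file.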


POST /upload/guest/encrypted/abort

  • Calls s3.abort_multipart_upload(...).
  • Removes entry from _hmac_registry if present (cleanup).
  • Handles NoSuchUpload gracefully.

Post-Upload Phase

POST /guest-generate-link

Generates the shareable download link after upload completes.

Logic:

  1. Calls auto_register_guest_user(email) (idempotent).
  2. Calls reset_guest_upload_if_new_month().
  3. Constructs s3_key = f"{email}/{folder_name}/".
  4. Creates a GuestDownloadLink record: token = secrets.token_urlsafe(32).
  5. Calls link.guest_generate_otp(), which generates a one-time OTP bound to the link.
  6. If the sender was a temporary guest (temporary_guest_upload_email session key), sets user.enable_guest_upload = False, which prevents the auto-registered sender from re-using guest upload.
  7. Returns { link, token, otp, folder_name }.

Shareable URL format: /guest/transfer-manager?token=<32-byte-urlsafe>&otp=<otp>
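A sketch of how such a link could be assembled is shown below. Only secrets.token_urlsafe(32) and the URL shape come from the text above; the function names are hypothetical, and the 6-digit numeric OTP is purely an assumption, since the format produced by guest_generate_otp() is not documented here.

```python
import secrets
from urllib.parse import urlencode


def new_link_token() -> str:
    """32 random bytes, base64url-encoded without padding."""
    return secrets.token_urlsafe(32)


def new_otp(digits: int = 6) -> str:
    """Hypothetical OTP format: a zero-padded numeric code."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


def build_share_url(token: str, otp: str,
                    base_path: str = "/guest/transfer-manager") -> str:
    """Assemble the shareable URL with properly encoded query parameters."""
    return f"{base_path}?{urlencode({'token': token, 'otp': otp})}"
```

The token is 32 bytes of entropy but 43 characters long once encoded, which is worth keeping in mind when sizing a database column for it.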


POST /guest/send-download-link-email

Currently disabled — route exists, returns success, but does not send email. Email notification to receiver is an upcoming feature.


Utility & Monitoring Routes

GET /upload/ram-status

Live RAM usage for the guest upload page's RAM meter.

Logic:

  • Uses psutil to get the RSS of the current process.
  • Walks up to the Gunicorn parent process and sums the RSS of all worker siblings.
  • Returns { used_mb, total_mb: 512, pct }.
  • No DB or S3 calls; this is purely an OS-level memory read.


GET /guest/wasabi-stats

Returns total files and bytes stored in Wasabi — shown on the guest upload landing page.

Caching strategy:

  • Cache TTL: 5 minutes.
  • If the cache is fresh → return instantly.
  • If the cache is stale or empty → spawn a background threading.Thread to recalculate via s3.get_paginator('list_objects_v2').
  • While recalculating → return stale data immediately with status: "refreshing".
  • If there is no data at all → return status: "calculating".

This ensures the guest upload page never blocks waiting for a full S3 paginate walk.
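The stale-while-refreshing behaviour can be sketched as a small class. Everything here is illustrative: the class name, the `compute` callable (standing in for the S3 paginator walk), and the "ok" status for fresh data are assumptions; only the 5-minute TTL and the "refreshing" and "calculating" statuses come from the text above.

```python
import threading
import time


class StatsCache:
    """Serve cached stats immediately and refresh them in a background thread."""

    def __init__(self, compute, ttl_s=300.0, clock=time.monotonic):
        self._compute = compute  # slow function, e.g. a full bucket walk
        self._ttl = ttl_s
        self._clock = clock
        self._lock = threading.Lock()
        self._data = None
        self._fetched_at = None
        self._worker = None

    def _refresh(self):
        try:
            data = self._compute()
        except Exception:
            data = None  # keep serving the old data if the refresh fails
        with self._lock:
            if data is not None:
                self._data = data
                self._fetched_at = self._clock()
            self._worker = None

    def get(self):
        with self._lock:
            if (self._fetched_at is not None
                    and self._clock() - self._fetched_at < self._ttl):
                return {"status": "ok", "data": self._data}
            if self._worker is None:  # start at most one refresh at a time
                self._worker = threading.Thread(target=self._refresh, daemon=True)
                self._worker.start()
            if self._data is None:
                return {"status": "calculating", "data": None}
            return {"status": "refreshing", "data": self._data}

    def wait(self):
        """Block until any in-flight refresh finishes (useful in tests)."""
        worker = self._worker
        if worker is not None:
            worker.join()
```

The request handler only ever calls `get()`, which returns without waiting for the recalculation, so a slow bucket walk cannot stall the page.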


GET /guest/total-users

Returns User.query.count() — fast SQL count, no caching needed.


POST /upload/test-speed / POST /upload/guest/speedtest

Accepts binary data and discards it — used by the frontend to measure actual upload speed to the server before starting a real upload.


Key Internal Functions

auto_register_guest_user(email)

Creates a User record with is_verified=False if one doesn't exist for the given email. Normalizes email to lowercase. Idempotent — safe to call multiple times.

get_unique_folder_name(email, folder_name)

Checks S3 for prefix existence. Returns the original name if available, otherwise tries _v1 through _v99, then falls back to _YYYYMMDD_HHMMSS.

get_guest_upload_guidelines()

Reads guidelines/guestuploadguidelines.txt from disk and returns (text, sha256_hash). The hash serves as the version identifier for guidelines acceptance tracking.


Memory Safety Design

| Design Decision | Memory Impact |
| --- | --- |
| BoundedSemaphore(4) | Caps peak encryption buffer at ~80 MB (4 slots × 10 MB chunks, presumably counting plaintext and ciphertext copies) |
| HMAC accumulated during upload-part | /complete never downloads from S3 → zero RAM spike |
| Presigned URLs for plain upload | Browser → S3 direct → server RAM = 0 for data |
| session_token carries key+IV | Zero DB reads per part → no connection pool pressure |
| 5-min cache for Wasabi stats | Background thread, never blocks request handler |

Error Handling Summary

| Scenario | Handling |
| --- | --- |
| NoSuchUpload on abort | Treated as success (idempotent) |
| NoSuchUpload on list-parts | Returns { valid: false } |
| session_token tampered | decrypt_upload_token raises ValueError → 400 |
| HMAC registry missing at complete | Server restart mid-upload → 500 with retry message |
| Wasabi eventual consistency on head_object | Up to 5 retries with 1s sleep |
| Storage over limit | Blocked at pre-upload check, clear error message |
| Guidelines changed between read and accept | 409 Conflict response |

| File | Role |
| --- | --- |
| lenzeye_encryption_service.py | encrypt_multipart_chunk, create_upload_token, decrypt_upload_token, get_or_create_user_key |
| wasabiboto3.py | s3 client, bucket_name |
| lenzeye_database.py | GuestDownloadLink model |
| lenzeye_BiodataStructure.py | User, UserEncryptionKey models |
| guest_upload_monthly_reset.py | reset_guest_upload_if_new_month() |
| admin_db_operations.py | calculate_storage_used_from_wasabi() |

TL;DR — Quick Overview

What it does: Allows anyone to upload files into a registered user's cloud storage without needing an account. Generates a secure download link after upload.

Two upload paths:

  • Plain path — Browser uploads directly to Wasabi S3 via presigned URLs. Server RAM usage for data = zero.
  • Encrypted path — Chunks pass through the server, encrypted with AES-256-CTR before hitting S3. HMAC-SHA256 integrity tag accumulated per chunk, stored in S3 metadata on completion.

Key techniques used:

  • S3 Multipart Upload — files split into 10 MB parts, assembled on S3 side.
  • AES-256-CTR — seekable stream cipher; each chunk encrypted independently at its byte offset.
  • HMAC-SHA256 (Encrypt-then-MAC) — integrity verified on every download.
  • AES-256-GCM session token — carries encryption key + IV sealed by master key; zero DB reads per upload part, works across all Gunicorn workers.
  • BoundedSemaphore(4) — caps concurrent encrypt+upload ops to bound RAM at ~80 MB peak.
  • SHA-256 guidelines hash — version-tracks upload guidelines; forces re-acceptance on any change.
  • Background thread + 5-min cache — Wasabi storage stats never block the upload page.
  • OTP-protected download links: secrets.token_urlsafe(32) token + one-time OTP per upload session.