feature/to library #7

Open

cperales wants to merge 41 commits into main from feature/to-library
Conversation

@cperales
Owner

  • Fix: Handle ClientDisconnect in upload handlers
  • docker-compose config file updated; now includes DB path
  • DB ignored for git tracking
  • Fix error with `detail` argument in Response
  • Phase 5.2: S3-compatible API for resumable multipart uploads
  • S3 endpoint on /
  • S3 endpoint on /s3
  • access and secret key must be configured for S3 connection
  • GET method for /storage through S3 protocol
  • Fix S3 Sig V4 canonical query string to preserve percent-encoding
  • Avoid the double encoding
  • Handling ISO strings as date format in S3
  • Pre-signed thumbnail and file requests will now verify correctly (I hope)
  • Add debug logging for presigned URL signature mismatches
  • Fix: URL-decode presigned URL credentials before parsing
  • Fix: URL-decode X-Amz-SignedHeaders before splitting
  • Adapt rclone use: fix routing, remove TUS, add MD5 ETags, and improve S3 compatibility
  • More log to S3
  • Adding COPY to S3
  • More logging in the S3 endpoint
  • Entry point renamed from main.py to entrypoint.py, to avoid confusion with pythowncloud/main.py
  • Also mount S3 endpoint on /
  • entrypoint.py needs to be specified at Dockerfile copy
  • Tests added
  • Fix S3 CompleteMultipartUpload ETag comparison
  • Logging the S3 multipart
  • Fix race condition in concurrent S3 multipart part uploads
  • Remove debug logging from CompleteMultipartUpload
  • Docker test added to GitHub Actions
  • Implement S3 ListObjectsV2 pagination for flat listing (delimiter=)
  • Dummy variables added for Docker Health GitHub Action
  • Typo fixed
  • Indentation fixed
  • Tests on push, not only dev/main
  • Update docker healthcheck to handle curl errors gracefully
  • Hide .thumb folder from web browser listings
  • Remove /s3 prefix mount, update docs and architecture diagram
  • README updated, S3 API endpoint is at /storage/

cperales and others added 30 commits March 11, 2026 09:11
When clients disconnect during file uploads (WebDAV, File API, TUS),
the server was crashing with an unhandled ClientDisconnect exception.
Added try-except blocks to catch and gracefully handle these disconnects:

- WebDAV PUT handler: Logs warning, cleans up partial file, returns 400
- File API PUT handler: Logs warning, cleans up partial file, raises HTTP 400
- TUS PATCH handler: Logs warning, raises HTTP 400 to allow resumption

This prevents server crashes during slow uploads or unstable connections
(e.g., Android syncing over poor network).

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
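The cleanup pattern described above can be sketched as follows. This is a minimal illustration, not the project's actual handler: `receive_upload`, `stream`, and `save_path` are hypothetical names, and the `ClientDisconnect` stub is only there so the sketch runs without Starlette installed.

```python
import os

try:
    from starlette.requests import ClientDisconnect  # raised by Starlette/FastAPI body streams
except ImportError:  # stub so the sketch is self-contained without Starlette
    class ClientDisconnect(Exception):
        pass

async def receive_upload(stream, save_path: str) -> int:
    """Stream upload chunks to disk; on client disconnect, remove the
    partial file and re-raise so the route handler can answer with an
    HTTP 400 instead of letting the exception crash the worker."""
    written = 0
    try:
        with open(save_path, "wb") as f:
            async for chunk in stream:
                f.write(chunk)
                written += len(chunk)
    except ClientDisconnect:
        if os.path.exists(save_path):
            os.remove(save_path)  # clean up the partial upload
        raise  # caller maps this to a 400 response
    return written
```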
Implements AWS Signature V4 auth and S3 multipart upload support to enable
resumable uploads for rclone and S3Drive on flaky mobile connections.

New modules:
- pythowncloud/s3_auth.py: AWS Signature V4 HMAC-SHA256 verification
  (stdlib-only: hmac, hashlib, urllib.parse)
- pythowncloud/s3_xml.py: S3 XML response builders for all endpoints
- pythowncloud/routers/s3.py: Complete S3 API router (~750 lines)
- pythowncloud/uploads.py: Unified cleanup for TUS and S3 multipart uploads

Features implemented:
- Single-object ops: GET, PUT (with streaming & MD5), HEAD, DELETE
- Bucket ops: ListBuckets, HeadBucket, ListObjectsV2 (with prefix/delimiter)
- Multipart upload: Initiate, UploadPart, Complete, Abort, ListParts
- Empty directory creation (trailing-slash PUTs with DB upsert)
- Atomic completion (temp-file-then-rename pattern)
- Incremental SHA256 hashing during concatenation (Pi 3 optimization)
- Burst-aware thumbnail generation (integrates with Phase 3.2)
- S3-compliant XML with critical fields: <KeyCount>, <MaxKeys>, <IsTruncated>

Database additions:
- db.list_all_under(prefix): recursive flat file listing for S3 flat queries

Config additions:
- s3_access_key, s3_secret_key, s3_region

Refactoring:
- Moved cleanup_abandoned_uploads() from tus.py to unified uploads.py
- Removed TUS cleanup function from tus.py (25 lines)
- Updated main.py to import from unified uploads module
- Added S3 router mounting at /s3 prefix

All three APIs (REST, WebDAV, S3) now coexist on same port with shared
storage, database, cache, and thumbnail logic.

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
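The core of the stdlib-only Signature V4 verification mentioned above is the HMAC-SHA256 key-derivation chain defined by the AWS spec. A minimal sketch (function names are illustrative, not necessarily those in `s3_auth.py`):

```python
import hashlib
import hmac

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def sigv4_signature(secret_key: str, date: str, region: str,
                    service: str, string_to_sign: str) -> str:
    """Derive the SigV4 signing key and sign the string-to-sign.

    `date` is the 8-digit YYYYMMDD date from the credential scope;
    the key is derived by chaining HMACs per the AWS spec.
    """
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    k_signing = _hmac(k_service, "aws4_request")
    return hmac.new(k_signing, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

The server recomputes this signature from the request and compares it (ideally with `hmac.compare_digest`) against the one the client sent.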
parse_qs was decoding percent-encoded values (e.g. %2F → /) before
building the canonical query string, causing signature mismatches
with S3 clients like S3Drive. Now parses the raw query string directly
and applies URI-encoding per AWS Signature V4 spec.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
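A sketch of the described fix: split the raw query string by hand, decode each key/value exactly once, then re-encode with the SigV4 alphabet (only `A–Z a–z 0–9 - . _ ~` unescaped) and sort. This is illustrative; the real `s3_auth.py` implementation may differ in detail.

```python
from urllib.parse import quote, unquote

def canonical_query_string(raw_query: str) -> str:
    """Build the SigV4 canonical query string from the *raw* query.

    parse_qs would turn %2F back into '/', losing the client's encoding;
    instead, split the raw string, unquote once, and re-encode per spec.
    """
    if not raw_query:
        return ""
    pairs = []
    for item in raw_query.split("&"):
        key, _, value = item.partition("=")  # keys without '=' become 'key='
        pairs.append((quote(unquote(key), safe="-._~"),
                      quote(unquote(value), safe="-._~")))
    pairs.sort()
    return "&".join(f"{k}={v}" for k, v in pairs)
```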
Include path, query string, headers, and payload hash in warning logs
to debug pre-signed URL authentication failures.

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
Presigned URLs have percent-encoded slashes in X-Amz-Credential (e.g., pythowncloud%2F20260311%2F...).
The credential must be decoded before splitting on '/' to extract access key, date, region, etc.
This was causing all presigned URL requests to fail with 'Invalid credential format' (403).

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
Presigned URLs have percent-encoded header separators in X-Amz-SignedHeaders.
For example: 'host%3Bx-amz-meta-mtime' where %3B is the encoded semicolon.
Without decoding, split(';') fails to separate headers, resulting in empty headers_to_sign dict.
This caused all PUT requests with multiple signed headers to fail with signature mismatches.

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
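Both decoding fixes follow the same shape: unquote the query-parameter value before splitting it. A minimal sketch (assuming the values arrive still percent-encoded, as described above; `parse_presigned_auth` is a hypothetical helper name):

```python
from urllib.parse import unquote

def parse_presigned_auth(params: dict) -> tuple:
    """Decode presigned-URL auth parameters before splitting them.

    X-Amz-Credential arrives as e.g. 'key%2F20260311%2Fus-east-1%2Fs3%2Faws4_request';
    X-Amz-SignedHeaders as e.g. 'host%3Bx-amz-meta-mtime' (%3B is ';').
    """
    credential = unquote(params["X-Amz-Credential"])
    access_key, date, region, service, scope = credential.split("/")
    signed_headers = unquote(params["X-Amz-SignedHeaders"]).split(";")
    return access_key, date, region, service, signed_headers
```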
Adapt rclone use: fix routing, remove TUS, add MD5 ETags, and improve S3 compatibility

This commit addresses seven major issues for better rclone S3 integration:

1. **Route cleanup (main.py)**: Removed duplicate root-level mounts for S3 and WebDAV.
   Routes now live at single prefixes only: REST at /, WebDAV at /dav/, S3 at /s3/.

2. **Remove TUS entirely**: Deleted tus.py router. TUS has been replaced by S3 multipart
   which is now the only upload protocol. Updated uploads.py to only clean up S3 uploads.

3. **Merge duplicate GET routes (s3.py)**: Combined two GET /storage/{key:path} handlers
   into one, checking for ?uploadId= first (ListParts) before file download.

4. **Fix ETag consistency (db.py, s3.py, files.py, webdav.py, scanner.py, s3_xml.py)**:
   - Added md5 column to files table (migration-safe via ALTER TABLE)
   - Updated upsert_file() to accept and store md5 parameter
   - Modified all upload paths (s3 PUT, multipart completion, REST PUT, WebDAV PUT)
     to compute MD5 alongside SHA256
   - Updated scanner to compute both hashes during file scan
   - Changed ListObjectsV2 to return md5 field as ETag (instead of truncated SHA256)
   - Multipart uploads now store composite ETag as md5 (md5_of_concatenated-partcount)
   This fixes rclone re-uploading files on every sync.

5. **XML namespace fallback (s3.py)**: Added fallback for clients that don't send
   S3 XML namespace. Now handles both {http://s3...}Part and bare Part elements.

6. **Canonical URI encoding (s3_auth.py)**: Added proper _canonical_uri function
   implementing AWS Signature V4 spec with per-segment URI encoding (RFC 3986).

7. **Preserve prefix trailing slash (s3.py)**: Separated raw_prefix (for XML response)
   from prefix_for_db (for DB queries) in _list_objects_v2 to preserve trailing slashes
   in responses.

Files changed:
- pythowncloud/main.py: removed imports, routers, TUS setup
- pythowncloud/routers/tus.py: DELETED
- pythowncloud/uploads.py: simplified to S3-only cleanup
- pythowncloud/db.py: added md5 column migration, updated upsert_file signature
- pythowncloud/routers/s3.py: merged GET routes, XML namespace fallback,
  prefix handling, added md5 parameter to upsert_file calls
- pythowncloud/routers/files.py: added MD5 computation in streaming loop
- pythowncloud/routers/webdav.py: added MD5 computation in streaming loop
- pythowncloud/scanner.py: added MD5 to checksum function return type
- pythowncloud/s3_auth.py: added _canonical_uri function for proper path encoding
- pythowncloud/s3_xml.py: updated to use md5 field for ETag in listings

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
Normalize ETags by stripping quotes before comparison. Client sends unquoted
ETags in XML but server stores them quoted, causing 400 errors on completion.

Add debug logging for ETag mismatches.

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
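The normalization is a one-liner: strip surrounding double quotes from both sides before comparing. A sketch of the idea (the helper name is illustrative):

```python
def etags_match(client_etag: str, stored_etag: str) -> bool:
    """Compare ETags ignoring surrounding double quotes.

    Clients may send 'abc123-2' in the CompleteMultipartUpload XML while
    the server stores '"abc123-2"'; comparing raw strings yields a 400.
    """
    return client_etag.strip('"') == stored_etag.strip('"')
```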
Concurrent UploadPart requests all read the .meta file before streaming,
then overwrote it after — so only the last writer's part number survived.

Fix: move meta read inside a per-upload asyncio.Lock so the read-modify-write
is atomic. Lock is created on first part upload and cleaned up on complete/abort.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
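The per-upload lock pattern can be sketched as below: one `asyncio.Lock` per upload ID, created lazily, wrapping the whole read-modify-write of the `.meta` file. Names (`record_part`, the JSON meta layout) are illustrative, not the project's actual ones, and the real fix also holds the lock across the streaming read.

```python
import asyncio
import json

_upload_locks: dict[str, asyncio.Lock] = {}

def _lock_for(upload_id: str) -> asyncio.Lock:
    # Created on first part upload; complete/abort should pop it from the dict.
    return _upload_locks.setdefault(upload_id, asyncio.Lock())

async def record_part(meta_path: str, upload_id: str,
                      part_number: int, etag: str) -> None:
    """Atomically record a part's ETag in the upload's .meta file.

    Without the lock, concurrent UploadPart handlers all read the same
    snapshot and the last writer's part number is the only one that survives.
    """
    async with _lock_for(upload_id):
        with open(meta_path) as f:
            meta = json.load(f)
        meta.setdefault("parts", {})[str(part_number)] = etag
        with open(meta_path, "w") as f:
            json.dump(meta, f)
```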
Clean up temporary debug logging used for troubleshooting the race condition.

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
cperales and others added 11 commits March 17, 2026 18:06
Fixes S3Drive cache rebuild timeout by implementing keyset pagination
with base64-encoded continuation tokens for flat file listings.

Changes:
1. db.py: list_all_under() now accepts after_key and limit parameters
   - Pushes page boundary into SQLite (LIMIT max_keys+1)
   - Stops full table scans on large folders

2. routers/s3.py: _list_objects_v2() reads and processes tokens
   - Decodes base64 continuation-token from query param
   - Passes after_key cursor to DB (keyset pagination)
   - Computes next_continuation_token from last object path
   - Passes token to XML builder for truncated responses

3. s3_xml.py: build_list_objects_v2() now emits NextContinuationToken
   - Adds next_continuation_token parameter
   - Emits <NextContinuationToken> XML element when is_truncated=true

S3Drive no longer times out (70s) waiting for pagination tokens.

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
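The keyset-pagination scheme above can be sketched end to end: base64 tokens wrap the last returned key, and the DB query fetches `max_keys + 1` rows so truncation is detected without a count. This is an illustrative sketch against an assumed `files(path)` table, not the project's exact schema (and a real `list_all_under` would also escape LIKE wildcards in the prefix).

```python
import base64
import sqlite3

def encode_token(last_key: str) -> str:
    return base64.urlsafe_b64encode(last_key.encode()).decode()

def decode_token(token: str) -> str:
    return base64.urlsafe_b64decode(token.encode()).decode()

def list_all_under(conn, prefix: str, after_key: "str | None", max_keys: int):
    """Flat listing with keyset pagination: `path > after_key` plus an
    index on path replaces OFFSET scans over large folders."""
    rows = conn.execute(
        "SELECT path FROM files WHERE path LIKE ? AND path > ? "
        "ORDER BY path LIMIT ?",
        (prefix + "%", after_key or "", max_keys + 1),  # +1 row detects truncation
    ).fetchall()
    truncated = len(rows) > max_keys
    rows = rows[:max_keys]
    next_token = encode_token(rows[-1][0]) if truncated and rows else None
    return [r[0] for r in rows], truncated, next_token
```

On the wire, `next_token` becomes `<NextContinuationToken>` in the ListObjectsV2 response, and the client echoes it back as `continuation-token` on the next request.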