
import-paperless script fails due to mixed API paths (/api vs /app/api) and async processing in Docspell 0.43 #3197

@voyager


Hi Docspell team,

I'm migrating from Paperless using tools/import-paperless.sh (originally written for Docspell 0.3 beta) on a UGREEN NAS (Docker Compose) and encountered several issues with the current import script:

  1. Mixed API base paths
  • Authentication is served at /api/v1/open/auth/login, while the secured endpoints live under /app/api/v1/sec/...
  2. Asynchronous document processing not handled
  • The upload endpoint /app/api/v1/sec/upload/item returns success immediately, but the documents are only queued for processing
  • The script tries to set metadata right after the upload, when the documents do not have item IDs yet
  • Processing can take 15-30+ minutes per document (OCR, NLP analysis, etc.)
  • Solution: implemented a two-pass approach:
    • Pass 1 (mode=upload): upload all documents quickly
    • Wait for the processing queue to complete
    • Pass 2 (mode=metadata): apply all metadata, using /checkfile/{checksum} to look up the document IDs
  3. Upload payload format
  • The script used file=@path, but the web UI sends file[]=@path together with a specific metadata JSON part
  • Required metadata structure:
    {"multiple":true,"flattenArchives":false,"direction":"incoming","folder":null,"skipDuplicates":true,"tags":null,"fileFilter":null,"language":null,"attachmentsOnly":null,"customData":null}
  4. Shell quoting issues
  • The original script had nested-quote problems and was missing required fields (e.g., "use":"correspondent" for organizations)
  • Fixed by using jq -n to build JSON payloads safely
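To illustrate the quoting problem in item 4, here is a minimal sketch comparing a printf-built payload with a `jq -n --arg` one for a value that contains double quotes. The organization name is invented, and the sketch requires jq:

```shell
#!/usr/bin/env bash
# Compare printf-built and jq-built JSON for a value containing quotes.
# Requires jq. The organization name is an invented example.
name='ACME "Holding" GmbH'

# Naive: %s inserts the value verbatim, so the embedded quotes end the
# JSON string early and the result is not valid JSON.
naive=$(printf '{"name":"%s","use":"correspondent"}' "$name")

# Safe: jq -n --arg passes the value as a string and escapes it.
safe=$(jq -n --arg name "$name" '{name: $name, use: "correspondent"}')

# Succeeds only if the argument parses as JSON (jq -e . exits non-zero
# on a parse error).
is_valid_json() { jq -e . >/dev/null 2>&1 <<<"$1"; }
```

`is_valid_json "$naive"` fails and `is_valid_json "$safe"` succeeds. Passwords and tag names have the same exposure, which is why every payload goes through jq.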

Environment

  • Platform: UGREEN NAS (Docker Compose)
  • Docspell version: 0.43.0
  • UI base: http://localhost:7880/app/dashboard
  • Auth endpoint: /api/v1/open/auth/login
  • Secured endpoints: /app/api/v1/sec/...
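On this setup the open and secured endpoints sit under different prefixes. One option is to build every URL through two small helpers so that the split lives in one place. This is a minimal sketch that assumes the prefixes listed above. `DS_BASE`, `open_url`, and `sec_url` are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers that keep the two API prefixes observed on this
# Docspell 0.43 setup in one place.
DS_BASE="${DS_BASE:-http://localhost:7880}"

# Open (unauthenticated) endpoints such as auth/login, under /api.
open_url() {
  printf '%s/api/v1/open/%s\n' "${DS_BASE%/}" "$1"
}

# Secured endpoints such as upload/item or checkfile/..., under /app/api.
sec_url() {
  printf '%s/app/api/v1/sec/%s\n' "${DS_BASE%/}" "$1"
}
```

For example, use `"$(open_url auth/login)"` for the login request and `"$(sec_url upload/item)"` for uploads. If the /api vs /app/api split turns out to be unintended, only these two functions need to change.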

Working solution summary

Authentication:

payload=$(jq -n --arg account "$user" --arg password "$password" \
  '{account: $account, password: $password}')
curl -s -X POST -H 'Content-Type: application/json' \
  -d "$payload" "http://localhost:7880/api/v1/open/auth/login"
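Building the login payload with jq rather than printf keeps quotes or backslashes in the password from producing invalid JSON. Below is a sketch of a login helper that does this and extracts the token. It assumes the login response carries `success`, `token`, and `requireSecondFactor` fields, so check this against your server's actual response. `ds_login` and `extract_token` are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: log in and print the session token. Requires curl and jq.
# Assumption: the response has "success", "token" and (optionally)
# "requireSecondFactor" fields.

# Reads a login response on stdin; prints the token, or fails if the
# login was rejected or a second factor is still required.
extract_token() {
  jq -er 'if .success == true and (.requireSecondFactor // false) == false
          then .token
          else error("login failed or second factor required") end'
}

ds_login() {
  local base="$1" user="$2" password="$3" payload resp
  # --arg escapes quotes and backslashes in the credentials.
  payload=$(jq -n --arg account "$user" --arg password "$password" \
    '{account: $account, password: $password}') || return 1
  resp=$(curl -s -X POST -H 'Content-Type: application/json' \
    -d "$payload" "$base/api/v1/open/auth/login") || return 1
  extract_token <<<"$resp"
}
```

Usage: `token=$(ds_login http://localhost:7880 "$user" "$password") || exit 1`. With 2FA enabled, this sketch stops instead of continuing with a half-authenticated session.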

Organization create (with required "use" field):

payload=$(jq -n --arg name "$org_name" '{
  id: "", name: $name,
  address: {street: "", zip: "", city: "", country: ""},
  contacts: [], notes: null, created: 0,
  shortName: null, use: "correspondent"
}')
curl -s -X POST -H "X-Docspell-Auth: $token" \
  -H 'Content-Type: application/json' \
  -d "$payload" "http://localhost:7880/app/api/v1/sec/organization"

Document upload (matching web UI format):

meta_json='{"multiple":true,"flattenArchives":false,"direction":"incoming","folder":null,"skipDuplicates":true,"tags":null,"fileFilter":null,"language":null,"attachmentsOnly":null,"customData":null}'
curl -s -X POST -H "X-Docspell-Auth: $token" \
  -F "meta=$meta_json" \
  -F "file[]=@$filepath" \
  "http://localhost:7880/app/api/v1/sec/upload/item"

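For pass 2 to find items via /checkfile/{checksum}, pass 1 can record each file's checksum at upload time. This minimal sketch assumes the checksum is the lowercase hex SHA-256 of the file contents (what `sha256sum` prints) and uses an invented tab-separated manifest format. `file_checksum` and `record_upload` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Sketch: checksums for the /checkfile/{checksum} lookup in pass 2.
# Assumption: Docspell expects the hex SHA-256 of the file bytes.
# Requires sha256sum (GNU coreutils).

# Prints the SHA-256 digest of a file. Reading from stdin avoids the
# escaped-filename output sha256sum uses for unusual names.
file_checksum() {
  local out
  out=$(sha256sum < "$1") || return 1
  printf '%s\n' "${out%% *}"   # keep only the digest, drop the "-"
}

# Appends "<paperless_id>\t<checksum>" to a manifest (hypothetical format)
# so that pass 2 can look items up without re-hashing.
record_upload() {
  local sum
  sum=$(file_checksum "$2") || return 1
  printf '%s\t%s\n' "$1" "$sum" >> "$3"
}
```

In pass 1, `record_upload "$doc_id" "$filepath" manifest.tsv` would follow each successful upload. The variable names here are illustrative.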
Check processing status:

  curl -s -H "X-Docspell-Auth: $token" \
    "http://localhost:7880/app/api/v1/sec/checkfile/$checksum"

Returns: {"exists":false} while processing, {"exists":true,"items":[{"id":"..."}]} when done.
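Instead of a fixed wait between the passes, each document can be polled on this endpoint. This is a minimal sketch. It assumes the response shapes above and that `DS_BASE` and `DS_TOKEN` hold the base URL and session token (hypothetical variable names). The defaults of 60 polls, 30 seconds apart, cover the 15-30 minute processing times reported above:

```shell
#!/usr/bin/env bash
# Sketch: poll /checkfile/{checksum} until the item exists, then print its id.
# Assumes {"exists":false} while processing and
# {"exists":true,"items":[{"id":...}]} when done. Requires curl and jq.

# Prints the item id, or fails while the file is still being processed
# (jq -e exits non-zero when it outputs nothing).
check_item_id() {
  curl -s -H "X-Docspell-Auth: $DS_TOKEN" \
    "$DS_BASE/app/api/v1/sec/checkfile/$1" |
    jq -er 'if .exists then .items[0].id else empty end'
}

# wait_for_item CHECKSUM [MAX_TRIES] [INTERVAL_SECONDS]
wait_for_item() {
  local checksum="$1" tries="${2:-60}" interval="${3:-30}" id i
  for ((i = 1; i <= tries; i++)); do
    if id=$(check_item_id "$checksum") && [ -n "$id" ]; then
      printf '%s\n' "$id"
      return 0
    fi
    if ((i < tries)); then sleep "$interval"; fi
  done
  return 1   # no item after MAX_TRIES polls
}
```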

Results

I successfully imported 51 files end‑to‑end using the two‑pass approach (upload first, then metadata after processing) with the following commands:

Pass 1: upload (fast, queues processing)

./import-paperless.sh \
  http://localhost:7880 \
  your_user \
  'YOUR_PASSWORD' \
  /home/user/paperless/data/db.sqlite3 \
  /home/user/paperless/media/documents/originals \
  upload

Pass 2: metadata (after processing finishes)

./import-paperless.sh \
  http://localhost:7880 \
  your_user \
  'YOUR_PASSWORD' \
  /home/user/paperless/data/db.sqlite3 \
  /home/user/paperless/media/documents/originals \
  metadata
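Processing times vary, so pass 2 can be made safe to re-run. It applies metadata to items that already have an ID and sets the rest aside. This sketch makes several assumptions. The manifest format is hypothetical. The caller must supply `check_item_id` and `apply_metadata`, for example a curl call to /checkfile/{checksum} plus the script's existing metadata calls:

```shell
#!/usr/bin/env bash
# Sketch of a re-runnable pass 2. Reads "<paperless_id>\t<checksum>" lines
# (hypothetical manifest format), applies metadata to items that already
# have an id, and writes the others to a pending file for the next run.
# Hypothetical caller-supplied helpers:
#   check_item_id CHECKSUM            prints an item id, or fails
#   apply_metadata PAPERLESS_ID ITEM_ID

metadata_pass() {
  local manifest="$1" pending_file="$2" pid sum id applied=0 pending=0
  : > "$pending_file"
  # Read the manifest on fd 3 so helpers that read stdin cannot consume it;
  # the [ -n "$pid" ] test keeps a final line without a trailing newline.
  while IFS=$'\t' read -r pid sum <&3 || [ -n "$pid" ]; do
    [ -n "$sum" ] || continue
    if id=$(check_item_id "$sum") && [ -n "$id" ] &&
       apply_metadata "$pid" "$id"; then
      applied=$((applied + 1))
    else
      # Not processed yet, or applying metadata failed: retry next run.
      printf '%s\t%s\n' "$pid" "$sum" >> "$pending_file"
      pending=$((pending + 1))
    fi
  done 3< "$manifest"
  printf 'applied=%d pending=%d\n' "$applied" "$pending"
}
```

Once processing has caught up, run it again with the pending file as the manifest. Write the new pending list to a different file, because `: >` would otherwise truncate the input before it is read.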

Note: This run was orchestrated with Claude Code. The core changes were: aligning the paths (auth at /api/v1/open/auth/login, secured endpoints under /app/api/v1/sec/…), building JSON payloads with printf/jq -n, switching the upload to file[] with the proper metadata JSON, and deferring metadata until items had IDs via /checkfile/{checksum}. I also disabled 2FA on my account to keep authentication simple.

Request

  • Please clarify whether the /api vs /app/api split is intended behavior when the UI is served from /app
  • Consider updating import-paperless.sh to:
    a. Support a two-pass mode (upload, then apply metadata after processing)
    b. Use the correct API paths consistently
    c. Include required fields such as "use":"correspondent" for organizations
    d. Use the proper upload format (file[] with the metadata JSON)
    e. Build JSON with jq -n (or printf only for fixed strings) to avoid quoting issues
    f. Handle a missing bc command (use $((seconds * 1000)) instead)
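For item f, here is a minimal sketch of the bc-free date conversion. It assumes GNU date (`-d`), since BusyBox date on some NAS images accepts fewer input formats. It also assumes the API expects epoch milliseconds, as the `* 1000` above implies. `to_millis` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: convert a date string to epoch milliseconds with shell
# arithmetic instead of bc. Assumes GNU date; -u interprets the input
# as UTC.
to_millis() {
  local seconds
  seconds=$(date -u -d "$1" +%s) || return 1
  # Bash arithmetic is 64-bit, so millisecond timestamps do not overflow.
  printf '%s\n' "$((seconds * 1000))"
}
```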

Attached is the modified version of totti4ever's original import-paperless.sh that successfully imported the documents with full metadata from Paperless-ngx v2.18.4.

Thanks for the excellent project!

import-paperless.sh
