Description
Provide a screenshot and describe the bug
When I submit multiple URLs for archival, they are processed serially, which I think is intentional and good.
However, when I select multiple snapshots and click "Archive Again", they are very noticeably processed in parallel, which breaks the Chromium profile. It sometimes leaves the Chromium lock files in /data/personas/Default/chrome_profile/Singleton*, which prevents future Chromium launches. Pretty much all archival attempts fail on the second run, and even single URLs fail once the Chromium lockfile issue has been triggered.
I'm not exactly sure what the knock-on effects are but the following fail very consistently once I get into this state:
- archive_org
- htmltotext
- readability
- title
- dom
- screenshot
- singlefile
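To illustrate the kind of serialization that would avoid the collision, here is a minimal sketch of an exclusive advisory lock that a job could hold while it uses the shared Chromium profile. This is not ArchiveBox's code; the `profile_lock` helper and the lock file path are hypothetical, and it assumes a POSIX system where `fcntl.flock` is available (advisory locks may behave differently on network or FUSE mounts).

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def profile_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block.

    Hypothetical helper: the lock file is separate from Chromium's own
    Singleton* files and exists only to serialize callers.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)
```

A Chromium-based extractor would then run inside `with profile_lock(some_lock_path): ...`, so a second job waits instead of launching a second browser against the same profile.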
Workaround is to delete the Chrome profile and only submit one URL at any time:
rm -r data/personas/
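A narrower cleanup than deleting the whole persona might be to remove only the leftover `Singleton*` entries. The sketch below assumes no Chromium process is still using the profile; the function name is hypothetical and I have not verified that this alone restores a working state. On Linux these entries are usually symlinks, which can be dangling, so the check uses `is_symlink()` as well as `is_file()`.

```python
from pathlib import Path


def remove_chromium_singleton_files(profile_dir):
    """Delete leftover Singleton* entries in profile_dir; return the names removed.

    Hypothetical helper. Only call this when no Chromium process is running
    against the profile, otherwise a live lock would be removed.
    """
    removed = []
    for entry in sorted(Path(profile_dir).iterdir()):
        if not entry.name.startswith("Singleton"):
            continue
        # A dangling symlink reports exists() == False, so test is_symlink() first.
        if entry.is_symlink() or entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed
```

For the layout in this report, the argument would be the host-side path to `data/personas/Default/chrome_profile`.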
Steps to reproduce
- docker run -v "./data/:/data/" archivebox/archivebox:dev archivebox init
- docker run -v "./data/:/data/" -it archivebox/archivebox:dev archivebox manage createsuperuser
- docker run -p "8000:8000" -v "./data/:/data/" archivebox/archivebox:dev
- Visit http://localhost:8000/add/ and add 3 URLs, all at once:
- Let those finish archiving for like 2 minutes
- Select all and "Archive Again" from http://localhost:8000/admin/core/snapshot/
Logs or errors
From `worker_scheduler.log`:
> screenshot
Extractor failed:
Failed to save screenshot
[1437:1437:1118/224805.720253:ERROR:process_singleton_posix.cc(340)]
Failed to create /data/personas/Default/chrome_profile/SingletonLock: File
exists (17)
[1437:1437:1118/224805.720577:ERROR:chrome_main_delegate.cc(594)]
Failed to create a ProcessSingleton for your profile directory. This means that
running multiple instances would start multiple browser processes rather than
opening a new window in the existing process. Aborting now to avoid profile
corruption.
ArchiveBox Version
0.8.5rc51
ArchiveBox v0.8.5rc51 COMMIT_HASH=63bf902 BUILD_TIME=2024-10-24 06:30:40
1729751440
IN_DOCKER=True IN_QEMU=False ARCH=aarch64 OS=Linux
PLATFORM=Linux-6.10.4-linuxkit-aarch64-with-glibc2.36 PYTHON=Cpython
EUID=911:0 UID=911:0 PUID=911:0 FS_UID=911:0 FS_PERMS=644 FS_ATOMIC=True
FS_REMOTE=True
DEBUG=False IS_TTY=False SUDO=False ID=9f373648:efbea00e SEARCH_BACKEND=ripgrep
LDAP=False
Binary Dependencies:
√ python 3.11.10 sys_pip /usr/local/bin/python3.11
√ django 5.1.2 sys_pip /usr/local/lib/python3.11/site-packages/django/__init__.py
√ sqlite 2.6.0 sys_pip /usr/local/lib/python3.11/site-packages/django/db/backends/sqlite3/base.py
√ pip 24.0.0 sys_pip /usr/local/bin/pip
√ pipx 1.1.0 sys_pip /usr/bin/pipx
√ node 22.10.0 apt /usr/bin/node
√ npm 10.9.0 apt /usr/bin/npm
√ npx 10.9.0 apt /usr/bin/npx
√ playwright 1.48.0 sys_pip /usr/local/bin/playwright
√ puppeteer 23.6.0 lib_npm ~/.npm/bin/puppeteer
√ ldap 3.4.4 sys_pip /usr/local/lib/python3.11/site-packages/ldap/__init__.py
√ rg 13.0.0 apt /usr/bin/rg
√ sonic 1.4.9 env /usr/local/bin/sonic
√ chrome 130.0.6723 env /usr/bin/chromium-browser
√ curl 8.10.1 apt /usr/bin/curl
√ git 2.39.5 apt /usr/bin/git
√ postlight-parser 2.2.3 sys_npm ~/.npm/bin/postlight-parser
√ readability-extractor 0.0.11 lib_npm ~/.npm/bin/readability-extractor
√ single-file 1.1.54 lib_npm ~/.npm/bin/single-file
√ wget 1.21.3 apt /usr/bin/wget
√ yt-dlp 2024.10.22 sys_pip /usr/local/bin/yt-dlp
√ ffmpeg 5.1.6 env /usr/bin/ffmpeg
Package Managers:
√ env /usr/bin/which UID=911 P…
√ apt /usr/bin/apt-get UID=0 P…
- brew not available UID=911 P…
√ sys_pip /usr/local/bin/pip UID=911 P…
- venv_pip not available UID=911 P…
- lib_pip not available UID=911 P…
√ sys_npm /usr/bin/npm UID=911 P…
- lib_npm /usr/bin/npm UID=911 P…
√ playwright /usr/local/bin/playwright UID=0 P…
√ puppeteer /usr/bin/npx UID=911 P…
Code locations:
√ PACKAGE_DIR 39 files valid /app/archivebox
√ TEMPLATES_DIR 4 files valid /app/archivebox/templates
- CUSTOM_TEMPLATES_DIR missing unused ./user_templates
- USER_PLUGINS_DIR missing unused ./user_plugins
√ LIB_DIR 0 files valid /usr/share/archivebox/lib
Data locations:
√ DATA_DIR 17 files @ valid /data
√ CONFIG_FILE 139.0 Bytes valid ./ArchiveBox.conf
√ SQL_INDEX 476.0 KB valid ./index.sqlite3
√ QUEUE_DATABASE 92.0 KB valid ./queue.sqlite3
√ ARCHIVE_DIR 9 files valid ./archive
√ SOURCES_DIR 6 files valid ./sources
√ PERSONAS_DIR 2 files valid ./personas
√ LOGS_DIR 5 files valid ./logs
√ TMP_DIR 0 files valid /tmp/archivebox
How did you install the version of ArchiveBox you are using?
Docker (or other container system like podman/LXC/Kubernetes or TrueNAS/Cloudron/YunoHost/etc.)
What operating system are you running on?
macOS (including Docker on macOS)
What type of drive are you using to store your ArchiveBox data?
- data/ is on a local SSD or NVMe drive
- data/ is on a spinning hard drive or external USB drive
- data/ is on a network mount (e.g. NFS/SMB/CIFS/etc.)
- data/ is on a FUSE mount (e.g. SSHFS/RClone/S3/B2/OneDrive, etc.)
Docker Compose Configuration
N/A
ArchiveBox Configuration
# Converted from INI to TOML format: https://toml.io/en/
[SERVER_CONFIG]
SECRET_KEY = "abcdefg"