Local swarm + podman support (webrecorder#261)
* backend: refactor swarm support to also support podman (webrecorder#260)
- implement podman support as subclass of swarm deployment
- podman is used when 'RUNTIME=podman' env var is set
- podman socket is mapped instead of docker socket
- podman-compose is used instead of docker-compose (docker-compose works with podman, but it does not support secrets; podman-compose does)
- separate cli utils into SwarmRunner and PodmanRunner, which extends it
- using config.yaml and config.env, both copied from sample versions
- work on simplifying config: add docker-compose.podman.yml and docker-compose.swarm.yml and signing and debug configs in ./configs
- add {build,run,stop}-{swarm,podman}.sh in scripts dir
- add init-configs, which only copies configs if they don't already exist
- build local image using the current version of podman, to support both podman 3.x and 4.x
- additional fixes after testing podman on CentOS
- docs: update Deployment.md to cover swarm, podman, k8s deployment
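The runner selection described in the bullets above can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes `get_runner()` simply checks the `RUNTIME` environment variable, and the real `SwarmRunner`/`PodmanRunner` classes wrap the respective CLI tools.

```python
import os


class SwarmRunner:
    """Illustrative stub: wraps docker/docker-compose CLI calls."""

    compose_cmd = "docker-compose"


class PodmanRunner(SwarmRunner):
    """Podman variant: same interface, but uses podman-compose."""

    compose_cmd = "podman-compose"


def get_runner():
    # podman is used when the 'RUNTIME=podman' env var is set,
    # otherwise the default swarm runner applies
    if os.environ.get("RUNTIME") == "podman":
        return PodmanRunner()
    return SwarmRunner()
```

Subclassing keeps the two deployments sharing one code path, with podman-specific behavior overridden in one place.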
ikreymer authored Jun 14, 2022
1 parent 68ec582 commit 418c07b
Showing 40 changed files with 661 additions and 389 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
Original file line number Diff line number Diff line change
@@ -24,7 +24,7 @@ jobs:

-
name: Copy Configs
run: cp ./configs/config.sample.env ./configs/config.env; cp ./configs/storages.sample.yaml ./configs/storages.yaml
run: ./scripts/init-configs.sh

-
name: Build Backend
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,6 +1,6 @@
**/*.pyc
**/node_modules/
**/config.env
configs/storages.yaml
**/config.yaml
**/signing.yaml
.DS_Store
98 changes: 73 additions & 25 deletions Deployment.md
@@ -1,60 +1,108 @@
# Deploying Browsertrix Cloud

Currently Browsertrix Cloud can be deployed in both Docker and Kubernetes.
Browsertrix Cloud can be deployed anywhere, from single-node isolated environments to multi-machine setups and cloud-native Kubernetes!

Browsertrix Cloud currently supports three deployment methods:
- Rootless deployment with podman on a single machine (no Docker required)
- Docker Swarm for single- or multi-machine deployments
- Kubernetes cluster deployment

Some basic instructions are provided below; we plan to expand this into a more detailed tutorial in the future.

## Deploying to Docker
(All shell scripts can be found in the `./scripts` directory)

## Deploying with Docker Swarm

For local deployments, using Docker Swarm is recommended. Docker Swarm can be used in a single-machine mode as well
as with multi-machine setups. Docker Swarm is part of Docker, so if you have Docker installed, you can use this method.

1. Run `init-configs.sh`, which copies the sample configs to `configs/config.env` and `configs/config.yaml`.

2. You can edit `configs/config.env` and `configs/config.yaml` to set default passwords for the superadmin, Minio, and MongoDB.

3. Run `run-swarm.sh` to initialize the cluster.

4. Load `http://localhost:9871/` to see the Browsertrix Cloud login page. (The API is also available at: `http://localhost:9871/api/docs`).

You can stop the deployment with `stop-swarm.sh` and restart it with `run-swarm.sh`.


Note: Currently, unless email settings are configured, you will need to check the logs to get the invite code when inviting users. You can do this by running:
`docker service logs btrix_backend`
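The `init-configs.sh` script from step 1 only copies the sample files when the real configs are missing, so re-running it never clobbers local edits. A rough Python equivalent is sketched below; the actual script is shell, and the exact sample filenames are an assumption based on the names mentioned in this commit.

```python
import shutil
from pathlib import Path


def init_configs(config_dir="configs"):
    """Copy sample configs into place, skipping any that already exist.

    Hypothetical equivalent of scripts/init-configs.sh; filename pairs
    are assumed from the sample names mentioned in the docs.
    """
    pairs = [
        ("config.sample.env", "config.env"),
        ("config.sample.yaml", "config.yaml"),
    ]
    for sample, dest in pairs:
        src = Path(config_dir) / sample
        dst = Path(config_dir) / dest
        if dst.exists():
            continue  # keep the existing config untouched
        shutil.copyfile(src, dst)
```

This copy-if-absent pattern is what makes the script safe to run in CI and on every restart.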


## Deploying with Podman

Browsertrix Cloud can now also be used with Podman for environments that don't support Docker.

For testing out Browsertrix Cloud on a single, local machine, the Docker Compose-based deployment is recommended.
Podman allows Browsertrix Cloud to be deployed locally by a non-root user.

To deploy via local Docker instance, copy the `config.sample.env` to `config.env`.
Podman deployment also requires either docker-compose or podman-compose.

Docker Compose is required.

Then, run `docker-compose build; docker-compose up -d` to launch.
### Initial Installation

To update/relaunch, use `./docker-restart.sh`.
To run with Podman as a non-root user, there are a few initial installation steps:

The API documentation should be available at: `http://localhost:9871/api/docs`.
1. Ensure the podman socket service is running: `systemctl --user start podman.socket`. Podman itself does not require a running service, but Browsertrix Cloud requires access to the socket to work.

To allow downloading of WACZ files via the UI from a remote host, set the `STORE_ACCESS_ENDPOINT_URL` to use the domain of the host.
Otherwise, the files are accessible only through the default Minio service running on port 9000.
2. Ensure podman [can set cpu limits](https://github.com/containers/podman/blob/main/troubleshooting.md#26-running-containers-with-cpu-limits-fails-with-a-permissions-error), as Browsertrix Cloud uses cpu and memory limits for each crawl. After following the instructions above, also run `sudo systemctl daemon-reload` to reload the delegate settings.

3. Ensure podman-compose is installed via `pip install podman-compose`.

Note: When deployed in local Docker, failed crawls are not retried currently. Scheduling is handled by a subprocess, which stores active schedule in the DB.
4. Run `build-podman.sh` to build the local images.

5. Run `init-configs.sh`, which copies the sample configs to `configs/config.env` and `configs/config.yaml`.

### Enabling Signing
6. You can edit `configs/config.env` and `configs/config.yaml` to set default passwords for the superadmin, Minio, and MongoDB.

7. Run `run-podman.sh` to run Browsertrix Cloud using podman.

8. Load `http://localhost:9871/` to see the Browsertrix Cloud login page. (The API is also available at: `http://localhost:9871/api/docs`).


You can stop the deployment with `stop-podman.sh` and restart it with `run-podman.sh`.

Note: Currently, unless email settings are configured, you will need to check the logs to get the invite code when inviting users. You can do this by running:
`podman logs -f browsertrix-cloud_backend_1`

It's also possible to use Docker Compose with podman by setting `export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock`. You can change the setting
in `run-podman.sh` and `stop-podman.sh` to use docker-compose instead if desired.
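The `DOCKER_HOST` value shown above is just the user's rootless podman socket under `$XDG_RUNTIME_DIR`. A small helper to compute it is sketched here; it is illustrative, not part of the repo.

```python
import os


def podman_socket_url(runtime_dir=None):
    """Build the DOCKER_HOST URL for a rootless podman socket.

    Falls back to the XDG_RUNTIME_DIR env var when no directory is given.
    """
    runtime_dir = runtime_dir or os.environ.get("XDG_RUNTIME_DIR", "")
    return f"unix://{runtime_dir}/podman/podman.sock"
```

Pointing docker-compose at this URL is what lets it drive podman through the Docker-compatible API.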


### Enabling Signing (for Swarm and Podman Deployments)

Browsertrix Cloud can optionally sign WACZ files with the same key used to generate an SSL cert.
To use this functionality, the machine running Browsertrix Cloud must be associated with a domain and must have port 80 available on that domain.

To enable signing in the Docker-based deployment:
To use this functionality, the machine running Browsertrix Cloud must be associated with a domain and must have port 80 available on that domain,
or another port forwarding to port 80.

The `docker-compose.signing.yml` adds the capability for signing with the `authsign` module.

1) Copy `configs/signing.sample.yaml` to `configs/signing.yaml` and set the domain and email fields in the config. Set `staging` to false to generate real certificates.
To enable signing in the Docker-based deployment:

2) In `configs/config.env`, also uncomment `WACZ_SIGN_URL`.
1. Copy `configs/signing.sample.yaml` to `configs/signing.yaml` and set the domain and email fields in the config. Set `staging` to false to generate real certificates.

2. In `docker-compose.signing.yml`, set an optional signing token.

WACZ files created on minio should now be signed! Be sure to also set `STORE_ACCESS_ENDPOINT_URL` to get downloadable links from the UI downloads view.
3. In `run-swarm.sh`, uncomment the option for running with signing.
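The fields touched in step 1 can be sanity-checked as follows. This is a hypothetical validator, assuming `signing.yaml` contains at least `domain`, `email`, and `staging` keys; see `configs/signing.sample.yaml` for the authoritative schema.

```python
def check_signing_config(cfg):
    """Minimal sanity check for a signing.yaml-style dict (assumed schema)."""
    for key in ("domain", "email"):
        if not cfg.get(key):
            raise ValueError(f"signing config missing required field: {key}")
    # staging defaults to True; real certificates are issued
    # only when staging is explicitly set to false
    return bool(cfg.get("staging", True))
```

The return value here mirrors the doc's note that `staging` must be set to false before real certificates are generated.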


## Deploying to Kubernetes

For deploying in the cloud and across multiple machines, the Kubernetes (k8s) deployment is recommended.

To deploy to K8s, `helm` is required. Browsertrix Cloud comes with a helm chart, which can be installed as follows:
## Deploying to Kubernetes

`helm install -f ./chart/values.yaml btrix ./chart/`
For deploying in the cloud, the Kubernetes (k8s) deployment is recommended.
Browsertrix Cloud uses `helm` to deploy to K8s.

This will create a `browsertrix-cloud` service in the default namespace.

For a quick update, the following is recommended:
1. Ensure `helm` is installed locally and `kubectl` is configured for your k8s cluster.

`helm upgrade -f ./chart/values.yaml btrix ./chart/`
2. Edit `chart/values.yaml` to configure your deployment. The `ingress` section contains the domain under which the service will be deployed, and `signing` can be used to enable WACZ signing.

3. Run: `helm upgrade --install -f ./chart/values.yaml btrix ./chart/` to deploy or upgrade an existing deployment.

Note: When deployed in Kubernetes, failed crawls are automatically retried. Scheduling is handled via Kubernetes Cronjobs, and crawl jobs are run in the `crawlers` namespace.

To stop, run `helm uninstall btrix`.
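The helm invocations above all follow one pattern; sketched here as argv construction, purely for illustration (the function and its defaults are not part of the repo).

```python
def helm_cmd(action, release="btrix", chart="./chart/", values="./chart/values.yaml"):
    """Build the helm argv lists used in this doc (illustrative only)."""
    if action == "uninstall":
        return ["helm", "uninstall", release]
    # `upgrade --install` deploys on first run and upgrades an existing release
    return ["helm", "upgrade", "--install", "-f", values, release, chart]
```

Using `upgrade --install` means one command covers both the initial deploy and later upgrades.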

*Additional info coming soon*
4 changes: 2 additions & 2 deletions README.md
@@ -10,7 +10,7 @@ and managing all aspects of the crawling process. This system provides the orchestration
while the actual crawling is performed using
[Browsertrix Crawler](https://github.com/webrecorder/browsertrix-crawler) containers, which are launched for each crawl.

The system is designed to run equally in Kubernetes and Docker.
The system is designed to run in both Kubernetes and Docker Swarm, as well as locally under Podman.

See [Features](https://browsertrix.cloud/features) for a high-level list of planned features.

@@ -21,7 +21,7 @@ See the [Deployment](Deployment.md) page for information on how to deploy Browsertrix Cloud.

## Development Status

Browsertrix Cloud is currently in pre-alpha stages and not ready for production. This is an ambitious project and there's a lot to be done!
Browsertrix Cloud is currently in an alpha stage and not ready for production. This is an ambitious project and there's a lot to be done!

If you would like to help in a particular way, please open an issue or reach out to us in other ways.

6 changes: 6 additions & 0 deletions backend/Dockerfile
@@ -1,3 +1,7 @@
ARG PODMAN_VERSION=4

FROM docker.io/mgoltzsche/podman:${PODMAN_VERSION}-remote as podmanremote

FROM python:3.9

WORKDIR /app
@@ -10,5 +14,7 @@ RUN python-on-whales download-cli

ADD btrixcloud/ /app/btrixcloud/

COPY --from=podmanremote /usr/local/bin/podman-remote /usr/bin/podman

CMD uvicorn btrixcloud.main:app_root --host 0.0.0.0 --access-log --log-level info

4 changes: 2 additions & 2 deletions backend/btrixcloud/archives.py
@@ -120,8 +120,8 @@ async def serialize_for_user(self, user: User, user_manager):
class ArchiveOps:
"""Archive API operations"""

def __init__(self, db, invites):
self.archives = db["archives"]
def __init__(self, mdb, invites):
self.archives = mdb["archives"]

self.router = None
self.archive_viewer_dep = None
8 changes: 8 additions & 0 deletions backend/btrixcloud/crawl_job.py
@@ -58,6 +58,7 @@ def __init__(self):
self.finished = None

self._cached_params = {}
self._files_added = False

params = {
"cid": self.cid,
@@ -188,6 +189,12 @@ async def finish_crawl(self):
if self.finished:
return

# check if one-page crawls actually succeeded
# if only one page found, and no files, assume failed
if self.last_found == 1 and not self._files_added:
await self.fail_crawl()
return

self.finished = dt_now()

completed = self.last_done and self.last_done == self.last_found
@@ -283,6 +290,7 @@ async def add_file_to_crawl(self, cc_data):
"$push": {"files": crawl_file.dict()},
},
)
self._files_added = True

return True

1 change: 1 addition & 0 deletions backend/btrixcloud/crawlconfigs.py
@@ -335,6 +335,7 @@ async def update_crawl_config(self, cid: uuid.UUID, update: UpdateCrawlConfig):

async def get_crawl_configs(self, archive: Archive):
"""Get all crawl configs for an archive is a member of"""
# pylint: disable=duplicate-code
cursor = self.crawl_configs.aggregate(
[
{"$match": {"aid": archive.id, "inactive": {"$ne": True}}},
1 change: 1 addition & 0 deletions backend/btrixcloud/crawls.py
@@ -191,6 +191,7 @@ async def list_crawls(
if running_only:
query["state"] = {"$in": ["running", "starting", "stopping"]}

# pylint: disable=duplicate-code
aggregate = [
{"$match": query},
{
5 changes: 2 additions & 3 deletions backend/btrixcloud/invites.py
@@ -50,8 +50,8 @@ class InviteToArchiveRequest(InviteRequest):
class InviteOps:
""" invite users (optionally to an archive), send emails and delete invites """

def __init__(self, db, email):
self.invites = db["invites"]
def __init__(self, mdb, email):
self.invites = mdb["invites"]
self.email = email

async def add_new_user_invite(
@@ -95,7 +95,6 @@ async def remove_invite(self, invite_token: str):
""" remove invite from invite list """
await self.invites.delete_one({"_id": invite_token})

# pylint: disable=no-self-use
def accept_user_invite(self, user, invite_token: str):
""" remove invite from user, if valid token, throw if not """
invite = user.invites.pop(invite_token, "")
2 changes: 1 addition & 1 deletion backend/btrixcloud/k8s/base_job.py
@@ -31,7 +31,7 @@ def __init__(self):

async def init_job_objects(self, template, extra_params=None):
""" init k8s objects from specified template with given extra_params """
with open(self.config_file) as fh_config:
with open(self.config_file, encoding="utf-8") as fh_config:
params = yaml.safe_load(fh_config)

params["id"] = self.job_id
1 change: 0 additions & 1 deletion backend/btrixcloud/k8s/k8sman.py
@@ -144,7 +144,6 @@ async def _create_from_yaml(self, _, yaml_data):
""" create from yaml """
await create_from_yaml(self.api_client, yaml_data, namespace=self.namespace)

# pylint: disable=no-self-use
def _secret_data(self, secret, name):
""" decode secret data """
return base64.standard_b64decode(secret.data[name]).decode()
51 changes: 16 additions & 35 deletions backend/btrixcloud/swarm/base_job.py
@@ -9,19 +9,21 @@

from fastapi.templating import Jinja2Templates

from .utils import get_templates_dir, run_swarm_stack, delete_swarm_stack
from .utils import get_templates_dir, get_runner
from ..utils import random_suffix

runner = get_runner()


# =============================================================================
# pylint: disable=too-many-instance-attributes,bare-except,broad-except
class SwarmJobMixin:
""" Crawl Job State """

def __init__(self):
self.secrets_prefix = "/var/run/secrets/"
self.shared_config_file = os.environ.get("SHARED_JOB_CONFIG")
self.custom_config_file = os.environ.get("CUSTOM_JOB_CONFIG")
self.shared_secrets_file = os.environ.get("STORAGE_SECRETS")

self.curr_storage = {}

@@ -39,15 +41,14 @@ def __init__(self):
self.prefix = os.environ.get("STACK_PREFIX", "stack-")

if self.custom_config_file:
self._populate_env("/" + self.custom_config_file)
self._populate_env(self.secrets_prefix + self.custom_config_file)

self.templates = Jinja2Templates(directory=get_templates_dir())

super().__init__()

# pylint: disable=no-self-use
def _populate_env(self, filename):
with open(filename) as fh_config:
with open(filename, encoding="utf-8") as fh_config:
params = yaml.safe_load(fh_config)

for key in params:
@@ -61,7 +62,9 @@ async def init_job_objects(self, template, extra_params=None):
loop.add_signal_handler(signal.SIGUSR1, self.unschedule_job)

if self.shared_config_file:
with open("/" + self.shared_config_file) as fh_config:
with open(
self.secrets_prefix + self.shared_config_file, encoding="utf-8"
) as fh_config:
params = yaml.safe_load(fh_config)
else:
params = {}
@@ -71,18 +74,7 @@
if extra_params:
params.update(extra_params)

if (
os.environ.get("STORAGE_NAME")
and self.shared_secrets_file
and not self.curr_storage
):
self.load_storage(
f"/var/run/secrets/{self.shared_secrets_file}",
os.environ.get("STORAGE_NAME"),
)

if self.curr_storage:
params.update(self.curr_storage)
params["storage_name"] = os.environ.get("STORAGE_NAME", "default")

await self._do_create(loop, template, params)

@@ -94,36 +86,25 @@ async def delete_job_objects(self, _):
if not self.is_scheduled or self.remove_schedule:
print("Removed other objects, removing ourselves", flush=True)
await loop.run_in_executor(
None, delete_swarm_stack, f"job-{self.orig_job_id}"
None, runner.delete_service_stack, f"job-{self.orig_job_id}"
)
else:
sys.exit(0)

return True

def unschedule_job(self):
""" mark job as unscheduled"""
""" mark job as unscheduled """
print("Unscheduled, will delete when finished", flush=True)
self.remove_schedule = True

def load_storage(self, filename, storage_name):
""" load storage credentials for given storage from yaml file """
with open(filename) as fh_config:
data = yaml.safe_load(fh_config.read())

if not data or not data.get("storages"):
return

for storage in data["storages"]:
if storage.get("name") == storage_name:
self.curr_storage = storage
break

async def _do_create(self, loop, template, params):
data = self.templates.env.get_template(template).render(params)
return await loop.run_in_executor(
None, run_swarm_stack, self.prefix + self.job_id, data
None, runner.run_service_stack, self.prefix + self.job_id, data
)

async def _do_delete(self, loop):
await loop.run_in_executor(None, delete_swarm_stack, self.prefix + self.job_id)
await loop.run_in_executor(
None, runner.delete_service_stack, self.prefix + self.job_id
)