Frost's Blog
PDM Internals(2)
2024-04-01

This article introduces the lock strategies of PDM, based on version 2.13, the latest at the time of writing. A Chinese version of this article is also available.

How does PDM solve dependencies#

Under the hood, PDM uses Resolvelib, a pure Python dependency resolver based on backtracking. In plain language, its resolution process is roughly as follows:

  1. Select an unresolved dependency and get the list of all available versions.
  2. Starting from the latest version, obtain the dependencies for this version.
  3. Check if there are any conflicts between this version’s dependencies and the already resolved dependencies.
  4. If there is a conflict, try the next version.
  5. If there is no conflict, add this dependency’s version to the solution set.
  6. If all versions have been attempted but none can be solved, backtrack to the last pinned package and try the next version.
  7. Eventually, all dependencies will be solved and we will get a list of pinned versions that satisfy all dependencies.
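
The steps above can be sketched as a small recursive backtracking search. This is a toy illustration, not Resolvelib's actual code: the INDEX data, the integer versions, and the resolve function are all made up for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy index: package -> {version: {dependency: allowed versions}}.
INDEX: Dict[str, Dict[int, Dict[str, List[int]]]] = {
    "app": {1: {"lib": [1, 2], "util": [2]}},
    "lib": {2: {"util": [1]}, 1: {"util": [1, 2]}},
    "util": {1: {}, 2: {}},
}

Requirement = Tuple[str, List[int]]  # (package name, allowed versions)


def resolve(requirements: List[Requirement], pins: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Return a mapping of package -> pinned version, or None if unsolvable."""
    if not requirements:
        return pins  # step 7: nothing left to resolve
    (name, allowed), rest = requirements[0], requirements[1:]  # step 1
    if name in pins:
        # Already pinned: the existing pin must satisfy this requirement too (step 3).
        return resolve(rest, pins) if pins[name] in allowed else None
    for version in sorted(INDEX[name], reverse=True):  # step 2: latest first
        if version not in allowed:
            continue
        deps = list(INDEX[name][version].items())
        result = resolve(rest + deps, {**pins, name: version})  # step 5
        if result is not None:
            return result
        # step 4: this version led to a conflict, so try the next one
    return None  # step 6: the caller backtracks to its previous pin


solution = resolve([("app", [1])], {})
print(solution)  # {'app': 1, 'lib': 1, 'util': 2}
```

Here lib 2 is tried first, but it requires util 1 while app requires util 2, so the search backtracks and settles on lib 1.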

After the resolution, PDM will write the result to the pdm.lock file. This file contains not only all dependency version information but also some other metadata.

Conditional dependencies#

Sometimes we need to install different package versions based on different conditions, which can be achieved using markers. For example:

pytest >= 7.0; python_version >= "3.6"
pytest < 7.0; python_version < "3.6"

However, PDM currently cannot resolve this kind of conditional dependency. The reason is that PDM's dependency solver uses the package name as the key in the solution set, so each package can have only one pinned version in the solution. I have to admit that this is a major flaw of PDM, and contributions to fix it are welcome.

The metadata in pdm.lock#

The pdm.lock file is a TOML-formatted file that stores some metadata in its [metadata] table, for example:

[metadata]
groups = [
  "default",
  "all",
  "doc",
  "pytest",
  "test",
  "tox",
  "workflow"
]
strategy = [
  "cross_platform",
  "inherit_metadata"
]
lock_version = "4.4.1"
content_hash = "sha256:13270582f610302a77a5e1fef2192e1a65f5b6202cf15aedf12bf799de8de45c"

lock_version#

This is a three-part version number indicating the compatibility of this lock file. The first number indicates backward incompatible changes. The second number indicates backward compatible but forward incompatible changes, and the last number indicates changes that are both backward and forward compatible. For example, if the lock version supported by the running PDM is 4.4.2, then it is:

  • Able to read: 4.3.0, 4.4.0, 4.4.3
  • Unable to read: 5.1.0, 3.0.0
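
The compatibility rule described above can be written as a short check. This is a minimal sketch derived only from that description, not PDM's actual implementation; the function names are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version string such as "4.4.2" into integers."""
    return tuple(int(part) for part in version.split("."))


def can_read(supported: str, lock_version: str) -> bool:
    """Return True if a PDM supporting `supported` can read `lock_version`."""
    s_major, s_minor, _ = parse_version(supported)
    l_major, l_minor, _ = parse_version(lock_version)
    # The first number must match exactly; the second may be older but not
    # newer; the third number never affects compatibility.
    return l_major == s_major and l_minor <= s_minor


for candidate in ["4.3.0", "4.4.0", "4.4.3", "5.1.0", "3.0.0"]:
    print(candidate, can_read("4.4.2", candidate))
```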

Whenever the lock file is updated, PDM will write the current lock version to the file. With this version number, PDM can determine whether to attempt to read this lock file or prompt the user to regenerate the lock file.

groups#

This is stored to indicate which dependency groups the lock file was generated from. Each value in the list corresponds to a group in optional-dependencies or dev-dependencies in pyproject.toml.

When dependency resolution is complete, these groups are recorded in the lock file. When installing, PDM checks whether the requested installation group is included and aborts the installation if not.
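
As a rough sketch of that check, assuming the groups have already been read from the [metadata] table (the function name and error message are hypothetical, not PDM's own):

```python
from typing import Iterable, List


def check_groups(locked: Iterable[str], requested: Iterable[str]) -> List[str]:
    """Return the requested groups, or raise if any is missing from the lock file."""
    locked_set = set(locked)
    requested_list = list(requested)
    missing = [group for group in requested_list if group not in locked_set]
    if missing:
        # Abort: the lock file was not generated with these groups.
        raise ValueError("Groups not found in the lock file: " + ", ".join(missing))
    return requested_list


locked_groups = ["default", "all", "doc", "pytest", "test", "tox", "workflow"]
print(check_groups(locked_groups, ["default", "test"]))
```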

content_hash#

A lock file corresponds to a set of initial inputs, namely the requirements from which the dependencies were resolved. In PDM, this input is the metadata written in pyproject.toml, and content_hash is a SHA-256 checksum calculated from it. When your pyproject.toml changes, PDM recomputes this value. If it does not match the value in the lock file, PDM updates the lock file for you and writes the new content_hash into it.
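
A minimal sketch of such a checksum follows. Which pyproject.toml fields take part in the hash, and how they are serialized, are assumptions made for illustration; the function names are hypothetical.

```python
import hashlib
import json


def content_hash(metadata: dict) -> str:
    """Hash the resolution inputs in a way that ignores key order."""
    # Sorted keys give a stable serialization for equal inputs.
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def is_lock_stale(metadata: dict, recorded_hash: str) -> bool:
    """Return True when the inputs no longer match the hash in the lock file."""
    return content_hash(metadata) != recorded_hash


# Assumed inputs for the example only.
inputs = {"dependencies": ["requests>=2.0"], "requires-python": ">=3.8"}
recorded = content_hash(inputs)

changed = {"dependencies": ["requests>=2.0", "click"], "requires-python": ">=3.8"}
print(is_lock_stale(inputs, recorded), is_lock_stale(changed, recorded))
```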

Lock strategies#

The strategy field in the [metadata] table records the strategies used by the lock file, which control the dependency resolution process.

cross_platform#

By default, PDM writes package files for all platforms to the lock file. For more information, please refer to the previous article (/en/2024/pdm-lockfile#cross-version-lock-and-lock-for-current-environment). However, sometimes we have to lock for the current environment only. One major reason is that some packages declare different dependencies on different platforms, which can lead to wrong locking results. To disable a lock strategy in PDM, prefix its name with no_. For example, run:

pdm lock --strategy=no_cross_platform

This command will turn off the “cross_platform” strategy, and other strategies that are stored will not be affected.

static_urls#

By default, the files field of each [[package]] entry records only the file names of package files, not their URLs. The benefit is that users can freely switch to other PyPI mirrors, and during installation PDM only checks whether the downloaded file name matches one in the lock file. If the static_urls strategy is enabled, PDM records the URLs of package files and downloads and installs packages directly from these URLs. This also makes it easier for security audit tools to check where packages come from.
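
The default behavior can be pictured with the sketch below: because only file names are compared, the same lock entry matches downloads from any mirror. The mirror URLs and helper names are hypothetical, and this is not PDM's actual code.

```python
from posixpath import basename
from typing import Set
from urllib.parse import urlparse


def filename_of(url: str) -> str:
    """Extract the file name from a download URL, ignoring query and fragment."""
    return basename(urlparse(url).path)


def is_locked_file(url: str, locked_files: Set[str]) -> bool:
    """Return True if the file behind `url` is listed in the lock file."""
    return filename_of(url) in locked_files


locked_files = {"requests-2.31.0-py3-none-any.whl"}

# Two hypothetical mirrors serving the same file under different paths.
mirror_a = "https://mirror-a.example/packages/requests-2.31.0-py3-none-any.whl"
mirror_b = "https://mirror-b.example/simple/requests/requests-2.31.0-py3-none-any.whl#sha256=abc"

print(is_locked_file(mirror_a, locked_files), is_locked_file(mirror_b, locked_files))
```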

inherit_metadata#

This strategy is enabled by default. With it, PDM attempts to calculate the final markers for each package (see the previous article for details). The benefit is that during installation PDM needs only pdm.lock as its data source: it traverses the lock file and evaluates the markers using Python's standard library [1], without relying on other components of PDM. If this strategy is disabled, markers are not recorded for packages. The information in the lock file may then be insufficient for the installer to decide whether to install a package, so installation works slightly differently: to obtain the final list of package versions to install, PDM runs another dependency resolution based on the required dependency lists together with the dependency information from the lock file.

direct_minimal_versions#

By default, when resolving dependencies, the latest version is attempted first, resulting in a lock file that usually contains the newest possible package versions. However, sometimes we want to test a library's compatibility with the minimum versions in its declared ranges. Enabling this strategy makes PDM try the lowest versions first for the dependencies declared directly in pyproject.toml, resulting in a lock file that pins the minimum versions allowed for them.
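
In terms of the resolver, the difference comes down to the order in which candidate versions are tried. The sketch below assumes, based on the strategy's name, that only direct dependencies are affected; the function names and the simple dotted version format are illustrative only.

```python
from typing import List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def candidate_order(versions: List[str], is_direct: bool, minimal: bool = False) -> List[str]:
    """Return versions in the order a resolver would try them."""
    # Lowest first only for direct dependencies when the strategy is enabled.
    lowest_first = minimal and is_direct
    return sorted(versions, key=version_key, reverse=not lowest_first)


available = ["2.0.0", "1.4.2", "1.10.0"]
print(candidate_order(available, is_direct=True))
print(candidate_order(available, is_direct=True, minimal=True))
```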

--exclude-newer DATE#

In addition to the above strategies, pdm lock also supports a one-time option, --exclude-newer, which works somewhat like a time machine. When a specific time or date is given, PDM skips package versions uploaded later than that point when reading the index. Using this option can make the lock file reproducible. Note that upload times are only available when the package index implements PEP 700. If they are not available, the package is treated as not meeting the requirement and is ignored.
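
The filtering can be sketched as follows. The Release type and the exclude_newer function are hypothetical stand-ins for whatever PDM uses internally; the sketch only mirrors the behavior described above, including dropping releases without an upload time.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional


class Release(NamedTuple):
    version: str
    upload_time: Optional[datetime]  # None when the index does not provide it


def exclude_newer(releases: List[Release], cutoff: datetime) -> List[Release]:
    """Keep only releases known to be uploaded at or before the cutoff."""
    return [
        release
        for release in releases
        if release.upload_time is not None and release.upload_time <= cutoff
    ]


releases = [
    Release("1.0.0", datetime(2023, 1, 10, tzinfo=timezone.utc)),
    Release("1.1.0", datetime(2024, 3, 1, tzinfo=timezone.utc)),
    Release("1.2.0", None),  # no upload time available, so it is ignored
]
cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)
kept = [release.version for release in exclude_newer(releases, cutoff)]
print(kept)  # ['1.0.0']
```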

Update strategies#

When you update package versions in the lock file, PDM also provides different update strategies. These strategies are selected with the --update-* options, which pdm add, pdm lock, and pdm update all support.

  • --update-all: Update all packages (direct and indirect dependencies) to the latest version, completely ignoring the versions stored in the lock file.
  • --update-reuse: Only update direct dependency versions and reuse indirect dependency versions from the lock file.
  • --update-eager: Update specified dependencies and their indirect dependencies to the latest version, reusing other dependency versions from the lock file.
  • --update-reuse-installed: Reuse currently installed versions as much as possible.

When updating package versions, the version range specified in pyproject.toml will still be respected. This behavior can be disabled by using the --unconstrained option to remove the version constraints.

So far, we have introduced a series of PDM features related to lock files and the logic behind them. We hope this information helps you better understand how PDM works behind the scenes.

Footnotes#

  1. Including the packaging library, because it contains implementations for many Python packaging standards and has become a core library in the Python packaging ecosystem.

PDM Internals(2)
https://frostming.com/en/2024/pdm-lock-strategy/
Author: Frost Ming
Published: 2024-04-01
License: CC BY-NC-SA 4.0