Proposal: Documenting CPython Reference Counting Semantics via Automated Analysis #142618

@wr-web

Hello everyone,

I’ve been working on a third-party project that systematically documents CPython’s reference counting semantics, including those of internal APIs, through automated analysis. So far, I’ve collected 1,534 entries covering a variety of functions. The analysis is largely automated, with an estimated accuracy of about 90%; manual verification of the full set is still ongoing.

Current Design & Format
The data is structured as JSON to facilitate processing, integration, and further tooling. Each entry includes the function name and a list of its reference semantics (e.g., “return new reference,” “stealing reference,” or “return borrowed reference”).

Example structure:

[
    {
        "function": "mocked_funcA",
        "semantics": [
            { "semantic": "return new reference" },
            { "semantic": "stealing reference", "stealing param": "0" },
            { "semantic": "stealing reference", "stealing param": "1" },
            { "semantic": "stealing reference", "stealing param": "2" }
        ]
    },
    {
        "function": "mocked_funcB",
        "semantics": [
            { "semantic": "return borrowed reference" }
        ]
    },
    {
        "function": "mocked_funcC",
        "semantics": [
            { "semantic": "return immortal reference" }
        ]
    }
]
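
To illustrate how a tool might consume entries of this shape, here is a minimal Python sketch. It is only an illustration, not part of the project: `summarize` and `index` are hypothetical names, and the code assumes that every return-style label starts with the word "return" and that "stealing param" holds a parameter index stored as a string, as in the mocked entries above. Whether those indices count from 0 or 1 is not stated here, so the sketch passes them through unchanged.

```python
import json

# Mocked entries copied from the example structure above.
EXAMPLE = """
[
    {"function": "mocked_funcA",
     "semantics": [
         {"semantic": "return new reference"},
         {"semantic": "stealing reference", "stealing param": "0"},
         {"semantic": "stealing reference", "stealing param": "1"},
         {"semantic": "stealing reference", "stealing param": "2"}
     ]},
    {"function": "mocked_funcB",
     "semantics": [{"semantic": "return borrowed reference"}]},
    {"function": "mocked_funcC",
     "semantics": [{"semantic": "return immortal reference"}]}
]
"""


def summarize(entry):
    """Collapse one entry into (return_kind, sorted stolen parameter indices)."""
    return_kind = None
    stolen = []
    for item in entry["semantics"]:
        label = item["semantic"]
        if label == "stealing reference":
            # Assumption: the index is stored as a string, as in the example.
            stolen.append(int(item["stealing param"]))
        elif label.startswith("return "):
            # Assumption: return-style labels all begin with "return ".
            return_kind = label[len("return "):]
    return return_kind, sorted(stolen)


# Hypothetical lookup table: function name -> summarized semantics.
index = {e["function"]: summarize(e) for e in json.loads(EXAMPLE)}

print(index["mocked_funcA"])  # ('new reference', [0, 1, 2])
```

A lookup like this is roughly what a static-analysis pass would need in order to ask, per call site, which arguments the callee takes ownership of.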

Sample from the current dataset:

{
    "name": "_PyDict_GetItemRef_KnownHash_LockHeld",
    "semantics": [
        {
            "semantic": "return a new reference via an output pointer parameter",
            "new ptr param": 3
        }
    ]
}
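
The mocked example uses a "function" key with string indices, while this sample uses "name" with an integer index. A consumer that wants to accept both shapes might normalize entries first. The sketch below is a hedged illustration only: `normalize_entry` is a hypothetical helper, and it assumes that "stealing param" and "new ptr param" are the only index-carrying keys, which may not hold for the full dataset.

```python
def normalize_entry(entry):
    """Return a copy of the entry with a 'function' key and integer indices."""
    # Assumption: the name lives under either "function" or "name".
    name = entry.get("function", entry.get("name"))
    if name is None:
        raise ValueError("entry has neither 'function' nor 'name'")

    semantics = []
    for item in entry.get("semantics", []):
        fixed = dict(item)  # copy so the caller's data is not modified
        # Assumption: these two keys are the only ones that carry an index.
        for key in ("stealing param", "new ptr param"):
            if key in fixed:
                fixed[key] = int(fixed[key])
        semantics.append(fixed)
    return {"function": name, "semantics": semantics}


# The sample entry shown above.
sample = {
    "name": "_PyDict_GetItemRef_KnownHash_LockHeld",
    "semantics": [
        {
            "semantic": "return a new reference via an output pointer parameter",
            "new ptr param": 3,
        }
    ],
}

normalized = normalize_entry(sample)
print(normalized["function"])
```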

Purpose & Hope for Collaboration
This dataset aims to serve as a machine-readable reference for developers working with CPython’s C API, aiding in debugging, static analysis, and tooling development.

I would love for the community to:

  1. Review and discuss the approach and structure.
  2. Help validate entries, especially for edge cases or internal APIs.
  3. Consider whether something like this could be useful as a supplemental resource or possibly integrated into CPython’s documentation ecosystem in the future.

The full JSON file and partial analysis code are available here:
CPython_PyAPI_FUNC_RF_Semantics

Looking forward to your thoughts, feedback, and hopefully a lively discussion!

Labels: docs (Documentation in the Doc dir)
