
Upgrade free-threading CI to run with pytest-freethreaded instead of pytest-xdist #30007

Open
ogrisel opened this issue Oct 4, 2024 · 32 comments
Labels
Build / CI free-threading PRs and issues related to support for free-threaded CPython (a.k.a. nogil or no-GIL, PEP 703) module:test-suite everything related to our tests Needs Decision Requires decision

Comments

@ogrisel
Member

ogrisel commented Oct 4, 2024

There is a new kid on the block that should help us find out whether scikit-learn and its dependencies can be reliably considered free-threading compatible:

https://pypi.org/project/pytest-freethreaded/

Let's try to adopt it in scikit-learn.

Here is a possible plan:

  • first run the tests locally a few times and see if there are tests (or sets of interacting tests) that cause a crash or a failure; open an issue for each of them, possibly upstream, and then mark them as skipped under free-threading builds with a reference to the issue in the "reason" field;
  • then upgrade our nightly free-threading scheduled CI run to use pytest-freethreaded.

Any comments @lesteve @jeremiedbb @ngoldbaum?

EDIT: anyone interested in getting hands on the first item can find this resource useful:

https://py-free-threading.github.io/

EDIT 2: there is also the pytest-run-parallel plugin that can serve a similar purpose.

@ogrisel ogrisel added free-threading PRs and issues related to support for free-threaded CPython (a.k.a. nogil or no-GIL, PEP 703) Needs Decision Requires decision labels Oct 4, 2024
@ogrisel
Member Author

ogrisel commented Oct 7, 2024

Discussing with @lesteve, we found a problem with test fixtures that use sklearn.config_context: the fixture is executed in the main thread while the test is run in a different thread, and since sklearn.config_context uses thread-local state, the configuration change made by the fixture is not visible in the test running in a parallel thread.

Maybe it is a matter of scoping the fixture correctly, and somehow telling pytest to run the fixture in the thread that runs the test.

@ogrisel
Member Author

ogrisel commented Oct 7, 2024

I think that for a function-scoped fixture (the default fixture scope), it would make sense to ensure that the parallel runner runs the fixture on the same thread as the test function itself.

Shall we open an issue upstream in the issue tracker of the pytest-freethreaded repo?

@lesteve
Member

lesteve commented Oct 7, 2024

Here is a somewhat simple test file that reproduces the issue. Note that this happens on vanilla CPython 3.13, not only on free-threaded CPython 3.13.

As you can tell from the output, the fixture is run in the main thread and the test function in another thread (a thread pool created by the pytest plugin).

# test_simple.py
import threading

import pytest

thread_local = threading.local()


@pytest.fixture
def change_thread_local():
    print("inside fixture", threading.current_thread())
    setattr(thread_local, "value", 1)
    yield


def test_simple(change_thread_local):
    print("inside function", threading.current_thread())
    assert getattr(thread_local, "value", None) == 1

Run it with:

pytest --threads 1 --iterations 1 test_simple.py

Output:

============================================================================================================= test session starts ==============================================================================================================
platform linux -- Python 3.13.0rc3, pytest-8.3.3, pluggy-1.5.0
rootdir: /tmp/proj
plugins: xdist-3.6.1, freethreaded-0.1.0
collected 1 item                                                                                                                                                                                                                               

test_simple.py F                                                                                                                                                                                                                         [100%]

=================================================================================================================== FAILURES ===================================================================================================================
_________________________________________________________________________________________________________________ test_simple __________________________________________________________________________________________________________________

change_thread_local = None

    def test_simple(change_thread_local):
        print("inside function", threading.current_thread())
>       assert getattr(thread_local, "value", None) == 1
E       AssertionError: assert None == 1
E        +  where None = getattr(<_thread._local object at 0x7b3843ff32e0>, 'value', None)

test_simple.py:17: AssertionError
------------------------------------------------------------------------------------------------------------ Captured stdout setup -------------------------------------------------------------------------------------------------------------
inside fixture <_MainThread(MainThread, started 135481613739584)>
------------------------------------------------------------------------------------------------------------- Captured stdout call -------------------------------------------------------------------------------------------------------------
inside function <Thread(ThreadPoolExecutor-0_0, started 135481585043136)>
=========================================================================================================== short test summary info ============================================================================================================
FAILED test_simple.py::test_simple - AssertionError: assert None == 1
============================================================================================================== 1 failed in 0.03s ===============================================================================================================

@ngoldbaum
Contributor

ping @andfoy

@andfoy

andfoy commented Oct 8, 2024

Hi, from the SciPy side, we've been testing with pytest-run-parallel, which also supports unittest-style classes. It allows marking individual tests as well as whole test suites to be run using multiple threads: https://github.com/Quansight-Labs/pytest-run-parallel

It also fully supports fixtures.

@ogrisel
Member Author

ogrisel commented Oct 8, 2024

I tried @lesteve's reproducer with pytest-run-parallel and I get exactly the same problem:

$ pytest --parallel-threads 2 test_thread_local_fixture.py -v
======================================= test session starts ========================================
platform darwin -- Python 3.12.5, pytest-8.3.2, pluggy-1.5.0 -- /Users/ogrisel/miniforge3/envs/dev/bin/python3.12
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/private/tmp/testfolder/.hypothesis/examples'))
rootdir: /private/tmp/testfolder
plugins: repeat-0.9.2, hypothesis-6.112.2, anyio-4.4.0, run-parallel-0.1.0, xdist-3.6.1
collected 1 item                                                                                   

test_thread_local_fixture.py::test_simple FAILED                                             [100%]

============================================= FAILURES =============================================
___________________________________________ test_simple ____________________________________________

change_thread_local = None

    def test_simple(change_thread_local):
        print("inside function", threading.current_thread())
>       assert getattr(thread_local, "value", None) == 1
E       AssertionError: assert None == 1
E        +  where None = getattr(<_thread._local object at 0x103d6aac0>, 'value', None)

test_thread_local_fixture.py:17: AssertionError
-------------------------------------- Captured stdout setup ---------------------------------------
inside fixture <_MainThread(MainThread, started 8595033088)>
--------------------------------------- Captured stdout call ---------------------------------------
inside function <Thread(Thread-2 (closure), started 6199488512)>
inside function <Thread(Thread-1 (closure), started 6182662144)>
===================================== short test summary info ======================================
FAILED test_thread_local_fixture.py::test_simple - AssertionError: assert None == 1
======================================== 1 failed in 0.14s =========================================

The function-scoped fixture is executed in a different thread (the main thread) than the one used to execute the test function.

@ogrisel
Member Author

ogrisel commented Oct 8, 2024

I will open an issue in https://github.com/tonybaloney/pytest-freethreaded with Loic's reproducer.

@ogrisel
Member Author

ogrisel commented Oct 8, 2024

@andfoy let me know if you want me to open a similar issue in https://github.com/Quansight-Labs/pytest-run-parallel.

@andfoy

andfoy commented Oct 8, 2024

I think the issue at hand is that a separate threaded fixture is required here, which should be different from a global, pytest fixture. However, from a concurrency/distribution/parallelism point of view, each thread should get its own local atomic state, which should be independent from each other. As pytest fixtures are spawned globally from the main thread, the case exposed by @lesteve is the expected behaviour rather than a bug.

In order to fix this, there could be 2 different avenues:

  1. Extend the @pytest.fixture decorator to create a new @threaded_fixture that allows for the creation of per-thread fixtures. Challenges here could include interfacing with global pytest fixtures.
  2. Explicitly initialize each thread's state in the test function; this ensures that best practices regarding explicit local state in concurrent computing are followed. Right now, in TST: Use pytest-run-parallel against free-threaded CI scipy/scipy#21496, many of the issues have been related to tests written using global state, which was workable while the GIL serialized code but obscures the execution workflow in a concurrent setting.

I support going for the second option rather than the first one, since it invites developers to think about concurrency and parallel constructs explicitly, as opposed to relying on separate machinery to (apparently) do the job without thinking about what is going on underneath. This is of utmost importance if free-threading is going to be the future of Python at some point.
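To make option 2 concrete, here is a minimal sketch of what explicit per-thread state in the test body looks like (names are illustrative, mirroring the reproducer above):

```python
import threading

# Module-level thread-local storage, as in the reproducer above.
thread_local = threading.local()


def test_simple():
    # Each thread running this test initializes its own state explicitly,
    # so it no longer matters that fixtures execute in the main thread.
    thread_local.value = 1
    assert getattr(thread_local, "value", None) == 1
```

Run under a thread-parallel runner, every worker thread sets and reads its own copy of `value`, so the test passes regardless of which thread performed fixture setup.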

@ngoldbaum
Contributor

It's worth keeping in mind as well that pytest itself is not thread-safe, and I don't think the maintainers are particularly open to making it thread-safe. More on this page: https://docs.pytest.org/en/stable/explanation/flaky.html

@lesteve
Member

lesteve commented Oct 9, 2024

I guess this leans towards option 1, but pytest-parallel (archived and no longer maintained) seems to run the fixture and the test function in the same thread; see tonybaloney/pytest-freethreaded#17 (comment) for more details.

@ogrisel
Member Author

ogrisel commented Oct 9, 2024

Thanks for your feedback @andfoy.

  2. Explicitly initialize each thread's state in the test function; this ensures that best practices regarding explicit local state in concurrent computing are followed.

I am trying to see what that would entail for scikit-learn.

@lesteve I tried to grep our source code to find all the occurrences of with sklearn.config_context(): in a fixture and I only found enable_slep006. However, this fixture alone already impacts many tests:

https://github.com/search?q=repo%3Ascikit-learn%2Fscikit-learn+enable_slep006&type=code

Each of those tests would need to have its body indented under a with sklearn.config_context(enable_metadata_routing=True): block. From a verbosity point of view, this is not too bad, but the extra indentation level might hurt the readability of tests that already need several indentation levels in their body.

So some form of thread-local fixtures would be nice in my opinion.

@ogrisel
Member Author

ogrisel commented Oct 9, 2024

Or maybe we could just write our own test function decorator instead of using a pytest fixture.

EDIT: the following might work for us:

import functools
import sklearn


def enable_metadata_routing(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with sklearn.config_context(enable_metadata_routing=True):
            return func(*args, **kwargs)

    return wrapper

I will open a quick PR.

EDIT 2: actually context managers can directly be used as decorators. So the following works and is even cleaner:

@sklearn.config_context(enable_metadata_routing=True)
def test_function():
    pass

@ogrisel
Member Author

ogrisel commented Oct 9, 2024

Note that we will still have problems with non-scikit-learn fixtures such as capsys apparently...

Furthermore, the link to https://docs.pytest.org/en/stable/explanation/flaky.html posted by @ngoldbaum above also mentions that pytest.warns() and pytest.raises() are not thread-safe. This would be a problem for us because we use those quite extensively...

EDIT: I confirm that tests that use pytest.warns() randomly fail when run in parallel.

@ogrisel
Member Author

ogrisel commented Oct 9, 2024

Maybe we need a way to mark some tests as inherently non thread-safe (e.g. if they rely on capsys or any other non-thread safe utilities). For instance, with a dedicated @pytest.mark.threadunsafe decorator?

Thread-parallel pytest runners such as pytest-run-parallel or pytest-freethreaded would then make sure that all those tests are grouped and run sequentially (in the same thread, possibly even the main thread).
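One way to prototype this idea today, without waiting for plugin support, might be a small conftest.py hook that rewrites a hypothetical thread_unsafe marker into pytest-run-parallel's existing parallel_threads(1) marker (a sketch; thread_unsafe is an assumed name, not an existing pytest or plugin marker):

```python
# conftest.py (sketch): translate a hypothetical `thread_unsafe` marker
# into pytest-run-parallel's `parallel_threads(1)` marker at collection time.
import pytest


def pytest_collection_modifyitems(config, items):
    for item in items:
        if item.get_closest_marker("thread_unsafe") is not None:
            # Force sequential execution for tests flagged as thread-unsafe.
            item.add_marker(pytest.mark.parallel_threads(1))
```

Tests would then opt out of thread parallelism with `@pytest.mark.thread_unsafe`, and the marker name would stay meaningful even if the underlying plugin changes.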

@ngoldbaum
Contributor

I confirm that tests that use pytest.warns() randomly fail when run in parallel.

For warnings it's worse than that: the whole warnings module isn't thread-safe. See the note at the end of the docs for catch_warnings.

I was just talking to @lysnikolaou about this today, and the issue is that making the warnings module thread-safe and scalable is very tricky.

@ogrisel
Member Author

ogrisel commented Oct 9, 2024

@ngoldbaum what do you think of my proposal above to introduce a @pytest.mark.threadunsafe decorator and make pytest-run-parallel run those decorated test sequentially, possibly in the main thread?

@ogrisel
Member Author

ogrisel commented Oct 9, 2024

pytest-run-parallel could also automatically detect that some tests use non-thread-safe fixtures such as capsys and run those tests sequentially too, possibly in the main thread, either before starting the thread-parallel tests or after running them to completion for extra safety.

@andfoy

andfoy commented Oct 9, 2024

We could introduce a pytest.mark.threadunsafe marker to set the thread count to 1. That said, right now it is already possible to mark such tests using @pytest.mark.parallel_threads(1).

@ogrisel ogrisel added Build / CI module:test-suite everything related to our tests labels Oct 10, 2024
@ogrisel
Member Author

ogrisel commented Oct 10, 2024

Thanks, @andfoy, I confirm that using @pytest.mark.parallel_threads(1) makes some previously flaky warning-related tests (e.g. test_warning_recursion_non_constant_init) and capsys-fixtured tests (e.g. test_kmeans_verbose) pass consistently.

Still, I think pytest.mark.threadunsafe might be more explicit and possibly more portable across different pytest plugins.
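For reference, this is what marking an individual test looks like with pytest-run-parallel's marker (the test body below is illustrative, not an actual scikit-learn test):

```python
import pytest


@pytest.mark.parallel_threads(1)
def test_verbose_output(capsys):
    # capsys is not thread-safe, so the marker above forces this test
    # to run in a single thread.
    print("running")
    captured = capsys.readouterr()
    assert captured.out == "running\n"
```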

@ogrisel
Member Author

ogrisel commented Oct 10, 2024

@andfoy Back to the original problem of function-scoped fixtures. I found a problem with a simple use of the tmpdir fixture:

tonybaloney/pytest-freethreaded#17 (comment)

This fixture provides valid test isolation only if it is run around each function call within the same thread as the one actually executing the test. Otherwise, one would need an extra layer of thread protection to be able to correctly use it with a thread-based parallel execution plugin. That would be too invasive in my opinion.

@ogrisel
Member Author

ogrisel commented Oct 10, 2024

I started #30041 to investigate what would be needed to get our test suite to pass with parallel thread execution.

It's quite tedious to manually mark all the capsys, monkeypatch, and tmpdir fixtured tests and the tests that use pytest.warns with @pytest.mark.parallel_threads(1)...

I think we instead need:

  • either make the plugin itself smart enough to inspect test functions, detect common cases of non-thread-safe tests, and run them sequentially;
  • or build our own tools to compile a list of test functions to ignore when using pytest-run-parallel to investigate free-threading.

@ngoldbaum
Contributor

ngoldbaum commented Oct 10, 2024

For temporary directories, I think you should either migrate to a function that doesn't use temporary directories with a specific name, or prepend the name of the directory with e.g. a UUID to make the directory name unique.
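The UUID suggestion could look like the following (a sketch; unique_tmp_dir is a made-up helper, not a pytest API):

```python
import tempfile
import uuid
from pathlib import Path


def unique_tmp_dir(base=None):
    """Create and return a temporary directory with a collision-free name."""
    base = Path(base) if base is not None else Path(tempfile.gettempdir())
    path = base / f"test-{uuid.uuid4().hex}"
    path.mkdir(parents=True)
    return path
```

Because uuid4 names are unique per call, concurrent threads running the same test each get their own directory.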

@ogrisel
Member Author

ogrisel commented Oct 10, 2024

For temporary directories, I think you should either migrate to a function that doesn't use temporary directories with a specific name, or prepend the name of the directory with e.g. a UUID to make the directory name unique.

But the point of the tmpdir fixture is to not have to deal with test isolation within the code of the test logic itself. Having to update the test code to make it more verbose and complex would be a serious step back. It would be much better to have a thread-local initialization of the function-scoped tmpdir fixture.

Granted, the tmpdir fixture has been deprecated in favor of the tmp_path fixture (to leverage the pathlib.Path API instead of a custom API). But the problem remains the same with the tmp_path fixture:

import pytest


@pytest.mark.parametrize("value", list("abcd"))
def test_simple(tmp_path, value):
    file_path = tmp_path / "file.txt"
    assert not file_path.exists()
    file_path.write_text(value)
    read_value = file_path.read_text()
    assert read_value == value

$ pytest --parallel-threads=4 -x
=================================== test session starts ===================================
platform darwin -- Python 3.12.5, pytest-8.3.2, pluggy-1.5.0
rootdir: /private/tmp/testfolder
plugins: repeat-0.9.2, hypothesis-6.112.2, anyio-4.4.0, run-parallel-0.1.0, xdist-3.6.1
collected 4 items                                                                         

test_tmp_path.py F

======================================== FAILURES =========================================
_____________________________________ test_simple[a] ______________________________________

tmp_path = PosixPath('/private/var/folders/_y/lfnx34p13w3_sr2k12bjb05w0000gn/T/pytest-of-ogrisel/pytest-9/test_simple_a_0')
value = 'a'

    @pytest.mark.parametrize("value", list("abcd"))
    def test_simple(tmp_path, value):
        file_path = tmp_path / "file.txt"
>       assert not file_path.exists()
E       AssertionError: assert not True
E        +  where True = exists()
E        +    where exists = PosixPath('/private/var/folders/_y/lfnx34p13w3_sr2k12bjb05w0000gn/T/pytest-of-ogrisel/pytest-9/test_simple_a_0/file.txt').exists

test_tmp_path.py:7: AssertionError
================================= short test summary info =================================
FAILED test_tmp_path.py::test_simple[a] - AssertionError: assert not True
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
==================================== 1 failed in 0.16s ====================================

@ngoldbaum
Contributor

I guess I don't see why you need to have a specific name that's specified in the test and why you want to go out of your way to keep that.

Since you're running the same test in many threads and are implicitly using global state (the file system), then of course that's not thread safe. IMO you should update your testing code to not use the global state, rather than expect the testing code to hide the global state from you.

@ngoldbaum
Contributor

Also, I just created Quansight-Labs/free-threaded-compatibility#92 to mention these issues around writing thread-safe tests in the guide.

@ogrisel
Member Author

ogrisel commented Oct 10, 2024

Since you're running the same test in many threads and are implicitly using global state (the file system), then of course that's not thread safe.

But tmpdir and tmp_path are precisely in charge of creating a unique temp folder for your test. When you use them with pytest-xdist, there is no concurrency problem when using them naively, even if all Python worker processes managed by pytest-xdist share the same file-system: each test sees only its own temp folder. The problem here is that their implicit usage contract is broken if their initialization is shared for several tests running in concurrent threads.

@ngoldbaum
Contributor

When you use them with pytest-xdist, there is no concurrency problem when using them naively, even if all Python worker processes managed by pytest-xdist share the same file-system: each test sees only its own temp folder.

I want to double-check this, since I think if you were running the same test many times in parallel with xdist you'd have the same issue.

@ogrisel
Member Author

ogrisel commented Oct 10, 2024

import pytest


@pytest.mark.parametrize("value", list("abcd"))
def test_simple(tmp_path, value):
    file_path = tmp_path / "file.txt"
    assert not file_path.exists()
    file_path.write_text(value)
    read_value = file_path.read_text()
    assert read_value == value

$ pip install pytest-xdist pytest-repeat
$ pytest -n 8 --count 2 -v test_tmp_path.py
========================================================================== test session starts ==========================================================================
platform darwin -- Python 3.13.0, pytest-8.3.3, pluggy-1.5.0 -- /Users/ogrisel/miniforge3/envs/py313/bin/python3.13
cachedir: .pytest_cache
rootdir: /private/tmp/testfolder
plugins: repeat-0.9.3, xdist-3.6.1
8 workers [8 items]     
scheduling tests via LoadScheduling

test_tmp_path.py::test_simple[c-1-2] 
test_tmp_path.py::test_simple[d-1-2] 
test_tmp_path.py::test_simple[a-2-2] 
[gw4] [ 12%] PASSED test_tmp_path.py::test_simple[c-1-2] 
test_tmp_path.py::test_simple[d-2-2] 
test_tmp_path.py::test_simple[b-2-2] 
test_tmp_path.py::test_simple[c-2-2] 
test_tmp_path.py::test_simple[b-1-2] 
test_tmp_path.py::test_simple[a-1-2] 
[gw0] [ 25%] PASSED test_tmp_path.py::test_simple[a-1-2] 
[gw3] [ 37%] PASSED test_tmp_path.py::test_simple[b-2-2] 
[gw5] [ 50%] PASSED test_tmp_path.py::test_simple[c-2-2] 
[gw1] [ 62%] PASSED test_tmp_path.py::test_simple[a-2-2] 
[gw7] [ 75%] PASSED test_tmp_path.py::test_simple[d-2-2] 
[gw6] [ 87%] PASSED test_tmp_path.py::test_simple[d-1-2] 
[gw2] [100%] PASSED test_tmp_path.py::test_simple[b-1-2] 

=========================================================================== 8 passed in 0.52s ===========================================================================

@andfoy

andfoy commented Oct 10, 2024

As @ngoldbaum mentioned, tests that use tmpdir should append an extra per-thread UUID, so that files do not get overwritten or unexpectedly removed by another thread.

This is in line with the philosophy of being aware of concurrent computing, as opposed to having complex and opaque machinery that tries to abstract the problem away.

I understand that going through each test in the library takes a lot of time, but doing so will ensure that each test function is thread-safe (even if the solution sometimes implies adding locks), and it will future-proof the test suite for future contributors, as having such a CI job (for example) will force them not to use global-state patterns.

It is difficult to get away from thinking in terms of global state, which is common in single-threaded imperative or object-oriented programming. But free-threading is a different paradigm, and it requires different abstractions and patterns.

@ogrisel
Member Author

ogrisel commented Oct 10, 2024

But it's sad to require the test to handle thread isolation by itself. It would be a much better separation of concerns if those fixtures or the runner could handle thread-aware isolation themselves. It goes against the DRY principle.

We could probably write a threadsafe_tmp_path fixture that automatically appends the thread id or a UUID as a subcomponent of the tmp folder managed by the fixture, but it's sad to have each project reimplement this independently.

@ogrisel
Member Author

ogrisel commented Oct 10, 2024

I suppose it's possible to maintain a per-project test utility wrapper such as:

import pytest
import threading


def thread_local_path(path):
    local_path = path / f"thread-{threading.get_ident()}"
    local_path.mkdir()
    return local_path


@pytest.mark.parametrize("value", list("abcd"))
def test_simple(tmp_path, value):
    file_path = thread_local_path(tmp_path) / "file.txt"
    assert not file_path.exists()
    file_path.write_text(value)
    read_value = file_path.read_text()
    assert read_value == value
