Fix pytorch fbgemm.dll dependency issue #2927
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage diff (main vs. #2927):
- Coverage: 56.47% → 56.47% (-0.01%)
- Files: 535 → 535
- Lines: 62263 → 62263
- Branches: 7711 → 7711
- Hits: 35165 → 35164 (-1)
- Misses: 25335 → 25334 (-1)
- Partials: 1763 → 1765 (+2)

☔ View full report in Codecov by Sentry.
Force-pushed from 82d0969 to 27024ce.
do you have a link to a source for "because libomp140 was no longer shipped in a VS redistributable.", or did you diff different VS redistributables yourself?
Progress: after verifying that a barebones pytorch install works fine (also on GitHub Windows runners), I've been able to find the culprit package that we install and with which it breaks:
do you install the patched version of pyshtools from
Found the culprit; the patched pyshtools didn't fix it. Installing sphericaltexture pulls openmp 5.0.0, which is sufficient to cause the issue even with pytorch 2.5.1. Installing openmp 8.0.1 allows pytorch 2.5.1 to work fine.
But
I had a look at such an environment and I see at least two openmp libs,
not sure why conda doesn't issue a clobber warning. My guess would be that this causes the problem. So the pulling in of openmp per se is not the problem - but clobbering is. Versions of openmp aside, you can also "fix" this with
🤦 I hadn't noticed that. Not just with openmp, btw; in an attempt with a newer
but conda is happy to issue dozens of clobber warnings on our CI, none of them seemingly an issue :(
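For reference, here is a minimal sketch (not from this PR) of how one could look for this kind of clobbering by hand: conda records the files owned by each installed package in the env's conda-meta/*.json, so a file claimed by more than one package is a clobber candidate. The helper name find_clobbered_files is made up for illustration.

```python
# Rough diagnostic sketch (not part of this PR): report files that more than
# one installed conda package claims, i.e. candidates for the kind of
# clobbering discussed above. Run it with the suspect environment's python.
import json
import sys
from collections import defaultdict
from pathlib import Path


def find_clobbered_files(env_prefix: Path) -> dict:
    """Map each file claimed by more than one package to the claiming packages."""
    owners = defaultdict(list)
    for meta in sorted((env_prefix / "conda-meta").glob("*.json")):
        record = json.loads(meta.read_text(encoding="utf-8"))
        for relpath in record.get("files", []):
            owners[relpath].append(record.get("name", meta.stem))
    return {path: pkgs for path, pkgs in owners.items() if len(pkgs) > 1}


if __name__ == "__main__":
    for path, pkgs in sorted(find_clobbered_files(Path(sys.prefix)).items()):
        print(f"{path}: {', '.join(pkgs)}")
```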
This does seem to work; so I think we can either pin pytorch to 2.3, or unpin pytorch and do this force-reinstall instead. Do you have a preference for either? Neither is really clean. The force-reinstall would extend CI duration. We could even pin for CI and force-reinstall for release, though CI durations aren't enough of a bottleneck to warrant that extra complexity, I think.
strong preference for pinning. The reinstall "hack" will break something else - I'm sure :)
For documentation, I had read that over here:
But as it turns out, it isn't related to the issue we have here.
Clash on Win - pyshtools installs openmp, which clobbers dlls needed by pytorch's intel-openmp since pytorch 2.4. See bfab86d (similar problem) and ilastik#2927
Force-pushed from 27024ce to 12839a7.
Fixes #2922
I've updated the original post to summarise my understanding of what's going on now, for future reference.
Clash on Win - pyshtools installs openmp, which clobbers dlls needed by pytorch's intel-openmp since pytorch 2.4. See bfab86d (similar problem)
Issue we first saw in #2904 with pytorch 2.4.1
E.g. https://github.com/ilastik/ilastik/actions/runs/11721367647/job/32654563102
Conclusion:
I've put the `pytorch <2.4.1` constraint in meta.yaml, and also in environment-dev.yml, so we should effectively be on 2.4.0 now. The error we see on Windows CI with 2.4.1 and up may look identical to an error widely described for pytorch 2.4.0 (pytorch/pytorch#131662), but I think they are separate issues. The error looks identical because in both cases a dll related to openmp is missing/broken.
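For illustration only (an assumption about the exact symptom, not output copied from our CI logs): the failure shows up as an OSError at import time that names fbgemm.dll "or one of its dependencies", so a one-liner smoke test distinguishes a broken env from a healthy one.

```python
# Minimal smoke test for the symptom described above (a sketch, not a test
# from this PR): on a broken Windows env, importing torch fails with an
# OSError naming fbgemm.dll or one of its dependencies (here: the clobbered
# OpenMP runtime), even though fbgemm.dll itself is on disk.
try:
    import torch
except OSError as err:
    print(f"pytorch DLL load failed: {err}")
    raise SystemExit(1)
print(f"pytorch {torch.__version__} imported fine")
```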
Evidently, unlike a huge number of people (as documented by the pytorch issue), ilastik seems to be fine with pytorch 2.4.0. I suspect everyone else was having their openmp ruined by `torchaudio`, which we don't pull.

In #2904, @k-dominik describes a clash between pytorch 2.4.1 and `pyshtools`, due to its dependency `gmt` overwriting openmp. An attempted workaround with a modified pyshtools package, uploaded to the conda channel `ilastik-forge/label/patched`, may have slimmed the dependencies but didn't solve the problem.

In my attempts here, `pyshtools` was also the source of the problem, but instead of `gmt`, it looks like openmp was being overwritten by `libflang`. Specifically, it was pulling openmp and libflang both at version 5, which is rather old. I checked whether pytorch 2.5.1 was able to coexist with other openmp versions in an env, and it looks like `openmp=8.0.1` would have been fine. `libflang=11.*` pulled `llvm-openmp` instead of `openmp`, with the overwriting issue persisting, but `libflang=17.*` no longer pulled any openmp and was fine as well.
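As a closing note on diagnosing this by hand: on Windows, conda installs DLLs into the env's Library\bin, and the competing OpenMP runtimes have recognisable file names (libiomp5md.dll from intel-openmp, libomp*.dll from LLVM-based openmp/llvm-openmp, vcomp140.dll from MSVC). The sketch below (the filename patterns are assumptions, not an exhaustive list) shows which providers actually ended up on disk in a given env.

```python
# Sketch (assumed filename patterns, not an authoritative list): show which
# OpenMP runtime DLLs are actually present in a Windows conda environment.
import sys
from pathlib import Path

# Typical OpenMP runtime DLL names: libiomp5md.dll (intel-openmp),
# libomp*.dll (LLVM OpenMP), vcomp*.dll (MSVC), libgomp*.dll (GCC).
PATTERNS = ("libiomp*.dll", "libomp*.dll", "vcomp*.dll", "libgomp*.dll")


def openmp_dlls(env_prefix: Path):
    bin_dir = env_prefix / "Library" / "bin"  # where conda puts DLLs on Windows
    for pattern in PATTERNS:
        yield from sorted(bin_dir.glob(pattern))


if __name__ == "__main__":
    for dll in openmp_dlls(Path(sys.prefix)):
        print(f"{dll.name}  ({dll.stat().st_size} bytes)")
```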