Skip to content

Conversation

@kvark
Copy link
Contributor

@kvark kvark commented Aug 18, 2018

A step towards #4372
This PR enables us to start running with gfx-portability or MoltenVK. It's not enough to make the emulator functional just yet - much depends on maturing of the portability libraries. It also doesn't provide a "native' swapchain path yet (not a priority until we get something working first).

I believe there is value in having the code path open, for curious developers to experiment and advance the state of Vulkan Portability.

Technical issues/blockers I'm aware of (to be extended):

  1. Too many samplers are requested by a pipeline layout. Metal only supports 16 samplers per stage on all devices, while RPCS3 requested 20.
  2. Texture swizzling is (almost) not supported.
  3. Occlusion queries are missing.
  4. Buffers in HOST_VISIBLE memory with TEXEL_BUFFER usage are requested but not supported
  5. Buffer views size limits are too small and not checked for
  6. Surface needs to handle being out of date
  7. Matrix isn't applied properly in vertex shaders.
  8. Crash in ~user_interface on exit.

r? @kd-11

Copy link
Contributor

@kd-11 kd-11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to vulkan are ok, with the exception of minor formatting issues. Add a space between the // and the text in comments to avoid clang-format from messing up the changes. There are existing cases of the old format without leading space, but it was agreed that going forward leading spaces were preferred.
As for LLVM changes, they seem strange to me but neko will ultimately have to review that.


extensions.push_back(VK_KHR_SURFACE_EXTENSION_NAME);
extensions.push_back(VK_EXT_DEBUG_REPORT_EXTENSION_NAME);
if (support.is_supported(VK_EXT_DEBUG_REPORT_EXTENSION_NAME)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

brace on newline

}
};

#ifdef __APPLE__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this block, but I can't see why unsigned long needed specification since all the sized types are already handled above. Also why return 8-bit representation for 64-bit long? Or is it a compiler quirk?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not completely sure either. Apparently, clang treats unsigned long as a distinct type for the template specialization.

Also why return 8-bit representation for 64-bit long? Or is it a compiler quirk?

oops, nice catch! 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary, the issue is incompatibility between u64 and uptr on MacOS (unsigned long long vs unsigned long, same thing but compiler doesn't deal with it properly). Change uptr to uint64_t in types.h and Mac should build fine.

bors bot added a commit to gfx-rs/gfx that referenced this pull request Aug 18, 2018
2328: [mtl] expose anisotropy and DXT format features r=grovesNL a=kvark

These are required for RPCS3/rpcs3#4996
PR checklist:
- [ ] `make` succeeds (on *nix)
- [ ] `make reftests` succeeds
- [ ] tested examples with the following backends:
- [ ] `rustfmt` run on changed code


Co-authored-by: Dzmitry Malyshau <[email protected]>
@kvark
Copy link
Contributor Author

kvark commented Aug 19, 2018

I believe the current issues are addressed.
r? @Nekotekina

@kd-11
Copy link
Contributor

kd-11 commented Aug 19, 2018

@kvark Any plans to eventually finish working on the NSView display code? I don't have access to relevant hardware to help with that and it would be a shame to have it stick around as dead code.

target_link_libraries(rpcs3 "${RPCS3_SRC_DIR}/../3rdparty/discord-rpc/lib/libdiscord-rpc-linux.a")
elseif(APPLE)
target_link_libraries(rpcs3 "${RPCS3_SRC_DIR}/../3rdparty/discord-rpc/lib/libdiscord-rpc-mac.a")
target_link_libraries(rpcs3 "${RPCS3_SRC_DIR}/../3rdparty/discord-rpc/lib/libdiscord-rpc-mac.a" objc "-framework Foundation" "-framework CoreServices")
Copy link
Member

@hcorion hcorion Aug 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd appreciate it if these changes were added to the other -framework link_libraries on line 421

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are needed specifically for discord integration, hence I put it this way. I don't mind changing it though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, alright it's fine to leave them there then.

@kd-11
Copy link
Contributor

kd-11 commented Aug 22, 2018

Swizzle emulation is fixed by #5024. I tested on windows and it worked fine for both ingame render and the UI overlay. Just need to finish the NSView drawing and support should be complete, I'm marking this set as WIP so we can finish this in one go. Are occlusion queries a motlenVK limitation or just gfx-rs? Because some games will never work properly without this and simulation is very difficult.

@kd-11 kd-11 changed the title Vulkan Portability code path on MacOS with MVK extensions [WIP] Vulkan Portability code path on MacOS with MVK extensions Aug 22, 2018
@kd-11 kd-11 added Enhancement Render: Vulkan Workaround To be fixed properly in the future. labels Aug 22, 2018
@kvark
Copy link
Contributor Author

kvark commented Aug 22, 2018

@kd-11

Any plans to eventually finish working on the NSView display code?
Just need to finish the NSView drawing and support should be complete

AFAIK, the missing code is for an emulated swapchain (confusingly called "native" swapchain...) - to be used when the real Vulkan swapchain is unavailable. It's not supposed to be used most of the time, so I don't think the lack of this path should be blocking the PR.

Swizzle emulation is fixed by #5024.

Amazing, thank you!

Are occlusion queries a motlenVK limitation or just gfx-rs? Because some games will never work properly without this and simulation is very difficult.

They are a limitation of gfx-rs at the moment. I've just finished implementing them in gfx-rs/gfx#2334, passing all the relevant Vulkan CTS, so we just need to land all the things to be ready.

@kd-11
Copy link
Contributor

kd-11 commented Aug 22, 2018

I don't know if Qt does proper hw-swapchain-compatible windows, so you're saying it does actually work? iirc the MVK swapchain extension was not working last time someone tried to get it working. If it does actually work then simply adding the swizzle emulation and this PR should have MVK working.

@kvark
Copy link
Contributor Author

kvark commented Aug 22, 2018

I don't know if Qt does proper hw-swapchain-compatible windows, so you're saying it does actually work?

When I was testing it last time, swapchain creation went fine on gfx-portability.

iirc the MVK swapchain extension was not working last time someone tried to get it working. If it does actually work then simply adding the swizzle emulation and this PR should have MVK working.

IIRC, MVK expects the CAMetalLayer to already be created on NSView, while gfx-portability just creates one. The Vulkan Portability group hasn't discussed that aspect, I'll add it to the agenda.

@kvark
Copy link
Contributor Author

kvark commented Aug 23, 2018

@kd-11 I tried your branch with my latest changes in queries and got stuck in the next problem: RPCS3 tries to create a 384MB buffer that has VK_BUFFER_USAGE_UNIFORM_TEXEL_BUFFER_BIT in the CPU-visible memory. We don't support this (rightfully) and RPCS3 ends up triggering an exception. Do you really need to keep it CPU-visible?

@kd-11
Copy link
Contributor

kd-11 commented Aug 23, 2018

@kvark For now, yes. Working around this would involve a separate upload buffer and a second buffer copy. As far as I could see the host-visible nature does not degrade performance when used for upload in this fashion so it was optimized around that. How much can you allocate once for host access on metal? Those are just storage heaps, you can make them smaller for testing, with possible out-of-memory situations that will degrade performance. You can set to 128M for example and simple applications will not crash.

@kvark
Copy link
Contributor Author

kvark commented Aug 23, 2018 via email

@kd-11
Copy link
Contributor

kd-11 commented Aug 23, 2018

Why would a second buffer copy be needed?

How else do we transfer a large bytestream (even 384M is not large enough sometimes) for random access in the vertex shader? Isn't the only other solution to have an upload buffer and then copy to the actual stream using a GPU-side transfer? By second buffer copy I meant there are 2 copies, one from CPU->buffer1 then buffer1->buffer2.

@kd-11
Copy link
Contributor

kd-11 commented Aug 23, 2018

I'll push a workaround for the texel buffer incompatibility on the textures PR in a few minutes and we can see if it now works. 🤞

@kvark
Copy link
Contributor Author

kvark commented Aug 23, 2018

By second buffer copy I meant there are 2 copies, one from CPU->buffer1 then buffer1->buffer2.

Oh, I read this as another buffer in memory. If you mean "buffer copy" in terms of operation (e.g. 2 copies), then sure, I understand. Although, CPU could be working with the HOST_VISIBLE memory directly (instead of doing "CPU->buffer1" copy), in which case there is a single copy.

Do the contents change a lot between frames?

@kvark
Copy link
Contributor Author

kvark commented Aug 23, 2018

@kd-11 FYI, our query support is almost landed, pending gfx-rs/portability#134

I'll push a workaround for the texel buffer incompatibility on the textures PR in a few minutes and we can see if it now works.

This is incredible! Thank you for addressing our blockers so promptly! ❤️
When implementing this, please make sure to follow the standard Vulkan logic instead of adding #ifdef __APPLE__: check the memory types in buffer requirements, and work around if the implementation can't support it. Alternatively, using a non-HOST_VISIBLE memory for those buffers may be good on all platforms as the only path, but this depends on how often and what parts of it need to be updated.

@kd-11
Copy link
Contributor

kd-11 commented Aug 23, 2018

It would be wasteful of vram as the memory content is only guaranteed to remain intact for the immediate near future. This is largely due to design constraints where the cpu is often used to offload vertex calculations on ps3, so you end up reuploading almost the entire graphics memory every frame as it has changed.

@kvark
Copy link
Contributor Author

kvark commented Aug 25, 2018

I'm seeing an issue with frame presentation and possibly synchronization, but at least there is a proof we pass the initialization now - the glorious triangle:
rpcs3-triangle

@kd-11
Copy link
Contributor

kd-11 commented Aug 25, 2018

Thats wierd, the triangle is supposed to be centered. The colors are correct though so at least we know all the pipeline stages are working. With that I could just merge this PR as is then. How does it run on MVK alone (without gfx-rs) ?

@kd-11
Copy link
Contributor

kd-11 commented Aug 25, 2018

I forgot to mention - the part that is seemingly not applying is the matrix multiplication at the end of the vertex shader. Basically goes something like:

gl_Position = reg0;
... bunch of other code...
gl_Position = gl_Position * scale_offset_matrix;
..bunch of other code..
gl_Position = apply_clip_transform(gl_Position);

The non-inline nature is largely since the code generator is shared with other languages such as HLSL. I'm wondering if for some reason on gfx-rs (and maybe MVK?) the gl_Position = gl_Position * mat is not working due to the "gl_Position" variable mapping to a write-only variable?

@kvark
Copy link
Contributor Author

kvark commented Aug 26, 2018

Thats wierd, the triangle is supposed to be centered.

Oops. Well, sounds like a low hanging fruit to fix anyway.

With that I could just merge this PR as is then.

Someone still needs to look at the LLVM code (who should we ping?). That path doesn't work, and I'm running RPCS3 in interpretation mode.

How does it run on MVK alone (without gfx-rs) ?

Sorry, I'm not interested to debug MVK 😛

I'm wondering if for some reason on gfx-rs (and maybe MVK?) the gl_Position = gl_Position * mat is not working due to the "gl_Position" variable mapping to a write-only variable?

That's a great observation, it's going to easy the investigation, thank you! FWIW, we still use the same shader translator as MVK (SPIRV-Cross). I'll check the produced code.

@JohnHolmesII
Copy link
Contributor

JohnHolmesII commented Aug 26, 2018

For all things LLVM, one pings his High Kotliness, i. e. @Nekotekina

Neko, can you review the LLVM stuff int the PR

@kvark
Copy link
Contributor Author

kvark commented Aug 26, 2018

@kd-11 I resolved the presentation issue (mentioned earlier) in gfx-rs/gfx#2345

the part that is seemingly not applying is the matrix multiplication at the end of the vertex shader

I do see a matrix being applied, but it's a fairly identity-ish matrix. And the buffer it's read from is in Shared memory, so there isn't much to screw up in terms of synchronization. I'll keep looking...

@oscarbg
Copy link

oscarbg commented Aug 26, 2018

Hi all,
great effort guys!
@kd-11 I might be willing to test on MoltenVK, BUT I'm a bit lazy to rebuild these days..
I might wait a little (estimate, given the fast progress :-) ) to (in increasing easiness level for me to test):

  1. this PR being in master (with fixed LLVM?)
  2. continous prebuilt binaries for Macos.. Setup Continuous Integration for MacOS #5015
  3. once this PR in master and assuming Mac CI builds take longer to arrive I can download frequent builds from (http://www.emucr.com) and as you know Wine has Vulkan support on MacOS using MoltenVK.. sadly not enabled by default on https://dl.winehq.org/wine-builds/macosx/download.html.. anyway I have it compiled and in fact I checked rpcs3 Windows binary about a month ago and tried running gs_gcm_basic_triangle.elf (Support for RPCS3 (+via Wine).. KhronosGroup/MoltenVK#205) I got:
    [MoltenVK ERROR] VK_ERROR_EXTENSION_NOT_PRESENT: Vulkan extension VK_EXT_debug_report is not supported.
    so question is if can you make use of this extension conditionally on platforms that don't support it? like MoltenVK.. I already requested support for it on MoltenVK site but seems support for it can take a while..
    one question for @kvark is if "portability" supports VK_EXT_debug_report or this PR disables somehow use of it..
    forgot to say that running Windows binary also has the advantage of avoiding X11 issues I found in my last testing around 5 months ago(My findings trying to run simple triangle homebrew app on MacOS using MoltenVK.. #4293) peharps that issue is already fixed I'm not up to speed on RPCS3 Macos changes..
  4. have been seeing also latests commits in master like:
    "vk: Implement double-buffered heaps for platforms without universal support for host visibility (APPLE)"
    "vk: Support sw component swizzle decode because metal sucks "
    and more..
    and seems like most Vulkan "massaging changes" for Metal are not platform dependant.. but swizzle changes are with #ifdef APPLE so seems running RPCS3 Windows binary under Wine on Macos will hit the swizzle issues.. so question is can the "not needing swizzle support" code be enabled on all cases (even for Windows builds) or has a performance hit? I know may be going to far but also change build time (#ifdef APPLE) for a configuration flag?

thanks..

@hcorion
Copy link
Member

hcorion commented Aug 26, 2018

I can download frequent builds from (http://www.emucr.com)

We'll have them on our official website https://rpcs3.net

so question is can the "not needing swizzle support" code be enabled on all cases (even for Windows builds) or has a performance hit?

I don't really see any good reason why you should be running rpcs3 on wine. It just seems to me like an unnecessary option for the one person who wants to run rpcs3 on wine on MacOS instead of just running the native version.

@oscarbg
Copy link

oscarbg commented Aug 26, 2018

@hcorion thanks for pointing https://rpcs3.net .. I almost forgot it..
as said reason for running rpcs3 on wine (on MACOS) is temporally for avoiding X11 issues and also do quick testing before you RPCS3 devs ship Mac binary builds.. it's understandable not useful when mac binary builds come with no issues on MoltenVK and/or portability libraries..

@kvark
Copy link
Contributor Author

kvark commented Aug 27, 2018

@kd-11 Alright, I nailed down the issues with the triangle sample (in gfx-rs/gfx#2348), and we are in a much better shape now:
prcs3-cube
rpcs3-scogger

The only known issue so far is the crash on exit, happening within user_interface struct destructor. Not sure if it's related, will keep looking.

@kd-11
Copy link
Contributor

kd-11 commented Aug 27, 2018

Looking very good! Nice work. Interesting how many issues were caught just trying to get RPCS3 to run, but such is the norm when real-world testing comes into play. The __APPLE__ workarounds can be replaced with an external preprocessor directive that will be set by cmake for CI builds; that way someone can build rpcs3 for windows with the same settings should they choose to.

@kvark
Copy link
Contributor Author

kvark commented Aug 28, 2018

@kd-11 I'm fine with going through a bunch of blockers if it's all running smooth and dandy in the end :) Your help was crucial here, makes me respect RPCS3 developer team quite a bit.

For those curious, I attached the pre-built binaries to gfx-rs/portability#144
In the meantime, we are setting up a continuous deployment scheme for easier updates. Does RPCS3 plan to rely on LunarG SDK binaries for macOS, or is it going to distribute a portability library like Dota2 does?

@hcorion
Copy link
Member

hcorion commented Aug 28, 2018

The plan I believe is to distribute the gfx-rs/portability library with RPCS3, if licenses and such work together.

VkResult present(u32 index) override
{
auto& src = swapchain_images[index];

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw an assert here, as this is theoretically unreachable since 'init' will fail with 0x0 dimensions

@kd-11
Copy link
Contributor

kd-11 commented Aug 28, 2018

All major technical blockers have been dealt with, I have made some minor comments on how to fix the remaining one. With these changes this PR will be mergable.

@kvark
Copy link
Contributor Author

kvark commented Aug 28, 2018

@kd-11 I updated the code according to your review notes and rebased. There is still an issue with exiting RPCS3 where it crashes (inside user_interface::~user_interface), which may or may not be related to graphics.


VkResult present(u32 index) override
{
assert(!"native MacOS swapchain is not implemented yet");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use fmt::throw_exception(string) for consistency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@kd-11 kd-11 changed the title [WIP] Vulkan Portability code path on MacOS with MVK extensions Vulkan Portability code path on MacOS with MVK extensions Aug 28, 2018
@MSuih
Copy link
Member

MSuih commented Aug 28, 2018

There have been some reports about crashes when exiting on other systems as well, so it could be that your changes have nothing to do with it. But you can still continue to debug, it might be a seperate issue or you might find a simple bug and fix it for all versions.

@kvark
Copy link
Contributor Author

kvark commented Aug 28, 2018

@MSuih I did attempt an investigation, but haven't found anything. Are the other reports registered somewhere so that I could at least compare the call stacks?

@MSuih
Copy link
Member

MSuih commented Aug 28, 2018

Don't think so, I've just seen the occasional complaint on Discord.

@kd-11
Copy link
Contributor

kd-11 commented Aug 28, 2018

If its related I'll look into it, unfortunately I do not have a way to recreate this issue at the moment on my end. Merging this set as is, any other issues can be fixed in follow-ups.

@kd-11 kd-11 merged commit c452b43 into RPCS3:master Aug 28, 2018
@kvark kvark deleted the portable branch August 28, 2018 18:25
@dio-gh dio-gh mentioned this pull request Aug 20, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Enhancement Render: Vulkan Workaround To be fixed properly in the future.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants