Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert Atom_RPI.Public and Atom_RPI.Edit to shared libraries #18427

Merged

Conversation

mprettner-huawei
Copy link
Contributor

@mprettner-huawei mprettner-huawei commented Oct 29, 2024

What does this PR do?

TLDR: Convert Atom_RPI.Public and Atom_RPI.Edit to shared libraries and fix all upstream (mostly in AzCore) problems. Feedback needed on whether the MSVC warning 4251 could be disabled globally. In comparison to the reverted PR18315, this PR does not export all symbols but only those which are explicitly marked for export.

Detailed changes

  • Create new files Configuration.h which contain the export macro definitions
  • Add export macros (ATOM_RPI_PUBLIC_API, ATOM_RPI_REFLECT_API, ATOM_RPI_EDIT_API) to class declarations in Atom_RPI
  • Add AZ_DLL_EXPORT macros to explicit template specializations of exported template functions
  • Remove GetBuildTimestamp() in ShaderVariant/ShaderVariantAsset because the implementation in ShaderVariantAsset was missing
  • Change ViewGroup::m_cameraViews from vector to array, because the ViewData struct is not copy constructible (it uses AZ::Event)
  • Move ResourcePoolSourceData::Reflect to new cpp file (was in ResourcePoolBuilder.cpp, which is a different cmake target)
  • Some small changes like defining a default constructor in the cpp file (this is mostly necessary for the nounity-build)
  • Add new dll AzCore which publicly depends on AzCore.Static, this dll now contains the explicit instanciations of some templates (like DECLARE_EBUS_INSTANTIATION(...)). This dll depends on AzCore since instantiating an ebus uses many things from AzCore, so it cannot easily be moved to O3DEKernel; and with this constellation no downstream targets need to be changed. The explicit instanciations must be in a shared library, since otherwise the linker complains about multiple definitions of a template in different dlls/exes, which violates the ODR and cannot be resolved like inside of a single dll/exe. This seems to be a problem in general when a static library is linked into multiple dlls which are then linked again into another dll or exe.
  • Add new AZ_DLL_EXPORT_EXTERN and AZ_DLL_IMPORT_EXTERN macros since Clang on Linux requires an additional dllexport when declaring an extern template, but Clang on Windows and MSVC dont allow this
  • Move template functions in AZ::Internal::IdHandler/AZ::Internal::NonIdHandler/AZ::Internal::MultiHandler ouside of the class definition such that externing them works correctly (this is mostly due to linker errors (*IdHandler), and in one instance about crashes when an EBus context is accessed from multiple modules, eg. when the codepath from an EBus broadcast is handled in different modules): This works for the most part, the downside is that AZ::Internal::NonIdHandler and ::IdHandler also need to be externed with a extern template class Internal::IdHandler... in addition to the EBus itself, and since using directives cannot be used for extern template class (extern template class EBus<SystemTickEvents>::BusesContainer::Handler is not allowed (it works in MSVC but is not allowed according to the standard, and doesnt work with Clang on Windows), as well as eg. AZStd::conditional<...> which uses using as well), there are now two separate macros required for declaring and instanciating an extern EBus (one for Id, one for NonId). I didnt find a way to get the id/nonid information from the class with template magic, since not being allowed to use a using directive limits template metaprogramming significantly.

Feedback needed

  • Disable the warning 4251 "'type' : class 'type1' needs to have dll-interface to be used by clients of class 'type2'": This PR only compiles when disabling this warning globally (it would be possible to use the AZ_PUSH_DISABLE_DLL_EXPORT_MEMBER_WARNING/AZ_POP_DISABLE_DLL_EXPORT_MEMBER_WARNING macros to disable this warning more locally for the affected classes (of which there are many) if this is not an option). I dont really see the point of keeping this warning enabled, since it is disabled a total of ~350 times in the codebase; mostly due to Qt classes, but it would also needed to be used in any Atom::RPI class which contains AZStd::* members or members from AzCore, such as intrusive_ptr (until AzCore is a shared library itself). So I would suggest to disable this warning globally.
  • The macros DECLARE_EBUS_EXTERN_DLL_SINGLE_ADDRESS and similar ones are not really that pretty, but it does not seem to be possible to get the correct members to declare as extern template due to the using limitation of the extern template syntax. Maybe someone could think of a better way to handle this?
  • I dont really know what is going on with RPI::SceneNotificationBus and RPI::ShaderReloadNotificationBus: It works fine with the unity build, but lead to linker errors (IdHandler::~IdHandler already defined in CommonFeaturesAtom.Editor.Static) when using a nounity build. The workaround is to declare the bus as extern with the DECLARE_EBUS_EXTERN_DLL_MULTI_ADDRESS macro, but I dont know why this is necessary (could be any of: template instantiations are not deduplicated across dll boundries, a derived dllexported class exports the handler, the functions are not implicitly inline, the functions are not exactly the same in all translation units due to changes in compile options, or something completely different; it is very difficult for me to get a complete picture of the EBus and related classes to see where the problem could be, and since it is all template code it should just work™, but obviously it does not). This being necessary only for the SceneNotificationBus with nounity-build suggests that some specific constellation of defining handlers and dispatching events leads to problems, so it could be a problem for other RPI buses which are used in external gems.

Future work

Based on the large number of AZ_PUSH_DISABLE_DLL_EXPORT_BASECLASS_WARNING/AZ_POP_DISABLE_DLL_EXPORT_BASECLASS_WARNING that needed to be added, and some linker errors which are likely due to static AzCore being linked into multiple dlls, it is probably advisable to convert the core libraries (AzCore/AzFramework) first and then move outwards to other targets and gems for future work (which is also suggested in the "Improve Developer iteration workflow RFC").

How was this PR tested?

Run the Editor and do some interactions, run tests locally on Linux, run the AtomSampleViewer and look at a few samples

@mprettner-huawei mprettner-huawei changed the title Atom rpi exportmacros Convert Atom_RPI.Public and Atom_RPI.Edit to shared libraries Oct 29, 2024
@mprettner-huawei mprettner-huawei added sig/core Categorizes an issue or PR as relevant to SIG Core sig/graphics-audio Categorizes an issue or PR as relevant to SIG graphics-audio. labels Oct 29, 2024
@mprettner-huawei mprettner-huawei force-pushed the atom_rpi_exportmacros branch 4 times, most recently from 509ea40 to 1031b67 Compare October 31, 2024 17:03
@mprettner-huawei mprettner-huawei force-pushed the atom_rpi_exportmacros branch 2 times, most recently from 2f7131a to 777f76e Compare November 19, 2024 13:18
@mprettner-huawei mprettner-huawei marked this pull request as ready for review November 20, 2024 09:48
@mprettner-huawei mprettner-huawei requested review from a team as code owners November 20, 2024 09:48
@nick-l-o3de
Copy link
Contributor

nick-l-o3de commented Nov 20, 2024

I'm still reading thru this (there's some minor spelling stuff so far and I'm trying to think of a way to get around the Id/NonId thing...

but I'm okay with disabling that warning, for non-exported modules. Doesn't it show up later as a linker error anyway if its something non-exported but actually used?

As for the complaint about notification bus complaining about the destructor - I'm not sure. Many of the other busses do not have a destructor and don't need one, but the scene one doesn't have one, and you say it still complains. Note that the destructor is not being forward declared, so ... would that not mess with externing somehow? I'm not sure - we're supposed to be externing the ebus, not the interface... so it shouldn't matter.

I don't mind the macros which extern them, but I'm concerned about things which make us have to extern them. Its supposed to be a bit of an optimization, not somehting which actually is required to function, and we were planning to use it for any busses that showed up with a lot of frequency in the binary analysis. I'm not completely against it if it solves a problem to do with linking of course. extern and inline are about the tools we have to solve that.

Copy link
Contributor

@nick-l-o3de nick-l-o3de left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see a problem with all this. we'll have to try it out a lot before it gets deployed for weird 3rd party gem side effects, but moving to better use of the shared library space we have is going to help a lot in compile time and workflow.

If I was a better Template Magician I'm sure I could figure out a way to reduce those macros, but alas, I'm not a Template Magician and in fact, really dislike template metaprogramming and the symbol explosion it causes. I can't think of a way to squish those defs down, and might be better not to, to reduce instantiations.

that being said,
before we actually pull the trigger on this one, I would like to see otehr input on this, especially renderer folks or core folks and get approval for the blanket warning removal.

That, and I'm interested in what other testing you might need, in order to feel "safe" about pushing this...

One thing I can think of is building and installer and trying it out (build o3de with cmake configure options -ULY_PROJECTS and build the target install in both profile (and release? AFAIK each time you build install it overlays that configuration into the install subfolder and makes it more capable.)

Then use the subfolder install to test, making a project or whatever.

};

// This functino needs to be defined outside of the class definition such that it can get explicitly instantiated correctly. This is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling functino (sorry for the nit!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish I knew more about the internals of ebus - I've always wanted to ask the question, what is the callstack storage for anyway? is it necessary to function or just a debug tool? if the latter, could we remove it except in debug or something? NotE: NOT NECESSARY to change this for this PR, just curious as to whether you have any insight for future. It costs a bit of overhead and constantly shows up when I use a binary analysis tool to find whats inside a binary.

@mprettner-huawei
Copy link
Contributor Author

After some PRs (18501, 18525, multiplayersample 493) the following tested scenarios work ("MyGem" is an internal gem which interacts with Atom a lot):

  • AR with all test cases
  • Running AtomSampleViewer samples
  • Running the engine with MyGem in an engine-centric build (profile + release)
  • Running the engine SDK with MyGem as source in a project-based build (profile only; in a release build it fails to find the "RPISystem" service; same as with current development)
  • Running the engine SDK with the o3de-multipleyersample (profile only): This only works partially, I can start the ServerLauncher and fly around the level, but not connect with a GameLauncher; but this is the same behaviour as with current development (not sure what I am doing wrong here)

So the bottom line is that the things that worked before on my machine still work, but I am not sure why the release build is this problematic. I cannot rule out the possibility that other linker errors will occur when using different parts of the engine from an external gem with the SDK, but at least for MyGem it works without problem.

Copy link
Contributor

@galibzon galibzon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for these kind of improvements. But let me try first with Android (Quest3) to give you feedback if it breaks mobile or not.

@jhmueller-huawei
Copy link
Contributor

@galibzon Have you had a chance to try this on Android?

@galibzon
Copy link
Contributor

@galibzon Have you had a chance to try this on Android?

I'm testing now

@galibzon
Copy link
Contributor

@galibzon Have you had a chance to try this on Android?

I'm testing now

It works. I compiled AutomatedTesting for Android. Deployed to a Google Pixel 7 Pro. Loaded the "DefaultLevel". Looks great.

@jhmueller-huawei jhmueller-huawei merged commit 08b1dd9 into o3de:development Dec 10, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/core Categorizes an issue or PR as relevant to SIG Core sig/graphics-audio Categorizes an issue or PR as relevant to SIG graphics-audio.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants