-
Notifications
You must be signed in to change notification settings - Fork 563
WIP: vmm: refactor FD passing #7431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
If the general approach here is acked, I'll be happy to add more tests, if needed/requested. |
13a22a7 to
6645937
Compare
As discussed in PR cloud-hypervisor#7377, Cloud Hypervisor may potentially accept different types of FDs on VM creation and/or resuming: - FDs for net devices (currently supported); - FDs for pmem devices for memory zones (WIP); - FDs for storage devices, etc. As FDs are passed as an array of anonymous numbers over UDS sockets, there should be a way to assign them to the devices they belong to. This PR introduces ExternalFdsConfig in vm_config.rs, which matches, positionally, the list of device IDs/names to the list of FDs. Signed-off-by: Peter Oskolkov <[email protected]>
6645937 to
e8ff526
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! In general, I love to have a generic way to solve this. I however see lots of caveats and room for improvement in the implementation as well as a few open design discussions. I also have to run this in our (= @cyberus-technology) internal test suite and test live migration with virtio-net devices backed by external FDs.
My review here is very comprehensive and strict cause we investigated quite some time to dive into this domain. It complex with many caveats. Together we can come up with a nice solution, tho!
My Checklist
- Tested that this will be live-migration compatible for virtio-net
| let fds = match &mut restore_config.external_fds { | ||
| Some(external_fds) => { | ||
| let mut fds = vec![]; | ||
| core::mem::swap(&mut fds, &mut external_fds.fds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is weird to use core::mem::swap here. Why not net.fds.replace(external_fds.fds)? Feels much more idiomatic to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you need is std::mem::take(&mut external_fds.fds).
Essentially does the same (taking ownership of the inner Vec<i32>), but more concise and readable.
| landlock_rules: None, | ||
| #[cfg(feature = "ivshmem")] | ||
| ivshmem: None, | ||
| external_fds: None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not about this line but there is a typo in the commit message: I think UDS is not a common appreviation for Unix domain sockets? Could you please be a little more verbose here and write something like:
FDs are passed as an array of .... via UNIX Domain Sockets using the SCM_RIGHTS mechanism. or so?
| } | ||
| } | ||
|
|
||
| pub fn consume_fds(&mut self, fds: Vec<i32>) -> Result<(), PayloadConfigError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please rename this to
apply_external_fds_to_componentsor similar descriptive - this could be
IntoIterator<Item = RawFd>which feels more idiomatic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think consume_fds and i32 are fine. Detail comments above.
Of course, we need to add good documentation to such pub fn.
| } | ||
| } | ||
|
|
||
| pub fn consume_fds(&mut self, fds: Vec<i32>) -> Result<(), PayloadConfigError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a complex mechanism so we should add proper documentation to the underlying mechanism. The mechanism is complex enough to write a few sentences if not paragraphs about it :)
- explain the rather complex overall mechanism
- explain constraints for developers for their public APIs
- explain how FDs are associated with components in a VM
- what else might be interesting
Or in other words. We should not worsen the situation that Cloud Hypervisor code is undocumented. New functionality especially for complex stuff should have documentation. I don't say must as I'm not a maintainer, yet I'm a major contributor :)
| return Ok(()); | ||
| }; | ||
|
|
||
| for net in nets { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here you basically hard code the order in that FDs are applied? You start with the net devices and future components (memory, virtio-blk) will follow below?
| } | ||
|
|
||
| impl NetConfig { | ||
| pub fn consume_fds(&mut self, fds: Vec<i32>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RawFd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, documentation is missing. Further, I think the the name is not optimal. Would replace_fds be a more matching name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we need to add documentation around this pub fn.
The naming looks fine to me - it is good to stay consistent with the consume_fds provided from VmConfig, RestoreConfig, etc.
Perhaps, it is good to use a trait to unify such behavior across different data structure along with some common sanity checks and logging. Just a thought.
|
|
||
| impl NetConfig { | ||
| pub fn consume_fds(&mut self, fds: Vec<i32>) { | ||
| if !fds.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this check, why not a simple replace? I can't think of a valid use case where we keep the old self.fds here - can you? :)
| .map(|f| f.try_clone().unwrap().into_raw_fd()) | ||
| .collect(); | ||
|
|
||
| net_cfg.consume_fds(fds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attach_fds_to_cfg() did some sanity checks and helpful logging, which you removed now. This is a little unfortunate and I think we can do better here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to handle error from consume_fds(), the same as what's being done for VmCreate.
Also, I agree the (rest of) sanity checks and helpful logging should be added to consume_fds() if applicable.
| continue; | ||
| }; | ||
| let mut fds = vec![]; | ||
| for idx in 0..external_fds.ids.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a very C-style way to iterate. I think this can be replaced with for (id, idx) in 0..external_fds.ids.iter().enumerate()
| #[error("Cannot restore VM")] | ||
| Restore(#[source] MigratableError), | ||
|
|
||
| #[error("Invalid external FDs")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this error message is not helpful; please add more context. Currently, this originates from the consume() path. Why can'T they be consumed? Is there a mismatch?
|
|
||
| pub fn consume_fds(&mut self, fds: Vec<i32>) -> Result<(), PayloadConfigError> { | ||
| // Update self.external_fds, if appropriate. | ||
| let Some(external_fds) = self.external_fds.as_mut() else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this reads a bit cumbersome. How about a more idiomatic version?
self.external_fds.as_mut()
.ok_or_else(|| if fds.is_empty() {
return Ok(());
} else {
PayloadConfigError::FdCountMismatch
})?;| } | ||
| }; | ||
|
|
||
| if external_fds.ids.len() != fds.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does the number of provided component (device) IDs must match the number of FDs?! This sounds wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the design of the proposed API syntax. I agree it should be explicitly explained, say from the commit message.
| parser | ||
| .add("source_url") | ||
| .add("prefault") | ||
| .add("external_ids") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .add("external_ids") | |
| .add("external_fds_ids") |
more verbose but much clearer. The other name is very weird and hard to get a meaning from
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current naming reads okay to me. If we want to be more verbose, I'd go with external_device_ids and external_device_fds.
likebreath
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@posk-io Thank you for the contribution. I like the proposed refactoring. Detailed comments below. Please also don't forget to update the doc: https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/snapshot_restore.md#restore-a-vm-with-new-net-fds.
| let fds = match &mut restore_config.external_fds { | ||
| Some(external_fds) => { | ||
| let mut fds = vec![]; | ||
| core::mem::swap(&mut fds, &mut external_fds.fds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you need is std::mem::take(&mut external_fds.fds).
Essentially does the same (taking ownership of the inner Vec<i32>), but more concise and readable.
| parser | ||
| .add("source_url") | ||
| .add("prefault") | ||
| .add("external_ids") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current naming reads okay to me. If we want to be more verbose, I'd go with external_device_ids and external_device_fds.
| .map(|f| f.try_clone().unwrap().into_raw_fd()) | ||
| .collect(); | ||
|
|
||
| net_cfg.consume_fds(fds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to handle error from consume_fds(), the same as what's being done for VmCreate.
Also, I agree the (rest of) sanity checks and helpful logging should be added to consume_fds() if applicable.
| restore_cfg | ||
| .consume_fds(fds) | ||
| .map_err(|_| HttpError::BadRequest)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto: Let's not lose and log the error being raised from consume_fds.
| pub struct ExternalFdsConfig { | ||
| pub ids: Vec<String>, | ||
| #[serde(default)] | ||
| pub fds: Vec<i32>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think i32 is fine here, particularly comparing with using RawFd. Such data structure (and all config related ones) is mostly consumed by external APIs for accepting user inputs, either via CLI or HTTP requests. A primitive data type fits better for such role. Sanity checks is now done when such data is consumed by internal APIs, say from config.validate() etc.
As for a bigger future refactoring (in a separate PR), i think using File is a better choice to enforce stricter semantics directly at the data structure level. This works out-of box with external APIs for serving HTTP requests, while changes needed for parsing CLI inputs.
| #[derive(Clone, Debug, PartialEq, Eq, Deserialize, Serialize)] | ||
| pub struct ExternalFdsConfig { | ||
| pub ids: Vec<String>, | ||
| #[serde(default)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should #[serde(skip)] for the fds. The main reason why we needed it in the original API syntax is using its length is matching with number of the incoming FDs sent over SCM_RIGHTS. We now have such information from ids.len(). (The other part of the reason is for logging and user introspection purposes).
This also removes the need to implement custom deserializer for ExternalFdsConfig (like deserialize_netconfig_fds for NetConfig).
| } | ||
|
|
||
| impl NetConfig { | ||
| pub fn consume_fds(&mut self, fds: Vec<i32>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we need to add documentation around this pub fn.
The naming looks fine to me - it is good to stay consistent with the consume_fds provided from VmConfig, RestoreConfig, etc.
Perhaps, it is good to use a trait to unify such behavior across different data structure along with some common sanity checks and logging. Just a thought.
| pub fn consume_fds(&mut self, fds: Vec<i32>) { | ||
| if !fds.is_empty() { | ||
| self.fds = Some(fds); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need to at least ensure the number of incoming FDs is as expected. The original way is to check with the "deserailized" fds.len(). As I am reviewing your PR, I realized you can simply check with self.num_queues/2, e.g. the number of FDs should be the same as queue pairs.
| } | ||
| } | ||
|
|
||
| pub fn consume_fds(&mut self, fds: Vec<i32>) -> Result<(), PayloadConfigError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think consume_fds and i32 are fine. Detail comments above.
Of course, we need to add good documentation to such pub fn.
| } | ||
| }; | ||
|
|
||
| if external_fds.ids.len() != fds.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the design of the proposed API syntax. I agree it should be explicitly explained, say from the commit message.
|
Thanks for the review! I'm converting this PR into a draft/WIP, as we are currently doing a largish refactor related to FD passing: basically, we need to pass pretty much every guest VM resource (guest memory, the kernel, initramfs, vfio devices, etc.) to the VMM as FDs, and as such it probably makes sense for us to wait until we have the full picture of what and how before we settle on the API. This should not prevent anyone else from driving this effort, of course. But if the final API turns out to be not the best because it was put in place too soon, I'd prefer to be the consumer of it rather than the author... :) |
|
Feel free to present and discuss your early designs in the Cloud Hypervisor office hour meeting: that might also be a good chance to discuss motivation and if and now this can be upstreamed! I'd love to be part of that discussion. From my experience, this helps a lot for large and potentially controversial changes 👍 |
As discussed in PR #7377, Cloud Hypervisor may potentially accept different types of FDs on VM creation and/or resuming:
As FDs are passed as an array of anonymous numbers over UDS sockets, there should be a way to assign them to the devices they belong to.
This PR introduces ExternalFdsConfig in vm_config.rs, which matches, positionally, the list of device IDs/names to the list of FDs.
Signed-off-by: Peter Oskolkov [email protected]