Skip to content

Conversation

@posk-io
Copy link
Contributor

@posk-io posk-io commented Oct 23, 2025

As discussed in PR #7377, Cloud Hypervisor may potentially accept different types of FDs on VM creation and/or resuming:

  • FDs for net devices (currently supported);
  • FDs for pmem devices for memory zones (WIP);
  • FDs for storage devices, etc.

As FDs are passed as an array of anonymous numbers over UDS sockets, there should be a way to assign them to the devices they belong to.

This PR introduces ExternalFdsConfig in vm_config.rs, which matches, positionally, the list of device IDs/names to the list of FDs.

Signed-off-by: Peter Oskolkov [email protected]

@posk-io posk-io requested a review from a team as a code owner October 23, 2025 19:21
@posk-io
Copy link
Contributor Author

posk-io commented Oct 23, 2025

If the general approach here is acked, I'll be happy to add more tests, if needed/requested.

As discussed in PR cloud-hypervisor#7377, Cloud Hypervisor may potentially accept
different types of FDs on VM creation and/or resuming:
- FDs for net devices (currently supported);
- FDs for pmem devices for memory zones (WIP);
- FDs for storage devices, etc.

As FDs are passed as an array of anonymous numbers over UDS sockets,
there should be a way to assign them to the devices they belong to.

This PR introduces ExternalFdsConfig in vm_config.rs, which
matches, positionally, the list of device IDs/names to the list
of FDs.

Signed-off-by: Peter Oskolkov <[email protected]>
Copy link
Member

@phip1611 phip1611 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! In general, I love to have a generic way to solve this. I however see lots of caveats and room for improvement in the implementation as well as a few open design discussions. I also have to run this in our (= @cyberus-technology) internal test suite and test live migration with virtio-net devices backed by external FDs.

My review here is very comprehensive and strict cause we investigated quite some time to dive into this domain. It complex with many caveats. Together we can come up with a nice solution, tho!

My Checklist

  • Tested that this will be live-migration compatible for virtio-net

let fds = match &mut restore_config.external_fds {
Some(external_fds) => {
let mut fds = vec![];
core::mem::swap(&mut fds, &mut external_fds.fds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is weird to use core::mem::swap here. Why not net.fds.replace(external_fds.fds)? Feels much more idiomatic to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you need is std::mem::take(&mut external_fds.fds).

Essentially does the same (taking ownership of the inner Vec<i32>), but more concise and readable.

landlock_rules: None,
#[cfg(feature = "ivshmem")]
ivshmem: None,
external_fds: None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not about this line but there is a typo in the commit message: I think UDS is not a common appreviation for Unix domain sockets? Could you please be a little more verbose here and write something like:

FDs are passed as an array of .... via UNIX Domain Sockets using the SCM_RIGHTS mechanism. or so?

}
}

pub fn consume_fds(&mut self, fds: Vec<i32>) -> Result<(), PayloadConfigError> {
Copy link
Member

@phip1611 phip1611 Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please rename this to apply_external_fds_to_components or similar descriptive
  2. this could be IntoIterator<Item = RawFd> which feels more idiomatic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think consume_fds and i32 are fine. Detail comments above.

Of course, we need to add good documentation to such pub fn.

}
}

pub fn consume_fds(&mut self, fds: Vec<i32>) -> Result<(), PayloadConfigError> {
Copy link
Member

@phip1611 phip1611 Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a complex mechanism so we should add proper documentation to the underlying mechanism. The mechanism is complex enough to write a few sentences if not paragraphs about it :)

  • explain the rather complex overall mechanism
  • explain constraints for developers for their public APIs
  • explain how FDs are associated with components in a VM
  • what else might be interesting

Or in other words. We should not worsen the situation that Cloud Hypervisor code is undocumented. New functionality especially for complex stuff should have documentation. I don't say must as I'm not a maintainer, yet I'm a major contributor :)

return Ok(());
};

for net in nets {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here you basically hard code the order in that FDs are applied? You start with the net devices and future components (memory, virtio-blk) will follow below?

}

impl NetConfig {
pub fn consume_fds(&mut self, fds: Vec<i32>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RawFd

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, documentation is missing. Further, I think the the name is not optimal. Would replace_fds be a more matching name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we need to add documentation around this pub fn.

The naming looks fine to me - it is good to stay consistent with the consume_fds provided from VmConfig, RestoreConfig, etc.

Perhaps, it is good to use a trait to unify such behavior across different data structure along with some common sanity checks and logging. Just a thought.


impl NetConfig {
pub fn consume_fds(&mut self, fds: Vec<i32>) {
if !fds.is_empty() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this check, why not a simple replace? I can't think of a valid use case where we keep the old self.fds here - can you? :)

.map(|f| f.try_clone().unwrap().into_raw_fd())
.collect();

net_cfg.consume_fds(fds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attach_fds_to_cfg() did some sanity checks and helpful logging, which you removed now. This is a little unfortunate and I think we can do better here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to handle error from consume_fds(), the same as what's being done for VmCreate.

Also, I agree the (rest of) sanity checks and helpful logging should be added to consume_fds() if applicable.

continue;
};
let mut fds = vec![];
for idx in 0..external_fds.ids.len() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very C-style way to iterate. I think this can be replaced with for (id, idx) in 0..external_fds.ids.iter().enumerate()

#[error("Cannot restore VM")]
Restore(#[source] MigratableError),

#[error("Invalid external FDs")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this error message is not helpful; please add more context. Currently, this originates from the consume() path. Why can'T they be consumed? Is there a mismatch?


pub fn consume_fds(&mut self, fds: Vec<i32>) -> Result<(), PayloadConfigError> {
// Update self.external_fds, if appropriate.
let Some(external_fds) = self.external_fds.as_mut() else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this reads a bit cumbersome. How about a more idiomatic version?

self.external_fds.as_mut()
    .ok_or_else(|| if fds.is_empty() {
        return Ok(());
    } else {
        PayloadConfigError::FdCountMismatch
    })?;

}
};

if external_fds.ids.len() != fds.len() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does the number of provided component (device) IDs must match the number of FDs?! This sounds wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the design of the proposed API syntax. I agree it should be explicitly explained, say from the commit message.

parser
.add("source_url")
.add("prefault")
.add("external_ids")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.add("external_ids")
.add("external_fds_ids")

more verbose but much clearer. The other name is very weird and hard to get a meaning from

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current naming reads okay to me. If we want to be more verbose, I'd go with external_device_ids and external_device_fds.

Copy link
Member

@likebreath likebreath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@posk-io Thank you for the contribution. I like the proposed refactoring. Detailed comments below. Please also don't forget to update the doc: https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/snapshot_restore.md#restore-a-vm-with-new-net-fds.

let fds = match &mut restore_config.external_fds {
Some(external_fds) => {
let mut fds = vec![];
core::mem::swap(&mut fds, &mut external_fds.fds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you need is std::mem::take(&mut external_fds.fds).

Essentially does the same (taking ownership of the inner Vec<i32>), but more concise and readable.

parser
.add("source_url")
.add("prefault")
.add("external_ids")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current naming reads okay to me. If we want to be more verbose, I'd go with external_device_ids and external_device_fds.

.map(|f| f.try_clone().unwrap().into_raw_fd())
.collect();

net_cfg.consume_fds(fds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to handle error from consume_fds(), the same as what's being done for VmCreate.

Also, I agree the (rest of) sanity checks and helpful logging should be added to consume_fds() if applicable.

Comment on lines +305 to +307
restore_cfg
.consume_fds(fds)
.map_err(|_| HttpError::BadRequest)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto: Let's not lose and log the error being raised from consume_fds.

pub struct ExternalFdsConfig {
pub ids: Vec<String>,
#[serde(default)]
pub fds: Vec<i32>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think i32 is fine here, particularly comparing with using RawFd. Such data structure (and all config related ones) is mostly consumed by external APIs for accepting user inputs, either via CLI or HTTP requests. A primitive data type fits better for such role. Sanity checks is now done when such data is consumed by internal APIs, say from config.validate() etc.

As for a bigger future refactoring (in a separate PR), i think using File is a better choice to enforce stricter semantics directly at the data structure level. This works out-of box with external APIs for serving HTTP requests, while changes needed for parsing CLI inputs.

#[derive(Clone, Debug, PartialEq, Eq, Deserialize, Serialize)]
pub struct ExternalFdsConfig {
pub ids: Vec<String>,
#[serde(default)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should #[serde(skip)] for the fds. The main reason why we needed it in the original API syntax is using its length is matching with number of the incoming FDs sent over SCM_RIGHTS. We now have such information from ids.len(). (The other part of the reason is for logging and user introspection purposes).

This also removes the need to implement custom deserializer for ExternalFdsConfig (like deserialize_netconfig_fds for NetConfig).

}

impl NetConfig {
pub fn consume_fds(&mut self, fds: Vec<i32>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we need to add documentation around this pub fn.

The naming looks fine to me - it is good to stay consistent with the consume_fds provided from VmConfig, RestoreConfig, etc.

Perhaps, it is good to use a trait to unify such behavior across different data structure along with some common sanity checks and logging. Just a thought.

Comment on lines +358 to +363
pub fn consume_fds(&mut self, fds: Vec<i32>) {
if !fds.is_empty() {
self.fds = Some(fds);
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to at least ensure the number of incoming FDs is as expected. The original way is to check with the "deserailized" fds.len(). As I am reviewing your PR, I realized you can simply check with self.num_queues/2, e.g. the number of FDs should be the same as queue pairs.

}
}

pub fn consume_fds(&mut self, fds: Vec<i32>) -> Result<(), PayloadConfigError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think consume_fds and i32 are fine. Detail comments above.

Of course, we need to add good documentation to such pub fn.

}
};

if external_fds.ids.len() != fds.len() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the design of the proposed API syntax. I agree it should be explicitly explained, say from the commit message.

@posk-io posk-io marked this pull request as draft November 6, 2025 21:47
@posk-io
Copy link
Contributor Author

posk-io commented Nov 6, 2025

Thanks for the review! I'm converting this PR into a draft/WIP, as we are currently doing a largish refactor related to FD passing: basically, we need to pass pretty much every guest VM resource (guest memory, the kernel, initramfs, vfio devices, etc.) to the VMM as FDs, and as such it probably makes sense for us to wait until we have the full picture of what and how before we settle on the API.

This should not prevent anyone else from driving this effort, of course. But if the final API turns out to be not the best because it was put in place too soon, I'd prefer to be the consumer of it rather than the author... :)

@posk-io posk-io changed the title vmm: refactor FD passing WIP: vmm: refactor FD passing Nov 6, 2025
@phip1611
Copy link
Member

phip1611 commented Nov 7, 2025

Feel free to present and discuss your early designs in the Cloud Hypervisor office hour meeting: that might also be a good chance to discuss motivation and if and now this can be upstreamed! I'd love to be part of that discussion.

From my experience, this helps a lot for large and potentially controversial changes 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants