Let me say this up front: although the title says unshare(2), I actually use clone(2).
Building my own containers has been a hobby of mine, so I tried writing the code around Linux namespaces, the basic building block of containers, in Rust.
The ingredient I need is libc::clone(), the wrapper around clone(2), but even after looking at its definition I couldn't see how to use it...
```rust
pub unsafe extern "C" fn clone(
    cb: extern "C" fn(_: *mut c_void) -> c_int,
    child_stack: *mut c_void,
    flags: c_int,
    arg: *mut c_void,
    _: ...
) -> c_int
```
- How am I even supposed to produce an extern "C" fn(_: *mut c_void) -> c_int?
- The stack is also a *mut c_void, but how do I make one of those? Do I have to call mmap? I'd rather not...
- When you pass a stack to clone(2) from C, you have to pass the address of its end, because the stack grows from the end toward the beginning. That takes pointer arithmetic, and I'd have to do that here too...
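For reference, the pointer arithmetic from the last point looks roughly like this in Rust. This is a sketch, not nix's code: stack_top is a hypothetical helper, and the 16-byte alignment is an assumption based on the x86-64/AArch64 ABIs (as far as I can tell it is also what nix::sched::clone does internally). A heap-allocated Vec<u8> works as the stack, so no mmap is needed, at the cost of having no guard page.

```rust
/// Returns a pointer to the end of `stack`, rounded down to a 16-byte boundary,
/// i.e. the address clone(2) wants on architectures where the stack grows down.
fn stack_top(stack: &mut [u8]) -> *mut u8 {
    assert!(stack.len() >= 16, "stack too small");
    // SAFETY: computing the one-past-the-end pointer of a slice is allowed.
    let end = unsafe { stack.as_mut_ptr().add(stack.len()) };
    // Round down to a multiple of 16 (assumed ABI stack alignment).
    let misalign = end as usize % 16;
    // SAFETY: misalign < 16 <= len, so the result stays inside the slice.
    unsafe { end.sub(misalign) }
}

fn main() {
    // A heap-allocated stack; 1 MiB is an arbitrary size for this sketch.
    let mut stack = vec![0u8; 1024 * 1024];
    let base = stack.as_ptr() as usize;
    let top = stack_top(&mut stack) as usize;
    println!("base = {:#x}, top = {:#x}, usable = {} bytes", base, top, top - base);
}
```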
Just as my head started to hurt, I switched to nix::sched::clone().
Reading its code taught me a few things: it turns a u8 slice into something usable as the stack region, and the callback is actually passed as a void* in the fourth argument. I see... that's because it's a closure...
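To see why the callback can travel through a void*, here is a std-only sketch of that trick: a boxed closure is handed over as an opaque pointer, and an extern "C" trampoline casts it back and calls it. The names Callback, trampoline, call_like_c and run_closure are all hypothetical; call_like_c merely stands in for the C side and does not create any process. Callback is modeled on (not copied from) nix's boxed FnMut callback type.

```rust
use std::ffi::c_void;
use std::os::raw::c_int;

// Boxed closure type, modeled on the kind of callback nix's clone accepts.
type Callback<'a> = Box<dyn FnMut() -> isize + 'a>;

// Has exactly the shape libc::clone expects for `cb`: it recovers the
// closure from the opaque `arg` pointer and calls it.
extern "C" fn trampoline(arg: *mut c_void) -> c_int {
    let cb = unsafe { &mut *(arg as *mut Callback) };
    (*cb)() as c_int
}

// Hypothetical stand-in for the C side: it only sees a function pointer
// and a void*, just like clone(2) does. No new process is created here.
fn call_like_c(cb: extern "C" fn(*mut c_void) -> c_int, arg: *mut c_void) -> c_int {
    cb(arg)
}

fn run_closure(mut cb: Callback) -> c_int {
    // Pass a pointer *to the Box* (a thin pointer) as the void* argument.
    call_like_c(trampoline, &mut cb as *mut Callback as *mut c_void)
}

fn main() {
    let greeting = String::from("hello from the closure");
    let code = run_closure(Box::new(|| {
        println!("{}", greeting);
        42
    }));
    println!("callback returned {}", code);
}
```

Passing &mut cb (a pointer to the Box) rather than the Box itself matters: a Box<dyn FnMut> is a fat pointer and does not fit in a void*, while a pointer to it does.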
Anyway, this is what I use to separate the namespaces.
The actual code
At a bare minimum, I think it looks something like this. In my usual style, the use lines are boldly omitted.
```rust
fn main() -> MyResult {
    // Runs in the child: replace the process image with bash
    let cb = Box::new(|| {
        let cmd = CString::new("bash").unwrap();
        let args = vec![
            CString::new("containered bash").unwrap(), // argv[0], shows up in ps
            CString::new("-l").unwrap(),
        ];
        if let Err(e) = execvp(&cmd, &args) {
            eprintln!("execvp failed: {:?}", e);
            return 127;
        }
        127
    });
    let mut child_stack = [0u8; 8192];
    let flags = CloneFlags::CLONE_NEWNS
        | CloneFlags::CLONE_NEWUTS
        | CloneFlags::CLONE_NEWIPC
        | CloneFlags::CLONE_NEWPID;
    let sigchld = 17; // x86/arm. ref man 7 signal
    let _pid = clone(cb, &mut child_stack, flags, Some(sigchld))?;
    while let Ok(status) = waitpid(None, None) {
        println!("Exit Status: {:?}", status);
    }
    Ok(())
}
```
As written, a few points to note: child_stack is just an ordinary u8 array, its size was picked by feel, and I looked up the SIGCHLD value myself*1.
This does start bash inside the container, but /proc hasn't been freshly mounted and there's no chroot either, so it still sees too much of the host and doesn't really feel isolated.
Adding the minimum necessary setup
Next, do the bare minimum of setup needed to make it feel like a container.
Issuing the equivalent of mount --make-rprivate /
First: even though the Mount Namespace is separated, on systemd-managed Linux distros the host's / is shared by default, so as things stand, changes made in the separated namespace propagate to the host. TenForward's write-up covers this topic in detail, so I'll defer to it...
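If you want to check this on your own machine, the propagation state shows up in the optional fields of /proc/self/mountinfo (a shared:N tag means the mount is shared). The following std-only sketch pulls those tags out for a given mount point; propagation_tags is a hypothetical helper and assumes the field layout documented in proc(5).

```rust
use std::fs;

/// Returns the propagation tags (e.g. "shared:1", "master:3") of the mount at
/// `target`. Field layout per proc(5): ID, parent ID, major:minor, root,
/// mount point, mount options, optional fields..., "-", fstype, source, options.
fn propagation_tags(mountinfo: &str, target: &str) -> Option<Vec<String>> {
    // Keep the last match, which is normally the topmost mount on that point.
    let mut found = None;
    for line in mountinfo.lines() {
        let fields: Vec<&str> = line.split_whitespace().collect();
        if fields.len() < 7 || fields[4] != target {
            continue;
        }
        // Optional fields run from index 6 up to the "-" separator.
        let tags: Vec<String> = fields[6..]
            .iter()
            .take_while(|f| **f != "-")
            .map(|f| f.to_string())
            .collect();
        found = Some(tags);
    }
    found
}

fn main() -> std::io::Result<()> {
    let info = fs::read_to_string("/proc/self/mountinfo")?;
    match propagation_tags(&info, "/") {
        Some(tags) if tags.iter().any(|t| t.starts_with("shared:")) => {
            println!("/ is shared ({:?}); changes would propagate", tags)
        }
        Some(tags) => println!("/ is not shared ({:?})", tags),
        None => println!("/ not found in mountinfo"),
    }
    Ok(())
}
```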
So we run mount --make-rprivate /. I think this is a familiar step in the build-your-own-container crowd (?). This time I do it by issuing the system call through nix::mount::mount().
```rust
fn mount_make_private() -> Result<(), nix::Error> {
    mount(
        Some("none"),
        "/",
        None::<&str>, // a bare None can't be type-inferred...
        MsFlags::MS_REC | MsFlags::MS_PRIVATE,
        None::<&str>,
    )
}
```
Creating the root filesystem to chroot into
Then, since I want the root the container uses to be separate from the host's root, I quickly put together the chroot target with a bind mount.
```rust
fn mount_bind(source: &str, target: &str) -> Result<(), nix::Error> {
    mount(
        Some(source),
        target,
        None::<&str>,
        MsFlags::MS_BIND,
        None::<&str>,
    )
}

// How it gets used
create_dir(root)?;
mount_bind("/", root)?;
```
Mounting procfs
Likewise, pass the appropriate arguments to mount(): the "proc" filesystem type and no flags.
```rust
fn mount_proc(source: &str, target: &str) -> Result<(), nix::Error> {
    mount(
        Some(source),
        target,
        Some("proc"),
        MsFlags::empty(),
        None::<&str>,
    )
}
```
All that's left is to call nix::unistd::chroot() and std::env::set_current_dir() in the right order.
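```rust
type MyResult = Result<(), Box<dyn std::error::Error>>;

fn container_prelude(root: &str) -> MyResult {
    mount_make_private()?;        // keep mount changes from propagating to the host
    create_dir(root)?;            // directory that will become the new /
    mount_bind("/", root)?;       // bind-mount the host's / onto it
    chroot(root)?;
    set_current_dir("/")?;        // move cwd inside the new root
    mount_proc("proc", "/proc")?; // fresh procfs for the new PID namespace
    Ok(())
}
```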
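As an aside, newer Rust (1.56 and later) has std::os::unix::fs::chroot, so the chroot step can also be written without nix. In this sketch (enter_root is a hypothetical helper), set_current_dir("/") comes right after chroot because chroot(2) does not change the working directory; without it, cwd would still point outside the new root.

```rust
use std::env::set_current_dir;
use std::io;
use std::os::unix::fs::chroot;
use std::path::Path;

/// chroot into `new_root`, then move the working directory inside it.
/// Needs root (CAP_SYS_CHROOT).
fn enter_root(new_root: &Path) -> io::Result<()> {
    chroot(new_root)?;
    // chroot(2) leaves cwd untouched, so "." could still be outside the new /.
    set_current_dir("/")?;
    Ok(())
}

fn main() {
    let target = match std::env::args().nth(1) {
        Some(t) => t,
        None => {
            eprintln!("usage: enter-root <newroot>");
            return;
        }
    };
    match enter_root(Path::new(&target)) {
        Ok(()) => println!("now inside {}", target),
        Err(e) => eprintln!("chroot failed: {}", e),
    }
}
```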
Calling container_prelude() after the fork (clone) and just before the exec means bash starts up in an environment that has already been set up as a container.
The whole main
```rust
use nix::mount::*;
use nix::sched::*;
use nix::sys::wait::waitpid;
use nix::unistd::{chroot, execvp};
use std::env::{args, set_current_dir};
use std::ffi::CString;
use std::fs::{create_dir, remove_dir};

type MyResult = Result<(), Box<dyn std::error::Error>>;

// The functions shown above are omitted here
// ...

fn main() -> MyResult {
    let usage = format!("Usage: {} [newroot]", args().nth(0).unwrap());
    let root = args().nth(1).ok_or(usage)?;

    let cb = Box::new(|| {
        // Set up the container environment between clone and exec
        if let Err(e) = container_prelude(&root) {
            eprintln!("prelude failed: {:?}", e);
            return 127;
        }
        let cmd = CString::new("bash").unwrap();
        let args = vec![
            CString::new("containered bash").unwrap(),
            CString::new("-l").unwrap(),
        ];
        if let Err(e) = execvp(&cmd, &args) {
            eprintln!("execvp failed: {:?}", e);
            return 127;
        }
        127
    });
    let mut child_stack = [0u8; 8192];
    let flags = CloneFlags::CLONE_NEWNS
        | CloneFlags::CLONE_NEWUTS
        | CloneFlags::CLONE_NEWIPC
        | CloneFlags::CLONE_NEWPID;
    let sigchld = 17; // x86/arm. ref man 7 signal
    let _pid = clone(cb, &mut child_stack, flags, Some(sigchld))?;
    while let Ok(status) = waitpid(None, None) {
        println!("Exit Status: {:?}", status);
    }

    // Clean up, just in case
    remove_dir(&root)?;
    Ok(())
}
```
What it looks like in action
```
$ sudo ../target/debug/minimum-container-example /tmp/foobar-1
root@ubuntu-groovy:/# mount
/dev/sda1 on / type ext4 (rw,relatime)
proc on /proc type proc (rw,relatime)
root@ubuntu-groovy:/# ps auxf
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.1  0.0  10008  5072 ?        S    09:49   0:00 containered bash -l
root          13  0.0  0.0  11476  3560 ?        R+   09:49   0:00 ps auxf
root@ubuntu-groovy:/# exit
logout
Exit Status: Exited(Pid(198843), 0)
```
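In the session above, ps auxf only sees PID 1 and ps itself, which is the new procfs at work. The same check can be done without ps by collecting the numeric directories under /proc; visible_pids is a made-up helper for this std-only sketch.

```rust
use std::fs;
use std::io;

/// Lists the PIDs visible in the procfs mounted at `proc_root`, sorted.
fn visible_pids(proc_root: &str) -> io::Result<Vec<u32>> {
    let mut pids: Vec<u32> = fs::read_dir(proc_root)?
        .filter_map(|entry| entry.ok())
        // Only purely numeric directory names are processes.
        .filter_map(|entry| entry.file_name().to_str().and_then(|s| s.parse::<u32>().ok()))
        .collect();
    pids.sort_unstable();
    Ok(pids)
}

fn main() -> io::Result<()> {
    let pids = visible_pids("/proc")?;
    println!("{} processes visible: {:?}", pids.len(), pids);
    Ok(())
}
```

Run inside the container, this should list only a handful of small PIDs; on the host it lists everything.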
Things I didn't do this time
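- cgroups: that's just directory and file operations, so I'll do it some other time when I feel like it.
- Beyond that, I haven't done the kind of thing libcap or libseccomp would handle for now. I think it's doable via FFI. There are also finer details, like calling setsid()...
- Networking should also be doable, since nix has a function equivalent to setns. For creating a netns, there is for example a crate that apparently does pyroute2-equivalent things out of the box, so creating a netns and veth pairs should be possible too. I haven't fully looked into it, though...
- I don't have the energy to take this as far as an implementation that follows the OCI spec...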
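As the first point says, cgroups really are just directory and file operations. A minimal sketch, assuming a cgroup v2 (unified) hierarchy mounted at /sys/fs/cgroup with the pids controller enabled for the parent group: create a directory, write a limit, and write the PID into cgroup.procs. setup_cgroup and the group name are hypothetical; in this article's setup, pid would be the value returned by clone.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Create cgroup `name` under `cgroup_root`, cap its task count, and move
/// `pid` into it. Assumes cgroup v2 with the pids controller available.
fn setup_cgroup(cgroup_root: &Path, name: &str, pids_max: u32, pid: u32) -> io::Result<PathBuf> {
    let dir = cgroup_root.join(name);
    fs::create_dir_all(&dir)?; // mkdir creates a new cgroup
    fs::write(dir.join("pids.max"), pids_max.to_string())?; // limit number of tasks
    fs::write(dir.join("cgroup.procs"), pid.to_string())?; // attach the process
    Ok(dir)
}

fn main() {
    // Hypothetical group name; needs root and a writable cgroup v2 mount.
    let root = Path::new("/sys/fs/cgroup");
    match setup_cgroup(root, "minimum-container-example", 64, std::process::id()) {
        Ok(dir) => println!("joined {}", dir.display()),
        Err(e) => eprintln!("cgroup setup failed: {}", e),
    }
}
```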
I've put the code here.
*1: There is probably a more appropriate way to get this value (libc::SIGCHLD from the libc crate, for example).