In this episode, we cover the following topics:
- Operating-system-level virtualization = containers
- Allows the resources of a computer to be partitioned via the kernel
- All containers share a single kernel with each other AND the host system
- Depend on their host OS to do all the communication and interaction with the physical machine
- Containers don't need a hypervisor; they run directly within the host machine's kernel
- Containers use the underlying operating system's resources and drivers
- This is why you cannot run containers built for a different OS on the same host system
- i.e. Windows containers can run on Windows only, and Linux containers can run on Linux only
- What we think of as different OSes (RHEL, CentOS, SUSE, Debian, Ubuntu) are not really different...
- They all share the same core OS (the Linux kernel); they differ only in their apps/files
- Based on the virtualization, isolation, and resource management mechanisms provided by the Linux kernel
- Container history
- FreeBSD Jails (2000)
- BSD userland software that runs on top of the chroot(2) system call
- chroot is used to change the root directory of a set of processes
- Processes created in the chrooted environment cannot access files or resources outside of it
- Jails virtualize access to the file system, the set of users, and the networking subsystem
- A jail is characterized by four elements:
- Directory subtree: the starting point from which a jail is entered
- Once inside the jail, a process is not permitted to escape outside of this subtree
- Hostname
- IP address
- Command: the path name of an executable to run inside the jail
- Configured via jail.conf file
- LXC containers (2008)
- Userspace interface for the Linux kernel features to contain processes, including:
- Kernel namespaces (ipc, uts, mount, pid, network and user)
- AppArmor and SELinux profiles
- Seccomp policies
- Chroots (using pivot_root)
- Kernel capabilities
- cgroups (control groups)
- Docker containers (2014)
- Early versions of Docker used LXC as the container runtime
- LXC was made optional in v0.9 (March 2014)
- Replaced by libcontainer
- libcontainer became the core of runC
- LXC was dropped in v1.10 (February 2016)
- Container technology
- Containers are just processes. So what makes them special?
- Namespaces
- Restrict what you can SEE
- Virtualize system resources, like the file system or networking
- Make it appear to processes within the namespace that they have their own isolated instance of the resource
- Changes to the resource are visible only to processes that are members of the namespace
- Processes inherit their namespaces from their parent
- Linux provides the following namespaces:
- IPC (interprocess communications)
- CLONE_NEWIPC: Isolates System V IPC, POSIX message queues
- Network
- CLONE_NEWNET: Isolates network devices, stacks, ports, etc
- Mount
- CLONE_NEWNS: Isolates mount points
- PID
- CLONE_NEWPID: Isolates process IDs
- User
- CLONE_NEWUSER: Isolates user and group IDs
- UTS (Unix Timesharing System)
- CLONE_NEWUTS: Isolates hostname and NIS domain name
- Cgroup
- CLONE_NEWCGROUP: Isolates cgroup root directory
- Syscall interface
- A system call is the fundamental interface between an app and the Linux kernel
- e.g. clone(2), unshare(2), and setns(2) create or enter namespaces for processes
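To make the namespace list above concrete, here is a minimal sketch that prints which namespaces the current process belongs to. It assumes a Linux host with `/proc` mounted; the helper name `current_namespaces` is made up for illustration. Each entry under `/proc/<pid>/ns` is a symbolic link whose target names the namespace type and an identifier, and two processes sharing a namespace see the same target.

```python
import os

# Namespace entries the kernel exposes under /proc/<pid>/ns.
NAMESPACE_TYPES = ("cgroup", "ipc", "mnt", "net", "pid", "user", "uts")


def current_namespaces(pid="self"):
    """Map namespace type -> link target (e.g. 'net:[4026531992]') for a process."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in NAMESPACE_TYPES:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Entry missing (older kernel, non-Linux host) or not readable.
            continue
    return result


for name, ident in current_namespaces().items():
    print(f"{name:8} {ident}")
```

Running this on the host and again inside a container should show different identifiers for the namespaces the container runtime has unshared.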
- Control groups (cgroups)
- Restrict what you can DO
- Limits an application (container) to a specific set of resources like CPU and memory
- Allow containers to share available hardware resources and optionally enforce limits and constraints
- Creating, modifying, using cgroups is done through the cgroup virtual filesystem
- Processes inherit their cgroup membership from their parent
- Can be reassigned to different cgroups
- Memory
- CPU / CPU cores
- Devices
- I/O
- Processes
- Using cgroups
- To see mounted cgroups:
- To create a new cgroup:
- mkdir /sys/fs/cgroup/cpu/chris
- To set "cpu.shares" to 512:
- echo 512 > /sys/fs/cgroup/cpu/chris/cpu.shares
- Now add a process to this cgroup:
- echo <get_pid> > /sys/fs/cgroup/cpu/chris/cgroup.procs
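The cgroup steps above can also be scripted. The following is a rough sketch, not a definitive tool: the function names (`cgroup_mounts`, `create_cgroup`, `write_control`) are hypothetical, and it assumes the cgroup v1 layout used in the commands above (`/sys/fs/cgroup/cpu`). Listing mounted cgroups is done here by filtering text in `/proc/mounts` format, which is one possible way to do it.

```python
import os


def cgroup_mounts(mounts_text):
    """Return (mount_point, fs_type) for cgroup filesystems in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] in ("cgroup", "cgroup2"):
            found.append((fields[1], fields[2]))
    return found


def create_cgroup(controller_root, name):
    """Create a cgroup directory (the equivalent of mkdir) and return its path."""
    path = os.path.join(controller_root, name)
    os.makedirs(path, exist_ok=True)
    return path


def write_control(cgroup_path, filename, value):
    """Write a value to a cgroup control file (the equivalent of echo value > file)."""
    with open(os.path.join(cgroup_path, filename), "w") as handle:
        handle.write(f"{value}\n")
```

As root on a cgroup v1 host, the commands above would roughly correspond to `group = create_cgroup("/sys/fs/cgroup/cpu", "chris")`, then `write_control(group, "cpu.shares", 512)` and `write_control(group, "cgroup.procs", os.getpid())`. Passing the controller root as a parameter keeps the helpers usable against a scratch directory for experimentation.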
- Pseudo code: Creating a container
- Steps:
- Create root filesystem for container
- Spin up busybox in Docker container, and then export filesystem
- Run "launcher" process that sets up "child" namespace
- Launcher process forks new child process (now under new namespaces)
- Child process then forks new process for container
- chroot (to our root filesystem)
- mount any other FS
- set cgroups (e.g. apply CPU constraints)
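The steps above can be sketched in Python using `ctypes` to reach the `unshare(2)` and `mount(2)` system calls. This is a simplified illustration under several assumptions, not the exact launcher discussed in the episode: it uses a single fork instead of two, `run_container` and its parameters are hypothetical names, the root filesystem is assumed to contain a `/proc` directory, and it must run as root on Linux. The flag values are the ones defined in the Linux headers.

```python
import ctypes
import os
import sys

# Namespace flags as defined in <sched.h>.
CLONE_NEWNS = 0x00020000
CLONE_NEWUTS = 0x04000000
CLONE_NEWIPC = 0x08000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000
NAMESPACE_FLAGS = (
    CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNET
)

# Mount flags as defined in <sys/mount.h>.
MS_REC = 0x4000
MS_PRIVATE = 0x40000


def _load_libc():
    """Load libc lazily so that importing this sketch has no side effects."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = [
        ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
        ctypes.c_ulong, ctypes.c_void_p,
    ]
    return libc


def _check(ret, what):
    """Raise OSError with errno details if a libc call failed."""
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, f"{what}: {os.strerror(err)}")


def run_container(rootfs, argv, cgroup_dir=None):
    """Run argv inside new namespaces, chrooted into rootfs. Requires root."""
    libc = _load_libc()
    # Launcher: set up the namespaces that the child will live in.
    _check(libc.unshare(NAMESPACE_FLAGS), "unshare")
    # Keep mounts made from here on from propagating back to the host.
    _check(libc.mount(b"none", b"/", None, MS_REC | MS_PRIVATE, None),
           "make mounts private")
    pid = os.fork()
    if pid == 0:
        # Child: the first process in the new PID namespace.
        try:
            os.chroot(rootfs)
            os.chdir("/")
            _check(libc.mount(b"proc", b"/proc", b"proc", 0, None),
                   "mount /proc")
            os.execv(argv[0], argv)
        except OSError as exc:
            print(f"container setup failed: {exc}", file=sys.stderr)
        os._exit(127)
    if cgroup_dir is not None:
        # Parent: place the child in an existing cgroup to apply its limits.
        with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as handle:
            handle.write(str(pid))
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exit_code(status)
```

A call such as `run_container("/tmp/rootfs", ["/bin/sh"], "/sys/fs/cgroup/cpu/chris")` would start a shell from an exported busybox filesystem; the `/tmp/rootfs` path is only an example. Because the parent adds the child to the cgroup after the fork, the limits may take effect a moment after the child starts, which is acceptable for a demonstration but not for a real runtime.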
End Song
Bettie Black & Sophia - Something Beautiful