Learning about containers and how they work by creating them the hard way

Overview

Containers the hard way: Gocker: A mini Docker written in Go

A container is an illusion created by a set of Linux operating system primitives. A process, or a set of processes, can shed its default environment and live in new namespaces of its own, separate from the host's default namespaces. Container management systems like Docker make it incredibly easy to manage containers on your machine. But how are these containers constructed? At the most basic level, it is just a sequence of Linux system calls (mainly involving namespaces and cgroups), combined with other existing Linux technologies for the container file system, networking, etc.

What is Gocker?

Gocker is an implementation from scratch of the core functionalities of Docker in the Go programming language. The main aim here is to provide an understanding of how exactly containers work at the Linux system call level. Gocker allows you to create containers, manage container images, execute processes in existing containers, etc.

Gocker explanation

How Gocker works is explained at the Linux system call level on the Unixism blog. If you are interested in that level of detail, please read it.

Why Gocker?

When I came across bocker, which is a Docker-like container management system written in Bash shell script, I found two problems with it:

  • Bocker uses various Linux utilities. While you get the point, command line utilities are opaque, and you don't get to understand what they are doing at the Linux system call level. Also, a single command can sometimes issue more than one pertinent system call.
  • Bocker's last commit was more than 5 years ago, and it no longer works. Docker Hub API changes seem to have broken it.

Gocker, on the other hand, is pure Go source code, which allows you to see exactly what goes on at the Linux system call level. This should give you a much better understanding of how containers actually work.

Don't get me wrong here. Bocker is still a fantastic and very creatively written tool. If you want to understand how containers work, you should still take a look at it and I'm confident you'll learn a thing or two from it, just like I did.

Gocker capabilities

Gocker can emulate the core of Docker, letting you manage Docker images (which it gets from Docker Hub), run containers, list running containers or execute a process in an already running container:

  • Run a process in a container
    • gocker run <--cpus=cpus-max> <--mem=mem-max> <--pids=pids-max> <image[:tag]> </path/to/command>
  • List running containers
    • gocker ps
  • Execute a process in a running container
    • gocker exec <container-id> </path/to/command>
  • List locally available images
    • gocker images
  • Remove a locally available image
    • gocker rmi <image-id>
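
As a small illustration of the <image[:tag]> argument above, here is a minimal sketch of how such a reference could be split into a name and a tag. The function name splitImageRef is hypothetical and this is not necessarily how Gocker parses it; the "latest" default is inferred from the session log, where "alpine" is resolved as "alpine:latest".

```go
package main

import (
	"fmt"
	"strings"
)

// splitImageRef splits "name[:tag]" into its name and tag.
// When no tag is given, it falls back to "latest".
func splitImageRef(ref string) (name, tag string) {
	// Only treat a ":" as the tag separator if it comes after the last "/",
	// so that a registry port such as "localhost:5000/alpine" is not
	// mistaken for a tag.
	slash := strings.LastIndex(ref, "/")
	if i := strings.LastIndex(ref, ":"); i > slash {
		return ref[:i], ref[i+1:]
	}
	return ref, "latest"
}

func main() {
	for _, ref := range []string{"alpine", "ubuntu:18.04"} {
		name, tag := splitImageRef(ref)
		fmt.Printf("%s -> image=%s tag=%s\n", ref, name, tag)
	}
}
```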

Other capabilities

  • Gocker uses the Overlay file system to create containers quickly without the need to copy whole file systems while also sharing the same container image between multiple container instances.
  • Gocker containers get their own networking namespace and are able to access the internet. See limitations below.
  • You can control system resources like CPU percentage, the amount of RAM and the number of processes. Gocker achieves this by leveraging cgroups.
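
To make the Overlay file system point more concrete, here is a minimal sketch of how the option string for an overlay mount can be assembled. The helper name overlayOptions and the container directory under /var/lib/gocker/containers are assumptions for illustration only; the image layer path is the one that appears in the example session further down.

```go
package main

import (
	"fmt"
	"strings"
)

// overlayOptions builds the data string handed to mount(2) for an overlay
// file system. lowerDirs are the read-only image layers, which several
// containers can share. upperDir receives the container's own writes, and
// workDir is scratch space the kernel needs on the same file system as
// upperDir. In lowerdir, the leftmost entry is the topmost layer.
func overlayOptions(lowerDirs []string, upperDir, workDir string) string {
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s",
		strings.Join(lowerDirs, ":"), upperDir, workDir)
}

func main() {
	// Hypothetical per-container directory, not taken from Gocker's source.
	contHome := "/var/lib/gocker/containers/7bfe9b0f1c2e/fs"
	layers := []string{"/var/lib/gocker/images/a24bb4013296/fe8bebfdf212/fs"}

	opts := overlayOptions(layers, contHome+"/upperdir", contHome+"/workdir")
	fmt.Println(opts)
	// On Linux, as root, the mount itself would then look like:
	//   syscall.Mount("none", contHome+"/mnt", "overlay", 0, opts)
}
```

Because only the upper directory is private to a container, starting a second container from the same image needs a new upper and work directory but no copy of the image layers.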

Gocker container isolation

Containers created with Gocker get the following namespaces of their own (see run.go):

  • File system (via chroot)
  • PID
  • IPC
  • UTS (hostname)
  • Mount
  • Network
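
As a rough sketch of how new namespaces can be requested from Go (Linux only), the example below prepares a re-execution of the current binary with clone flags set. The function name newChildCmd is hypothetical, and the argument layout simply mirrors the "child-mode" line in the example session. The network namespace is left out on the assumption that it is prepared in a separate step, as the setup-netns line in the session log suggests.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// newChildCmd prepares, but does not start, a re-execution of the current
// binary inside new namespaces.
func newChildCmd(imageHash, containerID string, command ...string) *exec.Cmd {
	args := append([]string{"child-mode", "--img=" + imageHash, containerID}, command...)
	cmd := exec.Command("/proc/self/exe", args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// Each flag asks clone(2) for one new namespace.
		Cloneflags: syscall.CLONE_NEWPID | // PID
			syscall.CLONE_NEWNS | // mount
			syscall.CLONE_NEWUTS | // hostname
			syscall.CLONE_NEWIPC, // IPC
	}
	return cmd
}

func main() {
	cmd := newChildCmd("a24bb4013296", "7bfe9b0f1c2e", "/bin/sh")
	fmt.Println(cmd.Args)
	fmt.Printf("clone flags: %#x\n", cmd.SysProcAttr.Cloneflags)
	// cmd.Run() would start the child. It needs root and a binary that
	// handles the "child-mode" argument, so it is not called here.
}
```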

Cgroups are created to limit the resources listed below, but containers are left to use unlimited resources unless you pass the --mem, --cpus or --pids options to the gocker run command. These flags limit the maximum RAM, CPU cores and PIDs the container can consume, respectively.

  • Number of CPU cores
  • RAM
  • Number of PIDs (to limit processes)
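
Setting such limits comes down to writing values into cgroup control files. The sketch below assumes a cgroup v1 layout under /sys/fs/cgroup/<controller>/gocker/<container-id>, a memory limit given in megabytes, and a fixed 100 ms CPU period. The type and function names are hypothetical and the units are assumptions, not Gocker's documented behaviour. It only computes which file would receive which value, so it runs without root.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strconv"
)

// limits holds the optional resource caps; a zero value means "unlimited".
type limits struct {
	CPUs  float64 // number of CPU cores, e.g. 0.5
	MemMB int     // memory in megabytes (assumed unit)
	PIDs  int     // maximum number of processes
}

// cgroupWrites returns the cgroup v1 control files to write and their values.
func cgroupWrites(containerID string, l limits) map[string]string {
	const cpuPeriodUs = 100000 // 100 ms scheduling period
	dir := func(controller string) string {
		return filepath.Join("/sys/fs/cgroup", controller, "gocker", containerID)
	}
	w := map[string]string{}
	if l.CPUs > 0 {
		w[filepath.Join(dir("cpu"), "cpu.cfs_period_us")] = strconv.Itoa(cpuPeriodUs)
		w[filepath.Join(dir("cpu"), "cpu.cfs_quota_us")] = strconv.Itoa(int(l.CPUs * cpuPeriodUs))
	}
	if l.MemMB > 0 {
		w[filepath.Join(dir("memory"), "memory.limit_in_bytes")] = strconv.Itoa(l.MemMB * 1024 * 1024)
	}
	if l.PIDs > 0 {
		w[filepath.Join(dir("pids"), "pids.max")] = strconv.Itoa(l.PIDs)
	}
	return w
}

func main() {
	w := cgroupWrites("7bfe9b0f1c2e", limits{CPUs: 0.5, MemMB: 64, PIDs: 10})
	paths := make([]string, 0, len(w))
	for p := range w {
		paths = append(paths, p)
	}
	sort.Strings(paths) // map order is random; sort for stable output
	for _, p := range paths {
		fmt.Printf("%s <- %s\n", p, w[p])
	}
}
```

With no flags given, the map is empty and nothing is written, which matches the "unlimited unless specified" behaviour described above.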

An example Gocker session

➜  sudo ./gocker images          
2020/06/12 08:32:23 Cmd args: [./gocker images]
IMAGE             TAG      ID
centos            latest   470671670cac
redis             latest   c349430fd524
ubuntu            18.04    c3c304cb4f22
                  latest   1d622ef86b13
➜  sudo ./gocker run alpine /bin/sh
2020/06/12 08:33:33 Cmd args: [./gocker run alpine /bin/sh]
2020/06/12 08:33:33 New container ID: 7bfe9b0f1c2e
2020/06/12 08:33:33 Downloading metadata for alpine:latest, please wait...
2020/06/12 08:33:36 imageHash: a24bb4013296
2020/06/12 08:33:36 Checking if image exists under another name...
2020/06/12 08:33:36 Image doesn't exist. Downloading...
2020/06/12 08:33:38 Successfully downloaded alpine
2020/06/12 08:33:38 Uncompressing layer to: /var/lib/gocker/images/a24bb4013296/fe8bebfdf212/fs 
2020/06/12 08:33:38 Image to overlay mount: a24bb4013296
2020/06/12 08:33:38 Cmd args: [/proc/self/exe setup-netns 7bfe9b0f1c2e]
2020/06/12 08:33:38 Cmd args: [/proc/self/exe setup-veth 7bfe9b0f1c2e]
2020/06/12 08:33:38 Cmd args: [/proc/self/exe child-mode --img=a24bb4013296 7bfe9b0f1c2e /bin/sh]
/ # ifconfig 
lo        Link encap:Local Loopback  
         inet addr:127.0.0.1  Mask:255.0.0.0
         inet6 addr: ::1/128 Scope:Host
         UP LOOPBACK RUNNING  MTU:65536  Metric:1
         RX packets:0 errors:0 dropped:0 overruns:0 frame:0
         TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000 
         RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

veth1_7bfe9b Link encap:Ethernet  HWaddr 02:42:6E:E8:FC:06  
         inet addr:172.29.41.13  Bcast:172.29.255.255  Mask:255.255.0.0
         inet6 addr: fe80::42:6eff:fee8:fc06/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:22 errors:0 dropped:0 overruns:0 frame:0
         TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000 
         RX bytes:2328 (2.2 KiB)  TX bytes:586 (586.0 B)

/ # ps aux
PID   USER     TIME  COMMAND
   1 root      0:00 /proc/self/exe child-mode --img=a24bb4013296 7bfe9b0f1c2e /bin/sh
   7 root      0:00 /bin/sh
   9 root      0:00 ps aux
/ # apk add python3
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/APKINDEX.tar.gz
(1/10) Installing libbz2 (1.0.8-r1)
(2/10) Installing expat (2.2.9-r1)
(3/10) Installing libffi (3.3-r2)
(4/10) Installing gdbm (1.13-r1)
(5/10) Installing xz-libs (5.2.5-r0)
(6/10) Installing ncurses-terminfo-base (6.2_p20200523-r0)
(7/10) Installing ncurses-libs (6.2_p20200523-r0)
(8/10) Installing readline (8.0.4-r0)
(9/10) Installing sqlite-libs (3.32.1-r0)
(10/10) Installing python3 (3.8.3-r0)
Executing busybox-1.31.1-r16.trigger
OK: 53 MiB in 24 packages
/ # python3
Python 3.8.3 (default, May 15 2020, 01:53:50) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
/ # exit
2020/06/12 08:34:34 Container done.
➜  sudo ./gocker run ubuntu /bin/bash
2020/06/12 08:35:13 Cmd args: [./gocker run ubuntu /bin/bash]
2020/06/12 08:35:13 New container ID: c7eb7bab7e4c
2020/06/12 08:35:13 Image already exists. Not downloading.
2020/06/12 08:35:13 Image to overlay mount: 1d622ef86b13
2020/06/12 08:35:13 Cmd args: [/proc/self/exe setup-netns c7eb7bab7e4c]
2020/06/12 08:35:13 Cmd args: [/proc/self/exe setup-veth c7eb7bab7e4c]
2020/06/12 08:35:13 Cmd args: [/proc/self/exe child-mode --img=1d622ef86b13 c7eb7bab7e4c /bin/bash]
root@c7eb7bab7e4c:/# 

[On another terminal]

➜  sudo ./gocker ps
[sudo] password for shuveb: 
2020/06/12 08:36:19 Cmd args: [./gocker ps]
CONTAINER ID	IMAGE		COMMAND
c7eb7bab7e4c	ubuntu:latest	/usr/bin/bash
➜  sudo ./gocker exec c7eb7bab7e4c /bin/bash
2020/06/12 08:37:15 Cmd args: [./gocker exec c7eb7bab7e4c /bin/bash]
root@c7eb7bab7e4c:/# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0 1153100 6132 ?        Sl   03:05   0:00 /proc/self/exe child-mode --img=1d622ef86b13 
root           8  0.0  0.0   4116  3236 ?        S+   03:05   0:00 /bin/bash
root          11  0.0  0.0   4116  3376 ?        S    03:07   0:00 /bin/bash
root          14  0.0  0.0   5888  2956 ?        R+   03:07   0:00 ps aux
root@c7eb7bab7e4c:/# 

Gocker limitations

Here are some limitations I'd love to fix in a future release:

  • Gocker does not currently support exposing container ports on the host. Whenever Docker containers need to expose ports on the host, Docker uses the program docker-proxy as a proxy to get that done. Gocker needs a similar proxy developed. While Gocker containers can access the internet today, the ability to expose ports on the host will be a great feature to have (mainly to learn how that's done).
  • Gocker does not do error handling well. Should something go wrong, especially when attempting to run a container, Gocker might not cleanly unmount some file systems.

Containers accessing internet

When you run Gocker for the first time, a new bridge, gocker0, is created. Since all container network interfaces are connected to this bridge, they can talk to each other without you having to do anything. For containers to be able to reach the internet, though, you need to enable packet forwarding on the host. For this, a convenience script, enable_internet.sh, has been provided. You might need to change it to reflect the name of your internet-connected interface before you run it. There are instructions in the script. After you run it, Gocker containers should be able to reach the internet, install packages, etc.
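
For orientation, the sketch below lists the kind of host commands that typically enable forwarding and NAT for a bridge like gocker0. This is an assumption about what such a setup usually involves, not the contents of enable_internet.sh; the function name natCommands and the interface name eth0 are placeholders. The code only prints the commands and does not run them.

```go
package main

import (
	"fmt"
	"strings"
)

// natCommands returns the commands that turn on IPv4 forwarding and
// masquerade traffic from the bridge network as it leaves outIface.
func natCommands(bridgeCIDR, outIface string) [][]string {
	return [][]string{
		// Let the kernel route packets between interfaces.
		{"sysctl", "-w", "net.ipv4.ip_forward=1"},
		// Rewrite the source address of container traffic to the host's.
		{"iptables", "-t", "nat", "-A", "POSTROUTING",
			"-s", bridgeCIDR, "-o", outIface, "-j", "MASQUERADE"},
	}
}

func main() {
	// 172.29.0.0/16 is the range Gocker uses; eth0 is a placeholder for
	// your internet-connected interface.
	for _, c := range natCommands("172.29.0.0/16", "eth0") {
		fmt.Println(strings.Join(c, " "))
	}
}
```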

External Go libraries used

  • GoContainerRegistry for downloading container images from a container registry, the default being Docker Hub.
  • PFlag for handling command line flags.
  • Netlink to configure Linux network interfaces without having to get bogged down by Netlink socket programming.
  • Unix Because Unix :)

Disclaimer

Gocker runs as root. Use at your own risk. This is my first Go program beyond a reasonable number of lines, and I'm sure there are better ways to write Go programs and there might still be a lot of bugs lingering in here. Here are some things Gocker does to your system so you know:

  • It creates the gocker0 bridge if it does not exist.
  • It blindly assumes that the IP address range 172.29.*.* is available and uses it.
  • It creates various namespaces and cgroups.
  • It mounts overlay file systems.

To this end, the safest way to run Gocker might be in a virtual machine.

Distributions

I developed Gocker on my day-to-day Arch Linux based computer. I also tested Gocker on an Ubuntu 20.04 virtual machine. It works great.

Building and running

Once you clone the repo, assuming you have Go installed on your machine, change into the Gocker directory and use the following command to retrieve dependencies:

go mod download

Then, to build gocker, run the following command:

go build -o gocker .

About me

My name is Shuveb Hussain and I'm the author of the Linux-focused blog Unixism.net. You can follow me on Twitter where I post tech-related content mostly focusing on Linux, performance, scalability and cloud technologies.

Comments
  • trying to execute snap application into gocker

    Hello, I'm trying to run snapcraft apps in gocker but I get an error. Do you perhaps have an idea why?

    [email protected] /]# hello-world internal error, please report: running "hello-world" failed: cannot create transient scope: DBus error "org.freedesktop.DBus.Error.InvalidArgs": [Process 54 is a kernel thread, refusing.] [[email protected] /]#

    I do not know whether I have access to the parent D-Bus from within gocker.

    Regards, Nicolas

    opened by limbo127 1
  • gocker containers aren't talking to each other

    Hello,

    Thanks for making this project open source. I'm trying to play with it. I created 2 containers, each running the alpine image with a /bin/sh shell.

    Container A got IP 172.29.15.82 and Container B got IP 172.29.174.195. The gocker0 bridge on the host was set up with the 172.29.0.0/16 range.

    Bridge control shows gocker0 was set up as expected:

    sudo brctl show gocker0
    bridge name	bridge id		STP enabled	interfaces
    gocker0		8000.2a5377828fc2	no		veth0_7de037
    							                        veth0_def9e1
    

    However, if I ping one container from the other, there is no response:

    / # ping 172.29.174.195
    PING 172.29.174.195 (172.29.174.195): 56 data bytes
    ^C
    --- 172.29.174.195 ping statistics ---
    3 packets transmitted, 0 packets received, 100% packet loss
    / #
    

    What am I missing ?

    opened by kspviswa 1
  • About the Syscall package deprecation

    Hello, thank you for providing a project that helps me better understand Docker. When I used it, I found that I could not use the function syscall.Mount. After checking the official documentation, I found that the syscall package has been deprecated, which seems to be a problem.

    opened by sandaawa 1
  • bug: Run fails with `Mount failed: invalid argument`

    Hi!

    Today I tried to run gocker on macOS using the following setup:

    1. I used docker: docker run --rm -it --cap-add=NET_ADMIN --cap-add=SYS_ADMIN --security-opt apparmor:unconfined ubuntu:latest
    2. I installed all necessary things (git, iptables, build-essential, sudo, golang)
    3. I compiled the binary
    4. I tried to run it using sudo ./gocker run alpine /bin/sh

    The run command failed with:

    2020/08/19 22:33:40 Cmd args: [./gocker run alpine /bin/sh]
    2020/08/19 22:33:40 New container ID: 759be7783396
    2020/08/19 22:33:40 Downloading metadata for alpine:latest, please wait...
    2020/08/19 22:33:42 imageHash: a24bb4013296
    2020/08/19 22:33:42 Checking if image exists under another name...
    2020/08/19 22:33:42 Image doesn't exist. Downloading...
    2020/08/19 22:33:51 Successfully downloaded alpine
    2020/08/19 22:33:51 Uncompressing layer to: /var/lib/gocker/images/a24bb4013296/fe8bebfdf212/fs
    2020/08/19 22:33:51 Image to overlay mount: a24bb4013296
    2020/08/19 22:33:51 Mount failed: invalid argument
    

    E.g. on following line: https://github.com/shuveb/containers-the-hard-way/blob/d90997c/run.go#L53 for call

    syscall.Mount("none", contFSHome+"/mnt", "overlay", 0, mntOptions)
    

    Do you know what the reason might be?

    opened by matoous 1
  • Adding go.mod / go.sum

    How about adding go.mod and go.sum for the purpose of dependency management?

    Tests I performed

    • built successfully with go build -o gocker .
    • ran some commands (sorry, not all)

    opened by mikutas 1
  • Fix README formatting and typos

    Hello! Read your article and clicked through to check out the repo. Saw that a couple of these typos were on the article as well.

    Enjoyed the article and appreciate the example you’ve put together here.

    opened by clatour 1
  • Fatal error Unable to write to cgroup notification file

    Hello there!

    Thanks for making this project open source. I was trying to run it but stumbled upon this issue:

    #  ./gocker run alpine ls
    2021/05/15 17:37:25 Cmd args: [./gocker run alpine /bin/sh
    2021/05/15 17:37:25 New container ID: 52597f3d1b2d
    2021/05/15 17:37:25 Image already exists. Not downloading.
    2021/05/15 17:37:25 Image to overlay mount: 6dbb9cc54074
    2021/05/15 17:37:25 Cmd args: [/proc/self/exe setup-netns 52597f3d1b2d]
    2021/05/15 17:37:25 Cmd args: [/proc/self/exe setup-veth 52597f3d1b2d]
    2021/05/15 17:37:25 Cmd args: [/proc/self/exe child-mode --img=6dbb9cc54074 52597f3d1b2d ls]
    2021/05/15 17:37:25 Fatal error: Unable to write to cgroup notification file: open /sys/fs/cgroup/memory/gocker/52597f3d1b2d/notify_on_release: permission denied
    2021/05/15 17:37:25 Fatal error: exit status 1
    

    Any ideas?

    opened by pathcl 3
  • prompt

    I have recommended your project to HelloGitHub and successfully included it. I hope more people can learn the basic implementation principle of Docker through your project. :)

    opened by sandaawa 0
Owner
Shuveb Hussain