Application Kernel for Containers

Overview

gVisor

What is gVisor?

gVisor is an application kernel, written in Go, that implements a substantial portion of the Linux system surface. It includes an Open Container Initiative (OCI) runtime called runsc that provides an isolation boundary between the application and the host kernel. The runsc runtime integrates with Docker and Kubernetes, making it simple to run sandboxed containers.

Why does gVisor exist?

Containers are not a sandbox. While containers have revolutionized how we develop, package, and deploy applications, using them to run untrusted or potentially malicious code without additional isolation is not a good idea. While using a single, shared kernel allows for efficiency and performance gains, it also means that container escape is possible with a single vulnerability.

gVisor is an application kernel for containers. It limits the host kernel surface accessible to the application while still giving the application access to all the features it expects. Unlike most kernels, gVisor does not assume or require a fixed set of physical resources; instead, it leverages existing host kernel functionality and runs as a normal process. In other words, gVisor implements Linux by way of Linux.

gVisor should not be confused with technologies and tools to harden containers against external threats, provide additional integrity checks, or limit the scope of access for a service. One should always be careful about what data is made available to a container.

Documentation

User documentation and technical architecture, including quick start guides, can be found at gvisor.dev.

Installing from source

gVisor builds on x86_64 and ARM64. Other architectures may become available in the future.

For the purposes of these instructions, bazel and other build dependencies are wrapped in a build container. You can also invoke bazel directly, or run make help to list the standard targets.

Requirements

Make sure the required build dependencies are installed; the full list is available in the documentation at gvisor.dev.

Building

Build and install the runsc binary:

mkdir -p bin
make copy TARGETS=runsc DESTINATION=bin/
sudo cp ./bin/runsc /usr/local/bin
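Once runsc is installed, it can be registered with Docker as an additional runtime. A minimal sketch, assuming the /usr/local/bin/runsc path used above; the fragment is written to a local example file here, whereas a real setup merges it into /etc/docker/daemon.json and restarts dockerd:

```shell
# Example daemon.json fragment registering runsc as a Docker runtime.
# Written to a local file for illustration; merge into /etc/docker/daemon.json
# and restart dockerd in a real setup.
cat > ./daemon.json.example <<'EOF'
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
EOF

# Containers then opt in per invocation:
#   docker run --rm --runtime=runsc alpine uname -a
grep '"runsc"' ./daemon.json.example
```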

Testing

To run standard test suites, you can use:

make unit-tests
make tests

To run specific tests, you can specify the target:

make test TARGETS="//runsc:version_test"

Using go get

This project uses bazel to build and manage dependencies. For convenience, a synthetic go branch is maintained that is compatible with standard Go tooling.

For example, to build and install runsc directly from this branch:

echo "module runsc" > go.mod
GO111MODULE=on go get gvisor.dev/gvisor/runsc@go
CGO_ENABLED=0 GO111MODULE=on sudo -E go build -o /usr/local/bin/runsc gvisor.dev/gvisor/runsc

Subsequently, you can build and install the shim binary for containerd:

GO111MODULE=on sudo -E go build -o /usr/local/bin/containerd-shim-runsc-v1 gvisor.dev/gvisor/shim
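For containerd, the shim also needs a runtime entry in the CRI plugin configuration. A sketch only, following the shim v2 naming convention; the exact plugin section names vary by containerd version, and the real file is typically /etc/containerd/config.toml (a local example file is used here):

```shell
# Example CRI runtime entry for runsc (containerd shim v2 naming convention).
# Written locally for illustration; in a real setup this goes into
# /etc/containerd/config.toml, followed by a containerd restart.
cat > ./config.toml.example <<'EOF'
[plugins.cri.containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
EOF
grep 'io.containerd.runsc.v1' ./config.toml.example
```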

Note that this branch is supported in a best-effort capacity, and direct development on it is not supported. Development should occur on the master branch, which is then reflected into the go branch.

Community & Governance

See GOVERNANCE.md for project governance information.

The gvisor-users mailing list and gvisor-dev mailing list are good starting points for questions and discussion.

Security Policy

See SECURITY.md.

Contributing

See CONTRIBUTING.md.

Issues
  • DNS fails on gVisor using netstack on EKS

    Description

    I'm deploying Pods on my EKS cluster using the gVisor runtime; however, outbound network requests fail while inbound requests succeed. The issue is mitigated when using network=host in the runsc config options.

    Steps to reproduce

    1. I created a 2-node EKS cluster and configured a node to use containerd as its CRI runtime and configured the gVisor runtime with containerd (following this tutorial). I also labeled the node I selected for gVisor with app=gvisor.

    EKS Cluster Nodes (you can see the first node using containerd as its container runtime):

    kubectl get nodes -o wide
    NAME                                           STATUS   ROLES    AGE    VERSION                INTERNAL-IP      EXTERNAL-IP     OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
    ip-192-168-31-136.us-west-2.compute.internal   Ready    <none>   3d1h   v1.16.12-eks-904af05   192.168.31.136   35.161.102.17   Amazon Linux 2   4.14.181-142.260.amzn2.x86_64   containerd://1.3.2
    ip-192-168-60-139.us-west-2.compute.internal   Ready    <none>   3d1h   v1.16.12-eks-904af05   192.168.60.139   44.230.198.56   Amazon Linux 2   4.14.181-142.260.amzn2.x86_64   docker://19.3.6
    

    runsc config on gVisor node:

    [[email protected] ~]$ ls /etc/containerd/
    config.toml  runsc.toml
    [[email protected] ~]$ cat /etc/containerd/config.toml 
    disabled_plugins = ["restart"]
    [plugins.linux]
      shim_debug = true
    [plugins.cri.containerd.runtimes.runsc]
      runtime_type = "io.containerd.runsc.v1"
    [plugins.cri.containerd.runtimes.runsc.options]
      TypeUrl = "io.containerd.runsc.v1.options"
      ConfigPath = "/etc/containerd/runsc.toml"
    [[email protected] ~]$ cat /etc/containerd/runsc.toml 
    [runsc_config]
      debug="true"
      strace="true"
      log-packets="true"
      debug-log="/tmp/runsc/%ID%/"
    
    2. I applied a gVisor runtime class to my cluster:
    cat << EOF | tee gvisor-runtime.yaml 
    apiVersion: node.k8s.io/v1beta1
    kind: RuntimeClass
    metadata:
      name: gvisor
    handler: runsc
    EOF
    
    kubectl apply -f gvisor-runtime.yaml
    
    3. Then I ran a simple nginx Pod using the gvisor runtime:
    cat << EOF | tee nginx-gvisor.yaml 
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx-gvisor
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:                    
        - containerPort: 80
      nodeSelector:
        app: gvisor
      runtimeClassName: gvisor
    EOF
    
    kubectl create -f nginx-gvisor.yaml
    

    To verify the Pod is running with gVisor:

    # Get the container ID
    kubectl get pod nginx-gvisor -o jsonpath='{.status.containerStatuses[0].containerID}' 
    containerd://9f71a133fc27c3a305710552489c16977d5c48cd40f31810c2010dac393c5ba7%  
    
    # List containers running with runsc on the gVisor node
    [[email protected] gvisor]$ sudo env "PATH=$PATH" runsc --root /run/containerd/runsc/k8s.io list -quiet
    9411dfee3811da9dd45e8681f697bcf5326173d6510238ce70beb02ffe00f444
    9f71a133fc27c3a305710552489c16977d5c48cd40f31810c2010dac393c5ba7 
    
    4. To test the inbound network traffic of the Pod, I simply curled port 80 of the Pod and it succeeded. To test the outbound network traffic, I did the following:
    kubectl exec --stdin --tty nginx-gvisor -- /bin/bash
    [email protected]:/# apt-get update
    Err:1 http://security.debian.org/debian-security buster/updates InRelease
      Temporary failure resolving 'security.debian.org'
    Err:2 http://deb.debian.org/debian buster InRelease
      Temporary failure resolving 'deb.debian.org'
    Err:3 http://deb.debian.org/debian buster-updates InRelease
      Temporary failure resolving 'deb.debian.org'
    Reading package lists... Done
    W: Failed to fetch http://deb.debian.org/debian/dists/buster/InRelease  Temporary failure resolving 'deb.debian.org'
    W: Failed to fetch http://security.debian.org/debian-security/dists/buster/updates/InRelease  Temporary failure resolving 'security.debian.org'
    W: Failed to fetch http://deb.debian.org/debian/dists/buster-updates/InRelease  Temporary failure resolving 'deb.debian.org'
    W: Some index files failed to download. They have been ignored, or old ones used instead.
    

    You can see that it fails. Other attempts such as wget www.google.com fail as well.

    For debug purposes, these are the DNS and routing tables (without net-tools, since I couldn't install them) in the Pod container:

    [email protected]:/# cat /etc/resolv.conf
    search default.svc.cluster.local svc.cluster.local cluster.local us-west-2.compute.internal
    nameserver 10.100.0.10
    options ndots:5
    [email protected]:/# cat /proc/net/route
    Iface   Destination     Gateway Flags   RefCnt  Use     Metric  Mask    MTU     Window  IRTT
    eth0    0101FEA9        00000000        0001    0       0       0       FFFFFFFF        0       0       0
    eth0    00000000        0101FEA9        0003    0       0       0       00000000        0       0       0  
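The Destination and Gateway columns in /proc/net/route are little-endian hex. A quick decoding sketch (bash) confirms that 0101FEA9 above is 169.254.1.1, the same link-local gateway that appears in the working host-network routing table later in this report:

```shell
# Decode a little-endian hex address from /proc/net/route into dotted-quad
# form (bash substring expansion; bytes are stored least-significant first).
hex=0101FEA9
ip=$(printf '%d.%d.%d.%d' "0x${hex:6:2}" "0x${hex:4:2}" "0x${hex:2:2}" "0x${hex:0:2}")
echo "$ip"   # 169.254.1.1
```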
    

    I also captured the tcpdump packets on the ENI network interface for the Pod allocated by EKS: eni567d651201a.nohost.tcpdump.tar.gz. Details about the network interface:

    [[email protected] ~]$ ifconfig
    eni567d651201a: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
            inet6 fe80::4cfa:44ff:fe5d:9495  prefixlen 64  scopeid 0x20<link>
            ether 4e:fa:44:5d:94:95  txqueuelen 0  (Ethernet)
            RX packets 3  bytes 270 (270.0 B)
            RX errors 0  dropped 2  overruns 0  frame 0
            TX packets 5  bytes 446 (446.0 B)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    

    I also captured runsc debug information for the containers in the Pod: 9f71a133fc27c3a305710552489c16977d5c48cd40f31810c2010dac393c5ba7.tar.gz 9411dfee3811da9dd45e8681f697bcf5326173d6510238ce70beb02ffe00f444.tar.gz

    5. To verify that it works when the Pod uses the host network, I added network="host" to the /etc/containerd/runsc.toml file and restarted containerd. I reran the same experiment as above, with the following results:

    Verify running Pod:

    # Get the container ID
    kubectl get pod nginx-gvisor -o jsonpath='{.status.containerStatuses[0].containerID}' 
    containerd://e4ec52fdad3e889bf386b1eca03e231ad53e0452e4bc623282732eba0d2da720%    
    
    # List containers running with runsc on the gVisor node
    [[email protected] gvisor]$ sudo env "PATH=$PATH" runsc --root /run/containerd/runsc/k8s.io list -quiet
    96198907b56174067a1aa2b9c0fa3644670675b25fa28a7b44234fc232cccd5d
    e4ec52fdad3e889bf386b1eca03e231ad53e0452e4bc623282732eba0d2da720 
    

    Successful inbound with curl, and successful outbound as follows:

    kubectl exec --stdin --tty nginx-gvisor -- /bin/bash
    [email protected]:/# apt-get update
    Get:1 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
    Get:2 http://security.debian.org/debian-security buster/updates/main amd64 Packages [213 kB]
    Get:3 http://deb.debian.org/debian buster InRelease [121 kB]
    Get:4 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
    Get:5 http://deb.debian.org/debian buster/main amd64 Packages [7905 kB]
    Get:6 http://deb.debian.org/debian buster-updates/main amd64 Packages [7868 B]
    Fetched 8364 kB in 6s (1462 kB/s)
    Reading package lists... Done
    

    DNS and routing table (with net-tools this time) on Pod:

    [email protected]:/# cat /etc/resolv.conf
    search default.svc.cluster.local svc.cluster.local cluster.local us-west-2.compute.internal
    nameserver 10.100.0.10
    options ndots:5
    [email protected]:/# cat /proc/net/route
    Iface   Destination     Gateway Flags   RefCnt  Use     Metric  Mask    MTU     Window  IRTT
    eth0    00000000        0101FEA9        0003    0       0       0       00000000        0       0       0
    eth0    0101FEA9        00000000        0001    0       0       0       FFFFFFFF        0       0       0
    eth0    751FA8C0        00000000        0001    0       0       0       FFFFFFFF        0       0       0
    [email protected]:/# route
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    default         169.254.1.1     0.0.0.0         UG    0      0        0 eth0
    169.254.1.1     0.0.0.0         255.255.255.255 U     0      0        0 eth0
    192.168.31.117  0.0.0.0         255.255.255.255 U     0      0        0 eth0
    

    TCPDump file: eni567d651201a.host.tcpdump.tar.gz Details about the network interface:

    [[email protected] ~]$ ifconfig
    eni567d651201a: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
            inet6 fe80::58a9:b5ff:feda:27e5  prefixlen 64  scopeid 0x20<link>
            ether 5a:a9:b5:da:27:e5  txqueuelen 0  (Ethernet)
            RX packets 10  bytes 796 (796.0 B)
            RX errors 0  dropped 2  overruns 0  frame 0
            TX packets 5  bytes 446 (446.0 B)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    

    runsc debug files: 96198907b56174067a1aa2b9c0fa3644670675b25fa28a7b44234fc232cccd5d.tar.gz e4ec52fdad3e889bf386b1eca03e231ad53e0452e4bc623282732eba0d2da720.tar.gz

    Environment

    Please include the following details of your environment:

    • runsc -version
    [[email protected] ~]$ runsc -version
    runsc version release-20200622.1-171-gc66991ad7de6
    spec: 1.0.1-dev
    
    • kubectl version and kubectl get nodes -o wide
    $ kubectl version
    Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.8-eks-fd1ea7", GitCommit:"fd1ea7c64d0e3ccbf04b124431c659f65330562a", GitTreeState:"clean", BuildDate:"2020-05-28T19:06:00Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
    
    $ kubectl get nodes -o wide                                                            
    NAME                                           STATUS   ROLES    AGE    VERSION                INTERNAL-IP      EXTERNAL-IP     OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
    ip-192-168-31-136.us-west-2.compute.internal   Ready    <none>   3d3h   v1.16.12-eks-904af05   192.168.31.136   35.161.102.17   Amazon Linux 2   4.14.181-142.260.amzn2.x86_64   containerd://1.3.2
    ip-192-168-60-139.us-west-2.compute.internal   Ready    <none>   3d3h   v1.16.12-eks-904af05   192.168.60.139   44.230.198.56   Amazon Linux 2   4.14.181-142.260.amzn2.x86_64   docker://19.3.6
    
    • uname -a
    $ uname -a
    Darwin moehajj-C02CJ1ARML7M 19.6.0 Darwin Kernel Version 19.6.0: Sun Jul  5 00:43:10 PDT 2020; root:xnu-6153.141.1~9/RELEASE_X86_64 x86_64
    
    type: bug area: networking area: integration 
    opened by moehajj 48
  • DNS not working in Docker Compose

    DNS lookups fail in Docker Compose 2.3.

    docker-compose.yml

    version: '2.3'
    services:
      gvisor_test:
        command: node /home/test.js
        image: node:8-alpine
        runtime: runsc
        volumes:
          - /home/ubuntu/compose/test.js:/home/test.js
    

    test.js

    const http = require('http')
    http.get('http://www.google.com', res => console.log(res))
    

    Error:

    $ docker-compose up
    Starting compose_gvisor_test_1 ... done
    Attaching to compose_gvisor_test_1
    gvisor_test_1  | events.js:183
    gvisor_test_1  |       throw er; // Unhandled 'error' event
    gvisor_test_1  |       ^
    gvisor_test_1  | 
    gvisor_test_1  | Error: getaddrinfo EAI_AGAIN www.google.com:80
    gvisor_test_1  |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:67:26)
    compose_gvisor_test_1 exited with code 1
    
    $ uname -a
    Linux ubuntu-2 4.15.0-36-generic #39~16.04.1-Ubuntu SMP Tue Sep 25 08:59:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
    
    $ docker version
    Client:
    Version:           18.09.0
    API version:       1.39
    Go version:        go1.10.4
    Git commit:        4d60db4
    Built:             Wed Nov  7 00:48:57 2018
    OS/Arch:           linux/amd64
    Experimental:      false
    
    Server: Docker Engine - Community
    Engine:
     Version:          18.09.0
     API version:      1.39 (minimum version 1.12)
     Go version:       go1.10.4
     Git commit:       4d60db4
     Built:            Wed Nov  7 00:16:44 2018
     OS/Arch:          linux/amd64
     Experimental:     false
    

    I've tried using use-vc and other resolv.conf options in Compose (http://man7.org/linux/man-pages/man5/resolv.conf.5.html) to force TCP over UDP, with no luck:

    version: '2.3'
    services:
      gvisor_test:
        command: node /home/test.js
        image: node:8-alpine
        runtime: runsc
        volumes:
          - /home/ubuntu/compose/test.js:/home/test.js
        dns_opt:
          - use-vc
    
    area: docs area: networking area: container runtime area: integration 
    opened by boostpaal 29
  • gvisor prevents AMQP sockets from opening (TCP_SYNCNT)

    When running a simple application to submit jobs via AMQP to a Celery server, gvisor prevents the sockets from being opened to send the data. Debug logs are attached.

    Base simple application that listens via HTTP and makes request, along with python requirements and Dockerfile: https://gist.github.com/mcowger/7d4ab07a75dc1ddddd1f1fb20dc5d8fc

    When running this Docker image under regular containerd, it runs fine. When run under the runsc runtime, it fails. When run under Google Cloud Run (where I first encountered the issue), the error is easier to see (though gVisor tries to disclaim responsibility):

    Container Sandbox: Unsupported syscall getsockopt(0x5,0x6,0xa,0x3ee2d2dfb3c0,0x3ee2d2dfb3c4,0x0). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/getsockopt for more information.

    Other useful information to include is:

    • runsc -v

    runsc version release-20191213.0
    spec: 1.0.1-dev

    • docker version or docker info if more relevant

    Server: Docker Engine - Community
     Engine:
      Version:          19.03.5
      API version:      1.40 (minimum version 1.12)
      Go version:       go1.12.12
      Git commit:       633a0ea838
      Built:            Wed Nov 13 07:24:29 2019
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          1.2.10
      GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
     runc:
      Version:          1.0.0-rc8+dev
      GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
     docker-init:
      Version:          0.18.0
      GitCommit:        fec3683

    • uname -a - git describe

    Linux devbox 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64 GNU/Linux

    • Detailed reproduction steps
    docker run -p 8080:8080 celery  # works fine
    docker run --runtime=runsc -p 8080:8080 celery
    

    runsc.bug.zip

    type: enhancement area: compatibility area: networking 
    opened by mcowger 29
  • Handle ICMP Timestamp Request

    Added code to handle incoming ICMP request message with timestamp and also added timestamp in ICMP reply code for the respective request.

    Testing: verified as follows.

    Linux_Machine --------------------- Intel_NUC (Fuchsia)
    

    a. From the Linux machine, sent an ICMP request with timestamp
    b. For this request, Intel_NUC responded with an ICMP reply including the timestamp

    0001-Added-code-to-handle-incoming-ICMP-request-message-w.txt

    cla: yes area: networking stale 
    opened by globaledgesoftware-ltd 28
  • GKE Sandbox: OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104

    Hi! I don't know if this is the correct place to report this kind of issue, as it is related to gVisor but on top of GKE.

    Scenario

    We are using GKE with a node pool that has the GKE Sandbox feature enabled. We found an error during an image upload to the Telegram API.

    TL;DR

    # curl -X POST "https://api.telegram.org/bot990060833:I_CAN_SEND_YOU_THE_TOKEN/sendPhoto" -F chat_id=334621642 -F photo="@googlelogo_color_92x30dp.png" --verbose
    Note: Unnecessary use of -X or --request, POST is already inferred.
    *   Trying 149.154.167.220:443...
    * TCP_NODELAY set
    * Connected to api.telegram.org (149.154.167.220) port 443 (#0)
    * ALPN, offering h2
    * ALPN, offering http/1.1
    * successfully set certificate verify locations:
    *   CAfile: /etc/ssl/certs/ca-certificates.crt
      CApath: none
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    * TLSv1.2 (IN), TLS handshake, Certificate (11):
    * TLSv1.2 (IN), TLS handshake, Server key exchange (12):
    * TLSv1.2 (IN), TLS handshake, Server finished (14):
    * TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
    * TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
    * TLSv1.2 (OUT), TLS handshake, Finished (20):
    * TLSv1.2 (IN), TLS handshake, Finished (20):
    * SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
    * ALPN, server accepted to use http/1.1
    * Server certificate:
    *  subject: OU=Domain Control Validated; CN=api.telegram.org
    *  start date: May  4 14:42:31 2018 GMT
    *  expire date: May 23 16:17:38 2020 GMT
    *  subjectAltName: host "api.telegram.org" matched cert's "api.telegram.org"
    *  issuer: C=US; ST=Arizona; L=Scottsdale; O=GoDaddy.com, Inc.; OU=http://certs.godaddy.com/repository/; CN=Go Daddy Secure Certificate Authority - G2
    *  SSL certificate verify ok.
    > POST /bot990060833:I_CAN_SEND_YOU_THE_TOKEN/sendPhoto HTTP/1.1
    > Host: api.telegram.org
    > User-Agent: curl/7.66.0
    > Accept: */*
    > Content-Length: 4142
    > Content-Type: multipart/form-data; boundary=------------------------c5bb5ff482d768bf
    > Expect: 100-continue
    > 
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 100 Continue
    * We are completely uploaded and fine
    * OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104
    * Closing connection 0
    curl: (56) OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104
    

    curl: (56) OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104

    • runsc -v:
    /home/containerd/usr/local/sbin/runsc --version
    runsc version google-281502745
    spec: 1.0.1-dev 
    
    • docker version or docker info if more relevant
    docker version
    Client:
     Version:           19.03.1
     API version:       1.40
     Go version:        go1.11.2
     Git commit:        74b1e89
     Built:             Wed Oct  9 06:26:18 2019
     OS/Arch:           linux/amd64
     Experimental:      false
    
    Server:
     Engine:
      Version:          19.03.1
      API version:      1.40 (minimum version 1.12)
      Go version:       go1.11.2
      Git commit:       74b1e89
      Built:            Wed Oct  9 06:25:30 2019
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          1.2.8
      GitCommit:        a4bc1d432a2c33aa2eed37f338dceabb93641310
     runc:
      Version:          1.0.0-rc8
      GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
     docker-init:
      Version:          0.18.0
      GitCommit:        fec3683b971d9c3ef73f284f176672c44b448662
    
    • uname -a - git describe
    $ uname -a
    Linux gke-live-clients-7bd46286-mtrp 4.19.76+ #1 SMP Tue Oct 8 23:17:06 PDT 2019 x86_64 Intel(R) Xeon(R) CPU @ 2.20GHz GenuineIntel GNU/Linux
    
    • Detailed reproduction steps

    To debug this problem more deeply, we SSHed into one of the nodes and modified the Docker daemon configuration to run some tests:

    First, configure runsc as an available Docker runtime:

    cat /etc/docker/daemon.json
    {
        "live-restore": true,
        "runtimes": {
            "runsc": {
                "path": "/home/containerd/usr/local/sbin/runsc"
            }
        },
        "storage-driver": "overlay2"
    }
    

    Then run the following containers to reproduce the error:

    1st without gvisor/runsc

    $ docker run  --rm -it -m 128Mi --cpus="0.1" alpine:3.10 /bin/sh
    # apk add curl
    # curl https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png --output googlelogo_color_92x30dp.png
    # curl -X POST "https://api.telegram.org/bot990060833:I_CAN_SEND_YOU_THE_TOKEN/sendPhoto" -F chat_id=334621642 -F photo="@googlelogo_color_92x30dp.png" --verbose
    
    # TRUNCATED OUTPUT #
    
    < HTTP/1.1 200 OK
    < Server: nginx/1.16.1
    < Date: Tue, 07 Jan 2020 15:43:53 GMT
    < Content-Type: application/json
    < Content-Length: 413
    < Connection: keep-alive
    < Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
    < Access-Control-Allow-Origin: *
    < Access-Control-Allow-Methods: GET, POST, OPTIONS
    < Access-Control-Expose-Headers: Content-Length,Content-Type,Date,Server,Connection
    < 
    * Connection #0 to host api.telegram.org left intact
    {"ok":true,"result":{"message_id":9,"from":{"id":990060833,"is_bot":true,"first_name":"testk8spin","username":"k8spin_bot"},"chat":{"id":334621642,"first_name":"Pau","last_name":"Rosello","username":"paurosello","type":"private"},"date":1578411833,"photo":[{"file_id":"AgADBAADSLIxG3cuoFCCXM-yOT0enWp5qBsABAEAAwIAA20AA-M0BgABFgQ","file_unique_id":"AQADanmoGwAE4zQGAAE","file_size":4066,"width":184,"height":60}]}}/ # 
    

    2nd with gvisor/runsc

    $ docker run --runtime=runsc --rm -it -m 128Mi --cpus="0.1" alpine:3.10 /bin/sh
    # apk add curl
    # curl https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png --output googlelogo_color_92x30dp.png
    # curl -X POST "https://api.telegram.org/bot990060833:I_CAN_SEND_YOU_THE_TOKEN/sendPhoto" -F chat_id=334621642 -F photo="@googlelogo_color_92x30dp.png" --verbose
    Note: Unnecessary use of -X or --request, POST is already inferred.
    *   Trying 149.154.167.220:443...
    * TCP_NODELAY set
    * Connected to api.telegram.org (149.154.167.220) port 443 (#0)
    * ALPN, offering h2
    * ALPN, offering http/1.1
    * successfully set certificate verify locations:
    *   CAfile: /etc/ssl/certs/ca-certificates.crt
      CApath: none
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    * TLSv1.2 (IN), TLS handshake, Certificate (11):
    * TLSv1.2 (IN), TLS handshake, Server key exchange (12):
    * TLSv1.2 (IN), TLS handshake, Server finished (14):
    * TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
    * TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
    * TLSv1.2 (OUT), TLS handshake, Finished (20):
    * TLSv1.2 (IN), TLS handshake, Finished (20):
    * SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
    * ALPN, server accepted to use http/1.1
    * Server certificate:
    *  subject: OU=Domain Control Validated; CN=api.telegram.org
    *  start date: May  4 14:42:31 2018 GMT
    *  expire date: May 23 16:17:38 2020 GMT
    *  subjectAltName: host "api.telegram.org" matched cert's "api.telegram.org"
    *  issuer: C=US; ST=Arizona; L=Scottsdale; O=GoDaddy.com, Inc.; OU=http://certs.godaddy.com/repository/; CN=Go Daddy Secure Certificate Authority - G2
    *  SSL certificate verify ok.
    > POST /bot990060833:I_CAN_SEND_YOU_THE_TOKEN/sendPhoto HTTP/1.1
    > Host: api.telegram.org
    > User-Agent: curl/7.66.0
    > Accept: */*
    > Content-Length: 4142
    > Content-Type: multipart/form-data; boundary=------------------------c5bb5ff482d768bf
    > Expect: 100-continue
    > 
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 100 Continue
    * We are completely uploaded and fine
    * OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104
    * Closing connection 0
    curl: (56) OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104
    

    I cannot reproduce it locally, even with the same runsc and Docker versions/binaries.

    Let me know if more information is needed (the API token, for example)!

    Thanks!

    type: bug area: networking 
    opened by angelbarrera92 26
  • Operation timed out when using iperf3

    Hi everyone,

    I am experiencing a connectivity problem when using iperf3. I am booting a Docker container with the following command: docker run -dit --name alpine1 --runtime=runsc -p 52022:22 -p 42022:5201 alpine ash

    After configuring SSH and a few other things, I am able to SSH into the container from the remote machine, no problem. However, when I start an iperf3 server inside that container and run the client from a remote host, it just hangs and times out.

    From the client side I cannot see any output: iperf3 -c 10.90.36.40 -p 42022 -V

    From the server side, I see:

     iperf3 -s
     -----------------------------------------------------------
     Server listening on 5201
     -----------------------------------------------------------
     iperf3: error - unable to receive parameters from client: Operation timed out
     

    This exact setup works perfectly with the default docker runtime.

    Host system: 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
    Docker version 18.09.2, build 6247962
    runsc version release-20190304.1-112-g4209edafb6a9, spec: 1.0.1-dev
    iperf 3.0.11

    type: bug area: networking priority: p1 
    opened by ustiugov 25
  • userns-remap default is unsupported

    What I would consider an important security setting appears to be unsupported by runsc. When I set "userns-remap": "default" in the daemon conf file, trying to run any container with runsc as the runtime causes it to fail (the default runtime works fine).

    Commands attempted:

    $ docker run --runtime=runsc --rm -v /root/:/tmp/root -u root -it ubuntu bash
    docker: Error response from daemon: OCI runtime create failed: /var/lib/docker/100000.100000/runtimes/runsc did not terminate sucessfully: reading spec: mount option "noexec" is not supported: &{/dev/shm bind /var/lib/docker/100000.100000/containers/5d4b42d548e56a6057c078c0605b1127ec6d0a70e92355a5f06877ef6abacc1a/mounts/shm [rbind rprivate noexec nosuid nodev]}
    
    $ docker run --runtime=runsc hello-world
    docker: Error response from daemon: OCI runtime create failed: /var/lib/docker/100000.100000/runtimes/runsc did not terminate sucessfully: unknown.
    ERRO[0000] error waiting for container: context canceled 
    

    Docker version:

    Client:
     Version:           18.09.3
     API version:       1.39
     Go version:        go1.12
     Git commit:        
     Built:             Sun Mar 10 23:16:06 2019
     OS/Arch:           linux/amd64
     Experimental:      false
    
    Server:
     Engine:
      Version:          18.09.3
      API version:      1.39 (minimum version 1.12)
      Go version:       go1.12
      Git commit:       v18.09.3
      Built:            Sun Mar 10 23:16:06 2019
      OS/Arch:          linux/amd64
      Experimental:     false
    

    uname -a (Void Linux): Linux void-nvme2 4.20.16_1 #1 SMP PREEMPT Thu Mar 14 20:39:59 UTC 2019 x86_64 GNU/Linux

    Debug Log for ubuntu container:

    I0320 18:50:48.840255    8593 x:0] ***************************
    I0320 18:50:48.840315    8593 x:0] Args: [/usr/local/bin/runsc --debug-log=/tmp/runsc/ --debug --strace --root /var/run/docker/runtime-runsc/moby --log /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/5d4b42d548e56a6057c078c0605b1127ec6d0a70e92355a5f06877ef6abacc1a/log.json --log-format json create --bundle /var/run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/5d4b42d548e56a6057c078c0605b1127ec6d0a70e92355a5f06877ef6abacc1a --pid-file /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/5d4b42d548e56a6057c078c0605b1127ec6d0a70e92355a5f06877ef6abacc1a/init.pid --console-socket /tmp/pty571071343/pty.sock 5d4b42d548e56a6057c078c0605b1127ec6d0a70e92355a5f06877ef6abacc1a]
    I0320 18:50:48.840337    8593 x:0] Git Revision: 87cce0ec08b9d629a5e3a88be411b1721d767301
    I0320 18:50:48.840346    8593 x:0] PID: 8593
    I0320 18:50:48.840355    8593 x:0] UID: 0, GID: 0
    I0320 18:50:48.840362    8593 x:0] Configuration:
    I0320 18:50:48.840369    8593 x:0] 		RootDir: /var/run/docker/runtime-runsc/moby
    I0320 18:50:48.840376    8593 x:0] 		Platform: ptrace
    I0320 18:50:48.840386    8593 x:0] 		FileAccess: exclusive, overlay: false
    I0320 18:50:48.840395    8593 x:0] 		Network: sandbox, logging: false
    I0320 18:50:48.840403    8593 x:0] 		Strace: true, max size: 1024, syscalls: []
    I0320 18:50:48.840411    8593 x:0] ***************************
    W0320 18:50:48.841643    8593 x:0] AppArmor profile "docker-default" is being ignored
    W0320 18:50:48.841659    8593 x:0] Seccomp spec is being ignored
    W0320 18:50:48.841683    8593 x:0] FATAL ERROR: reading spec: mount option "noexec" is not supported: &{/dev/shm bind /var/lib/docker/100000.100000/containers/5d4b42d548e56a6057c078c0605b1127ec6d0a70e92355a5f06877ef6abacc1a/mounts/shm [rbind rprivate noexec nosuid nodev]}
    
    type: bug area: container runtime priority: p2 
    opened by D-Nice 22
  • Go branch does not build on arm64

    Go branch does not build on arm64

    Description

    gvisor.dev/gvisor/pkg/sentry/platform/ring0/pagetables

    ../pkg/sentry/platform/ring0/pagetables/walker_empty.go:121:14: pudEntry.SetSuper undefined (type *PTE has no field or method SetSuper)
    ../pkg/sentry/platform/ring0/pagetables/walker_empty.go:132:22: pudEntry.IsSuper undefined (type *PTE has no field or method IsSuper)
    ../pkg/sentry/platform/ring0/pagetables/walker_empty.go:138:24: pmdEntries[index].SetSuper undefined (type PTE has no field or method SetSuper)
    ../pkg/sentry/platform/ring0/pagetables/walker_empty.go:175:15: pmdEntry.SetSuper undefined (type *PTE has no field or method SetSuper)
    ../pkg/sentry/platform/ring0/pagetables/walker_empty.go:186:23: pmdEntry.IsSuper undefined (type *PTE has no field or method IsSuper)
    ../pkg/sentry/platform/ring0/pagetables/walker_lookup.go:121:14: pudEntry.SetSuper undefined (type *PTE has no field or method SetSuper)
    ../pkg/sentry/platform/ring0/pagetables/walker_lookup.go:132:22: pudEntry.IsSuper undefined (type *PTE has no field or method IsSuper)
    ../pkg/sentry/platform/ring0/pagetables/walker_lookup.go:138:24: pmdEntries[index].SetSuper undefined (type PTE has no field or method SetSuper)
    ../pkg/sentry/platform/ring0/pagetables/walker_lookup.go:175:15: pmdEntry.SetSuper undefined (type *PTE has no field or method SetSuper)
    ../pkg/sentry/platform/ring0/pagetables/walker_lookup.go:186:23: pmdEntry.IsSuper undefined (type *PTE has no field or method IsSuper)
    ../pkg/sentry/platform/ring0/pagetables/walker_lookup.go:186:23: too many errors

    Steps to reproduce

    make runsc

    Environment

    [email protected]:~/gvisor/runsc# go version
    go version go1.15 linux/arm64
    [email protected]:~/gvisor/runsc# uname -a
    Linux cloud 5.5.19-050519-generic #202004210831 SMP Tue Apr 21 08:49:56 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
    [email protected]:~/gvisor/runsc#

    type: bug 
    opened by magnate3 21
  • Add basic support for cgroupv2 in gvisor

    Add basic support for cgroupv2 in gvisor

    Fix #3481

    This adds support for cgroupv2 based on libcontainer's fs2 package. It should work as-is with containerd's CRI setup, and works with docker configured with cgroupfs cgroup driver.

    I added a Vagrantfile based on fedora33 so we have an "easy-to-test" cgroupv2 environment.

    cla: no 
    opened by dqminh 20
  • Add bitmap library and apply bitmap in fd_table

    Add bitmap library and apply bitmap in fd_table

    Provides a bitmap implementation and applies it in fd_table to record open FDs. This speeds up FD allocation for fdtable: with the bitmap, finding an available FD is O(1).

    Fixes #6136

    Current tests status:

    • [x] unit-tests
    • [x] nogo-tests
    • [x] runsc-tests
    • [x] syscall-tests kvm
    • [x] syscall-tests ptrace

    Here is the benchmark for this PR.

    1. Time to create 10000 sockets

    | original | bitmap | runc |
    | --- | --- | --- |
    | 1.076s | 0.700s | 0.070s |

    2. Time of some syscall tests (each test run five times, averaged)

    | test | original | bitmap |
    | --- | --- | --- |
    | ping_socket_test_runsc_kvm_vfs2 | 31.8s | 26.5s |
    | socket_inet_loopback_nogotsan_test_runsc_kvm_vfs2 | 51.2s | 44.5s |
    | socket_ipv4_udp_unbound_loopback_nogotsan_test_runsc_kvm_vfs2 | 26.9s | 18.1s |

    Signed-off-by: Howard Zhang [email protected]

    cla: yes ready to pull 
    opened by zhlhahaha 19
  • tcpip/stack: (*nic).DeliverNetworkPacket godoc out of date

    tcpip/stack: (*nic).DeliverNetworkPacket godoc out of date

    Description

    The godoc on tcpip/stack.(*nic).DeliverNetworkPacket at HEAD is out of date and refers to a vv parameter that no longer exists:

    https://github.com/google/gvisor/blob/2a62f437960641b655f790cfb13ca14ca6a7478d/pkg/tcpip/stack/nic.go#L700

    Presumably vv was a vectorized view in the past and it's now a PacketBuffer.

    type: bug 
    opened by bradfitz 0
  • Correct sharedmem stress test flakiness.

    Correct sharedmem stress test flakiness.

    This change increases the test size to decrease the likelihood of a timeout. It also moves queue clearing to when the dispatcher receives a close notification, rather than after it returns.

    exported 
    opened by copybara-service[bot] 0
  • Add a smoke test for runsc-race

    Add a smoke test for runsc-race

    ready to pull 
    opened by avagin 0
  • Do not reject TCP SYN w/ ECN flags set.

    Do not reject TCP SYN w/ ECN flags set.

    This change does not add support for ECN flags in Netstack. It just ensures we don't reject valid SYN packets with ECN bits set.

    For a complete ECN implementation we would need to implement https://datatracker.ietf.org/doc/html/rfc3168#section-6, as well as the IP-level handling to set the ECN bits accordingly.

    Fixes #7075

    exported 
    opened by copybara-service[bot] 0
  • Support receiving ttl/hoplimit control message

    Support receiving ttl/hoplimit control message

    exported 
    opened by copybara-service[bot] 0
  • TCP Forwarder.HandlePacket doesn't handle SYNs with ECN/CWR bits set

    TCP Forwarder.HandlePacket doesn't handle SYNs with ECN/CWR bits set

    Description

    The gVisor netstack TCP forwarder mishandles initial SYN packets with ECN bits set.

    gVisor doesn't support ECN (#995) but it should just ignore ECN, not send RSTs as it does today.

    Ever since the first public gVisor commit on GitHub (I'm not sure about the google tree history), the TCP forwarder code has required that the SYN flags be exactly SYN, without other bits set.

    The code is currently:

    https://github.com/google/gvisor/blob/5fb52763235857427fef62ef227b26f1fad4de2c/pkg/tcpip/transport/tcp/forwarder.go#L72

    This means it bails out early, resulting in a RST from the caller:

            // We only care about well-formed SYN packets.                                                                        
            if !s.parse(pkt.RXTransportChecksumValidated) || !s.csumValid || s.flags != header.TCPFlagSyn {
                    return false
            }
    

    That s.flags != header.TCPFlagSyn check is not right. It should probably be s.flags&0x3F != header.TCPFlagSyn instead, or something more readable.
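The suggested mask works because the ECN bits (ECE and CWR) occupy the top two bits of the TCP flags byte, so `& 0x3F` clears exactly those before comparing against SYN. A minimal sketch of the idea, with standalone flag constants (the real code uses gVisor's `header` package):

```go
package main

import "fmt"

// TCP header flag bits (RFC 793; ECE/CWR added by RFC 3168).
const (
	flagFin = 0x01
	flagSyn = 0x02
	flagRst = 0x04
	flagPsh = 0x08
	flagAck = 0x10
	flagUrg = 0x20
	flagEce = 0x40 // ECN-Echo
	flagCwr = 0x80 // Congestion Window Reduced
)

// isPlainSyn reports whether flags represent an initial SYN,
// ignoring the two ECN bits as the issue proposes.
func isPlainSyn(flags uint8) bool {
	return flags&0x3F == flagSyn
}

func main() {
	fmt.Println(isPlainSyn(flagSyn))                     // true
	fmt.Println(isPlainSyn(flagSyn | flagEce | flagCwr)) // true: ECN SYN accepted
	fmt.Println(isPlainSyn(flagSyn | flagAck))           // false: SYN-ACK still rejected
}
```

An ECN-capable client sends SYN with both ECE and CWR set (RFC 3168 §6.1.1), which is exactly the packet the strict equality check was rejecting.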

    (More debugging details in https://github.com/tailscale/tailscale/issues/2642)

    Steps to reproduce

    • on machine 1, run a webserver listening on localhost:8080 (or another port) on the Tailscale box, in any language/server, or just nc -l -p 8080.
    • on machine 1, use Tailscale's tailscaled daemon in --tun=userspace-networking mode (which forces gVisor/netstack)
    • on machine 2, a Linux box with Tailscale, force ECN with sudo sysctl net.ipv4.tcp_ecn=1
    • on machine 2, curl $machine1-tailscale-ip:8080

    Observe TCP RSTs arrive.

    runsc version

    n/a
    

    docker version (if using docker)

    n/a
    

    uname

    Not OS-specific.

    kubectl (if using Kubernetes)

    n/a
    

    repo state (if built from source)

    Bug exists at HEAD (5fb52763235857427fef62ef227b26f1fad4de2c)

    runsc debug logs (if available)

    n/a
    
    type: bug area: compatibility area: networking 
    opened by bradfitz 6
  • Clean documentation and add go vet support for checklocks.

    Clean documentation and add go vet support for checklocks.

    This makes it easier to iterate with checklocks. This pattern will be duplicated with more complex analyzers.

    exported 
    opened by copybara-service[bot] 0
  • Add leak checking to fdbased_test.

    Add leak checking to fdbased_test.

    exported 
    opened by copybara-service[bot] 0
  • Increase buildkite parallelism.

    Increase buildkite parallelism.

    Since there is very little wasted work for Buildkite, increasing the parallelism will decrease throw-away work on cancelation or failure.

    This aims to achieve ~3 minutes per individual test instance.

    exported 
    opened by copybara-service[bot] 0
  • cpuid: deflake cpuid_test

    cpuid: deflake cpuid_test

    xsavec, xgetbv1 are in Sub-leaf (EAX = 0DH, ECX = 1).

    exported 
    opened by copybara-service[bot] 0