Amazon Elastic Container Service Agent

Overview

Amazon ECS Container Agent

The Amazon ECS Container Agent is a component of Amazon Elastic Container Service (Amazon ECS) and is responsible for managing containers on behalf of Amazon ECS.

Usage

The best source of information on running this software is the Amazon ECS documentation.

Starting with Agent version 1.20.0, the minimum required Docker version is 1.9.0, which corresponds to Docker API version 1.21. For more information, see Amazon ECS Container Agent Versions.

On the Amazon Linux AMI

On the Amazon Linux AMI, we provide an installable RPM which can be used via sudo yum install ecs-init && sudo start ecs. This is the recommended way to run it in this environment.

On Other Linux AMIs

The Amazon ECS Container Agent may also be run in a Docker container on an EC2 instance with a recent Docker version installed. A Docker image is available in our Docker Hub Repository.

$ # Set up directories the agent uses
$ mkdir -p /var/log/ecs /etc/ecs /var/lib/ecs/data
$ touch /etc/ecs/ecs.config
$ # Set up necessary rules to enable IAM roles for tasks
$ sysctl -w net.ipv4.conf.all.route_localnet=1
$ iptables -t nat -A PREROUTING -p tcp -d 169.254.170.2 --dport 80 -j DNAT --to-destination 127.0.0.1:51679
$ iptables -t nat -A OUTPUT -d 169.254.170.2 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 51679
$ # Run the agent
$ docker run --name ecs-agent \
    --detach=true \
    --restart=on-failure:10 \
    --volume=/var/run/docker.sock:/var/run/docker.sock \
    --volume=/var/log/ecs:/log \
    --volume=/var/lib/ecs/data:/data \
    --net=host \
    --env-file=/etc/ecs/ecs.config \
    --env=ECS_LOGFILE=/log/ecs-agent.log \
    --env=ECS_DATADIR=/data/ \
    --env=ECS_ENABLE_TASK_IAM_ROLE=true \
    --env=ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true \
    amazon/amazon-ecs-agent:latest
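
The --env-file flag above reads agent settings from /etc/ecs/ecs.config, which the touch command leaves empty. As a minimal sketch, the file can be filled with KEY=value lines before the agent is started. The helper name write_ecs_config, the cluster name my-cluster, and the chosen values are assumptions made for illustration; the keys themselves are taken from the Environment Variables table later in this document.

```shell
# Sketch only: write a small agent config file in Docker env-file format.
# write_ecs_config and "my-cluster" are hypothetical names for illustration.
write_ecs_config() {
  config_path="$1"
  cluster_name="$2"
  cat > "$config_path" <<EOF
ECS_CLUSTER=${cluster_name}
ECS_LOGLEVEL=info
ECS_RESERVED_MEMORY=32
ECS_ENABLE_SPOT_INSTANCE_DRAINING=true
EOF
}

# Write to a scratch file here. On an instance the target would be
# /etc/ecs/ecs.config, which normally requires root privileges.
demo_config="$(mktemp)"
write_ecs_config "$demo_config" "my-cluster"
cat "$demo_config"
```

Values passed with --env on the docker run command line are applied in addition to the ones in this file.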

On Other Linux AMIs when awsvpc networking mode is enabled

For the awsvpc networking mode, the ECS agent requires the CNI plugins and dhclient to be available. ECS also needs ecs-init to run as part of its startup. The following is an example docker run configuration for running ecs-agent with Task ENI enabled. Note that the ECS agent currently supports only cgroupfs as the cgroup driver.

$ # Run the agent
$ /usr/bin/docker run --name ecs-agent \
--init \
--restart=on-failure:10 \
--volume=/var/run:/var/run \
--volume=/var/log/ecs/:/log:Z \
--volume=/var/lib/ecs/data:/data:Z \
--volume=/etc/ecs:/etc/ecs \
--volume=/sbin:/host/sbin \
--volume=/lib:/lib \
--volume=/lib64:/lib64 \
--volume=/usr/lib:/usr/lib \
--volume=/usr/lib64:/usr/lib64 \
--volume=/proc:/host/proc \
--volume=/sys/fs/cgroup:/sys/fs/cgroup \
--net=host \
--env-file=/etc/ecs/ecs.config \
--cap-add=sys_admin \
--cap-add=net_admin \
--env ECS_ENABLE_TASK_ENI=true \
--env ECS_UPDATES_ENABLED=true \
--env ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1h \
--env ECS_DATADIR=/data \
--env ECS_ENABLE_TASK_IAM_ROLE=true \
--env ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true \
--env ECS_LOGFILE=/log/ecs-agent.log \
--env ECS_AVAILABLE_LOGGING_DRIVERS='["json-file","awslogs","syslog","none"]' \
--env ECS_LOGLEVEL=info \
--detach \
amazon/amazon-ecs-agent:latest
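
Before starting the agent this way, it can help to confirm the two host requirements named above: dhclient is present and Docker uses the cgroupfs cgroup driver. The following is a rough pre-flight sketch, not part of the agent or ecs-init. The function names are hypothetical, and it assumes dhclient is on the PATH of the user running the check.

```shell
# Sketch of a pre-flight check; function names are hypothetical.

# Succeeds only for the one cgroup driver the text above says is supported.
cgroup_driver_supported() {
  [ "$1" = "cgroupfs" ]
}

preflight_awsvpc() {
  if ! command -v dhclient >/dev/null 2>&1; then
    echo "dhclient not found on PATH"
  fi
  if command -v docker >/dev/null 2>&1; then
    # docker info reports the daemon's cgroup driver.
    driver="$(docker info --format '{{.CgroupDriver}}' 2>/dev/null)"
    if cgroup_driver_supported "$driver"; then
      echo "cgroup driver ok: $driver"
    else
      echo "unsupported cgroup driver: ${driver:-unknown}"
    fi
  else
    echo "docker not found on PATH"
  fi
}

preflight_awsvpc
```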

See also the Advanced Usage section below.

On the ECS Optimized Windows AMI

The ECS Optimized Windows AMI ships with a pre-installed PowerShell module called ECSTools to install, configure, and run the ECS Agent as a Windows service. To install the service, run the following PowerShell commands on an EC2 instance. To launch into a cluster other than windows, replace 'windows' in the script below with the name of your cluster.

PS C:\> Import-Module ECSTools
PS C:\> # The -EnableTaskIAMRole option is required to enable IAM roles for tasks.
PS C:\> Initialize-ECSAgent -Cluster 'windows' -EnableTaskIAMRole

Downloading a Different Version of the ECS Agent

To download a different version of the ECS Agent, you can do the following:

PS C:\> # use agentVersion = "latest" for the latest available agent version
PS C:\> $agentVersion = "v1.20.4"
PS C:\> Initialize-ECSAgent -Cluster 'windows' -EnableTaskIAMRole -Version $agentVersion

Advanced Usage

The Amazon ECS Container Agent supports a number of configuration options, most of which should be set through environment variables.

Environment Variables

The table below provides an overview of optional environment variables that can be used to configure the ECS agent. See the Amazon ECS developer guide for additional details on each available environment variable.

Environment Key Example Value(s) Description Default value on Linux Default value on Windows
ECS_CLUSTER clusterName The cluster this agent should check into. default default
ECS_RESERVED_PORTS [22, 80, 5000, 8080] An array of ports that should be marked as unavailable for scheduling on this container instance. [22, 2375, 2376, 51678, 51679] [53, 135, 139, 445, 2375, 2376, 3389, 5985, 5986, 51678, 51679]
ECS_RESERVED_PORTS_UDP [53, 123] An array of UDP ports that should be marked as unavailable for scheduling on this container instance. [] []
ECS_ENGINE_AUTH_TYPE "docker" | "dockercfg" The type of auth data that is stored in the ECS_ENGINE_AUTH_DATA key.
ECS_ENGINE_AUTH_DATA See the dockerauth documentation Docker auth data formatted as defined by ECS_ENGINE_AUTH_TYPE.
AWS_DEFAULT_REGION <us-west-2>|<us-east-1>|… The region to be used in API requests as well as to infer the correct backend host. Taken from Amazon EC2 instance metadata. Taken from Amazon EC2 instance metadata.
AWS_ACCESS_KEY_ID AKIDEXAMPLE The access key used by the agent for all calls. Taken from Amazon EC2 instance metadata. Taken from Amazon EC2 instance metadata.
AWS_SECRET_ACCESS_KEY EXAMPLEKEY The secret key used by the agent for all calls. Taken from Amazon EC2 instance metadata. Taken from Amazon EC2 instance metadata.
AWS_SESSION_TOKEN The session token used for temporary credentials. Taken from Amazon EC2 instance metadata. Taken from Amazon EC2 instance metadata.
DOCKER_HOST unix:///var/run/docker.sock Used to create a connection to the Docker daemon; behaves similarly to this environment variable as used by the Docker client. unix:///var/run/docker.sock npipe:////./pipe/docker_engine
ECS_LOGLEVEL <crit> | <error> | <warn> | <info> | <debug> The level of detail to be logged. info info
ECS_LOGLEVEL_ON_INSTANCE <none> | <crit> | <error> | <warn> | <info> | <debug> Can be used to override ECS_LOGLEVEL and set a level of detail that should be logged in the on-instance log file, separate from the level that is logged in the logging driver. If a logging driver is explicitly set, on-instance logs are turned off by default, but can be turned back on with this variable. none if ECS_LOG_DRIVER is explicitly set to a non-empty value; otherwise the same value as ECS_LOGLEVEL none if ECS_LOG_DRIVER is explicitly set to a non-empty value; otherwise the same value as ECS_LOGLEVEL
ECS_LOGFILE /ecs-agent.log The location where logs should be written. Log level is controlled by ECS_LOGLEVEL. blank blank
ECS_CHECKPOINT <true | false> Whether to checkpoint state to the DATADIR specified below. true if ECS_DATADIR is explicitly set to a non-empty value; false otherwise true if ECS_DATADIR is explicitly set to a non-empty value; false otherwise
ECS_DATADIR /data/ The container path where state is checkpointed for use across agent restarts. Note that on Linux, when you specify this, you will need to make sure that the Agent container has a bind mount of $ECS_HOST_DATA_DIR/data:$ECS_DATADIR with the corresponding values of ECS_HOST_DATA_DIR and ECS_DATADIR. /data/ C:\ProgramData\Amazon\ECS\data
ECS_UPDATES_ENABLED <true | false> Whether to exit for an updater to apply updates when requested. false false
ECS_DISABLE_METRICS <true | false> Whether to disable metrics gathering for tasks. false true
ECS_POLL_METRICS <true | false> Whether to poll or stream when gathering metrics for tasks. Setting this value to true can help reduce the CPU usage of dockerd and containerd on the ECS container instance. See also ECS_POLLING_METRICS_WAIT_DURATION for setting the poll interval. false false
ECS_POLLING_METRICS_WAIT_DURATION 10s Time to wait between polling for metrics for a task. Not used when ECS_POLL_METRICS is false. Maximum value is 20s and minimum value is 5s. A value above the maximum is set to the maximum, and a value below the minimum is set to the minimum. 10s 10s
ECS_PULL_DEPENDENT_CONTAINERS_UPFRONT <true | false> Whether to pull images for containers with dependencies before the dependsOn condition has been satisfied. false false
ECS_RESERVED_MEMORY 32 Memory, in MiB, to reserve for use by things other than containers managed by Amazon ECS. 0 0
ECS_AVAILABLE_LOGGING_DRIVERS ["awslogs","fluentd","gelf","json-file","journald","logentries","splunk","syslog"] Which logging drivers are available on the container instance. ["json-file","none"] ["json-file","none"]
ECS_DISABLE_PRIVILEGED true Whether launching privileged containers is disabled on the container instance. false false
ECS_SELINUX_CAPABLE true Whether SELinux is available on the container instance. false false
ECS_APPARMOR_CAPABLE true Whether AppArmor is available on the container instance. false false
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION 10m Default time to wait to delete containers for a stopped task (see also ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION_JITTER). If set to less than 1 minute, the value is ignored. 3h 3h
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION_JITTER 1h Jitter value for the task engine cleanup wait duration. When specified, the actual cleanup wait duration time for each task will be the duration specified in ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION plus a random duration between 0 and the jitter duration. blank blank
ECS_CONTAINER_STOP_TIMEOUT 10m Instance scoped configuration for time to wait for the container to exit normally before being forcibly killed. 30s 30s
ECS_CONTAINER_START_TIMEOUT 10m Timeout before giving up on starting a container. 3m 8m
ECS_CONTAINER_CREATE_TIMEOUT 10m Timeout before giving up on creating a container. Minimum value is 1m. A value below the minimum is set to the minimum. 4m 4m
ECS_ENABLE_TASK_IAM_ROLE true Whether to enable IAM Roles for Tasks on the Container Instance false false
ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST true Whether to enable IAM Roles for Tasks when launched with host network mode on the Container Instance false false
ECS_DISABLE_IMAGE_CLEANUP true Whether to disable automated image cleanup for the ECS Agent. false false
ECS_IMAGE_CLEANUP_INTERVAL 30m The time interval between automated image cleanup cycles. If set to less than 10 minutes, the value is ignored. 30m 30m
ECS_IMAGE_MINIMUM_CLEANUP_AGE 30m The minimum time interval between when an image is pulled and when it can be considered for automated image cleanup. 1h 1h
NON_ECS_IMAGE_MINIMUM_CLEANUP_AGE 30m The minimum time interval between when a non ECS image is created and when it can be considered for automated image cleanup. 1h 1h
ECS_NUM_IMAGES_DELETE_PER_CYCLE 5 The maximum number of images to delete in a single automated image cleanup cycle. If set to less than 1, the value is ignored. 5 5
ECS_IMAGE_PULL_BEHAVIOR <default | always | once | prefer-cached > The behavior used to customize the pull image process. If default is specified, the image will be pulled remotely, if the pull fails then the cached image in the instance will be used. If always is specified, the image will be pulled remotely, if the pull fails then the task will fail. If once is specified, the image will be pulled remotely if it has not been pulled before or if the image was removed by image cleanup, otherwise the cached image in the instance will be used. If prefer-cached is specified, the image will be pulled remotely if there is no cached image, otherwise the cached image in the instance will be used. default default
ECS_IMAGE_PULL_INACTIVITY_TIMEOUT 1m The time to wait after docker pulls complete waiting for extraction of a container. Useful for tuning large Windows containers. 1m 3m
ECS_IMAGE_PULL_TIMEOUT 1h The time to wait for pulling docker image. 2h 2h
ECS_INSTANCE_ATTRIBUTES {"stack": "prod"} These attributes take effect only during initial registration. After the agent has joined an ECS cluster, use the PutAttributes API action to add additional attributes. For more information, see Amazon ECS Container Agent Configuration in the Amazon ECS Developer Guide. {} {}
ECS_ENABLE_TASK_ENI false Whether to enable task networking for task to be launched with its own network interface false Not applicable
ECS_ENABLE_HIGH_DENSITY_ENI false Whether to enable high density eni feature when using task networking true Not applicable
ECS_CNI_PLUGINS_PATH /ecs/cni The path where the cni binary file is located /amazon-ecs-cni-plugins Not applicable
ECS_AWSVPC_BLOCK_IMDS true Whether to block access to Instance Metadata for Tasks started with awsvpc network mode false Not applicable
ECS_AWSVPC_ADDITIONAL_LOCAL_ROUTES ["10.0.15.0/24"] In awsvpc network mode, traffic to these prefixes will be routed via the host bridge instead of the task ENI [] Not applicable
ECS_ENABLE_CONTAINER_METADATA true When true, the agent will create a file describing the container's metadata, and the file can be located and consumed by using the container environment variable $ECS_CONTAINER_METADATA_FILE false false
ECS_HOST_DATA_DIR /var/lib/ecs The source directory on the host from which ECS_DATADIR is mounted. We use this to determine the source mount path for container metadata files in the case the ECS Agent is running as a container. We do not use this value in Windows because the ECS Agent is not running as container in Windows. On Linux, note that when you specify this, you will need to make sure that the Agent container has a bind mount of $ECS_HOST_DATA_DIR/data:$ECS_DATADIR with the corresponding values of ECS_HOST_DATA_DIR and ECS_DATADIR. /var/lib/ecs Not used
ECS_ENABLE_TASK_CPU_MEM_LIMIT true Whether to enable task-level cpu and memory limits true false
ECS_CGROUP_PATH /sys/fs/cgroup The root cgroup path that is expected by the ECS agent. This is the path that is accessible from the agent mount. /sys/fs/cgroup Not applicable
ECS_CGROUP_CPU_PERIOD 10ms CGroups CPU period for task level limits. This value should be between 8ms and 100ms. 100ms Not applicable
ECS_AGENT_HEALTHCHECK_HOST localhost Override for the ecs-agent container's healthcheck localhost ip address localhost localhost
ECS_ENABLE_CPU_UNBOUNDED_WINDOWS_WORKAROUND true When true, ECS will allow CPU unbounded (CPU=0) tasks to run along with CPU bounded tasks in Windows. Not applicable false
ECS_ENABLE_MEMORY_UNBOUNDED_WINDOWS_WORKAROUND true When true, ECS will ignore the memory reservation parameter (soft limit) to run along with memory bounded tasks in Windows. To run a memory unbounded task, omit the memory hard limit and set any memory reservation, it will be ignored. Not applicable false
ECS_TASK_METADATA_RPS_LIMIT 100,150 Comma separated integer values for steady state and burst throttle limits for task metadata endpoint 40,60 40,60
ECS_SHARED_VOLUME_MATCH_FULL_CONFIG true When true, ECS Agent will compare name, driver options, and labels to make sure volumes are identical. When false, Agent will short circuit shared volume comparison if the names match. This is the default Docker behavior. If a volume is shared across instances, this should be set to false. false false
ECS_CONTAINER_INSTANCE_PROPAGATE_TAGS_FROM ec2_instance If ec2_instance is specified, existing tags defined on the container instance will be registered to Amazon ECS and will be discoverable using the ListTagsForResource API. Using this requires that the IAM role associated with the container instance have the ec2:DescribeTags action allowed. none none
ECS_CONTAINER_INSTANCE_TAGS {"tag_key": "tag_val"} The metadata that you apply to the container instance to help you categorize and organize them. Each tag consists of a key and an optional value, both of which you define. Tag keys can have a maximum character length of 128 characters, and tag values can have a maximum length of 256 characters. If tags also exist on your container instance that are propagated using the ECS_CONTAINER_INSTANCE_PROPAGATE_TAGS_FROM parameter, those tags will be overwritten by the tags specified using ECS_CONTAINER_INSTANCE_TAGS. {} {}
ECS_ENABLE_UNTRACKED_IMAGE_CLEANUP true Whether to allow the ECS agent to delete containers and images that are not part of ECS tasks. false false
ECS_EXCLUDE_UNTRACKED_IMAGE alpine:latest Comma-separated list of imageName:tag of images that should not be deleted by the ECS agent if ECS_ENABLE_UNTRACKED_IMAGE_CLEANUP is enabled.
ECS_DISABLE_DOCKER_HEALTH_CHECK false Whether to disable the Docker Container health check for the ECS Agent. false false
ECS_NVIDIA_RUNTIME nvidia The Nvidia Runtime to be used to pass Nvidia GPU devices to containers. nvidia Not Applicable
ECS_ENABLE_SPOT_INSTANCE_DRAINING true Whether to enable Spot Instance draining for the container instance. If true, if the container instance receives a spot interruption notice, agent will set the instance's status to DRAINING, which gracefully shuts down and replaces all tasks running on the instance that are part of a service. It is recommended that this be set to true when using spot instances. false false
ECS_LOG_ROLLOVER_TYPE size | hourly Determines whether the container agent logfile will be rotated based on size or hourly. By default, the agent logfile is rotated each hour. hourly hourly
ECS_LOG_OUTPUT_FORMAT logfmt | json Determines the log output format. When the json format is used, each line in the log would be a structured JSON map. logfmt logfmt
ECS_LOG_MAX_FILE_SIZE_MB 10 When the ECS_LOG_ROLLOVER_TYPE variable is set to size, this variable determines the maximum size (in MB) the log file before it is rotated. If the rollover type is set to hourly then this variable is ignored. 10 10
ECS_LOG_MAX_ROLL_COUNT 24 Determines the number of rotated log files to keep. Older log files are deleted once this limit is reached. 24 24
ECS_LOG_DRIVER awslogs | fluentd | gelf | json-file | journald | logentries | syslog | splunk The logging driver to be used by the Agent container. json-file Not applicable
ECS_LOG_OPTS {"option":"value"} The options for configuring the logging driver set in ECS_LOG_DRIVER. {} Not applicable
ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE true Whether to enable awslogs log driver to authenticate via credentials of task execution IAM role. Needs to be true if you want to use awslogs log driver in a task that has task execution IAM role specified. When using the ecs-init RPM with version equal or later than V1.16.0-1, this env is set to true by default. false false
ECS_FSX_WINDOWS_FILE_SERVER_SUPPORTED true Whether FSx for Windows File Server volume type is supported on the container instance. This variable is only supported on agent versions 1.47.0 and later. false true
ECS_ENABLE_RUNTIME_STATS true Determines if pprof is enabled for the agent. If enabled, the different profiles can be accessed through the agent's introspection port (e.g. curl http://localhost:51678/debug/pprof/heap > heap.pprof). In addition, agent's runtime stats are logged to /var/log/ecs/runtime-stats.log file. false false
ECS_EXCLUDE_IPV6_PORTBINDING true Determines if agent should exclude IPv6 port binding using default network mode. If enabled, IPv6 port binding will be filtered out, and the response of DescribeTasks API call will not show tasks' IPv6 port bindings, but it is still included in Task metadata endpoint. true true
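
Two of the rules in the table lend themselves to a short illustration: the clamping of ECS_POLLING_METRICS_WAIT_DURATION to its 5s to 20s range, and the choice made for each ECS_IMAGE_PULL_BEHAVIOR value. The sketch below only restates the table's wording as shell functions. The function names and the yes/no arguments are hypothetical, the "cached" argument stands in for "the image has not been removed by image cleanup", and none of this is the agent's own code.

```shell
# Illustrative sketch only; these helpers are hypothetical and are not the
# agent's own code. They restate two rules from the table above.

# Clamp a polling interval (whole seconds) to the documented 5s..20s range.
clamp_poll_seconds() {
  seconds="$1"
  if [ "$seconds" -lt 5 ]; then
    echo 5
  elif [ "$seconds" -gt 20 ]; then
    echo 20
  else
    echo "$seconds"
  fi
}

# Describe what happens for an ECS_IMAGE_PULL_BEHAVIOR value.
# Args: <behavior> <cached: yes|no> <pulled_before: yes|no>
pull_action() {
  behavior="$1"
  cached="$2"
  pulled_before="$3"
  case "$behavior" in
    default)
      echo "pull; if the pull fails, use the cached image" ;;
    always)
      echo "pull; if the pull fails, fail the task" ;;
    once)
      # Pull unless the image was pulled before and is still cached.
      if [ "$pulled_before" = "yes" ] && [ "$cached" = "yes" ]; then
        echo "use the cached image"
      else
        echo "pull"
      fi ;;
    prefer-cached)
      if [ "$cached" = "yes" ]; then
        echo "use the cached image"
      else
        echo "pull"
      fi ;;
    *)
      echo "unknown behavior: $behavior" >&2
      return 1 ;;
  esac
}

clamp_poll_seconds 30
pull_action once yes yes
pull_action prefer-cached no no
```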

Persistence

When you run the Amazon ECS Container Agent in production, its datadir should be persisted between runs of the Docker container. If this data is not persisted, the agent registers a new container instance ARN on each launch and is not able to update the state of tasks it previously ran.
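
In the docker run examples earlier, persistence comes from a bind mount of the host data directory onto the agent's datadir. As a small sketch, the flag can be derived from ECS_HOST_DATA_DIR and ECS_DATADIR following the $ECS_HOST_DATA_DIR/data:$ECS_DATADIR pattern given in the table. The helper name datadir_volume_flag is hypothetical, and /mnt/ecs is only an example path.

```shell
# Sketch: build the bind-mount flag that keeps agent state on the host.
# datadir_volume_flag is a hypothetical helper, not part of the agent.
datadir_volume_flag() {
  host_data_dir="${1:-/var/lib/ecs}"   # ECS_HOST_DATA_DIR (Linux default)
  container_datadir="${2:-/data}"      # ECS_DATADIR inside the container
  echo "--volume=${host_data_dir}/data:${container_datadir}"
}

# With the defaults this matches the flag used in the docker run example.
datadir_volume_flag
# With a custom host directory (example path only).
datadir_volume_flag /mnt/ecs /data
```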

Flags

The agent also supports the following flags:

  • -k — The agent will not require valid SSL certificates for the services that it communicates with. We recommend against using this flag.
  • -loglevel — Options: [<crit>|<error>|<warn>|<info>|<debug>]. The agent will output on stdout at the given level. This is overridden by the ECS_LOGLEVEL environment variable, if present.
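
The precedence between the -loglevel flag and the ECS_LOGLEVEL environment variable can be sketched as a small function. This is an illustration of the rule stated above, not the agent's code. The name effective_loglevel is hypothetical, "present" is assumed to mean set to a non-empty value, and the fallback of info is taken from the ECS_LOGLEVEL default in the table.

```shell
# Sketch of the documented precedence; effective_loglevel is hypothetical.
# Arg: the value passed to -loglevel (may be empty if the flag is not used).
effective_loglevel() {
  flag_level="${1:-}"
  if [ -n "${ECS_LOGLEVEL:-}" ]; then
    # The environment variable overrides the flag when present.
    echo "$ECS_LOGLEVEL"
  elif [ -n "$flag_level" ]; then
    echo "$flag_level"
  else
    echo "info"
  fi
}

effective_loglevel debug
```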

Building and Running from Source

Running the Amazon ECS Container Agent outside of Amazon EC2 is not supported.

Docker Image (on Linux)

The Amazon ECS Container Agent may be built by typing make with the Docker daemon (v1.5.0) running.

This produces an image tagged amazon/ecs-container-agent:make that you may run as described above.

Standalone (on Linux)

The Amazon ECS Container Agent may also be run outside of a Docker container as a Go binary. This is not recommended for production on Linux, but it can be useful for development or easier integration with your local Go tools.

The following commands run the agent outside of Docker:

make gobuild
./out/amazon-ecs-agent

Make Targets (on Linux)

The following targets are available. Each may be run with make <target>.

Make Target Description
release (Default) Builds the agent within a Docker container and packages it into a scratch-based image
gobuild Runs a normal go build of the agent and stores the binary in ./out/amazon-ecs-agent
static Runs go build to produce a static binary in ./out/amazon-ecs-agent
test Runs all unit tests using go test
test-in-docker Runs all tests inside a Docker container
run-integ-tests Runs all integration tests in the engine and stats packages
clean Removes build artifacts. Note: this does not remove Docker images

Standalone (on Windows)

The Amazon ECS Container Agent may be built by invoking scripts\build_agent.ps1.

Scripts (on Windows)

The following scripts are available to help develop the Amazon ECS Container Agent on Windows:

  • scripts\run-integ-tests.ps1 - Runs all integration tests in the engine and stats packages
  • misc\windows-deploy\Install-ECSAgent.ps1 - Install the ECS agent as a Windows service
  • misc\windows-deploy\amazon-ecs-agent.ps1 - Helper script to set up the host and run the agent as a process
  • misc\windows-deploy\user-data.ps1 - Sample user-data that can be used with the Windows Server 2016 with Containers AMI to run the agent as a process

Contributing

Contributions and feedback are welcome! Proposals and pull requests will be considered and responded to. For more information, see the CONTRIBUTING.md file.

If you have a bug or an issue with the behavior of the ECS agent, please open it here.

If you have a feature request, please open it over at the AWS Containers Roadmap.

Amazon Web Services does not currently provide support for modified copies of this software.

Security disclosures

If you think you’ve found a potential security issue, please do not post it in the Issues. Instead, please follow the instructions here or email AWS security directly.

License

The Amazon ECS Container Agent is licensed under the Apache 2.0 License.

Issues
  • Way to disable per-container memory limit in task definition

    According to the docs, the per-container memory limit is required. This is inconvenient because I use the same Elastic Beanstalk Dockerrun.aws.json file with multiple instance types.

    Is there a way to disable this setting?

    kind/feature request 
    opened by robbyt 118
  • --net=host support

    Docker's default behavior is to give a container a private network stack and to build a bridge with the host's network interface. This has considerable performance drawbacks (see netperf in here) and is an absolute no-go for some types of applications (including mine). Is the host-native interface (--net=host) supported, or will it ever be supported?

    Copied from: https://forums.aws.amazon.com/thread.jspa?threadID=177850&tstart=25

    There are a couple of forks out there that attempt to provide support, for example: https://github.com/chaos-generator/amazon-ecs-agent

    But it would be great if there were official support for this in the AWS repo. Not having this really rules ECS out for performance-sensitive applications.

    kind/feature request 
    opened by ChrisRut 92
  • Feature request: First class support for SSM parameter store as a secrets store

    Summary

    Use SSM parameter store as a secret store in a similar style to Kubernetes secrets.

    Description

    SSM parameter store currently works as a really low cost of entry secret store that allows applications with the appropriate IAM permissions to fetch and decrypt secrets at a given path. It's also used commonly by people running ECS who want to inject secrets into their containers at run time (to avoid baking them into containers and allowing for environment specific secrets) and recommended in some AWS blog posts.

    Unfortunately this requires the application to fetch the secrets from SSM parameter store at start up or, probably more commonly, rely on an entrypoint script to fetch the secrets using the AWS CLI or something like confd.

    This is a bit fiddly and also gets in the way when using a third party Docker image that has a useful entrypoint script that you don't want to have to extend to avoid drift between the official image's entrypoint script and your fork.

    Ideally ECS would support having the ECS task read parameters from SSM parameter store at startup and inject them as environment variables or as volume mounts in tmpfs similar to Kubernetes.

    I'm picturing something like this as a task definition:

    {
      "containerDefinitions": [
        {
          "name": "hello-world",
          "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/hello-world:v1",
          "memory": 200,
          "cpu": 10,
          "essential": true,
          "environment":  [
            {
              "name": "GREETING",
              "value": "Hello"
            },
            {
              "name": "RECIPIENT",
              "valueFrom": "/production/hello-world"
            }
          ]
        }
      ],
      "family": "hello-world",
      "taskRoleArn": "arn:aws:iam::123456789012:role/HelloWorld"
    }
    

    where /production/hello-world is an SSM parameter that is optionally encrypted.

    Note that the task's IAM role must have permission to get the SSM parameter store (and decrypt it with the appropriate KMS key if it's encrypted).

    This allows the ECS task's secrets to be separated from the application code and deployment parts and also means that users with read only access to the task definition can't easily view secrets that are injected as plain environment variables.

    kind/feature request scope/ECS Agent scope/ECS Service 
    opened by tomelliff 63
  • Volume driver support

    The VolumeDriver parameter is now part of the Docker Remote API 1.21. It would be great to get this included in the ECS task definition. This would allow me to access more storage options for my Docker containers, for example, accessing EBS volumes through Flocker or using the Ceph RDB driver (http://www.sebastien-han.fr/blog/2015/08/17/getting-started-with-the-docker-rbd-volume-plugin/).

    kind/feature request scope/ECS Agent scope/ECS Service scope/Task Definition scope/Placement 
    opened by robhaswell 60
  • Feature Request: Support for Docker Health Checks when bumping a task definition revision

    Currently ECS utilises ELB/ALB health checks to verify when a task is ready to accept traffic, and also when it is safe to terminate additional tasks as part of a rolling replacement/upgrade when bumping a task definition revision (to align with the service deployment configuration parameters).

    Would it be possible when an ELB is not in use for an ECS service, to also look at the Docker health check status? There are some scenarios when you may not want an ELB in use, but need to gracefully rolling replace the containers as part of an upgrade.

    Details on the feature introduced in Docker 1.12.x:

    https://docs.docker.com/engine/reference/builder/#/healthcheck https://docs.docker.com/engine/reference/run/#/healthcheck

    kind/feature request scope/ECS Agent scope/ECS Service scope/Monitoring 
    opened by CpuID 58
  • TaskARN --> Container ID

    It is very difficult to map the taskARN to the ID of the Docker container that executes it. These two identifiers have no relation. How can I find which container(s) the task started or ran?

    kind/feature request scope/ECS Agent scope/ECS Service 
    opened by pedrorjbr 55
  • Ports are assumed to be TCP

    I am looking into ECS, but after looking at the source code it seems ports are assumed to be /tcp. In Docker you can specify a port as /udp, for example -p 8301:8301/udp in the port mapping. Some of my containers require UDP exchanges.

    kind/enhancement 
    opened by owaaa 48
  • Swap options

    The default of no swap seems a bit heavy-handed. For some programs it's really hard to pinpoint the usage you'll need, and often 90% of the time they use 10% of that limit until a spike. Maybe an option to disable the limit altogether would be nice. Not that we want swap at all, haha, but for spiky behaviour it's tricky. Let me know what you think!

    kind/enhancement 
    opened by tj 47
  • ECS agent should handle EFS volume umount after a task container is killed

    Summary

    aws/efs-utils#83

    The EFS volume attached to a task is not unmounted after the task is killed. Unmounting is client-side behaviour, so once the task and its container have exited, the attached volume should be unmounted.

    Description

    While the task is running, everything works fine and the volume is attached. After the task is killed, the volume is still mounted, and amazon-efs-watchdog cannot kill stunnel because the volume is not unmounted. Because the IAM credentials are fetched from AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, the watchdog complains:

    2021-02-15 03:35:03,419 - ERROR - Cannot recreate self-signed certificate
    2021-02-15 03:35:04,448 - ERROR - Failed to retrieve AWS security credentials using lookup method: ecs:/v2/credentials/****
    

    Expected Behavior

    The volume resource is unmounted after the task and its container are killed.

    Observed Behavior

    The volume resource is still attached to the host instance after the task and its container are killed.

    You can see that two volumes are attached; the volume mounted on /var/lib/ecs/volumes/ecs-Github83-1-EFS-d8d9dfe0a1d9b4918a01 cannot retrieve credentials properly:

    [[email protected] efs]# df
    Filesystem            1K-blocks    Used        Available Use% Mounted on
    ...
    127.0.0.1:/    9007199254739968 4194304 9007199250545664   1% /var/lib/ecs/volumes/ecs-Github83-1-EFS-d8d9dfe0a1d9b4918a01
    ...
    127.0.0.1:/    9007199254739968 4194304 9007199250545664   1% /var/lib/ecs/volumes/ecs-Github83-2-EFS-e083d4bdace2f4908b01
    ...
    

    Meanwhile, the container 0c3c32e57096 has already been killed:

    [[email protected] efs]# docker ps -a
    CONTAINER ID        IMAGE                            COMMAND                  CREATED             STATUS                      PORTS               NAMES
    38cb273faa83        nginx                            "/docker-entrypoint.…"   35 minutes ago      Up 35 minutes               80/tcp              ecs-Github83-2-nginx-f292ca9ae7d6f7c11300
    0c3c32e57096        nginx                            "/docker-entrypoint.…"   39 minutes ago      Exited (0) 35 minutes ago                       ecs-Github83-1-nginx-b688b3edf6cdabcde901
    a9b8f9d326ce        amazon/amazon-ecs-agent:latest   "/agent"                 46 minutes ago      Up 46 minutes (healthy)                         ecs-agent
    

    After unmounting the file system on the instance manually, everything works fine:

    [[email protected] efs]# sudo umount /var/lib/ecs/volumes/ecs-Github83-1-EFS-d8d9dfe0a1d9b4918a01
    [[email protected] efs]# df
    Filesystem            1K-blocks    Used        Available Use% Mounted on
    ...
    127.0.0.1:/    9007199254739968 4194304 9007199250545664   1% /var/lib/ecs/volumes/ecs-Github83-2-EFS-e083d4bdace2f4908b01
    ...
    
    [[email protected] efs]# tail -f mount-watchdog.log
    2021-02-15 03:36:45,589 - ERROR - Cannot recreate self-signed certificate
    # Umount the file system
    2021-02-15 03:36:46,591 - INFO - No mount found for "fs-66e7bce7.var.lib.ecs.volumes.ecs-Github83-1-EFS-d8d9dfe0a1d9b4918a01.20167"
    2021-02-15 03:37:16,642 - INFO - Unmount grace period expired for fs-66e7bce7.var.lib.ecs.volumes.ecs-Github83-1-EFS-d8d9dfe0a1d9b4918a01.20167
    2021-02-15 03:37:16,642 - INFO - Terminating running TLS tunnel - PID: 4564, group ID: 4564
    2021-02-15 03:37:16,642 - INFO - TLS tunnel: 4564 is still running, will retry termination
    2021-02-15 03:37:17,644 - INFO - Unmount grace period expired for fs-66e7bce7.var.lib.ecs.volumes.ecs-Github83-1-EFS-d8d9dfe0a1d9b4918a01.20167
    2021-02-15 03:37:17,644 - INFO - TLS tunnel: 4564 is no longer running, cleaning up state
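
    The manual comparison above (mounts reported by df versus containers still running) can be sketched in Go. This is only an illustration, not agent code: the function names are hypothetical, the input is assumed to be text in /proc/mounts format, and matching a volume directory to a task by its "ecs-<family>-<revision>-" name prefix is an assumption drawn from the sample output in this report.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"strings"
)

// ecsVolumeRoot is the directory under which the mounts in this report appear.
const ecsVolumeRoot = "/var/lib/ecs/volumes/"

// ecsVolumeMounts returns the mount points under ecsVolumeRoot found in text
// formatted like /proc/mounts (device, mount point, fs type, options, ...).
func ecsVolumeMounts(procMounts string) []string {
	var mounts []string
	sc := bufio.NewScanner(strings.NewReader(procMounts))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && strings.HasPrefix(fields[1], ecsVolumeRoot) {
			mounts = append(mounts, fields[1])
		}
	}
	return mounts
}

// orphanedMounts returns the mounts whose directory name does not start with
// any of the name prefixes of tasks that still have running containers.
// The prefix convention is an assumption based on the sample output.
func orphanedMounts(mounts, runningTaskPrefixes []string) []string {
	var orphans []string
	for _, m := range mounts {
		name := path.Base(m)
		inUse := false
		for _, p := range runningTaskPrefixes {
			if strings.HasPrefix(name, p) {
				inUse = true
				break
			}
		}
		if !inUse {
			orphans = append(orphans, m)
		}
	}
	return orphans
}

func main() {
	sample := "127.0.0.1:/ /var/lib/ecs/volumes/ecs-Github83-1-EFS-d8d9dfe0a1d9b4918a01 nfs4 rw 0 0\n" +
		"127.0.0.1:/ /var/lib/ecs/volumes/ecs-Github83-2-EFS-e083d4bdace2f4908b01 nfs4 rw 0 0\n"
	// Only the Github83-2 task still has a running container in this report.
	for _, m := range orphanedMounts(ecsVolumeMounts(sample), []string{"ecs-Github83-2-"}) {
		fmt.Println("possibly stale mount:", m)
	}
}
```

    On a real host the input would come from reading /proc/mounts, and the prefixes from the names of running containers.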
    

    Environment Details

    Task definition:

    {
      ...
          "mountPoints": [
            {
              "readOnly": null,
              "containerPath": "/efs",
              "sourceVolume": "EFS"
            }
          ],
    ...
          "image": "nginx",
    ...
          "name": "nginx"
        }
      ],
    ...
      "volumes": [
        {
          ...
          "efsVolumeConfiguration": {
            "transitEncryptionPort": null,
            "fileSystemId": "fs-12345678",
            "authorizationConfig": {
              "iam": "ENABLED",
              "accessPointId": null
            },
            "transitEncryption": "ENABLED",
            "rootDirectory": "/"
          },
          "name": "EFS",
          "host": null,
          "dockerVolumeConfiguration": null
        }
      ]
    }
    

    Supporting Log Snippets

    N/A; I will collect logs if needed. Otherwise, I think the investigation should focus on how the ECS agent handles resource cleanup (unmounting the EFS volume).

    kind/question more info needed 
    opened by Cappuccinuo 45
  • ECS Agent Disconnected becoming more common


    Summary

    We have recently been seeing more and more ECS agents become disconnected and fail to recover on their own, on both 1.14.4 and 1.14.3. We have had to connect to the boxes and run stop ecs && start ecs, after which some agents stay connected while others remain disconnected.

    Description

    [Image: screen shot 2017-09-15 at 2 24 12 pm]

    Expected Behavior

    Agents stay connected, or reconnect on their own if issues arise.

    Observed Behavior

    Agents are disconnected and staying disconnected.

    Environment Details

    $docker info
    Containers: 4
     Running: 4
     Paused: 0
     Stopped: 0
    Images: 15
    Server Version: 17.03.1-ce
    Storage Driver: overlay2
     Backing Filesystem: extfs
     Supports d_type: true
     Native Overlay Diff: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins: 
     Volume: local
     Network: bridge host macvlan null overlay
    Swarm: inactive
    Runtimes: runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version:  (expected: 4ab9917febca54791c5f071a9d1f404867857fcc)
    runc version: N/A (expected: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe)
    init version: N/A (expected: 949e6facb77383876aeff8a6944dde66b3089574)
    Security Options:
     seccomp
      Profile: default
    Kernel Version: 4.9.38-16.35.amzn1.x86_64
    Operating System: Amazon Linux AMI 2017.03
    OSType: linux
    Architecture: x86_64
    CPUs: 2
    Total Memory: 3.677 GiB
    Name: ip-10-7-18-149
    ID: MBVV:K4P6:F7GF:LTJ3:LNAF:GZ63:RBTB:R2KE:EMMY:QARA:HF2M:7KOR
    Docker Root Dir: /var/lib/docker
    Debug Mode (client): false
    Debug Mode (server): false
    Registry: https://index.docker.io/v1/
    Experimental: false
    Insecure Registries:
     127.0.0.0/8
    Live Restore Enabled: false
    
    $curl http://localhost:51678/v1/metadata
    {"Cluster":"<REDACTED>","ContainerInstanceArn":"arn:aws:ecs:<REDACTED>:container-instance/d45c9085-a359-474c-ae5e-b04b2efb966a","Version":"Amazon ECS Agent - v1.14.3 (15de319)"}
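
    For scripted checks across many instances, the introspection response shown above can be decoded in Go. This is a minimal sketch: the struct and function names are hypothetical, the fields mirror only the three keys visible in the output above, and the sample body uses placeholder values where the original is redacted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// agentMetadata mirrors the three keys visible in the /v1/metadata output.
type agentMetadata struct {
	Cluster              string `json:"Cluster"`
	ContainerInstanceArn string `json:"ContainerInstanceArn"`
	Version              string `json:"Version"`
}

// parseAgentMetadata decodes a response body such as the one returned by
// curl http://localhost:51678/v1/metadata.
func parseAgentMetadata(body []byte) (agentMetadata, error) {
	var md agentMetadata
	err := json.Unmarshal(body, &md)
	return md, err
}

func main() {
	// Placeholder cluster and region values; the originals are redacted.
	body := []byte(`{"Cluster":"example-cluster","ContainerInstanceArn":"arn:aws:ecs:example-region:container-instance/d45c9085-a359-474c-ae5e-b04b2efb966a","Version":"Amazon ECS Agent - v1.14.3 (15de319)"}`)
	md, err := parseAgentMetadata(body)
	if err != nil {
		fmt.Println("could not decode metadata:", err)
		return
	}
	fmt.Println("agent version:", md.Version)
}
```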
    

    Supporting Log Snippets

    Some instances say something along these lines:

    2017-09-15T21:08:58Z [INFO] Loading configuration
    2017-09-15T21:08:58Z [INFO] Loading state! module="statemanager"
    2017-09-15T21:08:58Z [INFO] Event stream ContainerChange start listening...
    2017-09-15T21:08:58Z [INFO] Registering Instance with ECS
    2017-09-15T21:09:09Z [INFO] Registered! module="api client"
    2017-09-15T21:09:09Z [INFO] Registration completed successfully. I am running as <REDACTED>
    2017-09-15T21:09:09Z [INFO] Saving state! module="statemanager"
    2017-09-15T21:09:09Z [INFO] Beginning Polling for updates
    2017-09-15T21:09:09Z [INFO] Event stream DeregisterContainerInstance start listening...
    2017-09-15T21:09:09Z [INFO] Initializing stats engine
    2017-09-15T21:09:09Z [INFO] NO_PROXY set:169.254.169.254,169.254.170.2,/var/run/docker.sock
    2017-09-15T21:09:19Z [INFO] Saving state! module="statemanager"
    2017-09-15T21:10:36Z [WARN] ACS Connection hasn't had any activity for too long; closing connection
    

    While some say:

    2017-09-15T21:03:39Z [INFO] Connection closed for a valid reason: websocket: close 1000 (normal): ConnectionExpired: Reconnect to continue
    2017-09-15T21:05:06Z [INFO] Connection closed for a valid reason: websocket: close 1000 (normal): ConnectionExpired: Reconnect to continue
    2017-09-15T21:07:30Z [INFO] Redundant container state change for task <REDACTED>:42 arn:aws:ecs:<REDACTED>:task/d70fa1a0-5b03-47cf-b590-694f053b02cb, Status: (RUNNING->RUNNING) Containers: [<REDACTED> (RUNNING->RUNNING),]: <REDACTED>(<REDACTED>:ec5903f) (RUNNING->RUNNING) to RUNNING, but already RUNNING
    2017-09-15T21:10:20Z [INFO] Begin building map of eligible unused images for deletion
    2017-09-15T21:10:20Z [INFO] No eligible images for deletion for this cleanup cycle
    2017-09-15T21:10:20Z [INFO] End of eligible images for deletion: No more eligible images for deletion; Still have 1 image states being managed
    2017-09-15T21:15:07Z [INFO] Connection closed for a valid reason: websocket: close 1000 (normal): ConnectionExpired: Reconnect to continue
    2017-09-15T21:17:30Z [INFO] Redundant container state change for task <REDACTED>:42 arn:aws:ecs:<REDACTED>:task/d70fa1a0-5b03-47cf-b590-694f053b02cb, Status: (RUNNING->RUNNING) Containers: [<REDACTED> (RUNNING->RUNNING),]: <REDACTED>(<REDACTED>:ec5903f) (RUNNING->RUNNING) to RUNNING, but already RUNNING
    2017-09-15T21:25:07Z [INFO] Connection closed for a valid reason: websocket: close 1000 (normal): ConnectionExpired: Reconnect to continue
    2017-09-15T21:26:17Z [INFO] Connection closed for a valid reason: websocket: close 1000 (normal): ConnectionExpired: Reconnect to continue
    2017-09-15T21:27:30Z [INFO] Redundant container state change for task <REDACTED>:42 arn:aws:ecs:<REDACTED>:task/d70fa1a0-5b03-47cf-b590-694f053b02cb, Status: (RUNNING->RUNNING) Containers: [<REDACTED> (RUNNING->RUNNING),]: <REDACTED>(<REDACTED>:ec5903f) (RUNNING->RUNNING) to RUNNING, but already RUNNING
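
    When comparing logs from several instances, it can help to split each agent log line into timestamp, level, and message. The sketch below assumes the "<RFC 3339 timestamp> [LEVEL] message" layout seen in the snippets above; the type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// logEntry is one parsed agent log line.
type logEntry struct {
	Time    time.Time
	Level   string
	Message string
}

// parseAgentLogLine parses lines shaped like
// "2017-09-15T21:10:36Z [WARN] some message".
// It returns false for lines that do not match that layout.
func parseAgentLogLine(line string) (logEntry, bool) {
	parts := strings.SplitN(strings.TrimSpace(line), " ", 3)
	if len(parts) != 3 {
		return logEntry{}, false
	}
	ts, err := time.Parse(time.RFC3339, parts[0])
	if err != nil {
		return logEntry{}, false
	}
	level := parts[1]
	if !strings.HasPrefix(level, "[") || !strings.HasSuffix(level, "]") {
		return logEntry{}, false
	}
	return logEntry{Time: ts, Level: strings.Trim(level, "[]"), Message: parts[2]}, true
}

// countByLevel counts the parseable lines per log level.
func countByLevel(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, l := range lines {
		if e, ok := parseAgentLogLine(l); ok {
			counts[e.Level]++
		}
	}
	return counts
}

func main() {
	lines := []string{
		"2017-09-15T21:09:19Z [INFO] Saving state! module=\"statemanager\"",
		"2017-09-15T21:10:36Z [WARN] ACS Connection hasn't had any activity for too long; closing connection",
	}
	fmt.Println(countByLevel(lines))
}
```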
    

    Any idea what's going on?

    kind/bug pending release scope/ECS Agent 
    opened by djenriquez 41
  • ECS agents stops and starts the tasks


    Summary

    We are experiencing an issue with one of our ECS clusters, deployed using CloudFormation in our production network. We have a service that runs two tasks. One of the tasks running on a container instance is stopped by the ECS agent and started again after some time.

    Description

    In the log, you can see that the container is stopped:

    2019-06-20T18:09:36Z [INFO] Error from tcs; backing off: websocket client: unable to dial ecs-t-6.us-east-1.amazonaws.com response: : write tcp 10.66.9.203:46356->10.51.176.5:8080: i/o timeout
    2019-06-20T18:09:38Z [INFO] Connected to ACS endpoint
    2019-06-20T18:09:40Z [INFO] Saving state! module="statemanager"
    2019-06-20T18:09:40Z [INFO] Task engine [arn:aws:ecs:us-east-1:361190373704:task/63bae6e8-b975-4661-b67e-ae06b9e31327]: stopping container [DES-desg]
    

    Can you please explain this behavior of ECS?

    There is no change in the container image definition, there are no errors in our Docker container, and the container was running fine, but this issue happens suddenly and repeatedly over time.

    Expected Behavior

    The ECS agent doesn't stop the container.

    Observed Behavior

    ECS stops the container

    Environment Details

    Docker Server Version: 18.06.1-ce
    ECS Agent version: 1.20.3

    Supporting Log Snippets

    2019-06-20T18:02:51Z [ERROR] Error getting message from ws backend: error: [websocket: close 1002 (protocol error): Channel long idle: No message is received, close the channel], messageType: [-1] 
    2019-06-20T18:02:51Z [INFO] Error from tcs; backing off: websocket: close 1002 (protocol error): Channel long idle: No message is received, close the channel
    2019-06-20T18:02:51Z [WARN] Error publishing metrics: write tcp 10.66.9.203:47724->10.51.177.169:8080: i/o timeout
    2019-06-20T18:02:51Z [WARN] Error getting cpu stats, err: No data in the queue, container: 8133d8c0371ab05231a9eefd8c27deeb54311c81be5c131670536cbbcf493670
    2019-06-20T18:02:51Z [WARN] Error publishing metrics: stats engine: no task metrics to report
    2019-06-20T18:02:52Z [INFO] Establishing a Websocket connection to https://ecs-t-6.us-east-1.amazonaws.com/ws?cluster=DES-E1-USEZ&containerInstance=arn%3Aaws%3Aecs%3Aus-east-1%3A361190373704%3Acontainer-instance%2F575a2c1d-ff75-4cfc-8643-1ac0481f2f3a
    2019-06-20T18:02:52Z [INFO] Connected to TCS endpoint
    2019-06-20T18:02:52Z [WARN] Error getting cpu stats, err: No data in the queue, container: 8133d8c0371ab05231a9eefd8c27deeb54311c81be5c131670536cbbcf493670
    2019-06-20T18:03:52Z [ERROR] Error getting message from ws backend: error: [websocket: close 1002 (protocol error): Channel long idle: No message is received, close the channel], messageType: [-1] 
    2019-06-20T18:04:22Z [INFO] Error from tcs; backing off: websocket: close 1002 (protocol error): Channel long idle: No message is received, close the channel
    2019-06-20T18:04:58Z [WARN] Error publishing metrics: write tcp 10.66.9.203:36502->10.51.176.207:8080: i/o timeout
    2019-06-20T18:05:26Z [INFO] Managed task [arn:aws:ecs:us-east-1:361190373704:task/63bae6e8-b975-4661-b67e-ae06b9e31327]: task at steady state: RUNNING
    2019-06-20T18:05:44Z [WARN] ACS Connection hasn't had any activity for too long; closing connection
    2019-06-20T18:05:59Z [INFO] Disconnected from ACS
    2019-06-20T18:06:13Z [INFO] Establishing a Websocket connection to https://ecs-t-6.us-east-1.amazonaws.com/ws?cluster=DES-E1-USEZ&containerInstance=arn%3Aaws%3Aecs%3Aus-east-1%3A361190373704%3Acontainer-instance%2F575a2c1d-ff75-4cfc-8643-1ac0481f2f3a
    2019-06-20T18:06:35Z [WARN] Unable to set read deadline for websocket connection: set tcp 10.66.9.203:44018: use of closed network connection for https://ecs-a-6.us-east-1.amazonaws.com/ws?agentHash=0821fbc7&agentVersion=1.25.2&clusterArn=DES-E1-USEZ&containerInstanceArn=arn%3Aaws%3Aecs%3Aus-east-1%3A361190373704%3Acontainer-instance%2F575a2c1d-ff75-4cfc-8643-1ac0481f2f3a&dockerVersion=DockerVersion%3A+18.06.1-ce&sendCredentials=false&seqNum=1
    2019-06-20T18:06:49Z [ERROR] Stopping redundant reads on closed network connection: https://ecs-a-6.us-east-1.amazonaws.com/ws?agentHash=0821fbc7&agentVersion=1.25.2&clusterArn=DES-E1-USEZ&containerInstanceArn=arn%3Aaws%3Aecs%3Aus-east-1%3A361190373704%3Acontainer-instance%2F575a2c1d-ff75-4cfc-8643-1ac0481f2f3a&dockerVersion=DockerVersion%3A+18.06.1-ce&sendCredentials=false&seqNum=1
    2019-06-20T18:06:56Z [WARN] Unable to extend read deadline for ACS connection: set tcp 10.66.9.203:44018: use of closed network connection
    2019-06-20T18:07:00Z [INFO] Managed task [arn:aws:ecs:us-east-1:361190373704:task/63bae6e8-b975-4661-b67e-ae06b9e31327]: redundant container state change. DES-desg to NONE, but already RUNNING
    2019-06-20T18:07:00Z [WARN] Unable to set read deadline for websocket connection: set tcp 10.66.9.203:44018: use of closed network connection for https://ecs-a-6.us-east-1.amazonaws.com/ws?agentHash=0821fbc7&agentVersion=1.25.2&clusterArn=DES-E1-USEZ&containerInstanceArn=arn%3Aaws%3Aecs%3Aus-east-1%3A361190373704%3Acontainer-instance%2F575a2c1d-ff75-4cfc-8643-1ac0481f2f3a&dockerVersion=DockerVersion%3A+18.06.1-ce&sendCredentials=false&seqNum=1
    2019-06-20T18:07:01Z [WARN] DockerGoClient: inactivity time exceeded timeout while retrieving stats for container 8133d8c0371ab05231a9eefd8c27deeb54311c81be5c131670536cbbcf493670
    2019-06-20T18:07:02Z [INFO] Managed task [arn:aws:ecs:us-east-1:361190373704:task/63bae6e8-b975-4661-b67e-ae06b9e31327]: task at steady state: RUNNING
    2019-06-20T18:07:02Z [ERROR] Stopping redundant reads on closed network connection: https://ecs-a-6.us-east-1.amazonaws.com/ws?agentHash=0821fbc7&agentVersion=1.25.2&clusterArn=DES-E1-USEZ&containerInstanceArn=arn%3Aaws%3Aecs%3Aus-east-1%3A361190373704%3Acontainer-instance%2F575a2c1d-ff75-4cfc-8643-1ac0481f2f3a&dockerVersion=DockerVersion%3A+18.06.1-ce&sendCredentials=false&seqNum=1
    2019-06-20T18:07:02Z [INFO] Reconnecting to ACS in: 253.312033ms
    2019-06-20T18:07:19Z [INFO] Establishing a Websocket connection to https://ecs-a-6.us-east-1.amazonaws.com/ws?agentHash=0821fbc7&agentVersion=1.25.2&clusterArn=DES-E1-USEZ&containerInstanceArn=arn%3Aaws%3Aecs%3Aus-east-1%3A361190373704%3Acontainer-instance%2F575a2c1d-ff75-4cfc-8643-1ac0481f2f3a&dockerVersion=DockerVersion%3A+18.06.1-ce&sendCredentials=false&seqNum=1
    2019-06-20T18:07:56Z [WARN] Error creating a websocket client: dial tcp: i/o timeout
    2019-06-20T18:08:02Z [ERROR] Error connecting to ACS: websocket client: unable to dial ecs-a-6.us-east-1.amazonaws.com response: : dial tcp: i/o timeout
    2019-06-20T18:08:10Z [INFO] Reconnecting to ACS in: 428.558079ms
    2019-06-20T18:08:57Z [INFO] Establishing a Websocket connection to https://ecs-a-6.us-east-1.amazonaws.com/ws?agentHash=0821fbc7&agentVersion=1.25.2&clusterArn=DES-E1-USEZ&containerInstanceArn=arn%3Aaws%3Aecs%3Aus-east-1%3A361190373704%3Acontainer-instance%2F575a2c1d-ff75-4cfc-8643-1ac0481f2f3a&dockerVersion=DockerVersion%3A+18.06.1-ce&sendCredentials=false&seqNum=1
    2019-06-20T18:09:36Z [WARN] Error creating a websocket client: write tcp 10.66.9.203:46356->10.51.176.5:8080: i/o timeout
    2019-06-20T18:09:36Z [ERROR] Error connecting to TCS: websocket client: unable to dial ecs-t-6.us-east-1.amazonaws.com response: : write tcp 10.66.9.203:46356->10.51.176.5:8080: i/o timeout
    2019-06-20T18:09:36Z [INFO] Error from tcs; backing off: websocket client: unable to dial ecs-t-6.us-east-1.amazonaws.com response: : write tcp 10.66.9.203:46356->10.51.176.5:8080: i/o timeout
    2019-06-20T18:09:38Z [INFO] Connected to ACS endpoint
    2019-06-20T18:09:40Z [INFO] Saving state! module="statemanager"
    2019-06-20T18:09:40Z [INFO] Task engine [arn:aws:ecs:us-east-1:361190373704:task/63bae6e8-b975-4661-b67e-ae06b9e31327]: stopping container [DES-desg]
    2019-06-20T18:09:40Z [INFO] Managed task [arn:aws:ecs:us-east-1:361190373704:task/c838234c-e247-418d-9ffe-f3536231c187]: unable to create task state change event []: create task state change event api: status not recognized by ECS: NONE
    2019-06-20T18:09:40Z [INFO] Managed task [arn:aws:ecs:us-east-1:361190373704:task/c838234c-e247-418d-9ffe-f3536231c187]: waiting for any previous stops to complete. Sequence number: 5
    2019-06-20T18:09:41Z [INFO] Establishing a Websocket connection to https://ecs-t-6.us-east-1.amazonaws.com/ws?cluster=DES-E1-USEZ&containerInstance=arn%3Aaws%3Aecs%3Aus-east-1%3A361190373704%3Acontainer-instance%2F575a2c1d-ff75-4cfc-8643-1ac0481f2f3a
    2019-06-20T18:09:43Z [INFO] Connected to TCS endpoint
    2019-06-20T18:09:49Z [WARN] DockerGoClient: Unable to decode stats for container 8133d8c0371ab05231a9eefd8c27deeb54311c81be5c131670536cbbcf493670: context canceled
    2019-06-20T18:09:53Z [INFO] Saving state! module="statemanager"
    2019-06-20T18:10:33Z [WARN] Error getting cpu stats, err: No data in the queue, container: 8133d8c0371ab05231a9eefd8c27deeb54311c81be5c131670536cbbcf493670
    2019-06-20T18:10:39Z [WARN] Error publishing metrics: stats engine: no task metrics to report
    2019-06-20T18:10:50Z [INFO] Task engine [arn:aws:ecs:us-east-1:361190373704:task/63bae6e8-b975-4661-b67e-ae06b9e31327]: error transitioning container [DES-desg] to [STOPPED]: Could not transition to stopped; timed out after waiting 1m0s
    2019-06-20T18:10:58Z [INFO] TCS Connection hasn't had any activity for too long; disconnecting
    2019-06-20T18:11:06Z [INFO] DockerGoClient: error stopping container 8133d8c0371ab05231a9eefd8c27deeb54311c81be5c131670536cbbcf493670: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
    2019-06-20T18:11:13Z [WARN] Error getting cpu stats, err: No data in the queue, container: 8133d8c0371ab05231a9eefd8c27deeb54311c81be5c131670536cbbcf493670
    2019-06-20T18:11:20Z [WARN] Unable to set read deadline for websocket connection: set tcp 10.66.9.203:46394: use of closed network connection for https://ecs-t-6.us-east-1.amazonaws.com/
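
    The ACS connection URLs in the log above carry the agent and Docker versions as query parameters (for example agentVersion=1.25.2), which can be handy for cross-checking environment details. A minimal Go sketch for extracting them follows; the type and function names are hypothetical, and only parameters visible in the log lines are read.

```go
package main

import (
	"fmt"
	"net/url"
)

// acsConnInfo holds selected query parameters from an ACS websocket URL.
type acsConnInfo struct {
	AgentVersion  string
	DockerVersion string
	SeqNum        string
}

// parseACSURL extracts the parameters seen in the log lines above.
// Missing parameters are returned as empty strings.
func parseACSURL(raw string) (acsConnInfo, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return acsConnInfo{}, err
	}
	q := u.Query()
	return acsConnInfo{
		AgentVersion:  q.Get("agentVersion"),
		DockerVersion: q.Get("dockerVersion"),
		SeqNum:        q.Get("seqNum"),
	}, nil
}

func main() {
	raw := "https://ecs-a-6.us-east-1.amazonaws.com/ws?agentHash=0821fbc7&agentVersion=1.25.2" +
		"&clusterArn=DES-E1-USEZ&dockerVersion=DockerVersion%3A+18.06.1-ce&sendCredentials=false&seqNum=1"
	info, err := parseACSURL(raw)
	if err != nil {
		fmt.Println("could not parse URL:", err)
		return
	}
	fmt.Printf("agent=%s docker=%q seq=%s\n", info.AgentVersion, info.DockerVersion, info.SeqNum)
}
```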
    
    more info needed 
    opened by dineshkm 40
  • Programmatically retrieve service connect endpoint


    Summary

    To retrieve the ServiceConnect endpoint programmatically, the ECS client must be injected once it has been created.

    This allows the service connect manager to get the correct endpoint, or fall back to a default if there is an error.
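
    The inject-then-fall-back pattern described here can be sketched as follows. This is not the code from the pull request: the interface, type, and method names, the stub client, and the example endpoints are all hypothetical and serve only to illustrate the idea.

```go
package main

import (
	"errors"
	"fmt"
)

// endpointDiscoverer is a hypothetical stand-in for the part of the ECS
// client that can look up the endpoint.
type endpointDiscoverer interface {
	DiscoverServiceConnectEndpoint() (string, error)
}

// serviceConnectManager is a hypothetical manager that receives its client
// after construction and falls back to a default endpoint on failure.
type serviceConnectManager struct {
	client          endpointDiscoverer
	defaultEndpoint string
}

// SetECSClient injects the client once it has been created.
func (m *serviceConnectManager) SetECSClient(c endpointDiscoverer) {
	m.client = c
}

// Endpoint returns the discovered endpoint, or the default if no client has
// been injected yet, discovery fails, or discovery returns an empty value.
func (m *serviceConnectManager) Endpoint() string {
	if m.client == nil {
		return m.defaultEndpoint
	}
	ep, err := m.client.DiscoverServiceConnectEndpoint()
	if err != nil || ep == "" {
		return m.defaultEndpoint
	}
	return ep
}

// stubClient is a test double used to exercise both paths.
type stubClient struct {
	endpoint string
	err      error
}

func (s stubClient) DiscoverServiceConnectEndpoint() (string, error) {
	return s.endpoint, s.err
}

var errDiscovery = errors.New("discovery failed")

func main() {
	m := &serviceConnectManager{defaultEndpoint: "https://default.example.invalid"}
	m.SetECSClient(stubClient{err: errDiscovery})
	fmt.Println("endpoint after failed discovery:", m.Endpoint())
}
```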

    Licensing

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by aws-gibbskt 0
  • Batch Size For Task Updates


    Summary

    1. Creates a taskCount variable to track how many task updates are sent in one minute (measured by the taskCountTimer), and sleeps for one minute (for token refills) once the throttlingLimit defined for API agent modify actions is reached.
    2. Passes the connectToACS channel into startDisconnectMode() so that the agent can successfully reconnect to ACS after disconnectModeEnabled is turned on (bug fix).

    Implementation details

    1. In taskHandler.go, there are two new struct variables (taskCount and taskCountTimer) that keep track of the task updates being sent. The logic that updates taskCount according to the timer is in the function submitTaskEvents.
    2. The logic that sleeps after the limit has been reached is in the function sendChange.
    3. connectToACS is passed into startDisconnectMode() as a channel argument, allowing the original function and channel to be referenced from the newer function.
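
    The counting-and-sleeping behaviour described above can be sketched as below. This is an illustration under assumptions, not the code in taskHandler.go: the sendThrottle type, its fields, and the simulate helper are hypothetical, and the clock and sleep functions are injected so the logic can be exercised without waiting a real minute.

```go
package main

import (
	"fmt"
	"time"
)

// sendThrottle counts sends within a window and sleeps for one window once
// the limit is reached, giving tokens time to refill.
type sendThrottle struct {
	limit       int
	window      time.Duration
	count       int
	windowStart time.Time
	now         func() time.Time
	sleep       func(time.Duration)
}

// recordSend is called after each task update is sent.
func (t *sendThrottle) recordSend() {
	// Start a fresh window if the current one has expired.
	if t.now().Sub(t.windowStart) >= t.window {
		t.windowStart = t.now()
		t.count = 0
	}
	t.count++
	// Once the limit is reached, pause for a full window and reset.
	if t.count >= t.limit {
		t.sleep(t.window)
		t.windowStart = t.now()
		t.count = 0
	}
}

// simulate sends the given number of updates back to back against a fake
// clock and reports how many times the throttle slept.
func simulate(limit, sends int) int {
	clock := time.Unix(0, 0)
	sleeps := 0
	t := &sendThrottle{
		limit:  limit,
		window: time.Minute,
		now:    func() time.Time { return clock },
		sleep: func(d time.Duration) {
			sleeps++
			clock = clock.Add(d) // advance the fake clock instead of waiting
		},
	}
	for i := 0; i < sends; i++ {
		t.recordSend()
	}
	return sleeps
}

func main() {
	fmt.Println("sleeps for 5 sends with limit 2:", simulate(2, 5))
}
```

    In production the now and sleep fields would be time.Now and time.Sleep; injecting them is a design choice made here purely for testability.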

    Testing

    The new feature was tested manually by checking log statements, with the throttlingLimit set to 2 for readability:

    level=debug time=2022-07-26T00:11:32Z msg="Starting taskCountTimer here."
    level=debug time=2022-07-26T00:11:32Z msg="Increasing taskCount by 1" taskCount=1
    level=debug time=2022-07-26T00:11:32Z msg="Checking TaskCount Timer"
    level=debug time=2022-07-26T00:11:32Z msg="Increasing taskCount by 1" taskCount=2
    level=debug time=2022-07-26T00:11:32Z msg="Checking TaskCount Timer"
    level=debug time=2022-07-26T00:11:32Z msg="Reached throttling limit for sending task events, starting sleep for one minute"
    
    

    Description for the changelog

    Feature - sending task state change events in batches

    Licensing

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by rsheik29 0
  • [Test PR] test go 1.18.3


    Summary

    Implementation details

    Testing

    New tests cover the changes:

    Description for the changelog

    Licensing

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by prateekchaudhry 0
  • Fixes bug in FSx when password has special characters


    Summary

    When a password contains characters that require escaping, such as a quotation mark, PowerShell fails the command. Rather than passing the password directly on the command line, we read it from a file, which allows it to contain any character sequence.

    Ref: https://github.com/aws/amazon-ecs-agent/issues/3270

    Implementation details

    We now write the raw password to a temporary file, so that PowerShell reads the text directly rather than interpreting it.
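
    A minimal Go sketch of the write-to-a-temporary-file idea is shown below. It is not the code from this pull request: the function names and the file name pattern are hypothetical, and how the PowerShell side consumes the file (for example with Get-Content -Raw) is an assumption. The point is that the secret never passes through command-line quoting.

```go
package main

import (
	"fmt"
	"os"
)

// writeSecretToTempFile writes the secret verbatim to a new temporary file
// and returns its path plus a cleanup function that removes the file.
func writeSecretToTempFile(secret string) (string, func(), error) {
	f, err := os.CreateTemp("", "fsx-cred-*")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.Remove(f.Name()) }
	if _, err := f.WriteString(secret); err != nil {
		f.Close()
		cleanup()
		return "", nil, err
	}
	if err := f.Close(); err != nil {
		cleanup()
		return "", nil, err
	}
	return f.Name(), cleanup, nil
}

// readSecret reads the file back unchanged; it stands in for the consumer.
func readSecret(path string) (string, error) {
	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	path, cleanup, err := writeSecretToTempFile(`pa"ss'$word`)
	if err != nil {
		fmt.Println("could not write secret:", err)
		return
	}
	defer cleanup()
	got, err := readSecret(path)
	fmt.Println("round trip ok:", err == nil && got == `pa"ss'$word`)
}
```

    The caller is expected to invoke the returned cleanup function as soon as the command has finished, so the secret does not linger on disk.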

    Testing

    Deployed an instance on EC2 with a private ECS agent. Ran a Windows task with an FSx share and validated that it runs successfully.

    New tests cover the changes: no

    Description for the changelog

    Fixed an issue in FSx Windows shares when the password contains special characters such as a ".

    Licensing

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by jterry75 0
  • Add VpcId to TMDE Task Responses


    Summary

    This change is to enhance the TaskResponse returned by Task Metadata Endpoint to include VPC ID. This change does not apply to ECS Anywhere instances, so no VPC ID would be returned by Task Metadata Endpoint on ECS Anywhere instances.

    Implementation details

    For agent instances with TaskENIEnabled setting set to true

    Agent loads the VPC ID of the container instance by querying EC2 Instance Metadata Service (IMDS). So, for TMDE the VPC ID of the container instance is already available in ecsAgent type.

    For agent instances with TaskENIEnabled setting set to false

    Currently the agent does not load the VPC ID of the container instance in this case. This PR includes changes to make the agent load VPC ID (and mac and subnets) of the container instance if the container instance is not external. Doing so shouldn't have any side-effects because these values are not used to drive any logic.

    Changes under this PR add some piping to forward the VPC ID value from ecsAgent type to Task Metadata Endpoint handlers and change the handlers to include the VPC ID value in all task responses. TaskResponse type definition is updated to include a new VPCID field.
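
    The effect on the response shape can be illustrated with a small Go sketch. It is a simplified stand-in rather than the agent's actual TaskResponse type: only two other fields are shown, the JSON key "VpcId" and the use of omitempty are assumptions, and the helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskResponse is a reduced, hypothetical version of a task metadata
// response. The JSON key for the VPC ID is assumed, not confirmed.
type taskResponse struct {
	Cluster string `json:"Cluster"`
	TaskARN string `json:"TaskARN"`
	VPCID   string `json:"VpcId,omitempty"`
}

// newTaskResponse forwards the instance's VPC ID into the response unless
// the instance is external (ECS Anywhere), in which case it is left empty.
func newTaskResponse(cluster, taskARN, instanceVPCID string, external bool) taskResponse {
	resp := taskResponse{Cluster: cluster, TaskARN: taskARN}
	if !external {
		resp.VPCID = instanceVPCID
	}
	return resp
}

// encode renders the response as JSON, returning "" on failure.
func encode(resp taskResponse) string {
	b, err := json.Marshal(resp)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encode(newTaskResponse("demo", "arn:example:task/1", "vpc-123", false)))
	fmt.Println(encode(newTaskResponse("demo", "arn:example:task/1", "", true)))
}
```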

    Testing

    Deployed changed agent source to a test EC2 instance, ran test tasks with awsvpc, bridge, and host network modes, and verified that VpcId field is populated in task responses for all cases.

    Deployed changed agent to a test ECS Anywhere instance, ran a test task with host network mode, and verified that VpcId field is not populated and that a successful Task Response is returned.

    Updated MACIS TMDE functional tests for Linux and Windows and ran them against the agent artifacts for this PR for EC2 Linux and Windows, and ECS-A Linux and Windows platforms.

    Updated existing Task Metadata Endpoint unit tests to include VpcId.

    New tests cover the changes: Updated existing tests

    Description for the changelog

    VpcId field will be present in task responses from Task Metadata Endpoint for EC2 instances. ECS Anywhere instances are not affected.

    Licensing

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by amogh09 0
  • [test pr] test ppa for go 1.18


    Summary

    Implementation details

    Testing

    New tests cover the changes:

    Description for the changelog

    Licensing

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by prateekchaudhry 0
Releases(v1.62.0)
  • v1.62.0(Jul 26, 2022)

    • Enhancement - Update golang version to 1.18.3 https://github.com/aws/amazon-ecs-agent/pull/3301
    • Enhancement - Update windows golang version to 1.18.3 https://github.com/aws/amazon-ecs-agent/pull/3317

    Source code(tar.gz)
    Source code(zip)
  • v1.61.3(Jun 15, 2022)

  • v1.61.2(Jun 3, 2022)

    1.61.2

    • Enhancement - Integrate new/updated build targets and processes #3234
    • Enhancement - Trimming task reason to a max of 1024 characters as per Back-end model #3229
    • Enhancement - Add log message when receiving error during cached image inspection #3216
    • Bug - Fix an issue where a task can be stuck in PENDING forever when container dependencies can never be fulfilled #3218
    Source code(tar.gz)
    Source code(zip)
  • v1.61.1(May 5, 2022)

    • Enhancement - Remove hard-coded task CPU limit and advertise a new capability ecs.capability.increased-task-cpu-limit #3197
    • Enhancement - Simplify api/task code #3176
    • Enhancement - Remove unused .travis.yml file #3171
    • Bug - Fix potential goroutine leaks #3170
    • Bug - Fix credential rotation issue with ECS-A Windows #3184
    • Bug - Fix Windows base image versions for integration tests #3179
    Source code(tar.gz)
    Source code(zip)
  • v1.61.0(Apr 8, 2022)

  • v1.60.1(Mar 28, 2022)

  • v1.60.0(Mar 4, 2022)

  • v1.59.0(Feb 9, 2022)

  • v1.58.0(Jan 20, 2022)

  • v1.57.1(Dec 9, 2021)

    • Enhancement - Remove unused TopContainer API #3079
    • Enhancement - Add support for metrics when using awsvpc network mode on Windows #3087
    • Enhancement - Update Agent build golang version to 1.17.3 #3097
    • Enhancement - Lower task cleanup duration #3088
    • Bug - Fix memory leak in task stats collector #3082
    Source code(tar.gz)
    Source code(zip)
  • v1.57.0(Nov 5, 2021)

  • v1.56.0(Oct 25, 2021)

  • v1.55.5(Oct 15, 2021)

  • v1.55.4(Oct 7, 2021)

  • v1.55.3(Sep 22, 2021)

    1.55.3

    • Enhancement - Upgrade Windows builds to golang version v1.17 #3010
    • Enhancement - Introduce a new environment variable ECS_EXCLUDE_IPV6_PORTBINDING. When enabled, this filters the IPv6 port bindings for default network mode tasks in DescribeTasks API call #3025
    • Bug - Fix an issue where the agent does not clean task execution credentials from the credential manager when stopping a task #2993
    Source code(tar.gz)
    Source code(zip)
  • v1.55.2(Sep 8, 2021)

    1.55.2

    • Enhancement - Add runtime-stats log file to periodically log agent's runtime stats such as used memory and CPU; also add new configuration setting to enable/disable pprof #3001
    • Enhancement - Improve the log message displayed when container instance registration fails due to attribute validation errors #2999
    • Enhancement - Upgrade to go 1.15.9 for Linux platforms #3002
    Source code(tar.gz)
    Source code(zip)
  • v1.55.1(Aug 24, 2021)

  • v1.55.0(Aug 12, 2021)

    • Feature - Support buffer limit option in FireLens #2958
    • Enhancement - Introduce optional jitter for task cleanup wait duration, configurable via the ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION_JITTER environment variable. When a large number of tasks are stopped at the same time, specifying this jitter can help avoid all the task cleanup happening at once (which could add pressure to the instance and, as a result, affect running tasks) #2969
    Source code(tar.gz)
    Source code(zip)
  • v1.54.1(Jul 27, 2021)

    1.54.1

    • Enhancement - Get container's exit code from docker event in case we receive a container die event, but fail to inspect the container. Previously the container's exit code was left as null in this case. #2940
    Source code(tar.gz)
    Source code(zip)
  • v1.54.0(Jul 12, 2021)

    1.54.0

    • Feature - ECS EC2 task networking for Windows tasks #2915
    • Bug - Upgrading the amazon-vpc-cni plugins submodule to address a bug on Windows Server 2004 and Windows Server 20H2 platforms #2930
    Source code(tar.gz)
    Source code(zip)
  • v1.53.1(Jun 28, 2021)

  • v1.53.0(Jun 11, 2021)

    1.53.0

    • Bug - Revert change that registered Windows ECS Instances using specific OSFamilyType #2859 to address #2881
    • Bug - Fix an edge case that could incorrectly mark a task as STOPPED when Docker crashes while stopping a container #2885
    Source code(tar.gz)
    Source code(zip)
  • v1.52.2(May 21, 2021)

    1.52.2

    • Enhancement - Validate agent config file path permission on Windows #2866
    • Bug - Fix potential goroutine leak when closing websocket connections #2854
    • Bug - Fix a bug where a task can be stuck in RUNNING indefinitely when a container can't be stopped due to an unresolved docker bug (see also the open PR in moby to fix the bug).
    Source code(tar.gz)
    Source code(zip)
  • v1.52.1(May 15, 2021)

  • v1.52.0(Apr 30, 2021)

  • v1.51.0(Apr 1, 2021)

    • Enhancement - Add configurable agent healthcheck localhost ip env var. #2834
    • Bug - Fix bug that could incorrectly clean up pause container before other containers. #2838
    • Bug - Fix task's network stats by omitting pause container in the network metrics calculation. #2836
    Source code(tar.gz)
    Source code(zip)
  • v1.50.3(Mar 19, 2021)

  • v1.50.2(Feb 22, 2021)

  • v1.50.1(Feb 12, 2021)

    1.50.1

    • Enhancement - Implementation of structured logs on top of seelog #2797
    • Bug - Fixed a task status deadlock and pulled container state for cached images when ECS_PULL_DEPENDENT_CONTAINERS_UPFRONT is enabled #2800
    Source code(tar.gz)
    Source code(zip)
  • v1.50.0(Jan 23, 2021)

    1.50.0

    • Feature - Allows ECS customers to execute interactive commands inside containers #2798
    • Enhancement - Add error responses into TMDEv4 taskWithTags responses #2789
    • Bug - Fixed the number of cpu units the Agent will reserve for the Linux container instances #2783
    Source code(tar.gz)
    Source code(zip)
Prometheus exporter for Amazon Elastic Container Service (ECS)

ecs_exporter This repo is still work in progress and is subject to change. This repo contains a Prometheus exporter for Amazon Elastic Contai

Prometheus Monitoring Community 42 Jul 27, 2022
Test-csi-driver - Amazon Elastic Block Store (EBS) CSI driver

Amazon Elastic Block Store (EBS) CSI driver Overview The Amazon Elastic Block St

Adi Vaknin 0 Feb 1, 2022
Igo Agent is the agent of Igo, a command-line tool, through which you can quickly start Igo

igo agent English | Chinese Igo Agent is the agent of Igo, a command-line tool, through which you can quickly start Igo, and other capabilities may be added lat

null 1 Dec 22, 2021
Shoes-agent - Framework for myshoes provider using agent

shoes-agent Framework for myshoes provider using agent. agent: agent for shoes-a

Tachibana waita 2 Jan 8, 2022
Cloudbase Solutions 1 Feb 17, 2022
Integrated ssh-agent for windows. (pageant compatible. openSSH ssh-agent etc ..)

OmniSSHAgent About The chaotic windows ssh-agent has been integrated into one program. Chaos Map of SSH-Agent on Windows There are several different c

YAMASAKI Masahide 25 Jul 29, 2022
Pulumi provider for the Elasticsearch Service and Elastic Cloud Enterprise

Terraform Bridge Provider Boilerplate This repository contains boilerplate code for building a new Pulumi provider which wraps an existing Terraform p

Pulumi 3 May 25, 2022
Sign Container Images with cosign and Verify signature by using Open Policy Agent (OPA)

Sign Container Images with cosign and Verify signature by using Open Policy Agent (OPA) In the beginning, I believe it is worth saying that this proje

Batuhan Apaydın 59 Jul 7, 2022
Cloud-on-k8s- - Elastic Cloud on Kubernetes (ECK)

Elastic Cloud on Kubernetes (ECK) Elastic Cloud on Kubernetes automates the depl

null 1 Jan 29, 2022
A Terraform module to manage cluster authentication (aws-auth) for an Elastic Kubernetes (EKS) cluster on AWS.

Archive Notice The terraform-aws-modules/eks/aws v.18.20.0 release has brought back support aws-auth configmap! For this reason, I highly encourage us

Aidan Melen 25 Jul 28, 2022
Web user interface and service agent for the monitoring and remote management of WinAFL.

WinAFL Pet WinAFL Pet is a web user interface dedicated to WinAFL remote management via an agent running as a system service on fuzzing machines. The

Gabor Seljan 51 Jul 22, 2022
Moby Project - a collaborative project for the container ecosystem to assemble container-based systems

The Moby Project Moby is an open-source project created by Docker to enable and accelerate software containerization. It provides a "Lego set" of tool

Moby 63.7k Aug 4, 2022
Boxygen is a container as code framework that allows you to build container images from code

Boxygen is a container as code framework that allows you to build container images from code, allowing integration of container image builds into other tooling such as servers or CLI tooling.

nitric 5 Dec 13, 2021
The Container Storage Interface (CSI) Driver for Fortress Block Storage This driver allows you to use Fortress Block Storage with your container orchestrator

fortress-csi The Container Storage Interface (CSI) Driver for Fortress Block Storage This driver allows you to use Fortress Block Storage with your co

Fortress 0 Jan 23, 2022
Fast, concurrent, streaming access to Amazon S3, including gof3r, a CLI. http://godoc.org/github.com/rlmcpherson/s3gof3r

s3gof3r s3gof3r provides fast, parallelized, pipelined streaming access to Amazon S3. It includes a command-line interface: gof3r. It is optimized for

Randall McPherson 1.1k Jul 14, 2022
ecsk is a CLI tool to interactively use frequently used functions of docker command in Amazon ECS. (docker run, exec, cp, logs, stop)

English / 日本語 ecsk ECS + Task = ecsk ?? ecsk is a CLI tool to interactively use frequently used functions of docker command in Amazon ECS. (docker run

null 103 Jul 12, 2022
This repository contains Prowjob configurations for Amazon EKS Anywhere.

Amazon EKS Anywhere Prow Jobs This repository contains Prowjob configuration for the Amazon EKS Anywhere project, which includes the eks-anywhere and

Amazon Web Services 14 Apr 18, 2022
Run Amazon EKS on your own infrastructure 🚀

Amazon EKS Anywhere Conformance test status: Amazon EKS Anywhere is a new deployment option for Amazon EKS that enables you to easily create and opera

Amazon Web Services 1.6k Jul 31, 2022
Amazon Web Services (AWS) providerAmazon Web Services (AWS) provider

Amazon Web Services (AWS) provider The Amazon Web Services (AWS) resource provider for Pulumi lets you use AWS resources in your cloud programs. To us

William Garcia Jacobo 0 Nov 10, 2021