Intro
Recently, I got myself the AWS Certified Associate Solutions Architecture certificate. To start getting more familiar with it, I did the excellent Cloud Guru class on the certificate preparation on Udemy. One part that was completely missing from that preparatory course was ECS. Yet questions related to it came up in the exam. Questions on ECS and Fargate, among others. I though maybe Fargate is something from Star Trek. Enter the Q continuum? But no.
Later on, I also went through the Backspace preparatory course on Udemy, which briefly touches on ECS, but does not really give any in-depth understanding. Maybe the certificate does not require it, but I wanted to learn it to understand the practical options on working with AWS. So I went on to explore.. and here it is.
Elastic Container Service (ECS) is an AWS service for hosting and running Docker images and containers.
ECS Architecture
The following image illustrates the high-level architecture, components, and their relations in ECS (as I see it):
The main components in this:
- Elastic Container Service (ECS): The overarching service name that is composed of the other (following) elements.
- Elastic Container Registry (ECR): basically handles the role of private Docker Hub. Hosts Docker images (=templates for what a container runs).
- Docker Hub: The general Docker Hub on the internet. You can of course use standard Docker images and templates on AWS ECS as well.
- Docker/task runner: The hosts running the Docker containers. Fargate or EC2 runner.
- Docker image builder: Docker images are built from specifications given in a DockerFile. The images can then be run in a Docker container. So if you want to use your own images, you need to build them first, using either AWS EC2 instances, or your own computers. Upload the build images to ECR or Docker Hub. I call the machine used to do the build here "Docker Image Builder" even if it is not an official term.
- Event Sources: Triggers to start some task running in an ECS Docker container. ELB, Cloudwatch and S3 are just some examples here, I have not gone too deep into all the possibilities.
- Elastic Load Balancer (ELB): To route incoming traffic to different container instances/tasks in your ECS configuration. So while ELB can "start tasks", it can also direct traffic to running tasks.
- Scheduled tasks: Besides CloudWatch events, ECS tasks may be manually started or scheduled over time.
Above is of course a simplified description. But it should capture the high level idea.
Fargate: Serverless ECS
Fargate is the "serverless" ECS version. This just means the Docker containers are deployed on hosts fully managed by AWS. It reduces the maintenance overhead on the developer/AWS customer as the EC2 management for the containers is automated. The main difference being that there is no need to define the exact EC2 (host) instance types to run the container(s). This seems like a simply positive thing for me. Otherwise I would need to try to calculate my task resource definitions vs allocated containers, etc. So without Fargate, I need to manage the allocated vs required resources for the Docker containers manually. Seems complicated.
Elastic Container Registry / ECR
ECR is the AWS integrated, hosted, and managed container registry for Docker images. You build your images, upload them to ECR, and these are then available to ECS. Of course, you can also use Docker Hub or any other Docker registry (that you can connect to), but if you run your service on AWS and want to use private container images, ECR just makes sense.
When a new Docker container is needed to perform a task, the AWS ECS infrastructure can then pull the associated "container images" from this registry and deploy them in ECS host instances. The hosts being EC2 instances with the ECS-agent running. The EC2 instances managed either by you (EC2 ECS host type) or by AWS (Fargate).
Since hosting custom images with own code likely includes some IPR you don’t want to share with everyone, ECR is encrypted, as well as all communication with it. There are also ECS VPC Endpoints available to further secure the access and to reduce the communication latencies, removing public Internet roundtrips, with the ECR.
As for availability and reliability, I did not directly find good comments on this, except that the container images and ECR instances are region-specific. While AWS advertises ECR as reliable and scalable and all that, I guess this means they must simply be replicated within the region.
Besides being region-specific, there are also some limitations on the ECS service. But these are in the order of max 10000 repositories per region, each with max of 10000 images. And up to 20 docker pull type requests per second, bursting up to 200 per second. I don’t see myself going over those limits, pretty much ever. With some proper architecting, I do not see this generally happening or these limits becoming a problem. But I am not running Netflix on it, so maybe someone else has it bigger.
ECS Docker Hosting Components
The following image, inspired by a Medium post (thanks!), illustrates the actual Docker related components in ECS:
- Cluster: A group of ECS container instances (for EC2 mode), or a "logical grouping of tasks" (Fargate).
- Container instance: An EC2 instance running the ECS-agent (a Go program, similar to Docker daemon/agent).
- Service: This defines what your Docker tasks are supposed to do. It defines the configuration, such as the Task Defition to run the service, the number of task instances to create from the definition, and the scheduling policy. I see this as a service per task, but defining also how multiple instances of the tasks work together to implement a "service", and their related overall configuration.
- Task Definition: Defines the docker image, resources (CPU, memory), instance type (micro, nano, macro, …), IAM roles, image boot command, …
- Task Instance: An instantiation of a task definition. Like docker run on your own host, but for the ECS.
Elastic Load Balancer / ELB with ECS
The basic function of a load balancer is to spread the load for an ECS service across its multiple tasks running on different host instances. Similar to "traditional" EC2 scaling based on monitored ELB target health and status metrics, scaling on ECS can also be triggered. Simply based on ECS tasks vs pure EC2 instances in a traditional setting.
As noted higher above, an Elastic Load Balancer (ELB) can be used to manage the "dynamics" of the containers coming and going. Unlike in a traditional AWS load balancer setting, with ECS, I do not register the containers to the ELB as targets myself. Instead, the ECS system registers the deployed containers as targets to the ELB target group as the container instances are created. The following image illustrates the process:
The following points illustrate this process:
- ELB performs healthchecks on the containers with a given configuration (e.g., HTTP request on a path). If the health check fails (e.g., HTTP server does not respond), it terminates the associated ECS task and starts another one (according to defined ESC scaling policy)
- Additionally there are also ECS internal healthchecks for similar purposes, but configured directly on the (ECS) containers.
- Metrics such as Cloudwatch monitoring ECS service/task CPU loads can be used to trigger autoscaling, to deploy new tasks for a service (up-scaling) or remove excess tasks (down-scaling).
- As requests come in, they are forwarded to the associated ECS tasks, and the set of tasks may be scaled according to the defined service scaling policy.
- When a new task / container instance is spawned, it registers itself to the ELB target group. The ELB configuration is given in the service definition to enable this.
- Additionally, there can be other tasks not associated to the ELB, such as scheduled tasks, constantly running tasks, tasks triggered by Cloudwatch events or other sources (e.g., your own code on AWS), …
Few points that are still unclear for me:
- An ELB can be set to either instance or port type. I experimented with simple configurations but had the instance type set. Yet the documentation states that with awsvpc network type I should use IP based ELB configuration. But it still seemed to work when I used instance-type. Perhaps I would see more effect with larger configurations..
- How the ECS tasks, container instances, and ELBs actually relate to each other. Does the ELB actually monitor the tasks or the container instances? Does the ELB instance vs port type impact this? Should it monitor tasks but I set it to monitor instances, and it worked simply because I was just running a single task on a single instance? No idea..
Security Groups
As with the other services, such as Lambda in my previous post, to be able to route the traffic from the ELB to the actual Docker containers running your code, the security groups need to be configured to allow this. This would look something like this:
Here, the ELB is allowed to accept connections from the internet, and to make connections to the security group for the containers. The security groups are:
- SG1: Assigned to the ELB. Allows traffic in from the internet. Because it is a security group (not a network access control list]), traffic is also allowed back out if allowed in.
- SG2: Assigned to the ECS EC2 instances. Allows traffic in from SG1. And back out, as usual..
Final Thoughts
I found ECS to be reasonably simple, and providing useful services to simplify management of Docker images and containers. However, Lambda functions seem a lot simpler still, and I would generally use those (as the trend seems to be..). Still, I guess there are still be plenty of use cases for ECS as well. For those with investments into, or otherwise preferences for containers, and for longer running tasks, or otherwise tasks less suited for short invocation Lambdas.
As the AWS ECS-agent is just an open-source program written in Golang and hosted on Github, it seems to me that it should be possible to host ECS agents anywhere I like. Just as long as they could connect to the ECS services. How well that would work from outside the core AWS infrastructure, I am not sure. But why not? Have not tried it, but perhaps..
Looking at ECS and Docker in general, Lambda functions seem like a clear evolution path from this. Docker images are composed from DockerFiles, which build the images from [stacked layers(https://blog.risingstack.com/operating-system-containers-vs-application-containers/)%5D, which are sort of build commands. The final layer being the built "product" of those layered commands. Each layer in a Dockerfile builds on top of the previous layer.
Lambda functions similarly have a feature called Lambda Layers, which can be used to provide a base stack for the Lambda function to execute on. They seem a bit different in defining sets of underlying libraries and how those stack on top of each other. But the core concept seems very similar to me. Finally, the core a Lambda function is the function that executes when a triggering event runs, similar to the docker run command in ECS. Much similar, such function.
The main difference for Lambda vs ECS perhaps seems to be in requiring even less infrastructure management from me when using Lambda (vs ECS). The use of Lambda was illustrated in my earlier post.