How to Automate Container Instance Draining in Amazon ECS
My colleague Madhuri Peri sent a nice guest post that describes how to use container instance draining to remove tasks from an instance before scaling down a cluster with Auto Scaling Groups.
There are times when you might need to remove an instance from an Amazon ECS cluster; for example, to perform system updates, update the Docker daemon, or scale down the cluster size. Container instance draining enables you to remove a container instance from a cluster without impacting tasks in your cluster. It works by preventing new tasks from being scheduled for placement on the container instance while it is in the DRAINING state, replacing service tasks on other container instances in the cluster if the resources are available, and enabling you to wait until tasks have successfully moved before terminating the instance.
You can change a container instance’s state to DRAINING manually, but in this post, I demonstrate how to use container instance draining with Auto Scaling groups and AWS Lambda to automate the process.
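For reference, the manual state change can be sketched with boto3 as follows. This is a minimal illustration, not code from the sample template: the helper name `set_draining` is hypothetical, and the ECS client is passed in so the function can be exercised without AWS credentials.

```python
def set_draining(ecs_client, cluster_name, container_instance_arn):
    """Ask ECS to move one container instance into the DRAINING state.

    `ecs_client` is expected to behave like boto3.client("ecs").
    Returns the status that ECS reports for the instance afterwards.
    """
    response = ecs_client.update_container_instances_state(
        cluster=cluster_name,
        containerInstances=[container_instance_arn],
        status="DRAINING",
    )
    # Instances that could not be updated are reported under 'failures'.
    if response.get("failures"):
        raise RuntimeError("Could not drain instance: {}".format(response["failures"]))
    return response["containerInstances"][0]["status"]


# Usage (requires AWS credentials; the cluster name is a placeholder):
#   import boto3
#   set_draining(boto3.client("ecs"), "my-cluster", container_instance_arn)
```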
Amazon ECS overview
Amazon ECS is a container management service that makes it easy to run, stop, and manage Docker containers on a cluster, or logical grouping of EC2 instances. When you run tasks using ECS, you place them on a cluster. Amazon ECS downloads your container images from a registry that you specify, and runs those images on the container instances within your cluster.
Using the container instance draining state
Auto Scaling groups support lifecycle hooks that can be invoked to allow custom processes to finish before instances launch or terminate. For this example, the lifecycle hook invokes a Lambda function that performs two tasks:
- Sets the ECS container instance state to DRAINING.
- Checks if there are any tasks left on the container instance. If there are running tasks still in the process of draining, it posts a message to SNS so that the Lambda function is called again.
Lambda repeats the second step until there are no tasks running on the container instance or the heartbeat timeout on the lifecycle hook is reached (set to 15 minutes in the sample CloudFormation template), whichever occurs first. Afterward, control is returned to the Auto Scaling lifecycle hook, and the instance terminates. This process is shown in the following diagram:
Try it out!
Use the CloudFormation template to set up the resources described in this post. This template creates the following resources:
- The VPC and associated network elements (subnets, security groups, route table, etc.)
- An ECS cluster, ECS service, and sample ECS task definition
- An Auto Scaling group with two EC2 instances and a termination lifecycle hook
- A Lambda function
- An SNS topic
- IAM roles for Lambda to execute
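To make the termination lifecycle hook more concrete, the following sketch builds the parameters that a boto3 `put_lifecycle_hook` call would take. It is an approximation of what the template sets up, not a copy of it: the hook name and the `DefaultResult` value are assumptions, and only the 15-minute heartbeat timeout comes from this post.

```python
def build_termination_hook(group_name, topic_arn, role_arn, heartbeat_seconds=900):
    """Return keyword arguments for autoscaling.put_lifecycle_hook.

    900 seconds matches the 15-minute heartbeat timeout mentioned in the post.
    """
    return {
        "LifecycleHookName": "ecs-drain-on-terminate",  # hypothetical name
        "AutoScalingGroupName": group_name,
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        "NotificationTargetARN": topic_arn,  # SNS topic that triggers Lambda
        "RoleARN": role_arn,                 # role allowing Auto Scaling to publish
        "HeartbeatTimeout": heartbeat_seconds,
        "DefaultResult": "ABANDON",          # assumption; the template may differ
    }


# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("autoscaling").put_lifecycle_hook(
#       **build_termination_hook(group_name, topic_arn, role_arn))
```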
Create the CloudFormation stack and then see how this works by triggering an instance termination event.
In the Amazon EC2 console, choose Auto Scaling Groups and select the name of the Auto Scaling group created by CloudFormation (from the resources section of the CloudFormation template).
Select Actions, Edit and update the group to reduce the desired number of instances by 1. This initiates the termination process for one of the instances.
Select the Auto Scaling group Instances tab; one instance state value should show the lifecycle state “Terminating:Wait”.
This is when the lifecycle hook gets activated and posts a message to SNS. The Lambda function is then executed in response to the SNS message trigger.
The Lambda function changes the ECS container instance state to DRAINING. The ECS service scheduler then stops the tasks on the instance and starts tasks on an available instance.
You can go to the ECS console to confirm that the container instance state is DRAINING.
After the tasks have drained, the Auto Scaling group activity history confirms that the EC2 instance is terminated.
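The console steps above can also be driven from code. The sketch below lowers the desired capacity of an Auto Scaling group by one, which should start the same termination flow. The function name is hypothetical, and the client is passed in so the logic can be tried against a stub.

```python
def scale_in_by_one(asg_client, group_name):
    """Reduce the group's desired capacity by one, never going below MinSize.

    `asg_client` is expected to behave like boto3.client("autoscaling").
    Returns the desired capacity after the call.
    """
    groups = asg_client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )["AutoScalingGroups"]
    if not groups:
        raise ValueError("Auto Scaling group not found: {}".format(group_name))

    current = groups[0]["DesiredCapacity"]
    target = max(current - 1, groups[0]["MinSize"])
    if target != current:
        asg_client.set_desired_capacity(
            AutoScalingGroupName=group_name,
            DesiredCapacity=target,
        )
    return target


# Usage (requires AWS credentials):
#   import boto3
#   scale_in_by_one(boto3.client("autoscaling"), group_name)
```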
How it works
Take a moment to see the inner workings of the Lambda function. The function first checks to see if the event received has a LifecycleTransition value matching autoscaling:EC2_INSTANCE_TERMINATING.
# If the event received is instance terminating...
if 'LifecycleTransition' in message.keys():
    print("message autoscaling {}".format(message['LifecycleTransition']))
    if message['LifecycleTransition'].find('autoscaling:EC2_INSTANCE_TERMINATING') > -1:
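The excerpt above assumes that `message` has already been pulled out of the SNS event. A minimal sketch of that step is shown here. The function name is hypothetical, and it uses a strict equality check where the excerpt uses `find`, which is a simplification rather than the author's exact logic.

```python
import json

TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"


def extract_lifecycle_message(event):
    """Return the lifecycle message dict for a termination notice, else None.

    SNS delivers the Auto Scaling notification as a JSON string inside
    event["Records"][0]["Sns"]["Message"].
    """
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    if message.get("LifecycleTransition") == TERMINATING:
        return message
    return None
```

A returned message carries fields such as `EC2InstanceId` and `LifecycleHookName`, which the rest of the function relies on.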
If there is a match, it proceeds to call the function “checkContainerInstanceTaskStatus”. This function looks up the container instance that corresponds to the EC2 instance ID received, and sets the container instance state to ‘DRAINING’.
# Get lifecycle hook name
lifecycleHookName = message['LifecycleHookName']
print("Setting lifecycle hook name {} ".format(lifecycleHookName))
# Check if there are any tasks running on the instance
tasksRunning = checkContainerInstanceTaskStatus(Ec2InstanceId)
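The lookup from EC2 instance ID to container instance is not shown in the excerpt. One way it could be done is sketched below, by listing the cluster's container instances and matching on the `ec2InstanceId` field. The function name is hypothetical, and the sample function may well do this differently.

```python
def find_container_instance_arn(ecs_client, cluster_name, ec2_instance_id):
    """Return the container instance ARN registered for an EC2 instance, or None.

    `ecs_client` is expected to behave like boto3.client("ecs").
    Walks every page of list_container_instances until a match is found.
    """
    next_token = None
    while True:
        kwargs = {"cluster": cluster_name}
        if next_token:
            kwargs["nextToken"] = next_token
        page = ecs_client.list_container_instances(**kwargs)

        arns = page.get("containerInstanceArns", [])
        if arns:
            described = ecs_client.describe_container_instances(
                cluster=cluster_name, containerInstances=arns
            )
            for instance in described["containerInstances"]:
                if instance["ec2InstanceId"] == ec2_instance_id:
                    return instance["containerInstanceArn"]

        next_token = page.get("nextToken")
        if not next_token:
            return None
```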
It then checks to see if there are tasks running on the instance. If there are tasks, it publishes a message to the SNS topic to trigger the Lambda function again and then exits.
# Use task ARNs to describe the tasks
descTaskResp = ecsClient.describe_tasks(cluster=clusterName, tasks=listTaskResp['taskArns'])
for key in descTaskResp['tasks']:
    print("Task status {}".format(key['lastStatus']))
    print("Container instance ARN {}".format(key['containerInstanceArn']))
    print("Task ARN {}".format(key['taskArn']))
# Check if any tasks are running
if len(descTaskResp['tasks']) > 0:
    print("Tasks are still running..")
    return 1
else:
    print("NO tasks are on this instance {}..".format(Ec2InstanceId))
    return 0
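The excerpt covers the task check but not the re-trigger. A minimal sketch of publishing the same lifecycle message back to the SNS topic might look like this. The function name, the subject line, and the short pause before publishing are all assumptions added for illustration; the pause is only there to avoid re-invoking the function in a tight loop.

```python
import json
import time


def requeue_drain_check(sns_client, topic_arn, message, delay_seconds=5):
    """Publish the lifecycle message again so that Lambda is invoked once more.

    `sns_client` is expected to behave like boto3.client("sns").
    Returns the MessageId that SNS assigns to the published message.
    """
    # Assumed back-off so tasks have a moment to stop before the next check.
    time.sleep(delay_seconds)
    response = sns_client.publish(
        TopicArn=topic_arn,
        Message=json.dumps(message),
        Subject="Re-check ECS container instance draining",
    )
    return response["MessageId"]
```

Because the message is republished unchanged, the next invocation sees the same `LifecycleTransition` and instance ID and repeats the check.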
When the Lambda function sees that no more tasks are running on the container instance, it proceeds to complete the lifecycle hook and terminate the EC2 instance.
# Complete lifecycle hook.
try:
    response = asgClient.complete_lifecycle_action(
        LifecycleHookName=lifecycleHookName,
        AutoScalingGroupName=asgGroupName,
        LifecycleActionResult='CONTINUE',
        InstanceId=Ec2InstanceId)
    print("Response = {}".format(response))
    print("Completed lifecycle hook action")
except Exception as e:
    print(str(e))
Conclusion
Container instance draining simplifies cluster scale-down and operational activities such as new AMI rollouts. For example, with the integration described in this post, you could use CloudFormation and CodePipeline to create a rolling deployment that launches new instances and terminates instances in batches.
To learn more about container instance draining, see the Amazon ECS Developer Guide.
If you have questions or suggestions, please comment below.