Why are spot instances terminated in AWS AutoScaling Group without being forcibly reclaimed?

AutoScaling Group tries to evenly distribute instances across Availability Zones (AZs). If a particular AZ has more instances, it terminates instances in that AZ and launches new ones in AZs with fewer instances to maintain balance.

How can you distinguish between AZ balancing termination and spot instance forced reclamation?

You can check the termination reason in the AWS Console's Activity History. For AZ balancing, messages like 'an instance was launched to balance instances across zones' are recorded.

Unexpected Termination of Spot Instances in AutoScaling Group

Analysis of the causes and countermeasures for unexpected spot instance termination in AWS AutoScaling Group.

Posted Jul 13, 2023 Updated Feb 8, 2026

By w-seok

1 min read

Introduction

A spot instance running in an AWS AutoScaling Group was unexpectedly terminated. This was a termination for reasons other than forced reclamation, and I analyzed the cause, impact, and countermeasures.

Problem Situation

1. Symptoms

A spot instance consuming message queues was terminated even though it wasn’t forcibly reclaimed
Received notification that a new instance was started
Fortunately, another server processed the requeued messages, so there was no service interruption

AutoScaling Group Log

At 2023-07-09T00:32:21Z instances were launched to balance instances
in zones null with other zones resulting in more than desired number
of instances in the group.
At 2023-07-09T00:46:17Z availability zones
had 4 1 instances respectively. An instance was launched to aid in
balancing the group's zones.

Looking at the details more closely:

Availability zones are isolated locations within a region. These are independent zones, for example us-west-2a, us-west-2b, us-west-2c.
AutoScaling distributes instances evenly across each availability zone.

At this time, it launches instances in the availability zone with the fewest active instances and terminates instances elsewhere to achieve this.

Conclusion

In other words, the problem occurred because an instance in a region with many instances was terminated to evenly distribute the number of instances across each region
Fortunately, it’s very rare for an instance processing user requests to die, and most of the time it’s a newly launched instance that gets terminated. Even if an instance processing requests dies, unprocessed messages are requeued back into the queue, so we resolved it by just logging to Slack

Issue

Issue AutoScaling Group AWS Spot Instance

This post is licensed under CC BY 4.0 by the author.