In Azure, Update Domains (UDs) play a crucial role in maintaining high availability during planned maintenance and platform updates.
Understanding how update domains work and how to configure them effectively is essential for ensuring that your applications experience minimal downtime during updates, reboots, or patches.
Here’s a comprehensive review of what you need to know about Update Domains in Azure.
Definition of Update Domain (UD)
An Update Domain (UD) is a logical grouping of Azure virtual machines (VMs) that are updated (patched, rebooted, etc.) together during planned maintenance or platform updates.
Azure updates one Update Domain at a time, so the VMs in an Availability Set or Virtual Machine Scale Set (VMSS) are never all affected by planned maintenance at once.
This limits downtime and helps keep your application available.
Purpose of Update Domains
The primary purpose of update domains is to provide resilience during planned maintenance events.
Key Points
Isolation during updates
VMs in different update domains are not updated simultaneously.
This isolation ensures that if one update domain is being updated (e.g., patched or rebooted), VMs in other update domains continue to operate without interruption.
Preventing full downtime
During scheduled maintenance events (like operating system patches), Azure only updates one update domain at a time.
If your VMs are spread across multiple update domains, your application remains available while one update domain is being updated.
How Update Domains Work
When you deploy VMs in an Availability Set or Virtual Machine Scale Set (VMSS), Azure automatically divides the VMs into multiple update domains.
During platform maintenance, only the VMs in the update domain currently being updated will be impacted (rebooted, patched, etc.).
VMs in other update domains will remain unaffected and continue to serve traffic.
Example
Availability Set with 3 Update Domains
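If you have 6 VMs in an Availability Set with 3 update domains, Azure assigns the VMs to update domains sequentially as they are created. Update domains are numbered from 0, so the distribution typically looks like this:
Update Domain 0: VMs 1, 4
Update Domain 1: VMs 2, 5
Update Domain 2: VMs 3, 6
During an update, Azure updates one update domain (for example, rebooting or patching VMs 1 and 4 in Update Domain 0) while the VMs in the other two update domains continue running.
Once that update domain has been updated and given time to recover, Azure proceeds to another one, and so on. The order in which update domains are processed is not guaranteed to be sequential.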
Azure's Maintenance Cycle
Updating one update domain at a time ensures that not all VMs are rebooted or patched simultaneously.
Therefore, only a portion of the VMs are unavailable at any given time, thus maintaining the availability of the application.
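The maintenance cycle described above can be sketched as a small simulation. This is only an illustration, not Azure's actual scheduler: it assumes VMs are assigned to update domains sequentially (round-robin) in creation order with zero-based domain numbers, and it walks the domains in numeric order even though the real order may differ. The function names `assign_update_domains` and `maintenance_walk` are hypothetical.

```python
from collections import defaultdict

def assign_update_domains(vm_names, ud_count):
    """Round-robin placement: the i-th VM goes to update domain i % ud_count."""
    domains = defaultdict(list)
    for index, name in enumerate(vm_names):
        domains[index % ud_count].append(name)
    return dict(domains)

def maintenance_walk(domains):
    """Yield (ud, offline, online) for each step, one update domain at a time."""
    all_vms = [vm for vms in domains.values() for vm in vms]
    for ud in sorted(domains):  # assumption: numeric order, for readability only
        offline = domains[ud]
        online = [vm for vm in all_vms if vm not in offline]
        yield ud, offline, online

vms = [f"vm{i}" for i in range(1, 7)]
placement = assign_update_domains(vms, ud_count=3)
for ud, offline, online in maintenance_walk(placement):
    print(f"UD {ud}: updating {offline}, still serving {online}")
```

At every step of the walk, four of the six VMs stay online, which is the property the update domain mechanism is meant to provide.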
How to Configure Update Domains
Setting Update Domains in Availability Sets
When creating an Availability Set, you can specify the number of update domains to divide the VMs across.
An Availability Set can have up to 20 update domains. If you do not specify a number, Resource Manager deployments default to 5.
Minimum Update Domains
Use at least 2 update domains, with VMs in each, so that planned maintenance never affects all of your VMs at once.
Best Practices
For most scenarios, aim for at least 3 update domains for better resilience during maintenance, though the optimal number depends on the size of your application and how critical uptime is.
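As a rough sketch of where the update domain count is set, the snippet below builds the kind of resource definition an ARM template uses for an Availability Set. The property names `platformUpdateDomainCount` and `platformFaultDomainCount` are the ones used by the `Microsoft.Compute/availabilitySets` resource type. The helper name, the default values, the lower bound of 1, and the `apiVersion` string are assumptions for illustration and should be checked against the current template reference.

```python
import json

MAX_UPDATE_DOMAINS = 20  # upper limit stated in the text

def availability_set_resource(name, location, update_domains=5, fault_domains=2):
    """Build an ARM-template-style resource dict for an Availability Set."""
    # Assumption: 1 is treated as the lowest accepted value.
    if not 1 <= update_domains <= MAX_UPDATE_DOMAINS:
        raise ValueError(f"update_domains must be between 1 and {MAX_UPDATE_DOMAINS}")
    return {
        "type": "Microsoft.Compute/availabilitySets",
        "apiVersion": "2023-03-01",  # assumption: verify the version you deploy with
        "name": name,
        "location": location,
        "sku": {"name": "Aligned"},  # "Aligned" is the SKU used with managed disks
        "properties": {
            "platformUpdateDomainCount": update_domains,
            "platformFaultDomainCount": fault_domains,
        },
    }

resource = availability_set_resource("web-avset", "eastus", update_domains=3)
print(json.dumps(resource, indent=2))
```

The update domain count is fixed when the Availability Set is created, so it is worth validating the value before deployment rather than after.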
Setting Update Domains in VM Scale Sets (VMSS)
When creating VM Scale Sets, Azure will automatically distribute the VMs across multiple update domains to provide resilience during updates.
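Unlike an Availability Set, a VMSS does not let you choose the number of update domains directly; the platform manages it, and within a placement group a scale set typically behaves like an Availability Set with 5 update domains.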
Update Domain Limits
Maximum Number of Update Domains
Azure allows a maximum of 20 update domains for a single Availability Set.
The number actually in use depends on the value configured when the set was created and on how many VMs it contains, since an update domain with no VMs in it adds no protection.
For large-scale deployments (with hundreds or thousands of VMs), Azure VM Scale Sets (VMSS) might offer a more scalable solution for managing update domains.
Effect on VM Availability
The more update domains you configure, the smaller the share of your VMs that is updated at any one time, so each maintenance step removes less capacity from service.
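The effect can be made concrete with a little arithmetic. The sketch below assumes VMs are spread as evenly as possible across update domains, so the fullest domain holds the ceiling of `vm_count / ud_count` VMs. The function names are hypothetical.

```python
import math

def worst_case_offline(vm_count, ud_count):
    """Largest number of VMs in any single update domain under even spreading."""
    return math.ceil(vm_count / ud_count)

def capacity_during_update(vm_count, ud_count):
    """Fraction of VMs still running while the fullest update domain is updated."""
    return 1 - worst_case_offline(vm_count, ud_count) / vm_count

for uds in (2, 3, 5, 10, 20):
    offline = worst_case_offline(20, uds)
    remaining = capacity_during_update(20, uds)
    print(f"{uds:>2} UDs: up to {offline} of 20 VMs offline, {remaining:.0%} capacity remains")
```

With 20 VMs, going from 2 to 5 update domains raises the capacity that stays in service during a maintenance step from 50% to 80%.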
Impact of Update Domains on SLAs
SLA Considerations
For the 99.95% uptime SLA, Azure requires two or more VM instances deployed in the same Availability Set, which in turn places them in different update domains.
With multiple update domains, your application has higher availability during platform updates because not all VMs will be affected by the updates at once.
SLA Example
VMs in an Availability Set with 2 Update Domains
In this case, if one update domain undergoes a planned maintenance event, the other update domain will still be available, maintaining the 99.95% SLA.
VMs in an Availability Set with 3 or more Update Domains
The more update domains you have, the smaller the share of VMs taken offline by each planned update. The SLA figure itself does not change, but more of your capacity stays in service during maintenance.
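To put the 99.95% figure in perspective, the following sketch converts an availability percentage into a downtime budget. It assumes a 30-day month purely for illustration; the binding definitions of downtime and the measurement period are those in Microsoft's SLA terms, not this calculation.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Downtime budget in minutes for a given SLA over a period of `days`."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100

print(f"{allowed_downtime_minutes(99.95):.1f} minutes per 30-day month")
```

Under these assumptions, 99.95% leaves a budget of about 21.6 minutes per month.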
Role of Load Balancers
Azure Load Balancer is essential when using multiple update domains to ensure that traffic is routed to healthy VMs during planned maintenance or updates.
If one update domain is being updated, the Load Balancer will ensure that traffic is only directed to the healthy VMs in other update domains.
Load Balancer Configuration
Health Probes
Configure health probes to ensure the Load Balancer only sends traffic to healthy VMs.
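A health probe has no knowledge of update domains. It detects that a VM has stopped responding (for example, because it is rebooting during maintenance), and the Load Balancer then sends new traffic only to the VMs that still respond.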
Automatic Failover
If a VM in an update domain is down (due to maintenance or failure), the Load Balancer automatically fails over to healthy instances in other update domains.
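The probe behaviour can be illustrated with a toy model. This is a simplified sketch and not how Azure Load Balancer is implemented: the `Backend` class, the helper functions, and the threshold of two consecutive failures are all hypothetical, and simple rotation stands in for the hash-based distribution the real service uses.

```python
from itertools import cycle

class Backend:
    """A backend VM with a running count of consecutive failed probes."""
    def __init__(self, name, update_domain):
        self.name = name
        self.update_domain = update_domain
        self.failed_probes = 0

def record_probe(backend, succeeded):
    """Reset the failure count on success, otherwise increment it."""
    backend.failed_probes = 0 if succeeded else backend.failed_probes + 1

def healthy_backends(pool, unhealthy_threshold=2):
    """Backends that have not yet reached the consecutive-failure threshold."""
    return [b for b in pool if b.failed_probes < unhealthy_threshold]

pool = [Backend(f"vm{i}", update_domain=i % 3) for i in range(6)]

# Simulate update domain 0 rebooting: its VMs fail two probes in a row.
for _ in range(2):
    for backend in pool:
        record_probe(backend, succeeded=(backend.update_domain != 0))

# Rotate new requests over the backends that are still healthy.
targets = cycle(healthy_backends(pool))
print([next(targets).name for _ in range(4)])
```

Once the VMs in the rebooting update domain answer probes again, their failure count resets and they rejoin the rotation without any manual step.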
Best Practices for Update Domains
Use Multiple Update Domains
Always deploy VMs across multiple update domains to avoid having all VMs in your Availability Set or VMSS go down during maintenance.
At least 3 update domains is a good practice to improve resilience and availability.
Spread Critical Workloads Across Update Domains
Ensure that critical workloads are spread across different update domains.
This way, if one update domain goes down during maintenance, the other update domains can still serve traffic.
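One way to check this is to look for workloads whose instances all sit in a single update domain. The sketch below assumes a hypothetical inventory, a list of dicts with `tier` and `update_domain` keys, that you would populate from your own deployment data.

```python
from collections import defaultdict

def tiers_at_risk(vms):
    """Return workload tiers whose instances all sit in one update domain."""
    domains_by_tier = defaultdict(set)
    for vm in vms:
        domains_by_tier[vm["tier"]].add(vm["update_domain"])
    return sorted(tier for tier, uds in domains_by_tier.items() if len(uds) < 2)

# Hypothetical inventory: both "api" instances share update domain 2.
inventory = [
    {"name": "web-1", "tier": "web", "update_domain": 0},
    {"name": "web-2", "tier": "web", "update_domain": 1},
    {"name": "api-1", "tier": "api", "update_domain": 2},
    {"name": "api-2", "tier": "api", "update_domain": 2},
]
print(tiers_at_risk(inventory))
```

Any tier this check reports would go fully offline while its single update domain is being updated.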
Use Azure Load Balancer
Configure an Azure Load Balancer to distribute traffic across healthy VMs in multiple update domains.
This ensures that users experience minimal disruption during updates.
Monitor with Azure Monitor
Azure Monitor provides insights into the health of VMs and allows you to track when updates or maintenance occur, so you can respond quickly if any issues arise.
Balance Between Fault and Update Domains
A well-balanced distribution of VMs across fault domains and update domains will ensure that both planned and unplanned downtime have minimal impact on your application.
Test and Verify
Regularly test your application and update strategies to ensure that the planned maintenance processes work as expected without affecting the overall availability or performance.
Differences Between Update Domains and Fault Domains
Aspect | Fault Domain (FD) | Update Domain (UD) |
---|---|---|
Definition | A grouping of VMs across different physical hardware (racks) to protect against hardware failures | A grouping of VMs that are updated together during maintenance |
Purpose | To protect against hardware failures (e.g., server crashes, power failures) | To protect against downtime during planned updates (e.g., patches) |
Impact of Failure | If a fault domain fails (e.g., hardware failure), VMs in that domain are affected | If an update domain is being updated, VMs in that domain are affected (e.g., reboot, patch) |
Number of Domains | Up to 3 fault domains per Availability Set, depending on the region | Up to 20 update domains per Availability Set (default 5) |
Example Use Case | Spread VMs across fault domains to protect against hardware issues | Spread VMs across update domains to ensure minimal downtime during updates |
Management | Managed by Azure for hardware redundancy | Managed by Azure for software update redundancy |
Summary
Update Domains in Azure are crucial for maintaining the availability of your application during planned maintenance and platform updates.
They ensure that not all VMs in an Availability Set or VM Scale Set are updated at the same time, which protects your application from downtime caused by reboots or patching.
By configuring multiple update domains (5 by default, up to 20 per Availability Set), you can limit the impact of updates on application availability.
Always use a Load Balancer to distribute traffic across healthy VMs and monitor your VMs to manage updates effectively.