A colleague recently asked whether I had any PowerShell scripts that automate the test failover and cleanup of Azure Site Recovery (ASR) replicated VMs. My first thought was that there must be plenty of such scripts on the internet, but I quickly found that the Google results were either the official Microsoft documentation, which walks through configuring ASR, replicating, and failing over only one VM (https://docs.microsoft.com/en-us/azure/site-recovery/azure-to-azure-powershell), or blog posts that provided bits and pieces of information rather than a complete script.
Having been involved in Azure Site Recovery design, implementation and testing, I have created a PowerShell script to initiate the failover of a recovery plan and then perform the cleanup when the DR environment has been tested. This post serves to share the script that I use and I would encourage anyone who decides to use it to improve and customize the script as needed.
Environment
The script is written for an environment where the source is on-premises and the target is Azure’s East US region. The source servers are virtual machines hosted on VMware vSphere.
Requirements
- Account with appropriate permissions that will be used to connect to the tenant with the Connect-AzAccount PowerShell cmdlet
- Recovery Plan already configured (we’ll be initiating the Test failover on the Recovery Plan and not individual VMs).
- The Subscription ID containing the servers being replicated
- The name of the Recovery Services Vault containing the replicated VMs
- The Recovery Plan name that will be failed over
- The VNet name that will be used for the failover VMs
Script Process
- Connect to Azure with Connect-AzAccount
- Set the context to the subscription ID
- Initiate the Test Failover task for the recovery plan
- Wait until the Test Failover has completed
- Notify the user that the Test Failover has completed
- Pause and prompt the user to clean up the failover test VMs
- Proceed to clean up Test Failover
- End script
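The "wait until completed" steps above amount to polling a job state until it leaves the NotStarted and InProgress states. The following is a minimal sketch of that idea as a reusable function. Wait-ForTerminalState and its parameters are hypothetical names that are not part of the script in this post, and the function takes a script block so it can be tried without an Azure connection:

```powershell
# Hypothetical helper (not part of the original script): polls a status
# source until it reports something other than the "still running" states.
function Wait-ForTerminalState {
    param(
        # Script block that returns the current state as a string
        [Parameter(Mandatory)]
        [scriptblock]$GetState,

        # States that mean the job has not finished yet
        [string[]]$PendingStates = @('NotStarted', 'InProgress'),

        # Seconds to wait between checks
        [int]$IntervalSeconds = 5
    )

    do {
        $state = [string](& $GetState)
        Write-Host "Current state: $state"
        $pending = $PendingStates -contains $state
        if ($pending) { Start-Sleep -Seconds $IntervalSeconds }
    } while ($pending)

    # Return the first state that is not in the pending list
    return $state
}
```

With the ASR cmdlets used later in this post, the call might look like Wait-ForTerminalState -GetState { (Get-AzRecoveryServicesAsrJob -Job $Job_TFO).State }, with the returned value then compared against "Failed".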
I plan to add further improvements in the future, such as accepting a subscription ID upon execution, offering a choice of recovery plan for failover testing, or listing the details of failed-over VMs (I can’t seem to find a cmdlet that displays the list of VMs and their status in a specified recovery plan).
Script Variables
$RSVaultName = <name of the Recovery Services Vault> - e.g. "rsv-us-eus-contoso-asr"
$ASRRecoveryPlanName = <name of the Recovery Plan> - e.g. "Recover-Domain-Controllers"
$TestFailoverVNetName = <name of the VNet in the failover site that the VMs are to be connected to> - e.g. "vnet-us-eus-dr"
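One way to accept the subscription ID and these variables at execution time, rather than hard-coding them, is a param block with basic validation. The sketch below is only an illustration: Resolve-TestFailoverSettings is a hypothetical function name, and typing the subscription ID as [guid] is my own assumption about how strict the check should be.

```powershell
# Hypothetical wrapper: the function name is illustrative and does not
# appear in the script in this post. It only validates and returns the
# values; it does not connect to Azure.
function Resolve-TestFailoverSettings {
    [CmdletBinding()]
    param(
        # Subscription containing the replicated servers.
        # The [guid] type rejects anything that is not a valid GUID.
        [Parameter(Mandatory)]
        [guid]$SubscriptionId,

        [Parameter(Mandatory)]
        [ValidateNotNullOrEmpty()]
        [string]$RSVaultName,

        [Parameter(Mandatory)]
        [ValidateNotNullOrEmpty()]
        [string]$ASRRecoveryPlanName,

        [Parameter(Mandatory)]
        [ValidateNotNullOrEmpty()]
        [string]$TestFailoverVNetName
    )

    # Return the validated values as a single object
    [pscustomobject]@{
        SubscriptionId       = $SubscriptionId.ToString()
        RSVaultName          = $RSVaultName
        ASRRecoveryPlanName  = $ASRRecoveryPlanName
        TestFailoverVNetName = $TestFailoverVNetName
    }
}
```

The returned object could then feed the existing cmdlets, for example Set-AzContext -SubscriptionId $settings.SubscriptionId.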
The Script
The following is the script:
Connect-AzAccount
Set-AzContext -SubscriptionId "adae0952-xxxx-xxxx-xxxx-2b8ef42c9bbb"
$RSVaultName = "rsv-us-eus-contoso-asr"
$ASRRecoveryPlanName = "Recover-Domain-Controllers"
$TestFailoverVNetName = "vnet-us-eus-dr"
$vault = Get-AzRecoveryServicesVault -Name $RSVaultName
Set-AzRecoveryServicesAsrVaultContext -Vault $vault
$RecoveryPlan = Get-AzRecoveryServicesAsrRecoveryPlan -FriendlyName $ASRRecoveryPlanName
$TFOVnet = Get-AzVirtualNetwork -Name $TestFailoverVNetName
$TFONetwork= $TFOVnet.Id
#Start test failover of recovery plan
$Job_TFO = Start-AzRecoveryServicesAsrTestFailoverJob -RecoveryPlan $RecoveryPlan -Direction PrimaryToRecovery -AzureVMNetworkId $TFONetwork
do {
    Clear-Host
    Write-Host "======== Monitoring Failover ========"
    Write-Host "Status will refresh every 5 seconds."
    try {
        # Query the job inside the try block so that a failed lookup is caught
        $Job_TFOState = Get-AzRecoveryServicesAsrJob -Job $Job_TFO -ErrorAction Stop | Select-Object State
    }
    catch {
        Write-Host -ForegroundColor Red "ERROR - Unable to get status of Failover job"
        Write-Host -ForegroundColor Red "ERROR - $_"
        exit
    }
    Write-Host "Failover status for $($Job_TFO.TargetObjectName) is $($Job_TFOState.State)"
    Start-Sleep -Seconds 5
} while (($Job_TFOState.State -eq "InProgress") -or ($Job_TFOState.State -eq "NotStarted"))
if($Job_TFOState.state -eq "Failed"){
Write-host("The test failover job failed. Script terminating.")
Exit
}else {
Read-Host -Prompt "Test failover has completed. Please check ASR Portal, test VMs and press enter to perform cleanup..."
#Start test failover cleanup of recovery plan
$Job_TFOCleanup = Start-AzRecoveryServicesAsrTestFailoverCleanupJob -RecoveryPlan $RecoveryPlan -Comment "Testing Completed"
do {
    Clear-Host
    Write-Host "======== Monitoring Cleanup ========"
    Write-Host "Status will refresh every 5 seconds."
    try {
        # Query the job inside the try block so that a failed lookup is caught
        $Job_TFOCleanupState = Get-AzRecoveryServicesAsrJob -Job $Job_TFOCleanup -ErrorAction Stop | Select-Object State
    }
    catch {
        Write-Host -ForegroundColor Red "ERROR - Unable to get status of cleanup job"
        Write-Host -ForegroundColor Red "ERROR - $_"
        exit
    }
    Write-Host "Cleanup status for $($Job_TFO.TargetObjectName) is $($Job_TFOCleanupState.State)"
    Start-Sleep -Seconds 5
} while (($Job_TFOCleanupState.State -eq "InProgress") -or ($Job_TFOCleanupState.State -eq "NotStarted"))
Write-Host "Test failover cleanup completed."
}
The following are screenshots of the PowerShell script output:
I hope this helps anyone out there who is looking for a PowerShell script to automate the ASR test failover process.
One addition I wanted to make to this script was to list the status of the VMs in the recovery plan after the test failover has completed, but I could not find a way to list only the VMs that belong to the recovery plan. The cmdlets below list all of the protected VMs, but their properties do not appear to contain any reference to the recovery plans they belong to. Please feel free to comment if you happen to know the solution.
#svr-asr-01 is the friendly name of the Configuration Server
$PrimaryFabric = Get-AzRecoveryServicesAsrFabric -FriendlyName svr-asr-01
$PrimaryProtContainer = Get-AzRecoveryServicesAsrProtectionContainer -Fabric $PrimaryFabric
$ReplicationProtectedItem = Get-AzRecoveryServicesAsrReplicationProtectedItem -ProtectionContainer $PrimaryProtContainer
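One possible approach, which I have not verified against a live vault, is to work from the recovery plan object rather than the protected items: the sketch below assumes the plan exposes a Groups collection whose entries carry a ReplicationProtectedItems list, and that items can be matched on an ID property. Those property names are assumptions, and Select-RecoveryPlanItem is a hypothetical helper name.

```powershell
# Hypothetical helper: returns the protected items whose ID appears in any
# group of the given recovery plan. The Groups, ReplicationProtectedItems
# and ID property names are assumptions about the ASR objects.
function Select-RecoveryPlanItem {
    param(
        [Parameter(Mandatory)]
        $RecoveryPlan,

        [Parameter(Mandatory)]
        [object[]]$ProtectedItem
    )

    # Collect the IDs referenced by every group in the plan
    $planItemIds = @(
        foreach ($group in $RecoveryPlan.Groups) {
            foreach ($item in $group.ReplicationProtectedItems) { $item.ID }
        }
    )

    # Keep only the protected items that the plan references
    $ProtectedItem | Where-Object { $planItemIds -contains $_.ID }
}
```

If the assumptions hold, calling Select-RecoveryPlanItem -RecoveryPlan $RecoveryPlan -ProtectedItem $ReplicationProtectedItem would return only the plan's VMs, which could then be piped to Select-Object to show whichever status properties the objects expose.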
----------Update July 31, 2021---------
After reviewing some of my old notes, I managed to find another version of the PowerShell script that performs a test failover for two recovery plans. It also shuts down a VM and removes the VNet peering between the production and DR regions before the test failover, then recreates the peering and powers the VM back on afterwards. The following is a copy of the script:
Connect-AzAccount
Set-AzContext -SubscriptionId "53ea69af-xxx-xxxx-a020-xxxxea02f8b"
#Shutdown DC2
Write-Host "Shutting down DC2 VM in DR"
$DRDCName = "DC2"
$DRDCRG = "Canada-East-Prod"
Stop-AzVM -ResourceGroupName $DRDCRG -Name $DRDCName -force
#Declare variables for DR production VNet
$DRVNetName = "vnet-prod-canadaeast"
$DRVnetRG = "Canada-East-Prod"
$DRVNetPeerName = "DR-to-Prod"
$DRVNetObj = Get-AzVirtualNetwork -Name $DRVNetName -ResourceGroupName $DRVnetRG
$DRVNetID = $DRVNetObj.ID
#Declare variables for Production VNet
$ProdVNetName = "Contoso-Prod-vnet"
$ProdVnetRG = "Contoso-Prod"
$ProdVNetPeerName = "Prod-to-DR"
$ProdVNetObj = Get-AzVirtualNetwork -Name $ProdVNetName -ResourceGroupName $ProdVnetRG
$ProdVNetID = $ProdVNetObj.ID
# Remove the DR VNet's peering to production
Write-Host "Removing VNet peering between Production and DR environment"
Remove-AzVirtualNetworkPeering -Name $DRVNetPeerName -VirtualNetworkName $DRVNetName -ResourceGroupName $DRVnetRG -force
Remove-AzVirtualNetworkPeering -Name $ProdVNetPeerName -VirtualNetworkName $ProdVNetName -ResourceGroupName $ProdVnetRG -force
#Failover Test for Domain Controller BREAZDC2
$RSVaultName = "rsv-asr-canada-east"
$ASRRecoveryPlanName = "Domain-Controller"
$TestFailoverVNetName = "vnet-prod-canadaeast"
$vault = Get-AzRecoveryServicesVault -Name $RSVaultName
Set-AzRecoveryServicesAsrVaultContext -Vault $vault
$RecoveryPlan = Get-AzRecoveryServicesAsrRecoveryPlan -FriendlyName $ASRRecoveryPlanName
$TFOVnet = Get-AzVirtualNetwork -Name $TestFailoverVNetName
$TFONetwork= $TFOVnet.Id
$Job_TFO = Start-AzRecoveryServicesAsrTestFailoverJob -RecoveryPlan $RecoveryPlan -Direction PrimaryToRecovery -AzureVMNetworkId $TFONetwork
do {
    Clear-Host
    Write-Host "======== Monitoring Failover ========"
    Write-Host "Status will refresh every 5 seconds."
    try {
        # Query the job inside the try block so that a failed lookup is caught
        $Job_TFOState = Get-AzRecoveryServicesAsrJob -Job $Job_TFO -ErrorAction Stop | Select-Object State
    }
    catch {
        Write-Host -ForegroundColor Red "ERROR - Unable to get status of Failover job"
        Write-Host -ForegroundColor Red "ERROR - $_"
        exit
    }
    Write-Host "Failover status for $($Job_TFO.TargetObjectName) is $($Job_TFOState.State)"
    Start-Sleep -Seconds 5
} while (($Job_TFOState.State -eq "InProgress") -or ($Job_TFOState.State -eq "NotStarted"))
if($Job_TFOState.state -eq "Failed"){
Write-host("The test failover job failed. Script terminating.")
Exit
}else {
#Failover Test for Remaining Servers
$ASRRecoveryPlanName = "DR-Servers"
$RecoveryPlan = Get-AzRecoveryServicesAsrRecoveryPlan -FriendlyName $ASRRecoveryPlanName
$Job_TFO = Start-AzRecoveryServicesAsrTestFailoverJob -RecoveryPlan $RecoveryPlan -Direction PrimaryToRecovery -AzureVMNetworkId $TFONetwork
do {
    Clear-Host
    Write-Host "======== Monitoring Failover ========"
    Write-Host "Status will refresh every 5 seconds."
    try {
        # Query the job inside the try block so that a failed lookup is caught
        $Job_TFOState = Get-AzRecoveryServicesAsrJob -Job $Job_TFO -ErrorAction Stop | Select-Object State
    }
    catch {
        Write-Host -ForegroundColor Red "ERROR - Unable to get status of Failover job"
        Write-Host -ForegroundColor Red "ERROR - $_"
        exit
    }
    Write-Host "Failover status for $($Job_TFO.TargetObjectName) is $($Job_TFOState.State)"
    Start-Sleep -Seconds 5
} while (($Job_TFOState.State -eq "InProgress") -or ($Job_TFOState.State -eq "NotStarted"))
if($Job_TFOState.state -eq "Failed"){
Write-host("The test failover job failed. Script terminating.")
Exit
}else {
Read-Host -Prompt "Test failover has completed. Please check ASR Portal, test VMs and press enter to perform cleanup..."
$Job_TFOCleanup = Start-AzRecoveryServicesAsrTestFailoverCleanupJob -RecoveryPlan $RecoveryPlan -Comment "Testing Completed"
do {
    Clear-Host
    Write-Host "======== Monitoring Cleanup ========"
    Write-Host "Status will refresh every 5 seconds."
    try {
        # Query the job inside the try block so that a failed lookup is caught
        $Job_TFOCleanupState = Get-AzRecoveryServicesAsrJob -Job $Job_TFOCleanup -ErrorAction Stop | Select-Object State
    }
    catch {
        Write-Host -ForegroundColor Red "ERROR - Unable to get status of cleanup job"
        Write-Host -ForegroundColor Red "ERROR - $_"
        exit
    }
    Write-Host "Cleanup status for $($Job_TFO.TargetObjectName) is $($Job_TFOCleanupState.State)"
    Start-Sleep -Seconds 5
} while (($Job_TFOCleanupState.State -eq "InProgress") -or ($Job_TFOCleanupState.State -eq "NotStarted"))
$ASRRecoveryPlanName = "Domain-Controller"
$RecoveryPlan = Get-AzRecoveryServicesAsrRecoveryPlan -FriendlyName $ASRRecoveryPlanName
$Job_TFOCleanup = Start-AzRecoveryServicesAsrTestFailoverCleanupJob -RecoveryPlan $RecoveryPlan -Comment "Testing Completed"
do {
    Clear-Host
    Write-Host "======== Monitoring Cleanup ========"
    Write-Host "Status will refresh every 5 seconds."
    try {
        # Query the job inside the try block so that a failed lookup is caught
        $Job_TFOCleanupState = Get-AzRecoveryServicesAsrJob -Job $Job_TFOCleanup -ErrorAction Stop | Select-Object State
    }
    catch {
        Write-Host -ForegroundColor Red "ERROR - Unable to get status of cleanup job"
        Write-Host -ForegroundColor Red "ERROR - $_"
        exit
    }
    Write-Host "Cleanup status for $($ASRRecoveryPlanName) is $($Job_TFOCleanupState.State)"
    Start-Sleep -Seconds 5
} while (($Job_TFOCleanupState.State -eq "InProgress") -or ($Job_TFOCleanupState.State -eq "NotStarted"))
Write-Host "Test failover cleanup completed."
}
}
#Create the DR VNet's peering to production
Write-Host "Recreating VNet peering between Production and DR environment after failover testing"
Add-AzVirtualNetworkPeering -Name $DRVNetPeerName -VirtualNetwork $DRVNetObj -RemoteVirtualNetworkId $ProdVNetID -AllowForwardedTraffic
Add-AzVirtualNetworkPeering -Name $ProdVNetPeerName -VirtualNetwork $ProdVNetObj -RemoteVirtualNetworkId $DRVNetID -AllowForwardedTraffic
#Power On DC2
Write-Host "Powering on DC2 VM in DR after testing"
Start-AzVM -ResourceGroupName $DRDCRG -Name $DRDCName
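One thing to be aware of with the script above is that the Exit calls on a failed test failover end the script before the VNet peering is recreated and DC2 is powered back on. A possible way to guard against that is to wrap the steps in try/finally, sketched below. Invoke-WithRestore is a hypothetical helper that is not part of my script, and the sketch assumes the failure path uses throw rather than Exit:

```powershell
# Hypothetical helper: runs the preparation, test and restore steps in
# order, and always runs the restore step even if the test step throws.
function Invoke-WithRestore {
    param(
        # e.g. shut down DC2 and remove the VNet peering
        [Parameter(Mandatory)][scriptblock]$Prepare,

        # e.g. run the test failovers and their cleanup
        [Parameter(Mandatory)][scriptblock]$Test,

        # e.g. recreate the VNet peering and power on DC2
        [Parameter(Mandatory)][scriptblock]$Restore
    )

    & $Prepare
    try {
        & $Test
    }
    finally {
        # Runs whether the test step succeeded or threw an error
        & $Restore
    }
}
```

The error thrown by the test step still surfaces to the caller after the restore step has run, so a failed test failover remains visible.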