Pages

Friday, July 30, 2021

Automating Azure Site Recovery Recovery Plan Test Failover with PowerShell Script (on-premise VMs to Azure)

I’ve recently been asked by a colleague whether I had any PowerShell scripts that would automate the test failover and cleanup of Azure Site Recovery replicated VMs and my original thought was that there must be plenty of scripts available on the internet but quickly found that results from Google were either the official Microsoft documentation that goes through configuring ASR, replicate, and only failover over one VM (https://docs.microsoft.com/en-us/azure/site-recovery/azure-to-azure-powershell) or blog posts that provided bits and pieces of information and not a complete script.

Having been involved in Azure Site Recovery design, implementation and testing, I have created a PowerShell script to initiate the failover of a recovery plan and then perform the cleanup when the DR environment has been tested. This post serves to share the script that I use and I would encourage anyone who decides to use it to improve and customize the script as needed.

Environment

The environment this script will be used for will have the source as an on-premise and target in Azure’s East US region. The source environment are virtual machines hosted on VMware vSphere.

Requirements

  1. Account with appropriate permissions that will be used to connect to the tenant with the Connect-AzAccount PowerShell cmdlet
  2. Recovery Plan already configured (we’ll be initiating the Test failover on the Recovery Plan and not individual VMs).
  3. The Subscription ID containing the servers being repliated
  4. The name of the Recovery Services Vault containing the replicated VMs
  5. The Recovery Plan name that will be failed over
  6. The VNet name that will be used for the failover VMs

Script Process

  1. Connect to Azure with Connect-AzConnect
  2. Set the context to the subscription ID
  3. Initiates the Test Failover task for the recovery plan
  4. Wait until the Test Failover has completed
  5. Notify user that the Test Failover has completed
  6. Pause and prompt the user to cleanup the failover test VMs
  7. Proceed to clean up Test Failover
  8. End script

I have plans in the future to add additional improvements such as accepting a subscription ID upon execution, providing recovery plan selection for failover testing, or listing failed over VM details (I can’t seem to find a cmdlet that displays the list of VMs and its status in a specified Recovery Group).

Script Variables

$RSVaultName = <name of Recovery Services Group> - e.g. "rsv-us-eus-contoso-asr"

$ASRRecoveryPlanName = <name of Recovery Plan> - e.g. "Recover-Domain-Controllers"

$TestFailoverVNetName = <Name of VNet name in the failover site the VM is to be connected to> - e.g. "vnet-us-eus-dr"

The Script

The following is the script:

Connect-AzAccount

Set-AzContext -SubscriptionId "adae0952-xxxx-xxxx-xxxx-2b8ef42c9bbb"

$RSVaultName = "rsv-us-eus-contoso-asr"

$ASRRecoveryPlanName = "Recover-Domain-Controllers"

$TestFailoverVNetName = "vnet-us-eus-dr"

$vault = Get-AzRecoveryServicesVault -Name $RSVaultName

Set-AzRecoveryServicesAsrVaultContext -Vault $vault

$RecoveryPlan = Get-AzRecoveryServicesAsrRecoveryPlan -FriendlyName $ASRRecoveryPlanName

$TFOVnet = Get-AzVirtualNetwork -Name $TestFailoverVNetName

$TFONetwork= $TFOVnet.Id

#Start test failover of recovery plan

$Job_TFO = Start-AzRecoveryServicesAsrTestFailoverJob -RecoveryPlan $RecoveryPlan -Direction PrimaryToRecovery -AzureVMNetworkId $TFONetwork

do {

$Job_TFOState = Get-AzRecoveryServicesAsrJob -Job $Job_TFO | Select-Object State

Clear-Host

Write-Host "======== Monitoring Failover ========"

Write-Host "Status will refresh every 5 seconds."

try {

    }

catch {

Write-Host -ForegroundColor Red "ERROR - Unable to get status of Failover job"

Write-Host -ForegroundColor Red "ERROR - " + $_

        log "ERROR" "Unable to get status of Failover job"

        log "ERROR" $_

exit

    }

Write-Host "Failover status for $($Job_TFO.TargetObjectName) is $($Job_TFOState.state)"

Start-Sleep 5;

} while (($Job_TFOState.state -eq "InProgress") -or ($Job_TFOState.state -eq "NotStarted"))

if($Job_TFOState.state -eq "Failed"){

Write-host("The test failover job failed. Script terminating.")

Exit

}else {

Read-Host -Prompt "Test failover has completed. Please check ASR Portal, test VMs and press enter to perform cleanup..."

#Start test failover cleanup of recovery plan

$Job_TFOCleanup = Start-AzRecoveryServicesAsrTestFailoverCleanupJob -RecoveryPlan $RecoveryPlan -Comment "Testing Completed"

do {

$Job_TFOCleanupState = Get-AzRecoveryServicesAsrJob -Job $Job_TFOCleanup | Select-Object State

Clear-Host

Write-Host "======== Monitoring Cleanup ========"

Write-Host "Status will refresh every 5 seconds."

try {

    }

catch {

Write-Host -ForegroundColor Red "ERROR - Unable to get status of cleanup job"

Write-Host -ForegroundColor Red "ERROR - " + $_

        log "ERROR" "Unable to get status of cleanup job"

        log "ERROR" $_

exit

    }

Write-Host "Cleanup status for $($Job_TFO.TargetObjectName) is $($Job_TFOCleanupState.state)"

Start-Sleep 5;

} while (($Job_TFOCleanupState.state -eq "InProgress") -or ($Job_TFOCleanupState.state -eq "NotStarted"))

Write-Host "Test failover cleanup completed."

}

image

The following are screenshots of the PowerShell script output:

image

I hope this will help anyone out there who may be looking for a PowerShell script to automate ASR failover process.

One of the additions I wanted to add to this script was to list the Status VMs in the recovery group after the test failover has completed but I could not find a way to list the VMs that only belong to the recovery group. The cmdlets below lists all of the VMs that are protected but combing through the properties does not appear to contain any reference to what recovery plans they belong to. Please feel free to comment if you happen to know the solution.

$PrimaryFabric = Get-AzRecoveryServicesAsrFabric -FriendlyName svr-asr-01

#svr-asr-01 represents Configuration Servers

$PrimaryProtContainer = Get-AzRecoveryServicesAsrProtectionContainer -Fabric $PrimaryFabric

$ReplicationProtectedItem = Get-AzRecoveryServicesAsrReplicationProtectedItem -ProtectionContainer $PrimaryProtContainer

----------Update July 31, 2021---------

After reviewing some of my old notes, I managed to find another version of the PowerShell script that performed test failover for two plans and included steps to shutdown a VM, remove VNet peering between production and DR regions before the test failover, then recreate them afterwards. The following is a copy of the script:

Connect-AzAccount

Set-AzContext -SubscriptionId "53ea69af-xxx-xxxx-a020-xxxxea02f8b"

#Shutdown DC2

Write-Host "Shutting down DC2 VM in DR"

$DRDCName = "DC2"

$DRDCRG = "Canada-East-Prod"

Stop-AzVM -ResourceGroupName $DRDCRG -Name $DRDCName -force

#Declare variables for DR production VNet

$DRVNetName = "vnet-prod-canadaeast"

$DRVnetRG = "Canada-East-Prod"

$DRVNetPeerName = "DR-to-Prod"

$DRVNetObj = Get-AzVirtualNetwork -Name $DRVNetName

$DRVNetID = $DRVNetObj.ID

#Declare variables for Production VNet

$ProdVNetName = "Contoso-Prod-vnet"

$ProdVnetRG = "Contoso-Prod"

$ProdVNetPeerName = "Prod-to-DR"

$ProdVNetObj = Get-AzVirtualNetwork -Name $ProdVNetName

$ProdVNetID = $ProdVNetObj.ID

# Remove the DR VNet's peering to production

Write-Host "Removing VNet peering between Production and DR environment"

Remove-AzVirtualNetworkPeering -Name $DRVNetPeerName -VirtualNetworkName $DRVNetName -ResourceGroupName $DRVnetRG -force

Remove-AzVirtualNetworkPeering -Name $ProdVNetPeerName -VirtualNetworkName $ProdVNetName -ResourceGroupName $ProdVnetRG -force

#Failover Test for Domain Controller BREAZDC2

$RSVaultName = "rsv-asr-canada-east"

$ASRRecoveryPlanName = "Domain-Controller"

$TestFailoverVNetName = "vnet-prod-canadaeast"

$vault = Get-AzRecoveryServicesVault -Name $RSVaultName

Set-AzRecoveryServicesAsrVaultContext -Vault $vault

$RecoveryPlan = Get-AzRecoveryServicesAsrRecoveryPlan -FriendlyName $ASRRecoveryPlanName

$TFOVnet = Get-AzVirtualNetwork -Name $TestFailoverVNetName

$TFONetwork= $TFOVnet.Id

$Job_TFO = Start-AzRecoveryServicesAsrTestFailoverJob -RecoveryPlan $RecoveryPlan -Direction PrimaryToRecovery -AzureVMNetworkId $TFONetwork

do {

$Job_TFOState = Get-AzRecoveryServicesAsrJob -Job $Job_TFO | Select-Object State

Clear-Host

Write-Host "======== Monitoring Failover ========"

Write-Host "Status will refresh every 5 seconds."

try {

    }

catch {

Write-Host -ForegroundColor Red "ERROR - Unable to get status of Failover job"

Write-Host -ForegroundColor Red "ERROR - " + $_

        log "ERROR" "Unable to get status of Failover job"

        log "ERROR" $_

exit

    }

Write-Host "Failover status for $($Job_TFO.TargetObjectName) is $($Job_TFOState.state)"

Start-Sleep 5;

} while (($Job_TFOState.state -eq "InProgress") -or ($Job_TFOState.state -eq "NotStarted"))

if($Job_TFOState.state -eq "Failed"){

Write-host("The test failover job failed. Script terminating.")

Exit

}else {

#Failover Test for Remaining Servers

$ASRRecoveryPlanName = "DR-Servers"

$RecoveryPlan = Get-AzRecoveryServicesAsrRecoveryPlan -FriendlyName $ASRRecoveryPlanName

$Job_TFO = Start-AzRecoveryServicesAsrTestFailoverJob -RecoveryPlan $RecoveryPlan -Direction PrimaryToRecovery -AzureVMNetworkId $TFONetwork

do {

$Job_TFOState = Get-AzRecoveryServicesAsrJob -Job $Job_TFO | Select-Object State

Clear-Host

Write-Host "======== Monitoring Failover ========"

Write-Host "Status will refresh every 5 seconds."

try {

        }

catch {

Write-Host -ForegroundColor Red "ERROR - Unable to get status of Failover job"

Write-Host -ForegroundColor Red "ERROR - " + $_

            log "ERROR" "Unable to get status of Failover job"

            log "ERROR" $_

exit

        }

Write-Host "Failover status for $($Job_TFO.TargetObjectName) is $($Job_TFOState.state)"

Start-Sleep 5;

    } while (($Job_TFOState.state -eq "InProgress") -or ($Job_TFOState.state -eq "NotStarted"))

if($Job_TFOState.state -eq "Failed"){

Write-host("The test failover job failed. Script terminating.")

Exit

    }else {

Read-Host -Prompt "Test failover has completed. Please check ASR Portal, test VMs and press enter to perform cleanup..."

$Job_TFOCleanup = Start-AzRecoveryServicesAsrTestFailoverCleanupJob -RecoveryPlan $RecoveryPlan -Comment "Testing Completed"

do {

$Job_TFOCleanupState = Get-AzRecoveryServicesAsrJob -Job $Job_TFOCleanup | Select-Object State

Clear-Host

Write-Host "======== Monitoring Cleanup ========"

Write-Host "Status will refresh every 5 seconds."

try {

    }

catch {

Write-Host -ForegroundColor Red "ERROR - Unable to get status of cleanup job"

Write-Host -ForegroundColor Red "ERROR - " + $_

        log "ERROR" "Unable to get status of cleanup job"

        log "ERROR" $_

exit

    }

Write-Host "Cleanup status for $($Job_TFO.TargetObjectName) is $($Job_TFOCleanupState.state)"

Start-Sleep 5;

} while (($Job_TFOCleanupState.state -eq "InProgress") -or ($Job_TFOCleanupState.state -eq "NotStarted"))

$ASRRecoveryPlanName = "Domain-Controller"

$RecoveryPlan = Get-AzRecoveryServicesAsrRecoveryPlan -FriendlyName $ASRRecoveryPlanName

$Job_TFOCleanup = Start-AzRecoveryServicesAsrTestFailoverCleanupJob -RecoveryPlan $RecoveryPlan -Comment "Testing Completed"

do {

$Job_TFOCleanupState = Get-AzRecoveryServicesAsrJob -Job $Job_TFOCleanup | Select-Object State

Clear-Host

Write-Host "======== Monitoring Cleanup ========"

Write-Host "Status will refresh every 5 seconds."

try {

    }

catch {

Write-Host -ForegroundColor Red "ERROR - Unable to get status of cleanup job"

Write-Host -ForegroundColor Red "ERROR - " + $_

        log "ERROR" "Unable to get status of cleanup job"

        log "ERROR" $_

exit

    }

Write-Host "Cleanup status for $($ASRRecoveryPlanName) is $($Job_TFOCleanupState.state)"

Start-Sleep 5;

} while (($Job_TFOCleanupState.state -eq "InProgress") -or ($Job_TFOCleanupState.state -eq "NotStarted"))

Write-Host "Test failover cleanup completed."

}

}

#Create the DR VNet's peering to production

Write-Host "Recreating VNet peering between Production and DR environment after failover testing"

Add-AzVirtualNetworkPeering -Name $DRVNetPeerName -VirtualNetwork $DRVNetObj -RemoteVirtualNetworkId $ProdVNetID -AllowForwardedTraffic

Add-AzVirtualNetworkPeering -Name $ProdVNetPeerName -VirtualNetwork $ProdVNetObj -RemoteVirtualNetworkId $DRVNetID -AllowForwardedTraffic

#Power On DC2

Write-Host "Powering on DC2 VM in DR after testing"

Start-AzVM -ResourceGroupName $DRDCRG -Name $DRDCName

3 comments:

Anonymous said...

I just wanted to thank you for this post. I was able to adapt this script to meet our organization's needs (we split DR into multiple recovery plans so we can stand up different applications independently), and this was a great starting point for that.

Anonymous said...

This worked perfectly and was exactly what I needed. I'm automating cleanup tasks on each system that gets failed over and added this at the beginning of my script so I can run the whole process with one click.

spacemancw said...

Thanks, this has been very helpful. I was able to do a test failover and cleanup, from Vmware to Azure.

Then I tried to script an Unplanned failover (and of course failback).
Failover:
Start-AzRecoveryServicesAsrUnplannedFailoverJob -ReplicationProtectedItem $ReplicationProtectedItem -Direction PrimaryToRecovery

Commit
Start-AzRecoveryServicesAsrCommitFailoverJob -ReplicationProtectedItem $ReplicationProtectedItem

Re-protect ... I was unable to work this out, but I think it is
Update-AzRecoveryServicesAsrProtectionDirection ... something

Then Failback
Start-AzRecoveryServicesAsrPlannedFailoverJob -ReplicationProtectedItem $newReplicationProtectedItem -Confirm:$false -Direction RecoveryToPrimary

Commit Again
Start-AzRecoveryServicesAsrCommitFailoverJob -ReplicationProtectedItem $newReplicationProtectedItem

And then there is another re-protect .. I dont know this either

Do you know the proper commands for re-protect firstly after failover, and then again after failback?

thanks