Monday, February 28, 2022

Using an Automation Account to monitor a VM's Windows service and a Runbook to use Invoke-AzVMRunCommand to restart a stopped service

In my previous post:

Using Azure Change Tracking and Inventory to monitor Windows Services
http://terenceluk.blogspot.com/2022/02/using-azure-change-tracking-and.html

I demonstrated how to set up Change Tracking and Inventory in Azure Automation to monitor Windows services in a virtual machine and alert when a service was no longer in a running state. With monitoring and alerting in place, the next step is to incorporate automation so that issues can be remediated immediately without manual intervention. Azure provides multiple methods for automation. For this post, I will demonstrate how to achieve this with an Automation Account runbook that uses the Invoke-AzVMRunCommand cmdlet to restart the stopped service and is triggered through an alert's action group.

To avoid repeating the content that demonstrates how to set up monitoring for a virtual machine’s Windows service, let’s assume that we’ve gone through the same steps as in the previous post linked above.

Monitoring and alerting have already been set up, so what remains is to incorporate automation as shown in the following sections.

PowerShell cmdlet Invoke-AzVMRunCommand

The PowerShell cmdlet Invoke-AzVMRunCommand runs PowerShell scripts or commands remotely on an Azure virtual machine, and it is the method we’ll use to remotely restart a stopped Windows service from an Automation runbook. For more information about this cmdlet, see the following documentation:

Invoke-AzVMRunCommand
https://docs.microsoft.com/en-us/powershell/module/az.compute/invoke-azvmruncommand?view=azps-7.2.0

Run scripts in your Windows VM by using action Run Commands
https://docs.microsoft.com/en-us/azure/virtual-machines/windows/run-command

PowerShell script that uses the Invoke-AzVMRunCommand cmdlet to restart a virtual machine’s Windows service

Next, we’ll incorporate the cmdlet that allows us to restart a virtual machine’s Windows service into the following script:

# Uses Invoke-AzVMRunCommand to run Start-Service on the target VM, passing the Windows service's display name
Connect-AzAccount

$resourceGroupName = "yourResourceGroupName"
$vmName = "ServerName"

# Command that will run inside the VM
$scriptToRun = "Start-Service -DisplayName 'Remote Registry'"

# -ScriptPath expects a file, so write the command to a temporary script
Out-File -InputObject $scriptToRun -FilePath ScriptToRun.ps1

Invoke-AzVMRunCommand -ResourceGroupName $resourceGroupName -Name $vmName -CommandId 'RunPowerShellScript' -ScriptPath ScriptToRun.ps1

# Clean up the temporary script
Remove-Item -Path ScriptToRun.ps1

The identity that runs this script needs the Virtual Machine Contributor role to execute the Invoke-AzVMRunCommand cmdlet. That role will be granted to the Managed Identity of the Automation Account (configured a bit later), so the final script has the following lines inserted to execute Connect-AzAccount with the Managed Identity:

# Ensures you do not inherit an AzContext in your runbook
Disable-AzContextAutosave -Scope Process

# Connect to Azure with system-assigned managed identity
$AzureContext = (Connect-AzAccount -Identity).Context

# Set and store context
$AzureContext = Set-AzContext -SubscriptionName $AzureContext.Subscription -DefaultProfile $AzureContext

# Uses Invoke-AzVMRunCommand to run Start-Service on the target VM, passing the Windows service's display name
$resourceGroupName = "yourResourceGroupName"
$vmName = "ServerName"

# Command that will run inside the VM
$scriptToRun = "Start-Service -DisplayName 'Remote Registry'"

# -ScriptPath expects a file, so write the command to a temporary script
Out-File -InputObject $scriptToRun -FilePath ScriptToRun.ps1

Invoke-AzVMRunCommand -ResourceGroupName $resourceGroupName -Name $vmName -CommandId 'RunPowerShellScript' -ScriptPath ScriptToRun.ps1

# Clean up the temporary script
Remove-Item -Path ScriptToRun.ps1
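The script above hard-codes one VM and one service. As a rough sketch of how the same runbook might be generalized, the version below takes those values as parameters. The parameter names ($ResourceGroupName, $VMName, $ServiceDisplayName), their default values, and the temp-file naming are illustrative assumptions, not part of the original runbook, and I have not tested this variant against an alert-triggered run.

```powershell
# Sketch only: parameter names and defaults are illustrative placeholders.
param(
    [string] $ResourceGroupName  = "yourResourceGroupName",
    [string] $VMName             = "ServerName",
    [string] $ServiceDisplayName = "Remote Registry"
)

# Ensure the runbook does not inherit an AzContext
Disable-AzContextAutosave -Scope Process | Out-Null

# Connect with the Automation Account's system-assigned managed identity
$AzureContext = (Connect-AzAccount -Identity).Context
$AzureContext = Set-AzContext -SubscriptionName $AzureContext.Subscription -DefaultProfile $AzureContext

# Write the command to a uniquely named temp file so concurrent jobs do not collide
$scriptPath = Join-Path $env:TEMP ("StartService-{0}.ps1" -f [guid]::NewGuid())
"Start-Service -DisplayName '$ServiceDisplayName'" | Out-File -FilePath $scriptPath

try {
    $result = Invoke-AzVMRunCommand -ResourceGroupName $ResourceGroupName -Name $VMName `
        -CommandId 'RunPowerShellScript' -ScriptPath $scriptPath -DefaultProfile $AzureContext

    # Surface the messages returned from the VM in the job output
    $result.Value | ForEach-Object { Write-Output $_.Message }
}
finally {
    # Remove the temp file even if the run command fails
    Remove-Item -Path $scriptPath -ErrorAction SilentlyContinue
}
```

Note that a display name containing a single quote would break the generated command, so this sketch assumes simple service names such as 'Remote Registry'.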

Automation Account Runbook

With the PowerShell script for restarting a Windows service prepared, proceed to create a runbook that will execute the script when an alert fires. Navigate to the Automation Account > Runbooks and click Create a runbook:

image

The options available for the Runbook type are as follows:

  1. PowerShell
  2. Python
  3. PowerShell Workflow
  4. Graphical PowerShell
  5. Graphical PowerShell Workflow
image

This example will use a PowerShell script so we’ll select PowerShell with the runtime version as 5.1 and then create the runbook:

image

With the runbook created, navigate into the newly created runbook and click on the Edit button:

image

Insert the prepared script into the Runbook:

image

Click Save and then Publish to publish the PowerShell Runbook. Note that we won’t be testing this just yet because the managed identity has not been configured, so the runbook does not have the permissions required to use Invoke-AzVMRunCommand to start a Windows service on the VM.

Managed Identity

With the runbook created, we’ll need to configure a managed identity for the Automation Account so that it can run the PowerShell script. More information about managed identities can be found in the following documentation:

Using a system-assigned managed identity for an Azure Automation account
https://docs.microsoft.com/en-us/azure/automation/enable-managed-identity-for-automation

Navigate to the Automation Account > Identity > System assigned and switch the Status to On to enable a system assigned managed identity:

image

Next, click on Azure role assignments and add a role assignment that grants the Virtual Machine Contributor role to the managed identity:

image

The Automation Account now has the assigned role for the PowerShell script to execute with.
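For those who prefer scripting over the portal, the same two steps can be sketched with the Az PowerShell module. This is a minimal example under a few assumptions: the resource names are placeholders, the VM lives in the same resource group as the Automation Account, and the session is already signed in (Connect-AzAccount) with an account that is allowed to create role assignments.

```powershell
# Sketch only: resource names below are placeholders.
$resourceGroupName     = "yourResourceGroupName"
$automationAccountName = "yourAutomationAccountName"

# Enable the system-assigned managed identity on the Automation Account
Set-AzAutomationAccount -ResourceGroupName $resourceGroupName `
    -Name $automationAccountName -AssignSystemIdentity

# Look up the principal (object) ID of the managed identity
$principalId = (Get-AzAutomationAccount -ResourceGroupName $resourceGroupName `
    -Name $automationAccountName).Identity.PrincipalId

# Grant Virtual Machine Contributor, scoped to the resource group that holds the VM
New-AzRoleAssignment -ObjectId $principalId `
    -ResourceGroupName $resourceGroupName `
    -RoleDefinitionName "Virtual Machine Contributor"
```

Scoping the assignment to a resource group (or to the single VM) rather than the whole subscription keeps the identity's permissions narrower, which seems a reasonable default for a runbook that only restarts a service.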

Test an Automation Account Runbook

With the managed identity configured, we can now proceed to test the PowerShell script and verify that it indeed starts the Windows service. Navigate into the runbook and click on the Edit button:

image

Click on the Test pane button:

image

Click on the Start button to execute the PowerShell script using the managed identity:

image

The runbook will now execute the script:

Queued..

Streams will display when the test completes.

image

Wait for the test to complete and verify that the output indicates it succeeded without errors:

image

Proceed to verify that the Windows service on the VM has restarted:

image
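If logging on to the VM is inconvenient, the service state can also be checked remotely with the same cmdlet. The following is a small sketch, assuming an interactive session that is already signed in with Connect-AzAccount; the resource group and VM names are the same placeholders used earlier, and the temp-file name is arbitrary.

```powershell
# Sketch only: names are placeholders matching the earlier script.
$resourceGroupName = "yourResourceGroupName"
$vmName            = "ServerName"

# Write a one-line script that reports the service status
$checkScript = Join-Path ([System.IO.Path]::GetTempPath()) "CheckService.ps1"
"Get-Service -DisplayName 'Remote Registry' | Select-Object -ExpandProperty Status" |
    Out-File -FilePath $checkScript

# Run it on the VM and capture the result
$result = Invoke-AzVMRunCommand -ResourceGroupName $resourceGroupName -Name $vmName `
    -CommandId 'RunPowerShellScript' -ScriptPath $checkScript

# For RunPowerShellScript, the first entry normally carries standard output
$result.Value[0].Message

Remove-Item -Path $checkScript
```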

Create an Alert to monitor for service down and create Action Group with Automation Account Runbook

With the Automation Account Runbook tested, let’s proceed to create an alert to detect when the Remote Registry service (or any service of your choice) has stopped. Navigate to the Log Analytics workspace that was created to monitor the service, click on Create > Alert rule:

image

Select Custom Log Search to provide a custom Kusto query:

image

Many posts, including a previous one I wrote, simply use the following query to look for when a service has stopped:

ConfigurationData
| where SvcName =~ "RemoteRegistry"
| project SvcName, SvcDisplayName, SvcState, TimeGenerated
| where SvcState != "Running"

I find that the issue with this query is that it returns every record in which the service was not running within the specified period. This means that if the service was restarted and is running again, the query still returns the earlier stopped records and the alert continues to fire. After giving it a bit of thought, what I wanted to do was run a query to get the last time the service was running and the last time it was stopped, then compare the two TimeGenerated values. Given my lack of experience with Kusto queries, I could not figure out how to compare the times directly, so I decided to capture the two queries in variables, join them together, then compare the timestamps as shown below:

let LastStopped =
    ConfigurationData
    | where SvcName =~ "RemoteRegistry"
        and SvcState != "Running"
    | project SvcName, SvcDisplayName, SvcState, TimeGenerated
    | order by TimeGenerated desc
    | limit 1;
let LastRunning =
    ConfigurationData
    | where SvcName =~ "RemoteRegistry"
        and SvcState == "Running"
    | project SvcName, SvcDisplayName, SvcState, TimeGenerated
    | order by TimeGenerated desc
    | limit 1;
LastRunning
| join LastStopped on SvcName
// LastRunning time is earlier than LastStopped time
| where TimeGenerated <= TimeGenerated1

The intention of this query is to return a result only if the most recent record shows the service as not running, and to return nothing if the most recent record shows it as running. I’m completely open to recommendations if anyone happens to read this and thinks there is a better way of doing it.
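One possible simplification, offered as a sketch rather than a tested replacement: Kusto's arg_max() aggregation can pick the most recent record per service in a single pass, after which a filter on SvcState leaves a row only when the latest state is not Running. The PowerShell below runs such a query against the workspace with Invoke-AzOperationalInsightsQuery so the result can be inspected before it is used in an alert rule. The workspace ID is a placeholder, the one-day time span is arbitrary, and grouping by the Computer column is my assumption about how to keep multiple VMs separate.

```powershell
# Sketch only: replace with the Workspace ID shown on the Log Analytics workspace overview.
$workspaceId = "00000000-0000-0000-0000-000000000000"

# arg_max keeps the newest record per computer and service;
# the final filter returns a row only if that newest record is not Running
$query = @'
ConfigurationData
| where SvcName =~ "RemoteRegistry"
| summarize arg_max(TimeGenerated, SvcDisplayName, SvcState) by Computer, SvcName
| where SvcState != "Running"
'@

# Assumes an existing signed-in session (Connect-AzAccount) with read access to the workspace
$response = Invoke-AzOperationalInsightsQuery -WorkspaceId $workspaceId `
    -Query $query -Timespan (New-TimeSpan -Days 1)

# An empty result means the latest recorded state is Running
$response.Results | Format-Table
```

Unlike the join approach, this form would also return a row when the service has never been recorded as Running within the time span, which may or may not be the behavior you want for alerting.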

image

With the query in place, proceed to create the Alert by clicking on Continue Editing Alert to use the query, leave the rest of the conditions as default and click on Actions:

image

Create a new action group by clicking on Create action group:

image

Provide a name for this action group and click on Notifications:

image

Configure the notification settings and then click on Actions:

image

Select Automation Runbook for the Action type:

image

Select the runbook that was created earlier and click OK:

image

Proceed to create the Action Group:

image

image

The newly created Action group should automatically be added to the Alert:

image

Fill in the Details tab for the alert:

image

Complete creating the Alert:

image

Test Alert and Automation Account Runbook

Proceed to test and confirm that an alert is fired when the service is down and the runbook has successfully executed to restart the service:

image

image

I hope this blog post provides useful information on how to set up an Automation Account to monitor a virtual machine’s Windows service, along with a runbook that executes a PowerShell script using the Invoke-AzVMRunCommand cmdlet to restart a stopped service.
