In my previous post:
Using Azure Change Tracking and Inventory to monitor Windows Services
http://terenceluk.blogspot.com/2022/02/using-azure-change-tracking-and.html
I demonstrated how to set up Change Tracking and Inventory in Azure Automation to monitor Windows services in a virtual machine and alert when the service was no longer in a running state. With monitoring and alerting in place, the next step is to incorporate automation so that issues can be immediately remediated without requiring manual intervention. Azure provides multiple methods for automation and for this post, I will demonstrate how to achieve this with the following:
- The PowerShell cmdlet Invoke-AzVMRunCommand (https://docs.microsoft.com/en-us/powershell/module/az.compute/invoke-azvmruncommand?view=azps-7.2.0)
- PowerShell script that uses the Invoke-AzVMRunCommand cmdlet to restart a virtual machine’s Windows service
- Automation Account Runbook
- Managed Identity (https://docs.microsoft.com/en-us/azure/automation/enable-managed-identity-for-automation)
- Test an Automation Account Runbook
- Create an Alert to monitor for service down and create Action Group with Automation Account Runbook
- Test Alert and Automation Account Runbook
To avoid recreating the same content that demonstrates how to set up monitoring for a virtual machine’s Windows service, let’s assume that we’ve gone through the same steps as we did in my previous post:
Using Azure Change Tracking and Inventory to monitor Windows Services
http://terenceluk.blogspot.com/2022/02/using-azure-change-tracking-and.html
Monitoring and alerting have already been set up, and what remains is to incorporate automation as shown in the following sections.
PowerShell cmdlet Invoke-AzVMRunCommand
The PowerShell cmdlet Invoke-AzVMRunCommand runs PowerShell scripts or commands remotely on an Azure Virtual Machine, and it is the method we’ll use to remotely restart a stopped Windows service from an Automation runbook. For more information about this cmdlet, see the following documentation:
Invoke-AzVMRunCommand
https://docs.microsoft.com/en-us/powershell/module/az.compute/invoke-azvmruncommand?view=azps-7.2.0
Run scripts in your Windows VM by using action Run Commands
https://docs.microsoft.com/en-us/azure/virtual-machines/windows/run-command
PowerShell script that uses the Invoke-AzVMRunCommand cmdlet to restart a virtual machine’s Windows service
Next, we’ll incorporate the cmdlet that allows us to restart a virtual machine’s Windows service into the following script:
# Uses Invoke-AzVMRunCommand to run Start-Service on the target VM, with the Windows service's display name passed directly in the script text
Connect-AzAccount
$resourceGroupName = "yourResourceGroupName"
$vmName = "ServerName"
# Script that will run inside the VM
$scriptToRun = "Start-Service -DisplayName 'Remote Registry'"
# Invoke-AzVMRunCommand reads the script from a local file, so write it out first
Out-File -InputObject $scriptToRun -FilePath ScriptToRun.ps1
Invoke-AzVMRunCommand -ResourceGroupName $resourceGroupName -Name $vmName -CommandId 'RunPowerShellScript' -ScriptPath ScriptToRun.ps1
# Remove the temporary script file
Remove-Item -Path ScriptToRun.ps1
The identity that runs this script needs the Virtual Machine Contributor role to execute the Invoke-AzVMRunCommand cmdlet. In the runbook, that role will be provided through the Managed Identity of the Automation Account (configured a bit later), so the final script has the following lines inserted to execute Connect-AzAccount with the Managed Identity:
# Ensures you do not inherit an AzContext in your runbook
Disable-AzContextAutosave -Scope Process
# Connect to Azure with system-assigned managed identity
$AzureContext = (Connect-AzAccount -Identity).context
# Set and store the context
$AzureContext = Set-AzContext -SubscriptionName $AzureContext.Subscription -DefaultProfile $AzureContext
# Uses Invoke-AzVMRunCommand to run Start-Service on the target VM, with the Windows service's display name passed directly in the script text
$resourceGroupName = "yourResourceGroupName"
$vmName = "ServerName"
$scriptToRun = "Start-Service -DisplayName 'Remote Registry'"
Out-File -InputObject $scriptToRun -FilePath ScriptToRun.ps1
Invoke-AzVMRunCommand -ResourceGroupName $resourceGroupName -Name $vmName -CommandId 'RunPowerShellScript' -ScriptPath ScriptToRun.ps1
Remove-Item -Path ScriptToRun.ps1
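If the script above needs to be reused for more than one service or VM, it could be wrapped in functions along the lines of the sketch below. This is only a sketch under assumptions: the function names New-StartServiceScript and Start-VmService are hypothetical and not part of the Az module, the temporary file location is my own choice, and it assumes the run command result exposes its output through the Message property of each entry in Value.

```powershell
# Hypothetical helper: builds the one-line script that runs inside the VM.
function New-StartServiceScript {
    param([Parameter(Mandatory)][string]$ServiceDisplayName)
    # Double any single quotes so the name stays a valid single-quoted literal.
    $escaped = $ServiceDisplayName -replace "'", "''"
    "Start-Service -DisplayName '$escaped'"
}

# Hypothetical wrapper around the same steps as the script above.
function Start-VmService {
    param(
        [Parameter(Mandatory)][string]$ResourceGroupName,
        [Parameter(Mandatory)][string]$VmName,
        [Parameter(Mandatory)][string]$ServiceDisplayName
    )
    # Write the script to a uniquely named temp file instead of the working directory.
    $fileName = "StartService-{0}.ps1" -f [guid]::NewGuid()
    $scriptPath = Join-Path ([System.IO.Path]::GetTempPath()) $fileName
    New-StartServiceScript -ServiceDisplayName $ServiceDisplayName |
        Out-File -FilePath $scriptPath
    try {
        $result = Invoke-AzVMRunCommand -ResourceGroupName $ResourceGroupName `
            -VMName $VmName -CommandId 'RunPowerShellScript' `
            -ScriptPath $scriptPath -ErrorAction Stop
        # Emit whatever the VM returned so it appears in the runbook job output.
        $result.Value | ForEach-Object { Write-Output $_.Message }
    }
    finally {
        # Clean up the temp file even if the run command throws.
        Remove-Item -Path $scriptPath -ErrorAction SilentlyContinue
    }
}
```

After connecting with the managed identity, a call such as Start-VmService -ResourceGroupName "yourResourceGroupName" -VmName "ServerName" -ServiceDisplayName "Remote Registry" would be equivalent to the hard-coded version, with the added benefit that the temporary file is removed even when the run command fails.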
Automation Account Runbook
With the PowerShell script for restarting a Windows service prepared, proceed to create a runbook that will execute the script when an alert fires. Navigate to the Automation Account > Runbooks and click Create a runbook:
The options available for the Runbook type are as follows:
- PowerShell
- Python
- PowerShell Workflow
- Graphical PowerShell
- Graphical PowerShell Workflow
This example will use a PowerShell script so we’ll select PowerShell with the runtime version as 5.1 and then create the runbook:
With the runbook created, navigate into the newly created runbook and click on the Edit button:
Insert the prepared script into the Runbook:
Proceed to click Save and then Publish to publish the PowerShell Runbook. Note that we won’t be testing this just yet because the managed identity hasn’t been configured, so the runbook does not have the permissions it needs to use the Invoke-AzVMRunCommand cmdlet to start a Windows service on the VM.
Managed Identity
With the runbook created, we’ll need to configure a managed identity for the Automation Account to run the PowerShell script. More information about managed identities can be found in the following documentation:
Using a system-assigned managed identity for an Azure Automation account
https://docs.microsoft.com/en-us/azure/automation/enable-managed-identity-for-automation
Navigate to the Automation Account > Identity > System assigned and switch the Status to On to enable a system assigned managed identity:
Next, click on Azure role assignments and add a role assignment for the Virtual Machine Contributor role:
The Automation Account now has the assigned role for the PowerShell script to execute with.
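The same role assignment can also be scripted instead of clicked through in the portal. The sketch below is one possible way to do it and rests on a few assumptions: the function names are hypothetical, the role is scoped to the VM's resource group rather than the whole subscription (my choice, not something the portal steps above require), and the account running it has permission to create role assignments.

```powershell
# Hypothetical helper: builds a resource-group scope string for the role assignment.
function Get-ResourceGroupScope {
    param(
        [Parameter(Mandatory)][string]$SubscriptionId,
        [Parameter(Mandatory)][string]$ResourceGroupName
    )
    "/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroupName"
}

# Hypothetical wrapper: grants the Automation Account's system-assigned identity
# the Virtual Machine Contributor role on the resource group that holds the VM.
function Grant-VmContributorToAutomationAccount {
    param(
        [Parameter(Mandatory)][string]$SubscriptionId,
        [Parameter(Mandatory)][string]$AutomationResourceGroup,
        [Parameter(Mandatory)][string]$AutomationAccountName,
        [Parameter(Mandatory)][string]$VmResourceGroup
    )
    # Principal (object) ID of the system-assigned managed identity.
    $account = Get-AzAutomationAccount -ResourceGroupName $AutomationResourceGroup `
        -Name $AutomationAccountName
    $principalId = $account.Identity.PrincipalId

    $scope = Get-ResourceGroupScope -SubscriptionId $SubscriptionId `
        -ResourceGroupName $VmResourceGroup
    New-AzRoleAssignment -ObjectId $principalId `
        -RoleDefinitionName 'Virtual Machine Contributor' -Scope $scope
}
```

Scoping the assignment to a single resource group keeps the managed identity from being able to run commands on every VM in the subscription, which may or may not matter in your environment.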
Test an Automation Account Runbook
With the managed identity configured, we can now proceed to test the PowerShell script and verify that it indeed starts the Windows service. Navigate into the runbook and click on the Edit button:
Click on the Test pane button:
Click on the Start button to execute the PowerShell script using the managed identity:
The runbook will now execute the script:
Queued..
Streams will display when the test completes.
Wait for the test to complete and verify that the output indicates it succeeded without errors:
Proceed to verify that the Windows service on the VM has restarted:
Create an Alert to monitor for service down and create Action Group with Automation Account Runbook
With the Automation Account Runbook tested, let’s proceed to create an alert to detect when the Remote Registry service (or any service of your choice) has stopped. Navigate to the Log Analytics workspace that was created to monitor the service, click on Create > Alert rule:
Select Custom Log Search to provide a custom Kusto query:
Many posts, including a previous one I wrote, simply use the following query to look for when a service has stopped:
ConfigurationData
| where SvcName =~ "RemoteRegistry"
| project SvcName, SvcDisplayName, SvcState, TimeGenerated
| where SvcState != "Running"
I find that the issue with this query is that it returns every record within the specified period where the service was not running. If the service stopped and was then started again within that period, the query still returns the earlier stopped record, so the alert continues to fire even though the service is running. After giving it a bit of thought, what I wanted was a query that gets the last time the service was running and the last time it was stopped, then compares the two TimeGenerated values. Given my lack of experience with Kusto queries, I could not figure out how to compare the times directly, so I decided to capture the two queries in variables, join them together, then compare the timestamps as shown below:
let LastStopped =
    ConfigurationData
    | where SvcName =~ "RemoteRegistry" and SvcState != "Running"
    | project SvcName, SvcDisplayName, SvcState, TimeGenerated
    | order by TimeGenerated desc
    | limit 1;
let LastRunning =
    ConfigurationData
    | where SvcName =~ "RemoteRegistry" and SvcState == "Running"
    | project SvcName, SvcDisplayName, SvcState, TimeGenerated
    | order by TimeGenerated desc
    | limit 1;
LastRunning
| join LastStopped on SvcName
// LastRunning time is earlier than LastStopped time
| where TimeGenerated <= TimeGenerated1
The intention of this query is to return a result only if the most recent event shows the service not running, and to return nothing if the most recent event shows the service running. I’m completely open to recommendations if anyone happens to read this and thinks there is a better way of doing it.
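One possible alternative to the join is the Kusto arg_max() aggregation, which keeps only the newest record per service so the state can be checked directly. The sketch below stores such a query in a PowerShell here-string and wraps Invoke-AzOperationalInsightsQuery in a hypothetical helper, Test-ServiceStoppedQuery, for trying it out against the workspace. I have not validated it against the environment in this post, and it assumes a single monitored VM; with several VMs reporting to the same workspace, Computer would likely need to be added to the by clause.

```powershell
# KQL sketch: keep only the newest record per service, then check its state.
$latestStateQuery = @'
ConfigurationData
| where SvcName =~ "RemoteRegistry"
| summarize arg_max(TimeGenerated, SvcState, SvcDisplayName) by SvcName
| where SvcState != "Running"
'@

# Hypothetical helper for trying the query from PowerShell before pasting it
# into the alert rule. $WorkspaceId is the Log Analytics workspace ID (a GUID).
function Test-ServiceStoppedQuery {
    param(
        [Parameter(Mandatory)][string]$WorkspaceId,
        [Parameter(Mandatory)][string]$Query
    )
    $response = Invoke-AzOperationalInsightsQuery -WorkspaceId $WorkspaceId -Query $Query
    # A non-empty result set means the latest recorded state is not Running.
    @($response.Results).Count -gt 0
}
```

Unlike the join, this form also returns a row when the service has only ever been recorded as stopped, because it does not depend on a Running record existing.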
With the query in place, proceed to create the Alert by clicking on Continue Editing Alert to use the query, leave the rest of the conditions as default and click on Actions:
Create a new action group by clicking on Create action group:
Provide a name for this action group and click on Notifications:
Configure the notification settings and then click on Actions:
Select Automation Runbook for the Action type:
Select the runbook that was created earlier and click OK:
Proceed to create the Action Group:
The newly created Action group should automatically be added to the Alert:
Fill in the Details tab for the alert:
Complete creating the Alert:
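The runbook in this post ignores its input, but when an action group starts a runbook, Azure Monitor can hand it the alert details through a WebhookData parameter. The sketch below shows how a runbook might read that payload. It assumes the common alert schema is enabled on the action, and Get-AlertEssentials is a hypothetical helper name of my own.

```powershell
# Hypothetical helper: returns the "essentials" section of a common alert schema payload.
function Get-AlertEssentials {
    param([Parameter(Mandatory)][string]$RequestBody)
    $payload = $RequestBody | ConvertFrom-Json
    # Fail fast if the action was not configured to send the common alert schema.
    if ($payload.schemaId -ne 'azureMonitorCommonAlertSchema') {
        throw "Unexpected alert schema: $($payload.schemaId)"
    }
    $payload.data.essentials
}

# In a runbook, the helper would be fed from the WebhookData input parameter:
#   param([object]$WebhookData)
#   $essentials = Get-AlertEssentials -RequestBody $WebhookData.RequestBody
#   if ($essentials.monitorCondition -eq 'Fired') {
#       # Start the service only when the alert fires, not when it resolves.
#   }
```

Reading monitorCondition this way would let the same runbook skip the restart when it is called for a resolved alert.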
Test Alert and Automation Account Runbook
Proceed to test and confirm that an alert is fired when the service is down and the runbook has successfully executed to restart the service:
I hope this blog post provides useful information on how to set up an alert that monitors a virtual machine’s Windows service and an Automation Account runbook that executes a PowerShell script using the Invoke-AzVMRunCommand cmdlet to restart a stopped service.