Monday, February 28, 2022

Using an Automation Account to monitor a VM's Windows service and a Runbook to use Invoke-AzVMRunCommand to restart a stopped service

In my previous post:

Using Azure Change Tracking and Inventory to monitor Windows Services
http://terenceluk.blogspot.com/2022/02/using-azure-change-tracking-and.html

I demonstrated how to set up Change Tracking and Inventory in Azure Automation to monitor Windows services in a virtual machine and alert when a service was no longer in a running state. With monitoring and alerting in place, the next step is to incorporate automation so that issues can be remediated immediately without manual intervention. Azure provides multiple methods for automation, and in this post I will demonstrate how to achieve this with an Automation Account runbook that uses the Invoke-AzVMRunCommand cmdlet.

To avoid recreating the same content demonstrating how to set up monitoring for a virtual machine’s Windows service, let’s assume that we’ve gone through the same steps as in my previous post:

Using Azure Change Tracking and Inventory to monitor Windows Services
http://terenceluk.blogspot.com/2022/02/using-azure-change-tracking-and.html

Monitoring and alerting have already been set up, and what remains is to incorporate automation as shown in the following sections.

PowerShell cmdlet Invoke-AzVMRunCommand

The PowerShell cmdlet Invoke-AzVMRunCommand allows running PowerShell scripts or commands remotely on an Azure virtual machine, and this is the method we’ll use to remotely restart a stopped Windows service in an automation runbook. For more information about this cmdlet, see the following documentation:

Invoke-AzVMRunCommand
https://docs.microsoft.com/en-us/powershell/module/az.compute/invoke-azvmruncommand?view=azps-7.2.0

Run scripts in your Windows VM by using action Run Commands
https://docs.microsoft.com/en-us/azure/virtual-machines/windows/run-command

PowerShell script that uses the Invoke-AzVMRunCommand cmdlet to restart a virtual machine’s Windows service

Next, we’ll incorporate the cmdlet that allows us to restart a virtual machine’s Windows service into the following script:

# This PowerShell script uses Invoke-AzVMRunCommand to run a cmdlet that specifies the Windows service name on the target VM
Connect-AzAccount

$resourceGroupName = "yourResourceGroupName"
$vmName = "ServerName"
$scriptToRun = "Start-Service -DisplayName 'Remote Registry'"

Out-File -InputObject $scriptToRun -FilePath ScriptToRun.ps1
Invoke-AzVMRunCommand -ResourceGroupName $resourceGroupName -Name $vmName -CommandId 'RunPowerShellScript' -ScriptPath ScriptToRun.ps1
Remove-Item -Path ScriptToRun.ps1

This script needs the Virtual Machine Contributor role to execute the Invoke-AzVMRunCommand cmdlet, and that role will be provided through the Automation Account’s managed identity (configured a bit later), so the final script has the following lines inserted to execute Connect-AzAccount with the managed identity:

# Ensures you do not inherit an AzContext in your runbook
Disable-AzContextAutosave -Scope Process

# Connect to Azure with the system-assigned managed identity
$AzureContext = (Connect-AzAccount -Identity).context

# Set and store the context
$AzureContext = Set-AzContext -SubscriptionName $AzureContext.Subscription -DefaultProfile $AzureContext

# Run Invoke-AzVMRunCommand with a script that specifies the Windows service name on the target VM
$resourceGroupName = "yourResourceGroupName"
$vmName = "ServerName"
$scriptToRun = "Start-Service -DisplayName 'Remote Registry'"

Out-File -InputObject $scriptToRun -FilePath ScriptToRun.ps1
Invoke-AzVMRunCommand -ResourceGroupName $resourceGroupName -Name $vmName -CommandId 'RunPowerShellScript' -ScriptPath ScriptToRun.ps1
Remove-Item -Path ScriptToRun.ps1

Automation Account Runbook

With the PowerShell script for restarting a Windows service prepared, proceed to create a runbook that will execute the script when an alert fires. Navigate to the Automation Account > Runbooks and click Create a runbook:

image

The options available for the Runbook type are as follows:

  1. PowerShell
  2. Python
  3. PowerShell Workflow
  4. Graphical PowerShell
  5. Graphical PowerShell Workflow

image

This example will use a PowerShell script so we’ll select PowerShell with the runtime version as 5.1 and then create the runbook:

image

With the runbook created, navigate into the newly created runbook and click on the Edit button:

image

Insert the prepared script into the Runbook:

image

Proceed to click Save and then Publish to publish the PowerShell runbook. Note that we won’t test it just yet because the managed identity hasn’t been configured, so the runbook does not have the permissions required to use Invoke-AzVMRunCommand to start a Windows service on the VM.

Managed Identity

With the runbook created, we’ll need to configure a managed identity for the Automation Account so it can run the PowerShell script. More information can be found in the following documentation:

Using a system-assigned managed identity for an Azure Automation account
https://docs.microsoft.com/en-us/azure/automation/enable-managed-identity-for-automation

Navigate to the Automation Account > Identity > System assigned and switch the Status to On to enable a system-assigned managed identity:

image

Next, click on Azure role assignments and add a Virtual Machine Contributor role assignment:

image

The Automation Account now has the role assignment needed for the PowerShell script to execute.

Test the Automation Account Runbook

With the managed identity configured, we can now proceed to test the PowerShell script and verify that it indeed starts the Windows service. Navigate into the runbook and click on the Edit button:

image

Click on the Test pane button:

image

Click on the Start button to execute the PowerShell script using the managed identity:

image

The runbook will now execute the script:

Queued..

Streams will display when the test completes.

image

Wait for the test to complete and verify that the output indicates it succeeded without errors:

image

Proceed to verify that the Windows service on the VM has been started:

image

Create an Alert to monitor for service down and create Action Group with Automation Account Runbook

With the Automation Account Runbook tested, let’s proceed to create an alert to detect when the Remote Registry service (or any service of your choice) has stopped. Navigate to the Log Analytics workspace that was created to monitor the service, click on Create > Alert rule:

image

Select Custom Log Search to provide a custom Kusto query:

image

Many posts, including a previous one I wrote, simply use the following query to look for when a service has stopped:

ConfigurationData

| where SvcName =~ "RemoteRegistry"

| project SvcName, SvcDisplayName, SvcState, TimeGenerated

| where SvcState != "Running"

The issue I find with this query is that it returns all records where the service was not running within the specified period. This means that if the service was restarted and is running afterwards, the query will still return the earlier stopped records, so the alert would continue to fire. After giving it a bit of thought, what I wanted was to query for the last time the service was running and the last time it was stopped, then compare the two TimeGenerated values. Given my lack of experience with Kusto queries, I couldn’t figure out how to compare the timestamps directly, so I decided to capture the two queries in variables, join them together, then compare the timestamps as shown below:

let LastStopped =
ConfigurationData
| where SvcName =~ "RemoteRegistry" and SvcState != "Running"
| project SvcName, SvcDisplayName, SvcState, TimeGenerated
| order by TimeGenerated desc
| limit 1;
let LastRunning =
ConfigurationData
| where SvcName =~ "RemoteRegistry" and SvcState == "Running"
| project SvcName, SvcDisplayName, SvcState, TimeGenerated
| order by TimeGenerated desc
| limit 1;
LastRunning
| join LastStopped on SvcName
// LastRunning time is earlier than LastStopped time
| where TimeGenerated <= TimeGenerated1

The intention of this query is to return a result only if the most recent event shows the service not running, and to return nothing if the most recent event shows the service running. I’m completely open to recommendations if anyone reading this thinks there is a better way of doing it.
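For readers who find Kusto joins opaque, the comparison logic can be sketched in plain Python. The sample events below are hypothetical; the function just mirrors the "alert only when the latest stopped event is newer than the latest running event" rule that the join expresses:

```python
from datetime import datetime

# Hypothetical service-state events, analogous to ConfigurationData rows
events = [
    {"SvcName": "RemoteRegistry", "SvcState": "Running", "TimeGenerated": datetime(2022, 2, 28, 9, 0)},
    {"SvcName": "RemoteRegistry", "SvcState": "Stopped", "TimeGenerated": datetime(2022, 2, 28, 9, 5)},
]

def should_alert(events, svc_name):
    """Alert only when the most recent stopped event is newer than the most recent running event."""
    svc = [e for e in events if e["SvcName"].lower() == svc_name.lower()]
    running = [e["TimeGenerated"] for e in svc if e["SvcState"] == "Running"]
    stopped = [e["TimeGenerated"] for e in svc if e["SvcState"] != "Running"]
    if not stopped:
        return False  # service was never seen stopped
    if not running:
        return True   # only stopped events were recorded
    # Mirrors: | where TimeGenerated <= TimeGenerated1
    return max(running) <= max(stopped)

print(should_alert(events, "RemoteRegistry"))  # the latest event is Stopped, so this prints True
```

If a newer Running event is appended, the function returns False and no alert would fire, which is exactly the behaviour the query above is after.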

image

With the query in place, proceed to create the alert by clicking on Continue Editing Alert to use the query, leave the rest of the conditions at their defaults, and click on Actions:

image

Create a new action group by clicking on Create action group:

image

Provide a name for this action group and click on Notifications:

image

Configure the notification settings and then click on Actions:

image

Select Automation Runbook for the Action type:

image

Select the runbook that was created earlier and click OK:

image

Proceed to create the Action Group:

image

image

The newly created Action group should automatically be added to the Alert:

image

Fill in the Details tab for the alert:

image

Complete creating the Alert:

image

Test Alert and Automation Account Runbook

Proceed to test and confirm that an alert is fired when the service is down and the runbook has successfully executed to restart the service:

image

image

I hope this blog post provides useful information on how to set up an Automation Account to monitor a virtual machine’s Windows service and a runbook that executes a PowerShell script using Invoke-AzVMRunCommand to restart a stopped service.

Friday, February 25, 2022

Monitoring, Alerting, Reporting Azure AD logins and login failures with Log Analytics and Logic Apps

Having monitoring and alerting set up for failed login attempts to any identity directory service (e.g. on-premises AD and Azure AD) has always been important, yet it is often neglected in many environments I’ve worked in. Those who have worked with on-premises Active Directory know the pain of going through a domain controller’s security event logs and how difficult it is to find what you need when all security events are recorded. This is why third-party monitoring and alerting products have had so much success for years. The built-in Azure AD Sign-in Logs provide a display of sign-in attempts for troubleshooting but are also rarely reviewed and audited. Retention of the Azure AD logs ranges from 7 days (Azure AD Free) to 30 days (Azure AD Premium P1 or P2) as stated in the documentation: https://docs.microsoft.com/en-us/azure/active-directory/reports-monitoring/reference-reports-data-retention#how-long-does-azure-ad-store-the-data

image

I’ve always bundled in the step for increasing the retention of these logs in my projects, and the action required to retain logs longer is to route them to an Azure storage account using Azure Monitor as described in the following Microsoft documentation: https://docs.microsoft.com/en-us/azure/active-directory/reports-monitoring/quickstart-azure-monitor-route-logs-to-storage-account. With log retention in place, the next critical component is monitoring and alerting to capture events that are of concern to the organization. A common use case is to monitor the first global admin account created for the tenant, as that account usually isn’t associated with a particular user and may be used as an emergency break glass account that does not have MFA. Another common monitor that should be set up is for potential malicious login attempts.

With the above in mind, what I would like to do in this post is demonstrate how to:

  1. Set up monitoring of Azure AD with Log Analytics
  2. Set up an alert using Kusto to query Azure AD Sign-In Logs
  3. Set up reporting of Azure AD failed sign in attempts with Logic Apps

Configure Log Analytics Workspace for Azure AD

I won’t go into how to create a Log Analytics workspace, so proceed to create one that will be used to ingest the Azure AD logs:

image

Proceed to navigate into Azure Active Directory > Diagnostic settings and then click on Add diagnostic setting:

image

The following are the log category options:

  • AuditLogs
  • SignInLogs
  • NonInteractiveUserSignInLogs
  • ServicePrincipalSignInLogs
  • ManagedIdentitySignInLogs
  • ProvisioningLogs
  • ADFSSignInLogs
  • RiskyUsers
  • UserRiskEvents
  • NetworkAccessTrafficLogs
  • RiskyServicePrincipals
  • ServicePrincipalRiskEvents

For the purpose of this example, we’ll just capture the AuditLogs and SignInLogs and click on the Save button:

Note the following requirement:

In order to export Sign-in data, your organization needs Azure AD P1 or P2 license. If you don't have a P1 or P2, start a free trial.

image

With the AuditLogs and SignInLogs configured for collection, navigate to the corresponding Log Analytics workspace > Usage and estimated costs > Data Retention to configure the required retention:

image

Set up an alert using Kusto to query Azure AD Sign-In Logs

As mentioned earlier, a common use case is to monitor the first global admin account created for the tenant as that is usually one that isn’t associated with a particular user and may be used as the Emergency Break Glass Account that does not have MFA. To set this up, navigate to Monitor > Alerts > Create > Alert rule:

image

Select the log analytics workspace as the scope:

image

Select Custom log search for the Condition so we can define our own Kusto query:

image

The following query can be used to look for break glass account login:

SigninLogs
| where UserPrincipalName contains "cspadmin@contoso.onmicrosoft.com"

image

With the Condition defined, proceed to assign an Action Group that will alert the appropriate team through an appropriate communication method such as email.

Set up reporting of Azure AD failed sign in attempts with Logic Apps

Another common scenario I’ve come across is providing a report of failed sign-in attempts to a group of administrators to review on an hourly, daily, or weekly basis. A frequent report can be very useful during an active attack, while daily and weekly reports are great for ongoing monitoring. For the purpose of this example, I will use a directory with failed sign-in attempts for a user originating from various countries in Asia:

image

The Kusto query we’ll use to look for failed sign-ins is the following:

SigninLogs
| where Status.errorCode != 0
| extend City=LocationDetails.city, State=LocationDetails.state, Country=LocationDetails.countryOrRegion, Error_Code=Status.errorCode, Failure_Reason=Status.failureReason
| project TimeGenerated, UserDisplayName, AppDisplayName, IPAddress, City, State, Country, AuthenticationRequirement, Failure_Reason, ConditionalAccessStatus, ConditionalAccessPolicies, Error_Code

image

You are free to adjust the query to include or exclude additional fields of the failed sign-in attempts.
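To make the where/extend/project steps concrete, here is a small Python sketch over hypothetical records shaped like SigninLogs rows. The field names follow the query above; the data itself is made up for illustration:

```python
# Hypothetical sign-in records shaped like SigninLogs rows
signin_logs = [
    {"UserDisplayName": "Jane Doe", "IPAddress": "203.0.113.10",
     "Status": {"errorCode": 50126, "failureReason": "Invalid username or password"},
     "LocationDetails": {"city": "Hanoi", "state": "", "countryOrRegion": "VN"}},
    {"UserDisplayName": "Jane Doe", "IPAddress": "198.51.100.7",
     "Status": {"errorCode": 0, "failureReason": "Other"},
     "LocationDetails": {"city": "Toronto", "state": "ON", "countryOrRegion": "CA"}},
]

def failed_sign_ins(rows):
    """Mirror the query: keep rows where Status.errorCode != 0, then project the fields of interest."""
    report = []
    for row in rows:
        if row["Status"]["errorCode"] == 0:
            continue  # successful sign-in; skip, like the where clause
        report.append({
            "UserDisplayName": row["UserDisplayName"],
            "IPAddress": row["IPAddress"],
            "City": row["LocationDetails"]["city"],
            "Country": row["LocationDetails"]["countryOrRegion"],
            "Error_Code": row["Status"]["errorCode"],
            "Failure_Reason": row["Status"]["failureReason"],
        })
    return report

print(failed_sign_ins(signin_logs))  # only the failed attempt from Hanoi remains
```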

With the Kusto query created and tested, the next step is to create a Logic App that will generate and send a report to an email address for administrators to review. Create a Logic App as such:

image

Note that the Enable log analytics option during the creation of the Logic App provides richer debugging information about the Logic App at runtime.

Next, navigate into the Logic app and click on Logic app designer:

image

We’ll be creating 3 steps for this Logic App where:

  1. Recurrence: This will configure a recurring schedule for this Logic App to execute
  2. Run query and visualize results: This will allow us to run the Kusto query, set a Time Range and specify a Chart Type
  3. Send an email (V2): This will allow us to send the Kusto query results via email

image

Recurrence:

I wanted to send this report every day at 5:00 p.m. EST:

image

Run query and visualize results:

The query I wanted to execute is:

SigninLogs

| where Status.errorCode != 0

| extend City=LocationDetails.city, State=LocationDetails.state, Country=LocationDetails.countryOrRegion, Error_Code=Status.errorCode, Failure_Reason=Status.failureReason

| project TimeGenerated, UserDisplayName, AppDisplayName, IPAddress, City, State, Country, AuthenticationRequirement, Failure_Reason, ConditionalAccessStatus, ConditionalAccessPolicies, Error_Code

The time range I wanted to query for is the last 12 hours and the Chart Type I want is an HTML Table:

image

Send an email (V2):

The report will include the Attachment Content and Attachment Name derived from the query, with the subject Failed Login Report. The email will look pretty barebones, so you are free to add HTML code to pretty it up.
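If you do decide to pretty up the email body, one way is to render the query rows into an HTML table yourself. This is a minimal sketch, assuming you have the rows available as a list of dictionaries (the field names are illustrative, not required):

```python
from html import escape

def rows_to_html_table(rows, columns):
    """Render a list of dicts as a simple HTML table suitable for an email body."""
    header = "".join(f"<th>{escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(r.get(c, '')))}</td>" for c in columns) + "</tr>"
        for r in rows
    )
    return f"<table border='1'><tr>{header}</tr>{body}</table>"

rows = [{"UserDisplayName": "Jane Doe", "IPAddress": "203.0.113.10",
         "Failure_Reason": "Invalid username or password"}]
html = rows_to_html_table(rows, ["UserDisplayName", "IPAddress", "Failure_Reason"])
print(html)
```

The returned string could then be used as the email body in the Send an email (V2) step instead of (or alongside) the attachment.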

image

Proceed to save the Logic App:

image

Use the Run Trigger to test the Logic App and confirm that an email is sent:

image

Hope this helps anyone who may be looking for a way to monitor, alert, and report on Azure AD logins with Log Analytics and Logic Apps.

Wednesday, February 23, 2022

Using Azure Change Tracking and Inventory to monitor Windows Services

In my previous post:

Monitor and Alerting for an Azure Virtual Machine with Azure Monitor
Terence Luk: Monitor and Alerting for an Azure Virtual Machine with Azure Monitor

I demonstrated how to set up Log Analytics to monitor the event log for a system event ID 7031 containing a specific string that represents the Windows service we want to monitor and detect when it stops. While this method is certainly a viable option, it isn’t very straightforward if you’re not familiar with Windows and don’t know which system events are triggered when a service stops. Case in point: when a service is abruptly terminated, an event ID 7031 error is logged, but if the service is gracefully stopped, a 7036 information event is logged instead. Having to capture all of these event types with a query leaves a lot of room for error, so I would like to demonstrate a different method for monitoring Windows or Linux services.

The Azure feature I typically use to monitor services from within a virtual machine is Change Tracking and Inventory in Azure Automation. This feature tracks changes in virtual machines hosted in Azure, on-premises, and other cloud environments. Items that are tracked by Change Tracking and Inventory include:

  • Windows software
  • Linux software (packages)
  • Windows and Linux files
  • Windows registry keys
  • Windows services
  • Linux daemons

Change Tracking and Inventory overview

https://docs.microsoft.com/en-us/azure/automation/change-tracking/overview

I find this feature extremely powerful, as it opens up many monitoring opportunities for all sorts of use cases. For the purpose of this example, we’ll use it to monitor a Windows service’s status.

Creating an Automation Account

Change Tracking and Inventory is a feature of Azure Automation, so you’ll need to create an Automation Account that is linked to a Log Analytics workspace. Begin by navigating to Automation Accounts:

image

Then create an automation account:

image

Enabling Change Tracking for the Automation Account

Navigate to Configuration Management > Change Tracking, select a supported Log Analytics workspace, then click Enable:

image

The following console will be displayed once the deployment has successfully completed:

image

Adding VMs for Change Tracking

With Change Tracking ready in the Automation Account, proceed to add the VMs:

image

Select the virtual machine(s) you would like to enable Change Tracking for and then click Enable:

image

With the virtual machine added, proceed to adjust the settings by clicking on Edit Settings:

image

Navigate to Windows Services and note how the frequency is set to 30 minutes:

image

This is likely not frequent enough, so for the purpose of this example we’ll use the lowest frequency of 10 seconds to collect the Windows Services changes:

image

With the change tracking configured, proceed to stop the service you intend to test with:

image

Refresh the Change tracking console and you should see the Windows Services change logged:

image

Note the details for the Advanced Monitoring Agent service we stopped:

image

Proceed to click on the Log Analytics button:

image

The query window will automatically execute the ConfigurationChange query without any other input, which returns all results for any configuration change. We’ll refine it to list only the service we want to track:

ConfigurationData
| where SvcName =~ "Advanced Monitoring Agent"
| project SvcName, SvcDisplayName, SvcState, TimeGenerated
| where SvcState != "Running"

image

Now that we have a query to search for a specific service, we can create a new alert by clicking the New alert rule button:

image

Proceed to configure the Condition settings:

image

Update the Threshold value to 0 and Frequency of evaluation to 1 minute to capture any service status that is not “Running”:

image

Select an action group for the notification:

image

Fill in the details for the rule:

image

Complete creating the rule:

image

As with all rules, it may take a bit of time before it shows up in the console:

image

Note that although the rule was created from within the Automation Account, it is actually configured on and associated with the Log Analytics workspace linked to the Automation Account:

image

Proceed to test stopping the monitored service and you should see an email notification similar to the one below:

image

Hope this provides a good overview of how to use Change Tracking and Inventory to monitor Windows services. What’s great about this feature is that it also allows you to track other changes such as files and registry keys, which opens up many more possibilities for monitoring.