
Sunday, April 24, 2022

Infrastructure as Code in 15 Minutes PowerPoint Presentation

It’s finally April and this month is typically when I perform a bit of spring cleaning on my laptop to avoid having files and folders become too disorganized. One of the files I came across as I sorted through my documents folder is a PowerPoint I created a while back when I interviewed for a role where I was asked to create a presentation on a topic of my choice and present it to 5 interviewers. Rather than choosing a topic I was extremely fluent with, I decided to try my luck with something I was learning at the time, and that was Infrastructure as Code with Terraform. I did not want the presentation to be too focused on a specific vendor, so I spent most of the time talking about the benefits of IaC before presenting Terraform as a solution. The window I had to work with was 30 minutes, so I kept the presentation short to leave some time at the end for questions. The feedback I received was very positive, as 3 of the 5 interviewers expressed how much they liked my presentation. Given that this presentation was created on my own personal time, I think it would be great to share it in case anyone is looking for material to introduce an audience to IaC. This specific role I interviewed for holds a special place in my heart because of the interviewers I had the opportunity to meet and how supportive every one of them was. The marathon of interviews was long but extremely gratifying, and I enjoyed the experience even though I wasn’t selected in the end.

Without further ado, the PowerPoint presentation can be downloaded here: https://docs.google.com/presentation/d/1v8X1e9RimDdkpiR01Rj5et_Mnip5n0u0/edit?usp=sharing&ouid=111702981669472586918&rtpof=true&sd=true

I will also paste the content of the presentation below along with the notes I used during the presentation. Enjoy!

image

Intro

Good afternoon everyone and thank you for attending this presentation. The topic I will be presenting is Infrastructure as Code in 15 minutes.

image

Agenda

The agenda today will begin with a look at how we traditionally deploy infrastructure, followed by what Infrastructure as Code, also known as IaC, is. We will then cover the benefits of IaC, imperative vs declarative, IaC with Terraform, IaC in DevOps pipelines, a sample setup and finally Q&A.

image

Traditional infrastructure deployment

Infrastructure deployment has traditionally been performed through the use of a graphical user interface and scripts. As user friendly as GUIs are, the obvious challenge is that the process is very much manual, time-consuming and prone to the errors that the administrators performing the configuration can make. Attempting to maintain consistency is very difficult, leading to configuration drift, and trying to keep multiple environments that are meant to mirror one another in lockstep is challenging. Trying to scale the environment is cumbersome (e.g. deploying more instances of VMs or adding new subnets). Lastly, there isn’t an easy way to document the environment other than screenshots and spreadsheets containing configuration values.

Scripting adds a bit of automation but is often difficult to maintain over time.

image

What is Infrastructure as Code?

Infrastructure as Code is essentially the managing and provisioning of infrastructure through code. Leveraging code means that we can now introduce automation into the management of the infrastructure, whether it is creating new resources or making modifications to them. Infrastructure as Code can be implemented as imperative or declarative, which is an important topic we will cover shortly.

image

Benefits of IaC

To further elaborate on the benefits of IaC, it is now possible to automate deployments not only in one cloud but across multiple clouds such as GCP, Azure and AWS. The speed and efficiency of deployment can be greatly increased as the process eliminates the manual points and clicks of the administrator, and the process is repeatable and consistent, allowing multiple environments to be deployed in lockstep. The code can easily be source controlled with versioning, which gives way to team collaboration through Continuous Integration. CI/CD pipelines can be used to develop and deploy the infrastructure, leveraging all the benefits of DevOps. Infrastructure management can be simplified and standardized through policies, and the environment can scale with ease: think about tweaking a variable to scale from 1 to 100 rather than going into a GUI and deploying or cloning resources multiple times. Static application security testing, which is the process of reviewing source code and detecting vulnerabilities, can now be performed rather than trying to comb through the deployment configuration documentation or the GUI after the infrastructure is deployed. Manual labour is significantly reduced.

image

Imperative vs Declarative

One of the important aspects of IaC is the concept of imperative vs declarative. To put it in simple terms, let’s say the end state or goal we want to achieve is getting to a pizza restaurant. Imperative can be described as “what to do” while declarative is “what is wanted.” So let’s say we hop into a taxi and want to get to this end state. An example of imperative instructions would be to tell the taxi driver to go:

  • Forward 1 mile
  • Turn right
  • Forward 2 miles
  • Turn left
  • Forward 3 miles
  • Arrive at pizza restaurant

While declarative is:

  • Go to the pizza restaurant.

image

Let’s dissect the differences and outline them.

With imperative, the starting point matters because we are explicitly calling out each action to get to the end state. This leads to difficulty in auditing the steps and trying to detect drift when changes are made. Version control is challenging, if even possible, and if the steps execute halfway and stop due to an error, you cannot repeat them without ending up in a completely different state. The logic can get very complex as ordering matters, and trying to change the destination state requires modifications to the steps.

Declarative, on the other hand, allows the starting point to be anywhere because the engine delivering or carrying you to the end state will handle the directions. Declarative is idempotent in nature, so you can run it as many times as you want without affecting the state. The code can also be run repeatedly in a pipeline to create multiple environments in lockstep. Having removed the detailed imperative steps, we can easily validate and detect any drift and introduce version control. Lastly, we can change the destination without worrying about changing the steps.

image

IaC with Terraform

One of the popular IaC solutions currently on the market is Terraform. Terraform itself is written in Go, its configurations are written in a declarative language known as HashiCorp Configuration Language (HCL), and it has multi-cloud support. The way it handles deployments to multiple clouds is through the use of providers, and there are approximately 1,521 providers currently available on their site. Terraform configurations are written in plain text and can be source controlled with Git or Terraform Cloud. Security can be introduced through RBAC so that teams using separate workspaces to manage different environments or components can only make changes to their own environments. Lastly, policies with automation can be introduced to provide control and governance.
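To make the declarative model concrete, below is a minimal HCL sketch with hypothetical resource names, region and count values (it illustrates the style rather than the exact configuration used later in the demo). The desired end state is described, and scaling is just a matter of changing the instance_count variable and re-applying rather than cloning resources in a GUI:

# Declare the provider that Terraform will download during "terraform init"
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}

provider "azurerm" {
  features {}
}

# Changing this single value from 1 to 100 scales the deployment on the next apply
variable "instance_count" {
  type    = number
  default = 1
}

# Desired end state: a resource group and a set of storage accounts inside it
resource "azurerm_resource_group" "demo" {
  name     = "rg-iac-demo"
  location = "canadacentral"
}

resource "azurerm_storage_account" "demo" {
  count                    = var.instance_count
  name                     = "stiacdemo${count.index}"
  resource_group_name      = azurerm_resource_group.demo.name
  location                 = azurerm_resource_group.demo.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}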

image

IaC with DevOps Pipelines

What IaC enables, which I feel is the most powerful aspect, is the use of pipelines. With IaC we can now leverage the DevOps methodology with CI/CD pipelines to deploy infrastructure. Pipelines can be created to only deploy infrastructure, or they can incorporate the deployment of infrastructure as part of an application, which means the IaC is only a small component of the pipeline. The flow diagram shown here is a simplified version depicting the process, as we can integrate many different stages into the pipelines such as security scans and testing. This unlocks true automation and different release strategies.

image

Sample Setup

To demonstrate how we can fully automate the deployment of cloud resources, I have prepared a simple sample configuration where I will go through the setup in the following slides.

image

Prerequisites

We will assume that Jenkins along with the Terraform plugin is deployed, a GitHub repo with Terraform deployment code is created and a service principal (in this case Azure) is set up for Jenkins so it can deploy resources. So as shown in the screenshots, we’ll have Jenkins, the Terraform plugin installed, the GitHub repo where the Terraform code is pulled and finally the service principal created in Azure.

image

Create Jenkins Pipeline

First, we’ll write a Jenkins pipeline with 4 stages for the infrastructure deployment:

  • Checkout, which will check out the code from the GitHub repo
  • Terraform init, which initializes Terraform and downloads the required provider
  • Terraform plan, which performs a dry run and outputs the pending changes to the console
  • Terraform apply or destroy, which either deploys or removes the infrastructure

image

Parameterize the Jenkins pipeline

This simple setup will require administrator intervention to choose either apply or destroy, so we’ll configure a choice parameter for the pipeline. Note that we can also use triggers to automatically initiate the build through commits.

image

With the execution parameters set up, we will proceed to paste the code into the pipeline.
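The actual pipeline code is shown in the screenshot that follows, but as a rough, hypothetical sketch of what such a declarative Jenkinsfile could look like (the repository URL, branch name and credential handling here are assumptions, not the exact code from the demo), the four stages and the choice parameter could be expressed as:

pipeline {
    agent any

    parameters {
        // The apply/destroy choice can be declared here or configured through the Jenkins UI as above
        choice(name: 'ACTION', choices: ['apply', 'destroy'], description: 'Apply or destroy the Terraform deployment')
    }

    stages {
        stage('Checkout') {
            steps {
                // Hypothetical repository; replace with the GitHub repo holding the Terraform code
                git branch: 'main', url: 'https://github.com/your-account/your-terraform-repo.git'
            }
        }
        stage('Terraform Init') {
            steps {
                // Downloads the required provider; assumes the Terraform CLI is available on the agent
                sh 'terraform init'
            }
        }
        stage('Terraform Plan') {
            steps {
                // Dry run that writes the pending changes to the console output
                sh 'terraform plan'
            }
        }
        stage('Terraform Apply or Destroy') {
            steps {
                // Assumes the Azure service principal credentials are exposed as ARM_* environment variables
                sh "terraform ${params.ACTION} -auto-approve"
            }
        }
    }
}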

image

Build Pipeline

Then finally, with the pipeline configured, we’ll initiate the pipeline build interactively by choosing apply, and we can view the progress as shown in the screenshot above. Once the build is complete, we should see the resources in Azure.

This short demonstration only scratches the surface of the limitless possibilities of IaC with pipelines. Another example would be a pipeline that deploys an application and includes the infrastructure build as a step to create the target infrastructure.

image

Ending

This concludes my IaC in 15 minutes presentation. Thank you for attending and feel free to ask any questions or provide any comments.

Saturday, April 9, 2022

Setting up Azure Monitor with Log Analytics for Azure Virtual Desktop

I’ve recently been involved in a few virtual desktop architecture designs and one of the topics I was asked to discuss was the monitoring of the virtual desktops. Having worked with Citrix and VMware VDI solutions, I’ve always enjoyed the “out of the box” monitoring solutions that were included: Citrix Director (shown in the screenshot below) and the VMware Horizon Help Desk Tool.

image

The metrics and visualization that these tools provided were extremely valuable for understanding overall health and for troubleshooting when issues arise. Those who worked with Azure Windows Virtual Desktop in the very beginning will remember that these metrics were possible, but an investment of time was needed to develop the Kusto queries to retrieve the captured metrics from Log Analytics so dashboards could be created. The professionals in the Azure community provided many prewritten queries, workbooks and even Power BI dashboards. Fast forward to today, and Microsoft has made it extremely easy to capture this data and includes many “out-of-the-box” dashboards with just a few clicks. Furthermore, the Log Analytics data is still available for anyone who would like to capture additional logs and metrics to create customized reports.

There are many great posts available that show how to configure Log Analytics to capture data, but some of these manual steps aren’t really necessary today, so the purpose of this blog post is to demonstrate how to quickly set up Log Analytics with the baseline configuration defined by Microsoft. Upon completion of the setup, we’ll see how many dashboards for monitoring and reporting are already available.

Create Log Analytics Workspace to store Azure Virtual Desktop events and metrics

As always, we’ll begin by creating a Log Analytics Workspace that is dedicated to storing the logs collected from the Azure Virtual Desktop components (it is best not to mix other logs into this workspace):

image

Configure the required retention for the data (the default is 30 days) and be mindful of how much data is being ingested according to the size of the deployment:

image

Set up Configuration Workbook

It is possible to proceed to the Azure Virtual Desktop Host pools and Workspaces to enable logging, then configure event and performance metric monitoring for each session host, but this can all be completed by using the Configuration Workbook. Navigate to Azure Virtual Desktop in the Azure portal:

image

Select the Insights blade, select the Host Pool to be configured and click on Open Configuration Workbook:

image

Select the Log Analytics workspace that was configured in the first step and then click on Configure host pool:

image

image

The template for configuring the Host pool diagnostic settings will be displayed, indicating that the following categories will be captured:

  • Management Activities
  • Feed
  • Connections
  • Errors
  • Checkpoints
  • HostRegistration
  • AgentHealthStatus

Proceed to deploy the template:

image

Note that the Host Pool’s Resource diagnostic settings are now configured:

image

Proceed to scroll down to the Workspace and click on Configure workspace:

image

The template for the workspace diagnostic settings will be displayed, indicating that the following categories will be captured:

  • Management Activities
  • Feed
  • Errors
  • Checkpoints

Proceed to deploy the template:

image

Note that the Workspace’s Resource diagnostic settings are now configured:

image

Next, navigate to the Session host data settings tab:

image

Click on Configure performance counters to capture the recommended baseline counters as displayed on the right under Missing counters:

image

Click on Apply Config:

image

Note the performance counters that have been successfully added:

image

Proceed to configure the recommended event logs to be captured by clicking on Configure events:

image

Click on the Deploy button:

image

The Windows event logs that will be captured will be listed. Note that the Microsoft-Windows-GroupPolicy/Operational log is not included in the baseline but is one I added (more on this a bit later).

image

With the Resource diagnostic settings and Session host data settings configured, proceed to the Data Generated tab, where a summary of the Perf Counters, AVD Diagnostics and Events billed over the last 24 hours will be displayed. I waited a few days before capturing the screenshot so metrics would be displayed:

imageimageimage

If we navigate to the Host Pool blade of the Azure Virtual Desktop deployment and click on Diagnostic settings, we’ll see the configuration we have just completed. Some administrators will know that this is something that can be configured manually as well.

imageimage

The same applies to the Workspace as well:

 

imageimage

Adding new Session Hosts to Log Analytics Workspace

It is important to remember to add new session hosts (VDIs) to the Log Analytics workspace as they are added to the AVD deployment so they are monitored. To add new hosts, navigate to Azure Virtual Desktop > Insights:

image

A message indicating “There are session hosts not sending data to the expected Log Analytics workspace.” will be displayed if any are unmonitored. Otherwise, the following dashboard will be displayed:

image

Out-of-the-Box Dashboards

Once the configuration is complete, Microsoft provides many out-of-the-box dashboards, including:

  • Connection diagnostics: % of users able to connect
  • Connection performance: Time to connect (new sessions)
  • Host diagnostics: Event log errors
  • Host performance: Median input latency
  • Utilization
  • Daily connections and reconnections
  • Daily alerts
imageimage

Navigating to the Connection Diagnostics tab will provide the following metrics:

  • Success rate of (re)establishing a connection (% of connections)
  • Success rate of establishing a connection (% of users able to connect)
  • Potential connectivity issues in Last 48 hours
  • Connection activity browser for Last 48 hours
  • Ranking of Errors impacting Connection activities in Last 48 hours
imageimageimage

Navigating to the Connection Performance tab will provide the following metrics:

  • Top 10 users with highest median time to connect
  • Top 10 hosts with highest median time to connect
  • Time to connect and sign in, end-to-end
  • Time for service to route user to a host
  • Round-trip time
  • RTT median and 95th percentile for all hosts

imageimage

Navigating to the Host Diagnostics tab will provide the following metrics:

  • Host pool details
  • Performance counters
  • Events
  • Host browser
  • CPU usage
  • Available memory
imageimageimage

Navigating to the Host Performance tab will provide the following metrics:

  • Input delay by host
  • Input delay by process
image

Navigating to the Users tab will allow you to interactively search for a user and then provide the following metrics:

  • Connections over time for tluk@contoso.com
  • Feed refreshes by client and version
  • Feed refreshes over time for tluk@contoso.com
  • Connections by client and version
  • Key usage numbers
  • Connection activity browser for Last 48 hours
  • Ranking of errors impacting Connection activities for tluk@contoso.com in Last 48 hours
imageimageimage

Navigating to the Utilization tab will provide the following metrics:

  • Sessions summary
  • Max users per core
  • Available sessions
  • CPU usage
  • Monthly active users (MAU)
  • Daily connections and reconnections
  • Daily connected hours
  • Top 10 users by connection time
  • Top 10 hosts by connection time
imageimage

Navigating to the Clients tab will provide the following metrics:

  • Active users by client type over time
  • Usage by client version for all clients
  • Users with potentially outdated clients (all activity types)
image

Navigating to the Alerts tab will provide the following metrics:

  • Alerts over time
  • Details filtered to all severities
image

And there you have it. It’s truly amazing how many dashboards are made available with a minimal amount of configuration for the environment.

Custom Monitoring of Metrics

I had indicated in one of the previous screenshots that I included the Microsoft-Windows-GroupPolicy/Operational log in the events captured, and the reason for this is that I’ve worked on many VDI deployment projects in the past where the virtual desktop solution was blamed for slow logon performance. One of the metrics I used quite frequently is the GPO processing time that the Citrix Director dashboard provides, and this value can be easily obtained by capturing the Microsoft-Windows-GroupPolicy/Operational log and using the following Kusto queries:

User GPO processing:

// This query will retrieve the amount of time required for user logon policy processing to complete by parsing ParameterXml
// The logon details can also be retrieved from EventData but we're using ParameterXml instead to demonstrate how to parse it
Event
| where EventLog == "Microsoft-Windows-GroupPolicy/Operational"
| where Computer contains "Server-or-Desktop-Name"
| where EventID == "8001"
| parse ParameterXml with * "<Param>" GPO_Processing_Seconds "</Param><Param>" Digit1 "</Param><Param>" Server_or_Computer "</Param><Param>" Digit2 "</Param><Param>" Boolean
| project TimeGenerated, Server_or_Computer, GPO_Processing_Seconds, RenderedDescription

https://github.com/terenceluk/Azure/blob/main/Kusto%20KQL/Get-User-Logon-Policy-Processing-Duration.kusto

Computer GPO processing:

// This query will retrieve the amount of time required for computer logon policy processing to complete by parsing ParameterXml
// The logon details can also be retrieved from EventData but we're using ParameterXml instead to demonstrate how to parse it
Event
| where EventLog == "Microsoft-Windows-GroupPolicy/Operational"
| where Computer contains "Server-or-Desktop-Name"
| where EventID == "8001"
| parse ParameterXml with * "<Param>" GPO_Processing_Seconds "</Param><Param>" Digit1 "</Param><Param>" User "</Param><Param>" Digit2 "</Param><Param>" Boolean
| project TimeGenerated, Computer, User, GPO_Processing_Seconds, RenderedDescription

https://github.com/terenceluk/Azure/blob/main/Kusto%20KQL/Get-Computer-Logon-Policy-Processing-Duration.kusto

Other metrics can also be collected using Kusto, and your imagination is really the limit.
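For instance, below is a hypothetical sketch of a query that calculates the average time to connect per user over the last 24 hours by correlating the Started and Connected states in the WVDConnections diagnostics table (referenced in the links below); the field names follow the documented WVDConnections schema, so treat this as a starting point rather than a polished report:

// Sketch: average time to connect per user over the last 24 hours using the WVDConnections table
WVDConnections
| where TimeGenerated > ago(24h)
| where State == "Started"
| project CorrelationId, UserName, StartTime = TimeGenerated
| join kind=inner (
    WVDConnections
    | where State == "Connected"
    | project CorrelationId, ConnectTime = TimeGenerated
) on CorrelationId
| extend TimeToConnectSeconds = datetime_diff('second', ConnectTime, StartTime)
| summarize AvgTimeToConnectSeconds = avg(TimeToConnectSeconds) by UserName
| order by AvgTimeToConnectSeconds desc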

image

I hope this post serves as a good refresher for anyone who hasn’t looked at Azure Virtual Desktop monitoring for a while and would like to know what features are available with minimal configuration. The following is Microsoft Azure Virtual Desktop related documentation that I would highly recommend reading:

Sample Kusto Queries:
https://docs.microsoft.com/en-us/azure/virtual-desktop/diagnostics-log-analytics#example-queries

Using Log Analytics to Monitor AVD:
Walkthrough of setting up the diagnostics, events, performance, workbooks:
https://docs.microsoft.com/en-us/azure/virtual-desktop/azure-monitor

Monitoring Virtual Machines:
https://docs.microsoft.com/en-us/azure/azure-monitor/vm/monitor-virtual-machine

Diagnostic Logs References (fields):
WVDConnections
https://docs.microsoft.com/en-us/azure/azure-monitor/reference/tables/wvdconnections

WVDErrors
https://docs.microsoft.com/en-us/azure/azure-monitor/reference/tables/wvderrors

Troubleshoot Azure Monitor for Azure Virtual Desktop
https://docs.microsoft.com/en-us/azure/virtual-desktop/troubleshoot-azure-monitor

Log data ingestion time in Azure Monitor
https://docs.microsoft.com/en-us/azure/azure-monitor/logs/data-ingestion-time