Monday, March 25, 2024

Accessing SharePoint document library with AI Search for Azure OpenAI GPT

The first quarter of this year has been insanely busy, which has kept me from blogging as much as I would like. I have been carving out whatever time I have on weekends to continue testing new Azure AI services, but I couldn’t find the time to clear out my backlog of blog topics.

One of the recent tests I’ve done is the in-preview feature of using Azure AI Search (previously known as Cognitive Search) to index a SharePoint Online document library. This feature has interested me because of the vast number of SharePoint libraries I work with across different clients, and the ability to use Azure OpenAI to tap into those libraries would be very attractive. Copilot Studio offers a similar feature where you can easily configure a SharePoint URL for a copilot to tap into, but I still prefer Azure AI services because the flexibility that development offers allows for much more creative ideas.

With the above said, the purpose of this post is to provide 2 scripts:

  1. Script to create the App Registration with the appropriate permissions to access SharePoint Online
  2. Script that will create the AI Search data source, indexer, and index

The deployment will follow the same steps found in the following Microsoft document:

Please take the time to read the supported document formats and limitations of this feature.

Step 1 – Enable a system-assigned managed identity

This step is optional and isn’t needed for the configuration in this blog post because we’ll be including the tenant ID in the connection string of the data source. However, if your environment uses AI Search to access storage accounts, then this will likely already be enabled.

Step 2 – Decide which permissions the indexer requires

There are advantages and disadvantages to going either way. This post will demonstrate using application permissions, but the script also includes commented-out code for delegated permissions.

Step 3 - Create a Microsoft Entra application registration that will be used to access the SharePoint document library

Use the following script to create the App Registration that will configure the required permissions:

Note that there does not appear to be a way for Azure CLI to configure platform configurations, so you'll need to perform the following manually after the App Registration is created:

  1. Navigate to the Authentication tab of the App Registration
  2. Set Allow public client flows to Yes then select Save.
  3. Select + Add a platform, then Mobile and desktop applications, then check, then Configure.
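Although the script handles the permission grant, it may help to see the shape of the payload involved. The following is a minimal Python sketch of the requiredResourceAccess structure such a script passes to `az ad app create --required-resource-accesses`. The Microsoft Graph resource appId shown is a well-known value; the role GUIDs are hypothetical placeholders that you would replace with the real IDs for the Graph permissions the Microsoft document lists.

```python
import json

# GRAPH_APP_ID is the well-known Microsoft Graph resource appId. The role IDs
# passed in below are hypothetical placeholders -- look up the real GUIDs for
# the permissions the Microsoft document lists (e.g. via "az ad sp show").
GRAPH_APP_ID = "00000003-0000-0000-c000-000000000000"

def build_required_resource_access(role_ids):
    # "Role" = application permission; use "Scope" for delegated permissions
    return [{
        "resourceAppId": GRAPH_APP_ID,
        "resourceAccess": [{"id": rid, "type": "Role"} for rid in role_ids],
    }]

payload = build_required_resource_access(
    ["<role-guid-for-permission-1>", "<role-guid-for-permission-2>"]
)
print(json.dumps(payload, indent=2))
```

The resulting JSON is what you would save to a file and pass to the `--required-resource-accesses` parameter.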

Step 4 to 7 – Create SharePoint data source, indexer, index, and get properties of index

Use the following PowerShell script to create and configure the above components in the desired AI Search:
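The core of that script is a handful of REST calls against the AI Search service. To show what the data source call sends, here is a minimal sketch in Python (the post's script is PowerShell): the site URL, application ID, secret, and tenant ID are hypothetical placeholders, and the container name and connection-string format follow the preview documentation, so check them against the current version of the document.

```python
import json

# Sketch of the SharePoint data source body POSTed to the AI Search REST API.
# All identifiers below are placeholders.
def sharepoint_datasource(name, site_url, app_id, app_secret, tenant_id):
    connection = (
        f"SharePointOnlineEndpoint={site_url};"
        f"ApplicationId={app_id};"
        f"ApplicationSecret={app_secret};"
        f"TenantId={tenant_id}"
    )
    return {
        "name": name,
        "type": "sharepoint",
        "credentials": {"connectionString": connection},
        # "defaultSiteLibrary" targets the site's default document library
        "container": {"name": "defaultSiteLibrary", "query": None},
    }

ds = sharepoint_datasource(
    "sharepoint-datasource",
    "https://contoso.sharepoint.com/teams/demo",  # hypothetical site
    "<app-id>", "<app-secret>", "<tenant-id>",
)
print(json.dumps(ds, indent=2))
```

The indexer and index definitions follow the same pattern: a JSON body sent to the service's REST endpoint with the admin api-key header and the preview api-version.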

The following components should be displayed when successfully configured:

AI Search Data Source


AI Search Indexer


AI Search Index


Test Chatbot with SharePoint Online document library data

With the AI Search configured to tap into the SharePoint Online library, we can now use Azure OpenAI Studio to test chatting with the data.

Launch Azure OpenAI Studio:


Select Add your data and Add a data source:


Select Azure AI Search as the data source:


Select the appropriate subscription, AI Search service, and the index that was created.

There is also an option to customize the field mapping rather than using the default.


These two screenshots show the customization options:


For those who have watched YouTube videos demonstrating this configuration, most presenters selected “content” for all the fields. As shown in the screenshot below, this is no longer allowed as of March 23, 2024; if such an attempt is made, the following error message is displayed:

You cannot use the same column data in multiple fields


Proceeding to the Data management configuration will reveal that Semantic search is not available:


Only Keyword is available:


Review the configuration and complete the setup:


You should now be able to chat with your data:


Thoughts and Options

As noted in the beginning of the Microsoft document, this preview feature isn’t recommended for production workloads and Microsoft is very clear in the limitations section indicating:

  • If you need a SharePoint content indexing solution in a production environment, consider creating a custom connector with SharePoint Webhooks, calling Microsoft Graph API to export the data to an Azure Blob container, and then use the Azure Blob indexer for incremental indexing.

Using the indexer against an Azure Storage account opens up text embedding model capabilities that provide vector and semantic search, which would yield much better results. However, if the requirement is simply to gain some light insight into a SharePoint document library then piloting this preview feature and waiting for it to GA may be a good initiative.

Monday, January 29, 2024

Updating aztfexport generated "res-#" resource names with PowerShell scripts

Happy new year! It has been an extremely busy start to 2024 for me with the projects I’ve been involved in, so I’ve fallen behind on a few of the blog posts I have had queued up since November of last year. While I still haven’t gotten to the backlog yet, I would like to quickly write this one as it was a challenge I came across while testing the aztfexport (Azure Export for Terraform) tool to export a set of Azure Firewall, VPN Gateway, and VNet resources in an environment. The following is the Microsoft documentation for this tool:

Quickstart: Export your first resources using Azure Export for Terraform

Those who have worked with this tool will know that the exported files it creates name the resources identified for import as:

  • res-0
  • res-1
  • res-2

… and so on. These references are used across multiple files, including:

  • aztfexportResourceMapping.json

While the generated files with these default names will work, they make it very difficult to identify what these resources are. One option is to manually update the files with search and replace, but anything over 20 resources quickly becomes tedious and error prone.

With this challenge, I decided to create 2 PowerShell scripts to automate the process of searching and replacing the names of res-0, res-1, res-2 and so on. The first script will parse the file:


… and extract the fields “id” and “to” into 2 columns, then create an additional 2 columns: one containing the “res-#” name and the other containing the name of the resource in Azure, writing the result to a CSV:


If the desire is to use the Azure names as the resource name then no changes are required. If alternate names are desired, then update the values for the Azure Resource Logical Name in the spreadsheet.
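The first script's parsing step can be sketched as follows (in Python here, while the actual script is PowerShell). The mapping schema assumed below, with Azure resource ID keys and a "resource_name" per entry, is my approximation and may differ between aztfexport versions, and the column headers are names I chose for illustration.

```python
import csv
import json
from pathlib import Path

# Read aztfexportResourceMapping.json and emit a CSV where the replacement
# name defaults to the Azure resource's own name (last segment of its ID).
def mapping_to_csv(mapping_path: str, csv_path: str) -> None:
    mapping = json.loads(Path(mapping_path).read_text())
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["AzureResourceId", "ResName", "AzureResourceLogicalName"])
        for resource_id, entry in mapping.items():
            res_name = entry["resource_name"]                    # e.g. "res-0"
            azure_name = resource_id.rstrip("/").split("/")[-1]  # name segment
            writer.writerow([resource_id, res_name, azure_name])
```

The third column is where you would overwrite the defaults with alternate names before running the second script.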

The second script will then reference this spreadsheet to search through the directory with the Terraform files and update the res-# values to the desired values.
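The second script's replacement logic can be sketched like this (again in Python, while the real script is PowerShell; the CSV column names are my assumed headers). Whole-word matching matters here, otherwise replacing "res-1" would also corrupt "res-10".

```python
import csv
import re
from pathlib import Path

# Read the res-# -> desired-name CSV and rewrite every reference in the .tf
# files of the export directory. Longest names are replaced first as an extra
# guard against partial-name collisions.
def replace_res_names(csv_path: str, tf_dir: str) -> None:
    with open(csv_path, newline="") as f:
        renames = {row["ResName"]: row["AzureResourceLogicalName"]
                   for row in csv.DictReader(f)}
    for tf_file in Path(tf_dir).glob("*.tf"):
        text = tf_file.read_text()
        for old, new in sorted(renames.items(), key=lambda kv: -len(kv[0])):
            text = re.sub(rf"\b{re.escape(old)}\b", new, text)
        tf_file.write_text(text)
```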

The two scripts can be found here in my GitHub repo:

Create the CSV file from - Extract-import-tf-file.ps1

Replace all references to res-# with desired values - Replace-Text-with-CSV-Reference.ps1

I hope this helps anyone who may be looking for this automated way to update exported Terraform code.

Wednesday, November 29, 2023

Python script that will asynchronously receive events from an Azure Event Hub and send it to a Log Analytics Workspace custom table

One of the key items I’ve been working on over the past week as a follow up to my previous post:

How to log the identity of a user using an Azure OpenAI service with API Management logging (Part 1 of 2)

… is to write a Python script that will read events as they arrive in an Event Hub, then send them over to a Log Analytics workspace’s custom table for logging. The topology is as such:


The main reason I decided to go with this method is that the following tutorial:

Tutorial: Ingest events from Azure Event Hubs into Azure Monitor Logs (Public Preview)

… requires the Log Analytics workspace to be linked to a dedicated cluster or to have a commitment tier. The lowest price for such a configuration would be cost prohibitive for me to deploy in a lab environment, so I decided to build this simple ingestion method.

Log Analytics Pricing Tiers:


I used various documentation available to create the script, create the App Registration, configure the Data Collection Endpoint and Data Collection Rule for the Log Analytics ingestion. Here are a few for reference:

Send events to or receive events from event hubs by using Python

Logs Ingestion API in Azure Monitor

Tutorial: Send data to Azure Monitor Logs with Logs ingestion API (Azure portal)
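The heart of the script is the transformation between the two services: an event pulled off the Event Hub arrives as a JSON string, and the Logs Ingestion API expects an array of objects whose fields match the custom table's Data Collection Rule stream. A minimal sketch of that step (the field names mirror the APIM policy from my earlier post; the sample event is hypothetical):

```python
import json
from datetime import datetime, timezone

# Convert raw Event Hub event bodies into rows for the Logs Ingestion API.
def events_to_rows(event_bodies):
    rows = []
    for body in event_bodies:
        record = json.loads(body)
        # Stamp the ingestion time; the DCR maps this into TimeGenerated
        record["TimeGenerated"] = datetime.now(timezone.utc).isoformat()
        rows.append(record)
    return rows  # POST json.dumps(rows) to the DCE -- note the enclosing array

rows = events_to_rows(['{"RequestId": "91ff7b54", "Name": "Terence Luk"}'])
print(json.dumps(rows, indent=2))
```

The actual script wraps this in an asynchronous Event Hub consumer and POSTs each batch to the Data Collection Endpoint with a bearer token.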

The script can be found at my GitHub repository here:

The following are some screenshots of the execution and output:

OpenAI API Call from Postman to API Management:


Script Execution and Output:


Log Analytics Ingestion Results:


I hope this helps anyone who might be looking for a script to process events and ingest them into Log Analytics, as it took me quite a bit of time on and off to troubleshoot the various issues I encountered. With this script out of the way, I am now prepared to finish the part 2 of 2 post for an end-to-end OpenAI logging solution, which I will be writing shortly.

Sunday, November 26, 2023

"204 No Content" returned in Postman when attempting to write logs to a data collection endpoint with a data collection rule for Log Analytics custom log ingestion

I’ve been working on my Part 2 of 2 post to demonstrate how we can use Event Hubs to capture the identity of incoming API access for the Azure OpenAI service published by an API Management and, while doing so, noticed an odd behavior when attempting to use the Logs Ingestion API as outlined here:

Logs Ingestion API in Azure Monitor


I configured all of the required components and wanted to test with Postman before updating the Python script I had for ingesting Event Hub logs, but noticed that I would constantly get a 204 No Content status returned with no entries added to the Log Analytics table I had set up. To make a long story short, the issue was that the JSON body I was submitting was not enclosed in square brackets []. Further tests showed that the same 204 No Content is returned whether or not the accepted format (with square brackets) is submitted.

The following is a demonstration of this in Postman.

The variables I have defined in Postman are:

  • Data_Collection_Endpoint_URI
  • DCR_Immutable_ID
  • client_id_Log_Analytics
  • client_secret_Log_Analytics


The following are where we can retrieve the values:

The Data_Collection_Endpoint_URI can be retrieved by navigating to the Data collection endpoint you set up:


The DCR_Immutable_ID can be retrieved in the JSON view of the Data collection rule that was setup:


The client_id_Log_Analytics is located in the App Registration object:


The client_secret_Log_Analytics is the secret setup for the App Registration:


You’ll also need your tenant ID for the tenantId variable.

Set up the authorization tab in Postman with the following configuration:

Type: OAuth 2.0

Add authorization data to: Request Headers

Token: Available Tokens

Header Prefix: Bearer

Token Name: <Name of preference>

Grant type: Client Credentials

Access Token URL:{{tenant_id}}/oauth2/v2.0/token

Client ID: {{client_id_Log_Analytics}}

Client Secret: {{client_secret_Log_Analytics}}


Client Authentication: Send as Basic Auth header

Leave the rest as default and click on Get New Access Token:


The token should be successfully retrieved:


Click on Use Token:


Configure a POST request with the following URL:


The Custom-APIMOpenAILogs_CL value can be retrieved in the JSON View of the Data collection rule:


Proceed to configure the following for the Params tab:

api-version: 2021-11-01-preview


The Authorization key should be filled out with the token that was retrieved.

Set the Content-Type to application/json.


For the body, let’s test with the JSON content WITHOUT the square brackets:


"EventTime": "11/24/2023 8:19:57 PM",

"ServiceName": "",

"RequestId": "91ff7b54-a0eb-4ada-8d27-6081f71e44a3",

"RequestIp": "",

"OperationName": "Creates a completion for the chat message",

"apikey": "6f82e8f56e604e6cae6e0999e6bdc013",

"requestbody": {

"messages": [


"role": "user",

"content": "Testing without brackets."



"temperature": 0.7,

"top_p": 0.95,

"frequency_penalty": 0,

"presence_penalty": 0,

"max_tokens": 800,

"stop": null



"AppId": "12bccc26-b778-4a2d-bb7a-4f5732e7a79d",

"Oid": "ee116d59-d49b-4557-b2ab-c911f451d5c8",

"Name": "Terence Luk"



Notice the returned 204 status:

204 No Content

The server successfully processed the request, but is not returning any content.

No matter how long you wait, the log is never written to Log Analytics.


Now WITH square brackets:


Notice the same 204 status is returned:


However, using the square brackets shows that the log entry is successfully written:


GitHub issues and forum posts from others indicate this appears to be the expected behavior: a 204 is returned either way, but the entry is only written when the square brackets are included.
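Given that the 204 gives no hint either way, it is worth normalizing the payload in code before every call. A small sketch of the guard my ingestion script uses under this assumption:

```python
import json

# The Logs Ingestion API only writes rows when the body is a JSON array,
# so wrap a single record in a list before serializing.
def to_ingestion_body(payload):
    if isinstance(payload, dict):
        payload = [payload]
    return json.dumps(payload)

body = to_ingestion_body({"RequestId": "91ff7b54", "Name": "Terence Luk"})
```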

I will be including the instructions on setting up the App Registration, Data Collection Endpoint, Data Collection Rule, and other components in my part 2 of 2 post for logging the identity of an OpenAI call through the API Management.

Friday, November 17, 2023

How to log the identity of a user using an Azure OpenAI service with API Management logging (Part 1 of 2)

The single question I’ve been asked the most over the past few months by colleagues, clients, and other IT professionals is: how can we identify exactly who is using the Azure OpenAI service so we can generate accurate consumption reports and allow proper chargeback to a department? Those who have worked with the diagnostic settings for Azure OpenAI and API Management will know that logging is available, but there are gaps that desperately need to be addressed. A quick search on the internet will show that API Management can log the caller’s IP address, but that isn’t very useful for obvious reasons such as:

  1. If it’s public traffic with a public inbound IP address, how would we be able to tell who the user is?
  2. Even if we can tie a public IP address to an organization because that’s the outbound NAT, the identity of the user is not captured
  3. Even if we authenticate the user so a JWT token is provided to call the API, having the public IP address in the logs alone wouldn’t identify the user
  4. If these were private IP addresses, it would be a nightmare to try and match the inbound IP address with an internal workstation’s IP address that is likely DHCP

I believe the first time I was asked this question was 3 months ago, and I’ve always thought that Microsoft would likely address it soon with a checkbox in the diagnostic settings or some other easy-to-configure offering. Fast forward to today (November 2023) and I haven’t seen a solution, so I thought I’d do a bit of R&D over the weekend.

The closest solution I was able to find is from this DevRadio presentation:

Azure OpenAI scalability using API Management

… where the presenter used multiple instances of Azure OpenAI to separate prompts belonging to different business units. While this solution allowed costs to be separated between predefined business units, telling a client that I need multiple instances for this purpose didn’t seem like something they would be receptive to. While the DevRadio solution did not meet my requirements, it did give me the idea that perhaps I could use Azure API Management’s feature for logging events to Event Hubs to accomplish what I wanted.

I have to say that this blog post is probably one of the most exciting ones I’ve written in a while: I was heads-down learning and testing Azure API Management’s inbound processing capabilities over 3 days of my vacation time off, and felt extremely fulfilled that I now have an answer to something I could not provide a solution for in months.

If you’re still reading this, you might be wondering why the title is labeled Part 1 of 2. The reason is that I ran out of time and am back to a busy work schedule, so I could not finish the last portion of this solution. Don’t worry, though: what I cover in Part 1 will at least capture the information needed to identify the calling user. Here is a summary of what I cover in this blog post:

  1. How to set up API Management to log events to Event Hub
  2. What inbound processing code should be inserted to send the OAuth JWT token to event hub
  3. What inbound processing code can be used to extract any values in the JWT token to event hub
  4. How to view the logged entries in event hub

The following is what I will cover in Part 2 in a future post:

  1. How to ingest events from Azure Event Hubs into Azure Monitor Logs
  2. How to use KQL to join events logged by API Management’s diagnostic settings (containing token usage, prompt information) with Azure Event Hub ingested logs (containing user identification)

The following is a high level architecture diagram and the flow of the traffic:


I’m excited to get this post published so let’s get started.


This solution will require us to place an Azure API Management service in front of the Azure OpenAI service so API calls are:

  1. Logged by the APIM
  2. Authorized with OAuth by the API Management

Please refer to my previous post for how to set this up:

Securing Azure OpenAI with API Management to only allow access for specified Azure AD users

What is available today out-of-the-box: API Management Diagnostic Settings Logging Capabilities

Assuming you have configured the API Management service as I demonstrated in my prerequisite section and Diagnostics Logging is turned on:


… then a set of information for each API call will be logged in the configured Log Analytics workspace. Let’s first review what is available out-of-the-box for API Management. The complaint I hear repeatedly is that while the logs captured by API Management provide all of the following great information:

  • TenantId
  • TimeGenerated [UTC]
  • OperationName
  • CorrelationId
  • Region
  • IsRequestSuccess
  • Category
  • TotalTime
  • CallerIpAddress
  • Method
  • Url
  • ClientProtocol
  • ResponseCode
  • BackendMethod
  • BackendUrl
  • BackendResponseCode
  • BackendProtocol
  • RequestSize
  • ResponseSize
  • Cache
  • BackendTime
  • ApiId
  • OperationId
  • ApimSubscriptionId
  • ApiRevision
  • ClientTlsVersion
  • RequestBody
  • ResponseBody
  • BackendRequestBody

None of these captured fields allow for identifying the caller. To address this gap, we can leverage the log-to-eventhub inbound processing feature of API Management and Event Hubs to send additional information about the inbound API call to an event hub, then process it according to our requirements.

Turning on the logging of events for the API Management to Event Hubs

The first step for this solution is to turn on the feature that has API Management log to an Event Hub. I won’t go into my usual level of detail for setting up the components due to my limited time, but begin by creating an Event Hubs namespace and event hub, as shown in the following screenshots, to serve as a destination for the APIM to send its logs:


Once the Event Hubs namespace and event hub are created, and the API Management’s system-assigned managed identity is granted access, we will use the following instructions to turn on the feature in API Management and use the Event Hub:

Logging with Event Hub

More detail about how the API Management is configured is described here:

How to log events to Azure Event Hubs in Azure API Management

Configuring API Management’s Inbound Processing rule to log JWT token and its values

The API Management log-to-eventhub policy can send any type of information to the Event Hub. For this post, I am going to demonstrate how to send the following information:

  • EventTime
  • ServiceName
  • RequestId
  • RequestIp
  • OperationName
  • api-key
  • request-body
  • JWTToken
  • AppId
  • Oid
  • Name

Let’s go through these fields in a bit more detail. The following list of fields:

  • EventTime
  • ServiceName
  • RequestId
  • RequestIp
  • OperationName
  • request-body

… are ones that can be retrieved from the out-of-the-box diagnostic settings logs. I haven’t looked into all the available fields, but I suspect we could send all of the out-of-the-box diagnostic fields to the event hub to recreate what we have, potentially allowing us to turn off the built-in logging. The advantage of such an approach is that all logs would be stored in a single Log Analytics workspace table. The disadvantage is that if new fields are introduced into the built-in logs, we would need to update our log-to-eventhub code to capture those fields.

The other fields:

  • api-key
  • JWTToken
  • AppId
  • Oid
  • Name

… are the ones we’re looking for. The api-key probably isn’t as important, but I wanted to include it to show that it can be captured. The JWT token passed to API Management is captured, and while it can be copied out and decoded, it isn’t very useful if we’re trying to use KQL to generate reports. The remaining fields, AppId, Oid, and Name, which are probably what everyone is looking for, are extracted from claims in the JWT token. These fields are just examples I included in the demonstration, and it is possible to extract any other claim you like by adding to the inbound processing XML code.
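To see why these claims are extractable, here is the same claim lookup done in plain Python against a hypothetical unsigned token built for demonstration; the policy's AsJwt() performs the equivalent decode of the base64url-encoded claims segment (with signature validation handled by validate-jwt).

```python
import base64
import json

# A JWT is three base64url segments separated by dots; the middle segment is
# the JSON claims payload.
def get_claims(jwt, *names):
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return {n: claims.get(n, "") for n in names}

# Hypothetical unsigned token for demonstration only:
demo_claims = {"appid": "12bccc26", "oid": "ee116d59", "name": "Terence Luk"}
demo_payload = base64.urlsafe_b64encode(
    json.dumps(demo_claims).encode()).decode().rstrip("=")
token = f"header.{demo_payload}.signature"
print(get_claims(token, "appid", "oid", "name"))
```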

Navigate to the API Management service, APIs blade, Azure OpenAI Service API, All Operations, then click on the </> policy code editor icon under Inbound processing:


The following is the XML code insert that you’ll need so that the fields listed above will be captured and sent to the Event Hub:







<policies>
    <inbound>
        <base />
        <set-header name="api-key" exists-action="append">
        </set-header>
        <validate-jwt header-name="Authorization" failed-validation-httpcode="403" failed-validation-error-message="Forbidden" output-token-variable-name="jwt-token">
            <openid-config url="https://login.microsoftonline.com/{{Tenant-ID}}/v2.0/.well-known/openid-configuration" />
            <required-claims>
                <claim name="roles" match="any">
                </claim>
            </required-claims>
        </validate-jwt>
        <set-variable name="request" value="@(context.Request.Body.As<JObject>(preserveContent: true))" />
        <set-variable name="api-key" value="@(context.Request.Headers.GetValueOrDefault("api-key",""))" />
        <set-variable name="jwttoken" value="@(context.Request.Headers.GetValueOrDefault("Authorization",""))" />
        <log-to-eventhub logger-id="event-hub-logger">@{
            var jwt = context.Request.Headers.GetValueOrDefault("Authorization","").AsJwt();
            var appId = jwt.Claims.GetValueOrDefault("appid", string.Empty);
            var oid = jwt.Claims.GetValueOrDefault("oid", string.Empty);
            var name = jwt.Claims.GetValueOrDefault("name", string.Empty);
            return new JObject(
                new JProperty("EventTime", DateTime.UtcNow.ToString()),
                new JProperty("ServiceName", context.Deployment.ServiceName),
                new JProperty("RequestId", context.RequestId),
                new JProperty("RequestIp", context.Request.IpAddress),
                new JProperty("OperationName", context.Operation.Name),
                new JProperty("api-key", context.Variables["api-key"]),
                new JProperty("request-body", context.Variables["request"]),
                new JProperty("JWTToken", context.Variables["jwttoken"]),
                new JProperty("AppId", appId),
                new JProperty("Oid", oid),
                new JProperty("Name", name)
            ).ToString();
        }</log-to-eventhub>
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>




The XML code can be found at my GitHub Repository:

Proceed to click the Save button, and additional set-variable and log-to-eventhub policies should be displayed under Inbound processing:


With the API Management’s inbound processing rule updated, initiate API calls to the APIM to generate request traffic and let it capture the information. Once a few requests have been made, navigate to the event hub and then Process data:


Within the Process data blade, click on the Start button for Enable real time insights from events:


Click on the Test Query button to load the captured logs:


The logs typically take a minute or two to show up, so if no logs are displayed, try executing the test query again after a few minutes:


We can see that it is possible to edit the inbound processing policy to recreate the type of log entries produced by API Management’s out-of-the-box diagnostic settings. If that is not desired, it is also possible to map the logs in the event hub to the diagnostic settings logs by matching the RequestId from the event hub logs with the CorrelationId of the APIM diagnostic settings logs, as shown in the screenshots below:

RequestID from Event Hub


CorrelationId from API Management Diagnostic Settings Logs


Note that there are different views available in the Event Hub logs. Below is a Raw view displayed as JSON:


As mentioned earlier, the JWT token passed for authorization is captured, and it is possible to decode the value to view the full payload. If any additional claims are desired, the inbound processing policy can be modified to capture them:


Now that we have the JWT token information captured, we can send the Azure Event Hubs logs into a Log Analytics Workspace and join the 2 tables together with KQL. I will be providing a walkthrough for how to accomplish this as outlined in this document:

Tutorial: Ingest events from Azure Event Hubs into Azure Monitor Logs (Public Preview)

… in the part 2 of my future post.
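In the meantime, the join that Part 2 will express in KQL can be prototyped in plain Python to show the idea: event hub rows carry the identity keyed by RequestId, and APIM diagnostic rows carry the usage keyed by CorrelationId. The sample rows below are hypothetical.

```python
# Attach the identity from the Event Hub logs to each APIM diagnostic row by
# matching RequestId (Event Hub) against CorrelationId (diagnostic settings).
def join_logs(eventhub_rows, apim_rows):
    identity = {r["RequestId"]: r for r in eventhub_rows}
    return [
        {**apim, "Name": identity[apim["CorrelationId"]]["Name"]}
        for apim in apim_rows
        if apim["CorrelationId"] in identity
    ]

eh = [{"RequestId": "r1", "Name": "Terence Luk"}]
ap = [{"CorrelationId": "r1", "TotalTime": 12},
      {"CorrelationId": "r2", "TotalTime": 5}]
print(join_logs(eh, ap))
```

In KQL this becomes an inner join of the two tables on those columns, which is what the part 2 walkthrough will build.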

I hope this helps anyone out there looking for a way to capture the identity of the user using the Azure OpenAI Service.