Pages

Monday, March 25, 2024

Accessing SharePoint document library with AI Search for Azure OpenAI GPT

The first quarter of this year has been insanely busy, which has limited my ability to blog. I have been carving out whatever time I have over the weekends to continue testing new Azure AI services, but I haven't been able to clear out my backlog of blog topics.

One of my recent tests was the in-preview feature of using Azure AI Search (previously known as Cognitive Search) to index a SharePoint Online document library. This feature has interested me because of the vast number of SharePoint libraries I work with across different clients, and the ability to use Azure OpenAI to tap into those libraries would be very attractive. Copilot Studio offers a similar feature where you can easily configure a SharePoint URL for a copilot to tap into, but I still prefer Azure AI services because the flexibility that development offers allows for much more creative solutions.

With the above said, the purpose of this post is to provide two scripts:

  1. Script to create the App Registration with the appropriate permissions to access SharePoint Online
  2. Script that will create the AI Search data source, indexer, and index

The deployment will follow the same steps found in the following Microsoft document: https://learn.microsoft.com/en-us/azure/search/search-howto-index-sharepoint-online

Please take the time to read the supported document formats and limitations of this feature.

Step 1 – Enable a system-assigned managed identity

This step is optional and isn't needed for the configuration in this blog post because we'll be including the tenant ID in the connection string of the data source. However, if your environment uses AI Search to access storage accounts, then this will likely already be enabled.

Step 2 – Decide which permissions the indexer requires

There are advantages and disadvantages to going either way. This post will demonstrate using application permissions, but the script also includes commented-out code for delegated permissions.

Step 3 - Create a Microsoft Entra application registration that will be used to access the SharePoint document library

Use the following script to create the App Registration that will configure the required permissions: https://github.com/terenceluk/Azure/blob/main/AI%20Services/SharePoint%20Online%20Indexer/Create-App-Registration.sh

Note that there does not appear to be a way for Azure CLI to configure Platform configurations so you'll need to manually perform the following after the App Registration is created:

  1. Navigate to the Authentication tab of the App Registration
  2. Set Allow public client flows to Yes then select Save.
  3. Select + Add a platform, then Mobile and desktop applications, then check https://login.microsoftonline.com/common/oauth2/nativeclient, then Configure.
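If you prefer to script these manual steps as well, they correspond to properties on the application object in Microsoft Graph. The sketch below is only an illustration in Python: it builds the JSON payload you would send in a PATCH request to https://graph.microsoft.com/v1.0/applications/{object-id} (the object ID placeholder is hypothetical), with property names taken from the Graph application resource:

```python
import json

# Hypothetical object ID of the App Registration created by the script above
app_object_id = "<app-registration-object-id>"

# Payload for PATCH https://graph.microsoft.com/v1.0/applications/{app_object_id}
# - isFallbackPublicClient corresponds to "Allow public client flows = Yes"
# - publicClient.redirectUris corresponds to the "Mobile and desktop applications"
#   platform with the nativeclient redirect URI checked
payload = {
    "isFallbackPublicClient": True,
    "publicClient": {
        "redirectUris": [
            "https://login.microsoftonline.com/common/oauth2/nativeclient"
        ]
    },
}

print(json.dumps(payload, indent=2))
```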

Steps 4 to 7 – Create SharePoint data source, indexer, index, and get properties of index

Use the following PowerShell script to create and configure the above components in the desired AI Search: https://github.com/terenceluk/Azure/blob/main/AI%20Services/SharePoint%20Online%20Indexer/Configure-AI-Search-for-SharePoint.ps1
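As a rough sketch of what such a script submits to the AI Search REST API, the Python structures below follow the shapes shown in the Microsoft document linked earlier. The names, site URL, application ID, and tenant ID are placeholders, and the index fields are a trimmed subset of the metadata_spo_* properties the indexer can emit:

```python
# Sketch of the three REST bodies the script creates, per the Microsoft
# document linked above. All angle-bracket values are placeholders.

# Data source: type "sharepoint" with the tenant ID in the connection string
data_source = {
    "name": "sharepoint-datasource",
    "type": "sharepoint",
    "credentials": {
        "connectionString": (
            "SharePointOnlineEndpoint=<site-url>;"
            "ApplicationId=<app-id>;TenantId=<tenant-id>"
        )
    },
    "container": {"name": "defaultSiteLibrary"},
}

# Index: field names follow the metadata_spo_* properties the indexer emits
index = {
    "name": "sharepoint-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "searchable": False},
        {"name": "metadata_spo_item_name", "type": "Edm.String", "searchable": True},
        {"name": "metadata_spo_item_path", "type": "Edm.String", "searchable": False},
        {"name": "metadata_spo_item_last_modified", "type": "Edm.DateTimeOffset", "searchable": False},
        {"name": "content", "type": "Edm.String", "searchable": True},
    ],
}

# Indexer: ties the data source to the index
indexer = {
    "name": "sharepoint-indexer",
    "dataSourceName": data_source["name"],
    "targetIndexName": index["name"],
}
```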

The following components should be displayed when successfully configured:

  • AI Search Data Source
  • AI Search Indexer
  • AI Search Index

Test Chatbot with SharePoint Online document library data

With the AI Search configured to tap into the SharePoint Online library, we can now use Azure OpenAI Studio to test chatting with the data.

Launch Azure OpenAI Studio.

Select Add your data, then Add a data source.

Select Azure AI Search as the data source.

Select the appropriate subscription, AI Search service, and the index that was created.

There is also an option to customize the field mapping rather than using the default.
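For context, the mapping being customized here resembles the fieldsMapping block that the Azure OpenAI "on your data" API accepts. The Python sketch below is only an illustration; the field names come from the SharePoint index, and each setting points at a different index column:

```python
# Sketch of a field mapping for the "Add your data" wizard, using the
# metadata_spo_* fields the SharePoint indexer populates. Each setting
# points at a distinct index field.
fields_mapping = {
    "contentFields": ["content"],
    "titleField": "metadata_spo_item_name",
    "filepathField": "metadata_spo_item_path",
}

print(fields_mapping)
```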


For those who have watched the YouTube videos demonstrating this configuration, most of them select "content" for all of the fields. As of March 23, 2024, this is no longer allowed; if such an attempt is made, the following error message will be displayed:

You cannot use the same column data in multiple fields

Proceeding to the Data management configuration will reveal that Semantic search is not available; only Keyword search can be selected.

Review the configuration and complete the setup.

You should now be able to chat with your data.

Thoughts and Options

As noted at the beginning of the Microsoft document, this preview feature isn't recommended for production workloads, and Microsoft is very clear in the limitations section, indicating:

  • If you need a SharePoint content indexing solution in a production environment, consider creating a custom connector with SharePoint Webhooks, calling Microsoft Graph API to export the data to an Azure Blob container, and then use the Azure Blob indexer for incremental indexing.

Using the indexer against an Azure Storage account opens up text embedding model capabilities that provide vector and semantic search, which would yield much better results. However, if the requirement is simply to gain some light insight into a SharePoint document library, then piloting this preview feature while waiting for it to reach GA may be a good approach.
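To make that recommended pattern a little more concrete, the Python sketch below illustrates the incremental piece of it: compare each SharePoint item's lastModifiedDateTime (a property Microsoft Graph returns when listing drive items) against the time it was last exported to the Blob container, and only re-export what changed. The item list and export records here are hypothetical in-memory stand-ins; a real implementation would call Microsoft Graph and the Azure Blob SDK.

```python
from datetime import datetime

# Hypothetical snapshot of drive items as returned by a Microsoft Graph
# listing (only the two properties this sketch needs).
sharepoint_items = [
    {"name": "policy.docx", "lastModifiedDateTime": "2024-03-20T10:00:00Z"},
    {"name": "budget.xlsx", "lastModifiedDateTime": "2024-03-24T08:30:00Z"},
]

# Hypothetical record of when each blob was last exported
last_exported = {
    "policy.docx": "2024-03-21T00:00:00Z",  # already newer than the source item
    # budget.xlsx has never been exported
}

def parse(ts: str) -> datetime:
    # Graph returns ISO 8601 timestamps with a trailing "Z"
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def items_to_export(items, exported):
    """Return the item names that changed since the last export."""
    stale = []
    for item in items:
        previous = exported.get(item["name"])
        if previous is None or parse(item["lastModifiedDateTime"]) > parse(previous):
            stale.append(item["name"])
    return stale

print(items_to_export(sharepoint_items, last_exported))  # only budget.xlsx is stale
```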

Monday, January 29, 2024

Updating aztfexport generated "res-#" resource names with PowerShell scripts

Happy new year! It has been an extremely busy start to 2024 for me with the projects I've been involved in, so I've fallen behind on a few of the blog posts I have had queued up since November of last year. While I still haven't gotten to the backlog yet, I would like to quickly write this one, as it was a challenge I came across while testing the aztfexport (Azure Export for Terraform) tool to export a set of Azure Firewall, VPN Gateway, and VNet resources in an environment. The following is the Microsoft documentation for this tool:

Quickstart: Export your first resources using Azure Export for Terraform
https://learn.microsoft.com/en-us/azure/developer/terraform/azure-export-for-terraform/export-first-resources?tabs=azure-cli

Those who have worked with this tool will know that in the exported files it creates, the resources identified for import are named:

  • res-0
  • res-1
  • res-2

… and so on. These references are used across the following files:

  • aztfexportResourceMapping.json
  • import.tf
  • main.tf

While the generated files with these default names will work, the names make it very difficult to identify what the resources are. One option is to manually update the files with search and replace, but anything over 20 resources can quickly become tedious and error prone.

With this challenge, I decided to create two PowerShell scripts to automate the process of searching for and replacing the names res-0, res-1, res-2, and so on. The first script parses the import.tf file and extracts the "id" and "to" fields into two columns of a CSV, then adds two more columns: one containing the "res-#" value and the next containing the name of the resource in Azure.

If the desire is to use the Azure names as the resource names, then no changes are required. If alternate names are desired, then update the values in the Azure Resource Logical Name column of the CSV.

The second script will then reference this CSV to search through the directory containing the Terraform files and update the res-# values to the desired values.
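As a rough illustration of what the two scripts do together (sketched in Python rather than the actual PowerShell, with simplified stand-ins for real aztfexport output):

```python
import re

# Simplified stand-in for an aztfexport-generated import.tf
import_tf = '''
import {
  id = "/subscriptions/0000/resourceGroups/rg-hub/providers/Microsoft.Network/azureFirewalls/afw-hub"
  to = azurerm_firewall.res-0
}
import {
  id = "/subscriptions/0000/resourceGroups/rg-hub/providers/Microsoft.Network/virtualNetworks/vnet-hub"
  to = azurerm_virtual_network.res-1
}
'''

# Script 1: extract "id"/"to" pairs, then derive res-# and the Azure name
rows = []
for id_value, to_value in re.findall(r'id = "([^"]+)"\s+to = (\S+)', import_tf):
    res_name = to_value.split(".")[-1]        # e.g. res-0
    azure_name = id_value.rsplit("/", 1)[-1]  # last segment of the ID, e.g. afw-hub
    rows.append({"Id": id_value, "To": to_value,
                 "ResName": res_name, "AzureResourceLogicalName": azure_name})

# Script 2: replace each res-# across the Terraform files with the chosen name.
# Note: a real replacement should match whole tokens so res-1 never clobbers
# part of res-10; the plain replace below is enough for this tiny example.
main_tf = ('resource "azurerm_firewall" "res-0" {}\n'
           'resource "azurerm_virtual_network" "res-1" {}\n')
for row in rows:
    main_tf = main_tf.replace(row["ResName"], row["AzureResourceLogicalName"])

print(main_tf)
```

The whole-token caveat in the comment is one reason a script beats ad-hoc search and replace once the resource count grows.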

The two scripts can be found here in my GitHub repo:

Create the CSV file from Import.tf - Extract-import-tf-file.ps1
https://github.com/terenceluk/Azure/blob/main/PowerShell/Extract-import-tf-file.ps1

Replace all references to res-# with desired values - Replace-Text-with-CSV-Reference.ps1
https://github.com/terenceluk/Azure/blob/main/PowerShell/Replace-Text-with-CSV-Reference.ps1

I hope this helps anyone who may be looking for this automated way to update exported Terraform code.