The first quarter of this year has been insanely busy and has led to my inability to blog as much. I have been carving whatever time I have over the weekends to continue testing new Azure AI Services but couldn’t find the time to clear out my backlog of blog topics.
One of the recent tests I’ve done is to test out the in-preview feature of using Azure AI Search (previously known as Cognitive Search) to index a SharePoint Online document library. This feature has been one that I had interest in because of the vast amounts of SharePoint libraries I work with across different clients and the ability to use Azure OpenAI to tap into the libraries would be very attractive. CoPilot Studio offers such a feature where you can easily configure a SharePoint URL for a CoPilot to tap into but I still prefer Azure AI Services as I feel the flexibility that development offers provides much more creative ideas.
With the above said, the purpose of this post is to provide 2 scripts:
- Script to create the App Registration with the appropriate permissions to access SharePoint Online
- Script that will create the AI Search data source, indexer, and index
The deployment will follow the same found in the following Microsoft document: https://learn.microsoft.com/en-us/azure/search/search-howto-index-sharepoint-online
Please take the time to read the supported document formats and limitations of this feature.
Step 1 – Enable system managed assigned managed identity
This step is optional and isn’t needed for the configuration in this blog post because we’ll be including the tenant ID in the connection string of the data source but if your environment uses AI Search to access storage accounts then this will likely already be enabled.
Step 2 – Decide which permissions the indexer requires
There are advantages and disadvantages to go either way. This post will demonstrate using application permissions but the script also includes commented out code for delegated.
Step 3 - Create a Microsoft Entra application registration that will be used to access the SharePoint document library
Use the following script to create the App Registration that will configure the required permissions: https://github.com/terenceluk/Azure/blob/main/AI%20Services/SharePoint%20Online%20Indexer/Create-App-Registration.sh
Note that there does not appear to be a way for Azure CLI to configure Platform configurations so you'll need to manually perform the following after the App Registration is created:
- Navigate to the Authentication tab of the App Registration
- Set Allow public client flows to Yes then select Save.
- Select + Add a platform, then Mobile and desktop applications, then check https://login.microsoftonline.com/common/oauth2/nativeclient, then Configure.
Step 4 to 7 – Create SharePoint data source, indexer, index, and get properties of index
Use the following PowerShell script to create and configure the above components in the desired AI Search: https://github.com/terenceluk/Azure/blob/main/AI%20Services/SharePoint%20Online%20Indexer/Configure-AI-Search-for-SharePoint.ps1
The following components should be displayed when successfully configured:
AI Search Data Source
AI Search Indexer
AI Search Index
Test Chatbot with SharePoint Online document library data
With the AI Search configured to tap into the SharePoint Online library, we can now use the Azure Open AI Studio to test chatting with the data.
Launch Azure Open AI Studio:
Select Add your data and Add a data source:
Select Azure AI Search as the data source:
Select the appropriate subscription, AI Search service, and the index that was created.
There is also an option to customize the field mapping rather than using the default.
These two screenshots show the customization options:
You cannot use the same column data in multiple fields
Proceeding to the Data management configuration will reveal that Semantic search is not available:
Only Keyword is available:
Review the configuration and complete the setup:
You should now be able to chat with your data:
Thoughts and Options
As noted in the beginning of the Microsoft document, this preview feature isn’t recommended for production workloads and Microsoft is very clear in the limitations section indicating:
- If you need a SharePoint content indexing solution in a production environment, consider creating a custom connector with SharePoint Webhooks, calling Microsoft Graph API to export the data to an Azure Blob container, and then use the Azure Blob indexer for incremental indexing.
Using the indexer against an Azure Storage account opens up text embedding model capabilities that provide vector and semantic search, which would yield much better results. However, if the requirement is simply to gain some light insight into a SharePoint document library then piloting this preview feature and waiting for it to GA may be a good initiative.