Tuesday, March 26, 2019

Skype for Business Server 2019 Front-End service stuck at Starting status

Problem

You’ve just finished deploying a new Skype for Business Server 2019 server into an environment, but the Skype for Business Server Front-End service remains stuck at the Starting status and never reaches Running or stops.

Executing the cmdlet Get-CsWindowsService confirms that the Front-End service has not finished starting.
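
As a rough sketch, the check can be narrowed to the Front-End service alone. This assumes the Skype for Business Server Management Shell is open on the affected server and that the Front-End service uses its usual short name, RTCSRV:

```powershell
# Show only the Front-End service instead of every Skype for Business service.
# RTCSRV is the usual short name of the Front-End service; adjust it if your
# deployment differs.
Get-CsWindowsService -Name "RTCSRV"

# The built-in service cmdlet reports the same Windows service status.
Get-Service -Name "RTCSRV" | Format-List Name, DisplayName, Status
```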

Reviewing the Lync Server event log shows the following entries:

Log Name: Lync Server

Source: LS User Services

Event ID: 32174

Level: Warning

Server startup is being delayed because fabric pool manager has not finished initial placement of users.

Currently waiting for routing group: {63BB8586-A9D8-5AF2-83FF-B5CE680594C0}.

Number of groups potentially not yet placed: 1.

Total number of groups: 1.

Cause: This is normal during cold-start of a Pool and during server startup.

If you continue to see this message many times, it indicates that insufficient number of Front-Ends are available in the Pool.

Resolution:

During a cold-start of a large Pool it can take up to an hour for the placement process to finish as it needs to populate all the Front-End databases with data from the Backup Store. If the Pool is running and the Front-End is just started, this is normal for some time. If this repeats for a long time, ensure that all the Front-Ends configured for this Pool are up and running. If multiple Front-Ends have been recently decommissioned, run Reset-CsPoolRegistrarState -ResetType QuorumLossRecovery to enable the Pool to recover from Quorum Loss and make progress.

Scrolling upwards from the warning displays the following error:

Log Name: Lync Server

Source: LS MCU Infrastructure

Event ID: 61029

Level: Error

In the past 30.0093507983333 minutes the process RtcHost(6756) received 1 invalid certificates. The last one was from server: contsfbstd01.contoso.com, IP Address: 10.198.40.152:60873, with subject: CN=contsfbstd01.contoso.com, OU=IT, O=contoso Re, L=Hamilton, S=Hamilton, C=BM, issued by: CN=contoso-CA, DC=contoso, DC=com. Validation error code was: 800B0109.

Resolution:

Please check the remote server and ensure that the certificate is valid. Also ensure that the full certificate chain of the Issuer is present in the local machine. If the remote certificate and chain appear to be valid and error code is 0x800B0109 (CERT_E_UNTRUSTEDROOT), check that the ROOT certificate store on the local machine does not contain any intermediate certificates (certificates with different values in 'Issued To' and 'Issued By' fields do not belong to the ROOT store and cause client certificate validation errors in HTTP.SYS)
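
Instead of scrolling through Event Viewer, the two Lync Server entries quoted above can be pulled with Get-WinEvent. This is only a sketch: the log name and event IDs are taken from the entries above, and the limit of 20 events is an arbitrary choice.

```powershell
# Pull the most recent occurrences of the two events from the "Lync Server" log.
# Get-WinEvent returns an error if no matching events exist.
Get-WinEvent -FilterHashtable @{ LogName = 'Lync Server'; Id = 32174, 61029 } -MaxEvents 20 |
    Sort-Object TimeCreated |
    Format-List TimeCreated, Id, LevelDisplayName, ProviderName, Message
```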

The following error is also logged:

Log Name: Microsoft-Service Fabric/Admin

Source: Microsoft-Service Fabric

Event ID: 4097

Level: Error

ignore error 0x80092013:certificate revocation list offline

You attempt to navigate to the directory:

C:\Program Files\Skype for Business Server 2019\Server\Core

… and edit the file:

ClusterManifests.Xml.Template

Changing the flag:

<Parameter Name="CrlCheckingFlag" Value="%CRLCHECKINGFLAG%" />

… to:

<Parameter Name="CrlCheckingFlag" Value="0" />

… which should disable CRL checking for the certificates, but this does not correct the issue.

Solution

The solution to this problem can actually be found in the Resolution text of the LS MCU Infrastructure error (Event ID 61029) shown earlier. Note the following portion of it:

If the remote certificate and chain appear to be valid and error code is 0x800B0109 (CERT_E_UNTRUSTEDROOT), check that the ROOT certificate store on the local machine does not contain any intermediate certificates (certificates with different values in 'Issued To' and 'Issued By' fields do not belong to the ROOT store and cause client certificate validation errors in HTTP.SYS)

The Front-End service is unable to start because there is a certificate stored in the Trusted Root Certification Authorities store that isn’t actually a root certificate. To check this, open the local computer’s certificate store (certlm.msc) and review the certificates in Trusted Root Certification Authorities, ensuring that the Issued To value of each one matches its Issued By value word for word.
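
The same review can be done from PowerShell rather than by eye. The sketch below lists every certificate in the local computer's Trusted Root store whose subject differs from its issuer. Run it in an elevated PowerShell session on the Front-End server; the choice of output columns is just a suggestion.

```powershell
# List certificates in the Trusted Root Certification Authorities store of the
# local computer whose Subject (Issued To) does not match the Issuer (Issued By).
# A true root certificate is self-signed, so both fields are identical.
Get-ChildItem -Path Cert:\LocalMachine\Root |
    Where-Object { $_.Subject -ne $_.Issuer } |
    Format-List Subject, Issuer, Thumbprint, NotAfter
```

If the command returns nothing, every certificate in the store is self-signed and the cause lies elsewhere.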

The offending certificate I found in the Trusted Root Certification Authorities store had an Issued To of arersa01.domain.com while the Issued By was RSA root CA for arersa01.domain.com.

Opening the properties of this certificate shows that it is actually an Intermediate Certification Authority certificate.

Either removing the certificate or moving it to the appropriate store (Intermediate Certification Authorities) will correct the issue.
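
For those who prefer the shell over certlm.msc, the move could be sketched as follows. The thumbprint placeholder and the C:\Temp backup folder are assumptions for illustration and must be replaced with real values. The backup step is my own precaution and is not required for the fix.

```powershell
# Placeholder: replace with the thumbprint of the misplaced certificate.
$thumbprint = '<thumbprint of the misplaced certificate>'
$cert = Get-Item -Path "Cert:\LocalMachine\Root\$thumbprint"

# Keep a copy of the certificate before changing anything.
# C:\Temp is an assumed folder and must already exist.
Export-Certificate -Cert $cert -FilePath "C:\Temp\$thumbprint.cer"

# Move the certificate from Trusted Root Certification Authorities (Root)
# to Intermediate Certification Authorities (CA) on the local computer.
$cert | Move-Item -Destination Cert:\LocalMachine\CA
```

To delete the certificate instead of moving it, Remove-Item against the same Cert:\LocalMachine\Root path does the job.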

Note that improperly placed certificates in certificate stores are known to cause service start and replication issues. The following are a few of my older posts about similar problems in Skype for Business / Lync Server environments:

Lync Server Access Edge service fails to start with: “… service-specific error code -2146762487”
http://terenceluk.blogspot.com/2013/05/lync-server-access-edge-service-fails.html

Lync Server 2013 Edge server replication issues on Windows Server 2012
http://terenceluk.blogspot.com/2013/04/lync-server-2013-edge-server.html
