Tuesday, October 1, 2013

Lync Server 2013 Edge server gets uninstalled when running “Setup or Remove Lync Server Components” after successful deployment

Problem

I ran into an interesting problem tonight while troubleshooting an Edge server whose services would not start, with the following event ID 19005 error logged in the Lync Server event log:

LS Audio/Video Authentication service could not be started.

Exception: Microsoft.Rtc.Management.ServiceConsumer.ServiceNotFoundException: Current topology configuration does not specify a service definition with a role name "EdgeServer" on the computer "LYNCEDGE01". If this service should be running on this computer, verify that the current topology configuration includes either the computer or the definition of the service. Otherwise, use the management tools to uninstall this service from this computer.
   at Microsoft.Rtc.Management.ServiceConsumer.ServiceConsumer.EnsureRegistrationWithTopologyWatcher()
   at Microsoft.Rtc.Management.ServiceConsumer.ServiceConsumer..ctor(RoleName roleName, EventHandler`1 configStoreConnectionStatusChangedEventHandler, EventHandler`1 myServiceDefinitionStatusChangedEventArgs)
   at Microsoft.Rtc.MRAS.Configuration..ctor(ConfigChangedHandler ConfigChangedEventHandler, RoleName roleName)
   at Microsoft.Rtc.MRAS.Core..ctor(ServiceStopHandler serviceStop, RoleName roleName)
   at Microsoft.Rtc.MRAS.Server.OnStart(RoleName roleName)
Cause: Internal error.
Resolution:
Examine the details in the associated event log entry to determine the potential cause and report to Product Support Services.


What’s interesting is that this Edge server was fully operational just a few days ago.  In hindsight, the error above should have given me enough information to identify the root cause, but instead of recognizing the issue, I went down another path. I will walk through that path now (sorry, but if you want, you can just scroll down to the solution) in case anyone ends up going down the same route before finding this post.

My first thought was to launch the Lync Server Deployment Wizard to review the deployment status and what surprised me was the following:

image

Administrators who have deployed Lync would recognize this as a screen that we would see if Lync wasn’t installed at all on this Edge server.  My next step was to re-export the CS configuration file then run Step 1: Install Local Configuration Store to see if it would complete:

image

… and while it did:

image

… the green check marks I was looking for beside the steps were still missing:

image

… so the next step I ran was Step 2: Setup or Remove Lync Server Components. That’s when I noticed the verbose output indicating that this server was not found in the topology, after which the wizard went on to uninstall all of the Lync Edge services.  Once the process completed, I was left looking at the same screen as the one above.

The other errors in the event logs consistently indicated that the Lync Edge server was not part of the topology:

Event ID 42012 Error:

Current topology configuration either does not include this machine or the definition for this service.

Cause: Possible topology configuration issue. This machine or the definition for this service might have been removed from topology configuration.
Resolution:
If this service should be running on this machine, ensure that the current topology configuration includes either the machine or the definition of the service. Otherwise, please use management tools to uninstall this service from this machine.


Event ID 41986 Error:

The service failed to start.

Error Code: 8000FFFF!_HRX!
Cause: Application Error
Resolution:
Check the previous event log entries and resolve them. Restart the server. If the problem persists contact Product Support Services.


Event ID 3040 Error:

The replication of certificates from the central management store to the local machine failed due to a problem assigning the certificate to the services running on the local machine. Microsoft Lync Server 2013, Replica Replicator Agent will continuously attempt to retry the replication. While this condition persists, the certificates on the local machine will not be updated.

Exception: Microsoft.Rtc.Management.Deployment.UnknownObjectException: "lyncedge01" is not a known computer in the topology. It may be raised by incomplete replication.
   at Microsoft.Rtc.Management.Deployment.Core.Computer.GetComputer(DeploymentContext context, Topology topology, Fqdn fqdn, Boolean bCurrentVersionOnly)
   at Microsoft.Rtc.Internal.Tools.Bootstrapper.Bootstrapper.ReplicateCMSCertificates().
Cause: The service running on the local machine is not configured correctly or cannot use the certificate.
Resolution:
Ensure that local services are properly setup.  Run Enable-CSComputer Power Shell command on the local machine to validate the configuration.


The last error appeared to indicate that there might be a certificate problem, so I checked that the root and intermediate certificates were in the correct stores.

Solution

After combing through all of the settings and reviewing Topology Builder, where the Edge server was clearly defined, I tried running the Enable-CsComputer PowerShell cmdlet on the Edge server and saw this output:

PS C:\Users\Administrator> enable-cscomputer
WARNING: enable-cscomputer failed.
WARNING: Detailed results can be found at
"C:\Users\Administrator\AppData\Local\Temp\2\enable-cscomputer-ca0c4ec1-dd36-4e3a-adfc-927fce8f58c7.html".
enable-cscomputer : Command execution failed: "lyncedge01" is not a known computer in the topology. It may be raised by incomplete replication.
At line:1 char:1
+ enable-cscomputer
+ ~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Enable-CsComputer], UnknownObjectException
    + FullyQualifiedErrorId : ProcessingFailed,Microsoft.Rtc.Management.Deployment.ActivateMachineCmdlet

PS C:\Users\Administrator>


I don’t know why it took me this long, but I realized I had been staring at the issue all along. It was indicated in this line:

enable-cscomputer : Command execution failed: "lyncedge01" is not a known computer in the topology. It may be raised by incomplete replication.

What I realized was that the certificate errors and the messages about this server not being in the topology all stemmed from the server identifying itself by its NetBIOS name (lyncedge01) while, as we all know, Lync references servers by their FQDN.  So I went on to browse the Edge server’s computer properties:

image

Then I opened up the DNS Suffix and NetBIOS Computer Name setting and, lo and behold, the primary DNS suffix that makes up the domain portion of the FQDN was missing:

image
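For anyone who prefers a prompt to the GUI, the same check can be sketched in PowerShell. This is only a rough sketch: it assumes the primary DNS suffix is stored in the Domain value under the Tcpip\Parameters registry key, and Get-ExpectedFqdn is a hypothetical helper name of my own, not a Lync cmdlet.

```powershell
# Hypothetical helper: build the name the computer would present,
# given its host name and primary DNS suffix (which may be empty).
function Get-ExpectedFqdn {
    param([string]$ComputerName, [string]$DnsSuffix)
    if ([string]::IsNullOrEmpty($DnsSuffix)) {
        # No suffix: only the short (NetBIOS-style) name is available.
        return $ComputerName.ToLower()
    }
    return ('{0}.{1}' -f $ComputerName, $DnsSuffix).ToLower()
}

# Assumption: the active primary DNS suffix lives in the "Domain" value here.
$tcpip  = Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters'
$suffix = $tcpip.Domain

if ([string]::IsNullOrEmpty($suffix)) {
    Write-Warning "No primary DNS suffix is set; Lync will see this computer as '$env:COMPUTERNAME' rather than an FQDN."
}
Get-ExpectedFqdn -ComputerName $env:COMPUTERNAME -DnsSuffix $suffix
```

If the last line prints only the short name, as it would have on this Edge server, it will not match the FQDN defined in Topology Builder.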

So I proceeded to fill it in and restart the computer:

image
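I made the change through the GUI, but a scripted equivalent might look like the sketch below. Treat it as an assumption rather than a tested procedure: it relies on the NV Domain registry value being where the pending primary DNS suffix is stored, and contoso.com is a placeholder for whatever suffix your Edge server's FQDN uses in the topology.

```powershell
# Placeholder suffix; replace with the DNS suffix of the Edge server's FQDN in Topology Builder.
$suffix = 'contoso.com'
$tcpip  = 'HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters'

# Assumption: "NV Domain" holds the primary DNS suffix that is applied at the next boot.
# -WhatIf only previews the change; remove it to actually write the value.
Set-ItemProperty -Path $tcpip -Name 'NV Domain' -Value $suffix -WhatIf

# The new suffix only takes effect after a restart.
# Restart-Computer
```

The -WhatIf switch is left in deliberately so nothing is written until you have confirmed the suffix is correct.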

Once the computer was restarted, I was able to successfully reinstall the Lync Edge services and start them up.  This was definitely a strange issue, as I’m not even sure why the suffix went missing to begin with.  Funny how I was staring at the root cause and went a completely different route.  Perhaps I need to cut back on those 90 hour work weeks I’ve been having.

5 comments:

summer lyra miles said...

It is better to double check your code before you install it.

Anonymous said...

Thanks Terence! I had the exact same problem except mine only occurred after I uninstalled/reinstalled and this fixed it.

Anonymous said...

Thanks Terrence. Nicely written article! I was having this same problem primarily because the edge server is not domain-joined. Typically, the domain suffix takes care of this.

Anonymous said...

I had the same issue.. only the DNS suffix was there but the NetBios name needed to be refreshed. After removing both NetBios and DNS, I rebooted and entered them again. Then restarted and the install worked.

Anonymous said...

Thanks for the detailed article. Saved me a ton of time!