Monday, July 12, 2010

Problem with UCS northbound connection to 3750e

We currently have a redundant UCS setup with:

1. 2 x Fabric interconnects
2. 2 x Chassis
3. 2 x 3750e

For the past 2 months, we've noticed that the northbound connection from the fabric interconnect to the second 3750e would repeatedly go down after a few hours. We originally thought the cause might be the X2 extender, the SFP or the fibre cable, so we tried swapping each of them, but the problem kept coming back. We then decided to swap out the switches and update the firmware to 12.2(53)SG2, but once we completed the swap, both ports on the 3750s would go down almost immediately. As soon as we got one port back up, the other port would go down and its LED would go out. When we logged onto the 3750, the logs showed what appeared to be a thermal threshold being exceeded, which caused the port to shut down.

After a few hours of troubleshooting, we finally engaged Cisco TAC, and it turned out that a bug had been discovered earlier that day. At that point we started mapping out which versions we had already tried and which versions we could still use. Once we had reviewed all the versions that support the CVR-X2-SFP10G, the only ones left were:

1. 12.2(53)SG
2. 12.2(53)SG1

We loaded #1 and the ports stayed up for 4 to 8 hours, but then the second 3750e's port went down again. We have yet to try #2, but we can't perform any maintenance on the infrastructure for the time being, so we're waiting for Cisco to get back to us.

-------Update-------

The modules we used in our environment are:
CVR-X2-SFP10G
SFP-10G-SR

The errors and bugs:

%GBIC_SECURITY_CRYPT-4-VN_DATA_CRC_ERROR: GBIC in port Te3/0/2 has bad crc (NYNYCore01-3)
%SFF8472-5-THRESHOLD_VIOLATION: Te3/0/2: Rx power high alarm; Operating value: 8.1 dBm, Threshold value: 2.0 dBm
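
If you are collecting these messages from several switches, a small script can pull out the port and show how far the reading is over the alarm threshold. The following is only a rough sketch: the function name `parse_threshold_violation` and the returned field names are made up for illustration, and the regular expression assumes the message always has the same layout as the line above.

```python
import re

# Pattern built from the single log line shown above; other IOS versions
# may format this message differently (assumption).
VIOLATION_RE = re.compile(
    r"%SFF8472-5-THRESHOLD_VIOLATION: (?P<port>\S+): (?P<alarm>[^;]+); "
    r"Operating value: (?P<operating>-?\d+(?:\.\d+)?) dBm, "
    r"Threshold value: (?P<threshold>-?\d+(?:\.\d+)?) dBm"
)


def parse_threshold_violation(line):
    """Return the port, alarm text and dBm values, or None if no match."""
    match = VIOLATION_RE.search(line)
    if match is None:
        return None
    operating = float(match.group("operating"))
    threshold = float(match.group("threshold"))
    return {
        "port": match.group("port"),
        "alarm": match.group("alarm").strip(),
        "operating_dbm": operating,
        "threshold_dbm": threshold,
        # Signed difference between the reading and the alarm threshold, in dB.
        "excess_db": round(operating - threshold, 2),
    }


if __name__ == "__main__":
    sample = (
        "%SFF8472-5-THRESHOLD_VIOLATION: Te3/0/2: Rx power high alarm; "
        "Operating value: 8.1 dBm, Threshold value: 2.0 dBm"
    )
    print(parse_threshold_violation(sample))
```

Lines that are not threshold violations, such as the GBIC CRC error, simply return None, so the function can be run over a whole log file.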

Versions tried:
12.2(53)SE, 12.2(53)SE1 & 12.2(53)SE2
