Friday, December 10, 2010

Error accessing webpage for FilerView: Reinstalling Data ONTAP on a NetApp FAS controller

Due to scheduling issues with a colleague, I had to fill in for him on the initial setup of a NetApp FAS3140 in an active/active controller configuration. I’ve done this before and it’s not too difficult as long as you take the time to have all the names and IPs ready, but what was supposed to take a few hours ended up taking 2 days of my time with NetApp support before the problem was finally resolved.

Problem

You’ve completed the initial setup through the console cable on your NetApp FAS controller(s) and would like to continue setting up the aggregates, volumes and LUNs via the web GUI. As expected, when you navigate to the IP of the controller, you receive the following message:

Error 503

HTTP is not licensed.

To administer this filer, use /na_admin/.

As usual, you click the /na_admin/ link, get prompted for the root password and, once you enter the credentials, expect to see the FilerView landing page. Instead, you see this page:

HTTP 500 Internal Server Error

The website cannot display the page

HTTP 500

Most likely causes:
  • The website is under maintenance.
  • The website has a programming error.

What you can try:
  • Refresh the page.
  • Go back to the previous page.

More information

This error (HTTP 500 Internal Server Error) means that the website you are visiting had a server problem which prevented the webpage from displaying.

For more information about HTTP errors, see Help.

Troubleshooting the Issue

I checked all the settings I could think of to make sure the relevant services were enabled, but after an hour I still couldn’t figure it out, so I called NetApp support. The engineer had me check my browser’s Java settings and the services again, but we could not work out what was wrong. After exhausting the list of items the engineer could find in their knowledge base, we decided to try reinstalling Data ONTAP, but because it was near the end of the day, we had to put it off until the next session.

Solution

Fast forward a week to when I finally had time booked for this project again: I called NetApp and got a different engineer, who proceeded to run a few other checks he could think of. What was interesting this time was that the engineer said it looked as if “a light version of the kernel was installed”, which would explain why some components worked and others didn’t. So, just as the previous engineer had suggested, we decided to reinstall Data ONTAP.

We proceeded to download the Data ONTAP 7.3.2 package (732_setup_q.exe) from now.netapp.com.

Note: We opted for 7.3.2 instead of a later version because the engineer said that installing the same version would not require a reboot of the filer. In practice, the filer still had to be rebooted at the end.

It was initially suggested that I use a TFTP server, but when we executed the command the filer stated that it preferred HTTP. The engineer then suggested FTP, but we couldn’t get that going either because authentication kept failing. What I ended up doing was putting the package on a server running IIS and then executing the following command on the filer to download it:

software get http://webserver/732_setup_q.exe

Once the command above is executed, you’ll see the following status:

software: 0% file read from location.
software: 5% file read from location.
software: 10% file read from location.
software: 15% file read from location.
software: 20% file read from location.
software: 25% file read from location.
software: 30% file read from location.
software: 35% file read from location.
software: 40% file read from location.
software: 45% file read from location.
software: 50% file read from location.
software: 55% file read from location.
software: 60% file read from location.
software: 65% file read from location.
software: 70% file read from location.
software: 75% file read from location.
software: 80% file read from location.
software: 85% file read from location.
software: 90% file read from location.
software: 95% file read from location.
software: 100% file read from location.
software: 100% file read from location.

software: /etc/software/732_setup_q.exe has been copied.
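
If no IIS server is handy, any web server that can hand out a static file should presumably do for the download step. This is not what I used here, but as a minimal sketch, Python's standard library can serve the folder that holds the package. The function name `start_package_server`, the folder and the port are assumptions for illustration only.

```python
import functools
import http.server
import threading


def start_package_server(directory, port=8080):
    """Serve the files in `directory` over HTTP from a background thread.

    Returns the server object so the caller can stop it with shutdown().
    """
    # Bind the handler to the chosen folder instead of the current directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # An empty host string listens on all interfaces of this machine.
    server = http.server.ThreadingHTTPServer(("", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With the package in the served folder, the filer would then be pointed at `http://<workstation-ip>:8080/732_setup_q.exe`. The serving thread is a daemon, so the calling script has to stay alive until the download finishes, after which `shutdown()` stops the server.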

Once the filer has successfully downloaded the software, you can execute the install with the following command:

FAS3140A> software install /etc/software/732_setup_q.exe

Once the command above is executed, you’ll see the following status:

software: You can cancel this operation by hitting Ctrl-C in the next 6 seconds.

software: Depending on system load, it may take many minutes

software: to complete this operation. Until it finishes, you will

software: not be able to use the console.

software: installing software, this could take a few minutes...

software: installation of 732_setup_q.exe completed.

Please type "download" to load the new software,

and "reboot" subsequently for the changes to take effect.

Once the above process completes, the final step is to execute the following command, which will automatically reboot the filer:

FAS3140A> software update 732_setup_q.exe

Once the command above is executed, you’ll see the following status:

software: You can cancel this operation by hitting Ctrl-C in the next 6 seconds.

software: Depending on system load, it may take many minutes

software: to complete this operation. Until it finishes, you will

software: not be able to use the console.

software: installing software, this could take a few minutes...

software: Data ONTAP(R) Package Manager Verifier 1

software: Validating metadata entries in /etc/boot/NPM_METADATA.txt

software: Checking sha1 checksum of file checksum file: /etc/boot/NPM_FCSUM-x86-64.sha1.asc

software: Checking sha1 file checksums in /etc/boot/NPM_FCSUM-x86-64.sha1.asc

software: installation of 732_setup_q.exe completed.

Thu Dec 9 18:43:38 GMT [cmds.software.installDone:info]: Software: Installation of 732_setup_q.exe was completed.

Thu Dec 9 18:43:38 GMT [download.request:notice]: Operator requested download initiated

download: Downloading boot device

download: If upgrading from a version of Data ONTAP prior to 7.3, please ensure

download: there is at least 3% of available space on each aggregate before

download: upgrading. Additional information can be found in the release notes.

download: No bootblock on partition 1

download: Partition layout needs updating.

download: Partition 1 has size 1015021568, type 6.

download: Updating Partition layout and Loader.

tavorvia_cpeer_reinit: hung while flushing cpeer vi sendq
..........download: Failed to backup primary kernel image 1:/x86_64/kernel/primary.krn. Still continuing with download.

.......tavorvia_cpeer_reinit: hung while flushing cpeer vi sendq
........tavorvia_cpeer_reinit: hung while flushing cpeer vi sendq

download: Downloading boot device (Service Area)

...tavorvia_cpeer_reinit: hung while flushing cpeer vi sendq
......tavorvia_cpeer_reinit: hung while flushing cpeer vi sendq
......

Thu Dec 9 18:47:36 GMT [download.requestDone:notice]: Operator requested download completed

Thu Dec 9 18:47:40 GMT [kern.shutdown:notice]: System shut down because : "reboot".

Thu Dec 9 18:47:40 GMT [perf.archive.stop:info]: Performance archiver stopped.

Phoenix TrustedCore(tm) Server
Copyright 1985-2006 Phoenix Technologies Ltd.
All Rights Reserved
BIOS version: 4.3.0
Portions Copyright (c) 2007-2009 NetApp. All Rights Reserved.
CPU= Dual-Core AMD Opteron(tm) Processor 2216 X 1
Testing RAM
512MB RAM tested
4096MB RAM installed
Fixed Disk 0: STEC

Boot Loader version 1.7
Copyright (C) 2000-2003 Broadcom Corporation.
Portions Copyright (C) 2002-2009 NetApp

CPU Type: Dual-Core AMD Opteron(tm) Processor 2216

Starting AUTOBOOT press Ctrl-C to abort...
Loading x86_64/kernel/primary.krn:..............0x200000/45505032 0x2d65a08/17650688 0x3e3ae08/6194359 0x44232bf/1 Entry at 0x00202018
Starting program at 0x00202018
Press CTRL-C for special boot menu
Thu Dec 9 18:48:28 GMT [nvram.battery.state:info]: The NVRAM battery is currently OFF.
Thu Dec 9 18:48:30 GMT [nvram.battery.turned.on:info]: The NVRAM battery is turned ON. It is turned OFF during system shutdown.
Thu Dec 9 18:48:32 GMT [iomem.card.enable:info]: Acceleration card in slot 1 has been enabled.

NetApp Release 7.3.2: Thu Oct 15 04:12:15 PDT 2009
Copyright (c) 1992-2009 NetApp.
Starting boot on Thu Dec 9 18:48:26 GMT 2010
Thu Dec 9 18:48:39 GMT [diskown.isEnabled:info]: software ownership has been enabled for this system
Thu Dec 9 18:48:41 GMT [fmmb.current.lock.disk:info]: Disk 0c.16 is a local HA mailbox disk.
Thu Dec 9 18:48:41 GMT [fmmb.current.lock.disk:info]: Disk 0c.17 is a local HA mailbox disk.
Thu Dec 9 18:48:41 GMT [fmmb.instStat.change:info]: normal mailbox instance on local side.
Thu Dec 9 18:48:41 GMT [fmmb.current.lock.disk:info]: Disk 0c.23 is a partner HA mailbox disk.
Thu Dec 9 18:48:41 GMT [fmmb.current.lock.disk:info]: Disk 0c.24 is a partner HA mailbox disk.
Thu Dec 9 18:48:41 GMT [fmmb.instStat.change:info]: normal mailbox instance on partner side.
Thu Dec 9 18:48:42 GMT [raid.cksum.replay.summary:info]: Replayed 0 checksum blocks.
Thu Dec 9 18:48:42 GMT [raid.stripe.replay.summary:info]: Replayed 0 stripes.
sparse volume upgrade done. num vol 0.
Vdisk Snap Table for host:0 is initialized
Thu Dec 9 18:48:43 GMT [rc:notice]: The system was down for 62 seconds

Thu Dec 9 18:48:43 GMT [rc:info]: Registry is being upgraded to improve storing of local changes.

Thu Dec 9 18:48:43 GMT [rc:info]: Registry upgrade successful.

Thu Dec 9 18:48:45 GMT [perf.archive.start:info]: Performance archiver started. Sampling 22 objects and 198 counters.

Thu Dec 9 18:48:45 GMT [dfu.firmwareUpToDate:info]: Firmware is up-to-date on all disk drives

Thu Dec 9 18:48:46 GMT [netif.linkUp:info]: Ethernet e0M: Link up.

Thu Dec 9 18:48:48 GMT [netif.linkUp:info]: Ethernet e0a: Link up.

Thu Dec 9 18:48:52 GMT [netif.linkDown:info]: Ethernet e0b: Link down, check cable.

Thu Dec 9 18:48:54 GMT [netif.linkDown:info]: Ethernet e2a: Link down, check cable.

add net default: gateway 172.25.5.1

Thu Dec 9 18:48:55 GMT [snmp.agent.msg.access.denied:warning]: Permission denied for SNMPv3 requests from root. Reason: Password is too short (SNMPv3 requires at least 8 characters).

Thu Dec 9 18:48:55 GMT [mgr.boot.disk_done:info]: NetApp Release 7.3.2 boot complete. Last disk update written at Thu Dec 9 18:47:42 GMT 2010

Thu Dec 9 18:48:55 GMT [cf.fm.unexpectedAdapter:warning]: Warning: clustering is not licensed yet an interconnect adapter was found. NVRAM will be divided into two parts until adapter is removed

Thu Dec 9 18:48:55 GMT [cf.fm.unexpectedPartner:warning]: Warning: clustering is not licensed yet the node once had a cluster partner

Thu Dec 9 18:48:55 GMT [mgr.boot.reason_ok:notice]: System rebooted after a reboot command.

Thu Dec 9 18:48:56 GMT [shelf.config.spha:info]: System is using single path HA attached storage only.

FAS3140A> Thu Dec 9 18:48:56 GMT [console_login_mgr:info]: root logged in from console

Thu Dec 9 18:49:22 GMT [asup.post.host:info]: Autosupport (PERFORMANCE DATA) cannot connect to url support.netapp.com/asupprod/post/1.0/postAsup (Could not find hostname 'support.netapp.com', hostname lookup resolution error: Unknown host)

Thu Dec 9 18:49:56 GMT [asup.post.host:info]: Autosupport (PERFORMANCE DATA) cannot connect to url support.netapp.com/asupprod/post/1.0/postAsup (Could not find hostname 'support.netapp.com', hostname lookup resolution error: Unknown host)

Thu Dec 9 18:54:15 GMT [asup.post.host:info]: Autosupport (PERFORMANCE DATA) cannot connect to url support.netapp.com/asupprod/post/1.0/postAsup (Could not find hostname 'support.netapp.com', hostname lookup resolution error: Unknown host)

Thu Dec 9 18:58:16 GMT [asup.post.host:info]: Autosupport (PERFORMANCE DATA) cannot connect to url support.netapp.com/asupprod/post/1.0/postAsup (Could not find hostname 'support.netapp.com', hostname lookup resolution error: Unknown host)

Thu Dec 9 19:00:00 GMT [kern.uptime.filer:info]: 7:00pm up 11 mins, 0 NFS ops, 0 CIFS ops, 0 HTTP ops, 0 FCP ops, 0 iSCSI ops

Thu Dec 9 19:02:11 GMT [asup.post.host:info]: Autosupport (PERFORMANCE DATA) cannot connect to url support.netapp.com/asupprod/post/1.0/postAsup (Could not find hostname 'support.netapp.com', hostname lookup resolution error: Unknown host)

Thu Dec 9 19:05:43 GMT [asup.post.host:info]: Autosupport (PERFORMANCE DATA) cannot connect to url support.netapp.com/asupprod/post/1.0/postAsup (Could not find hostname 'support.netapp.com', hostname lookup resolution error: Unknown host)

Thu Dec 9 19:10:00 GMT [asup.post.host:info]: Autosupport (PERFORMANCE DATA) cannot connect to url support.netapp.com/asupprod/post/1.0/postAsup (Could not find hostname 'support.netapp.com', hostname lookup resolution error: Unknown host)

Thu Dec 9 19:14:14 GMT [asup.post.host:info]: Autosupport (PERFORMANCE DATA) cannot connect to url support.netapp.com/asupprod/post/1.0/postAsup (Could not find hostname 'support.netapp.com', hostname lookup resolution error: Unknown host)

Thu Dec 9 19:17:55 GMT [asup.post.host:info]: Autosupport (PERFORMANCE DATA) cannot connect to url support.netapp.com/asupprod/post/1.0/postAsup (Could not find hostname 'support.netapp.com', hostname lookup resolution error: Unknown host)

Thu Dec 9 19:37:03 GMT last message repeated 5 times

Thu Dec 9 19:41:04 GMT [asup.post.host:info]: Autosupport (PERFORMANCE DATA) cannot connect to url support.netapp.com/asupprod/post/1.0/postAsup (Could not find hostname 'support.netapp.com', hostname lookup resolution error: Unknown host)

Thu Dec 9 19:41:04 GMT [asup.post.drop:error]: Autosupport message (/etc/log/autosupport/201011210000.1) for host (0) was not posted to NetApp. The system will drop the message
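
Two things in the console capture above are worth picking out: the release reported on the "boot complete" line, and the repeated AutoSupport failures, which appear to be plain name-resolution errors ("Unknown host") rather than a symptom of the reinstall. A small sketch for pulling both out of a saved capture follows. The function name `summarize_boot_log` and the idea of saving the console output to a text file are assumptions, not part of Data ONTAP.

```python
import re

# Matches e.g. "[mgr.boot.disk_done:info]: NetApp Release 7.3.2 boot complete."
RELEASE_RE = re.compile(r"NetApp Release (\S+) boot complete")
# Matches the hostname quoted in the AutoSupport lookup errors.
UNKNOWN_HOST_RE = re.compile(r"Could not find hostname '([^']+)'")


def summarize_boot_log(text):
    """Return (booted_release, sorted list of hostnames that failed to resolve)."""
    release = RELEASE_RE.search(text)
    hosts = sorted(set(UNKNOWN_HOST_RE.findall(text)))
    return (release.group(1) if release else None, hosts)
```

Run against the output above, this would report the booted release and a single unresolved host, which points at the filer's DNS settings as the next thing to look at.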

Once the filer completed the reboot, I tried accessing the FilerView web interface again and this time it loaded successfully.

Lesson learned: If you’re not sure what has been done with the filer and certain components don’t work, reinstall Data ONTAP. :)
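
As a recap, the three console commands used in this post can be generated from the package URL. This is only a sketch of the sequence I ran, not an official procedure; the helper name `reinstall_commands` is made up for illustration, and the output is meant to be typed at the filer console by hand.

```python
import posixpath
from urllib.parse import urlparse


def reinstall_commands(package_url):
    """Return the console commands from this post for a given package URL."""
    # The filer stores the download under /etc/software/ using the file name.
    package = posixpath.basename(urlparse(package_url).path)
    return [
        f"software get {package_url}",
        f"software install /etc/software/{package}",
        f"software update {package}",
    ]


for command in reinstall_commands("http://webserver/732_setup_q.exe"):
    print(command)
```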

4 comments:

Fajar said...

Hi Terence,
Thanks for sharing.
I have similar Netapp filer, IBM N3600 configured with active/active cluster.

All seems good except that the filer cannot connect to the autosupport url although I'm sure the network settings is correct.

Sat Jan 22 18:22:05 SGT [n3600a: asup.post.host:info]: Autosupport (REBOOT (reboot command)) cannot connect to url eccgw01.boulder.ibm.com/support/electronic/nas (Could not find hostname 'eccgw01.boulder.ibm.com', hostname lookup resolution error: Unknown host)


Do you have any suggestion on how to troubleshoot it?
I wish there is a 'dig' or 'tracert' command in the filer.

Terence Luk said...

Hi Fajar,

If it's a rebranded NetApp filer, I'm not sure how to go about troubleshooting your issue (I don't want to give you bad advice). Is the filer in production yet?

I've always been happy with the support we receive from NetApp when we call them, so I think the best suggestion I can give you is to call IBM support so they can get someone to take the safest approach for you.

Tech solution Blog said...

Hi Terence,

This post help me to fix the issue, I had same problem on FAS2050 server. Steps were easy to understand.

Thanks
Techie guy

digital signature software said...

You just made something that I thought was so difficult be, truly, so easy! Thanks for the post!