Saturday, July 10, 2010

Capacity Planner and Solaris Servers

Since there isn't much information out there on troubleshooting VMware's Capacity Planner assessment application, I thought I'd share an experience I had with Solaris servers on a project a few months ago.

The environment for the Capacity Planner project had the following operating systems:

1. Windows 2000/2003/2008
2. Red Hat Enterprise Linux 4
3. Solaris 9

Capacity Planner used: 2.7.2 - Executable: cp27setup_38649.exe

As with all the other Capacity Planner projects I've been involved in, there were minor issues with the Windows and Red Hat servers but nothing out of the ordinary. The Solaris servers, however, were another story.

As seen in the following screenshot, I noticed that the Solaris servers were missing information.

Note: Sorry about the highlighted Red Hat servers; we fixed those issues by setting the proper permissions on some folders for the service account.


Since the client wanted to include these servers, I went ahead and tried to troubleshoot the issue with their Unix/Linux administrator.

Before I continue with the troubleshooting, note that VMware's Capacity Planner installation guide (cp_installation.pdf) clearly states that the following Solaris versions are supported:

Sun Solaris 7 (SPARC)
Sun Solaris 8 (SPARC)
Sun Solaris 9 (SPARC)
Sun Solaris 9 (x86)
Sun Solaris 10 (SPARC)
Sun Solaris 10 (x86)

The client's Solaris servers were running version 9.

We began by validating the service account on the Solaris servers: we logged on, made sure we didn't get prompted for a password, and reviewed the permissions. Everything checked out, so we ran the commands from the scripts manually to confirm that they produced output, and they did:

-------------------
login as: tech
tech@SolarisServer's password:
Last login: Tue Apr 20 11:26:59 2010 from vicviper2.domain.
Sun Microsystems Inc. SunOS 5.9 Generic May 2002
Sun Microsystems Inc. SunOS 5.9 Generic May 2002
$ which sudo
/usr/local/bin/sudo
$ sudo
usage: sudo -K -L -V -h -k -l -v
usage: sudo [-HPSb] [-p prompt] [-u username#uid]
{ -e file [...] -i -s }
$ sudo vmstat
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr m0 m1 m3 m1 in sy cs us sy id
0 0 0 4681464 1860016 5 24 0 0 0 0 0 0 0 1 0 330 258 378 0 1 98
$
-------------------

I reran the collector job from the data collector, and we then tried the same checks on another Solaris server:

-------------------
login as: tech
tech@someSolarisServer's password:
Last login: Tue Apr 20 11:26:32 2010 from vicviper2.domain.
Sun Microsystems Inc. SunOS 5.9 Generic May 2002
Sun Microsystems Inc. SunOS 5.9 Generic May 2002
$ which sudo
/usr/local/bin/sudo
$ sudo
usage: sudo -K -L -V -h -k -l -v
usage: sudo [-HPSb] [-p prompt] [-u username#uid]
{ -e file [...] -i -s }
$ sudo vmstat
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr dd dd m0 m1 in sy cs us sy id
0 0 0 2908608 1102968 7 33 29 0 0 0 0 4 4 0 0 52 139 12 1 1 97
$
-------------------

Since the data collector still couldn't pull the proper information, we tried executing the scripts directly on the server. They ran without any issues, so we decided to open a support ticket with VMware.

-------------------
$ ls -l
total 592
-rw-r--r-- 1 techul other 139469 Apr 20 14:01 Perf_server_20100420140001.txt
-rw-r--r-- 1 techul other 134765 Apr 20 15:01 Perf_server_20100420150000.txt
$
-------------------

As many of you who have called VMware support for a non-critical issue may know, support can take a while. We went back and forth with VMware as we continued to troubleshoot the problem. Some of the things we noticed were:

1) The users command (which lists logged-in users) returns no data because there is no data to return.

2) Memory stats and disk stats are missing. On Linux you can read /proc/meminfo for memory data, but there is no equivalent file in Solaris 9; all entries under /proc are numeric (one directory per process ID).
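
On the memory side, the vmstat output shown earlier already carries swap and free columns (reported in kilobytes on Solaris), so one way to get rough memory numbers without /proc/meminfo is to parse them out of vmstat. The following is only a sketch of that idea and not what the Capacity Planner scripts do. The function name parse_vmstat_swap_free is made up for illustration, and the sample input is the vmstat output from the first server above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Capacity Planner): read Solaris vmstat
# output on stdin and print the "swap" and "free" columns (fields 4 and 5).
# Assumes the Solaris column layout shown in this post; Linux vmstat differs.
parse_vmstat_swap_free() {
    # Header lines start with letters, so keep only lines whose first field
    # starts with a digit, and report the last such sample seen.
    awk '$1 ~ /^[0-9]/ { swap = $4; free = $5 } END { print swap, free }'
}

# Sample input copied from the first server's session in this post.
parse_vmstat_swap_free <<'EOF'
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr m0 m1 m3 m1 in sy cs us sy id
0 0 0 4681464 1860016 5 24 0 0 0 0 0 0 0 1 0 330 258 378 0 1 98
EOF
```

On a live server the same function could be fed directly, for example with vmstat | parse_vmstat_swap_free. Installed memory is a separate question; on Solaris, prtconf prints a "Memory size" line that can be used for that.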

After another week of troubleshooting with the VMware support engineer, we finally discovered that the Unix scripts don’t always detect Solaris (SunOS) properly. As a result, some of the statistics are gathered using the wrong method, which is why we have some data, but not all.
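
For context, a common way for a shell script to tell platforms apart is to branch on the output of uname -s, which prints SunOS on Solaris and Linux on Linux. How the Capacity Planner scripts actually perform their detection isn't shown here, so the following is just a sketch of the general idea. The function name classify_os and the labels it prints are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the Capacity Planner scripts): map a
# kernel name, as printed by `uname -s`, to a label a script could branch on.
classify_os() {
    case "$1" in
        SunOS) echo "solaris" ;;   # Solaris reports itself as SunOS
        Linux) echo "linux" ;;
        *)     echo "unknown" ;;   # anything else: don't guess a method
    esac
}

# Usage: detect the current platform before choosing how to collect stats.
kernel=`uname -s`
family=`classify_os "$kernel"`
echo "Detected platform: $family"
```

The point of the explicit "unknown" branch is that a script which falls through to a default (say, the Linux method) on an unrecognized system would produce exactly the kind of partial data described above.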

We could have hand-edited some of the scripts on the machines, but we would then have needed to import the data manually, which didn't seem practical. At that point the client decided that we would have to omit these servers from the data collector.

I hope someone out there has had more luck with this. If you have, feel free to let me know what needed to be done. Thanks.

2 comments:

Ross said...

Hi Terence, did you ever find a workable solution for the VMware CP SunOS issues you encountered? I'm having exactly the same issues as your blog mentions...

Cheers,
Ross

Terence Luk said...

Hi Ross,

Unfortunately, we pretty much hit a dead end after:

1. Troubleshooting with multiple SunOS admins trying to figure out a workaround.

2. Troubleshooting directly with VMware (tech name: Elton John).

The client and I were basically told that the scripts weren't pulling the information properly so even though we were getting some numbers, they probably weren't accurate.

With that being said, the client for this project decided that it was not worth escalating the issue or looking for other ways to fix it, so we ended up excluding the servers in order to continue with the assessment and report. It may be worthwhile for you to reach out to VMware technical support and try your luck with them.

Sorry I couldn't help more but if I do run into this same problem again in the future and find a resolution, I will post back here.

Thanks for checking out my post.

Terence