Thursday, November 13, 2008

More Plesk tips for troubleshooting

I must say that given how many software packages control panels (Plesk, cPanel, etc.) have to manage, they do a remarkably good job, especially since they have to coordinate changes among dozens of different virtual hosts as well. And Plesk does it in a cross-platform way, working on Windows/IIS as well as Linux/Apache. But it still manages to frustrate me on a weekly basis with a head-scratching problem. This week went over quota with two problems, one of which I described here.

The second appeared after we upgraded a client's MailEnable installation from 3.14 to 3.6 on their Windows 2003/IIS Plesk server. According to SWsoft's own KB article, you just have to follow the standard MailEnable upgrade instructions and Plesk will automatically detect the new version, so everything will be hunky-dory.

Well, after I finished the MailEnable upgrade, things were definitely not hunky-dory. The admin control panel was breaking with regularity, as the application pool repeatedly died and the Plesk event log filled up with errors. After shortening the process recycle time from 1740 to 240 minutes and upping the max processes from 1 to 4, the PleskControlPanel application pool stabilized. However, we were still seeing problems whenever we added or removed a domain. Multiple errors showed up in the Plesk event log similar to the following:


Execute websrvmng --start-vhost "--vhost-name=domainname.com" failed: Site domainname.com
Execute file name: C:\Program Files\SWsoft\Plesk\admin\bin\domainmng.exe

After digging through some forums and the logs without finding a conclusive cause, I decided to go with my intuition. We'd had problems with MailEnable changing IIS settings before, so the best bet was to re-sync the IIS configuration from the stored Plesk configuration data using the websrvmng utility. Not much documentation exists for the *mng.exe command-line utilities found in the %plesk_bin% directory, but I've found them invaluable for fixing unknown problems such as this. I ran the following commands and, after 10 minutes of processing, all was well again (fingers crossed).

websrvmng --reconfigure-all
ftpmng --reconfigure-all

Tuesday, November 11, 2008

Plesk License auto-renewal failures solved!

We've been battling a strange problem with Plesk licensing for several months. Plesk apparently requires your server to re-register itself with SWsoft every couple of months to make sure that its license key is still valid. If the key isn't valid, Plesk enters a grace period, after which it will stop working. We've been seeing the following error whenever we try to renew the key:

Key Update Status:
Unable to update Plesk Key. An error occurred while processing your key.

You can try to update it later. The key cannot be upgraded due to the network failure during connection with the Key Authority server. Please check that your Internet connection is configured, you can resolve and access ka.swsoft.com and your firewall enables outgoing connections to TCP port 5224.


Of course, we have no problems connecting to port 5224 on ka.swsoft.com:

telnet ka.swsoft.com 5224
Trying 64.131.90.38...
Connected to ka.swsoft.com.
Escape character is '^]'.
....
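The same reachability check can be scripted, which is handy if you want to run it from cron or from several servers at once. This is just a minimal sketch in Python: the can_connect name and the 5-second timeout are my own choices for illustration, and all it proves is that the TCP handshake completes, not that the license protocol itself is working.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example use (host and port taken from the Plesk error message above):
#   can_connect("ka.swsoft.com", 5224)
```

Like the telnet test, a True result here only rules out the firewall and DNS explanations that the error message suggests.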

The next step is obviously to call SWsoft's support, right? Well, when we did that, we found out that since this server is leased from The Planet, they technically own the Plesk license and we'd have to go through their support. This is always a bad sign: when separate vendors' support people have to talk to each other, they usually seem more interested in laying the blame for the problem on the other vendor (and closing out their own ticket). True to form, the first two times that our Plesk license was expiring, The Planet installed a temporary key and said they would follow up with SWsoft for a permanent solution. Of course, once the support ticket was closed, we never heard back from them.

But the third time was the charm. Not only did I get a competent support person, but enough people had complained that SWsoft had released a document on how to fix the problem. It comes down to some corrupt MySQL rows in their license key storage table. Once those are removed, everything works just fine. Of course, they never explain why those corrupt rows are there. It appears to be the same two corrupt rows in their example and on our server, which makes me think that it is a widespread problem (perhaps caused by an upgrade), but I suppose that I have more pressing issues to deal with now.

Friday, November 7, 2008

PHP Profiling

For the past couple of days I've been troubleshooting performance problems with a MediaWiki installation on a server we maintain. It's had me jumping through a lot of hoops trying to pinpoint the performance bottleneck. I didn't see anything obvious in the Apache or MySQL logs (although I did learn about the mysqldumpslow command, which makes digging through the slow query logs a lot easier).

Next I went to the Xdebug PHP extension. I've used it for debugging PHP problems before and I knew that it had a profiling component as well. Installation is a breeze if you have pecl installed:

pecl install xdebug

Edit php.ini and put in the following to enable profiling:

zend_extension=/usr/lib/php/modules/xdebug.so
[xdebug]
xdebug.default_enable=Off
xdebug.profiler_enable=On

I disabled the default stack traces since it was a production environment. I left the profiler enabled only long enough to run my tests, since it generates a lot of information for each process. Of course, I guess I could have just enabled it for the MediaWiki directory if that had been a big issue, but it was a pretty low-traffic time, so I only had 15 files to sort through from my testing period. I copied over the /tmp/cachegrind.out.* files to my desktop and fired up KCachegrind to analyze the output.
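If the profiler has left a pile of dumps behind, a few lines of Python can help decide which one to open first. This is only a sketch: the profiler_outputs helper is a name I made up, the directory and file pattern are taken from the paths mentioned above, and "largest file first" is a rough heuristic of mine (a bigger dump usually means more function calls were recorded, which is not guaranteed to be the slow request).

```python
import glob
import os


def profiler_outputs(directory="/tmp", pattern="cachegrind.out.*"):
    """Return (path, size_in_bytes) pairs for profiler dumps, largest first."""
    paths = glob.glob(os.path.join(directory, pattern))
    # Skip anything that is not a regular file (e.g. a stray directory).
    sized = [(p, os.path.getsize(p)) for p in paths if os.path.isfile(p)]
    return sorted(sized, key=lambda item: item[1], reverse=True)


# Example use:
#   for path, size in profiler_outputs():
#       print(size, path)
```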

It was much more enjoyable to use the color-coded GUI than to look through a text file. After finding the correct output file, I traced the bottleneck down to the MessageCache->lock() method. This MediaWiki installation uses memcached to cache db objects and it was just spinning there for insane amounts of time trying to lock objects in memcached.

Next I went to the MediaWiki debug log. As a side note, MediaWiki has its own profiling setup that you can use as well, although I prefer looking at the Xdebug output in the KCachegrind GUI. I saw messages like this:

MessageCache::load(): cache is empty

Then 10 seconds or so later:

MessageCache::load(): unable to load cache, disabled

After a couple of false starts, including playing around with the APC cache as a replacement, it turned out that the memcached process MediaWiki was trying to use didn't exist. Apparently the startup commands had been put in /etc/rc.local and hadn't included the full path, so after the last reboot of the system, memcached hadn't restarted. A proper init script and a couple of chkconfig commands later, all was well.
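In hindsight, a quick liveness check against memcached would have saved me most of that digging. Here is a rough sketch in Python that sends the "version" command from memcached's text protocol and reads back the reply. The memcached_version function name is mine, and the host and port are assumptions (11211 is memcached's default port; adjust both to whatever your MediaWiki configuration points at).

```python
import socket


def memcached_version(host="127.0.0.1", port=11211, timeout=2.0):
    """Return the version string memcached reports, or None if unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # The text protocol answers "version" with "VERSION <number>".
            conn.sendall(b"version\r\n")
            reply = conn.recv(1024).decode("ascii", "replace").strip()
    except OSError:
        # Nothing listening, connection refused, or no reply in time.
        return None
    if reply.startswith("VERSION "):
        return reply[len("VERSION "):]
    return None


# Example use:
#   if memcached_version() is None:
#       print("memcached is not answering")
```

A None result right after a reboot would have pointed straight at the missing daemon instead of at MediaWiki.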

Wednesday, November 5, 2008

MySQL Workbench

I've used DBDesigner in the past for MySQL data modeling, but that was discontinued a while back. It looks like Workbench might be my new tool for MySQL schema design and visualization. I needed to do some schema comparisons from existing DBs today to write an upgrade script for a PHP/MySQL web application, and Workbench wound up having just the features that I needed.

The steps I needed to generate the upgrade script were covered by this post. I'll keep evaluating it to see if I need the commercial version or not, but it's so cheap ($99) that I'll probably wind up buying it.