Tuesday, December 16, 2008

Sun IDM now supports MySQL in production

For a long time, Sun has supported a MySQL repository for its Identity Manager product for development purposes only. After Sun purchased MySQL, it seemed like only a matter of time until they supported it in production, as Sean O'Neill detailed here.

Well, it finally looks like there is a low-cost option for IDM repositories. With patch 137621-08, Sun has released support for MySQL in IDM 7.1, and patch 139010-04 does the same for IDM 8.0.

We're currently using Oracle and it sure would be nice to move to MySQL. The repository doesn't use any of the advanced functionality that Oracle provides, and administering an Oracle database is a lot more work than administering MySQL. Actually, most of the headaches are caused by Oracle's "creative" licensing practices.

Wednesday, December 10, 2008

#@%#$ user-customizable config files (AKA nss_ldap configuration)

I just finished configuring a RHEL system to pull its account information from a central LDAP directory. This shouldn't have been a big deal; I've done it many times before in a lot of different permutations. But this time, TLS certificate verification just would not work no matter what I did. I'd copied the certificate from the LDAP server, put in every possible configuration option, and still nothing.

My tests worked with openssl:

# openssl s_client -connect hostname:636 -CAfile /etc/openldap/cacerts/hostname.domain.com.cert
...

...
Verify return code: 0 (ok)
---
DONE


but not ldapsearch or getent passwd:

# ldapsearch -v -D "uid=user,ou=services,dc=domain,dc=com" -w password -H "ldaps://hostname.domain.com:636" -b "dc=domain,dc=com" -s sub -Z -x "(uid=b*)"
ldap_initialize( ldaps://hostname.domain.com:636 )
ldap_start_tls: Can't contact LDAP server (-1)
ldap_bind: Can't contact LDAP server (-1)


After scratching my head for a while and fiddling with a bunch of configuration options (which I'll talk more about later), I finally noticed the mention of .ldaprc files in the ldap.conf(5) man page. These are user-customizable config files, similar to .bashrc or .netrc, that sit in a user's home directory and override the global configuration values. I checked the /root/.ldaprc file and bingo!!! There was a reference to a non-existent certificate in the TLS_CERT setting. I commented this out and everything started working. It turns out that a developer had been experimenting with client certificate authentication to LDAP and had left this setting in after their PoC was done.
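A stale per-user file like this can be spotted with a quick scan. The following is only a sketch, not something from the original troubleshooting session: check_ldaprc is a hypothetical helper name, and it assumes the paths in the file contain no spaces. It prints each TLS_CERT, TLS_KEY or TLS_CACERT entry whose target file does not exist.

```shell
# check_ldaprc FILE
# Prints every TLS_CERT, TLS_KEY or TLS_CACERT entry in FILE whose
# target path does not exist. Commented-out lines are skipped because
# their first field starts with "#".
# Hypothetical helper; assumes paths contain no whitespace.
check_ldaprc() {
    rc_file=$1
    [ -r "$rc_file" ] || return 0
    # Option names in ldap.conf(5) are case-insensitive, hence tolower().
    awk 'tolower($1) ~ /^tls_(cert|key|cacert)$/ { print $1, $2 }' "$rc_file" |
    while read -r option target; do
        [ -e "$target" ] || echo "$rc_file: $option points to missing file $target"
    done
}
```

Running check_ldaprc /root/.ldaprc (and, per ldap.conf(5), against $HOME/ldaprc and an ldaprc file in the current directory) would flag a dangling TLS_CERT entry like the one described above.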

On a side note, nss_ldap configuration is confusing even without random .ldaprc files getting in your way. You have to deal with both /etc/ldap.conf and /etc/openldap/ldap.conf, and it is not at all clear what should go in which file. Many of the configuration options overlap, so you will often see duplicate settings in both files. One benefit of all the experimentation it took to solve this problem is that I now have a much clearer idea of what needs to be set in which file. For our setup (RHEL 4 client, SSL LDAP server on port 636, no anonymous LDAP access) I configured the following:

/etc/ldap.conf:

# hostname
host hostname.domain.com

# search base
base dc=domain,dc=com

# service user to perform searches
binddn uid=user,ou=services,dc=domain,dc=com

# service user password
bindpw password

# port
port 636

# passwd file configuration
nss_base_passwd dc=domain,dc=com?sub

# enable ssl
ssl on

# Verify the certificate
tls_checkpeer yes

# disable SASL
sasl_secprops maxssf=0
use_sasl off


/etc/openldap/ldap.conf: (only one line, but it's vital!)
# The location of the trusted certificate (or its CA certificate)
TLS_CACERT /etc/openldap/cacerts/hostname.domain.com.cert

Wednesday, December 3, 2008

Zimbra 5 memory usage tweaking

We support several small installations of Zimbra (fewer than 20 users) along with a couple of larger installations. Everyone loves the features that it provides and the price tag for the ZCS edition. In the past six months we've upgraded several clients from Zimbra 4 to Zimbra 5. In most cases, the driving factor was support for Firefox 3 and/or Safari 3.

One thing that has come up in the small Zimbra upgrades is a big increase in memory usage with the default settings. The default Zimbra settings are appropriate for installations supporting a couple of hundred users, but are overkill for small user bases. Below are the steps that we've taken with the small installations:

* Reduce the amount of memory that the Java mailbox process and the MySQL server can use. By default these are 30% and 40% respectively. We've been using 25% for both and seen no problems so far.

zmlocalconfig -e mailboxd_java_heap_memory_percent=25
zmlocalconfig -e mysql_memory_percent=25

* Reduce the number of amavisd processes that are started by default. Amavis is a mail content checker that is used by the anti-spam and anti-virus components. By default 10 processes start up, each taking up about 45 MB of memory. We've reduced this to two without seeing any ill effects.

Edit $ZIMBRA_HOME/conf/amavisd.conf.in and change this line:
$max_servers = 10;
to:
$max_servers = 2;

* If you are really strapped for memory, you can also disable anti-virus, but make sure you have some other method of virus filtering enabled first! Whether it's on a mail filtering service that sits in front of your mail server or on the desktop, you need some sort of protection.
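At roughly 45 MB per process, going from 10 amavisd processes to 2 frees about 360 MB (450 MB down to 90 MB). If you'd rather script the amavisd.conf.in edit than do it by hand, something like the following should work. It is a sketch only: set_max_servers is a hypothetical helper name, and it assumes the setting sits at the start of a line in the form shown above.

```shell
# set_max_servers FILE COUNT
# Rewrites the "$max_servers = N;" line in FILE to use COUNT and keeps
# the previous version as FILE.bak. Anything after the semicolon
# (for example a trailing comment) is left untouched.
# Hypothetical helper; assumes the setting starts at column one.
set_max_servers() {
    conf=$1
    count=$2
    cp -p "$conf" "$conf.bak" || return 1
    # [$] matches a literal dollar sign; the single quotes keep the
    # shell from expanding $max_servers.
    sed 's/^[$]max_servers[[:space:]]*=[[:space:]]*[0-9][0-9]*;/$max_servers = '"$count"';/' \
        "$conf.bak" > "$conf"
}
```

With the path from the step above, the call would be: set_max_servers "$ZIMBRA_HOME/conf/amavisd.conf.in" 2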

Thursday, November 13, 2008

More Plesk tips for troubleshooting

I must say that given how many software packages control panels (Plesk, cPanel, etc.) have to manage, they do a remarkably good job, especially since they have to coordinate changes among dozens of different virtual hosts as well. And Plesk does it in a cross-platform way, working on Windows/IIS as well as Linux/Apache. But it still manages to frustrate me on a weekly basis with a head-scratching problem. This week went over quota with two problems, one of which I described here.

The second appeared after we upgraded a client's MailEnable installation from 3.14 to 3.6 on their Windows 2003/IIS Plesk server. According to SWsoft's own KB article, you just have to follow the standard MailEnable upgrade instructions and Plesk will automatically detect the new version, so everything will be hunky-dory.

Well, after I finished the MailEnable upgrade, things were definitely not hunky-dory. The admin control panel was breaking with regularity, as the application pool repeatedly died and the Plesk event log filled up with errors. After shortening the process recycle time from 1740 to 240 minutes and upping the max processes from 1 to 4, the PleskControlPanel application pool stabilized. However, we were still seeing problems whenever we added or removed a domain. Multiple errors showed up in the Plesk event log similar to the following:


Execute websrvmng --start-vhost "--vhost-name=domainname.com" failed: Site domainname.com
Execute file name: C:\Program Files\SWsoft\Plesk\admin\bin\domainmng.exe

After digging through some forums and the logs without finding a conclusive cause, I decided to go with my intuition. We'd had problems with MailEnable changing IIS settings before, so the best bet was to re-sync the IIS configuration from the stored Plesk configuration data using the websrvmng utility. Not much documentation exists for the *mng.exe command-line utilities found in the %plesk_bin% directory, but I've found them invaluable for fixing unknown problems such as this. I ran the following commands and after 10 minutes of processing, all was well again (fingers crossed).

websrvmng --reconfigure-all
ftpmng --reconfigure-all

Tuesday, November 11, 2008

Plesk License auto-renewal failures solved!

We've been battling a strange problem with Plesk licensing for several months. Plesk apparently requires your server to re-register itself with them every couple of months to make sure that its license key is still valid. If the key isn't valid, Plesk enters a grace period after which it will stop working. We've been seeing the following error whenever we try to renew the key:

Key Update Status:
Unable to update Plesk Key. An error occurred while processing your key.

You can try to update it later. The key cannot be upgraded due to the network failure during connection with the Key Authority server. Please check that your Internet connection is configured, you can resolve and access ka.swsoft.com and your firewall enables outgoing connections to TCP port 5224.


Of course, we have no problems connecting to port 5224 on ka.swsoft.com:

telnet ka.swsoft.com 5224
Trying 64.131.90.38...
Connected to ka.swsoft.com.
Escape character is '^]'.
....

The next step is obviously to call SWsoft's support, right? Well, when we did that, we found out that since this server is leased from The Planet, they technically own the Plesk license and we'd have to go through their support. This is always a bad sign: when separate vendors' support people are talking to each other, they usually seem most interested in laying the blame for the problem on the other vendor (and closing out their own ticket). True to form, the first two times that our Plesk license was expiring, The Planet installed a temporary key and said they would follow up with SWsoft for a permanent solution. Of course, once the support ticket was closed we never heard back from them.

But the third time was the charm. Not only did I get a competent support person, but enough people had complained about the problem that SWsoft had released a document on how to fix it. It comes down to some corrupt MySQL rows in their license key storage table. Once those are removed, everything works just fine. Of course, they never explain why those corrupt rows are there. The same two corrupt rows appear in their example and on our server, which makes me think that it is a widespread problem (perhaps caused by an upgrade), but I suppose that I have more pressing issues to deal with now.

Friday, November 7, 2008

PHP Profiling

For the past couple of days I've been troubleshooting performance problems with a Mediawiki installation on a server we maintain. It's run me through a lot of hoops trying to pinpoint the performance bottleneck. I didn't see anything obvious in the Apache or MySQL logs (although I did learn about the mysqldumpslow command, which makes digging through the slow query logs a lot easier).

Next I went to the Xdebug PHP extension. I've used it for debugging PHP problems before and I knew that it had a profiling component as well. Installation is a breeze if you have pecl installed:

pecl install xdebug

Edit php.ini and put in the following to enable profiling:

zend_extension=/usr/lib/php/modules/xdebug.so
[xdebug]
xdebug.default_enable=Off
xdebug.profiler_enable=On

I disabled the default stack traces since it was a production environment, and I left the profiler enabled only long enough to run my tests, since it generates a lot of information for each process. I could have enabled it for just the Mediawiki directory if that had been a big issue, but it was a pretty low-traffic time, so I only had 15 files to sort through from my testing period. I copied the /tmp/cachegrind.out.* files to my desktop and fired up KCachegrind to analyze the output.
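When there are more than a handful of profiler files, it can help to look at the biggest ones first. The following is a rough sketch under that assumption: largest_profiles is a hypothetical helper name, and file size is only a proxy for "the request that did the most work", not a measurement of it.

```shell
# largest_profiles [DIR] [COUNT]
# Lists the COUNT largest cachegrind.out.* files in DIR as
# "<size in bytes> <path>", biggest first.
# Hypothetical helper; defaults to /tmp and the top 5 files.
largest_profiles() {
    dir=${1:-/tmp}
    count=${2:-5}
    for f in "$dir"/cachegrind.out.*; do
        [ -f "$f" ] || continue   # glob matched nothing, or not a regular file
        printf '%d %s\n' "$(wc -c < "$f")" "$f"
    done | sort -rn | head -n "$count"
}
```

For example, largest_profiles /tmp 3 prints the three largest profiles, which are reasonable candidates to open in KCachegrind first.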

It was much more enjoyable to use the color-coded GUI than looking through a text file. After finding the correct output file, I traced the bottleneck down to the MessageCache->lock() method. This Mediawiki installation uses memcached to cache db objects and it was just spinning there for insane amounts of time trying to lock objects in memcached.

Next I went to the Mediawiki debug log. As a side note, Mediawiki has its own profiling setup that you can use as well, although I prefer the GUI that Xdebug uses. I saw messages like this:

MessageCache::load(): cache is empty

Then 10 seconds or so later:

MessageCache::load(): unable to load cache, disabled

After a couple of false starts, including playing around with the APC cache as a replacement, it turned out that the memcached process Mediawiki was trying to use wasn't running. Apparently the startup command had been put in /etc/rc.local without the full path, so memcached hadn't come back up after the last reboot of the system. A proper init script and a couple of chkconfig commands later, all was well.
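A missing full path is easy to overlook because the same command works fine from an interactive shell with a richer PATH. One way to test for it is to look the command up under a minimal PATH. This is a sketch with assumptions: resolves_at_boot is a hypothetical helper name, and the default search path below is a guess at what a boot script sees, not a value taken from this server.

```shell
# resolves_at_boot COMMAND [SEARCH_PATH]
# Succeeds if COMMAND is an absolute path to an executable file, or if a
# bare command name can be found using SEARCH_PATH only.
# Hypothetical helper; the default SEARCH_PATH is an assumed boot-time PATH.
# Note that "command -v" also reports shell builtins and functions.
resolves_at_boot() {
    cmd=$1
    boot_path=${2:-/sbin:/usr/sbin:/bin:/usr/bin}
    case $cmd in
        /*) [ -x "$cmd" ] ;;
        # The subshell keeps the PATH change from leaking into the caller.
        *)  ( PATH=$boot_path; command -v "$cmd" ) >/dev/null 2>&1 ;;
    esac
}
```

For example, resolves_at_boot memcached || echo "use the full path in rc.local" would print the warning whenever memcached lives outside the directories being searched.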

Wednesday, November 5, 2008

MySQL Workbench

I've used DBDesigner in the past for MySQL data modeling, but that was discontinued a while back. It looks like Workbench might be my new tool for MySQL schema design and visualization. I needed to do some schema comparisons from existing DBs today to write an upgrade script for a PHP/MySQL web application, and Workbench wound up having just the features that I needed.

The steps that I needed to follow to get the upgrade script were covered by this post. I'll keep evaluating it to see if I need the commercial version or not, but it's so cheap ($99) that I'll probably wind up buying it.