Thursday, May 7, 2009

Crossbow, Xen & Linux fun

Recently I had to re-install OpenSuse on a box a co-worker had set up about two years ago. After his native Linux installation had managed to shred a little data (roughly 1TB lost) on an external raid, I decided to put something decent on the box ;-)
OK, to be fair, it was kind of a bad combination: the raid box had a failure on one of the ports, and Linux kept writing happily to the drives despite the errors it must have encountered (at least there were log entries about write errors). Seems OpenSuse's default settings for Ext3 are a really *BAD* idea. The error behaviour is set to "on error continue". Looks like a safe way to lose data.
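For the record, a minimal sketch of how the Ext3 error policy could be checked and changed with tune2fs. The device name /dev/sdb1 and the helper names (run, errors_behavior, set_errors_policy) are made up for illustration, and the script only prints the command unless DRY_RUN=0 is set.

```shell
# Dry-run by default: print the command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Read `tune2fs -l` output on stdin and print the "Errors behavior" value.
errors_behavior() {
    awk -F: '/^Errors behavior/ { sub(/^[ \t]+/, "", $2); print $2 }'
}

# Store "remount read-only on error" in the superblock of the given device.
set_errors_policy() {
    run tune2fs -e remount-ro "$1"
}
```

Something like `tune2fs -l /dev/sdb1 | errors_behavior` (as root) should then show the current policy, and `set_errors_policy /dev/sdb1` switches it. The mount option errors=remount-ro in /etc/fstab is the other place to set this.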
Luckily it was not a real loss of data, since the raw data are in a safe place, but some guys have to re-run their compute jobs (and hopefully keep in mind to make a copy to our HSM once they're finished).

Well after all it is a Sun Box (x4600), so it is now running SXCE (b112 at the moment). ZFS handling the attached disks (now the raid box is down to simply a JBOD) just makes me sleep better ;-)
The Suse installation is now running as a xVM pvm domU. Solaris doing the disk I/O and sharing the data to the domU, Suse running Matlab.
domU installation was a bit trickier than the Fedora 8 install I did a while ago for a few other domUs. It took me a while to find the right combination to make the Suse domU boot off the virtual disk.
After that I decided to try out the Crossbow stuff. The etherstubs looked especially interesting. So two dladm commands later I had an etherstub (kind of a virtual network switch within Solaris) and a virtual nic set up. Plumb the vnic, set an IP address, ping the "switch": works just fine. So next I attached another nic to my domU and bridged it over the etherstub. The new vnic gets created by xVM and shows up in the domU.
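The Solaris side of that setup could look roughly like the sketch below. The link names stub0 and vnic0 and the 192.168.100.1 address are placeholders, not the ones actually used on the box, and attaching the extra nic to the domU is not shown. It prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run by default: print the command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

STUB=stub0   # hypothetical etherstub name
VNIC=vnic0   # hypothetical vnic name

setup_etherstub() {
    # The two dladm commands: create the virtual switch, hang a vnic off it.
    run dladm create-etherstub "$STUB"
    run dladm create-vnic -l "$STUB" "$VNIC"
    # Plumb the vnic and give it an address (placeholder network).
    run ifconfig "$VNIC" plumb
    run ifconfig "$VNIC" 192.168.100.1 netmask 255.255.255.0 up
    # List the vnics to see what was created.
    run dladm show-vnic
}

setup_etherstub
```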
Assign an IP address, send a ping and wait... and wait... and wait... I did expect an answer. Well, seems like something went wrong.
A little snooping on the Solaris instance and I see packets coming and going. They just never make it into the Linux world.
The only thing that made me wonder was the different MTU on Solaris and Linux. Solaris quite happily set the MTU to 9000, while Linux had it set to 1500. After ditching the vnics, setting the MTU on the etherstub to 1500 and recreating the vnic with MTU 1500, ping finally got an answer.
Not sure if this is a Solaris problem, a feature or something completely different. Though I tend to blame Linux for not supporting jumbo frames ;-)
Well maybe it does. But at least not for the xen-provided virtual network card.
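A rough sketch of the MTU fix on the Solaris side, again with placeholder names (stub0, vnic0) and a placeholder address. Setting the vnic's MTU via ifconfig is my assumption about one way to do it, not necessarily the only one on b112, and the domU's own vnic has to be gone too before the etherstub MTU is changed (not shown). It prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run by default: print the command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

STUB=stub0   # hypothetical etherstub name
VNIC=vnic0   # hypothetical vnic name

fix_mtu() {
    # Ditch the vnic first.
    run ifconfig "$VNIC" unplumb
    run dladm delete-vnic "$VNIC"
    # Drop the etherstub from 9000 to the standard 1500.
    run dladm set-linkprop -p mtu=1500 "$STUB"
    # Recreate the vnic and bring it back up with MTU 1500.
    run dladm create-vnic -l "$STUB" "$VNIC"
    run ifconfig "$VNIC" plumb
    run ifconfig "$VNIC" mtu 1500
    run ifconfig "$VNIC" 192.168.100.1 netmask 255.255.255.0 up
    # Verify what both links report now.
    run dladm show-linkprop -p mtu "$STUB"
    run dladm show-linkprop -p mtu "$VNIC"
}

fix_mtu
```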
So if you come to a point where you want to make a Linux domU talk to a Solaris box over an etherstub and lose the packets you may want to check the MTU setting.
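On the Linux side the check can be as small as the sketch below, which reads the MTU from sysfs and compares it with what you expect. The function name check_mtu and the interface name eth1 are just examples. SYSFS_NET is only there so the function can be tried without a real nic.

```shell
# Where the kernel exposes network interfaces; overridable for testing.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# check_mtu IFACE EXPECTED
# Returns 0 if the MTU matches, 1 if it differs, 2 if the interface is missing.
check_mtu() {
    iface=$1
    expected=$2
    file="$SYSFS_NET/$iface/mtu"
    if [ ! -r "$file" ]; then
        echo "$iface: no such interface" >&2
        return 2
    fi
    actual=$(cat "$file")
    if [ "$actual" -eq "$expected" ]; then
        echo "$iface: MTU $actual matches"
    else
        echo "$iface: MTU is $actual, expected $expected"
        return 1
    fi
}
```

In the domU, `check_mtu eth1 1500` would then tell you whether the Linux end agrees with what `dladm show-linkprop -p mtu` reports for the etherstub on the Solaris end.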