Murphy

From Redbrick Wiki
Revision as of 21:00, 21 February 2010 by Werdz (talk | contribs) (Added paragraph about linux)

Murphy ~ 136.206.15.14

Details

Type: Sun T2000
OS: Ubuntu 6.06, Solaris 10, Ubuntu 6.06, Ubuntu 7.10, Solaris 10, OpenBSD 4.4, Solaris 10, Debian Lenny
CPU: 8-core (32 threads) 1 GHz UltraSPARC T1 (Niagara)
RAM: 16 GB
Disks: 2x 73 GB 10,000 RPM SAS disks
Drives: Internal DVD drive
Network: 4x Gigabit Ethernet (Broadcom)
Extras: Onboard Advanced Lights-Out Management (ALOM) card
The Name: Named in honour of RedBrick founder David Murphy (drjolt).

Description

Murphy was donated to RedBrick in April 2006 by a member of the society. It is named after David Murphy (drjolt) who was the founding system administrator of the society, and who died tragically in January 2006.

The machine is an 8-core UltraSPARC monster. It will gladly chew through any normal multithreaded application you throw at it (that is not a challenge). It sucks horrendously at floating-point operations, though.

Services

  • Primary Web Server
  • Available for login
  • Planned: Tomcat server
  • Solaris keeps the admins on their toes

Brief (hah) history of its OSes

When Sun first released the (somewhat revolutionary) Niagara processor, Canonical (the people who co-ordinate Ubuntu) made a big deal about Ubuntu 6.06 fully supporting the architecture. Kernel support for sun4v (the Niagara architecture) was very new, but they said it was supported.

When we first received murphy, it was nicely set up with Ubuntu 6.06 (thanks to colmmacc). This was great - it had an entire repository of apt packages, was easy to manage, things made sense, and it was a long-term support release, so we wouldn't have to worry about upgrading to the next version until 2011. It was also very similar to our other Debian (and later Ubuntu) machines. Everything seemed to work quite nicely for the most part. The system went through a few months in a testing state, then it was promoted to be our login server, due to its sheer scalability. Everything seemed to be wonderful.

It quickly turned out, however, that users weren't too fond of it. The emphasis of the system's design was on multi-threading and data throughput, but most of our users preferred the single-threaded performance of carbon. Complaints about speed ensued, even though the system was never hitting anywhere near max load or memory usage. Eventually the decision was taken to move primary login to Minerva, our new storage server with less RAM and inferior scalability but faster processor clock speeds. It was decided that murphy's future would be best spent as a web server (deathray was crashing once a month because it just couldn't handle the load of WWW anymore anyway).
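The throughput-versus-latency trade-off described above can be sketched with some toy arithmetic. This is only an illustration: murphy's 8 cores at 1 GHz come from the Details section, but the comparison machine (2 cores at 3 GHz) and the job size are made-up assumptions, not the specs of carbon or Minerva, and the model ignores everything except clock speed and core count.

```python
# Toy model only: the comparison machine's numbers are made-up assumptions,
# and real CPUs differ in much more than clock speed and core count.

def job_latency(work_cycles: float, clock_hz: float) -> float:
    """Seconds one single-threaded job takes on one core."""
    return work_cycles / clock_hz


def peak_throughput(cores: int, work_cycles: float, clock_hz: float) -> float:
    """Jobs finished per second when every core is busy with its own job."""
    return cores / job_latency(work_cycles, clock_hz)


WORK = 3e9  # hypothetical job size, in CPU cycles

# Murphy: 8 cores at 1 GHz (from the Details section).
murphy_latency = job_latency(WORK, 1e9)
murphy_throughput = peak_throughput(8, WORK, 1e9)

# Hypothetical comparison box: 2 cores at 3 GHz (assumed, not a real spec).
fast_latency = job_latency(WORK, 3e9)
fast_throughput = peak_throughput(2, WORK, 3e9)

print(f"murphy:   {murphy_latency:.1f} s per job, {murphy_throughput:.2f} jobs/s at full load")
print(f"fast box: {fast_latency:.1f} s per job, {fast_throughput:.2f} jobs/s at full load")
```

Under these assumptions each individual job takes three times longer on murphy, even though murphy finishes more jobs per second once every core is busy. A user running one shell command at a time only ever notices the first number.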

This was where the problems began to show themselves. MySQL just plain didn't work by default. It turned out to be an issue with libnss, which, to our knowledge, still hasn't been fixed in Ubuntu. We compiled our own replacement packages, but it was an annoyance.

Apache mostly worked - there were some hiccups but they were mostly to do with our Apache configuration, not Ubuntu's packages. Pubcookie was a nightmare to set up, but that's pubcookie for you. SuPHP worried us - the packages supplied by Ubuntu were out of date, and critical security patches still had not been applied a few months after the upstream vendor released them. We ended up compiling our own patched versions, but the fact that it had been neglected (whereas up-to-date versions were available for Ubuntu 7.10) was concerning.

We moved our web server to Murphy after a few months of testing, and it worked quite nicely. General response times were a little higher than they were with deathray (due to the lower CPU clock speed), but the system never even flinched under heavy load (where deathray would've locked up).

It was about now that news came that Canonical had dropped SPARC support. So much for their big noise about sun4v support. Even worse, they dropped support just months before they released the next LTS release. It was also about now that Redbrick got rooted, and we had to reinstall all of our machines from scratch.

Our first course of action was to update the system firmware (as directed in the Ubuntu install instructions), and put dapper back on there. 2011 was a long way away; we could worry about rearchitecture later on. However, it turned out that the newest version of the firmware wouldn't even boot Linux, which was very worrying. Surely Canonical would've at least thought to release a kernel update that ran on the updated hypervisor? (Sun4v runs everything inside a hypervisor.)

We faced a decision as to what to do with Murphy. There were a few alternatives, none of which were too appealing.

  • Install Ubuntu 6.06 again, knowing that Canonical seemed to be just losing interest in it (leaving out security patches, not bothering to patch libnss, leaving the kernel too old to boot on modern firmware). This would involve downgrading the firmware again.
  • Install Ubuntu 8.04 from ports. This is an unofficial Ubuntu release - essentially, a bunch of build scripts are set up to compile everything for SPARC, and it's left to do its own thing. If things break, nobody really cares too much. While in theory everything here is newer and should work better, there's never a guarantee that anything will work.
  • Install Solaris 10. Solaris has near perfect support for Sun4v, because Sun has a very clear vested interest in it. However, Solaris is completely devoid of package management, making administration a nightmare. Also, it's full of proprietary crap that's incompatible with everything else we've spent years putting together. Finally, none of the current elected root holders had ever used Solaris.
  • Install Debian. There was evidence that it ran on Sun4v, but three hours of googling revealed 3 (yes, three) people in the world that seemed to have even tried it.
  • Install Gentoo. Seemed to have more users than Debian did, and the community around it seemed more interested in keeping it up to date than Canonical did with Ubuntu. However, a number of root holders seemed to object to "gen-feckin-too" (an actual quote), and more seriously, it doesn't have the enforced package compatibility that Ubuntu/Debian does. In other words, an update one day could cause our Apache configuration to suddenly break for unknown reasons.
  • Install a BSD variant. This had a smallish community around it, but again administering it is more of a pain than the average Debian system, there was little in the way of root holder previous experience (although more than there was for Solaris), and there didn't seem to be any amazing success stories of it being used on servers in the wild.

We decided to stick with the devil we knew at first. We downgraded the firmware and put Ubuntu (6.06) back on it. We quite quickly regretted this - running into bizarre problems ranging from the serial console randomly not responding to random segmentation faults. The installer also didn't work at all; werdz ended up installing it manually with apt from a rescue console.

This was fairly unacceptable - especially the issue with the serial console. The ALOM still had the ability to shut off system power and get to OpenBoot, etc, but we couldn't actually access the operating system. So we jumped ship to Solaris.

As expected, it installed fairly painlessly (the installer wasn't amused at the sight of Linux partitions on the disks, but aside from that it worked perfectly). Once it was booted, everything worked nicely. It had far better support for the hardware than Linux did - the amount of detail given by prtdiag in the global zone showed that off. It could even update the firmware itself!

Another nice feature was zones/containers. We decided to split the system into a web host zone (with user logins), another for databases, and another to run a secondary LDAP in.

Unfortunately, getting Solaris to work with the rest of our systems was painful. Cian first tried getting PAM and PADL to authenticate off our LDAP, but couldn't get them to compile. After a while spent fecking around with the Sun client and a few other LDAP servers (trying them out), we eventually (somehow) got PADL working. Hurdle number 1 conquered.

The web stack proved to be equally nightmarish. It took a while to even get Apache to compile (turned out to be a path issue; I felt stupider than usual that day). PHP was a nightmare, because of the number of dependencies that had to be either installed from Sunfreeware or compiled. What we'd have done for apt-get install php5. That eventually compiled.

SuPHP compiled, but it doesn't work. And now, for absolutely no reason at all, LDAP authentication has broken again on it. Grumble grumble.

The system is also a nightmare to keep up to date. It's a pain that we have to do it alright, but what really worries us is that it will stagnate when the next group of rootholders are elected.

As such, it's entirely possible that I'm going to end up kicking the shit out of Murphy one day, and regardless of my reasoning, it'd deserve it. Coconut 01:02, 16 May 2009 (UTC)

So we're seriously considering the other alternatives again. Ubuntu 8.04 seems worth a try, but the main problems with that are likely to come down the line, when package builds start failing and nobody does anything about it. The only other reasonable option seems to be Gentoo, which makes some root holders break out in hives at the very mention of its name. --Werdz 23:54, 1 Sep 2008 (IST)

Eventually, after lots of cursing, and werdz getting a good proportion of the way into porting Nexenta to SPARC, we decided to move to OpenBSD. It's a system that myself and werdz have a reasonable amount of experience with, it has a reasonable packaging system (but as phaxx said, apt was spoiling us anyway), just works (TM) on sun4v, seems to be well looked after, and is just generally amazing. Its lack of PAM/NSS support is a bit of a pain, but YPldap appears to be doing the job, without any major hiccups. We're still compiling our own Apache, but that's a minor thing compared to the million and one things we'd have to look after on Solaris. --lil_cain 14:16, 24 Nov 2008 (IST)

OpenBSD worked quite nicely. Until we tried to actually run stuff on it in production. Within minutes of the Apache changeover, it was kernel panicking like mad :(. Looks like sun4v support isn't quite there yet. So our solution was to try Solaris 10 again, and we think we've gotten it just right this time. We built our own packager from scratch (800 lines of delicious Perl) to keep things like Apache, suPHP, PHP, and all of their lovely dependencies up to date and running properly automatically. We've just gotten Apache itself configured and running, and hopefully in the not too distant future we'll have it running in production. Also, we beat the native LDAP client into working, so no dirty OpenLDAP haxes. --Werdz 00:48, 5 April 2009 (UTC)

Well, Solaris worked happily (ish) for a while. It held the web server for a while, we experimented with various ways to keep it up to date easily, etc., but in the end just couldn't get it to a stage where you could update it with less than a day's preparation and the patience of a saint. So on the 20th of February 2010 we bit the bullet and put Linux back on it. And it's working. Perfectly. Thank fsck. --Werdz 21:00, 21 February 2010 (UTC)