User-Mode Linux Co-op

Introduction

This article introduces some of the advantages of sharing a cohosted server machine, and specifically sharing it with User-Mode Linux as a compartmentalization tool.

As we're still in the process of setting up this system, this is a work-in-progress. Expect updates from time to time.

History

I had worked in the mid-90s with Patrick Haller and Clem Yonkers at Websense.Net, a firm in northeastern Vermont that provided Internet access and a dial-up masquerading Linux router/cache/firewall. When the company closed down, we went our separate ways; Clem and I stayed in the area, but Pat went to Pennsylvania to pursue his master's degree and work at an ISP there - PA.net.

He offered to host a computer there at a good price if Clem and I wanted one. We decided to take him up on his offer. Even though I've had perfectly good Internet connections, it's nice to have a place to serve up email, web sites, dns info, and free shell accounts for friends.

Clem and I started setting up the box in March of 1999. At that point it had Redhat 5.2 on a Pentium 75 (149.50 BogoMIPS!) with 64M of memory, a 13G hard drive, and a 10mbps Tulip Ethernet card.

The computer was named Slartibartfast, after the Douglas Adams character in the Hitchhiker's Guide to the Galaxy series. Both seemed old and slow, so the name fit. :-) I've long since learned that it's a pain to spell that out to people over the phone. Short names are better.

Goals

Slartibartfast has served its purpose well; it has fed out web pages, dns info, and shell accounts for over 3 years. Not bad, considering that I'm pretty sure Windows NT would have refused to allow itself to even be installed on something so underpowered.

I must admit, though, that I'd like to be able to run more intensive tasks on it. Freenet and the Linux Team Fortress server come to mind; both have noticeable CPU requirements. I'd like to be able to build UML root filesystems on it as well, but that takes a boatload of disk space, and a non-trivial amount of CPU and RAM; a full build of all the filesystems can take the better part of a day. I could peg Slart with any one of those three, and the UML builds alone could probably push a week or more.

None of the users on that system use it all that heavily - most use it as an email platform - but a few use it as a jumping off point for managing their own networks. Checking that a firewall lets in a certain kind of connection pretty much requires you to ssh out to some machine on the internet and then try to come back in to that port. Slart makes a good point to do that, as well as running nmap and other vulnerability checkers. With permission, mind you.

A few of the users had mentioned upgrading it at some point. The idea started to make more and more sense as I saw the system steadily loaded with even simple monitoring tools.

Here are the goals for the new box:

At least a gigabyte of memory; I expect this will be running virtual machines that like lots of memory (see below)
Large amounts of hard drive space; ideally, we'd all be able to use this to store anything we'd like, up to and possibly including backing up our home systems to it. Two of us suggested mirrored storage for at least part of the available drive space to allow us to recover if a drive dies.
A fast processor, although I'd personally lean toward avoiding the fastest available because of the cost. Something at 75-80% of the fastest would still have plenty of horsepower. Originally, nobody had requested multiple CPU's, so we were going to buy a single processor system - but read on.
A rack mount system so Pat - our gracious cohosting host - is happy. :-)
The ability to handle lots of domains, email accounts, web serving, shell accounts, and any other servers we'd like. A few of us asked for a machine that could be a backup web server in case of failure in the primary; that's no problem!

New Hardware, first pass

I started to look around at replacements in the summer of 2002. Pat had been kind enough to allow us to cohost a machine in a desktop case, but non-rackmount systems make his job harder, so we wanted to get a rackmount. Penguin Computing, Dell, and IBM all had possibilities, but their prices were significantly higher than Eracks.com. We originally considered:

Eracks.com/SERVE:
Lockable 2U chassis
AMD Athlon XP 1800, 1.53GHz
1.5G RAM
3x 120G drive, with 2 software RAID'ed together for 240G storage.
Single Intel NIC
No keyboard, mouse, or monitor
Redhat Linux 7.3

This looked beefy enough to handle the small number of people and jobs we had planned for it. Now, about paying for a $2640 machine...

Finances

I went back to the people that had had free accounts on Slart 1 for a while and some other friends that I thought might be interested. I asked them whether they'd be interested in splitting the cost on a replacement server, making sure it was completely clear that they could continue to use the server whether they took part in the replacement costs or not. I figured if I could get 4 others, we could split a $2640 server for about $530 each.

Little did I know.

The response was completely unexpected. Almost everyone I wrote to sent back a "Yes!". Some of those I'd written also suggested other people who might be interested. Friends of friends wrote in asking if they could take part. By the time the dust settled, I had 21 partners in the new system.

Wow.

I was thrilled, but also a little worried about horsepower. Some of those users and tasks might use the system quite heavily. I sent out another proposal asking if we could make the system a little beefier. As nobody complained...

New Hardware, second pass

Eracks.com/TDA
Tyan Thunder K7 Dual AMD 760MP IDE
Dual AMD Athlon MP 1900 1.60 GHz CPU's
2GB DDR ECC Registered Memory
CD-Rom drive
3x 120GB 7200 RPM IDE HD
There's an additional bay for a hard drive; I'll install a spare 40G I have here. That could later be replaced with a larger one if need be.

The end result is 120GB of mirrored storage for our main files, and 160GB of unmirrored storage for things that can be replaced if either drive dies, for a total of 280GB of storage.

The first two drives will be hardware mirrored with the raid card, freeing up the cpu.

While Eracks uses Western Digital and IBM drives, we requested that they not use IBM drives in our system. One of the contributors pointed out that his personal experience had been an almost 100% failure rate in IBM drives. That, and the fact that IBM is getting out of the drive business, encouraged us to go with WD.
2 Ethernet ports on the motherboard
2 port Promise IDE raid card
No keyboard, mouse, or monitor
Redhat Linux 7.3

The new system came out to $4670 base price. I figured that by the time we shipped it twice we'd be around $250 per person. Since Clem and I were splitting the monthly hosting costs without asking for any contributions to that part, if there was some left over we could apply the remainder to the hosting costs.
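
The per-person figures are easy to sanity-check. The sketch below uses only the prices quoted in this article and assumes the 21 partners make up the whole paying group; it shows the raw split and how much a $250 share would leave over the base price for shipping and hosting.

```python
def share(total_cost, contributors):
    """Equal share per contributor, rounded to the nearest cent."""
    return round(total_cost / contributors, 2)


# First pass: the $2640 server split five ways, as in the Finances section.
first_pass = share(2640, 5)

# Second pass: the $4670 base price split across 21 people
# (assumption: the 21 partners are the whole paying group).
base_share = share(4670, 21)

# Collecting $250 from each of 21 people leaves this much above the base
# price for shipping and, if anything remains, hosting.
headroom = 21 * 250 - 4670

print(f"first pass:  ${first_pass:.2f} each")
print(f"second pass: ${base_share:.2f} each before shipping")
print(f"headroom at $250 each: ${headroom}")
```

The actual shipping charges aren't known here, so the headroom is only an upper bound on what shipping could cost before the $250 estimate breaks.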

I ordered the system in early August 2002. When I sent out the notification, I sent along a real request for money. Realizing that people's financial situations do change, I reminded everyone that contributions were completely optional, and that any amount of contribution - including $0.00 - was enough to guarantee continued access to the old and new boxes. :-)

Remember that I'm asking for money from a pool of close friends and family. All through the process, I'm giving people easy ways to say "I'm not comfortable contributing.", and letting them know they don't have to give a reason why. I'd much rather lose them as financial contributors and keep them as friends.

There's a certain amount of risk in this. The machine has been charged to my credit card, and if large numbers of the contributors back out, I'm going to have a very expensive and very idle box on my hands. *grin* I'm not too worried, though. I know most of these people very well.

Two of the other team members were amazingly kind in offering to split any shortfall with me, and a third made an extra contribution to cover another contributor that wasn't able to do a full share. They deserve special appreciation. Many thanks, guys.

Cohosting

Cohosting wasn't too much of a problem. Pat had been giving us a good price per month for hosting the original Slart, and was willing to continue to do so.

IP addresses did turn out to be a bit of an issue, however. I wanted to allow users to set up their own networking and be able to run servers. While this could have all been done with a single IP address, it would have taken quite a bit of work on the host to do the masquerading and port forwarding. SMTP mail, for example, would have been a pain as there's not an easy way to specify a port other than 25 for an SMTP mail server. We'd be stuck with one mail server for everyone.

In the end, I decided to suck it up and go for an extra 32 addresses, even though that was an extra $40/month out of my pocket.
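
To get a feel for what 32 addresses buys, here is a small sketch using Python's standard ipaddress module. It assumes the addresses arrive as a single aligned /27 block; the 192.0.2.0/27 range is a documentation-only placeholder, not the real allocation, and the UML names are illustrative.

```python
import ipaddress


def plan_addresses(network, names):
    """Pair each UML name with the next usable host address in the block."""
    net = ipaddress.ip_network(network)
    usable = list(net.hosts())  # excludes the network and broadcast addresses
    if len(names) > len(usable):
        raise ValueError("not enough addresses for every UML")
    return dict(zip(names, usable))


# Placeholder block (RFC 5737 documentation range), standing in for the real one.
block = ipaddress.ip_network("192.0.2.0/27")
print(f"{block.num_addresses} addresses, {len(list(block.hosts()))} usable")

# Hypothetical service UMLs; each gets its own address and so its own port 25, 53, 80...
plan = plan_addresses("192.0.2.0/27", ["mail", "dns", "web", "half-life"])
for name, addr in plan.items():
    print(f"{name:10} {addr}")
```

With one address per UML, every instance can run its own mail server on port 25 without any port forwarding on the host.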

Security issues

Root for everyone!

As others needed to be able to change the system and server configuration on the original Slartibartfast from time to time (to add new dns info, virtual web sites, email domains, etc.), I had been giving out sudo access for a while to those that needed it. For a small number of accounts given to trusted friends, this isn't a problem, but we're looking at 21 people to start with, and probably more in the future. How do you give out the ability to edit system files, start new daemons, install new applications, etc., while avoiding the security and privacy problems inherent in giving out the root password or even sudo access?

User Mode Linux to the rescue.

There's a variation on the standard Linux kernel that's exactly what we need. User-Mode Linux allows one to start a second (or third, fourth...21st) Linux kernel on a system. Each new kernel gets its own root filesystem (each stored as a file on the host hard drive). In this way, each UML instance is a complete virtual machine that's all but indistinguishable from a real computer. All of them run as a normal user on the host, but still give root-level access, the ability to start daemons, the ability to run text and graphical applications, full networking, and almost all of the other capabilities of a stock Linux system. The only exception is that one can't directly address hardware inside UML, so the UML environment provides virtual drives, virtual network adapters, and virtual X windows displays. In practice, almost no applications (other than /sbin/hwclock and some SCSI tools) access devices directly, so very few changes are needed to the root filesystem to make it work correctly under UML.
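
As a rough illustration of "a kernel that runs as a normal user with a file for its disk", the sketch below builds the command line for one instance. The ubd0=, mem=, and umid= options are standard UML kernel arguments as documented by the project, but check them against the kernel version in use; the paths, memory size, and instance name are made up for the example, and networking options are left out.

```python
def uml_command(kernel, root_fs, memory_mb, umid):
    """Build the argument list for launching one UML instance."""
    return [
        kernel,               # the UML kernel is an ordinary executable
        f"ubd0={root_fs}",    # first virtual block device, backed by a host file
        f"mem={memory_mb}M",  # memory the guest kernel believes it has
        f"umid={umid}",       # name used to identify this instance on the host
    ]


# Hypothetical user "alice" with her own kernel binary and root filesystem file.
command = uml_command("/home/alice/linux", "/home/alice/root_fs", 128, "alice")
print(" ".join(command))
```

The resulting list could be handed to subprocess.Popen while running as that user on the host, which is what keeps the instance unprivileged.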

The solution is relatively straightforward, then. Everybody gets their own UML root filesystem and gets root access to it. While everybody gets an account on the host to keep file ownership straight, only the administrator (I, at the moment) actually gets to log into the host. While people inside UML's can't see what tasks are running on other UML's, the administrator can see all of them. While people can't see each other's root filesystems, the administrator can. From a trust standpoint, it does mean that the users have to trust the host administrator to not invade their privacy. For example, there's a patch to the UML kernel that allows the administrator to monitor every keystroke going into a terminal session in a UML (even if ssh is used!) and see all screen output coming back on that terminal. As the administrator, I'm promising not to use that patch on users' kernels. I will use it on a few honeypots, however.

In practice, the trust issue here is the same as if 21 people, all with their own computers, decided to cohost at an ISP. As the ISP has physical access to the boxes and the network cables, those 21 people need to trust that ISP not to break into their machines or sniff their traffic.

Performance issues

Because User-Mode Linux intercepts system calls that would normally have gone to the host kernel and may need to modify the original system call going to the host or the results coming back, there's some overhead in running UML. That's one of the reasons I wanted to get extra memory and processor power for this box; one way to compensate for this slowdown is to get better hardware on which to run it.

I don't have hard numbers for this slowdown. As there's a relatively fixed overhead for each system call, programs that call out to the kernel a lot will be hit harder than ones that do a lot of work internally and only rarely need to access the disk, network, or screen.
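
The shape of that argument can be shown with a toy model, even without measurements. Every number below is invented purely for illustration (the per-call costs are not benchmarks of UML or of any kernel); the point is only that a fixed per-call penalty barely touches compute-bound work and dominates call-heavy work.

```python
def slowdown(compute_seconds, syscalls, native_cost, uml_overhead):
    """Ratio of run time under UML to native run time in a fixed-overhead model.

    compute_seconds: time spent in user code, assumed identical in both cases
    syscalls:        number of system calls the program makes
    native_cost:     seconds per system call on the host (hypothetical)
    uml_overhead:    extra seconds UML adds to each call (hypothetical)
    """
    native = compute_seconds + syscalls * native_cost
    under_uml = compute_seconds + syscalls * (native_cost + uml_overhead)
    return under_uml / native


# Made-up per-call costs, chosen only to make the contrast visible.
NATIVE_COST = 1e-6
UML_OVERHEAD = 20e-6

compute_bound = slowdown(10.0, 1_000, NATIVE_COST, UML_OVERHEAD)
syscall_heavy = slowdown(1.0, 1_000_000, NATIVE_COST, UML_OVERHEAD)

print(f"compute-bound job: {compute_bound:.3f}x")
print(f"syscall-heavy job: {syscall_heavy:.1f}x")
```

Real programs won't follow the model exactly, since not every system call costs the same, but it explains why a number cruncher and a busy network server can see very different slowdowns on the same box.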

Jeff Dike, the primary developer of the UML project, is actively working on performance questions. I have no doubt that UML will steadily progress to the point where the performance difference will be less than a few percent.

Even with the beefy box, some applications may still have performance or latency problems inside UML; we won't know until the box goes in and we start using it. For that reason, I'm still holding open the possibility that we'll end up running those sensitive apps right on the host OS rather than inside a UML instance.

Possible problematic applications include:

NTP: NTP needs to lock portions of memory as non-swappable so that its timing sensitive routines aren't thrown off by the code being swapped in on demand.
Half-Life: This Linux server allows multiple (up to 32) client machines to connect; each player's movements in this online game are retransmitted to all the other clients. While the individual packets tend to be small (under 100 bytes), there are a lot of them and the game tends to be very sensitive to even small latency problems. It's not clear whether the dual scheduler (one in the host kernel, one in the UML kernel), the network latency, or the sheer number of packets to be processed could cause problems for the game.
Heavy Use servers: We're lucky at the moment in that none of our services are used all that heavily. Mail, Web, and DNS are all low load services. If any of them were heavy load servers, I'd consider moving them out to the host as well, remembering that each service I move to the host will run faster but be less secure.

Host hardware and OS

The raid card is being used to mirror the first two 120G drives. If one dies, the other can take over for it. This means that even the root filesystem can easily be mirrored.

I did consider using software raid, but that has some cpu overhead and I don't yet know how to mirror the root filesystem in such a way that the system will come back up seamlessly in case of primary drive failure.

When the box arrives, I plan to do a very bare install of Linux on the host. The only publicly visible server on the host will be OpenSSH. All other services that need to run will be run in UML instances.

I plan to have a dedicated mail server UML for those that don't want to run their own. There will also be a DNS UML, a Web UML, a Half-Life UML, etc. While this might seem like overkill, I want to compartmentalize services. If someone comes up with an exploit for the Bind name server, for example, they may be able to break into the DNS UML, maybe even getting root level access, but only to that UML instance. The attacker can't access the host or any of the other UML's.

There's a possible quirk in using an SMP machine as the host. Since signals are used to indicate that a disk block or network packet is ready to be processed, there's a potential race condition where a signal is sent to a process, but never delivered. The end result is that the UML hangs.

Jeff Dike is aware of the problem and is considering a real fix, but in the meantime there are some good workarounds. If we see this problem at all - and we hope we don't - one can nudge the UML awake again by providing it with another signal to replace the lost one. That's as simple as pinging the UML. In that case, I plan to run a job in the background on the host that does nothing but ping each of the UML's every 5 seconds. That's a very small amount of CPU to work around a potential problem.

A more drastic workaround would be to simply disable the second CPU entirely by running a uniprocessor kernel on the host. This would be a last-gasp measure if the UMLs freeze even with the constant ping going on.
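
The background ping job mentioned above might look something like the following sketch. It assumes the iputils ping found on most Linux hosts (the -c and -W options) and uses placeholder addresses; the function names are invented for this example. The ping and sleep functions are passed in as parameters so the loop can be exercised without touching the network.

```python
import subprocess
import time


def ping_once(address):
    """Send one ICMP echo request; return True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def nudge_all(addresses, ping=ping_once):
    """Ping every UML once; return the addresses that did not answer."""
    return [addr for addr in addresses if not ping(addr)]


def keepalive(addresses, interval=5, rounds=None, ping=ping_once, sleep=time.sleep):
    """Ping all UMLs every `interval` seconds; run forever if rounds is None."""
    done = 0
    while rounds is None or done < rounds:
        silent = nudge_all(addresses, ping)
        if silent:
            print("no reply from:", ", ".join(silent))
        sleep(interval)
        done += 1


# Example (placeholder addresses; runs until interrupted):
# keepalive(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
```

Even a ping that gets no reply still delivers a packet to the UML, which is the nudge that matters; the list of silent addresses is just a convenient way to notice an instance that stays hung.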

UMLs

Filesystems

Root

Empty

Shared/hostfs

COW

Snapshots

Custom services

Maintenance and upgrades

Networking and addresses

Transition

Doing this yourself

There's no reason why you can't do the above yourself!

You'll need the following things to make it work:

Cohosting ISP: Which one isn't terribly important, but you'll want to take into account price, bandwidth available (and whether there are bandwidth limits or surcharges), and location. While I'm comfortable managing a machine that's 7 hours away, that does mean that I occasionally have to ask the ISP to do things for me when I'm having trouble getting to a command prompt for whatever reason. Pat and his crew have been great at coming to my rescue, but I've worked up quite a Sushi debt to them. I suppose if that debt ever got too large, a Sumo wrestler would show up at my house to break my kneecaps, or at least my chopsticks. :-)
Physical computer: As we've shown, you can do this with nothing more than a throwaway box. Machines that are no longer fast enough to run Windows are perfectly fine with Linux. I'll bet a used machine on eBay would get people up and running while you evaluate your disk, CPU, and memory needs. If your budget is limited, focus more on getting lots of memory and at least one moderately large disk in the system before looking to get the fastest CPU; a memory starved system is going to be effectively slower than one whose CPU is fully loaded.
A community of users: I'm lucky enough to have friends that need a cohosted machine and are technically savvy enough to make use of it. If you're sharing a machine (with or without UML), that community will need to have at least a basic level of trust in each other.
At least one moderately capable Linux system administrator: Setting up a system like this requires some background in how to set up networking, daemons, filesystems, and operating systems. If you go with the UML approach I've described above, someone in the project needs to be familiar with - or willing to learn about - the subtleties of UML setup. There's quite a bit of documentation at the User-Mode Linux web site, and pointers to mailing lists and IRC channels for where the documentation isn't complete.

Thanks and credits

This project would not have gotten off the ground without the help of the following people. At the moment I'm not including the names of the financial contributors as I haven't gotten their permission to mention them. I sincerely appreciate their help and enthusiasm for the project.

Patrick Haller and PA.net have been wonderful as our cohost ISP. Over the years I've asked Pat for help from time to time with system, hardware or network problems and he's always been willing to offer his help. Many thanks, Pat.

Clem and I came up with the component parts that made up the original Slartibartfast and have split the hosting costs. Thanks, Clem!

Finally, thanks to Jeff Dike, the primary UML developer, and all the other Linux developers that make such an excellent operating system.

William is an Open-Source developer, enthusiast, and advocate from New Hampshire, USA. His day job at SANS pays him to work on network security and Linux projects.

Last edited: 8/17/2002

Best viewed with something that can show web pages... <grin>