Jails – High value but shitty Virtualization¶
The archaeology of virtualization¶
Virtualization is nothing new, and depending on how fundamentalist your definition of “virtualized environment” is, one can point to the earliest of timesharing systems as the origin.
IBM’s mainframe hardware, the 360 machine series, introduced hardware virtualization, so that it was possible to run several of IBM’s different and incompatible operating systems on the same computer at the same time.
It’s more than a little bit ironic that a platform which has lasted 50 years now was beset by backwards-compatibility issues almost from the start, and even more so that IBM’s patents in this area of technology prevented anybody else from repeating their mistake for that long.
Everybody else did software virtualization.
The UNIX virtual machine and its anchors¶
The UNIX process model, which all modern operating systems use these days, is a virtual machine where all the hardware, including the memory, is hidden behind programmer-convenient abstractions.
Memory and file descriptors are unanchored, in the sense that they are entirely inside the program’s environment and no concern of anything else.
But once the program reaches out of its own environment, for instance to open a file, implicit and explicit anchorpoints become evident.
The process’ current directory is an explicit anchorpoint, visible and changeable and all that.
But the directory where absolute paths start their lookup is implicit: It’s inode 2 on the filesystem the kernel decided to use as root file-system during boot.
Likewise, an attempt to use the network depends on the implicit anchorpoint of the routing table and the network interface table.
A further anchorpoint exists in the process table: processes can communicate and send signals to each other, as long as they know each other’s “pid”, basically a sort of serial number.
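To make the anchorpoints concrete, here is a minimal Python sketch that reads three of them from inside a running process. The function name `anchor_points` is invented for this illustration. The root inode is printed without an expected value, because the “inode 2” convention depends on the filesystem in use and will not hold everywhere.

```python
import os


def anchor_points():
    """Collect the anchors this process uses to reach outside itself."""
    return {
        "cwd": os.getcwd(),                 # explicit: changeable with os.chdir()
        "root_inode": os.stat("/").st_ino,  # implicit: whatever "/" resolves to
        "pid": os.getpid(),                 # our handle in the process table
    }


if __name__ == "__main__":
    for name, value in anchor_points().items():
        print(f"{name}: {value}")
```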
Chroot hoists the anchor¶
The ‘chroot’ – “Change Root” trick was invented at CSRG, UC Berkeley, as far as I’ve found out by Bill Joy, as a quick hack to do release engineering work on the BSD operating system, without needing a dedicated computer just for that.
It’s a really neat trick, which simply makes the implicit anchorpoint of the root directory an explicit anchorpoint which can be changed, per process.
As such chroot(2) was never intended to be a security mechanism nor a solid container, it was simply a name-space based hack, but at some point it was employed as a little bit of both: The FTP server program, ftpd(8) grew a facility for “anonymous” access where only a subset of the total filesystem would be visible to unknown visitors.
Rather than do the proper screening of filenames in software, which is a rather tedious and error-prone task, the author simply did a chroot(2) and left it at that.
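As an illustration of what “proper screening of filenames in software” involves, here is a minimal Python sketch. It is not what ftpd(8) did or does; the function name `is_inside` and the `/srv/ftp` path are assumptions made for the example. Even this version is only a check at one moment in time: a symlink created between the check and the later open would defeat it, which is part of why the task is error-prone.

```python
import os


def is_inside(root, requested):
    """Return True if `requested`, taken relative to `root`, stays under `root`."""
    root_real = os.path.realpath(root)
    # Strip leading slashes so an "absolute" request is still relative to root.
    candidate = os.path.realpath(os.path.join(root_real, requested.lstrip("/")))
    # realpath() has collapsed ".." components and resolved existing symlinks,
    # so comparing the common prefix is now meaningful.
    return os.path.commonpath([root_real, candidate]) == root_real


if __name__ == "__main__":
    for name in ("pub/readme.txt", "../../etc/passwd", "pub/../../etc/passwd"):
        print(name, is_inside("/srv/ftp", name))
```

With chroot(2) the kernel does the equivalent containment on every lookup, which is why it looked like an attractive shortcut.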
There are ways for a process to escape chroot(2) because of some corner-cases of the filesystem, in particular the ‘..’ entries, and this was not exactly a secret, but for obvious reasons nobody went out of their way to tell undergraduate students about it.
But the use in ftpd(8) gave some people the wrong impression that chroot(2) was a security encasement, and this led to some unfortunate claims and assumptions about security.
This was, more or less, the state of affairs when the Internet exploded into our homes.
Development of Jails¶
A fellow named Derrick T. Woolworth contacted me about something in FreeBSD I have long since forgotten, and as we exchanged emails he complained about the fact that different customers in his webhotel needed different versions of apache, mysql, perl etc, and that this forced him to run many machines, each almost idle, just for these different software loads.
Thinking about this I realized that chroot(2) could be pretty trivially extended to become light-weight virtual “machines” and proposed that Derrick’s company could fund the development.
The actual development consisted of five parts:
Making sure you don’t escape the chroot/jail
Restricting process visibility
Deciding what “root” can and cannot do in a jail
Teaching certain device drivers about jails
Giving each jail its own IP number
The third one was the most tedious, since every single place in the kernel where the code said “are you super-user ?” had to be located and thought about, but I used the chance to normalize all these checks to the same function call, an investment which has made life easier for other projects later on.
I won’t go into technical details, you can read those in the Jail-paper Robert Watson co-wrote with me for the SANE 2000 conference.
Once it was working, I sent the patches to Derrick and per our agreement he got exclusive use for a year before I committed them to FreeBSD in April 1999.
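The value of funnelling every “are you super-user ?” test through one function can be sketched with a toy model in Python. This is not the FreeBSD kernel code: the names `Cred`, `priv_check` and the privilege strings in `JAIL_ALLOWED` are all hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical privilege names that root inside a jail is still allowed to use.
JAIL_ALLOWED = {"bind_low_port", "chown_file", "kill_process"}


@dataclass(frozen=True)
class Cred:
    uid: int
    jail_id: Optional[int] = None  # None means the process is not jailed


def priv_check(cred: Cred, priv: str) -> bool:
    """The one place that answers "are you super-user?" for a given privilege."""
    if cred.uid != 0:
        return False              # ordinary users never pass
    if cred.jail_id is None:
        return True               # unjailed root may do anything
    return priv in JAIL_ALLOWED   # jailed root gets only a subset
```

Because every caller goes through the same function, changing what root may do in a jail means editing one table instead of hunting through the whole code base again.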
What jails developed¶
Once jails were released to the public, people started being creative with them, and that’s where they showed their true potential.
For one thing, they are cheap in resources: The jailed processes look just like other processes, but they all share the same operating system kernel and there is no special overhead associated with jails, except for having more processes needing more memory etc.
But the real killer-features were the side effects.
Bill Joy could have implemented chroot(2) such that a dedicated filesystem was mounted as root directory for the chroot’ed processes, but he didn’t, he simply used a sub-tree of the computer’s filesystem.
The side effect of this was a sort of “one-way-mirror” effect, where a process which was not chroot’ed could reach into the file namespace of the chroot’ed process (subject to UNIX access controls).
When I created jails, I retained and extended this behaviour, also into other namespaces, such as process IDs.
This means that an unjailed process can see all the jailed processes and, subject to UNIX access controls, send them signals, attach debuggers to them and so on.
But the jailed processes can not ‘see’ out of their jails, neither into other jails, nor into the unjailed part of the system.
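The one-way-mirror rule for process IDs can be written down as a toy model in Python. The names `Proc`, `can_see` and `visible_pids` are invented for this sketch, and it covers only the flat case described here: a process is either unjailed or in exactly one jail. UNIX access controls, which still apply on top, are left out.

```python
from typing import List, NamedTuple, Optional


class Proc(NamedTuple):
    pid: int
    jail_id: Optional[int]  # None means unjailed


def can_see(observer_jail: Optional[int], target_jail: Optional[int]) -> bool:
    """Unjailed observers see everything; jailed ones see only their own jail."""
    if observer_jail is None:
        return True
    return observer_jail == target_jail


def visible_pids(observer_jail: Optional[int], table: List[Proc]) -> List[int]:
    """Filter a process table down to what the observer is allowed to see."""
    return [p.pid for p in table if can_see(observer_jail, p.jail_id)]


TABLE = [Proc(1, None), Proc(200, 1), Proc(201, 1), Proc(300, 2)]

if __name__ == "__main__":
    print(visible_pids(None, TABLE))  # unjailed view: [1, 200, 201, 300]
    print(visible_pids(1, TABLE))     # view from jail 1: [200, 201]
```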
One of the first uses of this was to fence in an often defaced and terribly written web-application.
The web-application was put in a jail, and a process outside the jail monitored it for corruptions. If any were found, all processes in the jail were stopped and a new jail started from an already prepared master-copy.
The net effect was that defacements lasted only a few seconds at a time, and after some hours even script-kiddies get tired of that.
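The file-checking half of such a watchdog could look roughly like the Python sketch below. How the original monitor detected corruption is not described, so comparing a hash of the jail’s file tree against the master-copy is an assumption made for this example. The function names are invented, and stopping the jailed processes before the restore is left out.

```python
import hashlib
import os
import shutil
import tempfile


def tree_digest(root):
    """Hash relative paths and file contents under `root` in a stable order."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest.update(os.path.relpath(path, root).encode() + b"\0")
            with open(path, "rb") as fh:
                digest.update(fh.read() + b"\0")
    return digest.hexdigest()


def restore_if_corrupted(jail_root, master_copy):
    """Replace `jail_root` with `master_copy` if they differ; True if restored."""
    if tree_digest(jail_root) == tree_digest(master_copy):
        return False
    shutil.rmtree(jail_root)
    shutil.copytree(master_copy, jail_root)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        master = os.path.join(tmp, "master")
        jail = os.path.join(tmp, "jail")
        os.mkdir(master)
        with open(os.path.join(master, "index.html"), "w") as fh:
            fh.write("hello\n")
        shutil.copytree(master, jail)
        with open(os.path.join(jail, "index.html"), "w") as fh:
            fh.write("defaced\n")
        print(restore_if_corrupted(jail, master))  # True: the tree was replaced
        print(restore_if_corrupted(jail, master))  # False: nothing to do
```

The important design point is that this code runs outside the jail, so the attacker can neither see it nor tamper with the master-copy.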
This “We see you, you cannot see us” property became a big hit in security applications.
Under normal circumstances, if somebody breaks into your computer, you face a nasty set of problems. For instance if you open a terminal connection, you may be talking to their special version of the sshd(8) daemon, and who knows what that does ?
As a minimum it probably sniffs your password.
Once you’re logged in, the attackers can see you and your processes. Much malware goes into hiding when it detects an interactive root-shell.
If you are running on perfectly virtualized servers, such as VMware, Xen or similar, you are as much subject to these problems as you are with dedicated hardware.
But if you put your outward facing services in a jail, you can comfortably log in using the unadulterated sshd(8) in the unjailed part of the system, and see what the attackers are up to, while they cannot see you.
Many of the tall tales I’ve been told about sysadmins playing cat&mouse with attackers have been downright hilarious.
Imagine you are an attacker, you’ve broken into a machine and now you’re trying to compile some exploit code to gain root privilege.
Only, every time you run the compiler it cannot find your source code ?
Because the source file seems to rename itself ?
At random ??
Or weird control-characters get sent to your terminal emulation so that the window moves around and changes sizes and fonts ???
It took some time before attackers learned to recognize jails but eventually they figured it out.
Robert Watson’s role¶
If the idea and implementation were all mine, why did Robert end up as co-author of the SANE paper ?
Because I felt I needed a non-danglish writing co-author.
Not only was Robert a really nice guy, he was also well versed in all the lingo and jargon of the security sub-culture, so I asked him if he wanted to help me out with the paper, and since the paper was a joint effort there was no doubt that he should be a co-author on it.
I’ve never asked what our collaboration on that article did for him in a larger sense. I suppose the citations count in his valuation as an academic if nothing else. It certainly helped me put words around some of my mental models of what jails were and what they could become, paving the road for shitty virtualization of a lot more stuff in the UNIX model.
Soon, Robert eagerly grabbed these ideas and ran with them, luring, bribing, goading, and assigning various innocent bystanders, eager volunteers and hapless students various missing bits, until jails in FreeBSD today, sometimes called “jailNG”, are very close to the vision we had back then, where you can pick and choose which namespaces you want your jail to have a private, semi-transparent version of, and which it should have access to the global namespace of.
So yes, Robert fully earned his role as co-author of the jail paper, maybe not by prior, but certainly by his subsequent work.
The important discovery in jails, and the reason our article has been cited almost 300 times, is that we accidentally discovered that imperfect virtualization can be, and often is, preferable to perfect virtualization, both as a matter of cost and as a matter of security.
Robert and I wrote another article, for ACM Queue about that in 2004.
Other operating systems have adopted this idea subsequently: Solaris, OS/X and Linux all sport it prominently these days, but I don’t think Microsoft has done so yet.
Weirdly enough, Sun even took out a dozen patents on patently obvious aspects of jails, including features which were present in the FreeBSD implementation when they sought their patents, and even citing our SANE article as a reference. If anybody ever sues you with one of those patents, drop me an email, I’ll be happy to help.
And finally: This little saga was triggered by the news that Google runs everything in the Linux version of jails now.
Yes, I’m proud :-)