A saga of funded FreeBSD development

Poul-Henning Kamp
Revision $Id: index.html,v 1.28 2004/12/15 22:28:07 phk Exp $

This page is the status page for my experiment with community funded development of FreeBSD.

I am not much of a blogging type, and all in all a bit protective of my privacy, so you will not find information about what kind of tea I prefer, the state of my son's bicycle or my opinion about the royal family on this page.

But hopefully you will find information about what the donated money is being spent on; otherwise feel free to send me questions by email.

I will try to add an entry here at least once a week.

Mon Dec 13 14:19:30 CET 2004

If you sent me something from my Amazon wishlist recently, please read this:

It seems that Amazon (sometimes ?) used an old shipping address for me and I can see that six books have been sent to me which I have never received.

Tomorrow I'll try to see if I can persuade the post office to give me anything that they still have waiting, but some of the packets may have gone back to Amazon. I've also sent them email to see if I can find out when the packets were sent, and tracking numbers if they have them.

If you have tracking numbers for something you sent, please send them to me in an email, that would help me a lot.

I apologize for not spotting the wrong shipping address earlier, my bad.

Mon Dec 13 14:19:30 CET 2004

The sound you hear in the background is the fat lady warming up her voice.

The kernel is converted to nmount(2) and most of the dust has settled as far as I can tell. I'm working on a corner case of the filedesc locking and have another couple of bugfixes in the pipeline.

I have sent a proposal to the FreeBSD Foundation, but I have not yet received their reply. If I get none or get a negative reply I will ponder over xmas if I want to do another round of fund-raising.

Thu Dec 2 10:58:21 CET 2004

OK, we're into the home stretch: according to my books there are approximately 11 days left of my funding, so I'm trying to get as much as possible crammed into that time.

Yesterday I committed the VOP typesafety thing which I have had sitting around for far too long.

I'm currently trying to get rid of the boot-time bogo-vnodes we create to find the root filesystem's disk. It's not quite ready yet.

As I have said on various occasions, I really don't want to be in the fundraising business if I can avoid it, so I have sent a proposal to the FreeBSD Foundation that basically says "continue along this path" at half-time effort during the first half of 2005, and I have a number of FreeBSD-related contracts in the pipeline for the other half.

If the Foundation does not feel able to support my FreeBSD habit, I will consider running another round of fundraising myself, but let's see if they don't come through this time.

Fri Nov 12 08:51:12 CET 2004

Apologies for not updating every week as I sort of promised, but I'm not really a diary kind of guy :-)

I have gone on into the depths of the buf cache and started to unravel the circular tangle. One particular success there is that VOP_BMAP() no longer returns a device vnode but only a device bufobj. With that in place, the device vnode is only known inside the individual filesystems, and in theory it is now possible to get rid of the vnode and only use a bufobj.

Before that is fully possible it is necessary to wean more of the buf-cache functions from vnodes, things like bread() etc.

The other thing I am working on is to change the root mounting code. Currently the central code tries to create a vnode for the root device, even if it is NFS. That code belongs in the individual disk-based filesystems. Or better yet: nowhere at all: if the filesystems didn't use a device vnode we would not need to create bogo-vnodes at all.

Anyway, that is an ongoing journey.

At the same time I'm working on pulling device access from userland out of the vnode layer and Giant. There is some partial code in -current which can be enabled with a boot time hint. It makes "dd if=/dev/zero of=/dev/null bs=1" go about twice the speed and some people report fsck speedups as well.

The select/poll code needs some work to get out from under Giant, in particular the FILEDESC_LOCK has to be sleepable. My phk_bufwork branch has a crude first stab at this and rwatson has verified that it doesn't affect network performance. Right now it is pending on the exact shape of the kind of lock we use for the FILEDESC lock.

Ohh, and I got fed up with make(1) starting hundreds of jobs when I said "make -j 12 universe", so I spent a day implementing a token pool which all submakes respect. Works nice, expect it in -current in a day or two.
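A minimal sketch of the token pool idea, for readers who have not met it before. The entry above does not say how the pool is implemented, so everything here is an assumption: the pool is modelled as a pipe holding one byte per free job slot, and the names token_pool, pool_init, pool_try_acquire and pool_release are hypothetical. The point is only that every submake which inherits the two descriptors draws from the same fixed supply of slots.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical token pool: a pipe that holds one byte per free job slot. */
struct token_pool {
    int rfd;    /* read end: take a token before starting a job */
    int wfd;    /* write end: hand the token back when the job ends */
};

/* Fill the pool with `slots` tokens. Returns 0 on success, -1 on error. */
int
pool_init(struct token_pool *tp, int slots)
{
    int fds[2], flags, i;

    assert(slots > 0);
    if (pipe(fds) == -1)
        return (-1);
    /* Non-blocking reads report "no slot free" instead of hanging. */
    flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1)
        goto fail;
    for (i = 0; i < slots; i++)
        if (write(fds[1], "+", 1) != 1)
            goto fail;
    tp->rfd = fds[0];
    tp->wfd = fds[1];
    return (0);
fail:
    close(fds[0]);
    close(fds[1]);
    return (-1);
}

/* Returns 1 if a token was taken, 0 if the pool is empty, -1 on error. */
int
pool_try_acquire(struct token_pool *tp)
{
    char c;
    ssize_t n;

    do {
        n = read(tp->rfd, &c, 1);
    } while (n == -1 && errno == EINTR);
    if (n == 1)
        return (1);
    if (n == -1 && errno == EAGAIN)
        return (0);
    return (-1);
}

/* Put a token back so another job, in any submake, may start. */
int
pool_release(struct token_pool *tp)
{
    return (write(tp->wfd, "+", 1) == 1 ? 0 : -1);
}
```

With a pool of 12 tokens, a thirteenth job in any process sharing the pipe finds it empty and has to wait until some job returns its token. A real make would wait for a child to exit rather than poll.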

Fri Oct 29 13:23:18 CEST 2004

There, did it. All local filesystems now go directly to GEOM instead of via DEVFS. Now we can get to the really interesting stuff :-)

Packing, heading off to Karlsruhe for EuroBSDcon'04, see you there!

Wed Oct 20 21:31:41 CEST 2004

OK, the tty work is completed now and the kernel is rid of 3100 lines of copy&paste code. Now we can start to look at getting Giant out of the tty code, a project I will put on the back burner for now to get some fresh air.

Now that the 5.3 branch has been cut, the p4::phk_bufwork branch is next. I need to find a way to cut it into commit-friendly bits and get it integrated in -current.

I have also started to ponder what I will do after this six-month project, and one crucial input to that is whether any of my cherished donors (that means you) have any comments they want to share with me: Do you feel you get your money's worth ? Would you be willing to pour more money into this or should I start looking for Real Work(tm) instead ? Email please...

Tue Oct 12 15:22:09 CEST 2004

Almost done integrating the tty driver changes now, only 18 files left.

Wed Sep 22 19:07:58 CEST 2004

Moving seriously forward now. I'm integrating both the phk_tty stuff and the phk_bufwork stuff into -current now. This involves a lot of testing and reading diffs, but it's moving.

It looks like people are OK with my tty consistency proposal and RELENG_5 is increasingly going its own way, so I will soon be able to merge some of the significant bits.

Sat Sep 11 20:37:55 CEST 2004

My arm is getting better, and I'm starting to work full bore again. SUCON'04 was a nice conference, and I spent a lot of time talking to Theodore Ts'o about the Linux Standard Base and why it needs to be about more than Linux and the past to become a success.

The phk_bufwork branch is pretty commit ready. I'm trying to cut it into a sequence of independent commits. It is a bit hard to separate out the last bit but we'll see how it goes. Expect code to hit -current next week.

Mon Aug 30 22:03:00 CEST 2004

Thanks for the many well-wishing emails. I can use both hands on a keyboard now for an hour at a time, but I can't sleep for more than a few hours before pains in my shoulder wake me up. I'm going to the hospital tomorrow and unless they ground me I'll be speaking at SUCON'04 in Zürich later in the week.

Nonetheless, I'm making progress: The phk_bufwork branch has only approx 40 lines of code left where buffers know too much about vnodes, and it still passes all my tests.

Those are of course the nasty 40 lines, and the three major bits they control are the buf->vnode backpressure signal, the syncer and the interaction with VM objects.

The "lemming-syncer" will need a lobotomy so it doesn't dump thousands of I/O requests into the I/O system once every 30 seconds anyway.

Sat Aug 21 14:49:41 CEST 2004 -- Ouch!

As you probably noticed, my rewritten floppy driver hit -current yesterday. Hopefully it will end up in 5.3 but time will show. The reason why this rewrite was necessary is that with my phk_bufwork branch you can only mount local filesystems from GEOM devices, so the driver had to be reeducated, and rather than just slap it I ended up fixing the issues I hit along the way and made it SMPng compliant.

That was the good news. The bad news is that I just bruised my left shoulder because my bike and I parted ways going downhill. Hopefully it'll get better in a few days but right now hitting escape is a bit too painful for comfort.

Tue Aug 17 19:12:17 CEST 2004

Making solid progress. The phk_bufwork branch now runs all local filesystems directly into GEOM and it is running on my two major development servers and on my testbox. Not on my laptop yet though :-)

I'm working on the floppy driver right now. The intersection between SMPng, mediasize autodetection and GEOM's tasting of devices as they appear is hard to nail down, but I'm getting closer, and the driver shrinks too.

I'll be speaking at SUCON'04 in Zürich a couple of weeks from now and will be attending EuroBSDcon'04 in Karlsruhe later this autumn.

I keep a wishlist on Amazon with books I'd like to get hold of, mostly for my own convenience, but it seems that somebody who wants to be anonymous found it and sent me a book. In case you read this: Thanks a LOT!

Sun Aug 8 19:00:42 CEST 2004

Hmm, I seem to have missed an update there, sorry about that, we've been vacationing and things have been a bit out of the usual routine here.

I've been hacking away in perforce and I have now managed to complete the struct buf/struct bio divorce and put all the disk based filesystems on GEOM. Things are moving. I expect all these changes to hit -current around Sep. 1st.

Sun Jul 25 10:36:20 CEST 2004 -- Change of plans.

OK, I've given up on getting anything more into 5-stable, there isn't enough time to get the politics and testing done. Most people have been more than happy about my tty patch but a single high-profile complaint prevents it from going forward in a manner which is compatible with the re@ crew's calendar.

I decided to bite the bullet and have started using p4 so that I can move forward with the stuff during the 5-stable freeze.

Wed Jul 14 15:54:34 CEST 2004

Still hacking my way through the tty layer. So far my patch removes 2755 lines of replicated code from the drivers.

In case you wonder why I'm doing this: Apart from the sanity and consistency this will bring to tty devices under FreeBSD, it also vastly reduces the number of places where we enter into the tty code. Instead of every driver having its own copy of cdevsw->open, they now all use the same one. This means there is only one place where I have to get the locking correct when I get to that, not 23 places.
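A small illustration of the shape of that change, with made-up names throughout: struct devsw, struct tty, generic_tty_open and the two driver tables below are hypothetical stand-ins, not the kernel's real cdevsw or tty definitions. What matters is that both drivers point at one open routine, so the locking decision is made in exactly one function.

```c
#include <assert.h>

/* Hypothetical stand-in for per-tty state. */
struct tty {
    int t_opens;    /* how many times the device has been opened */
    int t_locked;   /* how many times the (pretend) lock was taken */
};

/* Hypothetical stand-in for a driver's entry point table. */
struct devsw {
    int (*d_open)(struct tty *);
};

/*
 * The single shared open routine.  Whatever locking the tty layer ends up
 * needing is added here once, instead of in every driver's private copy.
 */
int
generic_tty_open(struct tty *tp)
{
    tp->t_locked++;     /* placeholder for taking the real tty lock */
    tp->t_opens++;
    return (0);
}

/* Two imaginary serial drivers that no longer carry their own open code. */
const struct devsw drv_a_devsw = { generic_tty_open };
const struct devsw drv_b_devsw = { generic_tty_open };
```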

Mon Jul 12 09:21:57 CEST 2004

Got more work done. Looked over the incomplete nmount implementation and found 200 lines of copy&paste code handling iovecs and uios. Moving the tty/cua and init/lock semantics of ttys up to the generic layer, that's currently removing about 200 lines of copy&paste for each of the affected serial drivers. Fixed a couple of GEOM issues. Trying to give kldunload a -f(orce) option so people can communicate what they want to happen more precisely.

Sun Jul 4 12:52:25 CEST 2004

Slow going this week, mostly because the kids were away so I had time to deal with a lot of practical issues and old promises to friends. Grog's email didn't help my mood any, but Markm inspired me to a public reply which did.

I have started to trace the outline of "struct bufobj", the gadget which will be what the buffer-cache controls buffers with. Today that is a vnode, but in the future we need an independent object for this purpose if we are to make the buffer cache useful for GEOM classes (caching RAID-5 parity sectors and GBDE keysectors) or put filesystems directly on GEOM.

Vnodes will obviously contain a bufobj, and various functions like gbincore() should take a pointer to the bufobj instead of the vnode. More later.
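To make the direction concrete, here is a rough sketch under stated assumptions. The field names and the helpers bufobj_insert and bufobj_lookup are invented for illustration and are not the real kernel definitions; bufobj_lookup merely stands in for a gbincore()-style function that takes the bufobj rather than the vnode. A vnode embeds a bufobj, while something that is not a vnode at all can own a bufobj directly and use the same lookup code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced buffer. */
struct buf {
    long b_lblkno;          /* logical block number */
    struct buf *b_next;     /* next buffer on the same bufobj */
};

/* Hypothetical bufobj: the thing the buffer cache hangs buffers on. */
struct bufobj {
    struct buf *bo_bufs;    /* list of buffers belonging to this object */
};

/* A vnode contains a bufobj, so filesystems keep working as before. */
struct vnode {
    int v_type;             /* placeholder for everything else in a vnode */
    struct bufobj v_bufobj;
};

/* Attach a buffer to a bufobj. */
void
bufobj_insert(struct bufobj *bo, struct buf *bp)
{
    bp->b_next = bo->bo_bufs;
    bo->bo_bufs = bp;
}

/* Find a cached buffer by block number; note there is no vnode in sight. */
struct buf *
bufobj_lookup(struct bufobj *bo, long lblkno)
{
    struct buf *bp;

    for (bp = bo->bo_bufs; bp != NULL; bp = bp->b_next)
        if (bp->b_lblkno == lblkno)
            return (bp);
    return (NULL);
}
```

A caller holding a vnode passes &vp->v_bufobj, and a caller without one (say, a GEOM class caching parity sectors) passes its own bufobj to the very same function.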

Sat Jun 26 11:56:55 CEST 2004

Our tty code is hairier than I imagined. I'm making headway but it is slow going. I've talked with the re@ team in the past week to make sure that I am spending my time the best way for 5.3/5-STABLE but it seems that the tty code is indeed the place to hit right now.

I have floated the patch which bypasses the vnode layer for userland access to devices. The patch is only active for Giant-free devices and that currently limits it to stuff like /dev/null, /dev/zero, /dev/random and GEOM devices. We have not been able to show sufficiently significant performance improvement in real world use (stuff like "dd if=/dev/zero of=/dev/null" runs nearly twice as fast, but that is not really interesting).

If I manage to pull through the tty stuff, this patch needs to go in to reap the benefits, but otherwise I do not think we can justify the risk for 5-stable.

Sat Jun 19 10:42:09 CEST 2004

I'm still fighting the tty code, I had hoped that pulling ptys out from under Giant would be a relatively easy thing but I'm getting wiser. If all else fails I will still be able to pull the "master" side (the side which sshd(8) or telnetd(8) works on) out from under Giant, but the actual tty code is such a bunch of weeds that it may be out of my reach. How far I will get depends quite a lot on how much resistance I will get to the stuff I will need to do along the way.

The dev_t conversion is done. I have decided to not do the si_ prefix change since nobody seemed to care much about that.

The Wemm-field seems to have calmed down again and I have not had any hardware troubles the last couple of days.

Wed Jun 16 00:45:57 CEST 2004 -- Hardware.

This has been an interesting week, but a lot of time has gone into mucking about with hardware in various shapes. My laptop started to seriously dislike being moved, so I caved in and ordered an IBM tp41p which I've spent most of today getting into shape. I have still to find a browser+java combo that works with my bank.

Mon Jun 7 09:24:02 CEST 2004 -- TTY code

I've managed to get the tty code sanitized a little bit. Still quite a mess. Marcel has gotten hold of some of the right stuff in uart(4), I'm trying to see how much we can exploit that. Wonder why we don't switch to mbufs instead of clists. Nearly done with the <sys/module.h> thing.

Wed Jun 2 09:17:42 CEST 2004 -- Aaaaand we're off...

It happened so nicely that Monday was a public holiday here in Denmark, so the workweek started right on the 1st of June, and I went straight for the tty subsystem in the hope that I can get it out from under Giant for 5.3.

What a mess. Cut&Paste of bugs is rampant and some of the layerings are nonexistent (ioctl's for line disciplines for instance). Found a bunch of things which could be cleaned up and improved things by about 100 lines. I had quite forgotten how incestuous the tty code is with the process table. That is going to be a challenge locking wise.

The nmdm(4) driver was totally bogus. Somebody cloned the pty driver which has a tty side and a "raw" side. They ended up with a nmdm driver with two symmetric sides which were not quite ttys. Running a getty(8) on one side and using tip(1) on the other mostly worked, only no NL -> CRNL conversion happened. Yanked out the bogus read() and write() functions and found out that the direct approach does not work because the tty input path calls the tty output path which would call the other side's input path which would call the output path which would... Used a taskqueue to calm things down.

The fact that we default our ttys to start out in ECHO mode is a problem since by the time the late side of nmdm opens, the other side may spew characters at it before it ever gets to make an ioctl to eliminate ECHO. Hacked this for nmdm by not setting ECHO. Correct fix would be to not raise DTR on tty ports until the first non-ioctl system call (ie: read, write, poll, select or kqueue). I doubt this is a big enough problem to get attention in these networked days.
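The recursion problem and the deferral trick can be shown in a userland toy. This is only an analogue, assuming a fixed-size ring of pending deliveries; it is not the kernel taskqueue(9) interface and not the nmdm code, and struct side, side_output, side_input and run_tasks are hypothetical names. The output path never calls the peer's input path directly, it just queues the character, so the call stack stays flat no matter how the two sides bounce characters at each other.

```c
#include <assert.h>
#include <stddef.h>

#define QLEN 16

/* One end of a hypothetical two-sided null-modem pair. */
struct side {
    struct side *peer;  /* the other end */
    int echo;           /* echo received characters, like ECHO mode */
    char rx[QLEN];      /* characters received so far */
    size_t nrx;
};

/* A deferred delivery: hand character c to dst's input path later. */
struct task {
    struct side *dst;
    char c;
};

static struct task queue[QLEN];
static size_t qhead, qtail;     /* ever-increasing ring counters */
static int depth, max_depth;    /* nesting of side_input(), for illustration */

/* Output path: queue the character for the peer instead of calling it. */
int
side_output(struct side *s, char c)
{
    if (qtail - qhead == QLEN)
        return (-1);            /* queue full, character dropped */
    queue[qtail % QLEN].dst = s->peer;
    queue[qtail % QLEN].c = c;
    qtail++;
    return (0);
}

/* Input path: store the character and echo it if asked to. */
void
side_input(struct side *s, char c)
{
    if (++depth > max_depth)
        max_depth = depth;
    if (s->nrx < QLEN)
        s->rx[s->nrx++] = c;
    if (s->echo)
        (void)side_output(s, c);    /* queued, so no recursion here */
    depth--;
}

/* Run up to `limit` deferred deliveries; returns how many were run. */
size_t
run_tasks(size_t limit)
{
    size_t n = 0;

    while (qhead != qtail && n < limit) {
        struct task t = queue[qhead % QLEN];

        qhead++;
        side_input(t.dst, t.c);
        n++;
    }
    return (n);
}
```

If both sides echo, characters still ping-pong forever, but as a bounded trickle of queued work rather than as unbounded recursion, which is why the `limit` argument is there.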

Started today by getting rid of a few PRs, including a 6 year old bug in RELENG_4's i386 and alpha pmap prefault optimization.

Sat May 29 23:41:15 CEST 2004 -- Ready, aim ...

Things are winding down, or if you prefer, up, nicely. Robert and I got a sensible paper out of our effort I think, I got the LEDs to work on the PC engines box and now I only have two outstanding issues from previous contracts to polish off. The receipt/invoices generated 6 bounces in total, that is par for the course, but I don't know how many copies were swallowed by spam filters. Still need to write a bit for the next status report. I have a number of emails stuck in my inbox which I need to deal with now that I will have some spare time again, a lot of people have been very patient with me the last half year and I need to make it worth their wait.

And as always there are new hardware problems to deal with. On Monday one of my network switches blew two caps and stopped working. Yesterday my trusty ASUS M1300 laptop added a few new sounds to its repertoire, so I think a new laptop is in my very near future. I wish I could find a 4 pound AMD64 based laptop, but I guess I'll have to settle for the Asus S5200N or similar.

So what will I actually do come Tuesday morning ? First order is to look at what low-hanging fruit I might be able to get into 5.3. There is a chance that I might be able to get pty's out from under Giant, and that would really help the interactive response. There are some ugly details though, select(2)/poll(2) being one of them, but I will give it a shot anyhow.

Mon May 24 13:47:06 CEST 2004 -- Finishing paper work

My auditor looked over the paperwork today and found no problems apart from the fact that the PayPal fees have to be "inside" the sales tax calculation instead of outside. Final Receipt/Invoices have been generated and will be mailed out in a few moments. I wonder how many bounces that will get me :-)

And guess what, I tried to mail one of the emails to myself and my trusty annoyance-filter (http://www.fourmilab.ch/annoyance-filter/) ate it. I armored the email with a few choice keywords at the bottom, I hope it makes it through your filters. Yell at me if you didn't get it.

Thu May 20 15:55:54 CEST 2004 -- Doing paper work

I am almost ready to begin on June 1st, I only have two small projects and a paper I'm writing with Robert Watson left on the TODO list.

Right now I am busy doing the paper work. I have written a script to parse out the PayPal information and generate an invoice/receipt using troff. Yes, I'm that old-fashioned. The bank transfers will be done by hand since I do not have the detailed information readily available in machine-readable format. As soon as my auditor has approved them, the emails will go out. I will have to send out an announce@ email since there are a few of the bank-transfers I do not have email addresses for.

I decided to leave the USD donations in the PayPal account in case I need to purchase anything from the US in the near future; that way I save a bit of fees here and there, but it does complicate the paperwork a little bit. My auditor may nix that idea though, time will tell.

My lab is almost back to its usual configuration again, the Opteron box is waiting for a replacement of the other half of the sick Kingston RAM and then it can go back on its shelf. Right now I don't quite trust it fully, so I plan to retain the dual-Athlon as main compile/devel host and use the diskless K7, a soekris and the Opteron as main test machines, with the sparc64 and alpha and various others for more specific tests. (This is not only a matter of convenience but also economy and cooling).
