Friday, 24 October 2008

Tarballs, tarballs everywhere (or "More madness of glibc")

My previous posting on the packaging of GNU's glibc project seems to have created quite a stir (see here and here). There's been lots of informed, and some uninformed, debate on the subject.

The glibc project has made the informed choice to abandon the traditional tarball method of source code release, instead directing developers to get the code from the project's CVS repository directly. Thankfully, following my earlier post, instructions have been put up on the public website so it's now clearer how to obtain the latest version of the glibc source from CVS. So my initial problem has now been solved. Thanks, guys!

Now, clearly this isn't the End Of The World. And any developer capable of working usefully with glibc should be more than competent enough to operate CVS themselves. So stop whinging and get over it... right?


The madness of CVS

Source code projects should always be developed using a Source Code Management system, like CVS (or Subversion, or perhaps a more modern, fashionable, distributed SCM like Bazaar). This is common sense. It's good practice. It's important. Each release should be managed in the SCM on a release branch, and public code drops should be recorded with tags (or labels). This is accepted release engineering practice.

However, downloading a source code release from CVS (or any other SCM) is a really poor distribution mechanism. Here's why:
  • It's not conventional. Unix source packages are traditionally distributed as tarballs (either tar.gz or tar.bz2 tarballs). It's the way the world works. Not providing tarballs forces users to work harder, and makes them less likely to use your code. However, this is not a compelling reason for glibc to distribute tarballs. Indeed, for glibc it's not even an issue. The project is important enough that people will do whatever is necessary to get the source. The glibc developers can distribute their code however they like. And I'll always say thank you.
  • It's not secure. Downloads sometimes go wrong. Unusual, but it can happen. Servers can also be hacked, and the code fiddled with. The accepted convention for this is to distribute MD5 checksums of the source tarballs that the downloader can use to ensure that their tarball is indeed the original. It's far easier to do this on a tarball than a CVS checkout (and there is currently no check mechanism for CVS exports from glibc). This is important, but is also not a compelling reason for glibc to distribute tarballs.
  • It's fallible. By definition, the code in an SCM is dynamic. It changes over time. The codeline in the release branch changes as the release nears completion. True, the 2.7 release, and the 2.8 release and the 2.8.1 release can be individually tagged. However, these tags are dynamic. There is no guarantee that at a point some time after the 2.7 release, someone can't go back to the CVS repository and move the 2.7 release tag. The glibc 2.7 release could be different on Tuesday than it was on Monday! Sure, good developers shouldn't do that. But they might. By mistake or otherwise. In less capable SCM systems, like CVS, there is no way of usefully tracking the history of the tag. Release tags in CVS are not static. The only safe way to construct a stable, unchanging release snapshot of the code is to archive a tarball. Say, on an FTP site. This is a compelling reason not to download glibc source code releases from the project's CVS repository.
I believe that CVS access is simply not an acceptable release mechanism for glibc, or for any other project.
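
To make the checksum point above concrete, here is a minimal Python sketch of checking a downloaded tarball against a published checksum. It is an illustration only: the function names are my own, and the published digest would be whatever the project posts next to its tarball. MD5 is the default simply because it is the convention mentioned above; the algorithm name is a parameter.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large tarball is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex, algorithm="md5"):
    """Compare a downloaded file against the checksum the project published."""
    # Published checksums are often pasted with stray whitespace or in upper case.
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

A single file gives you a single digest to compare. A CVS checkout is a whole tree of files, which is why the same check is so much more awkward there.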

There has to be another way

Additionally, people have been pointing out third party ways to get glibc more easily, or to track its progress more easily, like an external gitweb interface. This is also not an acceptable solution for easily pulling glibc releases. As a developer who would like to use a pure, unadulterated glibc source code release, I can only accept source code that I obtain directly from the glibc project itself.

I know these third parties are trying to perform a useful service, and are doing this with the best intentions. But, unfortunately, I can only accept source code drops that come directly from the official project. Not from unverifiable third party mirrors, or from tarballs packaged by a third party (like Fedora - how can I be sure that over time they won't put their own Fedora-specific patches in their version of glibc?).

If these are useful services, the providers should join the hard working glibc team, and perform their packaging services under their umbrella as an official service.


I have nothing against the glibc developers, or the glibc project itself. I appreciate all of their hard work. I offer this as constructive feedback. And I particularly do not like personal abuse directed at individual people. They are all hard working, clever guys who benefit the community greatly. I hope that I give back to the community a fraction of what they have done!


Anonymous said...

True, the 2.7 release, and the 2.8 release and the 2.8.1 release can be individually tarred. However, these files are dynamic. There is no guarantee that at a point some time after the 2.7 release, someone can't go back to the ftp server and edit the 2.7 release tarball. The glibc 2.7 release could be different on Tuesday than it was on Monday!

Pete Goodliffe said...

This is a very valid point.

However, the tarballs tend to propagate around the net and the GNU mirrors pick them up quickly. Changes in tarballs would be spotted a lot more obviously than single file modifications within the CVS repository.

shevegen said...

First, let me state that I believe that your blog series about this issue is GOOD to have.

As my tone seemed to be a bit too aggressive I decided to tone down my critique of individuals in general, no matter if it would be justified or not, as even this is subjective or may be felt as unfair...
(But on a personal note I still believe that developers who display strong opinions towards others do not deserve to be let off too lightly, no matter WHO the developer in question is. Even though I agreed with i.e. Linus on some points about Gnome, I felt the tone was inappropriate.)

But enough of this, sustaining a hostile atmosphere would be bad anyway in general.

The critique about certain procedures however fully stands as it is from my point of view, and I agree with you about what you wrote here - personally, I still believe that projects that are unable to provide tarballs demonstrate a certain degree of incompetence.

Even the git-folks of Linus' kernel are able to provide tarballs (or patch releases to update your latest tarballs).

I am not sure if it is appropriate to compare different projects, but I will try to do so "briefly" from my personal experience:

- KDE and Gnome

The big KDE and Gnome projects both update their source code & tarballs for users rather regularly. In my opinion the Gnome libraries are updated much more frequently than the KDE libraries, but probably with smaller, incremental changes. And KDE undertook the switch to Qt4 not long ago which was a quite big change, not without problems as far as I am concerned. For example the switch from autoconf to cmake was quite significant in my opinion. Cmake seems to insist on being in a specific build directory, with autoconf this is normally not needed - although actually, gcc and glibc seem to require separate build directories, so maybe the difference is not that great after all ...

Both projects regularly provide new tarballs and have ways to access the "latest and greatest" easily and transparently, more or less.

Sometimes there are problems to compile something, but normally, let's say... in 99% of the time, the source works fine, compiles cleanly and nicely. Issues can be resolved rather easily. On the kde website, you can quite quickly come to download the latest tarball.

For me as a user, I think this is quite a nice option to have if I choose to compile from source (or use a binary, i.e. I still have a Windows machine, and I often download some .exe files from sourceforge).

The developers behind these two projects seem to be reasonably friendly too, and given that Linux as Operating System is still some time away from a fully mainstream OS for _elderly_ people (who can be easily and quickly confused in my experience), so... I think it is ok to cater to younger people who can learn easily and quickly, and do not want to be spoonfed completely one-by-one when other people made decisions for them. Nothing against Ubuntu for example, and as I wrote earlier I agree with you about easy and simple distribution ways. Ubuntu has a great ambitious goal, just look at bug #1. That is great to have. All distributions should have big goals in mind.

But equally important are projects like Gentoo or Linux from scratch IMHO. They provide a wealth of information, lessening the knowledge-gap between developers and users. Or steps taken by KDE, which recently implemented a forum and focused on a user-centric wiki.

I think this is the right step to do.

Other projects should learn from such (if they have not done so already, many of course did so)

It seems user-friendly.

I digress ...

Let me finish about KDE & Gnome by saying that I think both projects should be lauded for their (continuous) work, and there are many "sub" projects done by different individuals (i.e. among KDE projects, some great ones are ktorrent, k3b, yakuake, smplayer).

As one can see these smaller projects often have great ideas, and evolve on their own, i.e. how amarok has evolved is quite impressive. (I personally use MPlayer for pretty much everything audio and video though, simply because I started with it and stuck with it over the years.)

I hope both Kde and Gnome will continue to cooperate for the benefit of their users by the way. And I hope there will not be "KDE vs Gnome" wars, because such "wars" are pointless to have and a waste of time.

Both projects should focus on empowering their users rather than compete with each other directly, but they should cooperate intensively with each other. is a step in the right direction, though at times it seems a little bit fuzzy ... - anyway, I am fine with it in general.

Let me now jump to Mplayer quickly.

- Mplayer

The attitude of the developers is a bit ambivalent. I have had tarball releases of Mplayer which had compile problems whereas the version before that worked and compiled better. Additionally they do not seem to enjoy providing tarball releases, in other words - they do so rather seldom. I do not complain much about this procedure though.
For one, I usually always got help on their IRC channel. And additionally, the project is still active as far as I can judge, i.e. developers have not abandoned it and continue to improve it. Contrast this with ogle: as far as I can see, ogle died long ago. A dead project is quite sad. Lessens the choice ...

The latest tarball release, MPlayer-1.0rc2.tar.bz2, dates back to 07-Oct-2007 though, which is now more than a year ago. So, as we can see, not that active on the tarball front ...

Mplayer is however a great project.

Now, let us quickly jump to a project which is even more peculiar - ffmpeg:

- Ffmpeg:

There are no more tarball releases. They tell you to do this instead:

svn checkout svn:// ffmpeg

So in other words they seem to be in a similar situation to the glibc developers.

No Tarballs for you folks (at least no somewhat recent ones that is, I think I have some rather old ffmpeg tarball somewhere on a backup tape from two or three years ago.).

So we have an array of different practices. I believe the BIG majority out there provides tarball releases. A smaller base provides only CVS/SVN access. Many provide both. Providing both is, I think, the better way.

I agree with developers who state that users can use CVS.
I am not disputing that. I can understand when they say they do want to make their life easier. But at the same time, I think users who want to have an easy life are equally important.

I can not understand why it is so difficult for a developer to create a tarball. Isn't anyone using scripting languages to perform exactly such trivial tasks anymore?
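
For what it is worth, here is a minimal sketch of such a script in Python. It assumes the release has already been exported from the SCM into a plain directory. The function name and any directory or archive names are made up for illustration, not taken from any real project's release process.

```python
import os
import tarfile


def make_release_tarball(source_dir, archive_path):
    """Pack `source_dir` into a bzip2-compressed tarball at `archive_path`."""
    # Keep everything under one top-level directory named after the source tree,
    # which is what people unpacking a source tarball normally expect.
    top_level = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, "w:bz2") as archive:
        archive.add(source_dir, arcname=top_level)
    return archive_path
```

Calling it with an exported tree and a target file name would be the whole "release" step, apart from uploading the result somewhere stable.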

We have companies who provide automated testing, backups, fully integrated reporting, wikis, bug trackers, continuous automated builds and what-not.

In the year 2008 these things are trivial. We have git and github, which are IMHO both wonderful ideas...

Let me jump back to glibc - Yes, glibc is a different beast compared to other applications. Replacing the existing glibc with a newer glibc can be a daunting task. (People are quick to shout at you to not do it, and people who provide help on doing exactly this are less common, as if this were an impossible task or as if the user would be too stupid to ever do this on his own... Hmm, I complain again, and digress.)

Let me throw in modular xorg quickly. It is a good contrast to glibc, because compiling modular xorg is also not that easy. The move away from monolithic tarballs to those 200-something individual tarballs was not a genius move per se, in my opinion. I do not want to flame that decision. I have read their opinion, but I am still unable to understand why there is no longer an easy way to compile a "monolithic" modular xorg. At least one can download the individual components and update on their own... sometimes it does not work, i.e. when xorg-server introduced pixman, I even had an xorg-server tarball release complain about a pixman version which was not on their ftp server for 5 days before they updated it....

Anyway. I can understand that they feel it is an improvement to release individual components more readily.

But from my point of view, I do not understand why there was a need to make it so much harder to compile it compared to the old way.... Oh well. I digress AGAIN.

Let me conclude that compiling all of modular xorg on your own is more annoying than compiling glibc. I just feel that both projects here did not make the best decisions in light of being user friendly. As written above, other projects coped better with such situations.

It is the wrong signal to send to (potential?) users to enforce certain procedures, while other (i.e. GNU projects) continue to have them. Don't get me wrong - "smart" users can write scripts that automate certain tasks. I did so too, so my biggest gripe is not so much that things are hard to do, but instead that information is not so easy to find... and that these procedures rather complicate my life as an "advanced" user. It would be so much better if the developers in such projects could UNDERSTAND this point of view and tried to find ways to SUPPORT it as well.

Issues should be addressed, an attempt should be done to resolve them. Let me contrast this with Libtool.

I used to dislike Libtool a lot due to some issues, but after I wrote to their dev-mailing list and 3 different people all tried to help me instantly I changed my mind. You know how quickly I changed it?

It makes a huge difference to KNOW that there are people who try to help. I still think Libtool is somewhat problematic, and I have a hard time to enjoy the beauty of a huge shell script, but if a nice team actively drives it forward then I have a much smaller problem with it. I.e. hope for future changes to the better. It really really makes a big difference how users feel treated, and whether they feel alienated, or encouraged to support something.

Anyway, I wrote so much and have rather avoided some points you wrote here, so let me finish with these things:

- I agree completely about the labeling of software projects that have a poor distribution mechanism if they focus on svn/cvs etc.. only.
It is user unfriendly, and not needed in the year 2008. Who wants to see robots on Saturn and Jupiter building towns, but the software is only provided with CVS and SVN, or split into 200 tarballs?

- The security issue is something I in general am not interested in at all. I mean, i.e. Fedora may not appreciate if someone tampers with their machines and tries to abuse the good-will of Fedora to provide resources (source code, binary packages etc..) to people free of charge, but I as a user really hardly care. I want things to work easily.
I do not even blame anyone if I would ever have a security problem. I would trust the developers to fix these problems. Patience is good!
But I really just want that things work smoothly.
Making my life harder with security issues is not something I enjoy.

The whole security issue is, in my opinion, blown out of proportion too, but this is another matter ... I just think security should not hinder users too much. As far as I can see by the way, I never ever had a trojan or virus on my Linux boxes. Maybe I did not notice a trojan, but I think the more likely explanation really is that in my 6 years I really had none of either.

- I agree with you about gitweb. I however must say... gitweb looks more sexy than CVS stuff... I even happily click on all the summaries of the linux kernel project or other git-happy projects.
Makes me click-happy. Those colours are nice!

Anyway, I agree with you about the last point.

I do however want for developers to listen to users as well. Among the noise of users made in "complaints" or even "flames", there are sometimes well made points of critique, and it sometimes is my observation that these are simply ignored or ridiculed by developers. I may be wrong, and it depends on the climate of talks too, but I claim that this has happened in the past.

And if we want some real flames, we can read tuomov's blog. (I do not necessarily agree with all that he writes, and he seems to have a strong tone in his choice of words, but among that there are points which I agree with entirely, and in a way it revolves around listening to users as well.)

Michael said...

You can also add that using a CVS server hardly scales, and takes more resources server side.

But given the fact that few people need the glibc source, I guess this argument is not a compelling reason to use tarballs either.