[PATCH] add a new unlzma applet

Rob Landley rob at landley.net
Tue Jan 17 22:46:18 PST 2006


Copy the list on busybox design discussions, please.

On Tuesday 17 January 2006 19:16, Aurelien Jacobs wrote:
> On Tue, 17 Jan 2006 10:49:59 -0600
> > archival/tar.c:
> >
> > Is "a" the accepted key for this?  Or is this new?  (More curiosity than
> > anything else...)
>
> AFAIK, there's no accepted key for this. There was another lzma
> implementation which had a tar patch, but the web site is down and I can't
> remember what key the choosed
> (http://martinus.geekisp.com/rublog.cgi/Projects/LZMA). So "a" is my totaly
> arbitrary choice. You can change it to whatever you want.

I'm fine with "a".

> > CONFIG_LZMA_OPTIMIZE_FOR_SPEED should be a CONFIG_FEATURE (both for
> > consistency and so "make allbareconfig" can test that the applet builds
> > ok without it).  Perhaps CONFIG_FEATURE_LZMA_FAST?
>
> Ok. I don't know this CONFIG_FEATURE stuff very well, but I trust you.

It's pretty straightforward.  For testing we want to not just build 
"allyesconfig" and "allnoconfig" but also build all applets with their 
sub-features disabled so we can be sure the build doesn't break.  So "make 
allbareconfig" does an allyesconfig minus anything starting "CONFIG_FEATURE".

And this is something we'd want that testing coverage on.  (It's also good as 
documentation, the convention is "CONFIG_APPNAME" and 
"CONFIG_FEATURE_APPNAME_*".)

> > About CONFIG_FEATURE_DEB_TAR_LZMA: has anybody on the planet ever created
> > such an archive?  Are they likely to?  (You're not adding it to RPM...)
>
> Not that I know of. But some people already thought about this and have
> done some tests (with 7zip instead of using lzma directly) :
> http://www.linuks.mine.nu/sizematters/
> Anyway, that's so small and optionnal... I think it can't hurt.

Agreed.

> > archival/libunarchive/rangecoder.h:
> >
> > Hmmm, I'm not quite sure how to do small boilerplate for the LGPL since
> > we don't include the text of that license in the tarball.  (Should we?) 
> > Not a major issue...
>
> I think the LGPL licence text should be included in the tarball
> (as LICENSE-LGPL for example). Anyway, LGPL is already used by
> decompress_bunzip2.c.
> Si I used the small boilerplate for the LGPL.

I wrote the bunzip compress code (and have mostly finished bzip compression 
code lying around somewhere in the to-do pile of doom).  It's under LGPL so 
people can make a library out of it, outside of busybox.

Adding LICENSE-LGPL at the top level would imply that the whole of busybox is 
LGPL, which it isn't.  We'd need to find somewhere to shoehorn explanatory 
text.  Busybox (and all the individual files of busybox) is distributed under 
GPLv2.  Some individual contributed files are dual-licensed, but that doesn't 
mean the majority of busybox is dual licensed.  (The LGPL is inherently dual 
licensed due to the "if you don't like this license you can use the terms of 
the GPL instead" clause in it somewhere.)

For now, keep the boilerplate long for LGPL files, and we can figure out a 
more graceful way of dealing with it later.  (In both cases it's really 
"additional licensing terms when distributed outside of busybox", so it's not 
_exactly_ a busybox problem...)

> > Busybox needs to start up a "platform.h" file.  The version specific bit
> > with #ifndef always_inline really really _really_ belongs in a header
> > file somewhere.  Any time I see code checking a compiler version I get
> > uncomfortable.
>
> I all agree. But as long as there's no "platform.h" file, I can see no
> better way than keeping it in rangecoder.h.

Well let's do a platform.h then.  There's a discussion on the list about that 
now, and we should have something soon.  If you're trying to inline things, 
and you need this in platform.h to define it:

#ifndef always_inline
#  if defined(__GNUC__) && (__GNUC__ > 3 || __GNUC__ == 3 && __GNUC_MINOR__>0)
#    define always_inline __attribute__((always_inline)) inline
#  else
#    define always_inline inline
#  endif
#endif

Then that would go in platform.h.  (However, having looked closer at the code 
I think you could get away without inlining anything with a bit of judicious 
rewriting.  Busybox doesn't go in for inlining much, since we're optimizing 
for size...)

> > Does rc_read_byte() really need to be a function?  It's used in exactly
> > two places and expands to a very small amount of straightforward code
> > that's inlined anyway.  (A similar comment could be made about rc_test.)
>
> True. In fact that was a kind of raw implementation of standard rangecoder
> algorithm. But those functions where quite stupid, so removed.
>
> > And
> > rc_free() there's just no excuse for, it's just a wrapper around free()
> > that's only called once
>
> I somewhat disagree about this. On a purely technical POV, you're right.
> But at a higher "algorithmic" level, this rc_free() allow a total
> abstraction of the rangecoder.

Meaning what?

> unlzma() don't need to know how the 
> rangecoder is implemented, if it was malloced or whatever. And this allow
> to replace the rangecoder by another implementation without touching
> anything else. Anyway, rc_free() is allways_inline so this won't make any
> size difference. Ok, this is kind of nit-picking, so change it if you want.

Are you likely to replace the rangecoder with another implementation?

> > (and could conceivably have an
> > ENABLE_FEATURE_CLEAN_UP guard anyway, since it's at exit time.  It's
> > potentially a big chunk of memory being freed that I suppose I can see
> > but it's at exit time.)
>
> Ok, true. I'm not so much accustomed to these busybox specificities, but
> I'm learning :-)

We need to document them better.

> Done.
>
> > rc_read() is only used in rc_test, but if test is manually inlined in two
> > places then we have to look closer at the function.  And looking closer
> > at the function, the first three lines of it reimplement bb_xread_all(),
> > and
>
> Not exactly... It's closer to bb_xread(), as for the last buffer, it will
> try to read a full buffer even if the file is not big enough to fill it. So
> I used bb_xread().

Good point.  That doesn't do what you want when the read returns 0 at EOF, 
though...

> > hopefully the compiler is smart enough to treat the remaining two lines
> > as rc->buffer_end = (rc->ptr = rc_buffer) + rc->buffer_size; so as not to
> > dereference rc too many times unnecessarily.  (I know c++ compilers go
> > out of their way to optimize this sort of thing correctly, but I dunno if
> > it applies to the C part of it...)
>
> It seems it's smart enough (at least it makes no size difference).
>
> > Then again, bb_xread_all() is a piece of garbage because if the read ever
> > hits an error and returns -1, the sucker will spin endlessly.  We
> > _really_ need to clean up libbb.c.  But using (and fixing) bb_xread_all()
> > here would be nice...
>
> True. Attached patch will correct this issue (for apparently no size
> diffrence ??). But bb_xread_all() would certainly require some more
> improvements.

Fixing libbb is a long term project.  The first step is auditing and 
documenting what's there...

> > archival/libunarchive/decompress_unlzma.c:
> >
> > I believe __attribute__ ((packed)) generates warnings on some gcc
> > versions. (It's a stupid warning merely saying it's ignored, but that's
> > gcc for you.)
> >
> > On the indentation front again: I tend to treat docs/style-guide.txt as a
> > suggestion and the two spaces are what I personally default to anyway,
> > but the extra indentation before the curly brackets gives me bad
> > flashbacks to reading through the gcc source code in 1993.  And thus a
> > headache.  (Not your fault...)
>
> I personnaly don't like putting curly brackets on the same line as if() or
> while()... It reminds me some java coding :-(

The space after control statements ala "if (", but _not_ putting that space 
between function names and their parentheses, strikes me as just insane.  It 
annoys me deeply.  But I'm trying to do it anyway.  (And I _like_ the two 
space indents...)

*shrug*.  I'm not going to reject a patch due to that kind of whitespace 
issue.  As long as the file is consistent with itself I'll live.  But it 
would be better to be consistent with the rest of the project.

> And I also hate tabs as you can be sure that at least some people will
> display it as 8 spaces, which will make all the line warp at 80 cols...

"Doctor, it hurts when I do this..."

I hated them at first too, but I got used to them.  (What I still hate is 
mixing spaces and tabs.  That's just evil.  Yes, I program in python...)

> (BTW, when I have a quick look at a diff with less it uses 8 spaces tabs
> and looks horrible, but I guess that's also configurable).
>
> Anyway, that's all about the busybox project, so I now tried to best fit
> the style-guide (In fact I kind of trusted GNU indent ;-).

I don't nit-pick about the style guide but I do try to get the major bits 
right. :)

> > filter_accept_list_reassign.c:
> >
> > The #ifdef is unnecessary now that we've got ENABLE_ symbols:
> >
> >   if (ENABLE_FEATURE_DEB_TAR_LZMA && !strcmp(name_ptr, ".lzma")) {
> >
> > The ENABLE_ symbol should resolve to a constant 0 when the feature is
> > disabled, and thus the if() gets optimized out.
>
> Ok. Changed.
>
> > On the whole, really cool.
>
> Nice to hear :-)
>
> Aurel

It's late enough I'm about to collapse but if I haven't applied your patch by 
thursday remind me, ok?

Rob
-- 
Steve Ballmer: Innovation!  Inigo Montoya: You keep using that word.
I do not think it means what you think it means.


More information about the busybox mailing list