patch: .diff interpreted incorrectly?
Rob Landley
rob at landley.net
Mon Oct 10 17:38:16 UTC 2005
On Monday 10 October 2005 07:42, Bernhard Fischer wrote:
> Hi,
>
> ./patch is the busybox one, it behaves differently as GNU patch,
> i.e. it seems to patch the "wrong" file as compared to GNU patch
> version 2.5.9? "wrong" because i'd expect _both_ patch commands to patch
> the same file..
>
> Is this a feature or a bug in busybox' patch?
Figuring out which file to patch is a perennial problem with the patch
command, because diff lists differences between two files, so patch should
apply those changes to which of those two?
In real world use, only one of the two files mentioned ever actually exists
when you're applying a patch, so the problem doesn't come up much. (The diff
is between old and new versions of a file in the tree you're applying to. If
the diff is between two files that both exist in the tree you're applying to,
you created the diff wrong.)
So this is a "corner case that's at best handled by a heuristic anyway".
There was a nice discussion of this on linux-kernel years ago, but my
google-fu roll is failing. Basically, this is why they diff two entire trees
and then use -p1 to chop off the differing names. What's left is two
identical (and thus unambigous) paths to the file to modify.
On the other hand, if you diff "editors/sed.bak" and "editors/sed.c" and the
person who applies it has a sed.bak lying around in their tree, there's a 50%
chance the wrong file will be patched. (And it doesn't matter _which_ one
the heuristic picks because you could have made the patch the other way.
sed.c and sed.new, for example.) And this problem actually bites high-volume
projects like linux-kernel that marshall large numbers of patches through
many different actively developed trees using automatic scripts. A patch
that _fails_ to apply gives you a warning, but one that happily applies to
the wrong file means hunks get lost and not forwarded.
Plus the scripts that automatically want to figure out _which_ files the patch
applies to (to generate a diffstat, for example) have a bit of trouble
figuring out if it should be "sed.bak" or "sed.c" in the absence of a tree to
apply the patch to. The "diff one level up and strip with -p1" trick solves
that nicely, too.
> $ cp -a TMP/* .;diff -qs one two ;patch -p0 <test.diff;diff -qs one two;ls
> -l one two Files one and two are identical
> patching file one
> Files one and two differ
> -rw-r--r-- 1 b b 14 Oct 10 13:43 one
> -rw-r--r-- 1 b b 16 Oct 10 11:30 two
> $ cp -a TMP/* .;diff -qs one two ;./patch -p0 <test.diff;diff -qs one
> two;ls -l one two Files one and two are identical
> patching file two
> Files one and two differ
> -rw-r--r-- 1 b b 16 Oct 10 11:30 one
> -rw-r--r-- 1 b b 14 Oct 10 13:43 two
It looks like the gnu heuristic is checking the "old" first file first, and
we're checking the "new" file first. We can do that, if you think it
matters. But in the end, you're just testing that the heuristic is
consistent in a situation where there _is_ no right answer. (Ours actually
makes more sense than gnu, as usual. We're picking the "new" file. The
result is that the pair of files look like what you diffed from after you
apply the patch. Why is that wrong?)
If you want to poke around with patch, there's plenty of to-do items. In
addition to fairly standard usage ("patch -p1 -i thingy.patch") not currently
working right due to yesterday's message about command option parsing going
"boing", we have two current problems with patch:
1) No fuzz factor support. (New feature, but useful new feature people
actually miss if it's not there.)
2) We're not close to full susv3 feature set:
http://www.opengroup.org/onlinepubs/009695399/utilities/patch.html
Although our response to -c, -e, and -n should probably be "wontfix" because
those are obsolete formats we have no incentive to support. (Only unified
diff format actually matters anymore. Probably if full susv3 support is
enabled we should catch these and print some kind of brief "this lack is
intentional" message.)
The other susv3 command line options would actually be nice to have, if they
can be configured out. (I can see -b, -d, and -l actually coming in handy.)
The essential minimal subset is really just -p and -i support, with no fuzz
factor, which is enough to consistently apply real-world patches via the
linux-kernel method (the ones that apply cleanly and aren't
whitespace-damaged, anyway).
Rob
More information about the busybox
mailing list