patch: .diff interpreted incorrectly?

Mon Oct 10 17:38:16 UTC 2005

On Monday 10 October 2005 07:42, Bernhard Fischer wrote:
> Hi,
>
> ./patch is the busybox one, it behaves differently as GNU patch,
> i.e. it seems to patch the "wrong" file as compared to GNU patch
> version 2.5.9? "wrong" because i'd expect _both_ patch commands to patch
> the same file..
>
> Is this a feature or a bug in busybox' patch?

Figuring out which file to patch is a perennial problem with the patch 
command, because diff lists differences between two files, so patch should 
apply those changes to which of those two?

In real world use, only one of the two files mentioned ever actually exists 
when you're applying a patch, so the problem doesn't come up much.  (The diff 
is between old and new versions of a file in the tree you're applying to.  If 
the diff is between two files that both exist in the tree you're applying to, 
you created the diff wrong.)

So this is a "corner case that's at best handled by a heuristic anyway".

There was a nice discussion of this on linux-kernel years ago, but my 
google-fu roll is failing.  Basically, this is why they diff two entire trees 
and then use -p1 to chop off the differing names.  What's left is two 
identical (and thus unambigous) paths to the file to modify.

On the other hand, if you diff "editors/sed.bak" and "editors/sed.c" and the 
person who applies it has a sed.bak lying around in their tree, there's a 50% 
chance the wrong file will be patched.  (And it doesn't matter _which_ one 
the heuristic picks because you could have made the patch the other way.  
sed.c and sed.new, for example.)  And this problem actually bites high-volume 
projects like linux-kernel that marshall large numbers of patches through 
many different actively developed trees using automatic scripts.  A patch 
that _fails_ to apply gives you a warning, but one that happily applies to 
the wrong file means hunks get lost and not forwarded.

Plus the scripts that automatically want to figure out _which_ files the patch 
applies to (to generate a diffstat, for example) have a bit of trouble 
figuring out if it should be "sed.bak" or "sed.c" in the absence of a tree to 
apply the patch to.  The "diff one level up and strip with -p1" trick solves 
that nicely, too.

> $ cp -a TMP/* .;diff -qs one two ;patch -p0 <test.diff;diff -qs one two;ls
> -l one two Files one and two are identical
> patching file one
> Files one and two differ
> -rw-r--r--  1 b b 14 Oct 10 13:43 one
> -rw-r--r--  1 b b 16 Oct 10 11:30 two
> $ cp -a TMP/* .;diff -qs one two ;./patch -p0 <test.diff;diff -qs one
> two;ls -l one two Files one and two are identical
> patching file two
> Files one and two differ
> -rw-r--r--  1 b b 16 Oct 10 11:30 one
> -rw-r--r--  1 b b 14 Oct 10 13:43 two

It looks like the gnu heuristic is checking the "old" first file first, and 
we're checking the "new" file first.  We can do that, if you think it 
matters.  But in the end, you're just testing that the heuristic is 
consistent in a situation where there _is_ no right answer.  (Ours actually 
makes more sense than gnu, as usual.  We're picking the "new" file.  The 
result is that the pair of files look like what you diffed from after you 
apply the patch.  Why is that wrong?)

If you want to poke around with patch, there's plenty of to-do items.  In 
addition to fairly standard usage ("patch -p1 -i thingy.patch") not currently 
working right due to yesterday's message about command option parsing going 
"boing", we have two current problems with patch:

1) No fuzz factor support.  (New feature, but useful new feature people 
actually miss if it's not there.)

2) We're not close to full susv3 feature set:

http://www.opengroup.org/onlinepubs/009695399/utilities/patch.html

Although our response to -c, -e, and -n should probably be "wontfix" because 
those are obsolete formats we have no incentive to support.  (Only unified 
diff format actually matters anymore.  Probably if full susv3 support is 
enabled we should catch these and print some kind of brief "this lack is 
intentional" message.)

The other susv3 command line options would actually be nice to have, if they 
can be configured out.  (I can see -b, -d, and -l actually coming in handy.)

The essential minimal subset is really just -p and -i support, with no fuzz 
factor, which is enough to consistently apply real-world patches via the 
linux-kernel method (the ones that apply cleanly and aren't 
whitespace-damaged, anyway).

Rob