How to fix an Index which contains the same file name multiple times, but with different encodings
Background
The Index file (.git/index
) contains meta-information related to the
working tree and is constrained by Git to have exactly one entry per
file (and stage level). While Git handles file names as raw
byte-sequences, SmartGit interprets them as (Unicode-)Strings to
display them more nicely. Certain parts of a byte sequence may not
represent a valid encoding, hence the relevant parts of the byte
sequence will be replaced by the ‘Unicode replacement character ‘ (�,
U+FFFD). If the Index contains two byte sequences which differ only in
‘unconvertible’ parts, the resulting Strings may be identical. Such
Index entries are confusing (not only SmartGit) and need to be fixed.
Note
This kind of problem might occur after updating to msysgit 1.7.10 and not properly converting non-unicode file names.
Resolution
If an encoding problem in the Index is encountered, SmartGit will show an error message which contains information about the problematic file. The resolution is performed using Git from the command line:
cd
to your repository.- Invoke
git status
to be sure you have a clean working tree and Index. If not clean, either commit or stash away your local modifications. - For msysgit, on Windows: invoke
bash.exe
to enter Git’s shell. - Invoke
git ls-files | grep <readable-part-of-the-file-name>
. - Inspect the output of the command and identify the offending entries: usually they will be almost identical, but only differ at certain positions where the octal encoding (\xyz) slightly varies. In the next step we will remove these offending entries from the repository (and re-add them again, if necessary). Hence, for every offending entry <path-to-file>:
- Make a backup of <path-to-file> and
- Invoke
git rm --cached "`printf "<path-to-file>"`"
(Note: all the quotes are necessary!).
Note
On Windows, you may also work without
bash.exe
: just open a command line (DOS box) and invokegit rm --cached <path-to-file-pattern>
with<path-to-file-pattern>
being an unambiguous pattern of the file’s path where the non-ASCII characters are replaced by?
. E.g.:x?y.txt
.
- After all entries have been removed, invoke
git commit -m "clean up bad encoding of file names"
. - If necessary, re-add the entries (this time with correct encoding)
with
git add .
and finally commit withgit commit --amend -m "clean up bad encoding of file names"
. - Now SmartGit should be able to open the repository. Don’t forget to push your changes!