# Truncating file names with Unicode characters

When shortening file names that contain Unicode characters, git truncates them
byte by byte, without awareness of multi-byte UTF-8 characters (Cyrillic
letters, for example, take two bytes each). That often splits a character in
half and leaves a garbage byte in the output.
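
The effect can be reproduced outside git. The sketch below is only an illustration, not git's actual code: the file name `машина.txt` and both helper functions are hypothetical. It keeps the last N bytes of a UTF-8 name, first naively and then while skipping any leading continuation bytes (those of the form `10xxxxxx`).

```python
def tail_bytes(name: str, max_bytes: int) -> bytes:
    """Keep the last max_bytes bytes of the UTF-8 form, ignoring character boundaries."""
    if max_bytes <= 0:
        return b""
    return name.encode("utf-8")[-max_bytes:]


def tail_bytes_safe(name: str, max_bytes: int) -> bytes:
    """Same as tail_bytes, but never start in the middle of a character."""
    tail = tail_bytes(name, max_bytes)
    # UTF-8 continuation bytes match 0b10xxxxxx; drop any that lead the tail.
    while tail and (tail[0] & 0xC0) == 0x80:
        tail = tail[1:]
    return tail


name = "машина.txt"  # hypothetical file name, 16 bytes in UTF-8

naive = tail_bytes(name, 11)
print(hex(naive[0]))                            # stray second byte of 'ш'
print(naive.decode("utf-8", errors="replace"))  # starts with a replacement character
print(tail_bytes_safe(name, 11).decode("utf-8"))
```

The stray byte here is `0x88`, the second half of `ш`; a terminal using Windows-1252 renders that byte as `ˆ`.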

The same lack of Unicode awareness means that file name length is calculated
incorrectly, so some output is misaligned.

I have tested this with git 2.14.1 on Windows and with git 2.11.0 on Linux. My
configuration includes setting `core.quotepath = off` to display Unicode paths.

# Example: `git log --stat`

## Bad output: half-characters and wrong text alignment

The last file name gets truncated in the middle of a character (`ˆ` is
what's left of it). Text alignment is off because string lengths are calculated
in bytes instead of characters.

    Extension/README.md                                |  28 +++++++++
    .../Catalog.Номенклатура.xml           |  32 ++++++++++
    .../Configuration.xml                              |   5 +-
    ...етПереработчика.ObjectModule.txt |  39 ++++++++++++
    ...cument.ОтчетПереработчика.xml |  68 +++++++++++++++++++++
    .../Enum.СтавкиНДС.xml                    |  24 ++++++++
    ...ˆирениеERPПотяркин_2018-06-05.cfe | Bin 0 -> 22018 bytes
    7 files changed, 195 insertions(+), 1 deletion(-)
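
The misalignment can be imitated with a few lines of Python. This is a sketch under assumptions, not git's implementation: the two helpers and the column width of 40 are made up for illustration, and the file names are taken from the output above. Padding computed from the byte length gives Cyrillic names too few spaces, while padding computed from the character count lines the columns up. Counting code points is itself a simplification, since it ignores wide and combining characters.

```python
def pad_by_bytes(name: str, width: int) -> str:
    # Measures the name in UTF-8 bytes, so multi-byte characters get too little padding.
    pad = max(0, width - len(name.encode("utf-8")))
    return name + " " * pad


def pad_by_chars(name: str, width: int) -> str:
    # Measures the name in code points; adequate for Cyrillic, not for wide characters.
    pad = max(0, width - len(name))
    return name + " " * pad


names = [".../Configuration.xml", ".../Enum.СтавкиНДС.xml"]

print("padding by bytes:")
for n in names:
    print(pad_by_bytes(n, 40) + " |")

print("padding by characters:")
for n in names:
    print(pad_by_chars(n, 40) + " |")
```

With byte-based padding the second row comes out nine columns short, one for each of the nine two-byte Cyrillic letters in the name.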

## Good output with ASCII file names

Truncation and alignment are done right because each character is represented
by a single byte.

    .../index.html                                             | 14 ++++++++++++++
    docs/posts/2017/loops-in-power-query-m-language/index.html | 14 ++++++++++++++
    .../index.html                                             |  7 +++++++
    .../temporary-virtual-environment-for-python/index.html    | 14 ++++++++++++++
    .../index.html                                             | 14 ++++++++++++++
    docs/posts/2018/getting-started-with-libpq/index.html      | 14 ++++++++++++++
    .../index.html                                             | 14 ++++++++++++++
    .../2018/unit-testing-in-power-query-m-language/index.html |  7 +++++++
    8 files changed, 98 insertions(+)