[PATCH v13 00/10] convert: add support for different encodings

From: Lars Schneider <larsxschneider@xxxxxxxxx>

Hi,

Patches 1-6 and 9 are preparation and helper functions.
Patches 7, 8, and 10 are the actual change.

This series is based on v2.16.0 and Torsten's 8462ff43e4 (convert_to_git():
safe_crlf/checksafe becomes int conv_flags, 2018-01-13).

The series can be rebased without conflicts on top of v2.17.0:
https://github.com/larsxschneider/git/tree/encoding-2.17

Changes since v12:

* commit message improvement (Torsten)
* prevent undefined memcmp behavior in has_bom_prefix (Avar)
* improve the error message for a true/false working-tree-encoding:
  "true/false are no valid working-tree-encodings" (Torsten)
* fix crash in same_encoding() if only one argument is NULL (this bug
  was already present before this series, Eric)

Thanks,
Lars

  RFC: https://public-inbox.org/git/BDB9B884-6D17-4BE3-A83C-F67E2AFA2B46@xxxxxxxxx/
   v1: https://public-inbox.org/git/20171211155023.1405-1-lars.schneider@xxxxxxxxxxxx/
   v2: https://public-inbox.org/git/20171229152222.39680-1-lars.schneider@xxxxxxxxxxxx/
   v3: https://public-inbox.org/git/20180106004808.77513-1-lars.schneider@xxxxxxxxxxxx/
   v4: https://public-inbox.org/git/20180120152418.52859-1-lars.schneider@xxxxxxxxxxxx/
   v5: https://public-inbox.org/git/20180129201855.9182-1-tboegi@xxxxxx/
   v6: https://public-inbox.org/git/20180209132830.55385-1-lars.schneider@xxxxxxxxxxxx/
   v7: https://public-inbox.org/git/20180215152711.158-1-lars.schneider@xxxxxxxxxxxx/
   v8: https://public-inbox.org/git/20180224162801.98860-1-lars.schneider@xxxxxxxxxxxx/
   v9: https://public-inbox.org/git/20180304201418.60958-1-lars.schneider@xxxxxxxxxxxx/
  v10: https://public-inbox.org/git/20180307173026.30058-1-lars.schneider@xxxxxxxxxxxx/
  v11: https://public-inbox.org/git/20180309173536.62012-1-lars.schneider@xxxxxxxxxxxx/
  v12: https://public-inbox.org/git/20180315225746.18119-1-lars.schneider@xxxxxxxxxxxx/

Base Ref:
Web-Diff: https://github.com/larsxschneider/git/commit/3aa98e6975
Checkout: git fetch https://github.com/larsxschneider/git encoding-v13 && git checkout 3aa98e6975


### Interdiff (v12..v13):

diff --git a/convert.c b/convert.c
index 2a002af66d..1ae6301629 100644
--- a/convert.c
+++ b/convert.c
@@ -1222,7 +1222,7 @@ static const char *git_path_check_encoding(struct attr_check_item *check)
 		return NULL;

 	if (ATTR_TRUE(value) || ATTR_FALSE(value)) {
-		die(_("working-tree-encoding attribute requires a value"));
+		die(_("true/false are no valid working-tree-encodings"));
 	}

 	/* Don't encode to the default encoding */
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 884f0878b1..12b8eb963a 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -152,7 +152,7 @@ test_expect_success 'check unsupported encodings' '
 	echo "*.set text working-tree-encoding" >.gitattributes &&
 	printf "set" >t.set &&
 	test_must_fail git add t.set 2>err.out &&
-	test_i18ngrep "working-tree-encoding attribute requires a value" err.out &&
+	test_i18ngrep "true/false are no valid working-tree-encodings" err.out &&

 	echo "*.unset text -working-tree-encoding" >.gitattributes &&
 	printf "unset" >t.unset &&
diff --git a/utf8.c b/utf8.c
index 2d8821d36e..25d366d6b3 100644
--- a/utf8.c
+++ b/utf8.c
@@ -428,8 +428,12 @@ int is_encoding_utf8(const char *name)

 int same_encoding(const char *src, const char *dst)
 {
-	if (is_encoding_utf8(src) && is_encoding_utf8(dst))
-		return 1;
+	static const char utf8[] = "UTF-8";
+
+	if (!src)
+		src = utf8;
+	if (!dst)
+		dst = utf8;
 	if (same_utf_encoding(src, dst))
 		return 1;
 	return !strcasecmp(src, dst);
@@ -559,7 +563,7 @@ char *reencode_string_len(const char *in, int insz,
 static int has_bom_prefix(const char *data, size_t len,
 			  const char *bom, size_t bom_len)
 {
-	return (len >= bom_len) && !memcmp(data, bom, bom_len);
+	return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
 }

 static const char utf16_be_bom[] = {0xFE, 0xFF};


### Patches

Lars Schneider (10):
  strbuf: remove unnecessary NUL assignment in xstrdup_tolower()
  strbuf: add xstrdup_toupper()
  strbuf: add a case insensitive starts_with()
  utf8: teach same_encoding() alternative UTF encoding names
  utf8: add function to detect prohibited UTF-16/32 BOM
  utf8: add function to detect a missing UTF-16/32 BOM
  convert: add 'working-tree-encoding' attribute
  convert: check for detectable errors in UTF encodings
  convert: add tracing for 'working-tree-encoding' attribute
  convert: add round trip check based on 'core.checkRoundtripEncoding'

 Documentation/config.txt         |   6 +
 Documentation/gitattributes.txt  |  88 +++++++++++++
 config.c                         |   5 +
 convert.c                        | 276 ++++++++++++++++++++++++++++++++++++++-
 convert.h                        |   2 +
 environment.c                    |   1 +
 git-compat-util.h                |   1 +
 sha1_file.c                      |   2 +-
 strbuf.c                         |  22 +++-
 strbuf.h                         |   1 +
 t/t0028-working-tree-encoding.sh | 245 ++++++++++++++++++++++++++++++++++
 utf8.c                           |  65 ++++++++-
 utf8.h                           |  28 ++++
 13 files changed, 737 insertions(+), 5 deletions(-)
 create mode 100755 t/t0028-working-tree-encoding.sh


base-commit: 8a2f0888555ce46ac87452b194dec5cb66fb1417
--
2.16.2