Web lists-archives.com

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

lars.schneider@xxxxxxxxxxxx writes:

> +int is_missing_required_utf_bom(const char *enc, const char *data, size_t len)
> +{
> +	return (
> +	   !strcmp(enc, "UTF-16") &&
> +	   !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
> +	     has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
> +	) || (
> +	   !strcmp(enc, "UTF-32") &&
> +	   !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
> +	     has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
> +	);
> +}

These strcmp() calls seem inconsistent with the principle embodied
by utf8.c::fallback_encoding(), i.e. "be lenient to what we accept",
and make the interface uneven.  I am wondering if we also want to
complain when the user gave us "utf16" and there is no byte order
mark in the contents, for example?  Also "UTF16" or other spelling
the platform may support but this code fails to recognise will go

Which actually may be a feature, not a bug, to be able to bypass
this check---I dunno.

The same comment applies to the previous step.