
Re: Invalid UTF-8 byte? (was: Re: utf)




-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, Apr 04, 2018 at 03:44:23PM -0300, Henrique de Moraes Holschuh wrote:

[...]

> That said, it is always safe to break valid "modified UTF-8" into
> records using zeroes, as long as you don't expect the result to be valid
> UTF-8 (it isn't valid UTF-8 because NULs will be encoded using a
> non-minimal byte sequence that *will* decode to a zero even if it is
> invalid) or valid modified UTF-8 (it isn't valid modified UTF-8 because
> 0 is not valid as an encoding for NUL in modified UTF-8).  But a lax
> UTF-8 or modified UTF-8 decoder *would* parse "modified UTF-8 with zero
> as record separators" and reconstruct the Unicode text properly (but it
> would read the record separators as NULs, so you'd get extra NULs in the
> resulting text).

You are a nasty guy, aren't you ;-)

Pretty cunning...

Cheers
- -- t
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlrFJngACgkQBcgs9XrR2kZqLgCdEuap+rqSU6HCrXpkL6XHl3Az
lRUAnjwGhiMNNlY+SXwIxpd/kfnvst1z
=kHBa
-----END PGP SIGNATURE-----
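
For readers following the thread, the trick Henrique describes in the quoted paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from either poster: the function names are hypothetical, and the sketch covers only the NUL handling of modified UTF-8 (U+0000 written as the two-byte sequence C0 80). It ignores the other difference, the surrogate-pair encoding of characters outside the BMP, so it should only be fed BMP text.

```python
# Sketch only: helper names are hypothetical, and only the NUL rule of
# modified UTF-8 is modelled (BMP text assumed).

OVERLONG_NUL = b"\xc0\x80"  # non-minimal two-byte encoding of U+0000
SEPARATOR = b"\x00"         # a raw zero byte, used as the record separator


def encode_record(text: str) -> bytes:
    """Encode text as UTF-8, but write each U+0000 as C0 80."""
    # In UTF-8 the byte 0x00 can only come from U+0000, so a plain
    # byte-level replace is enough.
    return text.encode("utf-8").replace(b"\x00", OVERLONG_NUL)


def decode_record(data: bytes) -> str:
    """Inverse of encode_record for a single record."""
    # 0xC0 never occurs in valid UTF-8, so C0 80 is unambiguous here.
    return data.replace(OVERLONG_NUL, b"\x00").decode("utf-8")


def join_records(records: list[str]) -> bytes:
    """Concatenate encoded records, separated by raw zero bytes."""
    return SEPARATOR.join(encode_record(r) for r in records)


def split_records(blob: bytes) -> list[str]:
    """Split on raw zero bytes; embedded NULs survive as C0 80."""
    return [decode_record(chunk) for chunk in blob.split(SEPARATOR)]


def lax_decode(blob: bytes) -> str:
    """A 'lax' decoder: accepts both 00 and C0 80 as U+0000."""
    # Record separators come back as extra NULs in the text.
    return blob.replace(OVERLONG_NUL, b"\x00").decode("utf-8")
```

Given `blob = join_records(["a\x00b", "c"])`, `split_records(blob)` should give back the two original records, `lax_decode(blob)` should return the text with an extra NUL where the separator was, and Python's strict `blob.decode("utf-8")` should raise `UnicodeDecodeError`, since the strict decoder rejects the C0 80 sequence.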