
Re: Invalid UTF-8 byte? (was: Re: utf)




-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Apr 03, 2018 at 02:14:07PM +0200, Michael Lange wrote:
> On Tue, 3 Apr 2018 13:58:33 +0200
> Michael Lange <klappnase@xxxxxxxxxx> wrote:
> 
> > I believe (please anyone correct me if I am wrong) that "text" files
> > won't contain any null byte; many text editors even refuse to open such
> > a file, I guess since they assume it is a "binary" file.

Emacs and vi(m) do open, edit and save such a file (and make no fuss
about it). By default, both depict those NUL characters as ^@.

Are there other editors? (yah, that was a bit snarky, I know ;-)
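
For illustration, the "contains a null byte, so it must be binary" guess
mentioned in the quoted text can be sketched in a few lines of Python. This
is only a rough heuristic under my own assumptions; the name looks_binary is
made up for this example and is not how any particular editor decides.

```python
def looks_binary(data: bytes) -> bool:
    """Heuristic only: treat data containing a NUL byte (0x00) as binary."""
    return b"\x00" in data


# Plain UTF-8 text never contains a 0x00 byte unless it encodes U+0000 itself.
utf8_text = "tomás wrote this\n".encode("utf-8")
with_nul = b"field one\x00field two\n"

print(looks_binary(utf8_text))  # no NUL byte in this sample
print(looks_binary(with_nul))   # NUL byte present
```

Note that other control characters such as 0x04, 0x1C or 0x1E do not trip
this check, so a file using them as separators would still pass as "text".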

> > Probably it is the same with some other control characters like 04 (End
> > of Transmission). When I look at https://en.wikipedia.org/wiki/ASCII
> > it seems like 1C (File Separator) or 1E (Record Separator) might be 
> > appropriate choices for you. I'm no expert on this, though.

Just assuming "this won't happen" is a sure recipe for some debugging
fun. But perhaps the OP is looking for such fun. (S)he seems set on
trying...

> Addendum: iirc (again please correct me if I am wrong) unix file names
> may contain (at least in theory) any byte except 2F (the slash) and the
> null byte. So if your text files might contain arbitrary file names there
> may be (at least in theory) a (admittedly very small) chance that such a
> file name actually might contain any control character except the null
> byte.

You are correct on file names.
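
A consequence worth sketching: since the null byte is the one byte (besides
the slash) that cannot occur in a file name, it is the only separator that
is safe for a list of arbitrary file names. The Python below is a minimal
sketch of that idea; join_names and split_names are hypothetical helper
names, and the sample names are invented.

```python
def join_names(names):
    """Serialize file names (as bytes), terminating each one with NUL."""
    for name in names:
        if b"\x00" in name:
            raise ValueError("a file name cannot contain a NUL byte")
    return b"".join(name + b"\x00" for name in names)


def split_names(blob):
    """Inverse of join_names: split a NUL-terminated list of names."""
    if not blob:
        return []
    if not blob.endswith(b"\x00"):
        raise ValueError("last entry is not NUL-terminated")
    return blob[:-1].split(b"\x00")


# Names containing FS (0x1C), RS (0x1E) and even a newline survive intact.
names = [b"plain.txt", b"with\x1cFS", b"rec\x1esep", b"new\nline"]
assert split_names(join_names(names)) == names
```

Had 0x1C or 0x1E been chosen as the separator instead, the second and third
sample names would have been split in two.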

Cheers
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlrDdEgACgkQBcgs9XrR2kYbPgCeIA04MoYQleL5IDw5wwmerx0o
bqEAnA24L1+etC0tlCH2ExSdNigEPMDU
=iV6I
-----END PGP SIGNATURE-----