Re: Invalid UTF-8 byte? (was: Re: utf)
- Date: Tue, 3 Apr 2018 14:32:08 +0200
- From: <tomas@xxxxxxxxxx>
On Tue, Apr 03, 2018 at 02:14:07PM +0200, Michael Lange wrote:
> On Tue, 3 Apr 2018 13:58:33 +0200
> Michael Lange <klappnase@xxxxxxxxxx> wrote:
> > I believe (please anyone correct me if I am wrong) that "text" files
> > won't contain any null byte; many text editors even refuse to open such
> > a file, I guess since they assume it is a "binary" file.
Emacs and vi(m) do open, edit and save such a file (and make no fuss
about it). By default, both display those NUL characters as ^@.
Are there other editors? (yah, that was a bit snarky, I know ;-)
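For the curious, a minimal Python sketch of both behaviours mentioned above. It assumes a simple "contains a NUL byte" test for binary files (a simplification; real editors and tools apply their own rules) and standard caret notation, where a control byte b is shown as "^" followed by chr(b ^ 0x40), so 0x00 becomes ^@. The function names and the 8000-byte probe size are made up for illustration.

```python
def looks_binary(data: bytes, probe: int = 8000) -> bool:
    """Hypothetical heuristic: call data 'binary' if a NUL byte occurs
    within the first `probe` bytes (8000 is an arbitrary choice)."""
    return b"\x00" in data[:probe]


def caret_notation(data: bytes) -> str:
    """Show ASCII control bytes in caret notation (NUL -> ^@, DEL -> ^?).

    Tab and newline are left as-is; every other byte is mapped to the
    Latin-1 character with the same value, so arbitrary input never fails.
    """
    out = []
    for b in data:
        if (b < 0x20 or b == 0x7F) and b not in (0x09, 0x0A):
            out.append("^" + chr(b ^ 0x40))  # flip bit 6: 0x00 -> '@', 0x7F -> '?'
        else:
            out.append(chr(b))
    return "".join(out)


sample = b"foo\x00bar"
print(looks_binary(sample))    # True
print(caret_notation(sample))  # foo^@bar
```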
> > The same probably holds for some other control characters like 04 (End
> > of Transmission). When I look at https://en.wikipedia.org/wiki/ASCII
> > it seems like 1C (File Separator) or 1E (Record Separator) might be
> > appropriate choices for you. I'm no expert on this, though.
Just assuming "this won't happen" is a sure recipe for some debugging
fun. But perhaps the OP is looking for such fun. (S)he seems set on [...]
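To make the point about not assuming "this won't happen" concrete, here is a minimal Python sketch, assuming records are joined with the ASCII Record Separator (hex 1E) as suggested in the quoted message. Instead of trusting that the separator never occurs, the (hypothetical) join_records helper refuses to join a record that already contains it.

```python
RS = "\x1e"  # ASCII Record Separator (hex 1E)


def join_records(records: list[str]) -> str:
    """Join records with RS, failing loudly if any record already contains RS."""
    for i, rec in enumerate(records):
        if RS in rec:
            raise ValueError(f"record {i} contains the separator byte 0x1E")
    return RS.join(records)


def split_records(blob: str) -> list[str]:
    """Split data produced by join_records back into its records."""
    return blob.split(RS)


blob = join_records(["alpha", "beta"])
print(split_records(blob))  # ['alpha', 'beta']

try:
    join_records(["ok", "bad\x1erecord"])
except ValueError as err:
    print(err)  # record 1 contains the separator byte 0x1E
```

Raising an error is only one possible policy; escaping the separator would be another, at the cost of a more complex format.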
> Addendum: IIRC (again, please correct me if I am wrong) Unix file names
> may contain (at least in theory) any byte except 2F (the slash) and the
> null byte. So if your text files might contain arbitrary file names, there
> is (at least in theory) an (admittedly very small) chance that such a
> file name might contain any control character except the null
> byte.
You are correct on file names.
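Because a Unix file name can contain any byte except 0x2F and NUL, and a full path can contain any byte except NUL, NUL is the one byte that can safely delimit a list of arbitrary paths; this is the idea behind GNU `find -print0` and `xargs -0`. A minimal Python sketch, working on bytes because file names need not be valid UTF-8 (the helper names are hypothetical):

```python
def is_valid_name_component(name: bytes) -> bool:
    """A single file name component: non-empty, no '/' (0x2F), no NUL (0x00)."""
    return name != b"" and b"/" not in name and b"\x00" not in name


def encode_names(names: list[bytes]) -> bytes:
    """NUL-terminate each path, in the style of `find -print0`."""
    for n in names:
        if b"\x00" in n:
            raise ValueError("a Unix path cannot contain a NUL byte")
    return b"".join(n + b"\x00" for n in names)


def decode_names(blob: bytes) -> list[bytes]:
    """Split NUL-terminated paths (as produced by encode_names) back into a list."""
    return blob.split(b"\x00")[:-1] if blob else []


# Newlines, 0x1E and non-UTF-8 bytes are all legal in file names.
names = [b"report.txt", b"weird\nname", b"odd\x1ename\xff"]
blob = encode_names(names)
print(decode_names(blob) == names)              # True
print(is_valid_name_component(b"weird\nname"))  # True
print(is_valid_name_component(b"a/b"))          # False
```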
-- tomás