Re: Invalid UTF-8 byte? (was: Re: utf)
- Date: Tue, 3 Apr 2018 08:30:04 -0400
- From: Greg Wooledge <wooledg@xxxxxxxxxxx>
> Addendum: iirc (again please correct me if I am wrong) unix file names
> may contain (at least in theory) any byte except 2F (the slash) and the
> null byte. So if your text files might contain arbitrary file names there
> may be (at least in theory) a (admittedly very small) chance that such a
> file name actually might contain any control character except the null
One might question whether a file that contains a list of filenames is
really a "text file". It sounds more like a broken data file.
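To make the filename point concrete, here is a minimal sketch (not anything the OP posted) of why a newline-separated list of filenames is ambiguous while a NUL-separated one is not. The helper names join_names and split_names are made up for illustration, and the filenames are treated as raw bytes, since nothing guarantees they are valid UTF-8.

```python
# Hypothetical helpers for illustration only.

def join_names(names, sep):
    """Serialize filenames (bytes), terminating each one with sep."""
    return b"".join(name + sep for name in names)

def split_names(data, sep):
    """Recover the filenames from a sep-terminated list."""
    parts = data.split(sep)
    # A well-formed list ends with the terminator, leaving one empty tail.
    if parts and parts[-1] == b"":
        parts.pop()
    return parts

# All three are legal Unix filenames: no slash, no NUL byte.
# One contains a newline, one contains a byte that is invalid in UTF-8.
names = [b"plain.txt", b"two\nlines.txt", b"caf\xe9.txt"]

newline_list = split_names(join_names(names, b"\n"), b"\n")
nul_list = split_names(join_names(names, b"\0"), b"\0")

print(len(names), len(newline_list), len(nul_list))  # prints: 3 4 3
```

The newline-separated list comes back with four entries instead of three, because one name was split in half. The NUL-separated list survives the round trip, since NUL is one of the two bytes a filename cannot contain.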
The real question here (for the OP) is:
WHAT ARE YOU TRYING TO DO?
There was a glimpse a few messages back that looked like you were trying
to parse information out of an mbox-format mail folder. (I.e. a flat
file that has a concatenated series of mbox-format mail messages in it,
with all the silliness and problems inherent in this format, like having
to prefix body lines with ">" if they begin with "From ".)
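For anyone who hasn't run into the "From" quoting problem, here is a rough sketch of it, assuming the simplest variant of the format (often called "mboxo"), where only body lines that start with "From " (the word followed by a space) get a ">" prefix. The function names are invented for this example, and real mail software differs in the details.

```python
def escape_mboxo(body):
    """Prefix '>' to every body line that starts with 'From '."""
    return "".join(
        ">" + line if line.startswith("From ") else line
        for line in body.splitlines(keepends=True)
    )

def unescape_mboxo(body):
    """Strip one '>' from every line that starts with '>From '."""
    return "".join(
        line[1:] if line.startswith(">From ") else line
        for line in body.splitlines(keepends=True)
    )

# The second line already began with ">From " before it was ever stored.
original = "From here on it gets ugly.\n>From the quoted text.\n"

stored = escape_mboxo(original)
restored = unescape_mboxo(stored)

print(restored == original)  # prints: False
```

Once the message is stored, the two lines look identical, so nothing reading the folder back can tell which ">" was in the original message and which was added by the escaping.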
"I want to write a shell script to parse an mbox folder..." is enough
to send most people running away screaming. What other horrors are in
store for us next?
Of course, that might be a red herring, since you didn't actually tell
us what your goal is, or what your inputs are, and we're having to
guess at the moment based on tiny hints and information leaks.