Re: Invalid UTF-8 byte? (was: Re: utf)




Hi,

On Tue, 3 Apr 2018 14:32:08 +0200
<tomas@xxxxxxxxxx> wrote:

> > > Probably it is the same with some other control characters like 04
> > > (End of Transmission). When I look at
> > > https://en.wikipedia.org/wiki/ASCII it seems like 1C (File
> > > Separator) or 1E (Record Separator) might be appropriate choices
> > > for you. I'm no expert on this, though.
> 
> Just assuming "this won't happen" is a sure recipe for some debugging
> fun. But perhaps the OP is looking for such fun. (S)he seems set on
> trying...
> 
> > Addendum: iirc (again please correct me if I am wrong) unix file names
> > may contain (at least in theory) any byte except 2F (the slash) and
> > the null byte. So if your text files might contain arbitrary file
> > names there may be (at least in theory) a (admittedly very small)
> > chance that such a file name actually might contain any control
> > character except the null byte.
> 
> You are correct on file names.

Thanks for the clarification.

From what I have understood, whatever the files the OP wants to include
look like and whichever byte they choose as delimiter, they should at
least scan each file for that byte first. If it is actually found, they
should replace it with either an empty string or (probably better) some
sort of "tag" before loading the contents into the new database. That
way they can at least be sure that the chosen delimiter does not split
one record in two.
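
A rough sketch of that scan-and-replace step in Python, just to show the
idea. The delimiter 1E (Record Separator), the "<RS>" tag and the function
names are my own assumptions, not anything the OP has specified:

```python
def sanitize(data: bytes, delimiter: bytes = b"\x1e", tag: bytes = b"<RS>") -> bytes:
    """Replace every occurrence of the delimiter byte in one record.

    Pass tag=b"" to drop the byte instead of tagging it.
    """
    if len(delimiter) != 1:
        raise ValueError("delimiter must be a single byte")
    if delimiter in tag:
        raise ValueError("tag must not contain the delimiter")
    return data.replace(delimiter, tag)


def join_records(records, delimiter: bytes = b"\x1e", tag: bytes = b"<RS>") -> bytes:
    """Sanitize each record, then join them with the delimiter."""
    return delimiter.join(sanitize(r, delimiter, tag) for r in records)


if __name__ == "__main__":
    # The second record contains a stray delimiter byte.
    blob = join_records([b"alpha", b"be\x1eta", b"gamma"])
    # Splitting yields three records again, not four.
    print(blob.split(b"\x1e"))
```

Note that the tag itself could already occur in the data, so the
replacement is not reversible unless the tag is escaped as well.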

I don't think it really makes any difference which of the bytes that
aren't supposed to be present in "text" files is used; one can never be
100% sure that these bytes don't show up nonetheless.

I have no idea what these "text files" look like, of course. It just
seemed to me that the fact that the null byte can never be part of a file
name might make it slightly more appropriate for this purpose than other
candidate bytes. Of course, it depends...
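
To illustrate the file name point, here is a small Python sketch that
stores a list of file names separated by the null byte and reads them
back. The helper names are made up for this example, and it only covers
file names, not arbitrary file contents:

```python
import os


def pack_names(names):
    """Join file names into one bytes blob, separated by the null byte."""
    encoded = [os.fsencode(name) for name in names]
    for raw in encoded:
        # A real file name can never contain the null byte, so this only
        # triggers on input that is not a valid file name.
        if b"\0" in raw:
            raise ValueError("file name contains a null byte: %r" % raw)
    return b"\0".join(encoded)


def unpack_names(blob):
    """Split a null-separated blob back into file names."""
    if not blob:
        return []
    return [os.fsdecode(part) for part in blob.split(b"\0")]


if __name__ == "__main__":
    # Names containing a control character and a newline still round-trip.
    names = ["a b.txt", "odd\x1ename", "line\nbreak"]
    print(unpack_names(pack_names(names)) == names)
```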

Best regards

Michael

.-.. .. ...- .   .-.. --- -. --.   .- -. -..   .--. .-. --- ... .--. . .-.

Suffocating together ... would create heroic camaraderie.
		-- Khan Noonian Singh, "Space Seed", stardate 3142.8