Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts

On 2017-06-14 10:07, cyg Simple wrote:
> On 6/13/2017 1:34 PM, Brian Inglis wrote:
>> On 2017-06-13 08:11, cyg Simple wrote:
>>> On 6/10/2017 10:30 PM, Eric Blake wrote:
>>>> On 06/10/2017 08:48 AM, cyg Simple wrote:
>>>>> Uhm, 'wt' and 'wb' came from MS itself.
>>>> Not quite. fopen(,"wb") comes from POSIX.  "wt" is probably a Microsoft
>>>> extension, but it is certainly not in POSIX nor in glibc.
>>> I think it's in the C standard, so it should be in glibc.  It may be
>>> mentioned in the POSIX standard in support of the C standard.
>>>>>  GNU GCC was adapted to allow it
>>>> Huh? It's not whether the compiler allows it, but whether libc allows
>>>> it.  ALL libc that are remotely close to POSIX compliant support
>>>> fopen(,"wb"), but only Windows platforms (and NOT glibc) support
>>>> fopen(,"wt").
>>> Looking at http://www.cplusplus.com/reference/cstdio/fopen/ I see:
>>> "If additional characters follow the sequence, the behavior depends on
>>> the library implementation: some implementations may ignore additional
>>> characters so that for example an additional "t" (sometimes used to
>>> explicitly state a text file) is accepted."
>>> There is also a lot of discussion about the topic at:
>>> https://stackoverflow.com/questions/229924/difference-between-files-writen-in-binary-and-text-mode
>>> As for glibc, it will just ignore the extra character but it allows the
>>> use of "wt"; it just means nothing to that C runtime library. It does
>>> aid in writing portable code though.
>>> As for me conflating GCC with a C runtime - please forgive my lapse in
>>> memory.
>> There's no need for open mode "t", as text is the default mode unless
>> "b" is specified, and assuming you use "cooked" line I/O functions like
>> fgets/fputs, not "raw" binary I/O like fread/fwrite; fscanf ignores all
>> line terminators unless you use formats like "%c" which could see them.
> That isn't exactly true: according to MSDN[1], "t" also controls handling
> of the CTRL-Z EOF marker.  However, I agree that it is worthless.  Regardless,
> the C standard states that "t" or any other extra character can be added
> and left to the implementing library to interpret or ignore.  If the C
> runtime library neither uses it nor ignores it, then it isn't complying
> with the C standard.

The Standard supports only /^([ra](b|\+|b\+|\+b)?|w(b|\+|b\+|\+b)?x?)$/.
Implementations may choose to ignore characters that follow one of those
sequences (presumably "b", "+", or "x"; the footnote is unclear), or may
use them to select a kind of file that is not accessible as an ordinary
stream. Anything else invokes undefined behavior (UB).

" The fopen function
1   #include <stdio.h>
    FILE *fopen(const char * restrict filename,
                const char * restrict mode);
3 The argument mode points to a string. If the string is one of the
following, the file is open in the indicated mode. Otherwise, the
behavior is undefined.[271]

r               open text file for reading
w               truncate to zero length or create text file for writing
wx              create text file for writing
a               append; open or create text file for writing at
                end-of-file
rb              open binary file for reading
wb              truncate to zero length or create binary file for
                writing
wbx             create binary file for writing
ab              append; open or create binary file for writing at
                end-of-file
r+              open text file for update (reading and writing)
w+              truncate to zero length or create text file for update
w+x             create text file for update
a+              append; open or create text file for update, writing at
                end-of-file
r+b or rb+      open binary file for update (reading and writing)
w+b or wb+      truncate to zero length or create binary file for update
w+bx or wb+x    create binary file for update
a+b or ab+      append; open or create binary file for update, writing
                at end-of-file
[271] If the string begins with one of the above sequences, the
implementation might choose to ignore the remaining characters, or it
might use them to select different kinds of a file (some of which might
not conform to the properties in 7.21.2)."
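
For anyone who wants to check a mode string against the table quoted
above before handing it to fopen, here is a minimal sketch. The name
is_standard_fopen_mode is hypothetical (it is not part of any library),
and it accepts only exact matches, so it deliberately rejects strings
such as "wt" that footnote 271 leaves to the implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mode strings listed in the fopen table quoted above. */
static const char *const standard_modes[] = {
    "r",   "w",   "wx",   "a",
    "rb",  "wb",  "wbx",  "ab",
    "r+",  "w+",  "w+x",  "a+",
    "r+b", "rb+", "w+b",  "wb+",
    "w+bx", "wb+x", "a+b", "ab+",
};

/* Hypothetical helper: returns true only if mode is exactly one of
   the strings in the table; anything else is treated as non-standard. */
bool is_standard_fopen_mode(const char *mode)
{
    size_t count = sizeof standard_modes / sizeof standard_modes[0];

    if (mode == NULL)
        return false;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(mode, standard_modes[i]) == 0)
            return true;
    }
    return false;
}
```

With this, is_standard_fopen_mode("wb") is true and
is_standard_fopen_mode("wt") is false, which matches the strict reading
of the table; a library that ignores trailing characters would still
accept the latter.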

> [1] https://msdn.microsoft.com/en-us/library/yeby3zcb(v=vs.140).aspx
> "t
> Open in text (translated) mode. In this mode, CTRL+Z is interpreted as
> an EOF character on input. In files that are opened for reading/writing
> by using "a+", fopen checks for a CTRL+Z at the end of the file and
> removes it, if it is possible. This is done because using fseek and
> ftell to move within a file that ends with CTRL+Z may cause fseek to
> behave incorrectly near the end of the file."

I wonder whether "t" is also required in order to have <ctrl-Z>
recognized as EOF on console input.
That page also documents a bunch of other mode characters and encoding
arguments that make that implementation far from the Standard.
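
To see what text mode does on a given C library without relying on
"t", something like the following rough sketch could be used. It writes
a string with the given mode and counts the bytes that land on disk by
reading the file back with "rb". The helper name bytes_on_disk and the
file name in the usage note are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: write text to path using the given fopen mode,
   then reopen the file in binary mode and count the bytes stored.
   Returns -1 on any I/O error. */
long bytes_on_disk(const char *path, const char *mode, const char *text)
{
    FILE *f = fopen(path, mode);
    long count = 0;
    int failed;

    if (f == NULL)
        return -1;
    if (fputs(text, f) == EOF) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;

    /* "rb" is in the Standard's table, so no translation happens here. */
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    while (fgetc(f) != EOF)
        count++;
    failed = ferror(f);
    fclose(f);
    return failed ? -1 : count;
}
```

Calling bytes_on_disk("mode-demo.tmp", "wb", "a\nb\n") should give 4
everywhere. With "w" it should also give 4 on POSIX systems, while a
C runtime that translates line endings in text mode, as the Microsoft
one does, would be expected to store CR-LF pairs and report 6.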

Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
