Re: gawk 4.1.4: CR separate char for CRLF files




Vermessung AVT - Wolfgang Rieger writes:
> Another solution which we have been using for many years now, though
> it might not be feasible for you:

Cygwin is, like it or not, a rolling distribution.

> We very rarely update Cygwin. We have been using Cygwin for some 15+
> years now. We use tools like gawk (hundreds of scripts), head, tail,
> sort, etc. that we are using in shell scripts running under cmd.exe
> (no Unix shells involved). I soon realized that upgrades of Cygwin may
> cause troubles with existing scripts, so we only update if we really
> need to (e.g.: New functionality that would be important, 32 to 64 bit
> shift, eventually new Windows versions, bugs we needed to be fixed).

Hopefully the machine(s) running those scripts are isolated.

In your particular case you might be better off using MSYS2 or GnuWin32
tools, although you'd still need a better way to deal with updates.
Also, audit your scripts for non-portable constructs, since those are
the parts that are most likely to break.  CMD scripting is a tough nut
to crack once it gets complex, and many things are poorly documented or
not officially documented at all.  I don't quite understand why you use
POSIX tools but specifically shun POSIX scripting.

> I have followed the discussions about the CR/LF behaviour changes in
> the past attentively and decided not to update in near future, because
> that would lead to a massive problem with many hundreds of scripts -
> hoping that sometimes there will be a change in gawk again.

You'd better replace that hope with a feature request at gawk upstream.

> What is Unix-like or OS-like or Posix-like behaviour in that context?
> You could argue that gawk interprets line endings like the underlying
> OS does (i. e., gawk reads LF in Unix and CR/LF in Win), or it
> interprets line endings in a Unix-style no matter of the underlying OS
> used. That's a developer's decision in my opinion.

Cygwin uses LF line endings (yes there are still text mounts, but you'd
be better off pretending they don't exist).  When you're trying to use
it for CRLF files, you need to wrap those invocations to do an explicit
conversion.

https://cygwin.com/cygwin-ug-net/using-textbinary.html
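
As a rough illustration of such a wrapper, here is a minimal sketch in
POSIX shell.  The function name awk_crlf is made up for this example,
and it assumes that no CR inside a line is meaningful, since tr removes
every CR and not only those at the end of a line.

```shell
# Hypothetical helper (the name awk_crlf is an assumption, not an
# existing tool): strip CRs from the input, then run the awk program.
# Usage: awk_crlf 'program' [file ...]
awk_crlf() {
    prog=$1
    shift
    # With no file arguments, cat reads standard input.
    # tr -d '\r' deletes every CR, so awk only ever sees LF line endings.
    cat -- "$@" | tr -d '\r' | awk "$prog"
}
```

Called as `awk_crlf '{ print $2 }' data.txt`, the last field no longer
carries a trailing CR.  The output is written with LF endings only.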

> But since with pipes or output redirection gawk used to write no CRs
> even in previous versions, we already had the problem that gawk had to
> accept *both* inputs, LF with or without CR. That worked widely fine
> so far, since most Windows and other application SW we use accept both
> record formats, fortunately (we had issues with SW upgrades of other
> vendors no longer accepting pure LF, but that only concerned a very
> small number of scripts). With the new approach in Cygwin that seems
> to be broken, so we did not upgrade Cygwin since then (we currently
> use gawk 4.1.3).

Again, your attempt to freeze your system at some arbitrary point in
time is misguided.  It'll never quite work out, and chances are that
when it breaks it will do so in ways that create more work and force you
to fix things in emergency mode, which is never a good thing.

> Of course the reason for that really annoying CR/LF thing is the
> arrogance and ignorance of MS, which caused innumerable of useless
> developers' hours when I think of the endless discussions and changes
> in Cygwin; but MS is the one who defines the standards because of its
> very market power, so we have to deal with it, if we like or not.

You really can't blame them for CRLF.  They weren't and aren't the only
ones using it, and it was in use long before Microsoft entered the
scene.

> I'd definitely prefer to use Unix for its powerful tools, but most of
> the SW we use is simply not available for Unix, and MS does not
> provide gawk etc. So we have to deal with that CR/LF issue in a
> pragmatic rather than in a more, say, philosophical approach: We need
> to run our scripts with as little changes as possible. So that's why
> we upgrade Cygwin as seldom as possible. It is a "living system", yes,
> which is great on the one side - but can be annoying in everyday
> practice.

Again, you'd better figure out how to transform your input (and possibly
output) so it'll conform to the conventions of the tool(s) you use,
perhaps by providing a handful of wrapper scripts.  Alternatively, only
use tools that adhere to the same set of conventions.
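
For the output side, a wrapper could be as small as the following
sketch.  The name to_crlf is hypothetical, and the sketch assumes the
awk in use writes a bare LF for "\n" (as on Cygwin with binary mounts),
so that the CR has to be added explicitly.

```shell
# Hypothetical filter (the name to_crlf is an assumption): normalize
# every line ending to CRLF, whether the input line ended in LF or CRLF.
# Usage: some_command | to_crlf    or    to_crlf file ...
to_crlf() {
    # Drop an existing trailing CR first so CRLF input is not doubled,
    # then write the line followed by an explicit CR LF pair.
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$@"
}
```

Putting it at the end of a pipeline, e.g. `gawk -f script.awk in.txt |
to_crlf > out.txt`, keeps the script itself free of line-ending
handling.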

> In my opinion there should be at least an option for gawk to accept
> both LF and CR/LF line endings equally, preferably with a system
> variable so that there is no need to change the command line call of
> gawk at all. That's what I vote for.

Yes, but please cast that vote with the upstream developers.  I reckon
it'd be a generally useful feature, so there's no point in providing it
only on Cygwin.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptations for KORG EX-800 and Poly-800MkII V0.9:
http://Synth.Stromeko.net/Downloads.html#KorgSDada

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple