RE: Need multibyte advice - Shift-JIS

On February 27, 2019 12:51, Michal Suchánek wrote:
> To: Randall S. Becker <rsbecker@xxxxxxxxxxxxx>
> Cc: git@xxxxxxxxxxxxxxx
> Subject: Re: Need multibyte advice - Shift-JIS
> 
> On Wed, 27 Feb 2019 12:38:06 -0500
> "Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> wrote:
> 
> > On February 27, 2019 12:15, Michal Suchánek wrote:
> > > To: Randall S. Becker <rsbecker@xxxxxxxxxxxxx>
> > > Cc: git@xxxxxxxxxxxxxxx
> > > Subject: Re: Need multibyte advice - Shift-JIS
> > >
> > > On Wed, 27 Feb 2019 12:03:58 -0500
> > > "Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> wrote:
> > >
> > > > On February 27, 2019 11:52, Michal Suchánek wrote:
> > > > > To: Randall S. Becker <rsbecker@xxxxxxxxxxxxx>
> > > > > Cc: git@xxxxxxxxxxxxxxx
> > > > > Subject: Re: Need multibyte advice - Shift-JIS
> > > > >
> > > > > On Wed, 27 Feb 2019 11:33:47 -0500 "Randall S. Becker"
> > > > > <rsbecker@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > On February 27, 2019 11:29 Michal Suchánek wrote:
> > > > > > > On Wed, 27 Feb 2019 11:19:33 -0500 "Randall S. Becker"
> > > > > > > <rsbecker@xxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > On February 27, 2019 11:11, Michal Suchánek wrote:
> > > > > > > > > On Wed, 27 Feb 2019 10:54:23 -0500 "Randall S. Becker"
> > > > > > > > > <rsbecker@xxxxxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > > On February 27, 2019 9:09, Michal Suchánek wrote:
> > > > > > > > > > > On Wed, 27 Feb 2019 08:04:08 -0500 "Randall S. Becker"
> > > > > > > > > > > <rsbecker@xxxxxxxxxxxxx> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Git Team,
> > > > > > > > > > > >
> > > > > > > > > > > > I have to admit being perplexed by this one. I
> > > > > > > > > > > > have been asked to support the Shift-JIS character
> > > > > > > > > > > > set in file contents, comments, and logs, for a
> > > > > > > > > > > > partner of mine. I know there are a few ways to do
> > > > > > > > > > > > this, but I'm looking for the official non-hacky
> > > > > > > > > > > > way to do this.
> > > > > > > > > > > > This is CLI only, and our pager, less, does not
> > > > > > > > > > > > support multi-byte, so I'm looking for options
> > > > > > > > > > > > there also.
> > > > > > > > > > >
> > > > > > > > > > > SJIS is about as much multibyte as UTF-8.
> > > > > > > > > > >
> > > > > > > > > > > Why do you think less does not support it?
> > > > > > > > > > >
> > > > > > > > > > > Last time I looked there was an SJIS locale for libc,
> > > > > > > > > > > so it is only a matter of generating the correct
> > > > > > > > > > > locales and using them. Of course, if you are running
> > > > > > > > > > > in UTF-8, SJIS will look like garbage.
> > > > > > > > > >
> > > > > > > > > > Sadly, I did not build less on this platform myself,
> > > > > > > > > > and the libc used by the platform vendor's less does
> > > > > > > > > > not include UTF-16. cat works fine, but the usual
> > > > > > > > > > LESSCHARSET=utf-16 is unsupported, so I am looking for
> > > > > > > > > > an alternative. THAT is why I think less does not
> > > > > > > > > > support it. Sorry, I should have made that clearer.
> > > > > > > > > >
> > > > > > > > > > cat works fine, so if I set GIT_PAGER=cat, I can at
> > > > > > > > > > least see the diffs cleanly in SJIS, but this partner
> > > > > > > > > > wants a pager that is usable.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > So you want to use SJIS because UTF-16 is not supported.
> > > > > > > > > So what is the problem with SJIS (or UTF-8 for that matter)?
> > > > > > > >
> > > > > > > > The partner I am working with is using multi-byte SJIS,
> > > > > > > > which is also not supported by this incarnation of less.
> > > > > > > > As a result, UTF-8 does not work either in this situation.
> > > > > > > > The content is definitely multi-byte. I know this was
> > > > > > > > fixed in Red Hat's less in 2016, but the fix did not make
> > > > > > > > it to this platform.
> > > > > > > >
> > > > > > >
> > > > > > > Both UTF-8 and SJIS are multibyte, and both are supported
> > > > > > > by less in general. If your particular less cannot support
> > > > > > > them, then it is broken and you should fix it or get it fixed.
> > > > > >
> > > > > > To be more specific, less's UTF-8 support on this platform
> > > > > > presents the data as unusable junk, as expected. SJIS is
> > > > > > multi-byte, but it is not one of the allowed encodings in
> > > > > > less. I am not empowered to "get it fixed". Thanks for your
> > > > > > advice.
> > > > > >
> > > > >
> > > > > How is this 'allowed encodings in less' defined?
> > > >
> > > > When you run less with LESSCHARSET=encoding and the encoding is
> > > > not known, you get the error:
> > > > invalid charset name
> > > >
> > > > Doing my due diligence, I actually read the man page on the
> > > > platform before asking the question. It lists the following as
> > > > the only allowed encodings: ascii, iso8859, latin1, latin9, dos,
> > > > IBM-1047, koi8-r, next, utf-8, windows. The utf-8 setting does
> > > > not know how to display SJIS multi-byte sequences, just as on
> > > > other platforms. Does that make sense now?
> > > >
> > >
> > > Does the said man page also mention LESSCHARDEF or LESSOPEN?
> >
> > Of course it does.
> >
> 
> So what's the problem with displaying SJIS or even UTF-16 in less, exactly?
> 
> Also if you really don't like less there is lv.

I'm sorry if I was not clear about all this. NonStop is not a Linux platform; it is POSIX. Not all utilities are available, and not all utilities have all capabilities. lv is not available for the platform. less treats the data as binary and shows what it usually shows for binary multibyte data: you get @^@- and such. It does not present the data in the correct character set for the user.
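
To make the point about less concrete: Shift-JIS and UTF-8 are both multibyte, but they are different byte encodings, so a viewer that only understands UTF-8 or single-byte charsets sees SJIS text as invalid or binary data. The short Python sketch here (Python is used purely for illustration; nothing in the thread uses it) encodes the same two kanji both ways and checks whether the SJIS bytes happen to be valid UTF-8.

```python
# Illustration only: compare the Shift-JIS and UTF-8 encodings of "日本" ("Japan").
text = "日本"

sjis = text.encode("shift_jis")  # 2 bytes per kanji in Shift-JIS
utf8 = text.encode("utf-8")      # 3 bytes per kanji in UTF-8

print(len(sjis), sjis)
print(len(utf8), utf8)

# A consumer that assumes UTF-8 rejects the Shift-JIS bytes outright:
# the first byte, 0x93, is a UTF-8 continuation byte and cannot start a character.
try:
    sjis.decode("utf-8")
    sjis_is_valid_utf8 = True
except UnicodeDecodeError:
    sjis_is_valid_utf8 = False

print("Shift-JIS bytes valid as UTF-8?", sjis_is_valid_utf8)
```

The same bytes decode cleanly once the reader is told they are Shift-JIS (`sjis.decode("shift_jis")`), which is what a pager has to be told, through its charset or locale settings, before it can display the text properly.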

This was only one part of my original question. I am searching elsewhere for support on pagers, because this really is not an appropriate discussion for the git group to focus on. So let's drop this, please, as not worth continuing. My original request was more about how to set up the file attributes, difference engine, and the rest of the git infrastructure. The partner I am working with is doing this with git hooks, which I am not really happy about. Let's prune this discussion as not worthy.
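
On the attributes and diff side of the question, one option (a sketch under assumptions, not something proposed in the thread) is a diff `textconv` driver. A `.gitattributes` line such as `*.txt diff=sjis` plus `git config diff.sjis.textconv "python3 /path/to/sjis_textconv.py"` makes git run that program on each file and diff its standard output. The driver name `sjis`, the `*.txt` pattern and the script name `sjis_textconv.py` are hypothetical placeholders. Converting to UTF-8 only helps if the terminal and pager can display UTF-8, which the thread suggests is itself in doubt on this platform. Git 2.18 and later also offer a `working-tree-encoding` attribute that stores such files as UTF-8 in the repository, which is a different trade-off.

```python
"""Hypothetical git textconv filter: print a Shift-JIS file as UTF-8.

Assumed wiring (all names are placeholders, not taken from the thread):
  .gitattributes:  *.txt diff=sjis
  git config diff.sjis.textconv "python3 /path/to/sjis_textconv.py"
"""
import sys

# Use "cp932" instead if the files are in Microsoft's Shift-JIS variant.
SOURCE_ENCODING = "shift_jis"


def sjis_to_utf8(data: bytes) -> bytes:
    """Decode Shift-JIS bytes and re-encode them as UTF-8."""
    # errors="replace" substitutes U+FFFD for undecodable bytes, so one bad
    # byte degrades a single character instead of aborting the whole diff.
    return data.decode(SOURCE_ENCODING, errors="replace").encode("utf-8")


def main(path: str) -> int:
    # Read the raw bytes of the file git hands us and write the UTF-8
    # version to stdout, which is what git then diffs.
    with open(path, "rb") as f:
        sys.stdout.buffer.write(sjis_to_utf8(f.read()))
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # git invokes the textconv command with exactly one argument:
    # the path of the file to convert.
    sys.exit(main(sys.argv[1]))
```

The conversion lives in its own function so it can be checked without git, for example `sjis_to_utf8("日本".encode("shift_jis"))` should give the UTF-8 bytes of "日本".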