
Re: Need multibyte advice - Shift-JIS




On Wed, 27 Feb 2019 11:19:33 -0500
"Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> wrote:

> On February 27, 2019 11:11, Michal Suchánek wrote:
> > On Wed, 27 Feb 2019 10:54:23 -0500
> > "Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> wrote:
> >   
> > > On February 27, 2019 9:09, Michal Suchánek wrote:  
> > > > On Wed, 27 Feb 2019 08:04:08 -0500
> > > > "Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> wrote:
> > > >  
> > > > > Hi Git Team,
> > > > >
> > > > > I have to admit being perplexed by this one. I have been asked to
> > > > > support the Shift-JIS character set in file contents, comments,
> > > > > and logs, for a partner of mine. I know there are a few ways to do
> > > > > this, but I'm looking for the official non-hacky way to do this.
> > > > > This is CLI only, and our pager, less, does not support
> > > > > multi-byte, so I'm looking for options there also.
> > > >
> > > > SJIS is about as much multibyte as UTF-8.
> > > >
> > > > Why do you think less does not support it?
> > > >
> > > > Last time I looked there was SJIS locale for libc so it is only
> > > > matter of generating the correct locales and using them. Of course,
> > > > if you are running in UTF-8 SJIS will look like garbage.
> > >
> > > Sadly, I did not personally build less on this platform, and the libc
> > > used did not include UTF-16, on the platform vendor supplied less. cat
> > > works fine, but the usual LESSCHARSET=utf-16 is unsupported, so I am
> > > looking for an alternative. THAT is why I think less does not support
> > > it. Sorry, I should have made that more clear.
> > >
> > > cat works fine, so if I set GIT_PAGER=cat, I can at least see the
> > > diffs cleanly in SJIS, but this partner wants a pager that is usable.
> > >  
> > 
> > So you want to use SJIS because UTF-16 is not supported. So what is the
> > problem with SJIS (or UTF-8 for that matter)?  
> 
> The partner I am working with is using multi-byte SJIS, which is also not supported by this incarnation of less. As a result, UTF-8 does not work either in this situation. The content is definitely multi-byte. I know this was fixed in RedHat's Less in 2016, but the fix did not make it to this platform.
> 

Both UTF-8 and SJIS are multibyte and both are supported by less
in general. If your particular less cannot support them then it is
broken and you should fix it or get it fixed.
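
To illustrate the point, here is a minimal Python sketch (not taken from
the thread; the sample string and variable names are made up for
illustration). It only shows that both encodings use more than one byte
per character for Japanese text, and that SJIS bytes read as UTF-8 do
not come back as the original text. It says nothing about what any
particular build of less does.

```python
# Hypothetical sample text; any Japanese string would do.
text = "日本語"

# Encode the same three characters in both encodings.
sjis = text.encode("shift_jis")
utf8 = text.encode("utf-8")

# Both are multibyte: more bytes than characters in each case.
print(len(text), len(sjis), len(utf8))

# Decoding with the matching codec round-trips cleanly.
roundtrip = sjis.decode("shift_jis")

# Decoding SJIS bytes as UTF-8 yields replacement characters,
# i.e. the "garbage" you see with a mismatched locale.
garbled = sjis.decode("utf-8", errors="replace")
print(roundtrip == text, garbled == text)
```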

HTH

Michal