Web lists-archives.com

Re: [PATCH] specify encoding for sed command

On Thu, Apr 5, 2018 at 8:53 AM, Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
> On Thu, Apr 05 2018, Stephon Harris wrote:
>> Fixes issue with seeing `sed: RE error: illegal byte sequence` when running git-completion.bash

  - prefix the subject line with "completion:".
  - nit: replace "running" with "sourcing".
  - wrap the body of the commit message around 70 characters.
  - sign off your patch; see Documentation/SubmittingPatches and 'git
    commit -s'.

>> ---
>>  contrib/completion/git-completion.bash | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
>> index b09c8a23626b4..52a4ab5e2165a 100644
>> --- a/contrib/completion/git-completion.bash
>> +++ b/contrib/completion/git-completion.bash
>> @@ -282,7 +282,7 @@ __gitcomp ()
>>  # Clear the variables caching builtins' options when (re-)sourcing
>>  # the completion script.
>> -unset $(set |sed -ne 's/^\(__gitcomp_builtin_[a-zA-Z0-9_][a-zA-Z0-9_]*\)=.*/\1/p') 2>/dev/null
>> +unset $(set |LANG=C sed -ne 's/^\(__gitcomp_builtin_[a-zA-Z0-9_][a-zA-Z0-9_]*\)=.*/\1/p') 2>/dev/null

I was wondering how could you see an error message from 'sed', when the
stderr of the whole thing is redirected to /dev/null.  It turns out that
such a redirection doesn't affect the stderr of the commands executed
inside the command substitution:

  $ echo $(echo foo ; echo >&2 error) 2>/dev/null

Interesting, I didn't know that.

> This is getting closer to the issue than your previous patch, but
> there's still some open questions:
> 1) What platform OS / version / sed version is this on?
> 2) What's the output from "set" that's causing this error? Do we have an
>    isolated test case for that?

A quick internet search suggests that this error message tends to appear
on Macs, when its 'sed' encounters an invalid UTF-8 character in its

> 3) There's other invocations of "sed" in the file, aren't those affected
>    as well?

The two 'sed' invocations in _git_stash() are suspect, as they process
the output of 'git stash list', which includes the stashes'
descriptions, which can contain whatever the users wished to store there
(though they would probably get a "Warning: commit message did not
conform to UTF-8" message from 9e83266525 (commit-tree: encourage UTF-8
commit messages., 2006-12-22) when doing so).

The 'sed' invocation in __git_complete_revlist_file() is suspect as
well, as it processes the output of 'git ls-tree'.  Pathnames can
contain anything, and while 'git ls-tree' quotes pathnames with
"unusual" characters, I don't know whether it quotes all characters that
can possibly upset Stephon's 'sed'.

What about 'awk' on Stephon's platform?  We already have one 'awk'
invocation in __git_match_ctag(), which we use for 'git grep <TAB>' and
'git log -L:<TAB>'.  That 'awk' script uses a regexp to match the symbol
name in a ctags file; does any programming language allow such unusual
characters in symbol names?

What about 'sed' and 'awk' scripts that don't use regexps at all?  (I
intend to add such an 'awk' script in a day or two...)

> 4) Any reason we wouldn't just set LC_AlL=C for the whole file? I see we
>    already do it for our invocation to "git merge".

That would perhaps be the safest thing to do...
But how can we set it for a whole file, when the file is not executed as
a script but sourced into the user's shell environment?