Re: Quotes around command-line argument that has unicode characters are not removed




On Thu, 22 Mar 2018 01:15:00 +0100
Dmitry Katsubo via cygwin <...> wrote:

> Dear Cygwin community,
> 
> I observe the following on my Cygwin: when I put quotes around a file name that
> has non-ASCII symbols, these quotes are passed to argv of the process literally;
> otherwise they are removed. I would expect the behavior to be consistent.
> 
> I have written a small C program that displays arguments, and run it three
> times:
> 
> #1 For the file with space, taken into quotes ("the file.txt") -- OK
> #2 For the file with non-ASCII characters (Château.txt) -- OK
> #3 For the file with non-ASCII characters, taken into quotes ("Château.txt") -- WRONG
> 
> d:\cli> uname -a
> CYGWIN_NT-6.1-WOW PC 2.9.0(0.318/5/3) 2017-09-12 10:41 i686 Cygwin
> 
> D:\cli> chcp
> Active code page: 866
> 
> D:\cli> dir
> ...cut...
> 2018-03-22  00:43                 0 Château.txt
> 2018-03-22  00:01               393 test.c
> 2018-03-22  00:01           150,230 test.exe
> 2018-03-21  00:15               186 test.pl
> 2018-03-22  00:43                 0 the file.txt
> 2018-03-22  00:40                16 текст плюс.txt
>                6 File(s)        150,825 bytes
>                2 Dir(s)  41,972,293,632 bytes free
> 
> D:\cli> test "the file.txt"
> param 0 = test
> param 1 = the file.txt
> File 'the file.txt' was opened
> 
> D:\cli> test Château.txt
> param 0 = test
> param 1 = Château.txt
> File 'Château.txt' was opened
> 
> D:\cli> test "Château.txt"
> param 0 = test
> param 1 = "Château.txt"
> Failed to open '"Château.txt"': No such file or directory
> 
> As one can see, the last run fails. I am a bit puzzled: how can I pass the name
> of a file with spaces and Unicode symbols? I need to do it in a uniform way, as I
> am calling a Cygwin program from a native Windows program, as in [1].
> 
> D:\cli> test "текст плюс.txt"
> param 0 = test
> param 1 = "текст плюс.txt"
> Failed to open '"текст плюс.txt"': No such file or directory
> 
> I have searched a bit, but I couldn't find a direct answer. From posts [1] and [2]
> I see that the compiler inserts code to do some argument pre-processing like
> @pathnames [3], but what exactly are the rules? Is quote pre-processing done in
> dcrt0.cc:177 [4]?
> 
> Any feedback is appreciated.
> 
> [1] https://sourceware.org/ml/cygwin/2016-05/msg00082.html
> [2] http://daviddeley.com/autohotkey/parameters/parameters.htm
> [3] https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-at
> [4] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L177
> 
> === test.c ===
> #include <stdio.h>
> #include <errno.h>
> #include <string.h>
> 
> int main(int argc, char* argv[])
> {
> 	for (int i = 0; i < argc; i++)
> 	{
> 		printf("param %d = %s\n", i, argv[i]);
> 	}
> 	FILE* f = fopen(argv[1], "r");
> 	if (f != NULL)
> 	{
> 		printf("File '%s' was opened\n", argv[1]);
> 		fclose(f);
> 	} else {
> 		printf("Failed to open '%s': %s\n", argv[1], strerror(errno));
> 	}
> 	return 0;
> }
> 

Hello, Dmitry,
consider these test cases:

Native (msvcrt) binary:
-----------------------
$ x86_64-w64-mingw32-gcc test.c -o test-win.exe
$ ldd test-win.exe
        ntdll.dll => /cygdrive/c/Windows/SYSTEM32/ntdll.dll (0x7fa05900000)
        KERNEL32.DLL => /cygdrive/c/Windows/system32/KERNEL32.DLL (0x7fa030e0000)
        KERNELBASE.dll => /cygdrive/c/Windows/system32/KERNELBASE.dll (0x7fa028f0000)
        msvcrt.dll => /cygdrive/c/Windows/system32/msvcrt.dll (0x7fa03220000)
-----------------------

Cygwin-flavor binary:
---------------------
$ gcc test.c -o test-cygwin.exe
$ ldd test-cygwin.exe
        ntdll.dll => /cygdrive/c/Windows/SYSTEM32/ntdll.dll (0x7fa05900000)
        KERNEL32.DLL => /cygdrive/c/Windows/system32/KERNEL32.DLL (0x7fa030e0000)
        KERNELBASE.dll => /cygdrive/c/Windows/system32/KERNELBASE.dll (0x7fa028f0000)
        cygwin1.dll => /usr/bin/cygwin1.dll (0x180040000)
---------------------

Create a file with non-ASCII chars in the name:
-----------------------------------------------
$ touch "текст плюс.txt"
-----------------------------------------------

Run both binaries in mintty with bash:
--------------------------------------
$ ./test-win "текст плюс.txt"
param 0 = D:\wroot\test.cygwin\Quotes around command-line argument that has unicode characters are not removed\test-win.exe
param 1 = ▒▒▒▒▒ ▒▒▒▒.txt
File '▒▒▒▒▒ ▒▒▒▒.txt' was opened
$ ./test-cygwin "текст плюс.txt"
param 0 = ./test-cygwin
param 1 = текст плюс.txt
File 'текст плюс.txt' was opened
--------------------------------------

Run the binaries in cmd.exe with bash:
--------------------------------------
$ ./test-win "текст плюс.txt"
param 0 = D:\wroot\test.cygwin\Quotes around command-line argument that has unicode characters are not removed\test-win.exe
param 1 = ЄхъёЄ яы■ё.txt
File 'ЄхъёЄ яы■ё.txt' was opened
$ ./test-cygwin "текст плюс.txt"
param 0 = ./test-cygwin
param 1 = текст плюс.txt
File 'текст плюс.txt' was opened
--------------------------------------

Run in bare cmd.exe
(/usr/bin/cygwin1.dll should be copied next to ./test-cygwin.exe)
-------------------
D:\wroot\test.cygwin\Quotes around command-line argument that has unicode characters are not removed>.\test-win.exe "текст плюс.txt"
param 0 = .\test-win.exe
param 1 = ЄхъёЄ яы■ё.txt
File 'ЄхъёЄ яы■ё.txt' was opened
D:\wroot\test.cygwin\Quotes around command-line argument that has unicode characters are not removed>.\test-cygwin.exe "текст плюс.txt"
param 0 = ./test-cygwin
param 1 = "текст плюс.txt"
Failed to open '"текст плюс.txt"': No such file or directory
-------------------

In bare cmd.exe the native (msvcrt) binary strips the quotes from a quoted
non-ASCII argument and opens the file, while the Cygwin binary receives the
quotes literally in argv and fails. I don't know exactly which layer is
responsible for this behavior: cmd.exe, or the runtime (msvcrt.dll versus
cygwin1.dll).

