I found a case where MinGW GCC does not honour the x64 calling convention even 
when explicitly asked to use this ABI. Consider a function as simple as this:

__attribute__((ms_abi, noinline)) __int128 passthrough_a_c(__int128 a) {
   return a;
}

This assembly gets generated:

   movdqa (%rcx), %xmm0

Now, to summarize what happens here: argument `a` is expected to be passed in 
as a pointer to a caller-allocated stack location. That is correct behaviour 
according to a strict reading of [this document][msdn1]:

> Integer arguments are passed in registers RCX, RDX, R8, and R9. Floating 
> point arguments are passed in XMM0L, XMM1L, XMM2L, and XMM3L. 16-byte 
> arguments are passed by reference.

Argument `a` is indeed a 16-byte argument and is passed by reference. Great!
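
To make the by-reference rule concrete, here is a minimal sketch in plain C of what the quoted text implies for the argument alone. It is only a model of the convention, not compiler output: the names `i128_t`, `passthrough_by_ref` and `call_with_copy` are made up for illustration, and the pointer parameter stands in for what the real ABI would carry in RCX.

```c
#include <assert.h>

typedef __int128 i128_t;   /* GCC/Clang extension, 64-bit targets only */

/* Callee side of the model: instead of the 16-byte value itself, it
   receives the address of a copy made by the caller. */
i128_t passthrough_by_ref(const i128_t *a_copy) {
    return *a_copy;
}

/* Caller side of the model: allocate a temporary on its own stack,
   copy the argument into it, and pass the temporary's address. */
i128_t call_with_copy(i128_t a) {
    i128_t tmp = a;
    assert(sizeof tmp == 16);   /* the "16-byte argument" from the quote */
    return passthrough_by_ref(&tmp);
}
```

Calling `call_with_copy(x)` should hand back `x` unchanged; the point is only that the callee never sees the value in registers, just a pointer to it.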

The return value, however, gets put into the `%xmm0` register, and with 
`__int128` being an… integer… this is alarming. It turns out to be wrong, too. 
A strict reading of [this document][msdn2] has this to say:

> Otherwise [NB: if a type is in fact a C++03 POD type; which __int128 is], 
> the caller assumes the responsibility of allocating memory and passing a 
> pointer for the return value as the first argument [NB: that would be 
> RCX]. Subsequent arguments are then shifted one argument to the right. The 
> same pointer must be returned by the callee in RAX.

So, rather than what is generated currently, the function should be compiled 
like this (Intel syntax):

   vmovups     xmm0,xmmword ptr [rdx]
   vmovdqa     xmmword ptr [rcx],xmm0
   mov         rax,rcx

or, alternatively, this:

   movq %rcx, %rax
   movq (%rdx), %r9
   movq 8(%rdx), %r10
   movq %r9, (%rcx)
   movq %r10, 8(%rcx)

which is what MinGW GCC generates for a slight variation of the original case:

#include <stdint.h>

struct i128 { uint64_t a; uint64_t b; };
__attribute__((ms_abi, noinline)) struct i128 passthrough_a_c(struct i128 a) {
   return a;
}
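
For reference, the hidden-pointer return described in the second quote can be modelled in plain C roughly as follows. This is a sketch under my own naming (`passthrough_lowered`, `call_passthrough`, `ret`, `arg` are all hypothetical), showing what the struct variant is effectively lowered to: `ret` plays the role of the return-slot pointer in RCX, `arg` the by-reference argument shifted to RDX, and the returned pointer the value left in RAX.

```c
#include <assert.h>
#include <stdint.h>

struct i128 { uint64_t a; uint64_t b; };

/* Model of the callee after lowering: write the result through the
   caller-provided slot and return that same pointer. */
struct i128 *passthrough_lowered(struct i128 *ret, const struct i128 *arg) {
    *ret = *arg;
    return ret;
}

/* Model of the caller: it owns both the argument copy and the return
   slot, and reads the result back through the returned pointer. */
struct i128 call_passthrough(struct i128 value) {
    struct i128 arg_copy = value;
    struct i128 result;
    struct i128 *returned = passthrough_lowered(&result, &arg_copy);
    assert(returned == &result);   /* "the same pointer must be returned" */
    return *returned;
}
```

Both assembly listings above that take the argument from RDX and store through RCX follow this shape; the `%xmm0` return generated for `__int128` does not.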

[msdn1]: https://msdn.microsoft.com/en-us/library/ms235286.aspx
[msdn2]: https://msdn.microsoft.com/en-us/library/7572ztz4.aspx

