
RE: Regression for OCaml introduced by rebase 4.4.4




Corinna Vinschen wrote:
> On Feb  9 11:29, David Allsopp wrote:
> > Corinna Vinschen wrote:
> > > On Feb  8 11:47, David Allsopp wrote:
> > > > TL;DR flexlink-compiled DLLs (i.e. OCaml libraries) are broken by the
> > > > 0x200000000 base address requirement added in rebase 4.4.4.
> > > > Possible fixes for this at the bottom.
> > > > [...]
> > > >   $ ocaml
> > > >           OCaml version 4.04.2
> > > >
> > > >   # #load "unix.cma";;
> > > >   Cannot load required shared library dllunix.
> > > >   Reason: /usr/lib/ocaml/stublibs/dllunix.so: flexdll error: cannot relocate RELOC_REL32, target is too far: 0xfffffffc013d8b5f 0x13d8b5f.
> > > >
> > > > This is a known problem and fundamental limitation of flexdll (there is no
> > > > RELOC_REL64 in COFF).
> > >
> > > Apart from that, not only Cygwin DLLs but also the Windows system
> > > DLLs are all loaded and relocated to the area beyond 0x1:80000000,
> > > so relocation beyond the 32 bit address space is not a general problem
> > > in Windows.  Why isn't that possible in FlexDLL?  I don't understand
> > > this.  To me this looks like a bug in FlexDLL, not a requirement to let
> > > certain DLLs slip through the cracks.
> >
> > There's a fuller explanation of what flexdll does and why here:
> > https://github.com/alainfrisch/flexdll/blob/master/README.md. I
> > believe it's not unrelated to some of the black magic going on in
> > Cygwin's autoload.cc, but without (at least at the moment) quite as
> > much self-modifying code.
> >
> > FlexDLL is "solving" the problem of allowing a dynamically loaded
> > library to refer to symbols in the main application (or in previously
> > dynamically loaded libraries, without loading them a second time, as I
> > believe the Windows loader does). FlexDLL does this by deferring COFF
> > relocations to runtime, and it achieves that by sitting in front of
> > both the linker when the DLL is constructed and the application's main
> > (or DllMain). For normal linking, since PE limits code size to 2GB,
> > there is no need for a RELOC_REL64 relocation type. However, because
> > we're actually resolving the symbols dynamically, on 64-bit the DLL may
> > have been loaded too far from the executable (or other DLL) image it's
> > resolving to. For genuine Windows resolution of DLL symbols, you'd be
> > using the stub code generated either by the linker or by
> > __declspec(dllimport), which is likewise guaranteed to be within
> > RELOC_REL32 range because the stub itself is static.
> >
> > When this was originally encountered for 64-bit MSVC (this was all
> > added before Cygwin64 existed), the solution at the time was to keep
> > the preferred base addresses low, but what's actually required is
> > that everything lies within a single 2GB window somewhere in the
> > address space.
> >
> > I guess one can argue over whether that's a bug or a limitation, but
> > the problem we face is this: we can engineer our DLLs and executables
> > to lie within a 2GB range (having looked at this again in more detail,
> > we could just as readily do so with addresses > 0x200000000), yet we
> > still run the risk of rebase moving the DLLs outside that range.
> >
> > However, we'll scratch our heads some more over possible alternative
> > solutions, since a flag for DLLs which says "keep us within a 2GB
> > range somewhere" sounds even less likely to get merged than my
> > previous suggestion.
> 
> Two points:
> 
> - You are aware that the main executables of 64 bit Cygwin processes are
>   loaded to 0x1:00400000, right?  The 2 GB offset problem is already
>   imminent.

Our executables are also linked via flexdll's flexlink, which sets --image-base in its call to the linker. I don't think the Cygwin DLL does anything that alters that, right? Another "fix" I tried while investigating was to change the --image-base we specify so that it lies within 2GB of where rebase has put the DLLs, and that also worked.
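
To make the 2GB constraint concrete, here is a minimal OCaml sketch. It assumes a 64-bit OCaml, where the native int is wide enough for these addresses. rel32_fits computes the signed 32-bit displacement that a REL32 relocation stores; for COFF's IMAGE_REL_AMD64_REL32 this is measured from the byte following the 4-byte field. within_2gb_window checks a sufficient condition for every such relocation between a set of images to be in range. The function names, the image record and all of the addresses are hypothetical, and the addend already stored in the field is ignored.

```ocaml
(* Hypothetical helpers; assumes a 64-bit OCaml so that native [int]
   (63 bits) can hold user-space addresses. The addend already stored
   in the relocated field is ignored for simplicity. *)

let int32_min = -0x8000_0000
let int32_max = 0x7fff_ffff

(* REL32 displacement: target minus the address of the byte that
   follows the 4-byte field being patched. *)
let rel32_displacement ~site ~target = target - (site + 4)

let rel32_fits ~site ~target =
  let d = rel32_displacement ~site ~target in
  d >= int32_min && d <= int32_max

(* A loaded image: name, base address and size in bytes. *)
type image = { name : string; base : int; size : int }

(* Sufficient condition: if all images fit in one window of at most
   2^31 bytes, every REL32 from one image to another is in range. *)
let within_2gb_window images =
  match images with
  | [] -> true
  | _ ->
    let lo = List.fold_left (fun acc i -> min acc i.base) max_int images in
    let hi =
      List.fold_left (fun acc i -> max acc (i.base + i.size)) 0 images
    in
    hi - lo <= 1 lsl 31

let () =
  (* Hypothetical layout: executable at 0x1:00400000, DLL rebased to
     0x2:00000000, with a call site in the DLL targeting the executable. *)
  let exe = { name = "main.exe"; base = 0x1_0040_0000; size = 0x10_0000 } in
  let dll = { name = "dllunix.so"; base = 0x2_0000_0000; size = 0x4_0000 } in
  let site = dll.base + 0x1000 and target = exe.base + 0x2000 in
  Printf.printf "%s -> %s: rel32 fits = %b, one 2GB window = %b\n"
    dll.name exe.name (rel32_fits ~site ~target)
    (within_2gb_window [ exe; dll ]);
  (* The same DLL with the executable linked at a (hypothetical) image
     base of 0x1:c0000000 instead. *)
  let exe' = { exe with base = 0x1_c000_0000 } in
  Printf.printf "with exe at 0x%x: one 2GB window = %b\n" exe'.base
    (within_2gb_window [ exe'; dll ])
```

With this hypothetical layout, the executable at 0x1:00400000 and the DLL at 0x2:00000000 are almost 4GB apart, so the relocation cannot be applied. Moving the executable's image base to 0x1:c0000000 puts both images in one 2GB window, which mirrors the --image-base workaround. The window check is deliberately conservative: it never accepts a layout where some REL32 would overflow, but it can reject layouts where the particular relocations present would still fit.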

> - What about adding an additional jump table?  The relocation would only
>   have to point to the jump table in the vicinity of the DLL in
>   question; the jump table then points to the actual 64 bit address.

That is what our head-scratching has arrived at too, and I'm in the process of implementing it.
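
The jump table idea can be sketched as follows, assuming x86-64 and OCaml 4.08 or later (for Bytes.set_int64_le and friends). Each slot is the 14-byte absolute jump FF 25 00 00 00 00 (jmp qword ptr [rip+0]) followed by the 8-byte target address. Because the table sits near the DLL, only the REL32 from the call site to its slot has to stay within ±2GB. This is not flexdll's actual implementation: make_jump_table, rel32_to_slot and the addresses are made up for illustration.

```ocaml
(* Hypothetical sketch; assumes x86-64 and OCaml >= 4.08. Each slot is:
     FF 25 00 00 00 00   jmp qword ptr [rip+0]
     <8 bytes>           absolute 64-bit target read by that jmp *)

let slot_size = 14

(* Build the table bytes for [targets]. A real loader would copy these
   into executable memory allocated within 2GB of the DLL (not shown). *)
let make_jump_table targets =
  let tbl = Bytes.make (slot_size * List.length targets) '\x00' in
  List.iteri
    (fun i target ->
      let off = i * slot_size in
      Bytes.set tbl off '\xff';
      Bytes.set tbl (off + 1) '\x25';
      (* bytes off+2 .. off+5 stay zero: RIP-relative displacement 0 *)
      Bytes.set_int64_le tbl (off + 6) (Int64.of_int target))
    targets;
  tbl

(* Displacement for a REL32 at [site] redirected to slot [i] of a table
   loaded at [table_base]; None if even the slot is out of range. *)
let rel32_to_slot ~site ~table_base i =
  let d = table_base + (i * slot_size) - (site + 4) in
  if d < -0x8000_0000 || d > 0x7fff_ffff then None
  else Some (Int32.of_int d)

let () =
  (* Hypothetical: a target in the executable ~4GB below the DLL, and
     a table placed just after the DLL image. *)
  let target = 0x1_0040_2000 and site = 0x2_0000_1000 in
  let table_base = 0x2_0004_0000 in
  let tbl = make_jump_table [ target ] in
  match rel32_to_slot ~site ~table_base 0 with
  | Some d ->
    Printf.printf "%d-byte table, REL32 to slot 0 = 0x%lx\n"
      (Bytes.length tbl) d
  | None -> print_endline "table is out of REL32 range too"
```

A jump stub only helps when the REL32 is the operand of a call or jmp. A RIP-relative data reference to a variable in the main program cannot be redirected through a jump, so it would need some other form of indirection.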

> I'm curious why this hasn't been done yet.

I'm hoping that implementing it will reveal that it simply wasn't considered in 2008, rather than that it was considered and there was a problem with it. I suspect it just wasn't thought of: like Cygwin's at the time, our x86_64 support on Windows in 2008 was extremely limited and not receiving much engineering focus.


David