Re: [RFC PATCH 4/4] x86/vdso: Add __vdso_sgx_eenter() to wrap SGX enclave transitions




On Wed, Dec 5, 2018 at 3:20 PM Sean Christopherson
<sean.j.christopherson@xxxxxxxxx> wrote:
>
> Intel Software Guard Extensions (SGX) introduces a new CPL3-only
> enclave mode that runs as a sort of black box shared object that is
> hosted by an untrusted normal CPL3 process.
>
> Enclave transitions have semantics that are a lovely blend of SYSCALL,
> SYSRET and VM-Exit.  In a non-faulting scenario, entering and exiting
> an enclave can only be done through SGX-specific instructions, EENTER
> and EEXIT respectively.  EENTER+EEXIT is analogous to SYSCALL+SYSRET,
> e.g. EENTER/SYSCALL load RCX with the next RIP and EEXIT/SYSRET load
> RIP from R{B,C}X.
>
> But in a faulting/interrupting scenario, enclave transitions act more
> like VM-Exit and VMRESUME.  Maintaining the black box nature of the
> enclave means that hardware must automatically switch CPU context when
> an Asynchronous Exiting Event (AEE) occurs, an AEE being any interrupt
> or exception (exceptions are AEEs because asynchronous in this context
> is relative to the enclave and not CPU execution, e.g. the enclave
> doesn't get an opportunity to save/fuzz CPU state).
>
> Like VM-Exits, all AEEs jump to a common location, referred to as the
> Asynchronous Exiting Point (AEP).  The AEP is specified at enclave entry
> via register passed to EENTER/ERESUME, similar to how the hypervisor
> specifies the VM-Exit point (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME).
> Resuming the enclave/VM after the exiting event is handled is done via
> ERESUME/VMRESUME respectively.  In SGX, for AEEs that are handled by the
> kernel, e.g. INTR, NMI and most page faults, IRET will journey back to
> the AEP, which then ERESUMEs the enclave.
>
> Enclaves also behave a bit like VMs in the sense that they can generate
> exceptions as part of their normal operation that for all intents and
> purposes need to be handled in the enclave/VM.  However, unlike VMX, SGX
> doesn't allow the host to modify its guest's, a.k.a. enclave's, state,
> as doing so would circumvent the enclave's security.  So to handle an
> exception, the enclave must first be re-entered through the normal
> EENTER flow (SYSCALL/SYSRET behavior), and then resumed via ERESUME
> (VMRESUME behavior) after the source of the exception is resolved.
>
> All of the above is just the tip of the iceberg when it comes to running
> an enclave.  But, SGX was designed in such a way that the host process
> can utilize a library to build, launch and run an enclave.  This is
> roughly analogous to how e.g. libc implementations are used by most
> applications so that the application can focus on its business logic.
>
> The big gotcha is that because enclaves can generate *and* handle
> exceptions, any SGX library must be prepared to handle nearly any
> exception at any time (well, any time a thread is executing in an
> enclave).  In Linux, this means the SGX library must register a
> signal handler in order to intercept relevant exceptions and forward
> them to the enclave (or in some cases, take action on behalf of the
> enclave).  Unfortunately, Linux's signal mechanism doesn't mesh well
> with libraries, e.g. signal handlers are process wide, are difficult
> to chain, etc...  This becomes particularly nasty when using multiple
> levels of libraries that register signal handlers, e.g. running an
> enclave via cgo inside of the Go runtime.
>
> In comes vDSO to save the day.  Now that vDSO can fixup exceptions,
> add a function to wrap enclave transitions and intercept any exceptions
> that occur in the enclave or on EENTER/ERESUME.  The actual code is
> blissfully short (especially compared to this changelog).
>
> In addition to the obvious trapnr, error_code and address, propagate
> the leaf number, i.e. RAX, back to userspace so that the caller can know
> whether the fault occurred in the enclave or if it occurred on EENTER.
> A fault on EENTER generally means the enclave has died and needs to be
> restarted.
>
> Suggested-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> Cc: Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> ---
>  arch/x86/entry/vdso/Makefile      |   1 +
>  arch/x86/entry/vdso/vdso.lds.S    |   1 +
>  arch/x86/entry/vdso/vsgx_eenter.c | 108 ++++++++++++++++++++++++++++++
>  3 files changed, 110 insertions(+)
>  create mode 100644 arch/x86/entry/vdso/vsgx_eenter.c
>
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index eb543ee1bcec..ba46673076bd 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -18,6 +18,7 @@ VDSO32-$(CONFIG_IA32_EMULATION)       := y
>
>  # files to link into the vdso
>  vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
> +vobjs-$(VDSO64-y)              += vsgx_eenter.o
>
>  # files to link into kernel
>  obj-y                          += vma.o extable.o
> diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
> index d3a2dce4cfa9..e422c4454f34 100644
> --- a/arch/x86/entry/vdso/vdso.lds.S
> +++ b/arch/x86/entry/vdso/vdso.lds.S
> @@ -25,6 +25,7 @@ VERSION {
>                 __vdso_getcpu;
>                 time;
>                 __vdso_time;
> +               __vdso_sgx_eenter;
>         local: *;
>         };
>  }
> diff --git a/arch/x86/entry/vdso/vsgx_eenter.c b/arch/x86/entry/vdso/vsgx_eenter.c
> new file mode 100644
> index 000000000000..3df4a95a34cc
> --- /dev/null
> +++ b/arch/x86/entry/vdso/vsgx_eenter.c
> @@ -0,0 +1,108 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> +// Copyright(c) 2018 Intel Corporation.
> +
> +#include <uapi/linux/errno.h>
> +#include <uapi/linux/types.h>
> +
> +#include "extable.h"
> +
> +/*
> + * This struct will be defined elsewhere in the actual implementation,
> + * e.g. arch/x86/include/uapi/asm/sgx.h.
> + */
> +struct sgx_eenter_fault_info {
> +       __u32   leaf;
> +       __u16   trapnr;
> +       __u16   error_code;
> +       __u64   address;
> +};
> +
> +/*
> + * ENCLU (ENCLave User) is an umbrella instruction for a variety of CPL3
> + * SGX functions.  The ENCLU function that is executed is specified in EAX,
> + * with each function potentially having more leaf-specific operands beyond
> + * EAX.  In the vDSO we're only concerned with the leafs that are used to
> + * transition to/from the enclave.
> + */
> +enum sgx_enclu_leaves {
> +       SGX_EENTER      = 2,
> +       SGX_ERESUME     = 3,
> +       SGX_EEXIT       = 4,
> +};
> +
> +notrace long __vdso_sgx_eenter(void *tcs, void *priv,
> +                              struct sgx_eenter_fault_info *fault_info)
> +{
> +       u32 trapnr, error_code;
> +       long leaf;
> +       u64 addr;
> +
> +       /*
> +        *      %eax = EENTER
> +        *      %rbx = tcs
> +        *      %rcx = do_eresume
> +        *      %rdi = priv
> +        * do_eenter:
> +        *      enclu
> +        *      jmp     out
> +        *
> +        * do_eresume:
> +        *      enclu
> +        *      ud2

Is the only reason for do_eresume to be different from do_eenter so
that you can do the ud2?

> +        *
> +        * out:
> +        *      <return to C code>
> +        *
> +        * fault_fixup:
> +        *      <extable loads RDI, RSI and RDX with fault info>
> +        *      jmp     out
> +        */

This has the IMO excellent property that it's extremely awkward to use
it for a model where the enclave is reentrant.  I think it's excellent
because reentrancy on the same enclave thread is just asking for
severe bugs.  Of course, I fully expect the SDK to emulate reentrancy,
but then it's 100% their problem :)  On the flip side, it means that
you can't really recover from a reported fault, even if you want to,
because there's no way to ask for ERESUME.  So maybe the API should
allow that after all.
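
To make the "allow ERESUME" option concrete, here is a rough sketch of the
check a wrapper might do if the caller were allowed to pick the leaf.  This
is purely illustrative: sgx_validate_enter_leaf() is a hypothetical name,
nothing like it exists in the patch, and the choice of -EINVAL is an
assumption.  Only the leaf numbers come from the quoted code.

```c
#include <errno.h>

/* ENCLU leaf numbers, copied from the quoted patch. */
enum sgx_enclu_leaves {
	SGX_EENTER	= 2,
	SGX_ERESUME	= 3,
	SGX_EEXIT	= 4,
};

/*
 * Hypothetical helper (not part of the patch): decide whether a
 * caller-supplied leaf may be handed to ENCLU by the vDSO wrapper.
 * Returns 0 for EENTER and ERESUME, -EINVAL for anything else.
 */
static int sgx_validate_enter_leaf(unsigned int leaf)
{
	switch (leaf) {
	case SGX_EENTER:
	case SGX_ERESUME:
		return 0;
	default:
		/* EEXIT and unknown leaves are not ways into an enclave. */
		return -EINVAL;
	}
}
```

With something like this, a caller could ask for ERESUME after handling a
reported fault, while the wrapper would still refuse arbitrary ENCLU leaves.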

I think it might be polite to at least give some out regs, maybe RSI and RDI?
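
For reference, this is roughly how I would expect a userspace caller to
consume the fault info described in the changelog.  It is a sketch only: the
struct mirrors the one in the quoted patch using <stdint.h> types, and
sgx_fault_origin() plus its enum are hypothetical names.  It also assumes
that a fault taken inside the enclave reports the ERESUME leaf (the value
RAX holds after an asynchronous exit), while a fault on the entry
instruction itself reports EENTER.

```c
#include <assert.h>
#include <stdint.h>

/* ENCLU leaf numbers, copied from the quoted patch. */
enum sgx_enclu_leaves {
	SGX_EENTER	= 2,
	SGX_ERESUME	= 3,
	SGX_EEXIT	= 4,
};

/* Userspace mirror of the struct in the quoted patch. */
struct sgx_eenter_fault_info {
	uint32_t leaf;
	uint16_t trapnr;
	uint16_t error_code;
	uint64_t address;
};

/* 4 + 2 + 2 + 8 bytes, so no padding is expected. */
static_assert(sizeof(struct sgx_eenter_fault_info) == 16,
	      "unexpected padding in sgx_eenter_fault_info");

/* Hypothetical classification of where a reported fault happened. */
enum sgx_fault_origin {
	SGX_FAULT_ON_EENTER,	/* enclave is likely dead, restart it */
	SGX_FAULT_IN_ENCLAVE,	/* candidate for forwarding to the enclave */
	SGX_FAULT_UNKNOWN,
};

static enum sgx_fault_origin
sgx_fault_origin(const struct sgx_eenter_fault_info *info)
{
	switch (info->leaf) {
	case SGX_EENTER:
		return SGX_FAULT_ON_EENTER;
	case SGX_ERESUME:
		/* Assumption: RAX holds ERESUME after an asynchronous exit. */
		return SGX_FAULT_IN_ENCLAVE;
	default:
		return SGX_FAULT_UNKNOWN;
	}
}
```

A caller would run this on the fault_info filled in by the vDSO function and
then either tear the enclave down or try to forward the exception to it.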