Web lists-archives.com

Re: [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT




On Sat, Mar 18, 2017 at 7:37 AM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
> 2017-03-18 1:28 GMT+08:00 Ladi Prosek <lprosek@xxxxxxxxxx>:
>> On Fri, Mar 17, 2017 at 3:41 PM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>>> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>>>
>>> The L2 guest hang if shadow page tables on EPT, the trace on L1 shows that
>>> L2 kvm_exit reason EXCEPTION_NMI and page fault repeatedly:
>>>
>>> qemu-system-x86-2821  [003] d..2    45.848814: kvm_entry: vcpu 0
>>> qemu-system-x86-2821  [003] ...1    45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e
>>> qemu-system-x86-2821  [003] ...1    45.848827: kvm_page_fault: address fe05b error_code 14
>>>
>>> Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT)
>>> prevents to load L2's PDPTRs according to dereferencing L2's CR3 since it is
>>> uninitialized in real mode. Hyper-V L1 will emulate L2 real mode with PAE
>>> paging and EPT enabled. However, there is a progress to switch from Legacy
>>> mode's such-mode Protected mode to Long mode during system boot, the check
>>> in nested_vmx_load_cr3() will prevent to load PDPTRs if it is still in
>>> Protected mode w/ PAE paging and nested EPT/shadow page tables on EPT. Actually
>>> the original commit should just intended to prevent to dereference L2's CR3
>>> if the L1 hypervisor emulates L2's real mode through vm8086.
>>>
>>> This patch fixes it by allowing load PDPTRs if PAE paing, EPT enabled and
>>> !vm86_active.
>>>
>>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>>> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
>>> Cc: Ladi Prosek <lprosek@xxxxxxxxxx>
>>> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>>> ---
>>>  arch/x86/kvm/vmx.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index c664365..2b2a05f 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
>>>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
>>>                                u32 *entry_failure_code)
>>>  {
>>> -       if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
>>> +       if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) {
>>>                 if (!nested_cr3_valid(vcpu, cr3)) {
>>>                         *entry_failure_code = ENTRY_FAIL_DEFAULT;
>>>                         return 1;
>>> @@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>>>                  * must not be dereferenced.
>>>                  */
>>>                 if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
>>> -                   !nested_ept) {
>>> +                   !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>>
>> This change breaks Hyper-V on KVM. L2 hangs on start-up, same symptoms
>> as before 7ca29de21362.
>
> Hmm, I miss the function pdptrs_changed() will also dereference CR3.
> How about something like this:
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c664365..d7ebf03 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -9933,7 +9933,9 @@ static bool nested_cr3_valid(struct kvm_vcpu
> *vcpu, unsigned long val)
>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long
> cr3, bool nested_ept,
>                     u32 *entry_failure_code)
>  {
> -    if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
> +    if (cr3 != kvm_read_cr3(vcpu) ||
> +        (!(nested_ept && to_vmx(vcpu)->rmode.vm86_active) &&
> +        pdptrs_changed(vcpu))) {
>          if (!nested_cr3_valid(vcpu, cr3)) {
>              *entry_failure_code = ENTRY_FAIL_DEFAULT;
>              return 1;
> @@ -9944,7 +9946,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu
> *vcpu, unsigned long cr3, bool ne
>           * must not be dereferenced.
>           */
>          if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
> -            !nested_ept) {
> +            !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>              if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
>                  *entry_failure_code = ENTRY_FAIL_PDPTE;
>                  return 1;

Still the same, Hyper-V is broken. The problem is not in real vs.
protected mode. The way nested_ept_enabled is computed is incorrect.

I can run both Hyper-V and KVM with EPT = 0 in L1 with this patch. Can
you please give it a try?

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 98e82ee..9145c94 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10121,7 +10121,7 @@ static int prepare_vmcs02(struct kvm_vcpu
*vcpu, struct vmcs12 *vmcs12,
                                vmcs12->guest_intr_status);
                }

-               nested_ept_enabled = (exec_control &
SECONDARY_EXEC_ENABLE_EPT) != 0;
+               nested_ept_enabled =
(vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_ENABLE_EPT) != 0;

                /*
                 * Write an illegal value to APIC_ACCESS_ADDR. Later,

Thanks!
Ladi

>> I'll take a closer look next week. Is there an easy way for me to
>> reproduce the issue you're seeing?
>
> L1 KVM w/ EPT = 0 and L0 KVM w/ EPT = 1.
>
> Regards,
> Wanpeng Li
>
>>
>>>                         if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
>>>                                 *entry_failure_code = ENTRY_FAIL_PDPTE;
>>>                                 return 1;
>>> --
>>> 2.7.4
>>>