[GIT PULL] device-dax fixes for 4.11-rc3




Hi Linus, please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-fixes

...to receive 2 fixes and a cleanup for device-dax.

The device-dax driver did not handle falling back to smaller
fault-granularity sizes. It already fails fault attempts that are
smaller than the device's alignment, but it also needs to handle the
cases where a larger page mapping could be established. To keep the
immediate fix simple, the implementation just signals VM_FAULT_FALLBACK
until fault-size == device-alignment. One fix is for -stable and
addresses pmd-to-pte fallback from the original implementation, another
fixes the new (introduced in 4.11-rc1) pud-to-pmd regression, and a
typo fix comes along for the ride. These have received a build success
notification from the kbuild robot.
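
The size-versus-alignment decision described above can be modeled in
user space. This is only a sketch, not the driver code: the enum, the
helper check_fault_size() and the locally defined SZ_* macros are
hypothetical stand-ins, and the 4KB/2MB/1GB values assume x86-64 page
table sizes.

```c
#include <assert.h>

/* Assumed x86-64 mapping sizes (pte, pmd, pud); defined locally for the sketch. */
#define SZ_4K (4UL * 1024)
#define SZ_2M (2UL * 1024 * 1024)
#define SZ_1G (1UL * 1024 * 1024 * 1024)

/* Hypothetical stand-ins for the kernel's VM_FAULT_* return codes. */
enum fault_result {
	FAULT_PROCEED,	/* sizes match, go on and install the mapping */
	FAULT_SIGBUS,	/* fault is smaller than the device alignment */
	FAULT_FALLBACK	/* fault is larger, ask for a retry at a smaller size */
};

/* Compare the size of the fault being handled with the device alignment. */
static enum fault_result check_fault_size(unsigned long fault_size,
					   unsigned long align)
{
	assert(align != 0);

	if (fault_size < align)
		return FAULT_SIGBUS;
	if (fault_size > align)
		return FAULT_FALLBACK;
	return FAULT_PROCEED;
}
```

With a 2MB alignment, for example, a pud-sized fault yields
FAULT_FALLBACK, the pmd-sized retry proceeds, and a pte-sized fault
yields FAULT_SIGBUS.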

The following changes since commit c1ae3cfa0e89fa1a7ecc4c99031f5e9ae99d9201:

  Linux 4.11-rc1 (2017-03-05 12:59:56 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-fixes

for you to fetch changes up to 52084f89b38cdd896b59627c629915ef1a7bf615:

  device-dax: fix debug output typo (2017-03-10 19:56:56 -0800)

----------------------------------------------------------------
Dave Jiang (3):
      device-dax: fix pmd/pte fault fallback handling
      device-dax: fix pud fault fallback handling
      device-dax: fix debug output typo

 drivers/dax/dax.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

---

commit 0134ed4fb9e78672ee9f7b18007114404c81e63f
Author: Dave Jiang <dave.jiang@xxxxxxxxx>
Date:   Fri Mar 10 13:24:22 2017 -0700

    device-dax: fix pmd/pte fault fallback handling

    Jeff Moyer reports:

        With a device dax alignment of 4KB or 2MB, I get sigbus when running
        the attached fio job file for the current kernel (4.11.0-rc1+).  If
        I specify an alignment of 1GB, it works.

        I turned on debug output, and saw that it was failing in the huge
        fault code.

         dax dax1.0: dax_open
         dax dax1.0: dax_mmap
         dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 -
         dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf60
         dax dax1.0: dax_release

        fio config for reproduce:
        [global]
        ioengine=dev-dax
        direct=0
        filename=/dev/dax0.0
        bs=2m

        [write]
        rw=write

        [read]
        stonewall
        rw=read

    The driver fails to fall back when taking a fault that is larger than
    the device alignment, or handling a larger fault when a smaller
    mapping is already established. While we could support larger
    mappings for a device with a smaller alignment, that change is
    too large for the immediate fix. The simplest change is to force
    fallback until the fault size matches the alignment.

    Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
    Cc: <stable@xxxxxxxxxxxxxxx>
    Reported-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
    Signed-off-by: Dave Jiang <dave.jiang@xxxxxxxxx>
    Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 70b085b06c4560a69e95607f77bb4c2b2e41943c
Author: Dave Jiang <dave.jiang@xxxxxxxxx>
Date:   Fri Mar 10 13:24:27 2017 -0700

    device-dax: fix pud fault fallback handling

    Jeff Moyer reports:

        With a device dax alignment of 4KB or 2MB, I get sigbus when running
        the attached fio job file for the current kernel (4.11.0-rc1+).  If
        I specify an alignment of 1GB, it works.

        I turned on debug output, and saw that it was failing in the huge
        fault code.

         dax dax1.0: dax_open
         dax dax1.0: dax_mmap
         dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 -
         dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf60)
         dax dax1.0: dax_release

        fio config for reproduce:
        [global]
        ioengine=dev-dax
        direct=0
        filename=/dev/dax0.0
        bs=2m

        [write]
        rw=write

        [read]
        stonewall
        rw=read

    The driver fails to fall back when taking a fault that is larger than
    the device alignment, or handling a larger fault when a smaller
    mapping is already established. While we could support larger
    mappings for a device with a smaller alignment, that change is
    too large for the immediate fix. The simplest change is to force
    fallback until the fault size matches the alignment.

    Reported-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
    Signed-off-by: Dave Jiang <dave.jiang@xxxxxxxxx>
    Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 52084f89b38cdd896b59627c629915ef1a7bf615
Author: Dave Jiang <dave.jiang@xxxxxxxxx>
Date:   Thu Mar 9 16:56:01 2017 -0700

    device-dax: fix debug output typo

    The debug output for the return data of pgoff_to_phys() in the
    fault handlers has 'phys' and 'pgoff' incorrectly swapped.

    Reported-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
    Signed-off-by: Dave Jiang <dave.jiang@xxxxxxxxx>
    Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
index 8d9829ff2a78..80c6db279ae1 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/dax.c
@@ -427,6 +427,7 @@ static int __dax_dev_pte_fault(struct dax_dev *dax_dev, struct vm_fault *vmf)
        int rc = VM_FAULT_SIGBUS;
        phys_addr_t phys;
        pfn_t pfn;
+       unsigned int fault_size = PAGE_SIZE;

        if (check_vma(dax_dev, vmf->vma, __func__))
                return VM_FAULT_SIGBUS;
@@ -437,9 +438,12 @@ static int __dax_dev_pte_fault(struct dax_dev *dax_dev, struct vm_fault *vmf)
                return VM_FAULT_SIGBUS;
        }

+       if (fault_size != dax_region->align)
+               return VM_FAULT_SIGBUS;
+
        phys = pgoff_to_phys(dax_dev, vmf->pgoff, PAGE_SIZE);
        if (phys == -1) {
-               dev_dbg(dev, "%s: phys_to_pgoff(%#lx) failed\n", __func__,
+               dev_dbg(dev, "%s: pgoff_to_phys(%#lx) failed\n", __func__,
                                vmf->pgoff);
                return VM_FAULT_SIGBUS;
        }
@@ -464,6 +468,7 @@ static int __dax_dev_pmd_fault(struct dax_dev *dax_dev, struct vm_fault *vmf)
        phys_addr_t phys;
        pgoff_t pgoff;
        pfn_t pfn;
+       unsigned int fault_size = PMD_SIZE;

        if (check_vma(dax_dev, vmf->vma, __func__))
                return VM_FAULT_SIGBUS;
@@ -480,10 +485,20 @@ static int __dax_dev_pmd_fault(struct dax_dev *dax_dev, struct vm_fault *vmf)
                return VM_FAULT_SIGBUS;
        }

+       if (fault_size < dax_region->align)
+               return VM_FAULT_SIGBUS;
+       else if (fault_size > dax_region->align)
+               return VM_FAULT_FALLBACK;
+
+       /* if we are outside of the VMA */
+       if (pmd_addr < vmf->vma->vm_start ||
+                       (pmd_addr + PMD_SIZE) > vmf->vma->vm_end)
+               return VM_FAULT_SIGBUS;
+
        pgoff = linear_page_index(vmf->vma, pmd_addr);
        phys = pgoff_to_phys(dax_dev, pgoff, PMD_SIZE);
        if (phys == -1) {
-               dev_dbg(dev, "%s: phys_to_pgoff(%#lx) failed\n", __func__,
+               dev_dbg(dev, "%s: pgoff_to_phys(%#lx) failed\n", __func__,
                                pgoff);
                return VM_FAULT_SIGBUS;
        }
@@ -503,6 +518,8 @@ static int __dax_dev_pud_fault(struct dax_dev *dax_dev, struct vm_fault *vmf)
        phys_addr_t phys;
        pgoff_t pgoff;
        pfn_t pfn;
+       unsigned int fault_size = PUD_SIZE;
+

        if (check_vma(dax_dev, vmf->vma, __func__))
                return VM_FAULT_SIGBUS;
@@ -519,10 +536,20 @@ static int __dax_dev_pud_fault(struct dax_dev *dax_dev, struct vm_fault *vmf)
                return VM_FAULT_SIGBUS;
        }

+       if (fault_size < dax_region->align)
+               return VM_FAULT_SIGBUS;
+       else if (fault_size > dax_region->align)
+               return VM_FAULT_FALLBACK;
+
+       /* if we are outside of the VMA */
+       if (pud_addr < vmf->vma->vm_start ||
+                       (pud_addr + PUD_SIZE) > vmf->vma->vm_end)
+               return VM_FAULT_SIGBUS;
+
        pgoff = linear_page_index(vmf->vma, pud_addr);
        phys = pgoff_to_phys(dax_dev, pgoff, PUD_SIZE);
        if (phys == -1) {
-               dev_dbg(dev, "%s: phys_to_pgoff(%#lx) failed\n", __func__,
+               dev_dbg(dev, "%s: pgoff_to_phys(%#lx) failed\n", __func__,
                                pgoff);
                return VM_FAULT_SIGBUS;
        }
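
The "outside of the VMA" check added to the pmd and pud handlers above
can be illustrated in user space as well. This is a sketch under
assumptions: huge_range_in_vma() is a hypothetical helper, and it
assumes the handler derives the huge-page base address by masking the
faulting address down to the mapping size, which the hunks above do not
show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when the size-aligned range that
 * contains addr lies entirely within [vm_start, vm_end).
 */
static bool huge_range_in_vma(uint64_t addr, uint64_t vm_start,
			      uint64_t vm_end, uint64_t size)
{
	uint64_t base;

	/* The mapping size must be a non-zero power of two. */
	assert(size != 0 && (size & (size - 1)) == 0);

	/* Assumed derivation: round the faulting address down to the size. */
	base = addr & ~(size - 1);

	return base >= vm_start && base + size <= vm_end;
}
```

A 2MB fault at any address inside a VMA that spans exactly one aligned
2MB range passes the check; if the VMA starts one page later or ends
one page earlier, the check fails, which corresponds to the
VM_FAULT_SIGBUS return in the patch.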