
[PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit




Recent changes in logical package management (commit 9d85eb9119f4
("x86/smpboot: Make logical package management more robust") and its
predecessor) caused boot failures for some Xen guests. For example, booting
a 10-CPU guest on an AMD Opteron 4284 system crashes as follows:

[    0.116104] smpboot: Max logical packages: 1
...
[    0.590068]   #8
[    0.001000] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
[    0.001000] ------------[ cut here ]------------
[    0.001000] kernel BUG at arch/x86/kernel/cpu/common.c:1020!

This happens because total_cpus is 10 while x86_max_cores is 16(!), so
__max_logical_packages is computed as 1 (see the log above), yet CPU 8
reports package 1. It turns out that the number of CPUs (vCPUs in our case)
in each logical package doesn't have to be exactly x86_max_cores: a package
can have any number of CPUs <= x86_max_cores, and the counts don't have to
match across logical packages. This breaks the current concept of
__max_logical_packages.
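
The arithmetic behind the failure can be reproduced in user space. The
sketch below is only an illustration: old_max_logical_packages() and
package_fits() are hypothetical helpers, not kernel functions, and it
assumes that ncpus equals x86_max_cores (16) and that the warning in the
log fires when a package number is not below the computed limit.

```c
#include <assert.h>

/* Same rounding-up division as the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: the limit as computed before this patch,
 * assuming ncpus is taken directly from x86_max_cores.
 */
static inline unsigned int old_max_logical_packages(unsigned int total_cpus,
						    unsigned int ncpus)
{
	assert(ncpus > 0);
	return DIV_ROUND_UP(total_cpus, ncpus);
}

/*
 * Hypothetical check: a logical package number is acceptable only if it
 * is strictly below the limit (assumed form of the check that prints
 * "exceeds BIOS package data").
 */
static inline int package_fits(unsigned int pkg, unsigned int max_packages)
{
	return pkg < max_packages;
}
```

With total_cpus = 10 and ncpus = 16 the old limit is 1, so package 1 of
CPU 8 no longer fits.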

In this patch I suggest setting __max_logical_packages based on
max_physical_pkg_id and total_cpus. This should be safe and cover all
possible cases. Alternatively, we could eliminate the concept of
__max_logical_packages completely and rely on max_physical_pkg_id/
total_cpus where we currently use topology_max_packages().
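
As a rough user-space sketch of the proposed bound (not the kernel code
itself): new_max_logical_packages() is a hypothetical helper, and the value
passed for MAX_LOCAL_APIC is left as a parameter because it depends on the
kernel configuration.

```c
#include <assert.h>

/* Same rounding-up division as the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper mirroring the patched computation:
 *   max_physical_pkg_id    = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus)
 *   __max_logical_packages = min(max_physical_pkg_id, total_cpus)
 * max_local_apic is a parameter here; its real value is config dependent.
 */
static inline unsigned int new_max_logical_packages(unsigned int max_local_apic,
						    unsigned int ncpus,
						    unsigned int total_cpus)
{
	unsigned int max_physical_pkg_id;

	assert(ncpus > 0);
	max_physical_pkg_id = DIV_ROUND_UP(max_local_apic, ncpus);

	/* Never more packages than CPUs, never more than physical ids. */
	return max_physical_pkg_id < total_cpus ? max_physical_pkg_id
						: total_cpus;
}
```

The bound is deliberately loose: in the worst case every CPU sits in its
own logical package, so total_cpus caps the count whenever fewer CPUs than
physical package ids exist.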

I guess the issue could have been solved in Xen too: the CPUID value
behind x86_max_cores could be tweaked to be the lowest CPU count among
all logical packages of the guest.

Fixes: 9d85eb9119f4 ("x86/smpboot: Make logical package management more robust")
Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
---
 arch/x86/kernel/smpboot.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index bd1f1ad..85f41cd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -359,7 +359,6 @@ static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
 		ncpus = 1;
 	}
 
-	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
 	logical_packages = 0;
 
 	/*
@@ -367,6 +366,15 @@ static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
 	 * package can be smaller than the actual used apic ids.
 	 */
 	max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);
+
+	/*
+	 * Each logical package has not more than x86_max_cores CPUs but
+	 * it can happen that it has less, e.g. we may have 1 CPU per logical
+	 * package regardless of what's in x86_max_cores. This is seen on some
+	 * Xen setups with AMD processors.
+	 */
+	__max_logical_packages = min(max_physical_pkg_id, total_cpus);
+
 	size = max_physical_pkg_id * sizeof(unsigned int);
 	physical_to_logical_pkg = kmalloc(size, GFP_KERNEL);
 	memset(physical_to_logical_pkg, 0xff, size);
-- 
2.9.3