commit 93e02ae4200184bab43ce29966e895826a756a37
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Wed Aug 15 18:14:55 2018 +0200

    Linux 4.9.120

commit 7f5d090ffe9e7603265e7991aacec64d86cf70ab
Author: Borislav Petkov <bpetkov@suse.de>
Date:   Fri Apr 27 16:34:34 2018 -0500

    x86/CPU/AMD: Have smp_num_siblings and cpu_llc_id always be present
    
    commit f8b64d08dde2714c62751d18ba77f4aeceb161d3 upstream.
    
    Move smp_num_siblings and cpu_llc_id to cpu/common.c so that they're
    always present as symbols and not only in the CONFIG_SMP case. Then,
    other code using them doesn't need ugly ifdeffery anymore. Get rid of
    some ifdeffery.
    
    Signed-off-by: Borislav Petkov <bpetkov@suse.de>
    Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lkml.kernel.org/r/1524864877-111962-2-git-send-email-suravee.suthikulpanit@amd.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
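
    [ Editorial sketch, not part of the patch above: a minimal illustration of
      the unconditional definitions that end up in cpu/common.c, so other code
      can reference them without CONFIG_SMP ifdeffery. Exact placement and
      initializers follow the changelog and may differ in 4.9. ]

        unsigned int smp_num_siblings = 1;
        EXPORT_SYMBOL(smp_num_siblings);

        /* Last level cache ID of each logical CPU */
        DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id) = BAD_APICID;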

commit 4edf4ad2e7ee7d527fd8288c22d6ee608eae705c
Author: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Date:   Mon Jul 31 10:51:58 2017 +0200

    x86/cpu/amd: Limit cpu_core_id fixup to families older than F17h
    
    commit b89b41d0b8414690ec0030c134b8bde209e6d06c upstream.
    
    Current cpu_core_id fixup causes downcored F17h configurations to be
    incorrect:
    
      NODE: 0
      processor  0 core id : 0
      processor  1 core id : 1
      processor  2 core id : 2
      processor  3 core id : 4
      processor  4 core id : 5
      processor  5 core id : 0
    
      NODE: 1
      processor  6 core id : 2
      processor  7 core id : 3
      processor  8 core id : 4
      processor  9 core id : 0
      processor 10 core id : 1
      processor 11 core id : 2
    
    Code that relies on cpu_core_id, like match_smt(), for example, which
    builds the thread siblings masks used by the scheduler, is misled.
    
    So, limit the fixup to pre-F17h machines. The new value for cpu_core_id
    for F17h and later will represent the CPUID_Fn8000001E_EBX[CoreId],
    which is guaranteed to be unique for each core within a socket.
    
    This way we have:
    
      NODE: 0
      processor  0 core id : 0
      processor  1 core id : 1
      processor  2 core id : 2
      processor  3 core id : 4
      processor  4 core id : 5
      processor  5 core id : 6
    
      NODE: 1
      processor  6 core id : 8
      processor  7 core id : 9
      processor  8 core id : 10
      processor  9 core id : 12
      processor 10 core id : 13
      processor 11 core id : 14
    
    Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    [ Heavily massaged. ]
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
    Link: http://lkml.kernel.org/r/20170731085159.9455-2-bp@alien8.de
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
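
    [ Editorial sketch, not part of the patch above: how the F17h+ path can
      take cpu_core_id straight from CPUID, as the changelog describes; the
      local variables are illustrative only. ]

        if (c->x86 >= 0x17) {
                u32 eax, ebx, ecx, edx;

                /* CPUID_Fn8000001E_EBX[7:0] is the per-socket-unique CoreId */
                cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
                c->cpu_core_id = ebx & 0xff;
        }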

commit b4f17de89e7aaecfc67a173ca8607899ee8707c3
Author: Jiri Kosina <jkosina@suse.cz>
Date:   Sat Jul 14 21:56:13 2018 +0200

    x86/speculation/l1tf: Unbreak !__HAVE_ARCH_PFN_MODIFY_ALLOWED architectures
    
    commit 6c26fcd2abfe0a56bbd95271fce02df2896cfd24 upstream.
    
    pfn_modify_allowed() and arch_has_pfn_modify_check() are outside of the
    !__ASSEMBLY__ section in include/asm-generic/pgtable.h, which confuses
    assembler on archs that don't have __HAVE_ARCH_PFN_MODIFY_ALLOWED (e.g.
    ia64) and breaks build:
    
        include/asm-generic/pgtable.h: Assembler messages:
        include/asm-generic/pgtable.h:538: Error: Unknown opcode `static inline bool pfn_modify_allowed(unsigned long pfn,pgprot_t prot)'
        include/asm-generic/pgtable.h:540: Error: Unknown opcode `return true'
        include/asm-generic/pgtable.h:543: Error: Unknown opcode `static inline bool arch_has_pfn_modify_check(void)'
        include/asm-generic/pgtable.h:545: Error: Unknown opcode `return false'
        arch/ia64/kernel/entry.S:69: Error: `mov' does not fit into bundle
    
    Move those two static inlines into the !__ASSEMBLY__ section so that they
    don't confuse the asm build pass.
    
    Fixes: 42e4089c7890 ("x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings")
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    [groeck: Context changes]
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
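
    [ Editorial sketch, not part of the patch above: the fix amounts to keeping
      the two stubs inside the C-only region of include/asm-generic/pgtable.h. ]

        #ifndef __ASSEMBLY__
        #ifndef __HAVE_ARCH_PFN_MODIFY_ALLOWED
        static inline bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
        {
                return true;
        }

        static inline bool arch_has_pfn_modify_check(void)
        {
                return false;
        }
        #endif
        #endif /* !__ASSEMBLY__ */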

commit 16848eb10e9e0989e5898dec204f0967c483f044
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Tue Aug 14 20:50:47 2018 +0200

    x86/init: fix build with CONFIG_SWAP=n
    
    commit 792adb90fa724ce07c0171cbc96b9215af4b1045 upstream.
    
    The introduction of generic_max_swapfile_size and arch-specific versions has
    broken linking on x86 with CONFIG_SWAP=n due to undefined reference to
    'generic_max_swapfile_size'. Fix it by compiling the x86-specific
    max_swapfile_size() only with CONFIG_SWAP=y.
    
    Reported-by: Tomas Pruzina <pruzinat@gmail.com>
    Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
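
    [ Editorial sketch, not part of the patch above: the x86 override is only
      built when the generic helper it relies on exists, i.e. CONFIG_SWAP=y.
      The body shown here is abridged. ]

        #ifdef CONFIG_SWAP
        unsigned long max_swapfile_size(void)
        {
                unsigned long pages = generic_max_swapfile_size();

                /* L1TF clamping of the limit to MAX_PA/2 elided */
                return pages;
        }
        #endif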

commit aee0861fbe95f2311c81b8162bbd1eb196cdf5f2
Author: Abel Vesa <abelvesa@linux.com>
Date:   Wed Aug 15 00:26:00 2018 +0300

    cpu/hotplug: Non-SMP machines do not make use of booted_once
    
    commit 269777aa530f3438ec1781586cdac0b5fe47b061 upstream.
    
    Commit 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
    breaks non-SMP builds.
    
    [ I suspect the 'bool' fields should just be made to be bitfields and be
      exposed regardless of configuration, but that's a separate cleanup
      that I'll leave to the owners of this file for later.   - Linus ]
    
    Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Abel Vesa <abelvesa@linux.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 59a6e1f27602b24f7919e188ff54561e0653620b
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Tue Aug 14 23:38:57 2018 +0200

    x86/smp: fix non-SMP broken build due to redefinition of apic_id_is_primary_thread
    
    commit d0055f351e647f33f3b0329bff022213bf8aa085 upstream.
    
    The function has an inline "return false;" definition with CONFIG_SMP=n
    but the "real" definition is also visible leading to "redefinition of
    ‘apic_id_is_primary_thread’" compiler error.
    
    Guard it with #ifdef CONFIG_SMP
    
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Fixes: 6a4d2657e048 ("x86/smp: Provide topology_is_primary_thread()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
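
    [ Editorial sketch, not part of the patch above: the shape of the guard in
      asm/apic.h; the prototype line is illustrative. ]

        #ifdef CONFIG_SMP
        bool apic_id_is_primary_thread(unsigned int apicid);
        #else
        static inline bool apic_id_is_primary_thread(unsigned int apicid)
        {
                return false;
        }
        #endif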

commit da540c063b06b18f77168c8a52ee5a9c783a7481
Author: Josh Poimboeuf <jpoimboe@redhat.com>
Date:   Fri Aug 10 08:31:10 2018 +0100

    x86/microcode: Allow late microcode loading with SMT disabled
    
    commit 07d981ad4cf1e78361c6db1c28ee5ba105f96cc1 upstream
    
    The kernel unnecessarily prevents late microcode loading when SMT is
    disabled.  It should be safe to allow it if all the primary threads are
    online.
    
    Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
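
    [ Editorial sketch, not part of the patch above: the relaxed check as
      described in the changelog; only primary SMT threads need to be online
      for a late microcode load. Helper name and return code are illustrative. ]

        static int check_online_cpus(void)
        {
                unsigned int cpu;

                for_each_present_cpu(cpu) {
                        if (topology_is_primary_thread(cpu) && !cpu_online(cpu))
                                return -EINVAL;
                }
                return 0;
        }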

commit 760f9488c13b7d2da69b152a55069e0267ca1477
Author: Ashok Raj <ashok.raj@intel.com>
Date:   Wed Feb 28 11:28:43 2018 +0100

    x86/microcode: Do not upload microcode if CPUs are offline
    
    commit 30ec26da9967d0d785abc24073129a34c3211777 upstream.
    
    Avoid loading microcode if any of the CPUs are offline, and issue a
    warning. Having different microcode revisions on the system at any time
    is outright dangerous.
    
    [ Borislav: Massage changelog. ]
    
    Signed-off-by: Ashok Raj <ashok.raj@intel.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
    Tested-by: Ashok Raj <ashok.raj@intel.com>
    Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
    Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
    Link: http://lkml.kernel.org/r/1519352533-15992-4-git-send-email-ashok.raj@intel.com
    Link: https://lkml.kernel.org/r/20180228102846.13447-5-bp@alien8.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit d21c27185b6f2c32d4b029d1b5c0661702099baf
Author: David Woodhouse <dwmw@amazon.co.uk>
Date:   Wed Aug 8 11:00:16 2018 +0100

    tools headers: Synchronise x86 cpufeatures.h for L1TF additions
    
    commit e24f14b0ff985f3e09e573ba1134bfdf42987e05 upstream
    
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e79d049743f1466084df5708cd0e052b0d586548
Author: Andi Kleen <ak@linux.intel.com>
Date:   Tue Aug 7 15:09:38 2018 -0700

    x86/mm/kmmio: Make the tracer robust against L1TF
    
    commit 1063711b57393c1999248cccb57bebfaf16739e7 upstream
    
    The mmio tracer sets io mapping PTEs and PMDs to non present when enabled
    without inverting the address bits, which makes the PTE entry vulnerable
    for L1TF.
    
    Make it use the right low level macros to actually invert the address bits
    to protect against L1TF.
    
    In principle this could be avoided because MMIO tracing is not likely to be
    enabled on production machines, but the fix is straightforward and for
    consistency's sake it's better to get rid of the open coded PTE manipulation.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7e464373357dd6ff33a1a7373d5e596ed1dbb219
Author: Andi Kleen <ak@linux.intel.com>
Date:   Tue Aug 7 15:09:39 2018 -0700

    x86/mm/pat: Make set_memory_np() L1TF safe
    
    commit 958f79b9ee55dfaf00c8106ed1c22a2919e0028b upstream
    
    set_memory_np() is used to mark kernel mappings not present, but it has
    it's own open coded mechanism which does not have the L1TF protection of
    inverting the address bits.
    
    Replace the open coded PTE manipulation with the L1TF protecting low level
    PTE routines.
    
    Passes the CPA self test.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    [ dwmw2: Pull in pud_mkhuge() from commit a00cc7d9dd, and pfn_pud() ]
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 5ebf3f8d5b56412973ca3f2363dae52f795c6700
Author: Andi Kleen <ak@linux.intel.com>
Date:   Tue Aug 7 15:09:37 2018 -0700

    x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
    
    commit 0768f91530ff46683e0b372df14fd79fe8d156e5 upstream
    
    Some cases in THP like:
      - MADV_FREE
      - mprotect
      - split
    
    temporarily mark the PMD not present to prevent races. The window for
    an L1TF attack in these contexts is very small, but it should be fixed
    for correctness' sake.
    
    Use the proper low level functions for pmd/pud_mknotpresent() to address
    this.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
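
    [ Editorial sketch, not part of the patch above: the low level helpers
      conceptually XOR the physical-address bits of a not-present entry with an
      all-ones mask, so a speculative L1D lookup cannot hit real memory. The
      function name here is illustrative; upstream uses protnone_mask() and
      friends. ]

        static inline u64 flip_pfn_bits(u64 pfn_bits, u64 invert_mask)
        {
                /* invert_mask == ~0ULL for not-present entries, 0 otherwise */
                return pfn_bits ^ invert_mask;
        }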

commit 4656dfb6b5ddb2c7e6120b8a8d0b144445bf5914
Author: Andi Kleen <ak@linux.intel.com>
Date:   Tue Aug 7 15:09:36 2018 -0700

    x86/speculation/l1tf: Invert all not present mappings
    
    commit f22cc87f6c1f771b57c407555cfefd811cdd9507 upstream
    
    For kernel mappings PAGE_PROTNONE is not necessarily set for a non present
    mapping, but the inversion logic explicitly checks for !PRESENT and
    PROT_NONE.
    
    Remove the PROT_NONE check and make the inversion unconditional for all not
    present mappings.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c504b9fce7ba7a2ff96f857d609c69e291553ef0
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Aug 7 08:19:57 2018 +0200

    cpu/hotplug: Fix SMT supported evaluation
    
    commit bc2d8d262cba5736332cbc866acb11b1c5748aa9 upstream
    
    Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
    cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
    on the kernel command line as it cannot differentiate between SMT disabled
    by BIOS and SMT soft disable via 'nosmt'. That wrecks the state and
    makes the sysfs interface unusable.
    
    Rework this so that during bringup of the non boot CPUs the availability of
    SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
    'primary' thread then set the local cpu_smt_available marker and evaluate
    this explicitly right after the initial SMP bringup has finished.
    
    SMT evaluation on x86 is a trainwreck as the firmware has all the
    information _before_ booting the kernel, but there is no interface to query
    it.
    
    Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
    Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f56c8ee659c926bdba42c0d45405433e1a00eb2e
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Sun Aug 5 16:07:47 2018 +0200

    KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
    
    commit 5b76a3cff011df2dcb6186c965a2e4d809a05ad4 upstream
    
    When nested virtualization is in use, VMENTER operations from the nested
    hypervisor into the nested guest will always be processed by the bare metal
    hypervisor, and KVM's "conditional cache flushes" mode in particular does a
    flush on nested vmentry.  Therefore, include the "skip L1D flush on
    vmentry" bit in KVM's suggested ARCH_CAPABILITIES setting.
    
    Add the relevant Documentation.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 383f160027af7f3e3c32c2988980652e708a2119
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Sun Aug 5 16:07:46 2018 +0200

    x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
    
    commit 8e0b2b916662e09dd4d09e5271cdf214c6b80e62 upstream
    
    Bit 3 of ARCH_CAPABILITIES tells a hypervisor that L1D flush on vmentry is
    not needed.  Add a new value to enum vmx_l1d_flush_state, which is used
    either if there is no L1TF bug at all, or if bit 3 is set in ARCH_CAPABILITIES.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
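
    [ Editorial sketch, not part of the patch above: the bit 3 check, using the
      constant names from msr-index.h; simplified from the actual setup code. ]

        u64 ia32_cap = 0;

        if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
                rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);

        if (ia32_cap & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH)
                l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;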

commit ee782edd87b482e66cc283cc23d1e984792874e8
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Sun Aug 5 16:07:45 2018 +0200

    x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
    
    commit ea156d192f5257a5bf393d33910d3b481bf8a401 upstream
    
    Three changes to the content of the sysfs file:
    
     - If EPT is disabled, L1TF cannot be exploited even across threads on the
       same core, and SMT is irrelevant.
    
     - If mitigation is completely disabled, and SMT is enabled, print "vulnerable"
       instead of "vulnerable, SMT vulnerable"
    
     - Reorder the two parts so that the main vulnerability state comes first
       and the detail on SMT is second.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ce2c755166f9503b1671bd2822d04939afce5b34
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Mon Jun 25 14:04:37 2018 +0200

    KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR
    
    commit cd28325249a1ca0d771557ce823e0308ad629f98 upstream
    
    This lets userspace read the MSR_IA32_ARCH_CAPABILITIES and check that all
    requested features are available on the host.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7a1eac80b5127b20abfcaaf92062c236078f812a
Author: Wanpeng Li <wanpengli@tencent.com>
Date:   Wed Feb 28 14:03:31 2018 +0800

    KVM: X86: Allow userspace to define the microcode version
    
    commit 518e7b94817abed94becfe6a44f1ece0d4745afe upstream
    
    Linux (among others) has checks to make sure that certain features
    aren't enabled on a certain family/model/stepping if the microcode version
    isn't greater than or equal to a known good version.
    
    By exposing the real microcode version, we're preventing buggy guests that
    don't check that they are running virtualized (i.e., they should trust the
    hypervisor) from disabling features that are effectively not buggy.
    
    Suggested-by: Filippo Sironi <sironi@amazon.de>
    Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Liran Alon <liran.alon@oracle.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Tom Lendacky <thomas.lendacky@amd.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 8a01dd38e5e1b06f9be73eb5eb80b267a236f29d
Author: Wanpeng Li <wanpengli@tencent.com>
Date:   Wed Feb 28 14:03:30 2018 +0800

    KVM: X86: Introduce kvm_get_msr_feature()
    
    commit 66421c1ec340096b291af763ed5721314cdd9c5c upstream
    
    Introduce kvm_get_msr_feature() to handle the msrs which are supported
    by different vendors and sharing the same emulation logic.
    
    Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Liran Alon <liran.alon@oracle.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Tom Lendacky <thomas.lendacky@amd.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1a155ef3c958b4916594eca132472c9af1c642f7
Author: Tom Lendacky <thomas.lendacky@amd.com>
Date:   Sat Feb 24 00:18:20 2018 +0100

    KVM: SVM: Add MSR-based feature support for serializing LFENCE
    
    commit d1d93fa90f1afa926cb060b7f78ab01a65705b4d upstream
    
    In order to determine if LFENCE is a serializing instruction on AMD
    processors, MSR 0xc0011029 (MSR_F10H_DECFG) must be read and the state
    of bit 1 checked.  This patch will add support to allow a guest to
    properly make this determination.
    
    Add the MSR feature callback operation to svm.c and add MSR 0xc0011029
    to the list of MSR-based features.  If LFENCE is serializing, then the
    feature is supported, allowing the hypervisor to set the value of the
    MSR that guest will see.  Support is also added to write (hypervisor only)
    and read the MSR value for the guest.  A write by the guest will result in
    a #GP.  A read by the guest will return the value as set by the host.  In
    this way, the support to expose the feature to the guest is controlled by
    the hypervisor.
    
    Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
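
    [ Editorial sketch, not part of the patch above: the bit 1 test described
      in the changelog; the helper name is made up for illustration. ]

        static bool lfence_is_serializing(void)
        {
                u64 decfg;

                /* MSR 0xc0011029 is MSR_F10H_DECFG */
                rdmsrl(MSR_F10H_DECFG, decfg);
                return decfg & MSR_F10H_DECFG_LFENCE_SERIALIZE;
        }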

commit 62d88fc0fb6bc888d30a5bd074afd5a0ae59a1af
Author: Tom Lendacky <thomas.lendacky@amd.com>
Date:   Wed Feb 21 13:39:51 2018 -0600

    KVM: x86: Add a framework for supporting MSR-based features
    
    commit 801e459a6f3a63af9d447e6249088c76ae16efc4 upstream
    
    Provide a new KVM capability that allows bits within MSRs to be recognized
    as features.  Two new ioctls are added to the /dev/kvm ioctl routine to
    retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops
    callback is used to determine support for the listed MSR-based features.
    
    Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    [Tweaked documentation. - Radim]
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit d9f378f64c0ae3d76c1828742557c6c0ccc9e977
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sun Aug 5 17:06:12 2018 +0200

    Documentation/l1tf: Remove Yonah processors from not vulnerable list
    
    commit 58331136136935c631c2b5f06daf4c3006416e91 upstream
    
    Dave reported that it's not confirmed that Yonah processors are
    unaffected. Remove them from the list.
    
    Reported-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 77a83b3a622a0fbdcb7c0d81c853649dbc0eb7a2
Author: Nicolai Stange <nstange@suse.de>
Date:   Sun Jul 22 13:38:18 2018 +0200

    x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
    
    commit 18b57ce2eb8c8b9a24174a89250cf5f57c76ecdc upstream
    
    For VMEXITs caused by external interrupts, vmx_handle_external_intr()
    indirectly calls into the interrupt handlers through the host's IDT.
    
    It follows that these interrupts get accounted for in the
    kvm_cpu_l1tf_flush_l1d per-cpu flag.
    
    The subsequently executed vmx_l1d_flush() will thus be aware that some
    interrupts have happened and conduct a L1d flush anyway.
    
    Setting l1tf_flush_l1d from vmx_handle_external_intr() isn't needed
    anymore. Drop it.
    
    Signed-off-by: Nicolai Stange <nstange@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 2c5a3a05474011cb84a1b6c45543d56c324cadca
Author: Nicolai Stange <nstange@suse.de>
Date:   Sun Jul 29 13:06:04 2018 +0200

    x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
    
    commit ffcba43ff66c7dab34ec700debd491d2a4d319b4 upstream
    
    The last missing piece to having vmx_l1d_flush() take interrupts after
    VMEXIT into account is to set the kvm_cpu_l1tf_flush_l1d per-cpu flag on
    irq entry.
    
    Issue calls to kvm_set_cpu_l1tf_flush_l1d() from entering_irq(),
    ipi_entering_ack_irq(), smp_reschedule_interrupt() and
    uv_bau_message_interrupt().
    
    Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Nicolai Stange <nstange@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
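
    [ Editorial sketch, not part of the patch above: one of the touched entry
      helpers in asm/apic.h, after the change. ]

        static inline void entering_irq(void)
        {
                irq_enter();
                kvm_set_cpu_l1tf_flush_l1d();
        }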

commit 8574df1a8741f6cce1f2fbdd921b07adeec8d932
Author: Nicolai Stange <nstange@suse.de>
Date:   Sun Jul 29 12:15:33 2018 +0200

    x86: Don't include linux/irq.h from asm/hardirq.h
    
    commit 447ae316670230d7d29430e2cbf1f5db4f49d14c upstream
    
    The next patch in this series will have to make the definition of
    irq_cpustat_t available to entering_irq().
    
    Inclusion of asm/hardirq.h into asm/apic.h would cause circular header
    dependencies like
    
      asm/smp.h
        asm/apic.h
          asm/hardirq.h
            linux/irq.h
              linux/topology.h
                linux/smp.h
                  asm/smp.h
    
    or
    
      linux/gfp.h
        linux/mmzone.h
          asm/mmzone.h
            asm/mmzone_64.h
              asm/smp.h
                asm/apic.h
                  asm/hardirq.h
                    linux/irq.h
                      linux/irqdesc.h
                        linux/kobject.h
                          linux/sysfs.h
                            linux/kernfs.h
                              linux/idr.h
                                linux/gfp.h
    
    and others.
    
    This causes compilation errors because of the header guards becoming
    effective in the second inclusion: symbols/macros that had been defined
    before wouldn't be available to intermediate headers in the #include chain
    anymore.
    
    A possible workaround would be to move the definition of irq_cpustat_t
    into its own header and include that from both, asm/hardirq.h and
    asm/apic.h.
    
    However, this wouldn't solve the real problem, namely asm/hardirq.h
    unnecessarily pulling in all the linux/irq.h cruft: nothing in
    asm/hardirq.h itself requires it. Also, note that there are some other
    archs, like e.g. arm64, which don't have that #include in their
    asm/hardirq.h.
    
    Remove the linux/irq.h #include from x86' asm/hardirq.h.
    
    Fix resulting compilation errors by adding appropriate #includes to *.c
    files as needed.
    
    Note that some of these *.c files could be cleaned up a bit wrt. to their
    set of #includes, but that should better be done from separate patches, if
    at all.
    
    Signed-off-by: Nicolai Stange <nstange@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    [dwmw2: More fixes for EFI and Xen in 4.9]
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e371c92e168df9c0713bd4085fbb8501d88b297a
Author: Nicolai Stange <nstange@suse.de>
Date:   Fri Jul 27 13:22:16 2018 +0200

    x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
    
    commit 45b575c00d8e72d69d75dd8c112f044b7b01b069 upstream
    
    Part of the L1TF mitigation for vmx includes flushing the L1D cache upon
    VMENTRY.
    
    L1D flushes are costly and two modes of operations are provided to users:
    "always" and the more selective "conditional" mode.
    
    If operating in the latter, the cache would get flushed only if a host side
    code path considered unconfined had been traversed. "Unconfined" in this
    context means that it might have pulled in sensitive data like user data
    or kernel crypto keys.
    
    The need for L1D flushes is tracked by means of the per-vcpu flag
    l1tf_flush_l1d. KVM exit handlers considered unconfined set it. A
    vmx_l1d_flush() subsequently invoked before the next VMENTER will conduct a
    L1d flush based on its value and reset that flag again.
    
    Currently, interrupts delivered "normally" while in root operation between
    VMEXIT and VMENTER are not taken into account. Part of the reason is that
    these don't leave any traces and thus, the vmx code is unable to tell if
    any such has happened.
    
    As proposed by Paolo Bonzini, prepare for tracking all interrupts by
    introducing a new per-cpu flag, "kvm_cpu_l1tf_flush_l1d". It will be in
    strong analogy to the per-vcpu ->l1tf_flush_l1d.
    
    A later patch will make interrupt handlers set it.
    
    For the sake of cache locality, group kvm_cpu_l1tf_flush_l1d into x86'
    per-cpu irq_cpustat_t as suggested by Peter Zijlstra.
    
    Provide the helpers kvm_set_cpu_l1tf_flush_l1d(),
    kvm_clear_cpu_l1tf_flush_l1d() and kvm_get_cpu_l1tf_flush_l1d(). Make them
    trivial or non-existent for !CONFIG_KVM_INTEL, as appropriate.
    
    Let vmx_l1d_flush() handle kvm_cpu_l1tf_flush_l1d in the same way as
    l1tf_flush_l1d.
    
    Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
    Suggested-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Nicolai Stange <nstange@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
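
    [ Editorial sketch, not part of the patch above: the three helpers named in
      the changelog, simplified; the !CONFIG_KVM_INTEL stubs are omitted. ]

        static inline void kvm_set_cpu_l1tf_flush_l1d(void)
        {
                __this_cpu_write(irq_stat.kvm_cpu_l1tf_flush_l1d, 1);
        }

        static inline void kvm_clear_cpu_l1tf_flush_l1d(void)
        {
                __this_cpu_write(irq_stat.kvm_cpu_l1tf_flush_l1d, 0);
        }

        static inline bool kvm_get_cpu_l1tf_flush_l1d(void)
        {
                return __this_cpu_read(irq_stat.kvm_cpu_l1tf_flush_l1d);
        }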

commit 5766dc12985ca8fdba999d7b5d35035f252b27cf
Author: Nicolai Stange <nstange@suse.de>
Date:   Fri Jul 27 12:46:29 2018 +0200

    x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
    
    commit 9aee5f8a7e30330d0a8f4c626dc924ca5590aba5 upstream
    
    An upcoming patch will extend KVM's L1TF mitigation in conditional mode
    to also cover interrupts after VMEXITs. For tracking those, stores to a
    new per-cpu flag from interrupt handlers will become necessary.
    
    In order to improve cache locality, this new flag will be added to x86's
    irq_cpustat_t.
    
    Make some space available there by shrinking the ->softirq_pending bitfield
    from 32 to 16 bits: the number of bits actually used is only NR_SOFTIRQS,
    i.e. 10.
    
    Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Nicolai Stange <nstange@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
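
    [ Editorial sketch, not part of the patch above: abridged view of the x86
      irq_cpustat_t after the demotion; fields other than the first are
      unchanged and mostly elided. ]

        typedef struct {
                u16 __softirq_pending;  /* was unsigned int; only NR_SOFTIRQS == 10 bits used */
                /* 16 bits freed here for the upcoming kvm_cpu_l1tf_flush_l1d flag */
                unsigned int __nmi_count;
                /* remaining counters unchanged */
        } ____cacheline_aligned irq_cpustat_t;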

commit 90bc306b76b8923e365b8e59ddc6968441594970
Author: Nicolai Stange <nstange@suse.de>
Date:   Sat Jul 21 22:35:28 2018 +0200

    x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
    
    commit 5b6ccc6c3b1a477fbac9ec97a0b4c1c48e765209 upstream
    
    Currently, vmx_vcpu_run() checks if l1tf_flush_l1d is set and invokes
    vmx_l1d_flush() if so.
    
    This test is unnecessary for the "always flush L1D" mode.
    
    Move the check to vmx_l1d_flush()'s conditional mode code path.
    
    Notes:
    - vmx_l1d_flush() is likely to get inlined anyway and thus, there's no
      extra function call.
    
    - This inverts the (static) branch prediction, but there hadn't been any
      explicit likely()/unlikely() annotations before and so it stays as is.
    
    Signed-off-by: Nicolai Stange <nstange@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
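
    [ Editorial sketch, not part of the patch above: resulting flow in
      vmx_l1d_flush(), heavily simplified; the actual flush and the later
      per-cpu flag handling are elided. ]

        static void vmx_l1d_flush(struct kvm_vcpu *vcpu)
        {
                if (static_branch_likely(&vmx_l1d_flush_cond)) {
                        bool flush_l1d = vcpu->arch.l1tf_flush_l1d;

                        vcpu->arch.l1tf_flush_l1d = false;
                        if (!flush_l1d)
                                return;
                }

                /* MSR write or software fill loop follows */
        }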

commit 936f566260c2ae883e41301edafa1afb2ba11241
Author: Nicolai Stange <nstange@suse.de>
Date:   Sat Jul 21 22:25:00 2018 +0200

    x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
    
    commit 427362a142441f08051369db6fbe7f61c73b3dca upstream
    
    The vmx_l1d_flush_always static key is only ever evaluated if
    vmx_l1d_should_flush is enabled. In that case however, there are only two
    L1d flushing modes possible: "always" and "conditional".
    
    The "conditional" mode's implementation tends to require more sophisticated
    logic than the "always" mode.
    
    Avoid inverted logic by replacing the 'vmx_l1d_flush_always' static key
    with a 'vmx_l1d_flush_cond' one.
    
    There is no change in functionality.
    
    Signed-off-by: Nicolai Stange <nstange@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 698ac1bc17c413fd340c243d64fb15cbaadf7178
Author: Nicolai Stange <nstange@suse.de>
Date:   Sat Jul 21 22:16:56 2018 +0200

    x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
    
    commit 379fd0c7e6a391e5565336a646f19f218fb98c6c upstream
    
    vmx_l1d_flush() gets invoked only if l1tf_flush_l1d is true. There's no
    point in setting l1tf_flush_l1d to true from there again.
    
    Signed-off-by: Nicolai Stange <nstange@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 8b1969db5567d49dd32c1f93fa9d7295a2c238a0
Author: Josh Poimboeuf <jpoimboe@redhat.com>
Date:   Wed Jul 25 12:00:27 2018 +0200

    cpu/hotplug: detect SMT disabled by BIOS
    
    commit 73d5e2b472640b1fcdb61ae8be389912ef211bda upstream
    
    If SMT is disabled in BIOS, the CPU code doesn't properly detect it.
    The /sys/devices/system/cpu/smt/control file shows 'on', and the 'l1tf'
    vulnerabilities file shows SMT as vulnerable.
    
    Fix it by forcing 'cpu_smt_control' to CPU_SMT_NOT_SUPPORTED in such a
    case.  Unfortunately the detection can only be done after bringing all
    the CPUs online, so we have to overwrite any previous writes to the
    variable.
    
    Reported-by: Joe Mario <jmario@redhat.com>
    Tested-by: Jiri Kosina <jkosina@suse.cz>
    Fixes: f048c399e0f7 ("x86/topology: Provide topology_smt_supported()")
    Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 03b3614d4d6febe96117b9e5edc4941a8265e844
Author: Tony Luck <tony.luck@intel.com>
Date:   Thu Jul 19 13:49:58 2018 -0700

    Documentation/l1tf: Fix typos
    
    commit 1949f9f49792d65dba2090edddbe36a5f02e3ba3 upstream
    
    Fix spelling and other typos
    
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 587d499c8bd203f6158779b5782a07fe7a5bcea8
Author: Nicolai Stange <nstange@suse.de>
Date:   Wed Jul 18 19:07:38 2018 +0200

    x86/KVM/VMX: Initialize the vmx_l1d_flush_pages' content
    
    commit 288d152c23dcf3c09da46c5c481903ca10ebfef7 upstream
    
    The slow path in vmx_l1d_flush() reads from vmx_l1d_flush_pages in order
    to evict the L1d cache.
    
    However, these pages are never cleared and, in theory, their data could be
    leaked.
    
    More importantly, KSM could merge a nested hypervisor's vmx_l1d_flush_pages
    to fewer than 1 << L1D_CACHE_ORDER host physical pages and this would break
    the L1d flushing algorithm: L1D on x86_64 is tagged by physical addresses.
    
    Fix this by initializing the individual vmx_l1d_flush_pages with a
    different pattern each.
    
    Rename the "empty_zp" asm constraint identifier in vmx_l1d_flush() to
    "flush_pages" to reflect this change.
    
    Fixes: a47dd5f06714 ("x86/KVM/VMX: Add L1D flush algorithm")
    Signed-off-by: Nicolai Stange <nstange@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 93aed2469df1fdef8ed97d6cbb6dd042181fe46e
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Jul 13 16:23:26 2018 +0200

    Documentation: Add section about CPU vulnerabilities
    
    commit 3ec8ce5d866ec6a08a9cfab82b62acf4a830b35f upstream
    
    Add documentation for the L1TF vulnerability and the mitigation mechanisms:
    
      - Explain the problem and risks
      - Document the mitigation mechanisms
      - Document the command line controls
      - Document the sysfs files
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lkml.kernel.org/r/20180713142323.287429944@linutronix.de
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 2decbf5264ea6175c6fca28ba2b5c0c683facf27
Author: Jiri Kosina <jkosina@suse.cz>
Date:   Fri Jul 13 16:23:25 2018 +0200

    x86/bugs, kvm: Introduce boot-time control of L1TF mitigations
    
    commit d90a7a0ec83fb86622cd7dae23255d3c50a99ec8 upstream
    
    Introduce the 'l1tf=' kernel command line option to allow for boot-time
    switching of mitigation that is used on processors affected by L1TF.
    
    The possible values are:
    
      full
            Provides all available mitigations for the L1TF vulnerability. Disables
            SMT and enables all mitigations in the hypervisors. SMT control via
            /sys/devices/system/cpu/smt/control is still possible after boot.
            Hypervisors will issue a warning when the first VM is started in
            a potentially insecure configuration, i.e. SMT enabled or L1D flush
            disabled.
    
      full,force
            Same as 'full', but disables SMT control. Implies the 'nosmt=force'
            command line option. sysfs control of SMT and the hypervisor flush
            control is disabled.
    
      flush
            Leaves SMT enabled and enables the conditional hypervisor mitigation.
            Hypervisors will issue a warning when the first VM is started in a
            potentially insecure configuration, i.e. SMT enabled or L1D flush
            disabled.
    
      flush,nosmt
            Disables SMT and enables the conditional hypervisor mitigation. SMT
            control via /sys/devices/system/cpu/smt/control is still possible
            after boot. If SMT is reenabled or flushing disabled at runtime
            hypervisors will issue a warning.
    
      flush,nowarn
            Same as 'flush', but hypervisors will not warn when
            a VM is started in a potentially insecure configuration.
    
      off
            Disables hypervisor mitigations and doesn't emit any warnings.
    
    Default is 'flush'.
    
    Let KVM adhere to these semantics, which means:
    
      - 'l1tf=full,force'   : Perform L1D flushes. No runtime control
                              possible.
    
      - 'l1tf=full'
      - 'l1tf=flush'
      - 'l1tf=flush,nosmt'  : Perform L1D flushes and warn on VM start if
                              SMT has been runtime enabled or L1D flushing
                              has been run-time enabled
    
      - 'l1tf=flush,nowarn' : Perform L1D flushes and no warnings are emitted.
    
      - 'l1tf=off'          : L1D flushes are not performed and no warnings
                              are emitted.
    
    KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush'
    module parameter except when l1tf=full,force is set.
    
    This makes KVM's private 'nosmt' option redundant, and as it is a bit
    non-systematic anyway (this is something to control globally, not on
    hypervisor level), remove that option.
    
    Add the missing Documentation entry for the l1tf vulnerability sysfs file
    while at it.
    
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Jiri Kosina <jkosina@suse.cz>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 929d3b2e9b130f238a8eb206bdc3f063ca68438f
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Jul 13 16:23:24 2018 +0200

    cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early
    
    commit fee0aede6f4739c87179eca76136f83210953b86 upstream
    
    The CPU_SMT_NOT_SUPPORTED state is set (if the processor does not support
    SMT) when the sysfs SMT control file is initialized.
    
    That was fine so far as this was only required to make the output of the
    control file correct and to prevent writes in that case.
    
    With the upcoming l1tf command line parameter, this needs to be set up
    before the L1TF mitigation selection and command line parsing happens.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Jiri Kosina <jkosina@suse.cz>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Link: https://lkml.kernel.org/r/20180713142323.121795971@linutronix.de
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a69c5e0706dc6783e11830bccafe34c0b7f0a979
Author: Jiri Kosina <jkosina@suse.cz>
Date:   Fri Jul 13 16:23:23 2018 +0200

    cpu/hotplug: Expose SMT control init function
    
    commit 8e1b706b6e819bed215c0db16345568864660393 upstream
    
    The L1TF mitigation will gain a command line parameter which allows setting
    a combination of hypervisor mitigation and SMT control.
    
    Expose cpu_smt_disable() so the command line parser can tweak SMT settings.
    
    [ tglx: Split out of larger patch and made it preserve an already existing
            force off state ]
    
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Jiri Kosina <jkosina@suse.cz>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Link: https://lkml.kernel.org/r/20180713142323.039715135@linutronix.de
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4797c2f3791e58d21e82bf0948483ae9b639286b
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Jul 13 16:23:22 2018 +0200

    x86/kvm: Allow runtime control of L1D flush
    
    commit 895ae47f9918833c3a880fbccd41e0692b37e7d9 upstream
    
    All mitigation modes can be switched at run time with a static key now:
    
     - Use sysfs_streq() instead of strcmp() to handle the trailing new line
       from sysfs writes correctly.
     - Make the static key management handle multiple invocations properly.
     - Set the module parameter file to RW
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Jiri Kosina <jkosina@suse.cz>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Link: https://lkml.kernel.org/r/20180713142322.954525119@linutronix.de
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 6ccf633238db85cbadd3fa0830eab88fe949dd67
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Jul 13 16:23:21 2018 +0200

    x86/kvm: Serialize L1D flush parameter setter
    
    commit dd4bfa739a72508b75760b393d129ed7b431daab upstream
    
    Writes to the parameter files are not serialized at the sysfs core
    level, so local serialization is required.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Jiri Kosina <jkosina@suse.cz>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Link: https://lkml.kernel.org/r/20180713142322.873642605@linutronix.de
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit dff0982c5719eaedff58c026be9871ea63af992c
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Jul 13 16:23:20 2018 +0200

    x86/kvm: Add static key for flush always
    
    commit 4c6523ec59fe895ea352a650218a6be0653910b1 upstream
    
    Avoid the conditional in the L1D flush control path.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Jiri Kosina <jkosina@suse.cz>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Link: https://lkml.kernel.org/r/20180713142322.790914912@linutronix.de
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 641a211704f630a3cc0c9ad1a7d922baf9432f11
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Jul 13 16:23:19 2018 +0200

    x86/kvm: Move l1tf setup function
    
    commit 7db92e165ac814487264632ab2624e832f20ae38 upstream
    
    In preparation of allowing run time control for L1D flushing, move the
    setup code to the module parameter handler.
    
    In case of pre module init parsing, just store the value and let vmx_init()
    do the actual setup after running kvm_init() so that enable_ept has
    the correct state.
    
    During run-time invoke it directly from the parameter setter to prepare for
    run-time control.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Jiri Kosina <jkosina@suse.cz>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Link: https://lkml.kernel.org/r/20180713142322.694063239@linutronix.de
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4186ae815556590798de371e0d6ed85fc3682534
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Jul 13 16:23:18 2018 +0200

    x86/l1tf: Handle EPT disabled state proper
    
    commit a7b9020b06ec6d7c3f3b0d4ef1a9eba12654f4f7 upstream
    
    If Extended Page Tables (EPT) are disabled or not supported, no L1D
    flushing is required. The setup function can just avoid setting up the L1D
    flush for the EPT=n case.
    
    Invoke it after the hardware setup has been done and enable_ept has the
    correct state and expose the EPT disabled state in the mitigation status as
    well.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Jiri Kosina <jkosina@suse.cz>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Link: https://lkml.kernel.org/r/20180713142322.612160168@linutronix.de
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 31282cf43b9d4fd950d8879af081771c0ff04f5f
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Jul 13 16:23:17 2018 +0200

    x86/kvm: Drop L1TF MSR list approach
    
    commit 2f055947ae5e2741fb2dc5bba1033c417ccf4faa upstream
    
    The VMX module parameter to control the L1D flush should become
    writeable.
    
    The MSR list is set up at VM init per guest VCPU, but the run time
    switching is based on a static key which is global. Toggling the MSR list
    at run time might be feasible, but for now drop this optimization and use
    the regular MSR write to make run-time switching possible.
    
    The default mitigation is the conditional flush anyway, so for extra
    paranoid setups this will add some small overhead, but the extra code
    executed is in the noise compared to the flush itself.
    
    Aside from that, the EPT disabled case is not handled correctly at the moment
    and the MSR list magic is in the way for fixing that as well.
    
    If it's really providing a significant advantage, then this needs to be
    revisited after the code is correct and the control is writable.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Jiri Kosina <jkosina@suse.cz>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Link: https://lkml.kernel.org/r/20180713142322.516940445@linutronix.de
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 80e55b5ea4e9dbc049594bf357b1a9b0347bb584
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Jul 13 16:23:16 2018 +0200

    x86/litf: Introduce vmx status variable
    
    commit 72c6d2db64fa18c996ece8f06e499509e6c9a37e upstream
    
    Store the effective mitigation of VMX in a status variable and use it to
    report the VMX state in the l1tf sysfs file.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Jiri Kosina <jkosina@suse.cz>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Link: https://lkml.kernel.org/r/20180713142322.433098358@linutronix.de
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e7cda2ffe1279bcf63f1dd8bbc3c7b818a9ba457
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sat Jul 7 11:40:18 2018 +0200

    cpu/hotplug: Online siblings when SMT control is turned on
    
    commit 215af5499d9e2b55f111d2431ea20218115f29b3 upstream
    
    Writing 'off' to /sys/devices/system/cpu/smt/control offlines all SMT
    siblings. Writing 'on' merely enables the ability to online them, but does
    not online them automatically.
    
    Make 'on' more useful by onlining all offline siblings.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a8c14676a93da6b3ef6610b37ef84e7d89f9f3a2
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Thu Jun 28 17:10:36 2018 -0400

    x86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required
    
    commit 390d975e0c4e60ce70d4157e0dd91ede37824603 upstream
    
    If the L1D flush module parameter is set to 'always' and the IA32_FLUSH_CMD
    MSR is available, optimize the VMENTER code with the MSR save list.
    
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c45ff817e91bef4cbb36944b0c723a42c4c920d2
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Wed Jun 20 22:01:22 2018 -0400

    x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs
    
    commit 989e3992d2eca32c3f1404f2bc91acda3aa122d8 upstream
    
    The IA32_FLUSH_CMD MSR needs only to be written on VMENTER. Extend
    add_atomic_switch_msr() with an entry_only parameter to allow storing the
    MSR only in the guest (ENTRY) MSR array.
    
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 5d3eaa2d3935e9a5be2dd7186962e72e475d5966
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Wed Jun 20 22:00:47 2018 -0400

    x86/KVM/VMX: Separate the VMX AUTOLOAD guest/host number accounting
    
    commit 3190709335dd31fe1aeeebfe4ffb6c7624ef971f upstream
    
    This allows loading a different number of MSRs depending on the context:
    VMEXIT or VMENTER.
    
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1555f9e8ed973df3e4a5aecc37cdb6d48469d366
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Wed Jun 20 20:11:39 2018 -0400

    x86/KVM/VMX: Add find_msr() helper function
    
    commit ca83b4a7f2d068da79a029d323024aa45decb250 upstream
    
    .. to help find the MSR on either the guest or host MSR list.
    
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 57e3ada3e552dcd2de7e22acfb6eac2000f98868
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Wed Jun 20 13:58:37 2018 -0400

    x86/KVM/VMX: Split the VMX MSR LOAD structures to have an host/guest numbers
    
    commit 33966dd6b2d2c352fae55412db2ea8cfff5df13a upstream
    
    There is no semantic change, but this change allows an unbalanced number of
    MSRs to be loaded on VMEXIT and VMENTER, i.e. the number of MSRs to save or
    restore on VMEXIT or VMENTER may be different.
    
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 69c2525237979d595bf0db29b84cc79b222e25c7
Author: Jim Mattson <jmattson@google.com>
Date:   Tue Oct 4 10:48:38 2016 -0700

    kvm: nVMX: Update MSR load counts on a VMCS switch
    
    Commit 83bafef1a131d1b8743d63658a180948bc880a74 upstream
    
    When L0 establishes (or removes) an MSR entry in the VM-entry or VM-exit
    MSR load lists, the change should affect the dormant VMCS as well as the
    current VMCS. Moreover, the vmcs02 MSR-load addresses should be
    initialized.
    
    [ dwmw2: Pulled in to 4.9 backports for L1TF ]
    
    Signed-off-by: Jim Mattson <jmattson@google.com>
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b3dc63c4f43e57d73d769ad0d3f34eae74cb68a8
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Mon Jul 2 13:07:14 2018 +0200

    x86/KVM/VMX: Add L1D flush logic
    
    commit c595ceee45707f00f64f61c54fb64ef0cc0b4e85 upstream
    
    Add the logic for flushing L1D on VMENTER. The flush depends on the static
    key being enabled and the new l1tf_flush_l1d flag being set.
    
    The flag is set:
     - Always, if the flush module parameter is 'always'
    
     - Conditionally at:
       - Entry to vcpu_run(), i.e. after executing user space
    
       - From the sched_in notifier, i.e. when switching to a vCPU thread.
    
       - From vmexit handlers which are considered unsafe, i.e. where
         sensitive data can be brought into L1D:
    
         - The emulator, which could be a good target for other speculative
           execution-based threats,
    
         - The MMU, which can bring host page tables in the L1 cache.
    
         - External interrupts
    
         - Nested operations that require the MMU (see above). That is
           vmptrld, vmptrst, vmclear, vmwrite, vmread.
    
         - When handling invept, invvpid
    
    [ tglx: Split out from combo patch and reduced to a single flag ]
    [ dwmw2: Backported to 4.9, set l1tf_flush_l1d in svm/vmx code ]
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit acca8a70a5f6179007e1148a62b8bef12b212d9b
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Mon Jul 2 13:03:48 2018 +0200

    x86/KVM/VMX: Add L1D MSR based flush
    
    commit 3fa045be4c720146b18a19cea7a767dc6ad5df94 upstream
    
    336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
    (IA32_FLUSH_CMD aka 0x10B) which has similar write-only semantics to other
    MSRs defined in the document.
    
    The semantics of this MSR is to allow "finer granularity invalidation of
    caching structures than existing mechanisms like WBINVD. It will writeback
    and invalidate the L1 data cache, including all cachelines brought in by
    preceding instructions, without invalidating all caches (eg. L2 or
    LLC). Some processors may also invalidate the first level instruction
    cache on a L1D_FLUSH command. The L1 data and instruction caches may be
    shared across the logical processors of a core."
    
    Use it instead of the loop based L1 flush algorithm.
    
    A copy of this document is available at
       https://bugzilla.kernel.org/show_bug.cgi?id=199511
    
    [ tglx: Avoid allocating pages when the MSR is available ]
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
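
    With the MSR available, the whole flush reduces to a single write, roughly
    as in the upstream change (a fragment, kernel context assumed):

        if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
                wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH); /* bit 0 of MSR 0x10B */
                return;
        }
        /* otherwise fall back to the software flush loop */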

commit b3d648aefab5265a566d6616de0e3a6b0aa2334b
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Mon Jul 2 12:47:38 2018 +0200

    x86/KVM/VMX: Add L1D flush algorithm
    
    commit a47dd5f06714c844b33f3b5f517b6f3e81ce57b5 upstream
    
    To mitigate the L1 Terminal Fault vulnerability it's required to flush L1D
    on VMENTER to prevent rogue guests from snooping host memory.
    
    CPUs will have a new control MSR via a microcode update to flush L1D with a
    single MSR write, but in the absence of microcode a fallback to a software
    based flush algorithm is required.
    
    Add a software flush loop which is based on code from Intel.
    
    [ tglx: Split out from combo patch ]
    [ bpetkov: Polish the asm code ]
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
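
    A user-space sketch of the idea behind the fallback (not the kernel's asm;
    buffer size and strides are illustrative):

        #include <stddef.h>

        #define FLUSH_BUF_SIZE  (64 * 1024)     /* comfortably larger than L1D */
        #define CACHELINE       64

        static void software_l1d_flush_sketch(volatile unsigned char *buf)
        {
                size_t i;

                /* First pass: make sure all pages of the buffer are mapped. */
                for (i = 0; i < FLUSH_BUF_SIZE; i += 4096)
                        buf[i] = 0;

                /* Second pass: read every cache line, displacing whatever the
                 * L1D held before. */
                for (i = 0; i < FLUSH_BUF_SIZE; i += CACHELINE)
                        (void)buf[i];
        }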

commit af6ce92977a25540e5d6e0cf90ca187178b0ff9f
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Mon Jul 2 12:29:30 2018 +0200

    x86/KVM/VMX: Add module argument for L1TF mitigation
    
    commit a399477e52c17e148746d3ce9a483f681c2aa9a0 upstream
    
    Add a mitigation mode parameter "vmentry_l1d_flush" for CVE-2018-3620, aka
    L1 terminal fault. The valid arguments are:
    
     - "always"     L1D cache flush on every VMENTER.
     - "cond"       Conditional L1D cache flush, explained below
     - "never"      Disable the L1D cache flush mitigation
    
    "cond" is trying to avoid L1D cache flushes on VMENTER if the code executed
    between VMEXIT and VMENTER is considered safe, i.e. is not bringing any
    interesting information into L1D which might be exploited.
    
    [ tglx: Split out from a larger patch ]
    
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
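
    For example, the conditional mode can be requested on the kernel command
    line with kvm-intel.vmentry_l1d_flush=cond, or, when kvm_intel is built as
    a module, by loading it with: modprobe kvm_intel vmentry_l1d_flush=cond.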

commit a0695af3406ae2a08184bd47a9e948fe6f9858b9
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Wed Jun 20 11:29:53 2018 -0400

    x86/KVM: Warn user if KVM is loaded with SMT enabled and the L1TF CPU bug present
    
    commit 26acfb666a473d960f0fd971fe68f3e3ad16c70b upstream
    
    If the L1TF CPU bug is present we allow the KVM module to be loaded, as the
    majority of users that use Linux and KVM have trusted guests and do not want
    a broken setup.
    
    Cloud vendors are the ones that are uncomfortable with CVE 2018-3620 and as
    such they are the ones that should set nosmt to one.
    
    Setting 'nosmt' means that the system administrator also needs to disable
    SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line
    parameter, or via the /sys/devices/system/cpu/smt/control. See commit
    05736e4ac13c ("cpu/hotplug: Provide knobs to control SMT").
    
    Other mitigations are to use task affinity, cpu sets, interrupt binding,
    etc - anything to make sure that _only_ the same guests vCPUs are running
    on sibling threads.
    
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 8438e49bcac479213ada6a29595adfd2e3d99460
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Jun 29 16:05:48 2018 +0200

    cpu/hotplug: Boot HT siblings at least once
    
    commit 0cc3cd21657be04cb0559fe8063f2130493f92cf upstream
    
    Due to the way Machine Check Exceptions work on X86 hyperthreads it's
    required to boot up _all_ logical cores at least once in order to set the
    CR4.MCE bit.
    
    So instead of ignoring the sibling threads right away, let them boot up
    once so they can configure themselves. After they come out of the initial
    boot stage, check whether it is a "secondary" sibling and cancel the
    operation, which puts the CPU back into the offline state.
    
    [dwmw2: Backport to 4.9]
    
    Reported-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit fe2a955476f9d9a00d09840b5642d963893abebb
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Jun 29 16:05:47 2018 +0200

    Revert "x86/apic: Ignore secondary threads if nosmt=force"
    
    commit 506a66f374891ff08e064a058c446b336c5ac760 upstream
    
    Dave Hansen reported, that it's outright dangerous to keep SMT siblings
    disabled completely so they are stuck in the BIOS and wait for SIPI.
    
    The reason is that Machine Check Exceptions are broadcasted to siblings and
    the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a
    logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
    reboots the machine. The MCE chapter in the SDM contains the following
    blurb:
    
        Because the logical processors within a physical package are tightly
        coupled with respect to shared hardware resources, both logical
        processors are notified of machine check errors that occur within a
        given physical processor. If machine-check exceptions are enabled when
        a fatal error is reported, all the logical processors within a physical
        package are dispatched to the machine-check exception handler. If
        machine-check exceptions are disabled, the logical processors enter the
        shutdown state and assert the IERR# signal. When enabling machine-check
        exceptions, the MCE flag in control register CR4 should be set for each
        logical processor.
    
    Reverting the commit which ignores siblings at enumeration time solves only
    half of the problem. The core cpuhotplug logic needs to be adjusted as
    well.
    
    This thoughtfully engineered mechanism also turns the boot process on all
    Intel HT enabled systems into an MCE lottery. MCE is enabled on the boot CPU
    before the secondary CPUs are brought up. Depending on the number of
    physical cores the window in which this situation can happen is smaller or
    larger. On a HSW-EX it's about 750ms:
    
    MCE is enabled on the boot CPU:
    
    [    0.244017] mce: CPU supports 22 MCE banks
    
    The corresponding sibling #72 boots:
    
    [    1.008005] .... node  #0, CPUs:    #72
    
    That means if an MCE hits on physical core 0 (logical CPUs 0 and 72)
    between these two points the machine is going to shutdown. At least it's a
    known safe state.
    
    It's obvious that the early boot can be hit by an MCE as well and then runs
    into the same situation because MCEs are not yet enabled on the boot CPU.
    But after enabling them on the boot CPU, it does not make any sense to
    prevent the kernel from recovering.
    
    Adjust the nosmt kernel parameter documentation as well.
    
    Reverts: 2207def700f9 ("x86/apic: Ignore secondary threads if nosmt=force")
    Reported-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 3f0eb66f652ceb5985b9b619e33fc61519121045
Author: Michal Hocko <mhocko@suse.cz>
Date:   Wed Jun 27 17:46:50 2018 +0200

    x86/speculation/l1tf: Fix up pte->pfn conversion for PAE
    
    commit e14d7dfb41f5807a0c1c26a13f2b8ef16af24935 upstream
    
    Jan has noticed that pte_pfn and co. resp. pfn_pte are incorrect for
    CONFIG_PAE because phys_addr_t is wider than unsigned long and so the
    pte_val resp. the shift left would get truncated. Fix this up by using
    proper types.
    
    [dwmw2: Backport to 4.9]
    
    Fixes: 6b28baca9b1f ("x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation")
    Reported-by: Jan Beulich <JBeulich@suse.com>
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
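
    A small user-space illustration of the truncation (uint32_t stands in for
    the 32-bit 'unsigned long' of a PAE kernel; the PFN value is made up):

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
                uint64_t pte_val = 0x0000000123456000ULL;  /* PFN above 4G */

                uint32_t wrong = (uint32_t)pte_val >> 12;  /* 32-bit math truncates */
                uint64_t right = pte_val >> 12;            /* phys_addr_t-wide math */

                printf("wrong=%#x right=%#llx\n", wrong, (unsigned long long)right);
                return 0;
        }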

commit 53527af79dc940a225efa266f6320ae9e8dae5e3
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Fri Jun 22 17:39:33 2018 +0200

    x86/speculation/l1tf: Protect PAE swap entries against L1TF
    
    commit 0d0f6249058834ffe1ceaad0bb31464af66f6e7a upstream
    
    The PAE 3-level paging code currently doesn't mitigate L1TF by flipping the
    offset bits, and uses the high PTE word, thus bits 32-36 for type, 37-63 for
    offset. The lower word is zeroed, thus systems with less than 4GB memory are
    safe. With 4GB to 128GB the swap type selects the memory locations vulnerable
    to L1TF; with even more memory, also the swap offset influences the address.
    This might be a problem with 32bit PAE guests running on large 64bit hosts.
    
    By continuing to keep the whole swap entry in either high or low 32bit word of
    PTE we would limit the swap size too much. Thus this patch uses the whole PAE
    PTE with the same layout as the 64bit version does. The macros just become a
    bit tricky since they assume the arch-dependent swp_entry_t to be 32bit.
    
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 250f0aebe2763df9e7ca8c9445d75fe5bbb3f970
Author: Borislav Petkov <bp@suse.de>
Date:   Fri Jun 22 11:34:11 2018 +0200

    x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings
    
    commit 7ce2f0393ea2396142b7faf6ee9b1f3676d08a5f upstream
    
    The TOPOEXT reenablement is a workaround for broken BIOSen which didn't
    enable the CPUID bit. amd_get_topology_early(), however, relies on
    that bit being set so that it can read out the CPUID leaf and set
    smp_num_siblings properly.
    
    Move the reenablement up to early_init_amd(). While at it, simplify
    amd_get_topology_early().
    
    [dwmw2: Backport to 4.9]
    
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a8358624a3ca9139b461c669231e5f474df8ccf1
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Wed Jun 20 16:42:58 2018 -0400

    x86/cpufeatures: Add detection of L1D cache flush support.
    
    commit 11e34e64e4103955fc4568750914c75d65ea87ee upstream
    
    336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
    (IA32_FLUSH_CMD) which is detected by CPUID.7.EDX[28]=1 bit being set.
    
    This new MSR "gives software a way to invalidate structures with finer
    granularity than other architectural methods like WBINVD."
    
    A copy of this document is available at
      https://bugzilla.kernel.org/show_bug.cgi?id=199511
    
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
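
    A user-space check of that CPUID bit (recent gcc/clang on x86; bit 28 of
    EDX in leaf 7, sub-leaf 0 is what the commit maps to X86_FEATURE_FLUSH_L1D):

        #include <cpuid.h>
        #include <stdio.h>

        int main(void)
        {
                unsigned int eax, ebx, ecx, edx;

                if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
                        return 1;

                printf("IA32_FLUSH_CMD %ssupported\n",
                       (edx & (1u << 28)) ? "" : "not ");
                return 0;
        }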

commit c4b998c88f86971400b556520ba55c8ca96fd8dc
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Thu Jun 21 12:36:29 2018 +0200

    x86/speculation/l1tf: Extend 64bit swap file size limit
    
    commit 1a7ed1ba4bba6c075d5ad61bb75e3fbc870840d6 upstream
    
    The previous patch has limited swap file size so that large offsets cannot
    clear bits above MAX_PA/2 in the pte and interfere with L1TF mitigation.
    
    It assumed that offsets are encoded starting with bit 12, same as pfn. But
    on x86_64, offsets are encoded starting with bit 9.
    
    Thus the limit can be raised by 3 bits. That means 16TB with 42bit MAX_PA
    and 256TB with 46bit MAX_PA.
    
    Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
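
    Worked numbers behind that raise, assuming the constraint stays "offsets
    must not reach bit MAX_PA-1 of the PTE": with a 42-bit MAX_PA and offsets
    encoded from bit 12 (like a pfn) the cap is 2^(41-12) pages = 2 TB of swap;
    with offsets encoded from bit 9 the same constraint allows 2^(41-9) pages =
    16 TB, i.e. exactly the 3-bit (8x) raise. The 46-bit case scales the same
    way from 32 TB to 256 TB.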

commit 4a818f2c354249439dcd9409dafbd95212c5cdb0
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Jun 5 14:00:11 2018 +0200

    x86/apic: Ignore secondary threads if nosmt=force
    
    commit 2207def700f902f169fc237b717252c326f9e464 upstream
    
    nosmt on the kernel command line merely prevents the onlining of the
    secondary SMT siblings.
    
    nosmt=force makes the APIC detection code ignore the secondary SMT siblings
    completely, so they even do not show up as possible CPUs. That reduces the
    amount of memory allocations for per cpu variables and saves other
    resources from being allocated too large.
    
    This is not fully equivalent to disabling SMT in the BIOS because the low
    level SMT enabling in the BIOS can result in partitioning of resources
    between the siblings, which is not undone by just ignoring them. Some CPUs
    can use the full resources when their sibling is not onlined, but this is
    depending on the CPU family and model and it's not well documented whether
    this applies to all partitioned resources. That means depending on the
    workload disabling SMT in the BIOS might result in better performance.
    
    Linus' analysis of the Intel manual:
    
      The intel optimization manual is not very clear on what the partitioning
      rules are.
    
      I find:
    
        "In general, the buffers for staging instructions between major pipe
         stages  are partitioned. These buffers include µop queues after the
         execution trace cache, the queues after the register rename stage, the
         reorder buffer which stages instructions for retirement, and the load
         and store buffers.
    
         In the case of load and store buffers, partitioning also provided an
         easier implementation to maintain memory ordering for each logical
         processor and detect memory ordering violations"
    
      but some of that partitioning may be relaxed if the HT thread is "not
      active":
    
        "In Intel microarchitecture code name Sandy Bridge, the micro-op queue
         is statically partitioned to provide 28 entries for each logical
         processor,  irrespective of software executing in single thread or
         multiple threads. If one logical processor is not active in Intel
         microarchitecture code name Ivy Bridge, then a single thread executing
         on that processor  core can use the 56 entries in the micro-op queue"
    
      but I do not know what "not active" means, and how dynamic it is. Some of
      that partitioning may be entirely static and depend on the early BIOS
      disabling of HT, and even if we park the cores, the resources will just be
      wasted.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ae76eb1198fb9f90217d6f36072b780a250dda98
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jun 6 00:57:38 2018 +0200

    x86/cpu/AMD: Evaluate smp_num_siblings early
    
    commit 1e1d7e25fd759eddf96d8ab39d0a90a1979b2d8c upstream
    
    To support force disabling of SMT it's required to know the number of
    thread siblings early. amd_get_topology() cannot be called before the APIC
    driver is selected, so split out the part which initializes
    smp_num_siblings and invoke it from amd_early_init().
    
    [dwmw2: Backport to 4.9]
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 112d243045c2b18f56b37334ee0f7faa01edc205
Author: Borislav Petkov <bp@suse.de>
Date:   Fri Jun 15 20:48:39 2018 +0200

    x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info
    
    commit 119bff8a9c9bb00116a844ec68be7bc4b1c768f5 upstream
    
    Old code used to check whether CPUID ext max level is >= 0x80000008 because
    that last leaf contains the number of cores of the physical CPU.  The three
    functions called there now do not depend on that leaf anymore so the check
    can go.
    
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 0ee6f3b23c04b41ea5cf415aa8a31ef56ab21da7
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jun 6 01:00:55 2018 +0200

    x86/cpu/intel: Evaluate smp_num_siblings early
    
    commit 1910ad5624968f93be48e8e265513c54d66b897c upstream
    
    Make use of the new early detection function to initialize smp_num_siblings
    on the boot cpu before the MP-Table or ACPI/MADT scan happens. That's
    required for force disabling SMT.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 3b4f20ad388755d8b049b6b7387cbb847d142af6
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jun 6 00:55:39 2018 +0200

    x86/cpu/topology: Provide detect_extended_topology_early()
    
    commit 95f3d39ccf7aaea79d1ffdac1c887c2e100ec1b6 upstream
    
    To support force disabling of SMT it's required to know the number of
    thread siblings early. detect_extended_topology() cannot be called before
    the APIC driver is selected, so split out the part which initializes
    smp_num_siblings.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 691997bff5ff7e69b63c30ef36a29afc4a861c4e
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jun 6 00:53:57 2018 +0200

    x86/cpu/common: Provide detect_ht_early()
    
    commit 545401f4448a807b963ff17b575e0a393e68b523 upstream
    
    To support force disabling of SMT it's required to know the number of
    thread siblings early. detect_ht() cannot be called before the APIC driver
    is selected, so split out the part which initializes smp_num_siblings.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a6d2fa5dd70ad5caf47afce59ec468e176d026d0
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jun 6 00:47:10 2018 +0200

    x86/cpu/AMD: Remove the pointless detect_ht() call
    
    commit 44ca36de56d1bf196dca2eb67cd753a46961ffe6 upstream
    
    Real 32bit AMD CPUs do not have SMT and the only value of the call was to
    reach the magic printout which got removed.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e0439285c628dea71517a1e77cab805d9134f898
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jun 6 00:36:15 2018 +0200

    x86/cpu: Remove the pointless CPU printout
    
    commit 55e6d279abd92cfd7576bba031e7589be8475edb upstream
    
    The value of this printout is dubious at best and there is no point in
    having it in two different places along with convoluted ways to reach it.
    
    Remove it completely.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f37486c0a1d05f41e1d159a0798a19d5461c764a
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue May 29 17:48:27 2018 +0200

    cpu/hotplug: Provide knobs to control SMT
    
    commit 05736e4ac13c08a4a9b1ef2de26dd31a32cbee57 upstream
    
    Provide a command line and a sysfs knob to control SMT.
    
    The command line options are:
    
     'nosmt':       Enumerate secondary threads, but do not online them
    
     'nosmt=force': Ignore secondary threads completely during enumeration
                    via MP table and ACPI/MADT.
    
    The sysfs control file has the following states (read/write):
    
     'on':           SMT is enabled. Secondary threads can be freely onlined
     'off':          SMT is disabled. Secondary threads, even if enumerated,
                     cannot be onlined
     'forceoff':     SMT is permanently disabled. Writes to the control
                     file are rejected.
     'notsupported': SMT is not supported by the CPU
    
    The command line option 'nosmt' sets the sysfs control to 'off'. This
    can be changed to 'on' to reenable SMT during runtime.
    
    The command line option 'nosmt=force' sets the sysfs control to
    'forceoff'. This cannot be changed during runtime.
    
    When SMT is 'on' and the control file is changed to 'off' then all online
    secondary threads are offlined and attempts to online a secondary thread
    later on are rejected.
    
    When SMT is 'off' and the control file is changed to 'on' then secondary
    threads can be onlined again. The 'off' -> 'on' transition does not
    automatically online the secondary threads.
    
    When the control file is set to 'forceoff', the behaviour is the same as
    setting it to 'off', but the operation is irreversible and later writes to
    the control file are rejected.
    
    When the control status is 'notsupported' then writes to the control file
    are rejected.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
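
    In practice that means the current state can be read from
    /sys/devices/system/cpu/smt/control, and writing 'off' to that file (when
    the state is not 'forceoff' or 'notsupported') offlines all secondary
    threads at runtime, while writing 'on' allows them to be onlined again.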

commit 373b8def455ec80db7d951b20562a75ed2df2703
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue May 29 17:49:05 2018 +0200

    cpu/hotplug: Split do_cpu_down()
    
    commit cc1fe215e1efa406b03aa4389e6269b61342dec5 upstream
    
    Split out the inner workings of do_cpu_down() to allow reuse of that
    function for the upcoming SMT disabling mechanism.
    
    No functional change.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9333575fc4a35dbeda5f75fb9cf72e899e569f00
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue May 29 19:05:25 2018 +0200

    cpu/hotplug: Make bringup/teardown of smp threads symmetric
    
    commit c4de65696d865c225fda3b9913b31284ea65ea96 upstream
    
    The asymmetry caused a warning to trigger if the bootup was stopped in state
    CPUHP_AP_ONLINE_IDLE. The warning no longer triggers as kthread_park() can
    now be invoked on already or still parked threads. But there is still no
    reason to have this be asymmetric.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 16fd33cd353be2cb71f2431788e5b2ae02891a77
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Thu Jun 21 10:37:20 2018 +0200

    x86/topology: Provide topology_smt_supported()
    
    commit f048c399e0f7490ab7296bc2c255d37eb14a9675 upstream
    
    Provide information on whether SMT is supported by the CPUs. Preparatory
    patch for the SMT control mechanism.
    
    Suggested-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7b69a96e5a328f17fe33f3826d7e8349ab59015d
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue May 29 17:50:22 2018 +0200

    x86/smp: Provide topology_is_primary_thread()
    
    commit 6a4d2657e048f096c7ffcad254010bd94891c8c0 upstream
    
    If the CPU is supporting SMT then the primary thread can be found by
    checking the lower APIC ID bits for zero. smp_num_siblings is used to build
    the mask for the APIC ID bits which need to be taken into account.
    
    This uses the MPTABLE or ACPI/MADT supplied APIC ID, which can be different
    than the initial APIC ID in CPUID. But according to AMD the lower bits have
    to be consistent. Intel gave a tentative confirmation as well.
    
    Preparatory patch to support disabling SMT at boot/runtime.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
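
    A user-space sketch of the check described above (the rounding helper is
    illustrative; only "mask the SMT bits of the APIC ID and compare with zero"
    is taken from the commit):

        #include <stdbool.h>
        #include <stdio.h>

        static bool apic_id_is_primary_thread(unsigned int apicid,
                                              unsigned int smp_num_siblings)
        {
                unsigned int mask = 0;

                /* Build a mask covering the SMT bits: siblings rounded up to a
                 * power of two, minus one. */
                while ((1u << __builtin_popcount(mask)) < smp_num_siblings)
                        mask = (mask << 1) | 1;

                return (apicid & mask) == 0;
        }

        int main(void)
        {
                /* Two siblings per core: even APIC IDs are the primary threads. */
                printf("%d %d\n", apic_id_is_primary_thread(4, 2),
                                  apic_id_is_primary_thread(5, 2));
                return 0;
        }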

commit 1ac1dc14671f531134f29755f98386f8e168b810
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Wed Jun 20 16:42:57 2018 -0400

    x86/bugs: Move the l1tf function and define pr_fmt properly
    
    commit 56563f53d3066afa9e63d6c997bf67e76a8b05c0 upstream
    
    The pr_warn in l1tf_select_mitigation would have used the prior pr_fmt
    which was defined as "Spectre V2 : ".
    
    Move the function to be past SSBD and also define the pr_fmt.
    
    Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e3923475ebb1b503668dfdb3ba90e2ebd46931e6
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Jun 13 15:48:28 2018 -0700

    x86/speculation/l1tf: Limit swap file size to MAX_PA/2
    
    commit 377eeaa8e11fe815b1d07c81c4a0e2843a8c15eb upstream
    
    For the L1TF workaround it's necessary to limit the swap file size to below
    MAX_PA/2, so that the higher bits of the inverted swap offset never point
    to valid memory.
    
    Add a mechanism for the architecture to override the swap file size check
    in swapfile.c and add an x86 specific max swapfile check function that
    enforces that limit.
    
    The check is only enabled if the CPU is vulnerable to L1TF.
    
    In VMs with 42bit MAX_PA the typical limit is 2TB now, on a native system
    with 46bit PA it is 32TB. The limit is only per individual swap file, so
    it's always possible to exceed these limits with multiple swap files or
    partitions.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7c5b42f82c13365b8284b5945f5ffa9f88380dd7
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Jun 13 15:48:27 2018 -0700

    x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings
    
    commit 42e4089c7890725fcd329999252dc489b72f2921 upstream
    
    For L1TF, PROT_NONE mappings are protected by inverting the PFN in the page
    table entry. This sets the high bits in the CPU's address space, thus
    making sure that an unmapped entry does not point to valid cached memory.
    
    Some server system BIOSes put the MMIO mappings high up in the physical
    address space. If such a high mapping was mapped to unprivileged users,
    they could attack low memory by setting such a mapping to PROT_NONE. This
    could happen through a special device driver which is not access
    protected. Normal /dev/mem is of course access protected.
    
    To avoid this forbid PROT_NONE mappings or mprotect for high MMIO mappings.
    
    Valid page mappings are allowed because the system is then unsafe anyways.
    
    It's not expected that users commonly use PROT_NONE on MMIO. But to
    minimize any impact this is only enforced if the mapping actually refers to
    a high MMIO address (defined as the MAX_PA-1 bit being set), and also skip
    the check for root.
    
    For mmaps this is straight forward and can be handled in vm_insert_pfn and
    in remap_pfn_range().
    
    For mprotect it's a bit trickier. At the point where the actual PTEs are
    accessed a lot of state has been changed and it would be difficult to undo
    on an error. Since this is an uncommon case, use a separate early page
    table walk pass for MMIO PROT_NONE mappings that checks for this condition
    early. For non-MMIO and non-PROT_NONE mappings there are no changes.
    
    [dwmw2: Backport to 4.9]
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 432e99b34066099db62f87b2704654b1b23fd6be
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Jun 13 15:48:26 2018 -0700

    x86/speculation/l1tf: Add sysfs reporting for l1tf
    
    commit 17dbca119312b4e8173d4e25ff64262119fcef38 upstream
    
    L1TF core kernel workarounds are cheap and normally always enabled. However,
    they should still be reported in sysfs if the system is vulnerable or
    mitigated. Add the necessary CPU feature/bug bits.
    
    - Extend the existing checks for Meltdowns to determine if the system is
      vulnerable. All CPUs which are not vulnerable to Meltdown are also not
      vulnerable to L1TF
    
    - Check for 32bit non PAE and emit a warning as there is no practical way
      for mitigation due to the limited physical address bits
    
    - If the system has more than MAX_PA/2 physical memory the invert page
      workarounds don't protect the system against the L1TF attack anymore,
      because an inverted physical address will also point to valid
      memory. Print a warning in this case and report that the system is
      vulnerable.
    
    Add a function which returns the PFN limit for the L1TF mitigation, which
    will be used in follow up patches for sanity and range checks.
    
    [ tglx: Renamed the CPU feature bit to L1TF_PTEINV ]
    [ dwmw2: Backport to 4.9 (cpufeatures.h, E820) ]
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
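
    Conceptually the added limit helper is "half of the physical address
    space, expressed in 4K pages"; a hedged sketch of the arithmetic (the
    function and variable names are illustrative, not the kernel's):

        /* first PFN in the upper, unsafe half of the physical address space */
        unsigned long long l1tf_pfn_limit_sketch(unsigned int phys_bits)
        {
                return 1ULL << (phys_bits - 1 - 12);
        }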

commit 5b2ec92f70f6d4084d23bf42391fd27fa03e8c4c
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Jun 13 15:48:25 2018 -0700

    x86/speculation/l1tf: Make sure the first page is always reserved
    
    commit 10a70416e1f067f6c4efda6ffd8ea96002ac4223 upstream
    
    The L1TF workaround doesn't make any attempt to mitigate speculative accesses
    to the first physical page for zeroed PTEs. Normally it only contains some
    data from the early real mode BIOS.
    
    It's not entirely clear that the first page is reserved in all
    configurations, so add an extra reservation call to make sure it is really
    reserved. In most configurations (e.g.  with the standard reservations)
    it's likely a nop.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 33182fe97add6e83c195e9d0f7297a6499563b52
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Jun 13 15:48:24 2018 -0700

    x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation
    
    commit 6b28baca9b1f0d4a42b865da7a05b1c81424bd5c upstream
    
    When PTEs are set to PROT_NONE the kernel just clears the Present bit and
    preserves the PFN, which creates attack surface for L1TF speculation
    attacks.
    
    This is important inside guests, because L1TF speculation bypasses physical
    page remapping. While the host has its own mitigations preventing leaking
    data from other VMs into the guest, this would still risk leaking the wrong
    page inside the current guest.
    
    This uses the same technique as Linus' swap entry patch: while an entry is
    in PROTNONE state, invert the complete PFN part of it. This ensures that
    the highest bit will point to non-existing memory.
    
    The invert is done by pte/pmd_modify and pfn/pmd/pud_pte for PROTNONE and
    pte/pmd/pud_pfn undo it.
    
    This assumes that no code path touches the PFN part of a PTE directly
    without using these primitives.
    
    This doesn't handle the case that MMIO is on the top of the CPU physical
    memory. If such an MMIO region was exposed by an unprivileged driver for
    mmap it would be possible to attack some real memory.  However this
    situation is all rather unlikely.
    
    For 32bit non PAE the inversion is not done because there are really not
    enough bits to protect anything.
    
    Q: Why does the guest need to be protected when the HyperVisor already has
       L1TF mitigations?
    
    A: Here's an example:
    
       Physical pages 1 2 get mapped into a guest as
       GPA 1 -> PA 2
       GPA 2 -> PA 1
       through EPT.
    
       The L1TF speculation ignores the EPT remapping.
    
       Now the guest kernel maps GPA 1 to process A and GPA 2 to process B, and
       they belong to different users and should be isolated.
    
       A sets the GPA 1 PA 2 PTE to PROT_NONE to bypass the EPT remapping and
       gets read access to the underlying physical page. Which in this case
       points to PA 2, so it can read process B's data, if it happened to be in
       L1, so isolation inside the guest is broken.
    
       There's nothing the hypervisor can do about this. This mitigation has to
       be done in the guest itself.
    
    [ tglx: Massaged changelog ]
    [ dwmw2: backported to 4.9 ]
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
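
    A user-space sketch of the inversion itself (52 bits is the architectural
    physical-address width mentioned elsewhere in this series; the helper names
    are illustrative):

        #include <stdint.h>
        #include <stdio.h>

        #define PFN_MASK        ((1ULL << (52 - 12)) - 1)   /* PFN field of a PTE */

        static uint64_t protnone_hide(uint64_t pfn)    { return ~pfn & PFN_MASK; }
        static uint64_t protnone_show(uint64_t stored) { return ~stored & PFN_MASK; }

        int main(void)
        {
                uint64_t pfn = 0x12345;
                uint64_t stored = protnone_hide(pfn); /* high bits set -> no real memory */

                printf("stored=%#llx recovered=%#llx\n",
                       (unsigned long long)stored,
                       (unsigned long long)protnone_show(stored));
                return 0;
        }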

commit 60712274887fcd4ad5eb8e01796022b6b202143c
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Jun 13 15:48:23 2018 -0700

    x86/speculation/l1tf: Protect swap entries against L1TF
    
    commit 2f22b4cd45b67b3496f4aa4c7180a1271c6452f6 upstream
    
    With L1 terminal fault the CPU speculates into unmapped PTEs, and the
    resulting side effects allow reading the memory the PTE is pointing to, if
    its values are still in the L1 cache.
    
    For swapped out pages Linux uses unmapped PTEs and stores a swap entry into
    them.
    
    To protect against L1TF it must be ensured that the swap entry is not
    pointing to valid memory, which requires setting higher bits (between bit
    36 and bit 45) that are inside the CPUs physical address space, but outside
    any real memory.
    
    To do this invert the offset to make sure the higher bits are always set,
    as long as the swap file is not too big.
    
    Note there is no workaround for 32bit !PAE, or on systems which have more
    than MAX_PA/2 worth of memory. The latter case is very unlikely to happen on
    real systems.
    
    [ AK: Updated description and minor tweaks. Split out from the original
          patch ]
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Andi Kleen <ak@linux.intel.com>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
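
    A user-space sketch of the encode/decode with an inverted offset, using
    the x86-64 bit positions described in the following commit (type in the
    top 5 bits, offset starting at bit 9; the macro names are illustrative):

        #include <stdint.h>
        #include <stdio.h>

        #define SWP_TYPE_BITS           5
        #define SWP_OFFSET_FIRST_BIT    9
        #define SWP_OFFSET_MASK \
                ((1ULL << (64 - SWP_TYPE_BITS - SWP_OFFSET_FIRST_BIT)) - 1)

        static uint64_t swp_entry(unsigned int type, uint64_t offset)
        {
                /* Storing ~offset keeps the unused high offset bits set in the PTE. */
                return ((uint64_t)type << (64 - SWP_TYPE_BITS)) |
                       ((~offset & SWP_OFFSET_MASK) << SWP_OFFSET_FIRST_BIT);
        }

        static uint64_t swp_offset(uint64_t entry)
        {
                return ~(entry >> SWP_OFFSET_FIRST_BIT) & SWP_OFFSET_MASK;
        }

        int main(void)
        {
                uint64_t e = swp_entry(1, 0x1000);

                printf("entry=%#llx offset=%#llx\n",
                       (unsigned long long)e, (unsigned long long)swp_offset(e));
                return 0;
        }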

commit 2c9b57e4474d93222bcb6e7f901fd1e71ded699c
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Jun 13 15:48:22 2018 -0700

    x86/speculation/l1tf: Change order of offset/type in swap entry
    
    commit bcd11afa7adad8d720e7ba5ef58bdcd9775cf45f upstream
    
    If pages are swapped out, the swap entry is stored in the corresponding
    PTE, which has the Present bit cleared. CPUs vulnerable to L1TF speculate
    on PTE entries which have the present bit set and would treat the swap
    entry as a physical address (PFN). To mitigate that, the upper bits of the PTE
    must be set so the PTE points to non existent memory.
    
    The swap entry stores the type and the offset of a swapped out page in the
    PTE. type is stored in bit 9-13 and offset in bit 14-63. The hardware
    ignores the bits beyond the physical address space limit, so to make the
    mitigation effective it's required to start 'offset' at the lowest possible
    bit so that even large swap offsets do not reach into the physical address
    space limit bits.
    
    Move offset to bit 9-58 and type to bit 59-63 which are the bits that
    hardware generally doesn't care about.
    
    That, in turn, means that if you are on a desktop chip with only 40 bits of
    physical addressing, now that the offset starts at bit 9, there needs to be
    30 bits of offset actually *in use* until bit 39 ends up being set, which
    means when inverted it will again point into existing memory.
    
    So that's 4 terabyte of swap space (because the offset is counted in pages,
    so 30 bits of offset is 42 bits of actual coverage). With bigger physical
    addressing, that obviously grows further, until the limit of the offset is
    hit (at 50 bits of offset - 62 bits of actual swap file coverage).
    
    This is a preparatory change for the actual swap entry inversion to protect
    against L1TF.
    
    [ AK: Updated description and minor tweaks. Split into two parts ]
    [ tglx: Massaged changelog ]
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Andi Kleen <ak@linux.intel.com>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1a4922e0f01d08a4789b1e17b195bc30bf234a3b
Author: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Date:   Fri Sep 8 16:10:46 2017 -0700

    mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1
    
    commit eee4818baac0f2b37848fdf90e4b16430dc536ac upstream
    
    _PAGE_PSE is used to distinguish between a truly non-present
    (_PAGE_PRESENT=0) PMD, and a PMD which is undergoing a THP split and
    should be treated as present.
    
    But _PAGE_SWP_SOFT_DIRTY currently uses the _PAGE_PSE bit, which would
    cause confusion between one of those PMDs undergoing a THP split, and a
    soft-dirty PMD.  Dropping _PAGE_PSE check in pmd_present() does not work
    well, because it can hurt optimization of tlb handling in thp split.
    
    Thus, we need to move the bit.
    
    In the current kernel, bits 1-4 are not used in non-present format since
    commit 00839ee3b299 ("x86/mm: Move swap offset/type up in PTE to work
    around erratum").  So let's move _PAGE_SWP_SOFT_DIRTY to bit 1.  Bit 7
    is used as reserved (always clear), so please don't use it for other
    purpose.
    
    [dwmw2: Pulled in to 4.9 backport to support L1TF changes]
    
    Link: http://lkml.kernel.org/r/20170717193955.20207-3-zi.yan@sent.com
    Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
    Cc: David Nellans <dnellans@nvidia.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit bbd07cbb1076de03d896c9c3787081b1080e8c99
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Jun 13 15:48:21 2018 -0700

    x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT
    
    commit 50896e180c6aa3a9c61a26ced99e15d602666a4c upstream
    
    L1 Terminal Fault (L1TF) is a speculation related vulnerability. The CPU
    speculates on PTE entries which do not have the PRESENT bit set, if the
    content of the resulting physical address is available in the L1D cache.
    
    The OS side mitigation makes sure that a !PRESENT PTE entry points to a
    physical address outside the actually existing and cachable memory
    space. This is achieved by inverting the upper bits of the PTE. Due to the
    address space limitations this only works for 64bit and 32bit PAE kernels,
    but not for 32bit non PAE.
    
    This mitigation applies to both host and guest kernels, but in case of a
    64bit host (hypervisor) and a 32bit PAE guest, inverting the upper bits of
    the PAE address space (44bit) is not enough if the host has more than 43
    bits of populated memory address space, because the speculation treats the
    PTE content as a physical host address bypassing EPT.
    
    The host (hypervisor) protects itself against the guest by flushing L1D as
    needed, but pages inside the guest are not protected against attacks from
    other processes inside the same guest.
    
    For the guest the inverted PTE mask has to match the host to provide the
    full protection for all pages the host could possibly map into the
    guest. The hosts populated address space is not known to the guest, so the
    mask must cover the possible maximal host address space, i.e. 52 bit.
    
    On 32bit PAE the maximum PTE mask is currently set to 44 bit because that
    is the limit imposed by 32bit unsigned long PFNs in the VMs. This limits
    the mask to be below what the host could possibly use for physical pages.
    
    The L1TF PROT_NONE protection code uses the PTE masks to determine which
    bits to invert to make sure the higher bits are set for unmapped entries to
    prevent L1TF speculation attacks against EPT inside guests.
    
    In order to invert all bits that could be used by the host, increase
    __PHYSICAL_PAGE_SHIFT to 52 to match 64bit.
    
    The real limit for a 32bit PAE kernel is still 44 bits because all Linux
    PTEs are created from unsigned long PFNs, so they cannot be higher than 44
    bits on a 32bit kernel. So these extra PFN bits should never be set. The
    only users of this macro are using it to look at PTEs, so it's safe.
    
    [ tglx: Massaged changelog ]
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 329d815667373e858497b5947ad0484194d8c3e2
Author: Nick Desaulniers <ndesaulniers@google.com>
Date:   Fri Aug 3 10:05:50 2018 -0700

    x86/irqflags: Provide a declaration for native_save_fl
    
    commit 208cbb32558907f68b3b2a081ca2337ac3744794 upstream.
    
    It was reported that the commit d0a8d9378d16 is causing users of gcc < 4.9
    to observe -Werror=missing-prototypes errors.
    
    Indeed, it seems that:
    extern inline unsigned long native_save_fl(void) { return 0; }
    
    compiled with -Werror=missing-prototypes produces this warning in gcc <
    4.9, but not gcc >= 4.9.
    
    Fixes: d0a8d9378d16 ("x86/paravirt: Make native_save_fl() extern inline").
    Reported-by: David Laight <david.laight@aculab.com>
    Reported-by: Jean Delvare <jdelvare@suse.de>
    Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: hpa@zytor.com
    Cc: jgross@suse.com
    Cc: kstewart@linuxfoundation.org
    Cc: gregkh@linuxfoundation.org
    Cc: boris.ostrovsky@oracle.com
    Cc: astrachan@google.com
    Cc: mka@chromium.org
    Cc: arnd@arndb.de
    Cc: tstellar@redhat.com
    Cc: sedat.dilek@gmail.com
    Cc: David.Laight@aculab.com
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180803170550.164688-1-ndesaulniers@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
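
    The shape of the fix, simplified (the real code lives in
    arch/x86/include/asm/irqflags.h; the added prototype line is the point):

        /* ordinary declaration added so -Wmissing-prototypes is satisfied */
        extern unsigned long native_save_fl(void);

        extern inline unsigned long native_save_fl(void)
        {
                unsigned long flags;

                asm volatile("pushf ; pop %0" : "=rm" (flags) : : "memory");
                return flags;
        }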

commit a92daabdfc87c320a626b2ad0318c2a0dee17a30
Author: Masami Hiramatsu <mhiramat@kernel.org>
Date:   Sat Apr 28 21:37:03 2018 +0900

    kprobes/x86: Fix %p uses in error messages
    
    commit 0ea063306eecf300fcf06d2f5917474b580f666f upstream.
    
    Remove all %p uses in error messages in kprobes/x86.
    
    Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
    Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: David Howells <dhowells@redhat.com>
    Cc: David S . Miller <davem@davemloft.net>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Jon Medhurst <tixy@linaro.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Thomas Richter <tmricht@linux.ibm.com>
    Cc: Tobin C . Harding <me@tobin.cc>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: acme@kernel.org
    Cc: akpm@linux-foundation.org
    Cc: brueckner@linux.vnet.ibm.com
    Cc: linux-arch@vger.kernel.org
    Cc: rostedt@goodmis.org
    Cc: schwidefsky@de.ibm.com
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/lkml/152491902310.9916.13355297638917767319.stgit@devbox
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 6455f41db5206cf46b623be071a0aa308c183642
Author: Jiri Kosina <jkosina@suse.cz>
Date:   Thu Jul 26 13:14:55 2018 +0200

    x86/speculation: Protect against userspace-userspace spectreRSB
    
    commit fdf82a7856b32d905c39afc85e34364491e46346 upstream.
    
    The article "Spectre Returns! Speculation Attacks using the Return Stack
    Buffer" [1] describes two new (sub-)variants of spectrev2-like attacks,
    making use solely of the RSB contents even on CPUs that don't fallback to
    BTB on RSB underflow (Skylake+).
    
    Mitigate userspace-userspace attacks by always unconditionally filling RSB on
    context switch when the generic spectrev2 mitigation has been enabled.
    
    [1] https://arxiv.org/pdf/1807.07940.pdf
    
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: David Woodhouse <dwmw@amazon.co.uk>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1807261308190.997@cbobk.fhfr.pm
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 640fe070d801b91081b7c9e3575b1ed2f0018eeb
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri Aug 3 16:41:39 2018 +0200

    x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
    
    commit 5800dc5c19f34e6e03b5adab1282535cb102fafd upstream.
    
    Nadav reported that on guests we're failing to rewrite the indirect
    calls to CALLEE_SAVE paravirt functions. In particular the
    pv_queued_spin_unlock() call is left unpatched and that is all over the
    place. This obviously wrecks Spectre-v2 mitigation (for paravirt
    guests) which relies on not actually having indirect calls around.
    
    The reason is an incorrect clobber test in paravirt_patch_call(); this
    function rewrites an indirect call with a direct call to the _SAME_
    function, there is no possible way the clobbers can be different
    because of this.
    
    Therefore remove this clobber check. Also put WARNs on the other patch
    failure case (not enough room for the instruction) which I've not seen
    trigger in my (limited) testing.
    
    Three live kernel image disassemblies for lock_sock_nested (as a small
    function that illustrates the problem nicely). PRE is the current
    situation for guests, POST is with this patch applied and NATIVE is with
    or without the patch for !guests.
    
    PRE:
    
    (gdb) disassemble lock_sock_nested
    Dump of assembler code for function lock_sock_nested:
       0xffffffff817be970 <+0>:     push   %rbp
       0xffffffff817be971 <+1>:     mov    %rdi,%rbp
       0xffffffff817be974 <+4>:     push   %rbx
       0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
       0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
       0xffffffff817be981 <+17>:    mov    %rbx,%rdi
       0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
       0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
       0xffffffff817be98f <+31>:    test   %eax,%eax
       0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
       0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
       0xffffffff817be99d <+45>:    mov    %rbx,%rdi
       0xffffffff817be9a0 <+48>:    callq  *0xffffffff822299e8
       0xffffffff817be9a7 <+55>:    pop    %rbx
       0xffffffff817be9a8 <+56>:    pop    %rbp
       0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
       0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
       0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063ae0 <__local_bh_enable_ip>
       0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
       0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
       0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
    End of assembler dump.
    
    POST:
    
    (gdb) disassemble lock_sock_nested
    Dump of assembler code for function lock_sock_nested:
       0xffffffff817be970 <+0>:     push   %rbp
       0xffffffff817be971 <+1>:     mov    %rdi,%rbp
       0xffffffff817be974 <+4>:     push   %rbx
       0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
       0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
       0xffffffff817be981 <+17>:    mov    %rbx,%rdi
       0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
       0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
       0xffffffff817be98f <+31>:    test   %eax,%eax
       0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
       0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
       0xffffffff817be99d <+45>:    mov    %rbx,%rdi
       0xffffffff817be9a0 <+48>:    callq  0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock>
       0xffffffff817be9a5 <+53>:    xchg   %ax,%ax
       0xffffffff817be9a7 <+55>:    pop    %rbx
       0xffffffff817be9a8 <+56>:    pop    %rbp
       0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
       0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
       0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063aa0 <__local_bh_enable_ip>
       0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
       0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
       0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
    End of assembler dump.
    
    NATIVE:
    
    (gdb) disassemble lock_sock_nested
    Dump of assembler code for function lock_sock_nested:
       0xffffffff817be970 <+0>:     push   %rbp
       0xffffffff817be971 <+1>:     mov    %rdi,%rbp
       0xffffffff817be974 <+4>:     push   %rbx
       0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
       0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
       0xffffffff817be981 <+17>:    mov    %rbx,%rdi
       0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
       0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
       0xffffffff817be98f <+31>:    test   %eax,%eax
       0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
       0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
       0xffffffff817be99d <+45>:    mov    %rbx,%rdi
       0xffffffff817be9a0 <+48>:    movb   $0x0,(%rdi)
       0xffffffff817be9a3 <+51>:    nopl   0x0(%rax)
       0xffffffff817be9a7 <+55>:    pop    %rbx
       0xffffffff817be9a8 <+56>:    pop    %rbp
       0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
       0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
       0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063ae0 <__local_bh_enable_ip>
       0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
       0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
       0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
    End of assembler dump.
    
    
    Fixes: 63f70270ccd9 ("[PATCH] i386: PARAVIRT: add common patching machinery")
    Fixes: 3010a0663fd9 ("x86/paravirt, objtool: Annotate indirect calls")
    Reported-by: Nadav Amit <namit@vmware.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Juergen Gross <jgross@suse.com>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 16aeb3f175a1b4d68dd68418230a1644de95fb6b
Author: Oleksij Rempel <o.rempel@pengutronix.de>
Date:   Fri Jun 15 09:41:29 2018 +0200

    ARM: dts: imx6sx: fix irq for pcie bridge
    
    commit 1bcfe0564044be578841744faea1c2f46adc8178 upstream.
    
    Use the correct IRQ line for the MSI controller in the PCIe host
    controller. Apparently a different IRQ line is used compared to other
    i.MX6 variants. Without this change MSI IRQs aren't properly propagated
    to the upstream interrupt controller.
    
    Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
    Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
    Fixes: b1d17f68e5c5 ("ARM: dts: imx: add initial imx6sx device tree source")
    Signed-off-by: Shawn Guo <shawnguo@kernel.org>
    Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 27250cf83def3eeff4e438571a0aa9c18deb898f
Author: Michael Mera <dev@michaelmera.com>
Date:   Mon May 1 15:41:16 2017 +0900

    IB/ocrdma: fix out of bounds access to local buffer
    
    commit 062d0f22a30c39840ea49b72cfcfc1aa4cc538fa upstream.
    
    In a write to the debugfs file 'resource_stats', the local buffer 'tmp_str'
    is written at index 'count-1', where 'count' is the size of the write and
    can therefore be 0.
    
    This patch filters odd values for the write size/position to avoid this
    type of problem.
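
    As an illustration of that kind of filtering in a debugfs write handler
    (buffer size, names and error codes here are illustrative, not the
    driver's exact code):

        /* assumes <linux/fs.h> and <linux/uaccess.h> */
        static ssize_t resource_stats_write(struct file *filp,
                                            const char __user *buffer,
                                            size_t count, loff_t *ppos)
        {
                char tmp_str[32];

                /* reject empty, oversized or offset writes before touching tmp_str */
                if (count == 0 || count > sizeof(tmp_str) || *ppos != 0)
                        return -EINVAL;
                if (copy_from_user(tmp_str, buffer, count))
                        return -EFAULT;
                tmp_str[count - 1] = '\0';      /* index is now guaranteed in bounds */

                return count;
        }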
    
    Signed-off-by: Michael Mera <dev@michaelmera.com>
    Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 5ee45fc998a3e45af45f8886fb13cc417c6f18d0
Author: Fabio Estevam <fabio.estevam@nxp.com>
Date:   Fri Jan 5 18:02:55 2018 -0200

    mtd: nand: qcom: Add a NULL check for devm_kasprintf()
    
    commit 069f05346d01e7298939f16533953cdf52370be3 upstream.
    
    devm_kasprintf() may fail, so we should add a NULL check
    and propagate an error on failure.
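
    A minimal sketch of the resulting pattern (device and format string are
    illustrative, not the driver's exact code):

        char *name = devm_kasprintf(dev, GFP_KERNEL, "%s_bam", dev_name(dev));

        if (!name)
                return -ENOMEM;         /* propagate the allocation failure */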
    
    Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
    Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
    Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e2ba7bf19727b67f0f3850c84829a19898401928
Author: Jack Morgenstein <jackm@dev.mellanox.co.il>
Date:   Wed May 23 15:30:31 2018 +0300

    IB/mlx4: Mark user MR as writable if actual virtual memory is writable
    
    commit d8f9cc328c8888369880e2527e9186d745f2bbf6 upstream.
    
    To allow rereg_user_mr to modify the MR from read-only to writable without
    using get_user_pages again, we needed to define the initial MR as writable.
    However, this was originally done unconditionally, without taking into
    account the writability of the underlying virtual memory.
    
    As a result, any attempt to register a read-only MR over read-only
    virtual memory failed.
    
    To fix this, do not add the writable flag bit when the user virtual memory
    is not writable (e.g. const memory).
    
    However, when the underlying memory is NOT writable (and we therefore
    do not define the initial MR as writable), the IB core adds a
    "force writable" flag to its user-pages request. If this succeeds,
    the reg_user_mr caller gets a writable copy of the original pages.
    
    If the user-space caller then does a rereg_user_mr operation to enable
    writability, this will succeed. This should not be allowed, since
    the original virtual memory was not writable.
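
    A hedged sketch of the conditional flag logic described above (it uses the
    ib_access_writable() helper introduced by the companion IB/core patch; the
    VMA walk is simplified relative to the real driver code):

        if (!ib_access_writable(access_flags)) {
                struct vm_area_struct *vma;

                down_read(&current->mm->mmap_sem);
                vma = find_vma(current->mm, start);
                /* only force writable pages if the mapping itself is writable */
                if (vma && vma->vm_start <= start &&
                    vma->vm_end >= start + length &&
                    (vma->vm_flags & VM_WRITE))
                        access_flags |= IB_ACCESS_LOCAL_WRITE;
                up_read(&current->mm->mmap_sem);
        }
        /* then call ib_umem_get() with the (possibly) adjusted access_flags */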
    
    Cc: <stable@vger.kernel.org>
    Fixes: 9376932d0c26 ("IB/mlx4_ib: Add support for user MR re-registration")
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 11410f99982cbc7ee71c58437d1f0cccd7bb5e96
Author: Jack Morgenstein <jackm@dev.mellanox.co.il>
Date:   Wed May 23 15:30:30 2018 +0300

    IB/core: Make testing MR flags for writability a static inline function
    
    commit 08bb558ac11ab944e0539e78619d7b4c356278bd upstream.
    
    Make the MR writability flags check, which is performed in umem.c,
    a static inline function in file ib_verbs.h
    
    This allows the function to be used by low-level infiniband drivers.
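
    A sketch of what such a helper can look like in ib_verbs.h (the exact set
    of flags tested is illustrative):

        static inline bool ib_access_writable(int access_flags)
        {
                /* any of these access modes requires writable user pages */
                return access_flags & (IB_ACCESS_LOCAL_WRITE |
                                       IB_ACCESS_REMOTE_WRITE |
                                       IB_ACCESS_REMOTE_ATOMIC |
                                       IB_ACCESS_MW_BIND);
        }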
    
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a3a7b992b240ba621a47ff2d3465fa4f0534e297
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Thu Jul 6 08:41:06 2017 -0500

    proc: Fix proc_sys_prune_dcache to hold a sb reference
    
    commit 2fd1d2c4ceb2248a727696962cf3370dc9f5a0a4 upstream.
    
    Andrei Vagin writes:
    FYI: This bug has been reproduced on 4.11.7
    > BUG: Dentry ffff895a3dd01240{i=4e7c09a,n=lo}  still in use (1) [unmount of proc proc]
    > ------------[ cut here ]------------
    > WARNING: CPU: 1 PID: 13588 at fs/dcache.c:1445 umount_check+0x6e/0x80
    > CPU: 1 PID: 13588 Comm: kworker/1:1 Not tainted 4.11.7-200.fc25.x86_64 #1
    > Hardware name: CompuLab sbc-flt1/fitlet, BIOS SBCFLT_0.08.04 06/27/2015
    > Workqueue: events proc_cleanup_work
    > Call Trace:
    >  dump_stack+0x63/0x86
    >  __warn+0xcb/0xf0
    >  warn_slowpath_null+0x1d/0x20
    >  umount_check+0x6e/0x80
    >  d_walk+0xc6/0x270
    >  ? dentry_free+0x80/0x80
    >  do_one_tree+0x26/0x40
    >  shrink_dcache_for_umount+0x2d/0x90
    >  generic_shutdown_super+0x1f/0xf0
    >  kill_anon_super+0x12/0x20
    >  proc_kill_sb+0x40/0x50
    >  deactivate_locked_super+0x43/0x70
    >  deactivate_super+0x5a/0x60
    >  cleanup_mnt+0x3f/0x90
    >  mntput_no_expire+0x13b/0x190
    >  kern_unmount+0x3e/0x50
    >  pid_ns_release_proc+0x15/0x20
    >  proc_cleanup_work+0x15/0x20
    >  process_one_work+0x197/0x450
    >  worker_thread+0x4e/0x4a0
    >  kthread+0x109/0x140
    >  ? process_one_work+0x450/0x450
    >  ? kthread_park+0x90/0x90
    >  ret_from_fork+0x2c/0x40
    > ---[ end trace e1c109611e5d0b41 ]---
    > VFS: Busy inodes after unmount of proc. Self-destruct in 5 seconds.  Have a nice day...
    > BUG: unable to handle kernel NULL pointer dereference at           (null)
    > IP: _raw_spin_lock+0xc/0x30
    > PGD 0
    
    Fix this by taking a reference to the super block in proc_sys_prune_dcache.
    
    The superblock reference is the core of the fix; however, the sysctl_inodes
    list is also converted to an hlist so that hlist_del_init_rcu may be used.
    This allows proc_sys_prune_dcache to remove inodes from the sysctl_inodes
    list, while not causing problems for proc_sys_evict_inode if it later
    chooses to remove the inode from the sysctl_inodes list.  Removing inodes
    from the sysctl_inodes list allows proc_sys_prune_dcache to have a progress
    guarantee, while still being able to drop all locks.  The fact that
    head->unregistering is set in start_unregistering ensures that no more
    inodes will be added to the sysctl_inodes list.
    
    Previously the code did a dance where it delayed calling iput until the
    next entry in the list was being considered to ensure the inode remained on
    the sysctl_inodes list until the next entry was walked to.  The structure
    of the loop in this patch does not need that so is much easier to
    understand and maintain.
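
    A heavily abridged sketch of the resulting loop shape (illustrative only;
    the real function has additional handling for igrab() failing under a
    dying superblock):

        struct hlist_node *node;

        rcu_read_lock();
        while ((node = hlist_first_rcu(&head->inodes)) != NULL) {
                struct proc_inode *ei = hlist_entry(node, struct proc_inode,
                                                    sysctl_inodes);
                struct inode *inode = &ei->vfs_inode;
                struct super_block *sb = inode->i_sb;

                spin_lock(&sysctl_lock);
                hlist_del_init_rcu(&ei->sysctl_inodes);
                spin_unlock(&sysctl_lock);

                if (!atomic_inc_not_zero(&sb->s_active))
                        continue;               /* superblock already going away */
                inode = igrab(inode);
                rcu_read_unlock();
                if (inode) {
                        d_prune_aliases(inode);
                        iput(inode);
                }
                deactivate_super(sb);           /* drop the superblock reference */
                rcu_read_lock();
        }
        rcu_read_unlock();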
    
    Cc: stable@vger.kernel.org
    Reported-by: Andrei Vagin <avagin@gmail.com>
    Tested-by: Andrei Vagin <avagin@openvz.org>
    Fixes: ace0c791e6c3 ("proc/sysctl: Don't grab i_lock under sysctl_lock.")
    Fixes: d6cffbbe9a7e ("proc/sysctl: prune stale dentries during unregistering")
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 631f93a6fe847d2d317010d5bbd7cb3bcc284336
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Mon Feb 20 18:17:03 2017 +1300

    proc/sysctl: Don't grab i_lock under sysctl_lock.
    
    commit ace0c791e6c3cf5ef37cad2df69f0d90ccc40ffb upstream.
    
    Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
    > This patch has a locking problem. I've got a lockdep splat under LTP.
    >
    > [ 6633.115456] ======================================================
    > [ 6633.115502] [ INFO: possible circular locking dependency detected ]
    > [ 6633.115553] 4.9.10-debug+ #9 Tainted: G             L
    > [ 6633.115584] -------------------------------------------------------
    > [ 6633.115627] ksm02/284980 is trying to acquire lock:
    > [ 6633.115659]  (&sb->s_type->i_lock_key#4){+.+...}, at: [<ffffffff816bc1ce>] igrab+0x1e/0x80
    > [ 6633.115834] but task is already holding lock:
    > [ 6633.115882]  (sysctl_lock){+.+...}, at: [<ffffffff817e379b>] unregister_sysctl_table+0x6b/0x110
    > [ 6633.116026] which lock already depends on the new lock.
    > [ 6633.116026]
    > [ 6633.116080]
    > [ 6633.116080] the existing dependency chain (in reverse order) is:
    > [ 6633.116117]
    > -> #2 (sysctl_lock){+.+...}:
    > -> #1 (&(&dentry->d_lockref.lock)->rlock){+.+...}:
    > -> #0 (&sb->s_type->i_lock_key#4){+.+...}:
    >
    > d_lock nests inside i_lock
    > sysctl_lock nests inside d_lock in d_compare
    >
    > This patch adds i_lock nesting inside sysctl_lock.
    
    Al Viro <viro@ZenIV.linux.org.uk> replied:
    > Once ->unregistering is set, you can drop sysctl_lock just fine.  So I'd
    > try something like this - use rcu_read_lock() in proc_sys_prune_dcache(),
    > drop sysctl_lock() before it and regain after.  Make sure that no inodes
    > are added to the list once ->unregistering has been set and use RCU list
    > primitives for modifying the inode list, with sysctl_lock still used to
    > serialize its modifications.
    >
    > Freeing struct inode is RCU-delayed (see proc_destroy_inode()), so doing
    > igrab() is safe there.  Since we don't drop inode reference until after we'd
    > passed beyond it in the list, list_for_each_entry_rcu() should be fine.
    
    I agree with Al Viro's analysis of the situation.
    
    Fixes: d6cffbbe9a7e ("proc/sysctl: prune stale dentries during unregistering")
    Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Tested-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b96e215e539509cae8bfe468689b70661cf511b4
Author: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date:   Fri Feb 10 10:35:02 2017 +0300

    proc/sysctl: prune stale dentries during unregistering
    
    commit d6cffbbe9a7e51eb705182965a189457c17ba8a3 upstream.
    
    Currently, unregistering a sysctl table does not prune its dentries.
    Stale dentries can slow down sysctl operations significantly.
    
    For example, command:
    
     # for i in {1..100000} ; do unshare -n -- sysctl -a &> /dev/null ; done
     creates millions of stale dentries around the sysctls of the loopback interface:
    
     # sysctl fs.dentry-state
     fs.dentry-state = 25812579  24724135        45      0       0       0
    
     All of them have matching names, thus lookups have to scan through the whole
     hash chain and call d_compare (proc_sys_compare), which checks them
     under a system-wide spinlock (sysctl_lock).
    
     # time sysctl -a > /dev/null
     real    1m12.806s
     user    0m0.016s
     sys     1m12.400s
    
    Currently only the memory reclaimer can remove this garbage,
    but without significant memory pressure this never happens.
    
    This patch collects sysctl inodes into a list on the sysctl table header and
    prunes all their dentries once that table unregisters.
    
    Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
    > On 10.02.2017 10:47, Al Viro wrote:
    >> how about the matching stats *after* that patch?
    >
    > dcache size doesn't grow endlessly, so stats are fine
    >
    > # sysctl fs.dentry-state
    > fs.dentry-state = 92712       58376   45      0       0       0
    >
    > # time sysctl -a &>/dev/null
    >
    > real  0m0.013s
    > user  0m0.004s
    > sys   0m0.008s
    
    Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e31578c6fb0b89ceb8ef943528279571dfc0f8dc
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Aug 9 17:51:32 2018 -0400

    fix __legitimize_mnt()/mntput() race
    
    commit 119e1ef80ecfe0d1deb6378d4ab41f5b71519de1 upstream.
    
    __legitimize_mnt() has two problems - one is that in case of success
    the check of mount_lock is not ordered wrt preceding increment of
    refcount, making it possible to have successful __legitimize_mnt()
    on one CPU just before the otherwise final mntput() on another,
    with __legitimize_mnt() not seeing mntput() taking the lock and
    mntput() not seeing the increment done by __legitimize_mnt().
    Solved by a pair of barriers.
    
    Another is that failure of __legitimize_mnt() on the second
    read_seqretry() leaves us with a reference that'll need to be
    dropped by the caller; however, if that races with the final mntput()
    we can end up with caller dropping rcu_read_lock() and doing
    mntput() to release that reference - with the first mntput()
    having freed the damn thing just as rcu_read_lock() had been
    dropped.  Solution: in "do mntput() yourself" failure case
    grab mount_lock, check if MNT_DOOMED has been set by racing
    final mntput() that has missed our increment and if it has -
    undo the increment and treat that as "failure, caller doesn't
    need to drop anything" case.
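
    A hedged sketch of the resulting paths in __legitimize_mnt() (return-value
    conventions and details are illustrative; mnt == real_mount(bastard)):

        mnt_add_count(mnt, 1);
        smp_mb();               /* pairs with the barrier in mntput_no_expire() */
        if (likely(!read_seqretry(&mount_lock, seq)))
                return 0;       /* legitimized; caller keeps the reference */

        lock_mount_hash();
        if (unlikely(bastard->mnt_flags & MNT_DOOMED)) {
                /* a racing final mntput() missed our increment: undo it */
                mnt_add_count(mnt, -1);
                unlock_mount_hash();
                return -1;      /* caller must not drop anything */
        }
        unlock_mount_hash();
        return 1;               /* caller still has to mntput() */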
    
    It's not easy to hit - the final mntput() has to come right
    after the first read_seqretry() in __legitimize_mnt() *and*
    manage to miss the increment done by __legitimize_mnt() before
    the second read_seqretry() in there.  The things that are almost
    impossible to hit on bare hardware are not impossible on SMP
    KVM, though...
    
    Reported-by: Oleg Nesterov <oleg@redhat.com>
    Fixes: 48a066e72d97 ("RCU'd vfsmounts")
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 87a2d84d2ff4aea2f9bc8c5801f5044024fac1c4
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Aug 9 17:21:17 2018 -0400

    fix mntput/mntput race
    
    commit 9ea0a46ca2c318fcc449c1e6b62a7230a17888f1 upstream.
    
    mntput_no_expire() does the calculation of total refcount under mount_lock;
    unfortunately, the decrement (as well as all increments) are done outside
    of it, leading to false positives in the "are we dropping the last reference"
    test.  Consider the following situation:
            * mnt is a lazy-umounted mount, kept alive by two opened files.  One
    of those files gets closed.  Total refcount of mnt is 2.  On CPU 42
    mntput(mnt) (called from __fput()) drops one reference, decrementing component
            * After it has looked at component #0, the process on CPU 0 does
    mntget(), incrementing component #0, gets preempted and gets to run again -
    on CPU 69.  There it does mntput(), which drops the reference (component #69)
    and proceeds to spin on mount_lock.
            * On CPU 42 our first mntput() finishes counting.  It observes the
    decrement of component #69, but not the increment of component #0.  As the
    result, the total it gets is not 1 as it should've been - it's 0.  At which
    point we decide that vfsmount needs to be killed and proceed to free it and
    shut the filesystem down.  However, there's still another opened file
    on that filesystem, with reference to (now freed) vfsmount, etc. and we are
    screwed.
    
    It's not a wide race, but it can be reproduced with artificial slowdown of
    the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
    
    Fix consists of moving the refcount decrement under mount_lock; the tricky
    part is that we want (and can) keep the fast case (i.e. mount that still
    has non-NULL ->mnt_ns) entirely out of mount_lock.  All places that zero
    mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
    before that mntput().  IOW, if mntput() observes (under rcu_read_lock())
    a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
    be dropped.
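
    A hedged sketch of the resulting fast/slow split in mntput_no_expire()
    (abridged; the path that actually tears the mount down is omitted):

        rcu_read_lock();
        if (likely(READ_ONCE(mnt->mnt_ns))) {
                /* still attached to a namespace: this cannot be the last ref */
                mnt_add_count(mnt, -1);
                rcu_read_unlock();
                return;
        }
        lock_mount_hash();
        smp_mb();                       /* see __legitimize_mnt() */
        mnt_add_count(mnt, -1);         /* decrement now done under mount_lock */
        if (mnt_get_count(mnt)) {
                rcu_read_unlock();
                unlock_mount_hash();
                return;
        }
        /* last reference really is gone: proceed to kill the mount */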
    
    Reported-by: Jann Horn <jannh@google.com>
    Tested-by: Jann Horn <jannh@google.com>
    Fixes: 48a066e72d97 ("RCU'd vfsmounts")
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 59199c04b746b87db92843f28364547cb7ca1764
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Aug 9 10:15:54 2018 -0400

    make sure that __dentry_kill() always invalidates d_seq, unhashed or not
    
    commit 4c0d7cd5c8416b1ef41534d19163cb07ffaa03ab upstream.
    
    RCU pathwalk relies upon the assumption that anything that changes
    ->d_inode of a dentry will invalidate its ->d_seq.  That's almost
    true - the one exception is that the final dput() of already unhashed
    dentry does *not* touch ->d_seq at all.  Unhashing does, though,
    so for anything we'd found by RCU dcache lookup we are fine.
    Unfortunately, we can *start* with an unhashed dentry or jump into
    it.
    
    We could try and be careful in the (few) places where that could
    happen.  Or we could just make the final dput() invalidate the damn
    thing, unhashed or not.  The latter is much simpler and easier to
    backport, so let's do it that way.
    
    Reported-by: "Dae R. Jeong" <threeearcat@gmail.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit cfac7df7dc10a1187176c19c4ba950b365d388b7
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Mon Aug 6 09:03:58 2018 -0400

    root dentries need RCU-delayed freeing
    
    commit 90bad5e05bcdb0308cfa3d3a60f5c0b9c8e2efb3 upstream.
    
    Since mountpoint crossing can happen without leaving lazy mode,
    root dentries do need the same protection against having their
    memory freed without RCU delay as everything else in the tree.
    
    It's partially hidden by RCU delay between detaching from the
    mount tree and dropping the vfsmount reference, but the starting
    point of pathwalk can be on an already detached mount, in which
    case umount-caused RCU delay has already passed by the time the
    lazy pathwalk grabs rcu_read_lock().  If the starting point
    happens to be at the root of that vfsmount *and* that vfsmount
    covers the entire filesystem, we get trouble.
    
    Fixes: 48a066e72d97 ("RCU'd vfsmounts")
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 6bb53ee170c45f44ac80ad8318f72feff9cdee1b
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Aug 12 12:19:42 2018 -0700

    init: rename and re-order boot_cpu_state_init()
    
    commit b5b1404d0815894de0690de8a1ab58269e56eae6 upstream.
    
    This is purely a preparatory patch for upcoming changes during the 4.19
    merge window.
    
    We have a function called "boot_cpu_state_init()" that isn't really
    about the bootup cpu state: that is done much earlier by the similarly
    named "boot_cpu_init()" (note lack of "state" in name).
    
    This function initializes some hotplug CPU state, and needs to run after
    the percpu data has been properly initialized.  It even has a comment to
    that effect.
    
    Except it _doesn't_ actually run after the percpu data has been properly
    initialized.  On x86 it happens to do that, but on at least arm and
    arm64, the percpu base pointers are initialized by the arch-specific
    'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
    
    This had some unexpected results, and in particular we have a patch
    pending for the merge window that did the obvious cleanup of using
    'this_cpu_write()' in the cpu hotplug init code:
    
      -       per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
      +       this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
    
    which is obviously the right thing to do.  Except because of the
    ordering issue, it actually failed miserably and unexpectedly on arm64.
    
    So this just fixes the ordering, and changes the name of the function to
    be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
    hotplug state, because the core CPU state was supposed to have already
    been done earlier.
    
    Marked for stable, since the (not yet merged) patch that will show this
    problem is marked for stable.
    
    Reported-by: Vlastimil Babka <vbabka@suse.cz>
    Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
    Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit bcf447f808b5e054d529eea01dcbabe6a576666a
Author: Bart Van Assche <bart.vanassche@wdc.com>
Date:   Thu Aug 2 10:44:42 2018 -0700

    scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled
    
    commit 1214fd7b497400d200e3f4e64e2338b303a20949 upstream.
    
    Surround scsi_execute() calls with scsi_autopm_get_device() and
    scsi_autopm_put_device(). Note: removing sr_mutex protection from the
    scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of
    sr_mutex is to serialize cdrom_*() calls.
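
    The resulting pattern, roughly ('cd' follows the sr driver's naming; error
    handling is abridged):

        ret = scsi_autopm_get_device(cd->device);  /* resume the device if suspended */
        if (ret)
                goto out;

        /* scsi_execute()-based command sequence runs here */

        scsi_autopm_put_device(cd->device);        /* allow runtime suspend again */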
    
    This patch prevents complaints similar to the following from appearing in
    the kernel log if runtime power management is enabled:
    
    INFO: task systemd-udevd:650 blocked for more than 120 seconds.
         Not tainted 4.18.0-rc7-dbg+ #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    systemd-udevd   D28176   650    513 0x00000104
    Call Trace:
    __schedule+0x444/0xfe0
    schedule+0x4e/0xe0
    schedule_preempt_disabled+0x18/0x30
    __mutex_lock+0x41c/0xc70
    mutex_lock_nested+0x1b/0x20
    __blkdev_get+0x106/0x970
    blkdev_get+0x22c/0x5a0
    blkdev_open+0xe9/0x100
    do_dentry_open.isra.19+0x33e/0x570
    vfs_open+0x7c/0xd0
    path_openat+0x6e3/0x1120
    do_filp_open+0x11c/0x1c0
    do_sys_open+0x208/0x2d0
    __x64_sys_openat+0x59/0x70
    do_syscall_64+0x77/0x230
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Cc: Maurizio Lombardi <mlombard@redhat.com>
    Cc: Johannes Thumshirn <jthumshirn@suse.de>
    Cc: Alan Stern <stern@rowland.harvard.edu>
    Cc: <stable@vger.kernel.org>
    Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
    Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 51b3938e399bdf0cef090cea7b146c1ba9604ca2
Author: Hans de Goede <hdegoede@redhat.com>
Date:   Thu Apr 26 14:10:24 2018 +0200

    ACPI / LPSS: Add missing prv_offset setting for byt/cht PWM devices
    
    commit fdcb613d49321b5bf5d5a1bd0fba8e7c241dcc70 upstream.
    
    The LPSS PWM device on Bay Trail and Cherry Trail devices has a set
    of private registers at offset 0x800, the current lpss_device_desc for
    them already sets the LPSS_SAVE_CTX flag to have these saved/restored
    over device-suspend, but the current lpss_device_desc was not setting
    the prv_offset field, leading to the regular device registers getting
    saved/restored instead.
    
    This is causing the PWM controller to no longer work, resulting in a black
    screen after a suspend/resume on systems where the firmware clears the
    APB clock and reset bits at offset 0x804.
    
    This commit fixes this by properly setting prv_offset to 0x800 for
    the PWM devices.
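
    Sketch of the descriptor change being described (illustrative; field names
    follow drivers/acpi/acpi_lpss.c):

        static const struct lpss_device_desc byt_pwm_dev_desc = {
                .flags = LPSS_SAVE_CTX,
                .prv_offset = 0x800,    /* private registers, previously left at 0 */
        };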
    
    Cc: stable@vger.kernel.org
    Fixes: e1c748179754 ("ACPI / LPSS: Add Intel BayTrail ACPI mode PWM")
    Fixes: 1bfbd8eb8a7f ("ACPI / LPSS: Add ACPI IDs for Intel Braswell")
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Acked-by: Rafael J . Wysocki <rjw@rjwysocki.net>
    Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
    Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit af3bd8d6a9efcb782d44e537dc391970e0d70fc7
Author: Juergen Gross <jgross@suse.com>
Date:   Thu Aug 9 16:42:16 2018 +0200

    xen/netfront: don't cache skb_shinfo()
    
    commit d472b3a6cf63cd31cae1ed61930f07e6cd6671b5 upstream.
    
    skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache
    its return value.
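
    The hazard, sketched (illustrative, not the exact driver code):

        /* Don't do this: 'shinfo' can be left dangling if the pull reallocates
         * the skb head.
         *
         *     struct skb_shared_info *shinfo = skb_shinfo(skb);
         *     __pskb_pull_tail(skb, pull_to - skb_headlen(skb));
         *     shinfo->nr_frags++;
         *
         * Re-evaluate skb_shinfo() after the call instead:
         */
        __pskb_pull_tail(skb, pull_to - skb_headlen(skb));
        skb_shinfo(skb)->nr_frags++;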
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Reviewed-by: Wei Liu <wei.liu2@citrix.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit fbf12e19c9f13374ded72893f34777c2fa41f75c
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Jan 8 11:51:04 2018 -0800

    Mark HI and TASKLET softirq synchronous
    
    commit 3c53776e29f81719efcf8f7a6e30cdf753bee94d upstream.
    
    Way back in 4.9, we committed 4cd13c21b207 ("softirq: Let ksoftirqd do
    its job"), and ever since we've had small nagging issues with it.  For
    example, we've had:
    
      1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred")
      8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
      217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")
    
    all of which worked around some of the effects of that commit.
    
    The DVB people have also complained that the commit causes excessive USB
    URB latencies, which seems to be due to the USB code using tasklets to
    schedule USB traffic.  This seems to be an issue mainly when already
    living on the edge, but waiting for ksoftirqd to handle it really does
    seem to cause excessive latencies.
    
    Now Hanna Hawa reports that this issue isn't just limited to USB URB and
    DVB, but also causes timeout problems for the Marvell SoC team:
    
     "I'm facing kernel panic issue while running raid 5 on sata disks
      connected to Macchiatobin (Marvell community board with Armada-8040
      SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine
      and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2)
      uses a tasklet to clean the done descriptors from the queue"
    
    The latency problem causes a panic:
    
      mv_xor_v2 f0400000.xor: dma_sync_wait: timeout!
      Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
    
    We've discussed simply just reverting the original commit entirely, and
    also much more involved solutions (with per-softirq threads etc).  This
    patch is intentionally stupid and fairly limited, because the issue
    still remains, and the other solutions either got sidetracked or had
    other issues.
    
    We should probably also consider the timer softirqs to be synchronous
    and not be delayed to ksoftirqd (since they were the issue with the
    earlier watchdog problems), but that should be done as a separate patch.
    This does only the tasklet cases.
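
    The shape of the change in kernel/softirq.c, roughly (abridged sketch):

        /* softirqs that are never deferred to ksoftirqd */
        #define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))

        static bool ksoftirqd_running(unsigned long pending)
        {
                struct task_struct *tsk = __this_cpu_read(ksoftirqd);

                if (pending & SOFTIRQ_NOW_MASK)
                        return false;   /* handle these inline, don't defer */
                return tsk && (tsk->state == TASK_RUNNING);
        }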
    
    Reported-and-tested-by: Hanna Hawa <hannah@marvell.com>
    Reported-and-tested-by: Josef Griebichler <griebichler.josef@gmx.at>
    Reported-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
    Cc: Alan Stern <stern@rowland.harvard.edu>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 50bed434ad9c0d1d41a4b617935ac28c3fc778b8
Author: Andrey Konovalov <andreyknvl@google.com>
Date:   Fri Apr 20 14:55:52 2018 -0700

    kasan: add no_sanitize attribute for clang builds
    
    commit 12c8f25a016dff69ee284aa3338bebfd2cfcba33 upstream.
    
    KASAN uses the __no_sanitize_address macro to disable instrumentation of
    particular functions.  Right now it's defined only for GCC builds, which
    causes false positives when clang is used.
    
    This patch adds a definition for clang.
    
    Note that clang revision 329612 or higher is required.
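
    The added definition amounts to a clang-specific attribute, roughly:

        /* include/linux/compiler-clang.h (sketch) */
        #define __no_sanitize_address __attribute__((no_sanitize("address")))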
    
    [andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check]
      Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com
    Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
    Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: David Woodhouse <dwmw@amazon.co.uk>
    Cc: Andrey Konovalov <andreyknvl@google.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Paul Lawrence <paullawrence@google.com>
    Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
    Cc: Kees Cook <keescook@chromium.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Sodagudi Prasad <psodagud@codeaurora.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 2106b21a8a59acd74cf5e473f71e75fdf03e3b99
Author: John David Anglin <dave.anglin@bell.net>
Date:   Sun Aug 5 13:30:31 2018 -0400

    parisc: Define mb() and add memory barriers to assembler unlock sequences
    
    commit fedb8da96355f5f64353625bf96dc69423ad1826 upstream.
    
    For years I thought all parisc machines executed loads and stores in
    order. However, Jeff Law recently indicated on gcc-patches that this is
    not correct. There are various degrees of out-of-order execution all the
    way back to the PA7xxx processor series (hit-under-miss). The PA8xxx
    series has full out-of-order execution for both integer operations, and
    loads and stores.
    
    This is described in the following article:
    http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml
    
    For this reason, we need to define mb() and to insert a memory barrier
    before the store unlocking spinlocks. This ensures that all memory
    accesses are complete prior to unlocking. The ldcw instruction performs
    the same function on entry.
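
    A sketch of the idea (the patch adds an arch-specific barrier.h and touches
    the assembler unlock paths; the exact form may differ):

        /* arch/parisc/include/asm/barrier.h (sketch) */
        #define synchronize_caches()    __asm__ __volatile__("sync" : : : "memory")
        #define mb()                    synchronize_caches()
        #define rmb()                   mb()
        #define wmb()                   mb()

    with a corresponding "sync" issued in the assembler spinlock-release
    sequences before the store that drops the lock.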
    
    Signed-off-by: John David Anglin <dave.anglin@bell.net>
    Cc: stable@vger.kernel.org # 4.0+
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 5f394c9ef67234750fe277bb9bfdcf99ebee41d8
Author: Helge Deller <deller@gmx.de>
Date:   Sat Jul 28 11:47:17 2018 +0200

    parisc: Enable CONFIG_MLONGCALLS by default
    
    commit 66509a276c8c1d19ee3f661a41b418d101c57d29 upstream.
    
    Enable the -mlong-calls compiler option by default, because otherwise in most
    cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F
    relocations. This fixes building the 64-bit defconfig.
    
    Cc: stable@vger.kernel.org # 4.0+
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1d4167a818e6f4a749b5f6ce54f98176276ed85b
Author: Tadeusz Struk <tadeusz.struk@intel.com>
Date:   Tue May 22 14:37:18 2018 -0700

    tpm: fix race condition in tpm_common_write()
    
    commit 3ab2011ea368ec3433ad49e1b9e1c7b70d2e65df upstream.
    
    There is a race condition in the tpm_common_write() function allowing
    two threads on the same /dev/tpm<N>, or two different applications
    on the same /dev/tpmrm<N>, to overwrite each other's commands/responses.
    Fix this by taking the priv->buffer_mutex early in the function.
    
    Also convert priv->data_pending from an atomic to a regular size_t
    type. There is no need for it to be atomic since it is only touched
    under the protection of the priv->buffer_mutex.
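
    Sketched, the fix takes the lock at the top of the handler and keeps
    data_pending as a plain size_t (field names follow the tpm character
    device code; bounds checks and the transmit path are abridged):

        static ssize_t tpm_common_write(struct file *file, const char __user *buf,
                                        size_t size, loff_t *off)
        {
                struct file_priv *priv = file->private_data;
                ssize_t ret = -EFAULT;

                mutex_lock(&priv->buffer_mutex);   /* taken before any state checks */

                if (priv->data_pending != 0) {     /* plain size_t, only read here */
                        ret = -EBUSY;
                        goto out;
                }
                if (size > sizeof(priv->data_buffer)) {
                        ret = -E2BIG;
                        goto out;
                }
                if (copy_from_user(priv->data_buffer, buf, size))
                        goto out;

                /* transmit the command and record the response length here */
                ret = size;
        out:
                mutex_unlock(&priv->buffer_mutex);
                return ret;
        }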
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Cc: stable@vger.kernel.org
    Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
    Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
    Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 954e572ae2f26ec98a4d8c1c04ab91798ccace75
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Sat Jul 28 08:12:04 2018 -0400

    ext4: fix check to prevent initializing reserved inodes
    
    commit 5012284700775a4e6e3fbe7eac4c543c4874b559 upstream.
    
    Commit 8844618d8aa7: "ext4: only look at the bg_flags field if it is
    valid" will complain if block group zero does not have the
    EXT4_BG_INODE_ZEROED flag set.  Unfortunately, this is not correct,
    since a freshly created file system has this flag cleared.  It gets set
    almost immediately after the file system is mounted read-write --- but
    the following somewhat unlikely sequence will end up triggering a
    false positive report of a corrupted file system:
    
       mkfs.ext4 /dev/vdc
       mount -o ro /dev/vdc /vdc
       mount -o remount,rw /dev/vdc
    
    Instead, when initializing the inode table for block group zero, test
    to make sure that the itable_unused count is not too large, since that is
    the case that will result in some or all of the reserved inodes
    getting cleared.
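
    A hedged sketch of the kind of sanity check described (the macros and
    helpers exist in ext4, but the exact placement and error handling in the
    patch may differ):

        /* block group 0: the first EXT4_FIRST_INO(sb) - 1 inodes are reserved,
         * so itable_unused must not claim more free slots than that leaves */
        if (group == 0 &&
            (EXT4_INODES_PER_GROUP(sb) - ext4_itable_unused_count(sb, gdp)) <
                    EXT4_FIRST_INO(sb)) {
                ext4_error(sb, "reserved inodes would be overwritten; itable_unused=%u",
                           ext4_itable_unused_count(sb, gdp));
                /* treat as corruption instead of initializing reserved inodes */
        }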
    
    This fixes the failures reported by Eric Whitney when running
    generic/230 and generic/231 in the nojournal test case.
    
    Fixes: 8844618d8aa7 ("ext4: only look at the bg_flags field if it is valid")
    Reported-by: Eric Whitney <enwlinux@gmail.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>