commit 3996e9c638b8fe280971dc7f7c1f5baf3a6b4578
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Thu Nov 2 09:54:50 2017 +0100

    Linux 4.13.11

commit 0cdddc6f88f999a240d64127feea404b5cc11acf
Author: Cédric Le Goater <clg@kaod.org>
Date:   Tue Aug 8 11:02:49 2017 +0200

    powerpc/xive: Fix the size of the cpumask used in xive_find_target_in_mask()
    
    commit a9dadc1c512807f955f0799e85830b420da47932 upstream.
    
    When called from xive_irq_startup(), the size of the cpumask can be
    larger than nr_cpu_ids. This can result in a WARN_ON such as:
    
      WARNING: CPU: 10 PID: 1 at ../arch/powerpc/sysdev/xive/common.c:476 xive_find_target_in_mask+0x110/0x2f0
      ...
      NIP [c00000000008a310] xive_find_target_in_mask+0x110/0x2f0
      LR [c00000000008a2e4] xive_find_target_in_mask+0xe4/0x2f0
      Call Trace:
        xive_find_target_in_mask+0x74/0x2f0 (unreliable)
        xive_pick_irq_target.isra.1+0x200/0x230
        xive_irq_startup+0x60/0x180
        irq_startup+0x70/0xd0
        __setup_irq+0x7bc/0x880
        request_threaded_irq+0x14c/0x2c0
        request_event_sources_irqs+0x100/0x180
        __machine_initcall_pseries_init_ras_IRQ+0x104/0x134
        do_one_initcall+0x68/0x1d0
        kernel_init_freeable+0x290/0x374
        kernel_init+0x24/0x170
        ret_from_kernel_thread+0x5c/0x74
    
    This happens because we're being called with our affinity mask set to
    irq_default_affinity. That in turn was populated using
    cpumask_setall(), which sets NR_CPUS worth of bits, not nr_cpu_ids
    worth. Finally cpumask_weight() will return > nr_cpu_ids when passed a
    mask which has > nr_cpu_ids bits set.
    
    Fix it by limiting the value returned by cpumask_weight().
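    
    The clamp described above can be sketched in plain user-space C. This
    is only an illustration, not the kernel patch: nr_cpu_ids_sim,
    mask_weight() and clamped_weight() are hypothetical stand-ins for
    nr_cpu_ids, cpumask_weight() and the limited value, and a single
    64-bit word stands in for a cpumask.
    
    ```c
    #include <stdint.h>

    /* Hypothetical stand-in for the kernel's nr_cpu_ids. */
    static unsigned int nr_cpu_ids_sim = 16;

    /* Count the set bits in a 64-bit mask (plays the role of cpumask_weight()). */
    static unsigned int mask_weight(uint64_t mask)
    {
        unsigned int n = 0;

        while (mask) {
            mask &= mask - 1; /* clear the lowest set bit */
            n++;
        }
        return n;
    }

    /* Never report more CPUs than can actually exist on this system. */
    static unsigned int clamped_weight(uint64_t mask)
    {
        unsigned int w = mask_weight(mask);

        return w < nr_cpu_ids_sim ? w : nr_cpu_ids_sim;
    }
    ```
    
    With every bit set, mask_weight() reports 64 while clamped_weight()
    stays at the simulated nr_cpu_ids.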
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    [mpe: Add change log details on actual cause]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 5ee110383f15b2cef1e47f82cf7e0066274c5269
Author: Guillaume Tucker <guillaume.tucker@collabora.com>
Date:   Mon Aug 21 13:47:43 2017 +0100

    regulator: fan53555: fix I2C device ids
    
    commit fc1111b885437f374ed54aadda44d8b241ebd2a3 upstream.
    
    The device tree nodes all correctly describe the regulators as
    syr827 or syr828, but the I2C device id is currently set to the
    wildcard value of syr82x in the driver.  This causes udev to fail
    to match the driver module with the modalias data from sysfs.
    
    Fix this by replacing the I2C device ids with ones that match the
    device tree descriptions, with syr827 and syr828.  Tested on
    Firefly rk3288 board.  The syr82x id was not used anywhere.
    
    Fixes: e80c47bd738b (regulator: fan53555: Export I2C module alias information)
    Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 20d17a2d1347b3754acfc395c7c57a068fc84d40
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Oct 19 20:51:10 2017 +0800

    ipsec: Fix aborted xfrm policy dump crash
    
    commit 1137b5e2529a8f5ca8ee709288ecba3e68044df2 upstream.
    
    An independent security researcher, Mohamed Ghannam, has reported
    this vulnerability to Beyond Security's SecuriTeam Secure Disclosure
    program.
    
    The xfrm_dump_policy_done function expects xfrm_dump_policy to
    have been called at least once or it will crash.  This can be
    triggered if a dump fails because the target socket's receive
    buffer is full.
    
    This patch fixes it by using the cb->start mechanism to ensure that
    the initialisation is always done regardless of the buffer situation.
    
    Fixes: 12a169e7d8f4 ("ipsec: Put dumpers on the dump list")
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f2aa694b7459f20d37199651bc2a4495a5559502
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Tue Oct 17 21:56:20 2017 +0200

    cfg80211: fix connect/disconnect edge cases
    
    commit 51e13359cd5ea34acc62c90627603352956380af upstream.
    
    If we try to connect while already connected/connecting, but
    this fails, we set ssid_len=0 but leave current_bss hanging,
    leading to errors.
    
    Check all of this better, first of all ensuring that we can't
    try to connect to a different SSID while connected/ing; ensure
    that prev_bssid is set for re-association attempts even in the
    case of the driver supporting the connect() method, and don't
    reset ssid_len in the failure cases.
    
    While at it, also reset ssid_len while disconnecting unless we
    were connected and expect a disconnected event, and warn on a
    successful connection without ssid_len being set.
    
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 8ec0e2194f426897cc56b1338ed305c15b5c6806
Author: Jimmy Assarsson <jimmyassarsson@gmail.com>
Date:   Tue Oct 24 12:23:29 2017 +0200

    can: kvaser_usb: Ignore CMD_FLUSH_QUEUE_REPLY messages
    
    commit e1d2d1329a5722dbecc9c278303fcc4aa01f8790 upstream.
    
    To avoid kernel warning "Unhandled message (68)", ignore the
    CMD_FLUSH_QUEUE_REPLY message for now.
    
    As of Leaf v2 firmware version v4.1.844 (2017-02-15), flush tx queue is
    synchronous. There is a capability bit indicating whether flushing tx
    queue is synchronous or asynchronous.
    
    A proper solution would be to query the device for capabilities. If the
    synchronous tx flush capability bit is set, we should wait for
    CMD_FLUSH_QUEUE_REPLY message, while flushing the tx queue.
    
    Signed-off-by: Jimmy Assarsson <jimmyassarsson@gmail.com>
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1221c0d8ad2eda8e597f32776b748e0fa8b4d01d
Author: Jimmy Assarsson <jimmyassarsson@gmail.com>
Date:   Tue Oct 24 12:23:28 2017 +0200

    can: kvaser_usb: Correct return value in printout
    
    commit 8f65a923e6b628e187d5e791cf49393dd5e8c2f9 upstream.
    
    If the return value from kvaser_usb_send_simple_msg() was non-zero, the
    return value from kvaser_usb_flush_queue() was printed in the kernel
    warning.
    
    Signed-off-by: Jimmy Assarsson <jimmyassarsson@gmail.com>
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f5167f643ca27f9d7ec6bbc08c855d05123f5538
Author: Gerhard Bertelsmann <info@gerhard-bertelsmann.de>
Date:   Thu Aug 17 15:59:49 2017 +0200

    can: sun4i: fix loopback mode
    
    commit 3a379f5b36ae039dfeb6f73316e47ab1af4945df upstream.
    
    Fix loopback mode by setting the right flag and remove presume mode.
    
    Signed-off-by: Gerhard Bertelsmann <info@gerhard-bertelsmann.de>
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 0a2effa4941326d2d9e5d26fc70ebaa1a0e4eca5
Author: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Date:   Tue Oct 24 16:27:28 2017 +0100

    drm/i915/perf: fix perf enable/disable ioctls with 32bits userspace
    
    commit 7277f755048da562eb2489becacd38d0d05e1e06 upstream.
    
    The compat callback was missing and triggered failures in 32-bit
    userspace when enabling/disabling the perf stream. We don't require any
    particular processing here as these ioctls don't take any argument.
    
    Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
    Fixes: eec688e1420 ("drm/i915: Add i915 perf infrastructure")
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171024152728.4873-1-lionel.g.landwerlin@intel.com
    (cherry picked from commit 191f896085cf3b5d85920d58a759da4eea141721)
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 0d74253003e6370e65468f5aec8c969bdef6733e
Author: Rex Zhu <Rex.Zhu@amd.com>
Date:   Fri Oct 20 15:07:41 2017 +0800

    drm/amd/powerplay: fix uninitialized variable
    
    commit 8b95f4f730cba02ef6febbdc4ca7e55ca045b00e upstream.
    
    refresh_rate was not initialized when programming the display gap.
    This patch fixes the VCE ring test failure seen when doing S3
    on Polaris10.
    
    bug: https://bugs.freedesktop.org/show_bug.cgi?id=103102
    bug: https://bugzilla.kernel.org/show_bug.cgi?id=196615
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9970679f497a84b39b8e3f165cd0af465f7bc453
Author: Borislav Petkov <bp@suse.de>
Date:   Sun Oct 22 12:47:31 2017 +0200

    x86/cpu/AMD: Apply the Erratum 688 fix when the BIOS doesn't
    
    commit bfc1168de949cd3e9ca18c3480b5085deff1ea7c upstream.
    
    Some F14h machines have an erratum which, "under a highly specific
    and detailed set of internal timing conditions" can lead to skipping
    instructions and RIP corruption.
    
    Add the fix for those machines when their BIOS doesn't apply it or
    there simply isn't a BIOS update for them.
    
    Tested-by: <mirh@protonmail.ch>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
    Link: http://lkml.kernel.org/r/20171022104731.28249-1-bp@alien8.de
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=197285
    [ Added pr_info() that we activated the workaround. ]
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 0aba1bf48a3da8b1c350c1707de340a11a9c9873
Author: Ben Hutchings <ben.hutchings@codethink.co.uk>
Date:   Sun Oct 15 18:16:33 2017 +0100

    scsi: sg: Re-fix off by one in sg_fill_request_table()
    
    commit 587c3c9f286cee5c9cac38d28c8ae1875f4ec85b upstream.
    
    Commit 109bade9c625 ("scsi: sg: use standard lists for sg_requests")
    introduced an off-by-one error in sg_ioctl(), which was fixed by commit
    bd46fc406b30 ("scsi: sg: off by one in sg_ioctl()").
    
    Unfortunately commit 4759df905a47 ("scsi: sg: factor out
    sg_fill_request_table()") moved that code, and reintroduced the
    bug (perhaps due to a botched rebase).  Fix it again.
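    
    As a generic illustration of this class of bug (not the sg driver code
    itself), the sketch below fills a fixed-size table from a longer
    source. TABLE_SIZE and fill_table() are hypothetical names; the point
    is only that the bound check must use ">=" so the index never reaches
    the table size.
    
    ```c
    #include <stddef.h>

    #define TABLE_SIZE 4 /* hypothetical capacity, standing in for the real limit */

    /* Copy at most TABLE_SIZE values from src into table; returns the count copied. */
    static size_t fill_table(int table[TABLE_SIZE], const int *src, size_t n)
    {
        size_t val = 0;

        for (size_t i = 0; i < n; i++) {
            /* ">" here would let val reach TABLE_SIZE and write one slot too far. */
            if (val >= TABLE_SIZE)
                break;
            table[val++] = src[i];
        }
        return val;
    }
    ```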
    
    Fixes: 4759df905a47 ("scsi: sg: factor out sg_fill_request_table()")
    Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
    Acked-by: Douglas Gilbert <dgilbert@interlog.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ebe378b7f29f3d88153df9cf7a3ed3f4fc596525
Author: Himanshu Madhani <himanshu.madhani@cavium.com>
Date:   Mon Oct 16 11:26:05 2017 -0700

    scsi: qla2xxx: Initialize Work element before requesting IRQs
    
    commit 1010f21ecf8ac43be676d498742de18fa6c20987 upstream.
    
    Commit a9e170e28636 ("scsi: qla2xxx: Fix uninitialized work element")
    moved initialization of the work element earlier in the probe to fix a
    call stack. However, it still leaves a window where an interrupt can be
    generated before the work element is initialized. Close that window by
    initializing the work element before requesting IRQs.
    
    [mkp: fixed typos]
    
    Fixes: a9e170e28636 ("scsi: qla2xxx: Fix uninitialized work element")
    Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
    Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 2d90ae4f0c679439e687bf7def1c9c5504ec5939
Author: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Date:   Mon Oct 16 17:22:31 2017 -0700

    scsi: aacraid: Fix controller initialization failure
    
    commit 45348de2c8a7a1e64c5be27b22c9786b4152dd41 upstream.
    
    This is a fix to an issue where the driver sends its periodic WELLNESS
    command to the controller after the driver shut it down. This causes the
    controller to crash. The window where this can happen is small, but it
    can be hit at around 4 hours of constant resets.
    
    Fixes: fbd185986eba (aacraid: Fix AIF triggered IOP_RESET)
    Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
    Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 75c5541046d43aef74916439e05c5e930bf1dc48
Author: Steffen Maier <maier@linux.vnet.ibm.com>
Date:   Fri Oct 13 15:40:07 2017 +0200

    scsi: zfcp: fix erp_action use-before-initialize in REC action trace
    
    commit ab31fd0ce65ec93828b617123792c1bb7c6dcc42 upstream.
    
    v4.10 commit 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN
    recovery") extended accessing parent pointer fields of struct
    zfcp_erp_action for tracing.  If an erp_action has never been enqueued
    before, these parent pointer fields are uninitialized and NULL. Examples
    are zfcp objects freshly added to the parent object's children list,
    before enqueueing their first recovery subsequently. In
    zfcp_erp_try_rport_unblock(), we iterate such list. Accessing erp_action
    fields can cause a NULL pointer dereference.  Since the kernel can read
    from lowcore on s390, it does not immediately cause a kernel page
    fault. Instead it can cause hangs on trying to acquire the wrong
    erp_action->adapter->dbf->rec_lock in zfcp_dbf_rec_action_lvl()
                          ^bogus^
    while already holding other locks with IRQs disabled.
    
    Real life example from attaching lots of LUNs in parallel on many CPUs:
    
    crash> bt 17723
    PID: 17723  TASK: ...               CPU: 25  COMMAND: "zfcperp0.0.1800"
     LOWCORE INFO:
      -psw      : 0x0404300180000000 0x000000000038e424
      -function : _raw_spin_lock_wait_flags at 38e424
    ...
     #0 [fdde8fc90] zfcp_dbf_rec_action_lvl at 3e0004e9862 [zfcp]
     #1 [fdde8fce8] zfcp_erp_try_rport_unblock at 3e0004dfddc [zfcp]
     #2 [fdde8fd38] zfcp_erp_strategy at 3e0004e0234 [zfcp]
     #3 [fdde8fda8] zfcp_erp_thread at 3e0004e0a12 [zfcp]
     #4 [fdde8fe60] kthread at 173550
     #5 [fdde8feb8] kernel_thread_starter at 10add2
    
    zfcp_adapter
     zfcp_port
      zfcp_unit <address>, 0x404040d600000000
      scsi_device NULL, returning early!
    zfcp_scsi_dev.status = 0x40000000
    0x40000000 ZFCP_STATUS_COMMON_RUNNING
    
    crash> zfcp_unit <address>
    struct zfcp_unit {
      erp_action = {
        adapter = 0x0,
        port = 0x0,
        unit = 0x0,
      },
    }
    
    zfcp_erp_action is always fully embedded into its container object. Such
    container object is never moved in its object tree (only add or delete).
    Hence, erp_action parent pointers can never change.
    
    To fix the issue, initialize the erp_action parent pointers before
    adding the erp_action container to any list and thus before it becomes
    accessible from outside of its initializing function.
    
    In order to also close the time window between zfcp_erp_setup_act()
    memsetting the entire erp_action to zero and setting the parent pointers
    again, drop the memset and instead explicitly initialize individually
    all erp_action fields except for parent pointers. To be extra careful
    not to introduce any other unintended side effect, even keep zeroing the
    erp_action fields for list and timer. Also double-check with
    WARN_ON_ONCE that erp_action parent pointers never change, so we get to
    know when we would deviate from previous behavior.
    
    Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
    Fixes: 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery")
    Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ca6711747c5a1434219ae0bd6745864a618f68b3
Author: David Howells <dhowells@redhat.com>
Date:   Wed Oct 11 23:32:27 2017 +0100

    assoc_array: Fix a buggy node-splitting case
    
    commit ea6789980fdaa610d7eb63602c746bf6ec70cd2b upstream.
    
    This fixes CVE-2017-12193.
    
    Fix a case in the assoc_array implementation in which a new leaf is
    added that needs to go into a node that happens to be full, where the
    existing leaves in that node cluster together at that level to the
    exclusion of the new leaf.
    
    What needs to happen is that the existing leaves get moved out to a new
    node, N1, at level + 1 and the existing node needs replacing with one,
    N0, that has pointers to the new leaf and to N1.
    
    The code that tries to do this gets this wrong in two ways:
    
     (1) The pointer that should've pointed from N0 to N1 is set to point
         recursively to N0 instead.
    
     (2) The backpointer from N0 needs to be set correctly in the case N0 is
         either the root node or reached through a shortcut.
    
    Fix this by removing this path and using the split_node path instead,
    which achieves the same end, but in a more general way (thanks to Eric
    Biggers for spotting the redundancy).
    
    The problem manifests itself as:
    
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      IP: assoc_array_apply_edit+0x59/0xe5
    
    Fixes: 3cb989501c26 ("Add a generic associative array implementation.")
    Reported-and-tested-by: WU Fan <u3536072@connect.hku.hk>
    Signed-off-by: David Howells <dhowells@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1f33b1c5271fb470c9d569e9e3bb13731ca877bd
Author: Steve French <smfrench@gmail.com>
Date:   Wed Oct 25 15:58:31 2017 -0500

    SMB3: Validate negotiate request must always be signed
    
    commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream.
    
    According to MS-SMB2 3.2.55, the validate_negotiate request must
    always be signed. Some Windows versions can fail the request if it is
    sent unsigned.
    
    See kernel bugzilla bug 197311.
    
    Acked-by: Ronnie Sahlberg <lsahlber.redhat.com>
    Signed-off-by: Steve French <smfrench@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b395d4baa286956234f70e0ddb6065eb558afbd4
Author: Steve French <smfrench@gmail.com>
Date:   Mon Sep 25 20:11:58 2017 -0500

    Fix encryption labels and lengths for SMB3.1.1
    
    commit 06e2290844fa408d3295ac03a1647f0798518ebe upstream.
    
    SMB3.1.1 is the most secure and most recent dialect. Fix up labels and
    lengths for SMB3.1.1 signing and encryption.
    
    Signed-off-by: Steve French <smfrench@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 8d3736faa7f7e438fab0cd28a31c0dc4eea552b8
Author: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Date:   Mon Oct 23 16:46:00 2017 -0700

    Input: gtco - fix potential out-of-bound access
    
    commit a50829479f58416a013a4ccca791336af3c584c7 upstream.
    
    parse_hid_report_descriptor() has a while (i < length) loop, which
    only guarantees that there's at least 1 byte in the buffer, but the
    loop body can read multiple bytes which causes out-of-bounds access.
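    
    A minimal sketch of the bounds check this implies, assuming HID-style
    short items (a prefix byte whose low two bits select a payload of 0, 1,
    2 or 4 bytes). count_items() is a hypothetical helper written for
    illustration; it is not the driver's parser and the upstream fix may be
    structured differently.
    
    ```c
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Walk HID-style short items: one prefix byte whose low two bits encode
     * the payload size (0, 1, 2 or 4 bytes), followed by the payload.
     * Returns the number of complete items, or -1 if an item is truncated.
     */
    static int count_items(const uint8_t *buf, size_t length)
    {
        static const size_t payload_size[4] = { 0, 1, 2, 4 };
        size_t i = 0;
        int items = 0;

        while (i < length) {
            size_t size = payload_size[buf[i] & 0x03];

            i++; /* consume the prefix byte */

            /* "i < length" only covered the prefix; check the payload too. */
            if (size > length - i)
                return -1;

            i += size;
            items++;
        }
        return items;
    }
    ```
    
    The subtraction length - i cannot underflow because i never exceeds
    length at that point.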
    
    Reported-by: Andrey Konovalov <andreyknvl@google.com>
    Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
    Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ecf572cb4f8d954b0085231c9ab310b1614e3808
Author: Kai-Heng Feng <kai.heng.feng@canonical.com>
Date:   Tue Oct 24 11:08:18 2017 -0700

    Input: elan_i2c - add ELAN0611 to the ACPI table
    
    commit 57a95b41869b8f0d1949c24df2a9dac1ca7082ee upstream.
    
    ELAN0611 touchpad uses elan_i2c as its driver. It can be found
    on Lenovo ideapad 320-15IKB.
    
    So add it to ACPI table to enable the touchpad.
    
    [Ido Adiv <idoad123@gmail.com> reports that the same ACPI ID is used for
    Elan touchpad in ideapad 520].
    
    BugLink: https://bugs.launchpad.net/bugs/1723736
    Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
    Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 64cc7af317f0bbe38def90b6798d75b6f14293ff
Author: Aurélien Aptel <aaptel@suse.com>
Date:   Wed Oct 11 13:23:36 2017 +0200

    CIFS: Fix NULL pointer deref on SMB2_tcon() failure
    
    commit db3b5474f462e77b82ca1e27627f03c47b622c99 upstream.
    
    If SendReceive2() fails rsp is set to NULL but is dereferenced in the
    error handling code.
    
    Signed-off-by: Aurelien Aptel <aaptel@suse.com>
    Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
    Signed-off-by: Steve French <smfrench@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b232aad2d75146a6fa104b23475a0a16f8d72301
Author: Benjamin Gilbert <benjamin.gilbert@coreos.com>
Date:   Thu Oct 19 13:09:29 2017 -0700

    cifs: Select all required crypto modules
    
    commit 5b454a64555055aaa5769b3ba877bd911d375d5a upstream.
    
    Some dependencies were lost when CIFS_SMB2 was merged into CIFS.
    
    Fixes: 2a38e12053b7 ("[SMB3] Remove ifdef since SMB3 (and later) now STRONGLY preferred")
    Signed-off-by: Benjamin Gilbert <benjamin.gilbert@coreos.com>
    Reviewed-by: Aurelien Aptel <aaptel@suse.com>
    Signed-off-by: Steve French <smfrench@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit bdd4c1859bc4e1f8b52e591ad4d712ce54b30fec
Author: Juergen Gross <jgross@suse.com>
Date:   Thu Oct 26 11:50:56 2017 +0200

    xen: fix booting ballooned down hvm guest
    
    commit 5266b8e4445cc836c46689d80a9ff539fa3bfbda upstream.
    
    Commit 96edd61dcf44362d3ef0bed1a5361e0ac7886a63 ("xen/balloon: don't
    online new memory initially") introduced a regression when booting a
    HVM domain with memory less than mem-max: instead of ballooning down
    immediately the system would try to use the memory up to mem-max
    resulting in Xen crashing the domain.
    
    For HVM domains the current size will be reflected in Xenstore node
    memory/static-max instead of memory/target.
    
    Additionally we have to trigger the ballooning process at once.
    
    Fixes: 96edd61dcf44362d3ef0bed1a5361e0ac7886a63 ("xen/balloon: don't online new memory initially")
    Reported-by: Simon Gaiser <hw42@ipsumj.de>
    Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 298df948fdbd6c659db7a2a68f8be6c58614ab27
Author: Juergen Gross <jgross@suse.com>
Date:   Wed Oct 25 17:08:07 2017 +0200

    xen/gntdev: avoid out of bounds access in case of partial gntdev_mmap()
    
    commit 298d275d4d9bea3524ff4bc76678c140611d8a8d upstream.
    
    If gntdev_mmap() succeeds only partially in mapping grant pages, it
    leaves uninitialized some vital information that is needed later for
    cleanup. This leads to an out-of-bounds array access when unmapping
    the already mapped pages.
    
    So just initialize the data needed for unmapping the pages a little bit
    earlier.
    
    Reported-by: Arthur Borsboom <arthurborsboom@gmail.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9406430877f1c5f2f9d401b28704da5ef5191251
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Wed Oct 25 16:34:27 2017 +0200

    fuse: fix READDIRPLUS skipping an entry
    
    commit c6cdd51404b7ac12dd95173ddfc548c59ecf037f upstream.
    
    Marios Titas running a Haskell program noticed a problem with fuse's
    readdirplus: when it is interrupted by a signal, it skips one directory
    entry.
    
    The reason is that fuse erroneously updates ctx->pos after a failed
    dir_emit().
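    
    The intended ordering can be sketched as follows. This is a user-space
    model, not the fuse code: struct dir_ctx, emit_entry() and emit_all()
    are hypothetical stand-ins for struct dir_context, dir_emit() and the
    readdirplus loop.
    
    ```c
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical stand-in for the kernel's struct dir_context. */
    struct dir_ctx {
        long pos;    /* offset of the next entry to return */
        size_t room; /* how many more entries the caller's buffer can take */
    };

    /* Plays the role of dir_emit(): fails once the caller's buffer is full. */
    static bool emit_entry(struct dir_ctx *ctx)
    {
        if (ctx->room == 0)
            return false;
        ctx->room--;
        return true;
    }

    /* Emit entries with offsets next_off[0..n); returns how many were emitted. */
    static size_t emit_all(struct dir_ctx *ctx, const long *next_off, size_t n)
    {
        size_t i;

        for (i = 0; i < n; i++) {
            if (!emit_entry(ctx))
                break;              /* leave ctx->pos pointing at this entry */
            ctx->pos = next_off[i]; /* advance only after a successful emit */
        }
        return i;
    }
    ```
    
    Because ctx->pos is untouched when the emit fails, the next call
    resumes at the entry that did not fit instead of skipping it.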
    
    The issue originates from the patch adding readdirplus support.
    
    Reported-by: Jakob Unterwurzacher <jakobunt@gmail.com>
    Tested-by: Marios Titas <redneb@gmx.com>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Fixes: 0b05b18381ee ("fuse: implement NFS-like readdirplus support")
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 29fd10fb041b08f225df85a514bd6e2e55d84316
Author: Amir Goldstein <amir73il@gmail.com>
Date:   Tue Oct 24 12:24:11 2017 +0300

    ovl: do not cleanup unsupported index entries
    
    commit fa0096e3bad69ed6f34843fd7ae1c45ca987012a upstream.
    
    With index=on, ovl_indexdir_cleanup() tries to clean up invalid index
    entries (e.g. bad index name). This behavior could result in cleaning of
    entries created by newer kernels and is therefore undesirable.
    Instead, abort mount if such entries are encountered. We still clean up
    'stale' entries and 'orphan' entries; both of those cases can be a result
    of offline changes to lower and upper dirs.
    
    When encountering an index entry of type directory or whiteout, the kernel
    was supposed to fall back to a read-only mount, but the fill_super()
    operation returns EROFS in this case instead of returning success with
    the read-only mount flag, so mount fails when encountering directory or
    whiteout index entries. Bless this behavior by returning -EINVAL on
    directory and whiteout index entries as we do for all unsupported index
    entries.
    
    Fixes: 61b674710cd9 ("ovl: do not cleanup directory and whiteout index..")
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 725b704522357dad35d46cd0a2b26418669fd06f
Author: Amir Goldstein <amir73il@gmail.com>
Date:   Fri Oct 20 17:19:06 2017 +0300

    ovl: handle ENOENT on index lookup
    
    commit 7937a56fdf0b064c2ffa33025210f725a4ebc822 upstream.
    
    Treat ENOENT from index entry lookup the same way as treating a returned
    negative dentry. Apparently, either could be returned if file is not
    found, depending on the underlying file system.
    
    Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin")
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 02a7b62123140fbcd991e59ff8001c26e00d1f0a
Author: Amir Goldstein <amir73il@gmail.com>
Date:   Thu Oct 12 19:03:04 2017 +0300

    ovl: fix EIO from lookup of non-indexed upper
    
    commit 6eaf011144af10cad34c0d46f82e50d382c8e926 upstream.
    
    Commit fbaf94ee3cd5 ("ovl: don't set origin on broken lower hardlink")
    attempts to avoid the condition of a non-indexed upper inode with a lower
    hardlink as origin. If this condition is found, lookup returns EIO.
    
    The protection added by that commit does not cover the case of a lower
    that is not a hardlink when it is copied up (with either index=off/on)
    and is then hardlinked while the overlay is offline.
    
    Changes to lower layer while overlayfs is offline should not result in
    unexpected behavior, so a permanent EIO error after creating a link in
    lower layer should not be considered as correct behavior.
    
    This fix replaces EIO error with success in cases where upper has origin
    but no index is found, or index is found that does not match upper
    inode. In those cases, lookup will not fail and the returned overlay inode
    will be hashed by upper inode instead of by lower origin inode.
    
    Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin")
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit d22011059462aa003e5825a8180f12db9a980e0c
Author: Hirofumi Nakagawa <nklabs@gmail.com>
Date:   Tue Sep 26 03:09:53 2017 +0900

    ovl: add NULL check in ovl_alloc_inode
    
    commit b3885bd6edb41b91a0e3976469f72ae31bfb8d95 upstream.
    
    This was detected by a fault injection test.
    
    Signed-off-by: Hirofumi Nakagawa <nklabs@gmail.com>
    Fixes: 13cf199d0088 ("ovl: allocate an ovl_inode struct")
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 176fab4e08a7a578ff6967a8658fd7429faed58f
Author: Miquel Raynal <miquel.raynal@free-electrons.com>
Date:   Wed Sep 13 18:21:38 2017 +0200

    spi: armada-3700: Fix failing commands with quad-SPI
    
    commit 747e1f60470b975363cbbfcde0c41a3166391be5 upstream.
    
    The A3700 SPI controller datasheet states that only the first line (IO0) is
    used to receive and send instructions, addresses and dummy bytes,
    except for addresses during an RX operation in a quad SPI configuration
    (see p.821 of the Armada-3720-DB datasheet). Otherwise, some commands
    such as SPI NOR commands like READ_FROM_CACHE_DUAL_IO(0xeb) and
    READ_FROM_CACHE_DUAL_IO(0xbb) will fail because these commands must send
    address bytes through the four pins. Data transfer always uses the four
    bytes with this setup.
    
    Thus, in quad SPI configuration, the A3700_SPI_ADDR_PIN bit must be set
    only in this case to inform the controller that it must use the number
    of pins indicated in the {A3700_SPI_DATA_PIN1,A3700_SPI_DATA_PIN0} field
    during the address cycles of an RX operation.
    
    Suggested-by: Ken Ma <make@marvell.com>
    Signed-off-by: Miquel Raynal <miquel.raynal@free-electrons.com>
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c39d070d9d8ce7d6390c10a45695a3c208391e01
Author: Florian Fainelli <f.fainelli@gmail.com>
Date:   Wed Oct 11 14:59:22 2017 -0700

    spi: bcm-qspi: Fix use after free in bcm_qspi_probe() in error path
    
    commit c0368e4db4a3e8a3dce40f3f621c06e14c560d79 upstream.
    
    There was an inversion in how the error path in bcm_qspi_probe() is done
    which would make us trip over a KASAN use-after-free report. Turns out
    that qspi->dev_ids does not get allocated until later in the probe
    process. Fix this by introducing a new label, qspi_resource_err, which
    takes care of cleaning up the SPI master instance.
    
    Fixes: fa236a7ef240 ("spi: bcm-qspi: Add Broadcom MSPI driver")
    Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ccecf863c130fbeaffa2657e4ac56e7f64aa4097
Author: Maxime Chevallier <maxime.chevallier@smile.fr>
Date:   Tue Oct 10 10:43:17 2017 +0200

    spi: a3700: Return correct value on timeout detection
    
    commit 5a866ec0014b2baa4ecbb1eaa19c835482829d08 upstream.
    
    When waiting for transfer completion, a3700_spi_wait_completion
    returns a boolean indicating if a timeout occurred.
    
    The function was returning 'true' every time, failing to detect any
    timeout.
    
    This patch makes it return 'false' when a timeout is reached.
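    
    The return convention can be sketched with a simulated device. The
    names below (struct fake_xfer, xfer_done(), wait_completion()) are
    hypothetical and model polling rather than the driver's actual wait
    mechanism; only the true-on-completion, false-on-timeout contract is
    taken from the description above.
    
    ```c
    #include <stdbool.h>

    /* Hypothetical device model: the transfer completes after `ready_after` polls. */
    struct fake_xfer {
        unsigned int polls;
        unsigned int ready_after;
    };

    static bool xfer_done(struct fake_xfer *x)
    {
        return ++x->polls > x->ready_after;
    }

    /* Returns true if the transfer completed, false if max_polls ran out first. */
    static bool wait_completion(struct fake_xfer *x, unsigned int max_polls)
    {
        while (max_polls--) {
            if (xfer_done(x))
                return true;
        }
        return false; /* timeout: the caller must be able to tell */
    }
    ```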
    
    Signed-off-by: Maxime Chevallier <maxime.chevallier@smile.fr>
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 5388b44287b63629b894053e2cd306e361d134d7
Author: Baruch Siach <baruch@tkos.co.il>
Date:   Sun Sep 10 20:29:45 2017 +0300

    spi: uapi: spidev: add missing ioctl header
    
    commit a2b4a79b88b24c49d98d45a06a014ffd22ada1a4 upstream.
    
    The SPI_IOC_MESSAGE() macro references _IOC_SIZEBITS. Add linux/ioctl.h
    to make sure this macro is defined. This fixes the following build
    failure of lcdproc with the musl libc:
    
    In file included from .../sysroot/usr/include/sys/ioctl.h:7:0,
                     from hd44780-spi.c:31:
    hd44780-spi.c: In function 'spi_transfer':
    hd44780-spi.c:89:24: error: '_IOC_SIZEBITS' undeclared (first use in this function)
      status = ioctl(p->fd, SPI_IOC_MESSAGE(1), &xfer);
                            ^
    
    Signed-off-by: Baruch Siach <baruch@tkos.co.il>
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 2d1b540a885ec8e97dfd2c286d1a98be371d281d
Author: Josef Bacik <jbacik@fb.com>
Date:   Tue Oct 24 15:57:18 2017 -0400

    nbd: handle interrupted sendmsg with a sndtimeo set
    
    commit 32e67a3a06b88904155170560b7a63d372b320bd upstream.
    
    If you do not set sk_sndtimeo you will get -ERESTARTSYS if there is a
    pending signal when you enter sendmsg, which we handle properly.
    However if you set a timeout for your commands we'll set sk_sndtimeo to
    that timeout, which means that sendmsg will start returning -EINTR
    instead of -ERESTARTSYS.  Fix this by checking for either case and doing
    the correct thing.
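    
    The check can be sketched as below. The demo_* helpers and the return
    convention of demo_classify_send() are assumptions for illustration.
    ERESTARTSYS is kernel-internal, so it is defined locally only to let the
    sketch build in userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * ERESTARTSYS is a kernel-internal code that userspace <errno.h> does not
 * export; 512 mirrors the kernel's value and is defined here only so the
 * sketch compiles on its own.
 */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical helper: true when a send was cut short by a signal. */
bool demo_was_interrupted(int result)
{
	return result == -ERESTARTSYS || result == -EINTR;
}

/*
 * Hypothetical caller: classify a sendmsg-style result as
 * 1 = interrupted (retry later), 0 = sent, -1 = hard error.
 */
int demo_classify_send(int result)
{
	if (result >= 0)
		return 0;
	if (demo_was_interrupted(result))
		return 1;
	return -1;
}

/* Usage: both interruption codes map to "retry", other errors do not. */
void demo_check_classification(void)
{
	assert(demo_classify_send(-EINTR) == 1);
	assert(demo_classify_send(-ERESTARTSYS) == 1);
	assert(demo_classify_send(-EIO) == -1);
	assert(demo_classify_send(28) == 0);
}
```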
    
    Fixes: dc88e34d69d8 ("nbd: set sk->sk_sndtimeo for our sockets")
    Reported-and-tested-by: Daniel Xu <dlxu@fb.com>
    Signed-off-by: Josef Bacik <jbacik@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 2f2774b334316734200288c6164c4a57a3cb6de2
Author: Martin Schwidefsky <schwidefsky@de.ibm.com>
Date:   Thu Oct 5 08:29:47 2017 +0200

    s390/kvm: fix detection of guest machine checks
    
    commit 0a5e2ec2647737907d267c09dc9a25fab1468865 upstream.
    
    The new detection code for guest machine checks added a check based
    on %r11 to .Lcleanup_sie to distinguish between normal asynchronous
    interrupts and machine checks. But the function is called from the
    program check handler as well with an undefined value in %r11.
    
    The effect is that all program exceptions pointing to the SIE instruction
    will set the CIF_MCCK_GUEST bit. The bit stays set for the CPU until the
    next machine check comes in which will incorrectly be interpreted as a
    guest machine check.
    
    The simplest fix is to stop using .Lcleanup_sie in the program check
    handler and duplicate a few instructions.
    
    Fixes: c929500d7a5a ("s390/nmi: s390: New low level handling for machine check happening in guest")
    Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit eb836f08f29712eeb564ca8e95913ac1d0ef6d87
Author: Alexey Kardashevskiy <aik@ozlabs.ru>
Date:   Wed Oct 11 16:00:34 2017 +1100

    KVM: PPC: Book3S: Protect kvmppc_gpa_to_ua() with SRCU
    
    commit 8f6a9f0d0604817f7c8d4376fd51718f1bf192ee upstream.
    
    kvmppc_gpa_to_ua() accesses KVM memory slot array via
    srcu_dereference_check() and this produces warnings from RCU like below.
    
    This extends the existing srcu_read_lock/unlock to cover
    kvmppc_gpa_to_ua() as well.
    
    We did not hit this before as this lock is not needed for the realmode
    handlers and hash guests would use the realmode path all the time;
    however the radix guests are always redirected to the virtual mode
    handlers and hence the warning.
    
    [   68.253798] ./include/linux/kvm_host.h:575 suspicious rcu_dereference_check() usage!
    [   68.253799]
                   other info that might help us debug this:
    
    [   68.253802]
                   rcu_scheduler_active = 2, debug_locks = 1
    [   68.253804] 1 lock held by qemu-system-ppc/6413:
    [   68.253806]  #0:  (&vcpu->mutex){+.+.}, at: [<c00800000e3c22f4>] vcpu_load+0x3c/0xc0 [kvm]
    [   68.253826]
                   stack backtrace:
    [   68.253830] CPU: 92 PID: 6413 Comm: qemu-system-ppc Tainted: G        W       4.14.0-rc3-00553-g432dcba58e9c-dirty #72
    [   68.253833] Call Trace:
    [   68.253839] [c000000fd3d9f790] [c000000000b7fcc8] dump_stack+0xe8/0x160 (unreliable)
    [   68.253845] [c000000fd3d9f7d0] [c0000000001924c0] lockdep_rcu_suspicious+0x110/0x180
    [   68.253851] [c000000fd3d9f850] [c0000000000e825c] kvmppc_gpa_to_ua+0x26c/0x2b0
    [   68.253858] [c000000fd3d9f8b0] [c00800000e3e1984] kvmppc_h_put_tce+0x12c/0x2a0 [kvm]
    
    Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
    Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 39418e2c388ffd74703b67d727ed439f6e3fc09f
Author: Nicholas Piggin <npiggin@gmail.com>
Date:   Tue Oct 10 20:18:28 2017 +1000

    KVM: PPC: Book3S HV: POWER9 more doorbell fixes
    
    commit 2cde3716321ec64a1faeaf567bd94100c7b4160f upstream.
    
    - Add another case where msgsync is required.
    - Required barrier sequence for global doorbells is msgsync ; lwsync
    
    When msgsnd is used for IPIs to other cores, msgsync must be executed by
    the target to order stores performed on the source before its msgsnd
    (provided the source executes the appropriate sync).
    
    Fixes: 1704a81ccebc ("KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9")
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 3f3414599f110c8dce4e0b0d51502fc8c07517d7
Author: Greg Kurz <groug@kaod.org>
Date:   Thu Sep 14 23:56:25 2017 +0200

    KVM: PPC: Fix oops when checking KVM_CAP_PPC_HTM
    
    commit ac64115a66c18c01745bbd3c47a36b124e5fd8c0 upstream.
    
    The following program causes a kernel oops:
    
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>
    
    int main(void)
    {
        int fd = open("/dev/kvm", O_RDWR);
        ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_HTM);
        return 0;
    }
    
    This happens because when using the global KVM fd with
    KVM_CHECK_EXTENSION, kvm_vm_ioctl_check_extension() gets
    called with a NULL kvm argument, which gets dereferenced
    in is_kvmppc_hv_enabled(). Spotted while reading the code.
    
    Let's use the hv_enabled fallback variable, like everywhere
    else in this function.
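    
    The fallback pattern can be sketched in userspace C as below. All
    demo_* names, the struct layout and the cpu_has_htm parameter are
    hypothetical stand-ins, not the KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-VM structure. */
struct demo_vm {
	bool hv;
};

/* Hypothetical global: whether the HV backend is available at all. */
bool demo_hv_backend_present;

/* Dereferences vm, so it must never be called with NULL. */
bool demo_vm_is_hv(const struct demo_vm *vm)
{
	assert(vm != NULL);
	return vm->hv;
}

/*
 * Capability check that may be called with vm == NULL (query on the global
 * device fd) or with a real VM (query on a VM fd).
 */
int demo_check_htm(const struct demo_vm *vm, bool cpu_has_htm)
{
	bool hv_enabled;

	if (vm)
		hv_enabled = demo_vm_is_hv(vm);
	else
		hv_enabled = demo_hv_backend_present;	/* fallback, no dereference */

	return cpu_has_htm && hv_enabled;
}
```

    Computing hv_enabled once and using it for every capability keeps the
    NULL case in a single place.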
    
    Fixes: 23528bb21ee2 ("KVM: PPC: Introduce KVM_CAP_PPC_HTM")
    Signed-off-by: Greg Kurz <groug@kaod.org>
    Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 5388b61da09c600e3980c18870c7767116de1482
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Fri Oct 27 20:35:31 2017 -0700

    Fix tracing sample code warning.
    
    commit a0cb2b5c390151837b08e5f7bca4a6ecddbcd39c upstream.
    
    Commit 6575257c60e1 ("tracing/samples: Fix creation and deletion of
    simple_thread_fn creation") introduced a new warning due to using a
    boolean as a counter.
    
    Just make it "int".
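    
    Why the type matters can be sketched as below, assuming a simple
    get/put pair. The demo_* names are hypothetical and no real thread is
    created. A boolean would saturate at 1, so the first put would stop the
    thread while another user still needs it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: how many users asked for the thread, and whether it runs. */
int demo_thread_users;		/* an int, so it can count past 1 */
bool demo_thread_running;

void demo_thread_get(void)
{
	if (demo_thread_users++ == 0)
		demo_thread_running = true;	/* first user starts the thread */
}

void demo_thread_put(void)
{
	assert(demo_thread_users > 0);
	if (--demo_thread_users == 0)
		demo_thread_running = false;	/* last user stops the thread */
}
```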
    
    Fixes: 6575257c60e1 ("tracing/samples: Fix creation and deletion of simple_thread_fn creation")
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ad424492a7aa8a9d089eaad369b52de07f0f8f2a
Author: Jeff Layton <jlayton@redhat.com>
Date:   Thu Oct 19 08:52:58 2017 -0400

    ceph: unlock dangling spinlock in try_flush_caps()
    
    commit 6c2838fbdedb9b72a81c931d49e56b229b6cdbca upstream.
    
    sparse warns:
    
      fs/ceph/caps.c:2042:9: warning: context imbalance in 'try_flush_caps' - wrong count at exit
    
    We need to exit this function with the lock unlocked, but a couple of
    cases leave it locked.
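    
    One common way to keep lock and unlock balanced is a single exit label,
    sketched below. This shows the general idiom only, not the ceph change
    itself. The demo_lock type merely tracks depth so an imbalance becomes
    observable, and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock that only tracks depth, to make imbalance observable. */
struct demo_lock {
	int depth;
};

void demo_lock_acquire(struct demo_lock *l)
{
	assert(l->depth == 0);
	l->depth++;
}

void demo_lock_release(struct demo_lock *l)
{
	assert(l->depth == 1);
	l->depth--;
}

/*
 * Returns 1 if there was something to flush, 0 otherwise. Every path leaves
 * through "out", so the lock is dropped exactly once.
 */
int demo_try_flush(struct demo_lock *l, bool inode_gone, int dirty)
{
	int flushed = 0;

	demo_lock_acquire(l);

	if (inode_gone)
		goto out;	/* early exit still releases the lock */

	if (dirty)
		flushed = 1;

out:
	demo_lock_release(l);
	return flushed;
}
```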
    
    Signed-off-by: Jeff Layton <jlayton@redhat.com>
    Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
    Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 6077f1f01069f9ad0b5f5a03adc82bd9925cfa50
Author: Hui Wang <hui.wang@canonical.com>
Date:   Tue Oct 24 16:53:34 2017 +0800

    ALSA: hda - fix headset mic problem for Dell machines with alc236
    
    commit f265788c336979090ac80b9ae173aa817c4fe40d upstream.
    
    Several Dell laptops use the ALC236 codec, and the headset mic does
    not work on these machines. Following commit 736f20a70, we add the pin
    cfg table to make the headset mic work.
    
    Signed-off-by: Hui Wang <hui.wang@canonical.com>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ce790934cbd261794eef68e53636e6adffc9b6fa
Author: Kailang Yang <kailang@realtek.com>
Date:   Fri Oct 20 15:06:34 2017 +0800

    ALSA: hda/realtek - Add support for ALC236/ALC3204
    
    commit 736f20a7060857ff569e9e9586ae6c1204a73e07 upstream.
    
    Add support for ALC236/ALC3204.
    Add headset mode support for ALC236/ALC3204.
    
    Signed-off-by: Kailang Yang <kailang@realtek.com>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ac307385849db68fbdb4ce9665f25c84c3623089
Author: James Smart <jsmart2021@gmail.com>
Date:   Mon Oct 9 13:39:44 2017 -0700

    nvme-fc: fix iowait hang
    
    commit 8a82dbf19129dde9e6fc9ab25a00dbc7569abe6a upstream.
    
    Add missing iowait head initialization.
    Fix irqsave vs irq: wait_event_lock_irq() doesn't do irq save/restore
    
    Fixes: 36715cf4b366 ("nvme_fc: replace ioabort msleep loop with completion")
    Signed-off-by: James Smart <james.smart@broadcom.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Reviewed-by: Himanshu Madhani <himanshu.madhani@cavium.com>
    Tested-by: Himanshu Madhani <himanshu.madhani@cavium.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit d8b29f286d8925e98927aa4162df1661ef0479bf
Author: Tejun Heo <tj@kernel.org>
Date:   Mon Oct 9 08:04:13 2017 -0700

    workqueue: replace pool->manager_arb mutex with a flag
    
    commit 692b48258dda7c302e777d7d5f4217244478f1f6 upstream.
    
    Josef reported a HARDIRQ-safe -> HARDIRQ-unsafe lock order detected by
    lockdep:
    
     [ 1270.472259] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
     [ 1270.472783] 4.14.0-rc1-xfstests-12888-g76833e8 #110 Not tainted
     [ 1270.473240] -----------------------------------------------------
     [ 1270.473710] kworker/u5:2/5157 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
     [ 1270.474239]  (&(&lock->wait_lock)->rlock){+.+.}, at: [<ffffffff8da253d2>] __mutex_unlock_slowpath+0xa2/0x280
     [ 1270.474994]
     [ 1270.474994] and this task is already holding:
     [ 1270.475440]  (&pool->lock/1){-.-.}, at: [<ffffffff8d2992f6>] worker_thread+0x366/0x3c0
     [ 1270.476046] which would create a new lock dependency:
     [ 1270.476436]  (&pool->lock/1){-.-.} -> (&(&lock->wait_lock)->rlock){+.+.}
     [ 1270.476949]
     [ 1270.476949] but this new dependency connects a HARDIRQ-irq-safe lock:
     [ 1270.477553]  (&pool->lock/1){-.-.}
     ...
     [ 1270.488900] to a HARDIRQ-irq-unsafe lock:
     [ 1270.489327]  (&(&lock->wait_lock)->rlock){+.+.}
     ...
     [ 1270.494735]  Possible interrupt unsafe locking scenario:
     [ 1270.494735]
     [ 1270.495250]        CPU0                    CPU1
     [ 1270.495600]        ----                    ----
     [ 1270.495947]   lock(&(&lock->wait_lock)->rlock);
     [ 1270.496295]                                local_irq_disable();
     [ 1270.496753]                                lock(&pool->lock/1);
     [ 1270.497205]                                lock(&(&lock->wait_lock)->rlock);
     [ 1270.497744]   <Interrupt>
     [ 1270.497948]     lock(&pool->lock/1);
    
    , which will cause an irq inversion deadlock if the above lock scenario
    happens.
    
    The root cause of this safe -> unsafe lock order is the
    mutex_unlock(pool->manager_arb) in manage_workers() with pool->lock
    held.
    
    Unlocking a mutex while holding an irq spinlock was never safe. This
    problem has been around forever but it never got noticed because the
    mutex is usually only trylocked while holding the irq lock, making
    actual failures very unlikely, and the lockdep annotation missed the
    condition until the recent b9c16a0e1f73 ("locking/mutex: Fix
    lockdep_assert_held() fail").
    
    Using a mutex for pool->manager_arb has always been a bit of a stretch.
    It is primarily a mechanism to arbitrate managership between workers,
    which can easily be done with a pool flag.  The only reason it became
    a mutex is that the pool destruction path wants to exclude parallel
    managing operations.
    
    This patch replaces the mutex with a new pool flag POOL_MANAGER_ACTIVE
    and makes the destruction path wait for the current manager on a wait
    queue.
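    
    Flag-based arbitration can be sketched single-threaded as below. Only
    the name POOL_MANAGER_ACTIVE comes from this changelog. Its bit value,
    the demo_pool struct and the demo_* helpers are assumptions, and the
    real lock and wait queue are reduced to comments and a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag bit; the name comes from the commit text, the value is an assumption. */
#define POOL_MANAGER_ACTIVE (1u << 0)

/* Hypothetical, simplified pool: the real one also has a lock and a wait queue. */
struct demo_pool {
	unsigned int flags;	/* assumed to be protected by the pool lock */
	int waiters_woken;	/* stands in for waking the destruction path */
};

/* Called with the (notional) pool lock held. Returns false if someone else manages. */
bool demo_manager_begin(struct demo_pool *pool)
{
	if (pool->flags & POOL_MANAGER_ACTIVE)
		return false;
	pool->flags |= POOL_MANAGER_ACTIVE;
	return true;
}

/* Called with the (notional) pool lock held; no mutex is unlocked here. */
void demo_manager_end(struct demo_pool *pool)
{
	assert(pool->flags & POOL_MANAGER_ACTIVE);
	pool->flags &= ~POOL_MANAGER_ACTIVE;
	pool->waiters_woken++;	/* a real pool would wake its wait queue */
}
```

    Because the flag is only read and written under the pool lock, giving up
    managership no longer involves a mutex unlock inside an irq-disabled
    section.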
    
    v2: Drop unnecessary flag clearing before pool destruction as
        suggested by Boqun.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reported-by: Josef Bacik <josef@toxicpanda.com>
    Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>