commit 94c67449c7550597e86f15cce923055f7b2c8e09
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Sat Jul 28 07:49:14 2018 +0200

    Linux 4.9.116

commit b9dd13488acb680c5f141f5c077880317c9749cf
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Thu Dec 14 15:32:41 2017 -0800

    exec: avoid gcc-8 warning for get_task_comm
    
    commit 3756f6401c302617c5e091081ca4d26ab604bec5 upstream.
    
    gcc-8 warns about using strncpy() with the source size as the limit:
    
      fs/exec.c:1223:32: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
    
    This is indeed slightly suspicious, as it protects us from source
    arguments without NUL-termination, but does not guarantee that the
    destination is terminated.
    
    This keeps the strncpy() to ensure we have properly padded target
    buffer, but ensures that we use the correct length, by passing the
    actual length of the destination buffer as well as adding a build-time
    check to ensure it is exactly TASK_COMM_LEN.
    
    There are only 23 callsites which I all reviewed to ensure this is
    currently the case.  We could get away with doing only the check or
    passing the right length, but it doesn't hurt to do both.
    
    Link: http://lkml.kernel.org/r/20171205151724.1764896-1-arnd@arndb.de
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Suggested-by: Kees Cook <keescook@chromium.org>
    Acked-by: Kees Cook <keescook@chromium.org>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Serge Hallyn <serge@hallyn.com>
    Cc: James Morris <james.l.morris@oracle.com>
    Cc: Aleksa Sarai <asarai@suse.de>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Frederic Weisbecker <frederic@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b1a1d9bdb1b5123924956f069920bf4b9c4b11c4
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Thu Jul 26 10:13:22 2018 +0200

    turn off -Wattribute-alias
    
    Starting with gcc-8.1, we get a warning about all system call definitions,
    which use an alias between functions with incompatible prototypes, e.g.:
    
    In file included from ../mm/process_vm_access.c:19:
    ../include/linux/syscalls.h:211:18: warning: 'sys_process_vm_readv' alias between functions of incompatible types 'long int(pid_t,  const struct iovec *, long unsigned int,  const struct iovec *, long unsigned int,  long unsigned int)' {aka 'long int(int,  const struct iovec *, long unsigned int,  const struct iovec *, long unsigned int,  long unsigned int)'} and 'long int(long int,  long int,  long int,  long int,  long int,  long int)' [-Wattribute-alias]
      asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
                      ^~~
    ../include/linux/syscalls.h:207:2: note: in expansion of macro '__SYSCALL_DEFINEx'
      __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
      ^~~~~~~~~~~~~~~~~
    ../include/linux/syscalls.h:201:36: note: in expansion of macro 'SYSCALL_DEFINEx'
     #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
                                        ^~~~~~~~~~~~~~~
    ../mm/process_vm_access.c:300:1: note: in expansion of macro 'SYSCALL_DEFINE6'
     SYSCALL_DEFINE6(process_vm_readv, pid_t, pid, const struct iovec __user *, lvec,
     ^~~~~~~~~~~~~~~
    ../include/linux/syscalls.h:215:18: note: aliased declaration here
      asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
                      ^~~
    ../include/linux/syscalls.h:207:2: note: in expansion of macro '__SYSCALL_DEFINEx'
      __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
      ^~~~~~~~~~~~~~~~~
    ../include/linux/syscalls.h:201:36: note: in expansion of macro 'SYSCALL_DEFINEx'
     #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
                                        ^~~~~~~~~~~~~~~
    ../mm/process_vm_access.c:300:1: note: in expansion of macro 'SYSCALL_DEFINE6'
     SYSCALL_DEFINE6(process_vm_readv, pid_t, pid, const struct iovec __user *, lvec,
    
    This is really noisy and does not indicate a real problem. In the latest
    mainline kernel, this was addressed by commit bee20031772a ("disable
    -Wattribute-alias warning for SYSCALL_DEFINEx()"), which seems too invasive
    to backport.
    
    This takes a much simpler approach and just disables the warning across the
    kernel.
    
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b2019f0f7021b5ca9da3173ace7fcf97aa1d4923
Author: Anssi Hannula <anssi.hannula@bitwise.fi>
Date:   Mon Feb 26 14:27:13 2018 +0200

    can: xilinx_can: fix RX overflow interrupt not being enabled
    
    commit 83997997252f5d3fc7f04abc24a89600c2b504ab upstream.
    
    RX overflow interrupt (RXOFLW) is disabled even though xcan_interrupt()
    processes it. This means that an RX overflow interrupt will only be
    processed when another interrupt gets asserted (e.g. for RX/TX).
    
    Fix that by enabling the RXOFLW interrupt.
    
    Fixes: b1201e44f50b ("can: xilinx CAN controller support")
    Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
    Cc: Michal Simek <michal.simek@xilinx.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9f7308434ed6922545e00562a9a6ccd0aa3759da
Author: Anssi Hannula <anssi.hannula@bitwise.fi>
Date:   Mon Feb 26 14:39:59 2018 +0200

    can: xilinx_can: fix incorrect clear of non-processed interrupts
    
    commit 2f4f0f338cf453bfcdbcf089e177c16f35f023c8 upstream.
    
    xcan_interrupt() clears ERROR|RXOFLV|BSOFF|ARBLST interrupts if any of
    them is asserted. This does not take into account that some of them
    could have been asserted between interrupt status read and interrupt
    clear, therefore clearing them without handling them.
    
    Fix the code to only clear those interrupts that it knows are asserted
    and therefore going to be processed in xcan_err_interrupt().
    
    Fixes: b1201e44f50b ("can: xilinx CAN controller support")
    Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
    Cc: Michal Simek <michal.simek@xilinx.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit bee7ff7eaade2f51d0a4b485642e06b6d2b2d125
Author: Anssi Hannula <anssi.hannula@bitwise.fi>
Date:   Thu Feb 23 14:50:03 2017 +0200

    can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
    
    commit 620050d9c2be15c47017ba95efe59e0832e99a56 upstream.
    
    The xilinx_can driver assumes that the TXOK interrupt only clears after
    it has been acknowledged as many times as there have been successfully
    sent frames.
    
    However, the documentation does not mention such behavior, instead
    saying just that the interrupt is cleared when the clear bit is set.
    
    Similarly, testing seems to also suggest that it is immediately cleared
    regardless of the amount of frames having been sent. Performing some
    heavy TX load and then going back to idle has the tx_head drifting
    further away from tx_tail over time, steadily reducing the amount of
    frames the driver keeps in the TX FIFO (but not to zero, as the TXOK
    interrupt always frees up space for 1 frame from the driver's
    perspective, so frames continue to be sent) and delaying the local echo
    frames.
    
    The TX FIFO tracking is also otherwise buggy as it does not account for
    TX FIFO being cleared after software resets, causing
      BUG!, TX FIFO full when queue awake!
    messages to be output.
    
    There does not seem to be any way to accurately track the state of the
    TX FIFO for local echo support while using the full TX FIFO.
    
    The Zynq version of the HW (but not the soft-AXI version) has watermark
    programming support and with it an additional TX-FIFO-empty interrupt
    bit.
    
    Modify the driver to only put 1 frame into TX FIFO at a time on soft-AXI
    and 2 frames at a time on Zynq. On Zynq the TXFEMP interrupt bit is used
    to detect whether 1 or 2 frames have been sent at interrupt processing
    time.
    
    Tested with the integrated CAN on Zynq-7000 SoC. The 1-frame-FIFO mode
    was also tested.
    
    An alternative way to solve this would be to drop local echo support but
    keep using the full TX FIFO.
    
    v2: Add FIFO space check before TX queue wake with locking to
    synchronize with queue stop. This avoids waking the queue when xmit()
    had just filled it.
    
    v3: Keep local echo support and reduce the amount of frames in FIFO
    instead as suggested by Marc Kleine-Budde.
    
    Fixes: b1201e44f50b ("can: xilinx CAN controller support")
    Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1fd9fa57c1d97938b877018324809bafd59e15f5
Author: Anssi Hannula <anssi.hannula@bitwise.fi>
Date:   Tue Feb 7 13:23:04 2017 +0200

    can: xilinx_can: fix device dropping off bus on RX overrun
    
    commit 2574fe54515ed3487405de329e4e9f13d7098c10 upstream.
    
    The xilinx_can driver performs a software reset when an RX overrun is
    detected. This causes the device to enter Configuration mode where no
    messages are received or transmitted.
    
    The documentation does not mention any need to perform a reset on an RX
    overrun, and testing by inducing an RX overflow also indicated that the
    device continues to work just fine without a reset.
    
    Remove the software reset.
    
    Tested with the integrated CAN on Zynq-7000 SoC.
    
    Fixes: b1201e44f50b ("can: xilinx CAN controller support")
    Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c98f577204b4bdc46453c2f64fa760fa17721046
Author: Anssi Hannula <anssi.hannula@bitwise.fi>
Date:   Wed Feb 8 13:13:40 2017 +0200

    can: xilinx_can: fix recovery from error states not being propagated
    
    commit 877e0b75947e2c7acf5624331bb17ceb093c98ae upstream.
    
    The xilinx_can driver contains no mechanism for propagating recovery
    from CAN_STATE_ERROR_WARNING and CAN_STATE_ERROR_PASSIVE.
    
    Add such a mechanism by factoring the handling of
    XCAN_STATE_ERROR_PASSIVE and XCAN_STATE_ERROR_WARNING out of
    xcan_err_interrupt and checking for recovery after RX and TX if the
    interface is in one of those states.
    
    Tested with the integrated CAN on Zynq-7000 SoC.
    
    Fixes: b1201e44f50b ("can: xilinx CAN controller support")
    Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1fadfbd9f593903e84f66d4e6902193886c5de1c
Author: Anssi Hannula <anssi.hannula@bitwise.fi>
Date:   Thu May 17 15:41:19 2018 +0300

    can: xilinx_can: fix power management handling
    
    commit 8ebd83bdb027f29870d96649dba18b91581ea829 upstream.
    
    There are several issues with the suspend/resume handling code of the
    driver:
    
    - The device is attached and detached in the runtime_suspend() and
      runtime_resume() callbacks if the interface is running. However,
      during xcan_chip_start() the interface is considered running,
      causing the resume handler to incorrectly call netif_start_queue()
      at the beginning of xcan_chip_start(), and on xcan_chip_start() error
      return the suspend handler detaches the device leaving the user
      unable to bring-up the device anymore.
    
    - The device is not brought properly up on system resume. A reset is
      done and the code tries to determine the bus state after that.
      However, after reset the device is always in Configuration mode
      (down), so the state checking code does not make sense and
      communication will also not work.
    
    - The suspend callback tries to set the device to sleep mode (low-power
      mode which monitors the bus and brings the device back to normal mode
      on activity), but then immediately disables the clocks (possibly
      before the device reaches the sleep mode), which does not make sense
      to me. If a clean shutdown is wanted before disabling clocks, we can
      just bring it down completely instead of only sleep mode.
    
    Reorganize the PM code so that only the clock logic remains in the
    runtime PM callbacks and the system PM callbacks contain the device
    bring-up/down logic. This makes calling the runtime PM callbacks during
    e.g. xcan_chip_start() safe.
    
    The system PM callbacks now simply call common code to start/stop the
    HW if the interface was running, replacing the broken code from before.
    
    xcan_chip_stop() is updated to use the common reset code so that it will
    wait for the reset to complete. Reset also disables all interrupts so do
    not do that separately.
    
    Also, the device_may_wakeup() checks are removed as the driver does not
    have wakeup support.
    
    Tested on Zynq-7000 integrated CAN.
    
    Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
    Cc: Michal Simek <michal.simek@xilinx.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit de2219a86c0ba1d07e16e81011b0749b867f76c5
Author: Anssi Hannula <anssi.hannula@bitwise.fi>
Date:   Tue Feb 7 17:01:14 2017 +0200

    can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK
    
    commit 32852c561bffd613d4ed7ec464b1e03e1b7b6c5c upstream.
    
    If the device gets into a state where RXNEMP (RX FIFO not empty)
    interrupt is asserted without RXOK (new frame received successfully)
    interrupt being asserted, xcan_rx_poll() will continue to try to clear
    RXNEMP without actually reading frames from RX FIFO. If the RX FIFO is
    not empty, the interrupt will not be cleared and napi_schedule() will
    just be called again.
    
    This situation can occur when:
    
    (a) xcan_rx() returns without reading RX FIFO due to an error condition.
    The code tries to clear both RXOK and RXNEMP but RXNEMP will not clear
    due to a frame still being in the FIFO. The frame will never be read
    from the FIFO as RXOK is no longer set.
    
    (b) A frame is received between xcan_rx_poll() reading interrupt status
    and clearing RXOK. RXOK will be cleared, but RXNEMP will again remain
    set as the new message is still in the FIFO.
    
    I'm able to trigger case (b) by flooding the bus with frames under load.
    
    There does not seem to be any benefit in using both RXNEMP and RXOK in
    the way the driver does, and the polling example in the reference manual
    (UG585 v1.10 18.3.7 Read Messages from RxFIFO) also says that either
    RXOK or RXNEMP can be used for detecting incoming messages.
    
    Fix the issue and simplify the RX processing by only using RXNEMP
    without RXOK.
    
    Tested with the integrated CAN on Zynq-7000 SoC.
    
    Fixes: b1201e44f50b ("can: xilinx CAN controller support")
    Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit bf0070e2f56eb3da708978da5dab5d36bbed1bf6
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Tue Jul 10 14:51:33 2018 +0200

    driver core: Partially revert "driver core: correct device's shutdown order"
    
    commit 722e5f2b1eec7de61117b7c0a7914761e3da2eda upstream.
    
    Commit 52cdbdd49853 (driver core: correct device's shutdown order)
    introduced a regression by breaking device shutdown on some systems.
    
    Namely, the devices_kset_move_last() call in really_probe() added by
    that commit is a mistake as it may cause parents to follow children
    in the devices_kset list which then causes shutdown to fail.  For
    example, if a device has children before really_probe() is called
    for it (which is not uncommon), that call will cause it to be
    reordered after the children in the devices_kset list and the
    ordering of that list will not reflect the correct device shutdown
    order any more.
    
    Also it causes the devices_kset list to be constantly reordered
    until all drivers have been probed which is totally pointless
    overhead in the majority of cases and it only covered an issue
    with system shutdown, while system-wide suspend/resume potentially
    had the same issue on the affected platforms (which was not covered).
    
    Moreover, the shutdown issue originally addressed by the change in
    really_probe() made by commit 52cdbdd49853 is not present in 4.18-rc
    any more, since dra7 started to use the sdhci-omap driver which
    doesn't disable any regulators during shutdown, so the really_probe()
    part of commit 52cdbdd49853 can be safely reverted.  [The original
    issue was related to the omap_hsmmc driver used by dra7 previously.]
    
    For the above reasons, revert the really_probe() modifications made
    by commit 52cdbdd49853.
    
    The other code changes made by commit 52cdbdd49853 are useful and
    they need not be reverted.
    
    Fixes: 52cdbdd49853 (driver core: correct device's shutdown order)
    Link: https://lore.kernel.org/lkml/CAFgQCTt7VfqM=UyCnvNFxrSw8Z6cUtAi3HUwR4_xPAc03SgHjQ@mail.gmail.com/
    Reported-by: Pingfan Liu <kernelfans@gmail.com>
    Tested-by: Pingfan Liu <kernelfans@gmail.com>
    Reviewed-by: Kishon Vijay Abraham I <kishon@ti.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Cc: stable <stable@vger.kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9e10043b6bdcc1a991a029de8a0bc745950345c6
Author: Jerry Zhang <zhangjerry@google.com>
Date:   Mon Jul 2 12:48:08 2018 -0700

    usb: gadget: f_fs: Only return delayed status when len is 0
    
    commit 4d644abf25698362bd33d17c9ddc8f7122c30f17 upstream.
    
    Commit 1b9ba000 ("Allow function drivers to pause control
    transfers") states that USB_GADGET_DELAYED_STATUS is only
    supported if data phase is 0 bytes.
    
    It seems that when the length is not 0 bytes, there is no
    need to explicitly delay the data stage since the transfer
    is not completed until the user responds. However, when the
    length is 0, there is no data stage and the transfer is
    finished once setup() returns, hence there is a need to
    explicitly delay completion.
    
    This manifests as the following bugs:
    
    Prior to 946ef68ad4e4 ('Let setup() return
    USB_GADGET_DELAYED_STATUS'), when setup is 0 bytes, ffs
    would require user to queue a 0 byte request in order to
    clear setup state. However, that 0 byte request was actually
    not needed and would hang and cause errors in other setup
    requests.
    
    After the above commit, 0 byte setups work since the gadget
    now accepts empty queues to ep0 to clear the delay, but all
    other setups hang.
    
    Fixes: 946ef68ad4e4 ("Let setup() return USB_GADGET_DELAYED_STATUS")
    Signed-off-by: Jerry Zhang <zhangjerry@google.com>
    Cc: stable <stable@vger.kernel.org>
    Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e2996cf59ebf6564002f6bdf9ebb16336cae62e1
Author: Bin Liu <b-liu@ti.com>
Date:   Thu Jul 19 14:39:37 2018 -0500

    usb: core: handle hub C_PORT_OVER_CURRENT condition
    
    commit 249a32b7eeb3edb6897dd38f89651a62163ac4ed upstream.
    
    Based on USB2.0 Spec Section 11.12.5,
    
      "If a hub has per-port power switching and per-port current limiting,
      an over-current on one port may still cause the power on another port
      to fall below specific minimums. In this case, the affected port is
      placed in the Power-Off state and C_PORT_OVER_CURRENT is set for the
      port, but PORT_OVER_CURRENT is not set."
    
    so let's check C_PORT_OVER_CURRENT too for over current condition.
    
    Fixes: 08d1dec6f405 ("usb:hub set hub->change_bits when over-current happens")
    Cc: <stable@vger.kernel.org>
    Tested-by: Alessandro Antenucci <antenucci@korg.it>
    Signed-off-by: Bin Liu <b-liu@ti.com>
    Acked-by: Alan Stern <stern@rowland.harvard.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b0bd06a4757e4356569c2a73bc535f090dec5fa8
Author: Lubomir Rintel <lkundrak@v3.sk>
Date:   Tue Jul 10 08:28:49 2018 +0200

    usb: cdc_acm: Add quirk for Castles VEGA3000
    
    commit 1445cbe476fc3dd09c0b380b206526a49403c071 upstream.
    
    The device (a POS terminal) implements CDC ACM, but has not union
    descriptor.
    
    Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
    Acked-by: Oliver Neukum <oneukum@suse.com>
    Cc: stable <stable@vger.kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 94623c7463f3424776408df2733012c42b52395a
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Jul 23 09:28:20 2018 -0700

    tcp: call tcp_drop() from tcp_data_queue_ofo()
    
    [ Upstream commit 8541b21e781a22dce52a74fef0b9bed00404a1cd ]
    
    In order to be able to give better diagnostics and detect
    malicious traffic, we need to have better sk->sk_drops tracking.
    
    Fixes: 9f5afeae5152 ("tcp: use an RB tree for ooo receive queue")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Acked-by: Yuchung Cheng <ycheng@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a878681484a0992ee3dfbd7826439951f9f82a69
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Jul 23 09:28:19 2018 -0700

    tcp: detect malicious patterns in tcp_collapse_ofo_queue()
    
    [ Upstream commit 3d4bf93ac12003f9b8e1e2de37fe27983deebdcf ]
    
    In case an attacker feeds tiny packets completely out of order,
    tcp_collapse_ofo_queue() might scan the whole rb-tree, performing
    expensive copies, but not changing socket memory usage at all.
    
    1) Do not attempt to collapse tiny skbs.
    2) Add logic to exit early when too many tiny skbs are detected.
    
    We prefer not doing aggressive collapsing (which copies packets)
    for pathological flows, and revert to tcp_prune_ofo_queue() which
    will be less expensive.
    
    In the future, we might add the possibility of terminating flows
    that are proven to be malicious.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit fdf258ed5dd85b57cf0e0e66500be98d38d42d02
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Jul 23 09:28:18 2018 -0700

    tcp: avoid collapses in tcp_prune_queue() if possible
    
    [ Upstream commit f4a3313d8e2ca9fd8d8f45e40a2903ba782607e7 ]
    
    Right after a TCP flow is created, receiving tiny out of order
    packets allways hit the condition :
    
    if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
            tcp_clamp_window(sk);
    
    tcp_clamp_window() increases sk_rcvbuf to match sk_rmem_alloc
    (guarded by tcp_rmem[2])
    
    Calling tcp_collapse_ofo_queue() in this case is not useful,
    and offers a O(N^2) surface attack to malicious peers.
    
    Better not attempt anything before full queue capacity is reached,
    forcing attacker to spend lots of resource and allow us to more
    easily detect the abuse.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Acked-by: Yuchung Cheng <ycheng@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 2d08921c8da26bdce3d8848ef6f32068f594d7d4
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Jul 23 09:28:17 2018 -0700

    tcp: free batches of packets in tcp_prune_ofo_queue()
    
    [ Upstream commit 72cd43ba64fc172a443410ce01645895850844c8 ]
    
    Juha-Matti Tilli reported that malicious peers could inject tiny
    packets in out_of_order_queue, forcing very expensive calls
    to tcp_collapse_ofo_queue() and tcp_prune_ofo_queue() for
    every incoming packet. out_of_order_queue rb-tree can contain
    thousands of nodes, iterating over all of them is not nice.
    
    Before linux-4.9, we would have pruned all packets in ofo_queue
    in one go, every XXXX packets. XXXX depends on sk_rcvbuf and skbs
    truesize, but is about 7000 packets with tcp_rmem[2] default of 6 MB.
    
    Since we plan to increase tcp_rmem[2] in the future to cope with
    modern BDP, can not revert to the old behavior, without great pain.
    
    Strategy taken in this patch is to purge ~12.5 % of the queue capacity.
    
    Fixes: 36a6503fedda ("tcp: refine tcp_prune_ofo_queue() to not drop all packets")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Juha-Matti Tilli <juha-matti.tilli@iki.fi>
    Acked-by: Yuchung Cheng <ycheng@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 8736711f4e55a50e0ec89edd366256f8edb5f9c3
Author: Yuchung Cheng <ycheng@google.com>
Date:   Wed Jul 18 13:56:36 2018 -0700

    tcp: do not delay ACK in DCTCP upon CE status change
    
    [ Upstream commit a0496ef2c23b3b180902dd185d0d63ccbc624cf8 ]
    
    Per DCTCP RFC8257 (Section 3.2) the ACK reflecting the CE status change
    has to be sent immediately so the sender can respond quickly:
    
    """ When receiving packets, the CE codepoint MUST be processed as follows:
    
       1.  If the CE codepoint is set and DCTCP.CE is false, set DCTCP.CE to
           true and send an immediate ACK.
    
       2.  If the CE codepoint is not set and DCTCP.CE is true, set DCTCP.CE
           to false and send an immediate ACK.
    """
    
    Previously DCTCP implementation may continue to delay the ACK. This
    patch fixes that to implement the RFC by forcing an immediate ACK.
    
    Tested with this packetdrill script provided by Larry Brakmo
    
    0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
    0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
    0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
    0.000 bind(3, ..., ...) = 0
    0.000 listen(3, 1) = 0
    
    0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
    0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
    0.110 < [ect0] . 1:1(0) ack 1 win 257
    0.200 accept(3, ..., ...) = 4
       +0 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
    
    0.200 < [ect0] . 1:1001(1000) ack 1 win 257
    0.200 > [ect01] . 1:1(0) ack 1001
    
    0.200 write(4, ..., 1) = 1
    0.200 > [ect01] P. 1:2(1) ack 1001
    
    0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
    +0.005 < [ce] . 2001:3001(1000) ack 2 win 257
    
    +0.000 > [ect01] . 2:2(0) ack 2001
    // Previously the ACK below would be delayed by 40ms
    +0.000 > [ect01] E. 2:2(0) ack 3001
    
    +0.500 < F. 9501:9501(0) ack 4 win 257
    
    Signed-off-by: Yuchung Cheng <ycheng@google.com>
    Acked-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 57ec8824b14d76089d1fd0d963ccc2d084070e57
Author: Yuchung Cheng <ycheng@google.com>
Date:   Wed Jul 18 13:56:35 2018 -0700

    tcp: do not cancel delay-AcK on DCTCP special ACK
    
    [ Upstream commit 27cde44a259c380a3c09066fc4b42de7dde9b1ad ]
    
    Currently when a DCTCP receiver delays an ACK and receive a
    data packet with a different CE mark from the previous one's, it
    sends two immediate ACKs acking previous and latest sequences
    respectly (for ECN accounting).
    
    Previously sending the first ACK may mark off the delayed ACK timer
    (tcp_event_ack_sent). This may subsequently prevent sending the
    second ACK to acknowledge the latest sequence (tcp_ack_snd_check).
    The culprit is that tcp_send_ack() assumes it always acknowleges
    the latest sequence, which is not true for the first special ACK.
    
    The fix is to not make the assumption in tcp_send_ack and check the
    actual ack sequence before cancelling the delayed ACK. Further it's
    safer to pass the ack sequence number as a local variable into
    tcp_send_ack routine, instead of intercepting tp->rcv_nxt to avoid
    future bugs like this.
    
    Reported-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: Yuchung Cheng <ycheng@google.com>
    Acked-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1fcccc57866bd5c31ec09df7accdfecb0d222a85
Author: Yuchung Cheng <ycheng@google.com>
Date:   Wed Jul 18 13:56:34 2018 -0700

    tcp: helpers to send special DCTCP ack
    
    [ Upstream commit 2987babb6982306509380fc11b450227a844493b ]
    
    Refactor and create helpers to send the special ACK in DCTCP.
    
    Signed-off-by: Yuchung Cheng <ycheng@google.com>
    Acked-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 841778018235f7328a8be29cbdaa36c7703d064c
Author: Yuchung Cheng <ycheng@google.com>
Date:   Thu Jul 12 06:04:52 2018 -0700

    tcp: fix dctcp delayed ACK schedule
    
    [ Upstream commit b0c05d0e99d98d7f0cd41efc1eeec94efdc3325d ]
    
    Previously, when a data segment was sent an ACK was piggybacked
    on the data segment without generating a CA_EVENT_NON_DELAYED_ACK
    event to notify congestion control modules. So the DCTCP
    ca->delayed_ack_reserved flag could incorrectly stay set when
    in fact there were no delayed ACKs being reserved. This could result
    in sending a special ECN notification ACK that carries an older
    ACK sequence, when in fact there was no need for such an ACK.
    DCTCP keeps track of the delayed ACK status with its own separate
    state ca->delayed_ack_reserved. Previously it may accidentally cancel
    the delayed ACK without updating this field upon sending a special
    ACK that carries a older ACK sequence. This inconsistency would
    lead to DCTCP receiver never acknowledging the latest data until the
    sender times out and retry in some cases.
    
    Packetdrill script (provided by Larry Brakmo)
    
    0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
    0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
    0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
    0.000 bind(3, ..., ...) = 0
    0.000 listen(3, 1) = 0
    
    0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
    0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
    0.110 < [ect0] . 1:1(0) ack 1 win 257
    0.200 accept(3, ..., ...) = 4
    
    0.200 < [ect0] . 1:1001(1000) ack 1 win 257
    0.200 > [ect01] . 1:1(0) ack 1001
    
    0.200 write(4, ..., 1) = 1
    0.200 > [ect01] P. 1:2(1) ack 1001
    
    0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
    0.200 write(4, ..., 1) = 1
    0.200 > [ect01] P. 2:3(1) ack 2001
    
    0.200 < [ect0] . 2001:3001(1000) ack 3 win 257
    0.200 < [ect0] . 3001:4001(1000) ack 3 win 257
    0.200 > [ect01] . 3:3(0) ack 4001
    
    0.210 < [ce] P. 4001:4501(500) ack 3 win 257
    
    +0.001 read(4, ..., 4500) = 4500
    +0 write(4, ..., 1) = 1
    +0 > [ect01] PE. 3:4(1) ack 4501
    
    +0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257
    // Previously the ACK sequence below would be 4501, causing a long RTO
    +0.040~+0.045 > [ect01] . 4:4(0) ack 5501   // delayed ack
    
    +0.311 < [ect0] . 5501:6501(1000) ack 4 win 257  // More data
    +0 > [ect01] . 4:4(0) ack 6501     // now acks everything
    
    +0.500 < F. 9501:9501(0) ack 4 win 257
    
    Reported-by: Larry Brakmo <brakmo@fb.com>
    Signed-off-by: Yuchung Cheng <ycheng@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Neal Cardwell <ncardwell@google.com>
    Acked-by: Lawrence Brakmo <brakmo@fb.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 19b74799159ab6f780fedd568adf00cdee9edd5a
Author: Roopa Prabhu <roopa@cumulusnetworks.com>
Date:   Fri Jul 20 13:21:01 2018 -0700

    rtnetlink: add rtnl_link_state check in rtnl_configure_link
    
    [ Upstream commit 5025f7f7d506fba9b39e7fe8ca10f6f34cb9bc2d ]
    
    rtnl_configure_link sets dev->rtnl_link_state to
    RTNL_LINK_INITIALIZED and unconditionally calls
    __dev_notify_flags to notify user-space of dev flags.
    
    current call sequence for rtnl_configure_link
    rtnetlink_newlink
        rtnl_link_ops->newlink
        rtnl_configure_link (unconditionally notifies userspace of
                             default and new dev flags)
    
    If a newlink handler wants to call rtnl_configure_link
    early, we will end up with duplicate notifications to
    user-space.
    
    This patch fixes rtnl_configure_link to check rtnl_link_state
    and call __dev_notify_flags with gchanges = 0 if already
    RTNL_LINK_INITIALIZED.
    
    Later in the series, this patch will help the following sequence
    where a driver implementing newlink can call rtnl_configure_link
    to initialize the link early.
    
    makes the following call sequence work:
    rtnetlink_newlink
        rtnl_link_ops->newlink (vxlan) -> rtnl_configure_link (initializes
                                                    link and notifies
                                                    user-space of default
                                                    dev flags)
        rtnl_configure_link (updates dev flags if requested by user ifm
                             and notifies user-space of new dev flags)
    
    Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c6ac36be72e46082bedd4433ccba9925bbf6a322
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Thu Jul 19 08:15:16 2018 +0200

    net: phy: consider PHY_IGNORE_INTERRUPT in phy_start_aneg_priv
    
    [ Upstream commit 215d08a85b9acf5e1fe9dbf50f1774cde333efef ]
    
    The situation described in the comment can occur also with
    PHY_IGNORE_INTERRUPT, therefore change the condition to include it.
    
    Fixes: f555f34fdc58 ("net: phy: fix auto-negotiation stall due to unavailable interrupt")
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit cc403d5dc140ba74400335b84b91f97d07dbd569
Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Fri Jul 20 14:04:27 2018 +0800

    multicast: do not restore deleted record source filter mode to new one
    
    There are two scenarios that we will restore deleted records. The first is
    when device down and up(or unmap/remap). In this scenario the new filter
    mode is same with previous one. Because we get it from in_dev->mc_list and
    we do not touch it during device down and up.
    
    The other scenario is when a new socket join a group which was just delete
    and not finish sending status reports. In this scenario, we should use the
    current filter mode instead of restore old one. Here are 4 cases in total.
    
    old_socket        new_socket       before_fix       after_fix
      IN(A)             IN(A)           ALLOW(A)         ALLOW(A)
      IN(A)             EX( )           TO_IN( )         TO_EX( )
      EX( )             IN(A)           TO_EX( )         ALLOW(A)
      EX( )             EX( )           TO_EX( )         TO_EX( )
    
    Fixes: 24803f38a5c0b (igmp: do not remove igmp souce list info when set link down)
    Fixes: 1666d49e1d416 (mld: do not remove mld souce list info when set link down)
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b7e37add79bf1a2c9f4a7574b0d7f53f0f6df663
Author: Eran Ben Elisha <eranbe@mellanox.com>
Date:   Sun Jul 8 13:08:55 2018 +0300

    net/mlx5e: Fix quota counting in aRFS expire flow
    
    [ Upstream commit 2630bae8018823c3b88788b69fb9f16ea3b4a11e ]
    
    Quota should follow the amount of rules which do expire, and not the
    number of rules that were examined, fixed that.
    
    Fixes: 18c908e477dc ("net/mlx5e: Add accelerated RFS support")
    Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
    Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
    Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit d9d580121617d95aade9aa839d4e405521eafcb8
Author: Eran Ben Elisha <eranbe@mellanox.com>
Date:   Sun Jul 8 14:52:12 2018 +0300

    net/mlx5e: Don't allow aRFS for encapsulated packets
    
    [ Upstream commit d2e1c57bcf9a07cbb67f30ecf238f298799bce1c ]
    
    Driver is yet to support aRFS for encapsulated packets, return early
    error in such case.
    
    Fixes: 18c908e477dc ("net/mlx5e: Add accelerated RFS support")
    Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
    Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit adcecd4ab150f4e2dfdb2fcbf3f1394de6f9dd7a
Author: Ariel Levkovich <lariel@mellanox.com>
Date:   Mon Jun 25 19:12:02 2018 +0300

    net/mlx5: Adjust clock overflow work period
    
    [ Upstream commit 33180bee86a8940a84950edca46315cd9dd6deb5 ]
    
    When driver converts HW timestamp to wall clock time it subtracts
    the last saved cycle counter from the HW timestamp and converts the
    difference to nanoseconds.
    The conversion is done by multiplying the cycles difference with the
    clock multiplier value as a first step and therefore the cycles
    difference should be small enough so that the multiplication product
    doesn't exceed 64bit.
    
    The overflow handling routine is in charge of updating the last saved
    cycle counter in driver and it is called periodically using kernel
    delayed workqueue.
    
    The delay period for this work is calculated using the max HW cycle
    counter value (a 41 bit mask) as a base which doesn't take the 64bit
    limit into account so the delay period may be incorrect and too
    long to prevent a large difference between the HW counter and the last
    saved counter in SW.
    
    This change adjusts the work period for the HW clock overflow work by
    taking the minimum between the previous value and the quotient of max
    u64 value and the clock multiplier value.
    
    Fixes: ef9814deafd0 ("net/mlx5e: Add HW timestamping (TS) support")
    Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
    Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e2ffdd646cf61ae1444d376d1ea1e756a495eff8
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Jul 19 16:04:38 2018 -0700

    net: skb_segment() should not return NULL
    
    [ Upstream commit ff907a11a0d68a749ce1a321f4505c03bf72190c ]
    
    syzbot caught a NULL deref [1], caused by skb_segment()
    
    skb_segment() has many "goto err;" that assume the @err variable
    contains -ENOMEM.
    
    A successful call to __skb_linearize() should not clear @err,
    otherwise a subsequent memory allocation error could return NULL.
    
    While we are at it, we might use -EINVAL instead of -ENOMEM when
    MAX_SKB_FRAGS limit is reached.
    
    [1]
    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN
    CPU: 0 PID: 13285 Comm: syz-executor3 Not tainted 4.18.0-rc4+ #146
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:tcp_gso_segment+0x3dc/0x1780 net/ipv4/tcp_offload.c:106
    Code: f0 ff ff 0f 87 1c fd ff ff e8 00 88 0b fb 48 8b 75 d0 48 b9 00 00 00 00 00 fc ff df 48 8d be 90 00 00 00 48 89 f8 48 c1 e8 03 <0f> b6 14 08 48 8d 86 94 00 00 00 48 89 c6 83 e0 07 48 c1 ee 03 0f
    RSP: 0018:ffff88019b7fd060 EFLAGS: 00010206
    RAX: 0000000000000012 RBX: 0000000000000020 RCX: dffffc0000000000
    RDX: 0000000000040000 RSI: 0000000000000000 RDI: 0000000000000090
    RBP: ffff88019b7fd0f0 R08: ffff88019510e0c0 R09: ffffed003b5c46d6
    R10: ffffed003b5c46d6 R11: ffff8801dae236b3 R12: 0000000000000001
    R13: ffff8801d6c581f4 R14: 0000000000000000 R15: ffff8801d6c58128
    FS:  00007fcae64d6700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000004e8664 CR3: 00000001b669b000 CR4: 00000000001406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     tcp4_gso_segment+0x1c3/0x440 net/ipv4/tcp_offload.c:54
     inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342
     inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342
     skb_mac_gso_segment+0x3b5/0x740 net/core/dev.c:2792
     __skb_gso_segment+0x3c3/0x880 net/core/dev.c:2865
     skb_gso_segment include/linux/netdevice.h:4099 [inline]
     validate_xmit_skb+0x640/0xf30 net/core/dev.c:3104
     __dev_queue_xmit+0xc14/0x3910 net/core/dev.c:3561
     dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
     neigh_hh_output include/net/neighbour.h:473 [inline]
     neigh_output include/net/neighbour.h:481 [inline]
     ip_finish_output2+0x1063/0x1860 net/ipv4/ip_output.c:229
     ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317
     NF_HOOK_COND include/linux/netfilter.h:276 [inline]
     ip_output+0x223/0x880 net/ipv4/ip_output.c:405
     dst_output include/net/dst.h:444 [inline]
     ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
     iptunnel_xmit+0x567/0x850 net/ipv4/ip_tunnel_core.c:91
     ip_tunnel_xmit+0x1598/0x3af1 net/ipv4/ip_tunnel.c:778
     ipip_tunnel_xmit+0x264/0x2c0 net/ipv4/ipip.c:308
     __netdev_start_xmit include/linux/netdevice.h:4148 [inline]
     netdev_start_xmit include/linux/netdevice.h:4157 [inline]
     xmit_one net/core/dev.c:3034 [inline]
     dev_hard_start_xmit+0x26c/0xc30 net/core/dev.c:3050
     __dev_queue_xmit+0x29ef/0x3910 net/core/dev.c:3569
     dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
     neigh_direct_output+0x15/0x20 net/core/neighbour.c:1403
     neigh_output include/net/neighbour.h:483 [inline]
     ip_finish_output2+0xa67/0x1860 net/ipv4/ip_output.c:229
     ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317
     NF_HOOK_COND include/linux/netfilter.h:276 [inline]
     ip_output+0x223/0x880 net/ipv4/ip_output.c:405
     dst_output include/net/dst.h:444 [inline]
     ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
     ip_queue_xmit+0x9df/0x1f80 net/ipv4/ip_output.c:504
     tcp_transmit_skb+0x1bf9/0x3f10 net/ipv4/tcp_output.c:1168
     tcp_write_xmit+0x1641/0x5c20 net/ipv4/tcp_output.c:2363
     __tcp_push_pending_frames+0xb2/0x290 net/ipv4/tcp_output.c:2536
     tcp_push+0x638/0x8c0 net/ipv4/tcp.c:735
     tcp_sendmsg_locked+0x2ec5/0x3f00 net/ipv4/tcp.c:1410
     tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1447
     inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
     sock_sendmsg_nosec net/socket.c:641 [inline]
     sock_sendmsg+0xd5/0x120 net/socket.c:651
     __sys_sendto+0x3d7/0x670 net/socket.c:1797
     __do_sys_sendto net/socket.c:1809 [inline]
     __se_sys_sendto net/socket.c:1805 [inline]
     __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1805
     do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x455ab9
    Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fcae64d5c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 00007fcae64d66d4 RCX: 0000000000455ab9
    RDX: 0000000000000001 RSI: 0000000020000200 RDI: 0000000000000013
    RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
    R13: 00000000004c1145 R14: 00000000004d1818 R15: 0000000000000006
    Modules linked in:
    Dumping ftrace buffer:
       (ftrace buffer empty)
    
    Fixes: ddff00d42043 ("net: Move skb_has_shared_frag check out of GRE code and into segmentation")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Alexander Duyck <alexander.h.duyck@intel.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 444987d535bf65b109b60f45406abe1d7f11f1f7
Author: Jack Morgenstein <jackm@dev.mellanox.co.il>
Date:   Tue Jul 24 14:27:55 2018 +0300

    net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
    
    [ Upstream commit 958c696f5a7274d9447a458ad7aa70719b29a50a ]
    
    Function mlx4_RST2INIT_QP_wrapper saved the qp number passed in the qp
    context, rather than the one passed in the input modifier.
    
    However, the qp number in the qp context is not defined as a
    required parameter by the FW. Therefore, drivers may choose to not
    specify the qp number in the qp context for the reset-to-init transition.
    
    Thus, we must save the qp number passed in the command input modifier --
    which is always present. (This saved qp number is used as the input
    modifier for command 2RST_QP when a slave's qp's are destroyed).
    
    Fixes: c82e9aa0a8bc ("mlx4_core: resource tracking for HCA resources used by guests")
    Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 03fbf2b8237a3f1689848cf9f2d9bc7c1869e834
Author: Willem de Bruijn <willemb@google.com>
Date:   Mon Jul 23 19:36:48 2018 -0400

    ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
    
    [ Upstream commit 2efd4fca703a6707cad16ab486eaab8fc7f0fd49 ]
    
    Syzbot reported a read beyond the end of the skb head when returning
    IPV6_ORIGDSTADDR:
    
      BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
      CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x185/0x1d0 lib/dump_stack.c:113
        kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
        kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
        kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
        copy_to_user include/linux/uaccess.h:184 [inline]
        put_cmsg+0x5ef/0x860 net/core/scm.c:242
        ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
        ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
        rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
        [..]
    
    This logic and its ipv4 counterpart read the destination port from
    the packet at skb_transport_offset(skb) + 4.
    
    With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
    packet that stores headers exactly up to skb_transport_offset(skb) in
    the head and the remainder in a frag.
    
    Call pskb_may_pull before accessing the pointer to ensure that it lies
    in skb head.
    
    Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com
    Reported-by: syzbot+9adb4b567003cac781f0@syzkaller.appspotmail.com
    Signed-off-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 93d94fec94c7df51c7f36f45fdc18994073f47ee
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Mon Jul 23 16:50:48 2018 +0200

    ip: hash fragments consistently
    
    [ Upstream commit 3dd1c9a1270736029ffca670e9bd0265f4120600 ]
    
    The skb hash for locally generated ip[v6] fragments belonging
    to the same datagram can vary in several circumstances:
    * for connected UDP[v6] sockets, the first fragment get its hash
      via set_owner_w()/skb_set_hash_from_sk()
    * for unconnected IPv6 UDPv6 sockets, the first fragment can get
      its hash via ip6_make_flowlabel()/skb_get_hash_flowi6(), if
      auto_flowlabel is enabled
    
    For the following frags the hash is usually computed via
    skb_get_hash().
    The above can cause OoO for unconnected IPv6 UDPv6 socket: in that
    scenario the egress tx queue can be selected on a per packet basis
    via the skb hash.
    It may also fool flow-oriented schedulers to place fragments belonging
    to the same datagram in different flows.
    
    Fix the issue by copying the skb hash from the head frag into
    the others at fragmentation time.
    
    Before this commit:
    perf probe -a "dev_queue_xmit skb skb->hash skb->l4_hash:b1@0/8 skb->sw_hash:b1@1/8"
    netperf -H $IPV4 -t UDP_STREAM -l 5 -- -m 2000 -n &
    perf record -e probe:dev_queue_xmit -e probe:skb_set_owner_w -a sleep 0.1
    perf script
    probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=3713014309 l4_hash=1 sw_hash=0
    probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=0 l4_hash=0 sw_hash=0
    
    After this commit:
    probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
    probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
    
    Fixes: b73c3d0e4f0e ("net: Save TX flow hash in sock and set in skbuf on xmit")
    Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 650321fe96150898c69c58c6684b33edbe689de1
Author: Paul Burton <paul.burton@mips.com>
Date:   Thu Jul 12 09:33:04 2018 -0700

    MIPS: Fix off-by-one in pci_resource_to_user()
    
    commit 38c0a74fe06da3be133cae3fb7bde6a9438e698b upstream.
    
    The MIPS implementation of pci_resource_to_user() introduced in v3.12 by
    commit 4c2924b725fb ("MIPS: PCI: Use pci_resource_to_user to map pci
    memory space properly") incorrectly sets *end to the address of the
    byte after the resource, rather than the last byte of the resource.
    
    This results in userland seeing resources as a byte larger than they
    actually are, for example a 32 byte BAR will be reported by a tool such
    as lspci as being 33 bytes in size:
    
        Region 2: I/O ports at 1000 [disabled] [size=33]
    
    Correct this by subtracting one from the calculated end address,
    reporting the correct address to userland.
    
    Signed-off-by: Paul Burton <paul.burton@mips.com>
    Reported-by: Rui Wang <rui.wang@windriver.com>
    Fixes: 4c2924b725fb ("MIPS: PCI: Use pci_resource_to_user to map pci memory space properly")
    Cc: James Hogan <jhogan@kernel.org>
    Cc: Ralf Baechle <ralf@linux-mips.org>
    Cc: Wolfgang Grandegger <wg@grandegger.com>
    Cc: linux-mips@linux-mips.org
    Cc: stable@vger.kernel.org # v3.12+
    Patchwork: https://patchwork.linux-mips.org/patch/19829/
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 92f724130fac801b97aef580ae5e185a699661c4
Author: Felix Fietkau <nbd@nbd.name>
Date:   Fri Jul 20 13:58:21 2018 +0200

    MIPS: ath79: fix register address in ath79_ddr_wb_flush()
    
    commit bc88ad2efd11f29e00a4fd60fcd1887abfe76833 upstream.
    
    ath79_ddr_wb_flush_base has the type void __iomem *, so register offsets
    need to be a multiple of 4 in order to access the intended register.
    
    Signed-off-by: Felix Fietkau <nbd@nbd.name>
    Signed-off-by: John Crispin <john@phrozen.org>
    Signed-off-by: Paul Burton <paul.burton@mips.com>
    Fixes: 24b0e3e84fbf ("MIPS: ath79: Improve the DDR controller interface")
    Patchwork: https://patchwork.linux-mips.org/patch/19912/
    Cc: Alban Bedel <albeu@free.fr>
    Cc: James Hogan <jhogan@kernel.org>
    Cc: Ralf Baechle <ralf@linux-mips.org>
    Cc: linux-mips@linux-mips.org
    Cc: stable@vger.kernel.org # 4.2+
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>