Skip to content
  1. Jan 25, 2016
  2. Jan 22, 2016
    • David Howells's avatar
      KEYS: Fix race between read and revoke · e41946e4
      David Howells authored
      [ Upstream commit b4a1b4f5 ]
      
      This fixes CVE-2015-7550.
      
      There's a race between keyctl_read() and keyctl_revoke().  If the revoke
      happens between keyctl_read() checking the validity of a key and the key's
      semaphore being taken, then the key type read method will see a revoked key.
      
      This causes a problem for the user-defined key type because it assumes in
      its read method that there will always be a payload in a non-revoked key
      and doesn't check for a NULL pointer.
      
      Fix this by making keyctl_read() check the validity of a key after taking
      semaphore instead of before.
      
      I think the bug was introduced with the original keyrings code.
      
      This was discovered by a multithreaded test program generated by syzkaller
      (http://github.com/google/syzkaller
      
      ).  Here's a cleaned up version:
      
      	#include <sys/types.h>
      	#include <keyutils.h>
      	#include <pthread.h>
      	void *thr0(void *arg)
      	{
      		key_serial_t key = (unsigned long)arg;
      		keyctl_revoke(key);
      		return 0;
      	}
      	void *thr1(void *arg)
      	{
      		key_serial_t key = (unsigned long)arg;
      		char buffer[16];
      		keyctl_read(key, buffer, 16);
      		return 0;
      	}
      	int main()
      	{
      		key_serial_t key = add_key("user", "%", "foo", 3, KEY_SPEC_USER_KEYRING);
      		pthread_t th[5];
      		pthread_create(&th[0], 0, thr0, (void *)(unsigned long)key);
      		pthread_create(&th[1], 0, thr1, (void *)(unsigned long)key);
      		pthread_create(&th[2], 0, thr0, (void *)(unsigned long)key);
      		pthread_create(&th[3], 0, thr1, (void *)(unsigned long)key);
      		pthread_join(th[0], 0);
      		pthread_join(th[1], 0);
      		pthread_join(th[2], 0);
      		pthread_join(th[3], 0);
      		return 0;
      	}
      
      Build as:
      
      	cc -o keyctl-race keyctl-race.c -lkeyutils -lpthread
      
      Run as:
      
      	while keyctl-race; do :; done
      
      as it may need several iterations to crash the kernel.  The crash can be
      summarised as:
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      	IP: [<ffffffff81279b08>] user_read+0x56/0xa3
      	...
      	Call Trace:
      	 [<ffffffff81276aa9>] keyctl_read_key+0xb6/0xd7
      	 [<ffffffff81277815>] SyS_keyctl+0x83/0xe0
      	 [<ffffffff815dbb97>] entry_SYSCALL_64_fastpath+0x12/0x6f
      
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e41946e4
    • WANG Cong's avatar
      net: check both type and procotol for tcp sockets · e49b606c
      WANG Cong authored
      [ Upstream commit ac5cc977
      
       ]
      
      Dmitry reported the following out-of-bound access:
      
      Call Trace:
       [<ffffffff816cec2e>] __asan_report_load4_noabort+0x3e/0x40
      mm/kasan/report.c:294
       [<ffffffff84affb14>] sock_setsockopt+0x1284/0x13d0 net/core/sock.c:880
       [<     inline     >] SYSC_setsockopt net/socket.c:1746
       [<ffffffff84aed7ee>] SyS_setsockopt+0x1fe/0x240 net/socket.c:1729
       [<ffffffff85c18c76>] entry_SYSCALL_64_fastpath+0x16/0x7a
      arch/x86/entry/entry_64.S:185
      
      This is because we mistake a raw socket as a tcp socket.
      We should check both sk->sk_type and sk->sk_protocol to ensure
      it is a tcp socket.
      
      Willem points out __skb_complete_tx_timestamp() needs to fix as well.
      
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e49b606c
    • Ben Hutchings's avatar
      usb: Use the USB_SS_MULT() macro to decode burst multiplier for log message · b3bd889d
      Ben Hutchings authored
      [ Upstream commit 5377adb0 ]
      
      usb_parse_ss_endpoint_companion() now decodes the burst multiplier
      correctly in order to check that it's <= 3, but still uses the wrong
      expression if warning that it's > 3.
      
      Fixes: ff30cbc8
      
       ("usb: Use the USB_SS_MULT() macro to get the ...")
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b3bd889d
    • Hans Yang's avatar
      usb: core : hub: Fix BOS 'NULL pointer' kernel panic · 292e5af9
      Hans Yang authored
      [ Upstream commit 464ad8c4
      
       ]
      
      When a USB 3.0 mass storage device is disconnected in transporting
      state, storage device driver may handle it as a transport error and
      reset the device by invoking usb_reset_and_verify_device()
      and following could happen:
      
      in usb_reset_and_verify_device():
         udev->bos = NULL;
      
      For U1/U2 enabled devices, driver will disable LPM, and in some
      conditions:
         from usb_unlocked_disable_lpm()
          --> usb_disable_lpm()
          --> usb_enable_lpm()
              udev->bos->ss_cap->bU1devExitLat;
      
      And it causes 'NULL pointer' and 'kernel panic':
      
      [  157.976257] Unable to handle kernel NULL pointer dereference
      at virtual address 00000010
      ...
      [  158.026400] PC is at usb_enable_link_state+0x34/0x2e0
      [  158.031442] LR is at usb_enable_lpm+0x98/0xac
      ...
      [  158.137368] [<ffffffc0006a1cac>] usb_enable_link_state+0x34/0x2e0
      [  158.143451] [<ffffffc0006a1fec>] usb_enable_lpm+0x94/0xac
      [  158.148840] [<ffffffc0006a20e8>] usb_disable_lpm+0xa8/0xb4
      ...
      [  158.214954] Kernel panic - not syncing: Fatal exception
      
      This commit moves 'udev->bos = NULL' behind usb_unlocked_disable_lpm()
      to prevent from NULL pointer access.
      
      Issue can be reproduced by following setup:
      1) A SS pen drive behind a SS hub connected to the host.
      2) Transporting data between the pen drive and the host.
      3) Abruptly disconnect hub and pen drive from host.
      4) With a chance it crashes.
      
      Signed-off-by: default avatarHans Yang <hansy@nvidia.com>
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      292e5af9
    • Arnd Bergmann's avatar
      usb: musb: USB_TI_CPPI41_DMA requires dmaengine support · 51a55c40
      Arnd Bergmann authored
      [ Upstream commit 183e53e8
      
       ]
      
      The CPPI-4.1 driver selects TI_CPPI41, which is a dmaengine
      driver and that may not be available when CONFIG_DMADEVICES
      is not set:
      
      warning: (USB_TI_CPPI41_DMA) selects TI_CPPI41 which has unmet direct dependencies (DMADEVICES && ARCH_OMAP)
      
      This adds an extra dependency to avoid generating warnings in randconfig
      builds. Ideally we'd remove the 'select' statement, but that has the
      potential to break defconfig files.
      
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 411dd19c
      
       ("usb: musb: Kconfig: Select the DMA driver if DMA mode of MUSB is enabled")
      Signed-off-by: default avatarFelipe Balbi <balbi@ti.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      51a55c40
    • Felipe Balbi's avatar
      usb: gadget: pxa27x: fix suspend callback · 8b6655c0
      Felipe Balbi authored
      [ Upstream commit 391e6dcb
      
       ]
      
      pxa27x disconnects pullups on suspend but doesn't
      notify the gadget driver about it, so gadget driver
      can't disable the endpoints it was using.
      
      This causes problems on resume because gadget core
      will think endpoints are still enabled and just
      ignore the following usb_ep_enable().
      
      Fix this problem by calling
      gadget_driver->disconnect().
      
      Cc: <stable@vger.kernel.org> # v3.10+
      Tested-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: default avatarFelipe Balbi <balbi@ti.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8b6655c0
    • Alexey Khoroshilov's avatar
      USB: whci-hcd: add check for dma mapping error · a4e9e566
      Alexey Khoroshilov authored
      [ Upstream commit f9fa1887
      
       ]
      
      qset_fill_page_list() do not check for dma mapping errors.
      
      Found by Linux Driver Verification project (linuxtesting.org).
      
      Signed-off-by: default avatarAlexey Khoroshilov <khoroshilov@ispras.ru>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a4e9e566
    • Alan Stern's avatar
      USB: add quirk for devices with broken LPM · 782f17fd
      Alan Stern authored
      [ Upstream commit ad87e032
      
       ]
      
      Some USB device / host controller combinations seem to have problems
      with Link Power Management.  For example, Steinar found that his xHCI
      controller wouldn't handle bandwidth calculations correctly for two
      video cards simultaneously when LPM was enabled, even though the bus
      had plenty of bandwidth available.
      
      This patch introduces a new quirk flag for devices that should remain
      disabled for LPM, and creates quirk entries for Steinar's devices.
      
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarSteinar H. Gunderson <sgunderson@bigfoot.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      782f17fd
    • Konstantin Shkolnyy's avatar
      USB: cp210x: Remove CP2110 ID from compatibility list · 5163e218
      Konstantin Shkolnyy authored
      [ Upstream commit 7c90e610
      
       ]
      
      CP2110 ID (0x10c4, 0xea80) doesn't belong here because it's a HID
      and completely different from CP210x devices.
      
      Signed-off-by: default avatarKonstantin Shkolnyy <konstantin.shkolnyy@gmail.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5163e218
    • Jonas Jonsson's avatar
      USB: serial: Another Infineon flash loader USB ID · fc70c8a4
      Jonas Jonsson authored
      [ Upstream commit a0e80fbd
      
       ]
      
      The flash loader has been seen on a Telit UE910 modem. The flash loader
      is a bit special, it presents both an ACM and CDC Data interface but
      only the latter is useful. Unless a magic string is sent to the device
      it will disappear and the regular modem device appears instead.
      
      Signed-off-by: default avatarJonas Jonsson <jonas@ludd.ltu.se>
      Tested-by: default avatarDaniele Palmas <dnlplm@gmail.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      fc70c8a4
    • Jonas Jonsson's avatar
      USB: cdc_acm: Ignore Infineon Flash Loader utility · 753c4e20
      Jonas Jonsson authored
      [ Upstream commit f33a7f72
      
       ]
      
      Some modems, such as the Telit UE910, are using an Infineon Flash Loader
      utility. It has two interfaces, 2/2/0 (Abstract Modem) and 10/0/0 (CDC
      Data). The latter can be used as a serial interface to upgrade the
      firmware of the modem. However, that isn't possible when the cdc-acm
      driver takes control of the device.
      
      The following is an explanation of the behaviour by Daniele Palmas during
      discussion on linux-usb.
      
      "This is what happens when the device is turned on (without modifying
      the drivers):
      
      [155492.352031] usb 1-3: new high-speed USB device number 27 using ehci-pci
      [155492.485429] usb 1-3: config 1 interface 0 altsetting 0 endpoint 0x81 has an invalid bInterval 255, changing to 11
      [155492.485436] usb 1-3: New USB device found, idVendor=058b, idProduct=0041
      [155492.485439] usb 1-3: New USB device strings: Mfr=0, Product=0, SerialNumber=0
      [155492.485952] cdc_acm 1-3:1.0: ttyACM0: USB ACM device
      
      This is the flashing device that is caught by the cdc-acm driver. Once
      the ttyACM appears, the application starts sending a magic string
      (simple write on the file descriptor) to keep the device in flashing
      mode. If this magic string is not properly received in a certain time
      interval, the modem goes on in normal operative mode:
      
      [155493.748094] usb 1-3: USB disconnect, device number 27
      [155494.916025] usb 1-3: new high-speed USB device number 28 using ehci-pci
      [155495.059978] usb 1-3: New USB device found, idVendor=1bc7, idProduct=0021
      [155495.059983] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
      [155495.059986] usb 1-3: Product: 6 CDC-ACM + 1 CDC-ECM
      [155495.059989] usb 1-3: Manufacturer: Telit
      [155495.059992] usb 1-3: SerialNumber: 359658044004697
      [155495.138958] cdc_acm 1-3:1.0: ttyACM0: USB ACM device
      [155495.140832] cdc_acm 1-3:1.2: ttyACM1: USB ACM device
      [155495.142827] cdc_acm 1-3:1.4: ttyACM2: USB ACM device
      [155495.144462] cdc_acm 1-3:1.6: ttyACM3: USB ACM device
      [155495.145967] cdc_acm 1-3:1.8: ttyACM4: USB ACM device
      [155495.147588] cdc_acm 1-3:1.10: ttyACM5: USB ACM device
      [155495.154322] cdc_ether 1-3:1.12 wwan0: register 'cdc_ether' at usb-0000:00:1a.7-3, Mobile Broadband Network Device, 00:00:11:12:13:14
      
      Using the cdc-acm driver, the string, though being sent in the same way
      than using the usb-serial-simple driver (I can confirm that the data is
      passing properly since I used an hw usb sniffer), does not make the
      device to stay in flashing mode."
      
      Signed-off-by: default avatarJonas Jonsson <jonas@ludd.ltu.se>
      Tested-by: default avatarDaniele Palmas <dnlplm@gmail.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      753c4e20
    • Ilya Dryomov's avatar
      rbd: don't leak parent_spec in rbd_dev_probe_parent() · b51cf48b
      Ilya Dryomov authored
      [ Upstream commit 1f2c6651
      
       ]
      
      Currently we leak parent_spec and trigger a "parent reference
      underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails.
      The problem is we take the !parent out_err branch and that only drops
      refcounts; parent_spec that would've been freed had we called
      rbd_dev_unparent() remains and triggers rbd_warn() in
      rbd_dev_parent_put() - at that point we have parent_spec != NULL and
      parent_ref == 0, so counter ends up being -1 after the decrement.
      
      Redo rbd_dev_probe_parent() to fix this.
      
      Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b51cf48b
    • Sasha Levin's avatar
      RDS: verify the underlying transport exists before creating a connection · e7b7ee7b
      Sasha Levin authored
      [ Upstream commit 74e98eb0
      
       ]
      
      There was no verification that an underlying transport exists when creating
      a connection, this would cause dereferencing a NULL ptr.
      
      It might happen on sockets that weren't properly bound before attempting to
      send a message, which will cause a NULL ptr deref:
      
      [135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
      [135546.051270] Modules linked in:
      [135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
      [135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
      [135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
      [135546.055666] RSP: 0018:ffff8800bc70fab0  EFLAGS: 00010202
      [135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
      [135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
      [135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
      [135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
      [135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
      [135546.061668] FS:  00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
      [135546.062836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
      [135546.064723] Stack:
      [135546.065048]  ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
      [135546.066247]  0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
      [135546.067438]  1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
      [135546.068629] Call Trace:
      [135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
      [135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
      [135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
      [135546.071981] rds_sendmsg (net/rds/send.c:1058)
      [135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
      [135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
      [135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
      [135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
      [135546.076349] ? __might_fault (mm/memory.c:3795)
      [135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
      [135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
      [135546.078856] SYSC_sendto (net/socket.c:1657)
      [135546.079596] ? SYSC_connect (net/socket.c:1628)
      [135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
      [135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
      [135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
      [135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
      [135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
      [135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
      [135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1
      
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e7b7ee7b
    • Emmanuel Grumbach's avatar
      iwlwifi: bump firmware API for mvm devices to 12 · 680a7411
      Emmanuel Grumbach authored
      [ Upstream commit 91f491fd
      
       ]
      
      This allows 3160 / 7260 / 7265 / 7265D / 8000 devices to
      use the latest version of the firmware.
      
      Signed-off-by: default avatarEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      680a7411
    • Emmanuel Grumbach's avatar
      iwlwifi: 7000: fix reported firmware name for 7265D · 2fef208e
      Emmanuel Grumbach authored
      [ Upstream commit a443f5e1
      
       ]
      
      We were advertising iwlwifi-7265-X.ucode instead of
      iwlwifi-7265D-X.ucode. Fix this.
      
      Signed-off-by: default avatarEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2fef208e
    • Lu, Han's avatar
      ALSA: hda/hdmi - apply Skylake fix-ups to Broxton display codec · 7f765fcd
      Lu, Han authored
      [ Upstream commit e2656412
      
       ]
      
      Broxton and Skylake have the same behavior on display audio. So this patch
      applys Skylake fix-ups to Broxton.
      
      Signed-off-by: default avatarLu, Han <han.lu@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7f765fcd
  3. Jan 21, 2016
    • Arnd Bergmann's avatar
      ceph: fix message length computation · 27327566
      Arnd Bergmann authored
      [ Upstream commit 777d738a ]
      
      create_request_message() computes the maximum length of a message,
      but uses the wrong type for the time stamp: sizeof(struct timespec)
      may be 8 or 16 depending on the architecture, while sizeof(struct
      ceph_timespec) is always 8, and that is what gets put into the
      message.
      
      Found while auditing the uses of timespec for y2038 problems.
      
      Fixes: b8e69066
      
       ("ceph: include time stamp in every MDS request")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarYan, Zheng <zyan@redhat.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      27327566
    • Junxiao Bi's avatar
      ocfs2: fix umask ignored issue · 33fde5c2
      Junxiao Bi authored
      [ Upstream commit 8f1eb487 ]
      
      New created file's mode is not masked with umask, and this makes umask not
      work for ocfs2 volume.
      
      Fixes: 702e5bc6
      
       ("ocfs2: use generic posix ACL infrastructure")
      Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Gang He <ghe@suse.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      33fde5c2
    • Jeff Layton's avatar
      nfs: if we have no valid attrs, then don't declare the attribute cache valid · 2de7d462
      Jeff Layton authored
      [ Upstream commit c812012f
      
       ]
      
      If we pass in an empty nfs_fattr struct to nfs_update_inode, it will
      (correctly) not update any of the attributes, but it then clears the
      NFS_INO_INVALID_ATTR flag, which indicates that the attributes are
      up to date. Don't clear the flag if the fattr struct has no valid
      attrs to apply.
      
      Reviewed-by: default avatarSteve French <steve.french@primarydata.com>
      Signed-off-by: default avatarJeff Layton <jeff.layton@primarydata.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2de7d462
    • Benjamin Coddington's avatar
      nfs4: start callback_ident at idr 1 · d48e82da
      Benjamin Coddington authored
      [ Upstream commit c68a027c
      
       ]
      
      If clp->cl_cb_ident is zero, then nfs_cb_idr_remove_locked() skips removing
      it when the nfs_client is freed.  A decoding or server bug can then find
      and try to put that first nfs_client which would lead to a crash.
      
      Signed-off-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Fixes: d6870312
      
       ("nfs4client: convert to idr_alloc()")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d48e82da
    • Jeff Layton's avatar
      nfsd: serialize state seqid morphing operations · e46b3f45
      Jeff Layton authored
      [ Upstream commit 35a92fe8
      
       ]
      
      Andrew was seeing a race occur when an OPEN and OPEN_DOWNGRADE were
      running in parallel. The server would receive the OPEN_DOWNGRADE first
      and check its seqid, but then an OPEN would race in and bump it. The
      OPEN_DOWNGRADE would then complete and bump the seqid again.  The result
      was that the OPEN_DOWNGRADE would be applied after the OPEN, even though
      it should have been rejected since the seqid changed.
      
      The only recourse we have here I think is to serialize operations that
      bump the seqid in a stateid, particularly when we're given a seqid in
      the call. To address this, we add a new rw_semaphore to the
      nfs4_ol_stateid struct. We do a down_write prior to checking the seqid
      after looking up the stateid to ensure that nothing else is going to
      bump it while we're operating on it.
      
      In the case of OPEN, we do a down_read, as the call doesn't contain a
      seqid. Those can run in parallel -- we just need to serialize them when
      there is a concurrent OPEN_DOWNGRADE or CLOSE.
      
      LOCK and LOCKU however always take the write lock as there is no
      opportunity for parallelizing those.
      
      Reported-and-Tested-by: default avatarAndrew W Elble <aweits@rit.edu>
      Signed-off-by: default avatarJeff Layton <jeff.layton@primarydata.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e46b3f45
    • Stefan Richter's avatar
      firewire: ohci: fix JMicron JMB38x IT context discovery · 1d985e68
      Stefan Richter authored
      [ Upstream commit 100ceb66
      
       ]
      
      Reported by Clifford and Craig for JMicron OHCI-1394 + SDHCI combo
      controllers:  Often or even most of the time, the controller is
      initialized with the message "added OHCI v1.10 device as card 0, 4 IR +
      0 IT contexts, quirks 0x10".  With 0 isochronous transmit DMA contexts
      (IT contexts), applications like audio output are impossible.
      
      However, OHCI-1394 demands that at least 4 IT contexts are implemented
      by the link layer controller, and indeed JMicron JMB38x do implement
      four of them.  Only their IsoXmitIntMask register is unreliable at early
      access.
      
      With my own JMB381 single function controller I found:
        - I can reproduce the problem with a lower probability than Craig's.
        - If I put a loop around the section which clears and reads
          IsoXmitIntMask, then either the first or the second attempt will
          return the correct initial mask of 0x0000000f.  I never encountered
          a case of needing more than a second attempt.
        - Consequently, if I put a dummy reg_read(...IsoXmitIntMaskSet)
          before the first write, the subsequent read will return the correct
          result.
        - If I merely ignore a wrong read result and force the known real
          result, later isochronous transmit DMA usage works just fine.
      
      So let's just fix this chip bug up by the latter method.  Tested with
      JMB381 on kernel 3.13 and 4.3.
      
      Since OHCI-1394 generally requires 4 IT contexts at a minium, this
      workaround is simply applied whenever the initial read of IsoXmitIntMask
      returns 0, regardless whether it's a JMicron chip or not.  I never heard
      of this issue together with any other chip though.
      
      I am not 100% sure that this fix works on the OHCI-1394 part of JMB380
      and JMB388 combo controllers exactly the same as on the JMB381 single-
      function controller, but so far I haven't had a chance to let an owner
      of a combo chip run a patched kernel.
      
      Strangely enough, IsoRecvIntMask is always reported correctly, even
      though it is probed right before IsoXmitIntMask.
      
      Reported-by: Clifford Dunn
      Reported-by: default avatarCraig Moore <craig.moore@qenos.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarStefan Richter <stefanr@s5r6.in-berlin.de>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      1d985e68
    • Daeho Jeong's avatar
      ext4, jbd2: ensure entering into panic after recording an error in superblock · 8fecc1e2
      Daeho Jeong authored
      [ Upstream commit 4327ba52
      
       ]
      
      If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the
      journaling will be aborted first and the error number will be recorded
      into JBD2 superblock and, finally, the system will enter into the
      panic state in "errors=panic" option.  But, in the rare case, this
      sequence is little twisted like the below figure and it will happen
      that the system enters into panic state, which means the system reset
      in mobile environment, before completion of recording an error in the
      journal superblock. In this case, e2fsck cannot recognize that the
      filesystem failure occurred in the previous run and the corruption
      wouldn't be fixed.
      
      Task A                        Task B
      ext4_handle_error()
      -> jbd2_journal_abort()
        -> __journal_abort_soft()
          -> __jbd2_journal_abort_hard()
          | -> journal->j_flags |= JBD2_ABORT;
          |
          |                         __ext4_abort()
          |                         -> jbd2_journal_abort()
          |                         | -> __journal_abort_soft()
          |                         |   -> if (journal->j_flags & JBD2_ABORT)
          |                         |           return;
          |                         -> panic()
          |
          -> jbd2_journal_update_sb_errno()
      
      Tested-by: default avatarHobin Woo <hobin.woo@samsung.com>
      Signed-off-by: default avatarDaeho Jeong <daeho.jeong@samsung.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8fecc1e2
    • Ilya Dryomov's avatar
      rbd: don't put snap_context twice in rbd_queue_workfn() · ebf6b532
      Ilya Dryomov authored
      [ Upstream commit 70b16db8 ]
      
      Commit 4e752f0a
      
       ("rbd: access snapshot context and mapping size
      safely") moved ceph_get_snap_context() out of rbd_img_request_create()
      and into rbd_queue_workfn(), adding a ceph_put_snap_context() to the
      error path in rbd_queue_workfn().  However, rbd_img_request_create()
      consumes a ref on snapc, so calling ceph_put_snap_context() after
      a successful rbd_img_request_create() leads to an extra put.  Fix it.
      
      Cc: stable@vger.kernel.org # 3.18+
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarJosh Durgin <jdurgin@redhat.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ebf6b532
    • Filipe Manana's avatar
      Btrfs: fix race when listing an inode's xattrs · c0cdea62
      Filipe Manana authored
      [ Upstream commit f1cd1f0b
      
       ]
      
      When listing a inode's xattrs we have a time window where we race against
      a concurrent operation for adding a new hard link for our inode that makes
      us not return any xattr to user space. In order for this to happen, the
      first xattr of our inode needs to be at slot 0 of a leaf and the previous
      leaf must still have room for an inode ref (or extref) item, and this can
      happen because an inode's listxattrs callback does not lock the inode's
      i_mutex (nor does the VFS does it for us), but adding a hard link to an
      inode makes the VFS lock the inode's i_mutex before calling the inode's
      link callback.
      
      If we have the following leafs:
      
                     Leaf X (has N items)                    Leaf Y
      
       [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 XATTR_ITEM 12345), ... ]
                 slot N - 2         slot N - 1              slot 0
      
      The race illustrated by the following sequence diagram is possible:
      
             CPU 1                                               CPU 2
      
        btrfs_listxattr()
      
          searches for key (257 XATTR_ITEM 0)
      
          gets path with path->nodes[0] == leaf X
          and path->slots[0] == N
      
          because path->slots[0] is >=
          btrfs_header_nritems(leaf X), it calls
          btrfs_next_leaf()
      
          btrfs_next_leaf()
            releases the path
      
                                                         adds key (257 INODE_REF 666)
                                                         to the end of leaf X (slot N),
                                                         and leaf X now has N + 1 items
      
            searches for the key (257 INODE_REF 256),
            with path->keep_locks == 1, because that
            is the last key it saw in leaf X before
            releasing the path
      
            ends up at leaf X again and it verifies
            that the key (257 INODE_REF 256) is no
            longer the last key in leaf X, so it
            returns with path->nodes[0] == leaf X
            and path->slots[0] == N, pointing to
            the new item with key (257 INODE_REF 666)
      
          btrfs_listxattr's loop iteration sees that
          the type of the key pointed by the path is
          different from the type BTRFS_XATTR_ITEM_KEY
          and so it breaks the loop and stops looking
          for more xattr items
            --> the application doesn't get any xattr
                listed for our inode
      
      So fix this by breaking the loop only if the key's type is greater than
      BTRFS_XATTR_ITEM_KEY and skip the current key if its type is smaller.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      c0cdea62
    • Filipe Manana's avatar
      Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow · 938165ad
      Filipe Manana authored
      [ Upstream commit 1d512cb7
      
       ]
      
      If we are using the NO_HOLES feature, we have a tiny time window when
      running delalloc for a nodatacow inode where we can race with a concurrent
      link or xattr add operation leading to a BUG_ON.
      
      This happens because at run_delalloc_nocow() we end up casting a leaf item
      of type BTRFS_INODE_[REF|EXTREF]_KEY or of type BTRFS_XATTR_ITEM_KEY to a
      file extent item (struct btrfs_file_extent_item) and then analyse its
      extent type field, which won't match any of the expected extent types
      (values BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]) and therefore trigger an
      explicit BUG_ON(1).
      
      The following sequence diagram shows how the race happens when running a
      no-cow dellaloc range [4K, 8K[ for inode 257 and we have the following
      neighbour leafs:
      
                   Leaf X (has N items)                    Leaf Y
      
       [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
                    slot N - 2         slot N - 1              slot 0
      
       (Note the implicit hole for inode 257 regarding the [0, 8K[ range)
      
             CPU 1                                         CPU 2
      
       run_dealloc_nocow()
         btrfs_lookup_file_extent()
           --> searches for a key with value
               (257 EXTENT_DATA 4096) in the
               fs/subvol tree
           --> returns us a path with
               path->nodes[0] == leaf X and
               path->slots[0] == N
      
         because path->slots[0] is >=
         btrfs_header_nritems(leaf X), it
         calls btrfs_next_leaf()
      
         btrfs_next_leaf()
           --> releases the path
      
                                                    hard link added to our inode,
                                                    with key (257 INODE_REF 500)
                                                    added to the end of leaf X,
                                                    so leaf X now has N + 1 keys
      
           --> searches for the key
               (257 INODE_REF 256), because
               it was the last key in leaf X
               before it released the path,
               with path->keep_locks set to 1
      
           --> ends up at leaf X again and
               it verifies that the key
               (257 INODE_REF 256) is no longer
               the last key in the leaf, so it
               returns with path->nodes[0] ==
               leaf X and path->slots[0] == N,
               pointing to the new item with
               key (257 INODE_REF 500)
      
         the loop iteration of run_dealloc_nocow()
         does not break out the loop and continues
         because the key referenced in the path
         at path->nodes[0] and path->slots[0] is
         for inode 257, its type is < BTRFS_EXTENT_DATA_KEY
         and its offset (500) is less then our delalloc
         range's end (8192)
      
         the item pointed by the path, an inode reference item,
         is (incorrectly) interpreted as a file extent item and
         we get an invalid extent type, leading to the BUG_ON(1):
      
         if (extent_type == BTRFS_FILE_EXTENT_REG ||
            extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
             (...)
         } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
             (...)
         } else {
             BUG_ON(1)
         }
      
      The same can happen if a xattr is added concurrently and ends up having
      a key with an offset smaller then the delalloc's range end.
      
      So fix this by skipping keys with a type smaller than
      BTRFS_EXTENT_DATA_KEY.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      938165ad
    • Filipe Manana's avatar
      Btrfs: fix race leading to incorrect item deletion when dropping extents · c03844ae
      Filipe Manana authored
      [ Upstream commit aeafbf84
      
       ]
      
      While running a stress test I got the following warning triggered:
      
        [191627.672810] ------------[ cut here ]------------
        [191627.673949] WARNING: CPU: 8 PID: 8447 at fs/btrfs/file.c:779 __btrfs_drop_extents+0x391/0xa50 [btrfs]()
        (...)
        [191627.701485] Call Trace:
        [191627.702037]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
        [191627.702992]  [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
        [191627.704091]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
        [191627.705380]  [<ffffffffa0664499>] ? __btrfs_drop_extents+0x391/0xa50 [btrfs]
        [191627.706637]  [<ffffffff8104b46d>] warn_slowpath_null+0x1a/0x1c
        [191627.707789]  [<ffffffffa0664499>] __btrfs_drop_extents+0x391/0xa50 [btrfs]
        [191627.709155]  [<ffffffff8115663c>] ? cache_alloc_debugcheck_after.isra.32+0x171/0x1d0
        [191627.712444]  [<ffffffff81155007>] ? kmemleak_alloc_recursive.constprop.40+0x16/0x18
        [191627.714162]  [<ffffffffa06570c9>] insert_reserved_file_extent.constprop.40+0x83/0x24e [btrfs]
        [191627.715887]  [<ffffffffa065422b>] ? start_transaction+0x3bb/0x610 [btrfs]
        [191627.717287]  [<ffffffffa065b604>] btrfs_finish_ordered_io+0x273/0x4e2 [btrfs]
        [191627.728865]  [<ffffffffa065b888>] finish_ordered_fn+0x15/0x17 [btrfs]
        [191627.730045]  [<ffffffffa067d688>] normal_work_helper+0x14c/0x32c [btrfs]
        [191627.731256]  [<ffffffffa067d96a>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
        [191627.732661]  [<ffffffff81061119>] process_one_work+0x24c/0x4ae
        [191627.733822]  [<ffffffff810615b0>] worker_thread+0x206/0x2c2
        [191627.734857]  [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
        [191627.736052]  [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
        [191627.737349]  [<ffffffff810669a6>] kthread+0xef/0xf7
        [191627.738267]  [<ffffffff810f3b3a>] ? time_hardirqs_on+0x15/0x28
        [191627.739330]  [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
        [191627.741976]  [<ffffffff81465592>] ret_from_fork+0x42/0x70
        [191627.743080]  [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
        [191627.744206] ---[ end trace bbfddacb7aaada8d ]---
      
        $ cat -n fs/btrfs/file.c
        691  int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
        (...)
        758                  btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
        759                  if (key.objectid > ino ||
        760                      key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
        761                          break;
        762
        763                  fi = btrfs_item_ptr(leaf, path->slots[0],
        764                                      struct btrfs_file_extent_item);
        765                  extent_type = btrfs_file_extent_type(leaf, fi);
        766
        767                  if (extent_type == BTRFS_FILE_EXTENT_REG ||
        768                      extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
        (...)
        774                  } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
        (...)
        778                  } else {
        779                          WARN_ON(1);
        780                          extent_end = search_start;
        781                  }
        (...)
      
      This happened because the item we were processing did not match a file
      extent item (its key type != BTRFS_EXTENT_DATA_KEY), and even on this
      case we cast the item to a struct btrfs_file_extent_item pointer and
      then find a type field value that does not match any of the expected
      values (BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]). This scenario happens
      due to a tiny time window where a race can happen as exemplified below.
      For example, consider the following scenario where we're using the
      NO_HOLES feature and we have the following two neighbour leafs:
      
                     Leaf X (has N items)                    Leaf Y
      
      [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
                slot N - 2         slot N - 1              slot 0
      
      Our inode 257 has an implicit hole in the range [0, 8K[ (implicit rather
      than explicit because NO_HOLES is enabled). Now if our inode has an
      ordered extent for the range [4K, 8K[ that is finishing, the following
      can happen:
      
                CPU 1                                       CPU 2
      
        btrfs_finish_ordered_io()
          insert_reserved_file_extent()
            __btrfs_drop_extents()
               Searches for the key
                (257 EXTENT_DATA 4096) through
                btrfs_lookup_file_extent()
      
               Key not found and we get a path where
               path->nodes[0] == leaf X and
               path->slots[0] == N
      
               Because path->slots[0] is >=
               btrfs_header_nritems(leaf X), we call
               btrfs_next_leaf()
      
               btrfs_next_leaf() releases the path
      
                                                        inserts key
                                                        (257 INODE_REF 4096)
                                                        at the end of leaf X,
                                                        leaf X now has N + 1 keys,
                                                        and the new key is at
                                                        slot N
      
               btrfs_next_leaf() searches for
               key (257 INODE_REF 256), with
               path->keep_locks set to 1,
               because it was the last key it
               saw in leaf X
      
                 finds it in leaf X again and
                 notices it's no longer the last
                 key of the leaf, so it returns 0
                 with path->nodes[0] == leaf X and
                 path->slots[0] == N (which is now
                 < btrfs_header_nritems(leaf X)),
                 pointing to the new key
                 (257 INODE_REF 4096)
      
               __btrfs_drop_extents() casts the
               item at path->nodes[0], slot
               path->slots[0], to a struct
               btrfs_file_extent_item - it does
               not skip keys for the target
               inode with a type less than
               BTRFS_EXTENT_DATA_KEY
               (BTRFS_INODE_REF_KEY < BTRFS_EXTENT_DATA_KEY)
      
               sees a bogus value for the type
               field triggering the WARN_ON in
               the trace shown above, and sets
               extent_end = search_start (4096)
      
               does the if-then-else logic to
               fixup 0 length extent items created
               by a past bug from hole punching:
      
                 if (extent_end == key.offset &&
                     extent_end >= search_start)
                     goto delete_extent_item;
      
               that evaluates to true and it ends
               up deleting the key pointed to by
               path->slots[0], (257 INODE_REF 4096),
               from leaf X
      
      The same could happen for example for a xattr that ends up having a key
      with an offset value that matches search_start (very unlikely but not
      impossible).
      
      So fix this by ensuring that keys smaller than BTRFS_EXTENT_DATA_KEY are
      skipped, never casted to struct btrfs_file_extent_item and never deleted
      by accident. Also protect against the unexpected case of getting a key
      for a lower inode number by skipping that key and issuing a warning.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      c03844ae
    • Filipe Manana's avatar
      Btrfs: fix file corruption and data loss after cloning inline extents · d7ca88a4
      Filipe Manana authored
      [ Upstream commit 8039d87d
      
       ]
      
      Currently the clone ioctl allows to clone an inline extent from one file
      to another that already has other (non-inlined) extents. This is a problem
      because btrfs is not designed to deal with files having inline and regular
      extents, if a file has an inline extent then it must be the only extent
      in the file and must start at file offset 0. Having a file with an inline
      extent followed by regular extents results in EIO errors when doing reads
      or writes against the first 4K of the file.
      
      Also, the clone ioctl allows one to lose data if the source file consists
      of a single inline extent, with a size of N bytes, and the destination
      file consists of a single inline extent with a size of M bytes, where we
      have M > N. In this case the clone operation removes the inline extent
      from the destination file and then copies the inline extent from the
      source file into the destination file - we lose the M - N bytes from the
      destination file, a read operation will get the value 0x00 for any bytes
      in the the range [N, M] (the destination inode's i_size remained as M,
      that's why we can read past N bytes).
      
      So fix this by not allowing such destructive operations to happen and
      return errno EOPNOTSUPP to user space.
      
      Currently the fstest btrfs/035 tests the data loss case but it totally
      ignores this - i.e. expects the operation to succeed and does not check
      the we got data loss.
      
      The following test case for fstests exercises all these cases that result
      in file corruption and data loss:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _need_to_be_root
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_cloner
        _require_btrfs_fs_feature "no_holes"
        _require_btrfs_mkfs_feature "no-holes"
      
        rm -f $seqres.full
      
        test_cloning_inline_extents()
        {
            local mkfs_opts=$1
            local mount_opts=$2
      
            _scratch_mkfs $mkfs_opts >>$seqres.full 2>&1
            _scratch_mount $mount_opts
      
            # File bar, the source for all the following clone operations, consists
            # of a single inline extent (50 bytes).
            $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 50" $SCRATCH_MNT/bar \
                | _filter_xfs_io
      
            # Test cloning into a file with an extent (non-inlined) where the
            # destination offset overlaps that extent. It should not be possible to
            # clone the inline extent from file bar into this file.
            $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 16K" $SCRATCH_MNT/foo \
                | _filter_xfs_io
            $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo
      
            # Doing IO against any range in the first 4K of the file should work.
            # Due to a past clone ioctl bug which allowed cloning the inline extent,
            # these operations resulted in EIO errors.
            echo "File foo data after clone operation:"
            # All bytes should have the value 0xaa (clone operation failed and did
            # not modify our file).
            od -t x1 $SCRATCH_MNT/foo
            $XFS_IO_PROG -c "pwrite -S 0xcc 0 100" $SCRATCH_MNT/foo | _filter_xfs_io
      
            # Test cloning the inline extent against a file which has a hole in its
            # first 4K followed by a non-inlined extent. It should not be possible
            # as well to clone the inline extent from file bar into this file.
            $XFS_IO_PROG -f -c "pwrite -S 0xdd 4K 12K" $SCRATCH_MNT/foo2 \
                | _filter_xfs_io
            $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo2
      
            # Doing IO against any range in the first 4K of the file should work.
            # Due to a past clone ioctl bug which allowed cloning the inline extent,
            # these operations resulted in EIO errors.
            echo "File foo2 data after clone operation:"
            # All bytes should have the value 0x00 (clone operation failed and did
            # not modify our file).
            od -t x1 $SCRATCH_MNT/foo2
            $XFS_IO_PROG -c "pwrite -S 0xee 0 90" $SCRATCH_MNT/foo2 | _filter_xfs_io
      
            # Test cloning the inline extent against a file which has a size of zero
            # but has a prealloc extent. It should not be possible as well to clone
            # the inline extent from file bar into this file.
            $XFS_IO_PROG -f -c "falloc -k 0 1M" $SCRATCH_MNT/foo3 | _filter_xfs_io
            $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo3
      
            # Doing IO against any range in the first 4K of the file should work.
            # Due to a past clone ioctl bug which allowed cloning the inline extent,
            # these operations resulted in EIO errors.
            echo "First 50 bytes of foo3 after clone operation:"
            # Should not be able to read any bytes, file has 0 bytes i_size (the
            # clone operation failed and did not modify our file).
            od -t x1 $SCRATCH_MNT/foo3
            $XFS_IO_PROG -c "pwrite -S 0xff 0 90" $SCRATCH_MNT/foo3 | _filter_xfs_io
      
            # Test cloning the inline extent against a file which consists of a
            # single inline extent that has a size not greater than the size of
            # bar's inline extent (40 < 50).
            # It should be possible to do the extent cloning from bar to this file.
            $XFS_IO_PROG -f -c "pwrite -S 0x01 0 40" $SCRATCH_MNT/foo4 \
                | _filter_xfs_io
            $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo4
      
            # Doing IO against any range in the first 4K of the file should work.
            echo "File foo4 data after clone operation:"
            # Must match file bar's content.
            od -t x1 $SCRATCH_MNT/foo4
            $XFS_IO_PROG -c "pwrite -S 0x02 0 90" $SCRATCH_MNT/foo4 | _filter_xfs_io
      
            # Test cloning the inline extent against a file which consists of a
            # single inline extent that has a size greater than the size of bar's
            # inline extent (60 > 50).
            # It should not be possible to clone the inline extent from file bar
            # into this file.
            $XFS_IO_PROG -f -c "pwrite -S 0x03 0 60" $SCRATCH_MNT/foo5 \
                | _filter_xfs_io
            $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo5
      
            # Reading the file should not fail.
            echo "File foo5 data after clone operation:"
            # Must have a size of 60 bytes, with all bytes having a value of 0x03
            # (the clone operation failed and did not modify our file).
            od -t x1 $SCRATCH_MNT/foo5
      
            # Test cloning the inline extent against a file which has no extents but
            # has a size greater than bar's inline extent (16K > 50).
            # It should not be possible to clone the inline extent from file bar
            # into this file.
            $XFS_IO_PROG -f -c "truncate 16K" $SCRATCH_MNT/foo6 | _filter_xfs_io
            $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo6
      
            # Reading the file should not fail.
            echo "File foo6 data after clone operation:"
            # Must have a size of 16K, with all bytes having a value of 0x00 (the
            # clone operation failed and did not modify our file).
            od -t x1 $SCRATCH_MNT/foo6
      
            # Test cloning the inline extent against a file which has no extents but
            # has a size not greater than bar's inline extent (30 < 50).
            # It should be possible to clone the inline extent from file bar into
            # this file.
            $XFS_IO_PROG -f -c "truncate 30" $SCRATCH_MNT/foo7 | _filter_xfs_io
            $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo7
      
            # Reading the file should not fail.
            echo "File foo7 data after clone operation:"
            # Must have a size of 50 bytes, with all bytes having a value of 0xbb.
            od -t x1 $SCRATCH_MNT/foo7
      
            # Test cloning the inline extent against a file which has a size not
            # greater than the size of bar's inline extent (20 < 50) but has
            # a prealloc extent that goes beyond the file's size. It should not be
            # possible to clone the inline extent from bar into this file.
            $XFS_IO_PROG -f -c "falloc -k 0 1M" \
                            -c "pwrite -S 0x88 0 20" \
                            $SCRATCH_MNT/foo8 | _filter_xfs_io
            $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo8
      
            echo "File foo8 data after clone operation:"
            # Must have a size of 20 bytes, with all bytes having a value of 0x88
            # (the clone operation did not modify our file).
            od -t x1 $SCRATCH_MNT/foo8
      
            _scratch_unmount
        }
      
        echo -e "\nTesting without compression and without the no-holes feature...\n"
        test_cloning_inline_extents
      
        echo -e "\nTesting with compression and without the no-holes feature...\n"
        test_cloning_inline_extents "" "-o compress"
      
        echo -e "\nTesting without compression and with the no-holes feature...\n"
        test_cloning_inline_extents "-O no-holes" ""
      
        echo -e "\nTesting with compression and with the no-holes feature...\n"
        test_cloning_inline_extents "-O no-holes" "-o compress"
      
        status=0
        exit
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d7ca88a4
    • Quentin Casasnovas's avatar
      RDS: fix race condition when sending a message on unbound socket · 96c7b10c
      Quentin Casasnovas authored
      [ Upstream commit 8c7188b2 ]
      
      Sasha's found a NULL pointer dereference in the RDS connection code when
      sending a message to an apparently unbound socket.  The problem is caused
      by the code checking if the socket is bound in rds_sendmsg(), which checks
      the rs_bound_addr field without taking a lock on the socket.  This opens a
      race where rs_bound_addr is temporarily set but where the transport is not
      in rds_bind(), leading to a NULL pointer dereference when trying to
      dereference 'trans' in __rds_conn_create().
      
      Vegard wrote a reproducer for this issue, so kindly ask him to share if
      you're interested.
      
      I cannot reproduce the NULL pointer dereference using Vegard's reproducer
      with this patch, whereas I could without.
      
      Complete earlier incomplete fix to CVE-2015-6937:
      
        74e98eb0
      
       ("RDS: verify the underlying transport exists before creating a connection")
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: stable@vger.kernel.org
      
      Reviewed-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Reviewed-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarQuentin Casasnovas <quentin.casasnovas@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      96c7b10c
  4. Jan 16, 2016
    • Rainer Weikusat's avatar
      unix: avoid use-after-free in ep_remove_wait_queue · 72032798
      Rainer Weikusat authored
      [ Upstream commit 7d267278
      
       ]
      
      Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
      An AF_UNIX datagram socket being the client in an n:1 association with
      some server socket is only allowed to send messages to the server if the
      receive queue of this socket contains at most sk_max_ack_backlog
      datagrams. This implies that prospective writers might be forced to go
      to sleep despite none of the message presently enqueued on the server
      receive queue were sent by them. In order to ensure that these will be
      woken up once space becomes again available, the present unix_dgram_poll
      routine does a second sock_poll_wait call with the peer_wait wait queue
      of the server socket as queue argument (unix_dgram_recvmsg does a wake
      up on this queue after a datagram was received). This is inherently
      problematic because the server socket is only guaranteed to remain alive
      for as long as the client still holds a reference to it. In case the
      connection is dissolved via connect or by the dead peer detection logic
      in unix_dgram_sendmsg, the server socket may be freed despite "the
      polling mechanism" (in particular, epoll) still has a pointer to the
      corresponding peer_wait queue. There's no way to forcibly deregister a
      wait queue with epoll.
      
      Based on an idea by Jason Baron, the patch below changes the code such
      that a wait_queue_t belonging to the client socket is enqueued on the
      peer_wait queue of the server whenever the peer receive queue full
      condition is detected by either a sendmsg or a poll. A wake up on the
      peer queue is then relayed to the ordinary wait queue of the client
      socket via wake function. The connection to the peer wait queue is again
      dissolved if either a wake up is about to be relayed or the client
      socket reconnects or a dead peer is detected or the client socket is
      itself closed. This enables removing the second sock_poll_wait from
      unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
      that no blocked writer sleeps forever.
      
      Signed-off-by: default avatarRainer Weikusat <rweikusat@mobileactivedefense.com>
      Fixes: ec0d215f
      
       ("af_unix: fix 'poll for write'/connected DGRAM sockets")
      Reviewed-by: default avatarJason Baron <jbaron@akamai.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      72032798
    • Rainer Weikusat's avatar
      af_unix: Revert 'lock_interruptible' in stream receive code · 2950f7c8
      Rainer Weikusat authored
      [ Upstream commit 3822b5c2 ]
      
      With b3ca9b02
      
      , the AF_UNIX SOCK_STREAM
      receive code was changed from using mutex_lock(&u->readlock) to
      mutex_lock_interruptible(&u->readlock) to prevent signals from being
      delayed for an indefinite time if a thread sleeping on the mutex
      happened to be selected for handling the signal. But this was never a
      problem with the stream receive code (as opposed to its datagram
      counterpart) as that never went to sleep waiting for new messages with the
      mutex held and thus, wouldn't cause secondary readers to block on the
      mutex waiting for the sleeping primary reader. As the interruptible
      locking makes the code more complicated in exchange for no benefit,
      change it back to using mutex_lock.
      
      Signed-off-by: default avatarRainer Weikusat <rweikusat@mobileactivedefense.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2950f7c8
    • Hannes Frederic Sowa's avatar
      fou: clean up socket with kfree_rcu · 49d0edfd
      Hannes Frederic Sowa authored
      [ Upstream commit 3036facb ]
      
      fou->udp_offloads is managed by RCU. As it is actually included inside
      the fou sockets, we cannot let the memory go out of scope before a grace
      period. We either can synchronize_rcu or switch over to kfree_rcu to
      manage the sockets. kfree_rcu seems appropriate as it is used by vxlan
      and geneve.
      
      Fixes: 23461551
      
       ("fou: Support for foo-over-udp RX path")
      Cc: Tom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      49d0edfd
    • David S. Miller's avatar
      56619856
    • WANG Cong's avatar
      652ed6f6
    • Vlad Yasevich's avatar
      skbuff: Fix offset error in skb_reorder_vlan_header · eb2c241e
      Vlad Yasevich authored
      [ Upstream commit f6548615 ]
      
      skb_reorder_vlan_header is called after the vlan header has
      been pulled.  As a result the offset of the begining of
      the mac header has been incrased by 4 bytes (VLAN_HLEN).
      When moving the mac addresses, include this incrase in
      the offset calcualation so that the mac addresses are
      copied correctly.
      
      Fixes: a6e18ff1
      
       (vlan: Fix untag operations of stacked vlans with REORDER_HEADER off)
      CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      CC: Patrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarVladislav Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      eb2c241e
    • Vlad Yasevich's avatar
      vlan: Fix untag operations of stacked vlans with REORDER_HEADER off · 6ff6d3c9
      Vlad Yasevich authored
      [ Upstream commit a6e18ff1
      
       ]
      
      When we have multiple stacked vlan devices all of which have
      turned off REORDER_HEADER flag, the untag operation does not
      locate the ethernet addresses correctly for nested vlans.
      The reason is that in case of REORDER_HEADER flag being off,
      the outer vlan headers are put back and the mac_len is adjusted
      to account for the presense of the header.  Then, the subsequent
      untag operation, for the next level vlan, always use VLAN_ETH_HLEN
      to locate the begining of the ethernet header and that ends up
      being a multiple of 4 bytes short of the actuall beginning
      of the mac header (the multiple depending on the how many vlan
      encapsulations ethere are).
      
      As a reslult, if there are multiple levles of vlan devices
      with REODER_HEADER being off, the recevied packets end up
      being dropped.
      
      To solve this, we use skb->mac_len as the offset.  The value
      is always set on receive path and starts out as a ETH_HLEN.
      The value is also updated when the vlan header manupations occur
      so we know it will be correct.
      
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      6ff6d3c9
    • Eric Dumazet's avatar
      net: fix IP early demux races · 54b6eaa3
      Eric Dumazet authored
      [ Upstream commit 5037e9ef
      
       ]
      
      David Wilder reported crashes caused by dst reuse.
      
      <quote David>
        I am seeing a crash on a distro V4.2.3 kernel caused by a double
        release of a dst_entry.  In ipv4_dst_destroy() the call to
        list_empty() finds a poisoned next pointer, indicating the dst_entry
        has already been removed from the list and freed. The crash occurs
        18 to 24 hours into a run of a network stress exerciser.
      </quote>
      
      Thanks to his detailed report and analysis, we were able to understand
      the core issue.
      
      IP early demux can associate a dst to skb, after a lookup in TCP/UDP
      sockets.
      
      When socket cache is not properly set, we want to store into
      sk->sk_dst_cache the dst for future IP early demux lookups,
      by acquiring a stable refcount on the dst.
      
      Problem is this acquisition is simply using an atomic_inc(),
      which works well, unless the dst was queued for destruction from
      dst_release() noticing dst refcount went to zero, if DST_NOCACHE
      was set on dst.
      
      We need to make sure current refcount is not zero before incrementing
      it, or risk double free as David reported.
      
      This patch, being a stable candidate, adds two new helpers, and use
      them only from IP early demux problematic paths.
      
      It might be possible to merge in net-next skb_dst_force() and
      skb_dst_force_safe(), but I prefer having the smallest patch for stable
      kernels : Maybe some skb_dst_force() callers do not expect skb->dst
      can suddenly be cleared.
      
      Can probably be backported back to linux-3.6 kernels
      
      Reported-by: default avatarDavid J. Wilder <dwilder@us.ibm.com>
      Tested-by: default avatarDavid J. Wilder <dwilder@us.ibm.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      54b6eaa3