Skip to content
  1. Sep 13, 2015
    • Greg Kroah-Hartman's avatar
      Linux 3.14.52 · 48f8f36a
      Greg Kroah-Hartman authored
      v3.14.52
      48f8f36a
    • Marc Zyngier's avatar
      arm64: KVM: Fix host crash when injecting a fault into a 32bit guest · cd677748
      Marc Zyngier authored
      commit 126c69a0
      
       upstream.
      
      When injecting a fault into a misbehaving 32bit guest, it seems
      rather idiotic to also inject a 64bit fault that is only going
      to corrupt the guest state. This leads to a situation where we
      perform an illegal exception return at EL2 causing the host
      to crash instead of killing the guest.
      
      Just fix the stupid bug that has been there from day 1.
      
      Reported-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Tested-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd677748
    • Alan Stern's avatar
      SCSI: Fix NULL pointer dereference in runtime PM · 6a353c14
      Alan Stern authored
      commit 49718f0f
      
       upstream.
      
      The routines in scsi_rpm.c assume that if a runtime-PM callback is
      invoked for a SCSI device, it can only mean that the device's driver
      has asked the block layer to handle the runtime power management (by
      calling blk_pm_runtime_init(), which among other things sets q->dev).
      
      However, this assumption turns out to be wrong for things like the ses
      driver.  Normally ses devices are not allowed to do runtime PM, but
      userspace can override this setting.  If this happens, the kernel gets
      a NULL pointer dereference when blk_post_runtime_resume() tries to use
      the uninitialized q->dev pointer.
      
      This patch fixes the problem by calling the block layer's runtime-PM
      routines only if the device's driver really does have a runtime-PM
      callback routine.  Since ses doesn't define any such callbacks, the
      crash won't occur.
      
      This fixes Bugzilla #101371.
      
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarStanisław Pitucha <viraptor@gmail.com>
      Reported-by: default avatarIlan Cohen <ilanco@gmail.com>
      Tested-by: default avatarIlan Cohen <ilanco@gmail.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a353c14
    • Yann Droneaud's avatar
      arm64/mm: Remove hack in mmap randomize layout · 8cb907d7
      Yann Droneaud authored
      commit d6c763af upstream.
      
      Since commit 8a0a9bd4 ('random: make get_random_int() more
      random'), get_random_int() returns a random value for each call,
      so comment and hack introduced in mmap_rnd() as part of commit
      1d18c47c ('arm64: MMU fault handling and page table management')
      are incorrects.
      
      Commit 1d18c47c seems to use the same hack introduced by
      commit a5adc91a ('powerpc: Ensure random space between stack
      and mmaps'), latter copied in commit 5a0efea0 ('sparc64: Sharpen
      address space randomization calculations.').
      
      But both architectures were cleaned up as part of commit
      fa8cbaaf ('powerpc+sparc64/mm: Remove hack in mmap randomize
      layout') as hack is no more needed since commit 8a0a9bd4
      
      .
      
      So the present patch removes the comment and the hack around
      get_random_int() on AArch64's mmap_rnd().
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Acked-by: default avatarDan McGee <dpmcgee@gmail.com>
      Signed-off-by: default avatarYann Droneaud <ydroneaud@opteya.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Cc: Matthias Brugger <mbrugger@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8cb907d7
    • Horia Geant?'s avatar
      crypto: caam - fix memory corruption in ahash_final_ctx · 66df6a5d
      Horia Geant? authored
      commit b310c178 upstream.
      
      When doing pointer operation for accessing the HW S/G table,
      a value representing number of entries (and not number of bytes)
      must be used.
      
      Fixes: 045e3678
      
       ("crypto: caam - ahash hmac support")
      Signed-off-by: default avatarHoria Geant? <horia.geanta@freescale.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      66df6a5d
    • Guenter Roeck's avatar
      regmap: regcache-rbtree: Clean new present bits on present bitmap resize · 0d57510d
      Guenter Roeck authored
      commit 8ef9724b upstream.
      
      When inserting a new register into a block, the present bit map size is
      increased using krealloc. krealloc does not clear the additionally
      allocated memory, leaving it filled with random values. Result is that
      some registers are considered cached even though this is not the case.
      
      Fix the problem by clearing the additionally allocated memory. Also, if
      the bitmap size does not increase, do not reallocate the bitmap at all
      to reduce overhead.
      
      Fixes: 3f4ff561
      
       ("regmap: rbtree: Make cache_present bitmap per node")
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d57510d
    • Bart Van Assche's avatar
      libfc: Fix fc_fcp_cleanup_each_cmd() · 56568d1f
      Bart Van Assche authored
      commit 8f2777f5
      
       upstream.
      
      Since fc_fcp_cleanup_cmd() can sleep this function must not
      be called while holding a spinlock. This patch avoids that
      fc_fcp_cleanup_each_cmd() triggers the following bug:
      
      BUG: scheduling while atomic: sg_reset/1512/0x00000202
      1 lock held by sg_reset/1512:
       #0:  (&(&fsp->scsi_pkt_lock)->rlock){+.-...}, at: [<ffffffffc0225cd5>] fc_fcp_cleanup_each_cmd.isra.21+0xa5/0x150 [libfc]
      Preemption disabled at:[<ffffffffc0225cd5>] fc_fcp_cleanup_each_cmd.isra.21+0xa5/0x150 [libfc]
      Call Trace:
       [<ffffffff816c612c>] dump_stack+0x4f/0x7b
       [<ffffffff810828bc>] __schedule_bug+0x6c/0xd0
       [<ffffffff816c87aa>] __schedule+0x71a/0xa10
       [<ffffffff816c8ad2>] schedule+0x32/0x80
       [<ffffffffc0217eac>] fc_seq_set_resp+0xac/0x100 [libfc]
       [<ffffffffc0218b11>] fc_exch_done+0x41/0x60 [libfc]
       [<ffffffffc0225cff>] fc_fcp_cleanup_each_cmd.isra.21+0xcf/0x150 [libfc]
       [<ffffffffc0225f43>] fc_eh_device_reset+0x1c3/0x270 [libfc]
       [<ffffffff814a2cc9>] scsi_try_bus_device_reset+0x29/0x60
       [<ffffffff814a3908>] scsi_ioctl_reset+0x258/0x2d0
       [<ffffffff814a2650>] scsi_ioctl+0x150/0x440
       [<ffffffff814b3a9d>] sd_ioctl+0xad/0x120
       [<ffffffff8132f266>] blkdev_ioctl+0x1b6/0x810
       [<ffffffff811da608>] block_ioctl+0x38/0x40
       [<ffffffff811b4e08>] do_vfs_ioctl+0x2f8/0x530
       [<ffffffff811b50c1>] SyS_ioctl+0x81/0xa0
       [<ffffffff816cf8b2>] system_call_fastpath+0x16/0x7a
      
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: default avatarVasu Dev <vasu.dev@intel.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      56568d1f
    • Bart Van Assche's avatar
      libfc: Fix fc_exch_recv_req() error path · cdef3013
      Bart Van Assche authored
      commit f6979ade upstream.
      
      Due to patch "libfc: Do not invoke the response handler after
      fc_exch_done()" (commit ID 7030fd62
      
      ) the lport_recv() call
      in fc_exch_recv_req() is passed a dangling pointer. Avoid this
      by moving the fc_frame_free() call from fc_invoke_resp() to its
      callers. This patch fixes the following crash:
      
      general protection fault: 0000 [#3] PREEMPT SMP
      RIP: fc_lport_recv_req+0x72/0x280 [libfc]
      Call Trace:
       fc_exch_recv+0x642/0xde0 [libfc]
       fcoe_percpu_receive_thread+0x46a/0x5ed [fcoe]
       kthread+0x10a/0x120
       ret_from_fork+0x42/0x70
      
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: default avatarVasu Dev <vasu.dev@intel.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cdef3013
    • Thomas Hellstrom's avatar
      drm/vmwgfx: Fix execbuf locking issues · d2aacf48
      Thomas Hellstrom authored
      commit 3e04e2fe
      
       upstream.
      
      This addresses two issues that cause problems with viewperf maya-03 in
      situation with memory pressure.
      
      The first issue causes attempts to unreserve buffers if batched
      reservation fails due to, for example, a signal pending. While previously
      the ttm_eu api was resistant against this type of error, it is no longer
      and the lockdep code will complain about attempting to unreserve buffers
      that are not reserved. The issue is resolved by avoid calling
      ttm_eu_backoff_reservation in the buffer reserve error path.
      
      The second issue is that the binding_mutex may be held when user-space
      fence objects are created and hence during memory reclaims. This may cause
      recursive attempts to grab the binding mutex. The issue is resolved by not
      holding the binding mutex across fence creation and submission.
      
      Signed-off-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: default avatarSinclair Yeh <syeh@vmware.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2aacf48
    • Alex Deucher's avatar
      drm/radeon: add new OLAND pci id · 09e72860
      Alex Deucher authored
      commit e037239e
      
       upstream.
      
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      09e72860
    • Michael Walle's avatar
      EDAC, ppc4xx: Access mci->csrows array elements properly · 8fe179e3
      Michael Walle authored
      commit 5c16179b upstream.
      
      The commit
      
        de3910eb
      
       ("edac: change the mem allocation scheme to
      		 make Documentation/kobject.txt happy")
      
      changed the memory allocation for the csrows member. But ppc4xx_edac was
      forgotten in the patch. Fix it.
      
      Signed-off-by: default avatarMichael Walle <michael@walle.cc>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Link: http://lkml.kernel.org/r/1437469253-8611-1-git-send-email-michael@walle.cc
      
      
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8fe179e3
    • Richard Weinberger's avatar
      localmodconfig: Use Kbuild files too · 89651211
      Richard Weinberger authored
      commit c0ddc8c7 upstream.
      
      In kbuild it is allowed to define objects in files named "Makefile"
      and "Kbuild".
      Currently localmodconfig reads objects only from "Makefile"s and misses
      modules like nouveau.
      
      Link: http://lkml.kernel.org/r/1437948415-16290-1-git-send-email-richard@nod.at
      
      
      
      Reported-and-tested-by: default avatarLeonidas Spyropoulos <artafinde@gmail.com>
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      89651211
    • Joe Thornber's avatar
      dm thin metadata: delete btrees when releasing metadata snapshot · e7a778e9
      Joe Thornber authored
      commit 7f518ad0
      
       upstream.
      
      The device details and mapping trees were just being decremented
      before.  Now btree_del() is called to do a deep delete.
      
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e7a778e9
    • Peter Zijlstra's avatar
      perf: Fix PERF_EVENT_IOC_PERIOD migration race · 5862cc57
      Peter Zijlstra authored
      commit c7999c6f
      
       upstream.
      
      I ran the perf fuzzer, which triggered some WARN()s which are due to
      trying to stop/restart an event on the wrong CPU.
      
      Use the normal IPI pattern to ensure we run the code on the correct CPU.
      
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: bad7192b
      
       ("perf: Fix PERF_EVENT_IOC_PERIOD to force-reset the period")
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5862cc57
    • Peter Zijlstra's avatar
      perf: Fix fasync handling on inherited events · cf766f63
      Peter Zijlstra authored
      commit fed66e2c
      
       upstream.
      
      Vince reported that the fasync signal stuff doesn't work proper for
      inherited events. So fix that.
      
      Installing fasync allocates memory and sets filp->f_flags |= FASYNC,
      which upon the demise of the file descriptor ensures the allocation is
      freed and state is updated.
      
      Now for perf, we can have the events stick around for a while after the
      original FD is dead because of references from child events. So we
      cannot copy the fasync pointer around. We can however consistently use
      the parent's fasync, as that will be updated.
      
      Reported-and-Tested-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho deMelo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twins
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf766f63
    • Bob Liu's avatar
      xen-blkfront: don't add indirect pages to list when !feature_persistent · ee159186
      Bob Liu authored
      commit 7b076750
      
       upstream.
      
      We should consider info->feature_persistent when adding indirect page to list
      info->indirect_pages, else the BUG_ON() in blkif_free() would be triggered.
      
      When we are using persistent grants the indirect_pages list
      should always be empty because blkfront has pre-allocated enough
      persistent pages to fill all requests on the ring.
      
      Acked-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: default avatarBob Liu <bob.liu@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee159186
    • Wanpeng Li's avatar
      mm/hwpoison: fix page refcount of unknown non LRU page · 97091ac3
      Wanpeng Li authored
      commit 4f32be67
      
       upstream.
      
      After trying to drain pages from pagevec/pageset, we try to get reference
      count of the page again, however, the reference count of the page is not
      reduced if the page is still not on LRU list.
      
      Fix it by adding the put_page() to drop the page reference which is from
      __get_any_page().
      
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Acked-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      97091ac3
    • Manfred Spraul's avatar
      ipc/sem.c: update/correct memory barriers · 502b83be
      Manfred Spraul authored
      commit 3ed1f8a9
      
       upstream.
      
      sem_lock() did not properly pair memory barriers:
      
      !spin_is_locked() and spin_unlock_wait() are both only control barriers.
      The code needs an acquire barrier, otherwise the cpu might perform read
      operations before the lock test.
      
      As no primitive exists inside <include/spinlock.h> and since it seems
      noone wants another primitive, the code creates a local primitive within
      ipc/sem.c.
      
      With regards to -stable:
      
      The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
      nop (just a preprocessor redefinition to improve the readability).  The
      bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
      starting from 3.10).
      
      Signed-off-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Reported-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Kirill Tkhai <ktkhai@parallels.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      502b83be
    • Herton R. Krzesinski's avatar
      ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits · 9ab6ec25
      Herton R. Krzesinski authored
      commit 602b8593
      
       upstream.
      
      The current semaphore code allows a potential use after free: in
      exit_sem we may free the task's sem_undo_list while there is still
      another task looping through the same semaphore set and cleaning the
      sem_undo list at freeary function (the task called IPC_RMID for the same
      semaphore set).
      
      For example, with a test program [1] running which keeps forking a lot
      of processes (which then do a semop call with SEM_UNDO flag), and with
      the parent right after removing the semaphore set with IPC_RMID, and a
      kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
      CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
      in the kernel log:
      
         Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
         000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
         010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
         Prev obj: start=ffff88003b45c180, len=64
         000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
         010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
         Next obj: start=ffff88003b45c200, len=64
         000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
         010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).<....
         BUG: spinlock wrong CPU on CPU#2, test/18028
         general protection fault: 0000 [#1] SMP
         Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
         CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
         RIP: spin_dump+0x53/0xc0
         Call Trace:
           spin_bug+0x30/0x40
           do_raw_spin_unlock+0x71/0xa0
           _raw_spin_unlock+0xe/0x10
           freeary+0x82/0x2a0
           ? _raw_spin_lock+0xe/0x10
           semctl_down.clone.0+0xce/0x160
           ? __do_page_fault+0x19a/0x430
           ? __audit_syscall_entry+0xa8/0x100
           SyS_semctl+0x236/0x2c0
           ? syscall_trace_leave+0xde/0x130
           entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
         RIP  [<ffffffff810d6053>] spin_dump+0x53/0xc0
          RSP <ffff88003750fd68>
         ---[ end trace 783ebb76612867a0 ]---
         NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
         Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
         CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ #1
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
         RIP: native_read_tsc+0x0/0x20
         Call Trace:
           ? delay_tsc+0x40/0x70
           __delay+0xf/0x20
           do_raw_spin_lock+0x96/0x140
           _raw_spin_lock+0xe/0x10
           sem_lock_and_putref+0x11/0x70
           SYSC_semtimedop+0x7bf/0x960
           ? handle_mm_fault+0xbf6/0x1880
           ? dequeue_task_fair+0x79/0x4a0
           ? __do_page_fault+0x19a/0x430
           ? kfree_debugcheck+0x16/0x40
           ? __do_page_fault+0x19a/0x430
           ? __audit_syscall_entry+0xa8/0x100
           ? do_audit_syscall_entry+0x66/0x70
           ? syscall_trace_enter_phase1+0x139/0x160
           SyS_semtimedop+0xe/0x10
           SyS_semop+0x10/0x20
           entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
         Kernel panic - not syncing: softlockup: hung tasks
      
      I wasn't able to trigger any badness on a recent kernel without the
      proper config debugs enabled, however I have softlockup reports on some
      kernel versions, in the semaphore code, which are similar as above (the
      scenario is seen on some servers running IBM DB2 which uses semaphore
      syscalls).
      
      The patch here fixes the race against freeary, by acquiring or waiting
      on the sem_undo_list lock as necessary (exit_sem can race with freeary,
      while freeary sets un->semid to -1 and removes the same sem_undo from
      list_proc or when it removes the last sem_undo).
      
      After the patch I'm unable to reproduce the problem using the test case
      [1].
      
      [1] Test case used below:
      
          #include <stdio.h>
          #include <sys/types.h>
          #include <sys/ipc.h>
          #include <sys/sem.h>
          #include <sys/wait.h>
          #include <stdlib.h>
          #include <time.h>
          #include <unistd.h>
          #include <errno.h>
      
          #define NSEM 1
          #define NSET 5
      
          int sid[NSET];
      
          void thread()
          {
                  struct sembuf op;
                  int s;
                  uid_t pid = getuid();
      
                  s = rand() % NSET;
                  op.sem_num = pid % NSEM;
                  op.sem_op = 1;
                  op.sem_flg = SEM_UNDO;
      
                  semop(sid[s], &op, 1);
                  exit(EXIT_SUCCESS);
          }
      
          void create_set()
          {
                  int i, j;
                  pid_t p;
                  union {
                          int val;
                          struct semid_ds *buf;
                          unsigned short int *array;
                          struct seminfo *__buf;
                  } un;
      
                  /* Create and initialize semaphore set */
                  for (i = 0; i < NSET; i++) {
                          sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                          if (sid[i] < 0) {
                                  perror("semget");
                                  exit(EXIT_FAILURE);
                          }
                  }
                  un.val = 0;
                  for (i = 0; i < NSET; i++) {
                          for (j = 0; j < NSEM; j++) {
                                  if (semctl(sid[i], j, SETVAL, un) < 0)
                                          perror("semctl");
                          }
                  }
      
                  /* Launch threads that operate on semaphore set */
                  for (i = 0; i < NSEM * NSET * NSET; i++) {
                          p = fork();
                          if (p < 0)
                                  perror("fork");
                          if (p == 0)
                                  thread();
                  }
      
                  /* Free semaphore set */
                  for (i = 0; i < NSET; i++) {
                          if (semctl(sid[i], NSEM, IPC_RMID))
                                  perror("IPC_RMID");
                  }
      
                  /* Wait for forked processes to exit */
                  while (wait(NULL)) {
                          if (errno == ECHILD)
                                  break;
                  };
          }
      
          int main(int argc, char **argv)
          {
                  pid_t p;
      
                  srand(time(NULL));
      
                  while (1) {
                          p = fork();
                          if (p < 0) {
                                  perror("fork");
                                  exit(EXIT_FAILURE);
                          }
                          if (p == 0) {
                                  create_set();
                                  goto end;
                          }
      
                          /* Wait for forked processes to exit */
                          while (wait(NULL)) {
                                  if (errno == ECHILD)
                                          break;
                          };
                  }
          end:
                  return 0;
          }
      
      [akpm@linux-foundation.org: use normal comment layout]
      Signed-off-by: default avatarHerton R. Krzesinski <herton@redhat.com>
      Acked-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Rafael Aquini <aquini@redhat.com>
      CC: Aristeu Rozanski <aris@redhat.com>
      Cc: David Jeffery <djeffery@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9ab6ec25
  2. Aug 17, 2015
    • Greg Kroah-Hartman's avatar
      Linux 3.14.51 · 318ff69c
      Greg Kroah-Hartman authored
      v3.14.51
      318ff69c
    • Michal Hocko's avatar
      mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations · 0fa4301f
      Michal Hocko authored
      commit ecf5fc6e upstream.
      
      Nikolay has reported a hang when a memcg reclaim got stuck with the
      following backtrace:
      
      PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
        #0 __schedule at ffffffff815ab152
        #1 schedule at ffffffff815ab76e
        #2 schedule_timeout at ffffffff815ae5e5
        #3 io_schedule_timeout at ffffffff815aad6a
        #4 bit_wait_io at ffffffff815abfc6
        #5 __wait_on_bit at ffffffff815abda5
        #6 wait_on_page_bit at ffffffff8111fd4f
        #7 shrink_page_list at ffffffff81135445
        #8 shrink_inactive_list at ffffffff81135845
        #9 shrink_lruvec at ffffffff81135ead
       #10 shrink_zone at ffffffff811360c3
       #11 shrink_zones at ffffffff81136eff
       #12 do_try_to_free_pages at ffffffff8113712f
       #13 try_to_free_mem_cgroup_pages at ffffffff811372be
       #14 try_charge at ffffffff81189423
       #15 mem_cgroup_try_charge at ffffffff8118c6f5
       #16 __add_to_page_cache_locked at ffffffff8112137d
       #17 add_to_page_cache_lru at ffffffff81121618
       #18 pagecache_get_page at ffffffff8112170b
       #19 grow_dev_page at ffffffff811c8297
       #20 __getblk_slow at ffffffff811c91d6
       #21 __getblk_gfp at ffffffff811c92c1
       #22 ext4_ext_grow_indepth at ffffffff8124565c
       #23 ext4_ext_create_new_leaf at ffffffff81246ca8
       #24 ext4_ext_insert_extent at ffffffff81246f09
       #25 ext4_ext_map_blocks at ffffffff8124a848
       #26 ext4_map_blocks at ffffffff8121a5b7
       #27 mpage_map_one_extent at ffffffff8121b1fa
       #28 mpage_map_and_submit_extent at ffffffff8121f07b
       #29 ext4_writepages at ffffffff8121f6d5
       #30 do_writepages at ffffffff8112c490
       #31 __filemap_fdatawrite_range at ffffffff81120199
       #32 filemap_flush at ffffffff8112041c
       #33 ext4_alloc_da_blocks at ffffffff81219da1
       #34 ext4_rename at ffffffff81229b91
       #35 ext4_rename2 at ffffffff81229e32
       #36 vfs_rename at ffffffff811a08a5
       #37 SYSC_renameat2 at ffffffff811a3ffc
       #38 sys_renameat2 at ffffffff811a408e
       #39 sys_rename at ffffffff8119e51e
       #40 system_call_fastpath at ffffffff815afa89
      
      Dave Chinner has properly pointed out that this is a deadlock in the
      reclaim code because ext4 doesn't submit pages which are marked by
      PG_writeback right away.
      
      The heuristic was introduced by commit e62e384e ("memcg: prevent OOM
      with too many dirty pages") and it was applied only when may_enter_fs
      was specified.  The code has been changed by c3b94f44 ("memcg:
      further prevent OOM with too many dirty pages") which has removed the
      __GFP_FS restriction with a reasoning that we do not get into the fs
      code.  But this is not sufficient apparently because the fs doesn't
      necessarily submit pages marked PG_writeback for IO right away.
      
      ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
      submit the bio.  Instead it tries to map more pages into the bio and
      mpage_map_one_extent might trigger memcg charge which might end up
      waiting on a page which is marked PG_writeback but hasn't been submitted
      yet so we would end up waiting for something that never finishes.
      
      Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
      before we go to wait on the writeback.  The page fault path, which is
      the only path that triggers memcg oom killer since 3.12, shouldn't
      require GFP_NOFS and so we shouldn't reintroduce the premature OOM
      killer issue which was originally addressed by the heuristic.
      
      As per David Chinner the xfs is doing similar thing since 2.6.15 already
      so ext4 is not the only affected filesystem.  Moreover he notes:
      
      : For example: IO completion might require unwritten extent conversion
      : which executes filesystem transactions and GFP_NOFS allocations. The
      : writeback flag on the pages can not be cleared until unwritten
      : extent conversion completes. Hence memory reclaim cannot wait on
      : page writeback to complete in GFP_NOFS context because it is not
      : safe to do so, memcg reclaim or otherwise.
      
      [tytso@mit.edu: corrected the control flow]
      Fixes: c3b94f44
      
       ("memcg: further prevent OOM with too many dirty pages")
      Reported-by: default avatarNikolay Borisov <kernel@kyup.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      0fa4301f
    • NeilBrown's avatar
      md/bitmap: return an error when bitmap superblock is corrupt. · ed969167
      NeilBrown authored
      commit b97e9257 upstream
          Use separate bitmaps for each nodes in the cluster
      
      bitmap_read_sb() validates the bitmap superblock that it reads in.
      If it finds an inconsistency like a bad magic number or out-of-range
      version number, it prints an error and returns, but it incorrectly
      returns zero, so the array is still assembled with the (invalid) bitmap.
      
      This means it could try to use a bitmap with a new version number which
      it therefore does not understand.
      
      This bug was introduced in 3.5 and fix as part of a larger patch in 4.1.
      So the patch is suitable for any -stable kernel in that range.
      
      Fixes: 27581e5a
      
       ("md/bitmap: centralise allocation of bitmap file pages.")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Reported-by: default avatarGuoQing Jiang <gqjiang@suse.com>
      ed969167
    • Al Viro's avatar
      path_openat(): fix double fput() · 88b4f377
      Al Viro authored
      commit f15133df
      
       upstream.
      
      path_openat() jumps to the wrong place after do_tmpfile() - it has
      already done path_cleanup() (as part of path_lookupat() called by
      do_tmpfile()), so doing that again can lead to double fput().
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      88b4f377
    • Paolo Bonzini's avatar
      kvm: x86: fix kvm_apic_has_events to check for NULL pointer · c76b576d
      Paolo Bonzini authored
      commit ce40cd3f
      
       upstream.
      
      Malicious (or egregiously buggy) userspace can trigger it, but it
      should never happen in normal operation.
      
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarWang Kai <morgan.wang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c76b576d
    • Miklos Szeredi's avatar
      dcache: don't need rcu in shrink_dentry_list() · 635fa0fd
      Miklos Szeredi authored
      commit 60942f2f
      
       upstream.
      
      Since now the shrink list is private and nobody can free the dentry while
      it is on the shrink list, we can remove RCU protection from this.
      
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      635fa0fd
    • Al Viro's avatar
      more graceful recovery in umount_collect() · 85903ac9
      Al Viro authored
      commit 9c8c10e2
      
       upstream.
      
      Start with shrink_dcache_parent(), then scan what remains.
      
      First of all, BUG() is very much an overkill here; we are holding
      ->s_umount, and hitting BUG() means that a lot of interesting stuff
      will be hanging after that point (sync(2), for example).  Moreover,
      in cases when there had been more than one leak, we'll be better
      off reporting all of them.  And more than just the last component
      of pathname - %pd is there for just such uses...
      
      That was the last user of dentry_lru_del(), so kill it off...
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      85903ac9
    • Al Viro's avatar
      don't remove from shrink list in select_collect() · c214cb82
      Al Viro authored
      commit fe91522a
      
       upstream.
      
      	If we find something already on a shrink list, just increment
      data->found and do nothing else.  Loops in shrink_dcache_parent() and
      check_submounts_and_drop() will do the right thing - everything we
      did put into our list will be evicted and if there had been nothing,
      but data->found got non-zero, well, we have somebody else shrinking
      those guys; just try again.
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c214cb82
    • Al Viro's avatar
      dentry_kill(): don't try to remove from shrink list · cb55ed7f
      Al Viro authored
      commit 41edf278
      
       upstream.
      
      If the victim in on the shrink list, don't remove it from there.
      If shrink_dentry_list() manages to remove it from the list before
      we are done - fine, we'll just free it as usual.  If not - mark
      it with new flag (DCACHE_MAY_FREE) and leave it there.
      
      Eventually, shrink_dentry_list() will get to it, remove the sucker
      from shrink list and call dentry_kill(dentry, 0).  Which is where
      we'll deal with freeing.
      
      Since now dentry_kill(dentry, 0) may happen after or during
      dentry_kill(dentry, 1), we need to recognize that (by seeing
      DCACHE_DENTRY_KILLED already set), unlock everything
      and either free the sucker (in case DCACHE_MAY_FREE has been
      set) or leave it for ongoing dentry_kill(dentry, 1) to deal with.
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb55ed7f
    • Al Viro's avatar
      expand the call of dentry_lru_del() in dentry_kill() · 065c70fd
      Al Viro authored
      commit 01b60351
      
       upstream.
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      065c70fd
    • Al Viro's avatar
      new helper: dentry_free() · 5d81116c
      Al Viro authored
      commit b4f0354e
      
       upstream.
      
      The part of old d_free() that dealt with actual freeing of dentry.
      Taken out of dentry_kill() into a separate function.
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d81116c
    • Al Viro's avatar
      fold try_prune_one_dentry() · a76c1ead
      Al Viro authored
      commit 5c47e6d0
      
       upstream.
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a76c1ead
    • Al Viro's avatar
      fold d_kill() and d_free() · 79575968
      Al Viro authored
      commit 03b3b889
      
       upstream.
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      79575968
    • Amanieu d'Antras's avatar
      signal: fix information leak in copy_siginfo_from_user32 · a209694c
      Amanieu d'Antras authored
      commit 3c00cb5e
      
       upstream.
      
      This function can leak kernel stack data when the user siginfo_t has a
      positive si_code value.  The top 16 bits of si_code descibe which fields
      in the siginfo_t union are active, but they are treated inconsistently
      between copy_siginfo_from_user32, copy_siginfo_to_user32 and
      copy_siginfo_to_user.
      
      copy_siginfo_from_user32 is called from rt_sigqueueinfo and
      rt_tgsigqueueinfo in which the user has full control overthe top 16 bits
      of si_code.
      
      This fixes the following information leaks:
      x86:   8 bytes leaked when sending a signal from a 32-bit process to
             itself. This leak grows to 16 bytes if the process uses x32.
             (si_code = __SI_CHLD)
      x86:   100 bytes leaked when sending a signal from a 32-bit process to
             a 64-bit process. (si_code = -1)
      sparc: 4 bytes leaked when sending a signal from a 32-bit process to a
             64-bit process. (si_code = any)
      
      parsic and s390 have similar bugs, but they are not vulnerable because
      rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code
      to a different process.  These bugs are also fixed for consistency.
      
      Signed-off-by: default avatarAmanieu d'Antras <amanieu@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a209694c
    • Amanieu d'Antras's avatar
      signal: fix information leak in copy_siginfo_to_user · d096e807
      Amanieu d'Antras authored
      commit 26135022
      
       upstream.
      
      This function may copy the si_addr_lsb, si_lower and si_upper fields to
      user mode when they haven't been initialized, which can leak kernel
      stack data to user mode.
      
      Just checking the value of si_code is insufficient because the same
      si_code value is shared between multiple signals.  This is solved by
      checking the value of si_signo in addition to si_code.
      
      Signed-off-by: default avatarAmanieu d'Antras <amanieu@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d096e807
    • Amanieu d'Antras's avatar
      signalfd: fix information leak in signalfd_copyinfo · fda4653f
      Amanieu d'Antras authored
      commit 3ead7c52
      
       upstream.
      
      This function may copy the si_addr_lsb field to user mode when it hasn't
      been initialized, which can leak kernel stack data to user mode.
      
      Just checking the value of si_code is insufficient because the same
      si_code value is shared between multiple signals.  This is solved by
      checking the value of si_signo in addition to si_code.
      
      Signed-off-by: default avatarAmanieu d'Antras <amanieu@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fda4653f
    • Al Viro's avatar
      sg_start_req(): make sure that there's not too many elements in iovec · 08ac1787
      Al Viro authored
      commit 451a2886
      
       upstream.
      
      unfortunately, allowing an arbitrary 16bit value means a possibility of
      overflow in the calculation of total number of pages in bio_map_user_iov() -
      we rely on there being no more than PAGE_SIZE members of sum in the
      first loop there.  If that sum wraps around, we end up allocating
      too small array of pointers to pages and it's easy to overflow it in
      the second loop.
      
      X-Coverup: TINC (and there's no lumber cartel either)
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      [bwh: s/MAX_UIOVEC/UIO_MAXIOV/. This was fixed upstream by commit
       fdc81f45
      
       ("sg_start_req(): use import_iovec()"), but we don't have
        that function.]
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      08ac1787
    • NeilBrown's avatar
      md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies · bb23b9bd
      NeilBrown authored
      commit 423f04d6 upstream.
      
      raid1_end_read_request() assumes that the In_sync bits are consistent
      with the ->degaded count.
      raid1_spare_active updates the In_sync bit before the ->degraded count
      and so exposes an inconsistency, as does error()
      So extend the spinlock in raid1_spare_active() and error() to hide those
      inconsistencies.
      
      This should probably be part of
        Commit: 34cab6f4 ("md/raid1: fix test for 'was read error from
        last working device'.")
      as it addresses the same issue.  It fixes the same bug and should go
      to -stable for same reasons.
      
      Fixes: 76073054
      
       ("md/raid1: clean up read_balance.")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bb23b9bd
    • Michael S. Tsirkin's avatar
      PCI: Restore PCI_MSIX_FLAGS_BIRMASK definition · 57469339
      Michael S. Tsirkin authored
      commit c9ddbac9 upstream.
      
      09a2c73d
      
       ("PCI: Remove unused PCI_MSIX_FLAGS_BIRMASK definition")
      removed PCI_MSIX_FLAGS_BIRMASK from an exported header because it was
      unused in the kernel.  But that breaks user programs that were using it
      (QEMU in particular).
      
      Restore the PCI_MSIX_FLAGS_BIRMASK definition.
      
      [bhelgaas: changelog]
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      57469339
    • Joseph Qi's avatar
      ocfs2: fix BUG in ocfs2_downconvert_thread_do_work() · f39afff6
      Joseph Qi authored
      commit 209f7512
      
       upstream.
      
      The "BUG_ON(list_empty(&osb->blocked_lock_list))" in
      ocfs2_downconvert_thread_do_work can be triggered in the following case:
      
      ocfs2dc has firstly saved osb->blocked_lock_count to local varibale
      processed, and then processes the dentry lockres.  During the dentry
      put, it calls iput and then deletes rw, inode and open lockres from
      blocked list in ocfs2_mark_lockres_freeing.  And this causes the
      variable `processed' to not reflect the number of blocked lockres to be
      processed, which triggers the BUG.
      
      Signed-off-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f39afff6
    • Marcus Gelderie's avatar
      ipc: modify message queue accounting to not take kernel data structures into account · f5dd4ec9
      Marcus Gelderie authored
      commit de54b9ac upstream.
      
      A while back, the message queue implementation in the kernel was
      improved to use btrees to speed up retrieval of messages, in commit
      d6629859 ("ipc/mqueue: improve performance of send/recv").
      
      That patch introducing the improved kernel handling of message queues
      (using btrees) has, as a by-product, changed the meaning of the QSIZE
      field in the pseudo-file created for the queue.  Before, this field
      reflected the size of the user-data in the queue.  Since, it also takes
      kernel data structures into account.  For example, if 13 bytes of user
      data are in the queue, on my machine the file reports a size of 61
      bytes.
      
      There was some discussion on this topic before (for example
      https://lkml.org/lkml/2014/10/1/115).  Commenting on a th lkml, Michael
      Kerrisk gave the following background
      (https://lkml.org/lkml/2015/6/16/74
      
      ):
      
          The pseudofiles in the mqueue filesystem (usually mounted at
          /dev/mqueue) expose fields with metadata describing a message
          queue. One of these fields, QSIZE, as originally implemented,
          showed the total number of bytes of user data in all messages in
          the message queue, and this feature was documented from the
          beginning in the mq_overview(7) page. In 3.5, some other (useful)
          work happened to break the user-space API in a couple of places,
          including the value exposed via QSIZE, which now includes a measure
          of kernel overhead bytes for the queue, a figure that renders QSIZE
          useless for its original purpose, since there's no way to deduce
          the number of overhead bytes consumed by the implementation.
          (The other user-space breakage was subsequently fixed.)
      
      This patch removes the accounting of kernel data structures in the
      queue.  Reporting the size of these data-structures in the QSIZE field
      was a breaking change (see Michael's comment above).  Without the QSIZE
      field reporting the total size of user-data in the queue, there is no
      way to deduce this number.
      
      It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
      against the worst-case size of the queue (in both the old and the new
      implementation).  Therefore, the kernel overhead accounting in QSIZE is
      not necessary to help the user understand the limitations RLIMIT imposes
      on the processes.
      
      Signed-off-by: default avatarMarcus Gelderie <redmnic@gmail.com>
      Acked-by: default avatarDoug Ledford <dledford@redhat.com>
      Acked-by: default avatarMichael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: John Duffy <jb_duffy@btinternet.com>
      Cc: Arto Bendiken <arto@bendiken.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5dd4ec9