  1. May 03, 2015
    • ext4: fix data corruption caused by unwritten and delayed extents · d2dc317d
      Lukas Czerner authored
      
      
      Currently it is possible to lose a whole file system block's worth of
      data when we hit a specific interaction between unwritten and delayed
      extents in the extent status tree.

      The problem is that once we insert a delayed extent into the extent
      status tree, the only way to get rid of it is to write out the delayed
      buffer. However, a limitation in the extent status tree implementation
      means that when an unwritten extent is inserted and it contains even a
      single delayed block, the whole unwritten extent is marked as delayed.

      At this point, there is no way to get rid of the delayed extents,
      because there are no delayed buffers to write out. So when we write
      into said unwritten extent we will convert it to written, but it still
      remains delayed.

      When we try to write into that block later, ext4_da_map_blocks() will
      set the buffer new and delayed and map it to an invalid block, which
      causes the rest of the block to be zeroed, losing already written data.

      For now we can fix this by simply not allowing the delayed status to be
      set on a written extent in the extent status tree. Also add a WARN_ON()
      to make sure that we notice if this happens in the future.
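
The rule described above can be sketched in userspace C. This is only an illustration, not the kernel code: the flag names and both helper functions are hypothetical stand-ins, and the second helper merely imitates the role of the WARN_ON() by printing a message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical status bits, loosely modelled on the extent status tree. */
enum es_status {
    ES_WRITTEN   = 1u << 0,
    ES_UNWRITTEN = 1u << 1,
    ES_DELAYED   = 1u << 2,
};

/*
 * Decide which status to record for an extent being inserted.
 * The delayed bit is added only when the extent is not already written.
 */
static unsigned int es_status_for_insert(unsigned int status, bool has_delayed_block)
{
    /* An extent cannot be written and unwritten at the same time. */
    assert((status & (ES_WRITTEN | ES_UNWRITTEN)) != (ES_WRITTEN | ES_UNWRITTEN));

    if (has_delayed_block && !(status & ES_WRITTEN))
        status |= ES_DELAYED;
    return status;
}

/* Stand-in for the WARN_ON(): report a written extent that is also delayed. */
static bool es_status_is_suspicious(unsigned int status)
{
    bool bad = (status & ES_WRITTEN) && (status & ES_DELAYED);

    if (bad)
        fprintf(stderr, "warning: written extent marked as delayed\n");
    return bad;
}
```

With this rule, a written extent keeps its plain written status even when a delayed block is found in its range, so no later write sees it as delayed.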
      
      This problem can be easily reproduced by running the following xfs_io
      commands:
      
      xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
                -c "falloc 0 131072" \
                -c "pwrite -S 0xbb 65536 2048" \
                -c "fsync" /mnt/test/fff
      
      echo 3 > /proc/sys/vm/drop_caches
      xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff
      
      This can theoretically also be reproduced at random by running fsx,
      but that is not very reliable, though on machines with a bigger page
      size (like ppc) it can be seen more often (especially with xfstest
      generic/127).
      
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      d2dc317d
  2. May 02, 2015
  3. May 01, 2015
    • ext4 crypto: add padding to filenames before encrypting · a44cd7a0
      Theodore Ts'o authored
      
      
      This obscures the length of the filenames, to decrease the amount of
      information leakage.  By default, we pad each filename to the next
      4-byte boundary.  This costs nothing, since the directory entries are
      aligned to 4-byte boundaries anyway.  Filenames can also be padded to
      8, 16, or 32 bytes, which will consume more directory space.
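
The rounding itself can be illustrated with a small C helper. This is a sketch under assumptions: the function name is hypothetical, the padding is assumed to be one of the power-of-two values listed above, and any minimum length imposed by the cipher is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round a filename length up to the next multiple of `padding`.
 * `padding` is assumed to be 4, 8, 16, or 32 (a power of two).
 */
static size_t padded_name_len(size_t name_len, size_t padding)
{
    assert(padding != 0 && (padding & (padding - 1)) == 0);
    return (name_len + padding - 1) & ~(padding - 1);
}
```

For example, a 5-byte name occupies 8 bytes with 4-byte padding and 32 bytes with 32-byte padding, so larger padding hides more length information at the cost of directory space.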
      
      Change-Id: Ibb7a0fb76d2c48e2061240a709358ff40b14f322
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      a44cd7a0
    • ext4 crypto: simplify and speed up filename encryption · 5de0b4d0
      Theodore Ts'o authored
      
      
      Avoid using SHA-1 when calculating the user-visible filename when the
      encryption key is available, and avoid decrypting lots of filenames
      when searching for a directory entry in a directory block.
      
      Change-Id: If4655f144784978ba0305b597bfa1c8d7bb69e63
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      5de0b4d0
  4. Apr 16, 2015
  5. Apr 12, 2015
  6. Apr 11, 2015
  7. Apr 08, 2015
  8. Apr 03, 2015
    • ext4: make fsync to sync parent dir in no-journal for real this time · e12fb972
      Lukas Czerner authored


      Previously, commit 14ece102 added support for syncing the parent
      directory of newly created inodes to make sure that the inode is not
      lost after a power failure in no-journal mode.

      However, this does not work in the majority of cases, namely:
       - if the directory has inline data
       - if the directory is already indexed
       - if the directory already has at least one block and:
      	- the new entry fits into it
      	- or we've successfully converted it to indexed
      
      So in those cases we might lose the inode entirely even after fsync in
      no-journal mode. This obviously also includes the ext2 default mode.

      I noticed this while running xfstest generic/321. Even though the
      test should fail (we need to run fsck after a crash in no-journal mode),
      I could not find newly created entries even if they had been fsynced
      before.

      Fix this by adjusting the ext4_add_entry() successful exit paths to set
      EXT4_STATE_NEWENTRY on the inode so that fsync has the chance to fsync
      the parent directory as well.
      
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Frank Mayhar <fmayhar@google.com>
      Cc: stable@vger.kernel.org
      e12fb972
    • ext4: don't release reserved space for previously allocated cluster · 9d21c9fa
      Eric Whitney authored
      
      
      When xfstests' auto group is run on a bigalloc filesystem with a
      4.0-rc3 kernel, e2fsck failures and kernel warnings occur for some
      tests. e2fsck reports incorrect iblocks values, and the warnings
      indicate that the space reserved for delayed allocation is being
      overdrawn at allocation time.
      
      Some of these errors occur because the reserved space is incorrectly
      decreased by one cluster when ext4_ext_map_blocks satisfies an
      allocation request by mapping an unused portion of a previously
      allocated cluster.  Because a cluster's worth of reserved space was
      already released when it was first allocated, it should not be released
      again.
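
The accounting rule in the paragraph above can be modelled in a few lines of C. This is a simplified sketch, not the ext4 implementation: the structure, the counter, and the function name are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-file count of clusters reserved for delayed allocation. */
struct da_reservation {
    unsigned int reserved_clusters;
};

/*
 * Account for mapping blocks that fall in one cluster. Reserved space is
 * released only when the cluster itself is newly allocated; mapping an
 * unused part of a previously allocated cluster releases nothing.
 * Returns the number of clusters released.
 */
static unsigned int da_account_mapping(struct da_reservation *r,
                                       bool cluster_newly_allocated)
{
    if (!cluster_newly_allocated)
        return 0;

    /* Releasing from an empty reservation would overdraw it. */
    assert(r->reserved_clusters > 0);
    r->reserved_clusters--;
    return 1;
}
```

Releasing a second time for the same cluster is what would drive the counter below zero, which matches the overdrawn-reservation warnings described above.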
      
      This patch appears to correct the e2fsck failure reported for
      generic/232 and the kernel warnings produced by ext4/001, generic/009,
      and generic/033.  Failures and warnings for some other tests remain to
      be addressed.
      
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      9d21c9fa
    • ext4: fix loss of delalloc extent info in ext4_zero_range() · 94426f4b
      Eric Whitney authored
      
      
      In ext4_zero_range(), removing a file's entire block range from the
      extent status tree removes all records of that file's delalloc extents.
      The delalloc accounting code uses this information, and its loss can
      then lead to accounting errors and kernel warnings at writeback time and
      subsequent file system damage.  This is most noticeable on bigalloc
      file systems where code in ext4_ext_map_blocks() handles cases where
      delalloc extents share clusters with a newly allocated extent.
      
      Because we're not deleting a block range and are correctly updating the
      status of its associated extent, there is no need to remove anything
      from the extent status tree.
      
      When this patch is combined with an unrelated bug fix for
      ext4_zero_range(), kernel warnings and e2fsck errors reported during
      xfstests runs on bigalloc filesystems are greatly reduced without
      introducing regressions on other xfstests-bld test scenarios.
      
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      94426f4b
    • ext4: allocate entire range in zero range · 0f2af21a
      Lukas Czerner authored
      
      
      Currently there is a bug in the zero range code which causes zero range
      calls to allocate only the block-aligned portion of the range, while
      ignoring the rest in some cases.

      In some cases, namely if the end of the range is past i_size, we do
      attempt to preallocate the last nonaligned block. However, this might
      cause the kernel to BUG() on some carefully designed zero range requests
      on setups where page size > block size.

      Fix this problem by first preallocating the entire range, including
      the nonaligned edges, and converting the written extents to unwritten
      in the next step. This approach also has the advantage of keeping the
      range as linearly contiguous as possible.
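
To make "the entire range, including the nonaligned edges" concrete, the following C sketch computes the block-aligned span that covers a request. The structure and function names are hypothetical, and the block size is assumed to be a power of two; this is not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

struct zero_range_plan {
    uint64_t alloc_start;  /* request start rounded down to a block boundary */
    uint64_t alloc_end;    /* request end rounded up to a block boundary */
    uint64_t head_offset;  /* offset of the start inside its block (0 if aligned) */
    uint64_t tail_offset;  /* offset of the end inside its block (0 if aligned) */
};

/* Compute the span to preallocate so that unaligned edges are covered too. */
static struct zero_range_plan plan_zero_range(uint64_t offset, uint64_t len,
                                              uint64_t block_size)
{
    assert(len > 0);
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

    uint64_t mask = block_size - 1;
    uint64_t end = offset + len;
    struct zero_range_plan p;

    p.alloc_start = offset & ~mask;
    p.alloc_end = (end + mask) & ~mask;
    p.head_offset = offset & mask;
    p.tail_offset = end & mask;
    return p;
}
```

With 4096-byte blocks, a request for 5000 bytes at offset 1000 yields a span of 0 to 8192, so both partially covered blocks are included rather than only whole blocks inside the request.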
      
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      0f2af21a
    • ext4: remove unnecessary lock/unlock of i_block_reservation_lock · 5a4f3145
      Maurizio Lombardi authored
      This is a leftover of commit 71d4f7d0
      
      
      
      Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      5a4f3145
    • ext4: remove block_device_ejected · 08439fec
      Christoph Hellwig authored
      
      
      bdi->dev now never goes away, so this function became useless.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      08439fec
    • ext4: remove useless condition in if statement. · 5f80f62a
      Wei Yuan authored
      
      
      In this if statement, the first condition is redundant because the
      later one already covers it.
      
      Signed-off-by: Weiyuan <weiyuan.wei@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      5f80f62a
    • ext4: remove unused header files · 72b8e0f9
      Sheng Yong authored
      
      
      Remove unused header files and header files which are included in
      ext4.h.
      
      Signed-off-by: Sheng Yong <shengyong1@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      72b8e0f9
  9. Apr 02, 2015
  10. Mar 17, 2015
    • fs: add dirtytime_expire_seconds sysctl · 1efff914
      Theodore Ts'o authored
      
      
      Add a tuning knob so we can adjust the dirtytime expiration timeout,
      which is very useful for testing lazytime.
      
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      1efff914
    • fs: make sure the timestamps for lazytime inodes eventually get written · a2f48706
      Theodore Ts'o authored
      
      
      Jan Kara pointed out that if there is an inode which is constantly
      getting dirtied with I_DIRTY_PAGES, an inode with an updated timestamp
      will never be written since inode->dirtied_when is constantly getting
      updated.  We fix this by adding an extra field to the inode,
      dirtied_time_when, so inodes with a stale dirtytime can get detected
      and handled.
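
A toy model in C shows why a separate field helps. Everything here is a hypothetical simplification: times are plain tick counts, and the structure and function names are invented for illustration rather than taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified inode; times are plain tick counts. */
struct model_inode {
    unsigned long dirtied_when;       /* refreshed on every page dirtying */
    unsigned long dirtied_time_when;  /* set when the timestamp first became dirty */
    bool time_dirty;
};

/* Dirtying pages keeps pushing dirtied_when forward. */
static void model_dirty_pages(struct model_inode *inode, unsigned long now)
{
    assert(inode);
    inode->dirtied_when = now;
}

/* Only the first timestamp dirtying records dirtied_time_when. */
static void model_dirty_time(struct model_inode *inode, unsigned long now)
{
    assert(inode);
    if (!inode->time_dirty) {
        inode->time_dirty = true;
        inode->dirtied_time_when = now;
    }
}

/* Staleness is judged from dirtied_time_when, not from dirtied_when. */
static bool model_time_expired(const struct model_inode *inode,
                               unsigned long now,
                               unsigned long expire_interval)
{
    return inode->time_dirty &&
           now - inode->dirtied_time_when >= expire_interval;
}
```

In this model, repeated calls to model_dirty_pages() never delay the point at which model_time_expired() becomes true, which is the property the extra field is meant to provide.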
      
      In addition, if we have a dirtytime inode caused by an atime update,
      and there is no write activity on the file system, we need to have a
      secondary system to make sure these inodes get written out.  We do
      this by setting up a second delayed work structure which wakes up the
      CPU much more rarely compared to writeback_expire_centisecs.
      
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      a2f48706
  11. Mar 03, 2015
    • Linux 4.0-rc2 · 13a7a6ac
      Linus Torvalds authored
      v4.0-rc2
      13a7a6ac
    • drm/i915: Fix modeset state confusion in the load detect code · 9128b040
      Daniel Vetter authored
      This is a tricky story of the new atomic state handling and the legacy
      code fighting each other. The bug at hand is an underrun of the
      framebuffer reference with subsequent hilarity caused by the load
      detect code. Which is peculiar since the exact same code works
      fine as the implementation of the legacy setcrtc ioctl.
      
      Let's look at the ingredients:
      
      - Currently our code is a crazy mix of legacy modeset interfaces to
        set the parameters and half-baked atomic state tracking underneath.
        While this transition is ongoing we're using the transitional plane
        helpers to update the atomic side (drm_plane_helper_disable/update
        and friends), i.e. plane->state->fb. Since the state structure owns
        the fb those functions take care of that themselves.
      
        The legacy state (specifically crtc->primary->fb) is still managed
        by the old code (and mostly by the drm core), with the fb reference
        counting done by callers (core drm for the ioctl or the i915 load
        detect code). The relevant commit is
      
        commit ea2c67bb
        Author: Matt Roper <matthew.d.roper@intel.com>
        Date:   Tue Dec 23 10:41:52 2014 -0800
      
            drm/i915: Move to atomic plane helpers (v9)
      
      - drm_plane_helper_disable has special code to handle multiple calls
        in a row - it checks plane->crtc == NULL and bails out. This is to
        match the proper atomic implementation which needs the crtc to get
        at the implied locking context atomic updates always need. See
      
        commit acf24a39
        Author: Daniel Vetter <daniel.vetter@ffwll.ch>
        Date:   Tue Jul 29 15:33:05 2014 +0200
      
            drm/plane-helper: transitional atomic plane helpers
      
      - The universal plane code split out the implicit primary plane from
        the CRTC into its own full-blown drm_plane object. As part of that
        the setcrtc ioctl (which updated both the crtc mode and primary
        plane) learned to set crtc->primary->crtc on modeset to make sure
        the plane->crtc assignments stay up to date in

        commit e13161af
        Author: Matt Roper <matthew.d.roper@intel.com>
        Date:   Tue Apr 1 15:22:38 2014 -0700

            drm: Add drm_crtc_init_with_planes() (v2)
      
        Unfortunately we've forgotten to update the load detect code. Which
        wasn't a problem since the load detect modeset is temporary and
        always undone before we drop the locks.
      
      - Finally there is an organically grown history (i.e. don't ask) around
        who sets the legacy plane->fb for the various driver entry points.
        Originally updating that was the driver's duty, but for almost all
        places we've moved that (plus updating the refcounts) into the core.
        Again the exception is the load detect code.
      
      Taken all together, the following happens:
      - The load detect code doesn't set crtc->primary->crtc. This is only
        really an issue on crtcs never before used or when userspace
        explicitly disabled the primary plane.
      
      - The plane helper glue code short-circuits because of that and leaves
        a non-NULL fb behind in plane->state->fb and plane->fb. The state
        fb isn't a real problem (it's properly refcounted on its own), it's
        just the canary.
      
      - Load detect code drops the reference for that fb, but doesn't set
        plane->fb = NULL. This is ok since it's still living in that old
        world where drivers had to clear the pointer but the core/callers
        handled the refcounting.
      
      - On the next modeset the drm core notices plane->fb and takes care of
        refcounting it properly by doing another unref. This drops the
        refcount to zero, leaving state->plane now pointing at freed memory.
      
      - intel_plane_duplicate_state still assumes it owns a reference to that
        very state->fb and bad things start to happen.
      
      Fix this all by applying the same duct-tape as for the legacy setcrtc
      ioctl code and set crtc->primary->crtc properly.
      
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Cc: Paul Bolle <pebolle@tiscali.nl>
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: Paulo Zanoni <przanoni@gmail.com>
      Cc: Sean Paul <seanpaul@chromium.org>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9128b040