- Aug 20, 2015
-
Arne Coucheron authored
-
Will Deacon authored
Modern ARM CPUs can perform efficient unaligned memory accesses in hardware and this feature is relied upon by code such as the dcache word-at-a-time name hashing. Change-Id: I41559fc9255229e2601f4bef28aa8d48246156b7 Signed-off-by: Will Deacon <will.deacon@arm.com>
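As a rough illustration of what this enables (a toy sketch, not the kernel's actual word-at-a-time hash; the mixing step is made up): once unaligned loads are cheap, a string hash can consume four bytes per iteration instead of one.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy word-at-a-time hash for illustration only. */
static uint32_t hash_name(const char *name, size_t len)
{
	uint32_t hash = 0;

	/* A 4-byte memcpy compiles to a single (possibly unaligned)
	 * load on CPUs that handle unaligned accesses in hardware. */
	while (len >= sizeof(uint32_t)) {
		uint32_t w;

		memcpy(&w, name, sizeof(w));
		hash = (hash + w) * 9;
		name += sizeof(w);
		len -= sizeof(w);
	}
	while (len--)		/* leftover tail, byte at a time */
		hash = (hash + (uint8_t)*name++) * 9;
	return hash;
}

int main(void)
{
	printf("%08x\n", hash_name("dentry-name", 11));
	return 0;
}
```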
-
Stephen Boyd authored
Stephen Rothwell reports that commit 3f4c9f8f0a20 ("ARM: 8197/1: vfp: Fix VFPv3 hwcap detection on CPUID based cpus") introduced an unused-variable warning:

arch/arm/vfp/vfpmodule.c: In function 'vfp_init':
arch/arm/vfp/vfpmodule.c:725:6: warning: unused variable 'mvfr0' [-Wunused-variable]
  u32 mvfr0;

Silence this warning by using IS_ENABLED instead of ifdefs. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
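The pattern in question, as a userspace-compilable sketch (IS_ENABLED() is mocked here; in the kernel it comes from <linux/kconfig.h>, and CONFIG_FOO plus read_id_reg() are hypothetical names for illustration):

```c
#include <stdio.h>

/* Userspace mock: in the kernel, IS_ENABLED(CONFIG_FOO) evaluates to
 * 1 or 0 based on Kconfig via preprocessor tricks. */
#define CONFIG_FOO 0
#define IS_ENABLED(option) (option)

static unsigned int read_id_reg(void) { return 0x10110222; }

int main(void)
{
	unsigned int mvfr0 = read_id_reg();

	/* With "#ifdef CONFIG_FOO" around this block, mvfr0 would be
	 * flagged -Wunused-variable when the option is off.  With
	 * IS_ENABLED() the compiler still sees the use, then discards
	 * the dead branch at -O2. */
	if (IS_ENABLED(CONFIG_FOO) && (mvfr0 & 0xf))
		printf("feature enabled\n");
	return 0;
}
```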
-
Stephen Boyd authored
The subarchitecture field in the fpsid register is 7 bits wide on ARM CPUs using the CPUID identification scheme, spanning bits 22 to 16. The topmost bit is set to 1 to designate that the subarchitecture designer is not ARM. On non-CPUID scheme CPUs the subarchitecture field is only 4 bits wide, and the higher bits are used to indicate no double precision support (bit 20) and the FTSMX/FLDMX format (bits 21-22). The VFP support code only looks at bits 19-16 to determine the VFP version. On Qualcomm's processors (Krait and Scorpion) we should see HWCAP_VFPv3, but we don't, because bit 22 is set to 1 to indicate that the subarchitecture is not implemented by ARM, and the rest of the bits are left as 0 because this is the first subarchitecture that Qualcomm has designed. Unfortunately we can't just widen the FPSID subarchitecture bitmask to consider all the bits on a CPUID scheme CPU, because there may be non-CPUID scheme CPUs that have VFP without double precision support, for which the extracted version would then be a wrong (and large) number. Instead, update the version detection logic to consider whether the CPU uses the CPUID scheme. If it does, use the MVFR registers to determine what version of VFP is supported. We already do this for VFPv4, so do something similar for VFPv3 and look for single or double precision support in MVFR0. Otherwise, fall back to using FPSID to detect VFP support on non-CPUID scheme CPUs. We know that VFPv3 is only present in CPUs that support the CPUID scheme, so this should be equivalent. Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
Lorenzo Pieralisi authored
Set-associative caches on all v7 implementations map the index bits to physical address LSBs and tag bits to MSBs. As the last level of cache on current and upcoming ARM systems grows in size, this means that under normal DRAM controller configurations, the current v7 cache flush routine using set/way operations triggers a DRAM memory controller precharge/activate for every cache line writeback, since the routine cleans lines by first fixing the index and then looping through ways (index bits are mapped to lower physical addresses on all v7 cache implementations; with last-level cache sizes on the order of MBytes, lines belonging to the same set but different ways therefore map to different DRAM pages). Given the random content of cache tags, swapping the order of the index and way loops does not prevent DRAM page precharge and activate cycles, but on average it improves the chances that either multiple lines hit the same page or multiple lines belong to different DRAM banks, improving throughput significantly. This patch swaps the inner loops in the v7 cache flushing routine to carry out the clean operations first on all sets belonging to a given way (looping through sets) and then decrementing the way. Benchmarks showed that swapping the order in which sets and ways are decremented in the set/way-based v7 cache flushing routine significantly reduces the time required to flush caches, owing to improved writeback throughput to the DRAM controller. Benchmark results vary and depend heavily on the last-level cache tag RAM content when the cache is cleaned and invalidated, ranging from 2x throughput when all tag RAM entries contain dirty lines mapping to sequential pages of RAM, down to 1x (i.e. no improvement) when every tag RAM access triggers a DRAM precharge/activate cycle, as the current code implies on most DRAM controller configurations. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
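A schematic of the loop swap in plain C (the real routine is hand-written assembly in arch/arm/mm/cache-v7.S; the helper below is a hypothetical stand-in for the DCCSW clean-by-set/way coprocessor write):

```c
#include <stdio.h>

/* Hypothetical stand-in for the DCCSW operation; here it just logs. */
static void clean_line_by_set_way(unsigned int way, unsigned int set)
{
	printf("clean way=%u set=%u\n", way, set);
}

/* The patch's ordering: way in the outer loop, set in the inner loop,
 * so consecutive operations touch consecutive cache indexes, i.e.
 * consecutive physical-address LSBs, which is friendlier to DRAM
 * page-open locality. */
static void v7_flush_dcache_level(unsigned int num_ways, unsigned int num_sets)
{
	for (unsigned int way = 0; way < num_ways; way++)
		for (unsigned int set = 0; set < num_sets; set++)
			clean_line_by_set_way(way, set);
}

int main(void)
{
	v7_flush_dcache_level(4, 8);	/* toy cache geometry */
	return 0;
}
```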
-
Will Deacon authored
System-wide barriers aren't required for situations where we only need to make visibility and ordering guarantees in the inner-shareable domain (i.e. we are not dealing with devices or potentially incoherent CPUs). This patch changes the v7 TLB operations, coherent_user_range and dcache_clean_area functions to use inner-shareable barriers. For cache maintenance, only the store access type is required to ensure completion. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
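For reference, a minimal sketch of the barrier variants involved (ARMv7 inline asm; these macros only compile for an ARM target, and are an illustration rather than the kernel's actual barrier macros):

```c
/* Orders against the full system, including devices. */
#define dmb_sy()	asm volatile("dmb sy" ::: "memory")
/* Orders only within the inner-shareable domain (the CPUs). */
#define dmb_ish()	asm volatile("dmb ish" ::: "memory")
/* As above, but waits only for stores -- per the commit, sufficient
 * to ensure completion of cache maintenance. */
#define dmb_ishst()	asm volatile("dmb ishst" ::: "memory")
```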
-
Will Deacon authored
After a bunch of benchmarking on the interaction between dmb and pldw, it turns out that issuing the pldw *after* the dmb instruction can give modest performance gains (~3% atomic_add_return improvement on a dual A15). This patch adds prefetchw invocations to our barriered atomic operations including cmpxchg, test_and_xxx and futexes. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
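A compiler-builtin analogue of the idea (a sketch only: the kernel uses hand-written asm, which is what lets it control exactly where the PLDW lands relative to the DMB; `__builtin_prefetch(p, 1)` is a prefetch-for-write that can compile to PLDW on ARMv7):

```c
#include <stdatomic.h>

/* Hint that we are about to write *v, so the cacheline is requested
 * in exclusive state before the exclusive-access retry loop starts. */
static int atomic_add_return_sketch(int i, atomic_int *v)
{
	__builtin_prefetch(v, 1);	/* 1 = prepare to write (PLDW) */
	return atomic_fetch_add_explicit(v, i, memory_order_seq_cst) + i;
}
```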
-
Will Deacon authored
This patch introduces cmpxchg64_relaxed for arm, which performs a 64-bit cmpxchg operation without barrier semantics. cmpxchg64_local is updated to use the new operation. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
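A userspace analogue using C11 atomics to show the intended semantics (an analogy, not the kernel macro's LDREXD/STREXD implementation):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Relaxed 64-bit compare-and-swap: no barrier semantics, matching
 * the "without barrier semantics" the commit describes. */
static uint64_t cmpxchg64_relaxed_sketch(_Atomic uint64_t *ptr,
					 uint64_t old, uint64_t new)
{
	uint64_t expected = old;

	atomic_compare_exchange_strong_explicit(ptr, &expected, new,
						memory_order_relaxed,
						memory_order_relaxed);
	return expected;	/* kernel cmpxchg returns the value observed */
}
```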
-
Will Deacon authored
Our cmpxchg64 macros are wrappers around atomic64_cmpxchg. Whilst this is great for code re-use, there is a case for barrier-less cmpxchg where it is known to be safe (for example cmpxchg64_local and cmpxchg-based lockrefs). This patch introduces a 64-bit cmpxchg implementation specifically for the cmpxchg64_* macros, so that it can be later used by the lockref code. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
Fabio Estevam authored
Currently mx53 (Cortex-A8) running at 1GHz reports: Calibrating delay loop... 663.55 BogoMIPS (lpj=3317760). Tom Evans verified that alignments of 0x0 and 0x8 run the two instructions of __loop_delay in one clock cycle (1 clock/loop), while alignments of 0x4 and 0xc take 3 clocks to run the loop twice (1.5 clocks/loop). The original object code looks like this:

00000010 <__loop_const_udelay>:
  10: e3e01000  mvn r1, #0
  14: e51f201c  ldr r2, [pc, #-28] ; 0 <__loop_udelay-0x8>
  18: e5922000  ldr r2, [r2]
  1c: e0800921  add r0, r0, r1, lsr #18
  20: e1a00720  lsr r0, r0, #14
  24: e0822b21  add r2, r2, r1, lsr #22
  28: e1a02522  lsr r2, r2, #10
  2c: e0000092  mul r0, r2, r0
  30: e0800d21  add r0, r0, r1, lsr #26
  34: e1b00320  lsrs r0, r0, #6
  38: 01a0f00e  moveq pc, lr

0000003c <__loop_delay>:
  3c: e2500001  subs r0, r0, #1
  40: 8afffffe  bhi 3c <__loop_delay>
  44: e1a0f00e  mov pc, lr

After adding the '.align 3' directive to __loop_delay (align to 8 bytes):

00000010 <__loop_const_udelay>:
  10: e3e01000  mvn r1, #0
  14: e51f201c  ldr r2, [pc, #-28] ; 0 <__loop_udelay-0x8>
  18: e5922000  ldr r2, [r2]
  1c: e0800921  add r0, r0, r1, lsr #18
  20: e1a00720  lsr r0, r0, #14
  24: e0822b21  add r2, r2, r1, lsr #22
  28: e1a02522  lsr r2, r2, #10
  2c: e0000092  mul r0, r2, r0
  30: e0800d21  add r0, r0, r1, lsr #26
  34: e1b00320  lsrs r0, r0, #6
  38: 01a0f00e  moveq pc, lr
  3c: e320f000  nop {0}

00000040 <__loop_delay>:
  40: e2500001  subs r0, r0, #1
  44: 8afffffe  bhi 40 <__loop_delay>
  48: e1a0f00e  mov pc, lr
  4c: e320f000  nop {0}

which now reports: Calibrating delay loop... 996.14 BogoMIPS (lpj=4980736). Some more test results: on mx31 (ARM1136) running at 532 MHz, before the patch: Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184); after the patch: Calibrating delay loop... 528.79 BogoMIPS (lpj=2643968). Also tested on mx6 (Cortex-A9) and on mx27 (ARM926), which show the same BogoMIPS value before and after this patch. Reported-by: Tom Evans <tom_usenet@optusnet.com.au> Suggested-by: Tom Evans <tom_usenet@optusnet.com.au> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
Will Deacon authored
The cost of changing a cacheline from shared to exclusive state can be significant, especially when this is triggered by an exclusive store, since it may result in having to retry the transaction. This patch prefixes our atomic bitops implementation with prefetchw, to try and grab the line in exclusive state from the start. The testop macro is left alone, since the barrier semantics limit the usefulness of prefetching data. Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Will Deacon authored
To speed up decompression, the decompressor sets up a flat, cacheable mapping of memory. However, when there is insufficient space to hold the page tables for this mapping, we don't bother to enable the caches and subsequently skip all the cache maintenance hooks. Skipping the cache maintenance before jumping to the relocated code allows the processor to predict the branch and populate the I-cache with stale data before the relocation loop has completed (since a bootloader may have SCTLR.I set, which permits normal, cacheable instruction fetches regardless of SCTLR.M). This patch moves the cache maintenance check into the maintenance routines themselves, allowing the v6/v7 versions to invalidate the I-cache regardless of the MMU state. Cc: <stable@vger.kernel.org> Reported-by: Marc Carino <marc.ceeeee@gmail.com> Tested-by: Julien Grall <julien.grall@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
Mark Rutland authored
Commit 017f161a (ARM: 7877/1: use built-in byte swap function) added bswapsdi2.{o,S} to arch/arm/boot/compressed/Makefile, but didn't update the .gitignore. Thus after a build git status shows bswapsdi2.S as a new file, which is a little annoying. This patch updates arch/arm/boot/compressed/.gitignore to ignore bswapsdi2.S, as we already do for ashldi3.S and others. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Kim Phillips <kim.phillips@freescale.com> Cc: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
Kim Phillips authored
Enable the compiler intrinsic for byte swapping on arch ARM. This allows the compiler to detect and optimize out byte swappings, and has a very modest benefit on vmlinux size (Linaro gcc 4.8):

   text    data     bss      dec    hex filename
2840310  123932   61960  3026202 2e2d1a vmlinux-lart #orig
2840152  123932   61960  3026044 2e2c7c vmlinux-lart #builtin-bswap
6473120  314840 5616016 12403976 bd4508 vmlinux-mxs #orig
6472586  314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap
7419872  318372  379556  8117800 7bde28 vmlinux-imx_v6_v7 #orig
7419170  318364  379556  8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap

Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
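What the builtin buys, as a small standalone demonstration (the shift-and-mask form is typical of an open-coded swap; with -O2 on ARM both variants can compile to a single REV instruction, and paired swaps can be deleted entirely):

```c
#include <stdint.h>
#include <stdio.h>

static uint32_t swab32_manual(uint32_t x)	/* open-coded byte swap */
{
	return (x << 24) | ((x & 0xff00) << 8) |
	       ((x >> 8) & 0xff00) | (x >> 24);
}

int main(void)
{
	uint32_t v = 0x12345678;

	/* Both print 78563412; the builtin lets the compiler treat the
	 * swap as a known operation rather than opaque arithmetic. */
	printf("%08x %08x\n", swab32_manual(v), __builtin_bswap32(v));
	return 0;
}
```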
-
Arne Coucheron authored
-
Kyungsik Lee authored
Integrates the LZ4 decompression code to the arm pre-boot code. Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Florian Fainelli <florian@openwrt.org> Cc: Yann Collet <yann.collet.73@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alex Pilon authored
Support for kernel image LZ4 compression was added around 3.11, but the corresponding kernel .config extraction was not. This makes it possible to extract the kernel config from LZ4-compressed kernels you're not running, or from the current LZ4-compressed kernel if it was compiled without /proc/config.gz support. Signed-off-by: Alex Pilon <alp+linux@alexpilon.ca> Signed-off-by: Michal Marek <mmarek@suse.cz>
-
Daniel M. Weeks authored
LZ4 as implemented in the kernel differs from the default method now used by the reference implementation of LZ4. Until the in-kernel method is updated to support the new default, passing the legacy flag (-l) to the compressor is necessary. Without this flag the kernel-generated, LZ4-compressed initramfs is junk. Kyungsik said:

: It seems that lz4 supports the legacy format with the same option as
: lz4c does. Just looking at the first few bytes of the lz4-compressed
: image, we can see whether it is the new format or not.
:
: It shows the new-format magic number without this patch. The new-format
: magic number is 0x184d2204.
:
: $ hexdump -C ./initramfs_data.cpio.lz4 | more
: 00000000 04 22 4d 18 64 70 b9 69 (little endian)
: ...
:
: Currently the kernel supports the legacy format only.

Signed-off-by: Daniel M. Weeks <dan@danweeks.net> Cc: Michal Marek <mmarek@suse.cz> Acked-by: Kyungsik Lee <kyungsik.lee@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
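A sketch of the header check Kyungsik describes (magic constants per the LZ4 format specification: 0x184D2204 for the then-new frame format, 0x184C2102 for the legacy format the kernel decodes):

```c
#include <stdint.h>
#include <stdio.h>

#define LZ4_FRAME_MAGIC   0x184D2204u	/* produced by plain "lz4" */
#define LZ4_LEGACY_MAGIC  0x184C2102u	/* produced by "lz4 -l" / "lz4c" legacy mode */

static const char *lz4_format(const uint8_t *buf)
{
	/* First four bytes, little-endian. */
	uint32_t magic = buf[0] | (buf[1] << 8) | (buf[2] << 16) |
			 ((uint32_t)buf[3] << 24);

	if (magic == LZ4_LEGACY_MAGIC)
		return "legacy format (kernel-supported)";
	if (magic == LZ4_FRAME_MAGIC)
		return "new frame format (not supported by 3.x kernels)";
	return "not LZ4";
}

int main(void)
{
	/* The bytes from the hexdump in the commit message. */
	const uint8_t hdr[4] = { 0x04, 0x22, 0x4d, 0x18 };

	printf("%s\n", lz4_format(hdr));
	return 0;
}
```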
-
P J P authored
When the expert configuration option (CONFIG_EXPERT) is enabled, menuconfig offers a choice of compression algorithm for the initial ramfs image; this choice is stored in the CONFIG_RD_* variables. But usr/Makefile still uses the earlier INITRAMFS_COMPRESSION_* macros to build the initial ramfs file. Since none of them is defined, the resulting 'initramfs_data.cpio' file remains uncompressed. This patch updates the Makefile to use the CONFIG_RD_* variables and adds support for the LZ4 compression algorithm. It also updates the 'gen_initramfs_list.sh' script to check whether the selected compression command is available, falling back to the default gzip(1) compression when it is not. Signed-off-by: P J P <prasad@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Krzysztof Kolasa authored
Sometimes, on x86_64, decompression fails with the following error:

Decompressing Linux... Decoding failed -- System halted

This condition (from commit d5e7cafd) is not needed for a 64-bit kernel:

if (... || (op + COPYLENGTH) > oend) goto _output_error

because the LZ4_SECURE_COPY() macro already tests op and does not copy any data when op exceeds that value. The condition had been added by analogy to lz4_uncompress_unknownoutputsize(...). Signed-off-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Tested-by: Alexander Kuleshov <kuleshovmail@gmail.com> Tested-by: Caleb Jorden <cjorden@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rasmus Villemoes authored
There's no reason to allocate the dec{32,64}table on the stack; it just wastes a bunch of instructions setting them up and, of course, also consumes quite a bit of stack. Using size_t for such small integers is a little excessive.

$ scripts/bloat-o-meter /tmp/built-in.o lib/built-in.o
add/remove: 2/2 grow/shrink: 2/0 up/down: 1304/-1548 (-244)
function                             old   new  delta
lz4_decompress_unknownoutputsize      55   718   +663
lz4_decompress                        55   632   +577
dec64table                             -    32    +32
dec32table                             -    32    +32
lz4_uncompress                       747     -   -747
lz4_uncompress_unknownoutputsize     801     -   -801

The now-inlined lz4_uncompress functions used to have a stack footprint of 176 bytes (according to -fstack-usage); their inlinees have increased their stack use from 32 bytes to 48 and 80 bytes, respectively. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
JeHyeon Yeon authored
If part of the compressed data is corrupted, or the compressed data is totally fake, memory access beyond the limit is possible. This is the log from my system using lz4 decompression:

[6502]data abort, halting
[6503]r0 0x00000000 r1 0x00000000 r2 0xdcea0ffc r3 0xdcea0ffc
[6509]r4 0xb9ab0bfd r5 0xdcea0ffc r6 0xdcea0ff8 r7 0xdce80000
[6515]r8 0x00000000 r9 0x00000000 r10 0x00000000 r11 0xb9a98000
[6522]r12 0xdcea1000 usp 0x00000000 ulr 0x00000000 pc 0x820149bc
[6528]spsr 0x400001f3

and the memory addresses of some variables at that moment are:

ref: 0xdcea0ffc, op: 0xdcea0ffc, oend: 0xdcea1000

As you can see, COPYLENGTH is 8 bytes, so @ref and @op can access memory beyond @oend. Signed-off-by: JeHyeon Yeon <tom.yeon@windriver.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
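In essence the bug class here is a copy loop whose output bound is checked too late or not at all. A simplified guard looks like the sketch below (not the kernel's actual decoder, which copies COPYLENGTH-byte chunks for speed and therefore must check `op + COPYLENGTH` against `oend` before each chunk):

```c
#include <stddef.h>

/* Simplified match copy with an up-front bound check.  Byte-wise
 * copying also handles LZ4's overlapping matches, which the real
 * 8-bytes-at-a-time fast path has to special-case. */
static int copy_match(unsigned char **opp, const unsigned char *ref,
		      size_t len, const unsigned char *oend)
{
	unsigned char *op = *opp;	/* invariant: op <= oend */

	if (len > (size_t)(oend - op))
		return -1;	/* corrupt stream: would write past oend */
	while (len--)
		*op++ = *ref++;
	*opp = op;
	return 0;
}
```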
-
Greg Kroah-Hartman authored
Jan points out that I forgot to make the needed fixes to the lz4_uncompress_unknownoutputsize() function to mirror the changes done in lz4_decompress() with regards to potential pointer overflows. The only in-kernel user of this function is the zram code, which only takes data from a valid compressed buffer that it made itself, so it's not a big issue. But due to external kernel modules using this function, it's better to be safe here. Reported-by: Jan Beulich <JBeulich@suse.com> Cc: "Don A. Bailey" <donb@securitymouse.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Greg Kroah-Hartman authored
There is one other possible overrun in the lz4 code as implemented by Linux at this point in time (which differs from the upstream lz4 codebase, but will get synced in a future kernel release). As pointed out by Don, we also need to check for overflow in the data itself. While we are at it, replace the odd error return value with just a "simple" -1 value, as the return value is never used for anything other than a basic "did this work or not" check. Reported-by: "Don A. Bailey" <donb@securitymouse.com> Reported-by: Willy Tarreau <w@1wt.eu> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Greg Kroah-Hartman authored
Given some pathologically compressed data, lz4 could possibly decide to wrap a few internal variables, causing unknown things to happen. Catch this before the wrapping happens and abort the decompression. Reported-by: "Don A. Bailey" <donb@securitymouse.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Richard Laager authored
The LZ4 code is listed as using the "BSD 2-Clause License". Signed-off-by: Richard Laager <rlaager@wiktel.com> Acked-by: Kyungsik Lee <kyungsik.lee@lge.com> Cc: Chanho Min <chanho.min@lge.com> Cc: Richard Yao <ryao@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ The 2-clause BSD can be just converted into GPL, but that's rude and pointless, so don't do it - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Sergey Senozhatsky authored
The LZ4 compression and decompression functions differ in the signedness of their input/output parameters: unsigned char for compression and signed char for decompression. Change the decompression API to require "(const) unsigned char *". Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Kyungsik Lee <kyungsik.lee@lge.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Yann Collet <yann.collet.73@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
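The shape of the change, sketched as prototypes (drawn from the then-current include/linux/lz4.h to the best of my knowledge; treat the exact signature as an assumption):

```c
#include <stddef.h>

#if 0	/* before: decompression used plain (signed) char */
int lz4_decompress(const char *src, size_t *src_len,
		   char *dest, size_t actual_dest_len);
#else	/* after: unsigned char, matching the compression side, so
	 * callers like zram no longer need casts between the APIs */
int lz4_decompress(const unsigned char *src, size_t *src_len,
		   unsigned char *dest, size_t actual_dest_len);
#endif
```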
-
Chanho Min authored
This patchset adds support for LZ4 compression and for the crypto API using it. As shown below, the compressed size is a little bigger, but compression is faster when unaligned memory access is enabled. We can use LZ4 de/compression through the crypto API as well, and it will be useful for other potential users of LZ4 compression.

lz4 Compression Benchmark:
Compiler: ARM gcc 4.6.4
ARMv7, 1 GHz based board
Kernel: linux 3.4
Uncompressed data size: 101 MB

        Compressed Size  Compression Speed
LZO     72.1MB           32.1MB/s, 33.0MB/s(UA)
LZ4     75.1MB           30.4MB/s, 35.9MB/s(UA)
LZ4HC   59.8MB            2.4MB/s,  2.5MB/s(UA)

- UA: Unaligned memory Access support
- Latest patch set for LZO applied

This patch: add support for LZ4 compression in the Linux kernel. The LZ4 compression APIs for the kernel are based on the LZ4 implementation by Yann Collet, adapted to kernel coding style. LZ4 homepage: http://fastcompression.blogspot.com/p/lz4.html LZ4 source repository: http://code.google.com/p/lz4/ (svn revision r90). Two APIs are added: lz4_compress() supports basic LZ4 compression, whereas lz4hc_compress() supports high compression, where CPU performance is lower but the compression ratio is higher. Callers must pre-allocate working memory of the defined size, and the destination buffer must be allocated with the size given by lz4_compressbound. [akpm@linux-foundation.org: make lz4_compresshcctx() static] Signed-off-by: Chanho Min <chanho.min@lge.com> Cc: "Darrick J. Wong" <djwong@us.ibm.com> Cc: Bob Pearson <rpearson@systemfabricworks.com> Cc: Richard Weinberger <richard@nod.at> Cc: Herbert Xu <herbert@gondor.hengli.com.au> Cc: Yann Collet <yann.collet.73@gmail.com> Cc: Kyungsik Lee <kyungsik.lee@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
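A hedged usage sketch of the API described above, per the 3.11-era include/linux/lz4.h (the exact signatures, the error convention, and the helper name compress_buf are assumptions from that header, not code from this patch):

```c
#include <linux/errno.h>
#include <linux/lz4.h>
#include <linux/slab.h>

/* Compress src into a freshly allocated buffer; caller frees *out. */
static int compress_buf(const unsigned char *src, size_t src_len,
			unsigned char **out, size_t *out_len)
{
	/* The API requires pre-allocated scratch space... */
	void *wrkmem = kmalloc(LZ4_MEM_COMPRESS, GFP_KERNEL);
	unsigned char *dst;
	int ret;

	if (!wrkmem)
		return -ENOMEM;
	/* ...and a destination sized by lz4_compressbound(). */
	dst = kmalloc(lz4_compressbound(src_len), GFP_KERNEL);
	if (!dst) {
		kfree(wrkmem);
		return -ENOMEM;
	}
	*out_len = lz4_compressbound(src_len);	/* in: capacity, out: bytes used */
	ret = lz4_compress(src, src_len, dst, out_len, wrkmem);
	if (ret)		/* non-zero = failure in this era's API (assumed) */
		kfree(dst);
	else
		*out = dst;
	kfree(wrkmem);
	return ret;
}
```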
-
Kyungsik Lee authored
Add support for extracting LZ4-compressed kernel images, as well as LZ4-compressed ramdisk images in the kernel boot process. Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Florian Fainelli <florian@openwrt.org> Cc: Yann Collet <yann.collet.73@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kyungsik Lee authored
Add support for LZ4 decompression in the Linux kernel. The LZ4 decompression APIs for the kernel are based on the LZ4 implementation by Yann Collet.

Benchmark Results (PATCH v3)
Compiler: Linaro ARM gcc 4.6.2

1. ARMv7, 1.5GHz based board
   Kernel: linux 3.4
   Uncompressed Kernel Size: 14MB
        Compressed Size  Decompression Speed
   LZO  6.7MB            20.1MB/s, 25.2MB/s(UA)
   LZ4  7.3MB            29.1MB/s, 45.6MB/s(UA)

2. ARMv7, 1.7GHz based board
   Kernel: linux 3.7
   Uncompressed Kernel Size: 14MB
        Compressed Size  Decompression Speed
   LZO  6.0MB            34.1MB/s, 52.2MB/s(UA)
   LZ4  6.5MB            86.7MB/s

- UA: Unaligned memory Access support
- Latest patch set for LZO applied

This patch set adds support for an LZ4-compressed kernel. LZ4 is a very fast lossless compression algorithm and it also features an extremely fast decoder [1]. But we already have five decompressors, and one question that does arise is where we stop adding new ones. This issue was discussed and a conclusion reached [2]. Russell King said that we should have:
- one decompressor which is the fastest
- one decompressor for the highest compression ratio
- one popular decompressor (e.g. conventional gzip)
If we have a replacement for one of these, then it should do exactly that: replace it. The benchmark shows an 8% increase in image size versus a 66% increase in decompression speed compared to LZO (which has been known as the fastest decompressor in the kernel). Therefore the "fast but may not be small" compression title has clearly been taken by LZ4 [3]. [1] http://code.google.com/p/lz4/ [2] http://thread.gmane.org/gmane.linux.kbuild.devel/9157 [3] http://thread.gmane.org/gmane.linux.kbuild.devel/9347 LZ4 homepage: http://fastcompression.blogspot.com/p/lz4.html LZ4 source repository: http://code.google.com/p/lz4/ Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com> Signed-off-by: Yann Collet <yann.collet.73@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Florian Fainelli <florian@openwrt.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Phillip Lougher authored
Add the glue code, and also update the documentation. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
-
Phillip Lougher authored
Add support for reading file systems compressed with the LZ4 compression algorithm. This patch adds the LZ4 decompressor wrapper code. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
-
Fabian Frederick authored
- Convert printk to pr_foo()
- Add pr_fmt for future logging entries
- Coalesce formats

Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
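The conversion pattern, in generic form (the prefix string and function below are illustrative, not squashfs's exact pr_fmt or call sites):

```c
/* pr_fmt must be defined before the first include so that every
 * pr_*() call in this file picks up the common prefix. */
#define pr_fmt(fmt) "SQUASHFS error: " fmt

#include <linux/printk.h>

static void report_corruption(int block)
{
	/* before: printk(KERN_ERR "SQUASHFS error: bad block %d\n", block); */
	pr_err("bad block %d\n", block);
}
```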
-
Fabian Frederick authored
kmalloc_array() manages count*sizeof overflow. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
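The pattern being applied, as a generic sketch (alloc_table is a hypothetical caller, not the squashfs function touched by this patch):

```c
#include <linux/slab.h>

static int *alloc_table(size_t n)
{
	/* before: kmalloc(n * sizeof(int), GFP_KERNEL) -- if n comes
	 * from untrusted metadata, the multiply can silently wrap.
	 * kmalloc_array() checks the multiplication and returns NULL
	 * on overflow instead. */
	return kmalloc_array(n, sizeof(int), GFP_KERNEL);
}
```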
-
Fabian Frederick authored
Update the last pr_warning call site in the fs branch. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Theodore Ts'o authored
Previously, the no-op "mount -o remount /dev/xxx" operation when the file system is already mounted read-write caused an implied, unconditional syncfs(). This seems pretty stupid, and it's certainly not documented or guaranteed to do this, nor is it particularly useful, except in the case where the file system was mounted rw and is getting remounted read-only. However, it's possible that some file systems are actually depending on this behavior. In most file systems, it's probably fine to only call sync_filesystem() when transitioning from read-write to read-only, and there are some file systems where this is not needed at all (for example, for a pseudo-filesystem or something like romfs). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anders Larsen <al@alarsen.net> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: xfs@oss.sgi.com Cc: linux-btrfs@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: codalist@coda.cs.cmu.edu Cc: linux-ext4@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: fuse-devel@lists.sourceforge.net Cc: cluster-devel@redhat.com Cc: linux-mtd@lists.infradead.org Cc: jfs-discussion@lists.sourceforge.net Cc: linux-nfs@vger.kernel.org Cc: linux-nilfs@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: reiserfs-devel@vger.kernel.org
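The per-filesystem shape this suggests, as a hedged sketch (foo_remount is a hypothetical ->remount_fs implementation; MS_RDONLY is the 3.x-era flag): sync only on the rw to ro transition rather than unconditionally in the VFS.

```c
#include <linux/fs.h>

/* Hypothetical ->remount_fs: flush dirty state only when going from
 * read-write to read-only, which is what the message above says most
 * filesystems actually need. */
static int foo_remount(struct super_block *sb, int *flags, char *data)
{
	if ((*flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY))
		sync_filesystem(sb);
	/* ... parse options and apply the new flags ... */
	return 0;
}
```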
-
Phillip Lougher authored
Direct decompression into the page cache: if we fall back to using an intermediate buffer (because we cannot grab all the page cache pages) and the decompress fails, we forgot to release the pages. Reported-by: Roman Peniaev <r.peniaev@gmail.com> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
-
Phillip Lougher authored
Fix static checker complaint that stream is not checked in squashfs_decompressor_destroy(). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
-
Phillip Lougher authored
This introduces an implementation of squashfs_readpage_block() that directly decompresses into the page cache. It uses the previously added page handler abstraction to push the necessary kmap_atomic/kunmap_atomic operations on the page cache buffers down into the decompressors, enabling direct copying into the page cache without the slow kmap/kunmap calls. The code detects when multiple threads are racing in squashfs_readpage() to decompress the same block, and avoids this regression by falling back to using an intermediate buffer. This patch enhances the performance of Squashfs significantly when multiple processes are accessing the filesystem simultaneously, because it not only reduces memcpying but, more importantly, eliminates the lock contention on the intermediate buffer.

Using single-thread decompression:

dd if=file1 of=/dev/null bs=4096 &
dd if=file2 of=/dev/null bs=4096 &
dd if=file3 of=/dev/null bs=4096 &
dd if=file4 of=/dev/null bs=4096

Before: 629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s
After:  629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
-
Phillip Lougher authored
Restructure squashfs_readpage() splitting it into separate functions for datablocks, fragments and sparse blocks. Move the memcpying (from squashfs cache entry) implementation of squashfs_readpage_block into file_cache.c This allows different implementations to be supported. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
-