compsize takes a list of files on a btrfs filesystem (recursing into directories) and measures the
compression types used and the effective compression ratio. Besides compression, compsize shows the effect of
reflinks (cp --reflink, snapshots, deduplication) and certain types of btrfs waste.
The program gives a report similar to:

    Processed 90319 files.
    Type       Perc     Disk Usage   Uncompressed Referenced
    TOTAL       79%      1.4G         1.8G         1.9G
    none       100%      1.0G         1.0G         1.0G
    lzo         53%      446M         833M         843M

The fields above are:

Type
    compression algorithm
Perc
    disk usage / uncompressed (compression ratio)
Disk Usage
    blocks on the disk; this is what storing these files actually costs you (save for RAID
    considerations)
Uncompressed
    uncompressed extents; what you would need without compression. Shared extents are counted
    only once, so this includes deduplication savings; it also includes pinned extent waste
Referenced
    apparent file sizes (sans holes); this is what a traditional filesystem that supports holes
    and efficient tail packing, or tar -S, would need to store these files
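To make the relationship between the columns concrete, here is a minimal Python sketch of how per-type rows and the TOTAL row could be aggregated. It assumes a hypothetical in-memory list of extents, each already annotated with its compression type, on-disk size, uncompressed size and the total bytes referenced through it; `Extent` and `report` are illustrative names, not part of compsize, which gathers this information from the filesystem itself. The key point it shows is that an extent shared by reflinks or snapshots is added once to Disk Usage and Uncompressed but once per reference to Referenced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical record for one on-disk extent (not a compsize type)."""
    ctype: str         # compression type, e.g. "none", "lzo", "zlib", "zstd"
    disk: int          # bytes the extent occupies on disk
    uncompressed: int  # bytes after decompression
    referenced: int    # bytes referenced through it, summed over every file/snapshot


def report(extents):
    """Return {row: (perc, disk, uncompressed, referenced)}, TOTAL row first.

    Each extent must appear once in `extents`, however many files share it,
    so shared (reflinked/deduplicated) data adds to Disk Usage and
    Uncompressed only once but to Referenced once per reference.
    """
    rows = {"TOTAL": [0, 0, 0]}
    for e in extents:
        for key in ("TOTAL", e.ctype):
            row = rows.setdefault(key, [0, 0, 0])
            row[0] += e.disk
            row[1] += e.uncompressed
            row[2] += e.referenced
    # Perc = disk usage / uncompressed; rounding to a whole percent is an assumption
    return {
        key: (round(100 * d / u) if u else 0, d, u, r)
        for key, (d, u, r) in rows.items()
    }


KiB = 1024
extents = [
    # an uncompressed 1 MiB extent shared by a file and its cp --reflink copy
    Extent("none", 1024 * KiB, 1024 * KiB, 2 * 1024 * KiB),
    # a 128K extent that lzo compressed to 64K
    Extent("lzo", 64 * KiB, 128 * KiB, 128 * KiB),
]
for ctype, (perc, disk, uncomp, ref) in report(extents).items():
    print(f"{ctype:<6} {perc:>3}% {disk:>8} {uncomp:>8} {ref:>8}")
```

In this toy input the "none" row shows Referenced at twice its Disk Usage, which is exactly the reflink saving the Referenced column is meant to expose.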
Let's see this on an example: a 128K file is stored as a single extent A, which compresses to a single
4K page. The file then receives a 32K write at offset 32K, which also compresses to a single 4K page and is
stored as extent B.
The file now appears as:

                +-------+-------+---------------+
    extent A    | used  | waste | used          |
                +-------+-------+---------------+
    extent B            | used  |
                        +-------+

The "waste" inside extent A can't be gotten rid of until the whole extent is rewritten (for example by
defrag). If the extent is compressed, the whole extent must be read and decompressed every time any part of
it is read, so the "waste" is still required.
In this case, we have: Disk Usage: 8K (two 4K pages), Uncompressed: 160K (128K for A plus 32K for B),
Referenced: 128K (the file's apparent size).
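The worked example can be reproduced with a small Python sketch. It assumes a hypothetical model of the file's extent map as a list of (file offset, length, extent, offset inside extent) tuples; `Extent`, `file_map`, `totals` and `pinned_waste` are illustrative names, not compsize's own data structures, and `name` merely stands in for whatever identifies an extent on disk.

```python
from dataclasses import dataclass

KiB = 1024


@dataclass(frozen=True)
class Extent:
    """Hypothetical extent record; `name` stands in for its on-disk identity."""
    name: str
    disk: int          # compressed size on disk
    uncompressed: int  # size after decompression


A = Extent("A", 4 * KiB, 128 * KiB)  # original 128K write, compressed to one 4K page
B = Extent("B", 4 * KiB, 32 * KiB)   # 32K overwrite, compressed to one 4K page

# File map after the overwrite: (file offset, length, extent, offset inside extent)
file_map = [
    (0 * KiB, 32 * KiB, A, 0 * KiB),
    (32 * KiB, 32 * KiB, B, 0 * KiB),
    (64 * KiB, 64 * KiB, A, 64 * KiB),
]


def totals(mapping):
    """Return (disk usage, uncompressed, referenced) in bytes."""
    extents = {e for _, _, e, _ in mapping}  # each extent counted once
    disk = sum(e.disk for e in extents)
    uncompressed = sum(e.uncompressed for e in extents)
    referenced = sum(length for _, length, _, _ in mapping)
    return disk, uncompressed, referenced


def pinned_waste(mapping):
    """Bytes of each extent no longer referenced (assumes no byte is referenced twice)."""
    used = {}
    for _, length, e, _ in mapping:
        used[e] = used.get(e, 0) + length
    return {e.name: e.uncompressed - n for e, n in used.items()}


disk, uncompressed, referenced = totals(file_map)
print(f"Disk Usage: {disk // KiB}K, Uncompressed: {uncompressed // KiB}K, "
      f"Referenced: {referenced // KiB}K")
print("pinned waste:", pinned_waste(file_map))
```

Running it prints the same 8K / 160K / 128K figures as the text and reports the 32K of waste pinned inside extent A; the gap between Uncompressed and Referenced is exactly that pinned waste.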