Deduplicating Btrfs
1. Intro
BTRFS supports deduplication. According to the BTRFS docs:
Deduplication is the process of looking up identical data blocks tracked separately and creating a shared logical link while removing one of the copies of the data blocks. This leads to data space savings while it increases metadata consumption.
BTRFS provides the basic building blocks for deduplication allowing other tools to choose the strategy and scope of the deduplication.
So, to take advantage of deduplication in BTRFS, we have to use one of these deduplication tools.
We will use BEES (Best-Effort Extent-Same). It is a block-oriented userspace deduplication agent designed for large btrfs filesystems.
2. Installation
On a Debian 12 server we have to build it from source:
cd ~
git clone https://github.com/Zygo/bees
cd bees/
apt install -y build-essential btrfs-progs markdown
make
make install
which beesd
apt install -y uuid-runtime # provides 'uuidparse'
3. Configuration
For more details look at this page.
First we need to find out the UUIDs of the filesystems we want to run Bees on:
btrfs filesystem show
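When there are several filesystems, it can be handy to pull only the UUIDs out of that output. The following is a minimal sketch, not part of Bees; the function name extract_btrfs_uuids is made up for this example, and it assumes each filesystem is reported on a line containing "uuid: <UUID>":

```shell
# Hypothetical helper: read `btrfs filesystem show` output on stdin and
# print the value that follows each "uuid:" token, one per line.
extract_btrfs_uuids() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "uuid:") print $(i + 1) }'
}

# Typical use on a real system (needs root):
#   btrfs filesystem show | extract_btrfs_uuids
```

Lines without a "uuid:" token (device and usage lines) are simply ignored.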
Then we should create a config file for each filesystem, like this:
cat <<EOF > /etc/bees/disk1.conf
UUID=91f2d0de-6678-4e89-9b0d-9ab8bdc724f2
OPTIONS="-P -v 6"
DB_SIZE=$((256*1024*1024))
EOF
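Note that the heredoc above uses an unquoted EOF, so the shell evaluates $((256*1024*1024)) while writing the file, and the config ends up containing the plain number 268435456 (256 MiB in bytes). The sketch below illustrates this with a temporary file; the helper write_bees_conf and its parameters are hypothetical, introduced only for this example:

```shell
# Hypothetical helper: write a bees config for one filesystem.
#   $1 = filesystem UUID, $2 = hash table size in MiB, $3 = output file
write_bees_conf() {
    uuid=$1 size_mib=$2 out=$3
    # Unquoted EOF: $uuid and $(( ... )) are expanded now, so the file
    # ends up containing the plain byte count, not the expression.
    cat <<EOF > "$out"
UUID=$uuid
OPTIONS="-P -v 6"
DB_SIZE=$((size_mib * 1024 * 1024))
EOF
}

# Demo against a temporary file instead of /etc/bees/disk1.conf.
demo_conf=$(mktemp)
write_bees_conf 91f2d0de-6678-4e89-9b0d-9ab8bdc724f2 256 "$demo_conf"
cat "$demo_conf"
```

With a quoted delimiter (<<'EOF') the expression would be written literally instead.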
The sample config file /etc/bees/beesd.conf.sample also contains comments that explain the options.
4. Running
We want to run Bees as a service (one for each filesystem):
cp ~/bees/scripts/beesd@.service /lib/systemd/system/
systemctl enable --now beesd@fe0a1142-51ab-4181-b635-adbf9f4ea6e6.service
systemctl enable --now beesd@91f2d0de-6678-4e89-9b0d-9ab8bdc724f2.service
systemctl status 'bees*'
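With more than a couple of filesystems, typing one enable command per UUID gets tedious. As a rough sketch, the commands can be generated from a list of UUIDs and reviewed before running them. The helper name print_bees_enable_cmds is made up for this example, and it only prints the commands, it does not execute them:

```shell
# Hypothetical helper: print (not run) one enable command per UUID given
# as an argument, using the beesd@<UUID>.service naming shown above.
print_bees_enable_cmds() {
    for uuid in "$@"; do
        printf 'systemctl enable --now beesd@%s.service\n' "$uuid"
    done
}

print_bees_enable_cmds \
    fe0a1142-51ab-4181-b635-adbf9f4ea6e6 \
    91f2d0de-6678-4e89-9b0d-9ab8bdc724f2
```

Once the printed commands look right, the output can be piped to sh as root.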
After it has been running for some time (maybe a few hours or more), we will notice that the amount of used disk space has decreased:
btrfs filesystem show
The command top will also show that bees is working intensively.
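To put a number on the decrease in used space, the byte counts before and after can be compared. The sketch below assumes the output of btrfs filesystem show --raw, where each filesystem has a line containing "FS bytes used <bytes>"; both helper names are hypothetical. Keep in mind that new writes also change the used space, so the result is only a rough indication:

```shell
# Hypothetical helper: print the number after "FS bytes used" for each
# filesystem found on stdin (expects `btrfs filesystem show --raw` output).
fs_bytes_used() {
    awk '{ for (i = 1; i <= NF - 3; i++)
               if ($i == "FS" && $(i + 1) == "bytes" && $(i + 2) == "used")
                   print $(i + 3) }'
}

# Hypothetical helper: percentage saved between two byte counts.
#   $1 = bytes used before, $2 = bytes used after
percent_saved() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

# Typical use on a real system (needs root):
#   before=$(btrfs filesystem show --raw /mnt/disk1 | fs_bytes_used)
#   ... let bees run for a while ...
#   after=$(btrfs filesystem show --raw /mnt/disk1 | fs_bytes_used)
#   percent_saved "$before" "$after"
```

The mount point /mnt/disk1 in the comments is a placeholder for the actual filesystem path.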