Deduplicating Data With XFS And Reflinks

Copy-on-Write filesystems have the nice property that it is possible to "clone" files instantly: the new file refers to the same data blocks as the old one, and a block is copied only when one of the two files modifies it. This saves both time and space, and can be very beneficial in a lot of situations (for example when working with big files). In Linux this type of copy is called a "reflink". We will see how to use it on the XFS filesystem.

1 Create a virtual block device

Linux supports a special block device called the loop device, which maps a normal file onto a virtual block device. This allows the file to be used like a disk, so we can create a filesystem on it. Let's create such a file:

$ dd if=/dev/zero of=disk.img bs=100M count=10
10+0 records in
10+0 records out
1048576000 bytes (1,0 GB, 1000 MiB) copied, 10,8156 s, 97,0 MB/s

The file takes about 1 GB (1000 MiB) on disk:

$ du -hs disk.img
1001M	disk.img
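As an alternative to writing 1000 MiB of zeros with dd (this is not what the steps in this post use), the same image can be created as a sparse file with GNU `truncate`, which only sets the file length and lets the host filesystem allocate blocks lazily. A minimal sketch, assuming GNU coreutils; with a sparse image `du` reports almost nothing instead of 1001M, and the later losetup and mkfs.xfs steps should work the same way:

```shell
#!/bin/sh
# Sketch: create a sparse 1000 MiB image instead of writing zeros with dd.
# Assumes GNU coreutils (truncate, stat -c, du).
set -e
img=disk.img

# Set the apparent length to 1000 MiB without writing any data blocks.
# In GNU truncate, the M suffix means MiB (1024 * 1024 bytes).
truncate -s 1000M "$img"

# Apparent size in bytes: 1000 * 1024 * 1024 = 1048576000, same as the dd image.
stat -c %s "$img"

# Space actually allocated on the host filesystem (close to zero for now).
du -h "$img"
```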

Now let's create a loop device with this file:

$ sudo losetup -f disk.img
$ losetup -a
/dev/loop0: []: (/home/user/disk.img)

The option -f attaches the file to the first unused loop device, and losetup -a lists all attached loop devices, so we can see that the file is now available as /dev/loop0.

2 Create an XFS filesystem with the 'reflink' flag

Let's create an XFS filesystem on the image file (which backs /dev/loop0), with the option reflink=1 and the label test:

$ mkfs.xfs -m reflink=1 -L test disk.img
meta-data=disk.img               isize=512    agcount=4, agsize=64000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=1
data     =                       bsize=4096   blocks=256000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=1850, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Now we can mount it on a directory:

$ mkdir mnt
$ sudo mount /dev/loop0 mnt
$ df -h mnt/
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      993M   40M  954M   4% /home/user/mnt

3 Copy files with '--reflink'

Let's give our user ownership of the mounted filesystem and create a 100 MB test file filled with random data:

$ sudo chown user: mnt
$ cd mnt/
$ dd if=/dev/urandom of=test bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0,445498 s, 235 MB/s

Now df -h . shows 140M used (the 100M of test data plus the 40M the empty filesystem was already using):

$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      993M  140M  854M  15% /home/user/mnt
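Before making the copy, it can be handy to confirm that a directory really supports reflinks. A minimal sketch, assuming GNU `cp`, whose `--reflink=always` fails instead of silently falling back to a normal copy when cloning is not possible; the function name `reflink_supported` is hypothetical and not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if cp can create a reflink inside DIR,
# 1 if it cannot, 2 if no probe file can be created there (e.g. no permission).
reflink_supported() {
    dir=${1:-.}
    src=$(mktemp "$dir/.reflink-probe.XXXXXX") || return 2
    printf 'probe\n' > "$src"
    # With --reflink=always, GNU cp fails rather than doing a full copy.
    if cp --reflink=always "$src" "$src.clone" 2>/dev/null; then
        status=0
    else
        status=1
    fi
    rm -f "$src" "$src.clone"
    return "$status"
}

# Example: probe the current directory (the mounted XFS filesystem).
reflink_supported .
case $? in
    0) echo "reflinks supported here" ;;
    1) echo "no reflink support here" ;;
    *) echo "could not create a probe file here" ;;
esac
```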

Let’s copy the file with reflinks enabled:

$ cp -v --reflink=always test test1
'test' -> 'test1'

$ ls -hsl
total 200M
100M -rw-rw-r-- 1 user user 100M Aug 30 10:44 test
100M -rw-rw-r-- 1 user user 100M Aug 30 10:49 test1

$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      993M  140M  854M  15% /home/user/mnt

So, each file is 100M in size, but together they still occupy the same amount of disk space as before (140M), because the copy shares all of its blocks with the original. This shows the space-saving feature of reflinks. With a big enough file, we would also notice that the reflink copy finishes instantly, since no data is actually copied.
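The copy-on-write behaviour can also be observed directly: overwrite part of a reflinked copy, and the original keeps its old content, because only the modified blocks receive new storage. A minimal sketch using the hypothetical file names `orig.bin` and `clone.bin`; it uses `--reflink=auto` so that it also runs on filesystems without reflink support, where cp silently makes a full copy and the block sharing itself is not exercised:

```shell
#!/bin/sh
# Sketch: writing to a reflinked copy must not change the original.
# orig.bin and clone.bin are hypothetical file names for this demo.
set -e

# 8 MiB of random data as the original file.
dd if=/dev/urandom of=orig.bin bs=1M count=8 iflag=fullblock status=none
sum_before=$(sha256sum < orig.bin)

# Clone it; on XFS with reflink=1 the clone shares all extents with orig.bin.
# (--reflink=auto falls back to a normal copy where cloning is unsupported.)
cp --reflink=auto orig.bin clone.bin

# Overwrite the first 1 MiB of the clone in place; conv=notrunc keeps the rest.
# On a copy-on-write filesystem only these blocks get newly allocated.
dd if=/dev/urandom of=clone.bin bs=1M count=1 iflag=fullblock conv=notrunc status=none

# The original still has its old content, while the clone has diverged.
if [ "$(sha256sum < orig.bin)" = "$sum_before" ]; then
    echo "orig.bin unchanged"
fi
if ! cmp -s orig.bin clone.bin; then
    echo "clone.bin differs"
fi
```

Running `df -h .` before and after the second dd on the XFS mount should show usage growing by roughly the 1 MiB that was rewritten, not by the whole 8 MiB.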

4 Deduplicate existing files

If there are already normal (non-reflink) copies of a file, we can deduplicate them afterwards with a tool like duperemove. Let's make two ordinary copies:

$ cp test test2
$ cp test test3

$ ls -hsl
total 400M
100M -rw-rw-r-- 1 user user 100M Aug 30 10:44 test
100M -rw-rw-r-- 1 user user 100M Aug 30 10:49 test1
100M -rw-rw-r-- 1 user user 100M Aug 30 11:03 test2
100M -rw-rw-r-- 1 user user 100M Aug 30 11:03 test3

$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      993M  340M  654M  35% /home/user/mnt
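Since `df -h` rounds its figures, one way to measure the effect of the next step exactly is to record the number of used bytes before and after deduplicating. A sketch assuming GNU `df` (for `-B1` and `--output`); the helper name `used_bytes` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the number of bytes in use on the filesystem
# that contains DIR (default: the current directory).
# Assumes GNU df: -B1 uses 1-byte units, --output=used prints only that column.
used_bytes() {
    df -B1 --output=used "${1:-.}" | tail -n 1 | tr -d ' '
}

before=$(used_bytes .)
# ... run the deduplication step here, e.g. duperemove ...
after=$(used_bytes .)

# Exact difference in bytes, also shown in whole MiB (integer division).
echo "freed $((before - after)) bytes ($(( (before - after) / 1048576 )) MiB)"
```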

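Now let's install duperemove and run it on the current directory (-r recurses into subdirectories, -d actually submits the duplicate extents for deduplication, -h prints sizes in human-readable form, and --hashfile keeps the block hashes in a file instead of in memory):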

$ sudo apt install duperemove

$ duperemove -hdr --hashfile=/tmp/test.hash .

$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      993M  198M  796M  20% /home/user/mnt

So, it has reduced the amount of disk space used from 340M to 198M.
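Deduplication only shares extents whose contents the kernel has verified to be identical, so the files should read back exactly as before. A cautious way to confirm this is to record checksums before running duperemove and verify them afterwards. A minimal sketch with GNU `sha256sum`; the function names `record_sums` and `verify_sums` and the path `/tmp/test.sha256` are arbitrary choices:

```shell
#!/bin/sh
# Sketch: check that deduplication did not change any file contents.
# /tmp/test.sha256 is an arbitrary location for the checksum list.
sumfile=/tmp/test.sha256

# Record a SHA-256 checksum for every regular file under DIR.
record_sums() {
    find "${1:-.}" -type f -exec sha256sum {} + > "$sumfile"
}

# Re-read the files and compare them with the recorded checksums.
# --quiet suppresses the "OK" lines; the exit status is 0 only if all match.
verify_sums() {
    sha256sum --quiet -c "$sumfile"
}
```

Usage: run `record_sums .` inside mnt/ before `duperemove`, and `verify_sums` after it; silence and exit status 0 mean every file still has its original content.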

5 Clean up

Unmount and delete the test directory mnt/:

$ cd ..
$ sudo umount mnt/
$ rmdir mnt/

Detach the loop device:

$ losetup -a
$ sudo losetup -d /dev/loop0
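If several loop devices are attached, picking the right one from `losetup -a` by eye is error-prone. A sketch that instead asks `losetup` (util-linux) which devices are backed by the image, using its `-j`/`--associated` option, and detaches only those; the function name `detach_image` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: detach every loop device backed by the given image file.
# losetup -j FILE prints lines like "/dev/loop0: []: (/home/user/disk.img)";
# the device name is the text before the first colon.
detach_image() {
    img=$1
    devs=$(losetup -j "$img" 2>/dev/null | cut -d: -f1)
    if [ -z "$devs" ]; then
        echo "no loop device is attached to $img" >&2
        return 1
    fi
    for dev in $devs; do
        sudo losetup -d "$dev"
    done
}

# Usage: detach_image disk.img
```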

Remove the file that was used to create the loop device:

$ rm disk.img

Date: 2019-08-30

Author: Dashamir Hoxha

Created: 2019-08-30 Fri 20:28

