Friday, May 2, 2014

keeping that data forever (with btrfs)

So we like Linux, that is established.  And we must have our data forever, that is also established.  But the ability to actually keep data safe forever is a newer thing.  btrfs is a very new thing, allegedly not well supported before kernel 3.9, but it can do some pretty awesome stuff.

If that interests you, read this article.  I've read it a few times, and it absolutely makes me drool over the possibilities of btrfs.  Failing-ish drives will no longer corrupt files in the mp3 collection you've been curating since high school - an errant cosmic ray can no longer bork that video file from senior week.  With anything but zfs or btrfs (even mdadm or hardware raid) that kind of silent corruption can still happen.
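To make that concrete: the protection comes from btrfs checksumming every block, and a periodic scrub reads everything back, compares it against the checksums, and repairs anything that doesn't match from a good copy where there's redundancy to repair from.  Once the array built later in this post is mounted, kicking one off and checking on it is just:

$ btrfs scrub start /mnt/thevault
$ btrfs scrub status /mnt/thevault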

So, even though it's probably ill-advised and guaranteed to lose all the data because it's so new, I built a nice shiny new NAS box to test btrfs out on.  I'll just be doing backups from other personal machines with it, so if it crashes and burns, the worst case is that some other device fails catastrophically at the same time and my decades of personal data go with it.  If you use this at your job based solely on these instructions, you're insane.  And I like your style.

Start off by installing Debian jessie.  Or use wheezy and get the newer kernel from backports, but the newer btrfs-tools in jessie isn't backported.  Some of this stuff probably won't work with the older wheezy btrfs-tools.
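For the record, the wheezy-backports route looks roughly like this (assuming an amd64 box; adjust the mirror to taste):

$ echo "deb http://http.debian.net/debian wheezy-backports main" >> /etc/apt/sources.list
$ apt-get update
$ apt-get -t wheezy-backports install linux-image-amd64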

I bought a couple 3T drives, and I had a few 1.5T drives lying around, so I figured, why not make it interesting and get another 3T by turning three spare 1.5T drives into a 3T mdadm raid5?  One of those drives was busy elsewhere today, so we're creating the array degraded for now.

Partition each of the raid5 disks with a single Linux raid autodetect partition, like this:

$ fdisk /dev/sdd
Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): 
Using default response p
Partition number (1-4, default 1): 
Using default value 1
First sector (2048-2930277167, default 2048): 
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-2930277167, default 2930277167): 
Using default value 2930277167

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
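If you'd rather script that than answer fdisk's prompts, a few parted commands should produce the same single raid partition (usual caveat: point it at the right disk, because mklabel wipes the partition table):

$ parted -s /dev/sdd mklabel msdos
$ parted -s /dev/sdd mkpart primary 2048s 100%
$ parted -s /dev/sdd set 1 raid on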

Create the mdadm array (with the busy drive marked missing for now), check that it came up, and append it to the mdadm config file:

$ mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdd1 /dev/sde1 missing
$ cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[1] sdd1[0]
      2930012160 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      bitmap: 0/11 pages [0KB], 65536KB chunk

unused devices: <none>
$ mdadm --examine --scan >> /etc/mdadm/mdadm.conf
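When the third 1.5T drive frees up, it gets the same partitioning treatment and then simply gets added to the array, which kicks off a rebuild (the device name here is a placeholder for whatever it shows up as):

$ mdadm --add /dev/md0 /dev/sdi1
$ cat /proc/mdstat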

Now we make the raid6 btrfs.  I used --force because some of these disks had already been formatted, but be careful with that option - it keeps mkfs from refusing to run when it finds an existing filesystem on a disk.

$ mkfs.btrfs --data raid6 --metadata raid6 --label thevault --force /dev/sd[abfgh] /dev/md0

I stupidly ran the command on the actual physical console, so I can't paste the output, but it looks like the optional features "extref" and "raid56" are turned on by default when you use mkfs.btrfs this way.
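As an aside on --force: if you want to see what it's about to clobber before committing, wipefs with no options just lists any existing filesystem or raid signatures without touching them:

$ for d in /dev/sd[abfgh] /dev/md0; do wipefs $d; done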

Check for your new fs:

$ btrfs filesystem show
Label: 'root'  uuid: 810ff954-180f-4997-a487-30c561ff3820
        Total devices 1 FS bytes used 1.28GiB
        devid    1 size 28.35GiB used 5.04GiB path /dev/sdc1

Label: 'thevault'  uuid: 7f407975-6c24-42ec-a9a5-4d1967d89cbe
        Total devices 6 FS bytes used 112.00KiB
        devid    1 size 2.73TiB used 2.02GiB path /dev/sda
        devid    2 size 2.73TiB used 2.00GiB path /dev/sdb
        devid    3 size 2.73TiB used 2.00GiB path /dev/sdf
        devid    4 size 2.73TiB used 2.00GiB path /dev/sdg
        devid    5 size 2.73TiB used 2.00GiB path /dev/sdh
        devid    6 size 2.73TiB used 2.00GiB path /dev/md0

Btrfs v3.14.1

There's my boot disk and new storage array!  I'm guessing the already-used space is the disk metadata?

Now mount it so we can df it.  You can use any component drive as the source argument of the mount command and btrfs will automatically find the other members of the filesystem.  I'm using the md block device because it's the easiest to remember here:

$ mkdir /mnt/thevault && mount /dev/md0 /mnt/thevault
$ btrfs filesystem df /mnt/thevault
Data, single: total=8.00MiB, used=0.00
Data, RAID6: total=4.00GiB, used=2.00MiB
System, single: total=4.00MiB, used=0.00
System, RAID6: total=10.50MiB, used=16.00KiB
Metadata, single: total=8.00MiB, used=0.00
Metadata, RAID6: total=4.00GiB, used=112.00KiB

We have metadata, outstanding.  Regular df appears to work too:

$ df
Filesystem       1K-blocks    Used   Available Use% Mounted on
/dev/sdc1         29729792 1424256    26350720   6% /
udev                 10240       0       10240   0% /dev
tmpfs               828964     416      828548   1% /run
tmpfs                 5120       0        5120   0% /run/lock
tmpfs              1964900       0     1964900   0% /run/shm
/dev/sda       17581345080    2176 17572912576   1% /mnt/thevault

So it looks like it works.  Time to make some samba shares and copy in a ton of stuff.  I'll post back later if there are any problems, or just to mention how it goes after some time has elapsed.
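Two quick notes for future me.  First, since I mounted by hand above, the box needs an fstab entry to bring the filesystem back after a reboot - something along these lines, using the uuid from btrfs filesystem show (multi-device btrfs occasionally wants a btrfs device scan before mounting, so if it doesn't come up on boot, check that first):

UUID=7f407975-6c24-42ec-a9a5-4d1967d89cbe  /mnt/thevault  btrfs  defaults  0  0

Second, the samba part will look roughly like this - the share name and user are placeholders, and the service is smbd on jessie (plain samba on wheezy):

$ apt-get install samba
$ cat >> /etc/samba/smb.conf << 'EOF'

[thevault]
   path = /mnt/thevault
   read only = no
   valid users = backupuser
EOF
$ smbpasswd -a backupuser
$ service smbd restart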
