Tuesday, September 30, 2014

btrfs raid1 as root file system - the immortal life of lil turbo

Sometimes, you just need to reformat.  Instead of trying to convert my existing system from extX to btrfs raid, I reinstalled.  And then converted my brand-new system from ext to btrfs.  Because why do things the easy way when you could do them the hard way?  (If you want to actually convert an existing system, just skip to step 3.  It should work, but this is all cowboy-style, so don't blame me if everything explodes.)  Here are the basic steps.  Be warned, this is all from memory that's a few weeks old:

(If you want to read about how I got here, check this out.  If you just want a guide to do this, the backstory doesn't matter, so just keep reading this page.)
  1. Install Wheezy.  However you normally do; all the defaults are fine.  If you feel like it, you can halve the size of swap and add that back into the system partition.  Or you can do this later with gparted, or you can leave it and have twice as much disk devoted to swap as the Debian installer thinks you'll need.
  2. Upgrade to Jessie.  Also in the normal way.
  3. root@serv$ vi /etc/apt/sources.list
    :%s/wheezy/jessie/g
    :%s/stable/testing/g
    :%s/^deb-src/#deb-src/g
    :wq
    root@serv$ apt-get update
    root@serv$ apt-get dist-upgrade -y
    
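    And a quick, optional sanity check that the dist-upgrade actually landed you on Jessie before you go any further - /etc/debian_version is a safe thing to peek at:
    root@serv$ cat /etc/debian_version # should say something jessie-flavored, like "jessie/sid"
    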
  4. Boot into an alternate Jessie environment.  Or at least something with recent btrfs-tools.  Ubuntu may work, but I made a custom Jessie iso on the Debian live-systems build interface.  This tool is really cool - someone's dedicating a lot of server time to make it happen, and I think it's awesome.  On the downside, you'll probably have to wait a few days before you make it to the top of the queue, and once your "build finished" email goes out, you'll have to download the iso within 24 hours, before they delete it.
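    Whatever image you end up with, writing it to a USB stick is the usual dd affair.  The filename and target device below are made up - substitute your own and triple-check the of= target before you hit enter:
    root@serv$ dd if=custom-jessie-live.iso of=/dev/sdUSB bs=4M && sync
    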
  5. Install btrfs-tools in the live boot.  Once booted into the new environment, we'll need btrfs-tools, of course.  Run btrfs --version to make sure you've got something sane - if you're stuck on the ancient Wheezy 0.19 release, the rest of this may not work.  The correct version should sound like a kernel version number; mine is currently π.
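    If the live image doesn't already ship btrfs-tools, grabbing and checking it goes something like this (the exact version string is whatever you get - as long as it's 3.x-ish and not 0.19, you're good):
    root@serv$ apt-get update && apt-get install -y btrfs-tools
    root@serv$ btrfs --version # prints something like "Btrfs v3.14.x"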

  6. :)
  7. Convert the just-installed ext root to btrfs.  I got most of my instructions on this step from the occasionally wonderful btrfs wiki.  It doesn't matter if your root is ext3 or ext4 - in fact, these steps may even work with ext2, how should I know.  The steps go something like this:
    root@serv$ # run fdisk -l as root to make sure you're using the hard disk's root filesystem partition for these next steps
    root@serv$ fsck -f /dev/sdX1
    root@serv$ btrfs-convert /dev/sdX1
    
    Use the following optional but prudent steps to make sure your data survived:
    root@serv$ mkdir /btrfs && mount -t btrfs /dev/sdX1 /btrfs
    root@serv$ btrfs subvol list /btrfs
    root@serv$ # find the name of the saved subvolume, something like extX_saved
    root@serv$ mkdir /ext_saved && mount -t btrfs -o subvol=extX_saved /dev/sdX1 /ext_saved
    root@serv$ mkdir /orig && mount -o loop,ro /ext_saved/image /orig
    
    Yep, that's a triple mount.  The contents of the last mount should be the same as the contents of your root filesystem.  Check anything important or customized - a quick recursive diff works too (sketch below) - and, if you're satisfied and want to set everything in stone, run the cleanup in the next step.
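    Here's roughly what that spot check could look like - /etc is just an example target, and the paths come from the mounts above:
    root@serv$ diff -qr /orig/etc /btrfs/etc # no output means they match
    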
  8. root@serv$ umount /orig /ext_saved # unmount the loop image and the saved subvolume before deleting anything
    root@serv$ btrfs subvol delete /btrfs/extX_saved # this takes the saved image with it
    root@serv$ umount /btrfs
    root@serv$ rmdir /orig /ext_saved /btrfs
    
  9. Modify fstab. Make sure you change fstab or your system isn't going to boot, fool.  And it's the fstab on the converted root that matters here, not the live session's.  Use blkid to get the UUID of the root partition and make sure it matches the entry for / in that fstab (I don't think the UUID will change during the conversion, but I can't remember - check it anyway; there's a sketch at the end of this step).  Then make sure the line looks something like this:
  10. UUID=deadbeef-beef-dead-beef-deadbeefbeef    /    btrfs    noatime,ssd,discard,space_cache    0    0
    
    Yes, it is correct that btrfs roots get a 0 for passno, the last number - this means don't worry about running fsck, since fsck.btrfs is just a feel-good utility anyway.  They only released it to fit in; the whole story's in the manpage, which is a pretty good read, btw.
    root@serv$ man fsck.btrfs
    
    Anyway, back to stuff that matters - don't just blindly copy the mount options from my fstab line.  If you use ssd on a drive that isn't an SSD, you'll probably have a bad time.  I didn't turn on certain options like autodefrag and compress=lzo because this is intended to be a VM server, and also probably because I don't know what I'm doing.  Check this out, the corresponding page on the ever-helpful wiki.  The whole thing is worth a read - make some damn decisions of your own!
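    For the lazy version of that UUID check from earlier in this step - sdX1 being your root partition as before, and remember the fstab you're editing lives on the converted filesystem:
    root@serv$ blkid /dev/sdX1 # compare this UUID against the / line in the fstab below
    root@serv$ mount /dev/sdX1 /mnt # remount the converted root if you already unmounted it in step 8
    root@serv$ vi /mnt/etc/fstab
    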
  11. Pop out the alternate boot media and reboot.  Sometimes, when emerging from deeply nested sessions, chroots, or alternate boot environments, don't you feel like Cobb waking at the end of Inception?  Anyway, you should be booted into the newly buttery root of your recently installed system now.
  12. Verify integrity and clean up.  I know that shit's boring, yo, but we're gonna do it anyway.
  13. root@serv$ btrfs subvol delete /extX_saved # only needed if you skipped the cleanup in step 8 - once booted, the saved subvolume shows up at /extX_saved
    root@serv$ # allegedly you can verify the deletion with btrfs subvol list -d /, but the manpage for the btrfs-tools version π on Jessie didn't have this documented
    root@serv$ btrfs fi defrag -r /
    root@serv$ btrfs balance start /
    
  14. Add the secondary drive and partition.  To get the second drive partitioned properly, I simply popped in the second drive and dd'ed the existing disk to the second one.
  15. root@serv$ dd if=/dev/sdSETUPDRIVE of=/dev/sdNEWDRIVE bs=32M # don't fuck this up, mmk?
    
    This will clone our boot, system and swap partitions to the new drive.  For general applications, I recommend halving each swap.  Even though I never had you touch the swap part of fstab, your existing swap keeps working fine; the catch is that the dd gives the cloned swap partition the same UUID as the original, so out of the box only one of the two will actually be used.  If you want swap on both drives, give the new one its own UUID with mkswap and add a second fstab line (sketch below).
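    If you do want swap active on both drives, the sketch is something like this - the partition number here is hypothetical, check fdisk -l for yours:
    root@serv$ swapon -s # see what's actually in use right now
    root@serv$ mkswap /dev/sdNEWDRIVE5 # gives the cloned swap partition its own UUID
    root@serv$ blkid /dev/sdNEWDRIVE5 # grab that UUID for a second swap line in fstab
    root@serv$ swapon /dev/sdNEWDRIVE5
    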
  16. Convert to raid1 live!  "Fuck it, we're doing it live."  Yeah, computers are pretty cool I guess.  From here.
  17. root@serv$ btrfs fi show # to see which device is mounted as root
    root@serv$ fdisk -l # to see which device will be added to form our raid1 (aka, which one is NOT root)
    root@serv$ # if the next command complains about an existing filesystem left over from the dd, wipefs -a the partition first (newer btrfs-progs also take a -f on the add)
    root@serv$ btrfs device add /dev/sdNOTBOOT1 /
    root@serv$ btrfs balance start -dconvert=raid1 -mconvert=raid1 -sconvert=raid1 -f /
    The last command complains if you also try to convert the system blocks to raid1 (-sconvert=raid1), which is why the -f flag is there - it carries the ominous manpage description "force reducing of metadata integrity".  I couldn't find much information out there about this, and I want to support complete failover, so this is what I'm using and it's working ok for now.
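    To double-check that the conversion actually took, the block group profiles should all read RAID1 afterwards, and both devices should show up:
    root@serv$ btrfs fi df / # Data, Metadata and System lines should all say RAID1
    root@serv$ btrfs fi show # both drives listed, with roughly equal usage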
  18. And we're done!  Isn't it great?  Hypothetically, one of our drives can fail and we'll still be able to boot!  I think we might be screwed if the boot partition gives us trouble, but I'm not really sure yet.
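    A purely speculative footnote on that worry (untested): reinstalling grub's boot code onto the second drive's MBR should at least cover the bootloader side of a failover, though it won't do anything about the dd'ed copy of /boot going stale:
    root@serv$ grub-install /dev/sdNEWDRIVE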
As always, the Arch wiki docs are unparalleled - peruse related info here.  I hope it all worked; drop a line below if something didn't, or if something did!

Tuesday, September 9, 2014

on the mortality of SSDs

One of my servers, lil turbo, was booting from one of those bottom-of-the-barrel ADATA 32GB SSDs.  There are tons of reviews out there saying that these things are little turds, but I was feeling ballsy.  Then, one day, the server wasn't on the network any more.  I went into the closet, where lil turbo lives, to see what was the matter.

One of the non-boot drives was locked in a death grip on the sector it had been reading when it was interrupted, and fractured, seemingly non-Latin characters were bleeding all over the display.  Fuck.

Rebooted, and no dice.  Neither SSD was even seen in POST, not the boot drive and not the one I bought a year ago to mirror the boot drive with.

That was three months ago.

Last week, I decided to take a crack at reviving the comatose lil turbo.  Thinking either the SSD hot swap module or the SATA controller had died, I tried replacing both parts.  Still no dice.

I started working on something else and needed a spare 3.5" HDD to test a bus on a different server (vault 101), so I pulled one of the RAID drives from lil turbo to use.  Then, forgetting that lil turbo was missing a drive, I booted it again, and the SSDs showed up!  However, they didn't boot - the screen came up with "Missing boot drive" or some shit.

I was thinking that the hot swap enclosure must be loose, and the drive was making a connection and then losing it.  But several subsequent boots failed the same way.

Then it hit me.  I grabbed the RAID disk back from vault 101 and inserted it in lil turbo's yawning, empty bay, but not all the way.  Then I went down the front and opened all the hot swap bays for the RAID disks, nine in all, so none of them would be seen or spun up when I next booted lil turbo.

When lil turbo booted, both SSDs were seen, and once it got to grub, I slowly began closing all the RAID drive bays.  Once the system had booted, I issued an mdadm --assemble --verbose /dev/md0 /dev/sd[abcdehijk] and a mount /dev/md0 /mnt/store, and watched the drive lights flicker as my data, marooned for three months, finally came back to me.

* * *

Later I learned that the ADATA was a turd after all - the SMART log showed two critical-looking errors from around the time that the server would have crashed.

Next step: turn the root into a btrfs RAID1 and mirror it across both drives, finally!

(Edit: So I ended up trying various things live and borked the install.  Rather than fixing it or restoring from backup, I decided it was time for a fresh start.  Read about how I reinstalled lil turbo to boot from a raid1 btrfs root here.)