October 11, 2024

RAID howto

In Linux you can either just use a hardware RAID, which makes all your disks look like a single one (or two) to your OS, or do a software RAID setup when you install the OS, which can look daunting if you’re new to it.

To do this you format each drive as a “physical volume for RAID”, and then you create an MD device, which becomes the partition that sits on top of the RAID volumes you created. If you use RAID 1, that means you have a hard drive mirror, so two drives in RAID 1 give you the capacity of a single drive, but if one fails, you still have the other.
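
For example, here’s roughly what a two-drive RAID 1 create looks like from the command line once the partitions exist (just a sketch; /dev/sdb1 and /dev/sdc1 are placeholders, and the Debian installer does the equivalent of this for you during setup):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1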

RAID 5 and 6 offer more redundancy, but need more drives: at least 3 for RAID 5 and at least 4 for RAID 6. RAID 6 means 2 drives can fail and you still have your data, so it’s a little more robust than RAID 5, but it takes more drives. On the other hand, big drives are a pain to try to recover, so it might be worth it. Here’s an example of what it looks like to set up RAID 6 in Debian; I put my /var directory on it, since I plan on that holding lots of stuff, and I boot the OS off a 30G SSD, since that’s fast:
[screenshot: raid6_debian (RAID 6 setup in the Debian installer)]

Once you create your RAID volumes and put stuff on them, you use the mdadm tool to manage them. Start by getting the status like:

cat /proc/mdstat
  Personalities : [raid6] [raid5] [raid4] 
  md0 : active raid6 sdb1[1] sdc1[2] sdd1[3]
      1464883200 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/3] [_UUU]
      bitmap: 3/6 pages [12KB], 65536KB chunk
 
unused devices: <none>

This says I have a RAID 6 set up, but I know I *should* have 4 drives in my RAID volume, and this only shows 3, so I check it out with:

mdadm -D /dev/md0
  /dev/md0:
        Version : 1.2
  Creation Time : Wed Dec  2 10:43:46 2015
     Raid Level : raid6
     Array Size : 1464883200 (1397.02 GiB 1500.04 GB)
  Used Dev Size : 732441600 (698.51 GiB 750.02 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Wed Jan 20 10:45:06 2016
          State : clean, degraded 
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 512K
           Name : www:0  (local to host www)
           UUID : 574377b0:7ae2f061:ab417c11:357c8c53
         Events : 23258
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

You can tell by the list at the end that /dev/sda1 isn’t really hooked up (because it failed, then I replaced it), so now I have to re-add my /dev/sda1. Make TRIPLE SURE you are re-adding the right drive, or bad nasty things will happen :/ Once you’re sure, add it like:

mdadm --re-add /dev/md0 /dev/sda1
  mdadm: re-added /dev/sda1

Now you notice it shows up with the mdadm command like:

mdadm -D /dev/md0
  /dev/md0:
        Version : 1.2
  Creation Time : Wed Dec  2 10:43:46 2015
     Raid Level : raid6
     Array Size : 1464883200 (1397.02 GiB 1500.04 GB)
  Used Dev Size : 732441600 (698.51 GiB 750.02 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Wed Jan 20 10:49:38 2016
          State : clean, degraded, recovering 
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1
         Layout : left-symmetric
     Chunk Size : 512K
 Rebuild Status : 0% complete
           Name : www:0  (local to host www)
           UUID : 574377b0:7ae2f061:ab417c11:357c8c53
         Events : 23261
    Number   Major   Minor   RaidDevice State
       0       8        1        0      spare rebuilding   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

See, now it says /dev/sda1 is rebuilding. You can check the progress like:

cat /proc/mdstat
  Personalities : [raid6] [raid5] [raid4] 
  md0 : active raid6 sda1[0] sdb1[1] sdc1[2] sdd1[3]
      1464883200 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/3] [_UUU]
      [>....................]  recovery =  0.2% (1978368/732441600) finish=178.4min speed=68219K/sec
      bitmap: 3/6 pages [12KB], 65536KB chunk
  unused devices: <none>

That means it is 0.2% done rebuilding. This process takes hours at least, and it seems to take forever with larger drives. Keep checking back, and you should eventually see:

cat /proc/mdstat
  Personalities : [raid6] [raid5] [raid4] 
  md0 : active raid6 sda1[0] sdb1[1] sdc1[2] sdd1[3]
      1464883200 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/6 pages [0KB], 65536KB chunk

Notice how it now shows my sda1 drive as part of the RAID 6 🙂

Also, mdadm will email you (if you have a default Postfix installed and set up your email address in /etc/aliases) if one of the drives does something weird, which is a super nice feature 🙂 Here’s how you set that up:

apt-get install postfix
  Select 'internet site'
vi /etc/aliases
  root: whateveruser
  whateveruser: enter@youremail.com
newaliases
vi /etc/mdadm/mdadm.conf
  MAILADDR root <-- make sure this is uncommented
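
After changing mdadm.conf, restart the mdadm monitor so it picks up the new address. The systemd unit is usually called mdmonitor, while older setups use the mdadm init script, so adjust to whatever your release uses:

systemctl restart mdmonitor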

To test that the alerts are working (this sends a test message for each array it finds), do:

mdadm --monitor --scan --test -1

If it worked, you should get an email summary from ‘mdadm monitoring’ of /proc/mdstat like:

This is an automatically generated mail message from mdadm
running on whateverhostname
A TestMessage event had been detected on md device /dev/md/1.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
  Personalities : [raid6] [raid5] [raid4] 
  md0 : active raid6 sda1[0] sdb1[1] sdc1[2] sdd1[3]
      1464883200 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/6 pages [0KB], 65536KB chunk

If you want to manually remove a drive for some reason without breaking stuff, you can do the following. Note: MAKE TRIPLE SURE YOU’RE REMOVING THE RIGHT DRIVE:

sudo mdadm --remove /dev/md0 /dev/sdb1
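
Note that mdadm usually won’t let you pull an active member directly; you generally have to mark it failed first and then remove it, something like (same placeholder drive as above):

sudo mdadm --fail /dev/md0 /dev/sdb1
sudo mdadm --remove /dev/md0 /dev/sdb1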

Configuring a RAID6 from scratch on Debian Jessie

This is not for the beginner. You have a very high chance of not getting it right and blowing away your ENTIRE data set.

If you’re using large drives (over 2TB), you’ll have to use a different tool than fdisk to partition them. Let’s start by seeing what is on your drives. We’ll need at least 4 drives for a RAID 6; in my case I have my operating system installed on an SSD called /dev/sda, so I’m starting my RAID volumes on /dev/sdb:

apt-get install parted gdisk mdadm
parted /dev/sdb
  (parted) p                                                                
  Model: ATA ST4000DM000-1F21 (scsi)
  Disk /dev/sdb: 4.00TB
  Sector size (logical/physical): 512B/4096B
  Partition Table: gpt
  Disk Flags: 
  Number  Start  End  Size  File system  Name  Flags
 
(parted)

This means you don’t have any partition table on there, or any partitions set up. So first you have to build a GPT partition table, then build a partition to fit on it like:

mklabel gpt
  Warning: The existing disk label on /dev/sdb will be destroyed and all data on this disk will be lost. Do you want to continue?
  Yes/No? yes
(parted)
(parted) unit TB

Okay, if there are no errors, you now have a usable disk, so now we’re going to create a single large RAID partition on the drive you just formatted like:

(parted) mkpart primary 0.00TB 4.00TB
(parted) p                                                                
  Model: ATA ST4000DM000-1F21 (scsi)
  Disk /dev/sdb: 4.00TB
  Sector size (logical/physical): 512B/4096B
  Partition Table: gpt
  Disk Flags: 
  Number  Start   End     Size    File system  Name     Flags
   1      0.00GB  4001GB  4001GB               primary

Now you have to flag the partition as a RAID member like:

(parted) set 1 raid on
(parted) p
  Model: ATA ST4000DM000-1F21 (scsi)
  Disk /dev/sdb: 4001GB
  Sector size (logical/physical): 512B/4096B
  Partition Table: gpt
  Disk Flags: 
  Number  Start   End     Size    File system  Name     Flags
   1      0.00GB  4001GB  4001GB               primary  raid
(parted) quit
Information: You may need to update /etc/fstab.

Now use fdisk to see if the OS sees it:

fdisk -l
  Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
  Units: sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 4096 bytes
  I/O size (minimum/optimal): 4096 bytes / 4096 bytes
  Disklabel type: gpt
  Disk identifier: 29D2E5F9-7AFD-4DEA-8D15-9F3E3C8ED7C1
  Device     Start        End    Sectors  Size Type
  /dev/sdb1   2048 7814035455 7814033408  3.7T Linux RAID

Now you have to do the same formatting on the rest of your drives; in my case I have 5, so I’d also do /dev/sdc, sdd, sde, and sdf. Once you’re done with all that, you need mdadm, the RAID management tool, to build your RAID like:

mdadm --create --verbose /dev/md0 --level=6 --raid-devices=6 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 512K
mdadm: /dev/sdb1 appears to be part of a raid array:
       level=raid6 devices=4 ctime=Tue Jul 19 10:34:55 2016
mdadm: size set to 3906885632K
mdadm: automatically enabling write-intent bitmap on large array
Continue creating array? yes
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
>: cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid6 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
      15627542528 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  resync =  0.0% (1494792/3906885632) finish=827.3min speed=78673K/sec
      bitmap: 30/30 pages [120KB], 65536KB chunk
 
unused devices: <none>

This will take a very long time, like hours. You can format it in the meantime and use it. The array metadata lives on the drives themselves, so it should come back up the next time you reboot the machine, but it’s a good idea to record it in /etc/mdadm/mdadm.conf too. If you ever have to rebuild this array on a new machine, you can copy the /etc/mdadm/mdadm.conf file, and just run ‘mdadm --assemble --scan’.
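
To record it, append the array definition to mdadm.conf and refresh the initramfs so it assembles cleanly at boot (standard Debian practice; I use the same --detail --scan command later in this doc):

mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u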

To create the LVM physical volume on top of what you just built, use the command:

pvcreate -ff /dev/md0
  Really INITIALIZE physical volume "/dev/md0" of volume group "backupvol1" [y/n]? y
  WARNING: Forcing physical volume creation on /dev/md0 of volume group "backupvol1"
  Physical volume "/dev/md0" successfully created

Now you have to create your volume group like:

vgcreate raidvol1 /dev/md0
  Volume group "raidvol1" successfully created

Now you have to create the logical volume like:

lvcreate -l 100%FREE -n raidvol1lv raidvol1
  Logical volume "raidvol1lv" created

Now you format it like a normal filesystem:

mkfs -t ext4 /dev/raidvol1/raidvol1lv 
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 3906884608 4k blocks and 488361984 inodes
Filesystem UUID: d04aa16b-5f35-43a9-a0cd-35aa6fbfdc4f
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
        102400000, 214990848, 512000000, 550731776, 644972544, 1934917632, 
        2560000000, 3855122432
 
Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information:

Okay, now you want to mount it somewhere, so create a directory someplace and mount it:

mkdir /some/path/you/want
mount /dev/raidvol1/raidvol1lv /some/path/you/want

Now it should be ready to use 🙂 If you want it to be seen the next time you reboot, add an entry to your /etc/fstab. To do that you should use the UUID of the volume, not just the /dev/raidvol1/raidvol1lv or some such. You find out what the UUID is by doing:

blkid -c /dev/null | grep raidvol
/dev/mapper/raidvol1-raidvol1lv: UUID="d04aa16b-5f35-43a9-a0cd-35aa6fbfdc4f" TYPE="ext4"

Now put that in the end of your /etc/fstab like:

vi /etc/fstab (add following line to end of it, but change as needed)
  #big raid6 volume used for whatever
  UUID=d04aa16b-5f35-43a9-a0cd-35aa6fbfdc4f /raid6volume  ext4    errors=remount-ro 0       1
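
Before rebooting, you can sanity-check the new entry by letting mount read it straight from fstab (this assumes the /raid6volume mount point from the example line above, and that the volume isn’t already mounted somewhere else):

mkdir -p /raid6volume
mount -a
df -h /raid6volume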

Now if you reboot, you should see your raid volume, mounted wherever you wanted to mount it 🙂

RAID maintenance

Here are some commands to manage your RAID. First, here’s the status of your RAID you just created:

mdadm -D /dev/md0
  /dev/md0:
        Version : 1.2
  Creation Time : Thu Aug 25 11:37:17 2016
     Raid Level : raid6
     Array Size : 15627538432 (14903.58 GiB 16002.60 GB)
  Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent
 
  Intent Bitmap : Internal
 
    Update Time : Fri Aug 26 14:53:52 2016
          State : active 
 Active Devices : 6
 Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
 
         Layout : left-symmetric
     Chunk Size : 512K
 
           Name : debian:0  (local to host debian)
           UUID : 9b16b2c6:0d326920:377c3e6f:23b37baa
         Events : 17624
 
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1
       4       8       65        4      active sync   /dev/sde1
       5       8       81        5      active sync   /dev/sdf1

If you want to check the status of one of the disks in the RAID, do:

mdadm -E /dev/sda1
  /dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 9b16b2c6:0d326920:377c3e6f:23b37baa
           Name : debian:0  (local to host debian)
  Creation Time : Thu Aug 25 11:37:17 2016
     Raid Level : raid6
   Raid Devices : 6
 
 Avail Dev Size : 7813771264 (3725.90 GiB 4000.65 GB)
     Array Size : 15627542528 (14903.59 GiB 16002.60 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : active
    Device UUID : f28484a6:ac669b6c:61059466:6f8b45f4
 
Internal Bitmap : 8 sectors from superblock
    Update Time : Fri Aug 26 14:58:07 2016
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 468b744 - correct
         Events : 17676
 
         Layout : left-symmetric
     Chunk Size : 512K
 
   Device Role : Active device 0
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)

If you want to check the consistency of the RAID itself, called a scrub, do something like:

echo check > /sys/block/md0/md/sync_action
watch -n .1 cat /proc/mdstat
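
When the check finishes, the kernel keeps a count of any inconsistencies it found, which you can read from the standard md sysfs file (0 is what you want to see):

cat /sys/block/md0/md/mismatch_cnt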

If a disk fails and you want to remove it do:

mdadm --remove /dev/md0 /dev/sda1
mdadm: hot removed /dev/sda1 from /dev/md0

If you don’t know which physical drive failed by looking at it, you can use ledctl. It’s not always supported, but you can try:

apt install ledmon
ledctl locate=/dev/sda
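
When you’ve found the drive, you can presumably turn the blink pattern back off the same way (assuming your backplane supports it):

ledctl locate_off=/dev/sda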

If that doesn’t work, try smartmontools (the smartd daemon):

apt install smartmontools

It will generate emails telling you if something is breaking, including the serial number of the drive like:

This message was generated by the smartd daemon running on:
  host name:  whatever
  DNS domain: name.com
The following warning/error was logged by the smartd daemon:
Device: /dev/sde [SAT], Failed SMART usage Attribute: 5 Reallocated_Sector_Ct.
Device info:
Hitachi HDS723030ALA640, S/N:MK0301YHGKZ4AA, WWN:5-000cca-225c82b73, FW:MKAOA5C0, 3.00 TB
For details see hosts SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Jul 31 08:36:32 2020 PDT
Another message will be sent in 24 hours if the problem persists.
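
If smartd isn’t sending mail out of the box, a minimal /etc/smartd.conf line is usually enough. This is just a sketch, assuming you want the mail to go to root like the mdadm setup above (-M test sends a test message at startup so you know it works):

vi /etc/smartd.conf
  DEVICESCAN -a -m root -M test
systemctl restart smartmontools    (or smartd, depending on your release)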

If you want to add a working drive back in to begin the rebuild process, you have to partition it first and make SURE the partition layout matches the other drives in the group. An easy way to do that is with sgdisk, which copies the exact partition table from one of the other drives in the array, but MAKE SURE your source and destination drives are the right ones, like:

apt install gdisk
sgdisk -R /dev/sdNEWDISK /dev/sdOLDDISK    (the new disk is sdd and the old one is sdc in my case)
fdisk -l
  Disk /dev/sdd: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
  Units: sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disklabel type: gpt
  Disk identifier: 948A0ABA-23E3-425C-A0D5-578CF6A630D0
  Device         Start        End    Sectors  Size Type
  /dev/sdd1       2048  124999679  124997632 59.6G Linux RAID
  /dev/sdd2  124999680 5860532223 5735532544  2.7T Linux RAID
sgdisk -G /dev/sdd

That last command gives the new disk its own random GUIDs; otherwise it’s an exact clone of the source disk, which will confuse things.

Once you do that, you can attempt to re-add it like:

mdadm --add /dev/md0 /dev/sda1

Sometimes Debian will decide it still needs to be marked as a spare and not used. If that’s the case, you’ll have to unmount and stop the RAID and re-create it like below, making SURE you use the --assume-clean switch or it will just rebuild the whole thing, which is very bad:

mdadm --stop /dev/md0
mdadm --create /dev/md0 --level=6 --raid-devices=4 --chunk=64 --name=RackStation:2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 --assume-clean

Once you put it back in successfully, it will force mdadm to re-sync the RAID, which may take hours. You can monitor its progress by running:

watch -n .1 cat /proc/mdstat

which will give you a progress bar like:

md1 : active raid1 sdb3[2] sda3[0]
      1433512960 blocks super 1.2 [2/1] [U_]
      [===========>.........]  recovery = 56.9% (817050432/1433512960) finish=197.6min speed=51970K/sec
      bitmap: 7/11 pages [28KB], 65536KB chunk
 
md0 : active raid1 sdb2[2] sda2[0]
      31234048 blocks super 1.2 [2/2] [UU]

When it’s done, which could be a loooong time, you should see something like:

Every 0.1s: cat /proc/mdstat                                                   Thu Jan 25 14:35:40 2018
 
Personalities : [raid1]
md1 : active raid1 sdb3[2] sda3[0]
      1433512960 blocks super 1.2 [2/2] [UU]
      bitmap: 4/11 pages [16KB], 65536KB chunk
 
md0 : active raid1 sdb2[2] sda2[0]
      31234048 blocks super 1.2 [2/2] [UU]
 
unused devices: <none>

If you want to see what physical volumes you’ve created do:

pvdisplay
  --- Physical volume ---
  PV Name               /dev/md0
  VG Name               raidvol1
  PV Size               14.55 TiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              3815317
  Free PE               0
  Allocated PE          3815317
  PV UUID               k8y1t4-dNGG-krdL-mXEp-20e0-GoHn-n0hRgx

To show what volume groups you have do:

vgdisplay
  --- Volume group ---
  VG Name               raidvol1
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               14.55 TiB
  PE Size               4.00 MiB
  Total PE              3815317
  Alloc PE / Size       3815317 / 14.55 TiB
  Free  PE / Size       0 / 0   
  VG UUID               YJDj53-vowl-GwEh-yYyt-UYJU-G4jG-uKWWKN

To show your logical volumes do:

lvdisplay
  --- Logical volume ---
  LV Path                /dev/raidvol1/raidvol1lv
  LV Name                raidvol1lv
  VG Name                raidvol1
  LV UUID                NxXvcw-pPE3-z4Cp-pGaW-tSLY-MEmP-zGYDFz
  LV Write Access        read/write
  LV Creation host, time debian, 2016-08-25 14:23:05 -0700
  LV Status              NOT available
  LV Size                14.55 TiB
  Current LE             3815317
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

To activate your volume groups so they can be mounted:

vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "raidvol1" using metadata type lvm2
vgchange -ay raidvol1
  1 logical volume(s) in volume group "raidvol1" now active

Now you can mount them like:

mount /dev/raidvol1/raidvol1lv /wherever/you/want/tomount/it/
cd /wherever/you/just/mounted/it/

Now if it went well, you should have your raid up and working 🙂

: : :

There are some good RAID resources here.

General recovery stuff can be found here

RedHat (as usual) has some great stuff on it here

If you have to recover an LVM, there’s a good reference here; you can do the same thing on Debian using the Debian LiveCD. Here’s an adaptation of what it says to do:

  1. Get a live CD or USB, then boot to it rather than the host system you’re fixing.
  2. Check whether the lvm2 tools are there. If not, install them like:
    apt-get install lvm2
  3. Use fdisk to figure out which drive(s) you’ll be recovering on. Note: this is not the disk you’re booting from, so MAKE SURE you’re about to work with the right disk.
    fdisk -lu
  4. Once installed, run pvscan to scan all disks for physical volumes. This makes sure your LVM hard disk is recognized by Debian.
    pvscan
    PV /dev/sda2 VG VolGroup00 lvm2 [74.41 GB / 32.00 MB free]
    Total: 1 [74.41 GB] / in use: 1 [74.41 GB] / in no VG: 0 [0 ]
  5. After that, run vgscan to scan disks for volume groups.
    # vgscan
    Reading all physical volumes. This may take a while...
    Found volume group "VolGroup00" using metadata type lvm2
  6. Activate all volume groups available.
    vgchange -a y
    2 logical volume(s) in volume group "VolGroup00" now active
  7. Run lvscan to scan all disks for logical volumes. You can see the partitions inside the hard disk are now active.
    # lvscan
    ACTIVE '/dev/VolGroup00/LogVol00' [72.44 GB] inherit
    ACTIVE '/dev/VolGroup00/LogVol01' [1.94 GB] inherit
  8. Mount the partition to any directory you want, usually to /mnt
    # mount /dev/VolGroup00/LogVol00 /mnt
  9. You can now access the partition under /mnt and back up your data using rsync (see the example just below).
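
Step 9 mentions rsync; here’s a minimal example of copying everything off (the destination path is just a placeholder, point it at wherever your backup lives):

rsync -avh --progress /mnt/ /path/to/your/backup/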

To add a replacement partition back into an array, the --manage form works too:

mdadm --manage /dev/md0 --add /dev/sdh3
mdadm: added /dev/sdh3

Rebuilding a partially failed RAID:

mdadm --examine /dev/sd[bcd]1 >> raid.status
mdadm --stop /dev/md0
mdadm --create /dev/md0 --level=6 --raid-devices=4 --chunk=64 --name=backup1:0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 --assume-clean
mdadm: /dev/sda1 appears to be part of a raid array:
       level=raid6 devices=4 ctime=Tue Jul 19 17:34:55 2016
mdadm: /dev/sdb1 appears to be part of a raid array:
       level=raid6 devices=4 ctime=Tue Jul 19 17:34:55 2016
mdadm: /dev/sdc1 appears to be part of a raid array:
       level=raid6 devices=4 ctime=Tue Jul 19 17:34:55 2016
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
root@debian:/home/user >: cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      7812233216 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 30/30 pages [120KB], 65536KB chunk
 
unused devices: <none>

So it looks like it worked!
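
Before trusting an array re-created with --assume-clean, it’s probably worth a read-only filesystem check and a RAID consistency check. A quick sketch, using the md device from the example above and the LVM names from earlier in this doc (substitute your own):

e2fsck -fn /dev/raidvol1/raidvol1lv
echo check > /sys/block/md0/md/sync_action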

Adding another drive to a RAID6

In this case, I want to expand the capacity of an existing raid6 that HAD 4 drives by adding one more, so I will have 5 drives and more capacity. This is what I’m starting with:

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md127 : active raid6 sdd1[0] sdf1[2] sdh1[3] sde1[1]
      5850767360 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 2/22 pages [8KB], 65536KB chunk

Now I inserted a new drive, which the OS called /dev/sdg, which I formatted like:

parted /dev/sdg
GNU Parted 3.2
Using /dev/sdg
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Error: Both the primary and backup GPT tables are corrupt.  Try making a fresh table, and using Parted rescue feature to recover partitions.
Model:  ST3000DM001-1CH1 (scsi)
Disk /dev/sdg: 2996GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:
(parted) mklabel gpt
(parted) p
Model:  ST3000DM001-1CH1 (scsi)
Disk /dev/sdg: 2996GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number  Start  End  Size  File system  Name  Flags
 
(parted) unit TB
(parted) p
Model:  ST3000DM001-1CH1 (scsi)
Disk /dev/sdg: 3.00TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number  Start  End  Size  File system  Name  Flags
 
(parted) mkpart primary 0.00TB 3.00TB
(parted) p
Model:  ST3000DM001-1CH1 (scsi)
Disk /dev/sdg: 3.00TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number  Start   End     Size    File system  Name     Flags
 1      0.00TB  3.00TB  3.00TB               primary
 
(parted) set 1 raid on
(parted) quit
Information: You may need to update /etc/fstab.

Now I have to add it to the raid6 md127 like:

mdadm --add /dev/md127 /dev/sdg1
mdadm: added /dev/sdg1
mdadm --grow --raid-devices=5 /dev/md127
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md127 : active raid6 sdg1[4] sdd1[0] sdf1[2] sdh1[3] sde1[1]
      5850767360 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]
      [>....................]  reshape =  0.0% (104448/2925383680) finish=1866.8min speed=26112K/sec
      bitmap: 2/22 pages [8KB], 65536KB chunk

It will be re-syncing the data for a really long time behind the scenes now. In this example it took around 12 hours! When it’s done, it should look something like:

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md127 : active raid6 sdg1[4] sdd1[0] sdf1[2] sdh1[3] sde1[1]
      8776151040 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 2/22 pages [8KB], 65536KB chunk

Notice how the size is much larger. You may have to re-add the new, expanded raid to mdadm.conf like:

mdadm --detail --scan >> /etc/mdadm/mdadm.conf

Just make sure you delete the old entry.

Although you added the new drive, it has to finish reshaping the raid array before the new space will show up, so you HAVE TO WAIT before doing the next stuff.
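
If you’re scripting this, mdadm can block until the reshape finishes, so the LVM steps below don’t run too early:

mdadm --wait /dev/md127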

Now I have to expand the physical volume, volume group, lvm and then the ext4 that sits on top of the md127 raid6 to use this new space. First, unmount your raid:

umount /dev/mapper/raidvol1-raidvol1lv

Now use pvresize to resize the physical volume to the new raid size. Here is the starting point:

pvdisplay
 --- Physical volume ---
  PV Name               /dev/md127
  VG Name               raidvol1
  PV Size               5.45 TiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              1428409
  Free PE               0
  Allocated PE          1428409
  PV UUID               L11RdW-3Yw5-fDNL-6TzJ-wSNb-eZ4g-HV6DM5

Now I resize and measure the size again.

pvresize /dev/md127
  Physical volume "/dev/md127" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized
pvdisplay
--- Physical volume ---
  PV Name               /dev/md127
  VG Name               raidvol1
  PV Size               8.17 TiB / not usable 3.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              2142614
  Free PE               714205
  Allocated PE          1428409
  PV UUID               L11RdW-3Yw5-fDNL-6TzJ-wSNb-eZ4g-HV6DM5

Now you have to extend the logical volume. Note that the volume group (VG) was automatically expanded as part of the pvresize. So now you do:

lvextend -l +100%FREE /dev/raidvol1/raidvol1lv
  Size of logical volume raidvol1/raidvol1lv changed from 5.45 TiB (1428409 extents) to 8.17 TiB (2142614 extents).
  Logical volume raidvol1/raidvol1lv successfully resized

and now you have to resize the filesystem on top of it, but first you should run a file system check on it:

e2fsck -f /dev/raidvol1/raidvol1lv
  e2fsck 1.43.4 (31-Jan-2017)
  Pass 1: Checking inodes, blocks, and sizes
  Inode 15466499 extent tree (at level 2) could be narrower.  Fix<y>? yes
  Pass 1E: Optimizing extent trees
  Pass 2: Checking directory structure
  Pass 3: Checking directory connectivity
  Pass 4: Checking reference counts
  Pass 5: Checking group summary information
  /dev/raidvol1/raidvol1lv: ***** FILE SYSTEM WAS MODIFIED *****
  /dev/raidvol1/raidvol1lv: 14/182837248 files (0.0% non-contiguous), 13747352/1462690816 blocks
resize2fs -p /dev/raidvol1/raidvol1lv
  resize2fs 1.43.4 (31-Jan-2017)
  Resizing the filesystem on /dev/raidvol1/raidvol1lv to 2194036736 (4k) blocks.
  The filesystem on /dev/raidvol1/raidvol1lv is now 2194036736 (4k) blocks long.

Now mount it again and see if it worked!

mount /dev/raidvol1/raidvol1lv /raid6localvol/
df -h
Filesystem                       Size  Used Avail Use% Mounted on
  /dev/mapper/raidvol1-raidvol1lv  8.2T  7.5G  7.7T   1% /raid6localvol

mdadm not enough to start array – /dev/sdc2 no superblock

I had a server reboot on a power failure, and it came up with an error assembling /dev/md1 like:

mdadm --assemble --scan
mdadm: /dev/md/1 assembled from 3 drives - not enough to start the array while not clean - consider --force.
mdadm: No arrays found in config file or automatically
>: mdadm --assemble --force /dev/md1 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2 -v
mdadm: looking for devices for /dev/md1
mdadm: cannot open device /dev/sdc2/: Not a directory
mdadm: /dev/sdc2/ has no superblock - assembly aborted

Crud! I examined the drives like:

>: mdadm -E /dev/sdc
/dev/sdc:
MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
 
>: mdadm -E /dev/sdc1
/dev/sdc1:
      Magic : a92b4efc
    Version : 1.2
Feature Map : 0x0
 Array UUID : 3e7427b8:e82071a4:8224a7b7:f4dc6d0f
       Name : kvmhost4:0  (local to host kvmhost4)
 Creation Time : Fri Nov  3 12:51:44 2017
 Raid Level : raid6
 Raid Devices : 4
 Avail Dev Size : 124932096 (59.57 GiB 63.97 GB)
 Array Size : 124932096 (119.14 GiB 127.93 GB)
Data Offset : 65536 sectors
Super Offset : 8 sectors
Unused Space : before=65448 sectors, after=0 sectors
      State : clean
Device UUID : dd82716f:49b276d7:9d383a43:06cf9206
Update Time : Fri Jul 10 06:20:40 2020
Bad Block Log : 512 entries available at offset 72 sectors
   Checksum : c2291539 - correct
     Events : 37
     Layout : left-symmetric
 Chunk Size : 512K
Device Role : Active device 1
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

which looks normal. I tried to find a good superblock by doing:

>: e2fsck /dev/sdc2
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/sdc2
...
you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
or 
    e2fsck -b 32768 <device>
>: e2fsck -b 8193 /dev/sdc2
e2fsck: Bad magic number in super-block while trying to open /dev/sdc2

So that didn’t seem to work, so I tried to force an automatic re-assembly, which worked!

mdadm --assemble --scan --force
mdadm: Marking array /dev/md/1 as 'clean'
mdadm: /dev/md/1 has been started with 3 drives (out of 4).

Yay! Now it should start to rebuild the data:

>: cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md1 : active raid6 sdb2[0] sdd2[2] sdc2[1]
  5735270400 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/3] [UUU_]
  [>....................]  resync =  0.0% (765440/2867635200) finish=561.7min speed=85048K/sec
  bitmap: 4/22 pages [16KB], 65536KB chunk
 
md0 : active (auto-read-only) raid6 sdb1[0] sde1[3] sdd1[2] sdc1[1]
  124932096 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>

Now I have to wait some hours for it to rebuild, but I can mount the md1 volume and use it like normal while it’s doing that 🙂

Which drive is connected to what?

On large arrays it’s really hard to remember which device is part of which raid, so here are a few useful commands:

ls -l /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root  9 Jan  3 09:42 ata-SanDisk_SSD_PLUS_240_GB_174807801684 -> ../../sdn
lrwxrwxrwx 1 root root 10 Jan  3 09:42 ata-SanDisk_SSD_PLUS_240_GB_174807801684-part1 -> ../../sdn1
lrwxrwxrwx 1 root root 10 Jan  3 09:42 ata-SanDisk_SSD_PLUS_240_GB_174807801684-part2 -> ../../sdn2
lrwxrwxrwx 1 root root  9 Jan  3 09:42 ata-ST4000DM000-1F2168_Z305KW5Q -> ../../sdc
lrwxrwxrwx 1 root root 10 Jan  3 09:42 ata-ST4000DM000-1F2168_Z305KW5Q-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 Jan  3 09:42 ata-ST4000DM000-1F2168_Z305LWPH -> ../../sdb
lrwxrwxrwx 1 root root 10 Jan  3 09:42 ata-ST4000DM000-1F2168_Z305LWPH-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 Jan  3 09:42 ata-ST4000DM004-2CV104_ZFN0TS0J -> ../../sde
lrwxrwxrwx 1 root root 10 Jan  3 09:42 ata-ST4000DM004-2CV104_ZFN0TS0J-part1 -> ../../sde1
...
lrwxrwxrwx 1 root root 10 Jan  3 09:42 dm-name-backup1-backuplvm1 -> ../../dm-1
lrwxrwxrwx 1 root root 10 Jan  3 09:42 dm-name-raidvol1-raidvol1lv -> ../../dm-0
lrwxrwxrwx 1 root root 10 Jan  3 09:42 dm-uuid-LVM-2Uszc2WdEbSLDdppdPOFqfXqj1SSfZhy96UqrAyY2cU53Tp5qw6oAZ0wQ9QFn4It -> ../../dm-1
lrwxrwxrwx 1 root root 10 Jan  3 09:42 dm-uuid-LVM-D9ulJDJemYQq9h1ax82STFXxLwJVMcniqb2qfoB4uoSrUPu8EMizpY612skVgx3G -> ../../dm-0
lrwxrwxrwx 1 root root  9 Jan  3 09:42 lvm-pv-uuid-L11RdW-3Yw5-fDNL-6TzJ-wSNb-eZ4g-HV6DM5 -> ../../md1
lrwxrwxrwx 1 root root  9 Jan  3 09:42 lvm-pv-uuid-n13akA-a57n-9gJd-TqFI-ndob-1VTm-fzmh14 -> ../../md0
lrwxrwxrwx 1 root root  9 Jan  3 09:42 md-name-backup1:0 -> ../../md0
lrwxrwxrwx 1 root root  9 Jan  3 09:42 md-name-kvmhost5:1 -> ../../md1
lrwxrwxrwx 1 root root  9 Jan  3 09:42 md-uuid-c3ef766b:2fe9581a:5a906461:d52ee71e -> ../../md0
lrwxrwxrwx 1 root root  9 Jan  3 09:42 md-uuid-cf429c1c:a64167a7:31613648:d5d31721 -> ../../md1
lrwxrwxrwx 1 root root  9 Jan  3 09:42 scsi-1ATA_ST3000DM001-1ER166_Z500EHPA -> ../../sdk
...
lrwxrwxrwx 1 root root 10 Jan  3 10:05 scsi-1ATA_ST8000DM0004-1ZC11G_ZA2BZNWQ-part1 -> ../../sdh1
 
blkid
/dev/sdn1: UUID="4d41daf7-0ff4-4ae8-9a89-d086b94df78f" TYPE="swap" PARTUUID="5a1e499b-01"
/dev/sdn2: UUID="7a2f33f2-9de8-4fad-bd64-330a59073de8" TYPE="ext4" PARTUUID="5a1e499b-02"
/dev/sdf1: UUID="cf429c1c-a641-67a7-3161-3648d5d31721" UUID_SUB="409b6cc7-e460-6be6-603c-52f6ef8c5063" LABEL="kvmhost5:1" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="85b7659b-bb66-41b4-851c-5754dcb9368b"
/dev/sdd1: UUID="c3ef766b-2fe9-581a-5a90-6461d52ee71e" UUID_SUB="ab0504fd-478a-e4ae-636a-406fe2ed065a" LABEL="backup1:0" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="0983f155-66d3-43a4-84e4-a27b490810f6"
/dev/sda1: UUID="c3ef766b-2fe9-581a-5a90-6461d52ee71e" UUID_SUB="08f73a3d-34ee-a28e-a527-aa9edfa7f026" LABEL="backup1:0" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="2797cfd7-8070-481b-bbca-3ebde47e3b22"
/dev/sde1: UUID="c3ef766b-2fe9-581a-5a90-6461d52ee71e" UUID_SUB="5ee62bff-a9d4-9628-5897-572af65827fd" LABEL="backup1:0" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="44aa6b33-f7ee-4475-90e1-5be7a99c4ed7"
/dev/sdl1: UUID="cf429c1c-a641-67a7-3161-3648d5d31721" UUID_SUB="22dfdfad-2d2d-19fc-5902-2b638393d629" LABEL="kvmhost5:1" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="1f1e10b6-048f-4f8a-914c-29509efc71d3"
/dev/sdj1: UUID="cf429c1c-a641-67a7-3161-3648d5d31721" UUID_SUB="09d50f99-b280-b895-1562-924b82a0bb5f" LABEL="kvmhost5:1" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="b2226f13-e5a1-48e1-bdf9-8f01fb34e464"
/dev/sdk1: UUID="cf429c1c-a641-67a7-3161-3648d5d31721" UUID_SUB="7ac464cb-2455-599c-b766-4f3384f66cd8" LABEL="kvmhost5:1" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="c320dd8d-66f2-451c-8823-b839bf7eb26e"
/dev/sdi1: UUID="cf429c1c-a641-67a7-3161-3648d5d31721" UUID_SUB="7062d815-efe4-d0ef-bcb7-7ae5179f4aab" LABEL="kvmhost5:1" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="6eeed044-7f1b-4654-9c61-ef66c148f24d"
/dev/sdm1: UUID="c3ef766b-2fe9-581a-5a90-6461d52ee71e" UUID_SUB="4965c4cc-032e-eab2-a13a-5b955c6e77e8" LABEL="backup1:0" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="61a454de-8ac9-4aed-a3c7-065f52580abf"
/dev/sdg1: UUID="cf429c1c-a641-67a7-3161-3648d5d31721" UUID_SUB="a5c42986-c71f-cc0b-890c-3ef8c79ec70a" LABEL="kvmhost5:1" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="27bcb55f-8e24-4423-95c7-cc3a4457edd0"
/dev/sdc1: UUID="c3ef766b-2fe9-581a-5a90-6461d52ee71e" UUID_SUB="318ed10f-89da-8eda-6f04-53181b93dcf1" LABEL="backup1:0" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="b9ac6c4f-087c-46d7-9d32-446264a016b9"
/dev/sdb1: UUID="c3ef766b-2fe9-581a-5a90-6461d52ee71e" UUID_SUB="0844af05-ac62-057d-7b42-ea953ac9c3b8" LABEL="backup1:0" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="92e37ba2-96b7-4b2d-b6b6-da0e5965e1f7"
/dev/md1: UUID="L11RdW-3Yw5-fDNL-6TzJ-wSNb-eZ4g-HV6DM5" TYPE="LVM2_member"
/dev/md0: UUID="n13akA-a57n-9gJd-TqFI-ndob-1VTm-fzmh14" TYPE="LVM2_member"
/dev/sdh1: UUID="cf429c1c-a641-67a7-3161-3648d5d31721" UUID_SUB="c10af376-45bb-4e15-163c-1401aee31bf7" LABEL="kvmhost5:1" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="a27e6eb5-8233-4be8-a091-a55358a8e23b"
/dev/mapper/raidvol1-raidvol1lv: UUID="c02d5316-09b7-40fb-8f68-de20c30c2068" TYPE="ext4"
/dev/mapper/backup1-backuplvm1: UUID="e48af5c1-6f4c-42d1-b7e4-8714093047d9" TYPE="ext4"
 
udevadm trigger --verbose --dry-run | grep disk
/sys/devices/pci0000:00/0000:00:05.0/0000:04:00.0/host0/target0:1:1/0:1:1:0/scsi_disk/0:1:1:0
/sys/devices/pci0000:00/0000:00:05.0/0000:04:00.0/host0/target0:1:2/0:1:2:0/scsi_disk/0:1:2:0
/sys/devices/pci0000:00/0000:00:05.0/0000:04:00.0/host0/target0:1:3/0:1:3:0/scsi_disk/0:1:3:0
/sys/devices/pci0000:00/0000:00:05.0/0000:04:00.0/host0/target0:1:4/0:1:4:0/scsi_disk/0:1:4:0
/sys/devices/pci0000:00/0000:00:05.0/0000:04:00.0/host0/target0:1:5/0:1:5:0/scsi_disk/0:1:5:0
/sys/devices/pci0000:00/0000:00:05.0/0000:04:00.0/host0/target0:1:6/0:1:6:0/scsi_disk/0:1:6:0
/sys/devices/pci0000:00/0000:00:05.0/0000:04:00.0/host0/target0:1:7/0:1:7:0/scsi_disk/0:1:7:0
/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/host5/port-5:0/expander-5:0/port-5:0:0/end_device-5:0:0/target5:0:0/5:0:0:0/scsi_disk/5:0:0:0
/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/host5/port-5:0/expander-5:0/port-5:0:1/end_device-5:0:1/target5:0:1/5:0:1:0/scsi_disk/5:0:1:0
/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/host5/port-5:0/expander-5:0/port-5:0:2/end_device-5:0:2/target5:0:2/5:0:2:0/scsi_disk/5:0:2:0
/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/host5/port-5:0/expander-5:0/port-5:0:3/end_device-5:0:3/target5:0:3/5:0:3:0/scsi_disk/5:0:3:0
/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/host5/port-5:0/expander-5:0/port-5:0:4/end_device-5:0:4/target5:0:4/5:0:4:0/scsi_disk/5:0:4:0
/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/host5/port-5:0/expander-5:0/port-5:0:5/end_device-5:0:5/target5:0:5/5:0:5:0/scsi_disk/5:0:5:0
/sys/devices/pci0000:00/0000:00:1f.2/ata1/host1/target1:0:0/1:0:0:0/scsi_disk/1:0:0:0
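
Another quick way to line up kernel device names with drive serial numbers (handy when you’re matching the sticker on the physical disk) is lsblk; a small example, though the exact columns available depend on your version:

lsblk -o NAME,SIZE,SERIAL,MODEL,MOUNTPOINT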