Kangry.com [insert cool graphic here]
Problems with my Raid Array, submitted by Russell Fri 18 Feb 05
Edited Sun 20 Feb 05
OK, I had a non-critical failure. I built my raid6 array with 4 drives (one missing) and mounted it. This was a replacement array, and once the new array was online I planned to shut down the first array (still in the same machine) and give its drives to the second array as second parity and a spare drive.

As can be the case in my office, I didn't get around to that change right away, and yesterday, when I prepared to make these changes, I found that the array had degraded to only two drives (no parity) several days earlier. I had not gotten an email notice about the failure.

It seems that even though the mdadm --monitor command was running, it was not watching my new array. I have explicitly added the command:
mdadm  --monitor /dev/md1 &
to my /etc/rc.local
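
A possible variation on that rc.local line, as a sketch only: mdadm's monitor mode also accepts --scan (watch the arrays from the config file and /proc/mdstat), --mail, --delay and --daemonise. The address in ADMIN_MAIL and the DRY_RUN switch are placeholders invented for this example, not something from the setup described here.

```shell
#!/bin/sh
# Sketch for /etc/rc.local. ADMIN_MAIL is a placeholder address (assumption).
ADMIN_MAIL="root@localhost"

# --scan      : watch arrays from mdadm.conf plus any others in /proc/mdstat
# --mail      : where alert mail is sent
# --delay     : polling interval in seconds
# --daemonise : detach from the terminal, so no trailing '&' is needed
MONITOR_CMD="mdadm --monitor --scan --mail=$ADMIN_MAIL --delay=300 --daemonise"

# DRY_RUN is a hypothetical safety switch: by default only print the command.
if [ "${DRY_RUN:-1}" = 0 ]; then
    $MONITOR_CMD
else
    echo "would run: $MONITOR_CMD"
fi
```

Setting DRY_RUN=0 starts the monitor for real. Recent mdadm versions also have a --test option that sends a test alert for each array at startup, which is one way to confirm that mail delivery works before a real failure.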

I was remote at the time I found the problem, so the first thing I did was to add one of the drives from the old array.

So I start watching the sync process, and about 10 minutes later the machine stops responding to me. My ssh session locks up. The server stops responding to pings. I can't do anything.

I get into the office in the morning and find the machine just locked up. No error messages on screen, just totally unresponsive. So I reboot, mount the array (/dev/md1) read-only and tell it to add a drive.
mdadm /dev/md1 -a /dev/hdc1
and it SLOWLY starts the sync process. All the drives in this array are 160 GB disks. I have timed the current process at 3 minutes per 1% completed, or about 5 hours for the whole process.
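
The estimate above is plain arithmetic and can be sketched in shell. The function name minutes_remaining is made up for this example, and the 3 minutes per 1% figure is the rate timed above.

```shell
#!/bin/sh
# minutes_remaining PERCENT_DONE MINUTES_PER_PERCENT
# Prints the estimated minutes left, assuming the rebuild rate stays constant.
minutes_remaining() {
    echo $(( (100 - $1) * $2 ))
}

full=$(minutes_remaining 0 3)
echo "full rebuild: ${full} min (~$(( full / 60 )) h)"
```

At 26% done the same call, minutes_remaining 26 3, gives 222 minutes. The rate is unlikely to stay constant once users are writing to the array, so this is only a rough guide.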

I guess that's not totally unrealistic. It took about that long to copy the files onto this array, and that only involved about 1/2 of the data. To do this, I guess the RAID process needs to read and process 300 GB of data and write a new 150~160 GB of data.

Still, I don't like the idea of leaving the office central file server in read-only mode for the first 4 to 5 hours of the business day. My choice is to change it to mount r/w. I'm not sure that is safe. (Keep in mind that I currently have no redundancy on this drive...)

9:15am
26% of the re-sync completed. I needed to change the mount mode to read/write because a user needed to save files to the server.

I hope I don't regret it.
9:52am 31% ---- still running
10:49am 50% ---- still running
11:53am 71% ---- not dead yet
12:35pm 84% ---- still not dead
[root@backuppc russell]# /sbin/mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Wed Jan 26 11:06:30 2005
     Raid Level : raid6
     Array Size : 312576512 (298.10 GiB 320.08 GB)
    Device Size : 156288256 (149.05 GiB 160.04 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Feb 18 12:53:44 2005
          State : clean, no-errors
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1


 Rebuild Status : 90% complete

    Number   Major   Minor   RaidDevice State
       0      33       65        0      active sync   /dev/hdf1
       1      34       65        1      active sync   /dev/hdh1
       2       0        0       -1      removed
       3       0        0       -1      removed
       4      22        1        2      spare   /dev/hdc1
           UUID : f8e14f75:1e6932f3:23750999:74c8702d
         Events : 0.611826
Fri Feb 18 12:53:49 EST 2005
[root@backuppc russell]#
12:53pm 90%
1:11pm 95%
1:31pm Re-sync complete...
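
Progress notes like the ones above could also be taken from /proc/mdstat instead of by hand. This is a minimal sketch, assuming the kernel reports a rebuild with a line containing "recovery = NN.N%" (or "resync = NN.N%"); the function name rebuild_percent is invented for this example.

```shell
#!/bin/sh
# rebuild_percent: read /proc/mdstat-style text on stdin and print the
# percentage from any "recovery = NN.N%" or "resync = NN.N%" line.
rebuild_percent() {
    sed -n 's/.*= *\([0-9][0-9.]*\)%.*/\1/p'
}

# Print one timestamped progress line if a rebuild is currently running.
if [ -r /proc/mdstat ]; then
    pct=$(rebuild_percent < /proc/mdstat)
    if [ -n "$pct" ]; then
        echo "$(date '+%H:%M') ${pct}%"
    fi
fi
```

Run from cron or a sleep loop, this would append a line per check. It prints nothing when no array is rebuilding.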

Now to add the 2nd parity drive.

7pm: finally got the full array online:
[root@backuppc root]# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Wed Jan 26 11:06:30 2005
     Raid Level : raid6
     Array Size : 312576512 (298.10 GiB 320.08 GB)
    Device Size : 156288256 (149.05 GiB 160.04 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 1
    Persistence : Superblock is persistent
 
    Update Time : Fri Feb 18 19:12:09 2005
          State : clean, no-errors
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
 
 
    Number   Major   Minor   RaidDevice State
       0      33       65        0      active sync   /dev/hdf1
       1      34       65        1      active sync   /dev/hdh1
       2      22        1        2      active sync   /dev/hdc1
       3       3       65        3      active sync   /dev/hdb1
       4      33        1       -1      spare   /dev/hde1
           UUID : f8e14f75:1e6932f3:23750999:74c8702d
         Events : 0.618446
Feb 20, 7am
OK, so the machine reset this morning. (I think I have a loose wire on the power supply; I will check it when I am back in the office.) But when it restarted, the array re-assembled with drives missing. I believe this is because I need to add lines to /etc/mdadm.conf. I believe I have it right now. My current /etc/mdadm.conf lines:
DEVICE /dev/hde1 /dev/hdb1  /dev/hdh1 /dev/hdf1 /dev/hdc1
ARRAY /dev/md1 devices=/dev/hdh1,/dev/hdf1,/dev/hdc1,/dev/hdb1,/dev/hde1
Since it will be 4+ hours until the array is done re-adding the second parity drive, I don't think I will test those settings today. But ideally I won't need to rebuild anything if the machine resets again.
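
An alternative worth noting, untested on this machine: mdadm.conf can also identify an array by its UUID rather than by a list of device names, and "mdadm --detail --scan" prints ARRAY lines in that form. The sketch below keeps the DEVICE line from above and uses the UUID shown in the --detail output earlier.

```
# /etc/mdadm.conf (sketch; assumes UUID matching suits this setup)
DEVICE /dev/hde1 /dev/hdb1 /dev/hdh1 /dev/hdf1 /dev/hdc1
ARRAY /dev/md1 UUID=f8e14f75:1e6932f3:23750999:74c8702d
```

With a UUID entry, any partition named on the DEVICE line whose superblock carries that UUID is a candidate member, so the ARRAY line does not need editing when a drive is swapped.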

