IBM-AUSTRIA - PC-HW-Support 30 Aug 1999
Recovery Procedures When HSP is Present at Time of Failure
The following instructions apply to the IBM SCSI-2 Fast/Wide PCI-Bus RAID Adapter and IBM
Fast/Wide Streaming Adapter/A.
One DDD Drive, No OFL
Follow the steps below to bring the DDD drive back to ONL if the following items are true:
- Only one drive is marked DDD and the rest are ONL.
- The RAID logical drive status is OKY because an HSP is present in the system. Either
the HSP drive is the hard drive that went DDD or the HSP has already automatically
taken over for the DDD drive and has been rebuilt successfully.
- There are no drives with an OFL status.
Once you verify the conditions above through either the RAID administration log or the RAID
administration utility, perform the following steps to bring the DDD drive back to HSP status.
1. Physically replace the hard drive in the DDD bay with a new one of the same capacity
or greater.
2. With a RAID-1 or RAID-5 array, the operating system is still functional at this point.
Use either NetFinity or the RAID administration utility to bring the drive back to HSP
status. With the RAID administration utility, open the options menu and select
Replace Drive.
3. When you see the prompt to select the DDD drive, highlight the drive you just
replaced and press Enter.
4. The RAID adapter issues a start unit command to the drive. Once the drive
successfully spins up, the RAID adapter changes the drive's status to HSP and saves
the new configuration.
5. If you see an 'Error in starting drive' message, reseat the cables, the hard drive, etc., to
verify that they are connected properly, then go to step 2. If the error persists, go to step 1.
6. If the error still occurs with a known good hard drive, then troubleshoot to determine
the defective part, which may be a cable, back plane, RAID adapter, etc. Once you
have replaced the defective part so that there is a good connection between the RAID
adapter and the hard drive, go to step 2.
Two DDD Drives, No OFL
If the system has two DDD drives, and a defined hot spare existed prior to the drive failures, then
the system should still be up and running as long as the logical drives are configured as RAID-5
or RAID-1. If the system is still running, then one of the DDD drives becomes HSP when you
replace it. Because the operating system is functional, this procedure assumes you are using the
RAID administration utility within the operating system to recover. Perform the following steps
to bring the logical drive back to ONL status:
1. Physically replace both drives that are marked DDD.
2. Once you replace both drives, select the options menu of the RAID administration
utility. Choose Replace Drive, highlight the first DDD drive, and press Enter. You
receive a message confirming that the drive is starting. After that, one of two things
happens:
- The drive starts the rebuild process. When the rebuild completes, the drive changes to
ONL.
-OR-
- The drive becomes HSP. This happens if the actual hot-spare drive that was
previously defined is defective, or a different drive was marked DDD and the
hot spare successfully rebuilt the data before the second drive went down.
You can check which one occurred by viewing the RAID log.
3. Repeat step 2 for the second DDD drive.
More Than Two DDD Drives, No OFL
In this scenario, the operating system is no longer functional. Therefore, you must boot to the
RAID Option Diskette to recover the array. It is extremely important to confirm that either the
RAID administration utility or NetFinity Manager has been running prior to the drives being
marked defunct. If so, the utility or NetFinity Manager has logged the sequence of DDD events to
a log file either on a diskette or on a local or network drive. You can view this log file
on another machine to determine the 'inconsistent' drive. When you know which drive is
'inconsistent', you can attempt to recover data.
Note: The previous paragraph states 'attempt to recover' because once you lose more than one
drive in a set of RAID-5 or RAID-1 logical drives, loss of data is definitely a possibility. The
steps below guide you through a recovery, if at all possible.
1. View the RAID log on another machine and write down the order in which the drives
went defunct.
2. Boot to the RAID configuration diskette, and select View Configuration. Make sure
that the template contains the correct information for the status of all drives, not just
those listed in the RAID log.
3. Using the RAID configuration utility, select Replace Drive and choose a DDD drive
that is not listed in the RAID log. Repeat this step until the only DDD drives
remaining are those indicated in the RAID log file.
NOTE: The drives marked DDD that are not listed in the RAID log are the last ones to
go defunct. You must recover these drives first so that the information from them can
be used to rebuild the original drive that failed (the 'inconsistent' drive). If you do not
replace the 'inconsistent' drive last, then the system uses it to rebuild the last drive
that went defunct, resulting in corrupted data. Therefore, it is extremely important to
perform step 3 carefully.
4. Select Replace Drive and then select the last drive to go defunct according to the log
file. Repeat this step until you have replaced all drives in the correct order. One of the
drives should appear as OFL and one should appear as HSP; the rest appear as ONL.
5. Select Rebuild and highlight the DDD drive.
6. If the rebuild completes successfully, reboot to the operating system. If it does not
complete successfully, go to step 7.
At this point, run non-destructive RAID diagnostics individually on each drive. Run
these diagnostics individually to ensure that you do not get more than one drive that
goes defunct at a time. If a drive does go DDD, physically replace that drive and run a
replace/rebuild procedure. This verifies that you remove all defective drives from the
system, if any exist.
7. If the rebuild process fails, then perform these steps:
- Exit to the RAID Main Menu.
- Select Drive Information and view the error counts for each of the hard
drives to determine which drive has errors.
- If the errors occurred on the drive being rebuilt, then physically replace this
drive. Select Replace. The status of the drive changes from DDD to OFL.
Attempt the rebuild process again. If it completes successfully, go to step 6.
If the drive still fails the rebuild process, then verify that the drives used as the
source for the rebuild do not have any errors. If they have no errors, then you should
be able to rebuild the data. Check cable connections to the drive being rebuilt; it is
possible that you replaced a defective drive with another defective drive.
- If a backup configuration is available, restore the backup configuration.
- If a backup configuration is not available, write down the information you can
retrieve by selecting the View Configuration option. Delete the array and
manually create it to match this configuration information. Perform this step
carefully, because if you deviate in any way from the original configuration,
you will lose all data.
NOTE: Do not Initialize this logical drive.
- Have all users verify their personal files to ensure their data is good. Keep in
mind that some files may be corrupt due to rebuild errors.
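The ordering rule in this procedure (drives missing from the RAID log first, then logged drives from last to fail back to first to fail, so that the 'inconsistent' drive comes last) can be sketched in Python. This is only an illustration for planning the order on paper. The function name, the bay labels, and the idea of typing the log order in by hand are assumptions; neither the RAID utility nor NetFinity provides such a function.

```python
def software_replace_order(ddd_drives, log_order):
    """Return the order in which to select Replace Drive for defunct drives.

    ddd_drives: labels of all drives currently marked DDD (hypothetical labels).
    log_order:  labels copied by hand from the RAID log, earliest failure first.
    """
    # Keep only logged drives that are actually marked DDD right now.
    logged = [d for d in log_order if d in ddd_drives]
    # Drives marked DDD but absent from the log were the last to go defunct,
    # so they are replaced first.
    unlogged = [d for d in ddd_drives if d not in logged]
    # Logged drives follow from last failure to first failure, which leaves
    # the first drive that failed (the 'inconsistent' drive) for the end.
    return unlogged + logged[::-1]


if __name__ == "__main__":
    order = software_replace_order(
        ddd_drives=["bay3", "bay1", "bay5"],
        log_order=["bay1", "bay3"],
    )
    print(order)
```

In this example bay5 is not in the log, so it is replaced first. bay1 failed first according to the log, so it is treated as the 'inconsistent' drive and replaced last.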
One or More DDD Drives, and One OFL Drive
Follow the same basic steps as those listed in the above section to recover your data. When a
drive is marked OFL, that means that it is spinning but 'inconsistent' with the rest of the array.
Usually when a drive is marked OFL, the data on it is being rebuilt from the remaining drives in
the array. If the server loses power, or if another drive goes DDD during a rebuild, then the drive
being rebuilt remains OFL. In this case, you have to boot the machine to the RAID Configuration
Diskette and then follow the procedure in the previous section. Make sure that the OFL drive is
the last drive to be software replaced. The offline drive is the 'inconsistent' drive, and it requires
a rebuilding process.
NOTE: Data corruption occurs if the OFL drive is used to rebuild another drive.
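As a rough aid for choosing among the four scenarios in this document, the selection can be sketched in Python from the physical drive status codes (ONL, DDD, OFL, HSP). The function name and the returned labels are hypothetical shorthand, and the sketch assumes that a hot spare was defined at the time of failure, as this document does throughout. Combinations the document does not describe are reported as not covered instead of being guessed.

```python
from collections import Counter


def pick_procedure(drive_statuses):
    """Map physical drive status codes to a recovery scenario label.

    drive_statuses: iterable of codes such as "ONL", "DDD", "OFL", "HSP".
    The returned labels are illustrative only, not output of any IBM utility.
    """
    counts = Counter(drive_statuses)
    ddd, ofl = counts["DDD"], counts["OFL"]
    if ofl == 1 and ddd >= 1:
        # Replace the OFL drive last; it is the 'inconsistent' drive.
        return "ddd-with-one-ofl"
    if ofl > 0:
        # Other OFL combinations are not described in this document.
        return "not-covered"
    if ddd == 0:
        return "no-recovery-needed"
    if ddd == 1:
        return "one-ddd"
    if ddd == 2:
        return "two-ddd"
    return "more-than-two-ddd"


if __name__ == "__main__":
    print(pick_procedure(["ONL", "ONL", "DDD", "HSP"]))
    print(pick_procedure(["DDD", "OFL", "ONL", "ONL"]))
```

The check for an OFL drive comes first because the presence of an OFL drive changes which drive must be replaced last, regardless of how many drives are marked DDD.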