IBM-AUSTRIA - PC-HW-Support    30 Aug 1999

Recovery Procedures When HSP is Present at Time of Failure


The following instructions apply to the IBM SCSI-2 Fast/Wide PCI-Bus RAID Adapter and IBM Fast/Wide Streaming Adapter/A.

One DDD Drive, No OFL 

Follow the steps below to bring the DDD drive back to ONL if the following items are true:



Once you verify the conditions above through either the RAID administration log or the RAID administration utility, perform the following steps to bring the DDD drive back to HSP status.
  1.  Physically replace the hard drive in the DDD bay with a new drive of the same or greater capacity.
  2.  With a RAID-1 or RAID-5 array, the operating system is still functional at this point. Use either NetFinity or the RAID administration utility to bring the drive back to HSP status. In the RAID administration utility, open the options menu and select Replace Drive.
  3.  When you see the prompt to select the DDD drive, highlight the drive you just replaced and press Enter.
  4.  The RAID adapter issues a start unit command to the drive. Once the drive successfully spins up, the RAID adapter changes the drive's status to HSP and saves the new configuration.
  5.  If you see an 'Error in starting drive' message, reseat the cables, the hard drive, and so on to verify that they are connected properly, then go to step 2. If the error persists, go to step 1.
  6.  If the error still occurs with a known good hard drive, troubleshoot to determine the defective part, which may be a cable, the backplane, the RAID adapter, or another component. Once you have replaced the defective part so that there is a good connection between the RAID adapter and the hard drive, go to step 2.
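The decision flow in steps 1 to 6 can be summarized as a small retry loop. This is only a minimal sketch: the adapter offers no such programming interface in this procedure, and `recover_single_ddd`, `start_unit`, and `reseat` are hypothetical stand-ins for the manual actions described above (Replace Drive in the utility, reseating cables and the drive).

```python
from typing import Callable, Iterable


def recover_single_ddd(
    start_unit: Callable[[str], bool],  # hypothetical: Replace Drive + start unit; True if the drive spins up
    reseat: Callable[[str], None],      # hypothetical: reseat cables and the drive
    candidate_drives: Iterable[str],    # drives tried in the DDD bay, e.g. new drive, then a known good drive
    reseat_attempts: int = 1,
) -> str:
    """Walk steps 1-6 for one DDD bay and return the drive that reached HSP."""
    for drive in candidate_drives:               # step 1: install a drive in the DDD bay
        for attempt in range(reseat_attempts + 1):
            if start_unit(drive):                # steps 2-4: adapter marks it HSP, saves config
                return drive
            if attempt < reseat_attempts:
                reseat(drive)                    # step 5: reseat, then retry from step 2
        # step 5: error persists, so go back to step 1 with another drive
    # step 6: even a known good drive fails, so suspect cable, backplane, or adapter
    raise RuntimeError("drive will not start; troubleshoot cable, backplane, or RAID adapter")
```

For example, if a new drive fails to start even after reseating but a known good drive starts, `recover_single_ddd(start, reseat, ["new-1", "known-good"])` returns `"known-good"`.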


Two DDD Drives, No OFL

If the system has two DDD drives and a hot spare was defined before the drives failed, the system should still be up and running, provided the logical drives are configured as RAID-5 or RAID-1. If the system is still running, one of the DDD drives becomes HSP when you replace it. Perform the following steps to bring the logical drive back to ONL status. Because the operating system is functional, this procedure assumes you are using the RAID administration utility within the operating system to recover.
  1.  Physically replace both drives that are marked DDD.
  2.  Once you have replaced both drives, open the options menu of the RAID administration utility. Choose Replace Drive, highlight the first DDD drive, and press Enter. You receive a message confirming that the drive is starting. After that, one of two things happens:

     You can check which one occurs by viewing the RAID log.

  3.  Repeat step 2 for the second DDD drive.


More Than Two DDD Drives, No OFL

In this scenario, the operating system is no longer functional, so you must boot from the RAID Option Diskette to recover the array. It is extremely important to confirm that either the RAID administration utility or NetFinity Manager was running before the drives were marked defunct. If so, the utility or NetFinity Manager logged the sequence of DDD events to a log file on a diskette or on a local or network drive. View this log file on another machine to determine which drive is 'inconsistent'. Once you know which drive is 'inconsistent', you can attempt to recover the data.

Note: The previous paragraph says 'attempt to recover' because once you lose more than one drive in a set of RAID-5 or RAID-1 logical drives, data loss is a real possibility. The steps below guide you through a recovery, if one is possible.
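Why a second lost drive is so serious follows from how XOR parity works. The sketch below is a generic illustration of RAID-5-style XOR parity, not the adapter's actual on-disk layout: stripe size, parity rotation, and block size are ignored, each "block" is assumed to be a single byte, and RAID-1 mirroring is not modeled. It shows that one missing member of a stripe can be rebuilt, two cannot, and a stale member silently produces wrong data.

```python
from functools import reduce
from operator import xor
from typing import List, Optional


def parity(blocks: List[int]) -> int:
    """RAID-5 style parity: the XOR of all data blocks in a stripe."""
    return reduce(xor, blocks, 0)


def reconstruct(stripe: List[Optional[int]]) -> int:
    """Rebuild the single missing member (None) of a stripe.

    The stripe holds the data blocks plus the parity block; the XOR of
    all surviving members equals the missing one.
    """
    missing = [i for i, b in enumerate(stripe) if b is None]
    if len(missing) != 1:
        # With two or more members gone, XOR parity cannot recover the data.
        raise ValueError(f"can rebuild exactly one missing member, got {len(missing)}")
    return reduce(xor, (b for b in stripe if b is not None), 0)
```

For example, with data blocks `0x0F`, `0x33`, `0x55` the parity is `0x69`, and `reconstruct([0x0F, None, 0x55, 0x69])` returns `0x33`. If the first member is stale (say it still holds `0x00`), the same call returns `0x3C` without any error, which is the kind of silent corruption that replacing the 'inconsistent' drive last is meant to prevent.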

  1.  View the RAID log on another machine and write down the order in which the drives went defunct.
  2.  Boot from the RAID configuration diskette and select View Configuration. Make sure that the template contains the correct status information for all drives, not just those listed in the RAID log.
  3.  Using the RAID configuration utility, select Replace Drive and choose a DDD drive that is not listed in the RAID log. Repeat this step until the only DDD drives remaining are those listed in the RAID log file.

    NOTE: The drives marked DDD that are not listed in the RAID log are the last ones to go defunct. You must recover these drives first so that the information on them can be used to rebuild the original drive that failed (the 'inconsistent' drive). If you do not replace the 'inconsistent' drive last, the system uses it to rebuild the last drive that went defunct, resulting in corrupted data. Therefore, it is extremely important to perform step 3 carefully.

  4.  Select Replace Drive and then select the last drive to go defunct according to the log file. Repeat this step until you have replaced all drives in the correct order. One drive should appear as OFL and one as HSP; the rest should appear as ONL.
  5.  Select Rebuild and highlight the DDD drive.
  6.  If the rebuild completes successfully, reboot to the operating system. If it does not complete successfully, go to step 7.

     At this point, run non-destructive RAID diagnostics on each drive individually. Running them one drive at a time ensures that no more than one drive goes defunct at a time. If a drive does go DDD, physically replace that drive and run a replace/rebuild procedure. This ensures that any defective drives are removed from the system.

  7.  If the rebuild process fails, then perform these steps:

    1.  Exit to the RAID Main Menu.
    2.  Select Drive Information and view the error counts for each hard drive to determine which drive has errors.
    3.  If the errors occurred on the drive being rebuilt, physically replace this drive and select Replace. The status of the drive changes from DDD to OFL. Attempt the rebuild process again. If it completes successfully, go to step 6.

     If the drive still fails the rebuild process, verify that the drives whose data is being used for the rebuild do not have any errors. If they have no errors, you should be able to rebuild the data. Check the cable connections to the drive being rebuilt; it is also possible that you replaced a defective drive with another defective drive.

  8.  If a backup configuration is available, restore the backup configuration.
  9.  If a backup configuration is not available, write down the information you can retrieve by selecting the View Configuration option. Delete the array and manually re-create it to match this configuration information. Perform this step carefully: if you deviate in any way from the original configuration, you will lose all data.

    NOTE: Do not Initialize this logical drive.

  10.  Have all users verify their personal files to ensure their data is good. Keep in mind that some files may be corrupt due to rebuild errors.
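The ordering rule in steps 1, 3, and 4 can be written as a short function, which may help when writing down the sequence before using the configuration utility. This is a sketch under stated assumptions: drive labels such as `CH1-ID0` are hypothetical, the log is assumed to list drives with the earliest failure first, and `replace_order` only computes an order; it does not communicate with the adapter.

```python
from typing import List, Sequence


def replace_order(ddd_drives: Sequence[str], defunct_log: Sequence[str]) -> List[str]:
    """Return the order in which to replace DDD drives in software.

    ddd_drives:  every drive currently marked DDD (from View Configuration).
    defunct_log: drives in the order the RAID log recorded them going DDD,
                 earliest failure first (hypothetical input format).
    """
    unexpected = [d for d in defunct_log if d not in ddd_drives]
    if unexpected:
        raise ValueError(f"logged drives not currently marked DDD: {unexpected}")
    logged = set(defunct_log)
    # Step 3: DDD drives missing from the log failed last; replace them first.
    not_logged = [d for d in ddd_drives if d not in logged]
    # Step 4: then the logged drives, most recent failure first, so the
    # first drive to fail (the 'inconsistent' drive) is replaced last.
    return not_logged + list(reversed(defunct_log))
```

With `ddd = ["CH1-ID0", "CH1-ID1", "CH1-ID2"]` and a log of `["CH1-ID1", "CH1-ID0"]`, the function returns `["CH1-ID2", "CH1-ID0", "CH1-ID1"]`: the unlogged drive first, and the first drive to fail (`CH1-ID1`) last.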


One or More DDD Drives, and One OFL Drive 

Follow the same basic steps as those in the previous section to recover your data. A drive marked OFL is spinning but 'inconsistent' with the rest of the array. Usually a drive is marked OFL because its data is being rebuilt from the remaining drives in the array. If the server loses power, or if another drive goes DDD during a rebuild, the drive being rebuilt remains OFL. In this case, boot the machine from the RAID Configuration Diskette and follow the procedure in the previous section. Make sure that the OFL drive is the last drive to be replaced in software. The OFL drive is the 'inconsistent' drive, and it must be rebuilt.

NOTE: Data corruption occurs if the OFL drive is used to rebuild another drive.
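A planned replace sequence can be checked against this rule before any drive is replaced in software. This is a minimal sketch: the sequence is assumed to be a list of hypothetical drive labels written down from View Configuration, and `check_replace_order` is an illustrative helper, not part of the RAID utilities.

```python
from typing import Sequence


def check_replace_order(order: Sequence[str], ofl_drive: str) -> None:
    """Raise ValueError unless the OFL ('inconsistent') drive is replaced last."""
    if ofl_drive not in order:
        raise ValueError(f"{ofl_drive} is missing from the replace order")
    if order[-1] != ofl_drive:
        # Replacing the OFL drive earlier would let it feed a rebuild.
        raise ValueError(f"{ofl_drive} must be replaced last, but {order[-1]} is last")
```

For example, `check_replace_order(["CH1-ID0", "CH1-ID2", "CH1-ID1"], "CH1-ID1")` passes silently, while putting `CH1-ID1` anywhere but last raises an error.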

