volwatch man page on DigitalUNIX

volwatch(8)							   volwatch(8)

NAME
       volwatch	 -  Monitors  the  Logical  Storage  Manager (LSM) for failure
       events and performs hot sparing

SYNOPSIS
       /usr/sbin/volwatch [-m] [-s] [-o] [mail-addresses...]

OPTIONS
       -m     Runs volwatch with mail notification support to notify root (by
              default) or other specified users when a failure occurs. This
              option is enabled by default.

       -s     Runs volwatch with hot-spare support.

       -o     Specifies an argument to pass directly to volrecover if it is
              running and hot-spare support is enabled.

DESCRIPTION
       The volwatch command monitors LSM waiting for exception events to
       occur. When an exception event occurs, the volwatch command uses
       mailx(1) to send mail to:

              The root account.

              The user accounts specified when you use the rcmgr command to
              set the VOLWATCH_USERS variable in the /etc/rc.config.common
              file.

              The user account that you specify on the command line with the
              volwatch command.
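
       The way the three recipient sources combine can be pictured with a
       short sketch. It is an illustration only, not the volwatch
       implementation: the function name is hypothetical, and it assumes
       that VOLWATCH_USERS holds a whitespace-separated list of accounts.

```python
def mail_recipients(volwatch_users="", cli_addresses=()):
    """Return the ordered, de-duplicated list of accounts to notify.

    Assumption: VOLWATCH_USERS is a whitespace-separated string.
    root is always listed first, as the description above states.
    """
    recipients = ["root"]
    for name in volwatch_users.split() + list(cli_addresses):
        if name not in recipients:
            recipients.append(name)
    return recipients


if __name__ == "__main__":
    # Hypothetical account names, for illustration only.
    print(mail_recipients("alice bob", ["bob", "carol"]))
```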

       The volwatch command uses the volnotify command to wait for events to
       occur. When an event occurs, there is a 15-second delay before the
       failure is analyzed and the message is sent. This delay allows a group
       of related events to be collected and reported in a single mail
       message. By default, the volwatch command automatically starts when
       the system boots.
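
       The effect of the 15-second delay can be modeled as follows. This is
       a minimal sketch, not the volwatch code: the function name and the
       (timestamp, description) event format are assumptions. It also
       assumes that the delay is measured from the first event of each
       group.

```python
def batch_events(events, delay=15.0):
    """Group (timestamp, description) pairs into per-message batches.

    A batch opens at its first event and collects every event that
    arrives within `delay` seconds of that first event (an assumption
    about how the delay window is anchored).
    """
    batches = []
    window_end = None
    for timestamp, description in sorted(events):
        if window_end is None or timestamp > window_end:
            batches.append([])              # start a new mail message
            window_end = timestamp + delay
        batches[-1].append(description)
    return batches


if __name__ == "__main__":
    sample = [(0, "disk failed"), (5, "plex detached"), (40, "disk failed")]
    for batch in batch_events(sample):
        print(batch)
```

       With this model, events at 0 and 5 seconds are reported in one
       message, and an event at 40 seconds starts a second message.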

       You can enter the volwatch -s command to start volwatch with hot-spare
       support. Hot-spare support:

              Detects LSM events resulting from the failure of a disk, plex,
              or RAID5 subdisk.

              Sends mail to the root account (and other specified accounts)
              with notification about the failure and identifies the affected
              LSM objects.

              Determines which subdisks to relocate, finds space for those
              subdisks in the disk group, relocates the subdisks, and notifies
              the root account (and other specified accounts) of these actions
              and their success or failure.

              When a partial disk failure occurs (that is, a failure affecting
              only some subdisks on a disk), redundant data on the failed
              portion of the disk is relocated and the existing volumes
              comprised of the unaffected portions of the disk remain
              accessible.

					Note

       Hot-sparing is only performed for redundant (mirrored  or  RAID5)  sub‐
       disks on a failed disk. Non-redundant subdisks on a failed disk are not
       relocated, but you are notified of the failure.

       Only one volwatch daemon can be running on a system or cluster node  at
       any time.

       Hot-sparing does not guarantee the same layout of data or the same per‐
       formance after relocation. You may  want	 to  make  some	 configuration
       changes after hot-sparing occurs.

   Mail Notification Support
       The following is a sample mail notification when a failure is detected:

       Failures have been detected by the Logical Storage Manager:

       failed disks:

       medianame

	...

       failed plexes:

       plexname

	...

       failed log plexes:

       plexname

	...

       failing disks:

       medianame

        ...

       failed subdisks:

       subdiskname

	...

       The Logical Storage Manager will attempt to find spare disks,  relocate
       failed subdisks and then recover the data in the failed plexes.

       The following describes the sections of the mail message:

              The medianame list under failed disks specifies disks that
              appear to have completely failed.

              The medianame list under failing disks indicates a partial disk
              failure or a disk that is in the process of failing. When a disk
              has failed completely, the same medianame list appears under
              both failed disks and failing disks.

              The plexname list under failed plexes shows plexes that have
              been detached due to I/O failures experienced while attempting
              to do I/O to subdisks they contain.

              The plexname list under failed log plexes indicates RAID5 or
              dirty region log (DRL) plexes that have experienced failures.

              The subdiskname list under failed subdisks specifies subdisks in
              RAID5 volumes that have been detached due to I/O errors.
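
       If you process these notifications with a script, the sections can
       be split apart as in the following sketch. It assumes that the
       message follows the sample layout shown above, with one object name
       per line and no whitespace inside a name. The function name and the
       object names in the example are hypothetical.

```python
SECTION_HEADERS = (
    "failed disks",
    "failed plexes",
    "failed log plexes",
    "failing disks",
    "failed subdisks",
)


def parse_failure_mail(body):
    """Map each section header found in `body` to its list of names.

    Assumption: names are single tokens, one per line, as in the sample
    notification; any multi-word line ends the current section.
    """
    sections = {}
    current = None
    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and line[:-1] in SECTION_HEADERS:
            current = sections.setdefault(line[:-1], [])
        elif current is not None and len(line.split()) == 1:
            current.append(line)
        else:
            current = None  # free text, such as the closing paragraph
    return sections


if __name__ == "__main__":
    sample = """Failures have been detected by the Logical Storage Manager:

    failed disks:

    disk02

    failed plexes:

    vol01-02
    vol02-01

    The Logical Storage Manager will attempt to find spare disks, relocate
    failed subdisks and then recover the data in the failed plexes.
    """
    print(parse_failure_mail(sample))
```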

   Enabling Hot-Sparing
       By default, hot-sparing is disabled. To enable hot-sparing, enter the
       volwatch command with the -s option, for example:

       # volwatch -s

       To use hot-spare support you should configure a disk as a spare,	 which
       identifies  the	disk  as  an available site for relocating failed sub‐
       disks.  Disks that are identified as spares are	not  used  for	normal
       allocations  unless you explicitly specify otherwise. This ensures that
       there is a pool of spare disk space  available  for  relocating	failed
       subdisks and that this disk space is not consumed by normal operations.

       Spare  disk  space is the first space used to relocate failed subdisks.
       However, if no spare disk space is available or if the available	 spare
       disk space is not suitable or sufficient, free disk space is used.

       You  must  initialize  a	 spare	disk and place it in a disk group as a
       spare before it can be used for replacement purposes. If no  disks  are
       designated  as spares when a failure occurs, LSM automatically uses any
       available free disk space in  the  disk	group  in  which  the  failure
       occurs. If there is not enough spare disk space, a combination of spare
       disk space and free disk space is used.
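
       The order in which space is consumed can be sketched as follows.
       This is a simplified model, not the LSM allocator: it treats space
       as a single pool measured in blocks and ignores whether a
       particular disk is suitable. The function name is hypothetical.

```python
def plan_space(needed, spare_space, free_space):
    """Return (from_spare, from_free) in blocks, or None if space is short.

    Spare space is used first; free space in the same disk group
    covers any remainder.
    """
    from_spare = min(needed, spare_space)
    from_free = needed - from_spare
    if from_free > free_space:
        return None  # not enough space: no relocation is possible
    return from_spare, from_free


if __name__ == "__main__":
    print(plan_space(100, 60, 80))   # spare space plus some free space
    print(plan_space(100, 0, 150))   # no spares designated
    print(plan_space(100, 10, 20))   # insufficient space overall
```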

       When hot-sparing selects a disk for relocation, it preserves the redun‐
       dancy  characteristics of the LSM object to which the relocated subdisk
       belongs.	 For example, hot-sparing ensures that subdisks from a	failed
       plex  are  not  relocated  to  a disk containing a mirror of the failed
       plex. If redundancy cannot be preserved	using  available  spare	 disks
       and/or  free disk space, hot-sparing does not take place. If relocation
       is not possible, mail is sent indicating that no action was taken.
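
       The redundancy rule for a mirrored volume can be expressed as a
       filter over the candidate disks. The sketch below is illustrative
       only: the function name, the data layout, and the object names are
       assumptions, and the real selection also considers striping, RAID5
       layout, and log plexes.

```python
def eligible_disks(failed_plex, volume_plexes, candidates):
    """Return the candidate disks that hold no mirror of the failed plex.

    volume_plexes maps each plex name of the volume to the set of disk
    names that the plex occupies.
    """
    mirror_disks = set()
    for plex, disks in volume_plexes.items():
        if plex != failed_plex:
            mirror_disks |= set(disks)
    return [disk for disk in candidates if disk not in mirror_disks]


if __name__ == "__main__":
    # Hypothetical two-way mirror: vol01-01 on disk01, vol01-02 on disk02.
    plexes = {"vol01-01": {"disk01"}, "vol01-02": {"disk02"}}
    print(eligible_disks("vol01-01", plexes, ["disk02", "disk03"]))
```

       An empty result corresponds to the case in which redundancy cannot
       be preserved and hot-sparing does not take place.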

       When hot-sparing takes place, the failed subdisk is  removed  from  the
       configuration  database	and  LSM  takes precautions to ensure that the
       disk space used by the failed subdisk is	 not  recycled	as  free  disk
       space.

   Initializing and Removing Hot-Spare Disks
       Although hot-sparing does not require you to designate disks as spares,
       HP recommends that you initialize at least one disk as a spare within
       each disk group; this gives you control over which disks are used for
       relocation. If no spare disks exist, LSM uses available free disk space
       within the disk group. When free disk space is used for relocation
       purposes, performance is likely to degrade after the relocation.

       Follow these guidelines when choosing a disk to configure as a spare:

              The hot-spare feature works best if you specify at least one
              spare disk in each disk group containing mirrored or RAID5
              volumes.

              If a given disk group spans multiple controllers and has more
              than one spare disk, set up the spare disks on different
              controllers (in case one of the controllers fails).

              For a mirrored volume, the disk group must have at least one
              disk that does not already contain one of the volume's mirrors.
              This disk should either be a spare disk with some available
              space or a regular disk with some free space.

              For a mirrored and striped volume, the disk group must have at
              least one disk that does not already contain one of the volume's
              mirrors or another subdisk in the striped plex. This disk should
              either be a spare disk with some available space or a regular
              disk with some free space.

              For a RAID5 volume, the disk group must have at least one disk
              that does not already contain the volume's RAID5 plex or one of
              its log plexes. This disk should either be a spare disk with
              some available space or a regular disk with some free space.

              If a mirrored volume has a DRL log subdisk as part of its data
              plex (for example, volprint does not list the plex length as
              LOGONLY), that plex cannot be relocated. Therefore, place log
              subdisks in plexes that contain no data (log plexes). By
              default, the volassist command creates log plexes.

              For mirroring the root disk, the rootdg disk group should
              contain an empty spare disk that satisfies the restrictions for
              mirroring the root disk.

              Although it is possible to build LSM objects on spare disks, it
              is preferable to use spare disks for hot-spare only.

              When relocating subdisks off a failed disk, LSM attempts to use
              a spare disk large enough to hold all data from the failed disk.

       To initialize a disk as a spare that has no associated subdisks, use
       the voldiskadd command and enter y at the following prompt:

       Add disk as a spare disk for newdg? [y,n,q,?] (default: n) y

       To initialize an existing LSM disk as a spare disk, enter:

       # voledit set spare=on medianame

       For example, to initialize a disk called test03 as a spare disk, enter:

       # voledit set spare=on test03

       To remove a disk as a spare, enter:

       # voledit set spare=off medianame

       For example, to make a disk called test03 available for normal use,
       enter:

       # voledit set spare=off test03

   Replacement Procedure
       In the event of a disk failure, mail is sent, and if volwatch was  con‐
       figured	to  run	 with hot sparing support with the -s option, volwatch
       attempts to relocate any subdisks that  appear  to  have	 failed.  This
       involves	 finding appropriate spare disk or free disk space in the same
       disk group as the failed subdisk.

       To determine which disk from among the eligible	spare  disks  to  use,
       volwatch tries to use the disk that is closest to the failed disk.  The
       value of closeness depends on the controller, target, and  disk	number
       of  the	failed disk. For example, a disk on the same controller as the
       failed disk is closer than a disk on a  different  controller;  a  disk
       under  the  same	 target	 as the failed disk is closer than one under a
       different target.
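
       The closeness comparison can be modeled as an ordering on
       (controller, target, disk number) tuples. The following sketch is an
       illustration, not the volwatch algorithm: the function name and disk
       names are hypothetical, and the use of the disk-number difference as
       a final tie-breaker is an assumption.

```python
def closest_spare(failed, spares):
    """Return the name of the spare closest to the failed disk.

    `failed` is a (controller, target, disk) tuple; `spares` maps each
    spare disk name to a tuple of the same form.
    """
    f_ctrl, f_target, f_disk = failed

    def distance(location):
        ctrl, target, disk = location
        return (
            ctrl != f_ctrl,                          # same controller first
            (ctrl, target) != (f_ctrl, f_target),    # then same target
            abs(disk - f_disk),                      # assumed tie-breaker
        )

    return min(spares, key=lambda name: distance(spares[name]))


if __name__ == "__main__":
    spares = {
        "spareA": (2, 2, 0),   # different controller
        "spareB": (1, 3, 0),   # same controller, different target
        "spareC": (1, 2, 1),   # same controller and target
    }
    print(closest_spare((1, 2, 0), spares))
```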

       If no spare or free disk space is found, the following mail message is
       sent explaining the disposition of volumes on the failed disk:

       Relocation was not successful for subdisks on disk dm_name in volume
       v_name in disk group dg_name. No replacement was made and the disk is
       still unusable.

       The following volumes have storage on medianame:

       volumename ...

       These volumes are still usable, but the redundancy of those volumes  is
       reduced.	 Any RAID-5 volumes with storage on the failed disk may become
       unusable in the face of further failures.

       If non-RAID5 volumes are made unusable due to the failure of the disk,
       the following is included in the mail message:

       The following volumes:

       volumename ...

       have data on medianame but have no other usable mirrors on other disks.
       These volumes are now unusable and the data  on	them  is  unavailable.
       These volumes must have their data restored.

       If RAID5 volumes are made unavailable due to the disk failure, the
       following message is included in the mail message:

       The following RAID-5 volumes:

       volumename ...

       have  storage  on  medianame and have experienced other failures. These
       RAID-5 volumes are now unusable and data on them is unavailable.	 These
       RAID-5 volumes must have their data restored.

       If spare disk space is found, LSM attempts to set up a subdisk on the
       spare disk and use it to replace the failed subdisk. If this is
       successful, the volrecover command runs in the background to recover
       the contents of data in volumes on the failed disk.

       If the relocation fails, the following mail message is sent:

       Relocation was not successful for subdisks on disk dm_name in volume
       v_name in disk group dg_name. No replacement was made and the disk is
       still unusable.

       error message

       If the relocation fails after the plexes associated with the failing
       disk were detached, the following mail message is sent:

       The following plex(s) in volume v_name in diskgroup dg_name were
       targetted for replacement but the relocation did not succeed due to not
       enough spare space. The plex(s) will be left in the DETACHED state.

       detachedpl_name

       If disk dm_name is not faulty, re-attach plex(s) using the volplex att
       command. Otherwise add additional spare space and re-invoke the
       volwatch -s command to replace faulty plex(s).

       If any volumes (RAID5 or otherwise) are rendered unusable due to the
       failure, the following is included in the mail message:

       The following volumes:

       volumename ...

       have  data  on dm_name but have no other usable mirrors on other disks.
       These volumes are now unusable and the data  on	them  is  unavailable.
       These volumes must have their data restored.

       If the relocation procedure completes successfully and recovery is
       under way, the following mail message is sent:

       Volume v_name Subdisk sd_name relocated to newsd_name, but not yet
       recovered.

       Once recovery has completed, a message is sent relaying the outcome of
       the recovery procedure. If the recovery was successful, the following
       is included in the mail message:

       Recovery complete for volume v_name in disk group dg_name.

       If the recovery was not successful, the following is included in the
       mail message:

       Failure recovering v_name in disk group dg_name.

SEE ALSO
       mailx(1), rcmgr(8), voldiskadm(8), voledit(8), volintro(8),
       volrecover(8), volrootmir(8)

								   volwatch(8)