fenced(8)							     fenced(8)

NAME
       fenced - the I/O Fencing daemon

SYNOPSIS
       fenced [OPTION]...

DESCRIPTION
       The  fencing  daemon,  fenced,  fences  cluster nodes that have failed.
       Fencing a node generally means rebooting it or otherwise preventing  it
       from  writing  to  storage,  e.g.  disabling  its port on a SAN switch.
       Fencing involves interacting with a hardware device, e.g. network power
       switch,	SAN switch, storage array.  Different "fencing agents" are run
       by fenced to interact with various hardware devices.

       Software related to sharing storage among nodes in a cluster, e.g. GFS,
       usually	requires fencing to be configured to prevent corruption of the
       storage in the presence of node failure and  recovery.	GFS  will  not
       allow  a	 node  to  mount  a GFS file system unless the node is running
       fenced.	Fencing happens in the context of a cman/openais  cluster.   A
       node must be a cluster member before it can run fenced.

       Once started, fenced waits for the 'fence_tool join' command to be run,
       telling it to join the fence domain: a group of nodes  managed  by  the
       openais/cpg/groupd  cluster  infrastructure.   In most cases, all nodes
       will join the fence domain after joining the cluster.
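
       For example, a minimal startup sequence on each node might look
       like the sketch below.  The cluster start command shown is only
       an assumption (it varies by distribution); 'fence_tool join' is
       the command described above.

         service cman start    # become a cluster member first (assumed command)
         fence_tool join       # then join the fence domain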

       Fence domain members are aware of the membership of the group, and  are
       notified when nodes join or leave.  If a fence domain member fails, one
       of the remaining members will fence it.	If the cluster has  lost  quo‐
       rum,  fencing  won't occur until quorum has been regained.  If a failed
       node is reset and rejoins the cluster before the remaining domain  mem‐
       bers have fenced it, the fencing will be bypassed.

   Node failure
       When a domain member fails, fenced runs an agent to fence it.  The spe‐
       cific agent to run and the parameters the agent requires are  all  read
       from  the cluster.conf file (using libccs) at the time of fencing.  The
       fencing operation against a failed  node	 is  not  considered  complete
       until  the  exec'ed agent exits.	 The exit value of the agent indicates
       the success or failure of the  operation.   If  the  operation  failed,
       fenced  will  retry  (possibly with a different agent, depending on the
       configuration) until fencing succeeds.  Other systems such as  DLM  and
       GFS  will  not  begin their own recovery for a failed node until fenced
       has successfully completed fencing it.  So, a delay or problem in fenc‐
       ing  will result in other systems like DLM/GFS being blocked.  Informa‐
       tion about fencing operations will appear in syslog.
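
       For example, the results of fencing operations can be found by
       searching the system log.  The log file path below is only an
       assumption; it depends on the local syslog configuration.

         grep fenced /var/log/messages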

       When a domain member fails, the actual fencing operation can be delayed
       by  a  configurable  number of seconds (cluster.conf:post_fail_delay or
       -f).  Within this time, the failed node could be reset and  rejoin  the
       cluster	to avoid being fenced.	This delay is 0 by default to minimize
       the time that other systems are blocked (see above).
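
       As a sketch, a 30 second grace period after a member failure
       could be set either on the command line or in cluster.conf (the
       value 30 is only an example):

         fenced -f 30

         <fence_daemon post_fail_delay="30"/>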

   Domain startup
       When the domain is first created in the cluster (by the first  node  to
       join  it)  and subsequently enabled (by the cluster gaining quorum) any
       nodes listed in cluster.conf that are not presently members of the cman
       cluster are fenced.  The status of these nodes is unknown, and to be on
       the safe side they are assumed to be in need of fencing.  This
       startup fencing can be disabled, but it's only truly safe to do so if
       an operator is present to verify that no cluster nodes are in  need  of
       fencing.

       This  example  illustrates  why	startup	 fencing is important.	Take a
       three node cluster with nodes A, B and C;  all  three  have  a  GFS  fs
       mounted.	  All  three nodes experience a low-level kernel hang at about
       the same time.  A watchdog triggers a reboot on nodes A and B, but  not
       C.  A and B boot back up, form the cluster again, gain quorum, join the
       fence domain, *don't* fence node C which is still  hung	and  unrespon‐
       sive,  and  mount the GFS fs again.  If C were to come back to life, it
       could corrupt the fs.  So, A and B need to fence C when they reform the
       fence  domain  since  they  don't know the state of C.  If C *had* been
       reset by a watchdog like A and B, but was just slow in rebooting,  then
       A and B might be fencing C unnecessarily when they do startup fencing.

       The  first  way	to  avoid fencing nodes unnecessarily on startup is to
       ensure that all nodes have joined the cluster before any of  the	 nodes
       start the fence daemon.	This method is difficult to automate.

       A  second  way to avoid fencing nodes unnecessarily on startup is using
       the cluster.conf:post_join_delay setting (or -j option).	 This  is  the
       number of seconds fenced will delay before actually fencing any victims
       after nodes join the domain.  This delay gives  nodes  that  have  been
       tagged for fencing a chance to join the cluster and avoid being fenced.
       A delay of -1 here will cause the daemon to wait indefinitely  for  all
       nodes  to  join	the  cluster  and  no nodes will actually be fenced on
       startup.
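
       For example, to wait indefinitely at startup for all nodes to
       join before any victims are fenced:

         <fence_daemon post_join_delay="-1"/>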

       To disable fencing at domain-creation time entirely, the -c option  can
       be  used	 to  declare  that  all	 nodes are in a clean or safe state to
       start.  The clean_start cluster.conf option can also be set to do this,
       but  automatically  disabling  startup fencing in cluster.conf can risk
       file system corruption.
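
       As a sketch, startup fencing can be disabled either on the
       command line or in cluster.conf, subject to the safety caveat
       above:

         fenced -c

         <fence_daemon clean_start="1"/>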

       Avoiding unnecessary fencing at startup is  primarily  a	 concern  when
       nodes  are  fenced  by power cycling.  If nodes are fenced by disabling
       their SAN access, then unnecessarily fencing a  node  is	 usually  less
       disruptive.

   Fencing override
       If  a  fencing  device fails, the agent may repeatedly return errors as
       fenced tries to fence a failed node.  In this case, the admin can manu‐
       ally  reset  the	 failed	 node,	and  then use fence_ack_manual to tell
       fenced to continue without fencing the node.
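
       For example, after manually resetting the failed node, an admin
       might run the command below.  The -n flag reflects the
       historical syntax and may differ between versions; see
       fence_ack_manual(8) for the exact form.

         fence_ack_manual -n node2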

CONFIGURATION FILE
       Fencing daemon behavior can be controlled by  setting  options  in  the
       cluster.conf  file  under  the  section <fence_daemon> </fence_daemon>.
       See above for complete descriptions of these values.  The delay	values
       are in seconds; -1 secs means an unlimited delay.  The values shown
       are the defaults.

       Post-join delay is the number of seconds the daemon  will  wait	before
       fencing any victims after a node joins the domain.

	 <fence_daemon post_join_delay="6"/>

       Post-fail  delay	 is  the number of seconds the daemon will wait before
       fencing any victims after a domain member fails.

	 <fence_daemon post_fail_delay="0"/>

       Clean-start is used to prevent any startup fencing the daemon might do.
       It  indicates  that  the	 daemon should assume all nodes are in a clean
       state to start.

	 <fence_daemon clean_start="0"/>

       Override-path is the location of a FIFO used for communication  between
       fenced and fence_ack_manual.

	 <fence_daemon override_path="/var/run/cluster/fenced_override"/>
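
       These attributes can also be combined on a single fence_daemon
       line; the following is equivalent to the defaults shown above.

         <fence_daemon post_join_delay="6" post_fail_delay="0"
                       clean_start="0"
                       override_path="/var/run/cluster/fenced_override"/>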

   Per-node fencing settings
       The  per-node  fencing  configuration can become complex and is largely
       specific to the hardware being used.  The general framework begins like
       this:

	 <clusternodes>

	 <clusternode name="node1" nodeid="1">
		 <fence>
		 </fence>
	 </clusternode>

	 <clusternode name="node2" nodeid="2">
		 <fence>
		 </fence>
	 </clusternode>

	 ...
	 </clusternodes>

       The  simple fragment above is a valid configuration: there is no way to
       fence these nodes.  If one of these nodes is in the  fence  domain  and
       fails,  fenced  will  repeatedly fail in its attempts to fence it.  The
       admin will need	to  manually  reset  the  failed  node	and  then  use
       fence_ack_manual	 to tell fenced to continue on without fencing it (see
       override above).

       There is typically a single method used to fence each  node  (the  name
       given to the method is not significant).	 A method refers to a specific
       device listed in the separate <fencedevices> section,  and  then	 lists
       any node-specific parameters related to using the device.

	 <clusternodes>

	 <clusternode name="node1" nodeid="1">
		 <fence>
		    <method name="single">
		       <device name="myswitch" hw-specific-param="x"/>
		    </method>
		 </fence>
	 </clusternode>

	 <clusternode name="node2" nodeid="2">
		 <fence>
		    <method name="single">
		       <device name="myswitch" hw-specific-param="y"/>
		    </method>
		 </fence>
	 </clusternode>

	 ...
	 </clusternodes>

   Fence device settings
       This  section  defines  properties  of the devices used to fence nodes.
       There may be one or more devices listed.	 The per-node fencing sections
       above reference one of these fence devices by name.

	 <fencedevices>
		 <fencedevice name="myswitch" ipaddr="1.2.3.4" .../>
	 </fencedevices>
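
       A more complete entry typically names the fencing agent to run
       and the parameters it needs.  The attributes below (agent,
       login, passwd) are typical for a power switch agent but are
       shown only as an assumption; consult the documentation for your
       device at the URL in the Hardware-specific settings section.

         <fencedevices>
                 <fencedevice name="myswitch" agent="fence_apc"
                              ipaddr="1.2.3.4" login="admin" passwd="password"/>
         </fencedevices>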

   Multiple methods for a node
       In  more	 advanced  configurations,  multiple  fencing  methods	can be
       defined for a node.  If fencing fails using the	first  method,	fenced
       will  try  the next method, and continue to cycle through methods until
       one succeeds.

	 <clusternode name="node1" nodeid="1">
		 <fence>
		    <method name="first">
		       <device name="powerswitch" hw-specific-param="x"/>
		    </method>

		    <method name="second">
		       <device name="storageswitch" hw-specific-param="1"/>
		    </method>
		 </fence>
	 </clusternode>

   Dual path, redundant power
       Sometimes fencing a node requires disabling two power ports or two  i/o
       paths.  This is done by specifying two or more devices within a method.

	 <clusternode name="node1" nodeid="1">
		 <fence>
		    <method name="single">
		       <device name="sanswitch1" hw-specific-param="x"/>
		       <device name="sanswitch2" hw-specific-param="x"/>
		    </method>
		 </fence>
	 </clusternode>

       When  using power switches to fence nodes with dual power supplies, the
       agents must be told to turn off both power ports before restoring power
       to either port.  The default off-on behavior of the agent could result
       in power to the node never being fully removed.

	 <clusternode name="node1" nodeid="1">
		 <fence>
		    <method name="single">
		       <device name="nps1" hw-param="x" action="off"/>
		       <device name="nps2" hw-param="x" action="off"/>
		       <device name="nps1" hw-param="x" action="on"/>
		       <device name="nps2" hw-param="x" action="on"/>
		    </method>
		 </fence>
	 </clusternode>

   Hardware-specific settings
       Find   documentation   for   configuring	  specific   devices	at
       http://sources.redhat.com/cluster/

OPTIONS
       Command line options override corresponding values in cluster.conf.

       -j secs
	      Post-join fencing delay

       -f secs
	      Post-fail fencing delay

       -c     All nodes are in a clean state to start.

       -O path
              Path of the override FIFO.

       -D     Enable debugging code and don't fork into the background.

       -V     Print the version information and exit.

       -h     Print  out  a  help  message  describing available options, then
	      exit.
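
       For example, to run the daemon in the foreground with debugging
       enabled and a 10 second post-join delay (the values are
       arbitrary):

         fenced -D -j 10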

DEBUGGING
       The fenced daemon keeps a circular buffer of debug messages that can be
       dumped with the 'fence_tool dump' command.
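
       For example:

         fence_tool dump | less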

SEE ALSO
       fence_tool(8), cman(8), groupd(8), group_tool(8)

								     fenced(8)