crash man page on UNIXv7

CRASH(8)							      CRASH(8)

NAME
       crash - what to do when the system crashes

DESCRIPTION
       This  section  gives  at	 least a few clues about how to proceed if the
       system crashes.	It can't pretend to be complete.

       Bringing it back up.  If the reason for the crash is not	 evident  (see
       below for guidance on `evident') you may want to try to dump the system
       if you feel up to debugging.  At the moment a dump can be taken only on
       magtape.	 With a tape mounted and ready, stop the machine, load address
       44, and start.  This should write a copy of all of  core	 on  the  tape
       with an EOF mark.  Caution: Any error is taken to mean the end of core
       has been reached.  This means you must be sure the write ring is in,
       the tape is ready, and the tape is clean and new.  If the dump fails, you
       can try again, but some of the registers will be lost.  See  below  for
       what to do with the tape.

       In  restarting  after  a crash, always bring up the system single-user.
       This is accomplished by following the directions in boot(8) as modified
       for  your particular installation; a single-user system is indicated by
       having a particular value in the switches (173030 unless you've changed
       init)  as  the  system starts executing.	 When it is running, perform a
       dcheck(1) and icheck(1) on all file systems that could have been in use
       at  the	time  of  the  crash.  If any serious file system problems are
       found, they should be repaired.	When you are satisfied with the health
       of your disks, check and set the date if necessary, then come up multi-
       user.  This is most easily accomplished	by  changing  the  single-user
       value  in the switches to something else, then logging out by typing an
       EOT.

       To even boot UNIX at all, three files (and the directories  leading  to
       them) must be intact.  First, the initialization program /etc/init must
       be present and executable.  If it is not, the CPU  will	loop  in  user
       mode  at location 6.  For init to work correctly, /dev/tty8 and /bin/sh
       must be present.	 If  either  does  not	exist,	the  symptom  is  best
       described  as  thrashing.  Init will go into a fork/exec loop trying to
       create a Shell with proper standard input and output.

       If you cannot get the  system  to  boot,	 a  runnable  system  must  be
       obtained	 from  a backup medium.	 The root file system may then be doc‐
       tored as a mounted file system as described below.  If  there  are  any
       problems	 with  the root file system, it is probably prudent to go to a
       backup system to avoid working on a mounted file system.

       Repairing disks.	 The first rule to keep in mind is that an addled disk
       should be treated gently; it shouldn't be mounted unless necessary, and
       if it is very valuable yet in quite bad shape,  perhaps	it  should  be
       dumped  before  trying surgery on it.  This is an area where experience
       and informed courage count for much.

       The problems reported by icheck typically fall into two	kinds.	 There
       can  be	problems  with	the free list: duplicates in the free list, or
       free blocks also in files.  These can be cured easily  with  an	icheck
       -s.   If the same block appears in more than one file or if a file con‐
       tains bad blocks, the files should be deleted, and the free list recon‐
       structed.   The	best way to delete such a file is to use clri(1), then
       remove its directory entries.  If any of the affected files  is	really
       precious, you can try to copy it to another device first.

       Dcheck  may  report files which have more directory entries than links.
       Such situations are potentially dangerous;  clri	 discusses  a  special
       case  of the problem.  All the directory entries for the file should be
       removed.	 If on the other hand there  are  more	links  than  directory
       entries,	 there	is  no	danger of spreading infection, but merely some
       disk space that is lost for use.	 It is sufficient to copy the file (if
       it has any entries and is useful) then use clri on its inode and remove
       any directory entries that do exist.

       Finally, there may be inodes reported by dcheck that have 0 links and 0
       entries.	  These	 occur	on  the root device when the system is stopped
       with pipes open, and on other file systems when the system  stops  with
       files  that  have  been deleted while still open.  A clri will free the
       inode, and an icheck -s will recover any missing blocks.
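
       The three dcheck cases above can be summarised as a small decision
       table.  The sketch below is only an illustration: the function name
       and the idea of passing the two counts as integers are assumptions,
       not part of dcheck itself, and the returned strings merely restate
       the advice given in the text.

```python
def dcheck_advice(entries: int, links: int) -> str:
    """Map a dcheck report (directory entries vs. link count) to the
    remedy described in the text.  Hypothetical helper, not a real tool."""
    if entries == 0 and links == 0:
        # Left over from open pipes or files deleted while still open.
        return "clri the inode, then run icheck -s to recover missing blocks"
    if entries > links:
        # Potentially dangerous: more names than the inode admits to.
        return "remove all directory entries for the file"
    if links > entries:
        # Harmless, but the disk space is lost until the inode is cleared.
        return ("copy the file if it is useful, clri its inode, "
                "and remove any directory entries that exist")
    return "consistent; nothing to do"


print(dcheck_advice(entries=2, links=1))
# prints: remove all directory entries for the file
```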

       Why did it crash?  UNIX types a message on the console typewriter  when
       it  voluntarily	crashes.   Here	 is the current list of such messages,
       with enough information to provide a hope at least of the remedy.   The
       message has the form `panic: ...', possibly accompanied by other infor‐
       mation.	Left unstated in all cases is the possibility that hardware or
       software error produced the message in some unexpected way.

       blkdev
	    The	 getblk	 routine was called with a nonexistent major device as
	    argument.  Definitely hardware or software error.

       devtab
	    Null device table entry for the major device used as  argument  to
	    getblk.  Definitely hardware or software error.

       iinit
	    An I/O error reading the super-block for the root file system dur‐
	    ing initialization.

       out of inodes
	    A mounted file system has no more i-nodes when creating a file.
	    Sorry, the message does not name the device; icheck should tell you.

       no fs
	    A  device  has  disappeared	 from the mounted-device table.	 Defi‐
	    nitely hardware or software error.

       no imt
	    Like `no fs', but produced elsewhere.

       no inodes
	    The in-core	 inode	table  is  full.   Try	increasing  NINODE  in
	    param.h.  Shouldn't be a panic, just a user error.

       no clock
	    During initialization, neither the line nor programmable clock was
	    found to exist.

       swap error
	    An unrecoverable I/O error during a swap.  Really shouldn't	 be  a
	    panic, but it is hard to fix.

       unlink - iget
	    The	 directory  containing	a  file	 being deleted can't be found.
	    Hardware or software.

       out of swap space
	    A program needs to be swapped out,	and  there  is	no  more  swap
	    space.  It has to be increased.  This really shouldn't be a panic,
	    but there is no easy fix.

       out of text
	    A pure procedure program is being executed, and the table for such
	    things is full.  This shouldn't be a panic.

       trap
	    An unexpected trap has occurred within the system.	This is accom‐
	    panied by three numbers: a `ka6', which is	the  contents  of  the
	    segmentation  register for the area in which the system's stack is
	    kept; `aps', which is the location where the hardware  stored  the
	    program  status  word  during  the	trap;  and a `trap type' which
	    encodes which trap occurred.  The trap types are:

	    0	    bus error
	    1	    illegal instruction
	    2	    BPT/trace
	    3	    IOT
	    4	    power fail
	    5	    EMT
	    6	    recursive system call (TRAP instruction)
	    7	    11/70 cache parity, or programmed interrupt
	    10	    floating point trap
	    11	    segmentation violation

       In some of these cases it is possible for octal 20 to be added into the
       trap  type; this indicates that the processor was in user mode when the
       trap occurred.  If you wish to examine the stack	 after	such  a	 trap,
       either  dump  the  system, or use the console switches to examine core;
       the required address mapping is described below.
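
       As a worked illustration of the trap type encoding, the sketch below
       separates the octal 20 user-mode flag from the trap number and looks
       the number up in the table above.  It assumes the trap type is read
       off the console as an octal number, as the table lists it; the
       function and constant names are hypothetical.

```python
# Trap types as listed in the table above (values are octal).
TRAP_TYPES = {
    0o0: "bus error",
    0o1: "illegal instruction",
    0o2: "BPT/trace",
    0o3: "IOT",
    0o4: "power fail",
    0o5: "EMT",
    0o6: "recursive system call (TRAP instruction)",
    0o7: "11/70 cache parity, or programmed interrupt",
    0o10: "floating point trap",
    0o11: "segmentation violation",
}
USER_MODE_FLAG = 0o20  # added in when the processor was in user mode


def decode_trap_type(printed: str):
    """Return (description, in_user_mode) for a trap type given as octal text."""
    value = int(printed, 8)
    in_user_mode = bool(value & USER_MODE_FLAG)
    description = TRAP_TYPES.get(value & ~USER_MODE_FLAG, "unknown trap type")
    return description, in_user_mode


print(decode_trap_type("31"))
# prints: ('segmentation violation', True)
```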

       Interpreting dumps.  All file system problems should be taken  care  of
       before  attempting  to look at dumps.  The dump should be read into the
       file /usr/sys/core; cp(1) will do.  At this point, you  should  execute
       ps  -alxk  and who to print the process table and the users who were on
       at the time of the crash.  You should dump (with od(1)) the first 30 bytes
       of /usr/sys/core.  Starting at location 4, the registers R0, R1, R2,
       R3, R4, R5, SP and KDSA6 (KISA6 for 11/40s) are stored.	 If  the  dump
       had  to	be restarted, R0 will not be correct.  Next, take the value of
       KA6 (location 022(8) in	the  dump)  multiplied	by  0100(8)  and  dump
       01000(8) bytes starting from there.  This is the per-process data asso‐
       ciated with the process running at the time of the crash.  Relabel  the
       addresses  140000  to  141776.	R5  is	C's  frame or display pointer.
       Stored at (R5) is the old R5 pointing to the previous stack frame.   At
       (R5)+2  is  the	saved PC of the calling procedure.  Trace this calling
       chain until you obtain an R5 value of 141756, which is where the user's
       R5 is stored.  If the chain is broken, you have to look for a plausible
       R5, PC pair and continue from there.  Each PC should be	looked	up  in
       the  system's  name  list  using	 adb(1)	 and its `:' command, to get a
       reverse calling order.  In most cases this procedure will give an  idea
       of  what	 is  wrong.  A more complete discussion of system debugging is
       impossible here.
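
       The arithmetic in the procedure above can be sketched in a few lines
       for a dump that has been copied to an ordinary file.  This is an
       illustration under stated assumptions, not a replacement for adb:
       the file is assumed to be a byte-for-byte image of core with 16-bit
       words stored low byte first, the window onto the per-process data is
       taken from the relabelled range 140000 to 141776, and the stack is
       assumed to grow toward lower addresses so that each older frame lies
       higher.  All function and constant names are hypothetical.

```python
import struct

REGISTER_NAMES = ("R0", "R1", "R2", "R3", "R4", "R5", "SP", "KA6")
REGISTER_BASE = 4          # registers are stored starting at location 4
CLICK = 0o100              # KA6 is multiplied by 0100(8) to get a byte offset
U_FIRST = 0o140000         # relabelled address of the first word
U_LAST = 0o141776          # relabelled address of the last word
USER_R5_SLOT = 0o141756    # the chain ends when R5 reaches this value


def read_registers(core: bytes) -> dict:
    """Unpack the eight saved 16-bit words stored from location 4 onward."""
    values = struct.unpack_from("<8H", core, REGISTER_BASE)
    return dict(zip(REGISTER_NAMES, values))


def u_area_offset(ka6: int) -> int:
    """Offset in the dump of the per-process data: KA6 times 0100(8)."""
    return ka6 * CLICK


def walk_frames(core: bytes, regs: dict) -> list:
    """Follow (R5) to the old R5 and collect the saved PC found at (R5)+2."""
    base = u_area_offset(regs["KA6"])
    r5 = regs["R5"]
    pcs = []
    while r5 != USER_R5_SLOT:
        # A usable frame pointer is even and leaves room for two words.
        if r5 % 2 or not U_FIRST <= r5 <= U_LAST - 2:
            break  # chain is broken; a plausible R5, PC pair must be found by hand
        old_r5, pc = struct.unpack_from("<HH", core, base + (r5 - U_FIRST))
        pcs.append(pc)
        if old_r5 <= r5:
            break  # assumed: older frames always lie at higher addresses
        r5 = old_r5
    return pcs
```

       Given core = open("/usr/sys/core", "rb").read(), the saved PCs can
       be listed in octal with
       [oct(pc) for pc in walk_frames(core, read_registers(core))]; each
       one still has to be looked up in the system's name list as described
       above.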

SEE ALSO
       clri(1), icheck(1), dcheck(1), boot(8)

								      CRASH(8)