volintro(8)
NAME
volintro, lsm, LSM - Introduction to Logical Storage Manager (LSM)
terms and commands
DESCRIPTION
The following LSM commands provide a shell-level interface used by the
system administrator and higher-level applications and scripts to query
and manipulate LSM objects:
volassist, volclonedg, vold, voldctl, voldg, voldisk, voldiskadd,
voldiskadm, voldisksetup, voledit, volencap, volevac, volinfo,
volinstall, voliod, vollogcnvt, volmake, volmend, volmigrate,
volmirror, volnotify, volplex, volprint, volreattach, volreconfig,
volrecover, volrestore, volrootmir, volsave, volsd, volsetup, volstat,
voltrace, volume, volunmigrate, volunroot, volwatch
GLOSSARY
The following are LSM terms and definitions.
Volume
A virtual disk device
that looks to applications and file systems like a physical disk parti‐
tion device. Volumes present block and raw device interfaces that are
compatible in their use with disk partition devices. However, a volume
is a virtual device that can be mirrored, striped or spanned across
several disk drives, and moved to use different storage, using adminis‐
trative commands. The configuration of a volume can be changed, using
LSM commands, without causing disruption to applications or file
systems that are using the volume.
Plex
A copy of a volume's logical data address space; also known as a
mirror. A volume can have up to 32
plexes associated with it. Each plex is, at least conceptually, a copy
of the volume that is maintained consistently in the presence of volume
I/O and changes to the LSM configuration. Plexes represent the primary
means of configuring storage for a volume. Plexes can have a striped,
concatenated, or RAID5 organization (layout).
Disk
Disks exist as two entities:
A physical disk on which all data is ultimately stored and which
exhibits all the behaviors of the underlying technology.
An LSM representation of the disk which, while mapping one-to-one with
the physical disk, is just a representation of storage devices from
which allocations of storage are made.
The difference is that a physical disk presents the image of a
device with a definable geometry with a definable number of
cylinders, heads, and so on, while a Logical Storage Manager
disk is simply a unit of allocation with a name and a size.
Disks used by LSM usually contain two special regions: a private
region and a public region. Typically, each region is formed
from a complete partition of the disk, resulting in a sliced
disk; however, the private and public regions can be allocated
from the same partition, resulting in a simple disk. A disk used
by LSM can also be a nopriv disk, which has only a public region
and no private region. LSM nopriv disks are created as the
result of encapsulating a disk or disk partition.
Subdisk
A region of storage allocated on a disk for use by a volume. Subdisks are
associated with volumes through plexes. You organize one or
more subdisks to form plexes based on the plex layout (concatenated,
striped, or RAID5). Subdisks are defined relative to disk media
records.
Disk media record
A reference to a physical disk, or possibly a
disk partition. This record can be thought of as a physical disk
identifier for the disk or partition. Disk media records are
configuration records that provide a name (known as the disk
media name or DM name) that an administrator can use to refer‐
ence a particular disk, independent of its location on the sys‐
tem's various disk controllers. Disk media records reference
particular physical disks through a disk ID, which is a unique
identifier that is assigned to a disk when it is initialized for
use with the LSM software.
Operations are provided to set or remove the disk ID stored in a
disk media record. Such operations have the effect of removing
or replacing disks, along with any associated subdisks.
Disk access record
A configuration record that defines the path to a disk. Disk access
records most often name a unit number. LSM uses the disk access
records stored in a system to find all disks attached to the
system. Disk access records do not identify particular physical
disks.
Disk access records are identified by their disk access names
(also known as DA names).
Through the use of disk IDs, LSM allows you to move disks
between controllers or to different locations on a controller.
When you move a disk, a different disk access record is used to
access the disk, although the disk media record will continue to
track the actual physical disk.
On some systems, LSM builds a list of disk access records auto‐
matically, based on the list of devices attached to the system.
On these systems, it is not necessary to define disk access
records explicitly. On other systems, you must define disk
access records with the /sbin/voldisk define command. Specialty
disks, such as RAM disks or floppy disks, are likely to require
explicit /sbin/voldisk define commands.
Disk group
A group of disks that share a common configuration database. A
configuration database
is a set of records describing objects including disks, volumes,
plexes, and subdisks that are associated with one particular
disk group. Each disk group has an administrator-assigned name
that is used to reference that disk group. Each disk group also
has an internally defined unique disk group ID, which differen‐
tiates two disk groups with the same administrator-assigned
name.
Disk groups provide a method to partition the configuration
database, so that the database size is not too large and so that
database modifications do not affect too many drives. They also
allow LSM to operate with groups of physical disk media that can
be moved between systems.
Disks and disk groups have a circular relationship: disk groups
are formed from disks, and disk group configurations are stored
on disks. All disks in a disk group are stamped with a disk
group ID, which is a unique identifier for naming disk groups.
Some or all disks in a disk group also store copies of the
configuration database of the disk group.
Configuration database
A small database that contains all volume, plex, subdisk, and disk
media records.
These databases are replicated onto some or all disks in the
disk group, with up to two copies on each disk. Because these
databases pertain to disk groups, record associations cannot
span disk groups. Thus, you cannot define a subdisk on a disk in
one disk group and associate it with a volume in another disk
group.
Root disk group (rootdg)
LSM creates and requires one special disk group called
rootdg, which is generally the default for most utilities. In
addition to defining the regular disk group information, the
configuration database for the root disk group contains local
information that is specific to a disk group. The rootdg disk
group cannot be moved to a different host, unlike other,
administrator-created disk groups.
Private region
Most disks used by LSM contain two special regions: a private region
and a public region. Usually, each region is formed from a complete
partition of the
disk; however, the private and public regions can be allocated
from the same partition.
The private region of a disk contains on-disk structures that
are used by LSM for internal purposes. Each private region is
typically 4096 blocks and begins with a disk header that identi‐
fies the disk and its disk group. Private regions can also con‐
tain copies of a disk group's configuration database and copies
of the disk group's kernel log.
Public region
The public region of a disk is
the space reserved for allocating subdisks. Subdisks are defined
with offsets that are relative to the beginning of the public
region of a particular disk partition. A subdisk represents a
contiguous region of the disk, and subdisks must be contiguous
with each other within the public region. Only one contiguous
region of disk can form the public region for a disk.
Kernel log
A log
kept in the private region on the disk that is written by the
LSM kernel. The log contains records describing the state of
volumes in the disk group. This log provides a mechanism for the
kernel to persistently register state changes so that the vold
daemon can detect the state changes even in the event of a system
failure.
Disk header
A block stored in a private region of a disk that defines several
properties of the disk, such as the:
Size of the private region
Location and size of the public region
Unique disk ID for the disk
Disk group ID and disk group name (if the disk is currently
associated with a disk group)
Host ID for a host that has exclusive use of the disk
Disk ID
A 64-byte, universally
unique identifier that is assigned to a physical disk when its
private region is initialized with the /sbin/voldisk init com‐
mand. The disk ID is recorded in the disk media record so that
the physical disk can be related to the disk media record at
system startup.
Disk group ID
A 64-byte, universally unique identifier that
is assigned to a disk group when the disk group is created with
the /sbin/voldg init command. This identifier is in addition to
the disk group name, which you assign. The disk group ID
differentiates between disk groups that have the same
administrator-assigned names.
Host ID
A name, usually assigned by you, that identifies a particular host.
Host IDs are used to assign ownership to
particular physical disks. When a disk is part of a disk group
that is in active use by a particular host, the disk is stamped
with that host's host ID. If another system attempts to access
the disk, it detects that the disk has a nonmatching host ID and
disallows access until the host with ownership discontinues use
of the disk. Use the /sbin/voldisk clearimport command to clear
the host ID stored on a disk.
If a disk is a member of a disk group and has a host ID that
matches a particular host, then that host will import the disk
group as part of system startup.
Striped plex
A plex that scatters data
evenly across each of its associated subdisks. A plex has a
characteristic number of stripe columns (represented by the num‐
ber of associated subdisks) and a characteristic stripe width.
The stripe width defines how data with a particular address is
allocated to one of the associated subdisks. Given a stripe
width of 128 blocks and two stripe columns, the first group of
128 blocks is allocated to the first subdisk, the second group
of 128 blocks is allocated to the second subdisk, the third
group to the first subdisk again, and so on.
Concatenated plex
A plex that uses
subdisks on one or more disks to create a virtual contiguous
region of storage space that is accessed linearly. If LSM
reaches the end of a subdisk while writing data, it continues to
write data to the next subdisk, which can physically exist on
the same disk or a different disk. This layout allows you to use
space on several regions of the same disk, or regions of several
disks, to create a single big pool of storage.
volboot file
The volboot file
is a special file (usually stored as /etc/vol/volboot) that is
used to bootstrap the root disk group and to define a system's
host ID. In addition to a host ID, the volboot file might also
contain a list of disk access records. On system startup, the
list of disks is scanned to find a disk that is a member of the
rootdg disk group and that is stamped with this system's host
ID. The volboot file allows the configuration to be located on
disks not detected by system initialization, or to be detected
in cases where autoconfig is disabled. When such a disk is
found, its configuration database is read and is used to get a
complete list of disk access records that are used as a second-
stage bootstrap of the root disk group, and to locate all other
disk groups.
Plex consistency
If the plexes of a volume contain different data,
then the plexes are said to be inconsistent. This is a problem
only if LSM is unaware of the inconsistencies, as the volume can
return differing results for consecutive reads.
Plex inconsistency is a serious compromise of data integrity.
This inconsistency is caused by write operations that start
around the time of a system failure, if parts of the write
complete on one plex but not the other. Plexes are also inconsistent
after a mirrored volume is created, unless they are first
synchronized to contain the same data. An important role of LSM is
to ensure that consistent data is returned to any application
that reads a volume. This might require that plex consistency of
a volume be “recovered” by copying data between plexes so that
they have the same contents. Alternatively, you can put the vol‐
ume into a state so that reads from one plex are automatically
written back to the other plexes, thus making the data consis‐
tent for that volume offset.
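The striped and concatenated plex definitions in this glossary each describe how a block address in the plex maps to a subdisk. The following Python sketch illustrates that arithmetic only; it is not LSM code. The function names are hypothetical, and the sketch assumes one subdisk per stripe column (as the striped plex definition states) and subdisks that are laid end to end in a concatenated plex.

```python
def striped_location(block, stripe_width, columns):
    """Map a plex block offset to (column, block offset inside that column)."""
    if block < 0:
        raise ValueError("block offset must not be negative")
    stripe_unit = block // stripe_width   # which stripe-width group holds the block
    column = stripe_unit % columns        # groups rotate across the columns
    row = stripe_unit // columns          # complete stripes that precede this one
    return column, row * stripe_width + block % stripe_width


def concatenated_location(block, subdisk_lengths):
    """Map a plex block offset to (subdisk index, block offset inside it)."""
    if block < 0:
        raise ValueError("block offset must not be negative")
    start = 0
    for index, length in enumerate(subdisk_lengths):
        if block < start + length:
            return index, block - start
        start += length                   # move on to the next subdisk
    raise ValueError("block lies beyond the end of the plex")
```

With a stripe width of 128 blocks and two columns, blocks 0 through 127 map to column 0, blocks 128 through 255 map to column 1, and block 256 returns to column 0 at offset 128, which matches the example in the striped plex definition.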
CONVENTIONS
The following conventions are available for LSM commands to provide a
finer degree of administration.
Command Syntax
Most LSM commands provide more than one operation, with operations
grouped primarily by object type. Commands that provide multiple
operations are typically invoked with the following form:
command [options] [keyword] [operands]
Here, command is the name of the command and keyword is a name that
identifies the specific operation to perform. Any options introduced in
the standard -letter form precede the operation keyword.
To aid normal use, each command provides an extended usage message that
lists the options and operation keywords it supports. For commands that
are keyword-based, the extended usage message can be displayed by using
the help keyword. For commands that use operands for purposes other
than operation selection, the extended usage message can be displayed
by using the -H option. The extended usage messages are reminders, not
replacements for user documentation.
Standard Length Numbers
Many basic properties of objects managed by LSM require specification
of lengths, either as a pure object length or as an offset relative to
some other object. LSM supports volume lengths up through 2,147,483,647
disk sectors (one terabyte on most systems). Typing such large numbers,
or even much smaller numbers, can be annoying and subject to error. LSM
provides a uniform syntax for representing such numbers, which uses
suffixes to provide convenient multipliers. Numbers can be specified
in decimal, octal, or hexadecimal values. Also, numbers can be speci‐
fied as a sum of several numbers.
A hexadecimal (base 16) number is introduced using a prefix of 0x. For
example, 0xfff is the same as decimal 4095. An octal (base 8) number is
introduced using a prefix of 0. For example, 0177777 is the same as
decimal 65535.
A number can be followed by a suffix character to indicate a multiplier
for the number. A length number with no suffix character represents a
count of standard disk sectors. The length of a standard disk sector
can vary between systems; it is commonly 512 bytes. On systems where
disks can have different sector sizes, one of the sector sizes will be
chosen as the “standard” size. Supported suffix characters are:
b    Multiply the length by 512 bytes (blocks)
s    Multiply the length by the standard sector size (default)
k    Multiply the length by 1024 bytes for kilobytes
m    Multiply the length by 1,048,576 (1024K bytes) for megabytes
g    Multiply the length by 1,073,741,824 (1024MB) for gigabytes
t    Multiply the length by 1,099,511,627,776 (1024GB) for terabytes
Numbers are represented internally as an integer number of sectors. As
a result, if the standard disk sector size is larger than 512 bytes,
a length that is not a whole number of sectors is rounded down to a
whole number of sectors. Rounding is always done to the next lowest,
not the nearest, multiple of the sector size.
The letter b is a valid hexadecimal character. To use b to indicate a
length in blocks, leave a single space between the length and the b
suffix. Use of a blank character within a number, when invoking
commands from the shell, usually requires enclosing the number in quotes.
For example:
/sbin/volassist make vol01 "0x1000 b"
Numbers can be added or subtracted by separating two or more numbers by
a plus or minus sign, respectively. A plus sign is optional. For
example, the largest allowed number that can be represented on a system
with a 512-byte sector size can be entered as:
1023g+1023m+1023k+1
The number 2g-1 can be used to represent the largest volume size that
can be used with most file systems.
In output, LSM reports length numbers as a simple count of sectors,
with no suffix character.
Case is not important in length specification. Hexadecimal numbers and
suffix characters can be specified using any reasonable combination of
uppercase and lowercase letters.
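The length syntax described in this section can be sketched as a small parser. The following Python is an illustration only, not the parser LSM uses. The function name parse_length is hypothetical, the suffix letters b, s, k, m, g, and t are taken from the examples and multiplier descriptions in this section, and the sketch assumes that rounding is applied once to the total rather than to each term.

```python
import re

# Bytes per unit for each suffix; "s" (standard sector) is resolved per call.
SUFFIX_BYTES = {"b": 512, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

# One term: a hex, octal, or decimal number, optional blanks, optional suffix.
TERM = re.compile(r"(0x[0-9a-f]+|0[0-7]*|[1-9][0-9]*)\s*([bskmgt]?)", re.IGNORECASE)


def parse_length(text, sector_size=512):
    """Return the length in standard sectors, rounded down."""
    terms = re.findall(r"([+-]?)([^+-]+)", text)   # signed terms of the sum
    if not terms:
        raise ValueError("empty length")
    total_bytes = 0
    for sign, body in terms:
        match = TERM.fullmatch(body.strip())
        if match is None:
            raise ValueError(f"bad length term: {body!r}")
        digits, suffix = match.group(1).lower(), match.group(2).lower()
        if digits.startswith("0x"):
            value = int(digits, 16)                # hexadecimal
        elif digits.startswith("0"):
            value = int(digits, 8)                 # a leading 0 marks octal
        else:
            value = int(digits, 10)
        unit = sector_size if suffix in ("", "s") else SUFFIX_BYTES[suffix]
        total_bytes += (-1 if sign == "-" else 1) * value * unit
    if total_bytes < 0:
        raise ValueError("length is negative")
    return total_bytes // sector_size              # always rounds down
```

Because b is also a hexadecimal digit, the sketch reads "0x1000b" as the hexadecimal number 0x1000b and "0x1000 b" as 0x1000 blocks, in line with the quoting advice above.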
Disk Group Selection
Most commands operate upon only one disk group. Each disk group has a
separate configuration from every other disk group. It is possible for
two disk groups to contain objects that have the same name. This can
happen if a disk group is moved from one system to another. However,
most utilities make no attempt to ensure that names between disk groups
are unique, so name collisions can occur anyway.
In general you specify disk groups only when creating objects. You can‐
not use a single command that references objects in more than one disk
group, but disk groups are selected automatically, based on objects
specified in the command.
The standard rules most commands use for selecting the disk group for a
command are as follows:
Given a particular set of object names specified on the command line,
look for the disk group of each object. If all objects are in the same
disk group, use that disk group.
If any named object is not unique in all disk groups, and if one named
object is not in the rootdg disk group, then fail.
To force use of a particular disk group, use -g diskgroup to indicate
the group. Duplicate names do not cause errors when a disk group is
specified explicitly. The diskgroup
specification is either a disk group ID or a disk group name.
Exception: Any set of objects in the rootdg disk group can be
specified without specifying -g rootdg, even if the given object
name is used in another disk group.
If a set of object names is given on the command line, and if some are
unique but some are not unique, then the command will fail according to
the preceding rules.
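The selection rules in this section can be read as the following Python sketch. It is one interpretation of the rules as written here, not the logic of any LSM command. The function name select_disk_group and the groups mapping (disk group name to the set of object names it contains) are hypothetical.

```python
def select_disk_group(names, groups, explicit=None):
    """Pick the disk group for a command that names the given objects.

    groups maps each disk group name to the set of object names in it;
    explicit models the value passed with -g, if any.
    """
    if explicit is not None:
        return explicit                    # -g given: duplicate names are fine
    rootdg_objects = groups.get("rootdg", set())
    if all(name in rootdg_objects for name in names):
        return "rootdg"                    # exception: rootdg needs no -g
    owners = set()
    for name in names:
        found = [group for group, objects in groups.items() if name in objects]
        if not found:
            raise LookupError(f"{name}: not found in any disk group")
        if len(found) > 1:
            raise LookupError(f"{name}: not unique, use -g diskgroup")
        owners.add(found[0])
    if len(owners) != 1:
        raise LookupError("objects are in more than one disk group")
    return owners.pop()
```

For example, if vol01 exists in both rootdg and dg1, naming vol01 alone selects rootdg, while naming vol01 together with an object that exists only in dg1 fails unless a disk group is given explicitly.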
RECORD TYPES
Disk group configurations contain six types of records: volume records,
plex records, subdisk records, disk media records, disk group records,
and disk access records. Each of these record types is described in the
following sections. Disk access records are specific to the root disk
group and are stored in configurations only because there is no other
convenient place to store them; otherwise, they are logically separate
from all disk groups. Since they are specific and meaningful only to
the local system, the logical place for their storage is the rootdg
because that is the only disk group guaranteed to exist on the system.
Disk Group Records
Disk group records define several different types of names for a disk
group. The different types of names are:
Real name
The name of the disk group, as defined on disk. This name is stored in
the disk group configuration and is also stored in the disk headers of
disks in the disk group.
Alias name
The standard name that the system uses when referencing the disk group.
References to the disk group name usually mean the alias name. Volume
directories are structured into subdirectories based on the disk group
alias name. Typically, the disk group's alias name and real name are
identical. A local alias can be useful for gaining access to a disk
group with a name that conflicts with other disk groups in the system
or that conflicts with records in the rootdg disk group.
Disk group ID
A 64-byte identifier that represents the unique ID of the disk group.
All disk
groups on all systems should have a unique disk group ID, even if they
have the same real name. This identifier is stored in the disk headers
of disks in the disk group that have a private region. It is used to
ensure that LSM does not confuse two disk groups that were created with
the same name.
Volume Records
Volume records define the characteristics of volume devices. The name
of a volume record defines the node name used for files in the /dev/vol
and /dev/rvol directories. The block device for a volume (which can be
used as an argument to the mount command) has the path:
/dev/vol/groupname/volume
where groupname is the name of the disk group containing the volume.
The raw device for a volume, typically used for application I/O and for
issuing I/O control operations, has the path:
/dev/rvol/groupname/volume
For convenience, volumes assigned to the root disk group are accessible
under the rootdg subdirectories of the /dev/vol and /dev/rvol
directories, and are also accessible directly as /dev/vol/volume and
/dev/rvol/volume.
Reads from a volume are directed to one of the read-write or read-only
plexes associated with the volume. Writes to the volume are directed to
the enabled read-write and write-only plexes associated with the vol‐
ume.
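The routing rule in the preceding paragraph can be sketched in Python as follows. This is an illustration under stated assumptions, not LSM code: the Plex class and both function names are hypothetical, the mode strings "rw", "ro", and "wo" stand for read-write, read-only, and write-only, and the sketch assumes that only enabled plexes are candidates for reads as well as writes.

```python
from dataclasses import dataclass


@dataclass
class Plex:
    name: str
    mode: str             # "rw", "ro", or "wo"
    enabled: bool = True


def read_candidates(plexes):
    """Names of plexes a volume read may be directed to (one is chosen)."""
    return [p.name for p in plexes if p.enabled and p.mode in ("rw", "ro")]


def write_targets(plexes):
    """Names of plexes that receive every volume write."""
    return [p.name for p in plexes if p.enabled and p.mode in ("rw", "wo")]
```

A read goes to one plex from the first list, while a write goes to every plex in the second list.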
During a write operation, two plexes of a volume can become out of sync
with each other, because writes directed to two disks can complete at
different times. This is not normally a problem. However, if the system
were to crash or lose power during a write operation, the two plexes
could have different contents.
Most applications and file systems are not designed with the presump‐
tion that two separate reads of a device can return different contents
without an intervening write operation. Because plexes with different
contents could cause such a situation, LSM expends considerable effort
to guarantee that this does not happen.
Volumes have the following fundamental attributes:
Usage type
Defines a class of
rules for operating on the volume, typically based on the expected con‐
tent of the volume. Several utilities can apply extensions or limita‐
tions that apply to volumes with a particular usage type. Several usage
types are included with the base release of LSM: fsgen, for use with
volumes that contain file systems; gen, for use with volumes that are
used as swap devices or for other applications that do not use the sys‐
tem buffer cache; raid5 for use with volumes that have a RAID 5 plex
layout, regardless of what the volume is used for; and the following
special usage types: root, for use with the root file system volume on
a single system; cluroot, for use with the cluster_root domain volume
on a cluster; and swap, for use with the primary swap device on a
single system and swap devices for cluster members.
Usage types maintain
a private state field related to the volume that records operations
that have been performed on the volume or failure conditions that have
been encountered. This state field contains a string of up to 14
characters.
Each volume has a length, which defines the limiting offset of read
and write operations. The length is assigned by the administrator and
might or might not match the lengths of the associated plexes.
Each volume is either enabled, disabled, or detached. When
enabled, normal read and write operations are allowed on the volume,
and any file system residing on the volume can be mounted or used in
the usual way. When disabled, no access to the volume or any of its
associated plexes is allowed. When detached, some ioctl calls can be
used by commands to operate on the volume.
Each volume has zero through 32 associated plexes.
Read policy
A configurable policy for switching
between plexes for volume reads. When a volume has more than one
enabled associated plex, LSM can distribute reads between the plexes to
distribute the I/O load and thus increase total possible bandwidth of
reads through the volume.
You can set and change the read policy to one of the following:
Round-robin: for each read operation, switches to a different plex
from the previous read operation. Given three plexes, this
switches between each of the three plexes, in order.
Preferred plex: specifies a plex used to satisfy read requests. In
the event that a read request cannot be satisfied by the preferred
plex, the volume changes to round-robin read policy.
Default policy: adjusts
to use an appropriate read policy based on the set of plexes
associated with the volume. If only one enabled read-write
striped plex is associated with the volume, then that plex is
chosen automatically as the preferred plex; otherwise, the
round-robin policy is used. If a volume has one striped plex and
one concatenated plex, preferring the striped plex often yields
better throughput.
A string organized as a set of usage-type options to apply when
starting (enabling) a volume. See volume(8) for details.
An assignable policy to use for logging changes to the volume. The
policies are:
Does not log any
changes when writing to the volume. Writes the requested data to
all read-write or write-only plexes.
Dirty-region logging (DRL): maintains a bitmap that
represents different regions of a mirrored volume. When a write
to a particular region occurs, the respective bit is set. When
the system is restarted after a crash, this region bitmap is
used to limit the amount of data copying required to recover
plex consistency for the volume. The region changes are logged
to a special log subdisk associated with the volume. Use of DRL
can greatly speed recovery of a volume, but it might degrade
performance of the volume under normal operation.
RAID 5 log: stores a copy
of the data and parity for several full stripes of I/O. When a
write to a RAID 5 volume occurs, the parity is calculated and
the data and parity are first written to the RAID 5 log, then to
the volume. When the system is restarted after a crash, all the
writes in the RAID 5 log are written (or possibly rewritten) to
the volume. The writes are logged to a special log subdisk
associated with a separate log plex, associated with the volume.
Use of a RAID 5 log protects against data loss in the event of a
system failure.
A mode that applies to the volume during plex
consistency recovery. When this mode is enabled, the data read
from blocks of one plex region is written back to the corre‐
sponding region in all other writable plexes. This ensures that
a future read operation covering the same range of blocks will
return the same data.
Can be enabled or disabled using voledit.
If this mode is enabled, a read failure for a plex causes data
to be read from an alternate plex and then written back to the
plex that had the read failure. This usually fixes the error.
Only if the writeback fails will the plex be detached for having
an unrecoverable I/O failure. This is the default.
Writecopy mode: can be
enabled or disabled using voledit. This mode takes effect only
if the DRL feature is in effect. When the operating system
passes a write request to the volume driver, the operating sys‐
tem might continue to change the memory being written to disk.
LSM cannot detect that the memory is changing, so it can inad‐
vertently leave plexes with inconsistent contents. This is not
normally a problem, because the operating system ensures that
any such modified memory is rewritten to the volume before the
volume is closed (such as by a clean system shutdown). However,
if the system crashes, plexes can be inconsistent. Because the
DRL feature limits recovery to the logged regions rather than the
entire volume, recovery might not make the plexes entirely consistent.
Turning on the writecopy mode (which is normally set by default)
often causes LSM to copy the data for a write request to a new
section of memory before writing it to disk. Because the write
is done from the copied memory, it cannot change and so the data
written to each plex is guaranteed to be the same if the write
completes.
Exception policies: several modes can be set on the volume according to
its usage type. These modes affect operation of a volume in the
presence of I/O failures. Only one of these policies, called
GEN_DET_SPARSE is used. This policy tracks complete and incom‐
plete plexes in a volume. (An incomplete plex does not have a
backing subdisk for all blocks in the volume.) If an unrecover‐
able error occurs on an incomplete plex, the plex is detached
(disabled from receiving regular volume I/O requests). If an
unrecoverable error occurs on a complete plex, the plex is
detached unless it is the last complete plex, in which case any
incomplete plexes that overlap with the error will be detached
but the plex with the error will remain attached.
This exception policy is chosen to ensure that an I/O that fails
on one plex will not be directed to that plex again unless that
plex is the last complete plex remaining attached to the volume.
In that case, the policy ensures that the volume will return the
error consistently, even in the presence of incomplete plexes.
An administrator-assigned string of up through 40 characters
that can be set and changed using the voledit command. LSM does
not interpret the comment field. The comment cannot contain newline
characters.
The user, group, and file permission modes
used for the volume device nodes. The user and group modes are
normally root and system. The mode usually grants read and
write permission to the owner and no access by other users.
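The GEN_DET_SPARSE exception policy described in this section can be sketched as the following Python. It models only the decision the text describes and is not taken from LSM itself. The Plex class, its fields, and the function handle_io_error are hypothetical, and the sketch assumes that each plex already carries a flag saying whether it is complete.

```python
from dataclasses import dataclass


@dataclass
class Plex:
    name: str
    complete: bool        # True if subdisks back every block of the volume
    extents: list         # (start, length) block ranges backed by subdisks
    attached: bool = True


def overlaps(plex, start, length):
    """True if any backed range of the plex intersects [start, start+length)."""
    return any(s < start + length and start < s + n for s, n in plex.extents)


def handle_io_error(plexes, failed, err_start, err_length):
    """Apply the policy for an unrecoverable error; return detached names."""
    attached = [p for p in plexes if p.attached]
    complete = [p for p in attached if p.complete]
    if failed.complete and len(complete) == 1:
        # Last complete plex: keep it attached, and detach the incomplete
        # plexes that overlap the failed range.
        victims = [p for p in attached
                   if p is not failed and overlaps(p, err_start, err_length)]
    else:
        # Incomplete plex, or a complete plex that is not the last one.
        victims = [failed]
    for plex in victims:
        plex.attached = False
    return [plex.name for plex in victims]
```

Keeping the last complete plex attached means the volume keeps returning the same error for the failed range instead of falling back to a partial copy.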
Plex Records
Plex records define the characteristics of a particular plex of a vol‐
ume. A plex can be in either an associated state or a dissociated
state. In the dissociated state, the plex is not a part of a volume. A
dissociated plex cannot be accessed in any way. An associated plex can
be accessed through the volume.
Plexes have the following fundamental attributes:
Each plex is either
enabled, disabled, or detached. When enabled, normal read and write
operations from the volume can be directed to the plex. When disabled
or detached, no I/O operations can be applied to the plex.
Failures encountered during normal volume I/O can change the
plex state from enabled to detached. See the preceding descrip‐
tion of the volume record exception policy for more information.
Each plex is in read-write, read-only, or write-only mode. The
I/O mode affects read and write operations directed to the vol‐
ume, if the plex is enabled. For read-write and read-only modes,
volume read operations can be directed to the plex. For read-
write and write-only modes, volume write operations are directed
to the plex.
Plexes are normally in read-write mode. Write-only mode is used
to recover a plex that failed and whose contents have become out
of date with respect to the volume. It is also used when attaching
a new plex to a volume. In write-only mode, writes to the
volume will update the plex, causing written regions to be up to
date. Typically, a set of special copy operations is used to
update the remainder of the plex.
Layout
The organization of associated subdisks with respect to the plex
address space. The layout is striped, concatenated, or RAID 5.
Each plex can have zero or
more associated subdisks. Subdisks are associated at offsets
relative to the beginning of the plex address space. Subdisks
for concatenated plexes might not cover the entire length of the
plex, in which case they leave holes in the plex. A plex that is
not as long as the associated volume is considered to have a
hole extending from the end of the plex to the end of the volume.
A plex with a hole is considered incomplete and is sometimes
called sparse.
Each plex can have one associated log subdisk. A log subdisk is used
with the DRL feature to reduce the
time required to recover consistency of a volume after a system
failure. If a plex is associated with a log subdisk, that plex
is a log plex.
The length of a plex is the offset of the last
subdisk in the plex plus the length of that subdisk. In other
words, the length of the plex is defined by the last block in
the plex address space that is backed by a subdisk. This value
might not relate to the length of the volume, depending on
whether the plex is completely contiguously allocated.
Contiguous length
The offset of the first block in the plex address space that is not
backed by a subdisk. If the plex has no holes, the contiguous
length matches the plex length. If the contiguous length is
equal to or greater than the length of the associated volume,
the plex is considered complete; otherwise it is incomplete.
Volume usage types maintain a private state field related to the
operations that have been performed on the plex or to failure
conditions that have been encountered. This state field contains
a string of up through 14 characters. Various condition flags
are defined for the plex that LSM sets and changes independent
of the volume usage type. Defined flags are:
NODAREC: No physical disk could be found corresponding to the disk
ID in the disk media record for one of the subdisks associated
with the plex. The plex cannot be used until the condition is
fixed or the affected subdisk is dissociated.
REMOVED: A disk media record was put into the removed state through
explicit administrative action. The plex cannot be used until
the disk is replaced or the affected subdisk is dissociated.
RECOVER: A disk for a disk media record was replaced or was
reattached too late to prevent the plex from becoming out of
date with respect to the volume. The plex requires complete
recovery from another plex in the volume to synchronize the plex
with the correct contents of the volume.
IOFAIL: The plex was detached when an I/O failure was detected
during normal volume I/O. The plex is out of date with respect
to the volume and in need of complete recovery. However, this
condition can also indicate that a disk in the system should be
replaced.
A plex is considered to have “volatile” contents if the disk for
any of the plex's subdisks is considered to be volatile. The
contents of a volatile disk are not presumed to survive a system
reboot. The contents of a volatile plex are always considered
out of date after a recovery and in need of complete recovery
from another plex. An administrator-assigned string of up
through 40 characters that can be set and changed using the
voledit command. LSM does not interpret the comment field. The
comment cannot contain newline characters.
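
The plex length, contiguous length, and completeness rules described
above can be illustrated with a short Python sketch for a concatenated
plex. The Subdisk class and the function names are hypothetical and
exist only for this illustration; they are not part of any LSM
interface. Offsets and lengths are in arbitrary block units.

```python
from dataclasses import dataclass


@dataclass
class Subdisk:
    """Hypothetical stand-in for a subdisk associated with a plex."""
    plex_offset: int  # offset from the beginning of the plex address space
    length: int       # length of the subdisk


def plex_length(subdisks):
    """Offset of the last subdisk in the plex plus that subdisk's length."""
    if not subdisks:
        return 0
    last = max(subdisks, key=lambda sd: sd.plex_offset)
    return last.plex_offset + last.length


def contiguous_length(subdisks):
    """Offset of the first block that is not backed by a subdisk."""
    end = 0
    for sd in sorted(subdisks, key=lambda sd: sd.plex_offset):
        if sd.plex_offset > end:
            break  # a hole starts at 'end'
        end = max(end, sd.plex_offset + sd.length)
    return end


def is_complete(subdisks, volume_length):
    """A plex is complete if its contiguous length covers the volume."""
    return contiguous_length(subdisks) >= volume_length
```

In this sketch, a plex with subdisks at offsets 0 and 150 has a hole
between them, so its contiguous length is shorter than its plex length
and the plex is sparse.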
Subdisk Records
Subdisk records define a region of disk, allocated from a disk's public
region. Subdisks have few states associated with them, other than the
configuration state that defines which region of disk the subdisk occu‐
pies. Subdisks cannot overlap each other, either in their associations
with plexes or in their arrangement on disk public regions.
Subdisks have the following fundamental attributes: The name of the
disk media record that points to the physical disk. The offset from
the beginning of the disk's public region to the start of the subdisk.
For associated subdisks, this is the offset (from the beginning of the
plex) of the subdisk association. For subdisks associated with striped
plexes, the plex offset defines relative ordering of subdisks in the
plex, rather than actual offsets within the plex address space. The
length of the subdisk. An administrator-assigned string of up through
40 characters that can be set and changed using the voledit command.
LSM does not interpret the comment field. The comment cannot contain
newline characters.
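
The rule that subdisks cannot overlap on a disk's public region can be
sketched as a simple interval check. This is an illustration only: the
function name and the (offset, length) tuple representation are
assumptions made for the example, not an LSM data structure.

```python
def has_overlap(regions):
    """Return True if any two regions on one public region overlap.

    'regions' is an iterable of (disk_offset, length) pairs, where
    disk_offset is measured from the beginning of the public region.
    """
    end = 0  # first block past all regions seen so far
    for offset, length in sorted(regions):
        if offset < end:
            return True  # this region starts inside an earlier one
        end = max(end, offset + length)
    return False
```

Two regions that merely touch (one ends where the next begins) are not
treated as overlapping in this sketch.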
Disk Media Records
Disk media records define a specific disk within a disk group. The name
of a disk media record (the disk media name) is assigned when a disk is
first added to a disk group. Disk media records can be assigned to spe‐
cific physical disks by associating the disk media record with the cur‐
rent disk access record for the physical disk.
Disk media records have the following fundamental attributes: A 64-byte
unique identifier assigned to the physical disk associated with the
disk media record. This can be cleared to indicate that the disk is in
the REMOVED state. A removed disk has no current association with any
physical disk. The disk access name currently used to access the phys‐
ical disk referenced by the disk ID. If the disk ID is defined, but no
physical disk with that ID can be found, the disk access name will be
null. If the physical disk is not found, the disk state is NODAREC, or
inaccessible. A disk can become inaccessible either because the indi‐
cated disk is not currently attached to the system or because I/O fail‐
ures on the physical disk prevented LSM from identifying or using the
physical disk.
A disk media record that has an active association with a physical disk
(both the disk ID and the disk access name attributes are defined)
inherits several properties from the underlying physical disk. These
attributes are taken from the disk header, which is stored in the pri‐
vate region of the disk. These inherited attributes are: The length of
the region of the physical disk available for subdisk allocations. The
length of the region of the physical disk reserved for storing private
Logical Storage Manager information. The fundamental I/O size for the
disk, in bytes, also known as the sector size. All I/Os destined for
this disk must be multiples of this size. LSM requires that all disks
have the same sector size. On Tru64 UNIX systems, the sector size is
512 bytes.
Disk Access Records
Disk access records define an address, or access path, for a disk. LSM
uses the disk access records to locate physical disks. Disk access
records do not define specific physical disks, because physical disks
can be moved on a system. When a physical disk is moved, a different
disk access record might be necessary to locate it.
Disk access records are stored in the rootdg disk group configuration.
Unlike other record types, the names of disk access records can con‐
flict with the names of other records. For example, a specialty disk
(such as a RAM disk) can use the same name for both the disk access
record and the disk media record that points to it.
Disk access records can be defined explicitly. Some (sometimes all)
disk access records might be configured automatically by LSM, based on
available information in the operating system. Such automatically
configured records are not stored persistently in the on-disk root disk
group configuration, but are instead regenerated every time LSM starts.
Disk access records have the following fundamental attributes: The name
of the disk access record is typically a disk address of some kind.
Disk names are usually of the form dsknp, where dsk is the device mne‐
nomic for disk devices, n is the sequence number of the disk, and p is
the partition identifier (in the range a through h). Each disk access
record has a type, which identifies certain key characteristics of
LSM's interaction with the disk. Available types are: sliced, simple,
and nopriv. See voldisk(8) for more information on disk types. Typi‐
cally, most or all of the disks will be of type sliced. It might be
desirable to create specialty disks (such as RAM disks) with type
nopriv.
If the physical disk represented by the disk access record is currently
associated with a disk media record, then the following fields are
defined: The name of the disk group containing the disk media record.
The name of the disk media record that points to the physical disk.
Additional attributes can be added, arbitrarily, by disk types. See
voldisk(8) for a list of additional attributes defined by the standard
disk types.
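
The usual dsknp form of a disk access name, described above, can be
split into its parts with a short Python sketch. The function name is
hypothetical. Because the text says only that names are usually of this
form, the sketch returns None for any name that does not match instead
of treating it as an error.

```python
import re

# dsk + sequence number + partition identifier in the range a through h
_DSK_NAME = re.compile(r"dsk(\d+)([a-h])")


def parse_disk_access_name(name):
    """Return (sequence_number, partition) for a dsknp name, else None."""
    match = _DSK_NAME.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```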
VOLUME USAGE TYPES
The usage type of a volume represents a class of rules for operating on
a volume. Each usage type is defined by a set of executables under the
directory /sbin/lsm.d/usage_type, where usage_type is the name given to
the usage type. The required executables are: volinfo, volmake, vol‐
mend, volplex, volsd, and volume. These executables are invoked by LSM
administrative utilities with the same names. The executables under
/sbin/lsm.d/usage_type should not, normally, be executed directly.
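
As an illustration of the directory layout described above, the
following Python sketch reports which of the required executables are
absent for a given usage type. The function name and the base parameter
are assumptions made for the example. The sketch only inspects the file
system and does not run any of the executables.

```python
import os

# Executables that every usage type is required to provide.
REQUIRED_EXECUTABLES = (
    "volinfo", "volmake", "volmend", "volplex", "volsd", "volume",
)


def missing_executables(usage_type, base="/sbin/lsm.d"):
    """List required executables not found (or not executable)."""
    directory = os.path.join(base, usage_type)
    return [
        name for name in REQUIRED_EXECUTABLES
        if not os.access(os.path.join(directory, name), os.X_OK)
    ]
```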
The usage types provided with LSM are: gen, fsgen, root, cluroot, swap,
and raid5. It is likely that new usage types will be added in future
releases. It is also possible for third-party products to install addi‐
tional usage types.
The usage types provided with LSM store state information in the volume
and plex usage-type state fields.
The volume states are: The volume is not yet initialized. This is the
initial state for volumes created by volmake. The volume has been
stopped and the contents for all plexes are consistent. The volume has
been started and is running normally or was running normally when the
system was stopped. If the system crashes in this state, then the vol‐
ume might require plex consistency recovery. The volume requires
recovery. A volume is typically set to this state after a system fail‐
ure to indicate that the plexes in the volume might be inconsistent and
require recovery. (See the resync operation in volume(8).) Plex con‐
sistency recovery is currently being done on the volume. The volume
resync operation sets this state when it starts to recover plex consis‐
tency on a volume that was in the NEEDSYNC state.
The plex states are: The plex is not yet initialized. This state is
set when the volume state is also EMPTY. The plex was running normally
when the volume was stopped. The plex will be enabled without requiring
recovery when the volume is started. The plex is running normally on a
started volume. The plex condition flags (NODAREC, REMOVED, RECOVER,
and IOFAIL) can apply if the system is rebooted and the volume
restarted. The plex was detached, either by a volplex det operation or
by an I/O failure. The volume start operation will change the state
for a plex to STALE if any of the plex condition flags are set. STALE
plexes will be reattached automatically when a volume is started. The
plex was disabled explicitly by the volmend off operation. See vol‐
mend(8) for more information. Applies to a snapshot plex that is being
attached by the volassist snapstart operation. When the attach is com‐
plete, the state for the plex will be changed to SNAPDONE. If the sys‐
tem fails before the attach completes, the plex and all of its subdisks
will be removed. Applies to a snapshot plex created by the volassist
snapstart operation that is fully attached. A plex in this state can be
turned into a snapshot volume with the volassist snapshot operation.
See volassist(8) for more information. If the system fails before the
attach completes, the plex and all of its subdisks will be removed.
Applies to a snapshot plex being attached by the volplex snapstart
operation. When the attach is complete, the state for the plex will be
changed to SNAPDIS. If the system fails before the attach completes,
the plex will be dissociated from the volume. Applies to a snapshot
plex created by a volplex snapstart operation that is fully attached. A
plex in this state can be turned into a snapshot volume with the vol‐
plex snapshot operation. See volplex(8) for more information. If the
system fails before the attach completes, the plex will be dissociated
from the volume. Applies to a plex that is being associated and
attached to a volume with the volplex att operation. If the system
fails before the attach completes, the plex will be dissociated from
the volume. Applies to a plex that is being associated and attached to
a volume with the volplex att operation. If the system fails before the
attach completes, the plex will be dissociated from the volume and
removed. Any subdisks in the plex will be kept. Applies to a plex that
is being associated and attached to a volume with the volplex att oper‐
ation. If the system fails before the attach completes, the plex and
its subdisks will be dissociated from the volume and removed.
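
The effect of the plex condition flags at volume start, as described
above, can be sketched in Python. This models only the single rule that
a plex with any condition flag set is changed to STALE; the function
name and the use of plain strings for states and flags are assumptions
made for the illustration.

```python
# Plex condition flags named in this reference page.
CONDITION_FLAGS = frozenset({"NODAREC", "REMOVED", "RECOVER", "IOFAIL"})


def plex_state_at_volume_start(state, flags):
    """Return the plex state after the volume start operation.

    If any plex condition flag is set, the state becomes STALE;
    otherwise this sketch leaves the state unchanged.
    """
    if CONDITION_FLAGS & set(flags):
        return "STALE"
    return state
```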
EXIT STATUS
The majority of LSM utilities use a common set of exit codes, which can
be used by shell scripts or other types of programs to react to spe‐
cific problems detected by the utilities. For C programmers, these exit
status codes are defined in the include file volclient.h. The number
and macro name for each distinct exit code is described in the follow‐
ing list. Shell script writers must directly compare the numbers spec‐
ified. The command is not reporting any error through the exit code.
Some command-line arguments were invalid. A syntax error occurred in a
command line or description, or a specified record name is too long or
contains invalid characters. This code is returned only by utilities
that implement a command or description language. This code can also be
returned for errors in search patterns. The volume daemon might not be
running. An unexpected error was encountered while communicating with
the volume daemon. An unexpected error was returned by a system call
or by the C library. This can also indicate that the command ran out of
memory. The status for a commit was lost because the volume daemon was
killed and restarted during the commit of a transaction, but after
restart the volume daemon did not know whether the commit succeeded or
failed. The command encountered an error that it should not have
encountered. This generally implies a condition that the command
should have tested for but did not, or a condition that results from the
volume daemon returning a value that did not make sense.
VEX_UNKNOWN: An unknown or internal error was encountered. This
code can be used, for example, when the volume daemon returns an
unrecognized error number. The time required to complete a
transaction exceeded 60 seconds, causing the transaction locks
to be lost. Because most utilities will reattempt the transac‐
tion at least once if a timeout occurs, this usually implies
that a transaction timed out two or more times. No disk group
could be identified for an operation. This results either from
specifying a disk group that does not exist or from supplying
names on a command line that are in different disk groups or in
multiple disk groups. A change made to the database by another
process caused the command to stop. This code is also returned
by a usage-type-dependent command if it is given a record that
has a different usage type. A requested subdisk, plex, or vol‐
ume record was not found in the configuration database. This
can also mean that a record was an inappropriate type. A name
used to create a new configuration record matches the name of an
existing record. A subdisk, plex, or volume is locked against
concurrent access. This code is used for intertransaction locks
associated with usage-type utilities. The code is also used for
the dissociated-plex or subdisk lock convention, which writes a
nonblank string to the tutil[0] field in a plex or subdisk
structure to indicate that the record is being used. No usage
type could be determined for a command that requires a usage
type. An invalid usage type was specified. A plex or subdisk
is associated, but the operation requires a dissociated record.
A plex or subdisk is dissociated, but the operation requires an
associated record. This code can also be used to indicate that a
subdisk or plex is not associated with a specific plex or vol‐
ume. A plex or subdisk was not dissociated, because it was the
last record associated with a volume or plex. Association of a
plex or subdisk would surpass the maximum number that can be
associated with a volume or plex. A specified operation is
invalid within the parameters specified. An I/O error was
encountered that caused the operation to abort. A volume
involved in an operation did not have any associated plexes,
although at least one was required. A plex involved in an oper‐
ation did not have any associated subdisks, although at least
one was required. A volume could not be started by the volume
start operation, because the configuration of the volume and its
plexes prevented the operation. A specified volume was already
started. A specified volume was not started. For example, this
code is returned by the volume stop operation, if the operation
is given a volume that is not started. A volume or plex
involved in an operation is in the detached state, thus prevent‐
ing a successful operation. A volume or plex involved in an
operation is in the disabled state, thus preventing a successful
operation. A volume or plex involved in an operation is in the
enabled state, thus preventing a successful operation. An
unrecognized error was encountered. This code is currently
unused. An operation failed because a volume device was open or
mounted or because a subdisk was associated with an open or
mounted volume or plex.
Exit codes 32 through 64 are reserved for use by usage types. Codes
greater than 64 can be reserved for use by specific utilities.
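
A script that reacts to exit codes can first decide which reserved
range a code falls in. The following Python sketch shows one way to do
that. The function name and the category strings are hypothetical, and
treating every code below 32 as part of the common set is an assumption
drawn from the reserved ranges stated above.

```python
def classify_exit_code(code):
    """Classify an LSM utility exit code by its reserved range."""
    if code < 0:
        raise ValueError("exit codes are non-negative")
    if code < 32:
        return "common"       # assumed: the common set shared by utilities
    if code <= 64:
        return "usage-type"   # 32 through 64: reserved for usage types
    return "utility"          # greater than 64: specific utilities
```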
SEE ALSO
Commands: mount(8), volassist(8), volclonedg(8), vold(8), voldctl(8),
voldg(8), voldisk(8), voldiskadd(8), voldiskadm(8), voldisksetup(8),
voledit(8), volencap(8), volevac(8), volinfo(8), volinstall(8),
voliod(8), vollogcnvt(8), volmake(8), volmend(8), volmigrate(8),
volmirror(8), volnotify(8), volplex(8), volprint(8), volreattach(8),
volrecover(8), volreconfig(8), volrestore(8), volrootmir(8), vol‐
save(8), volsd(8), volsetup(8), volstat(8), voltrace(8), volume(8),
volunmigrate(8), volunroot(8), volwatch(8)
Functions: ioctl(2)
Files: vol_pattern(4), volmake(4)