PMIE(1)
NAME
pmie - inference engine for performance metrics
SYNOPSIS
pmie [-bCdefHVvWxz] [-A align] [-a archive] [-c filename] [-h host] [-l
logfile] [-j stompfile] [-n pmnsfile] [-O offset] [-S starttime] [-T
endtime] [-t interval] [-Z timezone] [filename ...]
DESCRIPTION
pmie accepts a collection of arithmetic, logical, and rule expressions
to be evaluated at specified frequencies. The base data for the
expressions consists of performance metrics values delivered in real-
time from any host running the Performance Metrics Collection Daemon
(PMCD), or using historical data from Performance Co-Pilot (PCP) ar‐
chive logs.
As well as computing arithmetic and logical values, pmie can execute
actions (popup alarms, write system log messages, and launch programs)
in response to specified conditions. Such actions are extremely useful
in detecting, monitoring and correcting performance related problems.
The expressions to be evaluated are read from configuration files spec‐
ified by one or more filename arguments. In the absence of any file‐
name, expressions are read from standard input.
A description of the command line options specific to pmie follows:
-a archive is the base name of a PCP archive log written by pmlog‐
ger(1). Multiple instances of the -a flag may appear on the com‐
mand line to specify a set of archives. In this case, it is
required that only one archive be present for any one host. Also,
any explicit host names occurring in a pmie expression must match
the host name recorded in one of the archive labels. In the case
of multiple archives, timestamps recorded in the archives are used
to ensure temporal consistency.
-b Output will be line buffered and standard output is attached to
standard error. This is most useful for background execution in
conjunction with the -l option. The -b option is always used for
pmie instances launched from pmie_check(1).
-C Parse the configuration file(s) and exit before performing any
evaluations. Any errors in the configuration file are reported.
-c An alternative to specifying filename at the end of the command
line.
-d Normally pmie would be launched as a non-interactive process to
monitor and manage the performance of one or more hosts. Given
the -d flag however, execution is interactive and the user is pre‐
sented with a menu of options. Interactive mode is useful mainly
for debugging new expressions.
-e When used with -V, -v or -W, this option forces timestamps to be
reported with each expression. The timestamps are in ctime(3)
format, enclosed in parentheses, and appear after the expression
name and before the expression value, e.g.
expr_1 (Tue Feb 6 19:55:10 2001): 12
-f If the -l option is specified and there is no -a option (i.e. real-
time monitoring) then pmie is run as a daemon in the background
(in all other cases foreground is the default). The -f option
forces pmie to be run in the foreground, independent of any other
options.
-H The default hostname written to the stats file will not be looked
up via gethostbyname(3), rather it will be written as-is. This
option can be useful when host name aliases are in use at a site,
and the logical name is more important than the physical host
name.
-h By default performance data is fetched from the local host (in
real-time mode) or the host for the first named archive on the
command line (in archive mode). The host argument overrides this
default. It does not override hosts explicitly named in the
expressions being evaluated.
-l Standard error is sent to logfile.
-j An alternative STOMP protocol configuration is loaded from stomp‐
file. If this option is not used, and the stomp action is used in
any rule, the default location $PCP_VAR_DIR/pmie/config/stomp will
be used.
-n An alternative Performance Metrics Name Space (PMNS) is loaded
from the file pmnsfile.
-t The interval argument follows the syntax described in PCPIntro(1),
and in the simplest form may be an unsigned integer (the implied
units in this case are seconds). The value is used to determine
the sample interval for expressions that do not explicitly set
their sample interval using the pmie variable delta described
below. The default is 10.0 seconds.
-v Unless one of the verbose options -V, -v or -W appears on the command
line, expressions are evaluated silently; the only output is that
produced by any actions being executed. In the verbose mode,
specified using the -v flag, the value of each expression is
printed as it is evaluated. The values are in canonical units;
bytes in the dimension of ``space'', seconds in the dimension of
``time'' and events in the dimension of ``count''. See
pmLookupDesc(3) for details of the supported dimension and scaling
mechanisms for performance metrics. The verbose mode is useful in
monitoring the value of given expressions, evaluating derived per‐
formance metrics, passing these values on to other tools for fur‐
ther processing and in debugging new expressions.
-V This option has the same effect as the -v option, except that the
name of the host and instance (if applicable) are printed as well
as expression values.
-W This option has the same effect as the -V option described above,
except that for boolean expressions, only those names and values
that make the expression true are printed. These are the same
names and values accessible to rule actions as the %h, %i and %v
bindings, as described below.
-x Execute in domain agent mode. This mode is used within the Per‐
formance Co-Pilot product to derive values for summary metrics,
see pmdasummary(1). Only restricted functionality is available in
this mode (expressions with actions may not be used).
-Z Change the reporting timezone to timezone in the format of the
environment variable TZ as described in environ(5).
-z Change the reporting timezone to the timezone of the host that is
the source of the performance metrics, as identified via either
the -h option or the first named archive (as described above for
the -a option).
The -S, -T, -O, and -A options may be used to define a time window to
restrict the samples retrieved, set an initial origin within the time
window, or specify a ``natural'' alignment of the sample times; refer
to PCPIntro(1) for a complete description of these options.
Output from pmie is directed to standard output and standard error as
follows:
stdout
Expression values printed in the verbose -v mode and the output of
print actions.
stderr
Error and warning messages for any syntactic or semantic problems
during expression parsing, and any semantic or performance metrics
availability problems during expression evaluation.
EXAMPLES
The following example expressions demonstrate some of the capabilities
of the inference engine.
The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated
examples of pmie expressions.
The variable delta controls expression evaluation frequency. Specify
that subsequent expressions be evaluated once a second, until further
notice:
delta = 1 sec;
If total syscall rate exceeds 5000 per second per CPU, then display an
alarm notifier:
kernel.all.syscall / hinv.ncpu > 5000 count/sec
-> alarm "high syscall rate";
If the high syscall rate is sustained for 10 consecutive samples, then
launch top(1) in an xwsh(1G) window to monitor processes, but do this
at most once every 5 minutes:
all_sample (
kernel.all.syscall @0..9 > 5000 count/sec * hinv.ncpu
) -> shell 5 min "xwsh -e 'top'";
The following rules are evaluated once every 20 seconds:
delta = 20 sec;
If any disk is performing more than 60 I/Os per second, then print a
message identifying the busy disk to standard output and launch
dkvis(1):
some_inst (
disk.dev.total > 60 count/sec
) -> print "disk %i busy " &
shell 5 min "dkvis";
Refine the preceding rule to apply only between the hours of 9am and
5pm, and to require 3 of 4 consecutive samples to exceed the threshold
before executing the action:
$hour >= 9 && $hour <= 17 &&
some_inst (
75 %_sample (
disk.dev.total @0..3 > 60 count/sec
)
) -> print "disk %i busy ";
The following rules are evaluated once every 10 minutes:
delta = 10 min;
If either the / or the /usr filesystem is more than 95% full, display
an alarm popup, but not if it has already been displayed during the
last 4 hours:
filesys.free #'/dev/root' /
filesys.capacity #'/dev/root' < 0.05
-> alarm 4 hour "root filesystem (almost) full";
filesys.free #'/dev/usr' /
filesys.capacity #'/dev/usr' < 0.05
-> alarm 4 hour "/usr filesystem (almost) full";
The following rule requires a machine that supports the PCP environment
metrics. If the machine environment temperature rises more than 2
degrees over a 10 minute interval, raise an alarm and write an entry in
the system log:
environ.temp @0 - environ.temp @1 > 2
-> alarm "temperature rising fast" &
syslog "machine room temperature rise alarm";
And last, something interesting if you have performance problems with
your Oracle database:
db = "oracle.ptg1";
host = ":moomba.melbourne.sgi.com";
lru = "#'cache buffers lru chain'";
gets = "$db.latch.gets $host $lru";
total = "$db.latch.gets $host $lru +
$db.latch.misses $host $lru +
$db.latch.immisses $host $lru";
$total > 100 && $gets / $total < 0.2
-> alarm "high lru latch contention";
QUICK START
The pmie specification language is powerful and large.
To speed up development of pmie rules, the pmieconf(1) tool provides a
facility for generating a pmie configuration file from a set of
generalized pmie rules. The supplied set of rules covers a wide range
of performance scenarios.
The Performance Co-Pilot User's and Administrator's Guide provides a
detailed tutorial-style chapter covering pmie.
EXPRESSION SYNTAX
This description is terse and informal. For a more comprehensive
description see the Performance Co-Pilot User's and Administrator's
Guide.
A pmie specification is a sequence of semicolon terminated expressions.
Basic operators are modeled on the arithmetic, relational and Boolean
operators of the C programming language. Precedence rules are as
expected, although the use of parentheses is encouraged to enhance
readability and remove ambiguity.
Operands are performance metric names (see pmns(4)) and the normal lit‐
eral constants.
Operands involving performance metrics may produce sets of values, as a
result of enumeration in the dimensions of hosts, instances and time.
Special qualifiers may appear after a performance metric name to define
the enumeration in each dimension. For example,
kernel.percpu.cpu.user :foo :bar #cpu0 @0..2
defines 6 values corresponding to the time spent executing in user mode
on CPU 0 on the hosts ``foo'' and ``bar'' over the last 3 consecutive
samples. The default interpretation in the absence of : (host), #
(instance) and @ (time) qualifiers is all instances at the most recent
sample time for the default source of PCP performance metrics.
Host and instance names that do not follow the rules for variables in
programming languages, i.e. an alphabetic character optionally followed
by alphanumerics, should be enclosed in single quotes.
Expression evaluation follows the law of ``least surprises''. Where
performance metrics have the semantics of a counter, pmie will automat‐
ically convert to a rate based upon consecutive samples and the time
interval between these samples. All expressions are evaluated in dou‐
ble precision, and where appropriate, automatically scaled into canoni‐
cal units of ``bytes'', ``seconds'' and ``counts''.
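The counter-to-rate conversion described above can be illustrated with a minimal Python sketch. This is not pmie's implementation: the function name counter_rate and the sample values are hypothetical, and counter wrap handling is deliberately left out.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Return the per-second rate implied by two consecutive counter samples.

    Hypothetical helper: times are in seconds, values are raw counter readings.
    """
    interval = curr_time - prev_time
    if interval <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr_value - prev_value) / interval


# Two made-up samples of a syscall counter taken 10 seconds apart.
rate = counter_rate(1_000_000, 0.0, 1_060_000, 10.0)
print(rate)  # 6000.0
```

A threshold such as "> 5000 count/sec" is then compared against this derived rate rather than against the raw counter value.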
A rule is a special form of expression that specifies a condition or
logical expression, a special operator (->) and actions to be performed
when the condition is found to be true.
The following table summarizes the basic pmie operators:
┌────────────────┬────────────────────────────────────────────┐
│ Operators │ Explanation │
├────────────────┼────────────────────────────────────────────┤
│+ - * / │ Arithmetic │
│< <= == >= > != │ Relational (value comparison) │
│! && || │ Boolean │
│-> │ Rule │
│rising │ Boolean, false to true transition │
│falling │ Boolean, true to false transition │
│rate │ Explicit rate conversion (rarely required) │
└────────────────┴────────────────────────────────────────────┘
Aggregate operators may be used to aggregate or summarize along one
dimension of a set-valued expression. The following aggregate opera‐
tors map from a logical expression to a logical expression of lower
dimension.
┌─────────────────────────┬─────────────┬──────────────────────────┐
│ Operators │ Type │ Explanation │
├─────────────────────────┼─────────────┼──────────────────────────┤
│some_inst │ Existential │ True if at least one set │
│some_host │ │ member is true in the │
│some_sample │ │ associated dimension │
├─────────────────────────┼─────────────┼──────────────────────────┤
│all_inst │ Universal │ True if all set members │
│all_host │ │ are true in the associ‐ │
│all_sample │ │ ated dimension │
├─────────────────────────┼─────────────┼──────────────────────────┤
│N%_inst │ Percentile │ True if at least N per‐ │
│N%_host │ │ cent of set members are │
│N%_sample │ │ true in the associated │
│ │ │ dimension │
└─────────────────────────┴─────────────┴──────────────────────────┘
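As a rough illustration of the three kinds of logical aggregate in the table above, the following Python sketch reduces a list of truth values along one dimension. The function names are hypothetical and the sketch assumes a non-empty set; it is not how pmie itself is implemented.

```python
def some(values):
    """Existential: true if at least one set member is true."""
    return any(values)


def all_of(values):
    """Universal: true if all set members are true."""
    return all(values)


def percent(n, values):
    """Percentile: true if at least n percent of set members are true."""
    values = list(values)
    true_count = sum(1 for v in values if v)
    # Integer comparison avoids floating point rounding at the boundary.
    return true_count * 100 >= n * len(values)


# Four consecutive samples of a condition, most recent first (made-up data).
samples = [True, True, False, True]
print(some(samples), all_of(samples), percent(75, samples))
```

With three of the four samples true, the 75 percent test succeeds while the universal test fails, which mirrors the "3 of 4 consecutive samples" rule in the EXAMPLES section.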
The following instantial operators may be used to filter or limit a
set-valued logical expression, based on regular expression matching of
instance names. The logical expression must be a set involving the
dimension of instances, and the regular expression is of the form used
by egrep(1) or the Extended Regular Expressions of regcomp(3G).
┌─────────────┬──────────────────────────────────────────┐
│ Operators │ Explanation │
├─────────────┼──────────────────────────────────────────┤
│match_inst │ For each value of the logical expression │
│ │ that is ``true'', the result is ``true'' │
│ │ if the associated instance name matches │
│ │ the regular expression. Otherwise the │
│ │ result is ``false''. │
├─────────────┼──────────────────────────────────────────┤
│nomatch_inst │ For each value of the logical expression │
│ │ that is ``true'', the result is ``true'' │
│ │ if the associated instance name does not │
│ │ match the regular expression. Otherwise │
│ │ the result is ``false''. │
└─────────────┴──────────────────────────────────────────┘
For example, the expression below will be ``true'' for disks attached
to controllers 2 or 3 performing more than 20 operations per second:
match_inst "^dks[23]d" disk.dev.total > 20;
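The filtering behaviour of match_inst and nomatch_inst can be sketched in Python as follows. The instance names and rates are made up, and Python's re module stands in for the extended regular expressions of egrep(1); the two dialects agree for the simple pattern used here, but that is an assumption that will not hold for every pattern.

```python
import re


def match_inst(pattern, truth_by_instance):
    """Keep a true value only when the instance name matches the pattern."""
    regex = re.compile(pattern)
    return {
        name: bool(value) and regex.search(name) is not None
        for name, value in truth_by_instance.items()
    }


def nomatch_inst(pattern, truth_by_instance):
    """Keep a true value only when the instance name does not match."""
    regex = re.compile(pattern)
    return {
        name: bool(value) and regex.search(name) is None
        for name, value in truth_by_instance.items()
    }


# Made-up I/O rates (operations per second) keyed by disk instance name.
rates = {"dks1d1": 35.0, "dks2d1": 25.0, "dks2d4": 48.0, "dks3d2": 12.0}
busy = {name: rate > 20 for name, rate in rates.items()}

print(match_inst("^dks[23]d", busy))
print(nomatch_inst("^dks[23]d", busy))
```

Note that an instance whose logical value is already false stays false under both operators, as the table above specifies.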
The following aggregate operators map from an arithmetic expression to
an arithmetic expression of lower dimension.
┌─────────────────────────┬───────────┬──────────────────────────┐
│ Operators │ Type │ Explanation │
├─────────────────────────┼───────────┼──────────────────────────┤
│min_inst │ Extrema │ Minimum value across all │
│min_host │ │ set members in the asso‐ │
│min_sample │ │ ciated dimension │
├─────────────────────────┼───────────┼──────────────────────────┤
│max_inst │ Extrema │ Maximum value across all │
│max_host │ │ set members in the asso‐ │
│max_sample │ │ ciated dimension │
├─────────────────────────┼───────────┼──────────────────────────┤
│sum_inst │ Aggregate │ Sum of values across all │
│sum_host │ │ set members in the asso‐ │
│sum_sample │ │ ciated dimension │
├─────────────────────────┼───────────┼──────────────────────────┤
│avg_inst │ Aggregate │ Average value across all │
│avg_host │ │ set members in the asso‐ │
│avg_sample │ │ ciated dimension │
└─────────────────────────┴───────────┴──────────────────────────┘
The aggregate operators count_inst, count_host and count_sample map
from a logical expression to an arithmetic expression of lower dimen‐
sion by counting the number of set members for which the expression is
true in the associated dimension.
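The arithmetic aggregates and the count operators can be pictured with a short Python sketch that reduces one dimension of a set of values. The helper names and the per-disk rates are hypothetical, and the sketch assumes a non-empty set.

```python
def aggregate(values):
    """Reduce a set of arithmetic values to min, max, sum and average."""
    values = list(values)
    total = sum(values)
    return {
        "min": min(values),
        "max": max(values),
        "sum": total,
        "avg": total / len(values),
    }


def count_true(truth_values):
    """Count the set members for which a logical expression is true."""
    return sum(1 for v in truth_values if v)


# Made-up per-disk I/O rates for three instances.
rates = [10.0, 70.0, 40.0]
print(aggregate(rates))
print(count_true(rate > 60 for rate in rates))
```

The first call corresponds to min_inst, max_inst, sum_inst and avg_inst applied to the same expression; the second corresponds to count_inst applied to a comparison.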
For action rules, the following actions are defined:
┌──────────┬────────────────────────────────────────┐
│Operators │ Explanation │
├──────────┼────────────────────────────────────────┤
│alarm │ Raise a visible alarm with xconfirm(1) │
│print │ Display on standard output │
│shell │ Execute with sh(1) │
│stomp │ Send a STOMP message to a JMS server │
│syslog │ Append a message to system log file │
└──────────┴────────────────────────────────────────┘
Multiple actions may be separated by the & and | operators to specify
respectively sequential execution (both actions are executed) and
alternate execution (the second action will only be executed if the
execution of the first action returns a non-zero error status).
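The difference between the two action separators can be modelled with a small Python sketch. Actions are represented here by hypothetical functions returning an exit status (zero for success); the sketch only records which actions ran and makes no claim about how pmie combines the statuses.

```python
def run_sequential(first, second):
    """Model of 'a & b': both actions run, in order."""
    return [first(), second()]


def run_alternate(first, second):
    """Model of 'a | b': b runs only if a returns a non-zero status."""
    status = first()
    if status == 0:
        return [status]
    return [status, second()]


def succeed():
    """Hypothetical action that completes normally."""
    return 0


def fail():
    """Hypothetical action that reports an error."""
    return 1


print(run_sequential(fail, succeed))  # both actions ran
print(run_alternate(succeed, fail))   # second action skipped
print(run_alternate(fail, succeed))   # second action ran as a fallback
```

This is why an action that can fail, such as stomp when the connection to the JMS server has been lost, can usefully be paired with a fallback action using |.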
Arguments to actions are an optional suppression time, and then one or
more expressions (a string is an expression in this context). Strings
appearing as arguments to an action may include the following special
selectors that will be replaced at the time the action is executed.
%h Host(s) that make the left-most top-level expression in the condi‐
tion true.
%i Instance(s) that make the left-most top-level expression in the
condition true.
%v Value(s) from the left-most top-level expression in the condition
subject to the host and instance assignments that make the condi‐
tion true.
Note that expansion of the special selectors is done by repeating the
whole argument once for each unique binding to any of the qualifying
special selectors. For example if a rule were true for the host mumble
with instances grunt and snort, and for host fumble the instance puff
makes the rule true, then the action
...
-> shell myscript "Warning: %h-%i busy ";
will execute myscript with the argument string "Warning: mumble-grunt
busy Warning: mumble-snort busy Warning: fumble-puff busy".
By comparison, if the action
...
-> shell myscript "'Warning! busy:" " %i@%h" "'";
were executed under the same circumstances, then myscript would be
executed with the argument string "'Warning! busy: grunt@mumble
snort@mumble puff@fumble'".
The semantics of the expansion of the special selectors lead to a
common usage, where the first argument is a constant (contains no
special selectors), the second argument contains the desired special
selectors with minimal separator characters, and an optional third
argument provides a constant postscript (e.g. to terminate any argument
quoting from the first argument). If necessary, post-processing (e.g. in
myscript) can provide the necessary enumeration over each unique
expansion of the string containing just the special selectors.
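The expansion rule for the special selectors can be reproduced with a minimal Python sketch. It handles only %h and %i (not %v), the function names are hypothetical, and joining the expanded arguments without a separator is an assumption inferred from the two examples above rather than a statement about pmie's internals.

```python
def expand_argument(template, bindings):
    """Repeat the whole argument once per unique binding of its selectors."""
    selectors = [s for s in ("%h", "%i") if s in template]
    if not selectors:
        return template  # constant argument: emitted once, unchanged
    seen = set()
    pieces = []
    for binding in bindings:
        key = tuple(binding[s] for s in selectors)
        if key in seen:
            continue  # this combination has already been emitted
        seen.add(key)
        text = template
        for s in selectors:
            text = text.replace(s, binding[s])
        pieces.append(text)
    return "".join(pieces)


def build_action_string(arguments, bindings):
    """Expand each argument in turn and concatenate the results."""
    return "".join(expand_argument(arg, bindings) for arg in arguments)


# Bindings from the example: two instances on mumble, one on fumble.
bindings = [
    {"%h": "mumble", "%i": "grunt"},
    {"%h": "mumble", "%i": "snort"},
    {"%h": "fumble", "%i": "puff"},
]

print(build_action_string(["Warning: %h-%i busy "], bindings))
print(build_action_string(["'Warning! busy:", " %i@%h", "'"], bindings))
```

Splitting the text into a constant prefix, a selector-only middle and a constant suffix keeps the repeated part small, which is the usage pattern the preceding paragraph recommends.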
For complex conditions, the bindings to these selectors are not obvious.
It is strongly recommended that pmie be used in the debugging mode
(specify the -W command line option in particular) during rule
development.
SCALE FACTORS
Scale factors may be appended to arithmetic expressions and force lin‐
ear scaling of the value to canonical units. Simple scale factors are
constructed from the keywords: nanosecond, nanosec, nsec, microsecond,
microsec, usec, millisecond, millisec, msec, second, sec, minute, min,
hour, byte, Kbyte, Mbyte, Gbyte, Tbyte, count, Kcount, Mcount, Gcount
and Tcount, and the operator /, for example ``Kbytes / hour''.
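The linear scaling implied by a scale factor can be sketched in Python. The table below covers only some of the keywords listed above, and it assumes that the space keywords are binary multiples (1 Kbyte = 1024 bytes) while the count keywords are decimal multiples; both multipliers are assumptions of this sketch and should be checked against pmLookupDesc(3). The function name to_canonical is hypothetical.

```python
# Multipliers to canonical units (bytes, seconds, counts); assumed values.
SPACE = {"byte": 1, "Kbyte": 1024, "Mbyte": 1024**2, "Gbyte": 1024**3}
TIME = {"sec": 1.0, "min": 60.0, "hour": 3600.0}
COUNT = {"count": 1, "Kcount": 10**3, "Mcount": 10**6}
UNITS = {**SPACE, **TIME, **COUNT}


def to_canonical(value, unit, per=None):
    """Scale a value in 'unit' (optionally 'unit / per') to canonical units."""
    scaled = value * UNITS[unit]
    if per is not None:
        scaled = scaled / TIME[per]
    return scaled


print(to_canonical(3600, "Kbyte", "hour"))  # bytes per second
print(to_canonical(5, "min"))               # seconds
print(to_canonical(5000, "count", "sec"))   # counts per second
```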
MACROS
Macros are defined using expressions of the form:
name = constexpr;
Here name follows the normal rules for variables in programming
languages, i.e. an alphabetic character optionally followed by
alphanumerics. constexpr must be a constant expression, either a string
(enclosed in double quotes) or an arithmetic expression optionally
followed by a scale factor.
Macros are expanded when their name, prefixed by a dollar sign ($), appears
in an expression, and macros may be nested within a constexpr string.
The following reserved macro names are understood.
minute Current minute of the hour.
hour Current hour of the day, in the range 0 to 23.
day Current day of the month, in the range 1 to 31.
month Current month of the year, in the range 0 (January) to 11
(December).
year Current year.
day_of_week
Current day of the week, in the range 0 (Sunday) to 6 (Satur‐
day).
delta Sample interval in effect for this expression.
Dates and times are presented in the reporting time zone (see descrip‐
tion of -Z and -z command line options above).
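The ranges of the reserved date and time macros differ from some common conventions (months start at 0, and the week starts on Sunday). The Python sketch below derives the same values from a datetime object; the function name is hypothetical, a four-digit year is assumed, and the delta macro is omitted because it depends on the expression rather than on the clock.

```python
from datetime import datetime


def reserved_macros(now):
    """Map a datetime onto the ranges documented for the reserved macros."""
    return {
        "minute": now.minute,
        "hour": now.hour,                         # 0 to 23
        "day": now.day,                           # 1 to 31
        "month": now.month - 1,                   # 0 (January) to 11
        "year": now.year,                         # assumed four-digit
        "day_of_week": (now.weekday() + 1) % 7,   # 0 (Sunday) to 6
    }


# The timestamp used in the -e option example: Tue Feb 6 19:55:10 2001.
macros = reserved_macros(datetime(2001, 2, 6, 19, 55, 10))
print(macros)
```

For this timestamp the month is 1 (February) and the day of the week is 2 (Tuesday), which is worth keeping in mind when writing conditions on $month or $day_of_week.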
AUTOMATIC RESTART
It is often useful for pmie processes to be started and stopped when
the local host is booted or shut down, or when they have been detected
as no longer running (when they have unexpectedly exited for some rea‐
son). Refer to pmie_check(1) for details on automating this process.
EVENT MONITORING
It is common for production systems to be monitored in a central loca‐
tion. Traditionally on UNIX systems this has been performed by the
system log facilities - see logger(1), and syslogd(1). On Windows,
communication with the system event log is handled by pcp-eventlog(1).
pmie fits into this model when rules use the syslog action. Note that
if the action string begins with -p (priority) and/or -t (tag) then
these are extracted from the string and treated in the same way as in
logger(1) and pcp-eventlog(1).
However, it is common to have other event monitoring frameworks also,
into which you may wish to incorporate performance events from pmie.
You can often use the shell action to send events to these frameworks,
as they usually provide their own program for injecting events into the
framework from external sources.
A final option is use of the stomp (Streaming Text Oriented Messaging
Protocol) action, which allows pmie to connect to a central JMS (Java
Message Service) server and send events to the PMIE topic. Tools can
be written to extract these text messages and present them to opera‐
tions people (via desktop popup windows, etc). Use of the stomp action
requires a stomp configuration file to be set up, which specifies the
location of the JMS server host, port number, and username/password.
The format of this file is as follows:
host=messages.sgi.com # this is the JMS server (required)
port=61616 # and it's listening here (required)
timeout=2 # seconds to wait for server (optional)
username=joe # (required)
password=j03ST0MP # (required)
topic=PMIE # JMS topic for pmie messages (optional)
The timeout value specifies the time (in seconds) that pmie should wait
for acknowledgements from the JMS server after sending a message (as
required by the STOMP protocol). Note that on startup, pmie will wait
indefinitely for a connection, and will not begin rule evaluation until
that initial connection has been established. Should the connection to
the JMS server be lost at any time while pmie is running, pmie will
attempt to reconnect on each subsequent truthful evaluation of a rule
with a stomp action, but not more than once per minute. This is to
avoid contributing to network congestion. In this situation, where the
STOMP connection to the JMS server has been severed, the stomp action
will return a non-zero error value.
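A script that generates or checks a stomp configuration file might validate it along the following lines. This Python sketch is not pmie's own parser: the function name is hypothetical, treating everything after # as a comment is an assumption based on the example above (so a password containing # would be mishandled), and defaulting the topic to PMIE is likewise an assumption.

```python
REQUIRED_KEYS = ("host", "port", "username", "password")

SAMPLE = """\
host=messages.sgi.com   # this is the JMS server (required)
port=61616              # and it's listening here (required)
timeout=2               # seconds to wait for server (optional)
username=joe            # (required)
password=j03ST0MP       # (required)
topic=PMIE              # JMS topic for pmie messages (optional)
"""


def parse_stomp_config(text):
    """Parse key=value lines, ignoring comments, and check required keys."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected key=value, got: " + raw_line)
        settings[key.strip()] = value.strip()
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    settings.setdefault("topic", "PMIE")  # assumed default topic
    return settings


config = parse_stomp_config(SAMPLE)
print(config["host"], config["port"], config["topic"])
```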
FILES
$PCP_DEMOS_DIR/pmie/*
annotated example rules
$PCP_VAR_DIR/pmns/*
default PMNS specification files
$PCP_TMP_DIR/pmie
pmie maintains files in this directory to identify the run‐
ning pmie instances and to export runtime information about
each instance - this data forms the basis of the pmcd.pmie
performance metrics
$PCP_PMIECONTROL_PATH
the default set of pmie instances to start at boot time -
refer to pmie_check(1) for details
$PCP_VAR_DIR/config/pmie/*
the predefined alarm action scripts (email, log, popup and
syslog), the example action script (sample) and the concurrent
action control file (control.master).
/usr/pcp/lib/pmie-common
common shell procedures for the predefined alarm action
scripts
BUGS
The lexical scanner and parser will attempt to recover after an error
in the input expressions. Parsing resumes after skipping input up to
the next semi-colon (;); however, during this skipping process the
scanner is ignorant of comments and strings, so an embedded semi-colon may
cause parsing to resume at an unexpected place. This behavior is
largely benign, as until the initial syntax error is corrected, pmie
will not attempt any expression evaluation.
PCP ENVIRONMENT
Environment variables with the prefix PCP_ are used to parameterize the
file and directory names used by PCP. On each installation, the file
/etc/pcp.conf contains the local values for these variables. The
$PCP_CONF variable may be used to specify an alternative configuration
file, as described in pcp.conf(4).
UNIX SEE ALSO
logger(1).
WINDOWS SEE ALSO
pcp-eventlog(1).
SEE ALSO
PCPIntro(1), pmcd(1), pmdumplog(1), pmieconf(1), pmie_check(1),
pminfo(1), pmlogger(1), pmval(1), PMAPI(3), pcp.conf(4) and pcp.env(4).
USER GUIDE
For a more complete description of the pmie language, refer to the
Performance Co-Pilot User's and Administrator's Guide. This is distributed
in insight(1) format as part of the pcp.books subsystem, or in HTML
format from:
http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?\
db=bks&fname=/SGI_Admin/books/PCP_IRIX/sgi_html/ch05.html
Performance Co-Pilot SGI PMIE(1)