r10k_counters man page on IRIX

PERF_COUNTERS(5)					      PERF_COUNTERS(5)

NAME
     r10k_evcntrs, r10k_event_counters, r10k_counters - Programming the
     processor event counters

DESCRIPTION
     The R1x000 processors include counters that can be used to count the
     frequency of events during the execution of a program.  The information
     returned by the counters can be helpful in optimizing the program.	 The
     perfex(1) and ssrun(1) commands provide convenient interfaces to hardware
     counter information.

THE COUNTERS
     The R10000 processor supplies two performance counters for counting
     certain hardware events. Each counter can track one event at a time and
     there is a choice of sixteen events per counter. There are also two
     associated control registers which are used to specify which event the
     relevant counter is counting.

     The R12000 and R14000 processors supply two performance counters for
     counting hardware events. Each counter can track one event at a time, and
     you can choose among 32 events per counter.

     Using performance counters in a machine with both R10000 and
     R12000/R14000 processors is currently undefined.

     Each counter is a 32-bit read / write register and is incremented by one
     each time the event specified in its associated control register occurs.
     Furthermore, the control registers allow one to indicate that the events
     are only counted in a specific mode. The modes may be user mode or
     several choices of kernel mode, or some combination of kernel and user
     mode.

     The counters can optionally assert an interrupt upon overflow, which is
     defined to be when the most significant bit of one of the counter
     registers (bit 31) becomes set. If such an overflow interrupt is enabled
     for that event in the associated control register, then the interrupt
     will be presented to the cpu. Whether or not the interrupt is asserted,
     the counting of events will continue after overflow.

THE CONTROL REGISTERS
     The format of the control registers is as follows:

      31            8           4     3     2     1     0
     +-------------+---------+-----+-----+-----+-----+-----+
     |      0      |  Event  | IE  |  U  |  S  |  K  | EXL |
     +-------------+---------+-----+-----+-----+-----+-----+

     Bit 4 is the interrupt enable bit, which specifies whether overflows for
     the specified event will generate interrupts or not. Bits 3 through 0
     specify either the mode the event is counted in or the count enable bits.
     These bits will enable counting when they match the equivalent KSU
     settings in the status register of the R10000 or R12000/R14000. That is:

     U bit <----> KSU = 2, EXL = 0, ERL = 0 (user mode)

     S bit <----> KSU = 1, EXL = 0, ERL = 0 (supervisor mode, not supported)

     K bit <----> KSU = 0, EXL = 0, ERL = 0 (kernel mode)

     EXL bit <---> EXL = 1, ERL = 0 (transient kernel mode)

     ERL is a field in the status register on coprocessor 0.  It is set when
     the processor hits an error and is forced into kernel mode.

     If the KSU bits in the status register are 2, and the ERL and EXL bits
     are both off, events enabled with the U bit will be counted. In this way,
     a program that intends to use the performance counters directly must
     specify the events that are to be counted and the modes in which they are
     to be counted.
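
     The bit layout and mode matching described above can be sketched in C
     as follows. This is only an illustration: the macro and function names
     are hypothetical and do not come from any IRIX header, and the event
     field is assumed to start at bit 5, directly above the IE bit, as the
     register diagram suggests.

```c
#include <stdint.h>

/* Bit positions read off the control register diagram (names are made up). */
#define PERF_CTL_EXL          (1u << 0)  /* count when EXL = 1, ERL = 0 */
#define PERF_CTL_K            (1u << 1)  /* count in kernel mode */
#define PERF_CTL_S            (1u << 2)  /* count in supervisor mode */
#define PERF_CTL_U            (1u << 3)  /* count in user mode */
#define PERF_CTL_IE           (1u << 4)  /* interrupt on overflow */
#define PERF_CTL_EVENT_SHIFT  5          /* assumed: event field sits above IE */

/* Build a control register value from an event number and mode bits. */
static uint32_t perf_ctl_encode(unsigned event, int intr_enable,
                                uint32_t mode_bits)
{
    uint32_t v = ((uint32_t)event << PERF_CTL_EVENT_SHIFT) | (mode_bits & 0xFu);
    if (intr_enable)
        v |= PERF_CTL_IE;
    return v;
}

/* Return nonzero if the event would be counted for the given status
 * register fields, following the U/S/K/EXL table above. */
static int perf_ctl_counts(uint32_t ctl, unsigned ksu, unsigned exl,
                           unsigned erl)
{
    if (erl)                      /* every listed mode requires ERL = 0 */
        return 0;
    if (exl)                      /* transient kernel mode */
        return (ctl & PERF_CTL_EXL) != 0;
    switch (ksu) {
    case 2:  return (ctl & PERF_CTL_U) != 0;
    case 1:  return (ctl & PERF_CTL_S) != 0;
    case 0:  return (ctl & PERF_CTL_K) != 0;
    default: return 0;
    }
}
```

     For instance, perf_ctl_encode(9, 0, PERF_CTL_U) describes event 9
     counted in user mode only, with no overflow interrupt.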

EVENTS
     The following events can be tracked by the performance counters on R10000
     processors:

     0=cycles
	  Incremented on each clock cycle.

     1=issued instructions
	  Incremented each time an instruction is issued to ALU, FPU or
	  load/store units.

     2=issued loads
	  Incremented when a load, prefetch, or synchronization instruction is
	  issued.

     3=issued stores
	  Incremented when a store instruction is issued.

     4=issued store conditionals
	  Incremented when a conditional store instruction is issued.

     5=failed store conditionals
	  Incremented when a store-conditional instruction fails. A failed
	  store-conditional instruction will, in the normal course of events,
	  graduate; so this event represents a subset of the store conditional
	  instructions counted on event 20 (graduated store conditionals).

     6=Decoded branches
	  Incremented when a branch is decoded (for revision 2.x processors)
	  or resolved (for revision 3.x processors).

     7=Quadwords written back from secondary cache
	  Incremented when data is written back from secondary cache to the
	  system interface.

     8=correctable secondary cache data array ECC errors
	  Incremented when single-bit ECC errors are detected on data read from
	  secondary cache.

     9=primary instruction cache misses
	  Incremented when the next instruction is not in primary instruction
	  cache.

     10=secondary instruction cache misses
	  Incremented when the next instruction is not in secondary
	  instruction cache.

     11=instruction misprediction from secondary cache way prediction table
	  Incremented when the secondary cache way mispredicted an
	  instruction.

     12=external interventions
	  Incremented when an external intervention is entered into the Miss
	  Handling Table (MHT), provided that the intervention is not an
	  invalidate type.

     13=external invalidations
	  Incremented when an intervention is entered into the Miss Handling
	  Table, provided that the intervention is an invalidate type.

     14=virtual coherency conditions or ALU/FPU completion cycles
	  Incremented on virtual coherency conditions (on revision 2.x R10000
	  processors) or on ALU/FPU functional unit completions cycles (on
	  revision 3.x R10000 processors).

     15=graduated instructions
	  Incremented when an instruction is graduated.

     16=cycles
	  Incremented on each clock cycle.

     17=graduated instructions
	  Incremented when an instruction is graduated.

     18=graduated loads
	  Incremented on a graduated load, prefetch, or synchronization
	  instruction.

     19=graduated stores
	  Incremented on a graduated store instruction.

     20=graduated store conditionals
	  Incremented when a graduated conditional store instruction is
	  issued.

     21=graduated floating-point instructions
	  Incremented when a graduated floating-point instruction is issued.

     22=quadwords written back from primary data cache
	  Incremented when data is written back from primary data cache to
	  secondary cache.

     23=TLB misses
	  Incremented when a translation lookaside buffer (TLB) refill
	  exception occurs.

     24=mispredicted branches
	  Incremented when a branch is mispredicted.

     25=primary (L1) data cache misses.
	  Incremented when the next data item is not in primary data cache.

     26=secondary (L2) data cache misses.
	  Incremented when the next data item is not in secondary data cache.

     27=data mispredicted from secondary cache way prediction table
	  Incremented when the secondary cache way mispredicted a data item.

     28=external intervention hits in secondary cache (L2)
	  Set as follows when an external intervention is determined to have
	  hit in secondary cache:
	  00   Invalid, no hit detected
	  01   Clean, shared
	  10   Clean, exclusive
	  11   dirty, exclusive

     29=external invalidation hits in secondary cache (L2)
	  Set when an external invalidate request is determined to have hit in
	  the secondary cache. Its value is equivalent to that described for
	  event 28.

     30=store/fetch exclusive to clean block in secondary cache (L2)
	  Incremented on each cycle by the number of entries in the Miss
	  Handling Table (MHT) waiting for a memory operation to complete.

     31=store/fetch exclusive to shared block in secondary cache (L2)
	  Incremented when an update request is issued for a line in the
	  secondary cache. If the line is in the clean state, the counter is
	  incremented by one. If the line is in the shared state, the counter
	  is incremented by two. The conditional counting mechanism can be
	  used to select whether one, both, or neither of these events is
	  chosen.

     Note that the definitions of events 6 and 14 on counter 0 differ depending
     on the R10000 chip revision.  The chip revision can be determined via the
     command hinv(1).

     The following events can be tracked by the performance counters on R12000
     and R14000 processors:

     0=cycles
	  Incremented on each clock cycle.

     1=decoded instructions
	  Incremented by the total number of instructions decoded on the
	  previous cycle. Since decoded instructions may later be killed (for
	  a variety of reasons), this count reflects the overhead due to
	  incorrectly speculated branches and exception processing.

     2=decoded loads
	  Incremented when a load instruction was decoded on the previous
	  cycle. Prefetch, cache operations, and synchronization instructions
	  are not included in the count of decoded loads.

     3=decoded stores
	  Incremented if a store instruction was decoded on the previous
	  cycle. Store conditionals are included in this count.

     4=Miss Handling Table occupancy
	  Incremented on each cycle by the number of currently valid entries
	  in the Miss Handling Table (MHT). The MHT has five entries.  Four
	  entries are used for internally generated accesses; the fifth entry
	  is reserved for externally generated events. All five entries are
	  included in this count. See event 8 for a related definition.

     5=failed store conditionals
	  Incremented when a store-conditional instruction fails. A failed
	  store-conditional instruction will, in the normal course of events,
	  graduate; so this event represents a subset of the store-conditional
	  instructions counted on event 20 (graduated store-conditionals).

     6=resolved conditional branches
	  Incremented both when a branch is determined to have been
	  mispredicted and when a branch is determined to have been correctly
	  predicted. When this determination of the accuracy of a branch-
	  prediction is known, the branch is known as "resolved." This counter
	  correctly reflects the case of multiple floating-point conditional
	  branches being resolved in a single cycle.

     7=Quadwords written back from secondary cache
	  Incremented on each cycle that the data for a quadword is written
	  back from secondary cache to the system interface unit.

     8=correctable secondary cache data array ECC errors
	  Incremented on the cycle following the correction of a single-bit
	  error in a quadword read from the secondary cache data array.

     9=primary instruction cache misses
	  Incremented one cycle after an instruction fetch request is entered
	  into the Miss Handling Table.

     10=secondary instruction cache misses
	  Incremented the cycle after a refill request is sent to the system
	  interface module of the CPU. This is normally just after the L2 tags
	  are checked and a miss is detected, but it may be delayed if the
	  system interface module is busy with another request.

     11=instruction misprediction from secondary cache way prediction table
	  Incremented when the secondary cache control begins to retry an
	  access because it hit in the unpredicted way, provided the request
	  that initiated the access was an instruction fetch.

     12=external interventions
	  Incremented on the cycle after an intervention is entered into the
	  Miss Handling Table, provided that the intervention is not an
	  invalidate type.

     13=external invalidations
	  Incremented on the cycle after an intervention is entered into the
	  Miss Handling Table, provided that the intervention is an invalidate
	  type.

     14=ALU/FPU progress cycles
	  Incremented on the cycle after either ALU1, ALU2, FPU1, or FPU2
	  marks an instruction as done.

     15=graduated instructions
	  Incremented by the number of instructions that were graduated on the
	  previous cycle. Integer multiply and divide instructions each count
	  two graduated instructions because they occupy two entries in the
	  active list.

     16=executed prefetch instructions
	  Incremented on the cycle after a prefetch instruction does its tag-
	  check, regardless of whether a data cache line refill is initiated.

     17=prefetch primary data cache misses
	  Incremented on the cycle after a prefetch instruction does its tag-
	  check and a refill of the corresponding data cache line is
	  initiated.

     18=graduated loads
	  Incremented by the number of loads that graduated on the previous
	  cycle. Prefetch instructions are included in this count. Up to four
	  loads can graduate in one cycle.

     19=graduated stores
	  Incremented on the cycle after a store graduates. Only one store can
	  graduate per cycle. Store conditionals are included in this count.

     20=graduated store conditionals
	  Incremented on the cycle following the graduation of a store-
	  conditional instruction. Both failed and successful store-
	  conditional instructions are included in this count; so successful
	  store-conditionals can be determined as the difference between this
	  event and event 5 (failed store-conditionals).

     21=graduated floating-point instructions
	  Incremented by the number of floating-point instructions that
	  graduated on the previous cycle. There can be 0 to 4 such
	  instructions.

     22=quadwords written back from primary data cache
	  Incremented on each cycle that a quadword of data is valid and is
	  written from primary data cache to secondary cache.

     23=TLB misses
	  Incremented on the cycle after the translation lookaside buffer
	  (TLB) miss handler is invoked.

     24=mispredicted branches
	  Incremented on the cycle after a branch is restored because it was
	  mispredicted.

     25=primary data cache misses
	  Incremented one cycle after a request is entered into the SCTP
	  logic, provided that the request was initially targeted at the
	  primary data cache. Such requests fall into three categories:
	  1) Primary data cache misses.

	  2) Requests to change the state of
	  secondary and primary data cache
	  lines from clean to dirty ("update"
	  requests) due to stores that hit
	  a clean line in the primary data
	  cache.

	  3) Requests initiated by cache
	  operation instructions.

     26=secondary data cache misses
	  Incremented the cycle after a refill request is sent to the system
	  interface module of the CPU. This is normally just after the L2 tags
	  are checked and a miss is detected, but it can be delayed if the
	  system interface module is busy with another request.

     27=data misprediction from secondary cache way prediction table
	  Incremented when the secondary cache control begins to retry an
	  access because it hit in the unpredicted way. The counter is
	  incremented only if the request that initiated the access was not an
	  instruction fetch.

     28=state of external intervention hits in secondary cache
	  Set on the cycle after an external intervention is determined to
	  have hit in the secondary cache. The value of the event is equal to
	  the state of the secondary cache line that was hit. Setting a
	  performance control register to select this event has a special
	  effect on the conditional counting behavior. If event 28 or 29 is
	  selected, the sense of the "Negated conditional counting" bit is
	  inverted. See the description of conditional counting for details.
	  The values are:
	  00   Invalid, no hit detected
	  01   Clean, shared
	  10   Clean, exclusive
	  11   dirty, exclusive

     29=state of invalidation hits in secondary cache (L2)
	  Set on the cycle after an external invalidate request is determined
	  to have hit in secondary cache. Its value is equivalent to that
	  described for event 28.

     30=Miss Handling Table entries accessing memory
	  Incremented on each cycle by the number of entries in the Miss
	  Handling Table (MHT) waiting for a memory operation to complete. It
	  is always less than or equal to the value tracked by event 4. An
	  entry is considered to begin accessing memory when the cache control
	  logic recognizes that a request must be sent via the SysA/D bus. An
	  entry is included in this count from that point until the entry is
	  removed from the MHT. For example, once the secondary cache tags are
	  checked and a secondary cache miss is recognized, the entry that
	  originated the request is included in this count. It continues to be
	  included until the last word of the refilled line is written into
	  the secondary cache and the MHT entry is removed. Unlike event 4,
	  the fifth slot of the MHT, which is reserved for externally
	  generated requests, is not included in this count.

     31=store/prefetch exclusive to shared block in secondary cache (L2)
	  Incremented on the cycle after an update request is issued for a
	  line in the secondary cache. If the line is in the clean state, the
	  counter is incremented by one. If the line is in the shared state,
	  the counter is incremented by two. The conditional counting
	  mechanism can be used to select whether one, both, or neither of
	  these events is chosen.
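
     Some of the events above are meant to be combined. The description of
     event 20 notes that successful store-conditionals are the difference
     between events 20 and 5. A minimal sketch of that arithmetic follows,
     together with a branch misprediction fraction built from events 24 and
     6. The function names are hypothetical, and treating event 24 divided
     by event 6 as a misprediction rate is an assumption of this sketch
     rather than something the text states.

```c
#include <stdint.h>

/* Successful store-conditionals = graduated (event 20) - failed (event 5).
 * The result is clamped at zero in case the two counts were sampled at
 * slightly different times. */
static uint64_t successful_store_conditionals(uint64_t graduated_sc,
                                              uint64_t failed_sc)
{
    return graduated_sc >= failed_sc ? graduated_sc - failed_sc : 0;
}

/* Fraction of resolved conditional branches (event 6) that were
 * mispredicted (event 24). Returns 0.0 when no branches were resolved. */
static double mispredict_fraction(uint64_t resolved_branches,
                                  uint64_t mispredicted_branches)
{
    if (resolved_branches == 0)
        return 0.0;
    return (double)mispredicted_branches / (double)resolved_branches;
}
```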

     The kernel maintains 64-bit virtual counters for the user program using
     the hardware counters. The view of the counters as being 64 bits wide is
     maintained through the programming interfaces that use them, even though
     the actual counters are only 32 bits. Similarly, there are only two
     hardware counters per CPU, but the programming interface supports the
     view that there are actually 32 counters. That is, a user program can
     specify that more than one event per hardware counter is to be counted,
     up to sixteen events per counter. The kernel will then multiplex the
     events across clock tick boundaries. So, if a program is tracking more
     than one event per counter, on every clock tick the kernel will check to
     see if it is necessary to switch the events being tracked. If necessary,
     it will save the counts for the previous events and set up the counters
     for the next event. Thus, to the program there are 32 64-bit counters
     available.
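
     The relationship between the 32 virtual counters and the two hardware
     counters can be sketched as follows. This is not kernel code: the names
     are hypothetical, the index split assumes the sixteen-events-per-counter
     arrangement described above, and the accumulation step only illustrates
     how a 32-bit reading might be folded into a 64-bit total.

```c
#include <stdint.h>

#define EVENTS_PER_COUNTER 16  /* sixteen virtual counters per hardware counter */

/* Which hardware counter (0 or 1) backs virtual counter `virt` (0..31). */
static unsigned hw_counter_of(unsigned virt)
{
    return virt / EVENTS_PER_COUNTER;
}

/* The event number, relative to that hardware counter. */
static unsigned hw_event_of(unsigned virt)
{
    return virt % EVENTS_PER_COUNTER;
}

/* Add the events seen since `hw_at_start` to a 64-bit virtual counter.
 * Unsigned 32-bit subtraction wraps, so the delta is still correct if the
 * hardware register rolled over once in between. */
static void fold_counts(uint64_t *virt_total, uint32_t hw_now,
                        uint32_t hw_at_start)
{
    *virt_total += (uint32_t)(hw_now - hw_at_start);
}
```

     Under these assumptions virtual counter 25 is event 9 on hardware
     counter 1.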

     The performance counters are available to the user program primarily
     through the perfex(1) and ssrun(1) commands.  You can also access the
     counters through the /proc(4) interface. A limited and more specialized
     functionality is also provided through the syssgi(2) interface, but this
     is not intended to be the general interface.

     Using perfex, you can select the events to be counted on hardware
     counters and the executable program to be run.  The perfex command prints
     the values of the hardware counters following the run.  See the perfex(1)
     man page for more information.

     The ssrun command is part of the SpeedShop performance analysis package,
     and it provides input to the WorkShop cvperf(1) user interface or, in
     ASCII format, to the prof(1) command.  See the various man pages, the
     SpeedShop User's Guide, and the Developer Magic: Performance Analyzer
     User's Guide for more information.

     Through /proc, ioctls allow you to start or stop using the counters, to
     read the counts in your own counters, or to modify the way the counters
     are being used. Since this interface specifies a process ID as a
     parameter, it is possible, in general, for a process to read or
     manipulate the counters of another process, as long as the process
     belongs to the same process group or is root.

     There are also ioctls that allow the program to specify overflow
     thresholds on a per-event basis and to supply a signal to be sent to the
     program upon overflow. That is, the fact that an interrupt can be
     generated whenever a particular counter overflows can be exploited to
     allow a program to specify a threshold n for an event such that after n
     occurrences of the event an interrupt will be generated. In addition,
     while the kernel is servicing the counter overflow interrupt, it can
     perform some user-specified action, such as sending a user-specified
     signal to the program whenever an overflow is generated or incrementing a
     PC bucket for profiling. The latter choice is a more specialized
     functionality and is not part of the general /proc interface.
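
     One way an overflow threshold could be mapped onto the hardware is to
     preload the 32-bit counter so that bit 31 becomes set after exactly n
     events. The sketch below shows that arithmetic only. It is an
     assumption about a plausible approach, not a description of what the
     IRIX kernel does, and the names are hypothetical.

```c
#include <stdint.h>

#define PERF_OVERFLOW_BIT 0x80000000u  /* bit 31: overflow as defined above */

/* Initial counter value so that bit 31 becomes set after n more events.
 * Meaningful for 1 <= n <= 2^31. */
static uint32_t preload_for_threshold(uint32_t n)
{
    return PERF_OVERFLOW_BIT - n;
}

/* Overflow is defined as bit 31 of the counter being set. */
static int has_overflowed(uint32_t counter)
{
    return (counter & PERF_OVERFLOW_BIT) != 0;
}
```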

     For a process using the counters in user mode, the control block for the
     counters is kept in the u-area. Thus, once the process forks, the child
     acquires the same state of the counters as the parent, which implies that
     the next time the child runs the performance counters will be run for the
     child, tracking the same events as its parent. Therefore, the counter
     values are zeroed for the child upon fork so that at a later time the
     child's counters will accurately depict the activity of the child. For
     this reason, it is possible for the parent to fork and then wait for the
     child to exit. When the child exits, if the kernel sees that the parent
     is waiting for the child it will add the child's 64-bit counters to those
     of the parent, and the parent will thus have the event trace of the
     child. Other methods for a parent to acquire a child's counters are
     discussed with the PIOCSAVECCNTRS ioctl.

Operation Modes for the Performance Counters
     There are two basic modes that the counters are used in, user mode and
     system mode. Using them in user mode allows the counters to be shared
     among any number of user programs. In this mode the kernel saves and
     restores the counts and state of the counters across context switch
     boundaries. System mode is defined when a user with root privileges uses
     the counters in kernel mode (user mode and/or EXL mode may also be
     specified, but kernel mode is essential). In this mode there are no
     context switch boundaries and so other programs will not be able to use
     the counters when they are in use in system mode.

     Therefore, when the counters are already in use in user mode, a program
     which attempts to use them in system mode will fail with EBUSY since the
     two modes cannot co-exist (unless certain commands are employed to force
     releasing of the counters in user mode and the acquiring of them in
     system mode, to be discussed later). Likewise, if the counters are in use
     in system mode, any program attempting to use the counters will fail with
     EBUSY (root-level or otherwise).

     The approach taken to these two operating modes is that system mode has a
     higher priority. For this reason there is a syssgi command to forcibly
     acquire the counters in system mode if the counters are not actively in
     use by a running program. And any users of the counters who are not
     currently running will not be able to acquire them when they run again.
     This latter situation holds at all times. That is, there may be several
     programs sharing the counters in user mode. If at any moment they happen
     to all be switched out, the counters are temporarily free. At this point
     it is possible for a super-user to acquire the counters in system mode.
     Then, when the other programs are run again, they won't be able to
     acquire the counters since they are in use in system mode. Since such a
     program would then run without the intended event counting, the kernel
     arranges that it will not use the counters again unless they are
     explicitly restarted. This is because the values in the counters are no
     longer representative of the program.

     To re-iterate, a root-level program may receive EBUSY from the kernel if
     it tries to acquire the counters in system mode through /proc and they
     are actively in use at the time of the system call. If they are in use in
     user mode by other programs but those programs are not running at the
     time of the system call, then the counters will be successfully acquired
     in system mode and the other programs will not be able to acquire them
     again; the kernel will not try to start up the counters for those other
     programs again.

     In order to make this situation visible to the program, a generation
     number is employed to reflect the current state of the counters. In this
     case, whenever the kernel does turn off the use of the counters for a
     program because the mode of operation has switched from user mode to
     system mode, the generation number for the counters for the user programs
     will be increased. Thus, subsequent reads of the counters will return the
     new number and should signal the program that the counter values are not
     to be trusted. The number will be discussed in greater detail later.
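
     A program can use the generation number to decide whether two counter
     readings belong to the same measurement. A minimal sketch, assuming the
     program has saved the generation number returned when it acquired the
     counters and the one returned by a later read (the function name is
     hypothetical):

```c
/* Return nonzero if counts read at `gen_now` can be compared with the
 * state established at `gen_start`. The ioctls return a positive
 * generation number on success, so non-positive values indicate failure. */
static int counts_are_trustworthy(int gen_start, int gen_now)
{
    if (gen_start <= 0 || gen_now <= 0)
        return 0;
    return gen_start == gen_now;
}
```

     If the check fails, the program would typically discard the interval
     and, if it still wants measurements, restart the counters explicitly.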

     To support using the counters in system mode, each cpu has its own
     control block for the counters, pointed to in its private area. There is
     also a global counter control block which maintains counter state for the
     entire system. When the counters are being used in system mode they are
     not read and stored across context switch boundaries. In fact, unless
     they are explicitly read by a program, the counters are not read by the
     kernel until there is an overflow interrupt. When this occurs the cpu on
     which the interrupt occurs updates its own private virtual counters; no
     changes are made to the global counter control block.

     When the counters are read in system mode via PIOCGETEVCTRS through
     /proc, the per-cpu counters are all added together into the global
     counters so that the global counters represent the sum total of the
     counted events for the entire system. This same coalescing of the per-cpu
     counters happens when the counters are released. Note that it is also
     possible to read a particular cpu's counters via the syssgi
     HWPERF_GET_CPUCNTRS command.

		 /proc Commands for the Performance Counters

     To support the /proc interface for the counters, there are several data
     structures defined in /usr/include/sys/hwperftypes.h that are used to
     either pass parameters with the calls or to receive data back from the
     kernel.

     struct hwperf_ctrlreg {
	     ushort_t	  hwp_ev  :11, /* event counted */
			  hwp_ie  :1,  /* overflow intr enable */
			  hwp_mode:4;  /* user/kernel/EXL */
     };

     typedef union {
	     short		   hwperf_spec;
	     struct hwperf_ctrlreg hwperf_creg;
     } hwperf_ctrl_t;

     typedef struct {
	     hwperf_ctrl_t hwp_evctrl[HWPERF_EVENTMAX];
     } hwperf_eventctrl_t;

     Each event is described to the kernel through an hwperf_ctrl_t. Where
     relevant, the ioctls take the address of an hwperf_eventctrl_t, the array
     of 32 hwperf_ctrl_t's. If the user is not interested in an event, then
     care must be taken to ensure that the corresponding element in this array
     is zero.

     For a user to gain access to the counters, it must indicate which events
     are of interest and how they are to be counted; whether overflow
     thresholds are to be used to generate overflow interrupts or not, and
     what those thresholds are per event; and what signal the user program
     would like to receive from the kernel upon overflow interrupt. All of
     this information is conveyed with the structure hwperf_profevctrarg_t:

     typedef struct hwperf_profevctrarg {
	  hwperf_eventctrl_t hwp_evctrargs;
	  int		     hwp_ovflw_freq[HWPERF_EVENTMAX];
	  int		     hwp_ovflw_sig; /* SIGUSR1,2 */
     } hwperf_profevctrarg_t;

     With the above structure as parameter the user program must take care to
     zero the hwp_ovflw_freq elements for which no overflow thresholds are
     intended. The hwp_ovflw_sig field is used to tell the kernel which signal
     the program wants to receive upon overflow interrupt. The acceptable
     signals are between 1 and 32 (SIG32). This field should be zero if no
     signals are wanted.

     The following structure is an array of 32 64-bit virtual counters and is
     used when a program wants to read the virtual counters of a process:

     typedef struct {
	  __uint64_t hwp_evctr[HWPERF_EVENTMAX];
     } hwperf_cntr_t;
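
     The zeroing rules above are easy to satisfy by clearing the whole
     argument structure first and then filling in only the events of
     interest. The sketch below uses local stand-in types that mirror the
     definitions shown above so that it compiles without
     /usr/include/sys/hwperftypes.h. The stand-in names, the helper
     functions, and the choice to enable the overflow interrupt only when a
     threshold is given are all assumptions made for illustration.

```c
#include <string.h>

#define EVENTMAX 32  /* stand-in for HWPERF_EVENTMAX */

/* Local mirrors of the structures shown above (not the real headers). */
struct ctrlreg { unsigned short ev:11, ie:1, mode:4; };
typedef union { short spec; struct ctrlreg creg; } ctrl_t;
typedef struct { ctrl_t evctrl[EVENTMAX]; } eventctrl_t;
typedef struct {
    eventctrl_t evctrargs;
    int         ovflw_freq[EVENTMAX];
    int         ovflw_sig;
} profevctrarg_t;

/* Clear everything: no events, no thresholds, no overflow signal. */
static void init_args(profevctrarg_t *a)
{
    memset(a, 0, sizeof *a);
}

/* Request one event. `idx` is the slot in the 32-entry array, `ev` the
 * event number relative to its hardware counter, `mode` the mode bits,
 * and `threshold` an overflow threshold (0 for none). */
static void request_event(profevctrarg_t *a, int idx, unsigned ev,
                          unsigned mode, int threshold)
{
    a->evctrargs.evctrl[idx].creg.ev   = ev;
    a->evctrargs.evctrl[idx].creg.mode = mode;
    a->evctrargs.evctrl[idx].creg.ie   = threshold > 0;
    a->ovflw_freq[idx] = threshold;
}
```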

     The ioctls available through /proc are the following:

     PIOCENEVCTRS   - Start using the counters for a process, either in user
		      mode or system mode. It initializes the counters for the
		      target process and, if the process is running, starts
		      them. Otherwise, the counters will be started the next
		      time the process is run. Fails with EINVAL if events are
		      specified improperly, or if an input overflow
		      frequency (threshold) is negative.

		      If supervisor or kernel mode is specified for any of
		      the events and the caller does not have root privileges,
		      it will fail with EPERM. EBUSY may be returned for two
		      possible reasons:
		      (1) the counters are already in use in system mode or,
		      (2) the caller is requesting the counters in system
		      mode and, at the time of the request, the counters are
		      in use in user mode, on at least one cpu (this command
		      will not forcibly acquire the counters for a root
		      process).

		      Returns a positive generation number if successful.

     PIOCGETEVCTRS  - Read the virtual counters of the target process.
		      The address of an hwperf_cntr_t must be supplied in
		      the call.

		      Returns a positive generation number if successful.

     PIOCGETEVCTRL  - Retrieve the control information for the process's
		      counters: which events are being counted and the mode
		      they are being counted in. The kernel will copyout an
		      array of 32 event specifiers, so the user must supply
		      an address of an hwperf_eventctrl_t.

		      Returns a positive generation number if successful.

     PIOCSETEVCTRL  - Modify how a program is using the counters, whether it
		      be events and/or their associated mode of operation, or
		      overflow threshold values, or overflow signal. Once the
		      counters have been acquired this is how their operation
		      for a program is modified without releasing the
		      counters. Each time the PIOCSETEVCTRL is made the
		      generation number for the target process's counters will
		      be incremented. The parameter to this call is the
		      address of an hwperf_profevctrarg_t.

		      Returns a positive generation number if successful.

     PIOCRELEVCTRS  - Release the performance counters; the target process
		      will not have any events counted after this call. Note
		      that the virtual counters associated with the target
		      may still be read as long as the process has not exited.
		      No parameters are necessary.

     PIOCSAVECCNTRS - Allow a parent process to receive the counter values
		      of one of its children when the child exits, without
		      having to wait for the child (when the parent is
		      waiting, no explicit call is necessary). When the
		      child exits, its counter values are added to the
		      parent's, whether or not the parent is using its
		      counters. No parameters are necessary other than the
		      target pid.
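
     The generation numbers returned by the commands above let a caller
     detect that the counter setup changed between two calls. The following
     is a minimal sketch of that check using an in-memory mock rather than
     the real /proc ioctl interface. struct mock_counters and the mock_*
     functions are hypothetical names, and the starting generation value of
     1 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-process counter state. */
struct mock_counters {
    int in_use;      /* nonzero once the counters have been acquired */
    int generation;  /* bumped every time the setup is modified */
};

/* Models acquiring the counters: returns a positive generation
 * number, or -1 if they are already in use. */
int mock_acquire(struct mock_counters *c)
{
    assert(c != NULL);
    if (c->in_use)
        return -1;
    c->in_use = 1;
    c->generation = 1;  /* assumed starting value */
    return c->generation;
}

/* Models PIOCSETEVCTRL: each successful call increments the generation. */
int mock_set_control(struct mock_counters *c)
{
    assert(c != NULL);
    if (!c->in_use)
        return -1;
    c->generation += 1;
    return c->generation;
}

/* Models a read: returns the current generation (the counts
 * themselves are omitted from this sketch). */
int mock_read(const struct mock_counters *c)
{
    assert(c != NULL);
    return c->in_use ? c->generation : -1;
}

/* Nonzero if a read still belongs to the setup the caller started with. */
int mock_same_setup(int generation_at_start, int generation_at_read)
{
    return generation_at_read > 0 &&
           generation_at_start == generation_at_read;
}
```

     A caller would save the value returned when the counters are acquired
     and pass it, together with the value returned by a later read, to
     mock_same_setup(); a zero result means the counts should be discarded.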

EXAMPLE
     An example of how these commands would be used is given here. Suppose
     that we want to count instruction cache misses and data cache misses
     for our own program. That means counting event 9 on each counter
     (instruction cache misses on counter 0, data cache misses on counter
     1), in user mode. The following code would accomplish this. Note that
     the constants used are defined in /usr/include/sys/hwperfmacros.h, and
     evctr_args is an hwperf_profevctrarg_t.

     pid = getpid();
     sprintf(pfile, "/proc/%05d", pid);
     if ((fd = open(pfile, O_RDWR)) < 0) {
	 perror("failed to open /proc entry");
	 exit(errno);
     }
     for (i = 0; i < HWPERF_CNTEVENTMAX; i++) {
	 if (i == 9) {
	     evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_mode = HWPERF_CNTEN_U;
	     evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_ie = 1;
	     evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_ev = i;
	     evctr_args.hwp_ovflw_freq[i] = 0;
	 } else {
	     evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_spec = 0;
	     evctr_args.hwp_ovflw_freq[i] = 0;
	 }
     }

     /* counter 1's events occupy indices HWPERF_CNT1BASE and up */
     for (i = HWPERF_CNT1BASE; i < HWPERF_EVENTMAX; i++) {
	 if (i == HWPERF_CNT1BASE + 9) {
	     evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_mode = HWPERF_CNTEN_U;
	     evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_ie = 1;
	     evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_ev = i - HWPERF_CNT1BASE;
	     evctr_args.hwp_ovflw_freq[i] = 0;
	 } else {
	     evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_spec = 0;
	     evctr_args.hwp_ovflw_freq[i] = 0;
	 }
     }
     evctr_args.hwp_ovflw_sig = 0;
     generation1 = ioctl(fd, PIOCENEVCTRS, (void *)&evctr_args);
     if (generation1 < 0) {
	 perror("failed to acquire counters");
	 exit(errno);
     }

	 . . . . . (body of program) . . . .

     /* now read the counter values */
     if ((generation2 = ioctl(fd, PIOCGETEVCTRS, (void *)&cnts)) < 0) {
	 perror("PIOCGETEVCTRS returns error");
	 exit(errno);
     }

     /* generation number should be the same */
     if (generation1 != generation2) {
	 printf("program lost event counters\n");
	 exit(0);
     }

     /* release the counters */
     if ((ioctl(fd, PIOCRELEVCTRS)) < 0) {
	 perror("prioctl PIOCRELEVCTRS returns error");
	 exit(errno);
     }

     /* print out the counts */
     printf("instruction cache misses: %lld\n", (long long)cnts.hwp_evctr[9]);
     printf("data cache misses: %lld\n", (long long)cnts.hwp_evctr[25]);
     exit(0);
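
     The example reads event 9 of counter 1 at index 25 of hwp_evctr. The
     index arithmetic behind that can be sketched as follows, assuming the
     sixteen-events-per-counter layout the example uses. The SKETCH_*
     constants and sketch_* functions are hypothetical; the values 16 and 32
     are inferred from the example and from the 32-entry control array, and
     the real definitions are in /usr/include/sys/hwperfmacros.h.

```c
#include <assert.h>

/* Assumed values, inferred from the example (counter 1, event 9 is
 * read at index 25) and from the 32-entry control array. */
enum {
    SKETCH_EVENTS_PER_CNTR = 16,
    SKETCH_CNT1BASE = 16,
    SKETCH_EVENTMAX = 32
};

/* Map (counter, event on that counter) to an index into the
 * 32-entry arrays such as hwp_evctrl[] and hwp_evctr[]. */
int sketch_event_index(int counter, int event)
{
    assert(counter == 0 || counter == 1);
    assert(event >= 0 && event < SKETCH_EVENTS_PER_CNTR);
    return counter == 0 ? event : SKETCH_CNT1BASE + event;
}

/* Which hardware counter an array index belongs to. */
int sketch_index_counter(int index)
{
    assert(index >= 0 && index < SKETCH_EVENTMAX);
    return index < SKETCH_CNT1BASE ? 0 : 1;
}

/* The event number on that counter, as stored in hwp_ev. */
int sketch_index_event(int index)
{
    assert(index >= 0 && index < SKETCH_EVENTMAX);
    return index < SKETCH_CNT1BASE ? index : index - SKETCH_CNT1BASE;
}
```

     With these helpers, sketch_event_index(1, 9) gives the index used for
     data cache misses in the example, and sketch_index_event() gives the
     value the second loop stores in hwp_ev.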

		 Syssgi Commands for the Performance Counters

     The syssgi commands that access the event counters are not intended for
     general use. Rather, specialized commands are implemented through this
     interface. Note that all the commands are the first argument to the
     syssgi command SGI_EVENTCTR. The available commands are:

     HWPERF_PROFENABLE	   - Enable sprofil-like profiling using the
			     performance counters rather than the clock.
			     Returns EINVAL on incorrect input, or EBUSY
			     if the counters are already in use in system
			     mode. The second argument to this command is
			     the address of an hwperf_profevctrarg_t, the
			     third argument is a profp, and the fourth is the
			     profcnt, both referring to input necessary for
			     profiling.

			     Returns a positive generation number if
			     successful.

     HWPERF_ENSYSCNTRS	   - Forcibly acquire the counters in system mode.

			     ROOT PERMISSIONS ARE REQUIRED FOR THIS COMMAND.

			     Note that the counters must be set up in kernel
			     mode (user and EXL may be included, but kernel
			     mode is required); that is, at least one of the
			     events must be counted in kernel mode. EINVAL
			     is returned otherwise. Will fail with EBUSY if
			     the counters are already in use in system mode.
			     Otherwise, the command is guaranteed to return
			     the counters in system mode. Starts up the
			     counters on all the cpus, with all the cpus
			     counting the same events.

			     Takes as input (third parameter of syssgi call)
			     the address of an hwperf_profevctrarg_t, which
			     is set up just as it is for PIOCENEVCTRS
			     (see example above).

			     Returns a positive generation number if
			     successful.

     HWPERF_GET_SYSCNTRS   - Read the global system counters to get the global
			     event counts. All of the per-cpu counters will be
			     aggregated into the global counters and the
			     results will be returned to the caller. Caller
			     must supply in third argument the address of
			     an hwperf_cntr_t.

			     Returns a positive generation number if
			     successful.

     HWPERF_GET_CPUCNTRS   - Read a particular cpu's event counters. The third
			     parameter is a cpuid, the fourth is the address
			     of an hwperf_cntr_t.

			     Returns a positive generation number if
			     successful, 0 otherwise (which would indicate
			     an invalid cpuid.)

     HWPERF_GET_SYSEVCTRL  - Retrieve the control information for the
			     system's event counters: which events are being
			     counted and the modes they are being counted in.
			     The third parameter must be the address of an
			     hwperf_eventctrl_t. Returns EINVAL if the
			     counters are not in use.

			     Returns a positive generation number if
			     successful.

     HWPERF_SET_SYSEVCTRL  - Modify how the system counters are operating,
			     whether it be events being counted and/or their
			     associated mode of operation, or overflow
			     threshold values, or overflow signal.

			     MUST BE ROOT TO ISSUE THIS COMMAND, or else EPERM
			     will be returned.

			     Once the counters have been acquired this is how
			     their operation is modified without releasing
			     them. Each time the system call
			     syssgi(SGI_EVENTCTR, HWPERF_SET_SYSEVCTRL,...)
			     is issued the generation number for the system's
			     counters is incremented. The third parameter to
			     this call is the address of an
			     hwperf_profevctrarg_t.

			     Returns a positive generation number if
			     successful.

     HWPERF_RELSYSCNTRS	   - Stop using the counters in system mode and
			     make the counters available again.
			     ROOT PERMISSION REQUIRED.

			     Returns 0 upon success.
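
     HWPERF_GET_SYSCNTRS is described above as aggregating the per-cpu
     counters into the global counters. A minimal sketch of such an
     aggregation follows, assuming it is a per-event sum and that each count
     fits in an unsigned long long. struct agg_cntr, AGG_EVENTMAX and
     agg_sum_cpus() are hypothetical names and are not part of the kernel
     interface.

```c
#include <assert.h>
#include <stddef.h>

enum { AGG_EVENTMAX = 32 };  /* assumed: one slot per event specifier */

/* Hypothetical stand-in for a set of event counts. */
struct agg_cntr {
    unsigned long long evctr[AGG_EVENTMAX];
};

/* Sum the counts of every cpu, event by event, into *global.
 * Any previous contents of *global are overwritten. */
void agg_sum_cpus(const struct agg_cntr *percpu, size_t ncpus,
                  struct agg_cntr *global)
{
    size_t cpu;
    int ev;

    assert(global != NULL);
    assert(percpu != NULL || ncpus == 0);

    for (ev = 0; ev < AGG_EVENTMAX; ev++)
        global->evctr[ev] = 0;

    for (cpu = 0; cpu < ncpus; cpu++)
        for (ev = 0; ev < AGG_EVENTMAX; ev++)
            global->evctr[ev] += percpu[cpu].evctr[ev];
}
```

     Reading one cpu with HWPERF_GET_CPUCNTRS corresponds to looking at a
     single element of the percpu array in this sketch.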

NOTES
     The following list, ordered by events traced, details revision 3 of the
     R10000 CPU counters that return information different from the
     R12000/R14000 CPU counters.  If an event is not listed here, it is the
     same on both CPU types.

     Event		 R10000			    R12000/R14000
     1	     Issued instructions	      Decoded instructions
     2	     Issued loads		      Decoded loads
     3	     Issued stores		      Decoded stores
     4	     Issued store conditionals	      Decoded store conditionals
     16	     Cycles
     17	     Graduated instructions	      Data cache misses
     30	     Store/fetch exclusive to clean   MHT entries
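
     A tool that prints counts on both CPU types might carry the table above
     as data. The following sketch shows one way to do that; the enum, struct
     and function names are hypothetical, and the R12000/R14000 entry for
     event 16 is left as NULL because the table gives no text for it.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_cpu { SKETCH_R10000, SKETCH_R12000_R14000 };

struct sketch_event_diff {
    int event;
    const char *r10000;
    const char *r12000_r14000;  /* NULL where the table has no entry */
};

/* Only the events that differ between the CPU types, as listed above. */
static const struct sketch_event_diff sketch_diffs[] = {
    { 1,  "Issued instructions",            "Decoded instructions" },
    { 2,  "Issued loads",                   "Decoded loads" },
    { 3,  "Issued stores",                  "Decoded stores" },
    { 4,  "Issued store conditionals",      "Decoded store conditionals" },
    { 16, "Cycles",                         NULL },
    { 17, "Graduated instructions",         "Data cache misses" },
    { 30, "Store/fetch exclusive to clean", "MHT entries" }
};

/* Return the table text for an event on the given CPU type, or NULL
 * if the event is not in the table or the table cell is empty. */
const char *sketch_event_name(enum sketch_cpu cpu, int event)
{
    size_t i;
    size_t n = sizeof sketch_diffs / sizeof sketch_diffs[0];

    assert(cpu == SKETCH_R10000 || cpu == SKETCH_R12000_R14000);
    for (i = 0; i < n; i++) {
        if (sketch_diffs[i].event == event)
            return cpu == SKETCH_R10000 ? sketch_diffs[i].r10000
                                        : sketch_diffs[i].r12000_r14000;
    }
    return NULL;
}
```

     A NULL result for an event outside the table means the event has the
     same meaning on both CPU types, so the caller would fall back to a
     common name list.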

FILES
     /usr/include/sys/hwperftypes.h
     /usr/include/sys/hwperfmacros.h

SEE ALSO
     ecadmin(1M), ecstats(1M), perfex(1), libperfex(3C), and libperfex(3F).
