prof man page on DigitalUNIX


prof(1)								       prof(1)

NAME
       prof, pixstats - Analyzes profile data

SYNOPSIS
       prof [options] [prog_name [PC-sampling_data_file]...]

       prof -pixie  [options] [prog_name [Addrs_file  |	 Counts_file]...]

       prof -pixstats  [options] [prog_name [Addrs_file	 |  Counts_file]...]

       pixstats [options] [prog_name [Addrs_file |  Counts_file]...]

OPERANDS
       prog_name
              Name of the program executable to be profiled. This program
              should be compiled with the -g1, -g2, or -g3 option to obtain
              more complete profiling information. If the default symbol
              table level (-g0) has been used, line number information,
              static procedure names, and file names are unavailable to the
              profiling code.

       PC-sampling_data_file
              Name of a profiling data file (default mon.out) produced by
              executing a program that has been linked with the cc -p
              command.

       Counts_file
              Name of an instruction-counts file produced by executing a
              program that has been instrumented with pixie. If no
              Counts_file or Addrs_file is specified, prog_name.Counts is
              used if found in the current working directory.

       Addrs_file
              Name of an instruction-address file produced when the
              executable or shared library object is instrumented with
              pixie. By default, the path of each object.Addrs file is
              recorded in the Counts_file, so these files do not need to be
              specified. The order of precedence for finding an Addrs_file
              is as follows:

              1.  Addrs_file path specified on the command line
              2.  Current directory
              3.  Directory of the object specified in a command-line
                  argument
              4.  Directory where pixie created it

OPTIONS
       For  each prof option, you need to type only enough of the name to dis‐
       tinguish it from the other options. If you do not specify any  options,
       prof uses -procedures by default. Always specify -pixie or -pixstats
       when you process .Addrs and .Counts files.

       The prof command accepts the following options:

       -all   Causes the profiles for all shared libraries (if any) described
              in the data file(s) to be displayed, in addition to the profile
              for the executable.

       -asm   Causes the profiler to print the assembly instructions for each
              subroutine along with the cycle counts for each instruction.
              The subroutines are sorted from highest cycle count to lowest.
              The instructions for each subroutine are printed in order; they
              are not sorted by cycle count.

              When used without the -pixie option for a PC-sampling profile,
              the CPU time used by each instruction is presented in
              milliseconds. (For uprofile and kprofile, per-instruction
              sample counts are also provided for events other than time.)

       -clock megahertz
              Alters the appropriate parts of the listing to reflect the
              clock speed of the CPU. By default, the cycle time of the
              processor on which the program was run is used. (Use this
              option only with the -pixie option.)

       -disassemble
              Disassembles and shows the analyzed object code. (Use this
              option only with the -pixstats option.)

       -dislimit f
              Limits the disassembly to blocks with f% frequency. (Use this
              option only with the -pixstats option.)

       -exclude procedure_name
       -Exclude procedure_name
              If you use one or more -exclude options, the profiler omits the
              specified procedure and its descendants from the listing. If
              any option uses an uppercase “E” (for “Exclude”), prof also
              omits that procedure from the base upon which it calculates
              percentages. To represent all of the variations of an
              overloaded C++ function name, you can specify just the part of
              the name up to but not including the “(”.

       -excobj object_name
              Causes the profile for the named executable or shared library
              not to be printed. You can use this option multiple times in a
              single prof command.

       -feedback filename
              Produces a file with information that the compiler system can
              use to decide which parts of the program will benefit most from
              global optimization and which parts will benefit most from
              in-line procedure substitution (requires basic-block counting).
              (Use this option only with the -pixie option.)

              This option is for compilers whose -feedback option requires a
              feedback file (rather than an executable file) and that do not
              support the prof command's -update option. For compilers that
              support the -update option, better results can be achieved
              using that option instead of the (prof) -feedback option.

       -heavy Reports the most heavily used lines in descending order of use.

       -incobj object_name
              Causes the profile for the named shared library to be printed,
              in addition to the profile for the executable. You can use this
              option multiple times in a single prof command.

       -invocations
              For each procedure, reports how many times the procedure was
              invoked from each of its possible callers (requires basic-block
              counting). For this listing, the -exclude and -only options
              apply to callees, but not to callers. (Use this option only
              with the -pixie option.)

       -Ldir  Changes the library directory search order for shared object
              libraries so that prof looks for them in dir before the library
              recorded in profile_file and the default library directories.
              You can specify multiple -Ldir switches to specify several
              directory names.

       -L     Changes the library directory search order for shared object
              libraries so that prof never looks for them in the default
              library directories. Use this option when the default library
              directories should not be searched and only the directories
              specified by -Ldir are to be searched.

       -lines Gives the lines in order of occurrence within procedures. The
              procedures are sorted in descending order of use.

       -merge filename
              Sums the sampling data files (or, in pixie mode, the .Counts
              files) and writes the result into a new file with the specified
              name. The -only and -exclude options have no effect on the
              merged data.

       -nocounts
              Uses 1 for each basic block count. (Use this option only with
              the -pixstats or -pixie option.)

       -numbers
              Prints each procedure's starting line number if source file
              information is available from the object file.

       -only procedure_name
       -Only procedure_name
              If you use one or more -only options, the profile listing
              includes only the named procedures, rather than the entire
              program. If any option uses an uppercase “O” for “Only,” prof
              uses only the named procedures, rather than the entire program,
              as the base upon which it calculates percentages. To represent
              all of the variations of an overloaded C++ function name, you
              can specify just the part of the name up to but not including
              the “(”.

       -pixie Selects pixie mode, as opposed to sampling mode.

       -pixstats
              Selects generation of an alternative pixie-mode report for
              basic-block profiling data, as previously produced by the
              pixstats(1) command. All options of the previous version of
              pixstats(1) are recognized, for compatibility.

       -procedures
              Reports time spent per procedure (using data obtained from
              sampling or basic-block counting; the listing tells which one).
              For basic-block counting, this option also reports the number
              of invocations per procedure, including the aggregated
              invocations of any alternate entry points.

       -quit n
       -quit n%
       -quit ncum%
              Truncates listings after n lines (if n is an integer), after
              the first entry that represents less than n percent of the
              total (if n is followed immediately by a “%” character), or
              after enough entries have been printed to account for n percent
              of the total (if n is followed immediately by “cum%”). For
              example, “-quit 15” truncates each part of the listing after 15
              lines of text, “-quit 15%” truncates each part after the first
              line that represents less than 15 percent of the whole, and
              “-quit 15cum%” truncates each part after the line that brought
              the cumulative percentage above 15 percent.

       -testcoverage
              Reports all lines that never executed. (Use this option only
              with the -pixie option.)

       -totals
              For -procedures and -invocations listings, prints cumulative
              statistics for the entire object file instead of for each
              procedure in the object.

       -truecycles [0|1|2]
              Generates more analysis of a program to provide a more accurate
              reading of cycles, instead of the default, which assumes each
              instruction executes in one cycle. The higher the number chosen
              from the arguments, the more accurate the reading, although the
              profiler will run slower, and memory-access delays are still
              not reflected. This option has little or no effect on EV6
              (21264) and later Alpha systems. (Use this option only with the
              -pixie option.)

       -update
              Updates the program executable (prog_name) with profiling
              information in the specified .Counts files, for use in future
              cc -feedback prog_name command(s). This option requires that
              prog_name has been compiled with the -feedback prog_name
              option, or updating will fail. This option will not generate a
              display unless another option forcing the display behavior is
              specified. (Use this option only with the -pixie option.)

       -V     Prints the tool's version number.

       -zero  Prints a list of procedures that were never invoked (requires
              basic-block counting). (Use this option only with the -pixie
              option.)
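
       The three truncation rules of -quit can be sketched in C as follows.
       This is only an illustration of the wording above, not prof's own
       code: the names quit_mode and quit_cutoff are hypothetical, and the
       sketch assumes the listing entries are already sorted in descending
       order of their share of the total.

```c
#include <stddef.h>

/* The three forms of the -quit argument: "n", "n%", and "ncum%".
   These names are hypothetical and exist only for this sketch. */
enum quit_mode { QUIT_LINES, QUIT_PERCENT, QUIT_CUM_PERCENT };

/* Return how many entries of a listing would be printed.
   pct[] holds each entry's share of the total, in percent,
   assumed to be sorted in descending order. */
size_t quit_cutoff(const double *pct, size_t n, enum quit_mode mode,
                   double limit)
{
    double cum = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        switch (mode) {
        case QUIT_LINES:
            if ((double)i >= limit)
                return i;          /* stop once `limit` lines are printed */
            break;
        case QUIT_PERCENT:
            if (pct[i] < limit)
                return i + 1;      /* stop after the first entry below limit */
            break;
        case QUIT_CUM_PERCENT:
            cum += pct[i];
            if (cum > limit)
                return i + 1;      /* stop after the entry that crossed limit */
            break;
        }
    }
    return n;                      /* the limit was never reached */
}
```

       For a listing whose entries account for 40, 30, 20, 6, and 4 percent,
       this reading prints three entries for a limit of 3 lines, four
       entries for 15%, and one entry for 15cum%.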

DESCRIPTION
       The prof command analyzes one or more data files generated by the  com‐
       piler's	execution-profiling  system  and  produces a listing. The prof
       command can also combine those data files or produce  a	feedback  file
       that lets the optimizer take into account the program's run-time
       behavior during a subsequent compilation. Profiling is a three-step
       process:

       1.  Compile the program.
       2.  Execute the program.
       3.  Run prof to analyze the data.

       The compiler system provides two kinds of profiling:

       PC-sampling
              Interrupts the program periodically, recording the value of
              the program counter.

       Basic-block counting
              Divides the program into blocks delimited by labels, jump
              instructions, and branch instructions. It counts the number of
              times each block executes.
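
       To picture what counting block executions means, the following C
       sketch instruments a small function by hand. It is a conceptual
       illustration only: the function count_evens, the block numbering, and
       the block_count array are assumptions made for this example. A real
       block-counting tool works on the object code and needs no source
       changes.

```c
#include <stddef.h>

/* Hypothetical counters, one per basic block of count_evens(). */
enum { NBLOCKS = 4 };
unsigned long block_count[NBLOCKS];

/* Count the even values in v[0..n-1]. Each straight-line run of code
   between branch targets and branches is treated as one block, and
   each block bumps its own counter when it runs. */
size_t count_evens(const int *v, size_t n)
{
    size_t evens = 0;
    size_t i;

    block_count[0]++;              /* block 0: function entry */
    for (i = 0; i < n; i++) {
        block_count[1]++;          /* block 1: loop body up to the test */
        if (v[i] % 2 == 0) {
            block_count[2]++;      /* block 2: runs only for even values */
            evens++;
        }
    }
    block_count[3]++;              /* block 3: function exit */
    return evens;
}
```

       After one call, the counters show how often each block ran, which is
       the kind of data a counts file holds for every block of the program.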

       The uprofile and kprofile tools provide a third kind of profiling, per‐
       formance	 counter  sampling. The Alpha architecture on-chip performance
       counters are used in performance counter sampling.

       The following sections describe how to perform  the  various  kinds  of
       profiling.

   PC-Sampling Profiles
       To  use	PC-sampling, compile your program with the -p option (strictly
       speaking, it is sufficient to use this option  only  when  linking  the
       program).  Then,	 run the program containing the profiling startup rou‐
       tine that calls monstartup to allocate extra memory to hold the profil‐
       ing  data.  If  the  program  terminates	 normally or calls exit(2), it
       records the data in a file at the end of execution.

       If your program uses shared libraries, note that only  its  call-shared
       portion is profiled in detail. Only the total time spent in each shared
       library is recorded. To individually profile  all  library  routines  a
       program	uses,  build  the  program  with  the  -non_shared  switch (by
       default, the compiler produces a call-shared object unless  -non_shared
       is  explicitly specified), or set the PROFFLAGS environment variable as
       described in the Environment Variables section.

       After running your program, use prof to analyze	the  PC-sampling  data
       file. For example:

            cc -c myprog.c
            cc -p -o myprog myprog.o
            myprog                        (generates mon.out)
            prof myprog mon.out

       When you use prof for PC-sampling, the program name defaults to	a.out.
       The PC-sampling data file name defaults to mon.out; if you specify more
       than one PC-sampling data file, prof reports the sum of the data.

   PC-Sampling Environment Variables
       You can use environment variables to change the default PC-sampling and
       profile data collection behavior. The variables are PROFDIR and
       PROFFLAGS. The general form for setting these variables is:

       For C shell:       setenv varname "value"
       For Bourne shell:  varname="value"; export varname
       For Korn shell:    export varname=value

       In the preceding example, varname can be one of the following:

       PROFDIR
              This environment variable causes PC-sampling data files to be
              generated with unique file names in a specified directory.

              You specify a directory path as the value and your prof results
              are placed in the file path/pid.progname where path is the
              pathname, pid is the process ID of the executing program, and
              progname is the program name.

       PROFFLAGS
              This environment variable can take any of the following values:

              -threads
                     Causes a separate data file to be generated for each
                     thread. The name of the data file takes the following
                     form: pid.sid.progname.

                     The form of the filename resolves to pid as the process
                     ID of the program, sid as the sequence number of the
                     thread, and progname as the name of the program being
                     profiled.

              -all   Causes the program to fully profile all the permanently
                     loaded shared libraries, in addition to the nonshared or
                     call-shared executable.

              -incobj object_name
                     Causes the program to profile only the named executable
                     or shared library.

              -excobj object_name
                     Causes the program not to profile the named executable
                     or shared library.

              -stride n
                     Causes prof to change the ratio of text segment stride
                     size to PC-sample counter buffer size, that is, the
                     number of instructions that are counted together in a
                     single counter word. The appropriate ratio involves a
                     tradeoff of size versus precision. Strides of 1, 2, 4,
                     and 8 are supported. A special stride of 0 causes a
                     single PC-sample count to be recorded for each text
                     segment.

                     The default stride is 2 for the executable, and 0 for
                     each of its shared libraries. If -all or -incobj is
                     specified, all selected objects are profiled with the
                     same stride.

              -sigdump signal-name
                     Automatically establishes monitor_signal(3) as the
                     signal handler for the named signal, and it causes
                     monitor_signal(3) to zero the profile after it is
                     written to a file. This allows a signal to be sent
                     several times without the successive profiles
                     overlapping, if the file is renamed. The asynchronous
                     nature of a signal may cause small variations in the
                     profile. Unrecognized signal-names are ignored. The
                     -threads option is ignored if combined with -sigdump.

              -dirname directory
                     Specifies the directory path in which the profiling data
                     file or files are created.

              -[no]pids
                     [Disables] or enables the addition of the process-id
                     number to the name of the profiling data file or files.

       You can use the PROFDIR and PROFFLAGS environment  variables  together.
       For more information, see the Programmer's Guide.
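
       The size-versus-precision tradeoff of the stride setting can be
       illustrated with a short C sketch. The mapping below is an assumption
       made for illustration, not the documented layout of the sample
       buffer: it relies only on the fact that Alpha instructions are 4
       bytes long and on the statement that a stride is the number of
       instructions sharing one counter word. The names counter_words and
       counter_index are hypothetical.

```c
#include <stddef.h>

/* Alpha instructions are 4 bytes long. */
#define INSN_BYTES 4UL

/* Number of counter words needed for a text segment of text_bytes
   bytes. A stride of 0 means one counter for the whole segment. */
size_t counter_words(unsigned long text_bytes, unsigned stride)
{
    unsigned long insns = text_bytes / INSN_BYTES;

    if (stride == 0)
        return 1;
    return (size_t)((insns + stride - 1) / stride);   /* round up */
}

/* Index of the counter word that a sampled PC falls into
   (assumed linear mapping from the start of the text segment). */
size_t counter_index(unsigned long pc, unsigned long text_start,
                     unsigned stride)
{
    if (stride == 0)
        return 0;
    return (size_t)((pc - text_start) / (INSN_BYTES * stride));
}
```

       Under these assumptions a 4096-byte text segment needs 512 counter
       words at stride 2 and 128 at stride 8, while a stride of 1 gives each
       instruction its own counter at the cost of the largest buffer.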

   Basic-Block Counting
       To  use	basic-block  counting, compile your program without the option
       -p. Use the pixie program to translate your program  into  a  profiling
       version	 and   generate	 a  file  (prog_name.Addrs)  containing	 block
       addresses. Then, run the pixie version of the program, which  (assuming
       the  program  terminates	 normally  or  calls  exit(2))	will  generate
       a file (prog_name.Counts) containing block counts.

       After running the pixie version of your program, use prof with the
       -pixie option to analyze the .Addrs and .Counts files. Notice that you
       must specify the name of your original program, not the name of the
       pixie version. For example:

            cc -c myprog.c
            cc -o myprog myprog.o
            pixie myprog          (generates myprog.Addrs and myprog.pixie)
            myprog.pixie          (generates myprog.Counts)
            prof -pixie myprog myprog.Addrs myprog.Counts

       When you use prof with the -pixie option, the .Addrs file name defaults
       to prog_name.Addrs, and the .Counts file name defaults to
       prog_name.Counts. Note that, when the .Counts file name defaults to
       prog_name.Counts, prof does not attach any path prefix to prog_name,
       and it looks for the file in the current working directory. If you
       specify more than one .Counts file, prof reports the sum of the data.
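
       Reporting "the sum of the data" amounts to adding the counters from
       each data file position by position. A minimal C sketch, assuming
       every file has already been read into an array with one counter per
       block and that all arrays have the same length (the name add_counts
       and this in-memory layout are hypothetical):

```c
#include <stddef.h>

/* Add the counters read from one data file (src) into a running
   total (dst). Both arrays are assumed to hold n counters in the
   same block order. */
void add_counts(unsigned long *dst, const unsigned long *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] += src[i];
}
```

       Calling add_counts once per data file leaves the combined profile in
       dst.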

       For each shared library selected for profiling, the prof command
       searches for an .Addrs file in the following locations if the .Addrs
       file location is not explicitly specified on the command line:

       1.  Current directory
       2.  Directory in which the object file is located, if the location of
           the object file is explicitly specified on the command line
       3.  Directory in which pixie created it, as recorded in the .Counts
           file

       For each selected shared library, the prof command searches for an
       object file in the following locations:

       1.  Directories specified in -Ldir options
       2.  Directory in which pixie found it, as recorded in the .Counts
           file, if the -L option is specified
       3.  Standard library search directories, as searched by ld, if the -L
           option is not specified

   Basic-Block Statistics
       Use the -pixstats option to get an alternative profile.	All options of
       the previous version of the pixstats(1)	command	 are  recognized,  for
       compatibility.

       If  a disassembly is requested, all basic blocks (or those whose execu‐
       tion count exceeds the -dislimit percentage of total instructions)  are
       disassembled,  in  increasing address order. Each block is labeled with
       its procedure name and any offset from the start of the procedure.  For
       each  instruction,  the	relative  estimated  CPU  cycle	 at  which the
       instruction executes is printed, plus its source line, address,	binary
       code,  and  assembly language.  The total CPU cycles used by one execu‐
       tion of the block, the number of times it was executed,	and  its  per‐
       centage	of  all	 instructions  executed	 are printed at the end of the
       block, following any line reporting a non-zero delay caused to  a  fol‐
       low-on block.

       The main report begins with a record of the command line. This is
       followed by a summary of the program's behavior:

          Total CPU cycles used by the profiled objects, plus the equivalent
          number of seconds

          Total number of instructions executed

          Total delay caused by instructions executed in the preceding basic
          block

          Total integer and floating-point no-op, arithmetic and logical,
          logical, shift, load, store, load and store, load followed by load,
          load and store and fetch (data bus use), load and store relative to
          the stack or global pointers, floating-point, floating-point
          compare, conditional branch instructions executed (itemized). Also,
          total number of branch instructions executed whose target
          instruction is another branch; and total number of such branches
          that are estimated to be taken, rather than executing the next
          instruction in line.

          Total basic blocks, procedure calls, and branches that skip a single
          instruction that were executed.
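
       The "equivalent number of seconds" follows from the cycle total and
       the clock speed. A minimal C sketch of that arithmetic, assuming the
       clock speed is given in megahertz (the function name
       cycles_to_seconds is hypothetical):

```c
/* Convert a cycle count to seconds for a CPU clocked at mhz megahertz:
   seconds = cycles / (mhz * 1,000,000). */
double cycles_to_seconds(unsigned long cycles, double mhz)
{
    return (double)cycles / (mhz * 1.0e6);
}
```

       For example, 500,000,000 cycles on a 500 MHz processor correspond to
       one second. As with the cycle counts themselves, this figure does not
       include memory-system delays.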

       Next, some ratios are printed:

          Stores : stores + loads
          Instructions : basic block
          Instructions : branches
          Backward branches : branches
          CPU cycles : procedure calls
          Instructions : procedure calls
          Integer no-ops : integer and floating-point no-ops
          Floating-point no-ops : integer and floating-point no-ops
          Floating-point pipeline interlocks : floating-point operators

       Next, basic blocks are analyzed according to how many instructions they
       contain. For each size, pixstats reports the execution count, its
       percentage and cumulative percentage relative to both instructions and
       basic blocks, the number of instructions contained in  blocks  of  that
       size,  the percentage and cumulative percentage of this relative to all
       instructions, and the CPU-cycle cost per instruction of blocks of  that
       size.  Then,  pixstats  prints  various averages and quartiles of basic
       block size, plus the largest basic block	 execution  count  encountered
       (to indicate the chance of integer overflow in the analysis).

       Next,  pixstats analyzes the number of registers (integer and floating-
       point) that are saved on procedure entry (and restored  on  exit).   It
       prints the number of procedure entries that save a given number of reg‐
       isters, and the percentage and cumulative percentage of	this  relative
       to  all	procedure  entries,  all registers saved, and all instructions
       executed. Finally, it prints some averages and ratios.

       The next two tables contain information on the sizes of executed proce‐
       dures'  stack  frames  and  the	frequency of execution of each kind of
       instruction. Frame sizes are reported in “bits”; for  example,  6  bits
       means a 32- to 48-byte stack frame. The number, percentage, and cumula‐
       tive percentage of executed calls to procedures with  the  given	 frame
       size  is	 printed.  Similarly,  the execution count is printed for each
       machine instruction code, but  this  table  is  ordered	by  decreasing
       usage.

       The next four tables are similar. They provide information about the
       size of literals used by various categories of Alpha instructions:

          ADD, SUB, CMP instructions
          AND, BIC, BIS, XOR, CMOV instructions
          MUL instructions
          SHIFT, EXT, INS, MSK, ZAP instructions

       (Note that a table may be omitted if there is no use of literals in the
       program	for  the  particular  instruction category). For each of these
       tables the size of the literal is reported in bits (for example, 4 bits
       means the literal is greater than or equal to 8 and less than 16).
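
       The size "in bits" is the number of bits needed to represent the
       value. A minimal C sketch of that bucketing (the function name
       bit_size is hypothetical, and treating 0 as needing 0 bits is an
       assumption of this sketch, not something the report format states):

```c
/* Number of bits needed to represent v:
   0 for 0, 1 for 1, 2 for 2..3, 3 for 4..7, 4 for 8..15, and so on. */
unsigned bit_size(unsigned long v)
{
    unsigned bits = 0;

    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}
```

       With this definition, the values 8 through 15 all fall in the 4-bit
       bucket, matching the example above.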

       The next six tables are similar. They contain information on the size
       of the memory displacement from a base register:

          LDA displacement from 0 (used like a load immediate instruction)
          LDAH displacement from 0 (used like a load immediate high)
          Branch
          SP-based load/store (load or store within a stack frame)
          GP-based load/store (load or store within a global offset table)
          All load or store instructions

       Again, the “size” of the displacement is reported in bits; for example,
       6  bits means a 32 to 63 byte displacement. For both positive displace‐
       ments (in the “0-extend” column) and  negative  displacements  (in  the
       “1-extend”  column), the execution count is printed along with percent‐
       age and cumulative percentage.  The  summed  cumulative	percentage  is
       printed last (in the “Total” column).

       In  the	“static” analysis of instructions, each instruction is counted
       once per executed basic-block. The “static” distribution	 will  be  the
       same  as	 the  regular opcode distribution when -nocounts is specified.
       Following “static” totals for instructions and basic blocks, the number
       and percentage of each instruction code is listed.

       The  next two tables contain information on how many times each integer
       and floating-point register was accessed, plus its percentage,  ordered
       by  register  number.  For integer registers, the number and percent of
       uses as a base register in memory operations is also listed.

       Finally, pixstats prints a flat profile of CPU cycles  used  by	proce‐
       dures.	This  includes	the CPU cycles used by the procedure, the per‐
       centage of the total, the cumulative percentage, the number of instruc‐
       tions  executed	as  part  of  the procedure, its average number of CPU
       cycles per instruction, the number of calls made to the procedure,  the
       average number of CPU cycles per call, and the procedure name. If -num‐
       bers is specified, the object and source file names and line number are
       also printed.

   Performance Counter Samples
       After running the uprofile or kprofile utility to collect profiling
       data for your program or the kernel, respectively, run prof to examine
       the resulting mon.out or kmon.out file, as follows:

       For uprofile output:  prof prog_name mon.out
       For kprofile output:  prof /vmunix kmon.out

       Use prof as for PC sampling, except that only the executable has a pro‐
       file.  Old performance counter sample data files, generated on versions
       of the operating system prior to DIGITAL UNIX Version 4.0, must be ana‐
       lyzed as if they contained PC-sampling data.

RESTRICTIONS
       The -pixstats option models execution assuming a perfect memory system.
       Memory system events such as cache misses will increase execution time
       above the -pixstats predictions.

       The set of statistics reported by the -pixstats option and the format
       of the report are the same as for previous versions of the pixstats(1)
       command, but note the following:

          The labels on disassembled basic blocks take the form
          procedure-name (or proc_at_0x... if no symbol is available) for an
          initial block and procedure-name+offset for subsequent blocks.

          All reported cycles reflect CPU pipeline interlocks, so they
          usually do not match the reported instruction counts.

          If not all the shared objects used by a program are profiled, the
          procedure-call counts may be smaller than the jsr/bsr instruction
          counts.

FILES
       Normal startup code
       Startup code for PC-sampling
       Library for PC-sampling
       Default kprofile data file (kmon.out)
       Default PC-sampling data file (mon.out)
       Default uprofile data file

SEE ALSO
       Introduction: prof_intro(1)

       Commands:  as(1),  cc(1), gprof(1), pixie(1), uprofile(1), kprofile(1),
       dxprof(1).  (dxprof is available as an option.)

       Functions:  monitor(3), profil(2)

       Programmer's Guide

								       prof(1)