AUTO_P(5)							     AUTO_P(5)

NAME
     AUTO_P - Automatic Parallelization

TOPIC
     This man page discusses automatic parallelization and how to achieve it
     with the Silicon Graphics MIPSpro Automatic Parallelization Option. The
     following topics are covered:

     Automatic Parallelization and the MIPSpro Compilers

     Using the MIPSpro Automatic Parallelization Option

Automatic Parallelization and the MIPSpro Compilers
     Parallelization is the process of analyzing sequential programs for
     parallelism so that they may be restructured to run efficiently on
     multiprocessor systems. The goal is to minimize the overall computation
     time by distributing the computational work load among the available
     processors. Parallelization can be automatic or manual.

     During automatic parallelization, the MIPSpro Automatic Parallelization
     Option, hereafter called the auto-parallelizer, analyzes and structures
     the program with little or no intervention by the developer. The auto-
     parallelizer can automatically generate code that splits the processing
     of loops among multiple processors. The alternative is manual
     parallelization, in which the developer performs the parallelization
     using pragmas and other programming techniques. Manual parallelization is
     discussed in the mp(3f) and mp(3c) man pages.

     Automatic parallelization begins with the determination of data
     dependence of variables and arrays in loops. Data dependence can prevent
     loops from being safely run in parallel because the final outcome of the
     computation may vary depending on the order the various processors access
     the variables and arrays. Data dependence and other obstacles to
     parallelization are discussed in more detail in the next section.

     Once data dependences are resolved, a number of automatic parallelization
     strategies can be employed, including the following:

	  Loop interchange of nested loops

	  Scalar expansion

	  Loop distribution

	  Automatic synchronization of DOACROSS loops

	  Intraprocedural array privatization

     The 7.2 release of the MIPSpro compilers marks a major revision of the
     auto-parallelizer. The new release incorporates automatic parallelization
     into the other optimizations performed by the MIPSpro compilers. Previous
     versions relied on preprocessors to provide source-to-source conversions
     prior to compilation. This change provides several benefits to
     developers:

	  Automatic parallelization is integrated with optimizations for
	  single processors

	  A set of options and pragmas consistent with the rest of the
	  MIPSpro compilers

	  Support for C++

	  Better run-time and compile-time performance

The MIPSpro Automatic Parallelization Option
     Developers exploit parallelism in programs to provide better performance
     on multiprocessor systems. You do not need a multiprocessor system to use
     the automatic parallelizer. Although there is a slight performance loss
     when a single-processor system runs multiprocessed code, you can use the
     auto-parallelizer on any Silicon Graphics system to create and debug a
     program.

     The automatic parallelizer is an optional software product that is used
     as an extension to the following compilers:

	  MIPSpro Fortran 77

	  MIPSpro Fortran 90

	  MIPSpro C

	  MIPSpro C++

     It is controlled by flags inserted in the command lines that invoke the
     supported compilers.

Using the MIPSpro Automatic Parallelizer
     This section describes how to use the auto-parallelizer when you compile
     and run programs with the MIPSpro compilers.

   Using the MIPSpro Compilers to Parallelize Programs
     You invoke the auto-parallelizer by using the -pfa or -pca flags on the
     command lines for the MIPSpro compilers. The syntax for compiling
     programs with the auto-parallelizer is as follows:

     For Fortran 77 and Fortran 90 use -pfa:

     %f77 options -pfa [{ list | keep }] [ -mplist ] filename

     %f90 options -pfa [{ list | keep }] [ -mplist ] filename

     For C and C++ use -pca:

     %cc options -pca [{ list | keep }] [ -mplist ] filename

     %CC options -pca [{ list | keep }] [ -mplist ] filename

     where options are MIPSpro compiler command-line options. For details on
     the other options, see the documentation for your MIPSpro compiler.

     -pfa and -pca

	  Invoke the auto-parallelizer and enable any multiprocessing
	  directives.

     list
	  Produce an annotated listing of the parts of the program that can
	  (and cannot) run in parallel on multiple processors. The listing
	  file has the suffix .l.

     keep
	  Generate the listing file (.l), the transformed equivalent program
	  (.m), and an output file for use with WorkShop Pro MPF (.anl).

     -mplist
	  Generate a transformed equivalent program in a .w2f.f file for
	  Fortran 77 or a .w2c.c file for C.

     filename
	  The name of the file containing the source code.

     To use the automatic parallelizer with Fortran programs, add the -pfa
     flag to both the compile and link line. For C or C++, add the -pca flag.

     If you link separately, you must also add -mp to the link line. Previous
     versions of the Power compilers had a large set of flags to control
     optimization. The 7.2 version uses the same set of options as the rest of
     the MIPSpro compilers. So, for example, while in the older Power
     compilers the option -pfa,-r=0 turned off roundoff-changing
     transformations in the pfa preprocessor, in the new compiler
     -OPT:roundoff=0 turns off roundoff-changing transformations in all phases
     of the compiler.
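
     For example, a hypothetical two-step build that compiles and then links a
     Fortran 77 program (the file names and optimization options here are
     illustrative, not taken from this man page):

     %f77 -O3 -n32 -mips4 -pfa -c main.f

     %f77 -O3 -n32 -mips4 -pfa -mp -o main main.o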

     The -pfa list option generates a .l file. The .l file lists the loops in
     your code, indicating which were parallelized and which were not. If any
     were not parallelized, it explains why not. The -pfa keep option
     generates a .l file, a .m file, and a .anl file that is used by the
     WorkShop Pro MPF tool. The .m file is similar to the .w2f.f or .w2c.c
     file except that it is annotated with some information used by the
     WorkShop Pro MPF tool.

     The -mplist option will, in addition to compiling your program, generate
     a .w2f.f file (for Fortran 77; a .w2c.c file for C) that represents the
     program after the automatic parallelization phase. These files should be
     readable and in most cases should be valid code suitable for
     recompilation. You can use the -mplist option to see which portions of
     your code were parallelized.

     For Fortran 90 and C++, automatic parallelization happens after the
     source program has been converted into an internal representation. It is
     not possible to regenerate Fortran 90 or C++ after parallelization.

     Examples:

     Example: Analyzing a .l File

     %cat foo.f

     subroutine sub(arr,n)
	   real*8 arr(n)
	   do i=1,n
	     arr(i) = arr(i) + arr(i-1)
	   end do
	   do i=1,n
	     arr(i) = arr(i) + 7.0
	     call foo(a)
	   end do
	   do i=1,n
	     arr(i) = arr(i) + 7.0
	   end do
	   end

     %f77 -O3 -n32 -mips4 -pfa list foo.f -c

     Here's the associated .l file

     Parallelization Log for Subprogram sub_

     3: Not Parallel

	      Array dependence from arr on line 4 to arr on line 4.

     6: Not Parallel

	      Call foo on line 8.

     10: PARALLEL (Auto) __mpdo_sub_1

     Example: Analyzing a .w2f.f File

     %cat test.f

     subroutine trivial(a)

       real a(10000)

       do i=1,10000
	 a(i) = 0.0
       end do
       end

     %f77 -O3 -n32 -mips4 -c -pfa -mplist test.f

     We get both an object file, test.o, and a test.w2f.f file that contains
     the following code:

     SUBROUTINE trivial(a)

       IMPLICIT NONE

       REAL*4 a(10000_8)

       INTEGER*4 i

     C$DOACROSS local(i), shared(a)

       DO i = 1, 10000, 1

	 a(i) = 0.0

       END DO

       RETURN

     END ! trivial

Running Your Program
     Invoke your program as if it were a sequential program. The same binary
     can execute using different numbers of processors. By default, the
     runtime will select how many processors to use based on the number of
     processors in the machine. The developer can use the environment
     variable MP_SET_NUMTHREADS to change the default to use an explicit
     number of processors. In addition, the developer can have the number of
     processors vary dynamically from loop to loop based on system load by
     setting the environment variable MP_SUGNUMTHD. Refer to the mp(3f) and
     mp(3c) man pages for more details.
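
     For example, under csh you might request four processors for a run as
     follows (the value 4 is illustrative; see mp(3f) for the authoritative
     description of these variables):

     %setenv MP_SET_NUMTHREADS 4

     %a.out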

     Simply passing code through the auto-parallelizer does not always produce
     all of the increased performance available. The next section discusses
     strategies for making effective use of the product when the
     auto-parallelizer is not able to fully parallelize an application.

   Analyzing the Automatic Parallelizer's Results
     Running a program through the auto-parallelizer often results in
     excellent parallel speedups, but there are cases that cannot be
     parallelized well automatically. By understanding the listing files, you
     can sometimes identify small problems that prevent a loop from running
     safely in parallel. With a relatively small amount of work, you can
     remove these data dependences and dramatically improve the program's
     performance.

     Hint:  When trying to find loops to run in parallel, focus your efforts
     on the areas of the code that use the bulk of the run time. Spending time
     trying to run a routine in parallel that uses only one percent of the run
     time of the program cannot significantly improve the overall performance
     of your program. To determine where your code spends its time, take an
     execution profile of the program using the SpeedShop performance tools.
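
     For example, one possible SpeedShop session, assuming the pcsamp
     experiment and an executable named a.out (the experiment file's numeric
     suffix varies with the process ID):

     %ssrun -pcsamp a.out

     %prof a.out.pcsamp.m12345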

     The auto-parallelizer provides several mechanisms to analyze what it did.

     For Fortran 77 and C programs, the -mplist option generates the code
     after parallelization (in a .w2f.f or .w2c.c file). Manual parallelism
     directives are inserted on loops that have been automatically
     parallelized. For details about these directives, refer to Chapters 5-7,
     "Fortran Enhancements for Multiprocessors," of the MIPSpro Fortran 77
     Programmer's Guide, or Chapter 11, "Multiprocessing C/C++ Compiler
     Directives," of the C Language Reference Manual.

     The output code in the .w2f.f or .w2c.c file should be readable and
     understandable. The user can use it as a tool to gain insight into what
     the auto-parallelizer did, and can then use that insight to make changes
     to the original source program.

     Note that the auto-parallelizer is not a source-to-source preprocessor,
     but is instead an internal phase of the MIPSpro compilers. With a
     preprocessor system, a post-parallelization file would always be
     generated and fed into the regular compiler. This is not the case with
     the auto-parallelizer. Therefore, compiling a .w2f.f or .w2c.c file
     through a MIPSpro compiler will not generate identical code to compiling
     the original source through the MIPSpro auto-parallelizer, but often the
     two will be almost the same.

     The auto-parallelizer also provides a listing mechanism via the list or
     keep argument to the -pfa or -pca option. Either causes the compiler to
     generate a .l file. The .l file lists the original loops in the program
     along with messages telling whether or not the loops were parallelized.
     For loops that were not parallelized, an explanation is given.

   Parallelization Failures With the Automatic Parallelizer
     This section discusses mistakes you can avoid and actions you can take to
     enhance the performance of the auto-parallelizer. The auto-parallelizer
     is not always able to parallelize programs effectively. This can be true
     for a number of reasons, some of which you can address. There are three
     broad categories of parallelization failure:

     The auto-parallelizer does not detect that a loop is safe to parallelize

     The auto-parallelizer chooses the wrong nested loop to make parallel

     The auto-parallelizer parallelizes a loop that would run more efficiently
     sequentially

   Failure to Recognize Safe Loops
     We want the auto-parallelizer to recognize every loop that is safe to
     parallelize. A loop is not safe if there is data dependence, so the
     automatic parallelizer analyzes each loop in a sequential program to try
     to prove it is safe. If it cannot prove a loop is safe, it does not do
     the parallelization. A loop that contains any of the constructs described
     in this section may not be proved safe. However, in many instances the
     loop can be proved safe after minor changes. You should review your
     program's .l file to see if there are any of these constructs in your
     code.

     Usually the failure to recognize a loop as safe is related to one or more
     of the following practices:

     Function Calls in Loops

     GO TO Statements in Loops

     Complicated Array Subscripts

     Conditionally Assigned Temporary Variables in Loops

     Unanalyzable Pointer Usage in C/C++

   Function Calls in Loops
     By default, the auto-parallelizer does not parallelize a loop that
     contains a function call because the function in one iteration may modify
     or depend on data in other iterations of the loop. However, a couple of
     tools can help with this problem.

     Interprocedural analysis, specified by the -IPA command-line option, can
     provide the auto-parallelizer with enough additional information to
     parallelize some loops that contain function calls. For more information
     on interprocedural analysis, see the MIPSpro Compiling and Performance
     Tuning Guide.

     The C*$* ASSERT CONCURRENT CALL Fortran assertion, discussed below,
     allows you to tell the auto-parallelizer to ignore function calls when
     analyzing the specified loops.

   GO TO Statements in Loops
     The use of GO TO statements in loops can cause two problems:

     Early exits from loops.
	  It is not possible to parallelize loops with early exits, either
	  automatically or manually.

     Unstructured control flows.
	  The auto-parallelizer attempts to convert unstructured control flows
	  in loops into structured constructs. If the auto-parallelizer cannot
	  restructure these control flows, your only alternatives are manual
	  parallelization or restructuring the code.

   Complicated Array Subscripts
     There are several cases where array subscripts are too complicated to
     permit parallelization.

     Indirect Array References
	  The auto-parallelizer is not able to analyze indirect array
	  references. Consider the following Fortran example.

	  do i= 1,n

	    a(b(i)) ...

	  end do

	  This loop cannot be run safely in parallel if the indirect reference
	  b(i) is equal to the same value for different iterations of i. If
	  every element of array b is unique, the loop can safely be made
	  parallel. In such cases, use either manual methods or the C*$*
	  ASSERT PERMUTATION Fortran assertion, discussed below, to achieve
	  parallelism.

     Unanalyzable Subscripts
	  The auto-parallelizer cannot parallelize loops containing arrays
	  with unanalyzable subscripts. In the following case, the auto-
	  parallelizer is not able to analyze the division in the array
	  subscript and cannot reorder the loop.

	  do i = l,u,2

	    a(i/2) = ...

	  end do

     Hidden Knowledge
	  In the following example there may be hidden knowledge about the
	  relationship between the variables m and n.

	  do i = 1,n

	    a(i) = a(i+m)

	  end do

	  The loop can be run in parallel if m > n, because the arrays will
	  not overlap. However, because the auto-parallelizer does not know
	  the value of the variables, it cannot make the loop parallel.

   Conditionally Assigned Temporary Variables in Loops
     When parallelizing a loop, the auto-parallelizer often localizes
     (privatizes) temporary scalar and array variables. Consider the following
     example.

     do i = 1,n

       do j = 1,n

	 tmp(j) = ...

       end do

       do j = 1,n

	 a(j,i) = a(j,i) + tmp(j)

       end do

     end do

     The array tmp is used for local scratch space. To successfully
     parallelize the outer (i) loop, each processor must be given a distinct,
     private tmp array. In this example, the auto-parallelizer is able to
     localize tmp and parallelize the loop.  The auto-parallelizer runs into
     trouble when a conditionally assigned temporary variable might be used
     outside of the loop, as in the following example.

     subroutine s1(a,b)

       common t

       ...

       do i = 1,n

	 if (b(i)) then

	   t = ...

	   a(i) = a(i) + t

	 end if

       end do

       call s2()

     end

     If the loop were to be run in parallel, a problem would arise if the
     value of t were used inside subroutine s2(). Which processor's private
     copy of t should s2() use? If t were not conditionally assigned, the
     answer would be the processor that executed iteration n. But t is
     conditionally assigned and the auto-parallelizer cannot determine which
     copy to use.

     The loop is inherently parallel if the conditionally assigned variable t
     is localized. If the value of t is not used outside the loop, you should
     replace t with a local variable. Unless t is a local variable, the auto-
     parallelizer must assume that s2() might use it.
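
     A minimal sketch of that fix, assuming the value of t is needed only as
     scratch space within the loop (tl is a hypothetical local variable; the
     elided declarations and assignments are kept from the example above):

     subroutine s1(a,b)
       real tl
       ...
       do i = 1,n
	 if (b(i)) then
	   tl = ...
	   a(i) = a(i) + tl
	 end if
       end do
       call s2()
     end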

   Unanalyzable Pointer Usage in C/C++
     The C and C++ languages have features that make them more difficult than
     Fortran to automatically parallelize. Many of these features are related
     to the use of pointers. The following practices involving pointers
     interfere with the auto-parallelizer's effectiveness:

     Arbitrary Pointer Dereferences
	  The auto-parallelizer does not analyze arbitrary pointer
	  dereferences. The only pointers it analyzes are array references and
	  pointer dereferences that can be converted into array references.
	  The auto-parallelizer can subdivide the trees formed by
	  dereferencing arbitrary pointers and run the parts in parallel.
	  However, it cannot determine if the tree is really a directed graph
	  with an unsafe multiple reference. Therefore the parallelization is
	  not done.

     Arrays of Arrays
	  Multidimensional arrays are sometimes implemented as arrays of
	  arrays. Consider this example:

	  double **p;

	  for (int i = 0; i < n; i++)

	    for (int j = 0; j < n; j++)

	      p[i][j] =	 ...

	  If p is a true multi-dimensional array, the outer loop can be run
	  safely in parallel. If two of the array pointers, p[2] and p[3] for
	  example, reference the same array, the loop must not be run in
	  parallel. Although this duplicate reference is unlikely, the auto-
	  parallelizer cannot prove it doesn't exist. You can avoid this
	  problem by always using true arrays. To parallelize the code
	  fragment above, rewrite it as follows:

	  double p[n][n];

	  for (int i = 0; i < n; i++)

	    for (int j = 0; j < n; j++)

	      p[i][j] = ...

	  Note:	 Although ANSI C does not allow variable-sized multi-
	  dimensional arrays, there is a proposal to allow them in the next
	  standard. The MIPSpro 7.2 auto-parallelizer already implements this
	  proposal.

     Loops Bounded by Pointer Comparisons

	  The auto-parallelizer reorders only those loops in which the number
	  of iterations can be exactly determined. In Fortran programs this is
	  rarely a problem, but in C and C++ subtle issues relating to
	  overflow and unsigned arithmetic can come into play. One consequence
	  of this is that loops should not be bounded by pointer comparisons
	  such as

	  int *pl, *pu;

	  for (int *p = pl; p != pu; p++)

	  This loop cannot be made parallel, and compiling it will result in a
	  .l file entry stating the bound cannot be standardized. To avoid
	  this result, restructure the loop to be of the form

	  int lb, ub;

	  for (int i = lb; i <= ub; i++)

     Aliased Parameter Information
	  Perhaps the most frequent impediment to parallelizing C and C++ is
	  aliased parameter information. Although Fortran guarantees that
	  multiple parameters to a subroutine are not aliased to each other, C
	  and C++ do not. Consider the following example:

	  void sub(double *a, double *b, int n) {

	    for (int i = 0; i < n; i++)

	      a[i] = b[i];

	  }

	  This loop can be parallelized only if arrays a and b do not overlap.
	  With the option -OPT:alias=restrict, you can assure the auto-
	  parallelizer that the arrays do not overlap. This assurance permits
	  the auto-parallelizer to proceed with the parallelization. See the
	  MIPSpro Compiling and Performance Tuning Guide for details about
	  this option.
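
	  A hypothetical compile line using this option (the file name and
	  other options are illustrative):

	  %cc -O3 -n32 -mips4 -pca -OPT:alias=restrict -c sub.c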

   Incorrectly Parallelized Nested Loops
	  The auto-parallelizer parallelizes a loop by distributing its
	  iterations among the available processors.

	  Because the resulting performance is usually better, the auto-
	  parallelizer tries to parallelize the outermost loop.

	  If it cannot do so, probably for one of the reasons mentioned in the
	  previous section, it tries to interchange the outermost loop with an
	  inner one that it can parallelize.

	  Example: Nested Loops

	  do i = 1,n

	    do j = 1,n

	      ...

	    end do

	  end do

	  Even when most of your program is parallelized, it is possible that
	  the wrong loop is parallelized. Given a nest of loops, the auto-
	  parallelizer will only parallelize one of the loops in the nest. In
	  general, it is better to parallelize outer loops rather than inner
	  ones.

	  The auto-parallelizer will try to either parallelize the outer loop
	  or interchange the parallel loop so that it will be outermost, but
	  sometimes that is not possible. For any of the reasons mentioned in
	  the previous section, the auto-parallelizer might be able to
	  parallelize an inner loop but not the outer one. Even if this
	  results in most of your code being parallelized, it might be
	  advantageous to modify your code so that the outer loop is
	  parallelized.

	  It is better to parallelize loops that do not have very small trip
	  counts.  Consider the following example.

	  do i = 1,m

	    do j = 1,n

	  The auto-parallelizer may decide to parallelize the i loop, but if m
	  is very small, it would be better to interchange the j loop to be
	  outermost and then parallelize it. The auto-parallelizer might not
	  have any way to know that m is small. In such cases, the user can
	  either use the C*$* ASSERT DO PREFER directives discussed in the
	  next section to tell the auto-parallelizer that it is better to
	  parallelize the j loop, or use manual parallelism directives.
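
	  A minimal sketch of that hint, using the assertion described in the
	  next section (the bounds and loop body are kept schematic, as in
	  the fragment above):

	  do i = 1,m
	  C*$* ASSERT DO PREFER (CONCURRENT)
	    do j = 1,n
	      ...
	    end do
	  end do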

	  Because of memory hierarchies, performance can be improved if the
	  same processors access the same data in all parallel loop nests.
	  Consider the following two examples.

	  Example: Inefficient Loops

	  do i = 1,n

	    ...a(i)

	  end do

	  do i = n,1,-1

	    ...a(i)...

	  end do

	  Assume that there are p processors. In the first loop, the first
	  processor will access the first n/p elements of a, the second
	  processor will access the next n/p, and so on. In the second loop,
	  the first processor will access the last n/p elements of a. Assuming
	  n is not too large, those elements will be in the cache of a
	  different processor. Accessing data that is in some other
	  processor's cache can be very expensive. This example might run much
	  more efficiently if we reverse the direction of one of the loops.

	  Example: Two Loop Nests

	  do i = 1,n

	    do j = 1,n

	      a(i,j) = b(j,i) + ...

	    end do

	  end do

	  do i = 1,n

	    do j = 1,n

	      b(i,j) = a(j,i) + ...

	    end do

	  end do

	  In this second example, the auto-parallelizer might choose to
	  parallelize the outer loop in both nests. This means that in the
	  first loop the first processor is accessing the first n/p rows of a
	  and the first n/p columns of b, while in the second loop the first
	  processor is accessing the first n/p columns of a and the first n/p
	  rows of b. This example will run much more efficiently if we
	  parallelize the i loop in one nest and the j loop in the other. The
	  user can add the prefer directives described in the next section to
	  solve this problem.
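
	  A sketch of that change, using the prefer assertions from the next
	  section (the array contents are elided as above):

	  do i = 1,n
	    do j = 1,n
	      a(i,j) = b(j,i) + ...
	    end do
	  end do

	  C*$* ASSERT DO PREFER (SERIAL)
	  do i = 1,n
	  C*$* ASSERT DO PREFER (CONCURRENT)
	    do j = 1,n
	      b(i,j) = a(j,i) + ...
	    end do
	  end do

	  With the first nest parallelized over i and the second over j, each
	  processor touches the same rows of a and the same columns of b in
	  both nests.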

Unnecessarily Parallelized Loops
     The auto-parallelizer may parallelize loops that would run better
     sequentially. While this is usually not a disaster, it can cause
     unnecessary overhead. There is a certain overhead to running loops in
     parallel. If, for example, a loop has a small number of iterations, it is
     faster to execute the loop sequentially. When bounds are unknown (and
     even sometimes when they are known), the auto-parallelizer parallelizes
     loops conditionally. In other words, code is generated for both a
     parallel and a sequential version of the loop. The parallel version is
     executed only when the auto-parallelizer thinks that there is sufficient
     work for it to be worthwhile to execute the loop in parallel. This
     estimate depends on the iteration count, what code is inside the loop
     body, how many processors are available, and the auto-parallelizer's
     estimate of the overhead cost to invoke a parallel loop. The user can
     control the compiler's estimate of the invocation overhead using the
     option -LNO:parallel_overhead=n. The default value for n will vary on
     different systems, but typical values are in the low thousands.

     By generating two versions of the loop, we avoid going parallel in small
     trip count cases, but versioning does incur an overhead to do the dynamic
     check. The user can use the DO PREFER assertions to ensure that a loop
     goes parallel or sequential without incurring a run-time test.
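
     For example, a hypothetical compile line that raises the assumed
     invocation overhead, biasing the generated run-time test toward the
     sequential version (the value 5000 is illustrative):

     %f77 -O3 -n32 -mips4 -pfa -LNO:parallel_overhead=5000 -c foo.f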

     Nested parallelism is not supported. Consider the following case:

     subroutine caller

       do i

	 call sub

       end do

     subroutine sub

       ...

       do i

	 ..

       end do

     end

     Suppose that the first loop is parallelized. It is not possible to
     execute the loop inside sub in parallel whenever sub is called by caller.
     Thus the auto-parallelizer must generate a test for every parallel loop
     that checks whether the loop is being invoked from another parallel loop
     or region. While this check is not very expensive, in some cases it can
     add to overhead. If the user knows that sub is always called from caller,
     the user can use the prefer directives to force the loop in sub to go
     sequential.
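
     A minimal sketch of that workaround, keeping the schematic loop bounds of
     the fragment above:

     subroutine sub
       ...
     C*$* ASSERT DO PREFER (SERIAL)
       do i
	 ..
       end do
     end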

Assisting the Silicon Graphics Automatic Parallelizer
     This section discusses actions you can take to enhance the performance of
     the auto-parallelizer.

   Assisting the Automatic Parallelizer
     There are circumstances that interfere with the auto-parallelizer's
     ability to optimize programs. As shown in Parallelization Failures With
     the Automatic Parallelizer, problems are sometimes caused by coding
     practices. Other times, the auto-parallelizer does not have enough
     information to make good parallelization decisions. You can pursue three
     strategies to attack these problems and achieve better results with the
     auto-parallelizer.

     The first approach is to modify your code to avoid coding practices that
     the auto-parallelizer cannot analyze well.

     The second strategy is to assist the auto-parallelizer with the manual
     parallelization directives described in the MIPSpro Compiling and
     Performance Tuning Guide. The auto-parallelizer is designed to recognize
     and coexist with manual parallelism. You can use manual directives with
     some loop nests, while leaving others to the auto-parallelizer. This
     approach has both positive and negative aspects.

     On the positive side, the manual parallelism directives are well defined
     and deterministic. If you use a manual directive, the specified loop will
     run in parallel.

     Note:  This last statement assumes that the trip count is greater than
     one and that the specified loop is not nested in another parallel loop.

     On the negative side, you must carefully analyze the code to determine
     that parallelism is safe. Also, you must mark all variables that need to
     be localized.
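
     For illustration, a manual directive in the form shown in the .w2f.f
     listing earlier (the loop and the variable names are taken from that
     example):

     C$DOACROSS local(i), shared(a)
	   do i = 1, 10000
	     a(i) = 0.0
	   end do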

     The third alternative is to use the automatic parallelization directives
     and assertions to give the auto-parallelizer more information about your
     code. The automatic directives and assertions are described in Directives
     and Assertions for Automatic Parallelization. Like the manual directives,
     they have positive and negative features:

     On the positive side, automatic directives and assertions are easier to
     use and they allow you to express the information you know without your
     having to be certain that all the conditions for parallelization are met.

     On the negative side, they are hints and thus do not impose parallelism.
     In addition, as with the manual directives, you must ensure that you are
     using them legally. Because they require less information than the manual
     directives, automatic directives and assertions can have subtle meanings.

   Directives and Assertions for Automatic Parallelization
     Directives enable, disable, or modify features of the auto-parallelizer.
     Assertions assist the auto-parallelizer by providing it with additional
     information about the source program. The automatic directives and
     assertions do not impose parallelism; they give hints and assertions to
     the auto-parallelizer to assist it in parallelizing the right loops. To
     invoke a directive or assertion, include it in the input file.  Listed
     below are the Fortran directives and assertions for the auto-
     parallelizer.

     C*$* NO CONCURRENTIZE
	  Do not parallelize either a subroutine or file.

     C*$* CONCURRENTIZE
	  Not used. (See below.)

     C*$* ASSERT DO (CONCURRENT)
	  Ignore perceived dependences between two references to the same
	  array when parallelizing.

     C*$* ASSERT DO (SERIAL)
	  Do not parallelize the following loop.

     C*$* ASSERT CONCURRENT CALL
	  Ignore subroutine calls when parallelizing.

     C*$* ASSERT PERMUTATION (array_name)
	  Array array_name is a permutation array.

     C*$* ASSERT DO PREFER (CONCURRENT)
	  Parallelize the following loop if it is safe.

     C*$* ASSERT DO PREFER (SERIAL)
	  Do not parallelize the following loop.

	  Note:	 The general compiler option -LNO:ignore_pragmas causes the
	  auto-parallelizer to ignore all of these directives and assertions.

     C*$* NO CONCURRENTIZE
	  The C*$* NO CONCURRENTIZE directive prevents parallelization. Its
	  effect depends on where it is placed.

	  When placed inside a subroutine, the directive prevents the
	  parallelization of the subroutine. In the following example, SUB1()
	  is not parallelized.	Example:

		 SUBROUTINE SUB1

	  C*$* NO CONCURRENTIZE

		   ...

		 END

	  When placed outside of a subroutine, C*$* NO CONCURRENTIZE prevents
	  the parallelization of all the subroutines in the file. The
	  subroutines SUB2() and SUB3() are not parallelized in the next
	  example.  Example:

		 SUBROUTINE SUB2

		   ...

		 END

	  C*$* NO CONCURRENTIZE

		 SUBROUTINE SUB3

		   ...

		 END

	  The C*$* NO CONCURRENTIZE directive is valid only when the -pfa or
	  -pca command-line option is used.

     C*$* CONCURRENTIZE
	  The C*$* CONCURRENTIZE directive exists only to maintain backwards
	  compatibility, and its use is discouraged. Using the -pfa or -pca
	  option replaces using this directive.

     C*$* ASSERT DO (CONCURRENT)
	  C*$* ASSERT DO (CONCURRENT) says that when analyzing the loop
	  immediately following this assertion, the auto-parallelizer should
	  ignore any perceived dependences between two references to the same
	  array. The following example is a correct use of the assertion when
	  M > N.

	  Example:

	  C*$* ASSERT DO (CONCURRENT)

		 DO I = 1, N

		   A(I) = A(I+M)

	  This assertion is usually used to help the auto-parallelizer with
	  loops that have indirect array references. There are other facts to
	  be aware of when using this assertion.

	  If multiple loops in a nest can be parallelized, C*$* ASSERT DO
	  (CONCURRENT) causes the auto-parallelizer to prefer the loop
	  immediately following the assertion.	The assertion does not affect
	  how the auto-parallelizer analyzes CALL statements and dependences
	  between two potentially aliased pointers.

	  Note:	 If there are real dependences between array references, C*$*
	  ASSERT DO (CONCURRENT) may cause the auto-parallelizer to generate
	  incorrect code.

     C*$* ASSERT DO (SERIAL)
	  C*$* ASSERT DO (SERIAL) instructs the auto-parallelizer to not
	  parallelize the loop following the assertion.
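
	  For instance, a minimal sketch (the loop body is illustrative):

	  C*$* ASSERT DO (SERIAL)

		 DO I = 1, N

		   A(I) = A(I) + 1.0

		 END DO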

     C*$* ASSERT CONCURRENT CALL
	  The C*$* ASSERT CONCURRENT CALL assertion tells the auto-
	  parallelizer to ignore subroutine calls contained in a loop when
	  deciding if that loop is parallel. The assertion applies to the loop
	  that immediately follows it and to all loops nested inside that
	  loop. The auto-parallelizer ignores subroutine FRED() when it
	  analyzes the following loop.

	  C*$* ASSERT CONCURRENT CALL

		 DO I = 1, N

		   CALL FRED

		   ...

		 END DO

		 SUBROUTINE FRED

		   ...

		 END

	  To prevent incorrect parallelization, you must make sure the
	  following conditions are met when using C*$* ASSERT CONCURRENT CALL:

	  A subroutine inside the loop cannot read from a location that is
	  written to during another iteration. This rule does not apply to a
	  location that is a local variable declared inside the subroutine.

	  A subroutine inside the loop cannot write to a location that is read
	  from during another iteration. This rule does not apply to a
	  location that is a local variable declared inside the subroutine.

	  The following code shows an illegal use of the assertion. Subroutine
	  FRED() writes to variable T which is also read from by WILMA()
	  during other iterations.

	  C*$* ASSERT CONCURRENT CALL

		 DO I = 1,M

		   CALL FRED(B, I, T)

		   CALL WILMA(A, I, T)

		 END DO

		 SUBROUTINE FRED(B, I, T)

		   REAL B(*)

		   T = B(I)

		 END

		 SUBROUTINE WILMA(A, I, T)

		   REAL A(*)

		   A(I) = T

		 END

	  By localizing the variable T, you could manually parallelize the
	  above example safely. But, the auto-parallelizer does not know to
	  localize T, and it illegally parallelizes the loop because of the
	  assertion.

     C*$* ASSERT PERMUTATION (array_name)
	  C*$* ASSERT PERMUTATION tells the auto-parallelizer that array_name
	  is a permutation array: every element of the array has a distinct
	  value. Array B is asserted to be a permutation array in this
	  example.

	  Example:

	  C*$* ASSERT PERMUTATION (B)

		 DO I = 1, N

		   A(B(I)) = ...

		 END DO

	  As shown in the previous example, you can use this assertion to
	  parallelize loops that use arrays for indirect addressing. Without
	  this assertion, the auto-parallelizer is not able to determine that
	  the array elements used as indexes are distinct.

	  Note:	 The assertion does not require the permutation array to be
	  dense.

     C*$* ASSERT DO PREFER (CONCURRENT)
	  C*$* ASSERT DO PREFER (CONCURRENT) says that the auto-parallelizer
	  should parallelize the loop immediately following the assertion, if
	  it is safe to do so. The following code encourages the auto-
	  parallelizer to run the I loop in parallel.

	  C*$*ASSERT DO PREFER (CONCURRENT)

		 DO I = 1, M

		   DO J = 1, N

		     A(I,J) = B(I,J)

		   END DO

		   ...

		 END DO

	  When dealing with nested loops, follow these guidelines:

	  If the loop specified by this assertion is safe to parallelize, the
	  auto-parallelizer chooses it to parallelize, even if other loops in
	  the nest are safe.

	  If the specified loop is not safe, the auto-parallelizer chooses
	  another loop that is safe, usually the outermost.

	  This assertion can be applied to more than one loop in a nest. In
	  this case, the auto-parallelizer uses its heuristics to choose one
	  of the specified loops.

	  Note: C*$* ASSERT DO PREFER (CONCURRENT) is always safe to use. The
	  auto-parallelizer will not illegally parallelize a loop because of
	  this assertion.

     C*$* ASSERT DO PREFER (SERIAL)
	  The C*$* ASSERT DO PREFER (SERIAL) assertion requests the auto-
	  parallelizer not to parallelize the loop that immediately follows.
	  In the following case, the assertion requests that the J loop be run
	  serially.

		 DO I = 1, M

	  C*$*ASSERT DO PREFER (SERIAL)

		   DO J = 1, N

		     A(I,J) = B(I,J)

		   END DO

		   ...

		 END DO

	  When using C*$* ASSERT DO PREFER (SERIAL), note that the assertion
	  applies only to the loop directly after the assertion.
