mmci(5)								       mmci(5)

NAME
     mmci - Memory Management Control Interface

DESCRIPTION
     This document describes the concepts and interfaces provided by IRIX for
     fine tuning memory management policies for user applications.

   Policy Modules
     The ability of applications to control memory management becomes an
     essential feature in multiprocessors with a CCNUMA memory system
     architecture. For most applications, the operating system is capable of
     producing reasonable levels of locality via initial placement; however,
     in order to maximize performance, some applications may need fine tuned
     memory management policies.

     We provide a Memory Management Control Interface based on the
     specification of policies for different kinds of operations executed by
     the Virtual Memory Management System. Users are allowed to select a
     policy from a set of available policies for each one of these VM
     operations. Any portion of a virtual address space, down to the level of
     a page, may be connected to a specific policy via a Policy Module.

     A policy module or PM contains the policy methods used to handle each of
     the operations shown in the table below.

       MEMORY OPERATION		 POLICY			 DESCRIPTION
      _______________________________________________________________________
      Initial Allocation   Placement Policy	Determines what physical
						memory node to use when
						memory is allocated
			   Page Size Policy	Determines what virtual page
						size to use to map physical
						memory
			   Fallback Policy	Determines the relative
						importance between placement
						and page size
      _______________________________________________________________________
      Dynamic Relocation   Migration Policy	Determines the aggressiveness
						of memory migration
			   Replication Policy	(not implemented,
						retained for compatibility)
      _______________________________________________________________________
      Paging		   Paging Policy	(not implemented,
						retained for compatibility)

     When the operating system needs to execute an operation to manage a
     section of a process' address space, it uses the methods specified by the
     Policy Module connected (attached) to that section.

     To allocate a physical page, the VM system physical memory allocator
     first determines the page size to be used for the current allocation.
     This page size is acquired using a method provided by the Page Size
     Policy.

     Second, the physical memory allocator calls the method provided by the
     Placement Policy that determines where the page should be allocated from.
     On NUMA systems this method returns a handle identifying the node memory
     should be allocated from. On non-NUMA systems this is treated as a no-op.
     The Placement Policy is described in detail later in this document.

     Now, knowing both the page size and the source node, the physical memory
     allocator calls a per-node memory allocator specifying both parameters.
     If the system finds memory on this node that meets the page size
     requirement, the allocation operation finishes successfully; if not, the
     operation fails, and a fallback method specified by the Fallback Policy
     is called. The fallback method provided by this policy decides whether to
     try the same page size on a different node, a smaller page size on the
     same source node, sleep, or just fail.

     Which Fallback Policy to choose depends on the kind of memory access
     pattern an application exhibits. If the application tends to generate a
     tolerable or low level of cache misses, giving locality precedence over
     page size may make sense; on the other hand, if the application's
     working set is large but its cache behavior is reasonable, giving the
     page size higher precedence may make sense.

     Once a page has been placed, it stays on its source node until it is
     either migrated to a different node, or paged out and faulted back in.
     On Origin 2000/200 systems with dynamic migration enabled, migratability
     of a page is determined by the migration policy. For some applications
     that present a very uniform memory access pattern from beginning to end,
     initial placement may be sufficient and migration can be turned off; on
     the other hand, applications with phase changes may benefit from some
     level of dynamic migration, which has the effect of attracting memory to
     the nodes where it is being used.

     The current version of IRIX provides the policies shown in the table
     below.

	   POLICY TYPE		 POLICY NAME		  ARGUMENTS
	__________________________________________________________________
	Placement Policy     PlacementDefault	    Number Of Threads
			     PlacementFixed	    Memory Locality Domain
			     PlacementFirstTouch    No Arguments
			     PlacementRoundRobin    Roundrobin Mldset
			     PlacementThreadLocal   Application Mldset
			     PlacementCacheColor    Memory Locality Domain
	__________________________________________________________________
	Fallback Policy	     FallbackDefault	    No Arguments
			     FallbackLargepage	    No Arguments
			     FallbackLocal	    No Arguments
	__________________________________________________________________
	Replication Policy   ReplicationDefault	    No Arguments
			     ReplicationOne	    No Arguments
	__________________________________________________________________
	Migration Policy     MigrationDefault	    No Arguments
			     MigrationControl	    migr_policy_uparms_t
			     MigrationRefcnt	    No Arguments
	__________________________________________________________________
	Paging Policy	     PagingDefault	    No Arguments
	__________________________________________________________________
	Page Size Policy     -			    Page size
	__________________________________________________________________

     The following list briefly describes each policy.

     PlacementDefault	   This policy automatically creates and places an MLD
			   for every two processes in a process group on an
			   Origin 2000/200. For the Origin 3000 this policy
			   creates and places an MLD for every four processes
			   in a process group. The number of processes is
			   passed in as an argument.  Each process's
			   memory affinity link (memory affinity hint used by
			   the process scheduler) is automatically set to the
			   MLD created on behalf of the process. The MLD(s)
			   estimate a memory size hint based on the size of
			   the address space for the calling process.  Memory
			   is allocated by referencing the MLD being used as
			   the memory affinity link for the currently running
			   process. By using this policy the application does
			   not need to create and place MLD(s) or an MLDSET.

     PlacementFixed	   This policy requires a placed MLD to be passed as
			   an argument. All memory allocation is done using
			   the node where the MLD has been placed.

     PlacementFirstTouch   This policy starts with the creation of one MLD,
			   placing it on the node where creation happened. All
			   memory allocation is done using the node where the
			   MLD has been placed. For this policy to take
			   effect, thread affinity links must be established.
			   In order to set thread affinity links, one of two
			   methods must be employed.  The first method is to
			   set the address space default data policy to
			   PlacementFirstTouch by calling pm_setdefault() in
			   the parent thread prior to a fork or sproc call
			   (see Default Policy Module discussion).  The other
			   method is to use either the process_mldlink() or
			   process_cpulink() function call. This is done just
			   after the new threads are created but prior to
			   allocating memory.  If memory affinity links are
			   not established, then this policy module is
			   ineffective, and the memory allocation is
			   determined by the prior memory affinity link for
			   each thread.

     PlacementRoundRobin   This policy requires a placed MLDSET to be passed
			   as an argument. Memory allocation happens in a
			   round robin fashion over each one of the MLDs in
			   the MLDSET. The policy maintains a round robin
			   pointer that points to the next MLD to be used for
			   memory allocation, which is moved to point to the
			   next MLD in the MLDSET after every successful
			   memory allocation. Note that the round robin
			   operation is done in the time axis, not the space
			   axis.

     PlacementThreadLocal  This policy requires a placed MLDSET to be passed
			   as an argument.  For this policy to take effect,
			   thread affinity links must be established. For a
			   new thread created by a fork or sproc call, the
			   memory affinity link (the target memory node) for
			   each new thread is assigned the next MLD in the
			   MLDLIST in round robin fashion. The result is that
			   every new thread creation will likely obtain a
			   memory affinity link on a different node from the
			   previous thread, in circular fashion. All memory
			   allocation is done using the node where the MLD has
			   been placed.	 In order to set thread affinity
			   links, one of two methods must be employed.	The
			   first method is to set the address space default
			   data policy to PlacementThreadLocal by calling
			   pm_setdefault() in the parent thread prior to a
			   fork or sproc call (see Default Policy Module
			   discussion).	 The other method is to use either the
			   process_mldlink() or process_cpulink() function
			   call. This is done just after the new threads are
			   created but prior to allocating memory.  If memory
			   affinity links are not established, then this
			   policy module is ineffective, and the memory
			   allocation is determined by the prior memory
			   affinity link for each thread.

     PlacementCacheColor   This policy requires a placed MLD to be passed as
			   an argument.	 Memory is allocated using the node
			   where the MLD has been placed, with careful
			   attention to cache coloring relative to the Policy
			   Module instead of the global virtual address space.
			   It is not necessary to establish thread affinity
			   links for this policy module to be effective.

     FallbackDefault	   The default fallback policy has two possible
			   behaviors, depending on the large page wait timeout
			   value. If the page wait timeout value is zero, then
			   the use of large pages is considered to be
			   opportunistic.  This means that locality is given
			   priority over large pages.

			   If the large page wait timeout value is non-zero,
			   then the use of large pages is given priority over
			   locality. FallbackLargepage methodology is used
			   internally when the default policy has a non-zero
			   page wait timeout value. For
			   this case, see the description of FallbackLargepage
			   for details.

			   The remaining description covers fallback actions
			   for opportunistic requests for large pages.  If a
			   large page of the requested size is not available
			   on the requested node, the allocator first
			   attempts to find smaller page sizes that are
			   available on the requested node. If no smaller
			   large page sizes are configured or available on
			   the requested node, then base page sizes are
			   considered to fill the request on the requested
			   node.
			   The actions taken when memory is not available on
			   the requested node are:

			   - Try to allocate a base pagesize on other mlds in
			     this mldset (if an mldset is in effect).

			   - Then try to allocate a base pagesize on
			     neighboring nodes that are in the
			     effective_nodemask.

			   - Finally try to allocate a base pagesize on
			     neighboring nodes ignoring the
			     effective_nodemask, in effect all nodes in the
			     system.

     FallbackLargepage	   When this fallback policy is selected, we give
			   priority to the page size. We first try to allocate
			   a page of the requested size on a nearby node, then
			   try the next smaller page size on the requested
			   node and nearby neighbor nodes, and finally
			   fallback to a base page only if a large page of
			   this size or lower is not available on any node in
			   the system.
			   The actions taken when memory is not available on
			   the requested node are to consider groups of other
			   nodes using progressively broader criteria.  The
			   progression is:

			   - Try to allocate the requested pagesize on other
			     mlds in this mldset (if an mldset is in effect).

			   - Then try to allocate the requested pagesize on
			     neighboring nodes that are in the
			     effective_nodemask.

			   - Switch to the next lower large page size and
			     repeat the search sequence of local node, nodes
			     in the mldset, and remaining nodes of the
			     effective nodemask.

			   - If no large pages of any size are available, try
			     to allocate a default pagesize on the requested
			     node.

			   - Then try to allocate the default pagesize on
			     other mlds in this mldset (if available).

			   - Then try to allocate a default pagesize on
			     neighboring nodes that are in the
			     effective_nodemask.

			   - Then try to allocate the requested pagesize on
			     neighboring nodes ignoring the
			     effective_nodemask.

			   - Finally try to allocate a default pagesize on
			     neighboring nodes ignoring the
			     effective_nodemask, in effect all nodes in the
			     system.

     FallbackLocal	   The local fallback policy gives priority to
			   locality. We first try to allocate a base page
			   (16KB in Origin systems) on the requested node. If
			   no memory is available on that node, we borrow from
			   some close neighbor, following a spiral search
			   path. This policy is nearly identical to
			   FallbackDefault, and is kept for compatibility
			   reasons.
			   The actions taken when memory is not available on
			   the requested node are to consider groups of other
			   nodes using progressively broader criteria.  The
			   progression is:

			   - Try to allocate the requested pagesize on other
			     mlds in this mldset (if an mldset is in effect).

			   - Then try to allocate the requested pagesize on
			     neighboring nodes that are in the
			     effective_nodemask.

			   - Then try to allocate a default pagesize on the
			     requested node.

			   - Then try to allocate the default pagesize on
			     other mlds in this mldset (if available).

			   - Then try to allocate a default pagesize on
			     neighboring nodes that are in the
			     effective_nodemask.

			   - Then try to allocate the requested pagesize on
			     neighboring nodes ignoring the
			     effective_nodemask.

			   - Finally try to allocate a default pagesize on
			     neighboring nodes ignoring the
			     effective_nodemask, in effect all nodes in the
			     system.

     ReplicationDefault	   This policy has not been implemented. Applying the
			   ReplicationDefault policy has no effect.

     ReplicationOne	   This policy has not been implemented. Applying the
			   ReplicationOne policy has no effect.

     MigrationDefault	   This policy is only applicable to Origin 2000
			   systems. Any use of this policy on other platforms
			   has no effect.  When this default migration policy
			   is selected, migration behaves as explained in
			   migration(5) according to the tunable parameters
			   also described in migration(5).

     MigrationControl	   This policy is only applicable to Origin 2000
			   systems. Any use of this policy on other platforms
			   has no effect.  Users can select different
			   migration parameters when using this policy. It
			   takes an argument of type migr_policy_uparms_t
			   shown below.

			   typedef struct migr_policy_uparms {
				   __uint64_t  migr_base_enabled	 :1,
					       migr_base_threshold	 :8,
					       migr_freeze_enabled	 :1,
					       migr_freeze_threshold	 :8,
					       migr_melt_enabled	 :1,
					       migr_melt_threshold	 :8,
					       migr_enqonfail_enabled	 :1,
					       migr_dampening_enabled	 :1,
					       migr_dampening_factor	 :8,
					       migr_refcnt_enabled	 :1;
			   } migr_policy_uparms_t;

			   This structure allows users to override the default
			   migration parameters defined in
			   /var/sysgen/mtune/numa and described in
			   migration(5).

			   - migr_base_enabled enables (1) or disables (0)
			     migration.

			   - migr_base_threshold defines the migration
			     threshold.

			   - migr_freeze_enabled enables (1) or disables (0)
			     freezing.

			   - migr_freeze_threshold defines the freezing
			     threshold.

			   - migr_melt_enabled enables (1) or disables (0)
			     melting.

			   - migr_melt_threshold defines the melting
			     threshold.

			   - migr_enqonfail_enabled is a no-op for IRIX 6.5
			     and earlier.

			   - migr_dampening_enabled enables (1) or disables
			     (0) dampening.

			   - migr_dampening_factor defines the dampening
			     threshold.

			   - migr_refcnt_enabled enables (1) or disables (0)
			     extended reference counters.
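
			   As an illustration, a minimal sketch of enabling
			   migration with a custom threshold might look like
			   the following (the values are illustrative only;
			   <string.h> and <sys/pmo.h> are assumed to be
			   included):

			   migr_policy_uparms_t p;

			   /* start with everything disabled */
			   memset(&p, 0, sizeof(p));
			   p.migr_base_enabled	 = 1;  /* turn migration on */
			   p.migr_base_threshold = 60; /* example threshold */

			   /* later, when filling a policy_set_t (see the
			    * Creation of Policy Modules section below):
			    *	policy_set.migration_policy_name =
			    *		"MigrationControl";
			    *	policy_set.migration_policy_args = (void *)&p;
			    */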

     MigrationRefcnt	   This policy is only applicable to Origin 2000
			   systems. Any use of this policy on other platforms
			   has no effect.  This policy turns migration
			   completely off (for the associated section of
			   virtual address space) and enables the extended
			   reference counters.	No arguments are needed.

     PagingDefault	   There is no selectable paging policy. This name is
			   retained for compatibility reasons.

     Page Size		   Users can select any of the page sizes supported by
			   the IRIX kernel being used. For IRIX64 kernels the
			   allowed sizes are:  16KB, 64KB, 256KB, 1024KB
			   (1MB), 4096KB (4MB), and 16384KB (16MB).  For
			   IRIX32 kernels the page size is 4KB only. Selecting
			   any other page size will have no effect.

   Creation of Policy Modules
     A policy module can be created using the following Memory Management
     Control Interface call:

	  typedef struct policy_set {
		  char*	 placement_policy_name;
		  void*	 placement_policy_args;
		  char*	 fallback_policy_name;
		  void*	 fallback_policy_args;
		  char*	 replication_policy_name;
		  void*	 replication_policy_args;
		  char*	 migration_policy_name;
		  void*	 migration_policy_args;
		  char*	 paging_policy_name;
		  void*	 paging_policy_args;
		  size_t page_size;
		  short	 page_wait_timeout;
		  short	 policy_flags;
	  } policy_set_t;

	  pmo_handle_t pm_create(policy_set_t* policy_set);

     The policy_set_t structure contains all the data required to create a
     Policy Module. For each selectable policy listed above, this structure
     contains a field to specify the name of the selected policy and the list
     of possible arguments that the selected policy may require. The page size
     policy is the exception, for which the specification of the wanted page
     size suffices. Larger pages reduce TLB miss overhead and can improve the
     performance of applications with large working sets. Like other system
     resources, large pages are not guaranteed to be available when the
     application makes the request. The application has two choices: it can
     either wait for a specified timeout or fall back to a smaller page size.
     The page_wait_timeout field specifies the number of seconds a process
     may wait for a page of the requested size to become available. If the
     timeout value is zero, or if a page of the requested size is still not
     available after waiting for the specified timeout, the system uses a
     smaller page size.  The policy_flags field allows users to
     specify special behaviors that apply to all the policies that define a
     Policy Module. The only special behavior currently implemented forces the
     memory allocator to prioritize cache coloring over large pages and
     locality, and it can be selected using the flag POLICY_CACHE_COLOR_FIRST.
     For example:

	       policy_set.placement_policy_name = "PlacementFixed";
	       policy_set.placement_policy_args = (void *)mld_handle;
	       policy_set.fallback_policy_name = "FallbackDefault";
	       policy_set.fallback_policy_args = NULL;
	       policy_set.replication_policy_name = "ReplicationDefault";
	       policy_set.replication_policy_args = NULL;
	       policy_set.migration_policy_name = "MigrationDefault";
	       policy_set.migration_policy_args = NULL;
	       policy_set.paging_policy_name = "PagingDefault";
	       policy_set.paging_policy_args = NULL;
	       policy_set.page_size = PM_PAGESZ_DEFAULT;
	       policy_set.page_wait_timeout = 0;
	       policy_set.policy_flags = POLICY_CACHE_COLOR_FIRST;

     This example fills in the policy_set_t structure to create a PM with
     a placement policy called "PlacementFixed" which takes a Memory Locality
     Domain (MLD) as an argument. All other policies are set to be the default
     policies, including the page size. We also ask for cache coloring to be
     given precedence over locality.

     Since filling up this structure with mostly default values is a common
     operation, we provide a special call to pre-fill this structure with
     default values:

	       void pm_filldefault(policy_set_t* policy_set);

     The pm_create call returns a handle to the Policy Module just created, or
     a negative long integer in case of error, in which case errno is set to
     the corresponding error code. The handle returned by pm_create is of type
     pmo_handle_t. The acronym PMO stands for Policy Management Object. This
     type is common for all handles returned by all the Memory Management
     Control Interface calls. These handles are used to identify the different
     memory control objects created for an address space, much in the same way
     as file descriptors are used to identify open files or devices. Every
     address space contains one independent PMO table. A new table is created
     only when a process execs.
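
     For instance, a minimal sketch of creating a mostly-default Policy
     Module (the 64KB page size is illustrative; <sys/pmo.h> and <stdio.h>
     are assumed to be included) might look like:

	       policy_set_t policy_set;
	       pmo_handle_t pm_handle;

	       pm_filldefault(&policy_set);	  /* all policies default */
	       policy_set.page_size = 64 * 1024;  /* illustrative: 64KB pages */

	       pm_handle = pm_create(&policy_set);
	       if (pm_handle < 0)
		       perror("pm_create");	  /* errno holds the error code */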

     A simpler way to create a Policy Module is to use the restricted Policy
     Module creation call:

	       pmo_handle_t pm_create_simple(char* plac_name,
					     void* plac_args,
					     char* repl_name,
					     void* repl_args,
					     size_t page_size);

     This call allows for the specification of only the Placement Policy, the
     Replication Policy and the Page Size. Defaults are automatically chosen
     for the Fallback Policy, the Migration Policy, and the Paging Policy.
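
     A hedged example of the restricted call (assuming mld_handle is a
     handle to a placed MLD, obtained as described under Memory Locality
     Domains below) might be:

	       pmo_handle_t pm_handle;

	       pm_handle = pm_create_simple("PlacementFixed",
					    (void *)mld_handle,
					    "ReplicationDefault",
					    NULL,
					    PM_PAGESZ_DEFAULT);
	       if (pm_handle < 0)
		       perror("pm_create_simple");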

   Association of Virtual Address Space Sections
     The Memory Management Control Interface allows users to select different
     policies for different sections of a virtual address space, down to the
     granularity of a page. To associate a virtual address space section with
     a set of policies, users need to first create a Policy Module with the
     wanted policies, as described in the previous section, and then use the
     following MMCI call:

	  int pm_attach(pmo_handle_t pm_handle, void* base_addr,
			size_t length);

     The pm_handle identifies the Policy Module the user has previously
     created, base_addr is the base virtual address of the virtual address
     space section the user wants to associate to the set of policies, and
     length is the length of the section.

     All physical memory allocated on behalf of a virtual address space
     section with a newly attached policy module follows the policies
     specified by this policy module. Physical memory that has already been
     allocated is not affected until the page is either migrated or swapped
     out to disk and then brought back into memory.

     Only mappings that exist at the time of the call are affected. For
     example, if a file is later memory-mapped into a virtual address space
     section with which a policy module was previously associated via
     pm_attach, the new mapping uses the default policies rather than those
     specified by the pm_attach call.
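
     For example, a sketch that attaches a previously created Policy Module
     to an existing mapping (pm_handle, buf, and buf_len are hypothetical
     names from the caller's context) might be:

	       /* physical pages faulted in for this range from now on
		* follow the policies of pm_handle
		*/
	       if (pm_attach(pm_handle, buf, buf_len) < 0)
		       perror("pm_attach");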

   Default Policy Module
     A new Default Policy Module is created and inserted in the PMO Name Space
     every time a process execs. This Default PM is used to define memory
     management policies for all freshly created memory regions. This Default
     PM can be later overridden by users via the pm_attach MMCI call.  This
     Default Policy Module is created with the policies listed below:

     * PlacementDefault

     * FallbackDefault

     * ReplicationDefault

     * MigrationDefault

     * PagingDefault

     * Page size: 16KB

     * Flags: 0

     The Default Policy Module is used in the following situations:

     - At exec time, when we create the basic memory regions for the stack,
       text, and heap.

     - At fork time, when we create all the private memory regions.

     - At sproc time, when we create all the private memory regions (at least
       the stack when the complete address space is shared).

     - When mmapping a file or a device.

     - When growing the stack and we find that the stack's region has been
       removed by the user via munmap, or the user has done a setcontext,
       moving the stack to a new location.

     - When sbreaking and we find the user has removed the associated region
       using munmap, or the region was not growable, anonymous or copy-on-
       write.

     - When a process attaches a portion of the address space of a "monitored"
       process via procfs, and a new region needs to be created.

     - When a user attaches a SYSV shared memory region.

     The Default Policy Module is also stored in the per-process group PMO
     Name space, and therefore follows the same inheritance rules as all
     Policy Modules:  it is inherited at fork or sproc time, and a new one is
     created at exec time.

     Users can select a new default policy module for the stack, text, and
     heap:

	       pmo_handle_t
	       pm_setdefault(pmo_handle_t pm_handle, mem_type_t mem_type);

     The argument pm_handle is the handle returned by pm_create. The argument
     mem_type is used to identify the memory section the user wants to change
     the default policy module for, and it can take any of the following 3
     values:

     o MEM_STACK

     o MEM_TEXT

     o MEM_DATA

     Users can also obtain a handle to the default PM using the following
     call:

	       pmo_handle_t pm_getdefault(mem_type_t mem_type);

     This call returns a PMO handle referring to the calling process's address
     space default PM for the specified memory type. The handle is greater
     than or equal to zero when the call succeeds; it is less than zero when
     the call fails, and errno is set to the appropriate error code.
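
     As an illustration, a sketch that saves the current default PM for the
     data segment and installs a new one (pm_handle is assumed to come from
     an earlier pm_create) might be:

	       pmo_handle_t old_pm;

	       old_pm = pm_getdefault(MEM_DATA);
	       if (old_pm < 0)
		       perror("pm_getdefault");

	       if (pm_setdefault(pm_handle, MEM_DATA) < 0)
		       perror("pm_setdefault");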

   Destruction of a Policy Module
     Policy Modules are automatically destroyed when all the members of a
     process group or a shared group have died. However, users can explicitly
     ask the operating system to destroy Policy Modules that are no longer in
     use, using the following call:

	       int pm_destroy(pmo_handle_t pm_handle);

     The argument pm_handle is the handle returned by pm_create. Any
     association to this PM that already exists will remain effective, and the
     PM will only be destroyed when the section of the address space that is
     associated to this PM is also destroyed (unmapped), or when the
     association is overridden via a pm_attach call.

   Policy Status of an Address Space
     Users can obtain the list of policy modules currently associated to a
     section of a virtual address space using the following call:

	       typedef struct pmo_handle_list {
		       pmo_handle_t* handles;
		       uint	     length;
	       } pmo_handle_list_t;

	       int pm_getall(void* base_addr,
			     size_t length,
			     pmo_handle_list_t* pmo_handle_list);

     The argument base_addr is the base address for the section the user is
     inquiring about, length is the length of the section, and pmo_handle_list
     is a pointer to a list of handles as defined by the structure
     pmo_handle_list_t.

     On success, this call returns the effective number of PMs that are being
     used by the specified virtual address space range. If this number is
     greater than the size of the list to be used as a container for the PM
     handles, the user can infer that the specified virtual address space
     range is using more PMs than we can fit in the list. On failure, this
     call returns a negative integer, and errno is set to the corresponding
     error code.
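
     For example, a hedged sketch that inspects the PMs used by a buffer
     (buf and buf_len are hypothetical, the capacity of 16 handles is
     arbitrary, and <stdio.h> is assumed) might be:

	       pmo_handle_t handles[16];
	       pmo_handle_list_t list;
	       int npms;

	       list.handles = handles;
	       list.length  = 16;

	       npms = pm_getall(buf, buf_len, &list);
	       if (npms < 0)
		       perror("pm_getall");
	       else if (npms > (int)list.length)
		       printf("range uses more PMs than the list can hold\n");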

     Users also have read-only access to the internal details of a PM, using
     the following call:

	       typedef struct pm_stat {
		       char	    placement_policy_name[PM_NAME_SIZE + 1];
		       char	    fallback_policy_name[PM_NAME_SIZE + 1];
		       char	    replication_policy_name[PM_NAME_SIZE + 1];
		       char	    migration_policy_name[PM_NAME_SIZE + 1];
		       char	    paging_policy_name[PM_NAME_SIZE + 1];
		       size_t	    page_size;
		       int	    policy_flags;
		       pmo_handle_t pmo_handle;
	       } pm_stat_t;

	       int pm_getstat(pmo_handle_t pm_handle, pm_stat_t* pm_stat);

     The argument pm_handle identifies the PM the user needs information
     about, and pm_stat is an out parameter of the form defined by the
     structure pm_stat_t.  On success this call returns a non-negative
     integer, and the PM internal data in pm_stat. On error, the call returns
     a negative integer, and errno is set to the corresponding error code.
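
     A small sketch of querying a PM (pm_handle is assumed to be a valid
     handle and <stdio.h> to be included) might be:

	       pm_stat_t pminfo;

	       if (pm_getstat(pm_handle, &pminfo) < 0)
		       perror("pm_getstat");
	       else
		       printf("placement=%s page_size=%lu\n",
			      pminfo.placement_policy_name,
			      (unsigned long)pminfo.page_size);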

   Setting the Page Size
     Users can modify the page size of a PM using the following MMCI call:

	       int pm_setpagesize(pmo_handle_t pm_handle, size_t page_size);

     The argument pm_handle identifies the PM the user is changing the page
     size for, and the argument page_size is the requested page size. This
     call changes the page size of the specified PM, so that memory newly
     allocated for the sections of virtual address space attached to this PM
     uses the new page size.  On success this call returns a non-negative
     integer; on error, it returns a negative integer with errno set to the
     corresponding error code.
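
     For instance, a sketch requesting 1MB pages for memory subsequently
     allocated under an existing PM (the size is illustrative) might be:

	       if (pm_setpagesize(pm_handle, 1024 * 1024) < 0)
		       perror("pm_setpagesize");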

   Locality Management
     One of the most important goals of memory management for the Origin
     platforms is the maximization of locality. IRIX uses several mechanisms
     to manage locality:

     o IRIX schedules memory in such a way that applications can allocate
       large amounts of relatively close memory pages.

     o IRIX does topology aware initial memory placement.

     o IRIX provides a topology aware process scheduler that integrates cache
       and memory affinity into the scheduling algorithms.

     o IRIX allows and encourages application writers to provide initial
       placement hints, using high level tools, compiler directives, or direct
       system calls.

     o IRIX allows users to select different policies for the most important
       memory management operations.

     o For Origin 2000 systems only, IRIX implements dynamic memory migration
       to automatically attract memory to those processes that are making the
       heaviest use of a page of memory.

   The Placement Policy
     The Placement Policy defines the algorithm used by the physical memory
     allocator to decide what memory source to use to allocate a page in a
     multi-node CCNUMA machine. The goal of this algorithm is to place memory
     in such a way that local accesses are maximized.

     The optimal placement algorithm would have knowledge of the exact number
     of cache misses that will be caused by each thread sharing the page to be
     placed.  Using this knowledge, the algorithm would place the page on the
     node currently running the thread that will generate the most cache
     misses, assuming that the thread always stays on the same node.

     Unfortunately, we do not have perfect knowledge of the future. The
     algorithm has to be based on heuristics that predict the memory access
     patterns and cache misses on a page, or on user provided hints.

     All placement policies are based on two abstractions of physical memory
     nodes:

     o Memory Locality Domains (MLDs)

     o Memory Locality Domain Sets (MLDsets)

   Memory Locality Domains
     A Memory Locality Domain or MLD with center c and radius r is a source of
     physical memory composed of all memory nodes within a "hop distance" r of
     a center node c. Normally, MLDs have radius 0, representing a single
     node.

     MLDs may be interpreted as virtual memory nodes. Normally the application
     writer defining MLDs specifies the MLD radius, and lets the operating
     system decide where it will be centered. The operating system tries to
     choose a center according to current memory availability and other
     placement parameters that the user may have specified such as device
     affinity and topology.

     Users can create MLDs using the following MMCI call:

	       pmo_handle_t mld_create(int radius, long size);

     The argument radius defines the MLD radius, and the argument size is a
     hint specifying approximately how much physical memory will be required
     for this MLD.  On success this call returns a handle for the newly
     created MLD. On error, this call returns a negative long integer and
     errno is set to the corresponding error code.

     MLDs are not placed when they are created. The MLD handle returned by the
     constructor cannot be used until the MLD has been placed by making it
     part of an MLDset.

     Users can also destroy MLDs that are no longer in use, using the
     following call:

	       int mld_destroy(pmo_handle_t mld_handle);

     The argument mld_handle is the handle returned by mld_create. On success,
     this call returns a non-negative integer. On error it returns a negative
     integer and errno is set to the corresponding error code.

   Memory Locality Domain Sets
     Memory Locality Domain Sets or MLDsets address the issue of placement
     topology and device affinity.

     Users can create MLDsets using the following MMCI call:

	  pmo_handle_t mldset_create(pmo_handle_t* mldlist, int mldlist_len);

     The argument mldlist is an array of MLD handles containing all the MLDs
     the user wants to make part of the new MLDset, and the argument
     mldlist_len is the number of MLD handles in the array. On success, this
     call returns an MLDset handle. On error, this call returns a negative
     long integer and errno is set to the corresponding error code.

     This call only creates a basic MLDset without any placement information.
     In order to have the operating system place this MLDset, and therefore
     place all the MLDs that are now members of this MLDset, users have to
     specify the wanted MLDset topology and device affinity, using the
     following MMCI call:

	       int mldset_place(pmo_handle_t mldset_handle,
				topology_type_t topology_type,
				raff_info_t* rafflist,
				int rafflist_len,
				rqmode_t rqmode);

     The argument mldset_handle is the MLDset handle returned by
     mldset_create, and identifies the MLDset the user is placing. The
     argument topology_type specifies the topology the operating system should
     consider in order to place this MLDset, which can be one of the
     following:

     TOPOLOGY_FREE	  This topology specification lets the Operating
			  System decide what shape to use to allocate the set.
			  The Operating System will try to place this MLDset
			  on a cluster of physical nodes as compact as
			  possible, depending on the current system load.

     TOPOLOGY_CUBE	  This topology specification is used to request a
			  cube-like shape.

     TOPOLOGY_CUBE_FIXED  This topology specification is used to request a
			  physical cube.

     TOPOLOGY_PHYSNODES	  This topology specification is used to request that
			  the MLDs in an MLDset be placed in the exact
			  physical nodes enumerated in the device affinity
			  list, described below.

     TOPOLOGY_CPUCLUSTER  This topology specification is used to request the
			  placement of one MLD per CPU instead of the default
			  one MLD per node. On an Origin 3000, a fully
			  populated node has 4 CPUs, hence up to 4 MLDs can
			  be placed on each node. For a node with fewer than
			  the maximum number of CPUs, the number of MLDs
			  placed on that node will not exceed its actual
			  number of CPUs. Also if cpusets
			  are in use, the MLDs will be placed on nodes that
			  are part of the defined cpuset.  This topology is
			  useful when the placement policy is managing cache
			  coloring relative to MLDs instead of virtual memory
			  regions.

     The topology_type_t type shown below is defined in <sys/pmo.h>.

	  /*
	   * Topology types for mldsets
	   */
	  typedef enum {
		  TOPOLOGY_FREE,
		  TOPOLOGY_CUBE,
		  TOPOLOGY_CUBE_FIXED,
		  TOPOLOGY_PHYSNODES,
		  TOPOLOGY_CPUCLUSTER,
		  TOPOLOGY_LAST
	  } topology_type_t;

     The argument rafflist is used to specify resource affinity. It is an
     array of resource specifications using the structure shown below:

	  /*
	   * Specification of resource affinity.
	   * The resource is specified via a
	   * file system name (dev, file, etc).
	  */
	  typedef struct raff_info {
	       void* resource;
	       ushort reslen;
	       ushort restype;
	       ushort radius;
	       ushort attr;
	  } raff_info_t;

     The fields resource, reslen, and restype define the resource. The field
     resource is used to specify the name of the resource, the field reslen
     must always be set to the actual number of bytes the resource pointer
     points to, and the field restype specifies the kind of resource
     identification being used, which can be any of the following:

     RAFFIDT_NAME This resource identification type should be used for the
		  cases where a hardware graph path name is used to identify
		  the device.

     RAFFIDT_FD	  This resource identification type should be used for the
		  cases where a file descriptor is being used to identify the
		  device.

     The radius field defines the maximum distance from the actual resource
     at which the user would like the MLDset to be placed. The attr field
     specifies whether the user wants the MLDset to be placed close to or far
     from the resource:

     RAFFATTR_ATTRACTION The MLDset should be placed as close as possible to
			 the specified device.

     RAFFATTR_REPULSION	 The MLDset should be placed as far as possible from
			 the specified device.

     The argument rafflist_len in the mldset_place call specifies the number
     of raff structures the user is passing via rafflist. There must be at
     least as many raff structures passed as the size of the corresponding
     mldset or the operation will fail and EINVAL will be returned.

     Finally, the rqmode argument is used to specify whether the placement
     request is ADVISORY or MANDATORY:

	  /*
	   * Request types
	   */
	  typedef enum {
		  RQMODE_ADVISORY,
		  RQMODE_MANDATORY
	  } rqmode_t;

     The Operating System places the MLDset by finding a section of the
     machine that meets the requirements of topology, device affinity, and
     expected physical memory used.

     The mldset_place call returns a non-negative integer on success. On
     error, it returns a negative integer and errno is set to the
     corresponding error code.
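
     Putting the calls together, a hedged sketch that creates two
     single-node MLDs, groups them into an MLDset, and asks the operating
     system to place them (the 32MB size hints are illustrative, and it is
     assumed here that an empty rafflist is acceptable when no device
     affinity is requested) might look like:

	       pmo_handle_t mlds[2];
	       pmo_handle_t mldset;

	       mlds[0] = mld_create(0, 32 * 1024 * 1024); /* radius 0 */
	       mlds[1] = mld_create(0, 32 * 1024 * 1024);
	       if (mlds[0] < 0 || mlds[1] < 0)
		       perror("mld_create");

	       mldset = mldset_create(mlds, 2);
	       if (mldset < 0)
		       perror("mldset_create");

	       /* no resource affinity list (assumption); advisory request */
	       if (mldset_place(mldset, TOPOLOGY_FREE,
				NULL, 0, RQMODE_ADVISORY) < 0)
		       perror("mldset_place");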

     Users can destroy MLDsets using the following call:

	  int mldset_destroy(pmo_handle_t mldset_handle);

     The argument mldset_handle identifies the MLDset to be destroyed. On
     success, this call returns a non-negative integer. On error it returns a
     negative integer and errno is set to the corresponding error code.

   Linking Execution Threads to MLDs
     After creating MLDs and placing them using an MLDset, a user can create a
     Policy Module that makes use of these memory sources, and attach sections
     of a virtual address space to this Policy Module.

     We still need to make sure that the application threads will be executed
     on the nodes where we are allocating memory. To ensure this, users need
     to link threads to MLDs using the following call:

	  int process_mldlink(pid_t pid, pmo_handle_t mld_handle, rqmode_t rqmode);

     The argument pid is the pid of the process to be linked to the MLD
     specified by the argument mld_handle. The rqmode argument is not used and
     is normally RQMODE_ADVISORY.  On success this call will return a non-
     negative integer.	On error it returns a negative integer and errno is
     set to the corresponding error code.

     This call sets up a hint for the process scheduler. However, the process
     scheduler is not required to always run the process on the node specified
     by the mld. The scheduler may decide to temporarily use different cpus in
     different nodes to execute threads to maximize resource utilization.
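
     For example, a sketch in which each newly created thread links itself to
     one of the MLDs from the previous sketch (the index i and the mlds array
     are illustrative; <unistd.h> is assumed for getpid()) might be:

	       /* executed inside thread i, right after sproc() and
		* before the thread allocates memory
		*/
	       if (process_mldlink(getpid(), mlds[i % 2], RQMODE_ADVISORY) < 0)
		       perror("process_mldlink");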

   Name Spaces For Memory Management Control
     Memory management control relies on two name spaces:

     o The Policy Name Space. This is a global system name space that contains
       all the policies that have been exported and therefore are available to
       users.  The domain of this name space is the set of exported policy
       names, strings of characters such as "PlacementDefault", and its range
       is the corresponding set of policy constructors. When a user creates a
       policy module, he or she has to specify each selectable policy by name.
       Internally, the operating system searches for each
       name in the Policy Name Space, thereby getting hold of the constructors
       for each of the specified policies, which are used to initialize the
       actual internal policy modules.

     o The Policy Management Object Name Space. This is a per-process-group
       name space (shared for sprocs, not shared for forks) used to store
       handles for all the Policy Management Objects that have been created
       within the context of any of the members of the process group. The
       domain of this name space is the set of Policy Management Object (PMO)
       handles and its range is the set of references (internal kernel
       pointers) to the PMOs.

       PMO handles can refer to any of several kinds of Policy Management
       Objects:

       - Policy Modules

       - Memory Locality Domains (MLDs)

       - Memory Locality Domain Sets (MLDsets)

     The PMO Name Space is inherited at fork or sproc time, and created at
     exec time.

SEE ALSO
     numa(5), migration(5), mtune(4), /var/sysgen/mtune/numa, refcnt(5),
     nstats(1), sn(1), topology(1), mld(3c), mldset(3c), pm(3c),
     migration(3c), pminfo(3c), numa_view(1), dplace(1), dprof(1).
