3.7. Claims Accounting

Note

Claims accounting state is only updated while holding heap_lock. See Locking of claims accounting for details on the locks used to protect claims accounting state.

This section formalizes the internal state and invariants that Xen must maintain to ensure correctness.

For readers following the design in order, the preceding sections are:

  1. Claims Design introduces the overall model and goals.

  2. Claim Installation Paths explains how claim sets are installed.

  3. Protection of Claims describes how claimed memory is protected during allocation.

  4. Redeeming Claims explains how claims are redeemed as allocations succeed.

3.7.1. Overview

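Table 1: Claims accounting. All accesses, aggregate state, and invariants are protected by heap_lock.

+--------------+-------------------------------------------------+----------------------------------------+
| Level        | Claims (must be lower or equal to)              | Available memory                       |
+==============+=================================================+========================================+
| Node         | node_outstanding_claims[node]                   | node_avail_pages[node]                 |
|              |                                                 |                                        |
|              | Aggregate state, over all domains:              |                                        |
|              | SUM(domain.claims[node])                        |                                        |
+--------------+-------------------------------------------------+----------------------------------------+
| Global       | outstanding_claims                              | total_avail_pages                      |
|              |                                                 |                                        |
|              | Aggregate state, SUM() over all domains:        | Aggregate state, SUM() over all nodes: |
|              | domain.global_claims + domain.node_claims       | node_avail_pages[]                     |
|              |                                                 |                                        |
|              | Also, the sum over all nodes:                   |                                        |
|              | node_outstanding_claims[*]                      |                                        |
+--------------+-------------------------------------------------+----------------------------------------+
| Dom global   | domain.global_claims                            | total_avail_pages                      |
+--------------+-------------------------------------------------+----------------------------------------+
| Dom per-node | domain.claims[node]                             | node_avail_pages[node]                 |
+--------------+-------------------------------------------------+----------------------------------------+
| Dom slow tot | domain.global_claims + SUM(domain.claims[node]) | total_avail_pages                      |
+--------------+-------------------------------------------------+----------------------------------------+
| Aggregate    | domain.node_claims = SUM(domain.claims[node])   |                                        |
+--------------+-------------------------------------------------+----------------------------------------+
| Domain total | domain.global_claims + domain.node_claims       | total_avail_pages                      |
+--------------+-------------------------------------------------+----------------------------------------+
| Domain mem   | domain_tot_pages(domain)                        | domain.max_pages                       |
|              | Invariant: must be lower or equal to            |                                        |
+--------------+-------------------------------------------------+----------------------------------------+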

3.7.2. Claims accounting state

Each domain carries the claims state listed below. When an allocation redeems claims, the page allocator deducts the allocated pages from the domain's per-node claim first and, if that is not sufficient to cover the allocation, from the domain's global claim as a fallback. See Redeeming Claims for details on redeeming claims during allocation.

domain.claims[MAX_NUMNODES]

The domain’s claims for specific NUMA nodes, indexed by node ID.

domain.global_claims

The domain’s global claim.
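The deduction order described above (per-node claim first, global claim as a fallback) can be sketched as follows. This is only an illustration under stated assumptions: `struct claims_state` is a simplified stand-in for the claims fields of struct domain, `redeem_claims()` is a hypothetical helper name, and the host-wide aggregate counters are deliberately left out.

```c
#include <assert.h>

#define MAX_NUMNODES 64 /* assumed value for this sketch */

/* Simplified stand-in for the claims fields of struct domain. */
struct claims_state {
    unsigned int global_claims;
    unsigned int node_claims;          /* sum of claims[] */
    unsigned int claims[MAX_NUMNODES]; /* per-node claims */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical helper: redeem claims for an allocation of 'pages' pages
 * on 'node'. The per-node claim is consumed first; whatever it does not
 * cover is taken from the global claim. Returns the number of pages of
 * the allocation that were covered by claims.
 */
static unsigned int redeem_claims(struct claims_state *d, unsigned int node,
                                  unsigned int pages)
{
    unsigned int from_node, from_global;

    assert(node < MAX_NUMNODES);

    from_node = min_uint(pages, d->claims[node]);
    d->claims[node] -= from_node;
    d->node_claims -= from_node;

    from_global = min_uint(pages - from_node, d->global_claims);
    d->global_claims -= from_global;

    return from_node + from_global;
}
```

In this sketch an allocation larger than the remaining claims is simply covered in part; how the real allocator treats the uncovered remainder is described in Redeeming Claims, not here.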

3.7.3. Aggregate state

Xen also maintains aggregate state for fast checks in allocator hot paths:

outstanding_claims:

The sum of all claims, both global and per-node, across all domains.

node_outstanding_claims[MAX_NUMNODES]:

The sum of all claims across all domains for specific NUMA nodes, indexed by node ID, used for efficient checks in the allocator hot paths to ensure that node claims do not exceed the available memory on the respective node.

domain.node_claims:

The total of the domain’s node claims, equal to the sum of domain.claims[node] over all nodes. It is used for efficient checks in the allocator.

domain_tot_pages(domain)

The total pages allocated to the domain. It is used for validating that the domain’s allocated pages together with its claims, i.e. domain.global_claims + domain.node_claims, do not exceed the domain’s maximum page limit, domain.max_pages.
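A minimal sketch of how the aggregates could be kept in step with the per-domain state is shown below. The helper names `set_node_claim()` and `set_global_claim()` and the simplified `struct claims_state` are assumptions made for illustration; the point is only that every change to a per-domain value is applied as the same delta to each aggregate that includes it. Callers are assumed to hold heap_lock.

```c
#include <assert.h>

#define MAX_NUMNODES 64 /* assumed value for this sketch */

/* Host-wide aggregates, as named in the text. */
static long outstanding_claims;
static long node_outstanding_claims[MAX_NUMNODES];

/* Simplified stand-in for the claims fields of struct domain. */
struct claims_state {
    unsigned int global_claims;
    unsigned int node_claims;          /* sum of claims[] */
    unsigned int claims[MAX_NUMNODES]; /* per-node claims */
};

/* Hypothetical helper: set the domain's claim on one node. */
static void set_node_claim(struct claims_state *d, unsigned int node,
                           unsigned int pages)
{
    long delta;

    assert(node < MAX_NUMNODES);

    delta = (long)pages - (long)d->claims[node];

    d->claims[node] = pages;
    /* The per-domain and host-wide aggregates move by the same delta. */
    d->node_claims = (unsigned int)((long)d->node_claims + delta);
    node_outstanding_claims[node] += delta;
    outstanding_claims += delta;
}

/* Hypothetical helper: set the domain's global claim. */
static void set_global_claim(struct claims_state *d, unsigned int pages)
{
    /* Global claims only contribute to the host-wide total. */
    outstanding_claims += (long)pages - (long)d->global_claims;
    d->global_claims = pages;
}
```

Validation against the available memory is omitted here; the sketch only shows the bookkeeping that lets hot-path checks read a single counter instead of summing over all domains.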

3.7.4. Claims accounting invariants

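Xen must maintain the invariants summarized in Table 1 of the Overview: at every level, the claims must be lower or equal to the corresponding available memory, and each aggregate must equal the sum it stands for.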
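As a rough illustration, the relations summarized in Table 1 can be written as a consistency check over a snapshot of the state. The structures and the function `claims_invariants_hold()` are hypothetical and exist only for this sketch; `tot_pages` stands in for domain_tot_pages(domain), and MAX_NUMNODES is kept small for readability.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NUMNODES 4 /* kept small for the sketch; not the Xen default */

/* Hypothetical snapshot of the host-wide state, taken under heap_lock. */
struct host_state {
    long total_avail_pages;
    long outstanding_claims;
    long node_avail_pages[MAX_NUMNODES];
    long node_outstanding_claims[MAX_NUMNODES];
};

/* Hypothetical snapshot of one domain's claims state. */
struct dom_state {
    unsigned int global_claims;
    unsigned int node_claims;
    unsigned int claims[MAX_NUMNODES];
    unsigned int tot_pages; /* stands in for domain_tot_pages(domain) */
    unsigned int max_pages;
};

static bool claims_invariants_hold(const struct host_state *h,
                                   const struct dom_state *doms,
                                   size_t nr_doms)
{
    long node_sum[MAX_NUMNODES] = { 0 };
    long claims_sum = 0, avail_sum = 0;
    unsigned int n;
    size_t i;

    for (i = 0; i < nr_doms; i++) {
        const struct dom_state *d = &doms[i];
        long dom_node_sum = 0;

        for (n = 0; n < MAX_NUMNODES; n++) {
            /* Dom per-node: claims[node] <= node_avail_pages[node] */
            if ((long)d->claims[n] > h->node_avail_pages[n])
                return false;
            dom_node_sum += d->claims[n];
            node_sum[n] += d->claims[n];
        }

        /* Aggregate: node_claims == SUM(claims[node]) */
        if ((long)d->node_claims != dom_node_sum)
            return false;
        /* Domain total: global + node claims <= total_avail_pages */
        if ((long)d->global_claims + dom_node_sum > h->total_avail_pages)
            return false;
        /* Domain mem: allocated pages <= max_pages */
        if (d->tot_pages > d->max_pages)
            return false;

        claims_sum += (long)d->global_claims + dom_node_sum;
    }

    for (n = 0; n < MAX_NUMNODES; n++) {
        /* Node: aggregate matches and fits into the node's memory. */
        if (h->node_outstanding_claims[n] != node_sum[n] ||
            node_sum[n] > h->node_avail_pages[n])
            return false;
        avail_sum += h->node_avail_pages[n];
    }

    /* Global: aggregates match and claims fit into available memory. */
    return h->total_avail_pages == avail_sum &&
           h->outstanding_claims == claims_sum &&
           claims_sum <= h->total_avail_pages;
}
```

A check of this kind walks every domain and node, so it is only suitable for debugging or testing; the aggregates exist precisely so that hot paths do not need such a walk.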

3.7.5. Locking of claims accounting

rspinlock_t page_alloc_lock
spinlock_t heap_lock

Lock for all heap operations including claims. It protects the claims state and invariants from concurrent updates and ensures that checks in the allocator hot paths see a consistent view of the claims state.

If domain.page_alloc_lock is needed to check domain_tot_pages(domain) plus new claims against domain.max_pages, it must be taken before heap_lock. Keeping this order consistent avoids deadlocks.
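The lock order can be illustrated with mock locks that only record whether they are held. Everything in this sketch is an assumption made for illustration: `struct mock_lock` is not a Xen type (and, unlike the real page_alloc_lock, it is not recursive), `install_global_claim()` is a hypothetical helper, and the check "allocated pages plus claims must not exceed max_pages" is one reading of the text above. Updates of the host-wide aggregates are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock lock that only tracks ownership; not a real spinlock. */
struct mock_lock {
    bool held;
};

static struct mock_lock heap_lock;

/* Simplified stand-in for the relevant fields of struct domain. */
struct dom {
    struct mock_lock page_alloc_lock;
    unsigned int tot_pages; /* stands in for domain_tot_pages(domain) */
    unsigned int max_pages;
    unsigned int global_claims;
    unsigned int node_claims;
};

static void mock_lock(struct mock_lock *l)
{
    assert(!l->held);
    l->held = true;
}

static void mock_unlock(struct mock_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Hypothetical helper: install a new global claim for the domain. */
static bool install_global_claim(struct dom *d, unsigned int pages)
{
    bool ok;

    /* page_alloc_lock must never be taken while heap_lock is held. */
    assert(!heap_lock.held);

    mock_lock(&d->page_alloc_lock); /* first: per-domain lock */
    mock_lock(&heap_lock);          /* second: heap lock */

    /* Assumed check: allocated pages plus all claims fit in max_pages. */
    ok = (unsigned long long)d->tot_pages + d->node_claims + pages
         <= d->max_pages;
    if (ok)
        d->global_claims = pages;

    /* Release in the reverse order of acquisition. */
    mock_unlock(&heap_lock);
    mock_unlock(&d->page_alloc_lock);

    return ok;
}
```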

3.7.6. Variables and data structures

typedef uint8_t nodeid_t

Type for NUMA node IDs. The memflags variable of xc_populate_physmap() and related functions for populating the physmap reserves 8 bits of the flags for the node ID. This limits the theoretical maximum value of CONFIG_NR_NUMA_NODES to 254, which is far beyond the current maximum of 64 supported by Xen and should be sufficient for the foreseeable future.

MAX_NUMNODES

The maximum number of NUMA nodes supported by Xen. It is used for validating node IDs in the memory_claim_t entries of claim sets and determines the size of the domain.claims[MAX_NUMNODES] array. When Xen is built without NUMA support, it is 1. The default on x86_64 is 64, which is sufficient for current hardware and allows bitmaps such as node_online_map and domain.node_affinity to be stored in a single 64-bit value.

xen/arch/Kconfig limits the maximum number of NUMA nodes to 64. While the node ID encoding would allow Xen to be compiled for up to 254 nodes, configuring machines to split the installed memory into more than 64 nodes would be unusual. For example, dual-socket servers, even when using multiple chips per CPU package, should typically be configured for 2 NUMA nodes by default.

long total_avail_pages

Total available pages in the system, including both free and claimed pages. This is used for validating that global claims do not exceed the total available memory in the system.

long outstanding_claims

The total of all claims, global and per-node, across all domains. It is maintained for efficient checks in the allocator hot paths, which must ensure that the global claims invariant holds: total claims must not exceed the total available memory.

long node_avail_pages[MAX_NUMNODES]

Available pages for each NUMA node, including both free and claimed pages. This is used for validating that node claims do not exceed the available memory on the respective NUMA node.

long node_outstanding_claims[MAX_NUMNODES]

The total claims across all domains for each NUMA node, indexed by node ID. This is maintained for efficient checks in the allocator hot paths.

domain_tot_pages(domain)

The total pages allocated to the domain, used for validating that this allocation and the domain’s claims do not exceed domain.max_pages.

struct domain
unsigned int global_claims

The domain’s global claim, representing the number of pages claimed globally for the domain.

unsigned int node_claims

The total of the domain’s node claims, equal to the sum of claims for all nodes. It is maintained for efficient checks in the allocator hot paths without needing to sum over the per-node claims each time.

unsigned int claims[MAX_NUMNODES]

The domain’s claims for each NUMA node, indexed by node ID.

As struct domain is allocated using a dedicated page for each domain, the array can be stored inline and indexed directly, without a separate allocation that would consume additional memory.

The page allocated for struct domain is large enough to accommodate this array several times, even beyond the current MAX_NUMNODES limit of 64, so it should be sufficient even for future expansion of the maximum number of supported NUMA nodes if needed. The allocation has a build-time assertion for safety to ensure that struct domain fits within the allocated page.

The sum of these claims is stored in domain.node_claims for efficient checks in the allocator hot paths which need to know the total number of node claims for the domain.
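The size argument can be illustrated with a compile-time check. This is a sketch under assumptions: `struct domain_sketch` contains only the claims fields (the real struct domain has many more members), PAGE_SIZE is assumed to be 4096 bytes, and C11 `_Static_assert` stands in for the build-time assertion mentioned above.

```c
#include <stddef.h>

#define PAGE_SIZE    4096 /* assumed page size for this sketch */
#define MAX_NUMNODES 64   /* assumed value for this sketch */

/* Only the claims-related fields; the real struct domain is far larger. */
struct domain_sketch {
    unsigned int global_claims;
    unsigned int node_claims;          /* sum of claims[] */
    unsigned int claims[MAX_NUMNODES]; /* indexed directly by node ID */
    unsigned int max_pages;
};

/* The build fails if the structure outgrows its dedicated page. */
_Static_assert(sizeof(struct domain_sketch) <= PAGE_SIZE,
               "struct domain_sketch must fit within one page");

/* How many copies of the claims[] array would fit into one page. */
static size_t claims_array_copies_per_page(void)
{
    return PAGE_SIZE / sizeof(((struct domain_sketch *)0)->claims);
}
```

With a 4-byte unsigned int, the array for 64 nodes occupies 256 bytes, a small fraction of a 4096-byte page, which matches the statement that the page could accommodate the array several times.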

unsigned int max_pages

The maximum number of pages the domain is allowed to own, set at domain creation time. The domain’s allocated pages together with its claims must not exceed it.

rspinlock_t page_alloc_lock

Lock for checking domain_tot_pages(domain) plus new claims against domain.max_pages when installing these new claims. It is a recursive spinlock, which allows nested calls into the allocator while holding it, such as when redeeming claims during page allocation. When installing claims, it is taken before heap_lock to ensure a consistent locking order. It must not be taken while holding heap_lock, to avoid deadlocks.

nodemask_t node_affinity

The set of NUMA nodes the domain is affine to. The allocator hot paths use it to quickly determine the nodes to prefer for the domain’s memory allocations.

type nodemask_t

A bitmap representing a set of NUMA nodes. It is used for status information such as node_online_map, which tracks the online nodes, and domain.node_affinity, which tracks the nodes in a domain’s node affinity.

nodemask_t node_online_map

A bitmap representing which NUMA nodes are currently online in the system. This is used for validating that claims are only made for online nodes and for efficient checks in the allocator hot paths to quickly determine which nodes are online. Currently, Xen does not support hotplug of NUMA nodes, so this is set at boot time based on the platform firmware configuration and does not change at runtime.
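A minimal sketch of a node bitmap and of the "claims only for online nodes" check is shown below. It assumes at most 64 nodes so that the set fits into one 64-bit value; `nodemask_sketch_t` and the helper functions are simplified, hypothetical stand-ins and not the Xen nodemask API.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_NUMNODES 64 /* assumed value; the set fits in one 64-bit word */

/* Simplified stand-in for nodemask_t: one bit per NUMA node. */
typedef struct {
    uint64_t bits;
} nodemask_sketch_t;

/* Mark a node as a member of the set; out-of-range IDs are ignored. */
static void node_set_sketch(unsigned int node, nodemask_sketch_t *mask)
{
    if (node < MAX_NUMNODES)
        mask->bits |= UINT64_C(1) << node;
}

/* Test whether a node is a member of the set. */
static bool node_isset_sketch(unsigned int node,
                              const nodemask_sketch_t *mask)
{
    return node < MAX_NUMNODES && ((mask->bits >> node) & 1);
}

/*
 * Hypothetical validation step: a claim entry is only acceptable if its
 * node ID is in range and the node is online.
 */
static bool claim_node_valid(unsigned int node,
                             const nodemask_sketch_t *online_map)
{
    return node_isset_sketch(node, online_map);
}
```

The range check comes before the shift on purpose: shifting a 64-bit value by 64 or more is undefined behaviour in C, so an out-of-range node ID must be rejected first.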

3.7.7. Claims Accounting Diagram

This diagram illustrates the claims accounting state and the invariants:

%% SPDX-License-Identifier: CC-BY-4.0
%% Claim variables and their Invariants
flowchart TD

subgraph "Access&nbsp;under&nbsp;the&nbsp;<tt><b>heap_lock</b></tt>&nbsp;only:"
   direction TB
   Memory_of_Nodes --"&nbsp; Contribute to &nbsp;"--> Overall_Memory
   Overall_Memory --"&nbsp; Available to &nbsp;"--> Memory_of_Domains
end

subgraph Memory_of_Nodes["Per-node claims and available memory"]
    direction LR
    per_node_claims -->|"&nbsp; less or equal to &nbsp;"| node_avail_pages
    per_node_claims["Claims on the node:
                     <tt>node_outstanding_claims[n]"]
    node_avail_pages["Available pages on the node:
                      <tt>node_avail_pages[n]"]
end

subgraph Overall_Memory["Overall claims and available memory"]
    direction LR
    outstanding -->|"&nbsp; less or equal to &nbsp;"| avail_pages
    outstanding["Total claims on the host:
                 <tt>outstanding_claims"]
    avail_pages["Available pages on the host:
                 <tt>total_avail_pages"]
end

subgraph Memory_of_Domains["Per-domain&nbsp;claims and available memory"]
    direction LR
    claims -->|"&nbsp; less or equal to &nbsp;"| available_memory_for_domains
    claims["Claims of the domain:<br><tt>d->claims[n]
                                         d->global_claims"]
    available_memory_for_domains["Available pages:<br><tt>node_avail_pages[n]
                                                          total_avail_pages"]
end
    

Diagram: Claims accounting state and invariants