3.8. Claims Implementation

Note

This part describes implementation details of claims and their interaction with memory allocation in Xen. It covers the functions and data structures involved in installing claims, allocating memory with claims, and handling related edge cases.

The following sections document the functions that implement claims and their interaction with memory allocation.

3.8.1. Installation of claims

This section describes the functions and data structures involved in installing claims for domains and the internal functions for validating and installing claim sets.

3.8.1.1. xc_domain_claim_memory()

int xc_domain_claim_memory(xc_interface *xch, uint32_t domid, uint32_t nr_claims, memory_claim_t *claims)
Parameters:
  • xch (xc_interface*) – The libxenctrl interface to use for the hypercall

  • domid (uint32_t) – The ID of the domain for which to install the claim set

  • nr_claims (uint32_t) – The number of claims in the claim set

  • claims (memory_claim_t*) – The claim set to install for the domain

Returns:

0 on success, or a negative error code on failure.

Wrapper for XEN_DOMCTL_claim_memory to install claim sets for a domain.

3.8.1.2. domain_set_outstanding_pages()

int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
Parameters:
  • d (struct domain*) – The domain for which to set the outstanding claims

  • pages (unsigned long) – The number of pages to claim globally for the domain

Returns:

0 on success, or a negative error code on failure.

Handles claim installation for XENMEM_claim_pages and XEN_DOMCTL_claim_memory with XEN_DOMCTL_CLAIM_MEMORY_LEGACY by setting the domain’s global claims to the specified number of pages. It calculates the claims as the requested pages minus the domain’s total pages. When pages == 0, it clears the claims of the domain.
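The calculation described above can be sketched as follows. This is a minimal model under stated assumptions, not the Xen implementation: `struct legacy_domain`, its fields, and `legacy_set_outstanding_pages()` are hypothetical names, the `-EINVAL` checks are assumptions made for the sketch, and the check against globally available memory is left out.

```c
#include <errno.h>

/* Hypothetical, simplified model of a domain's accounting state. */
struct legacy_domain {
    unsigned long tot_pages;     /* pages currently allocated to the domain */
    unsigned long max_pages;     /* upper limit on the domain's pages */
    unsigned long global_claims; /* outstanding global claim, in pages */
};

/*
 * Model of the legacy claim calculation: the claim covers only the pages
 * the domain does not own yet. The -EINVAL conditions are assumptions.
 */
static int legacy_set_outstanding_pages(struct legacy_domain *d,
                                        unsigned long pages)
{
    if (pages == 0) {
        d->global_claims = 0;    /* pages == 0 clears the claim */
        return 0;
    }

    /* Assumed: reject requests already covered or above the limit. */
    if (pages <= d->tot_pages || pages > d->max_pages)
        return -EINVAL;

    d->global_claims = pages - d->tot_pages;
    return 0;
}
```

In this model, a domain that already owns 100 pages and requests a claim of 400 pages ends up with an outstanding claim of 300 pages.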

3.8.1.3. domain_set_node_claims()

int domain_set_node_claims(struct domain *d, unsigned int nr_claims, memory_claim_t *claims)
Parameters:
  • d (struct domain*) – The domain for which to set the node claims

  • nr_claims (unsigned int) – The number of claims in the claim set

  • claims (memory_claim_t*) – The claim set to install for the domain

Returns:

0 on success, or a negative error code on failure.

Handles installing claim sets. It performs the validation of the claim set and updates the domain’s claims accordingly.

The function works in four phases:

  1. Validating claim entries and checking node-local availability

  2. Validating total claims and checking global availability

  3. Resetting any current claims of the domain

  4. Installing the claim set as the domain’s claiming state

Phase 1 checks claim entries for validity and memory availability:

  1. Target must be XEN_DOMCTL_CLAIM_MEMORY_GLOBAL or a node.

  2. Each target node may only appear once in the claim set.

  3. For node-local claims, requested pages must not exceed the available memory on that node after accounting for existing claims.

  4. The explicit padding field must be zero for forward compatibility.

Phase 2 checks:

  1. The sum of claims must not exceed globally available memory.

  2. The claims must not exceed the domain.max_pages limit. See Claims Accounting and Redeeming Claims for the accounting checks that enforce this limit.
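The validation phases above can be sketched as follows. This is a simplified model, not the hypervisor code: `struct claim_entry` (including its field layout), `struct host_state`, `CLAIM_TARGET_GLOBAL`, and `validate_claim_set()` are hypothetical stand-ins, the error codes are assumptions, and the domain.max_pages check is omitted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NODES        4
#define CLAIM_TARGET_GLOBAL UINT32_MAX /* stand-in for XEN_DOMCTL_CLAIM_MEMORY_GLOBAL */

/* Hypothetical claim entry; the real memory_claim_t layout may differ. */
struct claim_entry {
    uint64_t pages;  /* pages to claim */
    uint32_t target; /* node id or CLAIM_TARGET_GLOBAL */
    uint32_t pad;    /* explicit padding, must be zero */
};

/* Hypothetical host accounting state. */
struct host_state {
    unsigned long node_avail[SKETCH_NODES];  /* free pages per node */
    unsigned long node_claims[SKETCH_NODES]; /* outstanding claims per node */
    unsigned long total_avail;               /* free pages globally */
    unsigned long total_claims;              /* outstanding claims globally */
};

static unsigned long unclaimed(unsigned long avail, unsigned long claims)
{
    return avail > claims ? avail - claims : 0;
}

static int validate_claim_set(const struct host_state *h,
                              const struct claim_entry *claims,
                              unsigned int nr_claims)
{
    bool seen[SKETCH_NODES] = { false };
    uint64_t sum = 0;

    /* Phase 1: per-entry validity and node-local availability. */
    for (unsigned int i = 0; i < nr_claims; i++) {
        const struct claim_entry *c = &claims[i];

        if (c->pad != 0)
            return -EINVAL;
        if (c->target != CLAIM_TARGET_GLOBAL) {
            if (c->target >= SKETCH_NODES || seen[c->target])
                return -EINVAL; /* unknown or duplicate node */
            seen[c->target] = true;
            if (c->pages > unclaimed(h->node_avail[c->target],
                                     h->node_claims[c->target]))
                return -ENOMEM;
        }
        sum += c->pages;
    }

    /* Phase 2: the sum of all claims against global availability. */
    if (sum > unclaimed(h->total_avail, h->total_claims))
        return -ENOMEM;

    return 0;
}
```

Validating the whole set before phases 3 and 4 means a rejected claim set leaves the domain's existing claims untouched in this model.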

3.8.2. Helper functions for managing claims

3.8.2.1. deduct_global_claims()

unsigned long deduct_global_claims(struct domain *d, unsigned long pages_to_deduct)
Parameters:
  • d (struct domain*) – The domain for which to deduct the global claims

  • pages_to_deduct (unsigned long) – The number of pages to deduct

Returns:

The number of pages actually deducted from the global claim.

This function deducts the specified number of globally claimed pages and updates the global outstanding totals accordingly.

3.8.2.2. deduct_node_claims()

unsigned long deduct_node_claims(struct domain *d, nodeid_t node, unsigned long pages_to_deduct)
Parameters:
  • d (struct domain*) – The domain for which to deduct the node claim

  • node (nodeid_t) – The node for which to deduct the claim

  • pages_to_deduct (unsigned long) – The number of pages to deduct from the claim

Returns:

The number of pages actually deducted from the claim.

This function deducts a specified number of pages from a domain’s claim on a specific node. It limits the deduction to the number of pages the domain has actually claimed on that node, then updates the domain’s node-local claim and the global and node-level claim state accordingly.
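The clamped deduction can be sketched as follows. The types `struct node_claim_domain` and `struct claim_totals` and the function `sketch_deduct_node_claims()` are hypothetical; the sketch also assumes that the global outstanding total includes node-local claims, which the text above does not state explicitly.

```c
#include <assert.h>

#define DEDUCT_NODES 4

/* Hypothetical per-domain claim state. */
struct node_claim_domain {
    unsigned long node_claims[DEDUCT_NODES]; /* this domain's claim per node */
};

/* Hypothetical host-wide totals across all domains. */
struct claim_totals {
    unsigned long node_outstanding[DEDUCT_NODES]; /* claims per node */
    unsigned long total_outstanding;              /* claims globally (assumed
                                                     to include node claims) */
};

static unsigned long sketch_deduct_node_claims(struct claim_totals *t,
                                               struct node_claim_domain *d,
                                               unsigned int node,
                                               unsigned long pages_to_deduct)
{
    unsigned long deducted = pages_to_deduct;

    assert(node < DEDUCT_NODES);

    /* Never deduct more than the domain has claimed on this node. */
    if (deducted > d->node_claims[node])
        deducted = d->node_claims[node];

    d->node_claims[node] -= deducted;
    t->node_outstanding[node] -= deducted;
    t->total_outstanding -= deducted;

    return deducted;
}
```

Returning the clamped amount lets a caller work out how many of the requested pages were not covered by the node claim.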

3.8.2.3. cancel_all_node_claims()

void cancel_all_node_claims(struct domain *d)
Parameters:
  • d (struct domain*) – The domain for which to release all node-specific claims.

This function is used by domain_set_outstanding_pages() to release all node-specific claims of the domain’s claiming state.

3.8.3. Allocation with claims

The functions below play a key role in allocating memory for domains.

3.8.3.1. xc_domain_populate_physmap()

int xc_domain_populate_physmap(xc_interface *xch, uint32_t domid, unsigned long nr_extents, unsigned int extent_order, unsigned int mem_flags, xen_pfn_t *extent_start)
Parameters:
  • xch (xc_interface*) – The libxenctrl interface

  • domid (uint32_t) – The ID of the domain

  • nr_extents (unsigned long) – Number of extents

  • extent_order (unsigned int) – Order of the extents

  • mem_flags (unsigned int) – Allocation flags

  • extent_start (xen_pfn_t*) – Starting PFN

Returns:

0 on success, or a negative error code on failure.

This function is a wrapper for the XENMEM_populate_physmap hypercall, which is handled by the populate_physmap() function in the hypervisor. libxenguest uses it to populate the guest physical memory of a domain. Domain builders can set the NUMA node affinity and pass the preferred node to this function to steer allocations towards the preferred NUMA node(s). Claims then ensure that the memory is available even when multiple domains are being built in parallel.

3.8.3.2. populate_physmap()

The meminit API calls xc_domain_populate_physmap() for populating the guest physical memory. It invokes the restartable XENMEM_populate_physmap hypercall implemented by populate_physmap().

void populate_physmap(struct memop_args *a)
Parameters:
  • a (struct memop_args*) – Provides status and hypercall restart info

Allocates memory for building a domain and uses it for populating the physmap. For allocation, it uses alloc_domheap_pages(), which forwards the request to alloc_heap_pages().

During domain creation, it adds the MEMF_no_scrub flag to the request for populating the physmap to optimize domain startup by allowing the use of unscrubbed pages.

When unscrubbed pages are returned, it scrubs them as needed, using hypercall continuation to avoid long hypercall latency and watchdog timeouts.

Domain builders can optimise on-demand scrubbing by running physmap population pinned to the domain’s NUMA node, keeping scrubbing local and avoiding cross-node traffic.

3.8.3.3. alloc_heap_pages()

struct page_info *alloc_heap_pages(unsigned int zone_lo, unsigned int zone_hi, unsigned int order, unsigned int memflags, struct domain *d)
Parameters:
  • zone_lo (unsigned int) – The lowest zone index to consider for allocation

  • zone_hi (unsigned int) – The highest zone index to consider for allocation

  • order (unsigned int) – The order of the pages to allocate (2^order pages)

  • memflags (unsigned int) – Memory allocation flags that may affect the allocation

  • d (struct domain*) – The domain for which to allocate memory or NULL

Returns:

The allocated page_info structure, or NULL on failure

This function allocates a contiguous block of pages from the heap. It checks claims and available memory before attempting the allocation. On success, it updates relevant counters and redeems claims as necessary.

It first checks whether the request can be satisfied given the domain’s claims and available memory using claims_permit_request(). If claims and availability permit the request, it calls get_free_buddy() to find a suitable block of free pages while respecting node and zone constraints.

If MEMF_no_scrub is allowed, it may return unscrubbed pages. When that happens, populate_physmap() scrubs them if needed with hypercall continuation to avoid long hypercall latency and watchdog timeouts.

Simplified pseudo-code of its logic:

struct page_info *alloc_heap_pages(unsigned int zone_lo,
                                   unsigned int zone_hi,
                                   unsigned int order,
                                   unsigned int memflags,
                                   struct domain *d) {
    struct page_info *pg;

    /* Check whether claims and available memory permit the request.
     * `avail_pages` and `claims` are placeholders for the appropriate
     * global or node-local availability/counts used by the real code. */
    if (!claims_permit_request(d, avail_pages, claims, memflags,
                               1UL << order, NUMA_NO_NODE))
        return NULL;

    /* Find a suitable buddy block. Pass the zone range, order and
     * memflags so the helper can apply node and zone selection. */
    pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
    if (!pg)
        return NULL;

    /* Redeem claims against the node the pages actually came from. */
    redeem_claims_for_allocation(d, 1UL << order, node_of(pg));
    update_counters_and_stats(d, order);

    /* Scrub dirty pages unless the caller accepts unscrubbed pages. */
    if (!(memflags & MEMF_no_scrub) && pg_has_dirty_pages(pg))
        scrub_dirty_pages(pg);
    return pg;
}

3.8.3.4. get_free_buddy()

struct page_info *get_free_buddy(unsigned int zone_lo, unsigned int zone_hi, unsigned int order, unsigned int memflags, const struct domain *d)
Parameters:
  • zone_lo (unsigned int) – The lowest zone index to consider for allocation

  • zone_hi (unsigned int) – The highest zone index to consider for allocation

  • order (unsigned int) – The order of the pages to allocate (2^order pages)

  • memflags (unsigned int) – Flags for conducting the allocation

  • d (const struct domain*) – The domain to allocate memory for, or NULL

Returns:

The allocated page_info structure, or NULL on failure

This function finds a suitable block of free pages in the buddy allocator while respecting claims and node-level available memory.

Called by alloc_heap_pages() after verifying the request is permissible, it iterates over nodes and zones to find a buddy block that satisfies the request. It checks node-local claims before attempting allocation from a node.

Using claims_permit_request(), it checks whether the node has enough unclaimed memory to satisfy the request or whether the domain’s claims can permit the request on that node after accounting for outstanding claims.

If the node can satisfy the request, it searches for a suitable block in the specified zones. If found, it returns the block; otherwise it tries the next node until all online nodes are exhausted.

Simplified pseudo-code of its logic:

/*
 * more_nodes_to_try() and preferred_node_or_next_node() are placeholders
 * for the policy to first try the preferred/requested node and then fall
 * back to other online nodes.
 */
struct page_info *get_free_buddy(unsigned int zone_lo,
                                 unsigned int zone_hi,
                                 unsigned int order,
                                 unsigned int memflags,
                                 const struct domain *d) {
    nodeid_t request_node = MEMF_get_node(memflags);
    struct page_info *pg;

    /*
     * Iterate over candidate nodes: start with the preferred node (if any),
     * then try other online nodes according to the normal placement policy.
     */
    while (more_nodes_to_try()) {
        nodeid_t node = preferred_node_or_next_node(request_node);

        /* Skip nodes whose claims and availability do not permit the request. */
        if (!claims_permit_request(d, node_avail_pages[node],
                                   node_outstanding_claims[node],
                                   memflags, 1UL << order, node))
            goto try_next_node;

        /* Search the node's zones, highest first, for a large enough buddy. */
        for (int zone = zone_hi; zone >= (int)zone_lo; zone--)
            for (unsigned int j = order; j <= MAX_ORDER; j++)
                if ((pg = remove_head(&heap(node, zone, j))) != NULL)
                    return pg;

    try_next_node:
        /* An exact-node request must not fall back to other nodes. */
        if (request_node != NUMA_NO_NODE && (memflags & MEMF_exact_node))
            return NULL;
        /* Fall back to the next node and repeat. */
    }
    return NULL;
}

3.8.4. Helper functions for allocation with claims

For allocating memory while respecting claims, alloc_heap_pages() and get_free_buddy() use claims_permit_request() to check whether the claims permit the request before attempting allocation.

If permitted, the allocation proceeds, and after success, redeem_claims_for_allocation() redeems the claims for the allocation based on the domain’s claiming state and the node of the allocation.

See Key design decisions for the rationale behind this design and the accounting checks that enforce the domain.max_pages limit during allocation with claims.

3.8.4.1. claims_permit_request()

bool claims_permit_request(const struct domain *d, unsigned long avail_pages, unsigned long claims, unsigned int memflags, unsigned long request, nodeid_t node)
Parameters:
  • d (const struct domain*) – domain for which to check

  • avail_pages (unsigned long) – pages available globally or on node

  • claims (unsigned long) – outstanding claims globally or on node

  • memflags (unsigned int) – memory allocation flags for the request

  • request (unsigned long) – pages requested for allocation

  • node (nodeid_t) – node of the request or NUMA_NO_NODE for global

Returns:

true if claims and available memory permit the request, false otherwise.

This function checks whether a memory allocation request can be satisfied given the current state of available memory and outstanding claims for the domain. It calculates the amount of unclaimed memory and determines whether it is sufficient to satisfy the request.

If unclaimed memory is insufficient, it checks if the domain’s claims can cover the shortfall, taking into account whether the request is node-specific or global.
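One possible reading of this check is sketched below. It is not the hypervisor code: `struct permit_domain`, `PERMIT_NO_NODE`, and `sketch_claims_permit()` are hypothetical, the `memflags` handling is omitted, and the choice to count only the domain's global claim for global requests and only its node claim for node requests is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

#define PERMIT_NODES   4
#define PERMIT_NO_NODE 0xffu /* stand-in for NUMA_NO_NODE */

/* Hypothetical per-domain claim state. */
struct permit_domain {
    unsigned long global_claims;             /* claim not tied to a node */
    unsigned long node_claims[PERMIT_NODES]; /* claims per node */
};

static bool sketch_claims_permit(const struct permit_domain *d,
                                 unsigned long avail_pages,
                                 unsigned long claims,
                                 unsigned long request,
                                 unsigned int node)
{
    /* Memory not reserved by any outstanding claim. */
    unsigned long unclaimed =
        avail_pages > claims ? avail_pages - claims : 0;
    unsigned long own;

    if (request <= unclaimed)
        return true;

    /* Without a domain there is no claim that could cover the shortfall. */
    if (d == NULL)
        return false;

    /* Assumed: node requests use the node claim, global ones the global claim. */
    own = (node == PERMIT_NO_NODE) ? d->global_claims : d->node_claims[node];

    return request - unclaimed <= own;
}
```

With 100 available pages and 90 pages of outstanding claims, a 20-page global request is permitted in this model only for a domain whose own claim covers the 10-page shortfall.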

3.8.4.2. redeem_claims_for_allocation()

void redeem_claims_for_allocation(struct domain *d, unsigned long allocation, nodeid_t alloc_node)
Parameters:
  • d (struct domain*) – The domain for which to redeem claims

  • allocation (unsigned long) – The number of pages allocated

  • alloc_node (nodeid_t) – The node on which the allocation was made

See Redeeming Claims for details on redeeming claims after allocation.

3.8.5. Offlining memory in presence of claims

When offlining pages, Xen must ensure that available memory on a node or globally does not fall below outstanding claims. If it does, Xen recalls claims from domains until accounting is valid again.

This is triggered by privileged domains via the XEN_SYSCTL_page_offline_op sysctl or by machine-check memory errors.

Offlining currently allocated pages does not immediately reduce available memory: pages are marked offlining and become offline only when freed. Pages marked offlining will not become available again, so this does not affect claim invariants.

However, when already free pages are offlined, free memory can drop below outstanding claims. In that case, the offlining process calls reserve_offlined_page() to offline the page.

It checks whether offlining the page would cause available memory on the page’s node, or globally, to fall below the respective outstanding claims and, if so, recalls claims from domains until the accounting is valid again.

Recalling claims can violate claim guarantees, but it is necessary to maintain system stability when memory must be offlined.

3.8.5.1. reserve_offlined_page()

int reserve_offlined_page(struct page_info *head)
Parameters:
  • head (struct page_info*) – The page being offlined

Returns:

0 on success, or a negative error code on failure.

This function is called during the offlining process to offline free pages.

If offlining a page causes available memory on the page’s node, or globally, to fall below the respective outstanding claims, it recalls claims from domains as necessary so that the accounting invariants hold again after a buddy is offlined.
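The recall step can be illustrated with a small model of a single accounting level (one node, or the global totals). Everything in it is hypothetical: `struct recall_model`, `recall_after_offline()`, and in particular the policy of recalling from domains in array order, which the text above does not specify.

```c
#include <stddef.h>

#define RECALL_DOMAINS 3

/* Hypothetical accounting state for one level (a node or the whole host). */
struct recall_model {
    unsigned long avail_pages;                   /* free pages */
    unsigned long outstanding_claims;            /* sum of domain_claims[] */
    unsigned long domain_claims[RECALL_DOMAINS]; /* claim held per domain */
};

/*
 * Remove offlined free pages from availability, then recall claims until
 * outstanding claims no longer exceed available memory.
 * Returns the number of claimed pages recalled.
 */
static unsigned long recall_after_offline(struct recall_model *m,
                                          unsigned long offlined_pages)
{
    unsigned long recalled = 0;

    if (offlined_pages > m->avail_pages)
        offlined_pages = m->avail_pages;
    m->avail_pages -= offlined_pages;

    /* Assumed policy: take from domains in array order. */
    for (size_t i = 0; i < RECALL_DOMAINS &&
                       m->outstanding_claims > m->avail_pages; i++) {
        unsigned long excess = m->outstanding_claims - m->avail_pages;
        unsigned long take = m->domain_claims[i] < excess
                             ? m->domain_claims[i] : excess;

        m->domain_claims[i] -= take;
        m->outstanding_claims -= take;
        recalled += take;
    }

    return recalled;
}
```

After the call, outstanding claims are at most the available pages in this model, which is the invariant the recall is meant to restore.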