2.3. Development

Note

This section provides historical context on the development of NUMA-aware claims, including previous implementations and feedback received, to give a better understanding of the design decisions made in the current implementation.

2.3.1. Version history

The initial implementation of single-node claims by Alejandro Vallejo used the legacy claims hypercall XENMEM_claim_pages and passed a NUMA node in the existing NUMA node bits of xen_memory_reservation.mem_flags. This added the flag d->claim_node to struct domain, which defined the target node for the domain’s claims.

Roger Pau Monné reviewed it and proposed an initial multi-node claim-sets specification that inspired this design:

The interface here seems to be focused on domains only being allowed to allocate from a single node, or otherwise you must first allocate memory from a node before moving to the next one (which defeats the purpose of claims?).

I think we want to instead convert d->outstanding_pages into a per-node array, so that a domain can have outstanding claims for multiple NUMA nodes?

The hypercall interface becomes a bit awkward then, as the toolstack has to perform a different hypercall for each memory claim from a different node (and rollback in case of failure). Ideally we would need to introduce a new hypercall that allows making claims from multiple nodes in a single locked region, as to ensure success or failure in an atomic way.

—Roger Pau Monné

This led to the v2 and v3 series, which added a new hypercall API that takes an array of claims. This allowed for a more flexible claim set design targeting multiple NUMA nodes and unpinned claims, but only supported a single claim per domain at that time.
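The all-or-nothing behaviour asked for in the review can be illustrated with a minimal sketch. It is not the Xen hypercall interface: `struct claim`, `struct node_state`, `install_claim_set()` and `MAX_NODES` are hypothetical names chosen for this example, and the lock that a hypervisor would hold around both phases is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4 /* hypothetical node count for this sketch */

/* Hypothetical claim record, not the actual hypercall structure. */
struct claim {
    unsigned int node;   /* target NUMA node */
    unsigned long pages; /* pages to claim on that node */
};

/* Hypothetical per-node accounting standing in for allocator state. */
struct node_state {
    unsigned long free_pages;
    unsigned long claimed_pages;
};

/*
 * Install every claim in the array, or none of them.
 * Phase 1 validates the whole set; phase 2 commits it.
 * A real implementation would hold one lock across both phases.
 */
static bool install_claim_set(struct node_state nodes[MAX_NODES],
                              const struct claim *claims, size_t nr)
{
    unsigned long wanted[MAX_NODES] = { 0 };

    /* Phase 1a: sum the requested pages per node. */
    for (size_t i = 0; i < nr; i++) {
        if (claims[i].node >= MAX_NODES)
            return false;
        wanted[claims[i].node] += claims[i].pages;
    }

    /* Phase 1b: every node must have enough unclaimed free pages. */
    for (unsigned int n = 0; n < MAX_NODES; n++) {
        assert(nodes[n].claimed_pages <= nodes[n].free_pages);
        if (wanted[n] > nodes[n].free_pages - nodes[n].claimed_pages)
            return false; /* nothing has been modified yet */
    }

    /* Phase 2: commit; this cannot fail after validation. */
    for (unsigned int n = 0; n < MAX_NODES; n++)
        nodes[n].claimed_pages += wanted[n];

    return true;
}
```

Because validation completes before any counter changes, a failed request leaves all nodes untouched and the caller needs no rollback logic.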

Between v3 and v4, Roger Pau Monné and Andrew Cooper developed and merged several critical fixes for Xen’s overall claims implementation. These fixes also allowed Roger to improve the implementation for redeeming claims during domain memory allocation. With a further suggestion by Bernhard Kaindl, this enabled a fully working implementation that protected claimed memory against parallel allocations by other domain builders.

v4 series

With the v4 series, we submitted the combined work that completed the fixes for protecting claimed memory on NUMA nodes. The review process indicated that supporting multiple claim sets would require a redesign of claim installation and management, which led to this design document.

v5 series

The v5 series implemented the Claim Sets Design Version 1 with support for multiple claim records per domain. It used the term “consuming claims” for redeeming claims during domain memory allocation, and the term “retiring claims” for the low-level action of reducing the number of claimed pages. Claims are retired when they are redeemed for an allocation, when a domain is destroyed, or when claims must be recalled because all memory is claimed and memory is then offlined. Recalling claims in that case is needed to maintain the invariant that claimed memory can never be larger than free memory.
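The invariant and the recall case can be made concrete with a small sketch. The names `struct heap_state`, `retire_claims()` and `offline_pages()` are hypothetical and only model the accounting described above; they are not functions from the series.

```c
#include <assert.h>

/* Hypothetical accounting state for this sketch. */
struct heap_state {
    unsigned long free_pages;
    unsigned long claimed_pages;
};

/* Reduce the number of claimed pages; returns how many were retired. */
static unsigned long retire_claims(struct heap_state *h, unsigned long pages)
{
    unsigned long n = pages < h->claimed_pages ? pages : h->claimed_pages;

    h->claimed_pages -= n;
    return n;
}

/*
 * Take free pages offline. If the remaining free memory no longer
 * covers the claims, recall the excess so that
 * claimed_pages <= free_pages still holds afterwards.
 * Returns the number of claimed pages that had to be recalled.
 */
static unsigned long offline_pages(struct heap_state *h, unsigned long pages)
{
    unsigned long recalled = 0;

    if (pages > h->free_pages)
        pages = h->free_pages;
    h->free_pages -= pages;

    if (h->claimed_pages > h->free_pages)
        recalled = retire_claims(h, h->claimed_pages - h->free_pages);

    assert(h->claimed_pages <= h->free_pages);
    return recalled;
}
```

With all 100 free pages claimed, offlining 30 pages forces 30 claimed pages to be recalled; with only 20 pages claimed, nothing is recalled.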

v6 series

The v6 series implemented the Claim Sets Design Version 2. The only difference between the two versions is that with design version 2, the initial term “consuming claims” was changed to “redeeming claims” and the term “retiring claims” was changed to “deducting claims”.

2.3.2. v7 series

The v7 series will implement the Claim Sets Design Version 3 or newer with further improvements to the design and implementation:

  1. As the code often needs the total sum of claims of a domain, this update keeps domain.outstanding_pages as the total sum of outstanding claims of a domain. This obsoletes the former d->global_claims, which only tracked the unbound claims, that is, claims not affine to a NUMA node.

  2. Avoid code duplication: The new claim sets hypercall handler domain_install_claim_set() also installs legacy claims for backwards compatibility. The former domain_set_outstanding_pages(), which handled the legacy claims hypercall XENMEM_claim_pages, is removed to avoid duplicating the logic of installing claims in both places.

  3. Improve the clarity of function and variable names. For example, the new hypercall handler domain_install_claim_set() is more descriptive of its purpose than the former domain_set_outstanding_pages():

    Former function name              New function name
    domain_set_outstanding_pages()    domain_set_claim_entries(), domain_get_claim_entries()
    deduct_global_claims()            domain_release_unpinned_claims()
    deduct_node_claims()              domain_release_node_claims()
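The accounting change in item 1 can be sketched as follows. This is an illustration under assumptions, not the v7 code: `struct domain_claims`, its `node_claims` array, `unpinned_claims()` and `release_node_claims()` are hypothetical, and the sketch assumes that the unpinned share can be derived as the total minus the per-node claims.

```c
#include <assert.h>

#define MAX_NODES 4 /* hypothetical node count for this sketch */

/* Hypothetical per-domain claim accounting. */
struct domain_claims {
    unsigned long outstanding_pages;      /* total: per-node plus unpinned */
    unsigned long node_claims[MAX_NODES]; /* claims bound to a NUMA node */
};

/* Derive the unpinned share instead of storing it separately. */
static unsigned long unpinned_claims(const struct domain_claims *d)
{
    unsigned long pinned = 0;

    for (unsigned int n = 0; n < MAX_NODES; n++)
        pinned += d->node_claims[n];

    assert(pinned <= d->outstanding_pages);
    return d->outstanding_pages - pinned;
}

/*
 * Release up to 'pages' claimed pages on one node. The per-node count
 * and the total are reduced together so they cannot drift apart.
 * Returns the number of pages actually released.
 */
static unsigned long release_node_claims(struct domain_claims *d,
                                         unsigned int node,
                                         unsigned long pages)
{
    assert(node < MAX_NODES);
    if (pages > d->node_claims[node])
        pages = d->node_claims[node];

    d->node_claims[node] -= pages;
    d->outstanding_pages -= pages;
    return pages;
}
```

Keeping the total as the stored value means that code needing the sum of a domain's claims reads one field, while the less frequently needed unpinned share costs a short loop.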

2.3.3. Testing

The basis of the v4 series is included in the XenServer XS9 preview release. Besides functional product testing, it has been tested to confirm that it meets the performance expectations of customers for improved NUMA placement.

With the v6 series, a comprehensive set of functional system tests was added to the submission. Also, a separate host-side integration test suite for validating the v6 series was posted.

2.3.4. Further development

Review feedback suggested normalising the page counts of the page allocator to unsigned long. A first patch in this direction was posted; it normalises the types of total_avail_pages and outstanding_claims in the page allocator to unsigned long.

2.3.5. Acknowledgements

The claim sets design builds on the single-node claims implementation described above and the feedback it generated. The following people should be acknowledged for their contributions:

  • Edwin Török for developing the initial best-effort NUMA placement feature in the XAPI toolstack, which inspired the initial implementation of NUMA-aware claims, and for his work in productising and validating the integration of NUMA claims with the XAPI toolstack.

  • Alejandro Vallejo for starting the development of the NUMA claims series.

  • Jan Beulich for providing review suggestions that led to many improvements.

  • Roger Pau Monné for reviewing the initial implementation, proposing the initial multi-node claim-sets specification, and developing and merging critical fixes upstream. These fixes enabled product-quality support for single-node claims, which is the basis of the multi-node claim sets implementation.

  • Andrew Cooper for integrating and validating the work internally, helping to stabilise and productise the single-node implementation.

  • Bernhard Kaindl for collaborating on the single-node implementation, developing the claim sets hypercall since version 2, and designing and implementing the multi-node claim sets design, the functional system-level test suite, and the host-side integration test suite for validating the claim sets implementation.

  • Marcus Granado for leading the development effort inside XenServer for productising the single-node claims implementation, and for providing feedback and suggestions for improving the design and implementation. This included coordinating the work of multiple contributors and stakeholders, integrating the work into XenServer products, and ensuring it meets customer requirements.