6.9. Regular Expression (Node/Proc Map) Support

This document describes how PMIx generates and parses the compact “regular expressions” it uses to convey a job’s node map and process map, from the public server APIs down through the MCA preg framework and its components, and how the result is consumed by the gds/hash datastore when a namespace is registered.

For a code-oriented orientation aimed at contributors working inside the framework, each preg directory also carries an AGENTS.md (with a CLAUDE.md symlink): src/mca/preg/AGENTS.md for the framework as a whole, plus one each in native/, raw/, and compress/. This document explains how regex support is integrated into the wider library; those explain each component’s internals in detail.

6.9.1. The Problem

When a launcher or resource manager starts a job, it must tell the PMIx server two things about the layout, and that information must ultimately reach every client process:

  • the node map — the list of node names in the allocation, and

  • the process map — the list of process ranks resident on each node.

On a large machine these lists are enormous and highly regular (node000001,node000002,...). Shipping the raw list to every one of hundreds of thousands of processes is wasteful, so PMIx encodes the list into a compact, self-describing representation on the producing side and expands it again on the consuming side. The preg framework owns both halves of that transform.

Two properties are guaranteed across the encode/decode round trip:

  1. Order preservation — values come out in the same order they went in.

  2. Self-identification — every encoded value is tagged with the name of the scheme that produced it, so a peer can select the correct parser without knowing which components the producer had configured. This is what makes the encoding interoperable across PMIx versions and deployments.

6.9.2. Public APIs

Two generations of API coexist.

6.9.2.1. Current API (pmix_regex2_t)

Declared in include/pmix_server.h:

pmix_status_t PMIx_generate_regex2(const char *input,
                                   pmix_info_t info[], size_t ninfo,
                                   pmix_regex2_t *regex);

pmix_status_t PMIx_parse_regex2(const pmix_regex2_t *regex,
                                pmix_info_t info[], size_t ninfo,
                                char **output);

PMIx_generate_regex2 takes a comma-separated input list and fills in a pmix_regex2_t (defined in include/pmix_common.h.in):

typedef struct pmix_regex2 {
    char    *type;    /* encoding tag: "pmix", "raw", "compress"     */
    uint8_t *bytes;   /* encoded representation (may NOT be a string) */
    size_t   len;     /* number of bytes in `bytes`                  */
} pmix_regex2_t;

PMIx_parse_regex2 reverses it, returning a comma-separated string. The info/ninfo attribute arrays are required by the PMIx API convention but carry no defined attributes yet.

The struct exists because an encoded regex is fundamentally bytes + a length + a type, not a C string — the compress encoding contains embedded NUL bytes. The matching data type is PMIX_REGEX2; helper routines PMIx_Regex2_construct / _destruct / _create / _free manage the struct (implemented in src/mca/bfrops/base/bfrop_base_macro_backers.c).
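The bytes-plus-length point can be illustrated with a small stand-alone sketch. The demo_regex2_t type and the demo_* helpers below are hypothetical stand-ins that only mirror the three fields shown above; they are not the PMIx_Regex2_* routines, whose signatures are not reproduced in this document.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in mirroring the three fields of pmix_regex2_t. */
typedef struct {
    char    *type;
    uint8_t *bytes;
    size_t   len;
} demo_regex2_t;

/* Deep-copy using the explicit length so that embedded NUL bytes survive.
 * Returns 0 on success, -1 on allocation failure. */
static int demo_regex2_copy(demo_regex2_t *dst, const demo_regex2_t *src)
{
    size_t tlen;

    assert(NULL != dst && NULL != src && NULL != src->type);
    dst->type = NULL;
    dst->bytes = NULL;
    dst->len = 0;

    tlen = strlen(src->type) + 1;      /* the tag IS a C string */
    dst->type = malloc(tlen);
    if (NULL == dst->type) {
        return -1;
    }
    memcpy(dst->type, src->type, tlen);

    if (0 < src->len) {
        dst->bytes = malloc(src->len); /* the payload is NOT a C string */
        if (NULL == dst->bytes) {
            free(dst->type);
            dst->type = NULL;
            return -1;
        }
        memcpy(dst->bytes, src->bytes, src->len);
    }
    dst->len = src->len;
    return 0;
}

/* Free both allocations and reset the fields. */
static void demo_regex2_release(demo_regex2_t *r)
{
    assert(NULL != r);
    free(r->type);
    free(r->bytes);
    r->type = NULL;
    r->bytes = NULL;
    r->len = 0;
}
```

A copy made with strlen on the payload would stop at the first NUL byte and silently truncate a compressed encoding; copying by len does not.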

6.9.2.2. Deprecated API (char *)

Declared in include/pmix_deprecated.h:

pmix_status_t PMIx_generate_regex(const char *input, char **regex);
pmix_status_t PMIx_generate_ppn(const char *input, char **ppn);

These return the encoded form as a plain char * and use the deprecated PMIX_REGEX data type (value 49). They remain because production launchers built against older releases still call them, and because the per-node process map is still carried internally in the string form. New code should prefer PMIx_generate_regex2.

All four entry points are thin wrappers in src/server/pmix_server.c that check pmix_globals.initialized and forward to the corresponding pmix_preg module function:

PMIx_generate_regex   -> pmix_preg.generate_node_regex
PMIx_generate_ppn     -> pmix_preg.generate_ppn
PMIx_generate_regex2  -> pmix_preg.generate_regex
PMIx_parse_regex2     -> pmix_preg.parse_regex

6.9.3. Producer / Consumer Flow

The typical lifecycle of a node map:

[ Host RM / launcher, e.g. PRRTE ]
    PMIx_generate_regex2("node01,node02,...", NULL, 0, &regex)
        └─► pmix_preg.generate_regex()  [smallest encoding wins]
                                                  │
    Load regex into a pmix_info_t keyed PMIX_NODE_MAP (type PMIX_REGEX2)
    (and a PMIX_PROC_MAP for the process layout)
        │
        ▼
    PMIx_server_register_nspace(nspace, ..., info, ninfo, ...)
        └─► stored via the gds framework
                  │
                  ▼
[ gds/hash, when unpacking the job-level info ]
    store_map()  [src/mca/gds/hash/gds_hash.c]
        PMIX_NODE_MAP:
            PMIX_REGEX2 -> pmix_preg.parse_regex(regex2, &decoded)
                           pmix_preg.parse_nodes(decoded, &nodes)
            PMIX_REGEX  -> pmix_preg.parse_nodes(bo.bytes, &nodes)
            PMIX_STRING -> pmix_preg.parse_nodes(string, &nodes)
        PMIX_PROC_MAP: symmetric, using parse_procs
                  │
                  ▼
    The expanded node/rank argv arrays populate the datastore, and the
    per-process job-level keys (PMIX_NODEID, local peers, etc.) are
    derived from them and made available to clients via PMIx_Get.

Note the two-step decode for PMIX_REGEX2 in src/mca/gds/hash/gds_hash.c: parse_regex turns the pmix_regex2_t back into a comma-separated char *, and then the legacy parse_nodes expands any ranges in that string into individual names. The regex2 struct is the transport; the legacy string parser is still the range expander. src/mca/gds/hash/process_arrays.c performs the same parse_nodes / parse_procs expansion when node/proc maps arrive inside a nested info array.

6.9.4. The MCA preg Framework

The framework lives under src/mca/preg/. Its module interface (src/mca/preg/preg.h) is a struct of ten function pointers spanning both API generations:

typedef struct {
    char *name;
    /* legacy char* API */
    pmix_preg_base_module_generate_node_regex_fn_t generate_node_regex;
    pmix_preg_base_module_generate_ppn_fn_t        generate_ppn;
    pmix_preg_base_module_parse_nodes_fn_t         parse_nodes;
    pmix_preg_base_module_parse_procs_fn_t         parse_procs;
    pmix_preg_base_module_copy_fn_t                copy;
    pmix_preg_base_module_pack_fn_t                pack;
    pmix_preg_base_module_unpack_fn_t              unpack;
    pmix_preg_base_module_release_fn_t             release;
    /* current pmix_regex2_t API */
    pmix_preg_base_module_generate_regex_fn_t      generate_regex;
    pmix_preg_base_module_parse_regex_fn_t         parse_regex;
} pmix_preg_module_t;

A component may leave any pointer NULL; the base skips it. preg is a multi-select framework — several components are active at once and a request is offered to each in priority order.

6.9.4.1. Component Selection

src/mca/preg/base/preg_base_select.c queries each component once at init, wraps the returned modules, and inserts them into pmix_preg_globals.actives in descending priority order. If no component is selected it emits the no-plugins show_help topic and returns PMIX_ERR_SILENT — at least one component is required.

Component   Priority   Active when
compress    100        a pcompress (zlib) module supplies compress_string
raw         50         always
native      30         always
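As a rough illustration of the descending-priority insertion described in this section, the sketch below keeps a small fixed-size array sorted as entries arrive. The demo_* names and the array-based container are assumptions made for the example; the actual code in preg_base_select.c uses the library's own list classes.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_ACTIVES 8

/* Hypothetical record for one selected module. */
typedef struct {
    const char *name;
    int priority;
} demo_active_t;

typedef struct {
    demo_active_t items[DEMO_MAX_ACTIVES];
    size_t count;
} demo_actives_t;

/* Insert while keeping the list in descending priority order.
 * Returns 0 on success, -1 if the list is full. */
static int demo_actives_insert(demo_actives_t *list, const char *name, int priority)
{
    size_t pos = 0, i;

    assert(NULL != list && NULL != name);
    if (list->count >= DEMO_MAX_ACTIVES) {
        return -1;
    }
    /* find the first slot whose priority is lower than the newcomer's */
    while (pos < list->count && list->items[pos].priority >= priority) {
        pos++;
    }
    /* shift the lower-priority tail down by one slot */
    for (i = list->count; i > pos; i--) {
        list->items[i] = list->items[i - 1];
    }
    list->items[pos].name = name;
    list->items[pos].priority = priority;
    list->count++;
    return 0;
}
```

Whatever order the components are queried in, inserting native (30), compress (100), and raw (50) leaves the list as compress, raw, native, which is the order the dispatch stubs then walk.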

6.9.4.2. Dispatch (preg_base_stubs.c)

Every pmix_preg entry points at a pmix_preg_base_* stub in src/mca/preg/base/preg_base_stubs.c. There are three dispatch behaviors:

First-success-wins: generate_node_regex, generate_ppn, parse_nodes, parse_procs, copy, pack, unpack. Walk the actives in priority order and return on the first PMIX_SUCCESS. If no module claims the request, apply a built-in fallback so the call essentially never fails:

  • generate_* → strdup(input) (ship it uncompressed)

  • parse_nodes → PMIx_Argv_split(regex, ',')

  • parse_procs → PMIx_Argv_split(regex, ';')

  • copy → strdup + strlen + 1

  • pack / unpack → serialize as a plain PMIX_STRING
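The first-success walk with a fallback can be sketched in a few lines. Everything named demo_* below is hypothetical: modules are modeled as plain function pointers that return 0 when they claim a request, which simplifies the real module struct and status codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical module entry: returns 0 when it claims the request. */
typedef int (*demo_generate_fn_t)(const char *input, char **out);

/* A module that never claims anything. */
static int demo_decline(const char *input, char **out)
{
    (void) input;
    (void) out;
    return -1;
}

/* A module that always claims the input and prepends a "raw:" tag. */
static int demo_raw(const char *input, char **out)
{
    size_t n = strlen(input);
    char *buf = malloc(n + 5);          /* "raw:" + input + NUL */
    if (NULL == buf) {
        return -1;
    }
    memcpy(buf, "raw:", 4);
    memcpy(buf + 4, input, n + 1);
    *out = buf;
    return 0;
}

/* Walk modules in (already sorted) priority order; the first success wins.
 * If nobody claims the request, fall back to an uncompressed copy. */
static int demo_generate(demo_generate_fn_t const *mods, size_t nmods,
                         const char *input, char **out)
{
    size_t i, n;

    assert(NULL != input && NULL != out);
    for (i = 0; i < nmods; i++) {
        if (NULL == mods[i]) {
            continue;                   /* entry left unimplemented */
        }
        if (0 == mods[i](input, out)) {
            return 0;
        }
    }
    n = strlen(input) + 1;
    *out = malloc(n);
    if (NULL == *out) {
        return -1;
    }
    memcpy(*out, input, n);
    return 0;
}
```

With demo_raw in the list the caller receives "raw:n1,n2"; with only declining or NULL entries it receives an untagged copy of the input, which is why the generate path essentially never fails.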

First-success, no fallback: release returns PMIX_ERR_BAD_PARAM if no scheme claims the pointer. Always free a regex through the framework that produced it.

Smallest-wins: generate_regex is the exception. It calls every module’s generate_regex, keeps the candidate with the smallest len, frees the losers, and returns the winner (or PMIX_ERR_NOT_SUPPORTED if no module produced a candidate). This is how the framework automatically selects the most compact encoding without the caller choosing a component. parse_regex reverts to first-success, matching on the type tag, and returns PMIX_ERR_NOT_SUPPORTED if no active scheme recognizes it.
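The smallest-wins contest can be modeled as follows. This is a sketch under stated assumptions: the demo_* types are hypothetical, a toy run-length encoder tagged "rle" stands in for a real compressor, and ties are resolved in favor of the earlier (higher-priority) candidate, which this document does not specify.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *type;   /* static tag string, not owned */
    uint8_t    *bytes;  /* heap-allocated payload */
    size_t      len;
} demo_regex2_t;

typedef int (*demo_gen2_fn_t)(const char *input, demo_regex2_t *out);

/* Pass-through candidate: payload is the input without its NUL. */
static int demo_gen_raw(const char *input, demo_regex2_t *out)
{
    size_t n = strlen(input);
    out->bytes = malloc(n ? n : 1);
    if (NULL == out->bytes) {
        return -1;
    }
    memcpy(out->bytes, input, n);
    out->len = n;
    out->type = "raw";
    return 0;
}

/* Toy run-length encoder: emits (count, byte) pairs. */
static int demo_gen_rle(const char *input, demo_regex2_t *out)
{
    size_t n = strlen(input), i = 0, o = 0;
    uint8_t *buf = malloc(2 * n + 1);
    if (NULL == buf) {
        return -1;
    }
    while (i < n) {
        size_t run = 1;
        while (i + run < n && input[i + run] == input[i] && run < 255) {
            run++;
        }
        buf[o++] = (uint8_t) run;
        buf[o++] = (uint8_t) input[i];
        i += run;
    }
    out->bytes = buf;
    out->len = o;
    out->type = "rle";
    return 0;
}

/* Ask every module for a candidate and keep the shortest one.
 * Returns 0 with *best filled in, or -1 if no module produced anything. */
static int demo_generate_regex(demo_gen2_fn_t const *mods, size_t nmods,
                               const char *input, demo_regex2_t *best)
{
    size_t i;
    int found = 0;

    assert(NULL != input && NULL != best);
    for (i = 0; i < nmods; i++) {
        demo_regex2_t cand = { NULL, NULL, 0 };
        if (NULL == mods[i] || 0 != mods[i](input, &cand)) {
            continue;
        }
        if (!found || cand.len < best->len) {
            if (found) {
                free(best->bytes);      /* previous leader lost */
            }
            *best = cand;
            found = 1;
        } else {
            free(cand.bytes);           /* this candidate lost */
        }
    }
    return found ? 0 : -1;
}
```

A highly repetitive input lets the toy encoder win, while a short irregular one falls back to the pass-through candidate, mirroring the role raw plays as the baseline entry in the real contest.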

A practical consequence: because raw’s generators always succeed and outrank native, the legacy generate path is claimed by compress (if present) or raw, and native’s generator is effectively shadowed in a default build. native still serves as the parser of the pmix[...] format, which external launchers produce.

6.9.5. preg Components

6.9.5.1. native (src/mca/preg/native/)

Implements PMIx’s own range-compressing format, tagged pmix:

Input:   odin009,odin010,odin011,odin012,odin017,odin018,thor176
Output:  pmix[odin[3:9-12,17-18],thor176]

It groups names sharing a prefix, a numeric-field width, and a suffix, and coalesces consecutive integers into start-end ranges. The width (3 above) preserves leading zeros on expansion. A skip flag on each candidate value enforces order preservation when names of different shapes are interleaved. native implements only the legacy string API (generate_regex is NULL); it is the parser of record for pmix[...] data whether produced locally or by an external launcher.
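To make the width and range notation concrete, here is a minimal stand-alone expander for a single prefix[width:ranges] group such as odin[3:9-12,17-18]. It is not the parser in preg_native.c: the function name is hypothetical, and the outer pmix[...] wrapper, bare names such as thor176, and suffixes are deliberately out of scope.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Expand one "prefix[width:a-b,c,...]" group into a comma-separated list.
 * Returns 0 on success, -1 on malformed input or if `out` is too small. */
static int demo_expand_group(const char *expr, char *out, size_t outlen)
{
    const char *lb, *p;
    char *end;
    size_t used = 0;
    long width;
    int plen;

    assert(NULL != expr && NULL != out && outlen > 0);
    out[0] = '\0';

    lb = strchr(expr, '[');
    if (NULL == lb) {
        return -1;
    }
    plen = (int) (lb - expr);           /* length of the shared prefix */

    /* field width, terminated by ':' */
    width = strtol(lb + 1, &end, 10);
    if (end == lb + 1 || ':' != *end || width < 0 || width > 32) {
        return -1;
    }
    p = end + 1;

    for (;;) {
        long start, stop, v;

        start = strtol(p, &end, 10);
        if (end == p || start < 0) {
            return -1;
        }
        stop = start;                   /* a lone value is a 1-element range */
        p = end;
        if ('-' == *p) {
            p++;
            stop = strtol(p, &end, 10);
            if (end == p || stop < start) {
                return -1;
            }
            p = end;
        }
        for (v = start; v <= stop; v++) {
            /* zero-pad to `width` so leading zeros are restored */
            int n = snprintf(out + used, outlen - used, "%s%.*s%0*ld",
                             used ? "," : "", plen, expr, (int) width, v);
            if (n < 0 || (size_t) n >= outlen - used) {
                return -1;
            }
            used += (size_t) n;
        }
        if (',' == *p) {
            p++;
            continue;
        }
        return (']' == *p) ? 0 : -1;
    }
}
```

Values are emitted in the order the ranges appear, so the expansion preserves the ordering property described in The Problem.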

6.9.5.2. raw (src/mca/preg/raw/)

The pass-through, tagged raw:. Its generators simply prepend the tag and always succeed; its parsers strip the tag and call PMIx_Argv_split. raw implements the complete module (every legacy function and both regex2 functions), making it the framework’s guaranteed encoder of last resort and the baseline candidate in the smallest-wins generate_regex contest. For input too small or too random to compress, raw wins.

6.9.5.3. compress (src/mca/preg/compress/)

Runs the list through the pcompress (zlib) framework. It disables itself (its component_query returns an error, so no module is selected) when no compression back end is present, so its encoding must never be assumed available. Two tags are involved, one per API generation:

  • Legacy string form — framed as "blob:\0component=zlib:\0size=<N>:\0<N bytes>". This contains embedded NUL bytes, which is why compress implements its own copy/pack/unpack/release rather than relying on the string fallbacks.

  • regex2 form — type = "compress", bytes = raw zlib output, len = its length (no framing needed, since the struct carries the length explicitly).
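The legacy framing can be sketched by assembling the fields in the order quoted in the first bullet. The helper below is hypothetical and takes an already-compressed payload as input; it does not call zlib, and it is only meant to show why string functions cannot handle the result.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build "blob:\0component=zlib:\0size=<N>:\0<N bytes>" into `out`.
 * Returns the total frame length, or 0 if `out` is too small. */
static size_t demo_frame_blob(const uint8_t *payload, size_t n,
                              uint8_t *out, size_t outlen)
{
    static const char tag[] = "blob:";
    static const char comp[] = "component=zlib:";
    char sizefield[32];
    size_t off = 0, need;
    int slen;

    assert(NULL != out && (NULL != payload || 0 == n));

    slen = snprintf(sizefield, sizeof(sizefield), "size=%zu:", n);
    if (slen < 0 || (size_t) slen >= sizeof(sizefield)) {
        return 0;
    }
    /* sizeof on the arrays counts each terminating NUL, which is
     * exactly the separator byte the frame needs after each field */
    need = sizeof(tag) + sizeof(comp) + (size_t) slen + 1 + n;
    if (need > outlen) {
        return 0;
    }
    memcpy(out + off, tag, sizeof(tag));
    off += sizeof(tag);
    memcpy(out + off, comp, sizeof(comp));
    off += sizeof(comp);
    memcpy(out + off, sizefield, (size_t) slen + 1);
    off += (size_t) slen + 1;
    if (n > 0) {
        memcpy(out + off, payload, n);
        off += n;
    }
    return off;
}
```

Calling strlen on the finished frame returns 5, the length of "blob:" alone, so a string-based copy or pack would drop everything after the first separator.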

At priority 100 it is tried first on every legacy generate and almost always wins the smallest-wins regex2 contest — but it is also the component most likely to be absent, so both configurations must be tested.

6.9.6. Wire Format

When a pmix_regex2_t is transmitted between peers it is serialized by pmix_bfrops_base_pack_regex2 (src/mca/bfrops/base/bfrop_base_pack.c) as, in order: the type string, then len as a PMIX_SIZE, then len PMIX_BYTE values (only when len > 0). Unpacking mirrors this. Per the PMIx interoperability rules this field order is frozen — fields may only be appended, never reordered or removed — so that a server and client built against different PMIx versions remain compatible.
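The field order (type, then len, then the bytes) can be illustrated with a simplified serializer. The byte-level layout here is an assumption made for the example: the type is written as a NUL-terminated string and len as an 8-byte big-endian integer, whereas the real bfrops pack routines define their own encodings for PMIX_STRING and PMIX_SIZE. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char    *type;
    uint8_t *bytes;
    size_t   len;
} demo_regex2_t;

/* Pack as: type + NUL, 8-byte big-endian len, then len payload bytes.
 * Returns the number of bytes written, or 0 if `buf` is too small. */
static size_t demo_pack_regex2(const demo_regex2_t *r, uint8_t *buf, size_t buflen)
{
    size_t tlen, off;
    int i;

    assert(NULL != r && NULL != r->type && NULL != buf);
    tlen = strlen(r->type) + 1;
    if (buflen < tlen + 8 || buflen - tlen - 8 < r->len) {
        return 0;
    }
    memcpy(buf, r->type, tlen);
    off = tlen;
    for (i = 7; i >= 0; i--) {
        buf[off++] = (uint8_t) (((uint64_t) r->len >> (8 * i)) & 0xff);
    }
    if (r->len > 0) {                   /* payload only when len > 0 */
        memcpy(buf + off, r->bytes, r->len);
        off += r->len;
    }
    return off;
}

/* Mirror of the packer. Allocates r->type and r->bytes on success.
 * Returns 0 on success, -1 on truncated input or allocation failure. */
static int demo_unpack_regex2(const uint8_t *buf, size_t buflen, demo_regex2_t *r)
{
    const uint8_t *nul;
    size_t tlen, off;
    uint64_t len = 0;
    int i;

    assert(NULL != buf && NULL != r);
    r->type = NULL;
    r->bytes = NULL;
    r->len = 0;

    nul = memchr(buf, 0, buflen);
    if (NULL == nul) {
        return -1;
    }
    tlen = (size_t) (nul - buf) + 1;
    if (buflen - tlen < 8) {
        return -1;
    }
    off = tlen;
    for (i = 0; i < 8; i++) {
        len = (len << 8) | buf[off++];
    }
    if (buflen - off < len) {
        return -1;
    }
    r->type = malloc(tlen);
    if (NULL == r->type) {
        return -1;
    }
    memcpy(r->type, buf, tlen);
    if (len > 0) {
        r->bytes = malloc((size_t) len);
        if (NULL == r->bytes) {
            free(r->type);
            r->type = NULL;
            return -1;
        }
        memcpy(r->bytes, buf + off, (size_t) len);
    }
    r->len = (size_t) len;
    return 0;
}
```

Because the unpacker reads the fields positionally, reordering or removing one would make old and new peers misinterpret each other's buffers, which is the reason the order is frozen and new fields may only be appended.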

The legacy string form is serialized either by a component’s own pack/unpack (which memcpy the exact byte length into the buffer, embedded NULs included) or, in the base fallback, as a plain PMIX_STRING.

6.9.7. Thread Safety

The preg functions are pure, synchronous transforms of their arguments. They allocate and return; they do not thread-shift, block, or mutate shared library state beyond reading the actives list that was built once at init. They are therefore safe to call directly from a progress-thread handler — which is exactly where the server APIs and gds/hash invoke them — and there is no caddy/thread-shift dance in this path. The only ordering constraint is that they must not run concurrently with framework open/close, which happens only at startup and shutdown.

6.9.8. Key Source Files

File                                     Purpose
include/pmix_server.h                    PMIx_generate_regex2 / PMIx_parse_regex2 declarations
include/pmix_common.h.in                 pmix_regex2_t and PMIX_REGEX2 definitions
include/pmix_deprecated.h                deprecated PMIx_generate_regex / PMIx_generate_ppn / PMIX_REGEX
src/server/pmix_server.c                 public API wrappers into pmix_preg
src/mca/preg/preg.h                      preg MCA module interface
src/mca/preg/preg_types.h                pmix_regex_range_t / pmix_regex_value_t helper classes
src/mca/preg/base/preg_base_select.c     component selection / priority ordering
src/mca/preg/base/preg_base_stubs.c      the pmix_preg_base_* dispatch functions
src/mca/preg/native/preg_native.c        the pmix[...] range compressor and parser
src/mca/preg/raw/preg_raw.c              the raw: pass-through
src/mca/preg/compress/preg_compress.c    the blob: / compress zlib encoder
src/mca/gds/hash/gds_hash.c              node/proc map consumer (store_map)
src/mca/bfrops/base/bfrop_base_pack.c    pmix_regex2_t wire serialization