Model Container Contract
This reference defines the I/O contract that every model container must honour to be runnable under the multiverse orchestrator. It is the canonical specification; both the built-in models under store/models/ and any third-party model image must conform to it.
Filesystem Boundary
| Path | Direction | Required | Description |
|---|---|---|---|
| /input/data.h5mu | read | yes | Dataset materialized by the orchestrator. For single-modality datasets the file still uses MuData with a single modality. |
| /output/job_spec.json | read | yes | Per-job runtime instruction written by the orchestrator before the container starts. |
| /output/embeddings.h5 | write | yes | HDF5 file containing exactly one top-level dataset named latent. |
| /output/metrics.json | write | yes | JSON object with model-level metrics and (optionally) training history. |
| /output/umap.png | write | yes | UMAP scatter of the latent space rendered by the container. |
| /output/run.log | write | yes | Structured log (the multiverse.worker.setup_container_logging helper writes this for you). Honours $MULTIVERSE_LOG_LEVEL. |
Model code must not read from or write to any other host path.
In addition to run.log, the orchestrator captures the container's raw stdout/stderr to container.log and its own per-run reasoning to orchestrator.log in the same output directory. The container must not write those two files itself.
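A post-run check for the files the table above marks as written by the container can be sketched as follows. This is a minimal illustration, not part of multiverse.worker: the function name `missing_outputs` is hypothetical, and treating a zero-byte file as missing is an assumption of this sketch rather than a rule stated by the contract.

```python
from pathlib import Path

# Files the contract requires the container to write (from the table above).
REQUIRED_OUTPUTS = ("embeddings.h5", "metrics.json", "umap.png", "run.log")


def missing_outputs(output_dir: str = "/output") -> list[str]:
    """Return the required output files that are absent or empty.

    Hypothetical helper; it checks presence only, not file contents.
    """
    root = Path(output_dir)
    missing = []
    for name in REQUIRED_OUTPUTS:
        path = root / name
        # Assumption: a zero-byte file is treated the same as a missing one.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_outputs()` at the end of a container's entry point and logging a non-empty result gives an early signal before the orchestrator inspects the output directory.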
job_spec.json Schema
| Field | Type | Description |
|---|---|---|
| seed | integer | Deterministic seed. The container must apply this to random, numpy, and (where applicable) torch before training. |
| dataset_id | integer or null | Registry primary key. May be null when running offline. |
| dataset_name | string | Dataset display name or slug. |
| model_name | string | Model slug (lowercase): pca, mofa, multivi, mowgli, cobolt, totalvi. |
| hyperparameters | object | Free-form key-value parameters selected in the GUI or manifest. Must conform to the model's hyperparameter JSON schema. |
| run_settings | object | Experiment-level settings: experiment_name, optional tags. |
Example:
{
  "seed": 42,
  "dataset_id": 7,
  "dataset_name": "pbmc10k",
  "model_name": "pca",
  "hyperparameters": {"n_components": 20, "device": "cpu"},
  "run_settings": {"experiment_name": "pbmc-baselines"}
}
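For containers that do not use the SDK, a top-level type check of the job spec might look like the sketch below. It uses only the standard library. The name `validate_job_spec` is hypothetical, and the sketch assumes every field in the schema table is required. It does not validate hyperparameters against the model's JSON schema.

```python
import json

# Expected top-level fields and their Python types, taken from the schema
# table. dataset_id may also be null, which is handled separately below.
FIELD_TYPES = {
    "seed": int,
    "dataset_id": int,
    "dataset_name": str,
    "model_name": str,
    "hyperparameters": dict,
    "run_settings": dict,
}


def validate_job_spec(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec looks valid."""
    problems = []
    for field, expected in FIELD_TYPES.items():
        if field not in spec:
            # Assumption of this sketch: every listed field is required.
            problems.append(f"missing field: {field}")
            continue
        value = spec[field]
        if field == "dataset_id" and value is None:
            continue  # null is allowed when running offline
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{field}: expected {expected.__name__}")
    return problems


example = json.loads("""
{
  "seed": 42,
  "dataset_id": null,
  "dataset_name": "pbmc10k",
  "model_name": "pca",
  "hyperparameters": {"n_components": 20, "device": "cpu"},
  "run_settings": {"experiment_name": "pbmc-baselines"}
}
""")
assert validate_job_spec(example) == []
```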
embeddings.h5 Format
The number of rows must equal the number of cells in the input. Embedding row ordering must match the obs ordering of /input/data.h5mu.
Compliance check:
import h5py

with h5py.File("/output/embeddings.h5", "r") as f:
    latent = f["latent"][:]
assert latent.ndim == 2 and latent.shape[0] > 0 and latent.shape[1] > 0
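The compliance check above does not compare the row count against the input. A stricter variant could be sketched as below. Both function names are hypothetical. The sketch assumes the mudata package is available to read the cell count from /input/data.h5mu. Row ordering cannot be verified this way, because embeddings.h5 stores only the latent matrix and no obs names.

```python
def check_latent(latent_shape: tuple, n_cells: int) -> None:
    """Raise ValueError if the latent matrix shape violates the contract."""
    if len(latent_shape) != 2:
        raise ValueError(f"latent must be 2-D, got shape {latent_shape}")
    n_rows, n_dims = latent_shape
    if n_dims < 1:
        raise ValueError("latent must have at least one column")
    if n_rows != n_cells:
        raise ValueError(f"latent has {n_rows} rows but input has {n_cells} cells")


def check_embeddings_file(
    embeddings_path: str = "/output/embeddings.h5",
    input_path: str = "/input/data.h5mu",
) -> None:
    """Compare the latent matrix on disk with the input cell count."""
    # Imported lazily so check_latent stays usable without these packages.
    import h5py
    import mudata

    n_cells = mudata.read_h5mu(input_path).n_obs
    with h5py.File(embeddings_path, "r") as f:
        # Reading .shape does not load the matrix into memory.
        check_latent(f["latent"].shape, n_cells)
```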
metrics.json Format
{
  "model_metrics": {"reconstruction_loss": 0.12, "elbo": -1234.5},
  "history": {
    "epoch": [1, 2, 3],
    "train_loss": [10.0, 5.0, 2.0]
  }
}
The model_metrics map should contain finite scalars only. NaN and ±Inf values are sanitised by the tracking layer but degrade comparability across runs. history is optional and consumed by MLflow as a per-epoch metric stream when present.
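One way to keep model_metrics finite on the container side is sketched below, using only the standard library. The helper names `finite_metrics` and `write_metrics` are hypothetical and not part of multiverse.worker. Dropping non-finite entries, rather than replacing them, is a design choice of this sketch and not behaviour mandated by the contract.

```python
import json
import math
from typing import Optional


def finite_metrics(metrics: dict) -> dict:
    """Keep only entries whose value is a finite int or float."""
    clean = {}
    for name, value in metrics.items():
        # Skip non-numeric values; bool is excluded although it subclasses int.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if math.isfinite(value):
            clean[name] = float(value)
    return clean


def write_metrics(path: str, model_metrics: dict,
                  history: Optional[dict] = None) -> None:
    """Write a metrics.json payload with finite model-level scalars."""
    payload = {"model_metrics": finite_metrics(model_metrics)}
    if history:
        payload["history"] = history
    with open(path, "w", encoding="utf-8") as fh:
        # allow_nan=False makes json.dump raise ValueError instead of
        # emitting the non-standard NaN/Infinity tokens.
        json.dump(payload, fh, allow_nan=False, indent=2)
```

Because history is passed through unchanged, a NaN in a per-epoch list makes `write_metrics` raise `ValueError` rather than produce a file that strict JSON parsers reject.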
Container Authoring with multiverse.worker
The file contract above is mandatory; the multiverse.worker SDK is an optional Python convenience that provides every helper needed to honour the contract. The expected import surface is:
from multiverse.worker import (
    OUTPUT_DIR,               # "/output"
    load_input_mudata,        # reads /input/data.h5mu
    load_job_spec,            # parses /output/job_spec.json
    build_model_config,       # resolves hyperparameters with sensible defaults
    save_embeddings,          # writes /output/embeddings.h5 with the latent matrix
    save_umap,                # writes /output/umap.png
    anndata_concatenate,      # multimodal feature concatenation
    setup_container_logging,  # configures /output/run.log
    get_logger,               # named logger
    EpochLogger,              # context manager streaming epoch metrics to MLflow + JSONL
    resolve_device,           # CPU/CUDA selection
)
Reference implementation (PCA, paraphrased from store/models/pca/container/run.py):
import random

import numpy as np
import scanpy as sc

from multiverse.worker import (
    OUTPUT_DIR, anndata_concatenate, build_model_config, get_logger,
    load_input_mudata, load_job_spec, save_embeddings, save_umap,
    setup_container_logging,
)


def main() -> None:
    setup_container_logging(OUTPUT_DIR)
    spec = load_job_spec()
    cfg = build_model_config("pca", spec, OUTPUT_DIR)

    # Seed before any stochastic step. Compare against None so that a
    # seed of 0 is honoured rather than replaced by the fallback.
    seed = cfg.get("seed")
    if seed is None:
        seed = 42
    random.seed(seed)
    np.random.seed(seed)

    # Concatenate all modalities into one AnnData and run PCA on it.
    mdata = load_input_mudata()
    adata = anndata_concatenate([mdata[m] for m in mdata.mod], list(mdata.mod))
    sc.pp.pca(adata, n_comps=cfg["model"]["pca"].get("n_components", 50))

    save_embeddings(adata.obsm["X_pca"], OUTPUT_DIR)
    save_umap(adata.obsm["X_pca"], adata.obs, OUTPUT_DIR)


if __name__ == "__main__":
    main()
Every built-in model image follows this skeleton.
Build Pattern
Container Dockerfiles use mambaorg/micromamba and install the SDK from the build context:
FROM mambaorg/micromamba:2.3.0
USER root
WORKDIR /app
COPY store/models/<slug>/container/environment.yml /tmp/environment.yml
RUN micromamba create -y -f /tmp/environment.yml && micromamba clean -afy
ENV PATH=/opt/conda/envs/<env-name>/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/conda/envs/<env-name>/lib:$LD_LIBRARY_PATH
COPY pyproject.toml README.md /tmp/multiverse/
COPY multiverse/ /tmp/multiverse/multiverse/
RUN pip install "/tmp/multiverse[worker]"
COPY store/models/<slug>/container/run.py /app/run.py
ENTRYPOINT ["python", "/app/run.py"]
The build context is the repository root so that the COPY directives above resolve correctly; see the build: block of store/models/<slug>/model.yaml.
Determinism Rules
- Apply job_spec.json:seed to random, numpy, and torch before any stochastic operation.
- Avoid wall-clock-derived seeds, including UMAP defaults; honour umap_random_state from the hyperparameters when present.
- Do not pin to GPU device 0 implicitly; use resolve_device() and honour the device hyperparameter.
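The seeding rules above can be sketched as a small helper. The names `seed_everything` and `umap_random_state` are hypothetical and not part of multiverse.worker. numpy and torch are seeded only when they are installed in the image. Falling back to the job seed when umap_random_state is absent is an assumption of this sketch.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed random, plus numpy and torch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy is not installed in this image
    try:
        import torch
        torch.manual_seed(seed)
        # Opt in to deterministic kernels; warn_only avoids hard failures
        # for operations that have no deterministic implementation.
        torch.use_deterministic_algorithms(True, warn_only=True)
    except ImportError:
        pass  # torch is not installed in this image


def umap_random_state(hyperparameters: dict, seed: int) -> int:
    """Pick the UMAP random state, preferring the explicit hyperparameter."""
    value = hyperparameters.get("umap_random_state")
    # Assumption: fall back to the job seed, never to a wall-clock default.
    return seed if value is None else int(value)
```

A container would call `seed_everything(spec["seed"])` once, right after loading the job spec and before constructing any model or data loader.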
Failure Modes
| Symptom | Cause | Fix |
|---|---|---|
| embeddings.h5 missing on success path | Container exited before writing outputs. | Wrap I/O in a try/finally; flush before exit. |
| latent key missing in HDF5 | Wrong dataset name. | Use save_embeddings() from multiverse.worker. |
| Row count mismatch with input | Filtering applied after mdata.obs was captured. | Filter the input once and reuse the same indexing. |
| metrics.json invalid JSON | Manual string concatenation. | Use json.dump and write scalars only. |
| Run unreproducible across hosts | Hidden state in CUDA kernels or library defaults. | Seed all RNGs; set torch.use_deterministic_algorithms(True) where supported. |