Architecture
This page is the current system map for Multiverse.
What the System Is
Multiverse is built around these pieces:
- Artifact store under `store/`, which is the durable scientific record.
- mvd kernel, which owns run state transitions, execution supervision (Docker or Slurm), cancellation, validation, and promotion.
- SQLite index (`multiverse_state.db`), which gives the GUI fast registry/run listings and is rebuildable from the journal and artifact store.
- Asset registry (`asset_registry.db`), which holds dataset and model catalog rows and is separate from the run index.
- Streamlit GUI, which plans benchmarks and talks to the in-process mvd controller for execution.
- Projection services such as MLflow and Optuna, which are useful comparison surfaces but not the source of run truth.
System Diagram
flowchart TD
A[Notebook prepared AnnData/MuData] --> B[Dataset registration]
B --> C[Asset registry DB]
C --> D[Streamlit GUI]
D --> E[run_manifest.yaml]
E --> F[mvd kernel]
F --> G[Resource broker]
F --> H1[Docker supervisor]
F --> H2[Slurm + Apptainer executor]
H1 --> I[Model container]
H2 --> I
I --> J[Workspace]
J --> K[Promotion saga]
K --> L[Verified artifact bundle]
L --> M[rebuild-index → multiverse_state.db]
L --> N[MLflow projection]
L --> O[Evaluation cohort/report]
Repository Layout
multiverse/
gui.py Streamlit entry point
cli_entrypoints.py First-class maintenance CLI commands (doctor, rebuild-index, gc, slurm-submit, …)
state_paths.py M1 state-root resolver (MULTIVERSE_STATE_DIR > config > XDG > $HOME/.multiverse)
runner/
cli.py CLI parser: run, register-dataset, init-db, migrate-asset-registry, …
mvd_entrypoint.py Headless mvd-backed run bridge
mvd_inprocess.py GUI in-process mvd controller
mvd/ Kernel, state machine, executor interface
docker_executor.py MvdDockerExecutor (Docker path)
slurm_executor.py MvdSlurmExecutor (Slurm + Apptainer path)
kernel.py KernelConfig, run-state machine
docker_supervisor/ Container engine protocol, RealDockerEngine, labels, leases, cancel saga
slurm/ SlurmEngine Protocol, RealSlurmEngine, InMemorySlurmEngine (fake)
apptainer/ ApptainerEngine Protocol, RealApptainerEngine (with OOM detection)
simple/ Simple-mode runner: contract-only execution without mvd/SQLite/MLflow
client/ Line-delimited JSON protocol for kernel ↔ client RPC
builder.py Docker image build helper (NFS-safe tar, used by register-model --build)
promotion/ Validation/promotion saga and quarantine helpers
artifact/ Artifact manifest, checksums, validators, bundle writer
evaluation/ Launch cohorts, readiness, Docker evaluation runner, reports
journal/ Append-only journal writer/reader
index/ SQLite rebuild support (multiverse_state.db — run index)
index_projection.py Read-only facade over the SQLite run index
asset_registry.py Canonical dataset/model catalog (asset_registry.db)
registry_db.py Legacy shim: kept for backward-compat monkey-patching in tests
gc/ doctor/ projection/ Maintenance and projection commands
registration/ Defensive registration checks
store/
datasets/<slug>/ Dataset manifests and data files
models/<slug>/ Model manifests and build contexts
workspaces/ In-flight workspaces
artifacts/ Promoted immutable run bundles
quarantine/ Recovery evidence requiring user decision
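The `state_paths.py` entry documents a precedence order for the state root (MULTIVERSE_STATE_DIR > config > XDG > $HOME/.multiverse). A minimal sketch of that precedence follows. It assumes the config layer is a plain mapping with a hypothetical `state_dir` key and that the XDG step means `$XDG_STATE_HOME/multiverse`; both are assumptions, not the resolver's actual behavior.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_state_root(
    env: Mapping[str, str],
    config: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Path:
    """Pick the state root: MULTIVERSE_STATE_DIR > config > XDG > $HOME/.multiverse.

    Illustrative only: the "state_dir" config key and the XDG_STATE_HOME
    subdirectory name are hypothetical.
    """
    # 1. An explicit environment override wins.
    explicit = env.get("MULTIVERSE_STATE_DIR")
    if explicit:
        return Path(explicit).expanduser()
    # 2. A value from the user's config file (hypothetical key).
    if config and config.get("state_dir"):
        return Path(config["state_dir"]).expanduser()
    # 3. The XDG base directory, only if the user has set one.
    xdg = env.get("XDG_STATE_HOME")
    if xdg:
        return Path(xdg) / "multiverse"
    # 4. Fallback under the home directory.
    base = home if home is not None else Path.home()
    return base / ".multiverse"
```

In practice this would be called as `resolve_state_root(os.environ, loaded_config)`; passing the environment in explicitly keeps the precedence testable.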
Artifact Store and SQLite
The artifact bundle is the scientific contract. A successful bundle includes artifact_manifest.json and artifact_manifest.sha256, plus validated artifact entries with checksums.
For Slurm runs, the manifest carries a dual-digest pair: the OCI registry digest of the source image and the sha256 of the SIF file that was physically executed. This ties the scientific result to both the registry provenance and the exact binary used on the cluster.
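To make the checksum contract concrete, here is a sketch of how a reader might verify a bundle. It assumes `artifact_manifest.sha256` holds the hex SHA-256 of `artifact_manifest.json` (optionally followed by a filename, `sha256sum`-style) and that the manifest lists entries under a hypothetical `artifacts` key with `path` and `sha256` fields. The real manifest schema may differ.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(bundle: Path) -> list[str]:
    """Return a list of problems; an empty list means the bundle checks out.

    Assumes a hypothetical manifest layout:
    {"artifacts": [{"path": "...", "sha256": "..."}, ...]}
    """
    problems: list[str] = []
    manifest = bundle / "artifact_manifest.json"
    sidecar = bundle / "artifact_manifest.sha256"
    # The sidecar may be "<hex>" or "<hex>  artifact_manifest.json".
    expected = sidecar.read_text().split()[0]
    if sha256_file(manifest) != expected:
        problems.append("artifact_manifest.json does not match its .sha256 sidecar")
        return problems  # entries from an unverified manifest are not trusted
    for entry in json.loads(manifest.read_text())["artifacts"]:
        target = bundle / entry["path"]
        if not target.is_file():
            problems.append(f"missing: {entry['path']}")
        elif sha256_file(target) != entry["sha256"]:
            problems.append(f"checksum mismatch: {entry['path']}")
    return problems
```

The same `sha256_file` helper would apply to the SIF half of the Slurm dual-digest pair; the OCI registry digest, by contrast, comes from the registry and cannot be recomputed from the SIF alone.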
SQLite is split into two databases:
- `multiverse_state.db`: the rebuildable run index. It is allowed to be stale or lost; `multiverse rebuild-index` reconstructs run visibility from journals and artifact manifests without deleting result-like data. As of schema v4 it also holds a `reservation_events` table rebuilt from journal `RESERVATION_GRANTED`/`RESERVATION_RELEASED` records.
- `asset_registry.db`: the dataset and model catalog. Written only by `asset_registry.py`; never rebuilt from scratch (it is authoritative, not derived). Migrate from a pre-split install with `multiverse migrate-asset-registry`.
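The derived/authoritative split is easiest to see in the reservation table. The sketch below replays journal records into a fresh `reservation_events` table using Python's built-in `sqlite3`. The journal record shape (`type`, `reservation_id`, `run_id`, `ts`) and the table columns are assumptions for illustration, not the schema v4 definition.

```python
import sqlite3
from typing import Iterable, Mapping

RESERVATION_TYPES = {"RESERVATION_GRANTED", "RESERVATION_RELEASED"}


def rebuild_reservation_events(
    conn: sqlite3.Connection, journal: Iterable[Mapping[str, object]]
) -> int:
    """Drop and re-derive reservation_events from journal records.

    Because the table is derived, dropping it loses nothing: the journal
    stays the source of truth. Column names here are hypothetical.
    """
    with conn:  # commits the inserted rows on success
        conn.execute("DROP TABLE IF EXISTS reservation_events")
        conn.execute(
            "CREATE TABLE reservation_events ("
            " seq INTEGER PRIMARY KEY, event_type TEXT NOT NULL,"
            " reservation_id TEXT NOT NULL, run_id TEXT, ts TEXT)"
        )
        # Keep only reservation records; other journal entries are ignored here.
        rows = [
            (rec["type"], rec["reservation_id"], rec.get("run_id"), rec.get("ts"))
            for rec in journal
            if rec.get("type") in RESERVATION_TYPES
        ]
        conn.executemany(
            "INSERT INTO reservation_events (event_type, reservation_id, run_id, ts)"
            " VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```

Running the rebuild twice yields the same table, which is the property that lets `multiverse_state.db` be discarded safely while `asset_registry.db` cannot.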
The sole-writer invariant (test_sqlite_writer_isolation.py) enforces that raw SQL mutations appear only in the designated writer modules (index/, index_projection, asset_registry, registry_db, models_ingest). This is a CI gate.
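A check in the spirit of that gate can be sketched as a scan of source files for mutating SQL keywords outside an allow-list. This is not the contents of `test_sqlite_writer_isolation.py`; the regular expression, the path-component matching, and the function names are assumptions.

```python
import re
from pathlib import Path

# Hypothetical allow-list mirroring the designated writer modules.
ALLOWED_WRITERS = {"index", "index_projection", "asset_registry", "registry_db", "models_ingest"}

# Crude heuristic: a string literal that starts with an uppercase mutating SQL verb.
MUTATING_SQL = re.compile(r"""["']\s*(INSERT|UPDATE|DELETE|REPLACE|CREATE|DROP|ALTER)\b""")


def is_allowed(path: Path, package_root: Path) -> bool:
    """True if the file is a writer module or lives under a writer package (e.g. index/)."""
    parts = path.relative_to(package_root).with_suffix("").parts
    return any(part in ALLOWED_WRITERS for part in parts)


def find_violations(package_root: Path) -> list[str]:
    """Return "relative/path.py:lineno" for each suspected raw SQL mutation."""
    violations: list[str] = []
    for path in sorted(package_root.rglob("*.py")):
        if is_allowed(path, package_root):
            continue
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if MUTATING_SQL.search(line):
                violations.append(f"{path.relative_to(package_root)}:{lineno}")
    return violations
```

A CI test would then assert `find_violations(package_root) == []`. A regex scan can miss dynamically built SQL, so a real gate may need a stricter approach.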
Launch Evaluation
Each mvd-backed launch writes a cohort under <output-dir>/.multiverse/launches/<launch_id>/. The cohort records every planned member, including skipped/resumed members, submitted attempt IDs, artifact directories, dataset paths, batch_key, label_key, and requested metrics.
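As a rough picture of what a cohort records, the sketch below derives the cohort directory from the documented path pattern and models one planned member as a dataclass. The field names follow the paragraph above, but the `CohortMember` class, its types, and the `skipped`/`resumed` flags are hypothetical; the actual cohort file format is not specified here.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


def cohort_dir(output_dir: Path, launch_id: str) -> Path:
    """<output-dir>/.multiverse/launches/<launch_id>/ as described above."""
    return output_dir / ".multiverse" / "launches" / launch_id


@dataclass
class CohortMember:
    # Hypothetical shape of one planned member of a launch.
    member_id: str
    dataset_path: Path
    batch_key: str
    label_key: str
    requested_metrics: list[str] = field(default_factory=list)
    # Skipped/resumed members are still recorded, so these may stay empty.
    attempt_ids: list[str] = field(default_factory=list)
    artifact_dir: Optional[Path] = None
    skipped: bool = False
    resumed: bool = False
```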
Evaluation is a separate containerized workflow. The host resolves readiness, writes a trimmed eval_config.json for ready members, mounts datasets/artifacts read-only and the output tree read-write, then runs multiverse-evaluate. The container writes evaluations/<member_id>.json files and a derived evaluation_report.json under the launch directory. scIB plots are stored under plots/dataset_<dataset_slug>/. Promoted artifact directories are not mutated by evaluation.
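The mount policy in the paragraph above maps directly onto `docker run` bind-mount flags. This sketch only builds the argument list. The in-container mount points (`/data`, `/artifacts`, `/eval`) are hypothetical, and how `multiverse-evaluate` locates `eval_config.json` is not specified here, so no arguments are passed to it.

```python
from pathlib import Path


def evaluation_argv(
    image: str,
    datasets_dir: Path,
    artifacts_dir: Path,
    output_tree: Path,
) -> list[str]:
    """Build a docker run command: inputs read-only, output tree read-write.

    Container-side paths are hypothetical.
    """
    return [
        "docker", "run", "--rm",
        # Datasets and promoted artifacts are mounted :ro, so evaluation cannot mutate them.
        "-v", f"{datasets_dir}:/data:ro",
        "-v", f"{artifacts_dir}:/artifacts:ro",
        # Only the output tree is writable (evaluations/, evaluation_report.json, plots/).
        "-v", f"{output_tree}:/eval:rw",
        image,
        "multiverse-evaluate",
    ]
```

Making the artifact mount read-only at the container level is what enforces "promoted artifact directories are not mutated by evaluation", rather than relying on the evaluator's code.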
Readiness statuses (ready, running, training_failed, cancelled, not_submitted, missing_artifact_dir, bad_artifact_manifest, no_embeddings, missing_dataset, unsupported_dataset) are pre-evaluation. Evaluation statuses (pending, running, done, training_failed, not_ready, no_embeddings, missing_dataset, bad_manifest, obs_mismatch, unsupported_dataset, evaluation_failed) are per-member outcomes in the report.
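The split between readiness and evaluation outcomes suggests a simple host-side filter: only members whose readiness is `ready` go into the trimmed `eval_config.json`. The enum values below are copied from the readiness list above; the config layout (`metrics`, `members` keys) and the helper name are hypothetical.

```python
import json
from enum import Enum
from pathlib import Path
from typing import Mapping


class Readiness(str, Enum):
    READY = "ready"
    RUNNING = "running"
    TRAINING_FAILED = "training_failed"
    CANCELLED = "cancelled"
    NOT_SUBMITTED = "not_submitted"
    MISSING_ARTIFACT_DIR = "missing_artifact_dir"
    BAD_ARTIFACT_MANIFEST = "bad_artifact_manifest"
    NO_EMBEDDINGS = "no_embeddings"
    MISSING_DATASET = "missing_dataset"
    UNSUPPORTED_DATASET = "unsupported_dataset"


def write_trimmed_config(
    members: Mapping[str, dict],
    readiness: Mapping[str, Readiness],
    metrics: list[str],
    path: Path,
) -> list[str]:
    """Write only ready members to eval_config.json and return their IDs.

    Non-ready members are left out of the config; their readiness status is
    kept separately by the caller. Config keys are hypothetical.
    """
    ready = sorted(m for m, status in readiness.items() if status is Readiness.READY)
    config = {"metrics": metrics, "members": {m: members[m] for m in ready}}
    path.write_text(json.dumps(config, indent=2))
    return ready
```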
Container Boundary
Every model container uses the same contract:
| Path | Contents |
|---|---|
| /input/data.h5mu | Read-only dataset mount. |
| /output/job_spec.json | Runtime instruction: dataset slug, model version, hyperparameters, seed. |
| /output/ | Writable model outputs. |
Host paths do not appear inside model code.
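From the model's side, the contract reduces to fixed container paths. A minimal entrypoint sketch follows. The JSON key names (`dataset_slug`, `model_version`, `hyperparameters`, `seed`) are guesses at how the documented fields are spelled, and `load_job_spec` takes an overridable directory only so the sketch can be exercised outside a container.

```python
import json
from dataclasses import dataclass
from pathlib import Path

INPUT_DATA = Path("/input/data.h5mu")  # read-only dataset mount
OUTPUT_DIR = Path("/output")           # writable; also holds job_spec.json


@dataclass(frozen=True)
class JobSpec:
    dataset_slug: str
    model_version: str
    hyperparameters: dict
    seed: int


def load_job_spec(output_dir: Path = OUTPUT_DIR) -> JobSpec:
    """Parse <output_dir>/job_spec.json (key names are hypothetical)."""
    raw = json.loads((output_dir / "job_spec.json").read_text())
    return JobSpec(
        dataset_slug=raw["dataset_slug"],
        model_version=raw["model_version"],
        hyperparameters=dict(raw.get("hyperparameters", {})),
        seed=int(raw["seed"]),
    )
```

Model code written this way references only `INPUT_DATA` and `OUTPUT_DIR`, so the same image runs unchanged under Docker or Apptainer.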
Execution Ownership
The GUI and CLI do not directly supervise containers. They submit work through the mvd kernel path. The kernel composes:
- resource admission (broker);
- container launch/reconcile, via `RealDockerEngine` on workstations or via `RealSlurmEngine` + `RealApptainerEngine` on HPC clusters;
- explicit state transitions with an append-only journal;
- cancellation saga;
- output validation;
- atomic promotion saga;
- projection status reporting.
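The pairing of explicit state transitions with an append-only journal can be sketched as a transition table plus a guard that journals each transition before applying it. The state names and allowed edges here are hypothetical; they are not the states defined in `kernel.py`.

```python
import json
from enum import Enum
from typing import TextIO


class RunState(str, Enum):
    # Hypothetical states for illustration only.
    QUEUED = "queued"
    RUNNING = "running"
    VALIDATING = "validating"
    PROMOTED = "promoted"
    FAILED = "failed"
    CANCELLED = "cancelled"


# States absent from this table are terminal.
ALLOWED = {
    RunState.QUEUED: {RunState.RUNNING, RunState.CANCELLED},
    RunState.RUNNING: {RunState.VALIDATING, RunState.FAILED, RunState.CANCELLED},
    RunState.VALIDATING: {RunState.PROMOTED, RunState.FAILED},
}


class Run:
    def __init__(self, run_id: str, journal: TextIO) -> None:
        self.run_id = run_id
        self.state = RunState.QUEUED
        self._journal = journal  # caller opens it in append mode ("a")

    def transition(self, new: RunState) -> None:
        if new not in ALLOWED.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state.value} -> {new.value}")
        # Write the journal line first; in-memory state changes only afterwards.
        record = {"run_id": self.run_id, "from": self.state.value, "to": new.value}
        self._journal.write(json.dumps(record) + "\n")
        self._journal.flush()
        self.state = new
```

Journaling before mutating means a crash can leave the journal ahead of memory but never behind it, which is what allows derived state such as `multiverse_state.db` to be rebuilt. A durable journal would likely also call `os.fsync` after flushing.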
The two executors have different defaults for image identity:
- Docker executor: `accept_degraded=True` by default. Locally built images (`make build-pca`) are the normal development workflow; no OCI digest is expected. Pass `--strict` to opt into publication mode, which requires a registry digest.
- Slurm executor: `accept_degraded=False` by default. HPC runs should have a verified OCI source digest; a SIF of unknown provenance is genuinely degraded. Pass `--accept-degraded` if you need to run an unverified SIF.
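The two defaults amount to a small admission rule. In the sketch below, the `accept_degraded` defaults and the meaning of `--strict`/`--accept-degraded` come from the list above; the helper name `image_admissible` and the way the flags combine are assumptions.

```python
from typing import Optional


def image_admissible(
    executor: str,
    oci_digest: Optional[str],
    strict: bool = False,
    accept_degraded: Optional[bool] = None,
) -> bool:
    """Decide whether an image may run under the documented defaults.

    executor is "docker" or "slurm"; strict models --strict (Docker publication
    mode); accept_degraded models --accept-degraded. Hypothetical helper.
    """
    if executor not in {"docker", "slurm"}:
        raise ValueError(f"unknown executor: {executor}")
    if accept_degraded is None:
        # Documented defaults: Docker tolerates local images, Slurm does not.
        accept_degraded = executor == "docker"
    if strict:
        # Publication mode always requires a registry digest.
        accept_degraded = False
    return oci_digest is not None or accept_degraded
```

Expressing both executors through one rule with different defaults keeps the policy in one place; only the default for missing provenance differs between workstation and cluster.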