Getting Started
This tutorial walks through a first multiverse benchmark, from a Jupyter-prepared object to a model embedding you can inspect again in Scanpy. The guiding idea is to keep the biology in your notebook and let multiverse handle the repeatable execution between curation and interpretation.
What You Will Do
- Save a small AnnData or MuData object from Jupyter.
- Register it in the Streamlit GUI.
- Configure a benchmark plan.
- Launch the run.
- Evaluate completed artifacts in the Run tab.
- Read embeddings.h5 back into Jupyter.
Before You Start
Install dependencies, initialize the registry, optionally start observability services, then launch the GUI:
make bootstrap # uv sync --group dev + init registry + register built-in models
make register-all-datasets # add all the datasets
make services-up # optional: MLflow on :25000, Optuna Dashboard on :28080
make setup # optional: GUI and ML model wrapper extras (Streamlit, Scanpy, scvi-tools)
make build-evaluate # optional: prebuild the evaluation image used by Evaluate
make gui # Streamlit on :28501
Open http://localhost:28501 (or the STREAMLIT_PORT in .env). You do not need to run docker commands by hand during normal use; the mvd-backed runner manages model containers on your behalf.
The same setup can be driven directly through the canonical CLI:
uv run multiverse init-db
uv run multiverse register-model --slug pca
uv run multiverse register-model --slug multivi
uv run multiverse register-dataset --slug pbmc_rna
uv run multiverse run --manifest run_manifest.yaml --output store/artifacts/run_output
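For scripted setups, the same sequence can be assembled from Python. The sketch below only builds the argument lists from the commands shown above and leaves execution to the caller. The helper names `setup_commands` and `run_all` are hypothetical and are not part of multiverse.

```python
import subprocess


def setup_commands(models, datasets, manifest, output):
    """Build the CLI calls shown above as argv lists (hypothetical helper)."""
    base = ["uv", "run", "multiverse"]
    commands = [base + ["init-db"]]
    commands += [base + ["register-model", "--slug", slug] for slug in models]
    commands += [base + ["register-dataset", "--slug", slug] for slug in datasets]
    commands.append(base + ["run", "--manifest", manifest, "--output", output])
    return commands


def run_all(commands):
    """Run each command in order and stop at the first non-zero exit code."""
    for argv in commands:
        subprocess.run(argv, check=True)


commands = setup_commands(
    models=["pca", "multivi"],
    datasets=["pbmc_rna"],
    manifest="run_manifest.yaml",
    output="store/artifacts/run_output",
)
for argv in commands:
    print(" ".join(argv))
```

Calling `run_all(commands)` from the repository root would then execute the steps one after another.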
Step 1: Prepare Data in Jupyter
For a single-modality RNA baseline:
from pathlib import Path
import scanpy as sc
# Load your own curated object; replace the path with your file.
adata = sc.read_h5ad("my_project/processed_pbmc.h5ad")
adata.obs["batch"] = adata.obs["donor_id"].astype(str)
adata.obs["cell_type"] = adata.obs["manual_annotation"].astype(str)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=3000)
dataset_dir = Path("store/datasets/pbmc_rna")
(dataset_dir / "data").mkdir(parents=True, exist_ok=True)
adata.write_h5ad(dataset_dir / "data" / "rna.h5ad")
For multimodal RNA+ATAC data, save a MuData object:
from pathlib import Path
import mudata as md
dataset_dir = Path("store/datasets/pbmc_multiome")
(dataset_dir / "data").mkdir(parents=True, exist_ok=True)
# adata_rna and adata_atac are AnnData objects you prepared earlier in the notebook.
mdata = md.MuData({"rna": adata_rna, "atac": adata_atac})
mdata.obs["batch"] = adata_rna.obs["donor_id"].astype(str)
mdata.obs["cell_type"] = adata_rna.obs["cell_type"].astype(str)
mdata.write_h5mu(dataset_dir / "data" / "processed.h5mu")
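If the two modalities do not cover exactly the same cells, labels copied from `adata_rna.obs` may be missing for cells that appear only in the ATAC modality. A minimal sketch of a check to run before saving, using plain lists of barcodes so it runs without mudata; the function name `modality_overlap` is illustrative only.

```python
def modality_overlap(rna_names, atac_names):
    """Split cell barcodes into shared, RNA-only, and ATAC-only groups."""
    rna, atac = set(rna_names), set(atac_names)
    return {
        "shared": sorted(rna & atac),
        "rna_only": sorted(rna - atac),
        "atac_only": sorted(atac - rna),
    }


# Toy barcodes; in a notebook pass adata_rna.obs_names and adata_atac.obs_names.
overlap = modality_overlap(["c1", "c2", "c3"], ["c2", "c3", "c4"])
print(overlap)
```

Non-empty `rna_only` or `atac_only` lists are a prompt to subset both objects to the shared cells, or to decide deliberately how unlabeled cells should be handled.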
Step 2: Create a Dataset Manifest
The manifest describes what you saved and lets multiverse register the dataset consistently.
import yaml
manifest = {
"name": "PBMC RNA",
"omics": ["rna"],
"raw_files": {"rna": "data/rna.h5ad"},
"metadata_keys": {"batch": "batch", "cell_type": "cell_type"},
}
with open("store/datasets/pbmc_rna/dataset.yaml", "w") as f:
yaml.safe_dump(manifest, f, sort_keys=False)
The batch key identifies the technical or donor grouping that batch-correction metrics will evaluate. The cell_type key identifies biological labels used by supervised metrics. If either column is absent, multiverse logs the value "unknown" for affected cells and disables the metrics that depend on the missing column. It does not silently invent biological labels.
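Because a missing column quietly disables the metrics that depend on it, it can help to compare the manifest against your obs columns before registering. The following is a minimal sketch; `missing_metadata_columns` is a hypothetical helper, not a multiverse function, and it only inspects the manifest fields used in this tutorial.

```python
def missing_metadata_columns(manifest, obs_columns):
    """Return the metadata_keys entries whose obs column is absent."""
    keys = manifest.get("metadata_keys", {})
    available = set(obs_columns)
    return {role: column for role, column in keys.items() if column not in available}


manifest = {
    "name": "PBMC RNA",
    "omics": ["rna"],
    "raw_files": {"rna": "data/rna.h5ad"},
    "metadata_keys": {"batch": "batch", "cell_type": "cell_type"},
}

# In a notebook, pass list(adata.obs.columns) instead of this toy list.
missing = missing_metadata_columns(manifest, ["batch", "n_genes"])
print(missing)
```

An empty result means both keys resolve to real columns; anything else names the role and the column to fix before registration.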
See Data Preparation for additional recipes (RNA+ATAC, RNA+ADT).
Step 3: Register the Dataset
In the GUI:
- Open the Registry tab.
- Expand Register New Dataset.
- Enter store/datasets/pbmc_rna/dataset.yaml in Path to dataset.yaml, or switch on Build manifest from fields and fill the form.
- Click Register Dataset, then Refresh Registry.
- Confirm your dataset appears with status READY.
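The CLI equivalent, useful for scripted workflows, is the register-dataset command from Before You Start:
uv run multiverse register-dataset --slug pbmc_rna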
Step 4: Configure the Benchmark
- Open the Configure tab.
- Review the compatibility matrix. Only Compatible cells are selectable.
- Select the dataset × model pairs you want to run.
- Adjust hyperparameters in the per-row forms; typed controls are rendered from each model's JSON schema.
- Optionally toggle a parameter into a sweep distribution (requires run_gridsearch: true in globals).
- Enter an experiment name and a random seed.
- Click Generate Run Manifest.
The resulting run_manifest.yaml is part of your scientific record. See Run Manifest for the schema.
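One way to treat the manifest as a record is to note its checksum next to your analysis, so you can later confirm that the file you archived is the one that was run. This is a sketch using only the Python standard library; the helper name and the stand-in file contents are illustrative, and multiverse may already record its own checksums.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a stand-in file; in practice call file_sha256("run_manifest.yaml").
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "run_manifest.yaml"
    demo.write_bytes(b"stand-in manifest contents\n")
    digest = file_sha256(demo)

print(digest)
```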
Step 5: Launch and Monitor
In the GUI:
- Open the Run tab.
- Confirm the manifest path and output directory, usually store/artifacts/run_output.
- Click Launch Run.
- Watch the status table. Jobs progress through kernel states such as PENDING -> RUNNING -> PROMOTING -> ARTIFACT_SUCCESS, or end in FAILED / CANCELLED.
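When reading a status table with many jobs, it helps to separate states that can still change from those that cannot. The sketch below assumes that ARTIFACT_SUCCESS, FAILED, and CANCELLED are the end states, which is how the sequence above reads; the kernel may define further states that this page does not list.

```python
from collections import Counter

# Assumed end states, taken from the sequence named in this tutorial.
TERMINAL_STATES = {"ARTIFACT_SUCCESS", "FAILED", "CANCELLED"}


def summarize(states):
    """Count jobs per state and report how many can still change state."""
    counts = Counter(states)
    unfinished = sum(n for state, n in counts.items() if state not in TERMINAL_STATES)
    return counts, unfinished


counts, unfinished = summarize(["RUNNING", "ARTIFACT_SUCCESS", "FAILED", "PENDING"])
print(dict(counts), "unfinished:", unfinished)
```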
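From the CLI, use the run command from Before You Start:
uv run multiverse run --manifest run_manifest.yaml --output store/artifacts/run_output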
Step 6: Evaluate and Inspect Results
- In the Run tab, find Evaluate Experiment after at least one job reaches ARTIFACT_SUCCESS.
- Click Evaluate experiment. The host prepares .multiverse/launches/<launch_id>/eval_config.json and runs the multiverse-evaluate container; the heavy evaluation stack is not imported by the GUI.
- Review the launch-level comparison table. It is derived from .multiverse/launches/<launch_id>/evaluation_report.json and includes one row per cohort member, with statuses such as done, pending, not_ready, no_embeddings, obs_mismatch, or evaluation_failed.
- Open the Results tab to filter by experiment, dataset, model, or status.
- Select a run to view metrics, the model log, job_spec.json, and the artifact tree.
- Copy the artifact directory for notebook analysis.
The artifact layout is:
<output-dir>/store/artifacts/<artifact-id>/
  artifact_manifest.json
  artifact_manifest.sha256
  job_spec.json
  embeddings.h5
  metrics.json       # optional
  umap.png           # optional
  run.log            # model SDK log (multiverse.worker)
  container.log      # host-captured container stdout/stderr
  orchestrator.log   # host-side run reasoning (state transitions, failures)
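Before copying an artifact directory into a notebook workflow, a quick presence check against this layout can catch incomplete copies. In the sketch below, the split into required and optional files is an assumption based on the annotations in the layout; the logs are treated as optional because a run may not carry all of them.

```python
import tempfile
from pathlib import Path

# Assumed split: only metrics.json and umap.png are marked optional in the layout.
REQUIRED = ["artifact_manifest.json", "artifact_manifest.sha256", "job_spec.json", "embeddings.h5"]
OPTIONAL = ["metrics.json", "umap.png", "run.log", "container.log", "orchestrator.log"]


def inspect_artifact(artifact_dir):
    """Report which expected files are missing or present in an artifact directory."""
    artifact_dir = Path(artifact_dir)
    present = {name for name in REQUIRED + OPTIONAL if (artifact_dir / name).is_file()}
    return {
        "missing_required": [name for name in REQUIRED if name not in present],
        "optional_present": [name for name in OPTIONAL if name in present],
    }


# Demo on a throwaway directory that lacks the checksum file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo-artifact"
    demo.mkdir()
    for name in ["artifact_manifest.json", "job_spec.json", "embeddings.h5", "run.log"]:
        (demo / name).touch()
    report = inspect_artifact(demo)

print(report)
```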
Where logs live
Each run carries up to three logs, surfaced together under Logs in the Results tab:
| File | Written by | Use it to debug |
|---|---|---|
| run.log | The model container via multiverse.worker | Model-internal progress, metrics, warnings. |
| container.log | The host (captured container stdout/stderr) | Crashes, tracebacks, OOMs, or non-SDK images that never wrote run.log. |
| orchestrator.log | The host executor | Admission, launch, exit code, promotion outcome, and the exact failure reason. |
Successful runs are promoted to <output-dir>/store/artifacts/<artifact-id>/. Runs that fail before promotion keep their logs in the run's workspace at <output-dir>/store/workspaces/<attempt-id>/, and cancelled runs keep theirs under <output-dir>/store/cancelled/<date>/<attempt-id>/. Session-wide CLI events are written to <output-dir>/multiverse.log, and kernel state-machine events to <output-dir>/journal/current.log.
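Since the logs for a run can sit in one of three places, a small lookup can save some clicking. This is a sketch under the directory layout described above; `find_run_logs` is a hypothetical helper, and it expects the artifact id for promoted runs or the attempt id for failed and cancelled ones.

```python
import tempfile
from pathlib import Path

LOG_NAMES = ("run.log", "container.log", "orchestrator.log")


def find_run_logs(output_dir, run_id):
    """Return the log files for a run from the first location that holds any."""
    store = Path(output_dir) / "store"
    candidates = [store / "artifacts" / run_id, store / "workspaces" / run_id]
    cancelled = store / "cancelled"
    if cancelled.is_dir():
        # Cancelled runs are nested one level deeper, under a date directory.
        candidates += sorted(cancelled.glob(f"*/{run_id}"))
    for directory in candidates:
        logs = [directory / name for name in LOG_NAMES if (directory / name).is_file()]
        if logs:
            return logs
    return []


# Demo with a cancelled attempt; the date directory name is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    attempt = Path(tmp) / "store" / "cancelled" / "2024-01-01" / "attempt-1"
    attempt.mkdir(parents=True)
    (attempt / "orchestrator.log").write_text("cancelled by user\n")
    found = [path.name for path in find_run_logs(tmp, "attempt-1")]
    not_found = find_run_logs(tmp, "attempt-2")

print(found)
```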
Set MULTIVERSE_LOG_LEVEL=DEBUG (a level name or numeric value) before launching to raise verbosity across the host logs and the in-container run.log.
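When launching from a script rather than a shell, the variable can be set on a copy of the environment instead of being exported globally. A minimal sketch; `launch_env` is an illustrative helper, and only the variable name comes from this page.

```python
import os


def launch_env(level="DEBUG", base=None):
    """Return a copy of the environment with MULTIVERSE_LOG_LEVEL set."""
    env = dict(os.environ if base is None else base)
    env["MULTIVERSE_LOG_LEVEL"] = str(level)
    return env


env = launch_env("DEBUG", base={"PATH": "/usr/bin"})
print(env["MULTIVERSE_LOG_LEVEL"])
```

The result can be passed as `env=launch_env()` to `subprocess.run` together with the `uv run multiverse run ...` command, which leaves the parent shell's environment untouched.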
Evaluation state for a launch lives beside the cohort, not inside promoted artifact directories:
<output-dir>/.multiverse/launches/<launch_id>/
  cohort.json
  eval_config.json
  evaluations/<member_id>.json
  evaluation_report.json
  plots/dataset_<dataset_slug>/scib_results.svg
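To see which cohort members already have a stored evaluation outcome, you can list the per-member files in this layout. The sketch relies only on the file naming shown above and does not assume anything about the JSON contents; `evaluated_members` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def evaluated_members(output_dir, launch_id):
    """List member ids that have an evaluations/<member_id>.json file."""
    eval_dir = Path(output_dir) / ".multiverse" / "launches" / launch_id / "evaluations"
    if not eval_dir.is_dir():
        return []
    return sorted(path.stem for path in eval_dir.glob("*.json"))


# Demo with two placeholder members in a throwaway output directory.
with tempfile.TemporaryDirectory() as tmp:
    eval_dir = Path(tmp) / ".multiverse" / "launches" / "demo-launch" / "evaluations"
    eval_dir.mkdir(parents=True)
    for member in ["member-b", "member-a"]:
        (eval_dir / f"{member}.json").write_text("{}")
    members = evaluated_members(tmp, "demo-launch")
    empty = evaluated_members(tmp, "other-launch")

print(members)
```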
For cross-run comparison and metric histories, open the Analysis tab or visit MLflow at http://localhost:25000 directly.
Step 7: Bring Embeddings Back to Jupyter
from pathlib import Path
import h5py
import scanpy as sc
artifact_dir = Path("store/artifacts/run_output/store/artifacts/<artifact-id>")
# Copy the exact path from the Results tab.
with h5py.File(artifact_dir / "embeddings.h5", "r") as f:
embedding = f["latent"][:]
adata = sc.read_h5ad("store/datasets/pbmc_rna/data/rna.h5ad")
adata.obsm["X_multiverse_pca"] = embedding
sc.pp.neighbors(adata, use_rep="X_multiverse_pca")
sc.tl.umap(adata)
sc.pl.umap(adata, color=["batch", "cell_type"])
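Before attaching the embedding, it is worth confirming that it has one row per cell. The check below is a sketch that assumes the stored array is laid out as cells by latent dimensions and in the same cell order as the dataset file; this page does not state either, so verify against your own artifacts.

```python
def check_embedding_rows(embedding_shape, n_obs):
    """Validate a (cells, dims) shape against the number of cells; return dims."""
    if len(embedding_shape) != 2:
        raise ValueError(f"expected a 2-D embedding, got shape {embedding_shape}")
    n_rows, n_dims = embedding_shape
    if n_rows != n_obs:
        raise ValueError(f"embedding has {n_rows} rows but the dataset has {n_obs} cells")
    return n_dims


# In a notebook: check_embedding_rows(embedding.shape, adata.n_obs)
n_dims = check_embedding_rows((100, 30), 100)
print(n_dims)
```

A row-count mismatch usually means the embedding was produced from a different version or subset of the dataset than the one you reloaded.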
Common Issues
| Symptom | Likely cause | What to do |
|---|---|---|
| Dataset does not appear in Configure | Registry has not refreshed. | Registry → Refresh Registry. |
| Job is FAILED | Docker launch, container execution, or output validation failed. | Open orchestrator.log for the failure reason, then container.log for the container traceback. For failed runs these stay under <output-dir>/store/workspaces/<attempt-id>/. |
| executor crashed: unverified_local | Running with --strict but the image has no registry digest. | Remove --strict. The default run allows locally-built images. |
| Metric is missing | The column behind batch_key or cell_type_key is absent or does not support that metric. | Confirm the columns exist in your obs; re-register the dataset if you fix them. |
| database is locked | Concurrent registry writes or an interrupted process. | Retry. If Results looks stale, run uv run multiverse rebuild-index --state-root store/artifacts/run_output --store-root store/artifacts/run_output/store. |
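For scripts that write to the registry, the retry advice for database is locked can be automated. The sketch below assumes the error surfaces as Python's `sqlite3.OperationalError` with that message, which is how SQLite reports lock contention; this page does not confirm how multiverse raises it, so treat the exception type as an assumption.

```python
import sqlite3
import time


def retry_if_locked(action, attempts=5, base_delay=0.1):
    """Call action(), retrying with exponential backoff while the database is locked."""
    for attempt in range(attempts):
        try:
            return action()
        except sqlite3.OperationalError as error:
            # Re-raise unrelated errors, and give up after the final attempt.
            if "database is locked" not in str(error) or attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


# Demo: a stand-in write that fails twice before succeeding.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return "ok"


result = retry_if_locked(flaky_write, base_delay=0.0)
print(result, calls["count"])
```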
Writing Your Methods Section
For a publication, keep these artifacts with the analysis:
- run_manifest.yaml: datasets, models, parameters, seed, metric selection.
- job_spec.json: exact per-job runtime instruction passed to the model container.
- metrics.json: model metrics and training histories where available.
- .multiverse/launches/<launch_id>/evaluation_report.json: launch-level scIB comparison and per-member evaluation statuses.
- .multiverse/launches/<launch_id>/evaluations/<member_id>.json: structured outcome for each evaluated member.
- run.log / container.log: model and host-captured execution logs.
- provenance.json: additional provenance when present.
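Collecting the per-artifact files into a supplementary folder can be scripted. The sketch below copies only the files from the list above that live inside an artifact directory and skips any that are absent; the location of provenance.json is an assumption, and run_manifest.yaml and the launch-level evaluation files would need to be copied separately. `bundle_artifact` is a hypothetical helper.

```python
import shutil
import tempfile
from pathlib import Path

# Per-artifact files from the list above; provenance.json is assumed to sit
# in the artifact directory when it exists.
ARTIFACT_FILES = ["job_spec.json", "metrics.json", "run.log", "container.log", "provenance.json"]


def bundle_artifact(artifact_dir, bundle_dir):
    """Copy the publication-relevant files of one artifact into bundle_dir."""
    artifact_dir, bundle_dir = Path(artifact_dir), Path(bundle_dir)
    target = bundle_dir / artifact_dir.name
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in ARTIFACT_FILES:
        source = artifact_dir / name
        if source.is_file():
            shutil.copy2(source, target / name)
            copied.append(name)
    return copied


# Demo on a throwaway artifact that has only two of the files.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "demo-artifact"
    artifact.mkdir()
    (artifact / "job_spec.json").write_text("{}")
    (artifact / "run.log").write_text("demo\n")
    copied = bundle_artifact(artifact, Path(tmp) / "supplement")
    bundled = sorted(p.name for p in (Path(tmp) / "supplement" / "demo-artifact").iterdir())

print(copied)
```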
A Methods paragraph can state:
Integration benchmarks were run with multiverse (commit <sha>). Datasets were registered with batch key batch and cell-type key cell_type. The benchmark plan, model parameters, random seed, and metric configuration are provided in Supplementary File X (run_manifest.yaml). Per-model runtime specifications and output provenance are archived with each run artifact.
Where to Go Next
- Data Preparation — recipes for RNA, RNA+ATAC, RNA+ADT.
- Models Glossary — assumptions and hyperparameters per model.
- Evaluation Metrics — what each metric measures.
- Benchmarking — designing a defensible comparison.