Models Glossary

This reference summarizes the six built-in integration models available on the platform. The Streamlit GUI reads each model's registered JSON schema and renders the appropriate parameter controls; users should set these values in the GUI rather than editing legacy configuration files.
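As an illustration of schema-driven controls, the sketch below shows what a registered schema for PCA might look like and how user input could be merged over its defaults. The schema layout, the `PCA_SCHEMA` name, and the `resolve_params` helper are hypothetical and are not the platform's actual format; only the parameter names and default values come from the PCA table in this glossary.

```python
import json

# Hypothetical schema layout; the platform's real registered schema may differ.
# Parameter names and defaults are taken from the PCA table in this glossary.
PCA_SCHEMA = json.loads("""
{
  "model": "PCA",
  "parameters": {
    "n_components":      {"type": "integer", "default": 50},
    "device":            {"type": "string",  "default": "cpu"},
    "umap_random_state": {"type": "integer", "default": 42},
    "umap_color_type":   {"type": "string",  "default": "cell_type"}
  }
}
""")

_PYTHON_TYPES = {"integer": int, "string": str}


def resolve_params(schema, user_values):
    """Merge user values over schema defaults; reject unknown keys and wrong types."""
    specs = schema["parameters"]
    unknown = set(user_values) - set(specs)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    resolved = {}
    for name, spec in specs.items():
        value = user_values.get(name, spec["default"])
        expected = _PYTHON_TYPES[spec["type"]]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"{name} expects {spec['type']}, got {type(value).__name__}")
        resolved[name] = value
    return resolved


params = resolve_params(PCA_SCHEMA, {"n_components": 30})
print(params)
```

A GUI built this way only needs the schema to decide which control to draw for each parameter, which is why new parameters would not require changes to legacy configuration files.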

All built-in models write a latent representation to embeddings.h5, model-specific metrics to metrics.json, and a UMAP plot when the requested color key is available in dataset metadata.

PCA

Principal Component Analysis is the linear baseline model. It operates on concatenated AnnData features and projects cells into a low-dimensional principal-component space. If a highly_variable feature flag is present, PCA uses highly variable features; otherwise it uses the full feature matrix.

Supported omics: any omics

Default model metric: total_variance, the sum of the PCA variance ratios retained by the selected components.

Hyperparameters:

Parameter Default Meaning
n_components 50 Number of principal components to compute.
device cpu Registered for schema consistency; the current PCA wrapper is CPU-oriented.
umap_random_state 42 UMAP seed.
umap_color_type cell_type Observation column for UMAP coloring.
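The feature selection and the total_variance metric described for PCA can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not the platform's wrapper, which may rely on a library PCA routine instead: the `pca_embed` function, its arguments, and the random input data are hypothetical, and the highly variable flag is modeled as an optional boolean mask over features.

```python
import numpy as np


def pca_embed(X, highly_variable=None, n_components=50):
    """Project rows of X onto the top principal components.

    X: (cells, features) dense array.
    highly_variable: optional boolean mask over features; None means the flag is absent.
    Returns (embedding, total_variance).
    """
    X = np.asarray(X, dtype=float)
    if highly_variable is not None:
        # Flag present: restrict to highly variable features.
        X = X[:, np.asarray(highly_variable, dtype=bool)]
    Xc = X - X.mean(axis=0)  # center each feature
    # Cap the component count at what the data can support.
    k = min(n_components, *Xc.shape)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    embedding = U[:, :k] * S[:k]  # principal-component scores
    variance = S ** 2
    ratios = variance[:k] / variance.sum()  # explained-variance ratio per component
    return embedding, float(ratios.sum())


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
hv = np.zeros(30, dtype=bool)
hv[:10] = True  # pretend the first 10 features are highly variable

emb, total_variance = pca_embed(X, highly_variable=hv, n_components=5)
print(emb.shape, round(total_variance, 3))
```

Because total_variance is a sum of variance ratios, it lies between 0 and 1 and reaches 1 when every available component is retained.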

MOFA

MOFA uses the MOFA+ workflow through muon to learn latent factors across modalities. It is useful when the scientific question concerns shared and modality-specific axes of variation rather than only clustering performance.

Supported omics: any omics

Default model metric: total_variance, based on the explained variance across learned factors when available.

Hyperparameters:

Parameter Default Meaning
n_factors 20 Number of latent factors to learn.
n_iterations 5000 Number of iterations for MOFA training.
device cpu CPU or CUDA target; non-CPU values enable MOFA GPU mode.
umap_random_state 42 UMAP seed.
umap_color_type cell_type Observation column for UMAP coloring.

MultiVI

MultiVI is an scvi-tools variational model for joint single-cell RNA and ATAC analysis; protein expression (ADT) is also included when available.

Supported omics: rna, atac, adt (optional)

Default model metric: silhouette_score when the requested label column is present. Training history from scvi-tools is also written when available.

Hyperparameters:

Parameter Default Meaning
latent_dimensions 20 Latent-space size for MultiVI configuration.
max_epochs 400 Number of training epochs.
learning_rate 0.001 Training learning rate.
device cpu CPU or CUDA target.
umap_random_state 42 UMAP seed.
umap_color_type cell_type Observation column for UMAP coloring and silhouette labels.
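The conditional silhouette metric described for MultiVI can be sketched as follows. This is an illustration only: the `silhouette` and `model_metrics` helpers are hypothetical, the silhouette is a plain NumPy implementation of the standard Euclidean definition rather than whatever routine the wrapper actually calls, and observation metadata is modeled as a simple mapping from column name to labels.

```python
import numpy as np


def silhouette(X, labels):
    """Mean silhouette coefficient using Euclidean distances (standard definition)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if len(classes) < 2:
        raise ValueError("silhouette needs at least two distinct labels")
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score is defined as 0
        # Mean distance to the point's own cluster, excluding itself.
        a = dist[i, same].sum() / (n_same - 1)
        # Smallest mean distance to any other cluster.
        b = min(dist[i, labels == c].mean() for c in classes if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())


def model_metrics(latent, obs, color_key="cell_type"):
    """Return the silhouette metric only when the label column exists in obs."""
    if color_key not in obs:
        return {}
    return {"silhouette_score": silhouette(latent, obs[color_key])}


# Toy latent space with two well-separated groups (illustrative values).
latent = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
obs = {"cell_type": ["T", "T", "B", "B"]}
metrics = model_metrics(latent, obs)
print(metrics)
```

The `color_key not in obs` check also works when `obs` is a pandas DataFrame, since membership on a DataFrame tests its column names.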

TotalVI

TotalVI is an scvi-tools variational model for CITE-seq style RNA and protein data.

Supported omics: rna, adt

Default model metrics: elbo_train and reconstruction_loss_train from the final available training history values.

Hyperparameters:

Parameter Default Meaning
latent_dimensions 20 Latent-space size.
max_epochs 400 Number of training epochs.
learning_rate 0.001 Training learning rate.
device cpu CPU or CUDA target.
umap_random_state 42 UMAP seed.
umap_color_type cell_type Observation column for UMAP coloring.
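Taking the "final available" value of each TotalVI training-history series can be sketched as below. The `last_available` and `final_metrics` helpers are hypothetical, the history is modeled as a plain dictionary of lists rather than the structure scvi-tools returns, and the numbers are made up for illustration.

```python
import math


def last_available(values):
    """Return the last entry that is neither None nor NaN, or None if there is none."""
    for value in reversed(list(values)):
        if value is not None and not math.isnan(value):
            return float(value)
    return None


def final_metrics(history, keys=("elbo_train", "reconstruction_loss_train")):
    """Collect the final available value for each requested history key."""
    metrics = {}
    for key in keys:
        value = last_available(history.get(key, []))
        if value is not None:
            metrics[key] = value  # keys with no usable value are simply omitted
    return metrics


# Illustrative history; the trailing NaN stands in for a missing final entry.
history = {
    "elbo_train": [1520.4, 1310.9, 1288.2, float("nan")],
    "reconstruction_loss_train": [1490.1, 1295.6, 1270.3],
}
final = final_metrics(history)
print(final)
```

Skipping missing trailing entries means a run that stops before its last epoch is logged would still report the most recent recorded values.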

Mowgli

Mowgli integrates multimodal data with optimal transport and non-negative matrix factorization. The learned representation is stored from the model's optimal-transport latent matrix.

Supported omics: any omics

Default model metric: ot_loss, reported from the final optimal-transport loss.

Hyperparameters:

Parameter Default Meaning
latent_dimensions 20 Size of the learned latent representation.
optimizer adam Optimizer for training.
learning_rate 0.001 Optimizer learning rate.
tol_inner 1e-6 Inner-loop convergence tolerance.
max_iter_inner 500 Maximum number of inner-loop iterations.
device cpu CPU or CUDA target.
umap_random_state 42 UMAP seed.
umap_color_type cell_type Observation column for UMAP coloring.

Cobolt

Cobolt integrates multimodal data with a Bayesian hierarchical model.

Supported omics: any omics

Default model metric: loss, the final training loss. The wrapper stores the loss history when available.

Hyperparameters:

Parameter Default Meaning
latent_dimensions 20 Size of the learned latent representation.
num_epochs 200 Number of training epochs.
learning_rate 0.001 Training learning rate.
random_state 42 Registered random-state parameter for the Cobolt model.
device cpu CPU or CUDA target.
umap_random_state 42 UMAP seed.
umap_color_type cell_type Observation column for UMAP coloring.