stemness
Introduction¶
This notebook will show you how to run CytoTrace to infer degree of differentiation as implemented in the the CellRank package. Recall from the CytoTrace paper that this inference is based on the total number of genes expressed.
CellRank performs several other useful tasks:
- Estimate differentiation direction based a variety of biological priors, including pseudotime, developmental potential, RNA velocity, experimental time points, and more
- Compute initial, terminal, and intermediate 'macrostates'
- Infer fate probabilities and identify driver genes
- Cluster and visualize gene expression trends
In CellRank parlance, kernels are methods that compute cell-cell transition probabilities. Go here to learn more about the different kernels available in CellRank.
Data¶
adRusso22_clusters_abc_sub_031925.h5ad: Directed differentiation of mouse embryonic stem cells and sampled at day 2, 3, 4, and 5. See Russo et al 2022. We have subset the cells to make it quicker to run this notebook.
You can fetch the data from Canvas
Other resources¶
Setup¶
First, we will import necessary packages and load the data
import scanpy as sc
import numpy as np
import pandas as pd
import cellrank as cr
import scvelo as scv
import warnings
warnings.simplefilter("ignore", category=UserWarning)
adata = sc.read_h5ad("data/adRusso22_clusters_abc_sub_031925.h5ad")
adata
AnnData object with n_obs × n_vars = 597 × 26328 obs: 'n_genes_by_counts', 'total_counts', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_mt', 'pct_counts_mt', 'n_genes', 'n_counts', 'cluster', 'timepoint', 'sorting', 'cellid' var: 'mt', 'ribo', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts'
Filter out undetected genes
adstart = adata.copy()
min_cell_percent = 0.005
min_cells = min_cell_percent * adstart.shape[0]
min_cells
2.985
sc.pp.filter_genes(adstart, min_cells = min_cells)
adstart.shape
(597, 17500)
Normalize and define HVG, then PCA
n_hvg = 2000
adstart.layers['counts'] = adstart.X.copy()
sc.pp.normalize_total(adstart)
sc.pp.log1p(adstart)
sc.pp.highly_variable_genes(adstart, n_top_genes=n_hvg, flavor='seurat_v3', layer='counts')
sc.tl.pca(adstart, mask_var='highly_variable')
sc.pl.pca_variance_ratio(adstart, 50)
kNN and UMAP
def_npcs = 30
def_nneigh = 5
sc.pp.neighbors(adstart, n_neighbors = def_nneigh, n_pcs = def_npcs)
sc.tl.umap(adstart)
sc.pl.umap(adstart, color=['timepoint', 'Nanog', 'T','Mesp1', 'Tbx6','Sox1', 'Tubb3', 'cluster'], size=80, alpha=.75,frameon=False, ncols=2)
We will comute two different kernels. First, the ConnectivityKernel that computes cell-cell transitions based purely on transcriptional similarity.
from cellrank.kernels import ConnectivityKernel
ck = ConnectivityKernel(adstart, conn_key='connectivities')
ck.compute_transition_matrix()
ConnectivityKernel[n=597, dnorm=True, key='connectivities']
Once a transition matrix has been computed, we can use it to take random walks. These simulate the sequence of cell states that a given starting cell state will proceed through over time. In the resulting plot, black and yellow circles are initial and end states, respectively.
ck.plot_random_walks(seed=0,n_sims=100,start_ixs={"cluster": "A"},basis="umap")
100%|███████████████████████████████████████| 100/100 [00:00<00:00, 185.51sim/s]
You might find it interesting to see how vaying the kNN parameters impact this result.
Now, let's run CytoTrace. There is a bit of a word-around that we need to perform in oder to get this to work:
adstart.layers["spliced"] = adstart.layers['counts']
adstart.layers["unspliced"] = adstart.layers['counts']
scv.pp.moments(adstart, n_pcs=def_npcs, n_neighbors=def_nneigh)
computing moments based on connectivities finished (0:00:00) --> added 'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
from cellrank.kernels import CytoTRACEKernel
ctk = CytoTRACEKernel(adstart).compute_cytotrace()
sc.pl.embedding(adstart, color=["ct_pseudotime", "timepoint"],basis="umap")
sc.pl.pca(adstart, color=["ct_pseudotime", "timepoint"], projection='3d', ncols=1)
ctk.compute_transition_matrix(threshold_scheme="soft", nu=0.5)
100%|█████████████████████████████████████| 597/597 [00:00<00:00, 6364.41cell/s]
CytoTRACEKernel[n=597, dnorm=False, scheme='soft', b=10.0, nu=0.5]
We can visualize the transition matrix as a vector field as popularized by RNA velocity.
ctk.plot_projection(basis="pca", color="cluster", legend_loc="right", size=75)