stemness

Introduction

This notebook shows how to run CytoTRACE, as implemented in the CellRank package, to infer each cell's degree of differentiation. Recall from the CytoTRACE paper that this inference is based on the number of genes expressed per cell.
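
To make the "number of genes expressed" idea concrete, here is a minimal sketch on a made-up count matrix. It only counts detected genes per cell and ranks the cells; the published method goes further (for example, it smooths this signal across similar cells), which the sketch omits. The matrix values and variable names are illustrative assumptions, not part of the dataset used here.

```python
import numpy as np

# Toy count matrix: 4 cells (rows) x 6 genes (columns); values are made up.
counts = np.array([
    [5, 0, 2, 1, 0, 3],
    [0, 0, 7, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 9, 0, 0],
])

# Number of genes with at least one count in each cell.
genes_expressed = (counts > 0).sum(axis=1)

# Rank cells from most to fewest detected genes; under the CytoTRACE
# assumption, cells with more detected genes are less differentiated.
order = np.argsort(-genes_expressed, kind="stable")

print(genes_expressed)
print(order)
```

Cells at the front of `order` would be the candidates for the least differentiated states.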

CellRank performs several other useful tasks:

Estimate differentiation direction based on a variety of biological priors, including pseudotime, developmental potential, RNA velocity, experimental time points, and more
Compute initial, terminal, and intermediate 'macrostates'
Infer fate probabilities and identify driver genes
Cluster and visualize gene expression trends

In CellRank parlance, kernels are methods that compute cell-cell transition probabilities. See the CellRank documentation to learn more about the different kernels available.

Data

adRusso22_clusters_abc_sub_031925.h5ad: Directed differentiation of mouse embryonic stem cells, sampled at days 2, 3, 4, and 5. See Russo et al. 2022. We have subset the cells so that this notebook runs quickly.

You can fetch the data from Canvas.

Other resources

Setup

First, we will import the necessary packages and load the data.

In [1]:

import scanpy as sc
import numpy as np
import pandas as pd
import cellrank as cr
import scvelo as scv
import warnings
warnings.simplefilter("ignore", category=UserWarning)

In [2]:

adata = sc.read_h5ad("data/adRusso22_clusters_abc_sub_031925.h5ad")
adata

Out[2]:

AnnData object with n_obs × n_vars = 597 × 26328
    obs: 'n_genes_by_counts', 'total_counts', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_mt', 'pct_counts_mt', 'n_genes', 'n_counts', 'cluster', 'timepoint', 'sorting', 'cellid'
    var: 'mt', 'ribo', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts'

Filter out genes detected in fewer than 0.5% of cells

In [3]:

adstart = adata.copy()
min_cell_percent = 0.005
min_cells = min_cell_percent * adstart.shape[0]
min_cells

Out[3]:

2.985

In [4]:

sc.pp.filter_genes(adstart, min_cells = min_cells)
adstart.shape

Out[4]:

(597, 17500)
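
The threshold arithmetic is worth spelling out: 0.5% of 597 cells is 2.985, so a gene must be detected in at least 3 cells to survive. The sketch below reproduces that logic with NumPy on simulated counts. It assumes the filter keeps a gene when its number of expressing cells is greater than or equal to `min_cells`; the simulated matrix and the 50-gene size are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes = 597, 50

# Simulated sparse counts (hypothetical; not the Russo et al. data).
counts = rng.poisson(0.01, size=(n_cells, n_genes))

min_cell_percent = 0.005
min_cells = min_cell_percent * n_cells  # 2.985, a non-integer threshold

# Number of cells in which each gene has at least one count.
cells_per_gene = (counts > 0).sum(axis=0)

# Integer counts compared against 2.985 behave like "at least 3 cells".
keep = cells_per_gene >= min_cells
filtered = counts[:, keep]

print(f"kept {keep.sum()} of {n_genes} genes")
```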

Normalize and define HVG, then PCA
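
As a rough picture of the normalization performed in the next cell, the sketch below scales each cell to a common total and then applies log(1 + x). It assumes the default behavior of scaling every cell to the median total count; the toy matrix is made up. Note that the highly variable gene selection in the notebook is pointed at the raw `counts` layer, because the `seurat_v3` flavor expects unnormalized counts.

```python
import numpy as np

# Toy counts: 3 cells x 3 genes (made-up values).
counts = np.array([
    [10, 0, 30],
    [1, 1, 2],
    [0, 5, 15],
], dtype=float)

# Total counts per cell and the shared target (the median total).
totals = counts.sum(axis=1)
target = np.median(totals)

# Scale each cell so that its counts sum to the target, then log-transform.
normalized = counts / totals[:, None] * target
logged = np.log1p(normalized)

print(normalized.sum(axis=1))
```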

In [5]:

n_hvg = 2000
adstart.layers['counts'] = adstart.X.copy()
sc.pp.normalize_total(adstart)
sc.pp.log1p(adstart)
sc.pp.highly_variable_genes(adstart, n_top_genes=n_hvg, flavor='seurat_v3', layer='counts')

In [6]:

sc.tl.pca(adstart, mask_var='highly_variable')
sc.pl.pca_variance_ratio(adstart, 50)

kNN and UMAP

In [7]:

def_npcs = 30
def_nneigh = 5
sc.pp.neighbors(adstart, n_neighbors = def_nneigh,  n_pcs = def_npcs)
sc.tl.umap(adstart)

In [8]:

sc.pl.umap(adstart, color=['timepoint', 'Nanog', 'T','Mesp1', 'Tbx6','Sox1', 'Tubb3', 'cluster'], size=80, alpha=.75,frameon=False, ncols=2)

We will compute two different kernels. The first is the ConnectivityKernel, which computes cell-cell transitions based purely on transcriptional similarity.

In [9]:

from cellrank.kernels import ConnectivityKernel
ck = ConnectivityKernel(adstart, conn_key='connectivities')
ck.compute_transition_matrix()

Out[9]:

ConnectivityKernel[n=597, dnorm=True, key='connectivities']
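
Conceptually, a similarity-based transition matrix is the kNN connectivity matrix with each row rescaled to sum to 1, so that row i holds the probabilities of moving from cell i to each of its neighbors. The sketch below shows only that row normalization on a hypothetical 4-cell graph; it leaves out the density normalization that the `dnorm=True` flag in the output above refers to.

```python
import numpy as np

# Hypothetical symmetric connectivities between 4 cells (0 = not neighbors).
conn = np.array([
    [0.0, 1.0, 0.5, 0.0],
    [1.0, 0.0, 0.5, 0.0],
    [0.5, 0.5, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

# Row-normalize: each row becomes a probability distribution over neighbors.
T = conn / conn.sum(axis=1, keepdims=True)

print(T.round(3))
```

Because the connectivities are symmetric, the resulting transitions carry no preferred direction, which is why this kernel alone cannot tell early states from late ones.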

Once a transition matrix has been computed, we can use it to take random walks. These simulate the sequence of cell states that a given starting cell state will proceed through over time. In the resulting plot, black and yellow circles are initial and end states, respectively.

In [10]:

ck.plot_random_walks(seed=0,n_sims=100,start_ixs={"cluster": "A"},basis="umap")

100%|███████████████████████████████████████| 100/100 [00:00<00:00, 185.51sim/s]
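
A random walk on a transition matrix is simple to write by hand: at each step, draw the next cell from the row of the current cell. The sketch below does this on a hypothetical 3-state matrix in which state 2 is absorbing; the helper `random_walk` is illustrative and is not how CellRank implements its simulation.

```python
import numpy as np

# Hypothetical row-stochastic matrix; state 2 can only transition to itself.
T = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])

def random_walk(T, start, n_steps, rng):
    """Simulate one walk: repeatedly sample the next state from the current row."""
    path = [start]
    for _ in range(n_steps):
        current = path[-1]
        path.append(int(rng.choice(T.shape[0], p=T[current])))
    return path

rng = np.random.default_rng(0)
walks = [random_walk(T, start=0, n_steps=10, rng=rng) for _ in range(100)]

# Tally where the 100 walks end up.
end_states = np.bincount([w[-1] for w in walks], minlength=T.shape[0])
print(end_states)
```

Most walks should end in the absorbing state, which is the toy analogue of walks accumulating in a terminal cell population.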

You might find it interesting to see how varying the kNN parameters impacts this result.

Now, let's run CytoTRACE. There is a bit of a workaround that we need to perform in order to get this to work:

In [11]:

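# scv.pp.moments was designed for RNA velocity and expects 'spliced' and
# 'unspliced' layers, so we copy the raw counts into both.
adstart.layers["spliced"] = adstart.layers['counts']
adstart.layers["unspliced"] = adstart.layers['counts']
# Compute kNN-smoothed expression ('Ms'), which CytoTRACE will use as input.
scv.pp.moments(adstart, n_pcs=def_npcs, n_neighbors=def_nneigh)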

computing moments based on connectivities
    finished (0:00:00) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
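
The moments computed here are essentially neighbor-averaged expression values, which act as a simple imputation. The sketch below illustrates the idea under the assumption that each cell is averaged with its graph neighbors and itself using row-normalized weights; the exact weighting used by scVelo may differ, and the tiny graph and expression matrix are made up.

```python
import numpy as np

# Hypothetical connectivities for 3 cells arranged in a chain: 0 - 1 - 2.
conn = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])

# Made-up expression matrix: 3 cells x 2 genes.
X = np.array([
    [4.0, 0.0],
    [0.0, 2.0],
    [2.0, 4.0],
])

# Include each cell in its own neighborhood, then normalize the rows.
weights = conn + np.eye(conn.shape[0])
weights /= weights.sum(axis=1, keepdims=True)

# Smoothed expression: weighted average over each cell's neighborhood.
smoothed = weights @ X
print(smoothed)
```

Smoothing of this kind fills in dropout zeros with information from similar cells, which is why it is computed before gene-level correlations are taken.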

In [12]:

from cellrank.kernels import CytoTRACEKernel
ctk = CytoTRACEKernel(adstart).compute_cytotrace()

In [13]:

sc.pl.embedding(adstart, color=["ct_pseudotime", "timepoint"],basis="umap")

In [14]:

sc.pl.pca(adstart, color=["ct_pseudotime", "timepoint"], projection='3d', ncols=1)

In [15]:

ctk.compute_transition_matrix(threshold_scheme="soft", nu=0.5)

100%|█████████████████████████████████████| 597/597 [00:00<00:00, 6364.41cell/s]

Out[15]:

CytoTRACEKernel[n=597, dnorm=False, scheme='soft', b=10.0, nu=0.5]
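
The `soft` scheme biases the kNN graph toward increasing pseudotime: edges that point to a neighbor earlier in pseudotime are down-weighted rather than removed. The sketch below illustrates this for a single cell, assuming a generalized logistic penalty of the form 2 / (1 + exp(b * dt))^(1/nu) with the `b=10.0` and `nu=0.5` shown in the output above. The exact expression used by CellRank may differ, so treat the function `soft_weight` and the toy values as assumptions and check the CellRank documentation for the definitive form.

```python
import numpy as np

def soft_weight(dt, b=10.0, nu=0.5):
    """Assumed penalty for a neighbor lying dt (> 0) behind in pseudotime."""
    return 2.0 / (1.0 + np.exp(b * dt)) ** (1.0 / nu)

# One cell at pseudotime 0.5 with three neighbors (made-up values).
conn = np.array([0.4, 0.3, 0.3])       # kNN connectivities to the neighbors
cell_pt = 0.5
neigh_pt = np.array([0.7, 0.5, 0.2])   # later, equal, and earlier neighbor

# Positive dt means the neighbor is in the cell's pseudotemporal past.
dt = cell_pt - neigh_pt
past = dt > 0

# Down-weight only the edges that point into the past, then renormalize.
biased = conn.copy()
biased[past] *= soft_weight(dt[past])
probs = biased / biased.sum()

print(probs.round(4))
```

In this toy case nearly all of the probability mass moves to the neighbors at equal or later pseudotime, while the backward edge keeps a small nonzero weight.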

We can visualize the transition matrix as a vector field as popularized by RNA velocity.

In [16]:

ctk.plot_projection(basis="pca", color="cluster", legend_loc="right", size=75)
