Homework #5

Due 11:59 pm EST, Friday April 1st, 2022.

Email your solutions (both .ipynb and .html files) to: compscbio@gmail.com.

Background:

That same sadistic post-doc in your lab has just read the RNA Velocity paper and is even more excited about it than about Cytotrace. You decide to see how well velocity pseudotime compares to pseudotime derived from the Cytotrace score, using data for which you can reasonably anchor the starting point of the trajectory inference.

The data

  1. The day 4 mESC data that we keep using over and over again. It contains the raw counts as well as the spliced and unspliced counts. As in HW4, we have cleaned it (i.e. removed potential doublets, low quality cells, and mito, ribo and malat genes). In terms of pre-processing, all you have to do is deal with rarely detected genes and normalization.

  2. There is no second data set for this homework.
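
The two remaining pre-processing steps named in item 1 can be sketched in plain Python on a toy matrix. This is only an illustration of the logic, not the required implementation: the function name, the `min_cells` threshold, and the `target_sum` value are all assumptions chosen for the example. In practice you would most likely use the scanpy equivalents (`sc.pp.filter_genes`, `sc.pp.normalize_total`, `sc.pp.log1p`) with thresholds you justify yourself.

```python
import math

def filter_and_normalize(counts, min_cells=3, target_sum=1e4):
    """counts: list of cells, each a list of raw counts per gene.

    Returns (kept_gene_indices, normalized_matrix).
    min_cells and target_sum are illustrative defaults, not prescribed values.
    """
    n_genes = len(counts[0])
    # Number of cells in which each gene has a nonzero count.
    detected = [sum(1 for row in counts if row[g] > 0) for g in range(n_genes)]
    # Drop rarely detected genes.
    keep = [g for g in range(n_genes) if detected[g] >= min_cells]

    normalized = []
    for row in counts:
        kept = [row[g] for g in keep]
        total = sum(kept)
        # Scale each cell to the same total, then log-transform.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(v * scale) for v in kept])
    return keep, normalized

# Toy data: 3 cells x 4 genes (hypothetical values).
toy = [
    [5, 0, 1, 0],
    [3, 0, 2, 0],
    [0, 1, 4, 0],
]
keep, norm = filter_and_normalize(toy, min_cells=2, target_sum=10)
print(keep)
```

Note that this sketch computes each cell's total after gene filtering; whether you normalize before or after filtering is a choice you should state in your notebook.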

Your mission:

Run your usual pre-processing, clustering and annotation. Perform Cytotrace analysis. Then run RNA velocity and plot the velocity embeddings on either (1) the 1st and 3rd principal components, or (2) the UMAP embedding (which you need to figure out how to make from the scanpy documentation). Then compute velocity pseudotime. Finally, as you did in HW4, compute diffusion-based pseudotime.

Determine how these three pseudotimes compare and produce an appropriate multi-panel figure to display them.
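
One possible way to quantify how the three pseudotimes compare is pairwise Spearman rank correlation, since the three measures live on different scales and only their ordering of cells is directly comparable. The sketch below uses only the standard library and made-up per-cell values. The dictionary keys and numbers are hypothetical, and it assumes the Cytotrace score has already been converted into a pseudotime (for example by inverting it, as you may have done in HW4).

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

# Hypothetical per-cell pseudotimes (same cell order in each list).
pseudotimes = {
    "velocity": [0.0, 0.2, 0.5, 0.7, 1.0],
    "cytotrace": [0.1, 0.3, 0.4, 0.9, 0.8],
    "diffusion": [0.0, 0.1, 0.6, 0.8, 0.9],
}

names = list(pseudotimes)
pairwise = {
    (a, b): spearman(pseudotimes[a], pseudotimes[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
for pair, rho in pairwise.items():
    print(pair, round(rho, 3))
```

A correlation table like this complements, but does not replace, the multi-panel figure: scatter plots of each pair, or the embedding colored by each pseudotime, show where in the trajectory the measures disagree.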

Is there a relationship between velocity confidence and discrepancies between velocity pseudotime and Cytotrace pseudotime and/or diffusion pseudotime?
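
As a starting point for this question, one could define a per-cell discrepancy as the absolute difference between rank-scaled pseudotimes and then compare it between low- and high-confidence cells. The sketch below is a minimal illustration with made-up numbers. The discrepancy definition, the median split, and all values are assumptions for the example, not the expected answer.

```python
import statistics

def rank_scale(values):
    """Map values to [0, 1] by rank so different pseudotimes are comparable.

    Ties are broken by position here, which is acceptable for a sketch only.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    scaled = [0.0] * len(values)
    for position, idx in enumerate(order):
        scaled[idx] = position / (len(values) - 1)
    return scaled

# Hypothetical per-cell values (same cell order in each list).
velocity_pt = [0.0, 0.2, 0.5, 0.7, 1.0]
cytotrace_pt = [0.1, 0.3, 0.4, 0.9, 0.8]
confidence = [0.9, 0.8, 0.85, 0.3, 0.4]

# Per-cell discrepancy between the two rank-scaled pseudotimes.
discrepancy = [
    abs(v - c)
    for v, c in zip(rank_scale(velocity_pt), rank_scale(cytotrace_pt))
]

# Split cells at the median confidence and compare mean discrepancy.
cutoff = statistics.median(confidence)
low = [d for d, conf in zip(discrepancy, confidence) if conf < cutoff]
high = [d for d, conf in zip(discrepancy, confidence) if conf >= cutoff]

mean_low = statistics.mean(low)
mean_high = statistics.mean(high)
print("mean discrepancy, low confidence:", mean_low)
print("mean discrepancy, high confidence:", mean_high)
```

In this toy data the low-confidence cells carry all of the disagreement, but that is by construction. On the real data, a scatter plot of confidence against discrepancy, or a rank correlation between the two, would be a less arbitrary summary than a median split.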