Homework #4

Due 11:59 pm EST, Friday March 18th, 2022.

Email your solutions (both .ipynb and .html files) to: compscbio@gmail.com.

Background:

A post-doc in your lab has just read the Cytotrace paper and is very excited about it. But, as you are now a seasoned professional, your enthusiasm is a bit more modest, and you decide to try it out before adding it to your rapidly expanding toolkit of single-cell (sc) analysis methods. You decide to see how well the Cytotrace score correlates with pseudotime computed on data for which you can reasonably anchor the starting point of the trajectory inference.

The data

  1. The day 4 mESC data that we keep using over and over again. This is the raw counts data; however, we have cleaned it (i.e., removed potential doublets, low-quality cells, and mito, ribo, and malat genes). In terms of pre-processing, all you have to do is deal with rarely detected genes and normalization.

  2. The day 0-4 mESC data that we used in HW2. This is the raw counts data. Just like the data from #1 above, it has been cleaned. Moreover, we have removed the MEFs, so you should start your analysis at the point of dealing with undetected genes.
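
As a rough illustration of the two pre-processing steps mentioned above (dropping rarely detected genes, then normalizing), here is a minimal sketch on a toy counts matrix using NumPy. The function name `filter_and_normalize`, the `min_cells` threshold, the target sum of 10,000, and the log1p transform are all assumptions made for illustration; the assignment does not prescribe them, and in your notebook you will probably use the equivalent steps from your usual single-cell toolkit.

```python
import numpy as np

def filter_and_normalize(counts, min_cells=3, target_sum=1e4):
    """Hypothetical helper. counts: cells x genes array of raw counts."""
    counts = np.asarray(counts, dtype=float)
    # Keep genes detected (count > 0) in at least `min_cells` cells.
    detected_in = (counts > 0).sum(axis=0)
    keep = detected_in >= min_cells
    filtered = counts[:, keep]
    # Scale each cell to the same total count, then log-transform.
    totals = filtered.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    normalized = np.log1p(filtered / totals * target_sum)
    return normalized, keep

# Toy data (4 cells x 4 genes), invented for this example only.
toy = np.array([
    [5, 0, 1, 0],
    [3, 0, 2, 0],
    [4, 1, 0, 0],
    [6, 0, 3, 0],
])
norm, keep = filter_and_normalize(toy, min_cells=3)
```

Here the second gene (seen in one cell) and the fourth gene (never seen) are dropped. The right threshold for "rarely detected" depends on the size of your data set, so treat `min_cells=3` as a placeholder.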

Your mission:

Produce one figure per data set that illustrates the relationship between pseudotime and Cytotrace time (i.e., 1 - Cytotrace score). The idea here is to see how well Cytotrace works on data that you know well. You will need to perform pseudotime analysis using diffusion maps, and judiciously select a root based on cluster annotation. You may wish to color your scatterplots by cluster (or cell type), or facet them, to get a sense of how the relationships vary by cell type.
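
To make the comparison concrete, the sketch below converts a Cytotrace score to Cytotrace time (1 - score) and measures how well it tracks pseudotime with a Spearman rank correlation. This is only one possible way to summarize the relationship alongside your scatterplot; the assignment does not require it. The helper names `average_ranks` and `spearman` are hypothetical, and the arrays are invented toy values, not results from either data set.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        # Extend j to cover the run of values tied with sorted_vals[i].
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])

# Toy values for five cells (assumed, for illustration only).
pseudotime = np.array([0.0, 0.1, 0.35, 0.6, 0.9])
cytotrace_score = np.array([0.95, 0.80, 0.70, 0.30, 0.10])

cytotrace_time = 1.0 - cytotrace_score  # high score = early, so flip it
rho = spearman(pseudotime, cytotrace_time)
```

A rank-based measure is used here because pseudotime and Cytotrace time are on different scales and need not be linearly related. Computing it separately within each cluster would mirror the suggestion to color or facet the scatterplots by cell type.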

Bonus mission:

You wonder whether the overall trends that you have observed with the differentiating mESCs extend to other lineages. Therefore, you decide to perform a similar analysis using the young and old HSC data from HW3.