Homework #6

Due 11:59 pm EST, Friday April 8th, 2022.

Email your solutions (both .ipynb and .html files) to: compscbio@gmail.com.

Background:

A wise, less-sadistic post-doc in your collaborator’s lab has generated scRNAseq data from HSPCs exactly as was done in the Weinreb et al. 2020 paper (in fact, it is the same data). She has asked you to analyze it using CoSpar to address the two questions listed below.

The data

  1. scRNAseq data of hematopoietic stem and progenitor cells (HSPCs) with lineage barcodes. This is the raw counts data, as well as lineage barcodes, as we discussed in the lineage tracing lectures.

  2. There is no second data set for this homework.

Your mission:

Analyze the data to answer the following questions:

1.) What genes distinguish undifferentiated cells biased towards erythrocytes versus megakaryocytes?

2.) What genes distinguish undifferentiated cells biased towards erythrocytes AND megakaryocytes versus those undifferentiated cells biased towards monocytes AND neutrophils?

Bonus (i.e. extra credit)

3.) What genes distinguish the most multipotent cells from more fate-bound but still undifferentiated cells?

4.) Are any signaling pathways enriched in any of the differential expression analyses that you performed in #1-#3? The underlying hypothesis here is that fate biases result from exposure to different signaling milieus. You might explore this using GSEApy or sc.tl.score_genes(). Here is a list of signaling pathway targets, which might be helpful. It was derived as described in Emily Su’s paper. You can load this dict object with the following code:

from joblib import dump, load
sigPathTargets = load("signaling_pathway_targets_040122.joblib")
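
As one possible starting point for question 4, the sketch below tests whether a list of differentially expressed genes overlaps a pathway's targets more than expected by chance, using a hypergeometric test. It assumes that the loaded object is a dict mapping pathway names to lists of target gene symbols; check the actual structure after loading. The toy dict, the gene names, and the helper functions `hypergeom_pval` and `enrichment_table` are hypothetical and only illustrate the idea. GSEApy or sc.tl.score_genes() remain the suggested tools for the real analysis.

```python
from math import comb


def hypergeom_pval(n_universe, n_targets, n_de, n_overlap):
    """Upper-tail hypergeometric p-value, P(X >= n_overlap).

    n_de genes are drawn without replacement from n_universe genes,
    of which n_targets belong to the pathway.
    """
    upper = min(n_targets, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_targets, k) * comb(n_universe - n_targets, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrichment_table(targets, de_genes, universe):
    """Return {pathway: (overlap_count, p_value)} for each pathway."""
    universe_set = set(universe)
    de_set = set(de_genes) & universe_set
    table = {}
    for name, genes in targets.items():
        target_set = set(genes) & universe_set
        overlap = len(target_set & de_set)
        pval = hypergeom_pval(
            len(universe_set), len(target_set), len(de_set), overlap
        )
        table[name] = (overlap, pval)
    return table


# Hypothetical stand-in for the loaded dict (assumed shape: name -> gene list).
toy_targets = {
    "PathwayA": ["g1", "g2", "g3", "g4"],
    "PathwayB": ["g5", "g6", "g7", "g8"],
}
# Hypothetical background: every gene that was tested for differential expression.
universe = [f"g{i}" for i in range(1, 21)]
# Hypothetical differentially expressed genes from one comparison.
de_genes = ["g1", "g2", "g3", "g9", "g10"]

results = enrichment_table(toy_targets, de_genes, universe)
for pathway, (overlap, pval) in results.items():
    print(pathway, overlap, round(pval, 4))
```

The choice of background matters: restricting `universe` to genes that were actually tested keeps the p-values from being inflated by genes that could never have been called differentially expressed. With many pathways, the p-values would also need a multiple-testing correction.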

Important notes:

1.) The first time that you run a CoSpar analysis, run the cs.hf.set_up_folders() function. This will set up some directories that CoSpar assumes are in place.

2.) To answer these questions, you will need to identify the undifferentiated cells that are likely to transition to either of these lineages. Please look at the updated Jupyter Notebook for CoSpar analysis, as it contains some code that we did not cover in class to specifically isolate fate-biased progenitors.

3.) You may need to adjust some parameters, such as sum_fate_prob_thresh in cs.tl.fate_bias(). The same is true when defining differentially expressed genes.

4.) The CoSpar documentation might be helpful if you get stuck or want to dig deeper.
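
To build intuition for note 3, here is a simplified model of how a threshold on the summed fate probability can change which cells are called fate biased. It is not CoSpar's implementation. The functions `fate_bias_scores` and `select_biased`, the cutoffs 0.7 and 0.3, and the toy probabilities are all hypothetical; consult the CoSpar documentation for the real behavior and defaults.

```python
def fate_bias_scores(prob_a, prob_b, sum_fate_prob_thresh=0.05):
    """Score each cell's bias toward fate A on a 0-to-1 scale.

    A cell whose combined probability for fates A and B falls below the
    threshold has too little support for either fate, so it is assigned
    the neutral score 0.5 (an assumption of this sketch).
    """
    scores = []
    for pa, pb in zip(prob_a, prob_b):
        total = pa + pb
        if total <= 0 or total < sum_fate_prob_thresh:
            scores.append(0.5)
        else:
            scores.append(pa / total)
    return scores


def select_biased(scores, threshold_a=0.7, threshold_b=0.3):
    """Return indices of cells biased toward A and toward B."""
    toward_a = [i for i, s in enumerate(scores) if s > threshold_a]
    toward_b = [i for i, s in enumerate(scores) if s < threshold_b]
    return toward_a, toward_b


# Toy fate probabilities for four undifferentiated cells (made-up numbers).
prob_a = [0.40, 0.02, 0.05, 0.01]
prob_b = [0.10, 0.30, 0.05, 0.00]

strict = fate_bias_scores(prob_a, prob_b, sum_fate_prob_thresh=0.05)
loose = fate_bias_scores(prob_a, prob_b, sum_fate_prob_thresh=0.0)

print(select_biased(strict))  # the last cell is treated as neutral
print(select_biased(loose))   # the last cell now counts as biased toward A
```

In this toy case, lowering the threshold admits a cell whose total fate probability is only 0.01, so its apparent bias rests on very little evidence. A stricter threshold gives fewer but better supported progenitors, which in turn affects the downstream differential expression results.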