{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# scRNAseq analysis\n",
"\n",
"- https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger\n",
"- `cellranger count`: aligns reads to the genome/transcriptome, assigns them to cell barcodes and counts UMIs per gene and barcode\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Example invocation:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"$ cellranger count --id=sample345 \\\n",
" --transcriptome=/opt/refdata-gex-GRCh38-2020-A \\\n",
" --fastqs=/home/jdoe/runs/HAWT7ADXX/outs/fastq_path \\\n",
" --sample=mysample \\\n",
" --expect-cells=1000 \\\n",
" --localcores=8 \\\n",
" --localmem=64\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Example output:\n",
"\n",
"```\n",
"Outputs:\n",
"- Run summary HTML: /opt/sample345/outs/web_summary.html\n",
"- Run summary CSV: /opt/sample345/outs/metrics_summary.csv\n",
"- BAM: /opt/sample345/outs/possorted_genome_bam.bam\n",
"- BAM index: /opt/sample345/outs/possorted_genome_bam.bam.bai\n",
"- Filtered feature-barcode matrices MEX: /opt/sample345/outs/filtered_feature_bc_matrix\n",
"- Filtered feature-barcode matrices HDF5: /opt/sample345/outs/filtered_feature_bc_matrix.h5\n",
"- Unfiltered feature-barcode matrices MEX: /opt/sample345/outs/raw_feature_bc_matrix\n",
"- Unfiltered feature-barcode matrices HDF5: /opt/sample345/outs/raw_feature_bc_matrix.h5\n",
"- Secondary analysis output CSV: /opt/sample345/outs/analysis\n",
"- Per-molecule read information: /opt/sample345/outs/molecule_info.h5\n",
"- CRISPR-specific analysis: null\n",
"- Loupe Browser file: /opt/sample345/outs/cloupe.cloupe\n",
"- Feature Reference: null\n",
"- Target Panel File: null\n",
"\n",
"Waiting 6 seconds for UI to do final refresh.\n",
"Pipestance completed successfully!\n",
"\n",
"yyyy-mm-dd hh:mm:ss Shutting down.\n",
"Saving pipestance info to \"tiny/tiny.mri.tgz\"\n",
"```"
]
},
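{
"cell_type": "markdown",
"metadata": {},
"source": [
"The filtered feature-barcode matrix listed above is the usual starting point for downstream analysis. Below is a minimal sketch of loading it and computing basic per-cell QC metrics. It makes several assumptions: that scanpy is installed, that the example output path above exists, and that genes follow mouse naming (`mt-` prefix for mitochondrial genes, `Rps`/`Rpl` for ribosomal proteins). These choices are illustrative and not necessarily what this notebook does.\n",
"\n",
"```python\n",
"import scanpy as sc\n",
"\n",
"# Assumed path: the example HDF5 matrix from the run summary above\n",
"adata = sc.read_10x_h5('/opt/sample345/outs/filtered_feature_bc_matrix.h5')\n",
"adata.var_names_make_unique()  # gene symbols are not guaranteed to be unique\n",
"\n",
"# Flag mitochondrial and ribosomal genes with a simple prefix match (mouse naming assumed)\n",
"adata.var['mt'] = adata.var_names.str.startswith('mt-')\n",
"adata.var['ribo'] = adata.var_names.str.match('Rp[sl]')\n",
"\n",
"# Adds n_genes_by_counts, total_counts, total_counts_mt, pct_counts_mt,\n",
"# total_counts_ribo and pct_counts_ribo as columns of adata.obs\n",
"sc.pp.calculate_qc_metrics(adata, qc_vars=['mt', 'ribo'], percent_top=None, log1p=False, inplace=True)\n",
"adata.obs.head()\n",
"```"
]
},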
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is it actually doing?\n",
"\n",
"- Trims reads\n",
"- Splice-aware alignment to the reference genome (with STAR)\n",
"- Uses a GTF annotation file to assign aligned reads to transcripts (see below)\n",
"- Handles reads that map to more than one locus\n",
"\n",
"How does it handle reads that align to different parts of the genome?\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
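"- By default, only reads aligned to the transcriptome ('transcriptomic' reads, blue) are used for UMI counting (Cell Ranger versions before 7.0)\n",
"- It is often useful, and more appropriate, to count reads aligned to introns as well as to exons\n",
"- `--include-introns`: use all reads except antisense reads for UMI counting (enabled by default from Cell Ranger 7.0)"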
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
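"##### UMI counting\n",
"\n",
"- Group reads with the same cell barcode, UMI and gene\n",
"- Correct UMI sequencing errors: if two groups with the same cell barcode and gene have UMIs that differ by one base, the less-supported UMI is corrected to the better-supported one\n",
"- Same cell barcode and UMI but different genes: keep only the gene with the most supporting reads\n",
"- If the top genes are tied, discard all of them\n",
"- All reads with the same cell barcode, UMI and gene are counted as one UMI\n",
"- The number of reads supporting each UMI is stored in the molecule info file (`molecule_info.h5`), which is useful later\n"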
]
},
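{
"cell_type": "markdown",
"metadata": {},
"source": [
"The UMI counting steps above can be sketched in plain Python. This illustrates the logic and is not Cell Ranger's implementation. It assumes each aligned read has already been reduced to a hypothetical `(cell_barcode, umi, gene)` tuple, and it simplifies UMI correction to a single pass within each barcode/gene group.\n",
"\n",
"```python\n",
"from collections import Counter, defaultdict\n",
"\n",
"\n",
"def hamming(a, b):\n",
"    # Number of mismatched positions between two equal-length UMIs\n",
"    return sum(x != y for x, y in zip(a, b))\n",
"\n",
"\n",
"def count_umis(reads):\n",
"    # reads: one (cell_barcode, umi, gene) tuple per aligned read (assumed input)\n",
"    support = Counter(reads)  # reads per (barcode, UMI, gene) group\n",
"\n",
"    # 1) UMI correction: within a (barcode, gene), a UMI one base away from a\n",
"    #    better-supported UMI is merged into it\n",
"    groups = defaultdict(dict)\n",
"    for (bc, umi, gene), n in support.items():\n",
"        groups[(bc, gene)][umi] = n\n",
"    corrected = Counter()\n",
"    for (bc, gene), umis in groups.items():\n",
"        for umi, n in umis.items():\n",
"            target = umi\n",
"            for other, m in umis.items():\n",
"                if hamming(umi, other) == 1 and m > umis[target]:\n",
"                    target = other\n",
"            corrected[(bc, target, gene)] += n\n",
"\n",
"    # 2) Same barcode and UMI but different genes: keep the gene with the most\n",
"    #    reads; discard the molecule if the top genes are tied\n",
"    by_molecule = defaultdict(dict)\n",
"    for (bc, umi, gene), n in corrected.items():\n",
"        by_molecule[(bc, umi)][gene] = n\n",
"    molecules = {}  # (barcode, UMI) -> (gene, reads supporting that UMI)\n",
"    for key, genes in by_molecule.items():\n",
"        ranked = sorted(genes.items(), key=lambda kv: kv[1], reverse=True)\n",
"        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:\n",
"            continue\n",
"        molecules[key] = ranked[0]\n",
"\n",
"    # 3) Every remaining molecule counts as one UMI for its (barcode, gene)\n",
"    umi_counts = Counter((bc, gene) for (bc, _), (gene, _) in molecules.items())\n",
"    return umi_counts, molecules\n",
"\n",
"\n",
"reads = [('AAAC', 'ACGT', 'Gapdh')] * 3 + [('AAAC', 'ACGA', 'Gapdh'), ('AAAC', 'GGGG', 'Actb'), ('AAAC', 'GGGG', 'Sox2')]\n",
"counts, molecules = count_umis(reads)\n",
"print(counts)  # Counter({('AAAC', 'Gapdh'): 1})\n",
"```\n",
"\n",
"In this example the `ACGA` read is corrected to `ACGT`, so the four Gapdh reads form one UMI. The `GGGG` UMI is discarded because Actb and Sox2 are tied. The `molecules` dictionary keeps the reads-per-UMI numbers, the kind of information the molecule info file stores."
]
},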
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
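"##### Detecting cells\n",
"\n",
"- Based on the EmptyDrops method (Lun et al. 2018)\n",
"- Identify true positives (cells) with a total-UMI threshold, determined from the total UMI counts of the top barcodes, where the number of top barcodes is the expected number of cells\n",
"- Select a set of true negatives (barcodes that are almost certainly empty droplets) and compare the RNA profile of each remaining barcode against it\n"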
]
},
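{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first step above can be sketched as follows. As we understand 10x Genomics' description of the algorithm, the UMI cutoff is the 99th percentile of total UMI counts among the top N barcodes (N = expected number of cells), divided by 10. Treat these constants, and the hypothetical function name, as assumptions of this sketch. The second, EmptyDrops-style step (comparing the remaining barcodes to the profile of the true negatives) is not shown.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def initial_cell_calls(total_umis, expected_cells, quantile=0.99, ratio=10.0):\n",
"    # total_umis: total UMI count per barcode\n",
"    # expected_cells: expected number of cells (cf. --expect-cells)\n",
"    umis = np.asarray(total_umis, dtype=float)\n",
"    top = np.sort(umis)[::-1][:expected_cells]  # highest-UMI barcodes\n",
"    # Assumed rule: cutoff = 99th percentile of the top barcodes / 10\n",
"    cutoff = np.quantile(top, quantile) / ratio\n",
"    return umis > cutoff, cutoff\n",
"\n",
"\n",
"calls, cutoff = initial_cell_calls([5000, 4000, 3000, 300, 50, 10, 5], expected_cells=3)\n",
"print(int(calls.sum()))  # 3\n",
"```"
]
},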
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### CellRanger output status (web summary)\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Sequencing saturation\n",
"\n",
"- \"If sequencing saturation is at 50%, it means that every 2 new reads will result in 1 new UMI count (unique transcript) detected.\"\n",
"- Calculated from the fraction of reads that are duplicates, i.e. reads whose cell barcode, UMI and gene combination has already been observed\n",
"\n",
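"As a worked illustration: 10x Genomics defines sequencing saturation as `1 - n_deduped_reads / n_reads`. Here `n_deduped_reads` is the number of unique (cell barcode, UMI, gene) combinations and `n_reads` is the number of confidently mapped reads with a valid barcode and UMI. The Python sketch below applies that definition. The function names are hypothetical; in practice the inputs would come from Cell Ranger's outputs (e.g. the per-UMI read counts in `molecule_info.h5`).\n",
"\n",
"```python\n",
"def sequencing_saturation(n_deduped_reads, n_reads):\n",
"    # n_deduped_reads: unique (cell barcode, UMI, gene) combinations\n",
"    # n_reads: confidently mapped reads with a valid barcode and UMI\n",
"    return 1.0 - n_deduped_reads / n_reads\n",
"\n",
"\n",
"def saturation_from_reads_per_umi(reads_per_umi):\n",
"    # Each entry is the read count of one UMI (one unique combination)\n",
"    return sequencing_saturation(len(reads_per_umi), sum(reads_per_umi))\n",
"\n",
"\n",
"# 4 UMIs supported by 8 reads: on average 2 reads per new UMI -> 50%\n",
"print(saturation_from_reads_per_umi([1, 1, 2, 4]))  # 0.5\n",
"```\n",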
"\n",
"| |\n",
"|---|\n",
"| AAACATACCCTACC-1 |\n",
"| AAACATACGTCGTA-1 |\n",
"| AAACATACTTTCAC-1 |\n",
"| AAACATTGCATTGG-1 |\n",
"| AAACATTGCTTGCC-1 |\n",
"| ... |\n",
"| TTTGACTGAGGCGA-1 |\n",
"| TTTGACTGCATTGG-1 |\n",
"| TTTGACTGCTGGAT-1 |\n",
"| TTTGACTGGTGAGG-1 |\n",
"| TTTGACTGTACAGC-1 |\n",
"\n",
"5405 rows × 0 columns\n",
"\n",
"| | gene_ids |\n",
"|---|---|\n",
"| Xkr4 | ENSMUSG00000051951 |\n",
"| Gm1992 | ENSMUSG00000089699 |\n",
"| Gm37381 | ENSMUSG00000102343 |\n",
"| Rp1 | ENSMUSG00000025900 |\n",
"| Rp1-1 | ENSMUSG00000109048 |\n",
"| ... | ... |\n",
"| AC168977.1 | ENSMUSG00000079808 |\n",
"| PISD | ENSMUSG00000095041 |\n",
"| DHRSX | ENSMUSG00000063897 |\n",
"| Vmn2r122 | ENSMUSG00000096730 |\n",
"| CAAA01147332.1 | ENSMUSG00000095742 |\n",
"\n",
"27998 rows × 1 columns\n",
"\n",
"| | gene_ids | mt | ribo |\n",
"|---|---|---|---|\n",
"| Xkr4 | ENSMUSG00000051951 | False | False |\n",
"| Gm1992 | ENSMUSG00000089699 | False | False |\n",
"| Gm37381 | ENSMUSG00000102343 | False | False |\n",
"| Rp1 | ENSMUSG00000025900 | False | False |\n",
"| Rp1-1 | ENSMUSG00000109048 | False | False |\n",
"| ... | ... | ... | ... |\n",
"| AC168977.1 | ENSMUSG00000079808 | False | False |\n",
"| PISD | ENSMUSG00000095041 | False | False |\n",
"| DHRSX | ENSMUSG00000063897 | False | False |\n",
"| Vmn2r122 | ENSMUSG00000096730 | False | False |\n",
"| CAAA01147332.1 | ENSMUSG00000095742 | False | False |\n",
"\n",
"27998 rows × 3 columns\n",
"\n",
"| | sampleName | n_genes_by_counts | total_counts | total_counts_ribo | pct_counts_ribo | total_counts_mt | pct_counts_mt |\n",
"|---|---|---|---|---|---|---|---|\n",
"| AAACATACCCTACC-1 | mEB_day4 | 1212 | 2238.0 | 629.0 | 28.105453 | 28.0 | 1.251117 |\n",
"| AAACATACGTCGTA-1 | mEB_day4 | 1588 | 3831.0 | 1267.0 | 33.072304 | 34.0 | 0.887497 |\n",
"| AAACATACTTTCAC-1 | mEB_day4 | 1538 | 3381.0 | 961.0 | 28.423544 | 2.0 | 0.059154 |\n",
"| AAACATTGCATTGG-1 | mEB_day4 | 1221 | 2489.0 | 750.0 | 30.132584 | 24.0 | 0.964243 |\n",
"| AAACATTGCTTGCC-1 | mEB_day4 | 2661 | 9510.0 | 3132.0 | 32.933754 | 71.0 | 0.746583 |\n",
"| ... | ... | ... | ... | ... | ... | ... | ... |\n",
"| TTTGACTGAGGCGA-1 | mEB_day4 | 2446 | 6908.0 | 1999.0 | 28.937466 | 65.0 | 0.940938 |\n",
"| TTTGACTGCATTGG-1 | mEB_day4 | 2906 | 9558.0 | 3067.0 | 32.088303 | 91.0 | 0.952082 |\n",
"| TTTGACTGCTGGAT-1 | mEB_day4 | 1475 | 3280.0 | 1035.0 | 31.554878 | 22.0 | 0.670732 |\n",
"| TTTGACTGGTGAGG-1 | mEB_day4 | 2808 | 9123.0 | 2923.0 | 32.039898 | 55.0 | 0.602872 |\n",
"| TTTGACTGTACAGC-1 | mEB_day4 | 3518 | 14918.0 | 5091.0 | 34.126560 | 91.0 | 0.610001 |\n",
"\n",
"5405 rows × 7 columns\n", "