In this notebook, we compute the differential analysis for a given design:
<3yo vs 4-18 year old for OGM subset given by Manon.
Reads were aligned to the human genome (GRCh38) using STAR and quantified using subread featureCounts in strand-specific mode. Differential analysis is computed using Deseq2 using additional log fold-change shrinkage with apeglm. We then extract multiple set of differentially expressed genes based on different thresholds, visualize them using heatmaps and send them to Enrichr and g:Profiler for ORA enrichment analysis. GSEA is also computed against MSigdb Hallmarks and Curated Pathways gene sets.
Attaching package: ‘dplyr’ The following object is masked from ‘package:Biobase’: combine The following objects are masked from ‘package:GenomicRanges’: intersect, setdiff, union The following object is masked from ‘package:GenomeInfoDb’: intersect The following objects are masked from ‘package:IRanges’: collapse, desc, intersect, setdiff, slice, union The following objects are masked from ‘package:S4Vectors’: first, intersect, rename, setdiff, setequal, union The following objects are masked from ‘package:BiocGenerics’: combine, intersect, setdiff, union The following object is masked from ‘package:matrixStats’: count The following objects are masked from ‘package:data.table’: between, first, last The following objects are masked from ‘package:stats’: filter, lag The following objects are masked from ‘package:base’: intersect, setdiff, setequal, union
class: DESeqDataSet dim: 27346 410 metadata(1): version assays(1): counts rownames: NULL rowData names(16): ensembl_gene_id gene_symbol ... symbol entrezid colnames(410): 1012 1059 ... multiple_runs.672 multiple_runs.1256 colData names(48): cDNA run ... Phenotype Oncotype
class: DESeqDataSet dim: 27346 78 metadata(1): version assays(1): counts rownames: NULL rowData names(16): ensembl_gene_id gene_symbol ... symbol entrezid colnames(78): 1059 1112 ... 808 838 colData names(48): cDNA run ... Phenotype Oncotype
Uniform distribution of p-value is expected if there are no differences between tested groups.
Listing the number of differentially expressed genes by chromosome.
< table of extent 0 >
DataFrame with 20955 rows and 52 columns ensembl_gene_id gene_symbol chr gene_start gene_end gene_width <character> <character> <factor> <integer> <integer> <integer> 1 ENSG00000227232 WASH7P 1 14404 29570 15167 2 ENSG00000278267 MIR6859-1 1 17369 17436 68 3 ENSG00000268903 AL627309.6 1 135141 135895 755 4 ENSG00000269981 AL627309.7 1 137682 137965 284 5 ENSG00000241860 AL627309.5 1 141474 173862 32389 ... ... ... ... ... ... ... 20951 ENSG00000198695 MT-ND6 MT 14149 14673 525 20952 ENSG00000210194 MT-TE MT 14674 14742 69 20953 ENSG00000198727 MT-CYB MT 14747 15887 1141 20954 ENSG00000210195 MT-TT MT 15888 15953 66 20955 ENSG00000210196 MT-TP MT 15956 16023 68 gene_strand gene_id gene_name gene_biotype <factor> <character> <character> <character> 1 - ENSG00000227232 WASH7P unprocessed_pseudogene 2 - ENSG00000278267 MIR6859-1 miRNA 3 - ENSG00000268903 AL627309.6 processed_pseudogene 4 - ENSG00000269981 AL627309.7 processed_pseudogene 5 - ENSG00000241860 AL627309.5 lncRNA ... ... ... ... ... 20951 - ENSG00000198695 MT-ND6 protein_coding 20952 - ENSG00000210194 MT-TE Mt_tRNA 20953 + ENSG00000198727 MT-CYB protein_coding 20954 + ENSG00000210195 MT-TT Mt_tRNA 20955 - ENSG00000210196 MT-TP Mt_tRNA seq_coord_system description gene_id_version <character> <character> <character> 1 chromosome WASP family homolog .. ENSG00000227232.5 2 chromosome microRNA 6859-1 [Sou.. ENSG00000278267.1 3 chromosome novel pseudogene ENSG00000268903.1 4 chromosome novel pseudogene ENSG00000269981.1 5 chromosome novel transcript ENSG00000241860.7 ... ... ... ... 20951 chromosome mitochondrially enco.. ENSG00000198695.2 20952 chromosome mitochondrially enco.. ENSG00000210194.1 20953 chromosome mitochondrially enco.. ENSG00000198727.2 20954 chromosome mitochondrially enco.. ENSG00000210195.2 20955 chromosome mitochondrially enco.. ENSG00000210196.2 canonical_transcript symbol entrezid baseMean baseVar <character> <character> <list> <numeric> <numeric> 1 ENST00000488147 WASH7P 653635 33.76843 218.3021 2 ENST00000619216 MIR6859-1 102466751 5.86586 14.2410 3 ENST00000494149 AL627309.6 NA 10.57676 188.1358 4 ENST00000595919 AL627309.7 NA 8.23722 108.9080 5 ENST00000484859 AL627309.5 NA 5.36797 25.5077 ... ... ... ... ... ... 20951 ENST00000361681 MT-ND6 4541 2903.6958 1.70039e+07 20952 ENST00000387459 MT-TE NA 152.5954 2.60570e+04 20953 ENST00000361789 MT-CYB 4519 33828.1970 1.17024e+09 20954 ENST00000387460 MT-TT NA 196.2208 7.44177e+04 20955 ENST00000387461 MT-TP NA 81.7716 8.44536e+03 allZero dispGeneEst dispGeneIter dispersion dispIter dispOutlier <logical> <numeric> <numeric> <numeric> <numeric> <logical> 1 FALSE 0.163174 9 0.173717 11 FALSE 2 FALSE 0.284756 10 0.330151 6 FALSE 3 FALSE 2.055268 13 2.011257 8 FALSE 4 FALSE 2.129247 12 2.092060 9 FALSE 5 FALSE 0.847645 13 0.887801 12 FALSE ... ... ... ... ... ... ... 20951 FALSE 0.688191 13 0.677965 13 FALSE 20952 FALSE 0.572638 13 0.568603 12 FALSE 20953 FALSE 0.456786 12 0.454881 12 FALSE 20954 FALSE 0.780555 13 0.768349 13 FALSE 20955 FALSE 0.799326 14 0.788683 12 FALSE dispMAP Intercept x_NKX2_vs_non_NKX2_TLX1_TLX3 <numeric> <numeric> <numeric> 1 0.173717 5.08279 -0.2896785 2 0.330151 2.46188 0.4372742 3 2.011257 3.40201 -0.3400029 4 2.092060 3.07204 -1.9369051 5 0.887801 2.48725 0.0339052 ... ... ... ... 20951 0.677965 11.19400 0.3573385 20952 0.568603 6.92985 0.7361034 20953 0.454881 14.89139 -0.1009107 20954 0.768349 7.64650 0.0473476 20955 0.788683 6.02209 0.8392170 x_TLX1_vs_non_NKX2_TLX1_TLX3 x_TLX3_vs_non_NKX2_TLX1_TLX3 SE_Intercept <numeric> <numeric> <numeric> 1 0.0894882 0.000655532 0.0913075 2 0.2384560 0.083112819 0.1442070 3 -1.0727889 0.521363186 0.2936259 4 -0.7176645 0.556119889 0.3009290 5 -0.9250758 -0.077715368 0.2091105 ... ... ... ... 20951 0.0582082 1.189984 0.166396 20952 0.4109170 0.999996 0.153521 20953 -0.1566909 0.826148 0.136256 20954 -0.1002889 -0.148936 0.177705 20955 0.4445913 0.973480 0.181293 SE_x_NKX2_vs_non_NKX2_TLX1_TLX3 SE_x_TLX1_vs_non_NKX2_TLX1_TLX3 <numeric> <numeric> 1 0.286510 0.245972 2 0.429721 0.380235 3 0.910718 0.812260 4 0.993177 0.827217 5 0.645695 0.598737 ... ... ... 20951 0.512833 0.451862 20952 0.471908 0.416024 20953 0.419970 0.370030 20954 0.547689 0.482541 20955 0.556548 0.490858 SE_x_TLX3_vs_non_NKX2_TLX1_TLX3 WaldStatistic_Intercept <numeric> <numeric> 1 0.202832 55.6667 2 0.319562 17.0718 3 0.647689 11.5862 4 0.662821 10.2085 5 0.466507 11.8944 ... ... ... 20951 0.369141 67.2734 20952 0.339570 45.1395 20953 0.302321 109.2898 20954 0.394391 43.0292 20955 0.400602 33.2175 WaldStatistic_x_NKX2_vs_non_NKX2_TLX1_TLX3 <numeric> 1 -1.0110584 2 1.0175771 3 -0.3733350 4 -1.9502119 5 0.0525096 ... ... 20951 0.6967933 20952 1.5598456 20953 -0.2402806 20954 0.0864499 20955 1.5078982 WaldStatistic_x_TLX1_vs_non_NKX2_TLX1_TLX3 <numeric> 1 0.363815 2 0.627127 3 -1.320746 4 -0.867565 5 -1.545045 ... ... 20951 0.128819 20952 0.987725 20953 -0.423455 20954 -0.207835 20955 0.905743 WaldStatistic_x_TLX3_vs_non_NKX2_TLX1_TLX3 WaldPvalue_Intercept <numeric> <numeric> 1 0.0032319 0.00000e+00 2 0.2600835 2.40586e-65 3 0.8049593 4.84097e-31 4 0.8390191 1.81629e-24 5 -0.1665900 1.26539e-32 ... ... ... 20951 3.223652 0.00000e+00 20952 2.944887 0.00000e+00 20953 2.732688 0.00000e+00 20954 -0.377635 0.00000e+00 20955 2.430041 6.01496e-242 WaldPvalue_x_NKX2_vs_non_NKX2_TLX1_TLX3 <numeric> 1 0.3119885 2 0.3088790 3 0.7088991 4 0.0511509 5 0.9581226 ... ... 20951 0.485932 20952 0.118796 20953 0.810113 20954 0.931109 20955 0.131581 WaldPvalue_x_TLX1_vs_non_NKX2_TLX1_TLX3 <numeric> 1 0.715996 2 0.530576 3 0.186586 4 0.385632 5 0.122335 ... ... 20951 0.897501 20952 0.323287 20953 0.671963 20954 0.835358 20955 0.365072 WaldPvalue_x_TLX3_vs_non_NKX2_TLX1_TLX3 betaConv betaIter deviance <numeric> <logical> <numeric> <numeric> 1 0.997421 TRUE 3 633.490 2 0.794799 TRUE 4 418.591 3 0.420843 TRUE 7 514.910 4 0.401459 TRUE 7 473.735 5 0.867693 TRUE 5 432.181 ... ... ... ... ... 20951 0.00126567 TRUE 4 1387.729 20952 0.00323072 TRUE 4 924.283 20953 0.00628198 TRUE 4 1757.500 20954 0.70570171 TRUE 4 979.682 20955 0.01509713 TRUE 5 838.188 maxCooks replace dispFit baseMean log2FoldChange lfcSE <numeric> <logical> <numeric> <numeric> <numeric> <numeric> 1 0.2014267 FALSE 0.862604 33.76843 -1.09289e-05 0.00144270 2 0.1805231 FALSE 3.661655 5.86586 2.60088e-06 0.00144269 3 0.1591125 FALSE 2.152866 10.57676 -3.89797e-07 0.00144269 4 0.0484600 FALSE 2.686454 8.23722 -1.39518e-06 0.00144269 5 0.0797114 FALSE 3.975848 5.36797 8.68260e-08 0.00144269 ... ... ... ... ... ... ... 20951 0.0118543 FALSE 0.281013 2903.6958 1.56320e-06 0.00144269 20952 0.0605151 TRUE 0.404386 152.5954 1.18676e-06 0.00144269 20953 0.0198521 FALSE 0.274757 33828.1970 -5.76429e-07 0.00144269 20954 0.0905510 FALSE 0.375436 196.2208 1.68287e-07 0.00144269 20955 0.1731561 TRUE 0.517170 81.7716 1.10785e-06 0.00144269 pvalue padj <numeric> <numeric> 1 0.3119885 0.676482 2 0.3088790 0.674928 3 0.7088991 0.902149 4 0.0511509 0.351316 5 0.9581226 0.989285 ... ... ... 20951 0.485932 0.793478 20952 0.118796 0.479232 20953 0.810113 0.942216 20954 0.931109 0.980599 20955 0.131581 0.492331
The xlsx table of genes is available here.
Log2 fold-change in this Volcano plot is shrinked using Apeglm procedure.
We will display the significant genes at different stringency levels to handle various cases. Note that they are displayed in increasing stringency order but a good starting point would be to look at the 0.05 threshold.
The top 50 p-value threshold is arbitrary and mostly chosen to display a heatmap with a few readable gene names.
We use various thresholds for functional enrichment analysis to get an appropriate number of genes in each class. For the classes defined by the q-value, note that if the number of gene in a class is less than a dozen, the statistical power of the enrichment test is very low. At the opposite, if the number of genes is too high, the enrichment test will be biased by the large number of genes in the background.
EnrichR queries are available below:
g:Profiler queries are availables below. All these queries include the filter on |shrinked log2 fold-change| > 0.2 :
The xlsx table of gsea results is available here.
Picking joint bandwidth of 0.181