In this notebook, we compute the differential analysis for a given design:
<3yo vs 4-18 year old for OGM subset given by Manon.
Reads were aligned to the human genome (GRCh38) using STAR and quantified using subread featureCounts in strand-specific mode. Differential analysis is computed using Deseq2 using additional log fold-change shrinkage with apeglm. We then extract multiple set of differentially expressed genes based on different thresholds, visualize them using heatmaps and send them to Enrichr and g:Profiler for ORA enrichment analysis. GSEA is also computed against MSigdb Hallmarks and Curated Pathways gene sets.
Attaching package: ‘dplyr’
The following object is masked from ‘package:Biobase’:
combine
The following objects are masked from ‘package:GenomicRanges’:
intersect, setdiff, union
The following object is masked from ‘package:GenomeInfoDb’:
intersect
The following objects are masked from ‘package:IRanges’:
collapse, desc, intersect, setdiff, slice, union
The following objects are masked from ‘package:S4Vectors’:
first, intersect, rename, setdiff, setequal, union
The following objects are masked from ‘package:BiocGenerics’:
combine, intersect, setdiff, union
The following object is masked from ‘package:matrixStats’:
count
The following objects are masked from ‘package:data.table’:
between, first, last
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
class: DESeqDataSet dim: 27346 410 metadata(1): version assays(1): counts rownames: NULL rowData names(16): ensembl_gene_id gene_symbol ... symbol entrezid colnames(410): 1012 1059 ... multiple_runs.672 multiple_runs.1256 colData names(48): cDNA run ... Phenotype Oncotype
class: DESeqDataSet dim: 27346 78 metadata(1): version assays(1): counts rownames: NULL rowData names(16): ensembl_gene_id gene_symbol ... symbol entrezid colnames(78): 1059 1112 ... 808 838 colData names(48): cDNA run ... Phenotype Oncotype
Uniform distribution of p-value is expected if there are no differences between tested groups.
Listing the number of differentially expressed genes by chromosome.
< table of extent 0 >
DataFrame with 20955 rows and 52 columns
ensembl_gene_id gene_symbol chr gene_start gene_end gene_width
<character> <character> <factor> <integer> <integer> <integer>
1 ENSG00000227232 WASH7P 1 14404 29570 15167
2 ENSG00000278267 MIR6859-1 1 17369 17436 68
3 ENSG00000268903 AL627309.6 1 135141 135895 755
4 ENSG00000269981 AL627309.7 1 137682 137965 284
5 ENSG00000241860 AL627309.5 1 141474 173862 32389
... ... ... ... ... ... ...
20951 ENSG00000198695 MT-ND6 MT 14149 14673 525
20952 ENSG00000210194 MT-TE MT 14674 14742 69
20953 ENSG00000198727 MT-CYB MT 14747 15887 1141
20954 ENSG00000210195 MT-TT MT 15888 15953 66
20955 ENSG00000210196 MT-TP MT 15956 16023 68
gene_strand gene_id gene_name gene_biotype
<factor> <character> <character> <character>
1 - ENSG00000227232 WASH7P unprocessed_pseudogene
2 - ENSG00000278267 MIR6859-1 miRNA
3 - ENSG00000268903 AL627309.6 processed_pseudogene
4 - ENSG00000269981 AL627309.7 processed_pseudogene
5 - ENSG00000241860 AL627309.5 lncRNA
... ... ... ... ...
20951 - ENSG00000198695 MT-ND6 protein_coding
20952 - ENSG00000210194 MT-TE Mt_tRNA
20953 + ENSG00000198727 MT-CYB protein_coding
20954 + ENSG00000210195 MT-TT Mt_tRNA
20955 - ENSG00000210196 MT-TP Mt_tRNA
seq_coord_system description gene_id_version
<character> <character> <character>
1 chromosome WASP family homolog .. ENSG00000227232.5
2 chromosome microRNA 6859-1 [Sou.. ENSG00000278267.1
3 chromosome novel pseudogene ENSG00000268903.1
4 chromosome novel pseudogene ENSG00000269981.1
5 chromosome novel transcript ENSG00000241860.7
... ... ... ...
20951 chromosome mitochondrially enco.. ENSG00000198695.2
20952 chromosome mitochondrially enco.. ENSG00000210194.1
20953 chromosome mitochondrially enco.. ENSG00000198727.2
20954 chromosome mitochondrially enco.. ENSG00000210195.2
20955 chromosome mitochondrially enco.. ENSG00000210196.2
canonical_transcript symbol entrezid baseMean baseVar
<character> <character> <list> <numeric> <numeric>
1 ENST00000488147 WASH7P 653635 33.76843 218.3021
2 ENST00000619216 MIR6859-1 102466751 5.86586 14.2410
3 ENST00000494149 AL627309.6 NA 10.57676 188.1358
4 ENST00000595919 AL627309.7 NA 8.23722 108.9080
5 ENST00000484859 AL627309.5 NA 5.36797 25.5077
... ... ... ... ... ...
20951 ENST00000361681 MT-ND6 4541 2903.6958 1.70039e+07
20952 ENST00000387459 MT-TE NA 152.5954 2.60570e+04
20953 ENST00000361789 MT-CYB 4519 33828.1970 1.17024e+09
20954 ENST00000387460 MT-TT NA 196.2208 7.44177e+04
20955 ENST00000387461 MT-TP NA 81.7716 8.44536e+03
allZero dispGeneEst dispGeneIter dispersion dispIter dispOutlier
<logical> <numeric> <numeric> <numeric> <numeric> <logical>
1 FALSE 0.163174 9 0.173717 11 FALSE
2 FALSE 0.284756 10 0.330151 6 FALSE
3 FALSE 2.055268 13 2.011257 8 FALSE
4 FALSE 2.129247 12 2.092060 9 FALSE
5 FALSE 0.847645 13 0.887801 12 FALSE
... ... ... ... ... ... ...
20951 FALSE 0.688191 13 0.677965 13 FALSE
20952 FALSE 0.572638 13 0.568603 12 FALSE
20953 FALSE 0.456786 12 0.454881 12 FALSE
20954 FALSE 0.780555 13 0.768349 13 FALSE
20955 FALSE 0.799326 14 0.788683 12 FALSE
dispMAP Intercept x_NKX2_vs_non_NKX2_TLX1_TLX3
<numeric> <numeric> <numeric>
1 0.173717 5.08279 -0.2896785
2 0.330151 2.46188 0.4372742
3 2.011257 3.40201 -0.3400029
4 2.092060 3.07204 -1.9369051
5 0.887801 2.48725 0.0339052
... ... ... ...
20951 0.677965 11.19400 0.3573385
20952 0.568603 6.92985 0.7361034
20953 0.454881 14.89139 -0.1009107
20954 0.768349 7.64650 0.0473476
20955 0.788683 6.02209 0.8392170
x_TLX1_vs_non_NKX2_TLX1_TLX3 x_TLX3_vs_non_NKX2_TLX1_TLX3 SE_Intercept
<numeric> <numeric> <numeric>
1 0.0894882 0.000655532 0.0913075
2 0.2384560 0.083112819 0.1442070
3 -1.0727889 0.521363186 0.2936259
4 -0.7176645 0.556119889 0.3009290
5 -0.9250758 -0.077715368 0.2091105
... ... ... ...
20951 0.0582082 1.189984 0.166396
20952 0.4109170 0.999996 0.153521
20953 -0.1566909 0.826148 0.136256
20954 -0.1002889 -0.148936 0.177705
20955 0.4445913 0.973480 0.181293
SE_x_NKX2_vs_non_NKX2_TLX1_TLX3 SE_x_TLX1_vs_non_NKX2_TLX1_TLX3
<numeric> <numeric>
1 0.286510 0.245972
2 0.429721 0.380235
3 0.910718 0.812260
4 0.993177 0.827217
5 0.645695 0.598737
... ... ...
20951 0.512833 0.451862
20952 0.471908 0.416024
20953 0.419970 0.370030
20954 0.547689 0.482541
20955 0.556548 0.490858
SE_x_TLX3_vs_non_NKX2_TLX1_TLX3 WaldStatistic_Intercept
<numeric> <numeric>
1 0.202832 55.6667
2 0.319562 17.0718
3 0.647689 11.5862
4 0.662821 10.2085
5 0.466507 11.8944
... ... ...
20951 0.369141 67.2734
20952 0.339570 45.1395
20953 0.302321 109.2898
20954 0.394391 43.0292
20955 0.400602 33.2175
WaldStatistic_x_NKX2_vs_non_NKX2_TLX1_TLX3
<numeric>
1 -1.0110584
2 1.0175771
3 -0.3733350
4 -1.9502119
5 0.0525096
... ...
20951 0.6967933
20952 1.5598456
20953 -0.2402806
20954 0.0864499
20955 1.5078982
WaldStatistic_x_TLX1_vs_non_NKX2_TLX1_TLX3
<numeric>
1 0.363815
2 0.627127
3 -1.320746
4 -0.867565
5 -1.545045
... ...
20951 0.128819
20952 0.987725
20953 -0.423455
20954 -0.207835
20955 0.905743
WaldStatistic_x_TLX3_vs_non_NKX2_TLX1_TLX3 WaldPvalue_Intercept
<numeric> <numeric>
1 0.0032319 0.00000e+00
2 0.2600835 2.40586e-65
3 0.8049593 4.84097e-31
4 0.8390191 1.81629e-24
5 -0.1665900 1.26539e-32
... ... ...
20951 3.223652 0.00000e+00
20952 2.944887 0.00000e+00
20953 2.732688 0.00000e+00
20954 -0.377635 0.00000e+00
20955 2.430041 6.01496e-242
WaldPvalue_x_NKX2_vs_non_NKX2_TLX1_TLX3
<numeric>
1 0.3119885
2 0.3088790
3 0.7088991
4 0.0511509
5 0.9581226
... ...
20951 0.485932
20952 0.118796
20953 0.810113
20954 0.931109
20955 0.131581
WaldPvalue_x_TLX1_vs_non_NKX2_TLX1_TLX3
<numeric>
1 0.715996
2 0.530576
3 0.186586
4 0.385632
5 0.122335
... ...
20951 0.897501
20952 0.323287
20953 0.671963
20954 0.835358
20955 0.365072
WaldPvalue_x_TLX3_vs_non_NKX2_TLX1_TLX3 betaConv betaIter deviance
<numeric> <logical> <numeric> <numeric>
1 0.997421 TRUE 3 633.490
2 0.794799 TRUE 4 418.591
3 0.420843 TRUE 7 514.910
4 0.401459 TRUE 7 473.735
5 0.867693 TRUE 5 432.181
... ... ... ... ...
20951 0.00126567 TRUE 4 1387.729
20952 0.00323072 TRUE 4 924.283
20953 0.00628198 TRUE 4 1757.500
20954 0.70570171 TRUE 4 979.682
20955 0.01509713 TRUE 5 838.188
maxCooks replace dispFit baseMean log2FoldChange lfcSE
<numeric> <logical> <numeric> <numeric> <numeric> <numeric>
1 0.2014267 FALSE 0.862604 33.76843 -1.09289e-05 0.00144270
2 0.1805231 FALSE 3.661655 5.86586 2.60088e-06 0.00144269
3 0.1591125 FALSE 2.152866 10.57676 -3.89797e-07 0.00144269
4 0.0484600 FALSE 2.686454 8.23722 -1.39518e-06 0.00144269
5 0.0797114 FALSE 3.975848 5.36797 8.68260e-08 0.00144269
... ... ... ... ... ... ...
20951 0.0118543 FALSE 0.281013 2903.6958 1.56320e-06 0.00144269
20952 0.0605151 TRUE 0.404386 152.5954 1.18676e-06 0.00144269
20953 0.0198521 FALSE 0.274757 33828.1970 -5.76429e-07 0.00144269
20954 0.0905510 FALSE 0.375436 196.2208 1.68287e-07 0.00144269
20955 0.1731561 TRUE 0.517170 81.7716 1.10785e-06 0.00144269
pvalue padj
<numeric> <numeric>
1 0.3119885 0.676482
2 0.3088790 0.674928
3 0.7088991 0.902149
4 0.0511509 0.351316
5 0.9581226 0.989285
... ... ...
20951 0.485932 0.793478
20952 0.118796 0.479232
20953 0.810113 0.942216
20954 0.931109 0.980599
20955 0.131581 0.492331
The xlsx table of genes is available here.
Log2 fold-change in this Volcano plot is shrinked using Apeglm procedure.
We will display the significant genes at different stringency levels to handle various cases. Note that they are displayed in increasing stringency order but a good starting point would be to look at the 0.05 threshold.
The top 50 p-value threshold is arbitrary and mostly chosen to display a heatmap with a few readable gene names.
We use various thresholds for functional enrichment analysis to get an appropriate number of genes in each class. For the classes defined by the q-value, note that if the number of gene in a class is less than a dozen, the statistical power of the enrichment test is very low. At the opposite, if the number of genes is too high, the enrichment test will be biased by the large number of genes in the background.
EnrichR queries are available below:
g:Profiler queries are availables below. All these queries include the filter on |shrinked log2 fold-change| > 0.2 :
The xlsx table of gsea results is available here.
Picking joint bandwidth of 0.181