Abstract

In this notebook, we compute the differential analysis for a given design:

<3yo vs 4-18 year old for OGM subset given by Manon.

Methods

Reads were aligned to the human genome (GRCh38) using STAR and quantified using subread featureCounts in strand-specific mode. Differential analysis is computed using Deseq2 using additional log fold-change shrinkage with apeglm. We then extract multiple set of differentially expressed genes based on different thresholds, visualize them using heatmaps and send them to Enrichr and g:Profiler for ORA enrichment analysis. GSEA is also computed against MSigdb Hallmarks and Curated Pathways gene sets.

Load depencies

Packages

Data

Attaching package: ‘dplyr’


The following object is masked from ‘package:Biobase’:

    combine


The following objects are masked from ‘package:GenomicRanges’:

    intersect, setdiff, union


The following object is masked from ‘package:GenomeInfoDb’:

    intersect


The following objects are masked from ‘package:IRanges’:

    collapse, desc, intersect, setdiff, slice, union


The following objects are masked from ‘package:S4Vectors’:

    first, intersect, rename, setdiff, setequal, union


The following objects are masked from ‘package:BiocGenerics’:

    combine, intersect, setdiff, union


The following object is masked from ‘package:matrixStats’:

    count


The following objects are masked from ‘package:data.table’:

    between, first, last


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


$Age_category
0to3
'#7077A1'
4to18
'#54B435'
18to55
'#C80036'
$ETP
0
'#E3E1D9'
1
'#987D9A'
NI
'#EEEEEE'
$Phenotype
IM0/G/D
'#DDE6ED'
IMB/preAB
'#9DB2BF'
TCR AB
'#526D82'
TCR GD
'#27374D'
NI
'#EEEEEE'
$Oncotype
HOXA
'#FFCF81'
TLX1
'#C3FF93'
TLX3
'#88D66C'
SIL-TAL
'#D862BC'
Negative
'#E3E1D9'
$Bionano_driver_solved
Negative
'#E3E1D9'
MLLT10
'#C63D2F'
NUP98
'#E25E3E'
SET-NUP214
'#FF9B50'
KMT2A
'#FFBB5C'
NKX2
'#F8DE22'
TLX1
'#C3FF93'
TLX3
'#88D66C'
ERG
'#47A992'
SPI1
'#088395'
TAL1
'#D862BC'
LMO1/2r
'#8644A2'
class: DESeqDataSet 
dim: 27346 410 
metadata(1): version
assays(1): counts
rownames: NULL
rowData names(16): ensembl_gene_id gene_symbol ... symbol entrezid
colnames(410): 1012 1059 ... multiple_runs.672 multiple_runs.1256
colData names(48): cDNA run ... Phenotype Oncotype
class: DESeqDataSet 
dim: 27346 78 
metadata(1): version
assays(1): counts
rownames: NULL
rowData names(16): ensembl_gene_id gene_symbol ... symbol entrezid
colnames(78): 1059 1112 ... 808 838
colData names(48): cDNA run ... Phenotype Oncotype
  1. '18to55'
  2. '18to55'
  3. '18to55'
  4. '18to55'
  5. '4to18'
  6. '18to55'
  7. '18to55'
  8. '0to3'
  9. '4to18'
  10. '4to18'
  11. '18to55'
  12. '4to18'
  13. '18to55'
  14. '18to55'
  15. '4to18'
  16. '18to55'
  17. '18to55'
  18. '4to18'
  19. '18to55'
  20. '18to55'
  21. '18to55'
  22. '18to55'
  23. '18to55'
  24. '18to55'
  25. '18to55'
  26. '4to18'
  27. '4to18'
  28. '0to3'
  29. '0to3'
  30. '18to55'
  31. '18to55'
  32. '4to18'
  33. '18to55'
  34. '18to55'
  35. '4to18'
  36. '18to55'
  37. '18to55'
  38. '18to55'
  39. '4to18'
  40. '4to18'
  41. '18to55'
  42. '18to55'
  43. '18to55'
  44. '18to55'
  45. '18to55'
  46. '18to55'
  47. '0to3'
  48. '18to55'
  49. '18to55'
  50. '0to3'
  51. '0to3'
  52. '0to3'
  53. '0to3'
  54. '0to3'
  55. '0to3'
  56. '0to3'
  57. '0to3'
  58. '0to3'
  59. '4to18'
  60. '0to3'
  61. '0to3'
  62. '0to3'
  63. '0to3'
  64. '4to18'
  65. '4to18'
  66. '0to3'
  67. '4to18'
  68. '4to18'
  69. '0to3'
  70. '4to18'
  71. '4to18'
  72. '4to18'
  73. '4to18'
  74. '4to18'
  75. '4to18'
  76. '4to18'
  77. '4to18'
  78. '4to18'
'non_NKX2_TLX1_TLX3'

Define sample annotations for heatmaps

Results

Histogram of p-values

Uniform distribution of p-value is expected if there are no differences between tested groups.

Listing the number of differentially expressed genes by chromosome.

< table of extent 0 >
DataFrame with 20955 rows and 52 columns
      ensembl_gene_id gene_symbol      chr gene_start  gene_end gene_width
          <character> <character> <factor>  <integer> <integer>  <integer>
1     ENSG00000227232      WASH7P        1      14404     29570      15167
2     ENSG00000278267   MIR6859-1        1      17369     17436         68
3     ENSG00000268903  AL627309.6        1     135141    135895        755
4     ENSG00000269981  AL627309.7        1     137682    137965        284
5     ENSG00000241860  AL627309.5        1     141474    173862      32389
...               ...         ...      ...        ...       ...        ...
20951 ENSG00000198695      MT-ND6       MT      14149     14673        525
20952 ENSG00000210194       MT-TE       MT      14674     14742         69
20953 ENSG00000198727      MT-CYB       MT      14747     15887       1141
20954 ENSG00000210195       MT-TT       MT      15888     15953         66
20955 ENSG00000210196       MT-TP       MT      15956     16023         68
      gene_strand         gene_id   gene_name           gene_biotype
         <factor>     <character> <character>            <character>
1               - ENSG00000227232      WASH7P unprocessed_pseudogene
2               - ENSG00000278267   MIR6859-1                  miRNA
3               - ENSG00000268903  AL627309.6   processed_pseudogene
4               - ENSG00000269981  AL627309.7   processed_pseudogene
5               - ENSG00000241860  AL627309.5                 lncRNA
...           ...             ...         ...                    ...
20951           - ENSG00000198695      MT-ND6         protein_coding
20952           - ENSG00000210194       MT-TE                Mt_tRNA
20953           + ENSG00000198727      MT-CYB         protein_coding
20954           + ENSG00000210195       MT-TT                Mt_tRNA
20955           - ENSG00000210196       MT-TP                Mt_tRNA
      seq_coord_system            description   gene_id_version
           <character>            <character>       <character>
1           chromosome WASP family homolog .. ENSG00000227232.5
2           chromosome microRNA 6859-1 [Sou.. ENSG00000278267.1
3           chromosome       novel pseudogene ENSG00000268903.1
4           chromosome       novel pseudogene ENSG00000269981.1
5           chromosome       novel transcript ENSG00000241860.7
...                ...                    ...               ...
20951       chromosome mitochondrially enco.. ENSG00000198695.2
20952       chromosome mitochondrially enco.. ENSG00000210194.1
20953       chromosome mitochondrially enco.. ENSG00000198727.2
20954       chromosome mitochondrially enco.. ENSG00000210195.2
20955       chromosome mitochondrially enco.. ENSG00000210196.2
      canonical_transcript      symbol  entrezid   baseMean     baseVar
               <character> <character>    <list>  <numeric>   <numeric>
1          ENST00000488147      WASH7P    653635   33.76843    218.3021
2          ENST00000619216   MIR6859-1 102466751    5.86586     14.2410
3          ENST00000494149  AL627309.6        NA   10.57676    188.1358
4          ENST00000595919  AL627309.7        NA    8.23722    108.9080
5          ENST00000484859  AL627309.5        NA    5.36797     25.5077
...                    ...         ...       ...        ...         ...
20951      ENST00000361681      MT-ND6      4541  2903.6958 1.70039e+07
20952      ENST00000387459       MT-TE        NA   152.5954 2.60570e+04
20953      ENST00000361789      MT-CYB      4519 33828.1970 1.17024e+09
20954      ENST00000387460       MT-TT        NA   196.2208 7.44177e+04
20955      ENST00000387461       MT-TP        NA    81.7716 8.44536e+03
        allZero dispGeneEst dispGeneIter dispersion  dispIter dispOutlier
      <logical>   <numeric>    <numeric>  <numeric> <numeric>   <logical>
1         FALSE    0.163174            9   0.173717        11       FALSE
2         FALSE    0.284756           10   0.330151         6       FALSE
3         FALSE    2.055268           13   2.011257         8       FALSE
4         FALSE    2.129247           12   2.092060         9       FALSE
5         FALSE    0.847645           13   0.887801        12       FALSE
...         ...         ...          ...        ...       ...         ...
20951     FALSE    0.688191           13   0.677965        13       FALSE
20952     FALSE    0.572638           13   0.568603        12       FALSE
20953     FALSE    0.456786           12   0.454881        12       FALSE
20954     FALSE    0.780555           13   0.768349        13       FALSE
20955     FALSE    0.799326           14   0.788683        12       FALSE
        dispMAP Intercept x_NKX2_vs_non_NKX2_TLX1_TLX3
      <numeric> <numeric>                    <numeric>
1      0.173717   5.08279                   -0.2896785
2      0.330151   2.46188                    0.4372742
3      2.011257   3.40201                   -0.3400029
4      2.092060   3.07204                   -1.9369051
5      0.887801   2.48725                    0.0339052
...         ...       ...                          ...
20951  0.677965  11.19400                    0.3573385
20952  0.568603   6.92985                    0.7361034
20953  0.454881  14.89139                   -0.1009107
20954  0.768349   7.64650                    0.0473476
20955  0.788683   6.02209                    0.8392170
      x_TLX1_vs_non_NKX2_TLX1_TLX3 x_TLX3_vs_non_NKX2_TLX1_TLX3 SE_Intercept
                         <numeric>                    <numeric>    <numeric>
1                        0.0894882                  0.000655532    0.0913075
2                        0.2384560                  0.083112819    0.1442070
3                       -1.0727889                  0.521363186    0.2936259
4                       -0.7176645                  0.556119889    0.3009290
5                       -0.9250758                 -0.077715368    0.2091105
...                            ...                          ...          ...
20951                    0.0582082                     1.189984     0.166396
20952                    0.4109170                     0.999996     0.153521
20953                   -0.1566909                     0.826148     0.136256
20954                   -0.1002889                    -0.148936     0.177705
20955                    0.4445913                     0.973480     0.181293
      SE_x_NKX2_vs_non_NKX2_TLX1_TLX3 SE_x_TLX1_vs_non_NKX2_TLX1_TLX3
                            <numeric>                       <numeric>
1                            0.286510                        0.245972
2                            0.429721                        0.380235
3                            0.910718                        0.812260
4                            0.993177                        0.827217
5                            0.645695                        0.598737
...                               ...                             ...
20951                        0.512833                        0.451862
20952                        0.471908                        0.416024
20953                        0.419970                        0.370030
20954                        0.547689                        0.482541
20955                        0.556548                        0.490858
      SE_x_TLX3_vs_non_NKX2_TLX1_TLX3 WaldStatistic_Intercept
                            <numeric>               <numeric>
1                            0.202832                 55.6667
2                            0.319562                 17.0718
3                            0.647689                 11.5862
4                            0.662821                 10.2085
5                            0.466507                 11.8944
...                               ...                     ...
20951                        0.369141                 67.2734
20952                        0.339570                 45.1395
20953                        0.302321                109.2898
20954                        0.394391                 43.0292
20955                        0.400602                 33.2175
      WaldStatistic_x_NKX2_vs_non_NKX2_TLX1_TLX3
                                       <numeric>
1                                     -1.0110584
2                                      1.0175771
3                                     -0.3733350
4                                     -1.9502119
5                                      0.0525096
...                                          ...
20951                                  0.6967933
20952                                  1.5598456
20953                                 -0.2402806
20954                                  0.0864499
20955                                  1.5078982
      WaldStatistic_x_TLX1_vs_non_NKX2_TLX1_TLX3
                                       <numeric>
1                                       0.363815
2                                       0.627127
3                                      -1.320746
4                                      -0.867565
5                                      -1.545045
...                                          ...
20951                                   0.128819
20952                                   0.987725
20953                                  -0.423455
20954                                  -0.207835
20955                                   0.905743
      WaldStatistic_x_TLX3_vs_non_NKX2_TLX1_TLX3 WaldPvalue_Intercept
                                       <numeric>            <numeric>
1                                      0.0032319          0.00000e+00
2                                      0.2600835          2.40586e-65
3                                      0.8049593          4.84097e-31
4                                      0.8390191          1.81629e-24
5                                     -0.1665900          1.26539e-32
...                                          ...                  ...
20951                                   3.223652          0.00000e+00
20952                                   2.944887          0.00000e+00
20953                                   2.732688          0.00000e+00
20954                                  -0.377635          0.00000e+00
20955                                   2.430041         6.01496e-242
      WaldPvalue_x_NKX2_vs_non_NKX2_TLX1_TLX3
                                    <numeric>
1                                   0.3119885
2                                   0.3088790
3                                   0.7088991
4                                   0.0511509
5                                   0.9581226
...                                       ...
20951                                0.485932
20952                                0.118796
20953                                0.810113
20954                                0.931109
20955                                0.131581
      WaldPvalue_x_TLX1_vs_non_NKX2_TLX1_TLX3
                                    <numeric>
1                                    0.715996
2                                    0.530576
3                                    0.186586
4                                    0.385632
5                                    0.122335
...                                       ...
20951                                0.897501
20952                                0.323287
20953                                0.671963
20954                                0.835358
20955                                0.365072
      WaldPvalue_x_TLX3_vs_non_NKX2_TLX1_TLX3  betaConv  betaIter  deviance
                                    <numeric> <logical> <numeric> <numeric>
1                                    0.997421      TRUE         3   633.490
2                                    0.794799      TRUE         4   418.591
3                                    0.420843      TRUE         7   514.910
4                                    0.401459      TRUE         7   473.735
5                                    0.867693      TRUE         5   432.181
...                                       ...       ...       ...       ...
20951                              0.00126567      TRUE         4  1387.729
20952                              0.00323072      TRUE         4   924.283
20953                              0.00628198      TRUE         4  1757.500
20954                              0.70570171      TRUE         4   979.682
20955                              0.01509713      TRUE         5   838.188
       maxCooks   replace   dispFit   baseMean log2FoldChange      lfcSE
      <numeric> <logical> <numeric>  <numeric>      <numeric>  <numeric>
1     0.2014267     FALSE  0.862604   33.76843   -1.09289e-05 0.00144270
2     0.1805231     FALSE  3.661655    5.86586    2.60088e-06 0.00144269
3     0.1591125     FALSE  2.152866   10.57676   -3.89797e-07 0.00144269
4     0.0484600     FALSE  2.686454    8.23722   -1.39518e-06 0.00144269
5     0.0797114     FALSE  3.975848    5.36797    8.68260e-08 0.00144269
...         ...       ...       ...        ...            ...        ...
20951 0.0118543     FALSE  0.281013  2903.6958    1.56320e-06 0.00144269
20952 0.0605151      TRUE  0.404386   152.5954    1.18676e-06 0.00144269
20953 0.0198521     FALSE  0.274757 33828.1970   -5.76429e-07 0.00144269
20954 0.0905510     FALSE  0.375436   196.2208    1.68287e-07 0.00144269
20955 0.1731561      TRUE  0.517170    81.7716    1.10785e-06 0.00144269
         pvalue      padj
      <numeric> <numeric>
1     0.3119885  0.676482
2     0.3088790  0.674928
3     0.7088991  0.902149
4     0.0511509  0.351316
5     0.9581226  0.989285
...         ...       ...
20951  0.485932  0.793478
20952  0.118796  0.479232
20953  0.810113  0.942216
20954  0.931109  0.980599
20955  0.131581  0.492331

The xlsx table of genes is available here.

Volcano-plot

Log2 fold-change in this Volcano plot is shrinked using Apeglm procedure.

Heatmap of differentially expressed genes

We will display the significant genes at different stringency levels to handle various cases. Note that they are displayed in increasing stringency order but a good starting point would be to look at the 0.05 threshold.

Using q-value < 0.05

We found 425 significant genes with q-value < 0.05

Using q-value < 5e-04

We found 63 significant genes with q-value < 5e-04

Top 50 p-value

The top 50 p-value threshold is arbitrary and mostly chosen to display a heatmap with a few readable gene names.

Top 50 p-value upregulated

Top 50 p-value downregulated

Functional enrichment

We use various thresholds for functional enrichment analysis to get an appropriate number of genes in each class. For the classes defined by the q-value, note that if the number of gene in a class is less than a dozen, the statistical power of the enrichment test is very low. At the opposite, if the number of genes is too high, the enrichment test will be biased by the large number of genes in the background.

ORA

EnrichR queries are available below:

g:Profiler queries are availables below. All these queries include the filter on |shrinked log2 fold-change| > 0.2 :

GSEA

The xlsx table of gsea results is available here.

Picking joint bandwidth of 0.181