Exploratory analysis

This section shows descriptive figures about the quality of the data: reads with adapter, reads mapped to miRNAs, and reads mapped to other small RNAs.

size distribution

After adapter removal, we can plot the size distribution of the small RNAs.
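As a minimal sketch of how such a size distribution can be computed, the base R snippet below tabulates read lengths and converts them to proportions. The `reads` vector is a hypothetical stand-in for the trimmed sequences, which in practice would be read from the trimmed FASTQ file.

```r
# Hypothetical trimmed reads (assumption: real data would come from the trimmed FASTQ).
reads <- c("TGAGGTAGTAGGTTGTATAGTT",
           "TACCCTGTAGATCCGAATTTGTG",
           "TGAGGTAGTAGGTTGTATAGTT",
           "TCCCTGAGACCCTAACTTGTGA",
           "GCATTGGTGGTTCAGTGGTAGAATTCTCGCC")

# Count how many reads have each length.
size_dist <- table(nchar(reads))

# Convert the counts to proportions of all reads.
size_prop <- prop.table(size_dist)
print(size_prop)
```

Passing `size_prop` to `barplot()` would draw the distribution, with a peak around 22 nt expected when miRNAs dominate the library.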

miRNA

total miRNA expression annotated with miRBase

Distribution of miRNA expression

cumulative distribution of miRNAs

complexity

Number of miRNAs with > 10 counts.

colSums(counts(obj) > 10)
miRQC_A 590
miRQC_A_repeat 626
miRQC_B 531
miRQC_B_repeat 494
miRQC_C 647
miRQC_C_repeat 640
miRQC_D 619
miRQC_D_repeat 631
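The complexity measure above relies on a logical comparison followed by `colSums`. The sketch below shows the same idiom on a toy matrix; the matrix values and the names `counts_mat` and `detected` are hypothetical and only illustrate the mechanics.

```r
# Toy count matrix (hypothetical values): rows are miRNAs, columns are samples.
counts_mat <- matrix(
  c(120, 0, 15,    # sample_1
    8, 10, 300),   # sample_2
  nrow = 3,
  dimnames = list(c("mir_a", "mir_b", "mir_c"), c("sample_1", "sample_2"))
)

# counts_mat > 10 is a logical matrix; summing each column counts the TRUE entries,
# that is, the number of miRNAs above the threshold in each sample.
detected <- colSums(counts_mat > 10)
print(detected)
```

The comparison is strict, so a miRNA with exactly 10 counts is not counted as detected.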

Other small RNAs

The data was analyzed with seqcluster

This tool uses all reads, both uniquely mapped and multi-mapped. The first step clusters sequences at every location where they overlap. The second step creates meta-clusters: a meta-cluster is the unit that merges all clusters sharing the same sequences. The output is therefore a set of meta-clusters, groups of common sequences that could come from different regions of the genome.
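The idea of merging clusters that share sequences can be sketched in base R as follows. This is only an illustration of the concept, not seqcluster's actual implementation; the `clusters` list, the locus and sequence names, and the `merge_clusters` helper are all hypothetical.

```r
# Hypothetical clusters: each entry is the set of sequence ids found at one genomic locus.
clusters <- list(
  locus_1 = c("seq_a", "seq_b"),
  locus_2 = c("seq_b", "seq_c"),
  locus_3 = c("seq_d"),
  locus_4 = c("seq_c", "seq_e")
)

# Merge clusters that share at least one sequence, directly or through other clusters.
merge_clusters <- function(clusters) {
  meta <- list()
  for (name in names(clusters)) {
    seqs <- clusters[[name]]
    loci <- name
    keep <- list()
    for (m in meta) {
      if (length(intersect(m$seqs, seqs)) > 0) {
        # Absorb every existing meta-cluster that overlaps the incoming cluster.
        seqs <- union(seqs, m$seqs)
        loci <- c(m$loci, loci)
      } else {
        keep[[length(keep) + 1]] <- m
      }
    }
    keep[[length(keep) + 1]] <- list(loci = loci, seqs = seqs)
    meta <- keep
  }
  meta
}

meta_clusters <- merge_clusters(clusters)
# Number of loci merged into each meta-cluster.
print(sapply(meta_clusters, function(m) length(m$loci)))
```

Here `locus_1`, `locus_2` and `locus_4` end up in one meta-cluster because they are linked through `seq_b` and `seq_c`, while `locus_3` stays on its own.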

genome covered

In this table, coverage 1 gives the fraction of the genome with at least 1 read, and coverage 0 gives the fraction of the genome without reads.

coverage ratio_genome
0 0.9997890
1 0.0002112

A typical value for human data with a strong small RNA signal is about 0.0002 of the genome covered. This will differ for smaller genomes.
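To make the two rows of the table concrete, the sketch below derives the same kind of ratio from a per-base depth vector. The `depth` values describe a hypothetical 20-bp "genome"; a real analysis would obtain coverage from the alignments rather than from a vector like this.

```r
# Hypothetical per-base read depth along a tiny 20-bp genome.
depth <- c(0, 0, 3, 5, 5, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)

# 1 if the position has at least one read, 0 otherwise.
covered <- as.integer(depth > 0)

# Fraction of the genome in each class.
ratio_genome <- prop.table(table(coverage = covered))
print(ratio_genome)
```

Four of the twenty positions carry reads, so the ratio for coverage 1 is 0.2 in this toy case.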

classification

Number of reads in the data after each step:

  • raw: initial reads
  • cluster: after cluster detection
  • multimap: after meta-cluster detection

Check complex meta-clusters: these arise when small RNAs are spread over the whole genome and repetitive small RNAs map to thousands of places, sharing many sequences across many positions. If any meta-cluster holds > 40% of the total data, it may be worth adding filters in seqcluster prepare, such as a minimum number of counts (-e) or --min-shared.

     miRQC_A miRQC_A_repeat miRQC_B miRQC_B_repeat miRQC_C miRQC_C_repeat miRQC_D miRQC_D_repeat
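One way to run the 40% check is sketched below in base R. The matrix `clus_counts` and its values are hypothetical, and applying the threshold per sample (rather than on the pooled data) is an assumption of this sketch.

```r
# Hypothetical meta-cluster count matrix: rows are meta-clusters, columns are samples.
clus_counts <- matrix(
  c(500, 300, 200,   # sample_1
    100, 100, 800),  # sample_2
  nrow = 3,
  dimnames = list(c("1", "2", "3"), c("sample_1", "sample_2"))
)

# Share of each sample's reads taken by each meta-cluster.
frac <- sweep(clus_counts, 2, colSums(clus_counts), "/")

# Meta-clusters holding more than 40% of the data in at least one sample.
flagged <- rownames(frac)[apply(frac > 0.4, 1, any)]
print(flagged)
```

An empty result would mean that no meta-cluster dominates any sample under this definition.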

complexity

Number of clusters with > 10 counts.

colSums(clus_ma > 10)
miRQC_A 710
miRQC_A_repeat 717
miRQC_B 504
miRQC_B_repeat 491
miRQC_C 706
miRQC_C_repeat 710
miRQC_D 732
miRQC_D_repeat 723

Contribution by class

Differential expression

DESeq2 is used for this analysis.

Analysis for miRNA

MA-plots

miRNA            baseMean    log2FoldChange  lfcSE      stat       pvalue  padj
hsa-miR-10a-5p   74339.987   7.109443        0.0548203  129.68640  0       0
hsa-miR-10b-5p   224315.085  6.742609        0.0544403  123.85329  0       0
hsa-miR-133a-3p  7913.271    5.212065        0.0712453  73.15658   0       0
hsa-miR-141-3p   10951.626   9.265565        0.1646798  56.26411   0       0
hsa-miR-143-3p   708824.163  2.675688        0.0490673  54.53099   0       0
hsa-miR-148a-3p  22314.666   3.931444        0.0573949  68.49818   0       0
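The two axes of an MA-plot correspond to the baseMean and log2FoldChange columns. The base R sketch below computes a naive version of both from hypothetical normalized counts; it only illustrates what the quantities mean and does not reproduce the model-based estimates that DESeq2 reports. The sample and miRNA names are made up.

```r
# Hypothetical normalized counts: two control and two treated samples.
norm_counts <- cbind(
  ctrl_1 = c(100, 50, 8),
  ctrl_2 = c(100, 30, 8),
  trt_1  = c(400, 40, 2),
  trt_2  = c(400, 40, 2)
)
rownames(norm_counts) <- c("mir_a", "mir_b", "mir_c")

# A axis: mean of normalized counts across all samples.
base_mean <- rowMeans(norm_counts)

# M axis: naive log2 ratio of the group means (treated over control).
lfc <- log2(rowMeans(norm_counts[, 3:4]) / rowMeans(norm_counts[, 1:2]))

print(data.frame(baseMean = base_mean, log2FoldChange = lfc))
```

Plotting `lfc` against `base_mean` on a log-scaled x axis gives the familiar MA layout, with unchanged features scattered around zero.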

Analysis for isomiRs

MA-plots

isomiR                                      baseMean   log2FoldChange  lfcSE      stat       pvalue  padj
hsa-miR-10a-5p.iso.t5:0.t3:0.ad:u-A.mm:0    2765.335   6.714222        0.1423670  47.16135   0       0
hsa-miR-10a-5p.iso.t5:0.t3:d-T.ad:0.mm:0    1785.733   6.689695        0.1741045  38.42346   0       0
hsa-miR-10a-5p.iso.t5:0.t3:u-G.ad:0.mm:0    46731.083  7.038884        0.0483076  145.70955  0       0
hsa-miR-10a-5p.iso.t5:0.t3:u-TG.ad:0.mm:0   3222.939   6.982275        0.1455374  47.97583   0       0
hsa-miR-10a-5p.iso.t5:d-T.t3:0.ad:0.mm:0    3713.865   7.028550        0.1394841  50.38961   0       0
hsa-miR-10a-5p.iso.t5:d-T.t3:0.ad:u-A.mm:0  3873.404   7.126067        0.1373717  51.87435   0       0

Analysis for clusters

MA-plots

cluster  baseMean   log2FoldChange  lfcSE      stat       pvalue  padj  miRQC_A  miRQC_A_repeat  miRQC_B  miRQC_B_repeat  miRQC_C  miRQC_C_repeat  miRQC_D  miRQC_D_repeat
297      14544.633  5.560037        0.0946888  58.71908   0       0     22681    31034           812      570             6907     7753            18505    19849
334      11359.672  9.382545        0.1810657  51.81847   0       0     18401    23887           41       34              4976     5468            15495    15582
360      12436.680  -4.044059       0.0862888  -46.86657  0       0     1034     1347            27992    20045           22317    23068           7280     7983
478      8452.558   5.186183        0.0964175  53.78882   0       0     13495    17804           602      441             4318     4724            10346    11209
529      2577.567   5.266629        0.1290920  40.79748   0       0     3921     5424            168      125             1308     1344            3432     3478
549      2769.405   5.199808        0.1208995  43.00935   0       0     4420     5441            195      132             1445     1581            3634     3790

Files

The generated files contain raw counts, normalized counts, log2 normalized counts and DESeq2 results.
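As a rough sketch of how the three count tables relate to each other, the base R code below derives normalized and log2 normalized counts from a raw matrix and writes each to a tab-separated file. The size factors are computed with a median-of-ratios calculation written out by hand, the pseudocount of 1 before the log2 transform is an assumption, and the matrix values and file names are hypothetical. The DESeq2 results table is not reproduced here.

```r
# Hypothetical raw count matrix: rows are miRNAs, columns are samples.
raw <- cbind(sample_1 = c(10, 20, 40), sample_2 = c(20, 40, 80))
rownames(raw) <- c("mir_a", "mir_b", "mir_c")

# Median-of-ratios size factors: per sample, the median ratio of its counts
# to the per-feature geometric mean across samples.
log_geo_mean <- rowMeans(log(raw))
usable <- is.finite(log_geo_mean)  # drops features with a zero count in any sample
size_factors <- apply(raw, 2, function(x) {
  exp(median((log(x) - log_geo_mean)[usable]))
})

# Normalized counts and their log2 transform (pseudocount of 1 is an assumption).
norm <- sweep(raw, 2, size_factors, "/")
log2_norm <- log2(norm + 1)

# Write each table to a tab-separated file (file names are hypothetical).
out_dir <- tempdir()
tables <- list(counts_raw = raw, counts_norm = norm, counts_log2_norm = log2_norm)
for (name in names(tables)) {
  write.table(tables[[name]], file.path(out_dir, paste0(name, ".tsv")),
              sep = "\t", quote = FALSE, col.names = NA)
}
```

In this toy case the second sample is simply sequenced twice as deeply as the first, so both columns become identical after normalization.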