PROSE Training
April 24, 2024
The perfect / ideal gene marker:
Median number of 16S rRNA gene copies in 3,070 bacterial species, according to data reported in the rrnDB database (2018)
[B] The positions of sequence variation within 16S and 23S rRNA are shown along the gene organization of rrn operons. A total of 33 and 77 differences were identified in 16S rRNA and 23S rRNA, respectively.
[C] The number of bases that differ from the conserved sequence is shown for 16S and 23S rRNA for each rrn operon.
remove_chimera.py \
--input-biom clustering.biom \
--input-fasta clustering.fasta \
--non-chimera remove_chimera.fasta \
--out-abundance remove_chimera.biom \
--summary remove_chimera.html
@ST-E00114:1342:HHMGVCCX2:1:1101:3123:2012 1:N:0:TCCGGAGA+TCAGAGCC
CTTGGTCATTTAGAG
+
***<<*AEF???***
@ST-E00114:1342:HHMGVCCX2:1:1101:11556:2030 1:N:0:TCCGGAGA+TCAGAGCC
CATTGGCCATATCAT
+
AAAE??<<*???***
Meaning
@Identifier1 (comment)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+
QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ
@Identifier2 (comment)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+
QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ
Phred quality score: a measure of the quality of the identification of the nucleobases generated by automated DNA sequencing.
file.fastq.gz
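To make the format concrete, here is a minimal Python sketch that reads four-line FASTQ records (optionally from a gzip-compressed file such as file.fastq.gz), splits the header into identifier and comment, and decodes the quality line. It assumes the Phred+33 encoding of current Illumina data (consistent with characters such as `*` in the example above, which would give negative scores under Phred+64) and the standard relation \(Q = -10 \log_{10} P\) between a Phred score \(Q\) and the probability \(P\) that the base call is wrong. The function names are illustrative, not part of FROGS.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding

Record = Tuple[str, str, str, str]  # (identifier, comment, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from 4-line FASTQ entries: @id comment / seq / + / qual."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            # A list (not a generator) so StopIteration can be caught here.
            seq, plus, qual = [next(it).rstrip("\r\n") for _ in range(3)]
        except StopIteration:
            raise ValueError(f"truncated record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        identifier, _, comment = header[1:].partition(" ")
        yield identifier, comment, seq, qual


def phred_scores(qual: str) -> List[int]:
    """Decode each ASCII quality character into its Phred score Q."""
    return [ord(ch) - PHRED_OFFSET for ch in qual]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P): probability that the base call is wrong."""
    return 10 ** (-q / 10)


def read_fastq_gz(path: str) -> Iterator[Record]:
    """Stream records from a gzip-compressed FASTQ file, e.g. file.fastq.gz."""
    with gzip.open(path, "rt") as handle:
        yield from parse_fastq(handle)
```

For the first record above, `phred_scores("***<<*AEF???***")` gives scores between 9 and 37, i.e. error probabilities between about 13% and 0.02%.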
Try to answer (not always) simple questions:
Warning
QC without context leads to misinterpretation!
vsearch (Rognes et al., 2016), flash (Magoč and Salzberg, 2011) or pear (Zhang et al., 2013; command line only)
ASVs (amplicon sequence variants) are groups of identical denoised reads; two variants can differ by as little as 1 base pair. ASVs represent an inference of the biological sequences present before amplification and sequencing errors were introduced.
ASVs are inferred by a de novo process in which biological sequences are discriminated from errors, based on the expectation that biological sequences are more likely to be observed repeatedly than error-containing sequences.
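The expectation that true sequences are observed repeatedly can be illustrated by dereplication, the step where identical reads are collapsed and counted; denoising and clustering tools typically work from such abundance-ranked unique sequences. The sketch below only performs that collapsing and ranking. It does not implement any ASV inference error model, and `min_abundance` is a hypothetical parameter for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def dereplicate(reads: Iterable[str], min_abundance: int = 1) -> List[Tuple[str, int]]:
    """Collapse identical reads into (sequence, count), most abundant first.

    Rare sequences (e.g. singletons) are the ones most likely to carry
    errors; the hypothetical min_abundance threshold drops them.
    """
    counts = Counter(read.upper() for read in reads)
    kept = [(seq, n) for seq, n in counts.items() if n >= min_abundance]
    # Decreasing count, then alphabetical order for a deterministic result.
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```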
d: the small local linking threshold
Reference based: against a database of «genuine» sequences
De novo: against abundant sequences in the samples
FROGS uses vsearch (Rognes et al., 2016) as its chimera removal tool.
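The de novo strategy (comparing each sequence against more abundant sequences from the samples) can be sketched as follows: sequences are processed by decreasing abundance, and a sequence is flagged if it is exactly the left part of one more abundant, already accepted sequence joined to the right part of another. This is a deliberately simplified toy, assuming equal-length sequences and exact matches; vsearch aligns sequences and scores candidate parents, so its results will differ. Both function names are hypothetical.

```python
from typing import List, Optional, Tuple


def find_bimera(query: str, parents: List[str]) -> Optional[Tuple[int, int, int]]:
    """Return (left_parent, right_parent, breakpoint) if query is an exact
    two-parent chimera of the given parents, else None (toy: no alignment)."""
    n = len(query)
    for i, left in enumerate(parents):
        if len(left) != n or left == query:
            continue
        for j, right in enumerate(parents):
            if j == i or len(right) != n or right == query:
                continue
            # Breakpoint k: query[:k] from `left`, query[k:] from `right`.
            for k in range(1, n):
                if query[:k] == left[:k] and query[k:] == right[k:]:
                    return i, j, k
    return None


def remove_chimeras_de_novo(derep: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """derep: (sequence, count) pairs sorted by decreasing abundance.
    Each sequence is tested only against more abundant sequences already
    accepted as non-chimeric, since parents should outnumber chimeras."""
    kept: List[Tuple[str, int]] = []
    for seq, count in derep:
        parents = [s for s, _ in kept]
        if find_bimera(seq, parents) is None:
            kept.append((seq, count))
    return kept
```

Here "AAAATTTT" would be rejected as the left half of "AAAACCCC" joined to the right half of "GGGGTTTT", while a low-abundance single-mismatch variant such as "AAAACCCA" is kept, because chimera detection is not meant to remove sequencing errors.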
blast
Bacteria;Firmicutes;Bacilli;Staphylococcales;Staphylococcaceae;Staphylococcus;Staphylococcus xylosus
Bacteria;Firmicutes;Bacilli;Staphylococcales;Staphylococcaceae;Staphylococcus;Staphylococcus saprophyticus
Strictly identical over 499 nucleotides (V1-V3 amplification)
Bacteria;Firmicutes;Bacilli;Staphylococcales;Staphylococcaceae;Staphylococcus;Multi-affiliations
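The multi-affiliation rule illustrated above (ranks shared by all equally good hits are kept, the diverging rank is replaced by a tag) can be sketched as a rank-by-rank consensus. This assumes the equally best hits have already been selected (here, the two strictly identical hits) and fills every rank from the first disagreement downward with the tag; FROGS' own implementation may differ, and `consensus_taxonomy` is a hypothetical name.

```python
from typing import List


def consensus_taxonomy(affiliations: List[str], sep: str = ";",
                       label: str = "Multi-affiliations") -> str:
    """Keep the ranks on which all hits agree; tag the first diverging rank
    and every rank below it with `label`."""
    ranks = [a.split(sep) for a in affiliations]
    depth = min(len(r) for r in ranks)
    consensus: List[str] = []
    for level in range(depth):
        values = {r[level] for r in ranks}
        if len(values) == 1:
            consensus.append(values.pop())  # all hits agree at this rank
        else:
            consensus.extend([label] * (depth - level))
            break
    return sep.join(consensus)
```

Applied to the two Staphylococcus affiliations above, it returns the Multi-affiliations line shown.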
Firmicutes
?fun
esquisse
Let \(c_i\) be the number of species observed \(i\) times (\(i = 1, 2, \ldots\)) and \(p_s\) the proportion of species \(s\) (\(s = 1, \ldots, S\)).
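With this notation, common alpha-diversity indices are the observed richness \(S_{obs} = \sum_i c_i\), Chao1 \(S_{obs} + c_1^2 / (2 c_2)\), the Shannon index \(H = -\sum_s p_s \ln p_s\) and the inverse Simpson index \(1 / \sum_s p_s^2\). These are standard textbook forms and an assumption on our part: your software may use variants (log base, Gini-Simpson \(1 - \sum_s p_s^2\), bias-corrected Chao1). The sketch computes them from one sample's vector of counts.

```python
import math
from collections import Counter
from typing import Dict, List


def alpha_diversity(counts: List[int]) -> Dict[str, float]:
    """Common alpha-diversity indices from one sample's species counts."""
    counts = [n for n in counts if n > 0]          # absent species do not count
    total = sum(counts)
    p = [n / total for n in counts]                # p_s, proportion of species s
    c = Counter(counts)                            # c[i] = species seen i times
    s_obs = len(counts)                            # observed richness = sum_i c_i
    c1, c2 = c[1], c[2]
    # Chao1; bias-corrected form c1 (c1 - 1) / 2 when there are no doubletons.
    chao1 = s_obs + (c1 * c1 / (2 * c2) if c2 > 0 else c1 * (c1 - 1) / 2)
    shannon = -sum(ps * math.log(ps) for ps in p)  # natural logarithm
    inv_simpson = 1 / sum(ps * ps for ps in p)
    return {"observed": s_obs, "chao1": chao1,
            "shannon": shannon, "inverse_simpson": inv_simpson}
```

For counts `[1, 1, 2, 4]` there are two singletons and one doubleton, so Chao1 is \(4 + 2^2 / 2 = 6\).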
Let \(\color{red}{n_s^1}\) be the count of species \(s\) (\(s = 1, \ldots , S\)) in \(\color{red}{community \space 1}\) and \(\color{blue}{n_s^2}\) its count in \(\color{blue}{community \space 2}\).
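Two classic beta-diversity measures built on these counts are Bray-Curtis, \(\sum_s |n_s^1 - n_s^2| / \sum_s (n_s^1 + n_s^2)\), which uses abundances, and the Jaccard distance, one minus the fraction of species present in either community that are present in both, which uses presence/absence only. The sketch assumes both count vectors list the same \(S\) species in the same order; whether counts are rarefied or turned into proportions beforehand is left to the user.

```python
from typing import Sequence


def bray_curtis(n1: Sequence[int], n2: Sequence[int]) -> float:
    """sum_s |n_s^1 - n_s^2| / sum_s (n_s^1 + n_s^2), on the counts given."""
    if len(n1) != len(n2):
        raise ValueError("both communities must list the same S species")
    num = sum(abs(a - b) for a, b in zip(n1, n2))
    den = sum(a + b for a, b in zip(n1, n2))
    return num / den


def jaccard(n1: Sequence[int], n2: Sequence[int]) -> float:
    """1 - |species in both| / |species in either| (presence/absence only)."""
    present1 = {s for s, a in enumerate(n1) if a > 0}
    present2 = {s for s, b in enumerate(n2) if b > 0}
    return 1 - len(present1 & present2) / len(present1 | present2)
```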
For each branch \(e\), let \(l_e\) be its length and \(\color{red}{p_e}\) (resp. \(\color{blue}{q_e}\)) the fraction of \(\color{red}{community \space 1}\) (resp. \(\color{blue}{community \space 2}\)) below branch \(e\).
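With this notation, unweighted UniFrac is the fraction of observed branch length leading to only one of the two communities, \(\sum_e l_e |\mathbf{1}_{p_e > 0} - \mathbf{1}_{q_e > 0}| / \sum_e l_e \mathbf{1}_{p_e + q_e > 0}\), and a normalized weighted UniFrac is \(\sum_e l_e |p_e - q_e| / \sum_e l_e (p_e + q_e)\). The normalization of weighted UniFrac differs between tools, so treat the second formula as one common choice. The sketch takes the tree as a flat list of \((l_e, p_e, q_e)\) tuples, a hypothetical input format chosen to mirror the notation.

```python
from typing import Iterable, Tuple

Branch = Tuple[float, float, float]  # (l_e, p_e, q_e)


def unweighted_unifrac(branches: Iterable[Branch]) -> float:
    """Share of observed branch length found below only one community."""
    branches = list(branches)
    unique = sum(l for l, p, q in branches if (p > 0) != (q > 0))
    observed = sum(l for l, p, q in branches if p > 0 or q > 0)
    return unique / observed


def weighted_unifrac(branches: Iterable[Branch]) -> float:
    """Normalized weighted UniFrac: sum l_e |p_e - q_e| / sum l_e (p_e + q_e)."""
    branches = list(branches)
    num = sum(l * abs(p - q) for l, p, q in branches)
    den = sum(l * (p + q) for l, p, q in branches)
    return num / den
```

On a two-tip toy tree (branch lengths 1 and 2) where community 1 sits entirely on the first tip and community 2 is split evenly between both, the unweighted value is 2/3 and the weighted value 0.6.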
Tip
Different distances capture different features of the samples.
There is no “one size fits all”!
But variance is not a very good measure of β-diversity.
complete
ward.D2
single
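complete, ward.D2 and single are linkage criteria as named in R's `hclust()`; they decide how the distance between clusters is updated after each merge. The pure-Python sketch below runs naive agglomerative clustering on a precomputed distance matrix (for example Bray-Curtis or UniFrac distances) with these three criteria: single keeps the minimum, complete the maximum, and ward.D2 applies the Lance-Williams update for Ward's criterion to squared distances and reports square-rooted heights, which to our understanding matches R's convention. It is an illustration, not a replacement for `hclust()`.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Cluster = FrozenSet[int]
METHODS = ("single", "complete", "ward.D2")


def pair(a: Cluster, b: Cluster) -> FrozenSet[Cluster]:
    """Unordered key for the dissimilarity between two clusters."""
    return frozenset((a, b))


def hclust(dist: Sequence[Sequence[float]],
           method: str = "complete") -> List[Tuple[Cluster, Cluster, float]]:
    """Naive O(n^3) agglomerative clustering; returns merges as (a, b, height)."""
    if method not in METHODS:
        raise ValueError(f"method must be one of {METHODS}")
    ward = method == "ward.D2"
    n = len(dist)
    active: List[Cluster] = [frozenset([i]) for i in range(n)]
    # ward.D2 runs the Ward update on squared dissimilarities.
    d: Dict[FrozenSet[Cluster], float] = {
        pair(active[a], active[b]): dist[a][b] ** 2 if ward else dist[a][b]
        for a in range(n) for b in range(a + 1, n)
    }
    merges = []
    while len(active) > 1:
        # Closest pair of current clusters (the first one wins on ties).
        i, j = min(((a, b) for x, a in enumerate(active) for b in active[x + 1:]),
                   key=lambda ab: d[pair(*ab)])
        h = d[pair(i, j)]
        merged = i | j
        for k in active:
            if k == i or k == j:
                continue
            dik, djk = d[pair(i, k)], d[pair(j, k)]
            if method == "single":
                new = min(dik, djk)  # nearest neighbour
            elif method == "complete":
                new = max(dik, djk)  # furthest neighbour
            else:                    # Lance-Williams update for Ward
                ni, nj, nk = len(i), len(j), len(k)
                new = ((ni + nk) * dik + (nj + nk) * djk - nk * h) / (ni + nj + nk)
            d[pair(merged, k)] = new
        active = [k for k in active if k != i and k != j] + [merged]
        # ward.D2 reports heights back on the original distance scale.
        merges.append((i, j, h ** 0.5 if ward else h))
    return merges
```

On points 0, 1, 3 and 7 on a line, single linkage merges at heights 1, 2 and 4, while complete linkage merges at 1, 3 and 7: the same data give different trees depending on the criterion.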
Does weaning affect community composition?
Are groups A and B different?
We know that groups A and B are different.
How do they differ (in terms of taxa)?