Detecting inosine in nanopore sequencing data with machine learning
Master thesis
Permanent link
https://hdl.handle.net/11250/2775028
Publication date
2021-08-13
Collections
- Master theses [197]
Abstract
Detecting modifications in DNA has been a long-standing challenge in understanding the workings of the genome, particularly with regard to regulatory function. The most widely used sequencing technology at present, NGS, offers protocols to tackle this challenge, but these are modification-specific and involve convoluted preparation steps. As an alternative, nanopore sequencing allows such modifications to be observed directly. Although inosine has been shown to be distinguishable from adenine in poly(A) RNA using nanopore sequencing, no framework has been proposed for the general detection of inosine in nanopore sequencing data. In this thesis, I propose a test-based approach that uses out-of-the-box classifiers to distinguish sequences containing inosine from sequences that do not, based on features present in nanopore sequencing data. The proposed model achieves high accuracy on this classification task, opening avenues for further development of a self-contained inosine detector and for applying the same approach to other modifications.