Development, validation and application of in-silico methods to predict the macromolecular targets of small organic compounds

Mathai, Neann Sarah

dc.contributor.author	Mathai, Neann Sarah
dc.date.accessioned	2022-06-28T09:36:07Z
dc.date.issued	2021-12-10
dc.date.submitted	2021-12-03T11:49:48.969Z
dc.identifier	container/ef/37/97/43/ef379743-a91c-4e6e-bf2a-bb8167407f21
dc.identifier.isbn	9788230847794
dc.identifier.isbn	9788230857410
dc.identifier.uri	https://hdl.handle.net/11250/3001272
dc.description.abstract	Computational methods to predict the macromolecular targets of small organic drugs and drug-like compounds play a key role in early drug discovery and drug repurposing efforts. These methods are developed by building predictive models that aim to learn the relationships between compounds and their targets in order to predict the bioactivity of the compounds. In this thesis, we analyzed the strategies used to validate target prediction approaches and showed how current strategies leave crucial questions about performance unanswered. Namely, how does an approach perform on a compound of interest, with its structural specificities, as opposed to the average query compound in the test data? We developed and present new guidelines on validation strategies to address these shortcomings. We then present the development and validation of two ligand-based target prediction approaches: a similarity-based approach and a machine learning-based approach (binary relevance random forest), both of which have a wide coverage of the target space. Importantly, we applied a new validation protocol to benchmark the performance of these approaches. The approaches were tested under three scenarios: a standard testing scenario with external data, a standard time-split scenario, and a close-to-real-world test scenario. We disaggregated the performance based on the distance of the testing data to the reference knowledge base, giving a more nuanced view of the performance of the approaches. We showed that, surprisingly, the similarity-based approach generally performed better than the machine learning-based approach under all testing scenarios, while also having a target coverage that was twice as large. After validating the two target prediction approaches, we present our work on a large-scale application of computational target prediction to curate optimized compound libraries.
While screening large collections of compounds against biological targets is key to identifying new bioactivities, it is resource intensive and challenging. Small to medium-sized libraries that have been optimized to have a higher chance of producing a true hit on an arbitrary target of interest are therefore valuable. We curated libraries of readily purchasable compounds by (i) utilizing property filters to ensure that the compounds have key physicochemical properties and are not overly reactive, (ii) applying a similarity-based target prediction method, with a wide target scope, to predict the bioactivities of compounds, and (iii) employing a genetic algorithm to select compounds for the library so as to maximize the biological diversity in the predicted bioactivities. These enriched small to medium-sized compound libraries provide valuable tool compounds to support early drug development and target identification efforts, and have been made available to the community. The distinctive contributions of this thesis include the development and benchmarking of two ligand-based target prediction approaches under novel validation scenarios, and the application of target prediction to enrich screening libraries with biologically diverse bioactive compounds. We hope that the insights presented in this thesis will help push data-driven drug discovery forward.	en_US
dc.language.iso	eng	en_US
dc.publisher	The University of Bergen	en_US
dc.relation.haspart	Paper 1: Mathai, N.; Chen, Y.; Kirchmair, J. Validation strategies for target prediction methods, Briefings in Bioinformatics, 2020, 21(3), pp. 791-802. The article is available at: https://hdl.handle.net/1956/21871	en_US
dc.relation.haspart	Paper 2: Mathai, N.; Kirchmair, J. Similarity-based methods and machine learning approaches for target prediction in early drug discovery: performance and scope, International Journal of Molecular Sciences, 2020, 21(10), 3585. The article is available at: https://hdl.handle.net/11250/2730409	en_US
dc.relation.haspart	Paper 3: Mathai, N.; Stork, C.; Kirchmair, J. BonMOLière: Small-sized libraries of readily purchasable compounds, optimized to produce genuine hits in biological screens across the protein space, International Journal of Molecular Sciences, 2021, 22(15), 7773. The article is available at: https://hdl.handle.net/11250/2768773	en_US
dc.rights	Attribution-NonCommercial (CC BY-NC). This item's rights statement or license does not apply to the included articles in the thesis.
dc.rights.uri	https://creativecommons.org/licenses/by-nc/4.0/
dc.title	Development, validation and application of in-silico methods to predict the macromolecular targets of small organic compounds	en_US
dc.type	Doctoral thesis	en_US
dc.date.updated	2021-12-03T11:49:48.969Z
dc.rights.holder	Copyright the Author.	en_US
dc.contributor.orcid	0000-0002-5763-6304
dc.description.degree	Doktorgradsavhandling
fs.unitcode	12-31-0
dc.date.embargoenddate	2022-06-10

Files in this item

Name: archive.pdf
Size: 16.91 MB
Format: PDF
Description: PDF

This item appears in the following Collection(s)

Department of Chemistry [433]

Except where otherwise noted, this item's license is described as Attribution-NonCommercial (CC BY-NC). This item's rights statement or license does not apply to the included articles in the thesis.