Hit Dexter 2.0: Machine-Learning Models for the Prediction of Frequent Hitters

Stork, Conrad; Chen, Ya; Sicho, Martin; Kirchmair, Johannes

dc.contributor.author	Stork, Conrad
dc.contributor.author	Chen, Ya
dc.contributor.author	Sicho, Martin
dc.contributor.author	Kirchmair, Johannes
dc.date.accessioned	2021-04-20T13:59:10Z
dc.date.available	2021-04-20T13:59:10Z
dc.date.created	2019-01-10T14:11:26Z
dc.date.issued	2019
dc.Published	Journal of Chemical Information and Modeling. 2019, 59 (3), 1030-1043.
dc.identifier.issn	1549-9596
dc.identifier.uri	https://hdl.handle.net/11250/2738716
dc.description.abstract	Assay interference caused by small molecules continues to pose a significant challenge for early drug discovery. A number of rule-based and similarity-based approaches have been derived that allow the flagging of potentially “badly behaving compounds”, “bad actors”, or “nuisance compounds”. These compounds are typically aggregators, reactive compounds, and/or pan-assay interference compounds (PAINS), and many of them are frequent hitters. Hit Dexter is a recently introduced machine learning approach that predicts frequent hitters independent of the underlying physicochemical mechanisms (including the binding of compounds based on “privileged scaffolds” to multiple binding sites). Here we report on the development of a second generation of machine learning models, which now cover both primary screening assays and confirmatory dose–response assays. Protein sequence clustering was newly introduced to minimize the overrepresentation of structurally and functionally related proteins. The models correctly classified compounds of large independent test sets as (highly) promiscuous or nonpromiscuous with Matthews correlation coefficient (MCC) values of up to 0.64 and area under the receiver operating characteristic curve (AUC) values of up to 0.96. The models were also used to characterize sets of compounds with specific biological and physicochemical properties, such as dark chemical matter, aggregators, compounds from a high-throughput screening library, drug-like compounds, approved drugs, potential PAINS, and natural products. Among the most interesting outcomes is that the new Hit Dexter models predict the presence of large fractions of (highly) promiscuous compounds among approved drugs. Importantly, predictions of the individual Hit Dexter models are generally in good agreement with those of Badapple, an established statistical model for the prediction of frequent hitters. 
The new Hit Dexter 2.0 web service, available at http://hitdexter2.zbh.uni-hamburg.de, provides user-friendly access not only to all machine learning models presented in this work but also to similarity-based methods for the prediction of aggregators and dark chemical matter, as well as to a comprehensive collection of available rule sets for flagging frequent hitters and compounds containing undesired substructures.	en_US
dc.language.iso	eng	en_US
dc.publisher	American Chemical Society	en_US
dc.title	Hit Dexter 2.0: Machine-Learning Models for the Prediction of Frequent Hitters	en_US
dc.type	Journal article	en_US
dc.type	Peer reviewed	en_US
dc.description.version	acceptedVersion	en_US
dc.rights.holder	Copyright 2019 American Chemical Society.	en_US
cristin.ispublished	true
cristin.fulltext	postprint
cristin.qualitycode	1
dc.identifier.doi	10.1021/acs.jcim.8b00677
dc.identifier.cristin	1654224
dc.source.journal	Journal of Chemical Information and Modeling	en_US
dc.source.pagenumber	1030-1043	en_US
dc.relation.project	Bergens forskningsstiftelse: BFS2017TMT01	en_US
dc.identifier.citation	Journal of Chemical Information and Modeling. 2019, 59 (3), 1030–1043.	en_US
dc.source.volume	59	en_US
dc.source.issue	3	en_US

Associated file(s)

Filename: hit_preproof.pdf
Size: 5.120Mb
Format: PDF
Description: accepted version

This item appears in the following collection(s)

Department of Chemistry [433]
Registrations from Cristin [9791]