Evaluation and Improvement of Machine Learning Algorithms in Drug Discovery

Dyrland, Kjetil

Master thesis

Open

master thesis (4.293Mb)

Permanent link

https://hdl.handle.net/11250/3021973

Publication date

2022-06-01

Collections

Master theses [205]

Abstract

Drug discovery plays a critical role in today’s society for treating and preventing sickness and potentially deadly viruses. In early drug discovery, the main challenge is to find candidate molecules that can be used as drugs to treat a disease. This also means assessing the key properties that are wanted in the interaction between molecules and proteins. It is a very difficult problem because the molecular space is so large and complex. Drug discovery and development is estimated to take around 12–15 years on average, and the cost of developing a single drug amounts to $2.8 billion in the US.

Modern drug discovery and drug development often start with finding candidate drug molecules (‘compounds’) that can bind to a target, usually a protein in our body. Since there are billions of possible molecules to test, this becomes a seemingly endless search for compounds that show promising bioactivity. The search method is called high-throughput screening (HTS), or virtual HTS (VHTS) when carried out in a virtual environment. The traditional approach to HTS has been to test every compound one by one. More recent approaches use robotics, and combine features extracted from the molecules with machine learning algorithms, in an effort to make the process more automated. Research has shown that these approaches are still subject to human error and bias. So, how can we use machine learning algorithms to make the process more cost-efficient and more robust to human error? This project tried to address these issues and resulted in two scientific papers.

The first paper explores how common evaluation metrics used for classification can actually be unsuited to the task, leading to severe consequences when applied in a real application. The argument is based on basic principles of Decision Theory, which is recognized in the field of machine learning but has not been put into much use. Decision Theory makes a distinction between predicting the most probable class and predicting the most valuable class in terms of the “costs” or “gains” associated with the classes. For an algorithm that classifies a particular disease in a patient, a wrong classification could lead to a life-or-death situation. The principles also apply to drug discovery, where the cost of further developing and optimizing a "useless" drug could be huge. The goal of the classifier should therefore not be to guess the correct class but to choose the optimal class, and the metric must depend on the type of classification problem. We show that common metrics such as precision, balanced accuracy, F1-score, Area Under the Curve, Matthews Correlation Coefficient, and Fowlkes-Mallows index are affected by this problem, and we propose an evaluation method grounded in the foundations of Decision Theory to solve it. The metric presented, called utility, takes into account the gains and losses for each correct or incorrect classification in the confusion matrix. For this to work effectively, the output of the machine learning algorithm needs to be a set of sensible probabilities for each class.

This brings us to the second paper. Machine learning algorithms usually output a set of real numbers for the classes they try to predict, which, possibly after some transformation (for example the ‘softmax’ function), are meant to represent probabilities for the classes. The problem is that these numbers cannot be reliably interpreted as actual probabilities, in the sense of degrees of belief. In the paper, we propose the implementation of a probability transducer to transform the output of the algorithm into sensible probabilities. These are then used in conjunction with the utilities to choose the class with the maximal expected utility. The results show that the transducer gives better scores, in terms of the utilities, in all cases compared to the standard method used in machine learning.
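
The distinction the abstract draws between the most probable class and the class with maximal expected utility can be illustrated with a minimal Python sketch. This is not code from the thesis or the papers: the function names, the two-class setup, the utility values, and the confusion-matrix counts are all hypothetical, and the averaged score at the end is only one plausible reading of a utility-based metric, which the thesis may define or normalize differently.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def expected_utilities(probs: Sequence[float], utility: Matrix) -> List[float]:
    """Expected utility of each decision, given class probabilities.

    utility[d][c] is the assumed gain of making decision d when the true
    class is c (negative values are losses).
    """
    return [sum(u * p for u, p in zip(row, probs)) for row in utility]


def choose_class(probs: Sequence[float], utility: Matrix) -> int:
    """Index of the decision with the maximal expected utility."""
    eu = expected_utilities(probs, utility)
    return max(range(len(eu)), key=lambda d: eu[d])


def most_probable_class(probs: Sequence[float]) -> int:
    """Index of the class with the highest probability."""
    return max(range(len(probs)), key=lambda c: probs[c])


def average_utility(confusion: Matrix, utility: Matrix) -> float:
    """Average utility per case.

    confusion[d][c] is the number of test cases with true class c for
    which decision d was made; each cell is weighted by utility[d][c].
    """
    total = sum(sum(row) for row in confusion)
    gained = sum(
        n * u
        for n_row, u_row in zip(confusion, utility)
        for n, u in zip(n_row, u_row)
    )
    return gained / total


# Hypothetical example: class 0 = inactive compound, class 1 = active.
# Rows are decisions (0 = discard, 1 = pursue), columns are true classes.
# Pursuing an inactive compound is assumed to be very costly.
UTILITY = [
    [1, -1],    # discard: small gain if inactive, small loss if active
    [-10, 5],   # pursue: large loss if inactive, gain if active
]

probs = [0.4, 0.6]  # assumed class probabilities for one compound
print(most_probable_class(probs))    # 1: "active" is the most probable class
print(choose_class(probs, UTILITY))  # 0: discarding has higher expected utility

# Hypothetical confusion matrix with the same layout as UTILITY.
CONFUSION = [
    [50, 5],
    [10, 35],
]
print(average_utility(CONFUSION, UTILITY))
```

With these assumed numbers, the two criteria disagree for the same compound: "active" is the more probable class, yet the large assumed loss for pursuing an inactive compound makes discarding the better decision. The comparison is only meaningful if `probs` are sensible probabilities, which is the role the abstract assigns to the probability transducer.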

Publisher

The University of Bergen