dc.contributor.author | Miao, Fengming
dc.date.accessioned | 2019-04-10T15:38:31Z
dc.date.available | 2019-04-10T15:38:31Z
dc.date.issued | 2019-04-05
dc.date.submitted | 2019-04-04T22:00:03Z
dc.identifier.uri | https://hdl.handle.net/1956/19310
dc.description.abstract | In biology, the cell membrane is an important component of a cell and usually works as a “fence” that separates the inside of the cell from the outside. Its key role is to protect the cell from interference by its surroundings by keeping molecules from entering. However, cells must keep communicating with their surroundings to acquire nutrients and other necessary molecules in order to stay alive and grow. For this reason, membrane proteins act as molecular carriers that participate in molecular communication and regulate biological activities. There are two kinds of membrane proteins: integral and peripheral. In this project, we focus only on the latter. Unlike integral membrane proteins, which span the whole membrane, peripheral membrane proteins can only attach to the surface of the membrane through various interactions. Because peripheral proteins are also soluble, it is difficult to distinguish them from other kinds of proteins (i.e., non-membrane-binding proteins) by sequence or structure. In this project, we develop a method that predicts from its structure whether a protein is a membrane-binding protein, based on two machine learning algorithms: k-nearest neighbors (KNN) and support vector machine (SVM). We train each algorithm on the data to create two models, which are used to classify new proteins and to compare the algorithms' performance. By, for example, collecting different protein features, adjusting the parameters of the algorithms, or changing the size and structure of the dataset, we can improve the performance of the algorithms and predict the protein type more accurately. We also use the receiver operating characteristic (ROC) curve and the area under the curve (AUC) to give an overview of performance, and cross-validation to verify the results.
Problems in this field raise several further challenges, such as collecting features, analyzing and handling a huge variety of data, and choosing machine learning algorithms for a design based on functional requirements, data structure, efficiency, and other factors. In this project, we encounter these challenges and address them with effective methods. | en_US
dc.language.iso | eng | eng
dc.publisher | The University of Bergen | en_US
dc.title | Discriminating between surfaces of peripheral membrane proteins and reference proteins using machine learning algorithms | en_US
dc.type | Master thesis
dc.date.updated | 2019-04-04T22:00:03Z
dc.rights.holder | Copyright the Author. All rights reserved | en_US
dc.description.degree | Master's Thesis in Informatics | en_US
dc.description.localcode | INF399
dc.subject.nus | 754199 | eng
fs.subjectcode | INF399
fs.unitcode | 12-12-0


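The abstract also uses the ROC curve and AUC to summarize classifier performance. As a hedged sketch (the labels and scores below are made-up toy values, and `roc_points` and `roc_auc` are hypothetical helper names), the ROC curve can be traced by sweeping a decision threshold over a classifier's scores. The AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting one half.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs, starting at (0, 0), using each distinct score as a threshold."""
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # Predict "positive" for every example whose score is at least t.
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = membrane-binding, 0 = reference; scores from some classifier.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))  # prints 0.889
print(roc_points(labels, scores)[-1])     # prints (1.0, 1.0)
```

The pairwise count is quadratic in the number of examples; for large datasets a rank-based (Mann-Whitney U) formula gives the same value after a single sort.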