Prediction of Polycomb/Trithorax Response Elements using Support Vector Machines
Polycomb/Trithorax Response Elements (PREs) are epigenetic elements that can maintain established transcriptional states over multiple cell divisions. Sequence motifs found in known PREs have enabled genome-wide PRE prediction with the PREdictor and jPREdictor, which score sequence windows by combined motif occurrences. The EpiPredictor predicts PREs with Support Vector Machines (SVMs), which can construct non-linear classifiers by means of kernel functions. Several aspects of using SVMs for PRE prediction can be investigated, including the setting of SVM parameters, the use of SVM decision values for scoring, and the use of alternative feature sets. The PRE prediction implementation presented in this thesis, called PRESVM, uses SVM decision values to score sequence windows. PRESVM implements the feature sets used by the (j)PREdictor and the EpiPredictor, as well as feature sets based on relative distances between motif occurrences and on periodic motif occurrence. Both grid search and Particle Swarm Optimization are supported for setting SVM parameters. To evaluate the PRE predictions of multiple classifiers against experimental data sets, an application called PREsent was implemented. With similar configurations of PRESVM and jPREdictor, PRESVM predicted a larger number of candidate PREs; measured against the experimental data considered, these predictions had higher sensitivity but lower Positive Predictive Values than those of jPREdictor. For this configuration, a formal relationship between the PRESVM and jPREdictor decision functions was established. The trade-off between sensitivity and Positive Predictive Value makes it difficult to conclude that either classifier is superior. Many configurations remain to be tested, and the results encourage further testing.
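The idea of scoring sequence windows with SVM decision values can be illustrated with a small sketch. This is a hypothetical example and not PRESVM's implementation. It assumes a feature vector of per-window motif occurrence counts for two illustrative motifs (`GAGAG` and `GCCAT`), a Gaussian (RBF) kernel, and an already-trained model whose support vectors, dual coefficients and bias are made-up values. All function and constant names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative motifs only (assumption): a GAF-like "GAGAG" and a Pho-like "GCCAT".
MOTIFS = ("GAGAG", "GCCAT")


def count_motif(seq: str, motif: str) -> int:
    """Count overlapping occurrences of motif on the forward strand only."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def features(window: str, motifs: Sequence[str] = MOTIFS) -> List[float]:
    """Feature vector for one window: one occurrence count per motif."""
    return [float(count_motif(window, m)) for m in motifs]


def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_value(x: Sequence[float],
                   support_vectors: Sequence[Sequence[float]],
                   dual_coefs: Sequence[float],
                   bias: float,
                   gamma: float) -> float:
    """SVM decision function f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b."""
    return sum(c * rbf_kernel(s, x, gamma)
               for s, c in zip(support_vectors, dual_coefs)) + bias


def score_windows(seq: str, width: int, step: int,
                  support_vectors: Sequence[Sequence[float]],
                  dual_coefs: Sequence[float],
                  bias: float,
                  gamma: float) -> List[Tuple[int, float]]:
    """Slide a window of `width` bp by `step` bp; return (start, decision value) pairs."""
    scores = []
    for start in range(0, len(seq) - width + 1, step):
        x = features(seq[start:start + width])
        scores.append((start, decision_value(x, support_vectors, dual_coefs, bias, gamma)))
    return scores


# Hypothetical "trained" model: one PRE-like and one non-PRE support vector.
SUPPORT_VECTORS = [[2.0, 2.0], [0.0, 0.0]]
DUAL_COEFS = [1.0, -1.0]  # alpha_i * y_i for each support vector
BIAS = 0.0
GAMMA = 0.5

rich = "GAGAGAG" + "GCCAT" + "GCCAT"  # 17 bp: 2 x GAGAG (overlapping), 2 x GCCAT
poor = "T" * 17                       # 17 bp: no motif occurrences
scores = score_windows(rich + poor, 17, 17, SUPPORT_VECTORS, DUAL_COEFS, BIAS, GAMMA)
for start, value in scores:
    print(start, round(value, 3))  # prints "0 0.982", then "17 -0.982"
```

The motif-rich first window receives a positive decision value and the motif-free second window a negative one, so the decision value can be used directly as a window score rather than only as a class label. In practice the support vectors, dual coefficients and bias would come from a trained SVM; a binary scikit-learn `SVC`, for example, exposes them as `support_vectors_`, `dual_coef_` and `intercept_`. With a linear kernel, K(u, v) = u · v, the decision value reduces to w · x + b, a weighted sum of the feature counts.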