Practical implementation of a parametrized binary matrix clustering algorithm
Master thesis
Permanent lenke
https://hdl.handle.net/11250/2834804Utgivelsesdato
2021-11-22Metadata
Vis full innførselSamlinger
- Master theses [220]
Sammendrag
In the [Fomin et al., 2020b] paper, the authors gave an exact parameterized algorithm for the Binary r-Means clustering problem, parameterized by $k+r$, with the runtime of $2^{\mathcal{O} (\sqrt{rk log(k+r) logr})}*nm$. The problem of Binary r-Means takes as input an $n \times m$ binary matrix \textbf{A} with columns ($\textbf{a}^1,...,\textbf{a}^n$), a positive integer $\textbf{r}$ and a nonnegative integer $\textbf{k}$. The task is then to decide whether there is a positive integer $\textbf{r}\sp{\prime} \leq \textbf{r}$, a partition $\{I_1, ..., I_{r\sp{\prime}}\}$ of $\{1,...,n\}$ and vectors $(\textbf{c}^1,...,\textbf{c}^{r\sp{\prime}}) \in \{0,1\}^m$ such that $\sum_{i = 1}^{r\sp{\prime}} \sum_{j \in I_i} d_H(c^i, a^j) \leq k $, where $d_H$ is the Hamming distance. As the Binary r-Means problem is NP-complete, we cannot expect an algorithm in polynomial time. Therefore its performance as a practical implementation should be evaluated manually in detail, before we make any conclusions about it's viability. An implementation of the algorithm was therefore developed and tested against randomly generated binary matrices, as to measure its performance on a standard desk-top computer. The results show that even a sub-optimal implementation is inconceivably better than a brute force, and viable with small parameters.